Title: Mathematical models of progressive diseases and screening
Full Citation
Permanent Link: http://ufdc.ufl.edu/UF00099387/00001
 Material Information
Title: Mathematical models of progressive diseases and screening
Physical Description: x, 220 leaves : ill. ; 28 cm.
Language: English
Creator: Hoshyar, Abdollazim, 1948-
Copyright Date: 1978
Subject: Cancer -- Diagnosis   ( lcsh )
Medical screening   ( lcsh )
Industrial and Systems Engineering thesis Ph. D
Dissertations, Academic -- Industrial and Systems Engineering -- UF
Genre: bibliography   ( marcgt )
non-fiction   ( marcgt )
Statement of Responsibility: by Abdollazim Hoshyar.
Thesis: Thesis--University of Florida.
Bibliography: Bibliography: leaves 212-219.
General Note: Typescript.
General Note: Vita.
 Record Information
Bibliographic ID: UF00099387
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
Resource Identifier: alephbibnum - 000074404
oclc - 04673807
notis - AAH9678








To my wife, Meixi,
and my daughter, Hani,
who have made numerous sacrifices so that
I might pursue this goal.


I wish to express my sincere appreciation and gratitude to the

members of my doctoral committee for their overall guidance, understanding

and friendship in assisting me in my research. In particular I am deeply

indebted to my committee chairman Dr. Ralph W. Swain not only for his

technical insight and timely assistance but for suggesting the area of

screening for research and for his attitude toward my work. His

exemplary contributions to my education will never be forgotten. I wish

to extend my appreciation to my committee co-chairman Dr. Thom J. Hodgson

for his assistance and encouragement. I would like to thank the other

members of the committee, Dr. Kerry E. Kilpatrick, Dr. Gary J. Koehler

and Dr. Jeffrey P. Krischer for their comments on drafts of the manuscript.

I would also like to thank Dr. Jeffrey P. Krischer and Dr. Lawrence S. Frankel

for providing useful references.

I would like to thank Dr. A Ghavami, Deputy Chancellor, Dr. M.S. Mayeri,

Dean of the School of Engineering, Dr. M. Bahadori-Nejad and all the

faculty members of the department of Mechanical Engineering of Pahlavi

University for offering me a scholarship which made my post-graduate study possible.


I am also grateful to the Division of Health Systems Research and

the Department of Industrial and Systems Engineering for their financial

support during the study.

Finally Mrs. Beth Beville deserves many thanks for her excellent

typing of the dissertation.


ACKNOWLEDGMENTS

KEY TO SYMBOLS

ABSTRACT

ONE INTRODUCTION

TWO LITERATURE REVIEW

THREE MODEL DEVELOPMENT

3.1 A General Model of Cancer Screening
3.2 The Disease Process
3.3 The Screening Process
3.4 Estimation of Transition Probabilities
3.5 Objective Functions
3.6 An Alternative Expression for the Objective Function

FOUR NEUROBLASTOMA

4.1 Literature on Neuroblastoma
4.2 Development of Unconditional Probabilities
4.3 Objective Function
4.4 Parameter Estimation and Determination of Optimal Policies
4.5 Sensitivity Analysis
4.6 Results and Conclusions

FIVE SPECIAL CASES

5.1 Dependency of True Positive Rate of Two Successive Examinations
5.2 True Positive Rate as a Function of Time from Onset of the Disease
5.3 Screening Examination Consists of a Sequence of Tests
5.4 Transient Problem
5.5 Application of the Model to the Case of Breast Cancer Using the Results of HIP Study

6.1 Branch and Bound Method: Search for an Optimal Solution
6.2 Search for a "Good" Heuristic Solution

PROBLEM

REFERENCES

BIOGRAPHICAL SKETCH


A1 = 1 - F

a'ijt = Transition probability of going from i to j in the t th

period without screening

aijt = Transition probability of going from i to j in the t th

period under any screening program

B1 = 1 - F + F/bit,

b'ii't = Probability of transition from stage i to i' in the t th

interval without screening

bii't = Probability of transition from stage i to i' in the t th

interval under any screening program

Cd(t) = Cost associated with death of an individual of age t

Cs(t) = Screening cost at age t

Csp = Cost of screening the population of suspects to the disease

CT(t,i') = Treatment cost of an individual who is in stage i' at age t

Cti' = CT(t,i') + Cd(t) di(t)

di(t) = Probability of death due to disease for an individual who
is detected at stage i at age t

fi(t) = Probability that a person who has been in stage i at age

t is properly classified as diseased

f(t) = True positive rate of screening as a function of the time

from onset

F = Constant true positive rate of screening

I = Number of population per one diseased individual






LB = Lower bound to the objective function

M = Number of screenings

OF = The objective function

O = Occult stages = {1,2, ... N}

D = Detected stages = {1',2', ... N'}

PL(t) = Probability of not dying of other causes up to age t

P'[Xt = i] = Probability of being in an occult stage i at age t under

no screening program

P[Xt = i] = Probability of being in an occult stage i at age t under

any screening program

P'[Xt = i'] = Probability of being in a detected stage i' at age t under

no screening program

P[Xt = i'] = Probability of being in a detected stage i' at age t under

any screening program

R(t,k) = The ratio of population who have completed (t-k) examinations

up to the t th interval

r = Discount rate

S = State space (stage of the disease, age of the individual)

Ti = Time of the ith test i = 1, ... M

Tmax = Maximum age under consideration

U'it = Probability of staying in stage i in the t th period

without screening

Uit = Probability of staying in stage i in the t th period

under any screening program

Zt = Decision variable: 0 if test is not done in the t th interval,
1 if test is done in the t th interval

0 = Healthy state

λ0 = Rate of onset of the disease

λ' = Rate of detection of the disease
λ1 = Rate of transition from (-)s category to (+-)s category
λ2 = Rate of transition from (-)s category to (++)s category
λ3 = Rate of transition from (+-)s category to (++)s category

πi = P[X = i]

Abstract of Dissertation Presented to the Graduate Council
of the University of Florida in Partial Fulfillment of
the Requirements for the Degree of Doctor of Philosophy



Abdollazim Hoshyar

August 1978

Chairman: Ralph W. Swain
Major Department: Industrial and Systems Engineering

A stochastic model for a screening program is presented in which the

natural history of the disease is assumed to progress through a set of

stages before detection. A model of the whole process is developed

which addresses the interaction of the disease process and screening process.

The purpose of the model is to develop insight into the disease process

and to derive policies which are optimal relative to the particular

objectives chosen.

The model is implemented for neuroblastoma and breast cancer and, in

the latter case, a comparison is made between the model's output with that

of an actual study group. An investigation is also made of the special

cases when there is dependency between screening results, when the true

positive rate of screening is a function of time from onset, when

multiple tests may be employed in each screening and/or when the prevalence

pool has a transient period.

To determine the "best" policy, an attempt is made to solve the

optimization problem, but to avoid an extensive computation time a


"heuristic" method is also developed which determines "good" policies

efficiently. It is shown that a number of screening strategies yield

objective function values close to the minimum value for the example




The fact that some diseases may reach an advanced state without

obvious symptoms, coupled with potentially better cure rates associated

with early detection, indicates the potential benefits of early detection

of the disease. The probability of detection can be increased by the

process of screening. While screening programs continue to attract an

increasing number of researchers from all fields of science, most of the

work done is primarily medically oriented and, generally, there remains

a lack of a unifying theory relative to the timing and type of screening


The analysis of screening processes centers around determination of

an optimal number of examinations at specific ages. This objective is

of interest for its theoretical and applied consideration. Theoretically

it is a challenging problem which can be difficult to model. Less than

perfect predictability, which is a characteristic of screening procedures,

plus variability of the characteristics of the disease in different

individuals make the problem difficult to analyze. The real life

situations involving different costs and benefits associated with different

screening plans provide practical interest in determining good

screening strategies. There is general agreement that significant

differences exist in the use of one screening policy over another

[32,35,49,52,78,89,98,99], and in some cases, such as pap smear and x-ray

examinations, the screening itself may not be harmless [35]. Both cost


and health effect factors motivate the need to evaluate alternative

screening patterns in a systematic manner.

1.1 Objectives of This Research

A few investigators have presented heuristic procedures for screen-

ing of a specified disease. One reason for the lack of a good mathemat-

ical model for screening is the lack of understanding of the disease

process itself. If a particular disease could be modeled mathematically,

with its parameters realistically estimated, then the problem of screen-

ing would be easier to analyze. In order to present a "good" screening

policy, there is a strong need for a basic mathematical model of the


This research is directed toward the development of optimal screen-

ing policies. To accomplish this task for any specific disease it is

necessary to decide 1) who should be screened, 2) how often they should

be screened and 3) what combination of test methods should be used at

each examination. A model of the whole process will be developed, which

addresses the interaction of the disease process and screening process.

The purpose of the model is to develop insight into the disease process

and to derive policies which are optimal relative to the particular

objectives chosen. Some of these objectives are: minimization of the

total cost of medical care (including the cost of screening and treat-

ment), maximization of life expectancy, maximization of the probability

of detection of the disease in a favorable stage and minimization of the

delay in time between onset of the disease and its detection.

Increasing the number of examinations increases the screening cost,

but hopefully, to some extent, it also increases the chance of earlier

detection of the disease. Age of the individual at the time of examina-

tion is a basic factor and depending on the disease under consideration,

it can play a major role. Screening at an age in which disease is not

likely to be active will increase the cost and may introduce some side

effects; screening at the proper age helps the physician detect the

disease before its natural detection time and therefore increases the

chance of survival and may decrease the cost of treatment. Also using

different combinations of screening methods might increase the screening

cost, but it decreases the chance of false negative results. Determina-

tion of basic factors affecting a screening program, presentation of a

mathematical model for that particular disease with estimation of its

parameters, validation of the model and selection of the best screening

policy depend on the objectives chosen.

Therefore the objective of this research is to develop a general

model of cancer screening, specialize it for Neuroblastoma and breast

cancer, investigate model sensitivity to changes in parameters or

assumptions, and derive good screening policies by use of optimization


Chapter Three provides a general formulation of screening programs

and presents a model of the screening process superimposed on the disease

process. Restrictive assumptions on the process as well as a discussion

of relevant optimality criteria are presented. This chapter is intended

to provide a general model applicable to any disease. It consists of

four parts: part A presents a stochastic model of disease progress in

an individual, part B postulates the probability structure for the

progress of the disease and employs available data to estimate the

elements of a transition probability matrix, part C presents different

objective functions of interest and estimates the terms, part D contains

the basic idea of screening and develops different strategies of


Chapter Four is based on the development of the previous section

and extends the results to the case of Neuroblastoma which is a cancer

of early childhood.

In Chapter Five several assumptions of the model are altered and

a sensitivity analysis is employed to determine the robustness of the

model to changes in some parameters. In the first three sections the

assumption of constant screening accuracy is relaxed and the expected

total cost of screening is expressed as a function of screening policy

for different cases. In the last two sections, the initial effect of a

screening program on population prevalence pool is investigated and the

model is implemented for the case of breast cancer.

In Chapter Six an attempt is made to solve the optimization

problem efficiently and determine the "best" policy. Finally a "heuristic"

method is presented which determines "good" policies.



The problem of screening progressive diseases has received

considerable attention in recent years due to the impact it has

on society as a whole. Daniel G. Miller [71], under the topic "What

Is Early Diagnosis Doing?", looks at cancer as a very peculiar disease:

... Cancer affects the health of other members of society
as well as the patient. A disease, the cost of which is
fifteen billion dollars a year, one half of the annual
budget deficit, has to be considered a matter which affects
the health of the nation, if only because it diverts medical
manpower and facilities which could be used for other
urgent health care needs. Furthermore, a disease which
disrupts families, ... must be considered in terms of its
total impact on the society. Currently one in every six
dollars spent in health care is spent on cancer, ... the
cost of surgery, radiation therapy, and terminal care is
estimated to be at twenty-thousand dollars per case, ...

Miller points out the importance of early detection and mentions

that the use of an appropriate screening policy might reduce the cost

associated with cancer.

Gilbertsen [36] in his report on 14,978 cases of cancer claims

that his studies suggest that the majority of commonly occurring cancers

can be detected on periodic examination far earlier than is likely when

the patient waits until his symptoms force him to see the physician and

prompt the physician to undertake examination and eventual diagnosis.

The results of his studies also suggest that when cancers so detected are

treated promptly and adequately, substantial improvement in prognosis

for survival can be anticipated for patients with most of the common

cancers which occur today. This concept has been pointed out by many

other investigators [4,19,27,45,69,83].

Early diagnosis, which is possible through screening at-risk

individuals, is an important concept in aborting or ameliorating the

consequences of such diseases as cancers. Considerable funds and

effort are being expended for the purpose of the early detection of

cancers. The investigations are mostly concerned with the comparison

of the costs and benefits of alternative individual screening strategies

for selected site-specific cancers and with the selection of preferred

strategies. An individual screening strategy is defined [98,99] as the

specification of the number and type of screens to be given to an

individual in a particular at-risk group and the ages at which the tests

are to be given. Screening is operationally defined [99] as the process

of selecting those asymptomatic persons who would benefit from further

diagnostic studies. The selection of one screening strategy over

another depends on the criteria by which the strategy is evaluated and

the constraints placed on alternative screening schedules.

Before reviewing the literature a set of definitions related to the

screening will be presented.

1) The true positive rate of screening is the probability the

test indicates an affected individual, given that the individual has

that particular cancer.

2) The false positive rate of screening is the probability the

test indicates the disease is present when the disease is not present.

3) The true negative rate of screening is the probability the

test indicates the disease is not present when the disease is not present.


4) The false negative rate of screening is the probability the

test fails to indicate the presence of the disease in an affected individual.


5) Onset time is the first age at which some recognizable

biological change occurs.

6) The pre-clinical state of the disease is regarded as a state

where clinical symptoms have not been exhibited and the individual is

unaware of the disease.

7) The clinical state of the disease is a state where clinical

symptoms are exhibited.

8) Clinical surfacing: If the disease is not detected by a

scheduled screening examination, the disease will be said to surface clinically.


9) Lead time for a screening program is the difference between

time of diagnosis by the screen and that later time when the disease

would be clinically apparent and detectable.
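
The four rates in definitions 1) to 4) and the lead time in definition 9) can be illustrated with a short Python sketch. The counts and ages below are hypothetical, and the function names `screening_rates` and `lead_time` are introduced here for illustration only; they do not appear in the dissertation.

```python
def screening_rates(tp, fp, tn, fn):
    """Return the four screening rates from counts of test outcomes.

    tp, fn: diseased individuals with positive / negative test results.
    fp, tn: disease-free individuals with positive / negative test results.
    """
    diseased = tp + fn
    healthy = fp + tn
    return {
        "true_positive_rate": tp / diseased,
        "false_negative_rate": fn / diseased,
        "false_positive_rate": fp / healthy,
        "true_negative_rate": tn / healthy,
    }


def lead_time(age_at_screen_detection, age_at_clinical_surfacing):
    """Lead time: how much earlier the screen finds the disease."""
    return age_at_clinical_surfacing - age_at_screen_detection


# Hypothetical counts: 100 diseased and 900 disease-free individuals.
rates = screening_rates(tp=80, fp=45, tn=855, fn=20)
print(rates)
print(lead_time(age_at_screen_detection=50.0, age_at_clinical_surfacing=51.5))
```

Within each group the two rates are complementary: the true positive and false negative rates sum to one, as do the true negative and false positive rates.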

To make a cost-effectiveness analysis, the following requirements

would have to be specified by the analyst.

1) The necessary data on the nature of the disease--including

detection rates, onset rate, and survival rate.

2) A decision as to what assumptions are to be accepted concerning

the disease progression in the occult part.

3) A specification of the accuracy and reliability of the different

screening methods, including false-negative rates, false-positive rates

and the degree of the uniformity of the test results.

4) A specification of the constraints on the number of screens and

their interval per lifetime of an at-risk individual.

5) A specification of the effectiveness measures of interest.

To implement a model to a site-specific cancer there is a strong

need for the appropriate data. Unfortunately for most types of cancer,

data do not exist and/or are not in a useful form, which makes it

extremely difficult to estimate most of the epidemiologic parameters.

For data to be useful, it should carry information on the individual's

age, sex, race, social level, education, age at each screen, mode of

screening, result of screens, whether or not the individual was eventually

found with disease through screening, the cause and date of his/her death.

Recently there has been some effort to collect data in a more useful

manner. For instance, in the case of breast cancer, in December 1963,

the Health Insurance Program of Greater New York (HIP) started a long

term randomized trial directed at the question "Does periodic breast

cancer screening with mammography and clinical examination result in a

reduction in mortality from breast cancer in the female population?"

Two systematic random samples, each consisting of 31,000 women aged 40 to

64 were selected. Each woman in the study group was offered a screening

examination and three additional examinations at annual intervals each

consisting of a clinical examination, mammography, and an interview.

The women in the control group were matched to the study women for date

of entry and continued to receive their ordinary medical care

[85,86,87,95,101]. These data have been used extensively by many inves-

tigators for the purpose of modeling the behavior of breast cancers

[39,89,90], estimation of epidemiologic parameters [49,55,56,87],

estimation of false negative rates in medical screening [39], estimation

of the effect of screening on 5-year mortality rates [85,86,87,95,101]

and determination of factors that correlate with promptness in seeking

diagnosis [31].

The accuracy and reliability of the different screening methods are

important factors in early detection of the disease and for any

particular type of screen an estimate of the false positive rate and

false negative rate of screening has to be made.* In 1942 Sawitz and

Karpinos [82] proved that when each individual in a group receives the

same number of examinations, estimates of the efficiency (true positive

rate) of screening, F, and of the prevalence rate, P, are given by

$$F = \frac{R\,[1-(1-F)^n]}{nK}, \qquad P = \frac{R}{nNF} = \frac{K}{N\,[1-(1-F)^n]}$$

where R is the total number of positive test results, n is the number

of examinations given to each individual, K is the number of individuals

with any positive test results, and N is the number of individuals in

the group. In this formulation F is the adjusted ratio of positive

examinations to the number of examinations on detected diseased indi-

viduals and P is the adjusted ratio of detected diseased individuals to

the number of individuals examined. Therefore knowing R, n, N and K, the

procedure can be used to get an estimate of true positive rate of screening

and prevalence.
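
Because F appears on both sides of its estimating equation, it has to be found numerically. The following Python sketch assumes the equations read F = R[1-(1-F)^n]/(nK) and P = R/(nNF), and solves the first by bisection. The function name `sawitz_karpinos` and the sample counts are hypothetical; the bisection is one possible solution method, not the procedure used in [82].

```python
def sawitz_karpinos(R, K, N, n, iterations=200):
    """Estimate the true positive rate F and the prevalence P.

    R: total number of positive test results
    K: number of individuals with at least one positive result
    N: number of individuals in the group
    n: number of examinations given to each individual
    """
    if not (K < R <= n * K):
        raise ValueError("need K < R <= n*K for a root in (0, 1]")

    def h(F):
        # h(F) = 0 is the estimating equation F = R[1-(1-F)^n]/(nK).
        return F - R * (1.0 - (1.0 - F) ** n) / (n * K)

    # h is negative just above 0 and non-negative at 1, with a single
    # positive root, so bisection on (0, 1] finds it.
    lo, hi = 1e-12, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    F = 0.5 * (lo + hi)
    P = R / (n * N * F)
    return F, P


# Hypothetical data: 8000 people, 3 examinations each,
# 700 people with at least one positive result, 1200 positive results.
F_hat, P_hat = sawitz_karpinos(R=1200, K=700, N=8000, n=3)
print(F_hat, P_hat)
```

The sample counts were generated from F = 0.5 and P = 0.1 using the expected values K = NP[1-(1-F)^n] and R = NPnF, so the sketch should recover those two values.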

Later, in 1951 Mantel [68] modified these estimates for the

case of unequal number of examinations given to different suspects.

*Although the literature on estimating the prevalence and true positive
rate of screening is cited here, these techniques are not used in the
development of the model.

Wittes and Sidel [112] presented a method for estimation of

population size from the simple capture-recapture model and showed how

to obtain estimates when there are more than two independent sources of

notification. They defined

K = the number of notification sources

$E_i$ = probability that a member of the population is identified

by the i-th source

N = the total size of the population

$n_i$ = the number of members of the population identified by the

i-th source

$n_{ij}$ = the total number of members of the population identified by

both the i-th and j-th sources

n = the total number of different population members identified.

They found that $E_1$ is the solution of the following polynomial equation:

$$n_1 - n_1 \prod_{i=1}^{K}\left(1 - \frac{n_i E_1}{n_1}\right) - n E_1 = 0$$

Estimates of $E_2, \ldots, E_K$ are obtained from $E_1$:

$$E_s = \frac{n_s E_1}{n_1}, \qquad s = 2, \ldots, K$$

The population size is estimated as

$$\hat{N} = \frac{n}{1 - \prod_{i=1}^{K} (1 - E_i)}$$

They gave an approximation to the variance of $\hat{N}$ and tabulated it for some

values of $E_1$, $E_2$ and n.
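
The estimation steps can be sketched in Python. The sketch assumes the polynomial reads n_1 - n_1 * prod_i(1 - n_i E_1 / n_1) - n E_1 = 0 with the product over all K sources, that E_s = n_s E_1 / n_1, and that the population estimate is n / [1 - prod_i(1 - E_i)]. The function name `wittes_sidel`, the bisection solver and the sample counts are illustrative assumptions, not part of [112].

```python
from math import prod


def wittes_sidel(counts, n_distinct, iterations=200):
    """Estimate source probabilities E_i and population size N.

    counts: [n_1, ..., n_K], members identified by each source
    n_distinct: n, number of different members identified overall
    """
    n1 = counts[0]
    if sum(counts) <= n_distinct:
        raise ValueError("sources must overlap: need sum(counts) > n_distinct")

    def f(E1):
        # Left-hand side of the polynomial equation in E_1.
        return (n1
                - n1 * prod(1.0 - ni * E1 / n1 for ni in counts)
                - n_distinct * E1)

    # f is concave with f(0) = 0 and f'(0) > 0, so it is positive up to
    # its single positive root and negative beyond it.
    lo, hi = 1e-12, min(1.0, n1 / max(counts))
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    E1 = 0.5 * (lo + hi)
    E = [ni * E1 / n1 for ni in counts]
    N_hat = n_distinct / (1.0 - prod(1.0 - Ei for Ei in E))
    return E, N_hat


# Hypothetical data: three sources identify 500, 400 and 200 members,
# 760 different members in total.
E_hat, N_hat = wittes_sidel([500, 400, 200], 760)
print(E_hat, N_hat)
```

The sample counts correspond to a population of 1000 with source probabilities 0.5, 0.4 and 0.2, so the sketch should return values close to those. For K = 2 the equation reduces to E_1 = n_12 / n_2, the simple capture-recapture estimate.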

Goldberg and Wittes [39] proposed a similar capture-recapture method

[109,110,11] to estimate the sensitivity of the screen, the false

negative rate, the population prevalence, the population incidence,

their means, variances, and covariances, and their properties in small

strata. They used HIP findings [87] to illustrate the model. They defined

i = number of the screen, i = 0, ..., S

$M_i$ = number of individuals attending the i-th screen

$d_i$ = number of cases of disease detected at the i-th screen

$N_0$ = true population prevalence at the 0-th, or initial, screen

$N_i$ = true population incidence between the (i-1)-st and the i-th

screen (i > 0)

$\beta_i$ = the number of false negatives

$\alpha_i$ = the number of false positives.

Suppressing the subscript i, at any screen $\alpha$ and $\beta$ are written in terms of the

false positive rate and the false negative rate, respectively. They assumed that false

positives have been removed and considered a screening program in which

subjects are screened S+1 times by K different screening methods. Let

$d_{ij}$ = number of cases of disease detected at the i-th screen by the

j-th screening mechanism (i = 0, ..., S; j = 1, ..., K). An unbiased estimate of

the total number of diseased individuals at the i-th screen, $N_i$, satisfies

$$\prod_{j=1}^{K} (N_i - d_{ij}) = (N_i + 1)^{K-1}\,(N_i - d_i), \qquad i = 0, \ldots, S$$

They also computed the variance of N for the case K=2.
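
For K = 2 the estimate has a closed form. Assuming the estimating equation reads prod_j (N - d_j) = (N + 1)^(K-1) (N - d), where d is the number of distinct cases detected, expanding it for two methods gives N = (d_1 + 1)(d_2 + 1)/(d_12 + 1) - 1, where d_12 is the number of cases found by both methods. The Python sketch below checks this algebra on hypothetical counts; the function names and data are illustrative only.

```python
def two_method_estimate(d1, d2, d_both):
    """Closed-form root of the estimating equation when K = 2."""
    return (d1 + 1) * (d2 + 1) / (d_both + 1) - 1


def residual(N, per_method, d_total):
    """Left side minus right side of
    prod_j (N - d_j) = (N + 1)^(K-1) * (N - d)."""
    left = 1.0
    for dj in per_method:
        left *= N - dj
    K = len(per_method)
    return left - (N + 1) ** (K - 1) * (N - d_total)


# Hypothetical screen: method 1 finds 60 cases, method 2 finds 50,
# and 30 cases are found by both.
d1, d2, d_both = 60, 50, 30
d_total = d1 + d2 - d_both            # 80 distinct cases detected
N_hat = two_method_estimate(d1, d2, d_both)
false_negative_rate = 1 - d_total / N_hat
print(N_hat, false_negative_rate)
print(residual(N_hat, [d1, d2], d_total))
```

The residual should vanish up to rounding error, and the ratio of detected cases to the estimated total gives a combined sensitivity for the two methods.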

It is seen that the general limitation of all these procedures

is that, for some diseases, the assumption that examination efficiency

is the same for all infected individuals may not be realistic, and

instead examination efficiency may vary from individual to individual,

depending on the nature and stage of the disease.

It is generally agreed that many cancers can be modeled as progressing

through different stages [5]. The staging of the disease is useful,

because of its importance in treatment and its effect on prognosis.

In December 1965, the American Joint Committee on Cancer Staging [16]

offered the TNM system which was based upon the three capital letters;

T Tumor or primary lesion and its extent; N lymph nodes of the region

and their condition; M distant metastasis. Within each letter element,

increasing involvement was categorized by the combination of the capital

letter with a numerical suffix. But this staging was too general to be

used for all types of cancer. In fact, depending on the type of cancer

under consideration different investigators employed different staging


Eker [22] introduced a method of staging for Carcinomas of the

colon and rectum in 1963.

Cutler and Myers [17] introduced a method of staging for breast

cancer in 1967.

Barron and Richart [6] introduced a method of staging for cervical

carcinoma in 1968.

Evans et al. [27] introduced a method of staging for Neuroblastoma

in 1971.

Although the importance of early detection is known,

there are only a few mathematical models for screening, none of which

is general enough to cover all forms of the disease. In what follows,

the literature on screening models is reviewed and general concepts of

screening programs as a tool for early diagnosis of malignant diseases

are pointed out.

In 1963 Lincoln and Weiss [65] derived properties of the time with

disease before detection in recurrent screening and investigated the

consequences of these properties with respect to the interval between

screens. They considered the efficiency of different policies for

scheduling medical examinations, and treated both periodic and random

examinations allowing for imperfect diagnosis depending on how long the

disease had been present. Working with that portion of the population

in which a tumor appears, they defined

$u(t)$ = Probability density for the time at which one can
first observe the presence of a tumor.

$a(t)$ = Probability that a diagnosis made at a time t after
the appearance of the first observable signs of tumor
will be incorrect.

Let examinations occur at times $\tau_1, \tau_2, \ldots$, such that the intervals

$\Delta_1 = \tau_1$, $\Delta_2 = \tau_2 - \tau_1$, ... are independent identically distributed random
variables with probability density function $\phi(\Delta)$. Then under the

assumption that a tumor is discovered only by examination, they used the

idea that the examination times $\{\tau_i\}$ form a renewal process, and if the

tumor first becomes observable at t, the time to detection is

$$T_d = T_1 + T_2 + T_3 + \cdots + T_n$$

where n is the number of tests before detection, $T_1$ is the forward delay,

or time to first test following the initiation of the disease, and $T_j$
is the time between the (j-1)st and jth test. Then

$$\eta(t,x) = [1 - a(x)]\, f(t,x)$$

where $\eta(t,x)$ = probability density for the time to discovery of the

tumor conditional on its having become observable at

time t,

and $f(t,x)$ = probability density for the event that a test occurs at

t + x and that any diagnoses made in (t, t+x) were incorrect.

$$\eta(x) = \int_0^{\infty} u(t)\, \eta(t,x)\, dt$$

where $\eta(x)$ = probability density of tumor age at discovery.

They found the expression for moments of $\eta(x)$ as functions of $a(t)$ and

$\phi(\Delta)$. Using the quantities shown and employing the following two

optimality criteria they developed an optimal schedule:

Criterion 1: No more than a fraction $\epsilon < 1$ of those people who eventually

have a tumor will have an undetected tumor for more than

a specified time T.

Criterion 2: Mean undetected time of tumor growth does not exceed a

given time T .
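
Both criteria can be evaluated by simulation in a simplified setting. The Python sketch below assumes strictly periodic examinations, a forward delay that is uniform over one period, and a user-supplied error probability a(x) for a diagnosis made at tumor age x. These are illustrative assumptions chosen for brevity, not the general renewal model of [65], and the function names and numerical values are hypothetical.

```python
import random


def expected_delay_constant_error(period, a):
    """Mean undetected time for periodic tests with constant error a:
    half a period of forward delay plus a/(1-a) missed periods on average."""
    return period / 2.0 + period * a / (1.0 - a)


def simulate_delays(period, error_prob, trials=50_000, seed=0):
    """Simulate tumor age at discovery under periodic examinations.

    error_prob(x): probability that a diagnosis at tumor age x is incorrect
    (assumed to stay below 1 so that every tumor is eventually found).
    """
    rng = random.Random(seed)
    delays = []
    for _ in range(trials):
        x = rng.uniform(0.0, period)        # forward delay to the first test
        while rng.random() < error_prob(x):
            x += period                      # missed: wait for the next test
        delays.append(x)
    return delays


# Hypothetical policy: a test every 12 months, each missing 20% of tumors.
delays = simulate_delays(12.0, lambda x: 0.2)
mean_delay = sum(delays) / len(delays)                      # Criterion 2
frac_over = sum(d > 24.0 for d in delays) / len(delays)     # Criterion 1
print(mean_delay, expected_delay_constant_error(12.0, 0.2))
print(frac_over)
```

With a constant error probability the simulated mean can be compared with the closed form, and the fraction of tumors undetected for more than two periods equals the probability of two consecutive misses. Passing a non-constant function for `error_prob` lets the accuracy depend on tumor age, as in the original formulation.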

Weiss and Lincoln [106] used the model in [65] for the case of

cervical cancer. They used a gamma distribution for u(t) and a negative

exponential for $\Delta_i$, where $\Delta_i = \tau_i - \tau_{i-1}$. Neglecting death from other

causes, they obtained some characteristics for screening period by using

an (a,b) policy, where a is time and b is a probability. The (a,b) policy

is one in which the probability that any tumor is of age a or older at

the time of discovery is b.

In 1967 Feinleib [29] presented the mathematical justification and

restrictions for the well-known epidemiologic relation that the prevalence

of a disease is proportional to its incidence and mean duration. Let P(t)

be the prevalence of a disease at time t; i(t) the incidence of disease

at t; and $g(d \mid t)$ the conditional probability density function of

durations of incident cases, where d is the duration from time of onset;

then if P(0) = 0,

$$P(t) = \int_0^t \int_{t-y}^{\infty} i(y)\, g(x \mid y)\, dx\, dy$$

For the stable disease model, he imposed the following three conditions:

1) $i(t) = i$ for all $t \ge 0$ and is zero for $t < 0$.

2) $g(d \mid t) = g(d)$ for all $t \ge 0$.

3) $g(d) = 0$ for $d > M$.

For this model he proved that $P(t) = i\,\bar{d}$ for $t > M$, where $\bar{d}$ is the

mean duration.
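
The stable-model result can be checked numerically. With constant incidence i and a duration density that does not depend on onset time, the double integral for P(t) reduces to i times the integral from 0 to t of S(u), where S(u) is the probability that a case lasts longer than u. The Python sketch below evaluates that integral with the trapezoid rule for durations uniform on [0, M]; the distribution, the parameter values and the function names are hypothetical.

```python
def prevalence(t, incidence, survival, steps=6000):
    """P(t) = incidence * integral_0^t S(u) du, by the trapezoid rule.

    survival(u): probability that a case has duration greater than u.
    """
    h = t / steps
    total = 0.5 * (survival(0.0) + survival(t))
    for k in range(1, steps):
        total += survival(k * h)
    return incidence * total * h


M = 4.0                  # maximum duration
incidence = 5.0          # new cases per unit time
mean_duration = M / 2.0  # mean of a uniform duration on [0, M]


def survival(u):
    """Survival function of a duration uniform on [0, M]."""
    return max(0.0, 1.0 - u / M)


print(prevalence(2.0, incidence, survival))   # still rising, t < M
print(prevalence(6.0, incidence, survival))   # steady state, t > M
print(incidence * mean_duration)
```

For t below M the prevalence is still building up; once t exceeds M it settles at the product of incidence and mean duration.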

In 1968 Hutchison and Shapiro [49] published their work on the

estimation of some parameters of preclinical breast cancer. They used

preliminary findings of a clinical program of screening for breast cancer

to estimate average duration of preclinical disease (early stages of the

disease, in which tumor may be detected only in a screening program).

They assumed that in any large population there are some individuals

with preclinical disease. The number of such women (prevalence) depends

on the rate with which new preclinical disease develops (incidence) and

the length of time (duration) it persists before clinical diagnosis.

Then if the number of preclinical cases remains constant during the long

run, the rate of new cases must be the same as the rate at which old
cases are passing over to clinical disease. In general a prevalence P

and an incidence I imply that the average duration is d = P/I.

Moreover if, in the absence of screening, the duration of pre-

clinical disease were the same for all individuals, then duration-to-date

of those detected at screening would be uniformly distributed between

(0,t), average duration-to-date would be half the total duration, and

this would be equal to average lead time. In their mathematical model,

the only input was

$I_d$ = incidence of cases of duration d. $I_d$ is expressed as a

discrete distribution function with $\sum_d I_d = 1$.

Where d = duration, or interval of time during which an individual case

is diagnosable by screening but not diagnosed under usual practice. Then

they gave functional form for prevalence, incidence, mean duration and

lead time as a function of $I_d$.

For instance,

$$P_n = \sum_{d=0}^{n} I_d\, d + n \sum_{d=n+1}^{\infty} I_d$$

where $P_n$ = prevalence at time n of cases detectable by screening but

not diagnosed under usual practice.


$$I(n) = \sum_{d \le n} I_d\,(n-d)$$

where I(n) = total incidence in interval n following screening.

Using preliminary HIP findings [84,85], they estimated the average

duration of preclinical breast cancer in absence of special screening to

be 20 months. This will give an average lead time of 10 months for a

completely homogenous population. Later on it will be seen that there

are some assumptions inherent in this model which make their analysis


In 1969 Blumenson and Bross [11] presented a mathematical analysis

of the growth and spread of breast cancer. They developed a mathematical

model which describes the development of the cancer from the appearance

of the first cell to the possible occurrence of a distant metastasis.

Their model also accounts for limitation on the minimum size of the tumor

before it can be detected and for the effect of surgical intervention

by the physician on the development of a recurrence of the disease. They

used a deductive method and constrained the progress of breast cancer to

the contribution of the following parameters (1) the tumor doubling time,

(2) patient's delay in reporting her disease to the physician, (3) the

chance that the disease will spread to nearby lymph nodes, and (4) the

chance of spreading to more distant parts of the body. A patient is

classified as having a small primary tumor (S), with average diameter less

than 5 centimeters, or a large primary tumor (L). A patient either has

negative nodes (0), 1-3 positive nodes (2), or more than three positive

nodes (4). After surgery the patients are followed for at least 18 months.

At the end of this period a patient is classified as (N) or (R) depending

on whether or not she had a clinically detectable recurrence. This introduces

12 stages: S0R, S2R, S4R, .... They presented a method for calculating

the probability of being in any of these states as a function of the

parameters introduced. Once the twelve probabilities have been calculated,

they are compared with the data and a $\chi^2$-test is used to determine those

values of the parameters which minimize the $\chi^2$-value.
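
The classification and the goodness-of-fit step can be sketched in Python as follows. The sketch enumerates the twelve states (two tumor sizes, three nodal categories, two follow-up outcomes) and computes a Pearson chi-square statistic between observed counts and model probabilities. The function name, the observed counts and the model probabilities are hypothetical placeholders; the cited authors' procedure for computing the state probabilities from their growth parameters is not reproduced here.

```python
from itertools import product

SIZES = ["S", "L"]         # small / large primary tumor
NODES = ["0", "2", "4"]    # negative nodes, 1-3 positive nodes, more than three
OUTCOMES = ["N", "R"]      # no recurrence / recurrence during follow-up

# The twelve states, e.g. "S0N", "S0R", "S2N", ...
STATES = ["".join(parts) for parts in product(SIZES, NODES, OUTCOMES)]


def pearson_chi_square(observed, model_probs):
    """Sum over states of (observed - expected)^2 / expected."""
    total = sum(observed[s] for s in STATES)
    stat = 0.0
    for s in STATES:
        expected = total * model_probs[s]
        stat += (observed[s] - expected) ** 2 / expected
    return stat


# Hypothetical data: equal counts and a uniform model, so the fit is perfect.
observed = {s: 10 for s in STATES}
uniform_model = {s: 1.0 / len(STATES) for s in STATES}
chi2_value = pearson_chi_square(observed, uniform_model)
print(len(STATES), chi2_value)
```

In a fitting loop, the model probabilities would be recomputed for each candidate parameter set and the set giving the smallest statistic retained.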

Calling their model a deep mathematical model* for human breast

cancer, interestingly, they found out [14] that the only way to get

an acceptable $\chi^2$-value is to employ a two-disease hypothesis for breast

cancer, i.e., the population of patients consists of two groups with

different tumor doubling times.

In 1969 Zelen and Feinleib [116] presented a model of a chronic

disease which progresses from a pre-clinical state to a clinical state

and related the potential benefit of the screening program to the

lead time gained by early diagnosis. They developed a stochastic model

for early detection programs which led to an estimate of the mean lead

time as a function of observable variables. They considered a screening

program where an individual was examined only once. In their model,

transitions are from a disease-free state (S0) to a preclinical disease

state (Sp), and then to a clinical disease state (Sc). They assume that

(Sp) eventually progresses to (Sc) if not detected and treated. They

define

q(t) = p.d.f. of sojourn time in Sp

$Q(t) = \int_t^{\infty} q(x)\, dx$

P(t) = probability of being in Sp at time t

$Q_f(t)$ = unconditional forward recurrence time (lead time)

m = mean sojourn time in Sp = $\int_0^{\infty} Q(x)\, dx$

* A deep model describes an underlying process which, in theory,
generates the surface events.

and assume that

$$Q_f(t) = \frac{1}{m} \int_t^{\infty} Q(y)\, dy$$

$$L = \text{mean lead time} = \frac{m^2 + \sigma^2}{2m} = \frac{m}{2}\,(1 + C^2)$$

where $C = \sigma/m$ and m and $\sigma^2$ are the mean and variance of the sojourn time
distribution in Sp. They mentioned that $L > m/2$ because of length-biased

sampling, which means the screen does not detect people at random,

but detects people with longer preclinical sojourn times. They also

mentioned that particular care must be exercised if one is comparing the

survival of the individuals detected early by a test with a comparable

group of individuals detected at a clinical state because those found

in Sp tend to have longer preclinical sojourn times in Sp than the
general population. This might be synonymous with a slow-growing disease

in the preclinical state. Zelen and Feinleib consider different

conditions under which the following relationship referred to by

Hutchison and Shapiro [49] and Feinleib [29] is valid:

$$P = m \cdot I$$

where P, I and m are prevalence, incidence and mean duration of the

disease. They proved that even if prevalence and incidence are time
dependent, P(t)/I(t) is equal to the mean sojourn time in Sp provided

the sojourn time follows an exponential distribution. In general

$$I(t) = \frac{P(t)}{m} + \frac{C^2 - 1}{2}\, P'(t) + \cdots \quad \text{where } C^2 = \frac{\sigma^2}{m^2}$$

They developed relationships among age, prevalence and incidence

and employing data from HIP [87] found an estimate of 1.84 years for m.

Finally they relaxed the assumption that every individual eventually

leaves (Sp) and developed some relationships which depend on the

ratio of those who will eventually surface to the total population;

this is a generalization of their model to non-progressive diseases.
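
The dependence of the mean lead time on the spread of the sojourn time can be made concrete with a short Python sketch. It evaluates the mean forward recurrence time, (m² + σ²)/(2m), which equals (m/2)(1 + C²) with C the coefficient of variation of the sojourn time, for two limiting cases. The function name and the two variance choices are assumptions for illustration; only the value m = 1.84 years is taken from the text.

```python
def mean_lead_time(mean_sojourn, var_sojourn):
    """Mean forward recurrence time: (m^2 + sigma^2) / (2 m) = (m / 2) * (1 + C^2)."""
    if mean_sojourn <= 0:
        raise ValueError("mean sojourn time must be positive")
    return (mean_sojourn ** 2 + var_sojourn) / (2.0 * mean_sojourn)


m = 1.84  # HIP-based estimate of the mean sojourn time quoted in the text (years)

# Every case has the same sojourn time (C = 0): lead time is half the sojourn time.
constant_case = mean_lead_time(m, 0.0)

# Exponential sojourn times (variance = m^2, C = 1): lead time equals m.
exponential_case = mean_lead_time(m, m ** 2)

print(constant_case, exponential_case)
```

Any spread in the sojourn times pushes the mean lead time above m/2, which is the length-biased sampling effect described above.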

In 1972 Kodlin [57] used a series of biometric arguments and very

simple cost estimates to attack the cost-benefit problem in screening

for breast cancer and found that survival results would justify the

increased costs that might result from mass screening. To do this he

claimed that in the case of breast cancer, one is faced with essentially

two alternative basic strategies:

1) To screen and bring the positives to therapeutic intervention--

associated total cost is called C1.

2) To let the cases come to diagnosis and treatment through the

traditional pathway of recognition--associated total cost is

called C2.

Let t = treatment cost for those picked up by screen

t' = treatment cost for those not picked up by screen

W = $W_0$ (physical exam) + $W_1$ (biopsy fee)

s = screen cost

b = biopsy rate amongst false positives

Then

$$C_1 = \underbrace{P\,\pi_1\,(t + W + s)}_{\text{true positive}} + \underbrace{P\,(1 - \pi_1)\,(t' + W + s)}_{\text{false negative}} + \underbrace{(1 - P)(1 - \pi_2)\,(W_0 + b\,W_1 + s)}_{\text{false positive}} + \underbrace{(1 - P)\,\pi_2\, s}_{\text{true negative}}$$

where P = presumed frequency of breast cancer in the population

$\pi_1$ = conditional probability of identifying a case correctly
by mammography and palpation

$\pi_2$ = conditional probability of identifying a non-case correctly.

$$C_2 = P\,(t' + W) + \phi\,(1 - P)\,[m + (1 - \pi_2)(W_0 + b\,W_1)]$$

where

$\phi$ = fraction of the non-diseased who demand a breast check-up,

and

m = mammography cost.

Using some estimates of the parameters P, $\pi_1$, $\pi_2$, $\phi$ and costs, he found

that C1 is usually greater than C2. Then he attempted to assess the

total cost per case cured and using this objective found that it would

be beneficial to choose the first strategy, screening.
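
The comparison of the two strategies can be sketched in Python. The sketch assumes the per-person cost expressions read as C1 = P·π1·(t+W+s) + P·(1−π1)·(t'+W+s) + (1−P)·(1−π2)·(W0+b·W1+s) + (1−P)·π2·s and C2 = P·(t'+W) + φ·(1−P)·[m + (1−π2)·(W0+b·W1)], where π1 and π2 are the probabilities of correctly identifying a case and a non-case and φ is the fraction of non-diseased women who demand a check-up. The symbol names pi1, pi2 and phi and every numerical value below are hypothetical, chosen only to show the mechanics; they are not Kodlin's estimates.

```python
def screening_cost(P, pi1, pi2, t, t_prime, W0, W1, b, s):
    """Expected cost per person under strategy 1 (screen everyone)."""
    W = W0 + W1                                          # physical exam plus biopsy fee
    true_pos = P * pi1 * (t + W + s)
    false_neg = P * (1 - pi1) * (t_prime + W + s)
    false_pos = (1 - P) * (1 - pi2) * (W0 + b * W1 + s)
    true_neg = (1 - P) * pi2 * s
    return true_pos + false_neg + false_pos + true_neg


def traditional_cost(P, pi2, phi, t_prime, W0, W1, b, m):
    """Expected cost per person under strategy 2 (no organized screening)."""
    W = W0 + W1
    return P * (t_prime + W) + phi * (1 - P) * (m + (1 - pi2) * (W0 + b * W1))


# Hypothetical inputs (currency units per person).
params = dict(P=0.002, pi1=0.8, pi2=0.95, t=2000.0, t_prime=4000.0,
              W0=20.0, W1=200.0, b=0.5, s=30.0)
c1 = screening_cost(**params)
c2 = traditional_cost(P=0.002, pi2=0.95, phi=0.1, t_prime=4000.0,
                      W0=20.0, W1=200.0, b=0.5, m=25.0)
print(round(c1, 4), round(c2, 4))
```

With these illustrative inputs the screening strategy costs more per person, mainly because the screen cost s is paid for the large disease-free majority; a cost-per-case-cured comparison would additionally need cure rates, which are not modeled here.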

Shapiro, Goldberg and Hutchison [84] used the experience in the HIP

Study [87] to estimate the average time gained through screening in

the detection of breast cancer among the women who were aged 40-64 years

at the start of the screening program. They used the model presented

by Hutchison and Shapiro [49] which says d = P/I and found a mean

duration of 1.3 years for a prevalence of 2.73 and incidence of 2.09

as calculated from HIP data. The statistical models that were applied

suggested that the average lead time was about a year.
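
The quoted mean duration can be checked directly from d = P/I, assuming the prevalence and the annual incidence are expressed on the same population base:

```latex
d = \frac{P}{I} = \frac{2.73}{2.09} \approx 1.31 \ \text{years}
```

This agrees with the rounded value of 1.3 years given above.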

In 1974 Kirch and Klein [53] developed methods for determining

the optimal screening policy using the criterion of detection delay

which is the time from the first point at which the disease is detectable to the

point of actual detection. They started with an age-dependent disease

and showed that, under certain conditions in an optimal schedule, the

interval between examinations is inversely proportional to the square root of the

age-specific incidence of the disease. They were interested in possible

advantages of nonperiodic policies in mass screening, and tried to

find out whether a nonperiodic schedule, involving the same expected

tests per patient as a periodic schedule, could reduce the average time

to detect a given disease, or, whether a nonperiodic schedule involving

fewer expected tests per patient could lead to detection of the disease

as early as a given periodic schedule.

They divided population into two groups, those who will eventually

get the disease and those who will not. For the first group, interest

was centered on early detection. For the second group, interest was

centered on minimization of the expected number of tests. They found a

class of optimal schedules by varying the number of screenings. They

assumed that:


1) The age span consists of "equal length periods" in which

incidence rate is usually tabulated in the literature.

2) Each such period starts with an examination and all examinations

within a period are at equal intervals.

3) Examinations are error-free.

Define: T = Earliest time at which the disease could be detected, if

an examination took place (they assumed T is uniformly

distributed over the interval).

$x_i$ = Number of tests scheduled for the i-th period.

D = Length of time between T and examination time.

Q(xi,D) = Detection delay if the disease becomes detectable in

the i-th period.

$S_i$ = Probability that patient survives to the start of period i.

Assuming that T has a uniform distribution, they found the expression

for E[Q (xi,D)] as a function of D and xi for two different cases:

1. D is a constant.

2. D is a random variable with probability density function


They used the optimization model

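$$\text{Minimize } G(x_1, \ldots, x_n; D) = \sum_{i=1}^{n} P_i\, E[Q(x_i, D)]$$

$$\text{subject to } \sum_{i=1}^{n} x_i S_i \le K \quad (K \text{ is a constant}), \qquad x_i \ge 1$$

where $P_i$ represents the conditional probability that the detectability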

point occurs in period i, given that it will occur sometime within the

n periods of interest.
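
A simplified version of this allocation problem can be worked through in Python. The sketch assumes error-free, equally spaced examinations and a detectability point uniformly distributed within a period of fixed length, so that the expected delay in period i is the period length divided by 2·x_i. It also relaxes the integrality of x_i, ignores the lower bound x_i ≥ 1, and treats the budget constraint as an equality. Under these assumptions a Lagrange-multiplier argument gives x_i proportional to sqrt(P_i / S_i). The function names and all numbers are hypothetical; this is not the authors' algorithm.

```python
import math


def continuous_allocation(p, S, K):
    """Relaxed minimizer of sum_i p_i * L / (2 * x_i) subject to sum_i S_i * x_i = K.

    Setting the derivative of the Lagrangian to zero gives x_i ~ sqrt(p_i / S_i);
    the normalizing constant enforces the budget constraint.
    """
    denom = sum(math.sqrt(pi * si) for pi, si in zip(p, S))
    return [K * math.sqrt(pi / si) / denom for pi, si in zip(p, S)]


def expected_delay(p, x, period_length):
    """Expected detection delay when period i holds x_i equally spaced tests."""
    return sum(pi * period_length / (2.0 * xi) for pi, xi in zip(p, x))


p = [0.1, 0.2, 0.3, 0.4]       # hypothetical probabilities P_i
S = [1.0, 0.95, 0.9, 0.8]      # hypothetical survival probabilities S_i
K = 8.0                        # budget on the expected number of tests

x_opt = continuous_allocation(p, S, K)
x_flat = [K / sum(S)] * len(p)  # periodic schedule spending the same budget

print([round(x, 3) for x in x_opt])
print(expected_delay(p, x_opt, 5.0), expected_delay(p, x_flat, 5.0))
```

In this relaxed setting the nonperiodic schedule concentrates tests in the periods where the disease is more likely to become detectable and gives a smaller expected delay than the periodic schedule with the same expected number of tests.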

Kirch and Klein [52] applied this model to breast cancer and found

that there is a slight economic advantage (2% to 3%) if examination

frequency is taken to be a function of age rather than fixed throughout

life. Their model has some basic assumptions (such as screening is

error-free) which are not always true, and consequently restrict the

application of their model.

In 1974 Tallis and Sarfaty [97] presented a model for the distri-

bution of the time to reporting cancer, called T. The basic assumption

of their model was that the infinitesimal conditional probability of

reporting the disease is proportional to the rate of tumor growth. Let

the distribution function of T be $F(t) = P[T \le t]$ with derivative

$F'(t) = f(t)$, then the force of reporting function is

$$\frac{f(t)}{1 - F(t)} = \lambda(t) = \begin{cases} 0 & 0 \le t < t_0 \\ \alpha\, V'(t) & t \ge t_0 \end{cases}$$

where V(t) is the tumor volume t time units after onset, $\alpha$ is a
constant of proportionality and $t_0$ is defined by the equation
$V(t_0) = v_0$. For instance, $v_0$ may be the minimum clinically detectable
tumor size. Then

$$F(t) = \begin{cases} 0 & 0 \le t < t_0 \\ 1 - \exp(-\alpha\, V(t)) & t \ge t_0 \end{cases}$$

They defined $R = T - t_0 \mid T > t_0$, and used a staging method similar to TNM [16]
which consisted of four stages S1, S2, S3 and S4 in such a way that each
stage is associated with certain volumes of tumor growth, i.e.,
$v_{i-1} \le V < v_i$ is classified as $S_i$, i=1,...,4.
Let $r_i$ be such that $V(t_0 + r_i) = v_i$, i=0,1,2,3 ($r_0 = 0$); then if $P_i$
is the probability of reporting the disease in $S_i$,

$$P_1 = P[R < r_1] = 1 - \exp(-\alpha v_1 + \alpha v_0)$$

$$P_2 = P[r_1 \le R < r_2] = [\exp(-\alpha v_1) - \exp(-\alpha v_2)]\,\exp(\alpha v_0)$$

$$P_3 = P[r_2 \le R < r_3] = [\exp(-\alpha v_2) - \exp(-\alpha v_3)]\,\exp(\alpha v_0)$$

$$P_4 = P[r_3 \le R] = \exp[-\alpha v_3 + \alpha v_0]$$

They used a standard model for tumor growth $V(t_0 + r) = v_0 e^{\beta r}$ where $\beta$ is
the rate of growth and found E(R) and E[V(R)] as functions of $\alpha$ and $\beta$,

$$E(R) = \frac{e^{\alpha v_0}}{\beta} \int_{\alpha v_0}^{\infty} y^{-1} e^{-y}\, dy$$

$$E[V(R)] = \text{expected tumor volume at reporting} = (1 + \alpha v_0)\,\alpha^{-1}$$
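
The stage probabilities can be evaluated with a short Python sketch. It assumes that, given reporting occurs after the tumor reaches volume v0, the volume at reporting exceeds v with probability exp(−α(v − v0)), so that the probability of reporting in stage i is the difference of two such tail probabilities and the expected volume at reporting is v0 + 1/α. The function names, the value of α and the volume thresholds are hypothetical and chosen only for illustration.

```python
import math


def stage_probabilities(alpha, v):
    """Probabilities of reporting in stages S1..S4 for thresholds v = [v0, v1, v2, v3].

    tail[i] is the probability that the volume at reporting is at least v[i],
    conditional on reporting after the tumor has reached v0.
    """
    v0 = v[0]
    tail = [math.exp(-alpha * (vi - v0)) for vi in v]
    probs = [tail[i] - tail[i + 1] for i in range(3)]   # S1, S2, S3
    probs.append(tail[3])                               # S4: volume at or beyond v3
    return probs


def expected_volume_at_reporting(alpha, v0):
    """v0 + 1 / alpha, i.e. (1 + alpha * v0) / alpha."""
    return v0 + 1.0 / alpha


alpha = 0.05                        # hypothetical constant of proportionality
thresholds = [1.0, 8.0, 27.0, 64.0]  # hypothetical v0, v1, v2, v3
probs = stage_probabilities(alpha, thresholds)
print([round(p, 4) for p in probs], expected_volume_at_reporting(alpha, thresholds[0]))
```

Because the four probabilities telescope, they sum to one for any choice of α and thresholds, which is a quick consistency check on the stage formulas.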

Knox [55,56] in his simulation studies of breast cancer screening

programs postulated a model for the natural history of the disease

which consisted of a set of stages. A statement of the natural history

of the disease was then provided in the form of a transition matrix

which gave estimated transfer rates between the various stages. This

set of values was adjusted iteratively until an output was provided

which matched available data on incidence, prevalence and mortality.

He used HIP findings as his data [87].

The full set of stages for the simulation included 26 defined

stages and the general sequence of the transfer pattern was held constant.

The simulation was developed in four steps.

1) Some specifications for the natural history of the disease

were developed which make the model capable of explaining available

incidence, prevalence, mortality and case-fatality data.

2) Sensitivities and specificities for palpation and mammography

were developed that make the model capable of explaining the results

of HIP.

3) The results were extrapolated to predict the benefits of

extended screening services.

4) The scope of Urinary-Steroid prescreening tests was examined.

He points out that mortality savings are not to be seen as the

sole criterion and it is possible to provide short but useful prolon-

gations of life without grossly affecting the cumulative death statistics.

He also mentions that

results of the HIP experiment are not the results
that we shall be needing. We would need to know quite
accurately, for each age group, the effectiveness, hazards

and costs of an extended palpation program using staff
other than doctors, together with the teaching of self-
palpation, and the marginal benefits, risks and costs of
a selective and limited use of mammography, superimposed
upon this background. ... Having regard to the limited
acceptability of screening procedures, their high costs,
..., a reasonable service target may be a reduction of
breast cancer mortality by about one-tenth.

In 1976, a significant amount of research was completed. Prorok

[78,79] presented a stochastic model for a periodic screening policy

in the case of a chronic disease whose natural history is assumed to

follow a progressive path from a preclinical state to a clinical state.

The distribution of the forward recurrence time (time interval from

initiation of the disease to its detection) is derived and used to

obtain the distribution and mean of the lead time (time interval between

the point at which early diagnosis occurs as a result of a screening

test and the point when disease would have been detected in the absence

of screening), and the relationships for calculating the proportion of

preclinical cases detected. Prorok defines (S0) to be the disease-free

state, (Sp) the preclinical state and (Sc) the clinical disease state. He

develops a model for the interaction of the disease process with an

independent screening process. Assuming that screening time is

independent of the time at which the preclinical state is entered

and the duration of stay therein, and allowing for the possibility of

imperfect detection, he derives all necessary distributions.

Prorok also measures the number or fraction of preclinical cases

which are actually detected earlier than usual as a result of screening.

His model is a generalization of the model presented by Zelen and

Feinleib [116] for the case of multiple screening. The restriction

on his model and models similar to it is due to the method of staging

used, and the assumption of periodic screening. The staging (S0, Sp,

Sc) is very general in nature and by using it, we lose the information

available from data. Periodic screening is also a restriction which

is not necessarily optimal, since there are some indications that, in

the case of age-dependent diseases, the optimal screening policy would

not be periodic [53].

Galliher [35] has raised the question of how soon a repeat Pap

smear should be scheduled in the case of a woman who has had no test

or has received negative smears up to date in her life. He used cost-

effectiveness analysis to determine optimal frequencies of such schedules.

Normally, there is a trade off between the amount of effort at pre-

vention and the resulting amount of advanced disease if not screened.

To analyze this problem there is a strong need for data on the occult

part of the disease, but, in almost all cases, the early disease has

always been treated when detected. Therefore, due to the lack of

appropriate data, he bounded the objective by obtaining two extreme

sets of objectives, between which the true evaluation of objective should

lie. He introduced two extreme levels, (L) and (M). Under L one

assumes: 1) carcinoma-in-situ could come to detection between the

scheduled preventive smears in more than 50% of all affected individuals;

2) only 40% or so of all carcinoma-in-situ would ever progress to

invasive disease if not treated. Therefore under (L) one obtains a

minimal role to the periodic screening. Under M one assumes:

1) carcinoma-in-situ would never come to detection except at the

scheduled preventive examinations; 2) carcinoma-in-situ, if untreated,

would always eventually progress to invasive cancer, and that invasive

cancer is always preceded by carcinoma-in-situ. Therefore (M) provides

a maximal role to the periodic screening. In his paper, Galliher only

worked out level (M), because it offered certain simplicities in the

task of producing a cost-effectiveness analysis.

Galliher assumed a course based upon four possible stages of

diagnosis at screening, and defined them as

H = Healthy

U = Carcinoma-in-situ

V-C = (Micro) invasive disease, curable at detection

V-F = (Micro) invasive disease, fatal even though treated.

Figure 1 is a flow chart for the onset and course of the disease in an

individual.

Galliher's main assumptions are

1) Carcinoma-in-situ does not regress spontaneously.

2) Duration from onset of carcinoma-in-situ to onset of invasion

is negative exponentially distributed (he used a mean duration of

ten years in his computations).

3) Carcinoma-in-situ alone will not be detected by discomfort

of the woman.

4) Carcinoma-in-situ is completely curable at diagnosis.

5) There is a sharp transition to invasion at the end of the

duration of carcinoma-in-situ.

6) When the stage of invasion is entered, two competing processes

commence and operate concurrently; these are

a) tendency to surface clinically,

b) tendency to make the transition to fatal disease.

7) The rates for transition to fatal disease and clinical surfacing

are constants g and c, independent of how long invasion has been in

process. Therefore if the invasive condition is not first diagnosed,

the chance that it remains curable for at least the next t years is

equal to $e^{-gt}$ and the chance that the invasive condition remains

clinically occult for at least t years is equal to $e^{-ct}$.

Figure 1: Flow chart for onset and course of the disease.

8) Initially, the individual starts at 20 years old and has

no risk for previous cervical cancer.
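
The two competing processes of assumptions 6) and 7) can be illustrated with a few lines of Python. With constant rates g (transition to fatal disease) and c (clinical surfacing), the invasive condition remains curable for at least t years with probability exp(−g·t) and remains occult for at least t years with probability exp(−c·t). The third function uses a standard property of competing exponential risks that is not stated in the text: the probability that surfacing occurs before the transition to fatal disease is c/(g + c). The function names and the rate values are hypothetical.

```python
import math


def prob_still_curable(g, t):
    """Probability that invasive disease is still curable t years after invasion."""
    return math.exp(-g * t)


def prob_still_occult(c, t):
    """Probability that invasive disease has not surfaced clinically after t years."""
    return math.exp(-c * t)


def prob_surfaces_while_curable(g, c):
    """Probability that clinical surfacing precedes the transition to fatal disease."""
    return c / (g + c)


g, c = 0.2, 0.3   # hypothetical rates per year
print(prob_still_curable(g, 5.0), prob_still_occult(c, 5.0),
      prob_surfaces_while_curable(g, c))
```

Quantities of this kind are the building blocks for the transition probabilities between stages over an interval without screening.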

He employed the following measures of effectiveness in his

analysis:

1) The minimization of the total cost of medical care.

2) The minimization of sum of 1) plus the economic loss to

the society if the patient dies.

3) The minimization of premature death.

Galliher used data to estimate incidence of onset, g, c and

different elements of cost. For the numerical task of finding optimal

schedules he needed the probabilities

P(i, j, m, t) = The probability that an individual who is in

stage i of the disease at the end of the m-th time interval will be

in stage j of the disease at age t if she does not have a Pap smear

between the two stages. This quantity could be easily computed from the

above mentioned assumptions.

A mathematical model of breast cancer which is similar to the work

of Galliher [35] was developed by Shwartz [88,89,90,91,92,93]. His model

was based upon the hypothesis that breast cancer is a "time dependent"

disease, and if it is detected and treated earlier, the prognosis will

be more favorable. However, due to the potential high cost of screening

(in terms of dollars, psychological effects, and physical side-effects),

it is important to establish estimates of the benefits of screening so

that one might better evaluate if the benefits appear worth the

costs.

Literature on breast cancer shows [88] that tumor size and extent

of lymph node involvement are the most important variables

that affect prognosis. Employing this, Shwartz assumed tumor growth

rate is exponentially distributed, and the rate of lymph node involve-

ment is a function of the size of the tumor and its growth rate. Using

these functional forms, he compared the results with available data and

computed the parameters. Therefore his model consisted of a set of

hypotheses on the incidence of the disease, its progression, its tendency

to be detected without benefits of any scheduled screening examinations

and the relationship between stage of the disease and survival. Using

this model he found, for a woman in a given at-risk level, the level of

effectiveness as a function of the number of tests, the corresponding ages

at examinations, the reliability of the tests, and whether or not the

individual had performed self examinations between scheduled screening

examinations. Modifications in the model were made and a second model

was proposed so that the two models bracketed relevant hypotheses about

the rate of disease progression. The two models differed in their

hypotheses about the rate at which lymph nodes become involved.

Due to the lack of appropriate data, Shwartz varied the assumptions

over the entire range and computed the benefits, with the hope that

the actual process lies somewhere in between. For instance, he parameterized

the distribution of tumor growth rates, threat of death from breast cancer,

the false negative rate and the correlation between the false negative

probability on successive examinations.

Using heuristic techniques rather than optimization techniques,

he determined the best ages at which screening examinations should be

given to each individual.

The three benefit measures employed by Shwartz were

1) the life expectancy of a woman from her current age,

2) the probability that detection occurs before nodal involvement,


3) the probability that there is no recurrence of breast cancer.

D.E. Thompson and T.C. Doyle [99] proposed an approach to the

selection of screening policy for cancer of the colon and rectum. They

presented an approach to analyzing the question of how often a person

should be screened for colorectal cancer to achieve a desired cost-

benefit outcome. To do this they reviewed and analyzed data on the

incidence and prevalence of the disease, the course of the disease, the

cost and ability of detecting the disease at various stages of progression,

the cost and effectiveness of treating the disease in any stage and

benefits of treatment.

Thompson and Doyle offered two approaches to modeling the disease.

1) A continuous model, based on tumor size and growth data.

2) A discrete state model, based on progress of the disease

through several stages.

In this paper they emphasized the second model, which, in the absence

of data reflecting the true progress of the disease, has been structured

in the context of available data and is relatively simple in design.

This simplified version of the discrete model was used to perform

parametric analysis.

Their model has two principal functions:

1) It provides a basis for determining the relative values of

different screening policies.

2) It provides a means of analyzing the sensitivity of screening

policies to variations of the onset and progress of the disease, the

efficiency of screens, costs associated and/or benefits.

Their discrete model can be viewed as consisting of

1) A set of discrete states, and

2) A set of probability distributions, which describe the length

of time that an individual remains in a particular state.

The model consisted of the following stages:

H : Health

A : Lesion confined to the mucosa but undetected

A': Lesion confined to the mucosa and detected

B : Cancer into the muscularis propria with negative lymph

nodes but undetected

B': Cancer into the muscularis propria with negative lymph

nodes and detected

C : Cancer with nodal involvement but undetected

C': Cancer with nodal involvement and detected

D : Distant metastases present but undetected

D': Distant metastases present and detected

M : Death.

Due to the unavailability of appropriate data, they proposed a

particular staging and obtained a few primary conclusions. In this model

the status of an individual is related to whether disease, when detected

and treated, is curable or not, i.e.,

H : Health

P : Occult Colorectal Cancer, pre-fatal, i.e., curable if

detected

P': Colorectal Cancer, detected by clinical surfacing or

screening, curable

F : Occult Colorectal Cancer, fatal

F': Colorectal Cancer, detected and fatal

M : Death.

They assumed that the death rate ($\mu(a)$), onset rate ($\rho(a)$), rate

of progression from pre-fatal to fatal stages ($\lambda$), rate of clinical

surfacing of pre-fatal ($\phi_p$), rate of clinical surfacing of fatal

colorectal cancer ($\phi_f$) and rate of death associated with detected fatal

cancer ($\mu'$) were constants (i.e., the corresponding times are assumed to have negative

exponential distributions). This assumption resulted in a Markov process:

Let $P_i(t)$ = the probability that an individual is in stage i

at age t.

Then

$$P_i(t+s) = \sum_j P_j(t)\, B_{ji}(t, t+s)$$

or, in matrix form,

$$P(t+s) = P(t)\, B(t, t+s)$$

where $B_{ji}(t, t+s)$ = probability of a change from j to i in the time

interval from t to t+s.

Then screening was introduced into the model by means of the probability

of detecting occult cancer, i.e.,

$a_k$ = P{Detection of occult cancer, given that the

disease is in stage k}

They considered a screen at time $t_1$, and denoted the time immediately

after the screen by $t_1^+$; then

$$P_{i_D}(t_1^+) = P_{i_O}(t_1)\, a_{i_O} + P_{i_D}(t_1)$$

where $i_O$, $i_D$ correspond to occult and detected stages of the disease.

In matrix form,

$$P(t_1^+) = P(t_1)\, A$$

where

$$A_{ij} = \begin{cases} 1 & \text{for } i = j \text{ and } i \text{ corresponds to a detected stage of the disease} \\ 1 - a_i & \text{for } i = j \text{ and } i \text{ corresponds to an occult stage of the disease} \\ a_i & i \text{ and } j \text{ correspond respectively to occult and detected stages of the disease} \\ 0 & \text{otherwise} \end{cases}$$

Therefore for time $t_2$

$$P(t_2) = P(t_1^+)\, B(t_1, t_2) = P(t_1)\, A\, B(t_1, t_2)$$

and so on.
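
The effect of a single screen on the state probabilities can be sketched in Python for the simplified six-state staging (H, P, P', F, F', M). The screen moves a fraction a_k of the probability mass in each occult stage to its detected counterpart and leaves the rest in place. The sketch additionally assumes that the states H and M are unchanged by the screen, which amounts to a detection probability of zero for them. The state vector, the detection probabilities and the function names are hypothetical.

```python
STATES = ["H", "P", "P'", "F", "F'", "M"]
DETECTED_OF = {"P": "P'", "F": "F'"}   # occult stage -> its detected counterpart


def screening_matrix(a):
    """Build the matrix A for one screen, given detection probabilities a[k]."""
    n = len(STATES)
    A = [[0.0] * n for _ in range(n)]
    for i, state in enumerate(STATES):
        if state in DETECTED_OF:
            j = STATES.index(DETECTED_OF[state])
            A[i][i] = 1.0 - a[state]   # occult case missed by the screen
            A[i][j] = a[state]         # occult case found by the screen
        else:
            A[i][i] = 1.0              # detected stages; H and M assumed unchanged
    return A


def vec_mat(p, M):
    """Row vector times matrix."""
    return [sum(p[i] * M[i][j] for i in range(len(p))) for j in range(len(M[0]))]


# Hypothetical state probabilities just before the screen, ordered as in STATES.
before = [0.90, 0.05, 0.01, 0.03, 0.005, 0.005]
A = screening_matrix({"P": 0.8, "F": 0.9})
after = vec_mat(before, A)
print([round(x, 4) for x in after])
```

Alternating multiplications by a between-screen transition matrix and by A would propagate the state vector through a whole screening schedule.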

They examined their model using the following three measures:

1) Oncological: More effective strategies are characterized by

proportionately higher detection in earlier stages.

2) Medical costs: This includes the cost of the screening program

itself, the cost of diagnosis associated with false positives, the

cost of treatment and the costs attributed to the period of disability

of a patient.

3) Life expectancy: More effective strategies are characterized

by longer expected length of time prior to death.

D.E. Thompson and R. Disney [98] introduced a general mathematical

model of progressive diseases and screening. Conceptually, the disease

history could be represented as progressing through a series of stages

whose durations are random variables and whose meaning could be inter-

preted in terms of the individual's prognosis and the sensitivity of the

disease to detection in that stage through clinical surfacing and/or

screening. Their model was a mathematical model of the interaction of

two independent random processes, namely, the disease process and the

screening process. The purpose of the disease portion of the model was

to predict a person's status at any age t. The model can compute the

probability that an individual is healthy or in some stage of the disease

and, if he has had the disease, the length of time he had been in the

particular stage.

They assumed that the screening methods could produce false negative

or correct results, and the probability of these results could depend

on the stage of the disease, the length of time in the current stage and

the particular method used for screening. In their model, as soon as

an individual dies of other causes or is detected as having the disease,

the process terminates. Moreover, the time from birth to death from other

causes is assumed to be a random variable independent of the disease

process. This assumption makes the analysis much simpler, because it

makes it possible to model the disease portion as a semi-Markov process

whose transition matrix is independent of the age of the person.

This assumption allows them to use the existing theory of Semi-Markov

processes.

Let E = {0,1,2,...,S,S+1,...,N} be the state space of the disease.

State 0 indicates absence of disease, and states 1 through S indicate

that the individual is in one of the S occult stages of the disease.

Stages S+1 through N indicate that the disease has surfaced clinically,
or been detected by screening. Let $Y_t$ be the state of the individual,
given that he is alive at age t.

$T_n$ = age of the individual when he changes stage for the
n-th time

$X_n$ = Length of time that an individual will spend in the

state he entered when he was $T_{n-1}$ years old.

They assume that everybody starts life free of disease. This is

not generally true (for example in the case of Neuroblastoma), and the
analysis can be modified to take care of this possibility.

Thompson and Disney propose the following assumption:

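$$P\{Y_n = j,\ X_n \le x \mid Y_{n-1} = i,\ Y_{n-2}, \ldots, Y_0;\ X_{n-1}, \ldots, X_0\} = P\{Y_n = j,\ X_n \le x \mid Y_{n-1} = i\} = A_{ij}(x)$$

$$P(Y_0 = j) = \begin{cases} 1 & \text{if } j = 0 \\ 0 & \text{if } j \ne 0 \end{cases}$$

$$P(X_0 = 0) = 1$$

and compute the probability of the event

$$B = \{Y_t = j,\ U_t > x,\ V_t > y\}$$

where $V_t$ and $U_t$ are forward and backward recurrence times of the

process.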
They define the quantity $Q_t(j,x) = P\{Y_t = j,\ U_t \le x\}$

and consider the screen given at the time $T_m = t$. Assuming $Q_{t^+}(j,x)$

denotes the distribution of $(Y_t, U_t)$ immediately after this screen, they

evaluate $Q_{t^+}(j,x)$ as a function of $Q_t(j,x)$. So given the initial condition

$(Y_0 = 0,\ U_0 = 0)$ at birth, the equation for $P\{Y_t = j,\ U_t \le y \mid Y_s = i,\ U_s = x\}$

can take the distribution of $(Y_t, U_t)$ up to the time of the first screen

at age $T_1$. Then $Q_{t^+}(j,x) = f[Q_t(j,x)]$ modifies this distribution at

the time of the first screen, as individuals are taken from the occult

part of the disease to the detected states. A similar analysis is

applicable for the interval $T_1$ to $T_2$ and to the screen given at age

$T_2$, etc.

This model is restricted to the case of diseases which are

stationary in time and are not age dependent. Unfortunately, the

literature on most of the cancers shows that there is a significant

difference in relative survival statistics among young and old

individuals [5]. Therefore, in those cases this model has to be modified to

take care of time dependence of the disease.

The forward recurrence time at age t is defined as the interval from
age t to the epoch of the next state change of $\{Y_t\}$, that is
$V_t = T_n - t$ if $T_{n-1} \le t < T_n$; the backward recurrence time at age t is $U_t = t - T_{n-1}$ if $T_{n-1} \le t < T_n$.

Albert and Louis [2,3,66] have done very broad research on the

subject of screening of progressive diseases in the last two years.

In their first paper [2] on screening for the early detection of cancer,

they have characterized the natural history of a chronic disease state

in terms of the distribution of X (a person's age at the time of entering

the disease state), Y (the sojourn time in that disease state), and

A (a person's present age) over a population of individuals. Then,

they have defined age specific incidence, prevalence, life time attack

rate, mean duration of the disease state, cohort effect, etc., in terms

of the joint distribution (X, Y, A), which for known (X, Y, A) gives a

method for estimation of those parameters. In their second and third

papers [3,66] they have found the impact of screening on the natural

history of the disease and presented a method for estimation of the

disease natural history. A brief review of their work follows.

They define

1) f_{XYA}(., ., .; t) = joint distribution for (X,Y,A) at any instant t.

2) A person is nonsusceptible if X = \infty.

3) A person is a chronic habitue of S if, for that person,

X < \infty and Y = \infty.

    proportion of chronic habitues = Pr{X < \infty, Y = \infty}

4) A cohort effect is said to exist if the distribution of

(X,Y) varies over age strata.

5) The lifetime attack rate of disease state S is P[X < \infty]:

    P[X < \infty] = P[X < \infty, Y = \infty] + \int\int\int f_{XYA}(x,y,a) dx dy da

6) I_S(a) = The age specific incidence of S among those aged a:

    I_S(a) = f_{X,A}(a,a) / f_A(a) = f_{X|A}(a|a)

7) I_S = The overall incidence of S:

    I_S = \int I_S(a) f_A(a) da

8) \Phi_S(a) = The age specific prevalence of the disease state
among those aged a:

    \Phi_S(a) = \int_{x=0}^{a} \int_{y=a-x}^{\infty} f_{XY|A}(x,y|a) dy dx

9) \Phi_S = The overall prevalence of S:

    \Phi_S = \int \Phi_S(a) f_A(a) da

10) If there is no cohort effect and if E[X | X < \infty] < \infty, then

    E[Y | X < \infty] = \frac{\int_0^\infty \Phi_S(a) da}{\int_0^\infty I_S(a) da}

This equation is a generalization of d = P/I used by other investigators
[29,49,116]. They define S_1 to be the state which is entered upon
leaving S, and show that the relation I_S = I_{S_1}, which is used by other
researchers, is true if the age distribution is flat and there are no
chronic habitues.
They also present a method for construction of the (X,Y) distribution
from prevalence and incidence information.
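
As a rough numerical illustration of the ratio of prevalence to incidence, the sketch below builds a toy cohort and recovers the mean sojourn time from the two curves. Everything in it is an assumption chosen for the illustration: the uniform onset age on [30, 60), the exponential sojourn time, independence of X and Y, no cohort effect, a fully susceptible population (so that the incidence curve equals the onset density), and the function names themselves.

```python
import math

def mean_sojourn(incidence, prevalence):
    """Area under the prevalence curve divided by area under the incidence curve.

    Both curves are sampled on the same equally spaced age grid, so the
    grid spacing cancels in the ratio.
    """
    return sum(prevalence) / sum(incidence)

def toy_curves(mean_y=4.0, step=0.1, max_age=120.0):
    """Hypothetical cohort: onset age uniform on [30, 60), exponential sojourn."""
    n = int(round(max_age / step))
    ages = [k * step for k in range(n)]
    # With no cohort effect and everyone susceptible, incidence = onset density.
    onset = [1.0 / 30.0 if 30.0 <= a < 60.0 else 0.0 for a in ages]
    prevalence = []
    for k in range(n):
        # Prevalence at age a: sum over onset ages x <= a of f_X(x) P(Y > a - x) dx.
        total = 0.0
        for j in range(k + 1):
            if onset[j] > 0.0:
                total += onset[j] * math.exp(-(k - j) * step / mean_y) * step
        prevalence.append(total)
    return onset, prevalence
```

With the default values the ratio comes out close to the assumed mean sojourn time of 4 periods; the small excess is discretization error of the age grid.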
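If there is no cohort effect,

    f_X(x) = I_{S_1}(x) + \frac{d}{dx} \Phi_S(x).

If there is no cohort effect and X and Y are independent, then

    \tilde{f}_Y(s) = \frac{\tilde{I}_{S_1}(s)}{s \tilde{\Phi}_S(s) + \tilde{I}_{S_1}(s)},

where \tilde{f}_Y, \tilde{I}_{S_1} and \tilde{\Phi}_S are the Laplace transforms
of the sojourn density, the incidence of S_1 and the prevalence of S.

Then, using counterexamples, they prove that without the assumption of

independence (between X and Y), the Y distribution is not unique.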

Since Y denotes the preclinical latency and X_0 denotes the age

at time of entering the disease state, the patient surfaces at the

instant X_0 + Y. Now consider a population of individuals at a certain

instant of time t. Each person has an associated vector of sojourn

times Z = (X_0, X_1, ..., X_k, Y). This, plus the person's age A(t),

describes the natural course of the disease in that individual. Denote

the density of (Z, A(t)) over the study population by f^{(t)}_{Z,A} and

the total population size by N(t).

In the absence of screening, they allow people to leave the study
population for two reasons:

a) Death from competing risk

b) By reason of surfacing with the type of cancer under study

and assume that

a) A clinically surfaced individual leaves the study population


b) False positives are eventually discovered and returned to the

study population. They actually take this probability to be zero.
Then if n^{(t)}(Z,a) dA dZ is the number of individuals in the population

at time t that occupy the cell (Z, Z+dZ) x (a, a+da), they show that

    \frac{\partial}{\partial t} n^{(t)}(Z,a+t) = M(t) f^{(t)}_{Z,A}(Z,a+t) - \rho^{(t)}(Z,a+t) n^{(t)}(Z,a+t)

provided 0 < a+t < X_0 + Y,

and of course in the complementary region, n^{(t)}(Z,a+t) = 0. Here
M(t) is the immigration rate,
f^{(t)}_{Z,A}(.) is the joint density of Z and A among those who immigrate
in at time t,

    \rho^{(t)}(Z,a) = r(Z,a,t) s(Z,a) + d(Z,a,t),

r(z,a,t) is the screening rate at time t in the stratum Z=z,
A(t)=a, s(z,a) is the probability of a positive screen if Z=z and A(t)=a,
and d(z,a,t) is the death rate at time t in the stratum Z=z and A(t)=a.

Therefore \rho^{(t)}(Z,a) is the instantaneous net rate of removal of
individuals from the above mentioned stratum.
The solution to the above differential equation is

    n^{(t)}(z,a) = [n^{(0)}(z,a-t) + K^{(t)}(z,a-t)] \exp[-Q^{(t)}(z,a-t)]   if 0 < a < X_0 + Y
    n^{(t)}(z,a) = 0                                                          otherwise

where

    Q^{(t)}(z,a) = \int_0^t \rho^{(u)}(z,a+u) du
    K^{(t)}(z,a) = \int_0^t M(v) f^{(v)}_{Z,A}(z,a+v) \exp[Q^{(v)}(z,a)] dv
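
The structure of this solution (an integrating factor applied along a line of constant birth cohort) can be checked numerically. The sketch below is only an illustration under stated assumptions: the inflow and removal rates are supplied as hypothetical callables of elapsed time for one fixed stratum, and both function names are invented for the example.

```python
import math

def density_along_characteristic(n0, inflow, removal, t, steps=20000):
    """Evaluate [n0 + K(t)] * exp(-Q(t)) by midpoint quadrature.

    inflow(u)  stands for the immigration term M(u) * f(z, a+u)  (assumed callable)
    removal(u) stands for the net removal rate at (z, a+u)       (assumed callable)
    """
    h = t / steps
    q = 0.0  # running value of Q: integral of the removal rate from 0 to u
    k = 0.0  # running value of K: integral of inflow * exp(Q)
    for i in range(steps):
        mid = (i + 0.5) * h
        rate = removal(mid)
        k += h * inflow(mid) * math.exp(q + 0.5 * h * rate)
        q += h * rate
    return (n0 + k) * math.exp(-q)

def euler_reference(n0, inflow, removal, t, steps=200000):
    """Integrate dn/dt = inflow - removal * n directly, as an independent check."""
    h = t / steps
    n = n0
    for i in range(steps):
        u = i * h
        n += h * (inflow(u) - removal(u) * n)
    return n
```

With constant rates the two routines can be compared against the elementary closed form g/r + (n0 - g/r) exp(-r t), where g is the inflow and r the removal rate.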

To find the joint density of Z and A in the study population at

time t, n(t)(z,a) should be normalized

    f^{(t)}_{Z,A}(z,a) = \frac{n^{(t)}(z,a)}{\int\int n^{(t)}(z,a) dz da}

Therefore the effect of screening on fZ,A(.) can be computed which gives

a means to predict the temporal behavior of epidemiologic parameters in

the presence of screening.

In order to answer the question of "what is a better strategy of

screening?" they introduce the concept of "critical point." Treatment

that is begun before this time point has a relatively high probability

of success, whereas treatment begun after this point has a markedly

lower probability of success. Their objectives are

1) The rate of discovery of less-favorable-prognosis disease,

I_f(t); I_f(t) dt is the expected number of cases with less favorable prognosis

that are diagnosed in the interval (t, t+dt).

2) The salvage rate s(t); s(t) dt is the expected number of cases

discovered by screening in (t, t+dt), who have favorable prognosis, but

have Y> X> .

The following data are required as input to their computations.

1) Age specific death rate d(a,t)

2) Immigration rate, M(t)

3) Initial population size, N(0)

4) Age specific screening rate, r(a,t), as a function of time

5) The screening detectability function, f(z,a)
6) The initial distribution for (X_0, X_1, Y, A), f_{Z,A}(z,a).

7) The (possibly time varying) distribution for (X_0, X_1, Y, A)
among immigrants--f ,(z,a).

In their third paper [66], Louis, Albert and Heghinian present a

nonparametric method for estimation of f_{XYA}(., ., .; t). To observe the

accuracy of their estimates, they generated data, used it to find the

estimates of certain epidemiologic parameters and compared these

estimated values with their corresponding theoretical values and the

usual epidemiologic estimates. Several points should be noted about

their work:

1) The method of staging employed is very restrictive, and the fact

that there is a unique passage from S_0 to S_1 ... to S_k makes it impossible

to apply their model to diseases such as neuroblastoma where,

from S_3, the individual can go to either S_4 or S_5.

2) Their objective function is not strong enough to use all

information on survival. In fact it picks a point, the critical point t_c,

and weights every point t < t_c with value one and every point t > t_c with

value zero.

3) The effect associated with immigrants is not well defined and

since the distribution is assumed to be known, in most cases it decreases

the accuracy. It would be much better to neglect its effect than to

bring it in at an unknown level of effectiveness. They mention that 20%

immigration did not change the screening policy significantly. This is

why other investigators neglect its effect and assume a closed population.

Having reviewed the literature on the topic of screening processes,

we observe that the main issues on this complicated subject are

1) An understanding of the natural behavior of the disease

2) Presentation of a method for estimation of the disease


3) Investigation of the impacts of screening procedures on

performance measures of interest such as the probability of detection,

the probability of recurrence and the probability of survival (in an


4) Investigation of the impacts of screening on the whole society.

No single model has been able to answer all these issues rigorously. Some

unanswered questions are related to the last subject. Most investigators

have tried to find the benefits of lead time on the total population.

It has been claimed that those who are detected by the screening

procedure have a longer preclinical time and are considered to be

individuals with slowly developing disease. Therefore any measurement based

solely on this group would be biased, and determination of the benefit of

lead time for the general population would be imperfect.

From a modeling point of view, the most general models are those

of Lincoln and Weiss [65,106], Klein and Kirch [52,53], Zelen and Feinleib

[29,113,114,115,116], Bross and Blumenson [10,11,14,15], Prorok [78,79],

Galliher [35], Shwartz [88,89,90,91,92,93], Thompson, Disney and Doyle

[98,99] and Albert [2,3,66]. These models cover a variety of methods,

some of which are similar in the direction of their conclusions but some

are not. The structure of the model developed in this research is similar

to the models developed by Albert, Thompson and Disney, Shwartz and

Galliher but the specific formulation and general conclusions differ.

The following pages summarize the material reviewed in this chapter.

[Tables: summary of the screening models reviewed in this chapter.]



In this chapter, a general model of cancer screening will be

developed which incorporates the stochastic nature of the disease

process with the viewpoint common to most of the cancer literature:

that the cancer disease process can be represented as progressing through

a series of stages whose durations are random variables.

3.1 A General Model of Cancer Screening

The literature review reveals that almost any site specific cancer

could be represented as a process that progresses through a series of

stages [16,27,35,85]. The staging of the disease can be done in

alternative ways, and there has been a continual argument concerning

the choice of one method over another. Staging of cancer found its

initial importance in reporting the end results. The "American Joint

Committee on Cancer Staging And End Results Reporting" was organized on

January 9, 1959, to develop a system of clinical staging of cancer by sites,

acceptable to the American medical profession. The committee has completed

and published different brochures for clinical staging of different

site specific cancers. The main requirement for clinical staging to be

useful is that it should be relatively simple and should yield

meaningful information as to prognosis. In the case of modeling a specific

cancer, the choice of one method of staging over another depends on


1) The availability of needed information. There are cases where

a staging method seems to be easy to apply, but there are no data to

support it. In such a case, data must be gathered by employing the

staging method under consideration which needs time and the cooperation

of different investigative groups.

2) The complexity of staging versus usefulness of the results.

It might be possible to combine data in different manners and design

different staging methods, in which case there is a trade-off between

complexity of the staging, accuracy of the result and its usefulness.

In the model developed, the disease process is looked upon as if

it passes through a series of stages before the individual is clinically

surfaced by coming to medical attention through signs and symptoms. At

any time, the disease process may be terminated by death from cancer or

from causes other than cancer. Throughout this research, attention will

be restricted to the disease process and the results will be conditioned

on the individual's surviving to the age of interest. The model makes it

possible to predict an individual's status at any desired age. It will

be possible to compute the probability that a person is free of disease

and, if he has the disease, the stage he might be in.

Then the interaction of the disease process with a random process

called the screening process will be considered. The screening process

is a process in which the individual who is suspected of having the

disease is examined by one or more of several screening methods at

specific points in time. Based on the result of the screening procedure,

the person under consideration will be categorized as "free of disease"

or "having the disease." In the first case he will not be considered

until the next scheduled examination time. In the second case more

specific examinations are performed to verify initial results and

identify the stage of disease. To allow for all possible situations,

screening methods are permitted to produce false negative and false

positive results. Under a false negative result, the individual is

classified as healthy, whereas in reality he is in some stage of the

disease. Under a false positive result, the individual is classified

as being in some stage of the disease, whereas he is healthy. This

introduces some extra costs, due to additional examinations necessary,

but eventually it will confirm that the individual is healthy. In the

development of the model it will be assumed that the screened population

is closed. There is no allowance for emigration from the population

under consideration, and death from other causes is independent of the

death from cancer. A competing risk approach can be used to tie these

two processes together.

3.2 The Disease Process

Data analysis [5,9] shows that for several cancer sites of interest

in this research the relative survival statistics are strongly age-

dependent and that the distribution of the time from detection of the

disease to death from the disease depends upon the individual's age at

the time the disease is detected. Therefore, the age of an individual

is incorporated as a dimension in the state space of the model. The

basic elements of the model are the age of the individual and the stage

of the disease he is in.

Suppose that an individual of age t can be in one of (2N+3) < \infty

states with respect to a given disease. Let

    S = { [0_t, 1_t, ..., N_t, 1'_t, 2'_t, ..., N'_t]_{t \in T}, 0', D }
    T = {0, 1, 2, ..., T_max}

be the state space of the disease, where T is the set of discrete times

which are the ages of interest in ascending order and T_max is the
maximum age under consideration. Depending on the nature of the disease,

t could be weeks, months, quarters or years and Tmax should be large

enough to cover screening of any individual carrying the disease.

State 0_t is assumed to indicate that the individual of age t is

free of disease. States 1_t through N_t indicate that the individual of

age t is in one of N occult stages of the disease. States 1'_t through
N'_t indicate that the disease has surfaced clinically or been detected

by examination for an individual of age t, when he was in one of

stages 1 through N, respectively. States 0' and D indicate that the

individual has been "cured" or "died," respectively. In Figure 2,

the stages of the disease are depicted schematically.
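
The state space for a single age can be enumerated with a few lines of code, which also confirms the count of 2N+3 states. The string labels and the function name are hypothetical conventions chosen for this sketch.

```python
def disease_states(n_stages):
    """List the states available to an individual of a given age.

    "0" is healthy, "1".."N" are occult stages, "1'".."N'" are the
    corresponding detected stages, "0'" is cured and "D" is dead.
    """
    occult = [str(i) for i in range(1, n_stages + 1)]
    detected = [label + "'" for label in occult]
    return ["0"] + occult + detected + ["0'", "D"]
```

For N = 3 this yields nine labels: 0, 1, 2, 3, 1', 2', 3', 0' and D.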

An individual may start life healthy and remain in that state

until he dies of other causes, or he might start life healthy and develop

the disease sometime during the course of his life. Then he will be classified

as in one of the stages 1 through N. The disease may progress from

one state to another until it surfaces clinically (through signs

or symptoms) or is detected by screening. Then he goes from state n

(1 \le n \le N) to state n'. In this simplified model, regardless of the

progression of the disease in the detected stages, the individual is

assumed to die from the disease or be cured. That is, depending on the

age of the individual and the stage he was detected in, whether or not

the disease progresses through the detected states, he has a certain

chance of going to states 0' and D. In Figure 3, the transitions that

are permitted in the model are illustrated in more detail.

Figure 2: A schematic diagram of the stages of the disease.

Figure 3: Structure of the disease.

There is no restriction on the condition at birth, which means

that a newborn infant could be healthy (with respect to the special

disease under consideration) or possibly in any one of the occult stages.

This modification plus "time dependency" of the disease would be

necessary in the case of some kinds of site specific cancers such as- --


At any time the disease process can be terminated by death from

causes other than the disease, therefore the attention is restricted to

the disease process and the results are conditioned on the individual

surviving to the age of interest. For modeling purposes, the possibility

of death from the disease prior to recognition of the disease is included

in the model with clinical surfacing from the disease. This is because

death from the cancers is unlikely prior to clinical surfacing of the

disease. Therefore, there is no one step transition from occult stages

of the disease to the state "death from the disease." The process

terminates when the individual dies of any cause or is cured through

the process of treatment. The re-examination of the individual after

the treatment is done will not be considered. This is because the risk

of a second cancer is usually high enough that screening of former

patients cannot be considered routine screening.

Let X_t be the state of the individual of age t, given that he is

alive. For instance, 2_{10} would mean that a person 10 periods (weeks,

months, ...) old is in stage 2 of the disease. Let the disease process be

tracked only at the end of fixed intervals of time. Conceptually, the

random process {X_t} can be pictured as a process that goes from one state

to another. As soon as a new state is reached, the process randomly

chooses the next stage to be visited. After choosing this next stage,

and depending on which stage the process is currently in and which it will

enter next, the process randomly chooses the time required to make that

transition. Therefore the resulting process {X_t} is a semi-Markov

process and the sequence {X_n} is a discrete-time Markov chain.

The one-step transition probability

    P_{ij} = P{X_{n+1} = j | X_n = i}

has a very special structure, which consists of upper diagonal blocks.

This can be seen from Figure 4.

It is seen that the only possible transition for an individual

of age t is to go to age t+1, regardless of progression of the disease.

To clarify the process, a sample function of the disease process is shown,

in which the individual is detected in stage 2 at age 6 and gets cured.

See Figure 5.
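
A sample function of this kind can be generated by simulation. The sketch below is a simplified illustration, not the model's actual parameterization: it assumes age-independent probabilities, progression only from stage i to stage i+1, and a single step from a detected stage to "cured" or "dead". All argument names are hypothetical.

```python
import random

def simulate_path(n_stages, stay, progress, cure, max_age, rng):
    """Simulate one disease history; path[t] is the state at age t.

    stay[i], progress[i]: per-period probabilities of remaining in state i or
    moving to i+1 (i = 0 is healthy, 1..N are occult); the remainder,
    1 - stay[i] - progress[i], is the chance of detection in stage i.
    cure[i]: chance that a case detected in stage i ends in 0' rather than D.
    """
    state, path = "0", ["0"]
    for _ in range(max_age):
        if state in ("0'", "D"):
            break  # absorbing states end the history
        if state.endswith("'"):
            stage = int(state[:-1])
            state = "0'" if rng.random() < cure[stage] else "D"
        elif state == "0":
            # The healthy state cannot be "detected": it either stays or starts stage 1.
            state = "0" if rng.random() < stay[0] else "1"
        else:
            i, u = int(state), rng.random()
            if u < stay[i]:
                pass
            elif i < n_stages and u < stay[i] + progress[i]:
                state = str(i + 1)
            else:
                state = f"{i}'"
        path.append(state)
    return path
```

Because the path advances exactly one age per step, it mirrors the block structure of the transition matrix: an individual of age t can only move to a state of age t+1.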

3.3 The Screening Process

The disease process goes from one stage to another until it is

clinically surfaced through the nature of the disease itself. If the

disease is a time-dependent process, it usually surfaces at a stage

in which there is a relatively poor chance of survival. Therefore,

there is an interest in detecting the disease in its early stages of

progress, which can be done through a screening process. The screening

process is a process in which, at specific points in time, the individual

who is suspected to have the disease, is examined by one or more of

several screening methods. In order to screen for a disease, one needs

a simple, rapid, relatively accurate test to select from a general

population those persons who would benefit from further diagnostic

studies [45]. A screening method has several characteristics, which in

general make it possible to choose one method over another, if any one

is to be chosen. For instance, a screening method may be harmful [89]

because of the unreliability of the technique used: high false-positive

rates increase psychological trauma, high false-negative rates may lead to

decreased surveillance and an unwarranted sense of security, and the

screening itself may have deleterious side-effects. In general an ideal

screening test must have an acceptable false-negative rate and

false-positive rate.

Figure 4: One step transition probability diagram.

Figure 5: A sample function of the disease process.

An individual should be screened under a screening policy which

determines the ages at which the individual is to be screened, and the

method which is going to be used for screening. Assume that there are

M different examination methods and that screening examinations are given

at ages T1 < T2 < ... It is assumed that at most
one screening test is done in each time interval and that the screening

examination for the period (t,t+1) is done at t+, which is a time

very close to t but greater than t. A sample function of the process

with and without screening is shown in Figure 6.

It is assumed that when an individual in state i (1 \le i \le N)

is screened he will stay in the state i (if test gives a false-negative

result) or go to state i' (if the test gives a correct result), and

therefore there would be no error in stage recognition of the screening

method. In the following material, the effect of screening on the

process will be determined.

Let b_it = Probability of transition from stage i to i'

in the t-th interval with screening.

b'_it = Probability of transition from stage i to i'

in the t-th interval without screening.

Figure 6: A sample function of the process with and
without screening.

Define f_i(t) to be the probability that a person who has been in state i

at age t is properly classified as diseased. Then screening will affect

b_it in the following manner:

    b_it = P[test is done in the t-th interval] \cdot
           P[individual being in state i at the time of test] \cdot
           P[test gives correct result]
         + P[test is done in the t-th interval] \cdot
           P[test gives false-negative result] \cdot b'_it
         + P[test is not done in the t-th interval] \cdot b'_it.

Define Z_t as a zero-one variable, which takes value one if the test is

done in the t-th interval and zero otherwise, i.e.,

    Z_t = 0   if test is not done in the t-th interval
    Z_t = 1   if test is done in the t-th interval.

Then

    b_it = P[Z_t \ne 0] f_i(t) P[X_{t+} = i | X_t = i] + {P[Z_t = 0] + P[Z_t \ne 0] (1 - f_i(t))} b'_it

    b_it = Z_t f_i(t) P[X_{t+} = i | X_t = i] + {(1 - Z_t) + Z_t [1 - f_i(t)]} b'_it

    b_it = Z_t f_i(t) P[X_{t+} = i | X_t = i] + [1 - Z_t f_i(t)] b'_it

Later, the system will be assumed to be discrete so that the process

can jump only at the end of a discrete interval. This assumption is

equivalent to saying

    P[X_{t+} = i | X_t = i] = 1

in which case b_it reduces further to

    b_it = Z_t f_i(t) + [1 - Z_t f_i(t)] b'_it

This means

    b_it = b'_it                          if screening is not done in the (t,t+1) interval
    b_it = f_i(t) + [1 - f_i(t)] b'_it    if screening is done in the (t,t+1) interval

A simple way to see this is by the use of the following argument:

    b_it = b'_it   if test is not done
    b_it = 1       if test is done and gives true-positive result
    b_it = b'_it   if test is done but gives false-negative result.

But the following probabilities are associated with the above events:

P[test not done in the (t,t+1) interval] = 1 - Z_t

P[test is done in the (t,t+1) interval and gives true-positive

result] = Z_t f_i(t)

P[test is done in the (t,t+1) interval but gives false-negative

result] = Z_t [1 - f_i(t)]

Therefore

    b_it = b'_it   with probability (1 - Z_t)
    b_it = 1       with probability Z_t f_i(t)
    b_it = b'_it   with probability Z_t [1 - f_i(t)]

    b_it = b'_it (1 - Z_t) + 1 \cdot Z_t f_i(t) + b'_it Z_t [1 - f_i(t)]

    => b_it = Z_t f_i(t) + [1 - Z_t f_i(t)] b'_it
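
The two routes to the detection probability can be compared in code. This is a minimal sketch with hypothetical argument names: b_no_screen stands for the detection probability without screening, sensitivity for the probability that a screened diseased person is correctly classified, and screened for the zero-one indicator of a test in the interval.

```python
def detection_probability(b_no_screen, sensitivity, screened):
    """Compact form: Z f + (1 - Z f) b', with Z = 1 when a test is done."""
    z = 1 if screened else 0
    return z * sensitivity + (1 - z * sensitivity) * b_no_screen

def detection_by_cases(b_no_screen, sensitivity, screened):
    """Same quantity as the expectation over the three mutually exclusive cases."""
    z = 1 if screened else 0
    return (b_no_screen * (1 - z)                  # test not done
            + 1.0 * z * sensitivity                # test done, true positive
            + b_no_screen * z * (1 - sensitivity)) # test done, false negative
```

For instance, a clinical surfacing probability of 0.1 combined with a test that classifies 80 percent of diseased persons correctly gives a detection probability of about 0.82 in a screened interval and 0.1 otherwise.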

A similar argument can be employed in the case of other elements of
the transition probability matrix. Define

    a'_ijt = Transition probability of going from i to j in the t-th
             period without screening
    a_ijt  = Transition probability of going from i to j in the t-th
             period with screening.

Then

    a_ijt = a'_ijt {probability that the test is not done}
          + a'_ijt {probability that test is done}
                   {probability that test gives false-negative result}

    a_ijt = a'_ijt {P[Z_t = 0] + P[Z_t \ne 0] [1 - f_i(t)]}

    a_ijt = a'_ijt {(1 - Z_t) + Z_t [1 - f_i(t)]}

    a_ijt = a'_ijt [1 - Z_t f_i(t)]

A simple way to see this is by the use of the following argument:

    a_ijt = a'_ijt   if the test is not done in the (t,t+1) interval
    a_ijt = 0        if the test is done in the (t,t+1) interval
                     and gives true-positive result
    a_ijt = a'_ijt   if the test is done in the (t,t+1) interval
                     but gives false-negative result.

Therefore

    a_ijt = a'_ijt   with probability (1 - Z_t)
    a_ijt = 0        with probability Z_t f_i(t)
    a_ijt = a'_ijt   with probability Z_t [1 - f_i(t)]

This gives

    a_ijt = a'_ijt [1 - Z_t f_i(t)]

Let U'_it = Probability of staying in state i in the t-th period
without screening

    U_it = Probability of staying in state i in the t-th period.

The same analysis reveals that

    U_it = U'_it [1 - Z_t f_i(t)]

This analysis shows that all elements of the one-step transition

probability are affected by screening in the following manner:

Probability element        Notation   No screening in     Screening done in
                                      (t,t+1) interval    (t,t+1) interval

Probability of detection   b_it       b'_it               f_i(t) + [1 - f_i(t)] b'_it

Probability of stay        U_it*      U'_it               U'_it [1 - f_i(t)]

Probability of jump        a_ijt*     a'_ijt              a'_ijt [1 - f_i(t)]

* Note that U_0t and a_01t, which are related to the "Healthy" state, are not
dependent on the screening procedure and regardless of the screening method
used, they will remain unchanged.
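
The table can be applied to one row of the transition matrix for an occult state, and it is easy to confirm that the adjusted entries still sum to one. The sketch below is illustrative only; the function name, the dictionary of jump probabilities and the argument names are assumptions, and the healthy state is deliberately excluded because its entries do not depend on screening.

```python
def screened_row(stay, jumps, detect, sensitivity, screened):
    """Adjust the no-screening probabilities of one occult state for screening.

    stay:   probability of staying in the state without screening
    jumps:  {target_state: probability} for moves to other occult states
    detect: probability of detection without screening
    """
    z_f = sensitivity if screened else 0.0   # Z_t * f_i(t)
    factor = 1.0 - z_f                       # common factor for stay and jump
    new_stay = stay * factor
    new_jumps = {j: a * factor for j, a in jumps.items()}
    new_detect = z_f + factor * detect
    return new_stay, new_jumps, new_detect
```

If the unscreened row is (stay 0.7, jump 0.2, detection 0.1) and the test detects half of the diseased, the screened row becomes (0.35, 0.1, 0.55), which again sums to one.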

It is seen that U_it and a_ijt are decreased and b_it is increased through

the process of screening. Later on, in this research, this analysis will

be employed to develop a general form for the probability transition matrix.


3.4 Estimation of Transition Probabilities

Any model has certain associated parameters, the realistic estimation

of which should be a primary goal of the analyst. In the model employed

in this research there are four types of parameters.

1) Probability of staying in the occult stage i in the

interval (t,t+l).

2) Probability of going from stage i to j (in the occult part

of the disease) in the interval (t,t+l).

3) Probability of detection of an individual in state i at age t.

4) Probability of survival of an individual in state i at age t.

The probability of going from i to j includes onset of the disease

as a special case. In order to have a realistic estimate of these

parameters, data available in the literature should be employed consistently.

The survival probabilities are given in the literature on cancer and a

realistic estimate of these probabilities for each stage and age group

is not that difficult (although lead time effects are difficult to

estimate), but the first three sets of parameters mentioned above are

extremely difficult to estimate. This is due to the fact that there is

no consistent data on the occult part of the disease, and there is no

simple way to estimate those parameters. It is known [35,89,98,99]

that the unavailability of data on the occult stages is due to the fact

that almost all detected individuals go under treatment as soon as they

are screened and found to be in some stage of the disease. It then

remains to clarify what the statistics presented in the literature

represent, say the percentage of the people in each age and stage

group. A close look at those data reveals that they are on the

detected stages of the disease, because there is no way to gather

data unless a diseased individual is detected. For instance, assume

there are data indicating that a% of people who eventually get the

disease are found in state i at age t. The only way to interpret these

data is that in the long run, assuming no trend of the disease in time,

the steady state probability of being in state i' at time t is a.

If these data are available for a system in which there has been

no scheduled screening policy in the process of data collection, then

this information would be very useful in estimation of the parameters

of the model, i.e., $a_{ijt}$, $b_{it}$ and $U_{it}$. To do this, it is necessary to

hypothesize a model structure for the occult part of the disease and

then check the model's output with data. The model which gives the

closest match between output and data would be the one which has more

chance of representing the actual phenomenon of the disease in the


The following approach is employed in estimation of the model's
parameters. It is decided to use the data as the probabilities of
being in the detected states (call them $P[X_t=i']$)** and compare them with
the corresponding theoretical values. Due to the dependence of the
process on the initial state occupied and time dependency of the disease,
there is no long-run distribution. Therefore another concept will be
employed in which $\lim_{n \to \infty} P^{n}_{ijt}$ gives the steady state probability of being
in state j after t time intervals, given that the individual started life
in state i at time 0; $P^{n}_{ij}$ could be computed from postmultiplication
of matrix $P^{n-1}$ by matrix $P$.

* In the use of this distribution, it is assumed that irrespective of
the screening time and interval, the population probability of natural
detection remains unchanged.

** In this development the prime denoting the probability of natural
detection has been dropped. It will be reintroduced later.

Assume that there is no more than one transition per unit interval.

This is possible by making the interval small enough to cover the

possibility of any short duration transition. Let $P[X_n = i']$ denote the

probability of an individual of age n being detected in state i'. The

following observations are used to establish a mathematical form for

$P[X_t = i']$.

Observation 1:

$$P[X_n = S \mid X_0 = S] = \prod_{j=1}^{n} U_{Sj} \qquad \text{(I)}$$

Proof: The only possible way for an individual to stay in the state

he was in n periods ago, is to stay there at each and every interval,

which means the product of the probability of being there at the end

of his first, second, ..., and n-th age period.

Observation 2: If there is only one step in going from S to T

(i.e., one arc), then

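$$P[X_n = T \mid X_0 = S] = \sum_{j=0}^{n-1} \Big[\prod_{i=1}^{j} U_{Si}\Big]\, a_{ST,j+1}\, \Big[\prod_{k=j+2}^{n} U_{Tk}\Big] \qquad \text{(II)}$$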


Proof: If the individual is to be at T by age n, he has to jump

from S to T sometime during (0,n). Thus he has the remaining n-1

periods to spend in any possible combination of states S and T.

Observation 3: If there are two steps in going from S to T (i.e., two

arcs), then

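$$P[X_n = T \mid X_0 = S] = \sum_{j=0}^{n-2} \sum_{\ell=j}^{n-2} \Big[\prod_{i=1}^{j} U_{Si}\Big]\, a_{SW,j+1}\, \Big[\prod_{k=j+2}^{\ell+1} U_{Wk}\Big]\, a_{WT,\ell+2}\, \Big[\prod_{m=\ell+3}^{n} U_{Tm}\Big] \qquad \text{(III)}$$

S --($a_{SW}$)--> W --($a_{WT}$)--> T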

Proof: If the individual is to be at T by age n, he has to jump from S

to W and from W to T sometime during (0,n). The rest of the time

(n-2 remaining periods) has to be spent by staying in any possible

combination of states S, W and T. The same type of discussion could be

used to generalize the observation for cases of higher order.
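Observations 1 through 3 can be checked numerically. The sketch below is an illustration under stated assumptions: it uses a hypothetical chain S -> W -> T with made-up, time-dependent values, evaluates the three product formulas directly, and compares them with a period-by-period forward recursion. The function names and all numbers are introduced here and are not from the source.

```python
from math import prod


def stay(U, n):
    """Observation 1: remain in the same state during periods 1..n."""
    return prod(U[1:n + 1])


def one_arc(U_S, a_ST, U_T, n):
    """Observation 2: reach T from S by age n through a single arc."""
    total = 0.0
    for j in range(n):                       # j periods are spent in S first
        total += (prod(U_S[1:j + 1])         # stays in S during periods 1..j
                  * a_ST[j + 1]              # jump S -> T in period j+1
                  * prod(U_T[j + 2:n + 1]))  # stays in T during periods j+2..n
    return total


def two_arcs(U_S, a_SW, U_W, a_WT, U_T, n):
    """Observation 3: reach T from S by age n through W (two arcs)."""
    total = 0.0
    for j in range(n - 1):                   # jump S -> W in period j+1
        for l in range(j, n - 1):            # jump W -> T in period l+2
            total += (prod(U_S[1:j + 1]) * a_SW[j + 1]
                      * prod(U_W[j + 2:l + 2]) * a_WT[l + 2]
                      * prod(U_T[l + 3:n + 1]))
    return total


def forward(U_S, a_SW, U_W, a_WT, U_T, n):
    """Same probabilities by stepping the chain one period at a time."""
    p_S, p_W, p_T = 1.0, 0.0, 0.0            # the individual starts in S
    for t in range(1, n + 1):
        p_S, p_W, p_T = (p_S * U_S[t],
                         p_S * a_SW[t] + p_W * U_W[t],
                         p_W * a_WT[t] + p_T * U_T[t])
    return p_S, p_W, p_T


# Hypothetical parameters; index 0 is unused so that index t means period t.
n = 6
U_S = [0.0] + [0.90 - 0.01 * t for t in range(1, n + 1)]
a_SW = [0.0] + [0.05 + 0.005 * t for t in range(1, n + 1)]
U_W = [0.0] + [0.80] * n
a_WT = [0.0] + [0.10 + 0.01 * t for t in range(1, n + 1)]
U_T = [0.0] + [0.70] * n

p_S, p_W, p_T = forward(U_S, a_SW, U_W, a_WT, U_T, n)
print(abs(stay(U_S, n) - p_S) < 1e-12)
print(abs(one_arc(U_S, a_SW, U_W, n) - p_W) < 1e-12)
print(abs(two_arcs(U_S, a_SW, U_W, a_WT, U_T, n) - p_T) < 1e-12)
```

The forward recursion is the scalar equivalent of postmultiplying by the one-step transition matrix, so agreement between the two routes is what the observations claim.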

Observation 4:

$$P[X_n = i'] = P[X_{n-1} = i]\; b_{in} \qquad \text{(IV)}$$

Proof: Due to the assumption employed before (there is no mistake

in stage recognition in the detection techniques), there is only one

way to be detected in stage i at time n; the individual should have been

in state i at time n-1.

The same equality could be derived by using the definition of
conditional probability

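$$P[X_n = T] = \sum_{\text{all possible } S} P[X_n = T \mid X_{t_0} = S]\; P[X_{t_0} = S], \qquad 0 \le t_0 \le n \qquad (1)$$

because for the special case when $t_0 = n-1$

$$P[X_n = i'] = P[X_n = i' \mid X_{n-1} = i]\; P[X_{n-1} = i] = b_{in}\, P[X_{n-1} = i]$$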

limit for n=t. The reason behind this is that nobody who is in any
occult stage after t-l can be detected at t, and if somebody has been
detected before time t, he would not account for P[Xt = i'].
Use of Observations 1 through 4 plus a special case of equation 1
(where $t_0 = 0$) makes it possible to find a mathematical form for
$P[X_n=i]$ and $P[X_n=i']$ as a function of U's, b's and a's.
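As an example, consider a disease which has been postulated to
have the following structure: occult states 0 ("healthy"), 1 and 2, with
arcs 0 -> 1 ($a_{01}$), 0 -> 2 ($a_{02}$) and 1 -> 2 ($a_{12}$), and detection arcs
1 -> 1' ($b_1$) and 2 -> 2' ($b_2$). Then

$$P[X_n = 1'] = b_{1n}\, P[X_{n-1} = 1] = b_{1n} \big\{ P[X_{n-1}=1 \mid X_0=0]\, P[X_0=0] + P[X_{n-1}=1 \mid X_0=1]\, P[X_0=1] \big\}$$

where $P[X_{n-1}=1 \mid X_0=0]$ and $P[X_{n-1}=1 \mid X_0=1]$ could be found from Observations 2
and 1 respectively.

$$P[X_n = 2'] = b_{2n}\, P[X_{n-1} = 2] = b_{2n} \big\{ P[X_{n-1}=2 \mid X_0=0]\, P[X_0=0] + P[X_{n-1}=2 \mid X_0=1]\, P[X_0=1] + P[X_{n-1}=2 \mid X_0=2]\, P[X_0=2] \big\}$$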

where $P[X_{n-1}=2 \mid X_0=0]$ is the sum of two different conditional probabilities

a) going from 0 to 2 straight (one arc)

b) going from 0 to 2 through 1 (two arcs)

Observations 2 and 3 give expressions for these probabilities
respectively. Moreover $P[X_{n-1}=2 \mid X_0=1]$ and $P[X_{n-1}=2 \mid X_0=2]$ are found
from Observations 2 and 1. Therefore having initial conditions $P[X_0=0]$,
$P[X_0=1]$ and $P[X_0=2]$, it would be easy to find the steady state
probabilities as functions of the elements of the probability transition matrix.
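For the three-state example, the detection probabilities can also be obtained by stepping the occult distribution forward one period at a time and applying Observation 4 at each step. The sketch below is illustrative only: it assumes time-constant parameters for brevity (the text allows them to vary with age), and the function name and all numbers are hypothetical.

```python
def detection_curve(n_max, U, a, b, start):
    """Forward recursion for occult states 0, 1, 2 with detected states 1', 2'.

    U[i], a[(i, j)] and b[i] are per-period probabilities, held constant
    over time here as a simplifying assumption; start is the distribution
    of X_0 over the occult states (0, 1, 2).
    """
    occult = list(start)
    detected = []                     # detected[n - 1][i] = P[X_n = i']
    for _ in range(n_max):
        p0, p1, p2 = occult
        # Observation 4: detection at age n needs occupancy at age n-1.
        detected.append({1: b[1] * p1, 2: b[2] * p2})
        occult = [
            p0 * U[0],                                  # still healthy
            p0 * a[(0, 1)] + p1 * U[1],                 # in occult state 1
            p0 * a[(0, 2)] + p1 * a[(1, 2)] + p2 * U[2],  # in occult state 2
        ]
    return occult, detected


# Hypothetical values; each state's outgoing probabilities sum to one.
U = {0: 0.97, 1: 0.60, 2: 0.50}
a = {(0, 1): 0.02, (0, 2): 0.01, (1, 2): 0.10}
b = {1: 0.30, 2: 0.50}

occult, detected = detection_curve(40, U, a, b, start=(1.0, 0.0, 0.0))
total_detected = sum(d[1] + d[2] for d in detected)
print(sum(occult) + total_detected)   # total probability mass
```

Because every unit of probability is either still occult or has been detected, the printed total should stay at one up to rounding, which is a quick consistency check on the parameters.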


The question is how can P[Xn=i'] be used to estimate the model

parameters? A close look at the structure of the transition probability

matrix reveals that depending on how the occult part of the disease is

to be modeled, a different number of parameters are to be estimated.

Assume that the number of parameters of the model of interest is M. Using

the methodology developed, the following algorithm will give the unknown

parameters uniquely or as functions of other parameters whose estimates

would be necessary. The following three concepts are employed in the

establishment of the algorithm

1) For each time interval (say n) there are N equations (one for
each occult stage) of the form $P[X_n=i'] = a_{in}$, where $a_{in}$ is the value
obtained from data and $P[X_n=i']$ is the mathematical function developed
in this section.

2) For each time interval (say n) there are N+1 equations of the
form $\sum_j P_{ij} = 1$; one for state "healthy" and N for N occult stages.

3) For each time interval (say n) there are M unknowns of the
form $a_{ijn}$, $b_{in}$ and $U_{in}$.

Therefore there are a total of 2N+1 equations and M unknowns ($M \ge 2N+1$).


Step 1: Start with t=1.

Step 2: a) There are 2N+1 equations and M unknowns. Depending on
the number of arcs (transitions possible) there may or may
not be a unique solution. If not, M-(2N+1) of the parameters
must be realistically estimated.*

b) Solve the system of equations for the remaining unknowns.

Step 3: Let t = t+1 and go back to step 2, unless t = Tmax, in which case
stop. Therefore starting with the first period and going in steps
of one period, the algorithm gives systematically the value
of parameters uniquely or as functions of M-(2N+1) other
parameters. Therefore, the algorithm solves the system of
equations and determines the remaining unknown parameters,
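The counting argument behind Step 2 can be written as a small helper. This is a sketch, not part of the original algorithm: the function name is hypothetical, and the count of eight unknowns for the three-state example ($a_{01}$, $a_{02}$, $a_{12}$, $U_0$, $U_1$, $U_2$, $b_1$, $b_2$) is our own tally of that structure.

```python
def parameters_to_supply(n_stages, n_unknowns):
    """Number of parameters per period that must be estimated externally.

    n_stages: N, the number of occult disease stages (excluding "healthy").
    n_unknowns: M, the number of unknown a, b and U elements per period.
    """
    equations = 2 * n_stages + 1      # N data equations + (N + 1) row sums
    return max(0, n_unknowns - equations)


# Three-state example: N = 2 occult stages and (by our count) M = 8 unknowns.
print(parameters_to_supply(n_stages=2, n_unknowns=8))
```

When the result is zero, the per-period system has at least as many equations as unknowns and nothing needs to be supplied from outside.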


* One way to get an estimate of these parameters is to find a bound on
them by estimating the corresponding parameter in the detected stage.
This method will be discussed later in this research.

3.5 Objective Functions

Having structured an analytic model of the disease, policy selection
can be approached in a straightforward fashion. It should be noted
that there are several possible measures of performance; different
measures may lead to different "optimal" screening policies. The most
common objectives used in the policy selection of cancer screening are

1) Minimization of the total cost: cost of screening, cost of
treatment and cost of losing a patient due to death.

2) Maximization of life expectancy.

3) Maximization of the probability of detection of the disease
in a favorable stage.

4) Maximization of the lead time: the time between the detection of
the disease under screening and the time that the disease would have been
discovered under no screening program.

Cost-effectiveness measures of screening policies are an objective

of much interest, because if screening is to be used nationwide, there

should be a net positive benefit (measured in dollars) of doing so. The

total cost of a screening program consists of the cost of screening, the

cost of treatment, the cost attributed to the period of disability of a

patient and the cost to the society of losing an individual. Therefore

Expected total cost of a screening program = E(S)+E(T)+E(D)

where E(S) = Expected total cost of screening

E(T) = Expected total cost of treatment

E(D) = Expected total cost of death due to disease.

In this section, each of these cost elements is expressed mathematically.

In order to do this, an undiscounted analysis is employed and later on a

discounted version is used to transform all the costs to present worth.

a) Screening Cost

Assume that M screenings are scheduled for ages $T_1, T_2, \ldots, T_M$.

The expected total screening cost is the sum of the expected costs of

screening at $T_1, T_2, \ldots,$ and $T_M$. However, since an individual will be

screened at age t only if he is in an occult stage (including state

"healthy"), the following expected value analysis is used.

Expected cost of screening at age t =

(cost of screening at age t)(probability of being in an occult

stage at age t)(probability of test done at age t)

Let $C_s(t)$ = screening cost at age t

$P[X_t=i]$ = probability of being in an occult stage i at age t.

Then

Expected cost of screening at age t = $\sum_{i \in O} C_s(t)\, P[X_t=i]\, Z_t$

where O is the set of occult stages, i.e., $O = \{0,1,2,\ldots,N\}$.

The expected total screening cost is the sum of this quantity over

all intervals, i.e.,

$$E(S) = \sum_{t=0}^{t_{\max}} \sum_{i \in O} C_s(t)\, P[X_t=i]\, Z_t$$

b) Treatment Cost

Treatment cost will occur whenever the disease is diagnosed by

a screening test or surfaces clinically. At a specific interval

(t,t+1) the expected treatment cost E(t) is

E(t) = (cost of treatment of an individual in stage i' at age t)

(probability of being in a detected stage i' at age t)

(probability of accepting the treatment)

Assume that every individual when detected accepts treatment, and define

$C_T(t,i')$ = Treatment cost of an individual who is in stage i' at

age t.

$P[X_t=i']$ = Probability of being in a detected stage i' at age t.

Then

$$E(t) = \sum_{i' \in D} C_T(t,i')\, P[X_t=i']$$

where D is the set of detected stages, i.e., $D = \{1',2',\ldots,N'\}$.

The expected total treatment cost is the sum of this quantity over

all intervals, i.e.,

$$E(T) = \sum_{t=0}^{t_{\max}} \sum_{i' \in D} C_T(t,i')\, P[X_t=i']$$

c) Mortality Cost

In the simplified model, there is no possibility of death due to

disease before detection, and any individual who is in a detected state i'

at age t may die with probability $d_{i'}(t)$ or be cured with
probability $[1 - d_{i'}(t)]$. This transition is assumed to happen one interval

after he gets into the detected stage. (This is not what happens in

reality, but it is a good approximation of the actual process and is

employed here because most of the data on survival are given by age at


Expected cost of death due to disease at age t = (cost of death at

age t)(probability of being in a detected stage i' at age t)(probability

of death at age t in stage i')

Let $C_d(t)$ = Cost of death at age t (present worth of future income)

$d_{i'}(t)$ = Probability of death due to disease for an individual

who is in stage i' at age t.

Expected cost of death due to disease at age t =

$$\sum_{i' \in D} C_d(t)\, P[X_t=i']\, d_{i'}(t)$$

The expected total mortality cost due to disease is the sum of this

quantity over all intervals, i.e.,

$$E(D) = \sum_{t=0}^{t_{\max}} \sum_{i' \in D} C_d(t)\, P[X_t=i']\, d_{i'}(t)$$

Since this model is based on the assumption that the individual does
not die due to any other causes up to age t, let $P_L(t)$ represent the
probability of this event; then $P_L(t)$ has to be incorporated in the
objective function. Therefore

Expected discounted total cost =

$$\sum_{t=0}^{t_{\max}} \Big\{ \sum_{i \in O} C_s(t)\, P[X_t=i]\, Z_t + \sum_{i' \in D} \big[ C_T(t,i') + C_d(t)\, d_{i'}(t) \big]\, P[X_t=i'] \Big\} \frac{P_L(t)}{(1+r)^t}$$

where r = discount rate

$P_L(t)$ = probability of not dying of other causes up to age t.

It is shown in Section 3.4 that P[Xt=i] and P[Xt=i'] are known in terms
of the parameters of the model and the decision variables Zt. It remains

to give a realistic estimate of quantities $C_s(t)$, $C_T(t,i')$, $C_d(t)$,

$d_{i'}(t)$ and $P_L(t)$.
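As an illustration of how the pieces of the objective function combine, the sketch below evaluates the expected discounted total cost for one fixed policy. It assumes the state probabilities for that policy have already been computed and are passed in as plain lists. The function name, argument layout and all numbers are assumptions made here for illustration.

```python
def expected_discounted_cost(z, p_occult, p_detected, c_screen, c_treat,
                             c_death, d, p_alive, r):
    """Sum screening, treatment and mortality costs over ages t = 0, 1, ...

    z[t]             1 if a test is scheduled at age t, else 0
    p_occult[t][i]   P[X_t = i] for each occult state i (including healthy)
    p_detected[t][k] P[X_t = i'] for each detected state
    c_screen[t]      C_s(t); c_treat[t][k] is C_T(t, i'); c_death[t] is C_d(t)
    d[t][k]          d_i'(t), probability of death due to disease
    p_alive[t]       P_L(t), probability of not dying of other causes
    r                discount rate
    """
    total = 0.0
    for t in range(len(z)):
        screening = z[t] * c_screen[t] * sum(p_occult[t])
        care = sum((c_treat[t][k] + c_death[t] * d[t][k]) * p_detected[t][k]
                   for k in range(len(p_detected[t])))
        total += (screening + care) * p_alive[t] / (1.0 + r) ** t
    return total


# Two ages and one detected stage; all numbers are made up.
cost = expected_discounted_cost(
    z=[0, 1],
    p_occult=[[1.0, 0.0], [0.90, 0.05]],
    p_detected=[[0.0], [0.05]],
    c_screen=[10.0, 10.0],
    c_treat=[[100.0], [100.0]],
    c_death=[1000.0, 1000.0],
    d=[[0.1], [0.1]],
    p_alive=[1.0, 0.99],
    r=0.05,
)
print(cost)
```

In the full problem the state probabilities themselves depend on the policy, so a routine like this would sit inside whatever search over policies is used.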

3.6 An Alternative Expression for the Objective Function

The criterion of minimization of the expected total cost resulted
in an objective function derived in the previous section and repeated here:

$$\text{O.F.} = \sum_{t=0}^{t_{\max}} \Big\{ \sum_{i \in O} C_s(t)\, P[X_t=i]\, Z_t + \sum_{i' \in D} \big[ C_T(t,i') + C_d(t)\, d_{i'}(t) \big]\, P[X_t=i'] \Big\} \frac{P_L(t)}{(1+r)^t}$$

where $C_s(t)$, $C_T(t,i')$, $C_d(t)$, $d_{i'}(t)$, $P_L(t)$ and r are the known parameters
of the optimization problem and $P[X_t=i]$ and $P[X_t=i']$ are functions of
the decision variables $Z_t$, $0 \le t \le t_{\max}$.

For any site-specific disease, the methodology developed in Section 3.4
could be employed to write $P[X_t=i]$, $i \in O$, and $P[X_t=i']$, $i' \in D$, as functions of
the disease parameters $a_{ij}$, $U_i$, $b_i$ and the decision variables $Z_t$. To do
this in the formulation of $P[X_t=i]$ and $P[X_t=i']$, the parameters are
written as functions of $Z_t$ in the following manner:

$$a_{ijt} = a'_{ijt}\,[1 - Z_t f_i(t)], \qquad i \ne 0$$

$$U_{it} = U'_{it}\,[1 - Z_t f_i(t)], \qquad i \ne 0$$

$$b_{it} = Z_t f_i(t) + [1 - Z_t f_i(t)]\, b'_{it}$$

where $a'_{ijt}$, $U'_{it}$ and $b'_{it}$ stand for the probability elements estimated for
the case of no screening program available and $a_{ijt}$, $U_{it}$ and $b_{it}$ are
the corresponding values for any specific screening policy. In other
words, the prime indicates measurement of elements of the disease when the
nature of the disease is unchanged by any external screening test.
This reduces the objective function to a function of $Z_t$, $0 \le t \le t_{\max}$,
where $Z_t$ can take one of the two integer values zero or one. Therefore
the optimization problem is an integer program, but because of the
multiplication of variables (to see this, look at any one of the
expressions for $P[X_t=i]$ when the $a_{ij}$, $U_i$ and $b_i$'s are functions of $Z_t$)
it is not an integer linear program and the classic methods of
solution to "integer programming" are not applicable. It remains to
evaluate all possible policies or use some branch and bound procedure
to omit any policy as soon as it is dominated by another policy already


The optimization problem is whether or not to screen the individual

at any time t. Therefore if t is a discrete time interval and varies
from 1 to $t_{\max}$ there would be $2^{t_{\max}}$ solutions, because at any time

t there either is a test or there is not; of these the best one

is to be chosen.

There may be constraints on the total number of tests permitted

per each individual and on the interval between two successive tests

due to the fact that certain tests are dangerous in nature and/or might

introduce some side effects. The constraints are mathematically shown


Constraint 1: Maximum number of tests permitted for an individual

is $S_{\max}$:

$$\sum_{t=1}^{t_{\max}} Z_t \le S_{\max}$$

Constraint 2: The interval between two screens should be at least $t_{\min}$:

$$\sum_{t=j}^{j+t_{\min}} Z_t \le 1, \qquad j = 1,2,\ldots,t_{\max}-t_{\min}$$

Depending on the values of $S_{\max}$ and $t_{\min}$, Constraint 2 may imply

Constraint 1. For instance if $t_{\max} = 72$, $S_{\max} = 10$, $t_{\min} = 9$, then

there is no way to do ten tests or more and keep the interval between

two successive tests as desired, which means in this case Constraint 1

is implicitly satisfied by the second set of constraints.
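For small horizons the policy space can simply be enumerated. The sketch below is an illustration under stated assumptions: it reads Constraint 2 as "at most one test in any $t_{\min}+1$ consecutive periods", which is our interpretation, and the function names and the toy cost function are hypothetical. Full enumeration grows as $2^{t_{\max}}$, so it is only practical for short horizons.

```python
from itertools import product


def feasible_policies(t_max, s_max, t_min):
    """Yield every 0/1 policy (Z_1, ..., Z_tmax) meeting both constraints."""
    for z in product((0, 1), repeat=t_max):
        ages = [t for t, z_t in enumerate(z, start=1) if z_t]
        if len(ages) > s_max:                       # Constraint 1
            continue
        if any(later - earlier <= t_min             # Constraint 2 (assumed form)
               for earlier, later in zip(ages, ages[1:])):
            continue
        yield z


def best_policy(t_max, s_max, t_min, cost):
    """Return the feasible policy with the smallest value of cost(z)."""
    return min(feasible_policies(t_max, s_max, t_min), key=cost)


def max_tests(t_max, t_min):
    """Largest number of tests that the spacing rule alone allows."""
    return (t_max - 1) // (t_min + 1) + 1


def toy_cost(z):
    """Made-up cost: each test costs 1; skipping the test at age 3 costs 3."""
    return sum(z) + (0 if z[2] else 3)


print(len(list(feasible_policies(4, 4, 0))))   # no effective constraints
print(max_tests(72, 9))                        # spacing rule from the example
print(best_policy(4, 1, 0, toy_cost))
```

In the numerical example of the text, the spacing rule alone caps the number of tests below ten, which is why Constraint 1 becomes redundant there.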

3.6.1 Alternative Form 1

If it is assumed that $f_i(t)$, which is the probability that a person

who has been in state i at age t is properly classified as diseased, is

a constant independent of age and stage, then it is possible to simplify

the objective function and reduce the computational aspects of the

problem significantly. The assumption of constant true positive rate of

detection is valid in the case of some of the diseases such as

Neuroblastoma and, in general, the range of variation of $f_i(t)$ is sufficiently

small that a constant could be used to represent the true positive rate

of detection of the test. Later on the constant could be varied over its

possible domain and a sensitivity analysis could be used to determine the

validity of the assumption.

The assumption $f_i(t) = F$ is basic to the following analysis and is

used to write

$$a_{ijt} = a'_{ijt}\,(1 - Z_t F), \qquad i \ne 0$$

$$U_{it} = U'_{it}\,(1 - Z_t F), \qquad i \ne 0$$

$$b_{it} = b'_{it}\,(1 - Z_t F) + Z_t F$$

A close look at the mathematical form of $P[X_n=i]$ and $P[X_n=i']$ derived in
Section 3.4 shows that each of these terms is the sum of several terms,
each of which is in the form of multiplication of n elements. The terms
are such that there exists exactly one element corresponding to each
time interval t, $1 \le t \le n$. To explain this in more detail, take the
following simple case:

S --($a_{ST}$)--> T --($b_T$)--> T', with stay probability $U_T$ in T

Observation 2, Section 3.4 gives

$$P[X_{n-1}=T \mid X_0=S] = \sum_{j=0}^{n-2} \Big\{ \Big[\prod_{i=1}^{j} U_{Si}\Big]\, a_{ST,j+1}\, \Big[\prod_{k=j+2}^{n-1} U_{Tk}\Big] \Big\}$$

and Observation 1 gives

$$P[X_{n-1}=T \mid X_0=T] = \prod_{j=1}^{n-1} U_{Tj}$$

Therefore

$$P[X_n=T'] = \Big\{ \sum_{j=0}^{n-2} \Big[\prod_{i=1}^{j} U_{Si}\Big]\, a_{ST,j+1}\, \Big[\prod_{k=j+2}^{n-1} U_{Tk}\Big]\, P[X_0=S] + \prod_{j=1}^{n-1} U_{Tj}\; P[X_0=T] \Big\}\, b_{T,n}$$

$$P[X_n=T'] = \big[ (a_{ST,1} U_{T,2} \cdots U_{T,n-1} + U_{S,1}\, a_{ST,2}\, U_{T,3} \cdots U_{T,n-1} + \cdots + U_{S,1} \cdots U_{S,n-2}\, a_{ST,n-1})\, P[X_0=S] + (U_{T,1} U_{T,2} \cdots U_{T,n-1})\, P[X_0=T] \big]\, b_{T,n}$$

In this particular example $P[X_n=T']$ is the sum of n terms, each of
which consists of multiplications of n elements of the form $a_{ij}$, $U_i$ and
$b_i$. In each term, say $a_{ST,1}\, U_{T,2}\, U_{T,3} \cdots U_{T,n-1}\, b_{T,n}$, there exists one
and only one element corresponding to any time interval $1 \le t \le n$ (for example,
$a_{ST,1}$ is the only parameter that carries the time subscript "1"). This
concept plus the assumption of constant $f_i(t)$ will be employed to derive
a simpler form for the objective function.
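One consequence of this one-element-per-period structure can be checked numerically: if every a and U element carrying the time index of a test is multiplied by (1-F), then each term, and hence the whole occult-state probability, is multiplied by (1-F) once per test. The sketch below is an illustration only; it uses the two-state chain S -> T with hypothetical parameters and assumes the individual already has the disease (starts in S) at time 0.

```python
def occult_T(n, U_S, a_ST, U_T, test_ages=(), F=0.0):
    """P[X_n = T] for the chain S -> T when tests are done at test_ages.

    At a test age every a and U element of that period is scaled by (1 - F),
    following the constant true-positive-rate assumption.
    """
    p_S, p_T = 1.0, 0.0                       # diseased (state S) at time 0
    for t in range(1, n + 1):
        keep = 1.0 - F if t in test_ages else 1.0
        p_S, p_T = (p_S * U_S[t] * keep,
                    (p_S * a_ST[t] + p_T * U_T[t]) * keep)
    return p_T


# Hypothetical parameters; index 0 is unused so that index t means period t.
n = 10
U_S = [0.0] + [0.80] * n
a_ST = [0.0] + [0.10 + 0.005 * t for t in range(1, n + 1)]
U_T = [0.0] + [0.70] * n
F = 0.8
tests = {3, 7}

ratio_before = occult_T(2, U_S, a_ST, U_T, tests, F) / occult_T(2, U_S, a_ST, U_T)
ratio_between = occult_T(5, U_S, a_ST, U_T, tests, F) / occult_T(5, U_S, a_ST, U_T)
ratio_after = occult_T(10, U_S, a_ST, U_T, tests, F) / occult_T(10, U_S, a_ST, U_T)
print(ratio_before, ratio_between, ratio_after)
```

The three ratios correspond to zero, one and two tests having occurred by the age in question.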

Assume that the screening program is scheduled for $T \in \{T_1, T_2, \ldots, T_M\}$, which means
the individual will be screened at ages $T_1, T_2, \ldots,$ and $T_M$, and use the
following notation

$P'[X_t=i]$ = Probability of being in the occult stage i at
age t under no screening program.

$P[X_t=i]$ = Probability of being in the occult stage i at
age t under any screening program.

$P'[X_t=i']$ = Probability of being in the detected stage i' at
age t under no screening program.

$P[X_t=i']$ = Probability of being in the detected stage i' at
age t under any screening program.

Then $P'[X_t=i']$ is known from the data and $P'[X_t=i]$ could be found from
data as a function of $b'_{i,t+1}$ by the use of Observation 4

$$P'[X_t=i] = \frac{P'[X_{t+1}=i']}{b'_{i,t+1}}$$

For any time interval $0 \le t < T_1$

$$P'[X_t=i] = P[X_t=i], \qquad P'[X_t=i'] = P[X_t=i'], \qquad 0 \le t < T_1$$

Temporarily, assume that every individual who will eventually get
the disease has the disease by the time of the first screening. Later
this analysis is used to modify the model to cover every possible case.

At time $t = T_1$

$$P[X_{T_1}=i] = (1-F)\, P'[X_{T_1}=i]$$

This is because the last element of each term of $P[X_{T_1}=i]$ carries time $T_1$ and is
modified by (1-F).

For t in the interval $T_1 < t < T_2$, one element of each term of
$P[X_t=i]$ corresponds to time $T_1$ and is therefore modified by (1-F).
Therefore the whole expression is modified by (1-F).
At time $T_2$, under the same assumption (everybody has the disease
by the time of the first screening), in each term of quantity $P[X_t=i]$
there is one element corresponding to time $T_1$ and one element
corresponding to time $T_2$. These two elements are the only ones that
are modified by the factor (1-F). Therefore the whole expression is
modified by $(1-F)^2$, i.e.,

$$P[X_{T_2}=i] = (1-F)^2\, P'[X_{T_2}=i]$$

This assumes that the outcome of each screen is independent of previous
results. The same analysis holds for $T_2 < t < T_3$ and can be
generalized to establish that, under the assumption that everybody who
eventually gets the disease has it by time $T_1$, the screening cost will be

$$\sum_{t \in T} \sum_{i \in O} C_s(t)\, P[X_t=i] = \sum_{t \in T} C_s(t)\, P[X_t=0] + \sum_{i \in O,\, i \ne 0} \Big\{ (1-F)\, C_s(T_1)\, P'[X_{T_1}=i] + (1-F)^2\, C_s(T_2)\, P'[X_{T_2}=i] + \cdots + (1-F)^M\, C_s(T_M)\, P'[X_{T_M}=i] \Big\}$$

This is because $P[X_t=0]$ is not affected by the screening program
(it is written separately).

Observation 4 can be employed to replace $P'[X_{T_j}=i]$ by its
equivalent quantity $P'[X_{T_j+1}=i'] / b'_{i,T_j+1}$. Therefore

$$\text{Screening cost} = \sum_{t \in T} C_s(t)\, P[X_t=0] + \sum_{i \in O,\, i \ne 0} \sum_{k=1}^{M} C_s(T_k)\,(1-F)^k\, \frac{P'[X_{T_k+1}=i']}{b'_{i,T_k+1}}$$

where M is the total number of tests.

To show the relationship between $P[X_t=i']$ and $P'[X_t=i']$ it is
necessary to investigate the changes due to $b_{it}$ in addition to those of
$a_{ij}$ and $U_i$ and to consider the effect of screening on each individual
element of $P[X_t=i']$.
Taking into account only those individuals who get the disease
before time $T_1$, it is obvious that for $0 \le t < T_1$ there is no difference
between $P[X_t=i']$ and $P'[X_t=i']$, simply because there is no test in this
interval and therefore the $a_{ij}$, $U_i$ and $b_i$'s remain unchanged.

At time $t = T_1$, according to Observation 4

$$P[X_{T_1}=i'] = P[X_{T_1-1}=i]\; b_{i,T_1}$$

But from the previous analysis $P[X_{T_1-1}=i] = P'[X_{T_1-1}=i]$, because there
has been no test up to $T_1$. Moreover, since a screening is done at time $T_1$

$$b_{i,T_1} = b'_{i,T_1}\,(1-F) + F$$

Substitution for $P[X_{T_1-1}=i]$ and $b_{i,T_1}$ in $P[X_{T_1}=i']$ gives

$$P[X_{T_1}=i'] = P'[X_{T_1-1}=i]\; \big[ b'_{i,T_1}(1-F) + F \big]$$

But according to Observation 4

$$P'[X_{T_1-1}=i] = \frac{P'[X_{T_1}=i']}{b'_{i,T_1}}$$

Therefore

$$P[X_{T_1}=i'] = \frac{P'[X_{T_1}=i']}{b'_{i,T_1}}\, \big[ b'_{i,T_1}(1-F) + F \big] = P'[X_{T_1}=i']\, \Big[ 1 - F + \frac{F}{b'_{i,T_1}} \Big]$$

For $T_1 < t < T_2$, each term of $P[X_t=i']$ contains only one element corresponding to
$t = T_1$, and this element is either of the set a or U. Therefore the whole
expression is modified by factor (1-F), i.e.,

$$P[X_t=i'] = (1-F)\, P'[X_t=i'], \qquad T_1 < t < T_2$$

At time $t = T_2$, under the assumption that everybody has the disease by
the time of the first test, in each term of the quantity $P[X_t=i']$ there
is one element corresponding to time $T_1$ and the term $b_{i,T_2}$ corresponding
to time $T_2$. The element corresponding to time $t = T_1$ is of the set a or U
and is therefore modified by factor (1-F), and the element $b_{i,T_2}$ is
modified by

$$b_{i,T_2} = b'_{i,T_2}\,(1-F) + F$$

Therefore

$$P[X_{T_2}=i'] = (1-F)\, P'[X_{T_2}=i']\, \Big[ 1 - F + \frac{F}{b'_{i,T_2}} \Big]$$

For $T_2 < t < T_3$, each term contains two elements corresponding to
times $T_1$ and $T_2$, and each of them is modified by the factor (1-F).
Therefore

$$P[X_t=i'] = (1-F)^2\, P'[X_t=i'], \qquad T_2 < t < T_3$$

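In a similar manner the whole treatment cost and death cost can
be shown to be

$$\begin{aligned}
\text{Treatment cost} + \text{death cost} = \sum_{i' \in D} \Big\{ & \sum_{t=0}^{T_1-1} C_{t,i'}\, P'[X_t=i'] + C_{T_1,i'} \Big[ 1 - F + \frac{F}{b'_{i,T_1}} \Big] P'[X_{T_1}=i'] \\
& + \sum_{t=T_1+1}^{T_2-1} C_{t,i'}\,(1-F)\, P'[X_t=i'] + C_{T_2,i'}\,(1-F) \Big[ 1 - F + \frac{F}{b'_{i,T_2}} \Big] P'[X_{T_2}=i'] + \cdots \Big\}
\end{aligned}$$

where $C_{t,i'} = C_T(t,i') + C_d(t)\, d_{i'}(t)$.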

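Therefore under the main assumption that all of those individuals who
will eventually get the disease are involved at or before time $T_1$,
the expected total cost will be

$$\begin{aligned}
(\text{cost})_1 = {} & \sum_{t=0}^{t_{\max}} Z_t\, C_s(t) \Big\{ P[X_t=0] + (1-F)^{\sum_{k=1}^{t} Z_k} \sum_{i \in O,\, i \ne 0} \frac{P'[X_{t+1}=i']}{b'_{i,t+1}} \Big\} \\
& + \sum_{t=0}^{t_{\max}} \sum_{i' \in D} C_{t,i'}\,(1-F)^{\sum_{k=1}^{t-1} Z_k} \Big[ 1 - F + \frac{F}{b'_{i,t}} \Big]^{Z_t} P'[X_t=i']
\end{aligned}$$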

Not everybody has the disease at or before $T_1$ and, in fact, only
a known percentage of them may have the described situation. To find
this percentage, assume that onset time (the time that the individual
gets the disease) has a negative exponential distribution with rate $\lambda_0$.
Then

$$P[X_t \ne 0] = \lambda_0\, e^{-\lambda_0 t}$$

and

$$\int_0^{T_1} P[X_t \ne 0]\, dt = 1 - e^{-\lambda_0 T_1}$$

Therefore a fraction $(1 - e^{-\lambda_0 T_1})$ of all the individuals who eventually get
the disease have had it at or before $T_1$, and their contribution to the
objective function is

$$\big[ 1 - e^{-\lambda_0 T_1} \big]\, (\text{cost})_1$$

Using this argument, let us divide the population into M+1 groups
(M is the total number of tests scheduled) in the following manner:

Group 1 consists of those individuals who have the disease at
or before $T_1$.

Group m consists of those individuals who get the disease
between $T_{m-1}$ and $T_m$, m = 2,...,M.

Group M+1 consists of those individuals who get the disease
after the last screening.

From the assumption of negative exponential distribution of time of onset,
it is clear that

$$P[T_i < t \le T_{i+1}] = e^{-\lambda_0 T_i} - e^{-\lambda_0 T_{i+1}}$$
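The group probabilities follow directly from the exponential assumption and can be tabulated as below. This is a sketch with hypothetical test ages and onset rate; it takes the last group to end at $t_{\max}$, so the weights sum to $1 - e^{-\lambda_0 t_{\max}}$ rather than to one.

```python
from math import exp


def onset_group_weights(test_ages, rate, t_max):
    """Probability that onset falls in each interval between successive tests.

    Returns M + 1 weights for M test ages: onset at or before the first
    test, between consecutive tests, and after the last test up to t_max.
    """
    bounds = [0.0] + sorted(test_ages) + [t_max]
    return [exp(-rate * lo) - exp(-rate * hi)
            for lo, hi in zip(bounds, bounds[1:])]


# Hypothetical schedule and onset rate, for illustration only.
weights = onset_group_weights([40, 50, 60], rate=0.02, t_max=72)
print(weights, sum(weights))
```

Each weight multiplies the cost expression of the corresponding group when the contributions are summed.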

Considering the second group it is clear that for any individual
in this group there would be no benefits from the first screening,
but from then on they get the benefit of any other screening.
Therefore, using the same analysis as that of group one, it is clear
that their contribution to the objective function is

e o 1 T2 mx ( I F)kT +1 Zk

P'[Xt+l=i t+ max
b,t+l t=T+l '1 'E

I D Z tC s(t) "
i 'ED
( z1

l-F+ b F Zt .P'.[X=i,
b,t J Ji

A similar analysis could be used for other groups.

The expected total cost is the sum of all contributions from different
groups:

04 *T. -1 T. I max
O.F. = T e- e -=T o i +
T^T t +T1
J j-1

p'[X t,=i']
FZ C(t) t +
1 t s b i,t+l-

1-F+ 'F ti P'[ x =i']


S zk
k=T +1 k
(1-F) j-

Z zk
k=Tj +1k
F) J-1

+ zI t Cs(t)

where $T = \{T_k;\ k = 0,\ldots,M+1\}$ and $T_0 = 0$, $T_{M+1} = t_{\max}$.

In this formulation $Z_t$ is the decision variable which takes on zero
or one, F is the true positive rate of the test, $\lambda_0$ is the rate of onset
of the disease and $b_{it}$ is the conditional probability of transition
from state i to i' during time (t-1,t). In the objective function $\lambda_0$,
and therefore $P[X_t=0]$, and $b_{it}$ are the only parameters that are not
explicitly known from data but the method of Section 3.4 could be used
to estimate them.

If, instead of negative exponential, the onset time has a general

distribution function FT (t), the objective function would be
t Zk
t max k=T. +1 -
O.F. = FT j)-FT (T j-1) I (1-F) 3-1
T.eT o o t=T. +1

P'[X t+1=i i ] k=T 1+1 k
Zt Cs(t)'- b +* (1-F) *

SZ max
(l-F+ t Cti P'[Xt=i] + Zt C(t)


3.6.2 Alternative Form 2

This method is also used to determine the value of P[Xt=i] and

P[Xt=i'] as a function of Zt. The basic idea is the same as the first

method and the population is grouped according to the time of onset of
the disease. Therefore at each screening time different categories are

considered each of which gets a different benefit from screening tests.

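For instance, for time $t \le T_1$ there is only one group which gets a
benefit from screening and the cost associated with them at any time
t is

$$\sum_{i} C_s(t)\, Z_t\, (1-F)^{\sum_{k=1}^{t} Z_k}\, P'[X_t=i] + \sum_{i'} C_{t,i'} \Big[ 1 - F + \frac{F}{b'_{i,t}} \Big]^{Z_t} P'[X_t=i'], \qquad t \le T_1$$

For any time t ($T_1 < t \le T_2$) there are two groups of people:

Group 1: Those who had the disease at screening time $T_1$.
Group 2: Those who had not had the disease at screening time $T_1$.

The cost associated with the first group at any time t is

$$\sum_{i} C_s(t)\, Z_t\, (1-F)^{\sum_{k=1}^{t} Z_k}\, P'[X_t=i,\ X_{T_1} \in j] + \sum_{i'} C_{t,i'}\,(1-F) \Big[ 1 - F + \frac{F}{b'_{i,t}} \Big]^{Z_t} P'[X_t=i',\ X_{T_1} \in j], \qquad T_1 < t \le T_2$$

where j is the set of occult stages.

The cost associated with the second group at any time t is

$$\sum_{i} C_s(t)\, Z_t\, (1-F)^{\sum_{k=T_1+1}^{t} Z_k}\, P'[X_t=i,\ X_{T_1}=0] + \sum_{i'} C_{t,i'} \Big[ 1 - F + \frac{F}{b'_{i,t}} \Big]^{Z_t} P'[X_t=i',\ X_{T_1}=0], \qquad T_1 < t \le T_2$$

In a similar manner for any time t ($T_2 < t \le T_3$) there are three groups of people:

Group 1: Those who were carrying the disease at screening time $T_1$.
Group 2: Those who were healthy at $T_1$ but carried the disease by $T_2$.
Group 3: Those who had not had the disease at time $T_2$.

The cost associated with these three groups at any time t is

a) $$\sum_{i} C_s(t)\, Z_t\, (1-F)^{\sum_{k=1}^{t} Z_k}\, P'[X_t=i,\ X_{T_1} \in j] + \sum_{i'} C_{t,i'}\,(1-F)^2 \Big[ 1 - F + \frac{F}{b'_{i,t}} \Big]^{Z_t} P'[X_t=i',\ X_{T_1} \in j]$$

b) $$\sum_{i} C_s(t)\, Z_t\, (1-F)^{\sum_{k=T_1+1}^{t} Z_k}\, P'[X_t=i,\ X_{T_1}=0,\ X_{T_2} \in j] + \sum_{i'} C_{t,i'}\,(1-F) \Big[ 1 - F + \frac{F}{b'_{i,t}} \Big]^{Z_t} P'[X_t=i',\ X_{T_1}=0,\ X_{T_2} \in j]$$

c) $$\sum_{i} C_s(t)\, Z_t\, (1-F)^{\sum_{k=T_2+1}^{t} Z_k}\, P'[X_t=i,\ X_{T_2}=0] + \sum_{i'} C_{t,i'} \Big[ 1 - F + \frac{F}{b'_{i,t}} \Big]^{Z_t} P'[X_t=i',\ X_{T_2}=0]$$

for $T_2 < t \le T_3$.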
