MATHEMATICAL MODELS OF PROGRESSIVE
DISEASES AND SCREENING
BY
ABDOLLAZIM HOSHYAR
A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF
THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF
THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
To my wife, Meixi,
and my daughter, Hani,
who have made numerous sacrifices so that
I might pursue this goal.
ACKNOWLEDGMENTS
I wish to express my sincere appreciation and gratitude to the
members of my doctoral committee for their overall guidance, understanding
and friendship in assisting me in my research. In particular I am deeply
indebted to my committee chairman Dr. Ralph W. Swain not only for his
technical insight and timely assistance but for suggesting the area of
screening for research and for his attitude toward my work. His
exemplary contributions to my education will never be forgotten. I wish
to extend my appreciation to my committee co-chairman Dr. Thom J. Hodgson
for his assistance and encouragement. I would like to thank the other
members of the committee, Dr. Kerry E. Kilpatrick, Dr. Gary J. Koehler
and Dr. Jeffrey P. Krischer for their comments on drafts of the manuscript.
I would also like to thank Dr. Jeffrey P. Krischer and Dr. Lawrence S. Frankel
for providing useful references.
I would like to thank Dr. A. Ghavami, Deputy Chancellor, Dr. M.S. Mayeri,
Dean of the School of Engineering, Dr. M. Bahadori-Nejad and all the
faculty members of the Department of Mechanical Engineering of Pahlavi
University for offering me a scholarship which made my post-graduate study
possible.
I am also grateful to the Division of Health Systems Research and
the Department of Industrial and Systems Engineering for their financial
support during the study.
Finally Mrs. Beth Beville deserves many thanks for her excellent
typing of the dissertation.
TABLE OF CONTENTS
ACKNOWLEDGMENTS
KEY TO SYMBOLS
ABSTRACT
CHAPTER
ONE INTRODUCTION
TWO LITERATURE REVIEW
THREE MODEL DEVELOPMENT
3.1 A General Model of Cancer Screening
3.2 The Disease Process
3.3 The Screening Process
3.4 Estimation of Transition Probabilities
3.5 Objective Functions
3.6 An Alternative Expression for the Objective Function
FOUR NEUROBLASTOMA
4.1 Literature on Neuroblastoma
4.2 Development of Unconditional Probabilities
4.3 Objective Function
4.4 Parameter Estimation and Determination of Optimal Policies
4.5 Sensitivity Analysis
4.6 Results and Conclusions
FIVE SPECIAL CASES
5.1 Dependency of True Positive Rate of Two Successive Examinations
5.2 True Positive Rate as a Function of Time from Onset of the Disease
5.3 Screening Examination Consists of a Sequence of Tests
5.4 Transient Problem
5.5 Application of the Model to the Case of Breast Cancer Using the Results of HIP Study
SIX DETERMINATION OF AN OPTIMAL POLICY
6.1 Branch and Bound Method: Search for an Optimal Solution
6.2 Search for a "Good" Heuristic Solution
SEVEN CONCLUSIONS AND SUGGESTIONS FOR FUTURE RESEARCH
APPENDIX DERIVATION OF THE GENERAL EXPRESSION FOR TRANSIENT PROBLEM
REFERENCES
BIOGRAPHICAL SKETCH
KEY TO SYMBOLS
$A_1$ = $1 - F$
$a'_{ijt}$ = Transition probability of going from i to j in the t-th
period without screening
$a_{ijt}$ = Transition probability of going from i to j in the t-th
period under any screening program
$B_1$ = $1 - F + F/b_{it}$
$b'_{ii't}$ = Probability of transition from stage i to i' in the t-th
interval without screening
$b_{ii't}$ = Probability of transition from stage i to i' in the t-th
interval under any screening program
$C_d(t)$ = Cost associated with death of an individual of age t
$C_s(t)$ = Screening cost at age t
$C_{sp}$ = Cost of screening the population of suspects to the disease
$C_T(t,i')$ = Treatment cost of an individual who is in stage i' at age t
$C_{ti'}$ = $C_T(t,i') + C_d(t)\,d_{i'}(t)$
$d_{i'}(t)$ = Probability of death due to disease for an individual who
is detected at stage i' at age t
$f_i(t)$ = Probability that a person who has been in stage i at age
t is properly classified as diseased
$f(t)$ = True positive rate of screening as a function of the time
from onset
F = Constant true positive rate of screening
I = Number of population per one diseased individual
LB = Lower bound to the objective function
M = Number of screenings
OF = The objective function
O = Occult stages = {1,2, ... N}
D = Detected stages = {1',2', ... N'}
PL(t) = Probability of not dying of other causes up to age t
P'[Xt = i] = Probability of being in an occult stage i at age t under
no screening program
P[Xt = i] = Probability of being in an occult stage i at age t under
any screening program
P'[Xt = i'] = Probability of being in a detected stage i' at age t under
no screening program
P[Xt = i'] = Probability of being in a detected stage i' at age t under
any screening program
R(Z,k) = The ratio of population who have completed (t-k) examinations
up to i th interval
r = Discount rate
S = State space (stage of the disease, age of the individual)
$T_i$ = Time of the i-th test, i = 1, ..., M
$T_{max}$ = Maximum age under consideration
$U'_{it}$ = Probability of staying in stage i in the t-th period
without screening
$U_{it}$ = Probability of staying in stage i in the t-th period
under any screening program
$Z_t$ = Decision variable: 0 if test is not done in the t-th interval,
1 if test is done in the t-th interval
0 = Healthy state
$\lambda_0$ = Rate of onset of the disease
$\lambda'$ = Rate of detection of the disease
$\lambda_1$ = Rate of transition from (-)s category to (+-)s category
$\lambda_2$ = Rate of transition from (-)s category to (++)s category
$\lambda_3$ = Rate of transition from (+-)s category to (++)s category
$\pi_i$ = P[X = i]
Abstract of Dissertation Presented to the Graduate Council
of the University of Florida in Partial Fulfillment of
the Requirements for the Degree of Doctor of Philosophy
MATHEMATICAL MODELS OF PROGRESSIVE
DISEASES AND SCREENING
By
Abdollazim Hoshyar
August 1978
Chairman: Ralph W. Swain
Major Department: Industrial and Systems Engineering
A stochastic model for a screening program is presented in which the
natural history of the disease is assumed to progress through a set of
stages before detection. A model of the whole process is developed
which addresses the interaction of the disease process and screening process.
The purpose of the model is to develop insight into the disease process
and to derive policies which are optimal relative to the particular
objectives chosen.
The model is implemented for neuroblastoma and breast cancer and, in
the latter case, a comparison is made between the model's output and that
of an actual study group. An investigation is also made of the special
cases when there is dependency between screening results, when the true
positive rate of screening is a function of time from onset, when
multiple tests may be employed in each screening and/or when the prevalence
pool has a transient period.
To determine the "best" policy, an attempt is made to solve the
optimization problem, but to avoid an extensive computation time a
"heuristic" method is also developed which determines "good" policies
efficiently. It is shown that a number of screening strategies yield
objective function values close to the minimum value for the example
studied.
CHAPTER ONE
INTRODUCTION
The fact that some diseases may reach an advanced state without
obvious symptoms, coupled with potentially better cure rates associated
with early detection, indicates the potential benefits of early detection
of the disease. The probability of detection can be increased by the
process of screening. While screening programs continue to attract an
increasing number of researchers from all fields of science, most of the
work done is primarily medically oriented and, generally, there remains
a lack of a unifying theory relative to the timing and type of screening
used.
The analysis of screening processes centers around determination of
an optimal number of examinations at specific ages. This objective is
of both theoretical and applied interest. Theoretically
it is a challenging problem which can be difficult to model. Less than
perfect predictability, which is a characteristic of screening procedures,
plus variability of the characteristics of the disease in different
individuals make the problem difficult to analyze. The real life
situations involving different costs and benefits associated with different
screening plans provide practical interest in determining good
screening strategies. There is general agreement that significant
differences exist in the use of one screening policy over another
[32,35,49,52,78,89,98,99], and in some cases, such as pap smear and x-ray
examinations, the screening itself may not be harmless [35]. Both cost
and health effect factors motivate the need to evaluate alternative
screening patterns in a systematic manner.
1.1 Objectives of This Research
A few investigators have presented heuristic procedures for screen-
ing of a specified disease. One reason for the lack of a good mathemat-
ical model for screening is the lack of understanding of the disease
process itself. If a particular disease could be modeled mathematically,
with its parameters realistically estimated, then the problem of screen-
ing would be easier to analyze. In order to present a "good" screening
policy, there is a strong need for a basic mathematical model of the
disease.
This research is directed toward the development of optimal screen-
ing policies. To accomplish this task for any specific disease it is
necessary to decide 1) who should be screened, 2) how often they should
be screened and 3) what combination of test methods should be used at
each examination. A model of the whole process will be developed, which
addresses the interaction of the disease process and screening process.
The purpose of the model is to develop insight into the disease process
and to derive policies which are optimal relative to the particular
objectives chosen. Some of these objectives are: minimization of the
total cost of medical care (including the cost of screening and treat-
ment), maximization of life expectancy, maximization of the probability
of detection of the disease in a favorable stage and minimization of the
delay in time between onset of the disease and its detection.
Increasing the number of examinations increases the screening cost,
but hopefully, to some extent, it also increases the chance of earlier
detection of the disease. Age of the individual at the time of examina-
tion is a basic factor and depending on the disease under consideration,
it can play a major role. Screening at an age in which disease is not
likely to be active will increase the cost and may introduce some side
effects; screening at the proper age helps the physician detect the
disease before its natural detection time and therefore increases the
chance of survival and may decrease the cost of treatment. Also using
different combinations of screening methods might increase the screening
cost, but it decreases the chance of false negative results. Determina-
tion of basic factors affecting a screening program, presentation of a
mathematical model for that particular disease with estimation of its
parameters, validation of the model and selection of the best screening
policy depend on the objectives chosen.
Therefore the objective of this research is to develop a general
model of cancer screening, specialize it for Neuroblastoma and breast
cancer, investigate model sensitivity to changes in parameters or
assumptions, and derive good screening policies by use of optimization
theory.
Chapter Three provides a general formulation of screening programs
and presents a model of the screening process superimposed on the disease
process. Restrictive assumptions on the process as well as a discussion
of relevant optimality criteria are presented. This chapter is intended
to provide a general model applicable to any disease. It consists of
four parts: part A presents a stochastic model of disease progress in
an individual, part B postulates the probability structure for the
progress of the disease and employs available data to estimate the
elements of a transition probability matrix, part C presents different
objective functions of interest and estimates the terms, part D contains
the basic idea of screening and develops different strategies of
screenings.
Chapter Four is based on the development of the previous chapter
and extends the results to the case of Neuroblastoma, which is a cancer
of early childhood.
In Chapter Five several assumptions of the model are altered and
a sensitivity analysis is employed to determine the robustness of the
model to changes in some parameters. In the first three sections the
assumption of constant screening accuracy is relaxed and the expected
total cost of screening is expressed as a function of screening policy
for different cases. In the last two sections, the initial effect of a
screening program on population prevalence pool is investigated and the
model is implemented for the case of breast cancer.
In Chapter Six an attempt is made to solve the optimization
problem efficiently and determine the "best" policy. Finally a "heuristic"
method is presented which determines "good" policies.
CHAPTER TWO
LITERATURE REVIEW
The problem of screening progressive diseases has received
considerable attention in recent years due to the impact it has
on society as a whole. Daniel G. Miller [71], under the topic "What
Is Early Diagnosis Doing?", looks at cancer as a very peculiar disease:
... Cancer affects the health of other members of society
as well as the patient. A disease, the cost of which is
fifteen billion dollars a year, one half of the annual
budget deficit, has to be considered a matter which affects
the health of the nation, if only because it diverts medical
manpower and facilities which could be used for other
urgent health care needs. Furthermore, a disease which
disrupts families, ... must be considered in terms of its
total impact on the society. Currently one in every six
dollars spent in health care is spent on cancer, ... the
cost of surgery, radiation therapy, and terminal care is
estimated to be at twenty-thousand dollars per case, ...
Miller points out the importance of early detection and mentions
that the use of an appropriate screening policy might reduce the cost
associated with cancer.
Gilbertsen [36] in his report on 14,978 cases of cancer claims
that his studies suggest that the majority of commonly occurring cancers
can be detected on periodic examination far earlier than is likely when
the patient waits until his symptoms force him to see the physician and
prompt the physician to undertake examination and eventual diagnosis.
The results of his studies also suggest that when cancers so detected are
treated promptly and adequately, substantial improvement in prognosis
for survival can be anticipated for patients with most of the common
cancers which occur today. This concept has been pointed out by many
other investigators [4,19,27,45,69,83].
Early diagnosis, which is possible through screening at-risk
individuals, is an important concept in aborting or ameliorating the
consequences of such diseases as cancers. Considerable funds and
effort are being expended for the purpose of the early detection of
cancers. The investigations are mostly concerned with the comparison
of the costs and benefits of alternative individual screening strategies
for selected site-specific cancers and with the selection of preferred
strategies. An individual screening strategy is defined [98,99] as the
specification of the number and type of screens to be given to an
individual in a particular at-risk group and the ages at which the tests
are to be given. Screening is operationally defined [99] as the process
of selecting those asymptomatic persons who would benefit from further
diagnostic studies. The selection of one screening strategy over
another depends on the criteria by which the strategy is evaluated and
the constraints placed on alternative screening schedules.
Before reviewing the literature a set of definitions related to the
screening will be presented.
1) The true positive rate of screening is the probability the
test indicates an affected individual, given that the individual has
that particular cancer.
2) The false positive rate of screening is the probability the
test indicates the disease is present when the disease is not present.
3) The true negative rate of screening is the probability the
test indicates the disease is not present when the disease is not
present.
4) The false negative rate of screening is the probability the
test fails to indicate the presence of the disease in an affected
individual.
5) Onset time is the first age at which some recognizable
biological change occurs.
6) The pre-clinical state of the disease is regarded as a state
where clinical symptoms have not been exhibited and the individual is
unaware of the disease.
7) The clinical state of the disease is a state where clinical
symptoms are exhibited.
8) Clinical surfacing: If the disease is not detected by a
scheduled screening examination, the disease will be said to surface
clinically.
9) Lead time for a screening program is the difference between
time of diagnosis by the screen and that later time when the disease
would be clinically apparent and detectable.
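Definitions 1) to 4) can be illustrated with a short computation. The following is a minimal sketch, assuming the screening results have been cross-classified against true disease status into four counts; the function name, argument names and the numbers used are hypothetical and are not taken from any study cited here.

```python
def screening_rates(tp, fp, tn, fn):
    """Return the four screening rates from a two-by-two table of counts.

    tp: diseased individuals the test indicates as affected
    fn: diseased individuals the test fails to indicate
    fp: disease-free individuals the test indicates as affected
    tn: disease-free individuals the test indicates as not affected
    """
    diseased = tp + fn      # everyone who has the disease
    healthy = fp + tn       # everyone who does not have the disease
    return {
        "true_positive_rate": tp / diseased,
        "false_negative_rate": fn / diseased,
        "false_positive_rate": fp / healthy,
        "true_negative_rate": tn / healthy,
    }


# Hypothetical counts, for illustration only.
rates = screening_rates(tp=80, fp=50, tn=950, fn=20)
```

By construction the true positive and false negative rates sum to one, as do the false positive and true negative rates, so only two of the four need to be estimated.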
To make a cost-effectiveness analysis, the following requirements
would have to be specified by the analyst.
1) The necessary data on the nature of the disease--including
detection rates, onset rate, and survival rate.
2) A decision as to what assumptions are to be accepted concerning
the disease progression in the occult part.
3) A specification of the accuracy and reliability of the different
screening methods, including false-negative rates, false-positive rates
and the degree of the uniformity of the test results.
4) A specification of the constraints on the number of screens and
their interval per lifetime of an at-risk individual.
5) A specification of the effectiveness measures of interest.
To apply a model to a site-specific cancer there is a strong
need for the appropriate data. Unfortunately for most types of cancer,
data do not exist and/or are not in a useful form, which makes it
extremely difficult to estimate most of the epidemiologic parameters.
For data to be useful, it should carry information on the individual's
age, sex, race, social level, education, age at each screen, mode of
screening, result of screens, whether or not the individual was eventually
found with disease through screening, and the cause and date of his or her death.
Recently there has been some effort to collect data in a more useful
manner. For instance, in the case of breast cancer, in December 1963,
the Health Insurance Program of Greater New York (HIP) started a long
term randomized trial directed at the question "Does periodic breast
cancer screening with mammography and clinical examination result in a
reduction in mortality from breast cancer in the female population?"
Two systematic random samples, each consisting of 31,000 women aged 40 to
64 were selected. Each woman in the study group was offered a screening
examination and three additional examinations at annual intervals each
consisting of a clinical examination, mammography, and an interview.
The women in the control group were matched to the study women for date
of entry and continued to receive their ordinary medical care
[85,86,87,95,101]. These data have been used extensively by many inves-
tigators for the purpose of modeling the behavior of breast cancers
[39,89,90], estimation of epidemiologic parameters [49,55,56,87],
estimation of false negative rates in medical screening [39], estimation
of the effect of screening on 5-year mortality rates [85,86,87,95,101]
and determination of factors that correlate with promptness in seeking
diagnosis [31].
The accuracy and reliability of the different screening methods are
important factors in early detection of the disease and for any
particular type of screen an estimate of the false positive rate and
false negative rate of screening has to be made.* In 1942 Sawitz and
Karpinos [82] proved that when each individual in a group receives the
same number of examinations, estimates of the efficiency (true positive
rate) of screening, F, and of the prevalence rate, P, are given by
$$F = \frac{R\,[1-(1-F)^n]}{nK}, \qquad P = \frac{R}{nNF} = \frac{K}{N\,[1-(1-F)^n]}$$
where R is the total number of positive test results, n is the number
of examinations given to each individual, K is the number of individuals
with any positive test results, and N is the number of individuals in
the group. In this formulation F is the adjusted ratio of positive
examinations to the number of examinations on detected diseased indi-
viduals and P is the adjusted ratio of detected diseased individuals to
the number of individuals examined. Therefore knowing R, n, N and K, the
procedure can be used to get an estimate of true positive rate of screening
and prevalence.
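Because F appears on both sides of its defining relation, the estimate has to be obtained numerically. The following is a minimal sketch of one way to do so, using a simple fixed-point iteration started at F = 1; the iteration scheme, the function name and the example counts are assumptions made for illustration and are not part of the procedure in [82].

```python
def sawitz_karpinos(R, n, K, N, iterations=200):
    """Estimate the true positive rate F and the prevalence P.

    R: total number of positive test results
    n: number of examinations given to each individual
    K: number of individuals with any positive test result
    N: number of individuals in the group
    """
    if not (K < R <= n * K):
        # With R <= K the iteration collapses to the trivial root F = 0.
        raise ValueError("expected K < R <= n*K")
    F = 1.0  # start from the upper end so the iterates decrease to the root
    for _ in range(iterations):
        F = R * (1.0 - (1.0 - F) ** n) / (n * K)
    P = R / (n * N * F)
    return F, P


# Hypothetical counts constructed from F = 0.5, P = 0.1, n = 2, N = 10000.
F_hat, P_hat = sawitz_karpinos(R=1000, n=2, K=750, N=10000)
```

In this constructed example the iteration recovers the values used to build the counts, which serves only as a consistency check on the two relations.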
Later, in 1951 Mantel [68] modified these estimates for the
case of unequal number of examinations given to different suspects.
Although the literature on estimating the prevalence and true positive
rate of screening is cited here, these techniques are not used in the
development of the model.
Wittes and Sidel [112] presented a method for estimation of
population size from the simple capture-recapture model and showed how
to obtain estimates when there are more than two independent sources of
notification. They defined
K = the number of notification sources
$E_i$ = probability that a member of the population is identified
by the i-th source
N = the total size of the population
$n_i$ = the number of members of the population identified by the
i-th source
$n_{ij}$ = the total number of members of the population identified by
the i-th and j-th sources
n = the total number of different population members identified.
They found that $E_1$ is the solution of the following polynomial, which is of degree
K-1 once the trivial root $E_1 = 0$ is divided out:
$$n_1 - n_1 \prod_{i=1}^{K}\left(1 - \frac{n_i E_1}{n_1}\right) - nE_1 = 0$$
Estimates of $E_2, \ldots, E_K$ are obtained from $E_1$:
$$E_s = \frac{n_s}{n_1}E_1, \qquad s = 2, \ldots, K$$
The population size is estimated as
$$\hat{N} = \frac{n}{1 - \prod_{i=1}^{K}(1 - E_i)}$$
They gave an approximation to the variance of $\hat{N}$ and tabulated it for some
values of $E_1$, $E_2$ and n.
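The estimation steps described above can be sketched numerically. The following is an illustrative sketch only, assuming independent sources and locating the non-trivial root by bisection; the function name, the bisection approach and the example counts are assumptions and are not taken from [112].

```python
from math import prod


def wittes_sidel(source_counts, n_distinct, tol=1e-12):
    """Return ([E_1, ..., E_K], N_hat) for K independent notification sources.

    source_counts: [n_1, ..., n_K], members identified by each source
    n_distinct:    n, number of different members identified by any source
    """
    n1 = source_counts[0]

    def g(e1):
        # The polynomial divided by E_1, which removes the trivial root E_1 = 0.
        missed_by_all = prod(1.0 - ni * e1 / n1 for ni in source_counts)
        return n1 * (1.0 - missed_by_all) / e1 - n_distinct

    # E_1 is bounded above by the value at which the largest source has E_i = 1.
    lo, hi = 1e-9, n1 / max(source_counts)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    e1 = 0.5 * (lo + hi)
    probs = [ni * e1 / n1 for ni in source_counts]
    n_hat = n_distinct / (1.0 - prod(1.0 - e for e in probs))
    return probs, n_hat


# Hypothetical counts constructed from N = 1000 and E = (0.3, 0.5, 0.4).
probs, n_hat = wittes_sidel([300, 500, 400], n_distinct=790)
```

The bracket used for the bisection relies on the sources overlapping, so that the sum of the individual counts exceeds the number of distinct members.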
Goldberg and Wittes [39] proposed a similar capture-recapture method
[109,110,111] to estimate the sensitivity of the screen, the false
negative rate, the population prevalence, the population incidence,
their means, variances, and covariances, and their properties in small
strata. They used HIP findings [87] to illustrate the model. They
defined
i = number of the screen, i = 0, ..., S
$M_i$ = number of individuals attending the i-th screen
$d_i$ = number of cases of disease detected at the i-th screen
$N_0$ = true population prevalence at the 0-th, or initial, screen
$N_i$ = true population incidence between the (i-1)-st and the i-th
screen (i > 0)
$\beta_i$ = the number of false negatives
$\alpha_i$ = the number of false positives.
Suppressing the subscript i, they expressed the false positive rate and the
false negative rate at any screen in terms of these quantities. They assumed that false
positives have been removed and considered a screening program in which
subjects are screened S+1 times by K different screening methods. Let
$d_{ij}$ = number of cases of disease detected at the i-th screen by the
j-th screening mechanism (i = 0, ..., S; j = 1, ..., K). An unbiased estimate of
the total number of diseased individuals at the i-th screen, $N_i$, is the solution of
$$\prod_{j=1}^{K} (N_i - d_{ij}) = N_i^{K-1}\,(N_i - d_i), \qquad i = 0, \ldots, S$$
They also computed the variance of the estimate of N for the case K = 2.
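For K = 2 the estimating equation reduces to the familiar capture-recapture form: writing the number of cases found by both methods as $d_1 + d_2 - d$, the solution is $N = d_1 d_2 / (d_1 + d_2 - d)$. The following sketch solves the equation for general K by bisection and can be checked against that closed form; the function name, the numerical method and the example counts are assumptions made for illustration and are not taken from [39].

```python
from math import prod


def total_diseased_estimate(detected_by_method, detected_total, tol=1e-9):
    """Solve prod_j (N - d_j) = N**(K-1) * (N - d) for N at a single screen.

    detected_by_method: [d_1, ..., d_K], cases detected by each method
    detected_total:     d, distinct cases detected by any method
    """
    d = detected_total
    if sum(detected_by_method) <= d:
        raise ValueError("the methods must overlap for the estimate to exist")

    def h(N):
        # The estimating equation divided through by N**K.
        return prod(1.0 - dj / N for dj in detected_by_method) - (1.0 - d / N)

    lo, hi = float(d), 2.0 * d
    while h(hi) > 0.0:          # widen the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical screen: 60 and 50 cases by two methods, 30 found by both.
d1, d2, both = 60, 50, 30
n_hat = total_diseased_estimate([d1, d2], detected_total=d1 + d2 - both)
```

With these illustrative counts the closed form gives 60 x 50 / 30 = 100 diseased individuals, of whom 80 were detected by at least one method.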
It is seen that the general limitation of all these procedures
is that, for some diseases, the assumption that examination efficiency
is the same for all infected individuals may not be realistic, and
instead examination efficiency may vary from individual to individual,
depending on the nature and stage of the disease.
It is generally agreed that many cancers can be modeled as progressing
through different stages [5]. The staging of the disease is useful,
because of its importance in treatment and its effect on prognosis.
In December 1965, the American Joint Committee on Cancer Staging [16]
offered the TNM system, which was based upon three capital letters:
T, tumor or primary lesion and its extent; N, lymph nodes of the region
and their condition; M, distant metastasis. Within each letter element,
increasing involvement was categorized by the combination of the capital
letter with a numerical suffix. But this staging was too general to be
used for all types of cancer. In fact, depending on the type of cancer
under consideration different investigators employed different staging
methods.
Eker [22] introduced a method of staging for Carcinomas of the
colon and rectum in 1963.
Cutler and Myers [17] introduced a method of staging for breast
cancer in 1967.
Barron and Richart [6] introduced a method of staging for cervical
carcinoma in 1968.
Evans et al. [27] introduced a method of staging for Neuroblastoma
in 1971.
Although the importance of early detection is well known,
there are only a few mathematical models for screening, none of which
is general enough to cover all forms of the disease. In what follows,
the literature on screening models is reviewed and general concepts of
screening programs as a tool for early diagnosis of malignant diseases
are pointed out.
In 1963 Lincoln and Weiss [65] derived properties of the time with
disease before detection in recurrent screening and investigated the
consequences of these properties with respect to the interval between
screens. They considered the efficiency of different policies for
scheduling medical examinations, and treated both periodic and random
examinations allowing for imperfect diagnosis depending on how long the
disease had been present. Working with that portion of the population
in which a tumor appears, they defined
u(t) = Probability density for the time at which one can
first observe the presence of a tumor.
a(t) = Probability that a diagnosis made at a time t after
the appearance of the first observable signs of tumor
will be incorrect.
Let examinations occur at times $\tau_1, \tau_2, \ldots$, such that the intervals
$\Delta_1 = \tau_1$, $\Delta_2 = \tau_2 - \tau_1$, ... are independent identically distributed random
variables with probability density function $\phi(\Delta)$. Then under the
assumption that a tumor is discovered only by examination, they used the
idea that the examination times $\{\tau_i\}$ form a renewal process, and if the
tumor first becomes observable at t, the time to detection is
$$T_d = T_1 + T_2 + T_3 + \cdots + T_n$$
where n is the number of tests before detection, $T_1$ is the forward delay,
or time to first test following the initiation of the disease, and $T_j$
is the time between the (j-1)st and jth test. Then
$$n(t,x) = [1 - a(x)]\,f(t,x)$$
where n(t,x) = probability density for the time to discovery of the
tumor conditional on its having become observable at
time t,
and f(t,x) = probability density for the event that a test occurs at
t and that any diagnoses made in (t,t+x) were incorrect.
Also,
$$n(x) = \int_0^{\infty} u(t)\,n(t,x)\,dt$$
where n(x) = probability density of tumor age at discovery.
They found the expression for moments of n(x) as functions of a(t) and
$\phi(\Delta)$. Using the quantities shown and employing the following two
optimality criteria they developed an optimal schedule:
Criterion 1: No more than a fraction $\epsilon < 1$ of those people who eventually
have a tumor will have an undetected tumor for more than
a specified time T.
Criterion 2: Mean undetected time of tumor growth does not exceed a
given time T.
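The two criteria can be made concrete in a much simpler setting than the one treated in [65]. The following is a minimal sketch assuming strictly periodic examinations, a constant probability of an incorrect diagnosis that does not depend on tumor age, and a tumor that first becomes observable at a time uniformly distributed within an examination interval. Under these assumptions the time to detection is a uniform forward delay plus a geometric number of missed examinations. The function names and numerical values are hypothetical.

```python
def mean_detection_delay(interval, miss_prob):
    """Mean undetected time: mean forward delay plus expected missed intervals."""
    return interval / 2.0 + interval * miss_prob / (1.0 - miss_prob)


def fraction_undetected_longer_than(T, interval, miss_prob):
    """Probability that a tumor stays undetected for more than T."""
    q = int(T // interval)      # whole intervals contained in T
    r = T - q * interval        # remainder within the next interval
    # Either more than q examinations are missed, or exactly q are missed
    # and the forward delay to the first examination exceeds r.
    more_than_q_missed = miss_prob ** (q + 1)
    exactly_q_missed = (1.0 - miss_prob) * miss_prob ** q
    return more_than_q_missed + exactly_q_missed * (1.0 - r / interval)


# Hypothetical policy: yearly examinations, 20 percent incorrect diagnoses.
mean_delay = mean_detection_delay(interval=1.0, miss_prob=0.2)
tail = fraction_undetected_longer_than(T=1.5, interval=1.0, miss_prob=0.2)
```

In this setting Criterion 2 amounts to requiring that the mean delay not exceed the given time, and Criterion 1 to requiring that the tail fraction not exceed the allowed fraction at the specified time; shortening the interval improves both.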
Weiss and Lincoln [106] used the model in [65] for the case of
cervical cancer. They used a gamma distribution for u(t) and a negative
exponential for the intervals between examinations, $\Delta_i = \tau_i - \tau_{i-1}$. Neglecting death from other
causes, they obtained some characteristics for screening period by using
an (a,b) policy, where a is time and b is a probability. The (a,b) policy
is one in which the probability that any tumor is of age a or older at
the time of discovery is b.
In 1967 Feinleib [29] presented the mathematical justification and
restrictions for the well-known epidemiologic relation that the prevalence
of a disease is proportional to its incidence and mean duration. Let P(t)
be the prevalence of a disease at time t; i(t) the incidence of disease
at t; and g(d|t) the conditional probability density function of
durations of incident cases, where d is the duration from time of onset;
then if P(0) = 0,
$$P(t) = \int_0^t \int_{t-y}^{\infty} i(y)\,g(x \mid y)\,dx\,dy$$
For the stable disease model, he imposed the following three
restrictions:
1) i(t) = i for all t ≥ 0 and is zero for t < 0.
2) g(d|t) = g(d) for all t ≥ 0.
3) g(d) = 0 for d > M.
For this model he proved that $P(t) = i\,\bar{d}$ for t > M, where $\bar{d}$ is the
mean duration.
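The stable-disease result can be checked numerically. The following is a minimal sketch assuming the duration density does not depend on the time of onset, so that the inner integral reduces to the probability that a case lasts longer than t - y; the uniform duration distribution, the midpoint integration rule and all names and values are assumptions chosen for illustration.

```python
def prevalence(t, incidence, survival, steps=10000):
    """Approximate P(t) = integral over y in (0, t) of i(y) * S(t - y).

    incidence: function y -> incidence of new cases at time y
    survival:  function x -> probability that a case lasts longer than x
    """
    h = t / steps
    total = 0.0
    for k in range(steps):
        y = (k + 0.5) * h                      # midpoint of the k-th slice
        total += incidence(y) * survival(t - y)
    return total * h


# Hypothetical stable disease: constant incidence and durations uniform on
# (0, M), so the mean duration is M / 2.
i_const, M = 0.002, 4.0
p_late = prevalence(10.0, lambda y: i_const, lambda x: max(0.0, 1.0 - x / M))
p_early = prevalence(2.0, lambda y: i_const, lambda x: max(0.0, 1.0 - x / M))
```

For t greater than M the computed prevalence equals the incidence times the mean duration, while for t less than M the prevalence pool is still filling and the value is smaller.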
In 1968 Hutchison and Shapiro [49] published their work on the
estimation of some parameters of preclinical breast cancer. They used
preliminary findings of a clinical program of screening for breast cancer
to estimate average duration of preclinical disease (early stages of the
disease, in which tumor may be detected only in a screening program).
They assumed that in any large population there are some individuals
with preclinical disease. The number of such women (prevalence) depends
on the rate with which new preclinical disease develops (incidence) and
the length of time (duration) it persists before clinical diagnosis.
Then if the number of preclinical cases remains constant during the long
run, the rate of new cases must be the same as the rate at which old
cases are passing over to clinical disease. In general a prevalence P
and an incidence I imply that the average duration is $d = P/I$.
Moreover if, in the absence of screening, the duration of pre-
clinical disease were the same for all individuals, then duration-to-date
of those detected at screening would be uniformly distributed between
(O,t), average duration-to-date would be half the total duration, and
this would be equal to average lead time. In their mathematical model,
the only input was
$I_d$ = incidence of cases of duration d. I is expressed as a
discrete distribution function with $\sum_{d=0}^{\infty} I_d = 1$,
where d = duration, or interval of time during which an individual case
is diagnosable by screening but not diagnosed under usual practice. Then
they gave functional forms for prevalence, incidence, mean duration and
lead time as functions of $I_d$.
For instance,
$$P_n = \sum_{d=0}^{n} I_d\,d + n \sum_{d=n+1}^{\infty} I_d$$
where $P_n$ = prevalence at time n of cases detectable by screening but
not diagnosed under usual practice.
Also
$$I(n) = \sum_{d=0}^{n} I_d\,(n-d)$$
where I(n) = total incidence in interval n following screening.
Using preliminary HIP findings [84,85], they estimated the average
duration of preclinical breast cancer in absence of special screening to
be 20 months. This gives an average lead time of 10 months for a
completely homogeneous population. As will be seen later, some
assumptions inherent in this model restrict their analysis.
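The steady-state bookkeeping behind this model can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical duration distribution (the values in `I_d` are illustrative and are not HIP data) and a total incidence of one case per unit time; the function names are not taken from the original paper. At time n after observation starts, a case of duration d is counted for min(d, n) time units, so the prevalence levels off at the mean duration once n exceeds the longest duration.

```python
def prevalence(incidence_by_duration, n):
    """Prevalence at time n: cases of duration d <= n contribute d,
    cases of longer duration contribute n."""
    return sum(i_d * min(d, n) for d, i_d in incidence_by_duration.items())


def mean_duration(incidence_by_duration):
    """Incidence-weighted mean duration of preclinical disease."""
    total = sum(incidence_by_duration.values())
    return sum(d * i_d for d, i_d in incidence_by_duration.items()) / total


# Hypothetical distribution: duration in months -> share of incident cases.
I_d = {10: 0.2, 20: 0.5, 30: 0.3}

d_bar = mean_duration(I_d)          # mean duration in months
p_long_run = prevalence(I_d, n=30)  # equals d_bar when total incidence is 1
print(d_bar, p_long_run, d_bar / 2)  # last value: lead time if all durations were equal
```

With unit incidence the long-run prevalence equals the mean duration, which is the relation d = P/I; half of that value is the lead time under the equal-duration assumption.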
In 1969 Blumenson and Bross [11] presented a mathematical analysis
of the growth and spread of breast cancer. They developed a mathematical
model which describes the development of the cancer from the appearance
of the first cell to the possible occurrence of a distant metastasis.
Their model also accounts for limitation on the minimum size of the tumor
before it can be detected and for the effect of surgical intervention
by the physician on the development of a recurrence of the disease. They
used a deductive method and constrained the progress of breast cancer to
the contribution of the following parameters (1) the tumor doubling time,
(2) patient's delay in reporting her disease to the physician, (3) the
chance that the disease will spread to nearby lymph nodes, and (4) the
chance of spreading to more distant parts of the body. A patient is
classified as having a small primary tumor (S), with average diameter less
than 5 centimeters, or a large primary tumor (L). A patient has either
negative nodes (0), 1-3 positive nodes (2), or more than three positive
nodes (4). After surgery the patients are followed for at least 18 months.
At the end of this period a patient is classified as (N) or (R) depending
on whether she had no clinically detectable recurrence or a recurrence.
This introduces 12 stages: S0R, S2R, S4R, .... They presented a method for
calculating the probability of being in any of these states as a function
of the parameters introduced. Once the twelve probabilities have been
calculated, they are compared with the data and a $\chi^2$-test is used to
determine those values of the parameters which minimize the $\chi^2$-value.
Calling their model a deep mathematical model* for human breast
cancer, they found [14] that the only way to get
an acceptable $\chi^2$-value is to employ a two-disease hypothesis for breast
cancer, i.e., the population of patients consists of two groups with
different tumor doubling times.
In 1969 Zelen and Feinleib [116] presented a model of a chronic
disease which progresses from a pre-clinical state to a clinical state
and related the potential benefit of the screening program to the
lead time gained by early diagnosis. They developed a stochastic model
for early detection programs which led to an estimate of the mean lead
time as a function of observable variables. They considered a screening
program where an individual was examined only once. In their model,
transitions are from a disease-free state ($S_0$) to a preclinical disease
state ($S_p$), and then to a clinical disease state ($S_c$). They assume that
($S_p$) eventually progresses to ($S_c$) if not detected and treated. They
define
  q(t) = p.d.f. of sojourn time in $S_p$
  $Q(t) = \int_t^{\infty} q(x)\,dx$
  P(t) = probability of being in $S_p$ at time t
  $Q_f(t)$ = unconditional forward recurrence time (lead time) distribution
  m = mean sojourn time in $S_p$ = $\int_0^{\infty} Q(x)\,dx$
* A deep model describes an underlying process which, in theory,
generates the surface events.
and assume that
  $Q_f(t) = \frac{1}{m}\int_t^{\infty} Q(y)\,dy$
  L = mean lead time $= \frac{m^2+\sigma^2}{2m} = \frac{m}{2}\,(1+C^2)$
where $C = \sigma/m$, and m and $\sigma^2$ are the mean and variance of the sojourn time
distribution in $S_p$. They mentioned that $L > m/2$ because of length-
biased sampling, which means the screen does not detect people at random,
but detects people with longer preclinical sojourn times. They also
mentioned that particular care must be exercised if one is comparing the
survival of the individuals detected early by a test with a comparable
group of individuals detected at a clinical state, because those found
in $S_p$ tend to have longer preclinical sojourn times in $S_p$ than the
general population. This might be synonymous with a slow growing disease
in the preclinical state. Zelen and Feinleib consider different
conditions under which the following relationship referred to by
Hutchison and Shapiro [49] and Feinleib [29] is valid:
  $P = m \cdot I$
where P, I and m are prevalence, incidence and mean duration of the
disease. They proved that even if prevalence and incidence are time
dependent, P(t)/I(t) is equal to the mean sojourn time in $S_p$ provided
the sojourn time follows an exponential distribution. In general
  $I(t) = \frac{P(t)}{m} + \frac{C^2-1}{2}\,P'(t) + \cdots$, where $C = \sigma/m$.
They developed relationships among age, prevalence and incidence
and, employing data from HIP [87], found an estimate of 1.84 years for m.
Finally they relaxed the assumption that every individual eventually
leaves ($S_p$) and developed relationships which depend on the
ratio of those who will eventually surface to the total population;
this generalizes their model to non-progressive diseases.
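The effect of length-biased sampling on the mean lead time can be made concrete with a short calculation. The sketch below evaluates L = (m/2)(1 + C^2), with C the ratio of the standard deviation to the mean of the sojourn time, for two illustrative shapes of the sojourn-time distribution. The function name and the choice of shapes are assumptions made for illustration; only the value m = 1.84 years comes from the text.

```python
def mean_lead_time(m, sigma):
    """Mean lead time L = (m / 2) * (1 + C**2), where C = sigma / m is the
    coefficient of variation of the preclinical sojourn time."""
    c = sigma / m
    return 0.5 * m * (1.0 + c * c)


m = 1.84  # mean sojourn time in years (the HIP-based estimate quoted above)

# Illustrative assumption 1: every case has the same sojourn time (sigma = 0).
print(mean_lead_time(m, 0.0))   # half the mean sojourn time

# Illustrative assumption 2: exponential sojourn time (sigma equals the mean).
print(mean_lead_time(m, m))     # the full mean sojourn time
```

The more variable the sojourn time, the further the mean lead time rises above m/2, because a single screen preferentially catches the long-duration cases.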
In 1972 Kodlin [57] used a series of biometric arguments and very
simple cost estimates to attack the cost-benefit problem in screening
for breast cancer and found that survival results would justify the
increased costs that might result from mass screening. To do this he
claimed that in the case of breast cancer, one is faced with essentially
two alternative basic strategies:
1) To screen and bring the positives to therapeutic intervention--
associated total cost is called C1.
2) To let the cases come to diagnosis and treatment through the
traditional pathway of recognition--associated total cost is
called C2.
Let t = treatment cost for those picked up by screen
    t' = treatment cost for those not picked up by screen
    W = $W_0$ (physical exam) + $W_1$ (biopsy fee)
    s = screen cost
    b = biopsy rate amongst false positives
Then
  $C_1 = \underbrace{P\,\Pi_1\,(t+W+s)}_{\text{true positive}} + \underbrace{P\,(1-\Pi_1)\,(t'+W+s)}_{\text{false negative}}$
  $\qquad + \underbrace{(1-P)(1-\Pi_2)(W_0+b\,W_1+s)}_{\text{false positive}} + \underbrace{(1-P)\,\Pi_2\,s}_{\text{true negative}}$
where P = presumed frequency of breast cancer in the population
    $\Pi_1$ = conditional probability of identifying a case correctly
    by mammography and palpation
    $\Pi_2$ = conditional probability of identifying a non-case correctly.
Also
  $C_2 = P\,(t'+W) + \phi\,(1-P)\,[m + (1-\Pi_2)(W_0+b\,W_1)]$
where
    $\phi$ = fraction of the non-diseased who demand a breast check-up,
and
    m = mammography cost.
Using some estimates of the parameters P, $\Pi_1$, $\Pi_2$, $\phi$ and the costs, he found
that $C_1$ is usually greater than $C_2$. Then he attempted to assess the
total cost per case cured and, using this objective, found that it would
be beneficial to choose the first strategy, screening.
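The comparison of the two strategies can be sketched in a few lines. The code below is an illustration under stated assumptions: every numerical value is hypothetical (none is taken from Kodlin's paper), and the parameter names such as `demand_fraction` are invented here to stand for the quantities defined above (the fraction of the non-diseased who demand a check-up, and so on). Costs are expected costs per person in the population.

```python
def cost_screening(P, pi1, pi2, t, t_late, W0, W1, s, b):
    """Expected per-person cost C1 of the screening strategy."""
    W = W0 + W1                                     # physical exam plus biopsy fee
    true_pos = P * pi1 * (t + W + s)
    false_neg = P * (1 - pi1) * (t_late + W + s)
    false_pos = (1 - P) * (1 - pi2) * (W0 + b * W1 + s)
    true_neg = (1 - P) * pi2 * s
    return true_pos + false_neg + false_pos + true_neg


def cost_no_screening(P, pi2, t_late, W0, W1, b, m, demand_fraction):
    """Expected per-person cost C2 of the traditional pathway."""
    W = W0 + W1
    diseased = P * (t_late + W)
    worried_well = demand_fraction * (1 - P) * (m + (1 - pi2) * (W0 + b * W1))
    return diseased + worried_well


# Hypothetical inputs, chosen only to exercise the formulas.
params = dict(P=0.01, pi2=0.95, t_late=2000.0, W0=20.0, W1=200.0, b=0.5)
C1 = cost_screening(pi1=0.9, t=1000.0, s=30.0, **params)
C2 = cost_no_screening(m=25.0, demand_fraction=0.1, **params)
print(C1, C2)
```

With these made-up inputs the screening strategy costs more per person, which is the qualitative situation described above; the conclusion in favor of screening only emerges once cost is expressed per case cured.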
Shapiro, Goldberg and Hutchison [84] used the experience in the HIP
Study [87] to estimate the average time gained through screening in
the detection of breast cancer among the women who were aged 40-64 years
at the start of the screening program. They used the model presented
by Hutchison and Shapiro [49] which says d = P/I and found a mean
duration of 1.3 years for a prevalence of 2.73 and incidence of 2.09
as calculated from HIP data. The statistical models that were applied
suggested that the average lead time was about a year.
In 1974 Kirch and Klein [53] developed methods for determining
the optimal screening policy using the criterion of detection delay,
which is the time from the first point at which disease is detectable to the
point of actual detection. They started with an age-dependent disease
and showed that, under certain conditions in an optimal schedule, the
interval between examinations is inversely proportional to the square root
of the age-specific incidence of the disease. They were interested in possible
advantages of nonperiodic policies in mass screening, and tried to
find out whether a nonperiodic schedule, involving the same expected
tests per patient as a periodic schedule, could reduce the average time
to detect a given disease, or, whether a nonperiodic schedule involving
fewer expected tests per patient could lead to detection of the disease
as early as a given periodic schedule.
They divided the population into two groups, those who will eventually
get the disease and those who will not. For the first group, interest
was centered on early detection; for the second group, interest was
centered on minimization of the expected number of tests. They found a
class of optimal schedules by varying the number of screenings. They
assumed
1) The age span consists of "equal length periods" in which
incidence rate is usually tabulated in the literature.
2) Each such period starts with an examination and all examinations
within a period are at equal intervals.
3) Examinations are error-free.
Define: T = Earliest time at which the disease could be detected, if
an examination took place (they assumed T is uniformly
distributed over the interval).
$x_i$ = Number of tests scheduled for the i-th period.
D = Length of time between T and examination time.
$Q(x_i,D)$ = Detection delay if the disease becomes detectable in
the i-th period.
$S_i$ = Probability that the patient survives to the start of period i.
Assuming that T has a uniform distribution, they found the expression
for $E[Q(x_i,D)]$ as a function of D and $x_i$ for two different cases:
1. D is a constant.
2. D is a random variable with probability density function
$p_D(t)$.
They used the optimization model
  Minimize $G(x_1,\ldots,x_n;D) = \sum_{i=1}^{n} P_i\, E[Q(x_i,D)]$
  subject to $\sum_{i=1}^{n} x_i S_i \le K$ (K is a constant)
             $x_i \ge 1$
where $P_i$ represents the conditional probability that the detectability
point occurs in period i, given that it will occur sometime within the
n periods of interest.
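A toy instance of this kind of allocation problem can be solved by exhaustive search. The sketch below is not Kirch and Klein's method: it assumes, for illustration only, that the expected delay in a period with x equally spaced error-free examinations is the period length divided by 2x (the mean wait until the next examination for a uniformly distributed onset), and it ignores D altogether. The probabilities, survival values and budget K are hypothetical.

```python
from itertools import product


def expected_delay(x, period_length=1.0):
    # ASSUMPTION (illustrative): with x equally spaced, error-free exams per
    # period and onset uniform in the period, the mean wait is length / (2x).
    return period_length / (2 * x)


def best_schedule(P, S, K, max_tests=6):
    """Exhaustively search integer test counts x_i >= 1 that minimize
    sum_i P_i * E[delay_i] subject to sum_i x_i * S_i <= K."""
    best = None
    for xs in product(range(1, max_tests + 1), repeat=len(P)):
        if sum(x * s for x, s in zip(xs, S)) > K:
            continue  # violates the expected-number-of-tests budget
        g = sum(p * expected_delay(x) for p, x in zip(P, xs))
        if best is None or g < best[0]:
            best = (g, xs)
    return best


# Hypothetical three-period example: onset is most likely in the last period.
P = [0.1, 0.3, 0.6]    # conditional onset probabilities per period
S = [1.0, 0.75, 0.5]   # probability of surviving to the start of each period
objective, tests = best_schedule(P, S, K=5)
print(objective, tests)
```

In this example the search concentrates examinations in the period where onset is most likely and survival-weighted tests are cheapest, which is the qualitative behavior a nonperiodic schedule is meant to exploit.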
Kirch and Klein [52] applied this model to breast cancer and found
that there is a slight economic advantage (2% to 3%) if examination
frequency is taken to be a function of age rather than fixed throughout
life. Their model has some basic assumptions (such as error-free
screening) which are not always true and which consequently restrict the
application of their model.
In 1974 Tallis and Sarfaty [97] presented a model for the distribution
of the time to reporting cancer, called T. The basic assumption
of their model was that the infinitesimal conditional probability of
reporting the disease is proportional to the rate of tumor growth. Let
the distribution function of T be $F(t) = P[T \le t]$ with derivative
$F'(t) = f(t)$; then the force of reporting function is
  $\frac{f(t)}{1-F(t)} = \begin{cases} 0, & 0 \le t < t_0 \\ \alpha\,V'(t), & t \ge t_0 \end{cases}$
where V(t) is the tumor volume t time units after onset, $\alpha$ is a
constant of proportionality and $t_0$ is defined by the equation
$V(t_0) = v_0$. For instance, $v_0$ may be the minimum clinically detectable
tumor size. Then
  $F(t) = \begin{cases} 0, & 0 \le t < t_0 \\ 1-\exp(-\alpha\,V(t)), & t \ge t_0 \end{cases}$
They defined $R = T-t_0 \mid T > t_0$, and used a staging method similar to TNM [16]
which consisted of four stages $S_1$, $S_2$, $S_3$ and $S_4$ in such a way that each
stage is associated with certain volumes of tumor growth, i.e.,
$v_{i-1} < V < v_i$ is classified as $S_i$, i=1,...,4.
Let $r_i$ be such that $V(t_0+r_i)=v_i$, i=0,1,2,3 ($r_0=0$); then if $P_i$
is the probability of reporting the disease in $S_i$,
  $P_1 = P[R < r_1] = [\exp(-\alpha v_0)-\exp(-\alpha v_1)]\exp(\alpha v_0)$
  $P_2 = P[r_1 \le R < r_2] = [\exp(-\alpha v_1)-\exp(-\alpha v_2)]\exp(\alpha v_0)$
  $P_3 = P[r_2 \le R < r_3] = [\exp(-\alpha v_2)-\exp(-\alpha v_3)]\exp(\alpha v_0)$
  $P_4 = P[r_3 \le R] = \exp[-\alpha v_3+\alpha v_0]$
They used a standard model for tumor growth, $V(t_0+r)=v_0\,e^{\beta r}$, where $\beta$ is
the rate of growth, and found E(R) and E[V(R)] as functions of $\alpha$ and $\beta$,
  $E(R) = \frac{e^{\alpha v_0}}{\beta}\int_{\alpha v_0}^{\infty} y^{-1}e^{-y}\,dy$
  $E[V(R)]$ = expected tumor volume at reporting
  $\qquad\;\; = (1+\alpha v_0)\,\alpha^{-1}$
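The stage-at-reporting probabilities follow directly from the conditional survival function P[R >= r_i] = exp(-alpha (v_i - v_0)). The following is a minimal sketch with hypothetical values for the proportionality constant and the volume thresholds (they are not estimates from Tallis and Sarfaty); the function names are invented for illustration.

```python
import math


def stage_probabilities(alpha, volumes):
    """Given thresholds volumes = [v0, v1, v2, v3], return [P1, P2, P3, P4],
    the probabilities of reporting in stages S1..S4 conditional on T > t0."""
    v0 = volumes[0]
    # survival[i] = P[R >= r_i] = exp(-alpha * (v_i - v0)); survival[0] is 1.
    survival = [math.exp(-alpha * (v - v0)) for v in volumes]
    probs = [survival[i] - survival[i + 1] for i in range(len(volumes) - 1)]
    probs.append(survival[-1])  # last stage: everything beyond v3
    return probs


def expected_volume_at_reporting(alpha, v0):
    """E[V(R)] = (1 + alpha * v0) / alpha; V(R) - v0 is exponential with rate alpha."""
    return (1.0 + alpha * v0) / alpha


# Hypothetical inputs: alpha per unit volume, thresholds in the same volume units.
alpha = 0.5
thresholds = [1.0, 2.0, 4.0, 8.0]   # v0, v1, v2, v3
P_stage = stage_probabilities(alpha, thresholds)
print(P_stage, sum(P_stage))
print(expected_volume_at_reporting(alpha, thresholds[0]))
```

The four probabilities telescope to one, which is a quick consistency check on any set of thresholds.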
Knox [55,56] in his simulation studies of breast cancer screening
programs postulated a model for the natural history of the disease
which consisted of a set of stages. A statement of the natural history
of the disease was then provided in the form of a transition matrix
which gave estimated transfer rates between the various stages. This
set of values was adjusted iteratively until an output was provided
which matched available data on incidence, prevalence and mortality.
He used HIP findings as his data [87].
The full set of stages for the simulation included 26 defined
stages and the general sequence of the transfer pattern was held constant.
The simulation was developed in four steps.
1) Some specifications for the natural history of the disease
were developed which make the model capable of explaining available
incidence, prevalence, mortality and case-fatality data.
2) Sensitivities and specificities for palpation and mammography
were developed that make the model capable of explaining the results
of HIP.
3) The results were extrapolated to predict the benefits of
extended screening services.
4) The scope of Urinary-Steroid prescreening tests was examined.
He points out that mortality savings are not to be seen as the
sole criterion and it is possible to provide short but useful prolon-
gations of life without grossly affecting the cumulative death statistics.
He also mentions that
results of the HIP experiment are not the results
that we shall be needing. We would need to know quite
accurately, for each age group, the effectiveness, hazards
and costs of an extended palpation program using staff
other than doctors, together with the teaching of self-
palpation, and the marginal benefits, risks and costs of
a selective and limited use of mammography, superimposed
upon this background. ... Having regard to the limited
acceptability of screening procedures, their high costs,
..., a reasonable service target may be a reduction of
breast cancer mortality by about one-tenth.
In 1976, a significant amount of research was completed. Prorok
[78,79] presented a stochastic model for a periodic screening policy
in the case of a chronic disease whose natural history is assumed to
follow a progressive path from a preclinical state to a clinical state.
The distribution of the forward recurrence time (time interval from
initiation of the disease to its detection) is derived and used to
obtain the distribution and mean of the lead time (time interval between
the point at which early diagnosis occurs as a result of a screening
test and the point when disease would have been detected in the absence
of screening), and the relationships for calculating the proportion of
preclinical cases detected. Prorok defines ($S_0$) to be the disease-free
state, ($S_p$) the preclinical state and ($S_c$) the clinical disease state. He
develops a model for the interaction of the disease process with an
independent screening process. Assuming that screening time is
independent of the time at which the preclinical state is entered
and the duration of stay therein, and allowing for the possibility of
imperfect detection, he derives all necessary distributions.
Prorok also measures the number or fraction of preclinical cases
which are actually detected earlier than usual as a result of screening.
His model is a generalization of the model presented by Zelen and
Feinleib [116] to the case of multiple screenings. The restrictions
on his model and models similar to it are due to the method of staging
used and the assumption of periodic screening. The staging ($S_0$, $S_p$,
$S_c$) is very general in nature, and by using it we lose information
available from data. Periodic screening is also a restriction and
is not necessarily optimal, since there are some indications that, in
the case of age-dependent diseases, the optimal screening policy would
not be periodic [53].
Galliher [35] raised the question of how soon a repeat Pap
smear should be scheduled for a woman who has had no test
or has received only negative smears to date. He used cost-
effectiveness analysis to determine optimal frequencies of such schedules.
Normally, there is a trade-off between the amount of effort spent on
prevention and the resulting amount of advanced disease if not screened.
To analyze this problem there is a strong need for data on the occult
part of the disease, but, in almost all cases, the early disease has
been treated when detected. Therefore, due to the lack of
appropriate data, he bounded the objective by obtaining two extreme
sets of objectives, between which the true evaluation of the objective should
lie. He introduced two extreme levels, (L) and (M). Under (L) one
assumes: 1) carcinoma-in-situ could come to detection between the
scheduled preventive smears in more than 50% of all affected individuals;
2) only 40% or so of all carcinoma-in-situ would ever progress to
invasive disease if not treated. Therefore (L) assigns a
minimal role to periodic screening. Under (M) one assumes:
1) carcinoma-in-situ would never come to detection except at the
scheduled preventive examinations; 2) carcinoma-in-situ, if untreated,
would always eventually progress to invasive cancer, and invasive
cancer is always preceded by carcinoma-in-situ. Therefore (M) assigns
a maximal role to periodic screening. In his paper, Galliher only
worked out level (M), because it offered certain simplicities in the
task of producing a cost-effectiveness analysis.
Galliher assumed a course based upon four possible stages of
diagnosis at screening, and defined them as
H = Healthy
U = Carcinoma-in-situ
V-C = (Micro) invasive disease, curable at detection
V-F = (Micro) invasive disease, fatal even though treated.
Figure 1 is a flow chart for the onset and course of the disease in an
individual.
Galliher's main assumptions are
1) Carcinoma-in-situ does not regress spontaneously.
2) Duration from onset of carcinoma-in-situ to onset of invasion
is negative exponentially distributed (he used a mean duration of
ten years in his computations).
3) Carcinoma-in-situ alone will not be detected by discomfort
of the woman.
4) Carcinoma-in-situ is completely curable at diagnosis.
5) There is a sharp transition to invasion at the end of the
duration of carcinoma-in-situ.
6) When the stage of invasion is entered, two competing processes
commence and operate concurrently; these are
a) tendency to surface clinically,
b) tendency to make the transition to fatal disease.
7) The rates for transition to fatal disease and for clinical surfacing
are constants g and c, independent of how long invasion has been in
process. Therefore if the invasive condition is not first diagnosed,
the chance that it remains curable for at least the next t years is
equal to $e^{-gt}$ and the chance that the invasive condition remains
clinically occult for at least t years is equal to $e^{-ct}$.
Figure 1: Flow chart for onset and course of the disease.
8) Initially, the individual starts at 20 years old and has
no risk for previous cervical cancer.
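The constant-rate assumption for the invasive stage has simple closed-form consequences. The sketch below uses hypothetical rates (they are not Galliher's estimates) and additionally assumes that the two competing processes are independent; under that assumption the chance that unscreened invasive disease surfaces clinically while still curable is c / (g + c).

```python
import math


def prob_still_curable(g, t):
    """Chance that invasive disease has not become fatal within t years."""
    return math.exp(-g * t)


def prob_still_occult(c, t):
    """Chance that invasive disease has not surfaced clinically within t years."""
    return math.exp(-c * t)


def prob_surfaces_while_curable(g, c):
    # ASSUMPTION: the two exponential "clocks" are independent, so the
    # surfacing clock (rate c) rings first with probability c / (g + c).
    return c / (g + c)


g, c = 0.2, 0.3   # hypothetical yearly rates for fatal transition and surfacing
print(prob_still_curable(g, 2.0))          # curable two years after invasion
print(prob_still_occult(c, 2.0))           # still undetected after two years
print(prob_surfaces_while_curable(g, c))   # 0.6 with these rates
```

A screening schedule improves on the last figure by detecting some invasive cases before either clock rings.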
He employed the following measures of effectiveness in his
computations.
1) The minimization of the total medical care cost.
2) The minimization of the sum of 1) plus the economic loss to
society if the patient dies.
3) The minimization of premature death.
Galliher used data to estimate the incidence of onset, g, c and
different elements of cost. For the numerical task of finding optimal
schedules he needed the probabilities
P(i, j, m, t) = The probability that an individual who is in
stage i of the disease at the end of the m-th time interval will be
in stage j of the disease at age t if she does not have a Pap smear
in between. This quantity can be easily computed from the
above-mentioned assumptions.
A mathematical model of breast cancer which is similar to the work
of Galliher [35] was developed by Shwartz [88,89,90,91,92,93]. His model
was based upon the hypothesis that breast cancer is a "time dependent"
disease, and that if it is detected and treated earlier, the prognosis will
be more favorable. However, due to the potentially high cost of screening
(in terms of dollars, psychological effects, and physical side-effects),
it is important to establish estimates of the benefits of screening so
that one might better evaluate whether the benefits appear worth the
costs.
The literature on breast cancer shows [88] that tumor size and extent
of lymph node involvement are the most important prognostic variables.
Employing this, Shwartz assumed that the tumor growth
rate is exponentially distributed, and that the rate of lymph node
involvement is a function of the size of the tumor and its growth rate. Using
these functional forms, he compared the results with available data and
computed the parameters. His model therefore consisted of a set of
hypotheses on the incidence of the disease, its progression, its tendency
to be detected without the benefit of any scheduled screening examinations,
and the relationship between stage of the disease and survival. Using
this model he found, for a woman at a given risk level, the level of
effectiveness as a function of the number of tests, the corresponding ages
at examination, the reliability of the tests, and whether or not the
individual had performed self-examinations between scheduled screening
examinations. Modifications in the model were made and a second model
was proposed so that the two models bracketed relevant hypotheses about
the rate of disease progression. The two models differed in their
hypotheses about the rate at which lymph nodes become involved.
Due to the lack of appropriate data, Shwartz varied the assumptions
over their entire range and computed the benefits, with the hope that
the actual process lies somewhere in between. For instance, he parameterized
the distribution of tumor growth rates, threat of death from breast cancer,
the false negative rate and the correlation between the false negative
probabilities on successive examinations.
Using heuristic techniques rather than optimization techniques,
he determined the best ages at which screening examinations should be
given to each individual.
The three benefit measures employed by Shwartz were
1) the life expectancy of a woman from her current age,
2) the probability that detection occurs before nodal involvement,
and
3) the probability that there is no recurrence of breast cancer.
D.E. Thompson and T.C. Doyle [99] proposed an approach to the
selection of a screening policy for cancer of the colon and rectum,
addressing the question of how often a person
should be screened for colorectal cancer to achieve a desired cost-
benefit outcome. To do this they reviewed and analyzed data on the
incidence and prevalence of the disease, the course of the disease, the
cost and ability of detecting the disease at various stages of progression,
the cost and effectiveness of treating the disease at any stage, and the
benefits of treatment.
Thompson and Doyle offered two approaches to modeling the disease.
1) A continuous model, based on tumor size and growth data.
2) A discrete state model, based on progress of the disease
through several stages.
In this paper they emphasized the second model, which, in the absence
of data reflecting the true progress of the disease, has been structured
in the context of available data and is relatively simple in design.
This simplified version of the discrete model was used to perform
parametric analysis.
Their model has two principal functions:
1) It provides a basis for determining the relative values of
different screening policies.
2) It provides a means of analyzing the sensitivity of screening
policies to variations in the onset and progress of the disease, the
efficiency of screens, and the associated costs and/or benefits.
Their discrete model can be viewed as consisting of
1) A set of discrete states, and
2) A set of probability distributions, which describe the length
of time that an individual remains in a particular state.
The model consisted of the following stages:
H : Health
A : Lesion confined to the mucosa but undetected
A': Lesion confined to the mucosa and detected
B : Cancer into the muscularis propria with negative lymph
nodes but undetected
B': Cancer into the muscularis propria with negative lymph
nodes and detected
C : Cancer with nodal involvement but undetected
C': Cancer with nodal involvement and detected
D : Distant metastases present but undetected
D': Distant metastases present and detected
M : Death.
Due to the unavailability of appropriate data, they proposed a
particular staging and obtained a few primary conclusions. In this model
the status of an individual is related to whether the disease, when detected
and treated, is curable or not, i.e.,
H : Health
P : Occult Colorectal Cancer, pre-fatal, i.e., curable if
detected
P': Colorectal Cancer, detected by clinical surfacing or
screening, curable
F : Occult Colorectal Cancer, fatal
F': Colorectal Cancer, detected and fatal
M : Death.
They assumed that the death rate ($\mu(a)$), onset rate (P(a)), rate
of progression from pre-fatal to fatal stages ($\lambda$), rate of clinical
surfacing of pre-fatal colorectal cancer ($\psi_p$), rate of clinical surfacing
of fatal colorectal cancer ($\psi_f$) and rate of death associated with detected
fatal cancer ($\mu'$) were constants (i.e., the corresponding durations are
assumed to have negative exponential distributions). This assumption
resulted in a Markov process:
Let $P_i(t)$ = the probability that an individual is in stage i
at age t.
Then
  $P_i(t+s) = \sum_j P_j(t)\, B_{ji}(t,t+s)$
or
  $P(t+s) = P(t)\, B(t,t+s)$
where $B_{ji}(t,t+s)$ = probability of a change from j to i in the time
interval from t to t+s.
Then screening was introduced into the model by means of the probability
of detecting occult cancer, i.e.,
  $a_k$ = P{Detection of occult cancer, given that the
  disease is in stage k}
They considered a screen at time $t_1$, and denoted the time immediately
after the screen by $t_1^+$; then
  $P_{i_D}(t_1^+) = P_{i_O}(t_1)\, a_{i_O} + P_{i_D}(t_1)$
where $i_O$, $i_D$ correspond to occult and detected stages of the disease.
Then
  $P(t_1^+) = P(t_1)\, A$
where
  $A_{ij} = \begin{cases} 1 & \text{for } i=j \text{ and } i \text{ corresponds to a detected stage of the disease} \\ 1-a_i & \text{for } i=j \text{ and } i \text{ corresponds to an occult stage of the disease} \\ a_i & \text{if } i \text{ and } j \text{ correspond respectively to occult and detected stages of the disease} \\ 0 & \text{otherwise} \end{cases}$
Therefore for time $t_2$
  $P(t_2) = P(t_1^+)\, B(t_1^+,t_2)$
  $\qquad\; = P(t_1)\, A\, B(t_1,t_2)$
and so on.
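The effect of a single screen on the state probabilities can be sketched with the six-state curable/fatal staging. The code below is an illustration under stated assumptions: the state probabilities and detection probabilities are hypothetical, the helper names are invented here, and the healthy, detected and death states are assumed to be left unchanged by a screen so that total probability is preserved.

```python
STATES = ["H", "P", "P'", "F", "F'", "M"]
OCCULT_TO_DETECTED = {"P": "P'", "F": "F'"}   # occult stage -> its detected stage


def screening_matrix(detect_prob):
    """Build A so that (row vector p) * A gives the probabilities just after a screen."""
    n = len(STATES)
    idx = {s: k for k, s in enumerate(STATES)}
    A = [[0.0] * n for _ in range(n)]
    for s in STATES:
        i = idx[s]
        if s in OCCULT_TO_DETECTED:
            a = detect_prob[s]
            A[i][i] = 1.0 - a                        # missed by the screen
            A[i][idx[OCCULT_TO_DETECTED[s]]] = a     # moved to the detected stage
        else:
            # ASSUMPTION: healthy, detected and dead states are unaffected.
            A[i][i] = 1.0
    return A


def apply_screen(p, A):
    """Row-vector times matrix: p_after[j] = sum_i p[i] * A[i][j]."""
    n = len(p)
    return [sum(p[i] * A[i][j] for i in range(n)) for j in range(n)]


# Hypothetical state probabilities just before the screen, in STATES order.
p_before = [0.90, 0.05, 0.00, 0.03, 0.00, 0.02]
A = screening_matrix({"P": 0.8, "F": 0.9})
p_after = apply_screen(p_before, A)
print(dict(zip(STATES, p_after)))
```

Between screens the same row vector would be advanced by the transition matrix B, so a whole schedule is a product of alternating B and A matrices.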
They examined their model using the following three measures:
1) Oncological: More effective strategies are characterized by
proportionately higher detection in earlier stages.
2) Medical costs: This includes the cost of the screening program
itself, the cost of diagnosis associated with false positives, the
cost of treatment and the costs attributed to the period of disability
of a patient.
3) Life expectancy: More effective strategies are characterized
by longer expected length of time prior to death.
D.E. Thompson and R. Disney [98] introduced a general mathematical
model of progressive diseases and screening. Conceptually, the disease
history could be represented as progressing through a series of stages
whose durations are random variables and whose meaning could be inter-
preted in terms of the individual's prognosis and the sensitivity of the
disease to detection in that stage through clinical surfacing and/or
screening. Their model was a mathematical model of the interaction of
two independent random processes, namely, the disease process and the
screening process. The purpose of the disease portion of the model was
to predict a person's status at any age t. The model can compute the
probability that an individual is healthy or in some stage of the disease
and, if he has had the disease, the length of time he had been in the
particular stage.
They assumed that the screening methods could produce false negative
or correct results, and the probability of these results could depend
on the stage of the disease, the length of time in the current stage and
the particular method used for screening. In their model, as soon as
an individual dies of other causes or is detected as having the disease,
the process terminates. Moreover, the time from birth to death from other
causes is assumed to be a random variable independent of the disease
process. This assumption makes the analysis much simpler, because it
makes it possible to model the disease portion as a semi-Markov process
whose transition matrix is independent of the age of the person, and so
allows them to use the existing theory of semi-Markov processes.
Let E = {0,1,2,...,S,S+1,...,N} be the state space of the disease.
State 0 indicates absence of disease, and states 1 through S indicate
that the individual is in one of the S occult stages of the disease.
States S+1 through N indicate that the disease has surfaced clinically
or been detected by screening. Let $Y_t$ be the state of the individual,
given that he is alive at age t, and let
$T_n$ = age of the individual when he changes stage for the
n-th time
$X_n$ = Length of time that an individual will spend in the
state he entered when he was $T_{n-1}$ years old.
They assume that everybody starts life free of disease. This is
not generally true (for example in the case of Neuroblastoma), and the
analysis can be modified to take care of this possibility.
Thompson and Disney propose the following assumption:
  $P\{Y_n = j,\ X_n \le x \mid Y_{n-1} = i,\ Y_{n-2},\ldots,Y_0,\ X_{n-1},\ldots,X_0\}$
  $\qquad = P\{Y_n = j,\ X_n \le x \mid Y_{n-1} = i\} = A_{ij}(x)$
where
  $P(Y_0=j) = \begin{cases} 1 & \text{if } j=0 \\ 0 & \text{if } j \ne 0 \end{cases}$
  $P(X_0=0) = 1$
and compute the probability of the event
  $B = \{Y_t = j,\ U_t > x,\ V_t > y\}$
where $V_t$ and $U_t$ are the forward and backward recurrence times of the
process.*
They define the quantity $Q_t(j,x)$ to be $Q_t(j,x) = P\{Y_t = j,\ U_t \le x\}$,
and consider the screen given at the time $T_m = t$. Letting $Q_{t^+}(j,x)$
denote the distribution of $(Y_t,U_t)$ immediately after this screen, they
evaluate $Q_{t^+}(j,x)$ as a function of $Q_t(j,x)$. So given the initial condition
$(Y_0 = 0,\ U_0 = 0)$ at birth, the equation for $P\{Y_t = j,\ U_t \le y \mid Y_s = i,\ U_s = x\}$
can take the distribution of $(Y_t,U_t)$ up to the time of the first screen
at age $T_1$. Then $Q_{t^+}(j,x) = f[Q_t(j,x)]$ modifies this distribution at
the time of the first screen, as individuals are taken from the occult
part of the disease to the detected states. A similar analysis is
applicable for the interval $T_1$ to $T_2$ and to the screen given at age
$T_2$, etc.
This model is restricted to the case of diseases which are
stationary in time and are not age dependent. Unfortunately, the
literature on most cancers shows that there is a significant
difference in relative survival statistics between young and old
individuals [5]. Therefore, in those cases this model has to be modified to
take care of the time dependence of the disease.
* The forward recurrence time at age t is defined as the interval from
age t to the epoch of the next state change of $\{Y_t\}$, that is,
$V_t = T_n - t$ if $T_{n-1} \le t < T_n$. The backward recurrence time
at age t is $U_t = t - T_{n-1}$ if $T_{n-1} \le t < T_n$.
Albert and Louis [2,3,66] have done very broad research on the
subject of screening of progressive diseases in the last two years.
In their first paper [2] on screening for the early detection of cancer,
they have characterized the natural history of a chronic disease state
in terms of the distribution of X (a person's age at the time of entering
the disease state), Y (the sojourn time in that disease state), and
A (a person's present age) over a population of individuals. Then,
they have defined age specific incidence, prevalence, life time attack
rate, mean duration of the disease state, cohort effect, etc., in terms
of the joint distribution (X, Y, A), which for known (X, Y, A) gives a
method for estimation of those parameters. In their second and third
papers [3,66] they have found the impact of screening on the natural
history of the disease and presented a method for estimation of the
disease natural history. A brief review of their work follows.
They define
1) fXYA(.,,. ;t) = joint distribution for (X,Y,A) at any instant t.
2) A person is nonsusceptible if X = .
3) A person is a chronic habitue of S if, for that person
X<- and Y = .
** proportion of chronic habitue = Pr.{X<-, Y = -}
4) A cohort effect is said to exist if the distribution of
(X,Y) varies over age strata.
5) The lifetime attack rate of disease state S is
P [X < m] ;
P [X < =] = P [x< Y = +] + fOXYA(x,y,a) dx dy da
6) IS(a) = The age specific incidence of S among those aged a:
I(a) = fX,A(a,a)/fA(a)= fXJA (ala)
7) IS = The overall incidence of S:
** I= (a) fA(a) da
8) 45(a) = The age specific prevalence of the disease state
among those aged a.
a
S s(a) x= a-x fXY/A(xy/a) dy dx
9) bS = The overall prevalence of S:
** S= S(a) f fA(a) da
10) If there is no cohort effect and if E [XjX
then
= f s(a) da
E [YIX<] = 0 S(a) da
F I(a) da
This equation is a generalization of d=P/I used by other investigators
[29,49,116]. They define S1 to be the state which is entered upon
leaving S, and show that IS = Is, which is used by other researchers,
is true if the age distribution is flat and there is no chronic habitue.
They also present a method for construction of (X,Y) distribution
from prevalence and incidence information.
If there is no cohort effect
fx(x) = W I () + dI S(x)
If there is no cohort effect and X and Y are independent, then
IS (s)
[s-s(s)+TS (s)]
where $\tilde f$, $\tilde I$ and $\tilde \phi$ are Laplace transforms.
Then using counter examples they prove that without the assumption of
independence (between X and Y), the Y distribution is not unique.
Since Y denotes the preclinical latency and $X_0$ denotes the age
at the time of entering the disease state, the patient surfaces at the
instant $X_0+Y$. Now consider a population of individuals at
a certain instant of time, t. Each person has an associated vector of
sojourn times $Z = (X_0, X_1, \ldots, X_k, Y)$. This, plus the person's age A(t),
describes the natural course of the disease in that individual. Denote
the density of (Z, A(t)) over the study population by $f^{(t)}_{Z,A}$ and the total
population size by N(t).
In the absence of screening, they allow people to leave the study
population for two reasons:
a) Death from competing risk
b) By reason of surfacing with the type of cancer under study
and assume that
a) A clinically surfaced individual leaves the study population
forever,
b) False positives are eventually discovered and returned to the
study population. They actually take this probability to be zero.
Then if $n^{(t)}(Z,a)\,da\,dZ$ is the number of individuals in the population
at time t that occupy the cell $(Z,Z+dZ) \times (a,a+da)$, they show that
$$\frac{\partial n^{(t)}}{\partial t}(Z,a+t) = M(t)\,\dot f^{(t)}_{Z,A}(Z,a+t) - \rho^{(t)}(Z,a+t)\,n^{(t)}(Z,a+t)$$
provided $0 < a+t < X_0+Y$,
and of course in the complementary region, $n^{(t)}(Z,a+t) = 0$.
Here
M(t) is the immigration rate,
$\dot f^{(t)}_{Z,A}(\cdot)$ is the joint density of Z and A among those who immigrate
in at time t,
and
$$\rho^{(t)}(Z,a) = r(Z,a,t)\,s(Z,a) + d(Z,a,t)$$
where
r(z,a,t) is the screening rate at time t in the stratum Z=z,
A(t)=a, s(z,a) is the probability of a positive screen if Z=z and A(t)=a,
and d(z,a,t) is the death rate at time t in the stratum Z=z and A(t)=a.
Therefore $\rho^{(t)}(Z,a)$ is the instantaneous net rate of removal of
individuals from the above-mentioned stratum.
The solution to the above differential equation is
$$n^{(t)}(z,a) = \begin{cases} \left[n^{(0)}(z,a-t) + K^{(t)}(z,a-t)\right] \exp\left[-Q^{(t)}(z,a-t)\right] & \text{if } 0 < a < X_0+Y \\ 0 & \text{otherwise} \end{cases}$$
where
$$Q^{(t)}(z,a) = \int_0^t \rho^{(u)}(z,a+u)\,du$$
and
$$K^{(t)}(z,a) = \int_0^t M(v)\,\dot f^{(v)}_{Z,A}(z,a+v)\,\exp\left[Q^{(v)}(z,a)\right]dv$$
To find the joint density of Z and A in the study population at
time t, $n^{(t)}(z,a)$ should be normalized:
$$f^{(t)}_{Z,A}(z,a) = \frac{n^{(t)}(z,a)}{\int\!\!\int n^{(t)}(z,a)\,dz\,da}$$
Therefore the effect of screening on $f^{(t)}_{Z,A}(\cdot)$ can be computed, which gives
a means to predict the temporal behavior of epidemiologic parameters in
the presence of screening.
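The closed-form solution above is the usual integrating-factor solution of a first-order linear differential equation. As a rough check, the sketch below follows a single stratum along its characteristic with a constant inflow and a constant removal rate (both constants are assumptions made only for illustration, as are all names and values) and compares the closed form with a simple Euler integration.

```python
import math

def closed_form(n0, inflow, removal_rate, t):
    """Integrating-factor solution of dn/dt = inflow - removal_rate * n.

    Q(t) is the integrated removal rate and K(t) the inflow accumulated
    against exp(Q); both reduce to elementary expressions for constants.
    """
    q = removal_rate * t
    k = inflow * (math.exp(removal_rate * t) - 1.0) / removal_rate
    return (n0 + k) * math.exp(-q)

def euler(n0, inflow, removal_rate, t, steps=100_000):
    """Forward-Euler integration of the same equation, for comparison."""
    n = n0
    dt = t / steps
    for _ in range(steps):
        n += dt * (inflow - removal_rate * n)
    return n

# Hypothetical stratum: 1000 individuals initially, 20 immigrants per unit
# time, combined screening-removal and death rate of 0.05 per unit time.
exact = closed_form(1000.0, 20.0, 0.05, 10.0)
approx = euler(1000.0, 20.0, 0.05, 10.0)
```

With time-varying rates the two integrals would have to be evaluated numerically, but the structure of the solution is unchanged.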
In order to answer the question of "what is a better strategy of
screening?" they introduce the concept of "critical point." Treatment
that is begun before this time point has a relatively high probability
of success, whereas treatment begun after this point has a markedly
lower probability of success. Their objectives are
1) The discovery rate of less-favorable-prognosis disease,
$I_f(t)$; $I_f(t)\,dt$ is the expected number of cases with less favorable prognosis
that are diagnosed in the interval (t, t+dt).
2) The salvage rate s(t); s(t) dt is the expected number of cases
discovered by screening in (t, t+dt), who have favorable prognosis, but
have Y> X> .
The following data are required as input to their computations.
1) Age-specific death rate, d(a,t)
2) Immigration rate, M(t)
3) Initial population size, N(0)
4) Age-specific screening rate, r(a,t), as a function of time
5) The screening detectability function, f(z,a)
6) The initial distribution for $(X_0, X_1, Y, A)$: $f^{(0)}_{Z,A}(z,a)$
7) The (possibly time-varying) distribution for $(X_0, X_1, Y, A)$
among immigrants: $\dot f^{(t)}_{Z,A}(z,a)$
In their third paper [66] Louis, Albert and Heghinian present a
nonparametric method for estimation of $f_{XYA}(\cdot,\cdot,\cdot\,;t)$. To observe the
accuracy of their estimates, they generated data, used it to find the
estimates of certain epidemiologic parameters and compared these
estimated values with their corresponding theoretical values and the
usual epidemiologic estimates. Several points should be noted about
their work:
1) The method of staging employed is very restrictive, and the fact
that there is a unique passage from $S_0$ to $S_1$ ... to $S_k$ makes it impossible
to apply their model to diseases such as neuroblastoma where,
from $S_3$, the individual can go to either $S_4$ or $S_5$.
2) Their objective function is not strong enough to use all
information on survival. In fact it picks a single point, the critical
point $t_c$, and weights every point $t > t_c$ with
value zero.
3) The effect associated with immigrants is not well defined and,
since the distribution is assumed to be known, in most cases it decreases
the accuracy. It would be much better to neglect its effect than to
bring it in at an unknown level of effectiveness. They mention that 20%
immigration did not change the screening policy significantly. This is
why other investigators neglect its effect and assume a closed population.
Having reviewed the literature on the topic of screening processes,
we observe that the main issues on this complicated subject are
1) An understanding of the natural behavior of the disease
2) Presentation of a method for estimation of the disease
parameters.
3) Investigation of the impacts of screening procedures on
performance measures of interest such as the probability of detection,
the probability of recurrence and the probability of survival (in an
individual).
4) Investigation of the impacts of screening on the whole society.
No single model has been able to answer all these issues rigorously. Some
unanswered questions are related to the last subject. Most investigators
have tried to find the benefits of lead time on the total population.
It has been claimed that those who are detected by the screening pro-
cedure have a longer preclinical time and are considered to be slow
developing disease individuals. Therefore any measurement solely based
on this group would be biased and determination of the benefit of lead time
for the general population would be imperfect.
From a modeling point of view, the most general models are those
of Lincoln and Weiss [65,106], Klein and Kirch [52,53], Zelen and Feinleib
[29,113,114,115,116], Bross and Blumenson [10,11,14,15], Prorok [78,79],
Galliher [35], Shwartz [88,89,90,91,92,93], Thompson, Disney and Doyle
[98,99] and Albert [2,3,66]. These models cover a variety of methods,
some of which are similar in the direction of their conclusions but some
are not. The structure of the model developed in this research is similar
to the models developed by Albert, Thompson and Disney, Shwartz and
Galliher but the specific formulation and general conclusions differ.
The following pages summarize the material reviewed in this chapter.
[Summary table of the reviewed screening models; not legible in this copy.]
CHAPTER THREE
MODEL DEVELOPMENT
In this chapter, a general model of cancer screening will be
developed which incorporates the stochastic nature of the disease
process with the viewpoint common to most of the cancer literature:
that the cancer disease process can be represented as progressing through
a series of stages whose durations are random variables.
3.1 A General Model of Cancer Screening
The literature review reveals that almost any site specific cancer
could be represented as a process that progresses through a series of
stages [16,27,35,85]. The staging of the disease can be done in
alternative ways, and there has been a continual argument concerning
the choice of one method over another. Staging of cancer found its
initial importance in reporting the end results. The "American Joint
Committee on Cancer Staging and End Results Reporting" was organized on
January 9, 1959, to develop a system of clinical staging of cancer by sites,
acceptable to the American medical profession. The committee has completed
and published different brochures for clinical staging of different
site specific cancers. The main objective for clinical staging, to be
useful, is that it should be relatively simple and should yield mean-
ingful information as to prognosis. In the case of modeling a specific
cancer, the choice of one method of staging over another depends on
1) The availability of needed information. There are cases where
a staging method seems to be easy to apply, but there are no data to
support it. In such a case, data must be gathered by employing the
staging method under consideration which needs time and the cooperation
of different investigative groups.
2) The complexity of staging versus the usefulness of the results.
It might be possible to combine data in different manners and design
different staging methods, in which case there is a trade-off between
complexity of the staging, accuracy of the result and its usefulness.
In the model developed, the disease process is looked upon as if
it passes through a series of stages before the individual is clinically
surfaced by coming to medical attention through signs and symptoms. At
any time, the disease process may be terminated by death from cancer or
from causes other than cancer. Throughout this research, attention will
be restricted to the disease process and the results will be conditioned
on the individual's surviving to the age of interest. The model makes it
possible to predict an individual's status at any desired age. It will
be possible to compute the probability that a person is free of disease
and, if he has the disease, the stage he might be in.
Then the interaction of the disease process with a random process
called the screening process will be considered. The screening process
is a process in which the individual who is suspected of having the
disease is examined by one or more of several screening methods at
specific points in time. Based on the result of the screening procedure,
the person under consideration will be categorized as "free of disease"
or "having the disease." In the first case he will not be considered
until the next scheduled examination time. In the second case more
specific examinations are performed to verify initial results and
identify the stage of disease. To allow for all possible situations,
screening methods are permitted to produce false negative and false
positive results. Under a false negative result, the individual is
classified as healthy, whereas in reality he is in some stage of the
disease. Under a false positive result, the individual is classified
as being in some stage of the disease, whereas he is healthy. This
introduces some extra costs, due to additional examinations necessary,
but eventually it will confirm that the individual is healthy. In the
development of the model it will be assumed that the screened population
is closed. There is no allowance for emigration from the population
under consideration, and death from other causes is independent of the
death from cancer. A competing risk approach can be used to tie these
two processes together.
3.2 The Disease Process
Data analysis [5,9] shows that for several cancer sites of interest
in this research the relative survival statistics are strongly age-
dependent and that the distribution of the time from detection of the
disease to death from the disease depends upon the individual's age at
the time the disease is detected. Therefore, the age of an individual
is incorporated as a dimension in the state space of the model. The
basic elements of the model are the age of the individual and the stage
of the disease he is in.
Suppose that an individual of age t can be in one of $(2N+3) < \infty$
states with respect to a given disease. Let
$$S = \Big\{ \bigcup_{t \in T} [0_t, 1_t, \ldots, N_t, 1'_t, 2'_t, \ldots, N'_t],\ 0',\ D \Big\}, \qquad T = \{0, 1, 2, \ldots, T_{\max}\}$$
be the state space of the disease, where T is the set of discrete times
which are the ages of interest in ascending order and $T_{\max}$ is the
maximum age under consideration. Depending on the nature of the disease,
t could be in weeks, months, quarters or years, and $T_{\max}$ should be large
enough to cover screening of any individual carrying the disease.
State $0_t$ is assumed to indicate that the individual of age t is
free of disease. States $1_t$ through $N_t$ indicate that the individual of
age t is in one of N occult stages of the disease. States $1'_t$ through
$N'_t$ indicate that the disease has surfaced clinically or been detected
by examination for an individual of age t, when he was in one of
stages 1 through N, respectively. States 0' and D indicate that the
individual has been "cured" or "died," respectively. In Figure 2,
the stages of the disease are depicted schematically.
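The state space just described can be enumerated directly. The following is a minimal sketch, assuming states are represented as (label, age) pairs with a prime in the label marking a detected stage; this representation and the function name are illustrative choices, not part of the model's definition.

```python
def state_space(n_stages, t_max):
    """Enumerate the disease states for N occult stages and ages 0..t_max.

    Each age contributes the healthy state, N occult stages and N detected
    stages; the cured state 0' and the death state D carry no age index.
    """
    states = []
    for t in range(t_max + 1):
        states += [(str(n), t) for n in range(n_stages + 1)]        # healthy and occult stages
        states += [(f"{n}'", t) for n in range(1, n_stages + 1)]    # detected stages
    states += [("0'", None), ("D", None)]                           # absorbing states
    return states

def states_available_at(states, age):
    """States an individual of a given age can occupy, including 0' and D."""
    return [s for s in states if s[1] == age or s[1] is None]

example = state_space(n_stages=3, t_max=4)
```

For N = 3 an individual of any fixed age has 2N + 3 = 9 possible states, while the whole space over ages 0 to 4 has (2N + 1)(T_max + 1) + 2 = 37 states.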
An individual may start life healthy and remain in that state
until he dies of other causes, or he might start life healthy and develop
the disease sometime during the course of his life. Then he will be classified
as in one of the stages 1 through N. The disease may progress from
one state to another until he is surfaced clinically (through signs
or symptoms) or is detected by screening. Then he goes from state n
($1 \le n \le N$) to state n'. In this simplified model, regardless of the
progression of the disease in the detected stages, the individual is
assumed to die from the disease or be cured. That is, depending on the
age of the individual and the stage he was detected in, whether or not
the disease progresses through the detected states, he has a certain
chance of going to states 0' and D. In Figure 3, the transitions that
are permitted in the model are illustrated in more detail.
Figure 2: A schematic diagram of the stages of the disease.
Figure 3: Structure of the disease.
There is no restriction on the condition at birth, which means
that a newborn infant could be healthy (with respect to the special
disease under consideration) or possibly in any one of the occult stages.
This modification plus "time dependency" of the disease would be
necessary in the case of some kinds of site-specific cancers such as
neuroblastoma.
At any time the disease process can be terminated by death from
causes other than the disease, therefore the attention is restricted to
the disease process and the results are conditioned on the individual
surviving to the age of interest. For modeling purposes, the possibility
of death from the disease prior to recognition of the disease is included
in the model with clinical surfacing from the disease. This is because
death from the cancers is unlikely prior to clinical surfacing of the
disease. Therefore, there is no one step transition from occult stages
of the disease to the state "death from the disease." The process
terminates when the individual dies of any cause or is cured through
the process of treatment. The re-examination of the individual after
the treatment is done will not be considered. This is because the risk
of a second cancer is usually high enough that screening of former
patients cannot be considered routine screening.
Let $X_t$ be the state of the individual of age t, given that he is
alive. For instance, $2_{10}$ would mean that a person 10 periods (weeks, months, ...)
old is in stage 2 of the disease. Let the disease process be
tracked only at the end of fixed intervals of time. Conceptually, the
random process $\{X_t\}$ can be pictured as a process that goes from one state
to another. As soon as a new state is reached, the process randomly
chooses the next stage to be visited. After choosing this next stage,
and depending on which stage the process is currently in and will next
enter, the process randomly chooses a time required to make that
transition. Therefore the resulting process $\{X_t\}$ is a semi-Markov
process and the sequence $\{X_n\}$ is a discrete-time Markov
chain. The one-step transition probability
$$P_{ij} = P\{X_{n+1} = j \mid X_n = i\}$$
has a very special structure, which consists of upper diagonal blocks.
This can be seen from Figure 4.
It is seen that the only possible transition for an individual
of age t is to go to age t+1, regardless of progression of the disease.
To clarify the process, a sample function of the disease process is shown,
in which the individual is detected in stage 2 at age 6 and gets cured.
See Figure 5.
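A sample function of this kind can be generated with a short simulation. The sketch below is only illustrative: the transition probabilities are made-up constants (the model itself lets them depend on age and stage), and the state labels are a hypothetical encoding in which integers are the healthy and occult stages and strings such as "2'" are detected stages.

```python
import random

N = 2        # hypothetical number of occult stages
T_MAX = 9    # hypothetical maximum age, in periods

def simulate_path(rng, p_onset=0.3, p_progress=0.3, p_detect=0.2, p_cure=0.5):
    """Simulate one individual's (state, age) path, one period per step."""
    state, path = 0, []
    for age in range(T_MAX + 1):
        path.append((state, age))
        if state in ("0'", "D"):          # cured or dead: the process terminates
            break
        u = rng.random()
        if state == 0:                    # healthy: possible onset of stage 1
            state = 1 if u < p_onset else 0
        elif isinstance(state, int):      # occult stage n: detect, progress or stay
            if u < p_detect:
                state = f"{state}'"
            elif u < p_detect + p_progress and state < N:
                state += 1
        else:                             # detected stage n': cured or dies
            state = "0'" if u < p_cure else "D"
    return path

path = simulate_path(random.Random(1))
```

Every step moves the individual from age t to age t+1, which is what produces the upper-diagonal block structure of the one-step transition matrix.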
3.3 The Screening Process
The disease process goes from one stage to another until it is
clinically surfaced through the nature of the disease itself. If the
disease is a time-dependent process, it usually surfaces at a stage
in which there is a relatively poor chance of survival. Therefore,
there is an interest in detecting the disease in its early stages of
progress, which can be done through a screening process. The screening
process is a process in which, at specific points in time, the individual
who is suspected of having the disease is examined by one or more of
several screening methods. In order to screen for a disease, one needs
a simple, rapid, relatively accurate test to select from a general
population those persons who would benefit from further diagnostic
studies [45]. A screening method has several characteristics, which in
general make it possible to choose one method over another, if any one
is to be chosen. For instance, a screening method may be harmful [89]
because of the unreliability of the technique used, resulting in
increased psychological trauma because of high false-positive rates,
perhaps decreased surveillance and an unwarranted sense of security
because of high false-negative rates, and possible
deleterious side effects of screening. In general, an ideal screening
test must have an acceptable false-negative rate and false-positive
rate.
Figure 4: One step transition probability diagram.
Figure 5: A sample function of the disease process.
An individual should be screened under a screening policy which
determines the ages at which the individual is to be screened, and the
method which is going to be used for screening. Assume that there are
M different examination methods and that screening examinations are given
at ages $T_1 < T_2 < \cdots$. At most
one screening test is done in each time interval, and the screening
examination for the period (t,t+1) is done at $t^+$, which is a time
very close to t but greater than t. A sample function of the process
with and without screening is shown in Figure 6.
It is assumed that when an individual in state i ($1 \le i \le N$)
is screened he will stay in state i (if the test gives a false-negative
result) or go to state i' (if the test gives a correct result), and
therefore there would be no error in stage recognition of the screening
method. In the following material, the effect of screening on the
process will be determined.
Let $b_{it}$ = Probability of transition from stage i to i'
in the t-th interval with screening.
$b'_{it}$ = Probability of transition from stage i to i'
in the t-th interval without screening.
Figure 6: A sample function of the process with and
without screening.
Define $f_i(t)$ to be the probability that a person who has been in state i
at age t is properly classified as diseased. Then screening will affect
$b_{it}$ in the following manner:
$$\begin{aligned} b_{it} ={} & P[\text{test is done in the t-th interval}] \cdot P[\text{individual being in state i at the time of test}] \cdot P[\text{test gives correct result}] \\ & + P[\text{test is done in the t-th interval}] \cdot P[\text{test gives false-negative result}] \cdot b'_{it} \\ & + P[\text{test is not done in the t-th interval}] \cdot b'_{it} \end{aligned}$$
Define $Z_t$ as a zero-one variable, which takes value one if the test is
done in the t-th interval and zero otherwise, i.e.,
$$Z_t = \begin{cases} 0 & \text{if test is not done in the t-th interval} \\ 1 & \text{if test is done in the t-th interval.} \end{cases}$$
Then
$$b_{it} = P[Z_t = 1]\, f_i(t)\, P[X_{t^+} = i \mid X_t = i] + \left\{ P[Z_t = 0] + P[Z_t = 1]\,(1 - f_i(t)) \right\} b'_{it}$$
$$b_{it} = Z_t\, f_i(t)\, P[X_{t^+} = i \mid X_t = i] + \left\{ (1 - Z_t) + Z_t\,[1 - f_i(t)] \right\} b'_{it}$$
$$b_{it} = Z_t\, f_i(t)\, P[X_{t^+} = i \mid X_t = i] + [1 - Z_t\, f_i(t)]\, b'_{it}$$
Later, the system will be assumed to be discrete so that the process
can jump only at the end of a discrete interval. This assumption is
equivalent to saying
$$P[X_{t^+} = i \mid X_t = i] = 1$$
in which case $b_{it}$ reduces further to
$$b_{it} = Z_t\, f_i(t) + [1 - Z_t\, f_i(t)]\, b'_{it}$$
This means
$$b_{it} = \begin{cases} b'_{it} & \text{if screening is not done in the (t,t+1) interval} \\ f_i(t) + [1 - f_i(t)]\, b'_{it} & \text{if screening is done in the (t,t+1) interval} \end{cases}$$
A simple way to see this is by the use of the following argument:
$$b_{it} = \begin{cases} b'_{it} & \text{if test is not done} \\ 1 & \text{if test is done and gives true-positive result} \\ b'_{it} & \text{if test is done but gives false-negative result.} \end{cases}$$
But the following probabilities are associated with the above events:
P[test not done in the (t,t+1) interval] = $1 - Z_t$
P[test is done in the (t,t+1) interval and gives true-positive
result] = $Z_t\, f_i(t)$
P[test is done in the (t,t+1) interval but gives false-negative
result] = $Z_t\,[1 - f_i(t)]$
Therefore
$$b_{it} = \begin{cases} b'_{it} & \text{with probability } (1 - Z_t) \\ 1 & \text{with probability } Z_t\, f_i(t) \\ b'_{it} & \text{with probability } Z_t\,[1 - f_i(t)] \end{cases}$$
$$b_{it} = b'_{it}\,(1 - Z_t) + 1 \cdot Z_t\, f_i(t) + b'_{it}\, Z_t\,[1 - f_i(t)]$$
$$\Rightarrow\ b_{it} = Z_t\, f_i(t) + [1 - Z_t\, f_i(t)]\, b'_{it}$$
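The three-event argument can be written out as a small computation. This is a minimal sketch with hypothetical function and argument names: `z_t` is the zero-one screening indicator, `f_it` the probability of a correct classification, and `b_nat` the detection probability without screening.

```python
def detection_prob_by_events(z_t, f_it, b_nat):
    """Average the detection probability over the three screening events."""
    events = [
        (1 - z_t, b_nat),              # test not done: natural detection only
        (z_t * f_it, 1.0),             # test done, true positive: detected for sure
        (z_t * (1 - f_it), b_nat),     # test done, false negative: natural detection only
    ]
    return sum(prob * value for prob, value in events)

def detection_prob_closed_form(z_t, f_it, b_nat):
    """Closed form: Z_t f_i(t) + [1 - Z_t f_i(t)] times the natural probability."""
    return z_t * f_it + (1 - z_t * f_it) * b_nat

screened = detection_prob_by_events(1, 0.8, 0.1)
unscreened = detection_prob_by_events(0, 0.8, 0.1)
```

Both functions agree for either value of the indicator, and without screening the result falls back to the natural detection probability.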
A similar argument can be employed in the case of other elements of
the transition probability matrix. Define
$a'_{ijt}$ = Transition probability of going from i to j in the t-th
period without screening
$a_{ijt}$ = Transition probability of going from i to j in the t-th
period.
$$a_{ijt} = a'_{ijt}\,\{\text{probability that the test is not done}\} + a'_{ijt}\,\{\text{probability that test is done}\} \cdot (\text{probability that test gives false-negative result})$$
$$a_{ijt} = a'_{ijt}\,\left\{ P[Z_t = 0] + P[Z_t = 1]\,[1 - f_i(t)] \right\}$$
$$a_{ijt} = a'_{ijt}\,\left\{ (1 - Z_t) + Z_t\,[1 - f_i(t)] \right\}$$
$$a_{ijt} = a'_{ijt}\,[1 - Z_t\, f_i(t)]$$
A simple way to see this is by the use of the following argument:
$$a_{ijt} = \begin{cases} a'_{ijt} & \text{if the test is not done in the (t,t+1) interval} \\ 0 & \text{if the test is done in the (t,t+1) interval and gives true-positive result} \\ a'_{ijt} & \text{if the test is done in the (t,t+1) interval but gives false-negative result.} \end{cases}$$
Therefore
$$a_{ijt} = \begin{cases} a'_{ijt} & \text{with probability } (1 - Z_t) \\ 0 & \text{with probability } Z_t\, f_i(t) \\ a'_{ijt} & \text{with probability } Z_t\,[1 - f_i(t)] \end{cases}$$
This gives
$$a_{ijt} = a'_{ijt}\,[1 - Z_t\, f_i(t)]$$
Let $U'_{it}$ = Probability of staying in state i in the t-th period
without screening
$U_{it}$ = Probability of staying in state i in the t-th period.
The same analysis reveals that
$$U_{it} = U'_{it}\,[1 - Z_t\, f_i(t)]$$
This analysis shows that all elements of the one-step transition
probability are affected by screening in the following manner:
Probability element        Notation      No screening in     Screening done in
                                         (t,t+1) interval    (t,t+1) interval
Probability of detection   $b_{it}$      $b'_{it}$           $f_i(t) + [1 - f_i(t)]\, b'_{it}$
Probability of stay        $U_{it}$*     $U'_{it}$           $U'_{it}\,[1 - f_i(t)]$
Probability of jump        $a_{ijt}$*    $a'_{ijt}$          $a'_{ijt}\,[1 - f_i(t)]$
* Note that $U_{0t}$ and $a_{01t}$, which are related to the "Healthy" state, are not
dependent on the screening procedure and regardless of the screening method
used, they will remain unchanged.
It is seen that $U_{it}$ and $a_{ijt}$ are decreased and $b_{it}$ is increased through
the process of screening. Later on, in this research, this analysis will
be employed to develop a general form for the probability transition
matrix.
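The adjustments summarized above can be applied to a whole row of natural transition probabilities at once. The sketch below uses hypothetical names and values, and assumes the natural detection, stay and jump probabilities of an occult state sum to one; it shows that the screened row still sums to one.

```python
def apply_screening(b_nat, u_nat, a_nat, f_it, screened):
    """Return the (detection, stay, jumps) probabilities after screening.

    b_nat, u_nat and a_nat are the detection, stay and jump probabilities
    without screening; f_it is the probability that the test classifies a
    diseased individual correctly; screened is True if a test is given.
    """
    z_t = 1 if screened else 0
    keep = 1 - z_t * f_it                  # factor left for the natural process
    b = z_t * f_it + keep * b_nat          # detection probability increases
    u = keep * u_nat                       # probability of staying decreases
    a = [keep * a_j for a_j in a_nat]      # jump probabilities decrease
    return b, u, a

# Hypothetical natural row for one occult state: detection, stay, two jumps.
b, u, a = apply_screening(b_nat=0.1, u_nat=0.6, a_nat=[0.2, 0.1], f_it=0.8, screened=True)
row_sum = b + u + sum(a)
```

The row sum is preserved because the probability mass removed from staying and jumping, a fraction f_i(t) of each, is exactly what is added to detection.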
3.4 Estimation of Transition Probabilities
Any model has certain associated parameters, the realistic estimation
of which should be a primary goal of the analyst. In the model employed
in this research there are four types of parameters.
1) Probability of staying in the occult stage i in the
interval (t,t+l).
2) Probability of going from stage i to j (in the occult part
of the disease) in the interval (t,t+l).
3) Probability of detection of an individual in state i at age t.
4) Probability of survival of an individual in state i at age t.
The probability of going from i to j includes onset of the disease
as a special case. In order to have a realistic estimate of these para-
meters, data available in the literature should be employed consistently.
The survival probabilities are given in the literature on cancer and a
realistic estimate of these probabilities for each stage and age group
is not that difficult (although lead time effects are difficult to
estimate), but the first three sets of parameters mentioned above are
extremely difficult to estimate. This is due to the fact that there is
no consistent data on the occult part of the disease, and there is no
simple way to estimate those parameters. It is known [35,89,98,99]
that the unavailability of data on the occult stages is due to the fact
that almost all detected individuals go under treatment as soon as they
are screened and found to be in some stage of the disease. It then
remains to clarify what the statistics presented in the
literature, say the percentage of people in each age and stage
group, actually describe. A close look at those data reveals that they concern the
detected stages of the disease, because there is no way to gather
data unless a diseased individual is detected. For instance, assume
there are data indicating that a% of people who eventually get the
disease are found in state i at age t. The only way to interpret these
data is that in the long run, assuming no trend of the disease in time,
the steady state probability of being in state i' at time t is a.
If these data are available for a system in which there has been
no scheduled screening policy in the process of data collection, then
this information would be very useful in estimation of the parameters
of the model, i.e., $a_{ijt}$, $b_{it}$ and $U_{it}$. To do this, it is necessary to
hypothesize a model structure for the occult part of the disease and
then check the model's output with data. The model which gives the
closest match between output and data would be the one which has more
chance of representing the actual phenomenon of the disease in the
population.*
The following approach is employed in estimation of the model's
parameters. It is decided to use the data as the probabilities of
being in the detected states (call them P[Xt=i'])** and compare them with
* In the use of this distribution, it is assumed that irrespective of
the screening time and interval, the population probability of natural
detection remains unchanged.
** In this development the prime denoting the probability of natural
detection has been dropped. It will be reintroduced later.
the corresponding theoretical values. Due to the dependence of the
process on the initial state occupied and time dependency of the disease,
there is no long-run distribution. Therefore another concept will be
employed in which $\lim_{n \to \infty} P^{n}_{ijt}$ gives the steady state probability of being
in state j after t time intervals, given that the individual started life
in state i at time 0; $P^{n}_{ijt}$ could be computed from postmultiplication
of matrix $P^{n-1}$ by matrix $P$.
Assume that there is no more than one transition per unit interval.
This is possible by making the interval small enough to cover the
possibility of any short duration transition. Let $P[X_n = i']$ denote the
probability of an individual of age n being detected in state i'. The
following observations are used to establish a mathematical form for
$P[X_t = i']$.
Observation 1:
\[ P[X_n = S \mid X_0 = S] = \prod_{j=1}^{n} U_{Sj} \tag{I} \]
(Diagram: state S with a self-loop of probability U_S.)
Proof: The only possible way for an individual to stay in the state
he was in n periods ago, is to stay there at each and every interval,
which means the product of the probability of being there at the end
of his first, second, ..., and n-th age period.
Observation 2: If there is only one step in going from S to T
(i.e., one arc), then
\[ P[X_n = T \mid X_0 = S] = \sum_{j=0}^{n-1} \Big[ \prod_{i=1}^{j} U_{Si} \Big] a_{ST,j+1} \Big[ \prod_{k=j+2}^{n} U_{Tk} \Big] \tag{II} \]
(Diagram: S -> T with arc probability a_ST and self-loops U_S and U_T.)
Proof: If the individual is to be at T by age n, he has to jump
from S to T sometime during (0,n). Thus he has the remaining n-1
periods to spend in any possible combination of states S and T.
Observation 3: If there are two steps in going from S to T (i.e., two
arcs), then
\[ P[X_n = T \mid X_0 = S] = \sum_{j=0}^{n-2} \sum_{l=j}^{n-2} \Big[ \prod_{i=1}^{j} U_{Si} \Big] a_{SW,j+1} \Big[ \prod_{k=j+2}^{l+1} U_{Wk} \Big] a_{WT,l+2} \Big[ \prod_{m=l+3}^{n} U_{Tm} \Big] \tag{III} \]
(Diagram: S -> W -> T with arc probabilities a_SW and a_WT and self-loops U_S, U_W and U_T.)
Proof: If the individual is to be at T by age n, he has to jump from S
to W and from W to T sometime during (0,n). The rest of the time
(n-2 remaining periods) has to be spent by staying in any possible
combination of states S, W and T. The same type of discussion could be
used to generalize the observation for cases of higher order.
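As an informal check of Observation 2, the closed form (II) can be compared with a direct product of one-step transition matrices. The sketch below is only an illustration: the function names and the array layout (entry t-1 holds the period-t probability) are assumptions introduced here, not part of the original development.

```python
import numpy as np

def obs2_closed_form(U_S, a_ST, U_T):
    """P[X_n = T | X_0 = S] for a single arc S -> T, as in Observation 2.

    U_S[t-1], a_ST[t-1], U_T[t-1] hold the period-t probabilities, t = 1..n.
    """
    n = len(a_ST)
    total = 0.0
    for j in range(n):                    # j periods are spent in S before the jump
        stay_S = np.prod(U_S[:j])         # U_{S,1} ... U_{S,j} (empty product = 1)
        jump = a_ST[j]                    # a_{ST,j+1}
        stay_T = np.prod(U_T[j + 1:n])    # U_{T,j+2} ... U_{T,n}
        total += stay_S * jump * stay_T
    return total

def obs2_matrix_product(U_S, a_ST, U_T):
    """Same probability obtained by multiplying one-step matrices."""
    dist = np.array([1.0, 0.0])           # start in S; state order is (S, T)
    for u_s, a, u_t in zip(U_S, a_ST, U_T):
        # Rows need not sum to one: exits to other states are simply omitted.
        step = np.array([[u_s, a],
                         [0.0, u_t]])
        dist = dist @ step
    return dist[1]
```

For any admissible parameter arrays the two functions should agree up to rounding, and the first component of the matrix-product distribution reproduces Observation 1.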
Observation 4:
\[ P[X_n = i'] = P[X_{n-1} = i] \, b_{in} \tag{IV} \]
(Diagram: i -> i' with arc probability b_in.)
Proof: Due to the assumption employed before (there is no mistake
in stage recognition in the detection techniques), there is only one
way to be detected in stage i at time n; the individual should have been
in state i at time n-1.
The same equality could be derived by using the definition of
conditional probability
\[ P[X_n = T] = \sum_{\text{all possible } S} P[X_n = T \mid X_{t_0} = S] \, P[X_{t_0} = S], \quad 0 \le t_0 \le n \tag{1} \]
because for the special case when $t_0 = n-1$
\[ P[X_n = i'] = P[X_n = i' \mid X_{n-1} = i] \, P[X_{n-1} = i] = b_{in} \, P[X_{n-1} = i] \]
limit for n=t. The reason behind this is that nobody who is in any
occult stage after t-1 can be detected at t, and if somebody has been
detected before time t, he would not account for P[Xt = i'].
Use of Observations 1 through 4 plus a special case of equation 1
(where $t_0 = 0$) makes it possible to find a mathematical form for
$P[X_n = i]$ and $P[X_n = i']$ as a function of U's, b's and a's.
As an example, consider a disease which has been postulated to
have the following structure:
(Diagram: occult states 0, 1 and 2 with arcs a_01, a_12 and a_02; detection arcs b_1 and b_2 lead from states 1 and 2 to the detected states 1' and 2'.)
\[ P[X_n = 1'] = b_{1n} \, P[X_{n-1} = 1] = b_{1n} \{ P[X_{n-1} = 1 \mid X_0 = 0] \, P[X_0 = 0] + P[X_{n-1} = 1 \mid X_0 = 1] \, P[X_0 = 1] \} \]
where $P[X_{n-1} = 1 \mid X_0 = 0]$ and $P[X_{n-1} = 1 \mid X_0 = 1]$ could be found from Observations 2
and 1 respectively.
\[ P[X_n = 2'] = b_{2n} \, P[X_{n-1} = 2] = b_{2n} \{ P[X_{n-1} = 2 \mid X_0 = 0] \, P[X_0 = 0] + P[X_{n-1} = 2 \mid X_0 = 1] \, P[X_0 = 1] + P[X_{n-1} = 2 \mid X_0 = 2] \, P[X_0 = 2] \} \]
where $P[X_{n-1} = 2 \mid X_0 = 0]$ is the sum of two different conditional probabilities:
a) going from 0 to 2 straight (one arc)
b) going from 0 to 2 through 1 (two arcs)
Observations 2 and 3 give expressions for these probabilities
respectively. Moreover $P[X_{n-1} = 2 \mid X_0 = 1]$ and $P[X_{n-1} = 2 \mid X_0 = 2]$ are found
from Observations 2 and 1. Therefore, given the initial conditions $P[X_0 = 0]$,
$P[X_0 = 1]$ and $P[X_0 = 2]$, it would be easy to find the steady state
probabilities as functions of the elements of the probability transition
matrix.
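The three-state example can also be evaluated numerically by propagating the occult distribution one period at a time and applying Observation 4 at each step. The following is a minimal sketch under stated assumptions: the function name and argument layout are hypothetical, and the staying probabilities are taken as complements so that each row of the transition matrix sums to one.

```python
import numpy as np

def detected_probabilities(p0, a01, a02, a12, b1, b2):
    """Return arrays of P[X_n = 1'] and P[X_n = 2'] for n = 1..n_max.

    p0 is (P[X_0=0], P[X_0=1], P[X_0=2]); every other argument is a sequence
    whose entry t-1 is the period-t probability.
    """
    occ = np.asarray(p0, dtype=float)      # occult distribution at age n-1
    det1, det2 = [], []
    for t in range(len(b1)):
        U0 = 1.0 - a01[t] - a02[t]         # stay healthy
        U1 = 1.0 - a12[t] - b1[t]          # stay in occult stage 1
        U2 = 1.0 - b2[t]                   # stay in occult stage 2
        # Observation 4: detection at age n requires being in i at age n-1.
        det1.append(b1[t] * occ[1])
        det2.append(b2[t] * occ[2])
        occ = np.array([
            occ[0] * U0,
            occ[0] * a01[t] + occ[1] * U1,
            occ[0] * a02[t] + occ[1] * a12[t] + occ[2] * U2,
        ])
    return np.array(det1), np.array(det2)
```

With everybody healthy at age 0, the third entry of the second array equals b_{2,3} times the sum of the one-arc and two-arc paths from 0 to 2, which is the decomposition described in items a) and b).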
The question is how can P[Xn=i'] be used to estimate the model
parameters? A close look at the structure of the transition probability
matrix reveals that depending on how the occult part of the disease is
to be modeled, a different number of parameters are to be estimated.
Assume that the number of parameters of the model of interest is M. Using
the methodology developed, the following algorithm will give the unknown
parameters uniquely or as functions of other parameters whose estimates
would be necessary. The following three concepts are employed in the
establishment of the algorithm
1) For each time interval (say n) there are N equations (one for
each occult stage) of the form $P[X_n = i'] = a_{in}$, where $a_{in}$ is the value
obtained from data and $P[X_n = i']$ is the mathematical function developed
in this section.
2) For each time interval (say n) there are N+1 equations of the
form $\sum_j P_{ijn} = 1$; one for state "healthy" and N for N occult stages.
3) For each time interval (say n) there are M unknowns of the
form $a_{ijn}$, $b_{in}$ and $U_{in}$.
Therefore, for each time interval, there are a total of 2N+1 equations and M unknowns (M ≥ 2N+1).
Algorithm
Step 1: Start with t=1.
Step 2: a) There are 2N+1 equations and M unknowns. Depending on
the number of arcs (transitions possible) there may or may
not be a unique solution. If not, M-(2N+1) of the parameters
must be realistically estimated.*
b) Solve the system of equations for the remaining unknowns.
Step 3: Let t = t+1 and go back to Step 2, unless t = Tmax, in which case
stop.
Therefore, starting with the first period and going in steps of one
period, the algorithm systematically gives the values of the parameters,
either uniquely or as functions of the M-(2N+1) other parameters that
had to be estimated.
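The period-by-period logic of the algorithm can be illustrated on the smallest possible case, N = 1 (states healthy, occult and detected). Here M = 4 unknowns per period (a_01, U_0, U_1, b_1) face 2N+1 = 3 equations, so one parameter has to be supplied from outside; in this sketch the onset probability a_01 is assumed to be given. The function name, argument names and the choice of which parameter to supply are assumptions made for illustration only.

```python
def estimate_single_stage(detected, onset, p0):
    """Sequentially solve for b_1, U_1 and U_0 in a one-stage model.

    detected[t-1] : observed P[X_t = 1'] (data)
    onset[t-1]    : externally supplied a_{01,t} (the M-(2N+1) = 1 free parameter)
    p0            : (P[X_0 = 0], P[X_0 = 1])
    """
    occ0, occ1 = p0
    estimates = []
    for alpha_t, a01 in zip(detected, onset):
        # Observation 4: P[X_t = 1'] = P[X_{t-1} = 1] * b_{1,t}
        b1 = alpha_t / occ1 if occ1 > 0 else 0.0
        U1 = 1.0 - b1                      # row of state 1 sums to one
        U0 = 1.0 - a01                     # row of state 0 sums to one
        estimates.append({"a01": a01, "U0": U0, "U1": U1, "b1": b1})
        # Advance the occult distribution to age t before the next period.
        occ0, occ1 = occ0 * U0, occ0 * a01 + occ1 * U1
    return estimates
```

Each pass of the loop corresponds to Step 2 for one value of t; Step 3 is the loop increment.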
3.5 Objective Functions
Having structured an analytic model of the disease, policy selection
can be approached in a straightforward fashion. It should be noted
that there are several possible measures of performance; different
measures may lead to different "optimal" screening policies. The most
common objectives used in the policy selection of cancer screening are:
* One way to get an estimate of these parameters is to find a bound on
them by estimating the corresponding parameter in the detected stage.
This method will be discussed later in this research.
1) Minimization of the total cost--Cost of screening, cost of
treatment and cost of losing a patient due to death.
2) Maximization of life expectancy.
3) Maximization of the probability of detection of the disease
in a favorable stage.
4) Maximization of the lead time--Time between the detection of
the disease under screening and the time that disease would have been
discovered under no screening program.
Cost-effectiveness measures of screening policies are an objective
of much interest, because if screening is to be used nationwide, there
should be a net positive benefit (measured in dollars) of doing so. The
total cost of a screening program consists of the cost of screening, the
cost of treatment, the cost attributed to the period of disability of a
patient and the cost to the society of losing an individual. Therefore
Expected total cost of a screening program = E(S)+E(T)+E(D)
where E(S) = Expected total cost of screening
E(T) = Expected total cost of treatment
E(D) = Expected total cost of death due to disease.
In this section, each of these cost elements is expressed mathematically.
In order to do this, an undiscounted analysis is employed and later on a
discounted version is used to transform all the costs to present worth.
a) Screening Cost
Assume that M screenings are scheduled for ages $T_1, T_2, \ldots, T_M$.
The expected total screening cost is the sum of the expected costs of
screening at T1,T2 ..., and TM. However, since an individual will be
screened at age t only if he is in an occult stage (including state
"healthy"), the following expected value analysis is used.
Expected cost of screening at age t =
(cost of screening at age t)(probability of being in an occult
stage at age t)(probability of test done at age t)
Let $C_s(t)$ = screening cost at age t
$P[X_t = i]$ = probability of being in an occult stage i at age t
$Z_t$ = 1 if a test is done at age t, and 0 otherwise.
Then
\[ \text{Expected cost of screening at age } t = \sum_{i \in O} C_s(t) \, P[X_t = i] \, Z_t \]
where O is the set of occult stages, i.e., O = {0,1,2,...,N}.
The expected total screening cost is the sum of this quantity over
all intervals, i.e.,
\[ E(S) = \sum_{t=0}^{t_{max}} \sum_{i \in O} C_s(t) \, P[X_t = i] \, Z_t \]
b) Treatment Cost
Treatment cost will occur whenever the disease is diagnosed by a
screening test or surfaces clinically. At a specific interval
(t,t+1) the expected treatment cost E(t) is
E(t) = (cost of treatment of an individual in stage i' at age t)
(probability of being in a detected stage i' at age t)
(probability of accepting the treatment)
Assume that every individual when detected accepts treatment, and define
$C_T(t,i')$ = Treatment cost of an individual who is in stage i' at
age t.
$P[X_t = i']$ = Probability of being in a detected stage i' at age t.
Then
\[ E(t) = \sum_{i' \in D} C_T(t,i') \, P[X_t = i'] \]
where D is the set of detected stages, i.e., D = {1',2',...,N'}.
The expected total treatment cost is the sum of this quantity over
all intervals, i.e.,
\[ E(T) = \sum_{t=0}^{t_{max}} \sum_{i' \in D} C_T(t,i') \, P[X_t = i'] \]
c) Mortality Cost
In the simplified model, there is no possibility of death due to
disease before detection, and any individual who is in a detected state i'
at age t may die with probability $d_{i'}(t)$ or be cured with
probability $[1 - d_{i'}(t)]$. This transition is assumed to happen one interval
after he gets into the detected stage. (This is not what happens in
reality, but it is a good approximation of the actual process and is
employed here because most of the data on survival are given by age at
diagnosis.)
Expected cost of death due to disease at age t = (cost of death at
age t)(probability of being in a detected stage i' at age t)(probability
of death at age t in stage i')
Let $C_d(t)$ = Cost of death at age t (present worth of future income)
$d_{i'}(t)$ = Probability of death due to disease for an individual
who is in stage i' at age t.
\[ \text{Expected cost of death due to disease at age } t = \sum_{i' \in D} C_d(t) \, P[X_t = i'] \, d_{i'}(t) \]
The expected total mortality cost due to disease is the sum of this
quantity over all intervals, i.e.,
\[ E(D) = \sum_{t=0}^{t_{max}} \sum_{i' \in D} C_d(t) \, P[X_t = i'] \, d_{i'}(t) \]
Since this model is based on the assumption that the individual does
not die due to any other causes up to age t, let PL(t) represent the
probability of this event; PL(t) then has to be incorporated in the
objective function. Therefore
\[ \text{Expected discounted total cost} = \sum_{t=0}^{t_{max}} \Big\{ \sum_{i \in O} C_s(t) \, P[X_t = i] \, Z_t + \sum_{i' \in D} [C_T(t,i') + C_d(t) \, d_{i'}(t)] \, P[X_t = i'] \Big\} \frac{PL(t)}{(1+r)^t} \]
where r = discount rate
PL(t) = probability of not dying of other causes up to age t.
It is shown in Section 3.4 that $P[X_t = i]$ and $P[X_t = i']$ are known in terms
of the parameters of the model and the decision variables $Z_t$. It remains
to give a realistic estimate of quantities $C_s(t)$, $C_T(t,i')$, $C_d(t)$,
$d_{i'}(t)$ and PL(t).
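The discounted objective can be evaluated directly once the state probabilities are available. The sketch below assumes that P[X_t=i] and P[X_t=i'] have already been computed for the policy Z under consideration and are passed in as arrays; the function name and the array shapes are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def expected_discounted_cost(Cs, CT, Cd, d, P_occ, P_det, Z, PL, r):
    """Expected discounted total cost of one screening policy.

    Cs, Cd, Z, PL : length-T sequences indexed by age t = 0..T-1
    CT, d, P_det  : arrays of shape (T, N), one column per detected stage i'
    P_occ         : array of shape (T, N+1), one column per occult stage
                    (including "healthy")
    r             : discount rate
    """
    Cs, Cd, Z, PL = (np.asarray(x, dtype=float) for x in (Cs, Cd, Z, PL))
    CT, d, P_occ, P_det = (np.asarray(x, dtype=float) for x in (CT, d, P_occ, P_det))
    t = np.arange(len(Cs))
    # Screening term: C_s(t) * Z_t * sum over occult stages of P[X_t = i]
    screening = Cs * Z * P_occ.sum(axis=1)
    # Treatment plus mortality term: [C_T + C_d * d] * P[X_t = i'] summed over i'
    treat_death = ((CT + Cd[:, None] * d) * P_det).sum(axis=1)
    return float(((screening + treat_death) * PL / (1.0 + r) ** t).sum())
```

Note that in the model the state probabilities themselves depend on Z, so this function only covers the final summation step.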
3.6 An Alternative Expression for the Objective Function
The criterion of minimization of the expected total cost resulted
in an objective function derived in the previous section and repeated
below.
\[ \text{O.F.} = \sum_{t=0}^{t_{max}} \Big\{ \sum_{i \in O} C_s(t) \, P[X_t = i] \, Z_t + \sum_{i' \in D} [C_T(t,i') + C_d(t) \, d_{i'}(t)] \, P[X_t = i'] \Big\} \frac{PL(t)}{(1+r)^t} \]
where $C_s(t)$, $C_T(t,i')$, $C_d(t)$, $d_{i'}(t)$, PL(t) and r are the known parameters
of the optimization problem and $P[X_t = i]$ and $P[X_t = i']$ are functions of
the decision variables $Z_t$, $0 \le t \le t_{max}$.
For any site-specific disease, the methodology developed in Section 3.4
could be employed to write $P[X_t = i]$, $i \in O$, and $P[X_t = i']$, $i' \in D$, as functions of
the disease parameters $a_{ij}$, $U_i$, $b_i$ and the decision variables $Z_t$. To do
this in the formulation of $P[X_t = i]$ and $P[X_t = i']$, the parameters are
written as functions of $Z_t$ in the following manner:
\[ a_{ijt} = a'_{ijt} \, [1 - Z_t f_i(t)], \quad i \ne 0 \]
\[ U_{it} = U'_{it} \, [1 - Z_t f_i(t)], \quad i \ne 0 \]
\[ b_{it} = Z_t f_i(t) + [1 - Z_t f_i(t)] \, b'_{it} \]
where $a'_{ijt}$, $U'_{it}$ and $b'_{it}$ stand for the probability elements estimated for
the case of no screening program available and $a_{ijt}$, $U_{it}$ and $b_{it}$ are
the corresponding values for any specific screening policy. In other
words, the prime indicates elements of the disease measured when the
natural course of the disease is not altered by any external screening test.
This reduces the objective function to a function of $Z_t$, $0 \le t \le t_{max}$,
where $Z_t$ can take one of the two integer values zero or one. Therefore
the optimization problem is an integer program, but because of the
multiplication of variables (to see this, look at any one of the
expressions for $P[X_t = i]$ when the $a_{ij}$, $U_i$ and $b_i$ are functions of $Z_t$)
it is not an integer linear program, and the classic methods of
solution for integer linear programs are not applicable. It remains to
evaluate all possible policies or to use some branch and bound procedure
that omits a policy as soon as it is dominated by another policy already
computed.
The optimization problem is whether or not to screen the individual
at any time t. Therefore, if t is a discrete time interval and varies
from 1 to $t_{max}$, there would be $2^{t_{max}}$ solutions, because at any time
t there either is a test or there is not; the best of these solutions
is to be chosen.
There may be constraints on the total number of tests permitted
per each individual and on the interval between two successive tests
due to the fact that certain tests are dangerous in nature and/or might
introduce some side effects. The constraints are expressed mathematically
as follows.
Constraint 1: The maximum number of tests permitted for an individual
is $S_{max}$:
\[ \sum_{t=1}^{t_{max}} Z_t \le S_{max} \]
Constraint 2: The interval between two screens should be at least
$t_{min}$:
\[ \sum_{i=j}^{j+t_{min}} Z_i \le 1, \quad j = 1,2,\ldots,t_{max}-t_{min} \]
Depending on the values of $S_{max}$ and $t_{min}$, Constraint 2 may imply
Constraint 1. For instance if $t_{max} = 72$, $S_{max} = 10$, $t_{min} = 9$, then
there is no way to do ten tests or more and keep the interval between
two successive tests as desired, which means in this case Constraint 1
is implicitly satisfied by the second set of constraints.
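The two constraints and the complete-enumeration idea can be sketched as follows. This is an illustration only: the function names are hypothetical, Constraint 2 is read as "at most one test in every window of t_min + 1 consecutive periods", and full enumeration is practical only for small t_max because the number of candidate policies is 2^t_max.

```python
from itertools import product

def is_feasible(z, s_max, t_min):
    """Check Constraints 1 and 2 for a policy z, where z[t-1] is Z_t."""
    if sum(z) > s_max:                          # Constraint 1: total number of tests
        return False
    t_max = len(z)
    for j in range(t_max - t_min):              # windows j..j+t_min (0-based start)
        if sum(z[j:j + t_min + 1]) > 1:         # Constraint 2: at most one test per window
            return False
    return True

def feasible_policies(t_max, s_max, t_min):
    """Enumerate all 0/1 policies of length t_max that satisfy both constraints."""
    return [z for z in product((0, 1), repeat=t_max)
            if is_feasible(z, s_max, t_min)]

def max_tests_under_spacing(t_max, t_min):
    """Largest number of tests Constraint 2 alone allows (tests at 1, 1+(t_min+1), ...)."""
    return len(range(1, t_max + 1, t_min + 1))
```

For the values quoted in the text (t_max = 72, t_min = 9), `max_tests_under_spacing` gives a number below S_max = 10, which is the sense in which Constraint 2 implies Constraint 1 there.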
3.6.1 Alternative Form 1
If it is assumed that $f_i(t)$, the probability that a person
who is in state i at age t is properly classified as diseased, is
a constant independent of age and stage, then it is possible to simplify
the objective function and reduce the computational aspects of the
problem significantly. The assumption of a constant true positive rate of
detection is valid for some diseases, such as neuroblastoma,
and, in general, the range of variation of $f_i(t)$ is sufficiently
small that a constant could be used to represent the true positive rate
of detection of the test. Later on the constant could be varied over its
possible domain and a sensitivity analysis could be used to determine the
validity of the assumption.
The assumption $f_i(t) = F$ is basic to the following analysis and is
used to write
\[ a_{ijt} = a'_{ijt} \, (1 - Z_t F), \quad i \ne 0 \]
\[ U_{it} = U'_{it} \, (1 - Z_t F), \quad i \ne 0 \]
\[ b_{it} = b'_{it} \, (1 - Z_t F) + Z_t F \]
A close look at the mathematical form of $P[X_n = i]$ and $P[X_n = i']$ derived in
Section 3.4 shows that each of these terms is the sum of several terms,
each of which is a product of n elements. The terms
are such that there exists exactly one element corresponding to each
time interval t, $1 \le t \le n$. To explain this in more detail, take the
following simple case:
(Diagram: S -> T -> T' with arc probabilities a_ST and b_T and self-loops U_S and U_T.)
Observation 2, Section 3.4 gives
\[ P[X_{n-1} = T \mid X_0 = S] = \sum_{j=0}^{n-2} \Big[ \prod_{i=1}^{j} U_{Si} \Big] a_{ST,j+1} \Big[ \prod_{k=j+2}^{n-1} U_{Tk} \Big] \]
and Observation 1 gives
\[ P[X_{n-1} = T \mid X_0 = T] = \prod_{j=1}^{n-1} U_{Tj} \]
Therefore
\[ P[X_n = T'] = \Big\{ \sum_{j=0}^{n-2} \Big[ \prod_{i=1}^{j} U_{Si} \Big] a_{ST,j+1} \Big[ \prod_{k=j+2}^{n-1} U_{Tk} \Big] P[X_0 = S] + \Big[ \prod_{j=1}^{n-1} U_{Tj} \Big] P[X_0 = T] \Big\} \, b_{T,n} \]
\[ P[X_n = T'] = \big[ (a_{ST,1} U_{T,2} \cdots U_{T,n-1} + U_{S,1} a_{ST,2} U_{T,3} \cdots U_{T,n-1} + \cdots + U_{S,1} \cdots U_{S,n-2} a_{ST,n-1}) \, P[X_0 = S] + (U_{T,1} U_{T,2} \cdots U_{T,n-1}) \, P[X_0 = T] \big] \, b_{T,n} \]
In this particular example $P[X_n = T']$ is the sum of n terms (n-1 from
the paths starting in S and one from the path starting in T), each of
which is a product of n elements of the form $a_{ij}$, $U_i$ and
$b_i$. In each term, say $a_{ST,1} U_{T,2} U_{T,3} \cdots U_{T,n-1} b_{T,n}$, there exists one
and only one element corresponding to any time interval $1 \le t \le n$ (e.g.,
$a_{ST,1}$ is the only parameter that carries the time subscript "1"). This
concept plus the assumption of constant $f_i(t)$ will be employed to derive
a simpler form for the objective function.
Assume that the screening program is scheduled for $t \in \{T_1, T_2, \ldots, T_M\}$,
which means the individual will be screened at ages $T_1, T_2, \ldots$, and $T_M$,
and use the following notation:
$P'[X_t = i]$ = Probability of being in the occult stage i at
age t under no screening program.
$P[X_t = i]$ = Probability of being in the occult stage i at
age t under any screening program.
$P'[X_t = i']$ = Probability of being in the detected stage i' at
age t under no screening program.
$P[X_t = i']$ = Probability of being in the detected stage i' at
age t under any screening program.
Then $P'[X_t = i']$ is known from the data and $P'[X_t = i]$ could be found from
data as a function of $b'_{i,t+1}$ by the use of Observation 4:
\[ P'[X_t = i] = \frac{P'[X_{t+1} = i']}{b'_{i,t+1}} \]
For any time interval $0 \le t < T_1$,
\[ P'[X_t = i] = P[X_t = i], \quad 0 \le t < T_1 \]
\[ P'[X_t = i'] = P[X_t = i'], \quad 0 \le t < T_1 \]
Temporarily, assume that every individual who will eventually get
the disease has the disease by the time of the first screening. Later
this analysis is used to modify the model to cover every possible case.
At time $t = T_1$
\[ P[X_{T_1} = i] = (1-F) \, P'[X_{T_1} = i] \]
This is because the last element of each term of $P[X_{T_1} = i]$ carries time $T_1$ and is
modified by (1-F).
For t in the interval $T_1 < t < T_2$, one element in each term of
$P[X_t = i]$ corresponds to time $T_1$ and is therefore modified by (1-F).
Therefore the whole expression is modified by (1-F).
At time T2, under the same assumption (everybody has the disease
by the time of the first screening), in each term of quantity P[Xt=i]
there is one element corresponding to time T1 and one element
corresponding to time T2. These two elements are the only ones that
are modified by the factor (1-F). Therefore the whole expression is
modified by $(1-F)^2$, i.e.,
\[ P[X_{T_2} = i] = (1-F)^2 \, P'[X_{T_2} = i] \]
This assumes that the outcome of each screen is independent of previous
results. The same analysis holds for $T_2 < t < T_3$ and can be
generalized to establish that, under the assumption that everybody who
eventually gets the disease has it by time $T_1$, the screening cost will
be
\[ \sum_{t \in T} \sum_{i \in O} C_s(t) \, P[X_t = i] = \sum_{t \in T} C_s(t) \, P[X_t = 0] + \sum_{i \in O, \, i \ne 0} \big\{ (1-F) \, C_s(T_1) \, P'[X_{T_1} = i] + (1-F)^2 \, C_s(T_2) \, P'[X_{T_2} = i] + \cdots + (1-F)^M \, C_s(T_M) \, P'[X_{T_M} = i] \big\} \]
The term in $P[X_t = 0]$ is written separately because it is not affected
by the screening program.
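The claim that every screen up to age t contributes one factor (1-F) to the occult-stage probabilities can be checked numerically on a small chain. The sketch assumes a two-stage occult structure S -> T in which everybody is already diseased (in S) at age 0, and applies the modification of the a and U elements by (1 - Z_t F) given at the start of this subsection. All names and the test values are illustrative assumptions.

```python
import numpy as np

def occult_distribution(U_S, a_ST, U_T, Z, F):
    """Occult-stage probabilities (S, T) at ages 1..n under policy Z.

    U_S, a_ST, U_T hold the no-screening (primed) elements for periods 1..n;
    Z[t-1] is 1 if a test is done in period t; F is the true positive rate.
    """
    dist = np.array([1.0, 0.0])            # everybody starts diseased, in stage S
    history = []
    for u_s, a, u_t, z in zip(U_S, a_ST, U_T, Z):
        keep = 1.0 - z * F                 # factor applied to a and U when a test is done
        step = keep * np.array([[u_s, a],
                                [0.0, u_t]])
        dist = dist @ step
        history.append(dist.copy())
    return np.array(history)
```

Comparing the output for a policy Z with the output for the all-zero policy should show a ratio of (1-F) raised to the number of tests done up to and including each age.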
Observation 4 can be employed to replace $P'[X_{T_j} = i]$ by its
equivalent quantity $P'[X_{T_j+1} = i'] / b'_{i,T_j+1}$. Therefore
\[ \text{Screening cost} = \sum_{t \in T} C_s(t) \, P[X_t = 0] + \sum_{i' \in D} \sum_{k=1}^{M} C_s(T_k) \, (1-F)^k \, \frac{P'[X_{T_k+1} = i']}{b'_{i,T_k+1}} \]
where M is the total number of tests.
To show the relationship between $P[X_t = i']$ and $P'[X_t = i']$ it is
necessary to investigate the changes due to $b_{it}$ in addition to those of
$a_{ij}$ and $U_i$ and to consider the effect of screening on each individual
element of $P[X_t = i']$.
Taking into account only those individuals who get the disease
before time $T_1$, it is obvious that for $0 \le t < T_1$ there is no difference
between $P[X_t = i']$ and $P'[X_t = i']$, simply because there is no test in this
interval and therefore the $a_{ij}$, $U_i$ and $b_i$ remain unchanged.
At time $t = T_1$, according to Observation 4
\[ P[X_{T_1} = i'] = P[X_{T_1 - 1} = i] \, b_{i,T_1} \]
But from the previous analysis $P[X_{T_1 - 1} = i] = P'[X_{T_1 - 1} = i]$, because there
has been no test up to $T_1$. Moreover, since a screening is done at time $T_1$,
\[ b_{i,T_1} = b'_{i,T_1} \, (1-F) + F \]
Substitution for $P[X_{T_1 - 1} = i]$ and $b_{i,T_1}$ in $P[X_{T_1} = i']$ gives
\[ P[X_{T_1} = i'] = P'[X_{T_1 - 1} = i] \, [b'_{i,T_1} (1-F) + F] \]
But according to Observation 4
\[ P'[X_{T_1 - 1} = i] = \frac{P'[X_{T_1} = i']}{b'_{i,T_1}} \]
so that
\[ P[X_{T_1} = i'] = \frac{P'[X_{T_1} = i']}{b'_{i,T_1}} \, [b'_{i,T_1} (1-F) + F] = P'[X_{T_1} = i'] \Big[ 1 - F + \frac{F}{b'_{i,T_1}} \Big] \]
For $T_1 < t < T_2$, in each term of $P[X_t = i']$ there is one element corresponding to
$t = T_1$ and this element is either of the set a or U. Therefore the whole
expression is modified by factor (1-F), i.e.,
\[ P[X_t = i'] = (1-F) \, P'[X_t = i'], \quad T_1 < t < T_2 \]
At time $t = T_2$, under the assumption that everybody has the disease by
the time of the first test, in each term of the quantity $P[X_t = i']$ there
is one element corresponding to time $T_1$ and the term $b_{i,T_2}$ corresponding
to time $T_2$. The element corresponding to time $t = T_1$ is of the set a or U
and is therefore modified by factor (1-F), and the element $b_{i,T_2}$ is
modified by
\[ b_{i,T_2} = b'_{i,T_2} \, (1-F) + F \]
Therefore
\[ P[X_{T_2} = i'] = (1-F) \, P'[X_{T_2} = i'] \Big[ 1 - F + \frac{F}{b'_{i,T_2}} \Big] \]
For $T_2 < t < T_3$, in each term there are two elements corresponding to
times $T_1$ and $T_2$ and each of them is modified by the factor (1-F).
Therefore
\[ P[X_t = i'] = (1-F)^2 \, P'[X_t = i'], \quad T_2 < t < T_3 \]
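In a similar manner the whole treatment cost and death cost can
be shown to be
\[ \text{Treatment cost + death cost} = \sum_{i' \in D} \Big\{ \sum_{t=0}^{T_1 - 1} C_{t,i'} \, P'[X_t = i'] + C_{T_1,i'} \Big[ 1 - F + \frac{F}{b'_{i,T_1}} \Big] P'[X_{T_1} = i'] + \sum_{t=T_1+1}^{T_2 - 1} C_{t,i'} \, (1-F) \, P'[X_t = i'] + C_{T_2,i'} \, (1-F) \Big[ 1 - F + \frac{F}{b'_{i,T_2}} \Big] P'[X_{T_2} = i'] + \cdots \Big\} \]
where $C_{t,i'} = C_T(t,i') + C_d(t) \, d_{i'}(t)$.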
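Therefore under the main assumption that all of those individuals who
will eventually get the disease are involved at or before time $T_1$,
the expected total cost will be
\[ (\text{cost})_1 = \sum_{t=1}^{t_{max}} Z_t \, C_s(t) \, P[X_t = 0] + \sum_{i' \in D} \sum_{t=1}^{t_{max}} \Big\{ Z_t \, C_s(t) \, (1-F)^{\sum_{k=1}^{t} Z_k} \, \frac{P'[X_{t+1} = i']}{b'_{i,t+1}} + C_{t,i'} \, (1-F)^{\sum_{k=1}^{t-1} Z_k} \Big[ 1 - F + \frac{F}{b'_{i,t}} \Big]^{Z_t} P'[X_t = i'] \Big\} \]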
Not everybody has the disease at or before $T_1$; in fact, only
a known fraction of them do. To find this fraction, assume that the
onset time (the time at which the individual
gets the disease) has a negative exponential distribution with rate $\lambda_0$.
Then the onset density at time t is
\[ P[X_t \ne 0] = \lambda_0 e^{-\lambda_0 t} \]
and
\[ \int_0^{T_1} P[X_t \ne 0] \, dt = 1 - e^{-\lambda_0 T_1} \]
Therefore a fraction $(1 - e^{-\lambda_0 T_1})$ of all the individuals who eventually get
the disease have had it at or before $T_1$, and their contribution to
the objective function is
\[ [1 - e^{-\lambda_0 T_1}] \, (\text{cost})_1 \]
Using this argument, let us divide the population into M+1 groups
(M is the total number of tests scheduled) in the following manner:
Group 1 consists of those individuals who have the disease at
or before $T_1$.
Group m consists of those individuals who get the disease
between $T_{m-1}$ and $T_m$, m = 2,...,M.
Group M+1 consists of those individuals who get the disease
after the last screening.
From the assumption of a negative exponential distribution of time of onset,
it is clear that
\[ P[T_i < t \le T_{i+1}] = e^{-\lambda_0 T_i} - e^{-\lambda_0 T_{i+1}} \]
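The group weights implied by the exponential onset assumption are straightforward to compute. The sketch below takes T_0 = 0 and T_{M+1} = t_max, the convention used in the objective function that follows; the function name and arguments are assumptions made for illustration.

```python
import math

def onset_group_weights(screen_times, rate, t_max):
    """Probability that onset falls in each of the M+1 groups.

    screen_times : increasing list [T_1, ..., T_M]
    rate         : onset rate lambda_0 of the negative exponential distribution
    t_max        : end of the planning horizon, used as T_{M+1}
    """
    bounds = [0.0, *screen_times, t_max]
    # Weight of group j is exp(-rate * T_{j-1}) - exp(-rate * T_j).
    return [math.exp(-rate * lo) - math.exp(-rate * hi)
            for lo, hi in zip(bounds[:-1], bounds[1:])]
```

The weights add up to 1 - exp(-rate * t_max), i.e. the probability that onset occurs within the horizon.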
Considering the second group it is clear that for any individual
in this group there would be no benefits from the first screening,
but from then on they get the benefit of any other screening.
Therefore, using the same analysis as that of group one, it is clear
that their contribution to the objective function is
\[ [e^{-\lambda_0 T_1} - e^{-\lambda_0 T_2}] \sum_{i' \in D} \sum_{t=T_1+1}^{t_{max}} \Big\{ Z_t \, C_s(t) \, (1-F)^{\sum_{k=T_1+1}^{t} Z_k} \, \frac{P'[X_{t+1} = i']}{b'_{i,t+1}} + C_{t,i'} \, (1-F)^{\sum_{k=T_1+1}^{t-1} Z_k} \Big[ 1 - F + \frac{F}{b'_{i,t}} \Big]^{Z_t} P'[X_t = i'] \Big\} \]
A similar analysis could be used for other groups.
The expected total cost is the sum of all contributions from different
groups:
\[ \text{O.F.} = \sum_{T_j \in T} [e^{-\lambda_0 T_{j-1}} - e^{-\lambda_0 T_j}] \sum_{i' \in D} \sum_{t=T_{j-1}+1}^{t_{max}} \Big\{ Z_t \, C_s(t) \, (1-F)^{\sum_{k=T_{j-1}+1}^{t} Z_k} \, \frac{P'[X_{t+1} = i']}{b'_{i,t+1}} + C_{t,i'} \, (1-F)^{\sum_{k=T_{j-1}+1}^{t-1} Z_k} \Big[ 1 - F + \frac{F}{b'_{i,t}} \Big]^{Z_t} P'[X_t = i'] \Big\} + \sum_{t=1}^{t_{max}} Z_t \, C_s(t) \, P[X_t = 0] \]
where $T = \{T_k; k = 0,\ldots,M+1\}$ with $T_0 = 0$ and $T_{M+1} = t_{max}$, and the outer sum runs over j = 1,...,M+1.
In this formulation $Z_t$ is the decision variable which takes on the value zero
or one, F is the true positive rate of the test, $\lambda_0$ is the rate of onset
of the disease and $b'_{it}$ is the conditional probability of transition
from state i to i' during time (t-1,t). In the objective function $\lambda_0$,
and therefore $P[X_t = 0]$, and $b'_{it}$ are the only parameters that are not
explicitly known from data, but the method of Section 3.4 could be used
to estimate them.
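If, instead of negative exponential, the onset time has a general
distribution function $F_{T_0}(t)$, the objective function would be
\[ \text{O.F.} = \sum_{T_j \in T} [F_{T_0}(T_j) - F_{T_0}(T_{j-1})] \sum_{i' \in D} \sum_{t=T_{j-1}+1}^{t_{max}} \Big\{ Z_t \, C_s(t) \, (1-F)^{\sum_{k=T_{j-1}+1}^{t} Z_k} \, \frac{P'[X_{t+1} = i']}{b'_{i,t+1}} + C_{t,i'} \, (1-F)^{\sum_{k=T_{j-1}+1}^{t-1} Z_k} \Big[ 1 - F + \frac{F}{b'_{i,t}} \Big]^{Z_t} P'[X_t = i'] \Big\} + \sum_{t=1}^{t_{max}} Z_t \, C_s(t) \, P[X_t = 0] \]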
3.6.2 Alternative Form 2
This method is also used to determine the value of P[Xt=i] and
P[Xt=i'] as a function of Zt. The basic idea is the same as the first
method and the population is grouped according to the time of onset of
the disease. Therefore at each screening time different categories are
considered each of which gets a different benefit from screening tests.
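For instance, for time $t \le T_1$ there is only one group which gets a
benefit from screening and the cost associated with them at any time
t is
\[ \sum_{i} C_s(t) \, Z_t \, (1-F)^{\sum_{k=1}^{t} Z_k} \, P'[X_t = i] + \sum_{i'} C_{t,i'} \Big[ 1 - F + \frac{F}{b'_{i,t}} \Big]^{Z_t} P'[X_t = i'], \quad t \le T_1 \]
For any time t ($T_1 < t \le T_2$) there are two groups of people:
Group 1: Those who had the disease at screening time $T_1$.
Group 2: Those who had not had the disease at screening time $T_1$.
The cost associated with the first group at any time t is
\[ \sum_{i} C_s(t) \, Z_t \, (1-F)^{\sum_{k=1}^{t} Z_k} \, P'[X_t = i, X_{T_1} = j] + \sum_{i'} C_{t,i'} \, (1-F) \Big[ 1 - F + \frac{F}{b'_{i,t}} \Big]^{Z_t} P'[X_t = i', X_{T_1} = j], \quad T_1 < t \le T_2 \]
where j is the set of occult stages.
The cost associated with the second group at any time t is
\[ \sum_{i} C_s(t) \, Z_t \, (1-F)^{\sum_{k=T_1+1}^{t} Z_k} \, P'[X_t = i, X_{T_1} = 0] + \sum_{i'} C_{t,i'} \Big[ 1 - F + \frac{F}{b'_{i,t}} \Big]^{Z_t} P'[X_t = i', X_{T_1} = 0], \quad T_1 < t \le T_2 \]
In a similar manner for any time t ($T_2 < t \le T_3$) there are three groups of
people:
Group 1: Those who were carrying the disease at screening time $T_1$.
Group 2: Those who were healthy at $T_1$ but carried the disease by
$T_2$.
Group 3: Those who had not had the disease at time $T_2$.
The cost associated with these three groups at any time t is
a)
\[ \sum_{i} C_s(t) \, Z_t \, (1-F)^{\sum_{k=1}^{t} Z_k} \, P'[X_t = i, X_{T_1} = j] + \sum_{i'} C_{t,i'} \, (1-F)^2 \Big[ 1 - F + \frac{F}{b'_{i,t}} \Big]^{Z_t} P'[X_t = i', X_{T_1} = j] \]
b)
\[ \sum_{i} C_s(t) \, Z_t \, (1-F)^{\sum_{k=T_1+1}^{t} Z_k} \, P'[X_t = i, X_{T_1} = 0, X_{T_2} = j] + \sum_{i'} C_{t,i'} \, (1-F) \Big[ 1 - F + \frac{F}{b'_{i,t}} \Big]^{Z_t} P'[X_t = i', X_{T_1} = 0, X_{T_2} = j] \]
c)
\[ \sum_{i} C_s(t) \, Z_t \, (1-F)^{\sum_{k=T_2+1}^{t} Z_k} \, P'[X_t = i, X_{T_2} = 0] + \sum_{i'} C_{t,i'} \Big[ 1 - F + \frac{F}{b'_{i,t}} \Big]^{Z_t} P'[X_t = i', X_{T_2} = 0] \]
for $T_2 < t \le T_3$.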