
Citation 
 Permanent Link:
 http://ufdc.ufl.edu/AA00032928/00001
Material Information
 Title:
 Nonparametric regression in the analysis of survival data
 Creator:
 Ireson, Michael J., 1942
 Publication Date:
 1983
 Language:
 English
 Physical Description:
 ix, 115 leaves : illustrations ; 28 cm
Subjects
 Subjects / Keywords:
 Censored data ( jstor )
Censorship ( jstor ) Confidence interval ( jstor ) Estimation methods ( jstor ) Permutations ( jstor ) Point estimators ( jstor ) Sample size ( jstor ) Simulations ( jstor ) Statistical discrepancies ( jstor ) Subroutines ( jstor ) Failure time data analysis ( fast ) Nonparametric statistics ( fast ) Regression analysis ( fast )
 Genre:
 bibliography ( marcgt )
theses ( marcgt ) nonfiction ( marcgt )
Notes
 Bibliography:
 Includes bibliographical references (leaves 112-114).
 General Note:
 Typescript.
 General Note:
 Vita.
 Statement of Responsibility:
 by Michael J. Ireson.
Record Information
 Source Institution:
 University of Florida
 Holding Location:
 University of Florida
 Rights Management:
 The University of Florida George A. Smathers Libraries respect the intellectual property rights of others and do not claim any copyright interest in this item. This item may be protected by copyright but is made available here under a claim of fair use (17 U.S.C. §107) for nonprofit research and educational purposes. Users of this work have responsibility for determining copyright status prior to reusing, publishing or reproducing this item for purposes other than what is allowed by fair use or other copyright exemptions. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder. The Smathers Libraries would like to learn more about this item and invite individuals or organizations to contact the RDS coordinator (ufdissertations@uflib.ufl.edu) with any additional information they can provide.
 Resource Identifier:
 11386682 ( OCLC )
ocm11386682 22664044 ( Aleph number )

Full Text 
NONPARAMETRIC REGRESSION IN THE
ANALYSIS OF SURVIVAL DATA
BY
MICHAEL J. IRESON
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN
PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1983
Copyright 1983 by
Michael J. Ireson
ACKNOWLEDGMENTS
I am grateful for the stimulating contact I have
had with many faculty and students at the University of Florida. In particular I want to thank Dr. P. V. Rao for his advice throughout my graduate program and for his learned and shrewd guidance of my dissertation. Also I want to acknowledge valuable help from Drs. Jon Shuster and Ramon Littell.
I would not have undertaken this work without
encouragement from my wife and I shall always appreciate the sacrifices she made. I also thank Rob and Nicola for their tolerance of an ofttimes reclusive dad. My special thanks go to Mrs. Edna Larrick for a speedy and excellent typing job.
Computing was done using the facilities of the
Northeast Regional Data Center located on the campus of the University of Florida in Gainesville.
TABLE OF CONTENTS
ACKNOWLEDGMENTS
LIST OF TABLES
ABSTRACT
CHAPTER
ONE INTRODUCTION
 1.1 The Concept of Censoring
 1.2 Aims and a Conspectus
TWO REGRESSION METHODS
 2.1 Regression Models
 2.2 The Cox Method
 2.3 The Linear Model
 2.4 The Miller Method
 2.5 The Buckley-James Method
 2.6 The Koul, Susarla and Van Ryzin Method
 2.7 Some Comparative Studies
THREE A NEW METHOD
 3.1 Structure and Notation
 3.2 Point Estimation of $\beta$
 3.3 Interval Estimation of $\beta$: Exact Method
 3.4 Interval Estimation of $\beta$: Asymptotic Method
 3.5 The Two-Sample Location Problem
 3.6 Computational Aspects
FOUR COMPARATIVE STUDIES
 4.1 Simulations Using a Simple Linear Model
 4.2 Analysis of Heart Transplant Data
 4.3 Summary of Results
FIVE CONCLUSION
APPENDICES
 A SMALL SAMPLE REGRESSION PROGRAM
 B TWO-SAMPLE ADAPTATION
 C LARGE SAMPLE PROGRAM
BIBLIOGRAPHY
BIOGRAPHICAL SKETCH
LIST OF TABLES
Table
1. Values of $b_{ij}$ and $(c_{ij})$ for Example 2.16
2. Computation of S(b)
3. Frequency of occurrence of censoring at different x in 2000 simulations of samples of size n = 7
4. Percentages of 200 simulations leading to infinite intervals by N and percentages giving convergence problems with BJ
5. Arithmetic means (x 10 3) based on 2000 simulations of slope estimates for samples of size n = 7
6. Arithmetic means (x 10 ) based on 200 simulations of slope estimates for samples of size n = 25
7. Simulation results concerning estimation of the variance of the Buckley-James estimator with sample size n = 7
8. Percentage coverage of confidence intervals based on 200 simulations with sample size n = 7
9. Percentage coverage of confidence intervals based on 200 simulations with sample size n = 10
10. Percentage coverage of confidence intervals based on 200 simulations with sample size n = 15
11. Percentage coverage of confidence intervals based on 200 simulations with sample size n = 25
12. Mean lengths of 90% confidence intervals based on 200 simulations for sample size n = 7
13. Mean lengths of 90% confidence intervals based on 200 simulations for sample size n = 15
14. Differences in mean lengths of 90% confidence intervals based on 200 simulations for sample size n = 25
15. Proportions of 200 simulations for which $H_0\colon \beta = 0$ was rejected at a test level $\alpha = 0.1$
16. Regression slope estimates and confidence intervals for $\log_{10}$ of survival time versus age at transplant with Stanford heart transplant data
17. Regression slope estimates and confidence intervals for $\log_{10}$ of survival times versus mismatch score with n = 157 Stanford heart transplant patients
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
NONPARAMETRIC REGRESSION IN THE
ANALYSIS OF SURVIVAL DATA
By
Michael J. Ireson
August 1983
Chairman: Pejaver V. Rao
Major Department: Statistics
Confidence intervals and point estimators are defined for the slope parameter of a simple linear regression model fitted to right-censored data. Estimation of shift using two independent right-censored samples arises as a special application. Existing nonparametric methods for the regression problem all depend on asymptotic results, whereas the new method gives intervals which have exact coverage probability under some assumptions on the censoring mechanisms.
The suggested technique is an extension of a nonparametric regression method that is available for uncensored data. A statistic is defined as a function of a hypothesized slope value and that statistic is considered over a search of slope values. A series of permutation tests are defined on the statistic and the corresponding inversion produces the confidence interval for the slope value.
Exact permutation tests are used for small samples but asymptotic results are available for larger samples.
The new method of estimation, which is particularly recommended as a small sample method, appears to have several desirable properties. Simulation studies indicate that the method performs well over all sample sizes and is robust against a variety of censoring patterns.
CHAPTER ONE
INTRODUCTION
1.1 The Concept of Censoring
A good deal of research data do not conform to optimal statistical designs and methods. A common departure from the ideal is the "missing data" problem. Another departure, common in certain contexts, is the occurrence of what we might think of as partial information. For example, in medical follow-up, the time from an initial treatment to a particular event, such as "relapse" or "death," would not be available if contact with the patient was lost. A "loss" would often be due simply to a decision to analyze the data at a certain time regardless of whether all patients had undergone the particular event. Thus, for those patients alive at the time of analysis the time from treatment to death would not be known, but it would be known to be greater than their observed survival time. Such partial data are termed, in general, censored data, and in this example, where the observed value is a lower bound for the "true" value, it is termed a right-censored value. We define this and other types of censoring more formally later.
The terms "lifetime" and "survival time" are natural
in a medical context or anywhere where time is the response
measure, and "survival analysis" is a general term used for methods of analyzing censored data. However, censored responses need not concern survival or be a time. Miller (1981) quotes an example from Leavitt and Olshen (1974) where the response is the amount paid on an insurance claim; in some cases the patient's illness is over and the total claim is known, in others the patient is still sick and only the amount paid to date is known.
Much of the early work in life testing, following a pioneering paper by Epstein and Sobel (1953), was done in a parametric framework. That work was largely applied to electrical engineering problems where certain distributional assumptions seem to be appropriate (cf., Mann, Schafer and Singpurwalla, 1974; Nelson, 1981). Mann et al. also describe Type I and Type II censoring mechanisms which are particularly appropriate for industrial experimentation. Our focus is on situations, particularly medical, where we would like to use a distribution-free technique and where the censorship is random in the sense that we now describe.
Let $T_i$, $i = 1,\ldots,n$, be independent random variables, which we may think of as representing true, possibly unobservable, lifetimes, and let independent variables $C_i$, $i = 1,\ldots,n$, be censoring variables in the sense that $T_i$ is observable only if $T_i \le C_i$. We can observe variables $(Y_i, \delta_i)$, $i = 1,\ldots,n$, where
$$Y_i = \min(T_i, C_i),$$
and
$$\delta_i = \begin{cases} 1 & \text{if } T_i \le C_i \text{ (i.e., } T_i \text{ is not censored)} \\ 0 & \text{if } T_i > C_i. \end{cases}$$
This is a random censorship model. It is common to assume that $T_i$ and $C_i$ are independent. Since the observable Y-values are lower bounds for the "true" values, they are described as right-censored values. In random left-censoring we would observe $T_i$ only if $T_i \ge C_i$; that is, we observe
$$Y_i = \max(T_i, C_i),$$
and
$$\delta_i = \begin{cases} 1 & \text{if } T_i \ge C_i \\ 0 & \text{if } T_i < C_i. \end{cases}$$
Both types of censoring might occur in the same data set and both are special cases of interval censoring in which we observe only that the random variable of interest falls in an interval. For example, with left-censoring we observe only that $T_i$ falls in the interval $(-\infty, C_i]$.
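As a small illustration of the random censorship model just described, the following Python sketch forms the observable pairs from lifetimes and censoring times. The function names and the numerical values are hypothetical and chosen only for illustration; they are not taken from the dissertation.

```python
def right_censor(lifetimes, censor_times):
    """Observable pairs (y, delta) under right-censoring:
    y = min(t, c), delta = 1 if t <= c (uncensored), else 0."""
    return [(min(t, c), 1 if t <= c else 0)
            for t, c in zip(lifetimes, censor_times)]


def left_censor(lifetimes, censor_times):
    """Observable pairs (y, delta) under left-censoring:
    y = max(t, c), delta = 1 if t >= c (uncensored), else 0."""
    return [(max(t, c), 1 if t >= c else 0)
            for t, c in zip(lifetimes, censor_times)]


# Hypothetical true lifetimes and censoring times (illustrative only).
t = [5.0, 2.0, 7.5, 3.0]
c = [4.0, 6.0, 9.0, 3.0]

right_obs = right_censor(t, c)
left_obs = left_censor(t, c)
```

In the right-censored output only the first observation is censored, since its lifetime exceeds its censoring time; its recorded value is a lower bound for the true lifetime.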
The occurrence of censoring will now be discussed with reference to a particular medical study, one to which we shall refer throughout. The Stanford Heart Transplant Program is described by Brown, Hollander, and Korwar (1974) and by several other authors. A patient admitted to the program will receive a transplant if he survives until a donor is found. Records for a patient who does receive a heart include date of admission to the program, date of transplant and the date of death if a death occurred. Survival times of those who receive a heart will in some cases be right-censored. This would occur if analysis was carried out on a date on which some recipients were still alive, or if contact was lost with a patient at a time when he was known to be alive, or if a patient died from some cause not related to heart function. It seems reasonable in this example to assume that the censorship time is not related to the survival time except perhaps in a case where a patient may have chosen to drop out due to some features of heart performance.
Special techniques are needed for analyzing censored
data. Many of the available nonparametric methods for these problems are given in recent textbooks by Miller (1981), Kalbfleisch and Prentice (1980), Lawless (1981), and Elandt-Johnson and Johnson (1980). The particular problem tackled in this study is now described.
1.2 Aims and a Conspectus
This dissertation concerns regression analysis where the response variable is right-censored. Such an analysis might be useful, for example, with the Stanford data in relating survival time to one or more concomitant measures taken on patients. Waiting time for a donor would be one such measure. Others available in that data set are age of patient and a mismatch score which measures the degree of tissue incompatibility between the donor and recipient hearts.
In the next chapter we review some of the methods that have been proposed for the regression problem and in Chapter Three present a new method. Some simulation results and results of analyzing the Stanford heart data are presented in Chapter Four and concluding observations and suggestions are made in Chapter Five. Fortran coding and documentation for the new method are included in appendices.
Tables and figures are numbered on separate systems
which run consecutively through the whole text. All other cross referencing follows the pattern chapter, section and item within the section in that order, except that the chapter reference will be dropped when the citation appears in the same chapter. Thus the fourth numbered item in Section Two of Chapter Three would be cited as (3.2.4), except in Chapter Three where it would be simply (2.4). Theorems, Lemmas, Definitions, Examples, etc. all fall consecutively into the numbering scheme.
CHAPTER TWO
REGRESSION METHODS
2.1 Regression Models
The four main nonparametric regression techniques
currently available are due to Cox (1972), Miller (1976), Buckley and James (1979), and Koul, Susarla and Van Ryzin (1981). These methods will be described later in this chapter, but first some preliminary discussion about hazard rates and accelerated time models is in order.
For a random variable $T > 0$ with density $f(t)$ and right-continuous distribution function $F(t)$, the survival function is
$$P(t) = 1 - F(t) = \Pr(T > t). \tag{1.1}$$
The hazard rate (hazard function or force of mortality), $\lambda(t)$, is defined by
$$\lambda(t) = \frac{f(t)}{1 - F(t)}, \tag{1.2}$$
with the interpretation that $\lambda(t)\,dt$ is the probability of "death" in the interval $(t, t+dt)$ given survival beyond $t$. Integration of $\lambda(t)$ shows that
$$P(t) = \exp\left(-\int_0^t \lambda(u)\,du\right). \tag{1.3}$$
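The relation (1.3) between the hazard rate and the survival function can be checked numerically. The sketch below assumes, purely for illustration, a Weibull hazard with unit scale, for which the survival function is known in closed form; the function names and the midpoint integration rule are choices made here and are not part of the original text.

```python
import math


def hazard(t, shape=1.5):
    """Weibull hazard with unit scale (an illustrative assumption)."""
    return shape * t ** (shape - 1.0)


def survival_from_hazard(t, shape=1.5, steps=10000):
    """Survival function via (1.3): exp of minus the cumulative hazard,
    with the integral approximated by the midpoint rule."""
    h = t / steps
    cumulative = sum(hazard((k + 0.5) * h, shape) for k in range(steps)) * h
    return math.exp(-cumulative)


def survival_exact(t, shape=1.5):
    """Closed-form Weibull survival function, exp(-t**shape)."""
    return math.exp(-t ** shape)


approx = survival_from_hazard(2.0)
exact = survival_exact(2.0)
```

The two values agree to within the accuracy of the numerical integration.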
The hazard rate forms the essential structure of the Cox regression model. It can be connected with a linear regression model in the following way. Suppose $U_0$ is a survival time with hazard rate $\lambda_0(u) = f_0(u)/(1 - F_0(u))$. It might be appropriate to assume that for a parameter $\beta$ and some covariate $x$ we could obtain different failure times via
$$U_x = e^{\beta x} U_0. \tag{1.4}$$
The relationship (1.4) is referred to as an accelerated time model, and indeed with $\beta x < 0$ we have $U_x < U_0$ and the model represents a situation where a failure time experiment might be speeded up by some control over a covariable $x$ (cf., Mann et al., 1974). This model is readily seen to coincide with a log-linear model,
$$T_x = \log U_x = \alpha + \beta x + \varepsilon, \tag{1.5}$$
where
$$\alpha = E[\log U_0] \quad \text{and} \quad \varepsilon = \log U_0 - E[\log U_0].$$
Notice that in (1.5) the covariate acts in an additive way on the log lifetime.
With model (1.4) the hazard rates of $U_x$ and $U_0$ are related in the following way:
$$\lambda_x(u) = \frac{f_x(u)}{1 - F_x(u)} = \frac{f_0(e^{-\beta x}u)\,e^{-\beta x}}{1 - F_0(e^{-\beta x}u)} = \lambda_0(e^{-\beta x}u)\,e^{-\beta x}.$$
If $U_0$ has an exponential distribution, then the hazard rate $\lambda_0(\cdot)$ is constant. Letting $\lambda_0(e^{-\beta x}u) = \lambda_0$ we see that for this special case,
$$\lambda_x(u) = \lambda_0 e^{-\beta x}.$$
That is, the hazard function for $U_x$ is proportional to some function of $x$ and the covariable acts in a multiplicative way on the hazard. This motivates consideration of a proportional hazards model,
$$\lambda_x(u) = \lambda_0(u)\,e^{\beta x} \tag{1.6}$$
for some baseline $\lambda_0(\cdot)$ and a parameter $\beta$.
We have seen that with the exponential distribution, the accelerated time model (1.4), which is a log-linear model (1.5), implies a proportional hazards model (1.6). In general the two models do not coincide. Only in the case of the two-parameter Weibull family of distributions will they be equivalent (Kalbfleisch and Prentice, 1980, p. 34).
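The Weibull case can be illustrated numerically. The sketch below assumes a Weibull baseline hazard with unit scale and computes the hazard of the accelerated lifetime from the relation derived above for model (1.4); the ratio to the baseline hazard is then constant in time, which is the proportional hazards property. The parameter values and function names are hypothetical.

```python
import math


def baseline_hazard(u, shape):
    """Weibull baseline hazard with unit scale (illustrative assumption)."""
    return shape * u ** (shape - 1.0)


def accelerated_hazard(u, x, beta, shape):
    """Hazard of U_x = exp(beta * x) * U_0: the baseline hazard evaluated
    at exp(-beta * x) * u, multiplied by exp(-beta * x)."""
    scale = math.exp(-beta * x)
    return baseline_hazard(scale * u, shape) * scale


def hazard_ratios(x, beta, shape, times):
    """Ratio of accelerated hazard to baseline hazard at each time."""
    return [accelerated_hazard(u, x, beta, shape) / baseline_hazard(u, shape)
            for u in times]


# Hypothetical settings: covariate value 2, beta = 0.3, Weibull shape 1.5.
ratios = hazard_ratios(x=2.0, beta=0.3, shape=1.5, times=[0.5, 1.0, 2.0, 5.0])
expected = math.exp(-1.5 * 0.3 * 2.0)
```

Under these assumptions every ratio equals exp(-shape * beta * x), so the accelerated model is also a proportional hazards model whose coefficient is the accelerated-time coefficient multiplied by minus the Weibull shape.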
The Cox approach to regression with censored data is through the proportional hazards model, while the other methods we describe assume a direct linear relationship between a response variable, possibly transformed, and one or more covariables.
2.2 The Cox Method
The proportional hazards model proposed by Cox (1972) is
$$\lambda(t;x) = \lambda_0(t)\,e^{\beta' x}, \tag{2.1}$$
where $\lambda(t;x)$ is the hazard function corresponding to the covariate value $x$. The vector of regression coefficients is $\beta$ and the baseline $\lambda_0(t)$ is the hazard rate when $x = 0$. The model implies that distribution functions $F(t;x)$ form a family of Lehmann alternatives and thus that, in terms of survival functions,
$$P(t;x) = \{P_0(t)\}^{\gamma}, \tag{2.2}$$
where $\gamma = e^{\beta' x}$ (Miller, 1981, p. 120). The generalization of (1.3) is
$$P(t;x) = \exp\left\{-e^{\beta' x}\int_0^t \lambda_0(u)\,du\right\}. \tag{2.3}$$
In the Cox analysis, $\lambda_0(t)$ remains an arbitrary nuisance function in the estimation of $\beta$, but we must estimate $\lambda_0(t)$ in order to make inference about the survival function. We first describe the estimation of $\beta$. Cox argued that the only times at which $\lambda(t;x)$ in (2.1) could not conceivably be zero are the observed uncensored survival times. Thus, with $\lambda_0(t)$ arbitrary, the only times at which $\lambda_0(t)$ could not be zero are those uncensored times and those times are the only ones giving information about $\beta$. This stance led Cox to seek a likelihood by conditioning on the observed uncensored survival times, and then to treat that likelihood as an ordinary likelihood.
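Suppose there are no ties and that the censored and uncensored observations are ordered
$$y_{(1)} < y_{(2)} < \cdots < y_{(n)}.$$
Associated with each $y_{(i)}$ is a censoring indicator $\delta_{(i)}$ and covariate vector $x_{(i)}$. Let $R_{(i)}$ denote the "risk set" for each time $y_{(i)}$, i.e.,
$$R_{(i)} = \{j : y_j \ge y_{(i)}\}.$$
Using (2.1) and the interpretation given at (1.2) we see that for each uncensored time $y_{(i)}$,
$$\Pr\{\text{death in } [y_{(i)}, y_{(i)} + dy) \mid R_{(i)}\} = \sum_{j \in R_{(i)}} e^{\beta' x_j}\,\lambda_0(y_{(i)})\,dy,$$
and
$$\Pr\{\text{death of } (i) \text{ at time } y_{(i)} \mid \text{one death in } R_{(i)} \text{ at time } y_{(i)}\} = \frac{e^{\beta' x_{(i)}}}{\sum_{j \in R_{(i)}} e^{\beta' x_j}}. \tag{2.4}$$
The product of the conditional probabilities at (2.4) is called a conditional likelihood:
$$L_c(\beta) = \prod_u \frac{e^{\beta' x_{(i)}}}{\sum_{j \in R_{(i)}} e^{\beta' x_j}}, \tag{2.5}$$
where the product is over the uncensored observations.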
Note that (2.5) is not a true conditional likelihood.
Standard likelihood analysis is applied to (2.5) and iterative methods are usually required (cf., for example, Kendall and Stuart, 1973). Through standard maximum likelihood arguments, Cox (1975) suggests that the $\hat{\beta}$ obtained by maximizing (2.5) is asymptotically normal with mean $\beta$ and covariance matrix the inverse of the information matrix:
$$I(\beta) = -\frac{\partial^2}{\partial \beta^2} \log L_c(\beta). \tag{2.6}$$
A formal proof of asymptotic normality is given by Tsiatis (1981).
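To make the likelihood at (2.5) concrete, the following Python sketch evaluates its logarithm for a single covariate when there are no ties. It is a minimal illustration written for this passage, not the computational procedure used by any of the cited authors; the function name and the small data set are hypothetical.

```python
import math


def cox_log_partial_likelihood(beta, y, delta, x):
    """Logarithm of (2.5) for one covariate, assuming no tied times.

    y     -- observed times (censored or uncensored)
    delta -- 1 if the time is uncensored, 0 if censored
    x     -- covariate values
    """
    total = 0.0
    for i in range(len(y)):
        if delta[i] == 1:
            # Risk set: all subjects whose observed time is at least y[i].
            risk_sum = sum(math.exp(beta * x[j])
                           for j in range(len(y)) if y[j] >= y[i])
            total += beta * x[i] - math.log(risk_sum)
    return total


# Hypothetical data: the second observation is censored.
y = [1.0, 2.0, 3.0, 4.0]
delta = [1, 0, 1, 1]
x = [0.0, 1.0, 0.0, 1.0]

value_at_zero = cox_log_partial_likelihood(0.0, y, delta, x)
value_at_one = cox_log_partial_likelihood(1.0, y, delta, x)
```

At a slope of zero each uncensored time contributes minus the log of the size of its risk set, so here the value is -log(4) - log(2) - log(1). A maximizer of this function over the slope could be found by any one-dimensional search.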
A formal justification for the use of (2.5) in the case of no ties is given by Kalbfleisch and Prentice (1973). They show the equivalence of (2.5) and a marginal likelihood for ranks defined on the censored and uncensored observations. Cox (1975) constructs a full likelihood which is written as a product of two terms, one of which he calls the partial likelihood. This partial likelihood coincides with $L_c(\beta)$ from (2.5). Cox suggests that the partial likelihood contains most of the information about $\beta$ and this seems justified by Efron (1977) and Oakes (1977), who have established that the Cox estimator is nearly fully efficient.
We now need to consider the case of ties in the data.
In the case of an uncensored time equalling a censored time, the censored time is considered larger and the foregoing method applies directly. Cox (1972) suggests ad hoc modification of the procedure if there are just small numbers of ties. His generalization for discrete time is considered computationally not feasible. Two other modifications to the likelihood expression that have been proposed for dealing with discrete data are due to Peto (1972) and Prentice and Gloeckler (1978). The method of Prentice and Gloeckler does strictly adhere to a proportional hazards model; the other two methods do not.
Estimation of the survival function via (2.3) requires estimation of $\lambda_0(t)$. A method for this has been given by Breslow (1974).
The Cox model is flexible and it is interesting to note two adaptations of it. Firstly, suppose we have a two-sample problem with no ties and we let a single covariable $x$ take values 0 or 1 to indicate the sample. The Cox test of $H_0\colon \beta = 0$ using the test statistic
$$\frac{\left[\frac{\partial}{\partial \beta} \log L_c(0)\right]^2}{-\frac{\partial^2}{\partial \beta^2} \log L_c(0)}$$
is then formally equivalent to the Mantel-Haenszel (1959) test of equality of the two distribution functions. Secondly, we can permit the covariable to be a function of time; i.e., use $x(t)$ in (2.1).
The textbook by Kalbfleisch and Prentice (1980) gives a good account of the proportional hazard approach to regression analysis with censored data and lists many useful references.
The Cox analysis does not in general give a direct
relationship between T and x. Some indication of a direct relationship can be seen using estimates of median lifetimes using the survival function estimates at various settings for the vector x. For one covariable the median could be plotted to give a comparison with the results of a direct regression analysis.
The performance of the Cox method and of the other
methods to be described later will be discussed in Section
2.7 and in Chapter Five. Next we define a linear model and then describe three methods of analysis, all of which use a form of least squares analysis.
2.3 The Linear Model
We now state the structure, notation and possible assumptions for a censored linear model that will be referred to throughout the sequel. Thus,
$$T_i = \alpha + \beta' x_i + \varepsilon_i, \quad i = 1,\ldots,n, \tag{3.1}$$
where $\alpha$ is a parameter, $\beta$ a $p \times 1$ vector of regression parameters and $x_i$ a $p \times 1$ vector of known covariables. The error terms $\varepsilon_i$ are independent and identically distributed (i.i.d.) with common distribution function $F(\cdot)$ and $E(\varepsilon_i) = 0$. The response $T_i$ may of course be a transformation, possibly log, of some other response of interest. The $T_i$ are not all observable. Let $C_i$, $i = 1,\ldots,n$, be independent censoring variables and $(Y_i, \delta_i)$ be observable pairs of variables where
$$Y_i = \min(T_i, C_i), \quad i = 1,\ldots,n,$$
and
$$\delta_i = \begin{cases} 1 & \text{if } T_i \le C_i \\ 0 & \text{if } T_i > C_i. \end{cases}$$
Attention must be given to three possible assumptions. A fundamental assumption imposed throughout is
A1: $C_i$, $\varepsilon_i$ are independent, $i = 1,\ldots,n$.
A second, which is essential for the validity of some of the following methods, is
A2: $\Pr(C_i \le c \mid x = x_i) = G_i(c) = G_0(c - \beta' x_i)$,
where $G_0$ is some distribution function and $G_i$ is the distribution function for the censoring variable $C_i$ at $x_i$. Assumption A2 imposes a dependence between $C_i$ and $x_i$. The distribution shift given by A2 ensures that the probability distribution of $\delta_i$ is independent of $x_i$. To see this we have
$$\begin{aligned} \Pr(\delta_i = 1 \mid x = x_i) &= \Pr(T_i \le C_i \mid x = x_i) \\ &= \int \Pr(T_i \le c \mid x = x_i, C_i = c)\,dG_0(c - \beta' x_i) \\ &= \int \Pr(T_i \le c \mid x = x_i)\,dG_0(c - \beta' x_i) \\ &= \int F(c - \alpha - \beta' x_i)\,dG_0(c - \beta' x_i) \\ &= \int F(u - \alpha)\,dG_0(u), \end{aligned}$$
which is independent of $x_i$. An alternative to A2 is
A3: $G_i(c) = G_0(c)$.
Thus under A3 the censoring distribution does not depend on $x_i$.
These assumptions will come into consideration throughout what follows. We now look at three methods of estimating the parameters of (3.1).
2.4 The Miller Method
For simplicity of description we now take (3.1) with $p = 1$, but note that all the methods of this chapter do apply to multiple regression.
The Miller method is seen to be a natural modification of ordinary least squares by writing the residual sum of squares for the uncensored case in the following way:
$$\sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2 = n \int_{-\infty}^{\infty} z^2\,dF_n(z),$$
where
$$z_i = y_i - \alpha - \beta x_i, \tag{4.1}$$
and $F_n$ is the empirical distribution function of the $z_i$, $i = 1,\ldots,n$. In the censored case some $y_i$ and hence some $z_i$ will be censored values. Miller's (1976) modification of the least squares method is to estimate the parameters by minimizing
$$\int z^2\,d\hat{F}(z), \tag{4.2}$$
where $\hat{F}(z)$ is an estimate of the distribution function of $z$ based on the observations $(z_i, \delta_i)$, $i = 1,2,\ldots,n$.
For ordered censored and uncensored observations, $z_{(1)} < z_{(2)} < \cdots < z_{(n)}$, the product-limit (PL) estimator for $F(z)$ introduced by Kaplan and Meier (1958) is
$$\hat{F}(z) = 1 - \prod_{\{i:\, z_{(i)} \le z\}} \left(\frac{n-i}{n-i+1}\right)^{\delta_{(i)}}, \tag{4.3}$$
where $\delta_{(i)}$ denotes the $\delta$ associated with $z_{(i)}$. There is a difficulty if $\delta_{(n)} = 0$; the convention is to define $\hat{F}(z) = 1$, $z \ge z_{(n)}$. The PL estimate $\hat{F}(z)$ is a step function with jumps only at the uncensored points. The size of the jump at an uncensored $z_{(i)}$ is
$$d\hat{F}(z_{(i)}) = \frac{1}{n-i+1} \prod_{\{j:\, j < i\}} \left(\frac{n-j}{n-j+1}\right)^{\delta_{(j)}}. \tag{4.4}$$
The computational forms given at (4.3) and (4.4) are applied to tied z-values by assuming that in a run of tied values the uncensored values precede the censored values. Otherwise the labelling across the tied group is arbitrary and the jumps at each uncensored point in a tied group are equal (by 4.4).
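The jumps of the PL estimator at (4.4), including the tie rule and the convention for a censored largest value, can be sketched in Python as follows. The function name, argument names and example values are hypothetical; the sketch is an illustration of the formulas above and not the Fortran coding of the appendices.

```python
def pl_jumps(z, delta, force_last_uncensored=True):
    """Jumps of the product-limit estimator at each observation, as in (4.4).

    z      -- residuals (censored or uncensored)
    delta  -- 1 if uncensored, 0 if censored
    Returns the jumps in the original order of z; censored points get 0.
    """
    n = len(z)
    # Order by z; within a run of ties, uncensored values precede censored.
    order = sorted(range(n), key=lambda i: (z[i], -delta[i]))
    weights = [0.0] * n
    surv = 1.0  # running product over earlier ranks j < i in (4.4)
    for rank, idx in enumerate(order, start=1):
        d = delta[idx]
        if rank == n and force_last_uncensored:
            d = 1  # convention: treat the largest value as uncensored
        if d == 1:
            weights[idx] = surv / (n - rank + 1)
            surv *= (n - rank) / (n - rank + 1)
    return weights


# Hypothetical residuals; the second value is censored.
w = pl_jumps([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1])
```

In this example the mass that the censored second point would have carried is redistributed to the larger uncensored values, and the jumps sum to one. With no censoring every jump is 1/n, which recovers the empirical distribution function.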
Denoting the jump (4.4) by $w_i(\beta)$, (4.2) becomes
$$\sum_{i=1}^{n} w_i(\beta)\,(y_i - \alpha - \beta x_i)^2. \tag{4.5}$$
The weights $w_i(\beta)$ do not depend on $\alpha$ and, with the convention that $\delta_{(n)}$ is set equal to one in all cases, we have $\sum_{i=1}^{n} w_i(\beta) = 1$. If $\delta_i = 0$, then $w_i(\beta) = 0$ and (4.5) appears to depend only on the uncensored observations; however, the PL estimator and therefore each weight do depend on all the observations. Seeking an $\hat{\alpha}$ which minimizes (4.5) we get
$$\hat{\alpha} = \sum_{i=1}^{n} w_i(\beta)\,y_i - \beta \sum_{i=1}^{n} w_i(\beta)\,x_i, \tag{4.6}$$
and resubstituting this in (4.5) would require us to minimize a function
$$\sum_{i=1}^{n} w_i(\beta)\,(y_i - \hat{\alpha} - \beta x_i)^2. \tag{4.7}$$
This could be minimized by a search procedure, but this has difficulties, especially in higher dimensions. Miller (1976) advocates a modified approach which we now describe.
The alternative procedure is an iterative one. An initial set of z-values is formed using (4.1) with $\alpha = 0$ and some starting estimate, $\hat{\beta}_0$, for $\beta$. Weights $w_i(\hat{\beta}_0)$ are then computed using (4.4) and the next estimate for $\beta$ is defined
$$\hat{\beta}_1 = \frac{\sum_u w_i^*(\hat{\beta}_0)\,y_i\,(x_i - \bar{x}_u^*)}{\sum_u w_i^*(\hat{\beta}_0)\,(x_i - \bar{x}_u^*)^2}, \tag{4.8}$$
where
$$w_i^*(\hat{\beta}_0) = \frac{w_i(\hat{\beta}_0)}{\sum_u w_i(\hat{\beta}_0)}, \qquad \bar{x}_u^* = \sum_u w_i^*(\hat{\beta}_0)\,x_i,$$
with the summations being over the uncensored observations.
The estimate $\hat{\beta}_1$ is then used in (4.1) and so on. Notice that since the weights do not depend on $\alpha$ we chose $\alpha = 0$ in (4.1) for convenience, and also note that because the weights are renormalized over the uncensored observations the estimation of $\beta$ does not require any adjustment when $\delta_{(n)} = 0$. As a starting value $\hat{\beta}_0$, Miller suggests using the unweighted least squares estimate of slope using only the uncensored observations. The final estimate of $\beta$ is used in (4.6) for calculating $\hat{\alpha}$ and in this computation we do set $\delta_{(n)} = 1$ regardless of the actual censorship.
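The iterative procedure just described can be sketched in Python for the case of one covariable. This is a minimal illustration under stated assumptions: the helper `pl_jumps`, the function name `miller_slope`, the stopping tolerance and the example data are all hypothetical, and no averaging over a loop of oscillating iterates is attempted. It assumes at least two uncensored observations with distinct covariate values.

```python
def pl_jumps(z, delta):
    """Product-limit jumps as in (4.4); uncensored values precede
    censored ones within ties. No adjustment is made to the last value."""
    n = len(z)
    order = sorted(range(n), key=lambda i: (z[i], -delta[i]))
    weights, surv = [0.0] * n, 1.0
    for rank, idx in enumerate(order, start=1):
        if delta[idx] == 1:
            weights[idx] = surv / (n - rank + 1)
            surv *= (n - rank) / (n - rank + 1)
    return weights


def miller_slope(x, y, delta, beta0, max_iter=50, tol=1e-8):
    """Iterate (4.8) from the starting slope beta0.
    Returns (slope, converged)."""
    beta = beta0
    for _ in range(max_iter):
        # Residuals from (4.1) with the intercept set to zero.
        z = [yi - beta * xi for xi, yi in zip(x, y)]
        w = pl_jumps(z, delta)
        total = sum(w)
        ws = [wi / total for wi in w]  # renormalized; zero when censored
        xbar = sum(wi * xi for wi, xi in zip(ws, x))
        num = sum(wi * yi * (xi - xbar) for wi, xi, yi in zip(ws, x, y))
        den = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(ws, x))
        new_beta = num / den
        if abs(new_beta - beta) < tol:
            return new_beta, True
        beta = new_beta
    return beta, False


# Hypothetical data: uncensored points lie on y = 1 + 2x; the third
# observation is censored below the line.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 4.0, 7.0, 9.0]
delta = [1, 1, 0, 1, 1]
slope, converged = miller_slope(x, y, delta, beta0=0.0)
```

Because the uncensored points in this example lie exactly on a line, any weighted least squares fit to them returns that line's slope and the iteration settles immediately. With real data the weights change from one iterate to the next and convergence is not guaranteed, as noted in the text that follows.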
This iterative method might not attain convergence. If estimates become trapped in a loop, Miller advocates averaging the values in the loop. Assumption A2 is a sufficient condition for the estimates $\hat{\alpha}$ and $\hat{\beta}$ to be consistent and for $\hat{\beta}$ to be asymptotically normal. Under an additional assumption that the variability due to weights $w_i(\hat{\beta})$ is negligible, Miller obtains an asymptotic estimate for the variance of $\hat{\beta}$ given by
$$\widehat{\mathrm{Var}}(\hat{\beta}) = \frac{\sum_u w_i^*(\hat{\beta})\,(y_i - \hat{\alpha} - \hat{\beta} x_i)^2}{\sum_u w_i^*(\hat{\beta})\,(x_i - \bar{x}_u^*)^2}. \tag{4.9}$$
Clearly if A2 is not satisfied, as in the case where censoring is independent of x with a fairly steep regression line, then there will be an irregular censoring pattern along the regression line which can lead to biased estimates. In such cases Miller suggests dividing the x range into two or more intervals each of which shows
a more even censoring pattern. Separate sets of weights are computed for each interval and the sum of squares to be minimized is then a weighted average of the separate weighted sums of squares. This correction does of course demand a fairly large number of observations and there are practical difficulties in applying it to the multiple regression situation.
2.5 The Buckley-James Method
The method due to Miller applied estimated weights to the residual sum of squares and then used a least squares solution. For the same model (3.1) the Buckley-James method (BJ from here on) modifies instead the solution to the usual normal equations by using estimates of $T_i$ when these are not observable. Let the pseudo random variable $Y_i^*$ be defined by
$$Y_i^* = Y_i\,\delta_i + E[T_i \mid T_i > Y_i]\,(1 - \delta_i). \tag{5.1}$$
Since $E(Y_i^*) = \alpha + \beta x_i$ (Miller, 1981, p. 151), we could estimate $\beta$ using ordinary least squares if all the $Y_i^*$ were observable. The BJ approach is to estimate the value $y_i^*$ whenever $\delta_i = 0$ by using the following estimate in (5.1):
$$\hat{E}[T_i \mid T_i > y_i] = \beta x_i + \frac{\sum_{\{k:\, z_k > z_i\}} w_k(\beta)\,z_k}{1 - \hat{F}(z_i)}. \tag{5.2}$$
Here as before $z_i = y_i - \beta x_i$, $\hat{F}(\cdot)$ is the PL estimator based on $(z_i, \delta_i)$, $i = 1,\ldots,n$, and $w_i(\beta)$, $i = 1,\ldots,n$, are the jumps of $\hat{F}(\cdot)$ (see 4.4). The second term on the right-hand side of (5.2) uses a weighted average of those $z_k$'s which are greater than the current $z_i$. Iteration is again necessary.
Given a starting estimate of $\beta$, (5.2) is used in (5.1) to obtain the "responses" $y_i^*$, $i = 1,\ldots,n$. Using least squares the next iterate for $\beta$ is
$$\hat{\beta} = \frac{\sum_{i=1}^{n} y_i^*\,(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}. \tag{5.3}$$
Again this process may not converge. Oscillation cycles might develop and the average of values in the cycle is suggested as the solution. Given a solution for $\beta$ the solution for $\alpha$ is
$$\hat{\alpha} = \bar{y}^* - \hat{\beta}\,\bar{x}. \tag{5.4}$$
The starting value for the iterations might be the same as for the Miller method or it could, as suggested by Miller and Halpern (1982), be the least squares solution for $\beta$ using all of the data as if they were uncensored. The variability among the oscillating values, if they occur, is thought to be less in the BJ method than in the Miller method.
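The BJ iteration can be sketched in Python for one covariable. The sketch makes several assumptions that are not taken from the text: the largest residual is treated as uncensored so that the PL jumps sum to one, a censored residual with no estimated mass above it keeps its observed value, and the function names, tolerance and data are hypothetical. It does not average over oscillation cycles.

```python
def pl_jumps(z, delta):
    """Product-limit jumps as in (4.4), with the largest value treated
    as uncensored so that the jumps sum to one."""
    n = len(z)
    order = sorted(range(n), key=lambda i: (z[i], -delta[i]))
    weights, surv = [0.0] * n, 1.0
    for rank, idx in enumerate(order, start=1):
        if delta[idx] == 1 or rank == n:
            weights[idx] = surv / (n - rank + 1)
            surv *= (n - rank) / (n - rank + 1)
    return weights


def buckley_james(x, y, delta, beta0, max_iter=50, tol=1e-8):
    """Iterate (5.1)-(5.3); returns (slope, intercept, converged)."""
    n = len(y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta, alpha = beta0, None
    for _ in range(max_iter):
        z = [yi - beta * xi for xi, yi in zip(x, y)]
        w = pl_jumps(z, delta)
        ystar = []
        for i in range(n):
            # Jumps located strictly above the current residual; their
            # total equals 1 - F_hat(z_i) because the jumps sum to one.
            tail = [(w[k], z[k]) for k in range(n) if z[k] > z[i]]
            mass = sum(wk for wk, _ in tail)
            if delta[i] == 1 or mass <= 0.0:
                ystar.append(y[i])
            else:
                cond_mean = sum(wk * zk for wk, zk in tail) / mass
                ystar.append(beta * x[i] + cond_mean)  # estimate (5.2)
        new_beta = sum(ys * (xi - xbar) for ys, xi in zip(ystar, x)) / sxx
        alpha = sum(ystar) / n - new_beta * xbar  # as in (5.4)
        if abs(new_beta - beta) < tol:
            return new_beta, alpha, True
        beta = new_beta
    return beta, alpha, False


# Hypothetical data: uncensored points lie on y = 1 + 2x; the third
# observation is censored below the line.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 4.0, 7.0, 9.0]
delta = [1, 1, 0, 1, 1]
slope, intercept, converged = buckley_james(x, y, delta, beta0=0.0)
```

In this constructed example the censored response is replaced by its estimated conditional mean, which lands on the line through the uncensored points, so the iteration settles on that line.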
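Buckley and James (1979) suggest, but do not fully justify, that $\hat{\beta}$ is asymptotically normal and that a reasonable estimate of variance is
$$\widehat{\mathrm{Var}}(\hat{\beta}) = \frac{\hat{\sigma}_u^2}{\sum_u (x_i - \bar{x}_u)^2}, \tag{5.5}$$
where
$$\hat{\sigma}_u^2 = \frac{1}{n_u - 2} \sum_u \left(y_i - \bar{y}_u - \hat{\beta}\,(x_i - \bar{x}_u)\right)^2, \tag{5.6}$$
with $n_u$ the number of uncensored observations, $\bar{x}_u$ and $\bar{y}_u$ their means, and the summations being over the uncensored observations.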
Although this method does not require assumption A2 for point estimation, we would not expect (5.5) to be adequate if there is extremely uneven censoring along the regression line. As in the Miller method, it seems sensible to consider dividing the x range to create more homogeneous regions.
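The variance estimate at (5.5) and (5.6) uses only the uncensored observations, which the following Python sketch makes explicit. The function name and example values are hypothetical, and the sketch assumes more than two uncensored observations with covariate values that are not all equal.

```python
def bj_variance(x, y, delta, beta_hat):
    """Variance estimate (5.5) with the residual variance (5.6),
    both computed from the uncensored observations only."""
    xu = [xi for xi, d in zip(x, delta) if d == 1]
    yu = [yi for yi, d in zip(y, delta) if d == 1]
    nu = len(xu)
    xbar_u = sum(xu) / nu
    ybar_u = sum(yu) / nu
    residual_ss = sum((yi - ybar_u - beta_hat * (xi - xbar_u)) ** 2
                      for xi, yi in zip(xu, yu))
    sigma2_u = residual_ss / (nu - 2)
    return sigma2_u / sum((xi - xbar_u) ** 2 for xi in xu)


# Hypothetical data: the last observation is censored and is ignored.
var_hat = bj_variance([0.0, 1.0, 2.0, 5.0], [0.0, 1.0, 0.0, 9.0],
                      [1, 1, 1, 0], beta_hat=0.0)
```

For these values the uncensored residual sum of squares is 2/3 on one degree of freedom and the corrected sum of squares of x is 2, so the estimate is 1/3.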
We should note that the BJ method is a nonparametric analogue of a technique due to Schmee and Hahn (1979) that uses normality assumptions in estimating the expectation at (5.2).
2.6 The Koul, Susarla and Van Ryzin Method
This method (KSV) assunes the Model (3.1) and like BJ defines a pseudo random variable which is unbiased for a + ax and then uses estimates for some observations of that variable. In this case the pseudo random variable is
, 5.Y.
y= 11 , (6.1)
1 1 G(Yi)
where C(') is the right continuous distribution function for the censoring variables. The Yi variables, being either
zero or inflated values of true survival times, are not intuitively pleasing but we do have E(Y.) = a + Sxi. Using an estimate of G() we can obtain observations yi, i = 1,...,n, and then apply the usual least squares solutions (5.3, 5.4). No assumption about G() is required for Yi to be unbiased but in order to estimate G() we must assume that it is independent of x; i.e., we make assumption A3 (cf., Section
2.3).
By reversing the roles of the survival and censoring random variables it would be possible to estimate the distribution function $G(\cdot)$ using the PL estimator; however, KSV advocate the use of an alternative, asymptotically similar estimator, which is technically more convenient in their theory. Use is made of a Bayesian estimator from Susarla and Van Ryzin (1980),
$$1 - \hat{G}(t) = \prod_{i:\, y_i \le t} \left[\frac{1 + N^+(y_i)}{2 + N^+(y_i)}\right]^{1 - \delta_i}, \tag{6.2}$$
where $N^+(y)$ is the number of $y_j$ greater than y. Note that in (6.2), because we are making inference about censoring variables, we take censored y's as preceding uncensored y's in the case of ties. Like the PL estimator, the estimator at (6.2) can be unstable for large t, and KSV propose a truncation of very large observations and use (6.2) only for $y_i \le M_n$. The selection of $M_n$, however, is not clear-cut.
No iteration is required and the parameter estimation is computationally straightforward for any number of covariables. Koul et al. (1981) give conditions for consistency and asymptotic normality of the solutions and discuss their implications in practice. Integral expressions are given for the asymptotic variances and covariances of the estimators, but computational forms are available only for the one covariable case.
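The pseudo-observations at (6.1), with the censoring distribution estimated as at (6.2), can be sketched as follows. This is only an illustration under stated assumptions: the function name `ksv_pseudo_values` is hypothetical, the truncation at $M_n$ is omitted, and the tie rule is applied only through the convention that a censored value tied with an uncensored one is counted as preceding it.

```python
def ksv_pseudo_values(y, delta):
    """Pseudo-observations delta_i * y_i / (1 - G_hat(y_i)), as at (6.1)-(6.2).

    y     : observed (possibly censored) times
    delta : 1 if uncensored, 0 if censored
    Hypothetical helper; no truncation at M_n is applied.
    """
    def n_plus(t):
        # N+(t): number of observed y_j strictly greater than t
        return sum(1 for yj in y if yj > t)

    def censoring_survival(t):
        # 1 - G_hat(t): product over censored observations with y_i <= t
        prod = 1.0
        for yi, di in zip(y, delta):
            if di == 0 and yi <= t:
                prod *= (1 + n_plus(yi)) / (2 + n_plus(yi))
        return prod

    return [di * yi / censoring_survival(yi) for yi, di in zip(y, delta)]


# With no censoring the pseudo-values are the data themselves;
# a censored first observation becomes zero and inflates the later ones.
print(ksv_pseudo_values([1, 2, 3], [1, 1, 1]))
print(ksv_pseudo_values([1, 2, 3], [0, 1, 1]))
```

The resulting values would then be passed to an ordinary least squares fit on x.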
2.7 Some Comparative Studies
The Cox method has been given much attention in the literature and is used extensively, yet there is much appeal in the direct, easy interpretation of the linear model. It would seem that the choice between the Cox method and the other three methods just outlined must be based on the appropriateness of the proportional hazards model or the linear model. However, a distinction may be hard to make in practice. The choice of method should also be influenced by the nature of the censoring, but again this is not likely to be clear-cut. The methods are asymptotic and in most cases involve heuristic arguments or hard-to-verify conditions.
Of major concern, then, is how robust the methods turn out to be in practice. Some trials and comparisons that have been made with these methods are now referred to briefly, but a fuller appraisal of various aspects of the methods is deferred to Chapter Five.
In his original paper Miller (1976) compares the performances of the Cox and Miller methods in analyzing data from the Stanford Heart Program that was referred to in Chapter One. Miller and Halpern (1982) extend this to a comparison of all four methods and come out generally in favor of the Cox and BJ methods, finding that the other two methods have methodological weaknesses. Buckley and James (1979) provide some simulation results for the Miller and BJ methods. Most of their trials are at sample size 50 with 50 percent censoring. They employ a variety of residual error and censoring models and investigate the bias of the estimate. The BJ method appears robust, while the Miller method seems rather sensitive to departures from assumption A2 (see Section 2.3). Buckley and James (1979, p. 434) do not report coverage probabilities for their intervals but do report "quite adequate variance estimates" (using 4.5).
All of the methods considered in this chapter are asymptotic methods. In the next chapter we develop a procedure which is appropriate for all sample sizes.
CHAPTER THREE
A NEW METHOD
3.1 Structure and Notation
We shall adopt the model and notation of Section 2.3.
A method will be described for estimating $\beta$ in model (2.3.1) with p = 1. The observable pairs $(Y_i, \delta_i)$ described in Section 2.3 may also be represented in the following way. Let $Y_i$ ($= T_i$) denote an observable uncensored random variable and $Y_i^+$ ($= C_i < T_i$) denote an observable censored random variable. The general notation $Y_i^{(+)}$ is used to indicate that censoring might or might not be present. Sample observations will be denoted $(y_i^{(+)}, x_i)$.
Assumption A2 of Section 2.3 will be imposed in order for the new method to provide a confidence interval. It might be argued that A2 imposes a severe restriction on the practical utility of the proposed method, but a Monte Carlo study reported in Chapter Four indicates that the method is fairly robust to departures from A2. Recall that assumption A2 is also required for the use of the Miller and Buckley-James methods. This is discussed further in Chapter Five.
3.2 Point Estimation of $\beta$
In the case of no censoring, some procedures for testing $H_0\colon \beta = b$ and for estimation of $\beta$ use the fact that
$$Z_i(b) = Y_i - b x_i = \alpha + (\beta - b) x_i + \varepsilon_i$$
and $x_i$ are functionally independent under $H_0$. Any test for trend among the pairs $(Z_i(b), x_i)$ may be used to test $H_0$. Suppose the test statistic S is a function of the pairs $(Z_i(b), x_i)$, $i = 1,\ldots,n$; then, following Hodges and Lehmann (1963), an estimate of $\beta$ can be defined as that value b which makes $S = S(Z(b), x)$ most consistent with a hypothesis of no trend relationship between $Z_i(b)$ and x. Typically, the statistic chosen is $\tau$, the hypothesis of no trend is $\tau = 0$, and $\tau$ is distribution-free, symmetric about zero (cf., Kendall, 1970; Sen, 1968).
Brown, Hollander and Korwar (1974) extend the above testing procedure to the case of censored data, but do not attempt to estimate $\beta$. They provide a test of $H_0\colon \beta = 0$ using a statistic which is a special case (b = 0) of that which is now described and given at (2.3). Following the above discussion and taking censoring into account, define
$$Z_i^{(+)}(b) = Y_i^{(+)} - b x_i, \quad i = 1,\ldots,n. \tag{2.1}$$
As before, under $H_0\colon \beta = b$ there will be no trend among the pairs $(Z_i^{(+)}(b), x_i)$, and our purpose now is to define a statistic that would indicate any such trend. Firstly, in the intuitive spirit of Brown, Hollander and Korwar, we state the following definition. The argument b of $Z^{(+)}(\cdot)$ is suppressed for convenience, and Z without superscript is the variable value without regard to censorship.
Definition 2.2. A pair of points $(Z_i^{(+)}, x_i)$, $(Z_j^{(+)}, x_j)$ with $x_i > x_j$ are "definitely concordant" if either $\delta_i = \delta_j = 1$ and $Z_i > Z_j$, or $\delta_i = 0$, $\delta_j = 1$ and $Z_i > Z_j$; and "definitely discordant" if either $\delta_i = \delta_j = 1$ and $Z_i < Z_j$, or $\delta_i = 1$, $\delta_j = 0$ and $Z_i < Z_j$.
Pairs of points for which Definition 2.2 does not apply are considered as unable to contribute to an assessment of overall concordance and are ignored.
Motivated by this assessment of pairs of points, we define a statistic that may be used to indicate trend among the pairs $(Z_i^{(+)}(b), x_i)$, $i = 1,\ldots,n$, by
$$S(Z^{(+)}(b), x) = \sum_{i=1}^{n} \sum_{j=1}^{n} \eta_{ij}(b)\, \psi(x_i - x_j), \tag{2.3}$$
where
$$\eta_{ij}(b) = \delta_j\, \phi(Z_i(b) - Z_j(b)) - \delta_i\, \phi(Z_j(b) - Z_i(b)),$$
$$\psi(t) = \begin{cases} 1, & t > 0, \\ 0, & t \le 0, \end{cases} \qquad \phi(t) = \begin{cases} 1, & t > 0, \\ \tfrac{1}{2}, & t = 0, \\ 0, & t < 0. \end{cases}$$
For convenience the arguments for $Z^{(+)}(\cdot)$ and $S(\cdot)$ will be suppressed for general reference. To emphasize the dependence on b, $S(Z(b), x)$ will often be abbreviated to S(b). Notice that S is the number of "definitely concordant" pairs minus the number of "definitely discordant" pairs and thus has some appeal as a measure of trend among the pairs $(Z_i^{(+)}, x_i)$. Values of S close to zero indicate no trend.
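As an illustration, the double sum at (2.3) can be evaluated directly. The sketch below is a minimal one, not the program used in this study; the names `phi`, `eta` and `trend_statistic` and the small data set are hypothetical, and a tie between two Z-values is scored one half as in the definition of the indicator for Z-differences.

```python
def phi(t):
    """Score for a difference of Z-values: 1 if t > 0, 1/2 if t == 0, 0 if t < 0."""
    if t > 0:
        return 1.0
    if t == 0:
        return 0.5
    return 0.0


def eta(z_i, z_j, d_i, d_j):
    """eta_ij = delta_j * phi(Z_i - Z_j) - delta_i * phi(Z_j - Z_i)."""
    return d_j * phi(z_i - z_j) - d_i * phi(z_j - z_i)


def trend_statistic(y, delta, x, b):
    """S(b) of (2.3): sum of eta_ij(b) over all pairs with x_i > x_j."""
    z = [yi - b * xi for yi, xi in zip(y, x)]
    total = 0.0
    for i in range(len(z)):
        for j in range(len(z)):
            if x[i] > x[j]:          # psi(x_i - x_j) = 1 only when x_i > x_j
                total += eta(z[i], z[j], delta[i], delta[j])
    return total


# Hypothetical data: delta = 1 marks an uncensored response.
x = [1, 2, 3, 4, 5]
y = [3, 2, 3, 3, 4]
delta = [1, 0, 1, 0, 1]
for b in (-2, 0.1, 2):
    print(b, trend_statistic(y, delta, x, b))
```

For very small b every pair with an uncensored lower point is concordant, and for very large b every pair with an uncensored upper point is discordant, so S(b) moves from its largest to its smallest value as b increases.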
Two other forms of the statistic S will be found useful in the sequel.
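i) In the special case of distinct Z-values (i.e., $Z_i(b) \ne Z_j(b)$ for $i \ne j$) and ordered distinct x-values, $x_1 < x_2 < \cdots < x_n$,
$$S(b) = \sum_{j=1}^{n-1} \sum_{i=j+1}^{n} \bigl\{\phi(Z_i(b) - Z_j(b))\,(\delta_j + \delta_i) - \delta_i\bigr\}. \tag{2.4}$$
ii) If Z-values are distinct and ordered, $Z_1 < Z_2 < \cdots < Z_n$,
$$S(b) = \sum_{j=1}^{n-1} \sum_{i=j+1}^{n} \delta_j\, \mathrm{sgn}(x_i - x_j), \tag{2.5}$$
where
$$\mathrm{sgn}(t) = \begin{cases} 1, & t > 0, \\ 0, & t = 0, \\ -1, & t < 0. \end{cases}$$
Here the dependence on b is through the ordering of Z-values and consequent relabelling of $\delta$'s and x's.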
An important property of S is stated in the following Lemma.
Lemma 2.6. The statistic S(b) is nonincreasing in b.
Proof: $x_i > x_j$ implies that $Z_i(b) - Z_j(b) = Y_i - Y_j - b(x_i - x_j)$ is a decreasing function of b. This, together with the fact that $\phi(t)$ is nondecreasing in t, implies that $\eta_{ij}(b)$ is nonincreasing in b for each pair (i,j) with $x_i > x_j$, which completes the proof.
As noted earlier, any solution to the equation S(b) = 0 is a reasonable estimate for $\beta$. An exact solution might not exist, but Lemma 2.6 ensures that we can define a unique point estimator in the following way.
Definition 2.7. Let $\beta^{*} = \sup\{b : S(b) > 0\}$ and $\beta^{**} = \inf\{b : S(b) < 0\}$; then a unique point estimator for $\beta$ is
$$\hat\beta = \tfrac{1}{2}(\beta^{*} + \beta^{**}).$$
Clearly the statistic S(b) depends on the ranks of the Z's, the ranks of the x's and the censoring pattern. The x-values and the censoring pattern are fixed for a particular data set, but numerical realizations of Z change with b (cf., 2.1). As noted in the following Lemma, for certain "critical" b-values two or more Z's will change rank. Some further properties of S(b) as a function of b are given in Lemma 2.8.
Lemma 2.8.
i) A change in the value of $\eta_{ij}(b)$ can occur only as b changes through
$$b_{ij} = \frac{Y_i - Y_j}{x_i - x_j}, \quad x_i > x_j. \tag{2.9}$$
ii) For (i,j) with $x_i > x_j$, the change in $\eta_{ij}(b)$ as b increases through $b_{ij}$ is
$$c_{ij} = -(\delta_i + \delta_j). \tag{2.10}$$
iii) Suppose there are $\ell$ distinct values of $b_{ij}$ given by (2.9). Denote these values $b_1 < b_2 < \cdots < b_\ell$, and let
$$B_k = \{(i,j) : b_{ij} = b_k,\ x_i > x_j\}, \quad k = 1,\ldots,\ell.$$
Then S(b) is a step function in b with steps of magnitude $c_k$ at $b_k$, where
$$c_k = \sum_{(i,j) \in B_k} c_{ij}. \tag{2.11}$$
Proof:
i) Since $Z_i(b) = Y_i^{(+)} - b x_i$, it follows that for $x_i > x_j$,
$$Z_i(b) > Z_j(b) \text{ if } b < b_{ij}, \qquad Z_i(b) = Z_j(b) \text{ if } b = b_{ij}, \qquad Z_i(b) < Z_j(b) \text{ if } b > b_{ij}, \tag{2.12}$$
where $b_{ij}$ is as in (2.9). Therefore
$$\eta_{ij}(b) = \delta_j\, \phi(Z_i(b) - Z_j(b)) - \delta_i\, \phi(Z_j(b) - Z_i(b))$$
can change only as b changes through $b_{ij}$.
ii) Notice that
$$\eta_{ij}(b) = \begin{cases} \delta_j, & Z_i(b) > Z_j(b), \\ \tfrac{1}{2}(\delta_j - \delta_i), & Z_i(b) = Z_j(b), \\ -\delta_i, & Z_i(b) < Z_j(b). \end{cases} \tag{2.13}$$
In view of (2.12) it is clear that for (i,j) with $x_i > x_j$ we have that $\eta_{ij}(b)$ decreases by $(\delta_j + \delta_i)$ as b increases through $b_{ij}$. Letting $c_{ij}$ denote the change, $c_{ij} = -(\delta_j + \delta_i)$ completes the proof of (ii).
iii) Recall that
$$S(b) = \sum_{(i,j):\, x_i > x_j} \eta_{ij}(b), \tag{2.14}$$
which can change with b only when $\eta_{ij}(b)$ changes with b for some (i,j). Since $\eta_{ij}(b)$ changes only at $b = b_k$, S(b) will also change only at $b = b_k$. For a specified $b_k$ the change in S(b) is the accumulation of the change in $\eta_{ij}(b)$ as (i,j) varies over $B_k$; that is, the change is $c_k = \sum_{(i,j) \in B_k} c_{ij}$, which completes the proof of (iii).
We now make some observations relevant to the computation of S and $\hat\beta$.
(a) If for a data set there are no tied x's and no coincident $b_{ij}$ values, then in Lemma 2.8 (iii), $\ell = \binom{n}{2}$.
(b) Since S(b) is nonincreasing in b (Lemma 2.6), the greatest possible value for S(b) is $S(b_0)$, where $b_0 < b_1 < b_2 < \cdots < b_\ell$.
(c) Any change in S(b) which may occur at $b_1, b_2, \ldots, b_\ell$ depends (via 2.10, 2.11) on the censorship pattern of the Z(b) which change rank at the critical b.
(d) Each critical b value $b_k$ has a change value $c_k$, $k = 1, 2, \ldots, \ell$, associated with it (Lemma 2.8 (iii)), and easy sequential computation of S is facilitated in the following way. Let $s_k$ denote the value of S(b) for $b_k < b < b_{k+1}$, with $b_{\ell+1} = \infty$ and $s_0 = S(b_0)$; then
$$s_1 = s_0 + c_1, \qquad s_2 = s_1 + c_2,$$
and in general
$$s_k = s_{k-1} + c_k, \quad k = 1,\ldots,\ell. \tag{2.15}$$
(e) The algorithm given in (d) does not give an evaluation of S at the critical b values, but this is not needed, and the enumeration given makes it easy to locate the point estimate proposed in Definition 2.7.
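The sequential scheme in (d) and the location of the point estimate can be sketched as below. This is a minimal sketch under stated assumptions: the function names `step_function` and `point_estimate` and the data set are hypothetical, exact rational arithmetic is used so that coincident critical values are recognized as equal, and the starting value is computed as the sum of $\delta_j$ over pairs with $x_i > x_j$, which is the value of S(b) for b below every critical value by (2.13) and (2.14).

```python
from fractions import Fraction


def step_function(y, delta, x):
    """Return (s0, rows), where rows lists (b_k, c_k, s_k) in increasing b_k."""
    changes = {}          # critical value b_k -> accumulated change c_k
    s0 = 0
    n = len(y)
    for i in range(n):
        for j in range(n):
            if x[i] > x[j]:
                b_ij = (Fraction(y[i]) - Fraction(y[j])) / (Fraction(x[i]) - Fraction(x[j]))
                changes[b_ij] = changes.get(b_ij, 0) - (delta[i] + delta[j])
                s0 += delta[j]            # eta_ij = delta_j below all critical values
    rows, s = [], s0
    for b_k in sorted(changes):
        s += changes[b_k]                 # s_k = s_{k-1} + c_k
        rows.append((b_k, changes[b_k], s))
    return s0, rows


def point_estimate(s0, rows):
    """Midpoint of sup{b: S(b) > 0} and inf{b: S(b) < 0}; None if either is unbounded."""
    levels = [s0] + [s for _, _, s in rows]
    sup_pos = inf_neg = None
    for k, (b_k, _, _) in enumerate(rows, start=1):
        if levels[k - 1] > 0:             # S > 0 immediately to the left of b_k
            sup_pos = b_k
        if levels[k] < 0 and inf_neg is None:
            inf_neg = b_k                 # S < 0 immediately to the right of b_k
    if sup_pos is None or inf_neg is None:
        return None
    return (sup_pos + inf_neg) / 2


# Hypothetical data: delta = 1 marks an uncensored response.
x = [1, 2, 3, 4, 5]
y = [3, 2, 3, 3, 4]
delta = [1, 0, 1, 0, 1]
s0, rows = step_function(y, delta, x)
print(s0, [(float(b), c, s) for b, c, s in rows], point_estimate(s0, rows))
```

Only the open intervals between critical values are enumerated, which is all that Definition 2.7 requires.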
For the purpose of illustration consider the following hypothetical example consisting of n = 5 $(x, y^{(+)})$ pairs.
Example 2.16

  i            1    2    3    4    5
  x_i          1    2    3    4    5
  y_i^(+)      3    2    3    3    4
  delta_i      1    0    1    0    1

Using (2.9) and (2.10) we compute $b_{ij}$ and $c_{ij}$ values for (i,j) with $x_i > x_j$. For example,
$$b_{2,1} = (2 - 3)/(2 - 1) = -1 \quad \text{and} \quad c_{2,1} = -(1 + 0) = -1.$$
These and the other values are given in Table 1.

Table 1. Values of $b_{ij}$ and ($c_{ij}$) for Example 2.16.

  i\j    1            2            3            4           5
  1
  2      -1 (-1)
  3       0 (-2)      1 (-1)
  4       0 (-1)      0.5 (0)      0 (-1)
  5       0.25 (-2)   0.67 (-1)    0.5 (-2)     1 (-1)
The first step in the computation of the $s_k$ is the computation of $s_0$. This can be done using (2.3) with any value $b < b_1$; however, the computation can be much simplified in the following way. For any $b < b_1$, (2.12) and (2.13) imply $\eta_{ij}(b) = \delta_j$ if $x_i > x_j$. Therefore, from (2.14),
$$s_0 = \sum_{(i,j):\, x_i > x_j} \delta_j.$$
Since entries appear in Table 1 only if $x_i > x_j$, we see that $s_0$ is simply the total number of entries under columns for which $\delta_j = 1$. In our example these are columns 1, 3 and 5; thus $s_0 = 4 + 2 + 0 = 6$. Sequential computation of S then follows using the ordered $b_k$, $k = 1,\ldots,\ell$, and the corresponding "change" values $c_k$, computed using (2.11). The resulting step function S(b) is tabulated in Table 2 and graphed in Figure 1. The point estimate is $\hat\beta = 0.25$.
Table 2. Computation of S(b).

  s_0 = 6

  k     b_k      c_k     s_k = s_{k-1} + c_k
  1     -1       -1       5
  2      0       -4       1
  3      0.25    -2      -1
  4      0.5     -2      -3
  5      0.67    -1      -4
  6      1       -2      -6
Figure 1. Graph of S(b) vs. b for Example 2.16.
3.3 Interval Estimation of $\beta$: Exact Method
A $100(1-\alpha)\%$ confidence set for $\beta$ can be obtained by inverting an $\alpha$-level test of $H_0\colon \beta = b$ (Lehmann, 1959, p. 173). Our objective in this section is to show that a distribution-free confidence interval for $\beta$ can be defined in terms of a permutation test of $H_0\colon \beta = b$ using the statistic S(b) defined at (2.3).
Consideration is now given to using permutation distributions for S(b) that are generated for a given data set under assumption A2 of Section 2.3. A direct consequence of assumption A2 is that the probability of censoring or noncensoring of a y-value is independent of the associated value of x, i.e.,
$$\Pr(T_i < C_i \mid x_i) = \Pr(T_i < C_i), \quad i = 1,\ldots,n. \tag{3.1}$$
Let
$$w(b) = \bigl((z_1(b), \delta_1, x_1), \ldots, (z_n(b), \delta_n, x_n)\bigr)$$
be an observed vector. We now reckon on all permutations of $\{(z_1, \delta_1), (z_2, \delta_2), \ldots, (z_n, \delta_n)\}$, keeping the x's fixed, and consider the n! transformations of w(b) given by
$$g_p(w(b)) = \bigl((z_{p_1}, \delta_{p_1}, x_1), \ldots, (z_{p_n}, \delta_{p_n}, x_n)\bigr), \tag{3.2}$$
where
$$p \in P_n = \{p : p \text{ is a permutation of } (1,\ldots,n)\}.$$
Thus under $H_0\colon \beta = b$ and assumption A2, the conditional distribution of a random W(b), conditional on its observed value w(b), is uniform on $\{g_p(w(b)),\ p \in P_n\}$. Using the superscript c to indicate that the result is conditional, we have that the conditional null distribution of W(b) is
$$\Pr{}^{c}\bigl(W(b) = g_p(w(b))\bigr) = \frac{1}{n!}, \quad p \in P_n. \tag{3.3}$$
The conditional discrete null distribution of S(b) can thus be determined by computing $S(g_p(\cdot))$ for each equally likely $g_p(\cdot)$. A conservative test of $H_0\colon \beta = b$ at nominal level $\alpha$ is then given by
$$\Phi(w(b)) = \begin{cases} 1, & S(w(b)) > s_u(b) \text{ or } S(w(b)) < s_l(b), \\ 0, & \text{otherwise}, \end{cases} \tag{3.4}$$
where $\Phi(w(b))$ is the probability of rejecting $H_0$ when W(b) = w(b),
$$s_u(b) = \inf\{s : \Pr{}^{c}(S(b) > s \mid \beta = b) \le \alpha/2\}$$
and
$$s_l(b) = \sup\{s : \Pr{}^{c}(S(b) < s \mid \beta = b) \le \alpha/2\}.$$
The test given by (3.4) defines a conservative $100(1-\alpha)\%$ confidence set for $\beta$:
$$CS = \{b : s_l(b) \le S(b) \le s_u(b)\}, \tag{3.5}$$
consisting of the set of all "acceptable" values of b.
The confidence set CS will be conservative in the sense that $\Pr\{\beta \in CS\} \ge 1 - \alpha$, because of the use of discrete distributions in defining $s_u(b)$ and $s_l(b)$ at (3.4).
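For small n the conditional null distribution can be enumerated completely. The following is a minimal sketch of such an enumeration, not the program used in this study: the function names and data are hypothetical, b is assumed to lie strictly between critical values with distinct x-values (so that no two Z-values are tied), and acceptance is expressed through the two tail probabilities of the observed S, which is equivalent to comparing S with the critical values defined at (3.4).

```python
from collections import Counter
from itertools import permutations


def s_statistic(z, delta, x):
    """S of (2.3) for untied z: +delta_j for z_i > z_j, -delta_i for z_i < z_j, over x_i > x_j."""
    total = 0
    for i in range(len(z)):
        for j in range(len(z)):
            if x[i] > x[j]:
                if z[i] > z[j]:
                    total += delta[j]
                elif z[i] < z[j]:
                    total -= delta[i]
    return total


def permutation_distribution(z, delta, x):
    """Counts of S over all n! reassignments of the (z, delta) pairs to the fixed x's."""
    counts = Counter()
    for perm in permutations(list(zip(z, delta))):
        counts[s_statistic([p[0] for p in perm], [p[1] for p in perm], x)] += 1
    return counts


def accepts(z, delta, x, alpha):
    """True if H0 is not rejected: both tail probabilities of the observed S exceed alpha/2."""
    counts = permutation_distribution(z, delta, x)
    total = sum(counts.values())
    s_obs = s_statistic(z, delta, x)
    lower = sum(c for s, c in counts.items() if s <= s_obs) / total
    upper = sum(c for s, c in counts.items() if s >= s_obs) / total
    return lower > alpha / 2 and upper > alpha / 2


# Hypothetical data; z-values are formed for two trial values of b.
x = [1, 2, 3, 4, 5]
y = [3, 2, 3, 3, 4]
delta = [1, 0, 1, 0, 1]
z_near = [yi - 0.1 * xi for yi, xi in zip(y, x)]    # b = 0.1
z_far = [yi + 2 * xi for yi, xi in zip(y, x)]       # b = -2
print(sorted(permutation_distribution(z_near, delta, x).items()))
print(accepts(z_near, delta, x, 0.2), accepts(z_far, delta, x, 0.2))
```

Repeating the acceptance check over a grid of b values between successive critical values would trace out the confidence set (3.5); with n as small as 5 only rather large nominal levels can lead to rejection.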
The determination of CS for a given set of data is complicated by the fact that the null distribution of S(b) depends upon b (Corollary 3.7). The following set of Lemmas characterizes this dependence.
Lemma 3.6. If the x-values are distinct then, for all b, the permutation distribution of S(b) will be symmetric about zero with support of the form p + 2q, q = 0, 1, 2, ..., r, for some integers p, r. That is, the support values increment by 2.
Proof: Consider the effect of associating the reverse order of z-values with fixed x-values. From (2.3) we have
$$S(g_q(w(b))) = -S(g_p(w(b)))$$
for all $p \in P_n$ and $q \in \{q : q_1 = n - p_1 + 1, \ldots, q_n = n - p_n + 1\}$. This proves the symmetry of S(b). Consider $z_i^{(+)}(b)$, $i = 1,\ldots,n$, associated with ordered x-values $x_1 < x_2 < \cdots < x_n$. Let (i,j), i > j, denote arbitrary positions in the vector of z-values. Suppose that after a permutation of z-values the $z^{(+)}$'s from positions (i,j) have moved to positions $(p_i, p_j)$, respectively. Notice that if $p_i > p_j$ then $\eta_{p_i p_j} = \eta_{ij}$, and if $p_i < p_j$ then $\eta_{p_i p_j} = -\eta_{ij}$, where $\eta_{ij} = -1, 0, 1$ from (2.3).
Thus in an arbitrary permutation of the elements $z_i^{(+)}(b)$, $i = 1,\ldots,n$, all pairs contribute a change of $\pm 2$ or 0 to the total change of $S(b) = \sum_{(i,j):\, x_i > x_j} \eta_{ij}$. Therefore, the values of S(b) for two arbitrary permutations of $z^{(+)}$ values must differ by zero or a nonzero multiple of 2.
The following Corollary shows that the null distribution of S(b) does indeed depend on b.
Corollary 3.7. A change in b can change the permutation null distribution of S(b).
Proof: In view of Lemma 3.6 this will be proved if we show that a change in b can change S(b) by some odd number. From (2.11) the jump in S(.) occurring as b changes through bk is
$$c_k = \sum_{(i,j) \in B_k} c_{ij}.$$
Clearly this change can be an odd number.
Recall that S(b) is a function of the ranks of $\{z_i(b) : i = 1,\ldots,n\}$ and that these ranks can change only at the critical values $b_k$, $k = 1,\ldots,\ell$ (see 2.12). Thus there can be no difference in the permutation distributions of S(b) for b lying between critical values. The following Lemma and Corollary concern situations for which the distribution of S(b) changes as b changes through a $b_k$.
Lemma 3.8. Suppose $b_{ij} = b_k$ for just one pair of subscripts (i,j). That is, $b_k$ is a nontied critical value. Consider b', b'' such that $b_{k-1} < b' < b_k < b'' < b_{k+1}$.
i) If $c_{ij} = -2$ or $c_{ij} = 0$, then $S(b') \overset{dc}{=} S(b'')$, using $\overset{dc}{=}$ to indicate "equal in conditional distribution."
ii) If $c_{ij} = -1$ and x-values are distinct, then $S(b') \overset{dc}{\ne} S(b'')$.
Proof:
i) Since $b_{ij} = b_k$ for just one pair of subscripts (i,j), it follows from (2.12) that only one pair $(z_i(b), z_j(b))$ changes order as b changes through $b_k$. Now, $c_{ij} = -(\delta_j + \delta_i) = -2$ or 0 implies $\delta_i = \delta_j = 1$ or $\delta_i = \delta_j = 0$, so that the elements which change rank order are either both censored or both uncensored. Thus the censoring pattern over the ranks of $\{z_i(b') : i = 1,\ldots,n\}$ is the same as the censoring pattern over the ranks of $\{z_i(b'') : i = 1,\ldots,n\}$, and consequently $S(b') \overset{dc}{=} S(b'')$.
ii) Immediate from Lemma 3.6.
Corollary 3.9. If $c_{ij} = -2$ or 0 for each pair (i,j) such that $b_{ij} = b_k$, then the permutation distribution of S is not changed as b jumps through $b_k$ from b' to b''.
Proof: Immediate from Lemma 3.8.
From the prior Lemmas and discussion it is clear that we can identify ranges of b values for which critical points for the tests (3.4) do not have to be recomputed. A change of b through certain $b_k$ will change the permutation distribution, and in these cases test critical values must be recomputed from a regenerated permutation distribution.
The question of whether the confidence set CS at (3.5) will be a single interval is now addressed. It will be shown that if, as b values are incremented outwards from the point estimate, a point is reached at which test (3.4) leads to rejection, no subsequent test would lead to acceptance, and thus CS does make up a single interval of plausible b values.
We again denote critical b values as in Lemma 2.8(iii), i.e., $b_1 < b_2 < \cdots < b_\ell$, and write $S_k$ for the random variable S(b), $b_k < b < b_{k+1}$, with sample value $s_k$ as at (2.15).
Definition 3.10. The p-value associated with $s_k$ is
$$p_k = \begin{cases} \Pr{}^{c}(S_k \le s_k), & s_k < 0, \\ \Pr{}^{c}(S_k \ge s_k), & s_k > 0. \end{cases}$$
In terms of the test at (3.4) we would reject $H_0\colon \beta = b''$ for some $b_k < b'' < b_{k+1}$ if $p_k \le \alpha/2$. Suppose $s_k < 0$, take $b_{k-1} < b' < b_k$, and think in terms of a test at some level $\alpha$ and b' a rejected value. Then $p_{k-1} \le \alpha/2$ and it is clear that b'' cannot be accepted at the level $\alpha$ provided $p_k \le p_{k-1}$. Therefore a sufficient condition for the confidence set (3.5) to be a single interval is that
$$p_k \le p_{k-1},\ k \text{ such that } s_k < 0; \qquad p_{k-1} \le p_k,\ k \text{ such that } s_k > 0. \tag{3.11}$$
We now argue that the condition (3.11) will be satisfied by the conditional null distributions of S. Because S(b) is nonincreasing in b (Lemma 2.6), condition (3.11) is satisfied whenever $S_{k-1} \overset{dc}{=} S_k$.
We now consider what happens when the distribution of S(b) does change. We treat first the case where there are no tied $b_{ij}$ values at $b_k$ and extend the result to the tied case later. By Lemma 3.8(ii) our concern is with the case where $c_{ij} = -1$. For this case just one censored and one uncensored value of z(b) change rank at the critical $b_{ij}$, and such interchange is between adjacent ranks (cf., 2.12). Furthermore, since S(b) is nonincreasing, after such a rank interchange the higher rank will be associated with the lower x-value and S(b) will have reduced by one. Without loss of generality assume $x_1 \le x_2 \le \cdots \le x_n$; there can be no interchange of ranks associated with a pair of tied x's (cf., 2.9). Let
$$R^{(+)}(b) = \bigl(R_1^{(+)}(b), R_2^{(+)}(b), \ldots, R_n^{(+)}(b)\bigr)$$
denote the ranks of $\{z_i(b) : i = 1,\ldots,n\}$. As before the superscript (+) indicates the incorporation of censoring information, $R_i^{+}(b)$ indicating that $z_i(b)$ is censored and the value of $z_i(b)$ has rank $R_i(b)$, and $R_i(b)$ indicating that $z_i(b)$ is not censored and has rank $R_i(b)$. Clearly there may be tied ranks, but for (i,j) such that $b_{ij}$ is a critical value we have $R_i(b) = R_j(b)$ only at $b = b_{ij}$. If there are no ties, $R \in P_n$. For notation purposes take $b_k = b_{ij}$ and let $R_{k-1}^{(+)}$ denote $R^{(+)}(b')$ and $R_k^{(+)}$ denote $R^{(+)}(b'')$ for b', b'' as in Lemma 3.8. A second subscript on $R_k^{(+)}$ will identify a particular vector. We now give as a theorem an important result concerning condition (3.11).
Theorem 3.12. Suppose $x_1 < x_2 < \cdots < x_n$ and $b_{ij} = b_k$ for just one pair of subscripts (i,j). Then $c_k = -1$ implies condition (3.11):
$$p_k \le p_{k-1} \text{ if } s_k < 0, \qquad p_{k-1} \le p_k \text{ if } s_k > 0.$$
Proof: As noted earlier, when b passes through $b_k$, the only change in the rank vector $R^{(+)}(b)$ is the interchange of its ith and jth elements. The elements interchanged represent adjacent ranks, and the exact nature of the change depends upon the censoring pattern. Let $r_1, r_2 \in \{1,\ldots,n-1\}$ represent arbitrary ranks. If $c_k = c_{ij} = -1$, then one of two situations must obtain as b passes through $b_k$:
(1) $R_{k-1,1}^{(+)} = \bigl(R_1^{(+)}, R_2^{(+)}, \ldots, R_j^{(+)} = r_1, \ldots, R_i^{(+)} = (r_1+1)^{+}, \ldots, R_n^{(+)}\bigr)$ will become $R_{k,1}^{(+)} = \bigl(R_1^{(+)}, R_2^{(+)}, \ldots, R_j^{(+)} = r_1 + 1, \ldots, R_i^{(+)} = r_1^{+}, \ldots, R_n^{(+)}\bigr)$,
or
(2) $R_{k-1,2}^{(+)} = \bigl(R_1^{(+)}, R_2^{(+)}, \ldots, R_j^{(+)} = r_2^{+}, \ldots, R_i^{(+)} = r_2 + 1, \ldots, R_n^{(+)}\bigr)$ will become $R_{k,2}^{(+)} = \bigl(R_1^{(+)}, R_2^{(+)}, \ldots, R_j^{(+)} = (r_2+1)^{+}, \ldots, R_i^{(+)} = r_2, \ldots, R_n^{(+)}\bigr)$.
In (1) the ith position has the censored rank, and in (2) the censored rank is in the jth position. Let $s_{t,m}$ denote the value of S(b) computed from $R_{t,m}^{(+)}$. Then from (2.15) $s_{k,m} = s_{k-1,m} - 1$, m = 1, 2. Notice that because the ranks are adjacent, $(r_1+1)^{+}$ has the same concordances and discordances with elements of $R_{k-1,1}^{(+)}$ as $r_1^{+}$ has with elements of $R_{k,1}^{(+)}$, except for concordances between $(r_1+1)^{+}$ and $r_1$ in $R_{k-1,1}^{(+)}$ and $r_1^{+}$ and $r_1 + 1$ in $R_{k,1}^{(+)}$. A similar situation obtains for $r_1$ and $r_1 + 1$ in situation (1) and also for corresponding ranks in situation (2). Notice further that for any permutation of the elements of $R_{k,1}^{(+)}$, keeping x's fixed, the rank pair $(r_1 + 1, r_1^{+})$ will contribute zero in the computation of $S_{k,1}$. The contribution to $S_{k-1,1}$ of $(r_1, (r_1+1)^{+})$ for corresponding permutations of $R_{k-1,1}^{(+)}$ is either +1 or -1. Thus if a permutation of elements of $R_{k,1}^{(+)}$ gives a value of $S_k \le s_{k,1}$, the corresponding permutation of elements of $R_{k-1,1}^{(+)}$ gives a value of $S_{k-1} \le s_{k,1} + 1 = s_{k-1,1}$. Therefore,
$$\#\{\text{permutations of elements of } R_{k,1}^{(+)} \text{ giving } S_k \le s_{k,1}\} \le \#\{\text{permutations of elements of } R_{k-1,1}^{(+)} \text{ giving } S_{k-1} \le s_{k-1,1}\}.$$
Consideration of situation (2) gives similarly
$$\#\{\text{permutations of elements of } R_{k,2}^{(+)} \text{ giving } S_k \le s_{k,2}\} \le \#\{\text{permutations of elements of } R_{k-1,2}^{(+)} \text{ giving } S_{k-1} \le s_{k-1,2}\}.$$
These together imply, in terms of p-values for $s_k < 0$,
$$p_k \le p_{k-1}, \quad s_k < 0.$$
Similar argument gives
$$\#\{\text{permutations of elements of } R_{k-1,m}^{(+)} \text{ giving } S_{k-1} \ge s_{k-1,m}\} \le \#\{\text{permutations of elements of } R_{k,m}^{(+)} \text{ giving } S_k \ge s_{k,m}\},$$
m = 1, 2, and these imply
$$p_{k-1} \le p_k, \quad s_k > 0.$$
It remains to consider ties in the critical b's, and the general result is given in the following corollary.
Corollary 3.13. For the permutation distributions described at (3.3) and p-values defined in Definition 3.10, we have in general for an arbitrary critical value $b_k$ that
$$p_k \le p_{k-1},\ s_k < 0; \qquad p_{k-1} \le p_k,\ s_k > 0.$$
Proof: As b increases through a critical $b_k$ a series of adjacent ranks of z's is reversed and S(b) reduces by $-\sum c_{ij}$ (as at 2.11), with the summation being over q pairs (i,j) such that $b_{ij} = b_k$. Again let the rank vector after change through $b_k$ be $R_k^{(+)}$ and observe (in the manner of Theorem 3.12) that for any permutation of the elements of $R_k^{(+)}$ no corresponding permutation of the elements of $R_{k-1}^{(+)}$ can increase S(b) by more than $-\sum c_{ij}$. Thus,
$$\#\{\text{permutations of elements of } R_k^{(+)} \text{ giving } S_k \le s_k\} \le \#\{\text{permutations of elements of } R_{k-1}^{(+)} \text{ giving } S_{k-1} \le s_{k-1}\}.$$
The result follows as in Theorem 3.12.
We have shown that for a chosen level $\alpha$, as b increases above the point estimate a value can be reached at which the sample value of S(b) falls in the reject region of the test at (3.4), and that further increase of b does not lead to acceptance of S(b). Similarly for b decreasing below the point estimate. The confidence set described at (3.5) is thus established as a $100(1-\alpha)\%$ confidence interval.
3.4 Interval Estimation of $\beta$: Asymptotic Method
For large n, the tests proposed at (3.4) can be performed using an approximation for the conditional null distribution of S; i.e., the conditional distribution of S under $H_0\colon \beta = b$. We may use directly the results of Daniels (1944). Those results concern, in Daniels' notation, the n! different ways of grouping n sample y-values with n sample x-values. A score $a_{ij}$, $i,j = 1,\ldots,n$, is assigned to each pair $(x_i, x_j)$ and a score $b_{ij}$ to each $(y_i, y_j)$. For scores such that $a_{ij} = -a_{ji}$, $b_{ij} = -b_{ji}$, and for summations over all subscripts $\sum a_{ij}a_{ik}$ and $\sum b_{ij}b_{ik}$ each of order $n^3$, it is shown that a statistic
$$C = \sum_{i,j} a_{ij} b_{ij} \tag{4.1}$$
is conditionally asymptotically Normal with conditional mean zero, and an expression for the conditional variance is given in terms of n, $a_{ij}$ and $b_{ij}$, $i,j = 1,\ldots,n$. Applying this to data points $(z_i^{(+)}, x_i)$, $i = 1,\ldots,n$, and the statistic
$$S(b) = \sum_{i=1}^{n} \sum_{j=1}^{n} \eta_{ij}(b)\, \psi(x_i - x_j)$$
from (2.3), we have, comparing (4.1) and writing $\eta_{ij}$ for $\eta_{ij}(b)$,
$$C \equiv 2S(b) = \sum_{i,j} \eta_{ij} X_{ij}$$
with $X_{ij} = \mathrm{sgn}(x_i - x_j)$. Clearly, $\eta_{ij} = -\eta_{ji}$ and $X_{ij} = -X_{ji}$. Furthermore, since $|\eta_{ij}| \le 1$ and $|X_{ij}| \le 1$, the summations $|\sum \eta_{ij}\eta_{ik}| \le n^3$ and $|\sum X_{ij}X_{ik}| \le n^3$ are of order $n^3$. We can therefore apply Daniels' results and conclude for the conditional null distribution of S(b) that
i) $E(S(b)) = 0$,
ii) $\mathrm{Var}(S(b)) = [n(n-1)(n-2)]^{-1} \bigl(\sum \eta_{ij}\eta_{ik} - \sum \eta_{ij}^2\bigr)\bigl(\sum X_{ij}X_{ik} - \sum X_{ij}^2\bigr) + [2n(n-1)]^{-1} \bigl(\sum \eta_{ij}^2\bigr)\bigl(\sum X_{ij}^2\bigr)$, (4.2)
where summation is over all subscripts from 1,...,n.
iii) S(b) is asymptotically Normal.
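The variance at (4.2) is straightforward to compute from the two score matrices. The sketch below is an illustration only: the function names and data are hypothetical, the triple sum over (i, j, k) is evaluated as the sum of squared row totals, and the result is compared with the variance obtained by enumerating every permutation for a small data set.

```python
from itertools import permutations


def phi(t):
    return 1.0 if t > 0 else (0.5 if t == 0 else 0.0)


def eta_matrix(z, delta):
    """eta_ij = delta_j * phi(z_i - z_j) - delta_i * phi(z_j - z_i), with zero diagonal."""
    n = len(z)
    return [[0.0 if i == j else delta[j] * phi(z[i] - z[j]) - delta[i] * phi(z[j] - z[i])
             for j in range(n)] for i in range(n)]


def sign_matrix(x):
    """X_ij = sgn(x_i - x_j)."""
    n = len(x)
    return [[(x[i] > x[j]) - (x[i] < x[j]) for j in range(n)] for i in range(n)]


def daniels_variance(z, delta, x):
    """Conditional null variance of S as at (4.2)."""
    n = len(z)

    def sums(m):
        squares = sum(v * v for row in m for v in row)      # sum of m_ij^2
        cross = sum(sum(row) ** 2 for row in m)             # sum over i, j, k of m_ij * m_ik
        return squares, cross

    e_sq, e_cross = sums(eta_matrix(z, delta))
    x_sq, x_cross = sums(sign_matrix(x))
    return ((e_cross - e_sq) * (x_cross - x_sq) / (n * (n - 1) * (n - 2))
            + e_sq * x_sq / (2 * n * (n - 1)))


def exact_permutation_variance(z, delta, x):
    """Variance of S over all n! reassignments of (z, delta) pairs to the fixed x's."""
    n, xs, values = len(z), sign_matrix(x), []
    for p in permutations(range(n)):
        e = eta_matrix([z[k] for k in p], [delta[k] for k in p])
        values.append(0.5 * sum(e[i][j] * xs[i][j] for i in range(n) for j in range(n)))
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical data with z formed at b = 0.1.
x = [1, 2, 3, 4, 5]
delta = [1, 0, 1, 0, 1]
z = [yi - 0.1 * xi for yi, xi in zip([3, 2, 3, 3, 4], x)]
print(daniels_variance(z, delta, x), exact_permutation_variance(z, delta, x))
```

With no censoring and no ties the expression reduces to the familiar n(n-1)(2n+5)/18 for Kendall's statistic, which gives a quick check on an implementation.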
Some discussion of when this asymptotic approximation might be appropriate in practice is given in Chapter Four.
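3.5 The Two-Sample Location Problem
Gehan (1965) formulates a two-sample test for right censored data. An inversion of that test is now shown to be equivalent to a special case of the method presented in the previous sections of this chapter.
For populations 1 and 2 with distribution functions $F_1(\cdot)$, $F_2(\cdot)$, respectively, Gehan tests
$$H_0\colon F_1(t) = F_2(t)$$
against
$$H_1\colon F_1(t) < F_2(t)$$
or against
$$H_2\colon F_1(t) < F_2(t) \text{ or } F_1(t) > F_2(t).$$
If we consider the shift model as a special case of this, we have
$$F_1(t) = F_2(t - \Delta), \text{ for some } \Delta,$$
or equivalently, for random variables $T_1$, $T_2$ from populations 1 and 2,
$$T_1 \overset{d}{=} T_2 + \Delta.$$
We would wish to test
$$H_0\colon \Delta = 0$$
against
$$H_1\colon \Delta > 0 \quad \text{or} \quad H_2\colon \Delta \ne 0.$$
Consider samples of size $n_1$, $n_2$ from populations 1 and 2, respectively, i.e., $y_{11}^{(+)}, y_{12}^{(+)}, \ldots, y_{1n_1}^{(+)}$ and $y_{21}^{(+)}, y_{22}^{(+)}, \ldots, y_{2n_2}^{(+)}$, where again the $y^{(+)}$'s are the possibly censored lifetimes and the extra subscript indicates the population. Gehan's test of $\Delta = 0$ would use the statistic
$$W = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} U_{ij},$$
where
$$U_{ij} = \begin{cases} -1, & y_{1i} < y_{2j} \text{ or } y_{1i} \le y_{2j}^{+}, \\ 0, & y_{1i} = y_{2j} \text{ or } (y_{1i}^{+}, y_{2j}^{+}) \text{ or } y_{1i}^{+} < y_{2j} \text{ or } y_{2j}^{+} < y_{1i}, \\ 1, & y_{1i} > y_{2j} \text{ or } y_{1i}^{+} \ge y_{2j}. \end{cases} \tag{5.1}$$
Following an inversion procedure (cf., Randles and Wolfe, 1979, Ch. 6) for a distribution-free confidence interval, consider this statistic W computed with sample points
$$z_{1i}^{(+)} = y_{1i}^{(+)} - \Delta, \quad i = 1,\ldots,n_1, \qquad z_{2j}^{(+)} = y_{2j}^{(+)}, \quad j = 1,\ldots,n_2,$$
i.e.,
$$W = \sum_{i,j} U_{ij}$$
with
$$U_{ij} = \begin{cases} -1, & z_{1i} < z_{2j} \text{ or } z_{1i} \le z_{2j}^{+}, \\ 0, & \text{otherwise}, \\ 1, & z_{1i} > z_{2j} \text{ or } z_{1i}^{+} \ge z_{2j}. \end{cases}$$
Using the notation
$$\delta_{lm} = \begin{cases} 1, & z_{lm} \text{ uncensored}, \\ 0, & z_{lm} \text{ censored}, \end{cases} \qquad l = 1, 2, \quad m = 1, 2, \ldots, n_l,$$
we have
$$W = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \bigl\{\delta_{2j}\, \phi(z_{1i} - z_{2j}) - \delta_{1i}\, \phi(z_{2j} - z_{1i})\bigr\}. \tag{5.2}$$
This is readily seen to be obtainable as a special case of (2.3) by using that expression with
$$b \equiv \Delta$$
and labelling x's such that
$$x_i = \begin{cases} 1, & i = 1,\ldots,n_1 \quad \text{(observation from population 1)}, \\ 0, & i = n_1 + 1,\ldots,n \quad \text{(observation from population 2)}, \end{cases}$$
where $n = n_1 + n_2$. The summation at (2.3) then becomes
$$S(\Delta) = \sum_{i=1}^{n_1} \sum_{j=n_1+1}^{n} \bigl\{\delta_j\, \phi(z_i - z_j) - \delta_i\, \phi(z_j - z_i)\bigr\},$$
and this relabels to the expression for W at (5.2) by denoting
$$z_{1i} = z_i, \quad \delta_{1i} = \delta_i, \quad i = 1,\ldots,n_1,$$
$$z_{2j} = z_i, \quad \delta_{2j} = \delta_i, \quad i = n_1 + 1,\ldots,n, \quad j = i - n_1.$$
Finding a confidence interval for $\Delta$ by inverting Gehan's statistic (5.2) is therefore an equivalent procedure to finding a confidence interval by inverting S(b) at (2.3).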
3.6 Computational Aspects
The confidence interval procedure described in the previous sections requires the progressive testing of hypothesized $\beta$ values. For each such test use is to be made either of the exact permutation distribution or the asymptotic approximation. The exact test of an arbitrary value b', for b' somewhere between two critical values, would perhaps be made most conveniently if we could obtain a p-value directly for the sample s(b') value and reject $H_0\colon \beta = b'$ for a p-value $\le \alpha/2$ for some level $\alpha$. Neither direct nor progressive computation of p-values seems to be tractable in practice; to do it requires identifying all permutations of the current $\{z_i^{(+)} : i = 1,\ldots,n\}$ elements giving S values less than (or more than) the current sample arrangement. Resort is made in this study to the more laborious complete enumeration of the permutation distribution for each b' to be tested. The algorithm employed is to work outwards from the point estimate and to regenerate the permutation distribution as b changes through a critical value, realizing (from Corollary 3.9) that this will not always be necessary because of no change in the distribution. For each exact permutation distribution, critical regions can be set and appropriate tests made.
The asymptotic procedure is similar but with variances recomputed after certain steps in the S(b) function.
In both cases coarser searches across b-values can be made in order to reduce computing time, and in this regard there is also some advantage to making an initial search using the asymptotic method and restricting the exact method search to the appropriate regions.
Further details of the computing aspects are given in appendices, along with some computer programs and their documentation. Some results of using this new method appear in the next chapter.
CHAPTER FOUR
COMPARATIVE STUDIES
4.1 Simulations Using a Simple Linear Model
As discussed in Section 2.6 there are some reasons for preferring the Cox and the Buckley and James regression estimates for use with censored data. According to Miller and Halpern (1982, p. 530), "the choice between {these two methods} should depend on the appropriateness of the proportional hazards model or the linear model for the data." This section reports some simulation studies that compare the performances of the Buckley and James method and the new method of Chapter Three using data generated from the linear model (2.3.1) with p = 1. To afford some comparison with the original simulations reported by Buckley and James, their choice of $\beta = 0.2$ for the linear model is adopted.
We first describe how data were generated for the simulation trials. Recall that according to model (2.3.1) the survival times $T_i$ have the representation
$$T_i = \alpha + \beta x_i + \varepsilon_i, \quad i = 1,\ldots,n.$$
For this series of simulations we fixed $\alpha = 30$, $\beta = 0.2$, and the covariate x was given values from 40 to 100 in increments of 60/(n-1) for sample size n. Four models, referred to as T1 through T4, were used for the distribution of $\varepsilon$ in generating T-values. These were
T1: $\varepsilon \sim N(0, 100)$,
T2: $\varepsilon \sim$ Double exponential, i.e.,
$$f(\varepsilon) = \frac{1}{2\gamma} \exp\left(-\frac{|\varepsilon|}{\gamma}\right), \quad -\infty < \varepsilon < \infty,$$
with $\gamma = 7.07$,
T3: $\varepsilon \sim$ Shifted exponential, i.e.,
$$f(\varepsilon) = \frac{1}{\mu} \exp\left(-\frac{\varepsilon + 10}{\mu}\right), \quad -10 < \varepsilon < \infty,$$
with $\mu = 10$,
and
T4: $\varepsilon \sim$ Uniform(-a, a) with a = 17.32.
Such choice permits comparisons over heavy tailed, light tailed and nonsymmetric distributions. The variance of $\varepsilon$ is 100 in each case, in keeping with the scale of the models used by Buckley and James.
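The generation of uncensored survival times under the four error models can be sketched as follows. This is not the simulation program used in this study; the function names are hypothetical, Python's standard `random` module stands in for whatever generator was actually used, and the variance of each model is computed from its parameters as a check that all four are on the same scale.

```python
import random


def draw_error(model, rng):
    """One draw of the error term under models T1-T4 (each with variance about 100)."""
    if model == "T1":                               # Normal(0, 100)
        return rng.gauss(0.0, 10.0)
    if model == "T2":                               # double exponential, gamma = 7.07
        magnitude = rng.expovariate(1.0 / 7.07)     # exponential with mean gamma
        return magnitude if rng.random() < 0.5 else -magnitude
    if model == "T3":                               # exponential with mean 10, shifted to mean 0
        return rng.expovariate(1.0 / 10.0) - 10.0
    if model == "T4":                               # Uniform(-a, a), a = 17.32
        return rng.uniform(-17.32, 17.32)
    raise ValueError(model)


def theoretical_variance(model):
    """Variance implied by the stated parameters of each model."""
    return {"T1": 100.0, "T2": 2 * 7.07 ** 2, "T3": 10.0 ** 2, "T4": 17.32 ** 2 / 3}[model]


def simulate_survival_times(n, model, rng, alpha=30.0, beta=0.2):
    """Covariates 40, ..., 100 in steps of 60/(n-1) and T_i = alpha + beta * x_i + error."""
    xs = [40 + 60 * i / (n - 1) for i in range(n)]
    ts = [alpha + beta * x + draw_error(model, rng) for x in xs]
    return xs, ts


rng = random.Random(1983)                           # arbitrary seed for reproducibility
xs, ts = simulate_survival_times(7, "T3", rng)
print(xs)
print([round(t, 1) for t in ts])
print({m: round(theoretical_variance(m), 2) for m in ("T1", "T2", "T3", "T4")})
```

Censoring times would then be generated separately and each response recorded as the smaller of the survival and censoring times together with its indicator.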
Three different censoring mechanisms were imposed on sets of T-values simulated using each of the distributions T1 through T4. These were
C1: $C_i \sim$ Uniform($\beta x_i + a$, $\beta x_i + b$),
where a and b are constants chosen to control the censoring rate,
C2: $C_i$ has the same density for all i, given by
f(c) = dc' 0 < c
1  e
where $\lambda$ and d are constants chosen to control the censoring rate, and
C3: $C_i = r$,
where r is a specified constant.
The type of censoring contrived for the form C2 is motivated by consideration of a clinical trial of fixed duration d with time of entry into the trial random at constant rate $\lambda$. All censoring mechanisms satisfy assumption A1 of Section 2.3. Only C1 satisfies assumption A2. Departures from A2 will be moderate for C2 and quite severe for C3.
In all simulations reported here the constants in
Cl, C2 and C3 were fixed to give a nominal 40% censoring in the data. The average censoring rate attained for each type of data is given in the following tables. Any generated data sets having less than 3 uncensored values were discarded (see 2.5.6). As examples of the censoring patterns that can arise using Cl, C2, C3, Table 3 gives the number of censored observations occurring at x = 40(10)100 for 2000 simulations of 3 different types of data. The frequencies clearly indicate the anticipated departures from assumption A2 with censoring forms C2 and C3. The percentage censored column
gives the percentage of censored responses over all simulations for the particular data type.
Table 3. Frequency of occurrence of censoring at different x in 2000 simulations of samples of size n = 7.

            %                          x-values
Data     Censored     40     50     60     70     80     90    100
T1,C1       38       723    774    791    742    769    793    784
T1,C2       41       641    681    754    815    891    916   1015
T1,C3       38       320    453    583    765    942   1070   1251
We now explain five methods of analysis which were
compared using the simulated data. The abbreviations given below in parentheses will be used to identify the methods throughout the presentation of the results.
a) Least squares (LS). Standard least squares analysis was applied to the survival times T as if they were all known. The usual confidence interval procedure is strictly valid only in case T1. In that case the confidence interval serves as a standard with which to compare other methods. The point estimates will be unbiased in all cases and this fact provides some check on the simulation procedure.
b) New method (N). This is the method introduced in Chapter Three. It is valid for all models T1 through T4 with censoring mechanism C1. The application of this method differs for different sample sizes. Simulations were run for sample sizes n = 7, 10, 15 and 25. For samples of size n = 7, the permutation distribution required for the method was based on all 7! = 5040 permutations generated as explained in Section 3.3. For n = 10, the permutation distribution was estimated using a random sample of 10,000 permutations, selected with replacement from the possible 10! permutations. A separate small study which compared 1%, 2%, 5% and 10% points of the estimated permutation distribution with the corresponding percentage points of the true permutation distribution showed good agreement. For n = 15 and n = 25 the asymptotic approach of Section 3.4 was used.
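The idea of comparing exact and estimated percentage points can be sketched as below. This is not the statistic of Chapter Three: a plain Kendall-type score on permuted ranks is used as a stand-in, the comparison is run at n = 7 so that full enumeration stays fast, and all names and the seed are hypothetical.

```python
import itertools
import random

def kendall_score(perm):
    """Concordant minus discordant pairs of perm relative to the natural order."""
    n = len(perm)
    return sum(1 if perm[j] > perm[i] else -1
               for i in range(n - 1) for j in range(i + 1, n))

def upper_point(scores, tail):
    """Smallest score v whose upper-tail proportion P(S >= v) is at most tail."""
    total = len(scores)
    for v in sorted(set(scores)):
        if sum(1 for s in scores if s >= v) / total <= tail:
            return v
    return max(scores) + 1

n = 7

# Exact permutation distribution: all 7! = 5040 permutations.
exact = [kendall_score(p) for p in itertools.permutations(range(n))]

# Estimated distribution: 10,000 independent random permutations, so the same
# permutation may be drawn more than once (sampling with replacement).
rng = random.Random(7)                       # arbitrary seed
estimated = [kendall_score(rng.sample(range(n), n)) for _ in range(10_000)]

for tail in (0.01, 0.02, 0.05, 0.10):
    print(tail, upper_point(exact, tail), upper_point(estimated, tail))
```

Because the score takes few distinct values at this sample size, the estimated percentage points would be expected to coincide with the exact ones or differ by at most one step of the discrete distribution.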
Randomized tests were not incorporated in the simulations and so the intervals obtained would be expected to be conservative in cases where assumption A2 is satisfied. In the small sample simulations (n = 7, 10) the test critical values were adjusted accordingly to give less conservative intervals. Details of this are given with the results.
By its nature this method can give infinite length intervals and some results on the likelihood of this for different size samples are given in Table 4. In contrast to the other methods the confidence intervals by this method are not necessarily symmetric about the point estimate and some implications of this are presented in Table 15.
c) Least squares on a reduced sample (1). Standard least squares analysis was applied just to the known (uncensored) survival times. This was included to investigate the worth of the passive stance of simply throwing away the censored observations. Clearly, this is not a valid sampling procedure for application of the usual least squares theory. It is intuitively likely that slope estimates would be unbiased in the case of type C1 censoring. At least three uncensored observations are required for application of the usual error variance estimation.
d) Buckley-James Normal approximation (BJN). The method of Buckley and James as explained in Section 2.5 was used. It is an asymptotic method and its appropriateness in small samples is unknown. At least three uncensored observations are required. As discussed previously (Section 2.5) the convergence of the Buckley-James method is not guaranteed and can be a problem. For the sake of these simulations the following procedure was adopted. Convergence was considered to have occurred if iterates agreed to within ±0.00005. If convergence was not reached within 20 iterations a search for oscillation cycles was made over the final 10 iterations. If 2 iterates in this search agreed to ±0.005, then the arithmetic mean of all iterates in that cycle was taken as the point estimate. If such agreement was not reached within the 10 iterations, then it was deemed that "no solution was possible." Doubtless a solution could be contrived in an individual case, but no further attempt to coax out a solution was made in these trials. In some simulations the starting value chosen for the iteration was $\hat\beta^{(1)}$, the least squares estimate using only the uncensored observations, and in others the value chosen was $\hat\beta^{(0)}$, the least squares estimate using all the observations without regard to whether they were censored or not. The choice of starting value was discussed briefly in Section 2.5 and some indication of how it might affect the convergence is given in Table 4.
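The stopping rule just described can be sketched in code. This is one reading of the rule, offered as an assumption: "agreement" is taken to mean successive iterates for convergence, and a cycle is taken to run from the first of two agreeing iterates up to, but not including, the second. The Buckley-James update itself is not implemented; `step` is a hypothetical placeholder for it.

```python
def resolve_iteration(step, start, tol=0.00005, max_iter=20, window=10, cycle_tol=0.005):
    """Apply the stopping rule to the sequence b[k+1] = step(b[k]).

    Returns (status, estimate), with status one of
    "converged", "oscillation" or "no solution".
    """
    iterates = [start]
    for _ in range(max_iter):
        iterates.append(step(iterates[-1]))
        if abs(iterates[-1] - iterates[-2]) <= tol:
            return "converged", iterates[-1]

    # No direct convergence: search the final `window` iterates for a cycle.
    tail = iterates[-window:]
    for i in range(len(tail) - 1):
        for j in range(i + 1, len(tail)):
            if abs(tail[i] - tail[j]) <= cycle_tol:
                cycle = tail[i:j]                  # one full pass through the cycle
                return "oscillation", sum(cycle) / len(cycle)
    return "no solution", None

# Toy update rules standing in for the Buckley-James step (illustration only).
print(resolve_iteration(lambda b: 0.2 + 0.5 * (b - 0.2), 0.0))   # contracts towards 0.2
print(resolve_iteration(lambda b: 0.4 - b, 0.15))                # two-value loop
print(resolve_iteration(lambda b: b + 0.01, 0.0))                # never settles
```

Averaging over one pass of the cycle means a two-value loop is resolved to the midpoint of the two values.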
Confidence intervals for β using this method were placed by using the variance estimate (2.5.5) and critical values of a normal distribution.
e) Buckley-James t-approximation (BJT). This procedure was as for BJN except that by ad hoc reasoning the critical values for a t-distribution were used in placing confidence intervals. From consideration of expression (2.5.6) for the error variance estimation the value for the degrees of freedom for t was taken as two less than the number of uncensored observations.
Some results are now presented of how appropriate
selections of the above methods compared in various aspects of their performance. The performance aspects to be considered are
i) Infinite intervals and convergence
ii) Bias of estimator
iii) BJ variance estimation
iv) Coverage probability
v) Length of confidence interval
vi) A power comparison.
These aspects are now discussed in turn.
i) Infinite intervals and convergence. Some features of the methodologies of the Buckley-James method and the new method are now considered. Method N will always produce a point estimate but it is possible to obtain infinite-length intervals. The infinite-length intervals usually contain one finite bound so that the interval will still be useful. The proportions of infinite-length intervals obtained at different sample sizes in a series of 200 simulations are given in Table 4. Clearly, infinite-length intervals are a problem associated with very small sample sizes. Even for n = 7, the proportion of such intervals was less than 15% in all but one case.
Recall that the BJ method might not give a point estimate due to nonconvergence and consequently might not produce a confidence interval. The proportion of simulations that did not give direct convergence and the proportion for which "no solution was possible" even after allowing for oscillation are also included in Table 4 for sample size n = 7 with starting values $\hat\beta^{(0)}$ and $\hat\beta^{(1)}$, as well as for sample sizes n = 15 and n = 25 with starting value $\hat\beta^{(0)}$.
The comparison of starting values at n = 7 indicates that $\hat\beta^{(1)}$ might be a better choice than $\hat\beta^{(0)}$. However, except for the results given in Tables 4 and 5, all our results were obtained by using $\hat\beta^{(0)}$ as suggested by Miller and Halpern (1982). Simulations for n = 7 using each starting value gave no indication that the overall conclusions of the following sections would be altered by the choice of starting value.

Table 4. Percentages of 200 simulations leading to infinite intervals by N and percentages giving convergence problems with BJ.

          ∞-interval          Nonconvergence                    No solution
n =            7          7       7      15      25        7       7      15      25
Start =                  (0)     (1)     (0)     (0)      (0)     (1)     (0)     (0)
T1,C1         7.5       23.0    20.5    30.5    26.0      3.5     4.0     0.5     0.0
T2,C1        13.5       30.5    23.5    30.5    21.5     10.0     5.5     1.5     0.5
T3,C1        10.5       24.0    19.5    24.0    21.0      3.0     1.0     1.0     0.0
T4,C1        11.0       25.5    22.5    28.0    26.5      2.0     2.5     1.0     0.5
T1,C2         9.5       22.0    17.5    25.0    18.5      7.0     3.5     1.0     0.5
T2,C2        12.0       23.5    18.5    31.5    19.5      9.5     1.5     1.5     0.5
T3,C2        14.5       17.0    15.0    19.5    16.0      8.5     2.5     1.5     0.0
T4,C2        13.5       23.5    19.5    35.0    27.0      2.5     3.5     4.5     0.5
T1,C3        14.0       31.0    31.0    35.0    39.5      5.0     4.5     1.0     0.5
T2,C3        21.5       34.5    34.0    43.0    41.5      8.0     5.0     4.5     2.0
T3,C3        11.0       37.0    38.0    35.5    30.0      2.5     2.0     0.0     0.0
T4,C3         3.5       29.5    28.5    23.5    23.5      3.5     2.5     0.5     0.0

Start values (0) and (1) denote $\hat\beta^{(0)}$ and $\hat\beta^{(1)}$.
1) Comparisons between $\hat\beta^{(0)}$ and $\hat\beta^{(1)}$ are for exactly the same data.
2) No infinite intervals occurred for n = 15 or n = 25.
Looking at results for the individual samples generated showed that nonconvergence is more common for highly censored samples but that it does occur at all levels of censoring. In a search over some of the results for the case n = 7 infinite intervals were not found to occur when there were more than three uncensored values. No intervals of (-∞, ∞) were encountered. It was not particularly common for nonconvergence and infinite intervals to occur for the same data set.
ii) Bias of estimator. Arithmetic means of the point estimates obtained by each method over 2000 simulations for n = 7 are given in Table 5 and means over 200 simulations for n = 25 are given in Table 6. The estimated standard errors of these means are also given.
Because of some nonconvergence for the BJ method the number of estimates averaged might be less than the number of simulations. The actual number used for the BJ method is given in the final column. Means which are more than two standard deviations from the nominal β = 0.2 are indicated.
Table 5. Arithmetic means (× 10^3) based on 2000 simulations of slope estimates for samples of size n = 7.

            %                                                            Number of
Data     Censored      LS           N            1            BJ        BJ Estimates
T1,C1       38      204 (4.2)    204 (5.6)    197 (6.1)    203 (5.2)       1960
T2,C1       42      197 (4.1)    195 (5.8)    195 (6.6)    196 (5.4)       1938
T3,C1       36      203 (4.1)    200 (3.8)    202 (4.2)    201 (4.0)       1971
T4,C1       39      199 (4.3)    200 (5.6)    202 (6.3)    199 (5.4)       1943
T1,C2       41      200 (4.2)    197 (6.0)    169 (6.7)    194 (5.7)       1941
T2,C2       40      196 (4.2)    200 (5.9)    165 (6.6)    193 (5.6)       1948
T3,C2       41      204 (4.1)    200 (4.4)    181 (4.6)    195 (4.4)       1947
T4,C2       41      195 (4.3)    199 (6.2)    167 (7.0)    194 (6.0)       1921
T1,C3       38      209 (4.2)    220 (5.5)    091 (5.1)    200 (4.9)       1937
T2,C3       44      203 (4.2)    218 (5.6)    051 (6.0)    184 (5.2)       1901
T3,C3       37      198 (4.2)    193 (2.8)    136 (2.5)    186 (2.6)       1965
T4,C3       30      186 (4.4)    192 (5.7)    088 (4.5)    177 (5.1)       1958

*Indicates that the mean is more than 2 standard deviations from the nominal β = 0.2. Estimated standard deviations (× 10^3) of the means are given in parentheses.
Table 6. Arithmetic means (× 10^3) based on 200 simulations of slope estimates for samples of size n = 25.

            %                                                               Number of
Data     Censored      LS            N             1             BJ        BJ Estimates
T1,C1       44      192 (7.1)    185 (8.9)     185 (9.2)     190 (8.7)        200
T2,C1       50      198 (7.4)    194 (9.2)     192 (10.7)    194 (9.5)        199
T3,C1       38      209 (8.1)    211 (5.9)     213 (7.1)     215 (7.4)        200
T4,C1       44      202 (7.7)    206 (9.5)     209 (10.5)    207 (9.2)        199
T1,C2       46      200 (7.1)    196 (10.5)    168 (10.0)    193 (9.5)        199
T2,C2       46      180 (8.4)    185 (10.0)    157 (10.7)    178 (10.3)       199
T3,C2       47      193 (8.1)    195 (6.7)     175 (7.4)     194 (8.1)        200
T4,C2       47      194 (7.7)    186 (11.0)    160 (10.3)    188 (10.3)       199
T1,C3       43      196 (7.4)    205 (8.4)     083 (6.7)     198 (8.7)        199
T2,C3       51      194 (7.1)    210 (8.4)     052 (8.4)     195 (9.2)        196
T3,C3       38      200 (8.4)    197 (4.5)     135 (3.2)     196 (4.5)        200
T4,C3       32      188 (8.1)    188 (10.3)    102 (6.3)     187 (9.5)        200

*Indicates that the mean is more than 2 standard deviations from the nominal β = 0.2. Estimated standard deviations (× 10^3) of the means are given in parentheses.
The results in Tables 5 and 6 indicate that method 1 can give severely biased results if used in cases where assumption A2 is not satisfied. Departures from A2 do not seem so serious for methods N and BJ, especially at the larger sample size. There is some bias indicated for methods N and BJ at sample size 7 with censoring type C3.
iii) BJ variance estimation. An investigation was made of how well the expression (2.5.5) approximates the variance of the BJ estimator. For each type of data 2000 simulations for n = 7 and 200 simulations each for n = 15 and n = 25 were run, computing where possible the BJ estimate and an estimate of its variance using (2.5.5). The sampling variance of the estimates and the mean of the variance estimates were computed. The results for n = 7 are displayed in Table 7.
From the results shown in Table 7 it seems clear that the variance expression (2.5.5) overestimates the variance of the BJ estimator quite markedly for sample size n = 7. A similar but less marked overestimation was found for n = 15 while for n = 25 the estimation was much better.
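As a rough cross-check on the scale of these figures, the sampling variance in Table 7 and the standard deviation of the mean BJ estimate in Table 5 should be tied by the usual relation for a mean of m estimates. The symbol m (number of BJ estimates) is introduced here only for illustration. Taking the T1,C1 entries (sampling variance 0.053, mean of variance estimates 0.095, m = 1960):

```latex
\[
\sqrt{\frac{0.053}{1960}} \approx 0.0052 = 5.2 \times 10^{-3},
\qquad
\frac{0.095}{0.053} \approx 1.8 .
\]
```

The first figure agrees with the value 5.2 given in parentheses for BJ in Table 5, and the second shows the variance expression averaging close to 1.8 times the observed sampling variance for this data type.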
Table 7. Simulation results concerning estimation of the variance of the Buckley-James estimator with sample size n = 7.

             # of                     Mean of Variance
Data       Estimates     Var β̂          Estimates
T1,C1        1960         0.053           0.095
T2,C1        1938         0.057           0.096
T3,C1        1971         0.031           0.045
T4,C1        1943         0.057           0.092
T1,C2        1941         0.064           0.103
T2,C2        1948         0.061           0.091
T3,C2        1947         0.038           0.049
T4,C2        1921         0.069           0.110
T1,C3        1937         0.046           0.091
T2,C3        1901         0.052           0.104
T3,C3        1965         0.013           0.019
T4,C3        1958         0.051           0.066
iv) Coverage probability. The various confidence interval procedures are now compared on the basis of the coverage attained by the intervals. Nominally the intervals are 90% except for method N with n = 7 and n = 10. In those cases the intervals are nominally 88% but known to be conservative with a C1-type censoring distribution (see 3.3.5). In general this adjustment led to intervals with coverage 90% or greater and thus established a more common base from which to compare other aspects of the various methods. Tables 8 to 11 give the results for n = 7, 10, 15, 25, respectively. Recall that results for n = 10 are based on estimates of exact permutation distributions. Because of computing expense results for n = 10 were obtained only for three data types. It was felt to be useful to include these to give some indication of how well the distribution approximation functioned. Coverages for the BJ method are calculated using only the data which do give a BJ estimate.
Table 8. Percentage coverage of confidence intervals based on 200 simulations with sample size n = 7.

Data       LS      N       1      BJN     BJT    % Censored
T1,C1     85.5    93.0    88.5    81.9    93.8       38
T2,C1     90.5    95.0    90.5    80.6    92.2       43
T3,C1     90.0    93.0    90.0    77.8    89.7       37
T4,C1     91.0    98.0    92.0    84.2    95.9       39
T1,C2     92.0    96.0    89.0    80.6    93.0       41
T2,C2     89.5    93.5    91.0    88.4    95.6       40
T3,C2     88.0    91.5    93.5    81.4    95.1       41
T4,C2     87.5    93.0    85.0    76.8    89.7       42
T1,C3     90.5    93.0    87.0    83.2    92.6       38
T2,C3     90.5    92.0    80.0    79.9    91.3       43
T3,C3     89.5    93.0    81.0    84.1    91.3       37
T4,C3     88.5    93.5    82.5    79.8    90.7       32

Nominally intervals were 90% except for method N where the setting was 88%. Standard deviations for the table entries are approximately 2.1.
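The figure of approximately 2.1 quoted for the standard deviation of a table entry is consistent with a binomial proportion at the nominal 90% level over 200 simulations. The sketch below shows the arithmetic and the resulting "2 standard deviations below nominal" threshold; the function names are hypothetical, and for the BJ columns the effective number of simulations is somewhat below 200, so the threshold is only approximate there.

```python
import math

def coverage_sd(p, n_sim):
    """Binomial standard deviation, in percentage points, of an observed coverage."""
    return 100.0 * math.sqrt(p * (1.0 - p) / n_sim)

def significantly_low(observed_pct, nominal_pct=90.0, n_sim=200):
    """True if the observed coverage is more than 2 standard deviations below nominal."""
    sd = coverage_sd(nominal_pct / 100.0, n_sim)
    return observed_pct < nominal_pct - 2.0 * sd

print(round(coverage_sd(0.90, 200), 2))      # standard deviation of a table entry
print(significantly_low(81.9))               # BJN, T1,C1 entry of Table 8
print(significantly_low(88.5))               # method 1, T1,C1 entry of Table 8
```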
Table 9. Percentage coverage of confidence intervals based on 200 simulations with sample size n = 10.

Data       LS      N       1      BJN     BJT    % Censored
T1,C1     86.5    91.5    87.5    87.8    92.9       44
T1,C2     92.5    94.5    89.0    85.7    93.1       44
T1,C3     90.0    90.5    80.0    79.0    86.7       41

Nominally intervals were 90% except for method N where the setting was 88%. Standard deviations for the table entries are approximately 2.1.
Table 10. Percentage coverage of confidence intervals based on 200 simulations with sample size n = 15.

Data       LS      N       1      BJN     BJT    % Censored
T1,C1     92.0    92.5    89.0    90.5    94.5       44
T2,C1     90.5    91.0    92.5    91.9    94.9       49
T3,C1     93.0    88.5    90.0    84.9    91.4       40
T4,C1     89.5    89.0    89.5    88.9    91.4       44
T1,C2     91.5    93.5    91.0    89.9    95.5       47
T2,C2     88.5    87.5    89.5    85.3    91.4       48
T3,C2     93.5    92.0    91.0    88.3    95.0       47
T4,C2     92.5    91.0    88.5    87.4    94.8       49
T1,C3     92.0    89.0    75.0    84.9    89.9       42
T2,C3     91.0    91.0    70.5    90.1    93.7       49
T3,C3     87.5    90.0    83.0    81.0    84.5       39
T4,C3     90.0    92.0    84.5    81.4    87.4       33

Nominally intervals were 90%. Standard deviations for the table entries are approximately 2.1.
Table 11. Percentage coverage of confidence intervals based on 200 simulations with sample size n = 25.

Data       LS      N       1      BJN     BJT    % Censored
T1,C1     92.5    91.5    90.5    91.0    91.5       44
T2,C1     91.0    91.5    91.5    91.5    95.5       50
T3,C1     91.5    90.5    88.5    86.0    89.0       38
T4,C1     87.5    87.0    87.0    87.4    90.0       44
T1,C2     92.0    87.0    88.5    86.9    91.5       46
T2,C2     89.5    88.0    85.5    87.4    90.0       46
T3,C2     84.5    91.0    88.5    79.0    82.0       47
T4,C2     90.5    89.0    90.0    89.5    91.5       47
T1,C3     94.0    94.0    71.0    83.4    87.4       43
T2,C3     92.5    94.0    64.0    88.3    90.8       51
T3,C3     87.5    92.0    69.5    81.5    84.5       38
T4,C3     89.0    88.5    77.5    78.5    82.0       32

Nominally intervals were 90%. Standard deviations for the table entries are approximately 2.1.
The implication from the results for all sample sizes is that method 1 may be acceptable in terms of coverage for C1 and possibly C2 censoring but would not guarantee coverage with C3 type censoring. This finding of course is in keeping with the observations about bias.
Method N shows coverage quite close to or greater than nominal over all models and sample sizes. The asymptotic method of Section 3.4 with n = 15 and n = 25 and the approximate permutation distribution for n = 10 give coverage generally close to 90%. Use of the exact permutation distribution with n = 7 gives very conservative 90% intervals even though the setting was 88%.
Method BJN does not come close to attaining coverage for n = 7 and the coverage is fairly consistently low over all models for the larger sample sizes. For n = 7, all the observed coverages are more than 2 standard deviations below 90%. There are some similar significant departures for the larger sample sizes, particularly with C3.
Method BJT with n = 7 generally gave conservative
intervals with coverages comparable to those of Method N. Method BJT gave some low coverages at the larger sample sizes particularly with C3. This loss of coverage for the larger n is a result of the improvement, mentioned earlier, of the estimate of the variance of the point estimate. For n = 7 the inflated variance estimate secures coverage for the interval.
Overall, the coverage by method N was never significantly low, whereas at least one significantly low coverage occurred for each of the methods 1, BJN and BJT.
v) Length of confidence interval. Confidence intervals are of course most meaningfully compared in terms of length when they attain much the same coverage. Table 12 shows mean interval lengths and estimates of their standard deviations for a series of simulations with sample size n = 7. An interval length was included in the computation only if all methods provided a finite length interval for the particular data set. Thus for each data type the number of interval lengths averaged is the same for all methods. As discussed earlier, highly censored data sets are the ones which tend to not give an interval, particularly for n = 7. Therefore the results on interval length for n = 7 should be viewed as applying to data with censoring less than that indicated in Table 8.
Results for method BJN are not included in Table 12. That method did give consistently shorter intervals than method N but according to results in Table 8 is not competitive in terms of coverage. Where coverages are comparable the indications are that method N gives, on average, shorter intervals than both BJT and 1. With nominally 40% censoring the intervals by method N turn out to be about twice the length that would be obtained by LS had the data been complete. The column headed BJT - N in Table 12 gives the difference in mean interval lengths for the two methods and the estimate of the standard deviation of the difference for a paired comparison. On all but two occasions the method N length is shorter, and is shorter by more than 2 standard deviations on 3 occasions.
Results for n = 15 are in Table 13. Because of coverage limitations results for methods 1 and BJN are not included. In cases where coverage was attained BJN tended to give rather shorter intervals than N and intervals by 1 were comparable in length to those of N.
Method N gave intervals that tend to be about one-half the lengths that it gave in the n = 7 trials. In most cases the length of the interval by N is about 1.4 times the standard set by LS. For data generated using T3 the interval lengths by both methods N and BJT are close to or less than the LS value.
Average interval lengths obtained with data generated using C1 or C2 turned out shorter using method N than using method BJT. The reverse was the case with C3 censoring. These differences seem to be merely reflections of the coverages obtained and it is likely that for the same coverage there is little to choose between N and BJT in terms of length.
Simulations with n = 25 gave intervals with method N which were about two-thirds of the length of corresponding intervals with n = 15. Method N intervals were approximately 1.3 times the length of LS intervals except, as before, in the case of T3 models where all methods tended to give shorter intervals than LS.
Table 14 gives mean differences of interval lengths and their estimated standard deviations for paired comparisons of N with BJN and N with BJT. Assessment of these differences must take account of the observed coverages and these are included in the table. Where coverages were comparable, intervals by BJN were significantly shorter than those by N on three occasions and significantly larger on one occasion. Only once did BJT attain both reasonable coverage (87.4%) and a significantly shorter length than N. Method N gave significantly shorter intervals than BJT on seven occasions.
Table 12. Mean lengths of 90% confidence intervals based on 200 simulations for sample size n = 7.

Data         LS              N               1              BJT            BJT - N
T1,C1     0.729(.017)    1.360(.054)    1.500(.124)    1.739(.156)     0.379 (.135)
T2,C1     0.685(.021)    1.442(.069)    1.771(.155)    2.101(.221)     0.659 (.183)
T3,C1     0.692(.030)    0.900(.050)    0.804(.066)    0.934(.075)     0.035 (.047)
T4,C1     0.735(.013)    1.289(.051)    1.463(.104)    1.781(.171)     0.492 (.141)
T1,C2     0.714(.019)    1.653(.080)    1.577(.133)    1.847(.168)     0.194 (.134)
T2,C2     0.718(.023)    1.502(.069)    1.501(.115)    1.766(.188)     0.264 (.161)
T3,C2     0.628(.022)    1.179(.066)    0.941(.074)    1.080(.102)    -0.098 (.084)
T4,C2     0.721(.014)    1.578(.071)    1.440(.126)    1.676(.156)     0.098 (.125)
T1,C3     0.737(.018)    1.203(.057)    1.053(.122)    1.276(.131)     0.073 (.099)
T2,C3     0.701(.022)    1.173(.058)    1.073(.084)    1.292(.105)     0.118 (.075)
T3,C3     0.614(.022)    0.669(.031)    0.617(.064)    0.770(.072)     0.102 (.058)
T4,C3     0.740(.013)    1.373(.051)    0.900(.058)    1.241(.096)    -0.132 (.079)

*Indicates a difference greater than 2 s.d. Values in parentheses are estimated standard deviations for the entries.
Table 13. Mean lengths of 90% confidence intervals based on 200 simulations for sample size n = 15.

Data         LS               N                      BJT
T1,C1     .495(.007)     .670(.021)  92.5       .713(.039)  94.5
T2,C1     .474(.009)     .674(.021)  91.0       .863(.052)  94.9
T3,C1     .471(.011)     .400(.011)  88.5       .471(.017)  91.4
T4,C1     .488(.005)     .679(.017)  89.0       .756(.025)  91.4
T1,C2     .485(.006)     .723(.018)  93.5       .830(.050)  95.5
T2,C2     .463(.009)     .648(.021)  87.5       .732(.029)  91.4
T3,C2     .464(.012)     .487(.013)  92.0       .508(.016)  95.0
T4,C2     .496(.005)     .798(.026)  91.0       .863(.029)  94.8
T1,C3     .491(.007)     .680(.022)  89.0       .605(.021)  89.9
T2,C3     .475(.009)     .656(.022)  91.0       .653(.025)  93.7
T3,C3     .460(.011)     .353(.009)  90.0       .293(.010)  84.5
T4,C3     .488(.005)     .662(.016)  92.0       .519(.013)  87.4

Values in parentheses are estimated standard deviations for the means. The third figure in the N and BJT columns is the observed percentage coverage.
Table 14. Differences in mean lengths of 90% confidence intervals based on 200 simulations for sample size n = 25.

            BJN                            N                             BJT
Data      Coverage      BJN - N         Coverage      BJT - N         Coverage
T1,C1       91.0      0.030 (.005)        91.5      0.010 (.006)        91.5
T2,C1       91.5      0.015 (.006)        91.5      0.069 (.007)        95.5
T3,C1       86.0      0.020 (.006)        90.5      0.045 (.007)        89.0
T4,C1       87.4     +0.000 (.006)        87.0      0.044 (.007)        90.0
T1,C2       86.9      0.023 (.005)        87.0      0.022 (.006)        91.5
T2,C2       87.4      0.004 (.007)        88.0      0.048 (.009)        90.0
T3,C2       79.0      0.011 (.006)        91.0      0.021 (.006)        82.0
T4,C2       89.5      0.030 (.006)        89.0      0.020 (.007)        91.5
T1,C3       83.4      0.119 (.007)        94.0      0.087 (.007)        87.4
T2,C3       88.3      0.052 (.009)        94.0      0.006 (.009)        90.8
T3,C3       81.5      0.069 (.004)        92.0      0.055 (.004)        84.5
T4,C3       78.5      0.151 (.007)        88.5      0.128 (.007)        82.0

*Indicates a difference greater than 2 s.d. Values in parentheses are estimated standard deviations for the differences.
vi) A power comparison. Because method N can give nonsymmetric confidence intervals it was felt that an interesting comparison of methods might be made on the basis of rejection of a null hypothesis that β = 0. It was thought that, although method N may give intervals of the same length as another method, there might be a different chance of rejecting β = 0 for the two methods because of the asymmetry of the N intervals. With our setting of β = 0.2 for the model this amounts to comparing the power of the methods for a test of H0: β = 0 versus H1: β = 0.2. Recall (Section 3.2) that method N used for testing H0: β = 0 is the test of Brown, Hollander and Korwar (1974). In thinking of estimating a power value for the BJ method we must decide whether to use the proportion of rejections out of all simulations or the proportion of rejections out of those simulations which give convergence and thus do give a test. Values could be quite different for small n. Table 15 shows the proportions out of all simulations.
No marked effects of asymmetric intervals are seen.
Method N shows up rather well compared to BJT for n = 25 in keeping with the results on interval length and both methods show good increase in proportions ("power") with sample size.
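The link between interval asymmetry and rejection of β = 0 can be illustrated with a small sketch. The interval values below are invented for illustration and are not simulation results; the function names are hypothetical. A simulation that yields no interval (for example, through nonconvergence) is counted as a non-rejection, matching the "out of all simulations" convention.

```python
import math

def rejects_zero(lower, upper):
    """Two-sided test of H0: beta = 0 by inverting a confidence interval."""
    return not (lower <= 0.0 <= upper)

def rejection_proportion(intervals, n_simulations):
    """Proportion of all simulations whose interval excludes zero.

    Entries equal to None (no interval produced) count as non-rejections.
    """
    rejected = sum(1 for iv in intervals if iv is not None and rejects_zero(*iv))
    return rejected / n_simulations

# Two intervals of the same length (0.5) around the same point estimate 0.2:
symmetric = (-0.05, 0.45)       # includes 0, so H0 is not rejected
asymmetric = (0.02, 0.52)       # excludes 0, so H0 is rejected
half_infinite = (0.05, math.inf)

print(rejects_zero(*symmetric), rejects_zero(*asymmetric), rejects_zero(*half_infinite))
print(rejection_proportion([symmetric, asymmetric, None, half_infinite], 4))
```

An interval with one infinite bound can still reject the null hypothesis, which is why such intervals remain useful for testing.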
Table 15. Proportions of 200 simulations for which H0: β = 0 was rejected at test level α = 0.1.
N BJT
n = 7 25 7 25
0.14 0.16
0.30
0.09
0.07 0.13
0.21 0.11
0. 21 0.28 0.35 0.19
0.38
0.43 0.77 0.43
0.11
0.15
0.28 0.09
0.39 0.38 0.67 0.37
T1,C1 T2,C1
T3,C1 T4,C1
T1,C2
T2,C2 T3,C2 T4,C2
T1,C3 T2,C3 T3,C3 T4,C3
0.10
0.12
0.21 0.10
0.08
0.06
0.26 0.12
0.39 0.38 0.69 0.38
0.34 0.32 .0.60
0.34
0.51 0.43 0.95 0.52
0.46
0.56
0.79 0.38
4.2 Analysis of Heart Transplant Data
The Stanford Heart Transplant Program was mentioned in Chapter One. Data from that program has been analyzed by various authors at various times since the start of data collection in 1967. Survival times, ages and mismatch scores available as of February, 1980, are tabulated in Miller and Halpern (1982). Those authors compare the results from analyses using the methods of Cox, Buckley and James, Miller, and Koul, Susarla and Van Ryzin that were described in Chapter Two.
Data are available for 184 patients and 55 of the survival times are censored values. Ages are available for all 184, but mismatch scores are available for only 157 patients. Miller and Halpern compare the four methods of analysis using the 157 complete records in a multiple regression of the base ten logarithm of survival time against age and mismatch score. They declare the mismatch score insignificant and say that the results they quote for age from the multiple regression are practically identical to those using age alone for the various methods. Thus the results on age from their multiple regression are included in Table 16. Also included in Table 16 are new computations for the Buckley and James method for a single regression with age alone using all 184 data values and the results of using the new method, first with the 157 complete records and then with the whole set of age data.
That the KSV estimate is positive is in conflict with the commonly held view that increasing age is deleterious to successful heart transplantation. Because computational expressions are not available for an estimate of the variances of the KSV estimates in multiple regression they could not be given by Miller and Halpern and although an expression is available in the single covariable case the computation of the variance has not been added here.
The Miller method did not converge. Oscillation
occurred between two values and the implication from those values is that there is no significant regression on age.
At the 95% level of testing the method of Buckley-James showed a significant negative slope on age for the larger sample size and the Cox result was also significant. With the Cox parameterization the positive value of $\hat\beta$ indicates a decrease in survival time with age.
The new method also showed a significant negative slope. Miller and Halpern compare the Cox results with the direct regression results by plotting the median survival times estimated by the Cox model against age. That plot is close to linear and has a slope more negative than the BJ estimate and close to the values given by the new method.
Comparison of results of regressing logarithm of survival times on the mismatch scores are given in Table 17 for the Buckley and James method and the new method. In this case the new method gives the smaller negative slope, but neither result attains significance.
Table 16. Regression slope estimates and confidence intervals for log10 of survival time versus age at transplant with Stanford heart transplant data (n = 157 except where stated).

Method                            Estimate         95% C.I.
Cox                                 0.030      (0.008, 0.052)
Miller (loop with 2 values)        -0.001      (-0.023, 0.021)
                                    0.000      (-0.016, 0.016)
BJ     n = 157                     -0.015      (-0.031, 0.001)
       n = 184                     -0.014      (-0.028, 0.000)
KSV                                 0.024
New    n = 157                     -0.030      (-0.050, -0.010)
       n = 184                     -0.026      (-0.045, -0.009)
Table 17. Regression slope estimates and confidence intervals for log10 of survival times versus mismatch score with n = 157 Stanford heart transplant patients.

Method      Estimate         95% C.I.
BJ           -0.105      (-0.373, 0.163)
New          -0.002      (-0.327, 0.311)
4.3 Summary of Results
Both the Buckley-James and the new method were put forward as procedures which did not require assumptions about censoring as far as point estimation was concerned. The indications from the trials reported in Tables 5 and 6 are that both methods might lead to biased results with small samples showing extreme departures from assumption A2. Otherwise point estimation was close to the model setting except in two cases using method BJ. In terms of the methodologies, the new method will always give an estimate whereas, as shown in Table 4, convergence difficulties can arise with BJ, especially with small samples.
The confidence interval procedures for both BJ and N
require assumption A2 and a major concern is how robust the methods turned out to be against departures from that assumption. The observed coverage over all trials with method N was always close to or greater than the nominal 90%. From the results in Tables 8 to 11 it seems that method BJN should not be advocated for samples up to size 25. Contrary to anticipation, method BJT gave more security over coverage at the small sample size. At larger sample sizes BJT would probably attain coverage with mild departures from A2 but perhaps could not be relied on with severe departures from A2. The indications are that over a range of sample sizes less than 30 method N has the better robustness properties.
The overall impression from Tables 12 to 14 is that the new method is competitive in terms of interval length when comparisons are made for those intervals that attain coverage. Thus in conclusion it can be said that over all trials method N gave the better assurance of coverage coupled with a competitive interval length.
Sample sizes greater than 25 were not simulated in this study. Large sample results were compared using the Stanford heart data. Here of course there are no "true" results for comparison but method N gave results consistent with commonly held views about the data.
CHAPTER FIVE
CONCLUSION
We now submit a final appraisal of all the methods considered; firstly with regard to their properties and scope, and then in terms of some observations on their performance.
The methods of Cox, Miller and BJ extend to multiple regression whereas at this stage the new method does not. Also, computing forms for interval estimation in multiple regression with KSV are not available.
The KSV method is well founded theoretically but the conditions required for consistency and asymptotic normality of the estimates are difficult to relate to practical criteria. Large sample properties have been established for the Cox method and were referenced in Section 2.2. Some criticism of that method's robustness against the influence of an outlying covariable value was given by Samuels (1978). That the efficiency of the estimator decreases as β departs from zero is shown in Kalbfleisch and Prentice (1980, p. 103). Assumption A2 has a prominent position with the Miller, BJ and N methods. It is a sufficient condition for consistency of the Miller estimate and is used in the heuristic derivation of that estimate's asymptotic distribution. Buckley and James suggest their point estimation procedure without regard to any assumption on censoring pattern but A2 would seem to be required for consistent estimation of the variance of their estimate. They rely on simulation support for their heuristic arguments concerning large sample properties. The new method gives exact intervals for small samples under A2 and for larger samples employs the asymptotic theory of Daniels (see Section 3.4), again assuming A2.
All of the methods adapt to left-censored data. A simple approach for this is to reverse the sign on all response and covariable values and use the original method treating the left-censored values as right-censored. The methods of Cox and KSV do not simply adapt to data exhibiting both left and right censoring. The Miller and BJ methods can be extended to that situation by using the distribution estimator of Turnbull (1974, 1976) and the new method would apply by incorporating a natural extension of the definitions of "definitely concordant" and "definitely discordant" (see Definition 3.2.2).
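The sign-reversal device for left-censored data can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `reflect` and `ls_slope` and the toy data are hypothetical, and ordinary least squares stands in for whichever censored-data method is actually applied after reflection. The point shown is that negating both the response and the covariable leaves the slope unchanged while turning an upper bound on the true response into a lower bound.

```python
def reflect(y, x, delta):
    """Negate responses and covariables; censoring indicators are unchanged.

    A left-censored y (true value <= y) becomes a right-censored -y
    (true value >= -y) after reflection.
    """
    return [-v for v in y], [-v for v in x], list(delta)


def ls_slope(y, x):
    """Ordinary least squares slope, used here only as a stand-in estimator."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical toy data: delta = 0 marks a (left-)censored response.
y = [3.0, 5.0, 4.0, 8.0]
x = [1.0, 2.0, 3.0, 4.0]
delta = [1, 0, 1, 1]

ry, rx, rd = reflect(y, x, delta)
print(ls_slope(y, x), ls_slope(ry, rx))  # the two slopes agree
```

Because the slope is invariant under the reflection, an estimate or interval for the slope obtained from the reflected, right-censored data applies directly to the original problem.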
We now compare some computing aspects of the various methods. For KSV the computations are without iteration and easy, once the choice of M_n for their expression (2.6.2) has been made. That choice, however, is not clear-cut. Convergence can be a problem with the Miller method and to a lesser extent with BJ. Computing costs for the Cox, Miller and BJ methods are comparable. The new method requires more computing time than the other methods, but does not suffer from convergence problems. On the system used, the cost of analyzing the 184 age data by the new method was $7.20, whereas by BJ the cost was 30 cents. The small sample costs are less alarming. For analyzing a sample of size n = 10 the new method cost 71 cents.
Because many results are asymptotic and many based on assumptions which are not often plausible, we must be very concerned with simulation and case study performances. Buckley and James (1979) report a simulation study which showed some bias in the Miller estimate. Their indications are that the Miller method is acceptable when assumption A2 is appropriate but that in that case the results are similar to those from analyzing just the uncensored observations. Analyzing the same simulated data with their own (BJ) method, they report encouraging results for n = 50 with 50% censoring and suggest that the method might be used when there are 20 or more uncensored observations. The Miller approach is no longer advocated by the author himself largely because of consistency and convergence problems (Miller and Halpern, 1982). As stated in section 4.2 the KSV method gave rather disconcerting results with the Stanford data; this is discussed in Miller and Halpern. The Cox, BJ and new methods gave seemingly satisfactory analyses of that data, with the new method results falling close to those of the Cox method.
The purpose of the Monte Carlo studies reported in section 4.1 was to test the new method applied to a range of small sample sizes and to compare its performance with the BJ method when that method was applied to small samples. Results suggest that in terms of assurance of coverage with confidence intervals the new method can be used safely over all sample sizes for a wide range of data types, while for sample sizes up to 25 the asymptotic distribution for BJ estimates cannot safely be taken as normal with the proposed variance estimate. Use of a t-distribution gives better coverage performance for BJ with these smaller sample sizes. By an anomaly due to overestimation of variance the use of a t-distribution with BJ gave its most conservative intervals for the smallest sample size tested.
Because intervals by the new method can be expected to be conservative in small samples, it is possible to adjust the test setting in order to come closer to the nominal coverage for the interval. However, it is not possible to give any firm guidelines as to what that adjustment should be.
It would be impractical to use exact permutation distributions over all sample sizes with the new method. Random samples of permutations gave adequate results for n = 10 and the asymptotic results of Section 3.4 seemed appropriate from n = 15 up. The program documentation which follows in the appendices contains further information and recommendations about this.
Assumption A2 pervades the problem of direct regression with censored data, yet it is rather unlikely to be satisfied

Table 4. Percentages of 200 simulations leading to infinite intervals by N
and percentages giving convergence problems with BJ.

          Infinite
          interval   ------- Nonconvergence -------   -------- No solution --------
n =           7         7       7      15      25        7       7      15      25
Start =                (0)     (1)     (0)     (0)      (0)     (1)     (0)     (0)
Data
T1,C1        7.5      23.0    20.5    30.5    26.0      3.5     4.0     0.5     0.0
T2,C1       13.5      30.5    23.5    30.5    21.5     10.0     5.5     1.5     0.5
T3,C1       10.5      24.0    19.5    24.0    21.0      3.0     1.0     1.0     0.0
T4,C1       11.0      25.5    22.5    28.0    26.5      2.0     2.5     1.0     0.5
T1,C2        9.5      22.0    17.5    25.0    18.5      7.0     3.5     1.0     0.5
T2,C2       12.0      23.5    18.5    31.5    19.5      9.5     1.5     1.5     0.5
T3,C2       14.5      17.0    15.0    19.5    16.0      8.5     2.5     1.5     0.0
T4,C2       13.5      23.5    19.5    35.0    27.0      2.5     3.5     4.5     0.5
T1,C3       14.0      31.0    31.0    35.0    39.5      5.0     4.5     1.0     0.5
T2,C3       21.5      34.5    34.0    43.0    41.5      8.0     5.0     4.5     2.0
T3,C3       11.0      37.0    38.0    35.5    30.0      2.5     2.0     0.0     0.0
T4,C3        3.5      29.5    28.5    23.5    23.5      3.5     2.5     0.5     0.0

1) Comparisons between the starting values $\tilde\beta^{(0)}$ and $\tilde\beta^{(1)}$ (columns headed (0) and (1)) are for exactly the same data.
2) No infinite intervals occurred for n = 15 or n = 25.
Table 10. Percentage coverage of confidence intervals based
on 200 simulations with sample size n = 15.

Data        LS       N       1     BJN     BJT   % Censored
T1,C1      92.0    92.5    89.0    90.5    94.5      44
T2,C1      90.5    91.0    92.5    91.9    94.9      49
T3,C1      93.0    88.5    90.0    84.9    91.4      40
T4,C1      89.5    89.0    89.5    88.9    91.4      44
T1,C2      91.5    93.5    91.0    89.9    95.5      47
T2,C2      88.5    87.5    89.5    85.3    91.4      48
T3,C2      93.5    92.0    91.0    88.3    95.0      47
T4,C2      92.5    91.0    88.5    87.4    94.8      49
T1,C3      92.0    89.0    75.0    84.9    89.9      42
T2,C3      91.0    91.0    70.5    90.1    93.7      49
T3,C3      87.5    90.0    83.0    81.0    84.5      39
T4,C3      90.0    92.0    84.5    81.4    87.4      33

Nominally, intervals were 90%. Standard deviations for the
table entries are approximately 2.1.
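The quoted standard deviation of about 2.1 percentage points is what the binomial formula gives for a 90% coverage rate estimated from 200 simulations. A minimal check, assuming independent simulations and a true coverage equal to the nominal 90% (the function name is hypothetical):

```python
import math


def coverage_sd_percent(p, n_sim):
    """Binomial standard deviation, in percentage points, of an observed
    coverage rate when the true coverage is p and n_sim runs are made."""
    return 100.0 * math.sqrt(p * (1.0 - p) / n_sim)


sd = coverage_sd_percent(0.90, 200)
two_sd_below = 90.0 - 2.0 * sd  # entries under this are "more than 2 SD low"
print(round(sd, 1), round(two_sd_below, 1))
```

Under these assumptions an observed coverage below roughly 85.8% lies more than two standard deviations under the nominal level.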
1.2 Aims and a Conspectus
This dissertation concerns regression analysis where
the response variable is right-censored. Such an analysis
might be useful for example with the Stanford data in
relating survival time to one or more concomitant measures
taken on patients. Waiting time for a donor would be one
such measure. Others available in that data set are age
of patient and a mismatch score which measures the degree
of tissue incompatibility between the donor and recipient
hearts.
In the next chapter we review some of the methods that
have been proposed for the regression problem and in Chapter
Three present a new method. Some simulation results and
results of analyzing the Stanford heart data are presented
in Chapter Four and concluding observations and suggestions
are made in Chapter Five. Fortran coding and documentation
for the new method are included in appendices.
Tables and figures are numbered on separate systems
which run consecutively through the whole text. All other
cross referencing follows the pattern chapter, section and
item within the section in that order, except that the
chapter reference will be dropped when the citation appears
in the same chapter. Thus the fourth numbered item in Section
Two of Chapter Three would be cited as (3.2.4), except in
Chapter Three where it would be simply (2.4). Theorems,
Lemmas, Definitions, Examples, etc. all fall consecutively
into the numbering scheme.
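\[
U_{ij} = \begin{cases}
1 & y_{1i} < y_{2j} \ \text{or} \ y_{1i} < y_{2j}^{+}, \\
0 & y_{1i} = y_{2j} \ \text{or} \ (y_{1i}^{+}, y_{2j}^{+}) \ \text{or} \ y_{1i}^{+} < y_{2j} \ \text{or} \ y_{1i} > y_{2j}^{+}, \\
-1 & y_{1i} > y_{2j} \ \text{or} \ y_{1i}^{+} > y_{2j}.
\end{cases}
\tag{5.1}
\]
Following an inversion procedure (cf. Randles and Wolfe,
1979, Ch. 6) for a distribution free confidence interval,
consider this statistic W computed with sample points:
\[
z_{1i}^{(+)} = y_{1i}^{(+)} - \Delta, \quad i = 1,\ldots,n_1, \qquad
z_{2j}^{(+)} = y_{2j}^{(+)}, \quad j = 1,\ldots,n_2,
\]
i.e.,
\[
W = \sum_{i} \sum_{j} U_{ij}
\]
with
\[
U_{ij} = \begin{cases}
1 & z_{1i} < z_{2j} \ \text{or} \ z_{1i} < z_{2j}^{+}, \\
0 & \text{otherwise}, \\
-1 & z_{1i} > z_{2j} \ \text{or} \ z_{1i}^{+} > z_{2j}.
\end{cases}
\]
Using the notation
\[
\delta_{\ell m} = \begin{cases} 1 & z_{\ell m} \ \text{uncensored}, \\ 0 & z_{\ell m} \ \text{censored}, \end{cases}
\qquad \ell = 1, 2, \quad m = 1, 2, \ldots, n_{\ell},
\]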
M = dl j1 5U*(z2jzli>
(5.2)
CONTINUATION OF LISTING FOR APPENDIX C
C     SUBROUTINE ORDER: PLACES B'S IN SIZE
C     ORDER AND CORRESPONDINGLY RELABELS
C     CHANGE INFORMATION.
C
      SUBROUTINE ORDER(Z,IC,N)
      DIMENSION Z(50),IC(30)
      N1=N-1
      DO 10 I=1,N1
      SM=Z(I)
      K=I
      I1=I+1
      DO 20 J=I1,N
      IF(Z(J) .GE. SM)GO TO 20
      SM=Z(J)
      K=J
   20 CONTINUE
      Z(K)=Z(I)
      Z(I)=SM
      ICK=IC(K)
      IC(K)=IC(I)
      IC(I)=ICK
   10 CONTINUE
      RETURN
      END
C
C     SUBROUTINE COMP: COMPUTES STARTING VALUE
C     FOR SEQUENTIAL COMPUTATION OF S(.)
C
      SUBROUTINE COMP(B,M,IC,Z,IS)
      DIMENSION B(10),IC(10),Z(10)
      IS=0
      DO 230 I=1,M
      DO 240 J=1,M
      IF(B(I) .LE. B(J)) GO TO 240
      IF(IC(J) .EQ. 0)GO TO 200
      IF(IC(I) .EQ. 0)GO TO 160
      IF(Z(I) .GT. Z(J)) IS=IS+1
      IF(Z(I) .LT. Z(J)) IS=IS-1
      GO TO 240
  160 IF(Z(I) .GE. Z(J))IS=IS+1
      GO TO 240
  200 IF(IC(I) .EQ. 0)GO TO 240
      IF(Z(J) .GE. Z(I))IS=IS-1
  240 CONTINUE
  230 CONTINUE
      RETURN
      END
C
C     SUBROUTINE COEF: MAKES PRELIMINARY
C     CALCULATIONS ON X VECTOR FOR INCLUSION
C     IN VARIANCE COMPUTATION IN SUBROUTINE VAR.
C
      SUBROUTINE COEF
      DIMENSION X(100)
      INTEGER SUMSQ,SUMX,SUMA
      SUMSQ=0
      SUMX=0
      DO 40 I=1,N
      SUMA=0
      DO 20 J=1,N
      IF(X
   10 SUMA=SUMA-1
      GO TO 35
   30 SUMA=SUMA+1
   35 SUMSQ=SUMSQ+1
   20 CONTINUE
      SUMX=SUMX+SUMA*SUMA
   40 CONTINUE
      COEF1=FLOAT(SUMX-SUMSQ)/FLOAT*
      COEF2=FLOAT(SUMSQ)/FLOAT(2*N*(N-1))
      RETURN
      END
I certify that I have read this study and that in my
opinion it conforms to acceptable standards of scholarly
presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.
This dissertation was submitted to the Graduate Faculty of
the Department of Statistics of Liberal Arts and Sciences
and to the Graduate School; and was accepted as partial
fulfillment of the requirements for the degree of
Doctor of Philosophy.
August 1983
Dean for Graduate Studies
and Research
These aspects are now discussed in turn.
i) Infinite intervals and convergence. Some features
of the methodologies of the Buckley-James method and the
new method are now considered. Method N will always
produce a point estimate but it is possible to obtain
infinite-length intervals. The infinite-length intervals
usually contain one finite bound so that the interval will
still be useful. The proportions of infinite-length intervals
obtained at different sample sizes in a series of 200
simulations are given in Table 4. Clearly, the infinite-length
interval is a problem associated with very small sample sizes.
Even for n = 7, the proportion of such intervals was less
than 15% in all but one case.
Recall that the BJ method might not give a point
estimate due to nonconvergence and consequently might not
produce a confidence interval. The proportion of simulations
that did not give direct convergence and the proportion
for which "no solution was possible" even after allowing
for oscillation are also included in Table 4 for
sample size n = 7 and starting values $\tilde\beta^{(0)}$ and
$\tilde\beta^{(1)}$ as well as for sample sizes n = 15 and
n = 25 with starting value $\tilde\beta^{(1)}$. The comparison
of starting values at n = 7 indicates that $\tilde\beta^{(1)}$
might be a better choice than $\tilde\beta^{(0)}$. However,
except for the results given in Tables 4 and 5, all our
results were obtained by using $\tilde\beta^{(0)}$ as suggested
by Miller and Halpern (1982). Simulations for n = 7 using each
CHAPTER THREE
A NEW METHOD
3.1 Structure and Notation
We shall adopt the model and notation of Section 2.3.
A method will be described for estimating $\beta$ in model (2.3.1)
with p = 1. The observable pairs $(Y_i, \delta_i)$ described in
Section 2.3 may also be represented in the following way.
Let $Y_i (= T_i)$ denote an observable uncensored random variable
and $Y_i^{+} (= C_i < T_i)$ denote an observable censored random
variable. The general notation, $y_i^{(+)}$, is used to indicate that
censoring might or might not be present. Sample observations
will be denoted $(y_i^{(+)}, x_i)$.
Assumption A2 of Section 2.3 will be imposed in order
for the new method to provide a confidence interval. It
might be argued that A2 imposes a severe restriction on the
practical utility of the proposed method, but a Monte Carlo
study reported in Chapter Four indicates that the method is
fairly robust to departures from A2. Recall that assumption
A2 is also required for the use of the Miller and
Buckley-James methods. This is discussed further in
Chapter Five.
CHAPTER TWO
REGRESSION METHODS
2.1 Regression Models
The four main nonparametric regression techniques
currently available are due to Cox (1972), Miller (1976),
Buckley and James (1979) and Koul, Susarla and Van Ryzin
(1981). These methods will be described later in this
chapter, but first some preliminary discussion about hazard
rates and accelerated time models is in order.
For a random variable $T \ge 0$ with density f(t) and
right-continuous distribution function F(t) the survival
function is
\[ P(t) = 1 - F(t) = \Pr(T > t). \tag{1.1} \]
The hazard rate (hazard function or force of mortality),
$\lambda(t)$, is defined
\[ \lambda(t) = \frac{f(t)}{P(t)} \tag{1.2} \]
with the interpretation that $\lambda(t)\,dt$ is the probability of
"death" in the interval (t, t+dt) given survival beyond t.
Integration of $\lambda(t)$ shows that
\[ P(t) = \exp\left( -\int_0^t \lambda(u)\,du \right). \tag{1.3} \]
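As a quick check of how the survival function, density and hazard rate fit together, consider the simplest case of a constant hazard $\lambda(t) = \lambda_0 > 0$. This choice is an assumption made only for illustration and is not a model used in the text.

```latex
\begin{align*}
P(t) &= \exp\left( -\int_0^t \lambda_0 \, du \right) = e^{-\lambda_0 t}, \\
f(t) &= -\frac{d}{dt} P(t) = \lambda_0 e^{-\lambda_0 t}, \\
\frac{f(t)}{P(t)} &= \frac{\lambda_0 e^{-\lambda_0 t}}{e^{-\lambda_0 t}} = \lambda_0 .
\end{align*}
```

The exponential distribution is thus recovered, and dividing its density by its survival function returns the constant hazard that was assumed at the start.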
\[
\Pr\{\text{death of } (i) \text{ at time } y_{(i)} \mid \text{one death in } R_{(i)} \text{ at time } y_{(i)}\}
= \frac{\exp(\beta' x_{(i)})}{\sum_{j \in R_{(i)}} \exp(\beta' x_j)}. \tag{2.4}
\]
The product of the conditional probabilities at (2.4) is
called a conditional likelihood:
\[
L(\beta) = \prod_{i} \frac{\exp(\beta' x_{(i)})}{\sum_{j \in R_{(i)}} \exp(\beta' x_j)}. \tag{2.5}
\]
Note that (2.5) is not a true conditional likelihood.
Standard likelihood analysis is applied to (2.5) and
iterative methods are usually required (cf., for example,
Kendall and Stuart, 1973). Through standard maximum
likelihood arguments, Cox (1975) suggests that $\hat\beta$ from
maximizing (2.5) is asymptotically normal with mean $\beta$ and
covariance matrix the inverse of the information matrix:
\[
I(\beta) = -\frac{\partial^2}{\partial \beta^2} \log L(\beta). \tag{2.6}
\]
A formal proof of asymptotic normality is given by Tsiatis
(1981).
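The logarithm of a likelihood of the form (2.5) is straightforward to evaluate directly. The sketch below assumes a single covariable, no tied death times, and a risk set consisting of every subject whose observed time is at least the death time in question; the function name and the toy data are hypothetical, and no maximization is attempted.

```python
import math


def log_partial_likelihood(beta, times, events, x):
    """Log of the product over observed deaths of
    exp(beta * x_i) / sum over the risk set of exp(beta * x_j).

    events[i] = 1 marks an observed death, 0 a censored time.
    """
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if e_i == 1:
            # Risk set: subjects still under observation at time t_i.
            denom = sum(math.exp(beta * x[j])
                        for j, t_j in enumerate(times) if t_j >= t_i)
            total += beta * x[i] - math.log(denom)
    return total


# Hypothetical toy data: deaths at times 1 and 3, a censored time at 2.
times = [1.0, 2.0, 3.0]
events = [1, 0, 1]
x = [0.0, 1.0, 2.0]

print(log_partial_likelihood(0.0, times, events, x))
```

At beta = 0 every subject in a risk set carries equal weight, so the value reduces to minus the sum of the logarithms of the risk set sizes, which gives a convenient hand check.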
A formal justification for the use of (2.5) in the
case of no ties is given by Kalbfleisch and Prentice (1973).
They show the equivalence of (2.5) and a marginal likelihood
The implication from the results for all sample sizes
is that method 1 may be acceptable in terms of coverage for
C1 and possibly C2 censoring but would not guarantee
coverage with C3 type censoring. This finding of course is in
keeping with the observations about bias.
Method N shows coverage quite close to or greater than
nominal over all models and sample sizes. The asymptotic
method of section 3.4 with n = 15 and n = 25 and the
approximate permutation distribution for n = 10 give coverage
generally close to 90%. Use of the exact permutation
distribution with n = 7 gives very conservative 90% intervals
even though the setting was 88%.
Method BJN does not come close to attaining coverage
for n = 7 and the coverage is fairly consistently low over
all models for the larger sample sizes. For n = 7, all the
observed coverages are more than 2 standard deviations below
90%. There are some similar significant departures for the
larger sample sizes, particularly with C3.
Method BJT with n = 7 generally gave conservative
intervals with coverages comparable to those of Method N.
Method BJT gave some low coverages at the larger sample sizes,
particularly with C3. This loss of coverage for the larger
n is a result of the improvement, mentioned earlier, of the
estimate of the variance of the point estimate. For n = 7
the inflated variance estimate secures coverage for the
interval.
46
zero in the computation of ^. Contribution to sk ^ of
(rl,(rl+l)+) for corresponding permutations of R^i l ^s
either +1 or 1. Thus if a permutation of elements of R/+!
K /1
gives a value of Sk < sk ^ then the same permutation of
+ i ]_ will give a value of Sk_^
Therefore,
Number of permutations'
of elements of R,^+i
k,l
giving < sk x
rNumber of permutations'
< of elements of r}+\ ,
I k1,1
^giving Sj.i < skl,l
Consideration of situation (ii) gives similarly
Number of permutations
( + )
of elements of _R
giving Sk < sk^2
k ,2
)
f Number of permutations'
of elements of Rk*^ 2
J ^giving Sk_1
These together imply in terms of pvalues for s, < 0
pk < pkl '
s, < 0.
k
Similar argument gives
/'Number of permutations'
< of elements of r/+!
\ kl,m
^ giving Sk_l>skl,m
Number of permutations
of elements of R,+1
k ,m
.giving skjIn
}
m = 1,2 and these imply
pki < pk'
s, > 0.
k
BIBLIOGRAPHY
Breslow, N. E. (1974). Covariance analysis of censored
survival data. Biometrics, 30, 89-99.
Brown, B. W., M. Hollander, and R. M. Korwar. (1974).
Nonparametric tests of independence for censored data
with applications to heart transplant studies.
Reliability and Biometry: Statistical Analysis of
Life Length (F. Proschan and R. J. Serfling, Eds.).
SIAM, Philadelphia.
Buckley, J., and I. James. (1979). Linear regression with
censored data. Biometrika, 66, 429-436.
Cox, D. R. (1972). Regression models and life-tables.
J. R. Stat. Soc. B, 34, 187-202.
Cox, D. R. (1975). Partial likelihood. Biometrika, 62,
269-276.
Daniels, H. E. (1944). The relation between measures of
correlation in the universe of sample permutations.
Biometrika, 33, 129-135.
Efron, B. (1977). Efficiency of Cox's likelihood function
for censored data. J. Am. Stat. Assoc., 72, 557-565.
Elandt-Johnson, R. C., and N. L. Johnson. (1980). Survival
Models and Data Analysis. Wiley, New York.
Epstein, B., and M. Sobel. (1953). Life testing. J. Am.
Stat. Assoc., 48, 486-502.
Gehan, E. A. (1965). A generalized Wilcoxon test for
comparing arbitrarily singly-censored samples.
Biometrika, 52, 203-223.
Hodges, J. L., and E. L. Lehmann. (1963). Estimates of
location based on rank tests. Ann. Math. Stat., 34,
598-611.
Kalbfleisch, J. D., and R. L. Prentice. (1973). Marginal
likelihoods based on Cox's regression and life model.
Biometrika, 60, 267-278.
starting value gave no indication that the overall
conclusions of the following sections would be altered by the
choice of starting value.
Looking at results for the individual samples generated
showed that nonconvergence is more common for highly
censored samples but that it does occur at all levels of
censoring. In a search over some of the results for the case
n = 7 infinite intervals were not found to occur when there
were more than three uncensored values. No intervals of
$(-\infty, \infty)$ were encountered. It was not particularly
common for nonconvergence and infinite intervals to occur for
the same data set.
ii) Bias of estimator. Arithmetic means of the point
estimates obtained by each method over 2000 simulations
for n = 7 are given in Table 5 and means over 200
simulations for n = 25 are given in Table 6. The estimated
standard errors of these means are also given.
Because of some nonconvergence for the BJ method the
number of estimates averaged might be less than the number
of simulations. The actual number used for the BJ method is
given in the final column. Means which are more than two
standard deviations from the nominal $\beta = 0.2$ are indicated.
44
can be no interchange of ranks associated with a pair of
tied x's (cf. 2.9). Let
R (+) (b) = (r{+) (b),R2(+) (b) ... ,R^+) (b) ) '
denote the ranks of {z+^ (b) : i=l,...,n}. As before, the
superscript (+) indicates the incorporation of censoring
information, R*(b) indicating that (b) is censored and
the value of z^(b) has rank R^ (b) and R^(b) indicating that
z^(b) is not censored and has rank R^(b). Clearly there
may be tied ranks, but for (i,j) such that b^^ is a crit
ical value we have R^ (b) =R^(b) only at b = b^^ If there
are no ties R eP For notation purposes take b, =b. .
n k ij
and let R^*^ denote R^ (b' ) and R^"1^ denote R^ (b") for
b' b" as in Lemma 3.8. A second subscript on R^4^ will
identify a particular vector. We now give as a theorem an
important result concerning condition (3.11).
Theorem 3.12. Suppose x. < x_ < ...< x and b.. = b, for
1 2 n i] k
just one pair of subscripts (i,j). Then ck = l implies
condition (3.11):
pk pkl
Pkl15 pk
if sk < 0 ,
if s^ > 0.
Proof: As noted earlier, when b passes through b^,, the
only change in the rank vector R^ + \b) is the interchange
J_ T_
and j elements. The elements interchanged
of its i
Exact permutation tests are used for small samples but
asymptotic results are available for larger samples.
The new method of estimation, which is particularly
recommended as a small sample method, appears to have
several desirable properties. Simulation studies indicate
that the method performs well over all sample sizes and is
robust against a variety of censoring patterns.
CONTINUATION OF LISTING FOR APPENDIX C
C     COMPUTE POINT EST.
      IF
      ISUB1=1SUB 2KAV
      BPNT=
      GO TO 70
   65 ISUB1=ISUB
      ISUB2=ISUB
      BPNT=B
   70 J=ISUB2
C     CONFIDENCE INTERVAL
   71 ISCUT=10
   72 ICH=0
   74 IF(B(J) .NE. B(J+1))GO TO 76
      IF(C
      J=J+1
      GO TO 74
   76 IF(ISCUT .EQ. 10)GO TO 80
      IF((C(J) .EQ. 1) .OR. (ICH .EQ. 1))GO TO 80
   78 IF
      IF(J .NE. L+1)GO TO 79
      BUP=9999.0
      GO TO 151
   79 J=J+1
      GO TO 72
   80 BB=
      DO 90 K=1,N
   90 Z(K)=Y(K)-BB*X(K)
      CALL VAR(Z,N,IC,COEF1,COEF2,ZNORM,ISCUT,VARS)
      GO TO 78
  150 BUP=B
  151 J=ISUB1
      ZNORM=-1.*ZNORM
      ISCUT=10
  140 ICH=0
  142 IF(B(J) .NE. B(J-1))GO TO 146
      IF(C(J) .EQ. 1)ICH=1
      J=J-1
      GO TO 142
  146 IF(ISCUT .EQ. 10)GO TO 155
      IF((C(J) .EQ. 1) .OR.
  148 IF(S
      IF(J .NE. 2)GO TO 143
      BLOW=-9999.0
      GO TO 221
  143 J=J-1
      GO TO 140
  155 BB=
      DO 160 K=1,N
  160 Z(K)=Y(K)-BB*X(K)
      CALL VAR(Z,N,IC,COEF1,COEF2,ZNORM,ISCUT,VARS)
      GO TO 148
  220 BLOW=B(J)
  221 CONTINUE
      WRITE(6,232)
      WRITE(6,230)BPNT
      WRITE(6,232)
      WRITE(6,231)BLOW,BUP
  232 FORMAT(' ')
  230 FORMAT(' POINT ESTIMATE ',F10.2)
  231 FORMAT(' CONFIDENCE INTERVAL ',F6.2,
     1F6.2)
      STOP
      END
TABLE OF CONTENTS
                                                          Page
ACKNOWLEDGMENTS                                            iii
LIST OF TABLES                                              vi
ABSTRACT                                                  viii
CHAPTER
ONE    INTRODUCTION                                          1
       1.1 The Concept of Censoring                          1
       1.2 Aims and a Conspectus                             5
TWO    REGRESSION METHODS                                    6
       2.1 Regression Models                                 6
       2.2 The Cox Method                                    9
       2.3 The Linear Model                                 14
       2.4 The Miller Method                                16
       2.5 The Buckley-James Method                         20
       2.6 The Koul, Susarla and Van Ryzin Method           22
       2.7 Some Comparative Studies                         24
THREE  A NEW METHOD                                         26
       3.1 Structure and Notation                           26
       3.2 Point Estimation of β                            27
       3.3 Interval Estimation of β: Exact Method           37
       3.4 Interval Estimation of β: Asymptotic Method      48
       3.5 The Two-Sample Location Problem                  49
       3.6 Computational Aspects                            53
FOUR   COMPARATIVE STUDIES                                  55
       4.1 Simulations Using a Simple Linear Model          55
       4.2 Analysis of Heart Transplant Data                83
       4.3 Summary of Results                               86
FIVE   CONCLUSION                                           88
>
then n..
iD
i
and if
= n
P P
ID
p < p. then n .
i D ID
P P
/
where = 1, 0, 1 from (2.3).
Thus in an arbitrary permutation of the elements $z_i^{(+)}(b)$,
i = 1,...,n, all pairs contribute a change of $\pm 2$ or 0 to the
total change of $S(b) = \sum_{(i,j):\, x_i > x_j} \eta_{ij}$. Therefore, the
values of S(b) for two arbitrary permutations of $z^{(+)}$ values
must differ by zero or a nonzero multiple of 2.
The following Corollary shows that the null distribution
of S(b) does indeed depend on b.
Corollary 3.7. A change in b can change the permutation
null distribution of S(b).
Proof: In view of Lemma 3.6 this will be proved if we show
that a change in b can change S(b) by some odd number.
From (2.11) the jump in $S(\cdot)$ occurring as b changes through
$b_k$ is
\[ c_k = -\sum_{(i,j) \in B_k} (\delta_i + \delta_j). \]
Clearly this change can be an odd number.
Recall that S(b) is a function of the ranks of
$\{z_i(b): i = 1,\ldots,n\}$ and that these ranks can change only
at the critical values $b_k$, $k = 1,\ldots,\ell$ (see 2.12). Thus
there can be no difference in the permutation distributions
a more even censoring pattern. Separate sets of weights
are computed for each interval and the sum of squares to
be minimized is then a weighted average of the separate
weighted sums of squares. This correction does of course
demand a fairly large number of observations and there are
practical difficulties in applying it to the multiple
regression situation.
2.5 The Buckley-James Method
The method due to Miller applied estimated weights
to the residual sum of squares and then used a least squares
solution. For the same model (3.1) the Buckley-James method
(BJ from here on) modifies instead the solution to the usual
normal equations by using estimates of $T_i$ when these
are not observable. Let the pseudo random variable $Y_i^{*}$ be
defined by
\[ Y_i^{*} = Y_i \delta_i + E[T_i \mid T_i > Y_i]\,(1 - \delta_i). \tag{5.1} \]
Since $E(Y_i^{*}) = \alpha + \beta x_i$ (Miller, 1981, p. 151) we could
estimate $\beta$ using ordinary least squares if all the $Y_i^{*}$ were
observable. The BJ approach is to estimate the observed
value $y_i^{*}$ whenever $\delta_i = 0$ by using the following estimate
in (5.1):
\[
\hat{E}[T_i \mid T_i > y_i] = \beta x_i + \frac{\sum_{k:\, z_k > z_i} w_k(\beta)\, z_k}{1 - \hat{F}(z_i)}. \tag{5.2}
\]
y's in the case of ties. Like the PL estimator, the
estimator at (6.2) can be unstable for large t, and KSV propose
a truncation of very large observations and use (6.2) only
for $y_i \le M_n$. The selection of $M_n$, however, is not clear-cut.
No iteration is required and the parameter estimation
is computationally straightforward for any number of
covariables. Koul et al. (1981) give conditions for consistency and
asymptotic normality of the solutions and discuss their
implications in practice. Integral expressions are given
for the asymptotic variances and covariances of the
estimators, but computational forms are available only for the
one covariable case.
2.7 Some Comparative Studies
The Cox method has been given much attention in the
literature and is used extensively, yet there is much appeal
to the direct easy interpretation of the linear model. It
would seem that the choice between the Cox method and the
other three methods just outlined must be based on the
appropriateness of the proportional hazards model or the linear
model. However, a distinction may be hard to make in
practice. The choice of method should also be influenced by the
nature of the censoring, but again this is not likely to be
clear-cut. The methods are asymptotic and in most cases
involve heuristic arguments or hard to verify conditions.
Clearly the statistic S(b) depends on the ranks of the
Z's, the ranks of the x's and the censoring pattern. The
x-values and the censoring pattern are fixed for a particular
data set but numerical realizations of Z change with
b (cf., 2.1). As noted in the following Lemma, for certain
"critical" b-values two or more Z's will change rank. Some
further properties of S(b) as a function of b are given in
Lemma 2.8.
Lemma 2.8.
i) A change in the value of $\eta_{ij}(b)$ can occur only as
b changes through
\[ b_{ij} = \frac{Y_i - Y_j}{x_i - x_j}, \qquad x_i > x_j. \tag{2.9} \]
ii) For (i,j): $x_i > x_j$ the change in $\eta_{ij}(b)$ as
b increases through $b_{ij}$ is
\[ c_{ij} = -(\delta_i + \delta_j). \tag{2.10} \]
iii) Suppose there are $\ell$ distinct values of $b_{ij}$ given
by (2.9). Denote these values
\[ b_1 < b_2 < \cdots < b_\ell, \]
and let
\[ B_k = \{ (i,j) : b_{ij} = b_k,\ x_i > x_j \}, \qquad k = 1,\ldots,\ell. \]
Then S(b) is a step function in b with steps of magnitude
$c_k$ at $b_k$ where
\[ c_k = \sum_{(i,j) \in B_k} c_{ij}. \tag{2.11} \]
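The step-function behaviour of S(b) can be checked numerically on a small data set. The sketch below is an illustration under stated assumptions rather than the dissertation's program: the pair score `eta` follows the comparison rules used in subroutine COMP of the appendix listings (indicator 1 = uncensored, 0 = right-censored), the function names and the four data points are hypothetical, and exact fractions are used so that critical slopes compare exactly.

```python
from fractions import Fraction


def eta(zi, di, zj, dj):
    """+1 if z_i is definitely larger than z_j, -1 if definitely smaller,
    0 if the censoring leaves the order undetermined."""
    if di == 1 and dj == 1:
        return (zi > zj) - (zi < zj)
    if di == 0 and dj == 1:          # z_i is only a lower bound
        return 1 if zi >= zj else 0
    if di == 1 and dj == 0:          # z_j is only a lower bound
        return -1 if zj >= zi else 0
    return 0                         # both censored


def S(b, y, x, d):
    """Sum of pair scores of residuals z = y - b*x over pairs with x_i > x_j."""
    z = [yi - b * xi for yi, xi in zip(y, x)]
    n = len(y)
    return sum(eta(z[i], d[i], z[j], d[j])
               for i in range(n) for j in range(n) if x[i] > x[j])


def critical_jumps(y, x, d):
    """Sorted list of (critical slope, total jump at that slope)."""
    jumps = {}
    n = len(y)
    for i in range(n):
        for j in range(n):
            if x[i] > x[j]:
                b = Fraction(y[i] - y[j], x[i] - x[j])
                jumps[b] = jumps.get(b, 0) - (d[i] + d[j])
    return sorted(jumps.items())


# Hypothetical data; the third response is right-censored.
x = [1, 2, 3, 4]
y = [2, 1, 4, 3]
d = [1, 1, 0, 1]

jumps = critical_jumps(y, x, d)
print(jumps)
print([S(b, y, x, d) for b in (-2, 0, Fraction(1, 2), 2, 4)])
```

For these data the critical slopes are -1, 1/3, 1 and 3 with jumps -3, -2, -3 and -1, and the values of S evaluated between them (5, 2, 0, -3, -4) fall by exactly those amounts.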
The results in Tables 5 and 6 indicate that method 1 can
give severely biased results if used in cases where
assumption A2 is not satisfied. Departures from A2 do not seem so
serious for methods N and BJ, especially at the larger sample
size. There is some bias indicated for methods N and BJ at
sample size 7 with censoring type C3.
iii) BJ variance estimation. An investigation was made
of how well the expression (2.5.5) approximates the variance
of the BJ estimator. For each type of data 2000 simulations
for n = 7 and 200 simulations each for n = 15 and n = 25 were
run, computing where possible the BJ estimate and an estimate
of its variance using (2.5.5). The sampling variance of the
estimates and the mean of the variance estimates were computed.
The results for n = 7 are displayed in Table 7.
From the results shown in Table 7 it seems clear that
the variance expression (2.5.5) overestimates the variance
of the BJ estimator quite markedly for sample size n = 7.
A similar but less marked overestimation was found for n = 15
while for n = 25 the estimation was much better.
CONTINUATION OF LISTING FOR APPENDIX A
C     SUBROUTINE XCALC: FORMS MATRIX OF VALUES
C     USING THE X PERMUTATIONS. THESE VALUES ARE
C     USED BY REPEATED USE OF SUBROUTINE DIST.
C
      SUBROUTINE XCALC(XP,M,III,X)
      COMMON ISUMX(11000,15)
      DIMENSION X(10)
      INTEGER XP(10)
      M1=M-1
      DO 40 J=1,M1
      IT=0
      J1=J+1
      JJ=XP(J)
      DO 20 I=J1,M
      II=XP(I)
      IF(X(II)-X(JJ))10,20,30
   30 IT=IT+1
      GO TO 20
   10 IT=IT-1
   20 CONTINUE
      ISUMX(III,J)=IT
   40 CONTINUE
      RETURN
      END
C
C     SUBROUTINE ORDER: PLACES B'S IN SIZE ORDER
C     AND CORRESPONDINGLY RELABELS CHANGE
C     INFORMATION
C
      SUBROUTINE ORDER(Z,IC,N)
      DIMENSION Z(50),IC(50)
      N1=N-1
      DO 10 I=1,N1
      SM=Z(I)
      K=I
      I1=I+1
      DO 20 J=I1,N
      IF(Z(J) .GE. SM)GO TO 20
      SM=Z(J)
      K=J
   20 CONTINUE
      Z(K)=Z(I)
      Z(I)=SM
      ICK=IC(K)
      IC(K)=IC(I)
      IC(I)=ICK
   10 CONTINUE
      RETURN
      END
C
C     SUBROUTINE COMP: COMPUTES STARTING VALUE
C     FOR SEQUENTIAL COMPUTATION OF S(.)
C
      SUBROUTINE COMP(B,M,IC,Z,IS)
      DIMENSION B(10),IC(10),Z(10)
      IS=0
      DO 250 I=1,M
      DO 240 J=1,M
      IF(B(I) .LE. B(J)) GO TO 240
      IF(IC(J) .EQ. 0)GO TO 200
      IF(IC(I) .EQ. 0)GO TO 160
      IF(Z(I) .GT. Z(J)) IS=IS+1
      IF(Z(I) .LT. Z(J)) IS=IS-1
      GO TO 240
  160 IF(Z(I) .GE. Z(J))IS=IS+1
      GO TO 240
  200 IF(IC(I) .EQ. 0)GO TO 240
      IF(Z(J) .GE. Z(I))IS=IS-1
  240 CONTINUE
  250 CONTINUE
      RETURN
      END
We now make some observations relevant to the computation
of $S$ and $\hat\beta$.
(a) If for a data set there are no tied x's and no coincident
$b_{ij}$ values, then in Lemma 2.8 (iii), $\ell = \binom{n}{2}$.
(b) Since $S(b)$ is nonincreasing in $b$ (Lemma 2.6), the
greatest possible value for $S(b)$ is $S(b_0)$ where
$b_0 < b_1 < b_2 < \cdots < b_\ell$.
(c) Any change in $S(b)$ which may occur at $b_1, b_2, \ldots, b_\ell$
depends (via 2.10, 2.11) on the censorship pattern of the
$Z(b)$ which change rank at the critical $b$.
(d) Each critical b value $b_k$ has a change value
$c_k$, $k = 1,2,\ldots,\ell$, associated with it (Lemma 2.8 (iii)),
and easy sequential computation of $S$ is facilitated in the
following way. Let $s_k$ denote the value of $S(b)$ for
$b_k < b < b_{k+1}$, with $b_{\ell+1} = \infty$; then
$s_1 = s_0 + c_1$,
$s_2 = s_1 + c_2$,
and in general
$s_k = s_{k-1} + c_k$, $k = 1,\ldots,\ell$. (2.15)
(e) The algorithm given in (d) does not give an evaluation
of $S$ at the critical b values, but this is not needed, and the
enumeration given makes it easy to locate the point estimate
proposed in Definition 2.7.
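The enumeration in (d) can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `s_stat` and `sequential_s` are hypothetical, pairs are scored as in subroutine COMP of Appendix A, and each pair's change value is taken to be $-(\delta_i + \delta_j)$, in line with Lemma 2.8.

```python
def s_stat(b, x, y, delta):
    """Direct evaluation of S(b); pairs are scored as in subroutine COMP."""
    z = [yi - b * xi for xi, yi in zip(x, y)]
    n = len(x)
    total = 0
    for i in range(n):
        for j in range(n):
            if x[i] <= x[j]:
                continue  # only pairs with x_i > x_j are scored
            if delta[i] == 1 and delta[j] == 1:
                total += (z[i] > z[j]) - (z[i] < z[j])
            elif delta[i] == 0 and delta[j] == 1:
                total += int(z[i] >= z[j])
            elif delta[i] == 1 and delta[j] == 0:
                total -= int(z[j] >= z[i])
    return total


def sequential_s(x, y, delta):
    """Return the sorted critical slopes and the values s_0, s_1, ..., s_l."""
    change = {}
    n = len(x)
    for i in range(n):
        for j in range(n):
            if x[i] > x[j]:
                b_ij = (y[i] - y[j]) / (x[i] - x[j])
                # assumed change value for the pair: -(delta_i + delta_j);
                # coincident slopes accumulate into one c_k
                change[b_ij] = change.get(b_ij, 0) - (delta[i] + delta[j])
    crit = sorted(change)
    s = [s_stat(crit[0] - 1.0, x, y, delta)]  # s_0: any b below b_1
    for b_k in crit:
        s.append(s[-1] + change[b_k])  # recursion (2.15)
    return crit, s
```

Only the starting value $s_0$ needs a full pass over the pairs; every later value follows by one addition, which is the saving that observation (d) points to.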
$Y_i = \min(T_i, C_i)$
and
$\delta_i = \begin{cases} 0, & T_i > C_i \\ 1, & T_i \le C_i \text{ (i.e., } T_i \text{ is not censored).} \end{cases}$
This is a random censorship model. It is common to
assume that $T_i$ and $C_i$ are independent. Since the observable
Y-values are lower bounds for the "true" values, they
are described as right-censored values. In random left
censoring we would observe $T_i$ only if $T_i \ge C_i$; that is,
we observe
$Y_i = \max(T_i, C_i)$
and
$\delta_i = \begin{cases} 1, & T_i \ge C_i \\ 0, & T_i < C_i. \end{cases}$
Both types of censoring might occur in the same data set
and both are special cases of interval censoring in which
we observe only that the random variable of interest falls
in an interval. For example, with left-censoring we observe
only that $T_i$ falls in the interval $(-\infty, C_i]$.
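As a small illustration of the two observation schemes, the following Python sketch builds the observed pairs $(Y_i, \delta_i)$ from lifetimes and censoring values. The function names are hypothetical, and in practice the $T_i$ are of course not all available; the sketch only mirrors how simulated data would be censored.

```python
def right_censor(t, c):
    """Y = min(T, C); delta = 1 when T <= C (the lifetime is observed)."""
    y = [min(ti, ci) for ti, ci in zip(t, c)]
    delta = [1 if ti <= ci else 0 for ti, ci in zip(t, c)]
    return y, delta


def left_censor(t, c):
    """Y = max(T, C); delta = 1 when T >= C (the lifetime is observed)."""
    y = [max(ti, ci) for ti, ci in zip(t, c)]
    delta = [1 if ti >= ci else 0 for ti, ci in zip(t, c)]
    return y, delta
```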
The occurrence of censoring will now be discussed with
reference to a particular medical study, a study to which
we shall refer throughout. The Stanford Heart Transplant
Program is described by Brown, Hollander, and Korwar (1974)
for sample sizes n = 7, 10, 15 and 25. For samples of size
n = 7 the permutation distribution required for the method
was based on all 7! = 5040 permutations generated as explained
in Section 3.3. For n = 10, the permutation distribution
was estimated using a random sample of 10,000 permutations,
selected with replacement from the possible 10!
permutations. A separate small study which compared the 1%, 2%,
5% and 10% points of the estimated permutation distribution
with the corresponding percentage points of the true permutation
distribution showed good agreement. For n = 15 and
n = 25 the asymptotic approach of Section 3.4 was used.
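The two ways of obtaining a permutation distribution mentioned above (full enumeration and a random sample drawn with replacement) might be sketched as follows. The helper names, the seed, and the simple order-statistic definition of a percentage point are assumptions made for illustration; `stat` stands for any statistic computed from permuted $(z, \delta)$ pairs and the fixed x's.

```python
import random
from itertools import permutations


def exact_distribution(stat, pairs, x):
    """Sorted values of stat over all n! assignments of pairs to the fixed x's."""
    n = len(x)
    return sorted(stat([pairs[k] for k in p], x) for p in permutations(range(n)))


def sampled_distribution(stat, pairs, x, n_perm=10000, seed=1983):
    """Sorted values of stat over n_perm permutations drawn with replacement."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    values = []
    for _ in range(n_perm):
        rng.shuffle(idx)  # each shuffle is a fresh uniform random permutation
        values.append(stat([pairs[k] for k in idx], x))
    return sorted(values)


def percentage_point(sorted_values, pct):
    """Order statistic sitting pct percent of the way up the sorted values."""
    k = min(len(sorted_values) - 1, int(len(sorted_values) * pct / 100))
    return sorted_values[k]
```

Comparing `percentage_point` of the sampled and exact distributions at 1%, 2%, 5% and 10% for a small n is one way to repeat the kind of check described above.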
Randomized tests were not incorporated in the simulations
and so the intervals obtained would be expected to be
conservative in cases where assumption A2 is satisfied.
In the small sample simulations (n = 7, 10) the test critical
values were adjusted accordingly to give less conservative
intervals. Details of this are given with the results.
By its nature this method can give infinite length
intervals and some results on the likelihood of this for
different size samples are given in Table 4. In contrast
to the other methods the confidence intervals by this method
are not necessarily symmetric about the point estimate and
some implications of this are presented in Table 15.
c) Least squares on a reduced sample (1). Standard
least squares analysis was applied just to the known
(uncensored) survival times. This was included to investigate
4.2 Analysis of Heart Transplant Data
The Stanford Heart Transplant Program was mentioned
in Chapter One. Data from that program have been analyzed
by various authors at various times since the start of data
collection in 1967. Survival times, ages and mismatch
scores available as of February, 1980, are tabulated in
Miller and Halpern (1982). Those authors compare the
results from analyses using the methods of Cox, Buckley and
James, Miller, and Koul, Susarla and Van Ryzin that were
described in Chapter Two.
Data are available for 184 patients and 55 of the
survival times are censored values. Ages are available for
all 184, but mismatch scores are available for only 157
patients. Miller and Halpern compare the four methods of
analysis using the 157 complete records in a multiple
regression of the base ten logarithm of survival time
against age and mismatch score. They declare the mismatch
score insignificant and say that the results they quote for
age from the multiple regression are practically identical
to those using age alone for the various methods. Thus the
results on age from their multiple regression are included
in Table 16. Also included in Table 16 are new computations
for the Buckley and James method for a single regression
with age alone using all 184 data values and the results of
using the new method, first with the 157 complete records and
then with the whole set of age data.
measure, and "survival analysis" is a general term used for
methods of analyzing censored data. However, censored
responses need not concern survival or be a time. Miller
(1981) quotes an example from Leavitt and Olshen (1974)
where the response is the amount paid on an insurance claim;
in some cases the patient's illness is over and the total
claim is known, in others the patient is still sick and only
the amount paid to date is known.
Much of the early work in life-testing, following a
pioneering paper by Epstein and Sobel (1953), was done in
a parametric framework. That work was largely applied to
electrical engineering problems where certain distributional
assumptions seem to be appropriate (cf. Mann, Schafer and
Singpurwalla, 1974; Nelson, 1981). Mann et al. also describe
Type I and Type II censoring mechanisms which are particularly
appropriate for industrial experimentation. Our
focus is on situations, particularly medical, where we would
like to use a distribution-free technique and where the
censorship is random in the sense that we now describe.
Let $T_i$, $i = 1,\ldots,n$, be independent random variables,
which we may think of as representing true, possibly unobservable,
lifetimes, and let independent variables $C_i$,
$i = 1,\ldots,n$, be censoring variables in the sense that $T_i$ is
observable only if $T_i \le C_i$. We can observe variables
$(Y_i, \delta_i)$, $i = 1,\ldots,n$, where
3.3 Interval Estimation of $\beta$; Exact Method
A $100(1-\alpha)$% confidence set for $\beta$ can be obtained by
inverting an $\alpha$-level test of $H_0: \beta = b$ (Lehmann, 1959,
p. 173). Our objective in this section is to show that a
distribution-free confidence interval for $\beta$ can be defined
in terms of a permutation test of $H_0: \beta = b$ using the
statistic $S(b)$ defined at (2.3).
Consideration is now given to using permutation distributions
for $S(b)$ that are generated for a given data set
under assumption A2 of Section 2.3. A direct consequence
of assumption A2 is that the probability of censoring or
noncensoring of a y-value is independent of the associated
value of x, i.e.,
Pr(T.
i i1 i i i
Thus the probability distribution of $\delta_i$ in a pair $(Z_i, \delta_i)$
is independent of $x_i$ for all i. Let
$w(b) = ((z_1(b), \delta_1, x_1), \ldots, (z_n(b), \delta_n, x_n))'$
be an observed vector. We now reckon on all permutations
of $\{(z_1, \delta_1), (z_2, \delta_2), \ldots, (z_n, \delta_n)\}$ with the x's held
fixed; that is, we consider the n! transformations of $w(b)$ given by
$g_p(w(b)) = ((z_{p_1}, \delta_{p_1}, x_1), \ldots, (z_{p_n}, \delta_{p_n}, x_n))$ (3.2)
where
$p \in P_n = \{p : p \text{ is a permutation of } (1,\ldots,n)\}$.
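The n! transformations at (3.2) can be listed directly for a small sample. The sketch below is illustrative only; the function name `transformations` and the representation of $w(b)$ as a list of `(z, delta, x)` tuples are assumptions.

```python
from itertools import permutations


def transformations(w):
    """All n! vectors g_p(w): the (z, delta) pairs move, the x's stay in place."""
    pairs = [(z, d) for z, d, _ in w]
    xs = [x for _, _, x in w]
    result = []
    for p in permutations(range(len(w))):
        # position pos keeps its own x but receives the pair indexed by p[pos]
        result.append([(pairs[k][0], pairs[k][1], xs[pos]) for pos, k in enumerate(p)])
    return result
```

Evaluating $S$ on each transformed vector gives the permutation distribution on which the test of $H_0: \beta = b$ is based.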
$\alpha = 0$ in (4.1) for convenience and also note that because
the weights are renormalized over the uncensored observations
the estimation of $\beta$ does not require any adjustment
to $\delta_{(n)} = 0$. As a starting value $\hat\beta_0$, Miller suggests using
the unweighted least squares estimate of slope using only
the uncensored observations. The final estimate of $\beta$ is
used in (4.6) for calculating $\hat\alpha$ and in this computation we
do set $\delta_{(n)} = 1$ regardless of the actual censorship.
This iterative method might not attain convergence.
If estimates become trapped in a loop, Miller advocates
averaging the values in the loop. Assumption A2 is a sufficient
condition for the estimates $\hat\alpha$ and $\hat\beta$ to be consistent
and for $\hat\beta$ to be asymptotically normal. Under an additional
assumption that the variability due to weights
$w_i(\hat\beta)$ is negligible, Miller obtains an asymptotic estimate
for the variance of $\hat\beta$ given by
Â£ wi (3)(yict3xi)
Var (3) = ^7 Z* 2~ (49>
Â£ w (3) (x. xu)
u
Clearly if A2 is not satisfied, as in the case where
censoring is independent of x with a fairly steep regression
line, then there will be an irregular censoring pattern
along the regression line which can lead to biased
estimates. In such cases Miller suggests dividing the
x range into two or more intervals each of which shows
difference in mean interval lengths for the two methods and
the estimate of the standard deviation of the difference for
a paired comparison. On all but two occasions the method N
length is shorter, and is shorter by more than 2 standard
deviations on 3 occasions.
Results for n = 15 are in Table 13. Because of coverage
limitations results for methods 1 and BJN are not
included. In cases where coverage was attained BJN tended
to give rather shorter intervals than N and intervals by 1
were comparable in length to those of N.
Method N gave intervals that tend to be about one-half
the lengths that it gave in the n = 7 trials. In most cases
the length of the interval by N is about 1.4 times the standard
set by LS. For data generated using T3 the interval
lengths by both methods N and BJT are close to or less than
the LS value.
Average interval lengths obtained with data generated
using C1 or C2 turned out shorter using method N than using
method BJT. The reverse was the case with C3 censoring.
These differences seem to be merely reflections of the coverages
obtained and it is likely that for the same coverage
there is little to choose between N and BJT in terms of
length.
Simulations with n = 25 gave intervals with method N
which were about two-thirds of the length of corresponding
intervals with n = 15. Method N intervals were approximately
for the iteration was $\tilde\beta^{(1)}$, the least squares estimate using
only the uncensored observations, and in others the value
chosen was $\tilde\beta^{(0)}$, the least squares estimate using all the
observations without regard to whether they were censored
or not. The choice of starting value was discussed briefly
in Section 2.5 and some indication of how it might affect
the convergence is given in Table 4.
Confidence intervals for $\beta$ using this method were
placed by using the variance estimate (2.5.5) and critical
values of a normal distribution.
e) Buckley-James t-approximation (BJT). This procedure
was as for BJN except that by ad hoc reasoning the
critical values for a t-distribution were used in placing
confidence intervals. From consideration of expression
(2.5.6) for the error variance estimation the value for the
degrees of freedom for t was taken as two less than the
number of uncensored observations.
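The stopping rule described for the Buckley-James iterations in these simulations (successive iterates agreeing to within 0.00005, at most 20 iterations, then a search of the final 10 iterates for two values agreeing to within 0.005) might be sketched as follows. The function name, the `update` callable standing for one Buckley-James step, and the choice to compare the last iterate with earlier ones in the window are assumptions; the text does not spell out how the cycle search was coded.

```python
def iterate_with_cycle_check(update, start, tol=0.00005, max_iter=20,
                             window=10, cycle_tol=0.005):
    """Return (estimate, status); status is 'converged', 'cycle' or 'no solution'."""
    iterates = [start]
    for _ in range(max_iter):
        nxt = update(iterates[-1])
        if abs(nxt - iterates[-1]) < tol:
            return nxt, "converged"
        iterates.append(nxt)
    tail = iterates[-window:]
    last = tail[-1]
    # look back through the window for an earlier iterate close to the last one
    for k in range(len(tail) - 2, -1, -1):
        if abs(tail[k] - last) <= cycle_tol:
            cycle = tail[k + 1:]  # one full pass around the loop
            return sum(cycle) / len(cycle), "cycle"
    return None, "no solution"
```

For example, an update that flips between two values returns their arithmetic mean with status `'cycle'`, while a steadily drifting sequence returns `'no solution'`.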
Some results are now presented of how appropriate
selections of the above methods compared in various aspects
of their performance. The performance aspects to be considered
are
i) Infinite intervals and convergence
ii) Bias of estimator
iii) BJ variance estimation
iv) Coverage probability
v) Length of confidence interval
vi) A power comparison.
Buckley and James (1979) suggest, but do not fully justify,
that $\hat\beta$ is asymptotically normal and that a reasonable
estimate of variance is
$\widehat{\mathrm{Var}}(\hat\beta) = \hat\sigma_u^2 \Big/ \sum_u (x_i - \bar{x}_u)^2$ (5.5)
where
$\hat\sigma_u^2 = \frac{1}{n_u - 2} \sum_u \left( y_i - \bar{y}_u - \hat\beta (x_i - \bar{x}_u) \right)^2$. (5.6)
Although this method does not require assumption A2
for point estimation, we would not expect (5.5) to be
adequate if there is extremely uneven censoring along the
regression line. As in the Miller method, it seems sensible
to consider dividing the x range to create more homogeneous
regions.
We should note that the BJ method is a nonparametric
analogue of a technique due to Schmee and Hahn (1979) that
uses normality assumptions in estimating the expectation
at (5.2).
2.6 The Koul, Susarla and Van Ryzin Method
This method (KSV) assumes the Model (3.1) and like BJ
defines a pseudo random variable which is unbiased for
$\alpha + \beta x$ and then uses estimates for some observations of
that variable. In this case the pseudo random variable is
That the KSV estimate is positive is in conflict with
the commonly held view that increasing age is deleterious
to successful heart transplantation. Because computational
expressions are not available for an estimate of the variances
of the KSV estimates in multiple regression they could
not be given by Miller and Halpern and although an expression
is available in the single covariable case the computation of
the variance has not been added here.
The Miller method did not converge. Oscillation
occurred between two values and the implication from those
values is that there is no significant regression on age.
At the 95% level of testing the method of Buckley-James
showed a significant negative slope on age for the larger
sample size and the Cox result was also significant. With
the Cox parameterization the positive value of $\hat\beta$ indicates
a decrease in survival time with age.
The new method also showed a significant negative slope.
Miller and Halpern compare the Cox results with the direct
regression results by plotting the median survival times
estimated by the Cox model against age. That plot is close
to linear and has a slope more negative than the BJ estimate
and close to the values given by the new method.
Comparison of results of regressing logarithm of survival
times on the mismatch scores are given in Table 17 for
the Buckley and James method and the new method. In this
case the new method gives the smaller negative slope, but
neither result attains significance.
where $\delta_{(i)}$ denotes the $\delta$ associated with $z_{(i)}$. There is
a difficulty if $\delta_{(n)} = 0$; the convention is to define
$\hat{F}(z) = 1$, $z > z_{(n)}$. The PL estimate $\hat{F}(z)$ is a step function
with jumps only at the uncensored points. The size of the
jump at an uncensored $z_{(i)}$ is
$d\hat{F}(z_{(i)}) = \frac{1}{n-i+1} \prod_{\{j:\ j<i,\ \delta_{(j)}=1\}} \frac{n-j}{n-j+1}$. (4.4)
The computational forms given at (4.3) and (4.4)
are applied to tied z-values by assuming that in a run of
tied values the uncensored values precede the censored
values. Otherwise the labelling across the tied group is
arbitrary and the jumps at each uncensored point in a tied
group are equal (by 4.4).
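The jumps at (4.4), with the tie rule just stated and the convention that the largest observation is treated as uncensored, could be computed along the following lines. The function name `pl_jumps` is hypothetical and the sketch is offered only as one way of organising the product in (4.4).

```python
def pl_jumps(z, delta):
    """Jump of the PL estimate at each observation, returned in input order."""
    n = len(z)
    # sort by z; within a run of ties the uncensored values come first
    order = sorted(range(n), key=lambda i: (z[i], -delta[i]))
    d_sorted = [delta[i] for i in order]
    d_sorted[-1] = 1  # convention: the largest observation counts as uncensored
    jumps = [0.0] * n
    surv = 1.0  # running product in (4.4) over earlier uncensored points
    for rank, i in enumerate(order, start=1):
        if d_sorted[rank - 1] == 1:
            jumps[i] = surv / (n - rank + 1)
            surv *= (n - rank) / (n - rank + 1)
    return jumps
```

With no censoring every jump is 1/n, and in general the jumps sum to one because of the convention on the largest observation.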
Denoting the jump (4.4) by $w_i(\beta)$, (4.2) becomes
$\sum_{i=1}^{n} w_i(\beta)\,(y_i - \alpha - \beta x_i)^2$. (4.5)
The weights $w_i(\beta)$ do not depend on $\alpha$ and with the
convention that $\delta_{(n)}$ is set equal to one in all cases,
we have $\sum_{i=1}^{n} w_i(\beta) = 1$. If $\delta_i = 0$ then $w_i(\beta) = 0$ and (4.5)
appears to depend only on the uncensored observations; however
the PL estimator and therefore each weight do depend
on all the observations. Seeking an $\alpha$ which minimises (4.5)
we get
ACKNOWLEDGMENTS
I am grateful for the stimulating contact I have
had with many faculty and students at the University of
Florida. In particular I want to thank Dr. P. V. Rao
for his advice throughout my graduate program and for
his learned and shrewd guidance of my dissertation.
Also I want to acknowledge valuable help from Drs. Jon
Shuster and Ramon Littell.
I would not have undertaken this work without
encouragement from my wife and I shall always appreciate
the sacrifices she made. I also thank Rob and
Nicola for their tolerance of an ofttimes reclusive dad.
My special thanks go to Mrs. Edna Larrick for a speedy
and excellent typing job.
Computing was done using the facilities of the
Northeast Regional Data Center located on the campus of
the University of Florida in Gainesville.
CONTINUATION OF LISTING FOR APPENDIX A
      B/CX(I)X(J))
      C(L)=IC
   40 CONTINUE
   35 CONTINUE
C
C COMMENT 4 ORDER B'S RE ARRANGE C'S
C
      CALL ORDER(B,C,L)
      DO 36 J=1,L
      B
      C(L+2-J)=C(L+1-J)
   36 CONTINUE
      B(1)=B(2)-1.0
      B(L+2)=B(L+1)+1.0
C
C COMMENT 5 COMPUTE STARTING VALUE S(1)
C
      DO 55 I=1,N
   55 Z(I)=Y(I)-B(1)*X(I)
      CALL COMP(X,N,IC,Z,IS)
      S(1)=IS
C
C COMMENT 6 COMPUTE S'S SEQUENTIALLY
C
      KAV=0
      L1=L+1
      DO 60 I=2,L1
      S
      IF((S(I-1)*S(I)).LT.0) ISUB=I
      IF(S(I).NE.0)GO TO 60
      ISUB2=I+1
      KAV=KAV+1
   60 CONTINUE
C
C COMMENT 7 COMPUTE POINT EST.
C
      IF(KAV.EQ.0) GO TO 65
      ISUB1=ISUB2-KAV
      BPNT=(B
      GO TO 70
   65 ISUB1=ISUB
      ISUB2=ISUB
      BPNT=B(ISUB)
   70 J=ISUB2
C
C COMMENT 8 CONFIDENCE INTERVAL
C
   71 ISCUT=10
   72 ICH=0
   74 IF(B
      IF(C(J).EQ.1)ICH=1
      J=J+1
      GO TO 74
   76 IF
      IF((C(J).EQ.1).OR.(ICH.EQ.1))GO TO 80
   78 IF(S(J).LT.ISCUT)GO TO 150
      IF(J.NE.L+1)GO TO 79
      BUP=9999.0
      GO TO 151
   79 J=J+1
      GO TO 72
   80 BB=(B(J)+B(J+1))/2.0
      DO 90 K=1,N
   90 Z(K)=Y(K)-BB*X(K)
      CALL RANK(Z,IC,ICENS,N)
      DO 100 K=1,NNN
  100 IF(K)=0
      CALL DIST(III,IF,ICENS,N,NN)
  105 ITOT=0
      DO 110 K=1,NNN
  110 ITOT=IF(K)+ITOT
      KCOU=0
Table 15. Proportions of 200 simulations for which
$H_0: \beta = 0$ was rejected at a test level
$\alpha = 0.1$.

                 N                  BJT
Data       n = 7   n = 25     n = 7   n = 25
T1,C1      0.14    0.38       0.11    0.39
T2,C1      0.16    0.43       0.15    0.38
T3,C1      0.30    0.77       0.28    0.69
T4,C1      0.09    0.43       0.09    0.38
T1,C2      0.07    0.39       0.10    0.34
T2,C2      0.13    0.38       0.12    0.32
T3,C2      0.21    0.67       0.21    0.60
T4,C2      0.11    0.37       0.10    0.34
T1,C3      0.21    0.46       0.08    0.51
T2,C3      0.28    0.56       0.06    0.43
T3,C3      0.35    0.79       0.26    0.95
T4,C3      0.19    0.38       0.12    0.52
the worth of the passive stance of simply throwing away
the censored observations. Clearly, this is not a valid
sampling procedure for application of the usual least
squares theory. It is intuitively likely that slope estimates
would be unbiased in the case of type C1 censoring.
At least three uncensored observations are required for
application of the usual error variance estimation.
d) Buckley-James Normal approximation (BJN). The
method of Buckley and James as explained in Section 2.5 was
used. It is an asymptotic method and its appropriateness
in small samples is unknown. At least three uncensored
observations are required. As discussed previously
(Section 2.5) the convergence of the Buckley-James method
is not guaranteed and can be a problem. For the sake of
these simulations the following procedure was adopted.
Convergence was considered to have occurred if iterates
agreed to within 0.00005. If convergence was not reached
within 20 iterations a search for oscillation cycles was
made over the final 10 iterations. If 2 iterates in this
search agreed to 0.005, then the arithmetic mean of all
iterates in that cycle was taken as the point estimate.
If such agreement was not reached within the 10 iterations,
then it was deemed that "no solution was possible." Doubtless
a solution could be contrived in an individual case,
but no further attempt to coax out a solution was made in
these trials. In some simulations the starting value chosen
Table 12. Mean lengths of 90% confidence intervals based on 200 simulations
for sample size n = 7.

Data     LS             N              1              BJT            BJT - N
T1,C1    0.729 (.017)   1.360 (.054)   1.500 (.124)   1.739 (.156)    0.379* (.135)
T2,C1    0.685 (.021)   1.442 (.069)   1.771 (.155)   2.101 (.221)    0.659* (.183)
T3,C1    0.692 (.030)   0.900 (.050)   0.804 (.066)   0.934 (.075)    0.035  (.047)
T4,C1    0.735 (.013)   1.289 (.051)   1.463 (.104)   1.781 (.171)    0.492* (.141)
T1,C2    0.714 (.019)   1.653 (.080)   1.577 (.133)   1.847 (.168)    0.194  (.134)
T2,C2    0.718 (.023)   1.502 (.069)   1.501 (.115)   1.766 (.188)    0.264  (.161)
T3,C2    0.628 (.022)   1.179 (.066)   0.941 (.074)   1.080 (.102)   -0.098  (.084)
T4,C2    0.721 (.014)   1.578 (.071)   1.440 (.126)   1.676 (.156)    0.098  (.125)
T1,C3    0.737 (.018)   1.203 (.057)   1.053 (.122)   1.276 (.131)    0.073  (.099)
T2,C3    0.701 (.022)   1.173 (.058)   1.073 (.084)   1.292 (.105)    0.118  (.075)
T3,C3    0.614 (.022)   0.669 (.031)   0.617 (.064)   0.770 (.072)    0.102  (.058)
T4,C3    0.740 (.013)   1.373 (.051)   0.900 (.058)   1.241 (.096)   -0.132  (.079)

* Indicates a difference greater than 2 s.d. Values in parentheses are estimated
standard deviations for the entries.
represent adjacent ranks and the exact nature of the change
depends upon the censoring pattern. Let r^=1,...,nl
represent arbitrary ranks. If c, = c. = l, then one of two
K 1]
situations must obtain as b passes through b, :
(1)
R(+) fR(+) R(+) R(+) ^
/ ^2 ^ /1\ j
,R
= (r.+1) ,.. ,R
n
will become
R + (R ( + ^ R ( + )
k,l lKl ,K2 '
,Rj+) =rx+l,
( + ) +
Ri r i'
,R(+))
n
or (2)
(+)
k1,2 (R1
(+)
, R,
( + )
r R .
( + )
 r+ r ( + ) r +1
,R ( + ) ) '
n
will become
( + ) ( + ) (+)
k 12 ~ 'Ki 7 9
,Rj + ^ = (^+1) + ,. . rRl+) =r2'
/R
( + )
n
In (1)
the
. th
position has the censored rank and in (2)
th
the censored rank is in the j position. Let s^ denote
the value of S(b) computed from Then from (2.15)
t ,m
s, = s, 1, m = 1,2. Notice that because the ranks
k,m kl,m
are adjacent, (r^+1)+ has the same concordances and dis
cordances with elements of R./ + as rt has with elements
k1,1 1
( + ) +
of R^ except for concordances between (r^+1) and
A similar
r. in r} + 1 i and rt and r,+l in r}+1 .
1 k1,1 1 1 k, 1
situation obtains for r^ and r^+1 in situation (1) and
also for corresponding ranks in situation (2). Notice
further that for any permutation of the elements of R^+]_
keeping x's fixed the rank pair (r^+l,r^) will contribute
of $S(b)$ for b lying between critical values. The following
Lemma and Corollary concern situations for which the distribution
of $S(b)$ changes as b changes through a $b_k$.
Lemma 3.8. Suppose $b_{ij} = b_k$ for just one pair of subscripts
(i,j). That is, $b_k$ is a nontied critical value. Consider
$b'$, $b''$ such that $b_{k-1} < b' < b_k < b'' < b_{k+1}$, with $c_{ij}$ as given
by (2.10).
i) If $c_{ij} = -2$ or $c_{ij} = 0$, then $S(b') \overset{dc}{=} S(b'')$, using
$\overset{dc}{=}$ to indicate "equal in conditional distribution."
ii) If $c_{ij} = -1$ and x-values are distinct, then
$S(b') \overset{dc}{\ne} S(b'')$.
Proof:
i) Since $b_{ij} = b_k$ for just one pair of subscripts (i,j)
it follows from (2.12) that only one pair $(z_i(b), z_j(b))$
changes order as b changes through $b_k$. Now, $c_{ij} = -(\delta_i + \delta_j) = -2$
or 0 implies $\delta_i = \delta_j = 1$ or $\delta_i = \delta_j = 0$, so that the elements
which change rank order are either both censored or both
uncensored. Thus the censoring pattern over the ranks
of $\{z_i(b'): i = 1,\ldots,n\}$ is the same as the censoring pattern
over the ranks of $\{z_i(b''): i = 1,\ldots,n\}$ and consequently
$S(b') \overset{dc}{=} S(b'')$.
ii) Immediate from Lemma 3.6.
Corollary 3.9. If $c_{ij} = -2$ or 0 for each pair (i,j) such
that $b_{ij} = b_k$, then the permutation distribution of $S$ is not
changed as b jumps through $b_k$ from $b'$ to $b''$.
Proof: Immediate from Lemma 3.8.
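In computational terms, Corollary 3.9 says that the permutation distribution can be reused across a critical value unless some pair changing order there has exactly one censored member. A minimal sketch of that check follows; the function name and the representation of the pairs as index tuples are assumptions.

```python
def needs_new_distribution(pairs_at_bk, delta):
    """True if some pair (i, j) with b_ij = b_k has exactly one censored member."""
    return any(delta[i] + delta[j] == 1 for i, j in pairs_at_bk)
```

When the function returns False the critical values of the test can be carried over unchanged to the next interval of b values.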
4.3 Summary of Results
Both the Buckley-James and the new method were put
forward as procedures which did not require assumptions
about censoring as far as point estimation was concerned.
The indications from the trials reported in Tables 5 and 6
are that both methods might lead to biased results with
small samples showing extreme departures from assumption A2.
Otherwise point estimation was close to the model setting
except in two cases using method BJ. In terms of the methodologies
the new method will always give an estimate whereas,
as shown in Table 4, convergence difficulties can arise with
BJ, especially with small samples.
The confidence interval procedures for both BJ and N
require assumption A2 and a major concern is how robust the
methods turned out to be against departures from that assumption.
The observed coverage over all trials with method N
was always close to or greater than the nominal 90%. From
the results in Tables 8 to 11 it seems that method BJN should
not be advocated for samples up to size 25. Contrary to
anticipation, method BJT gave more security over coverage
at the small sample size. At larger sample sizes BJT would
probably attain coverage with mild departures from A2 but
perhaps could not be relied on with severe departures from
A2. The indications are that over a range of sample sizes
less than 30 method N has the better robustness properties.
Proof
i) Since $z_i^{(+)}(b) = y_i^{(+)} - b x_i$, it follows that for
$x_i > x_j$,
$z_i(b) > z_j(b)$ if $b < b_{ij}$,
$z_i(b) = z_j(b)$ if $b = b_{ij}$,
$z_i(b) < z_j(b)$ if $b > b_{ij}$, (2.12)
where $b_{ij}$ is as in (2.9). Therefore
$\eta_{ij}(b) = \{\delta_j\,\phi(z_i(b) - z_j(b)) - \delta_i\,\phi(z_j(b) - z_i(b))\}$
can change only as b changes through $b_{ij}$.
ii) Notice that
$\eta_{ij}(b) = \delta_j$ if $z_i(b) > z_j(b)$,
$\eta_{ij}(b) = (\delta_j - \delta_i)\,\phi(0)$ if $z_i(b) = z_j(b)$,
$\eta_{ij}(b) = -\delta_i$ if $z_i(b) < z_j(b)$. (2.13)
In view of (2.12) it is clear that for (i,j) with
$x_i > x_j$
we have that $\eta_{ij}(b)$ decreases by $(\delta_i + \delta_j)$ as b increases
through $b_{ij}$. Letting $c_{ij}$ denote the change, $c_{ij} = -(\delta_i + \delta_j)$
completes the proof of (ii).
iii) Recall that
$S(b) = \sum_{\{(i,j):\ x_i > x_j\}} \eta_{ij}(b)$ (2.14)
which can change with b only when $\eta_{ij}(b)$ changes with
b for some (i,j). Since $\eta_{ij}(b)$ changes only at $b = b_{ij}$,
$S(b)$ will also change only at $b = b_k$. For a specified $b_k$
the change in $S(b)$ is the accumulation of the change in $\eta_{ij}(b)$
as (i,j) varies over $B_k$; that is, the change is
$c_k = \sum_{(i,j) \in B_k} c_{ij}$,
which completes the proof of (iii).
Here as before $\hat{z}_i = y_i - \hat\beta x_i$, $\hat{F}(\cdot)$ is the PL estimator based
on $(\hat{z}_i, \delta_i)$, $i = 1,\ldots,n$, and $w_i(\hat\beta)$, $i = 1,\ldots,n$, are the
jumps of $\hat{F}(\cdot)$ (see 4.4). The second term on the right-hand
side of (5.2) uses a weighted average of those $\hat{z}_k$'s which
are greater than the current $\hat{z}_i$. Iteration is again necessary.
Given a starting estimate of $\beta$, (5.2) is used in (5.1)
to obtain the "responses" $y_i^*$, $i = 1,\ldots,n$. Using least
squares the next iterate for $\hat\beta$ is
$\hat\beta = \sum_{i=1}^{n} y_i^* (x_i - \bar{x}) \Big/ \sum_{i=1}^{n} (x_i - \bar{x})^2$. (5.3)
Again this process may not converge. Oscillation cycles
might develop and the average of values in the cycle is
suggested as the solution. Given a solution for $\hat\beta$ the
solution for $\hat\alpha$ is
$\hat\alpha = \bar{y} - \hat\beta \bar{x}$. (5.4)
The starting value for the iterations might be the
same as for the Miller method or it could, as suggested by
Miller and Halpern (1982), be the least squares solution
for $\beta$ using all of the data as if they were uncensored.
The variability among the oscillating values if they occur
is thought to be less in the BJ method than in the Miller
method.
CHAPTER FOUR
COMPARATIVE STUDIES
4.1 Simulations Using a Simple Linear Model
As discussed in Section 2.6 there are some reasons for
preferring the Cox and the Buckley and James regression
estimates for use with censored data. According to Miller
and Halpern (1982, p. 530) "the choice between (these two
methods) should depend on the appropriateness of the proportional
hazards model or the linear model for the data."
This section reports some simulation studies that compare
the performances of the Buckley and James method and the new
method of Chapter Three using data generated from the linear
model (2.3.1) with p = 1. To afford some comparison with the
original simulations reported by Buckley and James their
choice of $\beta = 0.2$ for the linear model is adopted.
We first describe how data were generated for the
simulation trials. Recall that according to model (2.3.1)
the survival times $T_i$ have the representation
$T_i = \alpha + \beta x_i + \varepsilon_i$, $i = 1,\ldots,n$.
For this series of simulations we fixed $\alpha = 30$, $\beta = 0.2$
and the covariate x was given values from 40 to 100 in
increments of 60/(n-1) for sample size n. Four models,
Table 16. Regression slope estimates and confidence
intervals for $\log_{10}$ survival time versus
age at transplant with Stanford heart
transplant data. (n = 157 except where stated)

Estimator                      $\hat\beta$      95% C.I.
Cox                             0.030    ( 0.008,  0.052)
Miller (loop with 2 values)    -0.001    (-0.023,  0.021)
                                0.000    (-0.016,  0.016)
BJ     n = 157                 -0.015    (-0.031,  0.001)
       n = 184                 -0.014    (-0.028,  0.000)
KSV                             0.024
New    n = 157                 -0.030    (-0.050, -0.010)
       n = 184                 -0.026    (-0.045, -0.009)

Table 17. Regression slope estimates and confidence
intervals for $\log_{10}$ of survival times versus
mismatch score with n = 157 Stanford heart
transplant patients.

Estimator    $\hat\beta$      95% C.I.
BJ           -0.105    (-0.373, 0.163)
New          -0.002    (-0.327, 0.311)
statistic that would indicate any such trend. Firstly, in
the intuitive spirit of Brown, Hollander and Korwar we state
the following definition. The argument b of $Z^{(+)}(\cdot)$ is
suppressed for convenience and $Z$ without superscript is the
variable value without regard to censorship.
Definition 2.2. A pair of points $(z_i^{(+)}, x_i)$, $(z_j^{(+)}, x_j)$
with $x_i > x_j$ are "definitely concordant" if,
either $\delta_i = \delta_j = 1$ and $Z_i > Z_j$,
or $\delta_i = 0$, $\delta_j = 1$ and $Z_i > Z_j$;
and "definitely discordant" if,
either $\delta_i = \delta_j = 1$ and $Z_i < Z_j$,
or $\delta_i = 1$, $\delta_j = 0$ and $Z_i < Z_j$.
Pairs of points for which Definition 2.2 does not apply are
considered as unable to contribute to an assessment of overall
concordance and are ignored.
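Definition 2.2 amounts to a three-way classification of each pair, which the following sketch makes explicit. The function name `classify_pair` is hypothetical. The strict inequalities follow the definition as stated; the COMP listing in Appendix A uses `.GE.` in the two censored cases, so tied values would be scored differently there.

```python
def classify_pair(zi, di, xi, zj, dj, xj):
    """+1 definitely concordant, -1 definitely discordant, 0 ignored."""
    if xi < xj:  # relabel so that the first point has the larger x
        zi, di, xi, zj, dj, xj = zj, dj, xj, zi, di, xi
    if xi == xj:
        return 0  # tied x's cannot indicate trend
    if di == 1 and dj == 1:
        return (zi > zj) - (zi < zj)
    if di == 0 and dj == 1 and zi > zj:
        return 1  # censored value already exceeds the uncensored one
    if di == 1 and dj == 0 and zi < zj:
        return -1  # censored value at smaller x already exceeds the other
    return 0
```

Summing the classification over all pairs gives a trend statistic of the kind defined next.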
Motivated by this assessment of pairs of points we
define a statistic that may be used to indicate trend among
the pairs $(z_i^{(+)}(b), x_i)$, $i = 1,\ldots,n$, by
$S(Z^{(+)}(b), x) = \sum_{i=1}^{n} \sum_{j=1}^{n} \eta_{ij}(b)\, I_P(x_i - x_j)$ (2.3)
where
1.3 times the length of LS intervals except, as before, in
the case of T3 models where all methods tended to give
shorter intervals than LS.
Table 14 gives mean differences of interval lengths and
their estimated standard deviations for paired comparisons
of N with BJN and N with BJT. Assessment of these differences
must take account of the observed coverages and these
are included in the table. Where coverages were comparable,
intervals by BJN were significantly shorter than those by N
on three occasions and significantly larger on one occasion.
Only once did BJT attain both reasonable coverage (87.4%)
and a significantly shorter length than N. Method N gave
significantly shorter intervals than BJT on seven occasions.
LIST OF TABLES (Continued)
Table                                                    Page
13. Mean lengths of 90% confidence intervals
    based on 200 simulations for sample
    size n = 15                                            79
14. Differences in mean lengths of 90% confidence
    intervals based on 200 simulations for
    sample size n = 25                                     80
15. Proportions of 200 simulations for which
    $H_0: \beta = 0$ was rejected at a test level
    $\alpha = 0.1$                                                82
16. Regression slope estimates and confidence
    intervals for $\log_{10}$ of survival time versus
    age at transplant with Stanford heart
    transplant data                                        85
17. Regression slope estimates and confidence
    intervals for $\log_{10}$ of survival times versus
    mismatch score with n = 157 Stanford heart
    transplant patients                                    85
From the prior Lemmas and discussion it is clear that
we can identify ranges of b values for which critical points
for the tests (3.4) do not have to be recomputed. A change
of b through certain $b_k$ will change the permutation distribution
and in these cases test critical values must be
recomputed from a regenerated permutation distribution.
The question of whether the confidence set $CS_\beta$ at
(3.5) will be a single interval is now addressed. It will
be shown that if, as b values are incremented outwards from
the point estimate, a point is reached at which test (3.4)
leads to rejection, no subsequent test would lead to acceptance,
and thus $CS_\beta$ does make up a single interval of plausible
b values.
We again denote critical b values as in Lemma 2.8 (iii),
i.e., $b_1 < b_2 < \cdots < b_\ell$, and let $S_{k-1}$ denote the random variable
$S(b')$ and $S_k$ denote $S(b'')$ for $b_{k-1} < b' < b_k < b'' < b_{k+1}$.
As noted earlier the distribution of $S_k$ is the same for any
$b''$ with $b_k < b'' < b_{k+1}$. In terms of the permutation distribution
for $S_k$ we make the following definition.
Definition 3.10. The p-value for a sample realization $s_k$ is
$p_k = \begin{cases} \Pr(S_k \le s_k), & s_k < 0 \\ \Pr(S_k \ge s_k), & s_k \ge 0. \end{cases}$
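Given a list of values of $S_k$ over the permutations, the p-value of Definition 3.10 is a one-tailed proportion. The sketch below is illustrative; the function name is hypothetical, and assigning the case $s_k = 0$ to the upper tail is an assumption about the definition.

```python
def p_value(perm_values, s_obs):
    """One-tailed proportion in the direction of the observed s_k."""
    n = len(perm_values)
    if s_obs < 0:
        return sum(v <= s_obs for v in perm_values) / n
    return sum(v >= s_obs for v in perm_values) / n
```

A slope b in the interval concerned would then be rejected at level $\alpha$ when the returned value is below $\alpha/2$.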
In terms of the test at (3.4) we would reject $H_0: \beta = b''$
for some $b_k < b'' < b_{k+1}$ if $p_k < \alpha/2$. Consider $s_{k-1} < 0$
Table 5. Arithmetic means ($\times 10^3$) based on 2000 simulations of slope estimates for
samples of size n = 7.

                                                                  Number of
Data     % Censored   LS           N            1            BJ           BJ Estimates
T1,C1    38           204 (4.2)    204 (5.6)    197 (6.1)    203 (5.2)    1960
T2,C1    42           197 (4.1)    195 (5.8)    195 (6.6)    196 (5.4)    1938
T3,C1    36           203 (4.1)    200 (3.8)    202 (4.2)    201 (4.0)    1971
T4,C1    39           199 (4.3)    200 (5.6)    202 (6.3)    199 (5.4)    1943
T1,C2    41           200 (4.2)    197 (6.0)    169* (6.7)   194 (5.7)    1941
T2,C2    40           196 (4.2)    200 (5.9)    165* (6.6)   193 (5.6)    1948
T3,C2    41           204 (4.1)    200 (4.4)    181* (4.6)   195 (4.4)    1947
T4,C2    41           195 (4.3)    199 (6.2)    167* (7.0)   194 (6.0)    1921
T1,C3    38           209 (4.2)    220* (5.5)   091* (5.1)   200 (4.9)    1937
T2,C3    44           203 (4.2)    218* (5.6)   051* (6.0)   184* (5.2)   1901
T3,C3    37           198 (4.2)    193* (2.8)   136* (2.5)   186* (2.6)   1965
T4,C3    30           186* (4.4)   192 (5.7)    088* (4.5)   177* (5.1)   1958

* indicates that the mean is more than 2 standard deviations from the nominal
$\beta = 0.2$. Estimated standard deviations ($\times 10^3$) of the means are given in
parentheses.
APPENDIX B
TWOSAMPLE ADAPTATION
Clearly by coding x = 0,1 the previous program can be
used to analyze a two-sample problem (Section 3.5). However,
for this special application it is computationally propitious
to replace the permutation segment between COMMENTS 2 and 3
with the following segment for generating permutations in the
two-sample case. The advantage of this is that we no longer
need to consider all permutations of the x's because permutations
within tied x's lead to the same values for $S(\cdot)$. It is
necessary to consider only the different configurations of 0's
and 1's in the x-vector. For this application subroutine
XCALC should be replaced by subroutine SAM and subroutines
SWITCH and SHUNT are not required. The program applies provided
that x = 1 is associated with a sample size M = 10 or
less, but the dimensions of array ISUMX must be increased if
sample size N > 15 or $\binom{N}{M}$ > 11,000.
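The saving in the two-sample case comes from enumerating configurations rather than permutations. A short Python sketch of that enumeration follows; it is not a translation of subroutine SAM, and the function name `configurations` is hypothetical.

```python
from itertools import combinations
from math import comb


def configurations(n, m):
    """Yield each distinct 0/1 x-vector of length n with exactly m ones."""
    for ones in combinations(range(n), m):
        x = [0] * n
        for k in ones:
            x[k] = 1
        yield x


# number of x-vectors to evaluate, against n! for full permutation
n_configs = comb(10, 4)
```

For N = 10 and M = 4, for instance, `n_configs` is 210, whereas 10! = 3,628,800 orderings would otherwise be visited.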
NONPARAMETRIC REGRESSION IN THE
ANALYSIS OF SURVIVAL DATA
BY
MICHAEL J. IRESON
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN
PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1983
in practice. The trials with the new method have given
encouraging indications of the method's robustness against
departures from that assumption, but it remains a major
challenge to relax the assumption completely.
gives the percentage of censored responses over all simulations
for the particular data type.
Table 3. Frequency of occurrence of censoring at different
x in 2000 simulations of samples of size n = 7.

              %                          x-values
Data       Censored     40     50     60     70     80     90    100
T1,C1         38       723    774    791    742    769    793    784
T1,C2         41       641    681    754    815    891    916   1015
T1,C3         38       320    453    583    765    942   1070   1251
We now explain five methods of analysis which were
compared using the simulated data. The abbreviations given
below in parentheses will be used to identify the methods
throughout the presentation of the results.
a) Least squares (LS). Standard least squares analysis
was applied to the survival times T as if they were all known.
The usual confidence interval procedure is strictly valid
only in case T1. In that case the confidence interval serves
as a standard with which to compare other methods. The point
estimates will be unbiased in all cases and this fact provides
some check on the simulation procedure.
b) New method (N). This is the method introduced in
Chapter Three. It is valid for all models T1 through T4
with censoring mechanism C1. The application of this method
differs for different sample sizes. Simulations were run
APPENDIX C
LARGE SAMPLE PROGRAM
The final listing is for the large sample routine.
Array sizes in the program would need to be changed for
sample sizes greater than 200. The computations for
Daniels' approximate variance (3.4.2) are made in subroutines
COEF and VAR.
APPENDIX A
SMALL SAMPLE REGRESSION PROGRAM
This appendix contains a listing of a FORTRAN program
that could be used for obtaining a confidence interval
using the new method with sample sizes 4 to 15. Comments
in the listing are numbered for easy reference.
Recall that the exact method requires creating all
permutations of the pairs $(z_i, \delta_i)$, $i = 1,\ldots,n$, keeping
$x_i$, $i = 1,\ldots,n$, fixed, or equivalently permuting x's keeping
the $(z_i, \delta_i)$ pairs fixed. The z-values are functions
of b and change as the search over b values is made, whereas
x-values do not change. There is therefore much advantage
in making a one-time computation for each permutation of
the x-values and then using those with each set of $(z_i, \delta_i)$
pairs as they are formed. Thus, for distinct and ordered
$z_1 < z_2 < \cdots < z_n$ we have from (3.2.5),
$$S(b) = \sum_{j=1}^{n-1} \delta_j \sum_{i=j+1}^{n} \operatorname{sgn}(x_i - x_j),$$
where the second summation can be made for each permutation
of x-values without regard to the value of b. Provided
that we do always have distinct z-values they can be ordered
for a particular b setting and corresponding $\delta$'s identified.
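The one-time computation can be sketched in Python as follows. This is an illustration under the assumption of distinct z-values, not a transcription of subroutines XCALC and DIST; the names `inner_sums`, `s_from_sums` and `permutation_distribution` are hypothetical.

```python
from itertools import permutations


def sgn(t):
    """Sign function: 1, 0 or -1."""
    return (t > 0) - (t < 0)


def inner_sums(x):
    """For one arrangement of x, return c_j = sum_{i>j} sgn(x_i - x_j), j = 1..n-1."""
    n = len(x)
    return [sum(sgn(x[i] - x[j]) for i in range(j + 1, n)) for j in range(n - 1)]


def s_from_sums(delta_ordered, sums):
    """S = sum_j delta_j * c_j, with delta listed in increasing order of z.

    zip stops after n-1 terms, so the indicator of the largest z is not used.
    """
    return sum(d * c for d, c in zip(delta_ordered, sums))


def permutation_distribution(x, delta_ordered):
    """Value of S for every permutation of x."""
    table = [inner_sums(p) for p in permutations(x)]  # does not depend on b
    return [s_from_sums(delta_ordered, row) for row in table]
```

In a search over b only `delta_ordered` changes, so `table` would be built once and reused; here it is rebuilt inside the function purely to keep the sketch short.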
Table 7. Simulation results concerning estimation
of the variance of the Buckley-James
estimator with sample size n = 7.

            # of                                Mean of
Data      Estimates    Var $\hat{\beta}_{BJ}$    Variance Estimates
T1,C1       1960            0.053                  0.095
T2,C1       1938            0.057                  0.096
T3,C1       1971            0.031                  0.045
T4,C1       1943            0.057                  0.092
T1,C2       1941            0.064                  0.103
T2,C2       1948            0.061                  0.091
T3,C2       1947            0.038                  0.049
T4,C2       1921            0.069                  0.110
T1,C3       1937            0.046                  0.091
T2,C3       1901            0.052                  0.104
T3,C3       1965            0.013                  0.019
T4,C3       1958            0.051                  0.066
Prentice, R. L., and L. A. Gloeckler. (1978). Regression
analysis of grouped survival data with application to
breast cancer data. Biometrics, 34, 57-67.
Randles, R. H., and D. A. Wolfe. (1979). Introduction to
the Theory of Nonparametric Statistics. Wiley, New
York.
Samuels, S. (1978). Robustness of Survival Estimators.
Ph.D. Thesis. Department of Biostatistics. Univ. of
Washington.
Schmee, J., and G. J. Hahn. (1979). A simple method for
regression analysis with censored data. Technometrics,
21, 417-432.
Sen, P. K. (1968). Estimates of the regression coefficient
based on Kendall's Tau. J. Am. Stat. Assoc., 63,
1379-1389.
Susarla, V., and J. Van Ryzin. (1980). Large sample theory
for an estimator of the mean survival time from censored
samples. Ann. Statist., 8, 1002-1016.
Tsiatis, A. (1981). A large sample study of Cox's regression
model. Ann. Statist., 9, 93-108.
Turnbull, B. W. (1974). Nonparametric estimation of a
survivorship function with doubly censored data.
J. Am. Stat. Assoc., 69, 169-173.
Turnbull, B. W. (1976). The empirical distribution function
with arbitrarily grouped censored and truncated data.
J. R. Stat. Soc. B, 38, 290-295.
Application of the above expression then gives a value of
S(b) for each permutation of the x-values, as required
for the test described at (3.3.4).
In case of tied z-values the exact tests should be
obtained via expression (3.2.3), but this time with much
greater computing expense, because now the whole computation
for the permutation distribution for S(b) would need
to be made in each region over b where the distribution of
S(b) is known to differ. In the terminology of Chapter
Three it is clear that z-values can only be equal at critical
b-values (3.2.9) and when data points coincide; i.e.,
$(y_i^{(+)}, x_i) = (y_j^{(+)}, x_j)$ for some (i, j) (from $z_i^{(+)}(b) = y_i^{(+)} - b x_i$).
Permutation distributions are not considered at the critical
b-values and we need consider only the problem of coincident
points.
To avoid the expense of repeated use of (3.2.3) we
advocate using randomized ranks over tied z-values and
proceeding via (3.2.5). Realize that two censored z-values
do not contribute to the computation of S(b) anyway and that
a censored z-value tied numerically with an uncensored
z-value is considered greater. Thus random ordering of tied
uncensored z-values is all that is necessary and this is
effected in the program by a "fix-up" on the y data values
(see COMMENT 2). Coincident points are rather unlikely in
practice with the regression of continuous y on continuous x.
TABLE OF CONTENTS (Continued)
APPENDICES Page
A SMALL SAMPLE REGRESSION PROGRAM 94
B TWO-SAMPLE ADAPTATION 104
C LARGE SAMPLE PROGRAM 107
BIBLIOGRAPHY 112
BIOGRAPHICAL SKETCH 115
CONTINUATION OF LISTING FOR APPENDIX A
C     SUBROUTINE RANK: LABELS CENSORING INFORMATION
C     ACCORDING TO RANK OF Z VALUE.
C
      SUBROUTINE RANK(Z,IC,ICENS,N)
      DIMENSION Z(50),IC(50),ICENS(50)
      DO 20 I=1,N
      K=1
      DO 10 J=1,N
      IF(Z(I) .LE. Z(J)) GO TO 10
      K=K+1
   10 CONTINUE
      ICENS(K)=IC(I)
   20 CONTINUE
      RETURN
      END
C
C     SUBROUTINE DIST: COMPUTES 'IS' USING (3.2.5)
C     AND FORMS FREQUENCY DISTRIBUTION IN
C     ARRAY 'IF'
C
      SUBROUTINE DIST(III,IF,ICENS,M,NN)
      COMMON ISUMX(11000,15)
      DIMENSION IF(100),ICENS(50)
      M1=M-1
      DO 20 K=1,III
      IS=0
      DO 10 J=1,M1
      IS=IS+ICENS(J)*ISUMX(K,J)
   10 CONTINUE
      IS=IS+NN
      IF(IS)=IF(IS)+1
   20 CONTINUE
   30 FORMAT(2I10)
      RETURN
      END
and by several other authors. A transplant patient will
receive a transplant if he survives until a donor is found.
Records for a patient who does receive a heart include
date of admission to the program, date of transplant and
the date of death if a death occurred. Survival times of
those who receive a heart will in some cases be right
censored. This would occur if analysis was carried out on
a date on which some recipients were still alive, or if
contact was lost with a patient at a time when he was known
to be alive, or if a patient died from some cause not
related to heart function. It seems reasonable in this
example to assume that the censorship time is not related
to the survival time except perhaps in a case where a
patient may have chosen to drop out due to some features of
heart performance.
Special techniques are needed for analyzing censored
data. Many of the available nonparametric methods for these
problems are given in recent textbooks by Miller (1981),
Kalbfleisch and Prentice (1980), Lawless (1981) and
Elandt-Johnson and Johnson (1980). The particular problem tackled
in this study is now described.
2.4 The Miller Method
For simplicity of description we now take (3.1) with
p = 1, but note that all the methods of this chapter do apply
to multiple regression.
The Miller method is seen to be a natural modification
of ordinary least squares by writing the residual sum of
squares for the uncensored case in the following way:
$$\sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2 = n \int_{-\infty}^{\infty} z^2(\beta)\, dF_n(z),$$
where
$$z_i(\beta) = y_i - \alpha - \beta x_i, \tag{4.1}$$
and $F_n$ is the empirical distribution function of $z_i(\beta)$,
$i = 1,\ldots,n$. In the censored case some $y_i$ and hence some
$z_i$ will be censored values. Miller's (1976) modification of
the least squares method is to estimate $\beta$ by minimizing
$$\int z^2(\beta)\, d\hat{F}(z) \tag{4.2}$$
where $\hat{F}(z)$ is an estimate of the distribution function of z
based on the observations $(z_i, \delta_i)$, $i = 1,2,\ldots,n$.
For ordered censored and uncensored observations,
$z_{(1)} < z_{(2)} < \cdots < z_{(n)}$, the product-limit (PL) estimator
for F(z) introduced by Kaplan and Meier (1958) is
$$1 - \hat{F}(z) = \prod_{\{i:\ z_{(i)} \le z,\ \delta_{(i)} = 1\}} \frac{n-i}{n-i+1}. \tag{4.3}$$
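The product-limit estimator can be sketched in Python as below. The sketch assumes distinct z-values and evaluates the estimate only at the observed points; the function name `product_limit` is hypothetical and the code is an illustration, not Miller's or the author's program.

```python
def product_limit(z, delta):
    """Return [(z_(i), F_hat(z_(i)))] in increasing order of z.

    delta[k] is 1 for an uncensored value and 0 for a censored one.
    Each uncensored value of rank i multiplies the survival estimate
    by (n - i) / (n - i + 1); censored values leave it unchanged.
    """
    n = len(z)
    order = sorted(range(n), key=lambda k: z[k])
    survival = 1.0
    estimate = []
    for rank, k in enumerate(order, start=1):
        if delta[k] == 1:
            survival *= (n - rank) / (n - rank + 1)
        estimate.append((z[k], 1.0 - survival))
    return estimate


if __name__ == "__main__":
    for point, value in product_limit([3, 1, 2, 4], [1, 1, 0, 1]):
        print(point, round(value, 3))
```

With no censoring the jumps are all 1/n, so the estimate reduces to the empirical distribution function.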
A2: $\Pr(C_i \le c \mid x = x_i) = G_{x_i}(c) = G_0(c - \beta' x_i)$,
where $G_0$ is some distribution function and $G_{x_i}$ is the
distribution function for the censoring variable at $x_i$.
Assumption A2 imposes a dependence between $C_i$ and $x_i$. The
distribution shift given by A2 ensures that the probability
distribution of $\delta_i$ is independent of $x_i$. To see this
we have
$$\begin{aligned}
\Pr(\delta_i = 1 \mid x = x_i) &= \Pr(T_i \le C_i \mid x = x_i) \\
&= \int \Pr(T_i \le c \mid x = x_i,\ C_i = c)\, dG_0(c - \beta' x_i) \\
&= \int \Pr(T_i \le c \mid x = x_i)\, dG_0(c - \beta' x_i) \\
&= \int F(c - \alpha - \beta' x_i)\, dG_0(c - \beta' x_i) \\
&= \int F(u - \alpha)\, dG_0(u),
\end{aligned}$$
which is independent of $x_i$.
An alternative to A2 is
A3: $G_{x_i}(c) = G_0(c)$.
Thus under A3 the censoring distribution does not depend
on $x_i$.
These assumptions will come into consideration throughout
what follows. We now look at three methods of estimating
the parameters of (3.1).
Thus under $H_0: \beta = b$ and assumption A2, the conditional
distribution of a random W(b), conditional on its observed
value w(b), is uniform on $\{g_p(w(b)) : p \in P_n\}$. Using the
superscript c to indicate that the result is conditional we
have that the conditional null distribution of W(b) is
$$\Pr{}^c\big(W(b) = g_p(w(b))\big) = \frac{1}{n!}, \quad p \in P_n. \tag{3.3}$$
The conditional discrete null distribution of S(b) can thus
be determined by computing $S(g_p(\cdot))$ for each equally likely
$g_p(\cdot)$. A conservative test of $H_0: \beta = b$ at nominal level $\alpha$ is
then given by
$$\phi(w(b)) = \begin{cases} 1 & S(w(b)) \ge s_u(b) \text{ or } S(w(b)) \le s_\ell(b), \\ 0 & \text{otherwise,} \end{cases} \tag{3.4}$$
where $\phi(w(b))$ is the probability of rejecting $H_0$ when
W(b) = w(b),
$$s_u(b) = \operatorname{Inf}\{s : \Pr{}^c(S(b) \ge s \mid \beta = b) \le \alpha/2\}$$
and
$$s_\ell(b) = \operatorname{Sup}\{s : \Pr{}^c(S(b) \le s \mid \beta = b) \le \alpha/2\}.$$
The test given by (3.4) defines a conservative $100(1-\alpha)\%$
confidence set for $\beta$:
$$CS_\beta = \{b : s_\ell(b) < S(b) < s_u(b)\} \tag{3.5}$$
consisting of the set of all "acceptable" values of b.
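A minimal Python sketch of how the cut-off values in (3.4) might be read off an enumerated permutation distribution is given below. It assumes that the infimum and supremum are taken over attainable values of S, which is an interpretation rather than something stated in the text; the names `cutoffs` and `accepted` are hypothetical, and `None` is used when no attainable value has a small enough tail probability.

```python
from fractions import Fraction
from itertools import permutations


def sgn(t):
    return (t > 0) - (t < 0)


def s_stat(x, delta_ordered):
    """S for x-values listed in increasing order of z (distinct z assumed)."""
    n = len(x)
    return sum(delta_ordered[j] * sgn(x[i] - x[j])
               for j in range(n - 1) for i in range(j + 1, n))


def cutoffs(values, alpha):
    """Return (s_l, s_u) from equally likely values of S; alpha is a Fraction."""
    total = len(values)
    support = sorted(set(values))
    s_u = next((s for s in support
                if Fraction(sum(v >= s for v in values), total) <= alpha / 2), None)
    s_l = next((s for s in reversed(support)
                if Fraction(sum(v <= s for v in values), total) <= alpha / 2), None)
    return s_l, s_u


def accepted(s_obs, s_l, s_u):
    """True if b is 'acceptable', i.e. s_l < S(b) < s_u (a missing bound never rejects)."""
    return (s_l is None or s_obs > s_l) and (s_u is None or s_obs < s_u)
```

Exact fractions are used for the tail probabilities so that a comparison with $\alpha/2$ is not affected by rounding.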
Kalbfleisch, J. D., and R. L. Prentice. (1980). The
Statistical Analysis of Failure Time Data. Wiley,
New York.
Kaplan, E. L., and P. Meier. (1958). Nonparametric estimation
from incomplete observations. J. Am. Stat. Assoc.,
53, 457-481.
Kendall, M. G. (1970). Rank Correlation Methods, 4th Ed.
Griffin, London.
Kendall, M. G., and A. Stuart. (1973). The Advanced Theory
of Statistics, Vol. 2. Griffin, London.
Koul, H., V. Susarla, and J. Van Ryzin. (1981). Regression
analysis with randomly right-censored data. Ann.
Statist., 9, 1276-1288.
Lawless, J. F. (1981). Statistical Models and Methods for
Lifetime Data. Wiley, New York.
Leavitt, S. S., and R. A. Olshen. (1974). The insurance
claims adjuster as patients' advocate: quantitative
impact. Report for Insurance Technology Company,
Berkeley, California.
Lehmann, E. L. (1959). Testing Statistical Hypotheses.
Wiley, New York.
Mann, N. R., R. E. Schafer, and N. D. Singpurwalla. (1974).
Methods for Statistical Analysis of Reliability and
Life Data. Wiley, New York.
Mantel, N., and W. Haenszel. (1959). Statistical aspects
of the analysis of data from retrospective studies of
disease. Journal of the National Cancer Institute,
22, 719-748.
Miller, R. G. (1976). Least squares regression with
censored data. Biometrika, 63, 449-464.
Miller, R. G. (1981). Survival Analysis. Wiley, New York.
Miller, R. G., and J. Halpern. (1982). Regression with
censored data. Biometrika, 69, 521-531.
Nelson, W. (1981). Applied Life Data Analysis. Wiley,
New York.
Oakes, D. (1977). The asymptotic information in censored
survival data. Biometrika, 64, 441-448.
Peto, R. (1972). Discussion on paper by D. R. Cox.
J. R. Stat. Soc. B, 34, 205-207.
CONTINUATION OF LISTING FOR APPENDIX A
      III=III+1
      CALL XCALC(XP,M,III,X)
      GO TO 370
  375 XP(M-5)=6
      DO 360 K5=1,6
      CALL SWITCH(K5,7,XP,M)
      IF(M .NE. 6) GO TO 365
      III=III+1
      CALL XCALC(XP,M,III,X)
      GO TO 360
  365 XP(M-6)=7
      DO 350 K6=1,7
      CALL SWITCH(K6,8,XP,M)
      IF(M .NE. 7) GO TO 355
      III=III+1
      CALL XCALC(XP,M,III,X)
      GO TO 350
  355 XP(M-7)=8
      DO 340 K7=1,8
      CALL SWITCH(K7,9,XP,M)
      IF(M .NE. 8) GO TO 345
      R=RNDMF(1.0)
      IF(R .LT. .75) GO TO 340
      III=III+1
      CALL XCALC(XP,M,III,X)
      GO TO 340
  345 XP(M-8)=9
      DO 330 K8=1,9
      CALL SWITCH(K8,10,XP,M)
      R=RNDMF(1.0)
      IF(R .LT. .972) GO TO 330
      III=III+1
      CALL XCALC(XP,M,III,X)
  330 CONTINUE
      CALL SHUNT(8,XP,M)
  340 CONTINUE
      CALL SHUNT(7,XP,M)
  350 CONTINUE
      CALL SHUNT(6,XP,M)
  360 CONTINUE
      CALL SHUNT(5,XP,M)
  370 CONTINUE
      CALL SHUNT(4,XP,M)
  380 CONTINUE
      CALL SHUNT(3,XP,M)
  390 CONTINUE
      CALL SHUNT(2,XP,M)
  400 CONTINUE
      GO TO 699
  600 M=N
      DO 610 K=1,10000
      DO 620 I=1,M
  620 IA(I)=I
      DO 630 I=1,M
      MM=M-I+1
      R=RNDMF(1.0)
      JJ=INT(R*MM)+1
      XP(I)=IA(JJ)
      IA
  630 CONTINUE
      III=III+1
      CALL XCALC(XP,M,III,X)
  610 CONTINUE
  699 CONTINUE
COMMENT 3 COMPUTE B'S
      L=0
      DO 35 I=1,N
      DO 40 J=1,N
      IF(X(I) .LE. X(J)) GO TO 40
      L=L+1
C
C
C
CONTINUATION OF LISTING FOR APPENDIX A
      DO 120 K=1,NNN
      KCOU=KCOU+IF(K)
      PTU=FLOAT(KCOU)/FLOAT(ITOT)
      IF(PTU .GT. XXX) GO TO 130
  120 CONTINUE
  130 ISCUT=K-NN
      KCOU=KCOU-IF(K)
      PTU=FLOAT(KCOU)/FLOAT(ITOT)
      GO TO 78
  150 BUP=B(J)
  151 J=ISUB-1
      ISCUT=10
  140 ICH=0
  142 IF(B(J) NE. BGO TO 146
      IF(C(J) EQ. DICH1
      1
      GO TO 142
  146 IF(ISCUT .EQ. 10) GO TO 155
      IF((C(J) .EQ. 1) .OR. (ICH .EQ. 1)) GO TO 155
  148 IF(S
      IF(J .NE. 2) GO TO 143
      BLOW=-9999.0
      GO TO 221
  143 J=J-1
      GO TO 140
  155 BB=(B(J)+B(J-1))/2.0
      DO 160 K=1,N
  160 Z(K)=Y(K)-BB*X(K)
      CALL RANK(Z,IC,ICENS,N)
      DO 170 K=1,NNN
  170 IF(K)=0
      CALL DIST(III,IF,ICENS,N,NN)
  175 ITOT=0
      DO 180 K=1,NNN
  180 ITOT=IF(K)+ITOT
      KCOU=0
      DO 190 K=1,NNN
      KCOU=KCOU+IF(NNN-K)
      PTL=FLOAT(KCOU)/FLOAT(ITOT)
      IF(PTL .GT. XXX) GO TO 200
  190 CONTINUE
  200 ISCUT=NN-K
      KCOU=KCOU-IF(NNN-K)
      PTL=FLOAT(KCOU)/FLOAT(ITOT)
      GO TO 148
  220 BLOW=B(J)
  221 CONTINUE
      WRITE(6,230)BPNT
      WRITE(6,232)
  232 FORMAT(' ')
      WRITE(6,231)IPC,BLOW,BUP
  230 FORMAT(' POINT ESTIMATE ',F10.2)
  231 FORMAT(I5,' PERCENT INTERVAL ',F6.2,' ',F6.2)
      STOP
      END
C     SUBROUTINES SHUNT AND SWITCH: USED IN
C     GENERATION OF ALL PERMUTATIONS.
      SUBROUTINE SHUNT(J,B,M)
      DIMENSION B(10)
      DO 10 I=1,J
   10 B(M-I+1)=B(M-I)
      RETURN
      END
      SUBROUTINE SWITCH(K,J,B,M)
      DIMENSION B(10)
      IF(K .EQ. 1) GO TO 10
      SAVE=B(M-J+K)
      B(M-J+K)=B(M-J+1+K)
      B(M-J+1+K)=SAVE
   10 RETURN
      END
CONTINUATION OF LISTING FOR APPENDIX B
  575 K8=XP(8)+1
      NM9=N-M+9
      DO 580 J9=K8,NM9
      XP(9)=J9
      IF(M .NE. 9) GO TO 585
      III=III+1
      CALL SAM(XP,III,M,N)
      GO TO 580
  585 K9=XP(9)+1
      NM10=N-M+10
      DO 590 J10=K9,NM10
      XP(10)=J10
      III=III+1
      CALL SAM(XP,III,M,N)
  590 CONTINUE
  580 CONTINUE
  570 CONTINUE
  560 CONTINUE
  550 CONTINUE
  540 CONTINUE
  530 CONTINUE
  520 CONTINUE
  510 CONTINUE
  500 CONTINUE
C     SUBROUTINE SAM: FORMS ISUMX, THE MATRIX
C     OF VALUES WHERE ROWS CORRESPOND TO
C     DIFFERENT CONFIGURATIONS OF 0'S AND 1'S
C     AND COLUMNS ARE AS IN ISUMX IN SMALL SAMPLE
C     SUBROUTINE XCALC.
      SUBROUTINE SAM
      COMMON ISUMX(11000,15)
      DIMENSION K(10),IX(30)
      DO 10 J=1,N
   10 IX(J)=0
      DO 20 J=1,M
   20 IX(K(J))=1
      ITOT=0
      N1=N-1
      DO 30 J=1,N1
      JJ=N-J
      ITOT=IX(JJ+1)+ITOT
      ISUMX(III,JJ)=ITOT-IX(JJ)*J
   30 CONTINUE
      RETURN
      END
2.3 The Linear Model
We now state the structure, notation and possible
assumptions for a censored linear model that will be
referred to throughout the sequel. Thus,
$$T_i = \alpha + \beta' x_i + e_i, \quad i = 1,\ldots,n, \tag{3.1}$$
where $\alpha$ is a parameter, $\beta$ a p x 1 vector of regression
parameters and $x_i$ a p x 1 vector of known covariables. The
error terms $e_i$ are independent and identically distributed
(i.i.d.) with common distribution function $F(\cdot)$ and $E(e_i) = 0$.
The response $T_i$ may of course be a transformation, possibly
log, of some other response of interest. The $T_i$ are not
all observable. Let $C_i$, $i = 1,\ldots,n$, be independent censoring
variables and $(Y_i, \delta_i)$ be observable pairs of variables
where
$$Y_i = \operatorname{Min}(T_i, C_i)$$
and
$$\delta_i = \begin{cases} 1 & \text{if } T_i \le C_i, \\ 0 & \text{if } T_i > C_i. \end{cases}$$
Attention must be given to three possible assumptions.
A fundamental assumption imposed throughout is
A1: $C_i$ are independent, $i = 1,\ldots,n$.
A second, which is essential for the validity of some of
the following methods is
referred to as T1 through T4 were used for the distribution
of e in generating T-values. These were
T1: $e \sim N(0, 100)$,
T2: $e \sim$ Double exponential, i.e.,
$f(e) = \frac{1}{2\gamma} \exp(-|e|/\gamma)$, $-\infty < e < \infty$,
with $\gamma = 7.07$,
T3: $e \sim$ Shifted exponential, i.e.,
$f(e) = \frac{1}{\gamma} \exp(-(e+10)/\gamma)$, $-10 < e < \infty$,
with $\gamma = 10$,
and
T4: $e \sim$ Uniform$(-a, a)$
with a = 17.32.
Such choice permits comparisons over heavy tailed, light
tailed and nonsymmetric distributions. The variance of
e is 100 in each case in keeping with the scale of the
models used by Buckley and James.
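That each choice gives variance 100 (to the rounding of the constants) can be checked from the standard variance formulas. The short Python sketch below does this; the function names are hypothetical and the formulas are the textbook ones for the double exponential, exponential and uniform distributions, not taken from the thesis.

```python
def var_double_exponential(gamma):
    """Variance of the density exp(-|e|/gamma) / (2*gamma)."""
    return 2.0 * gamma ** 2


def var_shifted_exponential(gamma):
    """Variance of an exponential with scale gamma; a location shift leaves it unchanged."""
    return gamma ** 2


def var_uniform(a):
    """Variance of the Uniform(-a, a) distribution."""
    return a ** 2 / 3.0


if __name__ == "__main__":
    print("T2:", var_double_exponential(7.07))
    print("T3:", var_shifted_exponential(10.0))
    print("T4:", var_uniform(17.32))
```

For T3 the shift of 10 equals the scale $\gamma$, which is what makes the mean of e zero.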
Three different censoring mechanisms were imposed on
sets of T-values simulated using each of the distributions
T1 - T4. These were
C1: $C_i \sim$ Uniform$(\beta x_i + a,\ \beta x_i + b)$
where a and b are constants chosen to control the censoring
rate
Table 11. Percentage coverage of confidence intervals based
on 200 simulations with sample size n = 25.

Data       LS      N       1      BJN     BJT    % Censored
T1,C1     92.5    91.5    90.5    91.0    91.5       44
T2,C1     91.0    91.5    91.5    91.5    95.5       50
T3,C1     91.5    90.5    88.5    86.0    89.0       38
T4,C1     87.5    87.0    87.0    87.4    90.0       44
T1,C2     92.0    87.0    88.5    86.9    91.5       46
T2,C2     89.5    88.0    85.5    87.4    90.0       46
T3,C2     84.5    91.0    88.5    79.0    82.0       47
T4,C2     90.5    89.0    90.0    89.5    91.5       47
T1,C3     94.0    94.0    71.0    83.4    87.4       43
T2,C3     92.5    94.0    64.0    88.3    90.8       51
T3,C3     87.5    92.0    69.5    81.5    84.5       38
T4,C3     89.0    88.5    77.5    78.5    82.0       32

Nominally intervals were 90%. Standard deviations for the
table entries are approximately 2.1.
CHAPTER ONE
INTRODUCTION
1.1 The Concept of Censoring
A good deal of research data do not conform to optimal
statistical designs and methods. A common departure from
the ideal is the "missing data" problem. Another departure,
common in certain contexts, is the occurrence of what we
might think of as partial information. For example, in
medical follow-up, the time from an initial treatment to a
particular event, such as "relapse" or "death," would not
be available if contact with the patient was lost. A "loss"
would often be due simply to a decision to analyze the data
at a certain time regardless of whether all patients had
undergone the particular event. Thus, for those patients
alive at the time of analysis the time from treatment to
death would not be known; but it would be known to be greater
than their observed survival time. Such partial data are
termed in general, censored data, and in this example where
the observed value is a lower bound for the "true" value it
is termed a right-censored value. We define this and other
types of censoring more formally later.
The terms "lifetime" and "survival time" are natural
in a medical context or anywhere where time is the response
Table 8. Percentage coverage of confidence intervals based
on 200 simulations with sample size n = 7.

Data       LS      N       1      BJN     BJT    % Censored
T1,C1     85.5    93.0    88.5    81.9    93.8       38
T2,C1     90.5    95.0    90.5    80.6    92.2       43
T3,C1     90.0    93.0    90.0    77.8    89.7       37
T4,C1     91.0    98.0    92.0    84.2    95.9       39
T1,C2     92.0    96.0    89.0    80.6    93.0       41
T2,C2     89.5    93.5    91.0    88.4    95.6       40
T3,C2     88.0    91.5    93.5    81.4    95.1       41
T4,C2     87.5    93.0    85.0    76.8    89.7       42
T1,C3     90.5    93.0    87.0    83.2    92.6       38
T2,C3     90.5    92.0    80.0    79.9    91.3       43
T3,C3     89.5    93.0    81.0    84.1    91.3       37
T4,C3     88.5    93.5    82.5    79.8    90.7       32

Nominally intervals were 90% except for method N where
the setting was 88%. Standard deviations for the table
entries are approximately 2.1.
The Cox approach to regression with censored data is
through the proportional hazards model, while the other
methods we describe assume a direct linear relationship
between a response variable, possibly transformed, and one
or more covariables.
2.2 The Cox Method
The proportional hazards model proposed by Cox (1972)
is
$$\lambda(t; x) = \lambda_0(t)\, e^{\beta' x} \tag{2.1}$$
where $\lambda(t; x)$ is the hazard function corresponding to the
covariate value x. The vector of regression coefficients
is $\beta$ and the baseline $\lambda_0(t)$ is the hazard rate when x = 0.
The model implies that distribution functions F(t; x) form
a family of Lehmann alternatives and thus that, in terms of
survival functions,
$$P(t; x) = \{P_0(t)\}^{\gamma}, \tag{2.2}$$
where $\gamma = e^{\beta' x}$ (Miller, 1981, p. 120). The generalization
of (1.3) is
$$P(t; x) = \exp\Big\{-e^{\beta' x} \int_0^t \lambda_0(u)\, du\Big\}. \tag{2.3}$$
In the Cox analysis, $\lambda_0(t)$ remains an arbitrary
nuisance function in the estimation of $\beta$, but we must
estimate $\lambda_0(t)$ in order to make inference about the survival
and think in terms of a test at some level a and b' a
rejected value. Then Pk_j_ < a/2 and it is clear that b"
cannot be accepted at the level a provided p < p .
JC K x
Therefore a sufficient condition for the confidence set
(3.5) to be a single interval is that
Pk < pkl k such that
Pk_^ < pk k such that sk > 0.
(3.11)
We now argue that the condition (3.11) will be satis
fied by the conditional null distributions of S. Because
S(b) is nonincreasing in b (Lemma 2.6) condition (3.11) is
satisfied whenever
S
k1
We now consider what happens when the distribution of
S(b) does change. We treat first the case where there are
no tied $b_{ij}$ values at $b_k$, and extend the result to the tied
case later. By Lemma 3.8(ii) our concern is with the case
where $c_{ij} = 1$. For this case just one censored and one
uncensored value of z(b) change rank at the critical $b_{ij}$ and
such interchange is between adjacent ranks (cf., 2.12).
Furthermore, since S(b) is nonincreasing, after such a
rank interchange the higher rank will be associated with
the lower x-value and S(b) will have reduced by one. Without
loss of generality assume $x_1 < x_2 < \cdots < x_n$ and note that there
Table 6. Arithmetic means ($\times 10^3$) based on 200 simulations of slope estimates for
samples of size n = 25.

              %                                                                   Number of
Data       Censored      LS             N              1              BJ         BJ Estimates
T1,C1         44      192  (7.1)    185  (8.9)    185  (9.2)     190  (8.7)         200
T2,C1         50      198  (7.4)    194  (9.2)    192  (10.7)    194  (9.5)         199
T3,C1         38      209  (8.1)    211  (5.9)    213  (7.1)     215* (7.4)         200
T4,C1         44      202  (7.7)    206  (9.5)    209  (10.5)    207  (9.2)         199
T1,C2         46      200  (7.1)    196  (10.5)   168* (10.0)    193  (9.5)         199
T2,C2         46      180* (8.4)    185  (10.0)   157* (10.7)    178* (10.3)        199
T3,C2         47      193  (8.1)    195  (6.7)    175* (7.4)     194  (8.1)         200
T4,C2         47      194  (7.7)    186  (11.0)   160* (10.3)    188  (10.3)        199
T1,C3         43      196  (7.4)    205  (8.4)    083* (6.7)     198  (8.7)         199
T2,C3         51      194  (7.1)    210  (8.4)    052* (8.4)     195  (9.2)         196
T3,C3         38      200  (8.4)    197  (4.5)    135* (3.2)     196  (4.5)         200
T4,C3         32      188  (8.1)    188  (10.3)   102* (6.3)     187  (9.5)         200

*Indicates that the mean is more than 2 standard deviations from the nominal
$\beta = 0.2$. Estimated standard deviations ($\times 10^3$) of the means are given in
parentheses.
Copyright 1983
by
Michael J. Ireson
Overall, the coverage by method N was never significantly
low, whereas at least one significantly low coverage
occurred for each of the methods 1, BJN and BJT.
v) Length of confidence interval. Confidence intervals
are of course most meaningfully compared in terms of
length when they attain much the same coverage. Table 12
shows mean interval lengths and estimates of their standard
deviations for a series of simulations with sample size
n = 7. An interval length was included in the computation
only if all methods provided a finite length interval for
the particular data set. Thus for each data type the number
of interval lengths averaged is the same for all methods.
As discussed earlier, highly censored data sets are the ones
which tend to not give an interval, particularly for n = 7.
Therefore the results on interval length for n = 7 should
be viewed as applying to data with censoring less than that
indicated in Table 8.
Results for method BJN are not included in Table 12.
That method did give consistently shorter intervals than
method N but according to results in Table 8 is not competitive
in terms of coverage. Where coverages are comparable
the indications are that method N gives, on average, shorter
intervals than both BJT and 1. With nominally 40% censoring
the intervals by method N turn out to be about twice the
length that would be obtained by LS had the data been
complete. The column headed BJT-N in Table 12 gives the
function. We first describe the estimation of $\beta$. Cox
argued that the only times at which $\lambda(t; x)$ in (2.1) could
not conceivably be zero are the observed uncensored survival
times. Thus, with $\lambda_0(t)$ arbitrary the only times at which
$\lambda_0(t)$ could not be zero are those uncensored times and
those times are the only ones giving information about $\beta$.
This stance led Cox to seek a likelihood by conditioning
on the observed uncensored survival times, and then to
treat that likelihood as an ordinary likelihood.
Suppose there are no ties and that the censored and
uncensored observations are ordered
$$y_{(1)} < y_{(2)} < \cdots < y_{(n)}.$$
Associated with each $y_{(i)}$ is a censoring indicator $\delta_{(i)}$ and
covariate vector $x_{(i)}$. Let $R_{(i)}$ denote the "risk set" for
each time $y_{(i)}$; i.e.,
$$R_{(i)} = \{j : y_j \ge y_{(i)}\}.$$
Using (2.1) and the interpretation given at (1.2) we see
that for each uncensored time $y_{(i)}$,
$$\Pr\{\text{death in } [y_{(i)},\ y_{(i)} + dy) \mid R_{(i)}\} = \sum_{j \in R_{(i)}} e^{\beta' x_j}\, \lambda_0(y_{(i)})\, dy,$$
and
nij (b)
(t)
(t)
(b)Z^ (b) )
t > 0,
t < 0,
t > Or
t < 0.
^(Z. (b)Zi(b))>,
For convenience the arguments for $Z_i(\cdot)$ and $S(\cdot)$ will be
suppressed for general reference. To emphasize the dependence
on b, S(Z(b), x) will often be abbreviated to S(b).
Notice that S is the number of "definitely concordant" pairs
minus the number of "definitely discordant" pairs and thus
has some appeal as a measure of trend among the pairs
$(Z_i^{(+)}, x_i)$. Values of S close to zero indicate no trend.
Two other forms of the statistic S will be found useful
in the sequel.
i) In the special case of distinct Z-values (i.e.,
$Z_i(b) \ne Z_j(b)$ for $i \ne j$) and ordered distinct x-values,
$x_1 < x_2 < \cdots < x_n$,
$$S(b) = \sum_{j=1}^{n-1} \sum_{i=j+1}^{n} \{\psi(Z_i(b) - Z_j(b))(\delta_i + \delta_j) - \delta_i\}. \tag{2.4}$$
ii) If Z-values are distinct and ordered,
$Z_1 < Z_2 < \cdots < Z_n$,
$$S(b) = \sum_{j=1}^{n-1} \sum_{i=j+1}^{n} \delta_j \operatorname{sgn}(x_i - x_j), \tag{2.5}$$
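The agreement between the "definitely concordant minus definitely discordant" description and form (2.5) can be checked numerically. The following Python sketch assumes distinct Z-values and distinct x-values; the names `s_pairwise` and `s_ordered` and the small data set are hypothetical illustrations.

```python
def sgn(t):
    return (t > 0) - (t < 0)


def s_pairwise(z, delta, x):
    """Concordant minus discordant pairs, counting only pairs whose
    ordering is definite (the smaller z of the pair is uncensored)."""
    n = len(z)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            if z[i] == z[j]:
                continue  # ties are outside the scope of this sketch
            low, high = (i, j) if z[i] < z[j] else (j, i)
            if delta[low] == 1:
                total += sgn(x[high] - x[low])
    return total


def s_ordered(z, delta, x):
    """Form (2.5): order by z, then sum delta_j * sgn(x_i - x_j) over i > j."""
    order = sorted(range(len(z)), key=lambda k: z[k])
    d = [delta[k] for k in order]
    xs = [x[k] for k in order]
    n = len(z)
    return sum(d[j] * sgn(xs[i] - xs[j])
               for j in range(n - 1) for i in range(j + 1, n))


if __name__ == "__main__":
    z, delta, x = [5, 7, 6, 9], [1, 0, 1, 1], [1, 2, 3, 4]
    print(s_pairwise(z, delta, x), s_ordered(z, delta, x))
```

Both functions return the same value because a pair contributes only when its smaller Z-value is uncensored, which is exactly the role of $\delta_j$ in (2.5).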
Table 9. Percentage coverage of confidence intervals based
on 200 simulations with sample size n = 10.

Data       LS      N       1      BJN     BJT    % Censored
T1,C1     86.5    91.5    87.5    87.8    92.9       44
T1,C2     92.5    94.5    89.0    85.7    93.1       44
T1,C3     90.0    90.5    80.0    79.0    86.7       41

Nominally intervals were 90% except for method N where
the setting was 88%. Standard deviations for the table
entries are approximately 2.1.
where
$$\operatorname{sgn}(t) = \begin{cases} 1 & t > 0, \\ 0 & t = 0, \\ -1 & t < 0. \end{cases}$$
Here the dependence on b is through the ordering of
Z-values and consequent relabelling of $\delta$'s and x's.
An important property of S is stated in the following
Lemma.
Lemma 2.6. The statistic S(b) is nonincreasing in b.
Proof: $x_i > x_j$ implies that $Z_i(b) - Z_j(b) = Y_i - Y_j - b(x_i - x_j)$
is a decreasing function of b. This, together with the
fact that $\psi(t)$ is nondecreasing in t, implies that $\eta_{ij}(b)$
is nonincreasing in b for each pair (i, j) with $x_i > x_j$,
which completes the proof.
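Lemma 2.6 can be illustrated numerically by evaluating S(b) over a grid of b values. The Python sketch below is an illustration only: the function name `s_of_b` and the data are hypothetical, and numerically tied residuals are given a contribution of zero, which is a simplification of the tie convention used in the thesis.

```python
def s_of_b(y, delta, x, b):
    """S(b) computed from residuals z_i = y_i - b * x_i.

    For each pair with x_i > x_j the contribution is +1 if z_i is
    definitely larger (z_i > z_j with z_j uncensored) and -1 if it is
    definitely smaller (z_i < z_j with z_i uncensored).
    """
    z = [yi - b * xi for yi, xi in zip(y, x)]
    n = len(z)
    total = 0
    for i in range(n):
        for j in range(n):
            if x[i] > x[j]:
                if z[i] > z[j] and delta[j] == 1:
                    total += 1
                elif z[i] < z[j] and delta[i] == 1:
                    total -= 1
    return total


if __name__ == "__main__":
    y, delta, x = [5, 7, 6, 9], [1, 0, 1, 1], [1, 2, 3, 4]
    for k in range(-12, 13, 4):
        print(k / 4, s_of_b(y, delta, x, k / 4))
```

The printed values step downward as b increases, which is the behaviour that makes a sign change of S(b) usable for locating an estimate.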
As noted earlier any solution to the equation S(b) = 0
is a reasonable estimate for $\beta$. An exact solution might
not exist but Lemma 2.6 ensures that we can define a unique
point estimator in the following way.
Definition 2.7. Let $\hat{\beta}_L = \operatorname{Sup}\{b : S(b) > 0\}$
and $\hat{\beta}_U = \operatorname{Inf}\{b : S(b) < 0\}$,
then a unique point estimator for $\beta$ is
This is readily seen to be obtainable as a special case
of (2.3) by using that expression with
x
i
b =
observation from population 1
observation from population 2
and labelling x's such that
i = 1,. . ,n.
x. =
i
:
i = n^+1,...,n,
where n = n^ + n^. The summation at (2.3) then becomes
n.
1 n
S (A) = Z L {6 .
i=l j=nx+l D 1 D 1 3 1
and this relabels to the expression for W at (5.2) by denoting
li
= z .
i
li
= 5.
i
c
w
i1
ll
H
2j
= z.
i
2j
= 6.
i = n.+1,..
i
1
j = i ~ n.
Finding a confidence interval for $\Delta$ by inverting Gehan's
statistic (5.2) is therefore an equivalent procedure to
finding a confidence interval by inverting S(b) at (2.3).
C2: $C_i$ has the same density for all i given by
$$f(c) = \frac{\lambda e^{-\lambda(d-c)}}{1 - e^{-\lambda d}}, \quad 0 < c < d,$$
where $\lambda$ and d are constants chosen to control the censoring
rate, and
C3: $C_i = r$,
where r is a specified constant.
The type of censoring contrived for the form C2 is
motivated by consideration of a clinical trial of fixed
duration d with time of entry in to the trial random at
constant rate $\lambda$. All censoring mechanisms satisfy assumption
A1 of Section 2.3. Only C1 satisfies assumption A2.
Departures from A2 will be moderate for C2 and quite severe
for C3.
In all simulations reported here the constants in
C1, C2 and C3 were fixed to give a nominal 40% censoring in
the data. The average censoring rate attained for each type
of data is given in the following tables. Any generated
data sets having less than 3 uncensored values were discarded
(see 2.5.6). As examples of the censoring patterns that can
arise using C1, C2, C3, Table 3 gives the number of censored
observations occurring at x = 40(10)100 for 2000 simulations
of 3 different types of data. The frequencies clearly indicate
the anticipated departures from assumption A2 with
censoring forms C2 and C3. The percentage censored column
$$C = 2 S(b) = \sum_i \sum_j \eta_{ij}\, x_{ij}$$
with $x_{ij} = \operatorname{sgn}(x_i - x_j)$. Clearly, $\eta_{ij} = -\eta_{ji}$ and $x_{ij} = -x_{ji}$.
Furthermore, since $|\eta_{ij}| \le 1$ and $|x_{ij}| \le 1$, the summations
$$\Big|\sum_{i,j,k} \eta_{ij}\eta_{ik}\Big| \le n^3 \quad \text{and} \quad \Big|\sum_{i,j,k} x_{ij}x_{ik}\Big| \le n^3$$
are of order $n^3$. We can therefore apply Daniels' results
and conclude for the conditional null distribution of
S(b) that
i) E(S(b)) = 0,
ii) $$\operatorname{Var}(S(b)) = [n(n-1)(n-2)]^{-1} \Big(\sum \eta_{ij}\eta_{ik} - \sum \eta_{ij}^2\Big)\Big(\sum x_{ij}x_{ik} - \sum x_{ij}^2\Big) + [2n(n-1)]^{-1} \Big(\sum \eta_{ij}^2\Big)\Big(\sum x_{ij}^2\Big) \tag{4.2}$$
where summation is over all subscripts from 1,...,n.
iii) S(b) is asymptotically Normal.
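The variance expression (4.2) can be sketched in Python as follows. This is an illustration and not a transcription of subroutines COEF and VAR: the names `eta` and `daniels_variance` are hypothetical, distinct z-values are assumed, and the score for a pair is taken to be +1 or -1 only when the ordering of the pair is definite.

```python
def sgn(t):
    return (t > 0) - (t < 0)


def eta(zi, di, zj, dj):
    """+1 if z_i is definitely larger than z_j, -1 if definitely smaller, else 0."""
    if zi > zj and dj == 1:
        return 1
    if zi < zj and di == 1:
        return -1
    return 0


def daniels_variance(z, delta, x):
    """Conditional null variance of S from (4.2); requires n >= 3."""
    n = len(z)
    e = [[eta(z[i], delta[i], z[j], delta[j]) for j in range(n)] for i in range(n)]
    c = [[sgn(x[i] - x[j]) for j in range(n)] for i in range(n)]

    def triple_sum(m):
        # sum over i, j, k of m_ij * m_ik equals the sum of squared row totals
        return sum(sum(row) ** 2 for row in m)

    def square_sum(m):
        return sum(v * v for row in m for v in row)

    first = ((triple_sum(e) - square_sum(e)) * (triple_sum(c) - square_sum(c))
             / (n * (n - 1) * (n - 2)))
    second = square_sum(e) * square_sum(c) / (2 * n * (n - 1))
    return first + second
```

With no censoring and no ties the expression reduces to the familiar n(n-1)(2n+5)/18 for Kendall's S, which gives a convenient check on the sketch.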
Some discussion of when this asymptotic approximation
might be appropriate in practice is given in Chapter Four.
3.5 The Two-Sample Location Problem
Gehan (1965) formulates a two-sample test for right-censored
data. An inversion of that test is now shown to
be equivalent to a special case of the method presented in
the previous sections of this chapter.
For populations 1 and 2 with distribution functions
$F_1(\cdot)$, $F_2(\cdot)$, respectively, Gehan tests
Figure 1. Graph of S(b) vs. b for Example 2.16.
two-sample problem with no ties and we let a single covariable
x take values 0 or 1 to indicate the sample. The Cox
test of $H_0: \beta = 0$ using the test statistic,
tb Log V01
V"rfL9 v
is then formally equivalent to the Mantel-Haenszel (1959)
test of equality of the two distribution functions.
Secondly, we can permit the covariable to be a function of
time; i.e., use x(t) in (2.1).
The textbook by Kalbfleisch and Prentice (1980) gives
a good account of the proportional hazard approach to
regression analysis with censored data and lists many useful
references.
The Cox analysis does not in general give a direct
relationship between T and x. Some indication of a direct
relationship can be seen using estimates of median lifetimes
using the survival function estimates at various settings
for the vector x. For one covariable the median could be
plotted to give a comparison with the results of a direct
regression analysis.
The performance of the Cox method and of the other
methods to be described later will be discussed in Section
2.7 and in Chapter Five. Next we define a linear model
and then describe three methods of analysis, all of which use
a form of least squares analysis.
Table 14. Differences in mean lengths of 90% confidence
intervals based on 200 simulations for sample
size n = 25.

            BJN                            N                            BJT
Data      Coverage      BJN-N          Coverage      BJT-N           Coverage
T1,C1       91.0     0.030* (.005)       91.5     0.010  (.006)        91.5
T2,C1       91.5     0.015* (.006)       91.5     0.069* (.007)        95.5
T3,C1       86.0     0.020* (.006)       90.5     0.045* (.007)        89.0
T4,C1       87.4    +0.000  (.006)       87.0     0.044* (.007)        90.0
T1,C2       86.9     0.023* (.005)       87.0     0.022  (.006)        91.5
T2,C2       87.4     0.004  (.007)       88.0     0.048* (.009)        90.0
T3,C2       79.0     0.011  (.006)       91.0     0.021* (.006)        82.0
T4,C2       89.5     0.030* (.006)       89.0     0.020* (.007)        91.5
T1,C3       83.4     0.119* (.007)       94.0     0.087* (.007)        87.4
T2,C3       88.3     0.052* (.009)       94.0     0.006  (.009)        90.8
T3,C3       81.5     0.069* (.004)       92.0     0.055* (.004)        84.5
T4,C3       78.5     0.151* (.007)       88.5     0.128* (.007)        82.0

*Indicates a difference greater than 2 s.d. Values in
parentheses are estimated standard deviations for the
differences.
CONTINUATION OF LISTING FOR APPENDIX C
C
C     SUBROUTINE VAR: COMPUTES ASYMPTOTIC
C     VARIANCE USING (3.4.2)
C
      SUBROUTINE VAR(Z, N, IC, COEF1, COEF2, ZNORM, ISCUT, VARS)
      DIMENSION Z(200), IC(200)
      REAL ISCUT
      INTEGER SUMSQB, SUMZ, SUMB
      SUMSQB=0
      SUMZ=0
      DO 50 I=1, N
      SUMB=0
      DO 40 J=1, N
C     THREE-WAY BRANCH ON THE SIGN OF Z(J)-Z(I)
      IF(Z(J)-Z(I)) 10, 20, 30
   10 SUMB=SUMB-IC(J)
      SUMSQB=SUMSQB+IC(J)
      GO TO 40
   30 SUMB=SUMB+IC(I)
      SUMSQB=SUMSQB+IC(I)
      GO TO 40
   20 SUMB=SUMB+IC(J)-IC(I)
      SUMSQB=SUMSQB+IC(J)-IC(I)
   40 CONTINUE
      SUMZ=SUMZ+SUMB*SUMB
   50 CONTINUE
      VARS=COEF1*(SUMZ-SUMSQB)+COEF2*SUMSQB
      ISCUT=1.*ZNORM*SQRT(VARS)
      RETURN
      END
The confidence set $CS_\beta$ will be conservative in the sense
$\Pr\{\beta \in CS_\beta\} \ge 1 - \alpha$, because of the use of discrete
distributions in defining the critical values at (3.4).
The determination of $CS_\beta$ for a given set of data is
complicated by the fact that the null distribution of S(b)
depends upon b (Corollary 3.7). The following set of Lemmas
characterizes this dependence.
Lemma 3.6. If the x-values are distinct then, for all b,
the permutation distribution of S(b) will be symmetric about
zero with support of the form p + 2q, q = 0, 1, 2, ..., r, for
some integers p, r. That is, the support values increment
by 2.
Proof: Consider the effect of associating the reverse order
of z-values with fixed x-values. From (2.3) we have
S(g (w(b))) = S(g (w(bm,
il H
for all $p \in P_n$ and $q \in \{q: q_1 = n - p_1 + 1, \ldots, q_n = n - p_n + 1\}$.
This proves the symmetry of S(b). Consider $z_i^{(+)}(b)$, i = 1,
..., n, associated with ordered x-values $x_1 < x_2 < \cdots < x_n$.
Let (i, j), i > j, denote arbitrary positions in the vector
of z-values. Suppose that after a permutation of z-values
the $z^{(+)}$'s from positions (i, j) have moved to positions
$(p_i, p_j)$, respectively. Notice that if
LISTING FOR APPENDIX C
C
C     THIS PROGRAM IMPLEMENTS THE NEW METHOD USING
C     THE LARGE SAMPLE RESULTS OF DANIELS (SECTION 3.4)
C
C     DATA INPUT:
C     SAMPLE SIZE, NORMAL VALUE (1.96 FOR 95% INTERVAL),
C     LOG OPTION (1 LOG10, 0 OTHERWISE) IN FORMAT I5, F5.0, I5.
C     X, Y, DELTA IN FORMAT 2F5.0, I5.
C
      DIMENSION X(200), Z(200), Y(200), IC(200), B(20000)
      INTEGER C(20000), S(20000)
      REAL ISCUT
      READ(5, 10) N, ZNORM, LOGT
   10 FORMAT(I5, F5.0, I5)
      DO 20 I=1, N
   20 READ(5, 30) X(I), Y(I), IC(I)
   30 FORMAT(2F5.0, I5)
      IF(LOGT .EQ. 0) GO TO 5
      DO 4 I=1, N
    4 Y(I)=LOG10(Y(I))
    5 CONTINUE
      DO 31 I=1, N
   31 WRITE(6, 29) X(I), Y(I), IC(I)
   29 FORMAT(2F20.8, I10)
C
C     COMPUTATIONS ON X FOR DANIELS' RESULTS (FROM 3.4.2)
C
      CALL COEF(X, N, COEF1, COEF2)
C
C     COMPUTE B'S
C
      L=0
      DO 35 I=1, N
      DO 40 J=1, N
      IF(X(I) .LE. X(J)) GO TO 40
      L=L+1
      B(L)=(Y(I)-Y(J))/(X(I)-X(J))
      C(L)=IC(I)+IC(J)
   40 CONTINUE
   35 CONTINUE
C
C     ORDER B'S, RE-ARRANGE C'S
C
      CALL ORDER(B, C, L)
      DO 36 J=1, L
      B(L+2-J)=B(L+1-J)
      C(L+2-J)=C(L+1-J)
   36 CONTINUE
      B(1)=B(2)-1.0
      B(L+2)=B(L+1)+1.0
C
C     COMPUTE STARTING VALUE S(1)
C
      DO 55 I=1, N
   55 Z(I)=Y(I)-B(1)*X(I)
      CALL COMP(X, N, IC, Z, IS)
      S(1)=IS
C
C     COMPUTE S'S SEQUENTIALLY
C
      KAV=0
      L1=L+1
      DO 60 I=2, L1
      S(I)=S(I-1)-C(I)
      IF((S(I-1)*S(I)) .LT. 0) ISUB=I
      IF(S(I) .NE. 0) GO TO 60
      ISUB2=I+1
      KAV=KAV+1
   60 CONTINUE
$$Y_i^* = \frac{\delta_i Y_i}{1 - G(Y_i)}, \qquad (6.1)$$
where G(·) is the right continuous distribution function
for the censoring variables. The $Y_i^*$ variables, being either
zero or inflated values of true survival times, are not
intuitively pleasing but we do have $E(Y_i^*) = \alpha + \beta x_i$. Using an
estimate of G(·) we can obtain observations $y_i^*$, i = 1, ..., n,
and then apply the usual least squares solutions (5.3, 5.4).
No assumption about G(·) is required for $Y_i^*$ to be unbiased
but in order to estimate G(·) we must assume that it is
independent of x; i.e., we make assumption A3 (cf., Section
2.3).
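The transformation at (6.1) can be sketched in a few lines. This is only an illustration, assuming a known censoring distribution function passed in as `censor_cdf` (a hypothetical argument; in the method described here G must be estimated from the data). The helper `exponential_cdf` is likewise an assumption made for the example.

```python
import math

def synthetic_response(y, delta, censor_cdf):
    """Return Y* = delta * Y / (1 - G(Y)) for each observation.

    censor_cdf is a hypothetical, known censoring c.d.f. G; the text
    replaces it by an estimate.
    """
    return [d * yi / (1.0 - censor_cdf(yi)) for yi, d in zip(y, delta)]

def exponential_cdf(rate):
    """Illustrative censoring c.d.f.: G(t) = 1 - exp(-rate * t)."""
    return lambda t: 1.0 - math.exp(-rate * t)

# An uncensored value is inflated; a censored value is set to zero.
y_star = synthetic_response([1.0, 2.0], [1, 0], exponential_cdf(0.5))
print(y_star)
```

The censored observation contributes zero and the uncensored one is divided by the probability of remaining uncensored up to that time, which is what makes the transformed response unbiased when G is known.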
By reversing the roles of the survival and censoring
random variables it would be possible to estimate the
distribution function G(·) using the PL estimator; however,
KSV advocate the use of an alternative, asymptotically
similar estimator, which is technically more convenient in
their theory. Use is made of a Bayesian estimator from
Susarla and Van Ryzin (1980),
$$1 - \hat{G}(t) = \prod_{i:\, y_i \le t} \left\{ \frac{1 + N^+(y_i)}{2 + N^+(y_i)} \right\}^{1 - \delta_i}, \qquad (6.2)$$
where $N^+(y)$ is the number of $y_i$ greater than y. Note that
in (6.2), because we are making inference about censoring
variables, we take censored y's as preceding uncensored
APPENDICES
$$H_0: F_1(t) = F_2(t)$$
against
$$H_1: F_1(t) < F_2(t),$$
or against
$$H_2: F_1(t) < F_2(t) \text{ or } F_1(t) > F_2(t).$$
If we consider the shift model as a special case of this,
we have
$$F_1(t) = F_2(t - \Delta) \text{ for some } \Delta,$$
or equivalently, for random variables $T_1$, $T_2$ from
populations 1 and 2,
$$T_1 \stackrel{d}{=} T_2 + \Delta.$$
We would wish to test
$$H_0: \Delta = 0$$
against
$$H_1: \Delta > 0 \text{ or } H_2: \Delta \ne 0.$$
Consider samples of size $n_1$, $n_2$ from populations 1 and 2,
respectively, i.e., $y_{11}, y_{12}, \ldots, y_{1n_1}$ and
$y_{21}, y_{22}, \ldots, y_{2n_2}$, where again the y's are the possibly
censored lifetimes and the extra subscript indicates the
population. Gehan's test of $\Delta = 0$ would use the statistic
$$W = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} U_{ij},$$
where
3.2 Point Estimation of $\beta$
In the case of no censoring, some procedures for
testing $H_0$: $\beta = b$ and for estimation of $\beta$ use the fact that
$$Z_i(b) = Y_i - b x_i = \alpha + (\beta - b) x_i + \varepsilon_i$$
and $x_i$ are functionally independent under $H_0$. Any test
for trend among the pairs $(Z_i(b), x_i)$ may be used to test $H_0$.
Suppose the test statistic S is a function of the pairs
$(Z_i(b), x_i)$, i = 1, ..., n; then following Hodges and Lehmann
(1963) an estimate of $\beta$ can be defined as that value b which
makes S = S(Z(b), x) most consistent with a hypothesis of
no trend relationship between $Z_i(b)$ and $x_i$. Typically, the
statistic chosen is $\hat{\tau}$, the hypothesis of no trend is $\tau = 0$
and $\hat{\tau}$ is distribution-free, symmetric about zero (cf.,
Kendall, 1970; Sen, 1968).
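For uncensored data the idea can be illustrated with a short sketch. It assumes the Kendall-type score is the sum over pairs of the sign of $(z_i - z_j)(x_i - x_j)$ and takes the median of pairwise slopes as the value of b at which that score crosses zero; the function names are hypothetical and the data in the usage lines are made up for illustration.

```python
from statistics import median

def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def trend_statistic(b, x, y):
    """Kendall-type score S(b) computed from z_i = y_i - b * x_i."""
    z = [yi - b * xi for xi, yi in zip(x, y)]
    n = len(x)
    return sum(sign((z[i] - z[j]) * (x[i] - x[j]))
               for i in range(n) for j in range(i))

def slope_estimate(x, y):
    """Median of the pairwise slopes over pairs with distinct x-values."""
    n = len(x)
    slopes = [(y[i] - y[j]) / (x[i] - x[j])
              for i in range(n) for j in range(i) if x[i] != x[j]]
    return median(slopes)

# Illustrative (made-up) uncensored data.
x = [1, 2, 3, 4, 5]
y = [1.0, 3.1, 4.9, 7.2, 9.0]
b_hat = slope_estimate(x, y)
print(b_hat, trend_statistic(b_hat, x, y))
```

S(b) is a non-increasing step function of b, so choosing b to bring it as close to zero as possible is what "most consistent with no trend" amounts to in this uncensored setting.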
Brown, Hollander and Korwar (1974) extend the above
testing procedure to the case of censored data, but do not
attempt to estimate $\beta$. They provide a test of $H_0$: $\beta = 0$
using a statistic which is a special case (b = 0) of that
which is now described and given at (2.3). Following the
above discussion and taking censoring into account, define
$$z_i^{(+)}(b) = y_i^{(+)} - b x_i, \quad i = 1, \ldots, n. \qquad (2.1)$$
As before, under $H_0$: $\beta = b$ there will be no trend among the
pairs $(z_i^{(+)}(b), x_i)$ and our purpose now is to define a
for ranks defined on the censored and uncensored
observations. Cox (1975) constructs a full likelihood which is
written as a product of two terms, one of which he calls
the partial likelihood. This partial likelihood coincides
with $L_c(\boldsymbol{\beta})$, from (2.5). Cox suggests that the partial
likelihood contains most of the information about $\boldsymbol{\beta}$ and
this seems justified by Efron (1977) and Oakes (1977) who
have established that the Cox estimator is nearly fully
efficient.
We now need to consider the case of ties in the data.
In the case of an uncensored time equalling a censored time,
the censored time is considered larger and the foregoing
method applies directly. Cox (1972) suggests ad hoc
modification of the procedure if there are just small numbers
of ties. His generalization for discrete time is considered
computationally not feasible. Two other modifications to
the likelihood expression that have been proposed for
dealing with discrete data are due to Peto (1972) and Prentice
and Gloeckler (1978). The method of Prentice and Gloeckler
does strictly adhere to a proportional hazards model; the
other two methods do not.
Estimation of the survival function via (2.3) requires
estimation of $\lambda_0(t)$. A method for this has been given by
Breslow (1974).
The Cox model is flexible and it is interesting to
note two adaptations of it. Firstly, suppose we have a
It remains to consider ties in the critical b's and
the general result is given in the following corollary.
Corollary 3.13. For the permutation distributions described
at (3.3) and p-values defined in Definition 3.10, we have
in general for an arbitrary critical value $b_k$ that,
$$p_k \le p_{k-1} \quad \text{if } s_k < 0,$$
$$p_{k-1} \le p_k \quad \text{if } s_k > 0.$$
Proof: As b increases through a critical $b_k$ a series of
adjacent ranks of z's is reversed and S(b) reduces by
$\sum c_{ij}$ (as at 2.11) with the summation being over q pairs
(i, j) such that $b_{ij} = b_k$. Again let the rank vector after
change through b^ be Rk+^ and observe (in the manner of
Theorem 3.12) that for any permutation of the elements of
no corresponding permutation of the elements of
can increase S (b) by more than Â£ cj_j Thus,
Number of permutations"
of elements of
giving Sk < sk
Number of permutations
of elements of Rk_^ giving)
ski < ski = sk E cij >
The result follows as in Theorem 3.12.
We have shown that for a chosen level $\alpha$, as b increases
above the point estimate a value can be reached at which
the sample value of S(b) falls in the reject region of the
test at (3.4) and that further increase of b does not lead
to acceptance of S(b). Similarly for b decreasing below
The overall impression from Tables 12 to 14 is that the
new method is competitive in terms of interval length when
comparisons are made for those intervals that attain
coverage. Thus in conclusion it can be said that over all trials
method N gave the better assurance of coverage coupled with
a competitive interval length.
Sample sizes greater than 25 were not simulated in this
study. Large sample results were compared using the Stanford
heart data. Here of course there are no "true" results for
comparison but method N gave results consistent with
commonly held views about the data.
For the purpose of illustration consider the following
hypothetical example consisting of n = 5 $(x_i, y_i)$ pairs.
Example 2.16

    i          1    2    3    4    5
    x_i        1    2    3    4    5
    y_i        3    2+   3    3+   4
    delta_i    1    0    1    0    1

Using (2.9) and (2.10) we compute $b_{ij}$ and $c_{ij}$ values
for (i, j): $x_i > x_j$. For example,
$$b_{2,1} = (2 - 3) / (2 - 1) = -1$$
and
$$c_{2,1} = (1 + 0) = 1.$$
These and the other values are given in Table 1.
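The values of Table 1 can be reproduced with a short sketch. It assumes, from the worked case above and the entries of Table 1, that $b_{ij}$ is the pairwise slope $(y_i - y_j)/(x_i - x_j)$ and that $c_{ij}$ is the sum of the two censoring indicators; the definitions (2.9) and (2.10) themselves are not restated here, and the function name is hypothetical.

```python
from fractions import Fraction

# Data of Example 2.16; delta = 1 marks an uncensored y, 0 a censored one.
x = [1, 2, 3, 4, 5]
y = [3, 2, 3, 3, 4]
delta = [1, 0, 1, 0, 1]

def pairwise_slopes(x, y, delta):
    """Return {(i, j): (b_ij, c_ij)} for 1-based pairs with x_i > x_j.

    Assumed forms: b_ij = (y_i - y_j) / (x_i - x_j), c_ij = delta_i + delta_j.
    """
    out = {}
    n = len(x)
    for i in range(n):
        for j in range(n):
            if x[i] > x[j]:
                b = Fraction(y[i] - y[j], x[i] - x[j])
                c = delta[i] + delta[j]
                out[(i + 1, j + 1)] = (b, c)
    return out

table = pairwise_slopes(x, y, delta)
for (i, j), (b, c) in sorted(table.items()):
    print(f"b[{i},{j}] = {float(b):5.2f}   c[{i},{j}] = {c}")
```

Exact fractions are used so that tied slopes, such as the three pairs with slope 0, compare equal without rounding error.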
Table 1. Values of $b_{ij}$ and ($c_{ij}$) for Example 2.16

    i\j    1           2           3          4        5
    1      -
    2      -1 (1)      -
    3      0 (2)       1 (1)       -
    4      0 (1)       0.5 (0)     0 (1)      -
    5      0.25 (2)    0.67 (1)    0.5 (2)    1 (1)    -

The first step in the computation of the $s_k$ is the
computation of $s_0$. This can be done using (2.3) with any value
I certify that I have read this study and that in my
opinion it conforms to acceptable standards of scholarly
presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.
Pejaver V. Rao, Chairman
Professor of Statistics
I certify that I have read this study and that in my
opinion it conforms to acceptable standards of scholarly
presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.
Ramon C. Littell
Professor of Statistics
I certify that I have read this study and that in my
opinion it conforms to acceptable standards of scholarly
presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.
Jon J. Shuster
Professor of Statistics
regard to any assumption on censoring pattern but A2 would
seem to be required for consistent estimation of the
variance of their estimate. They rely on simulation support
for their heuristic arguments concerning large sample
properties. The new method gives exact intervals for small
samples under A2 and for larger samples employs the
asymptotic theory of Daniels (see Section 3.4), again assuming A2.
All of the methods adapt to left-censored data.
A simple approach for this is to reverse the sign on all
response and covariable values and use the original method,
treating the left-censored values as right-censored. The
methods of Cox and KSV do not simply adapt to data
exhibiting both left and right censoring. The Miller and BJ
methods can be extended to that situation by using the
distribution estimator of Turnbull (1974, 1976) and the new method
would apply by incorporating a natural extension of the
definitions of "definitely concordant" and "definitely
discordant" (see Definition 3.2.2).
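The sign-reversal device for left-censored data is simple enough to sketch. The function name below is hypothetical; the point of the example is only that negating both response and covariable turns lower-bound information into upper-bound information while leaving every pairwise slope unchanged.

```python
def reflect_left_censored(x, y, delta):
    """Negate covariable and response so that left-censored values
    become right-censored; censoring indicators are unchanged."""
    return [-xi for xi in x], [-yi for yi in y], list(delta)

# Illustrative (made-up) data: delta = 0 marks a left-censored response.
x = [1.0, 2.0, 4.0]
y = [3.0, 5.0, 9.0]
delta = [1, 0, 1]
xr, yr, dr = reflect_left_censored(x, y, delta)

# A pairwise slope is the same before and after reflection.
before = (y[2] - y[0]) / (x[2] - x[0])
after = (yr[2] - yr[0]) / (xr[2] - xr[0])
print(before, after)
```

Because the slope is invariant under the reflection, an estimate or interval obtained from the reflected data applies directly to the original slope parameter.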
We now compare some computing aspects of the various
methods. For KSV the computations are without iteration and
easy, once the choice of $M_n$ for their expression (2.6.2)
has been made. That choice, however, is not clear-cut.
Convergence can be a problem with the Miller method and to
a lesser extent with BJ. Computing costs for the Cox, Miller
and BJ methods are comparable. The new method requires more
computing time than the other methods, but does not suffer
LISTING FOR APPENDIX B
C     TWO SAMPLE COMBINATIONS OF X=1 POSITIONS
C     IN THE X VECTOR. SUBROUTINE SAM IS
C     CALLED FOR COMPUTING ISUMX.
C
      M=0
      DO 4 J=1, N
    4 M=M+IFIX(X(J))
      DO 3 I=1, 11000
      DO 6 J=1, N
      ISUMX(I, J)=0
    6 CONTINUE
    3 CONTINUE
      III=0
      NM1=N-M+1
      DO 500 J1=1, NM1
      XP(1)=J1
      K1=XP(1)+1
      NM2=N-M+2
      DO 510 J2=K1, NM2
      XP(2)=J2
      IF(M .NE. 2) GO TO 515
      III=III+1
      CALL SAM(XP, III, M, N)
      GO TO 510
  515 K2=XP(2)+1
      NM3=N-M+3
      DO 520 J3=K2, NM3
      XP(3)=J3
      IF(M .NE. 3) GO TO 525
      III=III+1
      CALL SAM(XP, III, M, N)
      GO TO 520
  525 K3=XP(3)+1
      NM4=N-M+4
      DO 530 J4=K3, NM4
      XP(4)=J4
      IF(M .NE. 4) GO TO 535
      III=III+1
      CALL SAM(XP, III, M, N)
      GO TO 530
  535 K4=XP(4)+1
      NM5=N-M+5
      DO 540 J5=K4, NM5
      XP(5)=J5
      IF(M .NE. 5) GO TO 545
      III=III+1
      CALL SAM(XP, III, M, N)
      GO TO 540
  545 K5=XP(5)+1
      NM6=N-M+6
      DO 550 J6=K5, NM6
      XP(6)=J6
      IF(M .NE. 6) GO TO 555
      III=III+1
      CALL SAM(XP, III, M, N)
      GO TO 550
  555 K6=XP(6)+1
      NM7=N-M+7
      DO 560 J7=K6, NM7
      XP(7)=J7
      IF(M .NE. 7) GO TO 565
      III=III+1
      CALL SAM(XP, III, M, N)
      GO TO 560
  565 K7=XP(7)+1
      NM8=N-M+8
      DO 570 J8=K7, NM8
      XP(8)=J8
      IF(M .NE. 8) GO TO 575
      III=III+1
      CALL SAM(XP, III, M, N)
      GO TO 570
Table 13. Mean lengths of 90% confidence intervals based
on 200 simulations for sample size n = 15.

          LS             N                       BJT
T1,C1     .495 (.007)    .670 (.021)   92.5      .713 (.039)   94.5
T2,C1     .474 (.009)    .674 (.021)   91.0      .863 (.052)   94.9
T3,C1     .471 (.011)    .400 (.011)   88.5      .471 (.017)   91.4
T4,C1     .488 (.005)    .679 (.017)   89.0      .756 (.025)   91.4
T1,C2     .485 (.006)    .723 (.018)   93.5      .830 (.050)   95.5
T2,C2     .463 (.009)    .648 (.021)   87.5      .732 (.029)   91.4
T3,C2     .464 (.012)    .487 (.013)   92.0      .508 (.016)   95.0
T4,C2     .496 (.005)    .798 (.026)   91.0      .863 (.029)   94.8
T1,C3     .491 (.007)    .680 (.022)   89.0      .605 (.021)   89.9
T2,C3     .475 (.009)    .656 (.022)   91.0      .653 (.025)   93.7
T3,C3     .460 (.011)    .353 (.009)   90.0      .293 (.010)   84.5
T4,C3     .488 (.005)    .662 (.016)   92.0      .519 (.013)   87.4

Values in parentheses are estimated standard deviations for
the means.
The third figure in the N and BJT columns is the observed
percentage coverage.
Of major concern, then, is how robust the methods turn out
to be in practice. Some trials and comparisons that have
been made with these methods are now referred to briefly,
but a fuller appraisal of various aspects of the methods
is deferred to Chapter Five.
In his original paper Miller (1976) compares the
performances of the Cox and Miller methods in analyzing data
from the Stanford Heart Program that was referred to in
Chapter One. Miller and Halpern (1982) extend this to a
comparison of all four methods and come out generally in
favor of the Cox and BJ methods, finding that the other
two methods have methodological weaknesses. Buckley and
James (1979) provide some simulation results for the Miller
and BJ methods. Most of their trials are at sample size
50 with 50 percent censoring. They employ a variety of
residual error and censoring models and investigate the
bias of the estimate. The BJ method appears robust, while
the Miller method seems rather sensitive to departures from
assumption A2 (see Section 2.3). Buckley and James (1979,
p. 434) do not report coverage probabilities for their
intervals but do report "quite adequate variance estimates"
(using 4.5).
All of the methods considered in this chapter are
asymptotic methods. In the next chapter we develop a
procedure which is appropriate for all sample sizes.
NONPARAMETRIC REGRESSION IN THE
ANALYSIS OF SURVIVAL DATA
BY
MICHAEL J. IRESON
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN
PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1983
Copyright 1983
by
Michael J. Ireson
ACKNOWLEDGMENTS
I am grateful for the stimulating contact I have
had with many faculty and students at the University of
Florida. In particular I want to thank Dr. P. V. Rao
for his advice throughout my graduate program and for
his learned and shrewd guidance of my dissertation.
Also I want to acknowledge valuable help from Drs. Jon
Shuster and Ramon Littell.
I would not have undertaken this work without
encouragement from my wife and I shall always appreciate
the sacrifices she made. I also thank Rob and
Nicola for their tolerance of an ofttimes reclusive dad.
My special thanks go to Mrs. Edna Larrick for a speedy
and excellent typing job.
Computing was done using the facilities of the
Northeast Regional Data Center located on the campus of
the University of Florida in Gainesville.
TABLE OF CONTENTS
                                                           Page
ACKNOWLEDGMENTS                                             iii
LIST OF TABLES                                               vi
ABSTRACT                                                   viii
CHAPTER
ONE    INTRODUCTION                                           1
       1.1 The Concept of Censoring                           1
       1.2 Aims and a Conspectus                              5
TWO    REGRESSION METHODS                                     6
       2.1 Regression Models                                  6
       2.2 The Cox Method                                     9
       2.3 The Linear Model                                  14
       2.4 The Miller Method                                 16
       2.5 The Buckley-James Method                          20
       2.6 The Koul, Susarla and Van Ryzin Method            22
       2.7 Some Comparative Studies                          24
THREE  A NEW METHOD                                          26
       3.1 Structure and Notation                            26
       3.2 Point Estimation of $\beta$                       27
       3.3 Interval Estimation of $\beta$: Exact Method      37
       3.4 Interval Estimation of $\beta$: Asymptotic Method 48
       3.5 The Two-Sample Location Problem                   49
       3.6 Computational Aspects                             53
FOUR   COMPARATIVE STUDIES                                   55
       4.1 Simulations Using a Simple Linear Model           55
       4.2 Analysis of Heart Transplant Data                 83
       4.3 Summary of Results                                86
FIVE   CONCLUSION                                            88
APPENDICES
A      SMALL SAMPLE REGRESSION PROGRAM                       94
B      TWO-SAMPLE ADAPTATION                                104
C      LARGE SAMPLE PROGRAM                                 107
BIBLIOGRAPHY                                                112
BIOGRAPHICAL SKETCH                                         115
LIST OF TABLES
Table                                                      Page
1. Values of $b_{ij}$ and ($c_{ij}$) for Example 2.16        34
2. Computation of S(b)                                       35
3. Frequency of occurrence of censoring at
   different x in 2000 simulations of samples
   of size n = 7                                             58
4. Percentages of 200 simulations leading to
   infinite intervals by N and percentages
   giving convergence problems with BJ                       63
5. Arithmetic means ($\times 10^3$) based on 2000
   simulations of slope estimates for samples
   of size n = 7                                             65
6. Arithmetic means ($\times 10^3$) based on 200
   simulations of slope estimates for samples
   of size n = 25                                            66
7. Simulation results concerning estimation
   of the variance of the Buckley-James
   estimator with sample size n = 7                          68
8. Percentage coverage of confidence intervals
   based on 200 simulations with sample
   size n = 7                                                70
9. Percentage coverage of confidence intervals
   based on 200 simulations with sample
   size n = 10                                               71
10. Percentage coverage of confidence intervals
    based on 200 simulations with sample
    size n = 15                                              72
11. Percentage coverage of confidence intervals
    based on 200 simulations with sample
    size n = 25                                              73
12. Mean lengths of 90% confidence intervals
    based on 200 simulations for sample
    size n = 7                                               78
13. Mean lengths of 90% confidence intervals
    based on 200 simulations for sample
    size n = 15                                              79
14. Differences in mean lengths of 90% confidence
    intervals based on 200 simulations for
    sample size n = 25                                       80
15. Proportions of 200 simulations for which
    $H_0$: $\beta = 0$ was rejected at a test level
    $\alpha = 0.1$                                           82
16. Regression slope estimates and confidence
    intervals for $\log_{10}$ of survival time versus
    age at transplant with Stanford heart
    transplant data                                          85
17. Regression slope estimates and confidence
    intervals for $\log_{10}$ of survival times versus
    mismatch score with n = 157 Stanford heart
    transplant patients                                      85
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
NONPARAMETRIC REGRESSION IN THE
ANALYSIS OF SURVIVAL DATA
By
Michael J. Ireson
August 1983
Chairman: Pejaver V. Rao
Major Department: Statistics
Confidence intervals and point estimators are defined
for the slope parameter of a simple linear regression model
fitted to right-censored data. Estimation of shift using
two independent right-censored samples arises as a special
application. Existing nonparametric methods for the
regression problem all depend on asymptotic results whereas the
new method gives intervals which have exact coverage
probability under some assumptions on the censoring mechanisms.
The suggested technique is an extension of a
nonparametric regression method that is available for uncensored
data. A statistic is defined as a function of a
hypothesized slope value and that statistic is considered over
a search of slope values. A series of permutation tests
is defined on the statistic and the corresponding inversion
produces the confidence interval for the slope value.
Exact permutation tests are used for small samples but
asymptotic results are available for larger samples.
The new method of estimation, which is particularly
recommended as a small sample method, appears to have
several desirable properties. Simulation studies indicate
that the method performs well over all sample sizes and is
robust against a variety of censoring patterns.
CHAPTER ONE
INTRODUCTION
1.1 The Concept of Censoring
A good deal of research data do not conform to optimal
statistical designs and methods. A common departure from
the ideal is the "missing data" problem. Another departure,
common in certain contexts, is the occurrence of what we
might think of as partial information. For example, in
medical follow-up, the time from an initial treatment to a
particular event, such as "relapse" or "death," would not
be available if contact with the patient was lost. A "loss"
would often be due simply to a decision to analyze the data
at a certain time regardless of whether all patients had
undergone the particular event. Thus, for those patients
alive at the time of analysis the time from treatment to
death would not be known; but it would be known to be greater
than their observed survival time. Such partial data are
termed, in general, censored data, and in this example, where
the observed value is a lower bound for the "true" value, it
is termed a right-censored value. We define this and other
types of censoring more formally later.
The terms "lifetime" and "survival time" are natural
in a medical context or anywhere where time is the response
measure, and "survival analysis" is a general term used for
methods of analyzing censored data. However, censored
responses need not concern survival or be a time. Miller
(1981) quotes an example from Leavitt and Olshen (1974)
where the response is the amount paid on an insurance claim;
in some cases the patient's illness is over and the total
claim is known, in others the patient is still sick and only
the amount paid to date is known.
Much of the early work in life-testing, following a
pioneering paper by Epstein and Sobel (1953), was done in
a parametric framework. That work was largely applied to
electrical engineering problems where certain distributional
assumptions seem to be appropriate (cf., Mann, Schafer and
Singpurwalla, 1974; Nelson, 1981). Mann et al. also describe
Type I and Type II censoring mechanisms which are
particularly appropriate for industrial experimentation. Our
focus is on situations, particularly medical, where we would
like to use a distribution-free technique and where the
censorship is random in the sense that we now describe.
Let $T_i$, i = 1, ..., n be independent random variables,
which we may think of as representing true, possibly
unobservable, lifetimes and let independent variables $C_i$,
i = 1, ..., n be censoring variables in the sense that $T_i$ is
observable only if $T_i \le C_i$. We can observe variables
$(Y_i, \delta_i)$, i = 1, ..., n where
$$Y_i = \min(T_i, C_i)$$
and
$$\delta_i = \begin{cases} 0 & T_i > C_i \text{ (i.e., } T_i \text{ is censored)} \\ 1 & T_i \le C_i \text{ (i.e., } T_i \text{ is not censored).} \end{cases}$$
This is a random censorship model. It is common to
assume that $T_i$ and $C_i$ are independent. Since the
observable Y-values are lower bounds for the "true" values, they
are described as right-censored values. In random
left-censoring we would observe $T_i$ only if $T_i \ge C_i$; that is,
we observe
$$Y_i = \max(T_i, C_i)$$
and
$$\delta_i = \begin{cases} 1 & T_i \ge C_i \\ 0 & T_i < C_i. \end{cases}$$
Both types of censoring might occur in the same data set
and both are special cases of interval censoring in which
we observe only that the random variable of interest falls
in an interval. For example, with left-censoring we observe
only that $T_i$ falls in the interval $(-\infty, C_i]$.
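The observation scheme just described can be sketched as follows. The function names are hypothetical, and the exponential distributions used to generate lifetimes and censoring times are assumptions chosen purely for illustration; the random censorship model itself does not require them.

```python
import random

def right_censor(t, c):
    """Return (y, delta) with y = min(t, c); delta = 1 if t <= c, else 0."""
    return (t, 1) if t <= c else (c, 0)

def left_censor(t, c):
    """Return (y, delta) with y = max(t, c); delta = 1 if t >= c, else 0."""
    return (t, 1) if t >= c else (c, 0)

def simulate(n, seed=0):
    """Draw n independent (T, C) pairs and right-censor them."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        t = rng.expovariate(1.0)   # hypothetical lifetime distribution
        c = rng.expovariate(0.5)   # hypothetical censoring distribution
        sample.append(right_censor(t, c))
    return sample

for y, delta in simulate(5):
    print(f"y = {y:.3f}  delta = {delta}")
```

Only the pairs (y, delta) are returned, mirroring the fact that the analyst never sees the underlying lifetime when delta = 0.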
The occurrence of censoring will now be discussed with
reference to a particular medical study, a study to which
we shall refer throughout. The Stanford Heart Transplant
Program is described by Brown, Hollander, and Korwar (1974)
and by several other authors. A transplant patient will
receive a transplant if he survives until a donor is found.
Records for a patient who does receive a heart include
date of admission to the program, date of transplant and
the date of death if a death occurred. Survival times of
those who receive a heart will in some cases be
right-censored. This would occur if analysis was carried out on
a date on which some recipients were still alive, or if
contact was lost with a patient at a time when he was known
to be alive, or if a patient died from some cause not
related to heart function. It seems reasonable in this
example to assume that the censorship time is not related
to the survival time except perhaps in a case where a
patient may have chosen to drop out due to some features of
heart performance.
Special techniques are needed for analyzing censored
data. Many of the available nonparametric methods for these
problems are given in recent textbooks by Miller (1981),
Kalbfleisch and Prentice (1980), Lawless (1981) and
Elandt-Johnson and Johnson (1980). The particular problem tackled
in this study is now described.
1.2 Aims and a Conspectus
This dissertation concerns regression analysis where
the response variable is right-censored. Such an analysis
might be useful, for example, with the Stanford data in
relating survival time to one or more concomitant measures
taken on patients. Waiting time for a donor would be one
such measure. Others available in that data set are age
of patient and a mismatch score which measures the degree
of tissue incompatibility between the donor and recipient
hearts.
In the next chapter we review some of the methods that
have been proposed for the regression problem and in Chapter
Three present a new method. Some simulation results and
results of analyzing the Stanford heart data are presented
in Chapter Four and concluding observations and suggestions
are made in Chapter Five. Fortran coding and documentation
for the new method are included in appendices.
Tables and figures are numbered on separate systems
which run consecutively through the whole text. All other
cross referencing follows the pattern chapter, section and
item within the section in that order, except that the
chapter reference will be dropped when the citation appears in
the same chapter. Thus the fourth numbered item in Section
Two of Chapter Three would be cited as (3.2.4), except in
Chapter Three where it would be simply (2.4). Theorems,
Lemmas, Definitions, Examples, etc. all fall consecutively
into the numbering scheme.
CHAPTER TWO
REGRESSION METHODS
2.1 Regression Models
The four main nonparametric regression techniques
currently available are due to Cox (1972), Miller (1976),
Buckley and James (1979) and Koul, Susarla and Van Ryzin
(1981). These methods will be described later in this
chapter, but first some preliminary discussion about hazard
rates and accelerated time models is in order.
For a random variable $T \ge 0$ with density f(t) and
right-continuous distribution function F(t) the survival
function is
$$P(t) = 1 - F(t) = \Pr(T > t). \qquad (1.1)$$
The hazard rate (hazard function or force of mortality),
$\lambda(t)$, is defined
$$\lambda(t) = \frac{f(t)}{1 - F(t)}, \qquad (1.2)$$
with the interpretation that $\lambda(t)\,dt$ is the probability of
"death" in the interval (t, t+dt) given survival beyond t.
Integration of $\lambda(t)$ shows that
$$P(t) = \exp\left\{ -\int_0^t \lambda(u)\,du \right\}. \qquad (1.3)$$
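The relation between the hazard rate and the survival function at (1.3) can be checked numerically. The sketch below assumes a Weibull lifetime with arbitrarily chosen shape and scale (an illustrative choice, not one made in the text) and compares the closed-form survival function with the exponential of the negative integrated hazard.

```python
import math

def hazard(t, shape=1.5, scale=2.0):
    """Weibull hazard f(t) / (1 - F(t))."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def survival(t, shape=1.5, scale=2.0):
    """Weibull survival function P(t) = 1 - F(t)."""
    return math.exp(-((t / scale) ** shape))

def survival_from_hazard(t, steps=20000, **kw):
    """Approximate exp(-integral_0^t hazard(u) du) by the midpoint rule."""
    h = t / steps
    total = sum(hazard((k + 0.5) * h, **kw) for k in range(steps))
    return math.exp(-total * h)

print(survival(3.0), survival_from_hazard(3.0))
```

The two printed values agree to several decimal places, the small difference being the error of the numerical integration.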
The hazard rate forms the essential structure of the
Cox regression model. It can be connected with a linear
regression model in the following way. Suppose $U_0$ is a
survival time with hazard rate $\lambda_0(u) = f_0(u) / (1 - F_0(u))$.
It might be appropriate to assume that for a parameter $\beta$
and some covariate x we could obtain different failure
times via
$$U_x = e^{\beta x} U_0. \qquad (1.4)$$
The relationship (1.4) is referred to as an accelerated
time model, and indeed with $\beta x < 0$ we have $U_x < U_0$ and the
model represents a situation where a failure time experiment
might be speeded up by some control over a covariable x
(cf., Mann et al., 1974). This model is readily seen to
coincide with a log-linear model,
$$T_x = \operatorname{Log} U_x = \alpha + \beta x + \varepsilon, \qquad (1.5)$$
where
$$\alpha = E[\operatorname{Log} U_0] \quad \text{and} \quad \varepsilon = \operatorname{Log} U_0 - E[\operatorname{Log} U_0].$$
Notice that in (1.5) the covariate acts in an additive way
on the log lifetime.
With model (1.4) the hazard rates of $U_x$ and $U_0$ are
related in the following way:
$$\lambda_x(u) = \frac{f_x(u)}{1 - F_x(u)} = \frac{f_0(e^{-\beta x} u)\, e^{-\beta x}}{1 - F_0(e^{-\beta x} u)} = \lambda_0(e^{-\beta x} u)\, e^{-\beta x}.$$
If $U_0$ has an exponential distribution, then the hazard rate
$\lambda_0(\cdot)$ is constant. Letting $\lambda_0(e^{-\beta x} u) = \lambda$ we see that for
this special case,
$$\lambda_x(u) = \lambda e^{-\beta x}.$$
That is, the hazard function for $U_x$ is proportional to some
function of x and the covariable acts in a multiplicative
way on the hazard. This motivates consideration of a
proportional hazards model,
$$\lambda_x(u) = \lambda_0(u)\, e^{\beta x}, \qquad (1.6)$$
for some baseline $\lambda_0(\cdot)$ and a parameter $\beta$.
We have seen that with the exponential distribution,
the accelerated time model (1.4), which is a log-linear
model (1.5), implies a proportional hazards model (1.6).
In general the two models do not coincide. Only in the
case of the two-parameter Weibull family of distributions
will they be equivalent (Kalbfleisch and Prentice, 1980,
p. 34).
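The distinction between the two models can be made concrete with a small numerical sketch. It evaluates the accelerated-time hazard $\lambda_0(e^{-\beta x}u)\,e^{-\beta x}$ for three baselines and reports its ratio to the baseline hazard at two time points. The log-logistic baseline is an assumption introduced here only as a counterexample, and all function names and parameter values are hypothetical.

```python
import math

def aft_hazard(u, x, beta, base_hazard):
    """Hazard of U_x = exp(beta * x) * U_0 in terms of the hazard of U_0."""
    scale = math.exp(-beta * x)
    return base_hazard(scale * u) * scale

def exponential_hazard(rate):
    """Constant hazard of an exponential baseline."""
    return lambda u: rate

def weibull_hazard(shape, rate=1.0):
    """Hazard of a Weibull baseline with survival exp(-(rate * u) ** shape)."""
    return lambda u: shape * rate * (rate * u) ** (shape - 1)

def loglogistic_hazard(shape):
    """Hazard of a log-logistic baseline with survival 1 / (1 + u ** shape)."""
    return lambda u: shape * u ** (shape - 1) / (1.0 + u ** shape)

def hazard_ratio(u, x, beta, base_hazard):
    """Ratio of the hazard at covariate value x to the baseline hazard."""
    return aft_hazard(u, x, beta, base_hazard) / base_hazard(u)

beta, x = 0.7, 1.0
for name, h0 in [("exponential", exponential_hazard(2.0)),
                 ("Weibull", weibull_hazard(2.0)),
                 ("log-logistic", loglogistic_hazard(2.0))]:
    print(name, hazard_ratio(0.5, x, beta, h0), hazard_ratio(3.0, x, beta, h0))
```

For the exponential and Weibull baselines the ratio does not depend on the time point, so the accelerated time model is also a proportional hazards model; for the log-logistic baseline the ratio changes with time, so it is not.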
The Cox approach to regression with censored data is
through the proportional hazards model, while the other
methods we describe assume a direct linear relationship
between a response variable, possibly transformed, and one
or more covariables.
2.2 The Cox Method
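The proportional hazards model proposed by Cox (1972)
is
$$\lambda(t; \mathbf{x}) = \lambda_0(t)\,e^{\boldsymbol{\beta}'\mathbf{x}},$$  (2.1)
where $\lambda(t; \mathbf{x})$ is the hazard function corresponding to the
covariate value $\mathbf{x}$. The vector of regression coefficients
is $\boldsymbol{\beta}$ and the baseline $\lambda_0(t)$ is the hazard rate when $\mathbf{x} = \mathbf{0}$.
The model implies that distribution functions $F(t; \mathbf{x})$ form
a family of Lehmann alternatives and thus that, in terms of
survival functions,
$$P(t; \mathbf{x}) = \{P_0(t)\}^{\gamma},$$  (2.2)
where $\gamma = e^{\boldsymbol{\beta}'\mathbf{x}}$ (Miller, 1981, p. 120). The generalization
of (1.3) is
$$P(t; \mathbf{x}) = \exp\left\{-e^{\boldsymbol{\beta}'\mathbf{x}}\int_0^t \lambda_0(u)\,du\right\}.$$  (2.3)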
In the Cox analysis, $\lambda_0(t)$ remains an arbitrary
nuisance function in the estimation of $\boldsymbol{\beta}$, but we must
estimate $\lambda_0(t)$ in order to make inference about the survival
function. We first describe the estimation of $\boldsymbol{\beta}$. Cox
argued that the only times at which $\lambda(t; \mathbf{x})$ in (2.1) could
not conceivably be zero are the observed uncensored survival
times. Thus, with $\lambda_0(t)$ arbitrary the only times at which
$\lambda_0(t)$ could not be zero are those uncensored times and
those times are the only ones giving information about $\boldsymbol{\beta}$.
This stance led Cox to seek a likelihood by conditioning
on the observed uncensored survival times, and then to
treat that likelihood as an ordinary likelihood.
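Suppose there are no ties and that the censored and
uncensored observations are ordered
$$y_{(1)} < y_{(2)} < \cdots < y_{(n)}.$$
Associated with each $y_{(i)}$ is a censoring indicator $\delta_{(i)}$ and
covariate vector $\mathbf{x}_{(i)}$. Let $R_{(i)}$ denote the "risk set" for
each time $y_{(i)}$, i.e.,
$$R_{(i)} = \{j : y_j \ge y_{(i)}\}.$$
Using (2.1) and the interpretation given at (1.2) we see
that for each uncensored time $y_{(i)}$,
$$\Pr\{\text{death in } [y_{(i)}, y_{(i)} + dy) \mid R_{(i)}\} = \sum_{j \in R_{(i)}} e^{\boldsymbol{\beta}'\mathbf{x}_j}\,\lambda_0(y_{(i)})\,dy,$$
and
$$\Pr\{\text{death of } (i) \text{ at time } y_{(i)} \mid \text{one death in } R_{(i)} \text{ at time } y_{(i)}\} = \frac{e^{\boldsymbol{\beta}'\mathbf{x}_{(i)}}}{\sum_{j \in R_{(i)}} e^{\boldsymbol{\beta}'\mathbf{x}_j}}.$$  (2.4)
The product of the conditional probabilities at (2.4) is
called a conditional likelihood:
$$L_c(\boldsymbol{\beta}) = \prod_u \frac{e^{\boldsymbol{\beta}'\mathbf{x}_{(i)}}}{\sum_{j \in R_{(i)}} e^{\boldsymbol{\beta}'\mathbf{x}_j}},$$  (2.5)
the product being over the uncensored observations.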
Note that (2.5) is not a true conditional likelihood.
Standard likelihood analysis is applied to (2.5) and
iterative methods are usually required (cf., for example,
Kendall and Stuart, 1973). Through standard maximum
likelihood arguments, Cox (1975) suggests that $\hat{\boldsymbol{\beta}}$ from maximizing
(2.5) is asymptotically normal with mean $\boldsymbol{\beta}$ and covariance
matrix the inverse of the information matrix:
$$I(\boldsymbol{\beta}) = -\frac{\partial^2}{\partial\boldsymbol{\beta}^2}\operatorname{Log} L_c(\boldsymbol{\beta}).$$  (2.6)
A formal proof of asymptotic normality is given by Tsiatis
(1981).
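To make the conditional likelihood concrete, here is a minimal sketch for a single covariate with no tied times. It is an illustration under stated assumptions rather than the thesis's own computation: the function names, the finite-difference step, and the four-point data set are hypothetical, and the information at (2.6) is approximated numerically instead of analytically.

```python
import math

def cox_log_partial_likelihood(beta, y, delta, x):
    """Log of the conditional likelihood (2.5) for one covariate and no tied times."""
    total = 0.0
    for i in range(len(y)):
        if delta[i] == 1:  # only uncensored times contribute a factor
            # Sum over the risk set: all subjects with y_j >= y_i.
            risk_sum = sum(math.exp(beta * x[j])
                           for j in range(len(y)) if y[j] >= y[i])
            total += beta * x[i] - math.log(risk_sum)
    return total

def observed_information(beta, y, delta, x, h=1e-4):
    """Central-difference approximation to minus the second derivative, as in (2.6)."""
    f = lambda b: cox_log_partial_likelihood(b, y, delta, x)
    return -(f(beta + h) - 2.0 * f(beta) + f(beta - h)) / (h * h)

# Hypothetical data: times, censoring indicators (1 = uncensored), one covariate.
y = [1.0, 2.0, 3.0, 4.0]
delta = [1, 0, 1, 1]
x = [0.0, 1.0, 0.0, 1.0]

loglik_at_zero = cox_log_partial_likelihood(0.0, y, delta, x)
info_at_zero = observed_information(0.0, y, delta, x)
```

At beta = 0 every subject in a risk set is equally likely to fail, so the log-likelihood reduces to minus the sum of the logs of the risk set sizes at the uncensored times (here 4, 2 and 1).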
A formal justification for the use of (2.5) in the
case of no ties is given by Kalbfleisch and Prentice (1973).
They show the equivalence of (2.5) and a marginal likelihood
for ranks defined on the censored and uncensored
observations. Cox (1975) constructs a full likelihood which is
written as a product of two terms, one of which he calls
the partial likelihood. This partial likelihood coincides
with $L_c(\boldsymbol{\beta})$ from (2.5). Cox suggests that the partial
likelihood contains most of the information about $\boldsymbol{\beta}$ and
this seems justified by Efron (1977) and Oakes (1977) who
have established that the Cox estimator is nearly fully
efficient.
We now need to consider the case of ties in the data.
In the case of an uncensored time equalling a censored time,
the censored time is considered larger and the foregoing
method applies directly. Cox (1972) suggests ad hoc
modification of the procedure if there are just small numbers
of ties. His generalization for discrete time is considered
computationally not feasible. Two other modifications to
the likelihood expression that have been proposed for
dealing with discrete data are due to Peto (1972) and Prentice
and Gloeckler (1978). The method of Prentice and Gloeckler
does strictly adhere to a proportional hazards model; the
other two methods do not.
Estimation of the survival function via (2.3) requires
estimation of $\lambda_0(t)$. A method for this has been given by
Breslow (1974).
The Cox model is flexible and it is interesting to
note two adaptations of it. Firstly, suppose we have a
two-sample problem with no ties and we let a single
covariable $x$ take values 0 or 1 to indicate the sample. The Cox
test of $H_0: \beta = 0$ using the test statistic,
$$\frac{\partial}{\partial\beta}\operatorname{Log} L_c(0) \Big/ \left\{-\frac{\partial^2}{\partial\beta^2}\operatorname{Log} L_c(0)\right\}^{1/2},$$
is then formally equivalent to the Mantel-Haenszel (1959)
test of equality of the two distribution functions.
Secondly, we can permit the covariable to be a function of
time; i.e., use $x(t)$ in (2.1).
The textbook by Kalbfleisch and Prentice (1980) gives
a good account of the proportional hazard approach to
regression analysis with censored data and lists many useful
references.
The Cox analysis does not in general give a direct
relationship between $T$ and $\mathbf{x}$. Some indication of a direct
relationship can be seen using estimates of median lifetimes
using the survival function estimates at various settings
for the vector $\mathbf{x}$. For one covariable the median could be
plotted to give a comparison with the results of a direct
regression analysis.
The performance of the Cox method and of the other
methods to be described later will be discussed in Section
2.7 and in Chapter Five. Next we define a linear model
and then describe three methods of analysis, all of which use
a form of least squares analysis.
2.3 The Linear Model
We now state the structure, notation and possible
assumptions for a censored linear model that will be
referred to throughout the sequel. Thus,
$$T_i = \alpha + \boldsymbol{\beta}'\mathbf{x}_i + \varepsilon_i, \quad i = 1,\ldots,n,$$  (3.1)
where $\alpha$ is a parameter, $\boldsymbol{\beta}$ a $p \times 1$ vector of regression
parameters and $\mathbf{x}_i$ a $p \times 1$ vector of known covariables. The
error terms $\varepsilon_i$ are independent and identically distributed
(i.i.d.) with common distribution function $F(\cdot)$ and $E(\varepsilon_i) = 0$.
The response $T_i$ may of course be a transformation, possibly
log, of some other response of interest. The $T_i$ are not
all observable. Let $C_i$, $i = 1,\ldots,n$, be independent
censoring variables and $(Y_i, \delta_i)$ be observable pairs of variables
where
$$Y_i = \min(T_i, C_i)$$
and
$$\delta_i = \begin{cases} 1 & \text{if } T_i \le C_i, \\ 0 & \text{if } T_i > C_i. \end{cases}$$
Attention must be given to three possible assumptions.
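A fundamental assumption imposed throughout is
A1: $C_i$ are independent, $i = 1,\ldots,n$.
A second, which is essential for the validity of some of
the following methods, is
A2: $\Pr(C_i \le c \mid \mathbf{x} = \mathbf{x}_i) = G_{\mathbf{x}_i}(c) = G_0(c - \boldsymbol{\beta}'\mathbf{x}_i)$,
where $G_0$ is some distribution function and $G_{\mathbf{x}_i}$ is the
distribution function for the censoring variable at $\mathbf{x}_i$.
Assumption A2 imposes a dependence between $C_i$ and $\mathbf{x}_i$. The
distribution shift given by A2 ensures that the probability
distribution of $\delta_i$ is independent of $\mathbf{x}_i$. To see this
we have
$$\Pr(\delta_i = 1 \mid \mathbf{x} = \mathbf{x}_i) = \Pr(T_i \le C_i \mid \mathbf{x} = \mathbf{x}_i)$$
$$= \int \Pr(T_i \le C_i \mid \mathbf{x} = \mathbf{x}_i,\ C_i = c)\,dG_0(c - \boldsymbol{\beta}'\mathbf{x}_i)$$
$$= \int \Pr(T_i \le c \mid \mathbf{x} = \mathbf{x}_i)\,dG_0(c - \boldsymbol{\beta}'\mathbf{x}_i)$$
$$= \int F(c - \alpha - \boldsymbol{\beta}'\mathbf{x}_i)\,dG_0(c - \boldsymbol{\beta}'\mathbf{x}_i)$$
$$= \int F(u - \alpha)\,dG_0(u),$$
which is independent of $\mathbf{x}_i$.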
An alternative to A2 is
A3: $G_{\mathbf{x}_i}(c) = G_0(c)$.
Thus under A3 the censoring distribution does not depend
on $\mathbf{x}_i$.
These assumptions will come into consideration throughout
what follows. We now look at three methods of
estimating the parameters of (3.1).
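The contrast between A2 and A3 can be illustrated in closed form for one special case. The sketch below is an assumption-laden example, not taken from the thesis: it supposes standard normal errors, a single covariate, and a normal censoring distribution with mean mu and unit variance (shifted by beta*x under A2, unshifted under A3). All names and numerical values are hypothetical.

```python
import math

def normal_cdf(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def prob_uncensored(x, alpha, beta, mu, shifted):
    """Pr(T <= C | x) when T = alpha + beta*x + e with e ~ N(0, 1) and
    C ~ N(mu + shift, 1), where shift = beta*x under A2 and 0 under A3."""
    shift = beta * x if shifted else 0.0
    # T - C is normal with mean alpha + beta*x - mu - shift and variance 2.
    return normal_cdf((mu + shift - alpha - beta * x) / math.sqrt(2.0))

alpha, beta, mu = 1.0, 0.8, 2.0       # illustrative values only
xs = (0.0, 1.0, 5.0)
under_a2 = [prob_uncensored(x, alpha, beta, mu, shifted=True) for x in xs]
under_a3 = [prob_uncensored(x, alpha, beta, mu, shifted=False) for x in xs]
```

In this special case the A2 probabilities are identical at every x, while under A3 with a positive slope the chance of observing an uncensored response falls as x grows, which is the uneven censoring pattern the later sections are concerned with.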
2.4 The Miller Method
For simplicity of description we now take (3.1) with
$p = 1$, but note that all the methods of this chapter do apply
to multiple regression.
The Miller method is seen to be a natural modification
of ordinary least squares by writing the residual sum of
squares for the uncensored case in the following way:
$$\sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2 = n\int_{-\infty}^{\infty} z^2\,dF_n(z),$$
where
$$z_i(\beta) = y_i - \alpha - \beta x_i,$$  (4.1)
and $F_n$ is the empirical distribution function of $z_i(\beta)$,
$i = 1,\ldots,n$. In the censored case some $y_i$ and hence some
$z_i$ will be censored values. Miller's (1976) modification of
the least squares method is to estimate $\beta$ by minimizing
$$\int z^2\,d\hat F(z),$$  (4.2)
where $\hat F(z)$ is an estimate of the distribution function of $z$
based on the observations $(z_i, \delta_i)$, $i = 1,2,\ldots,n$.
For ordered censored and uncensored observations,
$z_{(1)} < z_{(2)} < \cdots < z_{(n)}$, the product-limit (PL) estimator
for $F(z)$ introduced by Kaplan and Meier (1958) is
$$\hat F(z) = 1 - \prod_{\{i :\, z_{(i)} \le z,\ \delta_{(i)} = 1\}} \frac{n-i}{n-i+1},$$  (4.3)
where $\delta_{(i)}$ denotes the $\delta$ associated with $z_{(i)}$. There is
a difficulty if $\delta_{(n)} = 0$; the convention is to define
$\hat F(z) = 1$, $z \ge z_{(n)}$. The PL estimate $\hat F(z)$ is a step function
with jumps only at the uncensored points. The size of the
jump at an uncensored $z_{(i)}$ is
$$d\hat F(z_{(i)}) = \frac{1}{n-i+1} \prod_{\{j :\, j < i,\ \delta_{(j)} = 1\}} \frac{n-j}{n-j+1}.$$  (4.4)
The computational forms given at (4.3) and (4.4)
are applied to tied z-values by assuming that in a run of
tied values the uncensored values precede the censored
values. Otherwise the labelling across the tied group is
arbitrary and the jumps at each uncensored point in a tied
group are equal (by 4.4).
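The jump sizes at (4.4) can be computed in a single pass over the ordered values. The following is a minimal sketch under the tie convention just described; the function name and the optional flag for treating the largest observation as uncensored are illustrative choices, not the author's code.

```python
def pl_jumps(z, delta, last_uncensored=False):
    """Jumps (4.4) of the product-limit estimator, returned in the input order.

    delta[i] is 1 for an uncensored value and 0 for a censored one. Within a run
    of tied z-values the uncensored values are placed first. If last_uncensored
    is True the largest observation is treated as uncensored (the convention
    used when the largest value is censored)."""
    n = len(z)
    order = sorted(range(n), key=lambda i: (z[i], -delta[i]))
    d = list(delta)
    if last_uncensored:
        d[order[-1]] = 1
    jumps = [0.0] * n
    survival = 1.0  # product of (n - j)/(n - j + 1) over uncensored ranks j < i
    for rank, i in enumerate(order, start=1):
        if d[i] == 1:
            jumps[i] = survival / (n - rank + 1)
            survival *= (n - rank) / (n - rank + 1)
    return jumps

# Hypothetical examples.
mixed = pl_jumps([1.0, 2.0, 3.0], [1, 0, 1])          # middle value censored
uncensored = pl_jumps([4.0, 1.0, 3.0, 2.0], [1, 1, 1, 1])
tied = pl_jumps([1.0, 2.0, 2.0, 3.0], [1, 1, 1, 0], last_uncensored=True)
```

With no censoring every jump equals 1/n, so the estimator reduces to the empirical distribution function; with a censored value, its mass is passed on to the larger uncensored values.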
Denoting the jump (4.4) by $w_i(\beta)$, (4.2) becomes
$$\sum_{i=1}^{n} w_i(\beta)\,(y_i - \alpha - \beta x_i)^2.$$  (4.5)
The weights $w_i(\beta)$ do not depend on $\alpha$ and with the
convention that $\delta_{(n)}$ is set equal to one in all cases,
we have $\sum_{i=1}^{n} w_i(\beta) = 1$. If $\delta_i = 0$ then $w_i(\beta) = 0$ and (4.5)
appears to depend only on the uncensored observations;
however the PL estimator and therefore each weight do depend
on all the observations. Seeking an $\alpha$ which minimises (4.5)
we get
$$\hat\alpha = \sum_{i=1}^{n} w_i(\beta)\,y_i - \beta \sum_{i=1}^{n} w_i(\beta)\,x_i,$$  (4.6)
and resubstituting this in (4.5) would require us to
minimize a function
$$f(\beta) = \sum_{i=1}^{n} w_i(\beta)\,(y_i - \hat\alpha - \beta x_i)^2.$$  (4.7)
This could be minimized by a search procedure, but this has
difficulties, especially in higher dimensions. Miller (1976)
advocates a modified approach which we now describe.
The alternative procedure is an iterative one. An
initial set of z-values are formed using (4.1) with $\alpha = 0$
and some starting estimate, $\hat\beta_0$, for $\beta$. Weights $w_i(\hat\beta_0)$ are
then computed using (4.4) and the next estimate for $\beta$ is
defined
$$\hat\beta_1 = \frac{\sum_u w_i^*(\hat\beta_0)\,y_i\,(x_i - \bar x_u)}{\sum_u w_i^*(\hat\beta_0)\,(x_i - \bar x_u)^2},$$  (4.8)
where
$$w_i^*(\hat\beta_0) = \frac{w_i(\hat\beta_0)}{\sum_u w_i(\hat\beta_0)}, \qquad \bar x_u = \sum_u w_i^*(\hat\beta_0)\,x_i,$$
with the summations being over the uncensored observations.
The estimate $\hat\beta_1$ is then used in (4.1) and so on.
Notice that since the weights do not depend on $\alpha$ we chose
$\alpha = 0$ in (4.1) for convenience and also note that because
the weights are renormalized over the uncensored
observations the estimation of $\beta$ does not require any adjustment
to $\delta_{(n)} = 0$. As a starting value $\hat\beta_0$, Miller suggests using
the unweighted least squares estimate of slope using only
the uncensored observations. The final estimate of $\beta$ is
used in (4.6) for calculating $\hat\alpha$ and in this computation we
do set $\delta_{(n)} = 1$ regardless of the actual censorship.
This iterative method might not attain convergence.
If estimates become trapped in a loop, Miller advocates
averaging the values in the loop. Assumption A2 is a
sufficient condition for the estimates $\hat\alpha$ and $\hat\beta$ to be
consistent and for $\hat\beta$ to be asymptotically normal. Under an
additional assumption that the variability due to weights
$w_i^*(\hat\beta)$ is negligible, Miller obtains an asymptotic estimate
for the variance of $\hat\beta$ given by
$$\widehat{\operatorname{Var}}(\hat\beta) = \frac{\sum_u w_i^*(\hat\beta)\,(y_i - \hat\alpha - \hat\beta x_i)^2}{\sum_u w_i^*(\hat\beta)\,(x_i - \bar x_u)^2}.$$  (4.9)
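The iteration at (4.8) can be sketched as follows. This is an illustrative outline, not Miller's or the author's program: the function names, the stopping tolerance, the iteration cap, and the small data sets are assumptions, and the sketch simply reports non-convergence rather than averaging the values in a loop.

```python
def pl_jumps(z, delta):
    """Product-limit jumps (4.4), in input order; ties put uncensored values first."""
    n = len(z)
    order = sorted(range(n), key=lambda i: (z[i], -delta[i]))
    jumps = [0.0] * n
    survival = 1.0
    for rank, i in enumerate(order, start=1):
        if delta[i] == 1:
            jumps[i] = survival / (n - rank + 1)
            survival *= (n - rank) / (n - rank + 1)
    return jumps

def miller_slope(x, y, delta, beta0, max_iter=100, tol=1e-10):
    """Iterate (4.8) from beta0; returns (slope, converged)."""
    beta = beta0
    for _ in range(max_iter):
        z = [yi - beta * xi for xi, yi in zip(x, y)]   # (4.1) with alpha = 0
        w = pl_jumps(z, delta)                         # zero at censored points
        total = sum(w)
        w_star = [wi / total for wi in w]              # renormalized weights
        x_bar = sum(ws * xi for ws, xi in zip(w_star, x))
        num = sum(ws * yi * (xi - x_bar) for ws, xi, yi in zip(w_star, x, y))
        den = sum(ws * (xi - x_bar) ** 2 for ws, xi in zip(w_star, x))
        new_beta = num / den
        if abs(new_beta - beta) < tol:
            return new_beta, True
        beta = new_beta
    return beta, False

# Hypothetical uncensored data lying exactly on y = 1 + 2x.
slope_full, ok_full = miller_slope([1, 2, 3, 4], [3, 5, 7, 9], [1, 1, 1, 1], beta0=0.0)

# The same kind of data with the third response censored below its true value.
slope_cens, ok_cens = miller_slope([1, 2, 3, 4, 5], [3, 5, 6, 9, 11], [1, 1, 0, 1, 1], beta0=2.0)
```

Because the weights vanish at censored points, summing over all observations is the same as summing over the uncensored ones. With no censoring the weights are all 1/n and the first iterate is the ordinary least squares slope.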
Clearly if A2 is not satisfied, as in the case where
censoring is independent of $x$ with a fairly steep
regression line, then there will be an irregular censoring
pattern along the regression line which can lead to biased
estimates. In such cases Miller suggests dividing the
x range into two or more intervals each of which shows
a more even censoring pattern. Separate sets of weights
are computed for each interval and the sum of squares to
be minimized is then a weighted average of the separate
weighted sums of squares. This correction does of course
demand a fairly large number of observations and there are
practical difficulties in applying it to the multiple
regression situation.
2.5 The Buckley-James Method
The method due to Miller applied estimated weights
to the residual sum of squares and then used a least squares
solution. For the same model (3.1) the Buckley-James method
(BJ from here on) modifies instead the solution to the usual
normal equations by using estimates of $T_i$ when these
are not observable. Let the pseudo random variable $Y_i^*$ be
defined by
$$Y_i^* = Y_i\,\delta_i + E[T_i \mid T_i > Y_i]\,(1 - \delta_i).$$  (5.1)
Since $E(Y_i^*) = \alpha + \beta x_i$ (Miller, 1981, p. 151) we could
estimate $\beta$ using ordinary least squares if all the $Y_i^*$ were
observable. The BJ approach is to estimate the observed
value $y_i^*$ whenever $\delta_i = 0$ by using the following estimate
in (5.1):
$$\hat E[T_i \mid T_i > y_i] = \hat\beta x_i + \frac{\sum_{\hat z_k > \hat z_i} w_k(\hat\beta)\,\hat z_k}{1 - \hat F(\hat z_i)}.$$  (5.2)
Here as before $\hat z_i = y_i - \hat\beta x_i$, $\hat F(\cdot)$ is the PL estimator based
on $(\hat z_i, \delta_i)$, $i = 1,\ldots,n$, and $w_i(\hat\beta)$, $i = 1,\ldots,n$, are the
jumps of $\hat F(\cdot)$ (see 4.4). The second term on the right-hand
side of (5.2) uses a weighted average of those $\hat z_k$'s which
are greater than the current $\hat z_i$. Iteration is again
necessary.
Given a starting estimate of $\beta$, (5.2) is used in (5.1)
to obtain the "responses" $y_i^*$, $i = 1,\ldots,n$. Using least
squares the next iterate for $\beta$ is
$$\hat\beta = \frac{\sum_{i=1}^{n} y_i^*\,(x_i - \bar x)}{\sum_{i=1}^{n} (x_i - \bar x)^2}.$$  (5.3)
Again this process may not converge. Oscillation cycles
might develop and the average of values in the cycle is
suggested as the solution. Given a solution for $\beta$ the
solution for $\alpha$ is
$$\hat\alpha = \bar y^* - \hat\beta\,\bar x.$$  (5.4)
The starting value for the iterations might be the
same as for the Miller method or it could, as suggested by
Miller and Halpern (1982), be the least squares solution
for $\beta$ using all of the data as if they were uncensored.
The variability among the oscillating values, if they occur,
is thought to be less in the BJ method than in the Miller
method.
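A compact sketch of the BJ iteration follows. It is offered as an illustration under assumptions, not as the published algorithm: the function names, tolerance and data are hypothetical, the largest residual is treated as uncensored so that the PL jumps sum to one, and a censored point with no larger residual is left at its observed value.

```python
def pl_jumps(z, delta):
    """PL jumps (4.4) in input order, treating the largest value as uncensored (assumed)."""
    n = len(z)
    order = sorted(range(n), key=lambda i: (z[i], -delta[i]))
    d = list(delta)
    d[order[-1]] = 1
    jumps = [0.0] * n
    survival = 1.0
    for rank, i in enumerate(order, start=1):
        if d[i] == 1:
            jumps[i] = survival / (n - rank + 1)
            survival *= (n - rank) / (n - rank + 1)
    return jumps

def bj_pseudo_responses(x, y, delta, beta):
    """Responses (5.1), with the conditional expectation estimated as in (5.2)."""
    n = len(y)
    z = [yi - beta * xi for xi, yi in zip(x, y)]
    w = pl_jumps(z, delta)
    y_star = [float(yi) for yi in y]
    for i in range(n):
        if delta[i] == 0:
            above = [(w[k], z[k]) for k in range(n) if z[k] > z[i]]
            mass = sum(wk for wk, _ in above)          # equals 1 - F_hat(z_i)
            if mass > 0.0:
                y_star[i] = beta * x[i] + sum(wk * zk for wk, zk in above) / mass
    return y_star

def bj_fit(x, y, delta, beta0, max_iter=100, tol=1e-10):
    """Iterate (5.3), then apply (5.4); returns (alpha, beta, converged)."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta, converged = beta0, False
    for _ in range(max_iter):
        y_star = bj_pseudo_responses(x, y, delta, beta)
        new_beta = sum(ys * (xi - x_bar) for ys, xi in zip(y_star, x)) / sxx
        converged = abs(new_beta - beta) < tol
        beta = new_beta
        if converged:
            break
    y_star = bj_pseudo_responses(x, y, delta, beta)
    alpha = sum(y_star) / n - beta * x_bar
    return alpha, beta, converged

# Hypothetical checks.
filled = bj_pseudo_responses([0, 0, 0], [1, 2, 3], [1, 0, 1], beta=0.0)
alpha_hat, beta_hat, ok = bj_fit([1, 2, 3, 4], [3, 5, 7, 9], [1, 1, 1, 1], beta0=0.0)
```

In the first check the censored value 2 is replaced by the weighted average of the larger residuals, which here is the single value 3. In the second, with no censoring, the procedure reduces to ordinary least squares.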
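Buckley and James (1979) suggest, but do not fully justify,
that $\hat\beta$ is asymptotically normal and that a reasonable
estimate of variance is
$$\widehat{\operatorname{Var}}(\hat\beta) = \frac{\hat\sigma_u^2}{\sum_u (x_i - \bar x_u)^2},$$  (5.5)
where
$$\hat\sigma_u^2 = \frac{1}{n_u - 2}\sum_u \left(y_i - \bar y_u - \hat\beta\,(x_i - \bar x_u)\right)^2,$$  (5.6)
with $n_u$ the number of uncensored observations and the sums
taken over the uncensored observations.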
Although this method does not require assumption A2
for point estimation, we would not expect (5.5) to be
adequate if there is extremely uneven censoring along the
regression line. As in the Miller method, it seems sensible
to consider dividing the x range to create more homogeneous
regions.
We should note that the BJ method is a nonparametric
analogue of a technique due to Schmee and Hahn (1979) that
uses normality assumptions in estimating the expectation
at (5.2).
2.6 The Koul, Susarla and Van Ryzin Method
This method (KSV) assumes the Model (3.1) and like BJ
defines a pseudo random variable which is unbiased for
$\alpha + \beta x$ and then uses estimates for some observations of
that variable. In this case the pseudo random variable is
$$Y_i^* = \frac{\delta_i\,Y_i}{1 - G(Y_i)},$$  (6.1)
where $G(\cdot)$ is the right continuous distribution function
for the censoring variables. The $Y_i^*$ variables, being either
zero or inflated values of true survival times, are not
intuitively pleasing but we do have $E(Y_i^*) = \alpha + \beta x_i$. Using an
estimate of $G(\cdot)$ we can obtain observations $y_i^*$, $i = 1,\ldots,n$,
and then apply the usual least squares solutions (5.3, 5.4).
No assumption about $G(\cdot)$ is required for $Y_i^*$ to be unbiased
but in order to estimate $G(\cdot)$ we must assume that it is
independent of $x$; i.e., we make assumption A3 (cf. Section
2.3).
By reversing the roles of the survival and censoring
random variables it would be possible to estimate the
distribution function $G(\cdot)$ using the PL estimator; however,
KSV advocate the use of an alternative, asymptotically
similar estimator, which is technically more convenient in
their theory. Use is made of a Bayesian estimator from
Susarla and Van Ryzin (1980),
$$1 - \hat G(t) = \prod_{\{i :\, y_i \le t\}} \left\{\frac{1 + N^+(y_i)}{2 + N^+(y_i)}\right\}^{1 - \delta_i},$$  (6.2)
where $N^+(y)$ is the number of $y_i$ greater than $y$. Note that
in (6.2) because we are making inference about censoring
variables, we take censored y's as preceding uncensored
y's in the case of ties. Like the PL estimator, the
estimator at (6.2) can be unstable for large $t$, and KSV propose
a truncation of very large observations and use (6.2) only
for $y_i \le M_n$. The selection of $M_n$, however, is not clearcut.
No iteration is required and the parameter estimation
is computationally straightforward for any number of
covariables. Koul et al. (1981) give conditions for consistency and
asymptotic normality of the solutions and discuss their
implications in practice. Integral expressions are given
for the asymptotic variances and covariances of the
estimators, but computational forms are available only for the
one covariable case.
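Because no iteration is involved, the KSV computation is short. The sketch below is hedged in two ways: it assumes the censoring estimator at (6.2) is the product of {(1 + N+(y_i)) / (2 + N+(y_i))} over censored y_i not exceeding t, and it omits the truncation at $M_n$ altogether. Function names and data are hypothetical.

```python
def censoring_survival(t, y, delta):
    """1 - G_hat(t), assuming the product form of (6.2) over censored y_i <= t."""
    prod = 1.0
    for yi, di in zip(y, delta):
        if yi <= t and di == 0:
            n_plus = sum(1 for yj in y if yj > yi)    # N+(y_i)
            prod *= (1.0 + n_plus) / (2.0 + n_plus)
    return prod

def ksv_fit(x, y, delta):
    """Pseudo responses (6.1) followed by least squares (5.3), (5.4).
    No truncation of large observations is applied in this sketch."""
    n = len(y)
    y_star = [di * yi / censoring_survival(yi, y, delta) for yi, di in zip(y, delta)]
    x_bar = sum(x) / n
    beta = (sum(ys * (xi - x_bar) for ys, xi in zip(y_star, x)) /
            sum((xi - x_bar) ** 2 for xi in x))
    alpha = sum(y_star) / n - beta * x_bar
    return alpha, beta, y_star

# Hypothetical data: the middle observation is censored.
_, _, pseudo = ksv_fit([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1, 0, 1])

# With no censoring the estimator of G is zero everywhere and the fit is ordinary least squares.
a_full, b_full, _ = ksv_fit([1, 2, 3, 4], [3, 5, 7, 9], [1, 1, 1, 1])
```

The first example shows the behaviour described in the text: the censored response becomes zero and the uncensored response above it is inflated (here from 3 to 4.5).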
2.7 Some Comparative Studies
The Cox method has been given much attention in the
literature and is used extensively, yet there is much appeal
to the direct easy interpretation of the linear model. It
would seem that the choice between the Cox method and the
other three methods just outlined must be based on the
appropriateness of the proportional hazards model or the linear
model. However, a distinction may be hard to make in
practice. The choice of method should also be influenced by the
nature of the censoring, but again this is not likely to be
clearcut. The methods are asymptotic and in most cases
involve heuristic arguments or hard to verify conditions.
Of major concern, then, is how robust the methods turn out
to be in practice. Some trials and comparisons that have
been made with these methods are now referred to briefly,
but a fuller appraisal of various aspects of the methods
is deferred to Chapter Five.
In his original paper Miller (1976) compares the
performances of the Cox and Miller methods in analyzing data
from the Stanford Heart Program that was referred to in
Chapter One. Miller and Halpern (1982) extend this to a
comparison of all four methods and come out generally in
favor of the Cox and BJ methods, finding that the other
two methods have methodological weaknesses. Buckley and
James (1979) provide some simulation results for the Miller
and BJ methods. Most of their trials are at sample size
50 with 50 percent censoring. They employ a variety of
residual error and censoring models and investigate the
bias of the estimate. The BJ method appears robust, while
the Miller method seems rather sensitive to departures from
assumption A2 (see Section 2.3). Buckley and James (1979,
p. 434) do not report coverage probabilities for their
intervals but do report "quite adequate variance estimates"
(using 4.5).
All of the methods considered in this chapter are
asymptotic methods. In the next chapter we develop a
procedure which is appropriate for all sample sizes.
CHAPTER THREE
A NEW METHOD
3.1 Structure and Notation
We shall adopt the model and notation of Section 2.3.
A method will be described for estimating $\beta$ in model (2.3.1)
with $p = 1$. The observable pairs $(Y_i, \delta_i)$ described in
Section 2.3 may also be represented in the following way.
Let $Y_i$ $(= T_i)$ denote an observable uncensored random variable
and $Y_i^{+}$ $(= C_i < T_i)$ denote an observable censored random
variable. The general notation, $Y_i^{(+)}$, is used to indicate that
censoring might or might not be present. Sample
observations will be denoted $(y_i^{(+)}, x_i)$.
Assumption A2 of Section 2.3 will be imposed in order
for the new method to provide a confidence interval. It
might be argued that A2 imposes a severe restriction on the
practical utility of the proposed method, but a Monte Carlo
study reported in Chapter Four indicates that the method is
fairly robust to departures from A2. Recall that
assumption A2 is also required for the use of the Miller and
Buckley-James methods. This is discussed further in
Chapter Five.
3.2 Point Estimation of $\beta$
In the case of no censoring, some procedures for
testing $H_0: \beta = b$ and for estimation of $\beta$ use the fact that
$$Z_i(b) = Y_i - b x_i = \alpha + (\beta - b)\,x_i + \varepsilon_i$$
and $x_i$ are functionally independent under $H_0$. Any test
for trend among the pairs $(Z_i(b), x_i)$ may be used to test $H_0$.
Suppose the test statistic $S$ is a function of the pairs
$(Z_i(b), x_i)$, $i = 1,\ldots,n$; then following Hodges and Lehmann
(1963) an estimate of $\beta$ can be defined as that value $b$ which
makes $S = S(Z(b), x)$ most consistent with a hypothesis of
no trend relationship between $Z_i(b)$ and $x_i$. Typically, the
statistic chosen is $\hat\tau$, the hypothesis of no trend is $\tau = 0$
and $\hat\tau$ is distribution-free, symmetric about zero (cf.
Kendall, 1970; Sen, 1968).
Brown, Hollander and Korwar (1974) extend the above
testing procedure to the case of censored data, but do not
attempt to estimate $\beta$. They provide a test of $H_0: \beta = 0$
using a statistic which is a special case ($b = 0$) of that
which is now described and given at (2.3). Following the
above discussion and taking censoring into account, define
$$z_i^{(+)}(b) = y_i^{(+)} - b x_i, \quad i = 1,\ldots,n.$$  (2.1)
As before, under $H_0: \beta = b$ there will be no trend among the
pairs $(z_i^{(+)}(b), x_i)$ and our purpose now is to define a
statistic that would indicate any such trend. Firstly, in
the intuitive spirit of Brown, Hollander and Korwar we state
the following definition. The argument $b$ of $Z_i^{(+)}(\cdot)$ is
suppressed for convenience and $Z$ without superscript is the
variable value without regard to censorship.
Definition 2.2. A pair of points $(z_i^{(+)}, x_i)$, $(z_j^{(+)}, x_j)$
with $x_i > x_j$ are "definitely concordant" if,
either $\delta_i = \delta_j = 1$ and $Z_i > Z_j$,
or $\delta_i = 0$, $\delta_j = 1$ and $Z_i > Z_j$;
and "definitely discordant" if,
either $\delta_i = \delta_j = 1$ and $Z_i < Z_j$,
or $\delta_i = 1$, $\delta_j = 0$ and $Z_i < Z_j$.
Pairs of points for which Definition 2.2 does not apply are
considered as unable to contribute to an assessment of
overall concordance and are ignored.
Motivated by this assessment of pairs of points we
define a statistic that may be used to indicate trend among
the pairs $(z_i^{(+)}(b), x_i)$, $i = 1,\ldots,n$, by
$$S(Z^{(+)}(b), x) = \sum_{i=1}^{n}\sum_{j=1}^{n} \eta_{ij}(b)\,\psi(x_i - x_j),$$  (2.3)
where
$$\eta_{ij}(b) = \delta_j\,\phi(Z_i(b) - Z_j(b)) - \delta_i\,\phi(Z_j(b) - Z_i(b)),$$
$$\phi(t) = \begin{cases} 1, & t > 0, \\ 0, & t \le 0, \end{cases} \qquad \psi(t) = \begin{cases} 1, & t > 0, \\ 0, & t \le 0. \end{cases}$$
For convenience the arguments for $Z_i^{(+)}(\cdot)$ and $S(\cdot)$ will be
suppressed for general reference. To emphasize the
dependence on $b$, $S(Z^{(+)}(b), x)$ will often be abbreviated to $S(b)$.
Notice that $S$ is the number of "definitely concordant" pairs
minus the number of "definitely discordant" pairs and thus
has some appeal as a measure of trend among the pairs
$(Z_i^{(+)}, x_i)$. Values of $S$ close to zero indicate no trend.
Two other forms of the statistic $S$ will be found useful
in the sequel.
i) In the special case of distinct Z-values (i.e.,
$Z_i(b) \ne Z_j(b)$ for $i \ne j$) and ordered distinct x-values,
$x_1 < x_2 < \cdots < x_n$,
$$S(b) = \sum_{j=1}^{n-1}\sum_{i=j+1}^{n} \left\{\phi(Z_i(b) - Z_j(b))\,(\delta_i + \delta_j) - \delta_i\right\}.$$  (2.4)
ii) If Z-values are distinct and ordered,
$Z_1 < Z_2 < \cdots < Z_n$,
$$S(b) = \sum_{j=1}^{n-1}\sum_{i=j+1}^{n} \delta_j\,\operatorname{sgn}(x_i - x_j),$$  (2.5)
where
$$\operatorname{sgn}(t) = \begin{cases} 1, & t > 0, \\ 0, & t = 0, \\ -1, & t < 0. \end{cases}$$
Here the dependence on $b$ is through the ordering of
Z-values and consequent relabelling of $\delta$'s and $x$'s.
An important property of $S$ is stated in the following
Lemma.
Lemma 2.6. The statistic $S(b)$ is nonincreasing in $b$.
Proof: $x_i > x_j$ implies that $Z_i(b) - Z_j(b) = Y_i - Y_j - b(x_i - x_j)$
is a decreasing function of $b$. This, together with the
fact that $\phi(t)$ is nondecreasing in $t$, implies that $\eta_{ij}(b)$
is nonincreasing in $b$ for each pair $(i,j)$ with $x_i > x_j$,
which completes the proof.
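The statistic and its monotonicity can be checked directly by counting pairs. The following sketch is illustrative only: the function name and the four-point data set are hypothetical, and pairs with tied residuals are taken to contribute zero, which assumes $\phi(0) = 0$.

```python
def s_statistic(b, x, y, delta):
    """S(b): definitely concordant pairs minus definitely discordant pairs.

    For a pair with x[i] > x[j], the pair counts +1 when z[i] > z[j] and the
    smaller residual (j) is uncensored, and -1 when z[i] < z[j] and the smaller
    residual (i) is uncensored. Tied residuals contribute nothing (assumed)."""
    z = [yi - b * xi for xi, yi in zip(x, y)]
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(n):
            if x[i] > x[j]:
                if z[i] > z[j]:
                    s += delta[j]
                elif z[i] < z[j]:
                    s -= delta[i]
    return s

# Hypothetical data: the third response is censored.
x = [1, 2, 3, 4]
y = [1.0, 3.0, 2.0, 5.0]
delta = [1, 1, 0, 1]

s_at_zero = s_statistic(0.0, x, y, delta)
grid = [k / 4.0 for k in range(-12, 13)]            # b from -3 to 3 in steps of 0.25
values = [s_statistic(b, x, y, delta) for b in grid]
```

Evaluating S on an increasing grid of b values gives a sequence that never increases, as Lemma 2.6 asserts.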
As noted earlier any solution to the equation $S(b) = 0$
is a reasonable estimate for $\beta$. An exact solution might
not exist but Lemma 2.6 ensures that we can define a unique
point estimator in the following way.
Definition 2.7. Let $\beta^{*} = \operatorname{Sup}\{b : S(b) > 0\}$
and $\beta^{**} = \operatorname{Inf}\{b : S(b) < 0\}$,
then a unique point estimator for $\beta$ is
$$\hat\beta = \tfrac{1}{2}\,(\beta^{*} + \beta^{**}).$$
Clearly the statistic $S(b)$ depends on the ranks of the
Z's, the ranks of the x's and the censoring pattern. The
x-values and the censoring pattern are fixed for a
particular data set but numerical realizations of $Z$ change with
$b$ (cf. 2.1). As noted in the following Lemma, for certain
"critical" b-values two or more Z's will change rank. Some
further properties of $S(b)$ as a function of $b$ are given in
Lemma 2.8.
Lemma 2.8.
i) A change in the value of $\eta_{ij}(b)$ can occur only as
$b$ changes through
$$b_{ij} = \frac{Y_i - Y_j}{x_i - x_j}, \quad x_i > x_j.$$  (2.9)
ii) For $(i,j) : x_i > x_j$ the change in $\eta_{ij}(b)$ as
$b$ increases through $b_{ij}$ is
$$c_{ij} = -(\delta_i + \delta_j).$$  (2.10)
iii) Suppose there are $\ell$ distinct values of $b_{ij}$ given
by (2.9). Denote these values
$$b_1 < b_2 < \cdots < b_\ell,$$
and let
$$B_k = \{(i,j) : b_{ij} = b_k,\ x_i > x_j\}, \quad k = 1,\ldots,\ell.$$
Then $S(b)$ is a step function in $b$ with steps of magnitude
$c_k$ at $b_k$ where
$$c_k = \sum_{(i,j) \in B_k} c_{ij}.$$  (2.11)
Proof.
i) Since $z_i^{(+)}(b) = y_i^{(+)} - b x_i$, it follows that for
$x_i > x_j$,
$$Z_i(b) \begin{cases} > Z_j(b), & b < b_{ij}, \\ = Z_j(b), & b = b_{ij}, \\ < Z_j(b), & b > b_{ij}, \end{cases}$$  (2.12)
where $b_{ij}$ is as in (2.9). Therefore
$$\eta_{ij}(b) = \delta_j\,\phi(Z_i(b) - Z_j(b)) - \delta_i\,\phi(Z_j(b) - Z_i(b))$$
can change only as $b$ changes through $b_{ij}$.
ii) Notice that
$$\eta_{ij}(b) = \begin{cases} \delta_j, & Z_i(b) > Z_j(b), \\ 0, & Z_i(b) = Z_j(b), \\ -\delta_i, & Z_i(b) < Z_j(b). \end{cases}$$  (2.13)
In view of (2.12) it is clear that for $(i,j) : x_i > x_j$
we have that $\eta_{ij}(b)$ decreases by $(\delta_i + \delta_j)$ as $b$ increases
through $b_{ij}$. Letting $c_{ij}$ denote the change, $c_{ij} = -(\delta_i + \delta_j)$
completes the proof of (ii).
iii) Recall that
$$S(b) = \sum_{\{(i,j) :\, x_i > x_j\}} \eta_{ij}(b),$$  (2.14)
which can change with $b$ only when $\eta_{ij}(b)$ changes with
$b$ for some $(i,j)$. Since $\eta_{ij}(b)$ changes only at $b = b_{ij}$,
$S(b)$ will also change only at $b = b_k$. For a specified $b_k$
the change in $S(b)$ is the accumulation of the change in $\eta_{ij}(b)$
as $(i,j)$ varies over $B_k$; that is, the change is
$c_k = \sum_{(i,j) \in B_k} c_{ij}$,
which completes the proof of (iii).
We now make some observations relevant to the
computation of $S$ and $\hat\beta$.
(a) If for a data set there are no tied x's and no
coincident $b_{ij}$ values, then in Lemma 2.8 (iii), $\ell = \binom{n}{2}$.
(b) Since $S(b)$ is nonincreasing in $b$ (Lemma 2.6), the
greatest possible value for $S(b)$ is $S(b_0)$ where
$b_0 < b_1 < b_2 < \cdots < b_\ell$.
(c) Any change in $S(b)$ which may occur at $b_1, b_2, \ldots, b_\ell$
depends (via 2.10, 2.11) on the censorship pattern of the
$Z(b)$ which change rank at the critical $b$.
(d) Each critical b value $b_k$ has a change value
$c_k$, $k = 1,2,\ldots,\ell$, associated with it (Lemma 2.8 (iii))
and easy sequential computation of $S$ is facilitated in the
following way. Let $s_k$ denote the value of $S(b)$ for
$b_k < b < b_{k+1}$ with $b_{\ell+1} = \infty$, then
$$s_1 = s_0 + c_1,$$
$$s_2 = s_1 + c_2,$$
and in general
$$s_k = s_{k-1} + c_k, \quad k = 1,\ldots,\ell.$$  (2.15)
(e) The algorithm given in (d) does not give an evaluation
of $S$ at the critical b values but this is not needed and the
enumeration given makes it easy to locate the point estimate
proposed in Definition 2.7.
For the purpose of illustration consider the following
hypothetical example consisting of $n = 5$ $(x_i, y_i^{(+)})$ pairs.

Example 2.16

    i          1     2     3     4     5
    x_i        1     2     3     4     5
    y_i^(+)    3     2+    3     3+    4
    delta_i    1     0     1     0     1

Using (2.9) and (2.10) we compute $b_{ij}$ and $c_{ij}$ values
for $(i,j) : x_i > x_j$. For example,
$$b_{2,1} = (2 - 3)/(2 - 1) = -1$$
and
$$c_{2,1} = -(1 + 0) = -1.$$
These and the other values are given in Table 1.

Table 1. Values of $b_{ij}$ and ($c_{ij}$) for Example 2.16

    i\j    1            2            3            4           5
    1      -
    2      -1 (-1)      -
    3      0 (-2)       1 (-1)       -
    4      0 (-1)       0.5 (0)      0 (-1)       -
    5      0.25 (-2)    0.67 (-1)    0.5 (-2)     1 (-1)      -

The first step in the computation of the $s_k$ is the
computation of $s_0$. This can be done using (2.3) with any value
$b < b_1$; however, the computation can be much simplified in
the following way. For any $b < b_1$, (2.12) and (2.13) imply
$\eta_{ij}(b) = \delta_j$ if $x_i > x_j$. Therefore, from (2.14)
$$s_0 = \sum_{(i,j) :\, x_i > x_j} \delta_j.$$
Since entries appear in Table 1 only if $x_i > x_j$ we see that
$s_0$ is simply the total number of entries under columns for
which $\delta_j = 1$. In our example these are columns 1, 3 and 5,
thus $s_0 = 4 + 2 + 0 = 6$. Sequential computation of $S$ then
follows using the ordered $b_k$, $k = 1,\ldots,\ell$, and the
corresponding "change" values, $c_k$, computed using (2.11). The
resulting step function $S(b)$ is tabulated in Table 2 and graphed
in Figure 1. The point estimate is $\hat\beta = 0.25$.
Table 2. Computation of S(b)

    s_0 = 6,   s_k = s_{k-1} + c_k

    k      1      2      3       4      5       6
    b_k    -1     0      0.25    0.5    0.67    1
    c_k    -1     -4     -2      -2     -1      -2
    s_k    5      1      -1      -3     -4      -6

[Figure 1: step plot of S(b) against b; horizontal axis b from -1.0 to 1.0, vertical axis S(b) from -6 to 6.]
Figure 1. Graph of S(b) vs. b for Example 2.16.
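The sequential computation of observation (d) can be sketched in a few lines and checked against Example 2.16. This is an illustration, not the author's program: the function names are hypothetical, exact rational arithmetic is used so that coincident critical values are grouped reliably (which assumes integer or rational data), and the point estimate is taken as the midpoint of Sup{b : S(b) > 0} and Inf{b : S(b) < 0}, which is an assumed reading of Definition 2.7.

```python
from collections import defaultdict
from fractions import Fraction

def step_function(x, y, delta):
    """Return s_0 and a list of (b_k, c_k, s_k) following (2.9)-(2.11) and (2.15)."""
    n = len(x)
    changes = defaultdict(int)
    s0 = 0
    for i in range(n):
        for j in range(n):
            if x[i] > x[j]:
                b_ij = Fraction(y[i] - y[j], x[i] - x[j])      # (2.9)
                changes[b_ij] += -(delta[i] + delta[j])        # (2.10), summed as in (2.11)
                s0 += delta[j]                                 # value of S for b below b_1
    steps, s = [], s0
    for b_k in sorted(changes):
        s += changes[b_k]                                      # (2.15)
        steps.append((b_k, changes[b_k], s))
    return s0, steps

def point_estimate(s0, steps):
    """Midpoint of Sup{b: S(b) > 0} and Inf{b: S(b) < 0}; assumes s_0 > 0 > s_l."""
    levels = [s0] + [s for _, _, s in steps]     # S equals levels[k] between b_k and b_{k+1}
    cuts = [b for b, _, _ in steps]              # cuts[k] is b_{k+1}
    last_positive = max(k for k, s in enumerate(levels) if s > 0)
    first_negative = min(k for k, s in enumerate(levels) if s < 0)
    return (cuts[last_positive] + cuts[first_negative - 1]) / 2

# Data of Example 2.16 (censored responses carry delta = 0).
x = [1, 2, 3, 4, 5]
y = [3, 2, 3, 3, 4]
delta = [1, 0, 1, 0, 1]

s0, steps = step_function(x, y, delta)
beta_hat = point_estimate(s0, steps)
```

For the example data this reproduces the rows of Table 2 and the point estimate 0.25. Only the open intervals between critical values are examined, in line with observation (e).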
3.3 Interval Estimation of $\beta$; Exact Method
A $100(1-\alpha)\%$ confidence set for $\beta$ can be obtained by
inverting an $\alpha$-level test of $H_0: \beta = b$ (Lehmann, 1959,
p. 173). Our objective in this section is to show that a
distribution-free confidence interval for $\beta$ can be defined
in terms of a permutation test of $H_0: \beta = b$ using the
statistic $S(b)$ defined at (2.3).
Consideration is now given to using permutation
distributions for $S(b)$ that are generated for a given data set
under assumption A2 of Section 2.3. A direct consequence
of assumption A2 is that the probability of censoring or
noncensoring of a y-value is independent of the associated
value of $x$, i.e.,
$$\Pr(T_i \le C_i \mid x = x_i) = \Pr(T_i \le C_i).$$
Thus the probability distribution of $\delta_i$ in a pair $(Z_i, \delta_i)$
is independent of $x_i$ for all $i$. Let
$$w(b) = \left((z_1(b), \delta_1, x_1), \ldots, (z_n(b), \delta_n, x_n)\right)$$
be an observed vector. We now reckon on all permutations
of $\{(z_1, \delta_1), (z_2, \delta_2), \ldots, (z_n, \delta_n)\}$ and
consider the $n!$ transformations of $w(b)$ given by
$$g_p(w(b)) = \left((z_{p_1}, \delta_{p_1}, x_1), \ldots, (z_{p_n}, \delta_{p_n}, x_n)\right),$$  (3.2)
where
$$p \in P_n = \{p : p \text{ is a permutation of } (1,\ldots,n)\}.$$
Thus under $H_0: \beta = b$ and assumption A2, the conditional
distribution of a random $W(b)$, conditional on its observed
value $w(b)$, is uniform on $\{g_p(w(b)) : p \in P_n\}$. Using the
superscript c to indicate that the result is conditional we
have that the conditional null distribution of $W(b)$ is
$$\Pr{}^{c}\left(W(b) = g_p(w(b))\right) = \frac{1}{n!}, \quad p \in P_n.$$  (3.3)
The conditional discrete null distribution of $S(b)$ can thus
be determined by computing $S(g(\cdot))$ for each equally likely
$g(\cdot)$. A conservative test of $H_0: \beta = b$ at nominal level $\alpha$ is
then given by
$$\Phi(w(b)) = \begin{cases} 1, & S(w(b)) > s_U(b) \text{ or } S(w(b)) < s_L(b), \\ 0, & \text{otherwise,} \end{cases}$$  (3.4)
where $\Phi(w(b))$ is the probability of rejecting $H_0$ when
$W(b) = w(b)$,
$$s_U(b) = \operatorname{Inf}\{s : \Pr{}^{c}(S(b) > s \mid \beta = b) \le \alpha/2\}$$
and
$$s_L(b) = \operatorname{Sup}\{s : \Pr{}^{c}(S(b) < s \mid \beta = b) \le \alpha/2\}.$$
The test given by (3.4) defines a conservative $100(1-\alpha)\%$
confidence set for $\beta$:
$$CS_\beta = \{b : s_L(b) \le S(b) \le s_U(b)\},$$  (3.5)
consisting of the set of all "acceptable" values of $b$.
The confidence set $CS_\beta$ will be conservative in the sense
$\Pr\{\beta \in CS_\beta\} \ge 1 - \alpha$, because of the use of discrete
distributions in defining $s_U(b)$ and $s_L(b)$ at (3.4).
The determination of $CS_\beta$ for a given set of data is
complicated by the fact that the null distribution of $S(b)$
depends upon $b$ (Corollary 3.7). The following set of Lemmas
characterizes this dependence.
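For small n the permutation distribution can be enumerated outright. The sketch below is illustrative and rests on assumptions: function names are hypothetical, the critical values are taken as the smallest s with upper tail probability at most alpha/2 and the largest s with lower tail probability at most alpha/2, tied residuals contribute nothing to S, and the data are those of Example 2.16 evaluated at the non-critical value b = 1/10 with alpha = 0.10.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def s_value(z, delta, x):
    """Concordant minus discordant count for residuals z paired with fixed x."""
    s = 0
    for i in range(len(x)):
        for j in range(len(x)):
            if x[i] > x[j]:
                if z[i] > z[j]:
                    s += delta[j]
                elif z[i] < z[j]:
                    s -= delta[i]
    return s

def permutation_distribution(b, x, y, delta):
    """Counts of S over all n! assignments of the (z, delta) pairs to the fixed x's."""
    z = [yi - b * xi for xi, yi in zip(x, y)]
    counts = Counter()
    for perm in permutations(list(zip(z, delta))):
        counts[s_value([p[0] for p in perm], [p[1] for p in perm], x)] += 1
    return counts

def critical_values(counts, alpha):
    """(s_L, s_U): tail probabilities beyond each are at most alpha / 2."""
    total = sum(counts.values())
    support = sorted(counts)
    upper = lambda s: sum(c for v, c in counts.items() if v > s) / total
    lower = lambda s: sum(c for v, c in counts.items() if v < s) / total
    s_u = min(s for s in support if upper(s) <= alpha / 2)
    s_l = max(s for s in support if lower(s) <= alpha / 2)
    return s_l, s_u

x = [1, 2, 3, 4, 5]
y = [3, 2, 3, 3, 4]
delta = [1, 0, 1, 0, 1]
b = Fraction(1, 10)

counts = permutation_distribution(b, x, y, delta)
s_l, s_u = critical_values(counts, alpha=0.10)
observed = s_value([yi - b * xi for xi, yi in zip(x, y)], delta, x)
accept = s_l <= observed <= s_u      # True means b = 1/10 belongs to the confidence set
```

Full enumeration costs n! evaluations, so it is practical only for small samples; the sketch is meant to show the construction rather than to be an efficient procedure.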
Lemma 3.6. If the x-values are distinct then, for all $b$,
the permutation distribution of $S(b)$ will be symmetric about
zero with support of the form $p + 2q$, $q = 0,1,2,\ldots,r$, for
some integers $p$, $r$. That is, the support values increment
by 2.
Proof: Consider the effect of associating the reverse order
of z-values with fixed x-values. From (2.3) we have
$$S(g_p(w(b))) = -S(g_q(w(b))),$$
for all $p \in P_n$ and $q \in \{q : q_1 = n - p_1 + 1, \ldots, q_n = n - p_n + 1\}$.
This proves the symmetry of $S(b)$. Consider $z_i^{(+)}(b)$, $i = 1,
\ldots,n$, associated with ordered x-values $x_1 < x_2 < \cdots < x_n$.
Let $(i,j)$, $i > j$, denote arbitrary positions in the vector
of z-values. Suppose that after a permutation of z-values
the $z^{(+)}$'s from positions $(i,j)$ have moved to positions
$(p_i, p_j)$, respectively. Notice that if
$p_i > p_j$ then $\eta_{p_i p_j} = \eta_{ij}$,
and if
$p_i < p_j$ then $\eta_{p_j p_i} = -\eta_{ij}$,
where $\eta_{ij} = 1, 0, -1$ from (2.3).
Thus in an arbitrary permutation of the elements $z_i^{(+)}(b)$,
$i = 1,\ldots,n$, all pairs contribute a change of $\pm 2$ or 0 to the
total change of $S(b) = \sum_{(i,j) :\, x_i > x_j} \eta_{ij}$. Therefore, the
values of $S(b)$ for two arbitrary permutations of $z^{(+)}$ values
must differ by zero or a nonzero multiple of 2.
The following Corollary shows that the null
distribution of $S(b)$ does indeed depend on $b$.
Corollary 3.7. A change in $b$ can change the permutation
null distribution of $S(b)$.
Proof: In view of Lemma 3.6 this will be proved if we show
that a change in $b$ can change $S(b)$ by some odd number.
From (2.11) the jump in $S(\cdot)$ occurring as $b$ changes through
$b_k$ is
$$c_k = -\sum_{(i,j) \in B_k} (\delta_i + \delta_j).$$
Clearly this change can be an odd number.
Recall that $S(b)$ is a function of the ranks of
$\{z_i(b) : i = 1,\ldots,n\}$ and that these ranks can change only
at the critical values $b_k$, $k = 1,\ldots,\ell$ (see 2.12). Thus
there can be no difference in the permutation distributions
of $S(b)$ for $b$ lying between critical values. The following
Lemma and Corollary concern situations for which the
distribution of $S(b)$ changes as $b$ changes through a $b_k$.
Lemma 3.8. Suppose $b_{ij} = b_k$ for just one pair of subscripts
$(i,j)$. That is, $b_k$ is a nontied critical value. Consider
$b'$, $b''$ such that $b_{k-1} < b' < b_k < b'' < b_{k+1}$, and let $c_{ij}$ be given
by (2.10).
i) If $c_{ij} = -2$ or $c_{ij} = 0$, then $S(b') \overset{dc}{=} S(b'')$, using
$\overset{dc}{=}$ to indicate "equal in conditional distribution."
ii) If $c_{ij} = -1$ and x-values are distinct, then
$S(b') \overset{dc}{\ne} S(b'')$.
Proof:
i) Since $b_{ij} = b_k$ for just one pair of subscripts (i,j)
it follows from (2.12) that only one pair $(z_i(b), z_j(b))$
changes order as b changes through $b_k$. Now, $c_{ij} = (\delta_i + \delta_j) = 2$
or 0 implies $\delta_i = \delta_j = 1$ or $\delta_i = \delta_j = 0$, so that the elements
which change rank order are either both censored or both
uncensored. Thus the censoring pattern over the ranks
of $\{z_i(b'): i = 1, \ldots, n\}$ is the same as the censoring pattern
over the ranks of $\{z_i(b''): i = 1, \ldots, n\}$ and consequently
$S(b') \stackrel{dc}{=} S(b'')$.
ii) Immediate from Lemma 3.6.
Corollary 3.9. If $c_{ij} = 2$ or 0 for each pair (i,j) such
that $b_{ij} = b_k$, then the permutation distribution of S is not
changed as b jumps through $b_k$ from b' to b''.
Proof: Immediate from Lemma 3.8.
From the prior Lemmas and discussion it is clear that
we can identify ranges of b values for which critical points
for the tests (3.4) do not have to be recomputed. A change
of b through certain $b_k$ will change the permutation distribution,
and in these cases test critical values must be
recomputed from a regenerated permutation distribution.
The question of whether the confidence set at (3.5)
will be a single interval is now addressed. It will
be shown that if, as b values are incremented outwards from
the point estimate, a point is reached at which test (3.4)
leads to rejection, no subsequent test would lead to acceptance,
and thus the confidence set does make up a single interval of
plausible b values.
We again denote critical b values as in Lemma 2.8(iii),
i.e., $b_1 < b_2 < \cdots$, and let $S_{k-1}$ denote the random variable
S(b') and $S_k$ denote S(b'') for $b_{k-1} < b' < b_k < b'' < b_{k+1}$.
As noted earlier the distribution of $S_k$ is the same for any
b'' with $b_k < b'' < b_{k+1}$. For the conditional null distribution
of $S_k$ we make the following definition.
Definition 3.10. The p-value for a sample realization $s_k$ is
\[ p_k = \begin{cases} \Pr(S_k \le s_k), & s_k \le 0, \\ \Pr(S_k \ge s_k), & s_k \ge 0. \end{cases} \]
In terms of the test at (3.4) we would reject $H_0: \beta = b''$
for some $b_k < b'' < b_{k+1}$ if $p_k < \alpha/2$. Consider $s_{k-1} < 0$
and think in terms of a test at some level $\alpha$ with b' a
rejected value. Then $p_{k-1} < \alpha/2$ and it is clear that b''
cannot be accepted at the level $\alpha$ provided $p_k \le p_{k-1}$.
Therefore a sufficient condition for the confidence set
(3.5) to be a single interval is that
\[ p_k \le p_{k-1} \ \ \forall k \text{ such that } s_k < 0, \qquad
   p_{k-1} \le p_k \ \ \forall k \text{ such that } s_k > 0. \tag{3.11} \]
We now argue that the condition (3.11) will be satisfied
by the conditional null distributions of S. Because
S(b) is nonincreasing in b (Lemma 2.6) condition (3.11) is
satisfied whenever $S_{k-1} \stackrel{dc}{=} S_k$.
We now consider what happens when the distribution of
S(b) does change. We treat first the case where there are
no tied $b_{ij}$ values at $b_k$ and extend the result to the tied
case later. By Lemma 3.8(ii) our concern is with the case
where $c_{ij} = 1$. For this case just one censored and one
uncensored value of $z^{(+)}(b)$ change rank at the critical $b_{ij}$ and
such interchange is between adjacent ranks (cf. 2.12).
Furthermore, since S(b) is nonincreasing, after such a
rank interchange the higher rank will be associated with
the lower x-value and S(b) will have reduced by one. Without
loss of generality assume $x_1 \le x_2 \le \cdots \le x_n$ and note that there
can be no interchange of ranks associated with a pair of
tied x's (cf. 2.9). Let
\[ R^{(+)}(b) = (R_1^{(+)}(b), R_2^{(+)}(b), \ldots, R_n^{(+)}(b))' \]
denote the ranks of $\{z_i^{(+)}(b): i = 1, \ldots, n\}$. As before, the
superscript (+) indicates the incorporation of censoring
information, $R_i^{+}(b)$ indicating that $z_i(b)$ is censored and
the value of $z_i(b)$ has rank $R_i(b)$, and $R_i(b)$ indicating that
$z_i(b)$ is not censored and has rank $R_i(b)$. Clearly there
may be tied ranks, but for (i,j) such that $b_{ij}$ is a critical
value we have $R_i(b) = R_j(b)$ only at $b = b_{ij}$. If there
are no ties $R \in P_n$. For notation purposes take $b_k = b_{ij}$
and let $R_{k-1}^{(+)}$ denote $R^{(+)}(b')$ and $R_k^{(+)}$ denote $R^{(+)}(b'')$ for
b', b'' as in Lemma 3.8. A second subscript on $R^{(+)}$ will
identify a particular vector. We now give as a theorem an
important result concerning condition (3.11).
Theorem 3.12. Suppose $x_1 < x_2 < \cdots < x_n$ and $b_{ij} = b_k$ for
just one pair of subscripts (i,j). Then $c_k = 1$ implies
condition (3.11):
\[ p_k \le p_{k-1} \ \text{ if } s_k < 0, \qquad p_{k-1} \le p_k \ \text{ if } s_k > 0. \]
Proof: As noted earlier, when b passes through $b_k$, the
only change in the rank vector $R^{(+)}(b)$ is the interchange
of its ith and jth elements. The elements interchanged
represent adjacent ranks and the exact nature of the change
depends upon the censoring pattern. Let $r_m = 1, \ldots, n-1$
represent arbitrary ranks. If $c_k = c_{ij} = 1$, then one of two
situations must obtain as b passes through $b_k$:
(1)
\[ R_{k-1,1}^{(+)} = (R_1^{(+)}, \ldots, R_j^{(+)} = r_1, \ldots, R_i^{(+)} = (r_1+1)^{+}, \ldots, R_n^{(+)})' \]
will become
\[ R_{k,1}^{(+)} = (R_1^{(+)}, \ldots, R_j^{(+)} = r_1+1, \ldots, R_i^{(+)} = r_1^{+}, \ldots, R_n^{(+)})' \]
or (2)
\[ R_{k-1,2}^{(+)} = (R_1^{(+)}, \ldots, R_j^{(+)} = r_2^{+}, \ldots, R_i^{(+)} = r_2+1, \ldots, R_n^{(+)})' \]
will become
\[ R_{k,2}^{(+)} = (R_1^{(+)}, \ldots, R_j^{(+)} = (r_2+1)^{+}, \ldots, R_i^{(+)} = r_2, \ldots, R_n^{(+)})'. \]
In (1) the ith position has the censored rank and in (2)
the censored rank is in the jth position. Let $s_{k,m}$ denote
the value of S(b) computed from $R_{k,m}^{(+)}$. Then from (2.15)
$s_{k,m} = s_{k-1,m} - 1$, m = 1,2. Notice that because the ranks
are adjacent, $(r_1+1)^{+}$ has the same concordances and discordances
with elements of $R_{k-1,1}^{(+)}$ as $r_1^{+}$ has with elements
of $R_{k,1}^{(+)}$, except for concordances between $(r_1+1)^{+}$ and
$r_1$ in $R_{k-1,1}^{(+)}$ and $r_1^{+}$ and $r_1+1$ in $R_{k,1}^{(+)}$. A similar
situation obtains for $r_1$ and $r_1+1$ in situation (1) and
also for corresponding ranks in situation (2). Notice
further that for any permutation of the elements of $R_{k,1}^{(+)}$
keeping x's fixed the rank pair $(r_1+1, r_1^{+})$ will contribute
zero in the computation of $S_k$. Contribution to $S_{k-1}$ of
$(r_1, (r_1+1)^{+})$ for corresponding permutations of $R_{k-1,1}^{(+)}$ is
either +1 or -1. Thus if a permutation of elements of $R_{k,1}^{(+)}$
gives a value of $S_k \le s_{k,1}$ then the same permutation of
$R_{k-1,1}^{(+)}$ will give a value of $S_{k-1} \le s_{k-1,1}$.
Therefore,
\[ \#\{\text{permutations of elements of } R_{k,1}^{(+)} \text{ giving } S_k \le s_{k,1}\}
   \le \#\{\text{permutations of elements of } R_{k-1,1}^{(+)} \text{ giving } S_{k-1} \le s_{k-1,1}\}. \]
Consideration of situation (2) gives similarly
\[ \#\{\text{permutations of elements of } R_{k,2}^{(+)} \text{ giving } S_k \le s_{k,2}\}
   \le \#\{\text{permutations of elements of } R_{k-1,2}^{(+)} \text{ giving } S_{k-1} \le s_{k-1,2}\}. \]
These together imply, in terms of p-values,
\[ p_k \le p_{k-1}, \qquad s_k < 0. \]
Similar argument gives
\[ \#\{\text{permutations of elements of } R_{k-1,m}^{(+)} \text{ giving } S_{k-1} \ge s_{k-1,m}\}
   \le \#\{\text{permutations of elements of } R_{k,m}^{(+)} \text{ giving } S_k \ge s_{k,m}\}, \]
m = 1,2, and these imply
\[ p_{k-1} \le p_k, \qquad s_k > 0. \]
It remains to consider ties in the critical b's and
the general result is given in the following corollary.
Corollary 3.13. For the permutation distributions described
at (3.3) and p-values defined in Definition 3.10, we have
in general for an arbitrary critical value $b_k$ that
\[ p_k \le p_{k-1}, \ \ s_k < 0, \qquad p_{k-1} \le p_k, \ \ s_k > 0. \]
Proof: As b increases through a critical $b_k$ a series of
adjacent ranks of z's is reversed and S(b) reduces by
$\sum c_{ij}$ (as at 2.11) with the summation being over q pairs
(i,j) such that $b_{ij} = b_k$. Again let the rank vector after
change through $b_k$ be $R_k^{(+)}$ and observe (in the manner of
Theorem 3.12) that for any permutation of the elements of
$R_k^{(+)}$ no corresponding permutation of the elements of $R_{k-1}^{(+)}$
can increase S(b) by more than $\sum c_{ij}$. Thus,
\[ \#\{\text{permutations of elements of } R_k^{(+)} \text{ giving } S_k \le s_k\}
   \le \#\{\text{permutations of elements of } R_{k-1}^{(+)} \text{ giving } S_{k-1} \le s_{k-1} = s_k + \textstyle\sum c_{ij}\}. \]
The result follows as in Theorem 3.12.
We have shown that for a chosen level $\alpha$, as b increases
above the point estimate a value can be reached at which
the sample value of S(b) falls in the reject region of the
test at (3.4) and that further increase of b does not lead
to acceptance. Similarly for b decreasing below
the point estimate. The confidence set described at (3.5)
is thus established as a $100(1-\alpha)\%$ confidence interval.
3.4 Interval Estimation of $\beta$: Asymptotic Method
For large n, the tests proposed at (3.4) can be
performed using an approximation for the conditional null
distribution of S; i.e., the conditional distribution of S
under $H_0: \beta = b$. We may use directly the results of Daniels
(1944). Those results concern, in Daniels' notation, the
n! different ways of grouping n sample y-values with n sample
x-values. A score $a_{ij}$, i,j = 1,...,n, is assigned to each
pair $(x_i, x_j)$ and a score $b_{ij}$ to each $(y_i, y_j)$. For scores
such that $a_{ij} = -a_{ji}$, $b_{ij} = -b_{ji}$ and for summations over all
subscripts $\sum a_{ij} a_{il}$ and $\sum b_{ij} b_{il}$ each of order $n^3$ it is
shown that a statistic
\[ C = \sum_{i,j} a_{ij} b_{ij} \tag{4.1} \]
is conditionally asymptotically Normal with conditional mean
zero, and an expression for the conditional variance is given
in terms of n, $a_{ij}$ and $b_{ij}$, i,j = 1,...,n. Applying this to
data points $(z_i^{(+)}, x_i)$, i = 1,...,n, and the statistic
\[ S(b) = \sum_{i=1}^{n} \sum_{j=1}^{n} \eta_{ij}(b)\, \psi(x_i - x_j), \]
from (2.3), we have, comparing (4.1) and writing $\eta_{ij}$ for
$\eta_{ij}(b)$,
\[ C = 2S(b) = \sum_i \sum_j \eta_{ij} \chi_{ij} \]
with $\chi_{ij} = \mathrm{sgn}(x_i - x_j)$.
Clearly, $\eta_{ij} = -\eta_{ji}$ and $\chi_{ij} = -\chi_{ji}$.
Furthermore, since $|\eta_{ij}| \le 1$ and $|\chi_{ij}| \le 1$, the summations
\[ \Bigl|\sum_{i,j,l} \eta_{ij}\eta_{il}\Bigr| \le n^3 \quad \text{and} \quad
   \Bigl|\sum_{i,j,l} \chi_{ij}\chi_{il}\Bigr| \le n^3 \]
are of order $n^3$. We can therefore apply Daniels' results
and conclude for the conditional null distribution of
S(b) that
i) E(S(b)) = 0,
ii) \[ \mathrm{Var}(S(b)) = [n(n-1)(n-2)]^{-1}
      \Bigl(\sum \eta_{ij}\eta_{il} - \sum \eta_{ij}^2\Bigr)
      \Bigl(\sum \chi_{ij}\chi_{il} - \sum \chi_{ij}^2\Bigr)
      + [2n(n-1)]^{-1} \Bigl(\sum \eta_{ij}^2\Bigr)\Bigl(\sum \chi_{ij}^2\Bigr) \tag{4.2} \]
where summation is over all subscripts from 1,...,n,
iii) S(b) is asymptotically Normal.
Some discussion of when this asymptotic approximation
might be appropriate in practice is given in Chapter Four.
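
As a rough illustration of the variance expression (4.2), the sketch below evaluates it from two antisymmetric score matrices. The function names are hypothetical, and the example uses uncensored, untied data with sign scores, for which the expression should reduce to the familiar Kendall value n(n-1)(2n+5)/18; censored data would need the score matrix defined by (2.3), which is not reproduced here.

```python
def sign(u):
    """Sign of u as -1, 0 or 1."""
    return (u > 0) - (u < 0)


def sign_matrix(v):
    """Antisymmetric score matrix with entries sgn(v[i] - v[j])."""
    return [[sign(a - b) for b in v] for a in v]


def daniels_variance(eta, chi):
    """Evaluate the two-term variance expression (4.2) for square score matrices."""
    n = len(eta)

    def sums(a):
        squares = sum(a[i][j] ** 2 for i in range(n) for j in range(n))
        rows = sum(sum(a[i]) ** 2 for i in range(n))  # sum over i, j, l of a_ij * a_il
        return rows, squares

    eta_rows, eta_sq = sums(eta)
    chi_rows, chi_sq = sums(chi)
    return ((eta_rows - eta_sq) * (chi_rows - chi_sq) / (n * (n - 1) * (n - 2))
            + eta_sq * chi_sq / (2 * n * (n - 1)))


x = [40, 50, 60, 70, 80]            # illustrative covariate values
z = [3.0, 1.0, 4.0, 5.0, 2.0]       # illustrative uncensored, untied z-values
var_s = daniels_variance(sign_matrix(z), sign_matrix(x))
print(var_s)
```

A Normal approximation to the test at (3.4) would then compare the observed S(b) with zero using the square root of this variance.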
3.5 The Two-Sample Location Problem
Gehan (1965) formulates a two-sample test for right-censored
data. An inversion of that test is now shown to
be equivalent to a special case of the method presented in
the previous sections of this chapter.
For populations 1 and 2 with distribution functions
$F_1(\cdot)$, $F_2(\cdot)$, respectively, Gehan tests
\[ H_0: F_1(t) = F_2(t) \]
against
\[ H_1: F_1(t) < F_2(t), \]
or against
\[ H_2: F_1(t) < F_2(t) \ \text{ or } \ F_1(t) > F_2(t). \]
If we consider the shift model as a special case of this,
we have
\[ F_1(t) = F_2(t - \Delta) \ \text{ for some } \Delta, \]
or equivalently, for random variables $T_1$, $T_2$ from populations
1 and 2,
\[ T_1 \stackrel{d}{=} T_2 + \Delta. \]
We would wish to test
\[ H_0: \Delta = 0 \]
against
\[ H_1: \Delta > 0 \ \text{ or } \ H_2: \Delta \neq 0. \]
Consider samples of size $n_1$, $n_2$ from populations 1 and 2,
respectively, i.e., $y_{11}^{(+)}, y_{12}^{(+)}, \ldots, y_{1n_1}^{(+)}$ and
$y_{21}^{(+)}, y_{22}^{(+)}, \ldots, y_{2n_2}^{(+)}$, where again $y^{(+)}$'s are the possibly
censored lifetimes and the extra subscript indicates the
population. Gehan's test of $\Delta = 0$ would use the statistic
\[ W = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} U_{ij} \]
where
\[ U_{ij} = \begin{cases}
   -1 & y_{1i} < y_{2j} \ \text{or} \ y_{1i} \le y_{2j}^{+}, \\
   \ \ 0 & y_{1i} = y_{2j} \ \text{or} \ (y_{1i}^{+}, y_{2j}^{+}) \ \text{or} \ y_{1i}^{+} < y_{2j} \ \text{or} \ y_{2j}^{+} < y_{1i}, \\
   +1 & y_{1i} > y_{2j} \ \text{or} \ y_{1i}^{+} \ge y_{2j}.
   \end{cases} \tag{5.1} \]
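Following an inversion procedure (cf. Randles and Wolfe,
1979, Ch. 6) for a distribution-free confidence interval,
consider this statistic W computed with sample points:
\[ z_{1i}^{(+)} = y_{1i}^{(+)} - \Delta, \quad i = 1, \ldots, n_1, \]
\[ z_{2j}^{(+)} = y_{2j}^{(+)}, \quad j = 1, \ldots, n_2, \]
i.e.,
\[ W = \sum \sum U_{ij} \]
with
\[ U_{ij} = \begin{cases}
   -1 & z_{1i} < z_{2j} \ \text{or} \ z_{1i} \le z_{2j}^{+}, \\
   \ \ 0 & \text{otherwise}, \\
   +1 & z_{1i} > z_{2j} \ \text{or} \ z_{1i}^{+} \ge z_{2j}.
   \end{cases} \]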
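Using the notation
\[ \delta_{\ell m} = \begin{cases} 1 & z_{\ell m} \ \text{uncensored}, \\ 0 & z_{\ell m} \ \text{censored}, \end{cases}
   \qquad \ell = 1, 2; \ m = 1, 2, \ldots, n_\ell, \]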
nl n2
M = dl j1 5U*(z2jzli>
(5.2)
This is readily seen to be obtainable as a special case
of (2.3) by using that expression with
\[ b = \Delta, \qquad x_i = \begin{cases} 1 & \text{observation from population 1}, \\ 0 & \text{observation from population 2}, \end{cases} \]
and labelling x's such that
\[ x_i = \begin{cases} 1 & i = 1, \ldots, n_1, \\ 0 & i = n_1 + 1, \ldots, n, \end{cases} \]
where $n = n_1 + n_2$. The summation at (2.3) then becomes
n.
1 n
S (A) = Z L {6 .
i=l j=nx+l D 1 D 1 3 1
and this relabels to the expression for W at (5.2) by denoting
\[ z_{1i} = z_i, \quad \delta_{1i} = \delta_i, \quad i = 1, \ldots, n_1, \]
\[ z_{2j} = z_i, \quad \delta_{2j} = \delta_i, \quad i = n_1 + 1, \ldots, n, \quad j = i - n_1. \]
Finding a confidence interval for A by inverting Gehan's
statistic (5.2) is therefore an equivalent procedure to
finding a confidence interval by inverting S(b) at (2.3).
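
The scoring at (5.1) and its use on shifted sample points can be sketched as follows. This is a minimal illustration with made-up data; the function names are hypothetical and each observation is represented as a (value, censored) pair.

```python
def gehan_u(y1, cens1, y2, cens2):
    """Score for one pair per (5.1); cens = True marks a right-censored value."""
    if not cens1 and not cens2:
        return -1 if y1 < y2 else (1 if y1 > y2 else 0)
    if not cens1 and cens2:
        return -1 if y1 <= y2 else 0     # true second lifetime is at least y2
    if cens1 and not cens2:
        return 1 if y1 >= y2 else 0      # true first lifetime is at least y1
    return 0                             # both censored: ordering unknown


def gehan_w(sample1, sample2, shift=0.0):
    """W computed after subtracting `shift` from every sample-1 value."""
    return sum(gehan_u(y1 - shift, c1, y2, c2)
               for (y1, c1) in sample1 for (y2, c2) in sample2)


sample1 = [(5.0, False), (8.0, True)]    # illustrative (value, censored) pairs
sample2 = [(6.0, False), (7.0, True)]
print(gehan_w(sample1, sample2), gehan_w(sample1, sample2, shift=-3.0))
```

Scanning `shift` over a grid and recording where the test based on W stops rejecting is one way to picture the inversion described above.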
3.6 Computational Aspects
The confidence interval procedure described in the
previous sections requires the progressive testing of hypothesized
$\beta$ values. For each such test use is to be made
either of the exact permutation distribution or the asymptotic
approximation. The exact test of an arbitrary value
b', for b' somewhere between two critical values, would
perhaps be made most conveniently if we could obtain a
p-value directly for the sample s(b') value and reject
$H_0: \beta = b'$ for a p-value < $\alpha/2$ for some level $\alpha$. Neither
direct nor progressive computation of p-values seems to be
tractable in practice; to do it requires identifying all
permutations of the current $\{z_i^{(+)}: i = 1, \ldots, n\}$ elements
giving S values less than (or more than) the current sample
arrangement. Resort is made in this study to the more laborious
complete enumeration of the permutation distribution for
each b' to be tested. The algorithm employed is to work
outwards from the point estimate and to regenerate the permutation
distribution as b changes through a critical value,
realizing (from Corollary 3.9) that this will not always
be necessary because of no change in the distribution. For
each exact permutation distribution, critical regions can
be set and appropriate tests made.
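
Complete enumeration for a single hypothesized value b can be sketched as below. This is not the program documented in the appendices; it assumes z_i(b) = y_i - b x_i, a Gehan-type score with indicator 1 for uncensored values, and the tail convention of Definition 3.10, and it is practical only for small n because all n! arrangements are generated.

```python
from itertools import permutations


def eta(z_i, d_i, z_j, d_j):
    """Assumed Gehan-type score; d = 1 marks an uncensored value."""
    if z_i > z_j and d_j == 1:
        return 1
    if z_i < z_j and d_i == 1:
        return -1
    return 0


def s_stat(x, z, d):
    """Sum of eta over ordered pairs (i, j) with x[i] > x[j]."""
    n = len(x)
    return sum(eta(z[i], d[i], z[j], d[j])
               for i in range(n) for j in range(n) if x[i] > x[j])


def exact_p_value(x, y, d, b):
    """Observed S(b) and its one-tailed permutation p-value by full enumeration."""
    z = [yi - b * xi for xi, yi in zip(x, y)]
    s_obs = s_stat(x, z, d)
    pairs = list(zip(z, d))
    values = [s_stat(x, [p[0] for p in perm], [p[1] for p in perm])
              for perm in permutations(pairs)]
    if s_obs <= 0:
        tail = sum(v <= s_obs for v in values)
    else:
        tail = sum(v >= s_obs for v in values)
    return s_obs, tail / len(values)


x = [1, 2, 3, 4, 5]                  # illustrative data: exact line y = 2x
y = [2, 4, 6, 8, 10]
d = [1, 1, 1, 1, 1]                  # all uncensored
for b in (0, 2, 4):
    print(b, exact_p_value(x, y, d, b))
```

A value b would be rejected at level alpha when the returned p-value falls below alpha/2, and the enumeration need only be repeated when b crosses a critical value that changes the distribution.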
The asymptotic procedure is similar but with variances
recomputed after certain steps in the S(b) function.
In both cases coarser searches across b-values can be
made in order to reduce computing time and in this regard
there is also some advantage to making an initial search
using the asymptotic method and restricting the exact method
search to the appropriate regions.
Further details of the computing aspects are given in
appendices, along with some computer programs and their
documentation. Some results of using this new method appear
in the next chapter.
CHAPTER FOUR
COMPARATIVE STUDIES
4.1 Simulations Using a Simple Linear Model
As discussed in Section 2.6 there are some reasons for
preferring the Cox and the Buckley and James regression
estimates for use with censored data. According to Miller
and Halpern (1982, p. 530) "the choice between [these two
methods] should depend on the appropriateness of the proportional
hazards model or the linear model for the data."
This section reports some simulation studies that compare
the performances of the Buckley and James method and the new
method of Chapter Three using data generated from the linear
model (2.3.1) with p = 1. To afford some comparison with the
original simulations reported by Buckley and James their
choice of $\beta = 0.2$ for the linear model is adopted.
We first describe how data were generated for the
simulation trials. Recall that according to model (2.3.1)
the survival times $T_i$ have the representation
\[ T_i = \alpha + \beta x_i + \varepsilon_i, \quad i = 1, \ldots, n. \]
For this series of simulations we fixed $\alpha = 30$, $\beta = 0.2$
and the covariate x was given values from 40 to 100 in
increments of 60/(n-1) for sample size n. Four models,
referred to as T1 through T4, were used for the distribution
of $\varepsilon$ in generating T-values. These were
T1 : $\varepsilon \sim N(0, 100)$,
T2 : $\varepsilon \sim$ Double exponential, i.e.,
     $f(\varepsilon) = \frac{1}{2\gamma} \exp(-|\varepsilon|/\gamma)$, $-\infty < \varepsilon < \infty$,
     with $\gamma = 7.07$,
T3 : $\varepsilon \sim$ Shifted exponential, i.e.,
     $f(\varepsilon) = \frac{1}{\gamma} \exp(-(\varepsilon + 10)/\gamma)$, $-10 < \varepsilon < \infty$,
     with $\gamma = 10$,
and
T4 : $\varepsilon \sim$ Uniform(-a, a)
     with a = 17.32.
Such choice permits comparisons over heavy-tailed, light-tailed
and nonsymmetric distributions. The variance of
$\varepsilon$ is 100 in each case in keeping with the scale of the
models used by Buckley and James.
Three different censoring mechanisms were imposed on
sets of T-values simulated using each of the distributions
T1 - T4. These were
C1 : $C_i \sim$ Uniform($\beta x_i + a$, $\beta x_i + b$),
where a and b are constants chosen to control the censoring
rate,
C2 : $C_i$ has the same density for all i given by
     \[ f(c) = \frac{\lambda e^{-\lambda(d-c)}}{1 - e^{-\lambda d}}, \quad 0 < c < d, \]
where $\lambda$ and d are constants chosen to control the censoring
rate, and
C3 : $C_i = r$,
where r is a specified constant.
The type of censoring contrived for the form C2 is
motivated by consideration of a clinical trial of fixed
duration d with time of entry into the trial random at
constant rate $\lambda$. All censoring mechanisms satisfy assumption
A1 of Section 2.3. Only C1 satisfies assumption A2.
Departures from A2 will be moderate for C2 and quite severe
for C3.
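
The data-generating scheme just described can be sketched with the standard library alone. The constants a, b, lambda, d and r actually used to reach the nominal censoring rate are not stated in this section, so the values in `PLACEHOLDERS` below are purely illustrative assumptions, as are the function names.

```python
import math
import random

ALPHA, BETA = 30.0, 0.2              # values fixed in the text


def covariates(n):
    """x from 40 to 100 in increments of 60/(n-1)."""
    return [40.0 + 60.0 * i / (n - 1) for i in range(n)]


def draw_error(model, rng):
    """One draw of the error term; each model has mean 0 and variance near 100."""
    if model == "T1":                # Normal(0, 100)
        return rng.gauss(0.0, 10.0)
    if model == "T2":                # double exponential, scale 7.07
        return rng.choice((-1.0, 1.0)) * rng.expovariate(1.0 / 7.07)
    if model == "T3":                # exponential with mean 10, shifted by -10
        return rng.expovariate(1.0 / 10.0) - 10.0
    if model == "T4":                # Uniform(-17.32, 17.32)
        return rng.uniform(-17.32, 17.32)
    raise ValueError(model)


def draw_censor(mechanism, x, rng, p):
    """One censoring time; p holds the placeholder constants."""
    if mechanism == "C1":
        return rng.uniform(BETA * x + p["a"], BETA * x + p["b"])
    if mechanism == "C2":            # inverse-CDF draw from the C2 density on (0, d)
        lam, d = p["lam"], p["d"]
        floor = math.exp(-lam * d)
        return d + math.log(rng.random() * (1.0 - floor) + floor) / lam
    if mechanism == "C3":
        return p["r"]
    raise ValueError(mechanism)


def simulate(n, model, mechanism, p, seed=0):
    """List of (x, observed time, indicator) with indicator 1 when uncensored."""
    rng = random.Random(seed)
    data = []
    for x in covariates(n):
        t = ALPHA + BETA * x + draw_error(model, rng)
        c = draw_censor(mechanism, x, rng, p)
        data.append((x, min(t, c), 1 if t <= c else 0))
    return data


PLACEHOLDERS = {"a": 10.0, "b": 70.0, "lam": 0.05, "d": 80.0, "r": 45.0}
sample = simulate(7, "T1", "C2", PLACEHOLDERS, seed=1)
print(sample)
```

In a study of this kind the constants would be tuned until the realized censoring rate is close to the intended level.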
In all simulations reported here the constants in
C1, C2 and C3 were fixed to give a nominal 40% censoring in
the data. The average censoring rate attained for each type
of data is given in the following tables. Any generated
data sets having fewer than 3 uncensored values were discarded
(see 2.5.6). As examples of the censoring patterns that can
arise using C1, C2, C3, Table 3 gives the number of censored
observations occurring at x = 40(10)100 for 2000 simulations
of 3 different types of data. The frequencies clearly indicate
the anticipated departures from assumption A2 with
censoring forms C2 and C3. The percentage censored column
gives the percentage of censored responses over all simulations
for the particular data type.
Table 3. Frequency of occurrence of censoring at different
x in 2000 simulations of samples of size n = 7.

             %                        x-values
Data      Censored    40     50     60     70     80     90    100
T1,C1        38      723    774    791    742    769    793    784
T1,C2        41      641    681    754    815    891    916   1015
T1,C3        38      320    453    583    765    942   1070   1251

We now explain five methods of analysis which were
compared using the simulated data. The abbreviations given
below in parentheses will be used to identify the methods
throughout the presentation of the results.
a) Least squares (LS). Standard least squares analysis
was applied to the survival times T as if they were all known.
The usual confidence interval procedure is strictly valid
only in case T1. In that case the confidence interval serves
as a standard with which to compare other methods. The point
estimates will be unbiased in all cases and this fact provides
some check on the simulation procedure.
b) New method (N). This is the method introduced in
Chapter Three. It is valid for all models T1 through T4
with censoring mechanism C1. The application of this method
differs for different sample sizes. Simulations were run
for sample sizes n = 7, 10, 15 and 25. For samples of size
n = 7 the permutation distribution required for the method
was based on all 7! = 5040 permutations generated as explained
in Section 3.3. For n = 10, the permutation distribution
was estimated using a random sample of 10,000 permutations,
selected with replacement from the possible 10!
permutations. A separate small study which compared 1%, 2%,
5% and 10% points of the estimated permutation distribution
with the corresponding percentage points of the true permutation
distribution showed good agreement. For n = 15 and
n = 25 the asymptotic approach of Section 3.4 was used.
Randomized tests were not incorporated in the simulations
and so the intervals obtained would be expected to be
conservative in cases where assumption A2 is satisfied.
In the small sample simulations (n = 7, 10) the test critical
values were adjusted accordingly to give less conservative
intervals. Details of this are given with the results.
By its nature this method can give infinite-length
intervals and some results on the likelihood of this for
different size samples are given in Table 4. In contrast
to the other methods the confidence intervals by this method
are not necessarily symmetric about the point estimate and
some implications of this are presented in Table 15.
c) Least squares on a reduced sample (1). Standard
least squares analysis was applied just to the known
(uncensored) survival times. This was included to investigate
the worth of the passive stance of simply throwing away
the censored observations. Clearly, this is not a valid
sampling procedure for application of the usual least
squares theory. It is intuitively likely that slope estimates
would be unbiased in the case of type C1 censoring.
At least three uncensored observations are required for
application of the usual error variance estimation.
d) Buckley-James Normal approximation (BJN). The
method of Buckley and James as explained in Section 2.5 was
used. It is an asymptotic method and its appropriateness
in small samples is unknown. At least three uncensored
observations are required. As discussed previously
(Section 2.5) the convergence of the Buckley-James method
is not guaranteed and can be a problem. For the sake of
these simulations the following procedure was adopted.
Convergence was considered to have occurred if iterates
agreed to within 0.00005. If convergence was not reached
within 20 iterations a search for oscillation cycles was
made over the final 10 iterations. If 2 iterates in this
search agreed to 0.005, then the arithmetic mean of all
iterates in that cycle was taken as the point estimate.
If such agreement was not reached within the 10 iterations,
then it was deemed that "no solution was possible." Doubtless
a solution could be contrived in an individual case,
but no further attempt to coax out a solution was made in
these trials. In some simulations the starting value chosen
for the iteration was $\tilde\beta^{(1)}$, the least squares estimate using
only the uncensored observations, and in others the value
chosen was $\tilde\beta^{(0)}$, the least squares estimate using all the
observations without regard to whether they were censored
or not. The choice of starting value was discussed briefly
in Section 2.5 and some indication of how it might affect
the convergence is given in Table 4.
Confidence intervals for $\beta$ using this method were
placed by using the variance estimate (2.5.5) and critical
values of a normal distribution.
e) Buckley-James t-approximation (BJT). This procedure
was as for BJN except that by ad hoc reasoning the
critical values for a t-distribution were used in placing
confidence intervals. From consideration of expression
(2.5.6) for the error variance estimation the value for the
degrees of freedom for t was taken as two less than the
number of uncensored observations.
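
The stopping rule described under method d) can be sketched independently of the Buckley-James update itself. In the sketch below `step` is a hypothetical callable standing in for one iteration of the estimator, and the reading of "cycle" (the iterates between the final iterate and the most recent earlier iterate that agrees with it to within 0.005) is an assumption, since the text does not spell out how a cycle was delimited.

```python
def iterate_with_cycle_rule(step, start, tol=5e-5, max_iter=20,
                            window=10, cycle_tol=5e-3):
    """Return (status, estimate) following the convergence rule in the text."""
    iterates = [start]
    for _ in range(max_iter):
        nxt = step(iterates[-1])
        if abs(nxt - iterates[-1]) <= tol:
            return "converged", nxt
        iterates.append(nxt)
    # No direct convergence: look for an oscillation cycle in the final iterates.
    tail = iterates[-window:]
    last = tail[-1]
    for p in range(len(tail) - 2, -1, -1):
        if abs(last - tail[p]) <= cycle_tol:
            cycle = tail[p + 1:]             # one assumed period of the oscillation
            return "cycle", sum(cycle) / len(cycle)
    return "no solution", None


# Toy update rules, used only to exercise the three outcomes.
print(iterate_with_cycle_rule(lambda b: 0.5 * b + 0.1, 1.0))
print(iterate_with_cycle_rule(lambda b: 0.4 - b, 0.1))
print(iterate_with_cycle_rule(lambda b: 2.0 * b + 1.0, 1.0))
```

The three toy rules illustrate direct convergence, a two-point oscillation whose mean is reported, and a divergent sequence for which no estimate is returned.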
Some results are now presented of how appropriate
selections of the above methods compared in various aspects
of their performance. The performance aspects to be considered
are
i) Infinite intervals and convergence
ii) Bias of estimator
iii) BJ variance estimation
iv) Coverage probability
v) Length of confidence interval
vi) A power comparison.
These aspects are now discussed in turn.
i) Infinite intervals and convergence. Some features
of the methodologies of the Buckley-James method and the
new method are now considered. Method N will always produce
a point estimate but it is possible to obtain infinite-length
intervals. The infinite-length intervals usually
contain one finite bound so that the interval will still be
useful. The proportions of infinite-length intervals obtained
at different sample sizes in a series of 200 simulations
are given in Table 4. Clearly, infinite-length intervals
are a problem associated with very small sample sizes.
Even for n = 7, the proportion of such intervals was less
than 15% in all but one case.
Recall that the BJ method might not give a point
estimate due to nonconvergence and consequently might not
produce a confidence interval. The proportion of simulations
that did not give direct convergence and the proportion
for which "no solution was possible" even after allowing
for oscillation are also included in Table 4 for
sample size n = 7 and starting values $\tilde\beta^{(0)}$ and $\tilde\beta^{(1)}$ as well
as for sample sizes n = 15 and n = 25 with starting value
$\tilde\beta^{(1)}$. The comparison of starting values at n = 7 indicates
that $\tilde\beta^{(1)}$ might be a better choice than $\tilde\beta^{(0)}$. However,
except for the results given in Tables 4 and 5, all our
results were obtained by using $\tilde\beta^{(0)}$ as suggested by Miller
and Halpern (1982). Simulations for n = 7 using each
starting value gave no indication that the overall conclusions
of the following sections would be altered by the
choice of starting value.

Table 4. Percentages of 200 simulations leading to infinite intervals by N
and percentages giving convergence problems with BJ.

          Infinite
          interval         Nonconvergence                  No solution
n =           7        7      7     15     25        7      7     15     25
Start =              β̃(0)   β̃(1)   β̃(0)   β̃(0)     β̃(0)   β̃(1)   β̃(0)   β̃(0)
Data
T1,C1        7.5     23.0   20.5   30.5   26.0      3.5    4.0    0.5    0.0
T2,C1       13.5     30.5   23.5   30.5   21.5     10.0    5.5    1.5    0.5
T3,C1       10.5     24.0   19.5   24.0   21.0      3.0    1.0    1.0    0.0
T4,C1       11.0     25.5   22.5   28.0   26.5      2.0    2.5    1.0    0.5
T1,C2        9.5     22.0   17.5   25.0   18.5      7.0    3.5    1.0    0.5
T2,C2       12.0     23.5   18.5   31.5   19.5      9.5    1.5    1.5    0.5
T3,C2       14.5     17.0   15.0   19.5   16.0      8.5    2.5    1.5    0.0
T4,C2       13.5     23.5   19.5   35.0   27.0      2.5    3.5    4.5    0.5
T1,C3       14.0     31.0   31.0   35.0   39.5      5.0    4.5    1.0    0.5
T2,C3       21.5     34.5   34.0   43.0   41.5      8.0    5.0    4.5    2.0
T3,C3       11.0     37.0   38.0   35.5   30.0      2.5    2.0    0.0    0.0
T4,C3        3.5     29.5   28.5   23.5   23.5      3.5    2.5    0.5    0.0

1) Comparisons between β̃(0) and β̃(1) are for exactly the same data.
2) No infinite intervals occurred for n = 15 or n = 25.

Looking at results for the individual samples generated
showed that nonconvergence is more common for highly censored
samples but that it does occur at all levels of censoring.
In a search over some of the results for the case
n = 7 infinite intervals were not found to occur when there
were more than three uncensored values. No intervals of
$(-\infty, \infty)$ were encountered. It was not particularly common
for nonconvergence and infinite intervals to occur for the
same data set.
ii) Bias of estimator. Arithmetic means of the point
estimates obtained by each method over 2000 simulations
for n = 7 are given in Table 5 and means over 200 simulations
for n = 25 are given in Table 6. The estimated standard
errors of these means are also given.
Because of some nonconvergence for the BJ method the
number of estimates averaged might be less than the number
of simulations. The actual number used for the BJ method is
given in the final column. Means which are more than two
standard deviations from the nominal $\beta = 0.2$ are indicated.
Table 5. Arithmetic means (x 10^3) based on 2000 simulations of slope estimates for
samples of size n = 7.

             %                                                                  Number of
Data      Censored      LS            N             1             BJ          BJ Estimates
T1,C1        38      204  (4.2)    204  (5.6)    197  (6.1)    203  (5.2)        1960
T2,C1        42      197  (4.1)    195  (5.8)    195  (6.6)    196  (5.4)        1938
T3,C1        36      203  (4.1)    200  (3.8)    202  (4.2)    201  (4.0)        1971
T4,C1        39      199  (4.3)    200  (5.6)    202  (6.3)    199  (5.4)        1943
T1,C2        41      200  (4.2)    197  (6.0)    169* (6.7)    194  (5.7)        1941
T2,C2        40      196  (4.2)    200  (5.9)    165* (6.6)    193  (5.6)        1948
T3,C2        41      204  (4.1)    200  (4.4)    181* (4.6)    195  (4.4)        1947
T4,C2        41      195  (4.3)    199  (6.2)    167* (7.0)    194  (6.0)        1921
T1,C3        38      209  (4.2)    220* (5.5)    091* (5.1)    200  (4.9)        1937
T2,C3        44      203  (4.2)    218* (5.6)    051* (6.0)    184* (5.2)        1901
T3,C3        37      198  (4.2)    193* (2.8)    136* (2.5)    186* (2.6)        1965
T4,C3        30      186* (4.4)    192  (5.7)    088* (4.5)    177* (5.1)        1958

* indicates that the mean is more than 2 standard deviations from the nominal
β = 0.2. Estimated standard deviations (x 10^3) of the means are given in
parentheses.

Table 6. Arithmetic means (x 10^3) based on 200 simulations of slope estimates for
samples of size n = 25.

             %                                                                  Number of
Data      Censored      LS            N             1             BJ          BJ Estimates
T1,C1        44      192  (7.1)    185  (8.9)    185  (9.2)    190  (8.7)         200
T2,C1        50      198  (7.4)    194  (9.2)    192  (10.7)   194  (9.5)         199
T3,C1        38      209  (8.1)    211  (5.9)    213  (7.1)    215* (7.4)         200
T4,C1        44      202  (7.7)    206  (9.5)    209  (10.5)   207  (9.2)         199
T1,C2        46      200  (7.1)    196  (10.5)   168* (10.0)   193  (9.5)         199
T2,C2        46      180* (8.4)    185  (10.0)   157* (10.7)   178* (10.3)        199
T3,C2        47      193  (8.1)    195  (6.7)    175* (7.4)    194  (8.1)         200
T4,C2        47      194  (7.7)    186  (11.0)   160* (10.3)   188  (10.3)        199
T1,C3        43      196  (7.4)    205  (8.4)    083* (6.7)    198  (8.7)         199
T2,C3        51      194  (7.1)    210  (8.4)    052* (8.4)    195  (9.2)         196
T3,C3        38      200  (8.4)    197  (4.5)    135* (3.2)    196  (4.5)         200
T4,C3        32      188  (8.1)    188  (10.3)   102* (6.3)    187  (9.5)         200

* indicates that the mean is more than 2 standard deviations from the nominal
β = 0.2. Estimated standard deviations (x 10^3) of the means are given in
parentheses.

The results in Tables 5 and 6 indicate that method 1 can
give severely biased results if used in cases where assumption
A2 is not satisfied. Departures from A2 do not seem so
serious for methods N and BJ, especially at the larger sample
size. There is some bias indicated for methods N and BJ at
sample size 7 with censoring type C3.
iii) BJ variance estimation. An investigation was made
of how well the expression (2.5.5) approximates the variance
of the BJ estimator. For each type of data 2000 simulations
for n = 7 and 200 simulations each for n = 15 and n = 25 were
run, computing where possible the BJ estimate and an estimate
of its variance using (2.5.5). The sampling variance of the
estimates and the mean of the variance estimates were computed.
The results for n = 7 are displayed in Table 7.
From the results shown in Table 7 it seems clear that
the variance expression (2.5.5) overestimates the variance
of the BJ estimator quite markedly for sample size n = 7.
A similar but less marked overestimation was found for n = 15
while for n = 25 the estimation was much better.
Table 7. Simulation results concerning estimation
of the variance of the Buckley-James
estimator with sample size n = 7.

            # of         Var            Mean of Variance
Data      Estimates   (BJ estimator)       Estimates
T1,C1       1960         0.053               0.095
T2,C1       1938         0.057               0.096
T3,C1       1971         0.031               0.045
T4,C1       1943         0.057               0.092
T1,C2       1941         0.064               0.103
T2,C2       1948         0.061               0.091
T3,C2       1947         0.038               0.049
T4,C2       1921         0.069               0.110
T1,C3       1937         0.046               0.091
T2,C3       1901         0.052               0.104
T3,C3       1965         0.013               0.019
T4,C3       1958         0.051               0.066

iv) Coverage probability. The various confidence
interval procedures are now compared on the basis of the
coverage attained by the intervals. Nominally the intervals
are 90% except for method N with n = 7 and n = 10. In those
cases the intervals are nominally 88% but known to be conservative
with C1 type censoring distribution (see 3.3.5).
In general this adjustment led to intervals with coverage
90% or greater and thus established a more common base from
which to compare other aspects of the various methods.
Tables 8-11 give the results for n = 7, 10, 15, 25,
respectively. Recall that results for n = 10 are based on
estimates of exact permutation distributions. Because of
computing expense results for n = 10 were obtained only for
three data types. It was felt to be useful to include these
to give some indication of how well the distribution approximation
functioned. Coverages for the BJ method are calculated
using only the data which do give a BJ estimate.
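
The notes to Tables 8-11 quote a standard deviation of approximately 2.1 for the table entries. A short check, assuming 200 independent simulations and a true coverage equal to the nominal 90%, reproduces that figure from the binomial standard error:

\[
100 \times \sqrt{\frac{0.90 \times 0.10}{200}} = 100 \times \sqrt{0.00045} \approx 2.1 .
\]

Under the same assumptions an observed coverage more than two standard deviations below nominal is one below roughly 85.8%. For the BJ methods the number of usable simulations can be slightly below 200, so the figure is only approximate there.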
Table 8. Percentage coverage of confidence intervals based
on 200 simulations with sample size n = 7.

Data       LS      N       1      BJN     BJT    % Censored
T1,C1     85.5    93.0    88.5    81.9    93.8       38
T2,C1     90.5    95.0    90.5    80.6    92.2       43
T3,C1     90.0    93.0    90.0    77.8    89.7       37
T4,C1     91.0    98.0    92.0    84.2    95.9       39
T1,C2     92.0    96.0    89.0    80.6    93.0       41
T2,C2     89.5    93.5    91.0    88.4    95.6       40
T3,C2     88.0    91.5    93.5    81.4    95.1       41
T4,C2     87.5    93.0    85.0    76.8    89.7       42
T1,C3     90.5    93.0    87.0    83.2    92.6       38
T2,C3     90.5    92.0    80.0    79.9    91.3       43
T3,C3     89.5    93.0    81.0    84.1    91.3       37
T4,C3     88.5    93.5    82.5    79.8    90.7       32

Nominally intervals were 90% except for method N where
the setting was 88%. Standard deviations for the table
entries are approximately 2.1.

Table 9. Percentage coverage of confidence intervals based
on 200 simulations with sample size n = 10.

Data       LS      N       1      BJN     BJT    % Censored
T1,C1     86.5    91.5    87.5    87.8    92.9       44
T1,C2     92.5    94.5    89.0    85.7    93.1       44
T1,C3     90.0    90.5    80.0    79.0    86.7       41

Nominally intervals were 90% except for method N where
the setting was 88%. Standard deviations for the table
entries are approximately 2.1.

Table 10. Percentage coverage of confidence intervals based
on 200 simulations with sample size n = 15.

Data       LS      N       1      BJN     BJT    % Censored
T1,C1     92.0    92.5    89.0    90.5    94.5       44
T2,C1     90.5    91.0    92.5    91.9    94.9       49
T3,C1     93.0    88.5    90.0    84.9    91.4       40
T4,C1     89.5    89.0    89.5    88.9    91.4       44
T1,C2     91.5    93.5    91.0    89.9    95.5       47
T2,C2     88.5    87.5    89.5    85.3    91.4       48
T3,C2     93.5    92.0    91.0    88.3    95.0       47
T4,C2     92.5    91.0    88.5    87.4    94.8       49
T1,C3     92.0    89.0    75.0    84.9    89.9       42
T2,C3     91.0    91.0    70.5    90.1    93.7       49
T3,C3     87.5    90.0    83.0    81.0    84.5       39
T4,C3     90.0    92.0    84.5    81.4    87.4       33

Nominally intervals were 90%. Standard deviations for the
table entries are approximately 2.1.

Table 11. Percentage coverage of confidence intervals based
on 200 simulations with sample size n = 25.

           LS      N      1     BJN    BJT   % Censored
T1,C1     92.5   91.5   90.5   91.0   91.5       44
T2,C1     91.0   91.5   91.5   91.5   95.5       50
T3,C1     91.5   90.5   88.5   86.0   89.0       38
T4,C1     87.5   87.0   87.0   87.4   90.0       44
T1,C2     92.0   87.0   88.5   86.9   91.5       46
T2,C2     89.5   88.0   85.5   87.4   90.0       46
T3,C2     84.5   91.0   88.5   79.0   82.0       47
T4,C2     90.5   89.0   90.0   89.5   91.5       47
T1,C3     94.0   94.0   71.0   83.4   87.4       43
T2,C3     92.5   94.0   64.0   88.3   90.8       51
T3,C3     87.5   92.0   69.5   81.5   84.5       38
T4,C3     89.0   88.5   77.5   78.5   82.0       32

Nominally intervals were 90%. Standard deviations for the
table entries are approximately 2.1.
The implication from the results for all sample sizes
is that method 1 may be acceptable in terms of coverage for
C1 and possibly C2 censoring but would not guarantee coverage
with C3 type censoring. This finding of course is in
keeping with the observations about bias.
Method N shows quite close to or greater than nominal
coverage over all models and sample sizes. The asymptotic
method of section 3.4 with n = 15 and n = 25 and the
approximate permutation distribution for n = 10 give coverage
generally close to 90%. Use of the exact permutation
distribution with n = 7 gives very conservative 90% intervals even
though the setting was 88%.
Method BJN does not come close to attaining coverage
for n = 7 and the coverage is fairly consistently low over
all models for the larger sample sizes. For n = 7, all the
observed coverages are more than 2 standard deviations below
90%. There are some similar significant departures for the
larger sample sizes, particularly with C3.
Method BJT with n = 7 generally gave conservative
intervals with coverages comparable to those of Method N.
Method BJT gave some low coverages at the larger sample sizes
particularly with C3. This loss of coverage for the larger
n is a result of the improvement, mentioned earlier, of the
estimate of the variance of the point estimate. For n = 7
the inflated variance estimate secures coverage for the
interval.
Overall, the coverage by method N was never significantly
low, whereas at least one significantly low coverage
occurred for each of the methods 1, BJN and BJT.
v) Length of confidence interval. Confidence intervals
are of course most meaningfully compared in terms of
length when they attain much the same coverage. Table 12
shows mean interval lengths and estimates of their standard
deviations for a series of simulations with sample size
n = 7. An interval length was included in the computation
only if all methods provided a finite length interval for
the particular data set. Thus for each data type the number
of interval lengths averaged is the same for all methods.
As discussed earlier, highly censored data sets are the ones
which tend to not give an interval, particularly for n = 7.
Therefore the results on interval length for n = 7 should
be viewed as applying to data with censoring less than that
indicated in Table 8.
Results for method BJN are not included in Table 12.
That method did give consistently shorter intervals than
method N but according to results in Table 8 is not competitive
in terms of coverage. Where coverages are comparable
the indications are that method N gives, on average, shorter
intervals than both BJT and 1. With nominally 40% censoring
the intervals by method N turn out to be about twice the
length that would be obtained by LS had the data been complete.
The column headed BJT-N (BJT minus N) in Table 12 gives the
difference in mean interval lengths for the two methods and
the estimate of the standard deviation of the difference for
a paired comparison. On all but two occasions the method N
length is shorter, and is shorter by more than 2 standard
deviations on 3 occasions.
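The paired comparison described above can be sketched as follows: for each simulated data set the two interval lengths are differenced, and the mean difference is set against the estimated standard deviation of that mean. The data below are invented for illustration and the function name is hypothetical. The thesis does not give its exact computing formula, so the usual paired-sample estimate is assumed.

```python
import math

def paired_length_comparison(lengths_a, lengths_b):
    """Mean of the paired differences a - b and the estimated s.d. of that mean."""
    diffs = [a - b for a, b in zip(lengths_a, lengths_b)]
    m = len(diffs)
    mean_diff = sum(diffs) / m
    var = sum((d - mean_diff) ** 2 for d in diffs) / (m - 1)
    return mean_diff, math.sqrt(var / m)

# Hypothetical interval lengths for four simulated data sets.
bjt_lengths = [1.8, 1.5, 2.1, 1.2]
n_lengths = [1.4, 1.3, 1.6, 1.1]

mean_diff, sd_mean = paired_length_comparison(bjt_lengths, n_lengths)
flagged = abs(mean_diff) > 2.0 * sd_mean   # the "greater than 2 s.d." rule
print(mean_diff, sd_mean, flagged)
```

Pairing removes the variation between data sets that is common to both methods, which is why the standard deviations of the differences can be smaller than those of the individual means.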
Results for n = 15 are in Table 13. Because of coverage
limitations results for methods 1 and BJN are not
included. In cases where coverage was attained BJN tended
to give rather shorter intervals than N and intervals by 1
were comparable in length to those of N.
Method N gave intervals that tend to be about one-half
the lengths that it gave in the n = 7 trials. In most cases
the length of the interval by N is about 1.4 times the standard
set by LS. For data generated using T3 the interval
lengths by both methods N and BJT are close to or less than
the LS value.
Average interval lengths obtained with data generated
using C1 or C2 turned out shorter using method N than using
method BJT. The reverse was the case with C3 censoring.
These differences seem to be merely reflections of the coverages
obtained and it is likely that for the same coverage
there is little to choose between N and BJT in terms of
length.
Simulations with n = 25 gave intervals with method N
which were about two-thirds of the length of corresponding
intervals with n = 15. Method N intervals were approximately
1.3 times the length of LS intervals except, as before, in
the case of T3 models where all methods tended to give
shorter intervals than LS.
Table 14 gives mean differences of interval lengths and
their estimated standard deviations for paired comparisons
of N with BJN and N with BJT. Assessment of these differences
must take account of the observed coverages and these
are included in the table. Where coverages were comparable,
intervals by BJN were significantly shorter than those by N
on three occasions and significantly larger on one occasion.
Only once did BJT attain both reasonable coverage (87.4%)
and a significantly shorter length than N. Method N gave
significantly shorter intervals than BJT on seven occasions.
Table 12. Mean lengths of 90% confidence intervals based on 200 simulations
for sample size n = 7.

              LS              N              1             BJT            BJT-N
T1,C1    0.729 (.017)   1.360 (.054)   1.500 (.124)   1.739 (.156)    0.379* (.135)
T2,C1    0.685 (.021)   1.442 (.069)   1.771 (.155)   2.101 (.221)    0.659* (.183)
T3,C1    0.692 (.030)   0.900 (.050)   0.804 (.066)   0.934 (.075)    0.035  (.047)
T4,C1    0.735 (.013)   1.289 (.051)   1.463 (.104)   1.781 (.171)    0.492* (.141)
T1,C2    0.714 (.019)   1.653 (.080)   1.577 (.133)   1.847 (.168)    0.194  (.134)
T2,C2    0.718 (.023)   1.502 (.069)   1.501 (.115)   1.766 (.188)    0.264  (.161)
T3,C2    0.628 (.022)   1.179 (.066)   0.941 (.074)   1.080 (.102)   -0.098  (.084)
T4,C2    0.721 (.014)   1.578 (.071)   1.440 (.126)   1.676 (.156)    0.098  (.125)
T1,C3    0.737 (.018)   1.203 (.057)   1.053 (.122)   1.276 (.131)    0.073  (.099)
T2,C3    0.701 (.022)   1.173 (.058)   1.073 (.084)   1.292 (.105)    0.118  (.075)
T3,C3    0.614 (.022)   0.669 (.031)   0.617 (.064)   0.770 (.072)    0.102  (.058)
T4,C3    0.740 (.013)   1.373 (.051)   0.900 (.058)   1.241 (.096)   -0.132  (.079)

*Indicates a difference greater than 2 s.d. Values in parentheses are estimated
standard deviations for the entries.
Table 13. Mean lengths of 90% confidence intervals based
on 200 simulations for sample size n = 15.

              LS                 N                      BJT
T1,C1    .495 (.007)    .670 (.021)   92.5     .713 (.039)   94.5
T2,C1    .474 (.009)    .674 (.021)   91.0     .863 (.052)   94.9
T3,C1    .471 (.011)    .400 (.011)   88.5     .471 (.017)   91.4
T4,C1    .488 (.005)    .679 (.017)   89.0     .756 (.025)   91.4
T1,C2    .485 (.006)    .723 (.018)   93.5     .830 (.050)   95.5
T2,C2    .463 (.009)    .648 (.021)   87.5     .732 (.029)   91.4
T3,C2    .464 (.012)    .487 (.013)   92.0     .508 (.016)   95.0
T4,C2    .496 (.005)    .798 (.026)   91.0     .863 (.029)   94.8
T1,C3    .491 (.007)    .680 (.022)   89.0     .605 (.021)   89.9
T2,C3    .475 (.009)    .656 (.022)   91.0     .653 (.025)   93.7
T3,C3    .460 (.011)    .353 (.009)   90.0     .293 (.010)   84.5
T4,C3    .488 (.005)    .662 (.016)   92.0     .519 (.013)   87.4

Values in parentheses are estimated standard deviations for
the means. The third figure in the N and BJT columns is the
observed percentage coverage.
Table 14. Differences in mean lengths of 90% confidence
intervals based on 200 simulations for sample
size n = 25.

           BJN                         N                          BJT
         Coverage       BJN-N       Coverage       BJT-N        Coverage
T1,C1      91.0     0.030* (.005)     91.5     0.010  (.006)      91.5
T2,C1      91.5     0.015* (.006)     91.5     0.069* (.007)      95.5
T3,C1      86.0     0.020* (.006)     90.5     0.045* (.007)      89.0
T4,C1      87.4    +0.000  (.006)     87.0     0.044* (.007)      90.0
T1,C2      86.9     0.023* (.005)     87.0     0.022  (.006)      91.5
T2,C2      87.4     0.004  (.007)     88.0     0.048* (.009)      90.0
T3,C2      79.0     0.011  (.006)     91.0     0.021* (.006)      82.0
T4,C2      89.5     0.030* (.006)     89.0     0.020* (.007)      91.5
T1,C3      83.4     0.119* (.007)     94.0     0.087* (.007)      87.4
T2,C3      88.3     0.052* (.009)     94.0     0.006  (.009)      90.8
T3,C3      81.5     0.069* (.004)     92.0     0.055* (.004)      84.5
T4,C3      78.5     0.151* (.007)     88.5     0.128* (.007)      82.0

*Indicates a difference greater than 2 s.d. Values in
parentheses are estimated standard deviations for the
differences.
vi) A power comparison. Because the method N can give
nonsymmetric confidence intervals it was felt that an interesting
comparison of methods might be made on the basis of
rejection of a null hypothesis that β = 0. It was thought
that, although method N may give intervals of the same length
as another method, there might be a different
chance of rejecting β = 0 for the two methods because of the
asymmetry of the N intervals. With our setting of β = 0.2
for the model this amounts to comparing the power of the
methods for a test of H0: β = 0 versus the alternative β = 0.2. Recall
(Section 3.2) that method N used for testing H0: β = 0 is
the test of Brown, Hollander and Korwar (1974). In thinking
of estimating a power value for the BJ method we must
decide whether to use the proportion of rejections out of
all simulations or the proportion of rejections out of those
simulations which give convergence and thus do give a test.
Values could be quite different for small n. Table 15 shows
the proportions out of all simulations.
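The two candidate definitions of the empirical rejection proportion can be contrasted with a small sketch. The counts used are invented for illustration and do not come from the thesis simulations; the function name and the coding of outcomes are likewise assumptions.

```python
def rejection_proportions(outcomes):
    """outcomes: one entry per simulation, 'reject', 'accept', or None
    when the method failed to converge and so gave no test."""
    total = len(outcomes)
    tested = [o for o in outcomes if o is not None]
    rejects = sum(1 for o in tested if o == "reject")
    prop_all = rejects / total
    prop_tested = rejects / len(tested) if tested else float("nan")
    return prop_all, prop_tested

# Hypothetical run: 200 simulations, 30 without convergence, 22 rejections.
outcomes = ["reject"] * 22 + ["accept"] * 148 + [None] * 30
prop_all, prop_tested = rejection_proportions(outcomes)
print(prop_all, round(prop_tested, 4))
```

With many non-convergent runs the second proportion is noticeably larger than the first, which is the small-n discrepancy the text warns about.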
No marked effects of asymmetric intervals are seen.
Method N shows up rather well compared to BJT for n = 25 in
keeping with the results on interval length and both methods
show good increase in proportions ("power") with sample size.
Table 15. Proportions of 200 simulations for which
H0: β = 0 was rejected at a test level
α = 0.1.

                 N                  BJT
          n = 7   n = 25     n = 7   n = 25
T1,C1      0.14    0.38       0.11    0.39
T2,C1      0.16    0.43       0.15    0.38
T3,C1      0.30    0.77       0.28    0.69
T4,C1      0.09    0.43       0.09    0.38
T1,C2      0.07    0.39       0.10    0.34
T2,C2      0.13    0.38       0.12    0.32
T3,C2      0.21    0.67       0.21    0.60
T4,C2      0.11    0.37       0.10    0.34
T1,C3      0.21    0.46       0.08    0.51
T2,C3      0.28    0.56       0.06    0.43
T3,C3      0.35    0.79       0.26    0.95
T4,C3      0.19    0.38       0.12    0.52
4.2 Analysis of Heart Transplant Data
The Stanford Heart Transplant Program was mentioned
in Chapter One. Data from that program has been analyzed
by various authors at various times since the start of data
collection in 1967. Survival times, ages and mismatch
scores available as of February, 1980, are tabulated in
Miller and Halpern (1982). Those authors compare the
results from analyses using the methods of Cox, Buckley and
James, Miller, and Koul, Susarla and Van Ryzin that were
described in Chapter Two.
Data are available for 184 patients and 55 of the
survival times are censored values. Ages are available for
all 184, but mismatch scores are available for only 157
patients. Miller and Halpern compare the four methods of
analysis using the 157 complete records in a multiple
regression of the base ten logarithm of survival time
against age and mismatch score. They declare the mismatch
score insignificant and say that the results they quote for
age from the multiple regression are practically identical
to those using age alone for the various methods. Thus the
results on age from their multiple regression are included
in Table 16. Also included in Table 16 are new computations
for the Buckley and James method for a single regression
with age alone using all 184 data values and the results of
using the new method, first with the 157 complete records and
then with the whole set of age data.
That the KSV estimate is positive is in conflict with
the commonly held view that increasing age is deleterious
to successful heart transplantation. Because computational
expressions are not available for an estimate of the variances
of the KSV estimates in multiple regression, they could
not be given by Miller and Halpern; and although an expression
is available in the single covariable case, the computation of
the variance has not been added here.
The Miller method did not converge. Oscillation
occurred between two values and the implication from those
values is that there is no significant regression on age.
At the 95% level of testing the method of Buckley-James
showed a significant negative slope on age for the larger
sample size and the Cox result was also significant. With
the Cox parameterization the positive value of β̂ indicates
a decrease in survival time with age.
The new method also showed a significant negative slope.
Miller and Halpern compare the Cox results with the direct
regression results by plotting the median survival times
estimated by the Cox model against age. That plot is close
to linear and has a slope more negative than the BJ estimate
and close to the values given by the new method.
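Because the direct regressions are on the base ten logarithm of survival time, a slope converts to a multiplicative change in survival time per year of age. The sketch below works that arithmetic for a slope of -0.030, taken for illustration as the new-method estimate with n = 157 (the negative sign is as described in the text). It shows only how such a slope is read and is not a result from the thesis; the function name is hypothetical.

```python
def survival_ratio(slope_log10, years=1.0):
    """Multiplicative change in survival time implied by a log10-scale slope
    over the given number of years of age."""
    return 10.0 ** (slope_log10 * years)

per_year = survival_ratio(-0.030)          # roughly 0.93, i.e. about 7% shorter per year
per_decade = survival_ratio(-0.030, 10.0)  # roughly 0.50, i.e. about halved per decade
print(round(per_year, 4), round(per_decade, 4))
```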
A comparison of results of regressing logarithm of survival
times on the mismatch scores is given in Table 17 for
the Buckley and James method and the new method. In this
case the new method gives the smaller negative slope, but
neither result attains significance.
Table 16. Regression slope estimates and confidence
intervals for log10 survival time versus
age at transplant with Stanford heart
transplant data. (n = 157 except where stated)

Estimator                   β̂          95% C.I.
Cox                        0.030     ( 0.008,  0.052)
Miller (loop with         -0.001     (-0.023,  0.021)
  2 values)                0.000     (-0.016,  0.016)
BJ     n = 157            -0.015     (-0.031,  0.001)
       n = 184            -0.014     (-0.028,  0.000)
KSV                        0.024
New    n = 157            -0.030     (-0.050, -0.010)
       n = 184            -0.026     (-0.045, -0.009)

Table 17. Regression slope estimates and confidence
intervals for log10 of survival times versus
mismatch score with n = 157 Stanford heart
transplant patients.

Estimator                   β̂          95% C.I.
BJ                        -0.105     (-0.373,  0.163)
New                       -0.002     (-0.327,  0.311)
4.3 Summary of Results
Both the Buckley-James and the new method were put
forward as procedures which did not require assumptions
about censoring as far as point estimation was concerned.
The indications from the trials reported in Tables 5 and 6
are that both methods might lead to biased results with
small samples showing extreme departures from assumption A2.
Otherwise point estimation was close to the model setting
except in two cases using method BJ. In terms of the
methodologies the new method will always give an estimate whereas,
as shown in Table 4, convergence difficulties can arise with
BJ, especially with small samples.
The confidence interval procedures for both BJ and N
require assumption A2 and a major concern is how robust the
methods turned out to be against departures from that
assumption. The observed coverage over all trials with method N
was always close to or greater than the nominal 90%. From
the results in Tables 8 to 11 it seems that method BJN should
not be advocated for samples up to size 25. Contrary to
anticipation, method BJT gave more security over coverage
at the small sample size. At larger sample sizes BJT would
probably attain coverage with mild departures from A2 but
perhaps could not be relied on with severe departures from
A2. The indications are that over a range of sample sizes
less than 30 method N has the better robustness properties.
The overall impression from Tables 12 to 14 is that the
new method is competitive in terms of interval length when
comparisons are made for those intervals that attain coverage.
Thus in conclusion it can be said that over all trials
method N gave the better assurance of coverage coupled with
a competitive interval length.
Sample sizes greater than 25 were not simulated in this
study. Large sample results were compared using the Stanford
heart data. Here of course there are no "true" results for
comparison but method N gave results consistent with commonly
held views about the data.
CHAPTER FIVE
CONCLUSION
We now submit a final appraisal of all the methods
considered; firstly with regard to their properties and
scope, and then in terms of some observations on their
performance.
The methods of Cox, Miller and BJ extend to multiple
regression whereas at this stage the new method does not.
Also, computing forms for interval estimation in multiple
regression with KSV are not available.
The KSV method is well founded theoretically but the
conditions required for consistency and asymptotic normality
of the estimates are difficult to relate to practical
criteria. Large sample properties have been established
for the Cox method and were referenced in section 2.2. Some
criticism of that method's robustness against the influence
of an outlying covariable value was given by Samuels (1978).
That the efficiency of the estimator decreases as β departs
from zero is shown in Kalbfleisch and Prentice (1980, p.103).
Assumption A2 has a prominent position with the Miller, BJ
and N methods. It is a sufficient condition for consistency
of the Miller estimate and is used in the heuristic derivation
of that estimate's asymptotic distribution. Buckley
and James suggest their point estimation procedure without
regard to any assumption on censoring pattern but A2 would
seem to be required for consistent estimation of the variance
of their estimate. They rely on simulation support
for their heuristic arguments concerning large sample properties.
The new method gives exact intervals for small
samples under A2 and for larger samples employs the asymptotic
theory of Daniels (see Section 3.4), again assuming A2.
All of the methods adapt to left-censored data.
A simple approach for this is to reverse the sign on all
response and covariable values and use the original method
treating the left-censored values as right-censored. The
methods of Cox and KSV do not simply adapt to data exhibiting
both left and right censoring. The Miller and BJ methods
can be extended to that situation by using the distribution
estimator of Turnbull (1974, 1976) and the new method
would apply by incorporating a natural extension of the
definitions of "definitely concordant" and "definitely
discordant" (see Definition 3.2.2).
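The sign-reversal device for left-censored data can be sketched as follows. Negating a left-censored response turns "true value at most the observed value" into "true value at least the observed value", which is right censoring. Negating the covariable as well leaves the slope unchanged. The helper names and the data are hypothetical, and ordinary least squares on complete data is used only to show that the slope is preserved.

```python
def reflect(xs, ys):
    """Reverse the sign of every covariable and response value."""
    return [-x for x in xs], [-y for y in ys]

def ls_slope(xs, ys):
    """Ordinary least-squares slope of y on x (complete data only)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 2.5, 3.9, 4.1]
rx, ry = reflect(xs, ys)
print(ls_slope(xs, ys), ls_slope(rx, ry))  # the two slopes agree
```

After reflection the censoring indicators are kept as they are and a right-censoring method is applied to the reflected data. The slope estimate then needs no back-transformation.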
We now compare some computing aspects of the various
methods. For KSV the computations are without iteration and
easy, once the choice of M_n for their expression (2.6.2)
has been made. That choice, however, is not clear-cut.
Convergence can be a problem with the Miller method and to
a lesser extent with BJ. Computing costs for the Cox, Miller
and BJ methods are comparable. The new method requires more
computing time than the other methods, but does not suffer
from convergence problems. On the system used, the cost of
analyzing the 184 age data by the new method was $7.20,
whereas by BJ the cost was 30 cents. The small sample costs
are less alarming. For analyzing a sample of size n = 10 the
new method cost 71 cents.
Because many results are asymptotic and many based on
assumptions which are not often plausible, we must be very
concerned with simulation and case study performances.
Buckley and James (1979) report a simulation study which
showed some bias in the Miller estimate. Their indications
are that the Miller method is acceptable when assumption A2
is appropriate but that in that case the results are similar
to those from analyzing just the uncensored observations.
Analyzing the same simulated data with their own (BJ) method,
they report encouraging results for n = 50 with 50% censoring
and suggest that the method might be used when there are
20 or more uncensored observations. The Miller approach is
no longer advocated by the author himself largely because
of consistency and convergence problems (Miller and Halpern,
1982). As stated in section 4.2 the KSV method gave rather
disconcerting results with the Stanford data; this is discussed
in Miller and Halpern. The Cox, BJ and new methods
gave seemingly satisfactory analyses of that data, with the
new method results falling close to those of the Cox method.
The purpose of the Monte Carlo studies reported in
section 4.1 was to test the new method applied to a range
of small sample sizes and to compare its performance with
the BJ method when that method was applied to small samples.
Results suggest that in terms of assurance of coverage with
confidence intervals the new method can be used safely over
all sample sizes for a wide range of data types, while for
sample sizes up to 25 the asymptotic distribution for BJ
estimates cannot safely be taken as normal with the proposed
variance estimate. Use of a t-distribution gives better
coverage performance for BJ with these smaller sample
sizes. By an anomaly due to overestimation of variance the
use of a t-distribution with BJ gave its most conservative
intervals for the smallest sample size tested.
Because intervals by the new method can be expected to
be conservative in small samples, it is possible to adjust
the test setting in order to come closer to the nominal
coverage for the interval. However, it is not possible to
give any firm guidelines as to what that adjustment should be.
It would be impractical to use exact permutation distributions
over all sample sizes with the new method. Random
samples of permutations gave adequate results for n = 10 and
the asymptotic results of Section 3.4 seemed appropriate from
n = 15 up. The program documentation which follows in the
appendices contains further information and recommendations
about this.
Assumption A2 pervades the problem of direct regression
with censored data, yet it is rather unlikely to be satisfied
in practice. The trials with the new method have given
encouraging indications of the method's robustness against
departures from that assumption, but it remains a major
challenge to relax the assumption completely.
APPENDICES
APPENDIX A
SMALL SAMPLE REGRESSION PROGRAM
This appendix contains a listing of a FORTRAN program
that could be used for obtaining a confidence interval
using the new method with sample sizes 4 to 15. Comments
in the listing are numbered for easy reference.
Recall that the exact method requires creating all
permutations of the pairs (z_i, δ_i), i = 1,...,n, keeping
x_i, i = 1,...,n, fixed, or equivalently permuting x's keeping
the (z_i, δ_i) pairs fixed. The z-values are functions
of b and change as the search over b values is made, whereas
x-values do not change. There is therefore much advantage
in making a one-time computation for each permutation of
the x-values and then using those with each set of (z_i, δ_i)
pairs as they are formed. Thus, for distinct and ordered
z_1 < z_2 < ... < z_n we have from (3.2.5),
\[ S(b) = \sum_{j=1}^{n-1} \delta_j \sum_{i=j+1}^{n} \operatorname{sgn}(x_i - x_j), \]
where the second summation can be made for each permutation
of x-values without regard to the value of b. Provided
that we do always have distinct z-values they can be ordered
for a particular b setting and corresponding δ's identified.
Application of the above expression then gives a value of
S(b) for each permutation of the x-values, as required
for the test described at (3.3.4).
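The double sum can be sketched in a few lines. The inner sums depend only on the arrangement of the x-values and so can be formed once per arrangement. The outer sum then weights them by the censoring indicators taken in z-order. This is a minimal illustration for the observed arrangement with distinct z-values, not a transcription of the FORTRAN program. The function names and the toy data are assumptions.

```python
def sgn(v):
    """Sign function returning -1, 0 or 1."""
    return (v > 0) - (v < 0)

def x_sums(x_ordered):
    """Inner sums: for each j, the sum over i > j of sgn(x_i - x_j)."""
    n = len(x_ordered)
    return [sum(sgn(x_ordered[i] - x_ordered[j]) for i in range(j + 1, n))
            for j in range(n - 1)]

def s_statistic(x, y, delta, b):
    """S(b) for distinct z-values, with z_i = y_i - b * x_i and delta_i = 1
    for an uncensored response."""
    z = [yi - b * xi for xi, yi in zip(x, y)]
    order = sorted(range(len(z)), key=lambda i: z[i])   # indices in z-order
    x_ord = [x[i] for i in order]
    d_ord = [delta[i] for i in order]
    return sum(d * s for d, s in zip(d_ord, x_sums(x_ord)))

x = [1, 2, 3, 4]
y = [2.0, 1.0, 4.0, 3.0]
delta = [1, 0, 1, 1]
print(s_statistic(x, y, delta, b=0.0))
```

For a permutation distribution the same `x_sums` would be evaluated for each permuted x arrangement and stored, so that only the outer sum is repeated as b changes.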
In case of tied z-values the exact tests should be
obtained via expression (3.2.3) but this time with much
greater computing expense, because now the whole computation
for the permutation distribution for S(b) would need
to be made in each region over b where the distribution of
S(b) is known to differ. In the terminology of Chapter
Three it is clear that z-values can only be equal at critical
b-values (3.2.9) and when data points coincide; i.e.,
(y_i, x_i) = (y_j, x_j) for some (i,j) (from z_i(b) = y_i - b x_i).
Permutation distributions are not considered at the critical
b-values and we need consider only the problem of coincident
points.
To avoid the expense of repeated use of (3.2.3) we
advocate using randomized ranks over tied z-values and proceeding
via (3.2.5). Realize that two censored z-values
do not contribute to the computation of S(b) anyway and that
a censored z-value tied numerically with an uncensored
z-value is considered greater. Thus random ordering of tied
uncensored z-values is all that is necessary and this is
effected in the program by a "fix-up" on the y data values
(see COMMENT 2). Coincident points are rather unlikely in
practice with the regression of continuous y on continuous x.
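The fix-up for coincident points can be sketched as a small random perturbation of one of the tied y-values. An uncensored value is moved down and a censored value is moved up, so that a censored value tied with an uncensored one ranks as the greater. The perturbation size (a uniform draw plus 0.1, divided by 10000) follows a reading of the scanned listing and should be treated as an assumption. The function name and the test data are hypothetical.

```python
import random

def fix_coincident(x, y, delta, rng):
    """Return a copy of y in which the first point of each coincident pair
    (same x and same y) is nudged by a small random amount."""
    y = list(y)
    n = len(y)
    for i in range(n - 1):
        for j in range(i + 1, n):
            if x[i] == x[j] and y[i] == y[j]:
                r = rng.random() + 0.1
                if delta[i] == 1:
                    y[i] -= r / 10000.0   # uncensored: move slightly down
                else:
                    y[i] += r / 10000.0   # censored: move slightly up
                break                     # move on to the next i
    return y

fixed = fix_coincident([1, 1, 2], [3.0, 3.0, 5.0], [1, 0, 1], random.Random(0))
print(fixed)
```

The perturbation is far smaller than the spacing of typical data values, so only the ordering of the tied points is affected.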
The x-permutations are obtained using the programming
that follows COMMENT 3. Up to sample size n = 7 all n!
permutations are used. For n = 8 and n = 9 a random sample
of about 10,000 permutations from the possible n! is chosen
without replacement. For sample size 10 or greater, 10,000
permutations are chosen randomly with replacement. From the
results of Section 4.1, it is not felt necessary to use this
program for sample sizes greater than 15, but rather to use
asymptotic results and the program given in Appendix C.
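The sampling scheme can be checked with a little arithmetic. For n = 8 and n = 9 the program enumerates every permutation and keeps each one with a fixed probability. The retention probabilities used below (0.25 and 0.028) are inferred from the thresholds that appear in the scanned listing and are an assumption. The function name is illustrative.

```python
import math

def expected_permutations(n):
    """Expected number of x-permutations used for sample size n."""
    total = math.factorial(n)
    if n <= 7:
        return float(total)           # full enumeration
    if n == 8:
        return total * (1 - 0.75)     # keep about one in four of 8!
    if n == 9:
        return total * (1 - 0.972)    # keep about 2.8% of 9!
    return 10000.0                    # random permutations, with replacement

for n in (7, 8, 9, 10):
    print(n, expected_permutations(n))
```

All of these counts stay within the 11,000 rows dimensioned for the array ISUMX.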
Given an x-permutation, subroutine XCALC is called to
compute the sums,
\[ \sum_{i=j+1}^{n} \operatorname{sgn}(x_i - x_j), \]
for each j = 1,...,n-1. These values are stored in the
array ISUMX for use in subroutine DIST when that is called
for forming the permutation distribution of S(b) (see below
COMMENT 8).
Critical b's are formed by the program section below
COMMENT 4. The algorithm for computing the step function
S(·) and locating the point estimate is then implemented.
The programming below COMMENT 8 facilitates the search
through b-values both above and below the point estimate.
By an application of Corollary 3.3.9 it is determined at
which points of b to recompute the permutation distribution
for S(b). The distribution is formed in subroutine DIST
and the percentage points of the null distribution then formed.
The test of the current S(b) value at lines 78 and 148 determines
the acceptability or otherwise of the current b-value.
Rejection of a b-value marks a boundary to the confidence
interval.
LISTING FOR APPENDIX A
C     THIS PROGRAM IMPLEMENTS THE NEW METHOD
C     FOR SAMPLE SIZES 4 TO 15.
C     DATA INPUT:
C     SAMPLE SIZE, INTERVAL PERCENTAGE,
C     LOG OPTION (1 LOG10, 0 OTHERWISE) IN FORMAT 3I5.
C     X, Y, DELTA IN FORMAT 2F5.0, I5.
C
      COMMON ISUMX(11000,15)
      DIMENSION X(15),Z(15),Y(15),IC(15),B(110),IF(220)
      DIMENSION IA(15),ICENS(15)
      INTEGER XP(15),C(110),S(110)
      CALL RSEED(35637)
      READ(5,10) N,IPC,LOGT
   10 FORMAT(3I5)
      DO 20 I=1,N
   20 READ(5,30) X(I),Y(I),IC(I)
   30 FORMAT(2F5.0,I5)
      NNN=N*(N-1)
      NN=NNN/2
      XXX=(100.0-IPC)/200.0
C
C     COMMENT 1 FIXUP FOR EQUAL POINTS
C
      N1=N-1
      DO 21 I=1,N1
      I1=I+1
      DO 22 J=I1,N
      IF((X(I).NE.X(J)).OR.(Y(I).NE.Y(J)))GO TO 22
      R=RNDMF(1.0)+.1
      IF(IC(I).EQ.1)Y(I)=Y(I)-R/10000.0
      IF(IC(I).EQ.0)Y(I)=Y(I)+R/10000.0
      GO TO 21
   22 CONTINUE
   21 CONTINUE
      IF(LOGT.EQ.0)GO TO 2
      DO 3 I=1,N
    3 Y(I)=LG10(Y(I))
    2 CONTINUE
      DO 31 I=1,N
   31 WRITE(6,29) X(I),Y(I),IC(I)
   29 FORMAT(2F20.5,I10)
C
C     COMMENT 2 PERMUTATIONS
C
      DO 3 I=1,11000
      DO 6 J=1,15
      ISUMX(I,J)=0
    6 CONTINUE
    3 CONTINUE
      III=0
      IF(N.GE.10)GO TO 600
  500 M=N
      XP(M)=1
      XP(M-1)=2
      DO 400 K1=1,2
      CALL SWITCH(K1,3,XP,M)
      XP(M-2)=3
      DO 390 K2=1,3
      CALL SWITCH(K2,4,XP,M)
      XP(M-3)=4
      DO 380 K3=1,4
      CALL SWITCH(K3,5,XP,M)
      IF(M.NE.4)GO TO 385
      III=III+1
      CALL XCALC(XP,M,III,X)
      GO TO 380
  385 XP(M-4)=5
      DO 370 K4=1,5
      CALL SWITCH(K4,6,XP,M)
      IF(M.NE.5)GO TO 375
      III=III+1
      CALL XCALC(XP,M,III,X)
      GO TO 370
  375 XP(M-5)=6
      DO 360 K5=1,6
      CALL SWITCH(K5,7,XP,M)
      IF(M.NE.6)GO TO 365
      III=III+1
      CALL XCALC(XP,M,III,X)
      GO TO 360
  365 XP(M-6)=7
      DO 350 K6=1,7
      CALL SWITCH(K6,8,XP,M)
      IF(M.NE.7)GO TO 355
      III=III+1
      CALL XCALC(XP,M,III,X)
      GO TO 350
  355 XP(M-7)=8
      DO 340 K7=1,8
      CALL SWITCH(K7,9,XP,M)
      IF(M.NE.8)GO TO 345
      R=RNDMF(1.0)
      IF(R.LT..75)GO TO 340
      III=III+1
      CALL XCALC(XP,M,III,X)
      GO TO 340
  345 XP(M-8)=9
      DO 330 K8=1,9
      CALL SWITCH(K8,10,XP,M)
      R=RNDMF(1.0)
      IF(R.LT..972)GO TO 330
      III=III+1
      CALL XCALC(XP,M,III,X)
  330 CONTINUE
      CALL SHUNT(8,XP,M)
  340 CONTINUE
      CALL SHUNT(7,XP,M)
  350 CONTINUE
      CALL SHUNT(6,XP,M)
  360 CONTINUE
      CALL SHUNT(5,XP,M)
  370 CONTINUE
      CALL SHUNT(4,XP,M)
  380 CONTINUE
      CALL SHUNT(3,XP,M)
  390 CONTINUE
      CALL SHUNT(2,XP,M)
  400 CONTINUE
      GO TO 699
  600 M=N
      DO 610 K=1,10000
      DO 620 I=1,M
  620 IA(I)=I
      DO 630 I=1,M
      MM=M-I+1
      R=RNDMF(1.0)
      JJ=INT(R*MM)+1
      XP(I)=IA(JJ)
      IA
  630 CONTINUE
      III=III+1
      CALL XCALC(XP,M,III,X)
  610 CONTINUE
  699 CONTINUE
C
C     COMMENT 3 COMPUTE B'S
C
      L=0
      DO 35 I=1,N
      DO 40 J=1,N
      IF(X(I).LE.X(J)) GO TO 40
      L=L+1
      B/CX(I)X(J))
      C(L)=IC
   40 CONTINUE
   35 CONTINUE
C
C     COMMENT 4 ORDER B'S RE ARRANGE C'S
C
      CALL ORDER(B,C,L)
      DO 36 J=1,L
      B
      C(L+2-J)=C(L+1-J)
   36 CONTINUE
      B(1)=B(2)-1.0
      B(L+2)=B(L+1)+1.0
C
C     COMMENT 5 COMPUTE STARTING VALUE S(1)
C
      DO 55 I=1,N
   55 Z(I)=Y(I)-B(1)*X(I)
      CALL COMP(X,N,IC,Z,IS)
      S(1)=IS
C
C     COMMENT 6 COMPUTE S'S SEQUENTIALLY
C
      KAV=0
      L1=L+1
      DO 60 I=2,L1
      S
      IF((S(I-1)*S(I)).LT.0) ISUB=I
      IF(S(I).NE.0)GO TO 60
      ISUB2=I+1
      KAV=KAV+1
   60 CONTINUE
C
C     COMMENT 7 COMPUTE POINT EST.
C
      IF(KAV.EQ.0) GO TO 65
      ISUB1=ISUB2-KAV
      BPNT=(B
      GO TO 70
   65 ISUB1=ISUB
      ISUB2=ISUB
      BPNT=B(ISUB)
   70 J=ISUB2
C
C     COMMENT 8 CONFIDENCE INTERVAL
C
   71 ISCUT=10
   72 ICH=0
   74 IF(B
      IF(C(J).EQ.1)ICH=1
      J=J+1
      GO TO 74
   76 IF
      IF((C(J).EQ.1).OR.(ICH.EQ.1))GO TO 80
   78 IF(S(J).LT.ISCUT)GO TO 150
      IF(J.NE.L+1)GO TO 79
      BUP=9999.0
      GO TO 151
   79 J=J+1
      GO TO 72
   80 BB=(B(J)+B(J+1))/2.0
      DO 90 K=1,N
   90 Z(K)=Y(K)-BB*X(K)
      CALL RANK(Z,IC,ICENS,N)
      DO 100 K=1,NNN
  100 IF(K)=0
      CALL DIST(III,IF,ICENS,N,NN)
  105 ITOT=0
      DO 110 K=1,NNN
  110 ITOT=IF(K)+ITOT
      KCOU=0
      DO 120 K=1,NNN
      KCOU=KCOU+IF(K)
      PTU=FLOAT(KCOU)/FLOAT(ITOT)
      IF(PTU.GT.XXX) GO TO 130
  120 CONTINUE
  130 ISCUT=K-NN
      KCOU=KCOU-IF(K)
      PTU=FLOAT(KCOU)/FLOAT(ITOT)
      GO TO 78
  150 BUP=B(J)
  151 J=ISUB1
      ISCUT=10
  140 ICH=0
  142 IF(B(J) NE. BGO TO 146
      IF(C(J).EQ.1)ICH=1
      1
      GO TO 142
  146 IF(ISCUT.EQ.10)GO TO 155
      IF((C(J).EQ.1).OR.(ICH.EQ.1))GO TO 155
  148 IF(S
      IF(J.NE.2)GO TO 143
      BLOW=-9999.0
      GO TO 221
  143 J=J-1
      GO TO 140
  155 BB=(B(J)+B(J-1))/2.0
      DO 160 K=1,N
  160 Z(K)=Y(K)-BB*X(K)
      CALL RANK(Z,IC,ICENS,N)
      DO 170 K=1,NNN
  170 IF(K)=0
      CALL DIST(III,IF,ICENS,N,NN)
  175 ITOT=0
      DO 180 K=1,NNN
  180 ITOT=IF(K)+ITOT
      KCOU=0
      DO 190 K=1,NNN
      KCOU=KCOU+IF(NNN-K)
      PTL=FLOAT(KCOU)/FLOAT(ITOT)
      IF(PTL.GT.XXX) GO TO 200
  190 CONTINUE
  200 ISCUT=NN-K
      KCOU=KCOU-IF(NNN-K)
      PTL=FLOAT(KCOU)/FLOAT(ITOT)
      GO TO 148
  220 BLOW=B(J)
  221 CONTINUE
      WRITE(6,230)BPNT
      WRITE(6,232)
  232 FORMAT(' ')
      WRITE(6,231)IPC,BLOW,BUP
  230 FORMAT(' POINT ESTIMATE ',F10.2)
  231 FORMAT(I5,' PERCENT INTERVAL ',F6.2,' ',F6.2)
      STOP
      END
C
C     SUBROUTINES SHUNT AND SWITCH: USED IN
C     GENERATION OF ALL PERMUTATIONS.
C
      SUBROUTINE SHUNT(J,B,M)
      DIMENSION B(10)
      DO 10 I=1,J
   10 B(M-I+1)=B(M-I)
      RETURN
      END
      SUBROUTINE SWITCH(K,J,B,M)
      DIMENSION B(10)
      IF(K.EQ.1)GO TO 10
      SAVE=B(M-J+K)
      B(M-J+K)=B(M-J+1+K)
      B(M-J+1+K)=SAVE
   10 RETURN
      END
C
C     SUBROUTINE XCALC: FORMS MATRIX OF VALUES
C     USING THE X PERMUTATIONS. THESE VALUES ARE
C     USED BY REPEATED USE OF SUBROUTINE DIST.
C
      SUBROUTINE XCALC(XP,M,III,X)
      COMMON ISUMX(11000,15)
      DIMENSION X(10)
      INTEGER XP(10)
      M1=M-1
      DO 40 J=1,M1
      IT=0
      J1=J+1
      JJ=XP(J)
      DO 20 I=J1,M
      II=XP(I)
      IF(X(II)-X(JJ))10,20,30
   30 IT=IT+1
      GO TO 20
   10 IT=IT-1
   20 CONTINUE
      ISUMX(III,J)=IT
   40 CONTINUE
      RETURN
      END
C
C     SUBROUTINE ORDER: PLACES B'S IN SIZE ORDER
C     AND CORRESPONDINGLY RELABELS CHANGE
C     INFORMATION
C
      SUBROUTINE ORDER(Z,IC,N)
      DIMENSION Z(50),IC(50)
      N1=N-1
      DO 10 I=1,N1
      SM=Z(I)
      K=I
      I1=I+1
      DO 20 J=I1,N
      IF(Z(J).GE.SM)GO TO 20
      SM=Z(J)
      K=J
   20 CONTINUE
      Z(K)=Z(I)
      Z(I)=SM
      ICK=IC(K)
      IC(K)=IC(I)
      IC(I)=ICK
   10 CONTINUE
      RETURN
      END
C
C     SUBROUTINE COMP: COMPUTES STARTING VALUE
C     FOR SEQUENTIAL COMPUTATION OF S(.)
C
      SUBROUTINE COMP(B,M,IC,Z,IS)
      DIMENSION B(10),IC(10),Z(10)
      IS=0
      DO 250 I=1,M
      DO 240 J=1,M
      IF(B(I).LE.B(J)) GO TO 240
      IF(IC(J).EQ.0)GO TO 200
      IF(IC(I).EQ.0)GO TO 160
      IF(Z(I).GT.Z(J)) IS=IS+1
      IF(Z(I).LT.Z(J)) IS=IS-1
      GO TO 240
  160 IF(Z(I).GE.Z(J))IS=IS+1
      GO TO 240
  200 IF(IC(I).EQ.0)GO TO 240
      IF(Z(J).GE.Z(I))IS=IS-1
  240 CONTINUE
  250 CONTINUE
      RETURN
      END
C
C     SUBROUTINE RANK: LABELS CENSORING INFORMATION
C     ACCORDING TO RANK OF Z VALUE.
C
      SUBROUTINE RANK(Z,IC,ICENS,N)
      DIMENSION Z(50),IC(50),ICENS(50)
      DO 20 I=1,N
      K=1
      DO 10 J=1,N
      IF(Z(I).LE.Z(J))GO TO 10
      K=K+1
   10 CONTINUE
      ICENS(K)=IC(I)
   20 CONTINUE
      RETURN
      END
C
C     SUBROUTINE DIST: COMPUTES 'IS' USING (3.2.5)
C     AND FORMS FREQUENCY DISTRIBUTION IN
C     ARRAY 'IF'
C
      SUBROUTINE DIST(III,IF,ICENS,M,NN)
      COMMON ISUMX(11000,15)
      DIMENSION IF(100),ICENS(50)
      M1=M-1
      DO 20 K=1,III
      IS=0
      DO 10 J=1,M1
      IS=IS+ICENS(J)*ISUMX(K,J)
   10 CONTINUE
      IS=IS+NN
      IF(IS)=IF(IS)+1
   20 CONTINUE
   30 FORMAT(2I10)
      RETURN
      END
APPENDIX B
TWO-SAMPLE ADAPTATION
Clearly, by coding x = 0, 1 the previous program can be
used to analyze a two-sample problem (Section 3.5). However,
for this special application it is computationally more efficient
to replace the permutation segment between COMMENTS 2 and 3
with the following segment for generating permutations in the
two-sample case. The advantage of this is that we no longer
need to consider all permutations of the x's, because permutations
within tied x's lead to the same values for S(.). It is
necessary to consider only the different configurations of 0's
and 1's in the x-vector. For this application subroutine
XCALC should be replaced by subroutine SAM, and subroutines
SWITCH and SHUNT are not required. The program applies provided
that x = 1 is associated with a sample of size M = 10 or
less, but the dimensions of array ISUMX must be increased if
the sample size N > 15 or $\binom{N}{M}$ > 11,000.
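The saving described above can be illustrated with a short Python sketch (Python is used here only for illustration; it is not the language of the listings). The function names `sam_row`, `direct_row` and `two_sample_rows` are hypothetical. `sam_row` mirrors the running-total recursion of subroutine SAM as read from the listing, and `direct_row` recomputes the same row directly from the sign sums, so that the two can be compared.

```python
from itertools import combinations
from math import comb


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def sam_row(ones, n):
    """One row of ISUMX for a configuration; ones = 1-based positions with x = 1."""
    ix = [0] * (n + 1)              # 1-based 0/1 indicator vector, ix[0] unused
    for k in ones:
        ix[k] = 1
    row = [0] * (n - 1)
    itot = 0                        # number of 1's strictly after position jj
    for j in range(1, n):
        jj = n - j                  # j elements lie after position jj
        itot += ix[jj + 1]
        row[jj - 1] = itot - ix[jj] * j
    return row


def direct_row(ones, n):
    """The same row computed directly as sums of sgn(x_i - x_j) over i > j."""
    x = [1 if i in ones else 0 for i in range(1, n + 1)]
    return [sum(sign(x[i] - x[j]) for i in range(j + 1, n))
            for j in range(n - 1)]


def two_sample_rows(n, m, max_rows=11000):
    """All rows for the distinct 0/1 configurations (assumed limit of 11,000 rows)."""
    if comb(n, m) > max_rows:
        raise ValueError("number of configurations exceeds the ISUMX row limit")
    return [sam_row(c, n) for c in combinations(range(1, n + 1), m)]
```

For N = 5 and M = 2 only 10 configurations are needed, compared with 5! = 120 permutations of the x's.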
LISTING FOR APPENDIX B
C     TWO SAMPLE COMBINATIONS OF X=1 POSITIONS
C     IN THE X VECTOR. SUBROUTINE SAM IS
C     CALLED FOR COMPUTING ISUMX.
C
      M=0
      DO 4 J=1, N
    4 M=M+IFIX(X(J))
      DO 3 I=1,11000
      DO 6 J=1, N
      ISUMX(I, J)=0
    6 CONTINUE
    3 CONTINUE
      III=0
      NM1=N-M+1
      DO 500 J1=1, NM1
      XP(1)=J1
      K1=XP(1)+1
      NM2=N-M+2
      DO 510 J2=K1, NM2
      XP(2)=J2
      IF(M .NE. 2)GO TO 515
      III=III+1
      CALL SAM(XP, III, M, N)
      GO TO 510
  515 K2=XP(2)+1
      NM3=N-M+3
      DO 520 J3=K2, NM3
      XP(3)=J3
      IF(M .NE. 3)GO TO 525
      III=III+1
      CALL SAM(XP, III, M, N)
      GO TO 520
  525 K3=XP(3)+1
      NM4=N-M+4
      DO 530 J4=K3, NM4
      XP(4)=J4
      IF(M .NE. 4)GO TO 535
      III=III+1
      CALL SAM(XP, III, M, N)
      GO TO 530
  535 K4=XP(4)+1
      NM5=N-M+5
      DO 540 J5=K4, NM5
      XP(5)=J5
      IF(M .NE. 5)GO TO 545
      III=III+1
      CALL SAM(XP, III, M, N)
      GO TO 540
  545 K5=XP(5)+1
      NM6=N-M+6
      DO 550 J6=K5, NM6
      XP(6)=J6
      IF(M .NE. 6)GO TO 555
      III=III+1
      CALL SAM(XP, III, M, N)
      GO TO 550
  555 K6=XP(6)+1
      NM7=N-M+7
      DO 560 J7=K6, NM7
      XP(7)=J7
      IF(M .NE. 7)GO TO 565
      III=III+1
      CALL SAM(XP, III, M, N)
      GO TO 560
  565 K7=XP(7)+1
      NM8=N-M+8
      DO 570 J8=K7, NM8
      XP(8)=J8
      IF(M .NE. 8)GO TO 575
      III=III+1
      CALL SAM(XP, III, M, N)
      GO TO 570
  575 K8=XP(8)+1
      NM9=N-M+9
      DO 580 J9=K8,NM9
      XP(9)=J9
      IF(M .NE. 9)GO TO 585
      III=III+1
      CALL SAM(XP, III, M, N)
      GO TO 580
  585 K9=XP(9)+1
      NM10=N-M+10
      DO 590 J10=K9,NM10
      XP(10)=J10
      III=III+1
      CALL SAM(XP, III, M, N)
  590 CONTINUE
  580 CONTINUE
  570 CONTINUE
  560 CONTINUE
  550 CONTINUE
  540 CONTINUE
  530 CONTINUE
  520 CONTINUE
  510 CONTINUE
  500 CONTINUE
C     SUBROUTINE SAM: FORMS ISUMX, THE MATRIX
C     OF VALUES WHERE ROWS CORRESPOND TO
C     DIFFERENT CONFIGURATIONS OF 0'S AND 1'S
C     AND COLUMNS ARE AS IN ISUMX IN SMALL SAMPLE
C     SUBROUTINE XCALC.
C
      SUBROUTINE SAM(K, III, M, N)
      COMMON ISUMX(11000, 15)
      DIMENSION K(10), IX(30)
      DO 10 J=1, N
   10 IX(J)=0
      DO 20 J=1,M
   20 IX(K(J))=1
      ITOT=0
      N1=N-1
      DO 30 J=1,N1
      JJ=N-J
      ITOT=IX(JJ+1)+ITOT
      ISUMX(III, JJ) = ITOT-IX(JJ)*J
   30 CONTINUE
      RETURN
      END
APPENDIX C
LARGE SAMPLE PROGRAM
The final listing is for the large sample routine.
Array sizes in the program would need to be changed for
sample sizes greater than 200. The computations for
Daniels' approximate variance (3.4.2) are made in subroutines
COEF and VAR.
LISTING FOR APPENDIX C
C     THIS PROGRAM IMPLEMENTS THE NEW METHOD USING
C     THE LARGE SAMPLE RESULTS OF DANIELS (SECTION 3.4)
C     DATA INPUT:
C     SAMPLE SIZE, NORMAL VALUE (1.96 FOR 95% INTERVAL),
C     LOG OPTION (1 LOG10, 0 OTHERWISE) IN FORMAT I5, F5.0, I5.
C     X, Y, DELTA IN FORMAT 2F5.0, I5.
C
      DIMENSION X(200), Z(200), Y(200), IC(200),B(20000)
      INTEGER C(20000),S(20000)
      REAL ISCUT
      READ(5, 10) N, ZNORM, LOGT
   10 FORMAT(I5, F5.0, I5)
      DO 20 I=1, N
   20 READ(5, 30)X(I),Y(I), IC(I)
   30 FORMAT(2F5.0, I5)
      IF(LOGT .EQ. 0)GO TO 5
      DO 4 I=1, N
    4 Y(I)=LOG10(Y(I))
    5 CONTINUE
      DO 31 I=1, N
   31 WRITE(6, 29)X(I),Y(I), IC(I)
   29 FORMAT(2F20.8, I10)
C     COMPUTATIONS ON X FOR DANIELS' RESULTS (FROM 3.4.2)
      CALL COEF(X, N, COEF1, COEF2)
C     COMPUTE B'S
      L=0
      DO 35 I=1, N
      DO 40 J=1,N
      IF(X(I) .LE. X(J)) GO TO 40
      L=L+1
      B(L)=
      C(L)=IC(I)+IC(J)
   40 CONTINUE
   35 CONTINUE
C     ORDER B'S RE ARRANGE C'S
      CALL ORDER(B, C, L)
      DO 36 J=1, L
      B(L+2-J)=B(L+1-J)
      C(L+2-J)=C(L+1-J)
   36 CONTINUE
      B(1)=B(2)-1.0
      B(L+2)=B(L+1)+1.0
C     COMPUTE STARTING VALUE S(1)
      DO 55 I=1, N
   55 Z(I)=Y(I)-B(1)*X(I)
      CALL COMP(X, N, IC, Z, IS)
      S(1)=IS
C     COMPUTE S'S SEQUENTIALLY
      KAV=0
      L1=L+1
      DO 60 I=2, L1
      S(I)=S(I-1)-C(I)
      IF((S(I-1)*S(I)) .LT. 0) ISUB=I
      IF(S(I) .NE. 0)GO TO 60
      ISUB2=I+1
      KAV=KAV+1
   60 CONTINUE
C     COMPUTE POINT EST.
      IF
      ISUB1=ISUB2-KAV
      BPNT=
      GO TO 70
   65 ISUB1=ISUB
      ISUB2=ISUB
      BPNT=B
   70 J=ISUB2
C     CONFIDENCE INTERVAL
   71 ISCUT=10
   72 ICH=0
   74 IF(B(J) .NE. B(J+1))GO TO 76
      IF(C
      J=J+1
      GO TO 74
   76 IF(ISCUT .EQ. 10)GO TO 80
      IF((C(J) .EQ. 1) .OR. (ICH .EQ. 1))GO TO 80
   78 IF
      IF(J .NE. L+1)GO TO 79
      BUP=9999.0
      GO TO 151
   79 J=J+1
      GO TO 72
   80 BB=
      DO 90 K=1,N
   90 Z(K)=Y(K)-BB*X(K)
      CALL VAR(Z, N, IC, COEF1, COEF2, ZNORM, ISCUT, VARS)
      GO TO 78
  150 BUP=B
  151 J=ISUB1
      ZNORM=-1.*ZNORM
      ISCUT=10
  140 ICH=0
  142 IF(B(J) .NE. B(J-1))GO TO 146
      IF(C(J) .EQ. 1)ICH=1
      J=J-1
      GO TO 142
  146 IF(ISCUT .EQ. 10)GO TO 155
      IF((C(J) .EQ. 1) .OR.
  148 IF(S
      IF(J .NE. 2)GO TO 143
      BLOW=-9999.0
      GO TO 221
  143 J=J-1
      GO TO 140
  155 BB=
      DO 160 K=1,N
  160 Z(K)=Y(K)-BB*X(K)
      CALL VAR(Z, N, IC, COEF1, COEF2, ZNORM, ISCUT, VARS)
      GO TO 148
  220 BLOW=B(J)
  221 CONTINUE
      WRITE(6, 232)
      WRITE(6, 230)BPNT
      WRITE(6, 232)
      WRITE(6, 231)BLOW, BUP
  232 FORMAT(' ')
  230 FORMAT(' POINT ESTIMATE ', F10.2)
  231 FORMAT(' CONFIDENCE INTERVAL ', F6.2, F6.2)
      STOP
      END
C SUBROUTINE ORDER: PLACES B'S IN SIZE
C ORDER AND CORRESPONDINGLY RELABELS
C CHANGE INFORMATION.
C
      SUBROUTINE ORDER(Z, IC, N)
      DIMENSION Z(50),IC(50)
      N1=N-1
      DO 10 I=1, N1
      SM=Z(I)
      K=I
      I1=I+1
      DO 20 J=I1,N
      IF(Z(J) .GE. SM)GO TO 20
      SM=Z(J)
      K=J
   20 CONTINUE
      Z(K)=Z(I)
      Z(I)=SM
      ICK=IC(K)
      IC(K)=IC(I)
      IC(I)=ICK
   10 CONTINUE
      RETURN
      END
C
C SUBROUTINE COMP: COMPUTES STARTING VALUE
C     FOR SEQUENTIAL COMPUTATION OF S(.)
C
      SUBROUTINE COMP(B, M, IC, Z, IS)
      DIMENSION B(10),IC(10),Z(10)
      IS=0
      DO 230 I=1, M
      DO 240 J=1,M
      IF(B(I) .LE. B(J)) GO TO 240
      IF(IC(J) .EQ. 0)GO TO 200
      IF(IC(I) .EQ. 0)GO TO 160
      IF(Z(I) .GT. Z(J)) IS=IS+1
      IF(Z(I) .LT. Z(J)) IS=IS-1
      GO TO 240
  160 IF(Z(I) .GE. Z(J))IS=IS+1
      GO TO 240
  200 IF(IC(I) .EQ. 0)GO TO 240
      IF(Z(J) .GE. Z(I))IS=IS-1
  240 CONTINUE
  230 CONTINUE
      RETURN
      END
C
C SUBROUTINE COEF: MAKES PRELIMINARY
C CALCULATIONS ON X VECTOR FOR INCLUSION
C IN VARIANCE COMPUTATION IN SUBROUTINE VAR.
C
      SUBROUTINE COEF(X, N, COEF1, COEF2)
      DIMENSION X(100)
      INTEGER SUMSQ,SUMX, SUMA
      SUMSQ=0
      SUMX=0
      DO 40 I=1, N
      SUMA=0
      DO 20 J=1, N
      IF(X
   10 SUMA=SUMA-1
      GO TO 35
   30 SUMA=SUMA+1
   33 SUMSQ=SUMSQ+1
   20 CONTINUE
      SUMX=SUMX+SUMA*SUMA
   40 CONTINUE
      COEF1=FLOAT(SUMX-SUMSQ)/FLOAT*
      COEF2=FLOAT(SUMSQ)/FLOAT(2*N*(N-1))
      RETURN
      END
C     SUBROUTINE VAR: COMPUTES ASYMPTOTIC
C     VARIANCE USING (3.4.2)
C
      SUBROUTINE VAR(Z, N, IC, COEF1, COEF2, ZNORM, ISCUT, VARS)
      DIMENSION Z(200), IC(200)
      REAL ISCUT
      INTEGER SUMSQB, SUMZ, SUMB
      SUMSQB=0
      SUMZ=0
      DO 50 I=1, N
      SUMB=0
      DO 40 J=1, N
      IF(Z(J)-Z(I))10, 20, 30
   10 SUMB=SUMB-IC(J)
      SUMSQB=SUMSQB+IC(J)
      GO TO 40
   30 SUMB=SUMB+IC(I)
      SUMSQB=SUMSQB+IC
      GO TO 40
   20 SUMB=SUMB+IC(J)IC(I)
      SUMSQB=SUMSQB+IC(J)IC(I)
   40 CONTINUE
      SUMZ=SUMZ+SUMB*SUMB
   50 CONTINUE
      VARS=COEF1*(SUMZ-SUMSQB)+COEF2*SUMSQB
      ISCUT=1.*ZNORM*SQRT(VARS)
      RETURN
      END
BIBLIOGRAPHY
Breslow, N. E. (1974). Covariance analysis of censored
survival data. Biometrics, 30, 89-99.
Brown, B. W., M. Hollander, and R. M. Korwar. (1974).
Nonparametric tests of independence for censored data
with applications to heart transplant studies.
Reliability and Biometry: Statistical Analysis of
Life Length (F. Proschan and R. J. Serfling, Eds.).
SIAM, Philadelphia.
Buckley, J., and I. James. (1979). Linear regression with
censored data. Biometrika, 66, 429-436.
Cox, D. R. (1972). Regression models and life-tables.
J. R. Stat. Soc. B, 34, 187-202.
Cox, D. R. (1975). Partial likelihood. Biometrika, 62,
269-276.
Daniels, H. E. (1944). The relation between measures of
correlation in the universe of sample permutations.
Biometrika, 33, 129-135.
Efron, B. (1977). Efficiency of Cox's likelihood function
for censored data. J. Am. Stat. Assoc., 72, 557-565.
Elandt-Johnson, R. C., and N. L. Johnson. (1980). Survival
Models and Data Analysis. Wiley, New York.
Epstein, B., and M. Sobel. (1953). Life testing. J. Am.
Stat. Assoc., 48, 486-502.
Gehan, E. A. (1965). A generalized Wilcoxon test for
comparing arbitrarily singly-censored samples.
Biometrika, 52, 203-223.
Hodges, J. L., and E. L. Lehmann. (1963). Estimates of
location based on rank tests. Ann. Math. Stat., 34,
598-611.
Kalbfleisch, J. D., and R. L. Prentice. (1973). Marginal
likelihoods based on Cox's regression and life model.
Biometrika, 60, 267-278.
Kalbfleisch, J. D., and R. L. Prentice. (1980). The
Statistical Analysis of Failure Time Data. Wiley,
New York.
Kaplan, E. L., and P. Meier. (1958). Nonparametric estimation
from incomplete observations. J. Am. Stat. Assoc.,
53, 457-481.
Kendall, M. G. (1970). Rank Correlation Methods, 4th Ed.
Griffin, London.
Kendall, M. G., and A. Stuart. (1973). The Advanced Theory
of Statistics, Vol. 2. Griffin, London.
Koul, H., V. Susarla, and J. Van Ryzin. (1981). Regression
analysis with randomly right-censored data. Ann.
Statist., 9, 1276-1288.
Lawless, J. F. (1981). Statistical Models and Methods for
Lifetime Data. Wiley, New York.
Leavitt, S. S., and R. A. Olshen. (1974). The insurance
claims adjuster as patients' advocate: quantitative
impact. Report for Insurance Technology Company,
Berkeley, California.
Lehmann, E. L. (1959). Testing Statistical Hypotheses.
Wiley, New York.
Mann, N. R., R. E. Schafer, and N. D. Singpurwalla. (1974).
Methods for Statistical Analysis of Reliability and
Life Data. Wiley, New York.
Mantel, N., and W. Haenszel. (1959). Statistical aspects
of the analysis of data from retrospective studies of
disease. Journal of the National Cancer Institute,
22, 719-748.
Miller, R. G. (1976). Least squares regression with
censored data. Biometrika, 63, 449-464.
Miller, R. G. (1981). Survival Analysis. Wiley, New York.
Miller, R. G., and J. Halpern. (1982). Regression with
censored data. Biometrika, 69, 521-531.
Nelson, W. (1981). Applied Life Data Analysis. Wiley,
New York.
Oakes, D. (1977). The asymptotic information in censored
survival data. Biometrika, 64, 441-448.
Peto, R. (1972). Discussion on paper by D. R. Cox.
J. R. Stat. Soc. B, 34, 205-207.
Prentice, R. L., and L. A. Gloeckler. (1978). Regression
analysis of grouped survival data with application to
breast cancer data. Biometrics, 34, 57-67.
Randles, R. H., and D. A. Wolfe. (1979). Introduction to
the Theory of Nonparametric Statistics. Wiley, New
York.
Samuels, S. (1978). Robustness of Survival Estimators.
Ph.D. Thesis. Department of Biostatistics. Univ. of
Washington.
Schmee, J., and G. J. Hahn. (1979). A simple method for
regression analysis with censored data. Technometrics,
21, 417-432.
Sen, P. K. (1968). Estimates of the regression coefficient
based on Kendall's Tau. J. Am. Stat. Assoc., 63,
1379-1389.
Susarla, V., and J. Van Ryzin. (1980). Large sample theory
for an estimator of the mean survival time from censored
samples. Ann. Statist., 8, 1002-1016.
Tsiatis, A. (1981). A large sample study of Cox's regression
model. Ann. Statist., 9, 93-108.
Turnbull, B. W. (1974). Nonparametric estimation of a
survivorship function with doubly censored data.
J. Am. Stat. Assoc., 69, 169-173.
Turnbull, B. W. (1976). The empirical distribution function
with arbitrarily grouped censored and truncated data.
J. R. Stat. Soc. B, 38, 290-295.
BIOGRAPHICAL SKETCH
Michael Ireson was born in Luton, England, in December,
1942. He received a B.Sc. degree in mathematics and physics
from London University in 1966, and in 1968 obtained a
Diploma in Education from Makerere University, Uganda.
Following a period of teaching secondary school mathematics
in Kenya, he attended the University of Wales for a year and
received a Diploma in Statistics in 1972. For the next seven
years he held a Lectureship in Mathematics at the University
of Malawi. In 1979 he joined the graduate program in statistics
at the University of Florida. After graduation he plans
to take an appointment as a statistician with Smith, Kline
and French in Welwyn, England.
He is an Associate Fellow of the Institute of Mathematics
and its Applications and a Member of the Institute of Statisticians,
the Biometric Society and the American Statistical
Association.
He is married and has two children.
I certify that I have read this study and that in my
opinion it conforms to acceptable standards of scholarly
presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.
Pejaver V. Rao, Chairman
Professor of Statistics
I certify that I have read this study and that in my
opinion it conforms to acceptable standards of scholarly
presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.
Ramon C. Littell
Professor of Statistics
I certify that I have read this study and that in my
opinion it conforms to acceptable standards of scholarly
presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.
Jon J. Shuster
Professor of Statistics
I certify that I have read this study and that in my
opinion it conforms to acceptable standards of scholarly
presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.
This dissertation was submitted to the Graduate Faculty of
the Department of Statistics in the College of Liberal Arts and Sciences
and to the Graduate School, and was accepted as partial
fulfillment of the requirements for the degree of
Doctor of Philosophy.
August 1983
Dean for Graduate Studies
and Research
vi) A power comparison. Because method N can give
nonsymmetric confidence intervals, it was felt that an interesting
comparison of methods might be made on the basis of
rejection of a null hypothesis that $\beta = 0$. It was thought
that, although method N may give intervals of the same length
as another method, the two methods might have different
chances of rejecting $\beta = 0$ because of the
asymmetry of the N intervals. With our setting of $\beta = 0.2$
for the model, this amounts to comparing the power of the
methods for a test of $H_0$: $\beta = 0$ versus $H_1$: $\beta = 0.2$. Recall
(Section 3.2) that method N used for testing $H_0$: $\beta = 0$ is
the test of Brown, Hollander and Korwar (1974). In estimating
a power value for the BJ method we must
decide whether to use the proportion of rejections out of
all simulations or the proportion of rejections out of those
simulations which give convergence and thus do give a test.
The two values could be quite different for small n. Table 15 shows
the proportions out of all simulations.
No marked effects of asymmetric intervals are seen.
Method N shows up rather well compared to BJT for n = 25, in
keeping with the results on interval length, and both methods
show a good increase in proportions ("power") with sample size.
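The two candidate power estimates discussed above differ only in their denominators. The following Python sketch makes the distinction explicit; the function name `rejection_proportions` and the coding of a simulation that yields no test as `None` are assumptions made for illustration only.

```python
def rejection_proportions(outcomes):
    """Return (proportion out of all simulations, proportion out of those giving a test).

    outcomes holds one entry per simulated data set:
    True  = H0 rejected, False = H0 not rejected,
    None  = no test available (for example, no convergence).
    """
    total = len(outcomes)
    tested = [o for o in outcomes if o is not None]
    rejections = sum(1 for o in tested if o)
    out_of_all = rejections / total
    out_of_tested = rejections / len(tested) if tested else float("nan")
    return out_of_all, out_of_tested
```

With four simulations of which two reject, one does not, and one fails to converge, the first proportion is 2/4 and the second is 2/3, which illustrates how far apart the two values can be when failures are common.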
In both cases coarser searches across b-values can be
made in order to reduce computing time, and in this regard
there is also some advantage to making an initial search
using the asymptotic method and restricting the exact method
search to the appropriate regions.
Further details of the computing aspects are given in
appendices, along with some computer programs and their
documentation. Some results of using this new method appear
in the next chapter.
$b < b_1$; however, the computation can be much simplified in
the following way. For any $b < b_1$, (2.12) and (2.13) imply
$\eta_{ij}(b) = \delta_j$ if $x_i > x_j$. Therefore, from (2.14),
$$s_0 = \sum_{(i,j):\, x_i > x_j} \delta_j .$$
Since entries appear in Table 1 only if $x_i > x_j$, we see that
$s_0$ is simply the total number of entries under columns for
which $\delta_j = 1$. In our example these are columns 1, 3 and 5;
thus $s_0 = 4 + 2 + 0 = 6$. Sequential computation of S then
follows using the ordered $b_k$, $k = 1, \ldots, \ell$, and the corresponding
"change" values, $c_k$, computed using (2.11). The resulting
step function S(b) is tabulated in Table 2 and graphed
in Figure 1. The point estimate is $\hat{\beta} = 0.25$.
Table 2. Computation of S(b)

$s_0 = 6$

  k     b_k     c_k     s_k = s_{k-1} - c_k
  1    -1        1        5
  2     0        4        1
  3     0.25     2       -1
  4     0.5      2       -3
  5     0.67     1       -4
  6     1        2       -6
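The sequential computation tabulated above can be sketched in a few lines of Python. The function names `step_values` and `point_estimate` are hypothetical, and the numbers used in the usage note are those read from Table 2 (with minus signs restored). Taking the midpoint of an interval on which S(b) is exactly zero is an assumption of this sketch, not a rule stated in the text.

```python
def step_values(s0, changes):
    """Return s_1, ..., s_l from s_k = s_{k-1} - c_k."""
    values, s = [], s0
    for c in changes:
        s -= c
        values.append(s)
    return values


def point_estimate(b, s0, changes):
    """Locate the b-value at which the decreasing step function S crosses zero."""
    s = step_values(s0, changes)
    for k, sk in enumerate(s):
        if sk < 0:
            return b[k]                      # S jumps from positive to negative at b_k
        if sk == 0:
            m = k
            while m + 1 < len(s) and s[m + 1] == 0:
                m += 1
            if m + 1 < len(b):
                return (b[k] + b[m + 1]) / 2  # assumed: midpoint of the zero interval
            return b[k]
    return None                               # S never reaches zero
```

Calling `point_estimate([-1, 0, 0.25, 0.5, 0.67, 1], 6, [1, 4, 2, 2, 1, 2])` reproduces the steps 5, 1, -1, -3, -4, -6 and returns 0.25, in agreement with the point estimate quoted in the text.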
the point estimate. The confidence set described at (3.5)
is thus established as a $100(1-\alpha)\%$ confidence interval.
3.4 Interval Estimation of $\beta$: Asymptotic Method
For large n, the tests proposed at (3.4) can be
performed using an approximation for the conditional null
distribution of S; i.e., the conditional distribution of S
under $H_0$: $\beta = b$. We may use directly the results of Daniels
(1944). Those results concern, in Daniels' notation, the
n! different ways of grouping n sample y-values with n sample
x-values. A score $a_{ij}$, $i, j = 1, \ldots, n$, is assigned to each
pair $(x_i, x_j)$ and a score $b_{ij}$ to each $(y_i, y_j)$. For scores
such that $a_{ij} = -a_{ji}$, $b_{ij} = -b_{ji}$ and for summations over all
subscripts $\sum a_{ij} a_{ij'}$ and $\sum b_{ij} b_{ij'}$ each of order $n^3$, it is
shown that a statistic
$$C = \sum_{i,j} a_{ij} b_{ij} \qquad (4.1)$$
is conditionally asymptotically Normal with conditional mean
zero, and an expression for the conditional variance is given
in terms of n, $a_{ij}$ and $b_{ij}$, $i, j = 1, \ldots, n$. Applying this to
data points $(z_i(b), x_i)$, $i = 1, \ldots, n$, and the statistic
$$S(b) = \sum_{i=1}^{n} \sum_{j=1}^{n} \eta_{ij}(b)\, \psi(x_i - x_j),$$
from (2.3), we have, comparing (4.1) and writing $\eta_{ij}$ for
$\eta_{ij}(b)$,
The x-permutations are obtained using the programming
that follows COMMENT 3. Up to sample size n = 7 all n!
permutations are used. For n = 8 and n = 9 a random sample
of about 10,000 permutations from the possible n! is chosen
without replacement. For sample size 10 or greater, 10,000
permutations are chosen randomly with replacement. From the
results of Section 4.1, it is not felt necessary to use this
program for sample sizes greater than 15, but rather to use
asymptotic results and the program given in Appendix C.
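The three sampling regimes just described can be sketched as follows in Python. This is an illustration only, not a transcription of the Fortran: the function name `choose_permutations`, the fixed seed, and the use of exactly 10,000 draws (the text says "about 10,000") are assumptions.

```python
import itertools
import random


def choose_permutations(n, n_sample=10000, rng=None):
    """Return the x-permutations to use for a sample of size n."""
    rng = rng or random.Random(0)            # fixed seed assumed, for repeatability
    items = list(range(1, n + 1))
    if n <= 7:
        # all n! permutations (at most 5040)
        return list(itertools.permutations(items))
    if n <= 9:
        # random sample without replacement: keep drawing until n_sample are distinct
        seen = set()
        while len(seen) < n_sample:
            seen.add(tuple(rng.sample(items, n)))
        return list(seen)
    # n >= 10: random sample with replacement, repeats allowed
    return [tuple(rng.sample(items, n)) for _ in range(n_sample)]
```

For n = 8 there are 40,320 possible permutations, so rejecting repeated draws until 10,000 distinct permutations are held terminates quickly.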
Given an x-permutation, subroutine XCALC is called to
compute the sums,
$$\sum_{i=j+1}^{n} \operatorname{sgn}(x_i - x_j),$$
for each $j = 1, \ldots, n-1$. These values are stored in the
array ISUMX for use in subroutine DIST when that is called
for forming the permutation distribution of S(b) (see below
COMMENT 8).
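As a small illustration of the sums formed by XCALC, the following Python sketch computes them for one permuted x-vector. The function name `xcalc` is borrowed from the subroutine for readability; the code is a sketch of the formula above, not a transcription of the listing.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def xcalc(x_perm):
    """Return the sums over i > j of sgn(x_i - x_j) for j = 1, ..., n-1."""
    n = len(x_perm)
    return [sum(sign(x_perm[i] - x_perm[j]) for i in range(j + 1, n))
            for j in range(n - 1)]
```

For the permuted vector (3, 1, 2) the two sums are -2 and 1; tied x-values contribute zero, so (1, 1, 2) gives 1 and 1.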
Critical b's are formed by the program section below
COMMENT 4. The algorithm for computing the step function
S(.) and locating the point estimate is then implemented.
The programming below COMMENT 8 facilitates the search
through b-values both above and below the point estimate.
By an application of Corollary 3.3.9 it is determined at
which points of b to recompute the permutation distribution
for S(b). The distribution is formed in subroutine DIST
$$\lambda_x(u) = \frac{f_x(u)}{1 - F_x(u)} = \frac{f_0(e^{-\beta x}u)\,e^{-\beta x}}{1 - F_0(e^{-\beta x}u)} = \lambda_0(e^{-\beta x}u)\,e^{-\beta x}.$$
If $U_0$ has an exponential distribution, then the hazard rate
$\lambda_0(\cdot)$ is constant. Letting $\lambda_0(e^{-\beta x}u) = \lambda$ we see that for
this special case,
$$\lambda_x(u) = \lambda e^{-\beta x}.$$
That is, the hazard function for $U_x$ is proportional to some
function of x and the covariable acts in a multiplicative
way on the hazard. This motivates consideration of a proportional
hazards model,
$$\lambda_x(u) = \lambda_0(u)\,e^{\beta x}, \qquad (1.6)$$
for some baseline $\lambda_0(\cdot)$ and a parameter $\beta$.
We have seen that with the exponential distribution,
the accelerated time model (1.4), which is a log-linear
model (1.5), implies a proportional hazards model (1.6).
In general the two models do not coincide. Only in the
case of the two-parameter Weibull family of distributions
will they be equivalent (Kalbfleisch and Prentice, 1980,
p. 34).
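The exponential special case can be checked numerically. The Python sketch below assumes the accelerated time model in the form $U_x = e^{\beta x} U_0$ with $U_0$ exponential with rate $\lambda$; the function name `hazard_accelerated` and the parameter names are chosen here for illustration. The hazard is computed from its definition as density over survival, so its agreement with $\lambda e^{-\beta x}$ is a check rather than an identity built into the code.

```python
import math


def hazard_accelerated(u, x, beta, lam):
    """Hazard of U_x = exp(beta * x) * U_0 at time u, where U_0 ~ Exponential(rate lam)."""
    scale = math.exp(-beta * x)
    t = scale * u                                  # time on the baseline scale
    density = lam * math.exp(-lam * t) * scale     # f_x(u) = f_0(t) * exp(-beta * x)
    survival = math.exp(-lam * t)                  # 1 - F_x(u) = 1 - F_0(t)
    return density / survival
```

For any u the result agrees with `lam * math.exp(-beta * x)` up to rounding, so the hazard does not depend on u and the covariate acts multiplicatively, as stated above.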
from convergence problems. On the system used, the cost of
analyzing the 184 age data by the new method was $7.20,
whereas by BJ the cost was 30 cents. The small sample costs
are less alarming. For analyzing a sample of size n = 10 the
new method cost 71 cents.
Because many results are asymptotic and many are based on
assumptions which are not often plausible, we must be very
concerned with simulation and case study performances.
Buckley and James (1979) report a simulation study which
showed some bias in the Miller estimate. Their indications
are that the Miller method is acceptable when assumption A2
is appropriate, but that in that case the results are similar
to those from analyzing just the uncensored observations.
Analyzing the same simulated data with their own (BJ) method,
they report encouraging results for n = 50 with 50% censoring
and suggest that the method might be used when there are
20 or more uncensored observations. The Miller approach is
no longer advocated by the author himself, largely because
of consistency and convergence problems (Miller and Halpern,
1982). As stated in Section 4.2, the KSV method gave rather
disconcerting results with the Stanford data; this is discussed
in Miller and Halpern. The Cox, BJ and new methods
gave seemingly satisfactory analyses of those data, with the
new method results falling close to those of the Cox method.
The purpose of the Monte Carlo studies reported in
Section 4.1 was to test the new method applied to a range
and the percentage points of the null distribution then formed.
The test of the current S(b) value at lines 78 and 148 determines
the acceptability or otherwise of the current b-value.
Rejection of a b-value marks a boundary to the confidence
interval.
The hazard rate forms the essential structure of the
Cox regression model. It can be connected with a linear
regression model in the following way. Suppose $U_0$ is a
survival time with hazard rate $\lambda_0(u) = f_0(u)/(1 - F_0(u))$.
It might be appropriate to assume that for a parameter $\beta$
and some covariate x we could obtain different failure
times via
$$U_x = e^{\beta x}\, U_0. \qquad (1.4)$$
The relationship (1.4) is referred to as an accelerated
time model, and indeed with $\beta x < 0$ we have $U_x < U_0$ and the
model represents a situation where a failure time experiment
might be speeded up by some control over a covariable x
(cf. Mann et al., 1974). This model is readily seen to
coincide with a log-linear model,
$$T_x = \log U_x = \alpha + \beta x + \varepsilon, \qquad (1.5)$$
where
$$\alpha = E[\log U_0] \quad \text{and} \quad \varepsilon = \log U_0 - E[\log U_0].$$
Notice that in (1.5) the covariate acts in an additive way
on the log lifetime.
With model (1.4) the hazard rates of $U_x$ and $U_0$ are
related in the following way:
$$\hat{\alpha} = \sum_{i=1}^{n} w_i(\beta)\, y_i - \beta \sum_{i=1}^{n} w_i(\beta)\, x_i, \qquad (4.6)$$
and resubstituting this in (4.5) would require us to minimize
a function
$$f(\beta) = \sum_{i=1}^{n} w_i(\beta)\,(y_i - \hat{\alpha} - \beta x_i)^2. \qquad (4.7)$$
This could be minimized by a search procedure, but this has
difficulties, especially in higher dimensions. Miller (1976)
advocates a modified approach which we now describe.
The alternative procedure is an iterative one. An
initial set of z-values is formed using (4.1) with $\alpha = 0$
and some starting estimate, $\hat{\beta}_0$, for $\beta$. Weights $w_i(\hat{\beta}_0)$ are
then computed using (4.4) and the next estimate for $\beta$ is
defined as
$$\hat{\beta}_1 = \frac{\sum_u w_i^*(\hat{\beta}_0)\, y_i\,(x_i - \bar{x}_u^*)}{\sum_u w_i^*(\hat{\beta}_0)\,(x_i - \bar{x}_u^*)^2}, \qquad (4.8)$$
where
$$w_i^*(\hat{\beta}_0) = \frac{w_i(\hat{\beta}_0)}{\sum_u w_i(\hat{\beta}_0)}, \qquad \bar{x}_u^* = \sum_u w_i^*(\hat{\beta}_0)\, x_i,$$
with the summations being over the uncensored observations.
The estimate $\hat{\beta}_1$ is then used in (4.1) and so on.
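A single update step of the form (4.8) can be sketched in Python as follows. The function name `miller_update` is hypothetical, and the weights are taken as given inputs: how they are obtained from (4.4) is not reproduced in this sketch, and the caller is assumed to pass only the uncensored observations.

```python
def miller_update(x, y, w):
    """One weighted-slope update over the uncensored observations.

    x, y : covariate and response values of the uncensored observations
    w    : their current weights (assumed already computed elsewhere)
    """
    total = sum(w)
    w_star = [wi / total for wi in w]                      # normalized weights
    x_bar = sum(ws * xi for ws, xi in zip(w_star, x))      # weighted mean of x
    num = sum(ws * yi * (xi - x_bar) for ws, xi, yi in zip(w_star, x, y))
    den = sum(ws * (xi - x_bar) ** 2 for ws, xi in zip(w_star, x))
    return num / den
```

As a check on the arithmetic, when the points lie exactly on a line with slope 2 the update returns 2 whatever positive weights are supplied, because the weighted deviations of x from its weighted mean sum to zero.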
Notice that since the weights do not depend on $\alpha$ we chose
iv) Coverage probability. The various confidence
interval procedures are now compared on the basis of the
coverage attained by the intervals. Nominally the intervals
are 90% except for method N with n = 7 and n = 10. In those
cases the intervals are nominally 88% but known to be conservative
with Cl type censoring distribution (see 3.3.5).
In general this adjustment led to intervals with coverage
90% or greater and thus established a more common base from
which to compare other aspects of the various methods.
Tables 8-11 give the results for n = 7, 10, 15, 25,
respectively. Recall that results for n = 10 are based on
estimates of exact permutation distributions. Because of
computing expense, results for n = 10 were obtained only for
three data types. It was felt to be useful to include these
to give some indication of how well the distribution approximation
functioned. Coverages for the BJ method are calculated
using only the data which do give a BJ estimate.
3.6 Computational Aspects
The confidence interval procedure described in the
previous sections requires the progressive testing of hypothesized
$\beta$ values. For each such test use is to be made
either of the exact permutation distribution or the asymptotic
approximation. The exact test of an arbitrary value
b', for b' somewhere between two critical values, would
perhaps be made most conveniently if we could obtain a
p-value directly for the sample s(b') value and reject
$H_0$: $\beta = b'$ for a p-value < $\alpha/2$ for some level $\alpha$. Neither
direct nor progressive computation of p-values seems to be
tractable in practice; to do it requires identifying all
permutations of the current $\{z_i(b'): i = 1, \ldots, n\}$ elements
giving S values less than (or more than) the current sample
arrangement. Resort is made in this study to the more laborious
complete enumeration of the permutation distribution for
each b' to be tested. The algorithm employed is to work
outwards from the point estimate and to regenerate the permutation
distribution as b changes through a critical value,
realizing (from Corollary 3.9) that this will not always
be necessary because of no change in the distribution. For
each exact permutation distribution, critical regions can
be set and appropriate tests made.
The asymptotic procedure is similar but with variances
recomputed after certain steps in the S(b) function.
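The complete enumeration described above can be sketched in Python for very small n. The statistic follows the form used in subroutine DIST of Appendix A as read from the listing: censoring indicators arranged by rank of z, multiplied by the sign sums formed from the permuted x's. The function names, the dictionary used for the frequency distribution, and the tail-proportion form of the test are assumptions of this sketch.

```python
from itertools import permutations


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def s_statistic(x, delta):
    """Sum over j of delta_j times the sum over i > j of sgn(x_i - x_j).

    x and delta are assumed to be arranged by increasing rank of z.
    """
    n = len(x)
    return sum(delta[j] * sum(sign(x[i] - x[j]) for i in range(j + 1, n))
               for j in range(n - 1))


def permutation_distribution(x, delta):
    """Frequency of each S value over all n! permutations of the x's."""
    counts = {}
    for xp in permutations(x):
        s = s_statistic(xp, delta)
        counts[s] = counts.get(s, 0) + 1
    return counts


def two_sided_reject(s_obs, counts, alpha):
    """Reject when the smaller tail proportion at s_obs is below alpha / 2."""
    total = sum(counts.values())
    lower = sum(c for s, c in counts.items() if s <= s_obs) / total
    upper = sum(c for s, c in counts.items() if s >= s_obs) / total
    return min(lower, upper) < alpha / 2
```

With x = (1, 2, 3) and no censoring the six permutations give S = 3 once, 1 twice, -1 twice and -3 once. Enumeration of this kind grows as n!, which is why the program resorts to random samples of permutations for larger n.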
LISTING FOR APPENDIX A
C     THIS PROGRAM IMPLEMENTS THE NEW METHOD
C     FOR SAMPLE SIZES 4 TO 15.
C     DATA INPUT:
C     SAMPLE SIZE, INTERVAL PERCENTAGE,
C     LOG OPTION (1 LOG10, 0 OTHERWISE) IN FORMAT 3I5.
C     X, Y, DELTA IN FORMAT 2F5.0, I5.
C
      COMMON ISUMX(11000, 15)
      DIMENSION X(15),Z(15),Y(15),IC(15),B(110),IF(220)
      DIMENSION IA(15),ICENS(15)
      INTEGER XP(15),C(110),S(110)
      CALL RSEED(35637)
      READ(5, 10) N, IPC, LOGT
   10 FORMAT(3I5)
      DO 20 I=1, N
   20 READ(5, 30) X(I),Y(I),IC(I)
   30 FORMAT(2F5.0, I5)
      NNN=N*(N-1)
      NN=NNN/2
      XXX=(100.0-IPC)/200.0
C     COMMENT 1  FIXUP FOR EQUAL POINTS
      N1=N-1
      DO 21 I=1, N1
      I1=I+1
      DO 22 J=I1,N
      IF((X(I) .NE. X(J)) .OR. (Y(I) .NE. Y(J)))GO TO 22
      R=RNDMF(1.0)+.1
      IF(IC(I) .EQ. 1)Y(I)=Y(I)-R/10000.0
      IF(IC(I) .EQ. 0)Y(I)=Y(I)+R/10000.0
      GO TO 21
   22 CONTINUE
   21 CONTINUE
      IF(LOGT .EQ. 0)GO TO 2
      DO 3 I=1, N
    3 Y(I)=LOG10(Y(I))
    2 CONTINUE
      DO 31 I=1, N
   31 WRITE(6, 29) X(I),Y(I), IC(I)
   29 FORMAT(2F20.8, I10)
C     COMMENT 2  PERMUTATIONS
      DO 3 I=1,11000
      DO 6 J=1, 15
      ISUMX(I, J)=0
    6 CONTINUE
    3 CONTINUE
      III=0
      IF(N .GE. 10)GO TO 600
  500 M=N
      XP(M)=1
      XP(M-1)=2
      DO 400 K1=1,2
      CALL SWITCH(K1, 3, XP, M)
      XP(M-2)=3
      DO 390 K2=1, 3
      CALL SWITCH(K2, 4, XP, M)
      XP(M-3)=4
      DO 380 K3=1, 4
      CALL SWITCH(K3, 5, XP, M)
      IF(M .NE. 4)GO TO 385
      III=III+1
      CALL XCALC(XP,M, III, X)
      GO TO 380
  385 XP(M-4)=5
      DO 370 K4=1, 5
      CALL SWITCH(K4, 6, XP, M)
      IF(M .NE. 5)GO TO 375

