<%BANNER%>

Bayesian Nonparametric and Semi-Parametric Methods for Incomplete Longitudinal Data

Permanent Link: http://ufdc.ufl.edu/UFE0041942/00001

Material Information

Title: Bayesian Nonparametric and Semi-Parametric Methods for Incomplete Longitudinal Data
Physical Description: 1 online resource (123 p.)
Language: english
Creator: Wang, Chenguang
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2010

Subjects

Subjects / Keywords: bayesian, dropout, elicitation, intermittent, longitudinal, missing, nonparametric, shrinkage
Statistics -- Dissertations, Academic -- UF
Genre: Statistics thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, terriorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: BAYESIAN NONPARAMETRIC AND SEMI-PARAMETRIC METHODS FOR INCOMPLETE LONGITUDINAL DATA We consider inference in randomized longitudinal studies with missing data that is generated by skipped clinic visits and loss to follow-up. In this setting, it is well known that full data estimands are not identified unless unverified assumptions are imposed. Sensitivity analysis that assesses the sensitivity of model-based inferences to such assumptions is often necessary. In this dissertation, we posit an exponential tilt model that links non-identifiable distributions and identifiable distributions. This exponential tilt model is indexed by non-identified parameters, which are assumed to have an informative prior distribution, elicited from subject-matter experts. Under this model, full data estimands are shown to be expressed as functionals of the distribution of the observed data. We propose two different saturated models for the observed data distribution, as well as shrinkage priors to avoid the curse of dimensionality. The two procedures provide researchers different strategies for reducing the dimension of parameter space. We assume a non-future dependence model for the drop-out mechanism and partial ignorability for the intermittent missingness. In a simulation study, we compare our approach to a fully parametric and a fully saturated model for the distribution of the observed data. Our methodology is motivated by, and applied to, data from the Breast Cancer Prevention Trial. In this dissertation, we also discuss pattern mixture models. Pattern mixture modeling is a popular approach for handling incomplete longitudinal data. Such models are not identifiable by construction. Identifying restrictions are one approach to mixture model identification and are a natural starting point for missing not at random sensitivity analysis. However, when the pattern specific models are multivariate normal (MVN), identifying restrictions corresponding to missing at random may not exist. Furthermore, identification strategies can be problematic in models with covariates (e.g. baseline covariates with time-invariant coefficients). In this paper, we explore conditions necessary for identifying restrictions that result in missing at random (MAR) to exist under a multivariate normality assumption and strategies for identifying sensitivity parameters for sensitivity analysis or for a fully Bayesian analysis with informative priors. A longitudinal clinical trial is used for illustration of sensitivity analysis. Problems caused by baseline covariates with time-invariant coefficients are investigated and an alternative identifying restriction based on residuals is proposed as a solution.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Chenguang Wang.
Thesis: Thesis (Ph.D.)--University of Florida, 2010.
Local: Adviser: Daniels, Michael J.
Electronic Access: RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2011-02-28

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2010
System ID: UFE0041942:00001

Permanent Link: http://ufdc.ufl.edu/UFE0041942/00001

Material Information

Title: Bayesian Nonparametric and Semi-Parametric Methods for Incomplete Longitudinal Data
Physical Description: 1 online resource (123 p.)
Language: english
Creator: Wang, Chenguang
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2010

Subjects

Subjects / Keywords: bayesian, dropout, elicitation, intermittent, longitudinal, missing, nonparametric, shrinkage
Statistics -- Dissertations, Academic -- UF
Genre: Statistics thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, terriorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: BAYESIAN NONPARAMETRIC AND SEMI-PARAMETRIC METHODS FOR INCOMPLETE LONGITUDINAL DATA We consider inference in randomized longitudinal studies with missing data that is generated by skipped clinic visits and loss to follow-up. In this setting, it is well known that full data estimands are not identified unless unverified assumptions are imposed. Sensitivity analysis that assesses the sensitivity of model-based inferences to such assumptions is often necessary. In this dissertation, we posit an exponential tilt model that links non-identifiable distributions and identifiable distributions. This exponential tilt model is indexed by non-identified parameters, which are assumed to have an informative prior distribution, elicited from subject-matter experts. Under this model, full data estimands are shown to be expressed as functionals of the distribution of the observed data. We propose two different saturated models for the observed data distribution, as well as shrinkage priors to avoid the curse of dimensionality. The two procedures provide researchers different strategies for reducing the dimension of parameter space. We assume a non-future dependence model for the drop-out mechanism and partial ignorability for the intermittent missingness. In a simulation study, we compare our approach to a fully parametric and a fully saturated model for the distribution of the observed data. Our methodology is motivated by, and applied to, data from the Breast Cancer Prevention Trial. In this dissertation, we also discuss pattern mixture models. Pattern mixture modeling is a popular approach for handling incomplete longitudinal data. Such models are not identifiable by construction. Identifying restrictions are one approach to mixture model identification and are a natural starting point for missing not at random sensitivity analysis. However, when the pattern specific models are multivariate normal (MVN), identifying restrictions corresponding to missing at random may not exist. Furthermore, identification strategies can be problematic in models with covariates (e.g. baseline covariates with time-invariant coefficients). In this paper, we explore conditions necessary for identifying restrictions that result in missing at random (MAR) to exist under a multivariate normality assumption and strategies for identifying sensitivity parameters for sensitivity analysis or for a fully Bayesian analysis with informative priors. A longitudinal clinical trial is used for illustration of sensitivity analysis. Problems caused by baseline covariates with time-invariant coefficients are investigated and an alternative identifying restriction based on residuals is proposed as a solution.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Chenguang Wang.
Thesis: Thesis (Ph.D.)--University of Florida, 2010.
Local: Adviser: Daniels, Michael J.
Electronic Access: RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2011-02-28

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2010
System ID: UFE0041942:00001


This item has the following downloads:


Full Text
xml version 1.0 standalone yes
Volume_Errors
Errors
PageID P134
ErrorID 4
P136
4







BAYESIAN NONPARAMETRIC AND SEMI-PARAMETRIC METHODS
FOR INCOMPLETE LONGITUDINAL DATA


















By
CHENGUANG WANG


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA


2010






























2010 Chenguang Wang
































I dedicate this to my family









ACKNOWLEDGMENTS

First and foremost, I would like to express the deepest appreciation to my advisor,

Professor Michael J. Daniels. Without his extraordinary guidance and persistent help,

I will never be able to be where I am. I admire his wisdom, his knowledge and his

commitment to the highest standard. It has been truly an honor to work with him.

I wish to specially thank Professor Daniel O. Scharfstein of Johns Hopkins for his

encouragement and crucial contribution to the research. I will always bear in mind the

advice he gave: just relax and enjoy the learning process.

I would like to thank my committee members, Professor Malay Ghosh, Dr. Brett

Presnell, and Dr. Almut Winterstein, who have provided abundant support and valuable

insights over the entire process throughout the classes, exams and dissertation.

Many thanks go in particular to Professor Rongling Wu of Pennsylvania State

University. From Professor Wu, I started learning what I wanted for my career. I also

gratefully thank Professor Myron Chang, Professor Linda Young, Dr. Meenakshi Devidas

and Dr. Gregory Campbell of FDA. I am fortunate to have their support at those critical

moments of my career.

Finally, I would like to thank my wife, my son, my soon-to-be-born baby, my parents

and my parents-in-law. It is only because of you that I have been able to keep working

toward this dream I have.









TABLE OF CONTENTS
page

ACKNOWLEDGMENTS ................... ............... 4

LIST O FTABLES ..................... ................. 8

LIST OF FIGURES .................... ................. 9

ABSTRACT .................... ................... .. 10

CHAPTER

1 INTRODUCTION .................... ............... 12

1.1 Missing Data Concepts and Definitions .................. 12
1.2 Likelihood-Based Methods ............................ 16
1.3 Non-Likelihood Methods ................... ........ 19
1.4 Interm ittent M issingness ............................ 20
1.5 Identifying Restrictions in Pattern Mixture Models .... 22
1.6 Dissertation G oals . .. 24

2 A BAYESIAN SHRINKAGE MODEL FOR LONGITUDINAL BINARY DATA
W ITH DRO P-O UT ................... ............... 26

2.1 Introduction . .. 26
2.1.1 Breast Cancer Prevention Trial .. .. 26
2.1.2 Informative Drop-Out in Longitudinal Studies ... 27
2.1.3 Outline. .... .. 30
2.2 Data Structure and Notation ..... .. ..... 30
2.3 Assum options . . .. 31
2.4 Identifiability .................... .............. 32
2.5 Modeling ..................... .... .......... 35
2.6 Prior Specification and Posterior Computation ..... 35
2.6.1 Shrinkage Priors . 36
2.6.2 Prior of Sensitivity Parameters .. .. 37
2.6.3 Posterior Computation ..... ..... 40
2.7 Assessment of Model Performance via Simulation .... 40
2.8 Application: Breast Cancer Prevention Trial (BCPT) .... 42
2.8.1 Model Fit and Shrinkage Results ... 42
2.8.2 Inference .. . .. 43
2.9 Summary and Discussion ........................... 43
2.10 Acknowledgments ................... ............ 45
2.11 Tables and Figures . .. 45









3 A BETA-BINOMIAL BAYESIAN SHRINKAGE MODEL FOR INTERMITTENT
MISSINGNESS LONGITUDINAL BINARY DATA .................. 54

3.1 Introduction .................... ............... 54
3.1.1 Interm ittent Missing Data ................ ....... 54
3.1.2 Com putational Issues ......................... 55
3 .1.3 O utline . . 55
3.2 Notation, Assumptions and Identifiability ..... 56
3.3 Modeling, Prior Specification and Posterior Computation ... 58
3.3.1 M odeling . . 58
3.3.2 Shrinkage Prior . 58
3.3.3 Prior of Sensitivity Parameters .. .. 60
3.3.4 Posterior Computation ......................... 61
3.4 Assessment of Model Performance via Simulation .... 61
3.5 Application: Breast Cancer Prevention Trial (BCPT) .... 62
3.5.1 Model Fit ... ............... ........ .. .. 62
3.5.2 Inference ... ............... ........ .. .. 63
3.5.3 Sensitivity of Inference to the Priors ... 64
3.6 Summary and Discussion ................. ........ 65
3.7 Tables and Figures ............................ 65
3.8 Appendix ................ ................. 73

4 A NOTE ON MAR, IDENTIFYING RESTRICTIONS, AND SENSITIVITY ANALYSIS
IN PATTERN MIXTURE MODELS ................ ......... 77

4.1 Introduction ... . .... 77
4.2 Existence of MAR under Multivariate Normality within Pattern ....... 79
4.3 Sequential Model Specification and Sensitivity Analysis under MAR .. 83
4.4 Non-Future Dependence and Sensitivity Analysis under Multivariate Normality
w within Pattern . .. 85
4.5 MAR and Sensitivity Analysis with Multivariate Normality on the Observed-Data
R response . . .. 87
4.6 Example: Growth Hormone Study .. .. 89
4.7 ACMV Restrictions and Multivariate Normality with Baseline Covariates .91
4.7.1 Bivariate Case .............. ....... ....... 91
4.7.2 Multivariate Case ........... ............. 93
4.8 Sum m ary . . 95
4.9 Tables . .. 96
4.10 Appendix ..... ..... .. 98

5 DISCUSSION: FUTURE APPLICATION OF THE BAYESIAN NONPARAMETRIC
AND SEMI-PARAMETRIC METHODS ..... ..... 103

5.1 S um m ary . . 103
5.2 Extensions to Genetics Mapping ..... ... ... 103
5.3 Extensions to Causal Inference ..... .... ... 105









5.3.1 Causal Inference Introduction . ... 105
5.3.2 Data and Notation ........................... 107
5.3.3 Missing Data Mechanism ..... ...... 108
5.3.4 Causal Inference Assumption .. ... 109
5.3.5 Stochastic Survival Monotonicity Assumption .... 111
5.3.6 Summary of Causal Inference . ... 112
5.4 Figures . . .. 112

REFERENCES ................ ................... .. 115

BIOGRAPHICAL SKETCH ............. ..... .............. 123









LIST OF TABLES


Table page

2-1 Relative Risks to be Elicited ........... ............... 45

2-2 Percentiles of Relative Risks Elicited ..... ..... 45

2-3 Sim ulation Scenario . .. 46

2-4 Simulation Results: MSE (x103). P and T represent placebo and tamoxifen
arm s, respectively. . .. 47

2-5 Patients Cumulative Drop Out Rate ..... .. ...... 47

3-1 Missingness by Scheduled Measurement Time ... 65

3-2 Simulation Results: MSE (x103). P and T represent placebo and tamoxifen
arm s, respectively. . .. 66

3-3 Sensitivity to the Elicited Prior ... 67

3-4 Sensitivity to the Elicited Prior ... 68

4-1 Growth Hormone Study: Sample mean (standard deviation) stratified by dropout
pattern. ........... .... .................... ... ... 96

4-2 Growth Hormone Study: Posterior mean (standard deviation) ... 97









LIST OF FIGURES


Figure page

2-1 Extrapolation of the elicited relative risks. .. 48

2-2 Prior Density of p .. .. 49

2-3 M odel Fit . . .. 50

2-4 Model Shrinkage ... . 51

2-5 Posterior distribution of P[Y7 = 1|Z = z]. Black and gray lines represent
tamoxifen and placebo arms, respectively. Solid and dashed lines are for MNAR
and MAR, respectively. ............................... 52

2-6 Posterior mean and 95% credible interval of difference of P[Yj = -1Z = z]
between placebo and tamoxifen arms. The gray and white boxes are for MAR
and MNAR, respectively.................... ............ 53

3-1 M odel Fit . . .. 69

3-2 S hrinkage . . 70

3-3 Posterior distribution of P[Y7 = 1|Z = z]. Black and gray lines represent
tamoxifen and placebo arms, respectively. Solid and dashed lines are for MNAR
and M AR, respectively. .. .. .. .. .. .. .. 71

3-4 Posterior mean and 95% credible interval of difference of P[Y, = -1Z = z]
between placebo and tamoxifen arms. The gray and white boxes are for MAR
and MNAR, respectively.................... ............ 72

5-1 Contour and Perspective Plots of a Bivariate Density ... 113

5-2 Illustration of pc,t . ...... 114









Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

BAYESIAN NONPARAMETRIC AND SEMI-PARAMETRIC METHODS
FOR INCOMPLETE LONGITUDINAL DATA

By

Chenguang Wang

August 2010

Chair: Michael J. Daniels
Major: Statistics

We consider inference in randomized longitudinal studies with missing data that is

generated by skipped clinic visits and loss to follow-up. In this setting, it is well known

that full data estimands are not identified unless unverified assumptions are imposed.

Sensitivity analysis that assesses the sensitivity of model-based inferences to such

assumptions is often necessary.

In Chapters 2 and 3, we posit an exponential tilt model that links non-identifiable

distributions and identifiable distributions. This exponential tilt model is indexed by

non-identified parameters, which are assumed to have an informative prior distribution,

elicited from subject-matter experts. Under this model, full data estimands are shown

to be expressed as functionals of the distribution of the observed data. We propose

two different saturated models for the observed data distribution, as well as shrinkage

priors to avoid the curse of dimensionality. The two procedures provide researchers

different strategies for reducing the dimension of parameter space. We assume a

non-future dependence model for the drop-out mechanism and partial ignorability for

the intermittent missingness. In a simulation study, we compare our approach to a fully

parametric and a fully saturated model for the distribution of the observed data. Our

methodology is motivated by, and applied to, data from the Breast Cancer Prevention

Trial.









In Chapter 4, we discuss pattern mixture models. Pattern mixture modeling is

a popular approach for handling incomplete longitudinal data. Such models are not

identifiable by construction. Identifying restrictions are one approach to mixture model

identification (Daniels and Hogan, 2008; Kenward et al., 2003; Little, 1995; Little and

Wang, 1996; Thijs et al., 2002) and are a natural starting point for missing not at random

sensitivity analysis (Daniels and Hogan, 2008; Thijs et al., 2002). However, when

the pattern specific models are multivariate normal (MVN), identifying restrictions

corresponding to missing at random may not exist. Furthermore, identification

strategies can be problematic in models with covariates (e.g. baseline covariates with

time-invariant coefficients). In this paper, we explore conditions necessary for identifying

restrictions that result in missing at random (MAR) to exist under a multivariate normality

assumption and strategies for identifying sensitivity parameters for sensitivity analysis

or for a fully Bayesian analysis with informative priors. A longitudinal clinical trial is

used for illustration of sensitivity analysis. Problems caused by baseline covariates

with time-invariant coefficients are investigated and an alternative identifying restriction

based on residuals is proposed as a solution.









CHAPTER 1
INTRODUCTION

The problem of incomplete data is frequently confronted by statisticians, especially

in longitudinal studies. The most common type of incomplete data is missing data,

in which each data value is either perfectly known or completely unknown. In other

situations, data are partially missing and partially observed. Examples include rounded

data and censored data, etc.. This type of incomplete data is referred to as coarse data.

Missing data can be viewed as a special case of coarse data (Heitjan and Rubin, 1991).

In both cases, the incompleteness occurs because we observe only a subset of the

complete data, which includes the true, unobservable data. In this dissertation, missing

data including the drop-out missingness, in which case subjects missing a measurement

will not return to study at the next follow-up, and the intermittent missingness, in which

case the missing visit is followed by an observed measurement.

1.1 Missing Data Concepts and Definitions

If the missingness does not happen "at random" and the missingness process is

ignored, biased inferences will often occur. Until the 1970s, most of the methods for

handling missing values in the statistics literature ignored the missingness mechanism

by deleting the incomplete units. Complete-case analysis, also known as case deletion,

confines the analysis to cases that have all variables observed. Available-case analysis,

also known as partial deletion, uses all values observed for univariate analysis.

Both approaches are only valid under the strong assumption that the missingness is

completely at random (MCAR), i.e. the missingness is independent of the response. In

situations when MCAR doesn't hold, it is possible to adjust the selection bias caused by

case deletion by reweighting the remaining cases (Little and Rubin, 1987, chapter 4);

however, this method is inefficient. Another common approach is single imputation, that

is, filling in a single value for each missing value. The advantage of single imputation is

that it does not delete any units and after the imputation, standard methods for complete









data can be applied on the filled-in data. However, single imputation does not reflect the
uncertainty of the missing value. Multiple imputation was proposed to address this flaw
by imputing several values for each missing response(Rubin, 1987).
For notation, let y = {yi, ... y} denote the full data response vector of outcome,
possibly partially observed. Let r = {ri, r2,..., rj} denote the missing data indicator,
with rj = 0 if yj is missing and 1 if yj is observed. Let x denote the covariates. Let yobs
and ymis denote the observed and missing response data, respectively. Let w be the
parameters indexing the full data model p(y, r), 0(w) be the parameters indexing the
full data response model p(y), and O(w) be the parameters indexing the missing data
mechanism model p(r|y).
The common assumptions about the missing data mechanism are as follows.
Little and Rubin's taxonomy: Rubin (1976) and Little and Rubin (1987) developed
a hierarchy for missing data mechanisms by classifying the relationship between
missingness and the response data.
Definition 1.1. Missing responses are missing completely at random (MCAR) if

P(rlYobs, Ymis, x; O(W)) = p(rlx; O( ))

for all x and w.

Definition 1.2. Missing responses are missing at random (MAR) if

P(rlYobs, Ymis, x; O(W)) = p(rlYobs, X; O())

for all x and w.

Note that MAR holds if and only if P(Ymis Yobs, r) = P(YmislYobs). The proof is as
follows:
Proof:

1. Suppose MAR holds. Then we have

p(rlymis, Yobs) = p(rlYobs)









and we can derive that


P(Ymis, rlYobs) p(rlYmis, Yobs)P(Ymis Yobs)
P(Ymis Yobs, ) --) -
p(rlYobs) p(rlYobs)
p(rlYobs)P(Ymis Yobs)
p(rYobs)P(YMI ) = P(YmislYobs, r).
P(r Yobs)
2. To show the reverse direction, note that

p(rm p(r, Ymis Yobs) P(Ymis r, Yobs)P(r Yobs)
P(r Ymis. Yobs) -- ----
P(Ymis Yobs) P(Ymis Yobs)
P(Ymis Yobs)P(r Yobs) p(r
= P(rlYobs)
P(Ymis IYobs)
This completes the proof.
D

Definition 1.3. Missing responses are missing not at random (MNAR) if

P(rlYobs, Ymis, x; O(W)) $ p(rlYobs, y'mis, X; (W))

for some Ymis 7 yis.

Ignorability: Under certain condition, the missingness process can be left
unspecified for the inference on the response model parameter 0(w) (Laird, 1988).
This condition is called ignorability (Rubin, 1976).

Definition 1.4. The missing data mechanism is ignorable if

1. The missing data mechanism is MAR.

2. The parameters of the full data response model, 0(w) and the parameters of
the missingness model are distinguishable, i.e. the full data parameter w can be
decomposed as (0(w), O(w)).

3. The parameters 0(w) and O(w) are a priori independent, i.e. p(0(w), O(w)) =
p(( ))u p( ( ) ).

Full data models that do not satisfy Definition 1.4 have non-ignorable missingness.








Under ignorability, posterior inference on parameters 0(w) can be based on the
observed data response likelihood
n
L(0(w) |obs) c / Pi(Yobs, Ymis 1(w))dymis.
i-1
We show this below,

L(w Yobs, r) = L(0(w), (w) |obs, r)

= p(yobs, Ymis, r|(w), (w))dymis

Sp(ryobs, O())P(Yobs, Ymis 0(w))dymis

= p(r|yobs, (w)) J P(Yobs, Ymis (w))dymis

= p(rlYobs, (W))P(Yobsl (w))

= L(0(w)lr, Yobs)L(0(W ) yobs),

and furthermore,

P( lYobs, r) oc p(w)L(w Yobs, r) = p(O(w))L(/(w)lr, yobs)p(O(w))L(O(w) Yobs).

Therefore,
P(0(w)lyobs, r) oc p(0(w))L(0(w)lyobs)

and the posterior inference of 0(w) can be based on observed response likelihood
L(0(w) Yobs).
Non-future Dependence: For cases with monotone missingness, i.e. rj = 0 implies
rj, = 0 forj' > j, Kenward et al. (2003) defined the term non-future dependence.
Definition 1.5. If the missingness is monotone, the MDM is non-future dependent if

P(rlYobs, Ymis, x; O(W)) = p(rlYobs, Yc, x; (W))

with C = minj{rj = 0}.









Non-future dependence assumes that missingness only depends on observed

data and the current missing value. It can be viewed as a special case of MNAR and an

extension of MAR

1.2 Likelihood-Based Methods

Likelihood based methods handle the missing values by integrating them out of the

likelihood function, instead of deletion or explicitly filling in values. The general strategy

is to model the joint distribution of a response and the missingness process (Hogan and

Laird, 1997b). Likelihood-based models for missing data are distinguished by the way

the joint distribution of the outcome and missing data processes are factorized. They

can be classified as selection models, pattern-mixture models, and shared-parameter

models.

Selection model: Selection models factor the full-data distribution as


p(y, r|w) = p(r y, O(w))p(y|0(w)).

The term "selection" was first introduced in the economics literature for modeling sample

selection bias; that is different responses have different probabilities of being selected

into a sample. Heckman (1979a,b) used a bivariate response Y with missing Y2 as an

example and showed that in general it's critical to answer the question "why are the data

missing" by modeling the missingness of Y2i as a function of observed Yi; (for subject

i). Diggle and Kenward (1994) extended the Heckman model to longitudinal studies and

modeled the drop-out process by logistic regression such as

logit(rj = 0lrj_1 = 1, y) = y'P.

The Diggle and Kenward model has been adopted and extended by many

researchers by (mostly) proposing different full data response models (Albert, 2000;

Baker, 1995; Fitzmaurice et al., 1995; Heagerty, 2002; Kurland and Heagerty, 2004).









Pattern mixture model: Pattern mixture models factor the full-data distribution as

p(y, rlw) = p(y|r, O(w))p(rl|(w)).

Rubin (1977) introduced the idea of modeling respondents and nonrespondents in

surveys separately and using subjective priors to relate respondents' and nonrespondents'

model parameters. Little (1993, 1994) explored pattern mixture models in discrete time

settings. Specifically, different identifying restrictions (see Section 1.5) were proposed

to identify the full-data model. When the number of dropout patterns is large and

pattern-specific parameters will be weakly identified by identifying restrictions, Roy

(2003) and Roy and Daniels (2008) proposed to use latent-class model for dropout

classes. When the dropout time is continuous and the mixture of patterns is infinite,

Hogan et al. (2004) proposed to model the response given dropout by a varying

coefficient model where regression coefficients were unspecified, non-parametric

functions of dropout time. For time-event data with informative censoring, Wu and

Bailey (1988, 1989) and Hogan and Laird (1997a) developed random effects mixture

models. Fitzmaurice and Laird (2000a) generalized Wu and Bailey and Hogan and

Laird approach for discrete, ordinal and count data by using generalized linear mixture

models and GEE approach for statistical inference. Daniels and Hogan (2000) proposed

a parameterization of the pattern mixture model for continuous data. Sensitivity analysis

can be done on the additive (location) and multiplicative (scale) terms. Forster and

Smith (1998) considered a pattern mixture model for a single categorical response

with categorical covariates. Bayesian approaches were employed for non-ignorable

missingness.

Shared parameter model: Shared parameter models factorize the full-data model

as

p(y, r w) = p(y, r, b, w)p(b w)db,
J









where b are subject-specific random effects. It is usually assumed that y and r are

independent conditionally on b.

Wu and Carroll (1988) presented a shared parameter random effects model for

continuous responses and informative censoring, in which individual effects are taken

into account as intercepts and slopes for modeling the censoring process. DeGruttola

and Tu (1994) extended Wu and Carroll's model to allow general covariates. Follmann

and Wu (1995) developed generalized linear model for response and proposed an

approximation algorithm for the joint full-data model for inference. Faucett and Thomas

(1996) and Wulfsohn and Tsiatis (1997) proposed to jointly model the continuous

covariate over time and relate the covariates to the response simultaneously. Henderson

et al. (2000) generalized the joint modeling approach by using two correlated Gaussian

random processes for covariates and response. Ten Have et al. (1998, 2000) proposed

a shared parameter mixed effects logistic regression model for longitudinal ordinal

data. Recently, Yuan and Little (2009) proposed a mixed-effect hybrid model allows the

missingness and response to be conditionally dependent given random effects.

Sensitivity analysis for missing not at random: Sensitivity analysis is critical in

longitudinal analysis of incomplete data. The full-data model can be factored into an

extrapolation model and an observed data model,


p(y, rlw) = P(Ymis Yobs, r, WE)P(Yobs, rlw/),

where WE are parameters indexing the extrapolation model and wi are parameters

indexing the observed data model and are identifiable from observed data (Daniels and

Hogan, 2008). Full-data model inference requires unverifiable assumptions about the

extrapolation model p(ymis Yobs' r, WE). A sensitivity analysis explores the sensitivity of

inferences of interest about the full data response model to unverifiable assumptions

about the extrapolation model. This is typically done by varying sensitivity parameters,

which we define next (Daniels and Hogan, 2008).









Definition 1.6. Let p(y, r w) be a full data model with extrapolation factorization


p(y, r1w) = P(Ymis Yobs, r,W E)P(Yobs, rl|/).

Suppose there exists a reparameterization ((w) = (4s, m) such that

1. (s is a non-constant function of wE,

2. The observed likelihood L((s, M Ylobs, r) is a constant as a function of 5s,

3. Given s fixed, L(s, M Ylobs, r) is a non-constant function of 5m

then s is a sensitivity parameter.

Unfortunately, fully parametric selection models and shared parameter models do

not allow sensitivity analysis as sensitivity parameters cannot be found (Daniels and

Hogan, 2008, Chapter 8). Examining sensitivity to distributional assumptions, e.g.,

random effects, will provide different fits to the observed data, (Yobs, r). In such cases, a

sensitivity analysis cannot be done since varying the distributional assumptions does not

provide equivalent fits to the observed data (Daniels and Hogan, 2008). It then becomes

an exercise in model selection.

Fully Bayesian analysis allows researchers to have a single conclusion by admitting

prior beliefs about the sensitivity parameters. For continuous responses, Lee and

Berger (2001) built a semiparametric Bayesian selection model which has strong

distributional assumption for the response but weak assumption on missing data

mechanism. Scharfstein et al. (2003) on the other hand, placed strong parametric

assumptions on missing data mechanism but minimal assumptions on the response

outcome.

1.3 Non-Likelihood Methods

In non-likelihood approaches, the joint distribution of the outcomes is typically

modeled semiparametrically and estimating equations are used for inference. Liang

and Zeger (1986) proposed generalized estimating equations (GEE) whose solution









is consistent if the marginal mean of response is correctly specified. However,
inference based on GEE is only valid under MCAR. Robins et al. (1995) proposed

inverse-probability of censoring weighted generalized estimating equations (IPCW-GEE)

approach, which reweights each individual's contribution to the usual GEE by the

estimated probability of drop-out. IPCW-GEE will lead to consistent estimation when

the missingness is MAR. However, both GEE and IPCW-GEE can result in biased

estimation under MNAR.

Rotnitzky et al. (1998a, 2001), Scharfstein et al. (2003) and Schulman et al. (1999)

adopted semiparametric selection modeling approaches, in which the model for drop-out

is indexed by interpretable sensitivity parameters that express departures from MAR.

For such approaches, the inference results depend on the choice of unidentified, yet

interpretable, sensitivity analysis parameters.

1.4 Intermittent Missingness

Intermittent missingness occurs when a missing value is followed by an observed

value. The existence of intermittent missing values increases exponentially the number

of missing patterns that need to be properly modeled. Thus, handling informative

intermittent missing data is methodologically and computationally challenging and, as a

result, the statistics literature is limited.

One approach to handle intermittent missingness is to consider a "monotonized"

dataset, whereby all observed values on an individual after their first missingness are

deleted Land et al. (2002). However, this increases the "dropout" rate, loses efficiency,

and may introduce bias.

Other methods in the literature often adopt a likelihood approach and rely on strong

parametric assumptions. For example, Troxel et al. (1998), Albert (2000) and Ibrahim

et al. (2001) suggested a selection model approach. Albert et al. (2002) used a shared

latent autoregressive process model. Lin et al. (2004) employed latent class pattern

mixture model.









Semiparametric methods have been proposed by Troxel et al. (1998) and Vansteelandt

et al. (2007). Troxel et al. (1998) proposed a marginal model and introduced a

pseudo-likelihood estimation procedure. Vansteelandt et al. (2007) extended the

ideas of Rotnitzky et al. (1998b), Scharfstein et al. (1999) and Rotnitzky et al. (2001) to

non-monotone missing data that assume (exponentially tilted) extensions of sequential

explainability and specified parametric models for certain conditional means.

Most related to the approach we will use in Chapter 3 are the (partial ignorability)

assumptions formalized in Harel and Schafer (2009) that partition the missing data

and allow one (or more) of the partitions to be ignored given the other partition(s)

and the observed data. Specifically, Harel and Schafer (2009) defined a missing data

mechanism to be partially missing at random if


P(rlYobs, Ymis, g(r), x; O( )) = p(rlyobs, g(r), x; (w ))

where g(-) denotes a summary function of r and can be chosen based on what aspects

of r are related to missing values. These assumptions are similar to the sequential

explainability assumption reviewed in Vansteelandt et al. (2007).

In this dissertation, we explicitly partition the missing data indicator vector r into

{rs, s}, where s = maxt{rt = 1} denotes the last time point a response was observed,
i.e. the "survival" time, and rs = {rt: t < s} denotes the missing data indicators recorded

prior to the drop-out time. With this partition, we define partial missing at random as

follows:

Definition 1.7. Missing responses are partially missing at random if

P(rs Yobs Ymis, S,X; (W)) = p(rs Yobs, S,X; (W)).

This can be viewed as Harel and Schafer's definition with g(r) chosen to be the survival

time.









In the following Chapters, we first propose a model that admits identification of the

treatment-specific distributions of the trajectory of longitudinal binary outcomes when

there is drop-out, but no intermittent observations. We then obtain identification with

intermittent missing observations by assuming, that within drop-out and treatment strata,

the intermittent missing responses are missing at random. This is the partial ignorability

assumption in Definition 1.7.

1.5 Identifying Restrictions in Pattern Mixture Models

Pattern-mixture models by construction are not identified: the observed data does

not provide enough information to identify the distributions for incomplete patterns (Little,

1993, 1994). Additional assumptions about the missing data process are necessary in

order to yield identifying restrictions that equate the inestimable parameters to functions

of estimable parameters and identify the full-data model.

For example, consider the situation when y = (yl, y2) is a bivariate normal response

with missing data only in y2. Let s be the survival time, i.e. s = 1 if y2 is missing and s =

2 if y2 is observed. We model p(s) and p(yls) as s ~ Bern(O) and yls = i ~ N(piS), (s))

for i = 1, 2, with

(s) = and (s) J= 1 .
(s) (s) (s)
2 12 22

For s = 1, only yl is observed. Therefore, parameters p1), oa) (equals o,)) and a-) are

not identified. By assuming


y2 ly, s = 1 y y2ly1, s = 2,









which is the available case missing value (ACMV) restriction defined later in this section,

we have

(1) (2)
~ 1 21) (1) (2) .21 (2)
/2 + () )(Y1 P ) 2 ( (2)Y1 1
11 11
1 (1))2 ( (2))2
'(1) (2) ( 21l
22 (1) 22 (2)
11 11
(1) (2)
U21 21
(1) (2)'
11 11

by which all the unidentified parameters are identified.

Understanding (identifying) restrictions that lead to MAR is an important first step

for sensitivity analysis under missing not at random (MNAR) (Daniels and Hogan, 2008;

Scharfstein et al., 2003; Zhang and Heitjan, 2006). In particular, MAR provides a good

starting point for sensitivity analysis and sensitivity analysis are essential for the analysis

of incomplete data (Daniels and Hogan, 2008; Scharfstein et al., 1999).

Little (1993) developed several common identifying restrictions. For example,

complete case missing value (CCMV) restrictions which equate all missing patterns to

the complete cases, i.e.

Pk(yj lY-i) = PJ(YjlYj-1);

equating parameters to a set of patterns, that is set parameters in pattern k, namely 0(k),

equal to the set of patterns S

0(k) ZrjOG);
jES
or equating pattern distributions to a mixture of a set of patterns 5, i.e.


PkQ)= jPj(.)
jEs

Some special case of the pattern-set mixture models restrictions include nearest-neighbor

constraints:

Pk(Yjj-1) = pj( ylj-1),









and available case missing value (ACMV) constraints:


Pk(jYIYj-i) = P(yj yj-1)

Molenberghs et al. (1998) proved that for discrete time points and monotone missingness,

the ACMV constraint is equivalent to missing at random (MAR).

Thijs et al. (2002) developed strategies to apply identifying restrictions. That is first

fit Pk(Yk), then choose an identifying restriction to identify the missing patterns. Multiple
imputation can be applied by drawing unobserved components from the identified

missing patterns. Kenward et al. (2003) discussed identifying restrictions corresponding

to missing non-future dependence.

1.6 Dissertation Goals

There will be two major components to this dissertation. First, we will develop a

Bayesian semiparametric model for longitudinal binary responses with non-ignorable

missingness, including drop-out and intermittent missingness. Second, we will carefully

explore identifying restrictions for pattern mixture models.

Bayesian shrinkage model: We propose two different parameterizations of

saturated models for the observed data distribution, as well as corresponding shrinkage

priors to avoid the curse of dimensionality. The two procedures provide researchers

different strategies for reducing the dimension of parameter space. We assume a

non-future dependence model for the drop-out mechanism and partial ignorability for

the intermittent missingness. In a simulation study, we compare our approach to a fully

parametric and a fully saturated model for the distribution of the observed data. Our

methodology is motivated by, and applied to, data from the Breast Cancer Prevention

Trial.

Identifying restrictions and sensitivity analysis in pattern mixture models:

The normality of response data (if appropriate) for pattern mixture models is desirable

as it easily allows incorporation of baseline covariates and introduction of sensitivity









parameters (for MNAR analysis) that have convenient interpretations as deviations of

means and variances from MAR (Daniels and Hogan, 2008). However, multivariate

normality within patterns can be overly restrictive when applying identifying restrictions.

We explore such issues in Chapter 4.

Furthermore, identification strategies can be problematic in models with covariates

(e.g. baseline covariates with time-invariant coefficients). In this Chapter, we also

explore conditions necessary for identifying restrictions that result in missing at random

(MAR) to exist under a multivariate normality assumption and strategies for sensitivity

analysis. Problems caused by baseline covariates with time-invariant coefficients are

investigated and an alternative identifying restriction based on residuals is proposed as

a solution.









CHAPTER 2
A BAYESIAN SHRINKAGE MODEL FOR LONGITUDINAL BINARY DATA WITH
DROP-OUT
2.1 Introduction
2.1.1 Breast Cancer Prevention Trial

The Breast Cancer Prevention Trial (BCPT) was a large multi-center, double-blinded,

placebo-controlled, chemoprevention trial of the National Surgical Adjuvant Breast

and Bowel Project (NSABP) designed to test the efficacy of 20mg/day tamoxifen in

preventing breast cancer and coronary heart disease in healthy women at risk for

breast cancer (Fisher et al., 1998). The study was open to accrual from June 1, 1992

through September 30, 1997 and 13,338 women aged 35 or older were enrolled in the

study during this interval. The primary objective was to determine whether long-term

tamoxifen therapy is effective in preventing the occurrence of invasive breast cancer.

Secondary objectives included quality of life (QOL) assessments to evaluate benefit as

well as risk resulting from the use of tamoxifen.

Monitoring QOL was of particular importance for this trial since the participants

were healthy women and there had been concerns voiced by researchers about the

association between clinical depression and tamoxifen use. Accordingly, data on

depression symptoms was scheduled to be collected at baseline prior to randomization,

at 3 months, at 6 months and every 6 months thereafter for up to 5 years. The

primary instrument used to monitor depressive symptoms over time was the Center

for Epidemiologic Studies Depression Scale (CES-D)(Radloff, 1977). This self-test

questionnaire is composed of 20 items, each of which is scored on a scale of 0-3. A

score of 16 or higher is considered as a likely case of clinical depression.

The trial was unblinded on March 31, 1998, after an interim analysis showed a

dramatic reduction in the incidence of breast cancer in the treatment arm. Due to

the potential loss of the control arm, we focus on QOL data collected on the 10,982

participants who were enrolled during the first two years of accrual and had their CES-D









score recorded at baseline. All women in this cohort had the potential for three years of

follow-up (before the unblinding).

In the BCPT, the clinical centers were not required to collect QOL data on women

after they stopped their assigned therapy. This design feature aggravated the problem of

missing QOL data in the trial. As reported in Land et al. (2002), more than 30% of the

CES-D scores were missing at the 36-month follow-up, with a slightly higher percentage

in the tamoxifen group. They also showed that women with higher baseline CES-D

scores had higher rates of missing data at each follow-up visit and the mean observed

CES-D scores preceding a missing measurement were higher than those preceding

an observed measurement; there was no evidence that these relationships differed by

treatment group.

While these results suggest that the missing data process is associated with

observed QOL outcomes, one cannot rule out the possibility that the process is further

related to unobserved outcomes and that this relationship is modified by treatment. In

particular, investigators were concerned (a priori) that, between assessments, tamoxifen

might be causing depression in some individuals, who then do not return for their

next assessment. If this occurs, the data are said be missing not at random (MNAR);

otherwise the data are said to be missing at random (MAR).

2.1.2 Informative Drop-Out in Longitudinal Studies

In this paper, we will concern ourselves with inference in longitudinal studies,

where individuals who miss visits do not return for subsequent visits (i.e., drop-out).

In such a setting, MNAR is often referred to as informative drop-out. While there were

some intermittent responses in the BCPT, we will, as in Land et al. (2002), consider a

"monotonized" dataset, whereby all CES-D scores observed on an individual after their

first missing score have been deleted (this increases the "dropout" rate).

There are two main inferential paradigms for analyzing longitudinal studies with

informative drop-out: likelihood parametricc) and non-likelihood (semi-parametric).









Articles by Little (1995), Hogan and Laird (1997b) and Kenward and Molenberghs
(1999) as well as recent books by Molenberghs and Kenward (2007) and Daniels

and Hogan (2008) provide a comprehensive review of likelihood-based approaches,

including selection models, pattern-mixture models, and shared-parameter models.

These models differ in the way the joint distribution of the outcome and missing data

processes are factorized. In selection models, one specifies a model for the marginal

distribution of the outcome process and a model for the conditional distribution of the

drop-out process given the outcome process (see, for example, Albert, 2000; Baker,

1995; Diggle and Kenward, 1994; Fitzmaurice et al., 1995; Heckman, 1979a; Liu

et al., 1999; Molenberghs et al., 1997); in pattern-mixture models, one specifies a

model for the conditional distribution of the outcome process given the drop-out time

and the marginal distribution of the drop-out time (see, for example, Birmingham and

Fitzmaurice, 2002; Daniels and Hogan, 2000; Fitzmaurice and Laird, 2000b; Hogan and

Laird, 1997a; Little, 1993, 1994, 1995; Pauler et al., 2003; Roy, 2003; Roy and Daniels,

2008; Thijs et al., 2002); and in shared-parameter models, the outcome and drop-out

processes are assumed to be conditionally independent given shared random effects

(see, for example, DeGruttola and Tu, 1994; Land et al., 2002; Pulkstenis et al., 1998;

Ten Have et al., 1998, 2000; Wu and Carroll, 1988; Yuan and Little, 2009). Traditionally,

these models have relied on very strong distributional assumptions in order to obtain

model identifiability.

Without these strong distributional assumptions, specific parameters from these

models would not be identified from the distribution of the observed data. To address

this issue within a likelihood-based framework, several authors (Baker et al., 1992;

Daniels and Hogan, 2008; Kurland and Heagerty, 2004; Little, 1994; Little and Rubin,

1999; Nordheim, 1984) have promoted the use of global sensitivity analysis, whereby

non- or weakly- identified, interpretable parameters are fixed and then varied to evaluate









the robustness of the inferences. Scientific experts can be employed to constrain the

range of these parameters.

Non-likelihood approaches to informative drop-out in longitudinal studies have

been primarily developed from a selection modeling perspective. Here, the marginal

distribution of the outcome process is modeled non- or semi-parametrically and the

conditional distribution of the drop-out process given the outcome process is modeled

semi- or fully- parametrically. In the case where the drop-out process is assumed to

depend only on observable outcomes (i.e., MAR), Robins et al. (1994, 1995), van der

Laan and Robins (2003) and Tsiatis (2006) developed inverse-weighted and augmented

inverse-weighted estimating equations for inference. For informative drop-out, Rotnitzky

et al. (1998a), Scharfstein et al. (1999) and Rotnitzky et al. (2001) introduced a class of

selection models, in which the model for drop-out is indexed by interpretable sensitivity

parameters that express departures from MAR. Inference using inverse-weighted

estimating equations was proposed.

The problem with the aforementioned sensitivity analysis approaches is that

the ultimate inferences can be cumbersome to display. Vansteelandt et al. (2006a)

developed a method for reporting ignorance and uncertainty intervals (regions) that

contain the true parameters) of interest with a prescribed level of precision, when the

true data generating model is assumed to fall within a plausible class of models (as

an example, see Scharfstein et al., 2004). An alternative and very natural strategy is

specify an informative prior distribution on the non- or weakly- identified parameters

and conduct a fully Bayesian analysis, whereby the ultimate inferences are reported in

terms of posterior distributions. In the cross-sectional setting with a continuous outcome,

Scharfstein et al. (2003) adopted this approach from a semi-parametric selection

modeling perspective. Kaciroti et al. (2009) proposed a parametric pattern-mixture

model for cross-sectional, clustered binary outcomes Lee et al. (2008) introduced

a fully-parametric pattern-mixture approach in the longitudinal setting with binary









outcomes. In this paper, we consider the same setting as Lee et al. (2008), but offer

a more flexible strategy. In the context of BCPT, the longitudinal outcome will be the

indicator that the CES-D score is 16 or higher.

2.1.3 Outline

The paper is organized as follows. In Section 2.2, we describe the data structure. In

Section 2.3 and 2.4, we formalize identification assumptions and prove that the full-data

distribution is identified under these assumptions. We introduce a saturated model for

the distribution of the observed data in Section 2.5. In Section 2.6, we illustrate how

to apply shrinkage priors to high-order interaction parameters in the saturated model

to reduce the dimensionality of the parameter space and how to elicit (conditional)

informative priors for non-identified sensitivity parameters from experts. In Section 2.7,

we assess, by simulation, the behavior of three classes of models for the distribution of

observed data; parametric, saturated, and shrinkage. Our analysis of the BCPT trial is

presented in Section 2.8. Section 2.9 is devoted to a summary and discussion.

2.2 Data Structure and Notation

Let Z denote the treatment assignment indicator, where Z = 1 denotes tamoxifen

and Z = 0 denotes placebo. Let Yj denote the binary outcome (i.e., depression)

scheduled to be measured at thejth visit ( = 0(baseline),..., J) and let Yy = (Yo,.... Yj)

denote the history of the outcome process through visit j. Let Rj denote the indicator

that an individual has her depression status recorded at visit j. We assume that Ro = 1

(i.e., Yo is always observed) and Rj = 0 implies that Rj i = 0 (i.e., monotone missing

data). Let C = max{t : Rt = 1} be the last visit at which an individual's depression

status is recorded. The full and observed data for an individual are F = (Z, C, Yj) and

O = (Z, C, Yc), respectively. We assume that we observe n i.i.d., copies of 0. We will

use the subscript i to denote data for the ith individual.

Our goal is to draw inference about p_ = P[Y = 1 Z = z] forj = 1,... J and

z = 0, 1.









2.3 Assumptions

To identify pj from the distribution of the observed data, we make the following two

untestable assumptions:

Assumption 1 (Non-Future Dependence): Rj is independent of (j,,,..., Yj) given

Rj_ = 1 and Yj, forj = 1...J 1.
This assumption asserts that for individuals at risk for drop-out at visit and who

share the same history of outcomes up to and including visit j, the distribution of future

outcomes is the same for those who are last seen at visit j and those who remain on

study past visit j. This assumption has been referred to as non-future dependence

(Kenward et al., 2003).

Assumption 2 (Pattern-Mixture Representation): Forj = 1,..., J and yj = 0, 1,


P[Yj = yj|Rj = 0, Rj-1 = 1, Yj-_, Z = z] =
P[Yj = yjlRj = 1,Yj_I, Z = z] exp{qz,(Yy_l, yj)}
E[exp{qz,i(Yi_1, Yj)} Rj = 1, Yj_1, Z = z]

where qz,(Y-y_, Yj) is a specified function of its arguments.

Assumption 2 links the non-identified conditional distribution of Yj given Rj = 0,

Rj_1 = 1, Yil, and Z = z to the identified conditional distribution of Yj given Rj = 1,

Yi_1, and Z = z using exponential tilting via the specified function qz,(Y y_, Yj).
Assumption (2) has a selection model representation that is obtained using Bayes' rule.

Assumption 2 (Selection Model Representation): Forj = 1,..., J,

logit {P[R = 01 Rj_1 = 1, Y,, Z = z]} = hz,(Y,_1) + qz (Y_-, Yj)

where

hz,j(Yj_) = logit P[R = 01Rj_ = 1, Yj_,Z = z] -

log{E[exp{qz,(Y_l, Yj)}IR_- = 1, Y _1, Z = z]}









With this characterization, we see that the function qz,(Yi_i, Yj) quantifies the

influence (on a log odds ratio scale) of the potentially unobservable outcome Yj on the

conditional odds of dropping at time j.

2.4 Identifiability

The above two assumptions non-parametrically, just-identify pj for all j 1,..., J

and z = 0, 1. To see this, consider the following representation of this conditional

distribution, derived using the laws of total and conditional probability:




j-
,zi = P[Y = 1IRJ = 1, Y-1= y _1, Z = z]x


P[R, = 1|R,_i = 1, Y-1 = Y,-i, Z = z] I P[Yi = yIRY = 1, Y,_i = Y,-i, Z = z] +
/=1 /= 0



k-I k-I
x P[R =1 Ri-1 = 1,Y=-1 = Y/-1, Z = z] P[Yi = yIRi = 1,Yk-1 = Y-1, Z =z]
/=1 /=0
All quantities on the right hand side of this equation are identified, without appealing

to any assumptions, except P[YY = 1 Rk = 0, Rk-i = 1, Yk-1 = Yk-1, Z = z] for

k = 1, ... j 1. Under Assumptions 1 and 2, these probabilities can be shown to be

identified, implying that :,j is identified for all j and z.

Theorem 1: P[Yj = 1IRk-1 = 1,Yk-i Yk-1, Z = z] and P[Y; = 1IRk = 0, Rk-i

1, Yk- = Yk-1, Z = z] are identified for k = 1,... j.
Proof: The proof follows by backward induction. Consider k = j. By Assumption 2,


P[Yj = 1|Ry = 0, Rj_i = 1, Yj_1 = yj_l, Z = z] =
P[Yj = 1 R = 1, Y_i, Z = z] exp{qz,(Yj_i, 1)}
E[exp{qz,(Y-_l = YY)}|R = 1, Y_= y j_, Z = z]









Since the right hand side is identified, we know that P[Yj = 1 Rj = 0, Rj-_ =

1, Y _1 = yj-, Z = z] is identified. Further, we can write


P[Yj = 1Rj_ I= 1, Y_il = y _i,Z= z]
1
-.
= > P[Y = 1|R, = r, Rj_1 1,Yj- I= Yj_1, Z = z]P[Rj = rlR -1 = 1,Y j- = yj-i, Z = z]
r=O

Since all quantities on the right hand side are identified, P[Yj = 1 Rj-_ = 1, Y _- =

y _1, Z = z] is identified.

Suppose that P[Yj = 1IRk = 0, Rk-i = 1,Yk-1 Yk-, Z = z] and P[Y = 1IRk-1 =

1, Yk-i = Yk-1, Z = z] are identified for some k where 1 < k < j. Then, we need to show

that these probabilities are identified for k' = k 1. To see this, note that


P[Yj = IlRk' = 0, Rk'-1 = 1,Yk'-i = Yk,-1, Z = z]

= P[Y = 1 iRk-I = 0, Rk-2 = 1,Yk-2 = Yk-2, Z = z]
1
= P[Y,=IlRk- =O, Rk-2 Yk-1 Yk-1,Z=z]x
Yk- 1 0

P[Yk-1 Yk-1IRk-1 = 0, Rk-2 = 1, Yk-2 = k-2, Z = z]
1
= P[Y = 1IRk- = 1,Yk- k-,Z = z]x
Yk 1 0
P[Yk-1 Yk-1 Rk-i = 1, Yk-2 = k-2, Z = z] exp{qz,k-I(Yk-2, Yk-)}
E[exp{q,k-1(Yk-2, Yk-1)}Rk-1 = 1,Yk-2 = Yk-2, Z = z]

The third equality follows by Assumptions 1 and 2. Since all the quantities on the

right hand side of the last equality are identified, P[Yj = 1 Rk, = 0, Rk'-1 = 1, Yk'- =

yk'-1, Z = z] is identified. Further,

P[Yj = 1IRk/_I = Yk'-i = yk'-1, Z = z]

= P[Yj = 1lRk-2 = 1,Yk-2 = Yk-2, Z = z]
1
= P[Yj= 1IRk- 1,Yk1 Yk-l,Z=z]x
Yk-1=0









P[Yk-1 = Yk- Rk- = 1, Yk-2 = Yk-2, Z = z]x

P[Rk-i = 1 Rk-2 = 1,Yk-2 = Yk-2, Z = ]+
1
SP[Yj = 1Rk- = 0, Rk-2 = 1, Yk- = Yk-, Z = z]x
Yk-1= 0

P[Yk-1 = Yk-i Rk- = 0, Yk-2 = Yk-2, Z = z]x

P[Rk-1 = 0Rk-2 Yk-2 Yk-2, Z z]
1
S P[Y = Rkl 1,Yk-1 Yk-, Z = z]x
Yk-1= 0

P[Yk-1 = k-i Rk-i = 1, Yk-2 = Yk-2, Z = z]x

P[Rk- = 1 Rk-2 1,Yk-2 = Yk-2, Z = z]+


Yk-1= 0
P[Yk-1 = Yk- Rk-1 = 1, Yk-2 = Yk-2, Z = z] exp{qz,k-1(Yk-2, Yk-1)}
E[exp{qz,k-1(Yk-2, Yk-1)} Rk-1 = 1, Yk-2 = Yk-2, Z = z]

P[Rk-1 = 0Rk-2 = 1,Yk-2 = Yk-2, Z = z]

The third equality follows by Assumptions 1 and 2. Since all the quantities on the right

hand side of the last equality are identified, P[Yj = 1IRk'-1 = 1, Yk'-1 = Yk-1, Z = z] is

identified. o

The identifiability result shows that, given the functions qz,j(Yj-, Y), p*,j can

be expressed as functional of the distribution of the observed data. In particular, the

functional depends on the conditional distributions of Yj given Rj = 1, Yj1, and Z

forj = 0,... J and the conditional distributions of Rj given Rj_i = 1, Yj-_ and Z

forj = 1,..., J. Furthermore, the functions qzj(Yjy_, Yj) are not identifiable from the

distribution of the observed data and their specification places no restrictions on the

distribution of the observed data.









2.5 Modeling


We specify saturated models for the observed data via the sequential conditional

distributions of [Yj R = 1, Yj _, Z] forj = 0,..., J and the conditional hazards [RjI Rj- =

1, Yji, Z] forj = 1,..., J. We parameterize these models as follows:


logit P[Yo = 1Ro = 1, Z = z] = z,o,o
j-2
logit P[Yj = R 1 1R ,Yj_I = Yj-1, Z = z] = az,o -+ az,j,ly-1 + OzkYk
k=0
Z (2) ,(3) C(j-1)
..y. zjYkY/Ym +... z + YoYi Y -1
k,IA 2) k,l,mEA6 3)
j-2
(1)
logit P[Rj = 0|RjI = 1, Y-I -- Yj-1, Z = z] = 7,j,o + 7z,j,lj-i + _7z,j,kk
k=O
Z (2) (3) C 1)
+ (.YkY 7 ),.Yky/Ym +... + yz,j Y1) l "yj-1
k,IEA2) k,l,mEA(3)

forj = 1,..., J, where A t) is the set of all t-tuples of the integers 0,... j 1. Let a

denote the parameters indexing the conditional distributions [YjIRj = 1, Y-l, Z], 7

denote the parameters indexing the conditional distributions [Rj Rj-_ = 1, Y _1, Z] and

0= {a,7}.
Furthermore, we propose to parameterize the functions qzj(Y_-, Yj) with

parameters z,.y = qzj((Yi-1, 1)) qz,((Yi-1, 0)). Here, exp(,,yj _) represents,

in the context of the BCPT trial, the conditional odds ratio of dropping out between visits

j 1 and j for individuals who are depressed vs. not depressed at visit j, but share the

mental history yj- through visit 1. We let 7 denote the collection of Tjy 's.

2.6 Prior Specification and Posterior Computation

For specified sensitivity analysis parameters 7, the saturated model proposed

in Section 2.5 provides a perfect fit to the distribution of the observed data. In this

model, however, the number of parameters increases exponentially in J. In contrast,

the number of data points increases linearly in J. As a consequence, there will be many









combinations of Yj-1 (i.e., "cells") which will be sparsely represented in the dataset.

For example, in the BCPT trial, about 50% of the possible realizations of Y7 have

less than two observations and about 15% have no observations. For a frequentist

perspective, this implies that components of 0 will be imprecisely estimated; in turn, this

can adversely affect estimation of pj. This has been called the curse of dimensionality

(Robins and Ritov, 1997).

2.6.1 Shrinkage Priors

To address this problem, we introduce data driven shrinkage priors for higher order

interactions to reduce the number of parameters in an automated manner. In particular,

we assume


z,j,k ~ N(, at) and 7,k N(0,) k A(t) 3 < t (2-1)

where t is the order of interactions and the hyper-parameters (shrinkage variances)

follow distributions

(t) Unif(0, 10) and a(t) ~ Unif(0, 10).

When oat) and -(t) equal zero for all interactions, the saturated model is reduced to a

first order Markov model,

logit P[Yo = 1Ro = 1, Z = z] = az,o,o

logit P[Yj = 1Rj = 1, Yj1 = yj1, Z = z] = azjo + zj,lYj-1

logit P[Rj = 0Rj _l = 1,Yi = yji, Z = z] = 7z,o + 7z,,lyj-1.

The shrinkage priors allow the "neighboring" cells in the observed data model to borrow

information from each other and provide more precise estimates.

When the first order Markov model is not true, as n goes to infinity, the posterior

means of observed data probabilities will converge to their true values as long as









the shrinkage priors are 0(1) (which is the case here) and all the true values of the

observed data probabilities, P[YjIRj = 1, Yj_, Z] forj = 0,..., J and are in the

open interval, (0, 1). This follows, since under this latter condition, all combinations

of depression histories have a positive probability of being observed and the prior will

become swamped by the observed data. However, when the true value of any of the

observed data probabilities is zero or one, there exists at least one combination of

depression history that will never be observed and thus the influence of the prior will not

dissipate as n increases.

We specify non-informative priors N(0, 1000) for the non-interaction parameters in

0, namely cazjo forj = 0,... J and z = 0, 1, aczj,, 7zj,o and 7,j,1 forj = 1,... J and

z = 0,1.

2.6.2 Prior of Sensitivity Parameters

The sensitivity parameters in Assumption 2, defined formally in Section 2.5, are

(conditional) odds ratios. In our experience, subject matter experts often have difficulty

thinking in terms of odds ratios; rather, they are more comfortable expressing beliefs

about relative risks (Scharfstein et al., 2006; Shepherd et al., 2007). With this is in

mind, we asked Dr. Patricia Ganz, a medical oncologist and expert on quality of life

outcomes in breast cancer, to express her beliefs about the risk of dropping out and its

relationship to treatment assignment and depression. We then translated her beliefs into

prior distributional assumptions about the odds ratio sensitivity parameters r.

Specifically, we asked Dr. Ganz to answer the following question for each treatment

group:

Q: Consider a group of women assigned to placebo (tamoxifen), who are on study

through visit 1 and who share the same history of depression. Suppose that the

probability that a randomly selected woman in this group drops out before visit j is p

(denoted by the columns in Table 1). For each p, what is the minimum, maximum and

your best guess (median) representing how much more (e.g. twice) or less (e.g., half)








likely you consider the risk of dropping out before visit j for a woman who would be
depressed at year RELATIVE to a woman who would not be depressed at visit ?
Implicit in this question is the assumption that, for each treatment group, the relative
risk only depends on past history and the visit number only through the risk of dropping
out between visits j 1 and j.
For notational convenience, let rz(p) denote the relative risk of drop-out for
treatment group z and drop-out probability p. Further, let rz,min(p), rz,med(p) and
rz,max(p) denote the elicited minimum, median, and maximum relative risks (see
Table2-1). Let pzj(Yj-i) = P[Rj = 0ORj = 1,Y/_j = y_1,,Z = z] and let

Pj(yy-1) = P[Rj = OiR-1 = 1,Y_1 = yi_ Y = y, Z = z] fory = 0,1.
By definition,

^P ,y-i)) =PZ( -/1 P^ -1)
P p (yj_1)7(yYj_)
1
p=,jy ) =y -l)Wzj ( -)
y=O

where (-(y) = P[Yj = y|R-y = 1,YJ_- = yi, Z = z] for y = 0,1. This implies that
z ()(y. -) P[___ pzy YRi-1
(0) Pzj(Yj-1)
PZ(,y,-/) (yi_ )(rz(Pz,(Yj_1))- 1)+ 1

Since r~ ,(yj-) E [0, 1], given pz,(Yj-1) and rz(pz,(yy_1)), p],)(y-) is bounded as

follows:
for rz(pzj(yj_ )) > 1,

Pz,j(yj-)/rz(pzj(yj-)) <- PzC (yj-I) < min{pz,(yj_l), 1}

and, for rz(pz,j(yj-)) < 1,

Pz,j(Yj-) < PO(yj-1) < mn{hpz,(Yj-le)/rz(pz,(Yj-le)), 1}.

We will use these bounds to construct our prior.








We construct the conditional prior of Tz,,y,, given pz,(yi-1) using Steps 1-4 given
below. The general strategy is to use the elicited information on the relative risk at
different drop-out probabilities and the bounds derived above to construct the prior of
interest.

Step 1. For m c {min, med, max}, interpolate the elicited rz,m(p) at different drop-out
probabilities (see Figure 2-1) to find rz,m(Pzj(yji_)) for any pzj(yj-_).
Step 2. Construct the prior of rz(pzj(y-i)) given pz,(Yj_,) as a 50-50 mixture of
Uniform(rz,min(pz,(yj_-)), rz,med(Pzj(Yj-1))
and
Uniform(rz,med(pz,j (yj-1)), z,max (Pzj(j-))
random variables. This preserves the elicited percentiles of the relative risk.
Step 3. Construct a conditional prior of p )(yil) given pzj(Yj-1) and rz(pzj(yi_)) as a
uniform distribution with lower bound
Pz,i(Yi-1)
max {rz(pz,(yj 1)),1}
and upper bound

min P(Yj-1) 1
mi m {rzn((pz,(Y-i)), 1} max {r(pzj(yj-)), 1}
The bounds were derived above.
Step 4. Steps (2) and (3) induce a prior for Tz,yy | by noting


is Ziy- (= log rz(pz (y_ ))(1- (o y_))
S1 rz(PzYj-1))P(Yj-1)

i.e., Tzj,,y is a deterministic function of rz(pzj(yj-i)) and pz(yi-).
The relative risks elicited from Dr. Ganz are given in Table 2-2. We extrapolated the
relative risks outside the ranges given in Table 2-2 as shown in Figure 2-1.
Figure 2-2 shows the density of r given pz,(yj-,) equal 10% and 25% for the
tamoxifen and placebo arms. For two patients with the same response history up to
time point j 1, the log odds ratio of dropping out at time point j, for the patient that is









depressed at time point j versus the patient that is not, increases as the overall drop
out rate at time point increases. In general, for a given pzj(Yj-), the log odds ratio is
higher for patients in the tamoxifen versus placebo arms.

2.6.3 Posterior Computation

With the shrinkage priors on 0, the elicited conditional priors 7 given 0, and the
observed data, the following steps are used to simulate draws from the posterior of p :

1. Using the proposed observed data model with the shrinkage priors on 0, we
simulate draws from the posterior distributions of P[Yj = 1Rj = 1, Yj-_ = yj-, Z =
z] and P[Rj = 0ORj = 1, Yj_ = yj-, Z = z] for all j, z and yj,_ in WinBUGS.

2. For each draw of P[Rj = 01Rj = 1, Y _- = yi-1, Z = z], we draw ry 1 based on
the conditional priors described in Section 6.2.

3. We compute pj by plugging the draws of P[Yj = 1Rj = 1, Yj- = y,_-, Z = z],
P[Rj = 01Rj = 1, Y-_1 = yj-, Z = z] and zjy into the identification algorithm
discussed in Section 2.4.
To sample from the posterior distributions of P[Yj = 1Rj = 1, Yj_i = yj_1, Z = z]

and P[Rj = 0 Rj = 1, Yji = yj-, Z = z] in WinBUGS we stratify the individual
binary data (by previous response history) and analyze as Binomial data; this serves to
drastically improve the computational efficiency. Sampling zjy and computing p*, is
implemented separately from the first step using R.

2.7 Assessment of Model Performance via Simulation

Via simulation, we compared the performance of the shrinkage model with a correct
parametric model (given below), an incorrect parametric model (first order Markov
model) and the saturated model with diffuse priors (given below).
The shrinkage model uses the shrinkage priors proposed in Section 2.6.1 (shrink

the saturated model toward a first order Markov model). Note that the shrinkage priors
shrink the saturated model to an incorrect parametric model.









For the saturated model with diffuse priors, we re-parameterize the model as

P[Yj = 1Ry, = 1,Y,_j = Yj_1, Z = z] = #z,yj,,

P[Rj = 0|Rj_1 = 1,Y,_1 = yi, Z = z] = Pz,,.1

forj = 1,... 7, and specify independent Unif(0, 1) on p's and p's.
We simulated observed data from a "true" parametric model of the following form:

logitP[Yo = 1Ro = 1, Z = z] = az,o,o

logitP[Y1 = 1Ri = 1, Yo = Yo, Z = z] = az,,o + oz,1,1yo

logitP[R1 = 0IRo = 1, Yo = yo, Z = z] = 7z,i,o + 7z,i,iYo

logitP[Yj = 1|Rj = 1, Yj_ = yj_1, Z = z] = Oaz,o + az,jlYj- + az,j,2Yj-2

logitP[Rj = 0|Rj_1 = 1, Yj1 = Yj-1, Z = z] = 7zj,o + 7zj,lj- + 7z,j,2Yj-2,

forj = 2 to 7.
To determine the parameters of the data generating model, we fit this model to

the "monotonized" BCPT data in WinBUGS with non-informative priors. We used the
posterior mean of the of parameters az and 7 as the true parameters. We compute
the "true" values of pj by (1) drawing 10,000 values from the elicited prior of Tz given

7y given in Table 2-2, (2) computing pj using the identification algorithm in Section 2.4
for each draw, and (3) average the resulting pj's. The model parameters and the "true"
depression rates p j, are given in Table 2-3.
We considered (relatively) small (3000), moderate (5000), and large (10000) sample

sizes for each treatment arm; for each sample size, we simulated 50 datasets. We
assessed model performance using the mean squared error (MSE) criterion.
In Table 2-4, we report the MSEs of P[Yj = 1 Rj = 1, Yj-, Z = z] and

P[Rj = 1\Rj_1 = 1, Yj_1, Z = z] averaged over allj and all Yj_I (see columns
3 and 4, respectively). We also report the MSEs for pj (see columns 6-12). For









reference, the MSEs associated with the true data generating model are bolded.

This table demonstrate that the shrinkage model generally outperforms both the

incorrectly specified parametric model and the saturated model at all sample sizes.

This improved performance is especially noticeable when comparing the MSEs for the

rates of depression at times 3-7.

In addition, the MSEs for the shrinkage model compare favorably with those of

the true parametric model for all sample sizes considered, despite the fact that the

shrinkage priors were specified to shrink toward an incorrect model.

2.8 Application: Breast Cancer Prevention Trial (BCPT)

Table 2-5 displays the treatment-specific monotonized drop-out rates in the BCPT

By the 7th study visit, more than 40% of patients had missed one or more assessments,

with a slightly higher percentage in the tamoxifen arm.

We fit the shrinkage model to the observed data using WinBUGS, with four chains

of 8000 iterations and 1000 burn-in. Convergence was checked by examining trace plots

of the multiple chains.

2.8.1 Model Fit and Shrinkage Results

To assess the model fit, we compared the empirical rates and posterior means

(with 95% credible intervals) of P[Yj = 1, Rj = 1|Z = z] and P[Rj = 0 Z = z].

As shown in Figure 2-3, the shrinkage model fits the observed data well. Figure 2-4

illustrates the effect of shrinkage on the model fits by comparing the difference between

the empirical rate and posterior mean of P[Yj = 1 Rj = 1, YYl, Z = z] for allj, z

and Yj_i. We can see that for early time points, the difference is close to zero since

there is little shrinkage applied to the model parameters. For later time points, more

higher order interaction coefficients are shrunk toward zero and the magnitude of

difference increases and drifts away from zero line. In general, the empirical estimates

are less reliable for the later time points (re: the simulation results in Section 7). In

some cases, there are no observations within "cells." By shrinking the high order









interactions (i.e., borrowing information across neighboring cells), we are able to

estimate P[Yj = 1l R = 1, Y _i, Z = z] for all j, z and Yj_1 with reasonable precision.

2.8.2 Inference

Figure 2-5 shows the posterior of P[Y7 = 1|Z = z], the treatment-specific probability

of depression at the end of the 36-month follow up (solid lines). For comparison, the

posterior under MAR (corresponding to point mass priors for r at zero) is also presented

(dashed lines). The observed depression rates (i.e., complete case analysis) were

0.115 on both the placebo and tamoxifen arms. Under the MNAR analysis (using

the elicited priors), the posterior mean of the depression rates at month 36 were

0.126 (95%C/ : 0.115, 0.138) and 0.130 (95%C/ : 0.119, 0.143) for the placebo and

tamoxifen arms; the difference was 0.004 (95%C/ : -0.012, 0.021). Under MAR, the

rates were 0.125 (95%C/ : 0.114, 0.136) and 0.126 (95%C/ : 0.115, 0.138) for the

placebo and tamoxifen arms; the difference was 0.001 (95% C : -0.015, 0.018). The

posterior probability of depression was higher under the MNAR analysis than the MAR

analysis since researchers believed depressed patients were more likely to drop out

(see Table 2-2), a belief that was captured by the elicited priors. Figure 2-6 shows that

under the two treatments there were no significant differences in the depression rates

at every time point (95% credible intervals all cover zero) under both MNAR and MAR.

Similar (non-significant) treatment differences were seen when examining treatment

comparisons conditional on depression status at baseline.

2.9 Summary and Discussion

In this paper, we have presented a Bayesian shrinkage approach for longitudinal

binary data with informative drop-out. Our model provides a framework that incorporates

expert opinion about non-identifiable parameters and avoids the curse of dimensionality

by using shrinkage priors. In our analysis of the BCPT data, we concluded that there

was little (if any) evidence that women on tamoxifen were more depressed than those on

placebo.









An important feature of our approach is that the specification of models for the

identifiable distribution of the observed data and the non-identifiable parameters can

be implemented by separate independent data analysts. This feature can be used

to increase the objectivity of necessarily subjective inferences in the FDA review of

randomized trials with informative drop-out.

Penalized likelihood (Fan and Li, 2001; Green and Silverman, 1994; Wahba, 1990)

is another approach for high-dimensional statistical modeling. There are similarities

between the penalized likelihood approach and our shrinkage model. In fact, the

shrinkage priors on the saturated model parameters proposed in our approach can be

viewed as a specific form for the penalty.

The ideas in this paper can be extended to continuous outcomes. For example,

one could use the mixtures of Dirichlet processes model (Escobar and West, 1995) for

the distribution of observed responses. They can also be extended to multiple cause

dropout; in this trial, missed assessments were due to a variety of reasons including

patient-specific causes such as experiencing a protocol defined event, stopping therapy,

or withdrawing consent and institution-specific causes such as understaffing an staff

turnover. Therefore, some missingness is less likely to be informative; extensions will

need to account for that. In addition, institutional differences might be addressed by

allowing institution-specific parameters with priors that shrink them toward a common

set of parameters.

For smaller sample sizes, WinBUGS has difficulty sampling from the posterior

distribution of the parameters in the shrinkage model. In addition, the "monotonizing"

approach ignores the intermittent missing data and may lead to biased results. These

issues will be examined in the next Chapter.









2.10 Acknowledgments


This research was supported by NIH grants R01-CA85295, U10-CA37377, and

U10-CA69974. The authors are grateful to oncologist Patricia Ganz at UCLA for

providing her expertise for the MNAR analysis.

2.11 Tables and Figures

Table 2-1. Relative Risks to be Elicited
Drop out Rate p
Question Relative Risk p, P2 ...
100% confident the number is above rz,min(p)
Best Guess rz,med(P)
100% confident the number is below r,max(p)


Table 2-2. Percentiles of Relative Risks Elicited
Drop out Rate
Treatment Percentile 10% 25%
Tamoxifen Minimum 1.10 1.30
Median 1.20 1.50
Maximum 1.30 1.60
Placebo Minimum 1.01 1.20
Median 1.05 1.30
Maximum 1.10 1.40
























Table 2-3. Simulation Scenario


Parameter
Tamoxifen


ao -2.578 -2.500 -2.613
a 2.460 1.978
a2 1.500
70 -2.352 -2.871
71 0.611 0.397
72 0.121
Depression Rate 0.066 0.097 0.119
Placebo


co -2.653
a1
a2
70
71
72
Depression Rate 0.071


-2.632 -2.59
2.708 2.304
1.241
-2.308 -2.970
0.466 0.468
-0.293
0.107 0.118


Time Point
2 3


-2.752
1.940
1.599
-2.625
0.460
0.422
0.124

-2.663
1.874
1.608
-2.729
0.469
0.323
0.120


-2.626
2.023
1.389
-2.513
0.247
0.261
0.139

-2.598
2.104
1.471
-2.474
0.272
0.278
0.132


4 5


-2.789
2.072
1.612
-2.281
0.320
0.035
0.126

-2.884
2.068
1.693
-2.410
0.376
0.288
0.130


-2.811
1.885
1.639
-2.217
0.127
0.293
0.126

-2.853
2.123
1.540
-2.460
0.088
0.241
0.126


-2.895
2.007
1.830
-2.536
0.228
0.204
0.123

-3.035
2.243
1.989
-2.673
0.001
0.428
0.125


i









Table 2-4. Simulation Results: MSE (x103). P and T represent placebo and tamoxifen
arms, respectively.
Observed uT


Model Treat
Sample Size 3000
True P
T
Parametric P
T
Shrinkage P
T
Saturated P


Sample Size 5(
True

Parametric

Shrinkage

Saturated


Y R 1 2 3 4 5 6 7


0.946
1.075
30.176
28.882
6.970
6.988
35.678
34.654


)00
P 0.659
T 0.604
P 30.029
T 28.571
P 4.628
T 4.448
P 30.274
T 29.599


Sample Size 10000
True P
T
Parametric P
T
Shrinkage P
T
Saturated P
T


0.314
0.278
29.849
28.418
2.392
2.474
22.989
22.245


0.378
0.431
0.451
0.385
1.999
2.401
67.171
62.606

0.238
0.261
0.381
0.277
1.188
1.414
54.647
51.219

0.121
0.126
0.315
0.223
0.707
0.712
37.716
34.791


0.034 0.048
0.026 0.035
0.034 0.048
0.026 0.037
0.033 0.045
0.024 0.026
0.036 0.050
0.026 0.033

0.023 0.027
0.020 0.020
0.023 0.027
0.020 0.023
0.025 0.028
0.017 0.023
0.023 0.028
0.020 0.020

0.009 0.009
0.011 0.013
0.009 0.010
0.011 0.014
0.008 0.010
0.011 0.015
0.009 0.009
0.011 0.013


0.049
0.046
0.048
0.056
0.051
0.056
0.054
0.045

0.027
0.032
0.028
0.042
0.032
0.028
0.028
0.032

0.015
0.013
0.016
0.02
0.015
0.011
0.016
0.014


0.052
0.056
0.062
0.078
0.059
0.053
0.058
0.059

0.028
0.027
0.033
0.042
0.031
0.026
0.033
0.028

0.017
0.013
0.023
0.026
0.017
0.015
0.018
0.014


0.050
0.052
0.061
0.073
0.047
0.063
0.101
0.097

0.036
0.034
0.044
0.049
0.035
0.038
0.061
0.051

0.013
0.015
0.028
0.028
0.014
0.016
0.018
0.021


0.05 0.065
0.078 0.061
0.067 0.086
0.096 0.077
0.052 0.066
0.119 0.073
0.231 0.561
0.329 0.722

0.043 0.058
0.035 0.044
0.052 0.073
0.056 0.057
0.048 0.057
0.033 0.044
0.138 0.290
0.140 0.392

0.014 0.018
0.019 0.020
0.033 0.043
0.039 0.043
0.013 0.014
0.019 0.023
0.038 0.094
0.048 0.128


Table 2-5. Patients Cumulative Drop Out Rate
Month 3 6 12 18 24 30 36
Tamoxifen Available 5364 4874 4597 4249 3910 3529 3163
Drop out 490 767 1115 1454 1835 2201 2447
Drop Rate(%) 9.13 14.30 20.79 27.11 34.21 41.03 45.62
Placebo Available 5375 4871 4624 4310 3951 3593 3297
Drop out 504 751 1065 1424 1782 2078 2304
Drop Rate(%) 9.38 13.97 19.81 26.49 33.15 38.66 42.87












co Tamoxifen: maximum
r--------------------------------------


i Tamoxifen: median
r--------------------------------------
I /


-^ ,' ,, / Placebo:maximum
co Tamoxifen:minimum
S'- / Placebo:median
>c

/' /Placebo:minimum




o ./

0 10 25 100
Drop Out Rate

Figure 2-1. Extrapolation of the elicited relative risks.


























48









































































CD




























Figure 2-2.






Figure 2-2.


---


I\
I (
I \
I (
I )
I \
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I
I I
I
I
I
I I
I
I )
I
II
I
1
I
1
I
I
I I
I
I I
I
I
I
) I
I
I I
I
I I
I
I \


Prior conditional density Tj,,y given pz,(yj-_). Black and gray lines


represent tamoxifen and placebo arms, respectively. Solid and dashed lines


are for pzj(y,-) = 0.25 and pzj(Y-i) = 0.10, respectively.


0.2


0.4


0.6


0.8


I I I I I


).0
















Tamoxifen


3 12 24
Time Points


Figure 2-3.


36


3 12 24
Time Points


36


Solid and dashed lines represent the empirical rate of
P[Yj = 1, Rj = 1|Z = z] and P[Rj = 0|Z = z], respectively. The posterior
means of P[Yj = 1, Rj = 1 Z = z] (diamond) and P[Rj = 0 Z = z] (triangle)
and their 95% credible intervals are displayed at each time point.


B


Placebo






















A1:Placebo (Y) .

0
S0 0 0 o 0

0o o o o o o


A2:Tamoxifen(Y) .. .. ...... -
0 0 00 I0O O 0

0 0O 0 00
o o o o o


0 0 0 o


B1:Placebo(R) o o .. .... .oo
00 0 0 o I o 0 o o o o o






C 6 C 8
B2:Tamoxifen(R) .* .. -
i i o o o














0 0 0000 0 0 0
C -6 C -8 o





o o o o
CL6 6 C8 o
B2:Tamoxifen(R) ............


- o c-I I 9-_,o% Po o-oo oo Oooo o ooo oo o
o o o oo o
C--6 C--8 o o


Pattern of Historical Response


Figure 2-4.


Differences between posterior mean and empirical rate of
P[Y 1 Rj = 1, Yj_, Z = z] (Al and A2) and
P[Rj = 0|Rj1 = 1, Yi_1, Z = z] (B1 and B2). The x-axis is ordered by follow
up time C (max{t : Rt 1}). The bullets are the posterior mean of
P[Yj = 1 Rj = 1, Y_, Z = z] and P[Rj = O|Ri_ = 1Y, Z = z] when
there are no patients with historical response Y,_1.






















O
00
o




CO





a>
C)o




C)
N




CM

0



Figure 2-5.


.10 0.11 0.12 0.13 0.14 0.15
Depression Probability


Posterior distribution of P[Y7 = 1|Z = z]. Black and gray lines represent
tamoxifen and placebo arms, respectively. Solid and dashed lines are for
MNAR and MAR, respectively.


























TT
T


8 T
-







I I
,,


18
Time Points


Figure 2-6.


Posterior mean and 95% credible interval of difference of P[Yj = 1 Z = z]
between placebo and tamoxifen arms. The gray and white boxes are for
MAR and MNAR, respectively.









CHAPTER 3
A BETA-BINOMIAL BAYESIAN SHRINKAGE MODEL FOR INTERMITTENT
MISSINGNESS LONGITUDINAL BINARY DATA
3.1 Introduction

We proposed in Chapter 2 a Bayesian shrinkage approach for longitudinal binary

data with informative drop-out. The saturated observed data models were constructed

sequentially via conditional distributions for response and for drop out time and

parameterized on the logistic scale using all interaction terms. However, two issues were

not addressed: the "ignored" intermittent missing data and the intrinsic computational

challenge with the "interaction" parameterization. This Chapter proposes solutions to

these two issues.

3.1.1 Intermittent Missing Data

In the BCPT, approximately 15% of the responses were intermittently missing,

i.e. there are missing values prior to drop-out. One approach to handle intermittent

missingness is to consider a "monotonized" dataset, whereby all CES-D scores

observed on an individual after their first missing score are deleted, as in Land et al.

(2002); we did this in Chapter 2. However, this increases the "drop-out" rate, throws

away information and thus loses efficiency, and may introduce bias.

Handling informative intermittent missing data is methodologically and computationally

challenging and, as a result, the statistics literature is relatively limited. Most methods

adopt a likelihood approach and rely on strong parametric assumptions (see, for

example, Albert, 2000; Albert et al., 2002; Ibrahim et al., 2001; Lin et al., 2004; Troxel

et al., 1998). Semiparametric methods have been proposed by Troxel et al. (1998)

and Vansteelandt et al. (2007). Troxel et al. (1998) proposed a marginal model and

introduced a pseudo-likelihood estimation procedure. Vansteelandt et al. (2007)

extended the ideas of Rotnitzky et al. (1998b), Scharfstein et al. (1999) and Rotnitzky

et al. (2001) to non-monotone missing data.









Most related to our approach are the (partial ignorability) assumptions proposed

in Harel and Schafer (2009) that partition the missing data and allow one (or more) of

the partitions to be ignored given the other partition(s) and the observed data. In this

Chapter, we apply a partial ignorability assumption such that the intermittent missing

data mechanism can be ignored given drop-out and treatment strata.

3.1.2 Computational Issues

WinBUGS is a popular software package that allows convenient application of

MCMC techniques. However, there are major drawbacks. For the shrinkage model

proposed in Chapter 2, Section 2.5, WinBUGS has difficulty sampling from the posterior

distribution of the parameters when sample size is relatively small (less than 3000 per

arm). Tailored sampling algorithms can be written to overcome this difficulty, however,

WinBUGS lacks the flexibility to incorporate modifications and/or extensions to its

existing algorithms.

In this Chapter, we will provide an alternative parameterizations of the saturated

model for the observed data as well as alternative shrinkage prior specifications to

improve computational efficiency. This alternative approach to posterior sampling can

easily be programmed in R.

3.1.3 Outline

This Chapter is organized as follows. In Section 3.2, we describe the data structure,

formalize identification assumptions and prove that the treatment-specific distribution

of the full trajectory of longitudinal outcomes is identified under these assumptions. In

Section 3.3, we introduce a saturated model for the distribution of the data that would be

observed when there is drop-out, but no intermittent observations. We then introduce

shrinkage priors to parameters in the saturated model to reduce the dimensionality of

the parameter space. In Section 3.4, we assess, by simulation, the behavior of three

classes of models: parametric, saturated, and shrinkage. Our analysis of the BCPT trial

is presented in Section 3.5. Section 3.6 is devoted to a summary and discussion.









3.2 Notation, Assumptions and Identifiability

To address the intermittent missingness, we redefine the notation in Chapter 2,

Section 2.2, as well as introduce some additional notation in this Section. The following

notation is defined for a random individual. When necessary, we use the subscript i to

denote data for the ith individual.

Let Z denote the treatment assignment indicator, where Z = 1 denotes tamoxifen

and Z = 0 denotes placebo. Let Y be the complete response data vector with elements

Yj denoting the binary outcome (i.e., depression) scheduled to be measured at the jth

visit ( = O(baseline),..., J) and let Yy = (Yo,..., Yj) denote the history of the outcome

process through visit j. Let R be the vector of missing data indicators with the same

dimension as Y, such that Rj = 1 indicates Yj is observed and Rj = 0 indicates Yj is

missing. Let S = max{t : Rt = 1} be the last visit at which an individual's depression

status is recorded. If S < J, then we say that the individual has dropped out and S is
referred to as the drop-out time. Let Rs = {Rj : j < S} be the collection of intermittent

missing data indicators recorded prior to S.

We will find it useful to distinguish three sets of data for an individual: the complete

data C = (Z, S, Rs, Y), the full data F = (Z, 5, Rs, Ys), and the observed data

0 = (Z, 5, Rs, Yobs), where Yobs is the subset of Y for which Rj = 1. It is useful to

also define Ymis = (Ymis ,Y is, Y i), where Yis = Y : R = 0,j < S} denotes the
"intermittent" missing responses, Y -i, = : j = S + 1,j < J} denotes the missing

response at the time right after drop-out, and Yis = { Yj S + 1 < j < J} denotes the
"future" missing responses. Note that Ys = (Ymis, Yobs).

We assume that individuals are drawn as a simple random sample from a

super-population so that we have an i.i.d. data structure for C, F and 0. We let the

parameters Oz index a model for the joint conditional distribution of S and Ys given

Z = z and the parameters qs,z index a model for the conditional distribution of Rs given









S = s, Ys and Z = z. We assume that the parameters Oz and z = (1,z ..., 0,z) are
distinct.

Our goal is to draw inference about j = P[Y = llZ = z] forj = 1,...,J and

z = 0, 1. To identify p/j from the distribution of the observed data, we make the following
three untestablee) assumptions:

Assumption 1: Given Z and S, the intermittent missing data are missing at random,
i.e.,

Rs IL Ymis Z, S, Yobs.

Under this assumption the parameters of the joint conditional distribution of S and

Ys given Z = z are estimable from the distribution of the observed data.
This assumption plus the assumption that Oz is a priori independent of Oz implies

that the intermittent missingness mechanism is ancillary or ignorable. Specifically, this
means that when considering inferences about Oz from a likelihood perspective, as

we are in this paper, the conditional distribution of Rs given Z, S and Yobs does not
contribute to the likelihood and can be ignored (Harel and Schafer, 2009).

Assumptions 2 and 3 are the same as Assumptions 1 and 2 in Chapter 2,

Section 2.3, respectively. We restate below the two assumption using the "survival"
time S notation (instead of missing indicators R in Chapter 2).
Assumption 2 (Non-Future Dependence): Forj = 1,..., J,

P[S = -1 S > 1, Y] = P[S = -1 S >j 1,YJ]

Assumption 3 (Pattern-Mixture Representation): Forj = 1,... J and yj = 0, 1,

P[Yj = I z P[Yj =yS >j,Yj1,Z= z]exp{qz,(Yj_1, y)}
E[exp{qz,(Yj_l, Y)}IS > j, Yj_i, Z = z]

where qz,(Yjy_, Y,) is a specified function of its arguments.

Theorem 1: P[Yj = 1IS > k 1,Yk- = Yk-1, Z = z] and P[Yj = 11S = k 1,Yk- =

yk-1, Z = z] are identified for k = 1,..., j under Assumptions 1-3.









Proof: Under Assumption 1, we know that the parameters of the conditional joint
distribution of S and Ys given Z = z are estimable from the distribution of the observed

data. The rest of the proof is the same as in Chapter 2.
The identifiability result shows that, given the functions qz,j(Yj-, Yj), I*,j can be
expressed as functional of the distribution of the observed data.

3.3 Modeling, Prior Specification and Posterior Computation
3.3.1 Modeling

We reparameterize the saturated observed data model in Chapter 2 as follows:

P[Yo = 1] = az,o

P[Y = 1IS > 1, Yo = y, Z = z] = azl,y

P[Y, = 1IS > j, = y, Y,-2 = -_2, Z = z] = a Z.,yJ-_2 (3-1)

P[S = 0 Yo = y, Z = z] = 7,o,y

P[S =j 1|S >j Y_1 = y, Y-2 = Yj-2, Z = z] = 7z'-1,Y 2,y

for = 2,.. ,Jandy = 0, 1.
Let az denote the parameters indexing the first set of models for response and 7-

denote the parameters indexing the second set of models for drop-out. Recall that we

defined Oz to denote the parameters of the conditional distribution of S and Ys given
Z = z; thus, Oz = (az, y7).
This saturated model avoids the complex interaction term model parameterization.
As a result, the (conditional) posterior distributions of Oz will have simple forms and

efficient posterior sampling is possible even when the sample size is moderate or small.

We use the same parameterization of the functions qz(Yj_1, Yj) as in Chapter 2,
Section 2.5.

3.3.2 Shrinkage Prior

In Chapter 2, the strategy to avoid the curse of dimensionality was to apply

shrinkage priors for higher order interactions to reduce the number of parameters








(i.e., shrink them to zero). For the directly parameterized model 3-1, we use a different
shrinkage strategy. In particular, we propose to use Beta priors for shrinkage as follows:

azy, 2- Beta (m () /-(-) (1- m() ()
"- zj zj Y)/ Y (3-2)
z,j-,yI, -,y Beta ( j (,y zj,y -m(Y) ,y)
forj = 2,... J and y = 0, 1. For az,o, oz,l,y and 7z,o,y for y = 0, 1, we assign Unif(0, 1)
priors. Let m) (m()) and r ) (r)()) denote the parameters m ,z (mJ_-y) and ,y

(r) -1y,) respectively.
Note that for a random variable X that follows a Beta(m/T/, (1 m)/,) distribution,
we have
E[X] = m and Var[X] = m(1 m) x -
I+l
For fixed m, Var[X] -> 0 as r -> 0, indicating shrinkage of the distribution of X toward
the mean. Thus, (), and / -,y serve as shrinkage parameters for azjy2y and

7zj-y ,y respectively. As the shrinkage parameters go to zero, the distribution of the
probabilities a z and 7 J- ,y are shrunk toward the mean of the probabilities
that do not depend on yj-2, namely m() and m()i) respectively. In essence, the
model is being shrunk toward a first-order Markov model. The shrinkage priors allow
"neighboring cells" to borrow information from each other and provide more precise
inferences.
Theorem 2: When there is no intermittent missingness, the proposed model yields
consistent posterior means of the observed data probabilities, as long as all the true
values of the observed data probabilities are in the open interval (0, 1).
Proof: See Appendix.
We specify independent Unif(0, 1) priors for m-") and m'). For the shrinkage
parameters (- and r_,) we specify independent, uniform shrinkage priors (Daniels,









1999) as follows


.)y 9. ,2 nd l,(Y)
I zgE ) and T )
zJ ~(g (E)) ) y+ 1)2,j-1,y
SzJ, zJ,y


(g (E) (7y) )2


where


* g(-) is a summary function (e.g., minimum, median or maximum, as suggested in
Christiansen and Morris (1997)).

Ey) = {e()- : the expected number of subjects with S > j, Y,_1 = y, Y_-2 =
zzJyiy 2,Y
yj-2, Z = z}.

E(- {e ) the expected number of subjects with S > j 1, Yj_1 =
zJ-l,y Z,J lyIj 2
y, Yj-2 Y -2, Z z}.
The expected number of subjects with S > j, Y-_1 = y, Yi-2 = Y-2, Z = z and with

S > j 1, Yi- = y, Yj-2 = Y-2, Z = z can be computed as:
J
e -y =nz P[S =s, Y = ,,... j = Y- = y, Yj-2= Yj_2Z = z]
s-= yj,yj+I ...,ys
J
eK'Y = nz P[S = s, Y, = y,..., Yj = Y, Y-1 = y, Yj-2 = Yj_21Z = z]
s=j-1 y,,yj,+...,ys


(3-4)


where the probabilities on the right hand side of the above equations are estimable
under Assumption 1.

The expected sample sizes above are used in the prior instead of the observed

binomial sample sizes which are not completely determined due to the intermittent

missingness. Thus, our formulation of these priors induces a small additional amount of

data dependence beyond its standard dependence on the binomial sample sizes. This

additional dependence affects the median of the prior but not its diffuseness.

3.3.3 Prior of Sensitivity Parameters

We use the same approach as in Chapter 2, Section 2.6.2 for constructing priors of

Tz given Oz.


(3-3)









3.3.4 Posterior Computation

Compared to Chapter 2, posterior computations for the observed data model are

much easier and more efficient under the reparameterized model 3-1 and the Beta

shrinkage priors. The posterior sampling algorithms can be implemented in R with no

sample size restrictions.

The following steps are used to simulate draws from the posterior of -:

1. Sample P(0z, Ymis Yobs, 5, Rs, Z = z) using Gibbs sampling with data augmentation
(see details in Appendix). Continue sampling until convergence.

2. For each draw of 7 -lYy 2,Y 1, draw .,y. 1 based on the conditional priors
described in Section 2.6.2.

3. Compute pZ by plugging the draws of azy- 2 ,y, Zy-2,y, and T,,y into the
identification algorithm discussed in Section 2.4.

3.4 Assessment of Model Performance via Simulation

For assessment of model performance, we use the same "true" parametric model

as in Chapter 2, Section 2.7 to simulate observed data (no intermittent missingness).

We again compared the performance of our shrinkage model with (1) a correct

parametric model, (2) an incorrect parametric model (first order Markov model) and

(3) a saturated model (with diffuse priors). Our shrinkage model uses the shrinkage
priors proposed in Section 3.3.2.

We considered small (500), moderate (2000), large (5000) and very large

(1,000,000) sample sizes for each treatment arm; for each sample size, we simulated

500 datasets. We assessed model performance using mean squared error (MSE).

In Table 3-2 (sample size 1,000,000 not shown), we report the MSE's of P[Yj =

1|S > j, Yi-_, Z = z] and P[S = j 11S > j 1, Y -, Z = z] averaged over all

j and all Yj_1 (see columns 3 and 4, respectively). We also report the MSE's for pj

(see columns 6-12). For reference, the MSE's associated with the true data generating

model are bolded. At all sample sizes, the shrinkage model has lower MSE's for the









rates of depression at times 3-7 than the incorrectly specified parametric model and the

saturated model. Our simulation results show that as sample size goes to infinity (e.g.

very large, 1,000,000), both the shrinkage model and the saturated model converge to

the true values of pj, whereas the incorrectly specified parametric model yields biased

estimates.

In addition, the MSE's for the parameters pj in the shrinkage model compare

favorably with those of the true parametric model for all sample sizes considered,

despite the fact that the shrinkage priors were specified to shrink toward an incorrect

model.

3.5 Application: Breast Cancer Prevention Trial (BCPT)

Table 3-1 displays the treatment-specific drop-out and intermittent missing rates in

the BCPT. By the 7th study visit (36 months), more than 30% of patients had dropped

out in each treatment arm, with a slightly higher percentage in the tamoxifen arm.

3.5.1 Model Fit

We fit the shrinkage model to the observed data using R, with multiple chains of

5000 iterations and 1000 burn-in. Convergence was checked by examining trace plots of

the multiple chains. We defined g(.) in the priors for the hyperparameters (Equation 3-3)

to be the maximum function. To compute the expected number of subjects e()- and
z,j,y- 2'Y
e Y in Equation (3-4), we assigned a point mass prior at 0.5 to all mV m(), (a)

and ri(-) (which corresponds to Unif(0, 1) priors on a z,y-2, and 7z,y, ,) and sampled

az,y-,y and 7zy-21Y using Step 1 in the algorithm described in Section 3.3.4. To avoid
data sparsity, we calculated P[S = s, Ys = Ys] using the posterior mean of ca,y- ,y and

7z, 2,y rather than the empirical probabilities.
To assess model fit, we compared the empirical rates and posterior means (with

95% credible intervals) of P[Yj = 1, S > jlZ = z] and P[S < j Z = z]. As shown in

Figure 3-1, the shrinkage model fits the observed data well.









Figure 3-2 illustrates the effect of shrinkage on the model fit by comparing the

difference between the empirical rates and posterior means of P[Yj = 1 S > j, Y,_i =

y _1, Z = z] for the tamoxifen arm (Z = 1) and j = 6, 7. We use the later time points
to illustrate this since the observed data were more sparse and the shrinkage effect

was more apparent. The empirical depression rates often reside on the boundary (0

or 1). In some cases, there are no observations within "cells", thus the empirical rates

were undefined. From the simulation results in Section 2.7, we know that the empirical

estimates are less reliable for later time points. Via the shrinkage priors, the probabilities

P[Yj = 15 > j, Yj-1 = y_-1, Yj-2 = Yj-2, Z = z] with the same yj-2 are shrunk together
and away from the boundaries. By borrowing information across neighboring cells,

we are able to estimate P[Yj = 1 S > j, Yj-l, Z = z] for all z and Yj_1 with better

precision. The differences between the empirical rates and the posterior means illustrate

the magnitude of the shrinkage effect. In the BCPT, the depression rate was (relatively)

low and there were few subjects at the later times that were observed with a history of

mostly depression at the earlier visits; as a result, the differences were larger when Yj_i

had a lot of 1's (depression).

3.5.2 Inference

Figure 3-3 shows the posterior of P[Y7 = 1|Z = z], the treatment-specific probability

of depression at the end of the 36-month follow up (solid lines). For comparison, the

posterior under MAR (corresponding to point mass priors for r at zero) is also presented

(dashed lines). The observed depression rates (i.e., complete case analysis) were 0.124

and 0.112 for the placebo and tamoxifen arms, respectively. Under the MNAR analysis

(using the elicited priors), the posterior mean of the depression rates at month 36 were

0.133 (95%C/ : 0.122, 0.144) and 0.125 (95%C/ : 0.114, 0.136) for the placebo and
tamoxifen arms; the difference was -0.007 (95%C/ : -0.023, 0.008). Under MAR,

the rates were 0.132 (95%C/ : 0.121, 0.143) and 0.122 (95%C/ : 0.111, 0.133) for
the placebo and tamoxifen arms; the difference was -0.01 (95%C/ : -0.025, 0.005).









The posterior probability of depression was higher under the MNAR analysis than the

MAR analysis since researchers believed depressed patients were more likely to drop

out (see Table 2-2), a belief that was captured by the elicited priors. Figure 3-4 shows

that under the two treatments there were no significant differences in the depression

rates at any measurement time (95% credible intervals all cover zero) under both MNAR

and MAR. Similar (non-significant) treatment differences were seen when examining

treatment comparisons conditional on depression status at baseline.

3.5.3 Sensitivity of Inference to the Priors

To assess the sensitivity of inference on the 36 month depression rates to the

elicited (informative) priors {rmin, rmed, rmax}, we considered several alternative scenarios

based on Table 1. In the first scenario, we made the priors more or less informative by

scaling the range, but leaving the median unchanged. That is, we considered increasing

(or decreasing) the range by a scale factor v to {rmed v(rmed rmin), armed, armed + V(rmax

rmed)}. In the second scenario, we shifted the prior by a factor u, {u + rmin, u + rmed, U +

rmax }.
The posterior mean and between-treatment difference of the depression rate

at month 36 with 95% Cl are given in Tables 3-3 and 3-4. None of the scenarios

considered resulted in the 95% Cl for the difference in rates of depression at 36 months

that excluded zero except for the (extreme) scenario where the elicited tamoxifen

intervals were shifted by 0.5 and the elicited placebo intervals were shifted by -0.5.

We also assessed the impact of switching the priors for the placebo and tamoxifen

arms; in this case, the posterior means were 0.135 (95% C : 0.124,0.146) and

0.123 (95% C : 0.112, 0.134) for the placebo and tamoxifen arms respectively, while the

difference was -0.012 (95% C : -0.027, 0.004).









3.6 Summary and Discussion

In this Chapter, we extended the Bayesian shrinkage approach proposed in

Chapter 2 for intermittent missingness. In addition, we reparameterized the saturated

observed data model and dramatically improved the computational efficiency.

WinBUGS can still be applied for the reparameterized model when there is no

intermittent missing data. However, with the intermittent missingness, the augmentation

step in the posterior computation requires extensive programming in WinBUGS.

Nevertheless, the approach in Chapter 2 may still be preferred in certain cases, e.g.

for directly shrinking the interaction terms.

As an extension, we might consider alternatives to the partial ignorability assumption

(Assumption 1) which has been widely used, but questioned by some (Robins, 1997).

3.7 Tables and Figures

Table 3-1. Missingness by Scheduled Measurement Time
Time Point j (Month)
1(3) 2(6) 3(12) 4(18) 5(24) 6(30) 7(36)
Tamoxifen (Total N = 5364, Overall Missing 34.94%)
Intermittent Missing 330 224 190 200 203 195
Drop-out at j 160 122 259 280 332 352 369
Cumulative Drop-out 160 282 541 821 1153 1505 1874
Placebo (Total N = 5375, Overall Missing 31.83 %)
Intermittent Missing 347 215 153 181 199 197
Drop-out at j 157 106 247 287 309 272 333
Cumulative Drop-out 157 263 510 797 1106 1378 1711















Table 3-2. Simulation Results: MSE (x103). P and T represent placebo and tamoxifen
arms, respectively.


Observed


Model Treat
Sample Size 500
True P


Parametric

Shrinkage

Saturated


Sample Size 2000
True P
T
Parametric P
T
Shrinkage P
T
Saturated P


Sample Size 5C
True

Parametric

Shrinkage

Saturated


Y

6.209
6.790
33.351
32.323
29.478
28.410
57.107
55.582

1.474
1.610
30.507
29.168
23.545
22.598
40.322
38.943


)00
P 0.594
T 0.623
P 29.983
T 28.616
P 18.83
T 18.055
P 30.071
T 29.156


pjz (Month)


R 1(3) 2(6) 3(12)


2.474
2.789
1.511
1.602
2.310
2.365
111.263
104.882

0.586
0.634
0.543
0.495
0.647
0.615
77.627
72.731

0.234
0.265
0.379
0.298
0.394
0.322
54.454
50.590


0.199
0.205
0.199
0.205
0.202
0.212
0.202
0.211

0.052
0.050
0.051
0.050
0.053
0.050
0.053
0.050

0.020
0.024
0.020
0.024
0.020
0.024
0.020
0.024


0.225
0.228
0.227
0.226
0.226
0.232
0.228
0.245

0.058
0.063
0.055
0.064
0.056
0.063
0.057
0.064

0.024
0.024
0.025
0.025
0.024
0.024
0.024
0.024


0.258
0.297
0.26
0.292
0.252
0.294
0.302
0.383

0.063
0.062
0.064
0.071
0.063
0.063
0.069
0.067

0.026
0.028
0.029
0.035
0.026
0.028
0.027
0.029


4(18) 5(24) 6(30) 7(36)


0.313
0.344
0.317
0.345
0.303
0.336
0.490
0.657

0.078
0.080
0.081
0.086
0.078
0.080
0.100
0.110

0.033
0.035
0.037
0.048
0.033
0.036
0.038
0.039


0.319
0.331
0.323
0.333
0.312
0.330
1.083
1.352

0.081
0.093
0.090
0.101
0.082
0.093
0.188
0.218

0.031
0.033
0.043
0.045
0.031
0.034
0.052
0.059


0.352
0.405
0.349
0.403
0.337
0.390
2.401
3.167

0.086
0.095
0.091
0.110
0.084
0.095
0.457
0.560

0.040
0.039
0.049
0.060
0.039
0.040
0.13
0.148


0.390
0.428
0.388
0.425
0.372
0.419
4.427
5.782

0.097
0.101
0.108
0.121
0.095
0.102
0.946
1.223

0.036
0.040
0.055
0.059
0.036
0.041
0.270
0.373









Table 3-3. Sensitivity to the Elicited Prior
Scenario (T:Tamoxifen, P:Placebo)
VT = 5, VP = 5 VT = 0.2, vP = 0.2 uT = 0.5, uP = 0.5 uT = -0.5, uP = -0.5
Treatment Percentile 10% 25% 10% 25% 10% 25% 10% 25%
Tamoxifen Minimum 0.79 0.50 1.18 1.46 1.60 1.80 0.60 0.80
Median 1.20 1.50 1.20 1.50 1.70 2.00 0.70 1.00
Maximum 1.70 2.00 1.22 1.52 1.80 2.10 0.80 1.10
P[Y7 = 1](95% CI) 0.125(0.114, 0.136) 0.125(0.114, 0.136) 0.132(0.120, 0.143) 0.117(0.107,0.128)
Placebo Minimum 0.85 0.80 1.04 1.28 1.51 1.70 0.51 0.70
Median 1.05 1.30 1.05 1.30 1.55 1.80 0.55 0.80
Maximum 1.30 1.80 1.06 1.32 1.60 1.90 0.60 0.90
P[Y7 = 1](95% CI) 0.133(0.122, 0.144) 0.133(0.122, 0.144) 0.139(0.128, 0.150) 0.125(0.114,0.135)
Difference of P[Y7 = 1](95% CI) -0.008(-0.024, 0.008) -0.007(-0.023, 0.008) -0.007(-0.023, 0.009) -0.008(-0.023, 0.007)


















Table 3-4. Sensitivity to the Elicited Prior
Scenario (T:Tamoxifen, P:Placebo)
VT = 5, VP = 0.2 vT = 0.2, vP = 5 uT = 0.5, uP = -0.5 uT = -0.5, uP = 0.5
Treatment Percentile 10% 25% 10% 25% 10% 25% 10% 25%
Tamoxifen Minimum 0.79 0.50 1.18 1.46 1.60 1.80 0.60 0.80
Median 1.20 1.50 1.20 1.50 1.70 2.00 0.70 1.00
Maximum 1.70 2.00 1.22 1.52 1.80 2.10 0.80 1.10
P[Y7 = 1](95% CI) 0.125(0.114, 0.136) 0.125(0.114, 0.136) 0.132(0.121, 0.143) 0.117(0.107, 0.128)
Placebo Minimum 1.04 1.28 0.85 0.80 0.51 0.70 1.51 1.70
Median 1.05 1.30 1.05 1.30 0.55 0.80 1.55 1.80
Maximum 1.06 1.32 1.30 1.80 0.60 0.90 1.60 1.90
P[Y7 = 1](95% CI) 0.133(0.122, 0.144) 0.133(0.122, 0.144) 0.125(0.114, 0.135) 0.139(0.128, 0.150)
Difference of P[Y7 = 1](95% CI) -0.008(-0.024, 0.008) -0.008(-0.023, 0.008) 0.007(-0.008, 0.023) -0.022(-0.037, -0.006)









Placebo


Pj


Tamoxifen


3 12 24
Time Points


36


3 12 24
Time Points


Solid and dashed lines represent the empirical rate of P[Yj = 1, S > jZ = z]
and P[S < jZ = z], respectively. The posterior means of
P[Yj = 1, S > jZ = z] (diamond) and P[S < jZ = z] (triangle) and their
95% credible intervals are displayed at each time point.


Figure 3-1


36


I I I I I I I


I I I I I I I


";""~,
































(A) Conditional Depression Rate


o Empirical A Model-based, Empirical Undefined
* Model-based Posterior Mean of Depression Rate


0 a *
0 g
S*
o o o .. o
0 0 *. *: c
o *
0o .
*. oA "* 00

0 o
0 0 0 00 0


0 .. *
0 0 0 c*
* ** -* -* A
.- -


mo


o .,* *. .**" 7A A *
o o o
0 0 0


I 0
o o o o ooo o coo o, oc


(B) Shrinkage Difference


0 0


o o 00o o ooo oo 0 c coo o 0
S0 oO0 o o o0 0
oo o 00 o o o o o
0 -- -- ------- ---
0a 0 0 00 0
0o o0
0 0 0 0 0


0 0 0 0 0
0


0oo


[ Depressed I Not Depressed


Yu


I lll ll lll ll lll ll ll lll ll lll ll ll lll ll lll ll lll ll ll lll ll lll ll ll lll ll -I
I-II-I-I-I-I-I-I-I:I:I:I:Iiiiiiiiiiiiiiiiiiiiiiiiiiiii:::::::::: iiiiiiiiiiiiiiiiiiiiiiiii:i:i:iiiiiiiiiiiiiiiiiiiiiiiiiii iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii


Historical Response Pattern


Figure 3-2.


(A) The empirical rate and model-based posterior mean of

P[Yj = 11S > j, Yj_ = yj,, Z = z] for Z = 2 and j = 6, 7. (B) The difference

between the empirical and model-based posterior mean of the depression

rate. The x-axis is the pattern of historical response data Y,_I.


* Co c,


000ooo


oo0000


oo00 0


SCo 0 0 0 Cm oo 00 0


0 0 oo0 0 0


o O


CO 0


a 0






















O
00
o




CO





a>
C)o




C)
N




CM

0



Figure 3-3.


.10 0.11 0.12 0.13 0.14 0.15
Depression Probability


Posterior distribution of P[Y7 = 1|Z = z]. Black and gray lines represent
tamoxifen and placebo arms, respectively. Solid and dashed lines are for
MNAR and MAR, respectively.



























T
T
T T T T





HLH


IL J I I I J
J_


18
Time Points


Figure 3-4.


Posterior mean and 95% credible interval of difference of P[Yj = 1 Z = z]
between placebo and tamoxifen arms. The gray and white boxes are for
MAR and MNAR, respectively.









3.8 Appendix

Gibbs sampler for posterior computation: In the first step of the Gibbs sampler,

we draw, for each subject with intermittent missing data, from the full conditional of

Y1is given az, 7y, mV), i), m ), '), Yobs, S, Rs and Z = z. The full conditional
distribution can be expressed as


P[Yis = Ymisa, z, mz ), m(), rY Yobs = Yobs, S = s, Rs = rs, Z = z]

P[Ymis = Ymis, Yobs = Yobs, S = s|5z, 7z, m?, 7 m) ), Z = z]
Eal y1 P[Y is= Ymis, Yobs = Yobs, S = s |a, -y, rz 7 ), r 7 ), Z = z]

where the right hand side can be expressed as a function of yis, Yobs, S and az and 7z.

In the second step, we draw from the full conditional of m ) given {Yis} a, -z,

77 ) m( 77 {Yobs}, {S}, {Rs} and {Z} = z, where the notation {} denotes data 9
for all the individuals on the study. The full conditional can be expressed as
J 1
] m f {Y) msl' (y, {Yobs}, {S}, {Z} = z)
j=2 y0

where


f(Z) Y Yis}, az {Yobs}, {S}, {Z} = z)

oc B(Z(aY.2; mz jy/ zJ'(1 )/ zy)
Si Yi I=Y Y
Si.-1 ij-1Y
i.Zi=z

and B(a; c, d) is a Beta density with parameters c and d.

In the third step, we draw from the full conditional of mr) given {Ymi}, a~, Tz, '7),

miz), 77, {Yobs}, {S}, {Rs} and {Z} = z. The full conditional can be expressed as
J 1
f f f (m ( {Ymis 1}, -Y, lz( {Yobs}, {S}, {Z} = z)
j=2 y=









where


f(m {Ys 7z(y, {Yobs}, {S}, {Z} = z)

,j oc n B(is z,Y- 1,yi.Y, z 1, ,y ,y
S,1-i1, Yj -1y
i.Z,-z

In the fourth step, we draw from the full conditional of r) given ma), {Yis} az,

"z, m 7, {Yobs}, {S}, {Rs} and {Z} = z. The full conditional can be expressed as
J 1
7 7f(Iz {'Ymis}, a, mz, {Yobs}, {S}, {Z} = z)
j=2 y=0

where


f(() {Y ms} m z {Yobs}, {S}, {Z} = z)

g (E y (zjy
( g,(E z ja Z 'j /.Z ,= z


In the fifth step, we draw from the full conditional of j1) given m), {Ymis}, az, 7

m, 7 {Yobs}, {S}, {Rs} and {Z} = z. The full conditional can be expressed as
J 1
I f(i j I y is' 7z, m' {Yobs}, {S}, {Z} = z)
j=2 y=0

where

g (E(7)-l,y,
f {jy mis 7z, 'i ,, z {Yobs}, {S}, {Z} = z) oc (E 2,) 7
((z ,J-ly) iz-l ,y + 1)2

x(n B(zy-1Y, y (z) (1 z- 1Y) z-)A
-Sj-1,Y j-1=,
S -i 1,YI,j 1- Y
I.Z,=z

To draw from the full conditionals for steps two to five, we use slice sampling (Neal,

2003).









In the sixth step, we draw from the full conditional of az given {Ymis}, 7z, m(), ,),

mz, 7 7), {Yobs}, {5}, {Rs} and {Z} = z. The full conditional can be expressed as
J 1
S f(az, -2,y mis}Zj,y 'Z, {Yobs}, {S}, {Z} = z)
j=2 y=0 all y 2

where


f(Czj,y -2, Y is}, {Yobs}, {S}, {Z} = z)
SB(, ; (a) /() m / + o(- (1 M( /() + n(- -o (a)
B( zJYj-2 2,Y; M'z y/z*j,y zj,yj 2,y' zJY)/ zJy+ zj,yJ 2,y z,j,y- 2,y

n ()- is the number of subjects with S > j, Yj1 = y, Y-2 = -2 and Z = z, and
z,j,yj-2,Y
o0ZJ)- is the number of subjects with S > j, Y1 = y, Yi-2 = Y2, Z = z and Yj = 1.
Finally, we draw from the full conditional of 7, given {Ymis}, z, r ), 7, r(), -),

{Yobs}, {S}, {Rs} and {Z} = z. The full conditional can be expressed as
J 1
I I 7zf -l,y-,yl { mis} ) _, (', {Yobs}, {S}, {Z} = z)
j=2 y=O all y 2

where


f(,-,yj 2,{Ymis}, m -, {Yobs}, {5}, {Z} = z)
B (( m, / 7(') -1- M) (7 () (^ -(()7) _o(7)
SB(z-i-,yj 2, j-,y zJ+-1, Ozj-1,y 2,y' L 1y-mz')/ J-I'y nzj-1,y-2,y z,yj-"YJ- 2,y

n) is the number of subjects with S > j 1, y = y, Yj-2 = Yj-2 and Z = z,
z,j-1,yj 2'y
and o()- is the number of subjects with S =- 1, Yj1 = y, Yi-2 = Yi-2 and Z = z.
ZJYJ 2'Y
Proof of Theorem 2: We will show that, with no intermittent missingness and
the shrinkage priors (Equation 3-2), the posterior distributions of P[Yj = 1|S >

j, Y-1, Yi-2, Z] and P[S =j-15S >j-1, Yj1, Y-2, Z], modeled as a zy-2,y and 7z,yj 2',
(re: Equation 3-1), for all Z, j and Yj_- are consistent under the condition that all the
true values of the probabilities are in the open interval (0, 1).








We use n() to denote the number of subjects with S > j, YV_1 = yj_1, Yj-2 -
zj,y'j
Y-2, Z = z, and use n()- denote the number of subjects with S > j 1, Y_ =
y1, Yj-2 = Y -2, Z = z. The condition that all the true values of az,y- 2,y and 7z,y ,y are
in the open interval (0, 1) holds if and only if as the number of subjects goes to infinity,
all the n()- and n() go to infinity. This indicates that to prove Theorem 2, we can
zj,yj zjyJ 1
just prove that given Y = {Y1,..., Yk} and N = {ni,..., nk), where Yj Bin(nj, pj),

pj ~ Beta(a, /) forj = ..., k and (a, /) has proper prior density r(a, 3), the posterior
distributions for all pj are consistent as all nj go to infinity, with regard to the distributions
under Yj ~ Bin(n, pj).
To see this, note that

7r(pj Y, N) oc ( (1 p)"-Y 7(piJa,/)7(a,/3)dad3.

Note that

P(j)= j r(pj Ia,/3)7(ao,/3)dod/3= p-1(1 pj),-'(a, 3)dad/3 = M < oc,

r(PJ) is 0(1). As nj and Yj go to infinity (this occurs since pj C (0, 1)), by the Bernstein-von
Mises theorem, we have

S( n(p pj)|l Y, n,) N (0, p (1- p7))

in distribution, which further implies that

E[pj|Y, N] pj a.s.

Var[pjlY, N] 0 a.s..









CHAPTER 4
A NOTE ON MAR, IDENTIFYING RESTRICTIONS, AND SENSITIVITY ANALYSIS IN
PATTERN MIXTURE MODELS
4.1 Introduction

For analyzing longitudinal studies with informative missingness, popular modeling

frameworks include pattern mixture models, selection models and shared parameter

models, which differ in the way the joint distribution of the outcome and missing data

process are factorized (for a comprehensive review, see Daniels and Hogan, 2008;

Hogan and Laird, 1997b; Kenward and Molenberghs, 1999; Little, 1995; Molenberghs

and Kenward, 2007). In this paper, we will concern ourselves with pattern mixture

models with monotone missingness (i.e., drop-out). For pattern mixture models with

non-monotone (i.e., intermittent) missingness (details go beyond the scope of this

paper), one approach is to partition the missing data and allow one (or more) or the
partitions to be ignored given the other partition(s) (Harel and Schafer, 2009; Wang

et al., 2010).

It is well known that pattern-mixture models are not identified: the observed data

does not provide enough information to identify the distributions for incomplete patterns.

The use of identifying restrictions that equate the inestimable parameters to functions of

estimable parameters is an approach to resolve the problem (Daniels and Hogan, 2008;

Kenward et al., 2003; Little, 1995; Little and Wang, 1996; Thijs et al., 2002). Common

identifying restrictions include complete case missing value (CCMV) constraints and

available case missing value (ACMV) constraints. Molenberghs et al. (1998) proved that

for discrete time points and monotone missingness, the ACMV constraint is equivalent

to missing at random (MAR), as defined by Rubin (1976) and Little and Rubin (1987).

A key and attractive feature of identifying restrictions is that they do not affect the fit

of the model to the observed data. Understanding (identifying) restrictions that lead

to MAR is an important first step for sensitivity analysis under missing not at random

(MNAR) (Daniels and Hogan, 2008; Scharfstein et al., 2003; Zhang and Heitjan, 2006).









In particular, MAR provides a good starting point for sensitivity analysis and sensitivity

analysis are essential for the analysis of incomplete data (Daniels and Hogan, 2008;

Scharfstein et al., 1999; Vansteelandt et al., 2006b).

The normality of response data (if appropriate) for pattern mixture models is

desirable as it easily allows incorporation of baseline covariates and introduction of

sensitivity parameters (for MNAR analysis) that have convenient interpretations as

deviations of means and variances from MAR (Daniels and Hogan, 2008). However,

multivariate normality within patterns can be overly restrictive. We explore such issues in

this paper.

One criticism of mixture models is that they often induce missing data mechanisms

that depend on the future (Kenward et al., 2003). We explore such non-future dependence

in our context here and show how mixture models that have such missing data

mechanisms have fewer sensitivity parameters.

In Section 4.2, we show conditions under which MAR exists and does not exist

when the full-data response is assumed multivariate normal within each missing

pattern. In Section 4.3 and Section 4.4 in the same setting, we explore sensitivity

analysis strategies under MNAR and under non-future dependent MNAR respectively.

In Section 4.5, we propose a sensitivity analysis approach where only the observed

data within pattern are assumed multivariate normal. In Section 4.6, we apply the

frameworks described in previous sections to a randomized clinical trial for estimating

the effectiveness of recombinant growth hormone for increasing muscle strength in

the elderly. In Section 4.7, we show that in the presence of baseline covariates with

time-invariant coefficients, standard identifying restrictions cause over-identification of

the baseline covariate effects and we propose a remedy. We provide conclusions in

Section 8.








4.2 Existence of MAR under Multivariate Normality within Pattern
Let Y be a J-dimensional longitudinal response vector with components scheduled
to be measured at time points tj (j {1,..., J}); this is the full data response. Without
loss of generality, we assume Y1 is always observed. Let S = s denote the number
of observed responses (s = 1, 2,..., J) corresponding to the follow up time ts. Let Yj
denote the historical response vector (Yi, Y2,... Yj). Finally, we define ps(.) = p(.IS =
s).
We show that MAR does not necessarily exist when it is assumed that

YIS = s N(pi(s), '()) (4-1)

for all s.
To see this, we introduce some further notation. Let


,(-)(j) = E(YjIS = s) = s)


and

.(s)(j) = Var(YS = s)= J

w r (2 ) 2(S2)
where '(j) = E(YjlIS = s), P) (j) = E(Y|IS = s), )(j) = Var(Y;_| S =s),
S)(j) = Var(YjIS = s), I)(j) = Cov(Y_ Y| S = s) and s- (j) is the transpose of


Lemma 4.1. For monotone dropout, under the model given in (4-1), define

s)(j) (S) :( )(j)

(s)() = = p^\)j) (Ss^O)

K3 22 21 (j) ( 11 12








The condition that for a given the conditional distributions ps(yj Y1 y) are identical for
all s is equivalent to K(s)(j) ,s) (j) and Ks) (j) being constant in s.

Proof. The proof is trivial since

Y9| Yi ~ N (Ks2 1 (j-1Y 3'3



In other words, if the condition in Lemma 4.1 is satisfied, then there exists a conditional
distribution p>s(ylyji,) such that ps(yj Y-i) = p>s(yjlYj-1) for all s > j. We now state a
theorem that gives the restrictions on the model given in (4-1) for MAR to exist.
Theorem 4.2. For pattern mixture models with monotone dropout, under the model
given in (4-1), identification via MAR constraints exists if and only if A(") and X(s) satisfy
Lemma 4.1 for s > j and 1 < j < J.

Proof. By Lemma 4.1, we only need to show that MAR constraints exist if and only if for
all 1 j.
Molenberghs et al. (1998) proved that MAR holds if and only if

Pk(~jlY~j-) = PJ( j = i P(S = s)
pk(YJY-) P(y y~ ~) Y j Ps (yj I,-) (4-2)
s .P(S =s)
s-j s-j
for all j > 2 and k < j. These conditionals are normal distributions since we assume YIS
is multivariate normal.
Suppose that there exists j such that ps(yjlyj-1) is not the same for all s > j. Then
from (4-2), pj(yj|yj-_) will be a mixture of normals whereas Pk(yjlyj-1) will be a normal
distribution. Thus, Molenbergh's condition will not be satisfied, i.e. the MAR constraints
do not exist.
On the other hand, if for all 1 < j < J, the conditional distributions ps(yjly j_) are
identical for s > j, then Pk(Yjly j-) and p.(yj|yj-) are both normally distributed and the
identification restrictions Pk(YjlYj-1) = Pj(y Jyj-1) will result in MAR. D









So, a default approach for continuous Y, assuming the full data response is
multivariate normal within pattern, does not allow an MAR restriction (unless the
restrictions in Theorem 4.2 are imposed).
We now examine the corresponding missing data mechanism (MDM), SLY. We use
"~" to denote equality in distribution.

Corollary 4.3. For pattern mixture models of the form (4-1) with monotone dropout,
MAR holds if and only if S |Y SI Y1.

Proof. Since Y1 is always observed (by assumption), SlY S| Y implies that

SIYmis, Yobs ~ SIYobs, where Ymis and Yobs denote the missing and observed data
respectively. This shows that MAR holds.
On the other hand, MAR implies that

p(S = slY) = p(S = slYobs) = p(S = s Ys).

By Theorem 4.2, we have that MAR holds only if for all 1 < j < J, the conditional
distributions ps(y jlYj-) are identical for s > j. Thus, under MAR


Pk(Yjlj-1) = p (YiYj-1) = Ps(yjlYj-1)

for all j > 2, k < j and s > j. This implies that for all j > 2
J
p(yj yj-) = -p,(ylj y_,)p(S = s) = ps(yjlj-,)
S-1
for all s.
Therefore,

p(S = slY) = p(S = s|y,) = p(S = s)
p(ys)
ps(ysys-)- ) ... p(y2 y ))p(yI1)
p(Y s-1) ... P(Y2 Y ) (1)
P(Y) (S= s) = p(S = sly).
-- X p b -- --- --











Thus, the implicit MDM is very restrictive and does not depend on the entire history, Ys.
We now show connections to missing completely at random (MCAR) and other common
identifying restrictions.
Corollary 4.4. For pattern mixture models of the form (4-1) with monotone dropout,
MCAR is equivalent to MAR if p, (yi) = p(yi) for all s.

Proof. First, MCAR implies MAR. Second, in the proof of Corollary 4.3, we showed that
MAR holds if

p(S = slY) = p(S = s).
p(ys)
P(Yi)

Thus under the assumption that ps(yi) = p(yi), MAR implies that p(S = slY) = p(S =
s), i.e. MCAR. D

Corollary 4.5. For pattern mixture models of the form (4-1) with monotone dropout,
MAR constraints are identical to complete case missing value (CCMV) and nearest-
neighbor constraints (NCMV).

Proof. By Theorem 4.2, the MAR constraints imply


Pj(YjlYj-i) = PJ(Yjlyj-i) = P^j(YjlYj-J-

Therefore for all k < j, the MAR constraints

Pk(YJYij-) -= P(y~jlj-2)

are identical to CCMV restrictions

Pk(YjlYj-1) = PJ(yjj-1)

and to NCMV restrictions

Pk (Yj(yj-i) = pjyj-y )

=









The results in this section were all based on specifying the mixture model in (4-1)
and demonstrate that MAR only exists under the fairly strict conditions given in Theorem
1.

4.3 Sequential Model Specification and Sensitivity Analysis under MAR

Due to the structure of p(s) and I(s) under MAR constraints as outlined in Section
4.2, we propose to follow the approach in Daniels and Hogan (2008, Chapter 8) and
specify distributions of observed Y within pattern as:

ps(y1) ~ N(p~S),S)) 1 < s (4-3)
p,(yj|yj-) N( (Ij) ) 2!j)s 2
(/ j)) 2 < < s < J

where j = {1, 2,... j 1}. Note that by construction, we assume ps(yIy j_-) are
identical for all j < s < J. Consequently, we have ps(y~ Yj-i) = P(y~lyj-i, S > s), denoted

as ps(yjlYj-1).
Corollary 4.6. For pattern mixture models of the form (4-1) with monotone dropout,
identification via MAR constraints exists if and only the observed data can be modeled
as (4-3).

Proof. Theorem 4.2 shows that identification via MAR constraints exists if and only if
conditional distributions ps(yjlYj-1) are identical for s > j and j > 2. That is, for observed
data, we have
Ps(yjlyj-,) N( (>-j) 7(1IJ))

D

Corollary 4.6 implies that under the multivariate normality assumption in (4-1) and
the MAR assumption, a sequential specification as in (4-3) always exists.
We provide some details for MAR in model (4-1) (which implies the specification
in (4-3) as stated in Corollary 4.6) next. Distributions for missing data (which are not









identified) are specified as:


Ps(YjlYj-_) ~ N( ) < s
The conditional mean structure of () and p) is parameterized as follows:
1i-- 1
j-1

j-1
1*L:- o + Z PI y,
/=1
To identify the full-data model, the MAR constraints require that


Pk(Yj j-) -= -p(y.y j-1)

for k < j, which implies that

WC) (J) and O ) = 1J)
pjj- = jj- an L j-

for 2 < j < J. Since the equality of the conditional means need to hold for all Y, this

further implies that the MAR assumption requires that

/3) = (j)


for 0 < < J.

The motivation of the proposed sequential model is to allow a straightforward

extension of the MAR specification to a large class of MNAR models indexed by

parameters measuring departures from MAR, as well as the attraction of doing

sensitivity analysis on means and/or variances in normal models.

For example, we can let


-/o = A + /l) and log 0j) = A) + log oJ)








for all j > 1 and 0 < / < j. Sensitivity analysis can be done on these A parameters that
capture the information about the missing data mechanism. For example, in a Bayesian
framework, we may assign informative priors elicited from experts to these sensitivity
parameters A. Note in general we may have separate AU) and AW) for each pattern s
(s < j), but in practice it is necessary to limit the dimensionality of these (Daniels and
Hogan, 2008). Indeed, we could make A0) and AS) independent of j to further reduce
the number of sensitivity parameters.
To see the impact of the A parameters on the MDM, we introduce notation A0)

0A + A) Yi and then for k
Y I Yil, S k N ( + Lj )

The conditional probability (hazard) of observing the first s observations given at least s
observations is then given by:

P(S = s|Y) 0s) (y (s))2 { eA)
Iogp > = log P(S =s) + + +
P(S > slY) 2 2a(s) 2
1-/ s+l
(Y- -0 -A>0 )2 j (Y1 -(k))2
S '2 (>1 /) -log P(S = k)(C ) 2 exp 2 k)
2e 0 T 2a(k)
k (Y ~ ))2 J1 1 1() )2
exp (>/) x U (eA )2 p r ( 2
/=s+l /11-1 I=k+l /1-1

In general the MDM depends on Yj, i.e. MNAR. However, one might want hazard at
time ts to only depend on Ys+ in which case we need to have different distributions and
assumptions on [Yj| Y _, S = k] for k 2, as shown in the next section.

4.4 Non-Future Dependence and Sensitivity Analysis under Multivariate
Normality within Pattern
Non-future dependence assumes that missingness only depends on observed data
and the current missing value, i.e.

[S = slY] [S = sl Ys+],









and can be viewed as a special case of MNAR and an extension of MAR (Kenward

et al., 2003). Kenward et al. (2003) showed that non-future dependence holds if and

only if for each j > 3 and k

Pk(YjIYj-i) = P1-1i(yY1Yj-1).

An approach to implement non-future dependence within the framework of

Section 4.3 is as follows. We model the observed data as in (4-3). For the conditional

distribution of the current missing data (Ys+,), we assume that


ps(ys| Ay ) ~ N (/3 s + ( A( s+)) e "S 2 < s < J
( 1/= 1
and for the conditional distribution of the future missing data (Ys+2 ..., Yj), we assume

that

Ps(ylyj-i) = pj-1((yJlYj-) 2 < s < -1 -1,

where

Si ( ) (S j 1) p(S > j)
p(S >J 1)i( p(S >J 1) ).

Note that by this approach, although the model for future missing data is a mixture

of normals, the sensitivity parameters are kept the same as in Section 4.3 (Aj) and As),

j = 2,..., J and / = 0,... j 1). In addition, this significantly reduces the number of
potential sensitivity parameters. For J-dimensional longitudinal data, the total number of

sensitivity parameters, (2J3 + 3J2 + J)/6 J is reduced to (J2 + 3J 4)/2; for J=3 (6),

from 11 (85) to 7 (25). Further reduction is typically needed. See the data example in

Section 4.6 as an illustration. If all the remaining sensitivity parameters are set to zero,

then we have

Ps(ys+lys) = p>s+i(ys+llys), 2 < s < J

and

Ps(yjlYy-1) = P9j(y|Yjy-1), 2 < s < 1 < J- 1,









which implies


Ps (Yj|yj-) = p^j(Yily^-1)

for all s < j, i.e. MAR.

4.5 MAR and Sensitivity Analysis with Multivariate Normality on the
Observed-Data Response

If we assume multivariate normality only on observed data response, YobsIS instead

of the full data response, YIS, we can weaken the restrictions on ps(yj yj_-) for s > j and

allow the MDM to incorporate all observed data under MAR (cf. Corollary 4.3).

For example, we may specify distributions YobsIS as follows:

Ps(yj) ~ N(ps),S)) 1 << sJ

p,(y, y-1) ~ N( () L ) ) 2
where
j-1
(S) R(s) 3(5)y
/ 1
To identify the full-data model, recall the MAR constraints imply that
JP(S k)
Ps(y |Yj-1) = P(Y |y-1) = P(S > ) Pk(YjlYj-1) (4-4)
k-j

for s < j, which are mixture of normals. For sensitivity analysis in this setting of mixture

of normals, we propose to introduce sensitivity parameters A, (location) and A, (scale)

such that for s J A) (1 e ^, (k)
Ps(yjlyj-1) = e- A -j,kPk( )-- --- yj-1) (4-5)
k~j
k-j eACT

where jk = P(k). The rationale for this parameterization is that each pk( IYj-1) in the

summation will have mean A) + (k) and variance e2An~ ) 1. To reduce the dimension

of the sensitivity parameters, we could make A0) and A0) common for allj (namely A,

and A,).









In this set up, we have


(s),MNAR Aj) j,k (k)
Ij- jl~k-
k-j


and


(s),MNAR e2A) (k) -i (k)
j- J lj-i
k-j


j(k)j 2
k=j-


where

=(k ))k 2 (k)
k-j k-j

Note that _/ does not depend on k) for k j,..., J.

Under an MAR assumption (4-4), for [Yjl Yj_, S = s], we have

J
(s),MAR I (k)
k-j


and
_(s),MAR (k ) (k) ) (,(s),MAR)2
k-j
Therefore, under MNAR assumption (4-5), the two sensitivity parameters control the

departure of the mean and variance from MAR in the following way,


(s),MNAR -A ) (s),MAR
Jj|- Pj1-


and (s),MNAR 2 ) (s),MAR (1 e2
and -e (1j e IW


with A) being a location parameter and AW) being a scale parameter. The MNAR class

allows MAR when A) = AO) = 0 for allj > 2.

By assuming non-future dependence, we obtain


p(S =j- 1) e_A) y,
Ps(|yj1Y-i) = P-1i(|y-i1) p(S >j 1) ekk
k-k

SP(S >j- PkYj Yj-1)
k-j 1)


A (1- e^ )-(



2








for the future data and (4-5) for the current data U( = s + 1). The number of sensitivity
parameters in this setup is reduced from J(J 1) to (J 2)(J 1); so, for J = 3 (6), from

6 (30) to 2 (20). Further reductions are illustrated in the next section.

4.6 Example: Growth Hormone Study

We analyze a longitudinal clinical trial using the framework from Sections 4.4 and

4.5 that assume multivariate normality for the full-data response within pattern (MVN)

or multivariate normality for the observed data response within pattern (OMVN). We

assume non-future dependence for the missing data mechanism to minimize the number

of sensitivity parameters.

The growth hormone (GH) trial was a randomized clinical trial conducted to estimate

the effectiveness of recombinant human growth hormone therapy for increasing muscle

strength in the elderly. The trial had four treatment arms: placebo (P), growth hormone

only (G), exercise plus placebo (EP), and exercise plus growth hormone (EG). Muscle

strength, here mean quadriceps strength (QS), measured as the maximum foot-pounds

of torque that can be exerted against resistance provided by a mechanical device,

was measured at baseline, 6 months and 12 months. There were 161 participants

enrolled on this study, but only (roughly) 75% of them completed the 12 month follow up.

Researchers believed that dropout was related to the unobserved strength measures at

the dropout times.

For illustration, we confine our attention to the two arms using exercise: exercise

plus placebo (EP) and exercise plus growth hormone (EG). Table 4-1 contains the

observed data.

Let (Y1, Y2, Y3) denote the full-data response corresponding to baseline, 6 months,

and 12 months. Let Z be the treatment indicator (1 = EG, 0 = EP). Our goal is to draw

inference about the mean difference of QS between the two treatment arms at month









12. That is, the treatment effect


0= E(Y31Z =1)- E(Y31Z = 0).

In the full-data model for each treatment under non-future dependence, there are

seven sensitivity parameters for the MVN model: {A0 A2) A) A3) A3) A(2), ()}

and four sensitivity parameters for OMVN model: {A 2), A 3), A2), A3)} (see Appendix).

For the MNAR analysis, we reduced the number of sensitivity parameters as follows:

* A) and A ) do not appear in the posterior distribution of E(Y31Z) for Z = 0, 1, and
thus are not necessary for inference on 0.

We restrict to MNAR departures from MAR in terms of the intercept terms by
assuming A2)= 3)= A3) 0.

We assume the sensitivity parameters are identical between treatments.

This reduces the set of sensitivity parameters to {A2), A3)} for MVN model and

{A2), A()} for the OMVN model.
There are a variety of ways to specify priors for the sensitivity parameters A2) and
A(3)
0 '

2 = E(Y2 Y, S = 1)- E(Y2 Y,, S > 2)

A3 = E ( Y3Y2, Y,, S = 2)- E(Y31Y2, Y, S = 3).

Both represent the difference of conditional means between the observed and

unobserved responses. A(2) and A() have (roughly) the same interpretation as A2)

and A3) respectively.

Based on discussion with investigators, we made the assumption that dropouts

do worse than completers; thus, we restrict the A's to be less than zero. To do a fully

Bayesian analysis to fairly characterize the uncertainty associated with the missing

data mechanism, we assume a uniform prior for the A's as a default choice. Subject

matter considerations gave an upper bound of zero for the uniform distributions. We set









the lower bound using the variability of the observed data as follows. We estimate the
residual variances of Y2 Y1 and Y3 Y2, Y1 using the observed data; we denote these

by -2|1 and 7312,1 respectively. We use the square root of these estimates as the lower
bounds. In particular, we specify the priors for {A2), A3)} as well as {A(2), A()} as
Unif(9(r)), where

9(-) = /2 O] x L -1/2 0]

Based on the estimates 7-2 18 and 7 /2 = 12, the priors are [-18, 0] x [-12, 0]

for {A2), A3)} and for {A(2), A3)}. For the other parameters in the full-data model, we
assign N(0, 106) for mean parameters (p, /) and N(0, 104) for variance parameters (o);
see the Appendix for further details on the models.
We fit the model using WinBUGS, with multiple chains of 25, 000 iterations and 4000
burn-in. Convergence was checked by examining trace plots of the multiple chains.
The results of the MVN model, OMVN model, and the observed data analysis are
presented in Table 4-2. Under MNAR, the posterior mean (posterior standard deviation)
of the difference in quadriceps strength at 12 months between the two treatment arms
was 4.0 (8.9) and 4.4 (10) for the MVN and OMVN models. Under MAR the differences
were 5.4 (8.8) and 5.8 (9.9) for the MVN and OMVN models, respectively. The smaller
differences under MNAR were due to quadriceps strength at 12 months being lower
under MNAR due to the assumption that dropouts do worse than completers. We
conclude that the treatment difference, 0 was not significantly different from zero.

4.7 ACMV Restrictions and Multivariate Normality with Baseline Covariates

In this section, we show that common identifying restrictions over-identify estimable
parameters in the presence of baseline covariates with time invariant coefficients and
offer a solution.
4.7.1 Bivariate Case

Consider the situation when Y = (Yi, Y2) is a bivariate normal response (J = 2)
with missing data only in Y2, i.e. S = 1 or 2. Assume there are baseline covariates X









with time invariant coefficients a. We model p(S) and p(Y|S) as follows:


siX ~ Bern(q(X))

Y S = s N(p(s)(X), i(S))


s= 1,2


where
(S) + Xa(S) (S) (S)
I (= and I(s) = ll 2
(s) + Xa(s) 2

MAR (ACMV) implies the following restriction


[Y2 Y1, S = 1] [Y2 Y1, S= 2].

This implies that conditional means, E(Y21 Y, X, S = s) for s = 1, 2, are equal, i.e.


(1) (2)
21) + Xa() + L(Y Xa(l)) = 2) + Xa (2)+ (Y1
0I I1I.


For (4-6) to hold for all Y, and X, we need that

a(1) _= (2)


However, both a'l) and a(2) are already identified by the observed data Y1. Thus the

ACMV (MAR) restriction affects the model fit to the observed data. This is against the

principle of applying identifying restrictions (Little and Wang, 1996; Wang and Daniels,

2009).

To resolve the over-identification issue we propose to apply MAR constraints on

residuals instead of directly on the responses. In the bivariate case, the corresponding

restriction is


[Y2- Xa(l)| Y X~(1), X, S = 1] [Y2 Xa(2) 1 X X(2),X, S = 2] (4-7)


12) Xa(2)). (4-6)









Since the conditional distributions


7 (s) / (s)
(S-) 21 2
[Y2 Xa(S)| Yi Xa(s), X, S = s] ~ N + (sYi) 1 () s) )-2s)

are independent of a(s) for s = 1, 2, the restriction (4-7) places no constraints on ca ),

thus avoiding over-identification.

The MDM corresponding to the ACMV(MAR) on the residuals is given by

log P(S = 1Y, X) og ) 1 (1 B)2X(a(2)a(2)T (1)(1)T)XT
P(S = 21Y, X) 1 (X) 2-*

2(1 B)(Y2 A(Y))X((2)- (1)) g
2 log (2)
r11

(Y Xoa(2) -/2))2 1 Xa() /))2
( ))2 + 2(O11)2

where a* = a72) (2, B = and A(Y1) = 22) + 1 -() (2)) Hence by assuming
22i' B = ai + )ll
11 11 11
MAR on the residuals, we have the MDM being a quadratic form of Y1, but independent

of Y2 if and only if c(2) = (1). In other words, assumption (4-7) implies MAR if and

only if c(2) = (1). So in general, MAR on residuals does not imply that missingness

in Y2 is MAR. However, it is an identifying restriction that does not affect the fit of the

model to the observed data. CCMV and NCMV restrictions can be applied similarly to

the residuals.

Remark: In general, p~s) can be replaced by pi(s) if there are subject-specific

covariates with time varying coefficients.

4.7.2 Multivariate Case

To incorporate baseline covariates in the multivariate case and apply similar MAR

restrictions, we specify the model for the observed data as follows:

ps(yi\X) ~ N(/) + Xa(s), aS)) 1 < s < J

Ps(yy_1,X) ~ N( ) 2








where
j-1
Ij : J- + +- 4- +8
/= 1
For the missing data, the conditional distributions are specified as


P,(ylyj-_) ~ N( l ,o7) 1
where
j-1
j S = ( s) + a() / (s) Xa(s)). (4-9)
/ 1
The conditional mean structures in (4-8) and (4-9) induce the following form for the
marginal mean response

E(Yj S = s) = U(s) + Xa()

where U s) is a function of intercept (e.g. (j)) and slope (e.g. /<)) parameters

from p( but not X or a. This marginal mean response reflects the fact that X is the
baseline covariates and a is its time-invariant coefficient. This form is also necessary

for resolving over-identification of a via the MAR on the residuals restrictions as shown
later.

Note that since Y1 is always observed, (s) (1 < s < J) are identified by the

observed data. However, in the model given by (4-8) and (4-9), there is a two-fold

over-identification of as") under MAR:

1. For MAR constraints to exist under the model given in (4-1), (s) as defined in
(4-8) must be equal for 2 < j < s < J and for all X. This requires that a(S) a* for
2
2. MAR constraints also imply that as defined in (4-9) must be equal to (J) for
1 < s Similar over-identification exists under CCMV and NCMV.









Similar to the bivariate case, to avoid the over-identification, we again use the MAR

on the residuals restriction,


Pk(Y Xa(k) ly Xa(k),... Yj-1 Xa(k), X) =

P(S )s)Ps(yJ -Xa(S)| y Xa(,... yj_1 -Xa(S),X) k s=-
With the conditional mean structures specified as (4-8) and (4-9), the MAR on the

residuals restriction places no assumptions on ca ")

The corresponding MDM is

P(S = slY, X) P(S = s)p(Y|S = s,X)
log = log
P(S> slY, X) P(Y, S >sX)
P(S = s)pY(YjIYj-1,X)p,(Yj-1 YJ-2,X) ... s(Y1lX)
= log
E =s PI(YJIYJ-1 X)pi(YJ-1 YJ-2, X) ...pi(YX)P(S = I)
It does not have a simple form in general. However, if a ") = a* for all s, then

P(S = slY, X) p(Y1 X)P(S = s)
log P = log j
P(S > sY,X) /=s P5p(Y1 X)P(S = I)

i.e. the MDM only depends on Y, and X. Otherwise, the missingness is MNAR.

4.8 Summary

Most pattern mixture models allow the missingness to be MNAR, with MAR as a

unique point in the parameter space. The magnitude of departure from MAR can be

quantified via a set of sensitivity parameters. For MNAR analysis, it is critical to find

scientifically meaningful and dimensionally tractable sensitivity parameters. For this

purpose, (multivariate) normal distributions are often found attractive since the MNAR

departure from MAR can be parsimoniously defined by deviations in the mean and

(co-)variance.
However, a simple pattern mixture model based on multivariate normality for the

full data response within patterns does not allow MAR without special restrictions

that themselves, induce a very restrictive missing data mechanism. We explored this









fully and proposed alternatives based on multivariate normality for the observed data

response within patterns. In both these contexts, we proposed strategies for specifying

sensitivity parameters.

In addition, we showed that when introducing baseline covariates with time invariant

coefficients, standard identifying restrictions result in over-identification of the model.

This is against the principle of applying identifying restriction in that they should not

affect the model fit to the observed data. We proposed a simple alternative set of

restrictions based on residuals that can be used as an 'identification' starting point for an

analysis using mixture models.

In the growth hormone study data example, we showed how to reduce the number

of sensitivity parameters in practice and a default way to construct informative priors for

sensitivity parameters based on limited knowledge about the missingness. In particular,

all the values in the range, were weighted equally via a uniform distribution. If there

is additional external information from expert opinion or historical data, informative

priors may be used to incorporate such information (for example, see Ibrahim and Chen,

2000; Wang et al., 2010). Finally, an important consideration in sensitivity analysis and

constructing informative priors is that they should avoid extrapolating missing values

outside of a reasonable range (e.g., negative quadriceps strength).

4.9 Tables

Table 4-1. Growth Hormone Study: Sample mean (standard deviation) stratified by
dropout pattern.
Dropout Number of Month
Treatment Pattern Participants 0 6 12
EG 1 12 58(26)
2 4 57(15) 68(26)
3 22 78(24) 90(32) 88(32)
All 38 69(25) 87(32) 88(32)
EP 1 7 65(32)
2 2 87(52) 86(51)
3 31 65(24) 81(25) 73(21)
All 40 66(26) 82(26) 73(21)









Table 4-2. Growth Hormone Study: Posterior mean (standard deviation)


Treatment
EP


EG


Month
0
6
12
0
6
12


Difference at 12 mos.


Observed
Data
66(9.9)
82(18)
72(3.8)
69(7.3)
87(16)
88(6.8)
12(7.8)


MAR Analysis


MVN
66(6.0)
82(5.9)
73(4.9)
69(4.9)
81(6.8)
78(7.2)
5.4(8.8)


OMVN
66(6.0)
81(8.2)
73(6.1)
69(4.9)
82(7.7)
79(7.8)
5.8(9.9)


MNAR
MVN
66(6.0)
80(6.0)
72(5.0)
69(4.9)
78(7.1)
76(7.5)
4.0(8.9)


Analysis
OMVN
66(6.0)
80(8.3)
71(6.1)
69(4.9)
79(8.0)
76(8.0)
4.4(10)








4.10 Appendix
Missing data mechanism under missing not at random and multivariate
normality: The MDM in Section 4.3 is derived as follows:

P(S = slY) P(S = s)ps(Y)
log = log
P(S > slY) P(Y, S > s)
g P(S = s)ps(YI) R=2 Ps(Y/ IY-)
= log
E=, sP(S = k)pk(Y) H/22 Pk(y -)}
-log 2 2P>1(Y1 Y-1)P(S= s)ps(Yi) Hns +iP(Y/ vi-1)
= log
I/=2 pP>(Y / Y-1 k=s {P(S = k)pk(Y) HJ =sl Pk(y -1)}
l P(S = s)p,(YI) H/s J+lPs(Y/i /-1)
= log
k{=s P(S = k)Pk(Y) /=s 1 Pk( YI 1k -1)
(5S) (?s(i ^)) 2 i> 0a" + ((0i))}
=log P(S = s) +-+1 )
2 2(s) 2 2e1 ek(>)

log {P(S k)(k) -exp { k)2 exp} (Y2_ />1)2
k=s 1 /=+s1 -/|-1

0 ( Y / 1 / ILI/ 1 ) 2 2
S) a(>)
I=k+l 2ze & /1 i-1

Mean and variance of [Yj Yj-, S = s]: The mean and variance of [Yj| Y-,, S = s]
under MNAR assumption in Section 4.5 are derived as follows:
Sa ), (k)
N W y.- AUO) (1 ( e )/
(s),MNAR E(Y Yj-1, S = s) = e- Vj,k yJPk( O1 ej-1)
/(e^
k=j
= e ) jk A) W ) dy
e =k-e) +A (1 e- e') Pk(Y)lyJ1)e dy
k=j


k=j









and


o)MNAR = Var(lY, S = s)
J (k)
Sy). U) (1 e^)pY-
= e -A jk Pk( j-1)dy E2(Yj -i S = k)
k-j e

Se- =j,k 8e y2 )l'j) Pk(Yj -yj_1)e)( ) dy;
k=j

(k 2/( ) ) 2
-- e +y (1 je j-

k-j
J J
-jkE((j) 1' S k) (k) "2
e2A kE((y)21 S = k) +j + (1 e) )2
k=j k-j

+ 2e k(A" (1 e)lj) )E( y~*y,j_ S k)- A + ,k II-)
k=j k-j

e 2A =,k (o1j) + ( 1k) 2 2 ,W (k)
I e j z^-i 4- -i-'lj- ) I =j,k-


S(1 e2)) (I) 2 ( kJ )
k=j k-=j

y, ---(1-e -
where yj =- -
e )

Full-data model for the growth hormone example (Section 4.6): We specify

a pattern mixture model with sensitivity parameters for the two treatment arms. For

compactness, we suppress subscript treatment indicator z from all the parameters in the

following models.

For missing pattern 5, we specify


S ~ Mult(o)

with the multinomial parameter ( = ( 02, 3), s = P(S = s) for s E {1, 2, 3}, and

E=1 s = 1.








For the observed response data model [Yobs|S], we specify the same MVN and
OMVN model for [YIIS] as follows:

Y, S = 1 N(,', a))
Y, lS = 2 Q N a(/ )2))
Y l S = 3 ~ N (/j3 3)),

For MVN model, we specify

Y2 Y1, S 2} N ((2) + 2 (2)
Y2 Y1, S = 3
&\ o3) + (3) y+ n(3) y ^(
v2, 7(> 3)"
Y3 Y2, Y1, S = 3 N ~>3 + 31 + ~ 3)2. '03) 3'

For OMVN model, we specify
//?(2) + (2) Y 7(2)
Y2 Y, S = 2 + -21 u2N 2
S(3) + (3) y '7(3)
Y2 Y1, = 3 ~ N 2,0 2,1 2
{9(3) + (3) Y1 + (3) Y2, _(3)"
Y3 Y2, Y1, S = 3 ~ N 3,0 3,1 3,2 313,

For missing response data model [Ymis Yobs, S], we specify for MVN model

Y2 1, S = 1 N 0 -(2) 2) 2) )Y1, e() 2 2)
Y3 Y2, Y1, S = 2 N (>3) A3 + (>3) + A3))Y (/3) + A3))2, eA()3)

Y31 Y2, Y,, S = 1 ~ 0 3 3) + 01 32 31)
+ 2 N (/3) + A3) + (/3) + 3)) + (/>3) + A3)), (-3)'7
+ 2 + (N3 0 0 1 1 2 Y2, 313 a313


100








For OMVN model, we specify


03 (L (Au2) + /(3) + /(3)Yi, 'aL)(3)
Y2 1, S = 1 ~ 2 3 2)e) (2)_

+ 2 N(A(2) +/(2) +/(2) 3 e1 2
+ 42+43 N U 2,0 2,1 Y1, 21,)

L( + (3) + (3) y+ (3) 7) (3)
Y31 Y2, Y1, S = 2 ~ N ( u3) 3,1Y 3,2 Y2, e -313,
3 N (4,3 + 3) y + 93)'y2, 73)
Y3 Y2 1, S = 1 N 3,2 313
2 (L (3) + (3) + (3) y+ (3) y '7eA (3)
+ 2 3N ( u3) '3,1 v2 + p 23 e, )313


MAR on residuals constraints: Here we show that in multivariate case (Section
4.7.2), the MAR on the residuals restriction puts no constraints on c() .
Let [ZjIS] [Yj Xo(S)]. The MAR on the residuals constraints are

SP(S > s)
Pk(ZjZj-1,X) = P(S > jPs(zjljiX).
s=-
Note that
J
ps(yj, ... yJ) = ps(yi) ps(yl y-i)
/=2
Sexp (> Xa(S) > -)(Yt (>ai) Xa(s))
= Ps(Yi)
/=2 227a

Thus,
J
PS(z, ... ) ps(zi) H ps(z/|z/-1)
/=2
exp (Z i t/ P/ ~_ _(l t _t/) 2
= ps (z2) -)-
/=2 2/I/

We can further show that

[Z, Zj_, S = s, X] ~ N p ) +( )(Zi I)), ) ,
/=i









which is independent of s. Therefore,


P(S > A)
S ( p(zyz)-I' X)--N
s --J


( P() +


j-1
/. (Z/
/=1


Similarly, we may derive that


pk(ZjZZj-1,X) = N (P) +


j-1
/ (Z


The constraints (4-10) thus imply

(S) =
(-1


which places no restrictions on )

which places no restrictions on (s).


102


( )) (o J)
P/1 7j( ) .


/s)) )
P/ 7JO- )


P/1 ))









CHAPTER 5
DISCUSSION: FUTURE APPLICATION OF THE BAYESIAN NONPARAMETRIC AND
SEMI-PARAMETRIC METHODS
5.1 Summary

In this dissertation, we explored the utility of Bayesian shrinkage methods for the

analysis of incomplete longitudinal data with informative missingness that includes

both drop-out and intermittent missingness. We considered two different saturated

model parameterizations and corresponding parameter space reduction strategies. By

simulation, we showed that the proposed methods outperform a saturated model without

parameter shrinkage and a misspecified parametric model while being very competitive

with the correct parametric model. Furthermore, the incomplete data analysis framework

developed in this dissertation allows straightforward incorporation of experts' opinions

in the form of informative priors, as well as flexible sensitivity analysis. Second, we

explored conditions necessary for identifying restrictions that result in missing at random

(MAR) to exist under a multivariate normality assumption and strategies for identifying

sensitivity parameters for sensitivity analysis or for a fully Bayesian analysis with

informative priors, with application to a longitudinal clinical trial. In the following sections,

we will discuss further applications of Bayesian nonparametric and semi-parametric

methods we are currently considering.

5.2 Extensions to Genetics Mapping

For identifying genes involved in human diseases and their inheritance, multi-generational

family-based designs (including parents and offspring) have become popular (Farnir

et al., 2002; Liu et al., 2006). In such analysis, researchers are often interested in

simultaneously estimating linkage that describes the tendency of certain alleles to

be inherited together, and linkage disequilibrium (LD) that measures the non-random

association between different markers. However, if the LD is close to zero, the linkage

recombination fraction is hard to estimate (or not estimable at all). We use a two marker

scenario to illustrate this dilemma.


103









Consider two markers A and B, each with two alleles A and a and B and b,

respectively. Four possible haplotypes, [AB], [Ab], [aB], and [ab], can be formed by

the two markers, with the frequencies denoted as p1i, pio, Pol, and poo, respectively. We

use p, 1 p, q and 1 q to denote the allele frequencies of A, a, B and b, respectively.

Then, we have the following relationships:

Pl = pq + D

Pio = p(1 q)- D
(5-1)
Poi = (1 p)q D

Poo = (1- p)(1- q) + D,

where D is the LD parameter. By simple algebra, we can show that


D = PuPoo PioPol.

Now, let r denote the linkage recombination fraction, the frequency that a chromosomal

crossover will take place between the two markers A and B during meiosis. To estimate

r, only offspring with at least a double-heterozygous parent, i.e. parent with genotype

Aa/Bb, contribute to the likelihood (proportionally) by


PinPoor + ploPo(1 r).

Therefore, r is not estimable when D is zero.

One possible solution is to incorporate more markers in the linkage analysis. By

doing this, the number of parents with no less than two heterozygous markers increases.

Consequently, more offspring contribute to the likelihood for estimating the linkage

recombination fraction. However, the number of haplotype frequencies to be estimated

also increases (exponentially) as the number of markers increases. Bayesian shrinkage

methods can be applied to address this problem.









5.3 Extensions to Causal Inference
5.3.1 Causal Inference Introduction

For clinical studies of terminal diseases, incomplete data is often caused by

death. If a response is truncated by death, the missingness should not be classified

as "censored" because "censoring" implies the masking of a value that is potentially

observable. It is also not appropriate to handle these cases by traditional non-mortality

missing data approaches such as models assuming ignorability or models assuming

non-ignorable missing data mechanism which implicitly "impute" missing responses.

In randomized studies with no missingness, causal relationships are well established

and the treatment causal effects can be estimated directly (Rubin, 1974). However, in

non-randomized trials or in presence of missing data, these methods are limited if the

research interest demands estimation of causally interpretable effects.

To define causal effects, we first introduce the concept of potential outcomes, which

are sometimes used exchangeablely with the term counterfactual (but not always, see

Rubin, 2000). The use of term "potential outcome" can be traced at least to Neyman

(1923). Neyman used "potential yields" Uik to indicate the yield of a plot k if exposed to

a variety i. Rubin (1974) defines the causal effect of one treatment, E, over another, C,

for a particular unit as the difference between what would have happened if the unit had

been exposed to E, namely Y(E), and what would have happened if the unit had been

exposed to C, namely Y(C).

Using potential outcomes, Frangakis and Rubin (2002) introduce a framework for

comparing treatment effects based on principle stratification, which is a cross-classification

of units defined by their potential outcomes with respect to (post)treatment variables,

such as treatment noncompliance or drop-out. The treatment comparison adjustment for

posttreatment variables is necessary because such variables encode the characteristics

of both the treatment and the patient. For example, a patient with diagnosed cancer in a

cancer prevention trail may have depression caused by the treatment or by the diagnosis


105









itself. Furthermore, the comparison may be meaningless without the posttreatment

variable adjustment. For example, a response such as depression is no longer defined

after a non-response cause such as death happens.

A stratum s/ is defined by the joint potential response S(Z) with respect to the

posttreatment variable Z (e.g., Z = 0, 1). For example, let S(Z) be the potential survival

status and let S(Z) = 1 and 0 denote alive and dead respectively. Then the stratum

s/- = {S(0) = 1, 5(1) = 1} defines the patients who will (potentially) survive on both

arms.

A stratum is unaffected by treatment. That is, for subject i, i e s/ or i s/ does not

depend on the actual treatment i is assigned. Consequently, the treatment effect defined

as the difference between

{Yi(0)|i c/} and {Y(1)| E c}

is a causal effect.

On the contrary, a standard adjustment for posttreatment variables uses the

treatment comparison between


{Y,(0)|Si(0) = s} and {Y,(1)lSi(1) = s}.

Such an estimand is not a causal effect when S(z) is affected by z, which results in the

fact that the group of patients with S(0) = s is not identical to the group of patients with

S(1) = s.

Consistent with Frangakis and Rubin's framework, Rubin (2000) introduced the

concept of survivors average causal effect (SACE), that is the causal effects of treatment

on endpoints that are defined only for survivors, i.e. the group of patients who would live

regardless of their treatment assignment.

Within the principal strata framework, the identification of SACE or other principal

stratum causal effects usually depends on untestable assumptions. To address the


106









uncertainty of the untestable assumptions, sensitivity analysis is carried out, and/or

bounds of the causal effects are derived. For example, Zhang and Rubin (2003) derived

large sample bounds for causal effects without assumptions and with assumptions such

as monotonicity on death rate on different treatment arms. Gilbert et al. (2003) used a

class of logistic selection bias models to identify the causal estimands and carried out

sensitivity analysis for the magnitude of selection bias. Hayden et al. (2005) assumed

"explainable nonrandom noncompliance" (Robins, 1998) and outlined a sensitivity

analysis for exploring the robustness of the assumption. Cheng and Small (2006)

derived sharp bounds for the causal effects and constructed confidence intervals to

cover the identification region. Egleston et al. (2007) proposed a similar method to

Zhang and Rubin (2003), but instead of identifying the full joint distribution of potential

outcomes, they only identify features of the joint distribution that are necessary for

identifying the SACE estimand. Lee et al. (2010) replaced the common deterministic

monotonicity assumption by a stochastic one that allows incorporation of subject specific

effects and generalized the assumptions to more complex trials.

5.3.2 Data and Notation

The following notation is defined for a random individual. When necessary, we use

the subscript i to denote data for the ith individual.

We consider a controlled randomized clinical study with treatment arm (Z = 1) and

control arm (Z = 0). A longitudinal binary outcome Y is scheduled to be measured at

visits = 1,..., J, i.e. Y = (Y, ..., Yj) is a J-dimensional vector. Let R = (R, ..., Rj) be

the missing indicator vector with Rj = 1 if Yj is observed and Rj = 0 if Yj is missing. We

assume the missingness is monotone.

We assume there are multiple events that will cause drop out for a patient on this

trial, and categorize the events as non-response events (e.g. death) and missing events

(e.g. withdraw of consent). We assume that non-response events may happen after the


107









occurrence of a missing event but not vice versa. We further assume all the events are
observed.

Let C denote the "survival" time for a patient. That is, C = c implies that a

non-response event happened to the patient between visit c and c + 1 and caused

the patient to drop out on and after visit c + 1. Let Rc = {R1, ..., Rc} be the missing data

indicator recorded prior to patient drop-out that is caused by a non-response event.

We use Yj = (Yi,..., Yj) to denote the historical data up to time point and Yobs to

denote the observed response data. We use Y(z), C(z), Rc(z) and Yobs(Z) to denote

the value of Y, C, Rc and Yobs of a patient, possibly counterfactual, if the patient is

assigned to treatment z.

The full data F of a patient thus consists of

{Z, C(O), Yc(o)(O), C(1), YC()(1))},

and the observed data 0 contains

{Z, C(Z), R,(Z), Yobs(Z)}.

One goal is to measure the causal effect of treatment by estimating the treatment

effect for those who would not have dropped out due to non-response reasons under

either treatment or control. That is, to estimate the "survivor" average causal effect


SACEj = E(Yj(1) Y(0) C(0) > j, C(1) > j)

= P(Y,(1) = 1 C(0) > j, C(1) >j) P(,(O) = 11C(0) > j, C(1) >j) (5-2)

for allj. Note that the group of patients of interest {C(O) > j, C(1) > j} form a principal

stratum.

5.3.3 Missing Data Mechanism

To make causal inferences, we first need to estimate P,z,c = E[YI C = c, Z = z] for

all z and j < c, which are not identifiable without unverifiable assumptions. We make the


108









same partial missing at random assumption as in Chapter 3, Section 3.2, that

R, I Yc ZZ, C, Yobs.

We have shown in Chapter 3 that P*,z,c is identified by the observed data under this

partial missing at random assumption.

5.3.4 Causal Inference Assumption

The causal effect (5-2) is not identifiable from the observed data

0= {Z,C(Z), R(Z),Yobs(Z)}.

We propose the following assumptions to identify boundaries for the causal effect:

I Stable Unit Treatment Value Assumption (SUTVA).
Let Z = (Z, ..., ZN) be the vector of treatment assignment for all the patients.
SUTVA means
Z, = Zf = (Y,(Z,), C,(Z,)) = (Y,(Zf), C,(Z;)),
regardless of what Z is. That is, the potential outcome of patient i is unrelated to the
treatment assignment of other patients. The allows us to write Yi(Z) and Ci(Z) as
Yi(Zi) and Ci(Zi) respectively.

II Random Assignment
The treatment assignment Z is random, i.e.

Z L (Y(0), Y(1), C(0), C(1)),

which holds in a controlled randomized clinical trial. This assumption allows us to
write Yj(z) and Cj(z) as YjIZ = z and Cj\Z = z respectively.

III Mean Monotonicity

E[Y'(z)lC(z) = c, C(1 z) = t] < E[Yj(z) C(z) = c', C(1 z) = t']

for c' > c > j, t' > t, z = 0, 1.
This assumption provides an ordering of the mean potential response at visit under
treatment z for all the principal cohorts of individuals who would be on study at visit j
under treatment z. The means are assumed to not be worse for cohorts who remain
on-study longer under both treatments. That is, the individuals who would be last
seen at time c' (c' > j) under treatment z and time t' under treatment 1 z will not
have a worse mean potential response at time under treatment z than individuals


109









who would last be seen at a time less than c' (but still greater than or equal to time)
under treatment z or a time less than t' under treatment 1 z.
The mean monotonicity assumption is often reasonable in clinical studies. For
example, in a cardiovascular stent implantation trial, multiple endpoints including
all-cause mortality free survival and 6-minute walk test score are used to evaluate
the effectiveness of the device. Since the two endpoints are positively correlated, it
is plausible to assume that patients will potentially perform better with their 6-minute
walk tests if they have a longer survival time, i.e. remain on the study longer.
We introduce some further notation

1. pc,t = P(C(O) = c, C(1)= t).

2. 7,, = P(C(z)= c).

3. mj,z,,t = E[Yj(z)IC(z) = c, C(1 z) = t] (c >j).

4. ijz, = E[Yj(z)|C(z) = c] (c >j).
Note that under Assumption II (randomization), both 7, = P(C(z) = c) = P(C = cIZ =
z) and p,z,c = E[Yj(z)|C(z) = c] = E[Yj C = c, Z = z] are identified by the observed
data under the partial missing at random assumption.
The causal effect of interest SACEj can be expressed as

SACEj = E[Y(1) Y.(o) C(1) >j, C(o) > j]

= E[Yj(1) Y(0), C(1) > j, C(O) > j]P(C(1) > j, C(O) >j)- (5-3)

= Pc,t ,,c, m oct)Pct .
c=-j t-j c-j t-j
The boundaries of SACEj in (5-3) can be found subject to the following restrictions:

1. O < pc,t < 1 for all c and t, and t = 1,
c I t 1Pc,t= 1,
2. Ec l Pc,t = 7l,t and c,t =7c,
3. t=1 Pct = 70j) C'
3. ZE1t mjo,c,tPc,t = pjo,c (c > j) and 1, mj,z,t,cPc,t = pj,i,t ((t > j) for all j,

4. mj,c',t, > mj,z,c,t for c' > c > j, t' > t and all j and z.


110









where restrictions (1)-(2) are for ps,t to be a distribution with (identified) marginals,
restriction (3) satisfies the identified conditional means, and restriction (4) comes from

Assumption III.

Finding the boundaries of the SACE i.e. finding the minimum and the maximum

of the objective function (5-3), can be approximated (by ignoring the normalizing

constant) as a non-convex quadratically constrained quadratic problem (QCQP) (Boyd

and Vandenberghe, 1997, 2004). For a QCQP, a standard approach is to optimize a

semidefinite relaxation of the QCQP and get lower and upper bounds on local optimal of

the objective function (Boyd and Vandenberghe, 1997).

The uncertainty of the estimated bounds can be characterized in a Bayesian

framework. The joint posterior distribution of the bounds can be constructed by

implementing the optimization for each posterior sample of *,z,c, identified by the

algorithm proposed in Section 5.3.3. The result can be presented as in Figure 5-1.

A study decision might be based on the mode of the posterior joint distribution of the

bounds.

5.3.5 Stochastic Survival Monotonicity Assumption

Under Assumption II, the marginal distributions P(C(0)) and P(C(1)) of P(C(0) =

c, C(1) = t) (re: 0 and 1 represent the placebo and treatment arm, respectively)

are identified. However, the joint distribution remains unidentified without further

assumption. We outline several Assumptions that will identify pc,t beyond the identified

margins (Figure 5-2). These assumptions, when reasonable, will simplify the optimization

of the objective function and yield more precise results.

1. P(C(O) = m C(1) = c) = qn-mP(C(O) = n C(1) = c) for c > m > n and q > 1.
That is, given a patient will "survive" until time point c on the treatment arm, the
probability the patient will "survive" until time point n 1 is q times the probability that
the patient will "survive" until n for n < c on the placebo arm. The parameter q is a
sensitivity parameter.

2. P(C(1) = t C(O) = c) = 0 for c > t.









That is, the chance that a patient will "survive" longer on the placebo arm than
the active treatment arm is zero. This assumes the lower-triangle (excluding the
diagonal) in Figure 5-2 is zero.

These assumptions may be incorporated in the optimization Bayesian framework to

improve the precision of the posterior joint distribution of the bounds.

5.3.6 Summary of Causal Inference

We have outlined an approach to estimate the causal effect of treatment where

there is dropout due to non-response reasons such as death. We also outlined an

approach for posterior inference. We need to further explore point estimation of the

intervals/bounds for the causal effect and characterizing their uncertainty in a Bayesian

framework.

5.4 Figures


112






























Contour Plot


CO --------------------------------------------




(0 -











006




02













-4 -2 0 2 4

Lower Bound


Figure 5-1. Contour and Perspective Plots of a Bivariate Density


113


Density Plot





























(1,1)



(2,2) a (c,t)
c






(t,c) (c,c)







(J,J)


Figure 5-2. Illustration of pc,t









REFERENCES


Albert, P (2000). A Transitional Model for Longitudinal Binary Data Subject to
Nonignorable Missing Data. Biometrics 56, 602-608.

Albert, P., Follmann, D., Wang, S., and Suh, E. (2002). A latent autoregressive model for
longitudinal binary data subject to informative missingness. Biometrics 58, 631-642.

Baker, S. (1995). Marginal regression for repeated binary data with outcome subject to
non-ignorable non-response. Biometrics 51, 1042-1052.

Baker, S., Rosenberger, W., and DerSimonian, R. (1992). Closed-form estimates for
missing counts in two-way contingency tables. Statistics in Medicine 11, 643-657.

Birmingham, J. and Fitzmaurice, G. (2002). A Pattern-Mixture Model for Longitudinal
Binary Responses with Nonignorable Nonresponse. Biometrics 58, 989-996.

Boyd, S. and Vandenberghe, L. (1997). Semidefinite programming relaxations of
non-convex problems in control and combinatorial optimization. communications,
computation, control and signal processing: a tribute to Thomas Kailath .

Boyd, S. and Vandenberghe, L. (2004). Convex optimization. Cambridge Univ Pr.

Cheng, J. and Small, D. (2006). Bounds on causal effects in three-arm trials with
non-compliance. Journal of the Royal Statistical Society: Series B (Statistical
Methodology) 68, 815-836.

Christiansen, C. and Morris, C. (1997). Hierarchical Poisson regression modeling.
Journal of the American Statistical Association pages 618-632.

Daniels, M. (1999). A prior for the variance in hierarchical models. Canadian Journal of
Statistics 27,.

Daniels, M. and Hogan, J. (2000). Reparameterizing the Pattern Mixture Model for
Sensitivity Analyses Under Informative Dropout. Biometrics 56, 1241-1248.

Daniels, M. and Hogan, J. (2008). Missing Data in Longitudinal Studies: Strategies for
Bayesian Modeling and Sensitivity Analysis. Chapman & Hall/CRC.

DeGruttola, V. and Tu, X. (1994). Modelling Progression of CD4-lymphocyte Count and
its Relationship to Survival Time. Biometrics 50, 1003-1014.

Diggle, P. and Kenward, M. (1994). Informative Drop-out Longitudinal Data Analysis.
Applied Statistics 43, 49-93.

Egleston, B. L., Scharfstein, D. O., Freeman, E. E., and West, S. K. (2007). Causal
inference for non-mortality outcomes in the presence of death. Biostatistics 8, 526 -
545.


115









Escobar, M. and West, M. (1995). Bayesian density estimation and inference using
mixtures. Journal of the American Statistical Association pages 577-588.

Fan, J. and Li, R. (2001). Variable Selection Via Nonconcave Penalized Likelihood and
Its Oracle Properties. Journal of the American Statistical Association 96, 1348-1361.

Farnir, F, Grisart, B., Coppieters, W., Riquet, J., Berzi, P., Cambisano, N., Karim, L., Mni,
M., Moisio, S., Simon, P., et al. (2002). Simultaneous mining of linkage and linkage
disequilibrium to fine map quantitative trait loci in outbred half-sib pedigrees: revisiting
the location of a quantitative trait locus with major effect on milk production on bovine
chromosome 14. Genetics 161, 275.

Faucett, C. and Thomas, D. (1996). Simultaneously modelling censored survival data
and repeatedly measured covariates: a Gibbs sampling approach. Statistics in
Medicine 15,.

Fisher, B., Costantino, J., Wickerham, D., Redmond, C., Kavanah, M., Cronin, W., Vogel,
V., Robidoux, A., Dimitrov, N., Atkins, J., Daly, M., Wieand, S., Tan-Chiu, E., Ford, L.,
Wolmark, N., other National Surgical Adjuvant Breast, and Investigators, B. P. (1998).
Tamoxifen for prevention of breast cancer: report of the National Surgical Adjuvant
Breast and Bowel Project P-1 study. Journal of the National Cancer Institute 90,
1371-1388.

Fitzmaurice, G. and Laird, N. (2000a). Generalized linear mixture models for handling
nonignorable dropouts in longitudinal studies. Biostatistics 1, 141-156.

Fitzmaurice, G. and Laird, N. (2000b). Generalized Linear Mixture Models for Handling
Nonignorable Dropouts in Longitudinal Studies. Biostatistics 1, 141-156.

Fitzmaurice, G., Molenberghs, G., and Lipsitz, S. (1995). Regression Models for
Longitudinal Binary Responses with Informative Drop-Outs. Journal of the Royal
Statistical Society. Series B. Methodological 57, 691-704.

Follmann, D. and Wu, M. (1995). An approximate generalized linear model with random
effects for informative missing data. Biometrics pages 151-168.

Forster, J. and Smith, P. (1998). Model-Based Inference for Categorical Survey Data
Subject to Non-lgnorable Non-Response. Journal of the Royal Statistical Society:
Series B: Statistical Methodology 60, 57-70.

Frangakis, C. E. and Rubin, D. B. (2002). Principal stratification in causal inference.
Biometrics 58, 21-29.

Gilbert, P., Bosch, R., and Hudgens, M. (2003). Sensitivity analysis for the assessment
of causal vaccine effects on viral load in HIV vaccine trials. Biometrics 59, 531-541.

Green, P. J. and Silverman, B. (1994). Nonparametric Regression and Generalized
Linear Models. Chapman & Hall.


116









Harel, O. and Schafer, J. (2009). Partial and latent ignorability in missing-data problems.
Biometrika 96, 37.

Hayden, D., Pauler, D., and Schoenfeld, D. (2005). An estimator for treatment
comparisons among survivors in randomized trials. Biometrics 61, 305-310.

Heagerty, P (2002). Marginalized transition models and likelihood inference for
longitudinal categorical data. Biometrics pages 342-351.

Heckman, J. (1979a). Sample Selection Bias as a Specification Error. Econometrica 47,
153-161.

Heckman, J. (1979b). Sample selection bias as a specification error. Econometrica:
Journal of the econometric society pages 153-161.

Heitjan, D. and Rubin, D. (1991). Ignorability and coarse data. The Annals of Statistics
pages 2244-2253.

Henderson, R., Diggle, P., and Dobson, A. (2000). Joint modelling of longitudinal
measurements and event time data. Biostatistics 1,465-480.

Hogan, J. and Laird, N. (1997a). Mixture Models for the Joint Distribution of Repeated
Measures and Event Times. Statistics in Medicine 16, 239-257.

Hogan, J. and Laird, N. (1997b). Model-Based Approaches to Analysing Incomplete
Longitudinal and Failure Time Data. Statistics in Medicine 16, 259-272.

Hogan, J., Lin, X., and Herman, B. (2004). Mixtures of varying coefficient models for
longitudinal data with discrete or continuous nonignorable dropout. Biometrics 60,
854-864.

Ibrahim, J. and Chen, M. (2000). Power prior distributions for regression models.
Statistical Science pages 46-60.

Ibrahim, J., Chen, M., and Lipsitz, S. (2001). Missing responses in generalised linear
mixed models when the missing data mechanism is nonignorable. Biometrika 88, 551.

Kaciroti, N., Schork, M., Raghunathan, T, and Julius, S. (2009). A Bayesian Sensitivity
Model for Intention-to-treat Analysis on Binary Outcomes with Dropouts. Statistics in
Medicine 28, 572-585.

Kenward, M. and Molenberghs, G. (1999). Parametric Models for Incomplete Continuous
and Categorical Longitudinal Data. Statistical Methods in Medical Research 8, 51.

Kenward, M., Molenberghs, G., and Thijs, H. (2003). Pattern-mixture models with proper
time dependence. Biometrika 90, 53-71.

Kurland, B. and Heagerty, P. (2004). Marginalized Transition Models for Longitudinal
Binary Data with Ignorable and Non-lgnorable Drop-Out. Statistics in Medicine 23,
2673-2695.


117









Laird, N. (1988). Missing data in longitudinal studies. Statistics in Medicine 7,.

Land, S., Wieand, S., Day, R., Ten Have, T., Costantino, J., Lang, W., and Ganz, P.
(2002). Methodological Issues In the Analysis of Quality of Life Data in Clinical Trials:
Illustrations from the National Surgical Adjuvant Breast And Bowel Project (NSABP)
Breast Cancer Prevention Trial. Statistical Methods for Quality of Life Studies pages
71-85.

Lee, J. and Berger, J. (2001). Semiparametric Bayesian Analysis of Selection Models.
Journal of the American Statistical Association 96, 1397-1409.

Lee, J., Hogan, J., and Hitsman, B. (2008). Sensitivity analysis and informative priors
for longitudinal binary data with outcome-related drop-out. Technical Report, Brown
University.

Lee, K., Daniels, M. J., and Sargent, D. J. (2010). Causal effects of treatments for
informative missing data due to progression. To Appear in JASA.

Liang, K.-Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear
models. Biometrika 73, 13-22.

Lin, H., McCulloch, C., and Rosenheck, R. (2004). Latent pattern mixture models for
informative intermittent missing data in longitudinal studies. Biometrics 60, 295-305.

Little, R. (1993). Pattern-Mixture Models for Multivariate Incomplete Data. Journal of the
American Statistical Association 88, 125-134.

Little, R. (1994). A Class of Pattern-Mixture Models for Normal Incomplete Data.
Biometrika 81, 471-483.

Little, R. (1995). Modeling the drop-out mechanism in repeated-measures studies.
Journal of the American Statistical Association 90,.

Little, R. and Rubin, D. (1987). Statistical Analysis with Missing Data. Wiley.

Little, R. and Rubin, D. (1999). Comment on Adjusting for Non-lgnorable Drop-out Using
Semiparametric Models by D.O. Scharfstein, A. Rotnitsky and J.M. Robins. Journal of
the American Statistical Association 94, 1130-1132.

Little, R. and Wang, Y. (1996). Pattern-mixture models for multivariate incomplete data
with covariates. Biometrics 52, 98-111.

Liu, T, Todhunter, R., Lu, Q., Schoettinger, L., Li, H., Littell, R., Burton-Wurster, N.,
Acland, G., Lust, G., and Wu, R. (2006). Modelling extent and distribution of zygotic
disequilibrium: Implications for a multigenerational canine pedigree. Genetics.


118









Liu, X., Waternaux, C., and Petkova, E. (1999). Influence of Human Immunodeficiency
Virus Infection on Neurological Impairment: An Analysis of Longitudinal Binary Data
with Informative Drop-Out. Journal of the Royal Statistical Society (Series C): Applied
Statistics 48, 103-115.

Molenberghs, G., Kenward, M., and Lesaffre, E. (1997). The Analysis of Longitudinal
Ordinal Data with Nonrandom Drop-Out. Biometrika 84, 33-44.

Molenberghs, G. and Kenward, M. G. (2007). Missing Data in Clinical Studies. Wiley.

Molenberghs, G., Michiels, B., Kenward, M., and Diggle, P. (1998). Monotone Missing
Data and Pattern-Mixture Models. Statistica Neerlandica 52, 153-161.

Neal, R. (2003). Slice sampling. The Annals of Statistics 31, 705-741.

Neyman, J. (1923). On the application of probability theory to agricultural experiments.
Statistical Science 5, 465-472.

Nordheim, E. (1984). Inference from Nonrandomly Missing Categorical Data: an
Example From a Genetic Study of Turner's Syndrome. Journal of the American
Statistical Association 79, 772-780.

Pauler, D., McCoy, S., and Moinpour, C. (2003). Pattern Mixture Models for Longitudinal
Quality of Life Studies in Advanced Stage Disease. Statistics in Medicine 22,
795-809.

Pulkstenis, E., Ten Have, T, and Landis, J. (1998). Model for the Analysis of Binary
Longitudinal Pain Data Subject to Informative Dropout Through Remedication. Journal
of the American Statistical Association 93, 438-450.

Radloff, L. (1977). The CES-D Scale: A Self-Report Depression Scale for Research in
the General Population. Applied Psychological Measurement 1, 385.

Robins, J. (1997). Non-response models for the analysis of non-monotone non-ignorable
missing data. Statistics in Medicine 16, 21-37.

Robins, J. (1998). Correction for non-compliance in equivalence trials. Statistics in
Medicine 17,.

Robins, J. and Ritov, Y. (1997). Toward a curse of dimensionality appropriate(coda)
asymptotic theory for semi-parametric models. Statistics in Medicine 16, 285-319.

Robins, J., Rotnitzky, A., and Zhao, L. (1994). Estimation of regression coefficients
when some regressors are not always observed. Journal of the American Statistical
Association 89, 846-866.

Robins, J., Rotnitzky, A., and Zhao, L. (1995). Analysis of semiparametric regression
models for repeated outcomes in the presence of missing data. Journal of the
American Statistical Association 90,.


119









Rotnitzky, A., Robins, J., and Scharfstein, D. (1998a). Semiparametric regression
for repeated outcomes with nonignorable nonresponse. Journal of the American
Statistical Association 93, 1321-1322.

Rotnitzky, A., Robins, J., and Scharfstein, D. (1998b). Semiparametric regression
for repeated outcomes with nonignorable nonresponse. Journal of the American
Statistical Association 93, 1321-1322.

Rotnitzky, A., Scharfstein, D. O., Su, T.-L., and Robins, J. M. (2001). Methods for
conducting sensitivity analysis of trials with potentially nonignorable competing causes
of censoring. Biometrics 57, 103-113.

Roy, J. (2003). Modeling Longitudinal Data with Nonignorable Dropouts Using a Latent
Dropout Class Model. Biometrics 59, 829-836.

Roy, J. and Daniels, M. J. (2008). A general class of pattern mixture models for
nonignorable dropout with many possible dropout times. Biometrics 64, 538-545.

Rubin, D. (1974). Estimating causal effects of treatments in randomized and
nonrandomized studies. Journal of Educational Psychology 66, 688-701.

Rubin, D. (1976). Inference and missing data. Biometrika 63, 581-592.

Rubin, D. (1977). Formalizing subjective notions about the effect of nonrespondents in
sample surveys. Journal of the American Statistical Association pages 538-543.

Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. Wiley.

Rubin, D. B. (2000). Causal inference without counterfactuals: comment. Journal of the
American Statistical Association pages 435-438.

Scharfstein, D., Daniels, M., and Robins, J. (2003). Incorporating Prior Beliefs about
Selection Bias into the Analysis of Randomized Trials with Missing Outcomes.
Biostatistics 4, 495.

Scharfstein, D., Halloran, M., Chu, H., and Daniels, M. (2006). On estimation of vaccine
efficacy using validation samples with selection bias. Biostatistics 7, 615.

Scharfstein, D., Manski, C., and Anthony, J. (2004). On the Construction of Bounds in
Prospective Studies with Missing Ordinal Outcomes: Application to the Good Behavior
Game Trial. Biometrics 60, 154-164.

Scharfstein, D., Rotnitzky, A., and Robins, J. (1999). Adjusting for Nonignorable
Drop-Out Using Semiparametric Nonresponse Models. Journal of the American
Statistical Association 94, 1096-1146.


120









Schulman, K., Berlin, J., Harless, W., Kerner, J., Sistrunk, S., Gersh, B., Dube, R.,
Taleghani, C., Burke, J., Williams, S., et al. (1999). The effect of race and sex on
physicians recommendations for cardiac catheterization. New England Journal of
Medicine 340, 618-26.

Shepherd, B., Gilbert, P., and Mehrotra, D. (2007). Eliciting a Counterfactual Sensitivity
Parameter. American Statistician 61, 56.

Ten Have, T, Kunselman, A., Pulkstenis, E., and Landis, J. (1998). Mixed effects logistic
regression models for longitudinal binary response data with informative drop-out.
Biometrics 54, 367-383.

Ten Have, T, Miller, M., Reboussin, B., and James, M. (2000). Mixed Effects Logistic
Regression Models for Longitudinal Ordinal Functional Response Data with
Multiple-Cause Drop-Out from the Longitudinal Study of Aging. Biometrics 56,
279-287.

Thijs, H., Molenberghs, G., Michiels, B., Verbeke, G., and Curran, D. (2002). Strategies
to fit pattern-mixture models. Biostatistics 3, 245.

Troxel, A., Harrington, D., and Lipsitz, S. (1998). Analysis of longitudinal data with
non-ignorable non-monotone missing values. Journal of the Royal Statistical Society.
Series C (Applied Statistics) 47, 425-438.

Troxel, A., Lipsitz, S., and Harrington, D. (1998). Marginal models for the analysis of
longitudinal measurements with nonignorable non-monotone missing data. Biometrika
85, 661.

Tsiatis, A. A. (2006). Semiparametric theory and missing data. Springer, New York.

van der Laan, M. J. and Robins, J. (2003). Unified Methods for Censored Longitudinal
Data and Causality. Springer.

Vansteelandt, S., Goetghebeur, E., Kenward, M., and Molenberghs, G. (2006a).
Ignorance and uncertainty regions as inferential tools in a sensitivity analysis. Statis-
tica Sinica 16, 953-979.

Vansteelandt, S., Goetghebeur, E., Kenward, M., and Molenberghs, G. (2006b).
Ignorance and uncertainty regions as inferential tools in a sensitivity analysis. Statis-
tica Sinica 16, 953-979.

Vansteelandt, S., Rotnitzky, A., and Robins, J. (2007). Estimation of regression models
for the mean of repeated outcomes under nonignorable nonmonotone nonresponse.
Biometrika 94, 841.

Wahba, G. (1990). Spline models for observational data. Society for Industrial
Mathematics.









Wang, C. and Daniels, M. (2009). Discussion of "Missing Data in longitudinal studies: A
review" by Ibrahim and Molenberghs. TEST 18, 51-58.

Wang, C., Daniels, M., D.O., S., and Land, S. (2010). A Bayesian shrinkage model for
incomplete longitudinal binary data with application to the breast cancer prevention
trial. To Appear in JASA.

Wu, M. and Bailey, K. (1988). Analysing changes in the presence of informative right
censoring caused by death and withdrawal. Statistics in Medicine 7,.

Wu, M. and Bailey, K. (1989). Estimation and comparison of changes in the presence of
informative right censoring: conditional linear model. Biometrics pages 939-955.

Wu, M. and Carroll, R. (1988). Estimation and comparison of changes in the presence
of informative right censoring by modeling the censoring process. Biometrics 44,
175-188.

Wulfsohn, M. and Tsiatis, A. (1997). A joint model for survival and longitudinal data
measured with error. Biometrics pages 330-339.

Yuan, Y. and Little, R. J. (2009). Mixed-effect hybrid models for longitudinal data with
nonignorable drop-out. Biometrics (in press).

Zhang, J. and Heitjan, D. (2006). A simple local sensitivity analysis tool for nonignorable
coarsening: application to dependent censoring. Biometrics 62, 1260-1268.

Zhang, J. and Rubin, D. (2003). Estimation of Causal Effects via Principal Stratification
When Some Outcomes are Truncated by" Death". Journal of Educational and
Behavioral Statistics 28, 353.


122









BIOGRAPHICAL SKETCH

Chenguang Wang received his bachelor's and master's degrees in computer

science from Dalian University of Technology, China. Chenguang later joined the

biometry program of and received his master's degree in statistics from University of

Nebraska-Lincoln. At University of Florida, Chenguang's major was statistics while

simultaneously working for the Children's Oncology Group Statistics and Data Center

(2004-2009) and Center for Devices and Radiological Health, FDA (2009-2010).

Chenguang received his Ph.D. from University of Florida in the summer of 2010.

Chenguang's research has focused on constructing a Bayesian framework for

incomplete longitudinal data that identifies the parameters of interest and assesses

sensitivity of the inference via incorporating expert opinions. Such a framework can

be broadly used in clinical trials to provide health care professionals more accurate

understanding of the statistical or causal relationship between clinical interventions and

human diseases.

Chenguang is a member of American Statistical Association, a member of Eastern

North American Region/International Biometric Society, and a member of Children's

Oncology Group.


123





PAGE 2

2

PAGE 3

3

PAGE 4

Firstandforemost,Iwouldliketoexpressthedeepestappreciationtomyadvisor,ProfessorMichaelJ.Daniels.Withouthisextraordinaryguidanceandpersistenthelp,IwillneverbeabletobewhereIam.Iadmirehiswisdom,hisknowledgeandhiscommitmenttothehigheststandard.Ithasbeentrulyanhonortoworkwithhim.IwishtospeciallythankProfessorDanielO.ScharfsteinofJohnsHopkinsforhisencouragementandcrucialcontributiontotheresearch.Iwillalwaysbearinmindtheadvicehegave:justrelaxandenjoythelearningprocess.Iwouldliketothankmycommitteemembers,ProfessorMalayGhosh,Dr.BrettPresnell,andDr.AlmutWinterstein,whohaveprovidedabundantsupportandvaluableinsightsovertheentireprocessthroughouttheclasses,examsanddissertation.ManythanksgoinparticulartoProfessorRonglingWuofPennsylvaniaStateUniversity.FromProfessorWu,IstartedlearningwhatIwantedformycareer.IalsogratefullythankProfessorMyronChang,ProfessorLindaYoung,Dr.MeenakshiDevidasandDr.GregoryCampbellofFDA.Iamfortunatetohavetheirsupportatthosecriticalmomentsofmycareer.Finally,Iwouldliketothankmywife,myson,mysoon-to-be-bornbaby,myparentsandmyparents-in-law.ItisonlybecauseofyouthatIhavebeenabletokeepworkingtowardthisdreamIhave. 4

PAGE 5

page ACKNOWLEDGMENTS .................................. 4 LISTOFTABLES ...................................... 8 LISTOFFIGURES ..................................... 9 ABSTRACT ......................................... 10 CHAPTER 1INTRODUCTION ................................... 12 1.1MissingDataConceptsandDenitions .................... 12 1.2Likelihood-BasedMethods .......................... 16 1.3Non-LikelihoodMethods ............................ 19 1.4IntermittentMissingness ............................ 20 1.5IdentifyingRestrictionsinPatternMixtureModels .............. 22 1.6DissertationGoals ............................... 24 2ABAYESIANSHRINKAGEMODELFORLONGITUDINALBINARYDATAWITHDROP-OUT .................................. 26 2.1Introduction ................................... 26 2.1.1BreastCancerPreventionTrial .................... 26 2.1.2InformativeDrop-OutinLongitudinalStudies ............ 27 2.1.3Outline .................................. 30 2.2DataStructureandNotation .......................... 30 2.3Assumptions .................................. 31 2.4Identiability ................................... 32 2.5Modeling .................................... 35 2.6PriorSpecicationandPosteriorComputation ................ 35 2.6.1ShrinkagePriors ............................ 36 2.6.2PriorofSensitivityParameters .................... 37 2.6.3PosteriorComputation ......................... 40 2.7AssessmentofModelPerformanceviaSimulation ............. 40 2.8Application:BreastCancerPreventionTrial(BCPT) ............ 42 2.8.1ModelFitandShrinkageResults ................... 42 2.8.2Inference ................................ 43 2.9SummaryandDiscussion ........................... 43 2.10Acknowledgments ............................... 45 2.11TablesandFigures ............................... 45 5

PAGE 6

................. 54 3.1Introduction ................................... 54 3.1.1IntermittentMissingData ....................... 54 3.1.2ComputationalIssues ......................... 55 3.1.3Outline .................................. 55 3.2Notation,AssumptionsandIdentiability ................... 56 3.3Modeling,PriorSpecicationandPosteriorComputation .......... 58 3.3.1Modeling ................................. 58 3.3.2ShrinkagePrior ............................. 58 3.3.3PriorofSensitivityParameters .................... 60 3.3.4PosteriorComputation ......................... 61 3.4AssessmentofModelPerformanceviaSimulation ............. 61 3.5Application:BreastCancerPreventionTrial(BCPT) ............ 62 3.5.1ModelFit ................................ 62 3.5.2Inference ................................ 63 3.5.3SensitivityofInferencetothePriors .................. 64 3.6SummaryandDiscussion ........................... 65 3.7TablesandFigures ............................... 65 3.8Appendix .................................... 73 4ANOTEONMAR,IDENTIFYINGRESTRICTIONS,ANDSENSITIVITYANALYSISINPATTERNMIXTUREMODELS ......................... 77 4.1Introduction ................................... 77 4.2ExistenceofMARunderMultivariateNormalitywithinPattern ....... 79 4.3SequentialModelSpecicationandSensitivityAnalysisunderMAR ... 83 4.4Non-FutureDependenceandSensitivityAnalysisunderMultivariateNormalitywithinPattern .................................. 85 4.5MARandSensitivityAnalysiswithMultivariateNormalityontheObserved-DataResponse .................................... 87 4.6Example:GrowthHormoneStudy ...................... 89 4.7ACMVRestrictionsandMultivariateNormalitywithBaselineCovariates 91 4.7.1BivariateCase ............................. 91 4.7.2MultivariateCase ............................ 93 4.8Summary .................................... 95 4.9Tables ...................................... 96 4.10Appendix .................................... 98 5DISCUSSION:FUTUREAPPLICATIONOFTHEBAYESIANNONPARAMETRICANDSEMI-PARAMETRICMETHODS ....................... 103 5.1Summary .................................... 103 5.2ExtensionstoGeneticsMapping ....................... 103 5.3ExtensionstoCausalInference ........................ 105 6

PAGE 7

..................... 105 5.3.2DataandNotation ........................... 107 5.3.3MissingDataMechanism ....................... 108 5.3.4CausalInferenceAssumption ..................... 109 5.3.5StochasticSurvivalMonotonicityAssumption ............ 111 5.3.6SummaryofCausalInference ..................... 112 5.4Figures ..................................... 112 REFERENCES ....................................... 115 BIOGRAPHICALSKETCH ................................ 123 7

PAGE 8

Table page 2-1RelativeRiskstobeElicited ............................. 45 2-2PercentilesofRelativeRisksElicited ........................ 45 2-3SimulationScenario ................................. 46 2-4SimulationResults:MSE(103).PandTrepresentplaceboandtamoxifenarms,respectively. .................................. 47 2-5PatientsCumulativeDropOutRate ......................... 47 3-1MissingnessbyScheduledMeasurementTime .................. 65 3-2SimulationResults:MSE(103).PandTrepresentplaceboandtamoxifenarms,respectively. .................................. 66 3-3SensitivitytotheElicitedPrior ............................ 67 3-4SensitivitytotheElicitedPrior ............................ 68 4-1GrowthHormoneStudy:Samplemean(standarddeviation)stratiedbydropoutpattern. ........................................ 96 4-2GrowthHormoneStudy:Posteriormean(standarddeviation) .......... 97 8

PAGE 9

Figure page 2-1Extrapolationoftheelicitedrelativerisks. ..................... 48 2-2PriorDensityofjp 49 2-3ModelFit ....................................... 50 2-4ModelShrinkage ................................... 51 2-5PosteriordistributionofP[Y7=1jZ=z].Blackandgraylinesrepresenttamoxifenandplaceboarms,respectively.SolidanddashedlinesareforMNARandMAR,respectively. ............................... 52 2-6Posteriormeanand95%credibleintervalofdifferenceofP[Yj=1jZ=z]betweenplaceboandtamoxifenarms.ThegrayandwhiteboxesareforMARandMNAR,respectively. ............................... 53 3-1ModelFit ....................................... 69 3-2Shrinkage ....................................... 70 3-3PosteriordistributionofP[Y7=1jZ=z].Blackandgraylinesrepresenttamoxifenandplaceboarms,respectively.SolidanddashedlinesareforMNARandMAR,respectively. ............................... 71 3-4Posteriormeanand95%credibleintervalofdifferenceofP[Yj=1jZ=z]betweenplaceboandtamoxifenarms.ThegrayandwhiteboxesareforMARandMNAR,respectively. ............................... 72 5-1ContourandPerspectivePlotsofaBivariateDensity .............. 113 5-2Illustrationofpc,t 114 9

PAGE 10

Weconsiderinferenceinrandomizedlongitudinalstudieswithmissingdatathatisgeneratedbyskippedclinicvisitsandlosstofollow-up.Inthissetting,itiswellknownthatfulldataestimandsarenotidentiedunlessunveriedassumptionsareimposed.Sensitivityanalysisthatassessesthesensitivityofmodel-basedinferencestosuchassumptionsisoftennecessary. InChapters 2 and 3 ,wepositanexponentialtiltmodelthatlinksnon-identiabledistributionsandidentiabledistributions.Thisexponentialtiltmodelisindexedbynon-identiedparameters,whichareassumedtohaveaninformativepriordistribution,elicitedfromsubject-matterexperts.Underthismodel,fulldataestimandsareshowntobeexpressedasfunctionalsofthedistributionoftheobserveddata.Weproposetwodifferentsaturatedmodelsfortheobserveddatadistribution,aswellasshrinkagepriorstoavoidthecurseofdimensionality.Thetwoproceduresprovideresearchersdifferentstrategiesforreducingthedimensionofparameterspace.Weassumeanon-futuredependencemodelforthedrop-outmechanismandpartialignorabilityfortheintermittentmissingness.Inasimulationstudy,wecompareourapproachtoafullyparametricandafullysaturatedmodelforthedistributionoftheobserveddata.Ourmethodologyismotivatedby,andappliedto,datafromtheBreastCancerPreventionTrial. 10

PAGE 11

4 ,wediscusspatternmixturemodels.Patternmixturemodelingisapopularapproachforhandlingincompletelongitudinaldata.Suchmodelsarenotidentiablebyconstruction.Identifyingrestrictionsareoneapproachtomixturemodelidentication( DanielsandHogan 2008 ; Kenwardetal. 2003 ; Little 1995 ; LittleandWang 1996 ; Thijsetal. 2002 )andareanaturalstartingpointformissingnotatrandomsensitivityanalysis( DanielsandHogan 2008 ; Thijsetal. 2002 ).However,whenthepatternspecicmodelsaremultivariatenormal(MVN),identifyingrestrictionscorrespondingtomissingatrandommaynotexist.Furthermore,identicationstrategiescanbeproblematicinmodelswithcovariates(e.g.baselinecovariateswithtime-invariantcoefcients).Inthispaper,weexploreconditionsnecessaryforidentifyingrestrictionsthatresultinmissingatrandom(MAR)toexistunderamultivariatenormalityassumptionandstrategiesforidentifyingsensitivityparametersforsensitivityanalysisorforafullyBayesiananalysiswithinformativepriors.Alongitudinalclinicaltrialisusedforillustrationofsensitivityanalysis.Problemscausedbybaselinecovariateswithtime-invariantcoefcientsareinvestigatedandanalternativeidentifyingrestrictionbasedonresidualsisproposedasasolution. 11

PAGE 12

Theproblemofincompletedataisfrequentlyconfrontedbystatisticians,especiallyinlongitudinalstudies.Themostcommontypeofincompletedataismissingdata,inwhicheachdatavalueiseitherperfectlyknownorcompletelyunknown.Inothersituations,dataarepartiallymissingandpartiallyobserved.Examplesincluderoundeddataandcensoreddata,etc..Thistypeofincompletedataisreferredtoascoarsedata.Missingdatacanbeviewedasaspecialcaseofcoarsedata( HeitjanandRubin 1991 ).Inbothcases,theincompletenessoccursbecauseweobserveonlyasubsetofthecompletedata,whichincludesthetrue,unobservabledata.Inthisdissertation,missingdataincludingthedrop-outmissingness,inwhichcasesubjectsmissingameasurementwillnotreturntostudyatthenextfollow-up,andtheintermittentmissingness,inwhichcasethemissingvisitisfollowedbyanobservedmeasurement. LittleandRubin 1987 ,chapter4);however,thismethodisinefcient.Anothercommonapproachissingleimputation,thatis,llinginasinglevalueforeachmissingvalue.Theadvantageofsingleimputationisthatitdoesnotdeleteanyunitsandaftertheimputation,standardmethodsforcomplete 12

PAGE 13

Rubin 1987 ). Fornotation,lety=fy1,...,yJgdenotethefulldataresponsevectorofoutcome,possiblypartiallyobserved.Letr=fr1,r2,...,rJgdenotethemissingdataindicator,withrj=0ifyjismissingand1ifyjisobserved.Letxdenotethecovariates.Letyobsandymisdenotetheobservedandmissingresponsedata,respectively.Let!betheparametersindexingthefulldatamodelp(y,r),(!)betheparametersindexingthefulldataresponsemodelp(y),and(!)betheparametersindexingthemissingdatamechanismmodelp(rjy). Thecommonassumptionsaboutthemissingdatamechanismareasfollows. ( 1976 )and LittleandRubin ( 1987 )developedahierarchyformissingdatamechanismsbyclassifyingtherelationshipbetweenmissingnessandtheresponsedata. NotethatMARholdsifandonlyifp(ymisjyobs,r)=p(ymisjyobs).Theproofisasfollows: SupposeMARholds.Thenwehavep(rjymis,yobs)=p(rjyobs)

PAGE 14

Toshowthereversedirection,notethatp(rjymis,yobs)=p(r,ymisjyobs) Laird 1988 ).Thisconditioniscalledignorability( Rubin 1976 ). 1. ThemissingdatamechanismisMAR. 2. Theparametersofthefulldataresponsemodel,(!)andtheparametersofthemissingnessmodelaredistinguishable,i.e.thefulldataparameter!canbedecomposedas((!),(!)). 3. Theparameters(!)and(!)areaprioriindependent,i.e.p((!),(!))=p((!))p((!)). FulldatamodelsthatdonotsatisfyDenition 1.4 havenon-ignorablemissingness. 14

PAGE 15

Kenwardetal. ( 2003 )denedthetermnon-futuredependence. 15

PAGE 16

HoganandLaird 1997b ).Likelihood-basedmodelsformissingdataaredistinguishedbythewaythejointdistributionoftheoutcomeandmissingdataprocessesarefactorized.Theycanbeclassiedasselectionmodels,pattern-mixturemodels,andshared-parametermodels. Heckman ( 1979a b )usedabivariateresponseYwithmissingY2asanexampleandshowedthatingeneralit'scriticaltoanswerthequestionwhyarethedatamissingbymodelingthemissingnessofY2iasafunctionofobservedY1i(forsubjecti). DiggleandKenward ( 1994 )extendedtheHeckmanmodeltolongitudinalstudiesandmodeledthedrop-outprocessbylogisticregressionsuchaslogit(rj=0jrj1=1,y)=y0. Albert 2000 ; Baker 1995 ; Fitzmauriceetal. 1995 ; Heagerty 2002 ; KurlandandHeagerty 2004 ). 16

PAGE 17

( 1977 )introducedtheideaofmodelingrespondentsandnonrespondentsinsurveysseparatelyandusingsubjectivepriorstorelaterespondents'andnonrespondents'modelparameters. Little ( 1993 1994 )exploredpatternmixturemodelsindiscretetimesettings.Specically,differentidentifyingrestrictions(seeSection 1.5 )wereproposedtoidentifythefull-datamodel.Whenthenumberofdropoutpatternsislargeandpattern-specicparameterswillbeweaklyidentiedbyidentifyingrestrictions, Roy ( 2003 )and RoyandDaniels ( 2008 )proposedtouselatent-classmodelfordropoutclasses.Whenthedropouttimeiscontinuousandthemixtureofpatternsisinnite, Hoganetal. ( 2004 )proposedtomodeltheresponsegivendropoutbyavaryingcoefcientmodelwhereregressioncoefcientswereunspecied,non-parametricfunctionsofdropouttime.Fortime-eventdatawithinformativecensoring, WuandBailey ( 1988 1989 )and HoganandLaird ( 1997a )developedrandomeffectsmixturemodels. FitzmauriceandLaird ( 2000a )generalizedWuandBaileyandHoganandLairdapproachfordiscrete,ordinalandcountdatabyusinggeneralizedlinearmixturemodelsandGEEapproachforstatisticalinference. DanielsandHogan ( 2000 )proposedaparameterizationofthepatternmixturemodelforcontinuousdata.Sensitivityanalysiscanbedoneontheadditive(location)andmultiplicative(scale)terms. ForsterandSmith ( 1998 )consideredapatternmixturemodelforasinglecategoricalresponsewithcategoricalcovariates.Bayesianapproacheswereemployedfornon-ignorablemissingness.

PAGE 18

WuandCarroll ( 1988 )presentedasharedparameterrandomeffectsmodelforcontinuousresponsesandinformativecensoring,inwhichindividualeffectsaretakenintoaccountasinterceptsandslopesformodelingthecensoringprocess. DeGruttolaandTu ( 1994 )extendedWuandCarroll'smodeltoallowgeneralcovariates. FollmannandWu ( 1995 )developedgeneralizedlinearmodelforresponseandproposedanapproximationalgorithmforthejointfull-datamodelforinference. FaucettandThomas ( 1996 )and WulfsohnandTsiatis ( 1997 )proposedtojointlymodelthecontinuouscovariateovertimeandrelatethecovariatestotheresponsesimultaneously. Hendersonetal. ( 2000 )generalizedthejointmodelingapproachbyusingtwocorrelatedGaussianrandomprocessesforcovariatesandresponse. TenHaveetal. ( 1998 2000 )proposedasharedparametermixedeffectslogisticregressionmodelforlongitudinalordinaldata.Recently, YuanandLittle ( 2009 )proposedamixed-effecthybridmodelallowsthemissingnessandresponsetobeconditionallydependentgivenrandomeffects. DanielsandHogan 2008 ).Full-datamodelinferencerequiresunveriableassumptionsabouttheextrapolationmodelp(ymisjyobs,r,!E).Asensitivityanalysisexploresthesensitivityofinferencesofinterestaboutthefulldataresponsemodeltounveriableassumptionsabouttheextrapolationmodel.Thisistypicallydonebyvaryingsensitivityparameters,whichwedenenext( DanielsandHogan 2008 ). 18

PAGE 19

1. 2. TheobservedlikelihoodL(S,Mjyobs,r)isaconstantasafunctionofS, 3. GivenSxed,L(S,Mjyobs,r)isanon-constantfunctionofM Unfortunately,fullyparametricselectionmodelsandsharedparametermodelsdonotallowsensitivityanalysisassensitivityparameterscannotbefound( DanielsandHogan 2008 ,Chapter8).Examiningsensitivitytodistributionalassumptions,e.g.,randomeffects,willprovidedifferenttstotheobserveddata,(yobs,r).Insuchcases,asensitivityanalysiscannotbedonesincevaryingthedistributionalassumptionsdoesnotprovideequivalenttstotheobserveddata( DanielsandHogan 2008 ).Itthenbecomesanexerciseinmodelselection. FullyBayesiananalysisallowsresearcherstohaveasingleconclusionbyadmittingpriorbeliefsaboutthesensitivityparameters.Forcontinuousresponses, LeeandBerger ( 2001 )builtasemiparametricBayesianselectionmodelwhichhasstrongdistributionalassumptionfortheresponsebutweakassumptiononmissingdatamechanism. Scharfsteinetal. ( 2003 )ontheotherhand,placedstrongparametricassumptionsonmissingdatamechanismbutminimalassumptionsontheresponseoutcome. LiangandZeger ( 1986 )proposedgeneralizedestimatingequations(GEE)whosesolution 19

PAGE 20

Robinsetal. ( 1995 )proposedinverse-probabilityofcensoringweightedgeneralizedestimatingequations(IPCW-GEE)approach,whichreweightseachindividual'scontributiontotheusualGEEbytheestimatedprobabilityofdrop-out.IPCW-GEEwillleadtoconsistentestimationwhenthemissingnessisMAR.However,bothGEEandIPCW-GEEcanresultinbiasedestimationunderMNAR. Rotnitzkyetal. ( 1998a 2001 ), Scharfsteinetal. ( 2003 )and Schulmanetal. ( 1999 )adoptedsemiparametricselectionmodelingapproaches,inwhichthemodelfordrop-outisindexedbyinterpretablesensitivityparametersthatexpressdeparturesfromMAR.Forsuchapproaches,theinferenceresultsdependonthechoiceofunidentied,yetinterpretable,sensitivityanalysisparameters. Oneapproachtohandleintermittentmissingnessistoconsideramonotonizeddataset,wherebyallobservedvaluesonanindividualaftertheirrstmissingnessaredeleted Landetal. ( 2002 ).However,thisincreasesthedropoutrate,losesefciency,andmayintroducebias. Othermethodsintheliteratureoftenadoptalikelihoodapproachandrelyonstrongparametricassumptions.Forexample, Troxeletal. ( 1998 ), Albert ( 2000 )and Ibrahimetal. ( 2001 )suggestedaselectionmodelapproach. Albertetal. ( 2002 )usedasharedlatentautoregressiveprocessmodel. Linetal. ( 2004 )employedlatentclasspatternmixturemodel. 20

PAGE 21

Troxeletal. ( 1998 )and Vansteelandtetal. ( 2007 ). Troxeletal. ( 1998 )proposedamarginalmodelandintroducedapseudo-likelihoodestimationprocedure. Vansteelandtetal. ( 2007 )extendedtheideasof Rotnitzkyetal. ( 1998b ), Scharfsteinetal. ( 1999 )and Rotnitzkyetal. ( 2001 )tonon-monotonemissingdatathatassume(exponentiallytilted)extensionsofsequentialexplainabilityandspeciedparametricmodelsforcertainconditionalmeans. MostrelatedtotheapproachwewilluseinChapter 3 arethe(partialignorability)assumptionsformalizedin HarelandSchafer ( 2009 )thatpartitionthemissingdataandallowone(ormore)ofthepartitionstobeignoredgiventheotherpartition(s)andtheobserveddata.Specically, HarelandSchafer ( 2009 )denedamissingdatamechanismtobepartiallymissingatrandomifp(rjyobs,ymis,g(r),x;(!))=p(rjyobs,g(r),x;(!)) Vansteelandtetal. ( 2007 ). Inthisdissertation,weexplicitlypartitionthemissingdataindicatorvectorrintofrs,sg,wheres=maxtfrt=1gdenotesthelasttimepointaresponsewasobserved,i.e.thesurvivaltime,andrs=frt:t
PAGE 22

1.7 Little 1993 1994 ).Additionalassumptionsaboutthemissingdataprocessarenecessaryinordertoyieldidentifyingrestrictionsthatequatetheinestimableparameterstofunctionsofestimableparametersandidentifythefull-datamodel. Forexample,considerthesituationwheny=(y1,y2)isabivariatenormalresponsewithmissingdataonlyiny2.Letsbethesurvivaltime,i.e.s=1ify2ismissingands=2ify2isobserved.Wemodelp(s)andp(yjs)assBern()andyjs=iN((s),(s))fori=1,2,with(s)=264(s)1(s)2375and(s)=264(s)11(s)12(s)12(s)22375.

PAGE 23

Understanding(identifying)restrictionsthatleadtoMARisanimportantrststepforsensitivityanalysisundermissingnotatrandom(MNAR)( DanielsandHogan 2008 ; Scharfsteinetal. 2003 ; ZhangandHeitjan 2006 ).Inparticular,MARprovidesagoodstartingpointforsensitivityanalysisandsensitivityanalysisareessentialfortheanalysisofincompletedata( DanielsandHogan 2008 ; Scharfsteinetal. 1999 ). Little ( 1993 )developedseveralcommonidentifyingrestrictions.Forexample,completecasemissingvalue(CCMV)restrictionswhichequateallmissingpatternstothecompletecases,i.e.pk(yjj

PAGE 24

( 1998 )provedthatfordiscretetimepointsandmonotonemissingness,theACMVconstraintisequivalenttomissingatrandom(MAR). Thijsetal. ( 2002 )developedstrategiestoapplyidentifyingrestrictions.Thatisrsttpk( Kenwardetal. ( 2003 )discussedidentifyingrestrictionscorrespondingtomissingnon-futuredependence. 24

PAGE 25

DanielsandHogan 2008 ).However,multivariatenormalitywithinpatternscanbeoverlyrestrictivewhenapplyingidentifyingrestrictions.WeexploresuchissuesinChapter 4 Furthermore,identicationstrategiescanbeproblematicinmodelswithcovariates(e.g.baselinecovariateswithtime-invariantcoefcients).InthisChapter,wealsoexploreconditionsnecessaryforidentifyingrestrictionsthatresultinmissingatrandom(MAR)toexistunderamultivariatenormalityassumptionandstrategiesforsensitivityanalysis.Problemscausedbybaselinecovariateswithtime-invariantcoefcientsareinvestigatedandanalternativeidentifyingrestrictionbasedonresidualsisproposedasasolution. 25

PAGE 26

2.1.1BreastCancerPreventionTrial Fisheretal. 1998 ).ThestudywasopentoaccrualfromJune1,1992throughSeptember30,1997and13,338womenaged35orolderwereenrolledinthestudyduringthisinterval.Theprimaryobjectivewastodeterminewhetherlong-termtamoxifentherapyiseffectiveinpreventingtheoccurrenceofinvasivebreastcancer.Secondaryobjectivesincludedqualityoflife(QOL)assessmentstoevaluatebenetaswellasriskresultingfromtheuseoftamoxifen. MonitoringQOLwasofparticularimportanceforthistrialsincetheparticipantswerehealthywomenandtherehadbeenconcernsvoicedbyresearchersabouttheassociationbetweenclinicaldepressionandtamoxifenuse.Accordingly,dataondepressionsymptomswasscheduledtobecollectedatbaselinepriortorandomization,at3months,at6monthsandevery6monthsthereafterforupto5years.TheprimaryinstrumentusedtomonitordepressivesymptomsovertimewastheCenterforEpidemiologicStudiesDepressionScale(CES-D)( Radloff 1977 ).Thisself-testquestionnaireiscomposedof20items,eachofwhichisscoredonascaleof0-3.Ascoreof16orhigherisconsideredasalikelycaseofclinicaldepression. ThetrialwasunblindedonMarch31,1998,afteraninterimanalysisshowedadramaticreductionintheincidenceofbreastcancerinthetreatmentarm.Duetothepotentiallossofthecontrolarm,wefocusonQOLdatacollectedonthe10,982participantswhowereenrolledduringthersttwoyearsofaccrualandhadtheirCES-D 26

PAGE 27

IntheBCPT,theclinicalcenterswerenotrequiredtocollectQOLdataonwomenaftertheystoppedtheirassignedtherapy.ThisdesignfeatureaggravatedtheproblemofmissingQOLdatainthetrial.Asreportedin Landetal. ( 2002 ),morethan30%oftheCES-Dscoresweremissingatthe36-monthfollow-up,withaslightlyhigherpercentageinthetamoxifengroup.TheyalsoshowedthatwomenwithhigherbaselineCES-Dscoreshadhigherratesofmissingdataateachfollow-upvisitandthemeanobservedCES-Dscoresprecedingamissingmeasurementwerehigherthanthoseprecedinganobservedmeasurement;therewasnoevidencethattheserelationshipsdifferedbytreatmentgroup. WhiletheseresultssuggestthatthemissingdataprocessisassociatedwithobservedQOLoutcomes,onecannotruleoutthepossibilitythattheprocessisfurtherrelatedtounobservedoutcomesandthatthisrelationshipismodiedbytreatment.Inparticular,investigatorswereconcerned(apriori)that,betweenassessments,tamoxifenmightbecausingdepressioninsomeindividuals,whothendonotreturnfortheirnextassessment.Ifthisoccurs,thedataaresaidbemissingnotatrandom(MNAR);otherwisethedataaresaidtobemissingatrandom(MAR). Landetal. ( 2002 ),consideramonotonizeddataset,wherebyallCES-Dscoresobservedonanindividualaftertheirrstmissingscorehavebeendeleted(thisincreasesthedropoutrate). Therearetwomaininferentialparadigmsforanalyzinglongitudinalstudieswithinformativedrop-out:likelihood(parametric)andnon-likelihood(semi-parametric). 27

PAGE 28

Little ( 1995 ), HoganandLaird ( 1997b )and KenwardandMolenberghs ( 1999 )aswellasrecentbooksby MolenberghsandKenward ( 2007 )and DanielsandHogan ( 2008 )provideacomprehensivereviewoflikelihood-basedapproaches,includingselectionmodels,pattern-mixturemodels,andshared-parametermodels.Thesemodelsdifferinthewaythejointdistributionoftheoutcomeandmissingdataprocessesarefactorized.Inselectionmodels,onespeciesamodelforthemarginaldistributionoftheoutcomeprocessandamodelfortheconditionaldistributionofthedrop-outprocessgiventheoutcomeprocess(see,forexample, Albert 2000 ; Baker 1995 ; DiggleandKenward 1994 ; Fitzmauriceetal. 1995 ; Heckman 1979a ; Liuetal. 1999 ; Molenberghsetal. 1997 );inpattern-mixturemodels,onespeciesamodelfortheconditionaldistributionoftheoutcomeprocessgiventhedrop-outtimeandthemarginaldistributionofthedrop-outtime(see,forexample, BirminghamandFitzmaurice 2002 ; DanielsandHogan 2000 ; FitzmauriceandLaird 2000b ; HoganandLaird 1997a ; Little 1993 1994 1995 ; Pauleretal. 2003 ; Roy 2003 ; RoyandDaniels 2008 ; Thijsetal. 2002 );andinshared-parametermodels,theoutcomeanddrop-outprocessesareassumedtobeconditionallyindependentgivensharedrandomeffects(see,forexample, DeGruttolaandTu 1994 ; Landetal. 2002 ; Pulkstenisetal. 1998 ; TenHaveetal. 1998 2000 ; WuandCarroll 1988 ; YuanandLittle 2009 ).Traditionally,thesemodelshavereliedonverystrongdistributionalassumptionsinordertoobtainmodelidentiability. Withoutthesestrongdistributionalassumptions,specicparametersfromthesemodelswouldnotbeidentiedfromthedistributionoftheobserveddata.Toaddressthisissuewithinalikelihood-basedframework,severalauthors( Bakeretal. 1992 ; DanielsandHogan 2008 ; KurlandandHeagerty 2004 ; Little 1994 ; LittleandRubin 1999 ; Nordheim 1984 )havepromotedtheuseofglobalsensitivityanalysis,wherebynon-orweakly-identied,interpretableparametersarexedandthenvariedtoevaluate 28

PAGE 29

Non-likelihoodapproachestoinformativedrop-outinlongitudinalstudieshavebeenprimarilydevelopedfromaselectionmodelingperspective.Here,themarginaldistributionoftheoutcomeprocessismodelednon-orsemi-parametricallyandtheconditionaldistributionofthedrop-outprocessgiventheoutcomeprocessismodeledsemi-orfully-parametrically.Inthecasewherethedrop-outprocessisassumedtodependonlyonobservableoutcomes(i.e.,MAR), Robinsetal. ( 1994 1995 ), vanderLaanandRobins ( 2003 )and Tsiatis ( 2006 )developedinverse-weightedandaugmentedinverse-weightedestimatingequationsforinference.Forinformativedrop-out, Rotnitzkyetal. ( 1998a ), Scharfsteinetal. ( 1999 )and Rotnitzkyetal. ( 2001 )introducedaclassofselectionmodels,inwhichthemodelfordrop-outisindexedbyinterpretablesensitivityparametersthatexpressdeparturesfromMAR.Inferenceusinginverse-weightedestimatingequationswasproposed. Theproblemwiththeaforementionedsensitivityanalysisapproachesisthattheultimateinferencescanbecumbersometodisplay. Vansteelandtetal. ( 2006a )developedamethodforreportingignoranceanduncertaintyintervals(regions)thatcontainthetrueparameter(s)ofinterestwithaprescribedlevelofprecision,whenthetruedatageneratingmodelisassumedtofallwithinaplausibleclassofmodels(asanexample,see Scharfsteinetal. 2004 ).Analternativeandverynaturalstrategyisspecifyaninformativepriordistributiononthenon-orweakly-identiedparametersandconductafullyBayesiananalysis,wherebytheultimateinferencesarereportedintermsofposteriordistributions.Inthecross-sectionalsettingwithacontinuousoutcome, Scharfsteinetal. ( 2003 )adoptedthisapproachfromasemi-parametricselectionmodelingperspective. Kacirotietal. ( 2009 )proposedaparametricpattern-mixturemodelforcross-sectional,clusteredbinaryoutcomes. Leeetal. ( 2008 )introducedafully-parametricpattern-mixtureapproachinthelongitudinalsettingwithbinary 29

PAGE 30

Leeetal. ( 2008 ),butofferamoreexiblestrategy.InthecontextofBCPT,thelongitudinaloutcomewillbetheindicatorthattheCES-Dscoreis16orhigher. 2.2 ,wedescribethedatastructure.InSection 2.3 and 2.4 ,weformalizeidenticationassumptionsandprovethatthefull-datadistributionisidentiedundertheseassumptions.WeintroduceasaturatedmodelforthedistributionoftheobserveddatainSection 2.5 .InSection 2.6 ,weillustratehowtoapplyshrinkagepriorstohigh-orderinteractionparametersinthesaturatedmodeltoreducethedimensionalityoftheparameterspaceandhowtoelicit(conditional)informativepriorsfornon-identiedsensitivityparametersfromexperts.InSection 2.7 ,weassess,bysimulation,thebehaviorofthreeclassesofmodelsforthedistributionofobserveddata;parametric,saturated,andshrinkage.OuranalysisoftheBCPTtrialispresentedinSection 2.8 .Section 2.9 isdevotedtoasummaryanddiscussion. Ourgoalistodrawinferenceaboutz,j=P[Yj=1jZ=z]forj=1,...,Jandz=0,1. 30

PAGE 31

Thisassumptionassertsthatforindividualsatriskfordrop-outatvisitjandwhosharethesamehistoryofoutcomesuptoandincludingvisitj,thedistributionoffutureoutcomesisthesameforthosewhoarelastseenatvisitjandthosewhoremainonstudypastvisitj.Thisassumptionhasbeenreferredtoasnon-futuredependence( Kenwardetal. 2003 ). Assumption2linksthenon-identiedconditionaldistributionofYjgivenRj=0,Rj1=1,

PAGE 33

SupposethatP[Yj=1jRk=0,Rk1=1,

PAGE 34

34

PAGE 35

Furthermore,weproposetoparameterizethefunctionsqz,j( 2.5 providesaperfectttothedistributionoftheobserveddata.Inthismodel,however,thenumberofparametersincreasesexponentiallyinJ.Incontrast,thenumberofdatapointsincreaseslinearlyinJ.Asaconsequence,therewillbemany 35

PAGE 36

RobinsandRitov 1997 ). wheretistheorderofinteractionsandthehyper-parameters(shrinkagevariances)followdistributions(t)Unif(0,10)and(t)Unif(0,10). WhentherstorderMarkovmodelisnottrue,asngoestoinnity,theposteriormeansofobserveddataprobabilitieswillconvergetotheirtruevaluesaslongas 36

PAGE 37

Wespecifynon-informativepriorsN(0,1000)forthenon-interactionparametersin,namelyz,j,0forj=0,...,Jandz=0,1,z,j,1,z,j,0andz,j,1forj=1,...,Jandz=0,1. 2.5 ,are(conditional)oddsratios.Inourexperience,subjectmatterexpertsoftenhavedifcultythinkingintermsofoddsratios;rather,theyaremorecomfortableexpressingbeliefsaboutrelativerisks( Scharfsteinetal. 2006 ; Shepherdetal. 2007 ).Withthisisinmind,weaskedDr.PatriciaGanz,amedicaloncologistandexpertonqualityoflifeoutcomesinbreastcancer,toexpressherbeliefsabouttheriskofdroppingoutanditsrelationshiptotreatmentassignmentanddepression.Wethentranslatedherbeliefsintopriordistributionalassumptionsabouttheoddsratiosensitivityparameters. Specically,weaskedDr.Ganztoanswerthefollowingquestionforeachtreatmentgroup:

PAGE 38

Fornotationalconvenience,letrz(p)denotetherelativeriskofdrop-outfortreatmentgroupzanddrop-outprobabilityp.Further,letrz,min(p),rz,med(p)andrz,max(p)denotetheelicitedminimum,median,andmaximumrelativerisks(seeTable 2-1 ).Letpz,j( Bydenition,rz(pz,j( forrz(pz,j( 38

PAGE 39

Step1. Form2fmin,med,maxg,interpolatetheelicitedrz,m(p)atdifferentdrop-outprobabilities(seeFigure 2-1 )tondrz,m(pz,j( Step2. Constructthepriorofrz(pz,j( Step3. Constructaconditionalpriorofp(0)z,j( maxrz(pz,j( minrz(pz,j( maxrz(pz,j( Step4. Steps(2)and(3)induceapriorforz,j, 1rz(pz,j( TherelativeriskselicitedfromDr.GanzaregiveninTable 2-2 .WeextrapolatedtherelativerisksoutsidetherangesgiveninTable 2-2 asshowninFigure 2-1 Figure 2-2 showsthedensityofgivenpz,j( 39

PAGE 40

1. Usingtheproposedobserveddatamodelwiththeshrinkagepriorson,wesimulatedrawsfromtheposteriordistributionsofP[Yj=1jRj=1, 2. ForeachdrawofP[Rj=0jRj=1, 3. Wecomputez,jbypluggingthedrawsofP[Yj=1jRj=1, 2.4 TosamplefromtheposteriordistributionsofP[Yj=1jRj=1, TheshrinkagemodelusestheshrinkagepriorsproposedinSection 2.6.1 (shrinkthesaturatedmodeltowardarstorderMarkovmodel).Notethattheshrinkagepriorsshrinkthesaturatedmodeltoanincorrectparametricmodel. 40

PAGE 41

Wesimulatedobserveddatafromatrueparametricmodelofthefollowingform:logitP[Y0=1jR0=1,Z=z]=z,0,0logitP[Y1=1jR1=1,Y0=y0,Z=z]=z,1,0+z,1,1y0logitP[R1=0jR0=1,Y0=y0,Z=z]=z,1,0+z,1,1y0logitP[Yj=1jRj=1, Todeterminetheparametersofthedatageneratingmodel,wetthismodeltothemonotonizedBCPTdatainWinBUGSwithnon-informativepriors.Weusedtheposteriormeanoftheofparameterszandzasthetrueparameters.Wecomputethetruevaluesofz,jby(1)drawing10,000valuesfromtheelicitedpriorofzgivenzgiveninTable 2-2 ,(2)computingz,jusingtheidenticationalgorithminSection 2.4 foreachdraw,and(3)averagetheresultingz,j's.Themodelparametersandthetruedepressionratesz,j,aregiveninTable 2-3 Weconsidered(relatively)small(3000),moderate(5000),andlarge(10000)samplesizesforeachtreatmentarm;foreachsamplesize,wesimulated50datasets.Weassessedmodelperformanceusingthemeansquarederror(MSE)criterion. InTable 2-4 ,wereporttheMSEsofP[Yj=1jRj=1, 41

PAGE 42

Inaddition,theMSEsfortheshrinkagemodelcomparefavorablywiththoseofthetrueparametricmodelforallsamplesizesconsidered,despitethefactthattheshrinkagepriorswerespeciedtoshrinktowardanincorrectmodel. 2-5 displaysthetreatment-specicmonotonizeddrop-outratesintheBCPT.Bythe7thstudyvisit,morethan40%ofpatientshadmissedoneormoreassessments,withaslightlyhigherpercentageinthetamoxifenarm. WettheshrinkagemodeltotheobserveddatausingWinBUGS,withfourchainsof8000iterationsand1000burn-in.Convergencewascheckedbyexaminingtraceplotsofthemultiplechains. 2-3 ,theshrinkagemodeltstheobserveddatawell.Figure 2-4 illustratestheeffectofshrinkageonthemodeltsbycomparingthedifferencebetweentheempiricalrateandposteriormeanofP[Yj=1jRj=1, 42

PAGE 43

2-5 showstheposteriorofP[Y7=1jZ=z],thetreatment-specicprobabilityofdepressionattheendofthe36-monthfollowup(solidlines).Forcomparison,theposteriorunderMAR(correspondingtopointmasspriorsforatzero)isalsopresented(dashedlines).Theobserveddepressionrates(i.e.,completecaseanalysis)were0.115onboththeplaceboandtamoxifenarms.UndertheMNARanalysis(usingtheelicitedpriors),theposteriormeanofthedepressionratesatmonth36were0.126(95%CI:0.115,0.138)and0.130(95%CI:0.119,0.143)fortheplaceboandtamoxifenarms;thedifferencewas0.004(95%CI:0.012,0.021).UnderMAR,therateswere0.125(95%CI:0.114,0.136)and0.126(95%CI:0.115,0.138)fortheplaceboandtamoxifenarms;thedifferencewas0.001(95%CI:0.015,0.018).TheposteriorprobabilityofdepressionwashigherundertheMNARanalysisthantheMARanalysissinceresearchersbelieveddepressedpatientsweremorelikelytodropout(seeTable 2-2 ),abeliefthatwascapturedbytheelicitedpriors.Figure 2-6 showsthatunderthetwotreatmentstherewerenosignicantdifferencesinthedepressionratesateverytimepoint(95%credibleintervalsallcoverzero)underbothMNARandMAR.Similar(non-signicant)treatmentdifferenceswereseenwhenexaminingtreatmentcomparisonsconditionalondepressionstatusatbaseline. 43

PAGE 44

Penalizedlikelihood( FanandLi 2001 ; GreenandSilverman 1994 ; Wahba 1990 )isanotherapproachforhigh-dimensionalstatisticalmodeling.Therearesimilaritiesbetweenthepenalizedlikelihoodapproachandourshrinkagemodel.Infact,theshrinkagepriorsonthesaturatedmodelparametersproposedinourapproachcanbeviewedasaspecicformforthepenalty. Theideasinthispapercanbeextendedtocontinuousoutcomes.Forexample,onecouldusethemixturesofDirichletprocessesmodel( EscobarandWest 1995 )forthedistributionofobservedresponses.Theycanalsobeextendedtomultiplecausedropout;inthistrial,missedassessmentswereduetoavarietyofreasonsincludingpatient-speciccausessuchasexperiencingaprotocoldenedevent,stoppingtherapy,orwithdrawingconsentandinstitution-speciccausessuchasunderstafnganstaffturnover.Therefore,somemissingnessislesslikelytobeinformative;extensionswillneedtoaccountforthat.Inaddition,institutionaldifferencesmightbeaddressedbyallowinginstitution-specicparameterswithpriorsthatshrinkthemtowardacommonsetofparameters. Forsmallersamplesizes,WinBUGShasdifcultysamplingfromtheposteriordistributionoftheparametersintheshrinkagemodel.Inaddition,themonotonizingapproachignorestheintermittentmissingdataandmayleadtobiasedresults.TheseissueswillbeexaminedinthenextChapter. 44

PAGE 45

RelativeRiskstobeElicited DropoutRatep 100%condentthenumberisaboverz,min(p) PercentilesofRelativeRisksElicited DropoutRate TreatmentPercentile 10%25% TamoxifenMinimum 1.101.30Median 1.201.50Maximum 1.301.60 PlaceboMinimum 1.011.20Median 1.051.30Maximum 1.101.40 45

PAGE 46

SimulationScenario TimePoint Parameter01234567 46

PAGE 47

SimulationResults:MSE(103).PandTrepresentplaceboandtamoxifenarms,respectively. Observedj,z ShrinkageP6.9701.9990.0330.0450.0510.0590.0470.0520.066T6.9882.4010.0240.0260.0560.0530.0630.1190.073 SaturatedP35.67867.1710.0360.0500.0540.0580.1010.2310.561T34.65462.6060.0260.0330.0450.0590.0970.3290.722 ShrinkageP4.6281.1880.0250.0280.0320.0310.0350.0480.057T4.4481.4140.0170.0230.0280.0260.0380.0330.044 SaturatedP30.27454.6470.0230.0280.0280.0330.0610.1380.290T29.59951.2190.0200.0200.0320.0280.0510.1400.392 ShrinkageP2.3920.7070.0080.0100.0150.0170.0140.0130.014T2.4740.7120.0110.0150.0110.0150.0160.0190.023 SaturatedP22.98937.7160.0090.0090.0160.0180.0180.0380.094T22.24534.7910.0110.0130.0140.0140.0210.0480.128 Table2-5. PatientsCumulativeDropOutRate Month 361218243036 TamoxifenAvailable 5364487445974249391035293163Dropout 49076711151454183522012447DropRate(%) 9.1314.3020.7927.1134.2141.0345.62 PlaceboAvailable 5375487146244310395135933297Dropout 50475110651424178220782304DropRate(%) 9.3813.9719.8126.4933.1538.6642.87 47

PAGE 48

Extrapolationoftheelicitedrelativerisks. 48

PAGE 49

Priorconditionaldensityz,j, 49

PAGE 50

SolidanddashedlinesrepresenttheempiricalrateofP[Yj=1,Rj=1jZ=z]andP[Rj=0jZ=z],respectively.TheposteriormeansofP[Yj=1,Rj=1jZ=z](diamond)andP[Rj=0jZ=z](triangle)andtheir95%credibleintervalsaredisplayedateachtimepoint. 50

PAGE 51

DifferencesbetweenposteriormeanandempiricalrateofP[Yj=1jRj=1, 51

PAGE 52

PosteriordistributionofP[Y7=1jZ=z].Blackandgraylinesrepresenttamoxifenandplaceboarms,respectively.SolidanddashedlinesareforMNARandMAR,respectively. 52

PAGE 53

Posteriormeanand95%credibleintervalofdifferenceofP[Yj=1jZ=z]betweenplaceboandtamoxifenarms.ThegrayandwhiteboxesareforMARandMNAR,respectively. 53

PAGE 54

2 aBayesianshrinkageapproachforlongitudinalbinarydatawithinformativedrop-out.Thesaturatedobserveddatamodelswereconstructedsequentiallyviaconditionaldistributionsforresponseandfordropouttimeandparameterizedonthelogisticscaleusingallinteractionterms.However,twoissueswerenotaddressed:theignoredintermittentmissingdataandtheintrinsiccomputationalchallengewiththeinteractionparameterization.ThisChapterproposessolutionstothesetwoissues. Landetal. ( 2002 );wedidthisinChapter 2 .However,thisincreasesthedrop-outrate,throwsawayinformationandthuslosesefciency,andmayintroducebias. Handlinginformativeintermittentmissingdataismethodologicallyandcomputationallychallengingand,asaresult,thestatisticsliteratureisrelativelylimited.Mostmethodsadoptalikelihoodapproachandrelyonstrongparametricassumptions(see,forexample, Albert 2000 ; Albertetal. 2002 ; Ibrahimetal. 2001 ; Linetal. 2004 ; Troxeletal. 1998 ).Semiparametricmethodshavebeenproposedby Troxeletal. ( 1998 )and Vansteelandtetal. ( 2007 ). Troxeletal. ( 1998 )proposedamarginalmodelandintroducedapseudo-likelihoodestimationprocedure. Vansteelandtetal. ( 2007 )extendedtheideasof Rotnitzkyetal. ( 1998b ), Scharfsteinetal. ( 1999 )and Rotnitzkyetal. ( 2001 )tonon-monotonemissingdata. 54

PAGE 55

HarelandSchafer ( 2009 )thatpartitionthemissingdataandallowone(ormore)ofthepartitionstobeignoredgiventheotherpartition(s)andtheobserveddata.InthisChapter,weapplyapartialignorabilityassumptionsuchthattheintermittentmissingdatamechanismcanbeignoredgivendrop-outandtreatmentstrata. 2 ,Section 2.5 ,WinBUGShasdifcultysamplingfromtheposteriordistributionoftheparameterswhensamplesizeisrelativelysmall(lessthan3000perarm).Tailoredsamplingalgorithmscanbewrittentoovercomethisdifculty,however,WinBUGSlackstheexibilitytoincorporatemodicationsand/orextensionstoitsexistingalgorithms. InthisChapter,wewillprovideanalternativeparameterizationsofthesaturatedmodelfortheobserveddataaswellasalternativeshrinkagepriorspecicationstoimprovecomputationalefciency.ThisalternativeapproachtoposteriorsamplingcaneasilybeprogrammedinR. 3.2 ,wedescribethedatastructure,formalizeidenticationassumptionsandprovethatthetreatment-specicdistributionofthefulltrajectoryoflongitudinaloutcomesisidentiedundertheseassumptions.InSection 3.3 ,weintroduceasaturatedmodelforthedistributionofthedatathatwouldbeobservedwhenthereisdrop-out,butnointermittentobservations.Wethenintroduceshrinkagepriorstoparametersinthesaturatedmodeltoreducethedimensionalityoftheparameterspace.InSection 3.4 ,weassess,bysimulation,thebehaviorofthreeclassesofmodels:parametric,saturated,andshrinkage.OuranalysisoftheBCPTtrialispresentedinSection 3.5 .Section 3.6 isdevotedtoasummaryanddiscussion. 55

PAGE 56

2 ,Section 2.2 ,aswellasintroducesomeadditionalnotationinthisSection.Thefollowingnotationisdenedforarandomindividual.Whennecessary,weusethesubscriptitodenotedatafortheithindividual. LetZdenotethetreatmentassignmentindicator,whereZ=1denotestamoxifenandZ=0denotesplacebo.LetYbethecompleteresponsedatavectorwithelementsYjdenotingthebinaryoutcome(i.e.,depression)scheduledtobemeasuredatthejthvisit(j=0(baseline),...,J)andlet Wewillnditusefultodistinguishthreesetsofdataforanindividual:thecompletedataC=(Z,S,RS,Y),thefulldataF=(Z,S,RS, Weassumethatindividualsaredrawnasasimplerandomsamplefromasuper-populationsothatwehaveani.i.d.datastructureforC,FandO.WelettheparameterszindexamodelforthejointconditionaldistributionofSand 56

PAGE 57

Ourgoalistodrawinferenceaboutz,j=P[Yj=1jZ=z]forj=1,...,Jandz=0,1.Toidentifyz,jfromthedistributionoftheobserveddata,wemakethefollowingthree(untestable)assumptions: Thisassumptionplustheassumptionthatzisaprioriindependentofzimpliesthattheintermittentmissingnessmechanismisancillaryorignorable.Specically,thismeansthatwhenconsideringinferencesaboutzfromalikelihoodperspective,asweareinthispaper,theconditionaldistributionofRSgivenZ,SandYobsdoesnotcontributetothelikelihoodandcanbeignored( HarelandSchafer 2009 ). Assumptions2and3arethesameasAssumptions1and2inChapter 2 ,Section 2.3 ,respectively.WerestatebelowthetwoassumptionusingthesurvivaltimeSnotation(insteadofmissingindicatorsRinChapter 2 ). 57

PAGE 58

2 Theidentiabilityresultshowsthat,giventhefunctionsqz,j( 3.3.1Modeling 2 asfollows: forj=2,...,Jandy=0,1. Letzdenotetheparametersindexingtherstsetofmodelsforresponseandzdenotetheparametersindexingthesecondsetofmodelsfordrop-out.RecallthatwedenedztodenotetheparametersoftheconditionaldistributionofSand Thissaturatedmodelavoidsthecomplexinteractiontermmodelparameterization.Asaresult,the(conditional)posteriordistributionsofzwillhavesimpleformsandefcientposteriorsamplingispossibleevenwhenthesamplesizeismoderateorsmall. Weusethesameparameterizationofthefunctionsqz,j( 2 ,Section 2.5 2 ,thestrategytoavoidthecurseofdimensionalitywastoapplyshrinkagepriorsforhigherorderinteractionstoreducethenumberofparameters 58

PAGE 59

3 ,weuseadifferentshrinkagestrategy.Inparticular,weproposetouseBetapriorsforshrinkageasfollows: forj=2,...,Jandy=0,1.Forz,0,z,1,yandz,0,yfory=0,1,weassignUnif(0,1)priors.Letm()z(m()z)and()z(()z)denotetheparametersm()z,j,y(m()z,j1,y)and()z,j,y(()z,j1,y)respectively. NotethatforarandomvariableXthatfollowsaBeta(m=,(1m)=)distribution,wehaveE[X]=mandVar[X]=m(1m) +1. WespecifyindependentUnif(0,1)priorsform()z,j,yandm()z,j1,y.Fortheshrinkageparameters()z,j,yand()z,j1,y,wespecifyindependent,uniformshrinkagepriors( Daniels 59

PAGE 60

)asfollows gE()z,j,y()z,j,y+12and()z,j1,ygE()z,j1,y gE()z,j1,y()z,j1,y+12,(3) where ChristiansenandMorris ( 1997 )). TheexpectednumberofsubjectswithSj,Yj1=y, wheretheprobabilitiesontherighthandsideoftheaboveequationsareestimableunderAssumption1. Theexpectedsamplesizesaboveareusedinthepriorinsteadoftheobservedbinomialsamplesizeswhicharenotcompletelydeterminedduetotheintermittentmissingness.Thus,ourformulationofthesepriorsinducesasmalladditionalamountofdatadependencebeyonditsstandarddependenceonthebinomialsamplesizes.Thisadditionaldependenceaffectsthemedianofthepriorbutnotitsdiffuseness. 2 ,Section 2.6.2 forconstructingpriorsofzgivenz. 60

PAGE 61

2 ,posteriorcomputationsfortheobserveddatamodelaremucheasierandmoreefcientunderthereparameterizedmodel 3 andtheBetashrinkagepriors.TheposteriorsamplingalgorithmscanbeimplementedinRwithnosamplesizerestrictions. Thefollowingstepsareusedtosimulatedrawsfromtheposteriorofz,j: 1. SampleP(z,YImisjYobs,S,RS,Z=z)usingGibbssamplingwithdataaugmentation(seedetailsinAppendix).Continuesamplinguntilconvergence. 2. Foreachdrawofz,j1, 2.6.2 3. Computez,jbypluggingthedrawsofz, 2.4 2 ,Section 2.7 tosimulateobserveddata(nointermittentmissingness).Weagaincomparedtheperformanceofourshrinkagemodelwith(1)acorrectparametricmodel,(2)anincorrectparametricmodel(rstorderMarkovmodel)and(3)asaturatedmodel(withdiffusepriors).OurshrinkagemodelusestheshrinkagepriorsproposedinSection 3.3.2 Weconsideredsmall(500),moderate(2000),large(5000)andverylarge(1,000,000)samplesizesforeachtreatmentarm;foreachsamplesize,wesimulated500datasets.Weassessedmodelperformanceusingmeansquarederror(MSE). InTable 3-2 (samplesize1,000,000notshown),wereporttheMSE'sofP[Yj=1jSj, 61

PAGE 62

Inaddition,theMSE'sfortheparametersz,jintheshrinkagemodelcomparefavorablywiththoseofthetrueparametricmodelforallsamplesizesconsidered,despitethefactthattheshrinkagepriorswerespeciedtoshrinktowardanincorrectmodel. 3-1 displaysthetreatment-specicdrop-outandintermittentmissingratesintheBCPT.Bythe7thstudyvisit(36months),morethan30%ofpatientshaddroppedoutineachtreatmentarm,withaslightlyhigherpercentageinthetamoxifenarm. 3 )tobethemaximumfunction.Tocomputetheexpectednumberofsubjectse()z,j, 3 ),weassignedapointmasspriorat0.5toallm()z,m()z,()zand()z(whichcorrespondstoUnif(0,1)priorsonz, 3.3.4 .Toavoiddatasparsity,wecalculatedP[S=s, Toassessmodelt,wecomparedtheempiricalratesandposteriormeans(with95%credibleintervals)ofP[Yj=1,SjjZ=z]andP[S
PAGE 63

3-2 illustratestheeffectofshrinkageonthemodeltbycomparingthedifferencebetweentheempiricalratesandposteriormeansofP[Yj=1jSj, 2.7 ,weknowthattheempiricalestimatesarelessreliableforlatertimepoints.Viatheshrinkagepriors,theprobabilitiesP[Yj=1jSj,Yj1=yj1, 3-3 showstheposteriorofP[Y7=1jZ=z],thetreatment-specicprobabilityofdepressionattheendofthe36-monthfollowup(solidlines).Forcomparison,theposteriorunderMAR(correspondingtopointmasspriorsforatzero)isalsopresented(dashedlines).Theobserveddepressionrates(i.e.,completecaseanalysis)were0.124and0.112fortheplaceboandtamoxifenarms,respectively.UndertheMNARanalysis(usingtheelicitedpriors),theposteriormeanofthedepressionratesatmonth36were0.133(95%CI:0.122,0.144)and0.125(95%CI:0.114,0.136)fortheplaceboandtamoxifenarms;thedifferencewas0.007(95%CI:0.023,0.008).UnderMAR,therateswere0.132(95%CI:0.121,0.143)and0.122(95%CI:0.111,0.133)fortheplaceboandtamoxifenarms;thedifferencewas0.01(95%CI:0.025,0.005). 63

PAGE 64

2-2 ),abeliefthatwascapturedbytheelicitedpriors.Figure 3-4 showsthatunderthetwotreatmentstherewerenosignicantdifferencesinthedepressionratesatanymeasurementtime(95%credibleintervalsallcoverzero)underbothMNARandMAR.Similar(non-signicant)treatmentdifferenceswereseenwhenexaminingtreatmentcomparisonsconditionalondepressionstatusatbaseline. Theposteriormeanandbetween-treatmentdifferenceofthedepressionrateatmonth36with95%CIaregiveninTables 3-3 and 3-4 .Noneofthescenariosconsideredresultedinthe95%CIforthedifferenceinratesofdepressionat36monthsthatexcludedzeroexceptforthe(extreme)scenariowheretheelicitedtamoxifenintervalswereshiftedby0.5andtheelicitedplacebointervalswereshiftedby0.5. Wealsoassessedtheimpactofswitchingthepriorsfortheplaceboandtamoxifenarms;inthiscase,theposteriormeanswere0.135(95%CI:0.124,0.146)and0.123(95%CI:0.112,0.134)fortheplaceboandtamoxifenarmsrespectively,whilethedifferencewas0.012(95%CI:0.027,0.004). 64

PAGE 65

2 forintermittentmissingness.Inaddition,wereparameterizedthesaturatedobserveddatamodelanddramaticallyimprovedthecomputationalefciency. WinBUGScanstillbeappliedforthereparameterizedmodelwhenthereisnointermittentmissingdata.However,withtheintermittentmissingness,theaugmentationstepintheposteriorcomputationrequiresextensiveprogramminginWinBUGS.Nevertheless,theapproachinChapter 2 maystillbepreferredincertaincases,e.g.fordirectlyshrinkingtheinteractionterms. Asanextension,wemightconsideralternativestothepartialignorabilityassumption(Assumption1)whichhasbeenwidelyused,butquestionedbysome( Robins 1997 ). MissingnessbyScheduledMeasurementTime TimePointj(Month) 65

PAGE 66

SimulationResults:MSE(103).PandTrepresentplaceboandtamoxifenarms,respectively. Observedj,z(Month) ModelTreatYR1(3)2(6)3(12)4(18)5(24)6(30)7(36) ShrinkageP29.4782.3100.2020.2260.2520.3030.3120.3370.372T28.4102.3650.2120.2320.2940.3360.3300.3900.419 SaturatedP57.107111.2630.2020.2280.3020.4901.0832.4014.427T55.582104.8820.2110.2450.3830.6571.3523.1675.782 ShrinkageP23.5450.6470.0530.0560.0630.0780.0820.0840.095T22.5980.6150.0500.0630.0630.0800.0930.0950.102 SaturatedP40.32277.6270.0530.0570.0690.1000.1880.4570.946T38.94372.7310.0500.0640.0670.1100.2180.5601.223 ShrinkageP18.830.3940.0200.0240.0260.0330.0310.0390.036T18.0550.3220.0240.0240.0280.0360.0340.0400.041 SaturatedP30.07154.4540.0200.0240.0270.0380.0520.130.270T29.15650.5900.0240.0240.0290.0390.0590.1480.373 66

PAGE 67

SensitivitytotheElicitedPrior Scenario(T:Tamoxifen,P:Placebo) 10%25%10%25%10%25%10%25% TamoxifenMinimum 0.790.501.181.461.601.800.600.80Median 1.201.501.201.501.702.000.701.00Maximum 1.702.001.221.521.802.100.801.10P[Y7=1](95%CI) 0.850.801.041.281.511.700.510.70Median 1.051.301.051.301.551.800.550.80Maximum 1.301.801.061.321.601.900.600.90P[Y7=1](95%CI)

PAGE 68

SensitivitytotheElicitedPrior Scenario(T:Tamoxifen,P:Placebo) 10%25%10%25%10%25%10%25% TamoxifenMinimum 0.790.501.181.461.601.800.600.80Median 1.201.501.201.501.702.000.701.00Maximum 1.702.001.221.521.802.100.801.10P[Y7=1](95%CI) 1.041.280.850.800.510.701.511.70Median 1.051.301.051.300.550.801.551.80Maximum 1.061.321.301.800.600.901.601.90P[Y7=1](95%CI)

PAGE 69

SolidanddashedlinesrepresenttheempiricalrateofP[Yj=1,SjjZ=z]andP[S
PAGE 70

(A)Theempiricalrateandmodel-basedposteriormeanofP[Yj=1jSj, 70

PAGE 71

PosteriordistributionofP[Y7=1jZ=z].Blackandgraylinesrepresenttamoxifenandplaceboarms,respectively.SolidanddashedlinesareforMNARandMAR,respectively. 71

PAGE 72

Posteriormeanand95%credibleintervalofdifferenceofP[Yj=1jZ=z]betweenplaceboandtamoxifenarms.ThegrayandwhiteboxesareforMARandMNAR,respectively. 72

PAGE 73

Gibbssamplerforposteriorcomputation:IntherststepoftheGibbssampler,wedraw,foreachsubjectwithintermittentmissingdata,fromthefullconditionalofYImisgivenz,z,m()z,()z,m()z,()z,Yobs,S,RSandZ=z.ThefullconditionaldistributioncanbeexpressedasP[YImis=yImisjz,z,m()z,()z,m()z,()z,Yobs=yobs,S=s,Rs=rs,Z=z]=P[YImis=yImis,Yobs=yobs,S=sjz,z,m()z,()z,m()z,()z,Z=z] Inthesecondstep,wedrawfromthefullconditionalofm()zgivenfYImisg,z,z,()z,m()z,()z,fYobsg,fSg,fRSgandfZg=z,wherethenotationfDgdenotesdataDforalltheindividualsonthestudy.ThefullconditionalcanbeexpressedasJYj=21Yy=0f(m()z,j,yjfYImisg,z,()z,j,y,fYobsg,fSg,fZg=z) Inthethirdstep,wedrawfromthefullconditionalofm()zgivenfYImisg,z,z,()z,m()z,()z,fYobsg,fSg,fRSgandfZg=z.ThefullconditionalcanbeexpressedasJYj=21Yy=0f(m()z,j1,yjfYImisg,z,()z,j1,y,fYobsg,fSg,fZg=z)

PAGE 74

gE()z,j,y()z,j,y+12YSij,Yi,j1=yi:Zi=zB(z,j, gE()z,j1,y()z,j1,y+12YSij1,Yi,j1=yi:Zi=zB(z,j1, Neal 2003 ). 74

PAGE 75

Finally,wedrawfromthefullconditionalofzgivenfYImisg,z,r()z,()z,r()z,()z,fYobsg,fSg,fRsgandfZg=z.ThefullconditionalcanbeexpressedasJYj=21Yy=0Yall 3 ),theposteriordistributionsofP[Yj=1jSj,Yj1, 3 ),forallZ,jand 75

PAGE 76

Toseethis,notethat(pjjY,N)/pYjj(1pj)njYjZ,(pjj,)(,)dd.

PAGE 77

DanielsandHogan 2008 ; HoganandLaird 1997b ; KenwardandMolenberghs 1999 ; Little 1995 ; MolenberghsandKenward 2007 ).Inthispaper,wewillconcernourselveswithpatternmixturemodelswithmonotonemissingness(i.e.,drop-out).Forpatternmixturemodelswithnon-monotone(i.e.,intermittent)missingness(detailsgobeyondthescopeofthispaper),oneapproachistopartitionthemissingdataandallowone(ormore)orthepartitionstobeignoredgiventheotherpartition(s)( HarelandSchafer 2009 ; Wangetal. 2010 ). Itiswellknownthatpattern-mixturemodelsarenotidentied:theobserveddatadoesnotprovideenoughinformationtoidentifythedistributionsforincompletepatterns.Theuseofidentifyingrestrictionsthatequatetheinestimableparameterstofunctionsofestimableparametersisanapproachtoresolvetheproblem( DanielsandHogan 2008 ; Kenwardetal. 2003 ; Little 1995 ; LittleandWang 1996 ; Thijsetal. 2002 ).Commonidentifyingrestrictionsincludecompletecasemissingvalue(CCMV)constraintsandavailablecasemissingvalue(ACMV)constraints. Molenberghsetal. ( 1998 )provedthatfordiscretetimepointsandmonotonemissingness,theACMVconstraintisequivalenttomissingatrandom(MAR),asdenedby Rubin ( 1976 )and LittleandRubin ( 1987 ).Akeyandattractivefeatureofidentifyingrestrictionsisthattheydonotaffectthetofthemodeltotheobserveddata.Understanding(identifying)restrictionsthatleadtoMARisanimportantrststepforsensitivityanalysisundermissingnotatrandom(MNAR)( DanielsandHogan 2008 ; Scharfsteinetal. 2003 ; ZhangandHeitjan 2006 ). 77

PAGE 78

DanielsandHogan 2008 ; Scharfsteinetal. 1999 ; Vansteelandtetal. 2006b ). Thenormalityofresponsedata(ifappropriate)forpatternmixturemodelsisdesirableasiteasilyallowsincorporationofbaselinecovariatesandintroductionofsensitivityparameters(forMNARanalysis)thathaveconvenientinterpretationsasdeviationsofmeansandvariancesfromMAR( DanielsandHogan 2008 ).However,multivariatenormalitywithinpatternscanbeoverlyrestrictive.Weexploresuchissuesinthispaper. Onecriticismofmixturemodelsisthattheyofteninducemissingdatamechanismsthatdependonthefuture( Kenwardetal. 2003 ).Weexploresuchnon-futuredependenceinourcontexthereandshowhowmixturemodelsthathavesuchmissingdatamechanismshavefewersensitivityparameters. InSection 4.2 ,weshowconditionsunderwhichMARexistsanddoesnotexistwhenthefull-dataresponseisassumedmultivariatenormalwithineachmissingpattern.InSection 4.3 andSection 4.4 inthesamesetting,weexploresensitivityanalysisstrategiesunderMNARandundernon-futuredependentMNARrespectively.InSection 4.5 ,weproposeasensitivityanalysisapproachwhereonlytheobserveddatawithinpatternareassumedmultivariatenormal.InSection 4.6 ,weapplytheframeworksdescribedinprevioussectionstoarandomizedclinicaltrialforestimatingtheeffectivenessofrecombinantgrowthhormoneforincreasingmusclestrengthintheelderly.InSection 4.7 ,weshowthatinthepresenceofbaselinecovariateswithtime-invariantcoefcients,standardidentifyingrestrictionscauseover-identicationofthebaselinecovariateeffectsandweproposearemedy.WeprovideconclusionsinSection8. 78

PAGE 79

WeshowthatMARdoesnotnecessarilyexistwhenitisassumedthat foralls. Toseethis,weintroducesomefurthernotation.Let(s)(j)=E( 4 ),dene(s)1(j)=(s)21(j)(s)11(j)1(s)2(j)=(s)2(j)(s)1(j)(s)1(j)(s)3(j)=(s)22(j)(s)21(j)(s)11(j)1(s)12(j).

PAGE 80

Proof. 4.1 issatised,thenthereexistsaconditionaldistributionps(yjj 4 )forMARtoexist. 4 ),identicationviaMARconstraintsexistsifandonlyif(s)and(s)satisfyLemma 4.1 forsjand1
PAGE 81

4.2 areimposed). Wenowexaminethecorrespondingmissingdatamechanism(MDM),SjY.Weuse'todenoteequalityindistribution. 4 )withmonotonedropout,MARholdsifandonlyifSjY'SjY1. Proof. Ontheotherhand,MARimpliesthatp(S=sjY)=p(S=sjYobs)=p(S=sj 4.2 ,wehavethatMARholdsonlyifforall1
PAGE 82

4 )withmonotonedropout,MCARisequivalenttoMARifps(y1)=p(y1)foralls. Proof. 4.3 ,weshowedthatMARholdsifp(S=sjY)=ps(y1) 4 )withmonotonedropout,MARconstraintsareidenticaltocompletecasemissingvalue(CCMV)andnearest-neighborconstraints(NCMV). Proof. 4.2 ,theMARconstraintsimplypj(yjj

PAGE 83

4 )anddemonstratethatMARonlyexistsunderthefairlystrictconditionsgiveninTheorem1. 4.2 ,weproposetofollowtheapproachin DanielsandHogan ( 2008 ,Chapter8)andspecifydistributionsofobservedYwithinpatternas: wherej=f1,2,...,j1g.Notethatbyconstruction,weassumeps(yjj 4 )withmonotonedropout,identicationviaMARconstraintsexistsifandonlytheobserveddatacanbemodeledas( 4 ). Proof. 4.2 showsthatidenticationviaMARconstraintsexistsifandonlyifconditionaldistributionsps(yjj 4.6 impliesthatunderthemultivariatenormalityassumptionin( 4 )andtheMARassumption,asequentialspecicationasin( 4 )alwaysexists. WeprovidesomedetailsforMARinmodel( 4 )(whichimpliesthespecicationin( 4 )asstatedinCorollary 4.6 )next.Distributionsformissingdata(whicharenot 83

PAGE 84

ThemotivationoftheproposedsequentialmodelistoallowastraightforwardextensionoftheMARspecicationtoalargeclassofMNARmodelsindexedbyparametersmeasuringdeparturesfromMAR,aswellastheattractionofdoingsensitivityanalysisonmeansand/orvariancesinnormalmodels. Forexample,wecanlet(j)l=(j)l+(j)landlog(j)jjj=(j)+log(j)jjj

PAGE 85

DanielsandHogan 2008 ).Indeed,wecouldmake(j)land(j)independentofjtofurtherreducethenumberofsensitivityparameters. ToseetheimpactoftheparametersontheMDM,weintroducenotation(j)jjj=(j)0+Pj1l=1(j)lYlandthenfork
PAGE 86

Kenwardetal. 2003 ). Kenwardetal. ( 2003 )showedthatnon-futuredependenceholdsifandonlyifforeachj3andk
PAGE 87

4.3 ). Forexample,wemayspecifydistributionsYobsjSasfollows:ps(y1)N((s)1,(s)1)1sJps(yjj fors
PAGE 88

UnderanMARassumption( 4 ),for[Yjj 4 ),thetwosensitivityparameterscontrolthedepartureofthemeanandvariancefromMARinthefollowingway,(s),MNARjjj=(j)+(s),MARjjjand(s),MNARjjj=e2(j)(s),MARjjj+(1e2(j))M, Byassumingnon-futuredependence,weobtainps(yjj

PAGE 89

4 )forthecurrentdata(j=s+1).ThenumberofsensitivityparametersinthissetupisreducedfromJ(J1)to(J2)(J1);so,forJ=3(6),from6(30)to2(20).Furtherreductionsareillustratedinthenextsection. 4.4 and 4.5 thatassumemultivariatenormalityforthefull-dataresponsewithinpattern(MVN)ormultivariatenormalityfortheobserveddataresponsewithinpattern(OMVN).Weassumenon-futuredependenceforthemissingdatamechanismtominimizethenumberofsensitivityparameters. Thegrowthhormone(GH)trialwasarandomizedclinicaltrialconductedtoestimatetheeffectivenessofrecombinanthumangrowthhormonetherapyforincreasingmusclestrengthintheelderly.Thetrialhadfourtreatmentarms:placebo(P),growthhormoneonly(G),exerciseplusplacebo(EP),andexerciseplusgrowthhormone(EG).Musclestrength,heremeanquadricepsstrength(QS),measuredasthemaximumfoot-poundsoftorquethatcanbeexertedagainstresistanceprovidedbyamechanicaldevice,wasmeasuredatbaseline,6monthsand12months.Therewere161participantsenrolledonthisstudy,butonly(roughly)75%ofthemcompletedthe12monthfollowup.Researchersbelievedthatdropoutwasrelatedtotheunobservedstrengthmeasuresatthedropouttimes. Forillustration,weconneourattentiontothetwoarmsusingexercise:exerciseplusplacebo(EP)andexerciseplusgrowthhormone(EG).Table 4-1 containstheobserveddata. Let(Y1,Y2,Y3)denotethefull-dataresponsecorrespondingtobaseline,6months,and12months.LetZbethetreatmentindicator(1=EG,0=EP).OurgoalistodrawinferenceaboutthemeandifferenceofQSbetweenthetwotreatmentarmsatmonth 89

PAGE 90

Thisreducesthesetofsensitivityparameterstof(2)0,(3)0gforMVNmodelandf(2),(3)gfortheOMVNmodel. Thereareavarietyofwaystospecifypriorsforthesensitivityparameters(2)0and(3)0,(2)0=E(Y2jY1,S=1)E(Y2jY1,S2)(3)0=E(Y3jY2,Y1,S=2)E(Y3jY2,Y1,S=3). Basedondiscussionwithinvestigators,wemadetheassumptionthatdropoutsdoworsethancompleters;thus,werestrictthe'stobelessthanzero.TodoafullyBayesiananalysistofairlycharacterizetheuncertaintyassociatedwiththemissingdatamechanism,weassumeauniformpriorforthe'sasadefaultchoice.Subjectmatterconsiderationsgaveanupperboundofzerofortheuniformdistributions.Weset 90

PAGE 91

WetthemodelusingWinBUGS,withmultiplechainsof25,000iterationsand4000burn-in.Convergencewascheckedbyexaminingtraceplotsofthemultiplechains. TheresultsoftheMVNmodel,OMVNmodel,andtheobserveddataanalysisarepresentedinTable 4-2 .UnderMNAR,theposteriormean(posteriorstandarddeviation)ofthedifferenceinquadricepsstrengthat12monthsbetweenthetwotreatmentarmswas4.0(8.9)and4.4(10)fortheMVNandOMVNmodels.UnderMARthedifferenceswere5.4(8.8)and5.8(9.9)fortheMVNandOMVNmodels,respectively.ThesmallerdifferencesunderMNARwereduetoquadricepsstrengthat12monthsbeinglowerunderMNARduetotheassumptionthatdropoutsdoworsethancompleters.Weconcludethatthetreatmentdifference,wasnotsignicantlydifferentfromzero.

PAGE 92

For( 4 )toholdforallY1andX,weneedthat(1)=(2). LittleandWang 1996 ; WangandDaniels 2009 ). Toresolvetheover-identicationissue,weproposetoapplyMARconstraintsonresidualsinsteadofdirectlyontheresponses.Inthebivariatecase,thecorrespondingrestrictionis 92

PAGE 93

4 )placesnoconstraintson(s),thusavoidingover-identication. TheMDMcorrespondingtotheACMV(MAR)ontheresidualsisgivenbylogP(S=1jY,X) 1(X)1 2((1B)2X((2)(2)T(1)(1)T)XT2(1B)(Y2(Y1))X((2)(1)))1 2log(2)11 4 )impliesMARifandonlyif(2)=(1).Soingeneral,MARonresidualsdoesnotimplythatmissingnessinY2isMAR.However,itisanidentifyingrestrictionthatdoesnotaffectthetofthemodeltotheobserveddata.CCMVandNCMVrestrictionscanbeappliedsimilarlytotheresiduals.

PAGE 94

Forthemissingdata,theconditionaldistributionsarespeciedasps(yjj Theconditionalmeanstructuresin( 4 )and( 4 )inducethefollowingformforthemarginalmeanresponseE(YjjS=s)=U(s)j+X(s), NotethatsinceY1isalwaysobserved,(s)(1sJ)areidentiedbytheobserveddata.However,inthemodelgivenby( 4 )and( 4 ),thereisatwo-foldover-identicationof(s)underMAR: 1. ForMARconstraintstoexistunderthemodelgivenin( 4 ),(s)jjjasdenedin( 4 )mustbeequalfor2jsJandforallX.Thisrequiresthat(s)=for2jsJ. 2. MARconstraintsalsoimplythat(s)jjjasdenedin( 4 )mustbeequalto(j)jjjfor1s
PAGE 95

Withtheconditionalmeanstructuresspeciedas( 4 )and( 4 ),theMARontheresidualsrestrictionplacesnoassumptionson(s). ThecorrespondingMDMislogP(S=sjY,X) However,asimplepatternmixturemodelbasedonmultivariatenormalityforthefulldataresponsewithinpatternsdoesnotallowMARwithoutspecialrestrictionsthatthemselves,induceaveryrestrictivemissingdatamechanism.Weexploredthis 95

PAGE 96

Inaddition,weshowedthatwhenintroducingbaselinecovariateswithtimeinvariantcoefcients,standardidentifyingrestrictionsresultinover-identicationofthemodel.Thisisagainsttheprincipleofapplyingidentifyingrestrictioninthattheyshouldnotaffectthemodelttotheobserveddata.Weproposedasimplealternativesetofrestrictionsbasedonresidualsthatcanbeusedasan'identication'startingpointforananalysisusingmixturemodels. Inthegrowthhormonestudydataexample,weshowedhowtoreducethenumberofsensitivityparametersinpracticeandadefaultwaytoconstructinformativepriorsforsensitivityparametersbasedonlimitedknowledgeaboutthemissingness.Inparticular,allthevaluesintherange,Dwereweightedequallyviaauniformdistribution.Ifthereisadditionalexternalinformationfromexpertopinionorhistoricaldata,informativepriorsmaybeusedtoincorporatesuchinformation(forexample,see IbrahimandChen 2000 ; Wangetal. 2010 ).Finally,animportantconsiderationinsensitivityanalysisandconstructinginformativepriorsisthattheyshouldavoidextrapolatingmissingvaluesoutsideofareasonablerange(e.g.,negativequadricepsstrength). GrowthHormoneStudy:Samplemean(standarddeviation)stratiedbydropoutpattern. DropoutNumberofMonth TreatmentPatternParticipants0612 EG11258(26)2457(15)68(26)32278(24)90(32)88(32)All3869(25)87(32)88(32) EP1765(32)2287(52)86(51)33165(24)81(25)73(21)All4066(26)82(26)73(21) 96

PAGE 97

GrowthHormoneStudy:Posteriormean(standarddeviation) ObservedMARAnalysisMNARAnalysis TreatmentMonthDataMVNOMVNMVNOMVN EP066(9.9)66(6.0)66(6.0)66(6.0)66(6.0)682(18)82(5.9)81(8.2)80(6.0)80(8.3)1272(3.8)73(4.9)73(6.1)72(5.0)71(6.1) EG069(7.3)69(4.9)69(4.9)69(4.9)69(4.9)687(16)81(6.8)82(7.7)78(7.1)79(8.0)1288(6.8)78(7.2)79(7.8)76(7.5)76(8.0) Differenceat12mos.12(7.8)5.4(8.8)5.8(9.9)4.0(8.9)4.4(10) 97

PAGE 98

Missingdatamechanismundermissingnotatrandomandmultivariatenormality:TheMDMinSection 4.3 isderivedasfollows:logP(S=sjY) 2exp((Y1(k)1)2 2exp((Yl(l)ljl(l)ljl)2 4.5 arederivedasfollows:(s),MNARjjj=E(Yjj

PAGE 99

4.6 ):Wespecifyapatternmixturemodelwithsensitivityparametersforthetwotreatmentarms.Forcompactness,wesuppresssubscripttreatmentindicatorzfromalltheparametersinthefollowingmodels. FormissingpatternS,wespecifySMult() 99

PAGE 101

4.7.2 ),theMARontheresidualsrestrictionputsnoconstraintson(s). Let[ZjjS]'YjX(s).TheMARontheresidualsconstraintsarepk(zjj 2(s)ljlyl(l)lX(s)Pj1t=1(j)t(yt(l)tX(s))2 q 2(s)ljlzl(l)lPj1t=1(j)t(zt(l)t2 q

PAGE 102

4 )thusimply(s)j,l=(j)j,l(s)j=(j)j+j1Xl=1(j)j,l((s)l(j)l)(j)jjj=(j)jjj, 102

PAGE 103

Farniretal. 2002 ; Liuetal. 2006 ).Insuchanalysis,researchersareofteninterestedinsimultaneouslyestimatinglinkagethatdescribesthetendencyofcertainallelestobeinheritedtogether,andlinkagedisequilibrium(LD)thatmeasuresthenon-randomassociationbetweendifferentmarkers.However,iftheLDisclosetozero,thelinkagerecombinationfractionishardtoestimate(ornotestimableatall).Weuseatwomarkerscenariotoillustratethisdilemma. 103

PAGE 104

whereDistheLDparameter.Bysimplealgebra,wecanshowthatD=p11p00p10p01. Onepossiblesolutionistoincorporatemoremarkersinthelinkageanalysis.Bydoingthis,thenumberofparentswithnolessthantwoheterozygousmarkersincreases.Consequently,moreoffspringcontributetothelikelihoodforestimatingthelinkagerecombinationfraction.However,thenumberofhaplotypefrequenciestobeestimatedalsoincreases(exponentially)asthenumberofmarkersincreases.Bayesianshrinkagemethodscanbeappliedtoaddressthisproblem. 104

PAGE 105

5.3.1CausalInferenceIntroduction Rubin 1974 ).However,innon-randomizedtrialsorinpresenceofmissingdata,thesemethodsarelimitediftheresearchinterestdemandsestimationofcausallyinterpretableeffects. Todenecausaleffects,werstintroducetheconceptofpotentialoutcomes,whicharesometimesusedexchangeablelywiththetermcounterfactual(butnotalways,see Rubin 2000 ).Theuseoftermpotentialoutcomecanbetracedatleastto Neyman ( 1923 ).NeymanusedpotentialyieldsUiktoindicatetheyieldofaplotkifexposedtoavarietyi. Rubin ( 1974 )denesthecausaleffectofonetreatment,E,overanother,C,foraparticularunitasthedifferencebetweenwhatwouldhavehappenediftheunithadbeenexposedtoE,namelyY(E),andwhatwouldhavehappenediftheunithadbeenexposedtoC,namelyY(C). Usingpotentialoutcomes, FrangakisandRubin ( 2002 )introduceaframeworkforcomparingtreatmenteffectsbasedonprinciplestratication,whichisacross-classicationofunitsdenedbytheirpotentialoutcomeswithrespectto(post)treatmentvariables,suchastreatmentnoncomplianceordrop-out.Thetreatmentcomparisonadjustmentforposttreatmentvariablesisnecessarybecausesuchvariablesencodethecharacteristicsofboththetreatmentandthepatient.Forexample,apatientwithdiagnosedcancerinacancerpreventiontrailmayhavedepressioncausedbythetreatmentorbythediagnosis 105

PAGE 106

AstratumAisdenedbythejointpotentialresponseS(Z)withrespecttotheposttreatmentvariableZ(e.g.,Z=0,1).Forexample,letS(Z)bethepotentialsurvivalstatusandletS(Z)=1and0denotealiveanddeadrespectively.ThenthestratumA=fS(0)=1,S(1)=1gdenesthepatientswhowill(potentially)surviveonbotharms. Astratumisunaffectedbytreatment.Thatis,forsubjecti,i2Aori62Adoesnotdependontheactualtreatmentiisassigned.Consequently,thetreatmenteffectdenedasthedifferencebetweenfYi(0)ji2AgandfYi(1)ji2Ag Onthecontrary,astandardadjustmentforposttreatmentvariablesusesthetreatmentcomparisonbetweenfYi(0)jSi(0)=sgandfYi(1)jSi(1)=sg. ConsistentwithFrangakisandRubin'sframework, Rubin ( 2000 )introducedtheconceptofsurvivorsaveragecausaleffect(SACE),thatisthecausaleffectsoftreatmentonendpointsthataredenedonlyforsurvivors,i.e.thegroupofpatientswhowouldliveregardlessoftheirtreatmentassignment. Withintheprincipalstrataframework,theidenticationofSACEorotherprincipalstratumcausaleffectsusuallydependsonuntestableassumptions.Toaddressthe 106

PAGE 107

ZhangandRubin ( 2003 )derivedlargesampleboundsforcausaleffectswithoutassumptionsandwithassumptionssuchasmonotonicityondeathrateondifferenttreatmentarms. Gilbertetal. ( 2003 )usedaclassoflogisticselectionbiasmodelstoidentifythecausalestimandsandcarriedoutsensitivityanalysisforthemagnitudeofselectionbias. Haydenetal. ( 2005 )assumedexplainablenonrandomnoncompliance( Robins 1998 )andoutlinedasensitivityanalysisforexploringtherobustnessoftheassumption. ChengandSmall ( 2006 )derivedsharpboundsforthecausaleffectsandconstructedcondenceintervalstocovertheidenticationregion. Eglestonetal. ( 2007 )proposedasimilarmethodto ZhangandRubin ( 2003 ),butinsteadofidentifyingthefulljointdistributionofpotentialoutcomes,theyonlyidentifyfeaturesofthejointdistributionthatarenecessaryforidentifyingtheSACEestimand. Leeetal. ( 2010 )replacedthecommondeterministicmonotonicityassumptionbyastochasticonethatallowsincorporationofsubjectspeciceffectsandgeneralizedtheassumptionstomorecomplextrials. Weconsideracontrolledrandomizedclinicalstudywithtreatmentarm(Z=1)andcontrolarm(Z=0).AlongitudinalbinaryoutcomeYisscheduledtobemeasuredatvisitsj=1,...,J,i.e.Y=(Y1,...,YJ)isaJ-dimensionalvector.LetR=(R1,...,RJ)bethemissingindicatorvectorwithRj=1ifYjisobservedandRj=0ifYjismissing.Weassumethemissingnessismonotone. Weassumetherearemultipleeventsthatwillcausedropoutforapatientonthistrial,andcategorizetheeventsasnon-responseevents(e.g.death)andmissingevents(e.g.withdrawofconsent).Weassumethatnon-responseeventsmayhappenafterthe 107

PAGE 108

LetCdenotethesurvivaltimeforapatient.Thatis,C=cimpliesthatanon-responseeventhappenedtothepatientbetweenvisitcandc+1andcausedthepatienttodropoutonandaftervisitc+1.LetRc=fR1,...,RCgbethemissingdataindicatorrecordedpriortopatientdrop-outthatiscausedbyanon-responseevent. Weuse ThefulldataFofapatientthusconsistsoffZ,C(0), forallj.NotethatthegroupofpatientsofinterestfC(0)j,C(1)jgformaprincipalstratum. 108

PAGE 109

3 ,Section 3.2 ,thatRc? 3 thatj,z,cisidentiedbytheobserveddataunderthispartialmissingatrandomassumption. 5 )isnotidentiablefromtheobserveddataO=fZ,C(Z),Rc(Z),Yobs(Z)g. I LetZ=(Z1,...,ZN)bethevectoroftreatmentassignmentforallthepatients.SUTVAmeansZi=Z0i)(Yi(Zi),Ci(Zi))=(Yi(Z0i),Ci(Z0i)), II III Thisassumptionprovidesanorderingofthemeanpotentialresponseatvisitjundertreatmentzforalltheprincipalcohortsofindividualswhowouldbeonstudyatvisitjundertreatmentz.Themeansareassumedtonotbeworseforcohortswhoremainon-studylongerunderbothtreatments.Thatis,theindividualswhowouldbelastseenattimec0(c0j)undertreatmentzandtimet0undertreatment1zwillnothaveaworsemeanpotentialresponseattimejundertreatmentzthanindividuals 109

PAGE 110

Themeanmonotonicityassumptionisoftenreasonableinclinicalstudies.Forexample,inacardiovascularstentimplantationtrial,multipleendpointsincludingall-causemortalityfreesurvivaland6-minutewalktestscoreareusedtoevaluatetheeffectivenessofthedevice.Sincethetwoendpointsarepositivelycorrelated,itisplausibletoassumethatpatientswillpotentiallyperformbetterwiththeir6-minutewalktestsiftheyhavealongersurvivaltime,i.e.remainonthestudylonger. Weintroducesomefurthernotation 1. 2. 3. 4. NotethatunderAssumption II (randomization),bothz,c=P(C(z)=c)=P(C=cjZ=z)andj,z,c=E[Yj(z)jC(z)=c]=E[YjjC=c,Z=z]areidentiedbytheobserveddataunderthepartialmissingatrandomassumption. ThecausaleffectofinterestSACEjcanbeexpressedas TheboundariesofSACEjin( 5 )canbefoundsubjecttothefollowingrestrictions: 1. 2. 3. 4. 110

PAGE 111

III FindingtheboundariesoftheSACE,i.e.ndingtheminimumandthemaximumoftheobjectivefunction( 5 ),canbeapproximated(byignoringthenormalizingconstant)asanon-convexquadraticallyconstrainedquadraticproblem(QCQP)( BoydandVandenberghe 1997 2004 ).ForaQCQP,astandardapproachistooptimizeasemideniterelaxationoftheQCQPandgetlowerandupperboundsonlocaloptimaloftheobjectivefunction( BoydandVandenberghe 1997 ). TheuncertaintyoftheestimatedboundscanbecharacterizedinaBayesianframework.Thejointposteriordistributionoftheboundscanbeconstructedbyimplementingtheoptimizationforeachposteriorsampleofj,z,c,identiedbythealgorithmproposedinSection 5.3.3 .TheresultcanbepresentedasinFigure 5-1 .Astudydecisionmightbebasedonthemodeoftheposteriorjointdistributionofthebounds. 5-2 ).Theseassumptions,whenreasonable,willsimplifytheoptimizationoftheobjectivefunctionandyieldmorepreciseresults. 1. Thatis,givenapatientwillsurviveuntiltimepointconthetreatmentarm,theprobabilitythepatientwillsurviveuntiltimepointn1isqtimestheprobabilitythatthepatientwillsurviveuntilnforncontheplaceboarm.Theparameterqisasensitivityparameter. 2. 111

PAGE 112

5-2 iszero. TheseassumptionsmaybeincorporatedintheoptimizationBayesianframeworktoimprovetheprecisionoftheposteriorjointdistributionofthebounds.

PAGE 113

ContourandPerspectivePlotsofaBivariateDensity 113

PAGE 114

Illustrationofpc,t

PAGE 115

Albert,P.(2000).ATransitionalModelforLongitudinalBinaryDataSubjecttoNonignorableMissingData.Biometrics56,602. Albert,P.,Follmann,D.,Wang,S.,andSuh,E.(2002).Alatentautoregressivemodelforlongitudinalbinarydatasubjecttoinformativemissingness.Biometrics58,631. Baker,S.(1995).Marginalregressionforrepeatedbinarydatawithoutcomesubjecttonon-ignorablenon-response.Biometrics51,1042. Baker,S.,Rosenberger,W.,andDerSimonian,R.(1992).Closed-formestimatesformissingcountsintwo-waycontingencytables.StatisticsinMedicine11,643. Birmingham,J.andFitzmaurice,G.(2002).APattern-MixtureModelforLongitudinalBinaryResponseswithNonignorableNonresponse.Biometrics58,989. Boyd,S.andVandenberghe,L.(1997).Semideniteprogrammingrelaxationsofnon-convexproblemsincontrolandcombinatorialoptimization.communications,computation,controlandsignalprocessing:atributetoThomasKailath. Boyd,S.andVandenberghe,L.(2004).Convexoptimization.CambridgeUnivPr. Cheng,J.andSmall,D.(2006).Boundsoncausaleffectsinthree-armtrialswithnon-compliance.JournaloftheRoyalStatisticalSociety:SeriesB(StatisticalMethodology)68,815. Christiansen,C.andMorris,C.(1997).HierarchicalPoissonregressionmodeling.JournaloftheAmericanStatisticalAssociationpages618. Daniels,M.(1999).Apriorforthevarianceinhierarchicalmodels.CanadianJournalofStatistics27,. Daniels,M.andHogan,J.(2000).ReparameterizingthePatternMixtureModelforSensitivityAnalysesUnderInformativeDropout.Biometrics56,1241. Daniels,M.andHogan,J.(2008).MissingDatainLongitudinalStudies:StrategiesforBayesianModelingandSensitivityAnalysis.Chapman&Hall/CRC. DeGruttola,V.andTu,X.(1994).ModellingProgressionofCD4-lymphocyteCountanditsRelationshiptoSurvivalTime.Biometrics50,1003. Diggle,P.andKenward,M.(1994).InformativeDrop-outLongitudinalDataAnalysis.AppliedStatistics43,49. Egleston,B.L.,Scharfstein,D.O.,Freeman,E.E.,andWest,S.K.(2007).Causalinferencefornon-mortalityoutcomesinthepresenceofdeath.Biostatistics8,526545. 115

PAGE 116

Fan,J.andLi,R.(2001).VariableSelectionViaNonconcavePenalizedLikelihoodandItsOracleProperties.JournaloftheAmericanStatisticalAssociation96,1348. Farnir,F.,Grisart,B.,Coppieters,W.,Riquet,J.,Berzi,P.,Cambisano,N.,Karim,L.,Mni,M.,Moisio,S.,Simon,P.,etal.(2002).Simultaneousminingoflinkageandlinkagedisequilibriumtonemapquantitativetraitlociinoutbredhalf-sibpedigrees:revisitingthelocationofaquantitativetraitlocuswithmajoreffectonmilkproductiononbovinechromosome14.Genetics161,275. Faucett,C.andThomas,D.(1996).Simultaneouslymodellingcensoredsurvivaldataandrepeatedlymeasuredcovariates:aGibbssamplingapproach.StatisticsinMedicine15,. Fisher,B.,Costantino,J.,Wickerham,D.,Redmond,C.,Kavanah,M.,Cronin,W.,Vogel,V.,Robidoux,A.,Dimitrov,N.,Atkins,J.,Daly,M.,Wieand,S.,Tan-Chiu,E.,Ford,L.,Wolmark,N.,otherNationalSurgicalAdjuvantBreast,andInvestigators,B.P.(1998).Tamoxifenforpreventionofbreastcancer:reportoftheNationalSurgicalAdjuvantBreastandBowelProjectP-1study.JournaloftheNationalCancerInstitute90,1371. Fitzmaurice,G.andLaird,N.(2000a).Generalizedlinearmixturemodelsforhandlingnonignorabledropoutsinlongitudinalstudies.Biostatistics1,141. Fitzmaurice,G.andLaird,N.(2000b).GeneralizedLinearMixtureModelsforHandlingNonignorableDropoutsinLongitudinalStudies.Biostatistics1,141. Fitzmaurice,G.,Molenberghs,G.,andLipsitz,S.(1995).RegressionModelsforLongitudinalBinaryResponseswithInformativeDrop-Outs.JournaloftheRoyalStatisticalSociety.SeriesB.Methodological57,691. Follmann,D.andWu,M.(1995).Anapproximategeneralizedlinearmodelwithrandomeffectsforinformativemissingdata.Biometricspages151. Forster,J.andSmith,P.(1998).Model-BasedInferenceforCategoricalSurveyDataSubjecttoNon-IgnorableNon-Response.JournaloftheRoyalStatisticalSociety:SeriesB:StatisticalMethodology60,57. Frangakis,C.E.andRubin,D.B.(2002).Principalstraticationincausalinference.Biometrics58,21. Gilbert,P.,Bosch,R.,andHudgens,M.(2003).SensitivityanalysisfortheassessmentofcausalvaccineeffectsonviralloadinHIVvaccinetrials.Biometrics59,531. Green,P.J.andSilverman,B.(1994).NonparametricRegressionandGeneralizedLinearModels.Chapman&Hall. 116

PAGE 117

Hayden,D.,Pauler,D.,andSchoenfeld,D.(2005).Anestimatorfortreatmentcomparisonsamongsurvivorsinrandomizedtrials.Biometrics61,305. Heagerty,P.(2002).Marginalizedtransitionmodelsandlikelihoodinferenceforlongitudinalcategoricaldata.Biometricspages342. Heckman,J.(1979a).SampleSelectionBiasasaSpecicationError.Econometrica47,153. Heckman,J.(1979b).Sampleselectionbiasasaspecicationerror.Econometrica:Journaloftheeconometricsocietypages153. Heitjan,D.andRubin,D.(1991).Ignorabilityandcoarsedata.TheAnnalsofStatisticspages2244. Henderson,R.,Diggle,P.,andDobson,A.(2000).Jointmodellingoflongitudinalmeasurementsandeventtimedata.Biostatistics1,465. Hogan,J.andLaird,N.(1997a).MixtureModelsfortheJointDistributionofRepeatedMeasuresandEventTimes.StatisticsinMedicine16,239. Hogan,J.andLaird,N.(1997b).Model-BasedApproachestoAnalysingIncompleteLongitudinalandFailureTimeData.StatisticsinMedicine16,259. Hogan,J.,Lin,X.,andHerman,B.(2004).Mixturesofvaryingcoefcientmodelsforlongitudinaldatawithdiscreteorcontinuousnonignorabledropout.Biometrics60,854. Ibrahim,J.andChen,M.(2000).Powerpriordistributionsforregressionmodels.StatisticalSciencepages46. Ibrahim,J.,Chen,M.,andLipsitz,S.(2001).Missingresponsesingeneralisedlinearmixedmodelswhenthemissingdatamechanismisnonignorable.Biometrika88,551. Kaciroti,N.,Schork,M.,Raghunathan,T.,andJulius,S.(2009).ABayesianSensitivityModelforIntention-to-treatAnalysisonBinaryOutcomeswithDropouts.StatisticsinMedicine28,572. Kenward,M.andMolenberghs,G.(1999).ParametricModelsforIncompleteContinuousandCategoricalLongitudinalData.StatisticalMethodsinMedicalResearch8,51. Kenward,M.,Molenberghs,G.,andThijs,H.(2003).Pattern-mixturemodelswithpropertimedependence.Biometrika90,53. Kurland,B.andHeagerty,P.(2004).MarginalizedTransitionModelsforLongitudinalBinaryDatawithIgnorableandNon-IgnorableDrop-Out.StatisticsinMedicine23,2673. 117

PAGE 118

Land,S.,Wieand,S.,Day,R.,TenHave,T.,Costantino,J.,Lang,W.,andGanz,P.(2002).MethodologicalIssuesIntheAnalysisofQualityofLifeDatainClinicalTrials:IllustrationsfromtheNationalSurgicalAdjuvantBreastAndBowelProject(NSABP)BreastCancerPreventionTrial.StatisticalMethodsforQualityofLifeStudiespages71. Lee,J.andBerger,J.(2001).SemiparametricBayesianAnalysisofSelectionModels.JournaloftheAmericanStatisticalAssociation96,1397. Lee,J.,Hogan,J.,andHitsman,B.(2008).Sensitivityanalysisandinformativepriorsforlongitudinalbinarydatawithoutcome-relateddrop-out.TechnicalReport,BrownUniversity. Lee,K.,Daniels,M.J.,andSargent,D.J.(2010).Causaleffectsoftreatmentsforinformativemissingdataduetoprogression.ToAppearinJASA. Liang,K.-Y.andZeger,S.L.(1986).Longitudinaldataanalysisusinggeneralizedlinearmodels.Biometrika73,13. Lin,H.,McCulloch,C.,andRosenheck,R.(2004).Latentpatternmixturemodelsforinformativeintermittentmissingdatainlongitudinalstudies.Biometrics60,295. Little,R.(1993).Pattern-MixtureModelsforMultivariateIncompleteData.JournaloftheAmericanStatisticalAssociation88,125. Little,R.(1994).AClassofPattern-MixtureModelsforNormalIncompleteData.Biometrika81,471. Little,R.(1995).Modelingthedrop-outmechanisminrepeated-measuresstudies.JournaloftheAmericanStatisticalAssociation90,. Little,R.andRubin,D.(1987).StatisticalAnalysiswithMissingData.Wiley. Little,R.andRubin,D.(1999).CommentonAdjustingforNon-IgnorableDrop-outUsingSemiparametricModelsbyD.O.Scharfstein,A.RotnitskyandJ.M.Robins.JournaloftheAmericanStatisticalAssociation94,1130. Little,R.andWang,Y.(1996).Pattern-mixturemodelsformultivariateincompletedatawithcovariates.Biometrics52,98. Liu,T.,Todhunter,R.,Lu,Q.,Schoettinger,L.,Li,H.,Littell,R.,Burton-Wurster,N.,Acland,G.,Lust,G.,andWu,R.(2006).Modellingextentanddistributionofzygoticdisequilibrium:Implicationsforamultigenerationalcaninepedigree.Genetics. 118

PAGE 119

Molenberghs,G.,Kenward,M.,andLesaffre,E.(1997).TheAnalysisofLongitudinalOrdinalDatawithNonrandomDrop-Out.Biometrika84,33. Molenberghs,G.andKenward,M.G.(2007).MissingDatainClinicalStudies.Wiley. Molenberghs,G.,Michiels,B.,Kenward,M.,andDiggle,P.(1998).MonotoneMissingDataandPattern-MixtureModels.StatisticaNeerlandica52,153. Neal,R.(2003).Slicesampling.TheAnnalsofStatistics31,705. Neyman,J.(1923).Ontheapplicationofprobabilitytheorytoagriculturalexperiments.StatisticalScience5,465. Nordheim,E.(1984).InferencefromNonrandomlyMissingCategoricalData:anExampleFromaGeneticStudyofTurner'sSyndrome.JournaloftheAmericanStatisticalAssociation79,772. Pauler,D.,McCoy,S.,andMoinpour,C.(2003).PatternMixtureModelsforLongitudinalQualityofLifeStudiesinAdvancedStageDisease.StatisticsinMedicine22,795. Pulkstenis,E.,TenHave,T.,andLandis,J.(1998).ModelfortheAnalysisofBinaryLongitudinalPainDataSubjecttoInformativeDropoutThroughRemedication.JournaloftheAmericanStatisticalAssociation93,438. Radloff,L.(1977).TheCES-DScale:ASelf-ReportDepressionScaleforResearchintheGeneralPopulation.AppliedPsychologicalMeasurement1,385. Robins,J.(1997).Non-responsemodelsfortheanalysisofnon-monotonenon-ignorablemissingdata.StatisticsinMedicine16,21. Robins,J.(1998).Correctionfornon-complianceinequivalencetrials.StatisticsinMedicine17,. Robins,J.andRitov,Y.(1997).Towardacurseofdimensionalityappropriate(coda)asymptotictheoryforsemi-parametricmodels.StatisticsinMedicine16,285. Robins,J.,Rotnitzky,A.,andZhao,L.(1994).Estimationofregressioncoefcientswhensomeregressorsarenotalwaysobserved.JournaloftheAmericanStatisticalAssociation89,846. Robins,J.,Rotnitzky,A.,andZhao,L.(1995).Analysisofsemiparametricregressionmodelsforrepeatedoutcomesinthepresenceofmissingdata.JournaloftheAmericanStatisticalAssociation90,. 119

PAGE 120

Rotnitzky,A.,Robins,J.,andScharfstein,D.(1998b).Semiparametricregressionforrepeatedoutcomeswithnonignorablenonresponse.JournaloftheAmericanStatisticalAssociation93,1321. Rotnitzky,A.,Scharfstein,D.O.,Su,T.-L.,andRobins,J.M.(2001).Methodsforconductingsensitivityanalysisoftrialswithpotentiallynonignorablecompetingcausesofcensoring.Biometrics57,103. Roy,J.(2003).ModelingLongitudinalDatawithNonignorableDropoutsUsingaLatentDropoutClassModel.Biometrics59,829. Roy,J.andDaniels,M.J.(2008).Ageneralclassofpatternmixturemodelsfornonignorabledropoutwithmanypossibledropouttimes.Biometrics64,538. Rubin,D.(1974).Estimatingcausaleffectsoftreatmentsinrandomizedandnonrandomizedstudies.JournalofEducationalPsychology66,688. Rubin,D.(1976).Inferenceandmissingdata.Biometrika63,581. Rubin,D.(1977).Formalizingsubjectivenotionsabouttheeffectofnonrespondentsinsamplesurveys.JournaloftheAmericanStatisticalAssociationpages538. Rubin,D.B.(1987).MultipleImputationforNonresponseinSurveys.Wiley. Rubin,D.B.(2000).Causalinferencewithoutcounterfactuals:comment.JournaloftheAmericanStatisticalAssociationpages435. Scharfstein,D.,Daniels,M.,andRobins,J.(2003).IncorporatingPriorBeliefsaboutSelectionBiasintotheAnalysisofRandomizedTrialswithMissingOutcomes.Biostatistics4,495. Scharfstein,D.,Halloran,M.,Chu,H.,andDaniels,M.(2006).Onestimationofvaccineefcacyusingvalidationsampleswithselectionbias.Biostatistics7,615. Scharfstein,D.,Manski,C.,andAnthony,J.(2004).OntheConstructionofBoundsinProspectiveStudieswithMissingOrdinalOutcomes:ApplicationtotheGoodBehaviorGameTrial.Biometrics60,154. Scharfstein,D.,Rotnitzky,A.,andRobins,J.(1999).AdjustingforNonignorableDrop-OutUsingSemiparametricNonresponseModels.JournaloftheAmericanStatisticalAssociation94,1096. 120

PAGE 121

Shepherd,B.,Gilbert,P.,andMehrotra,D.(2007).ElicitingaCounterfactualSensitivityParameter.AmericanStatistician61,56. TenHave,T.,Kunselman,A.,Pulkstenis,E.,andLandis,J.(1998).Mixedeffectslogisticregressionmodelsforlongitudinalbinaryresponsedatawithinformativedrop-out.Biometrics54,367. TenHave,T.,Miller,M.,Reboussin,B.,andJames,M.(2000).MixedEffectsLogisticRegressionModelsforLongitudinalOrdinalFunctionalResponseDatawithMultiple-CauseDrop-OutfromtheLongitudinalStudyofAging.Biometrics56,279. Thijs,H.,Molenberghs,G.,Michiels,B.,Verbeke,G.,andCurran,D.(2002).Strategiestotpattern-mixturemodels.Biostatistics3,245. Troxel,A.,Harrington,D.,andLipsitz,S.(1998).Analysisoflongitudinaldatawithnon-ignorablenon-monotonemissingvalues.JournaloftheRoyalStatisticalSociety.SeriesC(AppliedStatistics)47,425. Troxel,A.,Lipsitz,S.,andHarrington,D.(1998).Marginalmodelsfortheanalysisoflongitudinalmeasurementswithnonignorablenon-monotonemissingdata.Biometrika85,661. Tsiatis,A.A.(2006).Semiparametrictheoryandmissingdata.Springer,NewYork. vanderLaan,M.J.andRobins,J.(2003).UniedMethodsforCensoredLongitudinalDataandCausality.Springer. Vansteelandt,S.,Goetghebeur,E.,Kenward,M.,andMolenberghs,G.(2006a).Ignoranceanduncertaintyregionsasinferentialtoolsinasensitivityanalysis.Statis-ticaSinica16,953. Vansteelandt,S.,Goetghebeur,E.,Kenward,M.,andMolenberghs,G.(2006b).Ignoranceanduncertaintyregionsasinferentialtoolsinasensitivityanalysis.Statis-ticaSinica16,953. Vansteelandt,S.,Rotnitzky,A.,andRobins,J.(2007).Estimationofregressionmodelsforthemeanofrepeatedoutcomesundernonignorablenonmonotonenonresponse.Biometrika94,841. Wahba,G.(1990).Splinemodelsforobservationaldata.SocietyforIndustrialMathematics. 121

PAGE 122

Wang,C.,Daniels,M.,D.O.,S.,andLand,S.(2010).ABayesianshrinkagemodelforincompletelongitudinalbinarydatawithapplicationtothebreastcancerpreventiontrial.ToAppearinJASA. Wu,M.andBailey,K.(1988).Analysingchangesinthepresenceofinformativerightcensoringcausedbydeathandwithdrawal.StatisticsinMedicine7,. Wu,M.andBailey,K.(1989).Estimationandcomparisonofchangesinthepresenceofinformativerightcensoring:conditionallinearmodel.Biometricspages939. Wu,M.andCarroll,R.(1988).Estimationandcomparisonofchangesinthepresenceofinformativerightcensoringbymodelingthecensoringprocess.Biometrics44,175. Wulfsohn,M.andTsiatis,A.(1997).Ajointmodelforsurvivalandlongitudinaldatameasuredwitherror.Biometricspages330. Yuan,Y.andLittle,R.J.(2009).Mixed-effecthybridmodelsforlongitudinaldatawithnonignorabledrop-out.Biometrics(inpress). Zhang,J.andHeitjan,D.(2006).Asimplelocalsensitivityanalysistoolfornonignorablecoarsening:applicationtodependentcensoring.Biometrics62,1260. Zhang,J.andRubin,D.(2003).EstimationofCausalEffectsviaPrincipalStraticationWhenSomeOutcomesareTruncatedbyDeath.JournalofEducationalandBehavioralStatistics28,353. 122

PAGE 123

ChenguangWangreceivedhisbachelor'sandmaster'sdegreesincomputersciencefromDalianUniversityofTechnology,China.Chenguanglaterjoinedthebiometryprogramofandreceivedhismaster'sdegreeinstatisticsfromUniversityofNebraska-Lincoln.AtUniversityofFlorida,Chenguang'smajorwasstatisticswhilesimultaneouslyworkingfortheChildren'sOncologyGroupStatisticsandDataCenter(2004-2009)andCenterforDevicesandRadiologicalHealth,FDA(2009-2010).ChenguangreceivedhisPh.D.fromUniversityofFloridainthesummerof2010.Chenguang'sresearchhasfocusedonconstructingaBayesianframeworkforincompletelongitudinaldatathatidentiestheparametersofinterestandassessessensitivityoftheinferenceviaincorporatingexpertopinions.Suchaframeworkcanbebroadlyusedinclinicaltrialstoprovidehealthcareprofessionalsmoreaccurateunderstandingofthestatisticalorcausalrelationshipbetweenclinicalinterventionsandhumandiseases.ChenguangisamemberofAmericanStatisticalAssociation,amemberofEasternNorthAmericanRegion/InternationalBiometricSociety,andamemberofChildren'sOncologyGroup. 123