BAYESIAN NONPARAMETRIC AND SEMIPARAMETRIC METHODS FOR INCOMPLETE LONGITUDINAL DATA

By CHENGUANG WANG

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA 2010

© 2010 Chenguang Wang

I dedicate this to my family

ACKNOWLEDGMENTS

First and foremost, I would like to express the deepest appreciation to my advisor, Professor Michael J. Daniels. Without his extraordinary guidance and persistent help, I will never be able to be where I am. I admire his wisdom, his knowledge and his commitment to the highest standard. It has been truly an honor to work with him.

I wish to specially thank Professor Daniel O. Scharfstein of Johns Hopkins for his encouragement and crucial contribution to the research. I will always bear in mind the advice he gave: just relax and enjoy the learning process.

I would like to thank my committee members, Professor Malay Ghosh, Dr. Brett Presnell, and Dr. Almut Winterstein, who have provided abundant support and valuable insights over the entire process throughout the classes, exams and dissertation.

Many thanks go in particular to Professor Rongling Wu of Pennsylvania State University. From Professor Wu, I started learning what I wanted for my career. I also gratefully thank Professor Myron Chang, Professor Linda Young, Dr. Meenakshi Devidas and Dr. Gregory Campbell of FDA. I am fortunate to have their support at those critical moments of my career.

Finally, I would like to thank my wife, my son, my soon-to-be-born baby, my parents and my parents-in-law. It is only because of you that I have been able to keep working toward this dream I have.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 Missing Data Concepts and Definitions
  1.2 Likelihood-Based Methods
  1.3 Non-Likelihood Methods
  1.4 Intermittent Missingness
  1.5 Identifying Restrictions in Pattern Mixture Models
  1.6 Dissertation Goals

2 A BAYESIAN SHRINKAGE MODEL FOR LONGITUDINAL BINARY DATA WITH DROPOUT
  2.1 Introduction
    2.1.1 Breast Cancer Prevention Trial
    2.1.2 Informative Drop-Out in Longitudinal Studies
    2.1.3 Outline
  2.2 Data Structure and Notation
  2.3 Assumptions
  2.4 Identifiability
  2.5 Modeling
  2.6 Prior Specification and Posterior Computation
    2.6.1 Shrinkage Priors
    2.6.2 Prior of Sensitivity Parameters
    2.6.3 Posterior Computation
  2.7 Assessment of Model Performance via Simulation
  2.8 Application: Breast Cancer Prevention Trial (BCPT)
    2.8.1 Model Fit and Shrinkage Results
    2.8.2 Inference
  2.9 Summary and Discussion
  2.10 Acknowledgments
  2.11 Tables and Figures

3 A BETA-BINOMIAL BAYESIAN SHRINKAGE MODEL FOR INTERMITTENT MISSINGNESS LONGITUDINAL BINARY DATA
  3.1 Introduction
    3.1.1 Intermittent Missing Data
    3.1.2 Computational Issues
    3.1.3 Outline
  3.2 Notation, Assumptions and Identifiability
  3.3 Modeling, Prior Specification and Posterior Computation
    3.3.1 Modeling
    3.3.2 Shrinkage Prior
    3.3.3 Prior of Sensitivity Parameters
    3.3.4 Posterior Computation
  3.4 Assessment of Model Performance via Simulation
  3.5 Application: Breast Cancer Prevention Trial (BCPT)
    3.5.1 Model Fit
    3.5.2 Inference
    3.5.3 Sensitivity of Inference to the Priors
  3.6 Summary and Discussion
  3.7 Tables and Figures
  3.8 Appendix

4 A NOTE ON MAR, IDENTIFYING RESTRICTIONS, AND SENSITIVITY ANALYSIS IN PATTERN MIXTURE MODELS
  4.1 Introduction
  4.2 Existence of MAR under Multivariate Normality within Pattern
  4.3 Sequential Model Specification and Sensitivity Analysis under MAR
  4.4 Non-Future Dependence and Sensitivity Analysis under Multivariate Normality within Pattern
  4.5 MAR and Sensitivity Analysis with Multivariate Normality on the Observed-Data Response
  4.6 Example: Growth Hormone Study
  4.7 ACMV Restrictions and Multivariate Normality with Baseline Covariates
    4.7.1 Bivariate Case
    4.7.2 Multivariate Case
  4.8 Summary
  4.9 Tables
  4.10 Appendix

5 DISCUSSION: FUTURE APPLICATION OF THE BAYESIAN NONPARAMETRIC AND SEMIPARAMETRIC METHODS
  5.1 Summary
  5.2 Extensions to Genetics Mapping
  5.3 Extensions to Causal Inference
    5.3.1 Causal Inference Introduction
    5.3.2 Data and Notation
    5.3.3 Missing Data Mechanism
    5.3.4 Causal Inference Assumption
    5.3.5 Stochastic Survival Monotonicity Assumption
    5.3.6 Summary of Causal Inference
  5.4 Figures

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

2-1 Relative Risks to be Elicited
2-2 Percentiles of Relative Risks Elicited
2-3 Simulation Scenario
2-4 Simulation Results: MSE ($\times 10^3$). P and T represent placebo and tamoxifen arms, respectively.
2-5 Patients Cumulative Drop Out Rate
3-1 Missingness by Scheduled Measurement Time
3-2 Simulation Results: MSE ($\times 10^3$). P and T represent placebo and tamoxifen arms, respectively.
3-3 Sensitivity to the Elicited Prior
3-4 Sensitivity to the Elicited Prior
4-1 Growth Hormone Study: Sample mean (standard deviation) stratified by dropout pattern.
4-2 Growth Hormone Study: Posterior mean (standard deviation)

LIST OF FIGURES

2-1 Extrapolation of the elicited relative risks.
2-2 Prior Density of p
2-3 Model Fit
2-4 Model Shrinkage
2-5 Posterior distribution of $P[Y_7 = 1 \mid Z = z]$. Black and gray lines represent tamoxifen and placebo arms, respectively. Solid and dashed lines are for MNAR and MAR, respectively.
2-6 Posterior mean and 95% credible interval of difference of $P[Y_j = 1 \mid Z = z]$ between placebo and tamoxifen arms. The gray and white boxes are for MAR and MNAR, respectively.
3-1 Model Fit
3-2 Shrinkage
3-3 Posterior distribution of $P[Y_7 = 1 \mid Z = z]$. Black and gray lines represent tamoxifen and placebo arms, respectively. Solid and dashed lines are for MNAR and MAR, respectively.
3-4 Posterior mean and 95% credible interval of difference of $P[Y_j = 1 \mid Z = z]$ between placebo and tamoxifen arms. The gray and white boxes are for MAR and MNAR, respectively.
5-1 Contour and Perspective Plots of a Bivariate Density
5-2 Illustration of $p_{c,t}$
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

BAYESIAN NONPARAMETRIC AND SEMIPARAMETRIC METHODS FOR INCOMPLETE LONGITUDINAL DATA

By Chenguang Wang

August 2010

Chair: Michael J. Daniels
Major: Statistics

We consider inference in randomized longitudinal studies with missing data that are generated by skipped clinic visits and loss to follow-up. In this setting, it is well known that full data estimands are not identified unless unverified assumptions are imposed. Sensitivity analysis that assesses the sensitivity of model-based inferences to such assumptions is often necessary.

In Chapters 2 and 3, we posit an exponential tilt model that links non-identifiable distributions and identifiable distributions. This exponential tilt model is indexed by non-identified parameters, which are assumed to have an informative prior distribution, elicited from subject-matter experts. Under this model, full data estimands are shown to be expressed as functionals of the distribution of the observed data. We propose two different saturated models for the observed data distribution, as well as shrinkage priors to avoid the curse of dimensionality. The two procedures provide researchers different strategies for reducing the dimension of parameter space. We assume a non-future dependence model for the dropout mechanism and partial ignorability for the intermittent missingness. In a simulation study, we compare our approach to a fully parametric and a fully saturated model for the distribution of the observed data. Our methodology is motivated by, and applied to, data from the Breast Cancer Prevention Trial.

In Chapter 4, we discuss pattern mixture models. Pattern mixture modeling is a popular approach for handling incomplete longitudinal data. Such models are not identifiable by construction.
Identifying restrictions are one approach to mixture model identification (Daniels and Hogan, 2008; Kenward et al., 2003; Little, 1995; Little and Wang, 1996; Thijs et al., 2002) and are a natural starting point for missing not at random sensitivity analysis (Daniels and Hogan, 2008; Thijs et al., 2002). However, when the pattern-specific models are multivariate normal (MVN), identifying restrictions corresponding to missing at random may not exist. Furthermore, identification strategies can be problematic in models with covariates (e.g. baseline covariates with time-invariant coefficients). In this paper, we explore conditions necessary for identifying restrictions that result in missing at random (MAR) to exist under a multivariate normality assumption, and strategies for identifying sensitivity parameters for sensitivity analysis or for a fully Bayesian analysis with informative priors. A longitudinal clinical trial is used to illustrate the sensitivity analysis. Problems caused by baseline covariates with time-invariant coefficients are investigated and an alternative identifying restriction based on residuals is proposed as a solution.

CHAPTER 1
INTRODUCTION

The problem of incomplete data is frequently confronted by statisticians, especially in longitudinal studies. The most common type of incomplete data is missing data, in which each data value is either perfectly known or completely unknown. In other situations, data are partially missing and partially observed; examples include rounded data and censored data. This type of incomplete data is referred to as coarse data. Missing data can be viewed as a special case of coarse data (Heitjan and Rubin, 1991). In both cases, the incompleteness occurs because we observe only a subset of the complete data, which includes the true, unobservable data.
In this dissertation, missing data include dropout missingness, in which subjects who miss a measurement do not return to the study at the next follow-up, and intermittent missingness, in which a missing visit is followed by an observed measurement.

1.1 Missing Data Concepts and Definitions

If the missingness does not happen "at random" and the missingness process is ignored, biased inferences will often occur. Until the 1970s, most of the methods for handling missing values in the statistics literature ignored the missingness mechanism by deleting the incomplete units. Complete-case analysis, also known as case deletion, confines the analysis to cases that have all variables observed. Available-case analysis, also known as partial deletion, uses all values observed for univariate analysis. Both approaches are only valid under the strong assumption that the missingness is completely at random (MCAR), i.e. the missingness is independent of the response. When MCAR does not hold, it is possible to adjust for the selection bias caused by case deletion by reweighting the remaining cases (Little and Rubin, 1987, chapter 4); however, this method is inefficient.

Another common approach is single imputation, that is, filling in a single value for each missing value. The advantage of single imputation is that it does not delete any units and, after the imputation, standard methods for complete data can be applied to the filled-in data. However, single imputation does not reflect the uncertainty of the missing value. Multiple imputation was proposed to address this flaw by imputing several values for each missing response (Rubin, 1987).

For notation, let $y = \{y_1, \ldots, y_J\}$ denote the full data response vector of outcome, possibly partially observed. Let $r = \{r_1, r_2, \ldots, r_J\}$ denote the missing data indicator, with $r_j = 0$ if $y_j$ is missing and $r_j = 1$ if $y_j$ is observed. Let $x$ denote the covariates.
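As a small illustration of the indicator vector and of the two deletion approaches described above, the following sketch builds r from a toy response matrix, labels each subject's pattern as complete, dropout, or intermittent, and computes complete-case and available-case means. The data values, the use of NaN to mark a missing response, and all function names are assumptions made for illustration only; they do not come from the dissertation.

```python
import math

NAN = float("nan")

# Hypothetical toy data: 4 subjects, 3 scheduled visits; NaN marks a missing response.
Y = [
    [1.0, 2.0, 3.0],   # complete
    [2.0, 4.0, NAN],   # dropout after visit 2
    [3.0, NAN, 5.0],   # intermittent: a missing visit followed by an observed one
    [4.0, NAN, NAN],   # dropout after visit 1
]

def indicators(y):
    """Missing data indicator: r_j = 1 if y_j is observed, 0 if missing."""
    return [0 if math.isnan(v) else 1 for v in y]

def classify(r):
    """Label a pattern as 'complete', 'dropout' (monotone), or 'intermittent'."""
    if all(r):
        return "complete"
    first_missing = r.index(0)
    # Any observed value after the first missing one makes the pattern intermittent.
    return "intermittent" if any(r[first_missing:]) else "dropout"

def complete_case_means(Y):
    """Per-visit means using only subjects with every visit observed."""
    rows = [y for y in Y if all(indicators(y))]
    return [sum(y[j] for y in rows) / len(rows) for j in range(len(Y[0]))]

def available_case_means(Y):
    """Per-visit means using every value observed at that visit."""
    means = []
    for j in range(len(Y[0])):
        obs = [y[j] for y in Y if not math.isnan(y[j])]
        means.append(sum(obs) / len(obs))
    return means

R = [indicators(y) for y in Y]
patterns = [classify(r) for r in R]
cc = complete_case_means(Y)
ac = available_case_means(Y)
```

In this toy data only one subject is complete, so the complete-case means rest on a single row, while the available-case means use a different set of subjects at each visit. Neither summary is justified unless the missingness is MCAR.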
Let $y_{obs}$ and $y_{mis}$ denote the observed and missing response data, respectively. Let $\omega$ be the parameters indexing the full data model $p(y, r)$, $\theta(\omega)$ be the parameters indexing the full data response model $p(y)$, and $\psi(\omega)$ be the parameters indexing the missing data mechanism model $p(r \mid y)$. The common assumptions about the missing data mechanism are as follows.

Little and Rubin's taxonomy: Rubin (1976) and Little and Rubin (1987) developed a hierarchy for missing data mechanisms by classifying the relationship between missingness and the response data.

Definition 1.1. Missing responses are missing completely at random (MCAR) if
\[ p(r \mid y_{obs}, y_{mis}, x; \psi(\omega)) = p(r \mid x; \psi(\omega)) \]
for all $x$ and $\omega$.

Definition 1.2. Missing responses are missing at random (MAR) if
\[ p(r \mid y_{obs}, y_{mis}, x; \psi(\omega)) = p(r \mid y_{obs}, x; \psi(\omega)) \]
for all $x$ and $\omega$.

Note that MAR holds if and only if $p(y_{mis} \mid y_{obs}, r) = p(y_{mis} \mid y_{obs})$. The proof is as follows.

Proof:
1. Suppose MAR holds. Then $p(r \mid y_{mis}, y_{obs}) = p(r \mid y_{obs})$ and we can derive that
\[ p(y_{mis} \mid y_{obs}, r) = \frac{p(y_{mis}, r \mid y_{obs})}{p(r \mid y_{obs})} = \frac{p(r \mid y_{mis}, y_{obs})\, p(y_{mis} \mid y_{obs})}{p(r \mid y_{obs})} = \frac{p(r \mid y_{obs})\, p(y_{mis} \mid y_{obs})}{p(r \mid y_{obs})} = p(y_{mis} \mid y_{obs}). \]
2. To show the reverse direction, suppose $p(y_{mis} \mid y_{obs}, r) = p(y_{mis} \mid y_{obs})$ and note that
\[ p(r \mid y_{mis}, y_{obs}) = \frac{p(r, y_{mis} \mid y_{obs})}{p(y_{mis} \mid y_{obs})} = \frac{p(y_{mis} \mid r, y_{obs})\, p(r \mid y_{obs})}{p(y_{mis} \mid y_{obs})} = \frac{p(y_{mis} \mid y_{obs})\, p(r \mid y_{obs})}{p(y_{mis} \mid y_{obs})} = p(r \mid y_{obs}). \]
This completes the proof. $\square$

Definition 1.3. Missing responses are missing not at random (MNAR) if
\[ p(r \mid y_{obs}, y_{mis}, x; \psi(\omega)) \neq p(r \mid y_{obs}, y'_{mis}, x; \psi(\omega)) \]
for some $y_{mis} \neq y'_{mis}$.

Ignorability: Under certain conditions, the missingness process can be left unspecified for inference on the response model parameter $\theta(\omega)$ (Laird, 1988). This condition is called ignorability (Rubin, 1976).

Definition 1.4. The missing data mechanism is ignorable if
1. The missing data mechanism is MAR.
2. The parameters of the full data response model, $\theta(\omega)$, and the parameters of the missingness model, $\psi(\omega)$, are distinguishable, i.e. the full data parameter $\omega$ can be decomposed as $(\theta(\omega), \psi(\omega))$.
3. The parameters $\theta(\omega)$ and $\psi(\omega)$ are a priori independent, i.e. $p(\theta(\omega), \psi(\omega)) = p(\theta(\omega))\, p(\psi(\omega))$.

Full data models that do not satisfy Definition 1.4 have nonignorable missingness. Under ignorability, posterior inference on the parameters $\theta(\omega)$ can be based on the observed data response likelihood
\[ L(\theta(\omega) \mid y_{obs}) \propto \prod_{i=1}^{n} \int p_i(y_{obs}, y_{mis} \mid \theta(\omega))\, dy_{mis}. \]
We show this below:
\[ \begin{aligned}
L(\omega \mid y_{obs}, r) &= L(\theta(\omega), \psi(\omega) \mid y_{obs}, r) \\
&= \int p(y_{obs}, y_{mis}, r \mid \theta(\omega), \psi(\omega))\, dy_{mis} \\
&= \int p(r \mid y_{obs}, \psi(\omega))\, p(y_{obs}, y_{mis} \mid \theta(\omega))\, dy_{mis} \\
&= p(r \mid y_{obs}, \psi(\omega)) \int p(y_{obs}, y_{mis} \mid \theta(\omega))\, dy_{mis} \\
&= p(r \mid y_{obs}, \psi(\omega))\, p(y_{obs} \mid \theta(\omega)) \\
&= L(\psi(\omega) \mid r, y_{obs})\, L(\theta(\omega) \mid y_{obs}),
\end{aligned} \]
and furthermore,
\[ p(\omega \mid y_{obs}, r) \propto p(\omega)\, L(\omega \mid y_{obs}, r) = p(\psi(\omega))\, L(\psi(\omega) \mid r, y_{obs})\; p(\theta(\omega))\, L(\theta(\omega) \mid y_{obs}). \]
Therefore, $p(\theta(\omega) \mid y_{obs}, r) \propto p(\theta(\omega))\, L(\theta(\omega) \mid y_{obs})$ and the posterior inference on $\theta(\omega)$ can be based on the observed response likelihood $L(\theta(\omega) \mid y_{obs})$.

Non-future Dependence: For cases with monotone missingness, i.e. $r_j = 0$ implies $r_{j'} = 0$ for $j' > j$, Kenward et al. (2003) defined the term non-future dependence.

Definition 1.5. If the missingness is monotone, the missing data mechanism (MDM) is non-future dependent if
\[ p(r \mid y_{obs}, y_{mis}, x; \psi(\omega)) = p(r \mid y_{obs}, y_C, x; \psi(\omega)) \]
with $C = \min\{j : r_j = 0\}$.

Non-future dependence assumes that missingness depends only on the observed data and the current missing value. It can be viewed as a special case of MNAR and an extension of MAR.

1.2 Likelihood-Based Methods

Likelihood-based methods handle the missing values by integrating them out of the likelihood function, instead of deleting units or explicitly filling in values. The general strategy is to model the joint distribution of the response and the missingness process (Hogan and Laird, 1997b). Likelihood-based models for missing data are distinguished by the way the joint distribution of the outcome and missing data processes is factorized. They can be classified as selection models, pattern-mixture models, and shared-parameter models.

Selection model: Selection models factor the full-data distribution as
\[ p(y, r \mid \omega) = p(r \mid y, \psi(\omega))\, p(y \mid \theta(\omega)). \]
The term "selection" was first introduced in the economics literature for modeling sample selection bias, that is, the situation in which different responses have different probabilities of being selected into a sample. Heckman (1979a,b) used a bivariate response $Y$ with missing $Y_2$ as an example and showed that in general it is critical to answer the question "why are the data missing" by modeling the missingness of $Y_{2i}$ as a function of the observed $Y_{1i}$ (for subject $i$). Diggle and Kenward (1994) extended the Heckman model to longitudinal studies and modeled the dropout process by a logistic regression such as
\[ \mathrm{logit}\, P(r_j = 0 \mid r_{j-1} = 1, y) = y^{\top}\beta. \]
The Diggle and Kenward model has been adopted and extended by many researchers, mostly by proposing different full data response models (Albert, 2000; Baker, 1995; Fitzmaurice et al., 1995; Heagerty, 2002; Kurland and Heagerty, 2004).

Pattern mixture model: Pattern mixture models factor the full-data distribution as
\[ p(y, r \mid \omega) = p(y \mid r, \theta(\omega))\, p(r \mid \psi(\omega)). \]
Rubin (1977) introduced the idea of modeling respondents and nonrespondents in surveys separately and using subjective priors to relate respondents' and nonrespondents' model parameters. Little (1993, 1994) explored pattern mixture models in discrete time settings. Specifically, different identifying restrictions (see Section 1.5) were proposed to identify the full-data model. For settings in which the number of dropout patterns is large and the pattern-specific parameters are only weakly identified by identifying restrictions, Roy (2003) and Roy and Daniels (2008) proposed using a latent-class model for dropout classes. When the dropout time is continuous and the mixture of patterns is infinite, Hogan et al. (2004) proposed modeling the response given dropout by a varying coefficient model in which the regression coefficients are unspecified, nonparametric functions of dropout time. For time-event data with informative censoring, Wu and Bailey (1988, 1989) and Hogan and Laird (1997a) developed random effects mixture models.
Fitzmaurice and Laird (2000a) generalized the approaches of Wu and Bailey and of Hogan and Laird to discrete, ordinal and count data by using generalized linear mixture models and a GEE approach for statistical inference. Daniels and Hogan (2000) proposed a parameterization of the pattern mixture model for continuous data, under which sensitivity analysis can be done on the additive (location) and multiplicative (scale) terms. Forster and Smith (1998) considered a pattern mixture model for a single categorical response with categorical covariates, employing Bayesian approaches for nonignorable missingness.

Shared parameter model: Shared parameter models factorize the full-data model as
\[ p(y, r \mid \omega) = \int p(y, r \mid b, \omega)\, p(b \mid \omega)\, db, \]
where $b$ are subject-specific random effects. It is usually assumed that $y$ and $r$ are independent conditionally on $b$. Wu and Carroll (1988) presented a shared parameter random effects model for continuous responses and informative censoring, in which individual effects are taken into account as intercepts and slopes for modeling the censoring process. DeGruttola and Tu (1994) extended Wu and Carroll's model to allow general covariates. Follmann and Wu (1995) developed a generalized linear model for the response and proposed an approximation algorithm for inference on the joint full-data model. Faucett and Thomas (1996) and Wulfsohn and Tsiatis (1997) proposed jointly modeling the continuous covariate over time and simultaneously relating the covariates to the response. Henderson et al. (2000) generalized the joint modeling approach by using two correlated Gaussian random processes for covariates and response. Ten Have et al. (1998, 2000) proposed a shared parameter mixed effects logistic regression model for longitudinal ordinal data. Recently, Yuan and Little (2009) proposed a mixed-effect hybrid model that allows the missingness and the response to be conditionally dependent given the random effects.
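The selection and pattern mixture factorizations above describe the same joint distribution in two different orders. The sketch below checks this on a toy joint distribution for a single binary response y and its indicator r; the probabilities and helper names are hypothetical and chosen only so that the arithmetic is exact.

```python
from fractions import Fraction as F

# Hypothetical joint distribution p(y, r): binary response y, indicator r (1 = observed).
joint = {
    (0, 0): F(1, 10), (0, 1): F(4, 10),
    (1, 0): F(3, 10), (1, 1): F(2, 10),
}

def marginal(joint, axis):
    """Sum the joint pmf over the other coordinate (axis 0 -> p(y), axis 1 -> p(r))."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0) + p
    return out

p_y = marginal(joint, 0)   # full data response model p(y)
p_r = marginal(joint, 1)   # marginal distribution of the pattern p(r)

# Selection model pieces: missingness mechanism p(r | y) and response model p(y).
p_r_given_y = {(y, r): joint[(y, r)] / p_y[y] for (y, r) in joint}
# Pattern mixture pieces: pattern-specific response p(y | r) and p(r).
p_y_given_r = {(y, r): joint[(y, r)] / p_r[r] for (y, r) in joint}

# Both factorizations rebuild the same joint distribution.
selection = {(y, r): p_r_given_y[(y, r)] * p_y[y] for (y, r) in joint}
pattern_mixture = {(y, r): p_y_given_r[(y, r)] * p_r[r] for (y, r) in joint}
```

In the pattern mixture form, the piece p(y | r = 0) is the distribution of the response among subjects whose response is missing, so it is the component that the observed data cannot inform. In the selection form the same lack of information is carried by the dependence of p(r | y) on y.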
Sensitivity analysis for missing not at random: Sensitivity analysis is critical in longitudinal analysis of incomplete data. The full-data model can be factored into an extrapolation model and an observed data model,
\[ p(y, r \mid \omega) = p(y_{mis} \mid y_{obs}, r, \omega_E)\, p(y_{obs}, r \mid \omega_I), \]
where $\omega_E$ are parameters indexing the extrapolation model and $\omega_I$ are parameters indexing the observed data model, which are identifiable from the observed data (Daniels and Hogan, 2008). Full-data model inference requires unverifiable assumptions about the extrapolation model $p(y_{mis} \mid y_{obs}, r, \omega_E)$. A sensitivity analysis explores the sensitivity of inferences of interest about the full data response model to unverifiable assumptions about the extrapolation model. This is typically done by varying sensitivity parameters, which we define next (Daniels and Hogan, 2008).

Definition 1.6. Let $p(y, r \mid \omega)$ be a full data model with extrapolation factorization
\[ p(y, r \mid \omega) = p(y_{mis} \mid y_{obs}, r, \omega_E)\, p(y_{obs}, r \mid \omega_I). \]
Suppose there exists a reparameterization $\xi(\omega) = (\xi_S, \xi_M)$ such that
1. $\xi_S$ is a nonconstant function of $\omega_E$,
2. The observed likelihood $L(\xi_S, \xi_M \mid y_{obs}, r)$ is constant as a function of $\xi_S$,
3. Given $\xi_S$ fixed, $L(\xi_S, \xi_M \mid y_{obs}, r)$ is a nonconstant function of $\xi_M$,
then $\xi_S$ is a sensitivity parameter.

Unfortunately, fully parametric selection models and shared parameter models do not allow sensitivity analysis because sensitivity parameters cannot be found (Daniels and Hogan, 2008, Chapter 8). Examining sensitivity to distributional assumptions, e.g., on the random effects, will provide different fits to the observed data $(y_{obs}, r)$. In such cases, a sensitivity analysis cannot be done since varying the distributional assumptions does not provide equivalent fits to the observed data (Daniels and Hogan, 2008); it then becomes an exercise in model selection. A fully Bayesian analysis allows researchers to reach a single conclusion by admitting prior beliefs about the sensitivity parameters.
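The three conditions of Definition 1.6 can be checked numerically in a deliberately simple case. The sketch below assumes a single binary response with pattern mixture parameters pi = P(r = 1) and p1 = P(y = 1 | r = 1), and links the unobserved pattern to the observed one through a logit shift tau, a simple stand-in for an exponential tilt. The parameter names, the toy data, and the specific link are assumptions for illustration, not the model developed in later chapters.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def observed_loglik(data, pi, p1, tau):
    """Observed-data log-likelihood; data holds (r, y) pairs with y = None when r == 0.
    tau is accepted but never used: the observed data carry no information about it."""
    ll = 0.0
    for r, y in data:
        if r == 1:
            ll += math.log(pi) + math.log(p1 if y == 1 else 1.0 - p1)
        else:
            ll += math.log(1.0 - pi)
    return ll

def full_data_mean(pi, p1, tau):
    """P(y = 1) when logit P(y = 1 | r = 0) = logit P(y = 1 | r = 1) + tau."""
    p0 = expit(logit(p1) + tau)
    return pi * p1 + (1.0 - pi) * p0

# Hypothetical data: 5 observed responses (two of them equal to 1) and 3 missing.
data = [(1, 1), (1, 0), (1, 0), (1, 1), (1, 0), (0, None), (0, None), (0, None)]
pi_hat, p1_hat = 5 / 8, 2 / 5
taus = (-1.0, 0.0, 1.0)
lls = [observed_loglik(data, pi_hat, p1_hat, tau) for tau in taus]
means = [full_data_mean(pi_hat, p1_hat, tau) for tau in taus]
```

Here tau plays the role of $\xi_S$: the observed-data log-likelihood is identical for every value of tau, while the full-data mean changes with it, and tau = 0 corresponds to the unobserved pattern matching the observed one. The pair (pi, p1) plays the role of $\xi_M$, since the likelihood does vary with it.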
For continuous responses, Lee and Berger (2001) built a semiparametric Bayesian selection model that places strong distributional assumptions on the response but weak assumptions on the missing data mechanism. Scharfstein et al. (2003), on the other hand, placed strong parametric assumptions on the missing data mechanism but minimal assumptions on the response outcome.

1.3 Non-Likelihood Methods

In non-likelihood approaches, the joint distribution of the outcomes is typically modeled semiparametrically and estimating equations are used for inference. Liang and Zeger (1986) proposed generalized estimating equations (GEE), whose solution is consistent if the marginal mean of the response is correctly specified. However, inference based on GEE is only valid under MCAR. Robins et al. (1995) proposed the inverse-probability-of-censoring weighted generalized estimating equations (IPCW-GEE) approach, which reweights each individual's contribution to the usual GEE by the estimated probability of dropout. IPCW-GEE leads to consistent estimation when the missingness is MAR. However, both GEE and IPCW-GEE can result in biased estimation under MNAR.

Rotnitzky et al. (1998a, 2001), Scharfstein et al. (2003) and Schulman et al. (1999) adopted semiparametric selection modeling approaches, in which the model for dropout is indexed by interpretable sensitivity parameters that express departures from MAR. For such approaches, the inference results depend on the choice of unidentified, yet interpretable, sensitivity analysis parameters.

1.4 Intermittent Missingness

Intermittent missingness occurs when a missing value is followed by an observed value. The existence of intermittent missing values exponentially increases the number of missingness patterns that need to be properly modeled. Thus, handling informative intermittent missing data is methodologically and computationally challenging and, as a result, the statistics literature is limited.
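To make the growth in the number of patterns concrete, the following sketch enumerates all indicator vectors for J scheduled measurements and counts how many are monotone (dropout only). It assumes every measurement, including the first, may be missing; the function names are illustrative.

```python
from itertools import product

def is_monotone(r):
    """True if r_j = 0 implies r_j' = 0 for all j' > j (dropout only)."""
    seen_missing = False
    for rj in r:
        if rj == 0:
            seen_missing = True
        elif seen_missing:
            # An observed value after a missing one: intermittent missingness.
            return False
    return True

def count_patterns(J):
    """Return (number of monotone patterns, number of all patterns) for J measurements."""
    patterns = list(product((0, 1), repeat=J))
    monotone = sum(is_monotone(r) for r in patterns)
    return monotone, len(patterns)
```

Under this convention, `count_patterns(3)` gives 4 monotone patterns out of 8, and `count_patterns(7)` gives 8 out of 128: the monotone count grows as J + 1 while the total grows as 2^J.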
One approach to handling intermittent missingness is to consider a "monotonized" dataset, whereby all observed values on an individual after their first missing value are deleted (Land et al., 2002). However, this increases the "dropout" rate, loses efficiency, and may introduce bias.

Other methods in the literature often adopt a likelihood approach and rely on strong parametric assumptions. For example, Troxel et al. (1998), Albert (2000) and Ibrahim et al. (2001) suggested a selection model approach. Albert et al. (2002) used a shared latent autoregressive process model. Lin et al. (2004) employed a latent class pattern mixture model.

Semiparametric methods have been proposed by Troxel et al. (1998) and Vansteelandt et al. (2007). Troxel et al. (1998) proposed a marginal model and introduced a pseudo-likelihood estimation procedure. Vansteelandt et al. (2007) extended the ideas of Rotnitzky et al. (1998b), Scharfstein et al. (1999) and Rotnitzky et al. (2001) to non-monotone missing data, assuming (exponentially tilted) extensions of sequential explainability and specifying parametric models for certain conditional means.

Most related to the approach we will use in Chapter 3 are the (partial ignorability) assumptions formalized in Harel and Schafer (2009), which partition the missing data and allow one (or more) of the partitions to be ignored given the other partition(s) and the observed data. Specifically, Harel and Schafer (2009) defined a missing data mechanism to be partially missing at random if
\[ p(r \mid y_{obs}, y_{mis}, g(r), x; \psi(\omega)) = p(r \mid y_{obs}, g(r), x; \psi(\omega)), \]
where $g(\cdot)$ denotes a summary function of $r$ and can be chosen based on which aspects of $r$ are related to the missing values. These assumptions are similar to the sequential explainability assumption reviewed in Vansteelandt et al. (2007).

In this dissertation, we explicitly partition the missing data indicator vector $r$ into $\{r_s, s\}$, where $s = \max\{t : r_t = 1\}$ denotes the last time point at which a response was observed, i.e. the "survival" time, and $r_s = \{r_t : t < s\}$ denotes the missing data indicators recorded prior to the dropout time. With this partition, we define partial missing at random as follows:

Definition 1.7. Missing responses are partially missing at random if
\[ p(r_s \mid y_{obs}, y_{mis}, s, x; \psi(\omega)) = p(r_s \mid y_{obs}, s, x; \psi(\omega)). \]
This can be viewed as Harel and Schafer's definition with $g(r)$ chosen to be the survival time.

In the following chapters, we first propose a model that admits identification of the treatment-specific distributions of the trajectory of longitudinal binary outcomes when there is dropout but no intermittent missingness. We then obtain identification with intermittent missing observations by assuming that, within dropout and treatment strata, the intermittent missing responses are missing at random. This is the partial ignorability assumption in Definition 1.7.

1.5 Identifying Restrictions in Pattern Mixture Models

Pattern-mixture models by construction are not identified: the observed data do not provide enough information to identify the distributions for incomplete patterns (Little, 1993, 1994). Additional assumptions about the missing data process are necessary in order to yield identifying restrictions that equate the inestimable parameters to functions of estimable parameters and identify the full-data model.

For example, consider the situation in which $y = (y_1, y_2)$ is a bivariate normal response with missing data only in $y_2$. Let $s$ be the survival time, i.e. $s = 1$ if $y_2$ is missing and $s = 2$ if $y_2$ is observed. We model $p(s)$ and $p(y \mid s)$ as $s \sim \mathrm{Bern}(\phi)$ and $y \mid s = i \sim N(\mu^{(i)}, \Sigma^{(i)})$ for $i = 1, 2$, with
\[ \mu^{(s)} = \begin{pmatrix} \mu_1^{(s)} \\ \mu_2^{(s)} \end{pmatrix} \quad \text{and} \quad \Sigma^{(s)} = \begin{pmatrix} \sigma_{11}^{(s)} & \sigma_{12}^{(s)} \\ \sigma_{21}^{(s)} & \sigma_{22}^{(s)} \end{pmatrix}. \]
For $s = 1$, only $y_1$ is observed. Therefore, the parameters $\mu_2^{(1)}$, $\sigma_{21}^{(1)}$ (which equals $\sigma_{12}^{(1)}$) and $\sigma_{22}^{(1)}$ are not identified.
By assuming $y_2 \mid y_1, s = 1 \sim y_2 \mid y_1, s = 2$, which is the available case missing value (ACMV) restriction defined later in this section, we have
\[ \begin{aligned}
\mu_2^{(1)} + \frac{\sigma_{21}^{(1)}}{\sigma_{11}^{(1)}} \left( y_1 - \mu_1^{(1)} \right) &= \mu_2^{(2)} + \frac{\sigma_{21}^{(2)}}{\sigma_{11}^{(2)}} \left( y_1 - \mu_1^{(2)} \right), \\
\sigma_{22}^{(1)} - \frac{\left( \sigma_{21}^{(1)} \right)^2}{\sigma_{11}^{(1)}} &= \sigma_{22}^{(2)} - \frac{\left( \sigma_{21}^{(2)} \right)^2}{\sigma_{11}^{(2)}}, \\
\frac{\sigma_{21}^{(1)}}{\sigma_{11}^{(1)}} &= \frac{\sigma_{21}^{(2)}}{\sigma_{11}^{(2)}},
\end{aligned} \]
by which all the unidentified parameters are identified.

Understanding (identifying) restrictions that lead to MAR is an important first step for sensitivity analysis under missing not at random (MNAR) (Daniels and Hogan, 2008; Scharfstein et al., 2003; Zhang and Heitjan, 2006). In particular, MAR provides a good starting point for sensitivity analysis, and sensitivity analyses are essential for the analysis of incomplete data (Daniels and Hogan, 2008; Scharfstein et al., 1999).

Little (1993) developed several common identifying restrictions. In what follows, $\bar{y}_{j-1}$ denotes the history $(y_1, \ldots, y_{j-1})$ and $p_k$ denotes the distribution within pattern $k$. Examples include complete case missing value (CCMV) restrictions, which equate all missing patterns to the complete cases, i.e.
\[ p_k(y_j \mid \bar{y}_{j-1}) = p_J(y_j \mid \bar{y}_{j-1}); \]
equating parameters to a set of patterns, that is, setting the parameters in pattern $k$, namely $\theta^{(k)}$, equal to a weighted combination over a set of patterns $S$,
\[ \theta^{(k)} = \sum_{j \in S} \pi_j \theta^{(j)}; \]
or equating pattern distributions to a mixture of a set of patterns $S$, i.e.
\[ p_k(\cdot) = \sum_{j \in S} \pi_j\, p_j(\cdot). \]
Some special cases of the pattern-set mixture model restrictions include nearest-neighbor constraints,
\[ p_k(y_j \mid \bar{y}_{j-1}) = p_j(y_j \mid \bar{y}_{j-1}), \]
and available case missing value (ACMV) constraints,
\[ p_k(y_j \mid \bar{y}_{j-1}) = p_{\geq j}(y_j \mid \bar{y}_{j-1}). \]

Molenberghs et al. (1998) proved that for discrete time points and monotone missingness, the ACMV constraint is equivalent to missing at random (MAR). Thijs et al. (2002) developed strategies for applying identifying restrictions: first fit $p_k(\bar{y}_k)$, then choose an identifying restriction to identify the missing patterns. Multiple imputation can be applied by drawing unobserved components from the identified missing patterns. Kenward et al. (2003) discussed identifying restrictions corresponding to missing non-future dependence.
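For the bivariate normal example at the start of this section, the ACMV restriction can be solved explicitly: matching the slope, intercept and residual variance of the regression of $y_2$ on $y_1$ across the two patterns gives the unidentified parameters of pattern $s = 1$ in closed form. The sketch below carries out that algebra; the function name, argument layout and numeric values are hypothetical and serve only to illustrate the calculation.

```python
def acmv_identify(mu1_p1, s11_p1, mu_p2, sigma_p2):
    """Identify (mu_2, sigma_21, sigma_22) in pattern s = 1 under ACMV.

    mu1_p1, s11_p1 : identified mean and variance of y1 in pattern s = 1
    mu_p2          : (mu_1, mu_2) in pattern s = 2
    sigma_p2       : ((s11, s12), (s21, s22)) in pattern s = 2
    """
    # Slope and residual variance of y2 given y1 in the completers' pattern.
    beta = sigma_p2[1][0] / sigma_p2[0][0]
    resid_var = sigma_p2[1][1] - sigma_p2[1][0] ** 2 / sigma_p2[0][0]
    # Equal slopes: sigma_21^(1) / sigma_11^(1) = beta.
    s21_p1 = beta * s11_p1
    # Equal intercepts: mu_2^(1) - beta * mu_1^(1) = mu_2^(2) - beta * mu_1^(2).
    mu2_p1 = mu_p2[1] + beta * (mu1_p1 - mu_p2[0])
    # Equal residual variances: sigma_22^(1) - beta^2 * sigma_11^(1) = resid_var.
    s22_p1 = resid_var + beta ** 2 * s11_p1
    return mu2_p1, s21_p1, s22_p1

# Hypothetical parameter values chosen so the arithmetic is exact.
mu2_p1, s21_p1, s22_p1 = acmv_identify(
    mu1_p1=3.0, s11_p1=8.0,
    mu_p2=(1.0, 2.0), sigma_p2=((4.0, 2.0), (2.0, 3.0)),
)
```

With these inputs the completers' regression has slope 0.5 and residual variance 2, and the identified pattern-1 parameters (mean 3, covariance 4, variance 4) reproduce the same slope and residual variance, as the restriction requires.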
1.6 Dissertation Goals

There will be two major components to this dissertation. First, we will develop a Bayesian semiparametric model for longitudinal binary responses with nonignorable missingness, including dropout and intermittent missingness. Second, we will carefully explore identifying restrictions for pattern mixture models.

Bayesian shrinkage model: We propose two different parameterizations of saturated models for the observed data distribution, as well as corresponding shrinkage priors to avoid the curse of dimensionality. The two procedures provide researchers different strategies for reducing the dimension of parameter space. We assume a non-future dependence model for the dropout mechanism and partial ignorability for the intermittent missingness. In a simulation study, we compare our approach to a fully parametric and a fully saturated model for the distribution of the observed data. Our methodology is motivated by, and applied to, data from the Breast Cancer Prevention Trial.

Identifying restrictions and sensitivity analysis in pattern mixture models: The normality of response data (if appropriate) for pattern mixture models is desirable as it easily allows incorporation of baseline covariates and introduction of sensitivity parameters (for MNAR analysis) that have convenient interpretations as deviations of means and variances from MAR (Daniels and Hogan, 2008). However, multivariate normality within patterns can be overly restrictive when applying identifying restrictions. We explore such issues in Chapter 4. Furthermore, identification strategies can be problematic in models with covariates (e.g. baseline covariates with time-invariant coefficients). In Chapter 4, we also explore conditions necessary for identifying restrictions that result in missing at random (MAR) to exist under a multivariate normality assumption, and strategies for sensitivity analysis.
Problems caused by baseline covariates with time-invariant coefficients are investigated and an alternative identifying restriction based on residuals is proposed as a solution.

CHAPTER 2
A BAYESIAN SHRINKAGE MODEL FOR LONGITUDINAL BINARY DATA WITH DROPOUT

2.1 Introduction

2.1.1 Breast Cancer Prevention Trial

The Breast Cancer Prevention Trial (BCPT) was a large multi-center, double-blinded, placebo-controlled, chemoprevention trial of the National Surgical Adjuvant Breast and Bowel Project (NSABP) designed to test the efficacy of 20 mg/day tamoxifen in preventing breast cancer and coronary heart disease in healthy women at risk for breast cancer (Fisher et al., 1998). The study was open to accrual from June 1, 1992 through September 30, 1997 and 13,338 women aged 35 or older were enrolled in the study during this interval. The primary objective was to determine whether long-term tamoxifen therapy is effective in preventing the occurrence of invasive breast cancer. Secondary objectives included quality of life (QOL) assessments to evaluate benefit as well as risk resulting from the use of tamoxifen.

Monitoring QOL was of particular importance for this trial since the participants were healthy women and there had been concerns voiced by researchers about the association between clinical depression and tamoxifen use. Accordingly, data on depression symptoms were scheduled to be collected at baseline prior to randomization, at 3 months, at 6 months and every 6 months thereafter for up to 5 years. The primary instrument used to monitor depressive symptoms over time was the Center for Epidemiologic Studies Depression Scale (CES-D) (Radloff, 1977). This self-test questionnaire is composed of 20 items, each of which is scored on a scale of 0-3. A score of 16 or higher is considered a likely case of clinical depression. The trial was unblinded on March 31, 1998, after an interim analysis showed a dramatic reduction in the incidence of breast cancer in the treatment arm.
Due to the potential loss of the control arm, we focus on QOL data collected on the 10,982 participants who were enrolled during the first two years of accrual and had their CES-D score recorded at baseline. All women in this cohort had the potential for three years of follow-up (before the unblinding).

In the BCPT, the clinical centers were not required to collect QOL data on women after they stopped their assigned therapy. This design feature aggravated the problem of missing QOL data in the trial. As reported in Land et al. (2002), more than 30% of the CES-D scores were missing at the 36-month follow-up, with a slightly higher percentage in the tamoxifen group. They also showed that women with higher baseline CES-D scores had higher rates of missing data at each follow-up visit and the mean observed CES-D scores preceding a missing measurement were higher than those preceding an observed measurement; there was no evidence that these relationships differed by treatment group. While these results suggest that the missing data process is associated with observed QOL outcomes, one cannot rule out the possibility that the process is further related to unobserved outcomes and that this relationship is modified by treatment. In particular, investigators were concerned (a priori) that, between assessments, tamoxifen might be causing depression in some individuals, who then do not return for their next assessment. If this occurs, the data are said to be missing not at random (MNAR); otherwise the data are said to be missing at random (MAR).

2.1.2 Informative Drop-Out in Longitudinal Studies

In this paper, we will concern ourselves with inference in longitudinal studies, where individuals who miss visits do not return for subsequent visits (i.e., drop-out). In such a setting, MNAR is often referred to as informative drop-out. While there were some intermittent responses in the BCPT, we will, as in Land et al.
(2002), consider a "monotonized" dataset, whereby all CES-D scores observed on an individual after their first missing score have been deleted (this increases the "drop-out" rate).

There are two main inferential paradigms for analyzing longitudinal studies with informative drop-out: likelihood (parametric) and non-likelihood (semi-parametric). Articles by Little (1995), Hogan and Laird (1997b) and Kenward and Molenberghs (1999) as well as recent books by Molenberghs and Kenward (2007) and Daniels and Hogan (2008) provide a comprehensive review of likelihood-based approaches, including selection models, pattern-mixture models, and shared-parameter models. These models differ in the way the joint distribution of the outcome and missing data processes is factorized. In selection models, one specifies a model for the marginal distribution of the outcome process and a model for the conditional distribution of the drop-out process given the outcome process (see, for example, Albert, 2000; Baker, 1995; Diggle and Kenward, 1994; Fitzmaurice et al., 1995; Heckman, 1979a; Liu et al., 1999; Molenberghs et al., 1997); in pattern-mixture models, one specifies a model for the conditional distribution of the outcome process given the drop-out time and the marginal distribution of the drop-out time (see, for example, Birmingham and Fitzmaurice, 2002; Daniels and Hogan, 2000; Fitzmaurice and Laird, 2000b; Hogan and Laird, 1997a; Little, 1993, 1994, 1995; Pauler et al., 2003; Roy, 2003; Roy and Daniels, 2008; Thijs et al., 2002); and in shared-parameter models, the outcome and drop-out processes are assumed to be conditionally independent given shared random effects (see, for example, DeGruttola and Tu, 1994; Land et al., 2002; Pulkstenis et al., 1998; Ten Have et al., 1998, 2000; Wu and Carroll, 1988; Yuan and Little, 2009). Traditionally, these models have relied on very strong distributional assumptions in order to obtain model identifiability.
Without these strong distributional assumptions, specific parameters from these models would not be identified from the distribution of the observed data. To address this issue within a likelihood-based framework, several authors (Baker et al., 1992; Daniels and Hogan, 2008; Kurland and Heagerty, 2004; Little, 1994; Little and Rubin, 1999; Nordheim, 1984) have promoted the use of global sensitivity analysis, whereby non- or weakly identified, interpretable parameters are fixed and then varied to evaluate the robustness of the inferences. Scientific experts can be employed to constrain the range of these parameters.

Non-likelihood approaches to informative drop-out in longitudinal studies have been primarily developed from a selection modeling perspective. Here, the marginal distribution of the outcome process is modeled non- or semi-parametrically and the conditional distribution of the drop-out process given the outcome process is modeled semi- or fully parametrically. In the case where the drop-out process is assumed to depend only on observable outcomes (i.e., MAR), Robins et al. (1994, 1995), van der Laan and Robins (2003) and Tsiatis (2006) developed inverse-weighted and augmented inverse-weighted estimating equations for inference. For informative drop-out, Rotnitzky et al. (1998a), Scharfstein et al. (1999) and Rotnitzky et al. (2001) introduced a class of selection models, in which the model for drop-out is indexed by interpretable sensitivity parameters that express departures from MAR. Inference using inverse-weighted estimating equations was proposed.

The problem with the aforementioned sensitivity analysis approaches is that the ultimate inferences can be cumbersome to display. Vansteelandt et al.
(2006a) developed a method for reporting ignorance and uncertainty intervals (regions) that contain the true parameter(s) of interest with a prescribed level of precision, when the true data generating model is assumed to fall within a plausible class of models (as an example, see Scharfstein et al., 2004). An alternative and very natural strategy is to specify an informative prior distribution on the non- or weakly identified parameters and conduct a fully Bayesian analysis, whereby the ultimate inferences are reported in terms of posterior distributions. In the cross-sectional setting with a continuous outcome, Scharfstein et al. (2003) adopted this approach from a semiparametric selection modeling perspective. Kaciroti et al. (2009) proposed a parametric pattern-mixture model for cross-sectional, clustered binary outcomes. Lee et al. (2008) introduced a fully parametric pattern-mixture approach in the longitudinal setting with binary outcomes. In this paper, we consider the same setting as Lee et al. (2008), but offer a more flexible strategy. In the context of the BCPT, the longitudinal outcome will be the indicator that the CES-D score is 16 or higher.

2.1.3 Outline

The paper is organized as follows. In Section 2.2, we describe the data structure. In Sections 2.3 and 2.4, we formalize identification assumptions and prove that the full-data distribution is identified under these assumptions. We introduce a saturated model for the distribution of the observed data in Section 2.5. In Section 2.6, we illustrate how to apply shrinkage priors to high-order interaction parameters in the saturated model to reduce the dimensionality of the parameter space and how to elicit (conditional) informative priors for non-identified sensitivity parameters from experts. In Section 2.7, we assess, by simulation, the behavior of three classes of models for the distribution of observed data: parametric, saturated, and shrinkage. Our analysis of the BCPT trial is presented in Section 2.8.
Section 2.9 is devoted to a summary and discussion.

2.2 Data Structure and Notation

Let $Z$ denote the treatment assignment indicator, where $Z = 1$ denotes tamoxifen and $Z = 0$ denotes placebo. Let $Y_j$ denote the binary outcome (i.e., depression) scheduled to be measured at the $j$th visit ($j = 0$ (baseline), $\dots, J$) and let $\bar{Y}_j = (Y_0, \dots, Y_j)$ denote the history of the outcome process through visit $j$. Let $R_j$ denote the indicator that an individual has her depression status recorded at visit $j$. We assume that $R_0 = 1$ (i.e., $Y_0$ is always observed) and that $R_j = 0$ implies $R_{j+1} = 0$ (i.e., monotone missing data). Let $C = \max\{t : R_t = 1\}$ be the last visit at which an individual's depression status is recorded. The full and observed data for an individual are $F = (Z, C, \bar{Y}_J)$ and $O = (Z, C, \bar{Y}_C)$, respectively. We assume that we observe $n$ i.i.d. copies of $O$. We will use the subscript $i$ to denote data for the $i$th individual. Our goal is to draw inference about $p^*_{z,j} = P[Y_j = 1 \mid Z = z]$ for $j = 1, \dots, J$ and $z = 0, 1$.

2.3 Assumptions

To identify $p^*_{z,j}$ from the distribution of the observed data, we make the following two untestable assumptions:

Assumption 1 (Non-Future Dependence): $R_j$ is independent of $(Y_{j+1}, \dots, Y_J)$ given $R_{j-1} = 1$ and $\bar{Y}_j$, for $j = 1, \dots, J - 1$.

This assumption asserts that for individuals at risk for drop-out at visit $j$ and who share the same history of outcomes up to and including visit $j$, the distribution of future outcomes is the same for those who are last seen at visit $j$ and those who remain on study past visit $j$. This assumption has been referred to as non-future dependence (Kenward et al., 2003).

Assumption 2 (Pattern-Mixture Representation): For $j = 1, \dots, J$ and $y_j = 0, 1$,

$$P[Y_j = y_j \mid R_j = 0, R_{j-1} = 1, \bar{Y}_{j-1}, Z = z] = \frac{P[Y_j = y_j \mid R_j = 1, \bar{Y}_{j-1}, Z = z] \exp\{q_{z,j}(\bar{Y}_{j-1}, y_j)\}}{E[\exp\{q_{z,j}(\bar{Y}_{j-1}, Y_j)\} \mid R_j = 1, \bar{Y}_{j-1}, Z = z]}$$

where $q_{z,j}(\bar{Y}_{j-1}, Y_j)$ is a specified function of its arguments.
Assumption 2 links the non-identified conditional distribution of $Y_j$ given $R_j = 0$, $R_{j-1} = 1$, $\bar{Y}_{j-1}$, and $Z = z$ to the identified conditional distribution of $Y_j$ given $R_j = 1$, $\bar{Y}_{j-1}$, and $Z = z$ using exponential tilting via the specified function $q_{z,j}(\bar{Y}_{j-1}, Y_j)$. Assumption 2 has a selection model representation that is obtained using Bayes' rule.

Assumption 2 (Selection Model Representation): For $j = 1, \dots, J$,

$$\text{logit}\{P[R_j = 0 \mid R_{j-1} = 1, \bar{Y}_j, Z = z]\} = h_{z,j}(\bar{Y}_{j-1}) + q_{z,j}(\bar{Y}_{j-1}, Y_j)$$

where

$$h_{z,j}(\bar{Y}_{j-1}) = \text{logit}\, P[R_j = 0 \mid R_{j-1} = 1, \bar{Y}_{j-1}, Z = z] - \log\{E[\exp\{q_{z,j}(\bar{Y}_{j-1}, Y_j)\} \mid R_j = 1, \bar{Y}_{j-1}, Z = z]\}.$$

With this characterization, we see that the function $q_{z,j}(\bar{Y}_{j-1}, Y_j)$ quantifies the influence (on a log odds ratio scale) of the potentially unobservable outcome $Y_j$ on the conditional odds of dropping out at time $j$.

2.4 Identifiability

The above two assumptions nonparametrically just-identify $p^*_{z,j}$ for all $j = 1, \dots, J$ and $z = 0, 1$. To see this, consider the following representation of this conditional distribution, derived using the laws of total and conditional probability:

$$\begin{aligned}
p^*_{z,j} = {} & \sum_{\bar{y}_{j-1}} P[Y_j = 1 \mid R_j = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z] \prod_{l=1}^{j} P[R_l = 1 \mid R_{l-1} = 1, \bar{Y}_{l-1} = \bar{y}_{l-1}, Z = z] \prod_{l=0}^{j-1} P[Y_l = y_l \mid R_l = 1, \bar{Y}_{l-1} = \bar{y}_{l-1}, Z = z] \\
& + \sum_{k=1}^{j} \sum_{\bar{y}_{k-1}} P[Y_j = 1 \mid R_k = 0, R_{k-1} = 1, \bar{Y}_{k-1} = \bar{y}_{k-1}, Z = z]\, P[R_k = 0 \mid R_{k-1} = 1, \bar{Y}_{k-1} = \bar{y}_{k-1}, Z = z] \\
& \qquad \times \prod_{l=1}^{k-1} P[R_l = 1 \mid R_{l-1} = 1, \bar{Y}_{l-1} = \bar{y}_{l-1}, Z = z] \prod_{l=0}^{k-1} P[Y_l = y_l \mid R_l = 1, \bar{Y}_{l-1} = \bar{y}_{l-1}, Z = z].
\end{aligned}$$

All quantities on the right hand side of this equation are identified, without appealing to any assumptions, except $P[Y_j = 1 \mid R_k = 0, R_{k-1} = 1, \bar{Y}_{k-1} = \bar{y}_{k-1}, Z = z]$ for $k = 1, \dots, j$. Under Assumptions 1 and 2, these probabilities can be shown to be identified, implying that $p^*_{z,j}$ is identified for all $j$ and $z$.

Theorem 1: $P[Y_j = 1 \mid R_{k-1} = 1, \bar{Y}_{k-1} = \bar{y}_{k-1}, Z = z]$ and $P[Y_j = 1 \mid R_k = 0, R_{k-1} = 1, \bar{Y}_{k-1} = \bar{y}_{k-1}, Z = z]$ are identified for $k = 1, \dots, j$.

Proof: The proof follows by backward induction. Consider $k = j$.
By Assumption 2, P[Yj = 1Ry = 0, Rj_i = 1, Yj_1 = yj_l, Z = z] = P[Yj = 1 R = 1, Y_i, Z = z] exp{qz,(Yj_i, 1)} E[exp{qz,(Y_l = YY)}R = 1, Y_= y j_, Z = z] Since the right hand side is identified, we know that P[Yj = 1 Rj = 0, Rj_ = 1, Y _1 = yj, Z = z] is identified. Further, we can write P[Yj = 1Rj_ I= 1, Y_il = y _i,Z= z] 1 . = > P[Y = 1R, = r, Rj_1 1,Yj I= Yj_1, Z = z]P[Rj = rlR 1 = 1,Y j = yji, Z = z] r=O Since all quantities on the right hand side are identified, P[Yj = 1 Rj_ = 1, Y _ = y _1, Z = z] is identified. Suppose that P[Yj = 1IRk = 0, Rki = 1,Yk1 Yk, Z = z] and P[Y = 1IRk1 = 1, Yki = Yk1, Z = z] are identified for some k where 1 < k < j. Then, we need to show that these probabilities are identified for k' = k 1. To see this, note that P[Yj = IlRk' = 0, Rk'1 = 1,Yk'i = Yk,1, Z = z] = P[Y = 1 iRkI = 0, Rk2 = 1,Yk2 = Yk2, Z = z] 1 = P[Y,=IlRk =O, Rk2 Yk1 Yk1,Z=z]x Yk 1 0 P[Yk1 Yk1IRk1 = 0, Rk2 = 1, Yk2 = k2, Z = z] 1 = P[Y = 1IRk = 1,Yk k,Z = z]x Yk 1 0 P[Yk1 Yk1 Rki = 1, Yk2 = k2, Z = z] exp{qz,kI(Yk2, Yk)} E[exp{q,k1(Yk2, Yk1)}Rk1 = 1,Yk2 = Yk2, Z = z] The third equality follows by Assumptions 1 and 2. Since all the quantities on the right hand side of the last equality are identified, P[Yj = 1 Rk, = 0, Rk'1 = 1, Yk' = yk'1, Z = z] is identified. Further, P[Yj = 1IRk/_I = Yk'i = yk'1, Z = z] = P[Yj = 1lRk2 = 1,Yk2 = Yk2, Z = z] 1 = P[Yj= 1IRk 1,Yk1 Ykl,Z=z]x Yk1=0 P[Yk1 = Yk Rk = 1, Yk2 = Yk2, Z = z]x P[Rki = 1 Rk2 = 1,Yk2 = Yk2, Z = ]+ 1 SP[Yj = 1Rk = 0, Rk2 = 1, Yk = Yk, Z = z]x Yk1= 0 P[Yk1 = Yki Rk = 0, Yk2 = Yk2, Z = z]x P[Rk1 = 0Rk2 Yk2 Yk2, Z z] 1 S P[Y = Rkl 1,Yk1 Yk, Z = z]x Yk1= 0 P[Yk1 = ki Rki = 1, Yk2 = Yk2, Z = z]x P[Rk = 1 Rk2 1,Yk2 = Yk2, Z = z]+ Yk1= 0 P[Yk1 = Yk Rk1 = 1, Yk2 = Yk2, Z = z] exp{qz,k1(Yk2, Yk1)} E[exp{qz,k1(Yk2, Yk1)} Rk1 = 1, Yk2 = Yk2, Z = z] P[Rk1 = 0Rk2 = 1,Yk2 = Yk2, Z = z] The third equality follows by Assumptions 1 and 2. 
Since all the quantities on the right hand side of the last equality are identified, $P[Y_j = 1 \mid R_{k'-1} = 1, \bar{Y}_{k'-1} = \bar{y}_{k'-1}, Z = z]$ is identified. $\square$

The identifiability result shows that, given the functions $q_{z,j}(\bar{Y}_{j-1}, Y_j)$, $p^*_{z,j}$ can be expressed as a functional of the distribution of the observed data. In particular, the functional depends on the conditional distributions of $Y_j$ given $R_j = 1$, $\bar{Y}_{j-1}$, and $Z$ for $j = 0, \dots, J$ and the conditional distributions of $R_j$ given $R_{j-1} = 1$, $\bar{Y}_{j-1}$ and $Z$ for $j = 1, \dots, J$. Furthermore, the functions $q_{z,j}(\bar{Y}_{j-1}, Y_j)$ are not identifiable from the distribution of the observed data and their specification places no restrictions on the distribution of the observed data.

2.5 Modeling

We specify saturated models for the observed data via the sequential conditional distributions of $[Y_j \mid R_j = 1, \bar{Y}_{j-1}, Z]$ for $j = 0, \dots, J$ and the conditional hazards $[R_j \mid R_{j-1} = 1, \bar{Y}_{j-1}, Z]$ for $j = 1, \dots, J$. We parameterize these models as follows:

$$\text{logit}\, P[Y_0 = 1 \mid R_0 = 1, Z = z] = \alpha_{z,0,0}$$

$$\text{logit}\, P[Y_j = 1 \mid R_j = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z] = \alpha_{z,j,0} + \alpha_{z,j,1} y_{j-1} + \sum_{k=0}^{j-2} \alpha^{(1)}_{z,j,k} y_k + \sum_{k,l \in A_j^{(2)}} \alpha^{(2)}_{z,j,kl}\, y_k y_l + \sum_{k,l,m \in A_j^{(3)}} \alpha^{(3)}_{z,j,klm}\, y_k y_l y_m + \dots + \alpha^{(j-1)}_{z,j}\, y_0 y_1 \cdots y_{j-1}$$

$$\text{logit}\, P[R_j = 0 \mid R_{j-1} = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z] = \gamma_{z,j,0} + \gamma_{z,j,1} y_{j-1} + \sum_{k=0}^{j-2} \gamma^{(1)}_{z,j,k} y_k + \sum_{k,l \in A_j^{(2)}} \gamma^{(2)}_{z,j,kl}\, y_k y_l + \sum_{k,l,m \in A_j^{(3)}} \gamma^{(3)}_{z,j,klm}\, y_k y_l y_m + \dots + \gamma^{(j-1)}_{z,j}\, y_0 y_1 \cdots y_{j-1}$$

for $j = 1, \dots, J$, where $A_j^{(t)}$ is the set of all $t$-tuples of the integers $0, \dots, j - 1$. Let $\alpha$ denote the parameters indexing the conditional distributions $[Y_j \mid R_j = 1, \bar{Y}_{j-1}, Z]$, $\gamma$ denote the parameters indexing the conditional distributions $[R_j \mid R_{j-1} = 1, \bar{Y}_{j-1}, Z]$ and $\theta = \{\alpha, \gamma\}$.

Furthermore, we propose to parameterize the functions $q_{z,j}(\bar{Y}_{j-1}, Y_j)$ with parameters $\tau_{z,j,\bar{y}_{j-1}} = q_{z,j}(\bar{y}_{j-1}, 1) - q_{z,j}(\bar{y}_{j-1}, 0)$. Here, $\exp(\tau_{z,j,\bar{y}_{j-1}})$ represents, in the context of the BCPT trial, the conditional odds ratio of dropping out between visits $j - 1$ and $j$ for individuals who are depressed vs. not depressed at visit $j$, but share the mental history $\bar{y}_{j-1}$ through visit $j - 1$. We let $\tau$ denote the collection of $\tau_{z,j,\bar{y}_{j-1}}$'s.
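The identification argument of Section 2.4, with $q_{z,j}$ parameterized by $\tau$ as above, can be sketched numerically. The Python below is an illustrative sketch for a single treatment arm, not the implementation used in this chapter. The function names `tilt` and `marginal_prob`, the dictionary layout keyed by history tuples, and all numeric inputs are hypothetical. It relies on the fact that, for a binary outcome, Assumption 2 reduces to $P[Y_j = 1 \mid R_j = 0, \dots] = p e^{\tau} / (p e^{\tau} + 1 - p)$ with $p = P[Y_j = 1 \mid R_j = 1, \dots]$, and that Assumption 1 lets the recursion continue along each history regardless of whether drop-out occurred.

```python
import math

def tilt(p, tau):
    """Assumption 2 for a binary outcome: tilt P[Y=1 | R=1, history] = p
    by exp(tau * y) to obtain P[Y=1 | R=0, R_prev=1, history]."""
    w = p * math.exp(tau)
    return w / (w + 1.0 - p)

def marginal_prob(j, mu, rho, tau):
    """Compute P[Y_j = 1] for one arm by forward recursion over histories.

    mu[h]  : P[Y_k = 1 | R_k = 1, history h], where k = len(h)
    rho[h] : P[R_k = 0 | R_{k-1} = 1, history h], for k >= 1
    tau[h] : sensitivity parameter (log odds ratio) for history h, k >= 1
    Histories h are tuples (y_0, ..., y_{k-1}); these names are hypothetical.
    """
    def m(h):
        # P[Y_k = 1 | R_{k-1} = 1, history h]: mix stayers and drop-outs.
        if len(h) == 0:          # baseline is always observed (R_0 = 1)
            return mu[h]
        return (1.0 - rho[h]) * mu[h] + rho[h] * tilt(mu[h], tau[h])

    def g(h):
        # P[Y_j = 1 | R_{k-1} = 1, history h]; by non-future dependence the
        # same continuation applies whether or not drop-out occurs at visit k.
        p1 = m(h)
        if len(h) == j:
            return p1
        return p1 * g(h + (1,)) + (1.0 - p1) * g(h + (0,))

    return g(())

# Hypothetical inputs for J = 1 (two visits).
mu = {(): 0.2, (0,): 0.1, (1,): 0.5}
rho = {(0,): 0.2, (1,): 0.4}
tau_mar = {(0,): 0.0, (1,): 0.0}                     # tau = 0 corresponds to MAR
tau_mnar = {(0,): math.log(2.0), (1,): math.log(2.0)}

p_mar = marginal_prob(1, mu, rho, tau_mar)
p_mnar = marginal_prob(1, mu, rho, tau_mnar)
```

With $\tau = 0$ the drop-outs are treated like the stayers who share their history; a positive $\tau$ raises the implied depression rate among drop-outs and hence the marginal rate.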
2.6 Prior Specification and Posterior Computation

For specified sensitivity analysis parameters $\tau$, the saturated model proposed in Section 2.5 provides a perfect fit to the distribution of the observed data. In this model, however, the number of parameters increases exponentially in $J$. In contrast, the number of data points increases linearly in $J$. As a consequence, there will be many combinations of $\bar{y}_{j-1}$ (i.e., "cells") which will be sparsely represented in the dataset. For example, in the BCPT trial, about 50% of the possible realizations of $\bar{Y}_7$ have fewer than two observations and about 15% have no observations. From a frequentist perspective, this implies that components of $\theta$ will be imprecisely estimated; in turn, this can adversely affect estimation of $p^*_{z,j}$. This has been called the curse of dimensionality (Robins and Ritov, 1997).

2.6.1 Shrinkage Priors

To address this problem, we introduce data-driven shrinkage priors for higher order interactions to reduce the number of parameters in an automated manner. In particular, we assume

$$\alpha^{(t)}_{z,j,k} \sim N(0, \sigma^{(t)}_{\alpha}) \quad \text{and} \quad \gamma^{(t)}_{z,j,k} \sim N(0, \sigma^{(t)}_{\gamma}), \qquad k \in A_j^{(t)}, \; 3 < t$$

where $t$ is the order of interactions and the hyperparameters (shrinkage variances) follow distributions $\sigma^{(t)}_{\alpha} \sim \text{Unif}(0, 10)$ and $\sigma^{(t)}_{\gamma} \sim \text{Unif}(0, 10)$. When $\sigma^{(t)}_{\alpha}$ and $\sigma^{(t)}_{\gamma}$ equal zero for all interactions, the saturated model is reduced to a first order Markov model,

$$\text{logit}\, P[Y_0 = 1 \mid R_0 = 1, Z = z] = \alpha_{z,0,0}$$

$$\text{logit}\, P[Y_j = 1 \mid R_j = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z] = \alpha_{z,j,0} + \alpha_{z,j,1} y_{j-1}$$

$$\text{logit}\, P[R_j = 0 \mid R_{j-1} = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z] = \gamma_{z,j,0} + \gamma_{z,j,1} y_{j-1}.$$

The shrinkage priors allow the "neighboring" cells in the observed data model to borrow information from each other and provide more precise estimates.
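To make the dimensionality concrete, the following Python sketch enumerates the terms of a saturated logit model at visit $j$ and shows how it collapses to the first order Markov form when every coefficient other than the intercept and the $y_{j-1}$ main effect is zero. It is illustrative only: the helper names `all_terms` and `linear_predictor` and the coefficient values are hypothetical, and terms are indexed by subsets of visit indices rather than by the $\alpha^{(t)}$ notation above.

```python
from itertools import combinations

def all_terms(j):
    """All subsets of the visit indices 0..j-1. Each subset S is one
    coefficient of the saturated logit model at visit j, multiplying the
    product of y_k over k in S (the empty subset is the intercept)."""
    idx = range(j)
    return [frozenset(c) for t in range(j + 1) for c in combinations(idx, t)]

def linear_predictor(history, coef):
    """Evaluate the logit for a binary history (y_0, ..., y_{j-1}).
    A term contributes only when every y_k it involves equals 1.
    Terms absent from `coef` are treated as exactly zero."""
    on = {k for k, y in enumerate(history) if y == 1}
    return sum(value for term, value in coef.items() if term <= on)

# The number of coefficients doubles with each additional visit.
counts = [len(all_terms(j)) for j in range(1, 8)]

# Hypothetical first order Markov coefficients at visit j = 3:
# only the intercept and the effect of y_2 are non-zero.
markov = {frozenset(): -2.0, frozenset({2}): 1.5}
eta_a = linear_predictor((0, 1, 1), markov)
eta_b = linear_predictor((1, 0, 1), markov)
eta_c = linear_predictor((1, 1, 0), markov)
```

Histories that agree on the most recent outcome (`eta_a` and `eta_b`) receive the same linear predictor under the Markov coefficients, which is the sense in which shrinking the remaining terms toward zero pools information across cells.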
When the first order Markov model is not true, as $n$ goes to infinity, the posterior means of observed data probabilities will converge to their true values as long as the shrinkage priors are $O(1)$ (which is the case here) and all the true values of the observed data probabilities, $P[Y_j \mid R_j = 1, \bar{Y}_{j-1}, Z]$ for $j = 0, \dots, J$ and $P[R_j \mid R_{j-1} = 1, \bar{Y}_{j-1}, Z]$ for $j = 1, \dots, J$, are in the open interval $(0, 1)$. This follows, since under this latter condition, all combinations of depression histories have a positive probability of being observed and the prior will become swamped by the observed data. However, when the true value of any of the observed data probabilities is zero or one, there exists at least one combination of depression history that will never be observed and thus the influence of the prior will not dissipate as $n$ increases.

We specify non-informative priors $N(0, 1000)$ for the non-interaction parameters in $\theta$, namely $\alpha_{z,j,0}$ for $j = 0, \dots, J$ and $z = 0, 1$, and $\alpha_{z,j,1}$, $\gamma_{z,j,0}$ and $\gamma_{z,j,1}$ for $j = 1, \dots, J$ and $z = 0, 1$.

2.6.2 Prior of Sensitivity Parameters

The sensitivity parameters in Assumption 2, defined formally in Section 2.5, are (conditional) odds ratios. In our experience, subject matter experts often have difficulty thinking in terms of odds ratios; rather, they are more comfortable expressing beliefs about relative risks (Scharfstein et al., 2006; Shepherd et al., 2007). With this in mind, we asked Dr. Patricia Ganz, a medical oncologist and expert on quality of life outcomes in breast cancer, to express her beliefs about the risk of dropping out and its relationship to treatment assignment and depression. We then translated her beliefs into prior distributional assumptions about the odds ratio sensitivity parameters $\tau$.

Specifically, we asked Dr. Ganz to answer the following question for each treatment group:

Q: Consider a group of women assigned to placebo (tamoxifen), who are on study through visit $j - 1$ and who share the same history of depression.
Suppose that the probability that a randomly selected woman in this group drops out before visit $j$ is $p$ (denoted by the columns in Table 2-1). For each $p$, what is the minimum, maximum and your best guess (median) representing how much more (e.g., twice) or less (e.g., half) likely you consider the risk of dropping out before visit $j$ for a woman who would be depressed at visit $j$ RELATIVE to a woman who would not be depressed at visit $j$?

Implicit in this question is the assumption that, for each treatment group, the relative risk only depends on past history and the visit number only through the risk of dropping out between visits $j - 1$ and $j$.

For notational convenience, let $r_z(p)$ denote the relative risk of drop-out for treatment group $z$ and drop-out probability $p$. Further, let $r_{z,min}(p)$, $r_{z,med}(p)$ and $r_{z,max}(p)$ denote the elicited minimum, median, and maximum relative risks (see Table 2-1). Let $p_{z,j}(\bar{y}_{j-1}) = P[R_j = 0 \mid R_{j-1} = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z]$ and let $p^{(y)}_{z,j}(\bar{y}_{j-1}) = P[R_j = 0 \mid R_{j-1} = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Y_j = y, Z = z]$ for $y = 0, 1$. By definition,

$$r_z(p_{z,j}(\bar{y}_{j-1})) = \frac{p^{(1)}_{z,j}(\bar{y}_{j-1})}{p^{(0)}_{z,j}(\bar{y}_{j-1})} \quad \text{and} \quad p_{z,j}(\bar{y}_{j-1}) = \sum_{y=0}^{1} p^{(y)}_{z,j}(\bar{y}_{j-1})\, \pi^{(y)}_{z,j}(\bar{y}_{j-1})$$

where $\pi^{(y)}_{z,j}(\bar{y}_{j-1}) = P[Y_j = y \mid R_{j-1} = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z]$ for $y = 0, 1$. This implies that

$$p^{(0)}_{z,j}(\bar{y}_{j-1}) = \frac{p_{z,j}(\bar{y}_{j-1})}{\pi^{(1)}_{z,j}(\bar{y}_{j-1})\,\big(r_z(p_{z,j}(\bar{y}_{j-1})) - 1\big) + 1}.$$

Since $\pi^{(1)}_{z,j}(\bar{y}_{j-1}) \in [0, 1]$, given $p_{z,j}(\bar{y}_{j-1})$ and $r_z(p_{z,j}(\bar{y}_{j-1}))$, $p^{(0)}_{z,j}(\bar{y}_{j-1})$ is bounded as follows: for $r_z(p_{z,j}(\bar{y}_{j-1})) \ge 1$,

$$p_{z,j}(\bar{y}_{j-1}) / r_z(p_{z,j}(\bar{y}_{j-1})) \le p^{(0)}_{z,j}(\bar{y}_{j-1}) \le \min\{p_{z,j}(\bar{y}_{j-1}), 1\}$$

and, for $r_z(p_{z,j}(\bar{y}_{j-1})) \le 1$,

$$p_{z,j}(\bar{y}_{j-1}) \le p^{(0)}_{z,j}(\bar{y}_{j-1}) \le \min\{p_{z,j}(\bar{y}_{j-1}) / r_z(p_{z,j}(\bar{y}_{j-1})), 1\}.$$

We will use these bounds to construct our prior. We construct the conditional prior of $\tau_{z,j,\bar{y}_{j-1}}$ given $p_{z,j}(\bar{y}_{j-1})$ using Steps 1-4 given below. The general strategy is to use the elicited information on the relative risk at different drop-out probabilities and the bounds derived above to construct the prior of interest.

Step 1. For $m \in \{min, med, max\}$, interpolate the elicited $r_{z,m}(p)$ at different drop-out probabilities (see Figure 2-1) to find $r_{z,m}(p_{z,j}(\bar{y}_{j-1}))$ for any $p_{z,j}(\bar{y}_{j-1})$.

Step 2.
Construct the prior of $r_z(p_{z,j}(\bar{y}_{j-1}))$ given $p_{z,j}(\bar{y}_{j-1})$ as a 50-50 mixture of Uniform$(r_{z,min}(p_{z,j}(\bar{y}_{j-1})), r_{z,med}(p_{z,j}(\bar{y}_{j-1})))$ and Uniform$(r_{z,med}(p_{z,j}(\bar{y}_{j-1})), r_{z,max}(p_{z,j}(\bar{y}_{j-1})))$ random variables. This preserves the elicited percentiles of the relative risk.

Step 3. Construct a conditional prior of $p^{(0)}_{z,j}(\bar{y}_{j-1})$ given $p_{z,j}(\bar{y}_{j-1})$ and $r_z(p_{z,j}(\bar{y}_{j-1}))$ as a uniform distribution with lower bound

$$\frac{p_{z,j}(\bar{y}_{j-1})}{\max\{r_z(p_{z,j}(\bar{y}_{j-1})), 1\}}$$

and upper bound

$$\min\left\{\frac{p_{z,j}(\bar{y}_{j-1})}{\min\{r_z(p_{z,j}(\bar{y}_{j-1})), 1\}}, \frac{1}{\max\{r_z(p_{z,j}(\bar{y}_{j-1})), 1\}}\right\}.$$

The bounds were derived above.

Step 4. Steps 2 and 3 induce a prior for $\tau_{z,j,\bar{y}_{j-1}}$ by noting that

$$\tau_{z,j,\bar{y}_{j-1}} = \log\left\{\frac{r_z(p_{z,j}(\bar{y}_{j-1}))\,\big(1 - p^{(0)}_{z,j}(\bar{y}_{j-1})\big)}{1 - r_z(p_{z,j}(\bar{y}_{j-1}))\, p^{(0)}_{z,j}(\bar{y}_{j-1})}\right\},$$

i.e., $\tau_{z,j,\bar{y}_{j-1}}$ is a deterministic function of $r_z(p_{z,j}(\bar{y}_{j-1}))$ and $p^{(0)}_{z,j}(\bar{y}_{j-1})$.

The relative risks elicited from Dr. Ganz are given in Table 2-2. We extrapolated the relative risks outside the ranges given in Table 2-2 as shown in Figure 2-1. Figure 2-2 shows the density of $\tau$ given $p_{z,j}(\bar{y}_{j-1})$ equal to 10% and 25% for the tamoxifen and placebo arms. For two patients with the same response history up to time point $j - 1$, the log odds ratio of dropping out at time point $j$, for the patient that is depressed at time point $j$ versus the patient that is not, increases as the overall drop-out rate at time point $j$ increases. In general, for a given $p_{z,j}(\bar{y}_{j-1})$, the log odds ratio is higher for patients in the tamoxifen versus placebo arms.

2.6.3 Posterior Computation

With the shrinkage priors on $\theta$, the elicited conditional priors of $\tau$ given $\theta$, and the observed data, the following steps are used to simulate draws from the posterior of $p^*_{z,j}$:

1. Using the proposed observed data model with the shrinkage priors on $\theta$, we simulate draws from the posterior distributions of $P[Y_j = 1 \mid R_j = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z]$ and $P[R_j = 0 \mid R_{j-1} = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z]$ for all $j$, $z$ and $\bar{y}_{j-1}$ in WinBUGS.

2. For each draw of $P[R_j = 0 \mid R_{j-1} = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z]$, we draw $\tau_{z,j,\bar{y}_{j-1}}$ based on the conditional priors described in Section 2.6.2.

3.
We compute $p^*_{z,j}$ by plugging the draws of $P[Y_j = 1 \mid R_j = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z]$, $P[R_j = 0 \mid R_{j-1} = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z]$ and $\tau_{z,j,\bar{y}_{j-1}}$ into the identification algorithm discussed in Section 2.4.

To sample from the posterior distributions of $P[Y_j = 1 \mid R_j = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z]$ and $P[R_j = 0 \mid R_{j-1} = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z]$ in WinBUGS, we stratify the individual binary data (by previous response history) and analyze them as Binomial data; this serves to drastically improve the computational efficiency. Sampling $\tau_{z,j,\bar{y}_{j-1}}$ and computing $p^*_{z,j}$ is implemented separately from the first step using R.

2.7 Assessment of Model Performance via Simulation

Via simulation, we compared the performance of the shrinkage model with a correct parametric model (given below), an incorrect parametric model (first order Markov model) and the saturated model with diffuse priors (given below). The shrinkage model uses the shrinkage priors proposed in Section 2.6.1 (which shrink the saturated model toward a first order Markov model). Note that the shrinkage priors shrink the saturated model to an incorrect parametric model. For the saturated model with diffuse priors, we reparameterize the model as

$$P[Y_j = 1 \mid R_j = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z] = \mu_{z,j,\bar{y}_{j-1}}, \qquad P[R_j = 0 \mid R_{j-1} = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z] = \rho_{z,j,\bar{y}_{j-1}}$$

for $j = 1, \dots, 7$, and specify independent Unif(0, 1) priors on the $\mu$'s and $\rho$'s.

We simulated observed data from a "true" parametric model of the following form:

$$\text{logit}\, P[Y_0 = 1 \mid R_0 = 1, Z = z] = \alpha_{z,0,0}$$

$$\text{logit}\, P[Y_1 = 1 \mid R_1 = 1, Y_0 = y_0, Z = z] = \alpha_{z,1,0} + \alpha_{z,1,1} y_0$$

$$\text{logit}\, P[R_1 = 0 \mid R_0 = 1, Y_0 = y_0, Z = z] = \gamma_{z,1,0} + \gamma_{z,1,1} y_0$$

$$\text{logit}\, P[Y_j = 1 \mid R_j = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z] = \alpha_{z,j,0} + \alpha_{z,j,1} y_{j-1} + \alpha_{z,j,2} y_{j-2}$$

$$\text{logit}\, P[R_j = 0 \mid R_{j-1} = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z] = \gamma_{z,j,0} + \gamma_{z,j,1} y_{j-1} + \gamma_{z,j,2} y_{j-2},$$

for $j = 2$ to 7. To determine the parameters of the data generating model, we fit this model to the "monotonized" BCPT data in WinBUGS with non-informative priors. We used the posterior means of the parameters $\alpha_z$ and $\gamma_z$ as the true parameters.
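Drawing $\tau$ from the elicited conditional prior of Section 2.6.2 (Steps 1-4) can be sketched as follows. This Python sketch is illustrative only and is not the R code used for the analysis. The names `ELICITED`, `interp_rr` and `draw_tau` are hypothetical, and the interpolation rule is an assumption: relative risks are interpolated linearly between the two elicited drop-out rates of Table 2-2 and held flat outside that range, which need not match the extrapolation shown in Figure 2-1.

```python
import math
import random

# Elicited relative risks (Table 2-2): drop-out rate -> (min, median, max).
ELICITED = {
    "tamoxifen": {0.10: (1.10, 1.20, 1.30), 0.25: (1.30, 1.50, 1.60)},
    "placebo":   {0.10: (1.01, 1.05, 1.10), 0.25: (1.20, 1.30, 1.40)},
}

def interp_rr(arm, p):
    """Step 1 (assumed rule): linear interpolation in p between the two
    elicited rates, held constant outside the elicited range."""
    (p_lo, r_lo), (p_hi, r_hi) = sorted(ELICITED[arm].items())
    w = min(max((p - p_lo) / (p_hi - p_lo), 0.0), 1.0)
    return tuple(a + w * (b - a) for a, b in zip(r_lo, r_hi))

def draw_tau(arm, p, rng):
    """Draw one log odds ratio tau given the drop-out probability p."""
    r_min, r_med, r_max = interp_rr(arm, p)
    # Step 2: 50-50 mixture of Uniform(min, med) and Uniform(med, max).
    if rng.random() < 0.5:
        r = rng.uniform(r_min, r_med)
    else:
        r = rng.uniform(r_med, r_max)
    # Step 3: uniform prior for the drop-out probability among the
    # non-depressed, p0, within the bounds implied by p and r.
    lower = p / max(r, 1.0)
    upper = min(p / min(r, 1.0), 1.0 / max(r, 1.0))
    p0 = rng.uniform(lower, upper)
    # Step 4: tau is a deterministic function of r and p0.
    return math.log(r * (1.0 - p0) / (1.0 - r * p0))

rng = random.Random(0)
draws = [draw_tau("tamoxifen", 0.10, rng) for _ in range(1000)]
```

Because the relative risk exceeds one in both arms, every draw of $\tau$ is positive, i.e., depressed women are taken to be more likely to drop out.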
We compute the "true" values of $p^*_{z,j}$ by (1) drawing 10,000 values from the elicited prior of $\tau_z$ given $\gamma_z$ given in Table 2-2, (2) computing $p^*_{z,j}$ using the identification algorithm in Section 2.4 for each draw, and (3) averaging the resulting $p^*_{z,j}$'s. The model parameters and the "true" depression rates $p^*_{z,j}$ are given in Table 2-3.

We considered (relatively) small (3000), moderate (5000), and large (10000) sample sizes for each treatment arm; for each sample size, we simulated 50 datasets. We assessed model performance using the mean squared error (MSE) criterion. In Table 2-4, we report the MSEs of $P[Y_j = 1 \mid R_j = 1, \bar{Y}_{j-1}, Z = z]$ and $P[R_j = 1 \mid R_{j-1} = 1, \bar{Y}_{j-1}, Z = z]$ averaged over all $j$ and all $\bar{Y}_{j-1}$ (see columns 3 and 4, respectively). We also report the MSEs for $p^*_{z,j}$ (see columns 6-12). For reference, the MSEs associated with the true data generating model are bolded.

This table demonstrates that the shrinkage model generally outperforms both the incorrectly specified parametric model and the saturated model at all sample sizes. This improved performance is especially noticeable when comparing the MSEs for the rates of depression at times 3-7. In addition, the MSEs for the shrinkage model compare favorably with those of the true parametric model for all sample sizes considered, despite the fact that the shrinkage priors were specified to shrink toward an incorrect model.

2.8 Application: Breast Cancer Prevention Trial (BCPT)

Table 2-5 displays the treatment-specific monotonized drop-out rates in the BCPT. By the 7th study visit, more than 40% of patients had missed one or more assessments, with a slightly higher percentage in the tamoxifen arm.

We fit the shrinkage model to the observed data using WinBUGS, with four chains of 8000 iterations and 1000 burn-in. Convergence was checked by examining trace plots of the multiple chains.
2.8.1 Model Fit and Shrinkage Results

To assess the model fit, we compared the empirical rates and posterior means (with 95% credible intervals) of $P[Y_j = 1, R_j = 1 \mid Z = z]$ and $P[R_j = 0 \mid Z = z]$. As shown in Figure 2-3, the shrinkage model fits the observed data well. Figure 2-4 illustrates the effect of shrinkage on the model fits by comparing the difference between the empirical rate and posterior mean of $P[Y_j = 1 \mid R_j = 1, \bar{Y}_{j-1}, Z = z]$ for all $j$, $z$ and $\bar{Y}_{j-1}$. We can see that for early time points, the difference is close to zero since there is little shrinkage applied to the model parameters. For later time points, more higher order interaction coefficients are shrunk toward zero and the magnitude of the difference increases and drifts away from the zero line. In general, the empirical estimates are less reliable for the later time points (see the simulation results in Section 2.7). In some cases, there are no observations within "cells." By shrinking the high order interactions (i.e., borrowing information across neighboring cells), we are able to estimate $P[Y_j = 1 \mid R_j = 1, \bar{Y}_{j-1}, Z = z]$ for all $j$, $z$ and $\bar{Y}_{j-1}$ with reasonable precision.

2.8.2 Inference

Figure 2-5 shows the posterior of $P[Y_7 = 1 \mid Z = z]$, the treatment-specific probability of depression at the end of the 36-month follow-up (solid lines). For comparison, the posterior under MAR (corresponding to point mass priors for $\tau$ at zero) is also presented (dashed lines). The observed depression rates (i.e., complete case analysis) were 0.115 on both the placebo and tamoxifen arms. Under the MNAR analysis (using the elicited priors), the posterior means of the depression rates at month 36 were 0.126 (95% CI: 0.115, 0.138) and 0.130 (95% CI: 0.119, 0.143) for the placebo and tamoxifen arms; the difference was 0.004 (95% CI: -0.012, 0.021). Under MAR, the rates were 0.125 (95% CI: 0.114, 0.136) and 0.126 (95% CI: 0.115, 0.138) for the placebo and tamoxifen arms; the difference was 0.001 (95% CI: -0.015, 0.018).
The posterior probability of depression was higher under the MNAR analysis than the MAR analysis since researchers believed depressed patients were more likely to drop out (see Table 2-2), a belief that was captured by the elicited priors. Figure 2-6 shows that under the two treatments there were no significant differences in the depression rates at any time point (95% credible intervals all cover zero) under both MNAR and MAR. Similar (non-significant) treatment differences were seen when examining treatment comparisons conditional on depression status at baseline.

2.9 Summary and Discussion

In this paper, we have presented a Bayesian shrinkage approach for longitudinal binary data with informative drop-out. Our model provides a framework that incorporates expert opinion about non-identifiable parameters and avoids the curse of dimensionality by using shrinkage priors. In our analysis of the BCPT data, we concluded that there was little (if any) evidence that women on tamoxifen were more depressed than those on placebo.

An important feature of our approach is that the specification of models for the identifiable distribution of the observed data and the non-identifiable parameters can be implemented by separate independent data analysts. This feature can be used to increase the objectivity of necessarily subjective inferences in the FDA review of randomized trials with informative drop-out.

Penalized likelihood (Fan and Li, 2001; Green and Silverman, 1994; Wahba, 1990) is another approach for high-dimensional statistical modeling. There are similarities between the penalized likelihood approach and our shrinkage model. In fact, the shrinkage priors on the saturated model parameters proposed in our approach can be viewed as a specific form of the penalty.

The ideas in this paper can be extended to continuous outcomes. For example, one could use the mixtures of Dirichlet processes model (Escobar and West, 1995) for the distribution of observed responses.
They can also be extended to multiple-cause dropout; in this trial, missed assessments were due to a variety of reasons, including patient-specific causes such as experiencing a protocol defined event, stopping therapy, or withdrawing consent, and institution-specific causes such as understaffing and staff turnover. Therefore, some missingness is less likely to be informative; extensions will need to account for that. In addition, institutional differences might be addressed by allowing institution-specific parameters with priors that shrink them toward a common set of parameters.

For smaller sample sizes, WinBUGS has difficulty sampling from the posterior distribution of the parameters in the shrinkage model. In addition, the "monotonizing" approach ignores the intermittent missing data and may lead to biased results. These issues will be examined in the next Chapter.

2.10 Acknowledgments

This research was supported by NIH grants R01CA85295, U10CA37377, and U10CA69974. The authors are grateful to oncologist Patricia Ganz at UCLA for providing her expertise for the MNAR analysis.

2.11 Tables and Figures

Table 2-1. Relative Risks to be Elicited

Question                                  Relative Risk (at Drop out Rate p = p_1, p_2, ...)
100% confident the number is above        r_{z,min}(p)
Best Guess                                r_{z,med}(p)
100% confident the number is below        r_{z,max}(p)

Table 2-2. Percentiles of Relative Risks Elicited

                          Drop out Rate
Treatment   Percentile    10%     25%
Tamoxifen   Minimum       1.10    1.30
            Median        1.20    1.50
            Maximum       1.30    1.60
Placebo     Minimum       1.01    1.20
            Median        1.05    1.30
            Maximum       1.10    1.40

Table 2-3.
Simulation Scenario Parameter Tamoxifen ao 2.578 2.500 2.613 a 2.460 1.978 a2 1.500 70 2.352 2.871 71 0.611 0.397 72 0.121 Depression Rate 0.066 0.097 0.119 Placebo co 2.653 a1 a2 70 71 72 Depression Rate 0.071 2.632 2.59 2.708 2.304 1.241 2.308 2.970 0.466 0.468 0.293 0.107 0.118 Time Point 2 3 2.752 1.940 1.599 2.625 0.460 0.422 0.124 2.663 1.874 1.608 2.729 0.469 0.323 0.120 2.626 2.023 1.389 2.513 0.247 0.261 0.139 2.598 2.104 1.471 2.474 0.272 0.278 0.132 4 5 2.789 2.072 1.612 2.281 0.320 0.035 0.126 2.884 2.068 1.693 2.410 0.376 0.288 0.130 2.811 1.885 1.639 2.217 0.127 0.293 0.126 2.853 2.123 1.540 2.460 0.088 0.241 0.126 2.895 2.007 1.830 2.536 0.228 0.204 0.123 3.035 2.243 1.989 2.673 0.001 0.428 0.125 i Table 24. Simulation Results: MSE (x103). P and T represent placebo and tamoxifen arms, respectively. Observed uT Model Treat Sample Size 3000 True P T Parametric P T Shrinkage P T Saturated P Sample Size 5( True Parametric Shrinkage Saturated Y R 1 2 3 4 5 6 7 0.946 1.075 30.176 28.882 6.970 6.988 35.678 34.654 )00 P 0.659 T 0.604 P 30.029 T 28.571 P 4.628 T 4.448 P 30.274 T 29.599 Sample Size 10000 True P T Parametric P T Shrinkage P T Saturated P T 0.314 0.278 29.849 28.418 2.392 2.474 22.989 22.245 0.378 0.431 0.451 0.385 1.999 2.401 67.171 62.606 0.238 0.261 0.381 0.277 1.188 1.414 54.647 51.219 0.121 0.126 0.315 0.223 0.707 0.712 37.716 34.791 0.034 0.048 0.026 0.035 0.034 0.048 0.026 0.037 0.033 0.045 0.024 0.026 0.036 0.050 0.026 0.033 0.023 0.027 0.020 0.020 0.023 0.027 0.020 0.023 0.025 0.028 0.017 0.023 0.023 0.028 0.020 0.020 0.009 0.009 0.011 0.013 0.009 0.010 0.011 0.014 0.008 0.010 0.011 0.015 0.009 0.009 0.011 0.013 0.049 0.046 0.048 0.056 0.051 0.056 0.054 0.045 0.027 0.032 0.028 0.042 0.032 0.028 0.028 0.032 0.015 0.013 0.016 0.02 0.015 0.011 0.016 0.014 0.052 0.056 0.062 0.078 0.059 0.053 0.058 0.059 0.028 0.027 0.033 0.042 0.031 0.026 0.033 0.028 0.017 0.013 0.023 0.026 0.017 0.015 0.018 0.014 0.050 0.052 0.061 0.073 0.047 0.063 0.101 
0.097 0.036 0.034 0.044 0.049 0.035 0.038 0.061 0.051 0.013 0.015 0.028 0.028 0.014 0.016 0.018 0.021 0.05 0.065 0.078 0.061 0.067 0.086 0.096 0.077 0.052 0.066 0.119 0.073 0.231 0.561 0.329 0.722 0.043 0.058 0.035 0.044 0.052 0.073 0.056 0.057 0.048 0.057 0.033 0.044 0.138 0.290 0.140 0.392 0.014 0.018 0.019 0.020 0.033 0.043 0.039 0.043 0.013 0.014 0.019 0.023 0.038 0.094 0.048 0.128

Table 2-5. Patients Cumulative Drop Out Rate

                            Month
                            3      6      12     18     24     30     36
Tamoxifen   Available       5364   4874   4597   4249   3910   3529   3163
            Drop out        490    767    1115   1454   1835   2201   2447
            Drop Rate (%)   9.13   14.30  20.79  27.11  34.21  41.03  45.62
Placebo     Available       5375   4871   4624   4310   3951   3593   3297
            Drop out        504    751    1065   1424   1782   2078   2304
            Drop Rate (%)   9.38   13.97  19.81  26.49  33.15  38.66  42.87

[Figure 2-1 plot: curves labeled Tamoxifen: maximum, Tamoxifen: median, Tamoxifen: minimum, Placebo: maximum, Placebo: median, Placebo: minimum; x-axis: Drop Out Rate (0, 10, 25, 100).]

Figure 2-1. Extrapolation of the elicited relative risks.

Figure 2-2. Prior conditional density of $\tau_{z,j,\bar{y}_{j-1}}$ given $p_{z,j}(\bar{y}_{j-1})$. Black and gray lines represent tamoxifen and placebo arms, respectively. Solid and dashed lines are for $p_{z,j}(\bar{y}_{j-1}) = 0.25$ and $p_{z,j}(\bar{y}_{j-1}) = 0.10$, respectively.

[Figure 2-3 plot: panels Placebo and Tamoxifen; x-axis: Time Points (months 3 to 36).]

Figure 2-3. Solid and dashed lines represent the empirical rate of $P[Y_j = 1, R_j = 1 \mid Z = z]$ and $P[R_j = 0 \mid Z = z]$, respectively. The posterior means of $P[Y_j = 1, R_j = 1 \mid Z = z]$ (diamond) and $P[R_j = 0 \mid Z = z]$ (triangle) and their 95% credible intervals are displayed at each time point.
[Figure 2-4 plot: panels A1: Placebo (Y), A2: Tamoxifen (Y), B1: Placebo (R), B2: Tamoxifen (R); x-axis: Pattern of Historical Response.]

Figure 2-4. Differences between posterior mean and empirical rate of $P[Y_j = 1 \mid R_j = 1, \bar{Y}_{j-1}, Z = z]$ (A1 and A2) and $P[R_j = 0 \mid R_{j-1} = 1, \bar{Y}_{j-1}, Z = z]$ (B1 and B2). The x-axis is ordered by follow-up time $C$ ($\max\{t : R_t = 1\}$). The bullets are the posterior mean of $P[Y_j = 1 \mid R_j = 1, \bar{Y}_{j-1}, Z = z]$ and $P[R_j = 0 \mid R_{j-1} = 1, \bar{Y}_{j-1}, Z = z]$ when there are no patients with historical response $\bar{Y}_{j-1}$.

[Figure 2-5 plot: x-axis: Depression Probability (0.10 to 0.15).]

Figure 2-5. Posterior distribution of $P[Y_7 = 1 \mid Z = z]$. Black and gray lines represent tamoxifen and placebo arms, respectively. Solid and dashed lines are for MNAR and MAR, respectively.

[Figure 2-6 plot: x-axis: Time Points.]

Figure 2-6. Posterior mean and 95% credible interval of difference of $P[Y_j = 1 \mid Z = z]$ between placebo and tamoxifen arms. The gray and white boxes are for MAR and MNAR, respectively.

CHAPTER 3
A BETA-BINOMIAL BAYESIAN SHRINKAGE MODEL FOR INTERMITTENT MISSINGNESS LONGITUDINAL BINARY DATA

3.1 Introduction

We proposed in Chapter 2 a Bayesian shrinkage approach for longitudinal binary data with informative dropout. The saturated observed data models were constructed sequentially via conditional distributions for response and for dropout time and parameterized on the logistic scale using all interaction terms. However, two issues were not addressed: the "ignored" intermittent missing data and the intrinsic computational challenge with the "interaction" parameterization. This Chapter proposes solutions to these two issues.

3.1.1 Intermittent Missing Data

In the BCPT, approximately 15% of the responses were intermittently missing, i.e., there were missing values prior to dropout.
One approach to handle intermittent missingness is to consider a "monotonized" dataset, whereby all CESD scores observed on an individual after their first missing score are deleted, as in Land et al. (2002); we did this in Chapter 2. However, this increases the "dropout" rate, throws away information and thus loses efficiency, and may introduce bias. Handling informative intermittent missing data is methodologically and computationally challenging and, as a result, the statistics literature is relatively limited. Most methods adopt a likelihood approach and rely on strong parametric assumptions (see, for example, Albert, 2000; Albert et al., 2002; Ibrahim et al., 2001; Lin et al., 2004; Troxel et al., 1998). Semiparametric methods have been proposed by Troxel et al. (1998) and Vansteelandt et al. (2007). Troxel et al. (1998) proposed a marginal model and introduced a pseudolikelihood estimation procedure. Vansteelandt et al. (2007) extended the ideas of Rotnitzky et al. (1998b), Scharfstein et al. (1999) and Rotnitzky et al. (2001) to nonmonotone missing data. Most related to our approach are the (partial ignorability) assumptions proposed in Harel and Schafer (2009) that partition the missing data and allow one (or more) of the partitions to be ignored given the other partition(s) and the observed data. In this Chapter, we apply a partial ignorability assumption such that the intermittent missing data mechanism can be ignored given dropout and treatment strata. 3.1.2 Computational Issues WinBUGS is a popular software package that allows convenient application of MCMC techniques. However, there are major drawbacks. For the shrinkage model proposed in Chapter 2, Section 2.5, WinBUGS has difficulty sampling from the posterior distribution of the parameters when sample size is relatively small (less than 3000 per arm). 
Tailored sampling algorithms can be written to overcome this difficulty; however, WinBUGS lacks the flexibility to incorporate modifications and/or extensions to its existing algorithms. In this Chapter, we provide an alternative parameterization of the saturated model for the observed data, as well as alternative shrinkage prior specifications, to improve computational efficiency. This alternative approach to posterior sampling can easily be programmed in R.

3.1.3 Outline

This Chapter is organized as follows. In Section 3.2, we describe the data structure, formalize identification assumptions and prove that the treatment-specific distribution of the full trajectory of longitudinal outcomes is identified under these assumptions. In Section 3.3, we introduce a saturated model for the distribution of the data that would be observed when there is dropout but no intermittent missingness. We then introduce shrinkage priors for the parameters in the saturated model to reduce the dimensionality of the parameter space. In Section 3.4, we assess, by simulation, the behavior of three classes of models: parametric, saturated, and shrinkage. Our analysis of the BCPT trial is presented in Section 3.5. Section 3.6 is devoted to a summary and discussion.

3.2 Notation, Assumptions and Identifiability

To address the intermittent missingness, we redefine the notation in Chapter 2, Section 2.2, and introduce some additional notation in this Section. The following notation is defined for a random individual. When necessary, we use the subscript $i$ to denote data for the $i$th individual. Let $Z$ denote the treatment assignment indicator, where $Z = 1$ denotes tamoxifen and $Z = 0$ denotes placebo. Let $Y$ be the complete response data vector with elements $Y_j$ denoting the binary outcome (i.e., depression) scheduled to be measured at the $j$th visit ($j = 0$ (baseline), $\ldots, J$) and let $\bar{Y}_j = (Y_0, \ldots, Y_j)$ denote the history of the outcome process through visit $j$.
Let $R$ be the vector of missing data indicators with the same dimension as $Y$, such that $R_j = 1$ indicates $Y_j$ is observed and $R_j = 0$ indicates $Y_j$ is missing. Let $S = \max\{t : R_t = 1\}$ be the last visit at which an individual's depression status is recorded. If $S < J$, then we say that the individual has dropped out and $S$ is referred to as the dropout time. Let $R_S = \{R_j : j < S\}$ be the collection of intermittent missing data indicators recorded prior to $S$.

We will find it useful to distinguish three sets of data for an individual: the complete data $C = (Z, S, R_S, Y)$, the full data $F = (Z, S, R_S, \bar{Y}_S)$, and the observed data $O = (Z, S, R_S, Y_{obs})$, where $Y_{obs}$ is the subset of $Y$ for which $R_j = 1$. It is useful to also define $Y_{mis} = (Y^{I}_{mis}, Y^{C}_{mis}, Y^{F}_{mis})$, where $Y^{I}_{mis} = \{Y_j : R_j = 0, j < S\}$ denotes the "intermittent" missing responses, $Y^{C}_{mis} = \{Y_j : j = S + 1, j \le J\}$ denotes the missing response at the time right after dropout, and $Y^{F}_{mis} = \{Y_j : S + 1 < j \le J\}$ denotes the "future" missing responses. Note that $\bar{Y}_S = (Y^{I}_{mis}, Y_{obs})$.

We assume that individuals are drawn as a simple random sample from a super-population so that we have an i.i.d. data structure for $C$, $F$ and $O$. We let the parameters $\theta_z$ index a model for the joint conditional distribution of $S$ and $\bar{Y}_S$ given $Z = z$ and the parameters $\phi_{s,z}$ index a model for the conditional distribution of $R_S$ given $S = s$, $\bar{Y}_S$ and $Z = z$. We assume that the parameters $\theta_z$ and $\phi_z = (\phi_{1,z}, \ldots, \phi_{J,z})$ are distinct.

Our goal is to draw inference about $\mu^*_{z,j} = P[Y_j = 1 \mid Z = z]$ for $j = 1, \ldots, J$ and $z = 0, 1$. To identify $\mu^*_{z,j}$ from the distribution of the observed data, we make the following three (untestable) assumptions:

Assumption 1: Given $Z$ and $S$, the intermittent missing data are missing at random, i.e., $R_S \perp\!\!\!\perp Y^{I}_{mis} \mid Z, S, Y_{obs}$.

Under this assumption, the parameters of the joint conditional distribution of $S$ and $\bar{Y}_S$ given $Z = z$ are estimable from the distribution of the observed data.
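To make the notation concrete, the following minimal sketch reads the dropout time $S$ and the intermittent missing visits off one subject's indicator vector $R$. The function name `summarize_missingness` and the returned keys are illustrative and not part of the source; the sketch assumes visits are indexed from 0 (baseline) and that baseline is observed. The last field gives the final retained visit under the "monotonized" treatment used in Chapter 2, where everything after the first missing visit is discarded.

```python
def summarize_missingness(r):
    """Summarize one subject's missing-data indicators R = (R_0, ..., R_J).

    r[j] == 1 means Y_j is observed and r[j] == 0 means Y_j is missing.
    Function name and return keys are illustrative assumptions; the
    baseline visit r[0] is assumed to be observed.
    """
    observed = [t for t, r_t in enumerate(r) if r_t == 1]
    if not observed:
        raise ValueError("at least one visit must be observed")
    last_visit = len(r) - 1                 # index J of the last scheduled visit
    s = max(observed)                       # S = max{t : R_t = 1}
    # Intermittent missingness: visits j < S with R_j = 0.
    intermittent = [j for j in range(s) if r[j] == 0]
    # Monotonized data keep only the visits before the first missing one.
    first_missing = next((j for j, r_j in enumerate(r) if r_j == 0), len(r))
    return {
        "S": s,
        "dropped_out": s < last_visit,
        "intermittent_missing": intermittent,
        "monotonized_S": first_missing - 1,
    }


example = summarize_missingness([1, 1, 0, 1, 1, 0, 0, 0])
print(example)
```

In this example the subject misses visit 2, returns, and is last seen at visit 4, so monotonizing would discard the two observed responses at visits 3 and 4.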
This assumption, plus the assumption that $\theta_z$ is a priori independent of $\phi_z$, implies that the intermittent missingness mechanism is ancillary or ignorable. Specifically, this means that when considering inferences about $\theta_z$ from a likelihood perspective, as we do here, the conditional distribution of $R_S$ given $Z$, $S$ and $Y_{obs}$ does not contribute to the likelihood and can be ignored (Harel and Schafer, 2009).

Assumptions 2 and 3 are the same as Assumptions 1 and 2 in Chapter 2, Section 2.3, respectively. We restate the two assumptions below using the "survival" time $S$ notation (instead of the missing indicators $R$ in Chapter 2).

Assumption 2 (Non-Future Dependence): For $j = 1, \ldots, J$,
$$P[S = j-1 \mid S \ge j-1, Y] = P[S = j-1 \mid S \ge j-1, \bar{Y}_j].$$

Assumption 3 (Pattern-Mixture Representation): For $j = 1, \ldots, J$ and $y_j = 0, 1$,
$$P[Y_j = y_j \mid S = j-1, \bar{Y}_{j-1}, Z = z] = \frac{P[Y_j = y_j \mid S \ge j, \bar{Y}_{j-1}, Z = z] \exp\{q_{z,j}(\bar{Y}_{j-1}, y_j)\}}{E[\exp\{q_{z,j}(\bar{Y}_{j-1}, Y_j)\} \mid S \ge j, \bar{Y}_{j-1}, Z = z]},$$
where $q_{z,j}(\bar{Y}_{j-1}, Y_j)$ is a specified function of its arguments.

Theorem 1: $P[Y_j = 1 \mid S \ge k-1, \bar{Y}_{k-1} = \bar{y}_{k-1}, Z = z]$ and $P[Y_j = 1 \mid S = k-1, \bar{Y}_{k-1} = \bar{y}_{k-1}, Z = z]$ are identified for $k = 1, \ldots, j$ under Assumptions 1-3.

Proof: Under Assumption 1, we know that the parameters of the conditional joint distribution of $S$ and $\bar{Y}_S$ given $Z = z$ are estimable from the distribution of the observed data. The rest of the proof is the same as in Chapter 2.

The identifiability result shows that, given the functions $q_{z,j}(\bar{Y}_{j-1}, Y_j)$, $\mu^*_{z,j}$ can be expressed as a functional of the distribution of the observed data.

3.3 Modeling, Prior Specification and Posterior Computation

3.3.1 Modeling

We reparameterize the saturated observed data model in Chapter 2 as follows:
$$
\begin{aligned}
P[Y_0 = 1 \mid Z = z] &= \alpha_{z,0} \\
P[Y_1 = 1 \mid S \ge 1, Y_0 = y, Z = z] &= \alpha_{z,1,y} \\
P[Y_j = 1 \mid S \ge j, Y_{j-1} = y, \bar{Y}_{j-2} = \bar{y}_{j-2}, Z = z] &= \alpha_{z,j,\bar{y}_{j-2},y} \\
P[S = 0 \mid Y_0 = y, Z = z] &= \gamma_{z,0,y} \\
P[S = j-1 \mid S \ge j-1, Y_{j-1} = y, \bar{Y}_{j-2} = \bar{y}_{j-2}, Z = z] &= \gamma_{z,j-1,\bar{y}_{j-2},y}
\end{aligned}
\tag{3-1}
$$
for $j = 2, \ldots, J$ and $y = 0, 1$. Let $\alpha_z$ denote the parameters indexing the first set of models for response and $\gamma_z$ denote the parameters indexing the second set of models for dropout.
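Assumption 3 is an exponential tilt of the on-study distribution, and for a binary outcome it reduces to a one-line formula. The sketch below is illustrative only: it assumes the linear form $q_{z,j}(\bar{Y}_{j-1}, y_j) = \tau y_j$, which is an assumption made here for the example (the source defers the parameterization of $q$ to Chapter 2, Section 2.5), and the function name `tilt_binary` is hypothetical.

```python
import math


def tilt_binary(p_on_study, tau):
    """P[Y_j = 1 | S = j-1, history] obtained by exponential tilting.

    p_on_study : P[Y_j = 1 | S >= j, history, Z = z], the on-study probability.
    tau        : sensitivity parameter in the ASSUMED form q(history, y) = tau * y.
    """
    if not 0.0 <= p_on_study <= 1.0:
        raise ValueError("p_on_study must be a probability")
    numerator = p_on_study * math.exp(tau)
    # E[exp{q}] under the on-study distribution of the binary outcome.
    normalizer = (1.0 - p_on_study) + p_on_study * math.exp(tau)
    return numerator / normalizer


p_mar = tilt_binary(0.12, 0.0)              # tau = 0 leaves the probability unchanged
p_mnar = tilt_binary(0.12, math.log(1.5))   # tau > 0 raises it for those who just dropped out
print(p_mar, p_mnar)
```

Under this assumed form, $\tau = 0$ gives the same distribution for subjects who have just dropped out and those still on study, which corresponds to the MAR analysis, and $e^{\tau}$ is the odds ratio of depression between the two groups.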
Recall that we defined $\theta_z$ to denote the parameters of the conditional distribution of $S$ and $\bar{Y}_S$ given $Z = z$; thus, $\theta_z = (\alpha_z, \gamma_z)$. This saturated model avoids the complex interaction term model parameterization. As a result, the (conditional) posterior distributions of $\theta_z$ will have simple forms and efficient posterior sampling is possible even when the sample size is moderate or small. We use the same parameterization of the functions $q_{z,j}(\bar{Y}_{j-1}, Y_j)$ as in Chapter 2, Section 2.5.

3.3.2 Shrinkage Prior

In Chapter 2, the strategy to avoid the curse of dimensionality was to apply shrinkage priors for higher order interactions to reduce the number of parameters (i.e., shrink them to zero). For the directly parameterized model (3-1), we use a different shrinkage strategy. In particular, we propose to use Beta priors for shrinkage as follows:
$$
\begin{aligned}
\alpha_{z,j,\bar{y}_{j-2},y} &\sim \text{Beta}\left( m^{(\alpha)}_{z,j,y} / \eta^{(\alpha)}_{z,j,y},\ (1 - m^{(\alpha)}_{z,j,y}) / \eta^{(\alpha)}_{z,j,y} \right) \\
\gamma_{z,j-1,\bar{y}_{j-2},y} &\sim \text{Beta}\left( m^{(\gamma)}_{z,j-1,y} / \eta^{(\gamma)}_{z,j-1,y},\ (1 - m^{(\gamma)}_{z,j-1,y}) / \eta^{(\gamma)}_{z,j-1,y} \right)
\end{aligned}
\tag{3-2}
$$
for $j = 2, \ldots, J$ and $y = 0, 1$. For $\alpha_{z,0}$, $\alpha_{z,1,y}$ and $\gamma_{z,0,y}$ for $y = 0, 1$, we assign Unif(0, 1) priors. Let $m^{(\alpha)}_z$ ($m^{(\gamma)}_z$) and $\eta^{(\alpha)}_z$ ($\eta^{(\gamma)}_z$) denote the parameters $m^{(\alpha)}_{z,j,y}$ ($m^{(\gamma)}_{z,j-1,y}$) and $\eta^{(\alpha)}_{z,j,y}$ ($\eta^{(\gamma)}_{z,j-1,y}$), respectively.

Note that for a random variable $X$ that follows a $\text{Beta}(m/\eta, (1-m)/\eta)$ distribution, we have
$$E[X] = m \quad \text{and} \quad \text{Var}[X] = m(1-m) \times \frac{\eta}{\eta + 1}.$$
For fixed $m$, $\text{Var}[X] \to 0$ as $\eta \to 0$, indicating shrinkage of the distribution of $X$ toward the mean. Thus, $\eta^{(\alpha)}_{z,j,y}$ and $\eta^{(\gamma)}_{z,j-1,y}$ serve as shrinkage parameters for $\alpha_{z,j,\bar{y}_{j-2},y}$ and $\gamma_{z,j-1,\bar{y}_{j-2},y}$, respectively. As the shrinkage parameters go to zero, the distributions of the probabilities $\alpha_{z,j,\bar{y}_{j-2},y}$ and $\gamma_{z,j-1,\bar{y}_{j-2},y}$ are shrunk toward the means of the probabilities that do not depend on $\bar{y}_{j-2}$, namely $m^{(\alpha)}_{z,j,y}$ and $m^{(\gamma)}_{z,j-1,y}$, respectively. In essence, the model is being shrunk toward a first-order Markov model. The shrinkage priors allow "neighboring cells" to borrow information from each other and provide more precise inferences.
Theorem 2: When there is no intermittent missingness, the proposed model yields consistent posterior means of the observed data probabilities, as long as all the true values of the observed data probabilities are in the open interval (0, 1).

Proof: See Appendix.

We specify independent Unif(0, 1) priors for $m^{(\alpha)}_{z,j,y}$ and $m^{(\gamma)}_{z,j-1,y}$. For the shrinkage parameters $\eta^{(\alpha)}_{z,j,y}$ and $\eta^{(\gamma)}_{z,j-1,y}$, we specify independent, uniform shrinkage priors (Daniels, 1999) as follows:
$$
\eta^{(\alpha)}_{z,j,y} \sim \frac{g(E^{(\alpha)}_{z,j,y})}{\left( g(E^{(\alpha)}_{z,j,y})\, \eta^{(\alpha)}_{z,j,y} + 1 \right)^2}
\quad \text{and} \quad
\eta^{(\gamma)}_{z,j-1,y} \sim \frac{g(E^{(\gamma)}_{z,j-1,y})}{\left( g(E^{(\gamma)}_{z,j-1,y})\, \eta^{(\gamma)}_{z,j-1,y} + 1 \right)^2},
\tag{3-3}
$$
where
* $g(\cdot)$ is a summary function (e.g., minimum, median or maximum, as suggested in Christiansen and Morris (1997)).
* $E^{(\alpha)}_{z,j,y} = \{e^{(\alpha)}_{z,j,\bar{y}_{j-2},y}$ : the expected number of subjects with $S \ge j$, $Y_{j-1} = y$, $\bar{Y}_{j-2} = \bar{y}_{j-2}$, $Z = z\}$.
* $E^{(\gamma)}_{z,j-1,y} = \{e^{(\gamma)}_{z,j-1,\bar{y}_{j-2},y}$ : the expected number of subjects with $S \ge j-1$, $Y_{j-1} = y$, $\bar{Y}_{j-2} = \bar{y}_{j-2}$, $Z = z\}$.

The expected number of subjects with $S \ge j$, $Y_{j-1} = y$, $\bar{Y}_{j-2} = \bar{y}_{j-2}$, $Z = z$ and with $S \ge j-1$, $Y_{j-1} = y$, $\bar{Y}_{j-2} = \bar{y}_{j-2}$, $Z = z$ can be computed as:
$$
\begin{aligned}
e^{(\alpha)}_{z,j,\bar{y}_{j-2},y} &= n_z \sum_{s=j}^{J} \sum_{y_j, y_{j+1}, \ldots, y_s} P[S = s, Y_s = y_s, \ldots, Y_j = y_j, Y_{j-1} = y, \bar{Y}_{j-2} = \bar{y}_{j-2} \mid Z = z] \\
e^{(\gamma)}_{z,j-1,\bar{y}_{j-2},y} &= n_z \sum_{s=j-1}^{J} \sum_{y_j, y_{j+1}, \ldots, y_s} P[S = s, Y_s = y_s, \ldots, Y_j = y_j, Y_{j-1} = y, \bar{Y}_{j-2} = \bar{y}_{j-2} \mid Z = z]
\end{aligned}
\tag{3-4}
$$
where the probabilities on the right hand side of the above equations are estimable under Assumption 1.

The expected sample sizes above are used in the prior instead of the observed binomial sample sizes, which are not completely determined due to the intermittent missingness. Thus, our formulation of these priors induces a small additional amount of data dependence beyond its standard dependence on the binomial sample sizes. This additional dependence affects the median of the prior but not its diffuseness.

3.3.3 Prior of Sensitivity Parameters

We use the same approach as in Chapter 2, Section 2.6.2 for constructing priors of $\tau_z$ given $\theta_z$.

3.3.4 Posterior Computation

Compared to Chapter 2, posterior computations for the observed data model are much easier and more efficient under the reparameterized model (3-1) and the Beta shrinkage priors.
The posterior sampling algorithms can be implemented in R with no sample size restrictions. The following steps are used to simulate draws from the posterior of $\mu^*_{z,j}$:

1. Sample $P(\theta_z, Y^{I}_{mis} \mid Y_{obs}, S, R_S, Z = z)$ using Gibbs sampling with data augmentation (see details in Appendix). Continue sampling until convergence.
2. For each draw of $\gamma_{z,j-1,\bar{y}_{j-2},y}$, draw $\tau_{z,j,\bar{y}_{j-1}}$ based on the conditional priors described in Section 2.6.2.
3. Compute $\mu^*_{z,j}$ by plugging the draws of $\alpha_{z,j,\bar{y}_{j-2},y}$, $\gamma_{z,j-1,\bar{y}_{j-2},y}$ and $\tau_{z,j,\bar{y}_{j-1}}$ into the identification algorithm discussed in Section 2.4.

3.4 Assessment of Model Performance via Simulation

For assessment of model performance, we use the same "true" parametric model as in Chapter 2, Section 2.7 to simulate observed data (no intermittent missingness). We again compared the performance of our shrinkage model with (1) a correct parametric model, (2) an incorrect parametric model (first-order Markov model) and (3) a saturated model (with diffuse priors). Our shrinkage model uses the shrinkage priors proposed in Section 3.3.2.

We considered small (500), moderate (2000), large (5000) and very large (1,000,000) sample sizes for each treatment arm; for each sample size, we simulated 500 datasets. We assessed model performance using mean squared error (MSE). In Table 3-2 (sample size 1,000,000 not shown), we report the MSEs of $P[Y_j = 1 \mid S \ge j, \bar{Y}_{j-1}, Z = z]$ and $P[S = j-1 \mid S \ge j-1, \bar{Y}_{j-1}, Z = z]$ averaged over all $j$ and all $\bar{Y}_{j-1}$ (see columns 3 and 4, respectively). We also report the MSEs for $\mu^*_{z,j}$ (see columns 6-12). For reference, the MSEs associated with the true data generating model are bolded. At all sample sizes, the shrinkage model has lower MSEs for the rates of depression at times 3-7 than the incorrectly specified parametric model and the saturated model. Our simulation results show that as sample size goes to infinity (e.g.
very large, 1,000,000), both the shrinkage model and the saturated model converge to the true values of $\mu^*_{z,j}$, whereas the incorrectly specified parametric model yields biased estimates. In addition, the MSEs for the parameters $\mu^*_{z,j}$ in the shrinkage model compare favorably with those of the true parametric model for all sample sizes considered, despite the fact that the shrinkage priors were specified to shrink toward an incorrect model.

3.5 Application: Breast Cancer Prevention Trial (BCPT)

Table 3-1 displays the treatment-specific dropout and intermittent missing rates in the BCPT. By the 7th study visit (36 months), more than 30% of patients had dropped out in each treatment arm, with a slightly higher percentage in the tamoxifen arm.

3.5.1 Model Fit

We fit the shrinkage model to the observed data using R, with multiple chains of 5000 iterations and 1000 burn-in. Convergence was checked by examining trace plots of the multiple chains.

We defined $g(\cdot)$ in the priors for the hyperparameters (Equation 3-3) to be the maximum function. To compute the expected numbers of subjects $e^{(\alpha)}_{z,j,\bar{y}_{j-2},y}$ and $e^{(\gamma)}_{z,j-1,\bar{y}_{j-2},y}$ in Equation (3-4), we assigned a point mass prior at 0.5 to all $m^{(\alpha)}_{z,j,y}$, $m^{(\gamma)}_{z,j-1,y}$, $\eta^{(\alpha)}_{z,j,y}$ and $\eta^{(\gamma)}_{z,j-1,y}$ (which corresponds to Unif(0, 1) priors on $\alpha_{z,j,\bar{y}_{j-2},y}$ and $\gamma_{z,j-1,\bar{y}_{j-2},y}$) and sampled $\alpha_{z,j,\bar{y}_{j-2},y}$ and $\gamma_{z,j-1,\bar{y}_{j-2},y}$ using Step 1 in the algorithm described in Section 3.3.4. To avoid data sparsity, we calculated $P[S = s, \bar{Y}_s = \bar{y}_s]$ using the posterior means of $\alpha_{z,j,\bar{y}_{j-2},y}$ and $\gamma_{z,j-1,\bar{y}_{j-2},y}$ rather than the empirical probabilities.

To assess model fit, we compared the empirical rates and posterior means (with 95% credible intervals) of $P[Y_j = 1, S \ge j \mid Z = z]$ and $P[S < j \mid Z = z]$. As shown in Figure 3-1, the shrinkage model fits the observed data well. Figure 3-2 illustrates the effect of shrinkage on the model fit by comparing the difference between the empirical rates and posterior means of $P[Y_j = 1 \mid S \ge j, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z]$ for the tamoxifen arm ($Z = 1$) and $j = 6, 7$.
We use the later time points to illustrate this since the observed data were more sparse and the shrinkage effect was more apparent. The empirical depression rates often reside on the boundary (0 or 1). In some cases, there are no observations within "cells", so the empirical rates were undefined. From the simulation results in Section 2.7, we know that the empirical estimates are less reliable for later time points. Via the shrinkage priors, the probabilities $P[Y_j = 1 \mid S \ge j, Y_{j-1} = y_{j-1}, \bar{Y}_{j-2} = \bar{y}_{j-2}, Z = z]$ with the same $y_{j-1}$ are shrunk together and away from the boundaries. By borrowing information across neighboring cells, we are able to estimate $P[Y_j = 1 \mid S \ge j, \bar{Y}_{j-1}, Z = z]$ for all $z$ and $\bar{Y}_{j-1}$ with better precision.

The differences between the empirical rates and the posterior means illustrate the magnitude of the shrinkage effect. In the BCPT, the depression rate was (relatively) low and there were few subjects at the later times that were observed with a history of mostly depression at the earlier visits; as a result, the differences were larger when $\bar{Y}_{j-1}$ had a lot of 1's (depression).

3.5.2 Inference

Figure 3-3 shows the posterior of $P[Y_7 = 1 \mid Z = z]$, the treatment-specific probability of depression at the end of the 36-month follow-up (solid lines). For comparison, the posterior under MAR (corresponding to point mass priors for $\tau$ at zero) is also presented (dashed lines). The observed depression rates (i.e., complete case analysis) were 0.124 and 0.112 for the placebo and tamoxifen arms, respectively. Under the MNAR analysis (using the elicited priors), the posterior means of the depression rates at month 36 were 0.133 (95% CI: 0.122, 0.144) and 0.125 (95% CI: 0.114, 0.136) for the placebo and tamoxifen arms; the difference was -0.007 (95% CI: -0.023, 0.008). Under MAR, the rates were 0.132 (95% CI: 0.121, 0.143) and 0.122 (95% CI: 0.111, 0.133) for the placebo and tamoxifen arms; the difference was -0.01 (95% CI: -0.025, 0.005).
The posterior probability of depression was higher under the MNAR analysis than under the MAR analysis because researchers believed depressed patients were more likely to drop out (see Table 2-2), a belief that was captured by the elicited priors. Figure 3-4 shows that there were no significant differences in the depression rates between the two treatments at any measurement time (95% credible intervals all cover zero) under either MNAR or MAR. Similar (nonsignificant) treatment differences were seen when examining treatment comparisons conditional on depression status at baseline.

3.5.3 Sensitivity of Inference to the Priors

To assess the sensitivity of inference on the 36-month depression rates to the elicited (informative) priors $\{r_{min}, r_{med}, r_{max}\}$, we considered several alternative scenarios based on Table 2-2. In the first scenario, we made the priors more or less informative by scaling the range, but leaving the median unchanged. That is, we considered increasing (or decreasing) the range by a scale factor $v$ to $\{r_{med} - v(r_{med} - r_{min}),\ r_{med},\ r_{med} + v(r_{max} - r_{med})\}$. In the second scenario, we shifted the prior by a constant $u$, to $\{u + r_{min},\ u + r_{med},\ u + r_{max}\}$. The posterior mean and between-treatment difference of the depression rate at month 36 with 95% CI are given in Tables 3-3 and 3-4. None of the scenarios considered resulted in a 95% CI for the difference in rates of depression at 36 months that excluded zero, except for the (extreme) scenario where the elicited tamoxifen intervals were shifted by -0.5 and the elicited placebo intervals were shifted by 0.5.

We also assessed the impact of switching the priors for the placebo and tamoxifen arms; in this case, the posterior means were 0.135 (95% CI: 0.124, 0.146) and 0.123 (95% CI: 0.112, 0.134) for the placebo and tamoxifen arms, respectively, while the difference was -0.012 (95% CI: -0.027, 0.004).
3.6 Summary and Discussion

In this Chapter, we extended the Bayesian shrinkage approach proposed in Chapter 2 to handle intermittent missingness. In addition, we reparameterized the saturated observed data model and dramatically improved the computational efficiency.

WinBUGS can still be applied for the reparameterized model when there is no intermittent missing data. However, with intermittent missingness, the augmentation step in the posterior computation requires extensive programming in WinBUGS. Nevertheless, the approach in Chapter 2 may still be preferred in certain cases, e.g., for directly shrinking the interaction terms.

As an extension, we might consider alternatives to the partial ignorability assumption (Assumption 1), which has been widely used but questioned by some (Robins, 1997).

3.7 Tables and Figures

Table 3-1. Missingness by Scheduled Measurement Time

                          Time Point j (Month)
                          1(3)   2(6)   3(12)  4(18)  5(24)  6(30)  7(36)
Tamoxifen (Total N = 5364, Overall Missing 34.94%)
  Intermittent Missing    330    224    190    200    203    195
  Dropout at j            160    122    259    280    332    352    369
  Cumulative Dropout      160    282    541    821    1153   1505   1874
Placebo (Total N = 5375, Overall Missing 31.83%)
  Intermittent Missing    347    215    153    181    199    197
  Dropout at j            157    106    247    287    309    272    333
  Cumulative Dropout      157    263    510    797    1106   1378   1711

Table 3-2. Simulation Results: MSE ($\times 10^3$). P and T represent placebo and tamoxifen arms, respectively.
                      Observed            $\mu^*_{z,j}$ (Month)
Model       Treat     Y        R          1(3)   2(6)   3(12)  4(18)  5(24)  6(30)  7(36)
Sample Size 500
True        P         6.209    2.474      0.199  0.225  0.258  0.313  0.319  0.352  0.390
            T         6.790    2.789      0.205  0.228  0.297  0.344  0.331  0.405  0.428
Parametric  P         33.351   1.511      0.199  0.227  0.26   0.317  0.323  0.349  0.388
            T         32.323   1.602      0.205  0.226  0.292  0.345  0.333  0.403  0.425
Shrinkage   P         29.478   2.310      0.202  0.226  0.252  0.303  0.312  0.337  0.372
            T         28.410   2.365      0.212  0.232  0.294  0.336  0.330  0.390  0.419
Saturated   P         57.107   111.263    0.202  0.228  0.302  0.490  1.083  2.401  4.427
            T         55.582   104.882    0.211  0.245  0.383  0.657  1.352  3.167  5.782
Sample Size 2000
True        P         1.474    0.586      0.052  0.058  0.063  0.078  0.081  0.086  0.097
            T         1.610    0.634      0.050  0.063  0.062  0.080  0.093  0.095  0.101
Parametric  P         30.507   0.543      0.051  0.055  0.064  0.081  0.090  0.091  0.108
            T         29.168   0.495      0.050  0.064  0.071  0.086  0.101  0.110  0.121
Shrinkage   P         23.545   0.647      0.053  0.056  0.063  0.078  0.082  0.084  0.095
            T         22.598   0.615      0.050  0.063  0.063  0.080  0.093  0.095  0.102
Saturated   P         40.322   77.627     0.053  0.057  0.069  0.100  0.188  0.457  0.946
            T         38.943   72.731     0.050  0.064  0.067  0.110  0.218  0.560  1.223
Sample Size 5000
True        P         0.594    0.234      0.020  0.024  0.026  0.033  0.031  0.040  0.036
            T         0.623    0.265      0.024  0.024  0.028  0.035  0.033  0.039  0.040
Parametric  P         29.983   0.379      0.020  0.025  0.029  0.037  0.043  0.049  0.055
            T         28.616   0.298      0.024  0.025  0.035  0.048  0.045  0.060  0.059
Shrinkage   P         18.83    0.394      0.020  0.024  0.026  0.033  0.031  0.039  0.036
            T         18.055   0.322      0.024  0.024  0.028  0.036  0.034  0.040  0.041
Saturated   P         30.071   54.454     0.020  0.024  0.027  0.038  0.052  0.13   0.270
            T         29.156   50.590     0.024  0.024  0.029  0.039  0.059  0.148  0.373

Table 3-3.
Sensitivity to the Elicited Prior

                                Scenario (T: Tamoxifen, P: Placebo)
                                vT = 5, vP = 5        vT = 0.2, vP = 0.2    uT = 0.5, uP = 0.5    uT = -0.5, uP = -0.5
Treatment   Percentile          10%     25%           10%     25%           10%     25%           10%     25%
Tamoxifen   Minimum             0.79    0.50          1.18    1.46          1.60    1.80          0.60    0.80
            Median              1.20    1.50          1.20    1.50          1.70    2.00          0.70    1.00
            Maximum             1.70    2.00          1.22    1.52          1.80    2.10          0.80    1.10
            P[Y7 = 1] (95% CI)  0.125 (0.114, 0.136)  0.125 (0.114, 0.136)  0.132 (0.120, 0.143)  0.117 (0.107, 0.128)
Placebo     Minimum             0.85    0.80          1.04    1.28          1.51    1.70          0.51    0.70
            Median              1.05    1.30          1.05    1.30          1.55    1.80          0.55    0.80
            Maximum             1.30    1.80          1.06    1.32          1.60    1.90          0.60    0.90
            P[Y7 = 1] (95% CI)  0.133 (0.122, 0.144)  0.133 (0.122, 0.144)  0.139 (0.128, 0.150)  0.125 (0.114, 0.135)
Difference of P[Y7 = 1] (95% CI)  -0.008 (-0.024, 0.008)  -0.007 (-0.023, 0.008)  -0.007 (-0.023, 0.009)  -0.008 (-0.023, 0.007)

Table 3-4. Sensitivity to the Elicited Prior

                                Scenario (T: Tamoxifen, P: Placebo)
                                vT = 5, vP = 0.2      vT = 0.2, vP = 5      uT = 0.5, uP = -0.5   uT = -0.5, uP = 0.5
Treatment   Percentile          10%     25%           10%     25%           10%     25%           10%     25%
Tamoxifen   Minimum             0.79    0.50          1.18    1.46          1.60    1.80          0.60    0.80
            Median              1.20    1.50          1.20    1.50          1.70    2.00          0.70    1.00
            Maximum             1.70    2.00          1.22    1.52          1.80    2.10          0.80    1.10
            P[Y7 = 1] (95% CI)  0.125 (0.114, 0.136)  0.125 (0.114, 0.136)  0.132 (0.121, 0.143)  0.117 (0.107, 0.128)
Placebo     Minimum             1.04    1.28          0.85    0.80          0.51    0.70          1.51    1.70
            Median              1.05    1.30          1.05    1.30          0.55    0.80          1.55    1.80
            Maximum             1.06    1.32          1.30    1.80          0.60    0.90          1.60    1.90
            P[Y7 = 1] (95% CI)  0.133 (0.122, 0.144)  0.133 (0.122, 0.144)  0.125 (0.114, 0.135)  0.139 (0.128, 0.150)
Difference of P[Y7 = 1] (95% CI)  -0.008 (-0.024, 0.008)  -0.008 (-0.023, 0.008)  0.007 (-0.008, 0.023)  -0.022 (-0.037, -0.006)

[Figure plot: panels Placebo and Tamoxifen; x-axis: Time Points (months 3 to 36).]

Solid and dashed lines represent the empirical rate of $P[Y_j = 1, S \ge j \mid Z = z]$ and $P[S < j \mid Z = z]$, respectively. The posterior means of $P[Y_j = 1, S \ge j \mid Z = z]$ (diamond) and $P[S < j \mid Z = z]$ (triangle) and their 95% credible intervals are displayed at each time point.
Figure 3-1.

[Figure 3-2 plot: panel (A) Conditional Depression Rate, y-axis: Posterior Mean of Depression Rate, legend: Empirical; Model-based, Empirical Undefined; Model-based. Panel (B) Shrinkage Difference. Shading legend: Depressed, Not Depressed. x-axis: Historical Response Pattern.]

Figure 3-2. (A) The empirical rate and model-based posterior mean of $P[Y_j = 1 \mid S \ge j, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z]$ for $Z = 1$ and $j = 6, 7$. (B) The difference between the empirical and model-based posterior mean of the depression rate. The x-axis is the pattern of historical response data $\bar{Y}_{j-1}$.

[Figure 3-3 plot: x-axis: Depression Probability (0.10 to 0.15).]

Figure 3-3. Posterior distribution of $P[Y_7 = 1 \mid Z = z]$. Black and gray lines represent tamoxifen and placebo arms, respectively. Solid and dashed lines are for MNAR and MAR, respectively.

[Figure 3-4 plot: x-axis: Time Points.]

Figure 3-4. Posterior mean and 95% credible interval of difference of $P[Y_j = 1 \mid Z = z]$ between placebo and tamoxifen arms. The gray and white boxes are for MAR and MNAR, respectively.

3.8 Appendix

Gibbs sampler for posterior computation: In the first step of the Gibbs sampler, we draw, for each subject with intermittent missing data, from the full conditional of $Y^{I}_{mis}$ given $\alpha_z$, $\gamma_z$, $m^{(\alpha)}_z$, $\eta^{(\alpha)}_z$, $m^{(\gamma)}_z$, $\eta^{(\gamma)}_z$, $Y_{obs}$, $S$, $R_S$ and $Z = z$.
The full conditional distribution can be expressed as P[Yis = Ymisa, z, mz ), m(), rY Yobs = Yobs, S = s, Rs = rs, Z = z] P[Ymis = Ymis, Yobs = Yobs, S = s5z, 7z, m?, 7 m) ), Z = z] Eal y1 P[Y is= Ymis, Yobs = Yobs, S = s a, y, rz 7 ), r 7 ), Z = z] where the right hand side can be expressed as a function of yis, Yobs, S and az and 7z. In the second step, we draw from the full conditional of m ) given {Yis} a, z, 77 ) m( 77 {Yobs}, {S}, {Rs} and {Z} = z, where the notation {} denotes data 9 for all the individuals on the study. The full conditional can be expressed as J 1 ] m f {Y) msl' (y, {Yobs}, {S}, {Z} = z) j=2 y0 where f(Z) Y Yis}, az {Yobs}, {S}, {Z} = z) oc B(Z(aY.2; mz jy/ zJ'(1 )/ zy) Si Yi I=Y Y Si.1 ij1Y i.Zi=z and B(a; c, d) is a Beta density with parameters c and d. In the third step, we draw from the full conditional of mr) given {Ymi}, a~, Tz, '7), miz), 77, {Yobs}, {S}, {Rs} and {Z} = z. The full conditional can be expressed as J 1 f f f (m ( {Ymis 1}, Y, lz( {Yobs}, {S}, {Z} = z) j=2 y= where f(m {Ys 7z(y, {Yobs}, {S}, {Z} = z) ,j oc n B(is z,Y 1,yi.Y, z 1, ,y ,y S,1i1, Yj 1y i.Z,z In the fourth step, we draw from the full conditional of r) given ma), {Yis} az, "z, m 7, {Yobs}, {S}, {Rs} and {Z} = z. The full conditional can be expressed as J 1 7 7f(Iz {'Ymis}, a, mz, {Yobs}, {S}, {Z} = z) j=2 y=0 where f(() {Y ms} m z {Yobs}, {S}, {Z} = z) g (E y (zjy ( g,(E z ja Z 'j /.Z ,= z In the fifth step, we draw from the full conditional of j1) given m), {Ymis}, az, 7 m, 7 {Yobs}, {S}, {Rs} and {Z} = z. The full conditional can be expressed as J 1 I f(i j I y is' 7z, m' {Yobs}, {S}, {Z} = z) j=2 y=0 where g (E(7)l,y, f {jy mis 7z, 'i ,, z {Yobs}, {S}, {Z} = z) oc (E 2,) 7 ((z ,Jly) izl ,y + 1)2 x(n B(zy1Y, y (z) (1 z 1Y) z)A Sj1,Y j1=, S i 1,YI,j 1 Y I.Z,=z To draw from the full conditionals for steps two to five, we use slice sampling (Neal, 2003). 
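The slice sampling update used for steps two to five can be sketched generically. The block below is a minimal univariate slice sampler with the stepping-out and shrinkage procedures of Neal (2003); it is not the dissertation's implementation. The function names, the tuning defaults (`width`, `max_steps`) and the Beta(3, 5) target are illustrative assumptions standing in for the actual full conditionals.

```python
import math
import random


def slice_sample(log_density, x0, n_samples, width=1.0, max_steps=20, rng=None):
    """Univariate slice sampler (stepping out + shrinkage, Neal 2003).

    log_density: unnormalised log density, returning -inf outside the support.
    x0: starting value inside the support.
    """
    rng = rng or random.Random()
    x = x0
    draws = []
    for _ in range(n_samples):
        # Auxiliary level defining the slice {x : log f(x) > log_u}.
        log_u = log_density(x) + math.log(1.0 - rng.random())
        # Stepping out: randomly position an interval of size `width` around x,
        # then expand each end until it leaves the slice or the step budget ends.
        left = x - width * rng.random()
        right = left + width
        steps_left = int(max_steps * rng.random())
        steps_right = max_steps - 1 - steps_left
        while steps_left > 0 and log_density(left) > log_u:
            left -= width
            steps_left -= 1
        while steps_right > 0 and log_density(right) > log_u:
            right += width
            steps_right -= 1
        # Shrinkage: sample uniformly from the interval, shrinking it toward x
        # after every rejected proposal.
        while True:
            proposal = left + (right - left) * rng.random()
            if log_density(proposal) >= log_u:
                x = proposal
                break
            if proposal < x:
                left = proposal
            else:
                right = proposal
        draws.append(x)
    return draws


def log_beta_3_5(p):
    """Hypothetical stand-in target: unnormalised log density of Beta(3, 5)."""
    if p <= 0.0 or p >= 1.0:
        return -math.inf
    return 2.0 * math.log(p) + 4.0 * math.log(1.0 - p)


if __name__ == "__main__":
    sample = slice_sample(log_beta_3_5, 0.5, 2000, rng=random.Random(1))
    print(sum(sample) / len(sample))  # should be near the Beta(3, 5) mean 3/8
```

Only an unnormalised log density is required, which is why this kind of update is convenient for full conditionals that are known only up to a constant.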
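In the sixth step, we draw from the full conditional of $\alpha_z$ given $\{Y_{mis}\}$, $\gamma_z$, $m_z^{(\alpha)}$, $\eta_z^{(\alpha)}$, $m_z^{(\gamma)}$, $\eta_z^{(\gamma)}$, $\{Y_{obs}\}$, $\{S\}$, $\{R_s\}$ and $\{Z\} = z$. The full conditional can be expressed as

$$\prod_{j=2}^{J} \prod_{y=0}^{1} \prod_{\text{all } \bar y_{j-2}} f\left(\alpha_{z,j,\bar y_{j-2},y} \mid \{Y_{mis}\}, m^{(\alpha)}_{z,j,y}, \eta^{(\alpha)}_{z,j,y}, \{Y_{obs}\}, \{S\}, \{Z\} = z\right)$$

where

$$f\left(\alpha_{z,j,\bar y_{j-2},y} \mid \{Y_{mis}\}, m^{(\alpha)}_{z,j,y}, \eta^{(\alpha)}_{z,j,y}, \{Y_{obs}\}, \{S\}, \{Z\} = z\right) = B\left(\alpha_{z,j,\bar y_{j-2},y};\ \frac{m^{(\alpha)}_{z,j,y}}{\eta^{(\alpha)}_{z,j,y}} + o^{(\alpha)}_{z,j,\bar y_{j-2},y},\ \frac{1 - m^{(\alpha)}_{z,j,y}}{\eta^{(\alpha)}_{z,j,y}} + n^{(\alpha)}_{z,j,\bar y_{j-2},y} - o^{(\alpha)}_{z,j,\bar y_{j-2},y}\right),$$

$n^{(\alpha)}_{z,j,\bar y_{j-2},y}$ is the number of subjects with $S \ge j$, $Y_{j-1} = y$, $\bar Y_{j-2} = \bar y_{j-2}$ and $Z = z$, and $o^{(\alpha)}_{z,j,\bar y_{j-2},y}$ is the number of subjects with $S \ge j$, $Y_{j-1} = y$, $\bar Y_{j-2} = \bar y_{j-2}$, $Z = z$ and $Y_j = 1$.

Finally, we draw from the full conditional of $\gamma_z$ given $\{Y_{mis}\}$, $\alpha_z$, $m_z^{(\alpha)}$, $\eta_z^{(\alpha)}$, $m_z^{(\gamma)}$, $\eta_z^{(\gamma)}$, $\{Y_{obs}\}$, $\{S\}$, $\{R_s\}$ and $\{Z\} = z$. The full conditional can be expressed as

$$\prod_{j=2}^{J} \prod_{y=0}^{1} \prod_{\text{all } \bar y_{j-2}} f\left(\gamma_{z,j-1,\bar y_{j-2},y} \mid \{Y_{mis}\}, m^{(\gamma)}_{z,j-1,y}, \eta^{(\gamma)}_{z,j-1,y}, \{Y_{obs}\}, \{S\}, \{Z\} = z\right)$$

where

$$f\left(\gamma_{z,j-1,\bar y_{j-2},y} \mid \{Y_{mis}\}, m^{(\gamma)}_{z,j-1,y}, \eta^{(\gamma)}_{z,j-1,y}, \{Y_{obs}\}, \{S\}, \{Z\} = z\right) = B\left(\gamma_{z,j-1,\bar y_{j-2},y};\ \frac{m^{(\gamma)}_{z,j-1,y}}{\eta^{(\gamma)}_{z,j-1,y}} + o^{(\gamma)}_{z,j-1,\bar y_{j-2},y},\ \frac{1 - m^{(\gamma)}_{z,j-1,y}}{\eta^{(\gamma)}_{z,j-1,y}} + n^{(\gamma)}_{z,j-1,\bar y_{j-2},y} - o^{(\gamma)}_{z,j-1,\bar y_{j-2},y}\right),$$

$n^{(\gamma)}_{z,j-1,\bar y_{j-2},y}$ is the number of subjects with $S \ge j-1$, $Y_{j-1} = y$, $\bar Y_{j-2} = \bar y_{j-2}$ and $Z = z$, and $o^{(\gamma)}_{z,j-1,\bar y_{j-2},y}$ is the number of subjects with $S = j-1$, $Y_{j-1} = y$, $\bar Y_{j-2} = \bar y_{j-2}$ and $Z = z$.

Proof of Theorem 2: We will show that, with no intermittent missingness and the shrinkage priors (Equation 3-2), the posterior distributions of $P[Y_j = 1 \mid S \ge j, Y_{j-1}, \bar Y_{j-2}, Z]$ and $P[S = j-1 \mid S \ge j-1, Y_{j-1}, \bar Y_{j-2}, Z]$, modeled as $\alpha_{z,j,\bar y_{j-2},y}$ and $\gamma_{z,j-1,\bar y_{j-2},y}$ (re: Equation 3-1), for all $Z$, $j$ and $\bar Y_{j-1}$ are consistent under the condition that all the true values of the probabilities are in the open interval (0, 1).

We use $n^{(\alpha)}_{z,j,\bar y_{j-1}}$ to denote the number of subjects with $S \ge j$, $Y_{j-1} = y_{j-1}$, $\bar Y_{j-2} = \bar y_{j-2}$, $Z = z$, and use $n^{(\gamma)}_{z,j-1,\bar y_{j-1}}$ to denote the number of subjects with $S \ge j-1$, $Y_{j-1} = y_{j-1}$, $\bar Y_{j-2} = \bar y_{j-2}$, $Z = z$. The condition that all the true values of $\alpha_{z,j,\bar y_{j-2},y}$ and $\gamma_{z,j-1,\bar y_{j-2},y}$ are in the open interval (0, 1) holds if and only if, as the number of subjects goes to infinity, all the $n^{(\alpha)}_{z,j,\bar y_{j-1}}$ and $n^{(\gamma)}_{z,j-1,\bar y_{j-1}}$ go to infinity.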
This indicates that to prove Theorem 2, we can just prove that given $Y = \{Y_1, \dots, Y_k\}$ and $N = \{n_1, \dots, n_k\}$, where $Y_j \sim \mathrm{Bin}(n_j, p_j)$, $p_j \sim \mathrm{Beta}(\alpha, \beta)$ for $j = 1, \dots, k$ and $(\alpha, \beta)$ has proper prior density $\pi(\alpha, \beta)$, the posterior distributions for all $p_j$ are consistent as all $n_j$ go to infinity, with regard to the distributions under $Y_j \sim \mathrm{Bin}(n_j, p_j)$.

To see this, note that

$$\pi(p_j \mid Y, N) \propto p_j^{Y_j} (1 - p_j)^{n_j - Y_j} \int \pi(p_j \mid \alpha, \beta)\, \pi(\alpha, \beta)\, d\alpha\, d\beta.$$

Note that

$$\pi(p_j) = \int \pi(p_j \mid \alpha, \beta)\, \pi(\alpha, \beta)\, d\alpha\, d\beta = \int \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p_j^{\alpha - 1} (1 - p_j)^{\beta - 1}\, \pi(\alpha, \beta)\, d\alpha\, d\beta = M < \infty,$$

so $\pi(p_j)$ is $O(1)$. As $n_j$ and $Y_j$ go to infinity (this occurs since $p_j \in (0, 1)$), by the Bernstein-von Mises theorem, we have

$$\sqrt{n_j}\,(p_j - \hat p_j) \mid Y, n_j \to N\left(0,\ p_j (1 - p_j)\right)$$

in distribution, where $\hat p_j = Y_j / n_j$, which further implies that

$$E[p_j \mid Y, N] \to p_j \ \text{a.s.}, \qquad \mathrm{Var}[p_j \mid Y, N] \to 0 \ \text{a.s.}$$

CHAPTER 4
A NOTE ON MAR, IDENTIFYING RESTRICTIONS, AND SENSITIVITY ANALYSIS IN PATTERN MIXTURE MODELS

4.1 Introduction

For analyzing longitudinal studies with informative missingness, popular modeling frameworks include pattern mixture models, selection models and shared parameter models, which differ in the way the joint distribution of the outcome and missing data process is factorized (for a comprehensive review, see Daniels and Hogan, 2008; Hogan and Laird, 1997b; Kenward and Molenberghs, 1999; Little, 1995; Molenberghs and Kenward, 2007). In this paper, we will concern ourselves with pattern mixture models with monotone missingness (i.e., dropout). For pattern mixture models with non-monotone (i.e., intermittent) missingness (details go beyond the scope of this paper), one approach is to partition the missing data and allow one (or more) of the partitions to be ignored given the other partition(s) (Harel and Schafer, 2009; Wang et al., 2010).

It is well known that pattern mixture models are not identified: the observed data do not provide enough information to identify the distributions for incomplete patterns.
The use of identifying restrictions that equate the inestimable parameters to functions of estimable parameters is an approach to resolve the problem (Daniels and Hogan, 2008; Kenward et al., 2003; Little, 1995; Little and Wang, 1996; Thijs et al., 2002). Common identifying restrictions include complete case missing value (CCMV) constraints and available case missing value (ACMV) constraints. Molenberghs et al. (1998) proved that for discrete time points and monotone missingness, the ACMV constraint is equivalent to missing at random (MAR), as defined by Rubin (1976) and Little and Rubin (1987). A key and attractive feature of identifying restrictions is that they do not affect the fit of the model to the observed data. Understanding (identifying) restrictions that lead to MAR is an important first step for sensitivity analysis under missing not at random (MNAR) (Daniels and Hogan, 2008; Scharfstein et al., 2003; Zhang and Heitjan, 2006). In particular, MAR provides a good starting point for sensitivity analysis, and sensitivity analyses are essential for the analysis of incomplete data (Daniels and Hogan, 2008; Scharfstein et al., 1999; Vansteelandt et al., 2006b).

The normality of response data (if appropriate) for pattern mixture models is desirable as it easily allows incorporation of baseline covariates and introduction of sensitivity parameters (for MNAR analysis) that have convenient interpretations as deviations of means and variances from MAR (Daniels and Hogan, 2008). However, multivariate normality within patterns can be overly restrictive. We explore such issues in this paper.

One criticism of mixture models is that they often induce missing data mechanisms that depend on the future (Kenward et al., 2003). We explore non-future dependence in our context here and show that mixture models with non-future dependent missing data mechanisms have fewer sensitivity parameters.
In Section 4.2, we show conditions under which MAR exists and does not exist when the full-data response is assumed multivariate normal within each missing pattern. In Sections 4.3 and 4.4, in the same setting, we explore sensitivity analysis strategies under MNAR and under non-future dependent MNAR, respectively. In Section 4.5, we propose a sensitivity analysis approach where only the observed data within pattern are assumed multivariate normal. In Section 4.6, we apply the frameworks described in previous sections to a randomized clinical trial for estimating the effectiveness of recombinant growth hormone for increasing muscle strength in the elderly. In Section 4.7, we show that in the presence of baseline covariates with time-invariant coefficients, standard identifying restrictions cause overidentification of the baseline covariate effects, and we propose a remedy. We provide conclusions in Section 4.8.

4.2 Existence of MAR under Multivariate Normality within Pattern

Let $Y$ be a $J$-dimensional longitudinal response vector with components scheduled to be measured at time points $t_j$ ($j \in \{1, \dots, J\}$); this is the full-data response. Without loss of generality, we assume $Y_1$ is always observed. Let $S = s$ denote the number of observed responses ($s = 1, 2, \dots, J$) corresponding to the follow-up time $t_s$. Let $\bar Y_j$ denote the historical response vector $(Y_1, Y_2, \dots, Y_j)$. Finally, we define $p_s(\cdot) = p(\cdot \mid S = s)$.

We show that MAR does not necessarily exist when it is assumed that

$$Y \mid S = s \sim N(\mu^{(s)}, \Sigma^{(s)}) \qquad (4\text{-}1)$$

for all $s$. To see this, we introduce some further notation. Let

$$\mu^{(s)}(j) = E(\bar Y_j \mid S = s) = \begin{pmatrix} \mu_1^{(s)}(j) \\ \mu_2^{(s)}(j) \end{pmatrix}$$

and

$$\Sigma^{(s)}(j) = \mathrm{Var}(\bar Y_j \mid S = s) = \begin{pmatrix} \Sigma_{11}^{(s)}(j) & \Sigma_{12}^{(s)}(j) \\ \Sigma_{21}^{(s)}(j) & \Sigma_{22}^{(s)}(j) \end{pmatrix}$$

where $\mu_1^{(s)}(j) = E(\bar Y_{j-1} \mid S = s)$, $\mu_2^{(s)}(j) = E(Y_j \mid S = s)$, $\Sigma_{11}^{(s)}(j) = \mathrm{Var}(\bar Y_{j-1} \mid S = s)$, $\Sigma_{22}^{(s)}(j) = \mathrm{Var}(Y_j \mid S = s)$, $\Sigma_{21}^{(s)}(j) = \mathrm{Cov}(Y_j, \bar Y_{j-1} \mid S = s)$ and $\Sigma_{12}^{(s)}(j)$ is the transpose of $\Sigma_{21}^{(s)}(j)$.

Lemma 4.1.
For monotone dropout, under the model given in (4-1), define

$$\kappa_1^{(s)}(j) = \Sigma_{21}^{(s)}(j) \left(\Sigma_{11}^{(s)}(j)\right)^{-1}, \quad \kappa_2^{(s)}(j) = \mu_2^{(s)}(j) - \kappa_1^{(s)}(j)\, \mu_1^{(s)}(j), \quad \kappa_3^{(s)}(j) = \Sigma_{22}^{(s)}(j) - \Sigma_{21}^{(s)}(j) \left(\Sigma_{11}^{(s)}(j)\right)^{-1} \Sigma_{12}^{(s)}(j).$$

The condition that for a given $j$ the conditional distributions $p_s(y_j \mid \bar y_{j-1})$ are identical for all $s$ is equivalent to $\kappa_1^{(s)}(j)$, $\kappa_2^{(s)}(j)$ and $\kappa_3^{(s)}(j)$ being constant in $s$.

Proof. The proof is trivial since

$$Y_j \mid \bar Y_{j-1}, S = s \sim N\left(\kappa_2^{(s)}(j) + \kappa_1^{(s)}(j)\, \bar Y_{j-1},\ \kappa_3^{(s)}(j)\right). \qquad \square$$

In other words, if the condition in Lemma 4.1 is satisfied, then there exists a conditional distribution $p_{\ge s}(y_j \mid \bar y_{j-1})$ such that $p_s(y_j \mid \bar y_{j-1}) = p_{\ge s}(y_j \mid \bar y_{j-1})$ for all $s \ge j$. We now state a theorem that gives the restrictions on the model given in (4-1) for MAR to exist.

Theorem 4.2. For pattern mixture models with monotone dropout, under the model given in (4-1), identification via MAR constraints exists if and only if $\mu^{(s)}$ and $\Sigma^{(s)}$ satisfy Lemma 4.1 for $s \ge j$ and $1 < j \le J$.

Proof. By Lemma 4.1, we only need to show that MAR constraints exist if and only if for all $1 < j \le J$, the conditional distributions $p_s(y_j \mid \bar y_{j-1})$ are identical for $s \ge j$. Molenberghs et al. (1998) proved that MAR holds if and only if

$$p_k(y_j \mid \bar y_{j-1}) = p_{\ge j}(y_j \mid \bar y_{j-1}) = \sum_{s=j}^{J} \frac{P(S = s)}{\sum_{s=j}^{J} P(S = s)}\, p_s(y_j \mid \bar y_{j-1}) \qquad (4\text{-}2)$$

for all $j \ge 2$ and $k < j$. These conditionals are normal distributions since we assume $Y \mid S$ is multivariate normal. Suppose that there exists $j$ such that $p_s(y_j \mid \bar y_{j-1})$ is not the same for all $s \ge j$. Then from (4-2), $p_{\ge j}(y_j \mid \bar y_{j-1})$ will be a mixture of normals whereas $p_k(y_j \mid \bar y_{j-1})$ will be a normal distribution. Thus, the condition of Molenberghs et al. (1998) will not be satisfied, i.e. the MAR constraints do not exist. On the other hand, if for all $1 < j \le J$, the conditional distributions $p_s(y_j \mid \bar y_{j-1})$ are identical for $s \ge j$, then $p_k(y_j \mid \bar y_{j-1})$ and $p_{\ge j}(y_j \mid \bar y_{j-1})$ are both normally distributed and the identification restrictions $p_k(y_j \mid \bar y_{j-1}) = p_{\ge j}(y_j \mid \bar y_{j-1})$ will result in MAR. $\square$

So, a default approach for continuous $Y$, assuming the full-data response is multivariate normal within pattern, does not allow an MAR restriction (unless the restrictions in Theorem 4.2 are imposed). We now examine the corresponding missing data mechanism (MDM), $S \mid Y$.
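The condition in Lemma 4.1 and Theorem 4.2 can be checked numerically for given pattern-specific means and covariances. The sketch below computes the regression coefficients, intercept and residual variance of $Y_j$ given $(Y_1, \dots, Y_{j-1})$ within each pattern and compares them across patterns. The helper names and the numerical values are hypothetical and chosen only for illustration; a small pure-Python linear solver is used so the block has no dependencies.

```python
def solve_linear(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    aug = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def kappas(mu, sigma, j):
    """Slopes, intercept and residual variance of Y_j | Y_1..Y_{j-1} (j is 1-based, j >= 2)."""
    h = j - 1                                   # number of conditioning components
    sigma11 = [row[:h] for row in sigma[:h]]
    sigma21 = sigma[j - 1][:h]
    # Sigma11 is symmetric, so solving Sigma11 k = Sigma21' gives Sigma21 Sigma11^{-1}.
    kappa1 = solve_linear(sigma11, sigma21)
    kappa2 = mu[j - 1] - sum(k * m for k, m in zip(kappa1, mu[:h]))
    kappa3 = sigma[j - 1][j - 1] - sum(k * c for k, c in zip(kappa1, sigma21))
    return kappa1, kappa2, kappa3


def same_conditional(patterns, j, tol=1e-9):
    """True if Y_j | Y_1..Y_{j-1} is the same normal distribution in every pattern."""
    flat = []
    for mu, sigma in patterns:
        k1, k2, k3 = kappas(mu, sigma, j)
        flat.append(k1 + [k2, k3])
    return all(abs(a - b) <= tol for row in flat[1:] for a, b in zip(flat[0], row))


# Hypothetical (Y1, Y2) moments for patterns S = 2 and S = 3: the marginals of Y1
# differ, but both imply Y2 | Y1 ~ N(1.5 + 0.5 * Y1, 1).
pattern_2 = ([0.0, 1.5], [[4.0, 2.0], [2.0, 2.0]])
pattern_3 = ([1.0, 2.0], [[2.0, 1.0], [1.0, 1.5]])
# Same means as pattern_3 but a different covariance, hence a different slope.
pattern_3_alt = ([1.0, 2.0], [[2.0, 1.6], [1.6, 1.5]])

if __name__ == "__main__":
    print(same_conditional([pattern_2, pattern_3], j=2))
    print(same_conditional([pattern_2, pattern_3_alt], j=2))
```

In the first pair an MAR restriction on $Y_2$ is compatible with within-pattern normality; in the second pair the mixture over patterns would no longer be normal.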
We use "$\simeq$" to denote equality in distribution.

Corollary 4.3. For pattern mixture models of the form (4-1) with monotone dropout, MAR holds if and only if $S \mid Y \simeq S \mid Y_1$.

Proof. Since $Y_1$ is always observed (by assumption), $S \mid Y \simeq S \mid Y_1$ implies that $S \mid Y_{mis}, Y_{obs} \simeq S \mid Y_{obs}$, where $Y_{mis}$ and $Y_{obs}$ denote the missing and observed data respectively. This shows that MAR holds.

On the other hand, MAR implies that

$$p(S = s \mid Y) = p(S = s \mid Y_{obs}) = p(S = s \mid \bar Y_s).$$

By Theorem 4.2, we have that MAR holds only if for all $1 < j \le J$, the conditional distributions $p_s(y_j \mid \bar y_{j-1})$ are identical for $s \ge j$. Thus, under MAR

$$p_k(y_j \mid \bar y_{j-1}) = p_{\ge j}(y_j \mid \bar y_{j-1}) = p_s(y_j \mid \bar y_{j-1})$$

for all $j \ge 2$, $k < j$ and $s \ge j$. This implies that for all $j \ge 2$

$$p(y_j \mid \bar y_{j-1}) = \sum_{s=1}^{J} p_s(y_j \mid \bar y_{j-1})\, p(S = s) = p_s(y_j \mid \bar y_{j-1})$$

for all $s$. Therefore,

$$p(S = s \mid Y) = p(S = s \mid \bar y_s) = p(S = s)\, \frac{p_s(\bar y_s)}{p(\bar y_s)} = p(S = s)\, \frac{p_s(y_s \mid \bar y_{s-1}) \cdots p_s(y_2 \mid y_1)\, p_s(y_1)}{p(y_s \mid \bar y_{s-1}) \cdots p(y_2 \mid y_1)\, p(y_1)} = p(S = s)\, \frac{p_s(y_1)}{p(y_1)} = p(S = s \mid y_1). \qquad \square$$

Thus, the implicit MDM is very restrictive and does not depend on the entire history, $\bar Y_s$. We now show connections to missing completely at random (MCAR) and other common identifying restrictions.

Corollary 4.4. For pattern mixture models of the form (4-1) with monotone dropout, MCAR is equivalent to MAR if $p_s(y_1) = p(y_1)$ for all $s$.

Proof. First, MCAR implies MAR. Second, in the proof of Corollary 4.3, we showed that MAR holds if

$$p(S = s \mid Y) = p(S = s)\, \frac{p_s(y_1)}{p(y_1)}.$$

Thus under the assumption that $p_s(y_1) = p(y_1)$, MAR implies that $p(S = s \mid Y) = p(S = s)$, i.e. MCAR. $\square$

Corollary 4.5. For pattern mixture models of the form (4-1) with monotone dropout, MAR constraints are identical to complete case missing value (CCMV) constraints and nearest neighbor (NCMV) constraints.

Proof.
By Theorem 4.2, the MAR constraints imply

$$p_j(y_j \mid \bar y_{j-1}) = p_J(y_j \mid \bar y_{j-1}) = p_{\ge j}(y_j \mid \bar y_{j-1}).$$

Therefore for all $k < j$, the MAR constraints

$$p_k(y_j \mid \bar y_{j-1}) = p_{\ge j}(y_j \mid \bar y_{j-1})$$

are identical to CCMV restrictions

$$p_k(y_j \mid \bar y_{j-1}) = p_J(y_j \mid \bar y_{j-1})$$

and to NCMV restrictions

$$p_k(y_j \mid \bar y_{j-1}) = p_j(y_j \mid \bar y_{j-1}). \qquad \square$$

The results in this section were all based on specifying the mixture model in (4-1) and demonstrate that MAR only exists under the fairly strict conditions given in Theorem 4.2.

4.3 Sequential Model Specification and Sensitivity Analysis under MAR

Due to the structure of $\mu^{(s)}$ and $\Sigma^{(s)}$ under MAR constraints as outlined in Section 4.2, we propose to follow the approach in Daniels and Hogan (2008, Chapter 8) and specify distributions of observed $Y$ within pattern as:

$$p_s(y_1) \sim N(\mu_1^{(s)}, \sigma_1^{(s)}), \quad 1 \le s \le J$$
$$p_s(y_j \mid \bar y_{j-1}) \sim N(\mu_{j|j^-}^{(\ge j)}, \sigma_{j|j^-}^{(\ge j)}), \quad 2 \le j \le s \le J \qquad (4\text{-}3)$$

where $j^- = \{1, 2, \dots, j-1\}$. Note that by construction, we assume $p_s(y_j \mid \bar y_{j-1})$ are identical for all $j \le s \le J$. Consequently, we have $p_s(y_j \mid \bar y_{j-1}) = p(y_j \mid \bar y_{j-1}, S \ge s)$, denoted as $p_{\ge s}(y_j \mid \bar y_{j-1})$.

Corollary 4.6. For pattern mixture models of the form (4-1) with monotone dropout, identification via MAR constraints exists if and only if the observed data can be modeled as (4-3).

Proof. Theorem 4.2 shows that identification via MAR constraints exists if and only if the conditional distributions $p_s(y_j \mid \bar y_{j-1})$ are identical for $s \ge j$ and $j \ge 2$. That is, for observed data, we have

$$p_s(y_j \mid \bar y_{j-1}) \sim N(\mu_{j|j^-}^{(\ge j)}, \sigma_{j|j^-}^{(\ge j)}). \qquad \square$$

Corollary 4.6 implies that under the multivariate normality assumption in (4-1) and the MAR assumption, a sequential specification as in (4-3) always exists. We provide some details for MAR in model (4-1) (which implies the specification in (4-3) as stated in Corollary 4.6) next.
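Distributions for missing data (which are not identified) are specified as:

$$p_s(y_j \mid \bar y_{j-1}) \sim N(\mu_{j|j^-}^{(s)}, \sigma_{j|j^-}^{(s)}), \quad 1 \le s < j \le J.$$

The conditional mean structure of $\mu_{j|j^-}^{(\ge j)}$ and $\mu_{j|j^-}^{(s)}$ is parameterized as follows:

$$\mu_{j|j^-}^{(\ge j)} = \beta_0^{(\ge j)} + \sum_{l=1}^{j-1} \beta_l^{(\ge j)} y_l, \qquad \mu_{j|j^-}^{(s)} = \beta_0^{(s)} + \sum_{l=1}^{j-1} \beta_l^{(s)} y_l.$$

To identify the full-data model, the MAR constraints require that

$$p_k(y_j \mid \bar y_{j-1}) = p_{\ge j}(y_j \mid \bar y_{j-1})$$

for $k < j$, which implies that

$$\mu_{j|j^-}^{(s)} = \mu_{j|j^-}^{(\ge j)} \quad \text{and} \quad \sigma_{j|j^-}^{(s)} = \sigma_{j|j^-}^{(\ge j)}$$

for $2 \le j \le J$ and $s < j$. Since the equality of the conditional means needs to hold for all $Y$, this further implies that the MAR assumption requires that $\beta_l^{(s)} = \beta_l^{(\ge j)}$ for $0 \le l < j$.

The motivation of the proposed sequential model is to allow a straightforward extension of the MAR specification to a large class of MNAR models indexed by parameters measuring departures from MAR, as well as the attraction of doing sensitivity analysis on means and/or variances in normal models. For example, we can let

$$\beta_l^{(s)} = \Delta_l^{(j)} + \beta_l^{(\ge j)} \quad \text{and} \quad \log \sigma_{j|j^-}^{(s)} = \Delta_\sigma^{(j)} + \log \sigma_{j|j^-}^{(\ge j)}$$

for all $j > 1$ and $0 \le l < j$. Sensitivity analysis can be done on these $\Delta$ parameters that capture the information about the missing data mechanism. For example, in a Bayesian framework, we may assign informative priors elicited from experts to these sensitivity parameters $\Delta$. Note that in general we may have separate $\Delta_l^{(j)}$ and $\Delta_\sigma^{(j)}$ for each pattern $s$ ($s < j$), but in practice it is necessary to limit the dimensionality of these (Daniels and Hogan, 2008). Indeed, we could make $\Delta_l^{(j)}$ and $\Delta_\sigma^{(j)}$ independent of $j$ to further reduce the number of sensitivity parameters.

To see the impact of the $\Delta$ parameters on the MDM, we introduce notation

$$\Delta_{j|j^-}^{(j)} = \Delta_0^{(j)} + \sum_{l=1}^{j-1} \Delta_l^{(j)} Y_l$$

and then for $k < j$

$$Y_j \mid \bar Y_{j-1}, S = k \sim N\left(\mu_{j|j^-}^{(\ge j)} + \Delta_{j|j^-}^{(j)},\ e^{\Delta_\sigma^{(j)}} \sigma_{j|j^-}^{(\ge j)}\right).$$

The conditional probability (hazard) of observing the first $s$ observations given at least $s$ observations is then given by:

$$\log \frac{P(S = s \mid Y)}{P(S \ge s \mid Y)} = \log P(S = s) - \frac{1}{2} \log \sigma_1^{(s)} - \frac{(Y_1 - \mu_1^{(s)})^2}{2 \sigma_1^{(s)}} - \sum_{l=s+1}^{J} \left\{ \frac{\Delta_\sigma^{(l)}}{2} + \frac{\left(Y_l - \mu_{l|l^-}^{(\ge l)} - \Delta_{l|l^-}^{(l)}\right)^2}{2 e^{\Delta_\sigma^{(l)}} \sigma_{l|l^-}^{(\ge l)}} \right\}$$
$$\quad - \log \sum_{k=s}^{J} \left\{ P(S = k) \left(\sigma_1^{(k)}\right)^{-1/2} \exp\left[-\frac{(Y_1 - \mu_1^{(k)})^2}{2 \sigma_1^{(k)}}\right] \prod_{l=s+1}^{k} \exp\left[-\frac{\left(Y_l - \mu_{l|l^-}^{(\ge l)}\right)^2}{2 \sigma_{l|l^-}^{(\ge l)}}\right] \prod_{l=k+1}^{J} \left(e^{\Delta_\sigma^{(l)}}\right)^{-1/2} \exp\left[-\frac{\left(Y_l - \mu_{l|l^-}^{(\ge l)} - \Delta_{l|l^-}^{(l)}\right)^2}{2 e^{\Delta_\sigma^{(l)}} \sigma_{l|l^-}^{(\ge l)}}\right] \right\}.$$

In general the MDM depends on $\bar Y_J$, i.e. MNAR.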
However, one might want the hazard at time $t_s$ to depend only on $\bar Y_{s+1}$, in which case we need to have different distributions and assumptions on $[Y_j \mid \bar Y_{j-1}, S = k]$ for $k < j - 1$ and for $k = j - 1$.

4.4 Non-Future Dependence and Sensitivity Analysis under Multivariate Normality within Pattern

Non-future dependence assumes that missingness only depends on observed data and the current missing value, i.e.

$$[S = s \mid Y] \simeq [S = s \mid \bar Y_{s+1}],$$

and can be viewed as a special case of MNAR and an extension of MAR (Kenward et al., 2003). Kenward et al. (2003) showed that non-future dependence holds if and only if for each $j \ge 3$ and $k < j - 1$,

$$p_k(y_j \mid \bar y_{j-1}) = p_{\ge j-1}(y_j \mid \bar y_{j-1}).$$

An approach to implement non-future dependence within the framework of Section 4.3 is as follows. We model the observed data as in (4-3). For the conditional distribution of the current missing data ($Y_{s+1}$), we assume that

$$p_s(y_{s+1} \mid \bar y_s) \sim N\left(\beta_0^{(\ge s+1)} + \Delta_0^{(s+1)} + \sum_{l=1}^{s} \left(\beta_l^{(\ge s+1)} + \Delta_l^{(s+1)}\right) y_l,\ e^{\Delta_\sigma^{(s+1)}} \sigma_{s+1|(s+1)^-}^{(\ge s+1)}\right), \quad 1 \le s < J$$

and for the conditional distribution of the future missing data ($Y_{s+2}, \dots, Y_J$), we assume that

$$p_s(y_j \mid \bar y_{j-1}) = p_{\ge j-1}(y_j \mid \bar y_{j-1}), \quad 1 \le s < j - 1 \le J - 1,$$

where

$$p_{\ge j-1}(y_j \mid \bar y_{j-1}) = \frac{p(S = j-1)}{p(S \ge j-1)}\, p_{j-1}(y_j \mid \bar y_{j-1}) + \frac{p(S \ge j)}{p(S \ge j-1)}\, p_{\ge j}(y_j \mid \bar y_{j-1}).$$

Note that by this approach, although the model for future missing data is a mixture of normals, the sensitivity parameters are kept the same as in Section 4.3 ($\Delta_l^{(j)}$ and $\Delta_\sigma^{(j)}$, $j = 2, \dots, J$ and $l = 0, \dots, j-1$). In addition, this significantly reduces the number of potential sensitivity parameters. For $J$-dimensional longitudinal data, the total number of sensitivity parameters, $(2J^3 + 3J^2 + J)/6 - J$, is reduced to $(J^2 + 3J - 4)/2$; for $J = 3$ (6), from 11 (85) to 7 (25). Further reduction is typically needed. See the data example in Section 4.6 as an illustration.

If all the remaining sensitivity parameters are set to zero, then we have

$$p_s(y_{s+1} \mid \bar y_s) = p_{\ge s+1}(y_{s+1} \mid \bar y_s), \quad 1 \le s < J$$

and

$$p_s(y_j \mid \bar y_{j-1}) = p_{\ge j}(y_j \mid \bar y_{j-1}), \quad 1 \le s < j - 1 \le J - 1,$$

which implies $p_s(y_j \mid \bar y_{j-1}) = p_{\ge j}(y_j \mid \bar y_{j-1})$ for all $s < j$, i.e. MAR.
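The parameter counts quoted in Section 4.4 can be checked with a few lines of arithmetic. The sketch below evaluates the two closed forms and compares them with a direct enumeration. The enumeration is one reading of the text rather than something the text states: each missing conditional $[Y_j \mid \bar Y_{j-1}, S = s]$ is taken to carry $j$ regression shifts plus one variance shift, with one such set per pattern $s < j$ in the unrestricted case and one per $j$ under non-future dependence. Function names are illustrative.

```python
def count_unrestricted(J):
    """Closed form from the text: (2J^3 + 3J^2 + J)/6 - J."""
    return (2 * J**3 + 3 * J**2 + J) // 6 - J


def count_non_future(J):
    """Closed form from the text: (J^2 + 3J - 4)/2."""
    return (J**2 + 3 * J - 4) // 2


def enumerate_unrestricted(J):
    # Assumed reading: for each j, (j - 1) dropout patterns s < j, each with
    # j regression shifts (intercept plus j - 1 slopes) and one variance shift.
    return sum((j - 1) * (j + 1) for j in range(2, J + 1))


def enumerate_non_future(J):
    # Assumed reading: one set of j + 1 shifts per time point j = 2, ..., J.
    return sum(j + 1 for j in range(2, J + 1))


if __name__ == "__main__":
    for J in (3, 6):
        print(J, count_unrestricted(J), count_non_future(J))
```

Under this reading the enumeration agrees with both closed forms, including the values 11 and 85 (unrestricted) and 7 and 25 (non-future dependence) quoted for $J = 3$ and $J = 6$.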
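4.5 MAR and Sensitivity Analysis with Multivariate Normality on the Observed-Data Response

If we assume multivariate normality only on the observed-data response, $Y_{obs} \mid S$, instead of the full-data response, $Y \mid S$, we can weaken the restrictions on $p_s(y_j \mid \bar y_{j-1})$ for $s \ge j$ and allow the MDM to incorporate all observed data under MAR (cf. Corollary 4.3). For example, we may specify distributions $Y_{obs} \mid S$ as follows:

$$p_s(y_1) \sim N(\mu_1^{(s)}, \sigma_1^{(s)}), \quad 1 \le s \le J$$
$$p_s(y_j \mid \bar y_{j-1}) \sim N(\mu_{j|j^-}^{(s)}, \sigma_{j|j^-}^{(s)}), \quad 2 \le j \le s \le J$$

where

$$\mu_{j|j^-}^{(s)} = \beta_{j,0}^{(s)} + \sum_{l=1}^{j-1} \beta_{j,l}^{(s)} y_l.$$

To identify the full-data model, recall the MAR constraints imply that

$$p_s(y_j \mid \bar y_{j-1}) = p_{\ge j}(y_j \mid \bar y_{j-1}) = \sum_{k=j}^{J} \frac{P(S = k)}{P(S \ge j)}\, p_k(y_j \mid \bar y_{j-1}) \qquad (4\text{-}4)$$

for $s < j$, which are mixtures of normals. For sensitivity analysis in this setting of mixtures of normals, we propose to introduce sensitivity parameters $\Delta_\mu$ (location) and $\Delta_\sigma$ (scale) such that for $s < j$

$$p_s(y_j \mid \bar y_{j-1}) = e^{-\Delta_\sigma^{(j)}} \sum_{k=j}^{J} \omega_{j,k}\, p_k\left(\frac{y_j - \Delta_\mu^{(j)} - (1 - e^{\Delta_\sigma^{(j)}})\, \mu_{j|j^-}^{(k)}}{e^{\Delta_\sigma^{(j)}}} \,\Big|\, \bar y_{j-1}\right) \qquad (4\text{-}5)$$

where $\omega_{j,k} = P(S = k) / P(S \ge j)$. The rationale for this parameterization is that each $p_k(\cdot \mid \bar y_{j-1})$ in the summation will have mean $\Delta_\mu^{(j)} + \mu_{j|j^-}^{(k)}$ and variance $e^{2\Delta_\sigma^{(j)}} \sigma_{j|j^-}^{(k)}$. To reduce the dimension of the sensitivity parameters, we could make $\Delta_\mu^{(j)}$ and $\Delta_\sigma^{(j)}$ common for all $j$ (namely $\Delta_\mu$ and $\Delta_\sigma$).

In this setup, we have

$$\mu_{j|j^-}^{(s),MNAR} = \Delta_\mu^{(j)} + \sum_{k=j}^{J} \omega_{j,k}\, \mu_{j|j^-}^{(k)}$$

and

$$\sigma_{j|j^-}^{(s),MNAR} = e^{2\Delta_\sigma^{(j)}} \sum_{k=j}^{J} \omega_{j,k}\, \sigma_{j|j^-}^{(k)} + V_j, \quad \text{where} \quad V_j = \sum_{k=j}^{J} \omega_{j,k} \left(\mu_{j|j^-}^{(k)}\right)^2 - \left(\sum_{k=j}^{J} \omega_{j,k}\, \mu_{j|j^-}^{(k)}\right)^2.$$

Note that $V_j$ does not depend on $\sigma_{j|j^-}^{(k)}$ for $k = j, \dots, J$. Under an MAR assumption (4-4), for $[Y_j \mid \bar Y_{j-1}, S = s]$, we have

$$\mu_{j|j^-}^{(s),MAR} = \sum_{k=j}^{J} \omega_{j,k}\, \mu_{j|j^-}^{(k)}$$

and

$$\sigma_{j|j^-}^{(s),MAR} = \sum_{k=j}^{J} \omega_{j,k} \left(\sigma_{j|j^-}^{(k)} + \left(\mu_{j|j^-}^{(k)}\right)^2\right) - \left(\mu_{j|j^-}^{(s),MAR}\right)^2.$$

Therefore, under the MNAR assumption (4-5), the two sensitivity parameters control the departure of the mean and variance from MAR in the following way,

$$\mu_{j|j^-}^{(s),MNAR} = \Delta_\mu^{(j)} + \mu_{j|j^-}^{(s),MAR} \quad \text{and} \quad \sigma_{j|j^-}^{(s),MNAR} = e^{2\Delta_\sigma^{(j)}} \sigma_{j|j^-}^{(s),MAR} + \left(1 - e^{2\Delta_\sigma^{(j)}}\right) V_j,$$

with $\Delta_\mu^{(j)}$ being a location parameter and $\Delta_\sigma^{(j)}$ being a scale parameter. The MNAR class allows MAR when $\Delta_\mu^{(j)} = \Delta_\sigma^{(j)} = 0$ for all $j \ge 2$.

By assuming non-future dependence, we obtain, for $s < j - 1$,

$$p_s(y_j \mid \bar y_{j-1}) = p_{\ge j-1}(y_j \mid \bar y_{j-1}) = \frac{p(S = j-1)}{p(S \ge j-1)}\, e^{-\Delta_\sigma^{(j)}} \sum_{k=j}^{J} \omega_{j,k}\, p_k\left(\frac{y_j - \Delta_\mu^{(j)} - (1 - e^{\Delta_\sigma^{(j)}})\, \mu_{j|j^-}^{(k)}}{e^{\Delta_\sigma^{(j)}}} \,\Big|\, \bar y_{j-1}\right) + \frac{p(S \ge j)}{p(S \ge j-1)} \sum_{k=j}^{J} \omega_{j,k}\, p_k(y_j \mid \bar y_{j-1})$$

for the future data and (4-5) for the current data ($j = s + 1$).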
The number of sensitivity parameters in this setup is reduced by $(J - 2)(J - 1)$, from $J(J - 1)$ to $2(J - 1)$; so, for $J = 3$ (6), from 6 (30) to 4 (10). Further reductions are illustrated in the next section.

4.6 Example: Growth Hormone Study

We analyze a longitudinal clinical trial using the frameworks from Sections 4.4 and 4.5, which assume multivariate normality for the full-data response within pattern (MVN) or multivariate normality for the observed-data response within pattern (OMVN). We assume non-future dependence for the missing data mechanism to minimize the number of sensitivity parameters.

The growth hormone (GH) trial was a randomized clinical trial conducted to estimate the effectiveness of recombinant human growth hormone therapy for increasing muscle strength in the elderly. The trial had four treatment arms: placebo (P), growth hormone only (G), exercise plus placebo (EP), and exercise plus growth hormone (EG). Muscle strength, here mean quadriceps strength (QS), measured as the maximum foot-pounds of torque that can be exerted against resistance provided by a mechanical device, was measured at baseline, 6 months and 12 months. There were 161 participants enrolled in this study, but only (roughly) 75% of them completed the 12-month follow-up. Researchers believed that dropout was related to the unobserved strength measures at the dropout times.

For illustration, we confine our attention to the two arms using exercise: exercise plus placebo (EP) and exercise plus growth hormone (EG). Table 4-1 contains the observed data. Let $(Y_1, Y_2, Y_3)$ denote the full-data response corresponding to baseline, 6 months, and 12 months. Let $Z$ be the treatment indicator (1 = EG, 0 = EP). Our goal is to draw inference about the mean difference of QS between the two treatment arms at month 12. That is, the treatment effect

$$\theta = E(Y_3 \mid Z = 1) - E(Y_3 \mid Z = 0).$$
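In a pattern mixture model, each marginal mean in $\theta$ is a weighted average of pattern-specific means, $E(Y_3 \mid Z) = \sum_s P(S = s \mid Z)\, E(Y_3 \mid S = s, Z)$, and only the completers' term is observed. The sketch below shows that arithmetic. The pattern counts and the completers' month-12 means are taken from Table 4-1, while the month-12 means for the two dropout patterns are hypothetical placeholders, since those values are unobserved and depend on the identifying restrictions. The function name is illustrative.

```python
def pattern_mixture_mean(counts, pattern_means):
    """Marginal mean as a mixture over dropout patterns, weighted by pattern frequency."""
    total = sum(counts)
    return sum(n / total * m for n, m in zip(counts, pattern_means))


# Pattern sizes (S = 1, 2, 3) from Table 4-1.
eg_counts = [12, 4, 22]
ep_counts = [7, 2, 31]

# Month-12 means by pattern: the last entry (completers) is from Table 4-1,
# the first two are HYPOTHETICAL values for the unobserved dropout patterns.
eg_means = [70.0, 75.0, 88.0]
ep_means = [60.0, 65.0, 73.0]

eg_mean = pattern_mixture_mean(eg_counts, eg_means)
ep_mean = pattern_mixture_mean(ep_counts, ep_means)
theta = eg_mean - ep_mean  # treatment effect at month 12 under these placeholder values

if __name__ == "__main__":
    print(eg_mean, ep_mean, theta)
```

Changing the placeholder dropout means changes `theta` while leaving the observed-data fit untouched, which is the sense in which the treatment effect depends on untestable assumptions about the missing data.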
In the full-data model for each treatment under non-future dependence, there are seven sensitivity parameters for the MVN model: $\{\Delta_0^{(2)}, \Delta_1^{(2)}, \Delta_0^{(3)}, \Delta_1^{(3)}, \Delta_2^{(3)}, \Delta_\sigma^{(2)}, \Delta_\sigma^{(3)}\}$ and four sensitivity parameters for the OMVN model: $\{\Delta_\mu^{(2)}, \Delta_\mu^{(3)}, \Delta_\sigma^{(2)}, \Delta_\sigma^{(3)}\}$ (see Appendix). For the MNAR analysis, we reduced the number of sensitivity parameters as follows:

* $\Delta_\sigma^{(2)}$ and $\Delta_\sigma^{(3)}$ do not appear in the posterior distribution of $E(Y_3 \mid Z)$ for $Z = 0, 1$, and thus are not necessary for inference on $\theta$.
* We restrict to MNAR departures from MAR in terms of the intercept terms by assuming $\Delta_1^{(2)} = \Delta_1^{(3)} = \Delta_2^{(3)} \equiv 0$.
* We assume the sensitivity parameters are identical between treatments.

This reduces the set of sensitivity parameters to $\{\Delta_0^{(2)}, \Delta_0^{(3)}\}$ for the MVN model and $\{\Delta_\mu^{(2)}, \Delta_\mu^{(3)}\}$ for the OMVN model.

There are a variety of ways to specify priors for the sensitivity parameters $\Delta_0^{(2)}$ and $\Delta_0^{(3)}$. Note that

$$\Delta_0^{(2)} = E(Y_2 \mid Y_1, S = 1) - E(Y_2 \mid Y_1, S \ge 2)$$
$$\Delta_0^{(3)} = E(Y_3 \mid Y_2, Y_1, S = 2) - E(Y_3 \mid Y_2, Y_1, S = 3).$$

Both represent the difference of conditional means between the unobserved and observed responses. $\Delta_\mu^{(2)}$ and $\Delta_\mu^{(3)}$ have (roughly) the same interpretation as $\Delta_0^{(2)}$ and $\Delta_0^{(3)}$ respectively.

Based on discussion with investigators, we made the assumption that dropouts do worse than completers; thus, we restrict the $\Delta$'s to be less than zero. To do a fully Bayesian analysis that fairly characterizes the uncertainty associated with the missing data mechanism, we assume a uniform prior for the $\Delta$'s as a default choice. Subject matter considerations gave an upper bound of zero for the uniform distributions. We set the lower bound using the variability of the observed data as follows. We estimate the residual variances of $Y_2 \mid Y_1$ and $Y_3 \mid Y_2, Y_1$ using the observed data; we denote these by $\hat\tau_{2|1}$ and $\hat\tau_{3|2,1}$ respectively. We use the negative square roots of these estimates as the lower bounds.
In particular, we specify the priors for $\{\Delta_0^{(2)}, \Delta_0^{(3)}\}$ as well as $\{\Delta_\mu^{(2)}, \Delta_\mu^{(3)}\}$ as $\mathrm{Unif}(D(\hat\tau))$, where

$$D(\hat\tau) = \left[-\hat\tau_{2|1}^{1/2},\ 0\right] \times \left[-\hat\tau_{3|2,1}^{1/2},\ 0\right].$$

Based on the estimates $\hat\tau_{2|1}^{1/2} = 18$ and $\hat\tau_{3|2,1}^{1/2} = 12$, the priors are $[-18, 0] \times [-12, 0]$ for $\{\Delta_0^{(2)}, \Delta_0^{(3)}\}$ and for $\{\Delta_\mu^{(2)}, \Delta_\mu^{(3)}\}$. For the other parameters in the full-data model, we assign $N(0, 10^6)$ for mean parameters ($\mu$, $\beta$) and $N(0, 10^4)$ for variance parameters ($\sigma$); see the Appendix for further details on the models.

We fit the model using WinBUGS, with multiple chains of 25,000 iterations and 4,000 burn-in. Convergence was checked by examining trace plots of the multiple chains.

The results of the MVN model, OMVN model, and the observed data analysis are presented in Table 4-2. Under MNAR, the posterior mean (posterior standard deviation) of the difference in quadriceps strength at 12 months between the two treatment arms was 4.0 (8.9) and 4.4 (10) for the MVN and OMVN models, respectively. Under MAR the differences were 5.4 (8.8) and 5.8 (9.9) for the MVN and OMVN models, respectively. The smaller differences under MNAR arise because quadriceps strength at 12 months is lower under the assumption that dropouts do worse than completers. We conclude that the treatment difference, $\theta$, was not significantly different from zero.

4.7 ACMV Restrictions and Multivariate Normality with Baseline Covariates

In this section, we show that common identifying restrictions overidentify estimable parameters in the presence of baseline covariates with time-invariant coefficients and offer a solution.

4.7.1 Bivariate Case

Consider the situation when $Y = (Y_1, Y_2)$ is a bivariate normal response ($J = 2$) with missing data only in $Y_2$, i.e. $S = 1$ or 2. Assume there are baseline covariates $X$ with time-invariant coefficients $\alpha$. We model $p(S)$ and $p(Y \mid S)$ as follows:

$$S \mid X \sim \mathrm{Bern}(\phi(X))$$
$$Y \mid S = s \sim N(\mu^{(s)}(X), \Sigma^{(s)}), \quad s = 1, 2$$

where

$$\mu^{(s)}(X) = \begin{pmatrix} \mu_1^{(s)} + X\alpha^{(s)} \\ \mu_2^{(s)} + X\alpha^{(s)} \end{pmatrix} \quad \text{and} \quad \Sigma^{(s)} = \begin{pmatrix} \sigma_{11}^{(s)} & \sigma_{12}^{(s)} \\ \sigma_{21}^{(s)} & \sigma_{22}^{(s)} \end{pmatrix}.$$

MAR (ACMV) implies the following restriction

$$[Y_2 \mid Y_1, S = 1] \simeq [Y_2 \mid Y_1, S = 2].$$
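This implies that the conditional means, $E(Y_2 \mid Y_1, X, S = s)$ for $s = 1, 2$, are equal, i.e.

$$\mu_2^{(1)} + X\alpha^{(1)} + \frac{\sigma_{21}^{(1)}}{\sigma_{11}^{(1)}}\left(Y_1 - \mu_1^{(1)} - X\alpha^{(1)}\right) = \mu_2^{(2)} + X\alpha^{(2)} + \frac{\sigma_{21}^{(2)}}{\sigma_{11}^{(2)}}\left(Y_1 - \mu_1^{(2)} - X\alpha^{(2)}\right). \qquad (4\text{-}6)$$

For (4-6) to hold for all $Y_1$ and $X$, we need that $\alpha^{(1)} = \alpha^{(2)}$. However, both $\alpha^{(1)}$ and $\alpha^{(2)}$ are already identified by the observed data $Y_1$. Thus the ACMV (MAR) restriction affects the model fit to the observed data. This is against the principle of applying identifying restrictions (Little and Wang, 1996; Wang and Daniels, 2009).

To resolve the overidentification issue we propose to apply MAR constraints on residuals instead of directly on the responses. In the bivariate case, the corresponding restriction is

$$[Y_2 - X\alpha^{(1)} \mid Y_1 - X\alpha^{(1)}, X, S = 1] \simeq [Y_2 - X\alpha^{(2)} \mid Y_1 - X\alpha^{(2)}, X, S = 2]. \qquad (4\text{-}7)$$

Since the conditional distributions

$$[Y_2 - X\alpha^{(s)} \mid Y_1 - X\alpha^{(s)}, X, S = s] \sim N\left(\mu_2^{(s)} + \frac{\sigma_{21}^{(s)}}{\sigma_{11}^{(s)}}\left(Y_1 - X\alpha^{(s)} - \mu_1^{(s)}\right),\ \sigma_{22}^{(s)} - \frac{(\sigma_{21}^{(s)})^2}{\sigma_{11}^{(s)}}\right),$$

viewed as functions of the residuals, are independent of $\alpha^{(s)}$ for $s = 1, 2$, the restriction (4-7) places no constraints on $\alpha^{(s)}$, thus avoiding overidentification.

The MDM corresponding to the ACMV (MAR) on the residuals is given by

$$\log \frac{P(S = 1 \mid Y, X)}{P(S = 2 \mid Y, X)} = \log \frac{P(S = 1 \mid X)}{P(S = 2 \mid X)} + \frac{1}{2\sigma^*}\left\{(1 - B)^2 X\left(\alpha^{(2)}\alpha^{(2)T} - \alpha^{(1)}\alpha^{(1)T}\right)X^T - 2(1 - B)\left(Y_2 - \Delta(Y_1)\right) X\left(\alpha^{(2)} - \alpha^{(1)}\right)\right\}$$
$$\quad - \frac{1}{2}\log \frac{\sigma_{11}^{(1)}}{\sigma_{11}^{(2)}} - \frac{\left(Y_1 - X\alpha^{(1)} - \mu_1^{(1)}\right)^2}{2\sigma_{11}^{(1)}} + \frac{\left(Y_1 - X\alpha^{(2)} - \mu_1^{(2)}\right)^2}{2\sigma_{11}^{(2)}}$$

where

$$\sigma^* = \sigma_{22}^{(2)} - \frac{(\sigma_{21}^{(2)})^2}{\sigma_{11}^{(2)}}, \quad B = \frac{\sigma_{21}^{(2)}}{\sigma_{11}^{(2)}} \quad \text{and} \quad \Delta(Y_1) = \mu_2^{(2)} + \frac{\sigma_{21}^{(2)}}{\sigma_{11}^{(2)}}\left(Y_1 - \mu_1^{(2)}\right).$$

Hence by assuming MAR on the residuals, we have the MDM being a quadratic form of $Y_1$, but independent of $Y_2$ if and only if $\alpha^{(2)} = \alpha^{(1)}$. In other words, assumption (4-7) implies MAR if and only if $\alpha^{(2)} = \alpha^{(1)}$. So in general, MAR on residuals does not imply that missingness in $Y_2$ is MAR. However, it is an identifying restriction that does not affect the fit of the model to the observed data. CCMV and NCMV restrictions can be applied similarly to the residuals.

Remark: In general, $\mu_l^{(s)}$ can be replaced by $\mu_{il}^{(s)}$ if there are subject-specific covariates with time-varying coefficients.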
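The MAR on the residuals restriction in the bivariate case can be illustrated with a small calculation. The sketch below takes the completers' ($S = 2$) residual distribution and uses it to obtain the conditional mean and variance of $Y_2$ for a dropout ($S = 1$), adding back the dropout pattern's own covariate effect $X\alpha^{(1)}$. All numerical values and function names are hypothetical and serve only to show the mechanics.

```python
def residual_conditional(r1, mu1, mu2, s11, s21, s22):
    """Mean and variance of the Y2 residual given the Y1 residual r1 (bivariate normal)."""
    slope = s21 / s11
    return mu2 + slope * (r1 - mu1), s22 - slope * s21


def y2_for_dropout(y1, x_alpha_dropout, completer):
    """Conditional mean and variance of Y2 for a dropout under MAR on the residuals.

    The residual Y1 - X alpha^(1) is plugged into the completers' residual
    regression, and X alpha^(1) is added back to return to the response scale.
    """
    r1 = y1 - x_alpha_dropout
    mean_r2, var_r2 = residual_conditional(r1, **completer)
    return x_alpha_dropout + mean_r2, var_r2


# HYPOTHETICAL completer (S = 2) residual moments.
completer = dict(mu1=60.0, mu2=75.0, s11=400.0, s21=300.0, s22=625.0)

# HYPOTHETICAL dropout with Y1 = 70 and covariate effect X alpha^(1) = 5.
mean_y2, var_y2 = y2_for_dropout(70.0, 5.0, completer)

if __name__ == "__main__":
    print(mean_y2, var_y2)
```

The dropout's covariate effect enters only through the residual and the final shift, so the calculation never requires $\alpha^{(1)}$ and $\alpha^{(2)}$ to agree, which is the point of applying the restriction to residuals.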
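4.7.2 Multivariate Case

To incorporate baseline covariates in the multivariate case and apply similar MAR restrictions, we specify the model for the observed data as follows:

$$p_s(y_1 \mid X) \sim N(\mu_1^{(s)} + X\alpha^{(s)}, \sigma_1^{(s)}), \quad 1 \le s \le J$$
$$p_s(y_j \mid \bar y_{j-1}, X) \sim N(\mu_{j|j^-}^{(s)}, \sigma_{j|j^-}^{(\ge j)}), \quad 2 \le j \le s \le J$$

where

$$\mu_{j|j^-}^{(s)} = \beta_0^{(\ge j)} + X\alpha^{(s)} + \sum_{l=1}^{j-1} \beta_l^{(\ge j)}\left(y_l - X\alpha^{(s)}\right). \qquad (4\text{-}8)$$

For the missing data, the conditional distributions are specified as

$$p_s(y_j \mid \bar y_{j-1}, X) \sim N(\mu_{j|j^-}^{(s)}, \sigma_{j|j^-}^{(s)}), \quad 1 \le s < j \le J$$

where

$$\mu_{j|j^-}^{(s)} = \beta_0^{(s)} + X\alpha^{(s)} + \sum_{l=1}^{j-1} \beta_l^{(s)}\left(y_l - X\alpha^{(s)}\right). \qquad (4\text{-}9)$$

The conditional mean structures in (4-8) and (4-9) induce the following form for the marginal mean response

$$E(Y_j \mid S = s) = U_j^{(s)} + X\alpha^{(s)}$$

where $U_j^{(s)}$ is a function of intercept (e.g. $\beta_0^{(\ge j)}$) and slope (e.g. $\beta_l^{(\ge j)}$) parameters but not of $X$ or $\alpha^{(s)}$. This marginal mean response reflects the fact that $X$ is a baseline covariate and $\alpha$ is its time-invariant coefficient. This form is also necessary for resolving overidentification of $\alpha^{(s)}$ via the MAR on the residuals restrictions, as shown later.

Note that since $Y_1$ is always observed, $\alpha^{(s)}$ ($1 \le s \le J$) are identified by the observed data. However, in the model given by (4-8) and (4-9), there is a twofold overidentification of $\alpha^{(s)}$ under MAR:

1. For MAR constraints to exist under the model given in (4-1), $\mu_{j|j^-}^{(s)}$ as defined in (4-8) must be equal for $2 \le j \le s \le J$ and for all $X$. This requires that $\alpha^{(s)} = \alpha^*$ for $2 \le s \le J$.
2. MAR constraints also imply that $\mu_{j|j^-}^{(s)}$ as defined in (4-9) must be equal to $\mu_{j|j^-}^{(J)}$ for $1 \le s < j \le J$.

Similar to the bivariate case, to avoid the overidentification, we again use the MAR on the residuals restriction,

$$p_k\left(y_j - X\alpha^{(k)} \mid y_1 - X\alpha^{(k)}, \dots, y_{j-1} - X\alpha^{(k)}, X\right) = \sum_{s=j}^{J} \frac{P(S = s)}{P(S \ge j)}\, p_s\left(y_j - X\alpha^{(s)} \mid y_1 - X\alpha^{(s)}, \dots, y_{j-1} - X\alpha^{(s)}, X\right)$$

for $k < j$. With the conditional mean structures specified as (4-8) and (4-9), the MAR on the residuals restriction places no assumptions on $\alpha^{(s)}$.

The corresponding MDM is

$$\log \frac{P(S = s \mid Y, X)}{P(S \ge s \mid Y, X)} = \log \frac{P(S = s)\, p(Y \mid S = s, X)}{P(Y, S \ge s \mid X)} = \log \frac{P(S = s)\, p_s(Y_J \mid \bar Y_{J-1}, X)\, p_s(Y_{J-1} \mid \bar Y_{J-2}, X) \cdots p_s(Y_1 \mid X)}{\sum_{l=s}^{J} p_l(Y_J \mid \bar Y_{J-1}, X)\, p_l(Y_{J-1} \mid \bar Y_{J-2}, X) \cdots p_l(Y_1 \mid X)\, P(S = l)}.$$

It does not have a simple form in general.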
However, if $\alpha^{(s)} = \alpha^*$ for all $s$, then

$$\log \frac{P(S = s \mid Y, X)}{P(S \ge s \mid Y, X)} = \log \frac{p_s(Y_1 \mid X)\, P(S = s)}{\sum_{l=s}^{J} p_l(Y_1 \mid X)\, P(S = l)},$$

i.e. the MDM only depends on $Y_1$ and $X$. Otherwise, the missingness is MNAR.

4.8 Summary

Most pattern mixture models allow the missingness to be MNAR, with MAR as a unique point in the parameter space. The magnitude of departure from MAR can be quantified via a set of sensitivity parameters. For MNAR analysis, it is critical to find scientifically meaningful and dimensionally tractable sensitivity parameters. For this purpose, (multivariate) normal distributions are often found attractive since the MNAR departure from MAR can be parsimoniously defined by deviations in the mean and (co)variance.

However, a simple pattern mixture model based on multivariate normality for the full-data response within patterns does not allow MAR without special restrictions that themselves induce a very restrictive missing data mechanism. We explored this fully and proposed alternatives based on multivariate normality for the observed-data response within patterns. In both these contexts, we proposed strategies for specifying sensitivity parameters.

In addition, we showed that when introducing baseline covariates with time-invariant coefficients, standard identifying restrictions result in overidentification of the model. This is against the principle of applying identifying restrictions, namely that they should not affect the model fit to the observed data. We proposed a simple alternative set of restrictions based on residuals that can be used as an 'identification' starting point for an analysis using mixture models.

In the growth hormone study data example, we showed how to reduce the number of sensitivity parameters in practice and a default way to construct informative priors for sensitivity parameters based on limited knowledge about the missingness. In particular, all the values in the range were weighted equally via a uniform distribution.
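If there is additional external information from expert opinion or historical data, informative priors may be used to incorporate such information (for example, see Ibrahim and Chen, 2000; Wang et al., 2010). Finally, an important consideration in sensitivity analysis and constructing informative priors is that they should avoid extrapolating missing values outside of a reasonable range (e.g., negative quadriceps strength).

4.9 Tables

Table 4-1. Growth Hormone Study: Sample mean (standard deviation) stratified by dropout pattern.

           Dropout   Number of      Month
Treatment  Pattern   Participants   0        6        12
EG         1         12             58 (26)
           2         4              57 (15)  68 (26)
           3         22             78 (24)  90 (32)  88 (32)
           All       38             69 (25)  87 (32)  88 (32)
EP         1         7              65 (32)
           2         2              87 (52)  86 (51)
           3         31             65 (24)  81 (25)  73 (21)
           All       40             66 (26)  82 (26)  73 (21)

Table 4-2. Growth Hormone Study: Posterior mean (standard deviation)

                       Treatment EP                    Treatment EG                    Difference
                       Month 0   Month 6   Month 12    Month 0   Month 6   Month 12    at 12 mos.
Observed Data          66 (9.9)  82 (18)   72 (3.8)    69 (7.3)  87 (16)   88 (6.8)    12 (7.8)
MAR Analysis   MVN     66 (6.0)  82 (5.9)  73 (4.9)    69 (4.9)  81 (6.8)  78 (7.2)    5.4 (8.8)
               OMVN    66 (6.0)  81 (8.2)  73 (6.1)    69 (4.9)  82 (7.7)  79 (7.8)    5.8 (9.9)
MNAR Analysis  MVN     66 (6.0)  80 (6.0)  72 (5.0)    69 (4.9)  78 (7.1)  76 (7.5)    4.0 (8.9)
               OMVN    66 (6.0)  80 (8.3)  71 (6.1)    69 (4.9)  79 (8.0)  76 (8.0)    4.4 (10)

4.10 Appendix

Missing data mechanism under missing not at random and multivariate normality: The MDM in Section 4.3 is derived as follows:

$$\log \frac{P(S = s \mid Y)}{P(S \ge s \mid Y)} = \log \frac{P(S = s)\, p_s(Y)}{P(Y, S \ge s)}$$
$$= \log \frac{P(S = s)\, p_s(Y_1) \prod_{l=2}^{J} p_s(Y_l \mid \bar Y_{l-1})}{\sum_{k=s}^{J} \left\{P(S = k)\, p_k(Y_1) \prod_{l=2}^{J} p_k(Y_l \mid \bar Y_{l-1})\right\}}$$
$$= \log \frac{\prod_{l=2}^{s} p_{\ge l}(Y_l \mid \bar Y_{l-1})\; P(S = s)\, p_s(Y_1) \prod_{l=s+1}^{J} p_s(Y_l \mid \bar Y_{l-1})}{\prod_{l=2}^{s} p_{\ge l}(Y_l \mid \bar Y_{l-1}) \sum_{k=s}^{J} \left\{P(S = k)\, p_k(Y_1) \prod_{l=s+1}^{J} p_k(Y_l \mid \bar Y_{l-1})\right\}}$$
$$= \log \frac{P(S = s)\, p_s(Y_1) \prod_{l=s+1}^{J} p_s(Y_l \mid \bar Y_{l-1})}{\sum_{k=s}^{J} \left\{P(S = k)\, p_k(Y_1) \prod_{l=s+1}^{J} p_k(Y_l \mid \bar Y_{l-1})\right\}}$$
$$= \log P(S = s) - \frac{1}{2} \log \sigma_1^{(s)} - \frac{(Y_1 - \mu_1^{(s)})^2}{2 \sigma_1^{(s)}} - \sum_{l=s+1}^{J} \left\{ \frac{\Delta_\sigma^{(l)}}{2} + \frac{\left(Y_l - \mu_{l|l^-}^{(\ge l)} - \Delta_{l|l^-}^{(l)}\right)^2}{2 e^{\Delta_\sigma^{(l)}} \sigma_{l|l^-}^{(\ge l)}} \right\}$$
$$\quad - \log \sum_{k=s}^{J} \left\{ P(S = k) \left(\sigma_1^{(k)}\right)^{-1/2} \exp\left[-\frac{(Y_1 - \mu_1^{(k)})^2}{2 \sigma_1^{(k)}}\right] \prod_{l=s+1}^{k} \exp\left[-\frac{\left(Y_l - \mu_{l|l^-}^{(\ge l)}\right)^2}{2 \sigma_{l|l^-}^{(\ge l)}}\right] \prod_{l=k+1}^{J} \left(e^{\Delta_\sigma^{(l)}}\right)^{-1/2} \exp\left[-\frac{\left(Y_l - \mu_{l|l^-}^{(\ge l)} - \Delta_{l|l^-}^{(l)}\right)^2}{2 e^{\Delta_\sigma^{(l)}} \sigma_{l|l^-}^{(\ge l)}}\right] \right\}.$$

Mean and variance of $[Y_j \mid \bar Y_{j-1}, S = s]$: The mean and variance of $[Y_j \mid \bar Y_{j-1}, S = s]$ under the MNAR assumption in Section 4.5 are derived as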
follows: Sa ), (k) N W y. AUO) (1 ( e )/ (s),MNAR E(Y Yj1, S = s) = e Vj,k yJPk( O1 ej1) /(e^ k=j = e ) jk A) W ) dy e =ke) +A (1 e e') Pk(Y)lyJ1)e dy k=j k=j and o)MNAR = Var(lY, S = s) J (k) Sy). U) (1 e^)pY = e A jk Pk( j1)dy E2(Yj i S = k) kj e Se =j,k 8e y2 )l'j) Pk(Yj yj_1)e)( ) dy; k=j (k 2/( ) ) 2  e +y (1 je j kj J J jkE((j) 1' S k) (k) "2 e2A kE((y)21 S = k) +j + (1 e) )2 k=j kj + 2e k(A" (1 e)lj) )E( y~*y,j_ S k) A + ,k II) k=j kj e 2A =,k (o1j) + ( 1k) 2 2 ,W (k) I e j z^i 4 i'lj ) I =j,k S(1 e2)) (I) 2 ( kJ ) k=j k=j y, (1e  where yj =  e ) Fulldata model for the growth hormone example (Section 4.6): We specify a pattern mixture model with sensitivity parameters for the two treatment arms. For compactness, we suppress subscript treatment indicator z from all the parameters in the following models. For missing pattern 5, we specify S ~ Mult(o) with the multinomial parameter ( = ( 02, 3), s = P(S = s) for s E {1, 2, 3}, and E=1 s = 1. For the observed response data model [YobsS], we specify the same MVN and OMVN model for [YIIS] as follows: Y, S = 1 N(,', a)) Y, lS = 2 Q N a(/ )2)) Y l S = 3 ~ N (/j3 3)), For MVN model, we specify Y2 Y1, S 2} N ((2) + 2 (2) Y2 Y1, S = 3 &\ o3) + (3) y+ n(3) y ^( v2, 7(> 3)" Y3 Y2, Y1, S = 3 N ~>3 + 31 + ~ 3)2. 
'03) 3' For OMVN model, we specify //?(2) + (2) Y 7(2) Y2 Y, S = 2 + 21 u2N 2 S(3) + (3) y '7(3) Y2 Y1, = 3 ~ N 2,0 2,1 2 {9(3) + (3) Y1 + (3) Y2, _(3)" Y3 Y2, Y1, S = 3 ~ N 3,0 3,1 3,2 313, For missing response data model [Ymis Yobs, S], we specify for MVN model Y2 1, S = 1 N 0 (2) 2) 2) )Y1, e() 2 2) Y3 Y2, Y1, S = 2 N (>3) A3 + (>3) + A3))Y (/3) + A3))2, eA()3) Y31 Y2, Y,, S = 1 ~ 0 3 3) + 01 32 31) + 2 N (/3) + A3) + (/3) + 3)) + (/>3) + A3)), (3)'7 + 2 + (N3 0 0 1 1 2 Y2, 313 a313 100 For OMVN model, we specify 03 (L (Au2) + /(3) + /(3)Yi, 'aL)(3) Y2 1, S = 1 ~ 2 3 2)e) (2)_ + 2 N(A(2) +/(2) +/(2) 3 e1 2 + 42+43 N U 2,0 2,1 Y1, 21,) L( + (3) + (3) y+ (3) 7) (3) Y31 Y2, Y1, S = 2 ~ N ( u3) 3,1Y 3,2 Y2, e 313, 3 N (4,3 + 3) y + 93)'y2, 73) Y3 Y2 1, S = 1 N 3,2 313 2 (L (3) + (3) + (3) y+ (3) y '7eA (3) + 2 3N ( u3) '3,1 v2 + p 23 e, )313 MAR on residuals constraints: Here we show that in multivariate case (Section 4.7.2), the MAR on the residuals restriction puts no constraints on c() . Let [ZjIS] [Yj Xo(S)]. The MAR on the residuals constraints are SP(S > s) Pk(ZjZj1,X) = P(S > jPs(zjljiX). s= Note that J ps(yj, ... yJ) = ps(yi) ps(yl yi) /=2 Sexp (> Xa(S) > )(Yt (>ai) Xa(s)) = Ps(Yi) /=2 227a Thus, J PS(z, ... ) ps(zi) H ps(z/z/1) /=2 exp (Z i t/ P/ ~_ _(l t _t/) 2 = ps (z2) ) /=2 2/I/ We can further show that [Z, Zj_, S = s, X] ~ N p ) +( )(Zi I)), ) , /=i which is independent of s. Therefore, P(S > A) S ( p(zyz)I' X)N s J ( P() + j1 /. (Z/ /=1 Similarly, we may derive that pk(ZjZZj1,X) = N (P) + j1 / (Z The constraints (410) thus imply (S) = (1 which places no restrictions on ) which places no restrictions on (s). 102 ( )) (o J) P/1 7j( ) . 
CHAPTER 5
DISCUSSION: FUTURE APPLICATION OF THE BAYESIAN NONPARAMETRIC AND SEMIPARAMETRIC METHODS

5.1 Summary

In this dissertation, we explored the utility of Bayesian shrinkage methods for the analysis of incomplete longitudinal data with informative missingness that includes both dropout and intermittent missingness. We considered two different saturated model parameterizations and corresponding parameter space reduction strategies. By simulation, we showed that the proposed methods outperform a saturated model without parameter shrinkage and a misspecified parametric model while being very competitive with the correct parametric model. Furthermore, the incomplete data analysis framework developed in this dissertation allows straightforward incorporation of experts' opinions in the form of informative priors, as well as flexible sensitivity analysis. Second, we explored the conditions under which identifying restrictions that result in missing at random (MAR) exist under a multivariate normality assumption, and strategies for identifying sensitivity parameters for sensitivity analysis or for a fully Bayesian analysis with informative priors, with application to a longitudinal clinical trial. In the following sections, we discuss further applications of Bayesian nonparametric and semiparametric methods that we are currently considering.

5.2 Extensions to Genetics Mapping

For identifying genes involved in human diseases and their inheritance, multigenerational family-based designs (including parents and offspring) have become popular (Farnir et al., 2002; Liu et al., 2006). In such analyses, researchers are often interested in simultaneously estimating linkage, which describes the tendency of certain alleles to be inherited together, and linkage disequilibrium (LD), which measures the nonrandom association between different markers.
However, if the LD is close to zero, the linkage recombination fraction is hard to estimate (or not estimable at all). We use a two-marker scenario to illustrate this dilemma. Consider two markers A and B, with alleles A and a, and B and b, respectively. Four possible haplotypes, [AB], [Ab], [aB], and [ab], can be formed by the two markers, with the frequencies denoted as $p_{11}$, $p_{10}$, $p_{01}$, and $p_{00}$, respectively. We use $p$, $1 - p$, $q$ and $1 - q$ to denote the allele frequencies of A, a, B and b, respectively. Then, we have the following relationships:
\[
\begin{aligned}
p_{11} &= pq + D, \\
p_{10} &= p(1 - q) - D, \\
p_{01} &= (1 - p)q - D, \\
p_{00} &= (1 - p)(1 - q) + D,
\end{aligned}
\qquad (5\text{-}1)
\]
where $D$ is the LD parameter. By simple algebra, we can show that $D = p_{11} p_{00} - p_{10} p_{01}$. Now, let $r$ denote the linkage recombination fraction, the frequency with which a chromosomal crossover takes place between the two markers A and B during meiosis. To estimate $r$, only offspring with at least one double-heterozygous parent, i.e. a parent with genotype Aa/Bb, contribute to the likelihood (proportionally) by $p_{11} p_{00}\, r + p_{10} p_{01} (1 - r)$. When $D$ is zero, $p_{11} p_{00} = p_{10} p_{01}$, so this contribution does not depend on $r$. Therefore, $r$ is not estimable when $D$ is zero. One possible solution is to incorporate more markers in the linkage analysis. By doing this, the number of parents with no fewer than two heterozygous markers increases. Consequently, more offspring contribute to the likelihood for estimating the linkage recombination fraction. However, the number of haplotype frequencies to be estimated also increases (exponentially) as the number of markers increases. Bayesian shrinkage methods can be applied to address this problem.

5.3 Extensions to Causal Inference

5.3.1 Causal Inference Introduction

For clinical studies of terminal diseases, incomplete data is often caused by death. If a response is truncated by death, the missingness should not be classified as "censored" because "censoring" implies the masking of a value that is potentially observable.
It is also not appropriate to handle these cases by traditional nonmortality missing data approaches, such as models assuming ignorability or models assuming a nonignorable missing data mechanism, which implicitly "impute" missing responses. In randomized studies with no missingness, causal relationships are well established and the treatment causal effects can be estimated directly (Rubin, 1974). However, in nonrandomized trials or in the presence of missing data, these methods are limited if the research interest demands estimation of causally interpretable effects. To define causal effects, we first introduce the concept of potential outcomes, a term that is sometimes used interchangeably with counterfactuals (but not always; see Rubin, 2000). The use of the term "potential outcome" can be traced at least to Neyman (1923). Neyman used "potential yields" $U_{ik}$ to indicate the yield of a plot $k$ if exposed to a variety $i$. Rubin (1974) defines the causal effect of one treatment, E, over another, C, for a particular unit as the difference between what would have happened if the unit had been exposed to E, namely $Y(E)$, and what would have happened if the unit had been exposed to C, namely $Y(C)$. Using potential outcomes, Frangakis and Rubin (2002) introduce a framework for comparing treatment effects based on principal stratification, which is a cross-classification of units defined by their potential outcomes with respect to (post)treatment variables, such as treatment noncompliance or dropout. Adjusting the treatment comparison for posttreatment variables is necessary because such variables encode the characteristics of both the treatment and the patient. For example, a patient diagnosed with cancer in a cancer prevention trial may have depression caused by the treatment or by the diagnosis itself. Furthermore, the comparison may be meaningless without the posttreatment variable adjustment.
For example, a response such as depression is no longer defined after a nonresponse cause such as death happens. A stratum $\mathcal{A}$ is defined by the joint potential values $S(Z)$ of the posttreatment variable under each treatment $Z$ (e.g., $Z = 0, 1$). For example, let $S(Z)$ be the potential survival status and let $S(Z) = 1$ and $0$ denote alive and dead, respectively. Then the stratum $\mathcal{A} = \{S(0) = 1, S(1) = 1\}$ defines the patients who will (potentially) survive on both arms. A stratum is unaffected by treatment. That is, for subject $i$, whether $i \in \mathcal{A}$ or $i \notin \mathcal{A}$ does not depend on the actual treatment $i$ is assigned. Consequently, the treatment effect defined as the difference between $\{Y_i(0) : i \in \mathcal{A}\}$ and $\{Y_i(1) : i \in \mathcal{A}\}$ is a causal effect. In contrast, a standard adjustment for posttreatment variables uses the treatment comparison between $\{Y_i(0) : S_i(0) = s\}$ and $\{Y_i(1) : S_i(1) = s\}$. Such an estimand is not a causal effect when $S(z)$ is affected by $z$, because the group of patients with $S(0) = s$ is then not identical to the group of patients with $S(1) = s$. Consistent with Frangakis and Rubin's framework, Rubin (2000) introduced the concept of the survivors average causal effect (SACE), that is, the causal effect of treatment on endpoints that are defined only for survivors, i.e. the group of patients who would live regardless of their treatment assignment. Within the principal strata framework, the identification of the SACE or other principal stratum causal effects usually depends on untestable assumptions. To address the uncertainty of the untestable assumptions, sensitivity analysis is carried out, and/or bounds on the causal effects are derived. For example, Zhang and Rubin (2003) derived large sample bounds for causal effects without assumptions and with assumptions such as monotonicity of the death rate across treatment arms. Gilbert et al.
(2003) used a class of logistic selection bias models to identify the causal estimands and carried out sensitivity analysis for the magnitude of selection bias. Hayden et al. (2005) assumed "explainable nonrandom noncompliance" (Robins, 1998) and outlined a sensitivity analysis for exploring the robustness of the assumption. Cheng and Small (2006) derived sharp bounds for the causal effects and constructed confidence intervals to cover the identification region. Egleston et al. (2007) proposed a method similar to that of Zhang and Rubin (2003), but instead of identifying the full joint distribution of potential outcomes, they only identify the features of the joint distribution that are necessary for identifying the SACE estimand. Lee et al. (2010) replaced the common deterministic monotonicity assumption by a stochastic one that allows incorporation of subject-specific effects and generalized the assumptions to more complex trials.

5.3.2 Data and Notation

The following notation is defined for a random individual. When necessary, we use the subscript $i$ to denote data for the $i$th individual. We consider a controlled randomized clinical study with a treatment arm ($Z = 1$) and a control arm ($Z = 0$). A longitudinal binary outcome $Y$ is scheduled to be measured at visits $j = 1, \ldots, J$, i.e. $Y = (Y_1, \ldots, Y_J)$ is a $J$-dimensional vector. Let $R = (R_1, \ldots, R_J)$ be the missing indicator vector with $R_j = 1$ if $Y_j$ is observed and $R_j = 0$ if $Y_j$ is missing. We assume the missingness is monotone. We assume there are multiple events that will cause a patient on this trial to drop out, and categorize the events as nonresponse events (e.g. death) and missing events (e.g. withdrawal of consent). We assume that nonresponse events may happen after the occurrence of a missing event but not vice versa. We further assume all the events are observed. Let $C$ denote the "survival" time for a patient.
That is, $C = c$ implies that a nonresponse event happened to the patient between visit $c$ and $c + 1$ and caused the patient to drop out on and after visit $c + 1$. Let $R_C = \{R_1, \ldots, R_C\}$ be the missing data indicator recorded prior to patient dropout that is caused by a nonresponse event. We use $\bar{Y}_j = (Y_1, \ldots, Y_j)$ to denote the historical data up to time point $j$ and $Y_{obs}$ to denote the observed response data. We use $Y(z)$, $C(z)$, $R_C(z)$ and $Y_{obs}(z)$ to denote the value of $Y$, $C$, $R_C$ and $Y_{obs}$ of a patient, possibly counterfactual, if the patient is assigned to treatment $z$. The full data $F$ of a patient thus consists of $\{Z, C(0), \bar{Y}_{C(0)}(0), C(1), \bar{Y}_{C(1)}(1)\}$, and the observed data $O$ contains $\{Z, C(Z), R_C(Z), Y_{obs}(Z)\}$. One goal is to measure the causal effect of treatment by estimating the treatment effect for those who would not have dropped out due to nonresponse reasons under either treatment or control. That is, to estimate the "survivor" average causal effect
\[
\begin{aligned}
SACE_j &= E\left(Y_j(1) - Y_j(0) \mid C(0) \ge j, C(1) \ge j\right) \\
&= P\left(Y_j(1) = 1 \mid C(0) \ge j, C(1) \ge j\right) - P\left(Y_j(0) = 1 \mid C(0) \ge j, C(1) \ge j\right)
\end{aligned}
\qquad (5\text{-}2)
\]
for all $j$. Note that the group of patients of interest, $\{C(0) \ge j, C(1) \ge j\}$, forms a principal stratum.

5.3.3 Missing Data Mechanism

To make causal inferences, we first need to estimate $\mu^*_{j,z,c} = E[Y_j \mid C = c, Z = z]$ for all $z$ and $j \le c$, which are not identifiable without unverifiable assumptions. We make the same partial missing at random assumption as in Chapter 3, Section 3.2, that $R_C \perp \bar{Y}_C \mid Z, C, Y_{obs}$. We have shown in Chapter 3 that $\mu^*_{j,z,c}$ is identified by the observed data under this partial missing at random assumption.

5.3.4 Causal Inference Assumption

The causal effect (5-2) is not identifiable from the observed data $O = \{Z, C(Z), R_C(Z), Y_{obs}(Z)\}$. We propose the following assumptions to identify boundaries for the causal effect:

I. Stable Unit Treatment Value Assumption (SUTVA). Let $\mathbf{Z} = (Z_1, \ldots, Z_N)$ be the vector of treatment assignments for all the patients.
SUTVA means $Z_i = Z_i' \Rightarrow (Y_i(\mathbf{Z}), C_i(\mathbf{Z})) = (Y_i(\mathbf{Z}'), C_i(\mathbf{Z}'))$, regardless of what $\mathbf{Z}$ is. That is, the potential outcome of patient $i$ is unrelated to the treatment assignment of other patients. This allows us to write $Y_i(\mathbf{Z})$ and $C_i(\mathbf{Z})$ as $Y_i(Z_i)$ and $C_i(Z_i)$, respectively.

II. Random Assignment. The treatment assignment $Z$ is random, i.e. $Z \perp (Y(0), Y(1), C(0), C(1))$, which holds in a controlled randomized clinical trial. This assumption allows us to write $Y_j(z)$ and $C(z)$ as $Y_j \mid Z = z$ and $C \mid Z = z$, respectively.

III. Mean Monotonicity.
\[
E[Y_j(z) \mid C(z) = c, C(1 - z) = t] \le E[Y_j(z) \mid C(z) = c', C(1 - z) = t']
\]
for $c' \ge c \ge j$, $t' \ge t$, $z = 0, 1$. This assumption provides an ordering of the mean potential response at visit $j$ under treatment $z$ for all the principal cohorts of individuals who would be on study at visit $j$ under treatment $z$. The means are assumed to not be worse for cohorts who remain on study longer under both treatments. That is, the individuals who would be last seen at time $c'$ ($c' \ge j$) under treatment $z$ and time $t'$ under treatment $1 - z$ will not have a worse mean potential response at time $j$ under treatment $z$ than individuals who would last be seen at a time less than $c'$ (but still greater than or equal to time $j$) under treatment $z$ or a time less than $t'$ under treatment $1 - z$.

The mean monotonicity assumption is often reasonable in clinical studies. For example, in a cardiovascular stent implantation trial, multiple endpoints including all-cause mortality-free survival and 6-minute walk test score are used to evaluate the effectiveness of the device. Since the two endpoints are positively correlated, it is plausible to assume that patients will potentially perform better on their 6-minute walk tests if they have a longer survival time, i.e. remain on the study longer.

We introduce some further notation:
1. $p_{c,t} = P(C(0) = c, C(1) = t)$.
2. $\gamma_{z,c} = P(C(z) = c)$.
3. $m_{j,z,c,t} = E[Y_j(z) \mid C(z) = c, C(1 - z) = t]$ ($c \ge j$).
4. $\mu_{j,z,c} = E[Y_j(z) \mid C(z) = c]$ ($c \ge j$).
Note that under Assumption II (randomization), both $\gamma_{z,c} = P(C(z) = c) = P(C = c \mid Z = z)$ and $\mu_{j,z,c} = E[Y_j(z) \mid C(z) = c] = E[Y_j \mid C = c, Z = z]$ are identified by the observed data under the partial missing at random assumption. The causal effect of interest $SACE_j$ can be expressed as
\[
\begin{aligned}
SACE_j &= E[Y_j(1) - Y_j(0) \mid C(1) \ge j, C(0) \ge j] \\
&= \frac{E[(Y_j(1) - Y_j(0))\, 1\{C(1) \ge j, C(0) \ge j\}]}{P(C(1) \ge j, C(0) \ge j)} \\
&= \frac{\sum_{c \ge j} \sum_{t \ge j} (m_{j,1,t,c} - m_{j,0,c,t})\, p_{c,t}}{\sum_{c \ge j} \sum_{t \ge j} p_{c,t}}.
\end{aligned}
\qquad (5\text{-}3)
\]
The boundaries of $SACE_j$ in (5-3) can be found subject to the following restrictions:
1. $0 \le p_{c,t} \le 1$ for all $c$ and $t$, and $\sum_{c=1}^{J} \sum_{t=1}^{J} p_{c,t} = 1$,
2. $\sum_{c=1}^{J} p_{c,t} = \gamma_{1,t}$ and $\sum_{t=1}^{J} p_{c,t} = \gamma_{0,c}$,
3. $\sum_{t=1}^{J} m_{j,0,c,t}\, p_{c,t} = \mu_{j,0,c}\, \gamma_{0,c}$ ($c \ge j$) and $\sum_{c=1}^{J} m_{j,1,t,c}\, p_{c,t} = \mu_{j,1,t}\, \gamma_{1,t}$ ($t \ge j$) for all $j$,
4. $m_{j,z,c',t'} \ge m_{j,z,c,t}$ for $c' \ge c \ge j$, $t' \ge t$ and all $j$ and $z$,
where restrictions (1)-(2) ensure that $p_{c,t}$ is a distribution with the (identified) marginals, restriction (3) matches the identified conditional means, and restriction (4) comes from Assumption III.

Finding the boundaries of the SACE, i.e. finding the minimum and the maximum of the objective function (5-3), can be approximated (by ignoring the normalizing constant) as a nonconvex quadratically constrained quadratic problem (QCQP) (Boyd and Vandenberghe, 1997, 2004). For a QCQP, a standard approach is to optimize a semidefinite relaxation of the QCQP and obtain lower and upper bounds on the local optima of the objective function (Boyd and Vandenberghe, 1997). The uncertainty of the estimated bounds can be characterized in a Bayesian framework. The joint posterior distribution of the bounds can be constructed by carrying out the optimization for each posterior sample of $\mu^*_{j,z,c}$, identified by the algorithm proposed in Section 5.3.3. The result can be presented as in Figure 5-1. A study decision might be based on the mode of the posterior joint distribution of the bounds.
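Before any optimization, the requirement that the joint distribution of (C(0), C(1)) match the identified marginal distributions of C(0) and C(1) already bounds the size of the principal stratum {C(0) >= j, C(1) >= j}, which is the normalizing constant in the SACE. The following minimal sketch computes the classical Frechet bounds on that probability. It is a sanity check only and does not implement the semidefinite relaxation; the function names and the example marginals are hypothetical.

```python
def survival_prob(marginal, j):
    """Return P(C >= j) for a marginal pmf given as {time: probability}."""
    return sum(p for c, p in marginal.items() if c >= j)


def stratum_size_bounds(marginal0, marginal1, j):
    """Frechet bounds on P(C(0) >= j, C(1) >= j) given only the marginals.

    marginal0 and marginal1 are the pmfs of C(0) and C(1); any joint
    distribution with these marginals has a stratum probability in
    the returned (lower, upper) interval.
    """
    a = survival_prob(marginal0, j)
    b = survival_prob(marginal1, j)
    lower = max(0.0, a + b - 1.0)
    upper = min(a, b)
    return lower, upper


# Hypothetical marginals for J = 3 visits (not taken from any study).
marginal_control = {1: 0.3, 2: 0.3, 3: 0.4}
marginal_treated = {1: 0.1, 2: 0.3, 3: 0.6}
bounds_at_visit_2 = stratum_size_bounds(marginal_control, marginal_treated, j=2)
```

In a Bayesian analysis the same calculation could be repeated for each posterior draw of the marginals, which gives a quick check on the range of stratum sizes that the full optimization has to work with.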
5.3.5 Stochastic Survival Monotonicity Assumption

Under Assumption II, the marginal distributions $P(C(0))$ and $P(C(1))$ of $P(C(0) = c, C(1) = t)$ (recall that 0 and 1 represent the placebo and treatment arm, respectively) are identified. However, the joint distribution remains unidentified without further assumptions. We outline several assumptions that will identify $p_{c,t}$ beyond the identified margins (Figure 5-2). These assumptions, when reasonable, will simplify the optimization of the objective function and yield more precise results.

1. $P(C(0) = m \mid C(1) = c) = q^{m-n} P(C(0) = n \mid C(1) = c)$ for $c \ge m > n$ and $q > 1$. That is, given that a patient will "survive" until time point $c$ on the treatment arm, the probability that the patient will "survive" until time point $n + 1$ on the placebo arm is $q$ times the probability that the patient will "survive" until $n$, for $n < c$. The parameter $q$ is a sensitivity parameter.
2. $P(C(1) = t \mid C(0) = c) = 0$ for $c > t$. That is, the chance that a patient will "survive" longer on the placebo arm than on the active treatment arm is zero. This assumes the lower triangle (excluding the diagonal) in Figure 5-2 is zero.

These assumptions may be incorporated in the optimization within the Bayesian framework to improve the precision of the posterior joint distribution of the bounds.

5.3.6 Summary of Causal Inference

We have outlined an approach to estimate the causal effect of treatment where there is dropout due to nonresponse reasons such as death. We also outlined an approach for posterior inference. We need to further explore point estimation of the intervals/bounds for the causal effect and the characterization of their uncertainty in a Bayesian framework.

5.4 Figures

Figure 5-1. Contour and Perspective Plots of a Bivariate Density. [Panels: Contour Plot, Density Plot; axis label: Lower Bound.]

Figure 5-2. Illustration of $p_{c,t}$.

REFERENCES

Albert, P. (2000). A Transitional Model for Longitudinal Binary Data Subject to Nonignorable Missing Data.
Biometrics 56, 602-608.
Albert, P., Follmann, D., Wang, S., and Suh, E. (2002). A latent autoregressive model for longitudinal binary data subject to informative missingness. Biometrics 58, 631-642.
Baker, S. (1995). Marginal regression for repeated binary data with outcome subject to nonignorable nonresponse. Biometrics 51, 1042-1052.
Baker, S., Rosenberger, W., and DerSimonian, R. (1992). Closed-form estimates for missing counts in two-way contingency tables. Statistics in Medicine 11, 643-657.
Birmingham, J. and Fitzmaurice, G. (2002). A Pattern-Mixture Model for Longitudinal Binary Responses with Nonignorable Nonresponse. Biometrics 58, 989-996.
Boyd, S. and Vandenberghe, L. (1997). Semidefinite programming relaxations of nonconvex problems in control and combinatorial optimization. Communications, Computation, Control and Signal Processing: A Tribute to Thomas Kailath.
Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.
Cheng, J. and Small, D. (2006). Bounds on causal effects in three-arm trials with noncompliance. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68, 815-836.
Christiansen, C. and Morris, C. (1997). Hierarchical Poisson regression modeling. Journal of the American Statistical Association pages 618-632.
Daniels, M. (1999). A prior for the variance in hierarchical models. Canadian Journal of Statistics 27,.
Daniels, M. and Hogan, J. (2000). Reparameterizing the Pattern Mixture Model for Sensitivity Analyses Under Informative Dropout. Biometrics 56, 1241-1248.
Daniels, M. and Hogan, J. (2008). Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. Chapman & Hall/CRC.
DeGruttola, V. and Tu, X. (1994). Modelling Progression of CD4-lymphocyte Count and its Relationship to Survival Time. Biometrics 50, 1003-1014.
Diggle, P. and Kenward, M. (1994). Informative Drop-Out in Longitudinal Data Analysis. Applied Statistics 43, 49-93.
Egleston, B. L., Scharfstein, D.
O., Freeman, E. E., and West, S. K. (2007). Causal inference for nonmortality outcomes in the presence of death. Biostatistics 8, 526-545.
Escobar, M. and West, M. (1995). Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association pages 577-588.
Fan, J. and Li, R. (2001). Variable Selection Via Nonconcave Penalized Likelihood and Its Oracle Properties. Journal of the American Statistical Association 96, 1348-1361.
Farnir, F., Grisart, B., Coppieters, W., Riquet, J., Berzi, P., Cambisano, N., Karim, L., Mni, M., Moisio, S., Simon, P., et al. (2002). Simultaneous mining of linkage and linkage disequilibrium to fine map quantitative trait loci in outbred half-sib pedigrees: revisiting the location of a quantitative trait locus with major effect on milk production on bovine chromosome 14. Genetics 161, 275.
Faucett, C. and Thomas, D. (1996). Simultaneously modelling censored survival data and repeatedly measured covariates: a Gibbs sampling approach. Statistics in Medicine 15,.
Fisher, B., Costantino, J., Wickerham, D., Redmond, C., Kavanah, M., Cronin, W., Vogel, V., Robidoux, A., Dimitrov, N., Atkins, J., Daly, M., Wieand, S., Tan-Chiu, E., Ford, L., Wolmark, N., and other National Surgical Adjuvant Breast and Bowel Project Investigators (1998). Tamoxifen for prevention of breast cancer: report of the National Surgical Adjuvant Breast and Bowel Project P-1 study. Journal of the National Cancer Institute 90, 1371-1388.
Fitzmaurice, G. and Laird, N. (2000a). Generalized linear mixture models for handling nonignorable dropouts in longitudinal studies. Biostatistics 1, 141-156.
Fitzmaurice, G. and Laird, N. (2000b). Generalized Linear Mixture Models for Handling Nonignorable Dropouts in Longitudinal Studies. Biostatistics 1, 141-156.
Fitzmaurice, G., Molenberghs, G., and Lipsitz, S. (1995). Regression Models for Longitudinal Binary Responses with Informative Drop-Outs. Journal of the Royal Statistical Society. Series B.
Methodological 57, 691-704.
Follmann, D. and Wu, M. (1995). An approximate generalized linear model with random effects for informative missing data. Biometrics pages 151-168.
Forster, J. and Smith, P. (1998). Model-Based Inference for Categorical Survey Data Subject to Non-Ignorable Non-Response. Journal of the Royal Statistical Society: Series B: Statistical Methodology 60, 57-70.
Frangakis, C. E. and Rubin, D. B. (2002). Principal stratification in causal inference. Biometrics 58, 21-29.
Gilbert, P., Bosch, R., and Hudgens, M. (2003). Sensitivity analysis for the assessment of causal vaccine effects on viral load in HIV vaccine trials. Biometrics 59, 531-541.
Green, P. J. and Silverman, B. (1994). Nonparametric Regression and Generalized Linear Models. Chapman & Hall.
Harel, O. and Schafer, J. (2009). Partial and latent ignorability in missing-data problems. Biometrika 96, 37.
Hayden, D., Pauler, D., and Schoenfeld, D. (2005). An estimator for treatment comparisons among survivors in randomized trials. Biometrics 61, 305-310.
Heagerty, P. (2002). Marginalized transition models and likelihood inference for longitudinal categorical data. Biometrics pages 342-351.
Heckman, J. (1979a). Sample Selection Bias as a Specification Error. Econometrica 47, 153-161.
Heckman, J. (1979b). Sample selection bias as a specification error. Econometrica: Journal of the Econometric Society pages 153-161.
Heitjan, D. and Rubin, D. (1991). Ignorability and coarse data. The Annals of Statistics pages 2244-2253.
Henderson, R., Diggle, P., and Dobson, A. (2000). Joint modelling of longitudinal measurements and event time data. Biostatistics 1, 465-480.
Hogan, J. and Laird, N. (1997a). Mixture Models for the Joint Distribution of Repeated Measures and Event Times. Statistics in Medicine 16, 239-257.
Hogan, J. and Laird, N. (1997b). Model-Based Approaches to Analysing Incomplete Longitudinal and Failure Time Data. Statistics in Medicine 16, 259-272.
Hogan, J., Lin, X., and Herman, B. (2004).
Mixtures of varying coefficient models for longitudinal data with discrete or continuous nonignorable dropout. Biometrics 60, 854-864.
Ibrahim, J. and Chen, M. (2000). Power prior distributions for regression models. Statistical Science pages 46-60.
Ibrahim, J., Chen, M., and Lipsitz, S. (2001). Missing responses in generalised linear mixed models when the missing data mechanism is nonignorable. Biometrika 88, 551.
Kaciroti, N., Schork, M., Raghunathan, T., and Julius, S. (2009). A Bayesian Sensitivity Model for Intention-to-treat Analysis on Binary Outcomes with Dropouts. Statistics in Medicine 28, 572-585.
Kenward, M. and Molenberghs, G. (1999). Parametric Models for Incomplete Continuous and Categorical Longitudinal Data. Statistical Methods in Medical Research 8, 51.
Kenward, M., Molenberghs, G., and Thijs, H. (2003). Pattern-mixture models with proper time dependence. Biometrika 90, 53-71.
Kurland, B. and Heagerty, P. (2004). Marginalized Transition Models for Longitudinal Binary Data with Ignorable and Non-Ignorable Drop-Out. Statistics in Medicine 23, 2673-2695.
Laird, N. (1988). Missing data in longitudinal studies. Statistics in Medicine 7,.
Land, S., Wieand, S., Day, R., Ten Have, T., Costantino, J., Lang, W., and Ganz, P. (2002). Methodological Issues In the Analysis of Quality of Life Data in Clinical Trials: Illustrations from the National Surgical Adjuvant Breast And Bowel Project (NSABP) Breast Cancer Prevention Trial. Statistical Methods for Quality of Life Studies pages 71-85.
Lee, J. and Berger, J. (2001). Semiparametric Bayesian Analysis of Selection Models. Journal of the American Statistical Association 96, 1397-1409.
Lee, J., Hogan, J., and Hitsman, B. (2008). Sensitivity analysis and informative priors for longitudinal binary data with outcome-related dropout. Technical Report, Brown University.
Lee, K., Daniels, M. J., and Sargent, D. J. (2010). Causal effects of treatments for informative missing data due to progression. To Appear in JASA.
Liang, K.-Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13-22.
Lin, H., McCulloch, C., and Rosenheck, R. (2004). Latent pattern mixture models for informative intermittent missing data in longitudinal studies. Biometrics 60, 295-305.
Little, R. (1993). Pattern-Mixture Models for Multivariate Incomplete Data. Journal of the American Statistical Association 88, 125-134.
Little, R. (1994). A Class of Pattern-Mixture Models for Normal Incomplete Data. Biometrika 81, 471-483.
Little, R. (1995). Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association 90,.
Little, R. and Rubin, D. (1987). Statistical Analysis with Missing Data. Wiley.
Little, R. and Rubin, D. (1999). Comment on Adjusting for Non-Ignorable Drop-Out Using Semiparametric Models by D.O. Scharfstein, A. Rotnitzky and J.M. Robins. Journal of the American Statistical Association 94, 1130-1132.
Little, R. and Wang, Y. (1996). Pattern-mixture models for multivariate incomplete data with covariates. Biometrics 52, 98-111.
Liu, T., Todhunter, R., Lu, Q., Schoettinger, L., Li, H., Littell, R., Burton-Wurster, N., Acland, G., Lust, G., and Wu, R. (2006). Modelling extent and distribution of zygotic disequilibrium: Implications for a multigenerational canine pedigree. Genetics.
Liu, X., Waternaux, C., and Petkova, E. (1999). Influence of Human Immunodeficiency Virus Infection on Neurological Impairment: An Analysis of Longitudinal Binary Data with Informative Drop-Out. Journal of the Royal Statistical Society (Series C): Applied Statistics 48, 103-115.
Molenberghs, G., Kenward, M., and Lesaffre, E. (1997). The Analysis of Longitudinal Ordinal Data with Nonrandom Drop-Out. Biometrika 84, 33-44.
Molenberghs, G. and Kenward, M. G. (2007). Missing Data in Clinical Studies. Wiley.
Molenberghs, G., Michiels, B., Kenward, M., and Diggle, P. (1998). Monotone Missing Data and Pattern-Mixture Models. Statistica Neerlandica 52, 153-161.
Neal, R. (2003). Slice sampling. The Annals of Statistics 31, 705-741.
Neyman, J. (1923). On the application of probability theory to agricultural experiments. Statistical Science 5, 465-472.
Nordheim, E. (1984). Inference from Nonrandomly Missing Categorical Data: an Example From a Genetic Study of Turner's Syndrome. Journal of the American Statistical Association 79, 772-780.
Pauler, D., McCoy, S., and Moinpour, C. (2003). Pattern Mixture Models for Longitudinal Quality of Life Studies in Advanced Stage Disease. Statistics in Medicine 22, 795-809.
Pulkstenis, E., Ten Have, T., and Landis, J. (1998). Model for the Analysis of Binary Longitudinal Pain Data Subject to Informative Dropout Through Remedication. Journal of the American Statistical Association 93, 438-450.
Radloff, L. (1977). The CES-D Scale: A Self-Report Depression Scale for Research in the General Population. Applied Psychological Measurement 1, 385.
Robins, J. (1997). Non-response models for the analysis of non-monotone non-ignorable missing data. Statistics in Medicine 16, 21-37.
Robins, J. (1998). Correction for non-compliance in equivalence trials. Statistics in Medicine 17,.
Robins, J. and Ritov, Y. (1997). Toward a curse of dimensionality appropriate (CODA) asymptotic theory for semi-parametric models. Statistics in Medicine 16, 285-319.
Robins, J., Rotnitzky, A., and Zhao, L. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association 89, 846-866.
Robins, J., Rotnitzky, A., and Zhao, L. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association 90,.
Rotnitzky, A., Robins, J., and Scharfstein, D. (1998a). Semiparametric regression for repeated outcomes with nonignorable nonresponse. Journal of the American Statistical Association 93, 1321-1322.
Rotnitzky, A., Robins, J., and Scharfstein, D. (1998b).
Semiparametric regression for repeated outcomes with nonignorable nonresponse. Journal of the American Statistical Association 93, 13211322. Rotnitzky, A., Scharfstein, D. O., Su, T.L., and Robins, J. M. (2001). Methods for conducting sensitivity analysis of trials with potentially nonignorable competing causes of censoring. Biometrics 57, 103113. Roy, J. (2003). Modeling Longitudinal Data with Nonignorable Dropouts Using a Latent Dropout Class Model. Biometrics 59, 829836. Roy, J. and Daniels, M. J. (2008). A general class of pattern mixture models for nonignorable dropout with many possible dropout times. Biometrics 64, 538545. Rubin, D. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66, 688701. Rubin, D. (1976). Inference and missing data. Biometrika 63, 581592. Rubin, D. (1977). Formalizing subjective notions about the effect of nonrespondents in sample surveys. Journal of the American Statistical Association pages 538543. Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. Wiley. Rubin, D. B. (2000). Causal inference without counterfactuals: comment. Journal of the American Statistical Association pages 435438. Scharfstein, D., Daniels, M., and Robins, J. (2003). Incorporating Prior Beliefs about Selection Bias into the Analysis of Randomized Trials with Missing Outcomes. Biostatistics 4, 495. Scharfstein, D., Halloran, M., Chu, H., and Daniels, M. (2006). On estimation of vaccine efficacy using validation samples with selection bias. Biostatistics 7, 615. Scharfstein, D., Manski, C., and Anthony, J. (2004). On the Construction of Bounds in Prospective Studies with Missing Ordinal Outcomes: Application to the Good Behavior Game Trial. Biometrics 60, 154164. Scharfstein, D., Rotnitzky, A., and Robins, J. (1999). Adjusting for Nonignorable DropOut Using Semiparametric Nonresponse Models. Journal of the American Statistical Association 94, 10961146. 
120 Schulman, K., Berlin, J., Harless, W., Kerner, J., Sistrunk, S., Gersh, B., Dube, R., Taleghani, C., Burke, J., Williams, S., et al. (1999). The effect of race and sex on physicians recommendations for cardiac catheterization. New England Journal of Medicine 340, 61826. Shepherd, B., Gilbert, P., and Mehrotra, D. (2007). Eliciting a Counterfactual Sensitivity Parameter. American Statistician 61, 56. Ten Have, T, Kunselman, A., Pulkstenis, E., and Landis, J. (1998). Mixed effects logistic regression models for longitudinal binary response data with informative dropout. Biometrics 54, 367383. Ten Have, T, Miller, M., Reboussin, B., and James, M. (2000). Mixed Effects Logistic Regression Models for Longitudinal Ordinal Functional Response Data with MultipleCause DropOut from the Longitudinal Study of Aging. Biometrics 56, 279287. Thijs, H., Molenberghs, G., Michiels, B., Verbeke, G., and Curran, D. (2002). Strategies to fit patternmixture models. Biostatistics 3, 245. Troxel, A., Harrington, D., and Lipsitz, S. (1998). Analysis of longitudinal data with nonignorable nonmonotone missing values. Journal of the Royal Statistical Society. Series C (Applied Statistics) 47, 425438. Troxel, A., Lipsitz, S., and Harrington, D. (1998). Marginal models for the analysis of longitudinal measurements with nonignorable nonmonotone missing data. Biometrika 85, 661. Tsiatis, A. A. (2006). Semiparametric theory and missing data. Springer, New York. van der Laan, M. J. and Robins, J. (2003). Unified Methods for Censored Longitudinal Data and Causality. Springer. Vansteelandt, S., Goetghebeur, E., Kenward, M., and Molenberghs, G. (2006a). Ignorance and uncertainty regions as inferential tools in a sensitivity analysis. Statis tica Sinica 16, 953979. Vansteelandt, S., Goetghebeur, E., Kenward, M., and Molenberghs, G. (2006b). Ignorance and uncertainty regions as inferential tools in a sensitivity analysis. Statis tica Sinica 16, 953979. Vansteelandt, S., Rotnitzky, A., and Robins, J. 
(2007). Estimation of regression models for the mean of repeated outcomes under nonignorable nonmonotone nonresponse. Biometrika 94, 841. Wahba, G. (1990). Spline models for observational data. Society for Industrial Mathematics. Wang, C. and Daniels, M. (2009). Discussion of "Missing Data in longitudinal studies: A review" by Ibrahim and Molenberghs. TEST 18, 5158. Wang, C., Daniels, M., D.O., S., and Land, S. (2010). A Bayesian shrinkage model for incomplete longitudinal binary data with application to the breast cancer prevention trial. To Appear in JASA. Wu, M. and Bailey, K. (1988). Analysing changes in the presence of informative right censoring caused by death and withdrawal. Statistics in Medicine 7,. Wu, M. and Bailey, K. (1989). Estimation and comparison of changes in the presence of informative right censoring: conditional linear model. Biometrics pages 939955. Wu, M. and Carroll, R. (1988). Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics 44, 175188. Wulfsohn, M. and Tsiatis, A. (1997). A joint model for survival and longitudinal data measured with error. Biometrics pages 330339. Yuan, Y. and Little, R. J. (2009). Mixedeffect hybrid models for longitudinal data with nonignorable dropout. Biometrics (in press). Zhang, J. and Heitjan, D. (2006). A simple local sensitivity analysis tool for nonignorable coarsening: application to dependent censoring. Biometrics 62, 12601268. Zhang, J. and Rubin, D. (2003). Estimation of Causal Effects via Principal Stratification When Some Outcomes are Truncated by" Death". Journal of Educational and Behavioral Statistics 28, 353. 122 BIOGRAPHICAL SKETCH Chenguang Wang received his bachelor's and master's degrees in computer science from Dalian University of Technology, China. Chenguang later joined the biometry program of and received his master's degree in statistics from University of NebraskaLincoln. 
At the University of Florida, Chenguang majored in statistics while simultaneously working for the Children's Oncology Group Statistics and Data Center (2004-2009) and the Center for Devices and Radiological Health, FDA (2009-2010). Chenguang received his Ph.D. from the University of Florida in the summer of 2010. Chenguang's research has focused on constructing a Bayesian framework for incomplete longitudinal data that identifies the parameters of interest and assesses sensitivity of the inference via incorporating expert opinions. Such a framework can be broadly used in clinical trials to provide health care professionals a more accurate understanding of the statistical or causal relationship between clinical interventions and human diseases. Chenguang is a member of the American Statistical Association, a member of the Eastern North American Region/International Biometric Society, and a member of the Children's Oncology Group.
We consider inference in randomized longitudinal studies with missing data that is generated by skipped clinic visits and loss to follow-up. In this setting, it is well known that full data estimands are not identified unless unverified assumptions are imposed. Sensitivity analysis that assesses the sensitivity of model-based inferences to such assumptions is often necessary.

In Chapters 2 and 3, we posit an exponential tilt model that links non-identifiable distributions and identifiable distributions. This exponential tilt model is indexed by non-identified parameters, which are assumed to have an informative prior distribution, elicited from subject-matter experts. Under this model, full data estimands are shown to be expressed as functionals of the distribution of the observed data. We propose two different saturated models for the observed data distribution, as well as shrinkage priors to avoid the curse of dimensionality. The two procedures provide researchers different strategies for reducing the dimension of parameter space. We assume a non-future dependence model for the dropout mechanism and partial ignorability for the intermittent missingness. In a simulation study, we compare our approach to a fully parametric and a fully saturated model for the distribution of the observed data. Our methodology is motivated by, and applied to, data from the Breast Cancer Prevention Trial.
In Chapter 4, we discuss pattern mixture models. Pattern mixture modeling is a popular approach for handling incomplete longitudinal data. Such models are not identifiable by construction. Identifying restrictions are one approach to mixture model identification (Daniels and Hogan 2008; Kenward et al. 2003; Little 1995; Little and Wang 1996; Thijs et al. 2002) and are a natural starting point for missing not at random sensitivity analysis (Daniels and Hogan 2008; Thijs et al. 2002). However, when the pattern-specific models are multivariate normal (MVN), identifying restrictions corresponding to missing at random may not exist. Furthermore, identification strategies can be problematic in models with covariates (e.g. baseline covariates with time-invariant coefficients). In this paper, we explore conditions necessary for identifying restrictions that result in missing at random (MAR) to exist under a multivariate normality assumption and strategies for identifying sensitivity parameters for sensitivity analysis or for a fully Bayesian analysis with informative priors. A longitudinal clinical trial is used for illustration of sensitivity analysis. Problems caused by baseline covariates with time-invariant coefficients are investigated and an alternative identifying restriction based on residuals is proposed as a solution.

The problem of incomplete data is frequently confronted by statisticians, especially in longitudinal studies. The most common type of incomplete data is missing data, in which each data value is either perfectly known or completely unknown. In other situations, data are partially missing and partially observed. Examples include rounded data and censored data. This type of incomplete data is referred to as coarse data. Missing data can be viewed as a special case of coarse data (Heitjan and Rubin 1991). In both cases, the incompleteness occurs because we observe only a subset of the complete data, which includes the true, unobservable data. In this dissertation, missing data include dropout missingness, in which subjects missing a measurement do not return to the study at the next follow-up, and intermittent missingness, in which the missing visit is followed by an observed measurement.
Little and Rubin 1987, chapter 4); however, this method is inefficient. Another common approach is single imputation, that is, filling in a single value for each missing value. The advantage of single imputation is that it does not delete any units and after the imputation, standard methods for complete

Rubin 1987).

For notation, let $y = \{y_1, \ldots, y_J\}$ denote the full data response vector of outcome, possibly partially observed. Let $r = \{r_1, r_2, \ldots, r_J\}$ denote the missing data indicator, with $r_j = 0$ if $y_j$ is missing and 1 if $y_j$ is observed. Let $x$ denote the covariates. Let $y_{obs}$ and $y_{mis}$ denote the observed and missing response data, respectively. Let $\omega$ be the parameters indexing the full data model $p(y, r)$, $\theta(\omega)$ be the parameters indexing the full data response model $p(y)$, and $\psi(\omega)$ be the parameters indexing the missing data mechanism model $p(r \mid y)$.

The common assumptions about the missing data mechanism are as follows.

Rubin (1976) and Little and Rubin (1987) developed a hierarchy for missing data mechanisms by classifying the relationship between missingness and the response data.

Note that MAR holds if and only if $p(y_{mis} \mid y_{obs}, r) = p(y_{mis} \mid y_{obs})$. The proof is as follows:

Suppose MAR holds. Then we have $p(r \mid y_{mis}, y_{obs}) = p(r \mid y_{obs})$

To show the reverse direction, note that $p(r \mid y_{mis}, y_{obs}) = p(r, y_{mis} \mid y_{obs}) / p(y_{mis} \mid y_{obs})$

Laird 1988). This condition is called ignorability (Rubin 1976).

1. The missing data mechanism is MAR.
2. The parameters of the full data response model, $\theta(\omega)$, and the parameters of the missingness model are distinguishable, i.e. the full data parameter $\omega$ can be decomposed as $(\theta(\omega), \psi(\omega))$.
3. The parameters $\theta(\omega)$ and $\psi(\omega)$ are a priori independent, i.e. $p(\theta(\omega), \psi(\omega)) = p(\theta(\omega))\,p(\psi(\omega))$.

Full data models that do not satisfy Definition 1.4 have non-ignorable missingness.

Kenward et al. (2003) defined the term non-future dependence.

Hogan and Laird 1997b). Likelihood-based models for missing data are distinguished by the way the joint distribution of the outcome and missing data processes is factorized. They can be classified as selection models, pattern mixture models, and shared-parameter models.
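The equivalence between MAR and $p(y_{mis} \mid y_{obs}, r) = p(y_{mis} \mid y_{obs})$ stated above follows from Bayes' rule. A sketch of both directions, in the notation introduced above, may help; the intermediate steps are a reconstruction under the usual definition of MAR, $p(r \mid y_{mis}, y_{obs}) = p(r \mid y_{obs})$, and are not a quotation of the original proof.

```latex
\begin{align*}
\text{MAR} \Rightarrow:\quad
p(y_{mis} \mid y_{obs}, r)
  &= \frac{p(r \mid y_{mis}, y_{obs})\, p(y_{mis} \mid y_{obs})}{p(r \mid y_{obs})}
   = \frac{p(r \mid y_{obs})\, p(y_{mis} \mid y_{obs})}{p(r \mid y_{obs})}
   = p(y_{mis} \mid y_{obs}), \\[4pt]
\Leftarrow:\quad
p(r \mid y_{mis}, y_{obs})
  &= \frac{p(r, y_{mis} \mid y_{obs})}{p(y_{mis} \mid y_{obs})}
   = \frac{p(y_{mis} \mid y_{obs}, r)\, p(r \mid y_{obs})}{p(y_{mis} \mid y_{obs})}
   = p(r \mid y_{obs}).
\end{align*}
```

In words, under MAR the missing responses have the same conditional distribution given the observed responses whether or not they were actually observed.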
Heckman (1979a,b) used a bivariate response $Y$ with missing $Y_2$ as an example and showed that in general it is critical to answer the question "why are the data missing" by modeling the missingness of $Y_{2i}$ as a function of observed $Y_{1i}$ (for subject $i$). Diggle and Kenward (1994) extended the Heckman model to longitudinal studies and modeled the dropout process by logistic regression such as $\operatorname{logit}(r_j = 0 \mid r_{j-1} = 1, y) = y'\psi$.

Albert 2000; Baker 1995; Fitzmaurice et al. 1995; Heagerty 2002; Kurland and Heagerty 2004).

Rubin (1977) introduced the idea of modeling respondents and nonrespondents in surveys separately and using subjective priors to relate respondents' and nonrespondents' model parameters. Little (1993, 1994) explored pattern mixture models in discrete time settings. Specifically, different identifying restrictions (see Section 1.5) were proposed to identify the full data model. When the number of dropout patterns is large and pattern-specific parameters will be weakly identified by identifying restrictions, Roy (2003) and Roy and Daniels (2008) proposed to use a latent class model for dropout classes. When the dropout time is continuous and the mixture of patterns is infinite, Hogan et al. (2004) proposed to model the response given dropout by a varying coefficient model where regression coefficients were unspecified, non-parametric functions of dropout time. For time-event data with informative censoring, Wu and Bailey (1988, 1989) and Hogan and Laird (1997a) developed random effects mixture models. Fitzmaurice and Laird (2000a) generalized the Wu and Bailey and the Hogan and Laird approaches to discrete, ordinal and count data by using generalized linear mixture models and a GEE approach for statistical inference. Daniels and Hogan (2000) proposed a parameterization of the pattern mixture model for continuous data. Sensitivity analysis can be done on the additive (location) and multiplicative (scale) terms. Forster and Smith (1998) considered a pattern mixture model for a single categorical response with categorical covariates. Bayesian approaches were employed for non-ignorable missingness.
Wu and Carroll (1988) presented a shared parameter random effects model for continuous responses and informative censoring, in which individual effects are taken into account as intercepts and slopes for modeling the censoring process. DeGruttola and Tu (1994) extended Wu and Carroll's model to allow general covariates. Follmann and Wu (1995) developed a generalized linear model for the response and proposed an approximation algorithm for the joint full data model for inference. Faucett and Thomas (1996) and Wulfsohn and Tsiatis (1997) proposed to jointly model the continuous covariate over time and relate the covariates to the response simultaneously. Henderson et al. (2000) generalized the joint modeling approach by using two correlated Gaussian random processes for covariates and response. Ten Have et al. (1998, 2000) proposed a shared parameter mixed effects logistic regression model for longitudinal ordinal data. Recently, Yuan and Little (2009) proposed a mixed-effect hybrid model that allows the missingness and response to be conditionally dependent given random effects.

Daniels and Hogan 2008). Full data model inference requires unverifiable assumptions about the extrapolation model $p(y_{mis} \mid y_{obs}, r, \omega_E)$. A sensitivity analysis explores the sensitivity of inferences of interest about the full data response model to unverifiable assumptions about the extrapolation model. This is typically done by varying sensitivity parameters, which we define next (Daniels and Hogan 2008).

1.
2. The observed likelihood $L(\xi_S, \xi_M \mid y_{obs}, r)$ is a constant as a function of $\xi_S$,
3. Given $\xi_S$ fixed, $L(\xi_S, \xi_M \mid y_{obs}, r)$ is a non-constant function of $\xi_M$.

Unfortunately, fully parametric selection models and shared parameter models do not allow sensitivity analysis, as sensitivity parameters cannot be found (Daniels and Hogan 2008, Chapter 8). Examining sensitivity to distributional assumptions, e.g., random effects, will provide different fits to the observed data, $(y_{obs}, r)$. In such cases, a sensitivity analysis cannot be done since varying the distributional assumptions does not provide equivalent fits to the observed data (Daniels and Hogan 2008). It then becomes an exercise in model selection.
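Conditions 2 and 3 above can be made concrete with a toy example. The sketch below assumes a two-visit pattern-mixture model with unit-variance, independent normal outcomes (an illustrative simplification, not the dissertation's model); the function names and the parameter `delta`, which shifts the mean of the unobserved second outcome among dropouts, are hypothetical. The observed-data log-likelihood never uses `delta`, so it is constant in that parameter, while it does change with the identified parameters.

```python
import math


def normal_logpdf(x, mean, var=1.0):
    """Log density of a N(mean, var) distribution evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def observed_loglik(data, phi, mu1_drop, mu1_comp, mu2_comp, delta):
    """Observed-data log-likelihood of a toy two-visit pattern-mixture model.

    data     : list of (y1, y2) pairs; y2 is None for subjects who dropped out
    phi      : probability of dropping out after visit 1
    mu1_drop : mean of y1 among dropouts (identified)
    mu1_comp, mu2_comp : means of y1 and y2 among completers (identified)
    delta    : hypothetical sensitivity parameter, the shift in the mean of
               the unobserved y2 among dropouts; it never enters the sum below
    """
    total = 0.0
    for y1, y2 in data:
        if y2 is None:  # dropout: only y1 contributes
            total += math.log(phi) + normal_logpdf(y1, mu1_drop)
        else:  # completer: both visits contribute
            total += (math.log(1.0 - phi)
                      + normal_logpdf(y1, mu1_comp)
                      + normal_logpdf(y2, mu2_comp))
    return total


def full_data_mean_y2(phi, mu2_comp, delta):
    """Full-data mean of y2: a mixture over the dropout and completer patterns."""
    return phi * (mu2_comp + delta) + (1.0 - phi) * mu2_comp
```

Although the observed-data fit is identical for every value of `delta`, the full-data mean of the second outcome returned by `full_data_mean_y2` moves with it, which is why inference about full-data quantities depends on how such a parameter is fixed or given a prior.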
Fully Bayesian analysis allows researchers to have a single conclusion by admitting prior beliefs about the sensitivity parameters. For continuous responses, Lee and Berger (2001) built a semiparametric Bayesian selection model which has a strong distributional assumption for the response but a weak assumption on the missing data mechanism. Scharfstein et al. (2003), on the other hand, placed strong parametric assumptions on the missing data mechanism but minimal assumptions on the response outcome.

Liang and Zeger (1986) proposed generalized estimating equations (GEE) whose solution

Robins et al. (1995) proposed the inverse probability of censoring weighted generalized estimating equations (IPCW-GEE) approach, which reweights each individual's contribution to the usual GEE by the estimated probability of dropout. IPCW-GEE will lead to consistent estimation when the missingness is MAR. However, both GEE and IPCW-GEE can result in biased estimation under MNAR. Rotnitzky et al. (1998a, 2001), Scharfstein et al. (2003) and Schulman et al. (1999) adopted semiparametric selection modeling approaches, in which the model for dropout is indexed by interpretable sensitivity parameters that express departures from MAR. For such approaches, the inference results depend on the choice of unidentified, yet interpretable, sensitivity analysis parameters.

One approach to handle intermittent missingness is to consider a monotonized dataset, whereby all observed values on an individual after their first missingness are deleted (Land et al. 2002). However, this increases the dropout rate, loses efficiency, and may introduce bias.

Other methods in the literature often adopt a likelihood approach and rely on strong parametric assumptions. For example, Troxel et al. (1998), Albert (2000) and Ibrahim et al. (2001) suggested a selection model approach. Albert et al. (2002) used a shared latent autoregressive process model. Lin et al. (2004) employed a latent class pattern mixture model.

Troxel et al. (1998) and Vansteelandt et al. (2007). Troxel et al. (1998) proposed a marginal model and introduced a pseudo-likelihood estimation procedure. Vansteelandt et al. (2007) extended the ideas of Rotnitzky et al. (1998b), Scharfstein et al.
(1999) and Rotnitzky et al. (2001) to non-monotone missing data that assume (exponentially tilted) extensions of sequential explainability and specified parametric models for certain conditional means.

Most related to the approach we will use in Chapter 3 are the (partial ignorability) assumptions formalized in Harel and Schafer (2009) that partition the missing data and allow one (or more) of the partitions to be ignored given the other partition(s) and the observed data. Specifically, Harel and Schafer (2009) defined a missing data mechanism to be partially missing at random if
$p(r \mid y_{obs}, y_{mis}, g(r), x; \psi(\omega)) = p(r \mid y_{obs}, g(r), x; \psi(\omega))$

Vansteelandt et al. (2007).

In this dissertation, we explicitly partition the missing data indicator vector $r$ into $\{r_s, s\}$, where $s = \max\{t : r_t = 1\}$ denotes the last time point a response was observed, i.e. the survival time, and $r_s = \{r_t : t \ldots\}$

1.7

Little 1993, 1994). Additional assumptions about the missing data process are necessary in order to yield identifying restrictions that equate the inestimable parameters to functions of estimable parameters and identify the full data model.

For example, consider the situation when $y = (y_1, y_2)$ is a bivariate normal response with missing data only in $y_2$. Let $s$ be the survival time, i.e. $s = 1$ if $y_2$ is missing and $s = 2$ if $y_2$ is observed. We model $p(s)$ and $p(y \mid s)$ as $s \sim \mathrm{Bern}(\phi)$ and $y \mid s = i \sim N(\mu^{(s)}, \Sigma^{(s)})$ for $i = 1, 2$, with
$\mu^{(s)} = \begin{bmatrix} \mu_1^{(s)} \\ \mu_2^{(s)} \end{bmatrix}$ and $\Sigma^{(s)} = \begin{bmatrix} \sigma_{11}^{(s)} & \sigma_{12}^{(s)} \\ \sigma_{12}^{(s)} & \sigma_{22}^{(s)} \end{bmatrix}$.

Understanding (identifying) restrictions that lead to MAR is an important first step for sensitivity analysis under missing not at random (MNAR) (Daniels and Hogan 2008; Scharfstein et al. 2003; Zhang and Heitjan 2006). In particular, MAR provides a good starting point for sensitivity analysis, and sensitivity analyses are essential for the analysis of incomplete data (Daniels and Hogan 2008; Scharfstein et al. 1999).

Little (1993) developed several common identifying restrictions. For example, complete case missing value (CCMV) restrictions which equate all missing patterns to the complete cases, i.e. $p_k(y_j \mid \ldots)$

Molenberghs et al. (1998) proved that for discrete time points and monotone missingness, the ACMV constraint is equivalent to missing at random (MAR). Thijs et al.
(2002) developed strategies to apply identifying restrictions. That is, first fit $p_k(\ldots)$

Kenward et al. (2003) discussed identifying restrictions corresponding to missing non-future dependence.

Daniels and Hogan 2008). However, multivariate normality within patterns can be overly restrictive when applying identifying restrictions. We explore such issues in Chapter 4.

Furthermore, identification strategies can be problematic in models with covariates (e.g. baseline covariates with time-invariant coefficients). In this Chapter, we also explore conditions necessary for identifying restrictions that result in missing at random (MAR) to exist under a multivariate normality assumption and strategies for sensitivity analysis. Problems caused by baseline covariates with time-invariant coefficients are investigated and an alternative identifying restriction based on residuals is proposed as a solution.

2.1.1 Breast Cancer Prevention Trial

Fisher et al. 1998). The study was open to accrual from June 1, 1992 through September 30, 1997 and 13,338 women aged 35 or older were enrolled in the study during this interval. The primary objective was to determine whether long-term tamoxifen therapy is effective in preventing the occurrence of invasive breast cancer. Secondary objectives included quality of life (QOL) assessments to evaluate benefit as well as risk resulting from the use of tamoxifen.

Monitoring QOL was of particular importance for this trial since the participants were healthy women and there had been concerns voiced by researchers about the association between clinical depression and tamoxifen use. Accordingly, data on depression symptoms were scheduled to be collected at baseline prior to randomization, at 3 months, at 6 months and every 6 months thereafter for up to 5 years. The primary instrument used to monitor depressive symptoms over time was the Center for Epidemiologic Studies Depression Scale (CES-D) (Radloff 1977). This self-test questionnaire is composed of 20 items, each of which is scored on a scale of 0-3. A score of 16 or higher is considered a likely case of clinical depression.
The trial was unblinded on March 31, 1998, after an interim analysis showed a dramatic reduction in the incidence of breast cancer in the treatment arm. Due to the potential loss of the control arm, we focus on QOL data collected on the 10,982 participants who were enrolled during the first two years of accrual and had their CES-D

In the BCPT, the clinical centers were not required to collect QOL data on women after they stopped their assigned therapy. This design feature aggravated the problem of missing QOL data in the trial. As reported in Land et al. (2002), more than 30% of the CES-D scores were missing at the 36-month follow-up, with a slightly higher percentage in the tamoxifen group. They also showed that women with higher baseline CES-D scores had higher rates of missing data at each follow-up visit and the mean observed CES-D scores preceding a missing measurement were higher than those preceding an observed measurement; there was no evidence that these relationships differed by treatment group.

While these results suggest that the missing data process is associated with observed QOL outcomes, one cannot rule out the possibility that the process is further related to unobserved outcomes and that this relationship is modified by treatment. In particular, investigators were concerned (a priori) that, between assessments, tamoxifen might be causing depression in some individuals, who then do not return for their next assessment. If this occurs, the data are said to be missing not at random (MNAR); otherwise the data are said to be missing at random (MAR).

Land et al. (2002), consider a monotonized dataset, whereby all CES-D scores observed on an individual after their first missing score have been deleted (this increases the dropout rate).

There are two main inferential paradigms for analyzing longitudinal studies with informative dropout: likelihood (parametric) and non-likelihood (semiparametric).
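The monotonization step described above is simple to state in code. The following is a minimal sketch, assuming each subject's CES-D scores are held in a list ordered by scheduled visit, with `None` marking a missing score; the function names and this data layout are hypothetical and not taken from the trial's data files.

```python
def monotonize(scores):
    """Return a copy of `scores` in which every value after the first
    missing entry (None) is also set to None."""
    out = []
    seen_missing = False
    for s in scores:
        if s is None:
            seen_missing = True
        # Once a score is missing, all later scores are discarded.
        out.append(None if seen_missing else s)
    return out


def n_observed(scores):
    """Number of non-missing scores in a subject's record."""
    return sum(s is not None for s in scores)


# A subject who skipped visit 2 but returned afterwards.
record = [12, None, 18, 9]
mono = monotonize(record)
print(mono, n_observed(record), n_observed(mono))
```

For this record the intermittent gap at the second visit turns into dropout after the first visit, and two observed scores are thrown away, which is the loss of information the text refers to.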
Little (1995), Hogan and Laird (1997b) and Kenward and Molenberghs (1999) as well as recent books by Molenberghs and Kenward (2007) and Daniels and Hogan (2008) provide a comprehensive review of likelihood-based approaches, including selection models, pattern-mixture models, and shared-parameter models. These models differ in the way the joint distribution of the outcome and missing data processes is factorized. In selection models, one specifies a model for the marginal distribution of the outcome process and a model for the conditional distribution of the dropout process given the outcome process (see, for example, Albert 2000; Baker 1995; Diggle and Kenward 1994; Fitzmaurice et al. 1995; Heckman 1979a; Liu et al. 1999; Molenberghs et al. 1997); in pattern-mixture models, one specifies a model for the conditional distribution of the outcome process given the dropout time and the marginal distribution of the dropout time (see, for example, Birmingham and Fitzmaurice 2002; Daniels and Hogan 2000; Fitzmaurice and Laird 2000b; Hogan and Laird 1997a; Little 1993, 1994, 1995; Pauler et al. 2003; Roy 2003; Roy and Daniels 2008; Thijs et al. 2002); and in shared-parameter models, the outcome and dropout processes are assumed to be conditionally independent given shared random effects (see, for example, DeGruttola and Tu 1994; Land et al. 2002; Pulkstenis et al. 1998; Ten Have et al. 1998, 2000; Wu and Carroll 1988; Yuan and Little 2009). Traditionally, these models have relied on very strong distributional assumptions in order to obtain model identifiability.

Without these strong distributional assumptions, specific parameters from these models would not be identified from the distribution of the observed data. To address this issue within a likelihood-based framework, several authors (Baker et al.
1992; Daniels and Hogan 2008; Kurland and Heagerty 2004; Little 1994; Little and Rubin 1999; Nordheim 1984) have promoted the use of global sensitivity analysis, whereby non- or weakly identified, interpretable parameters are fixed and then varied to evaluate ...

Non-likelihood approaches to informative dropout in longitudinal studies have been primarily developed from a selection modeling perspective. Here, the marginal distribution of the outcome process is modeled non- or semiparametrically and the conditional distribution of the dropout process given the outcome process is modeled semi- or fully parametrically. In the case where the dropout process is assumed to depend only on observable outcomes (i.e., MAR), Robins et al. (1994, 1995), van der Laan and Robins (2003) and Tsiatis (2006) developed inverse-weighted and augmented inverse-weighted estimating equations for inference. For informative dropout, Rotnitzky et al. (1998a), Scharfstein et al. (1999) and Rotnitzky et al. (2001) introduced a class of selection models in which the model for dropout is indexed by interpretable sensitivity parameters that express departures from MAR. Inference using inverse-weighted estimating equations was proposed.

The problem with the aforementioned sensitivity analysis approaches is that the ultimate inferences can be cumbersome to display. Vansteelandt et al. (2006a) developed a method for reporting ignorance and uncertainty intervals (regions) that contain the true parameter(s) of interest with a prescribed level of precision, when the true data-generating model is assumed to fall within a plausible class of models (as an example, see Scharfstein et al. 2004). An alternative and very natural strategy is to specify an informative prior distribution on the non- or weakly identified parameters and conduct a fully Bayesian analysis, whereby the ultimate inferences are reported in terms of posterior distributions. In the cross-sectional setting with a continuous outcome, Scharfstein et al. (2003) adopted this approach from a semiparametric selection modeling perspective. Kaciroti et al. (2009) proposed a parametric pattern-mixture model for cross-sectional, clustered binary outcomes. Lee et al.
(2008) introduced a fully parametric pattern-mixture approach in the longitudinal setting with binary ...

... Lee et al. (2008), but offer a more flexible strategy. In the context of BCPT, the longitudinal outcome will be the indicator that the CES-D score is 16 or higher.

In Section 2.2, we describe the data structure. In Sections 2.3 and 2.4, we formalize identification assumptions and prove that the full-data distribution is identified under these assumptions. We introduce a saturated model for the distribution of the observed data in Section 2.5. In Section 2.6, we illustrate how to apply shrinkage priors to high-order interaction parameters in the saturated model to reduce the dimensionality of the parameter space and how to elicit (conditional) informative priors for non-identified sensitivity parameters from experts. In Section 2.7, we assess, by simulation, the behavior of three classes of models for the distribution of observed data: parametric, saturated, and shrinkage. Our analysis of the BCPT trial is presented in Section 2.8. Section 2.9 is devoted to a summary and discussion.

Our goal is to draw inference about the treatment-specific depression rates P[Y_j = 1 | Z = z] for j = 1, ..., J and z = 0, 1.

... This assumption asserts that for individuals at risk for dropout at visit j and who share the same history of outcomes up to and including visit j, the distribution of future outcomes is the same for those who are last seen at visit j and those who remain on study past visit j. This assumption has been referred to as non-future dependence (Kenward et al. 2003).

Assumption 2 links the non-identified conditional distribution of Y_j given R_j = 0, R_{j-1} = 1, ...

Suppose that P[Y_j = 1 | R_k = 0, R_{k-1} = 1, ...

Furthermore, we propose to parameterize the functions q_{z,j}( ...

... 2.5 provides a perfect fit to the distribution of the observed data. In this model, however, the number of parameters increases exponentially in J. In contrast, the number of data points increases linearly in J. As a consequence, there will be many ... (Robins and Ritov 1997).

... where t is the order of interactions and the two hyperparameters (shrinkage variances) of order t each follow a Unif(0, 10) distribution.
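The contrast between exponential and linear growth can be made concrete with a small count. The sketch below is illustrative only. It assumes that the saturated model assigns one free response probability and one free dropout probability to every binary outcome history (Y_0, ..., Y_{j-1}) at each visit j = 1, ..., J, plus one baseline probability, and that a first-order Markov model conditions only on Y_{j-1}. The function names are hypothetical, and the exact parameter counts of the model in Section 2.5 may differ.

```python
def saturated_parameter_count(J: int) -> int:
    """Free probabilities per treatment arm under the assumed saturated layout.

    Assumption: at visit j there are 2**j binary histories (Y_0, ..., Y_{j-1}),
    each with its own response probability and its own dropout probability,
    plus a single baseline response probability.
    """
    return 1 + sum(2 * 2 ** j for j in range(1, J + 1))


def markov_parameter_count(J: int) -> int:
    """Free probabilities per arm if response and dropout depend only on Y_{j-1}."""
    histories_per_visit = 2  # Y_{j-1} is 0 or 1
    return 1 + sum(2 * histories_per_visit for _ in range(1, J + 1))


if __name__ == "__main__":
    for J in (3, 5, 7):
        print(J, saturated_parameter_count(J), markov_parameter_count(J))
```

Under these assumptions the saturated count roughly doubles with each added visit while the Markov count grows by a constant, which is the motivation for shrinking the saturated model toward a low-order structure.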
When the first-order Markov model is not true, as n goes to infinity, the posterior means of observed data probabilities will converge to their true values as long as ...

We specify non-informative priors N(0, 1000) for the non-interaction parameters, namely α_{z,j,0} for j = 0, ..., J and z = 0, 1, and α_{z,j,1}, γ_{z,j,0} and γ_{z,j,1} for j = 1, ..., J and z = 0, 1.

... 2.5, are (conditional) odds ratios. In our experience, subject-matter experts often have difficulty thinking in terms of odds ratios; rather, they are more comfortable expressing beliefs about relative risks (Scharfstein et al. 2006; Shepherd et al. 2007). With this in mind, we asked Dr. Patricia Ganz, a medical oncologist and expert on quality-of-life outcomes in breast cancer, to express her beliefs about the risk of dropping out and its relationship to treatment assignment and depression. We then translated her beliefs into prior distributional assumptions about the odds ratio sensitivity parameters.

Specifically, we asked Dr. Ganz to answer the following question for each treatment group: ...

For notational convenience, let r_z(p) denote the relative risk of dropout for treatment group z and dropout probability p. Further, let r_{z,min}(p), r_{z,med}(p) and r_{z,max}(p) denote the elicited minimum, median, and maximum relative risks (see Table 2-1). Let p_{z,j}( ...

By definition, r_z(p_{z,j}( ... for r_z(p_{z,j}( ...

Step 1. For m ∈ {min, med, max}, interpolate the elicited r_{z,m}(p) at different dropout probabilities (see Figure 2-1) to find r_{z,m}(p_{z,j}( ...

Step 2. Construct the prior of r_z(p_{z,j}( ...

Step 3. Construct a conditional prior of p^{(0)}_{z,j}( ... max r_z(p_{z,j}( ... min r_z(p_{z,j}( ... max r_z(p_{z,j}( ...

Step 4. Steps (2) and (3) induce a prior for the sensitivity parameters ... r_z(p_{z,j}( ...

The relative risks elicited from Dr. Ganz are given in Table 2-2. We extrapolated the relative risks outside the ranges given in Table 2-2 as shown in Figure 2-1. Figure 2-2 shows the density of ... given p_{z,j}( ...

1. Using the proposed observed data model with the shrinkage priors, we simulate draws from the posterior distributions of P[Y_j = 1 | R_j = 1, ...

2. For each draw of P[R_j = 0 | R_{j-1} = 1, ...

3.
We compute the depression rates P[Y_j = 1 | Z = z] by plugging the draws of P[Y_j = 1 | R_j = 1, ... 2.4

To sample from the posterior distributions of P[Y_j = 1 | R_j = 1, ...

The shrinkage model uses the shrinkage priors proposed in Section 2.6.1 (shrink the saturated model toward a first-order Markov model). Note that the shrinkage priors shrink the saturated model to an incorrect parametric model.

We simulated observed data from a true parametric model of the following form:

logit P[Y_0 = 1 | R_0 = 1, Z = z] = α_{z,0,0}
logit P[Y_1 = 1 | R_1 = 1, Y_0 = y_0, Z = z] = α_{z,1,0} + α_{z,1,1} y_0
logit P[R_1 = 0 | R_0 = 1, Y_0 = y_0, Z = z] = γ_{z,1,0} + γ_{z,1,1} y_0
logit P[Y_j = 1 | R_j = 1, ...

To determine the parameters of the data-generating model, we fit this model to the monotonized BCPT data in WinBUGS with non-informative priors. We used the posterior means of the parameters α_z and γ_z as the true parameters. We computed the true values of the depression rates by (1) drawing 10,000 values from the elicited conditional prior of the sensitivity parameters given in Table 2-2, (2) computing the depression rates using the identification algorithm in Section 2.4 for each draw, and (3) averaging the resulting values. The model parameters and the true depression rates are given in Table 2-3.

We considered (relatively) small (3000), moderate (5000), and large (10000) sample sizes for each treatment arm; for each sample size, we simulated 50 datasets. We assessed model performance using the mean squared error (MSE) criterion.

In Table 2-4, we report the MSEs of P[Y_j = 1 | R_j = 1, ...

In addition, the MSEs for the shrinkage model compare favorably with those of the true parametric model for all sample sizes considered, despite the fact that the shrinkage priors were specified to shrink toward an incorrect model.

Table 2-5 displays the treatment-specific monotonized dropout rates in the BCPT. By the 7th study visit, more than 40% of patients had missed one or more assessments, with a slightly higher percentage in the tamoxifen arm.

We fit the shrinkage model to the observed data using WinBUGS, with four chains of 8000 iterations and 1000 burn-in. Convergence was checked by examining trace plots of the multiple chains.
As shown in Figure 2-3, the shrinkage model fits the observed data well. Figure 2-4 illustrates the effect of shrinkage on the model fits by comparing the difference between the empirical rate and posterior mean of P[Y_j = 1 | R_j = 1, ...

Figure 2-5 shows the posterior of P[Y_7 = 1 | Z = z], the treatment-specific probability of depression at the end of the 36-month follow-up (solid lines). For comparison, the posterior under MAR (corresponding to point-mass priors at zero for the sensitivity parameters) is also presented (dashed lines). The observed depression rates (i.e., complete-case analysis) were 0.115 on both the placebo and tamoxifen arms. Under the MNAR analysis (using the elicited priors), the posterior means of the depression rates at month 36 were 0.126 (95% CI: 0.115, 0.138) and 0.130 (95% CI: 0.119, 0.143) for the placebo and tamoxifen arms; the difference was 0.004 (95% CI: -0.012, 0.021). Under MAR, the rates were 0.125 (95% CI: 0.114, 0.136) and 0.126 (95% CI: 0.115, 0.138) for the placebo and tamoxifen arms; the difference was 0.001 (95% CI: -0.015, 0.018). The posterior probability of depression was higher under the MNAR analysis than under the MAR analysis since researchers believed depressed patients were more likely to drop out (see Table 2-2), a belief that was captured by the elicited priors. Figure 2-6 shows that there were no significant differences in the depression rates between the two treatments at any time point (95% credible intervals all cover zero) under both MNAR and MAR. Similar (non-significant) treatment differences were seen when examining treatment comparisons conditional on depression status at baseline.

Penalized likelihood (Fan and Li 2001; Green and Silverman 1994; Wahba 1990) is another approach for high-dimensional statistical modeling. There are similarities between the penalized likelihood approach and our shrinkage model. In fact, the shrinkage priors on the saturated model parameters proposed in our approach can be viewed as a specific form for the penalty.
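The correspondence between a shrinkage prior and a penalty can be written out in one line. The sketch below is an illustration under assumptions, not necessarily the exact prior of Section 2.6.1: it supposes independent mean-zero normal priors with variance \sigma_t^2 on the order-t interaction parameters \theta^{(t)}_k, writes \ell(\theta) for the observed-data log-likelihood, and holds the variances fixed.

```latex
\log p(\theta \mid \text{data})
  \;=\; \ell(\theta)
  \;-\; \sum_{t} \frac{1}{2\sigma_t^{2}} \sum_{k} \bigl(\theta^{(t)}_{k}\bigr)^{2}
  \;+\; \text{const}
```

Under these assumptions the posterior mode is a ridge-type penalized likelihood estimate, with a smaller \sigma_t^2 acting as a heavier penalty on order-t interactions. Placing a prior on the variances, rather than fixing them, lets the data inform the amount of penalization.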
The ideas in this paper can be extended to continuous outcomes. For example, one could use the mixtures of Dirichlet processes model (Escobar and West 1995) for the distribution of observed responses. They can also be extended to multiple-cause dropout; in this trial, missed assessments were due to a variety of reasons, including patient-specific causes such as experiencing a protocol-defined event, stopping therapy, or withdrawing consent, and institution-specific causes such as understaffing and staff turnover. Therefore, some missingness is less likely to be informative; extensions will need to account for that. In addition, institutional differences might be addressed by allowing institution-specific parameters with priors that shrink them toward a common set of parameters.

For smaller sample sizes, WinBUGS has difficulty sampling from the posterior distribution of the parameters in the shrinkage model. In addition, the monotonizing approach ignores the intermittent missing data and may lead to biased results. These issues will be examined in the next Chapter.

Relative Risks to be Elicited

Dropout Rate p    100% confident the number is above r_{z,min}(p) ...

Percentiles of Relative Risks Elicited

                          Dropout Rate
Treatment   Percentile    10%     25%
Tamoxifen   Minimum       1.10    1.30
            Median        1.20    1.50
            Maximum       1.30    1.60
Placebo     Minimum       1.01    1.20
            Median        1.05    1.30
            Maximum       1.10    1.40

Simulation Scenario

            Time Point
Parameter   0   1   2   3   4   5   6   7

Simulation Results: MSE (×10^3). P and T represent placebo and tamoxifen arms, respectively.

Model      Treat   Observed            j,z
Shrinkage  P        6.970    1.999    0.033  0.045  0.051  0.059  0.047  0.052  0.066
           T        6.988    2.401    0.024  0.026  0.056  0.053  0.063  0.119  0.073
Saturated  P       35.678   67.171    0.036  0.050  0.054  0.058  0.101  0.231  0.561
           T       34.654   62.606    0.026  0.033  0.045  0.059  0.097  0.329  0.722

Shrinkage  P        4.628    1.188    0.025  0.028  0.032  0.031  0.035  0.048  0.057
           T        4.448    1.414    0.017  0.023  0.028  0.026  0.038  0.033  0.044
Saturated  P       30.274   54.647    0.023  0.028  0.028  0.033  0.061  0.138  0.290
           T       29.599   51.219    0.020  0.020  0.032  0.028  0.051  0.140  0.392

Shrinkage  P        2.392    0.707    0.008  0.010  0.015  0.017  0.014  0.013  0.014
           T        2.474    0.712    0.011  0.015  0.011  0.015  0.016  0.019  0.023
Saturated  P       22.989   37.716    0.009  0.009  0.016  0.018  0.018  0.038  0.094
           T       22.245   34.791    0.011  0.013  0.014  0.014  0.021  0.048  0.128

Table 2-5.
Patients Cumulative Drop-Out Rate

                             Month
                             3      6      12     18     24     30     36
Tamoxifen  Available         5364   4874   4597   4249   3910   3529   3163
           Dropout           490    767    1115   1454   1835   2201   2447
           Drop Rate (%)     9.13   14.30  20.79  27.11  34.21  41.03  45.62
Placebo    Available         5375   4871   4624   4310   3951   3593   3297
           Dropout           504    751    1065   1424   1782   2078   2304
           Drop Rate (%)     9.38   13.97  19.81  26.49  33.15  38.66  42.87

[Figure] Extrapolation of the elicited relative risks.

[Figure] Prior conditional density ...

[Figure] Solid and dashed lines represent the empirical rate of P[Y_j = 1, R_j = 1 | Z = z] and P[R_j = 0 | Z = z], respectively. The posterior means of P[Y_j = 1, R_j = 1 | Z = z] (diamond) and P[R_j = 0 | Z = z] (triangle) and their 95% credible intervals are displayed at each time point.

[Figure] Differences between posterior mean and empirical rate of P[Y_j = 1 | R_j = 1, ...

[Figure] Posterior distribution of P[Y_7 = 1 | Z = z]. Black and gray lines represent tamoxifen and placebo arms, respectively. Solid and dashed lines are for MNAR and MAR, respectively.

[Figure] Posterior mean and 95% credible interval of difference of P[Y_j = 1 | Z = z] between placebo and tamoxifen arms. The gray and white boxes are for MAR and MNAR, respectively.

... Chapter 2 a Bayesian shrinkage approach for longitudinal binary data with informative dropout. The saturated observed data models were constructed sequentially via conditional distributions for response and for dropout time and parameterized on the logistic scale using all interaction terms. However, two issues were not addressed: the ignored intermittent missing data and the intrinsic computational challenge with the interaction parameterization. This Chapter proposes solutions to these two issues.

... Land et al. (2002); we did this in Chapter 2. However, this increases the dropout rate, throws away information and thus loses efficiency, and may introduce bias.

Handling informative intermittent missing data is methodologically and computationally challenging and, as a result, the statistics literature is relatively limited. Most methods adopt a likelihood approach and rely on strong parametric assumptions (see, for example, Albert 2000; Albert et al. 2002; Ibrahim et al. 2001; Lin et al. 2004; Troxel et al. 1998). Semiparametric methods have been proposed by Troxel et al. (1998) and Vansteelandt et al.
(2007). Troxel et al. (1998) proposed a marginal model and introduced a pseudo-likelihood estimation procedure. Vansteelandt et al. (2007) extended the ideas of Rotnitzky et al. (1998b), Scharfstein et al. (1999) and Rotnitzky et al. (2001) to non-monotone missing data.

... Harel and Schafer (2009) that partition the missing data and allow one (or more) of the partitions to be ignored given the other partition(s) and the observed data. In this Chapter, we apply a partial ignorability assumption such that the intermittent missing data mechanism can be ignored given dropout and treatment strata.

... Chapter 2, Section 2.5, WinBUGS has difficulty sampling from the posterior distribution of the parameters when the sample size is relatively small (less than 3000 per arm). Tailored sampling algorithms can be written to overcome this difficulty; however, WinBUGS lacks the flexibility to incorporate modifications and/or extensions to its existing algorithms.

In this Chapter, we will provide an alternative parameterization of the saturated model for the observed data as well as alternative shrinkage prior specifications to improve computational efficiency. This alternative approach to posterior sampling can easily be programmed in R.

In Section 3.2, we describe the data structure, formalize identification assumptions and prove that the treatment-specific distribution of the full trajectory of longitudinal outcomes is identified under these assumptions. In Section 3.3, we introduce a saturated model for the distribution of the data that would be observed when there is dropout, but no intermittent observations. We then introduce shrinkage priors to parameters in the saturated model to reduce the dimensionality of the parameter space. In Section 3.4, we assess, by simulation, the behavior of three classes of models: parametric, saturated, and shrinkage. Our analysis of the BCPT trial is presented in Section 3.5. Section 3.6 is devoted to a summary and discussion.

... Chapter 2, Section 2.2, as well as introduce some additional notation in this Section. The following notation is defined for a random individual. When necessary, we use the subscript i to denote data for the ith individual.
Let Z denote the treatment assignment indicator, where Z = 1 denotes tamoxifen and Z = 0 denotes placebo. Let Y be the complete response data vector with elements Y_j denoting the binary outcome (i.e., depression) scheduled to be measured at the jth visit (j = 0 (baseline), ..., J) and let ...

We will find it useful to distinguish three sets of data for an individual: the complete data C = (Z, S, R_S, Y), the full data F = (Z, S, R_S, ...

We assume that individuals are drawn as a simple random sample from a super-population so that we have an i.i.d. data structure for C, F and O. We let the parameters ... index a model for the joint conditional distribution of S and ...

Our goal is to draw inference about the depression rates P[Y_j = 1 | Z = z] for j = 1, ..., J and z = 0, 1. To identify these rates from the distribution of the observed data, we make the following three (untestable) assumptions: ...

This assumption, plus the assumption that the two sets of parameters are a priori independent, implies that the intermittent missingness mechanism is ancillary or ignorable. Specifically, this means that when considering inferences from a likelihood perspective, as we are in this paper, the conditional distribution of R_S given Z, S and Y_obs does not contribute to the likelihood and can be ignored (Harel and Schafer 2009).

Assumptions 2 and 3 are the same as Assumptions 1 and 2 in Chapter 2, Section 2.3, respectively. We restate the two assumptions below using the survival time S notation (instead of the missing indicators R used in Chapter 2).

... The identifiability result shows that, given the functions q_{z,j}( ...

3.3.1 Modeling

... Chapter 2 as follows: ... for j = 2, ..., J and y = 0, 1.

Let α_z denote the parameters indexing the first set of models, for response, and γ_z denote the parameters indexing the second set of models, for dropout. Recall that we defined ... to denote the parameters of the conditional distribution of S and ...

This saturated model avoids the complex interaction-term model parameterization. As a result, the (conditional) posterior distributions of the model parameters will have simple forms and efficient posterior sampling is possible even when the sample size is moderate or small.
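As an illustration of why such conditional posteriors are simple, the sketch below assumes that a single cell probability (for example, the probability of depression at visit j for one value of the previous outcome) has a Beta(a, b) prior and that the complete data in that cell are binomial counts; the conditional posterior is then again a Beta distribution. The function names and the counts are hypothetical and are not taken from the BCPT analysis.

```python
import random


def beta_posterior(a: float, b: float, successes: int, trials: int):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return a + successes, b + (trials - successes)


def draw_cell_probability(a, b, successes, trials, rng):
    """One Gibbs-style draw of a cell probability from its Beta full conditional."""
    post_a, post_b = beta_posterior(a, b, successes, trials)
    return rng.betavariate(post_a, post_b)


if __name__ == "__main__":
    rng = random.Random(2010)  # fixed seed so the sketch is reproducible
    # Hypothetical cell: 30 depressed out of 250 subjects still on study.
    print(beta_posterior(1.0, 1.0, 30, 250))
    print(draw_cell_probability(1.0, 1.0, 30, 250, rng))
```

Because each draw is a single call to a standard Beta sampler, no tuning or Metropolis step is needed for these parameters, which is consistent with the efficiency gain described above.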
We use the same parameterization of the functions q_{z,j}( ... Chapter 2, Section 2.5 ...

... Chapter 2, the strategy to avoid the curse of dimensionality was to apply shrinkage priors for higher-order interactions to reduce the number of parameters ... 3, we use a different shrinkage strategy. In particular, we propose to use Beta priors for shrinkage as follows: ... for j = 2, ..., J and y = 0, 1. For α_{z,0}, α_{z,1,y} and γ_{z,0,y} for y = 0, 1, we assign Unif(0, 1) priors. Let m^{(α)}_z (m^{(γ)}_z) and η^{(α)}_z (η^{(γ)}_z) denote the parameters m^{(α)}_{z,j,y} (m^{(γ)}_{z,j-1,y}) and η^{(α)}_{z,j,y} (η^{(γ)}_{z,j-1,y}), respectively.

Note that for a random variable X that follows a Beta(m/η, (1-m)/η) distribution, we have E[X] = m and Var[X] = m(1-m) η/(η+1).

We specify independent Unif(0, 1) priors for m^{(α)}_{z,j,y} and m^{(γ)}_{z,j-1,y}. For the shrinkage parameters η^{(α)}_{z,j,y} and η^{(γ)}_{z,j-1,y}, we specify independent, uniform shrinkage priors (Daniels ...) as follows:

η^{(α)}_{z,j,y} ~ g(E^{(α)}_{z,j,y}) / {g(E^{(α)}_{z,j,y}) η^{(α)}_{z,j,y} + 1}^2 and η^{(γ)}_{z,j-1,y} ~ g(E^{(γ)}_{z,j-1,y}) / {g(E^{(γ)}_{z,j-1,y}) η^{(γ)}_{z,j-1,y} + 1}^2, (3)

where ... Christiansen and Morris (1997)).

The expected number of subjects with S ≥ j, Y_{j-1} = y, ... where the probabilities on the right-hand side of the above equations are estimable under Assumption 1.

The expected sample sizes above are used in the prior instead of the observed binomial sample sizes, which are not completely determined due to the intermittent missingness. Thus, our formulation of these priors induces a small additional amount of data dependence beyond its standard dependence on the binomial sample sizes. This additional dependence affects the median of the prior but not its diffuseness.

... Chapter 2, Section 2.6.2 for constructing priors of the sensitivity parameters given ...

... Chapter 2, posterior computations for the observed data model are much easier and more efficient under the reparameterized model 3 and the Beta shrinkage priors. The posterior sampling algorithms can be implemented in R with no sample size restrictions.

The following steps are used to simulate draws from the posterior of the depression rates P[Y_j = 1 | Z = z]:

1. Sample P(..., Y^I_mis | Y_obs, S, R_S, Z = z) using Gibbs sampling with data augmentation (see details in Appendix). Continue sampling until convergence.

2. For each draw of ..._{z,j-1}, ... 2.6.2

3.
Compute the depression rates P[Y_j = 1 | Z = z] by plugging the draws of ... 2.4

... Chapter 2, Section 2.7 to simulate observed data (no intermittent missingness). We again compared the performance of our shrinkage model with (1) a correct parametric model, (2) an incorrect parametric model (first-order Markov model) and (3) a saturated model (with diffuse priors). Our shrinkage model uses the shrinkage priors proposed in Section 3.3.2.

We considered small (500), moderate (2000), large (5000) and very large (1,000,000) sample sizes for each treatment arm; for each sample size, we simulated 500 datasets. We assessed model performance using mean squared error (MSE).

In Table 3-2 (sample size 1,000,000 not shown), we report the MSEs of P[Y_j = 1 | S ≥ j, ...

In addition, the MSEs for the depression rates in the shrinkage model compare favorably with those of the true parametric model for all sample sizes considered, despite the fact that the shrinkage priors were specified to shrink toward an incorrect model.

Table 3-1 displays the treatment-specific dropout and intermittent missing rates in the BCPT. By the 7th study visit (36 months), more than 30% of patients had dropped out in each treatment arm, with a slightly higher percentage in the tamoxifen arm.
... (3) to be the maximum function. To compute the expected number of subjects e^{(α)}_{z,j, ... 3), we assigned a point-mass prior at 0.5 to all m^{(α)}_z, m^{(γ)}_z, η^{(α)}_z and η^{(γ)}_z (which corresponds to Unif(0, 1) priors on ... 3.3.4. To avoid data sparsity, we calculated P[S = s, ...

To assess model fit, we compared the empirical rates and posterior means (with 95% credible intervals) of P[Y_j = 1, S ≥ j | Z = z] and P[S ...

... 3-2 illustrates the effect of shrinkage on the model fit by comparing the difference between the empirical rates and posterior means of P[Y_j = 1 | S ≥ j, ... 2.7, we know that the empirical estimates are less reliable for later time points. Via the shrinkage priors, the probabilities P[Y_j = 1 | S ≥ j, Y_{j-1} = y_{j-1}, ...

... 3-3 shows the posterior of P[Y_7 = 1 | Z = z], the treatment-specific probability of depression at the end of the 36-month follow-up (solid lines). For comparison, the posterior under MAR (corresponding to point-mass priors at zero for the sensitivity parameters) is also presented (dashed lines). The observed depression rates (i.e., complete-case analysis) were 0.124 and 0.112 for the placebo and tamoxifen arms, respectively. Under the MNAR analysis (using the elicited priors), the posterior means of the depression rates at month 36 were 0.133 (95% CI: 0.122, 0.144) and 0.125 (95% CI: 0.114, 0.136) for the placebo and tamoxifen arms; the difference was -0.007 (95% CI: -0.023, 0.008). Under MAR, the rates were 0.132 (95% CI: 0.121, 0.143) and 0.122 (95% CI: 0.111, 0.133) for the placebo and tamoxifen arms; the difference was -0.01 (95% CI: -0.025, 0.005).

... 2-2), a belief that was captured by the elicited priors. Figure 3-4 shows that there were no significant differences in the depression rates between the two treatments at any measurement time (95% credible intervals all cover zero) under both MNAR and MAR. Similar (non-significant) treatment differences were seen when examining treatment comparisons conditional on depression status at baseline.

The posterior mean and between-treatment difference of the depression rate at month 36, with 95% CI, are given in Tables 3-3 and 3-4. None of the scenarios considered resulted in a 95% CI for the difference in rates of depression at 36 months that excluded zero, except for the (extreme) scenario in which the elicited tamoxifen intervals and the elicited placebo intervals were shifted by 0.5 in opposite directions.
We also assessed the impact of switching the priors for the placebo and tamoxifen arms; in this case, the posterior means were 0.135 (95% CI: 0.124, 0.146) and 0.123 (95% CI: 0.112, 0.134) for the placebo and tamoxifen arms respectively, while the difference was -0.012 (95% CI: -0.027, 0.004).

... Chapter 2 for intermittent missingness. In addition, we reparameterized the saturated observed data model and dramatically improved the computational efficiency.

WinBUGS can still be applied for the reparameterized model when there is no intermittent missing data. However, with intermittent missingness, the augmentation step in the posterior computation requires extensive programming in WinBUGS. Nevertheless, the approach in Chapter 2 may still be preferred in certain cases, e.g., for directly shrinking the interaction terms.

As an extension, we might consider alternatives to the partial ignorability assumption (Assumption 1), which has been widely used but questioned by some (Robins 1997).

Missingness by Scheduled Measurement Time

Time Point j (Month) ...

Simulation Results: MSE (×10^3). P and T represent placebo and tamoxifen arms, respectively.
                   Observed             j,z (Month)
Model      Treat   Y        R          1(3)   2(6)   3(12)  4(18)  5(24)  6(30)  7(36)
Shrinkage  P       29.478    2.310     0.202  0.226  0.252  0.303  0.312  0.337  0.372
           T       28.410    2.365     0.212  0.232  0.294  0.336  0.330  0.390  0.419
Saturated  P       57.107  111.263     0.202  0.228  0.302  0.490  1.083  2.401  4.427
           T       55.582  104.882     0.211  0.245  0.383  0.657  1.352  3.167  5.782

Shrinkage  P       23.545    0.647     0.053  0.056  0.063  0.078  0.082  0.084  0.095
           T       22.598    0.615     0.050  0.063  0.063  0.080  0.093  0.095  0.102
Saturated  P       40.322   77.627     0.053  0.057  0.069  0.100  0.188  0.457  0.946
           T       38.943   72.731     0.050  0.064  0.067  0.110  0.218  0.560  1.223

Shrinkage  P       18.83     0.394     0.020  0.024  0.026  0.033  0.031  0.039  0.036
           T       18.055    0.322     0.024  0.024  0.028  0.036  0.034  0.040  0.041
Saturated  P       30.071   54.454     0.020  0.024  0.027  0.038  0.052  0.13   0.270
           T       29.156   50.590     0.024  0.024  0.029  0.039  0.059  0.148  0.373

Sensitivity to the Elicited Prior

Scenario (T: Tamoxifen, P: Placebo) ...

                        10%    25%    10%    25%    10%    25%    10%    25%
Tamoxifen  Minimum      0.79   0.50   1.18   1.46   1.60   1.80   0.60   0.80
           Median       1.20   1.50   1.20   1.50   1.70   2.00   0.70   1.00
           Maximum      1.70   2.00   1.22   1.52   1.80   2.10   0.80   1.10
           P[Y_7 = 1] (95% CI) ...
Placebo    Minimum      0.85   0.80   1.04   1.28   1.51   1.70   0.51   0.70
           Median       1.05   1.30   1.05   1.30   1.55   1.80   0.55   0.80
           Maximum      1.30   1.80   1.06   1.32   1.60   1.90   0.60   0.90
           P[Y_7 = 1] (95% CI) ...

Sensitivity to the Elicited Prior

Scenario (T: Tamoxifen, P: Placebo) ...

                        10%    25%    10%    25%    10%    25%    10%    25%
Tamoxifen  Minimum      0.79   0.50   1.18   1.46   1.60   1.80   0.60   0.80
           Median       1.20   1.50   1.20   1.50   1.70   2.00   0.70   1.00
           Maximum      1.70   2.00   1.22   1.52   1.80   2.10   0.80   1.10
           P[Y_7 = 1] (95% CI) ...
Placebo    Minimum      1.04   1.28   0.85   0.80   0.51   0.70   1.51   1.70
           Median       1.05   1.30   1.05   1.30   0.55   0.80   1.55   1.80
           Maximum      1.06   1.32   1.30   1.80   0.60   0.90   1.60   1.90
           P[Y_7 = 1] (95% CI) ...

[Figure] Solid and dashed lines represent the empirical rate of P[Y_j = 1, S ≥ j | Z = z] and P[S ...

[Figure] (A) The empirical rate and model-based posterior mean of P[Y_j = 1 | S ≥ j, ...

[Figure] Posterior distribution of P[Y_7 = 1 | Z = z]. Black and gray lines represent tamoxifen and placebo arms, respectively. Solid and dashed lines are for MNAR and MAR, respectively.

[Figure] Posterior mean and 95% credible interval of difference of P[Y_j = 1 | Z = z] between placebo and tamoxifen arms. The gray and white boxes are for MAR and MNAR, respectively.
Gibbs sampler for posterior computation: In the first step of the Gibbs sampler, we draw, for each subject with intermittent missing data, from the full conditional of Y^I_mis given α_z, γ_z, m^{(α)}_z, η^{(α)}_z, m^{(γ)}_z, η^{(γ)}_z, Y_obs, S, R_S and Z = z. The full conditional distribution can be expressed as

P[Y^I_mis = y^I_mis | α_z, γ_z, m^{(α)}_z, η^{(α)}_z, m^{(γ)}_z, η^{(γ)}_z, Y_obs = y_obs, S = s, R_S = r_s, Z = z] = P[Y^I_mis = y^I_mis, Y_obs = y_obs, S = s | α_z, γ_z, m^{(α)}_z, η^{(α)}_z, m^{(γ)}_z, η^{(γ)}_z, Z = z] ...

In the second step, we draw from the full conditional of m^{(α)}_z given {Y^I_mis}, α_z, γ_z, η^{(α)}_z, m^{(γ)}_z, η^{(γ)}_z, {Y_obs}, {S}, {R_S} and {Z} = z, where the notation {D} denotes data D for all the individuals on the study. The full conditional can be expressed as

∏_{j=2}^{J} ∏_{y=0}^{1} f(m^{(α)}_{z,j,y} | {Y^I_mis}, α_z, η^{(α)}_{z,j,y}, {Y_obs}, {S}, {Z} = z) ...

In the third step, we draw from the full conditional of m^{(γ)}_z given {Y^I_mis}, α_z, γ_z, η^{(α)}_z, m^{(α)}_z, η^{(γ)}_z, {Y_obs}, {S}, {R_S} and {Z} = z. The full conditional can be expressed as

∏_{j=2}^{J} ∏_{y=0}^{1} f(m^{(γ)}_{z,j-1,y} | {Y^I_mis}, γ_z, η^{(γ)}_{z,j-1,y}, {Y_obs}, {S}, {Z} = z) ...

... g(E^{(α)}_{z,j,y}) / {g(E^{(α)}_{z,j,y}) η^{(α)}_{z,j,y} + 1}^2 ∏_{i: S_i ≥ j, Y_{i,j-1} = y, Z_i = z} B(α_{z,j, ...

... g(E^{(γ)}_{z,j-1,y}) / {g(E^{(γ)}_{z,j-1,y}) η^{(γ)}_{z,j-1,y} + 1}^2 ∏_{i: S_i ≥ j-1, Y_{i,j-1} = y, Z_i = z} B(γ_{z,j-1, ... (Neal 2003).

Finally, we draw from the full conditional of ... given {Y^I_mis}, ..., {Y_obs}, {S}, {R_S} and {Z} = z. The full conditional can be expressed as ∏_{j=2}^{J} ∏_{y=0}^{1} ∏_{all ...

... 3), the posterior distributions of P[Y_j = 1 | S ≥ j, Y_{j-1}, ... 3), for all Z, j and ...

To see this, note that π(p_j | Y, N) ∝ p_j^{Y_j} (1 - p_j)^{n_j - Y_j} ∫ π(p_j | m, η) π(m, η) dm dη.

... Daniels and Hogan 2008; Hogan and Laird 1997b; Kenward and Molenberghs 1999; Little 1995; Molenberghs and Kenward 2007). In this paper, we will concern ourselves with pattern-mixture models with monotone missingness (i.e., dropout). For pattern-mixture models with non-monotone (i.e., intermittent) missingness (details go beyond the scope of this paper), one approach is to partition the missing data and allow one (or more) of the partitions to be ignored given the other partition(s) (Harel and Schafer 2009; Wang et al. 2010).

It is well known that pattern-mixture models are not identified: the observed data do not provide enough information to identify the distributions for incomplete patterns. The use of identifying restrictions that equate the inestimable parameters to functions of estimable parameters is an approach to resolve the problem (Daniels and Hogan 2008; Kenward et al.
2003; Little 1995; Little and Wang 1996; Thijs et al. 2002). Common identifying restrictions include complete case missing value (CCMV) constraints and available case missing value (ACMV) constraints. Molenberghs et al. (1998) proved that for discrete time points and monotone missingness, the ACMV constraint is equivalent to missing at random (MAR), as defined by Rubin (1976) and Little and Rubin (1987). A key and attractive feature of identifying restrictions is that they do not affect the fit of the model to the observed data. Understanding (identifying) restrictions that lead to MAR is an important first step for sensitivity analysis under missing not at random (MNAR) (Daniels and Hogan 2008; Scharfstein et al. 2003; Zhang and Heitjan 2006).

... Daniels and Hogan 2008; Scharfstein et al. 1999; Vansteelandt et al. 2006b).

The normality of response data (if appropriate) for pattern-mixture models is desirable as it easily allows incorporation of baseline covariates and introduction of sensitivity parameters (for MNAR analysis) that have convenient interpretations as deviations of means and variances from MAR (Daniels and Hogan 2008). However, multivariate normality within patterns can be overly restrictive. We explore such issues in this paper.

One criticism of mixture models is that they often induce missing data mechanisms that depend on the future (Kenward et al. 2003). We explore such non-future dependence in our context here and show how mixture models that have such missing data mechanisms have fewer sensitivity parameters.
In Section 4.2, we show conditions under which MAR exists and does not exist when the full-data response is assumed multivariate normal within each missing pattern. In Section 4.3 and Section 4.4, in the same setting, we explore sensitivity analysis strategies under MNAR and under non-future dependent MNAR, respectively. In Section 4.5, we propose a sensitivity analysis approach where only the observed data within pattern are assumed multivariate normal. In Section 4.6, we apply the frameworks described in previous sections to a randomized clinical trial for estimating the effectiveness of recombinant growth hormone for increasing muscle strength in the elderly. In Section 4.7, we show that in the presence of baseline covariates with time-invariant coefficients, standard identifying restrictions cause over-identification of the baseline covariate effects and we propose a remedy. We provide conclusions in Section 8.

We show that MAR does not necessarily exist when it is assumed that ... for all s.

To see this, we introduce some further notation. Let μ^{(s)}(j) = E( ... 4), define

κ^{(s)}_1(j) = Σ^{(s)}_{21}(j) Σ^{(s)}_{11}(j)^{-1}
κ^{(s)}_2(j) = μ^{(s)}_2(j) - κ^{(s)}_1(j) μ^{(s)}_1(j)
κ^{(s)}_3(j) = Σ^{(s)}_{22}(j) - Σ^{(s)}_{21}(j) Σ^{(s)}_{11}(j)^{-1} Σ^{(s)}_{12}(j).

Proof. ... 4.1 is satisfied, then there exists a conditional distribution p_{≥s}(y_j | ... 4) for MAR to exist.

... 4), identification via MAR constraints exists if and only if μ^{(s)} and Σ^{(s)} satisfy Lemma 4.1 for s ≥ j and 1 ... 4.2 are imposed).

We now examine the corresponding missing data mechanism (MDM), S | Y. We use '≃' to denote equality in distribution.

... 4) with monotone dropout, MAR holds if and only if S | Y ≃ S | Y_1.

Proof. ... On the other hand, MAR implies that p(S = s | Y) = p(S = s | Y_obs) = p(S = s | ... 4.2, we have that MAR holds only if for all 1 ...

... 4) with monotone dropout, MCAR is equivalent to MAR if p_s(y_1) = p(y_1) for all s.

Proof. ... 4.3, we showed that MAR holds if p(S = s | Y) = p_s(y_1) ...

... 4) with monotone dropout, MAR constraints are identical to complete case missing value (CCMV) and nearest neighbor constraints (NCMV).

Proof. ... 4.2, the MAR constraints imply p_j(y_j | ...

... 4) and demonstrate that MAR only exists under the fairly strict conditions given in Theorem 1.
... 4.2, we propose to follow the approach in Daniels and Hogan (2008, Chapter 8) and specify distributions of observed Y within pattern as: ... where j⁻ = {1, 2, ..., j-1}. Note that by construction, we assume p_s(y_j | ...

... 4) with monotone dropout, identification via MAR constraints exists if and only if the observed data can be modeled as (4).

Proof. ... 4.2 shows that identification via MAR constraints exists if and only if conditional distributions p_s(y_j | ... 4.6 implies that under the multivariate normality assumption in (4) and the MAR assumption, a sequential specification as in (4) always exists.

We provide some details for MAR in model (4) (which implies the specification in (4) as stated in Corollary 4.6) next. Distributions for missing data (which are not ...

The motivation of the proposed sequential model is to allow a straightforward extension of the MAR specification to a large class of MNAR models indexed by parameters measuring departures from MAR, as well as the attraction of doing sensitivity analysis on means and/or variances in normal models.

For example, we can let β^{(j)}_l = Δ^{(j)}_l + β^{(≥j)}_l and log σ^{(j)}_{j|j⁻} = δ^{(j)} + log σ^{(≥j)}_{j|j⁻} ... (Daniels and Hogan 2008). Indeed, we could make Δ^{(j)}_l and δ^{(j)} independent of j to further reduce the number of sensitivity parameters.

To see the impact of the sensitivity parameters on the MDM, we introduce notation Δ^{(j)}_{j|j⁻} = Δ^{(j)}_0 + Σ_{l=1}^{j-1} Δ^{(j)}_l Y_l and then for k ... (Kenward et al. 2003).

Kenward et al. (2003) showed that non-future dependence holds if and only if for each j ≥ 3 and k ... 4.3).

For example, we may specify distributions Y_obs | S as follows: p_s(y_1) ~ N(μ^{(s)}_1, σ^{(s)}_1), 1 ≤ s ≤ J; p_s(y_j | ... for s ...

Under an MAR assumption (4), for [Y_j | ... 4), the two sensitivity parameters control the departure of the mean and variance from MAR in the following way:

μ^{(s),MNAR}_{j|j⁻} = Δ^{(j)} + μ^{(s),MAR}_{j|j⁻} and σ^{(s),MNAR}_{j|j⁻} = e^{2δ^{(j)}} σ^{(s),MAR}_{j|j⁻} + (1 - e^{2δ^{(j)}}) M, ...

By assuming non-future dependence, we obtain p_s(y_j | ... 4) for the current data (j = s + 1). The number of sensitivity parameters in this setup is reduced from J(J-1) to (J-2)(J-1); so, for J = 3 (6), from 6 (30) to 2 (20). Further reductions are illustrated in the next section.
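The counts quoted above can be checked directly. The helper below is a hypothetical illustration that simply evaluates the two formulas from the text, J(J-1) without and (J-2)(J-1) with the non-future dependence assumption; the function name is not from the original analysis.

```python
def n_sensitivity_parameters(J: int, non_future_dependence: bool) -> int:
    """Number of sensitivity parameters for J scheduled measurement times.

    Evaluates the formulas quoted in the text:
    J(J-1) in general, (J-2)(J-1) under non-future dependence.
    """
    if J < 2:
        raise ValueError("J must be at least 2")
    if non_future_dependence:
        return (J - 2) * (J - 1)
    return J * (J - 1)


if __name__ == "__main__":
    for J in (3, 6):
        general = n_sensitivity_parameters(J, non_future_dependence=False)
        reduced = n_sensitivity_parameters(J, non_future_dependence=True)
        print(f"J={J}: {general} -> {reduced}")
```

For J = 3 and J = 6 this reproduces the reductions from 6 to 2 and from 30 to 20 stated above; the saving is always 2(J-1) parameters.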
... 4.4 and 4.5 that assume multivariate normality for the full-data response within pattern (MVN) or multivariate normality for the observed-data response within pattern (OMVN). We assume non-future dependence for the missing data mechanism to minimize the number of sensitivity parameters.

The growth hormone (GH) trial was a randomized clinical trial conducted to estimate the effectiveness of recombinant human growth hormone therapy for increasing muscle strength in the elderly. The trial had four treatment arms: placebo (P), growth hormone only (G), exercise plus placebo (EP), and exercise plus growth hormone (EG). Muscle strength, here mean quadriceps strength (QS), measured as the maximum foot-pounds of torque that can be exerted against resistance provided by a mechanical device, was measured at baseline, 6 months and 12 months. There were 161 participants enrolled on this study, but only (roughly) 75% of them completed the 12-month follow-up. Researchers believed that dropout was related to the unobserved strength measures at the dropout times.

For illustration, we confine our attention to the two arms using exercise: exercise plus placebo (EP) and exercise plus growth hormone (EG). Table 4-1 contains the observed data.

Let (Y_1, Y_2, Y_3) denote the full-data response corresponding to baseline, 6 months, and 12 months. Let Z be the treatment indicator (1 = EG, 0 = EP). Our goal is to draw inference about the mean difference of QS between the two treatment arms at month ...

... This reduces the set of sensitivity parameters to {Δ^{(2)}_0, Δ^{(3)}_0} for the MVN model and {Δ^{(2)}, Δ^{(3)}} for the OMVN model.

There are a variety of ways to specify priors for the sensitivity parameters Δ^{(2)}_0 and Δ^{(3)}_0, where

Δ^{(2)}_0 = E(Y_2 | Y_1, S = 1) - E(Y_2 | Y_1, S ≥ 2)
Δ^{(3)}_0 = E(Y_3 | Y_2, Y_1, S = 2) - E(Y_3 | Y_2, Y_1, S = 3).
Based on discussion with investigators, we made the assumption that dropouts do worse than completers; thus, we restrict the Δ's to be less than zero. To do a fully Bayesian analysis to fairly characterize the uncertainty associated with the missing data mechanism, we assume a uniform prior for the Δ's as a default choice. Subject matter considerations gave an upper bound of zero for the uniform distributions. We set

We fit the model using WinBUGS, with multiple chains of 25,000 iterations and 4000 burn-in. Convergence was checked by examining trace plots of the multiple chains.

The results of the MVN model, OMVN model, and the observed data analysis are presented in Table 4-2. Under MNAR, the posterior mean (posterior standard deviation) of the difference in quadriceps strength at 12 months between the two treatment arms was 4.0 (8.9) and 4.4 (10) for the MVN and OMVN models, respectively. Under MAR the differences were 5.4 (8.8) and 5.8 (9.9) for the MVN and OMVN models, respectively. The smaller differences under MNAR were due to quadriceps strength at 12 months being lower under MNAR, a consequence of the assumption that dropouts do worse than completers. We conclude that the treatment difference was not significantly different from zero.

For (4) to hold for all Y1 and X, we need that (1)=(2).

Little and Wang 1996; Wang and Daniels 2009).

To resolve the overidentification issue, we propose to apply MAR constraints on residuals instead of directly on the responses. In the bivariate case, the corresponding restriction is

4) places no constraints on (s), thus avoiding overidentification.

The MDM corresponding to the ACMV (MAR) on the residuals is given by

logP(S=1jY,X) 1(X)1 2((1B)2X((2)(2)T(1)(1)T)XT2(1B)(Y2(Y1))X((2)(1)))1 2log(2)11

4) implies MAR if and only if (2)=(1). So in general, MAR on residuals does not imply that missingness in Y2 is MAR. However, it is an identifying restriction that does not affect the fit of the model to the observed data. CCMV and NCMV restrictions can be applied similarly to the residuals.
For the missing data, the conditional distributions are specified as ps(yj|

The conditional mean structures in (4) and (4) induce the following form for the marginal mean response

E(Yj | S = s) = U(s)j + X(s),

Note that since Y1 is always observed, (s) (1 ≤ s ≤ J) are identified by the observed data. However, in the model given by (4) and (4), there is a two-fold overidentification of (s) under MAR:

1. For MAR constraints to exist under the model given in (4), (s)jjj as defined in (4) must be equal for 2 ≤ j ≤ s ≤ J and for all X. This requires that (s)= for 2 ≤ j ≤ s ≤ J.

2. MAR constraints also imply that (s)jjj as defined in (4) must be equal to (j)jjj for 1 ≤ s

With the conditional mean structures specified as (4) and (4), the MAR on the residuals restriction places no assumptions on (s).

The corresponding MDM is log P(S = s | Y, X)

However, a simple pattern mixture model based on multivariate normality for the full-data response within patterns does not allow MAR without special restrictions that themselves induce a very restrictive missing data mechanism. We explored this

In addition, we showed that when introducing baseline covariates with time-invariant coefficients, standard identifying restrictions result in overidentification of the model. This is against the principle of applying identifying restrictions, in that they should not affect the model fit to the observed data. We proposed a simple alternative set of restrictions based on residuals that can be used as an 'identification' starting point for an analysis using mixture models.

In the growth hormone study data example, we showed how to reduce the number of sensitivity parameters in practice and a default way to construct informative priors for sensitivity parameters based on limited knowledge about the missingness. In particular, all the values in the range, D, were weighted equally via a uniform distribution. If there is additional external information from expert opinion or historical data, informative priors may be used to incorporate such information (for example, see Ibrahim and Chen 2000; Wang et al. 2010). Finally, an important consideration in sensitivity analysis and constructing informative priors is that they should avoid extrapolating missing values outside of a reasonable range (e.g., negative quadriceps strength).
Table 4-1. Growth Hormone Study: Sample mean (standard deviation) stratified by dropout pattern.

Treatment   Dropout Pattern   Number of Participants   Month 0    Month 6    Month 12
EG          1                 12                       58 (26)
            2                 4                        57 (15)    68 (26)
            3                 22                       78 (24)    90 (32)    88 (32)
            All               38                       69 (25)    87 (32)    88 (32)
EP          1                 7                        65 (32)
            2                 2                        87 (52)    86 (51)
            3                 31                       65 (24)    81 (25)    73 (21)
            All               40                       66 (26)    82 (26)    73 (21)

Table 4-2. Growth Hormone Study: Posterior mean (standard deviation)

                        Observed     MAR Analysis            MNAR Analysis
Treatment   Month       Data         MVN         OMVN        MVN         OMVN
EP          0           66 (9.9)     66 (6.0)    66 (6.0)    66 (6.0)    66 (6.0)
            6           82 (18)      82 (5.9)    81 (8.2)    80 (6.0)    80 (8.3)
            12          72 (3.8)     73 (4.9)    73 (6.1)    72 (5.0)    71 (6.1)
EG          0           69 (7.3)     69 (4.9)    69 (4.9)    69 (4.9)    69 (4.9)
            6           87 (16)      81 (6.8)    82 (7.7)    78 (7.1)    79 (8.0)
            12          88 (6.8)     78 (7.2)    79 (7.8)    76 (7.5)    76 (8.0)
Difference at 12 mos.   12 (7.8)     5.4 (8.8)   5.8 (9.9)   4.0 (8.9)   4.4 (10)

Missing data mechanism under missing not at random and multivariate normality: The MDM in Section 4.3 is derived as follows:

logP(S=sjY) 2exp((Y1(k)1)2 2exp((Yl(l)ljl(l)ljl)2

4.5 are derived as follows: (s),MNARjjj=E(Yjj

4.6): We specify a pattern mixture model with sensitivity parameters for the two treatment arms. For compactness, we suppress subscript treatment indicator z from all the parameters in the following models.

For missing pattern S, we specify S ~ Mult()

4.7.2), the MAR on the residuals restriction puts no constraints on (s).

Let [ZjjS]'YjX(s). The MAR on the residuals constraints are pk(zjj

2(s)ljlyl(l)lX(s)Pj1t=1(j)t(yt(l)tX(s))2 q 2(s)ljlzl(l)lPj1t=1(j)t(zt(l)t2 q

4) thus imply (s)j,l=(j)j,l (s)j=(j)j+j1Xl=1(j)j,l((s)l(j)l) (j)jjj=(j)jjj,

Farnir et al. 2002; Liu et al. 2006). In such analysis, researchers are often interested in simultaneously estimating linkage, which describes the tendency of certain alleles to be inherited together, and linkage disequilibrium (LD), which measures the non-random association between different markers. However, if the LD is close to zero, the linkage recombination fraction is hard to estimate (or not estimable at all). We use a two-marker scenario to illustrate this dilemma.

where D is the LD parameter. By simple algebra, we can show that D = p11 p00 - p10 p01.
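The LD identity D = p11 p00 - p10 p01 can be checked numerically. The sketch below is a minimal illustration, assuming p11, p10, p01 and p00 are the four two-marker haplotype frequencies and sum to one. The function names and the example frequencies are hypothetical. It also computes the equivalent form p11 - pA pB, where pA = p11 + p10 and pB = p11 + p01 are the marginal allele frequencies.

```python
def linkage_disequilibrium(p11, p10, p01, p00):
    """Return D = p11*p00 - p10*p01 for two-marker haplotype frequencies."""
    total = p11 + p10 + p01 + p00
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    return p11 * p00 - p10 * p01


def ld_from_marginals(p11, p10, p01, p00):
    """Equivalent form: D = p11 - pA*pB, with pA and pB the allele frequencies."""
    p_a = p11 + p10  # frequency of allele 1 at the first marker
    p_b = p11 + p01  # frequency of allele 1 at the second marker
    return p11 - p_a * p_b


if __name__ == "__main__":
    freqs = (0.4, 0.1, 0.1, 0.4)  # hypothetical haplotype frequencies
    print(linkage_disequilibrium(*freqs), ld_from_marginals(*freqs))
```

When the four haplotypes are equally frequent, both forms return zero, which is the case in which the recombination fraction becomes hard to estimate.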
One possible solution is to incorporate more markers in the linkage analysis. By doing this, the number of parents with no less than two heterozygous markers increases. Consequently, more offspring contribute to the likelihood for estimating the linkage recombination fraction. However, the number of haplotype frequencies to be estimated also increases (exponentially) as the number of markers increases. Bayesian shrinkage methods can be applied to address this problem.

5.3.1 Causal Inference Introduction

Rubin 1974). However, in non-randomized trials or in the presence of missing data, these methods are limited if the research interest demands estimation of causally interpretable effects.

To define causal effects, we first introduce the concept of potential outcomes, a term sometimes used interchangeably with counterfactual (but not always, see Rubin 2000). The use of the term potential outcome can be traced at least to Neyman (1923). Neyman used potential yields Uik to indicate the yield of a plot k if exposed to a variety i. Rubin (1974) defines the causal effect of one treatment, E, over another, C, for a particular unit as the difference between what would have happened if the unit had been exposed to E, namely Y(E), and what would have happened if the unit had been exposed to C, namely Y(C).

Using potential outcomes, Frangakis and Rubin (2002) introduce a framework for comparing treatment effects based on principal stratification, which is a cross-classification of units defined by their potential outcomes with respect to (post) treatment variables, such as treatment noncompliance or dropout. The treatment comparison adjustment for post-treatment variables is necessary because such variables encode the characteristics of both the treatment and the patient. For example, a patient with diagnosed cancer in a cancer prevention trial may have depression caused by the treatment or by the diagnosis

A stratum A is defined by the joint potential response S(Z) with respect to the post-treatment variable Z (e.g., Z = 0, 1). For example, let S(Z) be the potential survival status and let S(Z) = 1 and 0 denote alive and dead respectively. Then the stratum A = {S(0) = 1, S(1) = 1} defines the patients who will (potentially) survive on both arms.
A stratum is unaffected by treatment. That is, for subject i, i ∈ A or i ∉ A does not depend on the actual treatment i is assigned. Consequently, the treatment effect defined as the difference between {Yi(0) | i ∈ A} and {Yi(1) | i ∈ A}

On the contrary, a standard adjustment for post-treatment variables uses the treatment comparison between {Yi(0) | Si(0) = s} and {Yi(1) | Si(1) = s}.

Consistent with Frangakis and Rubin's framework, Rubin (2000) introduced the concept of the survivors average causal effect (SACE), that is, the causal effect of treatment on endpoints that are defined only for survivors, i.e. the group of patients who would live regardless of their treatment assignment.

Within the principal strata framework, the identification of SACE or other principal stratum causal effects usually depends on untestable assumptions. To address the

Zhang and Rubin (2003) derived large sample bounds for causal effects without assumptions and with assumptions such as monotonicity on death rate on different treatment arms. Gilbert et al. (2003) used a class of logistic selection bias models to identify the causal estimands and carried out sensitivity analysis for the magnitude of selection bias. Hayden et al. (2005) assumed explainable nonrandom noncompliance (Robins 1998) and outlined a sensitivity analysis for exploring the robustness of the assumption. Cheng and Small (2006) derived sharp bounds for the causal effects and constructed confidence intervals to cover the identification region. Egleston et al. (2007) proposed a similar method to Zhang and Rubin (2003), but instead of identifying the full joint distribution of potential outcomes, they only identify features of the joint distribution that are necessary for identifying the SACE estimand. Lee et al. (2010) replaced the common deterministic monotonicity assumption by a stochastic one that allows incorporation of subject-specific effects and generalized the assumptions to more complex trials.
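To make the contrast between the SACE and the standard survivor comparison concrete, the sketch below uses a tiny, entirely hypothetical table in which both potential outcomes are known for every unit (which is never the case in practice). The tuple layout, function names and numbers are assumptions for illustration only.

```python
from statistics import mean

# Each unit: (S(0), S(1), Y(0), Y(1)). S = 1 means alive under that arm;
# Y is None when the outcome is undefined because the unit would not survive.
UNITS = [
    (1, 1, 10.0, 12.0),  # always-survivor
    (1, 1, 20.0, 23.0),  # always-survivor
    (0, 1, None, 5.0),   # survives only under treatment
    (1, 0, 30.0, None),  # survives only under control
    (0, 0, None, None),  # never survives
]


def sace(units):
    """Average of Y(1) - Y(0) within the stratum {S(0) = 1, S(1) = 1}."""
    effects = [y1 - y0 for s0, s1, y0, y1 in units if s0 == 1 and s1 == 1]
    return mean(effects)


def naive_survivor_contrast(units):
    """Mean of Y(1) among S(1) = 1 minus mean of Y(0) among S(0) = 1."""
    treated = [y1 for _, s1, _, y1 in units if s1 == 1]
    control = [y0 for s0, _, y0, _ in units if s0 == 1]
    return mean(treated) - mean(control)


if __name__ == "__main__":
    print("SACE:", sace(UNITS))
    print("Naive survivor contrast:", naive_survivor_contrast(UNITS))
```

In this toy table the two quantities differ in sign, because the survivors under each arm are drawn from different mixtures of strata. Only the first comparison is made within a group whose membership does not depend on treatment.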
We consider a controlled randomized clinical study with treatment arm (Z = 1) and control arm (Z = 0). A longitudinal binary outcome Y is scheduled to be measured at visits j = 1, ..., J, i.e. Y = (Y1, ..., YJ) is a J-dimensional vector. Let R = (R1, ..., RJ) be the missing indicator vector with Rj = 1 if Yj is observed and Rj = 0 if Yj is missing. We assume the missingness is monotone.

We assume there are multiple events that will cause dropout for a patient on this trial, and categorize the events as non-response events (e.g. death) and missing events (e.g. withdrawal of consent). We assume that non-response events may happen after the

Let C denote the survival time for a patient. That is, C = c implies that a non-response event happened to the patient between visit c and c + 1 and caused the patient to drop out on and after visit c + 1. Let Rc = {R1, ..., RC} be the missing data indicator recorded prior to patient dropout that is caused by a non-response event.

We use

The full data F of a patient thus consists of {Z, C(0),

for all j. Note that the group of patients of interest {C(0) ≥ j, C(1) ≥ j} form a principal stratum.

3, Section 3.2, that Rc ?

3 that j,z,c is identified by the observed data under this partial missing at random assumption.

5) is not identifiable from the observed data O = {Z, C(Z), Rc(Z), Yobs(Z)}.
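The monotone missingness assumption made at the start of this section means that once a visit is missed, every later visit is missed as well. A minimal sketch of that check follows, assuming R is coded as a list of 0/1 indicators as defined above. The function names are hypothetical, and the last-observed-visit helper is an illustrative convenience, not a quantity defined in the text.

```python
def is_monotone(r):
    """Return True if the 0/1 indicator vector r never returns to 1 after a 0."""
    seen_missing = False
    for rj in r:
        if rj not in (0, 1):
            raise ValueError("indicators must be 0 or 1")
        if rj == 0:
            seen_missing = True
        elif seen_missing:
            return False  # observed again after a missed visit: intermittent
    return True


def last_observed_visit(r):
    """For a monotone pattern, the number of observed visits (0 if none)."""
    if not is_monotone(r):
        raise ValueError("pattern is not monotone")
    return sum(r)


if __name__ == "__main__":
    print(is_monotone([1, 1, 0, 0]), last_observed_visit([1, 1, 0, 0]))
    print(is_monotone([1, 0, 1, 0]))
```

Under monotone missingness the whole indicator vector is summarized by a single number, the last observed visit, which is why dropout patterns can be indexed by one variable.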
I

Let Z = (Z1, ..., ZN) be the vector of treatment assignment for all the patients. SUTVA means

\[ Z_i = Z'_i \Rightarrow (Y_i(Z_i), C_i(Z_i)) = (Y_i(Z'_i), C_i(Z'_i)), \]

II

III

This assumption provides an ordering of the mean potential response at visit j under treatment z for all the principal cohorts of individuals who would be on study at visit j under treatment z. The means are assumed to not be worse for cohorts who remain on study longer under both treatments. That is, the individuals who would be last seen at time c' (c' ≥ j) under treatment z and time t' under treatment 1 - z will not have a worse mean potential response at time j under treatment z than individuals

The mean monotonicity assumption is often reasonable in clinical studies. For example, in a cardiovascular stent implantation trial, multiple endpoints including all-cause mortality-free survival and 6-minute walk test score are used to evaluate the effectiveness of the device. Since the two endpoints are positively correlated, it is plausible to assume that patients will potentially perform better with their 6-minute walk tests if they have a longer survival time, i.e. remain on the study longer.

We introduce some further notation

1.
2.
3.
4.

Note that under Assumption II (randomization), both z,c = P(C(z) = c) = P(C = c | Z = z) and j,z,c = E[Yj(z) | C(z) = c] = E[Yj | C = c, Z = z] are identified by the observed data under the partial missing at random assumption.

The causal effect of interest SACEj can be expressed as

The boundaries of SACEj in (5) can be found subject to the following restrictions:

1.
2.
3.
4.

III

Finding the boundaries of the SACE, i.e. finding the minimum and the maximum of the objective function (5), can be approximated (by ignoring the normalizing constant) as a non-convex quadratically constrained quadratic problem (QCQP) (Boyd and Vandenberghe 1997, 2004). For a QCQP, a standard approach is to optimize a semidefinite relaxation of the QCQP and get lower and upper bounds on the local optima of the objective function (Boyd and Vandenberghe 1997).
The uncertainty of the estimated bounds can be characterized in a Bayesian framework. The joint posterior distribution of the bounds can be constructed by implementing the optimization for each posterior sample of j,z,c, identified by the algorithm proposed in Section 5.3.3. The result can be presented as in Figure 5-1. A study decision might be based on the mode of the posterior joint distribution of the bounds.

5-2). These assumptions, when reasonable, will simplify the optimization of the objective function and yield more precise results.

1. That is, given a patient will survive until time point c on the treatment arm, the probability the patient will survive until time point n - 1 is q times the probability that the patient will survive until n for nc on the placebo arm. The parameter q is a sensitivity parameter.

2.

5-2 is zero.

These assumptions may be incorporated in the optimization Bayesian framework to improve the precision of the posterior joint distribution of the bounds.

Figure 5-1. Contour and Perspective Plots of a Bivariate Density

Figure 5-2. Illustration of pc,t

REFERENCES

Albert, P. (2000). A Transitional Model for Longitudinal Binary Data Subject to Nonignorable Missing Data. Biometrics 56, 602.

Albert, P., Follmann, D., Wang, S., and Suh, E. (2002). A latent autoregressive model for longitudinal binary data subject to informative missingness. Biometrics 58, 631.

Baker, S. (1995). Marginal regression for repeated binary data with outcome subject to nonignorable nonresponse. Biometrics 51, 1042.

Baker, S., Rosenberger, W., and DerSimonian, R. (1992). Closed-form estimates for missing counts in two-way contingency tables. Statistics in Medicine 11, 643.

Birmingham, J. and Fitzmaurice, G. (2002). A Pattern Mixture Model for Longitudinal Binary Responses with Nonignorable Nonresponse. Biometrics 58, 989.

Boyd, S. and Vandenberghe, L. (1997). Semidefinite programming relaxations of non-convex problems in control and combinatorial optimization. Communications, Computation, Control and Signal Processing: A Tribute to Thomas Kailath.

Boyd, S. and Vandenberghe, L. (2004). Convex optimization. Cambridge Univ Pr.
Cheng, J. and Small, D. (2006). Bounds on causal effects in three-arm trials with noncompliance. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68, 815.

Christiansen, C. and Morris, C. (1997). Hierarchical Poisson regression modeling. Journal of the American Statistical Association pages 618.

Daniels, M. (1999). A prior for the variance in hierarchical models. Canadian Journal of Statistics 27.

Daniels, M. and Hogan, J. (2000). Reparameterizing the Pattern Mixture Model for Sensitivity Analyses Under Informative Dropout. Biometrics 56, 1241.

Daniels, M. and Hogan, J. (2008). Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. Chapman & Hall/CRC.

DeGruttola, V. and Tu, X. (1994). Modelling Progression of CD4-lymphocyte Count and its Relationship to Survival Time. Biometrics 50, 1003.

Diggle, P. and Kenward, M. (1994). Informative Drop-out in Longitudinal Data Analysis. Applied Statistics 43, 49.

Egleston, B. L., Scharfstein, D. O., Freeman, E. E., and West, S. K. (2007). Causal inference for non-mortality outcomes in the presence of death. Biostatistics 8, 526–545.

Fan, J. and Li, R. (2001). Variable Selection Via Nonconcave Penalized Likelihood and Its Oracle Properties. Journal of the American Statistical Association 96, 1348.

Farnir, F., Grisart, B., Coppieters, W., Riquet, J., Berzi, P., Cambisano, N., Karim, L., Mni, M., Moisio, S., Simon, P., et al. (2002). Simultaneous mining of linkage and linkage disequilibrium to fine map quantitative trait loci in outbred half-sib pedigrees: revisiting the location of a quantitative trait locus with major effect on milk production on bovine chromosome 14. Genetics 161, 275.

Faucett, C. and Thomas, D. (1996). Simultaneously modelling censored survival data and repeatedly measured covariates: a Gibbs sampling approach. Statistics in Medicine 15.
Fisher, B., Costantino, J., Wickerham, D., Redmond, C., Kavanah, M., Cronin, W., Vogel, V., Robidoux, A., Dimitrov, N., Atkins, J., Daly, M., Wieand, S., Tan-Chiu, E., Ford, L., Wolmark, N., and other National Surgical Adjuvant Breast and Bowel Project Investigators (1998). Tamoxifen for prevention of breast cancer: report of the National Surgical Adjuvant Breast and Bowel Project P-1 study. Journal of the National Cancer Institute 90, 1371.

Fitzmaurice, G. and Laird, N. (2000a). Generalized linear mixture models for handling nonignorable dropouts in longitudinal studies. Biostatistics 1, 141.

Fitzmaurice, G. and Laird, N. (2000b). Generalized Linear Mixture Models for Handling Nonignorable Dropouts in Longitudinal Studies. Biostatistics 1, 141.

Fitzmaurice, G., Molenberghs, G., and Lipsitz, S. (1995). Regression Models for Longitudinal Binary Responses with Informative Drop-Outs. Journal of the Royal Statistical Society. Series B. Methodological 57, 691.

Follmann, D. and Wu, M. (1995). An approximate generalized linear model with random effects for informative missing data. Biometrics pages 151.

Forster, J. and Smith, P. (1998). Model-Based Inference for Categorical Survey Data Subject to Non-Ignorable Non-Response. Journal of the Royal Statistical Society: Series B: Statistical Methodology 60, 57.

Frangakis, C. E. and Rubin, D. B. (2002). Principal stratification in causal inference. Biometrics 58, 21.

Gilbert, P., Bosch, R., and Hudgens, M. (2003). Sensitivity analysis for the assessment of causal vaccine effects on viral load in HIV vaccine trials. Biometrics 59, 531.

Green, P. J. and Silverman, B. (1994). Nonparametric Regression and Generalized Linear Models. Chapman & Hall.

Hayden, D., Pauler, D., and Schoenfeld, D. (2005). An estimator for treatment comparisons among survivors in randomized trials. Biometrics 61, 305.

Heagerty, P. (2002). Marginalized transition models and likelihood inference for longitudinal categorical data. Biometrics pages 342.

Heckman, J. (1979a). Sample Selection Bias as a Specification Error. Econometrica 47, 153.

Heckman, J. (1979b). Sample selection bias as a specification error. Econometrica: Journal of the Econometric Society pages 153.

Heitjan, D. and Rubin, D. (1991). Ignorability and coarse data. The Annals of Statistics pages 2244.
Henderson, R., Diggle, P., and Dobson, A. (2000). Joint modelling of longitudinal measurements and event time data. Biostatistics 1, 465.

Hogan, J. and Laird, N. (1997a). Mixture Models for the Joint Distribution of Repeated Measures and Event Times. Statistics in Medicine 16, 239.

Hogan, J. and Laird, N. (1997b). Model-Based Approaches to Analysing Incomplete Longitudinal and Failure Time Data. Statistics in Medicine 16, 259.

Hogan, J., Lin, X., and Herman, B. (2004). Mixtures of varying coefficient models for longitudinal data with discrete or continuous nonignorable dropout. Biometrics 60, 854.

Ibrahim, J. and Chen, M. (2000). Power prior distributions for regression models. Statistical Science pages 46.

Ibrahim, J., Chen, M., and Lipsitz, S. (2001). Missing responses in generalised linear mixed models when the missing data mechanism is nonignorable. Biometrika 88, 551.

Kaciroti, N., Schork, M., Raghunathan, T., and Julius, S. (2009). A Bayesian Sensitivity Model for Intention-to-treat Analysis on Binary Outcomes with Dropouts. Statistics in Medicine 28, 572.

Kenward, M. and Molenberghs, G. (1999). Parametric Models for Incomplete Continuous and Categorical Longitudinal Data. Statistical Methods in Medical Research 8, 51.

Kenward, M., Molenberghs, G., and Thijs, H. (2003). Pattern-mixture models with proper time dependence. Biometrika 90, 53.

Kurland, B. and Heagerty, P. (2004). Marginalized Transition Models for Longitudinal Binary Data with Ignorable and Non-Ignorable Drop-Out. Statistics in Medicine 23, 2673.

Land, S., Wieand, S., Day, R., Ten Have, T., Costantino, J., Lang, W., and Ganz, P. (2002). Methodological Issues In the Analysis of Quality of Life Data in Clinical Trials: Illustrations from the National Surgical Adjuvant Breast And Bowel Project (NSABP) Breast Cancer Prevention Trial. Statistical Methods for Quality of Life Studies pages 71.

Lee, J. and Berger, J. (2001). Semiparametric Bayesian Analysis of Selection Models. Journal of the American Statistical Association 96, 1397.

Lee, J., Hogan, J., and Hitsman, B. (2008). Sensitivity analysis and informative priors for longitudinal binary data with outcome-related dropout. Technical Report, Brown University.
Lee, K., Daniels, M. J., and Sargent, D. J. (2010). Causal effects of treatments for informative missing data due to progression. To Appear in JASA.

Liang, K. Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13.

Lin, H., McCulloch, C., and Rosenheck, R. (2004). Latent pattern mixture models for informative intermittent missing data in longitudinal studies. Biometrics 60, 295.

Little, R. (1993). Pattern-Mixture Models for Multivariate Incomplete Data. Journal of the American Statistical Association 88, 125.

Little, R. (1994). A Class of Pattern-Mixture Models for Normal Incomplete Data. Biometrika 81, 471.

Little, R. (1995). Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association 90.

Little, R. and Rubin, D. (1987). Statistical Analysis with Missing Data. Wiley.

Little, R. and Rubin, D. (1999). Comment on Adjusting for Non-Ignorable Drop-out Using Semiparametric Models by D. O. Scharfstein, A. Rotnitsky and J. M. Robins. Journal of the American Statistical Association 94, 1130.

Little, R. and Wang, Y. (1996). Pattern-mixture models for multivariate incomplete data with covariates. Biometrics 52, 98.

Liu, T., Todhunter, R., Lu, Q., Schoettinger, L., Li, H., Littell, R., Burton-Wurster, N., Acland, G., Lust, G., and Wu, R. (2006). Modelling extent and distribution of zygotic disequilibrium: Implications for a multigenerational canine pedigree. Genetics.

Molenberghs, G., Kenward, M., and Lesaffre, E. (1997). The Analysis of Longitudinal Ordinal Data with Nonrandom Drop-Out. Biometrika 84, 33.

Molenberghs, G. and Kenward, M. G. (2007). Missing Data in Clinical Studies. Wiley.

Molenberghs, G., Michiels, B., Kenward, M., and Diggle, P. (1998). Monotone Missing Data and Pattern-Mixture Models. Statistica Neerlandica 52, 153.

Neal, R. (2003). Slice sampling. The Annals of Statistics 31, 705.

Neyman, J. (1923). On the application of probability theory to agricultural experiments. Statistical Science 5, 465.

Nordheim, E. (1984). Inference from Nonrandomly Missing Categorical Data: an Example From a Genetic Study of Turner's Syndrome. Journal of the American Statistical Association 79, 772.
Pauler, D., McCoy, S., and Moinpour, C. (2003). Pattern Mixture Models for Longitudinal Quality of Life Studies in Advanced Stage Disease. Statistics in Medicine 22, 795.

Pulkstenis, E., Ten Have, T., and Landis, J. (1998). Model for the Analysis of Binary Longitudinal Pain Data Subject to Informative Dropout Through Remedication. Journal of the American Statistical Association 93, 438.

Radloff, L. (1977). The CES-D Scale: A Self-Report Depression Scale for Research in the General Population. Applied Psychological Measurement 1, 385.

Robins, J. (1997). Non-response models for the analysis of non-monotone non-ignorable missing data. Statistics in Medicine 16, 21.

Robins, J. (1998). Correction for non-compliance in equivalence trials. Statistics in Medicine 17.

Robins, J. and Ritov, Y. (1997). Toward a curse of dimensionality appropriate (CODA) asymptotic theory for semi-parametric models. Statistics in Medicine 16, 285.

Robins, J., Rotnitzky, A., and Zhao, L. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association 89, 846.

Robins, J., Rotnitzky, A., and Zhao, L. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association 90.

Rotnitzky, A., Robins, J., and Scharfstein, D. (1998b). Semiparametric regression for repeated outcomes with nonignorable nonresponse. Journal of the American Statistical Association 93, 1321.

Rotnitzky, A., Scharfstein, D. O., Su, T. L., and Robins, J. M. (2001). Methods for conducting sensitivity analysis of trials with potentially nonignorable competing causes of censoring. Biometrics 57, 103.

Roy, J. (2003). Modeling Longitudinal Data with Nonignorable Dropouts Using a Latent Dropout Class Model. Biometrics 59, 829.

Roy, J. and Daniels, M. J. (2008). A general class of pattern mixture models for nonignorable dropout with many possible dropout times. Biometrics 64, 538.

Rubin, D. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66, 688.

Rubin, D. (1976). Inference and missing data. Biometrika 63, 581.
Rubin, D. (1977). Formalizing subjective notions about the effect of nonrespondents in sample surveys. Journal of the American Statistical Association pages 538.

Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. Wiley.

Rubin, D. B. (2000). Causal inference without counterfactuals: comment. Journal of the American Statistical Association pages 435.

Scharfstein, D., Daniels, M., and Robins, J. (2003). Incorporating Prior Beliefs about Selection Bias into the Analysis of Randomized Trials with Missing Outcomes. Biostatistics 4, 495.

Scharfstein, D., Halloran, M., Chu, H., and Daniels, M. (2006). On estimation of vaccine efficacy using validation samples with selection bias. Biostatistics 7, 615.

Scharfstein, D., Manski, C., and Anthony, J. (2004). On the Construction of Bounds in Prospective Studies with Missing Ordinal Outcomes: Application to the Good Behavior Game Trial. Biometrics 60, 154.

Scharfstein, D., Rotnitzky, A., and Robins, J. (1999). Adjusting for Nonignorable Drop-Out Using Semiparametric Nonresponse Models. Journal of the American Statistical Association 94, 1096.

Shepherd, B., Gilbert, P., and Mehrotra, D. (2007). Eliciting a Counterfactual Sensitivity Parameter. American Statistician 61, 56.

Ten Have, T., Kunselman, A., Pulkstenis, E., and Landis, J. (1998). Mixed effects logistic regression models for longitudinal binary response data with informative drop-out. Biometrics 54, 367.

Ten Have, T., Miller, M., Reboussin, B., and James, M. (2000). Mixed Effects Logistic Regression Models for Longitudinal Ordinal Functional Response Data with Multiple-Cause Drop-Out from the Longitudinal Study of Aging. Biometrics 56, 279.

Thijs, H., Molenberghs, G., Michiels, B., Verbeke, G., and Curran, D. (2002). Strategies to fit pattern-mixture models. Biostatistics 3, 245.

Troxel, A., Harrington, D., and Lipsitz, S. (1998). Analysis of longitudinal data with non-ignorable non-monotone missing values. Journal of the Royal Statistical Society. Series C (Applied Statistics) 47, 425.

Troxel, A., Lipsitz, S., and Harrington, D. (1998). Marginal models for the analysis of longitudinal measurements with nonignorable non-monotone missing data. Biometrika 85, 661.

Tsiatis, A. A. (2006). Semiparametric theory and missing data. Springer, New York.
van der Laan, M. J. and Robins, J. (2003). Unified Methods for Censored Longitudinal Data and Causality. Springer.

Vansteelandt, S., Goetghebeur, E., Kenward, M., and Molenberghs, G. (2006a). Ignorance and uncertainty regions as inferential tools in a sensitivity analysis. Statistica Sinica 16, 953.

Vansteelandt, S., Goetghebeur, E., Kenward, M., and Molenberghs, G. (2006b). Ignorance and uncertainty regions as inferential tools in a sensitivity analysis. Statistica Sinica 16, 953.

Vansteelandt, S., Rotnitzky, A., and Robins, J. (2007). Estimation of regression models for the mean of repeated outcomes under nonignorable nonmonotone nonresponse. Biometrika 94, 841.

Wahba, G. (1990). Spline models for observational data. Society for Industrial Mathematics.

Wang, C., Daniels, M., Scharfstein, D. O., and Land, S. (2010). A Bayesian shrinkage model for incomplete longitudinal binary data with application to the breast cancer prevention trial. To Appear in JASA.

Wu, M. and Bailey, K. (1988). Analysing changes in the presence of informative right censoring caused by death and withdrawal. Statistics in Medicine 7.

Wu, M. and Bailey, K. (1989). Estimation and comparison of changes in the presence of informative right censoring: conditional linear model. Biometrics pages 939.

Wu, M. and Carroll, R. (1988). Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics 44, 175.

Wulfsohn, M. and Tsiatis, A. (1997). A joint model for survival and longitudinal data measured with error. Biometrics pages 330.

Yuan, Y. and Little, R. J. (2009). Mixed-effect hybrid models for longitudinal data with nonignorable drop-out. Biometrics (in press).

Zhang, J. and Heitjan, D. (2006). A simple local sensitivity analysis tool for nonignorable coarsening: application to dependent censoring. Biometrics 62, 1260.

Zhang, J. and Rubin, D. (2003). Estimation of Causal Effects via Principal Stratification When Some Outcomes are Truncated by Death. Journal of Educational and Behavioral Statistics 28, 353.
Chenguang Wang received his bachelor's and master's degrees in computer science from Dalian University of Technology, China. Chenguang later joined the biometry program of, and received his master's degree in statistics from, the University of Nebraska-Lincoln. At the University of Florida, Chenguang's major was statistics; he simultaneously worked for the Children's Oncology Group Statistics and Data Center (2004–2009) and the Center for Devices and Radiological Health, FDA (2009–2010). Chenguang received his Ph.D. from the University of Florida in the summer of 2010. Chenguang's research has focused on constructing a Bayesian framework for incomplete longitudinal data that identifies the parameters of interest and assesses sensitivity of the inference via incorporating expert opinions. Such a framework can be broadly used in clinical trials to provide health care professionals a more accurate understanding of the statistical or causal relationship between clinical interventions and human diseases. Chenguang is a member of the American Statistical Association, a member of the Eastern North American Region/International Biometric Society, and a member of the Children's Oncology Group.