Citation |

- Permanent Link:
- http://ufdc.ufl.edu/UFE0041942/00001
## Material Information
- Title:
- Bayesian Nonparametric and Semi-Parametric Methods for Incomplete Longitudinal Data
- Creator:
- Wang, Chenguang
- Place of Publication:
- [Gainesville, Fla.]
- Publisher:
- University of Florida
- Publication Date:
- 2010
- Language:
- english
- Physical Description:
- 1 online resource (123 p.)
## Thesis/Dissertation Information
- Degree:
- Doctorate ( Ph.D.)
- Degree Grantor:
- University of Florida
- Degree Disciplines:
- Statistics
- Committee Chair:
- Daniels, Michael J.
- Committee Members:
- Ghosh, Malay
- Presnell, Brett D.
- Winterstein, Almut G.
- Graduation Date:
- 8/7/2010
## Subjects
- Subjects / Keywords:
- Biometrics ( jstor )
- Data models ( jstor )
- Inference ( jstor )
- Missing data ( jstor )
- Modeling ( jstor )
- Parametric models ( jstor )
- Placebos ( jstor )
- School dropouts ( jstor )
- Sensitivity analysis ( jstor )
- Statistical models ( jstor )
- Statistics -- Dissertations, Academic -- UF
- bayesian, dropout, elicitation, intermittent, longitudinal, missing, nonparametric, shrinkage
- Genre:
- Electronic Thesis or Dissertation
- bibliography ( marcgt )
- theses ( marcgt )
- government publication (state, provincial, territorial, dependent) ( marcgt )
- Statistics thesis, Ph.D.
## Notes
- Abstract:
- BAYESIAN NONPARAMETRIC AND SEMI-PARAMETRIC METHODS FOR INCOMPLETE LONGITUDINAL DATA We consider inference in randomized longitudinal studies with missing data that is generated by skipped clinic visits and loss to follow-up. In this setting, it is well known that full data estimands are not identified unless unverified assumptions are imposed. Sensitivity analysis that assesses the sensitivity of model-based inferences to such assumptions is often necessary. In this dissertation, we posit an exponential tilt model that links non-identifiable distributions and identifiable distributions. This exponential tilt model is indexed by non-identified parameters, which are assumed to have an informative prior distribution, elicited from subject-matter experts. Under this model, full data estimands are shown to be expressed as functionals of the distribution of the observed data. We propose two different saturated models for the observed data distribution, as well as shrinkage priors to avoid the curse of dimensionality. The two procedures provide researchers different strategies for reducing the dimension of parameter space. We assume a non-future dependence model for the drop-out mechanism and partial ignorability for the intermittent missingness. In a simulation study, we compare our approach to a fully parametric and a fully saturated model for the distribution of the observed data. Our methodology is motivated by, and applied to, data from the Breast Cancer Prevention Trial. In this dissertation, we also discuss pattern mixture models. Pattern mixture modeling is a popular approach for handling incomplete longitudinal data. Such models are not identifiable by construction. Identifying restrictions are one approach to mixture model identification and are a natural starting point for missing not at random sensitivity analysis. 
However, when the pattern specific models are multivariate normal (MVN), identifying restrictions corresponding to missing at random may not exist. Furthermore, identification strategies can be problematic in models with covariates (e.g. baseline covariates with time-invariant coefficients). In this paper, we explore conditions necessary for identifying restrictions that result in missing at random (MAR) to exist under a multivariate normality assumption and strategies for identifying sensitivity parameters for sensitivity analysis or for a fully Bayesian analysis with informative priors. A longitudinal clinical trial is used for illustration of sensitivity analysis. Problems caused by baseline covariates with time-invariant coefficients are investigated and an alternative identifying restriction based on residuals is proposed as a solution. ( en )
- General Note:
- In the series University of Florida Digital Collections.
- General Note:
- Includes vita.
- Bibliography:
- Includes bibliographical references.
- Source of Description:
- Description based on online resource; title from PDF title page.
- Source of Description:
- This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
- Thesis:
- Thesis (Ph.D.)--University of Florida, 2010.
- Local:
- Adviser: Daniels, Michael J.
- Electronic Access:
- RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2011-02-28
- Statement of Responsibility:
- by Chenguang Wang.
## Record Information
- Source Institution:
- UFRGP
- Rights Management:
- Applicable rights reserved.
- Embargo Date:
- 2/28/2011
- Resource Identifier:
- 004979703 ( ALEPH )
- 705931179 ( OCLC )
- Classification:
- LD1780 2010 ( lcc )
## Full Text

and we can derive that

$$p(y_{mis} \mid y_{obs}, r) = \frac{p(y_{mis}, r \mid y_{obs})}{p(r \mid y_{obs})} = \frac{p(r \mid y_{mis}, y_{obs})\, p(y_{mis} \mid y_{obs})}{p(r \mid y_{obs})} = \frac{p(r \mid y_{obs})\, p(y_{mis} \mid y_{obs})}{p(r \mid y_{obs})} = p(y_{mis} \mid y_{obs}).$$

2. To show the reverse direction, note that

$$p(r \mid y_{mis}, y_{obs}) = \frac{p(r, y_{mis} \mid y_{obs})}{p(y_{mis} \mid y_{obs})} = \frac{p(y_{mis} \mid r, y_{obs})\, p(r \mid y_{obs})}{p(y_{mis} \mid y_{obs})} = \frac{p(y_{mis} \mid y_{obs})\, p(r \mid y_{obs})}{p(y_{mis} \mid y_{obs})} = p(r \mid y_{obs}).$$

This completes the proof. $\square$

Definition 1.3. Missing responses are missing not at random (MNAR) if $p(r \mid y_{obs}, y_{mis}, x; \psi(\omega)) \neq p(r \mid y_{obs}, y'_{mis}, x; \psi(\omega))$ for some $y_{mis} \neq y'_{mis}$.

Ignorability: Under certain conditions, the missingness process can be left unspecified for inference on the response model parameter $\theta(\omega)$ (Laird, 1988). This condition is called ignorability (Rubin, 1976).

Definition 1.4. The missing data mechanism is ignorable if

1. The missing data mechanism is MAR.
2. The parameters of the full data response model, $\theta(\omega)$, and the parameters of the missingness model, $\psi(\omega)$, are distinguishable, i.e. the full data parameter $\omega$ can be decomposed as $(\theta(\omega), \psi(\omega))$.
3. The parameters $\theta(\omega)$ and $\psi(\omega)$ are a priori independent, i.e. $p(\theta(\omega), \psi(\omega)) = p(\theta(\omega))\, p(\psi(\omega))$.

Full data models that do not satisfy Definition 1.4 have non-ignorable missingness.

LIST OF TABLES

- 2-1 Relative Risks to be Elicited
- 2-2 Percentiles of Relative Risks Elicited
- 2-3 Simulation Scenario
- 2-4 Simulation Results: MSE ($\times 10^3$). P and T represent placebo and tamoxifen arms, respectively.
- 2-5 Patients Cumulative Drop Out Rate
- 3-1 Missingness by Scheduled Measurement Time
- 3-2 Simulation Results: MSE ($\times 10^3$). P and T represent placebo and tamoxifen arms, respectively.
- 3-3 Sensitivity to the Elicited Prior
- 3-4 Sensitivity to the Elicited Prior
- 4-1 Growth Hormone Study: Sample mean (standard deviation) stratified by dropout pattern.
- 4-2 Growth Hormone Study: Posterior mean (standard deviation)

the robustness of the inferences. Scientific experts can be employed to constrain the range of these parameters.

Non-likelihood approaches to informative drop-out in longitudinal studies have been primarily developed from a selection modeling perspective. Here, the marginal distribution of the outcome process is modeled non- or semi-parametrically and the conditional distribution of the drop-out process given the outcome process is modeled semi- or fully-parametrically. In the case where the drop-out process is assumed to depend only on observable outcomes (i.e., MAR), Robins et al. (1994, 1995), van der Laan and Robins (2003) and Tsiatis (2006) developed inverse-weighted and augmented inverse-weighted estimating equations for inference. For informative drop-out, Rotnitzky et al. (1998a), Scharfstein et al. (1999) and Rotnitzky et al. (2001) introduced a class of selection models, in which the model for drop-out is indexed by interpretable sensitivity parameters that express departures from MAR. Inference using inverse-weighted estimating equations was proposed.

The problem with the aforementioned sensitivity analysis approaches is that the ultimate inferences can be cumbersome to display. Vansteelandt et al. (2006a) developed a method for reporting ignorance and uncertainty intervals (regions) that contain the true parameter(s) of interest with a prescribed level of precision, when the true data generating model is assumed to fall within a plausible class of models (as an example, see Scharfstein et al., 2004).

An alternative and very natural strategy is to specify an informative prior distribution on the non- or weakly-identified parameters and conduct a fully Bayesian analysis, whereby the ultimate inferences are reported in terms of posterior distributions. In the cross-sectional setting with a continuous outcome, Scharfstein et al.
(2003) adopted this approach from a semi-parametric selection modeling perspective. Kaciroti et al. (2009) proposed a parametric pattern-mixture model for cross-sectional, clustered binary outcomes. Lee et al. (2008) introduced a fully-parametric pattern-mixture approach in the longitudinal setting with binary

Most related to our approach are the (partial ignorability) assumptions proposed in Harel and Schafer (2009) that partition the missing data and allow one (or more) of the partitions to be ignored given the other partition(s) and the observed data. In this Chapter, we apply a partial ignorability assumption such that the intermittent missing data mechanism can be ignored given drop-out and treatment strata.

3.1.2 Computational Issues

WinBUGS is a popular software package that allows convenient application of MCMC techniques. However, it has major drawbacks. For the shrinkage model proposed in Chapter 2, Section 2.5, WinBUGS has difficulty sampling from the posterior distribution of the parameters when the sample size is relatively small (less than 3000 per arm). Tailored sampling algorithms can be written to overcome this difficulty; however, WinBUGS lacks the flexibility to incorporate modifications and/or extensions to its existing algorithms.

In this Chapter, we will provide an alternative parameterization of the saturated model for the observed data as well as alternative shrinkage prior specifications to improve computational efficiency. This alternative approach to posterior sampling can easily be programmed in R.

3.1.3 Outline

This Chapter is organized as follows. In Section 3.2, we describe the data structure, formalize identification assumptions and prove that the treatment-specific distribution of the full trajectory of longitudinal outcomes is identified under these assumptions. In Section 3.3, we introduce a saturated model for the distribution of the data that would be observed when there is drop-out, but no intermittent observations.
We then introduce shrinkage priors for the parameters in the saturated model to reduce the dimensionality of the parameter space. In Section 3.4, we assess, by simulation, the behavior of three classes of models: parametric, saturated, and shrinkage. Our analysis of the BCPT trial is presented in Section 3.5. Section 3.6 is devoted to a summary and discussion.

reference, the MSEs associated with the true data generating model are bolded. This table demonstrates that the shrinkage model generally outperforms both the incorrectly specified parametric model and the saturated model at all sample sizes. This improved performance is especially noticeable when comparing the MSEs for the rates of depression at times 3-7. In addition, the MSEs for the shrinkage model compare favorably with those of the true parametric model for all sample sizes considered, despite the fact that the shrinkage priors were specified to shrink toward an incorrect model.

2.8 Application: Breast Cancer Prevention Trial (BCPT)

Table 2-5 displays the treatment-specific monotonized drop-out rates in the BCPT. By the 7th study visit, more than 40% of patients had missed one or more assessments, with a slightly higher percentage in the tamoxifen arm.

We fit the shrinkage model to the observed data using WinBUGS, with four chains of 8000 iterations and 1000 burn-in. Convergence was checked by examining trace plots of the multiple chains.

2.8.1 Model Fit and Shrinkage Results

To assess the model fit, we compared the empirical rates and posterior means (with 95% credible intervals) of $P[Y_j = 1, R_j = 1 \mid Z = z]$ and $P[R_j = 0 \mid Z = z]$. As shown in Figure 2-3, the shrinkage model fits the observed data well. Figure 2-4 illustrates the effect of shrinkage on the model fits by comparing the difference between the empirical rate and posterior mean of $P[Y_j = 1 \mid R_j = 1, \bar{Y}_{j-1}, Z = z]$ for all $j$, $z$ and $\bar{Y}_{j-1}$.
We can see that for early time points, the difference is close to zero since there is little shrinkage applied to the model parameters. For later time points, more higher-order interaction coefficients are shrunk toward zero and the magnitude of the difference increases and drifts away from the zero line. In general, the empirical estimates are less reliable for the later time points (see the simulation results in Section 7). In some cases, there are no observations within "cells." By shrinking the high order

Table 3-3. Sensitivity to the Elicited Prior (T: Tamoxifen, P: Placebo). Each cell for Minimum, Median and Maximum lists the values for the 10% and 25% percentiles, in that order.

| Treatment | Quantity | $v_T = 5$, $v_P = 5$ | $v_T = 0.2$, $v_P = 0.2$ | $u_T = 0.5$, $u_P = 0.5$ | $u_T = -0.5$, $u_P = -0.5$ |
|---|---|---|---|---|---|
| Tamoxifen | Minimum | 0.79, 0.50 | 1.18, 1.46 | 1.60, 1.80 | 0.60, 0.80 |
| Tamoxifen | Median | 1.20, 1.50 | 1.20, 1.50 | 1.70, 2.00 | 0.70, 1.00 |
| Tamoxifen | Maximum | 1.70, 2.00 | 1.22, 1.52 | 1.80, 2.10 | 0.80, 1.10 |
| Tamoxifen | $P[Y_7 = 1]$ (95% CI) | 0.125 (0.114, 0.136) | 0.125 (0.114, 0.136) | 0.132 (0.120, 0.143) | 0.117 (0.107, 0.128) |
| Placebo | Minimum | 0.85, 0.80 | 1.04, 1.28 | 1.51, 1.70 | 0.51, 0.70 |
| Placebo | Median | 1.05, 1.30 | 1.05, 1.30 | 1.55, 1.80 | 0.55, 0.80 |
| Placebo | Maximum | 1.30, 1.80 | 1.06, 1.32 | 1.60, 1.90 | 0.60, 0.90 |
| Placebo | $P[Y_7 = 1]$ (95% CI) | 0.133 (0.122, 0.144) | 0.133 (0.122, 0.144) | 0.139 (0.128, 0.150) | 0.125 (0.114, 0.135) |
| Difference | $P[Y_7 = 1]$ (95% CI) | -0.008 (-0.024, 0.008) | -0.007 (-0.023, 0.008) | -0.007 (-0.023, 0.009) | -0.008 (-0.023, 0.007) |

An important feature of our approach is that the specification of models for the identifiable distribution of the observed data and the non-identifiable parameters can be implemented by separate independent data analysts. This feature can be used to increase the objectivity of necessarily subjective inferences in the FDA review of randomized trials with informative drop-out.

Penalized likelihood (Fan and Li, 2001; Green and Silverman, 1994; Wahba, 1990) is another approach for high-dimensional statistical modeling. There are similarities between the penalized likelihood approach and our shrinkage model.
In fact, the shrinkage priors on the saturated model parameters proposed in our approach can be viewed as a specific form for the penalty.

The ideas in this paper can be extended to continuous outcomes. For example, one could use the mixtures of Dirichlet processes model (Escobar and West, 1995) for the distribution of observed responses. They can also be extended to multiple cause dropout; in this trial, missed assessments were due to a variety of reasons, including patient-specific causes such as experiencing a protocol-defined event, stopping therapy, or withdrawing consent, and institution-specific causes such as understaffing and staff turnover. Therefore, some missingness is less likely to be informative; extensions will need to account for that. In addition, institutional differences might be addressed by allowing institution-specific parameters with priors that shrink them toward a common set of parameters.

For smaller sample sizes, WinBUGS has difficulty sampling from the posterior distribution of the parameters in the shrinkage model. In addition, the "monotonizing" approach ignores the intermittent missing data and may lead to biased results. These issues will be examined in the next Chapter.

That is, the chance that a patient will "survive" longer on the placebo arm than the active treatment arm is zero. This assumes the lower-triangle (excluding the diagonal) in Figure 5-2 is zero. These assumptions may be incorporated in the optimization Bayesian framework to improve the precision of the posterior joint distribution of the bounds.

5.3.6 Summary of Causal Inference

We have outlined an approach to estimate the causal effect of treatment where there is dropout due to non-response reasons such as death. We also outlined an approach for posterior inference. We need to further explore point estimation of the intervals/bounds for the causal effect and characterizing their uncertainty in a Bayesian framework.
5.4 Figures

Non-future dependence assumes that missingness only depends on observed data and the current missing value. It can be viewed as a special case of MNAR and an extension of MAR.

1.2 Likelihood-Based Methods

Likelihood-based methods handle the missing values by integrating them out of the likelihood function, instead of deleting cases or explicitly filling in values. The general strategy is to model the joint distribution of a response and the missingness process (Hogan and Laird, 1997b). Likelihood-based models for missing data are distinguished by the way the joint distribution of the outcome and missing data processes is factorized. They can be classified as selection models, pattern-mixture models, and shared-parameter models.

Selection model: Selection models factor the full-data distribution as

$$p(y, r \mid \omega) = p(r \mid y, \psi(\omega))\, p(y \mid \theta(\omega)).$$

The term "selection" was first introduced in the economics literature for modeling sample selection bias; that is, different responses have different probabilities of being selected into a sample. Heckman (1979a,b) used a bivariate response $Y$ with missing $Y_2$ as an example and showed that in general it is critical to answer the question "why are the data missing" by modeling the missingness of $Y_{2i}$ as a function of observed $Y_{1i}$ (for subject $i$). Diggle and Kenward (1994) extended the Heckman model to longitudinal studies and modeled the drop-out process by a logistic regression such as $\text{logit}\, P(r_j = 0 \mid r_{j-1} = 1, y) = y^{T}\beta$. The Diggle and Kenward model has been adopted and extended by many researchers, mostly by proposing different full data response models (Albert, 2000; Baker, 1995; Fitzmaurice et al., 1995; Heagerty, 2002; Kurland and Heagerty, 2004).

Definition 1.6. Let $p(y, r \mid \omega)$ be a full data model with extrapolation factorization

$$p(y, r \mid \omega) = p(y_{mis} \mid y_{obs}, r, \omega_E)\, p(y_{obs}, r \mid \omega_I).$$

Suppose there exists a reparameterization $\xi(\omega) = (\xi_S, \xi_M)$ such that

1. $\xi_S$ is a non-constant function of $\omega_E$,
2. The observed likelihood $L(\xi_S, \xi_M \mid y_{obs}, r)$ is constant as a function of $\xi_S$,
3. Given $\xi_S$ fixed, $L(\xi_S, \xi_M \mid y_{obs}, r)$ is a non-constant function of $\xi_M$,

then $\xi_S$ is a sensitivity parameter.

Unfortunately, fully parametric selection models and shared parameter models do not allow sensitivity analysis, as sensitivity parameters cannot be found (Daniels and Hogan, 2008, Chapter 8). Examining sensitivity to distributional assumptions, e.g., random effects, will provide different fits to the observed data, $(y_{obs}, r)$. In such cases, a sensitivity analysis cannot be done since varying the distributional assumptions does not provide equivalent fits to the observed data (Daniels and Hogan, 2008). It then becomes an exercise in model selection.

Fully Bayesian analysis allows researchers to reach a single conclusion by admitting prior beliefs about the sensitivity parameters. For continuous responses, Lee and Berger (2001) built a semiparametric Bayesian selection model which has a strong distributional assumption for the response but a weak assumption on the missing data mechanism. Scharfstein et al. (2003), on the other hand, placed strong parametric assumptions on the missing data mechanism but minimal assumptions on the response outcome.

1.3 Non-Likelihood Methods

In non-likelihood approaches, the joint distribution of the outcomes is typically modeled semiparametrically and estimating equations are used for inference. Liang and Zeger (1986) proposed generalized estimating equations (GEE) whose solution

which is the available case missing value (ACMV) restriction defined later in this section, we have

$$\mu_2^{(1)} + \frac{\sigma_{21}^{(1)}}{\sigma_{11}^{(1)}}\left(y_1 - \mu_1^{(1)}\right) = \mu_2^{(2)} + \frac{\sigma_{21}^{(2)}}{\sigma_{11}^{(2)}}\left(y_1 - \mu_1^{(2)}\right),$$

$$\sigma_{22}^{(1)} - \frac{\left(\sigma_{21}^{(1)}\right)^2}{\sigma_{11}^{(1)}} = \sigma_{22}^{(2)} - \frac{\left(\sigma_{21}^{(2)}\right)^2}{\sigma_{11}^{(2)}},$$

$$\frac{\sigma_{21}^{(1)}}{\sigma_{11}^{(1)}} = \frac{\sigma_{21}^{(2)}}{\sigma_{11}^{(2)}},$$

by which all the unidentified parameters are identified.
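As a hedged illustration of how an ACMV-type restriction can pin down the unidentified parameters, the sketch below assumes a bivariate normal pattern mixture model with two patterns: pattern 1 observes only $Y_1$, pattern 2 observes $(Y_1, Y_2)$, and the conditional distribution of $Y_2$ given $Y_1$ is equated across patterns. The function name, argument names and numeric values are hypothetical and are not taken from the dissertation.

```python
def identify_dropout_pattern(mu1_1, s11_1, mu1_2, mu2_2, s11_2, s21_2, s22_2):
    """Identify (mu2, s21, s22) of pattern 1 from pattern 2 under ACMV.

    Pattern 1 observes only Y1 (mean mu1_1, variance s11_1).
    Pattern 2 observes (Y1, Y2) with means (mu1_2, mu2_2) and
    covariance entries s11_2, s21_2, s22_2.
    All names are illustrative assumptions.
    """
    slope = s21_2 / s11_2                     # slope of Y2 on Y1 in pattern 2
    resid_var = s22_2 - s21_2 ** 2 / s11_2    # Var(Y2 | Y1) in pattern 2
    s21_1 = slope * s11_1                     # equal slopes across patterns
    mu2_1 = mu2_2 + slope * (mu1_1 - mu1_2)   # equal conditional means
    s22_1 = resid_var + s21_1 ** 2 / s11_1    # equal conditional variances
    return mu2_1, s21_1, s22_1


# Hypothetical inputs: dropouts start higher and are more variable at visit 1.
mu2_1, s21_1, s22_1 = identify_dropout_pattern(
    mu1_1=1.0, s11_1=2.0, mu1_2=0.0, mu2_2=0.5, s11_2=1.0, s21_2=0.5, s22_2=1.0
)
print(mu2_1, s21_1, s22_1)
```

Only quantities estimable from the observed data enter the function, which is the sense in which the restriction identifies the full-data model without adding information about the missing responses.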
Understanding (identifying) restrictions that lead to MAR is an important first step for sensitivity analysis under missing not at random (MNAR) (Daniels and Hogan, 2008; Scharfstein et al., 2003; Zhang and Heitjan, 2006). In particular, MAR provides a good starting point for sensitivity analysis, and sensitivity analyses are essential for the analysis of incomplete data (Daniels and Hogan, 2008; Scharfstein et al., 1999).

Little (1993) developed several common identifying restrictions. For example, complete case missing value (CCMV) restrictions, which equate all missing patterns to the complete cases, i.e.

$$p_k(y_j \mid \bar{y}_{j-1}) = p_J(y_j \mid \bar{y}_{j-1});$$

equating parameters to a set of patterns, that is, setting the parameters in pattern $k$, namely $\theta^{(k)}$, equal to those of a set of patterns $S$,

$$\theta^{(k)} = \sum_{j \in S} \pi_j \theta^{(j)};$$

or equating pattern distributions to a mixture of a set of patterns $S$, i.e.

$$p_k(\cdot) = \sum_{j \in S} \pi_j p_j(\cdot).$$

Some special cases of the pattern-set mixture model restrictions include nearest-neighbor constraints:

$$p_k(y_j \mid \bar{y}_{j-1}) = p_j(y_j \mid \bar{y}_{j-1}),$$

Figure 2-2. Prior conditional density of $\tau_{z,j,\bar{y}_{j-1}}$ given $p_{z,j}(\bar{y}_{j-1})$. Black and gray lines represent tamoxifen and placebo arms, respectively. Solid and dashed lines are for $p_{z,j}(\bar{y}_{j-1}) = 0.25$ and $p_{z,j}(\bar{y}_{j-1}) = 0.10$, respectively.

Pattern mixture model: Pattern mixture models factor the full-data distribution as

$$p(y, r \mid \omega) = p(y \mid r, \theta(\omega))\, p(r \mid \psi(\omega)).$$

Rubin (1977) introduced the idea of modeling respondents and nonrespondents in surveys separately and using subjective priors to relate respondents' and nonrespondents' model parameters. Little (1993, 1994) explored pattern mixture models in discrete time settings. Specifically, different identifying restrictions (see Section 1.5) were proposed to identify the full-data model.
When the number of dropout patterns is large and pattern-specific parameters will be weakly identified by identifying restrictions, Roy (2003) and Roy and Daniels (2008) proposed to use latent-class model for dropout classes. When the dropout time is continuous and the mixture of patterns is infinite, Hogan et al. (2004) proposed to model the response given dropout by a varying coefficient model where regression coefficients were unspecified, non-parametric functions of dropout time. For time-event data with informative censoring, Wu and Bailey (1988, 1989) and Hogan and Laird (1997a) developed random effects mixture models. Fitzmaurice and Laird (2000a) generalized Wu and Bailey and Hogan and Laird approach for discrete, ordinal and count data by using generalized linear mixture models and GEE approach for statistical inference. Daniels and Hogan (2000) proposed a parameterization of the pattern mixture model for continuous data. Sensitivity analysis can be done on the additive (location) and multiplicative (scale) terms. Forster and Smith (1998) considered a pattern mixture model for a single categorical response with categorical covariates. Bayesian approaches were employed for non-ignorable missingness. Shared parameter model: Shared parameter models factorize the full-data model as p(y, r w) = p(y, r, b, w)p(b w)db, J 12. That is, the treatment effect 0= E(Y31Z =1)- E(Y31Z = 0). In the full-data model for each treatment under non-future dependence, there are seven sensitivity parameters for the MVN model: {A0 A2) A) A3) A3) A(2), ()} and four sensitivity parameters for OMVN model: {A 2), A 3), A2), A3)} (see Appendix). For the MNAR analysis, we reduced the number of sensitivity parameters as follows: * A) and A ) do not appear in the posterior distribution of E(Y31Z) for Z = 0, 1, and thus are not necessary for inference on 0. We restrict to MNAR departures from MAR in terms of the intercept terms by assuming A2)= 3)= A3) 0. 
We assume the sensitivity parameters are identical between treatments. This reduces the set of sensitivity parameters to {A2), A3)} for MVN model and {A2), A()} for the OMVN model. There are a variety of ways to specify priors for the sensitivity parameters A2) and A(3) 0 ' 2 = E(Y2 Y, S = 1)- E(Y2 Y,, S > 2) A3 = E ( Y3Y2, Y,, S = 2)- E(Y31Y2, Y, S = 3). Both represent the difference of conditional means between the observed and unobserved responses. A(2) and A() have (roughly) the same interpretation as A2) and A3) respectively. Based on discussion with investigators, we made the assumption that dropouts do worse than completers; thus, we restrict the A's to be less than zero. To do a fully Bayesian analysis to fairly characterize the uncertainty associated with the missing data mechanism, we assume a uniform prior for the A's as a default choice. Subject matter considerations gave an upper bound of zero for the uniform distributions. We set Consider two markers A and B, each with two alleles A and a and B and b, respectively. Four possible haplotypes, [AB], [Ab], [aB], and [ab], can be formed by the two markers, with the frequencies denoted as p1i, pio, Pol, and poo, respectively. We use p, 1 p, q and 1 q to denote the allele frequencies of A, a, B and b, respectively. Then, we have the following relationships: Pl = pq + D Pio = p(1 q)- D (5-1) Poi = (1 p)q D Poo = (1- p)(1- q) + D, where D is the LD parameter. By simple algebra, we can show that D = PuPoo PioPol. Now, let r denote the linkage recombination fraction, the frequency that a chromosomal crossover will take place between the two markers A and B during meiosis. To estimate r, only offspring with at least a double-heterozygous parent, i.e. parent with genotype Aa/Bb, contribute to the likelihood (proportionally) by PinPoor + ploPo(1 r). Therefore, r is not estimable when D is zero. One possible solution is to incorporate more markers in the linkage analysis. 
By doing this, the number of parents with no less than two heterozygous markers increases. Consequently, more offspring contribute to the likelihood for estimating the linkage recombination fraction. However, the number of haplotype frequencies to be estimated also increases (exponentially) as the number of markers increases. Bayesian shrinkage methods can be applied to address this problem. In the sixth step, we draw from the full conditional of az given {Ymis}, 7z, m(), ,), mz, 7 7), {Yobs}, {5}, {Rs} and {Z} = z. The full conditional can be expressed as J 1 S f(az, -2,y mis}Zj,y 'Z, {Yobs}, {S}, {Z} = z) j=2 y=0 all y 2 where f(Czj,y -2, Y is}, {Yobs}, {S}, {Z} = z) SB(, ; (a) /() m / + o(- (1 M( /() + n(- -o (a) B( zJYj-2 2,Y; M'z y/z*j,y zj,yj 2,y' zJY)/ zJy+ zj,yJ 2,y z,j,y- 2,y n ()- is the number of subjects with S > j, Yj1 = y, Y-2 = -2 and Z = z, and z,j,yj-2,Y o0ZJ)- is the number of subjects with S > j, Y1 = y, Yi-2 = Y2, Z = z and Yj = 1. Finally, we draw from the full conditional of 7, given {Ymis}, z, r ), 7, r(), -), {Yobs}, {S}, {Rs} and {Z} = z. The full conditional can be expressed as J 1 I I 7zf -l,y-,yl { mis} ) _, (', {Yobs}, {S}, {Z} = z) j=2 y=O all y 2 where f(,-,yj 2,{Ymis}, m -, {Yobs}, {5}, {Z} = z) B (( m, / 7(') -1- M) (7 () (^ -(()7) _o(7) SB(z-i-,yj 2, j-,y zJ+-1, Ozj-1,y 2,y' L 1y-mz')/ J-I'y nzj-1,y-2,y z,yj-"YJ- 2,y n) is the number of subjects with S > j 1, y = y, Yj-2 = Yj-2 and Z = z, z,j-1,yj 2'y and o()- is the number of subjects with S =- 1, Yj1 = y, Yi-2 = Yi-2 and Z = z. ZJYJ 2'Y Proof of Theorem 2: We will show that, with no intermittent missingness and the shrinkage priors (Equation 3-2), the posterior distributions of P[Yj = 1|S > j, Y-1, Yi-2, Z] and P[S =j-15S >j-1, Yj1, Y-2, Z], modeled as a zy-2,y and 7z,yj 2', (re: Equation 3-1), for all Z, j and Yj_- are consistent under the condition that all the true values of the probabilities are in the open interval (0, 1). for all j > 1 and 0 < / < j. 
Sensitivity analysis can be done on these A parameters that capture the information about the missing data mechanism. For example, in a Bayesian framework, we may assign informative priors elicited from experts to these sensitivity parameters A. Note in general we may have separate AU) and AW) for each pattern s (s < j), but in practice it is necessary to limit the dimensionality of these (Daniels and Hogan, 2008). Indeed, we could make A0) and AS) independent of j to further reduce the number of sensitivity parameters. To see the impact of the A parameters on the MDM, we introduce notation A0) 0A + A) Yi and then for k Y I Yil, S k N ( + Lj ) The conditional probability (hazard) of observing the first s observations given at least s observations is then given by: P(S = s|Y) 0s) (y (s))2 { eA) Iogp > = log P(S =s) + + + P(S > slY) 2 2a(s) 2 1-/ s+l (Y- -0 -A>0 )2 j (Y1 -(k))2 S '2 (>1 /) -log P(S = k)(C ) 2 exp 2 k) 2e 0 T 2a(k) k (Y ~ ))2 J1 1 1() )2 exp (>/) x U (eA )2 p r ( 2 /=s+l /11-1 I=k+l /1-1 In general the MDM depends on Yj, i.e. MNAR. However, one might want hazard at time ts to only depend on Ys+ in which case we need to have different distributions and assumptions on [Yj| Y _, S = k] for k 4.4 Non-Future Dependence and Sensitivity Analysis under Multivariate Normality within Pattern Non-future dependence assumes that missingness only depends on observed data and the current missing value, i.e. [S = slY] [S = sl Ys+], Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy BAYESIAN NONPARAMETRIC AND SEMI-PARAMETRIC METHODS FOR INCOMPLETE LONGITUDINAL DATA By Chenguang Wang August 2010 Chair: Michael J. Daniels Major: Statistics We consider inference in randomized longitudinal studies with missing data that is generated by skipped clinic visits and loss to follow-up. 
In this setting, it is well known that full data estimands are not identified unless unverified assumptions are imposed. Sensitivity analysis that assesses the sensitivity of model-based inferences to such assumptions is often necessary. In Chapters 2 and 3, we posit an exponential tilt model that links non-identifiable distributions and identifiable distributions. This exponential tilt model is indexed by non-identified parameters, which are assumed to have an informative prior distribution, elicited from subject-matter experts. Under this model, full data estimands are shown to be expressed as functionals of the distribution of the observed data. We propose two different saturated models for the observed data distribution, as well as shrinkage priors to avoid the curse of dimensionality. The two procedures provide researchers different strategies for reducing the dimension of parameter space. We assume a non-future dependence model for the drop-out mechanism and partial ignorability for the intermittent missingness. In a simulation study, we compare our approach to a fully parametric and a fully saturated model for the distribution of the observed data. Our methodology is motivated by, and applied to, data from the Breast Cancer Prevention Trial. 3.8 Appendix Gibbs sampler for posterior computation: In the first step of the Gibbs sampler, we draw, for each subject with intermittent missing data, from the full conditional of Y1is given az, 7y, mV), i), m ), '), Yobs, S, Rs and Z = z. The full conditional distribution can be expressed as P[Yis = Ymisa, z, mz ), m(), rY Yobs = Yobs, S = s, Rs = rs, Z = z] P[Ymis = Ymis, Yobs = Yobs, S = s|5z, 7z, m?, 7 m) ), Z = z] Eal y1 P[Y is= Ymis, Yobs = Yobs, S = s |a, -y, rz 7 ), r 7 ), Z = z] where the right hand side can be expressed as a function of yis, Yobs, S and az and 7z. 
In the second step, we draw from the full conditional of $m_z^{(\alpha)}$ given $\{Y^I_{mis}\}$, $\alpha_z$, $\gamma_z$, $\eta_z^{(\alpha)}$, $m_z^{(\gamma)}$, $\eta_z^{(\gamma)}$, $\{Y_{obs}\}$, $\{S\}$, $\{R_S\}$ and $\{Z\} = z$, where the notation $\{\}$ denotes data for all the individuals on the study. The full conditional can be expressed as J 1 ] m f {Y) msl' (y, {Yobs}, {S}, {Z} = z) j=2 y0 where f(Z) Y Yis}, az {Yobs}, {S}, {Z} = z) oc B(Z(aY.2; mz jy/ zJ'(1 )/ zy) Si Yi I=Y Y Si.-1 ij-1Y i.Zi=z and $B(a; c, d)$ is a Beta density with parameters $c$ and $d$.

In the third step, we draw from the full conditional of $m_z^{(\gamma)}$ given $\{Y^I_{mis}\}$, $\alpha_z$, $\gamma_z$, $\eta_z^{(\alpha)}$, $m_z^{(\alpha)}$, $\eta_z^{(\gamma)}$, $\{Y_{obs}\}$, $\{S\}$, $\{R_S\}$ and $\{Z\} = z$. The full conditional can be expressed as J 1 f f f (m ( {Ymis 1}, -Y, lz( {Yobs}, {S}, {Z} = z) j=2 y=

where $b$ are subject-specific random effects. It is usually assumed that $y$ and $r$ are independent conditionally on $b$. Wu and Carroll (1988) presented a shared parameter random effects model for continuous responses and informative censoring, in which individual effects are taken into account as intercepts and slopes for modeling the censoring process. DeGruttola and Tu (1994) extended Wu and Carroll's model to allow general covariates. Follmann and Wu (1995) developed a generalized linear model for the response and proposed an approximation algorithm for inference in the joint full-data model. Faucett and Thomas (1996) and Wulfsohn and Tsiatis (1997) proposed to jointly model the continuous covariate over time and relate the covariates to the response simultaneously. Henderson et al. (2000) generalized the joint modeling approach by using two correlated Gaussian random processes for covariates and response. Ten Have et al. (1998, 2000) proposed a shared parameter mixed effects logistic regression model for longitudinal ordinal data. Recently, Yuan and Little (2009) proposed a mixed-effect hybrid model that allows the missingness and response to be conditionally dependent given random effects. 
Sensitivity analysis for missing not at random: Sensitivity analysis is critical in longitudinal analysis of incomplete data. The full-data model can be factored into an extrapolation model and an observed data model,
\[
p(y, r \mid \omega) = p(y_{mis} \mid y_{obs}, r, \omega_E)\, p(y_{obs}, r \mid \omega_I),
\]
where $\omega_E$ are parameters indexing the extrapolation model and $\omega_I$ are parameters indexing the observed data model and are identifiable from observed data (Daniels and Hogan, 2008). Full-data model inference requires unverifiable assumptions about the extrapolation model $p(y_{mis} \mid y_{obs}, r, \omega_E)$. A sensitivity analysis explores the sensitivity of inferences of interest about the full data response model to unverifiable assumptions about the extrapolation model. This is typically done by varying sensitivity parameters, which we define next (Daniels and Hogan, 2008).

[Figure 2-4 panels: A1: Placebo (Y); A2: Tamoxifen (Y); B1: Placebo (R); B2: Tamoxifen (R). x-axis: Pattern of Historical Response.]

Figure 2-4. Differences between posterior mean and empirical rate of $P[Y_j = 1 \mid R_j = 1, \bar{Y}_{j-1}, Z = z]$ (A1 and A2) and $P[R_j = 0 \mid R_{j-1} = 1, \bar{Y}_{j-1}, Z = z]$ (B1 and B2). The x-axis is ordered by follow up time $C$ ($\max\{t : R_t = 1\}$). The bullets are the posterior mean of $P[Y_j = 1 \mid R_j = 1, \bar{Y}_{j-1}, Z = z]$ and $P[R_j = 0 \mid R_{j-1} = 1, \bar{Y}_{j-1}, Z = z]$ when there are no patients with historical response $\bar{Y}_{j-1}$.

The results in this section were all based on specifying the mixture model in (4-1) and demonstrate that MAR only exists under the fairly strict conditions given in Theorem 1. 
4.3 Sequential Model Specification and Sensitivity Analysis under MAR

Due to the structure of $\mu^{(s)}$ and $\Sigma^{(s)}$ under MAR constraints as outlined in Section 4.2, we propose to follow the approach in Daniels and Hogan (2008, Chapter 8) and specify distributions of observed $Y$ within pattern as:
\[
p_s(y_1) \sim N(\mu_1^{(s)}, \sigma_1^{(s)}), \quad 1 \le s \le J
\]
\[
p_s(y_j \mid \bar{y}_{j-1}) \sim N(\mu_{j|j^-}^{(\ge j)}, \sigma_{j|j^-}^{(\ge j)}), \quad 2 \le j \le s \le J \qquad \text{(4-3)}
\]
where $j^- = \{1, 2, \ldots, j-1\}$. Note that by construction, we assume $p_s(y_j \mid \bar{y}_{j-1})$ are identical for all $j \le s \le J$. Consequently, we have $p_s(y_j \mid \bar{y}_{j-1}) = p(y_j \mid \bar{y}_{j-1}, S \ge s)$, denoted as $p_{\ge s}(y_j \mid \bar{y}_{j-1})$.

Corollary 4.6. For pattern mixture models of the form (4-1) with monotone dropout, identification via MAR constraints exists if and only if the observed data can be modeled as (4-3).

Proof. Theorem 4.2 shows that identification via MAR constraints exists if and only if conditional distributions $p_s(y_j \mid \bar{y}_{j-1})$ are identical for $s \ge j$ and $j \ge 2$. That is, for observed data, we have $p_s(y_j \mid \bar{y}_{j-1}) \sim N(\mu_{j|j^-}^{(\ge j)}, \sigma_{j|j^-}^{(\ge j)})$. □

Corollary 4.6 implies that under the multivariate normality assumption in (4-1) and the MAR assumption, a sequential specification as in (4-3) always exists. We provide some details for MAR in model (4-1) (which implies the specification in (4-3) as stated in Corollary 4.6) next. Distributions for missing data (which are not

(i.e., shrink them to zero). For the directly parameterized model 3-1, we use a different shrinkage strategy. In particular, we propose to use Beta priors for shrinkage as follows:
\[
\alpha_{z,j,\bar{y}_{j-2},y} \sim \mathrm{Beta}\left(m^{(\alpha)}_{z,j,y}/\eta^{(\alpha)}_{z,j,y},\; (1 - m^{(\alpha)}_{z,j,y})/\eta^{(\alpha)}_{z,j,y}\right)
\]
\[
\gamma_{z,j-1,\bar{y}_{j-2},y} \sim \mathrm{Beta}\left(m^{(\gamma)}_{z,j-1,y}/\eta^{(\gamma)}_{z,j-1,y},\; (1 - m^{(\gamma)}_{z,j-1,y})/\eta^{(\gamma)}_{z,j-1,y}\right) \qquad \text{(3-2)}
\]
for $j = 2, \ldots, J$ and $y = 0, 1$. For $\alpha_{z,0}$, $\alpha_{z,1,y}$ and $\gamma_{z,0,y}$ for $y = 0, 1$, we assign Unif(0, 1) priors. Let $m_z^{(\alpha)}$ ($m_z^{(\gamma)}$) and $\eta_z^{(\alpha)}$ ($\eta_z^{(\gamma)}$) denote the parameters $m^{(\alpha)}_{z,j,y}$ ($m^{(\gamma)}_{z,j-1,y}$) and $\eta^{(\alpha)}_{z,j,y}$ ($\eta^{(\gamma)}_{z,j-1,y}$) respectively.

Note that for a random variable $X$ that follows a $\mathrm{Beta}(m/\eta, (1-m)/\eta)$ distribution, we have $E[X] = m$ and $\mathrm{Var}[X] = m(1-m) \times \frac{\eta}{\eta+1}$. For fixed $m$, $\mathrm{Var}[X] \to 0$ as $\eta \to 0$, indicating shrinkage of the distribution of $X$ toward the mean. 
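The mean and variance stated above for the Beta(m/η, (1 − m)/η) prior can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, and the moments are computed from the standard Beta(a, b) formulas and compared with the closed form m(1 − m)η/(η + 1).

```python
def beta_shrinkage_moments(m, eta):
    """Mean and variance of X ~ Beta(m/eta, (1-m)/eta), from the
    standard Beta(a, b) moment formulas."""
    a = m / eta
    b = (1.0 - m) / eta
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var


def closed_form_variance(m, eta):
    """Closed form quoted in the text: m(1-m) * eta / (eta + 1)."""
    return m * (1.0 - m) * eta / (eta + 1.0)


if __name__ == "__main__":
    m = 0.3  # illustrative prior mean (an assumption, not from the thesis)
    for eta in (1.0, 0.1, 0.001):
        mean, var = beta_shrinkage_moments(m, eta)
        # The variance shrinks toward zero as eta decreases, for fixed mean m.
        print(eta, mean, var, closed_form_variance(m, eta))
```

As η decreases the printed variance falls toward zero while the mean stays at m, which is the shrinkage behavior the prior is designed to produce.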
Thus, $\eta^{(\alpha)}_{z,j,y}$ and $\eta^{(\gamma)}_{z,j-1,y}$ serve as shrinkage parameters for $\alpha_{z,j,\bar{y}_{j-2},y}$ and $\gamma_{z,j-1,\bar{y}_{j-2},y}$ respectively. As the shrinkage parameters go to zero, the distributions of the probabilities $\alpha_{z,j,\bar{y}_{j-2},y}$ and $\gamma_{z,j-1,\bar{y}_{j-2},y}$ are shrunk toward the means of the probabilities that do not depend on $\bar{y}_{j-2}$, namely $m^{(\alpha)}_{z,j,y}$ and $m^{(\gamma)}_{z,j-1,y}$ respectively. In essence, the model is being shrunk toward a first-order Markov model. The shrinkage priors allow "neighboring cells" to borrow information from each other and provide more precise inferences.

Theorem 2: When there is no intermittent missingness, the proposed model yields consistent posterior means of the observed data probabilities, as long as all the true values of the observed data probabilities are in the open interval (0, 1).

Proof: See Appendix.

We specify independent Unif(0, 1) priors for $m^{(\alpha)}_{z,j,y}$ and $m^{(\gamma)}_{z,j-1,y}$. For the shrinkage parameters $\eta^{(\alpha)}_{z,j,y}$ and $\eta^{(\gamma)}_{z,j-1,y}$ we specify independent, uniform shrinkage priors (Daniels,

Table 4-2. Growth Hormone Study: Posterior mean (standard deviation)

                            Treatment EP                  Treatment EG              Difference
Analysis        Model    Month 0  Month 6  Month 12    Month 0  Month 6  Month 12   at 12 mos.
Observed Data            66(9.9)  82(18)   72(3.8)     69(7.3)  87(16)   88(6.8)    12(7.8)
MAR Analysis    MVN      66(6.0)  82(5.9)  73(4.9)     69(4.9)  81(6.8)  78(7.2)    5.4(8.8)
MAR Analysis    OMVN     66(6.0)  81(8.2)  73(6.1)     69(4.9)  82(7.7)  79(7.8)    5.8(9.9)
MNAR Analysis   MVN      66(6.0)  80(6.0)  72(5.0)     69(4.9)  78(7.1)  76(7.5)    4.0(8.9)
MNAR Analysis   OMVN     66(6.0)  80(8.3)  71(6.1)     69(4.9)  79(8.0)  76(8.0)    4.4(10)

and o)MNAR = Var(lY, S = s) J (k) Sy). U) (1 e^)pY- = e -A jk Pk( j-1)dy E2(Yj -i S = k) k-j e Se- =j,k 8e y2 )l'j) Pk(Yj -yj_1)e)( ) dy; k=j (k 2/( ) ) 2 -- e +y (1 je j- k-j J J -jkE((j) 1' S k) (k) "2 e2A kE((y)21 S = k) +j + (1 e) )2 k=j k-j + 2e k(A" (1 e)lj) )E( y~*y,j_ S k)- A + ,k II-) k=j k-j e 2A =,k (o1j) + ( 1k) 2 2 ,W (k) I e j z^-i 4- -i-'lj- ) I =j,k- S(1 e2)) (I) 2 ( kJ ) k=j k-=j y, ---(1-e - where yj =- - e )

Full-data model for the growth hormone example (Section 4.6): We specify a pattern mixture model with sensitivity parameters for the two treatment arms. 
For compactness, we suppress subscript treatment indicator $z$ from all the parameters in the following models. For missing pattern $S$, we specify $S \sim \mathrm{Mult}(\phi)$ with the multinomial parameter $\phi = (\phi_1, \phi_2, \phi_3)$, $\phi_s = P(S = s)$ for $s \in \{1, 2, 3\}$, and $\sum_{s=1}^{3} \phi_s = 1$.

5.3.1 Causal Inference Introduction
5.3.2 Data and Notation
5.3.3 Missing Data Mechanism
5.3.4 Causal Inference Assumption
5.3.5 Stochastic Survival Monotonicity Assumption
5.3.6 Summary of Causal Inference
5.4 Figures
REFERENCES
BIOGRAPHICAL SKETCH

LIST OF FIGURES

2-1 Extrapolation of the elicited relative risks.
2-2 Prior Density of p
2-3 Model Fit
2-4 Model Shrinkage
2-5 Posterior distribution of P[Y7 = 1|Z = z]. Black and gray lines represent tamoxifen and placebo arms, respectively. Solid and dashed lines are for MNAR and MAR, respectively.
2-6 Posterior mean and 95% credible interval of difference of P[Yj = 1|Z = z] between placebo and tamoxifen arms. The gray and white boxes are for MAR and MNAR, respectively.
3-1 Model Fit
3-2 Shrinkage
3-3 Posterior distribution of P[Y7 = 1|Z = z]. Black and gray lines represent tamoxifen and placebo arms, respectively. Solid and dashed lines are for MNAR and MAR, respectively.
3-4 Posterior mean and 95% credible interval of difference of P[Yj = 1|Z = z] between placebo and tamoxifen arms. The gray and white boxes are for MAR and MNAR, respectively.
5-1 Contour and Perspective Plots of a Bivariate Density
5-2 Illustration of pc,t
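identified) are specified as:
\[
p_s(y_j \mid \bar{y}_{j-1}) \sim N(\mu_{j|j^-}^{(s)}, \sigma_{j|j^-}^{(s)}), \quad 1 \le s < j \le J.
\]
The conditional mean structure of $\mu_{j|j^-}^{(\ge j)}$ and $\mu_{j|j^-}^{(s)}$ is parameterized as follows:
\[
\mu_{j|j^-}^{(\ge j)} = \beta_0^{(\ge j)} + \sum_{l=1}^{j-1} \beta_l^{(\ge j)} y_l, \qquad
\mu_{j|j^-}^{(s)} = \beta_0^{(s)} + \sum_{l=1}^{j-1} \beta_l^{(s)} y_l.
\]
To identify the full-data model, the MAR constraints require that $p_k(y_j \mid \bar{y}_{j-1}) = p_{\ge j}(y_j \mid \bar{y}_{j-1})$ for $k < j$, which implies that $\mu_{j|j^-}^{(k)} = \mu_{j|j^-}^{(\ge j)}$ and $\sigma_{j|j^-}^{(k)} = \sigma_{j|j^-}^{(\ge j)}$ for $2 \le j \le J$. Since the equality of the conditional means needs to hold for all $Y$, this further implies that the MAR assumption requires that $\beta_l^{(k)} = \beta_l^{(\ge j)}$ for $0 \le l < j$.

The motivation of the proposed sequential model is to allow a straightforward extension of the MAR specification to a large class of MNAR models indexed by parameters measuring departures from MAR, as well as the attraction of doing sensitivity analysis on means and/or variances in normal models. For example, we can let $\beta_0^{(k)} = \Delta_\mu^{(j)} + \beta_0^{(\ge j)}$ and $\log \sigma_{j|j^-}^{(k)} = \Delta_\sigma^{(j)} + \log \sigma_{j|j^-}^{(\ge j)}$

Under ignorability, posterior inference on parameters $\theta(\omega)$ can be based on the observed data response likelihood
\[
L(\theta(\omega) \mid y_{obs}) \propto \prod_{i=1}^{n} \int p_i(y_{obs}, y_{mis} \mid \theta(\omega))\, dy_{mis}.
\]
We show this below:
\[
\begin{aligned}
L(\omega \mid y_{obs}, r) &= L(\theta(\omega), \psi(\omega) \mid y_{obs}, r) \\
&= \int p(y_{obs}, y_{mis}, r \mid \theta(\omega), \psi(\omega))\, dy_{mis} \\
&= \int p(r \mid y_{obs}, \psi(\omega))\, p(y_{obs}, y_{mis} \mid \theta(\omega))\, dy_{mis} \\
&= p(r \mid y_{obs}, \psi(\omega)) \int p(y_{obs}, y_{mis} \mid \theta(\omega))\, dy_{mis} \\
&= p(r \mid y_{obs}, \psi(\omega))\, p(y_{obs} \mid \theta(\omega)) \\
&= L(\psi(\omega) \mid r, y_{obs})\, L(\theta(\omega) \mid y_{obs}),
\end{aligned}
\]
and furthermore,
\[
p(\omega \mid y_{obs}, r) \propto p(\omega) L(\omega \mid y_{obs}, r) = p(\psi(\omega)) L(\psi(\omega) \mid r, y_{obs})\, p(\theta(\omega)) L(\theta(\omega) \mid y_{obs}).
\]
Therefore, $p(\theta(\omega) \mid y_{obs}, r) \propto p(\theta(\omega)) L(\theta(\omega) \mid y_{obs})$ and the posterior inference of $\theta(\omega)$ can be based on the observed response likelihood $L(\theta(\omega) \mid y_{obs})$.

Non-future Dependence: For cases with monotone missingness, i.e. $r_j = 0$ implies $r_{j'} = 0$ for $j' > j$, Kenward et al. (2003) defined the term non-future dependence.

Definition 1.5. If the missingness is monotone, the MDM is non-future dependent if
\[
p(r \mid y_{obs}, y_{mis}, x; \psi(\omega)) = p(r \mid y_{obs}, y_C, x; \psi(\omega))
\]
with $C = \min_j\{r_j = 0\}$. 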
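To make the index C in the non-future dependence definition concrete, here is a minimal sketch, assuming the missingness indicators for one subject are stored as a list of 0/1 values for visits 1, ..., J. The helper names are hypothetical and are not taken from the dissertation.

```python
from typing import Optional, Sequence


def is_monotone(r: Sequence[int]) -> bool:
    """True if r_j = 0 implies r_j' = 0 for every later visit j'."""
    seen_missing = False
    for rj in r:
        if rj == 0:
            seen_missing = True
        elif seen_missing:
            # An observed value after a missing one: intermittent missingness.
            return False
    return True


def first_missing_index(r: Sequence[int]) -> Optional[int]:
    """1-based index C = min{j : r_j = 0}, or None when nothing is missing."""
    for j, rj in enumerate(r, start=1):
        if rj == 0:
            return j
    return None


if __name__ == "__main__":
    r = [1, 1, 0, 0]  # illustrative pattern: drop-out after the second visit
    print(is_monotone(r), first_missing_index(r))
```

Under non-future dependence, the missingness model for this subject may depend on the observed responses and on the unobserved response at visit C only, not on responses after visit C.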
3.3.4 Posterior Computation

Compared to Chapter 2, posterior computations for the observed data model are much easier and more efficient under the reparameterized model 3-1 and the Beta shrinkage priors. The posterior sampling algorithms can be implemented in R with no sample size restrictions. The following steps are used to simulate draws from the posterior of $\mu^*_{z,j}$:

1. Sample $P(\theta_z, Y^I_{mis} \mid Y_{obs}, S, R_S, Z = z)$ using Gibbs sampling with data augmentation (see details in Appendix). Continue sampling until convergence.
2. For each draw of $\gamma_{z,j-1,\bar{y}_{j-2},y}$, draw the corresponding $\tau$ based on the conditional priors described in Section 2.6.2.
3. Compute $\mu^*_{z,j}$ by plugging the draws of $\alpha_{z,j,\bar{y}_{j-2},y}$, $\gamma_{z,j-1,\bar{y}_{j-2},y}$ and $\tau$ into the identification algorithm discussed in Section 2.4.

3.4 Assessment of Model Performance via Simulation

For assessment of model performance, we use the same "true" parametric model as in Chapter 2, Section 2.7 to simulate observed data (no intermittent missingness). We again compared the performance of our shrinkage model with (1) a correct parametric model, (2) an incorrect parametric model (first order Markov model) and (3) a saturated model (with diffuse priors). Our shrinkage model uses the shrinkage priors proposed in Section 3.3.2. We considered small (500), moderate (2000), large (5000) and very large (1,000,000) sample sizes for each treatment arm; for each sample size, we simulated 500 datasets. We assessed model performance using mean squared error (MSE). In Table 3-2 (sample size 1,000,000 not shown), we report the MSE's of $P[Y_j = 1 \mid S \ge j, \bar{Y}_{j-1}, Z = z]$ and $P[S = j-1 \mid S \ge j-1, \bar{Y}_{j-1}, Z = z]$ averaged over all $j$ and all $\bar{Y}_{j-1}$ (see columns 3 and 4, respectively). We also report the MSE's for $\mu^*_{z,j}$ (see columns 6-12). For reference, the MSE's associated with the true data generating model are bolded. At all sample sizes, the shrinkage model has lower MSE's for the

Figure 5-2. 
Illustration of pc,t

Thus, the implicit MDM is very restrictive and does not depend on the entire history, $\bar{Y}_s$. We now show connections to missing completely at random (MCAR) and other common identifying restrictions.

Corollary 4.4. For pattern mixture models of the form (4-1) with monotone dropout, MCAR is equivalent to MAR if $p_s(y_1) = p(y_1)$ for all $s$.

Proof. First, MCAR implies MAR. Second, in the proof of Corollary 4.3, we showed that MAR holds if
\[
p(S = s \mid Y) = p(S = s)\, \frac{p_s(y_1)}{p(y_1)}.
\]
Thus under the assumption that $p_s(y_1) = p(y_1)$, MAR implies that $p(S = s \mid Y) = p(S = s)$, i.e. MCAR. □

Corollary 4.5. For pattern mixture models of the form (4-1) with monotone dropout, MAR constraints are identical to complete case missing value (CCMV) and nearest-neighbor constraints (NCMV).

Proof. By Theorem 4.2, the MAR constraints imply $p_j(y_j \mid \bar{y}_{j-1}) = p_J(y_j \mid \bar{y}_{j-1}) = p_{\ge j}(y_j \mid \bar{y}_{j-1})$. Therefore for all $k < j$, the MAR constraints $p_k(y_j \mid \bar{y}_{j-1}) = p_{\ge j}(y_j \mid \bar{y}_{j-1})$ are identical to CCMV restrictions $p_k(y_j \mid \bar{y}_{j-1}) = p_J(y_j \mid \bar{y}_{j-1})$ and to NCMV restrictions $p_k(y_j \mid \bar{y}_{j-1}) = p_j(y_j \mid \bar{y}_{j-1})$ =

Rotnitzky, A., Robins, J., and Scharfstein, D. (1998a). Semiparametric regression for repeated outcomes with nonignorable nonresponse. Journal of the American Statistical Association 93, 1321-1322.
Rotnitzky, A., Robins, J., and Scharfstein, D. (1998b). Semiparametric regression for repeated outcomes with nonignorable nonresponse. Journal of the American Statistical Association 93, 1321-1322.
Rotnitzky, A., Scharfstein, D. O., Su, T.-L., and Robins, J. M. (2001). Methods for conducting sensitivity analysis of trials with potentially nonignorable competing causes of censoring. Biometrics 57, 103-113.
Roy, J. (2003). Modeling Longitudinal Data with Nonignorable Dropouts Using a Latent Dropout Class Model. Biometrics 59, 829-836.
Roy, J. and Daniels, M. J. (2008). A general class of pattern mixture models for nonignorable dropout with many possible dropout times. Biometrics 64, 538-545.
Rubin, D. (1974). 
Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66, 688-701.
Rubin, D. (1976). Inference and missing data. Biometrika 63, 581-592.
Rubin, D. (1977). Formalizing subjective notions about the effect of nonrespondents in sample surveys. Journal of the American Statistical Association pages 538-543.
Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. Wiley.
Rubin, D. B. (2000). Causal inference without counterfactuals: comment. Journal of the American Statistical Association pages 435-438.
Scharfstein, D., Daniels, M., and Robins, J. (2003). Incorporating Prior Beliefs about Selection Bias into the Analysis of Randomized Trials with Missing Outcomes. Biostatistics 4, 495.
Scharfstein, D., Halloran, M., Chu, H., and Daniels, M. (2006). On estimation of vaccine efficacy using validation samples with selection bias. Biostatistics 7, 615.
Scharfstein, D., Manski, C., and Anthony, J. (2004). On the Construction of Bounds in Prospective Studies with Missing Ordinal Outcomes: Application to the Good Behavior Game Trial. Biometrics 60, 154-164.
Scharfstein, D., Rotnitzky, A., and Robins, J. (1999). Adjusting for Nonignorable Drop-Out Using Semiparametric Nonresponse Models. Journal of the American Statistical Association 94, 1096-1146.

CHAPTER 4
A NOTE ON MAR, IDENTIFYING RESTRICTIONS, AND SENSITIVITY ANALYSIS IN PATTERN MIXTURE MODELS

4.1 Introduction

For analyzing longitudinal studies with informative missingness, popular modeling frameworks include pattern mixture models, selection models and shared parameter models, which differ in the way the joint distribution of the outcome and missing data process is factorized (for a comprehensive review, see Daniels and Hogan, 2008; Hogan and Laird, 1997b; Kenward and Molenberghs, 1999; Little, 1995; Molenberghs and Kenward, 2007). In this paper, we will concern ourselves with pattern mixture models with monotone missingness (i.e., drop-out). 
For pattern mixture models with non-monotone (i.e., intermittent) missingness (details go beyond the scope of this paper), one approach is to partition the missing data and allow one (or more) of the partitions to be ignored given the other partition(s) (Harel and Schafer, 2009; Wang et al., 2010).

It is well known that pattern-mixture models are not identified: the observed data do not provide enough information to identify the distributions for incomplete patterns. The use of identifying restrictions that equate the inestimable parameters to functions of estimable parameters is an approach to resolve the problem (Daniels and Hogan, 2008; Kenward et al., 2003; Little, 1995; Little and Wang, 1996; Thijs et al., 2002). Common identifying restrictions include complete case missing value (CCMV) constraints and available case missing value (ACMV) constraints. Molenberghs et al. (1998) proved that for discrete time points and monotone missingness, the ACMV constraint is equivalent to missing at random (MAR), as defined by Rubin (1976) and Little and Rubin (1987). A key and attractive feature of identifying restrictions is that they do not affect the fit of the model to the observed data. Understanding (identifying) restrictions that lead to MAR is an important first step for sensitivity analysis under missing not at random (MNAR) (Daniels and Hogan, 2008; Scharfstein et al., 2003; Zhang and Heitjan, 2006).

with time invariant coefficients $\alpha$. We model $p(S)$ and $p(Y \mid S)$ as follows:
\[
S \mid X \sim \mathrm{Bern}(\phi(X)), \qquad Y \mid S = s \sim N(\mu^{(s)}(X), \Sigma^{(s)}), \quad s = 1, 2
\]
where
\[
\mu^{(s)} = \begin{pmatrix} \mu_1^{(s)} + X\alpha^{(s)} \\ \mu_2^{(s)} + X\alpha^{(s)} \end{pmatrix}
\quad \text{and} \quad
\Sigma^{(s)} = \begin{pmatrix} \sigma_{11}^{(s)} & \sigma_{12}^{(s)} \\ \sigma_{21}^{(s)} & \sigma_{22}^{(s)} \end{pmatrix}.
\]
MAR (ACMV) implies the following restriction
\[
[Y_2 \mid Y_1, S = 1] \simeq [Y_2 \mid Y_1, S = 2].
\]
This implies that the conditional means, $E(Y_2 \mid Y_1, X, S = s)$ for $s = 1, 2$, are equal, i.e.
\[
\mu_2^{(1)} + X\alpha^{(1)} + \frac{\sigma_{21}^{(1)}}{\sigma_{11}^{(1)}}\left(Y_1 - \mu_1^{(1)} - X\alpha^{(1)}\right)
= \mu_2^{(2)} + X\alpha^{(2)} + \frac{\sigma_{21}^{(2)}}{\sigma_{11}^{(2)}}\left(Y_1 - \mu_1^{(2)} - X\alpha^{(2)}\right).
\]
For (4-6) to hold for all $Y_1$ and $X$, we need that $\alpha^{(1)} = \alpha^{(2)}$. However, both $\alpha^{(1)}$ and $\alpha^{(2)}$ are already identified by the observed data $Y_1$. 
Thus the ACMV (MAR) restriction affects the model fit to the observed data. This is against the principle of applying identifying restrictions (Little and Wang, 1996; Wang and Daniels, 2009). To resolve the over-identification issue we propose to apply MAR constraints on residuals instead of directly on the responses. In the bivariate case, the corresponding restriction is
\[
[Y_2 - X\alpha^{(1)} \mid Y_1 - X\alpha^{(1)}, X, S = 1] \simeq [Y_2 - X\alpha^{(2)} \mid Y_1 - X\alpha^{(2)}, X, S = 2] \qquad \text{(4-7)}
\]

12) Xa(2)). (4-6)

parameters (for MNAR analysis) that have convenient interpretations as deviations of means and variances from MAR (Daniels and Hogan, 2008). However, multivariate normality within patterns can be overly restrictive when applying identifying restrictions. We explore such issues in Chapter 4. Furthermore, identification strategies can be problematic in models with covariates (e.g. baseline covariates with time-invariant coefficients). In this Chapter, we also explore conditions necessary for identifying restrictions that result in missing at random (MAR) to exist under a multivariate normality assumption and strategies for sensitivity analysis. Problems caused by baseline covariates with time-invariant coefficients are investigated and an alternative identifying restriction based on residuals is proposed as a solution.

data can be applied on the filled-in data. However, single imputation does not reflect the uncertainty of the missing value. Multiple imputation was proposed to address this flaw by imputing several values for each missing response (Rubin, 1987).

For notation, let $y = \{y_1, \ldots, y_J\}$ denote the full data response vector of outcome, possibly partially observed. Let $r = \{r_1, r_2, \ldots, r_J\}$ denote the missing data indicator, with $r_j = 0$ if $y_j$ is missing and 1 if $y_j$ is observed. Let $x$ denote the covariates. Let $y_{obs}$ and $y_{mis}$ denote the observed and missing response data, respectively. 
Let $\omega$ be the parameters indexing the full data model $p(y, r)$, $\theta(\omega)$ be the parameters indexing the full data response model $p(y)$, and $\psi(\omega)$ be the parameters indexing the missing data mechanism model $p(r \mid y)$.

The common assumptions about the missing data mechanism are as follows.

Little and Rubin's taxonomy: Rubin (1976) and Little and Rubin (1987) developed a hierarchy for missing data mechanisms by classifying the relationship between missingness and the response data.

Definition 1.1. Missing responses are missing completely at random (MCAR) if
\[
p(r \mid y_{obs}, y_{mis}, x; \psi(\omega)) = p(r \mid x; \psi(\omega))
\]
for all $x$ and $\omega$.

Definition 1.2. Missing responses are missing at random (MAR) if
\[
p(r \mid y_{obs}, y_{mis}, x; \psi(\omega)) = p(r \mid y_{obs}, x; \psi(\omega))
\]
for all $x$ and $\omega$.

Note that MAR holds if and only if $p(y_{mis} \mid y_{obs}, r) = p(y_{mis} \mid y_{obs})$. The proof is as follows:

Proof: 1. Suppose MAR holds. Then we have $p(r \mid y_{mis}, y_{obs}) = p(r \mid y_{obs})$

Wang, C. and Daniels, M. (2009). Discussion of "Missing Data in longitudinal studies: A review" by Ibrahim and Molenberghs. TEST 18, 51-58.
Wang, C., Daniels, M., Scharfstein, D. O., and Land, S. (2010). A Bayesian shrinkage model for incomplete longitudinal binary data with application to the breast cancer prevention trial. To Appear in JASA.
Wu, M. and Bailey, K. (1988). Analysing changes in the presence of informative right censoring caused by death and withdrawal. Statistics in Medicine 7.
Wu, M. and Bailey, K. (1989). Estimation and comparison of changes in the presence of informative right censoring: conditional linear model. Biometrics pages 939-955.
Wu, M. and Carroll, R. (1988). Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics 44, 175-188.
Wulfsohn, M. and Tsiatis, A. (1997). A joint model for survival and longitudinal data measured with error. Biometrics pages 330-339.
Yuan, Y. and Little, R. J. (2009). Mixed-effect hybrid models for longitudinal data with nonignorable drop-out. Biometrics (in press). 
Zhang, J. and Heitjan, D. (2006). A simple local sensitivity analysis tool for nonignorable coarsening: application to dependent censoring. Biometrics 62, 1260-1268.
Zhang, J. and Rubin, D. (2003). Estimation of Causal Effects via Principal Stratification When Some Outcomes are Truncated by "Death". Journal of Educational and Behavioral Statistics 28, 353.

Proof: Under Assumption 1, we know that the parameters of the conditional joint distribution of $S$ and $\bar{Y}_S$ given $Z = z$ are estimable from the distribution of the observed data. The rest of the proof is the same as in Chapter 2.

The identifiability result shows that, given the functions $q_{z,j}(\bar{Y}_{j-1}, Y_j)$, $\mu^*_{z,j}$ can be expressed as a functional of the distribution of the observed data.

3.3 Modeling, Prior Specification and Posterior Computation

3.3.1 Modeling

We reparameterize the saturated observed data model in Chapter 2 as follows:
\[
\begin{aligned}
P[Y_0 = 1] &= \alpha_{z,0} \\
P[Y_1 = 1 \mid S \ge 1, Y_0 = y, Z = z] &= \alpha_{z,1,y} \\
P[Y_j = 1 \mid S \ge j, Y_{j-1} = y, \bar{Y}_{j-2} = \bar{y}_{j-2}, Z = z] &= \alpha_{z,j,\bar{y}_{j-2},y} \\
P[S = 0 \mid Y_0 = y, Z = z] &= \gamma_{z,0,y} \\
P[S = j-1 \mid S \ge j-1, Y_{j-1} = y, \bar{Y}_{j-2} = \bar{y}_{j-2}, Z = z] &= \gamma_{z,j-1,\bar{y}_{j-2},y}
\end{aligned}
\qquad \text{(3-1)}
\]
for $j = 2, \ldots, J$ and $y = 0, 1$. Let $\alpha_z$ denote the parameters indexing the first set of models for response and $\gamma_z$ denote the parameters indexing the second set of models for drop-out. Recall that we defined $\theta_z$ to denote the parameters of the conditional distribution of $S$ and $\bar{Y}_S$ given $Z = z$; thus, $\theta_z = (\alpha_z, \gamma_z)$.

This saturated model avoids the complex interaction term model parameterization. As a result, the (conditional) posterior distributions of $\theta_z$ will have simple forms and efficient posterior sampling is possible even when the sample size is moderate or small.

We use the same parameterization of the functions $q_{z,j}(\bar{Y}_{j-1}, Y_j)$ as in Chapter 2, Section 2.5. 
3.3.2 Shrinkage Prior

In Chapter 2, the strategy to avoid the curse of dimensionality was to apply shrinkage priors for higher order interactions to reduce the number of parameters

BIOGRAPHICAL SKETCH

Chenguang Wang received his bachelor's and master's degrees in computer science from Dalian University of Technology, China. Chenguang later joined the biometry program at University of Nebraska-Lincoln, where he received his master's degree in statistics. At University of Florida, Chenguang majored in statistics while simultaneously working for the Children's Oncology Group Statistics and Data Center (2004-2009) and the Center for Devices and Radiological Health, FDA (2009-2010). Chenguang received his Ph.D. from University of Florida in the summer of 2010. Chenguang's research has focused on constructing a Bayesian framework for incomplete longitudinal data that identifies the parameters of interest and assesses sensitivity of the inference by incorporating expert opinions. Such a framework can be broadly used in clinical trials to give health care professionals a more accurate understanding of the statistical or causal relationship between clinical interventions and human diseases. Chenguang is a member of the American Statistical Association, the Eastern North American Region/International Biometric Society, and the Children's Oncology Group.

BAYESIAN NONPARAMETRIC AND SEMI-PARAMETRIC METHODS FOR INCOMPLETE LONGITUDINAL DATA

By CHENGUANG WANG

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2010

the shrinkage priors are $O(1)$ (which is the case here) and all the true values of the observed data probabilities, $P[Y_j \mid R_j = 1, \bar{Y}_{j-1}, Z]$ for $j = 0, \ldots, J$ and are in the open interval, (0, 1). 
This follows, since under this latter condition, all combinations of depression histories have a positive probability of being observed and the prior will become swamped by the observed data. However, when the true value of any of the observed data probabilities is zero or one, there exists at least one combination of depression history that will never be observed and thus the influence of the prior will not dissipate as $n$ increases.

We specify non-informative priors $N(0, 1000)$ for the non-interaction parameters in $\theta$, namely $\alpha_{z,j,0}$ for $j = 0, \ldots, J$ and $z = 0, 1$, and $\alpha_{z,j,1}$, $\gamma_{z,j,0}$ and $\gamma_{z,j,1}$ for $j = 1, \ldots, J$ and $z = 0, 1$.

2.6.2 Prior of Sensitivity Parameters

The sensitivity parameters in Assumption 2, defined formally in Section 2.5, are (conditional) odds ratios. In our experience, subject matter experts often have difficulty thinking in terms of odds ratios; rather, they are more comfortable expressing beliefs about relative risks (Scharfstein et al., 2006; Shepherd et al., 2007). With this in mind, we asked Dr. Patricia Ganz, a medical oncologist and expert on quality of life outcomes in breast cancer, to express her beliefs about the risk of dropping out and its relationship to treatment assignment and depression. We then translated her beliefs into prior distributional assumptions about the odds ratio sensitivity parameters $\tau$.

Specifically, we asked Dr. Ganz to answer the following question for each treatment group:

Q: Consider a group of women assigned to placebo (tamoxifen), who are on study through visit 1 and who share the same history of depression. Suppose that the probability that a randomly selected woman in this group drops out before visit j is p (denoted by the columns in Table 1). For each p, what is the minimum, maximum and your best guess (median) representing how much more (e.g. 
twice) or less (e.g., half)

For the saturated model with diffuse priors, we re-parameterize the model as
\[
P[Y_j = 1 \mid R_j = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z] = \mu_{z,j,\bar{y}_{j-1}}, \qquad
P[R_j = 0 \mid R_{j-1} = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z] = \rho_{z,j,\bar{y}_{j-1}}
\]
for $j = 1, \ldots, 7$, and specify independent Unif(0, 1) priors on the $\mu$'s and $\rho$'s.

We simulated observed data from a "true" parametric model of the following form:
\[
\begin{aligned}
\mathrm{logit}\, P[Y_0 = 1 \mid R_0 = 1, Z = z] &= \alpha_{z,0,0} \\
\mathrm{logit}\, P[Y_1 = 1 \mid R_1 = 1, Y_0 = y_0, Z = z] &= \alpha_{z,1,0} + \alpha_{z,1,1} y_0 \\
\mathrm{logit}\, P[R_1 = 0 \mid R_0 = 1, Y_0 = y_0, Z = z] &= \gamma_{z,1,0} + \gamma_{z,1,1} y_0 \\
\mathrm{logit}\, P[Y_j = 1 \mid R_j = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z] &= \alpha_{z,j,0} + \alpha_{z,j,1} y_{j-1} + \alpha_{z,j,2} y_{j-2} \\
\mathrm{logit}\, P[R_j = 0 \mid R_{j-1} = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z] &= \gamma_{z,j,0} + \gamma_{z,j,1} y_{j-1} + \gamma_{z,j,2} y_{j-2},
\end{aligned}
\]
for $j = 2$ to 7. To determine the parameters of the data generating model, we fit this model to the "monotonized" BCPT data in WinBUGS with non-informative priors. We used the posterior means of the parameters $\alpha_z$ and $\gamma_z$ as the true parameters. We computed the "true" values of $\mu^*_{z,j}$ by (1) drawing 10,000 values from the elicited prior of $\tau_z$ given $\gamma_z$ given in Table 2-2, (2) computing $\mu^*_{z,j}$ using the identification algorithm in Section 2.4 for each draw, and (3) averaging the resulting $\mu^*_{z,j}$'s. The model parameters and the "true" depression rates $\mu^*_{z,j}$ are given in Table 2-3.

We considered (relatively) small (3000), moderate (5000), and large (10000) sample sizes for each treatment arm; for each sample size, we simulated 50 datasets. We assessed model performance using the mean squared error (MSE) criterion. In Table 2-4, we report the MSEs of $P[Y_j = 1 \mid R_j = 1, \bar{Y}_{j-1}, Z = z]$ and $P[R_j = 1 \mid R_{j-1} = 1, \bar{Y}_{j-1}, Z = z]$ averaged over all $j$ and all $\bar{Y}_{j-1}$ (see columns 3 and 4, respectively). We also report the MSEs for $\mu^*_{z,j}$ (see columns 6-12). For

Figure 3-2 illustrates the effect of shrinkage on the model fit by comparing the difference between the empirical rates and posterior means of $P[Y_j = 1 \mid S \ge j, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z]$ for the tamoxifen arm ($Z = 1$) and $j = 6, 7$. 
We use the later time points to illustrate this since the observed data were more sparse and the shrinkage effect was more apparent. The empirical depression rates often reside on the boundary (0 or 1). In some cases, there are no observations within "cells", thus the empirical rates were undefined. From the simulation results in Section 2.7, we know that the empirical estimates are less reliable for later time points. Via the shrinkage priors, the probabilities $P[Y_j = 1 \mid S \ge j, Y_{j-1} = y_{j-1}, \bar{Y}_{j-2} = \bar{y}_{j-2}, Z = z]$ with the same $y_{j-2}$ are shrunk together and away from the boundaries. By borrowing information across neighboring cells, we are able to estimate $P[Y_j = 1 \mid S \ge j, \bar{Y}_{j-1}, Z = z]$ for all $z$ and $\bar{Y}_{j-1}$ with better precision. The differences between the empirical rates and the posterior means illustrate the magnitude of the shrinkage effect. In the BCPT, the depression rate was (relatively) low and there were few subjects at the later times that were observed with a history of mostly depression at the earlier visits; as a result, the differences were larger when $\bar{Y}_{j-1}$ had a lot of 1's (depression).

3.5.2 Inference

Figure 3-3 shows the posterior of $P[Y_7 = 1 \mid Z = z]$, the treatment-specific probability of depression at the end of the 36-month follow up (solid lines). For comparison, the posterior under MAR (corresponding to point mass priors for $\tau$ at zero) is also presented (dashed lines). The observed depression rates (i.e., complete case analysis) were 0.124 and 0.112 for the placebo and tamoxifen arms, respectively. Under the MNAR analysis (using the elicited priors), the posterior means of the depression rates at month 36 were 0.133 (95% CI: 0.122, 0.144) and 0.125 (95% CI: 0.114, 0.136) for the placebo and tamoxifen arms; the difference was -0.007 (95% CI: -0.023, 0.008). Under MAR, the rates were 0.132 (95% CI: 0.121, 0.143) and 0.122 (95% CI: 0.111, 0.133) for the placebo and tamoxifen arms; the difference was -0.01 (95% CI: -0.025, 0.005). 
I dedicate this to my family

3.2 Notation, Assumptions and Identifiability

To address the intermittent missingness, we redefine the notation in Chapter 2, Section 2.2, and introduce some additional notation in this Section. The following notation is defined for a random individual. When necessary, we use the subscript $i$ to denote data for the $i$th individual.

Let $Z$ denote the treatment assignment indicator, where $Z = 1$ denotes tamoxifen and $Z = 0$ denotes placebo. Let $Y$ be the complete response data vector with elements $Y_j$ denoting the binary outcome (i.e., depression) scheduled to be measured at the $j$th visit ($j = 0$ (baseline), $\ldots, J$) and let $\bar{Y}_j = (Y_0, \ldots, Y_j)$ denote the history of the outcome process through visit $j$. Let $R$ be the vector of missing data indicators with the same dimension as $Y$, such that $R_j = 1$ indicates $Y_j$ is observed and $R_j = 0$ indicates $Y_j$ is missing. Let $S = \max\{t : R_t = 1\}$ be the last visit at which an individual's depression status is recorded. If $S < J$, then we say that the individual has dropped out and $S$ is referred to as the drop-out time. Let $R_S = \{R_j : j < S\}$ be the collection of intermittent missing data indicators recorded prior to $S$.

We will find it useful to distinguish three sets of data for an individual: the complete data $C = (Z, S, R_S, Y)$, the full data $F = (Z, S, R_S, \bar{Y}_S)$, and the observed data $O = (Z, S, R_S, Y_{obs})$, where $Y_{obs}$ is the subset of $Y$ for which $R_j = 1$. It is useful to also define $Y_{mis} = (Y^I_{mis}, Y^C_{mis}, Y^F_{mis})$, where $Y^I_{mis} = \{Y_j : R_j = 0, j < S\}$ denotes the "intermittent" missing responses, $Y^C_{mis} = \{Y_j : j = S + 1, j \le J\}$ denotes the missing response at the time right after drop-out, and $Y^F_{mis} = \{Y_j : S + 1 < j \le J\}$ denotes the "future" missing responses. Note that $\bar{Y}_S = (Y^I_{mis}, Y_{obs})$.

We assume that individuals are drawn as a simple random sample from a super-population so that we have an i.i.d. data structure for $C$, $F$ and $O$.
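The bookkeeping in this notation can be sketched in a few lines of Python. This is only an illustration of the definitions above, not code from the dissertation: the function name `partition_visits`, the dictionary keys, and the example indicator vector are all hypothetical.

```python
from typing import Dict, List


def partition_visits(r: List[int]) -> Dict[str, object]:
    """Classify visits 0..J from a missingness vector r, where r[j] = 1 if Y_j is observed.

    Hypothetical helper: it only mirrors the definitions of S, R_S and the
    intermittent / current / future missing responses given in the text.
    """
    if 1 not in r:
        raise ValueError("at least one visit must be observed")
    J = len(r) - 1
    S = max(t for t, r_t in enumerate(r) if r_t == 1)  # last observed visit
    return {
        "S": S,
        "dropped_out": S < J,
        "R_S": r[:S],                                            # indicators prior to S
        "observed": [j for j in range(S + 1) if r[j] == 1],      # indices in Y_obs
        "intermittent": [j for j in range(S) if r[j] == 0],      # R_j = 0, j < S
        "current": [S + 1] if S < J else [],                     # visit right after drop-out
        "future": list(range(S + 2, J + 1)),                     # S + 1 < j <= J
    }


# Made-up example with J = 7: visit 2 is skipped, drop-out occurs after visit 4.
example = partition_visits([1, 1, 0, 1, 1, 0, 0, 0])
print(example)
```

In this made-up example the observed and intermittent indices together cover visits 0 through S, which matches the note that the history through S consists of the intermittent missing and the observed responses.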
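We let the parameters $\theta_z$ index a model for the joint conditional distribution of $S$ and $\bar{Y}_S$ given $Z = z$ and the parameters qs,z index a model for the conditional distribution of $R_S$ given

[Figure 5-1. Contour and Perspective Plots of a Bivariate Density. Panels: Contour Plot, Density Plot; axis label: Lower Bound.]

which implies $p_s(y_j \mid \bar{y}_{j-1}) = p_{\ge j}(y_j \mid \bar{y}_{j-1})$ for all $s < j$, i.e. MAR.

4.5 MAR and Sensitivity Analysis with Multivariate Normality on the Observed-Data Response

If we assume multivariate normality only on the observed data response, $Y_{obs} \mid S$, instead of the full data response, $Y \mid S$, we can weaken the restrictions on $p_s(y_j \mid \bar{y}_{j-1})$ for $s \ge j$ and allow the MDM to incorporate all observed data under MAR (cf. Corollary 4.3). For example, we may specify the distributions $Y_{obs} \mid S$ as follows:

$$\begin{aligned}
p_s(y_1) &\sim N(\mu_1^{(s)}, \sigma_1^{(s)}), & 1 \le s \le J \\
p_s(y_j \mid \bar{y}_{j-1}) &\sim N(\mu_{j|j^-}^{(s)}, \sigma_{j|j^-}^{(s)}), & 2 \le j \le s \le J
\end{aligned}$$

where

$$\mu_{j|j^-}^{(s)} = \beta_{j,0}^{(s)} + \sum_{l=1}^{j-1} \beta_{j,l}^{(s)} y_l .$$

To identify the full-data model, recall that the MAR constraints imply that

$$p_s(y_j \mid \bar{y}_{j-1}) = p_{\ge j}(y_j \mid \bar{y}_{j-1}) = \sum_{k=j}^{J} \frac{P(S = k)}{P(S \ge j)}\, p_k(y_j \mid \bar{y}_{j-1}) \qquad (4\text{-}4)$$

for $s < j$, which are mixtures of normals. For sensitivity analysis in this setting of mixtures of normals, we propose to introduce sensitivity parameters $\Delta_\mu^{(j)}$ (location) and $\Delta_\sigma^{(j)}$ (scale) such that for $s < j$

$$p_s(y_j \mid \bar{y}_{j-1}) = e^{-\Delta_\sigma^{(j)}} \sum_{k=j}^{J} \varpi_{j,k}\, p_k\!\left( \frac{y_j - \Delta_\mu^{(j)} - (1 - e^{\Delta_\sigma^{(j)}})\, \mu_{j|j^-}^{(k)}}{e^{\Delta_\sigma^{(j)}}} \,\Big|\, \bar{y}_{j-1} \right) \qquad (4\text{-}5)$$

where $\varpi_{j,k} = P(S = k) / P(S \ge j)$. The rationale for this parameterization is that each $p_k(\cdot \mid \bar{y}_{j-1})$ in the summation will have mean $\Delta_\mu^{(j)} + \mu_{j|j^-}^{(k)}$ and variance $e^{2\Delta_\sigma^{(j)}} \sigma_{j|j^-}^{(k)}$. To reduce the dimension of the sensitivity parameters, we could make $\Delta_\mu^{(j)}$ and $\Delta_\sigma^{(j)}$ common for all $j$ (namely $\Delta_\mu$ and $\Delta_\sigma$).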
$$\begin{aligned}
&\cdots \; P[Y_{k-1} = y_{k-1} \mid R_{k-1} = 1, \bar{Y}_{k-2} = \bar{y}_{k-2}, Z = z] \times P[R_{k-1} = 1 \mid R_{k-2} = 1, \bar{Y}_{k-2} = \bar{y}_{k-2}, Z = z] \\
&\quad + \sum_{y_{k-1}=0}^{1} P[Y_j = 1 \mid R_{k-1} = 0, R_{k-2} = 1, \bar{Y}_{k-1} = \bar{y}_{k-1}, Z = z] \times P[Y_{k-1} = y_{k-1} \mid R_{k-1} = 0, \bar{Y}_{k-2} = \bar{y}_{k-2}, Z = z] \\
&\qquad \times P[R_{k-1} = 0 \mid R_{k-2} = 1, \bar{Y}_{k-2} = \bar{y}_{k-2}, Z = z] \\
&= \sum_{y_{k-1}=0}^{1} P[Y_j = 1 \mid R_{k-1} = 1, \bar{Y}_{k-1} = \bar{y}_{k-1}, Z = z] \times P[Y_{k-1} = y_{k-1} \mid R_{k-1} = 1, \bar{Y}_{k-2} = \bar{y}_{k-2}, Z = z] \\
&\qquad \times P[R_{k-1} = 1 \mid R_{k-2} = 1, \bar{Y}_{k-2} = \bar{y}_{k-2}, Z = z] \\
&\quad + \sum_{y_{k-1}=0}^{1} P[Y_j = 1 \mid R_{k-1} = 1, \bar{Y}_{k-1} = \bar{y}_{k-1}, Z = z] \times \frac{P[Y_{k-1} = y_{k-1} \mid R_{k-1} = 1, \bar{Y}_{k-2} = \bar{y}_{k-2}, Z = z]\, \exp\{q_{z,k-1}(\bar{y}_{k-2}, y_{k-1})\}}{E[\exp\{q_{z,k-1}(\bar{Y}_{k-2}, Y_{k-1})\} \mid R_{k-1} = 1, \bar{Y}_{k-2} = \bar{y}_{k-2}, Z = z]} \\
&\qquad \times P[R_{k-1} = 0 \mid R_{k-2} = 1, \bar{Y}_{k-2} = \bar{y}_{k-2}, Z = z]
\end{aligned}$$

The third equality follows by Assumptions 1 and 2. Since all the quantities on the right hand side of the last equality are identified, $P[Y_j = 1 \mid R_{k'-1} = 1, \bar{Y}_{k'-1} = \bar{y}_{k'-1}, Z = z]$ is identified. □

The identifiability result shows that, given the functions $q_{z,j}(\bar{Y}_{j-1}, Y_j)$, $p^*_{z,j}$ can be expressed as a functional of the distribution of the observed data. In particular, the functional depends on the conditional distributions of $Y_j$ given $R_j = 1$, $\bar{Y}_{j-1}$, and $Z$ for $j = 0, \ldots, J$ and the conditional distributions of $R_j$ given $R_{j-1} = 1$, $\bar{Y}_{j-1}$ and $Z$ for $j = 1, \ldots, J$. Furthermore, the functions $q_{z,j}(\bar{Y}_{j-1}, Y_j)$ are not identifiable from the distribution of the observed data and their specification places no restrictions on the distribution of the observed data.

3.6 Summary and Discussion

In this Chapter, we extended the Bayesian shrinkage approach proposed in Chapter 2 to handle intermittent missingness. In addition, we reparameterized the saturated observed data model and dramatically improved the computational efficiency. WinBUGS can still be applied to the reparameterized model when there is no intermittent missing data. However, with intermittent missingness, the augmentation step in the posterior computation requires extensive programming in WinBUGS. Nevertheless, the approach in Chapter 2 may still be preferred in certain cases, e.g. for directly shrinking the interaction terms.
As an extension, we might consider alternatives to the partial ignorability assumption (Assumption 1), which has been widely used but questioned by some (Robins, 1997).

3.7 Tables and Figures

Table 3-1. Missingness by Scheduled Measurement Time

| Time Point j (Month) | 1 (3) | 2 (6) | 3 (12) | 4 (18) | 5 (24) | 6 (30) | 7 (36) |
|---|---|---|---|---|---|---|---|
| Tamoxifen (Total N = 5364, Overall Missing 34.94%) | | | | | | | |
| Intermittent Missing | 330 | 224 | 190 | 200 | 203 | 195 | |
| Drop-out at j | 160 | 122 | 259 | 280 | 332 | 352 | 369 |
| Cumulative Drop-out | 160 | 282 | 541 | 821 | 1153 | 1505 | 1874 |
| Placebo (Total N = 5375, Overall Missing 31.83%) | | | | | | | |
| Intermittent Missing | 347 | 215 | 153 | 181 | 199 | 197 | |
| Drop-out at j | 157 | 106 | 247 | 287 | 309 | 272 | 333 |
| Cumulative Drop-out | 157 | 263 | 510 | 797 | 1106 | 1378 | 1711 |

Figure 3-1. (Panels: Placebo, Tamoxifen; horizontal axis: Time Points, months 3 to 36.) Solid and dashed lines represent the empirical rates of $P[Y_j = 1, S \ge j \mid Z = z]$ and $P[S < j \mid Z = z]$, respectively. The posterior means of $P[Y_j = 1, S \ge j \mid Z = z]$ (diamond) and $P[S < j \mid Z = z]$ (triangle) and their 95% credible intervals are displayed at each time point.

";""~, 1999) as follows .)y 9. ,2 nd l,(Y) I zgE ) and T ) zJ ~(g (E)) ) y+ 1)2,j-1,y SzJ, zJ,y (g (E) (7y) )2 where * g(-) is a summary function (e.g., minimum, median or maximum, as suggested in Christiansen and Morris (1997)). Ey) = {e()- : the expected number of subjects with S > j, Y,_1 = y, Y_-2 = zzJyiy 2,Y yj-2, Z = z}. E(- {e ) the expected number of subjects with S > j 1, Yj_1 = zJ-l,y Z,J lyIj 2 y, Yj-2 Y -2, Z z}. The expected number of subjects with S > j, Y-_1 = y, Yi-2 = Y-2, Z = z and with S > j 1, Yi- = y, Yj-2 = Y-2, Z = z can be computed as: J e -y =nz P[S =s, Y = ,,... j = Y- = y, Yj-2= Yj_2Z = z] s-= yj,yj+I ...,ys J eK'Y = nz P[S = s, Y, = y,..., Yj = Y, Y-1 = y, Yj-2 = Yj_21Z = z] s=j-1 y,,yj,+...,ys (3-4) where the probabilities on the right hand side of the above equations are estimable under Assumption 1.
The expected sample sizes above are used in the prior instead of the observed binomial sample sizes, which are not completely determined due to the intermittent missingness. Thus, our formulation of these priors induces a small additional amount of data dependence beyond its standard dependence on the binomial sample sizes. This additional dependence affects the median of the prior but not its diffuseness.

3.3.3 Prior of Sensitivity Parameters

We use the same approach as in Chapter 2, Section 2.6.2 for constructing priors of $\tau_z$ given $\theta_z$.

(3-3) which is independent of s. Therefore, P(S > A) S ( p(zyz)-I' X)--N s --J ( P() + j-1 /. (Z/ /=1 Similarly, we may derive that pk(ZjZZj-1,X) = N (P) + j-1 / (Z The constraints (4-10) thus imply (S) = (-1 which places no restrictions on ) which places no restrictions on (s). ( )) (o J) P/1 7j( ) . /s)) ) P/ 7JO- ) P/1 ))

The posterior probability of depression was higher under the MNAR analysis than the MAR analysis since researchers believed depressed patients were more likely to drop out (see Table 2-2), a belief that was captured by the elicited priors. Figure 3-4 shows that under the two treatments there were no significant differences in the depression rates at any measurement time (95% credible intervals all cover zero) under both MNAR and MAR. Similar (non-significant) treatment differences were seen when examining treatment comparisons conditional on depression status at baseline.

3.5.3 Sensitivity of Inference to the Priors

To assess the sensitivity of inference on the 36 month depression rates to the elicited (informative) priors $\{r_{min}, r_{med}, r_{max}\}$, we considered several alternative scenarios based on Table 1. In the first scenario, we made the priors more or less informative by scaling the range, but leaving the median unchanged. That is, we considered increasing (or decreasing) the range by a scale factor $v$ to $\{r_{med} - v(r_{med} - r_{min}),\ r_{med},\ r_{med} + v(r_{max} - r_{med})\}$.
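The rescaling in the first scenario can be sketched as a small Python helper; the shift by $u$ that the text considers as its second scenario is included alongside it for comparison. The function names and the numeric triple used in the example are hypothetical and are not the elicited values from the dissertation's tables.

```python
from typing import Tuple

Triple = Tuple[float, float, float]


def scale_prior(r_min: float, r_med: float, r_max: float, v: float) -> Triple:
    """Scale the elicited range by v while leaving the median unchanged."""
    return (r_med - v * (r_med - r_min), r_med, r_med + v * (r_max - r_med))


def shift_prior(r_min: float, r_med: float, r_max: float, u: float) -> Triple:
    """Shift the whole elicited triple by u."""
    return (u + r_min, u + r_med, u + r_max)


# Made-up elicited triple (minimum, median, maximum); illustration only.
elicited = (1.0, 1.5, 3.0)

wider = scale_prior(*elicited, v=2.0)     # range doubled around the median
narrower = scale_prior(*elicited, v=0.5)  # range halved around the median
shifted = shift_prior(*elicited, u=0.5)
print(wider, narrower, shifted)
```

Note that the sketch does no validity checking: with a large enough $v$ the scaled lower end can drop to zero or below, which would need separate handling if the triple is meant to describe relative risks.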
In the second scenario, we shifted the prior by a factor $u$, to $\{u + r_{min},\ u + r_{med},\ u + r_{max}\}$. The posterior mean and between-treatment difference of the depression rate at month 36 with 95% CI are given in Tables 3-3 and 3-4. None of the scenarios considered resulted in a 95% CI for the difference in rates of depression at 36 months that excluded zero, except for the (extreme) scenario where the elicited tamoxifen intervals were shifted by 0.5 and the elicited placebo intervals were shifted by -0.5. We also assessed the impact of switching the priors for the placebo and tamoxifen arms; in this case, the posterior means were 0.135 (95% CI: 0.124, 0.146) and 0.123 (95% CI: 0.112, 0.134) for the placebo and tamoxifen arms respectively, while the difference was -0.012 (95% CI: -0.027, 0.004).

uncertainty of the untestable assumptions, sensitivity analysis is carried out, and/or bounds of the causal effects are derived. For example, Zhang and Rubin (2003) derived large sample bounds for causal effects without assumptions and with assumptions such as monotonicity of the death rate across treatment arms. Gilbert et al. (2003) used a class of logistic selection bias models to identify the causal estimands and carried out sensitivity analysis for the magnitude of selection bias. Hayden et al. (2005) assumed "explainable nonrandom noncompliance" (Robins, 1998) and outlined a sensitivity analysis for exploring the robustness of the assumption. Cheng and Small (2006) derived sharp bounds for the causal effects and constructed confidence intervals to cover the identification region. Egleston et al. (2007) proposed a method similar to that of Zhang and Rubin (2003), but instead of identifying the full joint distribution of potential outcomes, they only identify features of the joint distribution that are necessary for identifying the SACE estimand. Lee et al.
(2010) replaced the common deterministic monotonicity assumption by a stochastic one that allows incorporation of subject specific effects and generalized the assumptions to more complex trials.

5.3.2 Data and Notation

The following notation is defined for a random individual. When necessary, we use the subscript $i$ to denote data for the $i$th individual. We consider a controlled randomized clinical study with treatment arm ($Z = 1$) and control arm ($Z = 0$). A longitudinal binary outcome $Y$ is scheduled to be measured at visits $j = 1, \ldots, J$, i.e. $Y = (Y_1, \ldots, Y_J)$ is a $J$-dimensional vector. Let $R = (R_1, \ldots, R_J)$ be the missing indicator vector with $R_j = 1$ if $Y_j$ is observed and $R_j = 0$ if $Y_j$ is missing. We assume the missingness is monotone. We assume there are multiple events that will cause drop out for a patient on this trial, and categorize the events as non-response events (e.g. death) and missing events (e.g. withdrawal of consent). We assume that non-response events may happen after the

outcomes. In this paper, we consider the same setting as Lee et al. (2008), but offer a more flexible strategy. In the context of BCPT, the longitudinal outcome will be the indicator that the CES-D score is 16 or higher.

2.1.3 Outline

The paper is organized as follows. In Section 2.2, we describe the data structure. In Sections 2.3 and 2.4, we formalize identification assumptions and prove that the full-data distribution is identified under these assumptions. We introduce a saturated model for the distribution of the observed data in Section 2.5. In Section 2.6, we illustrate how to apply shrinkage priors to high-order interaction parameters in the saturated model to reduce the dimensionality of the parameter space and how to elicit (conditional) informative priors for non-identified sensitivity parameters from experts. In Section 2.7, we assess, by simulation, the behavior of three classes of models for the distribution of observed data: parametric, saturated, and shrinkage.
Our analysis of the BCPT trial is presented in Section 2.8. Section 2.9 is devoted to a summary and discussion.

2.2 Data Structure and Notation

Let $Z$ denote the treatment assignment indicator, where $Z = 1$ denotes tamoxifen and $Z = 0$ denotes placebo. Let $Y_j$ denote the binary outcome (i.e., depression) scheduled to be measured at the $j$th visit ($j = 0$ (baseline), $\ldots, J$) and let $\bar{Y}_j = (Y_0, \ldots, Y_j)$ denote the history of the outcome process through visit $j$. Let $R_j$ denote the indicator that an individual has her depression status recorded at visit $j$. We assume that $R_0 = 1$ (i.e., $Y_0$ is always observed) and $R_j = 0$ implies that $R_{j+1} = 0$ (i.e., monotone missing data). Let $C = \max\{t : R_t = 1\}$ be the last visit at which an individual's depression status is recorded. The full and observed data for an individual are $F = (Z, C, \bar{Y}_J)$ and $O = (Z, C, \bar{Y}_C)$, respectively. We assume that we observe $n$ i.i.d. copies of $O$. We will use the subscript $i$ to denote data for the $i$th individual. Our goal is to draw inference about $p^*_{z,j} = P[Y_j = 1 \mid Z = z]$ for $j = 1, \ldots, J$ and $z = 0, 1$.

So, a default approach for continuous $Y$, assuming the full data response is multivariate normal within pattern, does not allow an MAR restriction (unless the restrictions in Theorem 4.2 are imposed). We now examine the corresponding missing data mechanism (MDM), $S \mid Y$. We use "$\sim$" to denote equality in distribution.

Corollary 4.3. For pattern mixture models of the form (4-1) with monotone dropout, MAR holds if and only if $S \mid Y \sim S \mid Y_1$.

Proof. Since $Y_1$ is always observed (by assumption), $S \mid Y \sim S \mid Y_1$ implies that $S \mid Y_{mis}, Y_{obs} \sim S \mid Y_{obs}$, where $Y_{mis}$ and $Y_{obs}$ denote the missing and observed data respectively. This shows that MAR holds. On the other hand, MAR implies that $p(S = s \mid Y) = p(S = s \mid Y_{obs}) = p(S = s \mid \bar{Y}_s)$. By Theorem 4.2, we have that MAR holds only if for all $1 < j \le J$, the conditional distributions $p_s(y_j \mid \bar{y}_{j-1})$ are identical for $s \ge j$. Thus, under MAR, $p_k(y_j \mid \bar{y}_{j-1}) = p_{\ge j}(y_j \mid \bar{y}_{j-1}) = p_s(y_j \mid \bar{y}_{j-1})$ for all $j \ge 2$, $k < j$ and $s \ge j$.
This implies that for all $j \ge 2$

$$p(y_j \mid \bar{y}_{j-1}) = \sum_{s=1}^{J} p_s(y_j \mid \bar{y}_{j-1})\, p(S = s) = p_s(y_j \mid \bar{y}_{j-1})$$

for all $s$. Therefore,

$$p(S = s \mid Y) = \frac{p_s(y_J \mid \bar{y}_{J-1}) \cdots p_s(y_2 \mid y_1)\, p_s(y_1)\, p(S = s)}{p(y_J \mid \bar{y}_{J-1}) \cdots p(y_2 \mid y_1)\, p(y_1)} = \frac{p_s(y_1)\, p(S = s)}{p(y_1)} = p(S = s \mid y_1).$$

Figure 3-3. Posterior distribution of $P[Y_7 = 1 \mid Z = z]$ (horizontal axis: Depression Probability). Black and gray lines represent tamoxifen and placebo arms, respectively. Solid and dashed lines are for MNAR and MAR, respectively.

likely you consider the risk of dropping out before visit j for a woman who would be depressed at year RELATIVE to a woman who would not be depressed at visit j? Implicit in this question is the assumption that, for each treatment group, the relative risk depends on past history and the visit number only through the risk of dropping out between visits $j - 1$ and $j$. For notational convenience, let $r_z(p)$ denote the relative risk of drop-out for treatment group $z$ and drop-out probability $p$. Further, let $r_{z,min}(p)$, $r_{z,med}(p)$ and $r_{z,max}(p)$ denote the elicited minimum, median, and maximum relative risks (see Table 2-1).

Let $p_{z,j}(\bar{y}_{j-1}) = P[R_j = 0 \mid R_{j-1} = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z]$ and let $p^{(y)}_{z,j}(\bar{y}_{j-1}) = P[R_j = 0 \mid R_{j-1} = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Y_j = y, Z = z]$ for $y = 0, 1$. By definition,

$$r_z(p_{z,j}(\bar{y}_{j-1})) = \frac{p^{(1)}_{z,j}(\bar{y}_{j-1})}{p^{(0)}_{z,j}(\bar{y}_{j-1})} \qquad \text{and} \qquad p_{z,j}(\bar{y}_{j-1}) = \sum_{y=0}^{1} p^{(y)}_{z,j}(\bar{y}_{j-1})\, \omega^{(y)}_{z,j}(\bar{y}_{j-1}),$$

where $\omega^{(y)}_{z,j}(\bar{y}_{j-1}) = P[Y_j = y \mid R_{j-1} = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z]$ for $y = 0, 1$. This implies that

$$p^{(0)}_{z,j}(\bar{y}_{j-1}) = \frac{p_{z,j}(\bar{y}_{j-1})}{\omega^{(1)}_{z,j}(\bar{y}_{j-1})\,\big(r_z(p_{z,j}(\bar{y}_{j-1})) - 1\big) + 1}.$$

Since $\omega^{(1)}_{z,j}(\bar{y}_{j-1}) \in [0, 1]$, given $p_{z,j}(\bar{y}_{j-1})$ and $r_z(p_{z,j}(\bar{y}_{j-1}))$, $p^{(0)}_{z,j}(\bar{y}_{j-1})$ is bounded as follows: for $r_z(p_{z,j}(\bar{y}_{j-1})) \ge 1$,

$$p_{z,j}(\bar{y}_{j-1}) / r_z(p_{z,j}(\bar{y}_{j-1})) \le p^{(0)}_{z,j}(\bar{y}_{j-1}) \le \min\{p_{z,j}(\bar{y}_{j-1}), 1\}$$

and, for $r_z(p_{z,j}(\bar{y}_{j-1})) \le 1$,

$$p_{z,j}(\bar{y}_{j-1}) \le p^{(0)}_{z,j}(\bar{y}_{j-1}) \le \min\{p_{z,j}(\bar{y}_{j-1}) / r_z(p_{z,j}(\bar{y}_{j-1})), 1\}.$$

We will use these bounds to construct our prior.
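The bounds above follow from letting the conditional depression probability range over [0, 1] in the expression for the drop-out probability among the non-depressed. A minimal Python sketch of that arithmetic is given below; the function names and the numeric inputs are hypothetical, and the code only restates the two inequalities from the text rather than the prior construction built on them.

```python
from typing import Tuple


def p0_given_omega(p: float, r: float, omega1: float) -> float:
    """Drop-out probability for the non-depressed, given overall drop-out probability p,
    relative risk r, and omega1 = P[Y_j = 1 | still on study, history]."""
    return p / (omega1 * (r - 1.0) + 1.0)


def p0_bounds(p: float, r: float) -> Tuple[float, float]:
    """Range of p0 as omega1 varies over [0, 1], for fixed p and r."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if r <= 0.0:
        raise ValueError("the relative risk r must be positive")
    if r >= 1.0:
        return (p / r, min(p, 1.0))
    return (p, min(p / r, 1.0))


# Made-up inputs: overall drop-out probability 0.1, relative risk 2.
lower, upper = p0_bounds(0.1, 2.0)
print(lower, upper)
```

Evaluating `p0_given_omega` at `omega1 = 0` and `omega1 = 1` reproduces the two endpoints when the cap at 1 is not active, which is a quick way to see where each bound comes from.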
In particular, MAR provides a good starting point for sensitivity analysis, and sensitivity analyses are essential for the analysis of incomplete data (Daniels and Hogan, 2008; Scharfstein et al., 1999; Vansteelandt et al., 2006b). The normality of response data (if appropriate) for pattern mixture models is desirable as it easily allows incorporation of baseline covariates and introduction of sensitivity parameters (for MNAR analysis) that have convenient interpretations as deviations of means and variances from MAR (Daniels and Hogan, 2008). However, multivariate normality within patterns can be overly restrictive. We explore such issues in this paper. One criticism of mixture models is that they often induce missing data mechanisms that depend on the future (Kenward et al., 2003). We explore such non-future dependence in our context here and show how mixture models that have such missing data mechanisms have fewer sensitivity parameters.

In Section 4.2, we show conditions under which MAR exists and does not exist when the full-data response is assumed multivariate normal within each missing pattern. In Section 4.3 and Section 4.4, in the same setting, we explore sensitivity analysis strategies under MNAR and under non-future dependent MNAR respectively. In Section 4.5, we propose a sensitivity analysis approach where only the observed data within pattern are assumed multivariate normal. In Section 4.6, we apply the frameworks described in previous sections to a randomized clinical trial for estimating the effectiveness of recombinant growth hormone for increasing muscle strength in the elderly. In Section 4.7, we show that in the presence of baseline covariates with time-invariant coefficients, standard identifying restrictions cause over-identification of the baseline covariate effects and we propose a remedy. We provide conclusions in Section 8.

Figure 2-6.
Posterior mean and 95% credible interval of the difference of $P[Y_j = 1 \mid Z = z]$ between placebo and tamoxifen arms. The gray and white boxes are for MAR and MNAR, respectively.

fully and proposed alternatives based on multivariate normality for the observed data response within patterns. In both these contexts, we proposed strategies for specifying sensitivity parameters. In addition, we showed that when introducing baseline covariates with time invariant coefficients, standard identifying restrictions result in over-identification of the model. This is against the principle of applying identifying restrictions, in that they should not affect the model fit to the observed data. We proposed a simple alternative set of restrictions based on residuals that can be used as an 'identification' starting point for an analysis using mixture models.

In the growth hormone study data example, we showed how to reduce the number of sensitivity parameters in practice and a default way to construct informative priors for sensitivity parameters based on limited knowledge about the missingness. In particular, all the values in the range were weighted equally via a uniform distribution. If there is additional external information from expert opinion or historical data, informative priors may be used to incorporate such information (for example, see Ibrahim and Chen, 2000; Wang et al., 2010). Finally, an important consideration in sensitivity analysis and constructing informative priors is that they should avoid extrapolating missing values outside of a reasonable range (e.g., negative quadriceps strength).

4.9 Tables

Table 4-1. Growth Hormone Study: Sample mean (standard deviation) stratified by dropout pattern.
| Treatment | Dropout Pattern | Number of Participants | Month 0 | Month 6 | Month 12 |
|---|---|---|---|---|---|
| EG | 1 | 12 | 58 (26) | | |
| EG | 2 | 4 | 57 (15) | 68 (26) | |
| EG | 3 | 22 | 78 (24) | 90 (32) | 88 (32) |
| EG | All | 38 | 69 (25) | 87 (32) | 88 (32) |
| EP | 1 | 7 | 65 (32) | | |
| EP | 2 | 2 | 87 (52) | 86 (51) | |
| EP | 3 | 31 | 65 (24) | 81 (25) | 73 (21) |
| EP | All | 40 | 66 (26) | 82 (26) | 73 (21) |

combinations of $\bar{Y}_{j-1}$ (i.e., "cells") which will be sparsely represented in the dataset. For example, in the BCPT trial, about 50% of the possible realizations of $\bar{Y}_7$ have fewer than two observations and about 15% have no observations. From a frequentist perspective, this implies that components of $\theta$ will be imprecisely estimated; in turn, this can adversely affect estimation of $p^*_{z,j}$. This has been called the curse of dimensionality (Robins and Ritov, 1997).

2.6.1 Shrinkage Priors

To address this problem, we introduce data driven shrinkage priors for higher order interactions to reduce the number of parameters in an automated manner. In particular, we assume

$$\alpha^{(t)}_{z,j,k} \sim N(0, \sigma^{(t)}_{\alpha}) \quad \text{and} \quad \gamma^{(t)}_{z,j,k} \sim N(0, \sigma^{(t)}_{\gamma}), \qquad k \in A_j^{(t)},$$

where $t$ is the order of interactions and the hyper-parameters (shrinkage variances) follow distributions $\sigma^{(t)}_{\alpha} \sim \text{Unif}(0, 10)$ and $\sigma^{(t)}_{\gamma} \sim \text{Unif}(0, 10)$. When $\sigma^{(t)}_{\alpha}$ and $\sigma^{(t)}_{\gamma}$ equal zero for all interactions, the saturated model is reduced to a first order Markov model,

$$\begin{aligned}
\text{logit}\, P[Y_0 = 1 \mid R_0 = 1, Z = z] &= \alpha_{z,0,0} \\
\text{logit}\, P[Y_j = 1 \mid R_j = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z] &= \alpha_{z,j,0} + \alpha_{z,j,1} y_{j-1} \\
\text{logit}\, P[R_j = 0 \mid R_{j-1} = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z] &= \gamma_{z,j,0} + \gamma_{z,j,1} y_{j-1}.
\end{aligned}$$

The shrinkage priors allow the "neighboring" cells in the observed data model to borrow information from each other and provide more precise estimates. When the first order Markov model is not true, as $n$ goes to infinity, the posterior means of observed data probabilities will converge to their true values as long as

Articles by Little (1995), Hogan and Laird (1997b) and Kenward and Molenberghs (1999) as well as recent books by Molenberghs and Kenward (2007) and Daniels and Hogan (2008) provide a comprehensive review of likelihood-based approaches, including selection models, pattern-mixture models, and shared-parameter models.
These models differ in the way the joint distribution of the outcome and missing data processes is factorized. In selection models, one specifies a model for the marginal distribution of the outcome process and a model for the conditional distribution of the drop-out process given the outcome process (see, for example, Albert, 2000; Baker, 1995; Diggle and Kenward, 1994; Fitzmaurice et al., 1995; Heckman, 1979a; Liu et al., 1999; Molenberghs et al., 1997); in pattern-mixture models, one specifies a model for the conditional distribution of the outcome process given the drop-out time and the marginal distribution of the drop-out time (see, for example, Birmingham and Fitzmaurice, 2002; Daniels and Hogan, 2000; Fitzmaurice and Laird, 2000b; Hogan and Laird, 1997a; Little, 1993, 1994, 1995; Pauler et al., 2003; Roy, 2003; Roy and Daniels, 2008; Thijs et al., 2002); and in shared-parameter models, the outcome and drop-out processes are assumed to be conditionally independent given shared random effects (see, for example, DeGruttola and Tu, 1994; Land et al., 2002; Pulkstenis et al., 1998; Ten Have et al., 1998, 2000; Wu and Carroll, 1988; Yuan and Little, 2009). Traditionally, these models have relied on very strong distributional assumptions in order to obtain model identifiability. Without these strong distributional assumptions, specific parameters from these models would not be identified from the distribution of the observed data. To address this issue within a likelihood-based framework, several authors (Baker et al., 1992; Daniels and Hogan, 2008; Kurland and Heagerty, 2004; Little, 1994; Little and Rubin, 1999; Nordheim, 1984) have promoted the use of global sensitivity analysis, whereby non- or weakly-identified, interpretable parameters are fixed and then varied to evaluate

is consistent if the marginal mean of response is correctly specified. However, inference based on GEE is only valid under MCAR. Robins et al.
(1995) proposed the inverse-probability-of-censoring weighted generalized estimating equations (IPCW-GEE) approach, which reweights each individual's contribution to the usual GEE by the estimated probability of drop-out. IPCW-GEE will lead to consistent estimation when the missingness is MAR. However, both GEE and IPCW-GEE can result in biased estimation under MNAR. Rotnitzky et al. (1998a, 2001), Scharfstein et al. (2003) and Schulman et al. (1999) adopted semiparametric selection modeling approaches, in which the model for drop-out is indexed by interpretable sensitivity parameters that express departures from MAR. For such approaches, the inference results depend on the choice of unidentified, yet interpretable, sensitivity analysis parameters.

1.4 Intermittent Missingness

Intermittent missingness occurs when a missing value is followed by an observed value. The existence of intermittent missing values exponentially increases the number of missing patterns that need to be properly modeled. Thus, handling informative intermittent missing data is methodologically and computationally challenging and, as a result, the statistics literature is limited. One approach to handling intermittent missingness is to consider a "monotonized" dataset, whereby all observed values on an individual after their first missingness are deleted (Land et al., 2002). However, this increases the "dropout" rate, loses efficiency, and may introduce bias. Other methods in the literature often adopt a likelihood approach and rely on strong parametric assumptions. For example, Troxel et al. (1998), Albert (2000) and Ibrahim et al. (2001) suggested a selection model approach. Albert et al. (2002) used a shared latent autoregressive process model. Lin et al. (2004) employed a latent class pattern mixture model.

Figure 3-4. Posterior mean and 95% credible interval of the difference of $P[Y_j = 1 \mid Z = z]$ between placebo and tamoxifen arms.
The gray and white boxes are for MAR and MNAR, respectively.

CHAPTER 2
A BAYESIAN SHRINKAGE MODEL FOR LONGITUDINAL BINARY DATA WITH DROP-OUT

2.1 Introduction

2.1.1 Breast Cancer Prevention Trial

The Breast Cancer Prevention Trial (BCPT) was a large multi-center, double-blinded, placebo-controlled, chemoprevention trial of the National Surgical Adjuvant Breast and Bowel Project (NSABP) designed to test the efficacy of 20mg/day tamoxifen in preventing breast cancer and coronary heart disease in healthy women at risk for breast cancer (Fisher et al., 1998). The study was open to accrual from June 1, 1992 through September 30, 1997 and 13,338 women aged 35 or older were enrolled in the study during this interval. The primary objective was to determine whether long-term tamoxifen therapy is effective in preventing the occurrence of invasive breast cancer. Secondary objectives included quality of life (QOL) assessments to evaluate benefit as well as risk resulting from the use of tamoxifen.

Monitoring QOL was of particular importance for this trial since the participants were healthy women and there had been concerns voiced by researchers about the association between clinical depression and tamoxifen use. Accordingly, data on depression symptoms were scheduled to be collected at baseline prior to randomization, at 3 months, at 6 months and every 6 months thereafter for up to 5 years. The primary instrument used to monitor depressive symptoms over time was the Center for Epidemiologic Studies Depression Scale (CES-D) (Radloff, 1977). This self-test questionnaire is composed of 20 items, each of which is scored on a scale of 0-3. A score of 16 or higher is considered a likely case of clinical depression.

The trial was unblinded on March 31, 1998, after an interim analysis showed a dramatic reduction in the incidence of breast cancer in the treatment arm.
Due to the potential loss of the control arm, we focus on QOL data collected on the 10,982 participants who were enrolled during the first two years of accrual and had their CES-D

2.5 Modeling

We specify saturated models for the observed data via the sequential conditional distributions of $[Y_j \mid R_j = 1, \bar{Y}_{j-1}, Z]$ for $j = 0, \ldots, J$ and the conditional hazards $[R_j \mid R_{j-1} = 1, \bar{Y}_{j-1}, Z]$ for $j = 1, \ldots, J$. We parameterize these models as follows:

$$\begin{aligned}
\text{logit}\, P[Y_0 = 1 \mid R_0 = 1, Z = z] &= \alpha_{z,0,0} \\
\text{logit}\, P[Y_j = 1 \mid R_j = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z] &= \alpha_{z,j,0} + \alpha_{z,j,1} y_{j-1} + \sum_{k=0}^{j-2} \alpha^{(1)}_{z,j,k} y_k + \sum_{k,l \in A_j^{(2)}} \alpha^{(2)}_{z,j,kl} y_k y_l \\
&\quad + \sum_{k,l,m \in A_j^{(3)}} \alpha^{(3)}_{z,j,klm} y_k y_l y_m + \cdots + \alpha^{(j-1)}_{z,j} y_0 y_1 \cdots y_{j-1} \\
\text{logit}\, P[R_j = 0 \mid R_{j-1} = 1, \bar{Y}_{j-1} = \bar{y}_{j-1}, Z = z] &= \gamma_{z,j,0} + \gamma_{z,j,1} y_{j-1} + \sum_{k=0}^{j-2} \gamma^{(1)}_{z,j,k} y_k + \sum_{k,l \in A_j^{(2)}} \gamma^{(2)}_{z,j,kl} y_k y_l \\
&\quad + \sum_{k,l,m \in A_j^{(3)}} \gamma^{(3)}_{z,j,klm} y_k y_l y_m + \cdots + \gamma^{(j-1)}_{z,j} y_0 y_1 \cdots y_{j-1}
\end{aligned}$$

for $j = 1, \ldots, J$, where $A_j^{(t)}$ is the set of all $t$-tuples of the integers $0, \ldots, j-1$. Let $\alpha$ denote the parameters indexing the conditional distributions $[Y_j \mid R_j = 1, \bar{Y}_{j-1}, Z]$, $\gamma$ denote the parameters indexing the conditional distributions $[R_j \mid R_{j-1} = 1, \bar{Y}_{j-1}, Z]$ and $\theta = \{\alpha, \gamma\}$.

Furthermore, we propose to parameterize the functions $q_{z,j}(\bar{Y}_{j-1}, Y_j)$ with parameters $\tau_{z,j,\bar{y}_{j-1}} = q_{z,j}(\bar{y}_{j-1}, 1) - q_{z,j}(\bar{y}_{j-1}, 0)$. Here, $\exp(\tau_{z,j,\bar{y}_{j-1}})$ represents, in the context of the BCPT trial, the conditional odds ratio of dropping out between visits $j - 1$ and $j$ for individuals who are depressed vs. not depressed at visit $j$, but share the mental history $\bar{y}_{j-1}$ through visit $j - 1$. We let $\tau$ denote the collection of $\tau_{z,j,\bar{y}_{j-1}}$'s.

2.6 Prior Specification and Posterior Computation

For specified sensitivity analysis parameters $\tau$, the saturated model proposed in Section 2.5 provides a perfect fit to the distribution of the observed data. In this model, however, the number of parameters increases exponentially in $J$. In contrast, the number of data points increases linearly in $J$.
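The exponential growth mentioned above can be made concrete with a short count. The sketch below assumes, as a reading of Section 2.5 rather than a statement taken from it, that a saturated model at visit j carries one free parameter per binary history of length j (that is, 2^j parameters) for the outcome model and the same number for the drop-out hazard, plus a single baseline parameter. The function name is hypothetical.

```python
from math import comb


def saturated_param_count(J: int) -> int:
    """Per-arm count of alpha and gamma parameters under the assumed saturated structure."""
    total = 1  # baseline outcome parameter at visit 0
    for j in range(1, J + 1):
        # intercept plus all main effects and interactions among j binary history terms
        per_model = sum(comb(j, t) for t in range(j + 1))  # equals 2**j
        total += 2 * per_model  # outcome model and drop-out hazard
    return total


for J in (1, 3, 5, 7):
    # J + 1 is the number of scheduled outcome measurements per individual
    print(J, J + 1, saturated_param_count(J))
```

Under this assumption the per-arm count equals 2^(J+2) - 3, so J = 7 already gives 509 parameters per arm, while each individual contributes at most J + 1 = 8 outcome measurements.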
As a consequence, there will be many

4.2 Existence of MAR under Multivariate Normality within Pattern

Let $Y$ be a $J$-dimensional longitudinal response vector with components scheduled to be measured at time points $t_j$ ($j \in \{1, \ldots, J\}$); this is the full data response. Without loss of generality, we assume $Y_1$ is always observed. Let $S = s$ denote the number of observed responses ($s = 1, 2, \ldots, J$) corresponding to the follow up time $t_s$. Let $\bar{Y}_j$ denote the historical response vector $(Y_1, Y_2, \ldots, Y_j)$. Finally, we define $p_s(\cdot) = p(\cdot \mid S = s)$. We show that MAR does not necessarily exist when it is assumed that

$$Y \mid S = s \sim N(\mu^{(s)}, \Sigma^{(s)}) \qquad (4\text{-}1)$$

for all $s$. To see this, we introduce some further notation. Let

$$\mu^{(s)}(j) = E(\bar{Y}_j \mid S = s) = \begin{pmatrix} \mu_1^{(s)}(j) \\ \mu_2^{(s)}(j) \end{pmatrix} \quad \text{and} \quad \Sigma^{(s)}(j) = \text{Var}(\bar{Y}_j \mid S = s) = \begin{pmatrix} \Sigma_{11}^{(s)}(j) & \Sigma_{12}^{(s)}(j) \\ \Sigma_{21}^{(s)}(j) & \Sigma_{22}^{(s)}(j) \end{pmatrix}$$

where $\mu_1^{(s)}(j) = E(\bar{Y}_{j-1} \mid S = s)$, $\mu_2^{(s)}(j) = E(Y_j \mid S = s)$, $\Sigma_{11}^{(s)}(j) = \text{Var}(\bar{Y}_{j-1} \mid S = s)$, $\Sigma_{22}^{(s)}(j) = \text{Var}(Y_j \mid S = s)$, $\Sigma_{21}^{(s)}(j) = \text{Cov}(Y_j, \bar{Y}_{j-1} \mid S = s)$ and $\Sigma_{12}^{(s)}(j)$ is the transpose of $\Sigma_{21}^{(s)}(j)$.

Lemma 4.1. For monotone dropout, under the model given in (4-1), define s)(j) (S) :( )(j) (s)() = = p^\)j) (Ss^O) K3 22 21 (j) ( 11 12

and can be viewed as a special case of MNAR and an extension of MAR (Kenward et al., 2003). Kenward et al. (2003) showed that non-future dependence holds if and only if for each $j \ge 3$ and $k < j - 1$,

$$p_k(y_j \mid \bar{y}_{j-1}) = p_{\ge j-1}(y_j \mid \bar{y}_{j-1}).$$

An approach to implement non-future dependence within the framework of Section 4.3 is as follows. We model the observed data as in (4-3). For the conditional distribution of the current missing data ($Y_{s+1}$), we assume that ps(ys| Ay ) ~ N (/3 s + ( A( s+)) e "S 2 < s < J ( 1/= 1 and for the conditional distribution of the future missing data ($Y_{s+2}, \ldots, Y_J$), we assume that

$$p_s(y_j \mid \bar{y}_{j-1}) = p_{\ge j-1}(y_j \mid \bar{y}_{j-1}), \qquad 2 \le s < j - 1 \le J - 1,$$

where

$$p_{\ge j-1}(y_j \mid \bar{y}_{j-1}) = \frac{p(S = j - 1)}{p(S \ge j - 1)}\, p_{j-1}(y_j \mid \bar{y}_{j-1}) + \frac{p(S \ge j)}{p(S \ge j - 1)}\, p_{\ge j}(y_j \mid \bar{y}_{j-1}).$$

Note that by this approach, although the model for future missing data is a mixture of normals, the sensitivity parameters are kept the same as in Section 4.3 ($\Delta_l^{(j)}$ and $\Delta_\sigma^{(j)}$, $j = 2, \ldots, J$ and $l = 0, \ldots, j - 1$).
In addition, this significantly reduces the number of potential sensitivity parameters. For $J$-dimensional longitudinal data, the total number of sensitivity parameters, $(2J^3 + 3J^2 + J)/6 - J$, is reduced to $(J^2 + 3J - 4)/2$; for $J = 3$ ($6$), from 11 (85) to 7 (25). Further reduction is typically needed. See the data example in Section 4.6 as an illustration. If all the remaining sensitivity parameters are set to zero, then we have

\[
p_s(y_{s+1} \mid \bar{y}_s) = p_{\ge s+1}(y_{s+1} \mid \bar{y}_s), \quad 2 \le s < J,
\]

and

\[
p_s(y_j \mid \bar{y}_{j-1}) = p_{\ge j}(y_j \mid \bar{y}_{j-1}), \quad 2 \le s < j - 1 \le J - 1,
\]

Similar to the bivariate case, to avoid over-identification, we again use the MAR on the residuals restriction,

\[
p_k\!\left( y_j - X\alpha^{(k)} \mid y_1 - X\alpha^{(k)}, \ldots, y_{j-1} - X\alpha^{(k)}, X \right) = \sum_{s=j}^{J} \frac{P(S = s)}{P(S \ge j)}\, p_s\!\left( y_j - X\alpha^{(s)} \mid y_1 - X\alpha^{(s)}, \ldots, y_{j-1} - X\alpha^{(s)}, X \right), \quad k < j.
\]

With the conditional mean structures specified as (4-8) and (4-9), the MAR on the residuals restriction places no assumptions on $\alpha^{(s)}$. The corresponding MDM is

\[
\log \frac{P(S = s \mid Y, X)}{P(S \ge s \mid Y, X)}
= \log \frac{P(S = s)\, p(Y \mid S = s, X)}{p(Y, S \ge s \mid X)}
= \log \frac{P(S = s)\, p_s(Y_J \mid \bar{Y}_{J-1}, X)\, p_s(Y_{J-1} \mid \bar{Y}_{J-2}, X) \cdots p_s(Y_1 \mid X)}{\sum_{l=s}^{J} p_l(Y_J \mid \bar{Y}_{J-1}, X)\, p_l(Y_{J-1} \mid \bar{Y}_{J-2}, X) \cdots p_l(Y_1 \mid X)\, P(S = l)}.
\]

It does not have a simple form in general. However, if $\alpha^{(s)} = \alpha^{*}$ for all $s$, then

\[
\log \frac{P(S = s \mid Y, X)}{P(S \ge s \mid Y, X)} = \log \frac{p_s(Y_1 \mid X)\, P(S = s)}{\sum_{l=s}^{J} p_l(Y_1 \mid X)\, P(S = l)},
\]

i.e., the MDM only depends on $Y_1$ and $X$. Otherwise, the missingness is MNAR.

4.8 Summary

Most pattern mixture models allow the missingness to be MNAR, with MAR as a unique point in the parameter space. The magnitude of departure from MAR can be quantified via a set of sensitivity parameters. For MNAR analysis, it is critical to find scientifically meaningful and dimensionally tractable sensitivity parameters. For this purpose, (multivariate) normal distributions are often found attractive since the MNAR departure from MAR can be parsimoniously defined by deviations in the mean and (co-)variance.
However, a simple pattern mixture model based on multivariate normality for the full data response within patterns does not allow MAR without special restrictions that themselves induce a very restrictive missing data mechanism. We explored this

3 A BETA-BINOMIAL BAYESIAN SHRINKAGE MODEL FOR INTERMITTENT MISSINGNESS LONGITUDINAL BINARY DATA
3.1 Introduction
3.1.1 Intermittent Missing Data
3.1.2 Computational Issues
3.1.3 Outline
3.2 Notation, Assumptions and Identifiability
3.3 Modeling, Prior Specification and Posterior Computation
3.3.1 Modeling
3.3.2 Shrinkage Prior
3.3.3 Prior of Sensitivity Parameters
3.3.4 Posterior Computation
3.4 Assessment of Model Performance via Simulation
3.5 Application: Breast Cancer Prevention Trial (BCPT)
3.5.1 Model Fit
3.5.2 Inference
3.5.3 Sensitivity of Inference to the Priors
3.6 Summary and Discussion
3.7 Tables and Figures
3.8 Appendix
4 A NOTE ON MAR, IDENTIFYING RESTRICTIONS, AND SENSITIVITY ANALYSIS IN PATTERN MIXTURE MODELS
4.1 Introduction
4.2 Existence of MAR under Multivariate Normality within Pattern
4.3 Sequential Model Specification and Sensitivity Analysis under MAR
4.4 Non-Future Dependence and Sensitivity Analysis under Multivariate Normality within Pattern
4.5 MAR and Sensitivity Analysis with Multivariate Normality on the Observed-Data Response
4.6 Example: Growth Hormone Study
4.7 ACMV Restrictions and Multivariate Normality with Baseline Covariates
4.7.1 Bivariate Case
4.7.2 Multivariate Case
4.8 Summary
4.9 Tables
4.10 Appendix
5 DISCUSSION: FUTURE APPLICATION OF THE BAYESIAN NONPARAMETRIC AND SEMI-PARAMETRIC METHODS
5.1 Summary
5.2 Extensions to Genetics Mapping
5.3 Extensions to Causal Inference

CHAPTER 3
A BETA-BINOMIAL BAYESIAN SHRINKAGE MODEL FOR INTERMITTENT MISSINGNESS LONGITUDINAL BINARY DATA

3.1 Introduction

We proposed in Chapter 2 a Bayesian shrinkage approach for longitudinal binary data with informative drop-out. The saturated observed data models were constructed sequentially via conditional distributions for response and for drop-out time and parameterized on the logistic scale using all interaction terms. However, two issues were not addressed: the "ignored" intermittent missing data and the intrinsic computational challenge with the "interaction" parameterization. This chapter proposes solutions to these two issues.

3.1.1 Intermittent Missing Data

In the BCPT, approximately 15% of the responses were intermittently missing, i.e., there are missing values prior to drop-out. One approach to handle intermittent missingness is to consider a "monotonized" dataset, whereby all CES-D scores observed on an individual after their first missing score are deleted, as in Land et al. (2002); we did this in Chapter 2. However, this increases the "drop-out" rate, throws away information and thus loses efficiency, and may introduce bias.

Handling informative intermittent missing data is methodologically and computationally challenging and, as a result, the statistics literature is relatively limited. Most methods adopt a likelihood approach and rely on strong parametric assumptions (see, for example, Albert, 2000; Albert et al., 2002; Ibrahim et al., 2001; Lin et al., 2004; Troxel et al., 1998). Semiparametric methods have been proposed by Troxel et al. (1998) and Vansteelandt et al. (2007).
Troxel et al. (1998) proposed a marginal model and introduced a pseudo-likelihood estimation procedure. Vansteelandt et al. (2007) extended the ideas of Rotnitzky et al. (1998b), Scharfstein et al. (1999) and Rotnitzky et al. (2001) to non-monotone missing data.

2.10 Acknowledgments

This research was supported by NIH grants R01-CA85295, U10-CA37377, and U10-CA69974. The authors are grateful to oncologist Patricia Ganz at UCLA for providing her expertise for the MNAR analysis.

2.11 Tables and Figures

Table 2-1. Relative Risks to be Elicited

Question (Relative Risk)                 Drop-out Rate p = p_1, p_2, ...
100% confident the number is above       r_{z,min}(p)
Best guess                               r_{z,med}(p)
100% confident the number is below       r_{z,max}(p)

Table 2-2. Percentiles of Relative Risks Elicited

                           Drop-out Rate
Treatment    Percentile    10%     25%
Tamoxifen    Minimum       1.10    1.30
             Median        1.20    1.50
             Maximum       1.30    1.60
Placebo      Minimum       1.01    1.20
             Median        1.05    1.30
             Maximum       1.10    1.40

Figure 2-5. Posterior distribution of $P[Y_7 = 1 \mid Z = z]$ (x-axis: Depression Probability). Black and gray lines represent tamoxifen and placebo arms, respectively. Solid and dashed lines are for MNAR and MAR, respectively.

Table 2-4. Simulation Results: MSE ($\times 10^3$). P and T represent placebo and tamoxifen arms, respectively.
Observed uT Model Treat Sample Size 3000 True P T Parametric P T Shrinkage P T Saturated P Sample Size 5( True Parametric Shrinkage Saturated Y R 1 2 3 4 5 6 7 0.946 1.075 30.176 28.882 6.970 6.988 35.678 34.654 )00 P 0.659 T 0.604 P 30.029 T 28.571 P 4.628 T 4.448 P 30.274 T 29.599 Sample Size 10000 True P T Parametric P T Shrinkage P T Saturated P T 0.314 0.278 29.849 28.418 2.392 2.474 22.989 22.245 0.378 0.431 0.451 0.385 1.999 2.401 67.171 62.606 0.238 0.261 0.381 0.277 1.188 1.414 54.647 51.219 0.121 0.126 0.315 0.223 0.707 0.712 37.716 34.791 0.034 0.048 0.026 0.035 0.034 0.048 0.026 0.037 0.033 0.045 0.024 0.026 0.036 0.050 0.026 0.033 0.023 0.027 0.020 0.020 0.023 0.027 0.020 0.023 0.025 0.028 0.017 0.023 0.023 0.028 0.020 0.020 0.009 0.009 0.011 0.013 0.009 0.010 0.011 0.014 0.008 0.010 0.011 0.015 0.009 0.009 0.011 0.013 0.049 0.046 0.048 0.056 0.051 0.056 0.054 0.045 0.027 0.032 0.028 0.042 0.032 0.028 0.028 0.032 0.015 0.013 0.016 0.02 0.015 0.011 0.016 0.014 0.052 0.056 0.062 0.078 0.059 0.053 0.058 0.059 0.028 0.027 0.033 0.042 0.031 0.026 0.033 0.028 0.017 0.013 0.023 0.026 0.017 0.015 0.018 0.014 0.050 0.052 0.061 0.073 0.047 0.063 0.101 0.097 0.036 0.034 0.044 0.049 0.035 0.038 0.061 0.051 0.013 0.015 0.028 0.028 0.014 0.016 0.018 0.021 0.05 0.065 0.078 0.061 0.067 0.086 0.096 0.077 0.052 0.066 0.119 0.073 0.231 0.561 0.329 0.722 0.043 0.058 0.035 0.044 0.052 0.073 0.056 0.057 0.048 0.057 0.033 0.044 0.138 0.290 0.140 0.392 0.014 0.018 0.019 0.020 0.033 0.043 0.039 0.043 0.013 0.014 0.019 0.023 0.038 0.094 0.048 0.128 Table 2-5. 
Patients Cumulative Drop Out Rate

                            Month
                            3       6       12      18      24      30      36
Tamoxifen  Available        5364    4874    4597    4249    3910    3529    3163
           Drop out         490     767     1115    1454    1835    2201    2447
           Drop Rate (%)    9.13    14.30   20.79   27.11   34.21   41.03   45.62
Placebo    Available        5375    4871    4624    4310    3951    3593    3297
           Drop out         504     751     1065    1424    1782    2078    2304
           Drop Rate (%)    9.38    13.97   19.81   26.49   33.15   38.66   42.87

ACKNOWLEDGMENTS

First and foremost, I would like to express my deepest appreciation to my advisor, Professor Michael J. Daniels. Without his extraordinary guidance and persistent help, I would never have been able to get to where I am. I admire his wisdom, his knowledge and his commitment to the highest standard. It has been truly an honor to work with him.

I wish to specially thank Professor Daniel O. Scharfstein of Johns Hopkins for his encouragement and crucial contribution to the research. I will always bear in mind the advice he gave: just relax and enjoy the learning process.

I would like to thank my committee members, Professor Malay Ghosh, Dr. Brett Presnell, and Dr. Almut Winterstein, who have provided abundant support and valuable insights throughout the classes, exams and dissertation.

Many thanks go in particular to Professor Rongling Wu of Pennsylvania State University. From Professor Wu, I started learning what I wanted for my career.

I also gratefully thank Professor Myron Chang, Professor Linda Young, Dr. Meenakshi Devidas and Dr. Gregory Campbell of the FDA. I am fortunate to have had their support at those critical moments of my career.

Finally, I would like to thank my wife, my son, my soon-to-be-born baby, my parents and my parents-in-law. It is only because of you that I have been able to keep working toward this dream.

score recorded at baseline. All women in this cohort had the potential for three years of follow-up (before the unblinding). In the BCPT, the clinical centers were not required to collect QOL data on women after they stopped their assigned therapy.
This design feature aggravated the problem of missing QOL data in the trial. As reported in Land et al. (2002), more than 30% of the CES-D scores were missing at the 36-month follow-up, with a slightly higher percentage in the tamoxifen group. They also showed that women with higher baseline CES-D scores had higher rates of missing data at each follow-up visit and that the mean observed CES-D scores preceding a missing measurement were higher than those preceding an observed measurement; there was no evidence that these relationships differed by treatment group. While these results suggest that the missing data process is associated with observed QOL outcomes, one cannot rule out the possibility that the process is further related to unobserved outcomes and that this relationship is modified by treatment. In particular, investigators were concerned (a priori) that, between assessments, tamoxifen might be causing depression in some individuals, who then do not return for their next assessment. If this occurs, the data are said to be missing not at random (MNAR); otherwise the data are said to be missing at random (MAR).

2.1.2 Informative Drop-Out in Longitudinal Studies

In this paper, we will concern ourselves with inference in longitudinal studies, where individuals who miss visits do not return for subsequent visits (i.e., drop-out). In such a setting, MNAR is often referred to as informative drop-out. While there were some intermittently missing responses in the BCPT, we will, as in Land et al. (2002), consider a "monotonized" dataset, whereby all CES-D scores observed on an individual after their first missing score have been deleted (this increases the "drop-out" rate).

There are two main inferential paradigms for analyzing longitudinal studies with informative drop-out: likelihood (parametric) and non-likelihood (semi-parametric).
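The "monotonized" dataset described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the BCPT analysis: it assumes that each participant's scores are stored as a visit-ordered list with `None` marking a missing score, and the function name `monotonize` is hypothetical.

```python
from typing import List, Optional


def monotonize(scores: List[Optional[float]]) -> List[Optional[float]]:
    """Delete every score recorded after the first missing visit.

    `scores` is one participant's visit-ordered record, with None for a
    missing score. The returned record has monotone missingness only.
    """
    out: List[Optional[float]] = []
    dropped = False
    for score in scores:
        if score is None:
            dropped = True
        # Once a visit has been missed, all later scores are discarded.
        out.append(None if dropped else score)
    return out


# Hypothetical record: the third visit is skipped, later visits are observed.
record = [12.0, 9.0, None, 15.0, 11.0]
print(monotonize(record))  # [12.0, 9.0, None, None, None]
```

The two scores observed after the skipped visit are discarded, which is why monotonizing raises the apparent drop-out rate.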
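2.3 Assumptions

To identify $\mu^{*}_{z,j}$ from the distribution of the observed data, we make the following two untestable assumptions:

Assumption 1 (Non-Future Dependence): $R_j$ is independent of $(Y_{j+1}, \ldots, Y_J)$ given $R_{j-1} = 1$ and $\bar{Y}_j$, for $j = 1, \ldots, J-1$.

This assumption asserts that for individuals at risk for drop-out at visit $j$ and who share the same history of outcomes up to and including visit $j$, the distribution of future outcomes is the same for those who are last seen at visit $j$ and those who remain on study past visit $j$. This assumption has been referred to as non-future dependence (Kenward et al., 2003).

Assumption 2 (Pattern-Mixture Representation): For $j = 1, \ldots, J$ and $y_j = 0, 1$,

\[
P[Y_j = y_j \mid R_j = 0, R_{j-1} = 1, \bar{Y}_{j-1}, Z = z] = P[Y_j = y_j \mid R_j = 1, \bar{Y}_{j-1}, Z = z]\, \frac{\exp\{q_{z,j}(\bar{Y}_{j-1}, y_j)\}}{E[\exp\{q_{z,j}(\bar{Y}_{j-1}, Y_j)\} \mid R_j = 1, \bar{Y}_{j-1}, Z = z]},
\]

where $q_{z,j}(\bar{Y}_{j-1}, Y_j)$ is a specified function of its arguments.

Assumption 2 links the non-identified conditional distribution of $Y_j$ given $R_j = 0$, $R_{j-1} = 1$, $\bar{Y}_{j-1}$, and $Z = z$ to the identified conditional distribution of $Y_j$ given $R_j = 1$, $\bar{Y}_{j-1}$, and $Z = z$ using exponential tilting via the specified function $q_{z,j}(\bar{Y}_{j-1}, Y_j)$. Assumption 2 has a selection model representation that is obtained using Bayes' rule.

Assumption 2 (Selection Model Representation): For $j = 1, \ldots, J$,

\[
\operatorname{logit}\{P[R_j = 0 \mid R_{j-1} = 1, \bar{Y}_j, Z = z]\} = h_{z,j}(\bar{Y}_{j-1}) + q_{z,j}(\bar{Y}_{j-1}, Y_j),
\]

where

\[
h_{z,j}(\bar{Y}_{j-1}) = \operatorname{logit} P[R_j = 0 \mid R_{j-1} = 1, \bar{Y}_{j-1}, Z = z] - \log\{E[\exp\{q_{z,j}(\bar{Y}_{j-1}, Y_j)\} \mid R_j = 1, \bar{Y}_{j-1}, Z = z]\}.
\]

For the observed response data model $[Y_{obs} \mid S]$, we specify the same MVN and OMVN model for $[Y_1 \mid S]$ as follows:

\[
Y_1 \mid S = 1 \sim N(\mu^{(1)}, \sigma^{(1)}), \quad
Y_1 \mid S = 2 \sim N(\mu^{(2)}, \sigma^{(2)}), \quad
Y_1 \mid S = 3 \sim N(\mu^{(3)}, \sigma^{(3)}).
\]

For MVN model, we specify

\[
Y_2 \mid Y_1, S = 2 \ \text{and} \ Y_2 \mid Y_1, S = 3 \sim N\!\left( \beta_0^{(\ge 2)} + \beta_1^{(\ge 2)} Y_1,\; \sigma^{(\ge 2)} \right),
\]
\[
Y_3 \mid Y_2, Y_1, S = 3 \sim N\!\left( \beta_0^{(\ge 3)} + \beta_1^{(\ge 3)} Y_1 + \beta_2^{(\ge 3)} Y_2,\; \sigma^{(\ge 3)} \right).
\]

For OMVN model, we specify

\[
Y_2 \mid Y_1, S = 2 \sim N\!\left( \beta_{2,0}^{(2)} + \beta_{2,1}^{(2)} Y_1,\; \sigma_{2|1}^{(2)} \right),
\]
\[
Y_2 \mid Y_1, S = 3 \sim N\!\left( \beta_{2,0}^{(3)} + \beta_{2,1}^{(3)} Y_1,\; \sigma_{2|1}^{(3)} \right),
\]
\[
Y_3 \mid Y_2, Y_1, S = 3 \sim N\!\left( \beta_{3,0}^{(3)} + \beta_{3,1}^{(3)} Y_1 + \beta_{3,2}^{(3)} Y_2,\; \sigma_{3|12}^{(3)} \right).
\]

For missing response data model $[Y_{mis} \mid Y_{obs}, S]$, we specify for MVN model

\[
Y_2 \mid Y_1, S = 1 \sim N\!\left( \beta_0^{(\ge 2)} + \Delta_0^{(2)} + (\beta_1^{(\ge 2)} + \Delta_1^{(2)}) Y_1,\; e^{\Delta_\sigma^{(2)}} \sigma^{(\ge 2)} \right),
\]
\[
Y_3 \mid Y_2, Y_1, S = 2 \sim N\!\left( \beta_0^{(\ge 3)} + \Delta_0^{(3)} + (\beta_1^{(\ge 3)} + \Delta_1^{(3)}) Y_1 + (\beta_2^{(\ge 3)} + \Delta_2^{(3)}) Y_2,\; e^{\Delta_\sigma^{(3)}} \sigma^{(\ge 3)} \right),
\]
\[
Y_3 \mid Y_2, Y_1, S = 1 \sim \frac{P(S = 2)}{P(S \ge 2)}\, N\!\left( \beta_0^{(\ge 3)} + \Delta_0^{(3)} + (\beta_1^{(\ge 3)} + \Delta_1^{(3)}) Y_1 + (\beta_2^{(\ge 3)} + \Delta_2^{(3)}) Y_2,\; e^{\Delta_\sigma^{(3)}} \sigma^{(\ge 3)} \right) + \frac{P(S = 3)}{P(S \ge 2)}\, N\!\left( \beta_0^{(\ge 3)} + \beta_1^{(\ge 3)} Y_1 + \beta_2^{(\ge 3)} Y_2,\; \sigma^{(\ge 3)} \right).
\]

We construct the conditional prior of $\tau_{z,j,\bar{y}_{j-1}}$ given $p_{z,j}(\bar{y}_{j-1})$ using Steps 1-4 given below. The general strategy is to use the elicited information on the relative risk at different drop-out probabilities and the bounds derived above to construct the prior of interest.

Step 1. For $m \in \{\min, \mathrm{med}, \max\}$, interpolate the elicited $r_{z,m}(p)$ at different drop-out probabilities (see Figure 2-1) to find $r_{z,m}(p_{z,j}(\bar{y}_{j-1}))$ for any $p_{z,j}(\bar{y}_{j-1})$.

Step 2. Construct the prior of $r_z(p_{z,j}(\bar{y}_{j-1}))$ given $p_{z,j}(\bar{y}_{j-1})$ as a 50-50 mixture of $\mathrm{Uniform}(r_{z,\min}(p_{z,j}(\bar{y}_{j-1})), r_{z,\mathrm{med}}(p_{z,j}(\bar{y}_{j-1})))$ and $\mathrm{Uniform}(r_{z,\mathrm{med}}(p_{z,j}(\bar{y}_{j-1})), r_{z,\max}(p_{z,j}(\bar{y}_{j-1})))$ random variables. This preserves the elicited percentiles of the relative risk.

Step 3. Construct a conditional prior of $p^{(0)}_{z,j}(\bar{y}_{j-1})$ given $p_{z,j}(\bar{y}_{j-1})$ and $r_z(p_{z,j}(\bar{y}_{j-1}))$ as a uniform distribution with lower bound

\[
\frac{p_{z,j}(\bar{y}_{j-1})}{\max\{r_z(p_{z,j}(\bar{y}_{j-1})), 1\}}
\]

and upper bound

\[
\min\left\{ \frac{p_{z,j}(\bar{y}_{j-1})}{\min\{r_z(p_{z,j}(\bar{y}_{j-1})), 1\}},\; \frac{1}{\max\{r_z(p_{z,j}(\bar{y}_{j-1})), 1\}} \right\}.
\]

The bounds were derived above.

Step 4. Steps (2) and (3) induce a prior for $\tau_{z,j,\bar{y}_{j-1}}$ given $p_{z,j}(\bar{y}_{j-1})$ by noting

\[
\tau_{z,j,\bar{y}_{j-1}} = \log \frac{r_z(p_{z,j}(\bar{y}_{j-1}))\, (1 - p^{(0)}_{z,j}(\bar{y}_{j-1}))}{1 - r_z(p_{z,j}(\bar{y}_{j-1}))\, p^{(0)}_{z,j}(\bar{y}_{j-1})},
\]

i.e., $\tau_{z,j,\bar{y}_{j-1}}$ is a deterministic function of $r_z(p_{z,j}(\bar{y}_{j-1}))$ and $p^{(0)}_{z,j}(\bar{y}_{j-1})$.

The relative risks elicited from Dr. Ganz are given in Table 2-2. We extrapolated the relative risks outside the ranges given in Table 2-2 as shown in Figure 2-1. Figure 2-2 shows the density of $\tau$ given $p_{z,j}(\bar{y}_{j-1})$ equal to 10% and 25% for the tamoxifen and placebo arms.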
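Steps 1-4 can be sketched as a small sampler. The Python code below is an illustration under stated assumptions, not the implementation used in the dissertation: the elicited values are those of Table 2-2; Step 1 is approximated by linear interpolation between the two elicited drop-out rates and held constant outside them, which may differ from the extrapolation drawn in Figure 2-1; r is read as the ratio of the drop-out probability for depressed to non-depressed individuals and p0 as the drop-out probability for the non-depressed, a reading that reproduces the Step 4 formula. The names `interp_r` and `draw_tau` are hypothetical.

```python
import math
import random

# Elicited relative risks from Table 2-2, at drop-out rates of 10% and 25%.
ELICITED_P = (0.10, 0.25)
ELICITED_R = {
    "tamoxifen": {"min": (1.10, 1.30), "med": (1.20, 1.50), "max": (1.30, 1.60)},
    "placebo": {"min": (1.01, 1.20), "med": (1.05, 1.30), "max": (1.10, 1.40)},
}


def interp_r(p, arm, which):
    """Step 1: relative risk at drop-out probability p.

    ASSUMPTION: linear between the two elicited points and constant
    outside them; Figure 2-1 may use a different extrapolation.
    """
    (p_lo, p_hi), (r_lo, r_hi) = ELICITED_P, ELICITED_R[arm][which]
    w = min(max((p - p_lo) / (p_hi - p_lo), 0.0), 1.0)
    return r_lo + w * (r_hi - r_lo)


def draw_tau(p, arm, rng):
    """Draw (r, p0, tau) given the drop-out probability p (Steps 2-4)."""
    r_min, r_med, r_max = (interp_r(p, arm, m) for m in ("min", "med", "max"))
    # Step 2: 50-50 mixture of Uniform(min, med) and Uniform(med, max).
    if rng.random() < 0.5:
        r = rng.uniform(r_min, r_med)
    else:
        r = rng.uniform(r_med, r_max)
    # Step 3: uniform prior for p0 between the stated bounds.
    lower = p / max(r, 1.0)
    upper = min(p / min(r, 1.0), 1.0 / max(r, 1.0))
    p0 = rng.uniform(lower, upper)
    # Step 4: tau is a deterministic function of r and p0.
    tau = math.log(r * (1.0 - p0) / (1.0 - r * p0))
    return r, p0, tau


rng = random.Random(2010)
for r, p0, tau in (draw_tau(0.10, "tamoxifen", rng) for _ in range(5)):
    print(f"r={r:.3f}  p0={p0:.4f}  tau={tau:.3f}")
```

Because every elicited relative risk exceeds one, each draw of tau is non-negative, which encodes the belief that depressed participants are at least as likely to drop out as non-depressed ones.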
For two patients with the same response history up to time point $j-1$, the log odds ratio of dropping out at time point $j$, for the patient that is

Figure 2-3. (Panels: Tamoxifen, Placebo; x-axis: Time Points 3, 12, 24, 36.) Solid and dashed lines represent the empirical rate of $P[Y_j = 1, R_j = 1 \mid Z = z]$ and $P[R_j = 0 \mid Z = z]$, respectively. The posterior means of $P[Y_j = 1, R_j = 1 \mid Z = z]$ (diamond) and $P[R_j = 0 \mid Z = z]$ (triangle) and their 95% credible intervals are displayed at each time point.

REFERENCES

Albert, P. (2000). A Transitional Model for Longitudinal Binary Data Subject to Nonignorable Missing Data. Biometrics 56, 602-608.

Albert, P., Follmann, D., Wang, S., and Suh, E. (2002). A latent autoregressive model for longitudinal binary data subject to informative missingness. Biometrics 58, 631-642.

Baker, S. (1995). Marginal regression for repeated binary data with outcome subject to non-ignorable non-response. Biometrics 51, 1042-1052.

Baker, S., Rosenberger, W., and DerSimonian, R. (1992). Closed-form estimates for missing counts in two-way contingency tables. Statistics in Medicine 11, 643-657.

Birmingham, J. and Fitzmaurice, G. (2002). A Pattern-Mixture Model for Longitudinal Binary Responses with Nonignorable Nonresponse. Biometrics 58, 989-996.

Boyd, S. and Vandenberghe, L. (1997). Semidefinite programming relaxations of non-convex problems in control and combinatorial optimization. Communications, Computation, Control and Signal Processing: A Tribute to Thomas Kailath.

Boyd, S. and Vandenberghe, L. (2004). Convex optimization. Cambridge Univ Pr.

Cheng, J. and Small, D. (2006). Bounds on causal effects in three-arm trials with non-compliance. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68, 815-836.

Christiansen, C. and Morris, C. (1997). Hierarchical Poisson regression modeling. Journal of the American Statistical Association pages 618-632.

Daniels, M. (1999). A prior for the variance in hierarchical models.
Canadian Journal of Statistics 27,. Daniels, M. and Hogan, J. (2000). Reparameterizing the Pattern Mixture Model for Sensitivity Analyses Under Informative Dropout. Biometrics 56, 1241-1248. Daniels, M. and Hogan, J. (2008). Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. Chapman & Hall/CRC. DeGruttola, V. and Tu, X. (1994). Modelling Progression of CD4-lymphocyte Count and its Relationship to Survival Time. Biometrics 50, 1003-1014. Diggle, P. and Kenward, M. (1994). Informative Drop-out Longitudinal Data Analysis. Applied Statistics 43, 49-93. Egleston, B. L., Scharfstein, D. O., Freeman, E. E., and West, S. K. (2007). Causal inference for non-mortality outcomes in the presence of death. Biostatistics 8, 526 - 545. 115 With this characterization, we see that the function qz,(Yi_i, Yj) quantifies the influence (on a log odds ratio scale) of the potentially unobservable outcome Yj on the conditional odds of dropping at time j. 2.4 Identifiability The above two assumptions non-parametrically, just-identify pj for all j 1,..., J and z = 0, 1. To see this, consider the following representation of this conditional distribution, derived using the laws of total and conditional probability: j- ,zi = P[Y = 1IRJ = 1, Y-1= y _1, Z = z]x P[R, = 1|R,_i = 1, Y-1 = Y,-i, Z = z] I P[Yi = yIRY = 1, Y,_i = Y,-i, Z = z] + /=1 /= 0 k-I k-I x P[R =1 Ri-1 = 1,Y=-1 = Y/-1, Z = z] P[Yi = yIRi = 1,Yk-1 = Y-1, Z =z] /=1 /=0 All quantities on the right hand side of this equation are identified, without appealing to any assumptions, except P[YY = 1 Rk = 0, Rk-i = 1, Yk-1 = Yk-1, Z = z] for k = 1, ... j 1. Under Assumptions 1 and 2, these probabilities can be shown to be identified, implying that :,j is identified for all j and z. Theorem 1: P[Yj = 1IRk-1 = 1,Yk-i Yk-1, Z = z] and P[Y; = 1IRk = 0, Rk-i 1, Yk- = Yk-1, Z = z] are identified for k = 1,... j. Proof: The proof follows by backward induction. Consider k = j. 
By Assumption 2, P[Yj = 1|Ry = 0, Rj_i = 1, Yj_1 = yj_l, Z = z] = P[Yj = 1 R = 1, Y_i, Z = z] exp{qz,(Yj_i, 1)} E[exp{qz,(Y-_l = YY)}|R = 1, Y_= y j_, Z = z] Since the right hand side is identified, we know that P[Yj = 1 Rj = 0, Rj-_ = 1, Y _1 = yj-, Z = z] is identified. Further, we can write P[Yj = 1Rj_ I= 1, Y_il = y _i,Z= z] 1 -. = > P[Y = 1|R, = r, Rj_1 1,Yj- I= Yj_1, Z = z]P[Rj = rlR -1 = 1,Y j- = yj-i, Z = z] r=O Since all quantities on the right hand side are identified, P[Yj = 1 Rj-_ = 1, Y _- = y _1, Z = z] is identified. Suppose that P[Yj = 1IRk = 0, Rk-i = 1,Yk-1 Yk-, Z = z] and P[Y = 1IRk-1 = 1, Yk-i = Yk-1, Z = z] are identified for some k where 1 < k < j. Then, we need to show that these probabilities are identified for k' = k 1. To see this, note that P[Yj = IlRk' = 0, Rk'-1 = 1,Yk'-i = Yk,-1, Z = z] = P[Y = 1 iRk-I = 0, Rk-2 = 1,Yk-2 = Yk-2, Z = z] 1 = P[Y,=IlRk- =O, Rk-2 Yk-1 Yk-1,Z=z]x Yk- 1 0 P[Yk-1 Yk-1IRk-1 = 0, Rk-2 = 1, Yk-2 = k-2, Z = z] 1 = P[Y = 1IRk- = 1,Yk- k-,Z = z]x Yk 1 0 P[Yk-1 Yk-1 Rk-i = 1, Yk-2 = k-2, Z = z] exp{qz,k-I(Yk-2, Yk-)} E[exp{q,k-1(Yk-2, Yk-1)}Rk-1 = 1,Yk-2 = Yk-2, Z = z] The third equality follows by Assumptions 1 and 2. Since all the quantities on the right hand side of the last equality are identified, P[Yj = 1 Rk, = 0, Rk'-1 = 1, Yk'- = yk'-1, Z = z] is identified. Further, P[Yj = 1IRk/_I = Yk'-i = yk'-1, Z = z] = P[Yj = 1lRk-2 = 1,Yk-2 = Yk-2, Z = z] 1 = P[Yj= 1IRk- 1,Yk1 Yk-l,Z=z]x Yk-1=0 We use n() to denote the number of subjects with S > j, YV_1 = yj_1, Yj-2 - zj,y'j Y-2, Z = z, and use n()- denote the number of subjects with S > j 1, Y_ = y1, Yj-2 = Y -2, Z = z. The condition that all the true values of az,y- 2,y and 7z,y ,y are in the open interval (0, 1) holds if and only if as the number of subjects goes to infinity, all the n()- and n() go to infinity. 
This indicates that to prove Theorem 2, we can zj,yj zjyJ 1 just prove that given Y = {Y1,..., Yk} and N = {ni,..., nk), where Yj Bin(nj, pj), pj ~ Beta(a, /) forj = ..., k and (a, /) has proper prior density r(a, 3), the posterior distributions for all pj are consistent as all nj go to infinity, with regard to the distributions under Yj ~ Bin(n, pj). To see this, note that 7r(pj Y, N) oc ( (1 p)"-Y 7(piJa,/)7(a,/3)dad3. Note that P(j)= j r(pj Ia,/3)7(ao,/3)dod/3= p-1(1 pj),-'(a, 3)dad/3 = M < oc, r(PJ) is 0(1). As nj and Yj go to infinity (this occurs since pj C (0, 1)), by the Bernstein-von Mises theorem, we have S( n(p pj)|l Y, n,) N (0, p (1- p7)) in distribution, which further implies that E[pj|Y, N] pj a.s. Var[pjlY, N] 0 a.s.. Table 3-2. Simulation Results: MSE (x103). P and T represent placebo and tamoxifen arms, respectively. Observed Model Treat Sample Size 500 True P Parametric Shrinkage Saturated Sample Size 2000 True P T Parametric P T Shrinkage P T Saturated P Sample Size 5C True Parametric Shrinkage Saturated Y 6.209 6.790 33.351 32.323 29.478 28.410 57.107 55.582 1.474 1.610 30.507 29.168 23.545 22.598 40.322 38.943 )00 P 0.594 T 0.623 P 29.983 T 28.616 P 18.83 T 18.055 P 30.071 T 29.156 pjz (Month) R 1(3) 2(6) 3(12) 2.474 2.789 1.511 1.602 2.310 2.365 111.263 104.882 0.586 0.634 0.543 0.495 0.647 0.615 77.627 72.731 0.234 0.265 0.379 0.298 0.394 0.322 54.454 50.590 0.199 0.205 0.199 0.205 0.202 0.212 0.202 0.211 0.052 0.050 0.051 0.050 0.053 0.050 0.053 0.050 0.020 0.024 0.020 0.024 0.020 0.024 0.020 0.024 0.225 0.228 0.227 0.226 0.226 0.232 0.228 0.245 0.058 0.063 0.055 0.064 0.056 0.063 0.057 0.064 0.024 0.024 0.025 0.025 0.024 0.024 0.024 0.024 0.258 0.297 0.26 0.292 0.252 0.294 0.302 0.383 0.063 0.062 0.064 0.071 0.063 0.063 0.069 0.067 0.026 0.028 0.029 0.035 0.026 0.028 0.027 0.029 4(18) 5(24) 6(30) 7(36) 0.313 0.344 0.317 0.345 0.303 0.336 0.490 0.657 0.078 0.080 0.081 0.086 0.078 0.080 0.100 0.110 0.033 0.035 0.037 0.048 0.033 0.036 0.038 
0.039 0.319 0.331 0.323 0.333 0.312 0.330 1.083 1.352 0.081 0.093 0.090 0.101 0.082 0.093 0.188 0.218 0.031 0.033 0.043 0.045 0.031 0.034 0.052 0.059 0.352 0.405 0.349 0.403 0.337 0.390 2.401 3.167 0.086 0.095 0.091 0.110 0.084 0.095 0.457 0.560 0.040 0.039 0.049 0.060 0.039 0.040 0.13 0.148 0.390 0.428 0.388 0.425 0.372 0.419 4.427 5.782 0.097 0.101 0.108 0.121 0.095 0.102 0.946 1.223 0.036 0.040 0.055 0.059 0.036 0.041 0.270 0.373 5.3 Extensions to Causal Inference 5.3.1 Causal Inference Introduction For clinical studies of terminal diseases, incomplete data is often caused by death. If a response is truncated by death, the missingness should not be classified as "censored" because "censoring" implies the masking of a value that is potentially observable. It is also not appropriate to handle these cases by traditional non-mortality missing data approaches such as models assuming ignorability or models assuming non-ignorable missing data mechanism which implicitly "impute" missing responses. In randomized studies with no missingness, causal relationships are well established and the treatment causal effects can be estimated directly (Rubin, 1974). However, in non-randomized trials or in presence of missing data, these methods are limited if the research interest demands estimation of causally interpretable effects. To define causal effects, we first introduce the concept of potential outcomes, which are sometimes used exchangeablely with the term counterfactual (but not always, see Rubin, 2000). The use of term "potential outcome" can be traced at least to Neyman (1923). Neyman used "potential yields" Uik to indicate the yield of a plot k if exposed to a variety i. Rubin (1974) defines the causal effect of one treatment, E, over another, C, for a particular unit as the difference between what would have happened if the unit had been exposed to E, namely Y(E), and what would have happened if the unit had been exposed to C, namely Y(C). 
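As a small numerical illustration of Rubin's definition, the Python sketch below builds hypothetical potential outcomes Y(E) and Y(C) for six units, computes the unit-level causal effects, and then shows what is actually observed after random assignment. All numbers and variable names are invented for illustration only.

```python
import random

# Hypothetical potential outcomes for six units: y_e under E, y_c under C.
y_e = [5.0, 7.0, 6.0, 9.0, 4.0, 8.0]
y_c = [4.0, 7.0, 3.0, 8.0, 5.0, 6.0]

# Unit-level causal effects Y(E) - Y(C). These are never observable in
# practice, because each unit receives only one of the two treatments.
unit_effects = [e - c for e, c in zip(y_e, y_c)]
average_effect = sum(unit_effects) / len(unit_effects)

# Randomly assign half of the units to E; the remaining units receive C.
rng = random.Random(1974)
n = len(y_e)
treated = set(rng.sample(range(n), k=n // 2))
observed = [y_e[i] if i in treated else y_c[i] for i in range(n)]

mean_e = sum(observed[i] for i in treated) / len(treated)
mean_c = sum(observed[i] for i in range(n) if i not in treated) / (n - len(treated))

print("unit effects:", unit_effects)
print("average causal effect:", average_effect)
print("difference in observed means:", mean_e - mean_c)
```

Under random assignment the difference in observed means is an unbiased estimate of the average causal effect, although in a single small sample the two need not coincide.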
Using potential outcomes, Frangakis and Rubin (2002) introduce a framework for comparing treatment effects based on principle stratification, which is a cross-classification of units defined by their potential outcomes with respect to (post)treatment variables, such as treatment noncompliance or drop-out. The treatment comparison adjustment for posttreatment variables is necessary because such variables encode the characteristics of both the treatment and the patient. For example, a patient with diagnosed cancer in a cancer prevention trail may have depression caused by the treatment or by the diagnosis 105 For OMVN model, we specify 03 (L (Au2) + /(3) + /(3)Yi, 'aL)(3) Y2 1, S = 1 ~ 2 3 2)e) (2)_ + 2 N(A(2) +/(2) +/(2) 3 e1 2 + 42+43 N U 2,0 2,1 Y1, 21,) L( + (3) + (3) y+ (3) 7) (3) Y31 Y2, Y1, S = 2 ~ N ( u3) 3,1Y 3,2 Y2, e -313, 3 N (4,3 + 3) y + 93)'y2, 73) Y3 Y2 1, S = 1 N 3,2 313 2 (L (3) + (3) + (3) y+ (3) y '7eA (3) + 2 3N ( u3) '3,1 v2 + p 23 e, )313 MAR on residuals constraints: Here we show that in multivariate case (Section 4.7.2), the MAR on the residuals restriction puts no constraints on c() . Let [ZjIS] [Yj Xo(S)]. The MAR on the residuals constraints are SP(S > s) Pk(ZjZj-1,X) = P(S > jPs(zjljiX). s=- Note that J ps(yj, ... yJ) = ps(yi) ps(yl y-i) /=2 Sexp (> Xa(S) > -)(Yt (>ai) Xa(s)) = Ps(Yi) /=2 227a Thus, J PS(z, ... ) ps(zi) H ps(z/|z/-1) /=2 exp (Z i t/ P/ ~_ _(l t _t/) 2 = ps (z2) -)- /=2 2/I/ We can further show that [Z, Zj_, S = s, X] ~ N p ) +( )(Zi I)), ) , /=i (A) Conditional Depression Rate o Empirical A Model-based, Empirical Undefined * Model-based Posterior Mean of Depression Rate 0 a * 0 g S* o o o .. o 0 0 *. *: c o * 0o . *. oA "* 00 0 o 0 0 0 00 0 0 .. * 0 0 0 c* * ** -* -* A .- - mo o .,* *. 
.**" 7A A * o o o 0 0 0 I 0 o o o o ooo o coo o, oc (B) Shrinkage Difference 0 0 o o 00o o ooo oo 0 c coo o 0 S0 oO0 o o o0 0 oo o 00 o o o o o 0 -- -- ------- --- 0a 0 0 00 0 0o o0 0 0 0 0 0 0 0 0 0 0 0 0oo [ Depressed I Not Depressed Yu I lll ll lll ll lll ll ll lll ll lll ll ll lll ll lll ll lll ll ll lll ll lll ll ll lll ll -I I-II-I-I-I-I-I-I-I:I:I:I:Iiiiiiiiiiiiiiiiiiiiiiiiiiiii:::::::::: iiiiiiiiiiiiiiiiiiiiiiiii:i:i:iiiiiiiiiiiiiiiiiiiiiiiiiii iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii Historical Response Pattern Figure 3-2. (A) The empirical rate and model-based posterior mean of P[Yj = 11S > j, Yj_ = yj,, Z = z] for Z = 2 and j = 6, 7. (B) The difference between the empirical and model-based posterior mean of the depression rate. The x-axis is the pattern of historical response data Y,_I. * Co c, 000ooo oo0000 oo00 0 SCo 0 0 0 Cm oo 00 0 0 0 oo0 0 0 o O CO 0 a 0 same partial missing at random assumption as in Chapter 3, Section 3.2, that R, I Yc ZZ, C, Yobs. We have shown in Chapter 3 that P*,z,c is identified by the observed data under this partial missing at random assumption. 5.3.4 Causal Inference Assumption The causal effect (5-2) is not identifiable from the observed data 0= {Z,C(Z), R(Z),Yobs(Z)}. We propose the following assumptions to identify boundaries for the causal effect: I Stable Unit Treatment Value Assumption (SUTVA). Let Z = (Z, ..., ZN) be the vector of treatment assignment for all the patients. SUTVA means Z, = Zf = (Y,(Z,), C,(Z,)) = (Y,(Zf), C,(Z;)), regardless of what Z is. That is, the potential outcome of patient i is unrelated to the treatment assignment of other patients. The allows us to write Yi(Z) and Ci(Z) as Yi(Zi) and Ci(Zi) respectively. II Random Assignment The treatment assignment Z is random, i.e. Z L (Y(0), Y(1), C(0), C(1)), which holds in a controlled randomized clinical trial. This assumption allows us to write Yj(z) and Cj(z) as YjIZ = z and Cj\Z = z respectively. 
III Mean Monotonicity

\[
E[Y_j(z) \mid C(z) = c, C(1-z) = t] \le E[Y_j(z) \mid C(z) = c', C(1-z) = t']
\]

for $c' \ge c \ge j$, $t' \ge t$, $z = 0, 1$. This assumption provides an ordering of the mean potential response at visit $j$ under treatment $z$ for all the principal cohorts of individuals who would be on study at visit $j$ under treatment $z$. The means are assumed to not be worse for cohorts who remain on-study longer under both treatments. That is, the individuals who would be last seen at time $c'$ ($c' \ge j$) under treatment $z$ and time $t'$ under treatment $1 - z$ will not have a worse mean potential response at time $j$ under treatment $z$ than individuals

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER
1 INTRODUCTION
1.1 Missing Data Concepts and Definitions
1.2 Likelihood-Based Methods
1.3 Non-Likelihood Methods
1.4 Intermittent Missingness
1.5 Identifying Restrictions in Pattern Mixture Models
1.6 Dissertation Goals
2 A BAYESIAN SHRINKAGE MODEL FOR LONGITUDINAL BINARY DATA WITH DROP-OUT
2.1 Introduction
2.1.1 Breast Cancer Prevention Trial
2.1.2 Informative Drop-Out in Longitudinal Studies
2.1.3 Outline
2.2 Data Structure and Notation
2.3 Assumptions
2.4 Identifiability
2.5 Modeling
2.6 Prior Specification and Posterior Computation
2.6.1 Shrinkage Priors
2.6.2 Prior of Sensitivity Parameters
2.6.3 Posterior Computation
2.7 Assessment of Model Performance via Simulation
2.8 Application: Breast Cancer Prevention Trial (BCPT)
2.8.1 Model Fit and Shrinkage Results
2.8.2 Inference
2.9 Summary and Discussion
2.10 Acknowledgments
2.11 Tables and Figures

Figure 2-1. Extrapolation of the elicited relative risks. (x-axis: Drop Out Rate, with ticks at 0, 10, 25 and 100; curves: minimum, median and maximum relative risk for the tamoxifen and placebo arms.)

where restrictions (1)-(2) are for $p_{s,t}$ to be a distribution with (identified) marginals, restriction (3) satisfies the identified conditional means, and restriction (4) comes from Assumption III.

Finding the boundaries of the SACE, i.e., finding the minimum and the maximum of the objective function (5-3), can be approximated (by ignoring the normalizing constant) as a non-convex quadratically constrained quadratic problem (QCQP) (Boyd and Vandenberghe, 1997, 2004). For a QCQP, a standard approach is to optimize a semidefinite relaxation of the QCQP and obtain lower and upper bounds on the local optima of the objective function (Boyd and Vandenberghe, 1997).

The uncertainty of the estimated bounds can be characterized in a Bayesian framework. The joint posterior distribution of the bounds can be constructed by implementing the optimization for each posterior sample of $\mu^{*}_{j,z,c}$, identified by the algorithm proposed in Section 5.3.3. The result can be presented as in Figure 5-1. A study decision might be based on the mode of the posterior joint distribution of the bounds.

5.3.5 Stochastic Survival Monotonicity Assumption

Under Assumption II, the marginal distributions $P(C(0))$ and $P(C(1))$ of $P(C(0) = c, C(1) = t)$ (recall that 0 and 1 represent the placebo and treatment arm, respectively) are identified. However, the joint distribution remains unidentified without further assumption.
We outline several assumptions that will identify $p_{c,t}$ beyond the identified margins (Figure 5-2). These assumptions, when reasonable, will simplify the optimization of the objective function and yield more precise results.

1. $P(C(0) = m \mid C(1) = c) = q^{n-m} P(C(0) = n \mid C(1) = c)$ for $c > m > n$ and $q > 1$. That is, given that a patient will "survive" until time point $c$ on the treatment arm, the probability that the patient will "survive" until time point $n - 1$ is $q$ times the probability that the patient will "survive" until $n$, for $n < c$, on the placebo arm. The parameter $q$ is a sensitivity parameter.

2. $P(C(1) = t \mid C(0) = c) = 0$ for $c > t$.

© 2010 Chenguang Wang

Table 2-3. Simulation Scenario (parameters by time point)

Tamoxifen, leading columns:
$\alpha_0$: -2.578, -2.500, -2.613
$\alpha_1$: 2.460, 1.978
$\alpha_2$: 1.500
$\gamma_0$: -2.352, -2.871
$\gamma_1$: 0.611, 0.397
$\gamma_2$: 0.121
Depression Rate: 0.066, 0.097, 0.119

Placebo, leading columns:
$\alpha_0$: -2.653, -2.632, -2.59
$\alpha_1$: 2.708, 2.304
$\alpha_2$: 1.241
$\gamma_0$: -2.308, -2.970
$\gamma_1$: 0.466, 0.468
$\gamma_2$: -0.293
Depression Rate: 0.071, 0.107, 0.118

Remaining columns, in the order extracted; each lists $\alpha_0$, $\alpha_1$, $\alpha_2$, $\gamma_0$, $\gamma_1$, $\gamma_2$, Depression Rate:
Under the column header "Time Point 2 3":
-2.752, 1.940, 1.599, -2.625, 0.460, 0.422, 0.124
-2.663, 1.874, 1.608, -2.729, 0.469, 0.323, 0.120
-2.626, 2.023, 1.389, -2.513, 0.247, 0.261, 0.139
-2.598, 2.104, 1.471, -2.474, 0.272, 0.278, 0.132
Under the column header "4 5":
-2.789, 2.072, 1.612, -2.281, 0.320, 0.035, 0.126
-2.884, 2.068, 1.693, -2.410, 0.376, 0.288, 0.130
-2.811, 1.885, 1.639, -2.217, 0.127, 0.293, 0.126
-2.853, 2.123, 1.540, -2.460, 0.088, 0.241, 0.126
-2.895, 2.007, 1.830, -2.536, 0.228, 0.204, 0.123
-3.035, 2.243, 1.989, -2.673, 0.001, 0.428, 0.125

... interactions (i.e., borrowing information across neighboring cells), we are able to estimate $P[Y_j = 1 \mid R_j = 1, \bar{Y}_{j-1}, Z = z]$ for all $j$, $z$ and $\bar{Y}_{j-1}$ with reasonable precision.

2.8.2 Inference

Figure 2-5 shows the posterior of $P[Y_7 = 1 \mid Z = z]$, the treatment-specific probability of depression at the end of the 36-month follow-up (solid lines). For comparison, the posterior under MAR (corresponding to point mass priors for $\tau$ at zero) is also presented (dashed lines). The observed depression rates (i.e., complete case analysis) were 0.115 on both the placebo and tamoxifen arms.
Under the MNAR analysis (using the elicited priors), the posterior means of the depression rates at month 36 were 0.126 (95% CI: 0.115, 0.138) and 0.130 (95% CI: 0.119, 0.143) for the placebo and tamoxifen arms; the difference was 0.004 (95% CI: -0.012, 0.021). Under MAR, the rates were 0.125 (95% CI: 0.114, 0.136) and 0.126 (95% CI: 0.115, 0.138) for the placebo and tamoxifen arms; the difference was 0.001 (95% CI: -0.015, 0.018). The posterior probability of depression was higher under the MNAR analysis than under the MAR analysis since researchers believed depressed patients were more likely to drop out (see Table 2-2), a belief that was captured by the elicited priors. Figure 2-6 shows that there were no significant differences between the two treatments in the depression rates at any time point (the 95% credible intervals all cover zero) under both MNAR and MAR. Similar (non-significant) treatment differences were seen when examining treatment comparisons conditional on depression status at baseline.

2.9 Summary and Discussion

In this paper, we have presented a Bayesian shrinkage approach for longitudinal binary data with informative drop-out. Our model provides a framework that incorporates expert opinion about non-identifiable parameters and avoids the curse of dimensionality by using shrinkage priors. In our analysis of the BCPT data, we concluded that there was little (if any) evidence that women on tamoxifen were more depressed than those on placebo.

... and available case missing value (ACMV) constraints: $p_k(y_j \mid \bar{y}_{j-1}) = p_{\ge j}(y_j \mid \bar{y}_{j-1})$. Molenberghs et al. (1998) proved that for discrete time points and monotone missingness, the ACMV constraint is equivalent to missing at random (MAR). Thijs et al. (2002) developed strategies to apply identifying restrictions: first fit $p_k(\bar{y}_k)$, then choose an identifying restriction to identify the missing patterns. Multiple imputation can be applied by drawing unobserved components from the identified missing patterns. Kenward et al. (2003) discussed identifying restrictions corresponding to missing non-future dependence.

1.6 Dissertation Goals

There will be two major components to this dissertation. First, we will develop a Bayesian semiparametric model for longitudinal binary responses with non-ignorable missingness, including drop-out and intermittent missingness. Second, we will carefully explore identifying restrictions for pattern mixture models.

Bayesian shrinkage model: We propose two different parameterizations of saturated models for the observed data distribution, as well as corresponding shrinkage priors to avoid the curse of dimensionality. The two procedures provide researchers with different strategies for reducing the dimension of the parameter space. We assume a non-future dependence model for the drop-out mechanism and partial ignorability for the intermittent missingness. In a simulation study, we compare our approach to a fully parametric and a fully saturated model for the distribution of the observed data. Our methodology is motivated by, and applied to, data from the Breast Cancer Prevention Trial.
Identifying restrictions and sensitivity analysis in pattern mixture models: The normality of response data (if appropriate) for pattern mixture models is desirable as it easily allows incorporation of baseline covariates and introduction of sensitivity ...

4.10 Appendix

Missing data mechanism under missing not at random and multivariate normality: The MDM in Section 4.3 is derived as follows:

$$\log \frac{P(S = s \mid Y)}{P(S \ge s \mid Y)} = \log \frac{P(S = s)\, p_s(Y)}{P(Y, S \ge s)} = \log \frac{P(S = s)\, p_s(Y_1) \prod_{l=2}^{J} p_s(Y_l \mid \bar{Y}_{l-1})}{\sum_{k=s}^{J} P(S = k)\, p_k(Y_1) \prod_{l=2}^{J} p_k(Y_l \mid \bar{Y}_{l-1})}$$

[The remaining lines of this derivation are not recoverable from the extracted text.]

Mean and variance of $[Y_j \mid \bar{Y}_{j-1}, S = s]$: The mean and variance of $[Y_j \mid \bar{Y}_{j-1}, S = s]$ under the MNAR assumption in Section 4.5 are derived as follows:

[Derivation not recoverable from the extracted text.]

Full Text |

ACKNOWLEDGMENTS

First and foremost, I would like to express the deepest appreciation to my advisor, Professor Michael J. Daniels. Without his extraordinary guidance and persistent help, I will never be able to be where I am. I admire his wisdom, his knowledge and his commitment to the highest standard. It has been truly an honor to work with him.

I wish to specially thank Professor Daniel O. Scharfstein of Johns Hopkins for his encouragement and crucial contribution to the research. I will always bear in mind the advice he gave: just relax and enjoy the learning process.

I would like to thank my committee members, Professor Malay Ghosh, Dr. Brett Presnell, and Dr. Almut Winterstein, who have provided abundant support and valuable insights over the entire process throughout the classes, exams and dissertation.

Many thanks go in particular to Professor Rongling Wu of Pennsylvania State University. From Professor Wu, I started learning what I wanted for my career. I also gratefully thank Professor Myron Chang, Professor Linda Young, Dr. Meenakshi Devidas and Dr. Gregory Campbell of FDA. I am fortunate to have their support at those critical moments of my career.

Finally, I would like to thank my wife, my son, my soon-to-be-born baby, my parents and my parents-in-law. It is only because of you that I have been able to keep working toward this dream I have.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 Missing Data Concepts and Definitions
  1.2 Likelihood-Based Methods
  1.3 Non-Likelihood Methods
  1.4 Intermittent Missingness
  1.5 Identifying Restrictions in Pattern Mixture Models
  1.6 Dissertation Goals

2 A BAYESIAN SHRINKAGE MODEL FOR LONGITUDINAL BINARY DATA WITH DROP-OUT
  2.1 Introduction
    2.1.1 Breast Cancer Prevention Trial
    2.1.2 Informative Drop-Out in Longitudinal Studies
    2.1.3 Outline
  2.2 Data Structure and Notation
  2.3 Assumptions
  2.4 Identifiability
  2.5 Modeling
  2.6 Prior Specification and Posterior Computation
    2.6.1 Shrinkage Priors
    2.6.2 Prior of Sensitivity Parameters
    2.6.3 Posterior Computation
  2.7 Assessment of Model Performance via Simulation
  2.8 Application: Breast Cancer Prevention Trial (BCPT)
    2.8.1 Model Fit and Shrinkage Results
    2.8.2 Inference
  2.9 Summary and Discussion
  2.10 Acknowledgments
  2.11 Tables and Figures

3 ...
  3.1 Introduction
    3.1.1 Intermittent Missing Data
    3.1.2 Computational Issues
    3.1.3 Outline
  3.2 Notation, Assumptions and Identifiability
  3.3 Modeling, Prior Specification and Posterior Computation
    3.3.1 Modeling
    3.3.2 Shrinkage Prior
    3.3.3 Prior of Sensitivity Parameters
    3.3.4 Posterior Computation
  3.4 Assessment of Model Performance via Simulation
  3.5 Application: Breast Cancer Prevention Trial (BCPT)
    3.5.1 Model Fit
    3.5.2 Inference
    3.5.3 Sensitivity of Inference to the Priors
  3.6 Summary and Discussion
  3.7 Tables and Figures
  3.8 Appendix

4 A NOTE ON MAR, IDENTIFYING RESTRICTIONS, AND SENSITIVITY ANALYSIS IN PATTERN MIXTURE MODELS
  4.1 Introduction
  4.2 Existence of MAR under Multivariate Normality within Pattern
  4.3 Sequential Model Specification and Sensitivity Analysis under MAR
  4.4 Non-Future Dependence and Sensitivity Analysis under Multivariate Normality within Pattern
  4.5 MAR and Sensitivity Analysis with Multivariate Normality on the Observed-Data Response
  4.6 Example: Growth Hormone Study
  4.7 ACMV Restrictions and Multivariate Normality with Baseline Covariates
    4.7.1 Bivariate Case
    4.7.2 Multivariate Case
  4.8 Summary
  4.9 Tables
  4.10 Appendix

5 DISCUSSION: FUTURE APPLICATION OF THE BAYESIAN NONPARAMETRIC AND SEMI-PARAMETRIC METHODS
  5.1 Summary
  5.2 Extensions to Genetics Mapping
  5.3 Extensions to Causal Inference
    5.3.1 ...
    5.3.2 Data and Notation
    5.3.3 Missing Data Mechanism
    5.3.4 Causal Inference Assumption
    5.3.5 Stochastic Survival Monotonicity Assumption
    5.3.6 Summary of Causal Inference
  5.4 Figures

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

2-1 Relative Risks to be Elicited
2-2 Percentiles of Relative Risks Elicited
2-3 Simulation Scenario
2-4 Simulation Results: MSE ($\times 10^3$). P and T represent placebo and tamoxifen arms, respectively.
2-5 Patients Cumulative Drop Out Rate
3-1 Missingness by Scheduled Measurement Time
3-2 Simulation Results: MSE ($\times 10^3$). P and T represent placebo and tamoxifen arms, respectively.
3-3 Sensitivity to the Elicited Prior
3-4 Sensitivity to the Elicited Prior
4-1 Growth Hormone Study: Sample mean (standard deviation) stratified by dropout pattern.
4-2 Growth Hormone Study: Posterior mean (standard deviation)

LIST OF FIGURES

2-1 Extrapolation of the elicited relative risks.
2-2 Prior Density of ... | p
2-3 Model Fit
2-4 Model Shrinkage
2-5 Posterior distribution of $P[Y_7 = 1 \mid Z = z]$. Black and gray lines represent tamoxifen and placebo arms, respectively. Solid and dashed lines are for MNAR and MAR, respectively.
2-6 Posterior mean and 95% credible interval of difference of $P[Y_j = 1 \mid Z = z]$ between placebo and tamoxifen arms. The gray and white boxes are for MAR and MNAR, respectively.
3-1 Model Fit
3-2 Shrinkage
3-3 Posterior distribution of $P[Y_7 = 1 \mid Z = z]$. Black and gray lines represent tamoxifen and placebo arms, respectively. Solid and dashed lines are for MNAR and MAR, respectively.
3-4 Posterior mean and 95% credible interval of difference of $P[Y_j = 1 \mid Z = z]$ between placebo and tamoxifen arms. The gray and white boxes are for MAR and MNAR, respectively.
5-1 Contour and Perspective Plots of a Bivariate Density
5-2 Illustration of $p_{c,t}$

ABSTRACT

We consider inference in randomized longitudinal studies with missing data that is generated by skipped clinic visits and loss to follow-up. In this setting, it is well known that full data estimands are not identified unless unverified assumptions are imposed. Sensitivity analysis that assesses the sensitivity of model-based inferences to such assumptions is often necessary.

In Chapters 2 and 3, we posit an exponential tilt model that links non-identifiable distributions and identifiable distributions. This exponential tilt model is indexed by non-identified parameters, which are assumed to have an informative prior distribution, elicited from subject-matter experts. Under this model, full data estimands are shown to be expressed as functionals of the distribution of the observed data. We propose two different saturated models for the observed data distribution, as well as shrinkage priors to avoid the curse of dimensionality. The two procedures provide researchers different strategies for reducing the dimension of parameter space. We assume a non-future dependence model for the drop-out mechanism and partial ignorability for the intermittent missingness. In a simulation study, we compare our approach to a fully parametric and a fully saturated model for the distribution of the observed data. Our methodology is motivated by, and applied to, data from the Breast Cancer Prevention Trial.

In Chapter 4, we discuss pattern mixture models. Pattern mixture modeling is a popular approach for handling incomplete longitudinal data. Such models are not identifiable by construction. Identifying restrictions are one approach to mixture model identification (Daniels and Hogan 2008; Kenward et al. 2003; Little 1995; Little and Wang 1996; Thijs et al. 2002) and are a natural starting point for missing not at random sensitivity analysis (Daniels and Hogan 2008; Thijs et al.
2002). However, when the pattern specific models are multivariate normal (MVN), identifying restrictions corresponding to missing at random may not exist. Furthermore, identification strategies can be problematic in models with covariates (e.g. baseline covariates with time-invariant coefficients). In this paper, we explore conditions necessary for identifying restrictions that result in missing at random (MAR) to exist under a multivariate normality assumption and strategies for identifying sensitivity parameters for sensitivity analysis or for a fully Bayesian analysis with informative priors. A longitudinal clinical trial is used for illustration of sensitivity analysis. Problems caused by baseline covariates with time-invariant coefficients are investigated and an alternative identifying restriction based on residuals is proposed as a solution.

The problem of incomplete data is frequently confronted by statisticians, especially in longitudinal studies. The most common type of incomplete data is missing data, in which each data value is either perfectly known or completely unknown. In other situations, data are partially missing and partially observed. Examples include rounded data and censored data. This type of incomplete data is referred to as coarse data. Missing data can be viewed as a special case of coarse data (Heitjan and Rubin 1991). In both cases, the incompleteness occurs because we observe only a subset of the complete data, which includes the true, unobservable data. In this dissertation, we consider missing data that include drop-out missingness, in which subjects missing a measurement do not return to the study at the next follow-up, and intermittent missingness, in which a missing visit is followed by an observed measurement.

... (Little and Rubin 1987, chapter 4); however, this method is inefficient. Another common approach is single imputation, that is, filling in a single value for each missing value. The advantage of single imputation is that it does not delete any units and, after the imputation, standard methods for complete ... (Rubin 1987).
For notation, let $y = \{y_1, \ldots, y_J\}$ denote the full data response vector of outcomes, possibly partially observed. Let $r = \{r_1, r_2, \ldots, r_J\}$ denote the missing data indicator, with $r_j = 0$ if $y_j$ is missing and $r_j = 1$ if $y_j$ is observed. Let $x$ denote the covariates. Let $y_{obs}$ and $y_{mis}$ denote the observed and missing response data, respectively. Let $\omega$ be the parameters indexing the full data model $p(y, r)$, $\theta(\omega)$ be the parameters indexing the full data response model $p(y)$, and $\psi(\omega)$ be the parameters indexing the missing data mechanism model $p(r \mid y)$.

The common assumptions about the missing data mechanism are as follows. Rubin (1976) and Little and Rubin (1987) developed a hierarchy for missing data mechanisms by classifying the relationship between missingness and the response data.

Note that MAR holds if and only if $p(y_{mis} \mid y_{obs}, r) = p(y_{mis} \mid y_{obs})$. The proof is as follows. Suppose MAR holds. Then we have $p(r \mid y_{mis}, y_{obs}) = p(r \mid y_{obs})$, so that

$$p(y_{mis} \mid y_{obs}, r) = \frac{p(r \mid y_{mis}, y_{obs})\, p(y_{mis} \mid y_{obs})}{p(r \mid y_{obs})} = p(y_{mis} \mid y_{obs}).$$

To show the reverse direction, note that

$$p(r \mid y_{mis}, y_{obs}) = \frac{p(r, y_{mis} \mid y_{obs})}{p(y_{mis} \mid y_{obs})} = \frac{p(y_{mis} \mid y_{obs}, r)\, p(r \mid y_{obs})}{p(y_{mis} \mid y_{obs})} = p(r \mid y_{obs}).$$

... (Laird 1988). This condition is called ignorability (Rubin 1976).

1. The missing data mechanism is MAR.
2. The parameters of the full data response model, $\theta(\omega)$, and the parameters of the missingness model, $\psi(\omega)$, are distinguishable, i.e. the full data parameter $\omega$ can be decomposed as $(\theta(\omega), \psi(\omega))$.
3. The parameters $\theta(\omega)$ and $\psi(\omega)$ are a priori independent, i.e. $p(\theta(\omega), \psi(\omega)) = p(\theta(\omega))\, p(\psi(\omega))$.

Full data models that do not satisfy Definition 1.4 have non-ignorable missingness.

... Kenward et al. (2003) defined the term non-future dependence.

... (Hogan and Laird 1997b). Likelihood-based models for missing data are distinguished by the way the joint distribution of the outcome and missing data processes is factorized. They can be classified as selection models, pattern-mixture models, and shared-parameter models.

Heckman (1979a,b) used a bivariate response $Y$ with missing $Y_2$ as an example and showed that in general it is critical to answer the question of why the data are missing by modeling the missingness of $Y_{2i}$ as a function of the observed $Y_{1i}$ (for subject $i$). Diggle and Kenward (1994) extended the Heckman model to longitudinal studies and modeled the drop-out process by logistic regression such as $\mathrm{logit}\, P(r_j = 0 \mid r_{j-1} = 1, y) = y'\beta$.

... (Albert 2000; Baker 1995; Fitzmaurice et al.
1995; Heagerty 2002; Kurland and Heagerty 2004).

... (1977) introduced the idea of modeling respondents and nonrespondents in surveys separately and using subjective priors to relate respondents' and nonrespondents' model parameters. Little (1993, 1994) explored pattern mixture models in discrete time settings. Specifically, different identifying restrictions (see Section 1.5) were proposed to identify the full-data model. When the number of dropout patterns is large and pattern-specific parameters will be only weakly identified by identifying restrictions, Roy (2003) and Roy and Daniels (2008) proposed to use a latent-class model for dropout classes. When the dropout time is continuous and the mixture of patterns is infinite, Hogan et al. (2004) proposed to model the response given dropout by a varying coefficient model where regression coefficients were unspecified, non-parametric functions of dropout time. For time-to-event data with informative censoring, Wu and Bailey (1988, 1989) and Hogan and Laird (1997a) developed random effects mixture models. Fitzmaurice and Laird (2000a) generalized the Wu and Bailey and the Hogan and Laird approaches to discrete, ordinal and count data by using generalized linear mixture models and a GEE approach for statistical inference. Daniels and Hogan (2000) proposed a parameterization of the pattern mixture model for continuous data. Sensitivity analysis can be done on the additive (location) and multiplicative (scale) terms. Forster and Smith (1998) considered a pattern mixture model for a single categorical response with categorical covariates. Bayesian approaches were employed for non-ignorable missingness.

Wu and Carroll (1988) presented a shared parameter random effects model for continuous responses and informative censoring, in which individual effects are taken into account as intercepts and slopes for modeling the censoring process. DeGruttola and Tu (1994) extended Wu and Carroll's model to allow general covariates. Follmann and Wu (1995) developed a generalized linear model for the response and proposed an approximation algorithm for the joint full-data model for inference.
Faucett and Thomas (1996) and Wulfsohn and Tsiatis (1997) proposed to jointly model the continuous covariate over time and relate the covariates to the response simultaneously. Henderson et al. (2000) generalized the joint modeling approach by using two correlated Gaussian random processes for covariates and response. Ten Have et al. (1998, 2000) proposed a shared parameter mixed effects logistic regression model for longitudinal ordinal data. Recently, Yuan and Little (2009) proposed a mixed-effect hybrid model that allows the missingness and response to be conditionally dependent given random effects.

... (Daniels and Hogan 2008). Full-data model inference requires unverifiable assumptions about the extrapolation model $p(y_{mis} \mid y_{obs}, r, \omega_E)$. A sensitivity analysis explores the sensitivity of inferences of interest about the full data response model to unverifiable assumptions about the extrapolation model. This is typically done by varying sensitivity parameters, which we define next (Daniels and Hogan 2008).

1. ...
2. The observed likelihood $L(\xi_S, \xi_M \mid y_{obs}, r)$ is a constant as a function of $\xi_S$.
3. Given $\xi_S$ fixed, $L(\xi_S, \xi_M \mid y_{obs}, r)$ is a non-constant function of $\xi_M$.

Unfortunately, fully parametric selection models and shared parameter models do not allow sensitivity analysis, as sensitivity parameters cannot be found (Daniels and Hogan 2008, Chapter 8). Examining sensitivity to distributional assumptions, e.g., random effects, will provide different fits to the observed data, $(y_{obs}, r)$. In such cases, a sensitivity analysis cannot be done since varying the distributional assumptions does not provide equivalent fits to the observed data (Daniels and Hogan 2008). It then becomes an exercise in model selection.

Fully Bayesian analysis allows researchers to reach a single conclusion by admitting prior beliefs about the sensitivity parameters. For continuous responses, Lee and Berger (2001) built a semiparametric Bayesian selection model which has a strong distributional assumption for the response but a weak assumption on the missing data mechanism. Scharfstein et al. (2003), on the other hand, placed strong parametric assumptions on the missing data mechanism but minimal assumptions on the response outcome.
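The role of a sensitivity parameter can be illustrated with a minimal sketch that is not taken from the dissertation. It assumes a single outcome whose mean among respondents and whose response probability are identified, and a hypothetical shift `delta` for nonrespondents that the observed data cannot inform. The function name, the argument names and the numbers are illustrative assumptions.

```python
def full_data_mean(mu_obs, p_obs, delta):
    """Full-data mean when the nonrespondents' mean is mu_obs + delta.

    mu_obs : mean among respondents (identified by the observed data)
    p_obs  : probability of responding (identified by the observed data)
    delta  : hypothetical sensitivity parameter (not identified)
    """
    if not 0.0 <= p_obs <= 1.0:
        raise ValueError("p_obs must be a probability")
    mu_mis = mu_obs + delta  # assumed mean among nonrespondents
    return p_obs * mu_obs + (1.0 - p_obs) * mu_mis


# Vary the sensitivity parameter over a small grid; delta = 0 means
# respondents and nonrespondents share the same mean.
grid = [-1.0, 0.0, 1.0]
results = {d: full_data_mean(2.0, 0.75, d) for d in grid}
for d in grid:
    print(f"delta = {d:+.1f}: full-data mean = {results[d]:.2f}")
```

Every value of `delta` leaves the fit to the observed data unchanged, which is why such a parameter has to be varied over a range or given an informative prior rather than estimated.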
Liang and Zeger (1986) proposed generalized estimating equations (GEE) whose solution ...

Robins et al. (1995) proposed the inverse-probability of censoring weighted generalized estimating equations (IPCW-GEE) approach, which reweights each individual's contribution to the usual GEE by the estimated probability of drop-out. IPCW-GEE will lead to consistent estimation when the missingness is MAR. However, both GEE and IPCW-GEE can result in biased estimation under MNAR. Rotnitzky et al. (1998a, 2001), Scharfstein et al. (2003) and Schulman et al. (1999) adopted semiparametric selection modeling approaches, in which the model for drop-out is indexed by interpretable sensitivity parameters that express departures from MAR. For such approaches, the inference results depend on the choice of unidentified, yet interpretable, sensitivity analysis parameters.

One approach to handle intermittent missingness is to consider a monotonized dataset, whereby all observed values on an individual after their first missingness are deleted (Land et al. 2002). However, this increases the dropout rate, loses efficiency, and may introduce bias.

Other methods in the literature often adopt a likelihood approach and rely on strong parametric assumptions. For example, Troxel et al. (1998), Albert (2000) and Ibrahim et al. (2001) suggested a selection model approach. Albert et al. (2002) used a shared latent autoregressive process model. Lin et al. (2004) employed a latent class pattern mixture model.

... Troxel et al. (1998) and Vansteelandt et al. (2007). Troxel et al. (1998) proposed a marginal model and introduced a pseudo-likelihood estimation procedure. Vansteelandt et al. (2007) extended the ideas of Rotnitzky et al. (1998b), Scharfstein et al. (1999) and Rotnitzky et al. (2001) to non-monotone missing data that assume (exponentially tilted) extensions of sequential explainability and specified parametric models for certain conditional means.
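The monotonized dataset mentioned above can be sketched as follows. This is an illustration only: the list-based representation with `None` for a missing visit and the function name `monotonize` are assumptions made for the example, not the data format used in the dissertation.

```python
def monotonize(responses):
    """Delete every value recorded after the first missing visit.

    responses : list of outcomes in visit order, with None for a missing visit.
    Returns a new list in which all entries from the first None onward are None.
    """
    out = []
    dropped = False
    for value in responses:
        if value is None:
            dropped = True  # first missing visit reached
        out.append(None if dropped else value)
    return out


# A subject who skips visit 3 but returns for visits 4 and 5.
intermittent = [1, 0, None, 1, 0]
monotone = monotonize(intermittent)
print(monotone)

# Observed values discarded by monotonizing this subject's record.
discarded = sum(v is not None for v in intermittent) - sum(v is not None for v in monotone)
print(discarded)
```

In this example two observed responses are thrown away, which is the loss of efficiency the text refers to.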
Most related to the approach we will use in Chapter 3 are the (partial ignorability) assumptions formalized in Harel and Schafer (2009) that partition the missing data and allow one (or more) of the partitions to be ignored given the other partition(s) and the observed data. Specifically, Harel and Schafer (2009) defined a missing data mechanism to be partially missing at random if $p(r \mid y_{obs}, y_{mis}, g(r), x; \psi(\omega)) = p(r \mid y_{obs}, g(r), x; \psi(\omega))$ ... Vansteelandt et al. (2007).

In this dissertation, we explicitly partition the missing data indicator vector $r$ into $\{r_s, s\}$, where $s = \max\{t : r_t = 1\}$ denotes the last time point at which a response was observed, i.e. the survival time, and $r_s = \{r_t : t < s\}$ ...

... 1.7 ... (Little 1993, 1994). Additional assumptions about the missing data process are necessary in order to yield identifying restrictions that equate the inestimable parameters to functions of estimable parameters and identify the full-data model.

For example, consider the situation when $y = (y_1, y_2)$ is a bivariate normal response with missing data only in $y_2$. Let $s$ be the survival time, i.e. $s = 1$ if $y_2$ is missing and $s = 2$ if $y_2$ is observed. We model $p(s)$ and $p(y \mid s)$ as $s \sim \mathrm{Bern}(\phi)$ and $y \mid s = i \sim N(\mu^{(s)}, \Sigma^{(s)})$ for $i = 1, 2$, with

$$\mu^{(s)} = \begin{bmatrix} \mu_1^{(s)} \\ \mu_2^{(s)} \end{bmatrix} \quad \text{and} \quad \Sigma^{(s)} = \begin{bmatrix} \sigma_{11}^{(s)} & \sigma_{12}^{(s)} \\ \sigma_{12}^{(s)} & \sigma_{22}^{(s)} \end{bmatrix}.$$

...

Understanding (identifying) restrictions that lead to MAR is an important first step for sensitivity analysis under missing not at random (MNAR) (Daniels and Hogan 2008; Scharfstein et al. 2003; Zhang and Heitjan 2006). In particular, MAR provides a good starting point for sensitivity analysis, and sensitivity analyses are essential for the analysis of incomplete data (Daniels and Hogan 2008; Scharfstein et al. 1999).

Little (1993) developed several common identifying restrictions. For example, complete case missing value (CCMV) restrictions, which equate all missing patterns to the complete cases, i.e. ...

... Molenberghs et al. (1998) proved that for discrete time points and monotone missingness, the ACMV constraint is equivalent to missing at random (MAR). Thijs et al. (2002) developed strategies to apply identifying restrictions. That is, first fit ... Kenward et al. (2003) discussed identifying restrictions corresponding to missing non-future dependence.
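For the bivariate normal example above, a minimal sketch of how an identifying restriction pins down the unidentified mean is given below. It assumes the CCMV restriction, under which the conditional distribution of $y_2$ given $y_1$ in the drop-out pattern is set equal to that in the complete-case pattern; with only two patterns this coincides with ACMV. The function name, argument names and numerical values are hypothetical and are not taken from the dissertation.

```python
def ccmv_mean_y2(mu1_drop, mu_complete, sigma_complete, p_complete):
    """Mean of y2 implied by the CCMV restriction in the two-pattern case.

    mu1_drop       : mean of y1 among drop-outs (identified)
    mu_complete    : (mean of y1, mean of y2) among completers (identified)
    sigma_complete : 2x2 covariance matrix among completers (identified)
    p_complete     : probability of being a completer (identified)
    Returns (mean of y2 among drop-outs, full-data mean of y2).
    """
    mu1_c, mu2_c = mu_complete
    var_y1 = sigma_complete[0][0]
    cov_y1_y2 = sigma_complete[0][1]
    slope = cov_y1_y2 / var_y1  # regression slope of y2 on y1 among completers
    # CCMV: drop-outs share the completers' regression of y2 on y1,
    # evaluated at the drop-outs' own mean of y1.
    mu2_drop = mu2_c + slope * (mu1_drop - mu1_c)
    mu2_full = p_complete * mu2_c + (1.0 - p_complete) * mu2_drop
    return mu2_drop, mu2_full


mu2_drop, mu2_full = ccmv_mean_y2(
    mu1_drop=2.0,
    mu_complete=(0.0, 1.0),
    sigma_complete=[[1.0, 0.5], [0.5, 1.0]],
    p_complete=0.8,
)
print(mu2_drop, mu2_full)
```

A sensitivity analysis would then perturb `mu2_drop` away from this value, since no observed data can contradict the perturbation.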
... (Daniels and Hogan 2008). However, multivariate normality within patterns can be overly restrictive when applying identifying restrictions. We explore such issues in Chapter 4. Furthermore, identification strategies can be problematic in models with covariates (e.g. baseline covariates with time-invariant coefficients). In this Chapter, we also explore conditions necessary for identifying restrictions that result in missing at random (MAR) to exist under a multivariate normality assumption and strategies for sensitivity analysis. Problems caused by baseline covariates with time-invariant coefficients are investigated and an alternative identifying restriction based on residuals is proposed as a solution.

2.1.1 Breast Cancer Prevention Trial

... (Fisher et al. 1998). The study was open to accrual from June 1, 1992 through September 30, 1997 and 13,338 women aged 35 or older were enrolled in the study during this interval. The primary objective was to determine whether long-term tamoxifen therapy is effective in preventing the occurrence of invasive breast cancer. Secondary objectives included quality of life (QOL) assessments to evaluate benefit as well as risk resulting from the use of tamoxifen.

Monitoring QOL was of particular importance for this trial since the participants were healthy women and there had been concerns voiced by researchers about the association between clinical depression and tamoxifen use. Accordingly, data on depression symptoms was scheduled to be collected at baseline prior to randomization, at 3 months, at 6 months and every 6 months thereafter for up to 5 years. The primary instrument used to monitor depressive symptoms over time was the Center for Epidemiologic Studies Depression Scale (CES-D) (Radloff 1977). This self-test questionnaire is composed of 20 items, each of which is scored on a scale of 0-3. A score of 16 or higher is considered as a likely case of clinical depression.
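The scoring rule just described can be written out as a short sketch. It assumes each of the 20 items has already been scored from 0 to 3 as stated in the text; the function names are illustrative and the sketch does not reproduce the questionnaire itself.

```python
def cesd_total(item_scores):
    """Sum the 20 CES-D item scores, each assumed to be an integer from 0 to 3."""
    if len(item_scores) != 20:
        raise ValueError("the CES-D has 20 items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item must be scored 0, 1, 2 or 3")
    return sum(item_scores)


def likely_depression(item_scores, cutoff=16):
    """Binary indicator: total score at or above the cutoff (16 in the text)."""
    return cesd_total(item_scores) >= cutoff


at_cutoff = [1] * 16 + [0] * 4     # total of exactly 16
below_cutoff = [1] * 15 + [0] * 5  # total of 15
print(likely_depression(at_cutoff), likely_depression(below_cutoff))
```

The total can range from 0 to 60, and the indicator is the kind of binary outcome that results from dichotomizing the score at 16.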
The trial was unblinded on March 31, 1998, after an interim analysis showed a dramatic reduction in the incidence of breast cancer in the treatment arm. Due to the potential loss of the control arm, we focus on QOL data collected on the 10,982 participants who were enrolled during the first two years of accrual and had their CES-D ...

In the BCPT, the clinical centers were not required to collect QOL data on women after they stopped their assigned therapy. This design feature aggravated the problem of missing QOL data in the trial. As reported in Land et al. (2002), more than 30% of the CES-D scores were missing at the 36-month follow-up, with a slightly higher percentage in the tamoxifen group. They also showed that women with higher baseline CES-D scores had higher rates of missing data at each follow-up visit and that the mean observed CES-D scores preceding a missing measurement were higher than those preceding an observed measurement; there was no evidence that these relationships differed by treatment group.

While these results suggest that the missing data process is associated with observed QOL outcomes, one cannot rule out the possibility that the process is further related to unobserved outcomes and that this relationship is modified by treatment. In particular, investigators were concerned (a priori) that, between assessments, tamoxifen might be causing depression in some individuals, who then do not return for their next assessment. If this occurs, the data are said to be missing not at random (MNAR); otherwise the data are said to be missing at random (MAR).

... Land et al. (2002), consider a monotonized dataset, whereby all CES-D scores observed on an individual after their first missing score have been deleted (this increases the dropout rate).

There are two main inferential paradigms for analyzing longitudinal studies with informative drop-out: likelihood (parametric) and non-likelihood (semi-parametric).
27 PAGE 28 Little ( 1995 ), HoganandLaird ( 1997b )and KenwardandMolenberghs ( 1999 )aswellasrecentbooksby MolenberghsandKenward ( 2007 )and DanielsandHogan ( 2008 )provideacomprehensivereviewoflikelihood-basedapproaches,includingselectionmodels,pattern-mixturemodels,andshared-parametermodels.Thesemodelsdifferinthewaythejointdistributionoftheoutcomeandmissingdataprocessesarefactorized.Inselectionmodels,onespeciesamodelforthemarginaldistributionoftheoutcomeprocessandamodelfortheconditionaldistributionofthedrop-outprocessgiventheoutcomeprocess(see,forexample, Albert 2000 ; Baker 1995 ; DiggleandKenward 1994 ; Fitzmauriceetal. 1995 ; Heckman 1979a ; Liuetal. 1999 ; Molenberghsetal. 1997 );inpattern-mixturemodels,onespeciesamodelfortheconditionaldistributionoftheoutcomeprocessgiventhedrop-outtimeandthemarginaldistributionofthedrop-outtime(see,forexample, BirminghamandFitzmaurice 2002 ; DanielsandHogan 2000 ; FitzmauriceandLaird 2000b ; HoganandLaird 1997a ; Little 1993 1994 1995 ; Pauleretal. 2003 ; Roy 2003 ; RoyandDaniels 2008 ; Thijsetal. 2002 );andinshared-parametermodels,theoutcomeanddrop-outprocessesareassumedtobeconditionallyindependentgivensharedrandomeffects(see,forexample, DeGruttolaandTu 1994 ; Landetal. 2002 ; Pulkstenisetal. 1998 ; TenHaveetal. 1998 2000 ; WuandCarroll 1988 ; YuanandLittle 2009 ).Traditionally,thesemodelshavereliedonverystrongdistributionalassumptionsinordertoobtainmodelidentiability. Withoutthesestrongdistributionalassumptions,specicparametersfromthesemodelswouldnotbeidentiedfromthedistributionoftheobserveddata.Toaddressthisissuewithinalikelihood-basedframework,severalauthors( Bakeretal. 
1992; Daniels and Hogan 2008; Kurland and Heagerty 2004; Little 1994; Little and Rubin 1999; Nordheim 1984) have promoted the use of global sensitivity analysis, whereby non- or weakly-identified, interpretable parameters are fixed and then varied to evaluate [...]

Non-likelihood approaches to informative drop-out in longitudinal studies have been primarily developed from a selection modeling perspective. Here, the marginal distribution of the outcome process is modeled non- or semi-parametrically and the conditional distribution of the drop-out process given the outcome process is modeled semi- or fully-parametrically. In the case where the drop-out process is assumed to depend only on observable outcomes (i.e., MAR), Robins et al. (1994, 1995), van der Laan and Robins (2003) and Tsiatis (2006) developed inverse-weighted and augmented inverse-weighted estimating equations for inference. For informative drop-out, Rotnitzky et al. (1998a), Scharfstein et al. (1999) and Rotnitzky et al. (2001) introduced a class of selection models in which the model for drop-out is indexed by interpretable sensitivity parameters that express departures from MAR. Inference using inverse-weighted estimating equations was proposed.

The problem with the aforementioned sensitivity analysis approaches is that the ultimate inferences can be cumbersome to display. Vansteelandt et al. (2006a) developed a method for reporting ignorance and uncertainty intervals (regions) that contain the true parameter(s) of interest with a prescribed level of precision, when the true data generating model is assumed to fall within a plausible class of models (as an example, see Scharfstein et al. 2004). An alternative and very natural strategy is to specify an informative prior distribution on the non- or weakly-identified parameters and conduct a fully Bayesian analysis, whereby the ultimate inferences are reported in terms of posterior distributions. In the cross-sectional setting with a continuous outcome, Scharfstein et al. (2003) adopted this approach from a semi-parametric selection modeling perspective. Kaciroti et al. (2009) proposed a parametric pattern-mixture model for cross-sectional, clustered binary outcomes. Lee et al.
(2008) introduced a fully-parametric pattern-mixture approach in the longitudinal setting with binary [...] Lee et al. (2008), but offer a more flexible strategy. In the context of BCPT, the longitudinal outcome will be the indicator that the CES-D score is 16 or higher.

In Section 2.2, we describe the data structure. In Sections 2.3 and 2.4, we formalize identification assumptions and prove that the full-data distribution is identified under these assumptions. We introduce a saturated model for the distribution of the observed data in Section 2.5. In Section 2.6, we illustrate how to apply shrinkage priors to high-order interaction parameters in the saturated model to reduce the dimensionality of the parameter space and how to elicit (conditional) informative priors for non-identified sensitivity parameters from experts. In Section 2.7, we assess, by simulation, the behavior of three classes of models for the distribution of observed data: parametric, saturated, and shrinkage. Our analysis of the BCPT trial is presented in Section 2.8. Section 2.9 is devoted to a summary and discussion.

Our goal is to draw inference about the treatment-specific probabilities $P[Y_j = 1 \mid Z = z]$ for $j = 1, \ldots, J$ and $z = 0, 1$.

[...]

This assumption asserts that for individuals at risk for drop-out at visit $j$ and who share the same history of outcomes up to and including visit $j$, the distribution of future outcomes is the same for those who are last seen at visit $j$ and those who remain on study past visit $j$. This assumption has been referred to as non-future dependence (Kenward et al. 2003).

Assumption 2 links the non-identified conditional distribution of $Y_j$ given $R_j = 0$, $R_{j-1} = 1$, [...]

Suppose that $P[Y_j = 1 \mid R_k = 0, R_{k-1} = 1,$ [...]

Furthermore, we propose to parameterize the functions $q_{z,j}$ [...]

[...] 2.5 provides a perfect fit to the distribution of the observed data. In this model, however, the number of parameters increases exponentially in $J$. In contrast, the number of data points increases linearly in $J$. As a consequence, there will be many [...] Robins and Ritov 1997).

[...] where $t$ is the order of interactions and the two sets of hyper-parameters (shrinkage variances), each indexed by $(t)$, follow $\mathrm{Unif}(0, 10)$ distributions.
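The contrast drawn above between exponential and linear growth in $J$ can be made concrete with a small counting sketch. It assumes, for illustration only, that the saturated model has one free parameter per distinct binary outcome history at each visit, and that a first-order Markov model conditions only on the previous outcome; the function names are hypothetical and the exact parameterization of Section 2.5 is not reproduced here.

```python
def saturated_parameter_count(J):
    """Free parameters per treatment arm, assuming one parameter per binary history."""
    # Response at visit j (j = 0..J) conditions on j earlier binary outcomes: 2**j histories.
    response = sum(2 ** j for j in range(0, J + 1))
    # Drop-out at visit j (j = 1..J) conditions on the j outcomes observed so far.
    dropout = sum(2 ** j for j in range(1, J + 1))
    return response + dropout


def first_order_markov_parameter_count(J):
    """Free parameters per arm when only the previous outcome enters each model."""
    response = 1 + 2 * J  # baseline intercept, then intercept and slope at visits 1..J
    dropout = 2 * J       # intercept and slope at visits 1..J
    return response + dropout


if __name__ == "__main__":
    for J in (3, 5, 7):
        print(J, saturated_parameter_count(J), first_order_markov_parameter_count(J))
```

Under these assumptions the saturated count roughly doubles with every added visit, while the Markov count grows by a fixed amount, which is the motivation for shrinking the former toward the latter.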
When the first order Markov model is not true, as $n$ goes to infinity, the posterior means of observed data probabilities will converge to their true values as long as [...]

We specify non-informative priors $N(0, 1000)$ for the non-interaction parameters, namely $\alpha_{z,j,0}$ for $j = 0, \ldots, J$ and $z = 0, 1$, and $\alpha_{z,j,1}$, $\gamma_{z,j,0}$ and $\gamma_{z,j,1}$ for $j = 1, \ldots, J$ and $z = 0, 1$.

[...] 2.5, are (conditional) odds ratios. In our experience, subject matter experts often have difficulty thinking in terms of odds ratios; rather, they are more comfortable expressing beliefs about relative risks (Scharfstein et al. 2006; Shepherd et al. 2007). With this in mind, we asked Dr. Patricia Ganz, a medical oncologist and expert on quality of life outcomes in breast cancer, to express her beliefs about the risk of dropping out and its relationship to treatment assignment and depression. We then translated her beliefs into prior distributional assumptions about the odds ratio sensitivity parameters.

Specifically, we asked Dr. Ganz to answer the following question for each treatment group: [...]

For notational convenience, let $r_z(p)$ denote the relative risk of drop-out for treatment group $z$ and drop-out probability $p$. Further, let $r_{z,\min}(p)$, $r_{z,\mathrm{med}}(p)$ and $r_{z,\max}(p)$ denote the elicited minimum, median, and maximum relative risks (see Table 2-1). Let $p_{z,j}$ [...] By definition, $r_z$ evaluated at $p_{z,j}$ [...]

Step 1. For $m \in \{\min, \mathrm{med}, \max\}$, interpolate the elicited $r_{z,m}(p)$ at different drop-out probabilities (see Figure 2-1) to find $r_{z,m}$ at $p_{z,j}$ [...]

Step 2. Construct the prior of $r_z$ at $p_{z,j}$ [...]

Step 3. Construct a conditional prior of $p^{(0)}_{z,j}$ [...]

Step 4. Steps (2) and (3) induce a prior for the sensitivity parameters [...]

The relative risks elicited from Dr. Ganz are given in Table 2-2. We extrapolated the relative risks outside the ranges given in Table 2-2 as shown in Figure 2-1. Figure 2-2 shows the density of the sensitivity parameter given $p_{z,j}$ [...]

1. Using the proposed observed data model with the shrinkage priors, we simulate draws from the posterior distributions of $P[Y_j = 1 \mid R_j = 1,$ [...]

2. For each draw of $P[R_j = 0 \mid R_{j-1} = 1,$ [...]

3.
We compute $P[Y_j = 1 \mid Z = z]$ by plugging the draws of $P[Y_j = 1 \mid R_j = 1,$ [...] 2.4

To sample from the posterior distributions of $P[Y_j = 1 \mid R_j = 1,$ [...]

The shrinkage model uses the shrinkage priors proposed in Section 2.6.1 (shrink the saturated model toward a first order Markov model). Note that the shrinkage priors shrink the saturated model to an incorrect parametric model.

We simulated observed data from a true parametric model of the following form:

$\mathrm{logit}\, P[Y_0 = 1 \mid R_0 = 1, Z = z] = \alpha_{z,0,0}$

$\mathrm{logit}\, P[Y_1 = 1 \mid R_1 = 1, Y_0 = y_0, Z = z] = \alpha_{z,1,0} + \alpha_{z,1,1} y_0$

$\mathrm{logit}\, P[R_1 = 0 \mid R_0 = 1, Y_0 = y_0, Z = z] = \gamma_{z,1,0} + \gamma_{z,1,1} y_0$

$\mathrm{logit}\, P[Y_j = 1 \mid R_j = 1,$ [...]

To determine the parameters of the data generating model, we fit this model to the monotonized BCPT data in WinBUGS with non-informative priors. We used the posterior means of the parameters $\alpha_z$ and $\gamma_z$ as the true parameters. We computed the true values of $P[Y_j = 1 \mid Z = z]$ by (1) drawing 10,000 values from the elicited prior of the sensitivity parameters (Table 2-2), (2) computing $P[Y_j = 1 \mid Z = z]$ using the identification algorithm in Section 2.4 for each draw, and (3) averaging the resulting values. The model parameters and the true depression rates are given in Table 2-3.

We considered (relatively) small (3000), moderate (5000), and large (10000) sample sizes for each treatment arm; for each sample size, we simulated 50 datasets. We assessed model performance using the mean squared error (MSE) criterion.

In Table 2-4, we report the MSEs of $P[Y_j = 1 \mid R_j = 1,$ [...]

In addition, the MSEs for the shrinkage model compare favorably with those of the true parametric model for all sample sizes considered, despite the fact that the shrinkage priors were specified to shrink toward an incorrect model.

Table 2-5 displays the treatment-specific monotonized drop-out rates in the BCPT. By the 7th study visit, more than 40% of patients had missed one or more assessments, with a slightly higher percentage in the tamoxifen arm.

We fit the shrinkage model to the observed data using WinBUGS, with four chains of 8000 iterations and 1000 burn-in. Convergence was checked by examining trace plots of the multiple chains.
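The MSE criterion used in the simulation study above can be sketched as follows. This is a minimal illustration, not the code used for the thesis: it assumes one point estimate (for example a posterior mean) per simulated dataset and a known true value, and it scales by 1000 only because the simulation tables are reported on that scale. The function name and the numbers in the usage example are hypothetical.

```python
def mse_times_1000(estimates, truth):
    """Mean squared error of point estimates against a known truth, scaled by 10^3."""
    # One squared error per simulated dataset.
    squared_errors = [(estimate - truth) ** 2 for estimate in estimates]
    return 1000.0 * sum(squared_errors) / len(squared_errors)


if __name__ == "__main__":
    # Hypothetical posterior means from four simulated datasets, true rate 0.12.
    print(mse_times_1000([0.10, 0.14, 0.12, 0.13], 0.12))
```

Because the MSE combines squared bias and variance, a shrinkage model can score well even when it shrinks toward an incorrect model, provided the variance reduction outweighs the bias introduced.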
As shown in Figure 2-3, the shrinkage model fits the observed data well. Figure 2-4 illustrates the effect of shrinkage on the model fits by comparing the difference between the empirical rate and posterior mean of $P[Y_j = 1 \mid R_j = 1,$ [...]

Figure 2-5 shows the posterior of $P[Y_7 = 1 \mid Z = z]$, the treatment-specific probability of depression at the end of the 36-month follow-up (solid lines). For comparison, the posterior under MAR (corresponding to point mass priors at zero for the sensitivity parameters) is also presented (dashed lines). The observed depression rates (i.e., complete case analysis) were 0.115 on both the placebo and tamoxifen arms. Under the MNAR analysis (using the elicited priors), the posterior means of the depression rates at month 36 were 0.126 (95% CI: 0.115, 0.138) and 0.130 (95% CI: 0.119, 0.143) for the placebo and tamoxifen arms; the difference was 0.004 (95% CI: -0.012, 0.021). Under MAR, the rates were 0.125 (95% CI: 0.114, 0.136) and 0.126 (95% CI: 0.115, 0.138) for the placebo and tamoxifen arms; the difference was 0.001 (95% CI: -0.015, 0.018). The posterior probability of depression was higher under the MNAR analysis than under the MAR analysis since researchers believed depressed patients were more likely to drop out (see Table 2-2), a belief that was captured by the elicited priors. Figure 2-6 shows that there were no significant differences between the two treatments in the depression rates at any time point (95% credible intervals all cover zero) under both MNAR and MAR. Similar (non-significant) treatment differences were seen when examining treatment comparisons conditional on depression status at baseline.

Penalized likelihood (Fan and Li 2001; Green and Silverman 1994; Wahba 1990) is another approach for high-dimensional statistical modeling. There are similarities between the penalized likelihood approach and our shrinkage model. In fact, the shrinkage priors on the saturated model parameters proposed in our approach can be viewed as a specific form for the penalty.
The ideas in this paper can be extended to continuous outcomes. For example, one could use the mixtures of Dirichlet processes model (Escobar and West 1995) for the distribution of observed responses. They can also be extended to multiple cause dropout; in this trial, missed assessments were due to a variety of reasons, including patient-specific causes such as experiencing a protocol defined event, stopping therapy, or withdrawing consent, and institution-specific causes such as understaffing and staff turnover. Therefore, some missingness is less likely to be informative; extensions will need to account for that. In addition, institutional differences might be addressed by allowing institution-specific parameters with priors that shrink them toward a common set of parameters.

For smaller sample sizes, WinBUGS has difficulty sampling from the posterior distribution of the parameters in the shrinkage model. In addition, the monotonizing approach ignores the intermittent missing data and may lead to biased results. These issues will be examined in the next Chapter.

Table 2-1. Relative Risks to be Elicited

Drop out Rate $p$; 100% confident the number is above $r_{z,\min}(p)$ [...]

Table 2-2. Percentiles of Relative Risks Elicited

| Treatment | Percentile | Drop out Rate 10% | Drop out Rate 25% |
|---|---|---|---|
| Tamoxifen | Minimum | 1.10 | 1.30 |
| Tamoxifen | Median | 1.20 | 1.50 |
| Tamoxifen | Maximum | 1.30 | 1.60 |
| Placebo | Minimum | 1.01 | 1.20 |
| Placebo | Median | 1.05 | 1.30 |
| Placebo | Maximum | 1.10 | 1.40 |

Table 2-3. Simulation Scenario

Time Point; Parameter; 0 1 2 3 4 5 6 7 [...]

Table 2-4. Simulation Results: MSE ($\times 10^3$). P and T represent placebo and tamoxifen arms, respectively.

Observed $j, z$ [...]

Shrinkage P 6.970 1.999 0.033 0.045 0.051 0.059 0.047 0.052 0.066
Shrinkage T 6.988 2.401 0.024 0.026 0.056 0.053 0.063 0.119 0.073
Saturated P 35.678 67.171 0.036 0.050 0.054 0.058 0.101 0.231 0.561
Saturated T 34.654 62.606 0.026 0.033 0.045 0.059 0.097 0.329 0.722

Shrinkage P 4.628 1.188 0.025 0.028 0.032 0.031 0.035 0.048 0.057
Shrinkage T 4.448 1.414 0.017 0.023 0.028 0.026 0.038 0.033 0.044
Saturated P 30.274 54.647 0.023 0.028 0.028 0.033 0.061 0.138 0.290
Saturated T 29.599 51.219 0.020 0.020 0.032 0.028 0.051 0.140 0.392

Shrinkage P 2.392 0.707 0.008 0.010 0.015 0.017 0.014 0.013 0.014
Shrinkage T 2.474 0.712 0.011 0.015 0.011 0.015 0.016 0.019 0.023
Saturated P 22.989 37.716 0.009 0.009 0.016 0.018 0.018 0.038 0.094
Saturated T 22.245 34.791 0.011 0.013 0.014 0.014 0.021 0.048 0.128

Table 2-5.
Patients Cumulative Drop Out Rate

| Arm | Quantity | Month 3 | Month 6 | Month 12 | Month 18 | Month 24 | Month 30 | Month 36 |
|---|---|---|---|---|---|---|---|---|
| Tamoxifen | Available | 5364 | 4874 | 4597 | 4249 | 3910 | 3529 | 3163 |
| Tamoxifen | Dropout | 490 | 767 | 1115 | 1454 | 1835 | 2201 | 2447 |
| Tamoxifen | Drop Rate (%) | 9.13 | 14.30 | 20.79 | 27.11 | 34.21 | 41.03 | 45.62 |
| Placebo | Available | 5375 | 4871 | 4624 | 4310 | 3951 | 3593 | 3297 |
| Placebo | Dropout | 504 | 751 | 1065 | 1424 | 1782 | 2078 | 2304 |
| Placebo | Drop Rate (%) | 9.38 | 13.97 | 19.81 | 26.49 | 33.15 | 38.66 | 42.87 |

Figure 2-1. Extrapolation of the elicited relative risks.

Figure 2-2. Prior conditional density [...]

Figure 2-3. Solid and dashed lines represent the empirical rate of $P[Y_j = 1, R_j = 1 \mid Z = z]$ and $P[R_j = 0 \mid Z = z]$, respectively. The posterior means of $P[Y_j = 1, R_j = 1 \mid Z = z]$ (diamond) and $P[R_j = 0 \mid Z = z]$ (triangle) and their 95% credible intervals are displayed at each time point.

Figure 2-4. Differences between posterior mean and empirical rate of $P[Y_j = 1 \mid R_j = 1,$ [...]

Figure 2-5. Posterior distribution of $P[Y_7 = 1 \mid Z = z]$. Black and gray lines represent tamoxifen and placebo arms, respectively. Solid and dashed lines are for MNAR and MAR, respectively.

Figure 2-6. Posterior mean and 95% credible interval of difference of $P[Y_j = 1 \mid Z = z]$ between placebo and tamoxifen arms. The gray and white boxes are for MAR and MNAR, respectively.

[...] Chapter 2 a Bayesian shrinkage approach for longitudinal binary data with informative drop-out. The saturated observed data models were constructed sequentially via conditional distributions for response and for drop-out time and parameterized on the logistic scale using all interaction terms. However, two issues were not addressed: the ignored intermittent missing data and the intrinsic computational challenge with the interaction parameterization. This Chapter proposes solutions to these two issues.

[...] Land et al. (2002); we did this in Chapter 2. However, this increases the drop-out rate, throws away information and thus loses efficiency, and may introduce bias.

Handling informative intermittent missing data is methodologically and computationally challenging and, as a result, the statistics literature is relatively limited. Most methods adopt a likelihood approach and rely on strong parametric assumptions (see, for example, Albert 2000; Albert et al. 2002; Ibrahim et al. 2001; Lin et al. 2004; Troxel et al. 1998). Semiparametric methods have been proposed by Troxel et al. (1998) and Vansteelandt et al.
(2007). Troxel et al. (1998) proposed a marginal model and introduced a pseudo-likelihood estimation procedure. Vansteelandt et al. (2007) extended the ideas of Rotnitzky et al. (1998b), Scharfstein et al. (1999) and Rotnitzky et al. (2001) to non-monotone missing data.

[...] Harel and Schafer (2009) that partition the missing data and allow one (or more) of the partitions to be ignored given the other partition(s) and the observed data. In this Chapter, we apply a partial ignorability assumption such that the intermittent missing data mechanism can be ignored given drop-out and treatment strata.

[...] Chapter 2, Section 2.5, WinBUGS has difficulty sampling from the posterior distribution of the parameters when the sample size is relatively small (less than 3000 per arm). Tailored sampling algorithms can be written to overcome this difficulty; however, WinBUGS lacks the flexibility to incorporate modifications and/or extensions to its existing algorithms.

In this Chapter, we provide an alternative parameterization of the saturated model for the observed data as well as alternative shrinkage prior specifications to improve computational efficiency. This alternative approach to posterior sampling can easily be programmed in R.

In Section 3.2, we describe the data structure, formalize identification assumptions and prove that the treatment-specific distribution of the full trajectory of longitudinal outcomes is identified under these assumptions. In Section 3.3, we introduce a saturated model for the distribution of the data that would be observed when there is drop-out, but no intermittent observations. We then introduce shrinkage priors to parameters in the saturated model to reduce the dimensionality of the parameter space. In Section 3.4, we assess, by simulation, the behavior of three classes of models: parametric, saturated, and shrinkage. Our analysis of the BCPT trial is presented in Section 3.5. Section 3.6 is devoted to a summary and discussion.

[...] Chapter 2, Section 2.2, as well as introduce some additional notation in this Section. The following notation is defined for a random individual. When necessary, we use the subscript $i$ to denote data for the $i$th individual.
Let $Z$ denote the treatment assignment indicator, where $Z = 1$ denotes tamoxifen and $Z = 0$ denotes placebo. Let $Y$ be the complete response data vector with elements $Y_j$ denoting the binary outcome (i.e., depression) scheduled to be measured at the $j$th visit ($j = 0$ (baseline), $\ldots, J$) and let [...]

We will find it useful to distinguish three sets of data for an individual: the complete data $C = (Z, S, R_S, Y)$, the full data $F = (Z, S, R_S,$ [...]

We assume that individuals are drawn as a simple random sample from a super-population so that we have an i.i.d. data structure for $C$, $F$ and $O$. We let a treatment-specific parameter vector (subscripted by $z$) index a model for the joint conditional distribution of $S$ and [...]

Our goal is to draw inference about $P[Y_j = 1 \mid Z = z]$ for $j = 1, \ldots, J$ and $z = 0, 1$. To identify these probabilities from the distribution of the observed data, we make the following three (untestable) assumptions: [...]

This assumption, plus the assumption that the two sets of treatment-specific parameters (each subscripted by $z$) are a priori independent, implies that the intermittent missingness mechanism is ancillary or ignorable. Specifically, this means that when considering inferences from a likelihood perspective, as we are in this paper, the conditional distribution of $R_S$ given $Z$, $S$ and $Y_{obs}$ does not contribute to the likelihood and can be ignored (Harel and Schafer 2009).

Assumptions 2 and 3 are the same as Assumptions 1 and 2 in Chapter 2, Section 2.3, respectively. We restate the two assumptions below using the survival time $S$ notation (instead of the missing indicators $R$ used in Chapter 2).

[...] 2 The identifiability result shows that, given the functions $q_{z,j}$ [...]

3.3.1 Modeling

[...] 2 as follows: [...] for $j = 2, \ldots, J$ and $y = 0, 1$.

Let $\alpha_z$ denote the parameters indexing the first set of models for response and $\gamma_z$ denote the parameters indexing the second set of models for drop-out. Recall that we defined a treatment-specific parameter vector for the conditional distribution of $S$ and [...]

This saturated model avoids the complex interaction term model parameterization. As a result, the (conditional) posterior distributions of the observed-data parameters will have simple forms and efficient posterior sampling is possible even when the sample size is moderate or small.
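Why a probability-scale parameterization gives conditional posteriors of simple form can be illustrated with the standard Beta-binomial update. The sketch below is an assumption-laden illustration rather than the sampler used in this chapter: it takes a single conditional probability with a Beta(a, b) prior and fully observed binomial counts, and the function names and numbers are hypothetical.

```python
import random


def beta_posterior(a, b, successes, trials):
    """Conjugate update: Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return a + successes, b + (trials - successes)


def draw_probability(a, b, successes, trials, rng):
    """Draw one value of the probability from its conditional posterior."""
    post_a, post_b = beta_posterior(a, b, successes, trials)
    return rng.betavariate(post_a, post_b)


if __name__ == "__main__":
    rng = random.Random(2010)
    # Hypothetical cell: 12 depressed out of 90 subjects sharing one outcome history.
    print(beta_posterior(1.0, 1.0, 12, 90))
    print(draw_probability(1.0, 1.0, 12, 90, rng))
```

Each such draw needs no tuning or rejection step, which is one plausible reason a Gibbs sampler built on this parameterization remains workable at small sample sizes.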
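We use the same parameterization of the functions $q_{z,j}$ [...] Chapter 2, Section 2.5 [...]

[...] Chapter 2, the strategy to avoid the curse of dimensionality was to apply shrinkage priors for higher order interactions to reduce the number of parameters [...] (3), we use a different shrinkage strategy. In particular, we propose to use Beta priors for shrinkage as follows: [...] for $j = 2, \ldots, J$ and $y = 0, 1$. For $\alpha_{z,0}$, $\alpha_{z,1,y}$ and $\gamma_{z,0,y}$ for $y = 0, 1$, we assign $\mathrm{Unif}(0, 1)$ priors. Let $m^{(\alpha)}_z$ ($m^{(\gamma)}_z$) and $\eta^{(\alpha)}_z$ ($\eta^{(\gamma)}_z$) denote the parameters $m^{(\alpha)}_{z,j,y}$ ($m^{(\gamma)}_{z,j-1,y}$) and $\eta^{(\alpha)}_{z,j,y}$ ($\eta^{(\gamma)}_{z,j-1,y}$), respectively.

Note that for a random variable $X$ that follows a $\mathrm{Beta}(m/\eta, (1-m)/\eta)$ distribution, we have $E[X] = m$ and $\mathrm{Var}[X] = m(1-m)\,\eta/(\eta+1)$.

We specify independent $\mathrm{Unif}(0, 1)$ priors for $m^{(\alpha)}_{z,j,y}$ and $m^{(\gamma)}_{z,j-1,y}$. For the shrinkage parameters $\eta^{(\alpha)}_{z,j,y}$ and $\eta^{(\gamma)}_{z,j-1,y}$, we specify independent, uniform shrinkage priors (Daniels [...]) as follows:

$\eta^{(\alpha)}_{z,j,y} \sim \dfrac{g(E^{(\alpha)}_{z,j,y})}{\left(g(E^{(\alpha)}_{z,j,y})\,\eta^{(\alpha)}_{z,j,y} + 1\right)^2}$ and $\eta^{(\gamma)}_{z,j-1,y} \sim \dfrac{g(E^{(\gamma)}_{z,j-1,y})}{\left(g(E^{(\gamma)}_{z,j-1,y})\,\eta^{(\gamma)}_{z,j-1,y} + 1\right)^2}$, (3)

where [...] Christiansen and Morris (1997)).

The expected number of subjects with $S \ge j$, $Y_{j-1} = y$, [...] where the probabilities on the right hand side of the above equations are estimable under Assumption 1.

The expected sample sizes above are used in the prior instead of the observed binomial sample sizes, which are not completely determined due to the intermittent missingness. Thus, our formulation of these priors induces a small additional amount of data dependence beyond its standard dependence on the binomial sample sizes. This additional dependence affects the median of the prior but not its diffuseness.

[...] Chapter 2, Section 2.6.2 for constructing priors of the sensitivity parameters given the observed-data parameters.

[...] Chapter 2, posterior computations for the observed data model are much easier and more efficient under the reparameterized model (3) and the Beta shrinkage priors. The posterior sampling algorithms can be implemented in R with no sample size restrictions.

The following steps are used to simulate draws from the posterior of $P[Y_j = 1 \mid Z = z]$:

1. Sample from the joint posterior of the observed-data parameters and $Y_{Imis}$ given $Y_{obs}$, $S$, $R_S$, $Z = z$ using Gibbs sampling with data augmentation (see details in Appendix). Continue sampling until convergence.

2. For each draw of [...] 2.6.2

3.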
Compute $P[Y_j = 1 \mid Z = z]$ by plugging the draws of [...] 2.4

[...] Chapter 2, Section 2.7 to simulate observed data (no intermittent missingness). We again compared the performance of our shrinkage model with (1) a correct parametric model, (2) an incorrect parametric model (first order Markov model) and (3) a saturated model (with diffuse priors). Our shrinkage model uses the shrinkage priors proposed in Section 3.3.2.

We considered small (500), moderate (2000), large (5000) and very large (1,000,000) sample sizes for each treatment arm; for each sample size, we simulated 500 datasets. We assessed model performance using mean squared error (MSE).

In Table 3-2 (sample size 1,000,000 not shown), we report the MSEs of $P[Y_j = 1 \mid S \ge j,$ [...]

In addition, the MSEs for $P[Y_j = 1 \mid Z = z]$ in the shrinkage model compare favorably with those of the true parametric model for all sample sizes considered, despite the fact that the shrinkage priors were specified to shrink toward an incorrect model.

Table 3-1 displays the treatment-specific drop-out and intermittent missing rates in the BCPT. By the 7th study visit (36 months), more than 30% of patients had dropped out in each treatment arm, with a slightly higher percentage in the tamoxifen arm.
[...] (3) to be the maximum function. To compute the expected number of subjects [...] (3), we assigned a point mass prior at 0.5 to all $m^{(\alpha)}_z$, $m^{(\gamma)}_z$, $\eta^{(\alpha)}_z$ and $\eta^{(\gamma)}_z$ (which corresponds to $\mathrm{Unif}(0, 1)$ priors on the observed-data probabilities) [...] 3.3.4. To avoid data sparsity, we calculated $P[S = s,$ [...]

To assess model fit, we compared the empirical rates and posterior means (with 95% credible intervals) of $P[Y_j = 1, S \ge j \mid Z = z]$ and $P[S$ [...]

Figure 3-2 illustrates the effect of shrinkage on the model fit by comparing the difference between the empirical rates and posterior means of $P[Y_j = 1 \mid S \ge j,$ [...] 2.7, we know that the empirical estimates are less reliable for later time points. Via the shrinkage priors, the probabilities $P[Y_j = 1 \mid S \ge j, Y_{j-1} = y_{j-1},$ [...]

Figure 3-3 shows the posterior of $P[Y_7 = 1 \mid Z = z]$, the treatment-specific probability of depression at the end of the 36-month follow-up (solid lines). For comparison, the posterior under MAR (corresponding to point mass priors at zero for the sensitivity parameters) is also presented (dashed lines). The observed depression rates (i.e., complete case analysis) were 0.124 and 0.112 for the placebo and tamoxifen arms, respectively. Under the MNAR analysis (using the elicited priors), the posterior means of the depression rates at month 36 were 0.133 (95% CI: 0.122, 0.144) and 0.125 (95% CI: 0.114, 0.136) for the placebo and tamoxifen arms; the difference was -0.007 (95% CI: -0.023, 0.008). Under MAR, the rates were 0.132 (95% CI: 0.121, 0.143) and 0.122 (95% CI: 0.111, 0.133) for the placebo and tamoxifen arms; the difference was -0.01 (95% CI: -0.025, 0.005).

[...] 2-2), a belief that was captured by the elicited priors. Figure 3-4 shows that there were no significant differences between the two treatments in the depression rates at any measurement time (95% credible intervals all cover zero) under both MNAR and MAR. Similar (non-significant) treatment differences were seen when examining treatment comparisons conditional on depression status at baseline.

The posterior mean and between-treatment difference of the depression rate at month 36 with 95% CI are given in Tables 3-3 and 3-4. None of the scenarios considered resulted in a 95% CI for the difference in rates of depression at 36 months that excluded zero, except for the (extreme) scenario where the elicited tamoxifen intervals were shifted by 0.5 and the elicited placebo intervals were shifted by 0.5.
We also assessed the impact of switching the priors for the placebo and tamoxifen arms; in this case, the posterior means were 0.135 (95% CI: 0.124, 0.146) and 0.123 (95% CI: 0.112, 0.134) for the placebo and tamoxifen arms respectively, while the difference was -0.012 (95% CI: -0.027, 0.004).

[...] Chapter 2 for intermittent missingness. In addition, we reparameterized the saturated observed data model and dramatically improved the computational efficiency.

WinBUGS can still be applied for the reparameterized model when there is no intermittent missing data. However, with the intermittent missingness, the augmentation step in the posterior computation requires extensive programming in WinBUGS. Nevertheless, the approach in Chapter 2 may still be preferred in certain cases, e.g. for directly shrinking the interaction terms.

As an extension, we might consider alternatives to the partial ignorability assumption (Assumption 1), which has been widely used but questioned by some (Robins 1997).

Table 3-1. Missingness by Scheduled Measurement Time

Time Point $j$ (Month) [...]

Table 3-2. Simulation Results: MSE ($\times 10^3$). P and T represent placebo and tamoxifen arms, respectively.
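Observed $j, z$ (Month)

| Model | Treat | Y | R | 1 (3) | 2 (6) | 3 (12) | 4 (18) | 5 (24) | 6 (30) | 7 (36) |
|---|---|---|---|---|---|---|---|---|---|---|
| Shrinkage | P | 29.478 | 2.310 | 0.202 | 0.226 | 0.252 | 0.303 | 0.312 | 0.337 | 0.372 |
| Shrinkage | T | 28.410 | 2.365 | 0.212 | 0.232 | 0.294 | 0.336 | 0.330 | 0.390 | 0.419 |
| Saturated | P | 57.107 | 111.263 | 0.202 | 0.228 | 0.302 | 0.490 | 1.083 | 2.401 | 4.427 |
| Saturated | T | 55.582 | 104.882 | 0.211 | 0.245 | 0.383 | 0.657 | 1.352 | 3.167 | 5.782 |
| Shrinkage | P | 23.545 | 0.647 | 0.053 | 0.056 | 0.063 | 0.078 | 0.082 | 0.084 | 0.095 |
| Shrinkage | T | 22.598 | 0.615 | 0.050 | 0.063 | 0.063 | 0.080 | 0.093 | 0.095 | 0.102 |
| Saturated | P | 40.322 | 77.627 | 0.053 | 0.057 | 0.069 | 0.100 | 0.188 | 0.457 | 0.946 |
| Saturated | T | 38.943 | 72.731 | 0.050 | 0.064 | 0.067 | 0.110 | 0.218 | 0.560 | 1.223 |
| Shrinkage | P | 18.83 | 0.394 | 0.020 | 0.024 | 0.026 | 0.033 | 0.031 | 0.039 | 0.036 |
| Shrinkage | T | 18.055 | 0.322 | 0.024 | 0.024 | 0.028 | 0.036 | 0.034 | 0.040 | 0.041 |
| Saturated | P | 30.071 | 54.454 | 0.020 | 0.024 | 0.027 | 0.038 | 0.052 | 0.13 | 0.270 |
| Saturated | T | 29.156 | 50.590 | 0.024 | 0.024 | 0.029 | 0.039 | 0.059 | 0.148 | 0.373 |

Table 3-3. Sensitivity to the Elicited Prior

Scenario (T: Tamoxifen, P: Placebo) [...]

| Treatment | Percentile | 10% | 25% | 10% | 25% | 10% | 25% | 10% | 25% |
|---|---|---|---|---|---|---|---|---|---|
| Tamoxifen | Minimum | 0.79 | 0.50 | 1.18 | 1.46 | 1.60 | 1.80 | 0.60 | 0.80 |
| Tamoxifen | Median | 1.20 | 1.50 | 1.20 | 1.50 | 1.70 | 2.00 | 0.70 | 1.00 |
| Tamoxifen | Maximum | 1.70 | 2.00 | 1.22 | 1.52 | 1.80 | 2.10 | 0.80 | 1.10 |
| Placebo | Minimum | 0.85 | 0.80 | 1.04 | 1.28 | 1.51 | 1.70 | 0.51 | 0.70 |
| Placebo | Median | 1.05 | 1.30 | 1.05 | 1.30 | 1.55 | 1.80 | 0.55 | 0.80 |
| Placebo | Maximum | 1.30 | 1.80 | 1.06 | 1.32 | 1.60 | 1.90 | 0.60 | 0.90 |

$P[Y_7 = 1]$ (95% CI) for each arm and scenario: [...]

Table 3-4. Sensitivity to the Elicited Prior

Scenario (T: Tamoxifen, P: Placebo) [...]

| Treatment | Percentile | 10% | 25% | 10% | 25% | 10% | 25% | 10% | 25% |
|---|---|---|---|---|---|---|---|---|---|
| Tamoxifen | Minimum | 0.79 | 0.50 | 1.18 | 1.46 | 1.60 | 1.80 | 0.60 | 0.80 |
| Tamoxifen | Median | 1.20 | 1.50 | 1.20 | 1.50 | 1.70 | 2.00 | 0.70 | 1.00 |
| Tamoxifen | Maximum | 1.70 | 2.00 | 1.22 | 1.52 | 1.80 | 2.10 | 0.80 | 1.10 |
| Placebo | Minimum | 1.04 | 1.28 | 0.85 | 0.80 | 0.51 | 0.70 | 1.51 | 1.70 |
| Placebo | Median | 1.05 | 1.30 | 1.05 | 1.30 | 0.55 | 0.80 | 1.55 | 1.80 |
| Placebo | Maximum | 1.06 | 1.32 | 1.30 | 1.80 | 0.60 | 0.90 | 1.60 | 1.90 |

$P[Y_7 = 1]$ (95% CI) for each arm and scenario: [...]

Figure 3-1. Solid and dashed lines represent the empirical rate of $P[Y_j = 1, S \ge j \mid Z = z]$ and $P[S$ [...]

Figure 3-2. (A) The empirical rate and model-based posterior mean of $P[Y_j = 1 \mid S \ge j,$ [...]

Figure 3-3. Posterior distribution of $P[Y_7 = 1 \mid Z = z]$. Black and gray lines represent tamoxifen and placebo arms, respectively. Solid and dashed lines are for MNAR and MAR, respectively.

Figure 3-4. Posterior mean and 95% credible interval of difference of $P[Y_j = 1 \mid Z = z]$ between placebo and tamoxifen arms. The gray and white boxes are for MAR and MNAR, respectively.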
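Gibbs sampler for posterior computation: In the first step of the Gibbs sampler, we draw, for each subject with intermittent missing data, from the full conditional of $Y_{Imis}$ given $\alpha_z$, $\gamma_z$, $m^{(\alpha)}_z$, $\eta^{(\alpha)}_z$, $m^{(\gamma)}_z$, $\eta^{(\gamma)}_z$, $Y_{obs}$, $S$, $R_S$ and $Z = z$. The full conditional distribution can be expressed as

$P[Y_{Imis} = y_{Imis} \mid \alpha_z, \gamma_z, m^{(\alpha)}_z, \eta^{(\alpha)}_z, m^{(\gamma)}_z, \eta^{(\gamma)}_z, Y_{obs} = y_{obs}, S = s, R_s = r_s, Z = z] = P[Y_{Imis} = y_{Imis}, Y_{obs} = y_{obs}, S = s \mid \alpha_z, \gamma_z, m^{(\alpha)}_z, \eta^{(\alpha)}_z, m^{(\gamma)}_z, \eta^{(\gamma)}_z, Z = z]$ [...]

In the second step, we draw from the full conditional of $m^{(\alpha)}_z$ given $\{Y_{Imis}\}$, $\alpha_z$, $\gamma_z$, $\eta^{(\alpha)}_z$, $m^{(\gamma)}_z$, $\eta^{(\gamma)}_z$, $\{Y_{obs}\}$, $\{S\}$, $\{R_S\}$ and $\{Z\} = z$, where the notation $\{D\}$ denotes data $D$ for all the individuals on the study. The full conditional can be expressed as

$\prod_{j=2}^{J} \prod_{y=0}^{1} f(m^{(\alpha)}_{z,j,y} \mid \{Y_{Imis}\}, \alpha_z, \eta^{(\alpha)}_{z,j,y}, \{Y_{obs}\}, \{S\}, \{Z\} = z)$ [...]

In the third step, we draw from the full conditional of $m^{(\gamma)}_z$ given $\{Y_{Imis}\}$, $\alpha_z$, $\gamma_z$, $\eta^{(\alpha)}_z$, $m^{(\alpha)}_z$, $\eta^{(\gamma)}_z$, $\{Y_{obs}\}$, $\{S\}$, $\{R_S\}$ and $\{Z\} = z$. The full conditional can be expressed as

$\prod_{j=2}^{J} \prod_{y=0}^{1} f(m^{(\gamma)}_{z,j-1,y} \mid \{Y_{Imis}\}, \gamma_z, \eta^{(\gamma)}_{z,j-1,y}, \{Y_{obs}\}, \{S\}, \{Z\} = z)$ [...]

[...] the full conditionals of the shrinkage parameters $\eta^{(\alpha)}_{z,j,y}$ and $\eta^{(\gamma)}_{z,j-1,y}$ combine the uniform shrinkage prior with a product, over the subjects $i$ with $Z_i = z$ in the relevant risk set ($S_i \ge j$ or $S_i \ge j - 1$, with $Y_{i,j-1} = y$), of Beta densities [...] (Neal 2003).

Finally, we draw from the full conditional of the observed-data probabilities given $\{Y_{Imis}\}$, $m^{(\alpha)}_z$, $\eta^{(\alpha)}_z$, $m^{(\gamma)}_z$, $\eta^{(\gamma)}_z$, $\{Y_{obs}\}$, $\{S\}$, $\{R_s\}$ and $\{Z\} = z$. The full conditional can be expressed as $\prod_{j=2}^{J} \prod_{y=0}^{1} \prod_{\text{all}}$ [...]

[...] (3), the posterior distributions of $P[Y_j = 1 \mid S \ge j, Y_{j-1},$ [...] (3), for all $Z$, $j$ and [...]

To see this, note that $\pi(p_j \mid Y, N) \propto p_j^{Y_j} (1 - p_j)^{n_j - Y_j} \int \pi(p_j \mid m, \eta)\, \pi(m, \eta)\, dm\, d\eta$. [...]

[...] Daniels and Hogan 2008; Hogan and Laird 1997b; Kenward and Molenberghs 1999; Little 1995; Molenberghs and Kenward 2007). In this paper, we will concern ourselves with pattern mixture models with monotone missingness (i.e., drop-out). For pattern mixture models with non-monotone (i.e., intermittent) missingness (details go beyond the scope of this paper), one approach is to partition the missing data and allow one (or more) of the partitions to be ignored given the other partition(s) (Harel and Schafer 2009; Wang et al. 2010).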
It is well known that pattern-mixture models are not identified: the observed data do not provide enough information to identify the distributions for incomplete patterns. The use of identifying restrictions that equate the inestimable parameters to functions of estimable parameters is an approach to resolve the problem (Daniels and Hogan 2008; Kenward et al. 2003; Little 1995; Little and Wang 1996; Thijs et al. 2002). Common identifying restrictions include complete case missing value (CCMV) constraints and available case missing value (ACMV) constraints. Molenberghs et al. (1998) proved that for discrete time points and monotone missingness, the ACMV constraint is equivalent to missing at random (MAR), as defined by Rubin (1976) and Little and Rubin (1987). A key and attractive feature of identifying restrictions is that they do not affect the fit of the model to the observed data. Understanding (identifying) restrictions that lead to MAR is an important first step for sensitivity analysis under missing not at random (MNAR) (Daniels and Hogan 2008; Scharfstein et al. 2003; Zhang and Heitjan 2006).

[...] Daniels and Hogan 2008; Scharfstein et al. 1999; Vansteelandt et al. 2006b).

The normality of response data (if appropriate) for pattern mixture models is desirable as it easily allows incorporation of baseline covariates and introduction of sensitivity parameters (for MNAR analysis) that have convenient interpretations as deviations of means and variances from MAR (Daniels and Hogan 2008). However, multivariate normality within patterns can be overly restrictive. We explore such issues in this paper.

One criticism of mixture models is that they often induce missing data mechanisms that depend on the future (Kenward et al. 2003). We explore such non-future dependence in our context here and show how mixture models that have such missing data mechanisms have fewer sensitivity parameters.
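How an identifying restriction fills in the unidentified part of a pattern-mixture model can be sketched in the simplest case of two visits. The sketch assumes, purely for illustration, a linear conditional mean of $Y_2$ given $Y_1$ among completers; with two visits the ACMV (MAR) and CCMV restrictions both borrow that conditional mean for the drop-outs. The shift `delta` is a hypothetical mean-shift sensitivity parameter, with `delta = 0` giving the MAR restriction; all names and numbers are invented for the example.

```python
def marginal_mean_y2(prop_dropout, mean_y1_dropout, mean_y2_completers,
                     intercept_completers, slope_completers, delta=0.0):
    """Marginal mean of Y2 in a two-visit pattern-mixture model.

    The conditional mean of Y2 given Y1 for drop-outs is not identified; it is set to
    the completers' regression line plus a shift delta (delta = 0 is the MAR restriction).
    """
    mean_y2_dropout = intercept_completers + slope_completers * mean_y1_dropout + delta
    return (1.0 - prop_dropout) * mean_y2_completers + prop_dropout * mean_y2_dropout


if __name__ == "__main__":
    # Hypothetical inputs: completers have E[Y1] = 70 and E[Y2] = 80 with slope 0.8,
    # so the intercept is 80 - 0.8 * 70 = 24; drop-outs (25%) have E[Y1] = 60.
    print(marginal_mean_y2(0.25, 60.0, 80.0, 24.0, 0.8))              # MAR
    print(marginal_mean_y2(0.25, 60.0, 80.0, 24.0, 0.8, delta=-8.0))  # MNAR shift
```

Neither choice of `delta` changes anything about the observed data, which is the sense in which identifying restrictions do not affect the fit of the model.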
In Section 4.2, we show conditions under which MAR exists and does not exist when the full-data response is assumed multivariate normal within each missing pattern. In Section 4.3 and Section 4.4, in the same setting, we explore sensitivity analysis strategies under MNAR and under non-future dependent MNAR, respectively. In Section 4.5, we propose a sensitivity analysis approach where only the observed data within pattern are assumed multivariate normal. In Section 4.6, we apply the frameworks described in previous sections to a randomized clinical trial for estimating the effectiveness of recombinant growth hormone for increasing muscle strength in the elderly. In Section 4.7, we show that in the presence of baseline covariates with time-invariant coefficients, standard identifying restrictions cause over-identification of the baseline covariate effects and we propose a remedy. We provide conclusions in Section 4.8.

[...] We show that MAR does not necessarily exist when it is assumed that [...] for all $s$.

To see this, we introduce some further notation. Let $\mu^{(s)}(j) = E($ [...] (4), define

$\kappa^{(s)}_1(j) = \Sigma^{(s)}_{21}(j)\, \Sigma^{(s)}_{11}(j)^{-1}$

$\kappa^{(s)}_2(j) = \mu^{(s)}_2(j) - \kappa^{(s)}_1(j)\, \mu^{(s)}_1(j)$

$\kappa^{(s)}_3(j) = \Sigma^{(s)}_{22}(j) - \Sigma^{(s)}_{21}(j)\, \Sigma^{(s)}_{11}(j)^{-1}\, \Sigma^{(s)}_{12}(j)$.

[...]

Proof. [...] 4.1 is satisfied, then there exists a conditional distribution $p_s(y_j \mid$ [...] (4) for MAR to exist.

[...] (4), identification via MAR constraints exists if and only if $\mu^{(s)}$ and $\Sigma^{(s)}$ satisfy Lemma 4.1 for $s \ge j$ and $1$ [...] 4.2 are imposed).

We now examine the corresponding missing data mechanism (MDM), $S \mid Y$. We use $\simeq$ to denote equality in distribution.

[...] (4) with monotone dropout, MAR holds if and only if $S \mid Y \simeq S \mid Y_1$.

Proof. [...] On the other hand, MAR implies that $p(S = s \mid Y) = p(S = s \mid Y_{obs}) = p(S = s \mid$ [...] 4.2, we have that MAR holds only if for all $1$ [...]

[...] (4) with monotone dropout, MCAR is equivalent to MAR if $p_s(y_1) = p(y_1)$ for all $s$.

Proof. [...] 4.3, we showed that MAR holds if $p(S = s \mid Y) = p_s(y_1)$ [...]

[...] (4) with monotone dropout, MAR constraints are identical to complete case missing value (CCMV) and nearest-neighbor constraints (NCMV).

Proof. [...] 4.2, the MAR constraints imply $p_j(y_j \mid$ [...] (4) and demonstrate that MAR only exists under the fairly strict conditions given in Theorem 1.
[...] 4.2, we propose to follow the approach in Daniels and Hogan (2008, Chapter 8) and specify distributions of observed $Y$ within pattern as: [...] where $\bar{j} = \{1, 2, \ldots, j-1\}$. Note that by construction, we assume $p_s(y_j \mid$ [...]

[...] (4) with monotone dropout, identification via MAR constraints exists if and only if the observed data can be modeled as (4).

Proof. [...] 4.2 shows that identification via MAR constraints exists if and only if conditional distributions $p_s(y_j \mid$ [...] 4.6 implies that under the multivariate normality assumption in (4) and the MAR assumption, a sequential specification as in (4) always exists.

We provide some details for MAR in model (4) (which implies the specification in (4) as stated in Corollary 4.6) next. Distributions for missing data (which are not [...]

The motivation of the proposed sequential model is to allow a straightforward extension of the MAR specification to a large class of MNAR models indexed by parameters measuring departures from MAR, as well as the attraction of doing sensitivity analysis on means and/or variances in normal models.

For example, we can let $\beta^{*(j)}_l = \Delta^{(j)}_l + \beta^{(j)}_l$ and $\log \sigma^{*(j)}_{j \mid \bar{j}} = \Delta^{(j)}_\sigma + \log \sigma^{(j)}_{j \mid \bar{j}}$ [...] Daniels and Hogan 2008). Indeed, we could make $\Delta^{(j)}_l$ and $\Delta^{(j)}_\sigma$ independent of $j$ to further reduce the number of sensitivity parameters.

To see the impact of the sensitivity parameters on the MDM, we introduce the notation $\Delta^{(j)}_{j \mid \bar{j}} = \Delta^{(j)}_0 + \sum_{l=1}^{j-1} \Delta^{(j)}_l Y_l$ and then for $k$ [...] Kenward et al. 2003).

Kenward et al. (2003) showed that non-future dependence holds if and only if for each $j \ge 3$ and $k$ [...] 4.3).

For example, we may specify distributions $Y_{obs} \mid S$ as follows: $p_s(y_1) \sim N(\mu^{(s)}_1, \sigma^{(s)}_1)$ for $1 \le s \le J$, and $p_s(y_j \mid$ [...] for $s$ [...]

Under an MAR assumption (4), for $[Y_j \mid$ [...] (4), the two sensitivity parameters control the departure of the mean and variance from MAR in the following way:

$\mu^{(s),MNAR}_{j \mid \bar{j}} = \Delta^{(j)}_\mu + \mu^{(s),MAR}_{j \mid \bar{j}}$ and $\sigma^{(s),MNAR}_{j \mid \bar{j}} = e^{2\Delta^{(j)}_\sigma}\, \sigma^{(s),MAR}_{j \mid \bar{j}} + (1 - e^{2\Delta^{(j)}_\sigma})\, M$, [...]

By assuming non-future dependence, we obtain $p_s(y_j \mid$ [...] (4) for the current data ($j = s + 1$). The number of sensitivity parameters in this setup is reduced from $J(J-1)$ to $(J-2)(J-1)$; so, for $J = 3$ (6), from 6 (30) to 2 (20). Further reductions are illustrated in the next section.
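The two counts quoted above, $J(J-1)$ without and $(J-2)(J-1)$ with non-future dependence, are easy to tabulate. The helper below is a trivial sketch with a hypothetical name; it only evaluates the two expressions stated in the text.

```python
def sensitivity_parameter_counts(J):
    """Return (count without, count with) non-future dependence for J visits."""
    without_nfd = J * (J - 1)
    with_nfd = (J - 2) * (J - 1)
    return without_nfd, with_nfd


if __name__ == "__main__":
    for J in (3, 6):
        print(J, sensitivity_parameter_counts(J))
```

The saving is always $2(J-1)$ parameters, so the relative reduction is largest for short studies such as the three-visit trial analyzed next.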
[...] 4.4 and 4.5 that assume multivariate normality for the full-data response within pattern (MVN) or multivariate normality for the observed data response within pattern (OMVN). We assume non-future dependence for the missing data mechanism to minimize the number of sensitivity parameters.

The growth hormone (GH) trial was a randomized clinical trial conducted to estimate the effectiveness of recombinant human growth hormone therapy for increasing muscle strength in the elderly. The trial had four treatment arms: placebo (P), growth hormone only (G), exercise plus placebo (EP), and exercise plus growth hormone (EG). Muscle strength, here mean quadriceps strength (QS), measured as the maximum foot-pounds of torque that can be exerted against resistance provided by a mechanical device, was measured at baseline, 6 months and 12 months. There were 161 participants enrolled on this study, but only (roughly) 75% of them completed the 12 month follow-up. Researchers believed that dropout was related to the unobserved strength measures at the dropout times.

For illustration, we confine our attention to the two arms using exercise: exercise plus placebo (EP) and exercise plus growth hormone (EG). Table 4-1 contains the observed data.

Let $(Y_1, Y_2, Y_3)$ denote the full-data response corresponding to baseline, 6 months, and 12 months. Let $Z$ be the treatment indicator (1 = EG, 0 = EP). Our goal is to draw inference about the mean difference of QS between the two treatment arms at month [...]

This reduces the set of sensitivity parameters to $\{\Delta^{(2)}_0, \Delta^{(3)}_0\}$ for the MVN model and $\{\Delta^{(2)}_\mu, \Delta^{(3)}_\mu\}$ for the OMVN model.

There are a variety of ways to specify priors for the sensitivity parameters $\Delta^{(2)}_0$ and $\Delta^{(3)}_0$,

$\Delta^{(2)}_0 = E(Y_2 \mid Y_1, S = 1) - E(Y_2 \mid Y_1, S \ge 2)$

$\Delta^{(3)}_0 = E(Y_3 \mid Y_2, Y_1, S = 2) - E(Y_3 \mid Y_2, Y_1, S = 3)$.
Based on discussion with investigators, we made the assumption that dropouts do worse than completers; thus, we restrict the sensitivity parameters to be less than zero. To do a fully Bayesian analysis to fairly characterize the uncertainty associated with the missing data mechanism, we assume a uniform prior for the sensitivity parameters as a default choice. Subject matter considerations gave an upper bound of zero for the uniform distributions. We set

We fit the model using WinBUGS, with multiple chains of 25,000 iterations and 4000 burn-in. Convergence was checked by examining trace plots of the multiple chains.

The results of the MVN model, OMVN model, and the observed data analysis are presented in Table 4-2 . Under MNAR, the posterior mean (posterior standard deviation) of the difference in quadriceps strength at 12 months between the two treatment arms was 4.0 (8.9) and 4.4 (10) for the MVN and OMVN models, respectively. Under MAR the differences were 5.4 (8.8) and 5.8 (9.9) for the MVN and OMVN models, respectively. The smaller differences under MNAR arise because quadriceps strength at 12 months is lower under MNAR, a consequence of the assumption that dropouts do worse than completers. We conclude that the treatment difference was not significantly different from zero.

For ( 4 ) to hold for all Y1 and X, we need that (1) = (2). Little and Wang 1996 ; Wang and Daniels 2009 ).

To resolve the over-identification issue, we propose to apply MAR constraints on residuals instead of directly on the responses. In the bivariate case, the corresponding restriction is

4 ) places no constraints on (s), thus avoiding over-identification.

The MDM corresponding to the ACMV (MAR) on the residuals is given by log P(S = 1 | Y, X) 1(X)1 2((1B)2X((2)(2)T(1)(1)T)XT2(1B)(Y2(Y1))X((2)(1)))1 2log(2)11

4 ) implies MAR if and only if (2) = (1). So in general, MAR on residuals does not imply that missingness in Y2 is MAR. However, it is an identifying restriction that does not affect the fit of the model to the observed data. CCMV and NCMV restrictions can be applied similarly to the residuals.
For the missing data, the conditional distributions are specified as ps(yjj

The conditional mean structures in ( 4 ) and ( 4 ) induce the following form for the marginal mean response E(Yj | S = s) = U(s)j + X(s),

Note that since Y1 is always observed, (s) (1 ≤ s ≤ J) are identified by the observed data. However, in the model given by ( 4 ) and ( 4 ), there is a two-fold over-identification of (s) under MAR:

1. For MAR constraints to exist under the model given in ( 4 ), (s)jjj as defined in ( 4 ) must be equal for 2 ≤ j ≤ s ≤ J and for all X. This requires that (s) = for 2 ≤ j ≤ s ≤ J.

2. MAR constraints also imply that (s)jjj as defined in ( 4 ) must be equal to (j)jjj for 1s

With the conditional mean structures specified as ( 4 ) and ( 4 ), the MAR on the residuals restriction places no assumptions on (s).

The corresponding MDM is log P(S = s | Y, X)

However, a simple pattern mixture model based on multivariate normality for the full data response within patterns does not allow MAR without special restrictions that themselves induce a very restrictive missing data mechanism. We explored this

In addition, we showed that when introducing baseline covariates with time invariant coefficients, standard identifying restrictions result in over-identification of the model. This is against the principle of applying identifying restrictions, namely that they should not affect the model fit to the observed data. We proposed a simple alternative set of restrictions based on residuals that can be used as an 'identification' starting point for an analysis using mixture models.

In the growth hormone study data example, we showed how to reduce the number of sensitivity parameters in practice and a default way to construct informative priors for sensitivity parameters based on limited knowledge about the missingness. In particular, all the values in the range, D, were weighted equally via a uniform distribution. If there is additional external information from expert opinion or historical data, informative priors may be used to incorporate such information (for example, see Ibrahim and Chen 2000 ; Wang et al. 2010 ). Finally, an important consideration in sensitivity analysis and constructing informative priors is that they should avoid extrapolating missing values outside of a reasonable range (e.g., negative quadriceps strength).
Table 4-1. Growth Hormone Study: Sample mean (standard deviation) stratified by dropout pattern.

| Treatment | Dropout Pattern | Number of Participants | Month 0 | Month 6 | Month 12 |
|---|---|---|---|---|---|
| EG | 1 | 12 | 58 (26) | | |
| EG | 2 | 4 | 57 (15) | 68 (26) | |
| EG | 3 | 22 | 78 (24) | 90 (32) | 88 (32) |
| EG | All | 38 | 69 (25) | 87 (32) | 88 (32) |
| EP | 1 | 7 | 65 (32) | | |
| EP | 2 | 2 | 87 (52) | 86 (51) | |
| EP | 3 | 31 | 65 (24) | 81 (25) | 73 (21) |
| EP | All | 40 | 66 (26) | 82 (26) | 73 (21) |

Table 4-2. Growth Hormone Study: Posterior mean (standard deviation).

| Treatment | Month | Observed Data | MAR Analysis: MVN | MAR Analysis: OMVN | MNAR Analysis: MVN | MNAR Analysis: OMVN |
|---|---|---|---|---|---|---|
| EP | 0 | 66 (9.9) | 66 (6.0) | 66 (6.0) | 66 (6.0) | 66 (6.0) |
| EP | 6 | 82 (18) | 82 (5.9) | 81 (8.2) | 80 (6.0) | 80 (8.3) |
| EP | 12 | 72 (3.8) | 73 (4.9) | 73 (6.1) | 72 (5.0) | 71 (6.1) |
| EG | 0 | 69 (7.3) | 69 (4.9) | 69 (4.9) | 69 (4.9) | 69 (4.9) |
| EG | 6 | 87 (16) | 81 (6.8) | 82 (7.7) | 78 (7.1) | 79 (8.0) |
| EG | 12 | 88 (6.8) | 78 (7.2) | 79 (7.8) | 76 (7.5) | 76 (8.0) |
| Difference at 12 mos. | | 12 (7.8) | 5.4 (8.8) | 5.8 (9.9) | 4.0 (8.9) | 4.4 (10) |

Missing data mechanism under missing not at random and multivariate normality: The MDM in Section 4.3 is derived as follows: log P(S = s | Y) 2exp((Y1(k)1)2 2exp((Yl(l)ljl(l)ljl)2

4.5 are derived as follows: (s),MNARjjj = E(Yjj

4.6 ): We specify a pattern mixture model with sensitivity parameters for the two treatment arms. For compactness, we suppress subscript treatment indicator z from all the parameters in the following models.

For missing pattern S, we specify S ~ Mult()

4.7.2 ), the MAR on the residuals restriction puts no constraints on (s).

Let [ZjjS]'YjX(s). The MAR on the residuals constraints are pk(zjj 2(s)ljlyl(l)lX(s)Pj1t=1(j)t(yt(l)tX(s))2 q 2(s)ljlzl(l)lPj1t=1(j)t(zt(l)t2 q

4 ) thus imply (s)j,l=(j)j,l (s)j=(j)j+j1Xl=1(j)j,l((s)l(j)l) (j)jjj=(j)jjj,

Farnir et al. 2002 ; Liu et al. 2006 ). In such analysis, researchers are often interested in simultaneously estimating linkage that describes the tendency of certain alleles to be inherited together, and linkage disequilibrium (LD) that measures the non-random association between different markers. However, if the LD is close to zero, the linkage recombination fraction is hard to estimate (or not estimable at all). We use a two marker scenario to illustrate this dilemma.

where D is the LD parameter. By simple algebra, we can show that D = p11 p00 − p10 p01.
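The "simple algebra" behind D = p11 p00 − p10 p01 can be checked numerically. The sketch below assumes the usual two-marker definition of linkage disequilibrium, D = p11 − (p11 + p10)(p11 + p01), which does not survive in the text above; the function names and the example haplotype frequencies are hypothetical.

```python
import math


def ld_from_definition(p11: float, p10: float, p01: float, p00: float) -> float:
    """D as haplotype frequency minus the product of marginal allele frequencies.

    This is the standard definition, assumed here; p00 is unused but kept so
    both functions share a signature.
    """
    p_first = p11 + p10   # frequency of allele 1 at the first marker
    p_second = p11 + p01  # frequency of allele 1 at the second marker
    return p11 - p_first * p_second


def ld_from_product(p11: float, p10: float, p01: float, p00: float) -> float:
    """D in the product form stated in the text: p11*p00 - p10*p01."""
    return p11 * p00 - p10 * p01


# Hypothetical haplotype frequencies; they must sum to one.
freqs = (0.4, 0.1, 0.2, 0.3)
assert math.isclose(sum(freqs), 1.0)

d_def = ld_from_definition(*freqs)
d_prod = ld_from_product(*freqs)
print(math.isclose(d_def, d_prod, abs_tol=1e-12))  # the two forms agree

# Under linkage equilibrium the haplotype frequencies factorize and D vanishes.
equilibrium = (0.5 * 0.6, 0.5 * 0.4, 0.5 * 0.6, 0.5 * 0.4)
print(math.isclose(ld_from_product(*equilibrium), 0.0, abs_tol=1e-12))
```

The agreement relies on the four frequencies summing to one; the second check corresponds to the case described in the text where LD is zero.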
One possible solution is to incorporate more markers in the linkage analysis. By doing this, the number of parents with no less than two heterozygous markers increases. Consequently, more offspring contribute to the likelihood for estimating the linkage recombination fraction. However, the number of haplotype frequencies to be estimated also increases (exponentially) as the number of markers increases. Bayesian shrinkage methods can be applied to address this problem.

5.3.1 Causal Inference Introduction

Rubin 1974 ). However, in non-randomized trials or in the presence of missing data, these methods are limited if the research interest demands estimation of causally interpretable effects.

To define causal effects, we first introduce the concept of potential outcomes, a term sometimes used interchangeably with counterfactual (but not always, see Rubin 2000 ). The use of the term potential outcome can be traced at least to Neyman ( 1923 ). Neyman used potential yields Uik to indicate the yield of a plot k if exposed to a variety i. Rubin ( 1974 ) defines the causal effect of one treatment, E, over another, C, for a particular unit as the difference between what would have happened if the unit had been exposed to E, namely Y(E), and what would have happened if the unit had been exposed to C, namely Y(C).

Using potential outcomes, Frangakis and Rubin ( 2002 ) introduce a framework for comparing treatment effects based on principal stratification, which is a cross-classification of units defined by their potential outcomes with respect to (post) treatment variables, such as treatment noncompliance or drop-out. The treatment comparison adjustment for post treatment variables is necessary because such variables encode the characteristics of both the treatment and the patient. For example, a patient with diagnosed cancer in a cancer prevention trial may have depression caused by the treatment or by the diagnosis

A stratum A is defined by the joint potential response S(Z) with respect to the post treatment variable Z (e.g., Z = 0, 1). For example, let S(Z) be the potential survival status and let S(Z) = 1 and 0 denote alive and dead respectively. Then the stratum A = {S(0) = 1, S(1) = 1} defines the patients who will (potentially) survive on both arms.
A stratum is unaffected by treatment. That is, for subject i, i ∈ A or i ∉ A does not depend on the actual treatment i is assigned. Consequently, the treatment effect defined as the difference between {Yi(0) | i ∈ A} and {Yi(1) | i ∈ A}

On the contrary, a standard adjustment for post treatment variables uses the treatment comparison between {Yi(0) | Si(0) = s} and {Yi(1) | Si(1) = s}.

Consistent with Frangakis and Rubin's framework, Rubin ( 2000 ) introduced the concept of survivors average causal effect (SACE), that is, the causal effect of treatment on endpoints that are defined only for survivors, i.e. the group of patients who would live regardless of their treatment assignment.

Within the principal strata framework, the identification of SACE or other principal stratum causal effects usually depends on untestable assumptions. To address the

Zhang and Rubin ( 2003 ) derived large sample bounds for causal effects without assumptions and with assumptions such as monotonicity on death rate on different treatment arms. Gilbert et al. ( 2003 ) used a class of logistic selection bias models to identify the causal estimands and carried out sensitivity analysis for the magnitude of selection bias. Hayden et al. ( 2005 ) assumed explainable nonrandom noncompliance ( Robins 1998 ) and outlined a sensitivity analysis for exploring the robustness of the assumption. Cheng and Small ( 2006 ) derived sharp bounds for the causal effects and constructed confidence intervals to cover the identification region. Egleston et al. ( 2007 ) proposed a similar method to Zhang and Rubin ( 2003 ), but instead of identifying the full joint distribution of potential outcomes, they only identify features of the joint distribution that are necessary for identifying the SACE estimand. Lee et al. ( 2010 ) replaced the common deterministic monotonicity assumption by a stochastic one that allows incorporation of subject specific effects and generalized the assumptions to more complex trials.
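The distinction drawn in this section between a principal-stratum comparison such as the SACE and a standard comparison that conditions on survival under each arm can be illustrated with a toy example. The sketch below is not taken from the dissertation: the `Unit` class, the five hypothetical patients and the function names are assumptions made purely for illustration, and both potential outcomes are written down for every patient even though only one of them would ever be observed in a real trial.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Unit:
    s0: int               # potential survival under control, S(0)
    s1: int               # potential survival under treatment, S(1)
    y0: Optional[float]   # potential outcome Y(0); undefined (None) if s0 == 0
    y1: Optional[float]   # potential outcome Y(1); undefined (None) if s1 == 0


# Hypothetical population covering all four principal strata.
units: List[Unit] = [
    Unit(1, 1, 10.0, 12.0),   # survives on both arms
    Unit(1, 1, 8.0, 10.0),    # survives on both arms
    Unit(0, 1, None, 4.0),    # survives only if treated
    Unit(1, 0, 6.0, None),    # survives only if not treated
    Unit(0, 0, None, None),   # survives on neither arm
]


def sace(population: List[Unit]) -> float:
    """Average of Y(1) - Y(0) within the stratum {S(0) = 1, S(1) = 1}."""
    stratum = [u for u in population if u.s0 == 1 and u.s1 == 1]
    return mean(u.y1 - u.y0 for u in stratum)


def survivor_comparison(population: List[Unit]) -> float:
    """Compare {Y(1) | S(1) = 1} with {Y(0) | S(0) = 1}: two different groups."""
    treated = [u.y1 for u in population if u.s1 == 1]
    control = [u.y0 for u in population if u.s0 == 1]
    return mean(treated) - mean(control)


print(sace(units))                 # effect among patients who survive on both arms
print(survivor_comparison(units))  # mixes patients from different strata
```

In this toy population the two quantities differ because the survivors under treatment and the survivors under control are not the same patients, which is the motivation for defining the effect within a stratum that is unaffected by treatment.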
We consider a controlled randomized clinical study with treatment arm (Z = 1) and control arm (Z = 0). A longitudinal binary outcome Y is scheduled to be measured at visits j = 1, ..., J, i.e. Y = (Y1, ..., YJ) is a J-dimensional vector. Let R = (R1, ..., RJ) be the missing indicator vector with Rj = 1 if Yj is observed and Rj = 0 if Yj is missing. We assume the missingness is monotone.

We assume there are multiple events that will cause dropout for a patient on this trial, and categorize the events as non-response events (e.g. death) and missing events (e.g. withdrawal of consent). We assume that non-response events may happen after the

Let C denote the survival time for a patient. That is, C = c implies that a non-response event happened to the patient between visit c and c + 1 and caused the patient to drop out on and after visit c + 1. Let Rc = {R1, ..., RC} be the missing data indicator recorded prior to patient drop-out that is caused by a non-response event.

We use

The full data F of a patient thus consists of {Z, C(0),

for all j. Note that the group of patients of interest {C(0) ≥ j, C(1) ≥ j} form a principal stratum.

3 , Section 3.2 , that Rc?

3 that j,z,c is identified by the observed data under this partial missing at random assumption.

5 ) is not identifiable from the observed data O = {Z, C(Z), Rc(Z), Yobs(Z)}.
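The missing indicator vector R and the monotonicity assumption introduced earlier in this section can be made concrete with a small sketch. The helper names below are hypothetical, and missing binary responses are represented by `None` purely for illustration; this is not code from the dissertation.

```python
from typing import List, Optional, Sequence


def missing_indicator(y: Sequence[Optional[int]]) -> List[int]:
    """R_j = 1 if Y_j is observed and R_j = 0 if Y_j is missing (None)."""
    return [0 if value is None else 1 for value in y]


def is_monotone(r: Sequence[int]) -> bool:
    """Monotone missingness: once a visit is missed, all later visits are missed."""
    return all(r[k] <= r[k - 1] for k in range(1, len(r)))


def last_observed_visit(r: Sequence[int]) -> int:
    """Number of observed visits; equals the last observed visit if r is monotone."""
    if not is_monotone(r):
        raise ValueError("pattern is not monotone")
    return sum(r)


dropout = [1, 0, None, None]       # observed at visits 1-2, then dropped out
intermittent = [1, None, 0, None]  # visit 2 skipped, visit 3 observed

print(missing_indicator(dropout), is_monotone(missing_indicator(dropout)))
# prints: [1, 1, 0, 0] True
print(missing_indicator(intermittent), is_monotone(missing_indicator(intermittent)))
# prints: [1, 0, 1, 0] False
```

Under monotone missingness the whole vector R is summarized by a single dropout time, which is why the pattern can be indexed by one integer.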
I

Let Z = (Z1, ..., ZN) be the vector of treatment assignment for all the patients. SUTVA means Zi = Z′i ⇒ (Yi(Zi), Ci(Zi)) = (Yi(Z′i), Ci(Z′i)),

II

III

This assumption provides an ordering of the mean potential response at visit j under treatment z for all the principal cohorts of individuals who would be on study at visit j under treatment z. The means are assumed to not be worse for cohorts who remain on-study longer under both treatments. That is, the individuals who would be last seen at time c′ (c′ ≥ j) under treatment z and time t′ under treatment 1 − z will not have a worse mean potential response at time j under treatment z than individuals

The mean monotonicity assumption is often reasonable in clinical studies. For example, in a cardiovascular stent implantation trial, multiple endpoints including all-cause mortality free survival and 6-minute walk test score are used to evaluate the effectiveness of the device. Since the two endpoints are positively correlated, it is plausible to assume that patients will potentially perform better with their 6-minute walk tests if they have a longer survival time, i.e. remain on the study longer.

We introduce some further notation

1.
2.
3.
4.

Note that under Assumption II (randomization), both z,c = P(C(z) = c) = P(C = c | Z = z) and j,z,c = E[Yj(z) | C(z) = c] = E[Yj | C = c, Z = z] are identified by the observed data under the partial missing at random assumption.

The causal effect of interest SACEj can be expressed as

The boundaries of SACEj in ( 5 ) can be found subject to the following restrictions:

1.
2.
3.
4.

III

Finding the boundaries of the SACE, i.e. finding the minimum and the maximum of the objective function ( 5 ), can be approximated (by ignoring the normalizing constant) as a non-convex quadratically constrained quadratic problem (QCQP) ( Boyd and Vandenberghe 1997 , 2004 ). For a QCQP, a standard approach is to optimize a semidefinite relaxation of the QCQP and get lower and upper bounds on local optima of the objective function ( Boyd and Vandenberghe 1997 ).
The uncertainty of the estimated bounds can be characterized in a Bayesian framework. The joint posterior distribution of the bounds can be constructed by implementing the optimization for each posterior sample of j,z,c, identified by the algorithm proposed in Section 5.3.3 . The result can be presented as in Figure 5-1 . A study decision might be based on the mode of the posterior joint distribution of the bounds.

5-2 ). These assumptions, when reasonable, will simplify the optimization of the objective function and yield more precise results.

1. That is, given a patient will survive until time point c on the treatment arm, the probability the patient will survive until time point n1 is q times the probability that the patient will survive until n for nc on the placebo arm. The parameter q is a sensitivity parameter.

2.

5-2 is zero.

These assumptions may be incorporated in the optimization Bayesian framework to improve the precision of the posterior joint distribution of the bounds.

Figure 5-1. Contour and Perspective Plots of a Bivariate Density

Figure 5-2. Illustration of pc,t

Albert, P. (2000). A Transitional Model for Longitudinal Binary Data Subject to Nonignorable Missing Data. Biometrics 56, 602.

Albert, P., Follmann, D., Wang, S., and Suh, E. (2002). A latent autoregressive model for longitudinal binary data subject to informative missingness. Biometrics 58, 631.

Baker, S. (1995). Marginal regression for repeated binary data with outcome subject to non-ignorable non-response. Biometrics 51, 1042.

Baker, S., Rosenberger, W., and DerSimonian, R. (1992). Closed-form estimates for missing counts in two-way contingency tables. Statistics in Medicine 11, 643.

Birmingham, J. and Fitzmaurice, G. (2002). A Pattern-Mixture Model for Longitudinal Binary Responses with Nonignorable Nonresponse. Biometrics 58, 989.

Boyd, S. and Vandenberghe, L. (1997). Semidefinite programming relaxations of non-convex problems in control and combinatorial optimization. Communications, Computation, Control and Signal Processing: A Tribute to Thomas Kailath.

Boyd, S. and Vandenberghe, L. (2004). Convex optimization. Cambridge Univ Pr.
Cheng, J. and Small, D. (2006). Bounds on causal effects in three-arm trials with non-compliance. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68, 815.

Christiansen, C. and Morris, C. (1997). Hierarchical Poisson regression modeling. Journal of the American Statistical Association pages 618.

Daniels, M. (1999). A prior for the variance in hierarchical models. Canadian Journal of Statistics 27,.

Daniels, M. and Hogan, J. (2000). Reparameterizing the Pattern Mixture Model for Sensitivity Analyses Under Informative Dropout. Biometrics 56, 1241.

Daniels, M. and Hogan, J. (2008). Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. Chapman & Hall/CRC.

DeGruttola, V. and Tu, X. (1994). Modelling Progression of CD4-lymphocyte Count and its Relationship to Survival Time. Biometrics 50, 1003.

Diggle, P. and Kenward, M. (1994). Informative Drop-out in Longitudinal Data Analysis. Applied Statistics 43, 49.

Egleston, B. L., Scharfstein, D. O., Freeman, E. E., and West, S. K. (2007). Causal inference for non-mortality outcomes in the presence of death. Biostatistics 8, 526–545.

Fan, J. and Li, R. (2001). Variable Selection Via Nonconcave Penalized Likelihood and Its Oracle Properties. Journal of the American Statistical Association 96, 1348.

Farnir, F., Grisart, B., Coppieters, W., Riquet, J., Berzi, P., Cambisano, N., Karim, L., Mni, M., Moisio, S., Simon, P., et al. (2002). Simultaneous mining of linkage and linkage disequilibrium to fine map quantitative trait loci in outbred half-sib pedigrees: revisiting the location of a quantitative trait locus with major effect on milk production on bovine chromosome 14. Genetics 161, 275.

Faucett, C. and Thomas, D. (1996). Simultaneously modelling censored survival data and repeatedly measured covariates: a Gibbs sampling approach. Statistics in Medicine 15,.
Fisher, B., Costantino, J., Wickerham, D., Redmond, C., Kavanah, M., Cronin, W., Vogel, V., Robidoux, A., Dimitrov, N., Atkins, J., Daly, M., Wieand, S., Tan-Chiu, E., Ford, L., Wolmark, N., other National Surgical Adjuvant Breast, and Investigators, B. P. (1998). Tamoxifen for prevention of breast cancer: report of the National Surgical Adjuvant Breast and Bowel Project P-1 study. Journal of the National Cancer Institute 90, 1371.

Fitzmaurice, G. and Laird, N. (2000a). Generalized linear mixture models for handling nonignorable dropouts in longitudinal studies. Biostatistics 1, 141.

Fitzmaurice, G. and Laird, N. (2000b). Generalized Linear Mixture Models for Handling Nonignorable Dropouts in Longitudinal Studies. Biostatistics 1, 141.

Fitzmaurice, G., Molenberghs, G., and Lipsitz, S. (1995). Regression Models for Longitudinal Binary Responses with Informative Drop-Outs. Journal of the Royal Statistical Society. Series B. Methodological 57, 691.

Follmann, D. and Wu, M. (1995). An approximate generalized linear model with random effects for informative missing data. Biometrics pages 151.

Forster, J. and Smith, P. (1998). Model-Based Inference for Categorical Survey Data Subject to Non-Ignorable Non-Response. Journal of the Royal Statistical Society: Series B: Statistical Methodology 60, 57.

Frangakis, C. E. and Rubin, D. B. (2002). Principal stratification in causal inference. Biometrics 58, 21.

Gilbert, P., Bosch, R., and Hudgens, M. (2003). Sensitivity analysis for the assessment of causal vaccine effects on viral load in HIV vaccine trials. Biometrics 59, 531.

Green, P. J. and Silverman, B. (1994). Nonparametric Regression and Generalized Linear Models. Chapman & Hall.

Hayden, D., Pauler, D., and Schoenfeld, D. (2005). An estimator for treatment comparisons among survivors in randomized trials. Biometrics 61, 305.

Heagerty, P. (2002). Marginalized transition models and likelihood inference for longitudinal categorical data. Biometrics pages 342.

Heckman, J. (1979a). Sample Selection Bias as a Specification Error. Econometrica 47, 153.

Heckman, J. (1979b). Sample selection bias as a specification error. Econometrica: Journal of the econometric society pages 153.

Heitjan, D. and Rubin, D. (1991). Ignorability and coarse data. The Annals of Statistics pages 2244.
Henderson, R., Diggle, P., and Dobson, A. (2000). Joint modelling of longitudinal measurements and event time data. Biostatistics 1, 465.

Hogan, J. and Laird, N. (1997a). Mixture Models for the Joint Distribution of Repeated Measures and Event Times. Statistics in Medicine 16, 239.

Hogan, J. and Laird, N. (1997b). Model-Based Approaches to Analysing Incomplete Longitudinal and Failure Time Data. Statistics in Medicine 16, 259.

Hogan, J., Lin, X., and Herman, B. (2004). Mixtures of varying coefficient models for longitudinal data with discrete or continuous nonignorable dropout. Biometrics 60, 854.

Ibrahim, J. and Chen, M. (2000). Power prior distributions for regression models. Statistical Science pages 46.

Ibrahim, J., Chen, M., and Lipsitz, S. (2001). Missing responses in generalised linear mixed models when the missing data mechanism is nonignorable. Biometrika 88, 551.

Kaciroti, N., Schork, M., Raghunathan, T., and Julius, S. (2009). A Bayesian Sensitivity Model for Intention-to-treat Analysis on Binary Outcomes with Dropouts. Statistics in Medicine 28, 572.

Kenward, M. and Molenberghs, G. (1999). Parametric Models for Incomplete Continuous and Categorical Longitudinal Data. Statistical Methods in Medical Research 8, 51.

Kenward, M., Molenberghs, G., and Thijs, H. (2003). Pattern-mixture models with proper time dependence. Biometrika 90, 53.

Kurland, B. and Heagerty, P. (2004). Marginalized Transition Models for Longitudinal Binary Data with Ignorable and Non-Ignorable Drop-Out. Statistics in Medicine 23, 2673.

Land, S., Wieand, S., Day, R., TenHave, T., Costantino, J., Lang, W., and Ganz, P. (2002). Methodological Issues In the Analysis of Quality of Life Data in Clinical Trials: Illustrations from the National Surgical Adjuvant Breast And Bowel Project (NSABP) Breast Cancer Prevention Trial. Statistical Methods for Quality of Life Studies pages 71.

Lee, J. and Berger, J. (2001). Semiparametric Bayesian Analysis of Selection Models. Journal of the American Statistical Association 96, 1397.

Lee, J., Hogan, J., and Hitsman, B. (2008). Sensitivity analysis and informative priors for longitudinal binary data with outcome-related drop-out. Technical Report, Brown University.
Lee, K., Daniels, M. J., and Sargent, D. J. (2010). Causal effects of treatments for informative missing data due to progression. To Appear in JASA.

Liang, K.-Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13.

Lin, H., McCulloch, C., and Rosenheck, R. (2004). Latent pattern mixture models for informative intermittent missing data in longitudinal studies. Biometrics 60, 295.

Little, R. (1993). Pattern-Mixture Models for Multivariate Incomplete Data. Journal of the American Statistical Association 88, 125.

Little, R. (1994). A Class of Pattern-Mixture Models for Normal Incomplete Data. Biometrika 81, 471.

Little, R. (1995). Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association 90,.

Little, R. and Rubin, D. (1987). Statistical Analysis with Missing Data. Wiley.

Little, R. and Rubin, D. (1999). Comment on Adjusting for Non-Ignorable Drop-out Using Semiparametric Models by D. O. Scharfstein, A. Rotnitsky and J. M. Robins. Journal of the American Statistical Association 94, 1130.

Little, R. and Wang, Y. (1996). Pattern-mixture models for multivariate incomplete data with covariates. Biometrics 52, 98.

Liu, T., Todhunter, R., Lu, Q., Schoettinger, L., Li, H., Littell, R., Burton-Wurster, N., Acland, G., Lust, G., and Wu, R. (2006). Modelling extent and distribution of zygotic disequilibrium: Implications for a multigenerational canine pedigree. Genetics.

Molenberghs, G., Kenward, M., and Lesaffre, E. (1997). The Analysis of Longitudinal Ordinal Data with Nonrandom Drop-Out. Biometrika 84, 33.

Molenberghs, G. and Kenward, M. G. (2007). Missing Data in Clinical Studies. Wiley.

Molenberghs, G., Michiels, B., Kenward, M., and Diggle, P. (1998). Monotone Missing Data and Pattern-Mixture Models. Statistica Neerlandica 52, 153.

Neal, R. (2003). Slice sampling. The Annals of Statistics 31, 705.

Neyman, J. (1923). On the application of probability theory to agricultural experiments. Statistical Science 5, 465.

Nordheim, E. (1984). Inference from Nonrandomly Missing Categorical Data: an Example From a Genetic Study of Turner's Syndrome. Journal of the American Statistical Association 79, 772.
Pauler, D., McCoy, S., and Moinpour, C. (2003). Pattern Mixture Models for Longitudinal Quality of Life Studies in Advanced Stage Disease. Statistics in Medicine 22, 795.

Pulkstenis, E., TenHave, T., and Landis, J. (1998). Model for the Analysis of Binary Longitudinal Pain Data Subject to Informative Dropout Through Remedication. Journal of the American Statistical Association 93, 438.

Radloff, L. (1977). The CES-D Scale: A Self-Report Depression Scale for Research in the General Population. Applied Psychological Measurement 1, 385.

Robins, J. (1997). Non-response models for the analysis of non-monotone non-ignorable missing data. Statistics in Medicine 16, 21.

Robins, J. (1998). Correction for non-compliance in equivalence trials. Statistics in Medicine 17,.

Robins, J. and Ritov, Y. (1997). Toward a curse of dimensionality appropriate (coda) asymptotic theory for semi-parametric models. Statistics in Medicine 16, 285.

Robins, J., Rotnitzky, A., and Zhao, L. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association 89, 846.

Robins, J., Rotnitzky, A., and Zhao, L. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association 90,.

Rotnitzky, A., Robins, J., and Scharfstein, D. (1998b). Semiparametric regression for repeated outcomes with nonignorable nonresponse. Journal of the American Statistical Association 93, 1321.

Rotnitzky, A., Scharfstein, D. O., Su, T.-L., and Robins, J. M. (2001). Methods for conducting sensitivity analysis of trials with potentially nonignorable competing causes of censoring. Biometrics 57, 103.

Roy, J. (2003). Modeling Longitudinal Data with Nonignorable Dropouts Using a Latent Dropout Class Model. Biometrics 59, 829.

Roy, J. and Daniels, M. J. (2008). A general class of pattern mixture models for nonignorable dropout with many possible dropout times. Biometrics 64, 538.

Rubin, D. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66, 688.

Rubin, D. (1976). Inference and missing data. Biometrika 63, 581.
Rubin, D. (1977). Formalizing subjective notions about the effect of nonrespondents in sample surveys. Journal of the American Statistical Association pages 538.

Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. Wiley.

Rubin, D. B. (2000). Causal inference without counterfactuals: comment. Journal of the American Statistical Association pages 435.

Scharfstein, D., Daniels, M., and Robins, J. (2003). Incorporating Prior Beliefs about Selection Bias into the Analysis of Randomized Trials with Missing Outcomes. Biostatistics 4, 495.

Scharfstein, D., Halloran, M., Chu, H., and Daniels, M. (2006). On estimation of vaccine efficacy using validation samples with selection bias. Biostatistics 7, 615.

Scharfstein, D., Manski, C., and Anthony, J. (2004). On the Construction of Bounds in Prospective Studies with Missing Ordinal Outcomes: Application to the Good Behavior Game Trial. Biometrics 60, 154.

Scharfstein, D., Rotnitzky, A., and Robins, J. (1999). Adjusting for Nonignorable Drop-Out Using Semiparametric Nonresponse Models. Journal of the American Statistical Association 94, 1096.

Shepherd, B., Gilbert, P., and Mehrotra, D. (2007). Eliciting a Counterfactual Sensitivity Parameter. American Statistician 61, 56.

TenHave, T., Kunselman, A., Pulkstenis, E., and Landis, J. (1998). Mixed effects logistic regression models for longitudinal binary response data with informative drop-out. Biometrics 54, 367.

TenHave, T., Miller, M., Reboussin, B., and James, M. (2000). Mixed Effects Logistic Regression Models for Longitudinal Ordinal Functional Response Data with Multiple-Cause Drop-Out from the Longitudinal Study of Aging. Biometrics 56, 279.

Thijs, H., Molenberghs, G., Michiels, B., Verbeke, G., and Curran, D. (2002). Strategies to fit pattern-mixture models. Biostatistics 3, 245.

Troxel, A., Harrington, D., and Lipsitz, S. (1998). Analysis of longitudinal data with non-ignorable non-monotone missing values. Journal of the Royal Statistical Society. Series C (Applied Statistics) 47, 425.

Troxel, A., Lipsitz, S., and Harrington, D. (1998). Marginal models for the analysis of longitudinal measurements with nonignorable non-monotone missing data. Biometrika 85, 661.
Tsiatis, A. A. (2006). Semiparametric theory and missing data. Springer, New York.

van der Laan, M. J. and Robins, J. (2003). Unified Methods for Censored Longitudinal Data and Causality. Springer.

Vansteelandt, S., Goetghebeur, E., Kenward, M., and Molenberghs, G. (2006a). Ignorance and uncertainty regions as inferential tools in a sensitivity analysis. Statistica Sinica 16, 953.

Vansteelandt, S., Goetghebeur, E., Kenward, M., and Molenberghs, G. (2006b). Ignorance and uncertainty regions as inferential tools in a sensitivity analysis. Statistica Sinica 16, 953.

Vansteelandt, S., Rotnitzky, A., and Robins, J. (2007). Estimation of regression models for the mean of repeated outcomes under nonignorable nonmonotone nonresponse. Biometrika 94, 841.

Wahba, G. (1990). Spline models for observational data. Society for Industrial Mathematics.

Wang, C., Daniels, M., Scharfstein, D. O., and Land, S. (2010). A Bayesian shrinkage model for incomplete longitudinal binary data with application to the breast cancer prevention trial. To Appear in JASA.

Wu, M. and Bailey, K. (1988). Analysing changes in the presence of informative right censoring caused by death and withdrawal. Statistics in Medicine 7,.

Wu, M. and Bailey, K. (1989). Estimation and comparison of changes in the presence of informative right censoring: conditional linear model. Biometrics pages 939.

Wu, M. and Carroll, R. (1988). Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics 44, 175.

Wulfsohn, M. and Tsiatis, A. (1997). A joint model for survival and longitudinal data measured with error. Biometrics pages 330.

Yuan, Y. and Little, R. J. (2009). Mixed-effect hybrid models for longitudinal data with nonignorable drop-out. Biometrics (in press).

Zhang, J. and Heitjan, D. (2006). A simple local sensitivity analysis tool for nonignorable coarsening: application to dependent censoring. Biometrics 62, 1260.

Zhang, J. and Rubin, D. (2003). Estimation of Causal Effects via Principal Stratification When Some Outcomes are Truncated by Death. Journal of Educational and Behavioral Statistics 28, 353.
Chenguang Wang received his bachelor's and master's degrees in computer science from Dalian University of Technology, China. Chenguang later joined the biometry program of, and received his master's degree in statistics from, the University of Nebraska-Lincoln. At the University of Florida, Chenguang majored in statistics while simultaneously working for the Children's Oncology Group Statistics and Data Center (2004-2009) and the Center for Devices and Radiological Health, FDA (2009-2010). Chenguang received his Ph.D. from the University of Florida in the summer of 2010. Chenguang's research has focused on constructing a Bayesian framework for incomplete longitudinal data that identifies the parameters of interest and assesses sensitivity of the inference via incorporating expert opinions. Such a framework can be broadly used in clinical trials to provide health care professionals a more accurate understanding of the statistical or causal relationship between clinical interventions and human diseases. Chenguang is a member of the American Statistical Association, a member of the Eastern North American Region/International Biometric Society, and a member of the Children's Oncology Group. |