MODELS FOR REPEATED MEASURES OF A MULTIVARIATE RESPONSE

By

RALITZA GUEORGUIEVA

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
1999

Copyright 1999 by Ralitza Gueorguieva

To my family

ACKNOWLEDGMENTS

I would like to express my deepest gratitude to Dr. Alan Agresti for serving as my dissertation advisor. Without his guidance, constant encouragement, and valuable advice this work would not have been completed. My appreciation is extended to Drs. James Booth, Randy Carter, Malay Ghosh, and Monika Ardelt for serving on my committee and for helping me with my research. I would also like to thank all the faculty, staff, and students in the Department of Statistics for their support and friendship. I also wish to acknowledge the members of the Perinatal Data Systems group, whose constant encouragement and understanding have helped me in many ways. Special thanks go to Dr. Randy Carter, who has supported me since my first day as a graduate student. Finally, I would like to express my gratitude to my parents, Vessela and Vladislav, for their loving care and confidence in my success, to my sister, Annie, and her fiancé, Nathan, for their encouragement and continual support, and to my husband, Velizar, for his constant love and inspiration.

TABLE OF CONTENTS

ACKNOWLEDGMENTS

ABSTRACT

CHAPTERS

1 INTRODUCTION
  1.1 Models for Univariate Repeated Measures
    1.1.1 General Linear Models
    1.1.2 Generalized Linear Models
    1.1.3 Marginal Models
    1.1.4 Random Effects Models
    1.1.5 Transition Models
  1.2 Models for Multivariate Repeated Measures
  1.3 Simultaneous Modelling of Responses of Different Types
  1.4 Format of Dissertation

2 MULTIVARIATE GENERALIZED LINEAR MIXED MODEL
  2.1 Introduction
  2.2 Model Definition
  2.3 Model Properties
  2.4 Applications

3 ESTIMATION IN THE MULTIVARIATE GENERALIZED LINEAR MIXED MODEL
  3.1 Introduction
  3.2 Maximum Likelihood Estimation
    3.2.1 Gauss-Hermite Quadrature
    3.2.2 Monte Carlo EM Algorithm
    3.2.3 Pseudo-likelihood Approach
  3.3 Simulated Data Example
  3.4 Applications
    3.4.1 Developmental Toxicity Study in Mice
    3.4.2 Myoelectric Activity Study in Ponies
  3.5 Additional Methods

4 INFERENCE IN THE MULTIVARIATE GENERALIZED LINEAR MIXED MODEL
  4.1 Inference about Regression Parameters
  4.2 Estimation of Random Effects
  4.3 Inference Based on Score Tests
    4.3.1 General Theory
    4.3.2 Testing the Conditional Independence Assumption
    4.3.3 Testing the Significance of Variance Components
  4.4 Applications
  4.5 Simulation Study
  4.6 Future Research

5 CORRELATED PROBIT MODEL
  5.1 Introduction
  5.2 Model Definition
  5.3 Maximum Likelihood Estimation
    5.3.1 Monte Carlo EM Algorithm
    5.3.2 Stochastic Approximation EM Algorithm
    5.3.3 Standard Error Approximation
  5.4 Application
  5.5 Simulation Study
  5.6 Identifiability Issue
  5.7 Model Extensions
  5.8 Future Research

6 CONCLUSIONS
  6.1 Summary
  6.2 Future Research

REFERENCES

BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

MODELS FOR REPEATED MEASURES OF A MULTIVARIATE RESPONSE

By Ralitza Gueorguieva
December 1999
Chairman: Alan Agresti
Major Department: Statistics

The goal of this dissertation is to propose and investigate random effects models for repeated measures situations when there are two or more response variables.
The emphasis is on maximum likelihood estimation and on applications with outcomes of different types. We propose a multivariate generalized linear mixed model that can accommodate any combination of outcome variables in the exponential family. This model assumes conditional independence between the response variables given the random effects. We also consider a correlated probit model that is suitable for mixtures of binary, continuous, censored continuous, and ordinal outcomes. Although more limited in area of applicability, the correlated probit model allows for a more general correlation structure between the response variables than the corresponding multivariate generalized linear mixed model.

We extend three estimation procedures from the univariate generalized linear mixed model to the multivariate generalization proposed herein. The methods are Gauss-Hermite quadrature, the Monte Carlo EM algorithm, and pseudo-likelihood. Standard error approximations are considered along with parameter estimation. A simulated data example and two 'real-life' examples are used for illustration. We also consider hypothesis testing based on quadrature and Monte Carlo approximations to the Wald, score, and likelihood ratio tests. The performance of the approximations to the test statistics is studied via a small simulation study for checking the conditional independence assumption.

We propose a Monte Carlo EM algorithm for maximum likelihood estimation in the correlated probit model. Because of the computational inefficiency of the algorithm we consider a modification based on stochastic approximations, which leads to a significant decrease in the time for model fitting. To address the issue of advantages of joint over separate analyses of the response variables we design a simulation study to investigate possible efficiency gains in a multivariate analysis.
A noticeable decrease in the estimated standard errors is observed only in the binary response case, for small numbers of subjects and observations per subject and for high correlation between the outcomes. We also briefly consider an identifiability issue for one of the variance components.

CHAPTER 1
INTRODUCTION

Univariate repeated measures occur when one response variable is observed on several occasions for each subject. Hereafter 'subject' refers to any unit on which a measurement is taken, while 'occasion' corresponds to time or to a specific condition. If more than one response is observed at each occasion, multivariate repeated measures are available. Univariate and multivariate repeated measures are very common in biomedical applications, for example when one or more variables are measured on each patient at a number of hospital visits, or when a number of questions are asked at a series of interviews. But the occasions do not necessarily refer to different times. For instance, dependent responses can be measured on litter mates, on members of the same family, or at different places on a subject's body. Difficulties in analyzing repeated measures arise because of the correlations usually present between observations on the same subject. Statistical methods and estimation techniques are well developed for repeated measures on a univariate normal variable, and lately much research has been dedicated to repeated observations on a binary variable and, more generally, on variables with distributions in the exponential family. Zeger and Liang (1992) provide an overview of methods for longitudinal data, and the books of Lindsey (1993), Diggle, Liang and Zeger (1994), and Fahrmeir and Tutz (1994) cover many details. Pendergast et al. (1996) present a comprehensive survey of models for correlated binary outcomes, including longitudinal data. However, relatively little attention has been devoted to repeated measures of a multivariate response.
General models for this situation are necessarily complex, as two types of correlations must be taken into account: correlations between measurements on different variables at each occasion and correlations between measurements at different occasions. Reinsel (1982, 1984), Lundbye-Christensen (1991), Matsuyama and Ohashi (1997), and Heitjan and Sharma (1997) consider models for normally distributed responses. Lefkopoulou, Moore and Ryan (1989), Liang and Zeger (1989), and Agresti (1997) propose models for multivariate binary data; Catalano and Ryan (1992), Fitzmaurice and Laird (1995), and Regan and Catalano (1999) introduce models for clustered bivariate discrete and continuous outcomes. Catalano (1994) considers an extension of the Catalano and Ryan model to ordinal data. Rochon (1996) demonstrates how generalized estimating equations can be used to fit extended marginal models for bivariate repeated measures of discrete or continuous outcomes. Rochon's approach is very general and allows for a large class of response distributions. However, in many cases, especially when subject-specific inference is of primary interest, marginal models are not appropriate, as they may lead to attenuation of the estimates of the regression parameters (Zeger, Liang, and Albert, 1988).

Sammel, Ryan, and Legler (1997) analyze mixtures of discrete and continuous responses in the exponential family using latent variable models. Their approach is based on numerical or stochastic approximations to maximum likelihood and allows for subject-specific inference. Blackwell and Catalano (1999a, 1999b) consider extensions for ordinal data and for repeated measures of ordinal responses. The Generalized Linear Mixed Model (GLMM) forms a very general class of subject-specific models for discrete and continuous responses in the exponential family and is used for univariate repeated measures (Fahrmeir and Tutz, 1994).
In the current dissertation we demonstrate how the GLMM approach can be extended to multivariate repeated measures by assuming separate random effects for each outcome variable. This is in contrast to the Sammel et al. approach, in which common underlying latent variables are assumed. We also consider a correlated probit model that is more general than the corresponding GLMM for the special case of clustered binary and continuous data.

The introduction to this dissertation contains an overview of approaches for modelling univariate repeated measures (Section 1.1), existing models for multivariate repeated measures (Section 1.2), and simultaneous modelling of different types of responses (Section 1.3). The chapter concludes with an outline of the dissertation (Section 1.4).

1.1 Models for Univariate Repeated Measures

1.1.1 General Linear Models

Historically, the first models considered for repeated measures data were general linear models with correlated normally distributed errors (see Ware, 1982, for a review). The univariate representation of such models is as follows. Suppose each subject i is observed on J occasions and denote the column vector of responses by y_i. Assume that y_i arises from the linear model

    y_i = X_i β + ε_i,    (1.1)

where X_i is a J x p model matrix for the ith individual, β is a p x 1 unknown parameter vector, and ε_i is a J x 1 error vector with a multivariate normal distribution with mean 0 and arbitrary positive definite covariance matrix Σ: ε_i ~ N_J(0, Σ). Σ can take certain special forms. The case Σ = σ²I corresponds to the usual linear model for cross-sectional data. The equicorrelation structure Σ = σ²((1 − ρ)I + ρJ), where J is a matrix of ones, 0 < ρ < 1 and σ² > 0, is appropriate when the repeated measures are on subjects within a cluster, for example when a certain characteristic is observed on each of a number of litter mates.
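As an illustrative aside (not part of the original text), the covariance structures above and the resulting generalized least squares estimate of β can be sketched in a few lines of numpy; all data and parameter values below are simulated for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
n, J, p = 200, 4, 2                  # subjects, occasions, covariates
beta = np.array([1.0, -0.5])
sigma2, rho = 2.0, 0.6

# Compound-symmetry (equicorrelation) covariance: sigma^2 ((1 - rho) I + rho J)
Sigma = sigma2 * ((1 - rho) * np.eye(J) + rho * np.ones((J, J)))
Sigma_inv = np.linalg.inv(Sigma)

# Simulate y_i = X_i beta + eps_i with eps_i ~ N_J(0, Sigma)
X = rng.normal(size=(n, J, p))
eps = rng.multivariate_normal(np.zeros(J), Sigma, size=n)
y = np.einsum('ijk,k->ij', X, beta) + eps

# GLS estimate: (sum_i X_i' Sigma^-1 X_i)^-1 sum_i X_i' Sigma^-1 y_i
XtSX = sum(X[i].T @ Sigma_inv @ X[i] for i in range(n))
XtSy = sum(X[i].T @ Sigma_inv @ y[i] for i in range(n))
beta_hat = np.linalg.solve(XtSX, XtSy)
print(beta_hat)                      # close to the true (1.0, -0.5)
```

In practice Σ is unknown and is estimated jointly with β (e.g. by ML or REML), but the weighted estimating equation has this same form.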
The autoregressive structure Σ = ((σ_jk)), with σ_jk = σ² ρ^|j−k|, is one of the most popular structures when the observations are over equally spaced time periods.

Usually of primary interest in general linear models is the estimation of the regression parameters while recognizing the likely correlation structure in the data. To achieve this, one either assumes an explicit parametric model for the covariance structure, or uses methods of inference that are robust to misspecification of the covariance structure. Weighted least squares, maximum likelihood, and restricted maximum likelihood are the most popular estimation methods for general linear models.

1.1.2 Generalized Linear Models

Generalized linear models (GLMs) are natural extensions of classical linear models allowing for a larger class of response distributions. Their specification consists of three parts (McCullagh and Nelder, 1989, pp. 27-30): a random component, a systematic component, and a link function.

1. The random component is the probability distribution for the elements of the response vector. The y_i's, i = 1, ..., n, are assumed to be independent with a distribution in the exponential family

    f(y_i; θ_i, φ) = exp{ [y_i θ_i − b(θ_i)] / a(φ) + c(y_i, φ) },    (1.2)

for some specified functions a(.), b(.) and c(.). Usually a(φ) = φ/w_i, where φ is called a dispersion parameter and the w_i are known weights. The mean μ_i and the variance function v(μ_i) completely specify a member of the exponential family, because μ_i = b′(θ_i) and Var(y_i) = b″(θ_i)a(φ). Important exponential family distributions are the normal, the binomial, the Poisson, and the gamma distributions.

2. The systematic component is a linear function of the covariates, η_i = x_i^T β, where η_i is commonly called a linear predictor.

3. The link function g(.)
is a monotonic differentiable function which relates the expected value of the response distribution μ_i to the linear predictor η_i: η_i = g(μ_i).

When the response distribution is normal and the link is the identity function g(μ) = μ, the GLM reduces to the usual linear regression model. For each member of the exponential family there is a special link function, called the canonical link function, which simplifies model fitting. For that link function θ_i = η_i. Maximum likelihood estimates in GLMs are obtained using iteratively reweighted least squares.

Just as modifications of linear models are used for analyzing Gaussian repeated measures, modifications of GLMs can handle repeated discrete and continuous outcomes. Extensions of GLMs include marginal, random effects, and transition models (Zeger and Liang, 1992). Hereafter we will use y_ij to denote the response at the jth occasion for the ith subject. The ranges of the subscripts will be i = 1, ..., n, and j = 1, ..., J for balanced and j = 1, ..., n_i for unbalanced data.

1.1.3 Marginal Models

Marginal models are designed to permit separate modelling of the regression of the response on the predictors, and of the association among repeated observations for each individual. The models are defined by specifying expressions for the marginal mean and the marginal variance-covariance matrix of the response:

1. The marginal mean μ_ij = E(y_ij) is related to the predictors by a known link function g(.): g(μ_ij) = x_ij^T β.

2. The marginal variance is a function of the marginal mean, Var(y_ij) = V(μ_ij), and the marginal covariance is a function of the marginal means and of additional parameters δ: Cov(y_ij, y_ij′) = c(μ_ij, μ_ij′; δ).

Notice that if the correlation is ignored and the variance function is chosen to correspond to an exponential family distribution, the marginal model reduces to a GLM for independent data. But the variance function can be more general, and hence even if the responses are uncorrelated this model is more general than the corresponding GLM.
Because only the first two moments are specified for the joint distribution of the response, additional assumptions are needed for likelihood inferences. Alternatively, the Generalized Estimating Equations (GEE) method can be used (Liang and Zeger, 1986; Zeger, Liang and Albert, 1988), as briefly summarized here. Let y_i = (y_i1, ..., y_iJ)^T, μ_i = (μ_i1, ..., μ_iJ)^T, A_i = diag{V(μ_i1), ..., V(μ_iJ)}, and let R_i(δ) be a 'working' correlation matrix for the ith subject. The latter means that R_i(δ) is completely specified up to a parameter vector δ and may or may not be the true correlation matrix. The regression parameters β are then estimated by solving

    Σ_{i=1}^n (∂μ_i/∂β)^T V_i^{−1}(δ) (y_i − μ_i) = 0,

where V_i(δ) = A_i^{1/2} R_i(δ) A_i^{1/2}. Liang and Zeger (1986) show that if the mean function is correctly specified, the solution β̂ to the above equation is consistent and asymptotically normal as the number of subjects goes to infinity. They also propose a robust variance estimate, which is consistent even when the variance-covariance structure is misspecified. Hence, the GEE approach is appropriate when the regression relationship, and not the correlation structure of the data, is of primary interest.
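The GEE iteration above can be sketched from scratch for binary responses with an exchangeable working correlation. This sketch is not from the dissertation: the data are simulated (independent within subject, so the true exchangeable parameter is 0), and the moment estimator of the correlation parameter is a simplified version of the usual one (no small-sample or dispersion correction):

```python
import numpy as np

rng = np.random.default_rng(2)
expit = lambda t: 1.0 / (1.0 + np.exp(-t))
n, J = 300, 4                        # subjects, occasions
beta_true = np.array([0.5, -1.0])
X = rng.normal(size=(n, J, 2))
y = rng.binomial(1, expit(np.einsum('ijk,k->ij', X, beta_true)))

beta = np.zeros(2)
for _ in range(25):
    mu = expit(np.einsum('ijk,k->ij', X, beta))
    v = mu * (1 - mu)                          # binomial variance function V(mu)
    r = (y - mu) / np.sqrt(v)                  # Pearson residuals
    # Simple moment estimator of the exchangeable correlation parameter
    rho = np.sum(r.sum(1) ** 2 - (r ** 2).sum(1)) / (n * J * (J - 1))
    R = (1 - rho) * np.eye(J) + rho * np.ones((J, J))
    U, H = np.zeros(2), np.zeros((2, 2))
    for i in range(n):
        A_half = np.sqrt(v[i])
        V = A_half[:, None] * R * A_half[None, :]   # A^1/2 R A^1/2
        D = v[i][:, None] * X[i]                    # d mu_i / d beta (logit link)
        ViD = np.linalg.solve(V, D)
        H += D.T @ ViD
        U += ViD.T @ (y[i] - mu[i])
    step = np.linalg.solve(H, U)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-8:
        break
print(beta, rho)                     # beta near (0.5, -1.0); rho near 0 here
```

The update alternates between a Fisher-scoring step for β and a moment step for δ (here the scalar ρ), exactly the two-stage scheme used in practice by GEE software.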
1.1.4 Random Effects Models

An important feature of the marginal models is that the regression coefficients have the same interpretation as coefficients from a cross-sectional analysis. These models are preferred when the effects of explanatory variables on the average response within a population are of primary interest. However, when it is of interest to describe how the response for a particular individual changes as a result of a change in the covariates, a more pertinent approach is to consider random (mixed) effects models.

Random effects models assume that the correlation among repeated responses arises because there is natural heterogeneity across individuals and that this heterogeneity can be represented by a probability distribution. More precisely,

1. The conditional distribution of the response y_ij given a subject-specific vector of random effects b_i satisfies a GLM with linear predictor x_ij^T β + z_ij^T b_i, where z_ij in general is a subset of x_ij.

2. The responses on the same subject y_i1, y_i2, ..., y_in_i are conditionally independent given b_i.

3. b_i has a certain distribution F(.; δ) with mean 0 and variance-covariance matrix Σ depending on a parameter vector δ.

Such models with a normal distribution for the random effects are considered in greater detail in Section 2.1. In contrast to marginal models, the regression coefficients in random effects models have subject-specific interpretations. To better illustrate that difference let us consider a particular example. Let the response y_ij be binary and the subscript j refer to time. Consider the marginal model

    logit(μ*_ij) = β*_0 + β*_1 j, where μ*_ij = E(y_ij),

and the random effects model

    logit(μ_ij) = β_0 + β_1 j + b_i, with b_i i.i.d. N(0, σ²), where μ_ij = E(y_ij | b_i).

Then β_1 is the log-odds ratio for a positive response at time j + 1 relative to time j for any given subject i, while β*_1 is the population-averaged log-odds ratio. That is, β*_1 describes the change in log-odds of a positive response from time j to time j + 1 for the population as a whole. In general these two interpretations are different, but in some special cases, such as with identity link functions, the subject-specific and population-averaged interpretations coincide. More discussion of the connection between marginal and random effects models follows in Chapter 2.

The presence of random effects enables the pooling of information across different subjects to obtain better subject-specific, as opposed to population-averaged, inference, but complicates the estimation problem considerably. To obtain the likelihood function, one has to integrate out the random effects, which, except for a few special cases, cannot be done analytically.
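To make the integration concrete (this example is not from the dissertation; the data and parameter values are hypothetical), the marginal likelihood contribution of one subject under a random-intercept logit model is a one-dimensional integral over the normal random effect, which Gauss-Hermite quadrature approximates with a short weighted sum:

```python
import numpy as np

rng = np.random.default_rng(3)
expit = lambda t: 1.0 / (1.0 + np.exp(-t))

# Hypothetical data for one subject under a random-intercept logit model
y_i = np.array([1, 0, 1, 1])
x_i = np.array([0.2, -0.5, 1.0, 0.3])
beta0, beta1, sigma = -0.2, 0.8, 1.5

def cond_lik(b):
    """P(y_i | b): product of Bernoulli probabilities given the random effect b."""
    p = expit(beta0 + beta1 * x_i + b)
    return np.prod(np.where(y_i == 1, p, 1 - p), axis=-1)

# Gauss-Hermite: int f(b) N(b; 0, sigma^2) db ~ sum_k (w_k / sqrt(pi)) f(sqrt(2) sigma t_k)
t, w = np.polynomial.hermite.hermgauss(20)
L_quad = np.sum(w / np.sqrt(np.pi) * cond_lik(np.sqrt(2) * sigma * t[:, None]))

# Monte Carlo evaluation of the same integral, as a check
b = rng.normal(0.0, sigma, size=(200_000, 1))
L_mc = cond_lik(b).mean()
print(L_quad, L_mc)                  # the two approximations agree closely
```

Maximizing the sum of such log-contributions over (β_0, β_1, σ) gives the maximum likelihood estimates; with more than a few random effects the quadrature grid grows exponentially, which motivates the Monte Carlo methods studied later in the dissertation.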
If the random effects are nuisance parameters, conditional likelihood estimates for the fixed effects may be easy to obtain for canonical link functions. This is accomplished by conditioning on the sufficient statistics for the unknown nuisance parameters and then maximizing the conditional likelihood.

When the dimension of the integral is not high, numerical methods such as Gaussian quadrature work well for normally distributed random effects (Fahrmeir and Tutz, 1994, pp. 357-362; Liu and Pierce, 1994). A variety of other methods have recently been proposed to handle more difficult cases. These include the approximate maximum likelihood estimates proposed by Schall (1991), the penalized quasi-likelihood approach of Breslow and Clayton (1993), the hierarchical maximum likelihood of Lee and Nelder (1996), the Gibbs sampling approach of Zeger and Karim (1991), and the EM algorithm approaches for GLMMs of Booth and Hobert (1999) and of McCulloch (1997), among others.

1.1.5 Transition Models

A third group of models for dealing with longitudinal data consists of transition models. Regression Markov chain models for repeated measures data have been considered by Zeger et al. (1985) and Zeger and Qaqish (1988). This approach involves modelling the conditional expectation of the response at each occasion given past outcomes. Specifically,

1. The conditional expectation of the response μᶜ_ij = E(y_ij | y_i,j−1, ..., y_i1) depends on previous responses and current covariates as follows:

    g(μᶜ_ij) = x_ij^T β + Σ_{k=1}^{j−1} γ_k f_k(y_i,j−1, ..., y_i1),

where the f_k(.), k = 1, ..., j − 1, are known functions.

2. The conditional variance of y_ij is a function of the conditional mean: Var(y_ij | y_i,j−1, ..., y_i1) = V(μᶜ_ij), where V is a known function.

Transition models combine the assumptions about the dependence of the response on the explanatory variables and the correlation among repeated observations into a single equation. Conditional maximum likelihood and GEE have both been used for estimating the parameters.
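For a first-order binary transition model, conditional maximum likelihood reduces to an ordinary logistic regression with the lagged response as an extra covariate. A simulated sketch (not from the dissertation; all values hypothetical) makes this concrete:

```python
import numpy as np

rng = np.random.default_rng(4)
expit = lambda t: 1.0 / (1.0 + np.exp(-t))
n, J = 400, 6
beta0, beta1, gamma = -0.5, 0.8, 1.2      # gamma: effect of the previous outcome

# Simulate a first-order Markov binary chain for each subject
x = rng.normal(size=(n, J))
y = np.zeros((n, J), dtype=int)
y[:, 0] = rng.binomial(1, expit(beta0 + beta1 * x[:, 0]))
for j in range(1, J):
    y[:, j] = rng.binomial(1, expit(beta0 + beta1 * x[:, j] + gamma * y[:, j - 1]))

# Conditional ML: ordinary logistic regression of y_ij on (1, x_ij, y_{i,j-1}),
# j >= 2; the first observation of each chain is conditioned on (and so lost)
Z = np.column_stack([np.ones(n * (J - 1)), x[:, 1:].ravel(), y[:, :-1].ravel()])
resp = y[:, 1:].ravel()

theta = np.zeros(3)
for _ in range(30):                       # Newton-Raphson for the logistic MLE
    p = expit(Z @ theta)
    step = np.linalg.solve(Z.T @ (Z * (p * (1 - p))[:, None]), Z.T @ (resp - p))
    theta += step
    if np.max(np.abs(step)) < 1e-10:
        break
print(theta)                              # close to (-0.5, 0.8, 1.2)
```

Note that the first observation of each subject enters only as a conditioning value, illustrating the loss of information on early observations mentioned later in connection with the Liang and Zeger (1989) model.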
Extensions of GLMs are not the only methods for analyzing repeated measures data (see, for example, Vonesh (1992) for an overview of nonlinear models for longitudinal data), but as the models proposed in this dissertation are based on such extensions, our discussion in the following sections will be restricted to GLM types of models.

1.2 Models for Multivariate Repeated Measures

In contrast to univariate longitudinal data, very few models have been discussed that specifically deal with multivariate repeated measures. These models are now briefly discussed, starting with normal theory linear models and proceeding to models for discrete outcomes.

A review of general linear models for the analysis of longitudinal studies is provided by Ware (1985). The general multivariate model is defined as in (1.1), but we assume that the J = KL repeated measures on each subject are made on K normally distributed variables rather than on only one normally distributed response. Hence, the general multivariate model with unspecified covariance structure can be directly applied to multivariate repeated measures data. However, the number of parameters to be estimated increases quickly as the number of occasions and/or variables increases, and the estimation may become quite burdensome. Special types of correlation structures, such as bivariate autoregressive, can be specified directly as proposed by Galecki (1994). This multivariate linear model is also not well suited for unbalanced or incomplete data.

More parsimonious covariance structures are achieved in random effects models. The linear mixed effects model is defined in two stages. At stage 1 we assume

    y_i | b_i = X_i β + Z_i b_i + ε_i,

where ε_i is distributed N_{n_i}(0, σ²I), X_i and Z_i are n_i x p and n_i x q model matrices for the fixed and the random effects respectively, and b_i is a q x 1 random effects vector. At stage 2, b_i ~ N_q(0, Σ) independently of ε_i.
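The two-stage formulation implies the marginal covariance σ²I + Z_i Σ Z_i^T for y_i, which a quick simulation can verify (an illustrative sketch, not from the dissertation; all parameter values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
n_i, q = 5, 2                         # occasions per subject, random effects
t = np.arange(n_i, dtype=float)
X = np.column_stack([np.ones(n_i), t])    # fixed effects: intercept and slope
Z = X.copy()                              # random intercept and random slope
beta = np.array([2.0, 0.5])
Sigma_b = np.array([[1.0, 0.3], [0.3, 0.5]])
sigma2 = 0.8

# Marginal covariance implied by the two stages: Var(y_i) = sigma^2 I + Z Sigma_b Z'
V_theory = sigma2 * np.eye(n_i) + Z @ Sigma_b @ Z.T

# Monte Carlo check over many subjects sharing the same X and Z
n = 100_000
b = rng.multivariate_normal(np.zeros(q), Sigma_b, size=n)
eps = rng.normal(0.0, np.sqrt(sigma2), size=(n, n_i))
y = X @ beta + b @ Z.T + eps
V_emp = np.cov(y, rowvar=False)
print(np.max(np.abs(V_emp - V_theory)))   # small: the structures agree
```

With only q random effects and one error variance, this structure is far more parsimonious than an unstructured n_i x n_i covariance matrix.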
This corresponds to a special variance-covariance structure for the ith subject: Σ_i = σ²I + Z_i Σ Z_i^T. Reinsel (1982) generalized this linear mixed effects model and showed that for balanced multivariate repeated measures models with random effects structure, closed-form solutions exist for both maximum likelihood (ML) and restricted maximum likelihood (REML) estimates of the mean and covariance parameters. Matsuyama and Ohashi (1997) considered bivariate response mixed effects models that can handle missing data. Choosing a Bayesian viewpoint, they used the Gibbs sampler to estimate the parameters. The model of Reinsel is overly restrictive in some cases, as it prescribes the same growth pattern over all response variables for all individuals.

In contrast, Heitjan and Sharma (1997) considered a model for repeated series longitudinal data, where each unit could yield multiple series of the same variable. The error term they used was the sum of a random subject effect and a vector autoregressive process, thus accounting for subject heterogeneity and time dependence in an additive fashion. A straightforward extension for multiple series of observations on distinct variables is possible.

All models discussed so far in this section are appropriate for continuous response variables that can be assumed to be normally distributed. More generally, GEE marginal models can easily accommodate multivariate repeated measures if all response variables have the same discrete or continuous distribution. We now restate the GEE marginal model from Section 1.1 in matrix notation to simplify the multivariate extension. We also consider balanced data. Let y_i represent the vector of observations for the ith individual (i = 1, 2, ..., n) and let μ_i = E(y_i) be the marginal mean vector. We assume g(μ_i) = X_i β, where X_i is the model matrix for the ith individual, β is an unknown parameter vector, and g(.) is a known link function, applied componentwise to μ_i.
Also Var(y_ij) = φ V(μ_ij), where V is a known variance function and φ is a dispersion parameter. Letting A_i = diag[V(μ_i1), ..., V(μ_iJ)] and assuming a working correlation matrix R_i(δ) for y_i, the 'working' covariance matrix for y_i is given by V_i = φ A_i^{1/2} R_i(δ) A_i^{1/2}. If R_i(δ) = R(δ) is the true correlation matrix, V_i is the true covariance matrix of y_i. The J responses for each subject are usually repeated measures on the same variable but, as in the normal case, they can be repeated observations on two or more outcome variables as long as these have the same distribution. The only difference from univariate repeated measures is in the specification of the covariance matrix. The estimates of the regression parameters β will be consistent provided that the model for the marginal mean structure is specified correctly, but for better efficiency the working correlation matrix should be close to the true correlation matrix. Several correlation structures for multivariate repeated measures are discussed by Rochon (1996).

As a further extension, Rochon (1996) proposed a model for bivariate repeated measures that can accommodate both continuous and discrete outcomes. He used GEE models, such as the one described above, to relate each set of repeated measures to important explanatory variables, and then applied seemingly unrelated regression (SUR; Zellner, 1962) methodology to combine the pair of GEE models into an overall analysis framework. If the response vector for the ith subject is denoted by y_i = (y_i^(1)T, y_i^(2)T)^T and all relevant quantities for the first and second response are superscripted by 1 and 2 respectively, the SUR model may be written as

    g(μ_i) = X_i β, with μ_i = (μ_i^(1)T, μ_i^(2)T)^T, X_i = diag(X_i^(1), X_i^(2)), β = (β^(1)T, β^(2)T)^T,

where g(.) is a compound function consisting of g^(1)(.) and g^(2)(.). The joint covariance matrix among the two sets of repeated measures may be written as

    V_i = Φ^{1/2} A_i^{1/2} R(δ) A_i^{1/2} Φ^{1/2}, with Φ = diag(φ_1 I_J, φ_2 I_J) and A_i = diag(A_i^(1), A_i^(2)),

where R(δ) is the working correlation matrix among the two sets of repeated measures for each subject, and each of its elements is a function of the parameter vector δ:

    R(δ) = [ R_11  R_12 ]
           [ R_21  R_22 ].

The suggested techniques may be extended to multiple outcome measures. Rochon's approach provides a great deal of flexibility in modelling the effects of both within-subject and between-subject covariates on discrete and continuous outcomes and is appropriate when the effects of covariates on the marginal distributions of the response are of interest. If subject-specific inference is preferred, however, a transitional or random effects model should be considered.

Liang and Zeger (1989) suggest a class of Markov chain logistic regression models for multivariate binary time series. Their approach is to model the conditional distribution of each component of the multivariate binary time series given the others. They use 'pseudo-likelihood' estimation methods to reduce the computational burden associated with maximum likelihood estimation.

Liang and Zeger's transitional model is useful when the association among variables at one time is of interest, or when the purpose is to identify temporal relationships among the variables, adjusting for covariates. However, one must use caution when interpreting the estimated parameters, because the regression parameter β in the model has a log-odds ratio interpretation conditional on the past and on the other outcomes at time t. Hence, if a covariate influences more than one component of the outcome vector or past outcomes, which is frequently the case, its regression coefficient will capture only that part of its influence that cannot be explained by the other outcomes it is also affecting. Another problem with the model is that in the fitting process all the information in the first q observations is ignored.
A random effects approach for dealing with multivariate longitudinal data is discussed by Agresti (1997), who develops a multivariate extension of the Rasch model for repeated measures of a multivariate binary response. One disadvantage of the multivariate Rasch model, shared by many random effects models, is that it cannot model negative covariance structure among repeated observations on the same variable. Although measurements on the same variable within a subject are usually positively correlated, there are cases when the correlation is negative. One such example is the infection data first considered by Haber (1986). Frequencies of infection profiles of a sample of 263 individuals for four influenza outbreaks over four consecutive winters in Michigan were recorded. The first and fourth outbreaks are known to have been caused by the same virus type, and because contracting influenza during the first outbreak provides immunity against a subsequent outbreak, a subject's responses for these two outbreaks are negatively correlated. Coull (1997) analyzed these data using the multivariate binomial-logit normal model. As it is a special case of the models that we propose later in this dissertation, we define it here.

Let y = (y_1, ..., y_I)^T given π = (π_1, ..., π_I)^T be a random vector of independent binomial components with numbers of trials (n_1, ..., n_I)^T. Also let logit(π) be N_I(μ, Σ). Then π has a multivariate logistic-normal distribution and unconditionally y has a multivariate binomial logit-normal mixture distribution. If the I observations correspond to measurements on K variables at L occasions, this model can be used for analyzing multivariate repeated measures data. The mean of the multivariate random effects distribution can be assumed to be a function of covariates, μ = Xβ, and several groups of subjects with the same design matrices X_s, s = 1, ..., S, can be considered.
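A brief simulation (not from the dissertation; the latent mean and covariance below are hypothetical, chosen only to echo the negatively correlated outbreaks 1 and 4) shows that, unlike many random effects models, the binomial logit-normal mixture can produce negatively correlated binary outcomes:

```python
import numpy as np

rng = np.random.default_rng(6)
expit = lambda t: 1.0 / (1.0 + np.exp(-t))

# Bivariate binomial logit-normal with a negative latent covariance
mu = np.array([-1.0, -1.0])
Sigma = np.array([[2.0, -1.5],
                  [-1.5, 2.0]])
n = 200_000
pi = expit(rng.multivariate_normal(mu, Sigma, size=n))
y = rng.binomial(1, pi)               # one Bernoulli trial per component

corr = np.corrcoef(y, rowvar=False)[0, 1]
print(corr)                           # negative: corr(y1, y2) < 0 is attainable
```

The negative off-diagonal element of Σ is passed through the logit-normal mixture to the observed binary pair, though attenuated relative to the latent correlation.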
The multivariate binomial-logit normal model can be regarded as an analog for binary data of the multivariate Poisson-log normal model of Aitchison and Ho (1989) for count data. Aitchison and Ho assume that y = (y_1, ..., y_I)^T given θ = (θ_1, ..., θ_I)^T are independent Poisson with mean vector θ, and that log(θ) = (log(θ_1), ..., log(θ_I))^T is N_I(μ, Σ). Then θ has a multivariate log-normal distribution. Like the multivariate logit-normal model, the multivariate Poisson-log normal model can be used to model negative correlations and can be extended to incorporate covariates.

Chan and Kuk (1997) also consider random effects models for binary repeated measures but assume an underlying threshold model with normal errors and random effects, and use the probit link. They estimate the parameters via a Monte Carlo EM algorithm, regarding the observations from the underlying continuous model as the complete data. Their approach will be discussed in more detail in Chapter 5, where an extension of their model will be considered.

1.3 Simultaneous Modelling of Responses of Different Types

Difficulties in joint modelling of responses of different types arise because of the need to specify a multivariate joint distribution for the outcome variables. Most research so far has concentrated on simultaneous analysis of binary and continuous responses.

Olkin and Tate (1961) introduced a location model for discrete and continuous outcomes. It is based on a multinomial model for the discrete outcomes and a multivariate Gaussian model for the continuous outcomes conditional on the discrete outcomes. Fitzmaurice and Laird (1995) discussed a generalization of this model which turns out to be a special case of the partly exponential model introduced by Zhao, Prentice and Self (1992). Partly exponential models for the regression analysis of multivariate (discrete, continuous or mixed) response data are parametrized in terms of the response mean and a general shape parameter.
They encompass generalized linear models as well as certain multivariate distributions. A fully parametric approach to estimation leads to asymptotically independent maximum likelihood estimates of the mean and the shape parameters. The score equations for the mean parameters are essentially the same as in GEE. The authors point out two major drawbacks to their approach: one is the computational complexity of full maximum likelihood estimation of the mean and the shape parameters together; the other is the need to specify the shape function for the response. The latter is especially hard if the response vector is a mixture of discrete and continuous outcomes. Zhao, Prentice and Self conclude that partly exponential models are mainly of theoretical interest and can be used to evaluate properties of other mean estimation procedures.

Cox and Wermuth (1992) compared a number of special models for the joint distribution of qualitative (binary) and quantitative variables. The joint distribution for all bivariate models is based on the marginal distribution of one of the components and a conditional distribution for the other component given the first one. A key distinction is between models in which, for each level of the binary outcome (A), the quantitative response (Y) is assumed to be normally distributed, and models in which the marginal distribution of Y is normal. Typically, simplicity in the marginal distribution of Y corresponds to a fairly complicated conditional distribution of Y, and vice versa, but normality often holds at least approximately. Estimation procedures differ from model to model, but essentially the same tests of independence of the two components of the response can be derived. This, however, is not true if trivariate distributions are considered with two binary and one continuous, or with two continuous and one binary component.
In that case several different hypotheses of independence and conditional independence can be considered and, depending on the model, sometimes they may not be tested unless a stronger hypothesis of independence is assumed.

Catalano and Ryan (1992), Fitzmaurice and Laird (1995) and Regan and Catalano (1999) considered mixed models for a bivariate response consisting of a binary and a quantitative variable. Catalano and Ryan (1992) and Regan and Catalano (1999) treated the binary variable as a dichotomized continuous latent trait, which had a joint bivariate normal distribution with the other continuous response. Catalano and Ryan (1992) then parametrized the model so that the joint distribution was a product of a standard random effects model for the continuous variable and a correlated probit model for the discrete variable. Estimation of the parameters of the two models is performed using quasi-likelihood techniques. Catalano (1994) extended the Catalano and Ryan procedure to ordinal instead of binary data. Regan and Catalano (1999) used exact maximum likelihood for estimation, which is computationally feasible because of the equicorrelation assumption between and within the binary and continuous outcomes. The maximum likelihood methodology used by the authors is an extension of the procedure suggested by Ochi and Prentice (1984) for binary data. Fitzmaurice and Laird (1995) assumed a logit model for the binary response and a conditional Gaussian model for the continuous response. Unlike in Catalano and Ryan's model, all regression parameters have marginal interpretations, and the estimates of the regression parameters (based on ML or GEE) are robust to misspecification of the association between the binary and the continuous responses.

Sammel, Ryan and Legler (1997) developed models for mixtures of outcomes in the exponential family.
They assumed that all responses are manifestations of one or more common latent variables and that conditional independence between the outcomes holds given the value of the latent trait. This allows one to use the EM algorithm with numerical or stochastic approximations at each E-step. Blackwell and Catalano (1999) extended the Sammel, Ryan and Legler methodology to longitudinal ordinal data by assuming correlated latent variables at each time point. For simplicity of analysis, each outcome is assumed to depend on only one latent variable and the latent variables are assumed to be independent at each time point.

1.4 Format of Dissertation

The purpose of this dissertation is to propose and investigate random effects models for repeated measures situations when there are two or more response variables. Of special interest is the case when the response variables are of different types. The dissertation is organized as follows.

In Chapter 2 we propose a multivariate generalized linear mixed model which can accommodate any combination of responses in the exponential family. We first describe the usual generalized linear mixed model (GLMM) and then define its extension. The relationship between marginal and conditional moments in the proposed model is briefly discussed and two motivating examples are presented. The key assumption of conditional independence is outlined.

Chapter 3 concentrates on maximum likelihood model fitting methods for the proposed model. Gauss-Hermite quadrature, a Monte Carlo EM algorithm and a pseudo-likelihood approach are extended from the univariate to the multivariate generalized linear mixed model. Standard error approximation is discussed along with point estimation. We use a simulated data example and the two motivating examples to illustrate the proposed methodology. We also address certain issues such as standard error variability and comparison between multivariate and univariate analyses.
In Chapter 4 we consider inference in the multivariate GLMM. We describe hypothesis testing for the fixed effects based on approximations to the Wald and likelihood ratio statistics and propose score tests for the variance components and for testing the conditional independence assumption. The performance of the Gauss-Hermite quadrature and Monte Carlo approximations to the Wald, score and likelihood ratio statistics is compared via a small simulation study.

Chapter 5 introduces a correlated probit model as an alternative to the multivariate GLMM for binary and continuous data when conditional independence does not hold. We develop a Monte Carlo EM algorithm for maximum likelihood estimation and apply it to one of the motivating examples. To address the issue of advantages of joint over separate analyses of the response variables, we design a simulation study to investigate possible efficiency gains in a multivariate analysis. An identifiability issue concerning one of the variance components is also discussed. The dissertation concludes with a summary of the most important findings and a discussion of future research topics (Chapter 6).

CHAPTER 2
MULTIVARIATE GENERALIZED LINEAR MIXED MODEL

The Generalized Linear Mixed Model (GLMM) is a very general class of random effects models, well suited for subject-specific inference for repeated measures data. It is a special case of the random effects model defined in Section 1.2. We start this chapter by providing a detailed definition of the GLMM for univariate repeated measures (Section 2.1) and then introduce the multivariate extension (Section 2.2). Some properties of the multivariate GLMM are discussed in Section 2.3, and the chapter concludes with a description of two data sets which will be used for illustration throughout the thesis. Hereafter we omit the superscript c in the conditional mean notation.
2.1 Introduction

Let y_ij denote the jth response observed on the ith subject, j = 1, ..., n_i, i = 1, ..., n. The conditional distribution of y_ij given an unobserved q x 1 subject-specific random vector b_i is assumed to be in the exponential family with density

f(y_ij | b_i) = exp{ [y_ij θ_ij − b(θ_ij)] w_ij / φ + c(y_ij, φ, w_ij) },

where μ_ij = b'(θ_ij) is the conditional mean, φ is the dispersion parameter, b(·) and c(·) are specific functions corresponding to the type of exponential family, and the w_ij are known weights. Also at this stage it is assumed that

g(μ_ij) = g(E(y_ij | b_i)) = η_ij,

where η_ij = x_ij^T β + z_ij^T b_i is a linear predictor, b_i is a q x 1 random effect, β is a p x 1 parameter vector, and the design vectors x_ij (p x 1) and z_ij (q x 1) are functions of the covariates.

At the second stage the subject-specific effects b_i are assumed to be i.i.d. N_q(0, Σ). In general, the random effects can have other continuous or discrete distributions, but the normal distribution provides a full range of possible covariance structures and we will restrict our attention to this case. As an additional assumption, conditional independence of the observations within and between subjects is required, that is,

f(y | b; β, φ) = ∏_{i=1}^n f(y_i | b_i; β, φ) with f(y_i | b_i; β, φ) = ∏_{j=1}^{n_i} f(y_ij | b_i; β, φ),

where y = (y_1^T, ..., y_n^T)^T and b = (b_1^T, ..., b_n^T)^T are column vectors of all responses and all random effects respectively.

Maximum likelihood estimation for the GLMM is complicated because the marginal likelihood for the response does not have a closed-form expression. Different approaches for dealing with this situation are discussed in Chapter 3.

The parameters β in the GLMM have subject-specific interpretations; i.e., they describe the individual's rather than the average population response to changing the covariates. Under the GLMM, the marginal mean is

E(y_ij) = E(E(y_ij | b_i)) = ∫ g^{-1}(x_ij^T β + z_ij^T b_i) f(b_i; Σ) db_i,

and in general g(E(y_ij)) ≠ x_ij^T β if g is a nonlinear function.
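The two-stage specification and the role of conditional independence can be illustrated with a small simulation, here a random-intercept logistic GLMM with invented parameter values. The attenuation constant used in part (ii) below is the cumulative Gaussian approximation of Zeger, Liang and Albert (1988) discussed in the next section.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-stage simulation from a random-intercept logistic GLMM:
#   b_i ~ N(0, sigma_b^2),  y_ij | b_i ~ Bernoulli(logit^{-1}(x_ij' beta + b_i)),
# with observations conditionally independent given b_i.
n, n_i = 20000, 4                      # subjects and observations per subject
sigma_b = 1.5
beta = np.array([-0.5, 1.0])           # intercept and slope (illustrative)
x = rng.uniform(-1.0, 1.0, size=(n, n_i))
b = rng.normal(0.0, sigma_b, size=n)   # subject-specific random intercepts

eta = beta[0] + beta[1] * x + b[:, None]   # linear predictor given b_i
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

# (i) Marginally, repeated observations on a subject are positively
# correlated, even though they are conditionally independent given b_i.
within_corr = np.corrcoef(y[:, 0], y[:, 1])[0, 1]

# (ii) g(E[y_ij]) != x_ij' beta: the marginal logit is attenuated, roughly by
# a = (c^2 sigma_b^2 + 1)^{-1/2} with c = 16*sqrt(3)/(15*pi).
c = 16 * np.sqrt(3) / (15 * np.pi)
a = (c**2 * sigma_b**2 + 1.0) ** -0.5
lin_pred = beta[0] + beta[1] * 0.8          # x_ij' beta at x = 0.8
zdraws = rng.normal(0.0, sigma_b, size=1_000_000)
marg_exact = np.mean(1.0 / (1.0 + np.exp(-(lin_pred + zdraws))))
marg_approx = 1.0 / (1.0 + np.exp(-a * lin_pred))
print(within_corr, marg_exact, marg_approx)
```

The marginal mean computed by Monte Carlo integration agrees closely with the attenuation approximation, while the naive value logit^{-1}(x'β) would overstate it.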
However, this equation holds approximately if the standard deviations of the random effects distributions are small. Closed-form expressions for the marginal means exist for certain link functions (Zeger, Liang and Albert, 1988). For example, for the identity link function the marginal and conditional moments coincide. For the log link function the marginal mean is

μ_ij* = exp(x_ij^T β + z_ij^T Σ z_ij / 2).

For the probit link function, the marginal mean is

μ_ij* = Φ(a_i(Σ) x_ij^T β), where a_i(Σ) = |Σ z_ij z_ij^T + I_q|^{-1/2}

and q is the dimension of b_i. For the logit link, an exact closed-form expression for the marginal mean is unavailable, but using a cumulative Gaussian approximation to the logistic function leads to the expression

logit(μ_ij*) ≈ a_i(Σ) x_ij^T β, where a_i(Σ) = |c² Σ z_ij z_ij^T + I_q|^{-1/2} and c = 16√3/(15π).

Unfortunately, there are no simple formulae for the higher-order marginal moments except in the case of a linear link function. But the second-order moments can also be approximated, and then the GEE approach can be used to fit the mixed model (Zeger, Liang and Albert, 1988).

2.2 Model Definition

Let us first consider bivariate repeated measures. Denote the response vector for the ith subject by y_i = (y_i1^T, y_i2^T)^T, where y_i1 = (y_i11, ..., y_i1n_i)^T and y_i2 = (y_i21, ..., y_i2n_i)^T are the repeated measurements on the 1st and 2nd variable respectively at n_i occasions. The numbers of observations for the two variables within a subject need not be the same, and hence it would be more appropriate to denote them by n_i1 and n_i2, but for simplicity we will use n_i. We assume that y_i1j, j = 1, ..., n_i, are conditionally independent given b_i1 with density f_1(·) in the exponential family. Analogously, y_i2j, j = 1, ..., n_i, are conditionally independent given b_i2 with density f_2(·) in the exponential family. Note that f_1 and f_2 need not be the same. Also y_i1 and y_i2 are conditionally independent given b_i = (b_i1^T, b_i2^T)^T, and the responses on different subjects are independent. Let g_1(·)
and g_2(·) be appropriate link functions for f_1 and f_2. Denote the conditional means of y_i1j and y_i2j by μ_i1j and μ_i2j respectively. Let μ_i1 = (μ_i11, ..., μ_i1n_i)^T and μ_i2 = (μ_i21, ..., μ_i2n_i)^T. At stage one of the mixed model specification we assume

g_1(μ_i1) = X_i1 β_1 + Z_i1 b_i1,   (2.1)

g_2(μ_i2) = X_i2 β_2 + Z_i2 b_i2,   (2.2)

where β_1 and β_2 are p_1 x 1 and p_2 x 1 dimensional unknown parameter vectors, X_i1 and X_i2 are n_i x p_1 and n_i x p_2 dimensional design matrices for the fixed effects, Z_i1 and Z_i2 are n_i x q_1 and n_i x q_2 design matrices for the random effects, and g_1 and g_2 are applied componentwise to μ_i1 and μ_i2. At stage two, a joint distribution for b_i1 (q_1 x 1) and b_i2 (q_2 x 1) is specified. The normal distribution is a very good candidate, as it provides a rich covariance structure. Hence we assume

b_i = ( b_i1 )  i.i.d. MVN(0, Σ) = MVN( 0, [ Σ_11    Σ_12 ] ),   (2.3)
      ( b_i2 )                             [ Σ_12^T  Σ_22 ]

where Σ, Σ_11 and Σ_22 are in general unknown positive-definite matrices. The extension of this model to higher dimensional multivariate repeated measures is straightforward, but for simplicity of discussion only the bivariate case will be considered.

When Σ_12 = 0 the above model is equivalent to two separate GLMMs for the two outcome variables. Advantages of joint over separate fitting include the ability to answer intrinsically multivariate questions, better control over the type I error rates in multiple tests and possible gains in efficiency in the parameter estimates. For example, in the developmental toxicity application described in more detail at the end of this section, it is of interest to estimate the dose effect of ethylene glycol on both outcomes (malformation and low fetal weight) simultaneously. One might be interested in the probability of either malformation or low fetal weight at any given dose. This type of question cannot be answered by a univariate analysis, as it requires knowledge of the correlation between the outcomes.
Also, testing the significance of the dose effect on malformation and fetal weight simultaneously (using a Wald test, for example) allows one to keep the significance level fixed at α. In contrast, if the two outcomes were tested separately, an adjustment would have to be made to the level of the individual tests to achieve an overall α level for both comparisons.

Multivariate analysis also allows one to borrow strength from the observations on one outcome to estimate the other outcomes. This can lead to more precise estimates of the parameters and therefore to gains in efficiency. Theoretically, the gain will be greatest if there is a common random effect for all responses (that is, Σ_11 = Σ_12 = Σ_22 in our notation). Also, if the number of observations per subject is small and the correlation between the two outcomes is large, the gains in efficiency should increase.

If the two vectors of random effects are perfectly correlated, then this is equivalent to assuming a common latent variable with a multivariate normal distribution which does not depend on covariates. Hence the multivariate GLMM reduces to a special case of the Sammel, Ryan and Legler model when the latent variable is not modelled as a function of the covariates. It is possible to allow the random effects in our models to depend on covariates, but we have not considered that extension here.

Several other models discussed in previous sections turn out to be special cases of this general formulation. For example, the bivariate Rasch model is obtained by specifying Bernoulli response distributions for both variables, using the logit link function and identity design matrices both for the fixed and for the random effects. The multivariate binomial-logit normal model is also a special case for binary responses with identity design matrix for the random effects and unrestricted variance-covariance structure.
The Aitchison and Ho multivariate Poisson-log normal model also falls under this general structure when the response variables are assumed to have a Poisson distribution.

2.3 Model Properties

Exactly as in the GLMM, the conditional moments in the multivariate GLMM are directly modelled, while marginal moments are harder to find. The marginal means and the marginal variances of y_i1 and y_i2 for the model defined by (2.1)-(2.3) are the same as those of the GLMM considering one variable at a time:

E(y_i1) = E[E(y_i1 | b_i1)] = E[μ_i1(β_1, b_i1)], E(y_i2) = E[μ_i2(β_2, b_i2)],

Var(y_i1) = E[Var(y_i1 | b_i1)] + Var[E(y_i1 | b_i1)] = E[φ_1 V_1(μ_i1)] + Var[μ_i1],

Var(y_i2) = E[φ_2 V_2(μ_i2)] + Var[μ_i2],

where V_1(μ_i1) and V_2(μ_i2) denote the variance functions corresponding to the exponential family distributions for the two response variables. The marginal covariance matrix between y_i1 and y_i2 is found to be equal to the covariance matrix between the conditional means μ_i1 and μ_i2:

Cov(y_i1, y_i2) = E(y_i1 y_i2^T) − E(y_i1)E(y_i2^T)
= E[E(y_i1 y_i2^T | b_i1, b_i2)] − E(μ_i1)E(μ_i2^T)
= E[E(y_i1 | b_i1) E(y_i2^T | b_i2)] − E(μ_i1)E(μ_i2^T)
= Cov(μ_i1, μ_i2).

The latter property is a consequence of the key assumption of conditional independence between the two response variables. This assumption allows one to extend model fitting methods from the univariate to the multivariate GLMM, but may not hold in certain situations. This issue will be discussed in more detail in Chapter 4, where score tests are proposed for verifying conditional independence.

2.4 Applications

The multivariate GLMM is fitted to two data sets. The first is a data set from a developmental toxicity study of ethylene glycol (EG) in mice conducted through the National Toxicology Program (Price et al., 1985). The experiment involved four randomly chosen groups of pregnant mice, one group serving as a control and the other three exposed to three different levels of EG during major organogenesis. Following

Table 2.1.
Descriptive statistics for the Ethylene Glycol data

Dose (g/kg)  Dams  Live Fetuses   Fetal Weight (g)    Malformation
                                   Mean      SD      Number  Percent
0             25       297         0.972    0.098       1      0.34
0.75          24       276         0.877    0.104      26      9.42
1.50          22       229         0.764    0.107      89     38.86
3.00          23       226         0.704    0.124     126     57.08

sacrifice, measurements were taken on each fetus in the uterus. The two outcome measures on each live fetus of interest to us are fetal weight (continuous) and malformation status (dichotomous). Some descriptive statistics for the data are available in Table 2.1. Fetal weight decreases monotonically with increasing dose, with the average weight ranging from 0.972 g in the control group to 0.704 g in the group administered the highest dose. At the same time the malformation rate increases with dose, from 0.3% in the control group to 57% in the group administered the highest dose.

The goal of the analysis is to study the joint effects of increasing dose on fetal weight and on the probability of malformation. The analysis of these data is complicated by the correlations between the repeated measures on fetuses within litter. A multivariate GLMM with random intercepts for each variable allows one to explicitly model the correlation structure within litter and provides subject-specific estimates for the regression parameters.

The second data set is from a study to compare the effects of 9 drugs and a placebo on the patterns of myoelectric activity in the intestines of ponies (Lester et al., 1998a, 1998b, 1998c). For that purpose electrodes are attached to four different areas of the intestines of 6 ponies, and spike burst rate and duration are measured at 18 equally spaced time intervals around the time of each drug administration. Six of the drugs and the placebo are given twice to each pony in a randomized complete block design. The remaining three drugs are not given to all ponies and hence will not be analyzed here. There is a rest period after each drug's administration and no carryover effects

Table 2.2.
Descriptive statistics for pony data

         Duration         Count
Pony   Mean    SD      Mean     SD    Corr.
1      1.03   0.25     77.57   45.53   0.59
2      1.35   0.28    128.25   76.08   0.18
3      1.40   0.36     84.33   46.27   0.45
4      1.27   0.48    111.00   49.66   0.21
5      1.14   0.21     75.44   45.03   0.50
6      1.24   0.40     66.86   48.97   0.32

are expected. The spike burst rate is a count variable reflecting the number of contractions exceeding a certain threshold in 15 minutes. The duration variable reflects the average duration of the contractions in each 15-minute interval. Figure 2.1 shows graphical representations of the averaged responses by drug and time for one of the electrodes and one hour after the drug administration. Table 2.2 shows the sample means, standard deviations and correlations between the two outcome variables by pony for this smaller data set. We analyse a restricted data set for reasons of computational and practical feasibility. This issue is discussed in more detail in Chapter 3.

[Figure 2.1 not reproduced. Caption: Count and duration trends over time for the pony data. Each trajectory shows the change in mean response for one of the seven drugs.]

CHAPTER 3
ESTIMATION IN THE MULTIVARIATE GENERALIZED LINEAR MIXED MODEL

This chapter focuses on methods for obtaining maximum likelihood (or approximate maximum likelihood) estimates for the model parameters in the multivariate GLMM. We first mention some approaches proposed for the univariate GLMM (Section 3.1), and then describe in detail extensions of three of those approaches for the multivariate GLMM (Section 3.2). The proposed methods are then illustrated on a simulated data example (Section 3.3), and on the two 'real-life' data sets introduced in Chapter 2 (Section 3.4).
Some issues such as standard error variability and advantages of one multivariate versus several univariate analyses are addressed in Sections 3.3 and 3.4. The chapter concludes with a discussion of some additional model-fitting methods (Section 3.5).

3.1 Introduction

In the GLMM the marginal likelihood of the response is obtained by integrating out the random effects:

∏_{i=1}^n ∫ f(y_i | b_i; β, φ) f(b_i) db_i,

where f(b_i) denotes a normal density. Usually these integrals are not analytically tractable and some kind of approximation must be used.

Direct approaches to maximum likelihood estimation are based on numerical or stochastic approximations of the integrals and on numerical maximization of those approximations. Probably the most widely used numerical approximation procedure for this type of integral is Gauss-Hermite quadrature. It involves evaluating the integrands at m prespecified quadrature points and substituting weighted sums in place of the intractable integrals for the n subjects. If the number of quadrature points m is large the approximation can be made very accurate, but to keep the numerical effort low m should be kept as small as possible. Gauss-Hermite quadrature is appropriate when the dimension of the random effects is small. An alternative for high-dimensional random effects is to approximate the integrals by Monte Carlo sums. This involves generating m random values from the random effects distribution for each subject, evaluating the conditional densities f(y_i | b_i; β, φ) at those values and taking averages. Details on how to perform Gauss-Hermite quadrature and Monte Carlo approximation for the GLMM can be found in Fahrmeir and Tutz (1994, pp. 357-365). We discuss Gauss-Hermite quadrature for the multivariate GLMM in Section 3.2. Liu and Pierce (1994) consider adaptive Gaussian quadrature, which allows a reduction in the required number of quadrature points by centering and scaling them around the mode of the integrand function.
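The Monte Carlo alternative to quadrature can be sketched for a single subject's contribution. The sketch below uses invented responses, linear predictors and random-intercept variance; it averages the conditional likelihood over draws from the random effects distribution and compares the result to a dense deterministic quadrature benchmark.

```python
import numpy as np

rng = np.random.default_rng(2)

# Monte Carlo approximation of one subject's marginal likelihood in a
# random-intercept logistic GLMM: L_i = E_b[ prod_j f(y_ij | b) ], b ~ N(0, s^2).
y_i = np.array([1, 0, 1, 1])              # hypothetical responses for subject i
eta0 = np.array([0.2, -0.4, 0.1, 0.6])    # fixed-effect parts x_ij' beta
sigma = 1.0

def cond_lik(b):
    """Product over j of the Bernoulli densities, for each value of b."""
    p = 1.0 / (1.0 + np.exp(-(eta0[:, None] + b[None, :])))
    return np.prod(np.where(y_i[:, None] == 1, p, 1.0 - p), axis=0)

# Monte Carlo sum over m draws from the random effects distribution
b_draws = rng.normal(0.0, sigma, size=200_000)
L_mc = cond_lik(b_draws).mean()

# Dense trapezoidal quadrature as a near-exact benchmark
grid = np.linspace(-8 * sigma, 8 * sigma, 4001)
dens = np.exp(-grid**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
vals = cond_lik(grid) * dens
L_quad = np.sum((vals[1:] + vals[:-1]) / 2 * np.diff(grid))
print(L_mc, L_quad)
```

With 200,000 draws the Monte Carlo value agrees with the quadrature benchmark to well under one percent; the trade-off is that the Monte Carlo sum scales painlessly to high-dimensional random effects, where a quadrature grid does not.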
This procedure is described in more detail and applied to one of the data examples in Section 3.4.

Indirect approaches to maximum likelihood estimation use the EM algorithm (Dempster, Laird and Rubin, 1977), treating the random effects as the missing data. We apply these methods both to the multivariate GLMM and to the correlated probit model and therefore we now introduce the basic ideas. Hereafter ψ denotes the vector of all unknown parameters in the problem, and ψ̂ denotes some estimate of ψ. The EM algorithm is an iterative technique for finding maximum likelihood estimates when direct maximization of the observed likelihood f(y; ψ) is not feasible. It involves augmenting the observed data by unobserved data so that maximization at each step of the algorithm is considerably simplified. The unobserved data are denoted by b, and in the GLMM context these are the random effects. The EM algorithm can be summarized as follows:

1. Select a starting value ψ^(0). Set r = 0.
2. Increase r by 1. E-step: calculate E{ln f(y, b; ψ) | y; ψ^(r-1)}.
3. M-step: find a value ψ^(r) of ψ that maximizes this conditional expectation.
4. Iterate between (2) and (3) until convergence is achieved.

In the GLMM context the complete data is u = (y^T, b^T)^T and the complete data log-likelihood is given by

ln L_u = Σ_{i=1}^n Σ_{j=1}^{n_i} ln f(y_ij | b_i; β, φ) + Σ_{i=1}^n ln f(b_i; Σ).

The rth E-step of the EM algorithm involves computing E(ln L_u | y, ψ^(r-1)), and the rth M-step maximizes this quantity with respect to ψ and updates the parameter estimates. Notice that because β and φ enter only the first term of L_u, the M-step with respect to β and φ uses only f(y | b) and so it is similar to a standard generalized linear model computation with the values of b treated as known. Maximizing with respect to Σ is just maximum likelihood using the distribution of b after replacing sufficient statistics with their conditional expected values.
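The mechanics of the algorithm are easiest to see in a toy case where the E-step is available in closed form: a normal random-intercept model y_ij = μ + b_i + ε_ij. The sketch below uses simulated data with illustrative parameter values; it treats the b_i as missing data, computes their posterior means and variances in the E-step, and updates μ, σ_b² and σ² in the M-step.

```python
import numpy as np

rng = np.random.default_rng(3)

# EM for the simplest normal random-intercept model
#   y_ij = mu + b_i + e_ij,  b_i ~ N(0, sb2),  e_ij ~ N(0, s2),
# treating the b_i as the missing data (everything Gaussian, closed-form E-step).
n, n_i = 300, 5
mu_true, sb2_true, s2_true = 2.0, 1.0, 1.0
b = rng.normal(0.0, np.sqrt(sb2_true), size=n)
y = mu_true + b[:, None] + rng.normal(0.0, np.sqrt(s2_true), size=(n, n_i))

mu, sb2, s2 = 0.0, 0.5, 0.5             # crude starting values
for _ in range(500):
    # E-step: posterior variance v and posterior means m_i of b_i given y_i
    v = 1.0 / (1.0 / sb2 + n_i / s2)
    m = v * (y - mu).sum(axis=1) / s2
    # M-step: maximize the expected complete-data log-likelihood
    mu = np.mean(y - m[:, None])
    sb2 = np.mean(m**2 + v)             # E[b_i^2 | y] = m_i^2 + v
    s2 = np.mean((y - mu - m[:, None])**2 + v)

print(mu, sb2, s2)
```

The Σ update is exactly "replace sufficient statistics with their conditional expected values": the M-step for σ_b² uses E[b_i² | y] = m_i² + v in place of the unobserved b_i².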
In general, the conditional expectations in the E-step cannot be computed in closed form, but Gauss-Hermite or various Monte Carlo approximations can be utilized (Fahrmeir and Tutz, 1994, pp. 362-365; McCulloch, 1997; Booth and Hobert, 1999).

Many authors consider tractable analytical approximations to the likelihood (Breslow and Clayton, 1993; Wolfinger and O'Connell, 1994). Although these methods lead to inconsistent estimates in some cases, they may have a considerable advantage in computational speed over 'exact' methods and can be fitted with standard software. Both Breslow and Clayton's and Wolfinger and O'Connell's procedures amount to iterative fitting of normal theory linear mixed models and can be implemented using the %GLIMMIX macro in SAS. An extension of Wolfinger and O'Connell's approach is considered in Section 3.2.

A Bayesian paradigm with flat priors can also be used to approximate the maximum likelihood estimates using posterior modes (Fahrmeir and Tutz, 1994, pp. 233-238) or posterior means (Zeger and Karim, 1991). Though the numerator in such computations is the same as for the maximum likelihood calculations, the posterior may not exist for diffuse priors (Natarajan and McCulloch, 1995). This may not be detected using computational techniques such as the Gibbs sampler, and can result in incorrect parameter estimates (Hobert and Casella, 1996).

3.2 Maximum Likelihood Estimation

For simplicity of presentation we again consider the case of only two response variables. The marginal likelihood in the bivariate GLMM is obtained as in the usual GLMM by integrating out the random effects:

∏_{i=1}^n ∫ { ∏_{j=1}^{n_i} f_1(y_i1j | b_i1; β_1, φ_1) f_2(y_i2j | b_i2; β_2, φ_2) } f(b_i1, b_i2; Σ) db_i1 db_i2,   (3.1)

where f denotes the multivariate normal density of the random effects. In this section we describe how Gauss-Hermite quadrature, a Monte Carlo EM algorithm and pseudo-likelihood can be used to obtain estimates in the multivariate GLMM.
Both Gaussian quadrature and the Monte Carlo EM algorithm are referred to as 'exact maximum likelihood' methods because they target the exact maximum likelihood estimates. In comparison, methods that use analytical approximations are referred to as 'approximate' maximum likelihood. We now start by describing the 'exact' maximum likelihood methods. Hereafter ψ = Vec(β_1, β_2, φ_1, φ_2, δ), where δ are the unknown variance components in Σ.

3.2.1 Gauss-Hermite Quadrature

Parameter Estimation

The marginal log-likelihood in the multivariate GLMM is expressed as a sum of the individual log-likelihoods for all subjects, that is,

ln L(y; ψ) = Σ_{i=1}^n ln L_i(ψ),

where

L_i(ψ) = ∫ { ∏_{j=1}^{n_i} f_1(y_i1j | b_i; β_1, φ_1) f_2(y_i2j | b_i; β_2, φ_2) } (2π)^{-q/2} |Σ|^{-1/2} exp(−b_i^T Σ^{-1} b_i / 2) db_i.

Hence Gauss-Hermite quadrature (Fahrmeir and Tutz, 1994) involves numerical approximations of n q-dimensional integrals (q = q_1 + q_2). The b_i's are first transformed so that each integral has the form

L_i = π^{-q/2} ∫_{R^q} h_i(z_i) exp(−z_i^T z_i) dz_i.

The needed transformation is b_i = √2 L z_i, where Σ = LL^T. L is the lower triangular Choleski factor of Σ and it always exists because Σ is nonnegative definite. Here

h_i(z_i) = ∏_{j=1}^{n_i} f_1(y_i1j | z_i1; β_1, φ_1) f_2(y_i2j | z_i2; β_2, φ_2).

Each integral is then approximated by

L_i^GQ = Σ_{k_1=1}^m ... Σ_{k_q=1}^m v_k h_i(z^(k)),

where z^(k) = √2 L d^(k) for the multiple index k = (k_1, ..., k_q), and d^(k) denotes the tabled nodes of univariate Gauss-Hermite integration of order m (Abramowitz and Stegun, 1972). The corresponding weights are given by v_k = π^{-q/2} ∏_{l=1}^q w_{k_l}, where the w_{k_l} are the tabled univariate weights.

The maximization algorithm then proceeds as follows:

1. Choose an initial estimate ψ^(0) for the parameter vector. Set r = 0.
2. Increase r by 1. Approximate each of the integrals L_i(ψ^(r-1)) by L_i^GQ(ψ^(r-1)) using m quadrature points in each direction.
3. Maximize the approximation with respect to ψ using a numerical maximization routine.
4. Iterate between steps (2) and (3) until the parameter estimates have converged.
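The transformation b_i = √2 L z_i and the product-rule construction of the multivariate nodes and weights can be sketched as follows. The covariance matrix and check function are illustrative; the check value E[exp(b_1 + b_2)] = exp((Σ_11 + Σ_22 + 2Σ_12)/2) is a standard lognormal moment.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def gh_expectation(h, Sigma, m=20):
    """Approximate E[h(b)] for b ~ N(0, Sigma), q-dimensional, by
    Gauss-Hermite quadrature after the transformation b = sqrt(2) L z."""
    q = Sigma.shape[0]
    L = np.linalg.cholesky(Sigma)          # lower triangular Choleski factor
    d, w = hermgauss(m)                    # univariate nodes and weights
    grids = np.meshgrid(*([d] * q), indexing="ij")
    z = np.stack([g.ravel() for g in grids])           # q x m^q node vectors
    wgrids = np.meshgrid(*([w] * q), indexing="ij")
    weights = np.prod(np.stack([g.ravel() for g in wgrids]), axis=0)
    b = np.sqrt(2.0) * L @ z               # transformed nodes z^(k)
    return np.pi ** (-q / 2) * np.sum(weights * h(b))

Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
approx = gh_expectation(lambda b: np.exp(b[0] + b[1]), Sigma)
exact = np.exp((Sigma[0, 0] + Sigma[1, 1] + 2 * Sigma[0, 1]) / 2)
print(approx, exact)
```

In the GLMM application h would be the product of conditional densities h_i(z_i); the cost grows as m^q, which is why the method suits low-dimensional random effects.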
A popular numerical maximization procedure for step 3 is the Newton-Raphson method. It involves iteratively solving the equations

ψ^(r) = ψ^(r-1) + J^{-1}(ψ^(r-1)) s(ψ^(r-1)),

where s(ψ) = ∂ln L/∂ψ is the score vector and J(ψ) = −∂²ln L/∂ψ∂ψ^T is the observed information matrix. One possible criterion for convergence is

max_s |ψ_s^(r+1) − ψ_s^(r)| / (|ψ_s^(r)| + δ_2) < δ_1,

where ψ_s^(r) denotes the estimate of the sth element of the parameter vector at step r of the algorithm, and δ_1 and δ_2 are chosen to be small positive numbers. The role of δ_2 is to prevent numerical problems stemming from estimates close to zero. Another frequently used criterion is

||ψ^(r+1) − ψ^(r)|| < δ,

where ||·|| denotes the Euclidean norm. The numerical maximization procedure MaxBFGS, which we are going to use for the numerical examples later in this chapter, is based on two criteria for convergence:

|s_s(ψ^(r)) ψ_s^(r)| < ε for all s when ψ_s^(r) ≠ 0, |s_s(ψ^(r))| < ε for all s when ψ_s^(r) = 0,

and

|ψ_s^(r+1) − ψ_s^(r)| < 10ε|ψ_s^(r)| for all s when ψ_s^(r) ≠ 0, |ψ_s^(r+1) − ψ_s^(r)| < 10ε for all s when ψ_s^(r) = 0,

where s_s denotes the sth component of the score vector.

Standard Error Estimation

After the algorithm has converged, estimates of the standard errors of the parameter estimates can be based on the observed information matrix. By asymptotic maximum likelihood theory, Var(ψ̂) ≈ I^{-1}(ψ), where

I(ψ) = E(−∂²ln L/∂ψ∂ψ^T)

is the expected information matrix. But the observed information matrix J(ψ̂) is easier to obtain, and hence we will use the latter to approximate the standard errors:

se(ψ̂_s) = √(j^{ss}),

where j^{ss} is the sth diagonal element of the inverse of the observed information matrix. Note that if the Newton-Raphson maximization method is used, the observed information matrix is a byproduct of the algorithm.

Numerical and Exact Derivatives

The observed information matrix and the score vector are not available in closed form and must be approximated. One can either compute numerical derivatives of the approximated log-likelihood, or approximate the intractable integrals in the expressions for the exact derivatives.
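A minimal sketch of the Newton-Raphson iteration with the first stopping rule above, applied to a Poisson log-likelihood with invented data (chosen because its score and observed information are available in closed form, so no quadrature is needed):

```python
import numpy as np

# Newton-Raphson with the relative-change stopping rule
#   max_s |psi_s^(r+1) - psi_s^(r)| / (|psi_s^(r)| + delta2) < delta1.
# Data and model (Poisson regression, log link) are illustrative only.
y = np.array([3.0, 1.0, 4.0, 2.0, 5.0])
X = np.array([[1.0, 0.2], [1.0, -0.5], [1.0, 1.0], [1.0, 0.1], [1.0, 0.9]])

def score_and_info(psi):
    """Score s(psi) = X'(y - mu) and observed information J(psi) = X' diag(mu) X."""
    mu = np.exp(X @ psi)
    return X.T @ (y - mu), X.T @ (mu[:, None] * X)

psi = np.zeros(2)
delta1, delta2 = 1e-8, 1e-4
for r in range(100):
    s, J = score_and_info(psi)
    step = np.linalg.solve(J, s)          # psi^(r) = psi^(r-1) + J^{-1} s
    psi = psi + step
    if np.max(np.abs(step) / (np.abs(psi) + delta2)) < delta1:
        break

# Standard errors from the inverse observed information, a byproduct of NR
_, J = score_and_info(psi)
se = np.sqrt(np.diag(np.linalg.inv(J)))
max_score = np.max(np.abs(score_and_info(psi)[0]))
print(psi, se)
```

At convergence the score is numerically zero and the inverse of J, already computed for the last step, gives the standard errors directly.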
The first approach is simpler to implement but may require a large number of quadrature points. The method of finite differences can be used to find numerical derivatives. The $s$th element of the score vector and the $(s,t)$th element of the information matrix are approximated as follows:
$$\hat s_s(\theta) = \frac{l(\theta + \epsilon_1 z_s) - l(\theta - \epsilon_1 z_s)}{2\epsilon_1},$$
$$\hat i_{st}(\theta) = -\frac{l(\theta + \epsilon_1 z_s + \epsilon_2 z_t) - l(\theta + \epsilon_1 z_s - \epsilon_2 z_t) - l(\theta - \epsilon_1 z_s + \epsilon_2 z_t) + l(\theta - \epsilon_1 z_s - \epsilon_2 z_t)}{4\epsilon_1\epsilon_2},$$
where $l = \ln L^{GQ}$, $\epsilon_1$ and $\epsilon_2$ are suitably chosen step lengths, and $z_s$ and $z_t$ are unit vectors. The unit vectors have ones in positions $s$ and $t$ respectively, and the rest of their elements are zeros.

An alternative to numerical derivatives is to approximate the exact derivatives. The latter are obtained by interchanging the differentiation and integration signs and are as follows:
$$\frac{\partial \ln L}{\partial \theta} = \sum_{i=1}^{n} \frac{\int \frac{\partial \ln f_i}{\partial \theta}\, f_i\, db_i}{\int f_i\, db_i}$$
and
$$\frac{\partial^2 \ln L}{\partial \theta\, \partial \theta^T} = \sum_{i=1}^{n} \Bigg[ \frac{\int \Big( \frac{\partial^2 \ln f_i}{\partial \theta\, \partial \theta^T} + \frac{\partial \ln f_i}{\partial \theta} \frac{\partial \ln f_i}{\partial \theta^T} \Big) f_i\, db_i}{\int f_i\, db_i} - \frac{\Big(\int \frac{\partial \ln f_i}{\partial \theta}\, f_i\, db_i\Big)\Big(\int \frac{\partial \ln f_i}{\partial \theta}\, f_i\, db_i\Big)^T}{\Big(\int f_i\, db_i\Big)^2} \Bigg],$$
where
$$f_i = \Big\{ \prod_{j=1}^{n_i} f_1(y_{i1j}\,|\,b_i; \beta_1, \phi_1)\, f_2(y_{i2j}\,|\,b_i; \beta_2, \phi_2) \Big\}\, f(b_i; \Sigma).$$
For each subject there are three intractable integrals which need to be approximated:
$$\int f_i\, db_i, \qquad \int \frac{\partial \ln f_i}{\partial \theta}\, f_i\, db_i \qquad \text{and} \qquad \int \Big( \frac{\partial^2 \ln f_i}{\partial \theta\, \partial \theta^T} + \frac{\partial \ln f_i}{\partial \theta} \frac{\partial \ln f_i}{\partial \theta^T} \Big) f_i\, db_i,$$
and Gauss-Hermite quadrature is used for each one of them. Note that
$$\ln f_i = \sum_{j=1}^{n_i} \ln f_1(y_{i1j}\,|\,b_i; \beta_1, \phi_1) + \sum_{j=1}^{n_i} \ln f_2(y_{i2j}\,|\,b_i; \beta_2, \phi_2) + \ln f(b_i; \Sigma).$$
Hence $\frac{\partial \ln f_i}{\partial \beta_1} = \sum_j \frac{\partial \ln f_1}{\partial \beta_1}$, $\frac{\partial \ln f_i}{\partial \beta_2} = \sum_j \frac{\partial \ln f_2}{\partial \beta_2}$ and $\frac{\partial \ln f_i}{\partial \delta} = \frac{\partial \ln f(b_i; \Sigma)}{\partial \delta}$. Similarly, the derivatives with respect to the scale parameters $\phi_1$ and $\phi_2$, if needed, are obtained by differentiating the first and second term in the summation for $\ln f_i$. This leads to simple expressions for the integrands because the functions that are differentiated are members of the exponential family (Fahrmeir and Tutz, 1994).

Note that the random effects distribution is always multivariate normal, which complicates the expressions for the second order derivatives with respect to the variance components. Jennrich and Schluchter (1986) propose an elegant approach to deal with this problem, and we now illustrate their method for the multivariate GLMM. For simplicity of notation consider bivariate random effects $b_i$.
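The finite-difference approximations above can be sketched directly. The following is an illustrative Python implementation for a generic approximated log-likelihood `l(theta)`; the function names and the default step lengths are our own assumptions.

```python
import numpy as np

def num_score(l, theta, eps=1e-5):
    """Central-difference approximation of the score vector."""
    p = len(theta)
    s = np.zeros(p)
    for i in range(p):
        e = np.zeros(p)
        e[i] = eps                        # unit vector z_i scaled by eps
        s[i] = (l(theta + e) - l(theta - e)) / (2 * eps)
    return s

def num_obs_info(l, theta, eps=1e-4):
    """Central-difference approximation of the observed information J."""
    p = len(theta)
    J = np.zeros((p, p))
    for s in range(p):
        for t in range(p):
            es = np.zeros(p); es[s] = eps
            et = np.zeros(p); et[t] = eps
            J[s, t] = -(l(theta + es + et) - l(theta + es - et)
                        - l(theta - es + et) + l(theta - es - et)) / (4 * eps ** 2)
    return J
```

Each information entry costs four log-likelihood evaluations, so with quadrature-based likelihoods the cost of numerical derivatives grows quickly with the number of parameters.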
Let
$$\delta = \begin{pmatrix} \delta_1 \\ \delta_2 \\ \delta_3 \end{pmatrix} = \begin{pmatrix} \sigma_1^2 \\ \sigma_2^2 \\ \rho \end{pmatrix} \qquad \text{and} \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}.$$
Then write
$$\Sigma = \sigma_1^2 \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + \sigma_2^2 \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} + \rho\sigma_1\sigma_2 \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
Let $\Sigma_s = \frac{\partial \Sigma}{\partial \delta_s}$, $s = 1, \ldots, 3$. Then
$$\frac{\partial \ln f_i}{\partial \delta_s} = \frac{1}{2} tr\big\{\Sigma^{-1}\Sigma_s\Sigma^{-1}(b_i b_i^T - \Sigma)\big\}$$
and
$$\frac{\partial^2 \ln f_i}{\partial \delta_s\, \partial \delta_t} = \frac{1}{2} tr\big\{\Sigma^{-1}\Sigma_s\Sigma^{-1}\Sigma_t\Sigma^{-1}(\Sigma - 2 b_i b_i^T)\big\} + \frac{1}{2} tr\Big\{\Sigma^{-1}\frac{\partial^2 \Sigma}{\partial \delta_s\, \partial \delta_t}\Sigma^{-1}(b_i b_i^T - \Sigma)\Big\}.$$
In the above parametrization of the variance-covariance matrix there are restrictions on the parameter space ($\sigma_1 > 0$, $\sigma_2 > 0$, $-1 < \rho < 1$), but the algorithm does not guarantee that the current estimates stay in the restricted parameter space, and the final estimate of $\Sigma$ can fail to be nonnegative definite. Such an outcome indicates either a numerical problem in the procedure or an inappropriate model. It is then preferable to maximize with respect to the elements of the Choleski root of $\Sigma$. The Choleski root is a lower triangular matrix $L = \begin{pmatrix} l_{11} & 0 \\ l_{21} & l_{22} \end{pmatrix}$ such that $\Sigma = LL^T$. Its elements are held unrestricted in intermediate steps of the algorithm, yet $LL^T$ is nonnegative definite by construction. Hence, when only intermediate estimates of $\Sigma$ would fall outside the parameter space, the maximization with respect to the elements of the Choleski root works well while the direct parametrization can have problems. Although the parametrization in terms of $l_{11}$, $l_{21}$ and $l_{22}$ has fewer numerical problems than the parametrization in terms of $\sigma_1$, $\sigma_2$ and $\rho$, it does not have a nice interpretation. It then seems reasonable to use the Choleski root in the maximization procedure but to perform one extra partial step of the algorithm to approximate the information matrix for the meaningful parametrization $\sigma_1$, $\sigma_2$ and $\rho$.

3.2.2 Monte Carlo EM Algorithm

Parameter Estimation

The complete data for the multivariate GLMM are $u = (y^T, b^T)^T$, and hence the complete data log-likelihood is given by
$$\ln L_u = \sum_{i=1}^{n} \sum_{j_1=1}^{n_i} \ln f_1(y_{i1j_1}\,|\,b_{i1}; \beta_1, \phi_1) + \sum_{i=1}^{n} \sum_{j_2=1}^{n_i} \ln f_2(y_{i2j_2}\,|\,b_{i2}; \beta_2, \phi_2) + \sum_{i=1}^{n} \ln f(b_i; \Sigma).$$
Notice that $(\beta_1, \phi_1)$, $(\beta_2, \phi_2)$ and $\Sigma$ appear in different terms in the log-likelihood, and therefore each M-step of the algorithm consists of three separate maximizations of
$$E\Big[\sum_{i=1}^{n}\sum_{j_1=1}^{n_i} \ln f_1(y_{i1j_1}\,|\,b_{i1}; \beta_1, \phi_1)\,\Big|\,y\Big], \quad E\Big[\sum_{i=1}^{n}\sum_{j_2=1}^{n_i} \ln f_2(y_{i2j_2}\,|\,b_{i2}; \beta_2, \phi_2)\,\Big|\,y\Big] \quad \text{and} \quad E\Big[\sum_{i=1}^{n} \ln f(b_i; \Sigma)\,\Big|\,y\Big],$$
respectively, evaluated at the current parameter estimate of $\theta$. Recall that for the regular GLMM there are two separate terms to be maximized. These conditional expectations do not have closed form expressions but can be approximated by Monte Carlo estimates:
$$\frac{1}{m}\sum_{k=1}^{m}\sum_{i=1}^{n}\sum_{j_1=1}^{n_i} \ln f_1(y_{i1j_1}\,|\,b_{i1}^{(k)}; \beta_1, \phi_1), \eqno(3.2)$$
$$\frac{1}{m}\sum_{k=1}^{m}\sum_{i=1}^{n}\sum_{j_2=1}^{n_i} \ln f_2(y_{i2j_2}\,|\,b_{i2}^{(k)}; \beta_2, \phi_2), \eqno(3.3)$$
$$\frac{1}{m}\sum_{k=1}^{m}\sum_{i=1}^{n} \ln f(b_i^{(k)}; \Sigma), \eqno(3.4)$$
where $b_i^{(k)} = (b_{i1}^{(k)}, b_{i2}^{(k)})$, $k = 1, \ldots, m$, are simulated values from the distribution of $b_i|y_i$ evaluated at the current parameter estimate of $\theta$. In order to obtain simulated values for $b_i$, multivariate rejection sampling or importance sampling can be used as proposed by Booth and Hobert (1999). At the $r$th step of the Monte Carlo EM algorithm the multivariate rejection sampling algorithm is as follows. For each subject $i$, $i = 1, \ldots, n$:
1. Compute $\tau_i = \sup_{b_i \in R^q} f(y_i|b_i; \theta^{(r-1)})$.
2. Sample $b_i^{(k)}$ from the multivariate normal density $f(b_i; \Sigma)$ and independently sample $w^{(k)}$ from Uniform(0,1).
3. If $w^{(k)} \leq f(y_i|b_i^{(k)}; \theta^{(r-1)})/\tau_i$ then accept $b_i^{(k)}$; if not, go to 2.
Iterate between (2) and (3) until $m$ generated values of $b_i$ are accepted. Once $m$ simulated values of the random effects are available, the Monte Carlo estimates of the conditional expectations can be maximized using a numerical maximization procedure such as the Newton-Raphson algorithm. Convergence can be claimed when successive values of the parameters are within some required precision. The same convergence criteria as those mentioned for Gauss-Hermite quadrature can be applied, but with a generally larger value of $\delta_1$. Booth and Hobert (1999) suggest using $\delta_1$ between 0.002 and 0.005 in order to avoid excessively large simulation sample sizes.
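The rejection sampler above is short to implement. The sketch below is a hedged illustration for one subject: `f_y_given_b` and `tau_i` are placeholders for the conditional likelihood $f(y_i|b_i)$ and its supremum, which would come from the model at the current parameter estimate.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily

def sample_bi(f_y_given_b, tau_i, Sigma, m):
    """Draw m accepted values of b_i from b_i | y_i by rejection sampling."""
    L = np.linalg.cholesky(Sigma)
    draws = []
    while len(draws) < m:
        b = L @ rng.standard_normal(Sigma.shape[0])   # b ~ N(0, Sigma) proposal
        w = rng.uniform()                             # w ~ Uniform(0, 1)
        if w <= f_y_given_b(b) / tau_i:               # accept with prob f(y|b)/tau_i
            draws.append(b)
    return np.array(draws)
```

The acceptance rate is $E[f(y_i|b_i)]/\tau_i$, so subjects whose conditional likelihood is sharply peaked relative to the prior can make the sampler slow; this is one reason the final simulation sample sizes reported later in this chapter matter for run time.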
A nice feature of the Booth and Hobert algorithm is that, because independent random sampling is used, the Monte Carlo error can be assessed at each iteration and the simulation sample size can be increased automatically. The idea is that it is inefficient to start with a large simulation sample, but as the estimates get closer to the maximum likelihood estimates the simulation sample size must be increased to provide enough precision for convergence. Booth and Hobert propose to increase the simulation sample size $m$ by $m/t$, where $t$ is some positive number, if $\theta^{(r-1)}$ lies in a 95% confidence ellipsoid constructed around $\theta^{(r)}$, that is, if
$$(\theta^{(r)} - \theta^{(r-1)})^T\, \widehat{Var}(\theta^{(r)}|\theta^{(r-1)})^{-1}\, (\theta^{(r)} - \theta^{(r-1)}) \leq c^2,$$
where $c^2$ denotes the 95th percentile of the Chi-square distribution with number of degrees of freedom equal to the dimension of the parameter vector. To describe the variance approximation used above, some notation must be introduced. Denote the maximizer of the true conditional expectation $Q = E(\ln L_u(b)\,|\,y, \theta^{(r-1)})$ by $\theta^{*(r)}$. Then
$$Var(\theta^{(r)}|\theta^{(r-1)}) \approx \big[Q_m^{(2)}(\theta^{*(r)}|\theta^{(r-1)})\big]^{-1}\, Var\big[Q_m^{(1)}(\theta^{*(r)}|\theta^{(r-1)})\big]\, \big[Q_m^{(2)}(\theta^{*(r)}|\theta^{(r-1)})\big]^{-1}.$$
Here $Q_m = \frac{1}{m}\sum_{k=1}^{m} \ln L_u(b^{(k)}|y, \theta)$, and $Q_m^{(1)}$ and $Q_m^{(2)}$ are the vector and matrix of first and second derivatives of $Q_m$ respectively. A sandwich estimate of the variance is obtained by substituting $\theta^{(r)}$ in place of $\theta^{*(r)}$ and using the estimate
$$\widehat{Var}\big[Q_m^{(1)}(\theta^{(r)}|\theta^{(r-1)})\big] = \frac{1}{m^2} \sum_{k=1}^{m} \frac{\partial \ln f(y, b^{(k)}; \theta^{(r)})}{\partial \theta}\, \frac{\partial \ln f(y, b^{(k)}; \theta^{(r)})}{\partial \theta^T}.$$
In summary, the algorithm works as follows:
1. Select an initial estimate $\theta^{(0)}$. Set $r = 0$.
2. Increase $r$ by 1. E-step: for each subject $i$, $i = 1, \ldots, n$, generate $m$ random samples from the distribution of $b_i|y_i; \theta^{(r-1)}$ using rejection sampling.
3. M-step: update the parameter estimate of $\theta$ by maximizing (3.2), (3.3) and (3.4).
4. Iterate between (2) and (3) until convergence is achieved.
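The sample-size rule above amounts to a chi-square test on the step between successive iterates. A minimal sketch, assuming the sandwich variance estimate is already available as a matrix (the function name and the default `t = 3` are our own illustrative choices):

```python
import numpy as np
from scipy.stats import chi2

def increase_sample(theta_new, theta_old, var_new, m, t=3):
    """Return the updated simulation sample size under the Booth-Hobert rule."""
    diff = theta_new - theta_old
    stat = diff @ np.linalg.inv(var_new) @ diff      # Wald-type distance
    # If the old iterate is inside the 95% ellipsoid around the new one,
    # the step is swamped by Monte Carlo error: enlarge m by roughly m/t.
    if stat <= chi2.ppf(0.95, df=len(theta_new)):
        m = m + m // t
    return m
```

When the step is large relative to its Monte Carlo variance, the sample size is left unchanged, so the simulation effort grows only as the iterates approach the maximum likelihood estimates.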
Standard Error Estimation

After the algorithm has converged, standard errors can be estimated using Louis' method of approximation of the observed information matrix (Tanner, 1993). Denote the observed data log-likelihood by $l(y, \theta)$. The observed data information matrix is then
$$-\frac{\partial^2 l(y, \theta)}{\partial \theta\, \partial \theta^T} = E\Big[-\frac{\partial^2 \ln L_u(b, y, \theta)}{\partial \theta\, \partial \theta^T}\,\Big|\,y\Big] - Var\Big[\frac{\partial \ln L_u(b, y, \theta)}{\partial \theta}\,\Big|\,y\Big]$$
$$= E\Big[-\frac{\partial^2 \ln L_u(b, y, \theta)}{\partial \theta\, \partial \theta^T} - \frac{\partial \ln L_u(b, y, \theta)}{\partial \theta}\, \frac{\partial \ln L_u(b, y, \theta)}{\partial \theta^T}\,\Big|\,y\Big] + E\Big[\frac{\partial \ln L_u(b, y, \theta)}{\partial \theta}\,\Big|\,y\Big]\, E\Big[\frac{\partial \ln L_u(b, y, \theta)}{\partial \theta^T}\,\Big|\,y\Big].$$
By simulating values $b_i^{(k)}$, $k = 1, \ldots, m$, $i = 1, \ldots, n$, from the conditional distributions of $b_i|y_i$ at the final $\hat\theta$, the conditional expectations above can be approximated by the Monte Carlo sums
$$\frac{1}{m}\sum_{k=1}^{m}\Big[-\frac{\partial^2 \ln L_u(b^{(k)}, y, \hat\theta)}{\partial \theta\, \partial \theta^T} - \frac{\partial \ln L_u(b^{(k)}, y, \hat\theta)}{\partial \theta}\, \frac{\partial \ln L_u(b^{(k)}, y, \hat\theta)}{\partial \theta^T}\Big] \quad \text{and} \quad \frac{1}{m}\sum_{k=1}^{m} \frac{\partial \ln L_u(b^{(k)}, y, \hat\theta)}{\partial \theta}.$$
In the above expressions the argument $\hat\theta$ means that the derivatives are evaluated at the final parameter estimate. Exactly as for Gauss-Hermite quadrature, $\widehat{Var}(\hat\theta_s)$ is taken to be the $s$th diagonal element of the inverse of the approximated observed information matrix.

As the simulation sample size increases, the Monte Carlo EM algorithm behaves as a deterministic EM algorithm and usually converges to the exact maximum likelihood estimates, but it may also converge to a local instead of a global maximum (Wu, 1983). A drawback of the Monte Carlo EM algorithm is that it may be very computationally intensive. Hence important alternatives to this 'exact' method for GLMM's are methods based on analytical approximations to the marginal likelihood, such as the procedures suggested by Breslow and Clayton (1993) and by Wolfinger and O'Connell (1993). Wolfinger and O'Connell propose finding approximate maximum likelihood estimates by approximating the nonnormal responses using Taylor series expansions and then solving the linear mixed model equations for the associated normal model. The extension of their method is presented in the next section.

3.2.3 Pseudo-likelihood Approach

Parameter Estimation

Let us first assume for simplicity that the shape parameters $\phi_1$ and $\phi_2$ for both response distributions are equal to 1.
By the model definition,
$$\mu_{i1} = g_1^{-1}(X_{i1}\beta_1 + Z_{i1}b_{i1}), \qquad \mu_{i2} = g_2^{-1}(X_{i2}\beta_2 + Z_{i2}b_{i2}),$$
where $g_1^{-1}$ and $g_2^{-1}$ above are evaluated componentwise at the elements of $X_{i1}\beta_1 + Z_{i1}b_{i1}$ and $X_{i2}\beta_2 + Z_{i2}b_{i2}$ respectively. Let $\hat\beta_1$, $\hat\beta_2$, $\hat b_{i1}$ and $\hat b_{i2}$, $i = 1, \ldots, n$, be some estimates of the fixed and the random effects. The corresponding mean estimates are denoted by $\hat\mu_{i1}$ and $\hat\mu_{i2}$. In a neighborhood around the fixed and random effects estimates, the errors $\epsilon_{i1} = y_{i1} - \mu_{i1}$ and $\epsilon_{i2} = y_{i2} - \mu_{i2}$ are then approximated by first order Taylor series expansions. Denote the two approximations by
$$\hat\epsilon_{i1} = y_{i1} - \hat\mu_{i1} - (g_1^{-1})'(X_{i1}\hat\beta_1 + Z_{i1}\hat b_{i1})(X_{i1}\beta_1 + Z_{i1}b_{i1} - X_{i1}\hat\beta_1 - Z_{i1}\hat b_{i1})$$
and
$$\hat\epsilon_{i2} = y_{i2} - \hat\mu_{i2} - (g_2^{-1})'(X_{i2}\hat\beta_2 + Z_{i2}\hat b_{i2})(X_{i2}\beta_2 + Z_{i2}b_{i2} - X_{i2}\hat\beta_2 - Z_{i2}\hat b_{i2}),$$
where $(g_1^{-1})'(X_{i1}\hat\beta_1 + Z_{i1}\hat b_{i1})$ and $(g_2^{-1})'(X_{i2}\hat\beta_2 + Z_{i2}\hat b_{i2})$ are diagonal matrices with elements consisting of evaluations of the first derivatives of $g_1^{-1}$ and $g_2^{-1}$. Note that because the estimates of the random effects are not consistent as the number of subjects increases, this expansion may not work well when the number of observations per subject is small. Now, as in Wolfinger and O'Connell, we approximate the conditional distributions of $\hat\epsilon_{i1}|b_i$ and $\hat\epsilon_{i2}|b_i$ by normal distributions with zero means and diagonal variance matrices $V_1(\hat\mu_{i1}) = diag(V_1(\hat\mu_{i11}), \ldots, V_1(\hat\mu_{i1n_i}))$ and $V_2(\hat\mu_{i2}) = diag(V_2(\hat\mu_{i21}), \ldots, V_2(\hat\mu_{i2n_i}))$ respectively, where $V_1(\cdot)$ and $V_2(\cdot)$ are the variance functions of the original responses. It is reasonable to assume that the two normal distributions are uncorrelated because the responses are assumed to be conditionally independent given the random effects. The normal approximation, on the other hand, may not be appropriate for some distributions, such as the Bernoulli distribution. Substituting $\hat\mu_{i1}$ and $\hat\mu_{i2}$ in the variance expressions, and using the fact that $(g_1^{-1})'(X_{i1}\hat\beta_1 + Z_{i1}\hat b_{i1}) = (g_1'(\hat\mu_{i1}))^{-1}$ and $(g_2^{-1})'(X_{i2}\hat\beta_2 + Z_{i2}\hat b_{i2}) = (g_2'(\hat\mu_{i2}))^{-1}$, the conditional distributions of $g_1'(\hat\mu_{i1})(y_{i1} - \mu_{i1})$ and $g_2'(\hat\mu_{i2})(y_{i2} - \mu_{i2})$ are also multivariate normal with appropriately transformed variance-covariance matrices.
Defining
$$v_{i1} = g_1(\hat\mu_{i1}) + g_1'(\hat\mu_{i1})(y_{i1} - \hat\mu_{i1}) \qquad \text{and} \qquad v_{i2} = g_2(\hat\mu_{i2}) + g_2'(\hat\mu_{i2})(y_{i2} - \hat\mu_{i2}),$$
it follows that
$$v_{i1}|b_i \sim N\big(X_{i1}\beta_1 + Z_{i1}b_{i1},\; g_1'(\hat\mu_{i1})V_1(\hat\mu_{i1})g_1'(\hat\mu_{i1})\big),$$
$$v_{i2}|b_i \sim N\big(X_{i2}\beta_2 + Z_{i2}b_{i2},\; g_2'(\hat\mu_{i2})V_2(\hat\mu_{i2})g_2'(\hat\mu_{i2})\big).$$
Now, recalling that $b_i$ has a multivariate normal distribution with mean 0 and variance-covariance matrix $\Sigma$, this approximation takes the form of a linear mixed model with response $Vec(v_{i1}, v_{i2})$, random effect $Vec(b_{i1}, b_{i2})$, and uncorrelated but heteroscedastic errors. Such a model is called a weighted linear mixed model with a diagonal weight matrix $W = diag(W_i)$, where $W_i = diag(W_{i1}, W_{i2})$,
$$W_{i1}^{-1} = g_1'(\hat\mu_{i1})V_1(\hat\mu_{i1})g_1'(\hat\mu_{i1}) \qquad \text{and} \qquad W_{i2}^{-1} = g_2'(\hat\mu_{i2})V_2(\hat\mu_{i2})g_2'(\hat\mu_{i2}).$$
For canonical link functions $V_1(\hat\mu_{i1}) = [g_1'(\hat\mu_{i1})]^{-1}$ and $V_2(\hat\mu_{i2}) = [g_2'(\hat\mu_{i2})]^{-1}$, and then
$$W_i^{-1} = diag\big(g_1'(\hat\mu_{i1}),\; g_2'(\hat\mu_{i2})\big).$$
Estimates of $\beta = Vec(\beta_1, \beta_2)$ and $b_i$ can be obtained by solving the mixed model equations below (Harville, 1977):
$$\begin{pmatrix} X^T W X & X^T W Z \\ Z^T W X & \tilde\Sigma^{-1} + Z^T W Z \end{pmatrix} \begin{pmatrix} \beta \\ b \end{pmatrix} = \begin{pmatrix} X^T W v \\ Z^T W v \end{pmatrix}. \eqno(3.5)$$
Here
$$X = \begin{pmatrix} X_1 \\ \vdots \\ X_n \end{pmatrix}, \quad Z = \begin{pmatrix} Z_1 & & 0 \\ & \ddots & \\ 0 & & Z_n \end{pmatrix}, \quad v = \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}, \quad b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}, \quad \tilde\Sigma = diag(\Sigma, \ldots, \Sigma),$$
and
$$X_i = \begin{pmatrix} X_{i1} & 0 \\ 0 & X_{i2} \end{pmatrix}, \qquad Z_i = \begin{pmatrix} Z_{i1} & 0 \\ 0 & Z_{i2} \end{pmatrix}, \qquad v_i = \begin{pmatrix} v_{i1} \\ v_{i2} \end{pmatrix}.$$
Variance component estimates can be obtained by numerically maximizing the weighted linear mixed model log-likelihood
$$l = -\frac{1}{2}\sum_{i=1}^{n} \ln|V_i| - \frac{1}{2}\sum_{i=1}^{n} r_i^T V_i^{-1} r_i, \eqno(3.6)$$
where $V_i = W_i^{-1} + Z_i \Sigma Z_i^T$ and $r_i = v_i - X_i\big(\sum_{i=1}^{n} X_i^T V_i^{-1} X_i\big)^{-1} \sum_{i=1}^{n} X_i^T V_i^{-1} v_i$. The algorithm then is as follows:
1. Obtain initial estimates $\hat\mu_{i1}^{(0)}$ and $\hat\mu_{i2}^{(0)}$ using the original data, $i = 1, \ldots, n$.
2. Compute the modified responses $v_{i1} = g_1(\hat\mu_{i1}) + (y_{i1} - \hat\mu_{i1})g_1'(\hat\mu_{i1})$ and $v_{i2} = g_2(\hat\mu_{i2}) + (y_{i2} - \hat\mu_{i2})g_2'(\hat\mu_{i2})$.
3. Maximize (3.6) with respect to the variance components. Stop if the difference between the new and the old variance component estimates is sufficiently small. Otherwise go to the next step.
4. Compute the mixed model estimates of $\beta$ and $b_i$ by solving the mixed model equations (3.5): $\hat\beta = \big(\sum_{i=1}^{n} X_i^T V_i^{-1} X_i\big)^{-1} \sum_{i=1}^{n} X_i^T V_i^{-1} v_i$ and $\hat b_i = \Sigma Z_i^T V_i^{-1}(v_i - X_i\hat\beta)$.
5. Compute the new estimates of $\mu_i = (\mu_{i1}, \mu_{i2})$: $\hat\mu_{i1} = g_1^{-1}(X_{i1}\hat\beta_1 + Z_{i1}\hat b_{i1})$ and $\hat\mu_{i2} = g_2^{-1}(X_{i2}\hat\beta_2 + Z_{i2}\hat b_{i2})$. Go to step 2.
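Step 2 of the algorithm above is easy to illustrate. The sketch below forms the working response and weight matrix for the Bernoulli part under the logit link, for which the canonical-link simplification applies; the function name is our own.

```python
import numpy as np

def working_response_logit(y, eta):
    """Working variate v and weight matrix W for a Bernoulli/logit response.

    y   : 0/1 responses for one subject
    eta : current linear predictor X beta + Z b
    """
    mu = 1.0 / (1.0 + np.exp(-eta))          # mu = g^{-1}(eta)
    gprime = 1.0 / (mu * (1.0 - mu))         # g'(mu) for the logit link
    v = eta + (y - mu) * gprime              # v = g(mu) + (y - mu) g'(mu)
    w_inv = gprime * (mu * (1 - mu)) * gprime   # W^{-1} = g' V(mu) g' = g'(mu)
    return v, np.diag(1.0 / w_inv)
```

Because the logit link is canonical, the weight matrix reduces to $diag(\hat\mu(1-\hat\mu))$, matching the simplification noted in the text.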
In comparison to the original Wolfinger and O'Connell algorithm, the above algorithm for multiple responses is computationally more intensive at step 3 because of the more complicated structure of the variance-covariance matrices $V_i$. As the number of response variables increases, the computational advantages of this approximate procedure over the 'exact' maximum likelihood procedures may become less pronounced. The Newton-Raphson method is a logical choice for the numerical maximization in step 3 because of its relatively fast convergence and because, as a by-product, it provides the Hessian matrix for the approximation of the standard errors of the variance components.

In the general case when $\phi_1$ and $\phi_2$ are arbitrary, the variance functions $V_1(\hat\mu_{i1})$ and $V_2(\hat\mu_{i2})$ are replaced by $\phi_1 V_1(\hat\mu_{i1})$ and $\phi_2 V_2(\hat\mu_{i2})$, and hence the weight matrices are accordingly modified:
$$W_{i1}^{*-1} = \phi_1\, g_1'(\hat\mu_{i1})V_1(\hat\mu_{i1})g_1'(\hat\mu_{i1}) \qquad \text{and} \qquad W_{i2}^{*-1} = \phi_2\, g_2'(\hat\mu_{i2})V_2(\hat\mu_{i2})g_2'(\hat\mu_{i2}).$$
The estimates of $\phi_1$ and $\phi_2$ can be obtained in step 3 together with the other variance component estimates. This approach is different from the approach taken by Wolfinger and O'Connell, who estimated the dispersion parameter for the response together with the fixed and random effects.

Standard Error Estimation

The standard errors for the regression parameter and for the random effects estimates are approximated from the linear mixed model:
$$\widehat{Var}\begin{pmatrix} \hat\beta \\ \hat b - b \end{pmatrix} = \begin{pmatrix} X^T W X & X^T W Z \\ Z^T W X & \tilde\Sigma^{-1} + Z^T W Z \end{pmatrix}^{-1}.$$
As $\Sigma$ and $W$ are unknown, estimates of the variance components are used. The standard errors for the variance components can be approximated by using exact or numerical derivatives of the likelihood function in step 3 of the algorithm. For the same reasons as discussed in Section 3.2 it is better to maximize with respect to the Choleski factor of $\Sigma$. The computation of the standard errors, however, should be carried out with respect to the original parameters $\sigma_1$, $\sigma_2$ and $\rho$ (or $\sigma_1^2$, $\sigma_2^2$ and $\sigma_{12}$). This can be done only if the final variance-covariance estimate is positive definite.
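The mapping between the Choleski parametrization used for maximization and the interpretable parametrization used for reporting can be sketched as follows; the function name is ours, and the snippet assumes the bivariate random-effects case discussed above.

```python
import numpy as np

def chol_to_corr(l11, l21, l22):
    """Map unrestricted Choleski elements to (sigma_1, sigma_2, rho)."""
    L = np.array([[l11, 0.0], [l21, l22]])
    Sigma = L @ L.T                 # always nonnegative definite by construction
    s1 = np.sqrt(Sigma[0, 0])
    s2 = np.sqrt(Sigma[1, 1])
    rho = Sigma[0, 1] / (s1 * s2)   # correlation, guaranteed in [-1, 1]
    return s1, s2, rho
```

An optimizer can move freely over $(l_{11}, l_{21}, l_{22})$ without any parameter-space constraints, and this map is applied once at the end (or in one extra partial step, as suggested above) to report $\sigma_1$, $\sigma_2$ and $\rho$ with their standard errors.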
The approach discussed in Section 3.2 for representation of the variance-covariance matrix can also be adopted here to simplify computations.

3.3 Simulated Data Example

The methods discussed in the previous sections are illustrated on one simulated data set in which the true values of the parameters are known. The data consist of $J = 10$ repeated measures of a normal and of a Bernoulli response on each of $I = 30$ subjects. No covariates are considered and a random intercept is assumed for each variable. In the notation introduced in Chapter 2,
$$y_{i1j}|b_{i1} \sim \text{indep. } N(\mu_{i1j}, \sigma^2), \qquad y_{i2j}|b_{i2} \sim \text{indep. Bernoulli}(\mu_{i2j}),$$
$$\mu_{i1j} = \beta_1 + b_{i1}, \qquad logit(\mu_{i2j}) = \beta_2 + b_{i2},$$
$$b_i = (b_{i1}, b_{i2})^T \sim MVN(0, \Sigma), \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}.$$
This means that we define a normal theory random intercept model for one of the responses, a logistic regression model with a random intercept for the other response, and we assume that the random intercepts are correlated. The data are simulated for $\sigma^2 = 1$, $\beta_1 = 4$, $\beta_2 = 1$, $\Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 0.5 \end{pmatrix}$. The parameter estimates and their standard errors for the three fitting methods discussed above are presented in Table 3.1. The methods are programmed in Ox and are run on Sun Ultra workstations. Ox is an object-oriented matrix programming language developed at Oxford University (Doornik, 1998). It is convenient because of the availability of matrix operations and a variety of predefined functions, and even more importantly because of the speed of computations.

Table 3.1. Estimates and standard errors from model fitting for the simulated data example using Gauss-Hermite quadrature (GQ), Monte Carlo EM algorithm (MCEM) and pseudo-likelihood (PL)

Parameter   True value    GQ Est.   GQ SE    MCEM Est.  MCEM SE    PL Est.   PL SE
β₁            4.0          3.72      0.19     3.72       0.20       3.72      0.19
β₂            1.0          1.08      0.18     1.08       0.19       1.08      0.17
σ²            1.0          0.97      0.08     0.97       0.08       0.97      0.08
σ₁²           1.0          0.94      0.27     0.94       0.27       0.94      0.27
σ₂²           0.5          0.39      0.26     0.41       0.35       0.35      0.22
σ₁₂           0.5          0.54      0.21     0.54       0.24       0.48      0.20
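A data set with the structure described above can be generated in a few lines. The following is an illustrative sketch, not the dissertation's Ox code; the random seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed for reproducibility

I, J = 30, 10                     # subjects, repeated measures per subject
beta1, beta2, sigma2 = 4.0, 1.0, 1.0
Sigma = np.array([[1.0, 0.5], [0.5, 0.5]])   # random-intercept covariance

# Correlated random intercepts, one pair (b_i1, b_i2) per subject.
b = rng.multivariate_normal(np.zeros(2), Sigma, size=I)

# Normal response: mu_i1j = beta1 + b_i1, error variance sigma^2.
y1 = beta1 + b[:, [0]] + np.sqrt(sigma2) * rng.standard_normal((I, J))

# Bernoulli response: logit(mu_i2j) = beta2 + b_i2.
p2 = 1.0 / (1.0 + np.exp(-(beta2 + b[:, [1]])))
y2 = rng.binomial(1, np.broadcast_to(p2, (I, J)))
```

Fitting the three procedures to data generated this way is what produces estimates like those in Table 3.1, where all methods recover the true values reasonably well apart from the downward-biased pseudo-likelihood variance components.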
All standard errors, except those for the Monte Carlo EM algorithm and those for the regression parameters in the pseudo-likelihood algorithm, are based on numerical derivatives. The identity matrix is used as an initial estimate of the covariance matrix of the random components, and the initial estimates of $\beta_1$, $\beta_2$ and $\sigma^2$ are taken to be the sample means of the normal and the Bernoulli responses and the sample variance of the normal response. Gauss-Hermite quadrature is used with the number of quadrature points in each dimension varying from 10 to 100. The MaxBFGS numerical maximization procedure in Ox is used for optimization in the Gaussian quadrature and the pseudo-likelihood algorithms, with a convergence criterion based on $\delta_1 = 0.0001$. The convergence criterion used for the Monte Carlo EM algorithm is based on $\delta_1 = 0.003$. MaxBFGS employs a quasi-Newton method for maximization developed by Broyden, Fletcher, Goldfarb and Shanno (BFGS) (Doornik, 1998), and it can use either analytical or numerical first derivatives.

The times to achieve convergence by the three methods are 30 minutes for Gaussian quadrature (GQ) with 100 quadrature points in each dimension, approximately 3 hours for the Monte Carlo EM algorithm, and 1.5 hours for the pseudo-likelihood (PL) algorithm. The numbers of iterations required are 22, 53 and 7 respectively. The final simulation sample size for the Monte Carlo EM algorithm is 9864.

As expected, the Monte Carlo EM estimates are very close to the Gaussian quadrature estimates, with comparable standard errors. Only the estimate of $\sigma_2^2$ is slightly different and has a rather large standard error, but this is due to premature stopping of the EM algorithm. This problem can be solved by increasing the required precision for convergence, but at the expense of a very large simulation sample size.
As can be seen from the table, the parameters corresponding to the normal response are well estimated by the pseudo-likelihood algorithm, but some of the estimates corresponding to the Bernoulli response are somewhat underestimated and have smaller standard errors. The estimate of the random intercept variance $\sigma_2^2$ is about 10% smaller than the corresponding Gaussian quadrature estimate, with an about 15% smaller standard error. The estimate of $\sigma_{12}$ is also about 15% smaller than the corresponding Gauss-Hermite quadrature and Monte Carlo estimates, but it is actually closer to the true value in this particular example. In general the variance component estimates for binary data obtained using the pseudo-likelihood algorithm are expected to be biased downward, as indicated by previous research on the univariate GLMM (Breslow and Clayton, 1993; Wolfinger, 1998). Hence the Wolfinger and O'Connell method should be used with caution for the extended GLMM's with at least one Bernoulli response, keeping in mind the possible attenuation of the estimates and of the standard errors.

Gauss-Hermite Quadrature: 'Exact' vs 'Numerical' Derivatives

As mentioned before, MaxBFGS can be used either with exact or with numerical first derivatives. Comparisons of these two approaches in terms of computational speed and accuracy are provided in Tables 3.2 and 3.3 respectively. The 'Iterations' columns in Table 3.2 contain the numbers of iterations, and the 'Convergence' columns show the convergence status.

Table 3.2. 'Exact' versus 'numerical' derivatives for the simulated data example: convergence using Gauss-Hermite quadrature

Number of       Exact derivatives                  Numerical derivatives
quad. points    Time (min.)  Iterations  Conv.     Time (min.)  Iterations  Conv.
10              27.4         56          no        0.3          19          strong
20              21.3         22          weak      1.1          19          strong
30              75           35          weak      2.8          23          strong
40              120          34          strong    4.8          22          strong
50              246          47          strong    7.2          22          strong
60              237          34          strong    10.9         22          strong

A convergence test is recorded as 'strong' if passed at the prespecified $\delta_1$ level, 'weak' if passed only at the lower level $\delta_1 = 0.001$, and 'no' if not passed at all. Note that with the exact derivatives strong convergence is achieved with 40 or more quadrature points.

It is rather surprising that the 'exact' Gaussian quadrature procedure does not perform better than the 'numerical' one. The parameter estimates from both procedures are the same (Table 3.3) for 40 or more quadrature points, but the 'exact' Gauss-Hermite quadrature converges much more slowly than the numerical procedure (hours as compared to minutes). The numerical procedure also has the advantage of simpler programming code, and hence for the rest of the dissertation it will be the preferred quadrature method.

Monte Carlo EM Algorithm: Variability of Estimated Standard Errors

We observed that the estimated Monte Carlo EM standard errors varied somewhat for different random seeds, so we generated 100 samples with the final simulation sample size and computed the average and the standard deviation of the standard error estimates for these samples. Table 3.4 shows the results. We include the Gaussian quadrature standard error estimates for comparison. There is much variability in the standard error estimates (especially in the estimate of $\sigma_2^2$), which is primarily due to several extreme observations.

Table 3.3. 'Exact' versus 'numerical' derivatives for the simulated data example: estimates and standard errors using Gauss-Hermite quadrature with varying numbers of quadrature points (m)
         Exact derivatives                        Numerical derivatives
 m       β₁    β₂    σ²    σ₁²   σ₂²   σ₁₂        β₁    β₂    σ²    σ₁²   σ₂²   σ₁₂
10 Est.  3.68  1.06  0.97  0.98  0.38  0.52       3.72  1.08  0.97  0.85  0.35  0.48
   SE    0.13  0.16  0.08  0.28  0.25  0.22       0.12  0.16  0.08  0.16  0.24  0.15
20 Est.  3.75  1.10  0.96  0.95  0.39  0.54       3.75  1.10  0.97  0.95  0.39  0.54
   SE    0.16  0.18  0.08  0.27  0.26  0.21       0.16  0.18  0.08  0.27  0.26  0.21
30 Est.  3.71  1.07  0.97  0.94  0.39  0.54       3.71  1.07  0.97  0.95  0.39  0.54
   SE    0.19  0.18  0.08  0.27  0.26  0.21       0.19  0.18  0.08  0.29  0.27  0.22
40 Est.  3.72  1.08  0.97  0.94  0.39  0.54       3.72  1.08  0.97  0.94  0.39  0.54
   SE    0.18  0.18  0.08  0.27  0.26  0.21       0.18  0.18  0.08  0.26  0.26  0.20
50 Est.  3.72  1.08  0.97  0.94  0.39  0.54       3.72  1.08  0.97  0.94  0.39  0.54
   SE    0.19  0.18  0.08  0.27  0.26  0.21       0.19  0.18  0.08  0.27  0.26  0.21
60 Est.  3.72  1.08  0.97  0.94  0.39  0.54       3.72  1.08  0.97  0.94  0.39  0.54
   SE    0.19  0.18  0.08  0.27  0.26  0.21       0.19  0.18  0.08  0.27  0.26  0.21

If the simulation sample size is increased by one-third, the standard error estimates are more stable (see the last two columns of Table 3.4). It is not surprising that there is more variability in the standard error estimates than in the parameter estimates, because only the latter are controlled by the convergence criterion. So it may be necessary to increase the simulation sample size to obtain better estimates of the standard errors. This, however, requires additional computational effort, and it is not clear by how much the simulation sample size should be increased. We consider a different approach to dealing with standard error instability in the next section of the dissertation.

Comparison of Simultaneous and Separate Fitting of the Response Variables

To investigate the possible efficiency gains of joint fitting of the two responses over separate fitting, we compared estimates from the joint fit to estimates from the two separate fits. The results for the three estimation methods are provided in Table 3.5 (Gaussian quadrature), Table 3.6 (Monte Carlo EM) and Table 3.7 (pseudo-likelihood).

Table 3.4.
Variability in Monte Carlo EM standard errors for the simulated data example. Means and standard deviations of the standard error estimates are computed for 100 samples using two different simulation sample sizes. The Gauss-Hermite quadrature standard errors are given as a reference in the second column.

Parameter   GQ SE   Mean SE (ss=9864)   SD of SE   Mean SE (ss=13152)   SD of SE
β₁           0.19      0.21              0.07         0.19               0.03
β₂           0.18      0.20              0.09         0.19               0.03
σ²           0.08      0.08              0.0008       0.08               0.0001
σ₁²          0.27      0.27              0.03         0.27               0.01
σ₂²          0.26      0.41              0.76         0.31               0.13
σ₁₂          0.21      0.22              0.04         0.22               0.02

Table 3.5. Results from joint and separate Gauss-Hermite quadrature estimation of the two response variables in the simulated data example

Parameter   True value   Joint Est.   Joint SE   Separate Est.   Separate SE
β₁            4.0          3.72         0.19        3.72            0.19
β₂            1.0          1.08         0.18        1.07            0.18
σ²            1.0          0.97         0.08        0.97            0.08
σ₁²           1.0          0.94         0.27        0.94            0.27
σ₂²           0.5          0.39         0.26        0.37            0.25
σ₁₂           0.5          0.54         0.21        --              --

For the separate fits we used PROC NLMIXED for Gaussian quadrature and wrote our own programs in Ox for Monte Carlo EM and pseudo-likelihood. Notice that when the normal response is considered separately, the corresponding model is a one-way random effects ANOVA and can be fitted using PROC MIXED in SAS, for example. The estimates from PROC MIXED are as follows: $\hat\beta_1 = 3.72$ (SE = 0.19), $\hat\sigma^2 = 0.97$ (SE = 0.08) and $\hat\sigma_1^2 = 0.94$ (SE = 0.27). It is surprising that although the estimated correlation between the random effects is rather high ($\hat\rho = 0.86$), the estimates and the estimated standard errors from the joint and from the separate fits are very similar. Bearing in mind that it is much faster to fit the responses separately and that software exists for univariate repeated measures, it is not clear whether there are any advantages to joint fitting in this particular example.

Table 3.6.
Results from joint and separate Monte Carlo EM estimation of the two response variables in the simulated data example

Parameter   True value   Joint Est.   Joint SE   Separate Est.   Separate SE
β₁            4.0          3.72         0.20        3.73            0.22
β₂            1.0          1.08         0.19        1.08            0.18
σ²            1.0          0.97         0.08        0.97            0.08
σ₁²           1.0          0.94         0.27        0.95            0.28
σ₂²           0.5          0.41         0.35        0.38            0.23
σ₁₂           0.5          0.54         0.24        --              --

Table 3.7. Results from joint and separate pseudo-likelihood estimation of the two response variables in the simulated data example

Parameter   True value   Joint Est.   Joint SE   Separate Est.   Separate SE
β₁            4.0          3.72         0.19        3.72            0.19
β₂            1.0          1.08         0.17        1.05            0.17
σ²            1.0          0.97         0.08        0.97            0.08
σ₁²           1.0          0.94         0.27        0.94            0.27
σ₂²           0.5          0.35         0.22        0.35            0.22
σ₁₂           0.5          0.48         0.20        --              --

Advantages and disadvantages of joint and separate fitting will be discussed in Chapter 5 of the dissertation.

3.4 Applications

3.4.1 Developmental Toxicity Study in Mice

In this section the 'exact' maximum likelihood methods are applied to the ethylene glycol (EG) example mentioned in the introduction. The model is defined as follows ($d_i$ denotes the exposure level of EG for the $i$th dam):
$y_{i1j}$ -- birth weight of the $j$th live fetus in the $i$th litter;
$y_{i2j}$ -- malformation status of the $j$th live fetus in the $i$th litter;
$$y_{i1j}|b_{i1} \sim \text{indep. } N(\mu_{i1j}, \sigma^2), \qquad y_{i2j}|b_{i2} \sim \text{indep. Bernoulli}(\mu_{i2j}).$$

Table 3.8. Gauss-Hermite quadrature and Monte Carlo EM estimates and standard errors from model fitting with a quadratic effect of dose on birth weight in the ethylene glycol data

Parameter   GQ Est.   GQ SE    MCEM Est.   MCEM SE
β₁₀          0.962     0.016     0.964      0.037
β₁₁          0.118     0.028     0.120      0.053
β₁₂          0.010     0.009     0.010      0.015
β₂₀          4.267     0.409     4.287      0.455
β₂₁          1.720     0.207     1.731      0.217
σ²           0.006     0.0003    0.006      0.0003
σ₁²          0.007     0.001     0.007      0.001
σ₂²          2.287     0.596     2.238      0.560
σ₁₂          0.082     0.020     0.082      0.022

Table 3.9.
Gauss-Hermite quadrature and Monte Carlo EM estimates and standard errors from model fitting with linear trends of dose in the ethylene glycol data

Parameter   GQ (logit)       MCEM (logit)     GQ (probit)      MCEM (probit)
            Est     SE       Est     SE       Est     SE       Est     SE
β₁₀         0.952   0.014    0.952   0.021    0.952   0.014    0.954   0.028
β₁₁         0.087   0.008    0.088   0.008    0.087   0.008    0.088   0.012
β₂₀         4.335   0.411    4.335   0.522    2.340   0.213    2.422   0.269
β₂₁         1.749   0.208    1.752   0.225    0.970   0.111    0.982   0.129
σ           0.075   0.002    0.075   0.002    0.075   0.002    0.075   0.002
σ₁          0.086   0.007    0.086   0.007    0.086   0.007    0.086   0.007
σ₂          1.513   0.196    1.495   0.207    0.839   0.107    0.831   0.101
ρ           0.682   0.088    0.687   0.091    0.666   0.090    0.670   0.094

The model is
$$\mu_{i1j} = \beta_{10} + \beta_{11} d_i + b_{i1}, \qquad logit(\mu_{i2j}) = \beta_{20} + \beta_{21} d_i + b_{i2},$$
$$b_i = (b_{i1}, b_{i2})^T \sim MVN(0, \Sigma), \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}.$$
We considered a linear trend in dose in both linear predictors. Although other authors have found a significant quadratic dose trend for birth weight, we found it to be nonsignificant and hence have not included it in the final model (see Table 3.8). We also considered a probit link. The Gaussian quadrature and Monte Carlo EM parameter estimates from the model fits are given in Table 3.9. Initial estimates for the regression parameters were the estimates from the fixed effects models for the two responses, $\sigma$ was set equal to the estimated standard deviation from the linear mixed model for birth weight, and the identity matrix was taken as an initial estimate of the variance-covariance matrix of the random effects. Gaussian quadrature estimates were obtained using 100 quadrature points in each dimension, which took 36 hours for both the logit and the probit links. Adaptive Gaussian quadrature would probably be much faster because it requires a smaller number of quadrature points. The Monte Carlo EM algorithm for the model with the logit link ran for 3 hours and 6 minutes, and required 41 iterations and a final simulation sample size of 989 for convergence.
The Monte Carlo EM algorithm for the model with the probit link ran for 5 hours and 13 minutes, and required 37 iterations and a final simulation sample size of 1757 for convergence. The convergence precisions were the same as in the simulated data example. The estimates from the two algorithms are similar, but the standard error estimates from the Monte Carlo EM algorithm are larger than their counterparts from Gaussian quadrature. This is not always the case, as seen from Table 3.10. There is much variability in the standard error estimates from the Monte Carlo EM algorithm. The estimates can be improved by considering larger simulation sample sizes, but we adopt a different approach. The observed information matrix at each step of the algorithm is easily approximated using Louis' method, and this does not require extra computational effort because it relies on approximations needed for the sample size increase decision. Because the EM algorithm converges slowly in close proximity to the estimates, and because the convergence criterion must be satisfied in three consecutive iterations, the approximated observed information matrices from the last three iterations are likely to be equally good in approximating the variance of the parameter estimates. Hence if we average the three matrices we are likely to obtain a better estimate of the variance-covariance matrix than if we rely on only one particular iteration.

Table 3.10. Variability in Monte Carlo EM standard errors for the ethylene glycol example. Means and standard deviations of the standard error estimates are computed for 100 samples for the logit and probit models. Gauss-Hermite quadrature standard errors are given as a reference.
Parameter   Logit model                        Probit model
            GQ      Mean SE   SD of SE        GQ      Mean SE   SD of SE
β₁₀         0.014   0.015     0.032           0.014   0.014     0.021
β₁₁         0.008   0.008     0.015           0.008   0.007     0.010
β₂₀         0.411   0.440     0.500           0.213   0.200     0.137
β₂₁         0.208   0.206     0.207           0.111   0.101     0.063
σ           0.002   0.002     0.0001          0.002   0.002     0.00002
σ₁          0.007   0.007     0.0005          0.007   0.007     0.0001
σ₂          0.196   0.212     0.115           0.107   0.110     0.020
ρ           0.088   0.094     0.052           0.090   0.092     0.012

Tables 3.11 and 3.12 give standard error estimates for the EG data based on Gaussian quadrature (column GQ), on the Monte Carlo approximated observed information matrix in the third to last (column A1), second to last (column A2) and last (column A3) iterations, and on averages of the observed information matrices of the last two (column A4) and the last three (column A5) iterations. Clearly, using only the estimate of the observed information matrix from the last iteration is not satisfactory, because it depends heavily on the random seed and may contain some negative standard error estimates. Averaging over the three final iterations is better, although it is not guaranteed that even the pooled estimate of the observed information matrix will be positive definite, or that it will lead to improved estimates.

All parameter estimates in the model are significantly different from zero, with birth weight significantly decreasing with increasing dose and the probability of malformation significantly increasing with increasing dose. As expected, the regression parameter estimates using the logit link function are greater than those obtained using the probit link function. The use of the logit link function facilitates the interpretation of the parameter estimates. Thus, if the EG dose level for a litter is increased by 1 g/kg, the estimated odds of malformation for any fetus within that litter increase $\exp(1.75) = 5.75$ times.

Table 3.11. Monte Carlo EM standard errors using logit link in the ethylene glycol example.
The approximations are based on the estimate of the observed information matrix from the third to last iteration (A1), second to last iteration (A2), and last iteration (A3), and on the average of the last two iterations (A4) and of the last three iterations (A5). The standard error estimates obtained using Gauss-Hermite quadrature are given as a reference in the column labeled GQ.

Parameter     GQ      A1      A2      A3      A4      A5
β₁₀           0.014   0.017   0.037   0.012   0.016   0.016
β₁₁           0.008   0.007   0.014   0.001   0.008   0.007
β₂₀           0.411   0.385   0.429   0.365   0.409   0.396
β₂₁           0.208   0.183   0.239   0.215   0.236   0.206
σ             0.002   0.002   0.002   0.002   0.002   0.002
σ₁            0.007   0.007   0.007   0.007   0.007   0.007
σ₂            0.196   0.199   0.202   0.221   0.204   0.199
ρ             0.088   0.081   0.084   0.136   0.096   0.089

Table 3.12. Monte Carlo EM standard errors using probit link in the ethylene glycol example. The approximations are based on the estimate of the observed information matrix from the third to last iteration (A1), second to last iteration (A2), and last iteration (A3), and on the average of the last two iterations (A4) and of the last three iterations (A5). The standard error estimates obtained using Gauss-Hermite quadrature are given as a reference in the column labeled GQ.

Parameter     GQ      A1      A2      A3      A4      A5
β₁₀           0.014   0.021   0.021   0.006   0.016   0.017
β₁₁           0.008   0.012   0.007   0       0.008   0.009
β₂₀           0.213   0.273   0.227   0       0.243   0.233
β₂₁           0.111   0.151   0.110   0       0.132   0.127
σ             0.002   0.002   0.002   0.002   0.002   0.002
σ₁            0.007   0.008   0.007   0.007   0.007   0.007
σ₂            0.107   0.127   0.097   0       0.124   0.110
ρ             0.090   0.094   0.102   0       0.101   0.095

exp(1.75) = 5.75 times. The probit link function does not offer a similarly easy interpretation, but it allows one to obtain population-averaged estimates for dose. If the subject-specific regression estimate is β̂, then the marginal regression parameters are estimated by β̂/√(1 + σ̂₂²) (Section 2.1). Hence, we obtain a marginal intercept of 1.793 and a marginal slope of 0.743.
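Both link-specific interpretations can be reproduced numerically from the joint-fit estimates in Table 3.13. This is a minimal check, with the magnitudes as printed in the tables (signs as reported):

```python
import math

# Joint Gauss-Hermite fit estimates from Table 3.13 (magnitudes as printed):
# logit dose slope 1.749; probit intercept 2.340, probit slope 0.970,
# random-effect standard deviation for malformation sigma2 = 0.839.
beta21_logit = 1.749
odds_multiplier = math.exp(beta21_logit)        # about 5.75 per g/kg of EG

def marginal_probit(beta, sigma2):
    """Attenuate a subject-specific probit coefficient to its
    population-averaged counterpart: beta / sqrt(1 + sigma2**2)."""
    return beta / math.sqrt(1.0 + sigma2 ** 2)

marg_intercept = marginal_probit(2.340, 0.839)  # about 1.793
marg_slope = marginal_probit(0.970, 0.839)      # about 0.743
```

The attenuation factor √(1 + σ̂₂²) ≈ 1.305 is the same for every probit coefficient, which is why the marginal estimates are simply scaled-down versions of the subject-specific ones.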
The latter can be interpreted as the amount by which the population-averaged probability of malformation changes on the probit scale for each unit increase in dose. The subject-specific slope estimate of 0.970, on the other hand, is interpreted as the increase in the individual fetus probability of malformation on the probit scale for each unit increase in dose. A more meaningful interpretation of these numbers can be offered if one assumes a continuous underlying malformation variable for the observed binary malformation outcome. The latent variable reflects a cumulative detrimental effect which manifests itself in a malformation if it exceeds a certain threshold. The effect of dose on this underlying latent variable is linear, and hence β̂₂₁ is interpreted as the amount by which the cumulative effect increases for a one unit increase in dose. β̂₂₁ can also be interpreted as the highest rate of change in the probability of malformation (Agresti, 1990, p. 103). This highest rate of change is estimated to be achieved at the dose at which the estimated probability of malformation equals 1/2. As expected, birth weight and malformation are significantly negatively correlated, as judged from the Wald test based on Table 3.9: (ρ̂/SE(ρ̂))² = (0.666/0.090)² = 54.8 (p-value < 0.0001). This translates into a negative correlation between the responses, but there is no closed form expression to estimate it precisely. It is interesting to compare the joint and the separate fits of the two response variables (Tables 3.13 and 3.14). Table 3.13 presents estimates obtained using Gaussian quadrature and shows that the estimates for the Normal response are identical (within the reported precision), but the estimates for the Bernoulli response from the separate fits are generally larger in absolute value, with larger standard errors. Table 3.13. Results from joint and separate Gauss-Hermite quadrature fits of the two response variables in the ethylene glycol example.
Parameter     Logit models                        Probit models
              Joint fit         Separate fits     Joint fit         Separate fits
              Est.     SE       Est.     SE       Est.     SE       Est.     SE
β₁₀           0.952    0.014    0.952    0.014    0.952    0.014    0.952    0.014
β₁₁           0.087    0.008    0.087    0.008    0.087    0.008    0.087    0.008
β₂₀           4.335    0.411    4.356    0.438    2.340    0.213    2.426    0.230
β₂₁           1.749    0.208    1.779    0.220    0.970    0.111    0.993    0.118
σ             0.075    0.002    0.075    0.002    0.075    0.002    0.075    0.002
σ₁            0.086    0.007    0.086    0.007    0.086    0.007    0.086    0.007
σ₂            1.513    0.196    1.577    0.211    0.839    0.107    0.881    0.117
ρ             0.682    0.088                      0.666    0.090

Table 3.14. Results from joint and separate fitting of the Monte Carlo EM algorithm for the two response variables in the ethylene glycol example.

Parameter     Logit models                        Probit models
              Joint fit         Separate fits     Joint fit         Separate fits
              Est.     SE       Est.     SE       Est.     SE       Est.     SE
β₁₀           0.952    0.021    0.952    0.014    0.954    0.028    0.952    0.014
β₁₁           0.088    0.008    0.088    0.007    0.088    0.012    0.088    0.007
β₂₀           4.335    0.522    4.453    0.344    2.422    0.269    2.499    0.193
β₂₁           1.752    0.225    1.822    0.211    0.982    0.129    1.025    0.098
σ             0.075    0.002    0.075    0.002    0.075    0.002    0.075    0.002
σ₁            0.086    0.007    0.086    0.007    0.086    0.007    0.086    0.007
σ₂            1.495    0.207    1.547    0.199    0.831    0.101    0.869    0.052
ρ             0.687    0.091                      0.670    0.094

This may indicate small efficiency gains from fitting the responses together rather than separately. More noticeable differences in the parameter estimates, and especially in the standard errors, are observed in the results from the Monte Carlo EM algorithm (Table 3.14). In this case the standard error estimates for the Bernoulli components from the separate fits are smaller than their counterparts from the joint fits. These results, however, may be deceiving, because there is large variability in the standard error estimates depending on the particular simulation sample used (see Table 3.10). As a result of the conditional independence assumption, the correlation between birth weight and malformation within a fetus and the correlation between birth weight and malformation measured on two different fetuses within the same litter are assumed to be the same. However, in practice this assumption may not be satisfied.
We would expect measurements on the same fetus to be more highly correlated than measurements on two different fetuses within a litter. Hence, it is very important to be able to check the conditional independence assumption and to investigate how departures from it affect the parameter estimates. In the following chapter, score tests are used to check aspects of conditional independence without fitting more complicated models. When the score tests show a nonnegligible departure from this assumption, alternative models should be considered. In the case of a binary and a continuous response, one more general model is presented in Chapter 5. It has been considered by Catalano and Ryan (1992), who used GEE methods to fit it. We propose "exact" maximum likelihood estimation, which allows for direct estimation of the variance-covariance structure. 3.4.2 Myoelectric Activity Study in Ponies The second data set, introduced in Section 2.4, is from a study on myoelectric activity in ponies. The purpose of this analysis is to simultaneously assess the immediate effects of six drugs and placebo on spike burst rate and spike burst duration within a pony. Two features of this data example are immediately obvious from Table 2.2: there is a large number of observations within each cluster (pony), and the variance of the spike burst rate response for each pony is much larger than the mean. The implication of the first observation is that ordinary Gauss-Hermite quadrature as described in Section 3.2.1 may not work well, because some of the integrands will be essentially zero. Adaptive Gaussian quadrature, on the other hand, will be more appropriate because it requires a smaller number of quadrature points. Moreover, the additional computations needed to obtain the modes and curvatures will not slow down the algorithm, because there are only six subjects in the data set and hence only six additional maximizations will be performed.
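The point about integrands that are essentially zero at the default nodes can be seen in a toy one-dimensional sketch (an assumed example, not the pony data): ordinary Gauss-Hermite places its nodes near zero, while the adaptive rule of Liu and Pierce (1994) recenters them at the integrand's mode and rescales by its curvature.

```python
import math
import numpy as np

def adaptive_gh(f, mode, scale, n=5):
    """Adaptive Gauss-Hermite approximation of the integral of f over the
    real line: recenter the standard nodes at `mode` and rescale by `scale`
    (the curvature-based spread), in the spirit of Liu and Pierce (1994)."""
    x, w = np.polynomial.hermite.hermgauss(n)
    t = mode + math.sqrt(2.0) * scale * x
    return math.sqrt(2.0) * scale * float(np.sum(w * np.exp(x ** 2) * f(t)))

def plain_gh(f, n=5):
    """Ordinary Gauss-Hermite rule, centered at 0 with unit scale."""
    return adaptive_gh(f, mode=0.0, scale=1.0, n=n)

def f(b):
    """A Gaussian-shaped integrand with its mass near b = 3, mimicking a
    likelihood contribution that is essentially zero at the default nodes.
    Its true integral is exactly 1."""
    return np.exp(-(b - 3.0) ** 2 / (2 * 0.25)) / math.sqrt(2 * math.pi * 0.25)

adaptive = adaptive_gh(f, mode=3.0, scale=0.5, n=5)  # essentially exact
plain = plain_gh(f, n=5)                             # badly off with 5 nodes
```

With only five nodes the recentered rule recovers the integral essentially exactly, while the ordinary rule, whose nodes sit where the integrand is negligible, misses by a large margin; this is why the adaptive rule can get by with far fewer quadrature points.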
Adaptive Gaussian quadrature is described in the next subsection of the dissertation. The implication of the second observation concerns the response distribution for the spike burst rate. An obvious choice for count data is the Poisson distribution, but the Poisson distribution imposes equality of the mean and the variance of the response, and this assumption is clearly not satisfied for these data even within a pony. Therefore, some way of accounting for the extra dispersion, in addition to the pony effect, must be incorporated in the model. Such an approach, based on the negative binomial distribution, is discussed later in this chapter. Adaptive Gaussian Quadrature Adaptive Gaussian quadrature has been considered by Liu and Pierce (1994), Pinheiro and Bates (1999), and Wolfinger (1998). We now present the idea in the context of the bivariate GLMM. Recall that the likelihood for the ith subject has the form

L_i = ∫_{R^q} f(y_i, b_i) db_i,

where

f(y_i, b_i) = { ∏_{j=1}^{n_i} f(y_{ij1} | b_i; β₁) f(y_{ij2} | b_i; β₂) } (2π)^{-q/2} |Σ|^{-1/2} exp(-½ b_i' Σ^{-1} b_i).

Let b̂_i be the mode of f(y_i, b_i) and let Γ̂_i = ( -∂² log f(y_i, b_i) / ∂b_i ∂b_i' evaluated at b_i = b̂_i )^{-1}. Then

L_i = ∫_{R^q} f*(y_i, b_i) φ(b_i; b̂_i, Γ̂_i) db_i,

where

f*(y_i, b_i) = f(y_i, b_i) / φ(b_i; b̂_i, Γ̂_i),

and
