SUBJECT-SPECIFIC MODELLING OF CAPTURE-RECAPTURE EXPERIMENTS

By

BRENT ANDREW COULL

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
1997

To Jill and my parents

ACKNOWLEDGEMENTS

I would like to express my sincere gratitude to Dr. Alan Agresti for serving as my dissertation advisor. Not only does his hard work and obvious enthusiasm for statistical research serve as an excellent example for his students, but he has shown an inordinate amount of interest in me and my professional development as a statistician well beyond that required of any advisor. I consider him a valued friend. I would also like to thank Drs. James Booth, James Hobert, Ramon Littell, and Michel Ochi for serving on my committee, and Dr. Craig Osenberg for agreeing to sit in on my final examination on such short notice.

I thank my mother and father for their continual love and support. Their work ethic and compassion for other people make them worthy role models. I am also lucky to have married into the Cubbedge family. They treat me as if I am one of their own. Jill and I are truly blessed to have such a wonderful family.

Finally, I could not have completed this work without the unwavering love, patience, and support of my wife, Jill. She has always had the confidence that I would complete this paper, even when I did not, and has made many sacrifices so that I could do so. I will forever be grateful.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS

ABSTRACT

CHAPTERS

1 FORMULATION OF THE CAPTURE-RECAPTURE PROBLEM

2 OVERVIEW OF THE EXISTING CAPTURE-RECAPTURE LITERATURE
2.1 Existing Models
2.2 Maximum Likelihood Estimation of N
2.3 Methods for Constructing Confidence Intervals for N

3 CAPTURE-RECAPTURE MODELS ASSUMING LOCAL INDEPENDENCE
3.1 A Logistic Model with Subject Heterogeneity
3.2 A Latent Class Model
3.3 ML Estimation of N
3.4 Snowshoe Hare Example
3.5 Behavior of the Log-Likelihood and N-Estimator
3.6 Similarities to Other N-Estimation Problems
3.7 Comments

4 ALTERNATIVE FORMS OF DEPENDENCE
4.1 Serial Dependence
4.2 An Overdispersed Poisson Loglinear Model
4.3 The Multivariate Logit-Normal Model
4.4 Conclusions

5 SIMULATION STUDIES
5.1 Numerical Optimization and the Bootstrap
5.2 $\hat{N}_C$ and the Bootstrap
5.3 $\hat{N}_C$ and the Profile Likelihood Confidence Interval
5.4 Narrow Intervals vs. Attained Nominal Confidence
5.5 Recommendations

6 CONCLUSIONS
6.1 Summary of Results
6.2 Future Research

REFERENCES

BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

SUBJECT-SPECIFIC MODELLING OF CAPTURE-RECAPTURE EXPERIMENTS

By Brent Andrew Coull

December 1997

Chairman: Alan G. Agresti
Major Department: Statistics

A capture-recapture experiment that successively samples individuals from some population of interest is the most popular method for estimating population size. These experiments have found application in wildlife ecology, census undercount, and epidemiological studies. We consider capture-recapture models appropriate for estimating N in a closed population when the probability of capture varies across both the N subjects (population heterogeneity) and the t sampling occasions. We investigate the performance of mixture models, loglinear models, and a latent class model for a variety of dependence structures among the t sampling occasions. In particular, we look at situations when population heterogeneity is the only source of dependence among the t samples, situations when positive or negative within-subject dependencies exist, and situations when both sources of dependence exist.

We demonstrate that two mixed models, the logistic-normal model and a logistic-normal model with a serial dependence term, and the latent class model experience a near-nonidentifiability problem when most of the "captured" subjects are observed on only one sampling occasion. We draw parallels between this phenomenon and other well-studied N-estimation problems. When this is the case, the loglinear models of homogeneous two-factor interaction and homogeneous two-factor interaction plus a serial dependence term prove useful in that they yield confidence intervals for N that are narrower than those resulting from the mixed models while maintaining close-to-nominal coverage. These simpler models can also describe a wide range of dependence structures among the t occasions (including negative dependence structures) and are easy to fit. We present an alternative mixed model, an overdispersed Poisson model, that shows promise in estimating N when most subjects are observed on only one occasion. An example shows that this model avoids the nonidentifiability problems incurred by the other mixed models. A multivariate logit-normal model is introduced that accounts for correlations between the t responses by specifying that the vector of logit capture probabilities is distributed as a multivariate normal random vector.

We also compare methods for constructing interval estimates for N. Although the bootstrap is probably the most popular method in the capture-recapture literature right now, we give several reasons why profile likelihood intervals are to be preferred when used in conjunction with the above models.

CHAPTER 1
FORMULATION OF THE CAPTURE-RECAPTURE PROBLEM

The capture-recapture (CR) problem is one of estimating the size, N, of some population of interest by way of successive sampling and recording of individuals in the population. The earliest models obtained an estimate for the number of subjects in the population under the restrictive assumption that the probability of capture stays constant for all occasions and for all subjects.
Researchers quickly realized that these assumptions, while mathematically convenient for obtaining a numerical estimate, were rarely biologically realistic. Thus, in the last 40 years, an enormous body of literature has been developed to explore ways to relax one or both of these assumptions.

The goal of capture-recapture experiments is typically that of estimating animal abundance in some habitat, in which case the "subjects" are animals and the sampling occasions refer to trappings. Researchers, however, are finding other areas of application for capture-recapture methods. Often called dual-system estimation, capture-recapture methodology has been used in census undercount estimation (Alho et al., 1993; Darroch et al., 1993). An individual is "captured" if listed in the census or a supplementary post-enumeration survey, which together represent the multiple samples. Capture-recapture methodology is just recently finding a place in epidemiology. In this setting, the population of interest is the set of individuals with a certain condition or disease, and the sampling occasions correspond to incomplete lists from different sources. Wittes (1974) first proposed using capture-recapture methods to estimate the size of an infected group in an epidemiological setting. Baker (1990) used capture-recapture methods to analyze data on multiple screenings for the early detection of breast cancer in women, while Regal and Hook (1984, 1991) estimated the number of spina bifida cases in upstate New York. Chao and Tsay (1996a, 1996b) focused on capture-recapture methods applicable exclusively to epidemiological data and attempted to estimate the number of subjects who contracted the hepatitis A virus during an outbreak in northern Taiwan. The International Society for Disease Monitoring and Forecasting (1995a,b) detailed CR applications in human diseases and called for future research in this area.

Throughout this dissertation, we consider a general t-sample capture-recapture experiment in which the animals are marked or recorded in such a way that the number of subjects with a particular pattern of being observed or not being observed at each sampling occasion is available. These patterns are often referred to as the "capture histories" in the capture-recapture literature. The data from such experiments can be represented in a $2^t$ contingency table, formed by cross-classifying the capture status of a subject on each of the t occasions. For instance, if we denote a capture at sampling occasion i as 1 and a noncapture as 0, a t = 3 occasion experiment gives rise to the following $2^3$ contingency table:

                   occ. 3 = 1                occ. 3 = 0
              occ. 2 = 1   occ. 2 = 0   occ. 2 = 1   occ. 2 = 0
occ. 1 = 1    $n_{111}$    $n_{101}$    $n_{110}$    $n_{100}$
occ. 1 = 0    $n_{011}$    $n_{001}$    $n_{010}$    $n_{000} = ?$

Thus, $n_{111}$, for instance, is the number of subjects having capture history (1,1,1), or captured at all three occasions. In this representation the problem of estimating N is equivalent to estimating $n_{000}$, the number of subjects unobserved in all t samples. We obtain an estimate of this unknown quantity through models which place assumptions on the capture probabilities. The estimate of $n_{000}$ is then the fitted cell count for cell (0, 0, 0) obtained from the model.

More generally, we use the following notation throughout this dissertation for a t-sample experiment. Let $\mathcal{Z} = \{(1,\ldots,1),\ldots,(0,\ldots,0)\}$ denote the set of $2^t$ possible capture histories in lexicographic order, and $\bar{\mathcal{Z}} = \{(1,\ldots,1),\ldots,(0,\ldots,0,1)\}$ the set of observable histories.
Let $\mathbf{i} = (i_1,\ldots,i_t)$ be an element of $\mathcal{Z}$, and let $n_{\mathbf{i}} = n_{i_1\cdots i_t}$ be the number of subjects having that sequence. We let $\mathbf{n} = (n_{1\cdots 1},\ldots,n_{0\cdots 0})$ denote the vector of cell counts, $\tilde{\mathbf{n}} = (n_{1\cdots 1},\ldots,n_{0\cdots 01})$ the vector of observable cell counts, and the scalar $n = \sum_{\mathbf{i}\in\bar{\mathcal{Z}}} n_{\mathbf{i}}$ the number of observed subjects in the experiment.

The problem of estimating the unobserved cell count $n_{0\cdots 0}$ from a model fit to the observed $2^t - 1$ cell counts is inherently one of extrapolation. That is, in "estimating" $n_{0\cdots 0}$, one extrapolates from the range of the observed data, the numbers of subjects having $1, 2, \ldots, t$ captures, to the number of subjects with 0 captures. One can think of this problem as similar to the famous Space Shuttle data in which one fits the probability of O-ring failure as a function of temperature in the range of 53 to 81 degrees, and then seeks to predict the probability of O-ring failure at 31 degrees (Dalal et al., 1989; Lavine, 1991). Another analogous problem is that of low-dose extrapolation (Prentice, 1976; Brown, 1978). In carcinogenic studies, researchers are often not able to employ enough patients to estimate accurately the probability of an event for low doses of a drug. Thus, researchers must apply larger doses of the substance under question, fit a model to the observed range of doses, and extrapolate the fit either to predict the effect of a specific low dose of interest or to determine a minimum effective dose.

In extrapolation problems such as these, different models that provide essentially equally good fits to the observed data can lead to drastically different predicted values at settings outside the observed range of covariates. Thus, we see that population size estimates can differ drastically depending on the selected model. It can also happen that a model with a poor fit to the observed data extrapolates to the settings of interest much more accurately than a better-fitting model. Thus, the standard techniques for model selection based on goodness-of-fit criteria can be inappropriate.

The purpose of this thesis is to examine methods that can be used to estimate the size of some population of interest when the probability of capture varies across both subjects in the population and across the t sampling occasions. In Chapter 2, we review existing models developed to estimate N in the presence of heterogeneity and discuss the advantages and disadvantages of each. We also review likelihood-based methods for obtaining a point estimate of N and standard techniques of constructing confidence intervals for N.

In Chapter 3, we develop a mixed model approach to account for population heterogeneity in capture-recapture studies. We motivate the logistic-normal model, which results from a random-effects approach to a subject-specific logistic model. We compare the performance of this model in estimating N with a simpler loglinear model that is motivated from a fixed-effects approach to the same logistic model. We also present a simpler form of the logistic-normal model, a latent class model, and include this model in the comparisons. We demonstrate these methods by considering an animal abundance example resulting from six trappings of snowshoe hares and an epidemiologic example resulting from three incomplete lists of hepatitis A patients compiled during a 1995 hepatitis outbreak in northern Taiwan. We then use these examples to demonstrate potential difficulties, in the form of flat log-likelihoods, incurred by the logistic-normal model.
Finally, Chapter 3 draws parallels between the difficulties incurred in the mixed model setting and those occurring in other well-studied N-estimation problems, namely other capture-recapture experiments that allow for heterogeneity and the situation where one observes k independently and identically distributed binomial(N, p) observations when both N and p are unknown.

Chapter 4 investigates the usefulness of other statistical models in the context of capture-recapture studies. We study models that relax the logistic-normal assumption of local independence. In particular, for sequential trappings, we look at fixed- and random-effect models allowing for serial dependence, given an individual. We also investigate the use of an alternative mixed model that allows for overdispersion relative to the loglinear model of mutual independence, and introduce and develop a new model, the multivariate logit-normal model, that postulates a separate latent variable for each occasion.

Chapter 5 presents the results of simulation studies conducted to examine the performances of the models discussed in Chapter 3 and Chapter 4 in the capture-recapture setting. We investigate the coverage properties and mean lengths of the resulting bootstrap and profile likelihood confidence intervals for N and the absolute error of the point estimates. These studies demonstrate the impact that flat log-likelihoods have on the logistic-normal model's ability to accurately estimate N. We see that the logistic-normal model is helpful in estimating the population size only when the amount of heterogeneity is small to moderate and the probability of capture is not extremely small. Otherwise, we see that alternative loglinear models are viable candidates when mixed models prove unsatisfactory.

CHAPTER 2
OVERVIEW OF THE EXISTING CAPTURE-RECAPTURE LITERATURE

Capture-recapture methods are often classified according to the strength of the assumptions placed on the population of interest. Models assuming an open population that experiences recruitment, in the form of birth or immigration, and/or losses, in the form of death or emigration, incorporate extra parameters for these phenomena and will not be considered here. We will instead concentrate on methods appropriate when estimating population size in closed populations whose size remains constant during a t-sample capture-recapture experiment. In these models, the subjects in a population of size N can be enumerated $\{1, 2, \ldots, N\}$, where N is a parameter to be estimated. The assumption of a closed population is often appropriate for studies conducted over short periods of time, say, for instance, several days.

2.1 Existing Models

Darroch (1958) was the first to use maximum likelihood analyses to obtain population size estimates. His model assumed that the capture probabilities vary across sampling occasions, but are the same for every subject in the population. This fits into the $M_t$ class of models within the hierarchy of models of Otis et al. (1978):

$M_0$, $M_t$, $M_h$, $M_b$, $M_{th}$, $M_{tb}$, $M_{bh}$, $M_{tbh}$,

where t denotes variation of capture probabilities across time, h denotes variation of capture probabilities across subjects (h = heterogeneous population), and b denotes variation of capture probabilities according to a behavioral response. Behavioral response refers to the phenomenon of the capture probability changing once an animal is captured, either in the form of trap "avoidance" or trap "dependence."
$M_0$, the simplest model, assumes constant capture probability for all animals on all sampling occasions and is rarely realistic. Although models are now available for the entire hierarchy, models allowing capture probabilities to vary among individuals (i.e., having the subscript h) have proven to be problematic. In this chapter we will review in detail existing models that account for heterogeneity.

The earliest attempts to consider heterogeneity specified a heterogeneous population as an assumption violation to simpler models, for instance $M_0$, $M_b$, or $M_t$, and tried to construct estimators that were robust to this violation. Fienberg (1972) proposed using a loglinear model that fit the $2^t - 1$ observed counts in the contingency table well. Cormack (1989), however, noted that although means of checking for subject heterogeneity are available when using standard loglinear models, these models do not provide reliable estimates when this is the case. Nathan Mantel, pioneer of the use of stratified contingency tables, stated (Gail, 1997):

"The issue was to estimate the population size. Fienberg's idea was to assume that you were in a complex contingency table with one missing cell entry. However, his method assumed that all animals came from this same complex contingency table. That was one thing that I would have disagreed with. Maybe the risks were not really homogeneous from animal to animal."

The two most popular estimators within the $M_h$ class are the Burnham-Overton (1978) and Chao (1987, 1989) estimators. Burnham and Overton (1978) were the first to estimate N using heterogeneous capture-recapture models. Both of these models assume that the capture probabilities $p_s$, $s = 1, \ldots, N$, are randomly distributed according to an unspecified distribution F. Most recently, Norris and Pollock (1996) conducted nonparametric ML estimation (NPML) of the pair (N, F) via an EM algorithm under both $M_h$ and $M_{bh}$ assumptions.

This is by no means an exhaustive list of all heterogeneity models in the enormous capture-recapture literature. Other models within this class were proposed by Pollock and Otto (1983), Smith and Van Belle (1984), and Lee and Chao (1994). In this section, however, we focus on the models in the preceding paragraph because loglinear models are the simplest to implement, the Burnham-Overton and Chao models are the most widely used heterogeneity models, and the NPML estimator is the most recent. These models also introduce the themes that will become evident in Chapters 3-5; that is, these models demonstrate difficulties inherent in the problem of N-estimation when heterogeneity is present and the tradeoff of narrow confidence intervals versus attained confidence.

2.1.1 Loglinear Models

An obvious class of models to model the $2^t - 1$ counts in the contingency table is the class of loglinear models. As noted earlier, the use of these models in the capture-recapture setting was first developed by Fienberg (1972) and reviewed by Cormack (1989). These models assume that the cell counts in the $2^t$ table are either Poisson or multinomial random variables. Since independent Poisson random variables are distributed as multinomial when conditioned on their sum, both of these assumptions yield identical inferences. Loglinear models relate the expectations of these counts, $\{\mu_{\mathbf{i}}\}$, to a vector of unknown parameters, $\boldsymbol{\beta}$, through the log transformation,

\[ \log(\boldsymbol{\mu}) = X\boldsymbol{\beta}, \]

where $\boldsymbol{\mu}$ is the vector of cell means in lexicographic order and X is a known covariate matrix. We estimate the unknown parameter vector $\boldsymbol{\beta}$ through maximum likelihood.
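As a concrete illustration (ours, not the dissertation's; we write the intercept as $\alpha$ to avoid clashing with the cell means $\mu_{\mathbf{i}}$), the independence model for a t = 2 experiment expresses the four cell means, in lexicographic order, as

\[
\log\begin{pmatrix} \mu_{11}\\ \mu_{10}\\ \mu_{01}\\ \mu_{00} \end{pmatrix}
=
\begin{pmatrix} 1 & 1 & 1\\ 1 & 1 & 0\\ 1 & 0 & 1\\ 1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} \alpha\\ \beta_1\\ \beta_2 \end{pmatrix}.
\]

Each main-effect column of X simply indicates capture at the corresponding occasion; appending a column for the product $i_1 i_2$ would add the two-factor interaction.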
Since Poisson loglinear models are exponential family models and the log link is the canonical link, the likelihood equations take the form of equating the expected value of the sufficient statistics to their observed values. Fienberg (1972) and Cormack (1989) selected an appropriate model within the hierarchy of loglinear models based on usual goodness-of-fit criteria, either the Pearson chi-squared statistic or the residual deviance.

The simplest way to obtain an N-estimate from a given loglinear model is to fit the model to the $2^t - 1$ observed counts conditional on the number of subjects, n, observed in the experiment. We then use the model to predict the content of the unobservable cell. We demonstrate this procedure in the simple two-sample case, for which the model specifying that the capture status at occasion two is independent of capture status at occasion one produces a closed-form estimator. Consider the following $2 \times 2$ table:

              occ. 2 = 1   occ. 2 = 0
occ. 1 = 1    $n_{11}$     $n_{10}$
occ. 1 = 0    $n_{01}$     $n_{00} = ?$

Letting $\mu_{ij} = E(n_{ij})$, $i, j = 0, 1$, denote the expected value of the ij-th cell count, the maximum likelihood estimates of the $\{\mu_{ij}\}$ conditional on $n = n_{10} + n_{01} + n_{11}$ are just the corresponding $\{n_{ij}\}$. Also, the independence model requires $\mu_{11}\mu_{00}/(\mu_{10}\mu_{01}) = 1$, so that

\[ \hat{\mu}_{00} = \frac{n_{10}\, n_{01}}{n_{11}}. \]

This estimate is the traditional Lincoln index (1930) used to estimate N; a numerical sketch of this calculation appears at the end of this subsection.

Loglinear modelling of capture-recapture data has both advantages and disadvantages. One advantage is that, unlike the $M_h$ models discussed in the next section, capture probabilities are allowed to vary across sampling occasions. Also, one can specify dependencies between sampling occasions in the form of interactions. The only requirement on the form of the model in order to obtain an N-estimate is that the t-order interaction is zero. Otherwise, the model is saturated for the unknown cell count, and any value of that count is consistent with the model. Thus, the saturated model provides no information on the population size. A third advantage of the loglinear models is the ease with which these models can be implemented. Cormack (1989, 1990), Agresti (1994) and others showed that one can fit standard loglinear models to capture-recapture data using GLIM (Francis et al., 1993). One simply specifies the weight of the missing cell count to be zero, producing N-estimate $\hat{N} = n + \hat{\mu}_{0\cdots 0}$. This estimate corresponds to an estimate of N conditional on n. We discuss this estimation procedure in Section 2.2.1.

The main disadvantage of loglinear modelling is the inability to model heterogeneity within the population. Cormack (1989) proposed ways to diagnose heterogeneity when fitting these models. He suggested examining the residual deviance and examining the pattern of standardized residuals based on the number of individuals observed and predicted to have been captured k times, $k = 1, \ldots, t$. He suggested that plots exhibiting a concave shape exhibit heterogeneity, suggesting that the given loglinear model is inadequate for the data. He demonstrated with a set of data based on six trappings of snowshoe hares, which we will analyze in detail in Chapter 3.
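The two-sample calculation above is a one-liner; the following minimal sketch uses hypothetical counts (ours, not data from the dissertation).

```python
# Two-sample (Lincoln index) estimate under the independence model:
# mu00_hat = n10 * n01 / n11, and N_hat = n + mu00_hat.
n11, n10, n01 = 52, 131, 96        # hypothetical observed cell counts

n = n11 + n10 + n01                # subjects seen at least once
mu00_hat = n10 * n01 / n11         # predicted count for the unobserved cell
N_hat = n + mu00_hat

print(f"n = {n}, estimated unseen = {mu00_hat:.1f}, N_hat = {N_hat:.1f}")
```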
In the standard setting for loglinear models, that of complete contingency tables where all cell counts are observed, parsimonious loglinear models yield smaller standard errors for estimated parameters. Thus, the standard practice is to base inferences on a simpler model that fits reasonably well, since the mean squared errors of the estimates for that model can be much smaller than those for a more complex model, even if the simpler model holds only approximately. In the context of N-estimation in capture-recapture models, the magnitude of the standard error for $\hat{N}$ also increases for more complex models. As noted in Chapter 1, however, the traditional goodness-of-fit tests for extrapolatory problems do not necessarily give a good indication of the model fit to the response outside the observed range of data. Thus, simpler models that yield smaller standard errors and narrower confidence intervals can result in overly optimistic confidence. This was noted by Regal and Hook (1991) and Agresti (1994) in the capture-recapture setting, and Prentice (1976) in the low-dose extrapolation setting. Thus, Prentice (1976) and Regal and Hook (1991) both suggest the opposite of standard practice: unless outside considerations warrant the use of a simple model, one should use the more complex model to obtain wider confidence intervals that achieve higher actual coverage. We will examine the tradeoff between narrow, "informative" confidence intervals versus attaining nominal confidence in Chapter 5.

We introduce a variety of models that allow for heterogeneity in Chapters 3 and 4, and compare their performance in estimating N through simulation in Chapter 5. First, however, we review in detail previous attempts to model heterogeneity in the population.

2.1.2 Models Allowing for Heterogeneous Population

As noted by Cormack (1989), the main problem in using loglinear models in capture-recapture experiments is the constraint that the capture probability is the same for all N subjects in the population. The assumption of heterogeneous capture probabilities has proven to be most problematic in the capture-recapture setting. It has long been recognized that ignoring heterogeneity when it is present results in severe underestimation of the population size (Burnham, 1972; Burnham and Overton, 1978; Chao, 1987, 1989; and references therein). On the other hand, several models developed to account for heterogeneity either (a) yield confidence intervals for N that have extremely poor coverage or (b) yield extremely wide confidence intervals that provide little or no practical information on the population size. This section presents a summary of the models developed to account for heterogeneity in the population with respect to capturability.

Burnham's Beta-Binomial Model

Under the $M_h$ assumptions, the number of captures for subject s in the population is a binomial random variable with t trials and success probability $p_s$, where $p_s$ is the probability of capture of subject s. A natural strategy in modelling heterogeneity is to specify that the individual capture probabilities $\{p_s\}$ are random variables with some known parametric form. One can then average across this mixing distribution to obtain expressions for the probability of capture of a subject selected at random from the population. Since the capture counts for each observed individual are binomial, a natural candidate for the form of the mixing distribution is the conjugate beta distribution. Thus, $p_s \sim \text{Beta}(a, b)$, where a and b are unknown parameters. This assumption results in the common beta-binomial model (Morgan, 1992). The marginal log-likelihood for this model is obtained by integrating over the random beta mixing distribution.
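Written out explicitly (a standard identity we supply for concreteness), the marginal probability that a given subject is captured on exactly k of the t occasions is

\[
P(k \mid a, b) = \binom{t}{k} \int_0^1 p^{k}(1-p)^{t-k}\, \frac{p^{a-1}(1-p)^{b-1}}{B(a,b)}\, dp
= \binom{t}{k}\, \frac{B(a+k,\, b+t-k)}{B(a,b)},
\]

where $B(\cdot,\cdot)$ is the beta function, so the marginal likelihood depends only on N, a, b, and the capture frequencies $f_1, \ldots, f_t$.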
This marginal log-likelihood is then maximized over a, b, and N to obtain a maximum likelihood estimate for the population size. Burnham (1972) rigorously developed this model in his dissertation. Unfortunately, he concluded that this model was unsatisfactory for estimating the population size because it yields extremely flat log-likelihoods with respect to the unknown parameter N. These flat profile log-likelihood surfaces are a result of a near-nonidentifiability problem that occurs when both the parameters of the mixing distribution and the population size N are left completely unspecified. These flat surfaces result in nearly arbitrary point estimates for N and extremely wide confidence intervals that provide little or no practical information on N. We look at this problem in more detail in the context of logistic-normal models in Chapter 3.

Burnham and Overton's Solution: the Jackknife

Burnham and Overton's (1978) solution to the near-nonidentifiability problem encountered in the beta-binomial model is a jackknife estimate. This population size estimate takes a naive estimate of N, the total number of subjects seen in the experiment, and attempts to improve this estimate through a bias correction procedure, the jackknife.

Specifically, the jackknife is a leave-one-out algorithm that estimates the bias of an estimator of interest. Suppose we wish to estimate an unknown parameter, $\theta$, indexing an unknown distribution. Let $\mathbf{x} = (x_1, \ldots, x_n)$ be a random sample of size n from this distribution, and suppose that we wish to estimate $\theta$ with the statistic $\hat{\theta} = \hat{\theta}(\mathbf{x})$. One performs the jackknife by computing $\hat{\theta}_{(i)} = \hat{\theta}(\mathbf{x}_{-i})$, where $\mathbf{x}_{-i}$ is the sample of size $n - 1$ formed by deleting the i-th observation. If $\hat{\theta}_{(\cdot)} = \frac{1}{n}\sum_{i=1}^{n} \hat{\theta}_{(i)}$, then the jackknife estimate of bias is $\hat{B} = (n - 1)(\hat{\theta}_{(\cdot)} - \hat{\theta})$. This estimate yields the bias-corrected estimate $\tilde{\theta} = \hat{\theta} - \hat{B}$. Under the assumption that

\[ E(\hat{\theta}) = \theta + \frac{a_1}{n} + \frac{a_2}{n^2} + \cdots, \]

where the $\{a_i\}$ are constants, the jackknife technique reduces the bias from order 1/n for $\hat{\theta}$ to order $1/n^2$ for $\tilde{\theta}$ (Efron, 1982). One can reduce the order of the bias further by deleting d observations at a time, known as the delete-d jackknife.

Burnham (1972) and Burnham and Overton (1978) applied the jackknife to the capture-recapture setting by considering $\hat{\theta} = n_t$, the total number of subjects observed after t samples, as a biased estimate for the unknown parameter $\theta = N$. The authors assume that

\[ E(n_t) = N + \frac{a_1}{t} + \frac{a_2}{t^2} + \cdots, \]

since the bias decreases as more samples are taken. The authors performed the jackknife by computing the t statistics $\{n_{(t-1),i}\}$ by pretending that the i-th sample was not recorded. The resulting bias-corrected estimate has closed form since the $n_{(t-1),i}$ depend only on the original estimate $n_t$ and the number of individuals who were observed exclusively on occasion i. The authors also proposed higher-order jackknife estimates obtained by deleting more than one occasion at a time and proposed a test to determine which estimate should be used to estimate the population size.

These jackknife estimators also have advantages and disadvantages. The estimates are easy to compute, since they can be expressed as a linear combination, $\sum_i c_i f_i$, with known coefficients $\{c_i\}$ of the numbers of subjects $\{f_i\}$ observed on exactly i different sampling occasions, which are the minimal sufficient statistics for the $M_h$ model. As a result, closed-form expressions of the asymptotic variances for the estimates exist.
Also, through simulation results, Burnham (1972) demonstrated that the asymptotic confidence intervals obtained from these estimates tend to have actual coverage close to the nominal level when t is large (t > 10) for a wide range of assumed heterogeneity distributions.

The main disadvantage of the jackknife estimators is the extremely poor confidence interval coverage when the number of sampling occasions is small. Chao's (1987) and our simulations show that for t < 5, the true coverages can be as low as zero in the presence of moderate heterogeneity. Also, for some models, it is possible to show analytically that the bias of $n_t$ is not expressible as a power series in t (Cormack, 1989). Specifically, model $M_0$, discussed at the beginning of this section, and all finite mixture distributions for a subject's capture probability $p_s$ do not satisfy this assumption. Thus, in these cases, theory has shown that these bias-corrected estimates do not necessarily have smaller bias than the original estimator $n_t$.

Chao's Alternatives to the Jackknife

The poor performance of Burnham and Overton's jackknife estimates for a small number of sampling occasions led Chao (1987, 1989) to develop an alternative estimator based on the method of moments. This estimator was also motivated assuming t is large and the mean probability of capture is small, but simulation (Chao, 1987) demonstrated that this alternative yields much higher confidence interval coverage when t is small. Instead of basing the estimator on all known capture frequencies, $\{f_i\}$, Chao developed an estimator based only on $f_1$ and $f_2$. Chao pointed out the intuitive appeal of this estimator, since animals remaining unobserved after t samples (i.e., those animals in capture frequency $f_0$) are thought to be most like those observed on a small number of occasions.

Chao assumed that the individual capture probability for subject s, $p_s$, is distributed according to some unspecified distribution function F. This yields expected values

\[ E(f_i) = N \binom{t}{i} \int p^i (1 - p)^{t-i}\, dF(p), \quad i = 0, 1, \ldots, t. \]

Using a Poisson approximation to the binomial density when t is large and p is small and Jensen's inequality, Chao obtained the inequality

\[ E(f_0) \geq \frac{[E(f_1)]^2}{2 E(f_2)}, \]

which leads to the moment estimate

\[ \hat{N} = n + \frac{f_1^2}{2 f_2} \]

as a lower bound of the population size.
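Both frequency-based estimates are a few lines of arithmetic. The sketch below (our illustration, with hypothetical capture frequencies) computes the first-order Burnham-Overton jackknife, $\hat{N}_{J1} = n + \frac{t-1}{t} f_1$, alongside Chao's lower bound.

```python
# Frequency-based estimators: a sketch with hypothetical data.
# f[k] = number of subjects captured on exactly k+1 occasions.
t = 6                          # number of sampling occasions
f = [43, 16, 8, 6, 0, 2]       # hypothetical capture frequencies f1..ft

n = sum(f)                                # subjects observed at least once
N_jack1 = n + (t - 1) / t * f[0]          # first-order Burnham-Overton jackknife
N_chao = n + f[0] ** 2 / (2 * f[1])       # Chao's moment-based lower bound

print(f"n = {n}, jackknife N_hat = {N_jack1:.1f}, Chao N_hat = {N_chao:.1f}")
```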
Without being any more difficult to compute, this estimator and asymptotic confidence interval perform much better than the Burnham-Overton estimator for a wide range of assumed mixing distributions. As Chao points out, this estimate will not perform well when the mean probability of capture is moderate to large, since information based on subjects captured on more than two occasions (i.e., counts $(f_3, f_4, \ldots)$) is ignored.

Nonparametric MLE

Norris and Pollock (1996) developed models in the classes $M_h$ and $M_{bh}$ that nonparametrically estimate the random distribution assumed for the population heterogeneity. This work generalized Burnham's beta-binomial model, which assumed that the subject capture probabilities $p_s$ are distributed as Beta(a, b) random variables, by leaving this mixing distribution, F, unspecified and estimating it via maximum likelihood. For a given candidate value for N, the authors used existing algorithms for finding the nonparametric maximum likelihood estimate for F. After doing this for many plausible N values, the authors took as the nonparametric maximum likelihood estimate (NPMLE) of N the value for which the pair (N, F) yielded the maximum log-likelihood among all those candidate values of N considered.

These authors avoided the flat log-likelihood difficulties by placing restrictions on both the possible size of the population and the size of the probability of capture. In order to use Norris and Pollock's estimators, one must place an upper bound on the size of the population, as well as a lower bound on the probability of capture. Norris and Pollock suggested that if the NPMLE of N is the preset upper bound, one should not use this estimator.

2.1.3 Models Allowing for Heterogeneous Population and Variable Sampling Effort

There are relatively few members in the class of models allowing for both time and heterogeneity effects, denoted by $M_{th}$. Lloyd and Yip (1991) proposed martingale estimating equations as a unifying framework for capture-recapture inference under which simultaneous occasion and subject effects could be included. Chao, Lee, and Jeng (1992) proposed a nonparametric method of estimation for this model. We review three such models here. We first consider Sanathanan's (1974) "generalized" model, since it is the mixed logistic-normal model that we study in Chapter 3. We also detail the two most recently developed models that take into account heterogeneity and occasion effects: the loglinear model of homogeneous two-factor interaction, developed by Darroch et al. (1993) and Agresti (1994), sometimes referred to as the partial quasi-symmetry model, and Chao and Tsay's (1996a, 1996b) estimator based on the notion of sample coverage.

Sanathanan's "Generalized" Model

Sanathanan (1974) was the first to consider a capture-recapture model that allowed the probability of capture to vary across both sampling occasions and among the subjects in the population. Sanathanan considered a two-step estimation scheme for the logistic-normal model discussed in Chapter 3 to obtain cell probability estimates conditional on the total observed number of subjects in the experiment. The author considered the impact of the assumption of different random-effects distributions when only three samples were taken. Sanathanan, however, does not report any problems with this model in terms of flat log-likelihoods, arbitrary point estimates, or extremely wide confidence intervals.
Let Yj = 1 if subject s is captured on occasion j and 0 otherwise, pj = N1 SN= E(Yj) be the average probability of capture on occasion j, j = 1,...,3, and 1N yij = E E [(Y.i i) (Y, .j)] /(tizj) sq=l and 7123 = E E[(Y1 1) (Ys2 2) (Ys3 A3)] /(1i23) measure the degree of dependence between the occasions. Chao and Tsay used the connection between the number of subjects unseen and the conditional probability of finding an undiscovered one if an additional sample is collected. They defined sample coverage, C, as just one minus this conditional probability, and then estimated N through its relationship with C and the two and threeway associations, y = (712, 'Y13, 23, )123), between the sampling occasions. Specifically, the authors defined sample coverage as follows. Let I() denote the indicator function. Assuming the first two samples are fixed, the probability that an additional subject is discovered on the third occasion is EP P 3 = 1)I[Y1 = 0, Ys2 = 0] P3 E VN P3=1 P(Y,3 = 1) Since epidemiological applications are the main focus of these papers so that no ordering exists for the sampling occasions, the analogous quantities :N1 P(Y1 = 1)I[Y2 = O, Y3 = 0] SE=1 P(Yi = 1) when fixing samples two and three, and = = P(Y82 = 1)I[Y = 0, Ys3 =0] P2N 8=1 P(Y82 = 1) when fixing samples one and three, are also relevant. Thus, S P + P2 + P3 C* = (2.1) 3 is defined as the average probability of finding a new subject in any chosen additional sample, and the sample coverage is then defined as C = 1 C*. Let n+i1i3, ni,+i,, nili2+, nil++, n+i2+, n++i, be marginal counts obtained by sum ming the observed cell counts, n, over the samples denoted by "+" in the subscript. Chao and Tsay argued that a reasonable estimate for Pj, j = 1, 2,3, is the number of subjects captured on occasion j only divided by the total number of subjects captured on occasion j, leading to sample coverage estimate 1( nlo + noo + nool 3 \n++ n+l+ n++l Then, claiming that identities N E(D) 1 N = + ) [E(nl+o + n+10)712 + E(nlo+ + n+01)713 E(C) 3E(C) R +E(nol+ + no+1)723] + R (2.2) 3E(C) 712 = N E(n+) 1, (2.3) E(nl++)E(n+i+) 713 = N E(n+) 1, (2.4) E(nl++)E(n++l) and 723 = N E(n+11) 1 (2.5) E(n+,+)E(n++,) hold, where N N D= E I[Y l + 2> O I[Y2 + Y3 > 0] + I[Y, + 3 > 0] 3 $s=I s=l s=l is the average of the distinct total and R is a remainder term involving , they obtained an Nestimate as follows. Set R = 0 in identity (2.2) by assuming that the threeway association, 7123, between occasions is a linear function of (i, /2, /13) and the twoway associations. Then solve the equation that results by replacing all expectations in (2.3), (2.4), and (2.5) by their observed values and substituting the resulting (712, 713, 23), along with D for E(D) and 0 for E(C), into identity (2.2). This results in the Nestimate n+11 + nl+l + n+nl 3{( [ (n4+o + n+o)ni+ + (nlo+ + n+ol)n+ + (no+1 +n+ol)n+. 1 3C [ nl++n+i+ nl++n++i n+i+n++i Since this Nestimate contains C in its denominator, Chao and Tsay noted that N can be unstable for small sample coverage. We have found that this instability, in fact, can take the form of nonsensical negative estimates of the population size. Even if the observed data prove to be "stable" in Chao and Tsay's terms, bootstrap replicates (see Section 2.3.2) used to obtain estimated variances for N and confidence intervals for N can have this undesireable behavior. Thus, this poor behavior is the main disadvantage of this sample coverage estimator. 
Chao and Tsay proposed an alternative, one-step estimator, $\hat{N}_1$, to be used when $\hat{N}$ proves unstable. This iterative estimator is constructed by substituting the N-estimate that assumes no dependencies between occasions, $\hat{N} = D/\hat{C}$, into equations (2.3), (2.4), and (2.5), substituting the resulting two-way association estimates into (2.2) to obtain new estimate $\hat{N}'$, and then repeating this process using $\hat{N}'$ to obtain alternative estimate $\hat{N}_1$.

The main advantage of these estimators over the ML estimates we will consider in Chapters 3 and 4 is that they can be expressed in closed form, so that one can compute a point estimate and bootstrap confidence intervals for N (see Section 2.3.2) quickly. Also, these estimators allow for both sources of dependence between occasions: within-subject dependence and population heterogeneity. We will consider both sources of dependence in Chapter 4. The major disadvantages of $\hat{N}$ are the possible nonsensical N-estimates and the lack of a model generating this form of estimate. $\hat{N}_1$ remedies the instability of $\hat{N}$, but Chao and Tsay's simulations suggested that this estimator yields overly optimistic nonparametric bootstrap percentile confidence intervals (e.g., 56% actual coverage for a 95% confidence interval) when the true model is a continuous mixture model. Our simulations have shown that this actual coverage figure can drop to 20% in some settings. We focus on the issue of narrow confidence intervals versus attained nominal confidence in Chapter 5.

2.2 Maximum Likelihood Estimation of N

In this section, we review three methods for obtaining point estimates for N from a model applied to the $2^t$ table with unknown cell count $n_{0\cdots 0}$. In subsequent discussions, we refer to the table with all $2^t$ cell counts known as the complete table and the $2^t - 1$ observed counts as the incomplete table.

2.2.1 Conditional on the Number of Observed Subjects

Sanathanan (1972) outlined an approach to the estimation of N conditional on the total number of observed subjects for any general model indexed by parameter vector $\theta$. For a model indexed by $\theta$, the likelihood of $(N, \theta)$ given $\tilde{\mathbf{n}} = (n_{1\cdots 1}, \ldots, n_{0\cdots 01})$ is

\[ L(N, \theta; \tilde{\mathbf{n}}) = \frac{N!}{n_{1\cdots 1}! \cdots n_{0\cdots 01}!\, (N - n)!}\, \pi_{1\cdots 1}(\theta)^{n_{1\cdots 1}} \cdots \pi_{0\cdots 01}(\theta)^{n_{0\cdots 01}}\, \pi_{0\cdots 0}(\theta)^{N - n}. \]

Sanathanan, using the sufficiency of n for N, factors L into two components:

\[ L(N, \theta; \tilde{\mathbf{n}}) = L_1(\theta; \tilde{\mathbf{n}})\, L_2(N, \theta; n), \]

where

\[ L_1(\theta; \tilde{\mathbf{n}}) = \frac{n!}{n_{1\cdots 1}! \cdots n_{0\cdots 01}!}\, \nu_{1\cdots 1}(\theta)^{n_{1\cdots 1}} \cdots \nu_{0\cdots 01}(\theta)^{n_{0\cdots 01}} \]

and

\[ L_2(N, \theta; n) = \frac{N!}{n!\, (N - n)!}\, \left(1 - \pi_{0\cdots 0}(\theta)\right)^{n}\, \pi_{0\cdots 0}(\theta)^{N - n}, \]

with

\[ \nu_{\mathbf{i}}(\theta) = \frac{\pi_{\mathbf{i}}(\theta)}{1 - \pi_{0\cdots 0}(\theta)}. \]

Since $L_1$ is free of N and is a function only of the unknown parameters $\theta$, one obtains estimates $\hat{\theta}_c$ from the incomplete table by conditioning on n and maximizing $L_1$ with respect to $\theta$. One then maximizes $L_2(N, \hat{\theta}_c; n)$ with respect to N, yielding

\[ \hat{N}_C = \left[\frac{n}{1 - \pi_{0\cdots 0}(\hat{\theta}_c)}\right], \]  (2.6)

where $[x]$ denotes the greatest integer less than or equal to x. Sanathanan showed that $\hat{N}_C$ has the same asymptotic normal distribution as $\hat{N}$, the value of N for which $(N, \theta)$ yields the overall maximum of L, as the true population size goes to infinity.

Let $\hat{\mu}_{0\cdots 0}$ be the fitted value for the unknown cell count obtained by conditioning on n and fitting a given model to the incomplete table. In addition, let $\tilde{n}_{0\cdots 0}$ be potential values for $n_{0\cdots 0}$, with $\tilde{n}_{0\cdots 0} \geq 0$. For a given observed incomplete table, let $G^2(\tilde{n}_{0\cdots 0})$ be the deviance for the model applied to the complete table formed by assuming the unobserved cell count is $\tilde{n}_{0\cdots 0}$. This conditional (on n) estimation procedure selects the value $\hat{\mu}_{0\cdots 0}$ that minimizes $G^2(\tilde{n}_{0\cdots 0})$ for that complete table. That is,

\[ G^2(\hat{\mu}_{0\cdots 0}) = \inf_{\tilde{n}_{0\cdots 0} \in \mathbb{R}^+} G^2(\tilde{n}_{0\cdots 0}). \]  (2.7)

This estimate $\hat{\mu}_{0\cdots 0}$ is the value that produces a complete table whose fitted values from the model under consideration reproduce the observed total, $\hat{n} = n$.
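For a standard loglinear model, the conditional fit amounts to fitting the Poisson model to the $2^t - 1$ observed cells and extrapolating to the missing one; excluding the missing cell is equivalent to the GLIM weight-zero device described in Section 2.1.1. The sketch below is our illustration (hypothetical counts, using the statsmodels package) for the t = 3 independence model, where the linear predictor of cell (0,0,0) reduces to the intercept.

```python
import numpy as np
import statsmodels.api as sm

# Conditional estimation sketch: fit a Poisson loglinear independence
# model to the 2^3 - 1 observed cells, then extrapolate to (0,0,0).
cells = [(1,1,1), (1,1,0), (1,0,1), (0,1,1), (1,0,0), (0,1,0), (0,0,1)]
counts = np.array([28, 21, 17, 18, 69, 55, 63])            # hypothetical data

X = np.array([[1, i1, i2, i3] for (i1, i2, i3) in cells])  # intercept + main effects
fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit()

mu000 = np.exp(fit.params[0])      # cell (0,0,0) has covariates (1,0,0,0)
N_hat = counts.sum() + mu000
print(f"estimated unseen = {mu000:.1f}, N_hat = {N_hat:.1f}")
```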
Fienberg (1972) applied this approach to loglinear modelling of capture-recapture data, calling it an extension of the estimated parameters to the missing cell count. Fienberg demonstrated testing the fit of the model to the data with the deviance applied to the incomplete table,

\[ G^2 = 2 \sum_{\mathbf{i} \in \bar{\mathcal{Z}}} n_{\mathbf{i}} \log\left(\frac{n_{\mathbf{i}}}{\hat{\mu}_{\mathbf{i}}}\right), \]  (2.8)

yet cautions against the use of these methods for accurate estimation of N. Fienberg likened the capture-recapture problem to fitting a regression line of y on x, say, for x > 0, and then using the fit of this line to predict y at x = 0. This alludes to the extrapolatory nature of the problem as discussed in Chapter 1. Our experience has been that the goodness-of-fit criterion applied to the observed data does not necessarily indicate an accurate N-estimate. We will demonstrate this phenomenon in Chapter 3.

The advantage of these conditional methods is the computational ease of obtaining $\hat{N}_C$, especially when using standard loglinear models (see Section 2.1.1). Cormack (1989, 1990), Agresti (1994), and others showed the ease of fitting standard loglinear models to capture-recapture data using GLIM by simply specifying the weight of the missing cell count to be zero, producing N-estimate $\hat{N} = n + \hat{\mu}_{0\cdots 0}$. Baker (1990) described an EM algorithm approach to obtaining these conditional estimates. Using a starting guess for the missing cell count, one performs the M-step of fitting a chosen model to the complete table and the E-step of taking $\tilde{\mathbf{n}}$ to be the observed values and $n_{0\cdots 0}$ to be the fitted value from the previous M-step fit. In his comment to Baker, Cormack (1990) noted that this iterative approach obtains the same estimate $\hat{N}_C$ as the GLIM approach and is unnecessary.

2.2.2 Search over N

Alternatively, one can search for the overall maximum of L with respect to $(N, \theta)$ by maximizing

\[ L(N, \theta; \tilde{\mathbf{n}}) = \frac{N!}{n_{1\cdots 1}! \cdots n_{0\cdots 01}!\, (N - n)!}\, \pi_{1\cdots 1}(\theta)^{n_{1\cdots 1}} \cdots \pi_{0\cdots 01}(\theta)^{n_{0\cdots 01}}\, \pi_{0\cdots 0}(\theta)^{N - n} \]

with respect to $\theta$ for a fixed N, leading to a maximum likelihood, $L_N$, for that value N. One obtains the overall ML estimates $(\hat{N}_S, \hat{\theta})$ by evaluating $L_N$ over a range of N values and taking the ML estimates to be those values of $(N, \theta)$ that provide the maximum $L_N$. This approach was taken by Norris and Pollock (1996) (see Section 2.1.2) for $M_h$ and $M_{bh}$ closed population models, with $\theta$ being an unspecified mixing distribution on the individual capture probabilities.

The advantage of this approach is that this search procedure (usually) yields the true maximum likelihood estimates for $(N, \theta)$ under the restriction that N is an integer. Disadvantages include the computational complexity of a search procedure consisting of maximizing the log-likelihood for many different values of N and the fact that one must put some restrictions on the range of N values, since $L_N$ cannot be computed for all N > n. It is because of this second point that we say the true ML estimate is usually found, since the maximized log-likelihood as a function of N need not be unimodal. We will discuss this point further in Chapter 3.

2.2.3 N as Continuous

A third approach to computing the overall ML estimates treats N as a continuous variable and employs numerical optimization techniques to maximize L with respect to $(N, \theta)$. One does this by substituting the factorial terms in L by gamma functions, so that the objective function to be maximized becomes

\[ L(N, \theta; \tilde{\mathbf{n}}) = \frac{\Gamma(N + 1)}{n_{1\cdots 1}! \cdots n_{0\cdots 01}!\, \Gamma(N - n + 1)}\, \pi_{1\cdots 1}(\theta)^{n_{1\cdots 1}} \cdots \pi_{0\cdots 01}(\theta)^{n_{0\cdots 01}}\, \pi_{0\cdots 0}(\theta)^{N - n}. \]

With the development of dependable, general-purpose numerical optimization algorithms, this maximization is straightforward.
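The sketch below is our illustration of this continuous-N approach (hypothetical counts, SciPy's general-purpose optimizer) for the simple independence model $M_t$, with $\theta = (\text{logit}\, p_1, \ldots, \text{logit}\, p_t)$; the log-likelihood is coded up to an additive constant not depending on $(N, \theta)$.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, expit

# Observable capture histories in lexicographic order, and hypothetical counts.
cells = np.array([[1,1,1], [1,1,0], [1,0,1], [1,0,0], [0,1,1], [0,1,0], [0,0,1]])
counts = np.array([28, 21, 17, 69, 18, 55, 63])
n = counts.sum()

def negloglik(par):
    N, p = par[0], expit(par[1:])          # N continuous; p_j on logit scale
    if N < n:
        return np.inf                      # gamma terms require N >= n
    logpi = cells @ np.log(p) + (1 - cells) @ np.log(1 - p)   # log pi_i(theta)
    logpi0 = np.log(1 - p).sum()                              # log pi_{0...0}(theta)
    return -(gammaln(N + 1) - gammaln(N - n + 1)
             + counts @ logpi + (N - n) * logpi0)

start = np.concatenate(([2.0 * n], np.zeros(3)))
res = minimize(negloglik, start, method="Nelder-Mead")
print(f"N_hat = {res.x[0]:.1f}")
```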
The resulting N-estimates from the three approaches are normally very similar. Sanathanan (1972) claimed that necessarily $\hat{N}_C \leq \hat{N}$. Our experience has been that either $\hat{N}_S = [\hat{N}]$ or $\hat{N}_S = [\hat{N} + 1]$ always, since the maximized log-likelihood is a smooth function of N.

2.3 Methods for Constructing Confidence Intervals for N

Much of the recent research in capture-recapture modelling has focused on methods of constructing confidence intervals for N. Although Sanathanan derived the asymptotic normality of $\hat{N}$ as the true population size increases, confidence intervals based on asymptotic normality of the estimator are often unsatisfactory, since the estimate is often based on small samples. This can result in a skewed distribution for $\hat{N}$ and a lower confidence limit that is less than n. As a result, the bootstrap and profile likelihood methods of Buckland and Garthwaite (1991) and Cormack (1992), respectively, are often preferred over ones based on the asymptotic normality of $\hat{N}$.

2.3.1 Profile Likelihood Confidence Interval

The alternative definition (2.7) for the conditional estimate $\hat{N}_C$ suggests a profile likelihood approach to constructing confidence intervals when using this point estimate. Since $\hat{\mu}_{0\cdots 0}$ is the value that produces the minimum $G^2$ statistic from the complete table, a 100(1 - α)% profile likelihood confidence interval is based on those values for the unobserved cell that increase the LR statistic for a model fitted to the complete table by a value less than $\chi^2_{1,\alpha}$. This interval rejects candidate values $\tilde{n}_{0\cdots 0}$ for which the likelihood under the model is sufficiently less than the maximum value. For 95% confidence intervals, we accept all values $\tilde{n}_{0\cdots 0}$ that increase the LR statistic by $\chi^2_{1,.05} = 3.84$ or less. If all $\tilde{n}_{0\cdots 0}$ values in the interval (a, b) satisfy this criterion for the missing cell count, then (n + a, n + b) is the confidence interval for N.

The cutoff value $\chi^2_{1,\alpha}$ is based on the asymptotic framework in which $N \to \infty$. Hall (1994) argued, in a related N-estimation problem (that of k i.i.d. binomial(N, p) observations with both N and p unknown), that an alternative asymptotic framework assuming the unknown parameter N is fixed is more appropriate. Moreover, Hall showed that the N-estimator is not distributed as a standard normal random variable, but rather has a Cauchy distribution. We note that in an analogous asymptotic framework for the capture-recapture setting, this chi-squared cutoff might not be the most appropriate one. Simulations in Chapter 5, however, demonstrate that the above profile likelihood intervals do maintain coverage that is close to the nominal level in many practical situations.

2.3.2 Bootstrap Confidence Intervals

We describe both the ordinary percentile and the BCa, or bias-corrected and accelerated, percentile bootstrap confidence intervals. We first describe these intervals for a generic estimation problem before applying the methods to the capture-recapture problem.

The Bootstrap

The bootstrap is based on the idea of a plug-in rule. Suppose we wish to estimate the parameter $\theta = \theta(F)$ from some random sample $\mathbf{x} = (x_1, \ldots, x_n)$ generated from the probability distribution F. The plug-in principle estimates $\theta$ by $\hat{\theta} = \theta(\hat{F})$, where $\hat{F}$ is some estimate of the probability distribution generating the observed data. Often, $\theta(\hat{F})$ does not have simple form and must be computed by Monte Carlo resampling.
This procedure generates B resamples, $\mathbf{x}^{*1}, \ldots, \mathbf{x}^{*B}$, according to the estimated probability distribution $\hat{F}$. One then computes the statistic of interest from each of the resamples to obtain the bootstrap replicates $\hat{\theta}^*_1, \ldots, \hat{\theta}^*_B$.

The nonparametric bootstrap takes as the estimate of F the empirical distribution function based on the original data; that is, $\hat{F}$ places mass 1/n on each of the n observed data points $x_i$, $i = 1, \ldots, n$, and mass 0 on any point not observed. One then resamples with replacement from $\hat{F}$. One can exploit any knowledge one might have concerning the form of the underlying distribution by using the parametric bootstrap. The parametric bootstrap assumes the functional form of the distribution to be known up to a vector of unknown parameters, so that $F = F_{\lambda}$. The estimate $\hat{F} = F_{\hat{\lambda}}$ is the assumed functional form with some estimate $\hat{\lambda}$ replacing the unknown $\lambda$.

Percentile and BCa Intervals

Efron and Tibshirani (1993, pp. 168-169) motivate the use of percentile bootstrap intervals as follows. Let $\hat{\theta}$ be as above and let $\hat{\sigma}$ be its estimated standard error. Consider the standard normal 100(1 - 2α)% confidence interval

\[ (\hat{\theta} - z^{(1-\alpha)}\hat{\sigma},\ \hat{\theta} + z^{(1-\alpha)}\hat{\sigma}) \]  (2.9)

for $\theta$, based on the assumption $\hat{\theta} \sim N(\theta, \hat{\sigma}^2)$. Let the resample $\hat{\theta}^*$ be an observation from a $N(\hat{\theta}, \hat{\sigma}^2)$ distribution. The lower and upper confidence limits of (2.9) are the α-th and (1 - α)-th percentiles of $\hat{\theta}^*$'s distribution. Therefore, if H is the normal distribution function of $\hat{\theta}^*$, the confidence interval has the form $(H^{-1}(\alpha), H^{-1}(1 - \alpha))$.

This suggests a way to construct general percentile intervals. Let $\hat{\theta}^*$ be defined as in the previous section and let H be the distribution function of $\hat{\theta}^*$. Interval (2.9) suggests the bootstrap interval $(H^{-1}(\alpha), H^{-1}(1 - \alpha))$. In practice, we must estimate the cdf H as the empirical distribution of the B bootstrap replicates $\{\hat{\theta}^*_b\}$, so that the confidence endpoints are the α-th and (1 - α)-th percentiles of this empirical distribution.

If there exists a transformation u such that $u(\hat{\theta})$ is normally distributed with mean $u(\theta)$ and some finite variance, the percentile method can be thought of as a means of constructing an ordinary normal interval (a, b) for $u(\theta)$ and then transforming back to the $(u^{-1}(a), u^{-1}(b))$ interval for $\theta$, without knowing the correct transformation. This means the percentile interval generalizes the ordinary normal interval by requiring some unknown transformation of $\hat{\theta}$ to be normally distributed, instead of requiring $\hat{\theta}$ itself to be normally distributed.

Efron (1987) improved upon the percentile interval idea by introducing the bias-corrected and accelerated, or BCa, interval. Efron's motivation for the BCa interval is as follows. Suppose there exists an increasing transformation u such that for $\hat{\phi} = u(\hat{\theta})$ and $\phi = u(\theta)$, we have

\[ \frac{\hat{\phi} - \phi}{se_{\phi}} \sim N(-z_0, 1), \]  (2.10)

where $se_{\phi} = se_{\phi_0}\left(1 + a(\phi - \phi_0)\right)$ and $\phi_0$ is any convenient reference point on the scale of the $\phi$ values. The bias-corrected, or BC, interval is the special case when a = 0. This means the BCa interval further generalizes the ordinary normal intervals by allowing the normal-inducing transformation, $\hat{\phi}$, to have bias $z_0$ when estimating the true $\phi$, and a standard deviation that changes according to the value of the parameter.

To construct the adjusted intervals, one estimates the bias and acceleration by

\[ \hat{z}_0 = \Phi^{-1}\left(\frac{\sum_{b=1}^{B} I(\hat{\theta}^*_b < \hat{\theta})}{B}\right) \]

and

\[ \hat{a} = \frac{\sum_{i=1}^{n} U_i^3}{6\left\{\sum_{i=1}^{n} U_i^2\right\}^{3/2}}, \]  (2.11)

where I is the indicator function, $\Phi$ is the standard normal cdf, and $U_i$ is the i-th infinitesimal jackknife value, or empirical influence component. For ease of computation, one can substitute the i-th jackknife value from Section 2.1.2 for $U_i$, so (2.11) becomes

\[ \hat{a} = \frac{\sum_{i=1}^{n} (\hat{\theta}_{(\cdot)} - \hat{\theta}_{(i)})^3}{6\left\{\sum_{i=1}^{n} (\hat{\theta}_{(\cdot)} - \hat{\theta}_{(i)})^2\right\}^{3/2}}. \]
One then calculates a percentile interval using percentiles of the empirical distribution adjusted for the bias and acceleration: the BCa interval is $(\hat{H}^{-1}(\alpha_1), \hat{H}^{-1}(\alpha_2))$, where

\[ \alpha_1 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + z^{(\alpha)}}{1 - \hat{a}(\hat{z}_0 + z^{(\alpha)})}\right), \qquad \alpha_2 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + z^{(1-\alpha)}}{1 - \hat{a}(\hat{z}_0 + z^{(1-\alpha)})}\right). \]

Efron (1987) details the theoretical advantages of the BCa interval over the ordinary percentile interval. Notice that the ordinary percentile interval is a special case of the BCa interval with $z_0$ and a both equal to zero. Thus, if model (2.10) holds but the bias and standard deviation acceleration are negligible, the percentile interval will perform adequately.

The Bootstrap Applied to Capture-Recapture

For the capture-recapture setting, the underlying distribution $F = \text{Mult}(N, \boldsymbol{\pi})$ generates the observed data $\tilde{\mathbf{n}}$. Thus, we generate B resamples $(\tilde{\mathbf{n}}^{*1}, \ldots, \tilde{\mathbf{n}}^{*B})$ from an estimate of the distribution F and calculate $(\hat{N}^*_1, \ldots, \hat{N}^*_B)$. The 100(1 - 2α)% percentile confidence interval in this case has endpoints that are the empirical α and 1 - α percentiles of the $\hat{N}^*$ values (Buckland and Garthwaite, 1991). The BCa interval computes $\hat{z}_0$ and $\hat{a}$ as above to obtain the percentile adjustments $\alpha_1$ and $\alpha_2$. We believe this is a useful modification for the capture-recapture application, and we recommend this type of bootstrap interval over the percentile interval for capture-recapture problems for two reasons. First, there exists a controversy in the bootstrap literature as to the validity of the ordinary percentile interval (Hall, 1994). Second, little is known about the small sample properties of N-estimates. Thus, the more general assumptions of the BCa interval are safer, since we cannot verify the stricter assumption that there exists some transformation of $\hat{N}$ that has a normal sampling distribution with mean N and constant variance.

Bootstrap theory has shown that the bootstrap BCa confidence intervals have nice properties, in the form of second-order correctness, when the parameter (and estimate) is a smooth function of means (Hall, p. 52). We note, however, that although the bootstrap has been proposed in the capture-recapture framework and is probably the most popular method of computing confidence intervals in this setting, the theoretical properties of this bootstrap are unknown, since this setting does not fall under this "smooth function model" framework.

For the percentile bootstrap, the parametric version assumes that the $\tilde{\mathbf{n}}^*$ are generated from the parametric model using the ML estimates as the true parameters, $\hat{F} = \text{Mult}(\hat{N}, \hat{\boldsymbol{\pi}})$. Efron (1987) showed that when a random variable has finite support, the nonparametric and parametric BCa intervals are theoretically equivalent. Since we are resampling from a multinomial distribution with finite support, we only consider nonparametric BCa intervals in the capture-recapture setting, since Efron's result applies here.

Buckland and Garthwaite (1991) proposed a nonparametric bootstrap that resamples from a multinomial distribution with index $\hat{N}$ and probabilities

\[ \left(\frac{\hat{\mu}_{0\cdots 0}}{\hat{N}},\ \frac{n_{1\cdots 1}}{\hat{N}},\ \ldots,\ \frac{n_{0\cdots 01}}{\hat{N}}\right). \]

This is not a nonparametric bootstrap in the truest sense, but rather a semiparametric bootstrap, since we are forced to use the estimate, $\hat{N}$, from the assumed model to estimate the underlying distribution F. We investigate an alternative, strictly nonparametric bootstrap that resamples from the conditional multinomial distribution of the observed cell counts given n. Thus, we bootstrap the conditional estimate $\hat{N}_C$ of Section 2.2.1 by resampling from the conditional multinomial distribution, $\text{Mult}(n, (n_{1\cdots 1}/n, \ldots, n_{0\cdots 01}/n))$. We investigate the performance of both of these bootstraps in Chapter 5.
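A minimal sketch of the semiparametric (Buckland-Garthwaite style) percentile bootstrap, using the two-sample Lincoln index as the estimator for simplicity (hypothetical counts; our code, not the dissertation's):

```python
import numpy as np

rng = np.random.default_rng(1)

def lincoln(n11, n10, n01):
    # N-hat = n + n10 * n01 / n11 under the two-sample independence model
    return n11 + n10 + n01 + n10 * n01 / n11

n11, n10, n01 = 52, 131, 96                     # hypothetical observed counts
N_hat = lincoln(n11, n10, n01)
# Resampling distribution Mult(N_hat, pi_hat), including the estimated unseen cell.
probs = np.array([n11, n10, n01, N_hat - (n11 + n10 + n01)]) / N_hat

B, reps = 2000, []
for _ in range(B):
    c = rng.multinomial(int(round(N_hat)), probs)
    if c[0] > 0:                                 # skip degenerate resamples (n11* = 0)
        reps.append(lincoln(c[0], c[1], c[2]))

lo, hi = np.percentile(reps, [2.5, 97.5])
print(f"N_hat = {N_hat:.1f}, 95% percentile CI = ({lo:.1f}, {hi:.1f})")
```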
We investigate the performance of both of these bootstraps in Chapter 5.

CHAPTER 3
CAPTURE-RECAPTURE MODELS ASSUMING LOCAL INDEPENDENCE

We now consider mixed capture-recapture models that allow for population heterogeneity and variable sampling effort. In this chapter, we focus on models that assume that the dependencies among the t sampling occasions arise solely from the differences between subjects. We consider alternative forms of dependence in Chapter 4. We investigate the use of two mixed models, the logistic-normal model and a latent class model, and the loglinear model of homogeneous two-factor interaction. We motivate all three of these models from the logistic model that contains a separate parameter for each subject in the population and for each sampling occasion. Sections 3.1 and 3.2 review estimation and interpretation of the models in their traditional application, when all N individuals are observed. This complete-table methodology will prove useful when computing the profile likelihood confidence intervals of Section 2.3.1. Section 3.3 discusses their extensions to the capture-recapture setting and is followed by an example based on a six-sample experiment designed to estimate the size of a snowshoe hare population in Section 3.4.

Section 3.5 uses another example, a study designed to estimate the number of people who contracted the hepatitis A virus during a hepatitis outbreak, to demonstrate the difficulties associated with the use of the mixed models of Sections 3.1 and 3.2. Section 3.6 draws analogies between the behavior of the N-estimates obtained from these models and other well-studied N-estimation problems and is followed by comments in Section 3.7. We defer simulation studies that examine the performance of the N-estimates presented in this chapter until Chapter 5.

3.1 A Logistic Model with Subject Heterogeneity

For subject s, s = 1, ..., N, let $\mathbf y_s' = (y_{s1}, \ldots, y_{st})$ be a vector of t binary measurements (0 or 1), where $y_{sj} = 1$ denotes capture in sample j. Let $p_{sj} = P(y_{sj} = 1)$. We permit subject heterogeneity using the model

$\log\left(\dfrac{p_{sj}}{1 - p_{sj}}\right) = \alpha_s + \beta_j.$   (3.1)

Original applications of the model (Rasch, 1961) referred to t test items, making the model popular in educational testing, where it is known as the Rasch model. In fitting the model, one assumes independence of responses across occasions for a given subject, termed local independence, and independence between subjects. Standard ML asymptotics do not apply to this model, since as the number of subjects (N) grows, so does the number of model parameters. Thus, the ML estimator of $\boldsymbol\beta = (\beta_1, \ldots, \beta_t)$ is not consistent (Andersen 1980). Let $\hat\beta_j$ denote the ML estimate of $\beta_j$, j = 1, ..., t. It is well known that when t = 2, $\hat\beta_2 - \hat\beta_1 \rightarrow 2(\beta_2 - \beta_1)$ as $N \rightarrow \infty$ (Andersen 1980). Ghosh (1995) proved inconsistency for t > 2.

3.1.1 Conditional Maximum Likelihood

Two approaches are used to overcome the inconsistency. The first, a fixed-effects approach, treats $\{\alpha_s\}$ as nuisance parameters and eliminates them by conditioning on their sufficient statistics, then maximizing the conditional likelihood, yielding conditional ML (CML) estimates $\hat{\boldsymbol\beta}_C$. As in Chapter 1, let $\mathcal Z = \{(1, \ldots, 1), \ldots, (0, \ldots, 0)\}$ be the set of $2^t$ possible sequences of responses $(y_{s1}, \ldots, y_{st})$, in lexicographic order, and denote by $\mathcal Z^* = \{(1, \ldots, 1), \ldots, (0, \ldots, 0, 1)\}$ the set of observable sequences. Let $i = (i_1, \ldots, i_t)$ be an element of $\mathcal Z$, and let $n_i = n_{i_1\cdots i_t}$ be the number of subjects having that sequence.
Tjur (1982) showed that the CML estimates are, equivalently, ML estimates of main effect parameters in a loglinear model of quasi-symmetry fitted to the $2^t$ table of counts $\{n_i\}$. Specifically, letting $\{\mu_i = E(n_i)\}$, the loglinear model is

$\log(\mu_{i_1\cdots i_t}) = \mu + \beta_1 I(i_1 = 1) + \cdots + \beta_t I(i_t = 1) + \lambda(i_1, \ldots, i_t),$   (3.2)

where the parameter $\lambda(i_1, \ldots, i_t)$ is invariant to permutations of its arguments and $I(\cdot)$ is an indicator function. The quasi-symmetry model (3.2) is easily fit in standard statistical software such as GLIM or SAS (PROC GENMOD).

The model implies that the binary response has the same association for each pair of items and corresponds to a random effects approach in which one assumes that the ability parameters are distributed according to some completely unspecified distribution F. To see this, suppose $\alpha_1, \ldots, \alpha_N$ are independently and identically distributed according to the unknown cumulative distribution F. For subject s, model (3.1) states that the probability of having capture history $\mathbf y_s = (y_{s1}, \ldots, y_{st})$, s = 1, ..., N, is

$\prod_{j=1}^t \dfrac{e^{y_{sj}(\alpha_s + \beta_j)}}{1 + e^{\alpha_s + \beta_j}} = \dfrac{e^{\alpha_s\sum_j y_{sj}}\, e^{\sum_j \beta_j y_{sj}}}{\prod_{j=1}^t (1 + e^{\alpha_s + \beta_j})}.$

Thus, the marginal probability of response pattern $i \in \mathcal Z$ is

$\pi_i = \pi_{i_1\cdots i_t} = \exp\left(\sum_{j=1}^t \beta_j i_j\right) \int \dfrac{e^{\alpha S_i}}{\prod_{j=1}^t (1 + e^{\alpha + \beta_j})}\, dF(\alpha).$   (3.3)

We notice that the integral in (3.3) is a complex function of $\boldsymbol\beta$ that depends on the data only through the sufficient statistic $S_i = \sum_{j=1}^t i_j$, the raw score for pattern i. These marginal probabilities then have structure that is a special case of

$\pi_{i_1\cdots i_t} = \exp\left(\sum_{j=1}^t \beta_j i_j\right) h(i_1, \ldots, i_t),$   (3.4)

where h is a parameter that is invariant to permutations of its arguments. Taking the natural logarithm of both sides of (3.4) gives (3.2). This nonparametric formulation will prove important when we consider the model's utility in the capture-recapture setting. We will see in Section 3.3 that fitting this model provides no information on the population size.

Darroch et al. (1993) and Agresti (1994) proposed the simpler loglinear model of homogeneous two-factor interaction (H02),

$\log(\mu_{i_1\cdots i_t}) = \mu + \beta_1 I(i_1 = 1) + \cdots + \beta_t I(i_t = 1) + \binom{\sum_{j=1}^t i_j}{2}\lambda,$   (3.5)

for capture-recapture. This model is the special case of the loglinear model of no three-factor interaction in which all associations are identical. It is also a special case of the quasi-symmetry model (3.2) in which only the second-order interactions are allowed to differ from zero. This simple model has only one more parameter than the model of mutual independence of the t responses, which is (3.5) with λ = 0. This model proves useful for estimating the population size, since the quasi-symmetry model provides no N-estimate. Chapter 5 will show that confidence intervals for N produced by this model have good coverage probabilities when population heterogeneity is present.

3.1.2 Marginal Maximum Likelihood (MML) Approach to Estimation

A second approach treats $\{\alpha_s\}$ as random effects, typically having a normal distribution with mean 0 and unknown variance $\sigma^2$, for which

$\text{logit}(p_{sj}) = \sigma Z_s + \beta_j$   (3.6)

with $Z_s \sim N(0, 1)$. We refer to this model as the logistic-normal (LN) model. One integrates over this distribution to obtain the "marginal likelihood" of $(\sigma, \boldsymbol\beta)$ given the $2^t$ observed cell counts. The fitted counts from the marginal model have quasi-symmetric structure, since the model assumes a particular form for F in equation (3.3). This model also has only one more parameter than the model of mutual independence, which is (3.6) with σ = 0.
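Before turning to estimation, it may help to see the logistic-normal model as a data-generating mechanism. The following sketch simulates capture histories from model (3.6); the parameter values are hypothetical, chosen only for illustration.

    import numpy as np

    rng = np.random.default_rng(1)
    N, t = 200, 6                        # true population size, sampling occasions
    sigma, beta = 1.0, np.full(t, -1.0)  # hypothetical heterogeneity and occasion effects

    z = rng.standard_normal(N)                               # latent catchability Z_s ~ N(0,1)
    p = 1.0 / (1.0 + np.exp(-(sigma * z[:, None] + beta)))   # logit(p_sj) = sigma*Z_s + beta_j
    y = rng.uniform(size=(N, t)) < p                         # capture indicators y_sj

    observed = y.any(axis=1)         # subjects with at least one capture
    n = observed.sum()               # n <= N: what a real study records
    print(f"simulated N = {N}, observed n = {n}")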
Bock and Lieberman (1970), Bock and Aitkin (1981), and Thissen (1982) discuss a direct method of maximizing the log-likelihood based on the Newton-Raphson algorithm.

Direct Approach to Maximization

A direct approach to MML estimation maximizes a Gaussian quadrature approximation to the marginal log-likelihood using a maximization algorithm. Bock and Aitkin (1981) demonstrated the use of the Newton-Raphson algorithm for a general item response model. The probability that a subject with ability Z = z has capture history $i \in \mathcal Z$ is

$\pi_{iz} = \prod_{j=1}^t \dfrac{e^{i_j(\sigma z + \beta_j)}}{1 + e^{\sigma z + \beta_j}}.$

Thus, the probability that a randomly selected subject has that pattern is the marginal probability

$\pi_i = \int \pi_{iz}\,\phi(z)\,dz = \int \prod_{j=1}^t \dfrac{e^{i_j(\sigma z + \beta_j)}}{1 + e^{\sigma z + \beta_j}}\,\phi(z)\,dz,$   (3.7)

where $\phi(z)$ is the standard normal density. This yields the marginal multinomial log-likelihood for $(\sigma, \boldsymbol\beta)$ given the cell counts $\mathbf n$,

$l(\sigma, \boldsymbol\beta; \mathbf n) \propto \sum_{i\in\mathcal Z} n_i\log(\pi_i).$   (3.8)

Since no closed-form expressions exist for the integrals (3.7) appearing in (3.8), one can use numerical integration to obtain an approximation. Gauss-Hermite quadrature (Stroud and Secrest 1966) approximates an integral of the form $\int f(z)\phi(z)\,dz$ by the expression $\sum_{k=1}^q f(z_k)v_k$, where the $z_k$'s are known as quadrature points or nodes and the $v_k$'s are the corresponding weights. The choice of the number of quadrature points q determines the degree of accuracy, and the weights $v_k$ are usually scaled to satisfy $\sum_{k=1}^q v_k = 1$. Thus, the marginal probabilities (3.7) are approximated by

$\tilde\pi_i = \sum_{k=1}^q \prod_{j=1}^t \dfrac{e^{i_j(\sigma z_k + \beta_j)}}{1 + e^{\sigma z_k + \beta_j}}\, v_k,$

and

$l(\sigma, \boldsymbol\beta; \mathbf n) \propto \sum_{i\in\mathcal Z} n_i\log(\tilde\pi_i)$   (3.9)

is the objective function maximized with respect to $(\sigma, \boldsymbol\beta)$.

Newton-Raphson has traditionally been popular for ML analysis because it provides an estimate of the precision of the ML estimates as a by-product of the algorithm. Suppose, in general, we wish to maximize a nonlinear function $g(\boldsymbol\theta)$ with respect to the unknown parameter vector $\boldsymbol\theta$. Let $\boldsymbol\theta^{(p)}$ be the pth guess for $\hat{\boldsymbol\theta}$, the value that maximizes g. Let $\mathbf q' = (\partial g/\partial\theta_1, \partial g/\partial\theta_2, \ldots)$, let $\mathbf H$ denote the matrix having entries $h_{ij} = \partial^2 g/\partial\theta_i\partial\theta_j$, and let $\mathbf q^{(p)}$ and $\mathbf H^{(p)}$ be the corresponding quantities evaluated at $\boldsymbol\theta^{(p)}$. At iteration p, the (p+1)st guess relies on the Taylor series expansion of $g(\boldsymbol\theta)$ around $\boldsymbol\theta^{(p)}$,

$Q^{(p)}(\boldsymbol\theta) = g(\boldsymbol\theta^{(p)}) + \mathbf q^{(p)\prime}(\boldsymbol\theta - \boldsymbol\theta^{(p)}) + \tfrac12(\boldsymbol\theta - \boldsymbol\theta^{(p)})'\mathbf H^{(p)}(\boldsymbol\theta - \boldsymbol\theta^{(p)}).$   (3.10)

Solving $\partial Q^{(p)}(\boldsymbol\theta)/\partial\boldsymbol\theta = \mathbf q^{(p)} + \mathbf H^{(p)}(\boldsymbol\theta - \boldsymbol\theta^{(p)}) = \mathbf 0$ yields the (p+1)st guess for $\boldsymbol\theta$,

$\boldsymbol\theta^{(p+1)} = \boldsymbol\theta^{(p)} - (\mathbf H^{(p)})^{-1}\mathbf q^{(p)},$   (3.11)

as long as $\mathbf H^{(p)}$ is nonsingular.

For the logistic-normal model, $\boldsymbol\theta = (\sigma, \boldsymbol\beta)$ and $g(\boldsymbol\theta) = l$. Denote

$p_{jk} = \dfrac{\exp(\sigma z_k + \beta_j)}{1 + \exp(\sigma z_k + \beta_j)}.$

The lth element, l = 1, ..., t+1, of the score vector $\mathbf q$ is

$\dfrac{\partial l}{\partial\theta_l} = \sum_{i\in\mathcal Z} \dfrac{n_i}{\tilde\pi_i}\dfrac{\partial\tilde\pi_i}{\partial\theta_l}.$   (3.12)

If we write

$\tilde\pi_i = \sum_{k=1}^q f_{ik}v_k,$   (3.13)

where

$f_{ik} = \prod_{j=1}^t p_{jk}^{\,i_j}(1 - p_{jk})^{1-i_j},$

then the elements of the score vector have the form

$\dfrac{\partial l}{\partial\theta_l} = \sum_{i\in\mathcal Z} \dfrac{n_i}{\sum_{h=1}^q f_{ih}v_h}\sum_{k=1}^q \dfrac{\partial f_{ik}}{\partial\theta_l}v_k,\quad l = 1, \ldots, t+1.$   (3.14)

Using the identity that, for any positive function $f(\theta)$,

$\dfrac{\partial f}{\partial\theta} = f(\theta)\dfrac{\partial\log f(\theta)}{\partial\theta},$   (3.15)

equation (3.14) can be re-expressed as

$\dfrac{\partial l}{\partial\theta_l} = \sum_{i\in\mathcal Z}\sum_{k=1}^q n_i w_{ik}\dfrac{\partial\log f_{ik}}{\partial\theta_l},\quad l = 1, \ldots, t+1,$   (3.16)

where $w_{ik} = f_{ik}v_k/(\sum_{h=1}^q f_{ih}v_h)$,

$\dfrac{\partial\log f_{ik}}{\partial\sigma} = \sum_{j=1}^t z_k(i_j - p_{jk}),$

and

$\dfrac{\partial\log f_{ik}}{\partial\beta_j} = (i_j - p_{jk}),\quad j = 1, \ldots, t.$

The Hessian matrix $\mathbf H$ has elements

$h_{lm} = \dfrac{\partial^2 l}{\partial\theta_l\partial\theta_m} = \sum_{i\in\mathcal Z} n_i \sum_{k=1}^q \left[w_{ik}\dfrac{\partial^2\log f_{ik}}{\partial\theta_l\partial\theta_m} + \dfrac{\partial\log f_{ik}}{\partial\theta_l}\dfrac{\partial w_{ik}}{\partial\theta_m}\right],$

where

$\dfrac{\partial^2\log f_{ik}}{\partial\sigma^2} = -\sum_{j=1}^t z_k^2\, p_{jk}(1 - p_{jk}),\quad \dfrac{\partial^2\log f_{ik}}{\partial\beta_j^2} = -p_{jk}(1 - p_{jk}),\quad \dfrac{\partial^2\log f_{ik}}{\partial\sigma\,\partial\beta_j} = -z_k\, p_{jk}(1 - p_{jk}),$

and

$\dfrac{\partial w_{ik}}{\partial\theta_m} = \dfrac{\left(\sum_{h=1}^q f_{ih}v_h\right)v_k\dfrac{\partial f_{ik}}{\partial\theta_m} - f_{ik}v_k\sum_{h=1}^q v_h\dfrac{\partial f_{ih}}{\partial\theta_m}}{\left(\sum_{h=1}^q f_{ih}v_h\right)^2}.$

The inverse of the negative Hessian evaluated at the MML estimate, $(-\mathbf H(\hat{\boldsymbol\theta}))^{-1}$, is an estimate of the large-sample variance-covariance matrix of $\hat{\boldsymbol\theta} = (\hat\sigma, \hat{\boldsymbol\beta})$.
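The following sketch implements the quadrature approximation (3.9) with standard NumPy tools; the node and weight rescaling converts the Gauss-Hermite rule for integrals against $e^{-x^2}$ into a rule for integrals against the standard normal density. The function names are ours, not from any package.

    import numpy as np
    from numpy.polynomial.hermite import hermgauss
    from itertools import product

    def marginal_probs(sigma, beta, q=10):
        """Gauss-Hermite approximation to the marginal probabilities (3.7),
        returned for all 2^t capture histories in lexicographic order."""
        x, w = hermgauss(q)
        z, v = np.sqrt(2.0) * x, w / np.sqrt(np.pi)    # nodes z_k; weights v_k sum to 1
        t = len(beta)
        hist = np.array(list(product([0, 1], repeat=t)))           # all 2^t patterns
        p = 1.0 / (1.0 + np.exp(-(sigma * z[:, None] + beta)))     # p_jk, shape (q, t)
        # f_ik = P(history i | z_k) = prod_j p_jk^{i_j} (1 - p_jk)^{1 - i_j}
        f = np.prod(np.where(hist[:, None, :] == 1, p, 1.0 - p), axis=2)
        return hist, f @ v                                          # pi-tilde_i

    def loglik(sigma, beta, counts, q=10):
        """Marginal multinomial log-likelihood (3.9), up to an additive constant."""
        _, pi = marginal_probs(sigma, beta, q)
        return float(np.sum(counts * np.log(pi)))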
We choose to use another maximization algorithm for direct maximization of the log-likelihood, since we will not base confidence intervals on the large-sample standard errors. Feasible Sequential Quadratic Programming (FSQP) (Zhou and Tits, 1994) is a set of high-quality FORTRAN subroutines for the minimization of a smooth objective function subject to nonlinear equality and inequality constraints, linear equality and inequality constraints, and simple bounds on the variables. These routines are based on an iterative algorithm that searches away from a given feasible iterate in an arc whose direction is determined by an estimate of the Hessian of the Lagrangian. This arc search along a function's surface is known as an Armijo-type search. We use FSQP to obtain the MML estimate (MMLE) of $\boldsymbol\theta$ by minimizing the negative of equation (3.9). We maximize the log-likelihood with the only constraint being a lower bound of zero for the variance component σ.

Indirect Maximization

For completeness, we outline an indirect approach to MML estimation in the logistic-normal model. This approach takes the form of an Expectation-Maximization (EM) algorithm (Dempster, Laird, and Rubin, 1977) and is based on the observation that the likelihood equations take the simple form of those from a weighted generalized linear model. Specifically, the score equations (3.16) can be rewritten as

$\dfrac{\partial l}{\partial\sigma} = \sum_{k=1}^q\sum_{j=1}^t z_k\left[T_{jk} - N_k p_{jk}\right] = 0$

and

$\dfrac{\partial l}{\partial\beta_j} = \sum_{k=1}^q \left[T_{jk} - N_k p_{jk}\right] = 0,\quad j = 1, \ldots, t,$

where

$N_k = \sum_{i\in\mathcal Z} n_i w_{ik}$   (3.17)

can be interpreted as the expected number of individuals with a latent variable value of $z_k$, and

$T_{jk} = \sum_{i\in\mathcal Z} n_i i_j w_{ik}$   (3.18)

is the expected number of subjects with catchability $z_k$ captured on occasion j. These equations take the familiar form of the likelihood equations for a logistic regression analysis of $T_{jk}$ successes out of $N_k$ trials.

This formulation suggests an iterative EM procedure. At the pth step, given parameter values $\boldsymbol\theta^{(p)}$, the expectation step computes $\{T_{jk}^{(p)}\}$ and $\{N_k^{(p)}\}$ using (3.17) and (3.18). The maximization step obtains updated parameter estimates $\boldsymbol\theta^{(p+1)}$ through an ordinary logistic regression analysis of $\{T_{jk}^{(p)}\}$ successes out of $\{N_k^{(p)}\}$ trials. One typically iterates between the two steps until some criterion, such as the change in successive deviances or the maximum change in successive parameter values, is less than some predetermined tolerance. A sketch of one such iteration appears below.

The Newton-Raphson and EM methods of estimation both have advantages and disadvantages. The Newton-Raphson algorithm converges relatively quickly, having a quadratic convergence rate, and provides large-sample standard errors for the MML estimates. A drawback of Newton-Raphson is the complexity of the derivatives, and the repeated inversion of $\mathbf H$ becomes computationally intensive for large t, although with today's computing facilities, it is not too much so.

The biggest advantage of the EM algorithm is that it is computationally simpler than Newton-Raphson to implement. Since the EM algorithm updates the parameter estimates through logistic regression, it is simple to program the algorithm in standard software packages such as SAS, S-plus, or GLIM. The algorithm is, however, notoriously slow to converge, with convergence becoming very slow after the algorithm quickly moves to a neighborhood of the estimates. In fact, as we will see in the next section, this algorithm has the potential to "stall"; that is, the algorithm can stop prematurely. Also, the EM algorithm does not provide standard error estimates without additional computation.
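A minimal sketch of one EM iteration, assuming statsmodels is available for the weighted logistic fit; the quantities follow (3.16)-(3.18), and the inputs hist, z, and v are the capture-history matrix, quadrature nodes, and weights from the previous sketch.

    import numpy as np
    import statsmodels.api as sm

    def em_step(sigma, beta, counts, hist, z, v):
        """One EM iteration for the logistic-normal model on the complete table.

        E-step: weights w_ik, then N_k (3.17) and T_jk (3.18).
        M-step: logistic regression of T_jk successes out of N_k trials
        on (z_k, occasion dummies)."""
        q, t = len(z), len(beta)
        p = 1.0 / (1.0 + np.exp(-(sigma * z[:, None] + beta)))            # (q, t)
        f = np.prod(np.where(hist[:, None, :] == 1, p, 1.0 - p), axis=2)  # f_ik
        w = f * v
        w /= w.sum(axis=1, keepdims=True)                                 # w_ik

        Nk = counts @ w                                # expected subjects at node k
        Tjk = (counts[:, None] * w).T @ hist           # expected captures, shape (q, t)

        # M-step: one row per (node k, occasion j), node-major order
        succ = Tjk.ravel()
        fail = np.repeat(Nk, t) - succ
        X = np.column_stack([np.repeat(z, t), np.tile(np.eye(t), (q, 1))])
        fit = sm.GLM(np.column_stack([succ, fail]), X,
                     family=sm.families.Binomial()).fit()
        return fit.params[0], fit.params[1:]           # updated (sigma, beta)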
After estimating the parameters, the fit of the assumed model can be assessed by the likelihood-ratio statistic

$G^2 = 2\sum_{i\in\mathcal Z} n_i\log\left(\dfrac{n_i}{\hat\mu_i}\right),$   (3.19)

where $\hat\mu_i = N\hat\pi_i$ and $\hat\pi_i = \tilde\pi_i(\hat\sigma, \hat{\boldsymbol\beta})$ is the estimate of the Gaussian quadrature approximation of $\pi_i$, $i \in \mathcal Z$. If all expected frequencies are greater than or equal to five, this LR statistic has an approximate $\chi^2$ distribution with df = ($2^t - 1$ free cell probabilities) $-$ ($t + 1$ parameters) = $2^t - t - 2$.

3.2 A Latent Class Model

This chapter also applies a latent class model to the capture-recapture problem. General latent class models were first introduced by Goodman (1974). In general terms, suppose we observe measurements on a set of t categorical variables $\{V_j\}$, where $V_j$ has $L_j$ levels, j = 1, ..., t, so that subjects can be cross-classified into a $\prod_{j=1}^t L_j$ contingency table. Latent class models postulate the joint distribution of the t observed, or manifest, variables as a mixture of L distributions, defined by the L classes of a latent, or unobservable, categorical variable Z. This latent variable categorizes individuals into L homogeneous classes. Each of the L components of the mixture distribution is assumed to satisfy mutual independence among the t variables, so that observed associations among these manifest variables are a product of differences between latent classes. If we let $(i_1, \ldots, i_t)$ be a subject's classification according to the t manifest variables, and $(i_1, \ldots, i_t, i_Z)$ be the joint response to these variables along with latent class membership, the observed counts $\{n_{i_1\cdots i_t}\}$ represent a collapsing of the $\prod_{j=1}^t L_j \times L$ table over the levels of Z. Assuming mutual independence of $(V_1, \ldots, V_t)$ given Z implies that the cell counts in this $\prod_{j=1}^t L_j \times L$ table satisfy the loglinear model

$\log(\mu_{i_1\cdots i_t i_Z}) = \mu + \lambda^Z_{i_Z} + \sum_{k=1}^t \lambda^{V_k}_{i_k} + \sum_{k=1}^t \lambda^{V_kZ}_{i_k i_Z}.$

Thus, the observed counts satisfy

$\log(\mu_{i_1\cdots i_t}) = \mu + \sum_{k=1}^t \lambda^{V_k}_{i_k} + \log\left[\sum_{z=1}^L \exp\left(\lambda^Z_z + \sum_{k=1}^t \lambda^{V_kZ}_{i_k z}\right)\right].$

3.2.1 Quasi-Symmetric Latent Class Model

We consider a special case of this general latent class model that requires the associations $\{\lambda^{V_kZ}_{i_k i_Z}\}$ between the latent variable and the t manifest variables to be identical. This latent class model is a special case of models introduced by Lindsay et al. (1991) and Agresti and Lang (1993). If we assume there are only two latent classes, this model is the special case of the logistic model (3.1) with only two possible values for $\alpha_s$. In terms of $\{\mu_{i_1\cdots i_t}\}$, it has the quasi-symmetric structure whereby the associations between the binary latent variable and the t responses are identical.

Relationship to Logistic-Normal Model

This quasi-symmetric latent class model (QLC) with two latent classes relates to the logistic-normal model, being a generalization of the crude approximation of it that uses only two quadrature points. When the weights $\{v_k\}$ are unknown, the logistic-normal model with q = 2 is equivalent to the QLC model with L = 2. Consider the three-sample case for notational convenience; generalizations to t > 3 are straightforward. Consider the marginal probability of capture history $i = (i_1, i_2, i_3)$,

$\pi_{i_1i_2i_3} = \int \prod_{j=1}^3 \dfrac{\exp(i_j(\sigma z + \beta_j))}{1 + \exp(\sigma z + \beta_j)}\,\phi(z)\,dz.$
Two-point Gaussian quadrature yields $z = (z_1, z_2) = (-1, 1)$, but with only 2 nodes, the model is equivalent if z is treated as a factor: $z = (0, 1)$, so that

$\pi_{i_1i_2i_3} = \exp\left(\sum_{j=1}^3 \beta_j i_j\right)\sum_{k=1}^2 \exp\left(\sigma z_k\sum_{j=1}^3 i_j + \log(v_k) - \log\prod_{j=1}^3(1 + \exp(\sigma z_k + \beta_j))\right).$

Taking the log of both sides yields

$\log(\pi_{i_1i_2i_3}) = \sum_{j=1}^3 \beta_j i_j + \log\left[\sum_{k=1}^2 \exp\left(\sigma z_k\sum_{j=1}^3 i_j + \log(v_k) - \log\prod_{j=1}^3(1 + \exp(\sigma z_k + \beta_j))\right)\right].$

This model satisfies the QLC structure of

$\log(\pi_{i_1i_2i_3}) = \lambda^{V_1}_{i_1} + \lambda^{V_2}_{i_2} + \lambda^{V_3}_{i_3} + \log\left[\sum_{k=1}^2 \exp\left(\lambda^{V_1Z}_{i_1k} + \lambda^{V_2Z}_{i_2k} + \lambda^{V_3Z}_{i_3k} + \lambda^Z_k\right)\right],$   (3.20)

where the item parameters $\boldsymbol\beta$ are equivalent to the main effect parameters for the manifest variables $(\lambda^{V_1}, \lambda^{V_2}, \lambda^{V_3})$. The association parameters $\lambda^{V_1Z} = \lambda^{V_2Z} = \lambda^{V_3Z}$ equal σ for $i_j = 1$ and k = 2, and 0 otherwise, satisfying quasi-symmetry. Also, constraining $\lambda^Z_1 = 0$ for identifiability yields

$\lambda^Z_2 = \left[\log(v_2) - \log\prod_{j=1}^3(1 + \exp(\sigma + \beta_j))\right] - \left[\log(v_1) - \log\prod_{j=1}^3(1 + \exp(\beta_j))\right].$

The QLC model with two latent classes also implies the LN (q = 2) model, providing equivalence between the two models. Consider the ML estimates $\hat\lambda = (\hat\lambda^{V_1}, \hat\lambda^{V_2}, \hat\lambda^{V_3}, \hat\lambda^{VZ}, \hat\lambda^Z_2)$ for model (3.20), where $\hat\lambda^{VZ}$ is the common value of $\hat\lambda^{V_1Z} = \hat\lambda^{V_2Z} = \hat\lambda^{V_3Z}$. Without loss of generality, we fit this model with the latent variable Z coded so that this association parameter is nonnegative. If this is not the case, we simply change the coding of the levels of the latent variable Z from (0, 1) to (1, 0). Assuming $\hat\lambda^{VZ} \geq 0$, we have $\hat\sigma = \hat\lambda^{VZ}$ and $\hat\beta_j = \hat\lambda^{V_j}$, j = 1, 2, 3. The values $\hat v_k$, k = 1, 2, in the LN model are then obtained by determining the values $v = (v_1, v_2)$ that satisfy the equations

$\hat\lambda^Z_2 = \left[\log(v_2) - \log\prod_{j=1}^3(1 + \exp(\hat\sigma + \hat\beta_j))\right] - \left[\log(v_1) - \log\prod_{j=1}^3(1 + \exp(\hat\beta_j))\right]$

and $v_1 + v_2 = 1$. Thus, there is a one-to-one equivalence between the parameters of the QLC (L = 2) model and the LN (q = 2) model.

This equivalence between the two models can be generalized in that the QLC model for a given L ≥ 2 is Aitkin's (1996) logistic-mixture model, which places no distributional assumption on the mixing distribution other than that it has L mass points. This is the model fit sequentially for increasing L in order to obtain Aitkin's NPML estimates of the unknown mass points $\boldsymbol\alpha = (\alpha_1, \ldots, \alpha_L)$, weights $\mathbf v = (v_1, \ldots, v_L)$, and $\boldsymbol\beta$. A simple extension of the previous argument shows this equivalence. The marginal probabilities for the logistic-mixture model with L mass points satisfy

$\log(\pi_{i_1\cdots i_t}) = \sum_{j=1}^t \beta_j i_j + \log\left[\sum_{k=1}^L \exp\left(\alpha_k\sum_{j=1}^t i_j + \log(v_k) - \log\prod_{j=1}^t(1 + \exp(\alpha_k + \beta_j))\right)\right].$

Arguing as in the q = 2 case, we see that $\alpha_k$, k = 1, ..., L, are the common association parameters $\lambda^{V_1Z}_{1k} = \cdots = \lambda^{V_tZ}_{1k}$, and that

$\lambda^Z_k = \left[\log(v_k) - \log\prod_{j=1}^t(1 + \exp(\alpha_k + \beta_j))\right] - \left[\log(v_1) - \log\prod_{j=1}^t(1 + \exp(\alpha_1 + \beta_j))\right].$

3.2.2 Estimation

One can fit this model using the EM algorithm. The distribution of the complete data, the $\prod_{j=1}^t L_j \times L$ contingency table, has regular exponential form, so that only the complete-data sufficient statistics must be estimated at each E-step. (See Goodman (1974) and Lang (1992) for an EM approach to latent class models in general and Agresti and Lang (1993) for this specific case.) The $\hat v_k$, k = 1, 2, are the values that simultaneously satisfy

$\hat\lambda^Z_2 = \left[\log(v_2) - \log\prod_{j=1}^t(1 + \exp(\hat\sigma + \hat\beta_j))\right] - \left[\log(v_1) - \log\prod_{j=1}^t(1 + \exp(\hat\beta_j))\right]$

and $\hat v_1 + \hat v_2 = 1$. Alternatively, we estimate $(v_1, \sigma, \boldsymbol\beta)$ using FSQP. This optimization routine is much faster than the EM algorithm, which can stall before converging to the maximum likelihood estimates.
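Because FSQP is a FORTRAN library, the following sketch illustrates the same direct-maximization idea with scipy's bounded quasi-Newton routine as a stand-in. The parameterization (a logit transform for $v_1$, a nonnegativity bound on the association parameter) is one convenient choice, not the only one.

    import numpy as np
    from scipy.optimize import minimize
    from itertools import product

    def fit_two_class(counts, t):
        """Direct ML for the two-class model (equivalently, LN with q = 2 and
        free weights).  Parameters: (sigma, beta_1..beta_t, logit v_1)."""
        hist = np.array(list(product([0, 1], repeat=t)))
        z = np.array([0.0, 1.0])                      # two latent classes as a factor

        def negloglik(theta):
            sigma, beta = theta[0], theta[1:t + 1]
            v1 = 1.0 / (1.0 + np.exp(-theta[-1]))     # keeps v on the simplex
            v = np.array([v1, 1.0 - v1])
            p = 1.0 / (1.0 + np.exp(-(sigma * z[:, None] + beta)))
            f = np.prod(np.where(hist[:, None, :] == 1, p, 1.0 - p), axis=2)
            return -np.sum(counts * np.log(f @ v))

        res = minimize(negloglik, np.zeros(t + 2), method="L-BFGS-B",
                       bounds=[(0, None)] + [(None, None)] * (t + 1))
        return res   # res.x holds (sigma-hat, beta-hat, logit v1-hat)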
As an example, consider the complete table based on the data from Bock and Aitkin (1981), consisting of the response pattern counts of 1000 students' responses to 5 LSAT questions. The data are displayed in Table 3.1. FSQP gives the MMLEs for the 2-point LN model with arbitrary weights v as $\hat\sigma = 1.416$, $\hat{\boldsymbol\beta} = (3.240, 1.520, 0.761, 1.827, 2.616)$, and $\hat{\mathbf v} = (0.621, 0.379)$. The model fit yields deviance $G^2 = 23.36$ based on 23 df. The parameter estimates for the quasi-symmetric latent class model obtained from the EM algorithm using GLIM are reported in Table 3.2.

Table 3.1. Response pattern counts of 1000 students' answers to 5 dichotomously-scored LSAT questions

i        n_i    i        n_i
00000      3    10000     10
00001      6    10001     29
00010      2    10010     14
00011     11    10011     81
00100      1    10100      3
00101      1    10101     28
00110      3    10110     15
00111      4    10111     80
01000      1    11000     16
01001      8    11001     56
01010      0    11010     21
01011     16    11011    173
01100      0    11100     11
01101      3    11101     61
01110      2    11110     28
01111     15    11111    298

Table 3.2. EM Parameter Estimates for the Quasi-Symmetric Latent Class Model Applied to the LSAT Data

Estimates: 1.416, 5.593, 2.352, 1.520, 0.761, 1.827, 2.616, 2.292

Notice that $\hat\lambda^{V_1Z} = \cdots = \hat\lambda^{V_5Z} = \hat\sigma$, that the manifest-variable main effects correspond to $\hat{\boldsymbol\beta}$, and that

$\hat\lambda^Z_2 = \left[\log(.621) - \log\prod_{j=1}^5(1 + \exp(\hat\sigma + \hat\beta_j))\right] - \left[\log(.379) - \log\prod_{j=1}^5(1 + \exp(\hat\beta_j))\right],$

demonstrating the equivalence of the two models. The convergence of the EM algorithm is slow: 826 EM iterations were required to match the estimates obtained with FSQP. Also, the EM algorithm terminated twice (after 336 iterations and 345 iterations) before the true MLEs were obtained, even though the convergence criterion, the change in successive deviances, was tightened to 1e-20. This propensity of the EM algorithm to "stall" before reaching the true MLEs has been observed in fitting other generalized linear mixed models (Booth and Hobert 1997).

3.3 ML Estimation of N

We now discuss fitting the models outlined in the previous section in the capture-recapture setting. Again, we refer to the table with all $2^t$ cell counts known as the complete table and the one with $n_{0\cdots0}$ unknown as the incomplete table. Recall, also, from Section 2.2 that $L$ is the full unconditional likelihood, $L_1$ is the conditional (on n) likelihood, and $\tilde L$ is the approximate likelihood that treats N as a continuous parameter.

3.3.1 The Logistic-Normal Model

Using Gaussian quadrature, the logistic-normal model of Section 3.1.2 postulates the form of the probabilities $\pi_i(\boldsymbol\theta_{LN})$, where $\boldsymbol\theta_{LN} = (\sigma, \boldsymbol\beta)$, as

$\tilde\pi_i(\boldsymbol\theta_{LN}) = \sum_{k=1}^q \prod_{j=1}^t \dfrac{e^{i_j(\sigma z_k + \beta_j)}}{1 + e^{\sigma z_k + \beta_j}}\, v_k,\quad i \in \mathcal Z.$

One can easily implement the conditional (on n) approach to N-estimation of Section 2.2.1 by directly maximizing $L_1$ with respect to $(\sigma, \boldsymbol\beta)$ using Newton-Raphson or some other numerical optimization routine. Denote $\tilde\pi_i(\boldsymbol\theta_{LN}) = \tilde\pi_i$ for notational convenience. Since

$L_1(\boldsymbol\theta_{LN}; \mathbf n) = \dfrac{n!}{\prod_{i\in\mathcal Z^*} n_i!}\prod_{i\in\mathcal Z^*}\left(\dfrac{\tilde\pi_i}{\sum_{h\in\mathcal Z^*}\tilde\pi_h}\right)^{n_i},$

we have

$l_1 = \log L_1 \propto \sum_{i\in\mathcal Z^*} n_i\log(\tilde\pi_i) - n\log\left(\sum_{i\in\mathcal Z^*}\tilde\pi_i\right)$

and

$\dfrac{\partial l_1}{\partial\theta_l} = \sum_{i\in\mathcal Z^*}\dfrac{n_i}{\tilde\pi_i}\dfrac{\partial\tilde\pi_i}{\partial\theta_l} - \dfrac{n}{\sum_{h\in\mathcal Z^*}\tilde\pi_h}\sum_{i\in\mathcal Z^*}\dfrac{\partial\tilde\pi_i}{\partial\theta_l}.$

Thus, the likelihood equations used to obtain $\hat{\boldsymbol\theta}_C$ from the incomplete table are simply the score equations for the complete table minus the term corresponding to cell (0, ..., 0), corrected for the total number of subjects seen during the experiment. One then calculates $\hat N_c$ using (2.6). Sanathanan (1974) incorporated a two-step method of estimating $(\sigma, \boldsymbol\beta)$: estimate $\boldsymbol\beta$ using the CML approach and then maximize $L_1(\sigma, \hat{\boldsymbol\beta}_C)$ with respect to σ. The 95% profile likelihood interval of Section 2.3.1 can be constructed by using one of the complete-table algorithms of Section 3.1.2 to find the $n_{0\cdots0}$ values that yield a likelihood-ratio statistic of $G^2(\hat n_{0\cdots0}) + 3.84$. We obtain these fits by simply maximizing $L$ with respect to $(\sigma, \boldsymbol\beta)$ for each candidate N, while constraining σ ≥ 0.
Maximizing $\tilde L$ with respect to $(N, \boldsymbol\theta_{LN})$, while constraining σ ≥ 0, yields $\tilde N$.

3.3.2 Quasi-Symmetric Loglinear and Latent Class Models

From Tjur (1982), the fixed-effects approach of CML estimation for the logistic model is equivalent to fitting the quasi-symmetric loglinear model to $\{n_i\}$. Darroch et al. (1993) and Agresti (1994) noted that this CML estimation provides no information on N, since one of the likelihood equations, $\hat\mu_{0\cdots0} = n_{0\cdots0}$, shows that any value of $n_{0\cdots0}$ is consistent with the model. Thus, the deviance for this model remains constant for all N ≥ n, since the cell count $n_{0\cdots0}$ makes no contribution to the test statistic. This yields the profile likelihood confidence interval $(n, \infty)$ for N. They considered special cases of the quasi-symmetry model for which this is not the case. For instance, the loglinear model of homogeneous two-factor interaction (3.5) is a special case that is still more complex than the loglinear model of mutual independence that results from a lack of subject heterogeneity. One obtains $\hat N_c$ for this model using standard model-fitting software by fitting (3.5) with zero weight for cell count $n_{0\cdots0}$. One can obtain $\tilde N$ by numerically optimizing $\tilde L$ with respect to $(N, \boldsymbol\beta, \lambda)$ in expression (3.5).

One could compute the estimate $\hat N_c$ for the QLC model with L = 2 by estimating $(\sigma, \boldsymbol\beta, \mathbf v)$ using the EM algorithm while setting the weights of the two cells in the $2^t \times 2$ table that correspond to the unobserved cell in the $2^t$ table equal to zero. Because of this algorithm's potential to stall (Section 3.2.2), we use the model's equivalence to the logistic-normal model with q = 2, and use FSQP to maximize $L_1$ with respect to $\boldsymbol\theta_{LC} = (v_1, \sigma, \boldsymbol\beta)$. We also use FSQP to fit the complete tables generated when searching across $n_{0\cdots0}$ values to obtain a profile likelihood confidence interval. We obtain $\tilde N$ with FSQP by numerically maximizing $\tilde L$ with respect to $(N, \boldsymbol\theta_{LC})$.

3.4 Snowshoe Hare Example

Cormack (1989) reported a capture-recapture study having t = 6 consecutive trapping days for a population of snowshoe hares. Table 3.3, which displays the data, shows that 68 hares were observed. The N-estimates and confidence intervals from the different models are summarized in Table 3.4 for ease of comparison. The logistic-normal model using 10-point quadrature (LN10) yields $\hat\sigma_c = 0.96$ and $\hat N_c = 92.0$ from the conditional (on n) approach, and $\tilde\sigma = 0.91$ and $\tilde N = 89.0$ from numerical optimization. These point estimates remain unchanged with better approximations of the marginal probabilities (i.e., q > 10); in fact, a profile of both $\hat N_c$ and $\tilde N$ across q ≥ 2 reveals that these estimates stabilize for q ≥ 5. This is not always true, however, and we examine issues related to the estimation of σ in the next section. Table 3.3 shows the model fit conditional on n. For the logistic-normal model, the 95% profile likelihood interval corresponding to $\hat N_c$ is (74.8, 148.5). The 95% parametric percentile bootstrap interval corresponding to $\tilde N$ (with B = 1000) is (69.9, 129.4). The 95% nonparametric percentile and BCa bootstrap intervals are (70.8, 231.5) and (71.8, 286.4), respectively. The reason for the large discrepancies among the bootstrap intervals will be discussed in Section 3.5.

The loglinear model of homogeneous two-factor interaction gives point estimates almost identical to the logistic-normal model, with $\hat N_c = 90.5$ and $\tilde N = 88.2$. All fitted values for the $2^t - 1$ observed counts were no further than .04 from the fitted values produced by the logistic-normal model.
The 95% profile likelihood interval is (74.8, 125.1), while the 95% parametric percentile interval is (73.6, 114.3). The nonparametric percentile and BCa intervals are (73.8, 120.2) and (74.8, 127.4), respectively. Thus, we see much more consistency across the different confidence intervals for the simpler loglinear model than for the mixed model, the reason for which will be seen in Section 3.5.

The loglinear model of mutual independence yields $\hat N_c = 75.1$, profile likelihood interval (69.9, 83.3), and $\tilde N = 76.3$ with 95% parametric percentile interval (71.3, 77.9). The 95% nonparametric percentile and BCa intervals are (70.9, 80.3) and (71.1, 80.5), respectively. This model assumes no heterogeneity, which yields narrow intervals that severely underestimate N when this assumption is false.

Table 3.3. Results of capture-recapture of snowshoe hares. Rows index the capture pattern for samples 1-3, columns the pattern for samples 4-6; each cell shows the observed count, the LN10 fitted value*, and the QLC fitted value**.

Samples 4-6:   000     001     010     011     100     101     110     111
000             --       3       6       0       5       1       0       0
             (24.0)*  (2.3)   (5.4)   (0.9)   (3.2)   (0.5)   (1.2)   (0.3)
              (9.1)** (2.1)   (4.8)   (1.1)   (2.8)   (0.6)   (1.5)   (0.3)
001              3       2       3       0       0       1       0       0
              (4.8)   (0.8)   (1.8)   (0.5)   (1.1)   (0.3)   (0.6)   (0.3)
              (4.2)   (1.0)   (2.2)   (0.5)   (1.3)   (0.3)   (0.7)   (0.2)
010              4       2       3       1       0       1       0       0
              (3.9)   (0.6)   (1.5)   (0.4)   (0.9)   (0.2)   (0.5)   (0.2)
              (3.5)   (0.8)   (1.8)   (0.4)   (1.1)   (0.2)   (0.6)   (0.1)
011              1       0       0       0       0       0       0       0
              (1.3)   (0.3)   (0.8)   (0.3)   (0.5)   (0.2)   (0.4)   (0.3)
              (1.6)   (0.4)   (0.8)   (0.2)   (0.5)   (0.1)   (0.3)   (0.1)
100              4       1       1       1       2       0       2       0
              (6.8)   (1.1)   (2.6)   (0.6)   (1.5)   (0.4)   (0.9)   (0.4)
              (6.0)   (1.3)   (3.1)   (0.7)   (1.9)   (0.4)   (1.0)   (0.2)
101              4       0       3       0       1       0       2       0
              (2.3)   (0.6)   (1.3)   (0.5)   (0.8)   (0.3)   (0.7)   (0.4)
              (2.8)   (0.6)   (1.5)   (0.3)   (0.9)   (0.2)   (0.5)   (0.2)
110              2       0       1       0       1       0       1       0
              (1.9)   (0.5)   (1.1)   (0.4)   (0.7)   (0.3)   (0.6)   (0.4)
              (2.3)   (0.5)   (1.2)   (0.3)   (0.7)   (0.2)   (0.4)   (0.1)
111              1       1       1       0       0       0       1       2
              (1.0)   (0.4)   (0.9)   (0.5)   (0.5)   (0.3)   (0.7)   (0.7)
              (1.1)   (0.2)   (0.6)   (0.2)   (0.3)   (0.1)   (0.3)   (2.0)

* Logistic-normal model (q = 10)
** Quasi-symmetric latent class model

The conditional approach to the two-category quasi-symmetric latent class model yields $\hat N_c = 77.1$. Table 3.3 shows the fit. The 95% profile likelihood interval is (70.8, 87.4). The numerical optimization approach yields $\tilde N = 76.3$ and 95% parametric percentile interval (72.3, 84.3). The 95% nonparametric percentile and BCa intervals are slightly wider, at (72.6, 100.7) and (72.0, 91.3). These intervals are much narrower than those produced by the logistic-normal model, since this latent class model represents a compromise between the naive mutual independence model and the logistic-normal model.

Table 3.4. N-Estimates and Confidence Intervals Produced by the Logistic-Normal, Quasi-Symmetric Latent Class, Homogeneous Two-Factor Interaction, and Mutual Independence Models

                Profile                 Bootstrap
Model  N_c-hat  likelihood     N-tilde  percentile        percentile        BCa
                                        (parametric)      (nonparametric)   (nonparametric)
LN10   92.0     (74.8, 148.5)  89.0     (69.9, 129.4)     (70.8, 231.5)     (71.8, 286.4)
H02    90.5     (74.8, 125.1)  88.2     (73.6, 125.0)     (73.8, 120.2)     (74.8, 127.4)
QLC    77.1     (70.8, 87.4)   76.3     (72.3, 84.3)      (72.6, 100.7)     (72.0, 91.3)
IND    75.1     (69.9, 83.3)   76.3     (71.3, 77.9)      (70.9, 80.3)      (71.1, 80.5)

This example exhibits typical variability among the different point estimates and confidence intervals obtained from the different models, and also the variability among the different confidence intervals for a given model. These relationships and the reasons behind them are explored in the next section.

3.5 Behavior of the Log Likelihood and N Estimator

In the capture-recapture problem, the estimation of σ strongly affects the estimation of N.
Large population heterogeneity causes high correlations among the t captures and results in a large estimate of $n_{0\cdots0}$. For the snowshoe hare example, Figure 3.1 shows $\tilde N$ as a function of an assumed known value of σ. Plot 1 in that figure shows results using q = 10, 50, and 75 quadrature points. Since $\tilde N$ is a rapidly increasing function of σ, small changes in $\hat\sigma$ can have a large impact on the ML estimate of N. Plot 2 in Figure 3.1 displays a profile of $-2\log\tilde L$ in terms of σ, revealing that $\tilde\sigma = 0.9$, from which we see $\tilde N = 89.0$ in Plot 1. These plots suggest that the choice of q affects the results only for large σ. Since the maximum likelihood estimate of σ is moderate, we obtain identical results for q = 10, 50, and 75.

[Figure 3.1. $\tilde N$ and $-2\log\tilde L$ as a Function of σ for the Snowshoe Hare Data]

As heterogeneity increases in model (3.6), the probability of capturing a subject a relatively small or relatively large number of times increases. Capturing a large number of animals only once, for instance, suggests either (1) the logistic-normal model holds and considerable heterogeneity exists within the population, (2) the logistic-normal model holds and the probabilities of capture at each occasion are small, or (3) subjects experience trap avoidance, so that the local independence assumption of the logistic-normal model is inappropriate. Unfortunately, traditional goodness-of-fit tests do not always differentiate between a correct and an incorrect model. We demonstrate this difficulty in Section 3.7. This case of large heterogeneity causes difficulties in estimation when using the logistic-normal model, since a large $\hat\sigma$ results in a relatively flat likelihood surface, which implies unstable and imprecise N-estimates.

This problem is not serious with the snowshoe hare data. The profile log-likelihood surface with respect to (N, σ), maximized over $\boldsymbol\beta$, in Figure 3.5 shows a reasonably well-defined mode, which leads to a stable point estimate of N. The log-likelihood surface for the snowshoe hare data, however, is not unimodal, reflecting that many (N, σ) pairs exist that have log-likelihood values only slightly lower than the well-defined maximum that defines $\tilde N$. This second mode is reflected in the large differences between the parametric percentile bootstrap interval and the more robust (e.g., nonparametric percentile and nonparametric BCa) bootstrap intervals. Thus, the large inconsistencies among these intervals provide some clue that many (N, σ) pairs are consistent with the data. Because of this near non-identifiability problem, the simpler loglinear model of homogeneous two-factor interaction provides confidence intervals narrower than those generated by the logistic-normal model when σ is moderate to large. Simulation results in Chapter 5 support the use of this simpler model in these cases.

In contrast to the stable point estimates for the snowshoe hare data, consider Table 3.5 from Chao and Tsay (1996a,b), which reports the results from an epidemiological study designed to estimate the number of people infected during a 1995 hepatitis A outbreak in northern Taiwan. The 271 observed cases were reported from 3 sources: records based on a serum test taken by the Institute of Preventive Medicine of Taiwan (P), records reported by the National Quarantine Service (Q), and records based on questionnaires conducted by epidemiologists (E).
For the logistic-normal model, a profile of $\tilde N$ across q reveals that the N-estimates do not stabilize until q > 45. Numerical integration yields $\tilde\sigma = 2.6$ and $\tilde N = 2204.5$ using q = 10, and $\tilde\sigma = 2.9$ and $\tilde N = 4087.2$ using q = 50.

[Figure 3.5. Two Views of the Profile Log-likelihood Surface With Respect to N and σ, Maximized Over $\boldsymbol\beta$, for the Snowshoe Hare Data Set]

Table 3.5. Capture-history Counts and Conditional (on n) Fitted Values for the Hepatitis Study

P Q E    n_i    LN (q = 10)   LN (q = 50)   LC (L = 2)
0 0 0     --      1953.4        4280.3         --
0 0 1     63        61.0          61.0         61.0
0 1 0     55        58.0          58.0         58.0
0 1 1     18        17.0          17.0         17.0
1 0 0     69        68.0          68.0         68.0
1 0 1     17        20.0          20.0         20.0
1 1 0     21        19.0          19.0         19.0
1 1 1     28        28.0          28.0         28.0

Figure 3.2 shows that the N-estimates for different q diverge as σ increases, so that at $\tilde\sigma = 2.6$, $\tilde N$ differs dramatically for different values of q. As Aitkin (1996) noted, large q is necessary in random-effects models when σ is large. We also see that the q = 10 approximation breaks down around σ = 3. Better approximations (q > 10) break down at larger values of σ.

[Figure 3.2. $\tilde N$ and $-2\log\tilde L$ as a Function of σ for the Hepatitis Data]

The large $\tilde\sigma$ and large negative values of $\tilde{\boldsymbol\beta}$ here reflect the many subjects with one capture (187), compared with the numbers of subjects with two or three captures (56 and 28). With such a large $\tilde\sigma$, the data provide relatively little information about N. Figure 3.2, Plot 2 shows that the hepatitis data yield a flat likelihood with respect to σ, so a wide range of σ values is consistent with the data. The plausible σ values, however, correspond to a wide range of N-estimates, since $\tilde N$ increases sharply with respect to σ (Figure 3.2, Plot 1). For instance, a 95% nonparametric BCa interval for N is (786, 85876).

We present the hepatitis log-likelihood surface for comparison against the snowshoe hare surface of Figure 3.5. Figure 3.6 illustrates the relationship among N, σ, and the confidence on N, by showing the profile log-likelihood surface with respect to N and σ, maximized over $\boldsymbol\beta$, for the hepatitis data set. Nothing practical can be said about N based on the logistic-normal model except that it is not very small. The flat log-likelihood can cause wild fluctuations in the point estimate due to small changes in numerical precision or in the data themselves. By contrast, the snowshoe hare data have a well-defined maximum of the log-likelihood at (σ = .9, N = 89.0) in the range 68 ≤ N ≤ 500. In this case, the slope of the surface in the N direction is much steeper for small σ than for large σ, so that small to moderate heterogeneity produces narrower confidence intervals.

Why does the logistic-normal model sometimes provide little information about the population size? The reason is similar to the reason why every $n_{0\cdots0} > 0$ is plausible for the quasi-symmetry model. A consequence of the nonparametric random-effect motivation for the quasi-symmetry model of Section 3.1.1 is that for each candidate $n_{0\cdots0}$, there exists a mixing distribution for which the fitted value would be that $n_{0\cdots0}$. The class of possible mixing distributions is so rich that any $n_{0\cdots0}$ is equally plausible. The logistic-normal model implies a marginal distribution that is a special case of the quasi-symmetry model. This class of mixing distributions is still rich enough that
many values of σ may be consistent with the data.

[Figure 3.6. Two Views of the Profile Log-likelihood Surface With Respect to N and σ, Maximized Over $\boldsymbol\beta$, for the Hepatitis Data Set]

A wide range of plausible σ values means the candidate N values form a wide interval, amounting to little practical information about N.

Instead of allowing each subject to have a different propensity for capture, the latent class approach requires $Z_s$ to be in one of two latent classes, with unknown probabilities $\{v_k\}$. For the hepatitis example, Figure 3.3 portrays the deviance profile for the QLC model. The latent class model also yields a flat log-likelihood, yielding an arbitrary point estimate and profile likelihood interval (407, ∞).

[Figure 3.3. Deviance ($G^2$) Profile for 379 ≤ N ≤ 800 for the Hepatitis Data]

The flat log-likelihood incurred by this model for the hepatitis data is a result of the model's close relationship to the loglinear model of quasi-symmetry when t = 3. Lindsay et al. (1991) demonstrated that, when L ≥ (t + 1)/2, the quasi-symmetric latent class fit is very close to the conditional maximum likelihood fit of logistic model (3.1). They gave sufficient conditions for these two fits to be identical, and termed tables for which this is true concordant tables. Chao and Tsay (1996a, 1996b) note that the true population size for the hepatitis data is approximately 545. If we fit the latent class model to the complete data set, we see that the complete table is concordant, since the quasi-symmetric latent class model yields the conditional maximum likelihood fit given by the loglinear model of quasi-symmetry (see Table 3.6). The unobserved cell causes this equivalence to no longer hold for all concordant tables, since if it did, the profile likelihood interval would match the quasi-symmetry interval of (n, ∞). This relationship does, however, explain the small amount of information provided by the QLC model when t = 3. The profile likelihood lower bound, 407.2, is only slightly larger than the quasi-symmetry lower bound n = 271. We will see the similarities between the two models for t = 3 in Chapter 5. When t > 3, one can also obtain only a lower bound for N from this model. Chapter 5 will also demonstrate that if a table does provide a two-sided interval for N, such as in the snowshoe hare example, these intervals tend to be narrower than the logistic-normal and homogeneous two-factor interaction intervals. These simulations will show that when the logistic-normal model is the true model, these narrower intervals are optimistic, in that their true coverage is less than the nominal level.

Numerical optimization for the latent class model yields $\tilde N = 476.1$. The BCa bootstrap (B = 1000) interval is (408.3, 573.6), even though the log-likelihood is extremely flat. This is a major disadvantage of the bootstrap: it may not recognize the extremely flat log-likelihoods associated with a model. Figure 3.4 shows why this happens. This figure plots $-2\log\tilde L$ as a function of N for the observed hepatitis data (solid line) and for four tables obtained by resampling from the observed data.

[Figure 3.4. $-2\log\tilde L$ Profile for the Observed Hepatitis Data (Solid Line) and Four Resampled Tables (Dashed Lines)]

We remark that the values on the $-2\log\tilde L$ axis are not the true values for the resampled profiles.
Rather, these profiles are overlaid on the plot for the observed data, since we are simply interested in relaying (1) the N-estimates that are produced at the minimum of the four resampled profiles, and (2) the fact that the profiles are extremely flat and never exceed the minimum value plus 3.84 (the $\chi^2_1$ critical value). We see that the minimum $-2\log\tilde L$ values for the resamples are shifted somewhat away from the observed value, but that the resampled statistics $\tilde N^* = (423.6, 443.4, 547.5, 629.7)$ ignore the flat log-likelihoods associated with the observed and resampled tables. Thus, one can obtain a maximum, $\tilde N$, and bootstrap from this estimate to obtain a confidence interval even though almost all values of the missing cell count are consistent with the model. In fact, simulations in Chapter 5 demonstrate that, for t = 3, the narrow percentile intervals from the QLC model provide close to 0% coverage, while the profile likelihood intervals with upper bound ∞ provide coverage close to nominal. Of course, these intervals provide very little practical information on the population size.

3.6 Similarities to Other N-Estimation Problems

Flat log-likelihoods have arisen in other capture-recapture approaches. Burnham and Overton (1978) made a passing reference to the performance, in this respect, of the beta-binomial model, and they presented the jackknife estimators. Chao (1987), however, presented simulation results that showed very poor coverage probabilities for this estimator when t is small to moderate (i.e., t < 10).

Similarities exist between this problem and the related problem of N-estimation when observing k independent and identically distributed binomial counts with unknown N and success probability p; see, for instance, Olkin et al. (1981), Casella (1986), and Aitkin and Stasinopoulos (1989). Aitkin and Stasinopoulos (1989) demonstrated flat likelihoods for certain configurations of data that provide little information about N. All these authors demonstrated that when the log-likelihood is flat, the ML estimator is unstable, with small changes in the data yielding large changes in $\hat N$. For the hepatitis data, we notice that $\tilde N$ changes from 3194.3 to 3816.2 to 4599.5 when $n_{(1,\ldots,1)}$ changes from 27 to 28 to 29. Olkin et al. (1981) proposed new estimators that "stabilize" the ML estimate, also by jackknifing, but these often result in the underestimation of N in stable cases (Casella, 1986).

3.7 Comments

Fitting a variety of models to the snowshoe hare data and hepatitis data demonstrates that different models can produce dramatically different point and interval estimates of population size. This is because the problem of estimating the unobserved cell count $n_{0\cdots0}$ is inherently one of extrapolation. Knowing the true value of $n_{0\cdots0} = 274$ for the hepatitis data allows us to investigate the fits of these models to the complete table. Table 3.6 displays the fits of the logistic-normal model (a profile along q indicates that q = 10 is sufficient for the complete table) and the latent class model to the complete table. The logistic-normal model does not fit the complete table well, since $G^2_{\text{complete}} = 9.3$ with df = 3. The latent class model, on the other hand, has $G^2_{\text{complete}} = 1.0$ with df = 2 for the complete table. Indeed, the latent class model has a biologically plausible interpretation for the hepatitis data set: the population can be divided into two groups, one that is susceptible to the virus and one that is relatively immune. Compare to Table 3.5.
We cannot discern the difference in fits from the incomplete table, since, conditional on n, both models produce identical fits. This reflects the fact that the usual goodness-of-fit criteria are not appropriate for this extrapolation problem. This also suggests that models with poorer fits to the complete table could potentially fit the unobserved cell well enough to produce a confidence interval that contains the true N. So models with a poor conditional fit cannot be excluded from consideration. We demonstrate using a $2^4$ table resulting from the cross-classification of 263 individuals according to whether or not they contracted influenza during 4 influenza outbreaks. These data are considered in detail in Chapter 4. For now, we simply note that the logistic-normal (q = 10) fit to the complete table yields deviance $G^2_{\text{complete}} = 27.7$ based on df = 10, providing strong evidence of lack of fit. However, if we pretend that $n_{0000}$ is unknown and estimate N using the logistic-normal model, we also obtain evidence of lack of fit, with $G^2_{\text{complete}} = 26.5$ based on df = 9, but obtain N-estimate $\hat N_c = 204.2$ and profile likelihood interval (170.9, 388.0). Thus, subject-matter knowledge of the capture-recapture application becomes important in assessing the performance of different model-based estimators.

Table 3.6. Capture-history Counts and Fitted Values for the Complete Hepatitis Table

P Q E    n_i    LN (q = 10)   QLC (L = 2)
0 0 0    274      279.1         274.0
0 0 1     63       55.5          61.0
0 1 0     55       52.8          58.0
0 1 1     18       22.5          17.0
1 0 0     69       62.0          68.0
1 0 1     17       26.4          20.0
1 1 0     21       25.1          19.0
1 1 1     28       21.6          28.0

Since the standard techniques for model selection based on goodness-of-fit criteria can give misleading results in this setting, we recommend an exploratory data analysis approach to modelling capture-recapture data, instead of rejection/acceptance of models based on a goodness-of-fit criterion. We recommend looking at several models (such as the logistic-normal with various q, latent class models, and loglinear models) simultaneously, along with profile log-likelihood plots for the models and the variance component and item parameter estimates from the logistic-normal model. Simulations in Chapter 5 will show that when the logistic-normal model fit yields a large estimated variance component and a flat log-likelihood surface, the simpler loglinear model of homogeneous two-factor interaction is preferable. The simulations suggest that even if the true model is the logistic-normal model, the simpler loglinear model is competitive with the true model with respect to confidence interval coverage and superior with respect to interval width if the logistic-normal model yields a flat log-likelihood.

CHAPTER 4
ALTERNATIVE FORMS OF DEPENDENCE

In fitting the logistic-normal model of Section 3.1.2, we assume that the t responses are independent for a given subject. This assumption implies that any observed dependencies among the t samples are due solely to population heterogeneity. In other words, these models assume that a positive association among sampling occasions is not caused by the fact that capture on the first occasion increases a subject's probability of capture on the second, but rather that capture on the first occasion indicates an above-average susceptibility to capture and hence an above-average chance of capture on the second occasion.

In this chapter, we investigate models that relax this strong assumption. We first consider models that assume serial, or Markov, dependence among the t captures.
These models make sense when the occasions are ordered in time, since the probability of capture at occasion j depends on capture status at occasion j - 1, for j = 2, ..., t. Thus, these models are often appropriate in animal abundance studies in which the sampling occasions are trappings conducted sequentially in time and a subject experiences trap "avoidance" or trap "dependence." These methods are not appropriate for epidemiological data arising from t lists or records of subjects with a certain condition, such as the hepatitis data of Section 3.5, since the sampling occasions are not ordered. In such settings, it is reasonable to incorporate a dependence structure that is symmetric with respect to the t occasions. In Section 4.1, we describe this structure, which provides an alternative motivation for the loglinear model of homogeneous two-factor interaction of Section 3.1.1.

Sections 4.2 and 4.3 investigate two alternative random-effect models that allow for dependence structures more general than that implied by the logistic-normal model. Section 4.2 presents a model for capture-recapture that adds a normal random effect to the linear predictor of the loglinear model of mutual independence. This random effect models an observed table's departure from the simple mutual independence structure. We use an EM algorithm to fit this model conditional on the number of subjects observed, and investigate the use of profile likelihood confidence intervals for this model. Section 4.3 presents a model that is potentially useful when one has reason to believe that capture statuses at two or more occasions are negatively correlated, such as in the situation of trap "avoidance." This model, which we term the multivariate logit-normal model, is a binary analogue of Aitchison and Ho's (1989) Poisson-lognormal model for multivariate count data. To our knowledge, this binary analogue has not been addressed in the statistical literature.

4.1 Serial Dependence

Serial dependence occurs when the probability of capture at time j depends on the capture status at time j - 1, j = 2, ..., t. Duncan (1985) and Conaway (1990) presented a generalization of the subject-specific logistic model (3.1) that relaxes the local independence assumption. As in the local independence case, the CML approach provides no information on population size. We therefore investigate models that remove the requirement of main-diagonal saturation, and generalize these serial dependence models to situations in which the "sampling occasions" are not ordered in time.

4.1.1 Sequential Sampling Occasions

Duncan (1985) and Conaway (1989) incorporated serial dependence with the model

$p(\mathbf y_s \mid \alpha_s) = \dfrac{\exp\left(\alpha_s S_s + \sum_{j=1}^t \beta_j y_{sj} + \gamma h_s\right)}{\sum_{(y_1,\ldots,y_t)}\exp\left(\alpha_s\sum_j y_j + \sum_{j=1}^t \beta_j y_j + \gamma h(\mathbf y)\right)},$   (4.1)

where $S_s$ is the total number of captures for subject s and $h_s$ is the number of adjacent (in time) pairs of samples that have concordant responses for subject s. The serial dependence parameter γ reflects within-subject dependence among the t occasions. Thus, trap avoidance results in a negative γ value, while trap dependence results in a positive γ value. Analogous to the results for the logistic model of Section 3.1.1, the CML approach leads to the loglinear model

$\log(\mu_{i_1\cdots i_t}) = \sum_{j=1}^t \beta_j i_j + \lambda(i_1, \ldots, i_t) + D(i_1, \ldots, i_t)\gamma$   (4.2)

for the $2^t$ table, where $\lambda(i_1, \ldots, i_t)$ is invariant to permutations of its arguments and

$D(i_1, \ldots, i_t) = i_1i_2 + (1-i_1)(1-i_2) + i_2i_3 + (1-i_2)(1-i_3) + \cdots + i_{t-1}i_t + (1-i_{t-1})(1-i_t).$   (4.3)
Like its local independence analogue, this model does not provide population size estimates, since the $\lambda(0, \ldots, 0)$ parameter forces a perfect fit for the $\mathbf 0 = (0, \ldots, 0)$ cell. We examine two simpler models that provide population size estimates.

Serial Dependence Assuming No Heterogeneity

The first model, that of serial dependence (SE) assuming no heterogeneity in the population, results from dropping the $\lambda(i_1, \ldots, i_t)$ parameters from model (4.2). The model is

$\log(\mu_{i_1\cdots i_t}) = \sum_{j=1}^t \beta_j i_j + D(i_1, \ldots, i_t)\gamma.$   (4.4)

Table 4.1. Column of the covariate matrix corresponding to ω in model (4.5) for the models of homogeneous 2-factor interaction and serial dependence when t = 3

(i_1 i_2 i_3)   C(sum_j i_j, 2)   D(i_1, i_2, i_3)
000                   0                  2
001                   0                  1
010                   0                  0
011                   1                  1
100                   0                  1
101                   1                  0
110                   1                  1
111                   3                  2

This model accounts for dependencies among the t responses by the addition of the extra parameter γ. Both this model and the loglinear model of homogeneous two-factor interaction have the form

$\log(\boldsymbol\mu) = [\mathbf X_{2^t\times t} : \mathbf x_{t+1}]\binom{\boldsymbol\beta}{\omega},$   (4.5)

where $\mathbf X_{2^t\times t}$ is the covariate matrix corresponding to the mutual independence model, and ω is γ in the serial dependence model and λ in the homogeneous 2-factor interaction model. The only difference between the two models is the column of the covariate matrix, $\mathbf x_{t+1}$, that corresponds to ω. Table 4.1 displays the $\mathbf x_{t+1}$ associated with γ and λ for the two models when t = 3. We see that the serial dependence model smooths the fitted value for cell $\mathbf 0$ towards the value of cell $\mathbf 1 = (1, \ldots, 1)$ much more than the homogeneous 2-factor interaction model does, since both cell $\mathbf 0$ and cell $\mathbf 1$ have the largest number, t - 1, of adjacent concordant pairs. This is in contrast to the homogeneous two-factor interaction model, which smooths this unknown count towards the cells corresponding to one capture. Thus, when most subjects are captured on only one sampling occasion, the serial dependence model produces smaller population size estimates than the H02 model, since the number of subjects captured at all t occasions is small.

To illustrate, we fitted the serial dependence model to the snowshoe hare data of Section 3.4. The model yields point estimate $\hat N_c = 74.6$ and corresponding 95% profile likelihood interval (69.5, 83.8). These results are very close to the mutual independence results of $\hat N_c = 75.1$ and (69.9, 83.3), because $\hat\gamma = .039$ is close to zero relative to the variability in D. The serial dependence fit yields $G^2 = 58.2$ based on df = 55, while the mutual independence model yields $G^2 = 58.3$ on df = 56. Simulations in Chapter 5 will show that the addition of this dependence parameter improves upon the mutual independence model slightly by yielding confidence intervals that are slightly wider than the mutual independence intervals. Chapter 5 shows that, as a result, when the logistic-normal model truly holds, these intervals improve somewhat upon the poor coverage of the mutual independence intervals, but are also overly optimistic when large amounts of heterogeneity exist in the population. This is not surprising, since we obtained the model by dropping the $\lambda(i_1, \ldots, i_t)$ heterogeneity parameters from model (4.2).

Mixed Serial Dependence Model

We also investigate the performance of the mixed model that results from assuming $\alpha_s = \sigma Z_s$ in model (4.1), where $Z_s \stackrel{iid}{\sim} N(0, 1)$. This extends the logistic-normal model of Section 3.1.2 to allow for within-subject dependence. The quadrature approximations to the marginal cell probabilities for this model have the form

$\tilde\pi_{i_1\cdots i_t} = \sum_{k=1}^q \dfrac{\exp\left(\sigma z_k S_{i_1\cdots i_t} + \sum_{j=1}^t \beta_j i_j + \gamma h_{i_1\cdots i_t}\right)}{\sum_{i'\in\mathcal Z}\exp\left(\sigma z_k S_{i'} + \sum_{j=1}^t \beta_j i'_j + \gamma h_{i'}\right)}\, v_k,$   (4.6)
where $S_{i_1\cdots i_t}$ and $h_{i_1\cdots i_t}$ are the raw score and serial dependence value, respectively, associated with cell $i = (i_1, \ldots, i_t)$. It is worthwhile to investigate under what conditions the addition of the serial dependence parameter improves the coverage of the simpler logistic-normal model, and to compare the lengths of the confidence intervals produced by the two models. We investigate these questions through simulation in Chapter 5. This random-effects approach to model (4.1), like the logistic-normal model, can also yield flat log-likelihood surfaces with respect to N. For instance, for the snowshoe hare data set, this approach yields $\hat N_c = 337.7$ and 95% profile likelihood interval (75.7, 476.4). So, again, the inclusion of a random effect results in a model that may provide little information about the population size.

Serial Dependence + Homogeneous Two-Factor Interaction

Mimicking the developments in Section 3.1.1 that motivated the loglinear model of homogeneous two-factor interaction, we obtain a second serial dependence model, which we denote by HO2SE, by replacing the $\lambda(i_1, \ldots, i_t)$ parameters in model (4.2) by the homogeneous two-factor interaction dependence term, $\binom{\sum_j i_j}{2}\lambda$. This yields the model

$\log(\mu_{i_1\cdots i_t}) = \sum_{j=1}^t \beta_j i_j + \binom{\sum_{j=1}^t i_j}{2}\lambda + D(i_1, \ldots, i_t)\gamma,$   (4.7)

where, as before, $D(i_1, \ldots, i_t)$ has form (4.3). The loglinear model of homogeneous two-factor interaction is the special case of this model with γ = 0. To illustrate, we fit this model to the snowshoe hare data. This model yields N-estimate $\hat N_c = 92.3$ and 95% profile likelihood interval (75.5, 130.1). This model provides results similar to the loglinear model of homogeneous two-factor interaction, since $\hat\gamma = .34$. Unfortunately, we cannot interpret this estimate as a reflection of the within-subject dependence between the t responses, since this within-subject dependence and the dependencies resulting from population heterogeneity are confounded (Darroch and McCloud 1990). To see this, note that, under model (4.7), the conditional log odds ratios for adjacent occasions equal λ + 2γ. Including this serial dependence parameter in the model, however, can be worthwhile when estimating N. Simulations in Chapter 5 show that, when both within-subject dependencies and population heterogeneity exist, model (4.7) maintains close to nominal coverage in those cases where the loglinear model of homogeneous two-factor interaction produces intervals with coverage lower than the nominal level. Chapter 5 will also demonstrate that the logistic-normal model seriously overestimates N when subjects in the population experience trap avoidance. Thus, if one suspects trap avoidance in the form of a negative γ in model (4.1), model (4.7) is an attractive alternative for estimating the population size. We now provide an alternative motivation for the loglinear model of homogeneous 2-factor interaction by generalizing model (4.7) to the situation of unordered sampling occasions.

4.1.2 Unordered Sampling Occasions

The two models in the previous section do not apply to the hepatitis data set because the three "sampling occasions," lists of patients contracting hepatitis A, are not sequential. We now extend the models of the previous section to treat the t occasions symmetrically. We define alternative dependence covariates, $D^{sym}(i_1, \ldots, i_t)$, that, instead of considering only adjacent pairs of occasions, treat all occasion pairs equally. Specifically,

$D^{sym}(i_1, \ldots, i_t) = i_1i_2 + (1-i_1)(1-i_2) + i_1i_3 + (1-i_1)(1-i_3) + \cdots + i_{t-1}i_t + (1-i_{t-1})(1-i_t),$
4.2 An Overdispersed Poisson LogLinear Model

Chapter 5 shows that, when population heterogeneity is present, ignoring this heterogeneity by fitting the loglinear model of mutual independence to the incomplete table produces overly optimistic, narrow confidence intervals that systematically underestimate N. In this section, we pursue a more general Poisson model that accounts for a table's departure from mutual independence by adding a normal random effect with mean 0 and unknown variance, $\sigma^2$, on the log scale. This model differs from the logistic-normal model in that it postulates the random deviations from mutual independence as occurring at the cell level, instead of the subject level.

There are both advantages and disadvantages of this model for capture-recapture. The model is more general than the independence model, and the addition of the random effect produces confidence intervals for N that are wider than the independence interval. These intervals, however, require more computation. A sufficiently large number of quadrature points must be employed when approximating the marginal log-likelihood in order to obtain a continuous profile of the deviance with respect to N that yields a profile likelihood interval. Section 4.2.1 presents the model, while Section 4.2.2 describes an EM algorithm for fitting it. Section 4.2.3 demonstrates the advantages and disadvantages of the model for capture-recapture by fitting it to the snowshoe hare and hepatitis data sets of Chapter 3.

4.2.1 An Overdispersed LogLinear Model

The loglinear model of mutual independence for a $2^t$ table models the expected frequencies, $\{\mu_i\} = \{\mu_{i_1 \cdots i_t}\}$, as

\log(\mu_i) = \beta_0 + \beta_1 I(i_1 = 1) + \cdots + \beta_t I(i_t = 1).    (4.10)

Instead of adding association parameters to the model, we model an observed table's departure from the mutual independence model as overdispersion. Specifically, the model has form

\log(\mu_i) = \beta_0 + \beta_1 y_{i1} + \cdots + \beta_t y_{it} + \sigma Z_i,

where $Z_i \overset{i.i.d.}{\sim} N(0,1)$. Thus, $\mu_i$ is assumed to have a lognormal distribution whose log-scale mean is $\beta_0 + \beta_1 y_{i1} + \cdots + \beta_t y_{it}$ and whose log-scale variance is $\sigma^2$, yielding a Poisson-lognormal($\beta_0 + \beta_1 y_{i1} + \cdots + \beta_t y_{it}$, $\sigma^2$) model for the cell counts. This model contains the mutual independence model as a special case with $\sigma = 0$.
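Because the Poisson-lognormal marginal probability mass function has no closed form, even evaluating a single cell's likelihood contribution requires numerical integration. A minimal sketch of a q-point Gauss-Hermite approximation to the marginal pmf follows; the parameter values at the bottom are hypothetical.

import numpy as np
from math import lgamma

def poisson_lognormal_pmf(n, eta, sigma, q=20):
    # Approximate P(N = n) when log(mu) = eta + sigma*Z, Z ~ N(0,1),
    # and N | mu ~ Poisson(mu), via q-point Gauss-Hermite quadrature.
    x, w = np.polynomial.hermite.hermgauss(q)
    z = np.sqrt(2.0) * x           # nodes rescaled for a standard normal
    v = w / np.sqrt(np.pi)         # corresponding weights
    mu = np.exp(eta + sigma * z)
    log_pois = -mu + n * np.log(mu) - lgamma(n + 1)
    return np.sum(v * np.exp(log_pois))

# hypothetical parameter values
print(poisson_lognormal_pmf(n=3, eta=1.0, sigma=0.5))

Setting sigma=0 recovers the ordinary Poisson pmf, mirroring the fact that the model contains mutual independence as the special case $\sigma = 0$.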
The variability associated with the random effect will result in wider confidence intervals for N, alleviating the problem of extremely narrow confidence intervals caused by the overly simplistic mutual independence assumption. Since any extra variation beyond that specified by the mutual independence model is modelled on the same scale as the linear predictor, this approach is often employed in generalized linear models when there is more variability than the model predicts due to the omission of covariates. We next consider standard methods to fit this model.

4.2.2 Estimation

Like the generalized linear mixed models that incorporate a random effect to account for dependence within clusters of observations, such as the logistic-normal model of Section 3.1.2, overdispersed generalized linear mixed models that include a random effect for each observation are computationally difficult to fit using maximum likelihood. These models also produce marginal log-likelihoods that do not have closed form, again making Gauss-Hermite quadrature necessary when assuming a normal random effect. We use the EM algorithm described by Anderson and Hinde (1988) and Aitkin (1996). The form of the EM algorithm is similar to that of the EM algorithm detailed in Section 3.1.2 for fitting the logistic-normal model to the complete table, although that algorithm is slightly simpler, since there we can reduce the data to the expected number of successes and trials for a given value, $z_k$, of the random effect. For the overdispersed Poisson model, we must fit a response vector of length $2^t \times q$, where again $q$ is the number of quadrature points.

We proceed by following Aitkin (1996). Consider the marginal log-likelihood of $(\boldsymbol{\beta}, \sigma)$,

\ell(\boldsymbol{\beta}, \sigma; \mathbf{n}) = \sum_i \log\left(\int f(n_i \mid \boldsymbol{\beta}, \sigma, z_i)\,\phi(z_i)\,dz_i\right),

where

f(n_i \mid \boldsymbol{\beta}, \sigma, z_i) = \frac{e^{-\mu_i}\mu_i^{n_i}}{n_i!}

and $\mu_i$ is specified by the model, $\mu_i = \exp(\beta_0 + \beta_1 y_{i1} + \cdots + \beta_t y_{it} + \sigma z_i)$. q-point Gaussian quadrature yields the approximation

\ell(\boldsymbol{\beta}, \sigma; \mathbf{n}) \approx \sum_i \log\left(\sum_{k=1}^{q} v_k f_{ik}\right),

where $f_{ik}$ is the Poisson density for $n_i$ given value $z_k$ for the random effect. Compare with expression (3.13). Proceeding as in Section 3.1.2 and using identity (3.15), the $\beta_j$-element of the score vector has form

\frac{\partial \ell}{\partial \beta_j} = \sum_i \frac{\sum_{k=1}^{q} v_k f_{ik}\, \partial \log f_{ik} / \partial \beta_j}{\sum_{k=1}^{q} v_k f_{ik}} = \sum_i \sum_{k=1}^{q} w_{ik} S_{ik}(\beta_j),

where

w_{ik} = \frac{v_k f_{ik}}{\sum_{k=1}^{q} v_k f_{ik}}

and $S_{ik}(\beta_j)$ is simply the $\beta_j$-element of the score vector corresponding to the generalized linear model $\log(\mu_{ik}) = \beta_0 + \beta_1 y_{i1} + \cdots + \beta_t y_{it} + \sigma z_k$. Similarly,

\frac{\partial \ell}{\partial \sigma} = \sum_i \sum_{k=1}^{q} w_{ik} S_{ik}(\sigma).

Setting the score vector equal to zero yields likelihood equations corresponding to the weighted generalized linear model

\log\left(E[\mathbf{n}^*]\right)_{(2^t q) \times 1} = V_{(2^t q) \times (t+2)}\,\boldsymbol{\gamma}_{(t+2) \times 1}    (4.11)

with prior weights $\mathbf{w} = (w_{11}, w_{21}, \ldots, w_{2^t q})'$, where $\mathbf{n}^*_{(2^t q) \times 1} = (\mathbf{n}', \mathbf{n}', \ldots, \mathbf{n}')'$,

V_{(2^t q) \times (t+2)} = \begin{pmatrix} X & z_1 \mathbf{1}_{2^t} \\ X & z_2 \mathbf{1}_{2^t} \\ \vdots & \vdots \\ X & z_q \mathbf{1}_{2^t} \end{pmatrix},

$X_{2^t \times (t+1)}$ is the design matrix for the mutual independence model (4.10), and $\boldsymbol{\gamma}_{(t+2) \times 1} = (\boldsymbol{\beta}, \sigma)'$. This representation suggests the iterative procedure that, at iteration $(p+1)$, computes the prior weights $\mathbf{w}^{(p+1)}$ for given parameter estimates $\boldsymbol{\gamma}^{(p)}$, and obtains updated estimates $\boldsymbol{\gamma}^{(p+1)}$ by fitting model (4.11) to the expanded data vector $\mathbf{n}^*$. Anderson and Hinde (1988) show that this algorithm takes the form of an EM algorithm that considers the joint distribution of $(\mathbf{n}, Z)$ as the complete data and $\mathbf{n}$ as the incomplete data. The computation of the prior weights, given parameter estimates $\boldsymbol{\gamma}^{(p)}$, corresponds to the E-step, which computes the conditional expectation, $Q(\boldsymbol{\gamma} \mid \boldsymbol{\gamma}^{(p)})$, of the complete-data log-likelihood given the observed data and the parameter estimates from iteration p. The fitting of the weighted generalized linear model (4.11) maximizes this conditional expectation with respect to $\boldsymbol{\gamma}$, constituting the M-step of the EM algorithm. One can take $\hat{\boldsymbol{\beta}}$ obtained from the mutual independence model and $\hat{\sigma} = 0$ as the initial estimates for the algorithm. We terminate the algorithm when the difference between deviances of successive fits is sufficiently small.
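The following Python sketch outlines this EM cycle. It is a minimal illustration under stated assumptions, not the implementation used for the results below: the counts are hypothetical, statsmodels' freq_weights argument is used to supply the prior weights $w_{ik}$ to the weighted Poisson fit, and a small positive starting value for sigma is used because the exact point $\sigma = 0$ is a fixed point of the iteration.

import itertools
import numpy as np
import statsmodels.api as sm
from scipy.stats import poisson

t, q = 3, 10
cells = np.array(list(itertools.product([0, 1], repeat=t)))
n = np.array([2.0, 10, 4, 12, 3, 9, 6, 5])          # hypothetical counts
x_gh, w_gh = np.polynomial.hermite.hermgauss(q)
z, v = np.sqrt(2.0) * x_gh, w_gh / np.sqrt(np.pi)   # N(0,1) nodes/weights

X = sm.add_constant(cells.astype(float))            # mutual independence part
V = np.column_stack([np.tile(X, (q, 1)), np.repeat(z, len(n))])
n_star = np.tile(n, q)                              # n repeated q times

gamma = np.append(np.zeros(t + 1), 0.1)             # (beta, sigma) start
for _ in range(100):
    mu = np.exp(V @ gamma).reshape(q, -1)           # mu_ik, blocks indexed by k
    f = poisson.pmf(n, mu)                          # f_ik
    w = (v[:, None] * f) / (v[:, None] * f).sum(axis=0)  # E-step weights w_ik
    fit = sm.GLM(n_star, V, family=sm.families.Poisson(),
                 freq_weights=w.ravel()).fit()      # M-step: weighted GLM (4.11)
    if np.max(np.abs(fit.params - gamma)) < 1e-6:
        break
    gamma = fit.params
print(gamma)                                        # (beta_0, ..., beta_t, sigma)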
We compute the residual deviance of the overdispersed Poisson fit by comparing the resulting maximum weighted log-likelihood under this model against that for the saturated fit:

G^2 = -2\sum_i \log\left(\sum_{k=1}^{q} v_k\, e^{-\hat{\mu}_{ik}}\,\hat{\mu}_{ik}^{\,n_i}\right) + 2\sum_i n_i\left[\log(n_i) - 1\right].

We use the deviance to construct profile likelihood surfaces with respect to the unknown cell count $n_{0 \cdots 0}$ for capture-recapture.

4.2.3 Application to Capture-Recapture

We now apply this overdispersed Poisson model to capture-recapture using Sanathanan's conditional (on n) methods of Section 2.2.1. We have noted that this approach yields the estimate $\hat{n}_{0 \cdots 0}$ that yields minimum $G^2$. We first obtain this minimum $G^2$ value by fitting the model while assigning count $n_{0 \cdots 0}$ weight 0, just as we would when fitting a standard loglinear model. For the hepatitis data using 10-point Gaussian quadrature, we obtain

\inf_{n_{0 \cdots 0} \in \mathbb{R}^+} G^2(n_{0 \cdots 0}) = 18.64.

Thus, we attempt to construct a profile likelihood interval for N that contains all values $n + n_{0 \cdots 0} = 271 + n_{0 \cdots 0}$ for which the residual deviance of the complete table fit is less than 18.64 + 3.84 = 22.48. We obtain a profile of these residual deviance values by searching across $0 \le n_{0 \cdots 0} \le 500$, with the understanding that we may have to investigate larger $n_{0 \cdots 0}$ values if the corresponding deviances are all less than or extremely close to this cutoff value. Using a convergence criterion of .0001 for the change in successive deviances, the maximum number of EM iterations performed for a particular value of N was 48, when N = 338.

Figure 4.1. Deviance (G^2) as a Function of N for the Overdispersed Poisson Model (q=10) Applied to the Hepatitis Data

Figure 4.1 shows the resulting deviance profile, with a horizontal dotted line denoting the acceptable deviance cutoff value of 22.48. We see that, when q = 10, unlike the standard loglinear and random effects models of Chapter 3, this profile is neither continuous nor convex near its minimum. Instead, we obtain a confidence set that is not an interval, since both $N \in (297, 530)$ and $N \in (710, 743)$ satisfy $G^2 < 22.48$.

Figure 4.2. Deviance (G^2) as a Function of N for the Overdispersed Poisson Model (q=15) Applied to the Hepatitis Data

Figure 4.3. Deviance (G^2) as a Function of N for the Overdispersed Poisson Model (q=20) Applied to the Hepatitis Data

Figures 4.2 and 4.3 plot the deviance profiles for 15- and 20-point quadrature, respectively. Comparing these plots to 10-point quadrature suggests that the irregular behavior of the $G^2$ profile when q = 10 is a result of the quadrature approximation. The better approximations produce much smoother profiles of $G^2$ as a function of N, yielding interval estimates for N. This suggests that we may require a large number of quadrature points to obtain a deviance profile that provides the appropriate interval estimate of the population size.
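The profile construction itself is mechanical: for each candidate $n_{0 \cdots 0}$ on a grid, complete the table, refit, and record the deviance; the 95% set collects the N values whose deviance falls within 3.84 of the minimum. A schematic sketch follows, assuming a hypothetical function fit_overdispersed that runs the EM algorithm sketched above on a completed table and returns its residual deviance $G^2$.

import numpy as np

def profile_set(observed_counts, q, grid=range(0, 501)):
    # observed_counts: the 2^t - 1 observable cell counts
    n_obs = observed_counts.sum()
    dev = {n0: fit_overdispersed(np.append(n0, observed_counts), q)
           for n0 in grid}
    cutoff = min(dev.values()) + 3.84       # chi-square(1) 95% cutoff
    return sorted(n_obs + n0 for n0, g2 in dev.items() if g2 <= cutoff)

When the profile is non-convex, as in Figure 4.1, the returned set of N values is not a contiguous interval, which is exactly the q = 10 behavior described above.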
When q = 15,

\inf_{n_{0 \cdots 0} \in \mathbb{R}^+} G^2(n_{0 \cdots 0}) = 19.68,

and we obtain the 95% confidence interval (300, 560) for N. We obtain similar results when q = 20, with

\inf_{n_{0 \cdots 0} \in \mathbb{R}^+} G^2(n_{0 \cdots 0}) = 19.81

and interval (295, 561). Both of these more accurate approximations yield intervals that are much wider than (351.5, 437.1), the mutual independence interval obtained when ignoring population heterogeneity. Thus, the introduction of a random effect for each cell in the $2^t$ table introduces uncertainty about N beyond that induced by the mutual independence model. This additional uncertainty produces wider confidence intervals for N. For the hepatitis data set, this wider interval contains 545, the true population size. In addition, this mixed model does not incur the extremely flat likelihood encountered by the logistic-normal model in this example.

Figure 4.4. Deviance (G^2) as a Function of N for the Overdispersed Poisson Model Applied to the Snowshoe Hare Data

We also analyzed the snowshoe hare data with this overdispersed Poisson model for q = 10. The point estimate, $\hat{N}_c = 75.0$, and the profile likelihood interval, (70, 83), for N match the results of the mutual independence model for this case. This is because the estimate of the random effect standard deviation, $\hat{\sigma} = .007$, is close to zero. Thus, the maximum likelihood fit of the mixed model is essentially the special case of mutual independence. Figure 4.4 shows the profile of $G^2$ for this model. Note that, with such a small estimated dispersion parameter, q = 10 quadrature points is sufficient, and we obtain a smooth profile of the deviance.

4.3 The Multivariate Logitnormal Model

The use of a subject-specific random effect in the logistic-normal model imposes a correlation structure among the $t$ responses in which only positive correlation is possible. In this section, we investigate a more general model that permits negative correlations between pairs of responses. We first review an analogous model for multivariate count data, which was proposed by Aitchison and Ho (1989), and then extend their ideas to the binary setting.

Aitchison and Ho (1989) induced correlations between multivariate Poisson counts $\mathbf{X} = (X_1,\ldots,X_t)$ by assuming (1) that these counts, given a vector $\boldsymbol{\theta} = (\theta_1,\ldots,\theta_t)$, are independent Poisson variates with parameters $\boldsymbol{\theta}$, and (2) that this mean vector $\boldsymbol{\theta}$ has a lognormal($\boldsymbol{\mu}$, $\Sigma$) distribution. That is,

\log(\boldsymbol{\theta}) = (\log(\theta_1),\ldots,\log(\theta_t))' \sim MVN(\boldsymbol{\mu}, \Sigma).

Note that the Poisson-lognormal distribution discussed in the last section, used to allow overdispersion in the mutual independence model, is a special case of this multivariate model. The authors chose an appropriate transformation, exp, of a multivariate normal random vector that produces a random variable with support appropriate for a Poisson mean vector, namely the positive orthant, $\mathbb{R}^t_+$, of t-dimensional real space. Thus, the mixture distribution satisfies the restriction that the mean vector has support $\mathbb{R}^t_+$, while retaining the rich covariance structure of the multivariate normal mixture. This Poisson-lognormal assumption yields closed-form expressions for the marginal moments of the elements of $\mathbf{X}$, since the lognormal distribution has closed-form expressions for its moments:

E(X_i) = E(E(X_i \mid \theta_i)) = E(\theta_i) = \exp(\mu_i + (1/2)\sigma_{ii}),

var(X_i) = E(X_i) + (E(X_i))^2\left[\exp(\sigma_{ii}) - 1\right],

corr(X_i, X_j) = \frac{\exp(\sigma_{ij}) - 1}{\left\{\left[\exp(\sigma_{ii}) - 1 + (E(X_i))^{-1}\right]\left[\exp(\sigma_{jj}) - 1 + (E(X_j))^{-1}\right]\right\}^{1/2}},

where $\sigma_{ij}$ denotes the $(i,j)$ element of $\Sigma$. Examination of $corr(X_i, X_j)$ reveals that the direction of this correlation is determined by the sign of $\sigma_{ij}$ from the lognormal distribution. This is important, since one can examine the maximum likelihood estimate $\hat{\Sigma}$ and know whether the count correlations are estimated to be positive or negative. The authors also note that $|corr(X_i, X_j)| \le |corr(\theta_i, \theta_j)|$, so that the count correlation range is bounded above by the correlation between the corresponding lognormal means.
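These closed-form moments are easy to evaluate directly. A small sketch, using hypothetical values of $\boldsymbol{\mu}$ and $\Sigma$ with one negative covariance, computes the count correlation implied by the lognormal parameters:

import numpy as np

# Hypothetical lognormal parameters with one negative covariance
mu = np.array([1.0, 0.5])
Sigma = np.array([[0.4, -0.2],
                  [-0.2, 0.3]])

EX = np.exp(mu + 0.5 * np.diag(Sigma))               # E(X_i)
varX = EX + EX**2 * (np.exp(np.diag(Sigma)) - 1.0)   # var(X_i)

i, j = 0, 1
num = np.exp(Sigma[i, j]) - 1.0
den = np.sqrt((np.exp(Sigma[i, i]) - 1.0 + 1.0 / EX[i]) *
              (np.exp(Sigma[j, j]) - 1.0 + 1.0 / EX[j]))
print(num / den)    # negative, matching the sign of Sigma[i, j]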
Aitchison and Ho (1989) demonstrated this model's utility when negative correlations exist between clustered counts. They analyzed data consisting of counts from three air samplers at 50 locations. Aitchison and Ho recognized that differences among the 50 locations will induce correlations between the three counts from a particular location. The most common way of inducing correlations between the three sampler readings for a given location is to include a location random effect in the model. This approach induces positive correlations between the three samplers for a given location. The Poisson-lognormal analysis, however, provided negative correlation estimates in the ML estimate $\hat{\Sigma}$, indicating that the three samplers are competing against each other at a particular location.

An analogous approach can be used to account for negative correlations in a t-variate binomial vector. That is, let $\mathbf{Y} = (Y_1,\ldots,Y_t)$ be a binomial random vector with numbers of trials $(n_1,\ldots,n_t)$ and success probabilities $\boldsymbol{\pi} = (\pi_1,\ldots,\pi_t)$. We choose the logistic transformation to map a t-variate multivariate normal vector onto the necessary parameter space $[0,1]^t$ for $\boldsymbol{\pi}$. Thus, we assume that the binary vector results from the mixing of independent Binomial($n_i$, $\pi_i$) distributions with mixture distribution specified by

\pi_i = \frac{\exp(W_i)}{1 + \exp(W_i)}, \quad i = 1,\ldots,t,    (4.12)

where $\mathbf{W} = (W_1,\ldots,W_t) \sim N(\boldsymbol{\mu}, \Sigma)$. We denote this multivariate logistic-normal distribution as $\boldsymbol{\pi} \sim LN_t(\boldsymbol{\mu}, \Sigma)$, and the resulting multivariate logit-normal mixture distribution for $\mathbf{Y}$ as $\mathbf{Y} \sim BLN_t(\boldsymbol{\mu}, \Sigma)$. Unfortunately, unlike the Poisson-lognormal mixture, the multivariate logit-normal distribution does not have closed-form expressions for its moments, since closed-form expressions do not exist for the moments of the logistic-normal distribution and

E(Y_i) = E(E(Y_i \mid \pi_i)) = n_i E(\pi_i),

var(Y_i) = E(var(Y_i \mid \pi_i)) + var(E(Y_i \mid \pi_i)) = n_i E(\pi_i(1-\pi_i)) + n_i^2\,var(\pi_i) = n_i E(\pi_i)\left(1 - n_i E(\pi_i)\right) + n_i(n_i - 1)E(\pi_i^2),

and

cov(\mathbf{Y}) = cov(E(\mathbf{Y} \mid \boldsymbol{\pi})) + E(cov(\mathbf{Y} \mid \boldsymbol{\pi})) = \mathbf{N}\,cov(\boldsymbol{\pi})\,\mathbf{N} + E\left(Diag[n_i \pi_i(1-\pi_i)]\right),

where $\mathbf{N} = Diag(n_1,\ldots,n_t)$.

Taylor series expansions provide approximations for the logistic-normal means, variances, and covariances when $\sigma_{ii}$ is small. Williams (1982) showed that, for small $\sigma_{ii}$, the mean of a logistic-normal random variable can be reasonably approximated by

E(\pi_i) \approx \frac{\exp(\mu_i)}{1 + \exp(\mu_i)}.

Denote this approximate value by $\pi_i^*$. Then an approximation for the logistic-normal variance is

var(\pi_i) \approx \sigma_{ii}\left[\pi_i^*(1 - \pi_i^*)\right]^2.

Also, for two logistic-normal random variables $(\pi_i, \pi_j)$, we have

cov(\pi_i, \pi_j) \approx \sigma_{ij}\,\pi_i^*(1 - \pi_i^*)\,\pi_j^*(1 - \pi_j^*).

Our simulation work shows that these approximations tend to break down when $\sigma_{ii} > .64$. Thus, we simulated variates from a

N_2\left(\begin{pmatrix} \mu \\ \mu \end{pmatrix}, \begin{pmatrix} \sigma^2 & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 \end{pmatrix}\right)

distribution for a variety of $(\mu, \sigma^2, \rho)$ values in order to get an idea of the properties of the multivariate logit-normal distribution over a wide range of parameter values. We ran 100,000 simulations at all combinations of $\mu = (-1.0, -0.5, 0.0)$, $\sigma^2 = (0.5, 1.0, 1.5, \ldots, 5.5)$, and $\rho = (-0.9, -0.8, \ldots, 0.8, 0.9)$. We consider only nonpositive $\mu$ values, since the binary means are symmetric around .5 for positive $\mu$. Since this produces 627 possible $(\mu, \sigma^2, \rho)$ combinations, we report only those for $\mu = (-1.0, -0.5, 0.0)$, $\sigma^2 = (0.5, 2.5, 5.5)$, and $\rho = (-.9, -.5, 0, .5, .9)$.
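A Monte Carlo sketch of the kind used to fill Table 4.3 below follows; the particular seed and the single $(\mu, \sigma^2, \rho)$ combination shown are incidental choices.

import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, rho, reps = -1.0, 2.5, -0.9, 100_000

cov = sigma2 * np.array([[1.0, rho], [rho, 1.0]])
W = rng.multivariate_normal([mu, mu], cov, size=reps)
p = 1.0 / (1.0 + np.exp(-W))                    # componentwise logistic (4.12)
Y = (rng.random((reps, 2)) < p).astype(float)   # Bernoulli draws given p
print(np.corrcoef(Y[:, 0], Y[:, 1])[0, 1])      # near -.24; cf. Table 4.3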
Table 4.3 reports the correlations between $Y_1$ and $Y_2$ for these combinations, while Figure 4.5 plots the binary correlations (solid line) and logistic-normal correlations (dashed line) over the full range of $\rho$ values for several $(\mu, \sigma^2)$ values.

Table 4.3. Simulated Correlation between $Y_1$ and $Y_2$ when $\mathbf{Y} = (Y_1, Y_2)$ is Distributed as $BLN_2(\boldsymbol{\mu}, \Sigma)$, for $\mu = -1.0, -0.5, 0.0$ and $\sigma^2 = 0.5, 2.5, 5.5$

                               \rho
 \mu    \sigma^2   -0.9    -0.5     0.0     0.5     0.9
-1.0      0.5      -.074   -.040    .002    .045    .075
          2.5      -.241   -.139    .005    .142    .265
          5.5      -.354   -.193    .001    .206    .388
-0.5      0.5      -.089   -.046    .003    .042    .085
          2.5      -.268   -.146    .002    .152    .267
          5.5      -.387   -.204    .003    .210    .397
 0.0      0.5      -.091   -.044    .001    .050    .089
          2.5      -.277   -.148    .003    .152    .277
          5.5      -.401   -.218    .001    .208    .400

These results show that the multivariate logit-normal distribution has properties similar to those of the Poisson-lognormal distribution. We see that the binary correlation has the same sign as the correlation between the bivariate normal variables. The range of the possible binary correlation, however, is not as wide as that of the corresponding logistic-normal distribution. This correlation restriction means that this model will not yield a good fit when the data exhibit high binomial correlations. Thus, it is possible to see a poor model fit to the data even if the model contains as many parameters as there are cells. For example, consider the model applied to the complete hepatitis data of Table 3.6. If we estimate an unrestricted mean vector $\boldsymbol{\mu}$ and variance-covariance matrix $\Sigma$, we are fitting an eight-cell table with a model containing nine parameters. The model fit, however, yields $G^2 = 10.5$. Thus, this model cannot account for the high binary correlations between the three samples.

Figure 4.5. Corr($Y_1$, $Y_2$) (Solid Line) and Corr($\pi_1$, $\pi_2$) (Dotted Line) as a Function of $\rho$, for $\mu = (-1.0, -0.5, 0.0)$ and $\sigma^2 = (0.5, 2.5, 5.5)$

An analogous situation is the performance of the logistic-normal model in the matched-pairs setting (i.e., t = 2). The 2 x 2 table has three unrestricted cell probabilities, while the model contains three parameters: $(\alpha, \beta, \sigma^2)$. If an observed table exhibits a negative correlation between the two samples (e.g., an odds ratio less than 1.0), the logistic-normal model will not be able to reproduce the observed table, since the random subject effects impose a positive correlation structure. Consider the example

                    occ. 2
                    1     0
  occ. 1    1       4     5
            0      10     2

The odds ratio is .16, and the likelihood-ratio statistic for the logistic-normal model is $G^2 = 3.6$. The fit of this model is on the boundary, with $\hat{\sigma} = 0.0$. Likewise, the $BLN_3$ fit for the hepatitis data is also on the boundary, with all estimated correlations in $\hat{\Sigma}$ equal to one. Thus, like the Poisson-lognormal model, the multivariate logit-normal model allows negative correlations, but it still has limitations with respect to the patterns of correlations that it is able to fit.
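The boundary fit in the 2 x 2 example reflects a structural constraint: with a shared subject effect loading both occasions in the same direction, the implied odds ratio cannot fall below one. A quadrature sketch illustrating this follows; the parameter values are hypothetical, and the node transformation mirrors Section 3.1.2.

import numpy as np

nodes, weights = np.polynomial.hermite.hermgauss(40)
z = np.sqrt(2.0) * nodes          # N(0,1) quadrature nodes
v = weights / np.sqrt(np.pi)      # matching weights

def implied_odds_ratio(alpha, beta, sigma):
    # Cell probabilities pi_{i1 i2} = E_Z[ p1^i1 (1-p1)^(1-i1) p2^i2 (1-p2)^(1-i2) ]
    p1 = 1.0 / (1.0 + np.exp(-(alpha + sigma * z)))
    p2 = 1.0 / (1.0 + np.exp(-(alpha + beta + sigma * z)))
    pi11 = np.sum(v * p1 * p2)
    pi10 = np.sum(v * p1 * (1 - p2))
    pi01 = np.sum(v * (1 - p1) * p2)
    pi00 = np.sum(v * (1 - p1) * (1 - p2))
    return (pi11 * pi00) / (pi10 * pi01)

for sigma in (0.0, 0.5, 1.0, 2.0):
    print(sigma, implied_odds_ratio(-0.5, 0.3, sigma))   # always >= 1

No value of sigma can reproduce the observed odds ratio of .16, which is why the likelihood is maximized on the boundary at $\hat{\sigma} = 0$.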
The ad ditive logisticnormal distribution (Aitchison and Shen (1980) and Aitchison (1986)) is an appropriate distribution for random vectors on the simplex; that is, 7r' = (7r,..., ') with the constraints 0 < 7r' < 1, j= 1,...,t and '= l) = 1. This distribution is induced using the transformation I exp(Wj) Sl+ exp(W) +... + exp(W(t_))' 1 = 1 + exp(W1) + ... + exp(W(t1)' where W Nt(I, E). Rather than being required to sum to one, the 7r vector of means for the multivariate logitnormal model has support [0, 1]t, so that the componentwise transformation (4.12) is more appropriate than the additive form for this application. The BLNt(,p, E) mixture then has density function d p(yI, E) = f I f(Yini, 7ri)g(rlp, E)d7r, (4.13) J[0,1]t i=1 where f(yini, 7ri) denotes the usual binomial density with ni trials and probability of success 7i. Estimation of the parameters (iA, E) is more complex computationally for this model compared to the other mixedmodels we have examined because we must use multidimensional Gaussian quadrature to approximate the tdimensional integral in density (4.13). This employs the same strategy of writing the density as J h(z)exp(z'z)dz JR 2 and now approximating it by a tdimensional weighted average of the function h evaluated at the qt different quadrature vectors, Zkl...k = (Zk, Zk), where {zkm}, km = 1.. .,q, m = 1,...,t are the univariate Gaussian quadrature nodes of Sec tion 3.1.2. The multivariate quadrature weights vI ...kt are products of the univariate quadrature weights, t Vtk ...= I vk,. i=1 The transformation logit(r) / = Qz, where Qtxt is the unique lower triangular matrix with positive diagonal elements such that E = QQ' is positivedefinite, yields the necessary form p(yl, Q) = (27)t/2 /R h(y, Q, z)exp { z'z} dz with h(y, A, Q, z) = exp y'( + Qz) ltlog [1 + exp( + Qz)] + log [( )] ML estimates fJ and : = QQ' are obtained by maximizing the approximation p(yIA, Q) = h(y, p, Q, zk)exp { k zIk u, (4.14) k I2 where k = (kl,..., kt). We use FSQP to maximize (4.14) with respect to (A, Q). We were concerned about the performance of this maximization algorithm when handling such a complex function, so we programmed Aitchison and Ho's (1989) analogous Poisson lognormal density in FSQP and computed the ML estimates for the four models fit to the sampler data discussed in the previous section. The results for all four models matched those reported in Aitchison and Ho's (1989) Table 5, demonstrating that FSQP successfully maximimized the Poisson analogue of (4.14). The generality of the multivariate logitnormal model allows us to test various hypotheses concerning the t responses. For instance, if we fit the model with the con straint that the offdiagonal elements in E are zero, this maximum loglikelihood can be compared to the loglikelihood with unrestricted aij to test mutual independence between the t responses. Alternatively, we could test the equivalence of the t means or that no differences exist between the t responses: j= = ai, jj = 2, p2. Aitchison and Ho (1989) term this hypothesis the isotrophic hypothesis. The logistic normal model of Section 3.1.2 is the special case j = j, = 02, j = pc2 with p = 1 (Agresti 1997). The multivariate logitnormal model becomes computationally difficult to fit when t is moderate to large. Using FSQP, maximizing a 10point quadrature approximation of the loglikelihood when t = 3 takes approximately 12 hours on a SUN SPARCsta tion 20 with 48 MB of RAM. Increasing the dimension to four substantially increases Table 4.4. 
Increasing the dimension to four substantially increases fitting time, to approximately one day, since the size of the grid of quadrature points grows from $10^3$ to $10^4$. A four-dimensional constrained fit (e.g., equal means or equal correlations) takes approximately 1.5 to 2 days.

4.3.2 Influenza Example

We now use the multivariate logit-normal model to analyze a data set first considered by Haber (1986) and later reanalyzed by Darroch and McCloud (1990). The data are frequencies of infection profiles of a sample of 263 individuals for four influenza outbreaks occurring in the winters 1977/1978 to 1980/1981 in Tecumseh, Michigan. The data are reported in Table 4.4.

Table 4.4. Observed Frequencies and Fitted Counts of Infection Profiles for Influenza Data

                                         Fitted Values
$(i_1,\ldots,i_4)$   $n_{i_1 \cdots i_4}$   LN (q = 10)   H02        BLN (q = 10)   Cnst. BLN4
0 0 0 0              140                    138.4         138.2      138.7          137.8
0 0 0 1              31                     20.8          20.8       31.0           31.3
0 0 1 0              16                     19.5          19.6       17.6           18.0
0 0 1 1              3                      4.1           4.1        2.8            2.6
0 1 0 0              17                     22.1          22.2       17.8           18.6
0 1 0 1              2                      4.6           4.6        2.6            2.4
0 1 1 0              5                      4.3           4.3        4.1            3.8
0 1 1 1              1                      1.3           1.2        0.5            0.4
1 0 0 0              20                     26.2          26.3       22.0           21.8
1 0 0 1              2                      5.5           5.5        1.7            1.8
1 0 1 0              9                      5.1           5.1        6.9            7.1
1 0 1 1              0                      1.5           1.5        0.4            0.4
1 1 0 0              12                     5.8           5.8        10.6           10.8
1 1 0 1              1                      1.7           1.7        0.6            0.5
1 1 1 0              4                      1.6           1.6        5.4            5.4
1 1 1 1              0                      0.6           0.6        0.2            0.1
Deviance (df)                               27.7 (10)     27.7 (10)  3.9 (1)        4.3 (5)

The first and fourth outbreaks are known to have been caused by the same virus type, while the viruses in the second and third outbreaks were of different types. Because the first and fourth outbreaks were caused by the same virus type, a subject's