Subject-specific modelling of capture-recapture experiments


Material Information

Title:
Subject-specific modelling of capture-recapture experiments
Creator:
Coull, Brent Andrew, 1970-
Physical Description:
vii, 155 leaves : ill. ; 29 cm.
Publication Date:
1997


Subjects / Keywords:
Statistics thesis, Ph.D. ( lcsh )
Dissertations, Academic -- Statistics -- UF   ( lcsh )
bibliography   ( marcgt )
non-fiction   ( marcgt )


Thesis (Ph. D.)--University of Florida, 1997.
Includes bibliographical references (leaves 149-154).
Statement of Responsibility:
by Brent Andrew Coull.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 028674906
oclc - 39137434







To Jill and my parents


I would like to express my sincere gratitude to Dr. Alan Agresti for serving as

my dissertation advisor. Not only do his hard work and obvious enthusiasm for

statistical research serve as an excellent example for his students, but he has shown an

inordinate amount of interest in me and my professional development as a statistician

well beyond that required of any advisor. I consider him a valued friend. I would also

like to thank Drs. James Booth, James Hobert, Ramon Littell, and Michel Ochi for

serving on my committee, and Dr. Craig Osenberg for agreeing to sit in on my final

examination on such short notice.

I thank my mother and father for their continual love and support. Their work

ethic and compassion for other people make them worthy role models. I am also lucky

to have married into the Cubbedge family. They treat me as if I am one of their own.

Jill and I are truly blessed to have such a wonderful family.

Finally, I could not have completed this work without the unwavering love, pa-

tience, and support of my wife, Jill. She has always had the confidence that I would

complete this paper, even when I did not, and has made many sacrifices so that I

could do so. I will forever be grateful.


ACKNOWLEDGEMENTS ................................................... iii

ABSTRACT .............................................................. vi



ATURE .................................................................. 6

2.1 Existing Models .................................................... 6
2.2 Maximum Likelihood Estimation of N ................................ 22
2.3 Methods for Constructing Confidence Intervals for N ............... 25

DENCE ................................................................. 33

3.1 A Logistic Model with Subject Heterogeneity ....................... 34
3.2 A Latent Class Model .............................................. 43
3.3 ML Estimation of N ................................................ 49
3.4 Snowshoe Hare Example ............................................. 51
3.5 Behavior of the Log Likelihood and N Estimator .................... 54
3.6 Similarities to Other N-Estimation Problems ....................... 64
3.7 Comments .......................................................... 65

4 ALTERNATIVE FORMS OF DEPENDENCE ..................................... 67

4.1 Serial Dependence ................................................. 68
4.2 An Overdispersed Poisson Log-Linear Model ......................... 75
4.3 The Multivariate Logit-normal Model ............................... 84
4.4 Conclusions ....................................................... 96

5 SIMULATION STUDIES .................................................. 99

5.1 Numerical Optimization and the Bootstrap ......................... 100
5.2 Nc and the Bootstrap ............................................. 106
5.3 Nc and the Profile Likelihood Confidence Interval ................ 113
5.4 Narrow Intervals vs. Attained Nominal Confidence ................. 138
5.5 Recommendations .................................................. 141

6 CONCLUSIONS ........................................................ 144

6.1 Summary of Results ............................................... 144
6.2 Future Research .................................................. 147

REFERENCES ........................................................... 149

BIOGRAPHICAL SKETCH .................................................. 155

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment
of the Requirements for the Degree of
Doctor of Philosophy



Subject-specific Modelling of Capture-recapture Experiments

By

Brent Andrew Coull

December 1997

Chairman: Alan G. Agresti
Major department: Statistics

A capture-recapture experiment that successively samples individuals from some

population of interest is the most popular method for estimating population size.

These experiments have found application in wildlife ecology, census undercount,

and epidemiological studies. We consider capture-recapture models appropriate for

estimating N in a closed population when the probability of capture varies across

both the N subjects (population heterogeneity) and the t sampling occasions. We

investigate the performance of mixture models, log-linear models, and a latent class

model for a variety of dependence structures among the t sampling occasions. In

particular, we look at situations when population heterogeneity is the only source

of dependence among the t samples, situations when positive or negative within-

subject dependencies exist, and situations when both sources of dependence exist.

We demonstrate that two mixed models, the logistic-normal model and a logistic-

normal model with a serial dependence term, and the latent class model experience a

near nonidentifiability problem when most of the "captured" subjects are observed on

only one sampling occasion. We draw parallels between this phenomenon and other

well-studied N-estimation problems. When this is the case, the log-linear models of

homogeneous two-factor interaction and homogeneous two-factor interaction plus a

serial dependence term prove useful in that they yield confidence intervals for N that

are narrower than those resulting from the mixed models while maintaining close-to-

nominal coverage. These simpler models can also describe a wide range of dependence

structures among the t occasions (including negative dependence structures) and are

easy to fit.

We present an alternative mixed model, an overdispersed Poisson model, that

shows promise in estimating N when most subjects are observed on only one occasion.

An example shows that this model avoids the nonidentifiability problems incurred

by the other mixed models. A multivariate logit-normal model is introduced that

accounts for correlations between the t responses by specifying that the probability

vector is distributed as a multivariate normal random vector.

We also compare methods for constructing interval estimates for N. Although the

bootstrap is probably the most popular method in the capture-recapture literature

right now, we give several reasons why profile likelihood intervals are to be preferred

when used in conjunction with the above models.


The capture-recapture (CR) problem is one of estimating the size, N, of some

population of interest by way of successive sampling and recording of individuals in

the population. The earliest models obtained an estimate for the number of subjects

in the population under the restrictive assumption that the probability of capture

stays constant for all occasions and for all subjects. Researchers quickly realized

that these assumptions, while mathematically convenient for obtaining a numerical

estimate, were rarely biologically realistic. Thus, in the last 40 years, an enormous

body of literature has been developed to explore ways to relax one or both of these assumptions.


The goal of capture-recapture experiments is typically that of estimating ani-

mal abundance in some habitat, in which case the "subjects" are animals and the

sampling occasions refer to trappings. Researchers, however, are finding other areas

of application for capture-recapture methods. Often called dual-system estimation,

capture-recapture methodology has been used in census undercount estimation (Alho

et al., 1993; Darroch et al., 1993). An individual is "captured" if listed in the census

or a supplementary post-enumeration survey, which together represent the multiple

samples. Captur:-recapture methodology is just recently finding a place in epidemi-

ology. In this setting, the population of interest is the set of individuals with a certain

condition or disease and the sampling occasions correspond to incomplete lists from

different sources. Wittes (1974) first proposed using capture-recapture methods to

estimate the size of an infected group in an epidemiological setting. Baker (1990)

used capture-recapture methods to analyze data on multiple screenings for the early

detection of breast cancer in women, while Regal and Hook (1984, 1991) estimated

the number of spina bifida cases in upstate New York. Chao and Tsay (1996a, 1996b)

focused on capture-recapture methods applicable exclusively to epidemiological data

and attempted to estimate the number of subjects who contracted the hepatitis A

virus during an outbreak in Northern Taiwan. The International Society for Disease

Monitoring and Forecasting (1995a,b) detailed CR applications in human diseases

and called for future research in this area.

Throughout this dissertation, we consider a general t-sample capture-recapture

experiment in which the animals are marked or recorded in such a way that the num-

ber of subjects with a particular pattern of being observed or not being observed at

each sampling occasion is available. These patterns are often referred to as the "cap-

ture histories" in the capture-recapture literature. The data from such experiments

can be represented in a 2^t contingency table, formed by cross-classifying the capture

status of a subject on each of the t occasions. For instance, if we denote a capture

at sampling occasion i as 1 and a noncapture as 0, a t = 3 occasion experiment gives

rise to the 2^3 contingency table:

                occ. 3 = 1                       occ. 3 = 0
                  occ. 2                           occ. 2
                1       0                        1       0
occ. 1   1    n111     n101      occ. 1   1    n110     n100
         0    n011     n001               0    n010     n000 = ?

Thus, n111, for instance, is the number of subjects having capture history (1,1,1), or

captured at all three occasions. In this representation the problem of estimating N is

equivalent to estimating n000, the number of subjects unobserved in all t samples. We

obtain an estimate of this unknown quantity through models which place assumptions

on the capture probabilities. The estimate of n000 is then the fitted cell count for cell

(0, 0, 0) obtained from the model.
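As a concrete illustration of this representation, the observed cells of the 2^3 table can be tallied directly from a list of capture histories; the histories below are invented for illustration:

```python
from itertools import product

# Hypothetical capture histories for a t = 3 occasion experiment:
# each tuple records capture (1) or noncapture (0) on occasions 1-3.
histories = [(1, 1, 1), (1, 0, 1), (0, 1, 1), (1, 0, 0),
             (1, 0, 0), (0, 0, 1), (0, 1, 0), (1, 1, 0)]

# Tally the 2^3 cells; the (0, 0, 0) cell can never receive a count,
# since subjects missed on every occasion are not observed.
counts = {cell: 0 for cell in product([1, 0], repeat=3)}
for h in histories:
    counts[h] += 1

n = sum(counts.values())          # number of observed subjects
assert counts[(0, 0, 0)] == 0     # always empty in the observed data
```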

More generally, we use the following notation throughout this dissertation for a t-

sample experiment. Let I = {(1,..., 1),..., (0,..., 0)} denote the set of 2^t possible

capture histories in lexicographic order, and I* = {(1,..., 1),..., (0,..., 0, 1)} the set

of observable histories. Let i = (i1,..., it) be an element of I, and let ni = n(i1,...,it) be

the number of subjects having that capture history. We let n = (n1...1,..., n0...0) denote the

vector of cell counts, n* = (n1...1,..., n0...01) the vector of observable cell counts, and

the scalar n = Σ(i ∈ I*) ni the number of observed subjects in the experiment.

The problem of estimating the unobserved cell count n0...0 from a model fit to the

observed 2^t − 1 cell counts is inherently one of extrapolation. That is, in "estimating"

n0...0, one extrapolates from the range of the observed data, the numbers of subjects
having 1, 2,..., t captures, to the number of subjects with 0 captures. One can think

of this problem as similar to the famous Space Shuttle data in which one fits the

probability of O-ring failure as a function of temperature in the range of 53 to 81

degrees, and then seeks to predict the probability of O-ring failure at 31 degrees

(Dalal et al., 1989; Lavine, 1991). Another analogous problem is that of low-dose

extrapolation (Prentice, 1976; Brown, 1978). In carcinogenic studies, researchers are

often not able to employ enough patients to estimate accurately the probability of

an event for low doses of a drug. Thus, researchers must apply larger doses of the

substance under question, fit a model to the observed range of doses, and extrapolate

the fit either to predict the effect of a specific low dose of interest or to determine a

minimum effective dose.

In extrapolation problems such as these, different models that provide essentially

equally good fits to the observed data can lead to drastically different predicted values

at settings outside the observed range of covariates. Thus, we see that population

size estimates can differ drastically depending on the selected model. It can also

happen that a model with a poor fit to the observed data extrapolates to the settings

of interest much more accurately than a better fitting model. Thus, the standard

techniques for model selection based on goodness-of-fit criteria can be inappropriate.

The purpose of this thesis is to examine methods that can be used to estimate the

size of some population of interest when the probability of capture varies across both

subjects in the population and across the t sampling occasions. In Chapter 2, we

review existing models developed to estimate N in the presence of heterogeneity and

discuss the advantages and disadvantages of each. We also review likelihood-based

methods for obtaining a point estimate of N and standard techniques of constructing

confidence intervals for N.

In Chapter 3, we develop a mixed model approach to account for population

heterogeneity in capture-recapture studies. We motivate the logistic-normal model,

which results from a random-effects approach to a subject-specific logistic model. We

compare the performance of this model in estimating N with a simpler log-linear

model that is motivated from a fixed-effects approach to the same logistic model. We

also present a simpler form of the logistic-normal model, a latent class model, and

include this model in the comparisons. We demonstrate these methods by considering

an animal abundance example resulting from six trappings of snowshoe hares and an

epidemiologic example resulting from three incomplete lists of hepatitis A patients

compiled during a 1995 hepatitis outbreak in northern Taiwan. We then use these

examples to demonstrate potential difficulties, in the form of flat log-likelihoods,

incurred by the logistic-normal model. Finally, Chapter 3 draws parallels between

the difficulties incurred in the mixed model setting and those occurring in other well-

studied N-estimation problems, namely other capture-recapture experiments that

allow for heterogeneity and the situation where one observes k independently and

identically distributed binomial(N, p) observations when both N and p are unknown.

Chapter 4 investigates the usefulness of other statistical models in the context of

capture-recapture studies. We study models that relax the logistic-normal assump-

tion of local independence. In particular, for sequential trappings, we look at fixed-

and random-effect models allowing for serial dependence, given an individual. We

also investigate the use of an alternative mixed model that allows for overdispersion

relative to the log-linear model of mutual independence, and introduce and develop

a new model, the multivariate logit-normal model, that postulates a separate latent

variable for each occasion.

Chapter 5 presents the results of simulation studies conducted to examine the

performances of the models discussed in Chapter 3 and Chapter 4 in the capture-

recapture setting. We investigate the coverage properties and mean lengths of the

resulting bootstrap and profile likelihood confidence intervals for N and the absolute

error of the point estimates. These studies demonstrate the impact that flat log-

likelihoods have on the logistic-normal model's ability to accurately estimate N. We

see that the logistic-normal model is helpful in estimating the population size only

when the amount of heterogeneity is small to moderate and the probability of capture

is not extremely small. Otherwise, we see that alternative log-linear models are viable

candidates when mixed models prove unsatisfactory.


Capture-recapture methods are often classified according to the strength of the as-

sumptions placed on the population of interest. Models assuming an open population

that experiences recruitment, in the form of birth or immigration, and/or losses, in

the form of death or emigration, incorporate extra parameters for these phenomena

and will not be considered here. We will instead concentrate on methods appropriate

when estimating population size in closed populations whose size remains constant

during a t-sample capture-recapture experiment. In these models, the subjects in a

population of size N can be enumerated {1, 2,..., N}, where N is a parameter to

be estimated. The assumption of a closed population is often appropriate for studies

conducted over short periods of time, say, for instance, several days.

2.1 Existing Models

Darroch (1958) was the first to use maximum likelihood analyses to obtain pop-

ulation size estimates. His model assumed that the capture probabilities vary across

sampling occasions, but are the same for every subject in the population. This fits

into the Mt class of models within the hierarchy of models of Otis et al. (1978)

Mo, Mt, Mh, Mb, Mth, Mtb, Mhb, Mtbh,

where t denotes variation of capture probabilities across time, h denotes variation

of capture probabilities across subjects (h=heterogeneous population), and b denotes

variation of capture probabilities according to a behavioral response. Behavioral

response refers to the phenomenon of the capture probability changing once an animal

is captured either in the form of trap "avoidance" or trap "dependence." Mo, the

simplest model, assumes constant capture probability for all animals on all sampling

occasions and is rarely realistic.

Although models are now available for the entire hierarchy, models allowing cap-

ture probabilities to vary among individuals (i.e. having the subscript h) have proven

to be problematic. In this chapter we will review in detail existing models that ac-

count for heterogeneity. The earliest attempts to consider heterogeneity specified a

heterogeneous population as an assumption violation to simpler models, for instance

Mo, Mb, or Mt, and tried to construct estimators that were robust to this violation.

Fienberg (1972) proposed using a log-linear model that fit the 2^t − 1 observed cells of the contingency table

well. Cormack (1989), however, noted that although means of checking for subject

heterogeneity are available when using standard log-linear models, these models do

not provide reliable estimates when this is the case. Nathan Mantel, pioneer of the

use of stratified contingency tables, stated (Gail, 1997),

The issue was to estimate the population size. Fienberg's idea was to
assume that you were in a complex contingency table with one missing
cell entry. However, his method assumed that all animals came from this
same complex contingency table. That was one thing that I would have
disagreed with. Maybe the risks were not really homogeneous from animal
to animal.

The two most popular estimators within the Mh class are the Burnham-Overton

(1978) and Chao (1987, 1989) estimators. Burnham and Overton (1978) were the first

to estimate N using heterogeneous capture-recapture models. Both of these models

assume that the capture probabilities ps, s = 1,..., N, are randomly distributed ac-

cording to an unspecified distribution F. Most recently, Norris and Pollock (1996)

conducted nonparametric ML estimation (NPML) of the pair (N, F) via an EM al-

gorithm under both Mh and Mbh assumptions.

This is by no means an exhaustive list of all heterogeneity models in the enormous

capture-recapture literature. Other models within this class were proposed by Pollock

and Otto (1983), Smith and Van Belle (1984), and Lee and Chao (1994). In this

section, however, we focus on the models in the preceding paragraph because log-

linear models are the simplest to implement, the Burnham-Overton and Chao models

are the most widely used heterogeneity models, and the NPML estimator is the

most recent. These models also introduce the themes that will become evident in

Chapters 3-5; that is, these models demonstrate difficulties inherent in the problem

of N-estimation when heterogeneity is present and the trade-off of narrow confidence

intervals versus attained confidence.

2.1.1 Log-linear Models

An obvious class of models for the 2^t − 1 observed counts in the contingency table

is the class of log-linear models. As noted earlier, the use of these models in the

capture-recapture setting was first developed by Fienberg (1972) and reviewed by

Cormack (1989).

These models assume that the cell counts in the 2^t table are either Poisson or

multinomial random variables. Since independent Poisson random variables are dis-

tributed as multinomial when conditioned on their sum, both of these assumptions

yield identical inferences. Log-linear models relate the expectations of these counts,

{μi}, to a vector of unknown parameters, β, through the log transformation,

log(μ) = Xβ,

where μ is the vector of cell means in lexicographic order and X is a known covariate

matrix. We estimate the unknown parameter vector β through maximum likelihood.

Since Poisson log-linear models are exponential family models and the log link is the

canonical link, the likelihood equations take the form of equating the expected value of

the sufficient statistics to their observed values. Fienberg (1972) and Cormack (1989)

selected an appropriate model within the hierarchy of log-linear models based on

usual goodness-of-fit criteria, either the Pearson chi-squared statistic or the residual

deviance.
The simplest way to obtain an N-estimate from a given log-linear model is to fit

the model to the 2^t − 1 observed counts conditional on the number of subjects, n,

observed in the experiment. We then use the model to predict the content of the

unobservable cell. We demonstrate this procedure in the simple two-sample case,

for which the model specifying that the capture status at occasion two is independent

of capture status at occasion one produces a closed-form estimator. Consider the

following 2 x 2 table:

                occ. 2
              1       0
occ. 1   1   n11     n10
         0   n01     n00 = ?

Letting μij = E(nij), i, j = 0, 1, denote the expected value of the ijth cell count, the

maximum likelihood estimates of the {μij} conditional on n = n11 + n10 + n01 are just

the corresponding {nij}. Also, the independence model requires

μ11 μ00 = μ10 μ01,

so that

μ̂00 = n10 n01 / n11.

This estimate is the traditional Lincoln index (1930) used to estimate N.
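In code, the two-sample estimator is immediate; the counts below are invented for illustration:

```python
# Hypothetical two-sample counts: n11 subjects caught on both occasions,
# n10 only on occasion one, n01 only on occasion two.
n11, n10, n01 = 20, 30, 25

# Fitted unobservable cell under independence, then the Lincoln index.
n00_hat = n10 * n01 / n11
N_hat = n11 + n10 + n01 + n00_hat
```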

Log-linear modelling of capture-recapture data has both advantages and disad-

vantages. One advantage is that, unlike the Mh models discussed in the next section,

capture probabilities are allowed to vary across sampling occasions. Also, one can

specify dependencies between sampling occasions in the form of interactions. The

only requirement on the form of the model in order to obtain an N-estimate is that

the t-factor interaction is zero. Otherwise, the model is saturated for the unknown cell

count, and any value of that count is consistent with the model. Thus, the saturated

model provides no information on the population size.

A third advantage of the log-linear models is the ease with which these models

can be implemented. Cormack (1989, 1990), Agresti (1994) and others showed that

one can fit standard log-linear models to capture-recapture data using GLIM (Francis

et al., 1993). One simply specifies the weight of the missing cell count to be zero,

producing the N-estimate N̂ = n + μ̂0...0. This estimate corresponds to an estimate of N

conditional on n. We discuss this estimation procedure in Section 2.2.1.
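The conditional fitting idea can also be mimicked without GLIM. The sketch below, with invented counts for a t = 3 experiment, fits the mutual-independence log-linear model to the seven observed cells by Newton-Raphson and extrapolates to the unobservable cell; it is an illustration of the approach, not the authors' GLIM code:

```python
import numpy as np
from itertools import product

# Hypothetical observed counts for the 2^3 - 1 observable cells,
# in lexicographic order (1,1,1), (1,1,0), ..., (0,0,1).
cells = [c for c in product([1, 0], repeat=3) if c != (0, 0, 0)]
y = np.array([8.0, 10.0, 12.0, 25.0, 14.0, 30.0, 28.0])

# Design matrix for mutual independence: intercept plus one
# main-effect indicator per sampling occasion.
X = np.array([[1.0, c[0], c[1], c[2]] for c in cells])

# Newton-Raphson for the Poisson log-likelihood with log(mu) = X beta.
beta = np.array([np.log(y.mean()), 0.0, 0.0, 0.0])
for _ in range(50):
    mu = np.exp(X @ beta)
    score = X.T @ (y - mu)                 # likelihood equations
    info = X.T @ (mu[:, None] * X)         # observed information
    beta += np.linalg.solve(info, score)

# Extrapolate to the unobservable cell (0,0,0), whose row is (1,0,0,0).
mu000_hat = np.exp(beta[0])
N_hat = y.sum() + mu000_hat
```

The fitted value for the empty cell plays the role of μ̂0...0 in the conditional estimate N̂ = n + μ̂0...0.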

The main disadvantage of log-linear modelling is the inability to model heterogene-

ity within the population. Cormack (1989) proposed ways to diagnose heterogeneity

when fitting these models. He suggested examining the residual deviance and the

pattern of standardized residuals based on the number of individuals observed

and predicted to have been captured k times, k = 1,..., t. He suggested that plots

exhibiting a concave shape indicate heterogeneity, and hence that the given log-linear

model is inadequate for the data. He demonstrated with a set of data based on six

trappings of snowshoe hares, which we will analyze in detail in Chapter 3.

In the standard setting for log-linear models, that of complete contingency ta-

bles where all cell counts are observed, parsimonious log-linear models yield smaller

standard errors for estimated parameters. Thus, the standard practice is to base

inferences on a simpler model that fits reasonably well since the mean squared errors

of the estimates for that model can be much smaller than those for a more com-

plex model, even if the simpler model holds only approximately. In the context of

N-estimation in capture-recapture models, the magnitude of the standard error for

N also increases for more complex models. As noted in Chapter 1, however, the

traditional goodness-of-fit tests for extrapolatory problems do not necessarily give a

good indication of the model fit to the response outside the observed range of data.

Thus, simpler models that yield smaller standard errors and narrower confidence

intervals can result in overly optimistic confidence. This was noted by Regal and

Hook (1991) and Agresti (1994) in the capture-recapture setting, and Prentice (1976)

in the low-dose extrapolation setting. Thus, Prentice (1976) and Regal and Hook

(1991) both suggest the opposite of standard practice: unless outside considerations

warrant the use of a simple model, one should use the more complex model to obtain

wider confidence intervals that achieve higher actual coverage. We will examine the

trade-off between narrow, "informative" confidence intervals and attaining nominal

confidence in Chapter 5.

We introduce a variety of models that allow for heterogeneity in Chapters 3 and

4, and compare their performance in estimating N through simulation in Chapter 5.

First, however, we review in detail previous attempts to model heterogeneity in the capture-recapture literature.


2.1.2 Models Allowing for Heterogeneous Population

As noted by Cormack (1989), the main problem in using log-linear models in

capture-recapture experiments is the constraint that the capture probability is the

same for all N subjects in the population. The assumption of heterogeneous capture

probabilities has proven to be most problematic in the capture-recapture setting. It

has long been recognized that ignoring heterogeneity when it is present results in se-

vere underestimation of the population size (Burnham, 1972; Burnham and Overton,

1978; Chao, 1987, 1989; and references therein). On the other hand, several models

developed to account for heterogeneity either (a) yield confidence intervals for N that

have extremely poor coverage or (b) yield extremely wide confidence intervals that

provide little or no practical information on the population size. This section presents

a summary of the models developed to account for heterogeneity in the population

with respect to capturability.

Burnham's Beta-Binomial Model

Under the Mh assumptions, the number of captures for subject s in the population

is a binomial random variable with t trials and success probability p,, where p, is the

probability of capture of subject s. A natural strategy in modelling heterogeneity is

to specify that the individual capture probabilities {p,} are random variables with

some known parametric form. One can then average across this mixing distribution to

obtain expressions for the probability of capture of a subject selected at random from

the population. Since the capture counts for each observed individual are binomial,

a natural candidate for the form of the mixing distribution is the conjugate beta

distribution. Thus, ps ~ Beta(a,b), where a and b are unknown parameters. This

assumption results in the common beta-binomial model (Morgan, 1992).

The marginal log-likelihood for this model is obtained by integrating over the

random beta mixing distribution. This marginal log-likelihood is then maximized

over a, b, and N to obtain a maximum likelihood estimate for the population size.
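As a sketch of the quantity being maximized, the following computes the beta-binomial marginal log-likelihood from the capture frequencies; the function name, parametrization, and frequencies are illustrative assumptions, not Burnham's original implementation:

```python
import numpy as np
from scipy.special import betaln, gammaln

def log_marginal(N, a, b, f, t):
    """Beta-binomial marginal log-likelihood for capture frequencies.

    f[i-1] = number of subjects captured exactly i times, i = 1..t;
    f0 = N - sum(f) subjects were never captured.
    """
    f = np.asarray(f, dtype=float)
    f0 = N - f.sum()
    # log P(i captures) = log C(t,i) + log B(a+i, b+t-i) - log B(a,b),
    # the binomial probability averaged over the Beta(a,b) mixing law.
    i = np.arange(t + 1)
    logp = (gammaln(t + 1) - gammaln(i + 1) - gammaln(t - i + 1)
            + betaln(a + i, b + t - i) - betaln(a, b))
    counts = np.concatenate(([f0], f))
    # Multinomial log-likelihood including the unobserved class f0.
    return (gammaln(N + 1) - gammaln(counts + 1).sum()
            + (counts * logp).sum())
```

Maximizing this surface jointly over (N, a, b), e.g. over a grid of N values, exhibits the flatness in N described above.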

Burnham (1972) rigorously developed this model in his dissertation. Unfortunately,

he concluded that this model was unsatisfactory for estimating the population size

because it yields extremely flat log-likelihoods with respect to the unknown parameter

N. These flat profile log-likelihood surfaces are a result of a near-nonidentifiability

problem that occurs when both the parameters of the mixing distribution and the

population size N are left completely unspecified. These flat surfaces result in nearly

arbitrary point estimates for N and extremely wide confidence intervals that provide

little or no practical information on N. We look at this problem in more detail in the

context of logistic-normal models in Chapter 3.

Burnham and Overton's Solution: the Jackknife

Burnham and Overton's (1978) solution to the near nonidentifiability problem

encountered in the beta-binomial model is a jackknife estimate. This population size

estimate takes a naive estimate of N, the total number of subjects seen in the exper-

iment, and attempts to improve this estimate through a bias correction procedure,

the jackknife.

Specifically, the jackknife is a leave-one-out algorithm that estimates the bias of

an estimator of interest. Suppose we wish to estimate an unknown parameter, θ,

indexing an unknown distribution. Let x = (x1,..., xn) be a random sample of size

n from this distribution, and suppose that we wish to estimate θ with the statistic

θ̂ = θ̂(x). One performs the jackknife by computing θ̂(i) = θ̂(x−i), where x−i is the

sample of size n − 1 formed by deleting the ith observation. If θ̂(·) = (1/n) Σi θ̂(i),

then the jackknife estimate of bias is B̂ = (n − 1)(θ̂(·) − θ̂). This estimate yields the

bias-corrected estimate θ̃ = θ̂ − B̂. Under the assumption that

E(θ̂) = θ + a1/n + a2/n² + ⋯,

where the {ai} are constants, the jackknife technique reduces the bias from order 1/n

for θ̂ to order 1/n² for θ̃ (Efron, 1982). One can reduce the order of the bias further

by deleting d observations at a time, known as the delete-d jackknife.
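The leave-one-out procedure just described can be sketched generically; the statistic and data below are invented for illustration. Applied to the plug-in variance, whose bias is exactly −σ²/n, the correction reproduces the unbiased sample variance:

```python
import numpy as np

def jackknife_bias_correct(x, stat):
    """Return the jackknife bias-corrected version of stat(x)."""
    n = len(x)
    theta_hat = stat(x)
    # Leave-one-out replicates theta_hat_(i) = stat(x with x_i removed).
    loo = np.array([stat(np.delete(x, i)) for i in range(n)])
    bias_hat = (n - 1) * (loo.mean() - theta_hat)
    return theta_hat - bias_hat

x = np.array([3.1, 0.2, 4.7, 1.8, 2.5, 6.0])
plugin_var = lambda z: np.mean((z - z.mean()) ** 2)   # biased, order 1/n

corrected = jackknife_bias_correct(x, plugin_var)
# For this statistic the jackknife removes the bias exactly,
# recovering the usual unbiased sample variance.
```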

Burnham (1972) and Burnham and Overton (1978) applied the jackknife to the

capture-recapture setting by considering θ̂ = nt, the total number of subjects observed

after t samples, as a biased estimate for the unknown parameter θ = N. The authors

assume that

E(nt) = N + a1/t + a2/t² + ⋯,

since the bias decreases as more samples are taken. The authors performed the

jackknife by computing the t statistics {n(t−1)i} by pretending that the ith sample was

not recorded. The resulting bias-corrected estimate has closed form since the n(t-1)i

depend only on the original estimate nt and the number of individuals who were

observed exclusively on occasion i. The authors also proposed higher-order jackknife

estimates obtained by deleting more than one occasion at a time and proposed a test

to determine which estimate should be used to estimate the population size.
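For orientation, the first-order version of this estimator reduces to the widely cited closed form N̂_J1 = nt + ((t − 1)/t) f1. A small sketch with invented counts shows its equivalence to the leave-one-occasion-out computation:

```python
# Hypothetical totals from a t = 5 occasion experiment.
t = 5
n_t = 50     # distinct subjects observed over all t occasions
f1 = 10      # subjects captured on exactly one occasion

# Closed form of the first-order Burnham-Overton jackknife.
N_j1 = n_t + (t - 1) / t * f1

# Equivalent leave-one-occasion-out computation: dropping occasion i
# removes exactly the subjects seen only on that occasion; for
# illustration the f1 singletons are split evenly across occasions.
loo = [n_t - f1 / t for _ in range(t)]
N_check = t * n_t - (t - 1) * sum(loo) / t
assert abs(N_j1 - N_check) < 1e-9
```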

These jackknife estimators also have advantages and disadvantages. The estimates

are easy to compute since they can be expressed as a linear combination, Σ ci fi,

with known coefficients {ci} of the number of subjects {fi} observed on exactly i

different sampling occasions, which are the minimal sufficient statistics for the Mh

model. As a result, closed-form expressions of the asymptotic variances for the es-

timates exist. Also, through simulation results, Burnham (1972) demonstrated that

the asymptotic confidence intervals obtained from these estimates tend to have ac-

tual coverage close to the nominal level when t is large (t > 10) for a wide range of

assumed heterogeneity distributions.
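For the first-order case the linear-combination form is explicit: the known coefficients give N̂_J1 = n_t + ((t − 1)/t) f_1, i.e. c_1 = (2t − 1)/t and c_i = 1 for i ≥ 2. A short sketch (the frequencies below are hypothetical):

```python
def jackknife_order1(f, t):
    """First-order Burnham-Overton jackknife from capture frequencies.

    f[i-1] = number of subjects observed on exactly i of the t occasions.
    """
    n_t = sum(f)                       # distinct subjects observed, n_t
    return n_t + (t - 1) / t * f[0]    # N-hat = n_t + ((t-1)/t) * f_1

# Hypothetical t = 5 experiment: 10 subjects seen once, 5 twice, etc.
print(jackknife_order1([10, 5, 3, 1, 1], t=5))  # 20 + 0.8 * 10 = 28.0
```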

The main disadvantage of the jackknife estimators is the extremely poor confidence

interval coverage when the number of sampling occasions is small. Chao's (1987)

and our simulations show that for t < 5, the true coverages can be as low as zero

in the presence of moderate heterogeneity. Also, for some models, it is possible

to show analytically that the bias of n_t is not expressible as a power series in 1/t

(Cormack, 1989). Specifically, model M_0, discussed at the beginning of this section,

and all finite mixture distributions for a subject's capture probability p, fail to satisfy

this assumption. Thus, in these cases, theory has shown that these bias-corrected

estimates do not necessarily have smaller bias than the original estimator n_t.

Chao's Alternatives to the Jackknife

The poor performance of Burnham and Overton's jackknife estimates for a small
number of sampling occasions led Chao (1987, 1989) to develop an alternative esti-
mator based on the method of moments. This estimator was also motivated assuming
t is large and the mean probability of capture is small, but simulation (Chao, 1987)
demonstrated that this alternative yields much higher confidence interval coverage
when t is small. Instead of basing the estimator on all known capture frequencies,

{f_i}, Chao developed an estimator based only on f_1 and f_2. Chao pointed out the intu-
itive appeal of this estimator, since animals remaining unobserved after t samples (i.e.
those animals in capture frequency f_0) are thought to be most like those observed on
a small number of occasions.
Chao assumed that the capture probability for individual s, p_s, is dis-
tributed according to some unspecified distribution function F. This yields expected
values

E(f_i) = N C(t, i) ∫ p^i (1 − p)^(t−i) dF(p),   i = 0, 1, ..., t,

where C(t, i) denotes the binomial coefficient.

Using a Poisson approximation to the binomial density when t is large and p is small,
and Jensen's inequality, Chao obtained the inequality

E(f_0) ≥ [E(f_1)]² / (2 E(f_2)),

which leads to the moment estimate

N̂ = n + f_1² / (2 f_2)

as a lower bound of the population size.
Without being any more difficult to compute, this estimator and its asymptotic con-
fidence interval perform much better than the Burnham-Overton estimator for a wide

range of assumed mixing distributions. As Chao points out, this estimate will not

perform well when the mean probability of capture is moderate to large, since informa-

tion based on subjects captured on more than two occasions (i.e. counts (f3, f4,...))

is ignored.
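The estimator is one line of code; a sketch with hypothetical frequencies (the f_2 = 0 guard reflects the common practice of switching to a bias-corrected variant in that case):

```python
def chao_lower_bound(n, f1, f2):
    """Chao's moment estimator N-hat = n + f1^2 / (2 f2).

    n = distinct subjects observed; f1, f2 = subjects seen exactly once, twice.
    """
    if f2 == 0:
        raise ValueError("f2 = 0: a bias-corrected variant is needed")
    return n + f1 ** 2 / (2 * f2)

print(chao_lower_bound(n=20, f1=10, f2=5))  # 20 + 100/10 = 30.0
```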

Nonparametric MLE

Norris and Pollock (1996) developed models in the classes M_h and M_bh that non-

parametrically estimate the random distribution assumed for the population hetero-

geneity. This work generalized Burnham's beta-binomial model, which assumed that

the subject capture probabilities p_s are distributed as Beta(a, b) random variables,

by leaving this mixing distribution, F, unspecified and estimating it via maximum

likelihood. For a given candidate value for N, the authors used existing algorithms for

finding the nonparametric maximum likelihood estimate for F. After doing this for

many plausible N values, the authors took as the nonparametric maximum likelihood

estimate (NPMLE) of N the value for which the pair (N, F) yielded the maximum

log-likelihood among all those candidate values of N considered.

These authors avoided the flat log-likelihood difficulties by placing restrictions on

both the possible size of the population and the size of the probability of capture.

In order to use Norris and Pollock's estimators, one must place an upper bound on

the size of the population, as well as a lower bound on the probability of capture.

Norris and Pollock suggested that if the NPMLE of N is the pre-set upper bound,

one should not use this estimator.

2.1.3 Models Allowing for Heterogeneous Population and Variable Sampling Effort

There are relatively few members in the class of models allowing for both time

and heterogeneity effects, denoted by M_th. Lloyd and Yip (1991) proposed martingale

estimating equations as a unifying framework for capture-recapture inference under

which simultaneous occasion and subject effects could be included. Chao, Lee, and

Jeng (1992) proposed a nonparametric method of estimation for this model. We

review three such models here. We first consider Sanathanan's (1974) "generalized"

model, since it is the mixed logistic-normal model that we study in Chapter 3. We also

detail the two most recently developed models that take into account heterogeneity

and occasion effects, the log-linear model of homogeneous 2-factor interaction, de-

veloped by Darroch et al. (1993) and Agresti (1994), sometimes referred to as the

partial quasi-symmetry model, and Chao and Tsay's (1996a, 1996b) estimator based

on the notion of sample coverage.

Sanathanan's 'Generalized' Model

Sanathanan (1974) was the first to consider a capture-recapture model that al-

lowed the probability of capture to vary across both sampling occasions and among

the subjects in the population. Sanathanan considered a two-step estimation scheme

for the logistic-normal model discussed in Chapter 3 to obtain cell probability esti-

mates conditional on the total observed number of subjects in the experiment. The

author considered the impact of the assumption of different random-effects distribu-

tions when only three samples were taken. Sanathanan, however, did not report any

problems with this model in terms of flat log-likelihoods, arbitrary point estimates,

or extremely wide confidence intervals.

Homogeneous Two-Factor Interaction Model

Darroch et al. (1993) and Agresti (1994) motivated the simple log-linear model of

homogeneous two-factor interaction from a fixed-effects approach to a subject-specific

logistic model. Specifically, if μ_{i_1···i_t} = E(n_{i_1···i_t}) is the mean of the cell count

associated with capture history (i_1, ..., i_t) in the 2^t table formed by cross-classifying
capture status at each of the t occasions, then this model is

log(μ_{i_1···i_t}) = μ + β_1 i_1 + ··· + β_t i_t + C(Σ_j i_j, 2) λ,

where C(a, b) denotes the binomial coefficient, with the convention C(a, b) = 0 if a < b. The t

main-effect occasion parameters {β_j} model the variation in the capture probabilities across the
t sampling occasions, while the extra term λ accounts for subject heterogeneity.
We will consider this model in detail in Chapter 3 when we derive both the fixed
and random effect approaches to the subject-specific logistic model. This model is
also included in the simulation study of Chapter 5, and the results suggest that this
model is useful when substantial population heterogeneity is present.

Chao's Sample Coverage Estimator

Chao and Tsay (1996a, 1996b) estimated N when both occasion and subject
effects are present through the notion of sample coverage. Consider a t = 3 sample

experiment. Let Y_sj = 1 if subject s is captured on occasion j and 0 otherwise, let μ_j =

N⁻¹ Σ_{s=1}^N E(Y_sj) be the average probability of capture on occasion j, j = 1, 2, 3,
and let

γ_jk = Σ_{s=1}^N E[(Y_sj − μ_j)(Y_sk − μ_k)] / (N μ_j μ_k),   j < k,

γ_123 = Σ_{s=1}^N E[(Y_s1 − μ_1)(Y_s2 − μ_2)(Y_s3 − μ_3)] / (N μ_1 μ_2 μ_3)

measure the degree of dependence between the occasions. Chao and Tsay used the
connection between the number of subjects unseen and the conditional probability
of finding an undiscovered one if an additional sample is collected. They defined

sample coverage, C, as one minus this conditional probability, and then estimated

N through its relationship with C and the two- and three-way associations, γ =

(γ_12, γ_13, γ_23, γ_123), between the sampling occasions.
Specifically, the authors defined sample coverage as follows. Let I(-) denote the

indicator function. Assuming the first two samples are fixed, the probability that an

additional subject is discovered on the third occasion is

P_3 = Σ_{s=1}^N P(Y_s3 = 1) I[Y_s1 = 0, Y_s2 = 0] / Σ_{s=1}^N P(Y_s3 = 1).

Since epidemiological applications, in which no natural ordering exists for the sampling

occasions, are the main focus of these papers, the analogous quantities

P_1 = Σ_{s=1}^N P(Y_s1 = 1) I[Y_s2 = 0, Y_s3 = 0] / Σ_{s=1}^N P(Y_s1 = 1)

when fixing samples two and three, and

P_2 = Σ_{s=1}^N P(Y_s2 = 1) I[Y_s1 = 0, Y_s3 = 0] / Σ_{s=1}^N P(Y_s2 = 1)

when fixing samples one and three, are also relevant. Thus,

C* = (P_1 + P_2 + P_3) / 3   (2.1)

is defined as the average probability of finding a new subject in any chosen additional

sample, and the sample coverage is then defined as C = 1 − C*.

Let n_{+i_2 i_3}, n_{i_1 + i_3}, n_{i_1 i_2 +}, n_{i_1 ++}, n_{+ i_2 +}, n_{++ i_3} be marginal counts obtained by sum-

ming the observed cell counts, n_{i_1 i_2 i_3}, over the samples denoted by "+" in the subscript.

Chao and Tsay argued that a reasonable estimate of P_j, j = 1, 2, 3, is the number of

subjects captured on occasion j only, divided by the total number of subjects captured

on occasion j, leading to the sample coverage estimate

Ĉ = 1 − (1/3) (n_{100}/n_{1++} + n_{010}/n_{+1+} + n_{001}/n_{++1}).

Then, claiming that the identities

N = E(D)/E(C) + (1/(3 E(C))) [E(n_{1+0} + n_{+10}) γ_12 + E(n_{10+} + n_{+01}) γ_13

+ E(n_{01+} + n_{0+1}) γ_23] + R,   (2.2)

γ_12 = N E(n_{11+}) / [E(n_{1++}) E(n_{+1+})] − 1,   (2.3)

γ_13 = N E(n_{1+1}) / [E(n_{1++}) E(n_{++1})] − 1,   (2.4)

γ_23 = N E(n_{+11}) / [E(n_{+1+}) E(n_{++1})] − 1   (2.5)

hold, where

D = (1/3) [Σ_{s=1}^N I(Y_s1 + Y_s2 > 0) + Σ_{s=1}^N I(Y_s2 + Y_s3 > 0) + Σ_{s=1}^N I(Y_s1 + Y_s3 > 0)]

is the average number of distinct subjects observed in each pair of samples and R is a
remainder term involving γ_123, they obtained an N-estimate as follows. Set R = 0 in
identity (2.2) by assuming that the three-way association, γ_123, between occasions is
a linear function of (μ_1, μ_2, μ_3) and the two-way associations. Then solve the equation
that results by replacing all expectations in (2.3), (2.4), and (2.5) by their observed
values and substituting the resulting (γ̂_12, γ̂_13, γ̂_23), along with D for E(D) and Ĉ for
E(C), into identity (2.2). This results in the N-estimate

N̂ = [D/Ĉ − (1/(3Ĉ)) {(n_{1+0} + n_{+10}) + (n_{10+} + n_{+01}) + (n_{01+} + n_{0+1})}]

     × [1 − (1/(3Ĉ)) {(n_{1+0} + n_{+10}) n_{11+}/(n_{1++} n_{+1+}) + (n_{10+} + n_{+01}) n_{1+1}/(n_{1++} n_{++1})

     + (n_{01+} + n_{0+1}) n_{+11}/(n_{+1+} n_{++1})}]⁻¹.

Since this N-estimate contains Ĉ in its denominator, Chao and Tsay noted that

N̂ can be unstable for small sample coverage. We have found that this instability,

in fact, can take the form of nonsensical negative estimates of the population size.

Even if the observed data prove to be "stable" in Chao and Tsay's terms, bootstrap

replicates (see Section 2.3.2) used to obtain estimated variances for N̂ and confidence

intervals for N̂ can have this undesirable behavior. Thus, this poor behavior is the

main disadvantage of this sample coverage estimator.

Chao and Tsay proposed an alternative, one-step estimator, N̂_1, to be used when

N̂ proves unstable. This iterative estimator is constructed by substituting the N-

estimate that assumes no dependence between occasions, N̂_0 = D/Ĉ, into equations

(2.3), (2.4), and (2.5), substituting the resulting two-way association estimates into

(2.2) to obtain a new estimate N̂′, and then repeating this process using N̂′ to obtain

the alternative estimate N̂_1.
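Reading these definitions off the observed 2³ table: the number of distinct subjects in, say, samples 1 and 2 combined is n − n_{001}, so D = n − (n_{100} + n_{010} + n_{001})/3, while Ĉ subtracts from one the average fraction of each sample's catch seen on that occasion only. A sketch of this reading (cell counts hypothetical; this is our interpretation of Chao and Tsay's quantities, not their code):

```python
def coverage_and_D(counts):
    """Sample coverage estimate C-hat and average distinct count D, t = 3.

    counts maps capture histories (i1, i2, i3) to observed cell counts.
    """
    c = lambda h: counts.get(h, 0)
    n = sum(counts.values())                        # total distinct subjects
    n1 = sum(v for h, v in counts.items() if h[0])  # n_{1++}
    n2 = sum(v for h, v in counts.items() if h[1])  # n_{+1+}
    n3 = sum(v for h, v in counts.items() if h[2])  # n_{++1}
    C_hat = 1 - (c((1, 0, 0)) / n1 + c((0, 1, 0)) / n2 + c((0, 0, 1)) / n3) / 3
    D = n - (c((1, 0, 0)) + c((0, 1, 0)) + c((0, 0, 1))) / 3
    return C_hat, D

counts = {(1, 1, 1): 5, (1, 1, 0): 8, (1, 0, 1): 6, (0, 1, 1): 7,
          (1, 0, 0): 12, (0, 1, 0): 10, (0, 0, 1): 9}
C_hat, D = coverage_and_D(counts)
print(C_hat, D)  # D / C_hat is the starting estimate assuming no dependence
```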

The main advantage of these estimators over the ML estimates we will consider

in Chapters 3 and 4 is that they can be expressed in closed-form so that one can

compute a point estimate and bootstrap confidence intervals for N (see Section 2.3.2)

quickly. Also, these estimators allow for both sources of dependence between oc-

casions: within-subject dependence and population heterogeneity. We will consider

both sources of dependence in Chapter 4. The major disadvantages of N̂ are the

possible nonsensical N-estimates and the lack of a model generating this form

of estimate. N̂_1 remedies the instability of N̂, but Chao and Tsay's simulations sug-

gested that this estimator yields overly optimistic nonparametric bootstrap percentile

confidence intervals (e.g. 56% actual coverage for a 95% confidence interval) when

the true model is a continuous mixture model. Our simulations have shown that this

actual coverage figure can drop to 20% in some settings. We focus on the issue of

narrow confidence intervals versus attained nominal confidence in Chapter 5.

2.2 Maximum Likelihood Estimation of N

In this section, we review three methods for obtaining point estimates for N

from a model applied to the 2^t table with unknown cell count n_{0···0}. In subsequent

discussions, we refer to the table with all 2^t cell counts known as the complete table

and the 2^t − 1 observed counts as the incomplete table.

2.2.1 Conditional on the Number of Observed Subjects

Sanathanan (1972) outlined an approach to the estimation of N conditional on the

total number of observed subjects for any general model indexed by parameter vector

θ. For a model indexed by θ, the likelihood of (N, θ) given n = (n_{1···1}, ..., n_{0···01}) is

L(N, θ; n) = [N! / (n_{1···1}! ··· n_{0···01}! (N − n)!)] π_{1···1}(θ)^{n_{1···1}} ··· π_{0···01}(θ)^{n_{0···01}} π_{0···0}(θ)^{N−n}.

Sanathanan, using the sufficiency of n for N, factors L into two components, L(N, θ; n) = L_1(θ; n) L_2(N, θ; n), where

L_1(θ; n) = [n! / (n_{1···1}! ··· n_{0···01}!)] Π_i [π_i(θ) / (1 − π_{0···0}(θ))]^{n_i},

L_2(N, θ; n) = [N! / (n! (N − n)!)] (1 − π_{0···0}(θ))^n π_{0···0}(θ)^{N−n}.

Since L_1 is free of N and is a function only of the unknown parameters θ, one obtains

estimates θ̂_c from the incomplete table by conditioning on n and maximizing L_1 with

respect to θ. One then maximizes L_2(N, θ̂_c; n) with respect to N, yielding

N̂_c = [n / (1 − π_{0···0}(θ̂_c))],   (2.6)

where [x] denotes the greatest integer less than or equal to x. Sanathanan showed

that N̂_c has the same asymptotic normal distribution as N̂, the value of N for which

(N, θ) yields the overall maximum of L, as the true population size goes to infinity.

Let n̂_{0···0} be the fitted value for the unknown cell count obtained by conditioning on

n and fitting a given model to the incomplete table. In addition, let ñ_{0···0} denote potential

values for n_{0···0}, with ñ_{0···0} ≥ 0. For a given observed incomplete table, let G²(ñ_{0···0})

be the deviance for the model applied to the complete table formed by assuming the

unobserved cell count is ñ_{0···0}. This conditional (on n) estimation procedure selects

the value n̂_{0···0} that minimizes G²(ñ_{0···0}) over such complete tables. That is,

G²(n̂_{0···0}) = inf_{ñ_{0···0} ≥ 0} G²(ñ_{0···0}).   (2.7)

This estimate n̂_{0···0} is the value that produces a complete table whose fitted value

for the unobserved cell, from the model under consideration, equals the assumed count itself.

Fienberg (1972) applied this approach to log-linear modelling of capture-recapture

data, calling it an extension of the estimated parameters to the missing cell count.

Fienberg demonstrated testing the fit of the model to the data with the deviance

applied to the incomplete table,

G² = 2 Σ_i n_i log(n_i / μ̂_i),   (2.8)

yet cautioned against the use of these methods for accurate estimation of N. Fienberg

likened the capture-recapture problem to fitting a regression line of y on x, say, for

x > 0, and then using the fit of this line to predict y at x = 0. This alludes to

the extrapolatory nature of the problem as discussed in Chapter 1. Our experience

has been that the goodness-of-fit criterion applied to the observed data does not

necessarily indicate an accurate N-estimate. We will demonstrate this phenomenon

in Chapter 3.

The advantage of these conditional methods is the computational ease of obtaining

N̂_c, especially when using standard log-linear models (see Section 2.1.1). Cormack

(1989, 1990), Agresti (1994), and others showed the ease of fitting standard log-linear

models to capture-recapture data using GLIM by simply specifying the weight of the

missing cell count to be zero, producing N-estimate N̂_c = n + μ̂_{0···0}. Baker (1990)

described an EM algorithm approach to obtaining these conditional estimates. Using

a starting guess for the missing cell count, one performs the M-step of fitting a chosen

model to the complete table and the E-step of taking the observed cells at their observed values and

no...o to be the fitted value from the previous M-step fit. In his comment to Baker,

Cormack (1990) noted that this iterative approach obtains the same estimate N̂_c as

the GLIM approach and is unnecessary.
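A minimal sketch of Baker's EM idea, specialized (our simplification, not Baker's general algorithm) to the independence model M_t, where the M-step is closed form: with n_j subjects captured on occasion j, iterate N ← n + N Π_j (1 − n_j/N). For t = 2 the fixed point is the classical Lincoln-Petersen estimate n_1 n_2 / n_{11}:

```python
def em_population_size(n_obs, occ_totals, iters=500):
    """EM estimate of N under the independence (M_t) model.

    n_obs = distinct subjects observed; occ_totals = [n_1, ..., n_t].
    """
    N = float(n_obs)
    for _ in range(iters):
        # M-step: p_j = n_j / N.  E-step: impute the empty-history cell.
        missing = N
        for nj in occ_totals:
            missing *= 1 - nj / N
        N = n_obs + missing
    return N

# t = 2: 100 caught on each occasion, 20 caught twice -> 180 distinct.
# The fixed point is the Lincoln-Petersen value 100 * 100 / 20 = 500.
print(em_population_size(180, [100, 100]))
```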

2.2.2 Search over N

Alternatively, one can search for the overall maximum of L with respect to (N, θ)

by maximizing

L(N, θ; n) ∝ [N! / (N − n)!] π_{1···1}(θ)^{n_{1···1}} ··· π_{0···01}(θ)^{n_{0···01}} π_{0···0}(θ)^{N−n}

with respect to θ for a fixed N, leading to a maximized likelihood, L_N, for that

value N. One obtains the overall ML estimates (N̂_s, θ̂_s) by evaluating L_N over a

range of N values and taking the ML estimates to be those values of (N, θ) that

provide the maximum L_N. This approach was taken by Norris and Pollock (1996) (see

Section 2.1.2) for M_h and M_bh closed population models, with θ being an unspecified

mixing distribution on the individual capture probabilities. The advantage to this

approach is that this search procedure (usually) yields the true maximum likelihood
estimates for (N, θ) under the restriction that N is an integer. Disadvantages include
the computational complexity of a search procedure consisting of maximizing the
log-likelihood for many different values of N and the fact that one must place some

restrictions on the range of N values, since L_N cannot be computed for all N ≥ n.

since the maximized log-likelihood as a function of N need not be unimodal. We will

discuss this point further in Chapter 3.
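For the one-parameter model M_0 (a common capture probability p), the inner maximization is closed form, p̂(N) = M/(tN) with M the total number of captures, so the search reduces to a one-dimensional grid scan; a sketch under the M_0 assumption, with hypothetical data:

```python
from math import lgamma, log

def profile_loglik(N, n, M, t):
    """log L_N for model M_0, with p profiled out as p-hat = M / (t N)."""
    p = M / (t * N)
    return (lgamma(N + 1) - lgamma(N - n + 1)
            + M * log(p) + (t * N - M) * log(1 - p))

def search_N(n, M, t, N_max):
    """Grid search: N-hat is the integer N >= n maximizing L_N."""
    return max(range(n, N_max + 1), key=lambda N: profile_loglik(N, n, M, t))

# Hypothetical 5-occasion study: 90 distinct subjects, 200 captures in total.
N_hat = search_N(n=90, M=200, t=5, N_max=300)
print(N_hat)
```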

2.2.3 N As Continuous

A third approach to computing the overall ML estimates treats N as a continuous
variable and employs numerical optimization techniques to maximize L with respect
to (N, θ). One does this by replacing the factorial terms in L with gamma functions,
so that the objective function to be maximized becomes

L(N, θ; n) ∝ [Γ(N + 1) / Γ(N − n + 1)] π_{1···1}(θ)^{n_{1···1}} ··· π_{0···01}(θ)^{n_{0···01}} π_{0···0}(θ)^{N−n}.

With the development of dependable, general-purpose numerical optimization algo-

rithms, this maximization is straightforward. The resulting N-estimates from the

three approaches are normally very similar. Sanathanan (1972) claimed that neces-

sarily N̂_c ≤ N̂. Our experience has been that either N̂_s = [N̂] or N̂_s = [N̂ + 1]

always, since the maximized log-likelihood is a smooth function of N.

2.3 Methods for Constructing Confidence Intervals for N

Much of the recent research in capture-recapture modelling has focused on meth-
ods of constructing confidence intervals for N. Although Sanathanan derived the

asymptotic normality of N as the true population size increases, confidence intervals

based on asymptotic normality of the estimator are often unsatisfactory since the

estimate is often based on small samples. This can result in a skewed distribution for

N̂ and a lower confidence limit that is less than n. As a result, the bootstrap and

profile likelihood methods of Buckland and Garthwaite (1991) and Cormack (1992),

respectively, are often preferred over ones based on the asymptotic normality of N̂.

2.3.1 Profile Likelihood Confidence Interval

The alternative definition (2.7) of the conditional estimate N̂_c suggests a pro-

file likelihood approach to constructing confidence intervals when using this point

estimate. Since n̂_{0···0} is the value that produces the minimum G² statistic from the

complete table, a 100(1 − α)% profile likelihood confidence interval is based on those

values for the unobserved cell that increase the LR statistic for a model fitted to the

complete table by a value less than χ²_{1,α}. This interval rejects candidate values ñ_{0···0}

for which the likelihood under the model is sufficiently less than the maximum value.

For 95% confidence intervals, we accept all values ñ_{0···0} that increase the LR statistic

by χ²_{1,.05} = 3.84 or less. If all ñ_{0···0} values in the interval (a, b) satisfy this criterion for

the missing cell count, then (n + a, n + b) is the confidence interval for N.

The cutoff value χ²_{1,α} is based on the asymptotic framework in which N → ∞.

Hall (1994) argued, in a related N-estimation problem (that of k binomial(N, p)

observations with both N and p unknown), that an alternative asymptotic frame-

work treating the unknown parameter N as fixed is more appropriate. Moreover,

Hall showed that the N-estimator is not distributed as a standard normal random

variable, but rather has a Cauchy distribution. We note that in an analogous asymp-

totic framework for the capture-recapture setting, this chi-squared cutoff might not

be the most appropriate one. Simulations in Chapter 5, however, demonstrate that

the above profile likelihood intervals do maintain coverage that is close to the nominal

level in many practical situations.
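As an illustration of the construction (not a method used in the dissertation), take t = 2 and the independence log-linear model, for which the complete-table fit is closed form, μ̂_ij = (row_i)(column_j)/N; we scan candidate missing-cell counts and keep those within 3.84 of the minimum deviance. All counts below are hypothetical:

```python
from math import log

def G2_complete(n11, n10, n01, n00):
    """Deviance of the independence model fitted to a complete 2x2 table."""
    N = n11 + n10 + n01 + n00
    r1, r0 = n11 + n10, n01 + n00
    c1, c0 = n11 + n01, n10 + n00
    cells = [(n11, r1 * c1 / N), (n10, r1 * c0 / N),
             (n01, r0 * c1 / N), (n00, r0 * c0 / N)]
    return 2 * sum(nv * log(nv / mu) for nv, mu in cells if nv > 0)

def profile_interval(n11, n10, n01, max_n00=5000, cutoff=3.84):
    g2 = [G2_complete(n11, n10, n01, m) for m in range(max_n00 + 1)]
    g2_min = min(g2)
    ok = [m for m, g in enumerate(g2) if g - g2_min <= cutoff]
    n = n11 + n10 + n01            # total observed
    return n + ok[0], n + ok[-1]   # 95% profile interval for N

# 20 subjects seen twice, 80 seen only on each occasion; N-hat = 500 here.
print(profile_interval(20, 80, 80))
```

With these counts the deviance is exactly zero at n00 = 320, so the interval necessarily covers N = 500.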

2.3.2 Bootstrap Confidence Intervals

We describe both the ordinary percentile and the BCa, or bias-corrected and

accelerated, percentile bootstrap confidence intervals. We first describe these intervals

for a generic estimation problem before applying the methods to the capture-recapture setting.


The Bootstrap

The bootstrap is based on the idea of a plug-in rule. Suppose we wish to esti-

mate the parameter θ = θ(F) from some random sample x = (x_1, ..., x_n) generated

from the probability distribution F. The plug-in principle estimates θ by θ̂ = θ(F̂),

where F̂ is some estimate of the probability distribution generating the observed data.

Often, θ(F̂) does not have a simple form and must be computed by Monte Carlo resam-

pling. This procedure generates B resamples, x*¹, ..., x*ᴮ, according to the estimated

probability distribution F̂. One then computes the statistic of interest from each of

the resamples to obtain the bootstrap replicates θ̂*_1, ..., θ̂*_B.

The nonparametric bootstrap takes as the estimate of F the empirical distribution

function based on the original data; that is, F̂ places mass 1/n on each of the n

observed data points x_i, i = 1, ..., n, and mass 0 on any point not observed. One

then resamples with replacement from F̂. One can exploit any knowledge one might
have concerning the form of the underlying distribution by using the parametric

bootstrap. The parametric bootstrap assumes the functional form of the distribution

to be known up to a vector of unknown parameters λ, so that F = F_λ. The estimate

F̂ = F_{λ̂} is the assumed functional form with some estimate λ̂ replacing the unknown λ.


Percentile and BCa Intervals

Efron and Tibshirani (1993, pp. 168-169) motivate the use of percentile bootstrap

intervals as follows. Let θ̂ be as above and let σ̂ be its estimated standard error.
Consider the standard normal 100(1 − 2α)% confidence interval

(θ̂ − z^(1−α) σ̂,  θ̂ + z^(1−α) σ̂)   (2.9)

for θ based on the assumption θ̂ ~ N(θ, σ̂²). Let the resample θ̂* be an observation

from a N(θ̂, σ̂²) distribution. The lower and upper confidence limits of (2.9) are

the αth and (1 − α)th percentiles of the distribution of θ̂*. Therefore, if Φ̂ is the normal

distribution function of θ̂*, the confidence interval has the form (Φ̂⁻¹(α), Φ̂⁻¹(1 − α)).
This suggests a way to construct general percentile intervals.

Let θ̂* be defined as in the previous section and let Ĥ be the distribution function

of θ̂*. Interval (2.9) suggests the bootstrap interval (Ĥ⁻¹(α), Ĥ⁻¹(1 − α)). In practice,
we must estimate the cdf H by the empirical distribution of the B bootstrap replicates

{θ̂*_b}, so that the confidence endpoints are the αth and (1 − α)th percentiles of this
empirical distribution.
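The empirical-percentile recipe takes only a few lines; a generic sketch (sample mean as the statistic, fixed seed, arbitrary data values):

```python
import random

def percentile_interval(x, stat, B=2000, alpha=0.025, seed=1):
    """Ordinary percentile bootstrap interval from B resamples."""
    rng = random.Random(seed)
    n = len(x)
    # Bootstrap replicates of the statistic under resampling from F-hat
    reps = sorted(stat([x[rng.randrange(n)] for _ in range(n)])
                  for _ in range(B))
    return reps[int(alpha * B)], reps[int((1 - alpha) * B) - 1]

x = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9, 3.1, 2.5]
mean = lambda v: sum(v) / len(v)
print(percentile_interval(x, mean))
```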

If there exists a transformation u such that u(θ̂) is normally distributed with mean
u(θ) and some finite variance, the percentile method can be thought of as a means of
constructing an ordinary normal interval (a, b) for u(θ) and then transforming back to
the (u⁻¹(a), u⁻¹(b)) interval for θ without knowing the correct transformation. This
means the percentile interval generalizes the ordinary normal interval by requiring

some unknown transformation of θ̂ to be normally distributed instead of requiring θ̂
itself to be normally distributed.

Efron (1987) improved upon the percentile interval idea by introducing the bias

corrected and accelerated, or BCa, interval. Efron's motivation for the BCa interval is

as follows. Suppose there exists an increasing transformation u such that for φ = u(θ)

and φ̂ = u(θ̂), we have

(φ̂ − φ) / se_φ ~ N(−z_0, 1),   (2.10)

where

se_φ = se_0 (1 + a (φ − φ_0))

and φ_0 is any convenient reference point on the scale of φ values. The bias-corrected,

or BC, interval is the special case when a = 0. Thus, the BCa interval

further generalizes the ordinary normal intervals by allowing the normal-inducing

transformation φ̂ to have bias z_0 when estimating the true φ, and a standard deviation

that changes with the value of the parameter. To construct the adjusted

intervals, one estimates the bias and acceleration by


ẑ_0 = Φ⁻¹( Σ_{b=1}^B I(θ̂*_b < θ̂) / B ),   â = Σ_{i=1}^n U_i³ / (6 {Σ_{i=1}^n U_i²}^{3/2}),   (2.11)

where I is the indicator function, Φ is the standard normal cdf, and U_i is the ith

infinitesimal jackknife value, or empirical influence component. For ease of computa-

tion, one can substitute the ith jackknife value from Section 2.1.2 for U_i, so that the
acceleration in (2.11) becomes

â = Σ_{i=1}^n (θ̂_(·) − θ̂_(i))³ / (6 {Σ_{i=1}^n (θ̂_(·) − θ̂_(i))²}^{3/2}).

One then calculates a percentile interval using percentiles of the empirical distribution

adjusted for the bias and acceleration:

BCa interval:  (θ̂*^(α_1), θ̂*^(α_2)),

where

α_1 = Φ( ẑ_0 + (ẑ_0 + z^(α)) / (1 − â (ẑ_0 + z^(α))) ),

α_2 = Φ( ẑ_0 + (ẑ_0 + z^(1−α)) / (1 − â (ẑ_0 + z^(1−α))) ).

Efron (1987) details the theoretical advantages of the BCa interval over the ordinary

percentile interval. Notice that the ordinary percentile interval is a special case of

the BCa interval with z_0 and a both equal to zero. Thus, if model (2.10) holds but

the bias and standard deviation acceleration are negligible, the percentile interval will

perform adequately.
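A compact sketch of these adjustments, using the jackknife form of â and Python's NormalDist for Φ and Φ⁻¹ (generic data, not from the dissertation):

```python
import random
from statistics import NormalDist

def bca_interval(x, stat, B=2000, alpha=0.025, seed=1):
    rng, nd, n = random.Random(seed), NormalDist(), len(x)
    theta = stat(x)
    reps = sorted(stat([x[rng.randrange(n)] for _ in range(n)])
                  for _ in range(B))
    # z0: bias correction from the fraction of replicates below theta-hat
    frac = sum(r < theta for r in reps) / B
    z0 = nd.inv_cdf(min(max(frac, 1 / B), 1 - 1 / B))
    # a: acceleration from delete-1 jackknife values
    loo = [stat(x[:i] + x[i + 1:]) for i in range(n)]
    m = sum(loo) / n
    a = sum((m - l) ** 3 for l in loo) / (6 * sum((m - l) ** 2 for l in loo) ** 1.5)
    adj = lambda z: nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
    a1, a2 = adj(nd.inv_cdf(alpha)), adj(nd.inv_cdf(1 - alpha))
    return reps[int(a1 * B)], reps[min(int(a2 * B), B - 1)]

x = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9, 3.1, 2.5]
mean = lambda v: sum(v) / len(v)
print(bca_interval(x, mean))
```

Setting z0 = a = 0 in the adjustment recovers the ordinary percentile interval, as noted above.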

The Bootstrap Applied to Capture-Recapture

For the capture-recapture setting, the underlying distribution F = Mult(N, π)

generates the observed data n. Thus, we generate B resamples (n*¹, ..., n*ᴮ) from

an estimate of the distribution, F̂ = Mult(N̂, π̂), and calculate (N̂*_1, ..., N̂*_B). The

100(1 − 2α)% percentile confidence interval in this case has endpoints that are the

empirical α and 1 − α percentiles of the N̂* values (Buckland and Garthwaite, 1991).

The BCa interval computes ẑ_0 and â as above to obtain the percentile adjustments

α_1 and α_2. We believe this is a useful modification for the capture-recapture appli-

cation, and we recommend this type of bootstrap interval over the percentile interval

for capture-recapture problems for two reasons. First, there exists a controversy in

the bootstrap literature as to the validity of the ordinary percentile interval (Hall,

1994). Second, little is known about the small sample properties of N-estimates.

Thus, the more general assumptions of the BCa interval are safer since we cannot

verify the stricter assumption that there exists some transformation of N̂ that has a

normal sampling distribution with mean N and constant variance. Bootstrap theory

has shown that the bootstrap BCa confidence intervals have nice properties, in the

form of second-order correctness, when the parameter (and estimate) is a smooth

function of means (Hall, p. 52). We note, however, that although the bootstrap has

been proposed in the capture-recapture framework and is probably the most popular

method of computing confidence intervals in this setting, the theoretical properties

of this bootstrap are unknown since this setting does not fall under this "smooth

function model" framework.

For the percentile bootstrap, the parametric version assumes that the n*ᵇ are gen-

erated from the parametric model using the ML estimates as the true parameters,

F̂ = Mult(N̂, π̂). Efron (1987) showed that when a random variable has finite sup-

port, the nonparametric and parametric BCa intervals are theoretically equivalent.

Since we are resampling from a multinomial distribution with finite support, we only

consider nonparametric BCa intervals in the capture-recapture setting since Efron's

result applies here.

Buckland and Garthwaite (1991) proposed a nonparametric bootstrap that resam-

ples from a multinomial distribution with index N̂ and probabilities

(n̂_{0···0}/N̂, n_{1···1}/N̂, ..., n_{0···01}/N̂).

This is not a nonparametric bootstrap in the truest sense, but rather a semiparamet-

ric bootstrap, since we are forced to use the estimate, N̂, from the assumed model to

estimate the underlying distribution F. We investigate an alternative, strictly non-

parametric bootstrap that resamples from the conditional multinomial distribution

of the observed cell counts given n. Thus, we bootstrap the conditional estimate

N̂_c of Section 2.2.1 by resampling from the conditional multinomial distribution,

Mult(n, (n_{1···1}/n, ..., n_{0···01}/n)). We investigate the performance of both of these boot-

straps in Chapter 5.
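A sketch of the conditional (given n) bootstrap for the two-sample case, where the conditional estimate reduces to the classical Lincoln-Petersen form N̂_c = n_{1+} n_{+1} / n_{11}; the counts are hypothetical:

```python
import random

def lincoln_petersen(c):
    """Two-sample conditional estimate: N-hat = n1 * n2 / n11."""
    return (c['11'] + c['10']) * (c['11'] + c['01']) / c['11']

def conditional_bootstrap_ci(counts, B=2000, alpha=0.025, seed=1):
    rng = random.Random(seed)
    # Resample n capture histories with replacement, conditional on n
    histories = [h for h, k in counts.items() for _ in range(k)]
    n = len(histories)
    reps = []
    for _ in range(B):
        res = [histories[rng.randrange(n)] for _ in range(n)]
        c = {h: res.count(h) for h in ('11', '10', '01')}
        if c['11'] > 0:                 # guard against division by zero
            reps.append(lincoln_petersen(c))
    reps.sort()
    return reps[int(alpha * len(reps))], reps[int((1 - alpha) * len(reps)) - 1]

counts = {'11': 20, '10': 80, '01': 80}  # n = 180 observed, N-hat = 500
print(conditional_bootstrap_ci(counts))
```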


We now consider mixed capture-recapture models that allow for population het-

erogeneity and variable sampling effort. In this chapter, we focus on models that

assume that the dependencies among the t sampling occasions arise solely from the

differences between subjects. We consider alternative forms of dependence in Chapter 4.


We investigate the use of two mixed models, the logistic-normal model and a

latent class model, and the log-linear model of homogeneous two-factor interaction.

We motivate all three of these models from the logistic model that contains a sep-

arate parameter for each subject in the population and for each sampling occasion.

Sections 3.1 and 3.2 review estimation and interpretation of the models in their tradi-

tional application, when all N individuals are observed. This complete table method-

ology will prove useful when computing the profile likelihood confidence intervals of

Section 2.3.1. Section 3.3 discusses their extensions to the capture-recapture setting

and is followed by an example based on a six-sample experiment designed to estimate

the size of a snowshoe hare population in Section 3.4.

Section 3.5 uses another example, a study designed to estimate the number of

people who contracted the hepatitis A virus during a hepatitis outbreak, to demon-

strate the difficulties associated with the use of the mixed models of Section 3.1 and

3.2. Section 3.6 draws analogies between the behavior of the N-estimates obtained

from these models and other well-studied N-estimation problems and is followed by

comments in Section 3.7. We defer simulation studies that examine the performance
of the N-estimates presented in this chapter until Chapter 5.

3.1 A Logistic Model with Subject Heterogeneity

For subject s, s = 1, ..., N, let y_s' = (y_s1, ..., y_st) be a vector of t binary mea-
surements (0 or 1), where y_sj = 1 denotes capture in sample j. Let p_sj = P(y_sj = 1).

We permit subject heterogeneity using the model

log(p_sj / (1 − p_sj)) = α_s + β_j.   (3.1)

Original applications of the model (Rasch, 1961) referred to t test items, making
the model popular in educational testing, where it is known as the Rasch model. In
fitting the model, one assumes independence of responses across occasions for a given
subject, termed local independence, and independence between subjects.
Standard ML asymptotics do not apply to this model, since as the number of
subjects (N) grows, so does the number of model parameters. Thus, the ML estimate

of β = (β_1, ..., β_t) is not consistent (Andersen 1980). Let β̂_j denote the ML estimate

of β_j, j = 1, ..., t. It is well known that when t = 2, β̂_2 − β̂_1 → 2(β_2 − β_1) as N → ∞

(Andersen 1980). Ghosh (1995) proved inconsistency for t > 2.

3.1.1 Conditional Maximum Likelihood

Two approaches are used to overcome the inconsistency. The first, a fixed-effects
approach, treats {α_s} as nuisance parameters and eliminates them by conditioning on
their sufficient statistics, then maximizing the conditional likelihood, yielding condi-

tional ML (CML) estimates β̂. As in Chapter 1, let Z = {(1,...,1), ..., (0,...,0)}
be the set of 2^t possible sequences of responses (y_s1, ..., y_st), in lexicographic order,

and denote Z* = {(1,...,1), ..., (0,...,0,1)} as the set of observable sequences. Let

i = (i_1, ..., i_t) be an element of Z, and let n_i = n_{i_1···i_t} be the number of subjects
having that sequence. Tjur (1982) showed that the CML estimates are, equivalently,
ML estimates of main effect parameters in a log-linear model of quasi symmetry fitted
to the 2^t table of counts {n_i}. Specifically, letting μ_i = E(n_i), the log-linear model is

log(μ_{i_1···i_t}) = μ + β_1 I(i_1 = 1) + ··· + β_t I(i_t = 1) + λ(i_1, ..., i_t),   (3.2)

where the parameter λ(i_1, ..., i_t) is invariant to permutations of its arguments and
the I(·) function is an indicator.
The quasi-symmetry model (3.2) is easily fitted in standard statistical software such as GLIM or SAS (PROC GENMOD). The model implies that the binary response has the same association for each pair of items and corresponds to a random-effects approach in which one assumes that the ability parameters are distributed according to some completely unspecified distribution F. To see this, suppose \alpha_1, ..., \alpha_N are independently and identically distributed according to the unknown cumulative distribution F. For subject s, model (3.1) states that the probability of having capture history y_s = (y_{s1}, y_{s2}, ..., y_{st}), s = 1, ..., N, is

\prod_{j=1}^{t} \frac{e^{y_{sj}(\alpha_s + \beta_j)}}{1 + e^{\alpha_s + \beta_j}} = \frac{e^{\alpha_s \sum_j y_{sj}} \, e^{\sum_j \beta_j y_{sj}}}{\prod_{j=1}^{t} (1 + e^{\alpha_s + \beta_j})}.

Thus, the marginal probability of response pattern i \in Z is

\pi_i = \pi_{i_1...i_t} = \exp\left(\sum_{j=1}^{t} \beta_j i_j\right) \int \frac{e^{\alpha_s \sum_j i_j}}{\prod_{j=1}^{t} (1 + e^{\alpha_s + \beta_j})} \, dF(\alpha_s).    (3.3)

We notice that the integral in (3.3) is a complex function of \beta that depends on the data only through the sufficient statistics S_i = \sum_{j=1}^{t} i_j, the raw score for pattern i. These marginal probabilities then have structure that is a special case of

\pi_{i_1...i_t} = \exp\left(\sum_j \beta_j i_j\right) h(i_1, ..., i_t),    (3.4)

where h is a function that is invariant to permutations of its arguments. Taking the natural logarithm of both sides of (3.4) gives (3.2). This nonparametric formulation will prove important when we consider the model's utility in the capture-recapture setting.

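To make the raw-score sufficiency concrete, the following sketch (Python with numpy; all numerical values are illustrative assumptions, not from the dissertation) evaluates (3.3) under a discrete mixing distribution F and checks that, after dividing out \exp(\sum_j \beta_j i_j), two patterns with the same raw score share the same factor h:

```python
import numpy as np

def pi_marginal(i, beta, alphas, weights):
    """Marginal probability (3.3) under a discrete mixing
    distribution F with mass points `alphas` and masses `weights`."""
    i = np.asarray(i, float)
    beta = np.asarray(beta, float)
    total = 0.0
    for a, w in zip(alphas, weights):
        # per-occasion Bernoulli probabilities for ability a
        p = np.exp(i * (a + beta)) / (1.0 + np.exp(a + beta))
        total += w * np.prod(p)
    return total

# Illustrative values (made up for this sketch)
beta = np.array([0.2, -0.5, 1.0])
alphas, weights = [-1.0, 1.5], [0.4, 0.6]

# Two patterns with the same raw score S_i = 1
h100 = pi_marginal([1, 0, 0], beta, alphas, weights) / np.exp(beta[0])
h001 = pi_marginal([0, 0, 1], beta, alphas, weights) / np.exp(beta[2])

# After dividing out exp(sum_j beta_j i_j), the remaining factor
# h(i_1, ..., i_t) depends on i only through the raw score:
assert abs(h100 - h001) < 1e-12
```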
We will see in Section 3.3 that fitting this model provides no information on the population size. Darroch et al. (1993) and Agresti (1994) proposed the simpler log-linear model of homogeneous two-factor interaction (H02),

\log(\mu_{i_1...i_t}) = \mu + \beta_1 I(i_1 = 1) + \cdots + \beta_t I(i_t = 1) + \lambda \sum_{j<k} I(i_j = 1) I(i_k = 1),    (3.5)

for capture-recapture. This model is the special case of the log-linear model of no three-factor interaction in which all associations are identical. It is also a special case of the quasi-symmetry model (3.2) in which only the second-order interactions are allowed to differ from zero. This simple model has only one more parameter than the model of mutual independence of the t responses, which is (3.5) with \lambda = 0. The simpler model proves useful for estimating the population size, since the quasi-symmetry model provides no N-estimate. Chapter 5 will show that confidence intervals for N produced by this model have good coverage probabilities when population heterogeneity is present.
3.1.2 Marginal Maximum Likelihood (MML) Approach to Estimation

A second approach treats {\alpha_s} as random effects, typically having a normal distribution with mean 0 and unknown variance \sigma^2, for which

\mathrm{logit}(p_{sj}) = \sigma Z_s + \beta_j    (3.6)

with Z_s \sim N(0, 1). We refer to this model as the logistic-normal (LN) model. One integrates over this distribution to obtain the "marginal likelihood" of (\sigma, \beta) given the 2^t observed cell counts. The fitted counts from the marginal model have quasi-symmetric structure, since the model assumes a particular form for F in equation (3.3). This model also has only one more parameter than the model of mutual independence, which is (3.6) with \sigma = 0. Bock and Lieberman (1970), Bock and Aitkin (1981), and Thissen (1982) discuss a direct method of maximizing the log-likelihood based on the Newton-Raphson algorithm.

Direct Approach to Maximization

A direct approach to MML estimation maximizes a Gaussian quadrature approximation to the marginal log-likelihood using a maximization algorithm. Bock and Aitkin (1981) demonstrated the use of the Newton-Raphson algorithm for a general item-response model.

The probability that a subject with ability Z = z has capture history i, i \in Z, is

\pi_{iz} = \prod_{j=1}^{t} \frac{e^{i_j(\sigma z + \beta_j)}}{1 + e^{\sigma z + \beta_j}}.

Thus, the probability that a randomly selected subject has that pattern is the marginal probability

\pi_i = \int \pi_{iz} \, \phi(z) \, dz = \int \prod_{j=1}^{t} \frac{e^{i_j(\sigma z + \beta_j)}}{1 + e^{\sigma z + \beta_j}} \, \phi(z) \, dz,    (3.7)

where \phi(z) is the standard normal density. This yields the marginal multinomial log-likelihood for (\sigma, \beta) given the cell counts n,

l(\sigma, \beta; \mathbf{n}) \propto \sum_{i \in Z} n_i \log(\pi_i).    (3.8)

Since no closed-form expression exists for the integral in (3.7), one can use numerical integration to obtain an approximation. Gauss-Hermite quadrature (Stroud and Secrest 1966) approximates an integral of the form \int f(z)\phi(z)\,dz by the expression \sum_{k=1}^{q} f(z_k)v_k, where the z_k's are known as quadrature points or nodes and the v_k's are the corresponding weights. The choice of the number of quadrature points q determines the degree of accuracy, and the weights v_k are scaled to satisfy \sum_{k=1}^{q} v_k = 1. Thus, the marginal probabilities (3.7) are approximated by

\tilde{\pi}_i = \sum_{k=1}^{q} \prod_{j=1}^{t} \frac{e^{i_j(\sigma z_k + \beta_j)}}{1 + e^{\sigma z_k + \beta_j}} \, v_k,

and

\tilde{l}(\sigma, \beta; \mathbf{n}) \propto \sum_{i \in Z} n_i \log(\tilde{\pi}_i)    (3.9)

is the objective function maximized with respect to (\sigma, \beta).
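A minimal sketch of this quadrature approximation, using numpy's Gauss-Hermite rule rescaled to a standard-normal kernel (the parameter values below are illustrative assumptions, not estimates from the text):

```python
import numpy as np

def gh_nodes_weights(q):
    """Gauss-Hermite rule adapted to a standard-normal kernel:
    integral f(z) phi(z) dz  ~  sum_k f(z_k) v_k, with sum_k v_k = 1."""
    x, w = np.polynomial.hermite.hermgauss(q)
    return x * np.sqrt(2.0), w / np.sqrt(np.pi)

def pi_tilde(i, sigma, beta, q=10):
    """Quadrature approximation to the marginal capture-history
    probability pi_i under the logistic-normal model."""
    z, v = gh_nodes_weights(q)
    i = np.asarray(i, float)
    beta = np.asarray(beta, float)
    # p[k, j]: capture probability for occasion j at node z_k
    p = 1.0 / (1.0 + np.exp(-(sigma * z[:, None] + beta[None, :])))
    f = np.prod(np.where(i[None, :] == 1, p, 1.0 - p), axis=1)
    return float(f @ v)

# Illustrative parameter values for t = 3 occasions
sigma, beta = 0.9, np.array([-1.0, -0.5, 0.3])
patterns = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
probs = [pi_tilde(i, sigma, beta) for i in patterns]

# The 2^t approximate probabilities sum to 1 (since sum_k v_k = 1)
assert abs(sum(probs) - 1.0) < 1e-8
```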
Newton-Raphson has traditionally been popular for ML analysis because it provides an estimate of the precision of the ML estimates as a by-product of the algorithm. Suppose, in general, we wish to maximize a nonlinear function g(\theta) with respect to the unknown parameter vector \theta. Let \theta^{(p)} be the pth guess for \hat{\theta}, the value that maximizes g. Let q' = (\partial g/\partial\theta_1, \partial g/\partial\theta_2, ...), let H denote the matrix having entries h_{ij} = \partial^2 g / \partial\theta_i \partial\theta_j, and let q^{(p)} and H^{(p)} be the corresponding quantities evaluated at \theta^{(p)}. At iteration p, the (p+1)st guess relies on the Taylor series expansion of g(\theta) around \theta^{(p)},

Q^{(p)}(\theta) = g(\theta^{(p)}) + q^{(p)\prime}(\theta - \theta^{(p)}) + \tfrac{1}{2}(\theta - \theta^{(p)})' H^{(p)} (\theta - \theta^{(p)}).    (3.10)

Solving \partial Q^{(p)}(\theta)/\partial\theta = q^{(p)} + H^{(p)}(\theta - \theta^{(p)}) = 0 yields the (p+1)st guess for \theta,

\theta^{(p+1)} = \theta^{(p)} - (H^{(p)})^{-1} q^{(p)},    (3.11)

as long as H^{(p)} is nonsingular.
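The update (3.11) can be sketched generically; the quadratic test function below is purely illustrative (for a quadratic, the update reaches the maximizer in one step):

```python
import numpy as np

def newton_raphson(grad, hess, theta0, tol=1e-8, max_iter=100):
    """Maximize a smooth function via the update (3.11):
    theta^(p+1) = theta^(p) - (H^(p))^{-1} q^(p)."""
    theta = np.asarray(theta0, float)
    for _ in range(max_iter):
        q = grad(theta)
        H = hess(theta)
        step = np.linalg.solve(H, q)   # requires H nonsingular
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# Toy illustration: maximize g(theta) = -(theta_1 - 1)^2 - 2(theta_2 + 3)^2
grad = lambda t: np.array([-2.0 * (t[0] - 1.0), -4.0 * (t[1] + 3.0)])
hess = lambda t: np.array([[-2.0, 0.0], [0.0, -4.0]])
theta_hat = newton_raphson(grad, hess, [0.0, 0.0])
assert np.allclose(theta_hat, [1.0, -3.0])
```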

For the logistic-normal model, \theta = (\sigma, \beta) and g(\theta) = \tilde{l}. Denote

p_{jk} = \frac{\exp(\sigma z_k + \beta_j)}{1 + \exp(\sigma z_k + \beta_j)}.

The lth element, l = 1, ..., t + 1, of the score vector q is

\frac{\partial \tilde{l}}{\partial \theta_l} = \sum_{i \in Z} \frac{n_i}{\tilde{\pi}_i} \frac{\partial \tilde{\pi}_i}{\partial \theta_l}.    (3.12)

If we rewrite

\tilde{\pi}_i = \sum_{k=1}^{q} f_{ik} v_k,    (3.13)

where

f_{ik} = \prod_{j=1}^{t} p_{jk}^{i_j} (1 - p_{jk})^{1 - i_j},

then the elements of the score vector have the form

\frac{\partial \tilde{l}}{\partial \theta_l} = \sum_{i \in Z} n_i \sum_{k=1}^{q} \frac{v_k}{\sum_{h=1}^{q} f_{ih} v_h} \frac{\partial f_{ik}}{\partial \theta_l}, \quad l = 1, ..., t + 1.    (3.14)

Using the identity that, for any function f(\theta) and parameter \theta,

\frac{\partial f(\theta)}{\partial \theta} = f(\theta) \frac{\partial \log f(\theta)}{\partial \theta},    (3.15)

equation (3.14) can be re-expressed as

\frac{\partial \tilde{l}}{\partial \theta_l} = \sum_{i \in Z} n_i \sum_{k=1}^{q} w_{ik} \frac{\partial \log f_{ik}}{\partial \theta_l}, \quad l = 1, ..., t + 1,    (3.16)

where w_{ik} = f_{ik} v_k / (\sum_{h=1}^{q} f_{ih} v_h),

\frac{\partial \log f_{ik}}{\partial \sigma} = \sum_{j=1}^{t} z_k (i_j - p_{jk}),

and

\frac{\partial \log f_{ik}}{\partial \beta_j} = (i_j - p_{jk}), \quad j = 1, ..., t.

The Hessian matrix H has elements

h_{lm} = \frac{\partial^2 \tilde{l}}{\partial \theta_l \partial \theta_m} = \sum_{i \in Z} n_i \sum_{k=1}^{q} \left[ w_{ik} \frac{\partial^2 \log f_{ik}}{\partial \theta_l \partial \theta_m} + \frac{\partial \log f_{ik}}{\partial \theta_l} \frac{\partial w_{ik}}{\partial \theta_m} \right],

where

\frac{\partial^2 \log f_{ik}}{\partial \sigma^2} = -\sum_{j=1}^{t} z_k^2 \, p_{jk}(1 - p_{jk}),

\frac{\partial^2 \log f_{ik}}{\partial \beta_j^2} = -p_{jk}(1 - p_{jk}),

\frac{\partial^2 \log f_{ik}}{\partial \sigma \, \partial \beta_j} = -z_k \, p_{jk}(1 - p_{jk}),

and

\frac{\partial w_{ik}}{\partial \theta_m} = \frac{\left(\sum_{h=1}^{q} f_{ih} v_h\right) \frac{\partial (f_{ik} v_k)}{\partial \theta_m} - f_{ik} v_k \sum_{h=1}^{q} \frac{\partial (f_{ih} v_h)}{\partial \theta_m}}{\left(\sum_{h=1}^{q} f_{ih} v_h\right)^2}.

The inverse of the observed information matrix, I^{-1} = (-H(\hat{\theta}))^{-1}, is an estimate of the large-sample variance-covariance matrix for \hat{\theta} = (\hat{\sigma}, \hat{\beta}).
We choose to use another maximization algorithm for direct maximization of
the log-likelihood since we will not base confidence intervals on the large-sample
standard errors. Feasible Sequential Quadratic Programming (FSQP) (Zhou and
Tits, 1994) is a set of high-quality FORTRAN subroutines for the minimization of
a smooth objective function subject to nonlinear equality and inequality constraints,
linear equality and inequality constraints, and simple bounds on the variables. These

routines are based on an iterative algorithm that searches away from a given feasible

iterate in an arc whose direction is determined by an estimate of the Hessian of the

Lagrangian. This arc search along a function's surface is known as an Armijo type

search. We use FSQP to provide the MML estimate (MMLE) of 0 by minimizing the

negative of equation (3.9). We maximize the log-likelihood with the only constraint

being a lower bound of zero for the variance component \sigma.
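FSQP is a specialized FORTRAN package; as a rough modern analogue (an assumption for illustration, not the dissertation's implementation), scipy's L-BFGS-B method can minimize the negative of (3.9) subject to the lower bound \sigma \geq 0. The counts below are made up:

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(theta, counts, patterns, q=10):
    """Negative of the quadrature log-likelihood (3.9); theta = (sigma, beta)."""
    sigma, beta = theta[0], np.asarray(theta[1:])
    x, w = np.polynomial.hermite.hermgauss(q)
    z, v = x * np.sqrt(2.0), w / np.sqrt(np.pi)   # standard-normal rescaling
    ll = 0.0
    for n_i, i in zip(counts, patterns):
        i = np.asarray(i, float)
        eta = np.clip(sigma * z[:, None] + beta[None, :], -30, 30)
        p = 1.0 / (1.0 + np.exp(-eta))
        f = np.prod(np.where(i == 1, p, 1.0 - p), axis=1)
        ll += n_i * np.log(f @ v)
    return -ll

# Illustrative 2^3 table of counts (made-up data, lexicographic order)
patterns = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
counts = [40, 12, 15, 8, 10, 6, 7, 2]

# Bound sigma >= 0, as in the constrained maximization described above
res = minimize(neg_loglik, x0=[0.5, 0.0, 0.0, 0.0],
               args=(counts, patterns), method="L-BFGS-B",
               bounds=[(0.0, None)] + [(None, None)] * 3)
assert res.success and res.x[0] >= 0.0
```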

Indirect Maximization

For completeness, we outline an indirect approach to MML estimation in the

logistic-normal model. This approach takes the form of an Expectation-Maximization

(EM) algorithm (Dempster, Laird, and Rubin, 1977) and is based on the observation

that the likelihood equations take the simple form of those from a weighted generalized

linear model. Specifically, the score equations (3.16) can be rewritten as

\frac{\partial \tilde{l}}{\partial \sigma} = \sum_{k=1}^{q} \sum_{j=1}^{t} z_k \left[ T_{jk} - N_k p_{jk} \right] = 0,

\frac{\partial \tilde{l}}{\partial \beta_j} = \sum_{k=1}^{q} \left[ T_{jk} - N_k p_{jk} \right] = 0, \quad j = 1, ..., t,

where

N_k = \sum_{i \in Z} n_i w_{ik}    (3.17)

can be interpreted as the expected number of individuals with a latent variable value of z_k, and

T_{jk} = \sum_{i \in Z} n_i i_j w_{ik}    (3.18)

is the expected number of subjects with catchability z_k captured on occasion j. These equations take the familiar form of the likelihood equations for a logistic regression analysis of T_{jk} successes out of N_k trials.

This formulation suggests an iterative EM procedure. At the pth step, given parameter values \theta^{(p)}, the expectation step computes {T_{jk}^{(p)}} and {N_k^{(p)}} using (3.17) and (3.18). The maximization step obtains the updated parameter estimate \theta^{(p+1)} through an ordinary logistic regression analysis of {T_{jk}^{(p)}} successes out of {N_k^{(p)}} trials. One typically iterates between the two steps until some criterion, such as the change in successive deviances or the maximum change in successive parameter values, is less than some pre-determined tolerance.
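The E-step quantities (3.17)-(3.18) can be sketched as follows (made-up counts and parameter values; only the E-step is shown, the M-step being an ordinary logistic regression of T_{jk} successes in N_k trials):

```python
import numpy as np

def e_step(counts, patterns, sigma, beta, z, v):
    """E-step: posterior node weights w_ik, then N_k (3.17) and
    T_jk (3.18), the expected trials and successes at each node."""
    p = 1.0 / (1.0 + np.exp(-(sigma * z[:, None] + beta[None, :])))  # q x t
    N_k = np.zeros(len(z))
    T_jk = np.zeros((len(beta), len(z)))
    for n_i, i in zip(counts, patterns):
        i = np.asarray(i, float)
        f = np.prod(np.where(i == 1, p, 1.0 - p), axis=1)   # f_ik over nodes
        w = f * v / np.sum(f * v)                           # w_ik, sums to 1
        N_k += n_i * w
        T_jk += n_i * np.outer(i, w)
    return N_k, T_jk

# Illustrative inputs (made up)
patterns = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
counts = np.array([40, 12, 15, 8, 10, 6, 7, 2])
x, w = np.polynomial.hermite.hermgauss(5)
z, v = x * np.sqrt(2.0), w / np.sqrt(np.pi)
N_k, T_jk = e_step(counts, patterns, sigma=0.8,
                   beta=np.array([-1.0, -0.7, -0.4]), z=z, v=v)

# Expected trials at the nodes account for every subject:
assert abs(N_k.sum() - counts.sum()) < 1e-8
```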

The Newton-Raphson and EM methods of estimation both have advantages and

disadvantages. The Newton-Raphson algorithm converges relatively quickly, having

a quadratic convergence rate, and provides large-sample standard errors for the MML

estimates. A drawback of Newton-Raphson is the complexity of the derivatives and

that the repeated inversion of H becomes computationally intensive for large t, al-

though with today's computing facilities, it is not too much so.

The biggest advantage of the EM algorithm is that it is computationally simpler

than Newton-Raphson to implement. Since the EM algorithm updates the param-

eter estimates through logistic regression, it is simple to program the algorithm in

standard software packages such as SAS, Splus, or GLIM. The algorithm is, however,

notoriously slow in converging, with convergence becoming very slow after the algo-

rithm quickly moves to a neighborhood of the estimates. In fact, as we will see in the
next section, this algorithm has the potential to "stall"; that is, the algorithm can

stop prematurely. Also, the EM algorithm does not provide standard error estimates

without additional computation.

After estimating the parameters, the fit of the assumed model can be assessed by the likelihood-ratio statistic

G^2 = 2 \sum_{i \in Z} n_i \log\left(\frac{n_i}{n \hat{\pi}_i}\right),    (3.19)

where \hat{\pi}_i = \tilde{\pi}_i(\hat{\sigma}, \hat{\beta}) is the estimate of the Gaussian quadrature approximation of \pi_i, i \in Z. If all expected frequencies are greater than or equal to five, this LR statistic has an approximate \chi^2 distribution with df = (2^t - 1 free cell probabilities) - (t + 1 parameters) = 2^t - t - 2.
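A short sketch of (3.19) and the degrees-of-freedom count (toy numbers, not the LSAT fit reported later):

```python
import numpy as np

def deviance(counts, fitted):
    """Likelihood-ratio statistic (3.19); cells with n_i = 0 contribute 0."""
    counts = np.asarray(counts, float)
    fitted = np.asarray(fitted, float)
    mask = counts > 0
    return 2.0 * np.sum(counts[mask] * np.log(counts[mask] / fitted[mask]))

t = 5                      # number of capture occasions
df = 2 ** t - t - 2        # (2^t - 1 free cells) - (t + 1 parameters)
assert df == 25

# Toy check: fitted counts equal to observed counts give G^2 = 0
assert deviance([10, 5, 3], [10.0, 5.0, 3.0]) == 0.0
```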

3.2 A Latent Class Model

This chapter also applies a latent class model to the capture-recapture problem.
General latent class models were first introduced by Goodman (1974). In general terms, suppose we observe measurements on a set of t categorical variables {V_j}, where V_j has L_j levels, j = 1, ..., t, so that subjects can be cross-classified into a \prod_{j=1}^{t} L_j contingency table. Latent class models postulate the joint distribution of the t observed, or manifest, variables as a mixture of L distributions, defined by the L classes of a latent, or unobservable, categorical variable Z. This latent variable categorizes individuals into L homogeneous classes. Each of the L components of the mixture distribution is assumed to satisfy mutual independence among the t variables, so that observed associations among these manifest variables are a product of differences between latent classes. If we let (i_1, ..., i_t) be a subject's classification according to the t manifest variables, and (i_1, ..., i_t, i_Z) be the joint response to these variables along with their latent class membership, the observed counts {n_{i_1...i_t}} represent a collapsing of the \prod_{j=1}^{t} L_j \times L table over the levels of Z. Assuming mutual independence of (V_1, ..., V_t) given Z implies that the cell counts in this \prod_{j=1}^{t} L_j \times L table satisfy the log-linear model

\log(\mu_{i_1...i_t i_Z}) = \lambda + \sum_{k=1}^{t} \lambda_{i_k}^{V_k} + \lambda_{i_Z}^{Z} + \sum_{k=1}^{t} \lambda_{i_k i_Z}^{V_k Z}.

Thus, the observed counts satisfy

\log(\mu_{i_1...i_t}) = \lambda + \sum_{k=1}^{t} \lambda_{i_k}^{V_k} + \log \sum_{l=1}^{L} \exp\left( \lambda_l^{Z} + \sum_{k=1}^{t} \lambda_{i_k l}^{V_k Z} \right).

3.2.1 Quasi-Symmetric Latent Class Model

We consider a special case of this general latent class model that requires the associations {\lambda_{i_k i_Z}^{V_k Z}} between the latent variable and the t manifest variables to be identical. This latent class model is a special case of models introduced by Lindsay et al. (1991) and Agresti and Lang (1993). If we assume there are only two latent classes, this model is the special case of the logistic model (3.1) with only two possible values for \alpha_s. In terms of {\mu_{i_1...i_t}}, it has the quasi-symmetric structure whereby the associations between the binary latent variable and the t responses are identical.

Relationship to Logistic-Normal Model

This quasi-symmetric latent class (QLC) model with two latent classes relates to the logistic-normal model, being a generalization of the crude approximation of it that uses only two quadrature points. When the weights {v_k} are unknown, the logistic-normal model with q = 2 is equivalent to the QLC model with L = 2. Consider the three-sample case for notational convenience; generalizations to t > 3 are straightforward. Consider the marginal probability of capture history i = (i_1, i_2, i_3),

\pi_{i_1 i_2 i_3} = \int \prod_{j=1}^{3} \left[ \frac{\exp(i_j(\sigma z + \beta_j))}{1 + \exp(\sigma z + \beta_j)} \right] \phi(z) \, dz.

Two-point Gaussian quadrature yields z = (z_1, z_2) = (-1, 1), but with only two nodes the model is equivalent if z is treated as a factor, z = (0, 1), so that

\pi_{i_1 i_2 i_3} = \exp\left( \sum_{j=1}^{3} \beta_j i_j \right) \sum_{k=1}^{2} \frac{\exp\left(\sigma z_k \sum_{j=1}^{3} i_j\right) v_k}{\prod_{j=1}^{3} (1 + \exp(\sigma z_k + \beta_j))}

= \exp\left( \sum_{j=1}^{3} \beta_j i_j \right) \sum_{k=1}^{2} \exp\left( \sigma z_k \sum_{j=1}^{3} i_j + \log(v_k) - \log \prod_{j=1}^{3} (1 + \exp(\sigma z_k + \beta_j)) \right).

Taking the log of both sides yields

\log(\pi_{i_1 i_2 i_3}) = \sum_{j=1}^{3} \beta_j i_j + \log \sum_{k=1}^{2} \exp\left( \sigma z_k \sum_{j=1}^{3} i_j + \log(v_k) - \log \prod_{j=1}^{3} (1 + \exp(\sigma z_k + \beta_j)) \right).

This model satisfies the QLC structure of

\log(\pi_{i_1 i_2 i_3}) = \lambda_{i_1}^{V_1} + \lambda_{i_2}^{V_2} + \lambda_{i_3}^{V_3} + \log \sum_{k=1}^{2} \exp\left( \lambda_{i_1 k}^{V_1 Z} + \lambda_{i_2 k}^{V_2 Z} + \lambda_{i_3 k}^{V_3 Z} + \lambda_k^{Z} \right),    (3.20)

where the item parameters \beta are equivalent to the main effect parameters for the manifest variables (\lambda^{V_1}, \lambda^{V_2}, \lambda^{V_3}). The association parameters \lambda_{i_1 k}^{V_1 Z} = \lambda_{i_2 k}^{V_2 Z} = \lambda_{i_3 k}^{V_3 Z} equal \sigma for i_j = 1 and k = 2, and 0 otherwise, satisfying quasi symmetry. Also, constraining \lambda_1^{Z} = 0 for identifiability yields

\lambda_2^{Z} = \left[ \log(v_2) - \log \prod_{j=1}^{3} (1 + \exp(\sigma + \beta_j)) \right] - \left[ \log(v_1) - \log \prod_{j=1}^{3} (1 + \exp(\beta_j)) \right].

The QLC model with two latent classes also implies the LN (q = 2) model, providing equivalence between the two models. Consider the ML estimates

\hat{\lambda} = (\hat{\lambda}^{VZ}, \hat{\lambda}_1^{V_1}, \hat{\lambda}_1^{V_2}, \hat{\lambda}_1^{V_3}, \hat{\lambda}_2^{Z})

for model (3.20), where \hat{\lambda}^{VZ} is the common value of \hat{\lambda}_{12}^{V_1 Z} = \hat{\lambda}_{12}^{V_2 Z} = \hat{\lambda}_{12}^{V_3 Z}. Without loss of generality, we fit this model with the latent variable Z coded so that this association parameter is nonnegative. If this is not the case, we simply change the coding of the levels of the latent variable Z from (0, 1) to (1, 0). Assuming \hat{\lambda}_1^{Z} = 0 and \hat{\lambda}_0^{V_j} = 0, we have \hat{\sigma} = \hat{\lambda}^{VZ} and \hat{\beta}_j = \hat{\lambda}_1^{V_j}, j = 1, 2, 3. The values \hat{v}_k, k = 1, 2, in the LN model are then obtained by determining the values for \hat{v} = (\hat{v}_1, \hat{v}_2) that satisfy the equations

\hat{\lambda}_2^{Z} = \left[ \log(\hat{v}_2) - \log \prod_{j=1}^{3} (1 + \exp(\hat{\sigma} + \hat{\beta}_j)) \right] - \left[ \log(\hat{v}_1) - \log \prod_{j=1}^{3} (1 + \exp(\hat{\beta}_j)) \right]

and

\hat{v}_1 + \hat{v}_2 = 1.

Thus, there is a one-to-one equivalence between the parameters of the QLC (L = 2) model and the LN (q = 2) model.

This equivalence between the two models can be generalized in that the QLC model for a given L > 2 is Aitkin's (1996) logistic-mixture model, which places no distributional assumption on the mixing distribution other than that it has L mass points. This is the model fit sequentially for increasing L in order to obtain Aitkin's NPML estimates of the unknown mass points \alpha = (\alpha_1, ..., \alpha_L), weights v = (v_1, ..., v_L), and \beta.

A simple extension of the previous argument shows this equivalence. The marginal probabilities for the logistic-mixture model with L mass points satisfy

\log(\pi_{i_1...i_t}) = \sum_{j=1}^{t} \beta_j i_j + \log \sum_{k=1}^{L} \exp\left( \alpha_k \sum_{j=1}^{t} i_j + \log(v_k) - \log \prod_{j=1}^{t} (1 + \exp(\alpha_k + \beta_j)) \right).

Arguing as in the q = 2 case, we see that the \alpha_k, k = 1, ..., L, are the common association parameters \lambda_{i_1 k}^{V_1 Z} = \cdots = \lambda_{i_t k}^{V_t Z}, and that

\lambda_k^{Z} = \left[ \log(v_k) - \log \prod_{j=1}^{t} (1 + \exp(\alpha_k + \beta_j)) \right] - \left[ \log(v_1) - \log \prod_{j=1}^{t} (1 + \exp(\alpha_1 + \beta_j)) \right].

3.2.2 Estimation

One can fit this model using the EM algorithm. The distribution of the complete data, the \prod_{j=1}^{t} L_j \times L contingency table, has regular exponential form, so that only the complete-data sufficient statistics must be estimated at each E-step. (See Goodman (1974) and Lang (1992) for an EM approach to latent class models in general, and Agresti and Lang (1993) for this specific case.) The \hat{v}_k, k = 1, 2, are the values that simultaneously satisfy

\hat{\lambda}_2^{Z} = \left[ \log(\hat{v}_2) - \log \prod_{j=1}^{3} (1 + \exp(\hat{\sigma} + \hat{\beta}_j)) \right] - \left[ \log(\hat{v}_1) - \log \prod_{j=1}^{3} (1 + \exp(\hat{\beta}_j)) \right]

and

\hat{v}_1 + \hat{v}_2 = 1.

Alternatively, we estimate (v_1, \sigma, \beta) using FSQP. This optimization routine is much faster than the EM algorithm, which can stall before converging to the maximum likelihood estimates.

As an example, consider the complete table based on the data from Bock and Aitkin (1981), which consists of the response-pattern counts of 1000 students' responses to 5 LSAT questions. The data are displayed in Table 3.1. FSQP gives the MMLEs for the 2-point LN model with arbitrary weights v as \hat{\sigma} = 1.416, \hat{\beta} = (-3.240, -1.520, -0.761, -1.827, -2.616), and \hat{v} = (0.621, 0.379). The model fit yields deviance G^2 = 23.36 based on 23 df. The parameter estimates for the quasi-symmetric latent class model obtained from the EM algorithm using GLIM are reported in Table 3.2.

Table 3.1. Response pattern counts and the fitted values of two mixed models of 1000
students' answers to 5 dichotomously-scored LSAT questions



Table 3.2. EM Parameter Estimates for the Quasi-Symmetric Latent Class Model Applied to the LSAT Data


Notice that \hat{\sigma} = \hat{\lambda}^{VZ}, \hat{\beta}_j = \hat{\lambda}_1^{V_j}, j = 1, ..., 5, and

\hat{\lambda}_2^{Z} = \left[ \log(.621) - \log \prod_{j=1}^{5} (1 + \exp(\hat{\sigma} + \hat{\beta}_j)) \right] - \left[ \log(.379) - \log \prod_{j=1}^{5} (1 + \exp(\hat{\beta}_j)) \right],

demonstrating the equivalence of the two models.

The convergence of the EM algorithm is slow: 826 EM iterations were required to match the estimates obtained with FSQP. Also, the EM algorithm terminated twice (after 336 iterations and 345 iterations) before the true MLEs were obtained, even though the convergence criterion, the change in successive deviances, was tightened to 1e-20. This propensity of the EM algorithm to "stall" before reaching the true

MLE's has been observed in fitting other generalized linear mixed models (Booth and

Hobert 1997).

3.3 ML Estimation of N

We now discuss fitting the models outlined in the previous sections in the capture-recapture setting. Again, we refer to the table with all 2^t cell counts known as the complete table and the one with n_{0...0} unknown as the incomplete table. Recall, also, from Section 2.2 that L is the full unconditional likelihood, L_1 is the conditional (on n) likelihood, and L is the approximate likelihood that treats N as a continuous parameter.

3.3.1 The Logistic-Normal Model

Using Gaussian quadrature, the logistic-normal model of Section 3.1.2 postulates the form of the probabilities \pi_i(\theta_{LN}), where \theta_{LN} = (\sigma, \beta), as

\pi_i(\theta_{LN}) = \sum_{k=1}^{q} \prod_{j=1}^{t} \frac{e^{i_j(\sigma z_k + \beta_j)}}{1 + e^{\sigma z_k + \beta_j}} \, v_k, \quad i \in Z.

One can easily implement the conditional (on n) approach to N-estimation of Section 2.2.1 by directly maximizing L_1 with respect to (\sigma, \beta) using Newton-Raphson or some other numerical optimization routine. Denote \pi_i(\theta_{LN}) = \pi_i for notational convenience. Since

L_1(\theta_{LN}; \mathbf{n}) = \frac{n!}{\prod_{i \in Z^*} n_i!} \prod_{i \in Z^*} \left( \frac{\pi_i}{\sum_{h \in Z^*} \pi_h} \right)^{n_i},

where Z^* denotes the set of observable sequences, we have

l_1 = \log L_1 \propto \sum_{i \in Z^*} n_i \log(\pi_i) - n \log\left( \sum_{i \in Z^*} \pi_i \right)

and

\frac{\partial l_1}{\partial \theta_l} = \sum_{i \in Z^*} \frac{n_i}{\pi_i} \frac{\partial \pi_i}{\partial \theta_l} - \frac{n}{\sum_{i \in Z^*} \pi_i} \sum_{i \in Z^*} \frac{\partial \pi_i}{\partial \theta_l}.

Thus, the likelihood equations used to obtain \hat{\theta}_C from the incomplete table are simply the score equations for the complete table minus the term corresponding to cell (0, ..., 0), corrected for the total number of subjects seen during the experiment. One then calculates \hat{N}_C using (2.6). Sanathanan (1974) proposed a two-step method of estimating (\sigma, \beta) by estimating \beta using the CML approach and then maximizing L_1(\sigma, \hat{\beta}_C) with respect to \sigma. The 95% profile likelihood interval of Section 2.3.1 can be constructed by using one of the complete-table algorithms of Section 3.1.2 to find the n_{0...0} values that yield a likelihood-ratio statistic of G^2(\hat{n}_{0...0}) + 3.84. We obtain these values by simply maximizing L with respect to (\sigma, \beta) for each candidate N, while constraining \sigma \geq 0. Maximizing L with respect to (N, \theta_{LN}), while constraining \sigma \geq 0, yields \hat{N}.
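The conditional (on n) approach can be sketched with general-purpose tools in place of FSQP (an assumption for illustration, not the dissertation's code). Since equation (2.6) is not reproduced here, the sketch uses the standard conditional estimator \hat{N}_C = n / (1 - \hat{\pi}_{0...0}), which matches its description; the observed counts are made up:

```python
import numpy as np
from scipy.optimize import minimize

def cond_neg_loglik(theta, counts_obs, patterns_obs, q=10):
    """Negative conditional (on n) log-likelihood l1: multinomial over
    the observable cells, with probabilities pi_i / sum_h pi_h."""
    sigma, beta = theta[0], np.asarray(theta[1:])
    x, w = np.polynomial.hermite.hermgauss(q)
    z, v = x * np.sqrt(2.0), w / np.sqrt(np.pi)
    pis = []
    for i in patterns_obs:
        i = np.asarray(i, float)
        eta = np.clip(sigma * z[:, None] + beta[None, :], -30, 30)
        p = 1.0 / (1.0 + np.exp(-eta))
        f = np.prod(np.where(i == 1, p, 1.0 - p), axis=1)
        pis.append(f @ v)
    pis = np.array(pis)
    n = np.sum(counts_obs)
    return -(np.sum(counts_obs * np.log(pis)) - n * np.log(pis.sum()))

# Illustrative observed table (made up): the (0,0,0) cell is excluded
patterns_obs = [(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0),
                (1, 0, 1), (1, 1, 0), (1, 1, 1)]
counts_obs = np.array([12, 15, 8, 10, 6, 7, 2])

res = minimize(cond_neg_loglik, x0=[0.5, -1.0, -1.0, -1.0],
               args=(counts_obs, patterns_obs), method="L-BFGS-B",
               bounds=[(0.0, None)] + [(None, None)] * 3)
sigma_hat, beta_hat = res.x[0], res.x[1:]

# pi_{0...0} at the conditional MLE, then N_C = n / (1 - pi_{0...0})
x, w = np.polynomial.hermite.hermgauss(10)
z, v = x * np.sqrt(2.0), w / np.sqrt(np.pi)
p0 = np.prod(1.0 / (1.0 + np.exp(sigma_hat * z[:, None] + beta_hat[None, :])),
             axis=1) @ v
N_C = counts_obs.sum() / (1.0 - p0)
assert N_C >= counts_obs.sum()
```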

3.3.2 Quasi-Symmetric Loglinear and Latent Class Models

From Tjur (1982), the fixed-effects approach of CML estimation for the logistic model is equivalent to fitting the quasi-symmetric log-linear model to {n_i}. Darroch et al. (1993) and Agresti (1994) noted that this CML estimation provides no information on N, since one of the likelihood equations,

\hat{\mu}_{0...0} = n_{0...0},

shows that any value of n_{0...0} is consistent with the model. Thus, the deviance for this model remains constant for all N \geq n, since the cell count n_{0...0} makes no contribution to the test statistic. This yields the profile likelihood confidence interval (n, \infty) for N. They considered special cases of the quasi-symmetry model for which this is not the case. For instance, the log-linear model of homogeneous two-factor interaction (3.5) is a special case that is still more complex than the log-linear model of mutual independence that results from a lack of subject heterogeneity. One obtains \hat{N}_C for this model using standard model-fitting software by fitting (3.5) with zero weight for cell count n_{0...0}. One can obtain \hat{N} by numerically optimizing L with respect to (N, \beta, \lambda) in expression (3.5).

One could compute the estimate \hat{N}_C for the QLC model with L = 2 by estimating (\sigma, \beta, v) using the EM algorithm while setting the weights of the two cells in the 2^t \times 2 table that correspond to the unobserved cell in the 2^t table equal to zero. Because of this algorithm's potential to stall (Section 3.2.2), we use the model's equivalence to the logistic-normal model with q = 2, and use FSQP to maximize L_1 with respect to \theta_{LC} = (v_1, \sigma, \beta). We also use FSQP to fit the complete tables generated when searching across n_{0...0} values to obtain a profile likelihood confidence interval. We obtain \hat{N} with FSQP by numerically maximizing L with respect to (N, \theta_{LC}).

3.4 Snowshoe Hare Example

Cormack (1989) reported a capture-recapture study having t = 6 consecutive trapping days for a population of snowshoe hares. Table 3.3, which displays the data, shows that 68 hares were observed. The N-estimates and confidence intervals from the different models are summarized in Table 3.4 for ease of comparison. The logistic-normal model using 10-point quadrature (LN10) yields \hat{\sigma}_C = 0.96 and \hat{N}_C = 92.0 from the conditional (on n) approach, and \hat{\sigma} = 0.91 and \hat{N} = 89.0 from numerical optimization. These point estimates remain unchanged with better approximations of the marginal probabilities (i.e., q > 10); in fact, a profile of both \hat{N}_C and \hat{N} across q \geq 2 reveals that these estimates stabilize for q \geq 5. This is not always true, however, and we examine issues related to the estimation of \sigma in the next section. Table 3.3 shows the model fit conditional on n.

For the logistic-normal model, the 95% profile likelihood interval corresponding to

Nc is (74.8, 148.5). The 95% parametric percentile bootstrap interval corresponding

to N (with B = 1000) is (69.9,129.4). The 95% nonparametric percentile and BCa

bootstrap intervals are (70.8, 231.5) and (71.8, 286.4), respectively. The reason for the

large discrepancies between the bootstrap intervals will be discussed in Section 3.5.

The log-linear model of homogeneous 2-factor interaction gives point estimates almost identical to the logistic-normal model, with \hat{N}_C = 90.5 and \hat{N} = 88.2. All fitted values for the 2^t - 1 observed counts were no further away than 0.04 from the fitted values produced by the logistic-normal model. The 95% profile likelihood

interval is (74.8,125.1), while the 95% parametric percentile interval is (73.6, 114.3).

The nonparametric percentile and BCa intervals are (73.8, 120.2) and (74.8, 127.4),

respectively. Thus, we see much more consistency across the different confidence

intervals for the simpler log-linear model than for the mixed model, the reason for

which will be seen in Section 3.5.

The log-linear model of mutual independence yields Nc = 75.1, profile likelihood

interval (69.9, 83.3), and \hat{N} = 76.3 with 95% parametric percentile interval (71.3, 77.9).

The 95% nonparametric percentile and BCa intervals are (70.9, 80.3) and (71.1, 80.5),
respectively. This model assumes no heterogeneity, which yields narrow intervals that

severely underestimate N when this assumption is false.

Table 3.3. Results of capture-recapture of snowshoe hares

Capture 3, Capture 2, Capture 1
Capture 6 Capture 5 Capture 4 000 001 010 011 100 101 110 111

0 0 0 3 6 0 5 1 0 0
(24.0)* (2.3) (5.4) (0.9) (3.2) (0.5) (1.2) (0.3)
(9.1)** (2.1) (4.8) (1.1) (2.8) (0.6) (1.5) (0.3)

0 0 1 3 2 3 0 0 1 0 0
(4.8) (0.8) (1.8) (0.5) (1.1) (0.3) (0.6) (0.3)
(4.2) (1.0) (2.2) (0.5) (1.3) (0.3) (0.7) (0.2)

0 1 0 4 2 3 1 0 1 0 0
(3.9) (0.6) (1.5) (0.4) (0.9) (0.2) (0.5) (0.2)
(3.5) (0.8) (1.8) (0.4) (1.1) (0.2) (0.6) (0.1)

0 1 1 1 0 0 0 0 0 0 0
(1.3) (0.3) (0.8) (0.3) (0.5) (0.2) (0.4) (0.3)
(1.6) (0.4) (0.8) (0.2) (0.5) (0.1) (0.3) (0.1)

1 0 0 4 1 1 1 2 0 2 0
(6.8) (1.1) (2.6) (0.6) (1.5) (0.4) (0.9) (0.4)
(6.0) (1.3) (3.1) (0.7) (1.9) (0.4) (1.0) (0.2)

1 0 1 4 0 3 0 1 0 2 0
(2.3) (0.6) (1.3) (0.5) (0.8) (0.3) (0.7) (0.4)
(2.8) (0.6) (1.5) (0.3) (0.9) (0.2) (0.5) (0.2)

1 1 0 2 0 1 0 1 0 1 0
(1.9) (0.5) (1.1) (0.4) (0.7) (0.3) (0.6) (0.4)
(2.3) (0.5) (1.2) (0.3) (0.7) (0.2) (0.4) (0.1)

1 1 1 1 1 1 0 0 0 1 2
(1.0) (0.4) (0.9) (0.5) (0.5) (0.3) (0.7) (0.7)
(1.1) (0.2) (0.6) (0.2) (0.3) (0.1) (0.3) (2.0)
* Logistic-normal Model (q=10)
** Quasi-symmetric latent class model

The conditional approach to the two-category quasi-symmetric latent class model

yields Nc = 77.1. Table 3.3 shows the fit. The 95% profile likelihood interval is

(70.8,87.4). The numerical optimization approach yields N = 76.3 and 95% para-

metric percentile interval (72.3, 84.3). The 95% nonparametric percentile and BCa

intervals are slightly wider at (72.6, 100.7) and (72.0, 91.3). These intervals are much

narrower than those produced by the logistic-normal model, since this latent class

model represents a compromise between the naive mutual independence model and

the logistic-normal model.

Table 3.4. N-Estimates and Confidence Intervals Produced by the Logistic-normal,
Quasi-Symmetric Latent Class, Homogeneous Two-factor Interaction, and Mutual-
Independence Models
Profile percentile BCa
Model Nc Likelihood N parametric nonparametric nonparametric
LN10 92.0 (74.8,148.5) 89.0 (69.9,129.4) (70.8,231.5) (71.8,286.4)
H02 90.5 (74.8,125.1) 88.2 (73.6,125.0) (73.8,120.2) (74.8, 127.4)
QLC 77.1 (70.8,87.4) 76.3 (72.3,84.3) (72.6,100.7) (72.0,91.3)
IND 75.1 (69.9,83.3) 76.3 (71.3,77.9) (70.9,80.3) (71.1,80.5)

This example exhibits typical variability among the different point estimates and

confidence intervals obtained from the different models, and also the variability among

the different confidence intervals for a given model. These relationships and the

reasons behind them are explored in the next section.

3.5 Behavior of the Log Likelihood and N Estimator

In the capture-recapture problem, the estimation of \sigma strongly affects the estimation of N. Large population heterogeneity causes high correlations among the t captures and results in a large estimate of n_{0...0}. For the snowshoe hare example, Figure 3.1 shows \hat{N} as a function of an assumed known value for \sigma. Plot 1 in that figure shows results using q = 10, 50, and 75 quadrature points. Since \hat{N} is a rapidly increasing function of \sigma, small changes in \hat{\sigma} can have a large impact on the ML estimate of N. Plot 2 in Figure 3.1 displays a profile of -2 \log L in terms of \sigma, revealing that \hat{\sigma} = 0.9, from which we see \hat{N} = 89.0 in Plot 1. These plots suggest that the choice of q affects the results only for large \sigma. Since the maximum likelihood estimate of \sigma is moderate, we obtain identical results for q = 10, 50, and 75.


-2 Log L

1.0 2.0

1.0 2.0

Figure 3.1. N and -2 Log L as a Function of a for the Snowshoe Hare Data

As heterogeneity increases in model (3.6), the probability of capturing a subject a relatively small or relatively large number of times increases. Capturing a large number of animals only once, for instance, suggests either (1) the logistic-normal model holds and considerable heterogeneity exists within the population, (2) the logistic-normal model holds and the probabilities of capture at each occasion are small, or (3) subjects experience trap avoidance, so that the local independence assumption of the logistic-normal model is inappropriate. Unfortunately, traditional goodness-of-fit tests do not always differentiate between a correct and an incorrect model. We demonstrate this difficulty in Section 3.7. This case of large heterogeneity causes difficulties in estimation when using the logistic-normal model, since a large \hat{\sigma} results in a relatively flat likelihood surface, which implies unstable and imprecise N-estimates. This problem is not serious with the snowshoe hare data. The profile likelihood surface with respect to (N, \sigma), maximized over \beta, in Figure 3.5 shows a reasonably well-defined mode, which leads to a stable point estimate of N.

The log-likelihood surface for the snowshoe hare data, however, is not unimodal, reflecting that many (N, \sigma) pairs exist that have log-likelihood values only slightly lower than the well-defined maximum that defines \hat{N}. This second mode is reflected in the large differences between the parametric percentile bootstrap interval and the more robust (e.g., nonparametric percentile and nonparametric BCa) bootstrap intervals. Thus, the large inconsistencies among these intervals provide some clue that many (N, \sigma) pairs are consistent with the data. Because of this near-nonidentifiability problem, the simpler log-linear model of homogeneous two-factor interaction provides confidence intervals narrower than those generated by the logistic-normal model when \sigma is moderate to large. Simulation results in Chapter 5 support the use of this simpler model in these cases.

In contrast to the stable point estimates for the snowshoe hare data, consider Table 3.5 from Chao and Tsay (1996a,b), which reports the results from an epidemiological study designed to estimate the number of people infected during a 1995 hepatitis A outbreak in northern Taiwan. The 271 observed cases were reported by 3 sources: records based on a serum test taken by the Institute of Preventive Medicine of Taiwan (P), records reported by the National Quarantine Service (Q), and records based on questionnaires conducted by epidemiologists (E).

For the logistic-normal model, a profile of \hat{N} across q reveals that the N-estimates do not stabilize until q \geq 45. Numerical integration yields \hat{\sigma} = 2.6 and \hat{N} = 2204.5 using q = 10, and \hat{\sigma} = 2.9 and \hat{N} = 4087.2 using q = 50.

[Figure: Two Views of the Profile Log-likelihood Surface With Respect to N and \sigma, Maximized Over \beta, for the Snowshoe Hare Data Set]

Figure 3.2 shows that

Table 3.5. Capture-history Counts and Conditional (on n) Fitted Values for Hepatitis Data

P Q E    n_i    LN (q = 10)    LN (q = 50)    LC (L = 2)
0 0 0    --     1953.4         4280.3         --
0 0 1    63     61.0           61.0           61.0
0 1 0    55     58.0           58.0           58.0
0 1 1    18     17.0           17.0           17.0
1 0 0    69     68.0           68.0           68.0
1 0 1    17     20.0           20.0           20.0
1 1 0    21     19.0           19.0           19.0
1 1 1    28     28.0           28.0           28.0

the N-estimates for different q diverge as \sigma increases, so that at \hat{\sigma} = 2.6, \hat{N} differs dramatically for different values of q. As Aitkin (1996) noted, large q is necessary in random-effects models when \sigma is large. We also see that the q = 10 approximation breaks down around \sigma = 3. Better approximations (q > 10) break down at larger values of \sigma.

Figure 3.2. N and -2 Log L as a Function of \sigma for the Hepatitis Data

The large \hat{\sigma} and large negative values for \hat{\beta} here reflect the many subjects with one capture (187) compared with the numbers of subjects with two or three captures (56 and 28). With such a large \hat{\sigma}, the data provide relatively little information about N. Figure 3.2, Plot 2 shows that the hepatitis data yield a flat likelihood with respect to \sigma, so a wide range of \sigma values is consistent with the data. The plausible \sigma values, however, correspond to a wide range of N-estimates, since \hat{N} increases sharply with respect to \sigma (Figure 3.2, Plot 1). For instance, a 95% nonparametric BCa interval for N is (786, 85876).

We present the hepatitis log-likelihood surface for comparison against the snowshoe hare surface of Figure 3.2. Figure 3.4 illustrates the relationship among N, σ, and the confidence on N, by showing the profile log-likelihood surface with respect to N and σ, maximized over β, for the hepatitis data set. Nothing practical can be said about N based on the logistic-normal model except that it is not very small.

The flat log-likelihood can cause wild fluctuations in the point estimate due to small changes in numerical precision or in the data themselves. By contrast, the snowshoe hare data have a well-defined maximum of the log-likelihood at (σ̂ = .9, N̂ = 89.0) in the range 68 < N < 500. In this case, the slope of the surface in the N direction is much steeper for small σ than for large σ, so that small to moderate heterogeneity produces narrower confidence intervals.

Why does the logistic-normal model sometimes provide little information about the population size? The reason is similar to the reason why every n_{0...0} > 0 is plausible for the quasi-symmetry model. A consequence of the nonparametric random-effect motivation for the quasi-symmetry model of Section 3.1.1 is that for each candidate n_{0...0}, there exists a mixing distribution for which the fitted value would be that n_{0...0}. The class of possible mixing distributions is so rich that any n_{0...0} is equally plausible. The logistic-normal model implies a marginal distribution that is a special case of the quasi-symmetry model. This class of mixing distributions is still rich enough that

Figure 3.4. Two Views of the Profile Log-likelihood Surface With Respect to N and σ, Maximized Over β, for the Hepatitis Data Set

many values of σ may be consistent with the data. A wide range of plausible σ values means the candidate N values form a wide interval, amounting to little practical information about N.

Instead of allowing each subject to have a different propensity for capture, the latent class approach requires Z_s to be in one of two latent classes, with unknown probabilities {ν_k}. For the hepatitis example, Figure 3.3 portrays the deviance profile for the QLC model. The latent class model also yields a flat log-likelihood, yielding an arbitrary point estimate and profile likelihood interval (407, ∞).
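As a minimal sketch of the latent class structure (with illustrative parameter values, not fitted values from the hepatitis data), the cell probabilities are finite mixtures of within-class independence models:

```python
# Sketch (illustrative parameters): capture-history probabilities under a
# two-class latent structure -- each subject belongs to class k with
# probability pi_k, and captures are independent within a class.
from itertools import product

def latent_class_probs(pi, p):
    """pi: class weights; p[k][j]: capture prob at occasion j for class k.
    Returns a dict mapping each capture history to its probability."""
    t = len(p[0])
    probs = {}
    for hist in product((0, 1), repeat=t):
        total = 0.0
        for k, w in enumerate(pi):
            term = w
            for j, ij in enumerate(hist):
                term *= p[k][j] if ij == 1 else 1.0 - p[k][j]
            total += term
        probs[hist] = total
    return probs

# e.g. a "susceptible" class and a nearly "immune" class
probs = latent_class_probs(pi=(0.3, 0.7),
                           p=[(0.6, 0.5, 0.55), (0.05, 0.04, 0.06)])
print(probs[(0, 0, 0)])   # probability of the unobservable history
```

With a mostly "immune" class, most of the probability mass falls on the unobservable history (0, 0, 0), which is exactly why the observed counts constrain the missing cell so weakly.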



Figure 3.3. Deviance (G²) Profile for 379 < N < 800 for the Hepatitis Data

The flat log-likelihood incurred by this model for the hepatitis data is a result of the model's close relationship to the log-linear model of quasi-symmetry when t = 3. Lindsay et al. (1991) demonstrated that, when k > (t + 1)/2, the quasi-symmetric latent class fit is very close to the conditional maximum likelihood fit of logistic model (3.1). They gave sufficient conditions for these two fits to be identical, and termed tables for which this is true concordant tables. Chao and Tsay (1996a, 1996b) note that the true population size for the hepatitis data is approximately 545. If we fit the latent class model to the complete data set, we see that the complete table is concordant, since the quasi-symmetric latent class model yields the conditional maximum likelihood fit given by the log-linear model of quasi-symmetry (see Table 3.4). The unobserved cell causes this equivalence to no longer hold for all concordant tables; if it did, the profile likelihood interval would match the quasi-symmetry interval of (n, ∞). This relationship does, however, explain the small amount of information provided by the QLC model when t = 3. The profile likelihood lower bound, 407.2, is only slightly larger than the quasi-symmetry lower bound n = 271. We will see the similarities between the two models for t = 3 in Chapter 5. When t > 3, one can also obtain only a lower bound for N from this model. Chapter 5 will also demonstrate that if a table does provide a two-sided interval for N, such as in the snowshoe hare example, these intervals tend to be narrower than the logistic-normal and homogeneous two-factor interaction intervals. These simulations will show that when the logistic-normal model is the true model, these narrower intervals are optimistic in that their true coverage is less than the nominal level.

Numerical optimization for the latent class model yields N̂ = 476.1. The BCa bootstrap (B = 1000) interval is (408.3, 573.6), even though the log-likelihood is extremely flat. This is a major disadvantage of the bootstrap: it may not recognize the extremely flat log-likelihoods associated with a model. Figure 3.4 shows why this happens. This figure plots -2 Log L as a function of N for the observed


Figure 3.4. -2 Log L Profile for the Observed Hepatitis Data (Solid Line) and Four Resampled Tables (Dashed Lines)

hepatitis data (solid line) and for four tables obtained by resampling from the observed data. We remark that the values on the -2 Log L axis are not the true values for the resampled profiles. Rather, these profiles are overlaid on the plot for the observed data, since we are simply interested in relaying (1) the N-estimates that are produced at the minimum of the four resampled profiles, and (2) the fact that the profiles are extremely flat and never exceed the minimum value + χ²_{1,.05}. We see that the minimum -2 Log L values for the resamples are shifted somewhat away from the observed value, but that the resampled statistics N̂* = (423.6, 443.4, 547.5, 629.7) ignore the flat log-likelihoods associated with the observed and resampled tables. Thus, one can obtain a maximum, N̂, and bootstrap from this estimate to obtain a confidence interval even though almost all values of the missing cell count are consistent with the model. In fact, simulations in Chapter 5 demonstrate that, for t = 3, the narrow percentile intervals from the QLC model provide close to 0% coverage, while the profile likelihood intervals with upper bound ∞ provide coverage close to nominal. Of course, these intervals provide very little practical information on the population size.

3.6 Similarities to Other N-Estimation Problems

Flat log-likelihoods have arisen in other capture-recapture approaches. Burnham and Overton (1978) made a passing reference to the performance, in this respect, of the beta-binomial model, and they presented the jackknife estimators. Chao (1987), however, presented simulation results that showed very poor coverage probabilities for this estimator when t is small to moderate (i.e., t < 10).

Similarities exist between this problem and the related problem of N-estimation when observing k independent and identically distributed binomial counts with unknown N and success probability p; see, for instance, Olkin et al. (1981), Casella (1986), and Aitkin and Stasinopoulos (1989). Aitkin and Stasinopoulos (1989) demonstrated flat likelihoods for certain configurations of data that provide little information about N. All these authors demonstrated that when the log-likelihood is flat, the ML estimator is unstable, with small changes in the data yielding large changes in N̂. For the hepatitis data, we notice that N̂ changes from 3194.3 to 3816.2 to 4599.5 when n_{(1,...,1)} changes from 27 to 28 to 29. Olkin et al. (1981) proposed new estimators that "stabilize" the ML estimate, also by jackknifing, but which often result in the underestimation of N in stable cases (Casella, 1986).
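This instability is easy to reproduce. The following sketch uses made-up data vectors, not those of Olkin et al.; it profiles the binomial log-likelihood (with p replaced by its conditional MLE) over a grid of N values and shows how one perturbed count can send N̂ from a moderate value toward the grid boundary.

```python
# Sketch (illustrative data): ML estimation of N from k iid Binomial(N, p)
# counts, profiling the log-likelihood over a grid of candidate N values.
from math import lgamma, log

def loglik_N(N, xs):
    """Profile log-likelihood of N with p replaced by its MLE sum(xs)/(k*N)."""
    k, s = len(xs), sum(xs)
    p = s / (k * N)
    ll = s * log(p) + (k * N - s) * log(1.0 - p)
    for x in xs:
        ll += lgamma(N + 1) - lgamma(x + 1) - lgamma(N - x + 1)
    return ll

def mle_N(xs, n_max=5000):
    """Grid maximizer; hitting n_max signals an effectively flat or
    unbounded likelihood."""
    return max(range(max(xs), n_max + 1), key=lambda N: loglik_N(N, xs))

stable = [15, 16, 17, 18, 19]          # variance well below the mean
perturbed = [15, 16, 17, 18, 30]       # one count changed
print(mle_N(stable), mle_N(perturbed))
```

For the stable data the profile has a sharp interior maximum; inflating a single count pushes the sample variance toward the mean and N̂ far upward, the behavior the authors above describe.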


Fitting a variety of models to the snowshoe hare data and hepatitis data demonstrates that different models can produce dramatically different point and interval estimates of population size. This is because the problem of estimating the unobserved cell count n_{0...0} is inherently one of extrapolation. Knowing the true value of n_{0...0} = 274 for the hepatitis data allows us to investigate the fits of these models to the complete table. Table 3.6 displays the fits of the logistic-normal (a profile along q indicates that q = 10 is sufficient for the complete table) and the latent class model to the complete table. The logistic-normal model does not fit the complete table well, since G²_complete = 9.3 with df = 3. The latent class model, on the other hand, has G²_complete = 1.0 with df = 2 for the complete table. Indeed, the latent class model has a biologically plausible interpretation for the hepatitis data set. The population can be divided into two groups: one that is susceptible to the virus and one that is relatively immune. We cannot discern the difference in fits from the incomplete table (compare Table 3.5), since, conditional on n, both models produce identical fits. This reflects the fact that usual goodness-of-fit criteria are not appropriate for this extrapolation problem. This also suggests that models that have poorer fits to the complete table could potentially have an adequate fit for the unobserved cell so as to produce a confidence interval that contains the true N. So models with a poor conditional fit cannot be excluded from consideration. We demonstrate using a 2^4 table resulting from the cross-classification of 263 individuals according to whether or not they contracted influenza during 4 influenza outbreaks. These data are considered in detail in Chapter 4. For now, we simply note that the logistic-normal (q = 10) fit to the complete table yields deviance G²_complete = 27.7 based on df = 10, providing strong evidence of lack of fit. However, if we pretend that n_{0000} is unknown and estimate N using the logistic-normal model, we also obtain evidence of lack of fit, with G²_complete = 26.5 based on df = 9, but obtain N-estimate N̂_c = 204.2 and profile likelihood interval (170.9, 388.0). Thus, subject-matter knowledge of the capture-recapture application becomes important in assessing the performance of different model-based estimators.

Table 3.6. Capture-history Counts and Fitted Values for Complete Hepatitis Table

P Q E   n_i   LN (q = 10)   QLC (L = 2)
0 0 0   274      279.1         274.0
0 0 1    63       55.5          61.0
0 1 0    55       52.8          58.0
0 1 1    18       22.5          17.0
1 0 0    69       62.0          68.0
1 0 1    17       26.4          20.0
1 1 0    21       25.1          19.0
1 1 1    28       21.6          28.0

Since the standard techniques for model selection based on goodness-of-fit criteria can give misleading results in this setting, we recommend an exploratory data analysis approach to modelling capture-recapture data, instead of rejection/acceptance of models based on goodness-of-fit criteria. We recommend looking at several models (such as logistic-normal with various q, latent class models, and log-linear models) simultaneously, along with profile log-likelihood plots for the models and variance component and item parameter estimates from the logistic-normal model. Simulations in Chapter 5 will show that when the logistic-normal model fit yields a large estimated variance component and flat log-likelihood surface, the simpler log-linear model of homogeneous two-factor interaction is preferable. The simulations suggest that even if the true model is the logistic-normal model, the simpler log-linear model is competitive with the true model with respect to confidence interval coverage and superior with respect to interval width if the logistic-normal model yields a flat log-likelihood.


In fitting the logistic-normal model of Section 3.1.2, we assume that the t responses are independent for a given subject. This assumption implies that any observed dependencies among the t samples are due solely to population heterogeneity. In other words, these models assume that a positive association among sampling occasions is not caused by the fact that capture on the first occasion increases a subject's probability of capture on the second, but rather that capture on the first occasion indicates an above-average susceptibility to capture and hence an above-average chance of capture on the second occasion.

In this chapter, we investigate models that relax this strong assumption. We first consider models that assume serial, or Markov, dependence among the t captures. These models make sense when the occasions are ordered in time, since the probability of capture at occasion j depends on capture status at occasion j - 1, for j = 2, ..., t. Thus, these models are often appropriate in animal abundance studies in which the sampling occasions are trappings conducted sequentially in time and a subject experiences trap "avoidance" or trap "dependence." These methods are not appropriate for epidemiological data arising from t lists or records of subjects with a certain condition, such as the hepatitis data of Section 3.5, since the sampling occasions are not ordered. In such settings, it is reasonable to incorporate a dependence structure that is symmetric with respect to the t occasions. In Section 4.1, we describe this structure, which provides an alternative motivation for the log-linear model of homogeneous two-factor interaction of Section 3.1.1.

Sections 4.2 and 4.3 investigate two alternative random-effect models that allow for dependence structures that are more general than that implied by the logistic-normal model. Section 4.2 presents a model for capture-recapture that adds a normal random effect to the linear predictor of the log-linear model of mutual independence. This random effect models an observed table's departure from the simple mutual-independence structure. We use an EM algorithm to fit this model conditional on the number of subjects observed, and investigate the use of profile likelihood confidence intervals for this model. Section 4.3 presents a model that is potentially useful when one has reason to believe that capture statuses at two or more occasions are negatively correlated, such as the situation of trap "avoidance". This model, which we term the multivariate logit-normal model, is a binary analogue of Aitchison and Ho's (1989) Poisson log-normal model for multivariate count data. To our knowledge, this binary analogue has not been addressed in the statistical literature.

4.1 Serial Dependence

Serial dependence occurs when the probability of capture at time j depends on the capture status at time j - 1, j = 2, ..., t. Duncan (1985) and Conaway (1990) presented a generalization of subject-specific logistic model (3.1) that relaxes the local independence assumption. As in the local independence case, the CML approach provides no information on population size. We therefore investigate models that remove the requirement of main diagonal saturation, and generalize these serial dependence models to situations in which the "sampling occasions" are not ordered in time.

4.1.1 Sequential Sampling Occasions

Duncan (1985) and Conaway (1989) incorporated serial dependence with the model

p(y_s | a_s) = exp(a_s S_s + Σ_{j=1}^t β_j y_sj + γ h_s) / Σ_{(y_s1,...,y_st)} exp(a_s S_s + Σ_{j=1}^t β_j y_sj + γ h_s),   (4.1)

where S_s is the total number of captures for subject s and h_s is the number of adjacent (in time) pairs of samples that have concordant responses for subject s, and the sum in the denominator is over all 2^t possible histories (y_s1, ..., y_st). The serial dependence parameter γ reflects within-subject dependence among the t occasions. Thus, trap avoidance results in a negative γ value, while trap dependence results in a positive γ value. Analogous to the results for the logistic model of Section 3.1.1, the CML approach leads to the log-linear model,

log(p_{i1...it}) = Σ_{j=1}^t β_j i_j + λ(i_1, ..., i_t) + D(i_1, ..., i_t)γ,   (4.2)

for the 2^t table, where λ(i_1, ..., i_t) is invariant to permutations of its argument and

D(i_1, ..., i_t) = i_1 i_2 + (1 - i_1)(1 - i_2) + i_2 i_3 + (1 - i_2)(1 - i_3) + ... + i_{t-1} i_t + (1 - i_{t-1})(1 - i_t).   (4.3)

Like its local independence analogue, this model does not provide population size estimates, since the λ(0, ..., 0) parameter forces a perfect fit for the 0 = (0, ..., 0) cell. We examine two simpler models that provide population size estimates.

Serial Dependence Assuming No Heterogeneity

The first model, that of serial dependence (SE) assuming no heterogeneity in the population, results from dropping the λ(i_1, ..., i_t) parameters from model (4.2). The model is

log(p_{i1...it}) = Σ_{j=1}^t β_j i_j + D(i_1, ..., i_t)γ.   (4.4)

Table 4.1. Column of the covariate matrix corresponding to ω in model (4.5) for the models of homogeneous 2-factor interaction and serial dependence when t = 3

(i_1, i_2, i_3)   C(Σ_j i_j, 2)   D(i_1, i_2, i_3)
0 0 0                  0                2
0 0 1                  0                1
0 1 0                  0                0
0 1 1                  1                1
1 0 0                  0                1
1 0 1                  1                0
1 1 0                  1                1
1 1 1                  3                2

This model accounts for dependencies among the t responses by the addition of the extra parameter γ. Both this model and the log-linear model of homogeneous two-factor interaction have form

log(μ) = [X_{2^t×(t+1)} : x_{t+1}] (β, ω)′,   (4.5)

where X_{2^t×(t+1)} is the covariate matrix corresponding to the mutual independence model, and ω is γ in the serial dependence model and λ in the homogeneous 2-factor interaction model. The only difference between the two models is the column of the covariate matrix, x_{t+1}, that corresponds to ω. Table 4.1 displays the x_{t+1} associated with γ and with λ for the two models when t = 3.
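For concreteness, the two extra covariate columns of Table 4.1 can be computed directly. This is a small sketch, not code from the dissertation; the notation follows eq. (4.3) and the binomial-coefficient term of the H02 model.

```python
# Sketch: the two extra covariate columns of Table 4.1, computed directly.
from itertools import product
from math import comb

def D(cell):
    """Number of adjacent (in time) concordant pairs, as in eq. (4.3)."""
    return sum(1 for a, b in zip(cell, cell[1:]) if a == b)

def ho2(cell):
    """Homogeneous two-factor interaction column: C(sum of i_j, 2)."""
    return comb(sum(cell), 2)

for cell in product((0, 1), repeat=3):
    print(cell, ho2(cell), D(cell))
```

Running this reproduces the two columns of Table 4.1; note that D assigns its maximum value to both the (0,0,0) and (1,1,1) cells, which is the smoothing behavior discussed next.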

We see that the serial dependence model smooths the fitted value for cell 0 = (0, ..., 0) towards the value of cell 1 = (1, ..., 1) much more than the homogeneous 2-factor interaction model does, since both cell 0 and cell 1 have the largest number, t - 1, of adjacent concordant pairs. This is in contrast to the homogeneous two-factor interaction model, which smooths this unknown count towards those cells corresponding to one capture. Thus, when most subjects are captured on only one sampling occasion, the serial dependence model produces smaller population size estimates than the H02 model, since the number of subjects captured at all t occasions is small.
To illustrate, we fitted the serial dependence model to the snowshoe hare data of Section 3.4. The model yields point estimate N̂_c = 74.6 and corresponding 95% profile likelihood interval (69.5, 83.8). These results are very close to the mutual independence results of N̂_c = 75.1 and (69.9, 83.3) because γ̂ = -.039 is close to zero relative to the variability in D. The serial dependence fit yields G² = 58.2 based on df = 55, while the mutual independence model yields G² = 58.3 on df = 56. Simulations in Chapter 5 will show that the addition of this dependence parameter improves upon the mutual independence model slightly by yielding confidence intervals that are slightly wider than the mutual independence intervals. Chapter 5 shows that, as a result, when the logistic-normal model truly holds, these intervals improve somewhat upon the poor coverage of the mutual independence intervals, but are also overly optimistic when large amounts of heterogeneity exist in the population. This is not surprising, since we obtained the model by dropping the λ(i_1, ..., i_t) heterogeneity parameters from model (4.2).

Mixed Serial Dependence Model

We also investigate the performance of the mixed model that results from assuming a_s = σZ_s in model (4.1), where Z_s ~ i.i.d. N(0, 1). This extends the logistic-normal model of Section 3.1.2 to allow for within-subject dependence. The quadrature approximations to the marginal cell probabilities for this model have the form

p_{i1...it} ≈ Σ_{k=1}^q v_k exp(σz_k S_{i1...it} + Σ_{j=1}^t β_j i_j + γ h_{i1...it}) / [Σ_{i′} exp(σz_k S_{i′} + Σ_{j=1}^t β_j i′_j + γ h_{i′})],   (4.6)

where the inner sum is over all 2^t cells i′, and S_{i1...it} and h_{i1...it} are the raw score and serial dependence value, respectively, associated with cell i = (i_1, ..., i_t). It is worthwhile to investigate under what conditions the addition of the serial dependence parameter improves the coverage of the simpler logistic-normal model and to compare the lengths of the confidence intervals produced by the two models. We investigate these questions through simulation in Chapter 5.

This random-effects approach to model (4.1), like the logistic-normal model, can also yield flat log-likelihood surfaces with respect to N. For instance, for the snowshoe hare data set, this approach yields N̂_c = 337.7 and 95% profile likelihood interval (75.7, 476.4). So, again, the inclusion of a random effect results in a model that may provide little information about the population size.

Serial Dependence + Homogeneous Two-Factor Interaction

Mimicking the developments in Section 3.1.1 that motivated the log-linear model of homogeneous two-factor interaction, we obtain a second serial dependence model, which we denote by HO2SE, by replacing the λ(i_1, ..., i_t) parameters in model (4.2) by the homogeneous two-factor interaction dependence term, C(Σ_j i_j, 2)λ. This yields the model

log(p_{i1...it}) = Σ_{j=1}^t β_j i_j + C(Σ_j i_j, 2)λ + D(i_1, ..., i_t)γ,   (4.7)

where as before D(i_1, ..., i_t) has form (4.3). The log-linear model of homogeneous two-factor interaction is the special case of this model with γ = 0.

To illustrate, we fit this model to the snowshoe hare data. This model yields N-estimate N̂_c = 92.3 and 95% profile likelihood interval (75.5, 130.1). This model provides similar results to the log-linear model of homogeneous two-factor interaction, since γ̂ = -.34. Unfortunately, we cannot interpret this estimate as a reflection of the within-subject dependence between the t responses, since this within-subject dependence and the dependencies resulting from population heterogeneity are confounded (Darroch and McCloud 1990). To see this, note that, under model (4.7), the conditional log odds ratios are equal to λ + 2γ. Including this serial dependence parameter in the model, however, can be worthwhile when estimating N. Simulations in Chapter 5 show that, when both within-subject dependencies and population heterogeneity exist, model (4.7) maintains close to nominal coverage in those cases in which the log-linear model of homogeneous two-factor interaction produces intervals with coverage lower than the nominal level. Chapter 5 will also demonstrate that the logistic-normal model seriously overestimates N when subjects in the population experience trap avoidance. Thus, if one suspects trap avoidance in the form of a negative γ in model (4.1), model (4.7) is an attractive alternative for estimating the population size. We now provide an alternative motivation for the log-linear model of homogeneous 2-factor interaction by generalizing model (4.7) to the situation of unordered sampling occasions.

4.1.2 Unordered Sampling Occasions

The two models in the previous section do not apply to the hepatitis data set because the three "sampling occasions," lists of patients contracting hepatitis A, are not sequential. We now extend the models of the previous section to treat the t occasions symmetrically. We define alternative dependence covariates, D^sym(i_1, ..., i_t), that, instead of considering only adjacent pairs of occasions, treat all occasion pairs equally. Specifically,

D^sym(i_1, ..., i_t) = i_1 i_2 + (1 - i_1)(1 - i_2) + i_1 i_3 + (1 - i_1)(1 - i_3) + ... + i_{t-1} i_t + (1 - i_{t-1})(1 - i_t),

Table 4.2. Column of the Covariate Matrix Corresponding to λ in the Homogeneous Two-Factor Interaction Model and γ in Models (4.8) and (4.9)

(i_1, i_2, i_3)   C(Σ_j i_j, 2)   D^sym(i_1, i_2, i_3)
0 0 0                  0                  3
0 0 1                  0                  1
0 1 0                  0                  1
0 1 1                  1                  1
1 0 0                  0                  1
1 0 1                  1                  1
1 1 0                  1                  1
1 1 1                  3                  3

so that the symmetric versions of models (4.4) and (4.7) are

log(p_{i1...it}) = Σ_{j=1}^t β_j i_j + D^sym(i_1, ..., i_t)γ,   (4.8)

log(p_{i1...it}) = Σ_{j=1}^t β_j i_j + C(Σ_j i_j, 2)λ + D^sym(i_1, ..., i_t)γ,   (4.9)

respectively. Table 4.2 displays the D^sym(i_1, i_2, i_3) values along with the column of the covariate matrix corresponding to λ in the H02 model for t = 3.

We notice, however, that this symmetric dependence term satisfies

D^sym(i_1, i_2, i_3) = 3 - 2i_1 - 2i_2 - 2i_3 + 2 C(Σ_j i_j, 2),

and more generally,

D^sym(i_1, ..., i_t) = C(t, 2) - (t - 1) Σ_{j=1}^t i_j + 2 C(Σ_{j=1}^t i_j, 2).

Thus, given the marginal indicators, i_1, ..., i_t, the symmetric dependence parameter γ is aliased with the λ term in the H02 model. Therefore, both models (4.8) and (4.9) are equivalent to the log-linear model of homogeneous two-factor interaction. So the log-linear model of homogeneous two-factor interaction postulates the dependence structure as a symmetric one when there is no ordering among the t occasions.
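The aliasing identity above is easy to check numerically. The following sketch (not from the dissertation) verifies it over all binary cells for several values of t.

```python
# Sketch verifying the aliasing identity: D_sym is an affine function of the
# marginal indicators plus the H02 term C(sum of i_j, 2), for general t.
from itertools import combinations, product
from math import comb

def D_sym(cell):
    """All-pairs concordance count: concordant (j, k) pairs with j < k."""
    return sum(1 for a, b in combinations(cell, 2) if a == b)

for t in (3, 4, 5):
    for cell in product((0, 1), repeat=t):
        s = sum(cell)
        identity = comb(t, 2) - (t - 1) * s + 2 * comb(s, 2)
        assert D_sym(cell) == identity
print("identity holds for t = 3, 4, 5")
```

Since the identity expresses D^sym through the main-effect indicators and the H02 column alone, the γ column adds no new information to the design matrix, which is the aliasing stated above.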

4.2 An Overdispersed Poisson Log-Linear Model

Chapter 5 shows that when population heterogeneity is present, ignoring this heterogeneity by fitting the log-linear model of mutual independence to the incomplete table produces overly optimistic, narrow confidence intervals that systematically underestimate N. In this section, we pursue a more general Poisson model that accounts for a table's departure from mutual independence by adding a normal random effect with 0 mean and unknown variance, σ², on the log scale. This model differs from the logistic-normal model since it postulates the random deviations from mutual independence as being at the cell level, instead of the subject level.

There are both advantages and disadvantages of this model for capture-recapture. The model is more general than the independence model, and the addition of the random effect produces confidence intervals for N that are wider than the independence interval. These intervals, however, require more computation. A sufficiently large number of quadrature points must be employed when approximating the marginal log-likelihood in order to obtain a continuous profile of the deviance with respect to N that yields a profile likelihood interval.

Section 4.2.1 presents the model, while Section 4.2.2 describes an EM algorithm for fitting the model. Section 4.2.3 demonstrates the advantages and disadvantages of the model for capture-recapture by fitting the model to the snowshoe hare and hepatitis data sets of Chapter 3.

4.2.1 An Overdispersed Log-Linear Model

The log-linear model of mutual independence for a 2^t table models the expected frequencies, {μ_i} = {μ_{i1...it}}, as

log(μ_i) = β_0 + β_1 I(i_1 = 1) + ... + β_t I(i_t = 1).   (4.10)

Instead of adding association parameters to the model, we model an observed table's departure from the mutual independence model as overdispersion. Specifically, the model has form

log(μ_i) = β_0 + β_1 y_{i1} + ... + β_t y_{it} + σZ_i,

where Z_i ~ i.i.d. N(0, 1). Thus, μ_i is assumed to have a log-normal distribution with log-scale mean β_0 + β_1 y_{i1} + ... + β_t y_{it} and variance σ², yielding a Poisson-lognormal(β_0 + β_1 y_{i1} + ... + β_t y_{it}, σ²) model for the cell counts.
This model contains the mutual independence model as a special case with σ = 0. The variability associated with the random effect will result in wider confidence intervals for N, alleviating the problem of extremely narrow confidence intervals caused by the overly simplistic mutual independence assumption. Since any extra variation beyond that specified by the mutual independence model is modelled on the same scale as the linear predictor, this approach is often employed in generalized linear models when there is more variability than predicted by the model due to the omission of covariates. We next consider standard methods to fit this model.
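A quick simulation (with illustrative parameter values, not estimates from the data) shows the overdispersion this model induces relative to a pure Poisson model, for which the variance equals the mean:

```python
# Sketch (illustrative parameters): counts from a Poisson-lognormal model
# are overdispersed relative to Poisson (variance = mean).
import numpy as np

rng = np.random.default_rng(42)
beta0, sigma = 0.0, 1.0
z = rng.standard_normal(100_000)
mu = np.exp(beta0 + sigma * z)        # log-normal cell means
n = rng.poisson(mu)                   # Poisson counts given the random effect

print(f"mean = {n.mean():.3f}, variance = {n.var():.3f}")
# Theory: E[n] = exp(sigma^2/2) ~ 1.65 and Var[n] = E[mu] + Var[mu] ~ 6.3,
# so the variance is several times the mean.
```

The marginal variance decomposes as E[μ] + Var[μ], so any σ > 0 inflates the variance beyond the Poisson value, which is what widens the resulting confidence intervals.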

4.2.2 Estimation

Like the generalized linear mixed models that incorporate a random effect to account for dependence within clusters of observations, such as the logistic-normal model of Section 3.1.2, overdispersed generalized linear mixed models that include a random effect for each observation are also computationally difficult to fit using maximum likelihood. These models also produce marginal log-likelihoods that do not have closed form, again making Gauss-Hermite quadrature necessary when assuming a normal random effect. We use the EM algorithm described by Anderson and Hinde (1988) and Aitkin (1996). The form of the EM algorithm is similar to that of the EM algorithm detailed in Section 3.1.2 for fitting the logistic-normal model to the complete table, although that algorithm is slightly simpler, since there we can reduce the data down to the expected number of successes and trials for a given value, z_k, of the random effect. For the overdispersed Poisson model, we must fit a response vector of length 2^t × q, where again q is the number of quadrature points.
We proceed by following Aitkin (1996). Consider the marginal log-likelihood of (β, σ),

l(β, σ; n) = Σ_i log( ∫ f(n_i | β, σ, z_i) φ(z_i) dz_i ),

where

f(n_i | β, σ, z_i) = e^{-μ_i} μ_i^{n_i} / n_i!

and μ_i is specified by the model,

μ_i = exp(β_0 + β_1 y_{i1} + ... + β_t y_{it} + σz_i).

q-point Gaussian quadrature yields the approximation

l(β, σ; n) ≈ Σ_i log( Σ_{k=1}^q v_k f_{ik} ),

where f_{ik} is the Poisson density for n_i given value z_k for the random effect. Compare with expression (3.13). Proceeding as in Section 3.1.2 and using identity (3.15), a β-element of the score vector has form

∂l/∂β = Σ_i Σ_{k=1}^q [ v_k f_{ik} / Σ_{k′=1}^q v_{k′} f_{ik′} ] ∂ log f_{ik}/∂β = Σ_i Σ_{k=1}^q w_{ik} S_{ik}(β),

where

w_{ik} = v_k f_{ik} / Σ_{k′=1}^q v_{k′} f_{ik′},

and S_{ik}(β) is simply the β-element of the score vector corresponding to the generalized linear model

log(μ_{ik}) = β_0 + β_1 y_{i1} + ... + β_t y_{it} + σz_k.

Similarly,

∂l/∂σ = Σ_i Σ_{k=1}^q w_{ik} S_{ik}(σ).

Setting the score vector equal to zero yields likelihood equations corresponding to the weighted generalized linear model

log(E[n*])_{(2^t q)×1} = V_{(2^t q)×(t+2)} γ_{(t+2)×1},   (4.11)

with prior weights w = (w_{11}, w_{21}, ..., w_{2^t,q})′, where

n*_{(2^t q)×1} = (n′, n′, ..., n′)′,

V_{(2^t q)×(t+2)} =
[ X   z_1 1_{2^t} ]
[ X   z_2 1_{2^t} ]
[ ...         ... ]
[ X   z_q 1_{2^t} ],

X_{2^t×(t+1)} is the design matrix for the mutual independence model (4.10), and γ_{(t+2)×1} = (β′, σ)′.

This representation suggests the iterative procedure that, at iteration (p + 1), computes the prior weights w^{(p+1)} for given parameter estimates γ^{(p)}, and obtains updated estimates γ^{(p+1)} by fitting model (4.11) to the expanded data vector n*. Anderson and Hinde (1988) show that this algorithm takes the form of an EM algorithm that considers the joint distribution of (n, Z) as the complete data and n as the incomplete data. The computation of the prior weights, given parameter estimates γ^{(p)}, corresponds to the E-step that computes the conditional expectation, Q(γ | γ^{(p)}), of the complete log-likelihood given the observed data and the parameter estimates from iteration p. The fitting of the weighted generalized linear model (4.11) maximizes this conditional expectation with respect to γ, constituting the M-step of the EM algorithm. One can take the β̂ obtained from the mutual independence model and σ̂ = 0 as the initial estimates for the algorithm. We terminate the algorithm when the difference between deviances of successive fits is sufficiently small.
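The E-step weight computation can be sketched as follows. This is a simplified stand-in for the fitting described above, not the dissertation's code; the counts and starting values are illustrative, and w[i, k] is the posterior probability of quadrature point z_k given count n_i.

```python
# Sketch of the E-step: posterior weights w_ik over Gauss-Hermite points
# for each cell count, at the current parameter values (illustrative).
import numpy as np

def estep_weights(n, eta, sigma, q):
    """n: observed counts; eta: current linear predictors X @ beta.
    Returns the (cells x q) matrix of normalized weights w_ik."""
    nodes, gh_w = np.polynomial.hermite.hermgauss(q)
    z = np.sqrt(2.0) * nodes
    v = gh_w / np.sqrt(np.pi)                       # N(0,1) quad weights
    mu = np.exp(eta[:, None] + sigma * z[None, :])  # cells x q means
    # Poisson log-density up to the n!-term, which cancels in the ratio
    logf = n[:, None] * np.log(mu) - mu
    num = v * np.exp(logf - logf.max(axis=1, keepdims=True))  # stabilized
    return num / num.sum(axis=1, keepdims=True)

n = np.array([63, 55, 18, 69, 17, 21, 28], dtype=float)
eta = np.log(n)                   # crude starting values for illustration
w = estep_weights(n, eta, 0.5, 10)
print(w.sum(axis=1))              # each row sums to 1 by construction
```

These weights are exactly the prior weights fed to the weighted GLM fit of model (4.11) in the M-step.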

We compute the residual deviance of the overdispersed Poisson fit by comparing the resulting maximum weighted log-likelihood under this model against that for the saturated fit:

G² = -2 Σ_i log( Σ_{k=1}^q e^{-μ̂_{ik}} μ̂_{ik}^{n_i} v_k ) + 2 Σ_i n_i (log(n_i) - 1),

where μ̂_{ik} is the fitted mean for cell i at quadrature point z_k and the n_i! terms, common to both fits, cancel. We use the deviance to construct profile likelihood surfaces with respect to the unknown cell count n_{0...0} for capture-recapture.

4.2.3 Application to Capture-Recapture

We now apply this overdispersed Poisson model to capture-recapture using Sanathanan's conditional (on n) methods of Section 2.2.1. We have noted that this approach yields the estimate n̂_{0...0} that yields minimum G². We first obtain this minimum G² value by fitting the model while assigning count n_{0...0} weight 0, just as we would when fitting a standard log-linear model.
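The conditional profile idea can be sketched in a toy two-source setting, where the independence fit has closed form. This is not the hepatitis analysis; the counts are illustrative and the model is the 2 × 2 independence log-linear model rather than the overdispersed Poisson model.

```python
# Toy version of the conditional (on n) profile approach: profile the
# independence-model deviance over the unobserved count of a 2x2 table.
from math import log

def deviance(a, b, c, x):
    """G^2 of the 2x2 independence fit to the completed table
    [[a, b], [c, x]], where x is the candidate unobserved count."""
    T = a + b + c + x
    fit = [(a + b) * (a + c) / T, (a + b) * (b + x) / T,
           (c + x) * (a + c) / T, (c + x) * (b + x) / T]
    obs = [a, b, c, x]
    return 2.0 * sum(o * log(o / f) for o, f in zip(obs, fit) if o > 0)

a, b, c = 20, 30, 25                 # captured-both, first-only, second-only
profile = {x: deviance(a, b, c, x) for x in range(1, 400)}
gmin = min(profile.values())
plausible = [x for x, g in profile.items() if g <= gmin + 3.84]
n = a + b + c
print("Nhat =", n + min(profile, key=profile.get),
      "interval =", (n + plausible[0], n + plausible[-1]))
```

The minimizing x plays the role of n̂_{0...0}, and the set of x values within 3.84 of the minimum deviance gives the profile likelihood interval for N = n + n_{0...0}, mirroring the search carried out below for the hepatitis data.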

For the hepatitis data using 10-point Gaussian quadrature, we obtain

inf G2(no...o) = 18.64.

Thus, we attempt to construct a profile likelihood interval for N that contains all

values N = n + n_{0...0} = 271 + n_{0...0} for which the residual deviance of the complete-table

fit is less than 18.64 + 3.84 = 22.48. We obtain a profile of these residual deviance

values by searching across 0 ≤ n_{0...0} ≤ 500, with the understanding that we may have
to investigate larger n_{0...0} values if the corresponding deviances are all less than or

extremely close to this cutoff value. Using a convergence criterion of .0001 for the

change in successive deviances, the maximum number of EM iterations performed for

a particular value of N was 48, when N = 338.
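The scan itself is mechanical. A sketch of it follows, with a hypothetical quadratic deviance function standing in for the actual model refit at each fixed n_{0...0}; 3.84 is the 95% critical value of a chi-squared variate with 1 df.

```python
# Sketch of the profile scan: fit_deviance(n00) stands in for refitting the
# model with the unobserved cell count fixed at n00 and returning its G2.
def profile_confidence_set(fit_deviance, n_obs, n00_max=500, crit=3.84):
    devs = {n00: fit_deviance(n00) for n00 in range(n00_max + 1)}
    g2_min = min(devs.values())
    cutoff = g2_min + crit          # chi-squared(1) 95% critical value
    return g2_min, sorted(n_obs + n00
                          for n00, g2 in devs.items() if g2 <= cutoff)

# Toy quadratic deviance profile in place of the actual EM refit
g2_min, accepted = profile_confidence_set(
    lambda n00: 18.64 + ((n00 - 120) / 40.0) ** 2, n_obs=271)
print(g2_min, accepted[0], accepted[-1])
```

With a smooth convex profile the accepted values form an interval; the point of the figures below is that the quadrature-approximated profile need not be so well behaved.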


Figure 4.1. Deviance (G2) as a Function of N for the Overdispersed Poisson Model
(q=10) Applied to the Hepatitis Data

Figure 4.1 shows the resulting deviance profile with a horizontal dotted line de-
noting the acceptable deviance cutoff value of 22.48. We see that when q = 10, unlike

the standard log-linear and random effects models of Chapter 3, this profile is neither

continuous nor convex near its minimum. Instead, we obtain a
confidence set that is not an interval, since N ∈ (297, 530) and N ∈ (710, 743) satisfy
G^2 < 22.48.






Figure 4.2. Deviance (G2) as a Function of N for the Overdispersed Poisson Model
(q=15) Applied to the Hepatitis Data

Figures 4.2 and 4.3 plot the deviance profiles for 15- and 20-point quadrature,
respectively. Comparing these plots to the 10-point case suggests that the irregular
behavior of the G^2 profile when q = 10 is an artifact of the quadrature approximation.
The better approximations produce much smoother profiles of G^2 as a function of N,
yielding interval estimates for N. This suggests that we may require a large number of
quadrature points to obtain a deviance profile that provides the appropriate interval





Figure 4.3. Deviance (G2) as a Function of N for the Overdispersed Poisson Model
(q=20) Applied to the Hepatitis Data

estimate of the population size. When q = 15,

inf G^2(n_{0...0}) = 19.68,

and we obtain the 95% confidence interval (300, 560) for N. We obtain similar results
when q = 20, with

inf G^2(n_{0...0}) = 19.81

and interval (295, 561). Both of these more accurate approximations yield intervals that are
much wider than (351.5, 437.1), the mutual independence interval obtained when

ignoring population heterogeneity. Thus, the introduction of a random effect for

each cell in the 2^t table introduces uncertainty for N beyond that induced by the
mutual independence model. This additional uncertainty produces wider confidence
intervals for N. For the hepatitis data set, this wider interval contains 545, the true
population size. In addition, this mixed model does not incur the extremely flat
likelihood encountered by the logistic-normal model in this example.




Figure 4.4. Deviance (G2) as a Function of N for the Overdispersed Poisson Model
Applied to the Snowshoe Hare Data

We also analyzed the snowshoe hare data with this overdispersed Poisson model for

q = 10. The point estimate, N̂_c = 75.0, and the profile likelihood interval, (70, 83),
for N match the results of the mutual independence model for this case. This is

because the estimate of the random effect standard deviation, σ̂ = .007, is close to

zero. Thus, the maximum likelihood fit of the mixed model is essentially the special

case of mutual independence. Figure 4.4 shows the profile of G^2 for this model. Note

that with such a small estimated dispersion parameter, q = 10 quadrature points is

sufficient, and we obtain a smooth profile of the deviance.

4.3 The Multivariate Logit-normal Model

The use of a subject-specific random effect in the logistic-normal model imposes

a correlation structure among the t responses in which only positive correlation is

possible. In this section, we investigate a more general model that permits negative

correlations to exist between pairs of responses. We first review an analogous model

for multivariate count data, which was proposed by Aitchison and Ho (1989), and

then extend their ideas to the binary setting.

Aitchison and Ho (1989) induced correlations between multivariate Poisson counts

X = (X_1, ..., X_t) by assuming that (1) these counts, given a vector θ = (θ_1, ..., θ_t),

are independent Poisson variates with parameters θ, and (2) that this mean vector

θ has a log-normal(μ, Σ) distribution. That is, log(θ) = (log(θ_1), ..., log(θ_t)) ~

MVN(μ, Σ). Note that the Poisson log-normal distribution discussed in the last

section for allowing overdispersion in the mutual independence model is a special

case of this multivariate model. The authors chose an appropriate transformation,

exp, of a multivariate normal random vector that produces a random variable with

support appropriate for a Poisson mean vector, namely the positive orthant, R_+^t, of

t-dimensional real space. Thus, the mixture distribution satisfies the restriction that

the mean vector has support R_+^t, while retaining the rich covariance structure of the

multivariate normal mixture.

This Poisson-log normal assumption yields closed-form expressions for the

marginal moments of the elements of X since the log-normal distribution has closed-

form expressions for its moments:

E(X_i) = E(E(X_i \mid \theta)) = E(\theta_i) = \exp(\mu_i + (1/2)\sigma_{ii}),

\mathrm{var}(X_i) = E(X_i) + (E(X_i))^2 \left[ \exp(\sigma_{ii}) - 1 \right],

\mathrm{corr}(X_i, X_j) = \frac{\exp(\sigma_{ij}) - 1}{\left\{ \left[ \exp(\sigma_{ii}) - 1 + (E(X_i))^{-1} \right] \left[ \exp(\sigma_{jj}) - 1 + (E(X_j))^{-1} \right] \right\}^{1/2}},

where σ_ij denotes the (i, j) element of Σ. Examination of corr(X_i, X_j) reveals

that the direction of this correlation is determined by the sign of σ_ij from the log-

normal distribution. This is important since one can examine the maximum likelihood

estimate Σ̂ and know whether the count correlations are estimated to be positive or negative.

The authors also note that

|\mathrm{corr}(X_i, X_j)| \le |\mathrm{corr}(\theta_i, \theta_j)|,

so that the magnitude of the count correlation is bounded above by the correlation between the

corresponding log-normal means.
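These closed-form moments are easy to check numerically. The sketch below, with hypothetical parameter values, compares the correlation formula against a Monte Carlo estimate under the Poisson log-normal mixture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values for a bivariate Poisson log-normal model
mu = np.array([1.0, 1.0])
Sigma = np.array([[0.5, -0.3],
                  [-0.3, 0.5]])

# Closed-form moments from the formulas above
EX = np.exp(mu + 0.5 * np.diag(Sigma))
corrX = (np.exp(Sigma[0, 1]) - 1.0) / np.sqrt(
    (np.exp(Sigma[0, 0]) - 1.0 + 1.0 / EX[0])
    * (np.exp(Sigma[1, 1]) - 1.0 + 1.0 / EX[1]))

# Monte Carlo check: theta ~ multivariate log-normal, X | theta ~ Poisson
theta = np.exp(rng.multivariate_normal(mu, Sigma, size=200_000))
X = rng.poisson(theta)
print(corrX, np.corrcoef(X.T)[0, 1])   # both approximately -0.28
```

Note that the negative off-diagonal element of Σ produces a negative count correlation, as the sign argument above predicts.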

Aitchison and Ho (1989) demonstrated this model's utility when negative cor-

relations exist between clustered counts. They analyzed data consisting of counts

from three air samplers at 50 locations. Aitchison and Ho recognized that differ-

ences among the 50 locations will induce correlations between the three counts from

a particular location. The most common way of inducing correlations between the

three sampler readings for a given location is to include a location random-effect in

the model. This approach induces positive correlations between the three samplers

for a given location. The Poisson-log normal analysis, however, provided negative

correlation estimates in the ML estimate Σ̂, indicating that the three samplers are

competing against each other at a particular location.

An analogous approach could be used to account for negative correlations in a
t-variate binomial vector. That is, let Y = (Y_1, ..., Y_t) be a binomial random vector
with numbers of trials (n_1, ..., n_t) and success probabilities π = (π_1, ..., π_t). We
choose the logistic transformation to map a t-variate multivariate normal vector onto
the necessary parameter space [0, 1]^t for π. Thus, we assume that the binary vector
results from the mixing of independent Binomial(n_i, π_i) distributions with mixture
distribution specified by

\pi_i = \frac{\exp(W_i)}{1 + \exp(W_i)}, \quad i = 1, \ldots, t, \qquad (4.12)

where W = (W_1, ..., W_t) ~ N_t(μ, Σ). We denote this multivariate logistic-normal
distribution as π ~ LN_t(μ, Σ), and the resulting multivariate logit-normal mixture
distribution for Y as Y ~ BLN_t(μ, Σ).

Unfortunately, unlike the Poisson log-normal mixture, the multivariate logit-normal

distribution does not have closed-form expressions for its moments, since closed-form
expressions do not exist for the moments of the logistic-normal distribution and

E(Y_i) = E(E(Y_i \mid \pi_i)) = n_i E(\pi_i),

\mathrm{var}(Y_i) = E(\mathrm{var}(Y_i \mid \pi_i)) + \mathrm{var}(E(Y_i \mid \pi_i))

= n_i E(\pi_i(1 - \pi_i)) + n_i^2 \mathrm{var}(\pi_i)

= n_i E(\pi_i)(1 - n_i E(\pi_i)) + n_i(n_i - 1) E(\pi_i^2),

\mathrm{cov}(Y) = \mathrm{cov}(E(Y \mid \pi)) + E(\mathrm{cov}(Y \mid \pi))

= N \,\mathrm{cov}(\pi)\, N + E\left( \mathrm{Diag}[\, n_i \pi_i(1 - \pi_i) \,] \right),

where N = Diag(n_1, ..., n_t).

Taylor series expansions provide approximations for the logistic-normal means,
variances, and covariances when σ_ii is small. Williams (1982) showed that for small

σ_ii, the mean of a logistic-normal random variable can be reasonably approximated by

E(\pi_i) \approx \frac{\exp(\mu_i)}{1 + \exp(\mu_i)}.

Denote this approximate value as π_i^*. Then an approximation for the logistic-normal
variance is

\mathrm{var}(\pi_i) \approx \sigma_{ii} \left[ \pi_i^*(1 - \pi_i^*) \right]^2.

Also, for two logistic-normal random variables (π_i, π_j), we have

\mathrm{cov}(\pi_i, \pi_j) \approx \sigma_{ij}\, \pi_i^*(1 - \pi_i^*)\, \pi_j^*(1 - \pi_j^*).
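The quality of these Taylor approximations can be checked by simulation. The sketch below (hypothetical values of μ_i and σ_ii) shows the variance approximation tracking the Monte Carlo variance for small σ_ii and overshooting it badly for large σ_ii:

```python
import numpy as np

rng = np.random.default_rng(1)

def taylor_moments(mu, s2):
    # Small-variance approximations: pi* and s2 * [pi*(1 - pi*)]^2
    p = np.exp(mu) / (1.0 + np.exp(mu))
    return p, s2 * (p * (1.0 - p)) ** 2

def mc_moments(mu, s2, n=500_000):
    # Monte Carlo moments of pi = logistic(W), W ~ N(mu, s2)
    pi = 1.0 / (1.0 + np.exp(-rng.normal(mu, np.sqrt(s2), size=n)))
    return pi.mean(), pi.var()

for s2 in (0.1, 4.0):   # approximation good for small s2, poor for large s2
    print(s2, taylor_moments(0.0, s2), mc_moments(0.0, s2))
```

For σ_ii = 4 the Taylor variance exceeds the true variance severalfold, consistent with the breakdown beyond σ_ii = .64 noted below.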

Our simulation work shows that these approximations tend to break down when

σ_ii > .64. Thus, we simulated variates from a

N_2\left( \begin{pmatrix} \mu \\ \mu \end{pmatrix}, \begin{pmatrix} \sigma^2 & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 \end{pmatrix} \right)

distribution for a variety of (μ, σ, ρ) values in order to get an idea of the prop-

erties of the multivariate logit-normal distribution for a wide range of parameter

values. We ran 100,000 simulations at all combinations of μ = (-1.0, -0.5, 0.0),

σ = (0.5, 1.0, 1.5, ..., 5.5), and ρ = (-0.9, -0.8, ..., 0.8, 0.9). We only consider

nonpositive μ values since the binary means are symmetric around .5 for positive μ.

Since this produces 627 possible (μ, σ, ρ) combinations, we report only those for

μ = (-1.0, -0.5, 0.0), σ = (0.5, 2.5, 5.5) and ρ = (-.9, -.5, 0, .5, .9). Table 4.3 reports
the correlations between Y_1 and Y_2 for these combinations, while Figure 4.5 plots the
binary correlations (solid line) and logistic-normal correlations (dashed line) for the

full range of ρ values for several (μ, σ) values.
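A simulation of this kind is straightforward to reproduce. The sketch below estimates corr(Y_1, Y_2) for one (μ, σ², ρ) combination, with a single Bernoulli trial per component (n_i = 1):

```python
import numpy as np

rng = np.random.default_rng(2)

def bln2_corr(mu, sigma2, rho, n_sims=100_000):
    """Simulated corr(Y1, Y2) for Y ~ BLN_2 with one trial per component."""
    cov = sigma2 * np.array([[1.0, rho], [rho, 1.0]])
    w = rng.multivariate_normal([mu, mu], cov, size=n_sims)
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-w)))
    return np.corrcoef(y.T)[0, 1]

print(bln2_corr(-1.0, 2.5, 0.9))    # Table 4.3 reports .265 for this case
print(bln2_corr(-1.0, 2.5, -0.9))   # Table 4.3 reports -.241 for this case
```

The binary correlation inherits the sign of ρ but is attenuated relative to it, as the table and figure show.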
These results show that the multivariate logit-normal distribution has properties

that are similar to the Poisson log-normal distribution. We see that the binary cor-

relation has the same sign as the correlation between the bivariate normal random

Table 4.3. Simulated Correlation between Y1 and Y2 when Y = (Y1, Y2) is Distributed
as BLN_2(μ, Σ), when μ = -1.0, -0.5, 0.0 and σ² = 0.5, 2.5, 5.5

                                   ρ
   μ      σ²     -0.9     -0.5      0.0      0.5      0.9
  -1.0    0.5    -.074    -.040     .002     .045     .075
          2.5    -.241    -.139    -.005     .142     .265
          5.5    -.354    -.193    -.001     .206     .388

  -0.5    0.5    -.089    -.046     .003     .042     .085
          2.5    -.268    -.146     .002     .152     .267
          5.5    -.387    -.204    -.003     .210     .397

   0.0    0.5    -.091    -.044     .001     .050     .089
          2.5    -.277    -.148    -.003     .152     .277
          5.5    -.401    -.218    -.001     .208     .400

variables. The range of the possible binary correlation is not as wide as that of the

corresponding logistic-normal distribution.

This correlation restriction means that this model will not yield a good fit when

the data exhibit high binomial correlations. Thus, it is possible to see a poor model

fit to the data even if the model contains as many parameters as there are cells.

For example, consider the model applied to the complete hepatitis data of Table 3.6.

If we estimate an unrestricted mean vector μ and variance-covariance matrix Σ,

we are fitting an eight-cell table with a model containing nine parameters. The

model fit, however, yields G^2 = 10.5. Thus, this model cannot account for the

high binary correlations between the three samples. An analogous situation is the

performance of the logistic-normal model in the matched-pairs setting (i.e. t = 2).

The 2 x 2 table has three unrestricted cell probabilities while the model contains three

parameters: (μ_1, μ_2, σ²). If an observed table exhibits a negative correlation between

the two samples (e.g. odds ratio less than 1.0), the logistic-normal model will not

be able to reproduce the observed table since the random subject effects impose a

positive correlation structure. Consider the example



Figure 4.5. Corr(Y1, Y2) (Solid Line) and Corr(π1, π2) (Dotted Line) as a Function of
ρ, for μ = (-1.0, -0.5, 0.0) and σ = (0.5, 2.5, 5.5)

                     occ. 2
                      1    0
      occ. 1    1     4    5
                0    10    2

The odds ratio is .16, and the likelihood ratio statistic for the logistic-normal model is

G^2 = 3.6. The fit of this model is on the boundary, with σ̂ = 0.0. Likewise, the

BLN_3 fit for the hepatitis data is also on the boundary, with all estimated correlations



in Σ̂ equal to one. Thus, like the Poisson log-normal model, the multivariate logit-

normal model allows negative correlations, but still has limitations with respect to
the patterns of correlations that the model is able to fit.

4.3.1 Estimation

Let g(π | μ, Σ) denote the probability function of the t-dimensional component-
wise logistic transformation of a t-variate N(μ, Σ) distribution, so that

g(\pi \mid \mu, \Sigma) = (2\pi)^{-t/2} |\Sigma|^{-1/2} \left[ \prod_{i=1}^{t} \pi_i(1 - \pi_i) \right]^{-1}

\exp\left\{ -\tfrac{1}{2} (\mathrm{logit}(\pi) - \mu)' \Sigma^{-1} (\mathrm{logit}(\pi) - \mu) \right\},

where logit(π) is taken component-wise.
Note that the logistic-normal vector π differs from the vector modelled with the
additive logistic-normal distribution in order to analyze compositional data. The ad-
ditive logistic-normal distribution (Aitchison and Shen (1980) and Aitchison (1986))
is an appropriate distribution for random vectors on the simplex; that is, π' =
(π'_1, ..., π'_t) with the constraints 0 < π'_j < 1, j = 1, ..., t, and \sum_{j=1}^{t} \pi'_j = 1. This
distribution is induced using the transformation

\pi'_j = \frac{\exp(W_j)}{1 + \exp(W_1) + \cdots + \exp(W_{t-1})}, \quad j = 1, \ldots, t-1,

\pi'_t = \frac{1}{1 + \exp(W_1) + \cdots + \exp(W_{t-1})},

where W ~ N_{t-1}(μ, Σ). Rather than being required to sum to one, the π vector

of means for the multivariate logit-normal model has support [0, 1]^t, so that the
component-wise transformation (4.12) is more appropriate than the additive form for
this application.
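The contrast between the two transformations can be made concrete. The sketch below applies both to the same normal draw; only the additive version lands on the simplex:

```python
import numpy as np

rng = np.random.default_rng(3)
w = rng.normal(size=2)   # a draw of W = (W1, W2)

# Component-wise logistic transform: support [0, 1]^2, no sum constraint
pi_componentwise = 1.0 / (1.0 + np.exp(-w))

# Additive logistic transform: maps W in R^2 onto the 3-part simplex
denom = 1.0 + np.exp(w).sum()
pi_additive = np.append(np.exp(w) / denom, 1.0 / denom)

print(pi_componentwise)                 # two free values in (0, 1)
print(pi_additive, pi_additive.sum())   # three values summing to 1
```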

The BLN_t(μ, Σ) mixture then has density function

p(y \mid \mu, \Sigma) = \int_{[0,1]^t} \prod_{i=1}^{t} f(y_i \mid n_i, \pi_i)\, g(\pi \mid \mu, \Sigma)\, d\pi, \qquad (4.13)

where f(y_i | n_i, π_i) denotes the usual binomial density with n_i trials and probability

of success π_i.

Estimation of the parameters (μ, Σ) is more complex computationally for this

model than for the other mixed models we have examined, because we must use

multi-dimensional Gaussian quadrature to approximate the t-dimensional integral in

density (4.13). This employs the same strategy of writing the density as

\int_{R^t} h(z) \exp\left( -\tfrac{1}{2} z'z \right) dz

and now approximating it by a t-dimensional weighted average of the function h

evaluated at the q^t different quadrature vectors, z_{k_1 \cdots k_t} = (z_{k_1}, \ldots, z_{k_t}), where \{z_{k_m}\},

k_m = 1, \ldots, q, m = 1, \ldots, t, are the univariate Gaussian quadrature nodes of Sec-
tion 3.1.2. The multivariate quadrature weights v_{k_1 \cdots k_t} are products of the univariate

quadrature weights,

v_{k_1 \cdots k_t} = \prod_{m=1}^{t} v_{k_m}.
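A product rule of this kind is simple to assemble from a univariate rule. The sketch below builds the q^t grid with numpy's Gauss-Hermite nodes; the sqrt(2) rescaling and 1/sqrt(π) weight normalization are conventions chosen here so that each univariate rule integrates against a standard normal kernel:

```python
import itertools
import numpy as np

def product_gauss_hermite(q, t):
    # numpy's hermgauss targets exp(-x^2); rescale nodes by sqrt(2) for the
    # exp(-z'z/2) kernel and normalize weights to sum to one per dimension
    x, w = np.polynomial.hermite.hermgauss(q)
    nodes_1d, weights_1d = np.sqrt(2.0) * x, w / np.sqrt(np.pi)
    nodes = np.array(list(itertools.product(nodes_1d, repeat=t)))
    weights = np.array([np.prod(c) for c in
                        itertools.product(weights_1d, repeat=t)])
    return nodes, weights

nodes, weights = product_gauss_hermite(10, 3)
print(nodes.shape, weights.sum())   # (1000, 3), weights sum to ~1
```

The q^t growth of the grid is exactly what drives the fitting times reported at the end of this section.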

The transformation

\mathrm{logit}(\pi) - \mu = Qz,

where Q (t × t) is the unique lower triangular matrix with positive diagonal elements such

that Σ = QQ' (Σ positive-definite), yields the necessary form

p(y \mid \mu, Q) = (2\pi)^{-t/2} \int_{R^t} h(y, \mu, Q, z) \exp\left\{ -\tfrac{1}{2} z'z \right\} dz,

where

h(y, \mu, Q, z) = \exp\left\{ y'(\mu + Qz) - \sum_{i=1}^{t} n_i \log\left[ 1 + \exp(\mu_i + (Qz)_i) \right] + \sum_{i=1}^{t} \log \binom{n_i}{y_i} \right\}.

ML estimates μ̂ and Σ̂ = Q̂Q̂' are obtained by maximizing the approximation

p(y \mid \mu, Q) \approx (2\pi)^{-t/2} \sum_{\mathbf{k}} h(y, \mu, Q, z_{\mathbf{k}}) \exp\left\{ -\tfrac{1}{2} z_{\mathbf{k}}' z_{\mathbf{k}} \right\} v_{\mathbf{k}}, \qquad (4.14)

where \mathbf{k} = (k_1, \ldots, k_t).
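Putting the Cholesky substitution and the product rule together, a direct (if naive) quadrature evaluation of the BLN_t density might look like the sketch below; note that its normalization convention folds the (2π)^{-t/2} factor into the weights, a bookkeeping choice that differs from (4.14):

```python
import itertools
from math import comb
import numpy as np

def bln_density(y, n, mu, Sigma, q=20):
    """Product Gauss-Hermite approximation to the BLN_t density at y,
    via the substitution logit(pi) = mu + Q z with Sigma = Q Q'."""
    t = len(y)
    Q = np.linalg.cholesky(Sigma)
    x, w = np.polynomial.hermite.hermgauss(q)
    nodes, wts = np.sqrt(2.0) * x, w / np.sqrt(np.pi)
    total = 0.0
    for ks in itertools.product(range(q), repeat=t):
        pi = 1.0 / (1.0 + np.exp(-(mu + Q @ nodes[list(ks)])))
        f = np.prod([comb(n[i], y[i]) * pi[i] ** y[i]
                     * (1.0 - pi[i]) ** (n[i] - y[i]) for i in range(t)])
        total += f * np.prod(wts[list(ks)])
    return total

# Sanity check: as Sigma -> 0 the mixture collapses to independent binomials
p = bln_density([1, 0], [1, 1], np.array([0.0, 0.0]), 1e-10 * np.eye(2))
print(p)   # approximately 0.5 * 0.5 = 0.25
```

An optimizer such as FSQP then varies (μ, Q) to maximize the product of these per-observation densities.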
We use FSQP to maximize (4.14) with respect to (μ, Q). We were concerned
about the performance of this maximization algorithm when handling such a complex
function, so we programmed Aitchison and Ho's (1989) analogous Poisson log-normal
density in FSQP and computed the ML estimates for the four models fit to the sampler

data discussed in the previous section. The results for all four models matched those

reported in Aitchison and Ho's (1989) Table 5, demonstrating that FSQP successfully

maximized the Poisson analogue of (4.14).

The generality of the multivariate logit-normal model allows us to test various
hypotheses concerning the t responses. For instance, if we fit the model with the con-
straint that the off-diagonal elements in Σ are zero, this maximum log-likelihood can

be compared to the log-likelihood with unrestricted σ_ij to test mutual independence
between the t responses. Alternatively, we could test the equivalence of the t means
or that no differences exist between the t responses:

\mu_j = \mu, \quad \sigma_{jj} = \sigma^2, \quad \sigma_{jk} = \rho\sigma^2.

Aitchison and Ho (1989) term this hypothesis the isotropic hypothesis. The logistic-

normal model of Section 3.1.2 is the special case

\sigma_{jj} = \sigma^2, \quad \sigma_{jk} = \rho\sigma^2

with ρ = 1 (Agresti 1997).

The multivariate logit-normal model becomes computationally difficult to fit when
t is moderate to large. Using FSQP, maximizing a 10-point quadrature approximation
of the log-likelihood when t = 3 takes approximately 1-2 hours on a SUN SPARCstation 20 with 48 MB of RAM. Increasing the dimension to four substantially increases

Table 4.4. Observed Frequencies and Fitted Counts of Infection Profiles for Influenza

                                          Fitted Values
(i1,...,i4)     n_{i1...i4}   LN (q = 10)     H02     BLN (q = 10)   Cnst. BLN4
0 0 0 0            140          138.4        138.2       138.7          137.8
0 0 0 1             31           20.8         20.8        31.0           31.3
0 0 1 0             16           19.5         19.6        17.6           18.0
0 0 1 1              3            4.1          4.1         2.8            2.6
0 1 0 0             17           22.1         22.2        17.8           18.6
0 1 0 1              2            4.6          4.6         2.6            2.4
0 1 1 0              5            4.3          4.3         4.1            3.8
0 1 1 1              1            1.3          1.2         0.5            0.4
1 0 0 0             20           26.2         26.3        22.0           21.8
1 0 0 1              2            5.5          5.5         1.7            1.8
1 0 1 0              9            5.1          5.1         6.9            7.1
1 0 1 1              0            1.5          1.5         0.4            0.4
1 1 0 0             12            5.8          5.8        10.6           10.8
1 1 0 1              1            1.7          1.7         0.6            0.5
1 1 1 0              4            1.6          1.6         5.4            5.4
1 1 1 1              0            0.6          0.6         0.2            0.1
Deviance (df)                 27.7 (10)    27.7 (10)    3.9 (1)        4.3 (5)

fitting time to approximately one day, since we have increased the size of the grid

of quadrature points from 10^3 to 10^4. A four-dimensional constrained fit (e.g. equal

means or equal correlations) takes approximately 1.5 to 2 days.

4.3.2 Influenza Example

We now use the multivariate logit-normal model to analyze a data set first consid-

ered by Haber (1986) and later reanalyzed by Darroch and McCloud (1990). The data

are frequencies of infection profiles of a sample of 263 individuals for four influenza

outbreaks occurring in the winters 1977/1978 to 1980/1981 in Tecumseh, Michigan.

The data are reported in Table 4.4.

The first and fourth outbreaks are known to have been caused by the same virus

type, while the viruses in the second and third outbreaks were of different types. Be-

cause the first and fourth outbreaks were caused by the same virus type, a subject's
