
Citation 
 Permanent Link:
 http://ufdc.ufl.edu/AA00003698/00001
Material Information
 Title:
 On model fitting for multivariate polytomous response data
 Creator:
 Lang, Joseph B., 1963
 Publication Date:
 1992
 Language:
 English
 Physical Description:
 vii, 200 leaves : ; 29 cm.
Subjects
 Subjects / Keywords:
 Algorithms ( jstor )
 Degrees of freedom ( jstor )
 Estimators ( jstor )
 Mathematical vectors ( jstor )
 Matrices ( jstor )
 Maximum likelihood estimations ( jstor )
 Parametric models ( jstor )
 Statistical discrepancies ( jstor )
 Statistical models ( jstor )
 Statistics ( jstor )
 City of St. Cloud ( local )
 Genre:
 bibliography ( marcgt )
 theses ( marcgt )
 nonfiction ( marcgt )
Notes
 Thesis:
 Thesis (Ph. D.)--University of Florida, 1992.
 Bibliography:
 Includes bibliographical references (leaves 193-199).
 General Note:
 Typescript.
 General Note:
 Vita.
 Statement of Responsibility:
 by Joseph B. Lang.
Record Information
 Source Institution:
 University of Florida
 Holding Location:
 University of Florida
 Rights Management:
 Copyright [name of dissertation author]. Permission granted to the University of Florida to digitize, archive and distribute this item for nonprofit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
 Resource Identifier:
 027817400 ( ALEPH )
 AJG6082 ( NOTIS )
 26576215 ( OCLC )

ON MODEL FITTING FOR MULTIVARIATE POLYTOMOUS
RESPONSE DATA
By
JOSEPH B. LANG
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1992
ACKNOWLEDGMENTS
I would like to express my appreciation to Dr. Alan Agresti for serving
as my dissertation advisor. For the many comments, ideas, and lessons he
has shared with me, I am greatly indebted. Through his advisement and
guidance, he has taught me to appreciate and respect good statistical research
and teaching. He is a mentor worthy of emulation. I also want to express
my gratitude to Dr. Jane Pendergast, who also served on my dissertation
committee. I learned a great deal from her during the two years that I worked
in the Biostatistics Department. To all of the faculty at the University of
Florida, I extend my thanks. The statistics department, with its scholarly
and friendly atmosphere, proved to be a wonderful place to learn.
The influences of persons from my past are not forgotten. Without
Patrick Kearin's stimulating teaching of high school math, I may never have
become interested in this subject. The genuine excitement delivered by Dr.
James Kepner, in his teaching of undergraduate statistics, was the reason I
decided to pursue an advanced degree in statistics.
I would like to thank my parents and the rest of my family for all of the
support and encouragement they have given over the course of my studies
and research. My friends and student colleagues deserve many thanks as
well. Finally, I would like to thank Kendra Paar for always being there to
support and encourage me while I was writing this paper.
TABLE OF CONTENTS
ACKNOWLEDGMENTS
LIST OF TABLES
ABSTRACT
CHAPTERS
1 INTRODUCTION
1.1 A Brief Introduction to the Problem
1.2 Outline of Existing Methodologies: No Missing Data
1.3 Outline of Existing Methodologies: Missing Data
1.4 Format of Dissertation
2 RESTRICTED MAXIMUM LIKELIHOOD FOR A GENERAL CLASS OF MODELS FOR POLYTOMOUS RESPONSE DATA
2.1 Introduction
2.2 Parametric Modeling: An Overview
2.2.1 Model Specification
2.2.2 Measuring Model Goodness of Fit
2.3 Multivariate Polytomous Response Model Fitting
2.3.1 A General Multinomial Response Model
2.3.2 Maximum Likelihood Estimation
2.3.3 Asymptotic Distribution of Product-Multinomial ML Estimator
2.3.4 Lagrange's Method: The Algorithm
2.4 Comparison of Product-Multinomial and Product-Poisson Estimators
2.5 Miscellaneous Results
2.6 Discussion
3 SIMULTANEOUSLY MODELING THE JOINT AND MARGINAL DISTRIBUTIONS OF MULTIVARIATE POLYTOMOUS RESPONSE VECTORS
3.1 Introduction
3.2 Product-Multinomial Sampling Model
3.3 Joint and Marginal Models
3.4 Numerical Examples
3.5 Product-Multinomial Versus Product-Poisson Estimators: An Example
3.6 Well-Defined Models and the Computation of Residual Degrees of Freedom
3.7 Discussion
4 LOGLINEAR MODEL FITTING WITH INCOMPLETE DATA
4.1 Introduction
4.2 Review of the EM Algorithm
4.2.1 General Results
4.2.2 Exponential Family Results
4.3 Loglinear Model Fitting with Incomplete Data
4.3.1 The EM Algorithm for Poisson Loglinear Models
4.3.2 Obtaining the Observed Information Matrix
4.3.3 Inferences for Multinomial Loglinear Models
4.4 Latent Class Model Fitting: An Application
4.5 Modified EM/Newton-Raphson Algorithm
4.6 Discussion
APPENDICES
A CALCULATIONS FOR CHAPTER 2
B CALCULATIONS FOR CHAPTER 4
BIBLIOGRAPHY
BIOGRAPHICAL SKETCH
LIST OF TABLES
2.1 Opinion Poll Data Configuration
3.1 Interest in Political Campaigns
3.2 Cross-Over Data
3.3 Joint Distribution Models: Goodness of Fit
3.4 Marginal Distribution Models: Goodness of Fit
3.5 Candidate Models in J(L x L + D) ∩ M(U): Goodness of Fit
3.6 Estimates of Freedom Parameters for Model J(L x L + D) ∩ M(CU)
3.7 Freedom Parameter Estimates and Standard Errors
3.8 Estimated Cell Means and Standard Errors
3.9 Cross-Over Data Models: Goodness of Fit
3.10 Freedom Parameter ML Estimates for Model J(UA) ∩ M(U)
3.11 Children's Respiratory Illness Data
3.12 Product-Multinomial versus Product-Poisson Freedom Parameter Estimation
4.1 Observed cross-classification of 216 respondents with respect to whether they tend toward universalistic (1) or particularistic (2) values in four situations (A,B,C,D) of role conflict
4.2 Parameter and Standard Error Estimates
4.3 Classification Probability Estimates
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
ON MODEL FITTING FOR MULTIVARIATE POLYTOMOUS
RESPONSE DATA
By
Joseph B. Lang
May, 1992
Chairman: Dr. Alan Agresti
Major Department: Statistics
A broad class of models that imply structure on both the joint
and marginal distributions of multivariate categorical (ordinal or nominal)
responses is introduced. These parsimonious models can be used to
simultaneously describe the marginal distributions of the responses and the
association structure among the responses. As a special case, this class
of models includes classical log- and logit-linear models. In this sense,
we address model fitting for multivariate polytomous response data from
a very general perspective. Simultaneous models for joint and marginal
distributions are useful in a variety of applications, including longitudinal
studies and studies dealing with social mobility and interrater agreement.
We outline a maximum likelihood fitting algorithm that can be used for
fitting a large class of models that includes the class of simultaneous models.
The algorithm uses Lagrange's method of undetermined multipliers and a
modified Newton-Raphson iterative scheme. We also discuss goodness-of-fit
tests and model-based inferences. Inferences for certain model parameters
are shown to be equivalent for product-Poisson and product-multinomial
sampling assumptions. This useful equivalence result generalizes existing
results. The models and fitting method are illustrated for several applications.
Missing data are often a problem for multivariate response data. We
consider inferences about loglinear models for which only certain disjoint
sums of the data are observable. We derive an explicit formula for the
observed information matrix associated with the loglinear parameters that
is intuitively appealing and simple to evaluate. The observed information
matrix can be evaluated at the maximum likelihood estimates and inverted
to obtain an estimate of the precision of the loglinear parameter estimates.
The EM algorithm can be used to fit these incomplete data loglinear models.
We describe this algorithm in some detail, paying special attention to the
Poisson loglinear model fitting case. Alternative fitting algorithms are also
outlined. One proposed alternative uses both the EM and Newton-Raphson
algorithms, thereby resulting in a faster, more stable algorithm. We illustrate
the utility of these results using latent class model fitting.
CHAPTER 1
INTRODUCTION
1.1 A Brief Introduction to the Problem
There are many situations when multiple responses are observed for each
'subject' in a group, or several groups. Here 'subject' is generically used to
refer to a randomly chosen object that generates responses. The multiple
responses could represent repeated measurements taken on subjects over time
or occasions. They could be the ratings assigned by several judges who all
viewed and rated the same set of slides (here, the 'subjects' are the slides).
Or, perhaps, it may be that several distinct or noncommensurate responses
are recorded for each subject. These responses are often categorical (ordinal
or nominal) and inevitably interrelated. This dissertation addresses issues
related to modeling and model fitting for multivariate categorical (ordinal or
nominal) responses.
Models for multivariate categorical response data are usually developed
to answer questions about (i) the association structure among the multiple
responses or (ii) the behavior of the marginal distributions of the response
variables. Specifically, a typical question of the first type is, "How are the
responses interrelated and is this interrelationship the same across the levels
of the covariates?" A typical type ii question is, "How do the (marginal)
responses depend on the covariates or occasions?" Historically, many models
(e.g., log- and logit-linear models) have been developed for the primary
purpose of answering the type i questions. Many of these models can easily
be fitted using maximum likelihood (ML) methods. These models typically,
however, are not useful for answering the type ii questions (Cox, 1972).
Marginal models (those models used to answer type ii questions) are not
as well developed. One reason for this is that ML fitting of these marginal
models is more difficult. At present, the method of weighted least squares
(WLS) is used almost exclusively for fitting these models.
Suppose that we are interested in answering questions of both types
i and ii. Usually the questions are addressed using two different models, a
joint distribution model and a marginal model, and fitting them separately. It
seems reasonable to want a model that can be used to address simultaneously
both questions. That is, we would like a model that simultaneously implies
structure on both the joint and marginal distribution parameters. To date,
there has been very little work done on the development and fitting of these
simultaneous models.
Whenever multiple responses are observed it is inevitable that there will
be missing data. There are several ways to fit the Poisson loglinear model with
incomplete data. One popular method is to use the EM algorithm to find the
ML estimates of the loglinear parameters. One drawback to this algorithm
is that a precision estimate of the ML estimators is not produced as a
by-product. Several numerical techniques have been developed to approximate
the observed information matrix, which, upon inversion, will act as the
precision estimate. However, it would be of some convenience to derive an
explicit formula for the observed information matrix, at least in some special
cases.
1.2 Outline of Existing Methodologies: No Missing Data
We begin our discussion by considering the case of no missing data.
There are many methods for analyzing multivariate categorical (ordinal or
nominal) response data. These methods usually involve fitting (separately)
models for the joint or the marginal distributions of the response vectors.
In rare instances, simultaneous models for both the joint and marginal
distributions are considered. Maximum likelihood fitting methods for the
joint distribution models are simple and described in almost every standard
text on categorical data analysis. The fitting of marginal models using
ML methods is more difficult. Maximum likelihood fitting of the marginal
homogeneity model was considered by Madansky (1963) and Lipsitz (1988).
The fitting of a more general class of marginal models was considered
by Haber (1985a). Finally, the fitting of simultaneous models using ML
methods has only been addressed in the bivariate response case. The fitting
technique becomes very complicated when there are more than two categorical
responses. To appreciate the complexity of extending the technique to
multivariate response data, see section 6.5 of McCullagh and Nelder (1989)
or perhaps Dale (1986). In contrast, the ML fitting method of Chapter 2 can
easily be used to fit many marginal and simultaneous models. In the next few
paragraphs, we briefly describe the existing methods for modeling and model
fitting for multivariate categorical response data.
Modeling Joint Distributions Separately. One common method for analyzing
multivariate categorical responses is to model the joint distribution only.
These models, which include classical log- and logit-linear models for the
joint probabilities, are useful for describing the association structure among
the responses. The last 30 years have seen the development of these methods
for analyzing multivariate categorical responses (Haberman, 1979; Bishop et
al., 1975; Agresti, 1984, 1990). For specificity, consider the following panel
study: One hundred randomly selected subjects were asked how interested
they were in the political campaigns. They were to respond on the 3-point
ordinal scale, (1) Not Much, (2) Somewhat, and (3) Very Much. Then four
years later the same group of subjects was asked to respond on the same
scale to the same question. A separate investigation into the association
structure would enable us to answer questions of a conditional nature. For
example, we could estimate the probability of responding 'Very Much' on the
second occasion given that the response at the first occasion was 'Not Much'.
The description of these 'transitional' probabilities, although very interesting,
may not be completely satisfactory. We may also be interested in addressing
questions with regard to the marginal distributions. Perhaps we would like
to answer the question, "Are the distributions of responses to the political
interest question the same for each occasion?" Laird (1991), in a nice review of
likelihood-based methods for longitudinal analysis, mentions that the utility
of classical log- and logit-linear models is restricted to two situations: (1)
modeling the dependence of a univariate response on a set of covariates and
(2) modeling the association structure between a set of multivariate responses.
These models place structure on the joint probabilities and so they are not
directly useful for studying the dependence of the marginal probabilities on
occasion and other covariates. This problem was pointed out by several
authors (Cox, 1972; Prentice, 1988; McCullagh and Nelder, 1989;
Liang et al., 1991). An advantage of these models is that they are simple to fit
using either WLS (Grizzle et al., 1969), ML (McCullagh and Nelder, 1989),
or iterative proportional fitting (Bishop et al., 1975) methods. There are
many standard statistical programs available for fitting these models (SAS,
SPSS, BMDP, GLIM, GENSTAT).
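To make the distinction between conditional ('transitional') and marginal questions in the panel-study example concrete, the following is a minimal sketch in Python. The 3 x 3 table of counts is invented purely for illustration (the dissertation's own data appear later, in Table 3.1), and the function names are hypothetical.

```python
# Hypothetical 3x3 table of counts for 100 subjects.
# Rows = response at the first occasion, columns = response four years later;
# categories are (1) Not Much, (2) Somewhat, (3) Very Much.
# These counts are invented for illustration only.
counts = [
    [20, 10, 5],
    [8, 25, 7],
    [2, 8, 15],
]


def transition_probabilities(table):
    """Sample estimates of P(second = j | first = i): row-wise proportions."""
    return [[cell / sum(row) for cell in row] for row in table]


def marginal_distributions(table):
    """Sample marginal proportions for the first and second occasions."""
    n = sum(sum(row) for row in table)
    first = [sum(row) / n for row in table]
    second = [sum(row[j] for row in table) / n for j in range(len(table[0]))]
    return first, second


trans = transition_probabilities(counts)
first, second = marginal_distributions(counts)

# A conditional (type i) quantity: P(Very Much at time 2 | Not Much at time 1).
print(trans[0][2])
# Marginal (type ii) quantities: occasion-specific response distributions.
print(first, second)
```

The transition proportions address questions of a conditional nature, whereas comparing `first` with `second` addresses the marginal question of whether the response distribution is the same at each occasion.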
Modeling Marginal Distributions Separately. A second approach to analyzing
multivariate categorical responses is to model only the marginal
distributions and to ignore the joint distribution structure. Full likelihood
methods that consider only models for the marginal probabilities tacitly
assume a saturated model for the joint distribution. Therefore, the models
may be far from parsimonious. In the non-Gaussian response setting, there
is a distinction between these marginal models and the transitional (or
conditional) models of the previous paragraph. Marginal models describe the
occasion-specific distributions and the dependence of those distributions on
the covariates. Transitional or conditional models describe the distribution
of individual changes over occasions. Models for these transitions can be
represented as probability distributions for the future state 'given' the past
states. Questions regarding transition probabilities can only be investigated
with longitudinal data. On the other hand, questions regarding the marginal
probabilities could theoretically be answered using cross-sectional data,
provided the cohort (subject) effects were negligible. Panel studies, which
yield longitudinal data, provide more powerful tests for the significance of
within-cluster factors, such as an occasion effect. This follows because there is a reduced
cohort effect; we are using the same panel of subjects at each occasion. For
further discussion about the distinction between marginal and transitional
models, see Ware et al. (1988), Laird (1991), and Zeger (1988).
We will briefly discuss existing methods for making inferences about
the marginal probabilities separately. We will group these methods into five
categories: (1) non-model-based methods, (2) WLS methods, (3) ML methods,
(4) semiparametric methods, and (5) other methods.
Non-model-based methods can be used to derive test statistics used for
testing specific hypotheses regarding the marginal distributions. Examples
include the Cochran-Mantel-Haenszel (1950, 1959) statistic, which can be used
for testing the hypothesis of marginal homogeneity (MH) (cf. White et al.,
1982), McNemar's (1947) statistic, which can be used for testing the equality of
two dependent proportions, and Madansky's (1963) likelihood-ratio statistic
for MH. Madansky's statistic is the difference between the fit of the model of
marginal homogeneity and the fit of the unstructured (saturated) model (see also Lipsitz,
1988 and Lipsitz et al., 1990). Many other relevant test statistics, some of
which are generalizations or modifications of the aforementioned (cf. Mantel,
1963; White et al., 1982), exist. Cochran's (1950) Q statistic and Darroch's
(1981) Wald-type statistic are examples of other test statistics that can be
used to test for marginal homogeneity.
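As a small illustration of one of the non-model-based statistics named above, the sketch below computes McNemar's statistic for a 2 x 2 table of paired dichotomous responses. It uses the standard form of the statistic, which depends only on the two discordant counts; the function name and the example counts are hypothetical.

```python
def mcnemar_statistic(n12, n21):
    """McNemar's chi-squared statistic (1 df) for two dependent proportions.

    n12 and n21 are the discordant (off-diagonal) counts of the 2x2 table of
    paired responses; the concordant counts do not enter the statistic.
    """
    if n12 + n21 == 0:
        raise ValueError("no discordant pairs; the statistic is undefined")
    return (n12 - n21) ** 2 / (n12 + n21)


# Hypothetical discordant counts, invented for illustration.
stat = mcnemar_statistic(15, 5)
print(stat)
```

Under marginal homogeneity the statistic is referred to a chi-squared distribution with one degree of freedom.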
Presently, if one were to fit a marginal model, say a generalized loglinear
model of the form $C \log A\mu = X\beta$, where $\mu$ is the vector of expected counts
in the full contingency table, he or she would most likely use the WLS fitting
algorithm. Most statistical software that fits these generalized loglinear
models does so using WLS. There are some advantages to using WLS. It
is computationally simple. Second-order marginal information is all that is
needed. And, the estimates are asymptotically equivalent to ML estimates.
Some disadvantages are that covariates must be categorical, sampling zeroes
create problems, and estimates are sensitive when second-order marginal
counts are small. The WLS method for analyzing categorical data was
originally outlined by Grizzle, Starmer and Koch (1969). Subsequently,
marginal models for longitudinal categorical data, or more generally
multivariate categorical response data, have been introduced and fitted using the
WLS method (Koch et al., 1977; Landis and Koch, 1979; Landis et al., 1988;
Agresti, 1989).
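To show how a generalized loglinear model of the form C log(A mu) = X beta expresses structure on the margins, the following sketch works through the simplest case, a 2 x 2 table of expected counts. The particular matrices A and C, the cell ordering, and the numerical values of mu are assumptions chosen for illustration; they are not taken from the dissertation.

```python
import math


def matvec(matrix, vector):
    """Plain matrix-vector product for lists of lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Cell ordering (an assumption): mu = (mu11, mu12, mu21, mu22).
# A sums the cells to the marginal expected counts (mu1+, mu2+, mu+1, mu+2).
A = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
# C forms the two marginal logits: log(mu1+/mu2+) and log(mu+1/mu+2).
C = [
    [1, -1, 0, 0],
    [0, 0, 1, -1],
]


def marginal_logits(mu):
    """Evaluate C log(A mu) for a 2x2 table of expected counts."""
    margins = matvec(A, mu)
    return matvec(C, [math.log(m) for m in margins])


# Hypothetical expected counts, invented for illustration.
mu = [40.0, 10.0, 20.0, 30.0]
logits = marginal_logits(mu)
print(logits)
```

With X taken to be a single column of ones, the model C log(A mu) = X beta states that the two marginal logits share a common value beta, which is marginal homogeneity for a 2 x 2 table. The hypothetical mu above does not satisfy that model, since its two logits differ.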
Maximum likelihood fitting of marginal models is more difficult since
the model utilizes marginal probabilities, rather than joint probabilities to
which the likelihood refers. When the responses are correlated, as they
invariably are, the marginal counts do not follow a product-multinomial
distribution. The full-table likelihood must be maximized subject to the
constraint that the marginal probabilities satisfy the model. Haber (1985a)
considers fitting generalized loglinear models of the form $C \log A\mu = X\beta$ using
Lagrange multipliers and an unmodified Newton-Raphson iterative scheme.
The algorithm becomes very difficult to implement for even moderately large
tables. This is primarily due to the difficulty of inverting the large Hessian
matrix of the Lagrangian objective function. In this dissertation we consider a
modified Newton-Raphson scheme that uses a much simpler matrix than the Hessian.
The matrix is easily inverted even for relatively large tables. Haber (1985b)
considers the estimation of the parameters $\beta$ in the special case $C \log \mu = X\beta$.
We will use a modification of the method of Aitchison and Silvey (1958, 1960)
and Silvey (1959) to investigate the asymptotic behavior of the estimators of
$\beta$ in the more general model $C \log A\mu = X\beta$, thereby extending the work of
Haber (1985b). Another relevant paper, Haber and Brown (1986), considers
ML fitting of a model for the expected counts $\mu$ that has loglinear and
linear constraints. One can test hypotheses about the marginal probabilities
by comparing the fit of relevant models. Haber (1985a, 1985b) and Haber
and Brown (1986) only consider fitting the marginal models separately. No
attempt has been made to simultaneously model the joint and marginal
distributions.
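The constrained maximization described in this paragraph can be sketched as follows. This is only an outline under stated assumptions: we write the kernel of the log likelihood in its Poisson form, and we introduce a matrix $U$ (not defined in the text above) whose columns span the orthogonal complement of the column space of $X$, so that the model $C \log A\mu = X\beta$ is equivalent to the constraint $h(\mu) = 0$. The symbols $U$, $h$, $H$, $D$, and $\lambda$ are notation assumed for this sketch.

```latex
\begin{align*}
\ell(\mu) &= \sum_i \left( y_i \log \mu_i - \mu_i \right),
  \qquad h(\mu) = U^{T} C \log A\mu = 0, \\
L(\mu, \lambda) &= \ell(\mu) + \lambda^{T} h(\mu), \\
\frac{\partial L}{\partial \mu} &= D(\mu)^{-1} (y - \mu) + H(\mu)\,\lambda = 0,
  \qquad
\frac{\partial L}{\partial \lambda} = h(\mu) = 0,
\end{align*}
% where D(\mu) = \mathrm{diag}(\mu) and H(\mu) = \partial h(\mu)^{T} / \partial \mu .
```

An unmodified Newton-Raphson scheme solves these equations for $(\mu, \lambda)$ jointly and therefore requires the Hessian of $L$, whose dimension grows with the number of cells in the full table plus the number of constraints. Any sampling constraints on the table totals would be appended to $h$.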
Semiparametric methods such as quasi-likelihood (Wedderburn, 1974)
and a multivariate extension, generalized estimating equations (GEE), have
become popular in recent years. The work of Liang and Zeger (1986), which
advocated the use of these GEEs, has been extended to cover the multivariate
categorical response data setting (Prentice, 1988; Zhao and Prentice, 1991;
Stram et al., 1988; Liang et al., 1991). With these semiparametric methods,
the likelihood is not completely specified. Instead, generalized estimating
equations are chosen so that, when the marginal model holds, even if the
association among the multiple responses is misspecified, the estimators are
consistent and asymptotically normally distributed. These estimators, used
in conjunction with a robust estimator of their covariance (Liang and Zeger,
1986; Zeger and Liang, 1986; White, 1980, 1981, 1982; Royall, 1986), result
in consistent inference about the effects of interest. When the responses are
truly independent, the estimating equations, with correlation matrix taken to
be the identity matrix, are equivalent to the likelihood equations. The GEE
approach requires the specification of a 'working' association or correlation
matrix. Examples of working associations include those that imply all
pairwise associations (measured in terms of odds ratios) are the same and
that the higher order associations are negligible (Liang et al., 1991).
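For reference, the general form of the estimating equations and of the robust covariance estimator discussed above can be written as below. The notation is assumed for this sketch and is not taken from the text: $y_i$ is the response vector for cluster $i$, $\mu_i(\beta)$ its marginal mean, $\Delta_i$ a diagonal matrix of variance functions, $R(\alpha)$ the working correlation matrix, and $\phi$ a scale parameter.

```latex
\begin{align*}
\sum_{i=1}^{n} D_i^{T} V_i^{-1} \{ y_i - \mu_i(\beta) \} &= 0,
  \qquad D_i = \frac{\partial \mu_i}{\partial \beta^{T}},
  \qquad V_i = \phi\, \Delta_i^{1/2} R(\alpha)\, \Delta_i^{1/2}, \\
\widehat{\mathrm{var}}(\hat{\beta}) &= B^{-1} M B^{-1},
  \qquad B = \sum_{i=1}^{n} D_i^{T} V_i^{-1} D_i, \\
M &= \sum_{i=1}^{n} D_i^{T} V_i^{-1} (y_i - \hat{\mu}_i)(y_i - \hat{\mu}_i)^{T}
     V_i^{-1} D_i .
\end{align*}
```

Setting $R(\alpha)$ to the identity matrix gives the independence estimating equations, which coincide with the likelihood equations when the responses are truly independent.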
A related approach is known as GEE2. The consistency of these
estimators follows only if both the marginal model and the pairwise association
model are correctly specified. This approach is a second order extension
of the GEEs of Liang and Zeger (1986) which are now termed GEE1. It
is second order because the estimation of the marginal model parameters
and the pairwise association model parameters is considered simultaneously.
The focus of both approaches, GEE1 and GEE2, is usually on modeling
the marginal distributions, that is, on investigating how the marginal distributions
depend on occasion and covariates. The association is considered a nuisance.
Presently, there are no tests for goodness-of-fit of these models and so the
investigation into how well both models fit can be done only at an empirical
level. The assumption that higher order effects are negligible may not be
tenable. Testing procedures to assess the validity of these assumptions have
yet to be developed. Also, in contrast to WLS and ML methods, which
require only that the missing data be 'missing at random' (MAR), the
semiparametric approaches require the missing data to be 'missing completely
at random' (MCAR). The assumption that the missing data mechanism is
MCAR is a much stronger assumption than MAR (Little and Rubin, 1986).
Finally, there are many other approaches to analyzing the marginal
probability structure separately. There are random effects models, whereby
subject-specific random effects induce a correlation structure on the multiple
responses. The marginal approach (the full likelihood is obtained by
averaging across the random effects) is computationally difficult (Stiratelli
et al., 1984). An alternative is to condition on the sufficient statistics
for the subject effects and consider finding the estimates by maximizing
the conditional likelihood. For further details on these conditional and
unconditional methods see Rasch, 1961; Tjur, 1982; Agresti, 1991; Stiratelli
et al., 1984; Conaway, 1989, 1990. As yet another alternative, Koch et al.
(1980) give a bibliography for relevant nonparametric methods for analyzing
repeated measures data. Agresti and Pendergast (1986) consider replacing
the actual observations by their within-cluster ranks and testing for marginal
homogeneity using the ordinary ANOVA statistic for repeated measures data.
A three-stage estimator for repeated measures studies with possibly missing
binary responses has been developed by Lipsitz et al. (1992). This approach
is very similar to a generalized least squares approach, but it has some of
the nice features of the GEE approaches. One of these nice features is that
the estimators and their variance estimates are consistent under very mild
assumptions. An extension of this method to the polytomous response case
has yet to be developed.
Simultaneous Investigation of Joint and Marginal Distributions. There
has been very little work done to investigate simultaneously the joint and
marginal distribution structure. In some ways GEE2 is an attempt to
describe both distributions. However, only the pairwise (not the joint)
association structure is modeled; the higher-order associations are considered
a nuisance. Tests comparing nested models have not been developed in this
semiparametric setting. Full likelihood approaches have been addressed
by Dale (1986), McCullagh and Nelder (1989, Chapt. 6), and Becker and
Balagtas (1991). Dale models the joint distributions of bivariate ordered
categorical responses by assuming that the log global odds ratios follow a
linear model. The marginal probabilities are assumed to follow a cumulative
logit model. McCullagh and Nelder consider simultaneously modeling the
joint and marginal probabilities of a bivariate dichotomous response (two
distinct responses) by assuming that the log odds ratios follow a linear
model and that the marginal probabilities follow a logit-linear model. Their
example included age as a categorical covariate. Finally, Becker and Balagtas
consider models for two-period crossover data. The bivariate dichotomous
response was the response to the two different treatments. Order of treatment
application was considered a covariate. They assumed that the two log odds
ratios followed a linear model and that the marginal probabilities satisfied a
loglinear model. Because it is the marginal probabilities and not the joint
probabilities that satisfy a loglinear model, Becker and Balagtas refer to the
model as log nonlinear.
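The global odds ratios on which Dale's association model is built can be computed directly from a two-way table of ordered categories. The sketch below uses the usual definition: the table is collapsed into a 2 x 2 table at each pair of cutpoints and the ordinary odds ratio of the collapsed table is taken. The function name and the example counts are hypothetical.

```python
def global_odds_ratios(table):
    """Global odds ratios for an r x c table with ordered rows and columns.

    Entry [i-1][j-1] collapses the table at row cutpoint i and column
    cutpoint j and returns
        (low,low) * (high,high) / ((low,high) * (high,low)).
    Assumes every collapsed off-diagonal block has a positive total.
    """
    r, c = len(table), len(table[0])
    ratios = []
    for i in range(1, r):
        ratio_row = []
        for j in range(1, c):
            low_low = sum(table[a][b] for a in range(i) for b in range(j))
            low_high = sum(table[a][b] for a in range(i) for b in range(j, c))
            high_low = sum(table[a][b] for a in range(i, r) for b in range(j))
            high_high = sum(table[a][b] for a in range(i, r) for b in range(j, c))
            ratio_row.append((low_low * high_high) / (low_high * high_low))
        ratios.append(ratio_row)
    return ratios


# Hypothetical 3x3 table of counts, invented for illustration.
counts = [
    [20, 10, 5],
    [8, 25, 7],
    [2, 8, 15],
]
theta = global_odds_ratios(counts)
print(theta)
```

An r x c table yields (r - 1)(c - 1) global odds ratios; for a 2 x 2 table the single global odds ratio is the ordinary odds ratio used in the dichotomous-response models of McCullagh and Nelder and of Becker and Balagtas.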
The ML model fitting approach used by each of these authors involves
a reparameterization of the likelihood, which is a function of the joint
probabilities, in terms of the joint and marginal model parameters. The
reparameterization in the bivariate response case (the case each author
considered) is somewhat complicated, especially for multilevel responses. To
make matters worse, the extension of this method to general multivariate
polytomous responses looks to be extremely difficult. If the
reparameterizations are made so that the full likelihood is expressible in terms of the
joint and marginal model parameters, the likelihood can be maximized using
a Newton-Raphson-type algorithm. Basically, one must solve for the root of
some nonlinear score equation. This maximization approach is very sensitive
to the starting value in that convergence to a local maximum is not likely
unless the starting estimate is very close to the actual maximum. Finding
reasonable starting values is not a simple task. Dale (1986) outlines a method,
specifically for the models considered in that paper, for finding a starting
estimate.
In this dissertation, we outline an ML fitting method that can easily be
used to fit a large class of simultaneous models, including those considered
by Dale, McCullagh and Nelder, and Becker and Balagtas. The approach
involves using Lagrange's method of undetermined multipliers along with a
modified Newton-Raphson iterative scheme. For all of the models considered,
an initial estimate for the algorithm is the data counts themselves along with
a vector of zeroes corresponding to a first guess at the values of the Lagrange
multipliers. The convergence of the algorithm is quite stable. The extension
to multivariate polytomous response data is straightforward.
1.3 Outline of Existing Methodologies: Missing Data
Missing data is often an issue when the response is multivariate in nature.
Missing data can also occur in more hypothetical situations. Examples
include loglinear latent class models (Goodman, 1974; Haberman, 1988)
and linear mixed or random effects models (Laird et al., 1987). In latent
class analyses, a latent variable, which is unobservable, is assumed to exist.
Mixed or random effects models posit the existence of some unobservable
random variables that affect the mean response. In this brief outline, we will
consider ML methods for model fitting when the data are not completely
observable. Little and Rubin (1986) provide a nice summary of methods
for model fitting with incomplete data. There are many ways to find the
maximum likelihood estimators when the data are not completely observable,
each method having its positive and negative features. We could work directly
with the incompletedata likelihood, which is usually complicated relative to
the completedata likelihood, and use a NewtonRaphson or Fisherscoring
algorithm. Palmgren and Ekholm (1987) and Haberman (1988) use these
methods to obtain maximum likelihood estimates and their standard errors.
Alternatively, we could avoid the complicated likelihood altogether and use
the Expectation-Maximization algorithm (Dempster et al., 1977). Sundberg
(1976) discusses the properties of the EM algorithm when it is used to
fit models to data coming from the regular exponential family. The EM
algorithm is one of the more flexible ML fitting algorithms for missing data
situations. We will primarily focus on this method for fitting loglinear models
with incomplete data.
Although the EM algorithm is easily implemented to fit loglinear models
with incomplete data, the algorithm does not provide an estimate of precision
of the model parameter estimators. Meng and Rubin (1991) outline a
supplemental EM (SEM) algorithm, whereby, upon convergence of the EM
algorithm, the variance matrix for the model estimators is adjusted to account
for missing data. The adjustment is a function of the rate of convergence of
the EM algorithm, which in turn is a function of how much information
is missing. Meng and Rubin numerically estimate the rate of convergence,
thereby obtaining an estimate of precision that reflects missingness. Although
this approach should prove to be applicable in the general situation, it still
is desirable to derive an explicit formula for the variance matrix that reflects
missingness. Other authors (Meilijson, 1989; Louis, 1982) have discussed
methods for estimating precision of model estimators when the data are
incomplete and the EM algorithm is used. Meilijson's method involves EM-aided
differentiation, which is essentially a numerical differentiation of the
score vector. The method relies on the assumption that the observed data
components are i.i.d. (identically and independently distributed). Louis
gives an analytic formula for the observed information matrix based on the
incomplete data. The computation of the observed information matrix based
on this formula is not straightforward and must be considered separately for
each special application.
1.4 Format of Dissertation
In Chapter 2, we develop a maximum likelihood method for fitting a large
class of models for multivariate categorical response data. This development
follows a general discussion about parametric modeling. Concepts such as
degrees of freedom and model distances (or goodness of fit) are described at
an intuitive level. We also describe and compare the asymptotic distributions
of freedom parameter estimators under product-multinomial and product-Poisson
sampling assumptions. Chapter 3 has more of an applied flavor.
We consider simultaneously modeling the joint and marginal distributions
of multivariate categorical response vectors. A broad class of simultaneous
models is introduced. The models can be fitted using the techniques of
Chapter 2. Several numerical examples are considered. Chapter 4 outlines the
ML fitting technique known as the EM algorithm. This algorithm is used to
fit models with incomplete data. Some advantages and disadvantages of using
the EM algorithm are addressed. The most important disadvantage is that
the algorithm does not provide, as a byproduct, a precision estimate of the
ML estimators. We derive an explicit formula for the observed information
matrix for the Poisson loglinear model parameters when only disjoint sums of
the complete data are observable. An application to latent class modeling is
considered. We also propose an ML fitting algorithm that uses both EM and
Newton-Raphson steps. The modified algorithm should prove to have many
positive features.
In this dissertation, we do not distinguish typographically between
scalars, vectors, and matrices. Parameters and variables are treated as objects,
their dimensions either being explicitly stated or implied contextually.
By convention, functions that map scalars into scalars, when applied to
vectors, will be defined componentwise. For example, if $\mu$ represents an $n \times 1$
vector, then
$$\log\mu = (\log\mu_1, \log\mu_2, \ldots, \log\mu_n)'.$$
We frequently use abbreviations that are common in the statistical
literature. They include ML (Maximum Likelihood), WLS (Weighted
Least Squares), IWLS (Iterative (Re)Weighted Least Squares), and EM
(Expectation-Maximization).
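The range (or column) space of an $n \times p$ matrix $X$ is denoted by $\mathcal{M}(X)$
and is defined as $\{\mu : \mu = X\beta, \ \beta \in R^p\}$. The symbols $\otimes$ and $\oplus$ are the
binary operators 'direct product' and 'direct sum'. The direct (or Kronecker)
product is taken to be the right-hand product. That is,
$$A \otimes B = \{A b_{ij}\}.$$
The direct sum, $C$, of two matrices $A$ and $B$ is defined as
$$C = A \oplus B = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}.$$
The symbol $D(\mu)$ represents a diagonal matrix with the elements of $\mu$ on the
diagonal. That is,
$$D(\mu) = \begin{pmatrix} \mu_1 & 0 & \cdots & 0 \\ 0 & \mu_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \mu_n \end{pmatrix}.$$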
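The three matrix operators just defined can be illustrated with a minimal Python sketch. The function names (right_kron, direct_sum, diag) and the small example matrices are hypothetical and chosen only for illustration; matrices are represented as plain lists of rows. Note that the right-hand convention places a scaled copy of the first factor in each block, which is the reverse of the convention used by many numerical libraries.

```python
def right_kron(A, B):
    """Right-hand direct product: block (i, j) of the result is A * b_ij."""
    ra, ca = len(A), len(A[0])
    rb, cb = len(B), len(B[0])
    out = [[0] * (ca * cb) for _ in range(ra * rb)]
    for i in range(rb):
        for j in range(cb):
            for k in range(ra):
                for l in range(ca):
                    out[i * ra + k][j * ca + l] = A[k][l] * B[i][j]
    return out


def direct_sum(A, B):
    """Block-diagonal matrix with A in the upper left and B in the lower right."""
    ca, cb = len(A[0]), len(B[0])
    top = [list(row) + [0] * cb for row in A]
    bottom = [[0] * ca + list(row) for row in B]
    return top + bottom


def diag(mu):
    """Diagonal matrix D(mu) with the elements of mu on the diagonal."""
    n = len(mu)
    return [[mu[i] if i == j else 0 for j in range(n)] for i in range(n)]


# Hypothetical example matrices.
A = [[1, 2]]
B = [[10, 20], [30, 40]]
print(right_kron(A, B))
print(direct_sum(A, B))
print(diag([5, 6, 7]))
```

Each block of right_kron(A, B) is the whole matrix A multiplied by a single element of B, so the result has as many block rows and block columns as B has rows and columns.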
In Chapter 4, we make use of the bracket notation often used by
statistical and mathematical programming languages (e.g. S-plus, Matlab).
To illustrate the notation, consider a matrix A. The (sub)matrix A[, -2] is
then matrix A with the second column deleted. Similarly, the matrix A[-3, ]
is the matrix A with the third row deleted.
Equation numbering is consecutive within sections of a chapter, the
first number representing the chapter in which it appears. For example, the
thirteenth equation in section 2.3 is equation (2.3.13). Within each appendix,
the equations are numbered consecutively. For example, the third equation
in Appendix B is numbered (B.3). Tables are numbered consecutively within
chapters so that, for instance, Table 3.2 represents the second table within
Chapter 3. Theorems, lemmas, and corollaries are numbered independently
of each other. All are numbered consecutively within sections. Therefore,
Corollary 3.2.2 is the second corollary within section 3.2 and Theorem 2.3.1
is the first theorem within section 2.3.
CHAPTER 2
RESTRICTED MAXIMUM LIKELIHOOD FOR A GENERAL
CLASS OF MODELS FOR POLYTOMOUS RESPONSE DATA
2.1 Introduction
In this chapter, we consider using maximum likelihood methods to fit a
general class of parametric models for univariate or multivariate polytomous
response data. The models will be specified in terms of freedom equations
and/or constraint equations. These two ways of specifying models will be
discussed at length in section 2.2. The model specification equations may be
linear or nonlinear in the model parameters. Specifically, if $\mu$ represents the
$s \times 1$ vector of expected cell means, the linear constraints will be of the form
$L\mu = d$ and the nonlinear constraints will be of the form $U'C\log(A\mu) = 0$.
The freedom equations will have form $C\log(A\mu) = X\beta$, where the
components of the vector $\beta$ are referred to as the freedom parameters. In
Chapter 3 of this dissertation, we discuss more specifically models that can
be specified in terms of these constraint and freedom equations. The models
of that chapter allow one to simultaneously model the joint and marginal
distributions of multivariate polytomous response vectors.
The maximum likelihood, model fitting algorithm of this chapter utilizes
Lagrange multipliers and a modified Newton-Raphson iterative scheme. In
particular, the models will be specified in terms of constraint equations and
the log likelihood will be maximized subject to the constraint equations being
satisfied. One common optimization algorithm found in the mathematics
literature is Lagrange's method of undetermined multipliers. We show that
Lagrange's method is easily implemented for ML fitting of the models under
consideration in this chapter. One problem with Lagrange's method of
undetermined multipliers for ML fitting of statistical models has been that it
becomes computationally infeasible for large data sets. By using a modified
Newton-Raphson method, which involves inverting a matrix of a simpler form
than the more complicated Hessian, we consider fitting models to relatively
large data sets.
We also explore the asymptotic behavior of the estimators within the
framework of constraint, rather than freedom, models. Usually, asymptotic
properties of model and freedom parameter estimators are studied within the
framework of freedom models. Aitchison and Silvey (1958, 1960) and Silvey
(1959) studied the asymptotic behavior of the model parameter estimators
when the model is specified in terms of constraint equations. Following the
arguments of Aitchison and Silvey, we derive the asymptotic distributions of
both the model and freedom parameter estimators.
Previous work by Haber (1985a) addressed maximum likelihood methods
for fitting models of the form
$$C\log(A\mu) = X\beta$$
to categorical response data. Subsequently, Haber and Brown (1986)
discussed ML fitting for loglinear models that were also subject to the
linear constraints $L\mu = d$, where these constraints necessarily include the
identifiability constraint required of $\mu$, the vector of product-multinomial
cell means. Both of these papers advocated the use of Lagrange's method
of undetermined multipliers to find the maximum likelihood estimates of
the model parameters $\mu$. The method of Haber (1985a) involved using
the (unmodified) Newton-Raphson method, which becomes computationally
unattractive as the number of components in $\mu$ gets moderately large. Both
Haber (1985a) and Haber and Brown (1986) were primarily concerned with
measuring model goodness of fit and therefore did not consider estimation
of freedom parameters. Haber (1985b) did consider estimation of freedom
parameters, but only when the simpler model $C\log\mu = X\beta$ was used. One of
the several ways that we extend the work of Haber (1985a, 1985b) and Haber
and Brown (1986) is to consider estimation of the freedom parameters when
the more general model $C\log A\mu = X\beta$ is used.
Others have considered ML fitting of nonstandard models for multivariate
polytomous response data. Laird (1991) outlines the different approaches
taken by different authors. As an example, Dale (1986) considered ML fitting
for a particular class of models for bivariate polytomous ordered response data
which were of the form
$$C_1\log(A_1\mu) = X_1\beta_1, \qquad g(A_2\mu) = X_2\beta_2.$$
Specifically, the first freedom equation specifies a loglinear model for the
association between the two responses measured by the global cross-ratios
(cross-product ratios of quadrant probabilities) so that $C_1$ and $A_1$ are of
a particular form. The second set of freedom equations specifies some
generalized linear model (McCullagh and Nelder, 1989) for the marginal
means or probabilities. Maximum likelihood estimators for the association
model freedom parameters $\beta_1$ and the marginal model freedom parameters
$\beta_2$ were simultaneously computed by iteratively solving the score equations
via a quasi-Newton approach. To use this maximization technique, the score
functions, which involve the cell probabilities, must be written explicitly
as a function of the freedom parameter $\beta = \mathrm{vec}(\beta_1, \beta_2)$. A nontrivial
approach to finding reasonable starting values for $\beta$ is discussed by Dale
(1986). Along with Dale, McCullagh and Nelder (section 6.5, 1989) and
Becker and Balagtas (1991) consider writing the score as an explicit function
of the freedom parameters so that the marginal and association freedom
parameter estimates may be computed simultaneously. In general, when there
are more than two responses, this is not a simple task and so an extension
of this method to multivariate polytomous response data models will be very
messy indeed. Also, convergence of the iterative scheme requires good initial
estimates of the freedom parameter $\beta$. These may be very difficult to find. In
contrast, the maximization approach of this chapter, which is similar to Haber
(1985a) and Haber and Brown (1986), is shown to be easily implemented for
fitting multivariate polytomous response data models. With this technique,
it is not necessary to write the cell means as an explicit function of the
freedom parameters. Further, initial estimates of the freedom parameters,
which are difficult to find, are not needed for this technique. Instead, only
initial estimates of the cell means and undetermined multipliers are needed.
Reasonable initial estimates of the cell means are the cell counts themselves,
while a reasonable initial estimate of the vector of undetermined multipliers
is the zero vector, which is the value of the undetermined multipliers when the
model fits the data perfectly.
We will now introduce the class of models that we will consider for the
remainder of this chapter and the next, more applied chapter. The models
have form
$$C_1\log(A_1\mu) = X_1\beta_1, \qquad C_2\log(A_2\mu) = X_2\beta_2, \qquad L\mu = d,$$
where the linear constraints include the identifiability constraints. Later,
when we study the asymptotic behavior of the ML estimators, we will
require the components of $d$ to be zero unless they correspond to an
identifiability constraint. These models, which are of the form $C\log(A\mu) = X\beta$,
$L\mu = d$, will allow us to model both the joint and marginal distributions
simultaneously when dealing with multivariate response data. The bivariate
association model of Dale (1986) is a special case of these models, as we
can specify the matrices $C_1$ and $A_1$ so that $C_1\log(A_1\mu)$ is the vector of log
bivariate global cross-ratios. Restricting the marginal models to have form
$C_2\log(A_2\mu) = X_2\beta_2$, rather than allowing the marginal means to follow a
generalized linear model, as Dale (1986) did, is not overly restrictive. In
fact, many of the generalized linear models for multinomial cell means can be
written in this form. For example, loglinear, multiple logit, and cumulative
logit models are of this form. Also, unlike Haber (1985a) and Haber and
Brown (1986), we will be concerned with estimation of the freedom parameter
$\beta = \mathrm{vec}(\beta_1, \beta_2)$, thereby allowing for model-based inference.
Model-based inferences usually refer to inferences based on freedom
parameters. With freedom equations, we have the luxury of choosing a
parameterization that results in the freedom parameters having meaningful
interpretations. For instance, a freedom parameter $\beta$ may be chosen to
represent a departure from independence in the form of a log odds ratio.
More generally, we usually will try to parameterize in such a way so that
certain parameters will measure the magnitude of an effect of interest.
For example, consider an opinion poll where a group of subjects were
asked on two different occasions whether they would vote for the President
again in the next election. Suppose they were asked immediately after the
President took office and again after the President had served for two years.
The researcher may be interested in determining whether the distribution of
response changed from Time 1 to Time 2 and if so, assess the magnitude of
the change. The data configuration can be displayed as in Table 2.1.
Table 2.1. Opinion Poll Data Configuration

Data
                    Time 2
                    yes          no
Time 1    yes       $y_{11}$     $y_{12}$
          no        $y_{21}$     $y_{22}$

Probabilities
                    Time 2
                    yes          no
Time 1    yes       $\pi_{11}$   $\pi_{12}$   $\pi_{1+}$
          no        $\pi_{21}$   $\pi_{22}$   $\pi_{2+}$
                    $\pi_{+1}$   $\pi_{+2}$
We could formulate a model of the form $C\log(A\mu) = X\beta$ in such a way
so that the freedom parameter $\beta$ has a nice interpretation with respect to the
hypothesis of interest. One such model is
$$\log\left(\frac{\pi_{1(i)}}{\pi_{2(i)}}\right) = \alpha + \beta_i, \quad i = 1, 2, \qquad (2.1.1)$$
where the parameter $\pi_{j(i)}$ is a marginal probability, i.e.
$$\pi_{j(i)} = \begin{cases} \pi_{j+}, & \text{if } i = 1 \\ \pi_{+j}, & \text{if } i = 2 \end{cases}$$
and, for identifiability of the freedom parameters,
$$\beta_1 = -\beta_2 = \beta.$$
Model (2.1.1) is a simple logit model for the marginal probabilities $\{\pi_{j+}\}$ and
$\{\pi_{+j}\}$. The parameter $\beta$ measures the magnitude of departure from marginal
homogeneity in that $\beta = 0$ if and only if there is marginal homogeneity.
One could use the Wald statistic $\hat\beta/\mathrm{se}(\hat\beta)$ to test the hypothesis. If the
null hypothesis is rejected, we can assess the magnitude of departure from
marginal homogeneity by computing a confidence interval for $2\beta$ which is the
log odds ratio comparing the odds that a randomly chosen subject responds
'yes' at Time 2 to the odds that a randomly chosen subject responds 'yes' at
Time 1.
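As a rough numerical illustration of the marginal logit model, the following Python sketch recovers the two freedom parameters from a 2 x 2 table of joint probabilities. It assumes the identifiability constraint is that the two time effects are equal in magnitude and opposite in sign, so that the Time 1 logit is the intercept plus the effect and the Time 2 logit is the intercept minus the effect. The function name and the probability tables are hypothetical, not taken from any data set.

```python
import math


def marginal_logit_parameters(pi):
    """pi is a 2x2 table of joint probabilities (rows: Time 1, columns: Time 2).

    Returns (alpha, beta) under the assumed parameterization
        logit P(yes at Time 1) = alpha + beta
        logit P(yes at Time 2) = alpha - beta
    """
    p_yes_t1 = pi[0][0] + pi[0][1]  # row margin for 'yes' at Time 1
    p_yes_t2 = pi[0][0] + pi[1][0]  # column margin for 'yes' at Time 2
    logit_t1 = math.log(p_yes_t1 / (1 - p_yes_t1))
    logit_t2 = math.log(p_yes_t2 / (1 - p_yes_t2))
    alpha = (logit_t1 + logit_t2) / 2
    beta = (logit_t1 - logit_t2) / 2
    return alpha, beta


# Hypothetical joint probabilities: margins 0.6 (Time 1) and 0.5 (Time 2).
alpha, beta = marginal_logit_parameters([[0.4, 0.2], [0.1, 0.3]])
print(alpha, beta, 2 * beta)  # 2 * beta is the difference of the two marginal logits

# A table with equal margins gives beta = 0 (marginal homogeneity).
print(marginal_logit_parameters([[0.4, 0.1], [0.1, 0.4]]))
```

In this sketch twice the effect parameter equals the difference between the two marginal logits, i.e. a log odds ratio comparing the two occasions, and it vanishes exactly when the margins agree.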
This simple example illustrates the utility of using freedom parameters
and the corresponding model-based inferences. For this reason, this chapter
will be concerned with making inferences about both the model parameters
$\mu$ and the freedom parameters $\beta$.
The contents of the following sections are as follows. In section 2.2,
we provide an overview of parametric modeling. The two ways of specifying
models, via constraint equations and via freedom equations, are discussed
at length in section 2.2.1. It is shown that a model specified in terms of
freedom equations can be respecified in terms of constraint equations. In
particular, the freedom equation $C\log(A\mu) = X\beta$, which actually constrains
the function $C\log(A\mu)$ to lie in some manifold spanned by the columns of $X$,
is equivalent to the constraint equation $U'C\log(A\mu) = 0$, where the columns
of $U$ form a basis for the null space of $X'$. Other topics covered in section 2.2
include interpretation and calculation of 'degrees of freedom' and measuring
model goodness of fit.
We describe a general class of models for univariate or multivariate
polytomous response data in section 2.3.1. The data vector y is initially
assumed to be a realization of a product-multinomial random vector. We
describe the asymptotic behavior of the product-multinomial ML estimators
in section 2.3.3. Lagrange's method of undetermined multipliers is used to
find restricted maximum likelihood estimates of the model parameters and
the freedom parameters. The actual algorithm is described in detail in section
2.3.4.
In section 2.4, we explore the relationship between the product-multinomial
and product-Poisson ML estimators. General results that allow one to
ascertain when inferences based on product-Poisson estimates are the same as
inferences based on product-multinomial estimates are shown to follow quite
directly when one works within the framework of constraint models. Theorem
2.4.2 of this section represents a generalization of the results of Birch (1963)
and Palmgren (1981).
2.2 Parametric Modeling: An Overview
Inferences about the distribution of some n x 1 random vector Y are
often based solely on a particular realization y of Y. In parametric modeling
it is often the case that the distribution of Y is known up to an s x 1 vector
of model parameters $\theta$; i.e. it is 'known' that
$$Y \sim F(y; \theta), \quad \theta \in \Theta, \qquad (2.2.1)$$
where $\Theta$ is some $(s - q)$-dimensional ($q \ge 0$) subset of $R^s$ known to contain the
true unknown parameter $\theta^*$. The cumulative distribution function $F$ maps
points in $R^n$ into the unit interval [0, 1] and is assumed to be known.
In general, we will allow the dimension $s$ of $\theta$ to grow with $n$. For
example, let $Y = (Y_1, \ldots, Y_n)$ have independent components such that
$$Y_i \sim \text{ind } G(y_i; z_i(\theta)), \quad i = 1, \ldots, n,$$
where $z_i(\theta)$ is some function of $\theta$ associated with the ith component of $Y$.
The function $z_i$ could be defined as $z_i(\theta) = \theta_i$, in which case $s = n$. Or, on
the other hand, $z_i$ could be a mapping from $R^s$ to $R^1$ with $s$ fixed.
2.2.1 Model Specification.
In parametric settings, models for the data, or more precisely, models for
the distribution of Y, can be completely specified by recording the family of
candidate distributions that $F$ may belong to. That is, one must specify the
form for $F(\cdot; \theta)$ and the space $\Theta_M$ that is assumed to contain the true value
$\theta^*$ of $\theta$. In parametric modeling, the form of $F(\cdot; \theta)$ is assumed known, but
the true value $\theta^*$ is not. Denote a parametric model by $[F(\cdot; \theta); \theta \in \Theta_M]$ or
more simply by $[\Theta_M]$. We say the model $[\Theta_M]$ 'holds', if the true parameter
value $\theta^*$ is a member of $\Theta_M$, i.e.
$$[\Theta_M] \text{ holds} \iff \theta^* \in \Theta_M.$$
A model does not hold if $\theta^* \notin \Theta_M$.
The objective of model fitting is to find a simple, parsimonious model
that holds (or nearly holds). By parsimonious, we mean that the vector $\theta$ can
be obtained as a function of relatively few unknown parameters. An example
of a parsimonious model for the distribution of an n-variate normal vector
with unknown mean vector $\mu$ and known covariance is $[\Theta_\beta]$, where
$$\Theta_\beta = \{\mu \in R^n : \mu_j = \beta, \ j = 1, \ldots, n, \ \beta \text{ unknown}\}.$$
Notice that all $n$ components of $\mu$ can be obtained as a function of
one unknown parameter $\beta$. Thus, all of our estimation efforts can be
directed towards the estimation of the common mean $\beta$. An example of a
nonparsimonious model is the so-called saturated model $[\Theta]$, where
$$\Theta = \{\mu : \mu \in R^n\} = R^n.$$
In this case, $\mu$ is a function of $n$ unknown parameters.
The question of whether or not the parsimonious model holds is an
entirely different matter. Practically speaking, a model will rarely strictly
hold. Therefore, we will often say a model holds if it nearly holds, i.e. for
some small $\epsilon$
$$\inf_{\theta \in \Theta_M} \|\theta^* - \theta\| < \epsilon.$$
Without delving too much into the philosophy of model fitting and the
simplicity principle (Foster and Martin, 1966), we point out that for a model
to be practically useful it must be robust to the 'white noise' of the process
generating Y. That is, it should account for only the obvious systematic
variation. A model would be said to be robust to the white noise variability,
if the model parameter estimates based on different realizations of Y are very
similar. As an example, if instead of $[\Theta_\beta]$, the saturated model $[\Theta]$ was used
to draw inferences about the normal mean vector $\mu$, we would find that the
model fit perfectly, but that upon repeated sampling the model estimates
would change dramatically. Thus, the model is not robust to the white noise
of the process. On the other hand, the parsimonious model $[\Theta_\beta]$ estimates
would change very little from sample to sample, varying with the sample
mean of n observations. This model is robust to the white noise variability.
Therefore, if the model would hold, or nearly hold, we would say it was a
good model.
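The contrast between the saturated and the common-mean model can be seen in a small simulation. The sketch below is only an illustration under assumed settings (normal data, a true common mean of 5, unit variance, n = 10, a fixed seed); the function name and all numerical settings are hypothetical. Under the saturated model the estimate of the first mean is the first observation itself, whereas under the common-mean model every mean is estimated by the sample mean.

```python
import random
import statistics


def simulate(n=10, reps=2000, true_mean=5.0, sd=1.0, seed=0):
    """Return the sampling variances of the two estimators of the first mean."""
    rng = random.Random(seed)
    saturated = []    # saturated model: estimate of mu_1 is y_1
    common_mean = []  # common-mean model: estimate of mu_1 is the sample mean
    for _ in range(reps):
        y = [rng.gauss(true_mean, sd) for _ in range(n)]
        saturated.append(y[0])
        common_mean.append(sum(y) / n)
    return statistics.variance(saturated), statistics.variance(common_mean)


var_saturated, var_common = simulate()
print(var_saturated, var_common)
```

Across repeated samples the saturated estimate varies roughly as much as a single observation, while the common-mean estimate varies roughly one n-th as much, which is the sense in which the parsimonious model is robust to the white noise.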
Freedom Models. In the previous n-variate normal example we specified a
model $[\Theta_\beta]$ in terms of some unknown parameter $\beta$. Aitchison and Silvey
(1958, 1960) and Silvey (1959) refer to the parameter $\beta$ as a 'freedom
parameter' and the model $[\Theta_\beta]$ as a 'freedom model'. These labels are
reasonable since we can measure the amount of freedom we have for estimating
$\theta$ by noting the number of independent freedom parameters there are in the
model. The model $[\Theta_\beta]$ has one degree of freedom for estimating the mean
vector $\mu$. Thus, once an estimate of the single parameter $\beta$ is obtained the
entire vector $\mu$ can be estimated; it is a function of the one parameter $\beta$.
Notice that 'degrees' of freedom correspond to integer dimension in that a
degree of freedom is gained (lost) if we introduce (omit) one independent
freedom parameter thereby increasing (decreasing) the dimensionality of $\Theta_\beta$
by one.
In general we will denote a freedom model by $[\Theta_X]$, where
$$\Theta_X = \{\theta \in \Theta : g(\theta) = X\beta, \ \beta \in R^p\}.$$
The function $g$ is some differentiable vector valued function mapping $\theta \in \Theta$
into r-dimensional Euclidean space $R^r$. The 'model' matrix $X$ is an $r \times p$ full
column rank matrix of known numbers. To calculate degrees of freedom for
$[\Theta_X]$ we will initially assume $g$ satisfies
$$\forall \theta_0 \in \Theta_X, \quad \frac{\partial g(\theta_0)}{\partial \theta'} \text{ is of full row rank } r.$$
It also will be assumed that the constraints implied by $g(\theta) = X\beta$ are
independent of the $q$ constraints implied by the model $[\Theta]$ of (2.2.1).
Well-defined models will satisfy these conditions. For example, any $g$ that is
invertible satisfies the derivative condition. Actually this derivative condition
is not a necessary condition for the model to be well defined. Later, we will
show that g need only satisfy a milder derivative condition.
The degrees of freedom for the model $[\Theta_X]$ can be obtained by subtracting
the number of constraints implied by $[\Theta_X]$ from the total number of model
parameters, $s$. The number of constraints implied by $[\Theta_X]$ is $(r - p) + q$, the
dimension of the null space of $X'$ plus the $q$ constraints implied by model $[\Theta]$.
Hence, the model degrees of freedom for $[\Theta_X]$ is
$$df[\Theta_X] = s - (r - p + q). \qquad (2.2.2)$$
In view of (2.2.2) the model degrees of freedom, an integer measure of freedom
one has for estimating $\theta$, is an increasing function of $p$, the number of freedom
parameters. In fact, for the special case when $q = 0$ and $g(\theta) = \theta$ (so $s = r$),
we have that the number of degrees of freedom for model $[\Theta_X]$ is simply $p$,
the number of freedom parameters. This gives us another good reason for
calling $\beta$ a freedom parameter and $[\Theta_X]$ a freedom model.
Constraint Models. Notice that
$$\{\theta \in \Theta : g(\theta) = X\beta, \ \beta \in R^p\} \qquad (2.2.3)$$
can be rewritten as
$$\{\theta \in \Theta : U'g(\theta) = 0\},$$
where $U$ is an $r \times (r - p)$ full column rank matrix satisfying $U'X = 0$, i.e. the
columns of $U$ form a minimal spanning set, or basis, for the null space of $X'$.
Letting $u = r - p$ and $h_2(\theta) = 0$ be the $q$ constraints implied by $[\Theta]$, we can
write the $(u + q) \times 1$ vector of constraining functions as $h(\theta) = [h_1(\theta), h_2(\theta)]'$
where $h_1 = U'g$. We rewrite the freedom model $[\Theta_X]$ of (2.2.3) as $[\Theta_h]$, where
$$\Theta_h = \{\theta \in R^s : h(\theta) = 0\}. \qquad (2.2.4)$$
Aitchison and Silvey (1958, 1960) refer to model $[\Theta_h]$ as a constraint model.
Every freedom model can be written as a constraint model.
We present a few simple examples to illustrate the equivalence between
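the two model formulations, freedom and constraint.
Example 1. Let $Y_i \sim \text{ind } N(\beta, \sigma^2)$, $i = 1, \ldots, n$, where $\sigma^2$ is known.
This model can be specified as the freedom model $[\Theta_X]$, where
$$\Theta_X = \{\mu \in R^n : \mu = 1_n\beta, \ \beta \text{ unknown}\}$$
or equivalently it can be expressed as the constraint model $[\Theta_h]$, where
$$\Theta_h = \{\mu \in R^n : U'\mu = 0\}$$
and $U'$ is the $(n - 1) \times n$ matrix
$$U' = \begin{pmatrix} 1 & -1 & 0 & 0 & \cdots & 0 \\ 1 & 0 & -1 & 0 & \cdots & 0 \\ \vdots & & & & & \vdots \\ 1 & 0 & 0 & 0 & \cdots & -1 \end{pmatrix}.$$
It is easily seen that $\Theta_X = \Theta_h$ and that the model degrees of freedom is
$df[\Theta_X] = n - (n - 1) = 1$.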
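Example 2. Let $Y_i \sim \text{ind } N(\mu_i = \beta_0 + \beta_1 x_i, \sigma^2)$, $i = 1, \ldots, n$, where $\sigma^2$
is known. This model can be specified as the freedom model $[\Theta_X]$, where
$$\Theta_X = \{\mu \in R^n : \mu_i = \beta_0 + \beta_1 x_i, \ i = 1, \ldots, n\}$$
or assuming that each $x_i$ is distinct, as the constraint model $[\Theta_h]$, where
$$\Theta_h = \{\mu \in R^n : U'\mu = 0\}.$$
Here $U'$ is the $(n - 2) \times n$ matrix
$$U' = \begin{pmatrix}
-\frac{1}{x_2 - x_1} & \frac{1}{x_2 - x_1} + \frac{1}{x_3 - x_2} & -\frac{1}{x_3 - x_2} & 0 & \cdots & 0 & 0 \\
-\frac{1}{x_2 - x_1} & \frac{1}{x_2 - x_1} & \frac{1}{x_4 - x_3} & -\frac{1}{x_4 - x_3} & \cdots & 0 & 0 \\
\vdots & & & & & & \vdots \\
-\frac{1}{x_2 - x_1} & \frac{1}{x_2 - x_1} & 0 & 0 & \cdots & \frac{1}{x_n - x_{n-1}} & -\frac{1}{x_n - x_{n-1}}
\end{pmatrix}.$$
Notice that $U'\mu = 0$ implies that
$$\frac{\mu_{j+1} - \mu_j}{x_{j+1} - x_j} = \frac{\mu_{k+1} - \mu_k}{x_{k+1} - x_k}, \quad \forall k, j.$$
That is, the $n$ means fall on a line. As before, it can be seen that $\Theta_X = \Theta_h$
and that the model degrees of freedom is $df[\Theta_h] = n - (n - 2) = 2$.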
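The equivalence of the freedom and constraint formulations in Examples 1 and 2 can be checked numerically. The Python sketch below builds one possible constraint matrix for each example and verifies that it annihilates mean vectors satisfying the freedom equations. The sign convention of the rows, the function names, and the example values of x are assumptions made for illustration; exact rational arithmetic is used so that the zeros are exact.

```python
from fractions import Fraction


def u_transpose_common_mean(n):
    """(n-1) x n matrix whose rows contrast the first mean with each other mean."""
    rows = []
    for k in range(1, n):
        row = [Fraction(0)] * n
        row[0] = Fraction(1)
        row[k] = Fraction(-1)
        rows.append(row)
    return rows


def u_transpose_line(x):
    """(n-2) x n matrix whose rows contrast the first slope with each later slope."""
    n = len(x)
    x = [Fraction(v) for v in x]
    rows = []
    for k in range(1, n - 1):  # k indexes the interval from x[k] to x[k+1]
        row = [Fraction(0)] * n
        row[0] -= 1 / (x[1] - x[0])
        row[1] += 1 / (x[1] - x[0])
        row[k] += 1 / (x[k + 1] - x[k])
        row[k + 1] -= 1 / (x[k + 1] - x[k])
        rows.append(row)
    return rows


def matvec(M, v):
    """Matrix-vector product in exact arithmetic."""
    return [sum(m * Fraction(c) for m, c in zip(row, v)) for row in M]


x = [1, 2, 4, 7, 11]                 # hypothetical distinct covariate values
U_line = u_transpose_line(x)
mu_linear = [3 + 2 * v for v in x]   # satisfies the freedom equations of Example 2
mu_curved = [v * v for v in x]       # does not lie on a line
print(matvec(U_line, mu_linear))     # all zeros
print(matvec(U_line, mu_curved))     # not all zeros
print(len(x) - len(U_line))          # n minus the number of constraints

U_mean = u_transpose_common_mean(4)
print(matvec(U_mean, [7, 7, 7, 7]))  # all zeros for a common mean
```

The sketch counts rows rather than computing a rank, so the degrees of freedom it prints rely on the rows being linearly independent, which holds here because the covariate values are distinct.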
Definitions. We will assume that the constraining function h satisfies
some reasonable conditions so that the model is well defined. We first present
some definitions.
(1) A model $[\Theta_h]$ is said to be 'consistent' if $\Theta_h \ne \emptyset$.
(2) A consistent model $[\Theta_h]$ is said to be 'well-defined' if the Jacobian
matrix for $h$ is of full row rank $v = u + q$ at every point in $\Theta_h$. That is,
$$\forall \theta_0 \in \Theta_h, \quad \frac{\partial h(\theta_0)}{\partial \theta'} \text{ is of full row rank } v.$$
(3) A model $[\Theta_h]$ is said to be 'ill-defined' if it is not well-defined, i.e.
$$\exists \theta_0 \in \Theta_h, \quad \frac{\partial h(\theta_0)}{\partial \theta'} \text{ is not of full row rank } v.$$
(4) An ill-defined model $[\Theta_h]$ is said to be 'inconsistent' or 'incompatible'
if $\Theta_h = \emptyset$.
Briefly, any reasonable model will have a nonempty parameter space and
hence will be consistent. The Jacobian condition of definition (2) is similar
to the condition required in the Implicit Function Theorem (see Bartle, 1976).
Basically, this condition requires the constraints to be nonredundant so that,
at least theoretically, the constraint equations can be written uniquely as
a function of a smaller set of parameters. An ill-defined model has been
specified with a redundant set of constraint equations. Using the lingo of
the optimization literature, two constraints are redundant if, for each point
in the parameter space, both of the constraints are 'active' or both of the
constraints are 'inactive'. That is, for all parameter values, if one constraint
is active (inactive) then the other is necessarily active (inactive).
It should be noted that the above definitions are in terms of the
constraint formulation of a model. This is sufficient since freedom models can
be written as constraint models. For convenience, we give sufficient conditions
for a freedom model to be well-defined.
A consistent freedom model is well-defined if it satisfies the following two
conditions:
(i) The constraints implied by $g(\theta) = X\beta$ are independent of the $q$
constraints implied by $[\Theta]$.
(ii) The Jacobian matrix of $g$ evaluated at any point in $[\Theta_X]$ is of full row
rank $r$, i.e.
$$\forall \theta_0 \in \Theta_X, \quad \frac{\partial g(\theta_0)}{\partial \theta'} \text{ is of full row rank } r.$$
The sufficiency of conditions (i) and (ii) can be seen by observing that
(ii) implies that $h_1 = U'g$ has a full row rank Jacobian since $U'$ is of full row
rank and (i) implies that $h = (h_1, h_2)'$ has full row rank Jacobian. These
sufficient conditions are by no means necessary for a model to be well defined
as the Jacobian of h may be of full row rank v even when the Jacobian of g
is not of full row rank.
Notice that the model matrix has nothing to do with whether or not a
model is well defined. In particular, one may think that the model $[\Theta_X]$ is
ill-defined whenever the $r \times p$ matrix $X$ is not of full column rank; i.e. the
freedom parameters are nonestimable. However, the model can be rewritten
as a constraint model with the full column rank matrix $U$ spanning the null
space of $X'$, which then has dimension greater than $r - p$. It follows that if $g$ satisfies
(i) and (ii), then the model $[\Theta_X]$ will be well-defined. The only reason we
have taken X to be of full column rank is to avoid using generalized inverses
when working with the freedom parameters.
To illustrate the use of these definitions, we consider the model $[\Theta_M]$,
where
$$\Theta_M = \{\theta \in R^n : M\theta - d = 0\}.$$
The model will be well defined if $\partial h/\partial\theta' = M$ is of full row rank. It is
inconsistent if the linear system of equations $M\theta = d$ is inconsistent.
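For a linear constraint model of this kind, the definitions reduce to rank conditions that are easy to check. The following Python sketch is one way to do so, using the standard fact that a linear system is solvable exactly when the coefficient matrix and the augmented matrix have the same rank. The function names and the example matrices are hypothetical; exact rational arithmetic avoids tolerance choices.

```python
from fractions import Fraction


def rank(rows):
    """Row rank by Gaussian elimination in exact rational arithmetic."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    ncols = len(M[0]) if M else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def classify_linear_model(M, d):
    """Classify the model {theta : M theta - d = 0} using the definitions above."""
    augmented = [list(row) + [di] for row, di in zip(M, d)]
    if rank(M) < rank(augmented):
        return "inconsistent"   # no theta satisfies M theta = d
    if rank(M) == len(M):
        return "well-defined"   # the Jacobian M has full row rank
    return "ill-defined"        # consistent, but the constraints are redundant


print(classify_linear_model([[1, 1, 0], [0, 1, 1]], [1, 2]))
print(classify_linear_model([[1, 1, 0], [2, 2, 0]], [1, 2]))
print(classify_linear_model([[1, 1, 0], [2, 2, 0]], [1, 3]))
```

The three calls illustrate, in order, independent constraints, a redundant but solvable pair of constraints, and a contradictory pair of constraints.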
If a model $[\Theta_h]$ is well defined, then the constraints implied by the model
are all independent in that no constraint can be implied by the others. We
will consider only well-defined models when calculating degrees of freedom.
As before, we calculate degrees of freedom for a model as the difference
between the number of model parameters $s$ and the number of independent
constraints $v$ implied by the model, i.e.
$$df[\Theta_h] = s - (r - p + q) = s - (u + q) = s - v.$$
Notice that for the constraint model, model degrees of freedom is a decreasing
function of the number of independent constraints v.
Finally, it should be noted that models may be specified in terms of
both freedom equations and constraint equations. In fact, in subsequent
sections this will be the case. However, without loss of generality, we will
concentrate on constraint models since any model can be written in the form
of a constraint model.
2.2.2 Measuring Model Goodness of Fit
Inferences about model parameters are reliable only if the model is
'good'. A good model should be well defined (or at least consistent). It
should be simple and parsimonious. Finally, the model should be relatively
close to holding.
To assess whether or not the model holds, we will need the concept of a
distance between two models. To begin, we will assume there is some measure
of distance between two hierarchical parametric models. (Two models $[\Theta_1]$
and $[\Theta_2]$ are hierarchical if $\Theta_2 \subseteq \Theta_1$ and $df[\Theta_2] < df[\Theta_1]$ whenever $\Theta_1 \ne \Theta_2$.)
This (parametric) distance will be a quantitative comparison of how close
the two models are to holding. Thus, if both models hold the distance is
zero. The distance will also be independent of the model degrees of freedom.
Recall that the form of $F(\cdot; \theta)$ is assumed known. Therefore, the distance will
measure how far the true parameter is from falling in the parametric model
space. Suppose, firstly, that $\Theta_1$ and $\Theta_2$ are general parameter spaces. That
is, $\theta \in \Theta_1 \cup \Theta_2$ does not necessarily define a probability distribution. In other
words, $\theta$ need not fall in a subset of an $(s - 1)$-dimensional simplex. Let $a(\theta)$
and $b(\theta)$ be vector or matrix valued functions of the unknown parameter $\theta$.
Define a distance between two hierarchical models $[\Theta_1]$ and $[\Theta_2]$ ($\Theta_2 \subseteq \Theta_1$) as
$$\delta[\Theta_2; \Theta_1] = \inf_{\theta \in \Theta_2} \|b(\theta)(a(\theta) - a(\theta^*))\|^2 - \inf_{\theta \in \Theta_1} \|b(\theta)(a(\theta) - a(\theta^*))\|^2.$$
Notice that $a$ and $b$ can be chosen so that
(1) $\delta[\Theta_2; \Theta_1] \ge 0$
(2) $\delta[\Theta_2; \Theta_1] = 0$, iff $\Theta_1$ and $\Theta_2$ hold.
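For example, consider the case $Y \sim MVN_n(\mu, \sigma^2 I_n)$. Suppose that
$$\begin{aligned}
[\Theta] &= \{(\mu, \sigma^2) : \mu \in R^n, \ \sigma^2 > 0\} \\
[\Theta_1] &= \{(\mu, \sigma^2) : \mu = \mu_0, \ \sigma^2 > 0\} \\
[\Theta_2] &= \{(\mu, \sigma^2) : \mu = X\beta, \ \beta \in R^p, \ \sigma^2 > 0\} \\
[\Theta_3] &= \{(\mu, \sigma^2) : \mu = 1_n\alpha, \ \alpha \in R, \ \sigma^2 > 0\}.
\end{aligned}$$
In this example, each component of $Y$ has a common variance $\sigma^2$. It seems
reasonable that differences between any $\mu_j$ and the true mean $\mu_j^*$ are equally
important. Hence, a natural distance between any two of these models is
$$\delta[\Theta_{M_2}; \Theta_{M_1}] = \inf_{\Theta_{M_2}} \|\mu - \mu^*\|^2 - \inf_{\Theta_{M_1}} \|\mu - \mu^*\|^2.$$
Notice that $a(\mu, \sigma^2) = \mu$ and $b(\mu, \sigma^2) = 1$. Hence, the measure of distance
between $[\Theta]$ and $[\Theta_1]$ is
$$\delta[\Theta_1; \Theta] = \inf_{\Theta_1} \|\mu - \mu^*\|^2 = \|\mu_0 - \mu^*\|^2.$$
The second infimum is zero since the model $[\Theta]$ is known to hold.
The measure of distance between $[\Theta_2]$ and $[\Theta]$ is
$$\begin{aligned}
\delta[\Theta_2; \Theta] &= \inf_{\Theta_2} \|\mu - \mu^*\|^2 = \inf_{\beta} \|X\beta - \mu^*\|^2 \\
&= \|X(X'X)^{-1}X'\mu^* - \mu^*\|^2 \\
&= \|(I - X(X'X)^{-1}X')\mu^*\|^2 \\
&= \mu^{*\prime}(I - X(X'X)^{-1}X')\mu^*.
\end{aligned} \qquad (2.2.5)$$
This is the squared length of the vector orthogonal to the projection of $\mu^*$
onto the range space of $X$. Notice that if $\mu^* = X\beta^*$, that is $\Theta_2$ holds, then
$\delta[\Theta_2; \Theta] = 0$.
Finally, the distance between $[\Theta_3]$ and $[\Theta_2]$ is
$$\begin{aligned}
\delta[\Theta_3; \Theta_2] &= \inf_{\Theta_3} \|\mu - \mu^*\|^2 - \inf_{\Theta_2} \|\mu - \mu^*\|^2 \\
&= \mu^{*\prime}\left(I - \tfrac{1}{n} 1_n 1_n'\right)\mu^* - \mu^{*\prime}(I - X(X'X)^{-1}X')\mu^* \\
&= \mu^{*\prime}\left(X(X'X)^{-1}X' - \tfrac{1}{n} 1_n 1_n'\right)\mu^*.
\end{aligned} \qquad (2.2.6)$$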
As another example, consider a random vector $Y = (Y_1, \ldots, Y_n)'$, with
independent components following an exponential dispersion distribution
(Jorgenson, 1989). That is,
$$Y_i \sim \text{indep } ED(\mu_i, \sigma^2), \quad i = 1, \ldots, n,$$
where the density of $Y_i$, with respect to some measure, has form
$$f_Y(y; \gamma_i, \sigma^2) = a(y, \sigma^2) \exp\left\{\frac{y\gamma_i - \kappa(\gamma_i)}{\sigma^2}\right\} \tag{2.2.7}$$
where $\mu_i = \kappa'(\gamma_i)$ and $\mathrm{var}(Y_i) = \sigma^2\kappa''(\gamma_i)$. Let $V(\mu) = \oplus_i\, \kappa''(\gamma_i)$ and
$\theta = (\mu_1, \ldots, \mu_n, \sigma^2)'$. Since the components of Y have different variances,
a natural measure of distance is
$$\delta[\Theta_{M_2}; \Theta_{M_1}] = \inf_{\Theta_{M_2}} \|V(\mu)^{-1/2}(\mu - \mu^*)\|^2 - \inf_{\Theta_{M_1}} \|V(\mu)^{-1/2}(\mu - \mu^*)\|^2. \tag{2.2.8}$$
That is, $a(\theta) = \mu$ and $b(\theta) = V(\mu)^{-1/2}$. Premultiplying the vector $(\mu - \mu^*)$ by
$V(\mu)^{-1/2}$ has the effect of downplaying those differences $(\mu_i - \mu_i^*)$ when the
corresponding variance is large.
To assess the goodness of fit of a model, relative to another, we can
estimate the distance $\delta$ via some statistic based on the observed data. It
is interesting to note that when $\delta = 0$, i.e. both models hold, our
data-based estimate of this null distance will be some nonnegative (positive, if
the model is unsaturated) number, reflecting the amount of white noise or
random variability there is in Y. This is so because, if both models hold,
then the only reason that our estimate of distance would be nonzero would
be because Y has some random component. That is, the variability in Y that
is not explained by the model causes the data to fit the model imperfectly.
Let D be an estimate of $\delta$. That is, $D[\Theta_2; \Theta_1]$ is a stochastic, data-based
estimate of how far apart models $[\Theta_1]$ and $[\Theta_2]$ are. Potential candidates
for D are the weighted least squares, likelihood ratio, Wald, deviance, and
Lagrange multiplier statistics.
For example, consider the n-variate normal case and the four candidate
models $[\Theta]$, $[\Theta_1]$, $[\Theta_2]$, and $[\Theta_3]$. We will assume that both $[\Theta]$ and $[\Theta_2]$
hold. In view of (2.2.5) a reasonable estimate of $\delta[\Theta_2; \Theta]$ can be obtained by
replacing $\mu^*$ by Y, the estimate of $\mu^*$ under model $[\Theta]$, i.e.
$$D[\Theta_2; \Theta] = Y'(I - X(X'X)^{-1}X')Y = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2.$$
Recall that, since $\delta[\Theta_2; \Theta]$ is known to be zero, $D[\Theta_2; \Theta]$ serves as our
'estimate of error'.
Similarly, a reasonable estimate of $\delta[\Theta_3; \Theta_2]$ can be obtained by replacing
$\mu^*$ in (2.2.6) by Y, the least restrictive estimate of $\mu^*$, i.e.
$$D[\Theta_3; \Theta_2] = Y'\left(X(X'X)^{-1}X' - \tfrac{1}{n} 1_n 1_n'\right)Y = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2.$$
Now $\Theta_3 \subset \Theta_2$ and
$$\begin{aligned}
df[\Theta_3] &= n + 1 - (n - 1) = 2 \\
df[\Theta_2] &= n + 1 - (n - p) = p + 1.
\end{aligned}$$
The degrees of freedom associated with estimating the distance between
two models will be called the distance (or residual or goodness-of-fit) degrees
of freedom. The distance degrees of freedom for the two models $[\Theta_{M_1}]$ and
$[\Theta_{M_2}]$ is defined to be the difference between the two model degrees of freedom,
i.e.
$$df(\delta[\Theta_{M_2}; \Theta_{M_1}]) = df[\Theta_{M_1}] - df[\Theta_{M_2}].$$
The number of distance degrees of freedom measures the dimensional distance
between the two models, i.e. the difference in dimensions. It measures the
difference in the amount of freedom one has for estimating 0 for the two
models. It seems intuitive that if the degrees of freedom is large, that is the
dimensional difference between the two models great, the significance of the
distance statistic may be difficult to ascertain. This follows since we expect
the fit to be quite different for the two very different models, even when both
models hold. This is a reflection of both white noise and possibly lack of fit.
Therefore, the distance statistic will tend to be large, even when both models
hold. But for many statistics, a large mean implies a large variance, thereby
making significant findings more difficult. It is for this reason that we say
it is better to concentrate our efforts on relatively few degrees of freedom
to detect lack of fit. That is, one should use the smallest alternative space
possible when testing a null hypothesis.
A more technical argument holds when the test statistic (distance
statistic) is a chi-square or an F. Das Gupta and Perlman (1974) showed
that for a fixed noncentrality parameter, i.e. fixed distance between models,
the power of the F-test or the chi-square test increases as the distance degrees
of freedom decreases.
Example 1: Continuing with the n-variate normal example, we see that
$$df(\delta[\Theta_3; \Theta_2]) = df[\Theta_2] - df[\Theta_3] = (p + 1) - 2 = p - 1.$$
Thus, $\Theta_3$ is of $p - 1$ fewer dimensions than $\Theta_2$. Now, if we knew $\sigma^2$, the white
noise variance, we could test $H_0 : \theta^* \in \Theta_3$ vs. $H_1 : \theta^* \in \Theta_2 - \Theta_3$, using the
statistic
$$\frac{D[\Theta_3; \Theta_2]}{\sigma^2} = \frac{SS(\text{Reg})}{\sigma^2}, \tag{2.2.9}$$
which has a $\chi^2(p - 1)$ null distribution. However, $\sigma^2$ is not generally known and
we must estimate it. One way of estimating $\sigma^2$ is by estimating the distance
between $[\Theta]$ and $[\Theta_2]$, two models that are known to hold, and dividing by
the distance degrees of freedom. Since the distance degrees of freedom is
$df[\Theta] - df[\Theta_2] = n + 1 - (p + 1) = n - p$, we have that the estimate of the white
noise variance is $D[\Theta_2; \Theta]/(n - p) = SS(\text{Error})/(n - p)$.
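The quantities in Example 1 can be sketched numerically. The following is a minimal illustration, assuming a simple linear regression with an intercept and one covariate (so p = 2); the function names and the returned dictionary keys are hypothetical and are not taken from the dissertation.

```python
def fit_line(x, y):
    """Least-squares fitted values for the mean model mu_i = b0 + b1 * x_i."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return [b0 + b1 * xi for xi in x]


def simple_regression_distances(x, y):
    """Estimated distances for the normal example with X = [1, x], p = 2."""
    n, p = len(y), 2
    fitted = fit_line(x, y)
    ybar = sum(y) / n
    # D[Theta2; Theta] = SS(Error): squared distance of Y from the range of X.
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    # D[Theta3; Theta2] = SS(Reg): fitted values versus the common-mean fit.
    ss_reg = sum((fi - ybar) ** 2 for fi in fitted)
    # White noise variance estimate: distance between two models that hold,
    # divided by the distance degrees of freedom n - p.
    sigma2_hat = ss_error / (n - p)
    return {
        "ss_error": ss_error,
        "ss_reg": ss_reg,
        "sigma2_hat": sigma2_hat,
        "df_distance": p - 1,  # df(delta[Theta3; Theta2])
        "scaled_statistic": ss_reg / sigma2_hat,
    }
```

The scaled statistic replaces the unknown variance in (2.2.9) by its estimate, so it is no longer exactly chi-square distributed; the sketch only shows how the two estimated distances combine.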
Notice that in the above example the estimate of the parameter $\sigma^2$
was simply the estimated distance between two models that were known to
hold divided by their dimensional distance. Quite generally, when the data
have an exponential dispersion distribution (2.2.7) with common dispersion
parameter $\sigma^2$, the estimated distance between two models that are known to
hold, divided by their dimensional distance, gives us an estimate of $\sigma^2$. This is
true when the estimated distance is taken to be the LR, Wald, Deviance, LM,
or the weighted least squares statistics. These statistics are natural estimators
of the weighted distance given in (2.2.8) for the exponential dispersion models.
Now, let us assume that $\Theta_1$ and $\Theta_2$ are each subsets of an
$(s-1)$-dimensional simplex. For example, with count data, conditional on the
total n, the distribution is often multinomial with index n and parameter
(alternatively, probability distribution vector) $\theta^*$. Read and Cressie (1988)
extensively study a family of distance measures called the power-divergence
family. The power divergences have form
$$I^{\lambda}(\theta^*, \theta) = \frac{1}{\lambda(\lambda + 1)} \sum_i \theta_i^* \left[\left(\frac{\theta_i^*}{\theta_i}\right)^{\lambda} - 1\right], \quad -\infty < \lambda < \infty,$$
where $I^0$ and $I^{-1}$ are defined to be the continuous limiting values as $\lambda \to 0$ and
$\lambda \to -1$. It is assumed that $\theta^*$ and $\theta$ fall on an $(s-1)$-dimensional simplex.
As usual, let $\theta^*$ represent the true unknown parameter. We define the family
of distance measures between $[\Theta_1]$ and $[\Theta_2]$ ($\Theta_2 \subseteq \Theta_1$) to be proportional to
$$\delta[\Theta_2; \Theta_1] = 2n\left\{\inf_{\Theta_2} I^{\lambda}(\theta^*, \theta) - \inf_{\Theta_1} I^{\lambda}(\theta^*, \theta)\right\}.$$
By properties of $I^{\lambda}(\theta^*, \theta)$ (Read and Cressie, 1988, pp. 110-113), it follows
that $\delta \geq 0$, with equality if and only if both models hold.
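To estimate $\delta[\Theta_2; \Theta_1]$ based on the data, we note that our least restrictive
guess of $\theta^*$ is $Y/n$, the vector of sample proportions. Intuitively, a good
estimate of the quantity $\delta[\Theta_2; \Theta_1]$ would be
$$\begin{aligned}
D[\Theta_2; \Theta_1] &= 2n\left\{\inf_{\Theta_2} I^{\lambda}(Y/n, \theta) - \inf_{\Theta_1} I^{\lambda}(Y/n, \theta)\right\} \\
&= \frac{2}{\lambda(\lambda + 1)} \sum_i Y_i \left[\left(\frac{Y_i}{n\hat{\theta}_i^{(2)}}\right)^{\lambda} - 1\right] - \frac{2}{\lambda(\lambda + 1)} \sum_i Y_i \left[\left(\frac{Y_i}{n\hat{\theta}_i^{(1)}}\right)^{\lambda} - 1\right],
\end{aligned}$$
where $\hat{\theta}^{(1)}$ and $\hat{\theta}^{(2)}$ are the 'minimum divergence' estimators obtained by
minimizing $I^{\lambda}(Y/n, \theta)$ with respect to $\theta$ over $\Theta_1$ and $\Theta_2$ respectively. Read
and Cressie (1988) point out that $D[\Theta_2; \Theta_1]$ is equal to the likelihood ratio
statistic when $\lambda = 0$. Also, if we assume that $[\Theta_1]$ holds so that the second
infimum is zero, we have that, for $\lambda = 1$,
$$D[\Theta_2; \Theta_1] = \sum_i \frac{(Y_i - n\hat{\theta}_i^{(2)})^2}{n\hat{\theta}_i^{(2)}},$$
which is asymptotically equivalent to
$$D[\Theta_2; \Theta_1] = \sum_i \frac{(Y_i - n\hat{\theta}_i^{(0)})^2}{n\hat{\theta}_i^{(0)}},$$
where $\hat{\theta}^{(0)}$ is the maximum likelihood estimator of $\theta^*$ over the space $\Theta_2$. This
is the Pearson chi-square statistic. Other asymptotically equivalent distance
estimates are the Wald statistic and the Lagrange multiplier statistic. We
now illustrate these results via examples.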
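Before turning to the examples, the power-divergence statistic can be sketched in a few lines. This is a minimal illustration, assuming positive observed counts and fitted counts that sum to the same total n, so that the limiting forms at lambda = 0 and lambda = -1 simplify; the function name is hypothetical.

```python
import math


def power_divergence(y, fitted, lam):
    """Return 2n * I^lambda(y/n, theta) where fitted[i] = n * theta[i].

    Assumes all counts are positive and sum(y) == sum(fitted).
    """
    if abs(lam) < 1e-12:
        # Limit as lambda -> 0: the likelihood ratio form.
        return 2.0 * sum(yi * math.log(yi / mi) for yi, mi in zip(y, fitted))
    if abs(lam + 1.0) < 1e-12:
        # Limit as lambda -> -1.
        return 2.0 * sum(mi * math.log(mi / yi) for yi, mi in zip(y, fitted))
    scale = 2.0 / (lam * (lam + 1.0))
    return scale * sum(yi * ((yi / mi) ** lam - 1.0) for yi, mi in zip(y, fitted))
```

With lam = 1 the function reduces to the Pearson form, and with lam = 0 to the likelihood ratio form, matching the two special cases discussed above.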
Example 2: Suppose that $Y = (Y_{11}, Y_{12}, Y_{21}, Y_{22})'$ is a multinomial vector.
That is,
$$(Y_{11}, Y_{12}, Y_{21}, Y_{22})' \sim \text{Mult}(n, (\pi_{11}, \pi_{12}, \pi_{21}, \pi_{22})'), \quad \text{with } \sum_i \sum_j \pi_{ij} = 1.$$
Thus, the model that is known to contain the true parameter vector $\pi^*$ is $[\Theta]$
where
$$\Theta = \{\pi : \pi' 1_4 = 1,\ \pi_{ij} \in (0, 1),\ i, j = 1, 2\}.$$
Notice that $\Theta$ is really a 3-dimensional subset (simplex) of $(0, 1)^4$ so that
$df[\Theta] = 4 - 1 = 3$.
We wish to test the independence hypotheses
$$\begin{aligned}
H_0 &: \pi_{11}\pi_{22} = \pi_{12}\pi_{21}, \text{ vs.} \\
H_1 &: \pi_{11}\pi_{22} \neq \pi_{12}\pi_{21}.
\end{aligned}$$
Writing the model of interest $[\Theta_0]$ as
$$\begin{aligned}
\Theta_0 &= \{\pi \in \Theta : \pi_{11}\pi_{22} - \pi_{12}\pi_{21} = 0\} \\
&= \{\pi : \pi' 1_4 = 1,\ \pi_{11}\pi_{22} - \pi_{12}\pi_{21} = 0\},
\end{aligned}$$
we can state the independence hypotheses as
$$\begin{aligned}
H_0 &: \pi \in \Theta_0, \text{ vs.} \\
H_1 &: \pi \notin \Theta_0.
\end{aligned}$$
Now, the model degrees of freedom can be found by subtracting the number
of constraints implied by $[\Theta_0]$ from the total number of parameters, which
is 4. Hence, $df[\Theta_0] = 4 - 2 = 2$. Thus, the distance degrees of freedom, or
measure of dimensional distance, is $df(\delta[\Theta_0; \Theta]) = 3 - 2 = 1$.
Two distance (goodness-of-fit) statistics commonly used are the Pearson
chi-square $X^2$ ($\lambda = 1$) and the likelihood ratio statistic $G^2$ ($\lambda = 0$). The forms
of these two statistics are
$$D[\Theta_0; \Theta] = X^2 = \sum_i \sum_j \frac{(y_{ij} - n\hat{\pi}_{ij,0})^2}{n\hat{\pi}_{ij,0}}$$
and
$$D[\Theta_0; \Theta] = G^2 = 2 \sum_i \sum_j y_{ij} \log\left(\frac{y_{ij}}{n\hat{\pi}_{ij,0}}\right),$$
where $\hat{\pi}_{ij,0}$ is the ML estimate of $\pi_{ij}$ assuming that model $[\Theta_0]$ holds.
Under the null hypothesis, i.e. if independence truly holds, then the
asymptotic distribution of both distance statistics, $X^2$ and $G^2$, is $\chi^2(1)$.
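The two statistics of Example 2 can be sketched for a 2 x 2 table as follows. The sketch uses the standard closed-form ML fit under independence (row total times column total over n), a fact not spelled out in the text; the function names are hypothetical.

```python
import math


def independence_fitted_counts(y):
    """Fitted counts n * pi_hat_ij under independence for a 2 x 2 table y."""
    n = sum(sum(row) for row in y)
    rows = [sum(row) for row in y]
    cols = [y[0][j] + y[1][j] for j in range(2)]
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]


def pearson_x2(y, fitted):
    """Pearson chi-square X^2 (the lambda = 1 member of the family)."""
    return sum((y[i][j] - fitted[i][j]) ** 2 / fitted[i][j]
               for i in range(2) for j in range(2))


def lr_g2(y, fitted):
    """Likelihood ratio statistic G^2 (the lambda = 0 member); zero cells add 0."""
    return 2.0 * sum(y[i][j] * math.log(y[i][j] / fitted[i][j])
                     for i in range(2) for j in range(2) if y[i][j] > 0)
```

Both values would be referred to a chi-square distribution with one degree of freedom, the distance degrees of freedom computed above.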
Example 3: Continuing with Example 2, consider the model $[\Theta_{MH}]$ where
$$\Theta_{MH} = \{\pi : \pi' 1_4 = 1,\ \pi_{1+} - \pi_{+1} = 0\}.$$
This model implies that there is marginal homogeneity, i.e. the marginal
distributions for both factors are the same.
We would like to test the hypotheses
$$\begin{aligned}
H_0 &: \pi \in \Theta_{MH}, \text{ vs.} \\
H_1 &: \pi \notin \Theta_{MH}.
\end{aligned}$$
The model degrees of freedom is $df[\Theta_{MH}] = 4 - 2 = 2$, and so the distance
degrees of freedom is $df(\delta[\Theta_{MH}; \Theta]) = 3 - 2 = 1$. Once again, to illustrate
what model degrees of freedom means, we observe that if $[\Theta_{MH}]$ holds and
we specify two of the four probabilities, the remaining two are completely
determined. Thus, we are free to estimate two of the probabilities based on
the data. The other two are determined.
Two frequently used estimates of the model distance, or model goodness
of fit, are the likelihood ratio statistic $G^2$ and the McNemar statistic $M^2$. For
2 x 2 tables, the McNemar statistic and the Lagrange multiplier statistic are
equivalent since both are score statistics (Agresti, 1990; Aitchison & Silvey,
1958). The statistics take the following forms
$$D[\Theta_{MH}; \Theta] = G^2 = 2 \sum_i \sum_j y_{ij} \log\left(\frac{y_{ij}}{n\hat{\pi}_{ij,0}}\right)$$
and
$$D[\Theta_{MH}; \Theta] = M^2 = \frac{(y_{12} - y_{21})^2}{y_{12} + y_{21}},$$
where the $\hat{\pi}_{ij,0}$ in the first expression is the ML estimate of $\pi_{ij}$ under the
model $[\Theta_{MH}]$.
Under the null, i.e. when the marginal distributions are homogeneous,
both of these statistics have asymptotic $\chi^2(1)$ distributions.
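A short sketch of the two statistics in Example 3 follows. It relies on the standard fact, not stated in the text, that for a 2 x 2 table marginal homogeneity is equivalent to $\pi_{12} = \pi_{21}$, so the ML fitted counts keep the diagonal cells and average the off-diagonal cells; the function names are hypothetical.

```python
import math


def mh_fitted_counts(y):
    """ML fitted counts under marginal homogeneity for a 2 x 2 table y."""
    off = (y[0][1] + y[1][0]) / 2.0  # common fitted value for cells (1,2), (2,1)
    return [[float(y[0][0]), off], [off, float(y[1][1])]]


def mcnemar_m2(y):
    """McNemar statistic M^2 = (y12 - y21)^2 / (y12 + y21)."""
    return (y[0][1] - y[1][0]) ** 2 / (y[0][1] + y[1][0])


def lr_g2(y, fitted):
    """Likelihood ratio statistic G^2; cells with a zero count contribute 0."""
    return 2.0 * sum(y[i][j] * math.log(y[i][j] / fitted[i][j])
                     for i in range(2) for j in range(2) if y[i][j] > 0)
```

Only the off-diagonal cells contribute to either statistic, which is one way to see that a single degree of freedom is being tested.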
It is important to note that, had the constraint $\pi_{2+} - \pi_{+2} = 0$ been added,
the model would remain consistent but would be ill defined. For 2 x 2 tables,
this additional constraint is exactly the same as the constraint $\pi_{1+} - \pi_{+1} = 0$.
2.3 Multivariate Polytomous Response Model Fitting
In this section, we describe ML model fitting for an integer valued
random vector Y that is assumed to be distributed product-multinomially.
We also investigate the asymptotic behavior of the ML estimators within the
framework of constraint models. The models we will consider have form
$$\Theta_X = \{\xi \in \Theta : C\log(Ae^{\xi}) = X\beta,\ Le^{\xi} = 0\}$$
or equivalently, for appropriately chosen U,
$$\Theta_X = \Theta_h = \{\xi \in \Theta : U'C\log(Ae^{\xi}) = 0,\ Le^{\xi} = 0\},$$
where $e^{\xi}$ is the $s \times 1$ mean vector of Y, a product-multinomial random vector,
and the model parameter space $\Theta$ is of dimension $s - q$, where q is the number
of identifiability constraints. We use the parameter $\xi$ rather than $\mu = e^{\xi}$
for several reasons. One reason will become evident when we explore the
asymptotic behavior of the ML estimator of $\xi$. It turns out that the random
variable $\hat{\mu} - \mu_0$ is not bounded in probability, whereas $\hat{\xi} - \xi_0$ is. In fact, the
random variable $\hat{\xi} - \xi_0$ converges in probability to 0. Another reason for using
$\xi$ rather than $\mu$ is that the procedure for deriving the maximum likelihood
estimate of $\xi$ is less sensitive to small (or zero) counts. The range of possible
$\xi$ values is the whole real line, while the range of possible $\mu$ values is restricted
to the positive half of the real line. By using $\xi$ the problem of intermediate
out of range values (e.g. negative cell mean estimates) is avoided.
As stated above, we initially assume that the vector of cell counts Y
has a product-multinomial distribution. This is not overly restrictive since it
will be shown that inferences based on maximum (multinomial) likelihood
estimates are often the same as inferences based on maximum (Poisson)
likelihood estimates. We will present some results in section 2.4 that allow
us to determine when these inferences are indeed the same.
We also consider an alternative method for computing the maximum
likelihood estimators and their asymptotic covariances. The method of
Lagrange undetermined multipliers is well suited for maximum likelihood
fitting of the models we will be considering. This is so because we will specify
the models in terms of constraint equations and the fitting problem will be
one of maximizing a function, namely the log likelihood, subject to some
constraints, namely that $\xi \in \Theta_h$.
2.3.1 A General Multinomial Response Model
In this section we specify a class of models that is directly applicable
to Chapter 3 of this dissertation. Specifically, the models will be specified in
such a way so as to include the class of simultaneous models for the joint and
marginal distributions considered in Chapter 3.
Let the random vector $Y = \text{vec}(Y_1, \ldots, Y_K)$ denote a product multinomial
random vector, i.e.
$$Y_i = (Y_{i1}, \ldots, Y_{iR})' \sim \text{ind Mult}(n_i, \pi_i), \quad i = 1, \ldots, K, \quad K \geq 1,$$
where the $R \times 1$ vectors of cell probabilities satisfy $\pi_i' 1_R = 1$, $i = 1, \ldots, K$.
Consider the 1:1 reparameterization from $\{\pi_i\}$ to $\{\xi_i\}$, where $\xi_i =
\log(\mu_i) = \log(n_i\pi_i)$ is an $R \times 1$ vector of log means. Under this
parameterization,
$$Y_i \sim \text{ind Mult}\left(n_i, \frac{e^{\xi_i}}{n_i}\right), \quad e^{\xi_i'} 1_R = n_i, \quad i = 1, \ldots, K,$$
or
$$Y_i \sim \text{ind Mult}\left(n_i, \frac{e^{\xi_i}}{n_i}\right), \quad i = 1, \ldots, K, \quad e^{\xi'}(\oplus_1^K 1_R) = n', \tag{2.3.1}$$
where $n' = (n_1, \ldots, n_K)$ is the $1 \times K$ vector of multinomial indices.
The kernel of the log likelihood for Y, written as a function of $\xi$, is
$$\ell^{(M)}(\xi; y) = y'\xi, \quad e^{\xi'}(\oplus_1^K 1_R) = n'. \tag{2.3.2}$$
We now posit a model for the vector of log means. Let $s = RK$ be the
total number of cell means. Our objectives are to test the model goodness
of fit and to estimate the $s \times 1$ model parameter vector $\xi$ as well as any
freedom parameters of interest. It will be assumed that the model $[\Theta_X]$ can
be specified as
$$\Theta_X = \{\xi \in R^s : C_1 \log A_1 e^{\xi} = X_1\beta_1,\ C_2 \log A_2 e^{\xi} = X_2\beta_2,\ Le^{\xi} = 0,\ e^{\xi'}(\oplus_1^K 1_R) = n'\}, \tag{2.3.3}$$
where
$C_i = \oplus_{j=1}^K C_{ij}$, $C_{ij} = C_{i1}$, is $q_i \times m_i$, $i = 1, 2$
$A_i = \oplus_{j=1}^K A_{ij}$, $A_{ij} = A_{i1}$, is $m_i \times R$, $i = 1, 2$
$L = \oplus_{j=1}^K L_j$, $L_j = L_1$, is $d \times R$
$\xi = \text{vec}(\xi_1, \ldots, \xi_K)$ and $\xi_i$ is $R \times 1$
$X_i$ is $Kq_i \times p_i$ of full rank $p_i$, $i = 1, 2$
n is the $K \times 1$ vector of multinomial indices
$s = RK$, the total number of cells
Let us say that a model that can be specified as in (2.3.3) satisfies
assumption (A1). That is,
(A1) The multinomial response model can be specified as in (2.3.3).
Notice that the K matrices of $C_i$ are all identical, likewise with the
matrices comprising $A_i$ and L. This requires that the model does not change
across the K populations (K multinomials). Also, the two sets of freedom
equations in (2.3.3) will allow us to use two different types of models for
the expected cell means. This provides us with enough generality to fit
many interesting models. For example, we may wish to simultaneously fit
a linear-by-linear association loglinear model for the joint distribution and a
cumulative logit model for the marginal distributions.
We can conveniently rewrite (2.3.3) as
$$\Theta_X = \{\xi \in R^s : C\log(Ae^{\xi}) = X\beta,\ Le^{\xi} = 0,\ e^{\xi'}(\oplus_1^K 1_R) = n'\}, \tag{2.3.4}$$
where $A' = [A_1', A_2']$, $C = C_1 \oplus C_2$, $X = X_1 \oplus X_2$, and $\beta = \text{vec}(\beta_1, \beta_2)$.
Notice that the model $[\Theta_X]$ is specified in terms of both freedom
equations and constraint equations. We will rewrite $[\Theta_X]$ as a constraint
model, keeping in the back of our minds that the freedom parameters may be
of interest also.
Let U be a $K(q_1 + q_2) \times u$ matrix of full column rank u such that
$U'X = 0$. Here u is the dimension of the null space of $X'$, $\mathcal{N}(X')$, i.e.
$u = K(q_1 + q_2) - (p_1 + p_2)$. Since U can be chosen to be of full column rank, it
follows that the columns of U form a basis for the null space of $X'$. Thus, the
range space of U equals the null space of $X'$, i.e. $\mathcal{M}(U) = \mathcal{N}(X')$. Multiplying
the right and left hand side of the freedom equation $C\log(Ae^{\xi}) = X\beta$ by $U'$,
we can rewrite (2.3.4) as
$$\Theta_h = \{\xi \in R^s : U'C\log(Ae^{\xi}) = 0,\ Le^{\xi} = 0,\ e^{\xi'}(\oplus_1^K 1_R) - n' = 0\}. \tag{2.3.5}$$
Thus, $\Theta_X = \Theta_h$ and the models $[\Theta_X]$ and $[\Theta_h]$ are one and the same.
At this point, we will assume that the constraints implied by the model
$[\Theta_h]$ are nonredundant so that the model is well defined. More specifically, let
$h'(\xi) = [(U'C\log(Ae^{\xi}))', e^{\xi'}L']$ be the $1 \times (u + l)$ ($l = Kd$) vector of constraint
functions. We will assume that the $u + l + K$ constraints implied by $h(\xi) = 0$
and $e^{\xi'}(\oplus_1^K 1_R) = n'$ are nonredundant. Notice that the constraints in $h(\xi) = 0$
do not include the identifiability constraints. We treat the identifiability
constraints separately for reasons that will become apparent when we actually
fit the models.
As stated previously, one of our primary objectives is to estimate the
model parameters $\xi$ and the freedom parameters $\beta$ under the assumption
that $[\Theta_X]$ (and $[\Theta_h]$) holds. We will use the maximum likelihood estimates,
which can be found by maximizing the log likelihood of Y subject to the
constraint that $[\Theta_h]$ holds.
The (kernel of the) log likelihood under the product multinomial
assumption is shown in (2.3.2). It is
$$\ell^{(M)}(\xi; y) = y'\xi.$$
Thus, we are to maximize the function $\ell^{(M)}(\xi; y) = y'\xi$ subject to $\xi \in \Theta_h$.
2.3.2 Maximum Likelihood Estimation
In this section we will discuss two procedurally different approaches
to maximizing the log likelihood $\ell^{(M)}(\xi; y)$ subject to $\xi \in \Theta_h$. The first
approach, which is the more commonly used approach, requires that the
model be specified entirely in terms of freedom equations. Often times,
when there are no identifiability constraints, the model can be completely
specified as a freedom model. Models amenable to this approach include the
Poisson loglinear model and the Normal linear model. The second approach,
Lagrange's method of undetermined multipliers, can be directly applied when
the model is specified completely in terms of constraint equations. Since the
product multinomial model includes identifiability constraints, it can more
easily be specified in terms of constraint equations. For this reason this
second method is the preferred choice. In the following sections, we discuss
some additional features of these two methods.
Freedom Parameter Approach. One approach often used in simple situations,
namely those situations when the model can be specified completely
in terms of freedom equations, is to write the parameter $\xi$ as a function
of the freedom parameter $\beta$ and maximize $\ell^{(M)}(\xi(\beta); y)$ with respect to $\beta$.
The vector $\xi(\beta)$ will be in the model space, since the model was specified
completely in terms of $\beta$. For example, if the model could be specified as
$$\Theta_X = \{\xi \in R^s : \log e^{\xi} = X\beta\},$$
then $\xi(\beta) = X\beta$. Notice that the multinomial model, which includes the K
constraints $e^{\xi'}(\oplus_1^K 1_R) = n'$, is not directly amenable to this approach. In fact,
we would have to reparameterize to a smaller set of $s - K$ model parameters
that account for the K constraints. This reparameterization results in an
asymmetric treatment of the $\xi$ and for that reason is deemed undesirable.
On the other hand, the Poisson model considered below will often lend itself
to this maximization approach, since the K constraints $e^{\xi'}(\oplus_1^K 1_R) = n'$ are
not included.
Computationally, the method of maximizing the log likelihood with
respect to the freedom parameters is usually simple. Assuming the log
likelihood is concave and differentiable in $\beta$, we need only solve for the root
of the 'score equations', viz.
$$s(\beta; y) = \frac{\partial \ell(\xi(\beta); y)}{\partial \beta} = 0.$$
Many of the asymptotic properties of the maximum likelihood estimator
$\hat{\beta}$ for $\beta$ are derived by formally expanding the score vector $s(\beta; y)$ about the
true value $\beta = \beta^*$ in a linear Taylor expansion. That is,
$$s(\beta; y) = s(\beta^*; y) + \frac{\partial s(\beta^*; y)}{\partial \beta'}(\beta - \beta^*) + O(\|\beta - \beta^*\|^2). \tag{2.3.6}$$
In particular, in many situations,
$$0 = s(\hat{\beta}; Y) = s(\beta^*; Y) + \frac{\partial s(\beta^*; Y)}{\partial \beta'}(\hat{\beta} - \beta^*) + O_p(1),$$
so that $\hat{\beta} - \beta^*$ has the same asymptotic distribution as
$$-\left(\frac{\partial s(\beta^*; Y)}{\partial \beta'}\right)^{-1} s(\beta^*; Y).$$
Subsequently, we will derive the asymptotic distribution of $\hat{\beta} - \beta^*$ in a different
way. This alternative derivation of the asymptotic distribution of the freedom
parameter estimate will shed new light on the relationship between the
asymptotic behavior of the estimates under the two sampling assumptions,
product Poisson and product multinomial.
Expression (2.3.6) also gives some indication of how one might numerically
solve for $\hat{\beta}$, the root of the score equation. A Newton-Raphson type
algorithm is often used. This root finding algorithm involves the inversion
of the derivative matrix $\partial s(\beta; y)/\partial \beta'$, which is usually of small dimension
since the model is usually specified in terms of a small number of freedom
parameters. In fact, the dimension of the derivative matrix will not be larger
than $s \times s$, which occurs when the model is saturated.
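The Newton-Raphson iteration for the freedom parameters can be sketched for a small case. The following is a minimal illustration, assuming a Poisson loglinear model $\log \mu_i = \beta_0 + \beta_1 x_i$ with two freedom parameters, so that the derivative matrix is 2 x 2 and can be inverted in closed form; the function names and the starting value are assumptions, not the dissertation's algorithm.

```python
import math


def score(beta, x, y):
    """Poisson loglinear score s(beta; y) = X'(y - mu) for X = [1, x]."""
    b0, b1 = beta
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    s0 = sum(yi - mi for yi, mi in zip(y, mu))
    s1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
    return s0, s1


def newton_raphson_poisson(x, y, tol=1e-10, max_iter=50):
    """Solve the score equations by Newton-Raphson (2 x 2 derivative matrix)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the common-mean fit
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        s0, s1 = score((b0, b1), x, y)
        # Negative derivative matrix -ds/dbeta' = X' D(mu) X = [[a, b], [b, c]].
        a = sum(mu)
        b = sum(xi * mi for xi, mi in zip(x, mu))
        c = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = a * c - b * b
        # Newton step: beta_new = beta + (X' D(mu) X)^{-1} s(beta; y).
        d0 = (c * s0 - b * s1) / det
        d1 = (a * s1 - b * s0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1
```

The only matrix inverted has the dimension of the freedom parameter vector, which is the computational advantage described above.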
Constraint Equations Approach. In many situations, it may be difficult to
specify a model in terms of only freedom parameters or perhaps it is possible
but the researcher would like to treat the model parameters symmetrically,
which would necessitate an additional constraint equation. It also could be
that the function $C\log Ae^{\xi}$ is not a 1:1 function of $\xi$ so that for given $\beta$, we
cannot solve for $\xi$ explicitly. In any of these cases, we may not be able to
use the aforementioned maximization approach.
In this section, we consider an alternative method for finding that $\xi$
that maximizes the function $\ell^{(M)}(\xi; y)$ subject to $\xi \in \Theta_h$. The method we
will use is Lagrange's method of undetermined multipliers. Aitchison and
Silvey (1958, 1960) and Silvey (1959) provide much of the essential underlying
theory related to this approach. Three positive features of this method
include (i) estimation of both $\xi$ and $\beta$ is possible, (ii) the method provides
us with another enlightening way of deriving the asymptotic distribution
of the freedom parameter estimators, and (iii) the method works quite
generally. A negative feature of this approach is the computational difficulty.
Computationally, the method becomes burdensome as s, the number of log
mean parameters, and $u + l + K$, the number of constraints implied by the
model, become large. In fact, the algorithm involves the inversion of an
$(s + u + l) \times (s + u + l)$ matrix. One positive note is that this potentially very
large matrix does have a simple form and one can invoke some simple matrix
algebra results to reduce the inversion problem to one of inverting matrices
of dimensions $(u + l) \times (u + l)$ and $s \times s$.
To best illustrate the difference in computational difficulty of the two
methods, we consider the following normal linear model example. Let
$$Y_i \sim \text{ind } N(\mu_i = \beta_0 + \beta_1 x_i,\ \sigma^2), \quad i = 1, \ldots, 100.$$
The log likelihood can easily be written as a function of $\beta = (\beta_0, \beta_1)'$.
Maximizing this likelihood with respect to $\beta$ involves working with a 2 x 2
matrix. On the other hand, we could equivalently specify the linear model in
terms of the 98 constraints,
$$\frac{\mu_{i+1} - \mu_i}{x_{i+1} - x_i} = \frac{\mu_{i+2} - \mu_{i+1}}{x_{i+2} - x_{i+1}}, \quad i = 1, 2, \ldots, 98,$$
and use Lagrange's method. In this case, we would need to invert a matrix
which has dimension $(s + u + l) \times (s + u + l) = 198 \times 198$.
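The constraint form of the linear model can be checked directly. This short sketch, with a hypothetical function name, evaluates the slope-equality constraints for a mean vector; it assumes the covariate values are distinct and ordered so that no denominator vanishes.

```python
def slope_constraints(mu, x):
    """Values of the slope-equality constraints; all zero iff mu is linear in x."""
    return [
        (mu[i + 1] - mu[i]) / (x[i + 1] - x[i])
        - (mu[i + 2] - mu[i + 1]) / (x[i + 2] - x[i + 1])
        for i in range(len(mu) - 2)
    ]
```

For 100 means this returns 98 constraint values, so the Lagrangian system has 100 + 98 = 198 unknowns, compared with the two freedom parameters of the equivalent freedom-equation form.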
Even when we use the matrix algebra results that simplify the problem
of working with the 198 x 198 matrix, we still are left with a formidable task.
It seems that when s is large and the model is parsimonious, i.e. $u + l + K$,
the number of constraints, is large, the undetermined multiplier method may
not be the method of choice. However, in time, as computer efficiency gains
are realized, we predict that the scope of candidate models to be fit using
this method will increase tremendously. In fact, at present, many categorical
models can easily be fit using Lagrange's method. We discuss in more detail
how we can use the method of undetermined multipliers to fit models like
$[\Theta_h]$ of (2.3.5).
We are to maximize the function $\ell^{(M)}(\xi; y) = y'\xi$, subject to the constraint
$\xi \in \Theta_h$, where
$$\begin{aligned}
\Theta_h &= \{\xi \in R^s : U'C\log(Ae^{\xi}) = 0,\ Le^{\xi} = 0,\ e^{\xi'}(\oplus_1^K 1_R) - n' = 0\} \\
&= \{\xi \in R^s : h(\xi) = 0,\ e^{\xi'}(\oplus_1^K 1_R) = n'\},
\end{aligned}$$
and $h'(\xi) = [\log(e^{\xi'}A')C'U,\ e^{\xi'}L']$.
Consider the Lagrangian objective function
$$F(\gamma) = \ell^{(M)}(\xi; y) + (e^{\xi'}(\oplus_1^K 1_R) - n')\tau + h'(\xi)\lambda,$$
where $\gamma = \text{vec}(\xi, \tau, \lambda)$. The $K \times 1$ vector $\tau$ and the $(u + l) \times 1$ vector $\lambda$ are
called either 'Lagrange multipliers' or 'undetermined multipliers'.
Provided a maximum exists and that the Jacobian of $[e^{\xi'}(\oplus_1^K 1_R) -
n', h'(\xi)]$ is of full row rank $u + l + K$ for all $\xi \in \Theta_h$, we can solve for the
maximum by solving the system of equations
$$\frac{\partial F(\hat{\gamma}^{(M)})}{\partial \gamma} = \begin{pmatrix} y + D(e^{\hat{\xi}^{(M)}})(\oplus_1^K 1_R)\hat{\tau}^{(M)} + H(\hat{\xi}^{(M)})\hat{\lambda}^{(M)} \\ (\oplus_1^K 1_R')e^{\hat{\xi}^{(M)}} - n \\ h(\hat{\xi}^{(M)}) \end{pmatrix} = 0, \tag{2.3.7}$$
where the matrix $H(\xi) = \partial h'(\xi)/\partial \xi$. The Jacobian condition basically
requires the constraints to be nonredundant, thereby making $[\Theta_h]$ a well
defined model.
From this point on, for notational convenience, the indices for the direct
sum will be omitted unless they are different from 1 and K.
We now require the matrices of models $[\Theta_X]$ and $[\Theta_h]$ to satisfy some
additional conditions. Let us assume that
(A2) Either $C_i = I_{q_i K}$ or $C_i(\oplus 1_{m_i}) = 0$, $i = 1, 2$
and
(A3) If $C_i = I_{q_i K}$ then $\mathcal{M}(X_i) \supseteq \mathcal{M}(\oplus 1_{m_i})$.
The assumptions require $C_i$ to be either a contrast matrix (rows sum
to zero), a zero matrix, or the identity matrix. If $C_i$ is the identity matrix,
it will be required that there exists a set of columns in $X_i$ that spans a
space containing the range space of $\oplus_1^K 1_{m_i}$. For most models of interest
these conditions are met. For example, any of the logit type models, such as
cumulative or multiple logit models, can be specified with C being a contrast
matrix. For loglinear models, the condition (A3) is met whenever the model
includes a parameter for each of the K multinomials.
The following lemma will be useful in showing that the maximum
likelihood estimates of $\xi$ and $\beta$ are equivalent under both sampling schemes,
product-Poisson and product-multinomial. The lemma will also enable us to
reduce the number of equations in (2.3.7) that must be simultaneously solved
when computing the maximum (multinomial) likelihood estimators.
LEMMA 2.3.1. If the matrices of models $[\Theta_X]$ and $[\Theta_h]$ satisfy (A1), (A2),
and (A3), then provided the model holds
$$(\oplus_1^K 1_R')H(\xi) = 0.$$
Proof. Using matrix derivatives (MacRae, 1974; Magnus and Neudecker,
1988), it follows that
$$H(\xi) = [D(e^{\xi})A'D^{-1}(Ae^{\xi})C'U,\ D(e^{\xi})L'].$$
Thus,
$$\begin{aligned}
(\oplus 1_R')H(\xi) &= [(\oplus e^{\xi_i'})A'D^{-1}(Ae^{\xi})C'U,\ (\oplus e^{\xi_i'})L'] \\
&= [(\oplus e^{\xi_i'})[A_1', A_2'](D^{-1}(A_1e^{\xi}) \oplus D^{-1}(A_2e^{\xi}))(C_1' \oplus C_2')U,\ \oplus e^{\xi_i'}L_1'] \\
&= [[(\oplus e^{\xi_i'})A_1'D^{-1}(A_1e^{\xi}),\ (\oplus e^{\xi_i'})A_2'D^{-1}(A_2e^{\xi})](C_1' \oplus C_2')U,\ 0] \\
&= [[(\oplus e^{\xi_i'}A_{11}')D^{-1}(A_1e^{\xi})C_1',\ (\oplus e^{\xi_i'}A_{21}')D^{-1}(A_2e^{\xi})C_2']U,\ 0] \\
&= [[(\oplus 1_{m_1}')C_1',\ (\oplus 1_{m_2}')C_2']U,\ 0] \\
&= [0, 0] \\
&= 0.
\end{aligned}$$
The third equality follows since the model holding implies that $\oplus e^{\xi_i'}L_1' = 0$.
The sixth equality can be seen via the following argument.
If both $C_i$'s are contrast matrices, or zero matrices, then (A2) implies
that the matrix $[(\oplus 1_{m_1}')C_1',\ (\oplus 1_{m_2}')C_2']$ is the zero matrix. On the other hand,
if both $C_1$ and $C_2$ are identity matrices, then the columns of U span
the null space of $X'$, which, by (A3), implies that the columns of U span a
set contained in the null space of
$$(\oplus 1_{m_1}') \oplus (\oplus 1_{m_2}'),$$
so we have that $[(\oplus 1_{m_1}'),\ (\oplus 1_{m_2}')]U = 0$. Any other combination of $C_1$ and $C_2$
can also be seen to result in the matrix equaling zero. $\Box$
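The following theorem gives conditions under which we can find the ML
estimators of $\xi$ by solving a reduced set of equations. The smaller system of
equations no longer includes the identifiability constraint equations.
THEOREM 2.3.1. Let $\text{vec}(\hat{\xi}^{(M)}, \hat{\tau}^{(M)}, \hat{\lambda}^{(M)})$ be the solution to (2.3.7).
Assuming that (A1), (A2), and (A3) hold, the subvector $\text{vec}(\hat{\xi}^{(M)}, \hat{\lambda}^{(M)})$
is the solution to the reduced set of $s + u + l$ equations
$$\begin{pmatrix} y - e^{\hat{\xi}^{(M)}} + H(\hat{\xi}^{(M)})\hat{\lambda}^{(M)} \\ h(\hat{\xi}^{(M)}) \end{pmatrix} = 0. \tag{2.3.8}$$
Proof: Premultiplying the first set of equations in (2.3.7) by $\oplus 1_R'$, we arrive
at
$$(\oplus 1_R')y + (\oplus 1_R')D(e^{\hat{\xi}^{(M)}})(\oplus 1_R)\hat{\tau}^{(M)} + (\oplus 1_R')H(\hat{\xi}^{(M)})\hat{\lambda}^{(M)} = 0. \tag{2.3.9}$$
Now, $(\oplus 1_R')y = n$ and $(\oplus 1_R')D(e^{\hat{\xi}^{(M)}}) = \oplus e^{\hat{\xi}_i^{(M)\prime}}$. Also, since $\hat{\xi}^{(M)} \in \Theta_h$ it must
be that $(\oplus e^{\hat{\xi}_i^{(M)\prime}})(\oplus 1_R) = D(n)$, the diagonal matrix with the multinomial
indices on the diagonal. Further, by Lemma 2.3.1,
$(\oplus 1_R')H(\hat{\xi}^{(M)}) = 0$. Therefore, (2.3.9) can be rewritten as
$$n + D(n)\hat{\tau}^{(M)} = 0,$$
which implies that $\hat{\tau}^{(M)} = -1_K$. Now, since the identifiability constraints have
been explicitly accounted for when solving for $\hat{\tau}^{(M)}$, we can replace $\hat{\tau}^{(M)}$ of
(2.3.7) by $-1_K$ and omit the identifiability constraints. Thus, $\text{vec}(\hat{\xi}^{(M)}, \hat{\lambda}^{(M)})$
is the solution to the reduced set of equations
$$\begin{pmatrix} y - e^{\hat{\xi}^{(M)}} + H(\hat{\xi}^{(M)})\hat{\lambda}^{(M)} \\ h(\hat{\xi}^{(M)}) \end{pmatrix} = 0.$$
This is what we set out to show.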
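The reduced system (2.3.8) can be sketched for a small case. The example below assumes a single multinomial (K = 1) over the four cells of a 2 x 2 table and imposes independence through the contrast constraint $h(\xi) = c'\xi = 0$ with $c = (1, -1, -1, 1)'$, so that $H(\xi) = c$. It uses a plain Newton iteration with a hand-written linear solver; this is an illustration under those assumptions, not the modified Newton-Raphson scheme of the dissertation, and all function names are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_reduced_system(y, c, tol=1e-10, max_iter=50):
    """Newton iteration for y - exp(xi) + c * lam = 0, c' xi = 0."""
    s = len(y)
    xi = [math.log(v) for v in y]  # start at the saturated fit (needs y > 0)
    lam = 0.0
    for _ in range(max_iter):
        mu = [math.exp(v) for v in xi]
        g = [y[i] - mu[i] + c[i] * lam for i in range(s)]
        g.append(sum(c[i] * xi[i] for i in range(s)))
        if max(abs(v) for v in g) < tol:
            break
        # Jacobian of g with respect to (xi, lam): [[-D(mu), c], [c', 0]].
        jac = [[(-mu[i] if i == j else 0.0) for j in range(s)] + [float(c[i])]
               for i in range(s)]
        jac.append([float(v) for v in c] + [0.0])
        step = solve_linear(jac, [-v for v in g])
        xi = [xi[i] + step[i] for i in range(s)]
        lam += step[s]
    return [math.exp(v) for v in xi], lam
```

Because the entries of c sum to zero, summing the first s equations forces the fitted means to add up to n even though the identifiability constraint is never imposed, which is the content of Theorem 2.3.1 in this small case.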
Before detailing the iterative scheme used for solving (2.3.8), we will
explore the asymptotic behavior of the estimator $\hat{\theta}^{(M)} = \text{vec}(\hat{\xi}^{(M)}, \hat{\lambda}^{(M)})$
within the framework of constraint models.
2.3.3 Asymptotic Distribution of Product-Multinomial ML Estimators
In what follows, we will assume that K, the number of identifiability
constraints, is some fixed integer, $K \geq 1$. We also will assume that the
asymptotics hold as $n_* = \min\{n_i\}$ approaches infinity and that $n_* \sim n_i$, $i =
1, \ldots, K$. That is, we assume that the asymptotic approximations hold as
each of the multinomial indices gets large at the same rate.
The derivation of the asymptotic distribution of $\hat{\theta}^{(M)}$ will follow closely
that of Aitchison and Silvey (1958). Briefly, Aitchison and Silvey show that
if the score vector is $o_p(n)$ and the constraints are such that the derivative
matrices $H(\xi)$ and $\partial H'(\xi)/\partial \xi$ have elements that are bounded functions then,
provided certain mild regularity conditions hold, the maximum likelihood
estimator $\hat{\xi}$ is an $n^{1/2}$-consistent estimator of $\xi_0$ and $\hat{\lambda}$ is an $n^{1/2}$-consistent
estimator of 0. They show that the joint distribution of $(n^{1/2}(\hat{\xi} - \xi_0), n^{-1/2}\hat{\lambda})$
is multivariate normal with zero mean and covariance matrix
$$\begin{pmatrix} B^{-1} - B^{-1}H(H'B^{-1}H)^{-1}H'B^{-1} & 0 \\ 0 & (H'B^{-1}H)^{-1} \end{pmatrix},$$
where B is the information matrix and H is the derivative of the constraint
function.
In our application, however, there are some minor changes. With the
parameterization we use, the information matrix is zero since the (multinomial)
log likelihood (2.3.2) is linear in the parameter $\xi$. This happens because the
identifiability constraints $e^{\xi'}(\oplus_1^K 1_R) = n'$ are ignored, to preserve symmetry,
when differentiating. Also, in our parameterization, the constraints are in
terms of $e^{\xi}$, the components of which are $e^{\xi_{ij}} = n_i\pi_{ij}$. Thus, the constraints
and the corresponding derivative matrices may not be bounded. For example,
a typical constraint is of the form $Le^{\xi} = 0$. It follows that the components
of $Le^{\xi}$ and the derivatives are increasing without bound as the multinomial
indices are allowed to increase without bound.
Fortunately, we can still use the results of Aitchison and Silvey (1958)
by replacing the matrix H and the vector $\lambda/n$ of Aitchison and Silvey by
our $H/n_*$ and $\lambda$, where $n_* = \min\{n_i\}$. The zero information problem can be
solved by identifying the vector $Y - e^{\xi}$ as the 'score vector'. It is pointed out
that, in this case, the asymptotic variance of $\oplus D^{-1/2}(n_i 1_R)$ times the score
vector is not equal to the negative derivative matrix $D(\pi_0)$ but instead is
equal to $D(\pi_0) - \oplus \pi_{0i}\pi_{0i}'$. This happens because the components of Y are not
independent; Y is product multinomial. Using this reparameterization, all of
the necessary assumptions required by Aitchison and Silvey (1958) hold, i.e.
assumptions X and of Aitchison and Silvey (1958) hold.
As previously mentioned, Aitchison and Silvey show that $\hat{\lambda}$ is an
$n^{1/2}$-consistent estimator of 0. With our parameterization, having replaced
$\lambda/n$ by $\lambda$, it follows that $\hat{\lambda}^{(M)}$ will be $n_*^{1/2}$-consistent. We now derive the
asymptotic distribution of $\hat{\theta}^{(M)}$.
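Define the stochastic function g by
$$g(\theta; Y) = \begin{pmatrix} Y - e^{\xi} + H(\xi)\lambda \\ h(\xi) \end{pmatrix}.$$
The maximum likelihood estimator $\hat{\theta}^{(M)}$ is the solution to $g(\theta; Y) = 0$.
Under our parameterization, using the results of Aitchison and Silvey
(1958), we have that each of the following hold
$$\begin{aligned}
e^{\hat{\xi}^{(M)}} - e^{\xi_0} &= D(e^{\xi_0})(\hat{\xi}^{(M)} - \xi_0) + O_p(1), \\
H(\hat{\xi}^{(M)}) &= H(\xi_0) + O_p(n_*^{1/2}), \\
h(\hat{\xi}^{(M)}) &= h(\xi_0) + H'(\xi_0)(\hat{\xi}^{(M)} - \xi_0) + O_p(1) \\
&= H'(\xi_0)(\hat{\xi}^{(M)} - \xi_0) + O_p(1),
\end{aligned}$$
and
$$H(\hat{\xi}^{(M)})\hat{\lambda}^{(M)} = H(\xi_0)\hat{\lambda}^{(M)} + O_p(1).$$
Thus,
$$0 = g(\hat{\theta}^{(M)}; Y) = \begin{pmatrix} Y - e^{\hat{\xi}^{(M)}} + H(\hat{\xi}^{(M)})\hat{\lambda}^{(M)} \\ h(\hat{\xi}^{(M)}) \end{pmatrix}$$
can be rewritten as
$$\begin{aligned}
0 &= \begin{pmatrix} Y - e^{\xi_0} - D(e^{\xi_0})(\hat{\xi}^{(M)} - \xi_0) + H(\xi_0)\hat{\lambda}^{(M)} + O_p(1) \\ H'(\xi_0)(\hat{\xi}^{(M)} - \xi_0) + O_p(1) \end{pmatrix} \\
&= \begin{pmatrix} Y - e^{\xi_0} \\ 0 \end{pmatrix} - \begin{pmatrix} D(e^{\xi_0}) & -H(\xi_0) \\ -H'(\xi_0) & 0 \end{pmatrix} \begin{pmatrix} \hat{\xi}^{(M)} - \xi_0 \\ \hat{\lambda}^{(M)} \end{pmatrix} + O_p(1).
\end{aligned}$$
Therefore, it follows that
$$\begin{pmatrix} \oplus D^{-1/2}(n_i 1_R)(Y - e^{\xi_0}) \\ 0 \end{pmatrix} = \begin{pmatrix} D(\pi_0) & -H(\xi_0)/n_* \\ -H'(\xi_0)/n_* & 0 \end{pmatrix} n_*^{1/2}(\hat{\theta}^{(M)} - \theta_0) + O_p(n_*^{-1/2}), \tag{2.3.10}$$
since $n_* \sim n_i$, $i = 1, \ldots, K$, and $\pi_0 = (\oplus D^{-1}(n_i 1_R))e^{\xi_0}$.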
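Now, the random variable $\oplus_{1}^{K} D^{-1/2}(n_i 1_R)(Y - e^{\xi_0})$ is a vector of normalized
sample proportions so that
\[ \begin{pmatrix} \oplus_{1}^{K} D^{-1/2}(n_i 1_R)(Y - e^{\xi_0}) \\ 0 \end{pmatrix} \]
has an asymptotic normal distribution with zero mean and covariance matrix
\[ \begin{pmatrix} D(\pi_0) - \oplus_{1}^{K} \pi_{0i}\pi_{0i}' & 0 \\ 0 & 0 \end{pmatrix}. \]
Therefore, by an extension of a theorem of Cramér (1949) and by equation
(2.3.10), it follows that $n_*^{1/2}(\hat{\theta}^{(M)} - \theta_0) = n_*^{1/2}\,\mathrm{vec}(\hat{\xi}^{(M)} - \xi_0, \hat{\lambda}^{(M)})$ has an
asymptotic normal distribution with mean zero and covariance
\[ \begin{pmatrix} D(\pi_0) & -H_0/n_* \\ -H_0'/n_* & 0 \end{pmatrix}^{-1} \begin{pmatrix} D(\pi_0) - \oplus_{1}^{K} \pi_{0i}\pi_{0i}' & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} D(\pi_0) & -H_0/n_* \\ -H_0'/n_* & 0 \end{pmatrix}^{-1}, \tag{2.3.11} \]
where $H_0 = H(\xi_0)$.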
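This covariance matrix is shown in the appendix to have the simple form
\[ \begin{pmatrix} M_1 & 0 \\ 0 & M_2 \end{pmatrix}, \]
where
\[ M_1 = D^{-1}(\pi_0) - D^{-1}(\pi_0)H(H'D^{-1}(\pi_0)H)^{-1}H'D^{-1}(\pi_0) - \oplus_{1}^{K} 1_R 1_R' \]
and
\[ M_2 = n_*^{2}(H'D^{-1}(\pi_0)H)^{-1}. \]
Finally, using the fact that $n_* \sim n_i$, $i = 1, \ldots, K$, we can
replace $n_*$ by the appropriate $n_i$ to arrive at a simple, asymptotically
equivalent, expression for the asymptotic covariance of $\hat{\theta}^{(M)} = \mathrm{vec}(\hat{\xi}^{(M)}, \hat{\lambda}^{(M)})$.
It is
\[ \begin{pmatrix} D^{-1} - D^{-1}H(H'D^{-1}H)^{-1}H'D^{-1} - \oplus_{i=1}^{K} n_i^{-1} 1_R 1_R' & 0 \\ 0 & (H'D^{-1}H)^{-1} \end{pmatrix}, \tag{2.3.12} \]
where $D = D(\mu_0) = D(e^{\xi_0})$ and $H = H(\xi_0)$.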
2.3.4 Lagrange's Method: The Algorithm
In this section, we give details of how one can actually fit the models
of (2.3.4) or equivalently (2.3.5). We show how Lagrange's undetermined
multipliers method can be used in conjunction with a modified Newton-Raphson
iterative scheme to compute the ML estimators and their asymptotic
covariances. We will assume that the model assumptions (A1), (A2), and (A3)
hold. This section includes an outline of the algorithm used in the FORTRAN
program 'mle.restraint'.
Recall that our objective is to find that $\hat{\xi}^{(M)} \in \Theta_X$, where
\[ \Theta_X = \{\xi \in R^s : C\log(Ae^{\xi}) = X\beta,\ Le^{\xi} = 0,\ e^{\xi'}(\oplus_{1}^{K} 1_R) = n'\}, \]
that maximizes the multinomial log likelihood
\[ \ell^{(M)}(\xi; y) = y'\xi. \]
Since the assumptions (A1), (A2), and (A3) hold, we see by Theorem
2.3.1 that our problem is reduced to one of solving the system of equations
(2.3.8), i.e. to find the ML estimator $\hat{\theta}^{(M)} = \mathrm{vec}(\hat{\xi}^{(M)}, \hat{\lambda}^{(M)})$ we must
simultaneously solve the system of $s + u + l$ equations
\[ g(\theta) = \begin{pmatrix} y - e^{\xi} + H(\xi)\lambda \\ h(\xi) \end{pmatrix} = 0, \]
where the $(u + l) \times 1$ vector $h$ and the $s \times (u + l)$ matrix $H$ are defined as
follows.
\[ h(\xi) = \begin{pmatrix} U'C\log(Ae^{\xi}) \\ Le^{\xi} \end{pmatrix} \]
and
\[ H(\xi) = \frac{\partial h'(\xi)}{\partial \xi}. \]
It will be shown in section 2.4 that $g(\theta)$ is actually the derivative
of the Lagrangian objective function under the product-Poisson sampling
assumption.
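The iterative scheme used in the FORTRAN program 'mle.restraint' is
a modified Newton-Raphson algorithm. The algorithm can be sketched as
follows.
(1) Find a starting value $\theta^{(0)}$ for $\theta$.
(2) Replace $\theta^{(\nu)}$ by $\theta^{(\nu+1)} = \theta^{(\nu)} - G^{-1}(\theta^{(\nu)})g(\theta^{(\nu)})$. (2.3.13)
(3) If $\|g(\theta^{(\nu+1)})\| > tol$ go to (2). Else stop.
The matrix $G(\theta)$ used in step (2) is actually
\[ G(\theta) = \frac{\partial g(\theta)}{\partial \theta'} + O_p(n_*^{1/2}) = \begin{pmatrix} -D(e^{\xi}) & H(\xi) \\ H'(\xi) & 0 \end{pmatrix} \]
and the inverse of $G(\theta)$ is of the very simple form (see Aitchison and Silvey,
1958 or Rao, 1974)
\[ G^{-1}(\theta) = \begin{pmatrix} -D^{-1} + D^{-1}H(H'D^{-1}H)^{-1}H'D^{-1} & D^{-1}H(H'D^{-1}H)^{-1} \\ (H'D^{-1}H)^{-1}H'D^{-1} & (H'D^{-1}H)^{-1} \end{pmatrix}, \tag{2.3.14} \]
where $D = D(e^{\xi})$. Since we use $G(\theta)$ in place of the Hessian matrix, the
procedure is a modification of the Newton-Raphson method. Haber (1985a)
used the more complicated Hessian matrix.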
Notice that the inversion of $G$, which may be performed at each iteration,
is not nearly as difficult as inverting a general matrix of dimension
$(s + u + l) \times (s + u + l)$. First of all, in view of (2.3.14), to obtain the inverse of the
partitioned matrix $G$, we need only invert the matrices $D$ and $H'D^{-1}H$, which
are of dimension $s \times s$ and $(u + l) \times (u + l)$. Secondly, the inversion of $D$ is
simple since $D$ is a diagonal matrix with $e^{\xi}$ on the diagonal. Hence, the most
formidable task in the inversion process is the inversion of the symmetric
positive definite matrix $H'D^{-1}H$. There are many efficient ways to invert
large symmetric positive definite matrices.
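Upon convergence of the algorithm (2.3.13), estimates of the asymptotic
covariances of $\hat{\xi}^{(M)}$ and $\hat{\lambda}^{(M)}$ are readily calculable. Write $G^{-1}(\hat{\theta})$ of (2.3.14)
as
\[ G^{-1}(\hat{\theta}) = \begin{pmatrix} -P & Q \\ Q' & R \end{pmatrix}, \]
where
\[ P = D^{-1} - D^{-1}H(H'D^{-1}H)^{-1}H'D^{-1}, \]
\[ Q = D^{-1}H(H'D^{-1}H)^{-1}, \]
\[ R = (H'D^{-1}H)^{-1}. \]
By (2.3.12), the asymptotic covariance of $\hat{\theta}^{(M)} = \mathrm{vec}(\hat{\xi}^{(M)}, \hat{\lambda}^{(M)})$ can be
estimated by
\[ \widehat{\mathrm{var}}(\hat{\theta}^{(M)}) = \begin{pmatrix} P - \oplus_{i=1}^{K} n_i^{-1} 1_R 1_R' & 0 \\ 0 & R \end{pmatrix}. \]
Variance estimates for other continuous functions of $\hat{\xi}^{(M)}$, such as
$\hat{\mu}^{(M)} = e^{\hat{\xi}^{(M)}}$ and $\hat{\beta}^{(M)} = (X'X)^{-1}X'C\log(Ae^{\hat{\xi}^{(M)}})$, can be found by invoking
the delta method. For example,
\[ \mathrm{var}(\hat{\mu}^{(M)}) = D(e^{\hat{\xi}^{(M)}})\,\mathrm{var}(\hat{\xi}^{(M)})\,D(e^{\hat{\xi}^{(M)}}) \]
and
\[ \mathrm{var}(\hat{\beta}^{(M)}) = (X'X)^{-1}X'C\,D^{-1}(A\hat{\mu}^{(M)})\,A\,\mathrm{var}(\hat{\mu}^{(M)})\,A'\,D^{-1}(A\hat{\mu}^{(M)})\,C'X(X'X)^{-1}. \]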
Evidently, Lagrange's method of undetermined multipliers provides us
with a convenient procedure for maximum likelihood fitting of models in a
very general class of parametric models for multivariate polytomous data with
covariates possible. We now briefly outline the steps needed to perform the
iterations of (2.3.13).
Computing U. The first thing we must do is write the freedom model (2.3.4),
which can easily be input by the user, as a constraint model (2.3.5). Therefore,
we must compute a full column rank matrix U that satisfies U'X = 0. The
method we use to find U is attributed to Haber (1985b).
Using the notation of 'mle.restraint', let $X$ be a full column rank matrix
of dimension $q \times r$. Let $u = q - r$ be the dimension of the null space of $X'$.
Further, the matrices $A$ and $C$ of (2.3.4) will have dimensions $m \times s$ and $q \times m$
respectively. The relationship between these dimension variables and those
used in sections 2.3.1 and 2.3.2 is as follows:
\[ q = K(q_1 + q_2), \qquad r = p_1 + p_2, \qquad m = K(m_1 + m_2). \]
We use the variables $q$, $r$, and $m$ for notational convenience.
Consider the matrix $U^* = I_q - X(X'X)^{-1}X'$. This $q \times q$ matrix is of rank
$u = q - r$ and satisfies the property
\[ U^{*\prime}X = 0. \]
Let $W$ denote a $q \times u$ matrix with random elements. Specifically,
\[ W_{ij} \sim \mathrm{Uniform}(0, 100), \quad i = 1, \ldots, q, \ j = 1, \ldots, u. \]
It follows that the matrix $W$ is of full column rank with probability one and
hence that the $q \times u$ matrix $U = U^*W$ is of full column rank $u$ with probability
one. But the matrix $U$ satisfies
\[ U'X = W'U^{*\prime}X = W'0 = 0. \]
Therefore, at least with probability one, we have found a full column rank
matrix U that satisfies the property U'X = 0. Using this U, we are able to
write freedom model (2.3.4) as a constraint model (2.3.5).
Computing $h(\xi)$. We write the constraint model of (2.3.5) as
\[ \{\xi \in R^s : h(\xi) = 0,\ e^{\xi'}(\oplus_{1}^{K} 1_R) = n'\}, \tag{2.3.15} \]
where the constraint function $h$ is defined as
\[ h(\xi) = \begin{pmatrix} U'C\log(Ae^{\xi}) \\ Le^{\xi} \end{pmatrix}. \]
Computing $g(\theta)$. Notice that since (A1), (A2), and (A3) hold, the
identifiability constraints present in the product multinomial model (2.3.4)
can be accounted for explicitly. It will follow by results of section 2.4 that
under either sampling scheme, product-Poisson or product-multinomial,
the maximum likelihood estimators for $\xi$ and $\lambda$ can be found by solving the
equation
\[ g(\theta) = \begin{pmatrix} y - e^{\xi} + H(\xi)\lambda \\ h(\xi) \end{pmatrix} = 0, \tag{2.3.16} \]
where the matrix $H$ is the derivative of $h'$ with respect to $\xi$.
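Computing $H(\xi)$. We will use matrix derivative results of MacRae (1974)
to find the matrix of derivatives of the constraint function $h'(\xi)$.
\[ H(\xi) = \frac{\partial h'(\xi)}{\partial \xi} = \frac{\partial}{\partial \xi}\left[\log(e^{\xi'}A')C'U,\ e^{\xi'}L'\right]
 = \left[D(e^{\xi})A'D^{-1}(Ae^{\xi})C'U,\ D(e^{\xi})L'\right]. \]
The equality follows upon using the matrix version of the chain rule. Notice
that
\[ \frac{\partial}{\partial \xi}\left(\log(e^{\xi'}A')C'U\right) = \frac{\partial e^{\xi'}}{\partial \xi}\,\frac{\partial}{\partial e^{\xi}}\left(\log(e^{\xi'}A')C'U\right)
 = D(e^{\xi})\left[\frac{\partial (e^{\xi'}A')}{\partial e^{\xi}}\,\frac{\partial \log(e^{\xi'}A')}{\partial (Ae^{\xi})}\right]C'U
 = D(e^{\xi})A'D^{-1}(Ae^{\xi})C'U \]
and that
\[ \frac{\partial e^{\xi'}L'}{\partial \xi} = \frac{\partial e^{\xi'}}{\partial \xi}\,\frac{\partial e^{\xi'}L'}{\partial e^{\xi}} = D(e^{\xi})L'. \]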
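The closed form for H can be checked numerically against finite differences of h. The sketch below does this for a 2 x 2 table; the matrices `A`, `C`, `U`, `L` and the point `xi` are illustrative assumptions (marginal totals, marginal logits, one contrast, and one linear constraint), and the function names are hypothetical.

```python
import numpy as np

def h_fun(xi, A, C, U, L):
    """Constraint function h(xi) = (U'C log(A e^xi), L e^xi)."""
    mu = np.exp(xi)
    return np.concatenate([U.T @ C @ np.log(A @ mu), L @ mu])

def H_fun(xi, A, C, U, L):
    """H(xi) = [D(e^xi) A' D^{-1}(A e^xi) C'U, D(e^xi) L'], an s x (u + l) matrix."""
    mu = np.exp(xi)
    # Entry (j, k) of D(mu) A' D^{-1}(A mu) is mu_j * A[k, j] / (A mu)_k.
    left = ((mu[:, None] * A.T) / (A @ mu)[None, :]) @ C.T @ U
    right = mu[:, None] * L.T
    return np.hstack([left, right])

def numerical_H(xi, A, C, U, L, eps=1e-6):
    """Central-difference approximation; row j holds d h'/d xi_j."""
    rows = []
    for j in range(xi.size):
        step = np.zeros_like(xi)
        step[j] = eps
        rows.append((h_fun(xi + step, A, C, U, L)
                     - h_fun(xi - step, A, C, U, L)) / (2 * eps))
    return np.array(rows)

# Illustrative 2 x 2 table, cells ordered (11, 12, 21, 22).
A = np.array([[1.0, 1.0, 0.0, 0.0],    # row 1 total
              [0.0, 0.0, 1.0, 1.0],    # row 2 total
              [1.0, 0.0, 1.0, 0.0],    # column 1 total
              [0.0, 1.0, 0.0, 1.0]])   # column 2 total
C = np.array([[1.0, -1.0, 0.0, 0.0],   # row marginal logit
              [0.0, 0.0, 1.0, -1.0]])  # column marginal logit
U = np.array([[1.0], [-1.0]])          # equates the two marginal logits
L = np.array([[0.0, 1.0, -1.0, 0.0]])  # an arbitrary linear constraint
xi = np.log(np.array([10.0, 20.0, 30.0, 40.0]))

H_exact = H_fun(xi, A, C, U, L)
H_approx = numerical_H(xi, A, C, U, L)
print(np.max(np.abs(H_exact - H_approx)))  # should be small
```

A check of this kind is a cheap safeguard when A, C, or L are entered by hand for a nonstandard model.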
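Computing $G(\theta)$. The iterative scheme (2.3.13) used to solve the system
of equations (2.3.16) is actually a slight modification of the Newton-Raphson
algorithm. It is a modification because we do not use the derivative matrix
$G^* = \partial g(\theta)/\partial \theta'$ to adjust at each iteration, as Haber (1985a) did, but rather a
simpler matrix $G$ that is related to $G^*$ by $G^* = G + O_p(n_*^{1/2})$. The derivative
matrix $G^*$ can be computed as follows.
\[ G^*(\theta) = \frac{\partial g(\theta)}{\partial \theta'} = \left[\frac{\partial g(\theta)}{\partial \xi'},\ \frac{\partial g(\theta)}{\partial \lambda'}\right]
 = \begin{pmatrix} -D(e^{\xi}) + \dfrac{\partial H(\xi)\lambda}{\partial \xi'} & H(\xi) \\ H'(\xi) & 0 \end{pmatrix}
 = \begin{pmatrix} -D(e^{\xi}) & H(\xi) \\ H'(\xi) & 0 \end{pmatrix} + \begin{pmatrix} \dfrac{\partial H(\xi)\lambda}{\partial \xi'} & 0 \\ 0 & 0 \end{pmatrix}. \]
The matrix
\[ \frac{\partial H(\xi)\lambda}{\partial \xi'} \]
is of order $O_p(n_*^{1/2})$ when it is evaluated at $\hat{\theta} = \mathrm{vec}(\hat{\xi}, \hat{\lambda})$ since
\[ \frac{\partial H(\hat{\xi})}{\partial \xi'} = O_p(n_*) \]
and
\[ \hat{\lambda} = O_p(n_*^{-1/2}). \]
It follows that the matrix $G$, which is much simpler to invert than $G^*$, can
be used to adjust the estimate at each iteration.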
Computing the inverse of $G$. Although the matrix $G$ is of dimension
$(s + u + l) \times (s + u + l)$, which may be very large in practice, its inverse
is relatively simple to calculate. The inverse of the partitioned matrix
\[ G = \begin{pmatrix} -D & H \\ H' & 0 \end{pmatrix} \]
is shown by Aitchison and Silvey (1958) to have form
\[ G^{-1} = \begin{pmatrix} -D^{-1} + D^{-1}H(H'D^{-1}H)^{-1}H'D^{-1} & D^{-1}H(H'D^{-1}H)^{-1} \\ (H'D^{-1}H)^{-1}H'D^{-1} & (H'D^{-1}H)^{-1} \end{pmatrix}. \]
Therefore, only the matrices $D$ and $H'D^{-1}H$, which are of dimensions
$s \times s$ and $(u + l) \times (u + l)$, need to be inverted. The inverse of $D$ is easily
calculated since $D$ is a diagonal matrix with $e^{\xi}$ on the diagonal. The inverse
of $H'D^{-1}H$, a symmetric positive definite matrix, can be found quite easily,
even when $u + l$, the number of constraints, is large. It should be pointed
out that when $s$, the total number of cell means, is large, the number of
constraints $u + l$ may be large and on the same order as $s$. This will be the
case for parsimonious models, i.e. those models with many constraints relative
to the number of model parameters.
One could choose to invert the matrix G a limited number of times to
mitigate the computational burden. In fact, in their 1958 and 1960 papers,
Aitchison and Silvey advocate an iterative method whereby the inverse of G
is computed only two times: once at the initial iteration and again at the
final iteration, upon convergence. We feel, however, that in this special case
in which the matrix G has a particularly simple form, the inverse can be
computed at each iteration. Along with increased computing power, there
are many efficient algorithms for inverting large symmetric positive definite
matrices.
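The iteration (2.3.13) together with the partitioned inverse (2.3.14) can be sketched as follows. This is a minimal illustration, not the 'mle.restraint' program: the function name, the starting value (the saturated fit, which assumes all counts are positive), the stopping rule on the largest absolute component of g, and the 2 x 2 independence example are all assumptions made here. In this example h is linear in the log means, so H is constant and G coincides with the full derivative matrix; for nonlinear h the same code performs the modified step that drops the derivative of H times the multiplier vector.

```python
import numpy as np

def modified_newton_raphson(y, h_fun, H_fun, tol=1e-8, max_iter=50):
    """Solve g(theta) = (y - e^xi + H lam, h(xi)) = 0 via theta <- theta - G^{-1} g."""
    xi = np.log(y)                       # assumed starting value; needs y > 0
    lam = np.zeros(h_fun(xi).size)
    for _ in range(max_iter):
        mu = np.exp(xi)
        H = H_fun(xi)
        g1 = y - mu + H @ lam
        g2 = h_fun(xi)
        # Blocks of G^{-1} = [[-P, Q], [Q', R]]; D = D(mu) is diagonal.
        DinvH = H / mu[:, None]          # D^{-1} H
        R = np.linalg.inv(H.T @ DinvH)   # (H' D^{-1} H)^{-1}
        Q = DinvH @ R                    # D^{-1} H (H' D^{-1} H)^{-1}
        if max(np.abs(g1).max(), np.abs(g2).max()) < tol:
            return xi, lam, R
        P_g1 = g1 / mu - Q @ (DinvH.T @ g1)   # P g1 without forming P
        xi = xi - (-P_g1 + Q @ g2)
        lam = lam - (Q.T @ g1 + R @ g2)
    raise RuntimeError("iteration did not converge")

# Illustrative 2 x 2 table, cells ordered (11, 12, 21, 22); constraint: log odds ratio = 0.
y = np.array([25.0, 15.0, 20.0, 40.0])
c = np.array([1.0, -1.0, -1.0, 1.0])
xi_hat, lam_hat, R_hat = modified_newton_raphson(
    y,
    h_fun=lambda xi: np.array([c @ xi]),
    H_fun=lambda xi: c[:, None],
)
mu_hat = np.exp(xi_hat)
lm = float(lam_hat @ np.linalg.inv(R_hat) @ lam_hat)   # lam'(H'D^{-1}H)lam
x2 = float(np.sum((y - mu_hat) ** 2 / mu_hat))         # Pearson statistic
print(mu_hat, lm, x2)
```

For this table the constrained fit is the usual independence fit, with fitted counts 18, 22, 27, and 33, and the Lagrange multiplier statistic agrees with the Pearson statistic, as Theorem 2.5.2 later in the chapter asserts. Only the diagonal of D and the small matrix H'D^{-1}H are ever inverted.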
2.4 Comparison of Product-Multinomial and Product-Poisson Estimators
We begin this section by introducing notation for a product-Poisson
random vector.
The $s \times 1$ random vector $Y = \mathrm{vec}(Y_1, \ldots, Y_K)$ is said to be product-Poisson
if
\[ Y_{ij} \sim \text{ind Poisson}(e^{\xi_{ij}}), \quad i = 1, \ldots, K, \ j = 1, \ldots, R. \tag{2.4.1} \]
Suppose that the $s = RK$ log means $\{\xi_{ij}\}$ satisfy the model $[\Theta_X^{(P)}]$ where
\[ \Theta_X^{(P)} = \{\xi \in R^s : C\log(Ae^{\xi}) = X\beta,\ Le^{\xi} = 0\} \]
or equivalently, for appropriately chosen $U$,
\[ \Theta_X^{(P)} = \Theta_h^{(P)} = \{\xi \in R^s : U'C\log(Ae^{\xi}) = 0,\ Le^{\xi} = 0\}. \tag{2.4.2} \]
This model implies all the same constraints on $\xi$ as the product
multinomial model $[\Theta_h]$ of (2.3.5), with one exception: the identifiability
constraints, $e^{\xi'}(\oplus_{1}^{K} 1_R) = n'$, are not included.
Denote the maximum likelihood estimators computed assuming (2.4.1)
and (2.4.2) by $\hat{\xi}^{(P)}$ and $\hat{\beta}^{(P)}$. Similarly, denote the maximum likelihood
estimators computed assuming (2.3.1) and (2.3.5) by $\hat{\xi}^{(M)}$ and $\hat{\beta}^{(M)}$.
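Recall that the three product-multinomial model assumptions are
(A1) The multinomial response model can be specified as in (2.3.3).
That is, the model parameter space can be represented as
\[ \Theta_X = \{\xi \in R^s : C_1\log A_1e^{\xi} = X_1\beta_1,\ C_2\log A_2e^{\xi} = X_2\beta_2,\ Le^{\xi} = 0,\ e^{\xi'}(\oplus_{1}^{K} 1_R) = n'\}, \]
where
$C_i = \oplus_{j=1}^{K} C_{ij}$, $C_{ij} = C_{i1}$, is $q_i \times m_i$, $i = 1, 2$
$A_i = \oplus_{j=1}^{K} A_{ij}$, $A_{ij} = A_{i1}$, is $m_i \times R$, $i = 1, 2$
$L = \oplus_{j=1}^{K} L_j$, $L_j = L_1$ is $d \times R$
$\xi = \mathrm{vec}(\xi_1, \ldots, \xi_K)$, and $\xi_k$ is $R \times 1$
$X_i$ is $Kq_i \times p_i$ of full rank $p_i$, $i = 1, 2$
$n$ is the $K \times 1$ vector of multinomial indices
$s = RK$, the total number of cells.
(A2) Either $C_i = I_{q_iK}$ or $C_i(\oplus_{1}^{K} 1_{m_i}) = 0$, $i = 1, 2$,
and
(A3) If $C_i = I_{q_iK}$ then $M(X_i) \supseteq M(\oplus_{1}^{K} 1_{m_i})$.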
The following theorem states that the maximum likelihood estimators for
$\xi$ and hence $\beta$ are the same under the product-multinomial sampling scheme
of (2.3.1) and the product-Poisson sampling scheme of (2.4.1) provided that
the three assumptions (A1), (A2), and (A3) hold.
THEOREM 2.4.1 If the model (2.3.4) satisfies assumptions (A1), (A2), and
(A3), then
\[ \hat{\xi}^{(P)} = \hat{\xi}^{(M)} \quad \text{and} \quad \hat{\beta}^{(P)} = \hat{\beta}^{(M)}. \]
That is, the maximum likelihood estimators of $\beta$ and $\xi$ are the same under
both sampling schemes, product-Poisson (2.4.1) and product-multinomial
(2.3.1).
Proof: Under the product-Poisson assumption of (2.4.1) and (2.4.2), the
kernel of the log likelihood is
\[ \ell^{(P)}(\xi; y) = y'\xi - e^{\xi'}1_s. \]
Therefore, letting $\theta = \mathrm{vec}(\xi, \lambda)$, the corresponding Lagrangian objective
function is
\[ Q(\theta) = y'\xi - e^{\xi'}1_s + h'(\xi)\lambda \]
and so to find the maximum (Poisson) likelihood estimator $\hat{\theta}^{(P)} = \mathrm{vec}(\hat{\xi}^{(P)}, \hat{\lambda}^{(P)})$
we must solve the system of equations
\[ \frac{\partial Q(\theta)}{\partial \theta} = \begin{pmatrix} y - e^{\hat{\xi}^{(P)}} + H(\hat{\xi}^{(P)})\hat{\lambda}^{(P)} \\ h(\hat{\xi}^{(P)}) \end{pmatrix} = 0. \tag{2.4.3} \]
The conclusion of the theorem now follows, since the equations (2.3.8) of
Theorem 2.3.1 and (2.4.3) yield exactly the same solutions and
\[ \hat{\beta}^{(P)} = (X'X)^{-1}X'C\log(Ae^{\hat{\xi}^{(P)}}) = (X'X)^{-1}X'C\log(Ae^{\hat{\xi}^{(M)}}) = \hat{\beta}^{(M)}. \]
As a corollary to Theorem 2.4.1 we have
COROLLARY 2.4.1 Provided the assumptions of Theorem 2.4.1 hold, the
estimated undetermined multipliers are invariant with respect to sampling
scheme, i.e.
\[ \hat{\lambda}^{(M)} = \hat{\lambda}^{(P)}. \]
Proof: The proof follows immediately upon noting that equations (2.3.8)
and (2.4.3) yield exactly the same solutions.
A remark is in order. Basically, Theorem 2.4.1 enables us to conclude
that the sufficient and necessary condition of Birch (1963) holds. This
condition is that the model be specified so that the Poisson ML estimators
necessarily satisfy the identifiability constraints that are required for the
multinomial model.
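We now explore the asymptotic behavior of the (Poisson) ML estimator
$\hat{\theta}^{(P)} = \mathrm{vec}(\hat{\xi}^{(P)}, \hat{\lambda}^{(P)})$. For the product-Poisson assumptions (2.4.1) and
(2.4.2), we can obtain the asymptotic distribution of $\hat{\theta}^{(P)}$ by formally replacing
the $n_* = \min\{n_i\}$ by $\mu_* = \min\{e^{\xi_{ij}}\}$ and using the same arguments as those
used to derive the asymptotic distribution of $\hat{\theta}^{(M)}$.
Jørgensen (1989) discusses limiting distributions for Poisson random
variables as the mean parameters, or equivalently $\mu_*$, go to infinity. In this
case,
\[ g(\theta_0; Y) = \begin{pmatrix} Y - e^{\xi_0} \\ 0 \end{pmatrix} \]
has an asymptotic normal distribution with mean zero and asymptotic
covariance
\[ \begin{pmatrix} D(\mu_0) & 0 \\ 0 & 0 \end{pmatrix}. \]
Using arguments similar to those used in the multinomial case, it follows that
\[ \begin{pmatrix} Y - e^{\xi_0} \\ 0 \end{pmatrix} = \begin{pmatrix} D(\mu_0) & -H \\ -H' & 0 \end{pmatrix}(\hat{\theta}^{(P)} - \theta_0) + \begin{pmatrix} O_p(1) \\ O_p(1) \end{pmatrix}. \]
We conclude, as in the product multinomial case, that $\hat{\theta}^{(P)} - \theta_0$ has an
asymptotic normal distribution with mean zero and asymptotic covariance
\[ \begin{pmatrix} D(\mu_0) & -H \\ -H' & 0 \end{pmatrix}^{-1} \begin{pmatrix} D(\mu_0) & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} D(\mu_0) & -H \\ -H' & 0 \end{pmatrix}^{-1}. \]
But, this can again be simplified as it was in the multinomial case. It can be
shown that the asymptotic covariance can be rewritten as
\[ \begin{pmatrix} D^{-1} - D^{-1}H(H'D^{-1}H)^{-1}H'D^{-1} & 0 \\ 0 & (H'D^{-1}H)^{-1} \end{pmatrix}, \tag{2.4.4} \]
where $D = D(\mu_0) = D(e^{\xi_0})$ and $H = H(\xi_0)$.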
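Comparison of the Asymptotic Distributions. Provided assumptions (A1),
(A2), and (A3) hold, both $\hat{\theta}^{(P)} - \theta_0$ and $\hat{\theta}^{(M)} - \theta_0$ have asymptotic normal
distributions with zero means and respective covariances given in (2.4.4) and
(2.3.12). Therefore, we have the following interesting results.
Result 1. The asymptotic covariances of $\hat{\theta}^{(P)}$ and $\hat{\theta}^{(M)}$ are related by
\[ \mathrm{var}(\hat{\theta}^{(M)}) = \mathrm{var}(\hat{\theta}^{(P)}) - \begin{pmatrix} \oplus_{i=1}^{K} n_i^{-1} 1_R 1_R' & 0 \\ 0 & 0 \end{pmatrix}. \tag{2.4.5} \]
Result 2. The asymptotic distributions of $\hat{\lambda}^{(P)}$ and $\hat{\lambda}^{(M)}$ are identical and
it follows that the Lagrange multiplier statistic, which has form
\[ LM = \hat{\lambda}'(\mathrm{var}(\hat{\lambda}))^{-1}\hat{\lambda} = \hat{\lambda}'(H'D^{-1}H)\hat{\lambda}, \]
is invariant with respect to the sampling scheme.
Result 3.
\[ \mathrm{var}(\hat{\mu}^{(M)}) = \mathrm{var}(\hat{\mu}^{(P)}) - \oplus_{i=1}^{K} \frac{\mu_i^{(P)}\mu_i^{(P)\prime}}{n_i}. \tag{2.4.6} \]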
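Result 4.
\[ \mathrm{var}(\hat{\beta}^{(M)}) = \mathrm{var}(\hat{\beta}^{(P)}) - \Lambda, \tag{2.4.7} \]
where
\[ \Lambda = (X'X)^{-1}X'C\,D^{-1}(A\mu)\,A\left(\oplus_{i=1}^{K} \frac{\mu_i\mu_i'}{n_i}\right)A'\,D^{-1}(A\mu)\,C'X(X'X)^{-1} \]
and is nonnegative definite.
The notation var(.) used in these results denotes the asymptotic variance.
This is important since the finite sample variances may not even exist.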
The proofs for Results 3 and 4 are straightforward. Basically, they
involve using the delta method and equation (2.4.5). The interested reader
will find an outline of the proofs in Appendix A.
In practice, it is of particular interest to evaluate the matrix $\Lambda$ subtracted
in equation (2.4.7). Often, for convenience, the models are fit assuming the vector $Y$
is product Poisson and then inferences based on the maximum likelihood
estimates are made assuming that they are invariant with respect to the
sampling assumption. Birch (1963) and Palmgren (1981) derive rules for
when these inferences, based on the two different sampling assumptions, will
be equivalent. However, they assume that the model is of a simple loglinear
form. That is, the Poisson model is assumed to have form
\[ \Theta_X = \{\xi \in R^s : \xi = X\beta\}. \]
We will use the results of this section to derive more general rules for when
the two inferences will be equal. As a special case of these results, we will
arrive at the Birch and Palmgren results.
The following lemma will enable us to rewrite $\Lambda$ of (2.4.7) in still a
simpler form.
LEMMA 2.4.1 Let $Z = [Z_1, \ldots, Z_K]$ be an $r \times K$ matrix of full rank $K$.
Suppose that $X = [X_1, \ldots, X_p]$ is an $r \times p$ ($r \ge p \ge K$) matrix of full rank $p$
such that $M(X) \supseteq M(Z)$, i.e. the range space of $X$ contains the range space
of $Z$. Denote the $T$ ($K \le T \le p$) columns of $X$ that span a space that contains
$M(Z)$ by $\{X_{\nu_1}, \ldots, X_{\nu_T}\}$. Without loss of generality, suppose that the set of
vectors $\{X_{\nu_1}, \ldots, X_{\nu_T}\}$ is a minimal spanning subset, i.e. the span
of any fewer than $T$ of these vectors does not contain the range space of $Z$. We
conclude that
\[ \exists\, W \in R^{T \times K} \text{ such that } (X'X)^{-1}X'Z = JW, \]
where the $p \times T$ matrix $J = [e_{\nu_1}, \ldots, e_{\nu_T}]$ and $e_{\nu_t}$ is the $p \times 1$ vector
$(0, \ldots, 0, 1, 0, \ldots, 0)'$ with the '1' in the $\nu_t$th position.
Proof: Let $X_* = [X_{\nu_1}, \ldots, X_{\nu_T}]$. Now, by assumption, $M(X_*) \supseteq M(Z)$.
Hence, there must exist a matrix $W \in R^{T \times K}$ such that $Z = X_*W$. Therefore,
\[ (X'X)^{-1}X'Z = (X'X)^{-1}X'X_*W = (X'X)^{-1}(X'X_*)W = JW, \]
where $J = (X'X)^{-1}(X'X_*)$ is as stated in the conclusion of the lemma.
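The lemma is easy to check numerically. The sketch below builds a random full rank X, forces M(Z) to lie in the span of two chosen columns, and compares the two sides of the conclusion. The dimensions, the seed, and the chosen column indices are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
r, p, K = 6, 4, 2

X = rng.normal(size=(r, p))          # full column rank with probability one
idx = [0, 2]                         # assumed spanning columns (0-based nu_1, nu_2)
W = rng.normal(size=(len(idx), K))   # coefficients expressing Z in those columns
Z = X[:, idx] @ W                    # guarantees M(Z) is inside span{X_0, X_2}

lhs = np.linalg.solve(X.T @ X, X.T @ Z)   # (X'X)^{-1} X'Z
J = np.eye(p)[:, idx]                     # J = [e_{nu_1}, e_{nu_2}]

print(np.allclose(lhs, J @ W))            # conclusion of the lemma
print(np.round(lhs, 10))                  # rows outside idx are zero
```

The rows of (X'X)^{-1}X'Z that do not correspond to the spanning columns vanish, which is exactly the structure exploited in the theorem that follows.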
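Before stating the next important theorem, let us write $\Lambda$, the matrix
subtracted in (2.4.7), in another way. Assuming that (A1) holds, $\Lambda$ can be written as
\[ \Lambda = \begin{pmatrix} \Lambda^{11} & \Lambda^{12} \\ \Lambda^{21} & \Lambda^{22} \end{pmatrix}, \tag{2.4.8} \]
where
\[ \Lambda^{ij} = (X_i'X_i)^{-1}X_i'C_i(\oplus_{1}^{K} 1_{m_i})\,D^{-1}(n)\,(\oplus_{1}^{K} 1_{m_j}')C_j'X_j(X_j'X_j)^{-1}. \]
Now, if $C_i$ is a contrast matrix, by assumption (A2), we can write
\[ (X_i'X_i)^{-1}X_i'C_i(\oplus_{1}^{K} 1_{m_i}) = J^{(i)}W^{(i)}, \tag{2.4.9} \]
where $J^{(i)}$ can arbitrarily be chosen to be equal to $X_i'$ and so $W^{(i)} = 0$. On
the other hand, if $C_i = I_{q_iK}$ then we have by (A3) that $M(X_i) \supseteq M(\oplus_{1}^{K} 1_{m_i})$.
Therefore, we can invoke the result of Lemma 2.4.1 by setting $Z = \oplus_{1}^{K} 1_{m_i}$.
Since $M(X_i) \supseteq M(\oplus_{1}^{K} 1_{m_i}) = M(Z)$, the conditions for the lemma are satisfied.
Let $X_{i*} = [X_{\nu_1^{(i)}}, \ldots, X_{\nu_{T_i}^{(i)}}]$ be the $m_iK \times T_i$ ($K \le T_i \le p_i$) submatrix of $X_i$
that has columns that form a minimal spanning subset for $M(Z) = M(\oplus_{1}^{K} 1_{m_i})$.
By Lemma 2.4.1,
\[ \exists\, W^{(i)} \in R^{T_i \times K} \text{ such that } (X_i'X_i)^{-1}X_i'(\oplus_{1}^{K} 1_{m_i}) = J^{(i)}W^{(i)}. \tag{2.4.10} \]
Here, $J^{(i)} = [e_{\nu_1^{(i)}}, \ldots, e_{\nu_{T_i}^{(i)}}]$, where the $T_i$ elementary vectors correspond to the
columns $\{X_{\nu_1^{(i)}}, \ldots, X_{\nu_{T_i}^{(i)}}\}$ of $X_i$ that form a minimal spanning subset for the
range space of $\oplus_{1}^{K} 1_{m_i}$, i.e. the $T_i$ columns span a set that contains the range
space of $\oplus_{1}^{K} 1_{m_i}$ and any smaller set of columns will not span a set containing
the range space of $\oplus_{1}^{K} 1_{m_i}$.
It follows that the matrices $\Lambda^{ij}$ of (2.4.8) can be written as
\[ \Lambda^{ij} = J^{(i)}W^{(i)}D^{-1}(n)W^{(j)\prime}J^{(j)\prime}, \tag{2.4.11} \]
where
\[ J^{(i)} = \begin{cases} [e_{\nu_1^{(i)}}, \ldots, e_{\nu_{T_i}^{(i)}}], & \text{if } C_i = I_{q_iK} \\ X_i', & \text{otherwise} \end{cases} \tag{2.4.12} \]
and
\[ W^{(i)} = \begin{cases} W^{(i)} \text{ of (2.4.10)}, & \text{if } C_i = I_{q_iK} \\ 0, & \text{otherwise.} \end{cases} \tag{2.4.13} \]
We now state a theorem of substantive importance.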
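THEOREM 2.4.2 Suppose that assumptions (A1), (A2), and (A3) hold. For
$r = 1, 2$, if $C_r$ is the identity matrix then let $\{\nu_1^{(r)}, \ldots, \nu_{T_r}^{(r)}\}$ be the set of
indices that index those columns of $X_r$ that form a minimal spanning subset
for $M(\oplus_{1}^{K} 1_{m_r})$. Then it follows that the relationship between the asymptotic
variances of the two estimators $\hat{\beta}^{(M)}$ and $\hat{\beta}^{(P)}$ is
\[ \mathrm{var}(\hat{\beta}^{(M)}) = \mathrm{var}(\hat{\beta}^{(P)}) - \begin{pmatrix} \Lambda^{11} & \Lambda^{12} \\ \Lambda^{21} & \Lambda^{22} \end{pmatrix}, \]
where the $p_i \times p_j$ matrix $\Lambda^{ij}$ is a zero matrix whenever at least one of $C_i$ or
$C_j$ is a contrast or zero matrix. Otherwise, if both $C_i$ and $C_j$ are identity
matrices then
\[ \Lambda^{ij}_{kl} = 0, \quad \text{if } (k, l) \notin \{\nu_1^{(i)}, \ldots, \nu_{T_i}^{(i)}\} \times \{\nu_1^{(j)}, \ldots, \nu_{T_j}^{(j)}\}. \]
Proof: Since (A1), (A2), and (A3) hold, we can rewrite $\Lambda^{ij}$ as in (2.4.11).
Now, if either $C_i$ or $C_j$ is a contrast or zero matrix, it is obvious by (2.4.9)
that $\Lambda^{ij}$ will have zero components, as stated in the theorem, since at least
one of $W^{(i)}$ or $W^{(j)}$ will be a zero matrix. On the other hand, if both $C_i$ and
$C_j$ are identity matrices, then $\Lambda^{ij}$ can be rewritten as in (2.4.11) where
\[ J^{(i)} = [e_{\nu_1^{(i)}}, \ldots, e_{\nu_{T_i}^{(i)}}], \]
\[ J^{(j)} = [e_{\nu_1^{(j)}}, \ldots, e_{\nu_{T_j}^{(j)}}], \]
and the matrices $W^{(i)}$ and $W^{(j)}$ are elements of $R^{T_i \times K}$ and $R^{T_j \times K}$. Hence,
\[ \Lambda^{ij} = [e_{\nu_1^{(i)}}, \ldots, e_{\nu_{T_i}^{(i)}}]\, W^{ij}\, [e_{\nu_1^{(j)}}, \ldots, e_{\nu_{T_j}^{(j)}}]', \]
where $W^{ij} = W^{(i)}D^{-1}(n)W^{(j)\prime}$ is some $T_i \times T_j$ matrix. Now, since $\{e_{\nu}\}$ are elementary
vectors, we have that if
\[ (k, l) \notin \{\nu_1^{(i)}, \ldots, \nu_{T_i}^{(i)}\} \times \{\nu_1^{(j)}, \ldots, \nu_{T_j}^{(j)}\}, \]
then the component $\Lambda^{ij}_{kl} = 0$. Otherwise, if $(k, l)$ is a member of this set, it
must be that $\Lambda^{ij}_{kl}$ is one of the elements of the matrix $W^{ij}$. This completes
the proof.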
The next two corollaries follow immediately from Theorem 2.4.2.
COROLLARY 2.4.2 If both $C_1$ and $C_2$ are contrast matrices then
\[ \mathrm{var}(\hat{\beta}^{(M)}) = \mathrm{var}(\hat{\beta}^{(P)}). \]
Proof: Since both $C_1$ and $C_2$ are contrast matrices it follows that $W^{(1)}$
and $W^{(2)}$ are zero matrices. Therefore, the matrices $\Lambda^{ij}$ of the theorem are
zero matrices.
COROLLARY 2.4.3 Let $C_2 = 0$, $X_2 = 0$, and $C_1 = A_1 = I_s$ so that the model
(2.3.4) becomes
\[ \Theta_X = \{\xi \in R^s : \xi = X\beta,\ e^{\xi'}(\oplus_{1}^{K} 1_R) = n'\}, \]
i.e. a simple loglinear model with $K$ subpopulations. Let $\{\nu_1, \ldots, \nu_T\}$ be the set
of indices that index the columns of $X$ that form a minimal spanning subset
for $M(\oplus_{1}^{K} 1_R)$. Then
\[ \mathrm{var}(\hat{\beta}^{(M)}) = \mathrm{var}(\hat{\beta}^{(P)}) - \Lambda, \]
where the elements of $\Lambda$ are such that
\[ \Lambda_{kl} = 0, \quad \text{if } (k, l) \notin \{\nu_1, \ldots, \nu_T\}^2. \]
Proof: The proof is an immediate consequence of the theorem upon
identifying $\Lambda^{11}$ of the theorem with $\Lambda$ of the corollary. The other matrices
$\Lambda^{12}$, $\Lambda^{21}$, and $\Lambda^{22}$ will be zero since $C_2 = 0$.
Corollary 2.4.3 is of practical importance and is essentially the result
shown by Palmgren (1981). In particular, if we parameterize the model in
such a way so that there is a parameter included for each of the $K$ independent
multinomials (or $K$ covariate levels), then the $K$ columns of $X$ corresponding
to these $K$ 'fixed by design' parameters will form a basis (and hence a minimal
spanning subset) for $M(\oplus_{1}^{K} 1_R)$. Therefore, if $\beta_i$ and $\beta_j$ are not among the
$K$ parameters fixed by design, then $\mathrm{cov}(\hat{\beta}_i^{(M)}, \hat{\beta}_j^{(M)}) = \mathrm{cov}(\hat{\beta}_i^{(P)}, \hat{\beta}_j^{(P)})$. We
will illustrate the utility of the above results in the next chapter of this
dissertation.
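The structure described in Corollary 2.4.3 can be seen in a small numerical sketch. For the simple loglinear model the estimator of the freedom parameters is a fixed linear function of the log means, so by the delta method and Result 1 the matrix subtracted from the Poisson covariance is (X'X)^{-1}X'(direct sum of 1_R 1_R'/n_i)X(X'X)^{-1}. The design below, with K = 2 populations, R = 3 categories, one 'fixed by design' column per population, and two category effects, is an assumption chosen for illustration.

```python
import numpy as np

K, R = 2, 3
n = np.array([50.0, 80.0])                      # illustrative multinomial indices

Z = np.kron(np.eye(K), np.ones((R, 1)))         # direct sum of 1_R, s x K
cat = np.tile(np.eye(R)[:, 1:], (K, 1))         # indicators for categories 2 and 3
X = np.hstack([Z, cat])                         # first K columns are fixed by design

B = np.linalg.solve(X.T @ X, X.T)               # (X'X)^{-1} X'
Lam = B @ Z @ np.diag(1.0 / n) @ Z.T @ B.T      # matrix subtracted from var(beta^(P))

print(np.round(Lam, 12))
```

Only the K x K block belonging to the design parameters is nonzero (here it is the diagonal matrix of 1/n_i), so the asymptotic variances and covariances of the category effects are the same under both sampling schemes.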
The next section considers issues that may arise when computing the
model degrees of freedom. It also states some other miscellaneous results
with regard to the Lagrange multiplier statistic.
2.5 Miscellaneous Results
We begin this section by addressing practical issues that may arise during
nonstandard model fitting. Specifically, we will consider computing the model
and distance (or residual) degrees of freedom.
Computing model and distance degrees of freedom. Assuming the model
$[\Theta_h]$ of (2.3.5) is well defined, i.e. the $u + l + K$ constraints are nonredundant,
we can compute the model degrees of freedom as in section 2.2. In that
section, we defined the model degrees of freedom as the number of model
parameters minus the number of independent constraints implied by the
model. Notice that in this application we have an additional $l$ linear
constraints. These $l$ constraints were not present in section 2.2. It follows
that the model degrees of freedom for $[\Theta_h]$ is
\[ df[\Theta_h] = s - (u + l + K), \tag{2.5.1} \]
where $s$ is the number of cell means, $u$ is the dimension of the null space of $X'$,
$l$ is the number of linear constraints, and $K$ is the number of identifiability
constraints.
To measure model goodness of fit, we can consider estimating some
hypothetical distance between model $[\Theta_h]$ and the saturated model ($u = l = 0$)
$[\Theta]$. This distance, denoted $\delta[\Theta_h; \Theta]$, has degrees of freedom
\[ df(\delta[\Theta_h; \Theta]) = df[\Theta] - df[\Theta_h] = (s - K) - (s - (u + l + K)) = u + l. \tag{2.5.2} \]
Notice that, had we considered the product Poisson model (2.4.2), the
distance degrees of freedom would be
\[ df(\delta[\Theta_h^{(P)}; \Theta^{(P)}]) = s - (s - (u + l)) = u + l, \]
which is identical to the product multinomial distance degrees of freedom of
(2.5.2).
We have assumed that the $u + l + K$ constraints are nonredundant, i.e.
each constraint is not implied by the other constraints. This may not always
be the case. To illustrate, consider the model specification for example 3 of
section 2.2.2. The model $[\Theta_{MH}]$ implies that the two marginal distributions
are equal. We stated at the end of that example that the additional constraint
$\pi_{2+} - \pi_{+2} = 0$ was redundant. This can be seen since
\[ \pi_{2+} - \pi_{+2} = \pi_{21} - \pi_{12} = -(\pi_{1+} - \pi_{+1}) = 0. \]
That is, the constraints of model $[\Theta_{MH}]$ imply that $\pi_{2+} - \pi_{+2}$ equals zero.
Had we blindly added this constraint, we may have incorrectly calculated
the model degrees of freedom as 1 and the distance degrees of freedom as 2.
Therefore, we must be very careful to have a set of nonredundant constraints
when computing degrees of freedom.
In practice, when models are more complicated, it may be difficult to ascertain
whether or not the model constraints are nonredundant. Fortunately,
there are two very useful results that help in this regard.
The first result is that when the constraints are redundant, the matrix
$H'D^{-1}H$ evaluated at some point in $\Theta_h$ is of less than full rank and is not
invertible. Therefore, in practice, if the algorithm (2.3.13) does not converge
due to $G$ being singular, it may be due to redundant constraints, i.e. an ill
defined model. The user should investigate and possibly respecify the model
should this occur. A caveat is that due to computational roundoff error, a
singularity may not occur even when the model is ill defined because the
iterate estimates, including the final estimate, may not strictly lie in $\Theta_h$. The
next result may mitigate this problem.
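The rank condition can be illustrated with the 2 x 2 marginal homogeneity example discussed above. The sketch below writes the marginal constraints as linear constraints on the cell means, evaluates H = D(mu)L' at a point satisfying the model, and compares ranks with and without the redundant second constraint. The function name and the particular cell means are assumptions made for the example.

```python
import numpy as np

# Cells ordered (11, 12, 21, 22); this point satisfies mu_12 = mu_21.
mu = np.array([30.0, 20.0, 20.0, 30.0])

L_one = np.array([[0.0, 1.0, -1.0, 0.0]])             # mu_{1+} - mu_{+1} = mu_12 - mu_21
L_two = np.vstack([L_one, [[0.0, -1.0, 1.0, 0.0]]])   # adds mu_{2+} - mu_{+2}

def constraint_ranks(L, mu):
    """Return (rank of H'D^{-1}H, rank of [H, mu], number of constraints)."""
    H = mu[:, None] * L.T                  # H = D(mu) L'
    M = H.T @ (H / mu[:, None])            # H' D^{-1} H
    full = np.column_stack([H, mu])        # [H, e^xi]
    return (int(np.linalg.matrix_rank(M)),
            int(np.linalg.matrix_rank(full)),
            L.shape[0])

print(constraint_ranks(L_one, mu))   # nonredundant specification
print(constraint_ranks(L_two, mu))   # redundant specification
```

With the single constraint both matrices have full column rank; after adding the redundant constraint the ranks do not increase, so H'D^{-1}H is singular and the degrees of freedom would be miscounted.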
A result that is useful in practice is that a necessary condition for the
constraints to be nonredundant or equivalently for the model to be well
defined, is that the Lagrange multiplier statistic be invariant to choice of
U, a matrix with columns spanning the null space of X'. Evidently, if the
user fits the model several times, each time using a different 'U' matrix, and
the Lagrange multiplier statistic varies (more so than can be explained by
roundoff error), then it must be that the model is ill defined.
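Formally, this necessary condition can be stated as
THEOREM 2.5.1 Let $U_1$ and $U_2$ ($U_1 \ne U_2$) be any two full column rank
matrices satisfying $U_i'X = 0$, $i = 1, 2$. Denote the Lagrange multiplier statistic
evaluated using $U_i$ by $LM(U_i)$. If the matrix
\[ H_i = \frac{\partial h_i'(\xi)}{\partial \xi} = \frac{\partial}{\partial \xi}\left[\log(e^{\xi'}A')C'U_i,\ e^{\xi'}L'\right] \]
is such that $[H_i, e^{\xi}]$ is of full column rank, $i = 1, 2$, and hence the models well
defined, then
\[ LM(U_1) = LM(U_2), \]
i.e. the value of the Lagrange multiplier statistic is invariant with respect to
choice of $U$.
Proof: Denote the model specified in terms of $U_i$ by $[\Theta_{h_i}]$, $i = 1, 2$. By
the definition of $U$, we know that the constraints implied by $[\Theta_{h_1}]$ and $[\Theta_{h_2}]$
are equivalent. Hence, the solution $\hat{\xi}$ to (2.3.8), or equivalently (2.4.3), under
either model is the same. Thus, in view of the first set of equations in (2.3.8),
any solution $\mathrm{vec}(\hat{\xi}, \hat{\lambda}_i)$ under model $[\Theta_{h_i}]$ must satisfy
\[ (y - e^{\hat{\xi}}) = -H_i(\hat{\xi})\hat{\lambda}_i, \quad i = 1, 2. \tag{2.5.3} \]
Notice that since $U_1 \ne U_2$, we have that $H_1(\hat{\xi}) \ne H_2(\hat{\xi})$ and by (2.5.3) $\hat{\lambda}_1 \ne \hat{\lambda}_2$.
Now, (2.5.3) implies that
\[ H_1(\hat{\xi})\hat{\lambda}_1 = H_2(\hat{\xi})\hat{\lambda}_2. \tag{2.5.4} \]
Also, since $H_i(\hat{\xi})$ is assumed to be of full column rank, the variance of $\hat{\lambda}_i$,
\[ \mathrm{var}(\hat{\lambda}_i) = (H_i'(\hat{\xi})D^{-1}(e^{\hat{\xi}})H_i(\hat{\xi}))^{-1}, \tag{2.5.5} \]
exists. Therefore, the Lagrange multiplier statistics $LM(U_i)$, which have form
\[ \hat{\lambda}_i'[\mathrm{var}(\hat{\lambda}_i)]^{-1}\hat{\lambda}_i, \quad i = 1, 2, \tag{2.5.6} \]
exist. Finally, by (2.5.4)-(2.5.6), it follows that
\[ LM(U_1) = \hat{\lambda}_1'[\mathrm{var}(\hat{\lambda}_1)]^{-1}\hat{\lambda}_1
 = \hat{\lambda}_1'(H_1'(\hat{\xi})D^{-1}(e^{\hat{\xi}})H_1(\hat{\xi}))\hat{\lambda}_1
 = \hat{\lambda}_2'(H_2'(\hat{\xi})D^{-1}(e^{\hat{\xi}})H_2(\hat{\xi}))\hat{\lambda}_2
 = \hat{\lambda}_2'[\mathrm{var}(\hat{\lambda}_2)]^{-1}\hat{\lambda}_2
 = LM(U_2). \]
This completes the proof.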
The final result of this section states that the Lagrange multiplier
statistic is exactly the same as the Pearson chi-squared statistic whenever the
random vector $Y$ is product-Poisson or product-multinomial and the model
satisfies assumptions (A1), (A2), and (A3).
THEOREM 2.5.2 Assume that the product-multinomial model satisfies
assumptions (A1), (A2), and (A3). Let $X^2$ denote the Pearson chi-squared
statistic, i.e.
\[ X^2 = (y - \hat{\mu})'D^{-1}(\hat{\mu})(y - \hat{\mu}), \]
where $\hat{\mu}$ is the ML estimator under either of the sampling schemes, product-multinomial
or product-Poisson. It follows that the Lagrange multiplier
statistic $LM$ is equivalent to $X^2$. That is,
\[ LM = X^2. \]
Proof: By equations (2.5.3), (2.5.5), and (2.5.6) of the previous theorem's
proof and the fact that $e^{\hat{\xi}} = \hat{\mu}$, we have that
\[ LM = (y - \hat{\mu})'D^{-1}(\hat{\mu})(y - \hat{\mu}) = X^2. \]
This is what we set out to show.
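The substitution behind this proof can be written out in one line. The following is a sketch under the assumption that the first set of equations in (2.3.8) reads $y - e^{\hat{\xi}} + H(\hat{\xi})\hat{\lambda} = 0$; here $H$ and $D$ are shorthand for $H(\hat{\xi})$ and $D(e^{\hat{\xi}}) = D(\hat{\mu})$.

```latex
\begin{align*}
LM &= \hat{\lambda}'\,[\mathrm{var}(\hat{\lambda})]^{-1}\,\hat{\lambda}
    = \hat{\lambda}'\,(H'D^{-1}H)\,\hat{\lambda}
    = (H\hat{\lambda})'\,D^{-1}\,(H\hat{\lambda}) \\
   &= \bigl(-(y - \hat{\mu})\bigr)'\,D^{-1}(\hat{\mu})\,\bigl(-(y - \hat{\mu})\bigr)
    = (y - \hat{\mu})'\,D^{-1}(\hat{\mu})\,(y - \hat{\mu}) = X^2 .
\end{align*}
```

The sign of $H\hat{\lambda}$ is immaterial because it enters through a quadratic form.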
2.6 Discussion
In this chapter, we discussed in some detail issues related to parametric
modeling. In particular, we followed the lead of Aitchison and Silvey (1958,
1960) and Silvey (1959) and described two ways of specifying models: using
constraint equations and using freedom equations. In section 2.2, distance
measures for quantifying how far apart two models are, relative to how close
they are to holding, were discussed. In particular, the power-divergence
measures (Read and Cressie, 1988) were used when the parameter spaces were
subsets of an $(s - 1)$-dimensional simplex. Estimates of these distances were
developed based on very intuitive notions. Also, a geometric interpretation
of model and residual (or distance) degrees of freedom was given.
In section 2.3, we described a general class of multivariate polytomous
(categorical) response data models. The class of models, which satisfy
assumptions (A1), (A2), and (A3), were shown to satisfy the necessary and
sufficient conditions of Birch (1963) so that the models could be fitted using
either the product-Poisson or product-multinomial sampling assumption.
An ML fitting method was developed, using results of Aitchison and Silvey
(1958, 1960) and Haber (1985a, 1985b). The algorithm used Lagrangian
undetermined multipliers in conjunction with a modified Newton-Raphson
iterative scheme. The modification, which simplifies the method of Haber
(1985a), is to use a simpler matrix than the Hessian matrix. We replace
the Hessian matrix (of the Lagrangian objective function) by its dominant
part, which turns out to be easily inverted. Because the matrices used in the
algorithm proposed in this chapter are very large and must be inverted, this
modification is a very important one. A FORTRAN program 'mle.restraint'
has been written by the author to implement this modified algorithm.
The asymptotic behavior of the ML estimators computed under the two
sampling schemes, product-Poisson and product-multinomial, was investigated.
The method for deriving the asymptotic distributions represents a
modification to the technique of Aitchison and Silvey (1958). A comparison of
the limiting distributions of the two estimators was made in section 2.4. Some
very interesting results were obtained by studying the asymptotic behavior
in the constraint equation setting. In particular, Theorem 2.4.2 represents
a generalization of the results of Palmgren (1981). The theorem provides a
method for determining when the inferences about the freedom parameters
of a generalized loglinear model of the form $C\log A\mu = X\beta$ will be invariant
with respect to the sampling assumption. Palmgren (1981) developed some
similar results for the special case when the freedom parameters are part of
a loglinear model.
It is important to note that the asymptotic results are only valid if
the number of populations K is considered fixed and the expected counts
all get large at approximately the same rate. In particular, the asymptotic
arguments do not hold when the covariates are continuous, since the number
of populations (levels of the covariates) can theoretically run off to infinity.
The reason the arguments do not hold is that when we use the method of
Aitchison and Silvey (1958) it is required that the vector n,' lY;Y~ converge
in probability to zero as the total number of observations gets large. This is
the case only when n* = min{nl,..., nK} goes to infinity. This drawback
could prove to be temporary. It seems reasonable to assume in many cases,
that as long as the 'information' about each parameter is increasing without
bound, the estimators will be consistent and asymptotically normally distributed.
For example, consider the logistic regression model with continuous
covariates. Although the $n_k$'s may all be 1, the ML estimators of the
regression parameters are often consistent and asymptotically normal.
Section 2.5 outlines some miscellaneous results. One result that is
important to the practicing statistician, is that the Lagrange multiplier
statistic is shown to be invariant with respect to choice of the matrix U
(of $U'C\log A\mu = 0$) as long as the model is well defined. An important
implication of this result is that if one fits the model several times, each
time using a different 'U' matrix, and the Lagrange multiplier statistics
vary more so than can be explained by roundoff, then it could be that the
model is not well defined. Another interesting result is that the Lagrange
multiplier statistic is simply the Pearson chi-squared statistic $X^2$ whenever
the assumptions (A1), (A2), and (A3) are satisfied.
Theoretically the ML fitting algorithm will work for any size problem.
Practically, however, the algorithm is certainly not a model fitting panacea.
The number of parameters that must be estimated gets very large, very fast.
Consider the case where 7 raters rate the same set of objects on a 5-point
scale. Even without covariates, the number of cell probabilities that must be
estimated is $5^7 = 78{,}125$. It seems the ML fitting method developed in this
chapter is, at least for now, useful for moderate-size problems only. It can be
used to analyze longitudinal categorical response data when the number
of measurements taken on each subject is somewhere in the neighborhood of
2 to 6. This is not to take away from the utility of this chapter's algorithm,
but rather to indicate its breadth of application. In time, with increasing
computer efficiency, much larger data sets may be fitted using this algorithm.
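The growth in the number of cell probabilities described above can be checked with a few lines of Python. This is only an illustrative sketch; the function name `n_cells` is hypothetical and not part of the dissertation.

```python
def n_cells(d, T):
    """Number of joint response profiles for T responses on a d-point scale."""
    return d ** T

# Cells in the full table for a 5-point scale and 2 to 7 responses.
for T in range(2, 8):
    print(T, n_cells(5, T))
```

With a 5-point scale, the table already has 15,625 cells at six responses, which is consistent with the remark that 2 to 6 measurements per subject is the practical range.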
CHAPTER 3
SIMULTANEOUSLY MODELING THE JOINT AND MARGINAL
DISTRIBUTIONS OF MULTIVARIATE POLYTOMOUS
RESPONSE VECTORS
3.1 Introduction
Often times, when given an opportunity to analyze multivariate response
data, the investigator may wish to describe both the joint and marginal
distributions simultaneously. We consider a broad class of models which
imply structure on both the joint and marginal distributions of multivariate
polytomous response vectors. To illustrate the need for such models, we
consider several settings where these models would be useful. For example,
when the multivariate responses represent repeated measures of the same
categorical response across time, one may be interested in how the marginal
distributions are changing across time and how strongly the responses are
associated. The simultaneous investigation of both joint and marginal
distributions is not restricted to the longitudinal data setting. Other examples
include the analysis of rater agreement, crossover, and social mobility data.
The common thread tying all of these data types together is that the sampling
scheme is such that the different responses are correlated. In longitudinal
studies the same subject responds on several occasions. In rater agreement
studies, raters rate the same objects. In two-period crossover studies, one
group of subjects receives the two treatments in one order and the other group
receives them in the other order. In social mobility studies, the socioeconomic
status of a father-son pair is recorded. When the responses are positively
correlated, these designs result in increased power for detecting differences
between the marginal distributions (Laird, 1991; Zeger, 1988).
This chapter considers the modeling of multivariate categorical responses
in which the same response scale is used for each response. The classes
of models used in this chapter are of the form considered in Chapter 2 of
this dissertation and hence are readily fit using the ML methods of that
chapter. In section 3.2, we give several examples that may be analyzed by
simultaneously modeling the joint and marginal distributions. We introduce
the classes of simultaneous Joint-Marginal models in section 3.3. Several
models are fitted to the data sets of section 3.2.
3.2 Product-Multinomial Sampling Model
Initially, we assume that a random sample of $n_k$ subjects is taken from
population $k$, $k = 1, \ldots, K$. The number of populations, or covariate profiles,
K is considered to be some fixed integer. The subscript k is allowed to be
compound, i.e. the subscript k is allowed to represent a vector of subscripts
such as
$$k = (k_1, k_2, \ldots, k_l).$$
Suppose that there are $T$ categorical responses $V^{(1)}, \ldots, V^{(T)}$ of interest
and that each response is measured on the same response scale. Let
$V_k = (V_k^{(1)}, \ldots, V_k^{(T)})'$ be the random vector of responses for population $k$
and let $V_{ku}$, $u = 1, \ldots, n_k$, be the $n_k$ independent and identically distributed
copies of $V_k$, where $V_{ku}$ denotes the response profile for the $u$th randomly
chosen person within population $k$. Notationally we have
$$V_{ku} \sim \text{i.i.d. } V_k, \quad u = 1, \ldots, n_k.$$
For our purposes we can assume that each response takes on values in
$\{1, 2, \ldots, d\}$ with probability one. Denote the probability that a randomly
selected subject from population $k$ has response profile $\mathbf{i} = (i_1, \ldots, i_T)'$ by $\pi_{\mathbf{i}k}$,
i.e.
$$P(V_k = (i_1, \ldots, i_T)') = \pi_{\mathbf{i}k},$$
where $\mathbf{i} \in \{1, \ldots, d\} \times \cdots \times \{1, \ldots, d\}$.
The joint distribution of $V_k = (V_k^{(1)}, \ldots, V_k^{(T)})'$ is specified as $\{\pi_{\mathbf{i}k}\}$. The
marginal distributions of $V_k$ will be denoted by $\{\phi_i(t; k)\}$, $t = 1, \ldots, T$, where
$$\phi_i(t; k) = P(V_k^{(t)} = i), \quad i = 1, \ldots, d.$$
Our objective is to model simultaneously the $K$ joint distributions
$$\{\pi_{\mathbf{i}k}\}, \quad k = 1, \ldots, K$$
and the $KT$ marginal distributions
$$\{\phi_i(t; k)\}, \quad t = 1, \ldots, T, \quad k = 1, \ldots, K.$$
To help the reader better understand the notation, we consider the one
population bivariate case. When $T = 2$, the response profiles can be denoted
by $\mathbf{i} = (i_1, i_2) = (i, j)$, where $i = 1, \ldots, d$ and $j = 1, \ldots, d$. Since there is
just one population (or covariate profile) the subscript $k$ is always 1 and is
therefore dropped. It follows that $\{\pi_{ij}\}$ is the joint distribution of $(V^{(1)}, V^{(2)})'$
and $\{\phi_i(t)\}$, $t = 1, 2$ are the two marginal distributions. That is,
$$\pi_{ij} = P(V^{(1)} = i, V^{(2)} = j), \quad i = 1, \ldots, d, \quad j = 1, \ldots, d$$
and
$$\phi_i(t) = \begin{cases} \pi_{i+} = P(V^{(1)} = i), & \text{if } t = 1 \\ \pi_{+i} = P(V^{(2)} = i), & \text{if } t = 2 \end{cases}$$
for $i = 1, 2, \ldots, d$.
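Now for each population $k$, consider the $d^T \times 1$ random vector of
indicators
$$\Psi_k = [I(V_k = \mathbf{i}_1), \ldots, I(V_k = \mathbf{i}_{d^T})]'.$$
Notice that no information about the $V_k$ is lost since $\Psi_k$ is a one-to-one
function of $V_k$. Also,
$$\Psi_k \sim \text{ind. } \mathrm{Mult}(1, \{\pi_{\mathbf{i}k}\}), \quad k = 1, \ldots, K.$$
Therefore, since we have randomly sampled $n_k$ subjects from each of the $K$
populations, we have that for given $k$
$$\Psi_{k1}, \Psi_{k2}, \ldots, \Psi_{kn_k} \sim \text{i.i.d. } \mathrm{Mult}(1, \{\pi_{\mathbf{i}k}\})$$
and hence the vector
$$Y_k = \sum_{u=1}^{n_k} \Psi_{ku} \sim \mathrm{Mult}(n_k, \{\pi_{\mathbf{i}k}\})$$
is sufficient for the family of distributions $\{\pi_{\mathbf{i}k}\}$ and $\{\phi_i(t; k)\}$.
By independence across populations, the vector $\mathrm{vec}(Y_1, Y_2, \ldots, Y_K)$ is
sufficient for the joint and marginal distributions of $\mathrm{vec}(V_1, V_2, \ldots, V_K)$.
Further, the random vector $\mathrm{vec}(Y_1, Y_2, \ldots, Y_K)$ is product-multinomial, i.e.
$$Y_k = (Y_{\mathbf{i}_1 k}, \ldots, Y_{\mathbf{i}_R k})' \sim \text{ind. } \mathrm{Mult}(n_k, \{\pi_{\mathbf{i}k}\}), \quad k = 1, \ldots, K,$$
where $\mathbf{i}_1, \ldots, \mathbf{i}_R$ represent the $R = d^T$ different response profiles.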
Evidently, $Y_{\mathbf{i}k}$ represents the number of randomly selected subjects from
population $k$ who have response profile $\mathbf{i}$. That is, the $\{Y_{\mathbf{i}k}\}$ represent counts
resulting from a cross-classification of $N = \sum_{k=1}^{K} n_k$ subjects on $T$ response
variables and a population variable. The data can be displayed in a $d^T \times K$
contingency table. By convention, we use lower case Roman letters to denote
realizations of random quantities. For example, $y_{\mathbf{i}k}$ represents a particular
realization of $Y_{\mathbf{i}k}$.
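The passage from individual response profiles to the product-multinomial count vectors can be sketched as follows. The raw profiles below are hypothetical, and the lexicographic ordering of the $d^T$ cells is an assumption made for the illustration; the text does not fix an ordering.

```python
from collections import Counter
from itertools import product

def profile_counts(profiles, d, T):
    """Tally response profiles for one population into a d**T count vector.

    Cells are ordered lexicographically: (1,...,1), (1,...,2), ..., (d,...,d).
    """
    cells = list(product(range(1, d + 1), repeat=T))
    tally = Counter(tuple(p) for p in profiles)
    return [tally[c] for c in cells]

# Hypothetical raw data: K = 2 populations, T = 2 responses, d = 3 categories.
sample = {
    1: [(1, 1), (1, 2), (3, 3), (1, 1)],
    2: [(2, 2), (2, 3), (2, 2)],
}

# One count vector per population; each sums to that population's sample size.
counts = {k: profile_counts(v, d=3, T=2) for k, v in sample.items()}
print(counts)
```

Each vector plays the role of one column of the $d^T \times K$ contingency table described above.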
Consider Table 3.1, taken from Hout et al. (1987).
Table 3.1. Interest in Political Campaigns

                              1960
1956          Not Much   Somewhat   Very Much   Total
Not Much         155        116         64        335
Somewhat          91        237        171        499
Very Much         32         91        246        369
Total            278        444        481       1203

Source: Hout et al. (1987), p. 166, Table 4

Each of 1203 randomly selected subjects was asked in 1956 how interested
they were in the political campaigns. They responded on the 3-category
ordinal scale: 1 = Not Much, 2 = Somewhat, and 3 = Very Much.
Then, in 1960, each of the subjects was asked the same question and
responded on the same 3-category ordinal scale. Using the above notation,
we let $V^{(1)}$ and $V^{(2)}$ represent the responses in 1956 and 1960. Let $y_{ij}$, $i, j =
1, 2, 3$, represent the number of the $N = 1203$ subjects responding at level
$i$ in 1956 and level $j$ in 1960. Because there is just one population
of interest, we drop the population subscript altogether. Finally, for this
bivariate response example, the compound subscript $\mathbf{i}$ is replaced by $ij$. Table
3.1 summarizes the bivariate responses.
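As a small check on the notation, the sample marginal distributions for the two occasions can be computed directly from the counts of Table 3.1. The variable names are illustrative only.

```python
# Counts y_ij from Table 3.1: rows are 1956 responses, columns are 1960 responses.
table_3_1 = [
    [155, 116, 64],
    [91, 237, 171],
    [32, 91, 246],
]

N = sum(sum(row) for row in table_3_1)

# Row totals y_{i+} (1956 margin) and column totals y_{+j} (1960 margin).
row_totals = [sum(row) for row in table_3_1]
col_totals = [sum(col) for col in zip(*table_3_1)]

# Sample proportions estimating the two marginal distributions.
row_props = [t / N for t in row_totals]
col_props = [t / N for t in col_totals]

print(N, row_totals, col_totals)
```

The totals agree with the margins printed in Table 3.1, namely 335, 499, 369 for 1956 and 278, 444, 481 for 1960.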
As another example, consider the crossover data of Ezzet and Whitehead
(1991).
Table 3.2. Crossover Data

      AB Sequence (Group 1)            BA Sequence (Group 2)
                B                                B
         1    2    3    4                 1    2    3    4
A   1   59   35    3    2        A   1   63   40    7    2
    2   11   27    2    1            2   13   15    2    0
    3    0    0    0    0            3    0    0    1    1
    4    1    1    0    0            4    0    0    0    0
The counts displayed in Table 3.2 are from a study conducted by 3M
Health Care Ltd. to compare the suitability of two inhalation devices (A and
B) in patients who are currently using a standard inhaler device delivering
salbutamol. Two independent groups of subjects participated. Group 1 used
device A for a week followed by device B (sequence AB). Group 2 used the
devices in reverse order (sequence BA).
The response variables $V^{(1)}$ (device A) and $V^{(2)}$ (device B) are ordinal
polytomous. Specifically, they are the self-assessment on clarity of leaflet
instructions accompanying the two devices, recorded on the ordinal four-point
scale,
1 = Easy
2 = Only clear after rereading
3 = Not very clear
4 = Confusing.
For this example there are two populations of interest: Group 1 and
Group 2. Let $y_{\mathbf{i}k} = y_{ijk}$ represent the number of the $n_k$ subjects responding at level
$i$ for device A and level $j$ for device B, where $n_1 = 142$ and $n_2 = 144$. Again,
the bivariate response profiles can be denoted by $\mathbf{i} = ij$ where $i, j = 1, 2, 3, 4$.
The bivariate responses are summarized in Table 3.2.
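The two-population layout can be made concrete by storing the Table 3.2 counts per group and recovering the sample sizes and the device margins. This is a minimal sketch; the dictionary layout and the row-by-row flattening into vectors of length 16 are assumptions made for illustration.

```python
# Counts y_ijk from Table 3.2: rows are device A levels, columns are device B levels.
table_3_2 = {
    1: [[59, 35, 3, 2], [11, 27, 2, 1], [0, 0, 0, 0], [1, 1, 0, 0]],   # AB sequence
    2: [[63, 40, 7, 2], [13, 15, 2, 0], [0, 0, 1, 1], [0, 0, 0, 0]],   # BA sequence
}

# Group sample sizes n_k.
n = {k: sum(sum(row) for row in t) for k, t in table_3_2.items()}

# Marginal totals for device A (rows) and device B (columns) within each group.
margin_A = {k: [sum(row) for row in t] for k, t in table_3_2.items()}
margin_B = {k: [sum(col) for col in zip(*t)] for k, t in table_3_2.items()}

# Each group's 4 x 4 table flattened row by row into a count vector of length 16.
y = {k: [c for row in t for c in row] for k, t in table_3_2.items()}

print(n, margin_A, margin_B)
```

The group totals reproduce $n_1 = 142$ and $n_2 = 144$ as stated in the text.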
3.3 Joint and Marginal Models
Two types of questions that can be posed about Table 3.1 lead to quite
distinct types of models. One question is whether the interest in the political
campaigns was different at the two times. For example, the researcher
may wish to test the hypothesis that there was more interest in the 1960
political campaign than the 1956 political campaign. An investigation into the
marginal distributions is needed to test this hypothesis. For these bivariate
response data, the marginal distributions correspond to the row and column
distributions of Table 3.1. A second question that may be asked is whether
the two responses are associated and if so, how strong is the association. To
answer these questions, we must describe the dependence displayed in the
joint distribution of Table 3.1.
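The two kinds of questions can be illustrated with simple descriptive summaries of Table 3.1. The sketch below computes cumulative marginal proportions for each year (a marginal summary) and local odds ratios for adjacent rows and columns (one common association summary). The choice of local odds ratios is an assumption made for illustration and is not necessarily the association measure used in the models of this chapter.

```python
table_3_1 = [
    [155, 116, 64],
    [91, 237, 171],
    [32, 91, 246],
]

def cumulative_props(totals):
    """Proportion responding at level i or lower, for i = 1, ..., d - 1."""
    n = sum(totals)
    out, running = [], 0
    for t in totals[:-1]:
        running += t
        out.append(running / n)
    return out

def local_odds_ratios(t):
    """Odds ratios for each 2 x 2 block of adjacent rows and adjacent columns."""
    return [
        [t[i][j] * t[i + 1][j + 1] / (t[i][j + 1] * t[i + 1][j])
         for j in range(len(t[0]) - 1)]
        for i in range(len(t) - 1)
    ]

cum_1956 = cumulative_props([sum(row) for row in table_3_1])
cum_1960 = cumulative_props([sum(col) for col in zip(*table_3_1)])
theta = local_odds_ratios(table_3_1)

print(cum_1956, cum_1960)
print(theta)
```

A smaller cumulative proportion in 1960 than in 1956 at each level would be descriptive evidence of a shift toward higher interest; the formal comparison is the task of the marginal models.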
The marginal models we consider will be used to investigate whether
the probability that a randomly selected subject responds at level i or lower
in 1956 is different from the probability that a randomly selected subject
responds at level i or lower in 1960. In this sense, the comparison of marginal

ACKNOWLEDGMENTS
I would like to express my appreciation to Dr. Alan Agresti for serving
as my dissertation advisor. For the many comments, ideas, and lessons he
has shared with me, I am greatly indebted. Through his advisement and
guidance, he has taught me to appreciate and respect good statistical research
and teaching. He is a mentor worthy of emulation. I also want to express
my gratitude to Dr. Jane Pendergast, who also served on my dissertation
committee. I learned a great deal from her during the two years that I worked
in the Biostatistics Department. To all of the faculty at the University of
Florida, I extend my thanks. The statistics department, with its scholarly
and friendly atmosphere, proved to be a wonderful place to learn.
The influences of persons from my past are not forgotten. Without
Patrick Kearin's stimulating teaching of high school math, I may never have
become interested in this subject. The genuine excitement delivered by Dr.
James Kepner, in his teaching of undergraduate statistics, was the reason I
decided to pursue an advanced degree in statistics.
I would like to thank my parents and the rest of my family for all of the
support and encouragement they have given over the course of my studies
and research. My friends and student colleagues deserve many thanks as
well. Finally, I would like to thank Kendra Paar for always being there to
support and encourage me while I was writing this paper.
TABLE OF CONTENTS
page
ACKNOWLEDGMENTS ii
LIST OF TABLES v
ABSTRACT vi
CHAPTERS
1 INTRODUCTION 1
1.1 A Brief Introduction to the Problem 1
1.2 Outline of Existing Methodologies: No Missing Data 3
1.3 Outline of Existing Methodologies: Missing Data 12
1.4 Format of Dissertation 14
2 RESTRICTED MAXIMUM LIKELIHOOD FOR A
GENERAL CLASS OF MODELS FOR
POLYTOMOUS RESPONSE DATA 17
2.1 Introduction 17
2.2 Parametric Modeling: An Overview 24
2.2.1 Model Specification 25
2.2.2 Measuring Model Goodness of Fit 33
2.3 Multivariate Polytomous Response Model Fitting 43
2.3.1 A General Multinomial Response Model 44
2.3.2 Maximum Likelihood Estimation 48
2.3.3 Asymptotic Distribution of Product-Multinomial
ML Estimator 56
2.3.4 Lagrange's Method: The Algorithm 60
2.4 Comparison of Product-Multinomial and
Product-Poisson Estimators 67
2.5 Miscellaneous Results 78
2.6 Discussion 83
3 SIMULTANEOUSLY MODELING THE JOINT AND
MARGINAL DISTRIBUTIONS OF MULTIVARIATE
POLYTOMOUS RESPONSE VECTORS 87
3.1 Introduction 87
3.2 Product-Multinomial Sampling Model 88
3.3 Joint and Marginal Models 93
3.4 Numerical Examples 98
3.5 Product-Multinomial Versus Product-Poisson
Estimators: An Example 111
3.6 Well-Defined Models and the Computation of
Residual Degrees of Freedom 121
3.7 Discussion 132
4 LOGLINEAR MODEL FITTING WITH
INCOMPLETE DATA 135
4.1 Introduction 135
4.2 Review of the EM Algorithm 137
4.2.1 General Results 138
4.2.2 Exponential Family Results 140
4.3 Loglinear Model Fitting with Incomplete Data 144
4.3.1 The EM Algorithm for Poisson Loglinear Models 145
4.3.2 Obtaining the Observed Information Matrix 148
4.3.3 Inferences for Multinomial Loglinear Models 152
4.4 Latent Class Model Fitting: An Application 160
4.5 Modified EM/Newton-Raphson Algorithm 166
4.6 Discussion 170
APPENDICES
A CALCULATIONS FOR CHAPTER 2 172
B CALCULATIONS FOR CHAPTER 4 176
BIBLIOGRAPHY 193
BIOGRAPHICAL SKETCH 200
LIST OF TABLES
page
2.1 Opinion Poll Data Configuration 22
3.1 Interest in Political Campaigns 91
3.2 Cross-Over Data 92
3.3 Joint Distribution Models: Goodness of Fit 100
3.4 Marginal Distribution Models: Goodness of Fit 101
3.5 Candidate Models in J(L x L + D) ∩ M(U): Goodness of Fit 102
3.6 Estimates of Freedom Parameters for
Model J(L x L + D) ∩ M(CU) 103
3.7 Freedom Parameter Estimates and Standard Errors 105
3.8 Estimated Cell Means and Standard Errors 106
3.9 Cross-Over Data Models: Goodness of Fit 110
3.10 Freedom Parameter ML Estimates for Model J(UA) ∩ M(U) 110
3.11 Children's Respiratory Illness Data 112
3.12 Product-Multinomial versus Product-Poisson Freedom
Parameter Estimation 117
4.1 Observed cross-classification of 216 respondents
with respect to whether they tend toward
universalistic (1) or particularistic (2) values
in four situations (A,B,C,D) of role conflict 162
4.2 Parameter and Standard Error Estimates 164
4.3 Classification Probability Estimates 165
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
ON MODEL FITTING FOR MULTIVARIATE POLYTOMOUS
RESPONSE DATA
By
Joseph B. Lang
May, 1992
Chairman: Dr. Alan Agresti
Major Department: Statistics
A broad class of models that imply structure on both the joint
and marginal distributions of multivariate categorical (ordinal or nominal)
responses is introduced. These parsimonious models can be used to
simultaneously describe the marginal distributions of the responses and the
association structure among the responses. As a special case, this class
of models includes classical log- and logit-linear models. In this sense,
we address model fitting for multivariate polytomous response data from
a very general perspective. Simultaneous models for joint and marginal
distributions are useful in a variety of applications, including longitudinal
studies and studies dealing with social mobility and inter-rater agreement.
We outline a maximum likelihood fitting algorithm that can be used for
fitting a large class of models that includes the class of simultaneous models.
The algorithm uses Lagrange's method of undetermined multipliers and a
modified Newton-Raphson iterative scheme. We also discuss goodness-of-fit
tests and model-based inferences. Inferences for certain model parameters
are shown to be equivalent for product-Poisson and product-multinomial
sampling assumptions. This useful equivalence result generalizes existing
results. The models and fitting method are illustrated for several applications.
Missing data are often a problem for multivariate response data. We
consider inferences about loglinear models for which only certain disjoint
sums of the data are observable. We derive an explicit formula for the
observed information matrix associated with the loglinear parameters that
is intuitively appealing and simple to evaluate. The observed information
matrix can be evaluated at the maximum likelihood estimates and inverted
to obtain an estimate of the precision of the loglinear parameter estimates.
The EM algorithm can be used to fit these incomplete data loglinear models.
We describe this algorithm in some detail, paying special attention to the
Poisson loglinear model fitting case. Alternative fitting algorithms are also
outlined. One proposed alternative uses both the EM and Newton-Raphson
algorithms, thereby resulting in a faster, more stable algorithm. We illustrate
the utility of these results using latent class model fitting.
CHAPTER 1
INTRODUCTION
1.1 A Brief Introduction to the Problem
There are many situations when multiple responses are observed for each
'subject' in a group, or several groups. Here 'subject' is generically used to
refer to a randomly chosen object that generates responses. The multiple
responses could represent repeated measurements taken on subjects over time
or occasions. They could be the ratings assigned by several judges that all
viewed and rated the same set of slides (here, the 'subjects' are the slides).
Or, perhaps, it may be that several distinct or noncommensurate responses
are recorded for each subject. These responses are often categorical (ordinal
or nominal) and inevitably interrelated. This dissertation addresses issues
related to modeling and model fitting for multivariate categorical (ordinal or
nominal) responses.
Models for multivariate categorical response data are usually developed
to answer questions about (i) the association structure among the multiple
responses or (ii) the behavior of the marginal distributions of the response
variables. Specifically, a typical question of the first type is, "How are the
responses interrelated and is this interrelationship the same across the levels
of the covariates?" A typical type ii question is, "How do the (marginal)
responses depend on the covariates or occasions?" Historically, many models
(e.g. log- and logit-linear models) have been developed for the primary
purpose of answering the type i questions. Many of these models can easily
be fitted using maximum likelihood (ML) methods. These models typically,
however, are not useful for answering the type ii questions (Cox, 1972).
Marginal models, those models used to answer type ii questions, are not
as well developed. One reason for this is that ML fitting of these marginal
models is more difficult. At present, the method of weighted least squares
(WLS) is used almost exclusively for fitting these models.
Suppose that we are interested in answering questions of both types
i and ii. Usually the questions are addressed using two different models, a
joint distribution model and a marginal model, and fitting them separately. It
seems reasonable to want a model that can be used to address simultaneously
both questions. That is, we would like a model that simultaneously implies
structure on both the joint and marginal distribution parameters. To date,
there has been very little work done on the development and fitting of these
simultaneous models.
Whenever multiple responses are observed it is inevitable that there will
be missing data. There are several ways to fit the Poisson loglinear model with
incomplete data. One popular method is to use the EM algorithm to find the
ML estimates of the loglinear parameters. One drawback to this algorithm
is that a precision estimate of the ML estimators is not produced as a
by-product. Several numerical techniques have been developed to approximate
the observed information matrix, which, upon inversion, will act as the
precision estimate. However, it would be of some convenience to derive an
explicit formula for the observed information matrix, at least in some special
cases.
1.2 Outline of Existing Methodologies: No Missing Data
We begin our discussion by considering the case of no missing data.
There are many methods for analyzing multivariate categorical (ordinal or
nominal) response data. These methods usually involve fitting (separately)
models for the joint or the marginal distributions of the response vectors.
In rare instances, simultaneous models for both the joint and marginal
distributions are considered. Maximum likelihood fitting methods for the
joint distribution models are simple and described in almost every standard
text on categorical data analysis. The fitting of marginal models using
ML methods is more difficult. Maximum likelihood fitting of the marginal
homogeneity model was considered by Madansky (1963) and Lipsitz (1988).
The fitting of a more general class of marginal models was considered
by Haber (1985a). Finally, the fitting of simultaneous models using ML
methods has only been addressed in the bivariate response case. The fitting
technique becomes very complicated when there are more than two categorical
responses. To appreciate the complexity of extending the technique to
multivariate response data, see section 6.5 of McCullagh and Nelder (1989)
or perhaps Dale (1986). In contrast, the ML fitting method of Chapter 2 can
easily be used to fit many marginal and simultaneous models. In the next few
paragraphs, we briefly describe the existing methods for modeling and model
fitting for multivariate categorical response data.
Modeling Joint Distributions Separately. One common method for analyzing
multivariate categorical responses is to model the joint distribution only.
These models, which include classical log- and logit-linear models for the
joint probabilities, are useful for describing the association structure among
the responses. The last 30 years have seen the development of these methods
for analyzing multivariate categorical responses (Haberman, 1979; Bishop et
al., 1975; Agresti, 1984, 1990). For specificity, consider the following panel
study: One hundred randomly selected subjects were asked how interested
they were in the political campaigns. They were to respond on the 3-point
ordinal scale, (1) Not Much, (2) Somewhat, and (3) Very Much. Then four
years later the same group of subjects was asked to respond on the same
scale to the same question. A separate investigation into the association
structure would enable us to answer questions of a conditional nature. For
example, we could estimate the probability of responding 'Very Much' on the
second occasion given that the response at the first occasion was 'Not Much'.
The description of these 'transitional' probabilities, although very interesting,
may not be completely satisfactory. We may also be interested in addressing
questions with regard to the marginal distributions. Perhaps we would like
to answer the question, "Are the distributions of responses to the political
interest question the same for each occasion?" Laird (1991), in a nice review of
likelihood-based methods for longitudinal analysis, mentions that the utility
of classical log- and logit-linear models is restricted to two situations: (1)
modeling the dependence of a univariate response on a set of covariates and
(2) modeling the association structure between a set of multivariate responses.
These models place structure on the joint probabilities and so they are not
directly useful for studying the dependence of the marginal probabilities on
occasion and other covariates. This problem was pointed out by several
authors (Cox, 1972; Prentice, 1988; McCullagh and Nelder, 1989;
Liang et al., 1991). An advantage of these models is that they are simple to fit
using either WLS (Grizzle et al., 1969), ML (McCullagh and Nelder, 1989),
or iterative proportional fitting (Bishop et al., 1975) methods. There are
many standard statistical programs available for fitting these models (SAS,
SPSS2, BMDP, GLIM, GENSTAT).
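The 'transitional' probabilities mentioned above are simply row-conditional proportions of the two-occasion table. As a rough illustration, and using for concreteness the 1956/1960 political interest counts of Hout et al. (1987) reported in Table 3.1 rather than the hypothetical panel of one hundred subjects, they can be estimated as follows. The function name is illustrative.

```python
# Rows: first-occasion response; columns: second-occasion response (Table 3.1 counts).
table_3_1 = [
    [155, 116, 64],
    [91, 237, 171],
    [32, 91, 246],
]

def transition_probs(table):
    """Estimate P(second response = j | first response = i) by row proportions."""
    return [[count / sum(row) for count in row] for row in table]

P = transition_probs(table_3_1)

# Estimated probability of 'Very Much' at the second occasion
# given 'Not Much' at the first occasion: 64 / 335.
p_vm_given_nm = P[0][2]
print(round(p_vm_given_nm, 3))
```

These conditional estimates describe individual change; they do not by themselves say whether the two marginal distributions differ, which is the distinction the paragraph draws.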
Modeling Marginal Distributions Separately. A second approach to analyzing
multivariate categorical responses is to model only the marginal
distributions and to ignore the joint distribution structure. Full likelihood
methods that consider only models for the marginal probabilities tacitly
assume a saturated model for the joint distribution. Therefore, the models
may be far from parsimonious. In the non-Gaussian response setting, there
is a distinction between these marginal models and the transitional (or
conditional) models of the previous paragraph. Marginal models describe the
occasion-specific distributions and the dependence of those distributions on
the covariates. Transitional or conditional models describe the distribution
of individual changes over occasions. Models for these transitions can be
represented as probability distributions for the future state 'given' the past
states. Questions regarding transition probabilities can only be investigated
with longitudinal data. On the other hand, questions regarding the marginal
probabilities could theoretically be answered using cross-sectional data,
provided the cohort (subject) effects were negligible. Panel studies, which
yield longitudinal data, provide more powerful tests for significance of
within-cluster factors, such as occasion effect. This follows because there is a reduced
cohort effect; we are using the same panel of subjects at each occasion. For
further discussion about the distinction between marginal and transitional
models, see Ware et al. (1988), Laird (1991), and Zeger (1988).
We will briefly discuss existing methods for making inferences about
the marginal probabilities separately. We will group these methods into 5
categories: (1) non-model-based methods, (2) WLS methods, (3) ML methods,
(4) semiparametric methods, and (5) other methods.
Non-model-based methods can be used to derive test statistics used for
testing specific hypotheses regarding the marginal distributions. Examples
include the Cochran-Mantel-Haenszel (1950, 1959) statistic, which can be used
for testing the hypothesis of marginal homogeneity (MH) (cf. White et al.,
1982), McNemar's (1947) statistic, which can be used for testing the equality of
two dependent proportions, and Madansky's (1963) likelihood-ratio statistic
for MH. Madansky's statistic compares the fit of the model of marginal
homogeneity to the fit of the unstructured (saturated) model (see also Lipsitz,
1988 and Lipsitz et al., 1990). Many other relevant test statistics, some of
which are generalizations or modifications of the aforementioned (cf. Mantel,
1963; White et al., 1982), exist. Cochran's (1950) Q statistic and Darroch's
(1981) Wald-type statistic are examples of other test statistics that can be
used to test for marginal homogeneity.
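For a binary response at two occasions, two of the statistics above have closed forms, which makes a small worked sketch possible. In a 2 x 2 table marginal homogeneity is equivalent to symmetry of the off-diagonal cells, so the ML fit under MH replaces each off-diagonal count by their average, and the likelihood-ratio statistic compares that fit with the saturated fit. The counts `b` and `c` below are hypothetical, and the function names are illustrative.

```python
import math

def mcnemar(b, c):
    """McNemar statistic for two dependent proportions; b, c are the off-diagonal counts."""
    return (b - c) ** 2 / (b + c)

def lr_mh_2x2(b, c):
    """Likelihood-ratio statistic for marginal homogeneity in a 2 x 2 table.

    Under MH both off-diagonal cells are fitted by (b + c) / 2;
    diagonal cells are fitted exactly and contribute nothing.
    """
    m = (b + c) / 2
    return 2 * sum(y * math.log(y / m) for y in (b, c) if y > 0)

# Hypothetical discordant counts: 25 subjects changed one way, 15 the other.
b, c = 25, 15
x2 = mcnemar(b, c)
g2 = lr_mh_2x2(b, c)
print(x2, round(g2, 4))
```

Both statistics are referred to a chi-squared distribution with one degree of freedom; with more than two categories the ML fit under MH no longer has a closed form, which is part of the motivation for the fitting algorithm of Chapter 2.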
Presently, if one were to fit a marginal model, say a generalized loglinear
model of the form $C \log A\mu = X\beta$, where $\mu$ is the vector of expected counts
in the full contingency table, he or she would most likely use the WLS fitting
algorithm. Most statistical software that fits these generalized loglinear
models does so using WLS. There are some advantages to using WLS. It
is computationally simple. Second-order marginal information is all that is
needed. And, the estimates are asymptotically equivalent to ML estimates.
Some disadvantages are that covariates must be categorical, sampling zeroes
create problems, and estimates are sensitive when second-order marginal
counts are small. The WLS method for analyzing categorical data was
originally outlined by Grizzle, Starmer and Koch (1969). Subsequently,
marginal models for longitudinal categorical data, or more generally
multivariate categorical response data, have been introduced and fitted using the
WLS method (Koch et al., 1977; Landis and Koch, 1979; Landis et al., 1988;
Agresti, 1989).
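To make the form $C \log A\mu = X\beta$ concrete, consider a 2 x 2 table with expected counts ordered as $(\mu_{11}, \mu_{12}, \mu_{21}, \mu_{22})$. The sketch below uses one possible choice of matrices, assumed here purely for illustration: $A$ sums cells into the four marginal totals, $C$ forms the two marginal logits, and a saturated design $X$ with an intercept and an occasion effect reproduces those logits. The expected counts are hypothetical.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical expected counts (mu_11, mu_12, mu_21, mu_22).
mu = [30.0, 25.0, 15.0, 30.0]

# A maps cell means to marginal totals (mu_1+, mu_2+, mu_+1, mu_+2).
A = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]

# C forms the marginal logits log(mu_1+/mu_2+) and log(mu_+1/mu_+2).
C = [
    [1, -1, 0, 0],
    [0, 0, 1, -1],
]

margins = matvec(A, mu)
eta = matvec(C, [math.log(m) for m in margins])

# With X = [[1, 0], [1, 1]], eta = X beta gives an intercept and an occasion effect.
beta0 = eta[0]
beta1 = eta[1] - eta[0]
print(margins, beta0, beta1)
```

Under marginal homogeneity the occasion effect `beta1` would be zero; the point of the example is only that the model constrains functions of the marginal totals $A\mu$, not the cell means directly.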
Maximum likelihood fitting of marginal models is more difficult since
the model utilizes marginal probabilities, rather than joint probabilities to
which the likelihood refers. When the responses are correlated, as they
invariably are, the marginal counts do not follow a product-multinomial
distribution. The full-table likelihood must be maximized subject to the
constraint that the marginal probabilities satisfy the model. Haber (1985a)
considers fitting generalized loglinear models of the form $C \log A\mu = X\beta$ using
Lagrange multipliers and an unmodified Newton-Raphson iterative scheme.
The algorithm becomes very difficult to implement for even moderately large
tables. This is primarily due to the difficulty of inverting the large Hessian
matrix of the Lagrangian objective function. In this dissertation we consider a
modified Newton-Raphson scheme that uses a much simpler matrix than the Hessian.
The matrix is easily inverted even for relatively large tables. Haber (1985b)
considers the estimation of the parameters $\beta$ in the special case $C \log \mu = X\beta$.
We will use a modification of the method of Aitchison and Silvey (1958, 1960)
and Silvey (1959) to investigate the asymptotic behavior of the estimators of
$\beta$ in the more general model $C \log A\mu = X\beta$, thereby extending the work of
Haber (1985b). Another relevant paper, Haber and Brown (1986), considers
ML fitting of a model for the expected counts $\mu$ that has loglinear and
linear constraints. One can test hypotheses about the marginal probabilities
by comparing the fit of relevant models. Haber (1985a, 1985b) and Haber
and Brown (1986) only consider fitting the marginal models separately. No
attempt has been made to simultaneously model the joint and marginal
distributions.
Semiparametric methods such as quasi-likelihood (Wedderburn, 1974)
and a multivariate extension, generalized estimating equations (GEE), have
become popular in recent years. The work of Liang and Zeger (1986), which
advocated the use of these GEEs, has been extended to cover the multivariate
categorical response data setting (Prentice, 1988; Zhao and Prentice, 1991;
Stram et al., 1988; Liang et al., 1991). With these semiparametric methods,
the likelihood is not completely specified. Instead, generalized estimating
equations are chosen so that, when the marginal model holds, even if the
association among the multiple responses is misspecified, the estimators are
consistent and asymptotically normally distributed. These estimators, used
in conjunction with a robust estimator of their covariance (Liang and Zeger,
1986; Zeger and Liang, 1986; White, 1980, 1981, 1982; Royall, 1986), result
in consistent inference about the effects of interest. When the responses are
truly independent, the estimating equations with correlation matrix taken to
be the identity matrix are equivalent to the likelihood equations. The GEE
approach requires the specification of a 'working' association or correlation
matrix. Examples of working associations include those that imply all
pairwise associations (measured in terms of odds ratios) are the same and
that the higher order associations are negligible (Liang et al., 1991).
A related approach is known as GEE2. The consistency of these estiÂ¬
mators follows only if both the marginal model and the pairwise association
model are correctly specified. This approach is a second order extension
of the GEEs of Liang and Zeger (1986) which are now termed GEEl. It
is second order because the estimation of the marginal model parameters
and the pairwise association model parameters is considered simultaneously.
The focus of both approaches, GEEl and GEE2, is usually on modeling
the marginal distributionsâ€”investigating how the marginal distributions
depend on occasion and covariates. The association is considered a nuisance.
Presently, there are no tests for goodnessoffit of these models and so the
investigation into how well both models fit can be done only at an empirical
level. The assumption that higher order effects are negligible may not be
tenable. Testing procedures to assess the validity of these assumptions have
yet to be developed. Also, in contrast to WLS and ML methods, which
require only that the missing data be ‘missing at random’ (MAR), the
semiparametric approaches require the missing data to be ‘missing completely
at random’ (MCAR). The assumption that the missing data mechanism is
MCAR is a much stronger assumption than MAR (Little and Rubin, 1986).
Finally, there are many other approaches to analyzing the marginal
probability structure separately. There are random effects models, whereby
subject-specific random effects induce a correlation structure on the multiple
responses. The marginal approach, in which the full likelihood is obtained by
averaging across the random effects, is computationally difficult (Stiratelli
et al., 1984). An alternative is to condition on the sufficient statistics
for the subject effects and consider finding the estimates by maximizing
the conditional likelihood. For further details on these conditional and
unconditional methods see Rasch, 1961; Tjur, 1982; Agresti, 1991; Stiratelli
et al., 1984; Conaway, 1989, 1990. As yet another alternative, Koch et al.
(1980) give a bibliography for relevant nonparametric methods for analyzing
repeated measures data. Agresti and Pendergast (1986) consider replacing
the actual observations by their within cluster rank and testing for marginal
homogeneity using the ordinary ANOVA statistic for repeated measures data.
A three-stage estimator for repeated measures studies with possibly missing
binary responses has been developed by Lipsitz et al. (1992). This approach
is very similar to a generalized least squares approach, but it has some of
the nice features of the GEE approaches. One of these nice features is that
the estimators and their variance estimates are consistent under very mild
assumptions. An extension of this method to the polytomous response case
has yet to be developed.
Simultaneous Investigation of Joint and Marginal Distributions. There
has been very little work done to investigate simultaneously the joint and
marginal distribution structure. In some ways GEE2 is an attempt to
describe both distributions. However, only the pairwise (not the joint)
association structure is modeled; the higher-order associations are considered
a nuisance. Tests comparing nested models have not been developed in this
semiparametric setting. Full likelihood approaches have been addressed
by Dale (1986), McCullagh and Nelder (1989, Chapt. 6), and Becker and
Balagtas (1991). Dale models the joint distributions of bivariate ordered
categorical responses by assuming that the log global odds ratios follow a
linear model. The marginal probabilities are assumed to follow a cumulative
logit model. McCullagh and Nelder consider simultaneously modeling the
joint and marginal probabilities of a bivariate dichotomous response (two
distinct responses) by assuming that the log odds ratios follow a linear
model and that the marginal probabilities follow a logit-linear model. Their
example included age as a categorical covariate. Finally, Becker and Balagtas
consider models for two-period crossover data. The bivariate dichotomous
response was the response to the two different treatments. Order of treatment
application was considered a covariate. They assumed that the two log odds
ratios followed a linear model and that the marginal probabilities satisfied a
loglinear model. Because it is the marginal probabilities and not the joint
probabilities that satisfy a loglinear model, Becker and Balagtas refer to the
model as log nonlinear.
The ML model fitting approach used by each of these authors involves
a reparameterization of the likelihood, which is a function of the joint
probabilities, in terms of the joint and marginal model parameters. The
reparameterization in the bivariate response case (the case each author
considered) is somewhat complicated, especially for multilevel responses. To
make matters worse, the extension of this method to general multivariate
polytomous responses looks to be extremely difficult. If the
reparameterizations are made so that the full likelihood is expressible in terms of the
joint and marginal model parameters, the likelihood can be maximized using
a Newton-Raphson-type algorithm. Basically, one must solve for the root of
some nonlinear score equation. This maximization approach is very sensitive
to the starting value in that convergence to a local maximum is not likely
unless the starting estimate is very close to the actual maximum. Finding
reasonable starting values is not a simple task. Dale (1986) outlines a method,
specifically for the models considered in that paper, for finding a starting
estimate.
In this dissertation, we outline an ML fitting method that can easily be
used to fit a large class of simultaneous models, including those considered
by Dale, McCullagh and Nelder, and Becker and Balagtas. The approach
involves using Lagrange’s method of undetermined multipliers along with a
modified Newton-Raphson iterative scheme. For all of the models considered,
an initial estimate for the algorithm is the data counts themselves along with
a vector of zeroes corresponding to a first guess at the values of the Lagrange
multipliers. The convergence of the algorithm is quite stable. The extension
to multivariate polytomous response data is straightforward.
1.3 Outline of Existing Methodologies: Missing Data
Missing data is often an issue when the response is multivariate in nature.
Missing data can also occur in more hypothetical situations. Examples
include loglinear latent class models (Goodman, 1974; Haberman, 1988)
and linear mixed or random effects models (Laird et al., 1987). In latent
class analyses, a latent variable, which is unobservable, is assumed to exist.
Mixed or random effects models posit the existence of some unobservable
random variables that affect the mean response. In this brief outline, we will
consider ML methods for model fitting when the data are not completely
observable. Little and Rubin (1986) provide a nice summary of methods
for model fitting with incomplete data. There are many ways to find the
maximum likelihood estimators when the data are not completely observable,
each method having its positive and negative features. We could work directly
with the incomplete-data likelihood, which is usually complicated relative to
the complete-data likelihood, and use a Newton-Raphson or Fisher-scoring
algorithm. Palmgren and Ekholm (1987) and Haberman (1988) use these
methods to obtain maximum likelihood estimates and their standard errors.
Alternatively, we could avoid the complicated likelihood altogether and use
the Expectation-Maximization algorithm (Dempster et al., 1977). Sundberg
(1976) discusses the properties of the EM algorithm when it is used to
fit models to data coming from the regular exponential family. The EM
algorithm is one of the more flexible ML fitting algorithms for missing data
situations. We will primarily focus on this method for fitting loglinear models
with incomplete data.
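To make the E- and M-steps concrete, the sketch below runs EM on the well-known genetic linkage illustration from Dempster et al. (1977), in which one observed count is the sum of two complete-data cells. It is not an example from this dissertation; the function name `em_linkage` and the starting value 0.5 are choices made here for illustration.

```python
def em_linkage(y, theta=0.5, tol=1e-12, max_iter=200):
    """EM for multinomial cells (1/2 + t/4, (1-t)/4, (1-t)/4, t/4).

    The first observed count is the sum of two complete-data cells with
    probabilities 1/2 and t/4; only that sum is observable.
    """
    y1, y2, y3, y4 = y
    for _ in range(max_iter):
        # E-step: expected part of y1 that belongs to the t/4 cell.
        x2 = y1 * (theta / 4.0) / (0.5 + theta / 4.0)
        # M-step: complete-data ML estimate (a binomial proportion).
        new_theta = (x2 + y4) / (x2 + y4 + y2 + y3)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


theta_hat = em_linkage((125, 18, 20, 34))
print(theta_hat)
```

Each iteration needs only the complete-data ML formula, which is what makes the algorithm easy to implement. Note that the loop returns a point estimate and nothing else; no standard error falls out of the iterations.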
Although the EM algorithm is easily implemented to fit loglinear models
with incomplete data, the algorithm does not provide an estimate of precision
of the model parameter estimators. Meng and Rubin (1991) outline a
supplemental EM (SEM) algorithm, whereby, upon convergence of the EM
algorithm, the variance matrix for the model estimators is adjusted to account
for missing data. The adjustment is a function of the rate of convergence of
the EM algorithm, which in turn is a function of how much information
is missing. Meng and Rubin numerically estimate the rate of convergence,
thereby obtaining an estimate of precision that reflects missingness. Although
this approach should prove to be applicable in the general situation, it still
is desirable to derive an explicit formula for the variance matrix that reflects
missingness. Other authors (Meilijson, 1989; Louis, 1982) have discussed
methods for estimating precision of model estimators when the data are
incomplete and the EM algorithm is used. Meilijson’s method involves
EM-aided differentiation, which is essentially a numerical differentiation of the
score vector. The method relies on the assumption that the observed data
components are i.i.d. (identically and independently distributed). Louis
gives an analytic formula for the observed information matrix based on the
incomplete data. The computation of the observed information matrix based
on this formula is not straightforward and must be considered separately for
each special application.
1.4 Format of Dissertation
In Chapter 2, we develop a maximum likelihood method for fitting a large
class of models for multivariate categorical response data. This development
follows a general discussion about parametric modeling. Concepts such as
degrees of freedom and model distances (or goodness of fit) are described at
an intuitive level. We also describe and compare the asymptotic distributions
of freedom parameter estimators under product-multinomial and
product-Poisson sampling assumptions. Chapter 3 has more of an applied flavor.
We consider simultaneously modeling the joint and marginal distributions
of multivariate categorical response vectors. A broad class of simultaneous
models is introduced. The models can be fitted using the techniques of
Chapter 2. Several numerical examples are considered. Chapter 4 outlines the
ML fitting technique known as the EM algorithm. This algorithm is used to
fit models with incomplete data. Some advantages and disadvantages of using
the EM algorithm are addressed. The most important disadvantage is that
the algorithm does not provide, as a byproduct, a precision estimate of the
ML estimators. We derive an explicit formula for the observed information
matrix for the Poisson loglinear model parameters when only disjoint sums of
the complete data are observable. An application to latent class modeling is
considered. We also propose an ML fitting algorithm that uses both EM and
Newton-Raphson steps. The modified algorithm should prove to have many
positive features.
In this dissertation, we do not distinguish typographically between
scalars, vectors, and matrices. Parameters and variables are treated as
objects, their dimensions either being explicitly stated or implied contextually.
By convention, functions that map scalars into scalars, when applied to
vectors, will be defined componentwise. For example, if $\mu$ represents an $n \times 1$
vector, then
$$\log \mu = (\log \mu_1, \log \mu_2, \ldots, \log \mu_n)'.$$
We frequently use abbreviations that are common in the statistical
literature. They include ML (Maximum Likelihood), WLS (Weighted
Least Squares), IWLS (Iterative (Re)Weighted Least Squares), and EM
(Expectation-Maximization).
The range (or column) space of an $n \times p$ matrix $X$ is denoted by $M(X)$
and is defined as $\{\mu : \mu = X\beta,\ \beta \in R^p\}$. The symbols $\otimes$ and $\oplus$ are the
binary operators ‘direct product’ and ‘direct sum’. The direct (or Kronecker)
product is taken to be the right-hand product. That is,
$$A \otimes B = \{A b_{ij}\}.$$
The direct sum, $C$, of two matrices $A$ and $B$ is defined as
$$C = A \oplus B = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}.$$
The symbol $D(\mu)$ represents a diagonal matrix with the elements of $\mu$ on the
diagonal. That is,
$$D(\mu) = \begin{pmatrix} \mu_1 & 0 & \cdots & 0 \\ 0 & \mu_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mu_n \end{pmatrix}.$$
In Chapter 4, we make use of the bracket notation often used by
statistical and mathematical programming languages (e.g. S-Plus, Matlab).
To illustrate the notation, consider a matrix A. The (sub)matrix A[, -2] is
the matrix A with the second column deleted. Similarly, the matrix A[-3, ]
is the matrix A with the third row deleted.
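The notational conventions of this section can be mirrored in a few lines of code. The sketch below is only an illustration using plain Python lists; every function name is hypothetical. Note that in the right-hand product block (i, j) is A multiplied by the (i, j) element of B, which reverses the roles the two factors play in some software implementations of the Kronecker product.

```python
import math


def log_vec(mu):
    """Componentwise log of a vector."""
    return [math.log(m) for m in mu]


def right_kron(A, B):
    """Right-hand direct product {A * b_ij}: block (i, j) equals b_ij * A."""
    m, n = len(A), len(A[0])
    p, q = len(B), len(B[0])
    return [[B[i][j] * A[k][l] for j in range(q) for l in range(n)]
            for i in range(p) for k in range(m)]


def direct_sum(A, B):
    """Block-diagonal matrix with A in the upper left and B in the lower right."""
    top = [row + [0] * len(B[0]) for row in A]
    bottom = [[0] * len(A[0]) + row for row in B]
    return top + bottom


def diag(mu):
    """Diagonal matrix D(mu) with the elements of mu on the diagonal."""
    return [[mu[i] if i == j else 0 for j in range(len(mu))]
            for i in range(len(mu))]


def drop_col(A, j):
    """Matrix A with its j-th column deleted (j counted from 1)."""
    return [row[:j - 1] + row[j:] for row in A]


def drop_row(A, i):
    """Matrix A with its i-th row deleted (i counted from 1)."""
    return A[:i - 1] + A[i:]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(right_kron(A, B))
print(direct_sum(A, [[5]]))
```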
Equation numbering is consecutive within sections of a chapter, the
first number representing the chapter in which it appears. For example, the
thirteenth equation in section 2.3 is equation (2.3.13). Within each appendix,
the equations are numbered consecutively. For example, the third equation
in Appendix B is numbered (B.3). Tables are numbered consecutively within
chapters so that, for instance, Table 3.2 represents the second table within
Chapter 3. Theorems, lemmas, and corollaries are numbered independently
of each other. All are numbered consecutively within sections. Therefore,
Corollary 3.2.2 is the second corollary within section 3.2 and Theorem 2.3.1
is the first theorem within section 2.3.
CHAPTER 2
RESTRICTED MAXIMUM LIKELIHOOD FOR A GENERAL
CLASS OF MODELS FOR POLYTOMOUS RESPONSE DATA
2.1 Introduction
In this chapter, we consider using maximum likelihood methods to fit a
general class of parametric models for univariate or multivariate polytomous
response data. The models will be specified in terms of freedom equations
and/or constraint equations. These two ways of specifying models will be
discussed at length in section 2.2. The model specification equations may be
linear or nonlinear in the model parameters. Specifically, if $\mu$ represents the
$s \times 1$ vector of expected cell means, the linear constraints will be of the form
$L\mu = d$ and the nonlinear constraints will be of the form $U'C\log(A\mu) = 0$.
The freedom equations will have form $C\log(A\mu) = X\beta$, where the
components of the vector $\beta$ are referred to as the freedom parameters. In
Chapter 3 of this dissertation, we discuss more specifically models that can
be specified in terms of these constraint and freedom equations. The models
of that chapter allow one to simultaneously model the joint and marginal
distributions of multivariate polytomous response vectors.
The maximum likelihood, model fitting algorithm of this chapter utilizes
Lagrange multipliers and a modified Newton-Raphson iterative scheme. In
particular, the models will be specified in terms of constraint equations and
the log likelihood will be maximized subject to the constraint equations being
satisfied. One common optimization algorithm found in the mathematics
literature is Lagrange’s method of undetermined multipliers. We show that
Lagrange’s method is easily implemented for ML fitting of the models under
consideration in this chapter. One problem with Lagrange’s method of
undetermined multipliers for ML fitting of statistical models has been that it
becomes computationally infeasible for large data sets. By using a modified
Newton-Raphson method, which involves inverting a matrix of a simpler form
than the more complicated Hessian, we consider fitting models to relatively
large data sets.
We also explore the asymptotic behavior of the estimators within the
framework of constraint (rather than freedom) models. Usually, asymptotic
properties of model and freedom parameter estimators are studied within the
framework of freedom models. Aitchison and Silvey (1958, 1960) and Silvey
(1959) studied the asymptotic behavior of the model parameter estimators
when the model is specified in terms of constraint equations. Following the
arguments of Aitchison and Silvey, we derive the asymptotic distributions of
both the model and freedom parameter estimators.
Previous work by Haber (1985a) addressed maximum likelihood methods
for fitting models of the form
$$C\log(A\mu) = X\beta$$
to categorical response data. Subsequently, Haber and Brown (1986)
discussed ML fitting for loglinear models that were also subject to the
linear constraints $L\mu = d$, where these constraints necessarily include the
identifiability constraint required of the vector of product-multinomial
cell means. Both of these papers advocated the use of Lagrange’s method
of undetermined multipliers to find the maximum likelihood estimates of
the model parameters $\mu$. The method of Haber (1985a) involved using
the (unmodified) Newton-Raphson method, which becomes computationally
unattractive as the number of components in $\mu$ gets moderately large. Both
Haber (1985a) and Haber and Brown (1986) were primarily concerned with
measuring model goodness of fit and therefore did not consider estimation
of freedom parameters. Haber (1985b) did consider estimation of freedom
parameters, but only when the simpler model $C\log\mu = X\beta$ was used. One of
the several ways that we extend the work of Haber (1985a, 1985b) and Haber
and Brown (1986) is to consider estimation of the freedom parameters when
the more general model $C\log(A\mu) = X\beta$ is used.
Others have considered ML fitting of nonstandard models for
multivariate polytomous response data. Laird (1991) outlines the different approaches
taken by different authors. As an example, Dale (1986) considered ML fitting
for a particular class of models for bivariate polytomous ordered response data
which were of the form
$$C_1\log(A_1\mu) = X_1\beta_1, \qquad g(A_2\mu) = X_2\beta_2.$$
Specifically, the first freedom equation specifies a loglinear model for the
association between the two responses measured by the global cross-ratios
(cross-product ratios of quadrant probabilities) so that $C_1$ and $A_1$ are of
a particular form. The second set of freedom equations specifies some
generalized linear model (McCullagh and Nelder, 1989) for the marginal
means or probabilities. Maximum likelihood estimators for the association
model freedom parameters $\beta_1$ and the marginal model freedom parameters
$\beta_2$ were simultaneously computed by iteratively solving the score equations
via a quasi-Newton approach. To use this maximization technique, the score
functions, which involve the cell probabilities, must be written explicitly
as a function of the freedom parameter $\beta = \mathrm{vec}(\beta_1, \beta_2)$. A nontrivial
approach to finding reasonable starting values for $\beta$ is discussed by Dale
(1986). Along with Dale, McCullagh and Nelder (section 6.5, 1989) and
Becker and Balagtas (1991) consider writing the score as an explicit function
of the freedom parameters so that the marginal and association freedom
parameter estimates may be computed simultaneously. In general, when there
are more than two responses, this is not a simple task and so an extension
of this method to multivariate polytomous response data models will be very
messy indeed. Also, convergence of the iterative scheme requires good initial
estimates of the freedom parameter $\beta$. These may be very difficult to find. In
contrast, the maximization approach of this chapter, which is similar to Haber
(1985a) and Haber and Brown (1986), is shown to be easily implemented for
fitting multivariate polytomous response data models. With this technique,
it is not necessary to write the cell means as an explicit function of the
freedom parameters. Further, initial estimates of the freedom parameters,
which are difficult to find, are not needed for this technique. Instead, only
initial estimates of the cell means and undetermined multipliers are needed.
Reasonable initial estimates of the cell means are the cell counts themselves.
A reasonable initial estimate of the vector of undetermined multipliers
is the zero vector, which is the value of the undetermined multipliers when the model
fits the data perfectly.
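The starting values just described can be tried on a toy problem. The sketch below is a minimal illustration under several assumptions that are not taken from this chapter: it uses a Poisson log likelihood, a single linear constraint (marginal homogeneity in a 2 x 2 table, written as a row of `L`), hypothetical counts, and the plain Newton-Raphson update on the Lagrangian rather than the modified scheme developed here.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_constrained_poisson(y, L, tol=1e-10, max_iter=50):
    """Maximize sum(y*log(mu) - mu) subject to L mu = 0 via Lagrange multipliers."""
    s, u = len(y), len(L)
    mu = [float(v) for v in y]   # start at the observed counts
    lam = [0.0] * u              # start the multipliers at zero
    for _ in range(max_iter):
        score = [y[i] / mu[i] - 1.0 + sum(lam[k] * L[k][i] for k in range(u))
                 for i in range(s)]
        h = [sum(L[k][i] * mu[i] for i in range(s)) for k in range(u)]
        if max(abs(v) for v in score + h) < tol:
            break
        # Jacobian of (score, h) with respect to (mu, lam).
        jac = [[0.0] * (s + u) for _ in range(s + u)]
        for i in range(s):
            jac[i][i] = -y[i] / mu[i] ** 2
            for k in range(u):
                jac[i][s + k] = L[k][i]
                jac[s + k][i] = L[k][i]
        step = solve(jac, [-v for v in score + h])
        mu = [m_i + d for m_i, d in zip(mu, step[:s])]
        lam = [l_k + d for l_k, d in zip(lam, step[s:])]
    return mu, lam


# Hypothetical counts (y11, y12, y21, y22); the constraint is mu12 - mu21 = 0.
mu_hat, lam_hat = fit_constrained_poisson([50, 20, 10, 20], [[0, 1, -1, 0]])
print(mu_hat, lam_hat)
```

For this constraint the fitted off-diagonal means are both the average of the two off-diagonal counts, which gives a closed-form answer to check the iterations against. The unknowns are the cell means and the multiplier only; no freedom parameters appear anywhere in the iteration.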
We will now introduce the class of models that we will consider for the
remainder of this chapter and the next, more applied chapter. The models
have form
$$C_1\log(A_1\mu) = X_1\beta_1, \qquad C_2\log(A_2\mu) = X_2\beta_2, \qquad L\mu = d,$$
where the linear constraints include the identifiability constraints. Later,
when we study the asymptotic behavior of the ML estimators, we will
require the components of $d$ to be zero unless they correspond to an
identifiability constraint. These models, which are of the form
$C\log(A\mu) = X\beta$, $L\mu = d$, will allow us to model both the joint and marginal distributions
simultaneously when dealing with multivariate response data. The bivariate
association model of Dale (1986) is a special case of these models, as we
can specify the matrices $C_1$ and $A_1$ so that $C_1\log(A_1\mu)$ is the vector of log
bivariate global cross-ratios. Restricting the marginal models to have form
$C_2\log(A_2\mu) = X_2\beta_2$, rather than allowing the marginal means to follow a
generalized linear model, as Dale (1986) did, is not overly restrictive. In
fact, many of the generalized linear models for multinomial cell means can be
written in this form. For example, loglinear, multiple logit, and cumulative
logit models are of this form. Also, unlike Haber (1985a) and Haber and
Brown (1986), we will be concerned with estimation of the freedom parameter
$\beta = \mathrm{vec}(\beta_1, \beta_2)$, thereby allowing for model-based inference.
Model-based inferences usually refer to inferences based on freedom
parameters. With freedom equations, we have the luxury of choosing a
parameterization that results in the freedom parameters having meaningful
interpretations. For instance, a freedom parameter $\beta$ may be chosen to
represent a departure from independence in the form of a log odds ratio.
More generally, we usually will try to parameterize in such a way so that
certain parameters will measure the magnitude of an effect of interest.
For example, consider an opinion poll where a group of subjects were
asked on two different occasions whether they would vote for the President
again in the next election. Suppose they were asked immediately after the
President took office and again after the President had served for two years.
The researcher may be interested in determining whether the distribution of
response changed from Time 1 to Time 2 and if so, assess the magnitude of
the change. The data configuration can be displayed as in Table 2.1.
Table 2.1. Opinion Poll Data Configuration

          Data                            Probabilities
                   Time 2                                 Time 2
                   yes       no                           yes          no
Time 1   yes    $y_{11}$  $y_{12}$       Time 1   yes  $\pi_{11}$  $\pi_{12}$   $\pi_{1+}$
         no     $y_{21}$  $y_{22}$                no   $\pi_{21}$  $\pi_{22}$   $\pi_{2+}$
                                                       $\pi_{+1}$  $\pi_{+2}$
We could formulate a model of the form $C\log(A\mu) = X\beta$ in such a way
so that the freedom parameter $\beta$ has a nice interpretation with respect to the
hypothesis of interest. One such model is
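$$\log\left(\frac{\phi_i}{1 - \phi_i}\right) = \alpha + \beta_i, \qquad i = 1, 2 \qquad (2.1.1)$$
where the parameter $\phi_i$ is a marginal probability, i.e.
$$\phi_i = \begin{cases} \pi_{+1}, & \text{if } i = 1 \\ \pi_{1+}, & \text{if } i = 2 \end{cases}$$
and, for identifiability of the freedom parameters,
$$\beta_1 = -\beta_2 = \beta.$$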
Model (2.1.1) is a simple logit model for the marginal probabilities $\{\pi_{i+}\}$ and
$\{\pi_{+j}\}$. The parameter $\beta$ measures the magnitude of departure from marginal
homogeneity in that $\beta = 0$ if and only if there is marginal homogeneity.
One could use the Wald statistic $\hat\beta/\mathrm{se}(\hat\beta)$ to test the hypothesis. If the
null hypothesis is rejected, we can assess the magnitude of departure from
marginal homogeneity by computing a confidence interval for $2\beta$, which is the
log odds ratio comparing the odds that a randomly chosen subject responds
‘yes’ at Time 2 to the odds that a randomly chosen subject responds ‘yes’ at
Time 1.
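A small numerical sketch may help fix ideas. It assumes a single multinomial sample and no restrictions on the joint distribution beyond the marginal logit model, so that the ML estimates of the cell probabilities are simply the sample proportions; the standard error is then obtained by the delta method. The counts and the function name are hypothetical.

```python
import math


def marginal_logit_contrast(y):
    """Estimate the Time 2 versus Time 1 marginal log odds ratio and its s.e.

    y = [[y11, y12], [y21, y22]] with rows indexed by Time 1 and columns
    by Time 2; the first row/column is the 'yes' response.
    """
    n = sum(sum(row) for row in y)
    p = [[v / n for v in row] for row in y]
    p_time1 = p[0][0] + p[0][1]   # P(yes at Time 1)
    p_time2 = p[0][0] + p[1][0]   # P(yes at Time 2)
    log_or = (math.log(p_time2 / (1 - p_time2))
              - math.log(p_time1 / (1 - p_time1)))
    # Gradient of the contrast with respect to the four cell probabilities.
    a = 1.0 / (p_time2 * (1 - p_time2))
    b = 1.0 / (p_time1 * (1 - p_time1))
    grad = [[a - b, -b], [a, 0.0]]
    mean_g = sum(p[i][j] * grad[i][j] for i in range(2) for j in range(2))
    mean_g2 = sum(p[i][j] * grad[i][j] ** 2 for i in range(2) for j in range(2))
    se = math.sqrt((mean_g2 - mean_g ** 2) / n)
    return log_or, se


# Hypothetical poll counts.
log_or, se = marginal_logit_contrast([[50, 20], [10, 20]])
wald_z = log_or / se
interval = (log_or - 1.96 * se, log_or + 1.96 * se)
print(log_or, se, wald_z, interval)
```

Because the log odds ratio is twice the freedom parameter, the ratio `log_or / se` equals the Wald statistic for the freedom parameter itself, and `interval` is an approximate 95% interval for the log odds ratio.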
This simple example illustrates the utility of using freedom parameters
and the corresponding model-based inferences. For this reason, this chapter
will be concerned with making inferences about both the model parameters
$\mu$ and the freedom parameters $\beta$.
The contents of the following sections are as follows. In section 2.2,
we provide an overview of parametric modeling. The two ways of specifying
models (via constraint equations and via freedom equations) are discussed
at length in section 2.2.1. It is shown that a model specified in terms of
freedom equations can be respecified in terms of constraint equations. In
particular, the freedom equation $C\log(A\mu) = X\beta$, which actually constrains
the function $C\log(A\mu)$ to lie in some manifold spanned by the columns of $X$,
is equivalent to the constraint equation $U'C\log(A\mu) = 0$, where the columns
of $U$ form a basis for the null space of $X'$. Other topics covered in section 2.2
include interpretation and calculation of ‘degrees of freedom’ and measuring
model goodness of fit.
We describe a general class of models for univariate or multivariate
polytomous response data in section 2.3.1. The data vector y is initially
assumed to be a realization of a product-multinomial random vector. We
describe the asymptotic behavior of the product-multinomial ML estimators
in section 2.3.3. Lagrange’s method of undetermined multipliers is used to
find restricted maximum likelihood estimates of the model parameters and
the freedom parameters. The actual algorithm is described in detail in section
2.3.4.
In section 2.4, we explore the relationship between the product-multinomial
and product-Poisson ML estimators. General results that allow one to
ascertain when inferences based on product-Poisson estimates are the same as
inferences based on product-multinomial estimates are shown to follow quite
directly when one works within the framework of constraint models. Theorem
2.4.2 of this section represents a generalization of the results of Birch (1963)
and Palmgren (1981).
2.2 Parametric Modeling: An Overview
Inferences about the distribution of some $n \times 1$ random vector $Y$ are
often based solely on a particular realization $y$ of $Y$. In parametric modeling
it is often the case that the distribution of $Y$ is known up to an $s \times 1$ vector
of model parameters $\theta$; i.e. it is ‘known’ that
$$Y \sim F(y; \theta), \qquad \theta \in \Theta, \qquad (2.2.1)$$
where $\Theta$ is some $(s - q)$-dimensional ($q \ge 0$) subset of $R^s$ known to contain the
true unknown parameter $\theta^*$. The cumulative distribution function $F$ maps
points in $R^n$ into the unit interval $[0, 1]$ and is assumed to be known.
In general, we will allow the dimension $s$ of $\theta$ to grow with $n$. For
example, let $Y = (Y_1, \ldots, Y_n)$ have independent components such that
$$Y_i \sim \text{ind } G(y_i; z_i(\theta)), \qquad i = 1, \ldots, n,$$
where $z_i(\theta)$ is some function of $\theta$ associated with the $i$th component of $Y$.
The function could be defined as $z_i(\theta) = \theta_i$, in which case $s = n$. Or, on
the other hand, $z_i$ could be a mapping from $R^s$ to $R^1$ with $s$ fixed.
2.2.1 Model Specification.
In parametric settings, models for the data, or more precisely, models for
the distribution of Y, can be completely specified by recording the family of
candidate distributions that F may belong to. That is, one must specify the
form for $F(\cdot; \theta)$ and the space $\Theta_M$ that is assumed to contain the true value
$\theta^*$ of $\theta$. In parametric modeling, the form of $F(\cdot; \theta)$ is assumed known, but
the true value $\theta^*$ is not. Denote a parametric model by $[F(\cdot; \theta), \theta \in \Theta_M]$ or
more simply by $[\Theta_M]$. We say the model $[\Theta_M]$ ‘holds’ if the true parameter
value $\theta^*$ is a member of $\Theta_M$, i.e.
$$[\Theta_M] \text{ holds} \iff \theta^* \in \Theta_M.$$
A model does not hold if $\theta^* \notin \Theta_M$.
The objective of model fitting is to find a simple, parsimonious model
that holds (or nearly holds). By parsimonious, we mean that the vector 9 can
be obtained as a function of relatively few unknown parameters. An example
 26 
of a parsimonious model for the distribution of an nvariate normal vector
with unknown mean vector fi and known covariance is [0/3], where
0/3 = {fj. e Rn : Hj = (3, j = 1,..., n, (3 unknown}.
Notice that all n components of /j, can be obtained as a function of
one unknown parameter /3. Thus, all of our estimation efforts can be
directed towards the estimation of the common mean (3. An example of a
nonparsimonious model is the socalled saturated model [0], where
0 = {/i : n g Rn} = Rn.
In this case, fi is a function of n unknown parameters.
The question of whether or not the parsimonious model holds is an
entirely different matter. Practically speaking, a model will rarely strictly
hold. Therefore, we will often say a model holds if it nearly holds, i.e. for
some small $\epsilon$,
$$\inf_{\theta \in \Theta_M} \|\theta^* - \theta\| < \epsilon.$$
Without delving too much into the philosophy of model fitting and the
simplicity principle (Foster and Martin, 1966), we point out that for a model
to be practically useful it must be robust to the ‘white noise’ of the process
generating Y. That is, it should account for only the obvious systematic
variation. A model would be said to be robust to the white noise variability,
if the model parameter estimates based on different realizations of Y are very
similar. As an example, if instead of $[\Theta_\beta]$, the saturated model $[\Theta]$ was used
to draw inferences about the normal mean vector $\mu$, we would find that the
model fit perfectly, but that upon repeated sampling the model estimates
would change dramatically. Thus, the model is not robust to the white noise
of the process. On the other hand, the parsimonious model $[\Theta_\beta]$ estimates
would change very little from sample to sample, varying with the sample
mean of n observations. This model is robust to the white noise variability.
Therefore, if the model would hold, or nearly hold, we would say it was a
good model.
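The contrast between the two models can be seen in a short simulation. The sketch below is illustrative only: the sample size, the common mean, the unit variance, the number of replications, and the seed are arbitrary choices, and it tracks just the estimate of the first mean component under each model.

```python
import random
import statistics


def simulate(n=25, beta=10.0, sigma=1.0, reps=2000, seed=1):
    """Sampling variance of the estimate of mu_1 under the two models."""
    rng = random.Random(seed)
    saturated_first = []   # saturated model: the estimate of mu_1 is y_1
    common_mean = []       # parsimonious model: every mu_j is estimated by ybar
    for _ in range(reps):
        y = [rng.gauss(beta, sigma) for _ in range(n)]
        saturated_first.append(y[0])
        common_mean.append(sum(y) / n)
    return (statistics.variance(saturated_first),
            statistics.variance(common_mean))


var_saturated, var_parsimonious = simulate()
print(var_saturated, var_parsimonious)
```

Under these assumptions the saturated estimate varies with the full observation variance, while the common-mean estimate varies roughly n times less, which is the sense in which the parsimonious model is robust to the white noise.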
Freedom Models. In the previous $n$-variate normal example we specified a
model $[\Theta_\beta]$ in terms of some unknown parameter $\beta$. Aitchison and Silvey
(1958, 1960) and Silvey (1959) refer to the parameter $\beta$ as a ‘freedom
parameter’ and the model $[\Theta_\beta]$ as a ‘freedom model’. These labels are
reasonable since we can measure the amount of freedom we have for estimating
$\theta$ by noting the number of independent freedom parameters there are in the
model. The model $[\Theta_\beta]$ has one degree of freedom for estimating the mean
vector $\mu$. Thus, once an estimate of the single parameter $\beta$ is obtained the
entire vector $\mu$ can be estimated; it is a function of the one parameter $\beta$.
Notice that ‘degrees’ of freedom correspond to integer dimension in that a
degree of freedom is gained (lost) if we introduce (omit) one independent
freedom parameter thereby increasing (decreasing) the dimensionality of $\Theta_\beta$
by one.
In general we will denote a freedom model by $[\Theta_X]$, where
$$\Theta_X = \{\theta \in \Theta : g(\theta) = X\beta,\ \beta \in R^p\}.$$
The function $g$ is some differentiable vector valued function mapping $\theta \in \Theta$
into $r$-dimensional Euclidean space $R^r$. The ‘model’ matrix $X$ is an $r \times p$ full
column rank matrix of known numbers. To calculate degrees of freedom for
$[\Theta_X]$ we initially assume $g$ satisfies
$$\forall \theta_0 \in \Theta_X, \quad \left.\frac{\partial g(\theta)}{\partial \theta'}\right|_{\theta_0} \text{ is of full row rank } r.$$
It also will be assumed that the constraints implied by $g(\theta) = X\beta$ are
independent of the $q$ constraints implied by the model $[\Theta]$ of (2.2.1).
Well-defined models will satisfy these conditions. For example, any $g$ that is
invertible satisfies the derivative condition. Actually this derivative condition
is not a necessary condition for the model to be well defined. Later, we will
show that $g$ need only satisfy a milder derivative condition.
The degrees of freedom for the model $[\Theta_X]$ can be obtained by
subtracting the number of constraints implied by $[\Theta_X]$ from the total number of model
parameters, $s$. The number of constraints implied by $[\Theta_X]$ is $(r - p) + q$, the
dimension of the null space of $X'$ plus the $q$ constraints implied by model $[\Theta]$.
Hence, the model degrees of freedom for $[\Theta_X]$ is
$$df[\Theta_X] = s - (r - p + q). \qquad (2.2.2)$$
In view of (2.2.2) the model degrees of freedom, an integer measure of freedom
one has for estimating $\theta$, is an increasing function of $p$, the number of freedom
parameters. In fact, for the special case when $q = 0$ and $g(\theta) = \theta$ (so $s = r$),
we have that the number of degrees of freedom for model $[\Theta_X]$ is simply $p$,
the number of freedom parameters. This gives us another good reason for
calling $\beta$ a freedom parameter and $[\Theta_X]$ a freedom model.
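As a worked instance of (2.2.2), consider the opinion poll model (2.1.1) of section 2.1. The count below is a sketch under the assumption of a single multinomial sample, so that the fixed total is the only constraint built into the parameter space ($q = 1$). There are $s = 4$ cell means, the function $g$ returns the $r = 2$ marginal logits, and the model matrix has $p = 2$ columns (one for the intercept and one for the freedom parameter measuring departure from marginal homogeneity).

```latex
\begin{aligned}
s &= 4, \qquad q = 1, \qquad r = 2, \qquad p = 2, \\
df[\Theta_X] &= s - (r - p + q) = 4 - (2 - 2 + 1) = 3 .
\end{aligned}
```

The marginal logit model imposes no constraints here because $r = p$, so the three degrees of freedom are just the three free cell probabilities of the 2 x 2 table.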
Constraint Models. Notice that
$$\{\theta \in \Theta : g(\theta) = X\beta,\ \beta \in R^p\} \qquad (2.2.3)$$
can be rewritten as
$$\{\theta \in \Theta : U'g(\theta) = 0\},$$
where $U$ is an $r \times (r - p)$ full column rank matrix satisfying $U'X = 0$, i.e. the
columns of $U$ form a minimal spanning set, or basis, for the null space of $X'$.
Letting $u = r - p$ and $h^*(\theta) = 0$ be the $q$ constraints implied by $[\Theta]$, we can
write the $(u + q) \times 1$ vector of constraining functions as $h(\theta) = [h_1(\theta)', h^*(\theta)']'$,
where $h_1 = U'g$. We rewrite the freedom model $[\Theta_X]$ of (2.2.3) as $[\Theta_h]$, where
$$\Theta_h = \{\theta \in R^s : h(\theta) = 0\}. \qquad (2.2.4)$$
Aitchison and Silvey (1958, 1960) refer to model $[\Theta_h]$ as a constraint model.
Every freedom model can be written as a constraint model.
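The equivalence of the two sets rests on a short linear algebra argument, sketched below. The symbol $\mathcal{N}(X')$ for the null space of $X'$ is notation introduced here for the illustration; $M(X)$ is the range space of $X$.

```latex
\begin{aligned}
g(\theta) = X\beta \ \text{for some } \beta
  &\;\Longrightarrow\; U'g(\theta) = U'X\beta = 0, \\
U'g(\theta) = 0
  &\;\Longrightarrow\; g(\theta) \perp \text{every column of } U
  \;\Longrightarrow\; g(\theta) \in \mathcal{N}(X')^{\perp} = M(X).
\end{aligned}
```

The second line uses the fact that the columns of $U$ span the whole null space of $X'$, whose orthogonal complement is the range space of $X$.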
We present a few simple examples to illustrate the equivalence between
the two model formulations, freedom and constraint.
Example 1. Let $Y_i \sim \text{ind } N(\beta, \sigma^2)$, $i = 1, \ldots, n$, where $\sigma^2$ is known.
This model can be specified as the freedom model $[\Theta_X]$, where
$$\Theta_X = \{\mu \in R^n : \mu = 1_n\beta,\ \beta \text{ unknown}\}$$
or equivalently it can be expressed as the constraint model $[\Theta_h]$, where
$$\Theta_h = \{\mu \in R^n : U'\mu = 0\}$$
and $U'$ is the $(n - 1) \times n$ matrix
$$U' = \begin{pmatrix} 1 & -1 & 0 & 0 & \cdots & 0 \\ 1 & 0 & -1 & 0 & \cdots & 0 \\ \vdots & & & & & \vdots \\ 1 & 0 & 0 & 0 & \cdots & -1 \end{pmatrix}.$$
It is easily seen that $\Theta_X = \Theta_h$ and that the model degrees of freedom is
$df[\Theta_X] = n - (n - 1) = 1$.
Example 2. Let $Y_i \sim$ ind $N(\mu_i = \beta_0 + \beta_1 x_i, \sigma^2)$, $i = 1, \ldots, n$, where
$\sigma^2$ is known. This model can be specified as the freedom model $[\Theta_X]$, where
$$\Theta_X = \{\mu \in R^n : \mu_i = \beta_0 + \beta_1 x_i,\ i = 1, \ldots, n\}$$
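or, assuming that each $x_i$ is distinct, as the constraint model $[\Theta_h]$, where
$$\Theta_h = \{\mu \in R^n : U'\mu = 0\}.$$
Here $U'$ is the $(n - 2) \times n$ matrix
$$U' = \begin{pmatrix}
\frac{-1}{x_2 - x_1} & \frac{1}{x_2 - x_1} + \frac{1}{x_3 - x_2} & \frac{-1}{x_3 - x_2} & 0 & \cdots & 0 \\
0 & \frac{-1}{x_3 - x_2} & \frac{1}{x_3 - x_2} + \frac{1}{x_4 - x_3} & \frac{-1}{x_4 - x_3} & \cdots & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & \cdots & 0 & \frac{-1}{x_{n-1} - x_{n-2}} & \frac{1}{x_{n-1} - x_{n-2}} + \frac{1}{x_n - x_{n-1}} & \frac{-1}{x_n - x_{n-1}}
\end{pmatrix}.$$
Notice that $U'\mu = 0$ implies that
$$\frac{\mu_{j+1} - \mu_j}{x_{j+1} - x_j} = \frac{\mu_{k+1} - \mu_k}{x_{k+1} - x_k}, \quad \forall k, j.$$
That is, the $n$ means fall on a line. As before, it can be seen that $\Theta_X = \Theta_h$
and that the model degrees of freedom is $df[\Theta_h] = n - (n - 2) = 2$.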
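The constraint form of Example 2 can be illustrated with a short sketch. It assumes the rows of U' equate consecutive slopes, in the sense of the displayed implication of U'mu = 0; the function name slope_constraints and the covariate values are hypothetical.

```python
import numpy as np

def slope_constraints(x):
    """Build the (n-2) x n matrix U' whose rows equate consecutive slopes."""
    x = np.asarray(x, dtype=float)
    n = x.size
    Ut = np.zeros((n - 2, n))
    for i in range(n - 2):
        a = 1.0 / (x[i + 1] - x[i])        # weight of the slope on [x_i, x_{i+1}]
        b = 1.0 / (x[i + 2] - x[i + 1])    # weight of the slope on [x_{i+1}, x_{i+2}]
        # Row i encodes a*(mu[i+1] - mu[i]) - b*(mu[i+2] - mu[i+1]) = 0.
        Ut[i, i] = -a
        Ut[i, i + 1] = a + b
        Ut[i, i + 2] = -b
    return Ut

x = np.array([0.0, 1.0, 2.5, 4.0, 7.0])    # hypothetical distinct covariate values
Ut = slope_constraints(x)
n = x.size

mu_line = 2.0 + 3.0 * x                     # means that fall on a line
mu_bent = mu_line + np.array([0.0, 0.0, 1.0, 0.0, 0.0])

rank = np.linalg.matrix_rank(Ut)
print(rank, n - rank)                       # constraints and model degrees of freedom
print(np.allclose(Ut @ mu_line, 0), np.allclose(Ut @ mu_bent, 0))
```

The rank of U' counts the independent constraints, so n minus that rank recovers the two degrees of freedom of the straight-line model.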
Definitions. We will assume that the constraining function h satisfies
some reasonable conditions so that the model is well defined. We first present
some definitions.
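(1) A model $[\Theta_h]$ is said to be 'consistent' if $\Theta_h \neq \emptyset$.
(2) A consistent model $[\Theta_h]$ is said to be 'well-defined' if the Jacobian
matrix for $h$ is of full row rank $v = u + q$ at every point in $\Theta_h$. That is,
$$\forall\, \theta_0 \in \Theta_h, \quad \left.\frac{\partial h(\theta)}{\partial \theta'}\right|_{\theta_0}$$
is of full row rank $v$.
(3) A model $[\Theta_h]$ is said to be 'ill-defined' if it is not well-defined, i.e.
$$\exists\, \theta_0 \in \Theta_h \text{ such that } \left.\frac{\partial h(\theta)}{\partial \theta'}\right|_{\theta_0}$$
is not of full row rank $v$.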
(4) An ill-defined model $[\Theta_h]$ is said to be 'inconsistent' or 'incompatible'
if $\Theta_h = \emptyset$.
Briefly, any reasonable model will have a nonempty parameter space and
hence will be consistent. The Jacobian condition of definition (2) is similar
to the condition required in the Implicit Function Theorem (see Bartle, 1976).
Basically, this condition requires the constraints to be nonredundant so that,
at least theoretically, the constraint equations can be written uniquely as
a function of a smaller set of parameters. An ill-defined model has been
specified with a redundant set of constraint equations. Using the lingo of
the optimization literature, two constraints are redundant if, for each point
in the parameter space, both of the constraints are 'active' or both of the
constraints are 'inactive'. That is, for all parameter values, if one constraint
is active (inactive) then the other is necessarily active (inactive).
It should be noted that the above definitions are in terms of the
constraint formulation of a model. This is sufficient since freedom models can
be written as constraint models. For convenience, we give sufficient conditions
for a freedom model to be well-defined.
A consistent freedom model is well-defined if it satisfies the following two
conditions:
(i) The constraints implied by $g(\theta) = X\beta$ are independent of the $q$
constraints implied by $[\Theta]$.
(ii) The Jacobian matrix of $g$ evaluated at any point in $\Theta_X$ is of full row
rank $r$, i.e.
$$\left.\frac{\partial g(\theta)}{\partial \theta'}\right|_{\theta_0}$$
is of full row rank $r$, $\forall\, \theta_0 \in \Theta_X$.
The sufficiency of conditions (i) and (ii) can be seen by observing that
(ii) implies that $h_1 = U'g$ has a full row rank Jacobian, since $U'$ is of full row
rank, and (i) implies that $h = (h_1', h^{*\prime})'$ has a full row rank Jacobian. These
sufficient conditions are by no means necessary for a model to be well defined,
as the Jacobian of $h$ may be of full row rank $v$ even when the Jacobian of $g$
is not of full row rank.
Notice that the model matrix has nothing to do with whether or not a
model is well defined. In particular, one may think that the model $[\Theta_X]$ is
ill-defined whenever the $r \times p$ matrix $X$ is not of full column rank, i.e. the
freedom parameters are nonestimable. However, the model can be rewritten
as a constraint model with the full column rank matrix $U$ spanning the null
space of $X'$, which then has dimension greater than $r - p$. It follows that if $g$ satisfies
(i) and (ii), then the model $[\Theta_X]$ will be well-defined. The only reason we
have taken $X$ to be of full column rank is to avoid using generalized inverses
when working with the freedom parameters.
To illustrate the use of these definitions, we consider the model $[\Theta_M]$,
where
$$\Theta_M = \{\theta \in R^n : M\theta - d = 0\}.$$
The model will be well defined if $\partial h/\partial\theta' = M$ is of full row rank. It is
inconsistent if the linear system of equations $M\theta = d$ is inconsistent.
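For the linear constraint model just described, the definitions reduce to two rank checks. The sketch below is one way to carry them out; the function name classify_linear_model and the example matrices are hypothetical, and consistency is tested by comparing the rank of M with the rank of the augmented matrix [M, d].

```python
import numpy as np

def classify_linear_model(M, d, tol=1e-10):
    """Classify {theta : M theta - d = 0} using the definitions in the text."""
    M = np.asarray(M, dtype=float)
    d = np.asarray(d, dtype=float)
    rank_M = np.linalg.matrix_rank(M, tol=tol)
    rank_aug = np.linalg.matrix_rank(np.column_stack([M, d]), tol=tol)
    if rank_aug > rank_M:
        return "inconsistent"      # M theta = d has no solution
    if rank_M < M.shape[0]:
        return "ill-defined"       # the Jacobian M is not of full row rank
    return "well-defined"

independent = classify_linear_model([[1, -1, 0], [0, 1, -1]], [0, 0])
redundant = classify_linear_model([[1, -1, 0], [2, -2, 0]], [0, 0])
impossible = classify_linear_model([[1, -1, 0], [1, -1, 0]], [0, 1])
print(independent, redundant, impossible)
```

By definition (4) an inconsistent model is also ill-defined; the function reports the more specific label.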
If a model $[\Theta_h]$ is well defined, then the constraints implied by the model
are all independent in that no constraint can be implied by the others. We
will consider only well-defined models when calculating degrees of freedom.
As before, we calculate degrees of freedom for a model as the difference
between the number of model parameters $s$ and the number of independent
constraints $v$ implied by the model, i.e.
$$df[\Theta_h] = s - (r - p + q) = s - (u + q) = s - v.$$
Notice that for the constraint model, model degrees of freedom is a decreasing
function of the number of independent constraints $v$.
Finally, it should be noted that models may be specified in terms of
both freedom equations and constraint equations. In fact, in subsequent
sections this will be the case. However, without loss of generality, we will
concentrate on constraint models since any model can be written in the form
of a constraint model.
2.2.2 Measuring Model Goodness of Fit
Inferences about model parameters are reliable only if the model is
'good'. A good model should be well defined (or at least consistent). It
should be simple and parsimonious. Finally, the model should be relatively
close to holding.
To assess whether or not the model holds, we will need the concept of a
distance between two models. To begin, we will assume there is some measure
of distance between two hierarchical parametric models. (Two models $[\Theta_1]$
and $[\Theta_2]$ are hierarchical if $\Theta_2 \subseteq \Theta_1$ and $df[\Theta_2] < df[\Theta_1]$ whenever $\Theta_1 \neq \Theta_2$.)
This (parametric) distance will be a quantitative comparison of how close
the two models are to holding. Thus, if both models hold the distance is
zero. The distance will also be independent of the model degrees of freedom.
Recall that the form of $F(\cdot; \theta)$ is assumed known. Therefore, the distance will
measure how far the true parameter is from falling in the parametric model
space. Suppose, firstly, that $\Theta_1$ and $\Theta_2$ are general parameter spaces. That
is, $\theta \in \Theta_1 \cup \Theta_2$ does not necessarily define a probability distribution. In other
words, $\theta$ need not fall in a subset of an $(s - 1)$-dimensional simplex. Let $a(\theta)$
and $b(\theta)$ be vector or matrix valued functions of the unknown parameter $\theta$.
Define a distance between two hierarchical models $[\Theta_1]$ and $[\Theta_2]$ ($\Theta_2 \subseteq \Theta_1$) as
$$\delta[\Theta_2; \Theta_1] = \inf_{\theta \in \Theta_2} \|b(\theta)(a(\theta) - a(\theta^*))\|^2 - \inf_{\theta \in \Theta_1} \|b(\theta)(a(\theta) - a(\theta^*))\|^2.$$
Notice that $a$ and $b$ can be chosen so that
(1) $\delta[\Theta_2; \Theta_1] \geq 0$
(2) $\delta[\Theta_2; \Theta_1] = 0$ iff $\Theta_1$ and $\Theta_2$ hold.
For example, consider the case $Y \sim MVN_n(\mu, \sigma^2 I_n)$. Suppose that
$$[\Theta] = \{(\mu, \sigma^2) : \mu \in R^n,\ \sigma^2 > 0\}$$
$$[\Theta_1] = \{(\mu, \sigma^2) : \mu = \mu_0,\ \sigma^2 > 0\}$$
$$[\Theta_2] = \{(\mu, \sigma^2) : \mu = X\beta,\ \beta \in R^p,\ \sigma^2 > 0\}$$
$$[\Theta_3] = \{(\mu, \sigma^2) : \mu = 1_n\alpha,\ \alpha \in R,\ \sigma^2 > 0\}.$$
In this example, each component of $Y$ has a common variance $\sigma^2$. It seems
reasonable that differences between any $\mu_j$ and the true mean $\mu_j^*$ are equally
important. Hence, a natural distance between any two of these models is
$$\delta[\Theta_{M_2}; \Theta_{M_1}] = \inf_{\Theta_{M_2}} \|\mu - \mu^*\|^2 - \inf_{\Theta_{M_1}} \|\mu - \mu^*\|^2.$$
Notice that $a(\mu, \sigma^2) = \mu$ and $b(\mu, \sigma^2) = 1$. Hence, the measure of distance
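between $[\Theta]$ and $[\Theta_1]$ is
$$\delta[\Theta_1; \Theta] = \inf_{\Theta_1} \|\mu - \mu^*\|^2 = \|\mu_0 - \mu^*\|^2.$$
The second infimum is zero since the model $[\Theta]$ is known to hold.
The measure of distance between $[\Theta_2]$ and $[\Theta]$ is
$$\begin{aligned}
\delta[\Theta_2; \Theta] &= \inf_{\Theta_2} \|\mu - \mu^*\|^2 = \inf_{\beta} \|X\beta - \mu^*\|^2 \\
&= \|X(X'X)^{-1}X'\mu^* - \mu^*\|^2 \\
&= \|(I_n - X(X'X)^{-1}X')\mu^*\|^2 \\
&= \mu^{*\prime}(I_n - X(X'X)^{-1}X')\mu^*.
\end{aligned} \tag{2.2.5}$$
This is the squared length of the vector orthogonal to the projection of $\mu^*$
onto the range space of $X$. Notice that if $\mu^* = X\beta^*$, that is $\Theta_2$ holds, then
$\delta[\Theta_2; \Theta] = 0$.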
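Finally, the distance between $[\Theta_3]$ and $[\Theta_2]$ is
$$\begin{aligned}
\delta[\Theta_3; \Theta_2] &= \inf_{\Theta_3} \|\mu - \mu^*\|^2 - \inf_{\Theta_2} \|\mu - \mu^*\|^2 \\
&= \mu^{*\prime}\left(I_n - \tfrac{1}{n}1_n 1_n'\right)\mu^* - \mu^{*\prime}(I_n - X(X'X)^{-1}X')\mu^* \\
&= \mu^{*\prime}\left(X(X'X)^{-1}X' - \tfrac{1}{n}1_n 1_n'\right)\mu^*.
\end{aligned} \tag{2.2.6}$$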
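The two distances in the normal example are quadratic forms in projection matrices, which makes them easy to evaluate. The following sketch assumes a hypothetical model matrix X with an intercept and one covariate, and a hypothetical true mean that lies in the range space of X, so the distance from the linear model to the saturated model should vanish.

```python
import numpy as np

def proj(X):
    """Orthogonal projection matrix onto the range space of X."""
    return X @ np.linalg.solve(X.T @ X, X.T)

n = 6
x = np.arange(n, dtype=float)
X = np.column_stack([np.ones(n), x])     # hypothetical n x p model matrix, p = 2
ones = np.ones((n, 1))
I = np.eye(n)

mu_star = 1.0 + 0.5 * x                  # assumed true mean; it lies in range(X)

# Distance between the linear model and the saturated model.
delta_2 = mu_star @ (I - proj(X)) @ mu_star
# Distance between the common-mean model and the linear model.
delta_32 = mu_star @ (proj(X) - proj(ones)) @ mu_star

print(delta_2, delta_32)
```

Because the assumed mean is linear in x, the second quantity equals the sum of squared deviations of the true means about their average.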
As another example, consider a random vector $Y = (Y_1, \ldots, Y_n)'$ with
independent components following an exponential dispersion distribution
(Jørgensen, 1989). That is,
$$Y_i \sim \text{indep } ED(\mu_i, \sigma^2), \quad i = 1, \ldots, n,$$
where the density of $Y_i$, with respect to some measure, has form
$$f_Y(y; \gamma_i, \sigma^2) = a(y, \sigma^2)\exp\left\{\frac{1}{\sigma^2}\left(y\gamma_i - \kappa(\gamma_i)\right)\right\}, \tag{2.2.7}$$
where $\mu_i = \kappa'(\gamma_i)$ and $var(Y_i) = \sigma^2\kappa''(\gamma_i)$. Let $V(\mu) = \oplus\,\kappa''(\gamma_i)$ and
$\theta = (\mu_1, \ldots, \mu_n, \sigma^2)'$. Since the components of $Y$ have different variances,
a natural measure of distance is
$$\delta[\Theta_{M_2}; \Theta_{M_1}] = \inf_{\Theta_{M_2}} \|V(\mu)^{-1/2}(\mu - \mu^*)\|^2 - \inf_{\Theta_{M_1}} \|V(\mu)^{-1/2}(\mu - \mu^*)\|^2. \tag{2.2.8}$$
That is, $a(\theta) = \mu$ and $b(\theta) = V(\mu)^{-1/2}$. Premultiplying the vector $(\mu - \mu^*)$ by
$V(\mu)^{-1/2}$ has the effect of downplaying those differences $(\mu_i - \mu_i^*)$ when the
corresponding variance is large.
To assess the goodness of fit of a model, relative to another, we can
estimate the distance $\delta$ via some statistic based on the observed data. It
is interesting to note that when $\delta = 0$, i.e. both models hold, our data-based
estimate of this null distance will be some nonnegative (positive, if
the model is unsaturated) number, reflecting the amount of white noise or
random variability there is in $Y$. This is so because, if both models hold,
then the only reason that our estimate of distance would be nonzero would
be because $Y$ has some random component. That is, the variability in $Y$ that
is not explained by the model causes the data to fit the model imperfectly.
Let $D$ be an estimate of $\delta$. That is, $D[\Theta_2; \Theta_1]$ is a stochastic, data-based
estimate of how far apart models $[\Theta_1]$ and $[\Theta_2]$ are. Potential candidates
for $D$ are the weighted least squares, likelihood ratio, Wald, deviance, and
Lagrange multiplier statistics.
For example, consider the $n$-variate normal case and the four candidate
models $[\Theta]$, $[\Theta_1]$, $[\Theta_2]$, and $[\Theta_3]$. We will assume that both $[\Theta]$ and $[\Theta_2]$
hold. In view of (2.2.5) a reasonable estimate of $\delta[\Theta_2; \Theta]$ can be obtained by
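replacing $\mu^*$ by $Y$, the estimate of $\mu^*$ under model $[\Theta]$, i.e.
$$D[\Theta_2; \Theta] = Y'(I_n - X(X'X)^{-1}X')Y = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2.$$
Recall that since $\delta[\Theta_2; \Theta]$ is known to be zero, $D[\Theta_2; \Theta]$ serves as our
'estimate of error'.
Similarly, a reasonable estimate of $\delta[\Theta_3; \Theta_2]$ can be obtained by replacing
$\mu^*$ in (2.2.6) by $Y$, the least restrictive estimate of $\mu^*$, i.e.
$$D[\Theta_3; \Theta_2] = Y'\left\{X(X'X)^{-1}X' - \tfrac{1}{n}1_n 1_n'\right\}Y = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2.$$
Now $\Theta_3 \subset \Theta_2$ and
$$df[\Theta_3] = n + 1 - (n - 1) = 2$$
$$df[\Theta_2] = n + 1 - (n - p) = p + 1.$$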
The degrees of freedom associated with estimating the distance between
two models will be called the distance (or residual or goodness-of-fit) degrees
of freedom. The distance degrees of freedom for the two models $[\Theta_{M_1}]$ and
$[\Theta_{M_2}]$ is defined to be the difference between the two model degrees of freedom,
i.e.
$$df(\delta[\Theta_{M_2}; \Theta_{M_1}]) = df[\Theta_{M_1}] - df[\Theta_{M_2}].$$
The number of distance degrees of freedom measures the dimensional distance
between the two models, i.e. the difference in dimensions. It measures the
difference in the amount of freedom one has for estimating $\theta$ for the two
models. It seems intuitive that if the degrees of freedom is large, that is, the
dimensional difference between the two models is great, the significance of the
distance statistic may be difficult to ascertain. This follows since we expect
the fit to be quite different for the two very different models, even when both
models hold. This is a reflection of both white noise and possibly lack of fit.
Therefore, the distance statistic will tend to be large, even when both models
hold. But for many statistics, a large mean implies a large variance, thereby
making significant findings more difficult. It is for this reason that we say
it is better to concentrate our efforts on relatively few degrees of freedom
to detect lack of fit. That is, one should use the smallest alternative space
possible when testing a null hypothesis.
A more technical argument holds when the test statistic (distance
statistic) is a chi-square or an F. Das Gupta and Perlman (1974) showed
that for a fixed noncentrality parameter, i.e. fixed distance between models,
the power of the F-test or the chi-square test increases as the distance degrees
of freedom decreases.
Example 1: Continuing with the $n$-variate normal example, we see that
$$df(\delta[\Theta_3; \Theta_2]) = df[\Theta_2] - df[\Theta_3] = (p + 1) - 2 = p - 1.$$
Thus, $\Theta_3$ is of $p - 1$ fewer dimensions than $\Theta_2$. Now, if we knew $\sigma^2$, the white
noise variance, we could test $H_0 : \theta^* \in \Theta_3$ vs. $H_1 : \theta^* \in \Theta_2 - \Theta_3$ using the
statistic
$$\frac{D[\Theta_3; \Theta_2]}{\sigma^2} = \frac{SS(Reg)}{\sigma^2},$$
which has a $\chi^2(p - 1)$ null distribution. However, $\sigma^2$ is not generally known and
we must estimate it. One way of estimating $\sigma^2$ is by estimating the distance
between $[\Theta]$ and $[\Theta_2]$, two models that are known to hold, and dividing by
the distance degrees of freedom. Since the distance degrees of freedom is
$df[\Theta] - df[\Theta_2] = n + 1 - (p + 1) = n - p$, we have that the estimate of the white
noise variance is $D[\Theta_2; \Theta]/(n - p) = SS(Error)/(n - p)$.
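The bookkeeping in this example can be reproduced on a small data set. The sketch below uses hypothetical observations and a hypothetical straight-line model matrix; the final F ratio is not derived in the text and is included only as the customary way of combining the two estimated distances when the variance is unknown.

```python
import numpy as np

y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 6.2])     # hypothetical observations
n = y.size
x = np.arange(n, dtype=float)
X = np.column_stack([np.ones(n), x])              # n x p model matrix with p = 2
p = X.shape[1]

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat

ss_error = np.sum((y - fitted) ** 2)              # estimated distance, linear vs saturated
ss_reg = np.sum((fitted - y.mean()) ** 2)         # estimated distance, common mean vs linear

df_saturated = n + 1                              # n means plus the variance
df_linear = n + 1 - (n - p)                       # p + 1
df_common = n + 1 - (n - 1)                       # 2
df_reg = df_linear - df_common                    # p - 1
df_error = df_saturated - df_linear               # n - p

sigma2_hat = ss_error / df_error                  # estimate of the white noise variance
f_ratio = (ss_reg / df_reg) / sigma2_hat          # assumed follow-on statistic

print(df_reg, df_error, sigma2_hat, f_ratio)
```

The two estimated distances add up to the total sum of squares about the mean, which is a quick check that the projections were formed correctly.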
Notice that in the above example the estimate of the parameter $\sigma^2$
was simply the estimated distance between two models that were known to
hold divided by their dimensional distance. Quite generally, when the data
have an exponential dispersion distribution (2.2.7) with common dispersion
parameter $\sigma^2$, the estimated distance between two models that are known to
hold, divided by their dimensional distance, gives us an estimate of $\sigma^2$. This is
true when the estimated distance is taken to be the LR, Wald, deviance, LM,
or the weighted least squares statistics. These statistics are natural estimators
of the weighted distance given in (2.2.8) for the exponential dispersion models.
Now, let us assume that $\Theta_1$ and $\Theta_2$ are each subsets of an
$(s - 1)$-dimensional simplex. For example, with count data, conditional on the
total $n$, the distribution is often multinomial with index $n$ and parameter
(alternatively, probability distribution vector) $\theta^*$. Read and Cressie (1988)
extensively study a family of distance measures called the power-divergence
family. The power divergences have form
$$I^\lambda(\theta^*, \theta) = \frac{1}{\lambda(\lambda + 1)}\sum_i \theta_i^*\left[\left(\frac{\theta_i^*}{\theta_i}\right)^\lambda - 1\right], \quad -\infty < \lambda < \infty,$$
where $I^0$ and $I^{-1}$ are defined to be the continuous limiting values as $\lambda \to 0$ and
$\lambda \to -1$. It is assumed that $\theta^*$ and $\theta$ fall on an $(s - 1)$-dimensional simplex.
As usual, let $\theta^*$ represent the true unknown parameter. We define the family
of distance measures between $[\Theta_1]$ and $[\Theta_2]$ ($\Theta_2 \subseteq \Theta_1$) to be proportional to
$$\delta[\Theta_2; \Theta_1] = 2n\left\{\inf_{\Theta_2} I^\lambda(\theta^*, \theta) - \inf_{\Theta_1} I^\lambda(\theta^*, \theta)\right\}.$$
By properties of $I^\lambda(\theta^*, \theta)$ (Read and Cressie, 1988, pp. 110-113), it follows
that $\delta \geq 0$, with equality if and only if both models hold.
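A small function makes the power-divergence family concrete. The sketch below assumes the Read and Cressie form of the divergence, with the cases lambda = 0 and lambda = -1 handled by their limiting expressions; the function name and the probability vectors are hypothetical.

```python
import numpy as np

def power_divergence(p, q, lam):
    """Power divergence between probability vectors p and q with positive entries."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if np.isclose(lam, 0.0):
        return float(np.sum(p * np.log(p / q)))       # limit as lambda -> 0
    if np.isclose(lam, -1.0):
        return float(np.sum(q * np.log(q / p)))       # limit as lambda -> -1
    return float(np.sum(p * ((p / q) ** lam - 1.0)) / (lam * (lam + 1.0)))

p = np.array([0.2, 0.3, 0.5])      # hypothetical 'true' probabilities
q = np.array([0.25, 0.25, 0.5])    # hypothetical model probabilities
n = 100

pearson_type = 2 * n * power_divergence(p, q, 1.0)    # equals n * sum((p - q)^2 / q)
lr_type = 2 * n * power_divergence(p, q, 0.0)
near_zero = 2 * n * power_divergence(p, q, 1e-6)      # approaches the lambda = 0 value

print(pearson_type, lr_type, near_zero)
```

Scaling by 2n puts the divergence on the same footing as the familiar Pearson and likelihood ratio quantities.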
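To estimate $\delta[\Theta_2; \Theta_1]$ based on the data, we note that our least restrictive
guess of $\theta^*$ is $Y/n$, the vector of sample proportions. Intuitively, a good
estimate of the quantity $\delta[\Theta_2; \Theta_1]$ would be
$$\begin{aligned}
D[\Theta_2; \Theta_1] &= 2n\left\{\inf_{\Theta_2} I^\lambda(Y/n, \theta) - \inf_{\Theta_1} I^\lambda(Y/n, \theta)\right\} \\
&= \frac{2}{\lambda(\lambda + 1)}\sum_i Y_i\left[\left(\frac{Y_i}{n\hat\theta_{2i}^{(\lambda)}}\right)^\lambda - 1\right] - \frac{2}{\lambda(\lambda + 1)}\sum_i Y_i\left[\left(\frac{Y_i}{n\hat\theta_{1i}^{(\lambda)}}\right)^\lambda - 1\right],
\end{aligned}$$
where $\hat\theta_1^{(\lambda)}$ and $\hat\theta_2^{(\lambda)}$ are the 'minimum divergence' estimators obtained by
minimizing $I^\lambda(Y/n, \theta)$ with respect to $\theta$ over $\Theta_1$ and $\Theta_2$ respectively. Read
and Cressie (1988) point out that $D[\Theta_2; \Theta_1]$ is equal to the likelihood ratio
statistic when $\lambda = 0$. Also, if we assume that $[\Theta_1]$ holds so that the second
infimum is zero, we have that, for $\lambda = 1$,
$$D[\Theta_2; \Theta_1] = \sum_i \frac{(Y_i - n\hat\theta_{2i}^{(1)})^2}{n\hat\theta_{2i}^{(1)}},$$
which is asymptotically equivalent to
$$D[\Theta_2; \Theta_1] = \sum_i \frac{(Y_i - n\hat\theta_{i}^{(0)})^2}{n\hat\theta_{i}^{(0)}},$$
where $\hat\theta^{(0)}$ is the maximum likelihood estimator of $\theta^*$ over the space $\Theta_2$. This
is the Pearson chi-square statistic. Other asymptotically equivalent distance
estimates are the Wald statistic and the Lagrangian multiplier statistic. We
now illustrate these results via examples.
Example 2: Suppose that $Y = (Y_{11}, Y_{12}, Y_{21}, Y_{22})'$ is a multinomial vector.
That is,
$$(Y_{11}, Y_{12}, Y_{21}, Y_{22})' \sim Mult(n, (\pi_{11}, \pi_{12}, \pi_{21}, \pi_{22})'), \quad \text{with } \sum_i\sum_j \pi_{ij} = 1.$$
Thus, the model that is known to contain the true parameter vector $\pi^*$ is $[\Theta]$
where
$$\Theta = \{\pi : \pi'1_4 = 1,\ \pi_{ij} \in (0, 1),\ i, j = 1, 2\}.$$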
Notice that $\Theta$ is really a 3-dimensional subset (simplex) of $(0, 1)^4$ so that
$df[\Theta] = 4 - 1 = 3$.
We wish to test the independence hypotheses
$$H_0 : \pi_{11}\pi_{22} = \pi_{12}\pi_{21} \quad \text{vs.} \quad H_1 : \pi_{11}\pi_{22} \neq \pi_{12}\pi_{21}.$$
Writing the model of interest $[\Theta_0]$ as
$$\begin{aligned}
\Theta_0 &= \{\pi \in \Theta : \pi_{11}\pi_{22} - \pi_{12}\pi_{21} = 0\} \\
&= \{\pi : \pi'1_4 = 1,\ \pi_{11}\pi_{22} - \pi_{12}\pi_{21} = 0\},
\end{aligned}$$
we can state the independence hypotheses as
$$H_0 : \pi \in \Theta_0 \quad \text{vs.} \quad H_1 : \pi \in \Theta - \Theta_0.$$
Now, the model degrees of freedom can be found by subtracting the number
of constraints implied by $[\Theta_0]$ from the total number of parameters, which
is 4. Hence, $df[\Theta_0] = 4 - 2 = 2$. Thus, the distance degrees of freedom, or
measure of dimensional distance, is $df(\delta[\Theta_0; \Theta]) = 3 - 2 = 1$.
Two distance (goodness-of-fit) statistics commonly used are the Pearson
chi-square $X^2$ ($\lambda = 1$) and the likelihood ratio statistic $G^2$ ($\lambda = 0$). The forms
of these two statistics are
$$D[\Theta_0; \Theta] = X^2 = \sum_i\sum_j \frac{(y_{ij} - n\hat\pi_{ij,0})^2}{n\hat\pi_{ij,0}}$$
and
$$D[\Theta_0; \Theta] = G^2 = 2\sum_i\sum_j y_{ij}\log\left(\frac{y_{ij}}{n\hat\pi_{ij,0}}\right),$$
where $\hat\pi_{ij,0}$ is the ML estimate of $\pi_{ij}$ assuming that model $[\Theta_0]$ holds.
Under the null hypothesis, i.e. if independence truly holds, then the
asymptotic distribution of both distance statistics, $X^2$ and $G^2$, is $\chi^2(1)$.
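The two statistics of Example 2 are straightforward to compute once the fitted values under independence are available. The sketch below uses a hypothetical 2 x 2 table and the standard closed form for the ML fitted counts under independence (row total times column total over n).

```python
import numpy as np

y = np.array([[30.0, 20.0],
              [10.0, 40.0]])                          # hypothetical 2 x 2 counts
n = y.sum()

# ML fitted counts under independence: (row total)(column total) / n.
fitted = np.outer(y.sum(axis=1), y.sum(axis=0)) / n

x2 = np.sum((y - fitted) ** 2 / fitted)               # Pearson chi-square
g2 = 2.0 * np.sum(y * np.log(y / fitted))             # likelihood ratio statistic

df_saturated = 4 - 1                                  # sum-to-one constraint only
df_independence = 4 - 2                               # plus the independence constraint
df_distance = df_saturated - df_independence

print(x2, g2, df_distance)
```

Both statistics would be referred to a chi-square distribution with df_distance degrees of freedom.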
Example 3: Continuing with Example 2, consider the model $[\Theta_{MH}]$ where
$$\Theta_{MH} = \{\pi : \pi'1_4 = 1,\ \pi_{1+} - \pi_{+1} = 0\}.$$
This model implies that there is marginal homogeneity, i.e. the marginal
distributions for both factors are the same.
We would like to test the hypotheses
$$H_0 : \pi \in \Theta_{MH} \quad \text{vs.} \quad H_1 : \pi \in \Theta - \Theta_{MH}.$$
The model degrees of freedom is $df[\Theta_{MH}] = 4 - 2 = 2$, and so the distance
degrees of freedom is $df(\delta[\Theta_{MH}; \Theta]) = 3 - 2 = 1$. Once again, to illustrate
what model degrees of freedom means, we observe that if $[\Theta_{MH}]$ holds and
we specify two of the four probabilities, the remaining two are completely
determined. Thus, we are free to estimate two of the probabilities based on
the data. The other two are determined.
Two frequently used estimates of the model distance, or model goodness
of fit, are the likelihood ratio statistic $G^2$ and the McNemar statistic $M^2$. For
$2 \times 2$ tables, the McNemar statistic and the Lagrange multiplier statistic are
equivalent since both are score statistics (Agresti, 1990; Aitchison & Silvey,
1958). The statistics take the following forms
$$D[\Theta_{MH}; \Theta] = G^2 = 2\sum_i\sum_j y_{ij}\log\left(\frac{y_{ij}}{n\hat\pi_{ij,0}}\right)$$
and
$$D[\Theta_{MH}; \Theta] = M^2 = \frac{(y_{12} - y_{21})^2}{y_{12} + y_{21}},$$
where the $\hat\pi_{ij,0}$ in the first expression is the ML estimate of $\pi_{ij}$ under the
model $[\Theta_{MH}]$.
Under the null, i.e. when the marginal distributions are homogeneous,
both of these statistics have asymptotic $\chi^2(1)$ distributions.
It is important to note that, had the constraint $\pi_{2+} - \pi_{+2} = 0$ been added,
the model would remain consistent but would be ill defined. For $2 \times 2$ tables,
this additional constraint is exactly the same as the constraint $\pi_{1+} - \pi_{+1} = 0$.
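Both points of Example 3 can be checked numerically. The sketch below uses a hypothetical 2 x 2 table and assumes the standard closed form for the ML fit under marginal homogeneity in a 2 x 2 table (the two off-diagonal fitted counts equal their average). It then verifies that adding the second homogeneity constraint leaves the Jacobian of the constraints short of full row rank.

```python
import numpy as np

y = np.array([[30.0, 20.0],
              [10.0, 40.0]])                         # hypothetical 2 x 2 counts
off_diag = y[0, 1] + y[1, 0]

# Assumed closed-form ML fitted counts under marginal homogeneity (2 x 2 case).
fitted = np.array([[y[0, 0], off_diag / 2.0],
                   [off_diag / 2.0, y[1, 1]]])

m2 = (y[0, 1] - y[1, 0]) ** 2 / off_diag             # McNemar statistic
g2 = 2.0 * np.sum(y * np.log(y / fitted))            # likelihood ratio statistic

# Jacobian of the constraints with respect to (pi11, pi12, pi21, pi22):
#   sum to one;  pi_{1+} - pi_{+1} = pi12 - pi21;  pi_{2+} - pi_{+2} = pi21 - pi12.
jacobian = np.array([[1.0, 1.0, 1.0, 1.0],
                     [0.0, 1.0, -1.0, 0.0],
                     [0.0, -1.0, 1.0, 0.0]])
rank = np.linalg.matrix_rank(jacobian)

print(m2, g2, rank)
```

The third row is the negative of the second, so only two of the three constraints are independent.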
2.3 Multivariate Polytomous Response Model Fitting
In this section, we describe ML model fitting for an integer valued
random vector $Y$ that is assumed to be distributed product-multinomially.
We also investigate the asymptotic behavior of the ML estimators within the
framework of constraint models. The models we will consider have form
$$\Theta_X = \{\xi \in \Theta : C\log(Ae^{\xi}) = X\beta,\ Le^{\xi} = 0\}$$
or equivalently, for appropriately chosen $U$,
$$\Theta_X = \Theta_h = \{\xi \in \Theta : U'C\log(Ae^{\xi}) = 0,\ Le^{\xi} = 0\},$$
where $e^{\xi}$ is the $s \times 1$ mean vector of $Y$, a product-multinomial random vector,
and the model parameter space $\Theta$ is of dimension $s - q$, where $q$ is the number
of identifiability constraints. We use the parameter $\xi$ rather than $\mu = e^{\xi}$
for several reasons. One reason will become evident when we explore the
asymptotic behavior of the ML estimator of $\xi$. It turns out that the random
variable $\hat\mu - \mu_0$ is not bounded in probability, whereas $\hat\xi - \xi_0$ is. In fact, the
random variable $\hat\xi - \xi_0$ converges in probability to 0. Another reason for using
$\xi$ rather than $\mu$ is that the procedure for deriving the maximum likelihood
estimate of $\xi$ is less sensitive to small (or zero) counts. The range of possible
$\xi$ values is the whole real line, while the range of possible $\mu$ values is restricted
to the positive half of the real line. By using $\xi$ the problem of intermediate
out of range values (e.g. negative cell mean estimates) is avoided.
As stated above, we initially assume that the vector of cell counts $Y$
has a product-multinomial distribution. This is not overly restrictive since it
will be shown that inferences based on maximum (multinomial) likelihood
estimates are often the same as inferences based on maximum (Poisson)
likelihood estimates. We will present some results in section 2.4 that allow
us to determine when these inferences are indeed the same.
We also consider an alternative method for computing the maximum
likelihood estimators and their asymptotic covariances. The method of
Lagrange undetermined multipliers is well suited for maximum likelihood
fitting of the models we will be considering. This is so because we will specify
the models in terms of constraint equations and the fitting problem will be
one of maximizing a function, namely the log likelihood, subject to some
constraints, namely that $\xi \in \Theta_h$.
2.3.1 A General Multinomial Response Model
In this section we specify a class of models that is directly applicable
to Chapter 3 of this dissertation. Specifically, the models will be specified in
such a way so as to include the class of simultaneous models for the joint and
marginal distributions considered in Chapter 3.
Let the random vector $Y = vec(Y_1, \ldots, Y_K)$ denote a product multinomial
random vector, i.e.
$$Y_i = (Y_{i1}, \ldots, Y_{iR})' \sim \text{ind } Mult(n_i, \pi_i), \quad i = 1, \ldots, K,$$
where the $R \times 1$ vectors of cell probabilities satisfy $\pi_i'1_R = 1$, $i = 1, \ldots, K$.
Consider the 1:1 reparameterization from $\{\pi_i\}$ to $\{\xi_i\}$, where $\xi_i =
\log(\mu_i) = \log(n_i\pi_i)$ is an $R \times 1$ vector of log means. Under this
parameterization,
$$Y_i \sim \text{ind } Mult\left(n_i, \frac{e^{\xi_i}}{n_i}\right), \quad e^{\xi_i'}1_R = n_i, \quad i = 1, \ldots, K,$$
or
$$Y_i \sim \text{ind } Mult\left(n_i, \frac{e^{\xi_i}}{n_i}\right), \quad i = 1, \ldots, K, \quad e^{\xi'}(\oplus_1^K 1_R) = n', \tag{2.3.1}$$
where $n' = (n_1, \ldots, n_K)$ is the $1 \times K$ vector of multinomial indices.
The kernel of the log likelihood for $Y$, written as a function of $\xi$, is
$$\ell^{(M)}(\xi; y) = y'\xi, \quad e^{\xi'}(\oplus_1^K 1_R) = n'. \tag{2.3.2}$$
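The log-mean parameterization and its K sum constraints can be illustrated directly. In the sketch below the values of K, R, the multinomial indices, the cell probabilities, and the counts are all hypothetical; the direct sum of K copies of the R x 1 vector of ones is formed with a Kronecker product.

```python
import numpy as np

K, R = 2, 3                                     # hypothetical: two multinomials, three cells
n = np.array([50.0, 80.0])                      # multinomial indices
pi = np.array([[0.2, 0.3, 0.5],
               [0.1, 0.6, 0.3]])                # cell probabilities, one row per multinomial

# Log means stacked as vec(xi_1, ..., xi_K).
xi = np.log(n[:, None] * pi).reshape(-1)

# Direct sum of K copies of 1_R, an (RK) x K block-diagonal matrix.
ones_block = np.kron(np.eye(K), np.ones((R, 1)))
totals = np.exp(xi) @ ones_block                # should reproduce the multinomial indices

y = np.array([12.0, 14.0, 24.0, 9.0, 47.0, 24.0])   # hypothetical observed counts
kernel = y @ xi                                 # kernel of the multinomial log likelihood

pi_back = np.exp(xi).reshape(K, R) / n[:, None] # the map back to probabilities
print(totals, kernel)
```

The round trip from probabilities to log means and back confirms that the reparameterization is one to one once the K totals are fixed.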
We now posit a model for $\xi$, the vector of log means. Let $s = RK$ be the
total number of cell means. Our objectives are to test the model goodness
of fit and to estimate the $s \times 1$ model parameter vector $\xi$ as well as any
freedom parameters of interest. It will be assumed that the model $[\Theta_X]$ can
be specified as
$$\Theta_X = \{\xi \in R^s : C_1\log A_1e^{\xi} = X_1\beta_1,\ C_2\log A_2e^{\xi} = X_2\beta_2,\ Le^{\xi} = 0,\ e^{\xi'}(\oplus_1^K 1_R) = n'\}, \tag{2.3.3}$$
where
$C_i = \oplus_{j=1}^K C_{ij}$, $C_{ij} = C_{i1}$, is $q_i \times m_i$, $i = 1, 2$
$A_i = \oplus_{j=1}^K A_{ij}$, $A_{ij} = A_{i1}$, is $m_i \times R$, $i = 1, 2$
$L = \oplus_{j=1}^K L_j$, $L_j = L_1$, is $d \times R$
$\xi = vec(\xi_1, \ldots, \xi_K)$, and $\xi_i$ is $R \times 1$
$X_i$ is $Kq_i \times p_i$ of full rank $p_i$, $i = 1, 2$
$n$ is the $K \times 1$ vector of multinomial indices
$s = RK$, the total number of cells
Let us say that a model that can be specified as in (2.3.3) satisfies
assumption (A1). That is,
(A1) The multinomial response model can be specified as in (2.3.3).
Notice that the $K$ matrices of $C_i$ are all identical, likewise with the
matrices comprising $A_i$ and $L$. This requires that the model does not change
across the $K$ populations ($K$ multinomials). Also, the two sets of freedom
equations in (2.3.3) will allow us to use two different types of models for
the expected cell means. This provides us with enough generality to fit
many interesting models. For example, we may wish to simultaneously fit
a linear-by-linear association loglinear model for the joint distribution and a
cumulative logit model for the marginal distributions.
We can conveniently rewrite (2.3.3) as
$$\Theta_X = \{\xi \in R^s : C\log(Ae^{\xi}) = X\beta,\ Le^{\xi} = 0,\ e^{\xi'}(\oplus_1^K 1_R) = n'\}, \tag{2.3.4}$$
where $A' = [A_1', A_2']$, $C = C_1 \oplus C_2$, $X = X_1 \oplus X_2$, and $\beta = vec(\beta_1, \beta_2)$.
Notice that the model $[\Theta_X]$ is specified in terms of both freedom
equations and constraint equations. We will rewrite $[\Theta_X]$ as a constraint
model keeping in the back of our minds that the freedom parameters may be
of interest also.
Let $U$ be a $K(q_1 + q_2) \times u$ matrix of full column rank $u$ such that
$U'X = 0$. Here $u$ is the dimension of the null space of $X'$, $\mathcal{N}(X')$, i.e.
$u = K(q_1 + q_2) - (p_1 + p_2)$. Since $U$ can be chosen to be of full column rank, it
follows that the columns of $U$ form a basis for the null space of $X'$. Thus, the
range space of $U$ equals the null space of $X'$, i.e. $\mathcal{M}(U) = \mathcal{N}(X')$. Multiplying
the right and left hand side of the freedom equation $C\log(Ae^{\xi}) = X\beta$ by $U'$,
we can rewrite (2.3.4) as
$$\Theta_h = \{\xi \in R^s : U'C\log(Ae^{\xi}) = 0,\ Le^{\xi} = 0,\ e^{\xi'}(\oplus_1^K 1_R) - n' = 0\}. \tag{2.3.5}$$
Thus, $\Theta_X = \Theta_h$ and the models $[\Theta_X]$ and $[\Theta_h]$ are one and the same.
At this point, we will assume that the constraints implied by the model
$[\Theta_h]$ are nonredundant so that the model is well defined. More specifically, let
$h'(\xi) = [(U'C\log(Ae^{\xi}))', e^{\xi'}L']$ be the $1 \times (u + l)$ ($l = Kd$) vector of constraint
functions. We will assume that the $u + l + K$ constraints implied by $h(\xi) = 0$
and $e^{\xi'}(\oplus_1^K 1_R) = n'$ are nonredundant. Notice that the constraints in $h(\xi) = 0$
do not include the identifiability constraints. We treat the identifiability
constraints separately for reasons that will become apparent when we actually
fit the models.
As stated previously, one of our primary objectives is to estimate the
model parameters $\xi$ and the freedom parameters $\beta$ under the assumption
that $[\Theta_X]$ (and $[\Theta_h]$) holds. We will use the maximum likelihood estimates,
which can be found by maximizing the log likelihood of $Y$ subject to the
constraint that $[\Theta_h]$ holds.
The (kernel of the) log likelihood under the product multinomial
assumption is shown in (2.3.2). It is
$$\ell^{(M)}(\xi; y) = y'\xi.$$
Thus, we are to maximize the function $\ell^{(M)}(\xi; y) = y'\xi$ subject to $\xi \in \Theta_h$.
2.3.2 Maximum Likelihood Estimation
In this section we will discuss two procedurally different approaches
to maximizing the log likelihood $\ell^{(M)}(\xi; y)$ subject to $\xi \in \Theta_h$. The first
approach, which is the more commonly used approach, requires that the
model be specified entirely in terms of freedom equations. Often,
when there are no identifiability constraints, the model can be completely
specified as a freedom model. Models amenable to this approach include the
Poisson loglinear model and the Normal linear model. The second approach,
Lagrange's method of undetermined multipliers, can be directly applied when
the model is specified completely in terms of constraint equations. Since the
product multinomial model includes identifiability constraints, it can more
easily be specified in terms of constraint equations. For this reason this
second method is the preferred choice. In the following sections, we discuss
some additional features of these two methods.
Freedom Parameter Approach. One approach often used in simple situations,
namely those situations when the model can be specified completely
in terms of freedom equations, is to write the parameter $\xi$ as a function
of the freedom parameter $\beta$ and maximize $\ell^{(M)}(\xi(\beta); y)$ with respect to $\beta$.
The vector $\xi(\hat\beta)$ will be in the model space, since the model was specified
completely in terms of $\beta$. For example, if the model could be specified as
$$\Theta_X = \{\xi \in R^s : \log e^{\xi} = X\beta\},$$
then $\xi(\beta) = X\beta$. Notice that the multinomial model, which includes the $K$
constraints $e^{\xi'}(\oplus_1^K 1_R) = n'$, is not directly amenable to this approach. In fact,
we would have to reparameterize to a smaller set $\xi^*$ of $s - K$ model parameters
that account for the $K$ constraints. This reparameterization results in an
asymmetric treatment of the $\xi$ and for that reason is deemed undesirable.
On the other hand, the Poisson model considered below will often lend itself
to this maximization approach, since the $K$ constraints $e^{\xi'}(\oplus_1^K 1_R) = n'$ are
not included.
Computationally, the method of maximizing the log likelihood with
respect to the freedom parameters is usually simple. Assuming the log
likelihood is concave and differentiable in $\beta$, we need only solve for the root
of the 'score equations', viz.
$$s(\beta; y) = \frac{\partial \ell(\xi(\beta); y)}{\partial \beta} = 0.$$
Many of the asymptotic properties of the maximum likelihood estimator
$\hat\beta$ for $\beta$ are derived by formally expanding the score vector $s(\beta; y)$ about the
true value $\beta = \beta^*$ in a linear Taylor expansion. That is,
$$s(\beta; y) = s(\beta^*; y) + \frac{\partial s(\beta^*; y)}{\partial \beta'}(\beta - \beta^*) + O(\|\beta - \beta^*\|^2).$$
In particular, in many situations,
$$0 = s(\hat\beta; Y) = s(\beta^*; Y) + \frac{\partial s(\beta^*; Y)}{\partial \beta'}(\hat\beta - \beta^*) + O_p(1), \tag{2.3.6}$$
so that $\hat\beta - \beta^*$ has the same asymptotic distribution as
$$-\left(\frac{\partial s(\beta^*; Y)}{\partial \beta'}\right)^{-1}s(\beta^*; Y).$$
Subsequently, we will derive the asymptotic distribution of $\hat\beta - \beta^*$ in a different
way. This alternative derivation of the asymptotic distribution of the freedom
parameter estimate will shed new light on the relationship between the
asymptotic behavior of the estimates under the two sampling assumptions,
product Poisson and product multinomial.
Expression (2.3.6) also gives some indication of how one might numerically
solve for $\hat\beta$, the root of the score equation. A Newton-Raphson type
algorithm is often used. This root finding algorithm involves the inversion
of the derivative matrix $\partial s(\beta; y)/\partial \beta'$, which is usually of small dimension
since the model is usually specified in terms of a small number of freedom
parameters. In fact, the dimension of the derivative matrix will not be larger
than $s \times s$, which occurs when the model is saturated.
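A Newton-Raphson iteration of the kind described can be sketched for the Poisson loglinear case, where the log means are linear in the freedom parameters. The code assumes the Poisson kernel, for which the score is X'(y - mu) and its derivative is -X'D(mu)X; the function name, the starting value, and the data are hypothetical.

```python
import numpy as np

def fit_poisson_loglinear(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson for the score equations of a Poisson model with log mean X beta."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())              # crude start; assumes column 0 is an intercept
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)              # s(beta; y)
        deriv = -(X.T * mu) @ X             # derivative matrix of the score, p x p
        step = np.linalg.solve(deriv, score)
        beta = beta - step                  # Newton update
        if np.max(np.abs(step)) < tol:
            break
    return beta

x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones(4), x])        # hypothetical model matrix, p = 2
y = np.array([2.0, 3.0, 6.0, 11.0])         # hypothetical counts

beta_hat = fit_poisson_loglinear(X, y)
final_score = X.T @ (y - np.exp(X @ beta_hat))
print(beta_hat, final_score)
```

Only a p x p system is solved at each step, which is the computational appeal of the freedom parameter approach.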
Constraint Equations Approach. In many situations, it may be difficult to
specify a model in terms of only freedom parameters or perhaps it is possible
but the researcher would like to treat the model parameters symmetrically,
which would necessitate an additional constraint equation. It also could be
that the function $C\log Ae^{\xi}$ is not a 1:1 function of $\xi$ so that for given $\beta$, we
cannot solve for $\xi$ explicitly. In any of these cases, we may not be able to
use the aforementioned maximization approach.
In this section, we consider an alternative method for finding that $\xi$
that maximizes the function $\ell^{(M)}(\xi; y)$ subject to $\xi \in \Theta_h$. The method we
will use is Lagrange's method of undetermined multipliers. Aitchison and
Silvey (1958, 1960) and Silvey (1959) provide much of the essential underlying
theory related to this approach. Three positive features of this method
include (i) estimation of both $\xi$ and $\beta$ is possible, (ii) the method provides
us with another enlightening way of deriving the asymptotic distribution
of the freedom parameter estimators, and (iii) the method works quite
generally. A negative feature of this approach is the computational difficulty.
Computationally, the method becomes burdensome as $s$, the number of log
mean parameters, and $u + l + K$, the number of constraints implied by the
model, become large. In fact, the algorithm involves the inversion of an
$(s + u + l) \times (s + u + l)$ matrix. One positive note is that this potentially very
large matrix does have a simple form and one can invoke some simple matrix
algebra results to reduce the inversion problem to one of inverting matrices
of dimensions $(u + l) \times (u + l)$ and $s \times s$.
To best illustrate the difference in computational difficulty of the two
methods, we consider the following normal linear model example. Let
$$Y_i \sim \text{ind } N(\mu_i = \beta_0 + \beta_1 x_i, \sigma^2), \quad i = 1, 2, \ldots, 100, \quad \sigma^2 \text{ known}.$$
The log likelihood can easily be written as a function of $\beta = (\beta_0, \beta_1)'$.
Maximizing this likelihood with respect to $\beta$ involves working with a $2 \times 2$
matrix. On the other hand, we could equivalently specify the linear model in
terms of the 98 constraints,
$$\frac{\mu_{i+1} - \mu_i}{x_{i+1} - x_i} = \frac{\mu_{i+2} - \mu_{i+1}}{x_{i+2} - x_{i+1}}, \quad i = 1, 2, \ldots, 98,$$
and use Lagrange's method. In this case, we would need to invert a matrix
which has dimension $(s + u + l) \times (s + u + l) = 198 \times 198$.
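The contrast between the two routes can be made concrete with simulated data. The sketch below assumes known unit variance, so that maximizing the normal log likelihood is least squares; the constraint route then amounts to one linear system in the 100 means and 98 multipliers. The covariate values, the seed, and the simulated responses are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = np.linspace(0.0, 10.0, n)                        # hypothetical distinct covariates
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)    # simulated responses

# Freedom parameter route: a 2 x 2 system in beta.
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
mu_freedom = X @ beta_hat

# Constraint route: 98 equal-slope constraints U'mu = 0.
u = n - 2
Ut = np.zeros((u, n))
for i in range(u):
    a = 1.0 / (x[i + 1] - x[i])
    b = 1.0 / (x[i + 2] - x[i + 1])
    Ut[i, i:i + 3] = [-a, a + b, -b]

# Stationarity of the Lagrangian gives one (n + u) x (n + u) linear system.
system = np.block([[np.eye(n), Ut.T],
                   [Ut, np.zeros((u, u))]])
rhs = np.concatenate([y, np.zeros(u)])
solution = np.linalg.solve(system, rhs)
mu_constraint = solution[:n]

print(system.shape, np.max(np.abs(mu_constraint - mu_freedom)))
```

Both routes give the same fitted means; the difference lies entirely in the size of the linear system that has to be solved.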
Even when we use the matrix algebra results that simplify the problem
of working with the $198 \times 198$ matrix, we still are left with a formidable task.
It seems that when $s$ is large and the model is parsimonious, i.e. $u + l + K$,
the number of constraints, is large, the undetermined multiplier method may
not be the method of choice. However, in time, as computer efficiency gains
are realized, we predict that the scope of candidate models to be fit using
this method will increase tremendously. In fact, at present, many categorical
models can easily be fit using Lagrange's method. We discuss in more detail
how we can use the method of undetermined multipliers to fit models like
$[\Theta_h]$ of (2.3.5).
We are to maximize the function $\ell^{(M)}(\xi; y) = y'\xi$ subject to the constraint
$\xi \in \Theta_h$, where
$$\begin{aligned}
\Theta_h &= \{\xi \in R^s : U'C\log(Ae^{\xi}) = 0,\ Le^{\xi} = 0,\ e^{\xi'}(\oplus_1^K 1_R) - n' = 0\} \\
&= \{\xi \in R^s : h(\xi) = 0,\ e^{\xi'}(\oplus_1^K 1_R) = n'\},
\end{aligned}$$
and $h'(\xi) = [\log(e^{\xi'}A')C'U, e^{\xi'}L']$.
Consider the Lagrangian objective function
$$F(\gamma) = \ell^{(M)}(\xi; y) + (e^{\xi'}(\oplus_1^K 1_R) - n')\tau + h'(\xi)\lambda,$$
where $\gamma = vec(\xi, \tau, \lambda)$. The $K \times 1$ vector $\tau$ and the $(u + l) \times 1$ vector $\lambda$ are
called either 'Lagrange multipliers' or 'undetermined multipliers'.
Provided a maximum $\hat\xi$ exists and that the Jacobian of $[e^{\xi'}(\oplus_1^K 1_R) -
n', h'(\xi)]$ is of full row rank $u + l + K$ for all $\xi \in \Theta_h$, we can solve for the
maximum by solving the system of equations
$$\frac{\partial F(\hat\gamma^{(M)})}{\partial \gamma} = \begin{pmatrix} y + D(e^{\hat\xi^{(M)}})(\oplus_1^K 1_R)\hat\tau^{(M)} + H(\hat\xi^{(M)})\hat\lambda^{(M)} \\ (\oplus_1^K 1_R')e^{\hat\xi^{(M)}} - n \\ h(\hat\xi^{(M)}) \end{pmatrix} = 0, \tag{2.3.7}$$
 53 
where the matrix H(Â£) = dh'(Â£)/dÂ£. The Jacobian condition basically
requires the constraints to be nonredundant, thereby making [0/,] a well
defined model.
From this point on, for notational convenience, the indices for the direct
sum $\oplus$ will be omitted unless they are different from 1 and K.
We now require the matrices of models $[\Theta_X]$ and $[\Theta_h]$ to satisfy some
additional conditions. Let us assume that
(A2) Either $C_i = I_{q_iK}$ or $C_i(\oplus 1_{m_i}) = 0$, i = 1, 2,
and
(A3) If $C_i = I_{q_iK}$ then $M(X_i) \supseteq M(\oplus 1_{m_i})$.
The assumptions require $C_i$ to be either a contrast matrix (rows sum
to zero), a zero matrix, or the identity matrix. If $C_i$ is the identity matrix,
it will be required that there exists a set of columns in $X_i$ that spans a
space containing the range space of $\oplus_1^K 1_{m_i}$. For most models of interest
these conditions are met. For example, any of the logit type models, such as
cumulative or multiple logit models, can be specified with C being a contrast
matrix. For loglinear models, the condition (A3) is met whenever the model
includes a parameter for each of the K multinomials.
The following lemma will be useful in showing that the maximum
likelihood estimates of $\xi$ and $\beta$ are equivalent under both sampling schemes,
product-Poisson and product-multinomial. The lemma will also enable us to
reduce the number of equations in (2.3.7) that must be simultaneously solved
when computing the maximum (multinomial) likelihood estimators.
Lemma 2.3.1. If the matrices of models $[\Theta_X]$ and $[\Theta_h]$ satisfy (A1), (A2),
and (A3), then provided the model holds
$$(\oplus 1_R')H(\xi) = 0.$$
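Proof. Using matrix derivatives (MacRae, 1974; Magnus and Neudecker,
1988), it follows that
$$H(\xi) = [D(e^{\xi})A'D^{-1}(Ae^{\xi})C'U,\; D(e^{\xi})L'].$$
Thus,
$$(\oplus 1_R')H(\xi) = (\oplus 1_R')[D(e^{\xi})A'D^{-1}(Ae^{\xi})C'U,\; D(e^{\xi})L']$$
$$= [(\oplus e^{\xi_i'})[A_1', A_2']D^{-1}(Ae^{\xi})(C_1' \oplus C_2')U,\; \oplus e^{\xi_i'}L_i']$$
$$= [(\oplus e^{\xi_i'})[A_1', A_2']D^{-1}(Ae^{\xi})(C_1' \oplus C_2')U,\; 0]$$
$$= [[(\oplus e^{\xi_i'})A_1'D^{-1}(A_1e^{\xi})C_1',\; (\oplus e^{\xi_i'})A_2'D^{-1}(A_2e^{\xi})C_2']U,\; 0]$$
$$= [[(\oplus 1_{m_1}')C_1',\; (\oplus 1_{m_2}')C_2']U,\; 0]$$
$$= 0.$$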
The third equality follows since the model holding implies that $\oplus e^{\xi_i'}L_i' = 0$.
The sixth equality can be seen via the following argument.
If both $C_i$'s are contrast matrices, or zero matrices, then (A2) implies
that the matrix $[(\oplus 1_{m_1}')C_1',\; (\oplus 1_{m_2}')C_2']$ is the zero matrix. On the other hand,
if both $C_1$ and $C_2$ are identity matrices, then, since the columns of U span
the null space of X', (A3) implies that the columns of U span a
set contained in the null space of
$$\begin{pmatrix} \oplus 1_{m_1} \\ \oplus 1_{m_2} \end{pmatrix}',$$
and so we have that $[(\oplus 1_{m_1}'),\; (\oplus 1_{m_2}')]U = 0$. Any other combination of $C_1$ and $C_2$
can also be seen to result in the matrix equaling zero.
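Lemma 2.3.1 can be checked numerically in a simple case. The sketch below is only an illustration under stated assumptions: a single model component (no $C_2$), no linear constraints L, a cumulative-logit choice of A and C for R = 3 response levels, and randomly generated `xi` and `U` that stand in for real values.

```python
import numpy as np

K, R = 2, 3                              # two multinomials, three response levels
A1 = np.array([[1, 0, 0],                # P(Y <= 1)
               [0, 1, 1],                # P(Y > 1)
               [1, 1, 0],                # P(Y <= 2)
               [0, 0, 1]], dtype=float)  # P(Y > 2)
C1 = np.array([[1, -1, 0, 0],
               [0, 0, 1, -1]], dtype=float)   # cumulative logits; rows sum to zero
m, q = A1.shape[0], C1.shape[0]

A = np.kron(np.eye(K), A1)                    # direct sum of K copies of A1
C = np.kron(np.eye(K), C1)                    # direct sum of K copies of C1
sum_1R = np.kron(np.eye(K), np.ones((R, 1)))  # direct sum of 1_R, s x K
sum_1m = np.kron(np.eye(K), np.ones((m, 1)))  # direct sum of 1_m

rng = np.random.default_rng(0)
xi = rng.normal(size=K * R)                   # arbitrary log means
U = rng.normal(size=(K * q, 2))               # arbitrary stand-in for U
mu = np.exp(xi)

# H(xi) = D(e^xi) A' D^{-1}(A e^xi) C' U  (no L block in this example)
H = np.diag(mu) @ A.T @ np.diag(1.0 / (A @ mu)) @ C.T @ U

a2_check = C @ sum_1m          # assumption (A2): zero for a contrast matrix
lemma_check = sum_1R.T @ H     # Lemma 2.3.1: zero matrix
```

Because C is a contrast matrix here, the result holds for any `U`; assumption (A3) only matters when $C_i$ is the identity.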
The following theorem gives conditions under which we can find the ML
estimators of $\xi$ by solving a reduced set of equations. The smaller system of
equations no longer includes the identifiability constraint equations.
Theorem 2.3.1 Let $\text{vec}(\hat\xi^{(M)}, \hat\tau^{(M)}, \hat\lambda^{(M)})$ be the solution to (2.3.7).
Assuming that (A1), (A2), and (A3) hold, the subvector $\text{vec}(\hat\xi^{(M)}, \hat\lambda^{(M)})$
is the solution to the reduced set of s + u + l equations
$$\begin{pmatrix} y - e^{\hat\xi^{(M)}} + H(\hat\xi^{(M)})\hat\lambda^{(M)} \\ h(\hat\xi^{(M)}) \end{pmatrix} = 0. \qquad (2.3.8)$$
Proof: Premultiplying the first set of equations in (2.3.7) by $\oplus 1_R'$, we arrive
at
$$(\oplus 1_R')y + (\oplus 1_R')D(e^{\hat\xi^{(M)}})(\oplus 1_R)\hat\tau^{(M)} + (\oplus 1_R')H(\hat\xi^{(M)})\hat\lambda^{(M)} = 0. \qquad (2.3.9)$$
Now, $(\oplus 1_R')y = n$ and $(\oplus 1_R')D(e^{\hat\xi^{(M)}})(\oplus 1_R) = D(n)$ since it must
be that $(\oplus e^{\hat\xi_i^{(M)\prime}})(\oplus 1_R) = D(n)$, the diagonal matrix with the multinomial
indices on the diagonal. Further, by Lemma 2.3.1,
$(\oplus 1_R')H(\hat\xi^{(M)}) = 0$. Therefore, (2.3.9) can be rewritten as
$$n + D(n)\hat\tau^{(M)} = 0,$$
which implies that $\hat\tau^{(M)} = -1_K$. Now, since the identifiability constraints have
been explicitly accounted for when solving for $\hat\tau^{(M)}$, we can replace $\hat\tau^{(M)}$ of
(2.3.7) by $-1_K$ and omit the identifiability constraints. Thus, $\text{vec}(\hat\xi^{(M)}, \hat\lambda^{(M)})$
is the solution to the reduced set of equations
$$\begin{pmatrix} y - e^{\hat\xi^{(M)}} + H(\hat\xi^{(M)})\hat\lambda^{(M)} \\ h(\hat\xi^{(M)}) \end{pmatrix} = 0.$$
This is what we set out to show.
Before detailing the iterative scheme used for solving (2.3.8), we will
explore the asymptotic behavior of the estimator $\hat\theta^{(M)} = \text{vec}(\hat\xi^{(M)}, \hat\lambda^{(M)})$
within the framework of constraint models.
2.3.3 Asymptotic Distribution of Product-Multinomial ML Estimators
In what follows, we will assume that K, the number of identifiability
constraints, is some fixed integer, K > 1. We also will assume that the
asymptotics hold as $n_* = \min\{n_i\}$ approaches infinity and that $n_* \sim n_i$,
i = 1, ..., K. That is, we assume that the asymptotic approximations hold as
each of the multinomial indices gets large at the same rate.
The derivation of the asymptotic distribution of $\hat\theta^{(M)}$ will follow closely
that of Aitchison and Silvey (1958). Briefly, Aitchison and Silvey show that
if the score vector is $o_p(n)$ and the constraints are such that the derivative
matrices $H(\xi)$ and $\partial H'(\xi)/\partial\xi$ have elements that are bounded functions then,
provided certain mild regularity conditions hold, the maximum likelihood
estimator $\hat\xi$ is an $n^{1/2}$-consistent estimator of $\xi_0$ and $\hat\lambda/n$ is an $n^{1/2}$-consistent
estimator of 0. They show that the joint distribution of $(n^{1/2}(\hat\xi - \xi_0),\; n^{-1/2}\hat\lambda)$
is multivariate normal with zero mean and covariance matrix
$$\begin{pmatrix} B^{-1} - B^{-1}H(H'B^{-1}H)^{-1}H'B^{-1} & 0 \\ 0 & (H'B^{-1}H)^{-1} \end{pmatrix},$$
where B is the information matrix and H is the derivative of the constraint
function.
In our application, however, there are some minor changes. With the
parameterization we use, the information matrix is zero since the (multinomial)
log likelihood (2.3.2) is linear in the parameter $\xi$. This happens because the
identifiability constraints $e^{\xi'}(\oplus_1^K 1_R) = n'$ are ignored, to preserve symmetry,
when differentiating. Also, in our parameterization, the constraints are in
terms of $e^{\xi}$, the components of which are $e^{\xi_{ij}} = n_i\pi_{ij}$. Thus, the constraints
and the corresponding derivative matrices may not be bounded. For example,
a typical constraint is of the form $Le^{\xi} = 0$. It follows that the components
of $Le^{\xi}$ and the derivatives are increasing without bound as the multinomial
indices are allowed to increase without bound.
Fortunately, we can still use the results of Aitchison and Silvey (1958)
by replacing the matrix H and the vector $\lambda/n$ of Aitchison and Silvey by
our $H/n_*$ and $\lambda$, where $n_* = \min\{n_i\}$. The zero information problem can be
solved by identifying the vector $Y - e^{\xi}$ as the 'score vector'. Note
that, in this case, the asymptotic variance of $\oplus D^{-1/2}(n_i 1_R)$ times the score
vector is not equal to the negative derivative matrix $D(\pi_0)$ but instead is
equal to $D(\pi_0) - \oplus\,\pi_{0i}\pi_{0i}'$. This happens because the components of Y are not
independent; Y is product multinomial. Using this reparameterization, all of
the necessary assumptions required by Aitchison and Silvey (1958) hold, i.e.
assumptions X and H of Aitchison and Silvey (1958) hold.
As previously mentioned, Aitchison and Silvey show that $\hat\lambda/n$ is an
$n^{1/2}$-consistent estimator of 0. With our parameterization, having replaced
$\lambda/n$ by $\lambda$, it follows that $\hat\lambda^{(M)}$ will be $n_*^{1/2}$-consistent. We now derive the
asymptotic distribution of $\hat\theta^{(M)}$.
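Define the stochastic function g by
$$g(\theta; Y) = \begin{pmatrix} Y - e^{\xi} + H(\xi)\lambda \\ h(\xi) \end{pmatrix}.$$
The maximum likelihood estimator $\hat\theta^{(M)}$ is the solution to $g(\theta; Y) = 0$.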
Under our parameterization, using the results of Aitchison and Silvey
(1958), we have that each of the following hold
$$e^{\hat\xi^{(M)}} - e^{\xi_0} = D(e^{\xi_0})(\hat\xi^{(M)} - \xi_0) + O_p(1),$$
H(
and
Thus,
$$h(\hat\xi^{(M)}) = h(\xi_0) + H'(\xi_0)(\hat\xi^{(M)} - \xi_0) + O_p(1)$$
$$= H'(\xi_0)(\hat\xi^{(M)} - \xi_0) + O_p(1),$$
ff(f
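$$0 = g(\hat\theta^{(M)}; y) = \begin{pmatrix} y - e^{\hat\xi^{(M)}} + H(\hat\xi^{(M)})\hat\lambda^{(M)} \\ h(\hat\xi^{(M)}) \end{pmatrix}$$
can be rewritten as
$$0 = \begin{pmatrix} Y - e^{\xi_0} - D(e^{\xi_0})(\hat\xi^{(M)} - \xi_0) + H(\xi_0)\hat\lambda^{(M)} + O_p(1) \\ H'(\xi_0)(\hat\xi^{(M)} - \xi_0) + O_p(1) \end{pmatrix}$$
$$= \begin{pmatrix} Y - e^{\xi_0} \\ 0 \end{pmatrix} + \begin{pmatrix} -D(e^{\xi_0}) & H(\xi_0) \\ H'(\xi_0) & 0 \end{pmatrix}\begin{pmatrix} \hat\xi^{(M)} - \xi_0 \\ \hat\lambda^{(M)} \end{pmatrix} + O_p(1).$$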
Therefore, it follows that
oD'iâ€™Ãm i*)
ref"
0
since n* ~ n,, Â¿ = 1,... ,K and 7r0 = ( Â® D1(nÂ¿lfl))eÃo.
(2.3.10)
Now, the random variable $\oplus D^{-1/2}(n_i 1_R)(Y - e^{\xi_0})$ is a vector of normalized
sample proportions so that
$$\begin{pmatrix} \oplus D^{-1/2}(n_i 1_R)(Y - e^{\xi_0}) \\ 0 \end{pmatrix}$$
has an asymptotic normal distribution with zero mean and covariance matrix
$$\begin{pmatrix} D(\pi_0) - \oplus\,\pi_{0i}\pi_{0i}' & 0 \\ 0 & 0 \end{pmatrix}.$$
Therefore, by an extension of a theorem of Cramér (1949) and by equation
(2.3.10), it follows that $n_*^{1/2}(\hat\theta^{(M)} - \theta_0) = n_*^{1/2}\text{vec}(\hat\xi^{(M)} - \xi_0, \hat\lambda^{(M)})$ has an
asymptotic normal distribution with mean zero and covariance
( D(irâ€ž) (D(wh)  Â®iroiic'e
{*** o ) l 0
* Ã)(S T
This covariance matrix is shown in the appendix to have the simple form
$$\begin{pmatrix} M_1 & 0 \\ 0 & M_2 \end{pmatrix},$$
where
$$M_1 = D^{-1}(\pi_0) - D^{-1}(\pi_0)H(H'D^{-1}(\pi_0)H)^{-1}H'D^{-1}(\pi_0) - \oplus_1^K 1_R 1_R'$$
and
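Finally, using the fact that $n_* \sim n_i$, i = 1, ..., K, we can discriminately
replace $n_*$ by the appropriate $n_i$ to arrive at a simple, asymptotically
equivalent, expression for the asymptotic covariance of $\hat\theta^{(M)} = \text{vec}(\hat\xi^{(M)}, \hat\lambda^{(M)})$.
It is
$$\begin{pmatrix} D^{-1} - D^{-1}H(H'D^{-1}H)^{-1}H'D^{-1} - \oplus_{i=1}^K n_i^{-1} 1_R 1_R' & 0 \\ 0 & (H'D^{-1}H)^{-1} \end{pmatrix},$$
where $D = D(\mu_0) = D(e^{\xi_0})$ and $H = H(\xi_0)$.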
2.3.4 Lagrange's Method: The Algorithm
In this section, we give details of how one can actually fit the models
of (2.3.4) or equivalently (2.3.5). We show how Lagrange's undetermined
multipliers method can be used in conjunction with a modified Newton-
Raphson iterative scheme to compute the ML estimators and their asymptotic
covariances. We will assume that the model assumptions (A1), (A2), and (A3)
hold. This section includes an outline of the algorithm used in the FORTRAN
program 'mle.restraint'.
Recall that our objective is to find that $\xi \in \Theta_X$, where
$$\Theta_X = \{\xi \in R^s : C\log(Ae^{\xi}) = X\beta,\; Le^{\xi} = 0,\; (\oplus 1_R')e^{\xi} = n\},$$
that maximizes the multinomial log likelihood
(2.3.12)
$$\ell^{(M)}(\xi; y) = y'\xi.$$
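Since the assumptions (A1), (A2), and (A3) hold, we see by Theorem
2.3.1 that our problem is reduced to one of solving the system of equations
(2.3.8), i.e. to find the ML estimator $\hat\theta^{(M)} = \text{vec}(\hat\xi^{(M)}, \hat\lambda^{(M)})$, we must
simultaneously solve the system of s + u + l equations
$$g(\theta) = \begin{pmatrix} y - e^{\xi} + H(\xi)\lambda \\ h(\xi) \end{pmatrix} = 0,$$
where the (u + l) x 1 vector h and the s x (u + l) matrix H are defined as
follows.
$$h(\xi) = \begin{pmatrix} U'C\log(Ae^{\xi}) \\ Le^{\xi} \end{pmatrix}$$
and
$$H(\xi) = \frac{\partial h'(\xi)}{\partial\xi}.$$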
It will be shown in section (2.4) that $g(\theta)$ is actually the derivative
of the Lagrangian objective function under the product-Poisson sampling
assumption.
The iterative scheme used in the FORTRAN program 'mle.restraint' is
a modified Newton-Raphson algorithm. The algorithm can be sketched as
follows.
(1) Find a starting value $\theta^{(0)}$ for $\theta$.
(2) Replace $\theta^{(\nu)}$ by $\theta^{(\nu+1)} = \theta^{(\nu)} - G^{-1}(\theta^{(\nu)})g(\theta^{(\nu)})$ (2.3.13)
(3) If $\|g(\theta^{(\nu+1)})\| > tol$ go to (2). Else stop.
The matrix $G(\theta)$ used in step (2) is actually
$$G(\theta) = \begin{pmatrix} -D(e^{\xi}) & H(\xi) \\ H'(\xi) & 0 \end{pmatrix}$$
and the inverse of $G(\theta)$ is of the very simple form (see Aitchison and Silvey,
1958 or Rao, 1974)
$$G^{-1}(\theta) = \begin{pmatrix} -D^{-1} + D^{-1}H(H'D^{-1}H)^{-1}H'D^{-1} & D^{-1}H(H'D^{-1}H)^{-1} \\ (H'D^{-1}H)^{-1}H'D^{-1} & (H'D^{-1}H)^{-1} \end{pmatrix}, \qquad (2.3.14)$$
where $D = D(e^{\xi})$. Since we use $G(\theta)$ in place of the Hessian matrix, the
procedure is a modification to the Newton-Raphson method. Haber (1985a)
used the more complicated Hessian matrix.
Notice that the inversion of G, which may be performed at each iteration,
is not nearly as difficult as inverting a general matrix of dimension (s + u +
l) x (s + u + l). First of all, in view of (2.3.14), to obtain the inverse of the
partitioned matrix G, we need only invert the matrices D and $H'D^{-1}H$, which
are of dimension s x s and (u + l) x (u + l). Secondly, the inversion of D is
simple since D is a diagonal matrix with $e^{\xi_i}$ on the diagonal. Hence, the most
formidable task in the inversion process is the inversion of the symmetric
positive definite matrix $H'D^{-1}H$. There are many efficient ways to invert
large symmetric positive definite matrices.
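The iteration (2.3.13), combined with the partitioned inverse (2.3.14), can be sketched in a few lines of Python. This is an illustrative sketch, not the 'mle.restraint' program: the function name `fit_constraint_model`, the starting value log(y), the stopping rule on the largest element of g, and the 2 x 2 independence example are all assumptions made here.

```python
import numpy as np

def fit_constraint_model(y, h, H, tol=1e-10, max_iter=50):
    """Solve g(theta) = (y - e^xi + H lam, h(xi)) = 0 by the modified Newton-Raphson step."""
    y = np.asarray(y, dtype=float)
    xi = np.log(y)                          # assumed starting value; needs all counts > 0
    lam = np.zeros(h(xi).shape[0])
    for _ in range(max_iter):
        mu = np.exp(xi)
        Hm = H(xi)                          # s x (u + l) derivative matrix
        g1 = y - mu + Hm @ lam
        g2 = h(xi)
        Dinv_H = Hm / mu[:, None]           # D^{-1} H, using that D is diagonal
        R = np.linalg.inv(Hm.T @ Dinv_H)    # (H' D^{-1} H)^{-1}
        Q = Dinv_H @ R
        P = np.diag(1.0 / mu) - Q @ Dinv_H.T
        if max(np.abs(g1).max(), np.abs(g2).max()) < tol:
            break
        # theta <- theta - G^{-1} g, with G^{-1} = [[-P, Q], [Q', R]]
        xi = xi + P @ g1 - Q @ g2
        lam = lam - (Q.T @ g1 + R @ g2)
    return xi, lam, P, R

# Hypothetical example: independence in a 2 x 2 table (log odds ratio = 0),
# i.e. h(xi) = c'xi with a constant derivative matrix H = c.
c = np.array([1.0, -1.0, -1.0, 1.0])
y = np.array([10.0, 20.0, 30.0, 40.0])
xi_hat, lam_hat, P, R = fit_constraint_model(
    y, h=lambda xi: np.array([c @ xi]), H=lambda xi: c[:, None])
mu_hat = np.exp(xi_hat)
```

In this example the fitted means can be compared with the familiar closed form (row total times column total over the grand total), and the returned `P` and `R` are the blocks of the inverse evaluated at the final iterate.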
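Upon convergence of the algorithm (2.3.13), estimates of the asymptotic
covariances of $\hat\xi^{(M)}$ and $\hat\lambda^{(M)}$ are readily calculable. Write $G^{-1}(\hat\theta)$ of (2.3.14)
as
$$\begin{pmatrix} -P & Q \\ Q' & R \end{pmatrix},$$
where
$$P = D^{-1} - D^{-1}H(H'D^{-1}H)^{-1}H'D^{-1},$$
$$Q = D^{-1}H(H'D^{-1}H)^{-1},$$
$$R = (H'D^{-1}H)^{-1}.$$
By (2.3.12), the asymptotic covariance of $\hat\theta^{(M)} = \text{vec}(\hat\xi^{(M)}, \hat\lambda^{(M)})$ can be
estimated by
$$\begin{pmatrix} P - \oplus_{i=1}^K n_i^{-1} 1_R 1_R' & 0 \\ 0 & R \end{pmatrix}.$$
Variance estimates for other continuous functions of $\hat\theta^{(M)}$, such as
$\hat\mu^{(M)} = e^{\hat\xi^{(M)}}$ and $\hat\beta^{(M)} = (X'X)^{-1}X'C\log(Ae^{\hat\xi^{(M)}})$, can be found by invoking
the delta method. For example,
$$\text{var}(\hat\mu^{(M)}) = D(e^{\hat\xi^{(M)}})\,\text{var}(\hat\xi^{(M)})\,D(e^{\hat\xi^{(M)}})$$
and
$$\text{var}(\hat\beta^{(M)}) = (X'X)^{-1}X'CD^{-1}(A\hat\mu^{(M)})A\,\text{var}(\hat\mu^{(M)})\,A'D^{-1}(A\hat\mu^{(M)})C'X(X'X)^{-1}.$$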
Evidently, Lagrange's method of undetermined multipliers provides us
with a convenient procedure for maximum likelihood fitting of models in a
very general class of parametric models for multivariate polytomous data with
covariates possible. We now briefly outline the steps needed to perform the
iterations of (2.3.13).
Computing U. The first thing we must do is write the freedom model (2.3.4),
which can easily be input by the user, as a constraint model (2.3.5). Therefore,
we must compute a full column rank matrix U that satisfies U'X = 0. The
method we use to find U is attributed to Haber (1985b).
Using the notation of 'mle.restraint', let X be a full column rank matrix
of dimension q x r. Let u = q - r be the dimension of the null space of X'.
Further, the matrices A and C of (2.3.4) will have dimensions m x s and q x m
respectively. The relationship between these dimension variables and those
used in sections 2.3.1 and 2.3.2 is as follows:
$$q = K(q_1 + q_2), \quad r = p_1 + p_2, \quad m = K(m_1 + m_2).$$
We use the variables q, r, and m for notational convenience.
Consider the matrix $U^* = I_q - X(X'X)^{-1}X'$. This q x q matrix is of rank
u = q - r and satisfies the property
$$U^{*\prime}X = 0.$$
Let W denote a q x u matrix with random elements. Specifically,
$$W_{ij} \sim \text{Uniform}(0, 100), \quad i = 1, \ldots, q, \; j = 1, \ldots, u.$$
It follows that the matrix W is of full column rank with probability one and
hence that the q x u matrix $U = U^*W$ is of full column rank u with probability
one. But the matrix U satisfies
$$U'X = W'U^{*\prime}X = W'0 = 0.$$
Therefore, at least with probability one, we have found a full column rank
matrix U that satisfies the property U'X = 0. Using this U, we are able to
write freedom model (2.3.4) as a constraint model (2.3.5).
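The construction of U just described translates directly into code. The following is a minimal sketch assuming NumPy; the helper name `null_space_basis`, the random seed, and the 4 x 3 independence design matrix are illustrative choices, not part of 'mle.restraint'.

```python
import numpy as np

def null_space_basis(X, rng):
    """Return U = U* W with U* = I - X (X'X)^{-1} X' and W_ij ~ Uniform(0, 100)."""
    q, r = X.shape
    u = q - r                                           # dimension of the null space of X'
    U_star = np.eye(q) - X @ np.linalg.solve(X.T @ X, X.T)
    W = rng.uniform(0.0, 100.0, size=(q, u))            # random q x u matrix
    return U_star @ W

# Hypothetical design: loglinear independence model for a 2 x 2 table.
X = np.array([[1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]], dtype=float)
U = null_space_basis(X, np.random.default_rng(1))
```

Since the rank statement holds only with probability one, a cautious implementation might verify the rank of `U` after drawing `W` and redraw if the check fails.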
Computing $h(\xi)$. We write the constraint model of (2.3.5) as
$$\{\xi \in R^s : h(\xi) = 0,\; e^{\xi'}(\oplus_1^K 1_R) = n'\}, \qquad (2.3.15)$$
where the constraint function h is defined as
$$h(\xi) = \begin{pmatrix} U'C\log(Ae^{\xi}) \\ Le^{\xi} \end{pmatrix}.$$
Computing $g(\theta)$. Notice that since (A1), (A2), and (A3) hold, the
identifiability constraints present in the product-multinomial model (2.3.4)
can be accounted for explicitly. It will follow by results of section 2.4 that
under either sampling scheme, product-Poisson or product-multinomial,
the maximum likelihood estimators for $\xi$ and $\lambda$ can be found by solving the
equation
$$g(\theta) = \begin{pmatrix} y - e^{\xi} + H(\xi)\lambda \\ h(\xi) \end{pmatrix} = 0, \qquad (2.3.16)$$
where the matrix H is the derivative of h' with respect to $\xi$.
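Computing $H(\xi)$. We will use matrix derivative results of MacRae (1974)
to find the matrix of derivatives of the constraint function $h'(\xi)$.
$$H(\xi) = \frac{\partial h'(\xi)}{\partial\xi} = \frac{\partial}{\partial\xi}[\log(e^{\xi'}A')C'U,\; e^{\xi'}L']$$
$$= [D(e^{\xi})A'D^{-1}(Ae^{\xi})C'U,\; D(e^{\xi})L'].$$
The equality follows upon using the matrix version of the chain rule. Notice
that
$$\frac{\partial(\log(e^{\xi'}A')C'U)}{\partial\xi} = \frac{\partial e^{\xi'}}{\partial\xi}\,\frac{\partial(e^{\xi'}A')}{\partial e^{\xi}}\,\frac{\partial\log(e^{\xi'}A')}{\partial(Ae^{\xi})}\,C'U$$
$$= D(e^{\xi})A'D^{-1}(Ae^{\xi})C'U$$
and that
$$\frac{\partial(e^{\xi'}L')}{\partial\xi} = \frac{\partial e^{\xi'}}{\partial\xi}\,\frac{\partial(e^{\xi'}L')}{\partial e^{\xi}}$$
$$= D(e^{\xi})L'.$$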
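The closed form for $H(\xi)$ can be sanity-checked against a central finite-difference approximation. The sketch below assumes NumPy and uses made-up ingredients: a cumulative-logit A and C for three response levels, a hypothetical U that equates the two cumulative logits, and a hypothetical linear constraint L.

```python
import numpy as np

A = np.array([[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1]], dtype=float)
C = np.array([[1, -1, 0, 0], [0, 0, 1, -1]], dtype=float)
U = np.array([[1.0], [-1.0]])            # hypothetical U
L = np.array([[1.0, 0.0, -1.0]])         # hypothetical linear constraint mu_1 - mu_3 = 0

def h(xi):
    """Constraint function h(xi) = (U'C log(A e^xi), L e^xi)."""
    mu = np.exp(xi)
    return np.concatenate([U.T @ C @ np.log(A @ mu), L @ mu])

def H(xi):
    """Analytic derivative: [D(e^xi) A' D^{-1}(A e^xi) C'U, D(e^xi) L']."""
    mu = np.exp(xi)
    D = np.diag(mu)
    return np.hstack([D @ A.T @ np.diag(1.0 / (A @ mu)) @ C.T @ U, D @ L.T])

xi = np.array([0.3, -0.2, 0.5])
eps = 1e-6
# Row j of H holds the derivatives of every constraint with respect to xi_j.
H_numeric = np.vstack([
    (h(xi + eps * e) - h(xi - eps * e)) / (2 * eps)
    for e in np.eye(xi.size)
])
max_abs_diff = np.abs(H(xi) - H_numeric).max()
```

Here `max_abs_diff` should be near zero if the analytic derivative is coded correctly.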
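Computing $G(\theta)$. The iterative scheme (2.3.13) used to solve the system
of equations (2.3.16) is actually a slight modification of the Newton-Raphson
algorithm. It is a modification because we do not use the derivative matrix
$G^* = \partial g(\theta)/\partial\theta'$ to adjust at each iteration, as Haber (1985a) did, but rather a
simpler matrix G that is related to $G^*$ by $G^* = G + O_p(n_*^{1/2})$. The derivative
matrix $G^*$ can be computed as follows.
$$G^* = \frac{\partial g(\theta)}{\partial\theta'} = \left[\frac{\partial g(\theta)}{\partial\xi'},\; \frac{\partial g(\theta)}{\partial\lambda'}\right]$$
$$= \begin{pmatrix} -D(e^{\xi}) + \frac{\partial H(\xi)\lambda}{\partial\xi'} & H(\xi) \\ H'(\xi) & 0 \end{pmatrix}$$
$$= G + \begin{pmatrix} \frac{\partial H(\xi)\lambda}{\partial\xi'} & 0 \\ 0 & 0 \end{pmatrix}.$$
The matrix
$$\frac{\partial H(\xi)\lambda}{\partial\xi'} = \frac{\partial H(\xi)}{\partial\xi'}(I_s \otimes \lambda)$$
is of order $O_p(n_*^{1/2})$ when it is evaluated at $\hat\theta = \text{vec}(\hat\xi, \hat\lambda)$ since
$$\frac{\partial H(\hat\xi)}{\partial\xi'} = O_p(n_*)$$
and
$$\hat\lambda = O_p(n_*^{-1/2}).$$
It follows that the matrix G, which is much simpler to invert than $G^*$, can
be used to adjust the estimate at each iteration.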
Computing the inverse of G. Although the matrix G is of dimension
(s + u + l) x (s + u + l), which may be very large in practice, its inverse
is relatively simple to calculate. The inverse of the partitioned matrix
$$G = \begin{pmatrix} -D & H \\ H' & 0 \end{pmatrix}$$
is shown by Aitchison and Silvey (1958) to have form
$$\begin{pmatrix} -D^{-1} + D^{-1}H(H'D^{-1}H)^{-1}H'D^{-1} & D^{-1}H(H'D^{-1}H)^{-1} \\ (H'D^{-1}H)^{-1}H'D^{-1} & (H'D^{-1}H)^{-1} \end{pmatrix}.$$
Therefore, only the matrices D and $(H'D^{-1}H)$, which are of dimensions
s x s and (u + l) x (u + l), need to be inverted. The inverse of D is easily
calculated since D is a diagonal matrix with $e^{\xi_i}$ on the diagonal. The inverse
of $(H'D^{-1}H)$, a symmetric positive definite matrix, can be found quite easily,
even when u + l, the number of constraints, is large. It should be pointed
out that when s, the total number of cell means, is large, the number of
constraints u + l may be large and on the same order as s. This will be the
case for parsimonious models, those models with many constraints relative
to number of model parameters.
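The partitioned form of the inverse is easy to verify numerically. The following sketch, which assumes NumPy and uses a randomly generated diagonal D and matrix H purely for illustration, builds G, assembles the inverse from the two small inversions, and compares it with a direct inversion of the full matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
s, c = 6, 2                                   # c plays the role of u + l
d = rng.uniform(1.0, 5.0, size=s)             # diagonal of D (positive fitted means)
H = rng.normal(size=(s, c))                   # stand-in for the constraint derivatives

G = np.block([[-np.diag(d), H],
              [H.T, np.zeros((c, c))]])

Dinv_H = H / d[:, None]                       # D^{-1} H without forming D^{-1}
R = np.linalg.inv(H.T @ Dinv_H)               # only a c x c inversion is needed
Q = Dinv_H @ R
P = np.diag(1.0 / d) - Q @ Dinv_H.T
G_inv = np.block([[-P, Q],
                  [Q.T, R]])

direct = np.linalg.inv(G)                     # full (s + c) x (s + c) inversion, for comparison
```

The point of the comparison is that `G_inv` never requires factoring anything larger than a c x c matrix, whereas `direct` works with the full matrix.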
One could choose to invert the matrix G a limited number of times to
mitigate the computational burden. In fact, in their 1958 and 1960 papers,
Aitchison and Silvey advocate an iterative method whereby the inverse of G
is computed only two times: once at the initial iteration and again at the
final iteration, upon convergence. We feel, however, that in this special case
in which the matrix G has a particularly simple form, the inverse can be
computed at each iteration. Along with increased computing power, there
are many efficient algorithms for inverting large symmetric positive definite
matrices.
2.4 Comparison of Product-Multinomial and Product-Poisson Estimators
We begin this section by introducing notation for a product-Poisson
random vector.
The s x 1 random vector $Y = \text{vec}(Y_1, \ldots, Y_K)$ is said to be product-Poisson
if
$$Y_{ij} \sim \text{ind Poisson}(e^{\xi_{ij}}), \quad i = 1, \ldots, K, \; j = 1, \ldots, R. \qquad (2.4.1)$$
Suppose that the s = RK log means $\{\xi_{ij}\}$ satisfy the model $[\Theta_X^{(P)}]$ where
$$\Theta_X^{(P)} = \{\xi \in R^s : C\log(Ae^{\xi}) = X\beta,\; Le^{\xi} = 0\}$$
or equivalently, for appropriately chosen U,
$$\Theta_X^{(P)} = \Theta_h^{(P)} = \{\xi \in R^s : U'C\log(Ae^{\xi}) = 0,\; Le^{\xi} = 0\}. \qquad (2.4.2)$$
This model implies all the same constraints on $\xi$ as the product-
multinomial model $[\Theta_h]$ of (2.3.5), with one exception: the identifiability
constraints, $e^{\xi'}(\oplus 1_R) = n'$, are not included.
Denote the maximum likelihood estimators computed assuming (2.4.1)
and (2.4.2) by $\hat\xi^{(P)}$ and $\hat\beta^{(P)}$. Similarly, denote the maximum likelihood
estimators computed assuming (2.3.1) and (2.3.5) by $\hat\xi^{(M)}$ and $\hat\beta^{(M)}$.
Recall that the three product-multinomial model assumptions are
(A1) The multinomial response model can be specified as in (2.3.3).
That is, the model parameter space can be represented as
$$\Theta_X = \{\xi \in R^s : C_1\log A_1e^{\xi} = X_1\beta_1,\; C_2\log A_2e^{\xi} = X_2\beta_2,\; Le^{\xi} = 0,\; e^{\xi'}(\oplus_1^K 1_R) = n'\},$$
where
$C_i = \oplus_1^K C_{ij}$, $C_{ij} = C_{i1}$, is $q_i \times m_i$, i = 1, 2
$A_i = \oplus_1^K A_{ij}$, $A_{ij} = A_{i1}$, is $m_i \times R$, i = 1, 2
$L = \oplus_1^K L_j$, $L_j = L_1$ is d x R
$\xi = \text{vec}(\xi_1, \ldots, \xi_K)$, and $\xi_k$ is R x 1
$X_i$ is $Kq_i \times p_i$ of full rank $p_i$, i = 1, 2
n is the K x 1 vector of multinomial indices
s = RK, the total number of cells.
(A2) Either $C_i = I_{q_iK}$ or $C_i(\oplus 1_{m_i}) = 0$, i = 1, 2,
and
(A3) If $C_i = I_{q_iK}$ then $M(X_i) \supseteq M(\oplus 1_{m_i})$.
The following theorem states that the maximum likelihood estimators for
$\xi$ and hence $\beta$ are the same under the product-multinomial sampling scheme
of (2.3.1) and the product-Poisson sampling scheme of (2.4.1) provided that
the three assumptions (A1), (A2), and (A3) hold.
Theorem 2.4.1 If the model (2.3.4) satisfies assumptions (A1), (A2), and
(A3), then
$$\hat\beta^{(P)} = \hat\beta^{(M)} \quad\text{and}\quad \hat\xi^{(P)} = \hat\xi^{(M)}.$$
That is, the maximum likelihood estimators of $\beta$ and $\xi$ are the same under
both sampling schemes, product-Poisson (2.4.1) and product-multinomial
(2.3.1).
Proof: Under the product-Poisson assumption of (2.4.1) and (2.4.2), the
kernel of the log likelihood is
$$\ell^{(P)}(\xi; y) = y'\xi - e^{\xi'}1_s.$$
Therefore, letting $\theta = \text{vec}(\xi, \lambda)$, the corresponding Lagrangian objective
function is
$$Q(\theta) = y'\xi - e^{\xi'}1_s + h'(\xi)\lambda$$
and so to find the maximum (Poisson) likelihood estimator $\hat\theta^{(P)} = \text{vec}(\hat\xi^{(P)}, \hat\lambda^{(P)})$
we must solve the system of equations
$$\frac{\partial Q(\hat\theta^{(P)})}{\partial\theta} = \begin{pmatrix} y - e^{\hat\xi^{(P)}} + H(\hat\xi^{(P)})\hat\lambda^{(P)} \\ h(\hat\xi^{(P)}) \end{pmatrix} = 0. \qquad (2.4.3)$$
The conclusion of the theorem now follows, since the equations (2.3.8) of
Theorem 2.3.1 and (2.4.3) yield exactly the same solutions and
$$\hat\beta^{(P)} = (X'X)^{-1}X'C\log(Ae^{\hat\xi^{(P)}}) = (X'X)^{-1}X'C\log(Ae^{\hat\xi^{(M)}}) = \hat\beta^{(M)}.$$
As a corollary to Theorem 2.4.1 we have
Corollary 2.4.1 Provided the assumptions of Theorem 2.4.1 hold, the
estimated undetermined multipliers are invariant with respect to sampling
scheme, i.e.
$$\hat\lambda^{(M)} = \hat\lambda^{(P)}.$$
Proof: The proof follows immediately upon noting that equations (2.3.8)
and (2.4.3) yield exactly the same solutions.
A remark is in order. Basically, Theorem 2.4.1 enables us to conclude
that the sufficient and necessary condition of Birch (1963) holds. This
condition is that the model be specified so that the Poisson ML estimators
necessarily satisfy the identifiability constraints that are required for the
multinomial model.
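This condition can be illustrated with a small Poisson loglinear fit. The sketch below assumes NumPy; the Newton-type fitting routine `poisson_loglinear_fit`, the counts, and the two design matrices are hypothetical choices made for illustration. When X contains a parameter for each multinomial, as in assumption (A3), the Poisson fitted values reproduce the multinomial totals; when it does not, they need not.

```python
import numpy as np

def poisson_loglinear_fit(y, X, n_iter=25):
    """Fit log(mu) = X beta by Newton-Raphson on the Poisson log likelihood."""
    beta = np.linalg.lstsq(X, np.log(y), rcond=None)[0]   # crude starting value
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        beta = beta + np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (y - mu))
    return np.exp(X @ beta)

y = np.array([10.0, 20.0, 30.0, 40.0])        # K = 2 multinomials (rows), R = 2 levels
n = np.array([30.0, 70.0])                    # multinomial indices (row totals)
sum_1R = np.kron(np.eye(2), np.ones((2, 1)))  # direct sum of 1_R

# Design with one parameter per multinomial plus a column effect.
X_with = np.array([[1, 0, 0], [1, 0, 1], [0, 1, 0], [0, 1, 1]], dtype=float)
# Design with only an intercept and a column effect.
X_without = np.array([[1, 0], [1, 1], [1, 0], [1, 1]], dtype=float)

totals_with = sum_1R.T @ poisson_loglinear_fit(y, X_with)
totals_without = sum_1R.T @ poisson_loglinear_fit(y, X_without)
```

Comparing `totals_with` and `totals_without` against `n` shows which parameterization lets the Poisson fit stand in for the multinomial fit.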
We now explore the asymptotic behavior of the (Poisson) ML estimator
$\hat\theta^{(P)} = \text{vec}(\hat\xi^{(P)}, \hat\lambda^{(P)})$. For the product-Poisson assumptions (2.4.1) and
(2.4.2), we can obtain the asymptotic distribution of $\hat\theta^{(P)}$ by formally replacing
the $n_* = \min\{n_i\}$ by $\mu_* = \min\{e^{\xi_{ij}}\}$ and using the same arguments as those
used to derive the asymptotic distribution of $\hat\theta^{(M)}$.
Jørgensen (1989) discusses limiting distributions for Poisson random
variables as the mean parameters, or equivalently $\mu_*$, go to infinity. In this
case,
alÂ»';Y)=(r + )
has an asymptotic normal distribution with mean zero and asymptotic
covariance
(^Â°)
Using arguments similar to those used in the multinomial case, it follows that
(Ye*Â°
V 0
H
0
^ ( Â£(p)  6
/ l A(p)
We conclude, as in the product-multinomial case, that $\hat\theta^{(P)} - \theta_0$ has an
asymptotic normal distribution with mean zero and asymptotic covariance
$$\begin{pmatrix} -D(\mu_0) & H \\ H' & 0 \end{pmatrix}^{-1}\begin{pmatrix} D(\mu_0) & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} -D(\mu_0) & H \\ H' & 0 \end{pmatrix}^{-1}.$$
But this can again be simplified as it was in the multinomial case. It can be
shown that the asymptotic covariance can be rewritten as
$$\begin{pmatrix} D^{-1} - D^{-1}H(H'D^{-1}H)^{-1}H'D^{-1} & 0 \\ 0 & (H'D^{-1}H)^{-1} \end{pmatrix}, \qquad (2.4.4)$$
where $D = D(\mu_0) = D(e^{\xi_0})$ and $H = H(\xi_0)$.
Comparison of the Asymptotic Distributions. Provided assumptions (A1),
(A2), and (A3) hold, both $\hat\theta^{(P)} - \theta_0$ and $\hat\theta^{(M)} - \theta_0$ have asymptotic normal
distributions with zero means and respective covariances given in (2.4.4) and
(2.3.12). Therefore, we have the following interesting results.
Result 1. The asymptotic covariances of $\hat\xi^{(P)}$ and $\hat\xi^{(M)}$ are related by
$$\text{var}(\hat\xi^{(M)}) = \text{var}(\hat\xi^{(P)}) - \oplus_{i=1}^K \frac{1_R 1_R'}{n_i}. \qquad (2.4.5)$$
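Result 2. The asymptotic distributions of $\hat\lambda^{(P)}$ and $\hat\lambda^{(M)}$ are identical and
it follows that the Lagrange multiplier statistic which has form
$$LM = \hat\lambda'(\text{var}(\hat\lambda))^{-1}\hat\lambda = \hat\lambda'(H'D^{-1}H)\hat\lambda$$
is invariant with respect to the sampling scheme.
Result 3.
$$\text{var}(\hat\mu^{(M)}) = \text{var}(\hat\mu^{(P)}) - \oplus_{i=1}^K \frac{\hat\mu_i^{(P)}\hat\mu_i^{(P)\prime}}{n_i}. \qquad (2.4.6)$$
Result 4.
$$\text{var}(\hat\beta^{(M)}) = \text{var}(\hat\beta^{(P)}) - \Lambda \qquad (2.4.7)$$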
where
A = (X'X)~1X'C
Q^)C'X(X'X)1.
VUi
and is nonnegative definite.
The notation var() used in these results denotes the asymptotic variance.
This is important since the finite sample variances may not even exist.
The proofs for Results 3 and 4 are straightforward. Basically, they
involve using the delta method and equation (2.4.5). The interested reader
will find an outline of the proofs in Appendix A.
In practice, it is of particular interest to evaluate the matrix $\Lambda$ of equation
(2.4.7). Often, for convenience, the models are fit assuming the vector Y
is product-Poisson and then inferences based on the maximum likelihood
estimates are made assuming that they are invariant with respect to the
sampling assumption. Birch (1963) and Palmgren (1981) derive rules for
when these inferences, based on the two different sampling assumptions, will
be equivalent. However, they assume that the model is of a simple loglinear
form. That is, the Poisson model is assumed to have form
$$\Theta_X = \{\xi \in R^s : \xi = X\beta\}.$$
We will use the results of this section to derive more general rules for when
the two inferences will be equal. As a special case of these results, we will
arrive at the Birch and Palmgren results.
The following lemma will enable us to rewrite $\Lambda$ of (2.4.7) in still a
simpler form.
Lemma 2.4.1 Let $Z = [Z_1, \ldots, Z_K]$ be an r x K matrix of full rank K.
Suppose that $X = [X_1, \ldots, X_p]$ is an r x p ($r \ge p \ge K$) matrix of full rank p
such that $M(X) \supseteq M(Z)$, i.e. the range space of X contains the range space
of Z. Denote the T ($K \le T \le p$) columns of X that span a space containing
$M(Z)$ by $\{X_{\nu_1}, \ldots, X_{\nu_T}\}$. Without loss of generality, suppose that the set of
vectors $\{X_{\nu_1}, \ldots, X_{\nu_T}\}$ is a minimal spanning subset, i.e. the spanning set
of any t < T of these vectors does not contain the range space of Z. We
conclude that
$$\exists\, W \in R^{T \times K} \ni (X'X)^{-1}X'Z = JW,$$
where the p x T matrix $J = [e_{\nu_1}, \ldots, e_{\nu_T}]$ and $e_{\nu_i}$ is the p x 1 vector
(0, ..., 0, 1, 0, ..., 0)' with the '1' in the $\nu_i$th position.
Proof: Let $X_* = [X_{\nu_1}, \ldots, X_{\nu_T}]$. Now, by assumption, $M(X_*) \supseteq M(Z)$.
Hence, there must exist a matrix $W \in R^{T \times K} \ni Z = X_*W$. Therefore,
$$(X'X)^{-1}X'Z = (X'X)^{-1}X'X_*W = (X'X)^{-1}(X'X_*)W = JW,$$
where $J = (X'X)^{-1}(X'X_*)$ is as stated in the conclusion of the lemma.
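A small numerical illustration of Lemma 2.4.1 follows, assuming NumPy and a hypothetical 2 x 3 layout in which the first two columns of X are the multinomial indicators, so that Z coincides with those columns. In that case $(X'X)^{-1}X'Z$ should have nonzero rows only in the positions indexed by $\nu_1 = 1$ and $\nu_2 = 2$.

```python
import numpy as np

# K = 2 multinomials with R = 3 response levels each (hypothetical layout).
Z = np.kron(np.eye(2), np.ones((3, 1)))          # direct sum of 1_R, 6 x 2
levels = np.tile(np.eye(3)[:, 1:], (2, 1))       # indicators for response levels 2 and 3
X = np.hstack([Z, levels])                       # 6 x 4 design, full column rank

coef = np.linalg.solve(X.T @ X, X.T @ Z)         # (X'X)^{-1} X'Z, a 4 x 2 matrix

# Lemma 2.4.1 predicts coef = J W with J = [e_1, e_2]; here W is the identity.
J = np.eye(4)[:, :2]
W = np.eye(2)
```

Rows 3 and 4 of `coef` correspond to the parameters that are not fixed by design and should vanish.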
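Before stating the next important theorem, let us write $\Lambda$ in another
way. Assuming that (A1) holds, $\Lambda$ can be written as
$$\Lambda = \begin{pmatrix} \Lambda^{11} & \Lambda^{12} \\ \Lambda^{21} & \Lambda^{22} \end{pmatrix}, \qquad (2.4.8)$$
where
$$\Lambda^{ij} = (X_i'X_i)^{-1}X_i'C_i\Big(\oplus_{k=1}^K \frac{1_{m_i}}{\sqrt{n_k}}\Big)\Big(\oplus_{k=1}^K \frac{1_{m_j}'}{\sqrt{n_k}}\Big)C_j'X_j(X_j'X_j)^{-1}.$$
Now, if $C_i$ is a contrast matrix, by assumption (A2), we can write
$$(X_i'X_i)^{-1}X_i'C_i\Big(\oplus_{k=1}^K \frac{1_{m_i}}{\sqrt{n_k}}\Big) = 0 = J^{(i)}W^{(i)}, \qquad (2.4.9)$$
where $J^{(i)}$ can arbitrarily be chosen to be equal to $X_i'$ and so $W^{(i)} = 0$. On
the other hand, if $C_i = I_{q_iK}$ then we have by (A3) that $M(X_i) \supseteq M(\oplus 1_{m_i})$.
Therefore, we can invoke the result of Lemma 2.4.1 by setting $Z = \oplus_{k=1}^K 1_{m_i}/\sqrt{n_k}$.
Since $M(X_i) \supseteq M(\oplus 1_{m_i}) = M(Z)$, the conditions for the lemma are satisfied.
Let $X_{i*} = [X_{\nu_1^{(i)}}, \ldots, X_{\nu_{T_i}^{(i)}}]$ be the $Kq_i \times T_i$ ($K \le T_i \le p_i$) submatrix of $X_i$
that has columns that form a minimal spanning subset for $M(Z) = M(\oplus 1_{m_i})$.
By Lemma 2.4.1,
$$\exists\, W^{(i)} \in R^{T_i \times K} \ni (X_i'X_i)^{-1}X_i'\Big(\oplus_{k=1}^K \frac{1_{m_i}}{\sqrt{n_k}}\Big) = J^{(i)}W^{(i)}. \qquad (2.4.10)$$
Here, $J^{(i)} = [e_{\nu_1^{(i)}}, \ldots, e_{\nu_{T_i}^{(i)}}]$, where the $T_i$ elementary vectors correspond to the
columns $\{X_{\nu_1^{(i)}}, \ldots, X_{\nu_{T_i}^{(i)}}\}$ of $X_i$ that form a minimal spanning subset for the
range space of $\oplus 1_{m_i}$, i.e. the columns span a set that contains the range
space of $\oplus 1_{m_i}$ and any smaller set of columns will not span a set containing
the range space of $\oplus 1_{m_i}$.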
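It follows that the matrices $\Lambda^{ij}$ of (2.4.8) can be written as
$$\Lambda^{ij} = J^{(i)}W^{(i)}W^{(j)\prime}J^{(j)\prime}, \qquad (2.4.11)$$
where
$$J^{(i)} = \begin{cases} [e_{\nu_1^{(i)}}, \ldots, e_{\nu_{T_i}^{(i)}}], & \text{if } C_i = I_{q_iK} \\ X_i', & \text{otherwise} \end{cases} \qquad (2.4.12)$$
and
$$W^{(i)} = \begin{cases} W^{(i)} \text{ of (2.4.10)}, & \text{if } C_i = I_{q_iK} \\ 0, & \text{otherwise.} \end{cases} \qquad (2.4.13)$$
We now state a theorem of substantive importance.
Theorem 2.4.2 Suppose that assumptions (A1), (A2), and (A3) hold. For
r = 1, 2, if $C_r$ is the identity matrix then let $\{\nu_1^{(r)}, \ldots, \nu_{T_r}^{(r)}\}$ be the set of
indices that index those columns of $X_r$ that form a minimal spanning subset
for $M(\oplus 1_{m_r})$. Then it follows that the relationship between the asymptotic
variances of the two estimators $\hat\beta^{(M)}$ and $\hat\beta^{(P)}$ is
$$\text{var}(\hat\beta^{(M)}) = \text{var}(\hat\beta^{(P)}) - \begin{pmatrix} \Lambda^{11} & \Lambda^{12} \\ \Lambda^{21} & \Lambda^{22} \end{pmatrix},$$
where the $p_i \times p_j$ matrix $\Lambda^{ij}$ is a zero matrix whenever at least one of $C_i$ or
$C_j$ is a contrast or zero matrix. Otherwise, if both $C_i$ and $C_j$ are identity
matrices then
$$\Lambda^{ij}_{kl} = 0 \quad\text{if}\quad (k, l) \notin \{\nu_1^{(i)}, \ldots, \nu_{T_i}^{(i)}\} \times \{\nu_1^{(j)}, \ldots, \nu_{T_j}^{(j)}\}.$$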
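Proof: Since (A1), (A2), and (A3) hold, we can rewrite $\Lambda^{ij}$ as in (2.4.11).
Now, if either $C_i$ or $C_j$ is a contrast or zero matrix, it is obvious by (2.4.9)
that $\Lambda^{ij}$ will have zero components, as stated in the theorem, since at least
one of $W^{(i)}$ or $W^{(j)}$ will be a zero matrix. On the other hand, if both $C_i$ and
$C_j$ are identity matrices, then $\Lambda^{ij}$ can be rewritten as in (2.4.11) where
$J^{(i)} = [e_{\nu_1^{(i)}}, \ldots, e_{\nu_{T_i}^{(i)}}]$, $J^{(j)} = [e_{\nu_1^{(j)}}, \ldots, e_{\nu_{T_j}^{(j)}}]$,
and the matrices $W^{(i)}$ and $W^{(j)}$ are elements of $R^{T_i \times K}$ and $R^{T_j \times K}$. Hence,
$$\Lambda^{ij} = J^{(i)}W^{ij}J^{(j)\prime},$$
where $W^{ij} = W^{(i)}W^{(j)\prime}$ is some $T_i \times T_j$ matrix. Now, since $\{e_\nu\}$ are elementary
vectors, we have that if
$$(k, l) \notin \{\nu_1^{(i)}, \ldots, \nu_{T_i}^{(i)}\} \times \{\nu_1^{(j)}, \ldots, \nu_{T_j}^{(j)}\},$$
then the component $\Lambda^{ij}_{kl} = 0$. Otherwise, if (k, l) is a member of this set, it
must be that $\Lambda^{ij}_{kl}$ is one of the elements of the matrix $W^{ij}$. This completes
the proof.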
The next two corollaries follow immediately from Theorem 2.4.2.
Corollary 2.4.2 If both $C_1$ and $C_2$ are contrast matrices then
$$\text{var}(\hat\beta^{(M)}) = \text{var}(\hat\beta^{(P)}).$$
Proof: Since both $C_1$ and $C_2$ are contrast matrices it follows that $W^{(1)}$
and $W^{(2)}$ are zero matrices. Therefore, the matrices $\Lambda^{ij}$ of the theorem are
zero matrices.
Corollary 2.4.3 Let $C_2 = 0$, $X_2 = 0$, and $C_1 = A_1 = I_s$, so that the model
(2.3.4) becomes
$$\Theta_X = \{\xi \in R^s : \xi = X\beta,\; e^{\xi'}(\oplus 1_R) = n'\},$$
i.e. a simple loglinear model with K subpopulations. Let $\{\nu_1, \ldots, \nu_T\}$ be the set
of indices that index the columns of X that form a minimal spanning subset
for $M(\oplus_1^K 1_R)$. Then
$$\text{var}(\hat\beta^{(M)}) = \text{var}(\hat\beta^{(P)}) - \Lambda,$$
where the elements of $\Lambda$ are such that
$$\Lambda_{kl} = 0, \quad\text{if}\quad (k, l) \notin \{\nu_1, \ldots, \nu_T\}^2.$$
Proof: The proof is an immediate consequence of the theorem upon
identifying $\Lambda^{11}$ of the theorem with $\Lambda$ of the corollary. The other matrices
$\Lambda^{12}$, $\Lambda^{21}$, and $\Lambda^{22}$ will be zero since $C_2 = 0$.
Corollary 2.4.3 is of practical importance and is essentially the result
shown by Palmgren (1981). In particular, if we parameterize the model in
such a way so that there is a parameter included for each of the K independent
multinomials (or K covariate levels), then the K columns of X corresponding
to these K 'fixed by design' parameters will form a basis (and hence a minimal
spanning subset) for $M(\oplus_1^K 1_R)$. Therefore, if $\beta_i$ and $\beta_j$ are not one of the
K parameters fixed by design, then $\text{cov}(\hat\beta_i^{(M)}, \hat\beta_j^{(M)}) = \text{cov}(\hat\beta_i^{(P)}, \hat\beta_j^{(P)})$. We
will illustrate the utility of the above results in the next chapter of this
dissertation.
The next section considers issues that may arise when computing the
model degrees of freedom. It also states some other miscellaneous results
with regard to the Lagrange multiplier statistic.
2.5 Miscellaneous Results
We begin this section by addressing practical issues that may arise during
nonstandard model fitting. Specifically, we will consider computing the model
and distance (or residual) degrees of freedom.
Computing model and distance degrees of freedom. Assuming the model
$[\Theta_h]$ of (2.3.5) is well defined, i.e. the u + l + K constraints are nonredundant,
we can compute the model degrees of freedom as in section 2.2. In that
section, we defined the model degrees of freedom as the number of model
parameters minus the number of independent constraints implied by the
model. Notice that in this application we have an additional l linear
constraints. The l constraints were not present in section 2.2. It follows
that the model degrees of freedom for $[\Theta_h]$ is
$$df[\Theta_h] = s - (u + l + K), \qquad (2.5.1)$$
where s is the number of cell means, u is the dimension of the null space of X',
l is the number of linear constraints, and K is the number of identifiability
constraints.
To measure model goodness of fit, we can consider estimating some
hypothetical distance between model $[\Theta_h]$ and the saturated model (u = l = 0)
$[\Theta]$. This distance, denoted $\delta[\Theta_h; \Theta]$, has degrees of freedom
$$df(\delta[\Theta_h; \Theta]) = df[\Theta] - df[\Theta_h]$$
$$= (s - K) - (s - (u + l + K)) \qquad (2.5.2)$$
$$= u + l.$$
Notice that, had we considered the product-Poisson model (2.4.2), the
distance degrees of freedom would be
$$df(\delta[\Theta_h^{(P)}; \Theta^{(P)}]) = s - (s - (u + l)) = u + l,$$
which is identical to the product-multinomial distance degrees of freedom of
(2.5.2).
We have assumed that the u + l + K constraints are nonredundant, i.e.
each constraint is not implied by the other constraints. This may not always
be the case. To illustrate, consider the model specification for example 3 of
section 2.2.2. The model $[\Theta_{MH}]$ implies that the two marginal distributions
are equal. We stated at the end of that example that the additional constraint
$\pi_{2+} - \pi_{+2} = 0$ was redundant. This can be seen since
$$\pi_{2+} - \pi_{+2} = \pi_{21} - \pi_{12} = -(\pi_{1+} - \pi_{+1}) = 0.$$
That is, the constraints of model $[\Theta_{MH}]$ imply that $\pi_{2+} - \pi_{+2}$ equals zero.
Had we blindly added this constraint, we may have incorrectly calculated
the model degrees of freedom as 1 and the distance degrees of freedom as 2.
Therefore, we must be very careful to have a set of nonredundant constraints
when computing degrees of freedom.
In practice, when models are more complicated, it may be difficult to
ascertain whether or not the model constraints are nonredundant. Fortunately,
there are two very useful results that help in this regard.
The first result is that when the constraints are redundant, the matrix G,
evaluated at some point in $\Theta_h$, is of less than full rank and is not
invertible. Therefore, in practice, if the algorithm (2.3.13) does not converge
due to G being singular, it may be due to redundant constraints, i.e. an ill
defined model. The user should investigate and possibly respecify the model
should this occur. A caveat is that due to computational roundoff error, a
singularity may not occur even when the model is ill defined because the
iterate estimates, including the final estimate, may not strictly lie in $\Theta_h$. The
next result may mitigate this problem.
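A direct numerical check for redundancy is to compute the rank of the constraint Jacobian. The sketch below assumes NumPy and revisits the marginal homogeneity example for a 2 x 2 table with K = 1; the helper name `constraint_rank` and the cell means are hypothetical. A rank smaller than the number of constraints supplied signals that the degrees of freedom would be miscounted.

```python
import numpy as np

def constraint_rank(mu, L, sum_1R):
    """Rank of the Jacobian [D(mu) (+)1_R, D(mu) L'] of the identifiability and linear constraints."""
    D = np.diag(mu)
    return np.linalg.matrix_rank(np.hstack([D @ sum_1R, D @ L.T]))

s, K = 4, 1
mu = np.array([40.0, 20.0, 20.0, 20.0])      # hypothetical means for cells 11, 12, 21, 22
sum_1R = np.ones((s, K))                     # a single multinomial

L_ok = np.array([[0.0, 1.0, -1.0, 0.0]])              # pi_1+ - pi_+1 = pi_12 - pi_21
L_bad = np.vstack([L_ok, [[0.0, -1.0, 1.0, 0.0]]])    # adds pi_2+ - pi_+2

rank_ok = constraint_rank(mu, L_ok, sum_1R)
rank_bad = constraint_rank(mu, L_bad, sum_1R)

model_df = s - rank_bad                      # based on independent constraints only
distance_df = rank_bad - K
naive_model_df = s - (L_bad.shape[0] + K)    # blind count of the constraints supplied
naive_distance_df = L_bad.shape[0]
```

Using the rank rather than the raw count of constraints recovers the degrees of freedom of the well defined model in this example.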
A result that is useful in practice is that a necessary condition for the
constraints to be nonredundant or equivalently for the model to be well
defined, is that the Lagrange multiplier statistic be invariant to choice of
U, a matrix with columns spanning the null space of X'. Evidently, if the
user fits the model several times, each time using a different U matrix, and
the Lagrange multiplier statistic varies (more so than can be explained by
roundoff error), then it must be that the model is ill defined.
Formally, this necessary condition can be stated as
Theorem 2.5.1 Let $U_1$ and $U_2$ ($U_1 \neq U_2$) be any two full column rank
matrices satisfying $U_i'X = 0$, i = 1,2. Denote the Lagrange multiplier statistic
evaluated using $U_i$ by $LM(U_i)$. If the matrix
Hi = Aâ€™)CUi, Ã©â€˜U)
is such that [Hi, e^] is of full column rank, i = 1,2, and hence the models well
defined, then
$$LM(U_1) = LM(U_2),$$
i.e. the value of the Lagrange multiplier statistic is invariant with respect to
choice of U.
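Proof: Denote the model specified in terms of $U_i$ by $[\Theta_{h_i}]$, i = 1,2. By
the definition of $U_i$ we know that the constraints implied by $[\Theta_{h_1}]$ and $[\Theta_{h_2}]$
are equivalent. Hence, the solution $\hat{\xi}$ to (2.3.8), or equivalently (2.4.3), under
either model is the same. Thus, in view of the first set of equations in (2.3.8),
any solution $\mathrm{vec}(\hat{\xi}, \hat{\lambda}_i)$ under model $[\Theta_{h_i}]$ must satisfy
$$y - e^{\hat{\xi}} = H_i(\hat{\xi})\hat{\lambda}_i, \quad i = 1,2. \qquad (2.5.3)$$
Notice that since $U_1 \neq U_2$, we have that $H_1(\hat{\xi}) \neq H_2(\hat{\xi})$ and by (2.5.3) $\hat{\lambda}_1 \neq \hat{\lambda}_2$.
Now, (2.5.3) implies that
$$H_1(\hat{\xi})\hat{\lambda}_1 = H_2(\hat{\xi})\hat{\lambda}_2. \qquad (2.5.4)$$
Also, since $H_i(\hat{\xi})$ is assumed to be of full column rank, the variance of $\hat{\lambda}_i$,
$$\mathrm{var}(\hat{\lambda}_i) = \left(H_i'(\hat{\xi}) D^{-1}(e^{\hat{\xi}}) H_i(\hat{\xi})\right)^{-1}, \qquad (2.5.5)$$
exists. Therefore, the Lagrange multiplier statistics $LM(U_i)$, which have form
$$\hat{\lambda}_i' [\mathrm{var}(\hat{\lambda}_i)]^{-1} \hat{\lambda}_i, \quad i = 1,2, \qquad (2.5.6)$$
exist. Finally, by (2.5.4)-(2.5.6), it follows that
$$LM(U_1) = \hat{\lambda}_1' [\mathrm{var}(\hat{\lambda}_1)]^{-1} \hat{\lambda}_1$$
$$= \hat{\lambda}_2' \left(H_2'(\hat{\xi}) D^{-1}(e^{\hat{\xi}}) H_2(\hat{\xi})\right) \hat{\lambda}_2$$
$$= \hat{\lambda}_2' [\mathrm{var}(\hat{\lambda}_2)]^{-1} \hat{\lambda}_2$$
$$= LM(U_2).$$
This completes the proof.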
The final result of this section states that the Lagrange multiplier
statistic is exactly the same as the Pearson chi-squared statistic whenever the
random vector Y is product-Poisson or product-multinomial and the model
satisfies assumptions (A1), (A2), and (A3).
Theorem 2.5.2 Assume that the product-multinomial model satisfies
assumptions (A1), (A2), and (A3). Let $X^2$ denote the Pearson chi-squared
statistic, i.e.
$$X^2 = (y - \hat{\mu})' D^{-1}(\hat{\mu}) (y - \hat{\mu}),$$
where $\hat{\mu}$ is the ML estimator under either of the sampling schemes
(product-multinomial or product-Poisson). It follows that the Lagrange multiplier
statistic LM is equivalent to $X^2$. That is,
$$LM = X^2.$$
Proof: By equations (2.5.3), (2.5.5), and (2.5.6) of the previous theorem's
proof and the fact that $e^{\hat{\xi}} = \hat{\mu}$, we have that
$$LM = (y - \hat{\mu})' D^{-1}(\hat{\mu}) (y - \hat{\mu}) = X^2.$$
This is what we set out to show.
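The quadratic form in the theorem reduces to the familiar sum of squared residuals over fitted values because D is diagonal. The following minimal sketch checks this numerically; the counts `y` and fitted values `mu` are hypothetical numbers chosen for illustration, not output from any fit in this dissertation.

```python
def quadratic_form(y, mu):
    """(y - mu)' D^{-1}(mu) (y - mu), where D(mu) is diag(mu)."""
    resid = [yi - mi for yi, mi in zip(y, mu)]
    d_inv = [1.0 / mi for mi in mu]  # diagonal of D^{-1}(mu)
    return sum(r * w * r for r, w in zip(resid, d_inv))


def pearson_x2(y, mu):
    """Textbook Pearson statistic: sum of (observed - fitted)^2 / fitted."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))


# Hypothetical observed counts and fitted values (same total, 100).
y = [30, 20, 25, 25]
mu = [27.5, 22.5, 27.5, 22.5]

print(quadratic_form(y, mu), pearson_x2(y, mu))
```

The two functions agree up to floating-point rounding, which is all the identity $LM = X^2$ requires once (2.5.3) and (2.5.5) are substituted into (2.5.6).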
2.6 Discussion
In this chapter, we discussed in some detail issues related to parametric
modeling. In particular, we followed the lead of Aitchison and Silvey (1958,
1960) and Silvey (1959) and described two ways of specifying models: using
constraint equations and using freedom equations. In section 2.2, distance
measures for quantifying how far apart two models are, relative to how close
they are to holding, were discussed. In particular, the power-divergence
measures (Read and Cressie, 1988) were used when the parameter spaces were
subsets of an $(s - 1)$-dimensional simplex. Estimates of these distances were
developed based on very intuitive notions. Also, a geometric interpretation
of model and residual (or distance) degrees of freedom was given.
In section 2.3, we described a general class of multivariate polytomous
(categorical) response data models. The models in this class, which satisfy
assumptions (A1), (A2), and (A3), were shown to satisfy the necessary and
sufficient conditions of Birch (1963) so that the models could be fitted using
either the product-Poisson or product-multinomial sampling assumption.
An ML fitting method was developed, using results of Aitchison and
Silvey (1958, 1960) and Haber (1985a, 1985b). The algorithm used Lagrangian
undetermined multipliers in conjunction with a modified Newton-Raphson
iterative scheme. The modification, which simplifies the method of Haber
(1985a), is to use a simpler matrix than the Hessian matrix. We replace
the Hessian matrix (of the Lagrangian objective function) by its dominant
part, which turns out to be easily inverted. Because the matrices used in the
algorithm proposed in this chapter are very large and must be inverted, this
modification is a very important one. A FORTRAN program 'mle.restraint'
has been written by the author to implement this modified algorithm.
The asymptotic behavior of the ML estimators computed under the two
sampling schemes, product-Poisson and product-multinomial, was
investigated. The method for deriving the asymptotic distributions represents a
modification to the technique of Aitchison and Silvey (1958). A comparison of
the limiting distributions of the two estimators was made in section 2.4. Some
very interesting results were obtained by studying the asymptotic behavior
in the constraint equation setting. In particular, Theorem 2.4.2 represents
a generalization of the results of Palmgren (1981). The theorem provides a
method for determining when the inferences about the freedom parameters
of a generalized loglinear model of the form $C \log A\mu = X\beta$ will be invariant
with respect to the sampling assumption. Palmgren (1981) developed some
similar results for the special case when the freedom parameters are part of
a loglinear model.
It is important to note that the asymptotic results are only valid if
the number of populations K is considered fixed and the expected counts
all get large at approximately the same rate. In particular, the asymptotic
arguments do not hold when the covariates are continuous, since the number
of populations (levels of the covariates) can theoretically run off to infinity.
The reason the arguments do not hold is that when we use the method of
Aitchison and Silvey (1958) it is required that the vector n* ldir^n converge
in probability to zero as the total number of observations gets large. This is
the case only when $n_* = \min\{n_1, \ldots, n_K\}$ goes to infinity. This drawback
could prove to be temporary. It seems reasonable to assume in many cases
that, as long as the 'information' about each parameter is increasing without
bound, the estimators will be consistent and asymptotically normally
distributed. For example, consider the logistic regression model with continuous
covariates. Although the $n_k$'s may all be 1, the ML estimators of the
regression parameters are often consistent and asymptotically normal.
Section 2.5 outlines some miscellaneous results. One result that is
important to the practicing statistician is that the Lagrange multiplier
statistic is shown to be invariant with respect to choice of the matrix U
(of $U'C \log A\mu = 0$) as long as the model is well defined. An important
implication of this result is that if one fits the model several times, each
time using a different 'U' matrix, and the Lagrange multiplier statistics
vary more so than can be explained by roundoff, then it could be that the
model is not well defined. Another interesting result is that the Lagrange
multiplier statistic is simply the Pearson chi-squared statistic $X^2$ whenever
the assumptions (A1), (A2), and (A3) are satisfied.
Theoretically the ML fitting algorithm will work for any size problem.
Practically, however, the algorithm is certainly not a model fitting panacea.
The number of parameters that must be estimated gets very large, very fast.
Consider the case where 7 raters rate the same set of objects on a 5 point
scale. Even without covariates, the number of cell probabilities that must be
estimated is $5^7 = 78,125$. It seems the ML fitting method developed in this
chapter is, at least for now, useful for moderate size problems only. It can be
used to analyze longitudinal categorical response data when the number
of measurements taken on each subject is somewhere in the neighborhood of
2 to 6. This is not to take away from the utility of this chapter's algorithm,
but rather to indicate its breadth of application. In time, with increasing
computer efficiency, much larger data sets may be fitted using this algorithm.
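The growth in the number of cell probabilities can be tabulated directly. This is a small arithmetic sketch; the function name `n_cells` and its parameter names are introduced here for illustration only.

```python
def n_cells(n_categories, n_responses, n_populations=1):
    """Number of cells when each of n_responses variables has n_categories
    levels, cross-classified with n_populations covariate profiles."""
    return n_populations * n_categories ** n_responses


# A 5 point scale with 2 to 7 responses per subject and no covariates.
for n_responses in range(2, 8):
    print(n_responses, n_cells(5, n_responses))
```

With 7 responses the count is the 78,125 quoted above, while 2 to 6 responses give between 25 and 15,625 cells.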
CHAPTER 3
SIMULTANEOUSLY MODELING THE JOINT AND MARGINAL
DISTRIBUTIONS OF MULTIVARIATE POLYTOMOUS
RESPONSE VECTORS
3.1 Introduction
Often, when given an opportunity to analyze multivariate response
data, the investigator may wish to describe both the joint and marginal
distributions simultaneously. We consider a broad class of models which
imply structure on both the joint and marginal distributions of multivariate
polytomous response vectors. To illustrate the need for such models, we
consider several settings where these models would be useful. For example,
when the multivariate responses represent repeated measures of the same
categorical response across time, one may be interested in how the marginal
distributions are changing across time and how strongly the responses are
associated. The simultaneous investigation of both joint and marginal
distributions is not restricted to the longitudinal data setting. Other examples
include the analysis of rater agreement, crossover, and social mobility data.
The common thread tying all of these data types together is that the sampling
scheme is such that the different responses are correlated. In longitudinal
studies the same subject responds on several occasions. In rater agreement
studies, raters rate the same objects. In two-period crossover studies, one
group of subjects receives the two treatments in one order and the other group
receives them in the other order. In social mobility studies, the socioeconomic
status of a father-son pair is recorded. When the responses are positively
correlated, these designs result in increased power for detecting differences
between the marginal distributions (Laird, 1991; Zeger, 1988).
This chapter considers the modeling of multivariate categorical responses
in which the same response scale is used for each response. The classes
of models used in this chapter are of the form considered in Chapter 2 of
this dissertation and hence are readily fit using the ML methods of that
chapter. In section 3.2, we give several examples that may be analyzed by
simultaneously modeling the joint and marginal distributions. We introduce
the classes of simultaneous Joint-Marginal models in section 3.3. Several
models are fitted to the data sets of section 3.2.
3.2 Product-Multinomial Sampling Model
Initially, we assume that a random sample of $n_k$ subjects is taken from
population k, k = 1,..., K. The number of populations, or covariate profiles,
K is considered to be some fixed integer. The subscript k is allowed to be
compound, i.e. the subscript k is allowed to represent a vector of subscripts
such as
$$k = (k_1, k_2, \ldots, k_v).$$
Suppose that there are T categorical responses $V^{(1)}, \ldots, V^{(T)}$ of interest
and that each response is measured on the same response scale. Let
$V_k = (V_k^{(1)}, \ldots, V_k^{(T)})'$ be the random vector of responses for population k
and $V_{ku}$, $u = 1, \ldots, n_k$, be the $n_k$ independent and identically distributed
copies of $V_k$, where $V_{ku}$ denotes the response profile for the uth randomly
chosen person within population k. Notationally we have,
$$V_{ku} \sim \text{i.i.d. } V_k, \quad u = 1, \ldots, n_k.$$
For our purposes we can assume that each response takes on values in
$\{1, 2, \ldots, d\}$ with probability one. Denote the probability that a randomly
selected subject from population k has response profile $\mathbf{i} = (i_1, \ldots, i_T)'$ by $\pi_{\mathbf{i}k}$,
i.e.
$$P(V_k = \mathbf{i}) = \pi_{\mathbf{i}k},$$
where $\mathbf{i} \in \{1, \ldots, d\} \times \cdots \times \{1, \ldots, d\}$.
The joint distribution of $V_k = (V_k^{(1)}, \ldots, V_k^{(T)})'$ is specified as $\{\pi_{\mathbf{i}k}\}$. The
marginal distributions of $V_k$ will be denoted by $\{\phi_i(t; k)\}$, $t = 1, \ldots, T$, where
$$\phi_i(t; k) = P(V_k^{(t)} = i), \quad i = 1, \ldots, d.$$
Our objective is to model simultaneously the K joint distributions
$$\{\pi_{\mathbf{i}k}\}, \quad k = 1, \ldots, K$$
and the KT marginal distributions
$$\{\phi_i(t; k)\}, \quad t = 1, \ldots, T, \quad k = 1, \ldots, K.$$
To help the reader better understand the notation, we consider the one
population bivariate case. When T = 2, the response profiles can be denoted
by $\mathbf{i} = (i_1, i_2) = (i, j)$, where $i = 1, \ldots, d$ and $j = 1, \ldots, d$. Since there is
just one population (or covariate profile) the subscript k is always 1 and is
therefore dropped. It follows that $\{\pi_{ij}\}$ is the joint distribution of $(V^{(1)}, V^{(2)})'$
and $\{\phi_i(t)\}$, t = 1,2 are the two marginal distributions. That is,
$$\pi_{ij} = P(V^{(1)} = i, V^{(2)} = j), \quad i = 1, \ldots, d, \quad j = 1, \ldots, d$$
and
$$\phi_i(t) = \begin{cases} \pi_{i+} = P(V^{(1)} = i), & \text{if } t = 1 \\ \pi_{+i} = P(V^{(2)} = i), & \text{if } t = 2 \end{cases}$$
for $i = 1, 2, \ldots, d$.
Now for each population k, consider the $d^T \times 1$ random vector of
indicators
$$Z_k = \left(I(V_k = \mathbf{i}_1), \ldots, I(V_k = \mathbf{i}_{d^T})\right)'.$$
Notice that no information about the $V_k$ is lost since $Z_k$ is a one-to-one
function of $V_k$. Also,
$$Z_k \sim \text{ind. Mult}(1, \{\pi_{\mathbf{i}k}\}), \quad k = 1, \ldots, K.$$
Therefore, since we have randomly sampled $n_k$ subjects from each of the K
populations, we have that for given k
$$Z_{k1}, Z_{k2}, \ldots, Z_{kn_k} \sim \text{i.i.d. Mult}(1, \{\pi_{\mathbf{i}k}\})$$
and hence the vector
$$Y_k = \sum_{u=1}^{n_k} Z_{ku} \sim \text{Mult}(n_k, \{\pi_{\mathbf{i}k}\})$$
is sufficient for the family of distributions $\{\pi_{\mathbf{i}k}\}$ and $\{\phi_i(t; k)\}$.
By independence across populations, the vector $\mathrm{vec}(Y_1, Y_2, \ldots, Y_K)$ is
sufficient for the joint and marginal distributions of $\mathrm{vec}(V_1, V_2, \ldots, V_K)$.
Further, the random vector $\mathrm{vec}(Y_1, Y_2, \ldots, Y_K)$ is product-multinomial, i.e.
$$Y_k = (Y_{1k}, \ldots, Y_{Rk})' \sim \text{ind. Mult}(n_k, \{\pi_{\mathbf{i}k}\}), \quad k = 1, \ldots, K,$$
where $1, \ldots, R$ represent the $R = d^T$ different response profiles.
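The passage from individual response profiles to a multinomial count vector can be sketched for a single population as follows. This is an illustration only: the lexicographic ordering of the $d^T$ profiles and the five response profiles are assumptions made here, and the function names are hypothetical.

```python
from itertools import product


def all_profiles(d, T):
    """The d**T response profiles, in lexicographic order (an assumed order)."""
    return list(product(range(1, d + 1), repeat=T))


def indicator_vector(profile, profiles):
    """One subject's d**T x 1 indicator vector: a single 1, zeros elsewhere."""
    return [1 if profile == p else 0 for p in profiles]


def count_vector(responses, d, T):
    """Sum the subjects' indicator vectors to get the count vector."""
    profiles = all_profiles(d, T)
    counts = [0] * len(profiles)
    for response in responses:
        z = indicator_vector(tuple(response), profiles)
        counts = [c + zi for c, zi in zip(counts, z)]
    return counts


# Hypothetical data: 5 subjects, T = 2 responses on a d = 2 point scale.
responses = [(1, 1), (1, 2), (2, 2), (1, 1), (2, 1)]
y = count_vector(responses, d=2, T=2)
print(all_profiles(2, 2))  # [(1, 1), (1, 2), (2, 1), (2, 2)]
print(y)                   # [2, 1, 1, 1]
```

Each indicator vector has exactly one nonzero entry, so the counts sum to the number of subjects sampled.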
Evidently, $Y_{\mathbf{i}k}$ represents the number of randomly selected subjects from
population k who have response profile $\mathbf{i}$. That is, the $\{Y_{\mathbf{i}k}\}$ represent counts
resulting from a cross-classification of $N = \sum_{k=1}^{K} n_k$ subjects on T response
variables and a population variable. The data can be displayed in a $d^T \times K$
contingency table. By convention, we use lower case Roman letters to denote
realizations of random quantities. For example, $y_{\mathbf{i}k}$ represents a particular
realization of $Y_{\mathbf{i}k}$.
Consider Table 3.1, taken from Hout et al. (1987).
Table 3.1. Interest in Political Campaigns

                              1960
1956          Not Much   Somewhat   Very Much   Total
Not Much         155        116         64        335
Somewhat          91        237        171        499
Very Much         32         91        246        369
Total            278        444        481       1203

Source: Hout et al. (1987), p. 166, Table 4
Each of 1203 randomly selected subjects was asked in 1956 how
interested they were in the political campaigns. They responded on the 3-category
ordinal scale: 1 = Not Much, 2 = Somewhat, and 3 = Very Much.
Then, in 1960, each of the subjects was asked the same question and
responded on the same 3-category ordinal scale. Using the above notation,
we let $V^{(1)}$ and $V^{(2)}$ represent the responses in 1956 and 1960. Let $y_{ij}$, i, j =
1,2,3, represent the number of the N = 1203 subjects responding at level
i in 1956 and level j in 1960. Since there is just one population
of interest, we drop the population subscript altogether. Finally, for this
bivariate response example, the compound subscript $\mathbf{i}$ is replaced by ij. Table
3.1 summarizes the bivariate responses.
As another example, consider the crossover data of Ezzet and
Whitehead (1991).

Table 3.2. Crossover Data

        AB Sequence (Group 1)             BA Sequence (Group 2)
                  B                                 B
   A      1     2     3     4        A      1     2     3     4
   1     59    35     3     2        1     63    40     7     2
   2     11    27     2     1        2     13    15     2     0
   3      0     0     0     0        3      0     0     1     1
   4      1     1     0     0        4      0     0     0     0
The counts displayed in Table 3.2 are from a study conducted by 3M
Health Care Ltd. to compare the suitability of two inhalation devices (A and
B) in patients who are currently using a standard inhaler device delivering
salbutamol. Two independent groups of subjects participated. Group 1 used
device A for a week followed by device B (sequence AB). Group 2 used the
devices in reverse order (sequence BA).
The response variables $V^{(1)}$ (device A) and $V^{(2)}$ (device B) are ordinal
polytomous. Specifically, they are the self-assessment on clarity of leaflet
instructions accompanying the two devices, recorded on the ordinal four point
scale,
1 = Easy
2 = Only clear after rereading
3 = Not very clear
4 = Confusing.
For this example there are two populations of interest: Group 1 and
Group 2. Let $y_{ijk}$ represent the number of the $n_k$ subjects responding at level
i for device A and level j for device B, where $n_1 = 142$ and $n_2 = 144$. Again,
the bivariate response profiles can be denoted by $\mathbf{i} = ij$ where i, j = 1, 2, 3, 4.
The bivariate responses are summarized in Table 3.2.
3.3 Joint and Marginal Models
Two types of questions that can be posed about Table 3.1 lead to quite
distinct types of models. One question is whether the interest in the political
campaigns was different at the two times. For example, the researcher
may wish to test the hypothesis that there was more interest in the 1960
political campaign than the 1956 political campaign. An investigation into the
marginal distributions is needed to test this hypothesis. For these bivariate
response data, the marginal distributions correspond to the row and column
distributions of Table 3.1. A second question that may be asked is whether
the two responses are associated and if so, how strong is the association. To
answer these questions, we must describe the dependence displayed in the
joint distribution of Table 3.1.
The marginal models we consider will be used to investigate whether
the probability that a randomly selected subject responds at level i or lower
in 1956 is different from the probability that a randomly selected subject
responds at level i or lower in 1960. In this sense, the comparison of marginal
distributions gives a 'population averaged' description of change. That is, we
will describe how the marginal distribution changes on the whole, averaging
over the entire population. In contrast, subject-specific modeling allows us to
investigate how a randomly chosen subject's response changes from 1956 to
1960. Zeger et al. (1988) discuss at length the difference between population
average and subject-specific models.
The same types of questions may be posed about the distributions of
Table 3.2. For example, one may wish to determine whether the leaflet
instructions are perceived as clearer for one of the devices. Also, we may
be interested in whether there is a sequence effect. That is, does the order
of 'exposure' to the two devices' instruction leaflets affect the perception of
clarity? To answer these two questions we must investigate the marginal
distributions corresponding to the row and column totals of Table 3.2. Finally,
one may be interested in testing whether the association between the two
responses is the same for both sequences. We will consider modeling the joint
distributions to answer this question.
Modeling of marginal distributions is usually conducted separately
from the modeling of joint distributions. We use results from Chapter 2
of this dissertation to show that these models can be fit simultaneously
using maximum likelihood methods. Simultaneously modeling the joint and
marginal distributions leads to several advantages. It will provide a single
test for overall goodness of fit. Also, it provides improved model parsimony,
potentially resulting in better estimates than one would obtain by fitting the
models separately.
We consider four classes of simultaneous models. Let $J(S)$ represent
the class of saturated joint distribution models. These models imply no
structure on the joint distributions and therefore allow for general association
between the T responses. Similarly, let $M(S)$ be the class of marginal
models that assume no structure on the marginal distributions, i.e. $M(S)$
is the class of saturated marginal models. Denote the classes of unsaturated
models by $J(U)$ and $M(U)$. By simultaneously modeling the joint and
marginal distributions we can consider four classes of models, $J(S) \cap M(S)$,
$J(U) \cap M(S)$, $J(S) \cap M(U)$, and $J(U) \cap M(U)$. The union of these four
classes will be denoted by $J \cap M$. We let the symbol $J(M_1) \cap M(M_2)$, where
$M_1$ and $M_2$ are particular models, represent a specific model in $J \cap M$. Some
examples of $M_1$ and $M_2$ are $M_1 = QSY$, the quasi-symmetry model, and
$M_2 = MH$, the marginal homogeneity model. The two symbols S and U
will represent either the 'class' of saturated and unsaturated models or an
arbitrary model in those classes. It is always possible that the joint
distribution structure implied by the joint model $J(M_1)$ constrains the
marginal distributions in some way. In this case the
model may not be well defined in the sense of Chapter 2. We address this
issue in section 3.6.
The first class of models $J(S) \cap M(S)$ is the class of completely
unstructured or fully saturated models. These models fit the data perfectly
and are used primarily for exploratory purposes. If an estimated freedom
parameter is small relative to its standard error, the corresponding effect
may prove to be negligible. In this way, the fit of the saturated model may
suggest simpler models that may fit the data well.
The models in class $J(U) \cap M(S)$ focus on modeling the joint
distributions. No additional structure on the marginal distribution is assumed.
This class includes ordinary loglinear models for the expected cell frequencies
in the joint distributions. Fitting this simultaneous model is equivalent to
separately fitting the joint distribution model $J(U)$ in that the goodness-of-fit
statistic and joint model parameter estimates will be exactly the same. There
is, however, some benefit to fitting the simultaneous model; marginal model
parameter estimates are obtained. In general, these $J(U)$ models are not
designed to estimate effects in marginal distributions. There are exceptions.
For example, the symmetry model for the joint distribution implies that all of
the marginal distributions are equal. Bishop et al. (1975) discuss comparing
the fit of the symmetry (SY) model to the fit of the quasi-symmetry (QSY)
model to test for marginal homogeneity. Our focus will be on models that
do not imply any structure on the marginal distribution. Loglinear models
that assume no relationship among the main effect parameters satisfy this
condition.
The models in class $J(S) \cap M(U)$ are used to answer questions about the
marginal distributions. They assume no structure for the joint distribution
and hence allow for general association among the responses. Fitting a
$J(S) \cap M(U)$ model is equivalent to separately fitting the $M(U)$ model in
that the goodness-of-fit statistic and the marginal model parameter estimates
are exactly the same. A simple $M(U)$ model that is often of interest is
the marginal homogeneity (MH) model. Madansky's (1963) test of marginal
homogeneity is simply the likelihood-ratio test comparing the fit of $J(S) \cap
M(MH)$ to the saturated model $J(S) \cap M(S)$. For bivariate dichotomous
response data, an analogous test using the Lagrange multiplier statistic
(which is shown to be equal to Pearson's chi-squared statistic in Chapter
2) is McNemar's (1947) test.
In this chapter, we will focus primarily on the parsimonious models
within the class $J(U) \cap M(U)$. Often, a simple model can be found
that fits the data relatively well. Simultaneous inferences about both the
association structure and the marginal distribution structure can be made
using the model or freedom parameter estimates, or goodness-of-fit statistics.
Also, by the parsimony principle, the parameter estimates may be more
reliable than those based on less structured models. See Agresti (1990) and
Bishop et al. (1975) for a discussion of the benefits of using parsimonious
models. We can use models within this class to test such things as MH
given that QSY holds. This can be accomplished by comparing the fit of
$J(QSY) \cap M(MH)$ to the fit of $J(QSY) \cap M(S)$. More generally, we may
wish to test for MH given that some simple model $M_1$ holds for the joint
distribution.
Let $\mu_k = (\mu_{1k}, \ldots, \mu_{Rk})'$ be the vector of expected frequencies for
population k. That is,
$$\mu_{\mathbf{i}k} = n_k \pi_{\mathbf{i}k}.$$
The $RK \times 1$ vector $\mu$ is defined as $\mu = \mathrm{vec}(\mu_1, \mu_2, \ldots, \mu_K)$. For the marginal
distributions, let $\{m_i(t; k) = n_k \phi_i(t; k)\}$ represent the marginal distribution
expected cell frequencies. Cumulative marginal probabilities will be denoted
by $\eta_i(t; k)$, i.e.,
$$\eta_i(t; k) = \sum_{\nu=1}^{i} \phi_\nu(t; k), \quad i = 1, \ldots, d.$$
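To make the marginal and cumulative marginal quantities concrete, the sketch below computes their sample analogues from the counts of Table 3.1 (one population, T = 2, d = 3). These are observed proportions, not model-based estimates, and the variable names are introduced here for illustration.

```python
# Counts y_ij from Table 3.1: rows are the 1956 response, columns the 1960 one.
table = [
    [155, 116, 64],
    [91, 237, 171],
    [32, 91, 246],
]

N = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]        # 1956 margin, t = 1
col_totals = [sum(col) for col in zip(*table)]  # 1960 margin, t = 2


def cumulative(values):
    """Running sums: the i-th entry adds up entries 1 through i."""
    out, running = [], 0.0
    for v in values:
        running += v
        out.append(running)
    return out


phi_1956 = [m / N for m in row_totals]  # sample analogue of phi_i(1)
phi_1960 = [m / N for m in col_totals]  # sample analogue of phi_i(2)
eta_1956 = cumulative(phi_1956)         # sample analogue of eta_i(1)
eta_1960 = cumulative(phi_1960)         # sample analogue of eta_i(2)

print(row_totals, col_totals, N)
print([round(e, 3) for e in eta_1956], [round(e, 3) for e in eta_1960])
```

The last cumulative proportion in each margin is 1 by construction, so only the first d - 1 values carry information.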
We consider models in the following classes:
$$J: \; C_1 \log A_1 \mu = X_1 \beta_1 \;\text{ or }\; L_1 \mu = X_1 \beta_1$$
$$M: \; C_2 \log A_2 \mu = X_2 \beta_2 \;\text{ or }\; L_2 \mu = X_2 \beta_2. \qquad (3.3.1)$$
The matrices $C_1$ and $C_2$ are either identity, contrast (rows sum to zero),
or zero matrices. The model matrices $X_1$ and $X_2$ are assumed to be of full
column rank. We refer to the parameters in vectors $\beta_1$ and $\beta_2$ as freedom
parameters, whereas the components of the parameter vector $\mu$ will be called
model parameters.
Evidently, the class of models $J \cap M$ of (3.3.1) is very broad. Permissible
models for the joint distributions include simple loglinear models as well as
models for log odds ratios using individual cells (e.g. local odds ratios)
or groupings of cells (e.g. global odds ratios which are cross-product
ratios of quadrant probabilities, cf. Dale, 1986). The marginal models of
class M can be loglinear or corresponding logit models (such as adjacent
categories or baseline-categories logit models) or they can be other types of
multinomial response models, such as cumulative or continuation-ratio logit
models (Agresti, 1990). The second form for each model in (3.3.1) allows for
linear probability or mean response models (Grizzle et al., 1969). All of the
models in $J \cap M$ can be fit using the methods of Chapter 2. We illustrate the
usefulness of these models by way of example.
3.4 Numerical Examples
Example 1. We begin by simultaneously modeling the joint and marginal
distributions for Table 3.1. Recall that response variable $V^{(1)}$ represents a
randomly chosen subject's response to the political interest question in 1956
and $V^{(2)}$ is a randomly chosen subject's response to the political interest
question in 1960. Some candidate models for the joint distribution of
$(V^{(1)}, V^{(2)})$ include the following:
$$J(I): \; \log \mu_{ij} = \alpha + \alpha_i^{V(1)} + \alpha_j^{V(2)}$$
$$J(QSY): \; \log \mu_{ij} = \alpha + \alpha_i^{V(1)} + \alpha_j^{V(2)} + \alpha_{ij}^{V(1)V(2)}, \quad (\alpha_{ij}^{V(1)V(2)} = \alpha_{ji}^{V(1)V(2)})$$
$$J(L \times L): \; \log \mu_{ij} = \alpha + \alpha_i^{V(1)} + \alpha_j^{V(2)} + \theta u_i v_j$$
$$J(L \times L + D): \; \log \mu_{ij} = \alpha + \alpha_i^{V(1)} + \alpha_j^{V(2)} + \theta u_i v_j + \delta I(i = j)$$
$$J(S): \; \log \mu_{ij} = \alpha + \alpha_i^{V(1)} + \alpha_j^{V(2)} + \alpha_{ij}^{V(1)V(2)}$$
where I = independence, QSY = quasi-symmetry, $L \times L$ = linear-by-linear
association, and $L \times L + D$ also adds a main-diagonal parameter. The
latter two models recognize the ordinality of the measurement scale, through
sets of monotone scores $\{u_i\}$ for $V^{(1)}$ and $\{v_j\}$ for $V^{(2)}$. The $L \times L$ form of
model fits well when underlying continuous variables have a bivariate normal
distribution (Goodman, 1981; Becker, 1989), and extra parameters for the
main diagonal can account for larger frequencies often observed there when
both dimensions have the same categories.
Candidate models for the marginal distributions of $V = (V^{(1)}, V^{(2)})$
include the following:
$$M(MH): \; \log m_i(t) = \beta + \beta_i^R + \beta_t^V$$
$$M(L \times L): \; \log m_i(t) = \beta + \beta_i^R + \beta_t^V + \beta_t^{RV} u_i$$
$$M(CU): \; \mathrm{logit}\, \eta_i(t) = \omega_i + \gamma_t$$
$$M(S): \; \log m_i(t) = \beta + \beta_i^R + \beta_t^V + \beta_{it}^{RV}$$
where CU denotes the cumulative logit and the superscript R is used to
label those parameters related to the 'level' of response. There is marginal
homogeneity if there is no association between level of response (R) and
response variable (V) (cf. Agresti, 1989). When the number of levels of
V exceeds two (i.e. T > 2) and V can be considered ordinal, rather than
assume that there are general row effects for levels of V, one could account
for the ordinality by introducing scores for the levels of V. That is, we could
replace $\beta_t^{RV} u_i$ by $\beta^{RV} u_i v_t$ in the loglinear model and replace $\gamma_t$ by $\gamma v_t$ in the
cumulative logit model. An example where we can consider V as ordinal is
when the T responses represent repeated measures over time. The T levels
of V are then naturally ordered; response at occasion 1 ($V^{(1)}$), response at
occasion 2 ($V^{(2)}$), ..., response at occasion T ($V^{(T)}$). For model identifiability,
certain parameters (or more generally, linear combinations of parameters)
were set to zero. For example, the parameter $\gamma_2$ of model M(CU) was set to
zero.
To obtain information about which simultaneous models may fit well,
we first investigate joint and marginal models separately. Table 3.3 contains
likelihood-ratio ($G^2$) and Pearson ($X^2$) goodness-of-fit statistics for several
models in the class $J(U) \cap M(S)$. The associated distance or residual degrees
of freedom are listed as well. The linear-by-linear terms used equally spaced
scores for rows and for columns.
Table 3.3. Joint Distribution Models: Goodness of Fit

Model                         df       G^2       X^2
J(S) ∩ M(S)                    0      0.00      0.00
J(QSY) ∩ M(S)                  1      0.39      0.39
J(L x L + D) ∩ M(S)            2      0.49      0.49
J(L x L) ∩ M(S)                3     18.58     18.72
J(I) ∩ M(S)                    4    245.01    253.09
Both J(QSY) and the simpler J(L x L + D) models fit well. Notice
that the independence model fits poorly as is usually the case for longitudinal
data.
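The independence row of Table 3.3 can be checked by hand, because the independence model has closed-form fitted values (row total times column total over N) and leaves the margins unconstrained. The following sketch recomputes both statistics from the counts of Table 3.1; it is not the author's FORTRAN program, and the function names are introduced here for illustration.

```python
import math

# Counts from Table 3.1 (rows: 1956 response, columns: 1960 response).
table = [
    [155, 116, 64],
    [91, 237, 171],
    [32, 91, 246],
]


def independence_fit(counts):
    """Fitted values under independence: row total * column total / N."""
    n = sum(sum(row) for row in counts)
    rows = [sum(row) for row in counts]
    cols = [sum(col) for col in zip(*counts)]
    return [[r * c / n for c in cols] for r in rows]


def pearson_x2(counts, fitted):
    return sum((y - m) ** 2 / m
               for yr, mr in zip(counts, fitted) for y, m in zip(yr, mr))


def likelihood_ratio_g2(counts, fitted):
    # Cells with a zero count contribute nothing to the sum.
    return 2 * sum(y * math.log(y / m)
                   for yr, mr in zip(counts, fitted) for y, m in zip(yr, mr)
                   if y > 0)


fitted = independence_fit(table)
g2 = likelihood_ratio_g2(table, fitted)
x2 = pearson_x2(table, fitted)
df = (len(table) - 1) * (len(table[0]) - 1)
print(df, round(g2, 2), round(x2, 2))
```

The results should land close to the 245.01 and 253.09 reported on 4 degrees of freedom; the other rows of the table require iterative fitting and are not reproduced here.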
We next fit several models in the class $J(S) \cap M(U)$. The goodness-of-fit
statistics and the associated residual degrees of freedom for these marginal
models are given in Table 3.4.
Table 3.4. Marginal Distribution Models: Goodness of Fit

Model                         df       G^2       X^2
J(S) ∩ M(CU)                   1      3.35      3.35
J(S) ∩ M(L x L)                1      4.21      4.20
J(S) ∩ M(MH)                   2     38.22     37.49
There is very strong evidence of marginal heterogeneity as measured by the
goodness-of-fit statistic for the model $J(S) \cap M(MH)$ or as measured by a
comparison of that fit with the fit of some unsaturated model that allows for
marginal heterogeneity.
Finally, we will try to find a good fitting, parsimonious model in the
class $J(U) \cap M(U)$ that simultaneously describes the joint and marginal
distributions. Since the model $J(L \times L + D)$ fits the data very well, we
will assume this structure for the joint distribution and simultaneously fit
several candidate marginal models. In section 3.5, we show that the model
$J(L \times L + D)$ belongs to a class of joint distribution models that do not imply
any structure on the marginal distribution. It therefore follows that residual
degrees of freedom for the simultaneous model $J(L \times L + D) \cap M(U)$ can be
computed as follows,
$$df_{res}[J(L \times L + D) \cap M(U)] = df_{res}[J(L \times L + D)] + df_{res}[M(U)].$$
This follows since the model is well defined in the sense of Chapter 2 and since,
for well defined models, residual degrees of freedom is simply the difference
between the number of constraints implied by the simpler model and the
number of constraints implied by the less structured model. Table 3.5 contains
the result of fitting several models in the class $J(L \times L + D) \cap M(U)$.
Table 3.5. Candidate Models in J(L x L + D) ∩ M(U): Goodness of Fit

Model                         df       G^2       X^2
J(L x L + D) ∩ M(S)            2      0.49      0.49
J(L x L + D) ∩ M(CU)           3      3.84      3.82
J(L x L + D) ∩ M(L x L)        3      4.68      4.66
J(L x L + D) ∩ M(MH)           4     38.73     38.15
The simple model $J(L \times L + D) \cap M(CU)$ fits the data very well
($G^2 = 3.84$, df = 3). This model implies that the joint and marginal
distributions simultaneously follow the models,
$$J(L \times L + D): \; \log \mu_{ij} = \alpha + \alpha_i^{V(1)} + \alpha_j^{V(2)} + \theta i j + \delta I(i = j)$$
$$M(CU): \; \mathrm{logit}\, \eta_i(t) = \omega_i + \gamma_t$$
In Table 3.6, we give the ML estimates of the freedom parameters for
this model along with their corresponding estimated standard errors.
Table 3.6. Estimates of Freedom Parameters for
Model J(L x L + D) ∩ M(CU)

Parameter              Estimate    Std. Error
$\alpha$                  0.085       0.662
$\alpha_1^{V(1)}$         2.430       0.349
$\alpha_2^{V(1)}$         1.605       0.203
$\alpha_1^{V(2)}$         1.606       0.325
$\alpha_2^{V(2)}$         1.172       0.192
$\theta$                  0.563       0.081
$\delta$                  0.355       0.084
$\omega_1$               -1.255       0.063
$\omega_2$                0.435       0.057
$\gamma_1$                0.341       0.058
To test for marginal homogeneity in the context of this model, we can use
either of two asymptotically equivalent $\chi^2(1)$ test statistics:
$$G^2 = 38.73 - 3.84 = 34.89$$
$$W^2 = \left(\frac{0.341}{0.058}\right)^2 = 34.57$$
where $W^2$ is the squared Wald statistic. The P-values for both of these tests
are less than 0.001. We conclude that there is strong evidence of marginal
heterogeneity. We need not, and should not, stop here. Since we are working
with model and freedom parameters, we can continue with other model-based
inferences. Interval estimation of certain interesting freedom parameters is
considered next.
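The two test statistics can be recomputed from the values in Tables 3.5 and 3.6. In the sketch below, the tail probability of a chi-squared variable with 1 degree of freedom is obtained from the standard normal through `math.erfc`; the helper name `chi2_sf_1df` is introduced here for illustration.

```python
import math

# Likelihood-ratio statistics from Table 3.5.
g2_mh = 38.73  # J(L x L + D) with M(MH)
g2_cu = 3.84   # J(L x L + D) with M(CU)
g2_diff = g2_mh - g2_cu

# Estimate and standard error of gamma_1 from Table 3.6.
gamma_1, se_gamma_1 = 0.341, 0.058
wald_sq = (gamma_1 / se_gamma_1) ** 2


def chi2_sf_1df(x):
    """P(chi-squared with 1 df > x) = P(|Z| > sqrt(x)) for standard normal Z."""
    return math.erfc(math.sqrt(x / 2.0))


print(round(g2_diff, 2), round(wald_sq, 2))
print(chi2_sf_1df(g2_diff) < 0.001, chi2_sf_1df(wald_sq) < 0.001)
```

Both statistics are near 35, far beyond the usual critical values for one degree of freedom.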
The interpretation of the parameter $\gamma_1$ is as follows: The odds that a
randomly selected subject would have responded at level i or less in 1956 is
$\exp(\gamma_1)$ times higher than the odds that a randomly selected subject would
have responded at level i or less in 1960. Thus, the freedom parameter $\gamma_1$
measures the departure from marginal homogeneity in that the two odds are
identical if and only if $\gamma_1 = 0$. We use the delta method to compute a 95%
confidence interval for the odds ratio $\exp(\gamma_1)$; it is [1.324, 1.488]. Thus,
based on the data at hand, we estimate that the odds that a subject would
respond at level i or less in 1956 is between 1.324 and 1.488 times higher
than the odds that a subject would respond at level i or less in 1960. There
is significant evidence of increased political interest in 1960 relative to 1956.
Next we consider the association between the two responses. The estimated
odds that the response in 1960 was 'very much' instead of 'somewhat' is
$\exp(\theta + 2\delta) = 3.57$ times higher when the response in 1956 was 'very much'
than when it was 'somewhat'. The same estimated odds ratio applies when
the response was 'somewhat' instead of 'not much'. Similarly, the estimated
odds that the response in 1960 was 'very much' instead of 'not much' is
$\exp(4\theta + 2\delta) = 19.34$ times higher when the response in 1956 was 'very much'
than when it was 'not much'. In summary, there is evidence of strong positive
association between the response in 1956 and the response in 1960 and there
is evidence that there was greater political interest in 1960 than in 1956.
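
The arithmetic behind these summaries can be re-traced with a few lines of Python. This is only a sketch for checking the reported figures: the variable names are ours, and the inputs are the rounded estimates printed in Tables 3.5 and 3.6, so small rounding differences from the original computations are possible.

```python
import math

# Estimates as printed in Table 3.6 (rounded to three decimals).
theta, delta = 0.563, 0.355      # linear-by-linear and diagonal parameters
shift, se_shift = 0.341, 0.058   # marginal shift parameter and its standard error

# Local odds ratio for adjacent categories that share a diagonal cell:
# theta * (1*1 + 2*2 - 1*2 - 2*1) + delta * 2 on the log scale.
local_or = math.exp(theta + 2 * delta)

# Odds ratio for the extreme categories ('very much' versus 'not much'):
# theta * (1 + 9 - 3 - 3) + delta * 2 on the log scale.
corner_or = math.exp(4 * theta + 2 * delta)

# Squared Wald statistic for the hypothesis of marginal homogeneity.
wald_sq = (shift / se_shift) ** 2

# Likelihood-ratio statistic: G2 for M(MH) minus G2 for M(CU), Table 3.5.
lr = 38.73 - 3.84

print(round(local_or, 2), round(corner_or, 2), round(wald_sq, 2), round(lr, 2))
```

The two odds ratios differ only in the multiplier on the linear-by-linear parameter, which is the product of the score distances between the compared rows and columns.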
Suppose we ignored the fact that the same subjects responded to the
political interest question in 1956 and 1960. If we treated the two responses as
independent, then the row and column marginal counts would be distributed
as independent multinomials with the same index N = 1203 and probability
vectors $\{\phi_i(1)\}$ and $\{\phi_i(2)\}$. Then it follows that separately fitting the
marginal model M(U) under this independence assumption is equivalent
to fitting the simultaneous model J(I) ∩ M(U). By results of Liang and
Zeger (1986), the estimates of parameters in M(U) would be consistent, even
when the responses are not truly independent. However, the estimates of
the corresponding standard errors would no longer be valid. One way to see
that we are losing information by incorrectly assuming independence is by
comparing the likelihood-ratio statistic for testing MH assuming J(I) holds
to the likelihood-ratio statistic for testing MH assuming that J(L × L + D)
holds. The former is $G^2 = 268.33 - 247.74 = 20.59$ and the latter is $G^2 = 34.89$.
Both of these values would be compared to a tabled $\chi^2(1)$ value. Evidently,
by accounting for the dependence between the responses we have greater
evidence of marginal heterogeneity. Another way of illustrating the effect
of wrongly assuming independence between the responses is by looking at
the freedom parameter estimates and their estimated standard errors for
different models. Table 3.7 contains estimates of $\gamma_1$ and the corresponding
standard error estimate under three different models of interest. Notice that
the standard errors are similar when one uses either the saturated or the
diagonal parameter model for the joint distribution.
Table 3.7. Freedom Parameter Estimates and Standard Errors

Model                      df   $\hat\gamma_1$   se($\hat\gamma_1$)
J(S) ∩ M(CU)                1      0.342            0.058
J(L × L + D) ∩ M(CU)        3      0.341            0.058
J(I) ∩ M(CU)                5      0.343            0.076
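
To make the comparison of the two likelihood-ratio statistics concrete, the sketch below converts each to an upper-tail chi-square(1) probability using only the standard library. The helper name `chi2_1_upper_tail` is ours; it relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
import math

def chi2_1_upper_tail(x):
    """P(X > x) for X distributed chi-square with 1 degree of freedom."""
    # If Z is standard normal, P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))

g2_indep = 268.33 - 247.74   # MH test assuming J(I) holds
g2_dep = 38.73 - 3.84        # MH test assuming J(L x L + D) holds

for label, g2 in [("J(I)", g2_indep), ("J(L x L + D)", g2_dep)]:
    print(f"{label}: G2 = {g2:.2f}, p = {chi2_1_upper_tail(g2):.2e}")
```

Both tail probabilities are far below 0.001, and the one based on the dependence model is the smaller of the two, in line with the remark that accounting for the dependence yields stronger evidence of marginal heterogeneity.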
We have shown that there may be problems with assuming too much
structure on the joint distribution; for example, unreasonably assuming
independence. Similarly, we should be concerned with assuming too little
structure on the joint distribution. In this case, too many freedom parameters
require estimation and the overall fit may be unreliable. A good model is one
that fits the data at hand relatively well and is robust to the white noise
present in the data generation. That is, a good fitting model with model
parameter estimates that change very little for different realizations of the
random data vector, is considered a good model. For example, the saturated
model fits perfectly but has parameter estimates that may change greatly
for different realizations. In this sense the saturated model may not be a
good one; it may be unreliable. When we ignore the association structure by
separately fitting marginal models, we are tacitly using the saturated model
for the joint distribution. Table 3.8 illustrates why we should search for a
good fitting, parsimonious model. Note that the standard errors of expected
cell frequency estimates are inflated when we assume a saturated model for
the joint distribution. The more parsimonious model J(L × L + D) ∩ M(CU)
fits as well as the less structured model J(S) ∩ M(CU), yet it is more reliable
in the sense described above.
Table 3.8. Estimated Cell Means and Standard Errors
for Models J(S) ∩ M(CU) and J(L × L + D) ∩ M(CU)

     J(S) ∩ M(CU)                J(L × L + D) ∩ M(CU)
$\hat\mu_{ij}$   se($\hat\mu_{ij}$)    $\hat\mu_{ij}$   se($\hat\mu_{ij}$)
   152.79          11.49            154.28          10.56
   127.00           8.80            123.08           6.56
    64.87           7.82             66.98           7.16
    82.74           7.53             83.25           4.89
   237.30          13.80            237.30          13.80
   159.53           9.95            159.00           8.12
    31.41           5.52             29.37           4.10
    99.14           8.44            103.05           6.28
   248.23          13.98            246.70          13.16
Example 2. We continue with the crossover data example of section 3.2.
Denote the set of 18 local odds ratios by $\{\tau_{ijk}\}$, where
$$\tau_{ijk} = \frac{\pi_{ijk}\,\pi_{i+1,j+1,k}}{\pi_{i,j+1,k}\,\pi_{i+1,j,k}}$$
and $\pi_{ijk}$ represents the probability that a randomly chosen subject from
Group (G) k responds at the ith level for device A ($V^{(1)}$) and the jth level for
device B ($V^{(2)}$). Recall that cumulative marginal probabilities are denoted
by
$$\eta_i(t; k) = \begin{cases} \sum_{l=1}^{i} \pi_{l+k}, & \text{if } t = 1 \text{ (device A)} \\ \sum_{l=1}^{i} \pi_{+lk}, & \text{if } t = 2 \text{ (device B)} \end{cases}$$
where i = 1,2,3,4 and k = 1,2. To elucidate, $\eta_3(2; 1)$ represents the
probability that a randomly chosen subject from Group 1 will respond at
level 3 or lower for device B ($V^{(2)}$).
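Some possible models for the joint distributions of $(V_k^{(1)}, V_k^{(2)})'$, k = 1,2
include the following:
$$\begin{aligned}
J(S):&\quad \log \mu_{ijk} = \alpha_{ijk} \\
J(V^{(1)}G, V^{(2)}G, V^{(1)}V^{(2)}):&\quad \log \mu_{ijk} = \alpha + \alpha_i^{V^{(1)}} + \alpha_j^{V^{(2)}} + \alpha_k^{G} + \alpha_{ik}^{V^{(1)}G} + \alpha_{jk}^{V^{(2)}G} + \alpha_{ij}^{V^{(1)}V^{(2)}} \\
J(L \times L):&\quad \log \mu_{ijk} = \alpha + \alpha_i^{V^{(1)}} + \alpha_j^{V^{(2)}} + \alpha_k^{G} + \alpha_{ik}^{V^{(1)}G} + \alpha_{jk}^{V^{(2)}G} + \alpha^{V^{(1)}V^{(2)}} u_i v_j \\
J(V^{(1)}G, V^{(2)}G):&\quad \log \mu_{ijk} = \alpha + \alpha_i^{V^{(1)}} + \alpha_j^{V^{(2)}} + \alpha_k^{G} + \alpha_{ik}^{V^{(1)}G} + \alpha_{jk}^{V^{(2)}G} \\
J(V^{(1)}, V^{(2)}, G):&\quad \log \mu_{ijk} = \alpha + \alpha_i^{V^{(1)}} + \alpha_j^{V^{(2)}} + \alpha_k^{G} \\
J(UA(G)):&\quad \log \tau_{ijk} = \omega + \theta_k \\
J(UA):&\quad \log \tau_{ijk} = \omega
\end{aligned}$$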
where J(S) is a fully saturated model, $J(V^{(1)}G, V^{(2)}G, V^{(1)}V^{(2)})$ assumes
no three-factor interaction, J(L × L) implies that there is no three-factor
interaction and that the association between the ordinal responses can
be accounted for by including a linear-by-linear association parameter,
$J(V^{(1)}, V^{(2)}, G)$ is the mutual independence model, and $J(V^{(1)}G, V^{(2)}G)$
implies that $V^{(1)}$ and $V^{(2)}$ are conditionally independent given G. The model
J(UA(G)) implies uniform association within levels of G, and J(UA) is the
simple model that assumes this uniform association is the same for both levels
of G. When the row and column scores $\{u_i\}$ and $\{v_j\}$ are equally spaced,
models J(L × L) and J(UA) are equivalent. It is shown in section 3.6 that
model $J(V^{(1)}, V^{(2)}, G)$ implies that the marginal distributions of $(V^{(1)}, V^{(2)})'$
do not depend on G. When this happens, the simultaneous model will be ill
defined whenever the marginal model constrains the marginal distributions to
be equal across levels of G. We will not consider this particular model for this
reason. The rest of the models do not imply any structure on the marginal
distributions. Also, notice that simultaneously fitting $J(V^{(1)}G, V^{(2)}G)$ and
some marginal model M(U) is equivalent to separately fitting M(U) when
the row and column marginal counts are treated as independent multinomials
within each level of G.
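The marginal models we fitted include the following cumulative logit
models:
$$\begin{aligned}
M(S):&\quad \operatorname{logit}(\eta_i(t; k)) = \beta_{itk} \\
M(VG):&\quad \operatorname{logit}(\eta_i(t; k)) = \beta_i + \beta_t^{V} + \beta_k^{G} + \beta_{tk}^{VG} \\
M(V, G):&\quad \operatorname{logit}(\eta_i(t; k)) = \beta_i + \beta_t^{V} + \beta_k^{G} \\
M(V):&\quad \operatorname{logit}(\eta_i(t; k)) = \beta_i + \beta_t^{V} \\
M(1):&\quad \operatorname{logit}(\eta_i(t; k)) = \beta_i
\end{aligned}$$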
where M(S) is the saturated model and M(VG) is the proportional-odds
cumulative-logit model for the marginal probabilities that allows for otherwise
general association between the response variable V, the group or population
variable G, and the response 'level' R. In the literature on crossover designs,
a second-order interaction among V, G, and R is said to be a 'carryover' effect.
The model M(V, G) implies that there is no second-order interaction among
the variables V, G, and R, i.e. the model implies that there is no carryover
effect. The model M(V) implies that there is no G effect, i.e. no sequence
effect. Finally, the simple model M(1) implies that there is no V or G effect.
To make these models identifiable, we place the following restrictions on the
freedom parameters.
$$\beta_1^{V} = -\beta_2^{V} = \beta^{V}$$
$$\beta_1^{G} = -\beta_2^{G} = \beta^{G}$$
$$\beta_{tk}^{VG} = \begin{cases} \beta^{VG}, & \text{if } t + k = 3 \\ 0, & \text{otherwise} \end{cases}$$
With this parameterization, $\beta^{V}$, $\beta^{G}$, and $\beta^{VG}$ measure device, sequence, and
carryover effects, respectively.
Table 3.9 displays the goodness-of-fit statistics and their associated
degrees of freedom for several simultaneous models. The L × L model used
the equally spaced row and column scores $\{u_i = i\}$ and $\{v_j = j\}$.

Table 3.9. Crossover Data Models: Goodness of Fit

Model                                              df    $G^2$    $X^2$
J(S) ∩ M(S)                                         0     0        0
J(S) ∩ M(VG)                                        6    10.55     6.91
J(UA) ∩ M(S)                                       17    17.36     9.19
$J(V^{(1)}G, V^{(2)}G, V^{(1)}V^{(2)})$ ∩ M(VG)    15    14.28    10.65
J(L × L) ∩ M(VG)                                   23    28.52    27.00
$J(V^{(1)}G, V^{(2)}G)$ ∩ M(VG)                    24    37.92    58.77
J(UA(G)) ∩ M(VG)                                   22    28.45    26.11
J(UA) ∩ M(VG)                                      23    28.52    27.00
J(UA) ∩ M(V, G)                                    24    29.97    29.64
J(UA) ∩ M(V)                                       25    31.05    30.32
J(UA) ∩ M(1)                                       26    70.51    64.87
Evidently the parsimonious model J(UA) ∩ M(V) fits the data very well.
This model implies that there is no period or carryover effect and that the
uniform association structure is the same for each sequence group. There is
evidence of a significant device effect ($G^2 = 70.51 - 31.05 = 39.46$, df = 1).
We will proceed to describe this device effect. The freedom parameter ML
estimates and their corresponding standard error estimates are given in Table
3.10.
Table 3.10. Freedom Parameter ML Estimates
for Model J(UA) ∩ M(V)

Parameter      Estimate   Std. Error
$\omega$         0.469      0.148
$\beta_1$        0.542      0.096
$\beta_2$        3.189      0.219
$\beta_3$        4.360      0.375
$\beta^{V}$      0.511      0.082
These estimates also indicate that there is a significant device effect; the
Wald statistic which is based on 1 degree of freedom takes on the value of
$W^2 = (\hat\beta^{V} / \mathrm{se}(\hat\beta^{V}))^2 = 38.8$. The magnitude of the device effect can be estimated
using $\hat\beta^{V}$. Specifically, the odds of responding j + 1 or higher for device B is
estimated to be $e^{2\hat\beta^{V}} = 2.78$ times higher than the odds for device A. Using
the delta method, an approximate 95% confidence interval for this odds ratio
is (1.87, 3.69). Since the higher responses correspond to less perceived clarity
of the instructional leaflet, we conclude that there is evidence suggesting a
significant improvement of device A over device B in terms of perceived clarity
of instructions. We can describe the association between the two responses
using $\hat\omega$. For either sequence group, the odds of responding at level i instead
of i + 1 for device A is estimated to be exp(0.469) = 1.6 times higher when the
response for device B was j rather than j + 1. This holds for each i and j. In
summary, there is a moderate positive association between the two responses,
the strength of association being the same for both sequence groups. There
also is significant evidence of increased perceived clarity for device A over
device B.
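
The Wald statistic and the delta-method interval for the device effect can be retraced as follows. This is a sketch under stated assumptions: the inputs are the rounded values printed in Table 3.10, the variable names are ours, and 1.96 is used as the normal quantile, so the endpoints agree with the reported interval only up to rounding.

```python
import math

beta_v, se_beta_v = 0.511, 0.082   # device effect and standard error, Table 3.10

# Squared Wald statistic on 1 degree of freedom.
wald_sq = (beta_v / se_beta_v) ** 2

# Estimated odds ratio comparing device B with device A.
odds_ratio = math.exp(2 * beta_v)

# Delta method: for g(b) = exp(2b) the derivative is g'(b) = 2 exp(2b),
# so se(g(b_hat)) is approximately |g'(b_hat)| * se(b_hat).
se_or = 2 * odds_ratio * se_beta_v

z = 1.96   # approximate normal quantile for a 95% interval
ci = (odds_ratio - z * se_or, odds_ratio + z * se_or)

print(round(wald_sq, 1), round(odds_ratio, 2), tuple(round(c, 2) for c in ci))
```

Because the interval is built on the odds-ratio scale, it is symmetric about the point estimate; an interval built on the log scale and then exponentiated would not be.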
3.5. Product-Multinomial Versus Product-Poisson
Estimators: An Application
In this section and in section 3.6, we explore some of the more practical
aspects of model fitting for categorical data. In this section we will illustrate,
by way of example, how to determine when inferences based on freedom
parameters will be the same under both sampling assumptions, product-
multinomial and product-Poisson. The method of determination is a direct
consequence of Theorem 2.4.2. In section 3.6, we address, at least partially,
the issue of whether or not the model is well defined. Closely related to this
is the computation of residual and model degrees of freedom.
Consider the data taken from the Harvard Study of Air Pollution and
Health. The data, displayed in Table 3.11, can be found in Agresti (1990,
p.414); they were supplied by Dr. James Ware.
Table 3.11. Children's Respiratory Illness Data

                               No Maternal Smoking    Maternal Smoking
Child's Respiratory Illness          Age 10                Age 10
Age 7    Age 8    Age 9           No      Yes           No      Yes
No       No       No             237       10          118        6
No       No       Yes             15        4            8        2
No       Yes      No              16        2           11        1
No       Yes      Yes              7        3            6        4
Yes      No       No              24        3            7        3
Yes      No       Yes              3        2            3        1
Yes      Yes      No               6        2            4        2
Yes      Yes      Yes              5       11            4        7

Source: Agresti (1990, p.414), supplied by Dr. James Ware
The two groups of children (those with smoking mothers and those with
nonsmoking mothers) were followed for four years, from age 7 to age 10. At
each occasion, each child was tested for respiratory illness. The response
vector for the kth (k = 1,2) group of children is $V_k = (V_k^{(1)}, V_k^{(2)}, V_k^{(3)}, V_k^{(4)})'$,
where response $V_k^{(t)}$ is binary; either the disease is present or it is not. Our
goal is to find a parsimonious, simultaneous model that fits the data well.
Using this model, we will be able to address questions such as "is mother's
smoking status associated with the child's respiratory illness status" or "are
the odds of having respiratory illness the same for all four years?"
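After fitting several simultaneous models, we finally settled on the
following good-fitting ($G^2 = 14.33$, df = 22) simultaneous model.
$$\begin{aligned}
J:\quad \log \mu_{ijkls} = {} & \alpha + \alpha_i^{V^{(1)}} + \alpha_j^{V^{(2)}} + \alpha_k^{V^{(3)}} + \alpha_l^{V^{(4)}} + \alpha_s^{S} \\
& + \alpha_{is}^{V^{(1)}S} + \alpha_{js}^{V^{(2)}S} + \alpha_{ks}^{V^{(3)}S} + \alpha_{ls}^{V^{(4)}S} \\
& + \alpha_{ij}^{V^{(1)}V^{(2)}} + \alpha_{ik}^{V^{(1)}V^{(3)}} + \alpha_{il}^{V^{(1)}V^{(4)}} + \alpha_{jk}^{V^{(2)}V^{(3)}} + \alpha_{jl}^{V^{(2)}V^{(4)}} + \alpha_{kl}^{V^{(3)}V^{(4)}}
\end{aligned} \qquad (3.5.1)$$
$$M:\quad \operatorname{logit} \eta_1(t; s) = \theta + \theta_t^{V}, \qquad (3.5.2)$$
where $\theta_t^{V}$ satisfies the following,
$$\theta_1^{V} = \theta_2^{V} = \theta_3^{V} = \theta^{V}; \quad \theta_4^{V} = 0$$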
This model ((3.5.1) ∩ (3.5.2)) implies that there are no three-factor
interactions among the five factors (the four responses and the covariate),
there is no significant group (Smoker) effect, and there is marginal
homogeneity among the first three times. There is an indication that the
odds of having respiratory illness are lower when the child is 10 years old. In
fact, the test statistic value used for testing marginal homogeneity across all
four times was significantly large ($G^2 = 24.29 - 14.33 = 9.96$, df = 1).
Our objective in this section is to determine which of the freedom
parameter estimates, if any, are affected by assuming the counts are product-
Poisson rather than product-multinomial. We will use Theorem 2.4.2. To
invoke the results of that theorem more directly, we will rewrite the model
using the matrix notation of Chapter 2. The model can be written as
$$C \log A\mu = X\beta,$$
[Displays of the matrices $C$, $A$, and $X$ and of the vector $\beta_1$: entries not recoverable from this copy.]
and
$$\beta_2 = (\theta, \theta^{V})'.$$
Also, the vector of expected cell counts $\mu$ is a $2 \cdot 2^4 \times 1$ vector and is
defined as
$$\mu = (\mu_{11111}, \mu_{11121}, \ldots, \mu_{22221}, \mu_{11112}, \ldots, \mu_{22222})'.$$
That is, the last subscript (corresponding to the sth group) is changing the
slowest and the other 4 subscripts are in lexicographical order.
In view of Theorem 2.4.2, we must determine, for i = 1,2, whether or
not $C_i$ is a contrast matrix. If it is not, then we must find those columns of
$X_i$ that span a set containing the range space of $\bigoplus_{k=1}^{2} 1_{m_i}$, where $q_1 = m_1 = 16$ and
$q_2 = 4 \neq m_2$. Recall that $q_i$ is the number of response functions within each
independent population for the ith model. For example, for this data set, the
second model (i = 2), which is the marginal model (3.5.2), has $q_2 = 4$ logits to
be modeled within each of the two population groups (children with smoking
mothers and children with nonsmoking mothers). As in the statement of the
theorem, we will find a minimal spanning subset.
Since matrix $C_1$ is not a contrast matrix, we wish to find the columns
of $X_1$ that span a space containing the range space of $\bigoplus_{k=1}^{2} 1_{16}$. With the
parameterization we have used, we can easily see that the first and the
sixth columns of $X_1$ span the required space. Also, $C_2$ is a contrast matrix.
Therefore, it follows by Theorem 2.4.2 that the two asymptotic variances of
the freedom parameter estimators, computed under the two different sampling
assumptions, are related as follows,
$$\operatorname{var}(\hat\beta^{(M)}) = \operatorname{var}(\hat\beta^{(P)}) - \begin{pmatrix} \Lambda^{11} & \Lambda^{12} \\ \Lambda^{21} & \Lambda^{22} \end{pmatrix},$$
where $\Lambda^{11}$ is a 16 × 16 matrix with zeroes everywhere except in rows 1 and 6
and columns 1 and 6, and all the other $\Lambda^{ij}$'s are zero matrices.
Table 3.12 displays the freedom parameter estimators and their
estimated standard errors, which were calculated under the two sampling
assumptions. Notice that only those standard errors corresponding to the
parameters $\alpha$ and $\alpha_1^{S}$ are different for the two sampling schemes. These are
the parameters that correspond to the first and sixth columns of $X_1$.
Table 3.12. Product-Multinomial versus Product-Poisson
Freedom Parameter Estimation

                                          Product-Multinomial   Product-Poisson
Parameter                       Estimate    Standard Error       Standard Error
$\alpha$                          1.67          0.216                0.228
$\alpha_1^{V^{(1)}}$              1.20          0.304                0.304
$\alpha_1^{V^{(2)}}$              1.35          0.342                0.342
$\alpha_1^{V^{(3)}}$              1.07          0.266                0.266
$\alpha_1^{V^{(4)}}$              0.39          0.288                0.288
$\alpha_1^{S}$                    0.63          0.000                0.091
$\alpha_{11}^{V^{(1)}S}$          0.00          0.000                0.000
$\alpha_{11}^{V^{(2)}S}$          0.00          0.000                0.000
$\alpha_{11}^{V^{(3)}S}$          0.00          0.000                0.000
$\alpha_{11}^{V^{(4)}S}$          0.00          0.000                0.000
$\alpha_{11}^{V^{(1)}V^{(2)}}$    0.73          0.323                0.323
$\alpha_{11}^{V^{(1)}V^{(3)}}$    1.30          0.303                0.303
$\alpha_{11}^{V^{(1)}V^{(4)}}$    1.64          0.321                0.321
$\alpha_{11}^{V^{(2)}V^{(3)}}$    1.56          0.304                0.304
$\alpha_{11}^{V^{(2)}V^{(4)}}$    0.98          0.327                0.327
$\alpha_{11}^{V^{(3)}V^{(4)}}$    0.92          0.226                0.226
$\theta$                          2.02          0.134                0.134
$\theta^{V}$                      0.38          0.126                0.126
One last remark worth mentioning is with regard to the standard error
estimates of the estimated expected cell counts $\{\hat\mu_{ijkls}\}$. The precision
estimates will be different for the two sampling schemes. In fact, the
relationship (2.4.6), viz.
$$\operatorname{var}(\hat\mu^{(M)}) = \operatorname{var}(\hat\mu^{(P)}) - \bigoplus_{k} \frac{\mu_k^{(P)} \mu_k^{(P)\prime}}{n_k},$$
allows us to determine how different the two variances will be. For example,
the estimated expected cell count for cell (1,1,1,1,1) is $\hat\mu_{11111} = 232.80$
and the standard errors are 7.029 and 14.292 corresponding to the product-
multinomial and product-Poisson sampling assumptions. The difference in
standard errors is substantial. In contrast, the estimated expected cell count
for cell (1,1,2,2,1) is $\hat\mu_{11221} = 4.09$ and the two standard errors are 1.324 and
1.342. The product-Poisson standard error estimate is only slightly inflated.
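
The two pairs of standard errors quoted above can be checked against the diagonal of relationship (2.4.6), which says that the variance of a fitted cell count drops by the squared cell mean divided by the group total when moving from product-Poisson to product-multinomial sampling. The sketch below is ours, not the original computation: the function name is hypothetical, the fitted count is plugged in for the unknown mean, and we assume that group 1 is the nonsmoking group, whose counts in Table 3.11 sum to 350.

```python
import math

def multinomial_se(mu_hat, se_poisson, n_k):
    """Product-multinomial standard error of a fitted cell count, recovered
    from the product-Poisson one via the diagonal of relationship (2.4.6)."""
    var_multinomial = se_poisson ** 2 - mu_hat ** 2 / n_k
    return math.sqrt(var_multinomial)

# Assumption: group 1 is the nonsmoking group (Table 3.11 counts sum to 350).
n_1 = 350

se_large_cell = multinomial_se(232.80, 14.292, n_1)   # cell (1,1,1,1,1)
se_small_cell = multinomial_se(4.09, 1.342, n_1)      # cell (1,1,2,2,1)

print(round(se_large_cell, 3), round(se_small_cell, 3))
```

The correction term grows with the square of the cell mean, which is why the large cell shows a substantial change while the small cell is almost unaffected.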
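Suppose that, instead of assuming the logit model (3.5.2) for the
marginal parameters, we used the equivalent loglinear model. That is, we
will modify the matrices $C_2$, $A_2$ and $X_2$, and the vector $\beta_2$, so that the logit
model is equivalently expressed as a loglinear model. Let $C_2^* = \bigoplus_{k=1}^{2} I_8$, $A_2^* = A_2$
(no modification is necessary for this example), and
$$X_2^* = \begin{pmatrix}
1 & 1 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 1 \\
1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}$$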
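With this specification, the logit model is equivalent to the loglinear
model
$$M:\quad \log m_i(t; k) = \lambda + \lambda_i^{R} + \lambda_t^{V} + \lambda_{tk}^{VS} + \lambda_{it}^{RV} + \lambda_k^{S}, \qquad (3.5.3)$$
where $\lambda_{it}^{RV}$ satisfies
$$\lambda_{11}^{RV} = \lambda_{12}^{RV} = \lambda_{13}^{RV} = \lambda^{RV}; \quad \lambda_{14}^{RV} = 0$$
and $\{m_i(t; k)\}$ is the set of expected marginal counts. That is, $m_i(t; k) =
n_k \phi_i(t; k)$. The vector $\beta_2$ is thus defined as
$$\beta_2 = (\lambda, \lambda_1^{R}, \lambda_1^{V}, \lambda_2^{V}, \lambda_3^{V}, \lambda_{11}^{VS}, \lambda_{21}^{VS}, \lambda_{31}^{VS}, \lambda^{RV}, \lambda_1^{S})'$$
Notice that the loglinear model (3.5.3) includes the VS effect. This
effect must be included so that the model is well defined. We will discuss this
further in the next section, section 3.6.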
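The matrix $C_2$ is not a contrast matrix for the loglinear representation
of the marginal model. Therefore, to determine which freedom parameter
estimators are unaffected by the sampling assumption, we must find, among
the columns of $X_2$, the minimal spanning set for $\mathcal{M}(\bigoplus_{k=1}^{2} 1_{m_2}) = \mathcal{M}(\bigoplus_{k=1}^{2} 1_8)$.
Notice that the number of response functions, within each population, for
the marginal model is now $m_2 = q_2 = 8$, not $q_2 = 4$ as it was for the logit
model. Again, with the parameterization we have chosen, we can easily see
that the first and tenth columns of $X_2$ span a set that contains the range space
of $\bigoplus_{k=1}^{2} 1_8$. Invoking Theorem 2.4.2, we have the following result. Letting the
vector $\beta$ represent the freedom parameter vector for model ((3.5.1) ∩ (3.5.3)),
$$\operatorname{var}(\hat\beta^{(M)}) = \operatorname{var}(\hat\beta^{(P)}) - \Lambda = \operatorname{var}(\hat\beta^{(P)}) - \begin{pmatrix} \Lambda^{11} & \Lambda^{12} \\ \Lambda^{21} & \Lambda^{22} \end{pmatrix},$$
where the elements of the partitioned matrix $\Lambda$ are
$$\Lambda_{kl}^{11} = \begin{cases} 0, & \text{if } (k,l) \notin \{1,6\} \times \{1,6\} \\ \neq 0, & \text{otherwise} \end{cases}$$
$$\Lambda_{kl}^{12} = \begin{cases} 0, & \text{if } (k,l) \notin \{1,6\} \times \{1,10\} \\ \neq 0, & \text{otherwise} \end{cases}$$
$$\Lambda_{kl}^{21} = \begin{cases} 0, & \text{if } (k,l) \notin \{1,10\} \times \{1,6\} \\ \neq 0, & \text{otherwise} \end{cases}$$
and
$$\Lambda_{kl}^{22} = \begin{cases} 0, & \text{if } (k,l) \notin \{1,10\} \times \{1,10\} \\ \neq 0, & \text{otherwise.} \end{cases}$$
By expressing $\beta = \operatorname{vec}(\beta_1, \beta_2)$ as $\beta = (\beta_1, \beta_2, \ldots, \beta_{26})'$, we can state the
result in another way: If $(i,j) \notin \{1,6,17,26\} \times \{1,6,17,26\}$ then $\operatorname{cov}(\hat\beta_i, \hat\beta_j)$
is the same under both sampling assumptions. If (i,j) is in the set then the
covariances may be different.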
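To illustrate, we compare the standard errors for the loglinear parameter
estimators. It happens that all of the freedom parameter estimators are the
same (see Theorem 2.4.1) and all of the standard errors are the same except
those associated with the 1st, 6th, 17th, and 26th parameters, namely $\alpha$, $\alpha_1^{S}$, $\lambda$,
and $\lambda_1^{S}$. For these four, the standard error estimates were related as follows
$$\operatorname{se}(\hat\alpha \mid \text{Poisson}) = \operatorname{se}(\hat\alpha \mid \text{multinomial}) + 0.012$$
$$\operatorname{se}(\hat\alpha_1^{S} \mid \text{Poisson}) = \operatorname{se}(\hat\alpha_1^{S} \mid \text{multinomial}) + 0.091$$
$$\operatorname{se}(\hat\lambda \mid \text{Poisson}) = \operatorname{se}(\hat\lambda \mid \text{multinomial}) + 0.016$$
$$\operatorname{se}(\hat\lambda_1^{S} \mid \text{Poisson}) = \operatorname{se}(\hat\lambda_1^{S} \mid \text{multinomial}) + 0.091.$$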
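
The claim that the first and tenth columns of the loglinear design matrix span the required space can be checked mechanically. The sketch below rebuilds the 16 × 10 matrix from a coding rule that is our reading of the displayed matrix (rows ordered by group, then occasion, then response level; last levels set to zero), so the rule itself is an assumption rather than the author's stated construction.

```python
def design_row(k, t, i):
    """One row of the marginal loglinear design matrix for group k (1 or 2),
    occasion t (1..4) and response level i (1 or 2)."""
    return [
        1,                                   # lambda (intercept)
        int(i == 1),                         # lambda^R_1
        int(t == 1), int(t == 2), int(t == 3),            # lambda^V_t
        int(k == 1 and t == 1),              # lambda^VS_11
        int(k == 1 and t == 2),              # lambda^VS_21
        int(k == 1 and t == 3),              # lambda^VS_31
        int(i == 1 and t <= 3),              # common lambda^RV for occasions 1-3
        int(k == 1),                         # lambda^S_1
    ]

X2 = [design_row(k, t, i) for k in (1, 2) for t in (1, 2, 3, 4) for i in (1, 2)]

first = [r[0] for r in X2]
tenth = [r[9] for r in X2]

# The direct sum of two 1_8 vectors has one indicator column per group.
ind_group1 = [1] * 8 + [0] * 8
ind_group2 = [0] * 8 + [1] * 8

# Column 10 is the group-1 indicator; column 1 minus column 10 is the other.
in_span = (tenth == ind_group1 and
           [a - b for a, b in zip(first, tenth)] == ind_group2)
print(in_span)
```

Both indicator columns are linear combinations of columns 1 and 10 alone, which is why only the intercept and the group main effect have standard errors that depend on the sampling assumption.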
In summary, we were able to easily determine when inferences using
certain freedom parameter estimators would be the same under both sampling
schemes. This holds for a very broad class of generalized loglinear models of
the form $C \log A\mu = X\beta$. Basically, if the matrix $C$ is a contrast matrix,
that is, both $C_1$ and $C_2$ are contrast matrices, all of the inferences are the
same. On the other hand, if, for example, $C_i$ of $C$ is an identity matrix
then we must look at the design matrix $X_i$ to determine which columns form
a minimal spanning subset for the range space of some matrix of the form
$\bigoplus_{k=1}^{K} 1_{m_i}$. When $C_i$ is an identity matrix, $m_i$ is the number of response
functions, within each population (or level of covariate), that are modeled via
$C_i \log A_i\mu = X_i\beta_i$.
3.6. Well-Defined Models and the Computation of
Residual Degrees of Freedom
We made some remarks above with regard to models being well or ill
defined. To illustrate, we use the simple example in which the joint distribution
model is J(SY) and the marginal distribution model is M(MH). We stated
that the model J(SY) ∩ M(MH) is ill defined since the constraints implied
by the symmetry model J(SY), namely that the marginal distributions are
equal, are the model constraints of M(MH). We will show that, for the
one population setting, as long as the main-effects loglinear parameters are
allowed to be arbitrary (up to freedom parameter identifiability constraints)
the joint distribution model will only imply that the expected marginal counts
satisfy the (multinomial) identifiability constraints. In all other respects the
expected marginal counts are allowed to be arbitrary positive numbers. That
is, the joint distribution model and the marginal distribution model will
not include redundant constraints and the simultaneous model will be well
defined. For this example, J(SY) restricts the main-effects parameters to
satisfy
$$\alpha_i^{V^{(1)}} = \alpha_i^{V^{(2)}}, \quad i = 1, \ldots, I.$$
Evidently, the sufficient condition for the model to be generally well defined
is not met. We also discuss sufficient conditions for a simultaneous model to
be well defined when there are covariates present.
A simultaneous model will necessarily be well defined if the following
three conditions hold: The joint distribution model must be well defined. The
marginal distribution model must be well defined. And, the joint distribution
model must only constrain the expected marginal counts to satisfy the
identifiability constraints. The first two conditions hold whenever the models
do not contain redundant and/or conflicting constraints; the identifiability
constraints being included. For example if one covariate is present, as long
as the generalized loglinear portion of the model allows for a perfect fit to
the sums of expected counts within each level of the covariate, the model will
be well defined. In what follows we consider the two response, one covariate
case to illustrate how one can identify a large class of simultaneous models
that will be well defined. The extension to arbitrary numbers of responses
and covariates is straightforward.
Suppose that A and B are two response variables. We will initially
allow the number of response categories for A and B, namely I and J, to
be different. Since this chapter deals with situations when the responses
are measured on the same scale (i.e. I = J), we will also address the
sufficient conditions for model well definedness in that case. Denote the K
level covariate by P. The following lemma identifies a large class of joint
distribution models that only imply that the expected marginal counts satisfy
the identifiability constraints. It is important to point out that we will be
referring to two types of identifiability constraints. 'Identifiability' constraints
are those constraints associated with multinomial sampling, namely that
certain sums of probabilities add up to 1. 'Freedom identifiability' constraints are
those constraints that are necessary to ensure that each freedom parameter in
the model is estimable. The identifiability constraints for $\mu$ will generically be
labelled as ident($\mu$) in this section. Similarly, let the identifiability constraints
for $m$, the vector of expected marginal counts, be denoted by ident($m$). These
constraints are implied by ident($\mu$).
Lemma 3.6.1. Let the hierarchical loglinear model (AP, BP) be specified as
either
$$\log \mu = X^*\beta^*,\ \text{ident}(\mu), \quad \text{or} \quad U^{*\prime} \log \mu = 0,\ \text{ident}(\mu).$$
Suppose that the joint distribution model $[\Theta_J]$ can be specified as either
$$\log \mu = X\beta,\ \text{ident}(\mu), \quad \text{or} \quad U' \log \mu = 0,\ \text{ident}(\mu).$$
If $[\Theta_J]$ is no more restrictive than (AP, BP) in the sense that
$$\mathcal{M}(X) \supseteq \mathcal{M}(X^*) \quad \text{or} \quad \mathcal{M}(U) \subseteq \mathcal{M}(U^*),$$
then $[\Theta_J]$ only constrains the expected marginal counts to satisfy the
identifiability constraints ident($m$).
Proof: Write the model (AP, BP) as
$$\log \mu_{ijk} = \alpha + \alpha_i^{A} + \alpha_j^{B} + \alpha_k^{P} + \alpha_{ik}^{AP} + \alpha_{jk}^{BP},$$
where without loss of generality the freedom identifiability constraints are
$$\alpha_I^{A} = \alpha_J^{B} = \alpha_K^{P} = \alpha_{Ik}^{AP} = \alpha_{iK}^{AP} = \alpha_{Jk}^{BP} = \alpha_{jK}^{BP} = 0, \quad \forall i, j, k,$$
and the identifiability constraints ident($\mu$) are
$$\sum_i \sum_j \mu_{ijk} = n_k, \quad k = 1, \ldots, K.$$
Using the identifiability constraints we can write
$$n_k = \exp(\alpha + \alpha_k^{P})\, \gamma_k^{A} \gamma_k^{B}, \quad k = 1, \ldots, K,$$
where
$$\gamma_k^{A} = \sum_{i=1}^{I} \exp(\alpha_i^{A} + \alpha_{ik}^{AP}), \qquad \gamma_k^{B} = \sum_{j=1}^{J} \exp(\alpha_j^{B} + \alpha_{jk}^{BP}).$$
Hence,
$$\alpha + \alpha_k^{P} = \log n_k - \log \gamma_k^{A} - \log \gamma_k^{B}.$$
Now all of the freedom parameters not constrained by the freedom
identifiability constraints or the identifiability constraints are completely arbitrary.
It follows that $\{\gamma_k^{A}\}$ and $\{\gamma_k^{B}\}$, which are functions of these arbitrary freedom
parameters, are also completely arbitrary.
Therefore,
$$\begin{aligned}
m_i(1, k) &= \sum_{j=1}^{J} \mu_{ijk} \\
&= \exp(\log n_k - \log \gamma_k^{A} - \log \gamma_k^{B} + \alpha_i^{A} + \alpha_{ik}^{AP})\, \gamma_k^{B} \\
&= \exp(\log n_k - \log \gamma_k^{A} + \alpha_i^{A} + \alpha_{ik}^{AP}) \\
&= \frac{n_k \exp(\alpha_i^{A} + \alpha_{ik}^{AP})}{\gamma_k^{A}}.
\end{aligned}$$
That is, this set of expected marginal counts follows a saturated multinomial
loglinear model. Similarly,
$$m_j(2, k) = \frac{n_k \exp(\alpha_j^{B} + \alpha_{jk}^{BP})}{\gamma_k^{B}}, \quad j = 1, \ldots, J, \quad k = 1, \ldots, K,$$
follow a saturated multinomial loglinear model. Since the two sets of expected
marginal counts are functions of different arbitrary parameters we have that
the entire set of expected marginal counts are constrained only to satisfy the
identifiability constraints ident($m$), viz.
$$\sum_{i=1}^{I} m_i(1, k) = n_k, \quad \text{and} \quad \sum_{j=1}^{J} m_j(2, k) = n_k, \quad k = 1, \ldots, K.$$
Now, if any joint distribution model is less restrictive in the sense stated in the
lemma, it must be that the model only constrains the expected marginal
counts to satisfy ident($m$). This is what we set out to show. □
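
The algebra in this proof can be spot-checked numerically. The sketch below draws arbitrary values for the free parameters of (AP, BP), uses the identifiability constraints to fix the remaining term within each covariate level, and compares the summed cell means with the closed form for the marginal counts. The dimensions, group totals, and seed are hypothetical, and the last-level-zero freedom constraints are not imposed because the identity holds for any parameter values.

```python
import math
import random

random.seed(0)
I, J, K = 3, 4, 2
n = [120.0, 80.0]   # hypothetical fixed totals n_k for the K covariate levels

# Arbitrary parameter values; purely illustrative.
aA = [random.gauss(0, 1) for _ in range(I)]
aB = [random.gauss(0, 1) for _ in range(J)]
aAP = [[random.gauss(0, 1) for _ in range(K)] for _ in range(I)]
aBP = [[random.gauss(0, 1) for _ in range(K)] for _ in range(J)]

max_gap = 0.0
for k in range(K):
    gA = sum(math.exp(aA[i] + aAP[i][k]) for i in range(I))
    gB = sum(math.exp(aB[j] + aBP[j][k]) for j in range(J))
    # ident(mu) determines alpha + alpha_k^P once the other terms are chosen.
    base = math.log(n[k]) - math.log(gA) - math.log(gB)
    mu = [[math.exp(base + aA[i] + aAP[i][k] + aB[j] + aBP[j][k])
           for j in range(J)] for i in range(I)]
    for i in range(I):
        summed = sum(mu[i])                                 # m_i(1, k) by summation
        closed = n[k] * math.exp(aA[i] + aAP[i][k]) / gA    # closed form in the proof
        max_gap = max(max_gap, abs(summed - closed))
    total_k = sum(map(sum, mu))                             # should equal n_k

print(max_gap < 1e-9, math.isclose(total_k, n[K - 1]))
```

Changing the seed or the dimensions leaves the agreement intact, which reflects the fact that the row margins depend only on the A-related parameters and the fixed totals.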
As a special case, suppose that the covariate P has just one level,
i.e. K = 1. Lemma 3.6.1 tells us that a sufficient condition for the joint
distribution model to only constrain the expected marginal counts to satisfy
ident($m$) is that the main-effects parameters $\{\alpha_i^{A}\}$ and $\{\alpha_j^{B}\}$ be arbitrary up
to the freedom identifiability constraints. In fact, for the case I = J, in view
of the proof of the lemma, if we constrained the main-effects parameters to
satisfy
$$\alpha_i^{A} = \alpha_i^{B}, \quad i = 1, \ldots, I,$$
then expected marginal counts would be constrained to satisfy marginal
homogeneity. Another generalization of Lemma 3.6.1 involves the situation
when there is more than one covariate. If there was more than one covariate,
say P and Q, then the joint distribution model should be no more restrictive
than the hierarchical loglinear model (APQ, BPQ) for the conclusion of
Lemma 3.6.1 to hold.
Since most reasonable joint distribution models will be well defined we
assume this to be the case and hence are left to show that the marginal
distribution model is well defined. To show this, we simply must show
that the generalized loglinear or linear marginal model constraints and
the identifiability constraints ident($m$) (which are implied by ident($\mu$)) are
independent. We will initially assume that I need not equal J. Let the factors
$R_1$ and $R_2$ represent the level of response to factors A and B. That is, $R_1$
is an I level factor and $R_2$ is a J level factor. A simple loglinear model for
the expected marginal counts can be written as $((R_1, P), (R_2, P))$. What this
means is that the expected marginal counts satisfy
$$\begin{aligned}
\log m_i(1, k) &= \beta^{1} + \beta_i^{R_1} + \beta_k^{1P}, \quad i = 1, \ldots, I, \quad k = 1, \ldots, K \\
\log m_j(2, k) &= \beta^{2} + \beta_j^{R_2} + \beta_k^{2P}, \quad j = 1, \ldots, J, \quad k = 1, \ldots, K \\
\beta_I^{R_1} &= \beta_J^{R_2} = \beta_K^{1P} = \beta_K^{2P} = 0, \quad \text{ident}(m).
\end{aligned} \qquad (3.6.1)$$
Suppose now that I = J. As before, let the factor R represent the
common levels of response for both response factors A and B. Also, the
factor V will again be defined to be the response variable factor. For this
example, V is a two-level factor taking on the values 1, corresponding to
the 'first' response A, and 2, corresponding to the 'second' response B. For
longitudinal data, V is referred to as the 'Occasion' variable. Since I = J we
can consider an even simpler model. We could assume that
$$\beta_i^{R_1} = \beta_i^{R_2} = \tau_i^{R}, \quad i = 1, \ldots, I$$
and consider the model (R, VP), which can be specified as
$$\log m_i(t, k) = \tau + \tau_i^{R} + \tau_t^{V} + \tau_k^{P} + \tau_{tk}^{VP}, \quad t = 1,2, \quad i = 1, \ldots, I, \quad k = 1, \ldots, K, \qquad (3.6.2)$$
where
$$\tau + \tau_t^{V} = \beta^{t}, \quad t = 1,2$$
$$\tau_k^{P} + \tau_{tk}^{VP} = \beta_k^{tP}, \quad t = 1,2, \quad k = 1, \ldots, K,$$
the $\tau$ parameters satisfy the freedom constraints,
$$\tau_2^{V} = \tau_K^{P} = \tau_{2k}^{VP} = \tau_{tK}^{VP} = 0, \quad \forall t, k,$$
and the identifiability constraints ident($m$) are satisfied. Notice that the
model (R, VP) only makes sense when I = J; it implies marginal homogeneity
of the A and B response distributions. The following lemma provides us with
a way of identifying a large class of marginal distribution models that are well
defined. It is concerned with the case when I need not equal J. Lemma 3.6.3
applies when I = J. Each of these lemmas is easily generalizable to situations
when there are many response variables and many covariates.
Lemma 3.6.2. Suppose that the marginal distribution model $((R_1, P), (R_2, P))$
can be written as either
$$\log m = X^*\beta^*,\ \text{ident}(m) \quad \text{or} \quad U^{*\prime} \log m = 0,\ \text{ident}(m),$$
where ident($m$) are those identifiability constraints implied by ident($\mu$).
Specify the marginal distribution model $[\Theta_M]$ as
$$\log m = X\beta,\ \text{ident}(m) \quad \text{or} \quad U' \log m = 0,\ \text{ident}(m).$$
If $[\Theta_M]$ is no more restrictive than $((R_1, P), (R_2, P))$ in the sense that
$$\mathcal{M}(X) \supseteq \mathcal{M}(X^*) \quad \text{or} \quad \mathcal{M}(U) \subseteq \mathcal{M}(U^*)$$
then $[\Theta_M]$ is well defined.
Proof: By equation (3.6.1), the marginal model $((R_1, P), (R_2, P))$, without
the identifiability constraints, implies that
$$s(1, k) = \sum_{i=1}^{I} m_1(i, k) = \exp(\beta_1 + \beta_k^{1P}) \sum_{i=1}^{I} \exp(\beta_i^{R_1}) \quad \text{and}$$
$$s(2, k) = \sum_{j=1}^{J} m_2(j, k) = \exp(\beta_2 + \beta_k^{2P}) \sum_{j=1}^{J} \exp(\beta_j^{R_2}).$$
Hence, the s(t, k), which are functions of 2 * K arbitrary parameters, are
arbitrary. Since the identifiability constraints ident(m) constrain the s(t, k) to
satisfy $s(t, k) = n_k$, k = 1, ..., K, t = 1, 2 and the model constraints allow the
s(t, k) to be completely arbitrary, it follows that the model $((R_1, P), (R_2, P))$
is well defined. Moreover, any less restrictive marginal distribution model will
also be well defined. ∎
Notice that in the proof of Lemma 3.6.2 the conclusion would still hold
if the sums $\sum_{i=1}^{I} \exp(\beta_i^{R_1})$ and $\sum_{j=1}^{J} \exp(\beta_j^{R_2})$ were constrained to equal each
other. This will be important when we show that the model (R, VP) is well
defined.
Suppose now that I = J so that the model (R,VP) is reasonable. This
next lemma identifies a large class of marginal distribution models that are
well defined when the responses are measured on the same scale.
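Lemma 3.6.3 Suppose that the model (R, VP) can be written as either
$$\log m = X^*\beta^*, \ \mathrm{ident}(m) \quad \text{or} \quad U^{*\prime} \log m = 0, \ \mathrm{ident}(m).$$
Specify the marginal distribution model $[\Theta_M]$ as
$$\log m = X\beta, \ \mathrm{ident}(m) \quad \text{or} \quad U' \log m = 0, \ \mathrm{ident}(m).$$
If $[\Theta_M]$ is no more restrictive than (R, VP) in the sense that
$$\mathcal{M}(X) \supseteq \mathcal{M}(X^*) \quad \text{or} \quad \mathcal{M}(U) \subseteq \mathcal{M}(U^*),$$
then it is well defined.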
Proof: By equation (3.6.2), we can write the sums $s(t, k) = \sum_i m_t(i, k)$ as
$$s(t, k) = \exp(\tau + \tau_t^{V} + \tau_k^{P} + \tau_{tk}^{VP}) \sum_i \exp(\tau_i^{R}).$$
Notice that the first exponential term is completely arbitrary; it is a function
of 2 * K independent parameters. Therefore the set of sums {s(t, k)} is not
constrained in any way by the model constraints, $\log m = X^*\beta^*$. As in the
proof of Lemma 3.6.2, it follows that the marginal distribution model (R, VP)
is well defined. Finally, any less restrictive model will also be well defined. ∎
In view of the proof of Lemma 3.6.3, the model (R, V, P) would not be
well defined; neither would (RV,P). In order for the marginal distribution
model to be well defined the loglinear model must include the VP effect. We
can easily generalize the results of Lemma 3.6.3. Suppose that there are two
covariates, say P and Q. It can be shown that any marginal distribution
model that is no more restrictive than the loglinear model (R,VPQ) is well
defined. A marginal distribution model that is specified as a cumulative or
adjacent-categories logit model would be well defined if the model allows the
sums {s(t,k)} to be completely arbitrary.
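The role of the VP effect can be checked numerically. The following is a minimal sketch, assuming the parameterization of (3.6.2) with hypothetical parameter values; the function names `marginal_sums` and `cross_ratio` are illustrative and are not part of the author's software. Without the VP term the sums factor as s(t,k) = kappa_t * rho_k, so every cross-product ratio of the sums equals 1; with the VP term they need not.

```python
import math

def marginal_sums(tau, tau_v, tau_r, tau_p, tau_vp=None):
    """Return s[t][k] = sum_i exp(tau + tau_v[t] + tau_r[i] + tau_p[k] + tau_vp[t][k]).

    Passing tau_vp=None drops the VP interaction, i.e. the loglinear model (R, V, P).
    """
    sums = []
    for t, v in enumerate(tau_v):
        row = []
        for k, p in enumerate(tau_p):
            vp = 0.0 if tau_vp is None else tau_vp[t][k]
            row.append(sum(math.exp(tau + v + r + p + vp) for r in tau_r))
        sums.append(row)
    return sums

def cross_ratio(s, k):
    """s(1,1) s(2,k) / (s(1,k) s(2,1)); equals 1 whenever s(t,k) = kappa_t * rho_k."""
    return (s[0][0] * s[1][k]) / (s[0][k] * s[1][0])

# Hypothetical parameter values, chosen only for illustration.
tau, tau_v, tau_r, tau_p = 0.3, [0.2, -0.2], [0.5, 0.0, -0.5], [0.1, 0.4, -0.5]
tau_vp = [[0.0, 0.5, -0.5], [0.0, -0.5, 0.5]]

without_vp = marginal_sums(tau, tau_v, tau_r, tau_p)
with_vp = marginal_sums(tau, tau_v, tau_r, tau_p, tau_vp)
print([round(cross_ratio(without_vp, k), 6) for k in range(3)])
print([round(cross_ratio(with_vp, k), 6) for k in range(3)])
```

In the first case the sums are confined to a product form, so requiring s(1,k) = s(2,k) for every k is no longer independent of the model constraints; in the second case the sums are free to take any positive values.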
We now state an important theorem that addresses the issue of model
well definedness. The theorem is specifically for the case when the response
variables A and B are measured on the same scale and there is just one
covariate P. It can easily be generalized to the case of several distinct
responses and several covariates.
Theorem 3.6.1 Suppose that the joint distribution model $[\Theta_J]$ is no
more restrictive than the loglinear model (AP, BP) and that the marginal
distribution model $[\Theta_M]$ is no more restrictive than the loglinear model
(R, VP). It follows that the simultaneous model $[\Theta_J \cap \Theta_M]$ is well defined.
Proof: The proof follows immediately by Lemmas 3.6.1 and 3.6.3 and
the fact that a simultaneous model is well defined if the following conditions
hold: both the joint and marginal distribution models are well defined and
the joint distribution model only constrains the expected marginal counts to
satisfy the identifiability constraints ident(m). ∎
A few remarks about Theorem 3.6.1 are in order. Firstly, when there is
only one population of interest the sufficient condition is that the main-effects
parameters are allowed to be arbitrary. It follows that such models as
quasi-symmetry (J(QSY)) satisfy these sufficient conditions. Also, models such as
J(UA(G)) and J(UA) of the crossover example satisfy the conditions. This
follows since the model J(UA) is equivalent to the model J(L x L) which
satisfies the sufficient conditions of the theorem; it is less restrictive than
(VWG,VWG).
For the example of section 3.5, we see that had we left the effect VS
out of the marginal loglinear model (3.5.3), the marginal model would have
constrained the sums {s(t,k)} to lie in some restricted space. This can be
seen by noting that
s(tik) = Y exP (P + P? + Pt +Pk+ *Ã¼V)
t=i
= exp {P+PY + Pi) Y exp (A*v + Pi1)
i
and that neither exp(/3+/3ty +Â¡3%) or exp(A*y+/?/*) is completely arbitrary;
s(t,k) is constrained to satisfy s(t,k) = Ktpk for some Kt and pk. Therefore,
the marginal model constraints and the identifiability constraints are not
independent. That is, model ((3.5.1) ∩ (3.5.3)) would not be well defined if
the effect VS were not included in (3.5.3). This also follows directly from
Theorem 3.6.1. Using the program 'mle.restraint', an attempt was made to
fit the ill-defined model. The algorithm did not converge. In practice, this
nonconvergence could very well indicate that the model is ill defined (see
section 2.5) as it did in this example.
If a simultaneous model is well defined it follows that the residual degrees
of freedom can be computed as
$$df_{res}[\Theta_J \cap \Theta_M] = df_{res}[\Theta_J] + df_{res}[\Theta_M] \qquad (3.6.3)$$
since the model constraints are nonredundant. For example, the residual
degrees of freedom for measuring goodness of fit of the simultaneous model
J(L x L + D) ∩ M(U) used in the political interest data example can be
computed in this way. This follows since the model J(L x L + D) satisfies
the sufficient conditions of Theorem 3.6.1 and so, if M(U) is well defined, the
simultaneous model J(L x L + D) ∩ M(U) is well defined. In contrast, the
model G) used for the crossover data example, does not satisfy
the conditions of the theorem since the effects V^G and V^G are omitted.
In fact, the model implies that there is no Group (G) by Response level
(R) association. Therefore, the simultaneous model comprised of this joint
distribution model along with the marginal cumulative-logit model M(V) is
ill defined since M(V) implies the same constraints. Equation (3.6.3) does
not apply in this case.
3.7 Discussion
In this chapter, we introduced a broad class of models that imply structure
on both the joint and marginal distributions of multivariate categorical
response vectors when the response scale was the same for each response. We
showed that these models can be fit using the ML fitting method of Chapter
2. Several numerical examples were considered, illustrating the usefulness
of simultaneously modeling the joint and marginal distributions. All of the
models were fitted using the FORTRAN program 'mle.restraint', which was
developed by the author.
Model parsimony was the impetus behind this entire chapter. Our
objective was to find parsimonious models that both fit the data well and
provided us with straightforward interpretations of freedom parameters. The
models often included parameters that measured departures from independence
among the responses, as well as parameters that measured departure
from marginal homogeneity. It was shown, via a numerical example, that
parsimonious modeling may result in more efficient and reliable estimation
of both model and freedom parameters. The researcher must find a balance
between a model that is too structured and one that is not structured enough.
The author fully intends to conduct simulation studies to better understand
the importance of parsimonious modeling in this setting.
Although we provide somewhat general results regarding compatibility
of the joint and marginal models, there still is a need for more general results.
We discuss the case when the joint and marginal models can be expressed,
at least equivalently, as loglinear models. More general results are needed for
other types of models, such as cumulativelogit and linear models. For these
simultaneous models to be useful to the practitioner, a general method to
determine whether the constraints implied by the two models are independent
must be developed. The proposition in section 3.6 is a step in the right
direction.
A factor that could impede the use of this method to fit models to very
large data sets is the input requirements. The algorithm requires a substantial
amount of input. For example, consider the input required for the example
in section 3.5. The matrices C, A, and X all must be input. Although the
required input is simple to determine, much effort is expended entering
the information. An input program must be developed and implemented in
the program 'mle.restraint'.
The assessment of model goodness of fit is straightforward when using
the ML method. The (log) likelihood-ratio statistic $G^2$, the Pearson statistic
$X^2$, or the Wald statistic $W^2$ can be used for this purpose. Of interest to
the practicing statistician is the ability to assess how far wrong one can
be by assuming that the responses are independent. The test statistic used
for this purpose is simply the likelihood-ratio statistic that measures how
'far apart' the models J(I) ∩ M(U) and J(S) ∩ M(U) are. Because the
model J(I) ∩ M(U) is nested within the model J(S) ∩ M(U), one can use,
as a measure of this distance, the difference between the two likelihood-ratio
statistics, viz. $G^2[J(I) \cap M(U)] - G^2[J(S) \cap M(U)]$. More generally, there are
many assumptions one can make about the association structure among the
responses. With the methods of this dissertation, one can easily derive tests
for the validity of the assumptions.
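The difference of two likelihood-ratio statistics for nested models can be sketched as follows. This is only an illustration under stated assumptions: the counts are hypothetical, the two fitted vectors are ordinary independence and saturated fits of a 2 x 2 table standing in for the simultaneous models J(I) ∩ M(U) and J(S) ∩ M(U), and the form 2 * sum(o * log(o / f)) assumes the fitted values reproduce the observed total. The function names are illustrative.

```python
import math

def g2(observed, fitted):
    """Likelihood-ratio statistic 2 * sum(o * log(o / f)); cells with o == 0 contribute 0."""
    return 2.0 * sum(o * math.log(o / f) for o, f in zip(observed, fitted) if o > 0)

def g2_difference(observed, fitted_restricted, fitted_general):
    """G2[restricted] - G2[general] for nested models fitted to the same counts."""
    return g2(observed, fitted_restricted) - g2(observed, fitted_general)

# Hypothetical 2 x 2 table (row-major) and its fitted values under two nested models.
observed = [30.0, 10.0, 20.0, 40.0]
fitted_independence = [20.0, 20.0, 30.0, 30.0]   # row total * column total / n
fitted_saturated = observed[:]                    # reproduces the data exactly

diff = g2_difference(observed, fitted_independence, fitted_saturated)
print(round(diff, 4))
```

The difference is referred to a chi-squared distribution whose degrees of freedom equal the difference in residual degrees of freedom of the two models.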
As an alternative to longitudinal-type sampling designs, a cross-sectional
sample may be taken. Cross-sectional sampling involves sampling independent
groups of subjects for each response. The research questions posed about
the marginal distributions are such that they could be answered using
cross-sectional data. In this sense, the marginal models are 'population averaged'
models (Zeger et al., 1988). However, a cross-sectional sampling design
results in more subject variability, since nonhomogeneous subjects are used for
each response, and the detection of differences in the marginal distributions
may be clouded by these subject effects (Laird, 1991). Further, with
cross-sectional studies, we are unable to explore the association structure among
the responses. This information, regarding the association structure, may be
of substantive importance in some situations.
CHAPTER 4
LOGLINEAR MODEL FITTING WITH INCOMPLETE DATA
4.1 Introduction
We consider making inferences about loglinear model parameters when
only disjoint sums of the complete data are observed. Inferences will be made
based on the maximum likelihood estimates of the model parameters and an
estimate of precision of these estimates. As an example, consider the data in
Table 1 of Goodman (1974). Each of 216 respondents was classified as being
universalistic or particularistic when confronted by each of four situations
(A, B, C, D) of role conflict. Goodman (1974) postulated the presence of an
underlying two-level latent factor W which was not observed. Within a level
of the latent factor the manifest variables (A, B, C, D) are assumed to be
mutually independent. Thus, the latent class structure would allow us to
simply explain the relationship among the four manifest variables. In this
setting the unobservable complete data are the counts resulting from a
cross-classification on the four manifest factors and the latent factor. The data,
if observable, could be displayed in a $2^5$ contingency table. The observable
incomplete data are the counts obtained by summing over the two levels of
the latent factor, i.e. the incomplete data are disjoint sums of the complete
data. As in Goodman (1974), we assume the complete data means follow a
loglinear model which implies conditional independence among the manifest
factors (A, B, C, D) given the latent factor W. Our objectives include finding
the maximum likelihood estimates of the loglinear parameters based on
the observed data, estimating their precision, computing other model based
estimators and their standard errors, and testing model goodness of fit.
There are many ways to find the maximum likelihood estimators, each
method having its positive and negative features. For example, we could work
directly with the incomplete-data likelihood, which is usually complicated
relative to the complete-data likelihood, and use a Newton-Raphson or
Fisher-scoring algorithm. Palmgren and Ekholm (1987) and Haberman (1989) use
these methods to obtain maximum likelihood estimates and their standard
errors. We could avoid the complicated likelihood altogether and use the
Expectation-Maximization algorithm (Dempster et al., 1977). Sundberg
(1976) discusses the properties of the EM algorithm when it is used to fit
models to data coming from the regular exponential family. In section 4.2
the EM algorithm is explored in greater detail.
Unlike the other approaches, the EM algorithm is insensitive to starting
values. This is important in practice since we seldom have any idea what
a reasonable starting value is. Another positive feature, not shared by the
other methods, is that the convergence to the maximum is monotonic, i.e.
the likelihood is increased at each successive iteration. Drawbacks to the EM
algorithm are that (1) it is relatively slow and (2) an estimate of precision
of the parameter estimate is not obtained as a byproduct of the algorithm.
NR and Fisher scoring, on the other hand, are faster and, as a byproduct,
provide us with an estimate of precision. The slow convergence of the EM
algorithm can be mitigated somewhat using the acceleration methods of
Meilijson (1989) or Louis (1982). Also, increased computer efficiency has
made the slow convergence less of an issue. In section 4.3.2 we address
the second drawback of the EM algorithm by deriving an explicit form for
the observed information matrix when the complete data are independent
Poissons with means following a loglinear model. The observed information
matrix is computed upon convergence of the EM algorithm and then inverted.
The inverse will serve as the estimate of precision. In section 4.5 we explore
an iterative scheme that uses both NR and EM, exploiting each of their strong
points.
4.2 Review of the EM Algorithm
The EM algorithm is generally used in those estimation problems in
which the likelihood is complicated, rendering it difficult or impractical to
maximize, but in which the data can be viewed as being some function
of complete data for which, had they been observed, evaluation of the maximum
likelihood estimates would be simple. Unlike many other statistical
root-finding algorithms, the EM algorithm does not require explicit calculation of
the score vector or its derivative. It uses much simpler functions.
The EM algorithm is by no means a new method for finding maximum
likelihood estimates. Goodman (1974) essentially used it. Sundberg (1976)
discusses it at length when used in the exponential family case. Dempster,
Laird, and Rubin (1977) provide us with a review of the method as well as
some of its properties. Subsequent work with the EM algorithm has been
primarily devoted to improving the speed of its convergence (Louis, 1982;
Meilijson, 1989).
4.2.1 General Results
Suppose the complete data X has density $f_X(x; \theta)$ with respect to some
measure. Let Y = Y(X), a function of the complete data, denote the observed
data. It follows that the density of Y is
$$f_Y(y; \theta) = \int_R f_X(x; \theta)\, d\nu(x), \qquad (4.2.1)$$
where $R = \{x : Y(x) = y\}$ and $\nu$ is some appropriate measure. Since Y is a
function of X, the joint density of X and Y can be written as
$$f_{X,Y}(x, y; \theta) = f_X(x; \theta) \cdot I_R(x).$$
Hence, the conditional density of X given Y = y is
$$f_{X|Y}(x; y, \theta) = \frac{f_{X,Y}(x, y; \theta)}{f_Y(y; \theta)} = \frac{f_X(x; \theta) \cdot I_R(x)}{f_Y(y; \theta)}. \qquad (4.2.2)$$
Therefore, the log likelihood based on Y is
$$\ell_Y(\theta; y) = \log f_Y(y; \theta) = \log f_X(x; \theta) - \log f_{X|Y}(x; y, \theta).$$
Taking the conditional expectation (given Y = y) at $\theta_0$ gives us
$$\ell_Y(\theta; y) = E(\ell_Y(\theta; y) \mid Y = y, \theta_0)$$
$$= E(\ell_X(\theta; X) \mid Y = y, \theta_0) - E(\ell_{X|Y}(\theta, y; X) \mid Y = y, \theta_0)$$
$$= Q(\theta, \theta_0, y) - H(\theta, \theta_0, y).$$
The EM algorithm is defined by
$$Q(\theta^{(m+1)}, \theta^{(m)}, y) = \max_{\theta} Q(\theta, \theta^{(m)}, y), \qquad (4.2.3)$$
i.e. given the mth iterate estimate of $\theta$, $\theta^{(m)}$, the next iterate is that value of
$\theta$ that maximizes $Q(\theta, \theta^{(m)}, y)$.
The following properties of the EM algorithm are verified in the
appendix. The proofs follow from Dempster et al. (1977) and Louis (1982).
In what follows S denotes a score vector and I an information matrix.
Property 1:
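If $\theta^{(m)}$ and $\theta^{(m+1)}$ are the mth and (m+1)st iterate estimates obtained via
the EM algorithm then
$$\ell_Y(\theta^{(m+1)}; y) \ge \ell_Y(\theta^{(m)}; y),$$
i.e. the log likelihood is increased at each successive iteration.
Property 2:
The sequence of EM iterates $\{\theta^{(m)}, m \ge 1\}$ satisfies, whenever $\theta^{(m)}$
converges to $\theta^{(\infty)}$ as $m \to \infty$,
$$\frac{\partial}{\partial\theta} \ell_Y(\theta; y)\Big|_{\theta^{(\infty)}} = S_Y(\theta^{(\infty)}; y) = 0,$$
i.e. the estimates converge to a zero of the score vector for Y.
Property 3:
For any $\theta_0$,
$$\frac{\partial}{\partial\theta}\left[Q(\theta, \theta_0, y)\right]\Big|_{\theta_0} = S_Y(\theta_0; y) = E(S_X(\theta_0; X) \mid Y = y, \theta_0).$$
Property 4:
For any $\theta_0$,
$$I_Y(\theta_0; y) = E(I_X(\theta_0; X) \mid Y = y, \theta_0) - \mathrm{var}(S_X(\theta_0; X) \mid Y = y, \theta_0).$$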
Briefly, property 1 implies that the incomplete-data likelihood is increased
with each successive iteration, property 2 says that the EM algorithm
can be used to find a zero of the incomplete-data score function, property 3
provides us with a way of evaluating the score function (see Meilijson, 1989),
and finally property 4 gives us an expression for the observed information
matrix based on the incomplete data. These four properties of the EM
algorithm will be explored in detail in the next section which deals with
the special case in which the complete data have distribution in the regular
exponential family.
4.2.2 Exponential Family Results
The exponential families of distributions play an important role in statistical
inference. Many data generating mechanisms can be modeled assuming
that the underlying distribution is a member of the regular exponential family.
In this section we consider properties of the regular exponential family that
are relevant to the use of the EM algorithm. Specifically, we will make use
of the results of this section, which are due primarily to Sundberg (1974), to
justify results for Poisson loglinear models with missing data.
Let the complete data vector X have density, with respect to some
measure, in the regular exponential family. That is, assume that
$$f_X(x; \beta) = a(x) \exp(T'(x)\beta - c(\beta)), \qquad (4.2.4)$$
where $T(x) = (T_1(x), T_2(x), \ldots, T_p(x))'$ and $\beta$ is a canonical parameter vector
of length p. Let $\mathcal{X} = \{x : f_X(x; \beta) > 0\}$.
Some well known properties of the regular exponential family include
1. T(X) is sufficient for $\beta$,
2. $\partial c(\beta) / \partial\beta = E_\beta(T(X))$, and
3. $\partial^2 c(\beta) / \partial\beta\,\partial\beta' = \mathrm{var}_\beta(T(X))$. (4.2.5)
These properties of (4.2.5) are shown in Lehmann (1983, pp. 29, 30). The
properties follow immediately upon repeated differentiation of $\int_{\mathcal{X}} f_X(x; \beta)\, d\nu(x)$
with respect to $\beta$. Lehmann (1983) showed that the derivative could be passed
through the integral.
Suppose that the incomplete data vector Y is a (many to one) function
of X, i.e. Y = Y(X). For notational convenience, we let t = T(x) and $I_R(x)$
represent the indicator of membership in $R = \{x : Y(x) = y\}$. It follows by
equation (4.2.2) that
$$f_{X|Y}(x; y, \beta) = \frac{f_X(x; \beta) I_R(x)}{f_Y(y; \beta)} = \frac{a(x) \exp(t'\beta - c(\beta)) \cdot I_R(x)}{\int_R a(x) \exp(t'\beta - c(\beta))\, d\nu(x)}$$
$$= a(x) \exp(t'\beta - c^*(\beta; y)) \cdot I_R(x) = a^*(x) \exp(t'\beta - c^*(\beta; y)), \qquad (4.2.6)$$
where $a^*(x) = a(x) I_R(x)$ and $c^*(\beta; y) = \log \int_R a(x) \exp(t'\beta)\, d\nu(x)$. Hence the
conditional distribution of X given Y = y is also a member of the exponential
family (Sundberg, 1974). Again by properties of the exponential family we
have
1. $\partial c^*(\beta; y) / \partial\beta = E_\beta(T(X) \mid Y = y)$, and
2. $\partial^2 c^*(\beta; y) / \partial\beta\,\partial\beta' = \mathrm{var}_\beta(T(X) \mid Y = y)$. (4.2.7)
Using (4.2.2) and (4.2.6) we can reexpress the density of Y as
$$f_Y(y; \beta) = \frac{f_X(x; \beta) I_R(x)}{f_{X|Y}(x; y, \beta)} = \frac{a(x) \exp(t'\beta - c(\beta)) \cdot I_R(x)}{a(x) \exp(t'\beta - c^*(\beta; y)) \cdot I_R(x)} = \exp(c^*(\beta; y) - c(\beta)).$$
Our objective is to maximize $f_Y(y; \beta)$ with respect to $\beta$. Or, equivalently,
we are to maximize the log likelihood
$$\ell_Y(\beta; y) = c^*(\beta; y) - c(\beta) \qquad (4.2.8)$$
with respect to $\beta$.
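For well behaved $\ell_Y(\beta; y)$ we can find the value of $\beta$, say $\hat\beta$, that
maximizes it by solving the score equations
$$S_Y(\beta; y) = \frac{\partial}{\partial\beta} \ell_Y(\beta; y) = \frac{\partial c^*(\beta; y)}{\partial\beta} - \frac{\partial c(\beta)}{\partial\beta} = 0. \qquad (4.2.9)$$
Notice that by properties given in (4.2.5) and (4.2.7), this is equivalent to
solving the equation
$$S_Y(\beta; y) = E_\beta(T(X) \mid Y = y) - E_\beta(T(X)) = 0. \qquad (4.2.10)$$
There are many ways to solve (4.2.10). One possibility is to use the following
iterative scheme:
(1) Find $E_{\beta^{(\nu)}}(T(X) \mid Y = y)$
(2) Solve for $\beta^{(\nu+1)}$ in $E_{\beta^{(\nu+1)}}(T(X)) = E_{\beta^{(\nu)}}(T(X) \mid Y = y)$ (4.2.11)
(3) If $\|\beta^{(\nu)} - \beta^{(\nu+1)}\| > \mathrm{TOL}$ then replace $\beta^{(\nu)}$ by $\beta^{(\nu+1)}$ and go to (1).
Else stop.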
We show in Appendix B that the iterative scheme (4.2.11) is simply the EM
algorithm. The convergence properties are discussed in Sundberg (1976).
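The three steps of scheme (4.2.11) can be traced on a deliberately small case. The sketch below assumes a hypothetical toy problem that is not taken from the dissertation: X1 and X2 are independent Poisson counts with common mean mu = exp(beta), and only Y = X1 is observed. Then T(X) = X1 + X2, E(T(X)) = 2 * mu and E(T(X) | Y = y) = y + mu, so step (2) has a closed form. The function name is illustrative, and y is assumed to be positive.

```python
import math

def em_common_poisson_mean(y, mu0=1.0, tol=1e-10, max_iter=1000):
    """EM for X1, X2 independent Poisson(mu), log(mu) = beta, when only Y = X1 is observed.

    Returns the final beta and the number of iterations used.
    """
    mu = mu0
    for iteration in range(1, max_iter + 1):
        conditional_t = y + mu            # step (1): E(T(X) | Y = y) at the current mu
        mu_next = conditional_t / 2.0     # step (2): solve 2 * mu_next = conditional_t
        if abs(math.log(mu_next) - math.log(mu)) <= tol:   # step (3), on the beta scale
            return math.log(mu_next), iteration
        mu = mu_next
    return math.log(mu), max_iter

beta_hat, n_iter = em_common_poisson_mean(5.0)
print(round(math.exp(beta_hat), 6), n_iter)
```

Each iteration halves the distance between mu and y, which illustrates both the monotone behavior and the linear (slow) convergence of the scheme; the fixed point mu = y is the maximum likelihood estimate based on Y alone.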
One important note with regard to the EM algorithm (4.2.11) is that if
$\ell_Y(\beta; y)$ is not so well behaved, e.g. the score vector $S_Y(\beta; y)$ has multiple
roots some of which may be associated with a minimum, then the particular
solution $\hat\beta$, obtained via the EM algorithm, will be a local maximum likelihood
estimate. This follows since the likelihood increases monotonically with each
successive EM iteration.
Upon convergence of the algorithm, we can use the negative Hessian
matrix evaluated at $\hat\beta$ to estimate the observed information matrix based on
the incomplete data. The negative Hessian is
$$I_Y(\beta; y) = -\frac{\partial^2}{\partial\beta'\,\partial\beta} \ell_Y(\beta; y) = \frac{\partial^2 c(\beta)}{\partial\beta'\,\partial\beta} - \frac{\partial^2 c^*(\beta; y)}{\partial\beta'\,\partial\beta}$$
$$= \mathrm{var}_\beta(T(X)) - \mathrm{var}_\beta(T(X) \mid Y = y) \qquad (4.2.12)$$
$$= I_X(\beta) - I_{X|Y}(\beta; y).$$
This expression for the negative Hessian was noted by Sundberg (1974).
He referred to the matrix $I_{X|Y}$ as a measure of information loss. With
regard to lost information, let us suppose the observed data Y are such
that T(X) = g(Y). That is, the sufficient statistic for $\beta$ is a function of
Y. Intuitively we would expect no loss of information since we are able to
observe the sufficient statistic and hence we expect $I_{X|Y}$ to be identically the
zero matrix. In fact, this follows since T(x) is constant on $R = \{x : Y(x) = y\}$
whenever T(x) = g(y). Hence
$$c^*(\beta; y) = \log\Big[\exp(t'\beta) \int_R a(x)\, d\nu(x)\Big] = t'\beta + \log \int_R a(x)\, d\nu(x),$$
which is linear in $\beta$. Thus
$$I_{X|Y}(\beta; y) = \frac{\partial^2 c^*(\beta; y)}{\partial\beta\,\partial\beta'} = 0.$$
In view of equation (4.2.9), instead of using the iterative scheme described
in (4.2.11), we could work directly with the incomplete data likelihood
$\ell_Y(\beta; y)$ and implement a Newton-Raphson or Fisher-scoring algorithm to find
a root to the nonlinear equation. The program NLIN described in Appendix B
can be used to this end. Notice that both $S_Y(\beta; y)$ and $I_Y(\beta; y)$ (or a numerical
approximation thereof) would need to be computed at each iteration.
Specifically, the iterative scheme can be written as
(1) Compute $\beta^{(\nu+1)} = \beta^{(\nu)} + (A_Y(\beta^{(\nu)}; y))^{-1} S_Y(\beta^{(\nu)}; y)$
(2) If $\|\beta^{(\nu)} - \beta^{(\nu+1)}\| > \mathrm{TOL}$ then replace $\beta^{(\nu)}$ by $\beta^{(\nu+1)}$ and go to (1).
Else stop. (4.2.13)
where $A_Y(\beta; y) = I_Y(\beta; y)$ if the Newton-Raphson method is used,
$A_Y(\beta; y) = E_\beta(I_Y(\beta; Y))$ if the Fisher-scoring method is used, or $A_Y(\beta; y)$ is a numerical
approximation to the observed or expected information. See section 4.5 for
details on the approximation method.
In section 4.5, we consider an iterative scheme that is a modification/combination
of the two schemes (4.2.11) and (4.2.13). The modified
algorithm for solving (4.2.10) exploits the virtues of both these iterative
schemes.
4.3 Loglinear Model Fitting with Incomplete Data
We investigate more closely the special case of incomplete Poisson data
with means following a loglinear model. The assumption that the complete
data are distributed as product Poisson, i.e. the components are independent
Poisson random variables, is not as restrictive as it seems. We use results
of Birch (1963) and Palmgren (1981) to show that maximum likelihood
inferences about the parameters that are not fixed by sample design are the
same whether the data are product Poisson or multinomial. To this end,
we derive an expression for the variance of the multinomial cell probability
estimates when the model parameters are estimated under the product
Poisson assumption.
Section 4.3.1 shows that the EM algorithm takes on a particularly simple
form when the complete data are assumed to be product Poisson with means
following a loglinear model. In section 4.3.2 we derive an explicit formula for
the observed information matrix that is based on the observable incomplete
data. Section 4.3.3 discusses inferences for multinomial loglinear models.
4.3.1 The EM Algorithm for Poisson Loglinear Models
Let $X = (X_1, X_2, \ldots, X_n)$ represent the "complete" data vector of cell
counts and suppose that
$$X_i \sim \text{indep. Poisson}(\mu_i), \quad i = 1, 2, \ldots, n,$$
where $\mu_i = \mu_i(\beta)$ satisfies the loglinear model $\log \mu(\beta) = Z\beta$. Here Z is some
n x p full rank model matrix and $\beta$ is a p x 1 parameter vector.
Suppose only certain disjoint sums of X are observable. Let
$Y = (Y_1, Y_2, \ldots, Y_m) = LX$ denote the observable (or "incomplete") data. Here
L is an m x n matrix (m < n) that satisfies the following three properties:
(1) Each element is a '0' or a '1'
(2) There is at most one '1' per column (4.3.1)
(3) There is at least one '1' per row
Properties (1) and (2) of (4.3.1) ensure that the components of Y
are independent Poisson random variables while property (3) precludes a
noninformative row of zeroes.
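The three properties in (4.3.1) are easy to verify mechanically before any fitting is attempted. The sketch below is illustrative only: the function names are hypothetical, L is assumed to be stored as a list of rows, and the second helper returns, for each column, the 1-based number of the row holding its '1' (or 0 if the column has none), the quantity written r(j) later in this chapter.

```python
def satisfies_4_3_1(L):
    """Check properties (1)-(3) of (4.3.1) for a matrix given as a list of rows."""
    n = len(L[0])
    zero_one = all(v in (0, 1) for row in L for v in row)                 # property (1)
    one_per_col = all(sum(row[j] for row in L) <= 1 for j in range(n))    # property (2)
    one_per_row = all(sum(row) >= 1 for row in L)                         # property (3)
    return zero_one and one_per_col and one_per_row

def column_rows(L):
    """Return r(j) for every column j: the 1-based row holding its '1', or 0 if none."""
    m, n = len(L), len(L[0])
    r = []
    for j in range(n):
        rows = [i + 1 for i in range(m) if L[i][j] == 1]
        if len(rows) > 1:
            raise ValueError(f"column {j + 1} has more than one '1'")
        r.append(rows[0] if rows else 0)
    return r

# Hypothetical example: cells 1 and 2 are summed, cell 3 is observed, cell 4 is never seen.
L_ok = [[1, 1, 0, 0], [0, 0, 1, 0]]
L_bad = [[1, 0], [1, 0]]     # column 1 appears in two sums, so the sums are not disjoint
print(satisfies_4_3_1(L_ok), column_rows(L_ok), satisfies_4_3_1(L_bad))
```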
Denote realizations of X and Y by x and y. The objective of this section
is to find the maximum likelihood estimate of $\beta$, denoted by $\hat\beta$, based on the
observed data. Writing the density of the complete data X as
$$f_X(x; \beta) = a(x) \exp(x'Z\beta - 1'e^{Z\beta}) \qquad (4.3.2)$$
we see that $f_X$ has form (4.2.4) and that a sufficient statistic for $\beta$ is Z'X. It
follows by (4.2.8) that Y = LX has log likelihood of the form
$$\ell_Y(\beta; y) = c^*(\beta; y) - c(\beta), \qquad (4.3.3)$$
where $c^*$ and c are functions defined in section 4.2.2. But, by properties of the
matrix L, we know that Y has a product Poisson distribution. Specifically,
$$Y_i \sim \text{ind Poisson}(L_i'\mu), \quad i = 1, \ldots, m,$$
where $L_i'$ is the ith row of L and $\mu$ is the vector of complete data means. Since
the complete data means are a function of some model parameters through
$\log(\mu) = Z\beta$, we have that $L_i'\mu = L_i' \exp(Z\beta)$. It is important to note that
$\log(L_i'\mu)$ is generally a nonlinear function of $\beta$. For this reason, the model
fitting is somewhat more complicated.
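Using the fact that Y is product Poisson, we have that the log likelihood
of Y is
$$\ell_Y(\beta; y) = \sum_{i=1}^{m} y_i \log(L_i' \exp(Z\beta)) - \sum_{i=1}^{m} L_i' \exp(Z\beta) + h(y) \qquad (4.3.4)$$
where the function h(y) is independent of the parameter $\beta$. Now, we
differentiate equation (4.3.4) with respect to $\beta$ to obtain an expression for
the score vector. It is
$$S_Y(\beta; y) = \frac{\partial}{\partial\beta} \ell_Y(\beta; y) = \sum_{i=1}^{m} \frac{y_i}{L_i'\mu} Z'D(\mu)L_i - Z'D(\mu) \sum_{i=1}^{m} L_i$$
$$= Z'D(\mu)\Big(\sum_{i=1}^{m} \frac{y_i}{L_i'\mu} L_i\Big) - Z'D(\mu)\Big(\sum_{i=1}^{m} L_i - 1_n\Big) - Z'D(\mu)1_n$$
$$= Z'D(\mu)\Big(\sum_{i=1}^{m} \frac{y_i}{L_i'\mu} L_i + 1_n - \sum_{i=1}^{m} L_i\Big) - Z'\mu$$
$$= Z'\Big[\mu \cdot \Big(1_n - L'1_m + L'\Big(\frac{y}{L\mu}\Big)\Big)\Big] - Z'\mu, \qquad (4.3.5)$$
where in the last line '$\cdot$' and the division represent componentwise operators. As
shown in section 4.2.2, equation (4.2.10), the score function of the incomplete
data can alternatively be expressed as
$$S_Y(\beta; y) = E_\beta(Z'X \mid Y = y) - E_\beta(Z'X)$$
since $\partial c^*(\beta; y)/\partial\beta = E_\beta(Z'X \mid Y = y)$ and $\partial c(\beta)/\partial\beta = E_\beta(Z'X)$. Evidently,
since $E_\beta(Z'X) = Z'\mu$, it must be that
$$E_\beta(Z'X \mid Y = y) = Z'\Big[\mu \cdot \Big(1_n - L'1_m + L'\Big(\frac{y}{L\mu}\Big)\Big)\Big]. \qquad (4.3.6)$$
Therefore, the EM algorithm is simply
(1) Find $Z'[\mu(\beta^{(\nu)}) \cdot (1_n - L'1_m + L'(y / L\mu(\beta^{(\nu)})))]$
(2) Solve for $\beta^{(\nu+1)}$ in $Z'\mu(\beta^{(\nu+1)}) = Z'[\mu(\beta^{(\nu)}) \cdot (1_n - L'1_m + L'(y / L\mu(\beta^{(\nu)})))]$
(3) If $\|\beta^{(\nu)} - \beta^{(\nu+1)}\| > \mathrm{TOL}$ then replace $\beta^{(\nu)}$ by $\beta^{(\nu+1)}$ and go to (1).
Else stop. (4.3.7)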
In practice, finding a reasonable starting value for $\beta$, say $\beta^{(0)}$, is very
difficult. However, in view of the first step of the EM algorithm, we need
only be concerned with an initial estimate of $\mu$. Notice that if $\mu^{(0)}$, the initial
guess for $\mu$, satisfies $L\mu^{(0)} = y$ then we have tacitly chosen an appropriate $\beta^{(0)}$
to start the algorithm. This is so since we can go to step (2) of the algorithm
and calculate $\beta^{(1)}$, the solution to the equation. In fact,
$$\beta^{(0)} = (Z'Z)^{-1} Z' \log \mu^{(0)}.$$
Thus, the EM algorithm has the nice feature that, not only is it
insensitive to starting values, but also reasonable starting values are simple
to find. A FORTRAN program 'em.loglin' has been written to actually
implement the EM algorithm as defined in (4.3.7).
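The steps of (4.3.7) can be sketched in a few lines. What follows is not the author's FORTRAN program; it is a minimal Python sketch under stated assumptions: all function names are hypothetical, step (2) is solved by Newton-Raphson on the complete-data Poisson likelihood (the source does not say how 'em.loglin' solves it), convergence is judged by the largest absolute change in beta, and the data are a hypothetical 2 x 2 independence model in which the first two cells are observed only as a sum.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def means(Z, beta):
    """mu(beta) = exp(Z beta), computed row by row."""
    return [math.exp(sum(z * b for z, b in zip(row, beta))) for row in Z]

def e_step(L, y, mu):
    """Step (1): pseudo-counts mu . (1_n - L'1_m + L'(y / L mu))."""
    Lmu = [sum(l * m for l, m in zip(row, mu)) for row in L]
    pseudo = []
    for j, mu_j in enumerate(mu):
        rows = [i for i in range(len(L)) if L[i][j] == 1]
        pseudo.append(mu_j * y[rows[0]] / Lmu[rows[0]] if rows else mu_j)
    return pseudo

def m_step(Z, pseudo, beta, tol=1e-10, max_iter=100):
    """Step (2): Newton-Raphson solution of Z'mu(beta) = Z'pseudo."""
    n, p = len(Z), len(beta)
    for _ in range(max_iter):
        mu = means(Z, beta)
        score = [sum(Z[i][a] * (pseudo[i] - mu[i]) for i in range(n)) for a in range(p)]
        info = [[sum(Z[i][a] * mu[i] * Z[i][b] for i in range(n)) for b in range(p)]
                for a in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) <= tol:
            break
    return beta

def em_loglin(Z, L, y, beta, tol=1e-9, max_iter=5000):
    """Iterate steps (1)-(3) until successive beta differ by at most tol."""
    for _ in range(max_iter):
        new_beta = m_step(Z, e_step(L, y, means(Z, beta)), beta)
        if max(abs(a - b) for a, b in zip(new_beta, beta)) <= tol:
            return new_beta
        beta = new_beta
    return beta

# Hypothetical 2 x 2 independence model; cells (1,1) and (1,2) are only seen as a sum.
Z = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
L = [[1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
y = [30.0, 20.0, 10.0]
beta_hat = em_loglin(Z, L, y, [math.log(10.0), 0.0, 0.0])
fitted_sums = [sum(l * m for l, m in zip(row, means(Z, beta_hat))) for row in L]
print([round(b, 6) for b in beta_hat], [round(s, 6) for s in fitted_sums])
```

In this example the model has as many parameters as there are observed sums, so the fitted sums reproduce y and the estimates can be checked by hand: exp(beta_0) = 20, beta_1 = 0 and exp(beta_2) = 0.5.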
4.3.2 Obtaining the Observed Information Matrix
In the previous section we showed how one can obtain maximum likelihood
estimates of the loglinear model parameters using the EM algorithm. In
this section we address the major drawback of the EM algorithm: an estimate
of the precision of these ML estimates is not obtained as a byproduct of the
algorithm. We derive an explicit formula for the observed information matrix
associated with the loglinear model parameters that is intuitively appealing
and simple to evaluate. Upon convergence of the EM algorithm the observed
information matrix is evaluated at the ML estimates and inverted. The
inverse information can be used as an estimate of precision (Agresti, 1990).
Notice that in this section we consider using the observed information rather
than the expected information. We follow the lead of Efron and Hinkley
(1978), who build a case for the preferred use of the observed information.
If desired, however, the expected information can easily be computed since
the observed information is shown to be a linear function of the incomplete
data.
Recall the setup in the previous section. Only disjoint sums of a complete
data vector X, which is product Poisson, are observable. The complete data
means are assumed to follow a loglinear model of the form $\log \mu = Z\beta$. By
expression (4.2.12) of section 4.2.2, we see that the observed information
matrix based on the incomplete data has form
$$I_Y(\beta; y) = \mathrm{var}_\beta(Z'X) - \mathrm{var}_\beta(Z'X \mid Y = y)$$
$$= I_X(\beta) - (\text{Adjustment Matrix}).$$
This expression is intuitively appealing since $\mathrm{var}_\beta(Z'X) = Z'D(\mu)Z$ is the
expected (and observed) information for $\beta$ treating the complete data, X, as
if it were observed, while $\mathrm{var}_\beta(Z'X \mid Y = y)$ is an adjustment that is necessary
because we do not actually observe X but only LX = Y. The amount of
information lost by observing only Y is determined by the conditional variance
of the sufficient statistic Z'X given LX = y.
At this point, one could derive a formula for the adjustment matrix as
in a technical report by the author. The gist of the argument was that the
distribution of X | Y = y has a simple form when Y represents disjoint sums of
the independent Poisson random variables X and so the conditional variance
of X (or Z'X) given Y = y can easily be computed. A main result of that
technical report was that
$$\mathrm{cov}(X_a, X_b \mid LX = y) = \Big\{\mu_a \cdot I_{(r(a)=0)} + y_{r(a)} \frac{\mu_a}{L_{r(a)}'\mu}\Big(1 - \frac{\mu_a}{L_{r(a)}'\mu}\Big) \cdot I_{(r(a)>0)}\Big\} \cdot I_{(a=b)}$$
$$+ \Big\{-y_{r(a)} \frac{\mu_a \mu_b}{(L_{r(a)}'\mu)^2} \cdot I_{(r(a)=r(b))}\Big\} \cdot I_{(a \ne b)}, \qquad (4.3.8)$$
where $I_{(\cdot)}$ is the indicator function and r(j) is defined as follows:
$$r(j) = \begin{cases} \text{row number in which '1' occurs, for the jth column of L,} \\ 0, \quad \text{if a '1' does not occur in column j of L.} \end{cases}$$
In this dissertation we will take a different approach. The explicit
form of the score statistic for Y was derived in equation (4.3.5). Since the
observed information is nothing but the negative Hessian of the log likelihood,
we can obtain an explicit formula for the observed information by simply
differentiating the negative of the score function with respect to $\beta'$. The
appendix shows how one arrives at
$$I_Y(\beta; y) = -\frac{\partial}{\partial\beta'} S_Y(\beta; y)$$
$$= Z'D(\mu)\Big(\sum_{i=1}^{m} \frac{y_i}{(L_i'\mu)^2} L_i L_i'\Big)D(\mu)Z - Z'D\Big(\sum_{i=1}^{m} \frac{y_i - L_i'\mu}{L_i'\mu} L_i\Big)D(\mu)Z$$
$$= Z'D(\mu)L'D\Big(\frac{y}{(L\mu)^2}\Big)LD(\mu)Z - Z'D\Big(L'\Big(\frac{y - L\mu}{L\mu}\Big)\Big)D(\mu)Z, \qquad (4.3.9)$$
where the division in the last line is componentwise.
Notice that the expected information matrix has a particularly simple
form, viz.
$$E_\beta(I_Y(\beta; Y)) = Z'D(\mu)\Big(\sum_{i=1}^{m} \frac{1}{L_i'\mu} L_i L_i'\Big)D(\mu)Z = Z'D(\mu)L'D^{-1}(L\mu)LD(\mu)Z. \qquad (4.3.10)$$
Using either of the results in (4.3.8) or (4.3.9), we derive an explicit form
for the observed information matrix for several examples.
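The observed information can be evaluated directly from Z, L, y and a vector of complete-data means. The sketch below assumes the form I_Y = Z'D(mu)L'D(y/(L mu)^2)L D(mu)Z - Z'D(L'((y - L mu)/(L mu)))D(mu)Z with componentwise division, as obtained by differentiating the score of the product-Poisson log likelihood of Y; the function name and the numerical inputs are hypothetical.

```python
def observed_information(Z, L, y, mu):
    """Observed information for beta based on Y = LX, returned as a p x p list of lists."""
    m, n, p = len(L), len(Z), len(Z[0])
    Lmu = [sum(L[i][j] * mu[j] for j in range(n)) for i in range(m)]
    # v[i] = Z'D(mu)L_i, a p-vector for each observed sum
    v = [[sum(Z[j][a] * mu[j] * L[i][j] for j in range(n)) for a in range(p)]
         for i in range(m)]
    # w[j] = j-th element of L'((y - L mu) / (L mu)); zero for columns of L with no '1'
    w = [sum(L[i][j] * (y[i] - Lmu[i]) / Lmu[i] for i in range(m)) for j in range(n)]
    info = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(p):
            first = sum(y[i] / Lmu[i] ** 2 * v[i][a] * v[i][b] for i in range(m))
            second = sum(Z[j][a] * w[j] * mu[j] * Z[j][b] for j in range(n))
            info[a][b] = first - second
    return info

# Hypothetical check: cells 1 and 2 are observed individually, cell 3 is never observed.
Z_demo = [[1, 0], [1, 1], [1, 2]]
L_demo = [[1, 0, 0], [0, 1, 0]]
info_demo = observed_information(Z_demo, L_demo, [5.0, 1.0], [2.0, 3.0, 4.0])
print(info_demo)
```

In this check the result depends only on the means of the two observed cells, 2 * z_1 z_1' + 3 * z_2 z_2', and not on y; the unobserved cell contributes no information.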
Example 1: Missing Components. When certain components are
unobservable, L will be an identity matrix with rows missing. It follows
that the observed information matrix is
$$I_Y(\beta; y) = Z'D(\mu)Z - Z'D(M\mu)Z$$
where M is a diagonal matrix with jth diagonal element $(M)_{jj} = 1 - I(r(j) > 0)$.
Example 2: Latent Class Models. Suppose that counts resulting from
a cross-classification on several factors are observable and that classification
on an additional K-level latent factor is unobservable. We let the subscript i
represent a compound subscript identifying classification on observable factors
while the subscript j indexes the K latent classes. Denote the complete
data vector of cell counts by $X = (X_{11}, \ldots, X_{1K}, \ldots, X_{m1}, \ldots, X_{mK})^T = \{X_{ij}\}$
and the incomplete data by $Y = \{X_{i+}\}$. Notice that Y = LX
where $L = 1_K' \otimes I_m$. One possible latent class model assumes the means
of the unobservable complete data, say $\mu_{ij}$, follow a loglinear model that
implies conditional independence of observed factors given the latent factor
classification (Haberman, 1979). It follows that the observed information
matrix is
$$I_Y(\beta; y) = Z'D(\mu)Z - Z'\Big(\bigoplus_{i=1}^{m} V_i\Big)Z \qquad (4.3.11)$$
where each $V_i$ is the covariance of a K x 1 multinomial vector with index
$y_i = X_{i+}$ and cell probabilities $\{\mu_{ij} / \sum_{j=1}^{K} \mu_{ij},\ j = 1, \ldots, K\}$.
Example 3: Partially Classified Data Models. Consider the two factor
nonignorable nonresponse model with one supplemental margin (Little &
Rubin, section 11.6, 1987). The complete data X are counts resulting from
a cross-classification on two factors $F_1$ and $F_2$, along with a dichotomous
nonresponse indicator R. Suppose the $F_1$ classification is always known and
that R indicates whether or not the $F_2$ classification is known. To make
inferences about the classification probabilities and missing data assumptions,
Little & Rubin assume the complete data means follow a loglinear model.
Variance estimates of the loglinear parameters are easily derived since the
observed data have form Y = LX and L satisfies (4.3.1).
4.3.3 Inferences for Multinomial Loglinear Models
Previously, we assumed that the complete data were distributed as
product Poisson, i.e. the complete data components are independent Poisson
random variables. However, the sample size is often fixed by design so that
the distribution of the complete data vector may really be multinomial. This
follows since a product Poisson vector given the total is multinomial. Since
the total sample size is considered a random variable when the product
Poisson assumption is used, the assumption seems to be unreasonable.
Fortunately, Birch (1963) and Palmgren (1981) showed that maximum
likelihood inferences about all of the loglinear parameters that are not fixed by
design are the same whether one assumes the distribution is product Poisson
or multinomial. Therefore, it is general practice to assume the data are
product Poisson since the Poisson distribution is in the regular exponential
family and has an unconstrained canonical parameter. The Poisson loglinear
model is an example of a generalized linear model (McCullagh and Nelder,
1989) which makes it simple to work with.
In this section we discuss making inferences about loglinear parameters
when the sampling design is such that the total sample size is considered
fixed but the data are not completely observed, i.e. there is missing data. It
is not obvious that the results of Birch extend to the case of incomplete data.
Therefore, we provide a detailed discussion of the extension to the missing
data case.
The Setup. In the following argument we assume that the matrix L is such
that each column has at least one '1' in it. This requirement results in the
incomplete data $Y = LX$ having the same sum total as the complete data,
i.e. $1_m'Y = 1_m'LX = 1_n'X = N$. We also require the loglinear model to include
an intercept term. This intercept term will be the parameter that is fixed by
design, since the total sample size N will be considered fixed.
Full-Multinomial Sampling. Suppose that the complete data vector X has
a multinomial distribution, i.e.
\[
X = (X_1, \ldots, X_n)' \sim \text{Mult}(N, \pi(\theta)),
\]
where $N = 1_n'X$ is the fixed total sample size and $\pi(\theta) = (\pi_1(\theta), \ldots, \pi_n(\theta))'$
represents the vector of cell probabilities that satisfy $\sum_{i=1}^n \pi_i(\theta) = 1$. Since N
is considered fixed, it makes sense to write the cell means as $\mu_i(\theta) = N\pi_i(\theta)$
so that $\sum_{i=1}^n \mu_i(\theta) = N$. Assume also that the cell means $\{\mu_i(\theta)\}$ follow the
loglinear model
\[
\log \mu_i(\theta) = \alpha + x_i'\beta, \quad i = 1, \ldots, n,
\]
where $\beta$ is a p x 1 vector and $\theta = (\alpha, \beta')'$ contains the so-called loglinear
parameters.
Further, suppose that only $Y = (Y_1, \ldots, Y_m)' = LX$ is observable. The
matrix L, which is of dimension m x n (m < n), will be required to satisfy the
three conditions of (4.3.1) as well as
(4) L has at least one '1' in each column.
It follows that
\[
Y = (Y_1, \ldots, Y_m)' \sim \text{Mult}(N, L\pi(\theta)),
\]
where $L\pi(\theta) = (L_1'\pi(\theta), \ldots, L_m'\pi(\theta))'$. Again expressing the cell means as
$\eta(\theta) = L\mu(\theta) = NL\pi(\theta)$, we have that the incomplete data cell means satisfy
\[
\eta_i(\theta) = L_i'\mu(\theta) = L_i'\exp(\alpha 1_n + X\beta) = \exp(\alpha) L_i'\exp(X\beta).
\]
Also, since there is a constraint on the $\mu_i(\theta)$ there is a constraint on the $\eta_i(\theta)$.
In fact, the $\eta_i$ satisfy $\sum_i \eta_i(\theta) = 1_m'L\mu(\theta) = 1_n'\mu(\theta) = N$. Also, the log
means satisfy $\log \eta_i(\theta) = \log(\exp(\alpha) L_i'\exp(X\beta)) = \alpha + \log(L_i'\exp(X\beta))$.
Denote the model parameter space for the multinomial scenario by $\Theta_M$
and notice that
\[
\Theta_M = \Big\{\theta = (\alpha, \beta')' : \sum_i \eta_i(\theta) = N\Big\}.
\]
Evidently the set $\Theta_M$ is constrained and so $\Theta_M$ is not equal to the (p + 1)
dimensional real space.
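Consider the one-to-one transformation $\theta \to \theta^* = (\tau, \beta')'$ where $\tau = \sum_{i=1}^m \eta_i(\theta)$.
It follows that under this new parameterization the $\eta_i$ satisfy
\[
\log \eta_i(\theta^*) = \log \tau - \log\Big(\sum_{k=1}^m L_k'\exp(X\beta)\Big) + \log(L_i'\exp(X\beta)),
\]
since
\[
\tau = \sum_i \eta_i(\theta) = \sum_i \big(\exp(\alpha) L_i'\exp(X\beta)\big)
\;\Rightarrow\; \alpha = \log \tau - \log\Big(\sum_i L_i'\exp(X\beta)\Big).
\]
We will call the new parameter space $\Theta_M^*$ and note that it is
\[
\Theta_M^* = \{\theta^* = (\tau, \beta')' : \tau = N, \beta \in R^p\}.
\]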
The incomplete-data log likelihood under the (M)ultinomial assumption can
be written in terms of this new parameterization as
\[
\ell_Y^{(M)}(\theta^*; y) = \sum_i y_i \log \eta_i(\theta^*) - N\log N
\]
\[
= \sum_i y_i \Big(\log \tau - \log\Big(\sum_k L_k'\exp(X\beta)\Big) + \log(L_i'\exp(X\beta))\Big) - N\log N
\]
\[
= N\log \tau - N\log N + \sum_i y_i \log(L_i'\exp(X\beta)) - N\log\Big(\sum_k L_k'\exp(X\beta)\Big)
\]
\[
= \sum_i y_i \log(L_i'\exp(X\beta)) - N\log\Big(\sum_k L_k'\exp(X\beta)\Big) = \ell_2(\beta), \qquad (4.3.12)
\]
since for all $\theta^* \in \Theta_M^*$, $\tau = N$. Therefore, the incomplete-data multinomial log
likelihood is independent of $\tau$. Also, since the parameter $\beta$ is free of
constraints, we can maximize $\ell_Y^{(M)}(\theta^*; y)$ with respect to $\theta^*$ by simply setting
$\hat\tau = N$ and maximizing the unconstrained function $\ell_2(\beta)$ with respect to $\beta$.
In this context we refer to $\alpha$ as being fixed by sample design since it is a
function of the other parameters $\beta$ and the fixed sample size N.
Product-Poisson Sampling. In contrast to the first sampling scheme, the
total sample size is not considered fixed. Assume that the complete data
$X = (X_1, \ldots, X_n)'$ are distributed as product Poisson, i.e.
\[
X_i \sim \text{ind Poisson}(\mu_i(\theta)), \quad i = 1, \ldots, n,
\]
where the parameter $\theta$ is unconstrained and the means satisfy
\[
\log \mu_i(\theta) = \alpha + x_i'\beta, \quad i = 1, \ldots, n.
\]
Again, we assume that the complete data are not observable and that
we only are able to see $Y = (Y_1, \ldots, Y_m)' = LX$ with L satisfying the same
four properties that it did in the multinomial setup. The vector Y is then
distributed as product Poisson. Specifically,
\[
Y_i \sim \text{ind Poisson}(L_i'\mu(\theta) = \eta_i(\theta)), \quad i = 1, \ldots, m.
\]
The cell means $\eta_i(\theta)$ satisfy the model
\[
\log \eta_i(\theta) = \alpha + \log(L_i'\exp(X\beta)), \quad i = 1, \ldots, m,
\]
or, using the same reparameterization ($\theta \to \theta^*$) as above,
\[
\log \eta_i(\theta^*) = \log \tau - \log\Big(\sum_k L_k'\exp(X\beta)\Big) + \log(L_i'\exp(X\beta)).
\]
We will denote the model parameter space for the Poisson sampling case by
$\Theta_P^* = \{\theta^* = (\tau, \beta')' : \tau \in R^+, \beta \in R^p\}$, where the symbol $R^+$ represents the
set of positive real numbers. It is important to note that $\Theta_M^* \neq \Theta_P^*$ since $\Theta_M^*$
constrains $\tau$ to equal N while $\Theta_P^*$ requires $\tau$ only to be positive.
The incomplete-data Poisson log likelihood can be written as
\[
\ell_Y^{(P)}(\theta^*; y) = \sum_i \big(y_i \log \eta_i(\theta^*) - \eta_i(\theta^*)\big)
\]
\[
= \sum_i y_i \Big(\log \tau - \log\Big(\sum_k L_k'\exp(X\beta)\Big) + \log(L_i'\exp(X\beta))\Big) - \tau
\]
\[
= y_+ \log \tau - \tau + \sum_i y_i \log(L_i'\exp(X\beta)) - y_+ \log\Big(\sum_k L_k'\exp(X\beta)\Big)
\]
\[
= \ell_1(\tau) + \ell_2(\beta), \qquad (4.3.13)
\]
where $\ell_2(\beta)$ is defined to be the multinomial log likelihood in (4.3.12) and
$\ell_1(\tau)$ is the log likelihood for the Poisson random variable $Y_+$ which is the
total sample size N. Since $\beta$ is unconstrained for both sampling schemes,
we can find the ML estimates by differentiating (4.3.12) and (4.3.13) with
respect to $\beta$ and finding the roots of these score functions. But the two score
functions are identical, implying that the maximum likelihood estimates of $\beta$
are the same for both sampling schemes. That is, if we let $\hat\beta^{(M)}$ and $\hat\beta^{(P)}$
denote the ML estimates of $\beta$ under the multinomial and Poisson sampling
schemes, respectively, we have shown that $\hat\beta^{(M)} = \hat\beta^{(P)}$. Also, by (4.3.12) and
(4.3.13), we see that upon differentiating a second time
\[
\frac{\partial^2}{\partial\beta'\partial\beta} \ell_Y^{(M)}(\theta^*; y) = \frac{\partial^2}{\partial\beta'\partial\beta} \ell_2(\beta) = \frac{\partial^2}{\partial\beta'\partial\beta} \ell_Y^{(P)}(\theta^*; y),
\]
so that the portion of the information matrix that pertains to $\beta$ is the same
for both sampling schemes. Further, equation (4.3.13) shows that the log
likelihood for incomplete Poisson components can be expressed as a sum
of two log likelihoods that share no parameters. Thus, the parameters are
orthogonal in that the information matrix is block diagonal, i.e. the parameter
estimates are asymptotically uncorrelated. The inverse of a block diagonal
matrix is simply the block diagonal matrix of the individual inverses. Hence,
the estimated variance of the ML estimates of $\beta$ is the same for either sampling
scheme.
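The decomposition in (4.3.13) can be checked numerically. The sketch below evaluates the incomplete-data Poisson log likelihood directly (dropping the constant involving $\log y_i!$) and compares it with $\ell_1(\tau) + \ell_2(\beta)$. The design matrix, the collapsing matrix, the counts, and the function names are hypothetical choices made only for this illustration.

```python
import numpy as np

def ell2(beta, y, L, X):
    """Multinomial kernel: sum_i y_i log(L_i' e^{X b}) - y_+ log(sum_k L_k' e^{X b})."""
    s = L @ np.exp(X @ beta)
    return y @ np.log(s) - y.sum() * np.log(s.sum())

def ell_poisson(alpha, beta, y, L, X):
    """Incomplete-data Poisson log likelihood, without the log(y_i!) constant."""
    eta = L @ np.exp(alpha + X @ beta)
    return y @ np.log(eta) - eta.sum()

rng = np.random.default_rng(0)
n, m, p = 6, 3, 2
X = rng.normal(size=(n, p))                  # hypothetical design, intercept excluded
L = np.kron(np.eye(m), np.ones((1, 2)))      # every column of L contains exactly one 1
y = np.array([5., 3., 7.])
alpha, beta = 0.4, rng.normal(size=p)

tau = (L @ np.exp(alpha + X @ beta)).sum()   # tau is the sum of the observed-data means
ell1 = y.sum() * np.log(tau) - tau           # Poisson log likelihood of the total Y_+
lhs = ell_poisson(alpha, beta, y, L, X)
rhs = ell1 + ell2(beta, y, L, X)
print(abs(lhs - rhs) < 1e-9)
```

Because $\ell_2$ does not involve $\alpha$ (or $\tau$), maximizing either log likelihood over $\beta$ leads to the same score equations, which is the point of the argument above.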
Cell Mean Inference. Notice that not only is $\hat\beta^{(M)} = \hat\beta^{(P)}$ but also
$\hat\tau^{(M)} = \hat\tau^{(P)} = N$. This follows since, in the multinomial case, $\tau$ is necessarily
equal to the total sample size N, while in the Poisson case, $\ell_1(\tau)$ is simply
the log likelihood of the random variable $Y_+$ which is Poisson with mean
$\tau$, implying that the ML estimate is $\hat\tau^{(P)} = Y_+ = N$. However, we must
acknowledge the fact that the asymptotic variance of $\hat\tau$ under the Poisson
assumption is approximately N (it is $\text{var}(Y_+)$, where $Y_+ \sim \text{Poisson}(\tau)$),
while the variance of $\hat\tau$ under the multinomial assumption is zero ($\text{var}(N) = 0$
since N is nonstochastic). This is important because inferences about cell
means (or cell probabilities) involve all of the loglinear parameters, even $\tau$.
Thus the variance of the cell mean estimates will depend upon which sampling
scheme is used.
Briefly, using the EM algorithm, we can find the observed information for
the loglinear parameters $(\alpha, \beta')'$ based on the assumption that the complete
data are product Poisson. The complete data means $\mu_i$ are assumed to follow
the loglinear model
\[
\log \mu_i = \alpha + x_i'\beta, \quad i = 1, \ldots, n.
\]
If the sampling design is such that $X_+ = N$, the total sample size, is fixed
so that $X \sim \text{Mult}(N, \pi(\alpha, \beta))$ then the parameter $\alpha$ is 'fixed by design'.
Actually, upon reparameterization, we see that $\beta$ is free of constraints but
that $\alpha = \alpha(\beta, N)$, i.e. $\alpha$ is a function of $\beta$ and N. In fact,
\[
\alpha = \log\Big(\sum_i \mu_i\Big) - \log\Big(\sum_i \exp(x_i'\beta)\Big)
= \log N - \log\Big(\sum_i \exp(x_i'\beta)\Big). \qquad (4.3.14)
\]
Our objective is to find an estimate of the variance of the cell mean estimates $\hat\mu$
under the multinomial assumption. The calculation of this variance estimate
is complicated somewhat since the variance estimate of $\hat\alpha$ is different for the
two sampling schemes. It is a simple application of the delta method to find
the variance of $\hat\mu$ under the Poisson assumption since $\hat\mu = \exp(\hat\alpha 1_n + X\hat\beta)$.
This follows since we have found the information for $(\alpha, \beta)$ and hence the
estimated variance-covariance matrix of $(\hat\alpha, \hat\beta)$ based on the assumption that
the complete data are product Poisson and that the incomplete data are of
the form Y = LX with L satisfying the same four properties as above.
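Since, upon convergence of the EM algorithm, we compute the variance-covariance
matrix of $(\hat\alpha, \hat\beta)$ under the product Poisson assumption only, we
must find a way to rewrite $\hat\mu$ in terms of $\hat\beta$ and N only. But by (4.3.14) we
have the relationship
\[
\hat\alpha = \log N - \log\Big(\sum_i \exp(x_i'\hat\beta)\Big)
\]
so that
\[
\hat\mu = \exp(\hat\alpha 1_n + X\hat\beta)
= \exp\Big(1_n \log N + 1_n \log\Big(\frac{1}{\sum_i \exp(x_i'\hat\beta)}\Big) + X\hat\beta\Big)
= N\Big(\frac{\exp(X\hat\beta)}{\sum_i \exp(x_i'\hat\beta)}\Big)
= N\Big(\frac{\exp(X\hat\beta)}{1_n'\exp(X\hat\beta)}\Big). \qquad (4.3.15)
\]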
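Now since the information for $\beta$ is the same under both sampling
schemes, we can find an estimate of the variance of $\hat\beta$ assuming the complete
data are multinomially distributed. We will actually find the variance of $\hat\pi$,
which is nothing but $\hat\mu/N = \exp(X\hat\beta)/(1_n'\exp(X\hat\beta))$, via the delta method.
Delta Method. Since the ML estimate $\hat\beta$ is consistent, a first order
approximation to $\hat\pi$ can be found by using a Taylor's expansion about the
true parameter value $\beta_0$, viz.
\[
\hat\pi \approx \pi(\beta_0) + \frac{\partial\pi}{\partial\beta'}\Big|_{\beta_0}(\hat\beta - \beta_0).
\]
Thus, the variance of $\hat\pi$ is approximately
\[
\text{var}(\hat\pi) \approx \text{var}\Big(\pi(\beta_0) + \frac{\partial\pi}{\partial\beta'}\Big|_{\beta_0}(\hat\beta - \beta_0)\Big)
= \frac{\partial\pi}{\partial\beta'}\Big|_{\beta_0} \text{var}(\hat\beta) \Big(\frac{\partial\pi}{\partial\beta'}\Big|_{\beta_0}\Big)',
\]
where $\text{var}(\hat\beta)$ is that portion of the variance-covariance matrix of $(\hat\alpha, \hat\beta)$
pertaining to $\hat\beta$. Recall that it was shown above that this portion is the
same for both sampling schemes.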
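It is shown in Appendix B that
\[
\frac{\partial\pi}{\partial\beta'} = [D(\pi) - \pi\pi']X, \qquad (4.3.16)
\]
where X = Z[, -1]. That is, X is the design matrix with the first column
deleted. Hence, the variance of $\hat\pi$ under the multinomial assumption can be
estimated by
\[
\text{var}_{Mult}(\hat\pi) = [D(\hat\pi) - \hat\pi\hat\pi'](X \text{var}(\hat\beta) X')[D(\hat\pi) - \hat\pi\hat\pi']. \qquad (4.3.17)
\]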
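A minimal sketch of the delta-method calculation leading to (4.3.17) follows. The design matrix `X` (intercept column already deleted), the estimate `beta_hat`, and its covariance `cov_beta` are hypothetical numbers chosen for illustration; in practice they would come from the fitted model.

```python
import numpy as np

def pi_and_cov(X, beta_hat, cov_beta):
    """Return pi_hat = exp(X b) / 1'exp(X b) and its delta-method covariance."""
    e = np.exp(X @ beta_hat)
    pi = e / e.sum()
    J = (np.diag(pi) - np.outer(pi, pi)) @ X   # d pi / d beta', an n x p matrix
    return pi, J @ cov_beta @ J.T

# Hypothetical inputs for a four-cell table with two non-intercept parameters.
X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
beta_hat = np.array([0.3, -0.5])
cov_beta = np.array([[0.04, 0.01], [0.01, 0.09]])

pi_hat, cov_pi = pi_and_cov(X, beta_hat, cov_beta)
ones = np.ones(4)
print(pi_hat.sum(), ones @ cov_pi @ ones)
```

Since $[D(\pi) - \pi\pi']1_n = 0$, the estimated variance of the sum of the cell probabilities is zero, as it should be under multinomial sampling. Multiplying `pi_hat` by N gives the cell mean estimates of (4.3.15).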
4.4 Latent Class Model Fitting: An Application
To further illustrate the utility of the above results, we explore the fitting
of loglinear latent class models. For an exposition of latent class analysis,
see Haberman (1979).
Suppose we can observe (manifest) factors $A_1, A_2, \ldots, A_p$ with $I_1, I_2, \ldots, I_p$
levels, respectively, while a latent factor W with K levels is not observable.
Consider the set of cells, $C = \{(1,1,\ldots,1,1), (1,1,\ldots,1,2), \ldots, (I_1,\ldots,I_p)\}$,
resulting from a cross classification on factors $A_1, \ldots, A_p$. Listing the elements
of C in lexicographical order, we denote the first cell by 1, the second by 2, and
so on to m, where $m = \prod_{i=1}^p I_i$. With this representation the complete data
(the K x m cell counts) are $X = (X_{11}, \ldots, X_{1K}, \ldots, X_{m1}, \ldots, X_{mK})^T$. The
observed data, Y, are the marginal counts collapsed over latent factor W.
Here $Y = LX = (X_{1+}, \ldots, X_{m+})^T$, where $L = 1_K^T \otimes I_m$.
We initially assume that X is composed of independent Poissons with
means following the loglinear model,
\[
\log \mu_{ij}(\theta) = \alpha + x_{ij}'\beta, \quad i = 1, \ldots, m, \quad j = 1, \ldots, K.
\]
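We can use the EM algorithm of (4.3.7) to derive $\hat\theta = (\hat\alpha, \hat\beta')'$ and equation
(4.3.11) to obtain an estimate of its variance. From (4.3.8) the adjustment
matrix is $Z'\text{var}(X \mid LX = y)Z$ with
\[
\text{var}(X \mid LX = y) =
\begin{pmatrix}
V_1 & 0 & \cdots & 0 \\
0 & V_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & V_m
\end{pmatrix}
\]
where
\[
V_i =
\begin{pmatrix}
y_i \frac{\mu_{i1}}{\mu_{i+}}\big(1 - \frac{\mu_{i1}}{\mu_{i+}}\big) & -y_i \frac{\mu_{i1}}{\mu_{i+}}\frac{\mu_{i2}}{\mu_{i+}} & \cdots & -y_i \frac{\mu_{i1}}{\mu_{i+}}\frac{\mu_{iK}}{\mu_{i+}} \\
\vdots & \vdots & \ddots & \vdots \\
-y_i \frac{\mu_{iK}}{\mu_{i+}}\frac{\mu_{i1}}{\mu_{i+}} & -y_i \frac{\mu_{iK}}{\mu_{i+}}\frac{\mu_{i2}}{\mu_{i+}} & \cdots & y_i \frac{\mu_{iK}}{\mu_{i+}}\big(1 - \frac{\mu_{iK}}{\mu_{i+}}\big)
\end{pmatrix}. \qquad (4.4.1)
\]
Notice that $V_i$ is the covariance of a K x 1 multinomial vector with index
$y_i = x_{i+}$ and cell probabilities $\{\mu_{ij}/\mu_{i+},\ j = 1, \ldots, K\}$.
Let $\hat\theta$ denote the final estimate of $\theta$ obtained upon convergence of the
EM algorithm. Using (4.3.11) and (4.4.1), we can derive an explicit estimate
of the variance-covariance matrix of $\hat\theta$. It is
\[
\Big(Z'D(\mu(\hat\theta))Z - Z'\Big(\bigoplus_{i=1}^m \hat V_i\Big)Z\Big)^{-1}, \qquad (4.4.2)
\]
which is the inverse of the information matrix evaluated at $\hat\theta$.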
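The block-diagonal adjustment in (4.4.1) can be assembled as in the following sketch. The function name `conditional_cov`, the small design matrix, the parameter vector, and the counts are hypothetical. The complete-data cells are ordered with the latent index varying fastest, which under NumPy's Kronecker convention corresponds to `np.kron(np.eye(m), np.ones((1, K)))` for the collapsing matrix.

```python
import numpy as np

def conditional_cov(mu, y, K):
    """Direct sum of V_i = y_i (D(p_i) - p_i p_i'), with p_ij = mu_ij / mu_i+."""
    m = y.size
    V = np.zeros((m * K, m * K))
    for i in range(m):
        block = slice(i * K, (i + 1) * K)      # cells (i,1),...,(i,K)
        p_i = mu[block] / mu[block].sum()
        V[block, block] = y[i] * (np.diag(p_i) - np.outer(p_i, p_i))
    return V

# Hypothetical toy setting: m = 3 observable cells and K = 2 latent classes.
rng = np.random.default_rng(0)
m, K = 3, 2
Z = np.column_stack([np.ones(m * K), rng.normal(size=(m * K, 2))])
theta = np.array([1.0, 0.3, -0.2])
mu = np.exp(Z @ theta)
y = np.array([4., 9., 2.])
L = np.kron(np.eye(m), np.ones((1, K)))

V = conditional_cov(mu, y, K)
info = Z.T @ np.diag(mu) @ Z - Z.T @ V @ Z     # information in the form of (4.3.11)
print(np.allclose(L @ V @ L.T, 0.0))
```

Each block sums to zero along its rows, so the conditional variance of LX given LX = y vanishes. Evaluating `info` at the converged estimate and inverting it would give the variance estimate in (4.4.2).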
Numerical Example. We consider the example introduced in section
4.1. The observed data are counts resulting from cross-classifying the 216
respondents with respect to whether they tend toward universalistic (1) or
particularistic (2) values in four different situations (A,B,C,D) of role conflict.
The data are displayed below in Table 4.1.
Table 4.1. Observed cross-classification of 216 respondents with respect to
whether they tend toward universalistic (1) or particularistic (2) values in
four situations (A,B,C,D) of role conflict

                 Observed                     Observed
 A  B  C  D     frequency      A  B  C  D    frequency
 1  1  1  1        42          2  1  1  1        1
 1  1  1  2        23          2  1  1  2        4
 1  1  2  1         6          2  1  2  1        1
 1  1  2  2        25          2  1  2  2        6
 1  2  1  1         6          2  2  1  1        2
 1  2  1  2        24          2  2  1  2        9
 1  2  2  1         7          2  2  2  1        2
 1  2  2  2        38          2  2  2  2       20

We illustrate the results of the previous sections by fitting a simple
loglinear latent class model to the data. The ordinary two-level latent class
model fitted by Goodman is equivalent to the loglinear model
\[
\log \mu_{ijklt} = \lambda + \lambda_i^A + \lambda_j^B + \lambda_k^C + \lambda_l^D + \lambda_t^W + \lambda_{it}^{AW} + \lambda_{jt}^{BW} + \lambda_{kt}^{CW} + \lambda_{lt}^{DW}, \qquad (4.4.3)
\]
where i, j, k, l, and t run from 1 to 2.
Using the notation defined above, the set of observable cells is
C = {(1,1,1,1),(1,1,1,2),(1,1,2,1),...,(2,2,2,1),(2,2,2,2)} and $m = 2^4 = 16$.
The complete data are $x = (x_{11}, x_{12}, \ldots, x_{16,1}, x_{16,2})'$, where for instance
$x_{42} = x_{11222}$ represents the count in cell (1,1,2,2,2). Although we assume
that the complete data means satisfy the model in (4.4.3), we are only able
to observe y = Lx where $L = 1_2' \otimes I_{16}$. Hence, we will fit the model using
the EM algorithm defined in (4.3.7). The FORTRAN program em.loglin was
used to fit the model. The input information needed is
(1) $\mu^{(0)}$, an initial estimate of the complete data means
(2) m and n, the length of the observed and complete data vectors
(3) p, the number of independent loglinear parameters
(4) Z, the design matrix
(5) L, the m x n matrix that satisfies Lx = y.
As discussed in section 4.3.1, a simple initial estimate of $\mu$, and hence
of $\beta$, is one that satisfies $L\mu^{(0)} = y$. By simply allocating approximately
a half of each observed cell count to the two levels of the latent factor, we
can find a $\mu^{(0)}$ that satisfies $L\mu^{(0)} = y$. This initial estimate of $\mu$ also allows
us to omit the direct input of the observed data, which can be obtained via
$L\mu^{(0)} = y$.
The two-level latent class model fit the data well ($G^2 = 2.72$, df = 6),
thereby giving us a simple way of interpreting the association among the four
situations of role conflict. Table 4.2 displays the model parameter estimates
and their estimated standard errors. To make model (4.4.3) identifiable, those
parameters not displayed in Table 4.2 were set to zero. The last column,
entitled "Unadj Std Error", contains the standard error estimates that would
be used if the complete data were actually observed. These are too small and
are invalid.
Table 4.2. Parameter and Standard Error Estimates

 Parameter             Estimate    Std Error    Unadj Std Error
 \lambda                 0.532       0.491          0.276
 \lambda_2^A             0.911       0.197          0.177
 \lambda_2^B             0.712       0.225          0.171
 \lambda_2^C             0.604       0.212          0.168
 \lambda_2^D             1.884       0.334          0.237
 \lambda_2^W             3.160       0.530          0.317
 \lambda_{22}^{AW}       4.032       3.593          1.543
 \lambda_{22}^{BW}       3.444       1.151          0.563
 \lambda_{22}^{CW}       3.126       0.962          0.518
 \lambda_{22}^{DW}       3.081       0.603          0.386

Estimates of certain classification probabilities and their estimated
standard errors were also computed. These probabilities are defined as
\[
\pi_t^{W} = \pi_{++++t} = P(W = t)
\]
\[
\pi_{1t}^{A|W} = \pi_{1+++t}/\pi_{++++t} = P(A = 1 \mid W = t)
\]
\[
\pi_{1t}^{B|W} = \pi_{+1++t}/\pi_{++++t} = P(B = 1 \mid W = t)
\]
\[
\pi_{1t}^{C|W} = \pi_{++1+t}/\pi_{++++t} = P(C = 1 \mid W = t)
\]
\[
\pi_{1t}^{D|W} = \pi_{+++1t}/\pi_{++++t} = P(D = 1 \mid W = t)
\]
The standard errors were found using the arguments of section 4.3.3 and the
delta method. For example, the conditional probabilities have form
\[
\frac{b_1\pi}{b_2\pi},
\]
where $b_1$ and $b_2$ are 1 x n vectors of known constants. Thus, by a direct
application of the delta method, an estimate of the asymptotic variance is
\[
\text{var}\Big(\frac{b_1\hat\pi}{b_2\hat\pi}\Big) = \Big[\frac{b_2\hat\pi\, b_1 - b_1\hat\pi\, b_2}{(b_2\hat\pi)^2}\Big] \text{var}(\hat\pi) \Big[\frac{b_2\hat\pi\, b_1 - b_1\hat\pi\, b_2}{(b_2\hat\pi)^2}\Big]', \qquad (4.4.4)
\]
where $\text{var}(\hat\pi)$ is the variance of $\hat\pi$ under the multinomial assumption, i.e.
equation (4.3.17). Actually, since the conditional probabilities do not involve
the intercept parameter, the variance of $\hat\pi$ under the Poisson assumption,
which is
\[
\frac{1}{N^2}\text{var}(\hat\mu) = \frac{1}{N^2} D(\hat\mu) Z \text{var}(\hat\alpha, \hat\beta) Z' D(\hat\mu),
\]
could be used in expression (4.4.4) and the result would be the same. This is
not true of the marginal probabilities, which have form $b_1\pi$. An estimate of
the variance of $b_1\hat\pi$ is
\[
\text{var}(b_1\hat\pi) = b_1 \text{var}(\hat\pi) b_1',
\]
where $\text{var}(\hat\pi)$ is the variance of $\hat\pi$ under the multinomial assumption. The
estimate would be inflated if one used the variance under the Poisson
assumption, reflecting the stochastic nature of the total sample size. To
illustrate, we consider an extreme example. Let $b_1 = 1_n'$ so that $b_1\hat\pi = 1.0$
with probability one. That is, $b_1\hat\pi$ is nonstochastic. If we use the multinomial
variance estimator we get zero as our estimate of the variance. This is what
we know it to be. On the other hand, using the Poisson variance estimator
we get some positive value as our estimate of the variance. This is known
to be incorrect. The estimated probabilities and their estimated standard
deviations are displayed in Table 4.3.
Table 4.3. Classification Probability Estimates (Standard Errors)

 Latent
 Class t   \pi_t^W        \pi_{1t}^{A|W}   \pi_{1t}^{B|W}   \pi_{1t}^{C|W}   \pi_{1t}^{D|W}
    1      .279 (.058)    .993 (.025)      .940 (.066)      .927 (.066)      .769 (.095)
    2      .721 (.058)    .714 (.040)      .330 (.050)      .354 (.049)      .132 (.038)

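The contrast between the two variance estimators can be illustrated with a small sketch of (4.4.4). For simplicity it assumes a hypothetical saturated four-cell table, where the multinomial covariance of the estimated probabilities is $[D(\pi) - \pi\pi']/N$ and the Poisson-based one is $D(\pi)/N$; the vectors `b1` and `b2` and all numbers are illustrative and are not taken from the fitted latent class model.

```python
import numpy as np

def var_ratio(b1, b2, pi, cov_pi):
    """Delta-method variance of (b1 pi)/(b2 pi), following the form of (4.4.4)."""
    g = ((b2 @ pi) * b1 - (b1 @ pi) * b2) / (b2 @ pi) ** 2   # gradient of the ratio
    return g @ cov_pi @ g

def var_linear(b1, cov_pi):
    """Variance of the linear function b1 pi."""
    return b1 @ cov_pi @ b1

# Hypothetical saturated example with fixed total N.
N = 200
pi = np.array([0.1, 0.2, 0.3, 0.4])
cov_mult = (np.diag(pi) - np.outer(pi, pi)) / N    # multinomial sampling
cov_pois = np.diag(pi) / N                         # product Poisson sampling

b1 = np.array([1., 0., 0., 0.])
b2 = np.array([1., 1., 0., 0.])
ones = np.ones(4)

print(var_ratio(b1, b2, pi, cov_mult), var_ratio(b1, b2, pi, cov_pois))
print(var_linear(ones, cov_mult), var_linear(ones, cov_pois))
```

In this sketch the ratio has the same estimated variance under either covariance, because its gradient is orthogonal to $\pi$, while the linear function $1_n'\hat\pi$ has variance zero under the multinomial covariance and a positive value under the Poisson one.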
From these estimated classification probabilities, we see that level 1 of
the latent class W can be labeled the 'universalistic' level. That is, subjects
in level 1 of the latent class tend to have universalistic views for all four
situations. Notice that, given a subject is in level 1 of the latent class, the
probability that they respond 'universalistic' is estimated to be at least .77
for each of the four situations. Similarly, one could label level 2 of the latent
class as the 'particularistic' level. Except for situation A, the estimated
probability that an individual in latent level 2 responds 'particularistic' to
the situations is at least .65. Since the latent class model (4.4.3) fits well,
we conclude that, given a person is intrinsically particularistic or intrinsically
universalistic, their responses to the four situations (A,B,C,D) of role conflict
are independent.
4.5 Modified EM/Newton-Raphson Algorithm
In this section we present an alternative root finding algorithm for
the incomplete exponential family score functions of equation (4.2.9). As
mentioned above, the EM algorithm has both positive and negative features.
Two very important positive features are (1) the EM algorithm is insensitive
to starting values and (2) the EM algorithm finds a root that maximizes
the likelihood. In contrast, since the incomplete-data log likelihood is not
generally a concave function of the parameters, the Newton-Raphson (NR)
or Fisher-scoring (FS) algorithms may not converge to a maximal root. In
fact, they will be very sensitive to starting values and may not converge at
all. Negative features of the EM algorithm include its slow convergence and
the lack of a precision estimate as a byproduct. On the other hand, the NR and FS
algorithms work well locally, in that if we implement these methods very near
a maximal root, the convergence, relative to EM, is fast and an estimate of
precision of the ML estimator is obtained as a byproduct.
In practice, the EM algorithm may quickly approach a small neighborhood
around a maximal root, but then slowly converge to the root.
For this reason, we present an alternative algorithm that uses both EM
iterations and NR (or some modified NR, such as FS or quasi-NR) iterations.
Specifically, the EM algorithm will be used initially and then, upon reaching a
neighborhood of the maximal root, the NR type algorithms will be employed.
Meilijson (1989) suggested this approach in a fine exposition of root finding
methods for incomplete data score equations.
Recall that when the complete data have a distribution in the regular
exponential family the incomplete-data log likelihood has form (4.2.8), i.e.
\[
\ell_Y(\beta; y) = c^*(\beta; y) - c(\beta)
\]
and that the score function has form
\[
S_Y(\beta; y) = \frac{\partial}{\partial\beta}\ell_Y(\beta; y) = E_\beta(T(X) \mid Y = y) - E_\beta(T(X)). \qquad (4.5.1)
\]
To solve for a maximal root of (4.5.1) we can begin by using the EM
iterative scheme described in (4.2.11). We will conclude that the iterate
estimate is in a sufficiently small neighborhood of the maximal root as soon as
$\|\beta^{(m)} - \beta^{(m+1)}\| < \text{SWITCH(TOL)}$, where SWITCH(TOL) $\geq$ TOL of (4.2.1).
At this point, we will employ the iterative scheme described in (4.2.13). As
a first step in (4.2.13), we must calculate the matrix $A_Y(\beta^{(m)}; y)$, which is an
estimate of the negative Hessian of the incomplete-data log likelihood. At
times the Hessian or expected Hessian can be explicitly calculated. This is
true in the Poisson loglinear case (see equations (4.3.9) and (4.3.10)). Hence
the matrix $A_Y(\beta^{(m)}; y)$ can be explicitly calculated and inverted. Generally,
however, the matrix $A_Y$ will only be an approximation.
Since both $E_\beta(T(X) \mid Y = y)$ and $E_\beta(T(X))$ must be calculated during
the EM algorithm, in view of equation (4.5.1), we must have the ability
to calculate $S_Y(\beta; y)$ at different values of $\beta$. We then could use as an
approximation to $I_Y(\beta^{(m)}; y)$,
\[
A_Y(\beta^{(m)}; y)[i,] = -\frac{\big(S_Y(\beta^{(m)} + e_i; y) - S_Y(\beta^{(m)}; y)\big)'}{\epsilon}, \quad i = 1, \ldots, p, \qquad (4.5.2)
\]
where the bracket notation B[i,] represents the ith row of matrix B and
$e_i = (0, \ldots, 0, \epsilon, 0, \ldots, 0)'$ is a p x 1 vector with a small number $\epsilon$ in the ith
position. The value of $\epsilon$ should be determined by rules used for numerical
differentiation. Meilijson (1989) discusses this approximation technique and
refers to it as EM-aided differentiation.
Evidently, if one uses approximation (4.5.2), the only functions needed
to be calculated for (4.2.13) are the score functions, which are differences
between the conditional and marginal expected values of the sufficient statistic
T(X). Finally, upon convergence of (4.2.13) we can use $[A_Y(\beta^{(\infty)}; y)]^{-1}$ as an
estimate of the precision of the ML estimates $\hat\beta$.
If one feels the EM algorithm will converge quickly enough or that
the matrix inversion of $A_Y$ is unnecessarily burdensome, then one can
select SWITCH(TOL) = TOL, in which case $A_Y$ will be inverted just
once, since the iterative scheme (4.2.13) will converge after one iteration.
For SWITCH(TOL) = TOL, the modified algorithm is simply the EM
algorithm supplemented by a single calculation of a precision estimate. If
SWITCH(TOL) > TOL, then the EM algorithm can be viewed as a procedure
for finding an appropriate starting value for the faster iterative schemes such
as NR or FS.
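The differencing idea behind (4.5.2) can be sketched for the Poisson loglinear case, where the score and the observed information are available in closed form and can be compared. The sketch uses forward differences of the score, which is one possible rule and an assumption here; the function names, the toy design, and the counts are hypothetical.

```python
import numpy as np

def score(Z, L, beta, y):
    """Incomplete-data score Z'[mu * L'((y - L mu)/(L mu))], componentwise operations."""
    mu = np.exp(Z @ beta)
    eta = L @ mu
    return Z.T @ (mu * (L.T @ ((y - eta) / eta)))

def observed_info(Z, L, beta, y):
    """Closed-form negative Hessian for the Poisson loglinear case."""
    mu = np.exp(Z @ beta)
    eta = L @ mu
    DmuZ = mu[:, None] * Z
    LDmuZ = L @ DmuZ
    w = L.T @ ((y - eta) / eta)
    return LDmuZ.T @ ((y / eta**2)[:, None] * LDmuZ) - Z.T @ (w[:, None] * DmuZ)

def info_by_differencing(score_fn, beta, eps=1e-6):
    """Approximate the negative Hessian row by row from forward differences of the score."""
    p = beta.size
    s0 = score_fn(beta)
    A = np.empty((p, p))
    for i in range(p):
        e_i = np.zeros(p)
        e_i[i] = eps                       # small number in the ith position
        A[i] = -(score_fn(beta + e_i) - s0) / eps
    return A

# Hypothetical toy problem: six Poisson cells observed only as three pair sums.
Z = np.column_stack([np.ones(6), np.arange(6) / 5.0])
L = np.kron(np.eye(3), np.ones((1, 2)))
y = np.array([3., 8., 20.])
beta = np.array([1.0, 0.5])

A = info_by_differencing(lambda b: score(Z, L, b, y), beta)
exact = observed_info(Z, L, beta, y)
print(np.max(np.abs(A - exact)))
```

Only score evaluations are needed, which is what makes the approach attractive when the Hessian has no convenient closed form.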
The modified iterative scheme can be described as follows:
(1) Solve for $\beta^{(m+1)}$ in $E_{\beta^{(m+1)}}(T(X)) = E_{\beta^{(m)}}(T(X) \mid Y = y)$.
(2) If $\|\beta^{(m)} - \beta^{(m+1)}\| >$ SWITCH(TOL), then replace $\beta^{(m)}$ by $\beta^{(m+1)}$
and go to (1). Else go to (3).
(3) Calculate $[A_Y(\beta^{(m)}; y)]^{-1}$ and $S_Y(\beta^{(m)}; y)$ as discussed above. (4.5.3)
(4) Replace $\beta^{(m)}$ by $\beta^{(m+1)} = \beta^{(m)} + [A_Y(\beta^{(m)}; y)]^{-1} S_Y(\beta^{(m)}; y)$.
(5) If $\|\beta^{(m)} - \beta^{(m+1)}\| >$ TOL, then go to (3) (or (1))*. Else stop.
* If the faster, less stable, algorithms are having trouble converging, reset
SWITCH(TOL) to a smaller value and reuse the EM algorithm to get into a
smaller neighborhood of the maximal root.
Algorithm (4.5.3) should be stable, insensitive to starting values, relatively
fast, and will provide an estimate of the precision of the ML estimate
as a byproduct.
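The steps of (4.5.3) can be sketched for the Poisson loglinear case as follows. This is an illustration under stated assumptions rather than the em.loglin program: every column of L is assumed to contain exactly one '1', the M-step is solved by an inner Newton-Raphson loop, the expected information serves as the $A_Y$ matrix, and the maximum absolute change is used as the norm. The function names, tolerances, design matrix, and counts are hypothetical.

```python
import numpy as np

def score(Z, L, beta, y):
    """Incomplete-data score Z'[mu * L'((y - L mu)/(L mu))]."""
    mu = np.exp(Z @ beta)
    eta = L @ mu
    return Z.T @ (mu * (L.T @ ((y - eta) / eta)))

def expected_info(Z, L, beta):
    """Expected information Z'D(mu)L'D^{-1}(L mu)LD(mu)Z."""
    mu = np.exp(Z @ beta)
    LDmuZ = L @ (mu[:, None] * Z)
    return LDmuZ.T @ (LDmuZ / (L @ mu)[:, None])

def em_step(Z, L, beta, y, inner=25):
    """One EM iteration: allocate y to complete cells, then refit the loglinear model."""
    mu = np.exp(Z @ beta)
    x_hat = mu * (L.T @ (y / (L @ mu)))         # E(X | Y = y), one '1' per column of L
    b = beta.copy()
    for _ in range(inner):                      # Newton-Raphson for Z'mu(b) = Z'x_hat
        m_b = np.exp(Z @ b)
        step = np.linalg.solve(Z.T @ (m_b[:, None] * Z), Z.T @ (x_hat - m_b))
        b = b + step
        if np.max(np.abs(step)) < 1e-12:
            break
    return b

def hybrid_fit(Z, L, y, beta0, switch_tol=1e-2, tol=1e-10, max_iter=1000):
    beta = beta0.copy()
    for _ in range(max_iter):                   # steps (1)-(2): EM phase
        new = em_step(Z, L, beta, y)
        close = np.max(np.abs(new - beta)) <= switch_tol
        beta = new
        if close:
            break
    for _ in range(max_iter):                   # steps (3)-(5): Fisher-scoring phase
        step = np.linalg.solve(expected_info(Z, L, beta), score(Z, L, beta, y))
        beta = beta + step
        if np.max(np.abs(step)) <= tol:
            break
    return beta, np.linalg.inv(expected_info(Z, L, beta))

# Hypothetical toy problem: six cells, the last four observed only as two pair sums.
Z = np.column_stack([np.ones(6), np.arange(6.0)])
L = np.array([[1., 0., 0., 0., 0., 0.],
              [0., 1., 0., 0., 0., 0.],
              [0., 0., 1., 1., 0., 0.],
              [0., 0., 0., 0., 1., 1.]])
y = np.array([3., 4., 11., 22.])
beta0 = np.array([np.log(y.sum() / 6.0), 0.0])

beta_hat, cov_hat = hybrid_fit(Z, L, y, beta0)
print(beta_hat, np.sqrt(np.diag(cov_hat)))
```

Setting `switch_tol` equal to `tol` would reduce the sketch to EM followed by a single precision calculation, while a larger `switch_tol` hands over to the scoring iterations earlier.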
As a special case, let us consider applying the modified algorithm (4.5.3)
to the Poisson loglinear model of section 4.3. In that case we were able
to derive an explicit formula for the observed and expected information for
the incomplete data. For simplicity, we will use the expected information
displayed in equation (4.3.10) as our $A_Y$ matrix, i.e.
\[
A_Y(\beta; y) = E(I_Y(\beta; Y)) = Z'D(\mu(\beta))L'D^{-1}(L\mu(\beta))LD(\mu(\beta))Z. \qquad (4.5.4)
\]
By expression (4.3.5), we can write the score function as
\[
S_Y(\beta; y) = Z'\Big[\mu * L'\Big(\frac{y - L\mu}{L\mu}\Big)\Big], \qquad (4.5.5)
\]
where the '*' and the fraction bar are componentwise operators.
To start the algorithm, we apply the EM iterative scheme of (4.3.7),
continuing until $\|\beta^{(m)} - \beta^{(m+1)}\| <$ SWITCH(TOL). At this point we will go
to step (3) of (4.5.3) using the formulas (4.5.4) and (4.5.5) for $A_Y$ and $S_Y$.
Repeat steps (3)-(5) of (4.5.3) until the convergence criterion is met.
4.6 Discussion
This chapter emphasized loglinear model fitting when the data are
incomplete. As an example, a latent class loglinear model was fit to the
data presented in Goodman (1974). The primary method of obtaining
ML estimates of the loglinear parameters was the EM algorithm, but other
possibilities such as the Newton-Raphson algorithm were discussed.
In section 4.2 we reviewed the EM algorithm with special attention
given to the regular exponential family. For the regular exponential case, the
iterative scheme (4.2.11) was shown to be equivalent to the EM algorithm.
Then, in section 4.3.1, we derive the specific form for the EM algorithm
when the data are product Poisson with means following a loglinear model.
An explicit formula for the observed information matrix is derived in section
4.3.2. An estimate of the variance of the ML estimates of latent class loglinear
parameters is shown in equation (4.4.2).
The assumption that the data are product Poisson is not as restrictive
as it may seem. In section 4.3.3 we discuss inference for loglinear parameters
when the complete data are multinomially distributed. The results follow by
arguments of Birch (1963) and Palmgren (1981). It is shown that, when the
total sample size is considered fixed, inferences about all loglinear parameters,
except the one that is fixed by design, are the same for both the product
Poisson assumption and the multinomial assumption. A method of estimating
the variance of classification probability estimates (and functions thereof) is
also developed in this section.
We introduce an alternative root finding algorithm (4.5.3) for the
incomplete exponential family score functions in section 4.5. The algorithm
exploits the positive features of both the EM and Newton-Raphson type
algorithms. Specifically, the algorithm should prove to be insensitive to
starting values and relatively fast (compared to straight EM). It also will
provide an estimate of the precision of the estimators as a byproduct.
As mentioned above, many models that can be fit using the EM
algorithm can also be fit more directly using the Newton-Raphson algorithm.
Appendix B includes a discussion about the program NLIN which fits
generalized linear-nonlinear models. Also included in the appendix is
the code for the two model fitting programs 'em.loglin' and 'NLIN'. The
FORTRAN program 'em.loglin' is based on the iterative scheme (4.3.6) and
the formula (4.3.9) for the observed information matrix. The S-plus program
'NLIN' can be used to fit generalized linear and nonlinear models. The
data are required to be independent and of the exponential dispersion type
(see discussion of NLIN). The author plans on implementing the algorithm
described in (4.5.3) for the Poisson loglinear model case.
APPENDIX A
CALCULATIONS FOR CHAPTER 2
We set out to show that the matrix of equation (2.3.11), viz.
DM agd
_iOiÂ£j 0
is equal to the matrix
ÃD(,To
)Â®7T0i7r{)i 0\ / D(tto)
0 Oj l g'(Ão) o
l
( Mi 0 \
Vo M2)
where
M, = D\7T0)  D1(7To)ii(JH'D1(7ro)Lf)1if'D1(7ro)  Â®f 1*1'
R
and
Proof: For notational convenience, let D = D{7r0) and let H = H(Â£0). We
will state a basic matrix algebra result, the proof of which can be found in
Aitchison and Silvey (1958).
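Let A be nonsingular and B be of full column rank. Assuming
compatibility,
\[
\begin{pmatrix} A & B \\ B' & 0 \end{pmatrix}^{-1}
=
\begin{pmatrix}
A^{-1} - A^{-1}B(B'A^{-1}B)^{-1}B'A^{-1} & A^{-1}B(B'A^{-1}B)^{-1} \\
(B'A^{-1}B)^{-1}B'A^{-1} & -(B'A^{-1}B)^{-1}
\end{pmatrix}.
\]
That is, the partitioned matrix has a simple inverse.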
Using this result, identifying D and H/nÂ» with A and B, we arrive at
an equivalent form for (2.3.11). It is
(D1  D'HiH'D'Hy'H'D1 n*DlH(H'\
^ n^H'D'Hy'H'D1 n\(H'D~'H)1 ) X
( D  Â©TTo^ 0\
V 0 oJx
(D1  D'HtH'D'Hy'H'D1 n^D^H^H'D^H)1 \
V n^H'D^H^H'D1 nl(H'D~l H)~{ )â€™
Now, using the fact that D_1(7r0)( Â© 7r0Â¿7rÃ³Â¿)D1(7ro) = Â®1rVr =
(ffilpXÂ©!#) and, by Lemma 2.3.1, (Â©1^)# = 0, we can multiply out these
three partitioned matrices to get
( Mi 0 \
VO M2)
where
Mi = D\tt0)  D'MHiH'D'MHy'H'D'M  Â©f 1*1'*
and
4
M2 = nl{H,D1(%0)H)1.
This is what we set out to show.
Result 3 (2.4.6) We wish to show that the asymptotic variances are related
~(P)~(P)'
var(/Â¿(M)) = var(/i(p))  Â©f â€”â€”â€”â€”
Tli
according to
Proof: Since /j, = e^, we can invoke the delta method to arrive at
var(/i(Afl) = var(e Ã(Af)) = .D(eÃÂ°)var(Â£(M))Â£>(eÃÂ°)
= .D(e^0) ^var(Â£(p))  ffiâ€”.D(e^0), by 2.4.5
Ã i'
= .D(e^0)var(Â£(p)).D(e^0)  0^Â°*Â£o*
ni
~(p)~(py
= var(A^)Â©^
Tli
where the equal signs represent asymptotic equivalence.
Result 4 (2.4.7) We wish to show that the asymptotic variances of the
freedom parameter estimates are related according to
var(/3(M)) = var(/3(p))  A,
where
A = (X'Xy'X'C
Q^p.)C'X(X'X)1.
Vni
Proof: In the following, the equal signs represent asymptotic equivalence.
Now, since Â¡3 = (X1 X)1 X' C log^Afi), we can invoke the delta method to
arrive at
var(/?(M))
= (X'X)1 X'Cv ar( log(AA(M)))C"X(X'X)1
= (X'X)1A:,CL>1(A/io)Avar(A(M))A'D1(A/ro)C"X(X'X)1
= (X'X)1X'CD\AnQ)Av3,v(ii^)A'D\An<))C,X(X'Xy1
K~(p)~(p)\
Â© ^ Â£ JA'DyA^C'XiX'X)1
= var(/^p^)
, ^(P)~(P)'
 {X'Xy'X'CDyA^AÃ Â© ^ Â£ j A!D1{Ah0)C,X{X,X)\
But by assumption (Al) of section 2.3.1,
D1 (An)A( Â® ^l^A'D1 {Ah)
 D1 ( ( Â®Ali \ ( J
\ Â®A2jHj ) \Â®A2j) V Â® VÃ±j
â€” JJ! ( Â©Ajj/Zj \
V Â®A2jHj )
( ffiAim
VÃ±j â€™ v^i
\ ni f Â®^1Â¿Â¿ÃÂ¿
; V Â®A2jPj
(
Vn;
lTi lm2
\Â®A/
Ã ffi Ik?
VÂ® vV
Hence, we have that the asymptotic equivalence
var(/3(M)) = var(/3(p))  A
holds, where
A = {X'Xy'X'C
(0 lmi \
A
m lm2
\Â®vs/
(Â®^ ^)CT(M)"
which is what we set out to show.
APPENDIX B
CALCULATIONS FOR CHAPTER 4
We prove that the four properties of the EM algorithm introduced in
section 4.2.1 do indeed hold. These proofs are essentially those of Dempster
et al. (1977) and Little and Rubin (1986).
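Property 1. If $\theta^{(m)}$ and $\theta^{(m+1)}$ are the mth and (m+1)st iterate estimates
obtained via the EM algorithm then
\[
\ell_Y(\theta^{(m+1)}; y) \geq \ell_Y(\theta^{(m)}; y),
\]
i.e. the log likelihood does not decrease at any iteration.
Proof: As in section 4.2.1, we write the incomplete data log likelihood as
\[
\ell_Y(\theta; y) = Q(\theta, \theta^{(m)}; y) - H(\theta, \theta^{(m)}; y).
\]
Now, by Jensen's inequality, $H(\theta, \theta^{(m)}; y) \leq H(\theta^{(m)}, \theta^{(m)}; y)$ for all $\theta$. This follows
since
\[
H(\theta^{(m)}, \theta^{(m)}; y) = E_{\theta^{(m)}}\big(\log f_{X|Y}(X; \theta^{(m)}) \mid Y = y\big)
\]
\[
= \int \log f_{X|Y}(x; \theta^{(m)})\, f_{X|Y}(x; \theta^{(m)})\, d\nu
\]
\[
= \int \Big\{\log\Big(\frac{f_{X|Y}(x; \theta^{(m)})}{f_{X|Y}(x; \theta)}\Big) + \log f_{X|Y}(x; \theta)\Big\} f_{X|Y}(x; \theta^{(m)})\, d\nu
\]
\[
= \int \log\Big(\frac{f_{X|Y}(x; \theta^{(m)})}{f_{X|Y}(x; \theta)}\Big) f_{X|Y}(x; \theta^{(m)})\, d\nu + H(\theta, \theta^{(m)}; y)
\]
\[
\geq H(\theta, \theta^{(m)}; y), \qquad (B.1)
\]
where the last inequality holds since the 'log' function is concave, whereby
Jensen's inequality tells us that
\[
\int \log\Big(\frac{f_{X|Y}(x; \theta^{(m)})}{f_{X|Y}(x; \theta)}\Big) f_{X|Y}(x; \theta^{(m)})\, d\nu
= -\int \log\Big(\frac{f_{X|Y}(x; \theta)}{f_{X|Y}(x; \theta^{(m)})}\Big) f_{X|Y}(x; \theta^{(m)})\, d\nu
\geq -\log \int f_{X|Y}(x; \theta)\, d\nu = -\log 1 = 0.
\]
Now, equation (B.1) holds at $\theta = \theta^{(m+1)}$, i.e.
\[
H(\theta^{(m+1)}, \theta^{(m)}; y) \leq H(\theta^{(m)}, \theta^{(m)}; y).
\]
Therefore,
\[
\ell_Y(\theta^{(m+1)}; y) = Q(\theta^{(m+1)}, \theta^{(m)}; y) - H(\theta^{(m+1)}, \theta^{(m)}; y)
\]
\[
\geq Q(\theta^{(m+1)}, \theta^{(m)}; y) - H(\theta^{(m)}, \theta^{(m)}; y)
\]
\[
\geq Q(\theta^{(m)}, \theta^{(m)}; y) - H(\theta^{(m)}, \theta^{(m)}; y)
\]
\[
= \ell_Y(\theta^{(m)}; y),
\]
where the second inequality follows since $\theta^{(m+1)}$ is defined to be that value of
$\theta$ that maximizes the function $Q(\theta, \theta^{(m)}; y)$. Hence we have shown that
$\ell_Y(\theta^{(m+1)}; y) \geq \ell_Y(\theta^{(m)}; y)$.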
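Property 2: The sequence of EM iterates $\{\theta^{(m)}, m \ge 1\}$ satisfies, whenever
$\theta^{(m)}$ converges to $\theta^*$ as $m \to \infty$,
$$\frac{\partial}{\partial\theta}\ell_Y(\theta; y)\Big|_{\theta^*} = S_Y(\theta^*; y) = 0,$$
i.e. the estimates converge to a zero of the score vector for $Y$.
Proof: Using Property 3, we can write the score vector for the incomplete
data as
$$S_Y(\theta^{(m)}; y) = \frac{\partial}{\partial\theta}Q(\theta, \theta^{(m)}; y)\Big|_{\theta^{(m)}}.$$
But this implies that
$$\lim_{m\to\infty} S_Y(\theta^{(m)}; y) = 0, \qquad \text{(B.2)}$$
since
$$\begin{aligned}
S_Y(\theta^{(m)}; y) &= \frac{\partial}{\partial\theta}Q(\theta, \theta^{(m)}; y)\Big|_{\theta^{(m+1)}} + \left[\frac{\partial}{\partial\theta}Q(\theta, \theta^{(m)}; y)\Big|_{\theta^{(m)}} - \frac{\partial}{\partial\theta}Q(\theta, \theta^{(m)}; y)\Big|_{\theta^{(m+1)}}\right] \\
&= 0 + o(1; m \to \infty)
\end{aligned}$$
since, by definition of $\theta^{(m+1)}$,
$$\frac{\partial}{\partial\theta}Q(\theta, \theta^{(m)}; y)\Big|_{\theta^{(m+1)}} = 0,$$
and because as $\theta^{(m)} - \theta^{(m+1)}$ goes to zero the bracketed difference
goes to zero. But by convergence properties of the EM algorithm
$\theta^{(m)} - \theta^{(m+1)} \to 0$ as $m \to \infty$. Thus equation (B.2) holds and is tantamount to
$$\frac{\partial}{\partial\theta}\ell_Y(\theta; y)\Big|_{\theta^*} = 0.$$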
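Property 3: For any $\theta_0$,
$$\frac{\partial}{\partial\theta}Q(\theta, \theta_0; y)\Big|_{\theta_0} = S_Y(\theta_0; y) = E_{\theta_0}(S_X(\theta_0; X) \mid Y = y).$$
Proof:
$$\begin{aligned}
\frac{\partial}{\partial\theta}Q(\theta, \theta_0; y)\Big|_{\theta_0}
&= E_{\theta_0}\left(\frac{\partial}{\partial\theta}\ell_X(\theta; X)\Big|_{\theta_0} \,\Big|\, Y = y\right) \\
&= E_{\theta_0}(S_X(\theta_0; X) \mid Y = y) \\
&= \int_R S_X(\theta_0; x)\, f_{X|Y}(x; y, \theta_0)\,d\nu \\
&= \int_R \left(\frac{\partial}{\partial\theta}\log f_X(x; \theta)\right)\frac{f_X(x; \theta)}{\int_R f_X(x; \theta)\,d\nu}\,d\nu\,\Big|_{\theta_0} \\
&= \frac{\int_R \frac{\partial}{\partial\theta}f_X(x; \theta)\,d\nu}{\int_R f_X(x; \theta)\,d\nu}\Big|_{\theta_0} \\
&= \frac{\partial}{\partial\theta}\left(\log\int_R f_X(x; \theta)\,d\nu\right)\Big|_{\theta_0} \\
&= \frac{\partial}{\partial\theta}\log f_Y(y; \theta)\Big|_{\theta_0} = S_Y(\theta_0; y).
\end{aligned}$$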
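Property 4: For any $\theta_0$,
$$I_Y(\theta_0; y) = E_{\theta_0}(I_X(\theta_0; X) \mid Y = y) - \operatorname{var}_{\theta_0}(S_X(\theta_0; X) \mid Y = y).$$
Proof: Since the observed information matrix is the negative Hessian of
the log likelihood, we have that
$$\begin{aligned}
I_Y(\theta; y) &= -\frac{\partial^2\ell_Y(\theta; y)}{\partial\theta'\partial\theta} \\
&= -\frac{\partial^2 Q(\theta, \theta_0; y)}{\partial\theta'\partial\theta} + \frac{\partial^2 H(\theta, \theta_0; y)}{\partial\theta'\partial\theta} \\
&= E_{\theta_0}\left(-\frac{\partial^2\ell_X(\theta; X)}{\partial\theta'\partial\theta} \,\Big|\, Y = y\right) - E_{\theta_0}\left(-\frac{\partial^2\ell_{X|Y}(\theta; X, y)}{\partial\theta'\partial\theta} \,\Big|\, Y = y\right) \\
&= E_{\theta_0}(I_X(\theta; X) \mid Y = y) - E_{\theta_0}(I_{X|Y}(\theta; y, X) \mid Y = y).
\end{aligned}$$
But
$$\begin{aligned}
E_{\theta_0}(I_{X|Y}(\theta_0; y, X) \mid Y = y)
&= E_{\theta_0}\left(-\frac{\partial^2\ell_{X|Y}(\theta; X, y)}{\partial\theta'\partial\theta}\Big|_{\theta_0} \,\Big|\, Y = y\right) \\
&= E_{\theta_0}\left(\frac{\partial}{\partial\theta}\ell_{X|Y}(\theta; X, y)\,\frac{\partial}{\partial\theta'}\ell_{X|Y}(\theta; X, y)\Big|_{\theta_0} \,\Big|\, Y = y\right) \\
&= E_{\theta_0}\left([S_X(\theta_0; X) - S_Y(\theta_0; y)][S_X(\theta_0; X) - S_Y(\theta_0; y)]' \mid Y = y\right) \\
&= E_{\theta_0}\left([S_X(\theta_0; X) - E_{\theta_0}(S_X(\theta_0; X) \mid Y = y)][S_X(\theta_0; X) - E_{\theta_0}(S_X(\theta_0; X) \mid Y = y)]' \mid Y = y\right) \\
&= E_{\theta_0}(S_X(\theta_0; X)S_X'(\theta_0; X) \mid Y = y) - E_{\theta_0}(S_X(\theta_0; X) \mid Y = y)E_{\theta_0}(S_X'(\theta_0; X) \mid Y = y) \\
&= \operatorname{var}_{\theta_0}(S_X(\theta_0; X) \mid Y = y).
\end{aligned}$$
Hence
$$I_Y(\theta_0; y) = E_{\theta_0}(I_X(\theta_0; X) \mid Y = y) - \operatorname{var}_{\theta_0}(S_X(\theta_0; X) \mid Y = y),$$
which is what we set out to show.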
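Theorem: If the complete data vector $X$ has distribution in the regular
exponential family, i.e. the density function has form
$$f_X(x; \beta) = a(x)\exp(T'(x)\beta - c(\beta)) \qquad \text{(B.3)}$$
with respect to some measure, then the EM algorithm can be used to find the
MLE of $\beta$ based on incomplete data $Y = Y(X)$ and the algorithm is as stated
in (4.2.11).
Proof: Sundberg (1976) shows that the EM algorithm can be used to find
the ML estimates of $\beta$ based on incomplete data. We will show that the
general EM algorithm of (4.2.3) reduces to (4.2.11) when the complete data
have distribution in the regular exponential family.
The general EM algorithm (4.2.3) is defined as
$$Q(\beta^{(m+1)}, \beta^{(m)}; y) = \max_{\beta} Q(\beta, \beta^{(m)}; y)$$
where
$$Q(\beta, \beta^{(m)}; y) = E_{\beta^{(m)}}(\ell_X(\beta; X) \mid Y = y).$$
Now since $X$ has density of form (B.3), it follows that the log likelihood
$\ell_X(\beta; X)$ has form
$$\ell_X(\beta; X) = \log a(X) + T'(X)\beta - c(\beta).$$
Hence,
$$Q(\beta, \beta^{(m)}; y) = E_{\beta^{(m)}}(\log a(X) \mid Y = y) + E_{\beta^{(m)}}(T'(X) \mid Y = y)\beta - c(\beta).$$
Now, since
$$\frac{\partial^2 Q(\beta, \beta^{(m)}; y)}{\partial\beta'\partial\beta} = -\frac{\partial^2 c(\beta)}{\partial\beta'\partial\beta} = -\operatorname{var}_{\beta}(T(X))$$
is negative definite, it follows that the solution, say $\beta^{(m+1)}$, to
$$\frac{\partial}{\partial\beta}Q(\beta, \beta^{(m)}; y) = 0$$
is the value of $\beta$ that maximizes the function $Q(\beta, \beta^{(m)}; y)$. But
$$\begin{aligned}
\frac{\partial}{\partial\beta}Q(\beta, \beta^{(m)}; y) &= E_{\beta^{(m)}}(T(X) \mid Y = y) - \frac{\partial c(\beta)}{\partial\beta} \\
&= E_{\beta^{(m)}}(T(X) \mid Y = y) - E_{\beta}(T(X)).
\end{aligned}$$
Hence $\beta^{(m+1)}$ satisfies
$$E_{\beta^{(m+1)}}(T(X)) = E_{\beta^{(m)}}(T(X) \mid Y = y)$$
which is tantamount to showing the equivalence of the two iterative schemes
(4.2.3) and (4.2.11).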
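We differentiate the score vector of equation (4.3.5) to obtain an explicit
expression for the observed information matrix. Recall that we are to show
that the information matrix can be expressed as in (4.3.9), viz.
$$I_Y(\beta; y) = Z'D(\mu)L'D\!\left(\frac{y}{(L\mu)^2}\right)LD(\mu)Z - Z'D\!\left(L'\left(\frac{y - L\mu}{L\mu}\right)\right)D(\mu)Z,$$
where ratios and squares of vectors are taken elementwise.
Proof: By equation (4.3.5), we know that the score vector for $Y$ is
$$S_Y(\beta; y) = Z'D(\mu)L'\left(\frac{y - L\mu}{L\mu}\right).$$
Now
$$\frac{\partial S_Y(\beta; y)}{\partial\beta'} = \left(\frac{\partial S_Y(\beta; y)}{\partial\mu'}\right)\left(\frac{\partial\mu}{\partial\beta'}\right),$$
where
$$\frac{\partial\mu}{\partial\beta'} = \frac{\partial\exp(Z\beta)}{\partial\beta'} = D(\mu)Z.$$
In the following, denote the n x 1 elementary vectors by $e_i$. That is,
$$e_i' = (0, 0, \ldots, 0, 1, 0, \ldots, 0, 0),$$
where the '1' is in the $i$th position.
We set out to find the derivative of the score vector with respect to $\mu$.
Writing $D(\mu) = \sum_i \mu_i e_i e_i'$ and applying the product rule, it is
$$\begin{aligned}
\frac{\partial S_Y(\beta; y)}{\partial\mu'} &= \frac{\partial}{\partial\mu'}\left[Z'D(\mu)L'\left(\frac{y - L\mu}{L\mu}\right)\right] \\
&= Z'\sum_i e_i e_i' L'\left(\frac{y - L\mu}{L\mu}\right)e_i' + Z'D(\mu)L'\frac{\partial}{\partial\mu'}\left(\frac{y - L\mu}{L\mu}\right) \\
&= Z'D\!\left(L'\left(\frac{y - L\mu}{L\mu}\right)\right) - Z'D(\mu)L'D\!\left(\frac{y}{(L\mu)^2}\right)L.
\end{aligned}$$
Therefore,
$$\begin{aligned}
I_Y(\beta; y) = -\frac{\partial S_Y(\beta; y)}{\partial\beta'} &= -\left(\frac{\partial S_Y(\beta; y)}{\partial\mu'}\right)\left(\frac{\partial\mu}{\partial\beta'}\right) \\
&= Z'D(\mu)L'D\!\left(\frac{y}{(L\mu)^2}\right)LD(\mu)Z - Z'D\!\left(L'\left(\frac{y - L\mu}{L\mu}\right)\right)D(\mu)Z,
\end{aligned}$$
which is what we set out to show.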
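The expression for the observed information can be checked numerically. The following is a minimal sketch, not part of em.loglin: it codes the score of (4.3.5) and the information of (4.3.9) as written above and compares the latter with a central-difference derivative of the former. The matrices `Z` and `L`, the counts `y`, the value of `beta`, and all function names are illustrative assumptions.

```python
import math

def means(Z, beta):
    """mu = exp(Z beta) for the loglinear model log mu = Z beta."""
    return [math.exp(sum(z * b for z, b in zip(row, beta))) for row in Z]

def score(Z, L, y, beta):
    """S_Y = Z' D(mu) L' ((y - L mu) / (L mu)), with elementwise division."""
    mu = means(Z, beta)
    n, m = len(mu), len(L)
    Lmu = [sum(L[j][i] * mu[i] for i in range(n)) for j in range(m)]
    g = [sum(L[j][i] * (y[j] - Lmu[j]) / Lmu[j] for j in range(m)) for i in range(n)]
    return [sum(Z[i][a] * mu[i] * g[i] for i in range(n)) for a in range(len(beta))]

def information(Z, L, y, beta):
    """I_Y = Z'D(mu)L'D(y/(L mu)^2)L D(mu)Z - Z'D(L'((y - L mu)/(L mu)))D(mu)Z."""
    mu = means(Z, beta)
    n, m, p = len(mu), len(L), len(beta)
    Lmu = [sum(L[j][i] * mu[i] for i in range(n)) for j in range(m)]
    g = [sum(L[j][i] * (y[j] - Lmu[j]) / Lmu[j] for j in range(m)) for i in range(n)]
    # B = L D(mu) Z, an m x p matrix
    B = [[sum(L[j][i] * mu[i] * Z[i][a] for i in range(n)) for a in range(p)]
         for j in range(m)]
    info = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(p):
            first = sum(y[j] / Lmu[j] ** 2 * B[j][a] * B[j][b] for j in range(m))
            second = sum(Z[i][a] * g[i] * mu[i] * Z[i][b] for i in range(n))
            info[a][b] = first - second
    return info

def numeric_information(Z, L, y, beta, h=1e-6):
    """Minus the central-difference derivative of the score vector."""
    p = len(beta)
    out = [[0.0] * p for _ in range(p)]
    for b in range(p):
        up = list(beta)
        up[b] += h
        dn = list(beta)
        dn[b] -= h
        s_up, s_dn = score(Z, L, y, up), score(Z, L, y, dn)
        for a in range(p):
            out[a][b] = -(s_up[a] - s_dn[a]) / (2.0 * h)
    return out

# Hypothetical example: five complete-data cells observed only through three disjoint sums.
Z = [[1, 0], [1, 1], [1, 0], [1, 1], [1, 2]]
L = [[1, 1, 0, 0, 0], [0, 0, 1, 1, 0], [0, 0, 0, 0, 1]]
y = [7.0, 5.0, 9.0]
beta = [0.5, 0.2]

exact = information(Z, L, y, beta)
approx = numeric_information(Z, L, y, beta)
max_gap = max(abs(exact[a][b] - approx[a][b]) for a in range(2) for b in range(2))
```

Under these assumptions `max_gap` should be at the level of finite-difference error, which supports the sign and placement of the two terms in the expression.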
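Using the delta method we can find the asymptotic variance of $\hat\pi$. The
expression for the asymptotic variance involves the matrix $\partial\pi/\partial\beta'$. We show
that equation (4.3.16) holds, i.e.
$$\frac{\partial\pi}{\partial\beta'} = \frac{[D(\mu)(1_n'\mu) - \mu\mu']X}{(1_n'\mu)^2} = [D(\pi) - \pi\pi']X.$$
Proof: From (4.3.15) we have that
$$\mu = \exp(\alpha 1_n + X\beta) = N\pi,$$
or equivalently that
$$\pi = \frac{\mu}{N} = \frac{\exp(X\beta)}{1_n'\exp(X\beta)}.$$
Here $\beta$ is an unconstrained parameter vector of length $p$. Notice that $1_n'\mu = N$
and hence,
$$\begin{aligned}
\frac{\partial\pi}{\partial\beta'} &= \frac{\partial}{\partial\beta'}\left(\frac{\exp(X\beta)}{1_n'\exp(X\beta)}\right) = \frac{\partial}{\partial\beta'}\left(\frac{\mu}{1_n'\mu}\right) \\
&= \frac{1}{1_n'\mu}\frac{\partial\mu}{\partial\beta'} + \mu\,\frac{\partial}{\partial\beta'}\left(\frac{1}{1_n'\mu}\right) \\
&= \frac{1}{1_n'\mu}D(\mu)X - \frac{1}{(1_n'\mu)^2}\mu 1_n'D(\mu)X \\
&= \frac{[D(\mu)(1_n'\mu) - \mu\mu']X}{(1_n'\mu)^2} \\
&= [D(\pi) - \pi\pi']X.
\end{aligned}$$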
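As a quick numerical illustration of (4.3.16), the sketch below compares the closed form $[D(\pi) - \pi\pi']X$ with a central-difference derivative of $\pi(\beta) = \exp(X\beta)/1_n'\exp(X\beta)$. The design matrix `X`, the value of `beta`, and the function names are assumptions made for the example only.

```python
import math

def probs(X, beta):
    """pi = exp(X beta) / (1' exp(X beta)) for a list-of-rows matrix X."""
    eta = [sum(x * b for x, b in zip(row, beta)) for row in X]
    top = max(eta)  # subtracting the maximum leaves pi unchanged and avoids overflow
    w = [math.exp(e - top) for e in eta]
    total = sum(w)
    return [wi / total for wi in w]

def jacobian_formula(X, beta):
    """[D(pi) - pi pi'] X, whose (i, j) entry is pi_i * (x_ij - sum_k pi_k x_kj)."""
    pi = probs(X, beta)
    n, p = len(X), len(beta)
    xbar = [sum(pi[k] * X[k][j] for k in range(n)) for j in range(p)]
    return [[pi[i] * (X[i][j] - xbar[j]) for j in range(p)] for i in range(n)]

def jacobian_numeric(X, beta, eps=1e-6):
    """Central-difference approximation to d pi / d beta'."""
    n, p = len(X), len(beta)
    J = [[0.0] * p for _ in range(n)]
    for j in range(p):
        up = list(beta)
        up[j] += eps
        dn = list(beta)
        dn[j] -= eps
        hi, lo = probs(X, up), probs(X, dn)
        for i in range(n):
            J[i][j] = (hi[i] - lo[i]) / (2.0 * eps)
    return J

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
beta = [0.3, -0.7]
A = jacobian_formula(X, beta)
N = jacobian_numeric(X, beta)
max_gap = max(abs(a - b) for ra, rb in zip(A, N) for a, b in zip(ra, rb))
column_sums = [sum(row[j] for row in A) for j in range(2)]
```

Because the entries of $\pi$ sum to one, each column of the derivative matrix sums to zero, which `column_sums` confirms up to rounding.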
DESCRIPTION OF COMPUTER PROGRAMS
em.loglin. Briefly, em.loglin is a FORTRAN program that can be used to
obtain ML estimates of loglinear parameters as well as an estimate of their
precision when only disjoint sums of the complete Poisson data are observable.
The EM algorithm (4.3.7) is used to find the ML estimates and expression
(4.3.9) is used to calculate the precision estimate. It is assumed that the
complete data $X$ are distributed product Poisson with n x 1 mean vector
$\mu$ following the loglinear model $\log\mu = Z\beta$. The incomplete data must be
expressible as $Y = LX$ where $L$ is an m x n matrix that satisfies properties
(1)-(3) of (4.3.1). The user must input the following information:
(1) an initial estimate $\mu^{(0)}$ of the complete data means that satisfies the
equation $L\mu^{(0)} = y$, i.e. $\mu^{(0)}$ is consistent with the observed data $y$
(2) m and n, the lengths of the observed and complete data vectors
(3) p, the number of loglinear parameters
(4) $Z$, the n x p full column rank design matrix
(5) $L$, the m x n matrix that satisfies $Y = LX$.
The output includes
(1) $\hat\beta$, an ML estimate of the loglinear parameter vector $\beta$
(2) var($\hat\beta$), an estimate of precision of the ML estimate
(3) $G^2$, the likelihood ratio goodness-of-fit statistic
(4) df, the degrees of freedom associated with the null asymptotic
Chi-square distribution of $G^2$
(5) $\hat\mu$, an estimate of the complete data cell means
(6) var($\hat\mu$), an estimate of the precision of $\hat\mu$ (Poisson sampling)
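The following Python sketch illustrates the kind of computation em.loglin performs; it is not a transcription of the FORTRAN program or of (4.3.7). It assumes that $L$ has 0/1 entries with exactly one 1 in each column, so that the E-step is $E(X_i \mid Y = y) = \mu_i y_j/(L\mu)_j$ for the sum $j$ containing cell $i$, and it carries out the M-step by Newton-Raphson on the complete-data Poisson loglinear likelihood. The data, the starting value, and every function name are hypothetical.

```python
import math

def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def means(Z, beta):
    """mu = exp(Z beta)."""
    return [math.exp(sum(z * b for z, b in zip(row, beta))) for row in Z]

def observed_loglik(L, y, mu):
    """Poisson log likelihood of y = L x, dropping the constant log(y!) terms."""
    fitted = [sum(l * m for l, m in zip(row, mu)) for row in L]
    return sum(yj * math.log(fj) - fj for yj, fj in zip(y, fitted))

def e_step(L, y, mu):
    """Conditional means of the complete data given the observed sums."""
    fitted = [sum(l * m for l, m in zip(row, mu)) for row in L]
    return [mu[i] * sum(L[j][i] * y[j] / fitted[j] for j in range(len(L)))
            for i in range(len(mu))]

def m_step(Z, x, beta, tol=1e-12, max_iter=50):
    """Maximize the complete-data Poisson log likelihood in beta by Newton-Raphson."""
    p, n = len(beta), len(x)
    for _ in range(max_iter):
        mu = means(Z, beta)
        grad = [sum(Z[i][a] * (x[i] - mu[i]) for i in range(n)) for a in range(p)]
        info = [[sum(Z[i][a] * mu[i] * Z[i][b] for i in range(n)) for b in range(p)]
                for a in range(p)]
        step = solve(info, grad)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def em(Z, L, y, beta, tol=1e-12, max_iter=1000):
    """Alternate E- and M-steps, recording the observed-data log likelihood."""
    trace = [observed_loglik(L, y, means(Z, beta))]
    for _ in range(max_iter):
        x_hat = e_step(L, y, means(Z, beta))
        beta = m_step(Z, x_hat, beta)
        trace.append(observed_loglik(L, y, means(Z, beta)))
        if trace[-1] - trace[-2] < tol:
            break
    return beta, trace

# Hypothetical example: five complete-data cells observed through three disjoint sums.
Z = [[1, 0], [1, 1], [1, 0], [1, 1], [1, 2]]
L = [[1, 1, 0, 0, 0], [0, 0, 1, 1, 0], [0, 0, 0, 0, 1]]
y = [7.0, 5.0, 9.0]
beta_hat, trace = em(Z, L, y, [math.log(sum(y) / 5), 0.0])
mu_hat = means(Z, beta_hat)
```

The list `trace` should be non-decreasing, in line with Property 1, and because the model contains an intercept the fitted complete-data means sum to the observed total after every M-step.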
NLIN. NLIN is an S-plus (Becker et al., 1989) program that fits generalized
linear and nonlinear models to data with distributions in the exponential
dispersion family (Jørgensen, 1989). We now briefly describe exponential
dispersion models and how to fit them.
A General Algorithm For Fitting Generalized Linear-Nonlinear Models. Let
$$Y_1, \ldots, Y_n \sim \text{ind } ED(\mu_i, \sigma^2, w_i),$$
i.e. the density function for the random variable $Y_i$ has form
$$f(y_i) = a(y_i, \sigma^2, w_i)\exp\left\{\frac{w_i}{\sigma^2}(\theta_i y_i - \kappa(\theta_i))\right\},$$
where $\kappa(\theta)$ is the cumulant function and $\kappa'(\theta_i) \equiv \tau(\theta_i) = \mu_i$.
Suppose that each mean can be expressed as an invertible function of
some covariate vector and a p x 1 parameter vector, i.e. $\mu_i = h_i(x_i, \beta)$,
$i = 1, \ldots, n$. Some examples are
(1) $\mu_i = x_i'\beta$, Linear Model, Identity Link
(2) $\mu_i = \exp(x_i'\beta)$, Linear Model, Log Link
(3) $\mu_i = \exp(x_i'\beta)/(1 + \exp(x_i'\beta))$, Linear Model, Logit Link
(4) $\mu_i = L_i'\exp(X\beta)$, Nonlinear Model, Log Link
Example (4) is nonlinear when the matrix $L$ is some m x n (m < n) matrix
satisfying (4.3.1) and is not the identity matrix. Note that $L_i'$ is the $i$th row
of the matrix $L$. In fact, the matrix $L$ can be chosen so that the Poisson
loglinear latent class models are a special case of example (4).
Letting the vector $h = (h_1, \ldots, h_n)'$ and the symbol ED represent a
particular exponential dispersion distribution, we say that {ED, h} specifies
a generalized linear-nonlinear model. As a special case, suppose that each $h_i$
has a common inverse $g$ such that
$$h_i^{-1}(\mu_i) = g(\mu_i), \quad i = 1, \ldots, n.$$
We say that the triple
$$\{ED,\ \eta_i = g(\mu_i),\ \eta_i = x_i'\beta\} \qquad \text{(B.4)}$$
specifies a generalized linear model (GLM) (McCullagh and Nelder, 1989).
In GLM parlance, the function $g$ is known as the 'link' function. Examples
include
(1) Poisson Loglinear Model:
$\{\text{Poisson}(\mu),\ \eta = \log(\mu),\ \eta_i = x_i'\beta\}$
(2) Binomial Logistic Model:
$\{\text{Binomial}(n, \pi),\ \eta = \log\frac{\pi}{1 - \pi},\ \eta_i = x_i'\beta\}$
(3) Normal Linear Model:
$\{\text{Normal}(\mu, \Sigma),\ \eta = \mu,\ \eta_i = x_i'\beta\}$
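Maximizing the Likelihood. Our objective is to make inference about the
loglinear parameters in $\beta$ and hence about the means $\mu_i$. We will base our
inference on the maximum likelihood estimates and their precision. Therefore,
we must maximize the log likelihood with respect to $\beta$. The log likelihood for
the sample $Y$ is
$$\ell(\beta; y) = \sum_i \log a(y_i, \sigma^2, w_i) + \frac{1}{\sigma^2}\sum_i w_i(\theta_i y_i - \kappa(\theta_i)),$$
where $\kappa'(\theta_i) = \mu_i = h_i(x_i', \beta)$.
The score function is
$$\begin{aligned}
s(\beta; y) = \frac{\partial\ell(\beta; y)}{\partial\beta}
&= \frac{1}{\sigma^2}\sum_i w_i\left(\frac{\partial\theta_i}{\partial\beta}\right)(y_i - \mu_i) \\
&= \frac{1}{\sigma^2}\sum_i w_i\left(\frac{\partial\mu_i}{\partial\beta}\right)\left(\frac{\partial\theta_i}{\partial\mu_i}\right)(y_i - \mu_i) \\
&= \frac{1}{\sigma^2}\sum_i w_i\left(\frac{\partial\mu_i}{\partial\beta}\right)V_i^{-1}(y_i - \mu_i) \\
&= \frac{1}{\sigma^2}\left(\frac{\partial\mu'}{\partial\beta}\right)WV^{-1}S \\
&= \frac{1}{\sigma^2}D'WV^{-1}S, \qquad \text{(B.5)}
\end{aligned}$$
where
$$S = y - \mu, \qquad W = \oplus_i w_i, \qquad V = \oplus_i V_i \ \text{ with } V_i = \partial\mu_i/\partial\theta_i, \qquad D = \frac{\partial\mu}{\partial\beta'}.$$
Here the matrix $D$ is referred to as the 'model matrix'. The maximum
likelihood estimate may be found by solving for a zero of the score function
(B.5) (at least in many cases). To solve for this zero, we will use a
Newton-Raphson type algorithm which will require calculation of the Hessian matrix.
Differentiating (B.5) by the product rule gives
$$\begin{aligned}
\frac{\partial^2\ell(\beta; y)}{\partial\beta'\partial\beta} &= \frac{\partial}{\partial\beta'}\left[\frac{1}{\sigma^2}D'WV^{-1}S\right] \\
&= -\frac{1}{\sigma^2}D'(V^{-1}WD + Z) \\
&= -\frac{1}{\sigma^2}(D'V^{-1}WD + D'Z)
\end{aligned}$$
where $E(Z) = 0$ so that the expected value of the Hessian is
$$E\left(\frac{\partial^2\ell(\beta; y)}{\partial\beta'\partial\beta}\right) = -\frac{1}{\sigma^2}D'V^{-1}WD.$$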
Therefore, for $\beta^{(k)}$ in a neighborhood of $\hat\beta$, the solution to the score equation,
we have the following linear approximation
$$\begin{aligned}
\frac{\partial\ell(\beta^{(k+1)}; y)}{\partial\beta} &\approx \frac{\partial\ell(\beta^{(k)}; y)}{\partial\beta} + \frac{\partial^2\ell(\beta^{(k)}; y)}{\partial\beta'\partial\beta}(\beta^{(k+1)} - \beta^{(k)}) \\
&\approx \frac{1}{\sigma^2}\left[D'WV^{-1}S - D'WV^{-1}D(\beta^{(k+1)} - \beta^{(k)})\right] \\
&\equiv L(\beta^{(k+1)}).
\end{aligned}$$
The next estimate of $\beta$ will be $\beta^{(k+1)}$, the solution to the linear equation
$L(\beta^{(k+1)}) = 0$. The solution is
$$\begin{aligned}
\beta^{(k+1)} &= \beta^{(k)} + (D'WV^{-1}D)^{-1}D'WV^{-1}S \\
&= (D'WV^{-1}D)^{-1}D'WV^{-1}(D\beta^{(k)} + S) \qquad \text{(B.6)} \\
&= (D'WV^{-1}D)^{-1}D'WV^{-1}\tilde y^{(k)}
\end{aligned}$$
where $\tilde y^{(k)} = D\beta^{(k)} + S$ is a 'local' dependent variable.
The iterative algorithm (B.6), which is the Fisher-scoring algorithm, is
also referred to as the iteratively reweighted least squares algorithm (IRLS).
The reason for this label is evidently due to the last expression in (B.6).
For each $k$, it resembles a weighted least squares estimate, where the weight
matrix is $WV^{-1}$, the model matrix is $D$, and the dependent variable is $\tilde y^{(k)}$.
Denoting the ML estimate by $\hat\beta$, we have that in many situations
$$\hat\beta \sim AN(\beta, \sigma^2(D'WV^{-1}D)^{-1}),$$
i.e. $\hat\beta$ has an asymptotic normal distribution. Also, we let $\hat\sigma^2$ be a
consistent estimator of the dispersion parameter $\sigma^2$. For example, dividing
the deviance statistic by the degrees of freedom associated with its asymptotic
distribution results in a consistent estimator of $\sigma^2$ (Jørgensen, 1989).
By evaluating $D$ and $V$ at $\hat\beta$ and using the consistent estimator $\hat\sigma^2$, we
can consistently estimate the asymptotic variance of $\hat\beta$ by
$$\widehat{\operatorname{var}}(\hat\beta) \approx \hat\sigma^2(\hat D'W\hat V^{-1}\hat D)^{-1}.$$
The astute reader will notice that, upon specification of the exponential
dispersion distribution, the matrix $V$ is determined. Also, the matrix $W$
is a matrix of known constants. Hence, the only matrix not determined as
yet is $D$, the so-called 'model matrix'. The matrix $D$ is a function of $\beta$ and
$X$ through the following function
$$D = \frac{\partial\mu}{\partial\beta'} = \frac{\partial}{\partial\beta'}h(X, \beta).$$
When the model is of the form (B.4), i.e. the model is a GLM, we have that
the model matrix
$$D = \frac{\partial\mu}{\partial\beta'} = \left(\frac{\partial\mu}{\partial\eta'}\right)\left(\frac{\partial\eta}{\partial\beta'}\right) = \left(\frac{\partial\eta}{\partial\mu'}\right)^{-1}X$$
and can be calculated explicitly. But, more generally, when the model is
{ED, h}, $D$ cannot be calculated explicitly or at least is very difficult to
calculate explicitly. However, it can be numerically estimated.
Numerical Approximation to D. We use a popular and simple technique to
numerically approximate $D$. Recall that $D$ is the matrix of partial derivatives
of $\mu$ with respect to $\beta$. Hence, the problem is to approximate a derivative
matrix. One such estimate, and the one used in the program NLIN, is
$$D \approx D_N = [\mu(\beta + \epsilon_1) - \mu(\beta - \epsilon_1), \ldots, \mu(\beta + \epsilon_p) - \mu(\beta - \epsilon_p)]E^{-1} \qquad \text{(B.7)}$$
where $\epsilon_i = (0, \ldots, 0, \epsilon, 0, \ldots, 0)'$ is a p x 1 vector with the small constant $\epsilon$ in
the $i$th position and the matrix $E$ is a p x p diagonal matrix with $2\epsilon$ on the
diagonal. Thus $E = 2\epsilon I_p = 2[\epsilon_1, \ldots, \epsilon_p]$.
Now the IRLS algorithm will involve just one additional step and that is
to calculate a numerical approximation to the model matrix $D$. The actual
algorithm used in NLIN is
(1) Input $y$, $w$, $\mu = h(X, \beta)$, $V(\mu)$, and the deviance function $Dev(y, w, \mu)$
(2) Find an initial estimate $\beta^{(0)}$ of $\beta$
(3) Compute $D = D_N(\beta^{(m)})$, $V = V(\beta^{(m)})$, and
$S = y - \mu(\beta^{(m)})$ (B.8)
(4) Compute $\beta^{(m+1)} = (D'WV^{-1}D)^{-1}D'WV^{-1}(D\beta^{(m)} + S)$
(5) Compute $Dev(y, w, \mu(\beta^{(m+1)}))$
(6) If $|Dev(y, w, \mu(\beta^{(m+1)})) - Dev(y, w, \mu(\beta^{(m)}))| > TOL$, replace $\beta^{(m)}$
by $\beta^{(m+1)}$ and go to (3). Else stop.
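The steps of (B.8) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the S-plus source of NLIN: the Poisson log-link example, the variance and deviance functions, the starting value, the tolerance, and all function names are chosen here for the example, and the update in step (4) is taken to be the Fisher-scoring step (B.6).

```python
import math

def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def numeric_D(mean_fn, beta, eps=1e-5):
    """Central-difference approximation D_N to d mu / d beta', as in (B.7)."""
    p = len(beta)
    cols = []
    for j in range(p):
        up = list(beta)
        up[j] += eps
        dn = list(beta)
        dn[j] -= eps
        cols.append([(a - b) / (2.0 * eps) for a, b in zip(mean_fn(up), mean_fn(dn))])
    return [list(row) for row in zip(*cols)]  # transpose to n x p

def fit(y, w, mean_fn, var_fn, dev_fn, beta0, tol=1e-10, max_iter=50):
    """Steps (2)-(6) of (B.8): Fisher scoring with a numerically estimated D."""
    beta = list(beta0)
    n, p = len(y), len(beta)
    dev = dev_fn(y, w, mean_fn(beta))
    for _ in range(max_iter):
        mu = mean_fn(beta)
        D = numeric_D(mean_fn, beta)
        V = var_fn(mu)
        S = [yi - mi for yi, mi in zip(y, mu)]
        # Solve (D'WV^{-1}D) step = D'WV^{-1}S, which is equivalent to (B.6).
        A = [[sum(D[i][a] * w[i] / V[i] * D[i][b] for i in range(n)) for b in range(p)]
             for a in range(p)]
        g = [sum(D[i][a] * w[i] / V[i] * S[i] for i in range(n)) for a in range(p)]
        beta = [bj + sj for bj, sj in zip(beta, solve(A, g))]
        new_dev = dev_fn(y, w, mean_fn(beta))
        converged = abs(new_dev - dev) <= tol
        dev = new_dev
        if converged:
            break
    return beta, dev

# Hypothetical Poisson example with a log link and one covariate.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 4.0, 9.0, 14.0]
w = [1.0] * len(y)

def mean_fn(beta):
    return [math.exp(beta[0] + beta[1] * xi) for xi in x]

def var_fn(mu):
    return list(mu)  # Poisson variance function V(mu) = mu

def poisson_deviance(y, w, mu):
    return 2.0 * sum(wi * ((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi))
                     for yi, wi, mi in zip(y, w, mu))

beta_hat, dev = fit(y, w, mean_fn, var_fn, poisson_deviance,
                    [math.log(sum(y) / len(y)), 0.0])
mu_hat = mean_fn(beta_hat)
score = [sum(yi - mi for yi, mi in zip(y, mu_hat)),
         sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu_hat))]
```

Because the mean function is passed in as an argument, the same loop would apply to a nonlinear mean such as $L\exp(X\beta)$; only `mean_fn`, `var_fn`, and the deviance would change. For this canonical-link example the exact score is $X'(y - \mu)$, and `score` should be numerically zero at convergence.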
Notice that step (1) of (B.8) involves inputting the data, the weights,
the mean function, the variance function, and the corresponding deviance
function. It follows that this program can more generally be used to fit models
via quasi-likelihood methods (McCullagh and Nelder, 1989). Another remark
is worth mentioning. When the model is $\{ED, g(\mu) = \mu, \mu = X\beta\}$, i.e.
a Linear, Identity link model, the numerical approximation $D_N$ of $D$ in (B.7),
which equals $X$, is exact. Specifically, for the Normal Linear Model
the approximation is exactly equal to the model matrix $X$. The argument is
as follows
$$\begin{aligned}
(D_N)_{ij} &= [\mu_i(\beta + \epsilon_j) - \mu_i(\beta - \epsilon_j)]/2\epsilon \\
&= [x_i'(\beta + \epsilon_j) - x_i'(\beta - \epsilon_j)]/2\epsilon \\
&= [x_i'\beta + x_i'\epsilon_j - x_i'\beta + x_i'\epsilon_j]/2\epsilon \\
&= 2x_i'\epsilon_j/2\epsilon = x_i'\epsilon_j/\epsilon \\
&= \epsilon x_{ij}/\epsilon = x_{ij} \\
&= (X)_{ij} = (D)_{ij}.
\end{aligned}$$
Thus, $D_N = X = D$.
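The exactness claim is easy to see numerically. The sketch below, which uses a made-up design matrix and parameter value, applies the central difference of (B.7) to an identity-link mean and to a log-link mean; in floating point "exact" means agreement up to rounding error, whereas the log-link case carries a truncation error of order $\epsilon^2$.

```python
import math

def central_difference(mean_fn, beta, eps=1e-4):
    """Return D_N, the n x p matrix of central differences of mean_fn at beta."""
    cols = []
    for j in range(len(beta)):
        up = list(beta)
        up[j] += eps
        dn = list(beta)
        dn[j] -= eps
        cols.append([(a - b) / (2.0 * eps) for a, b in zip(mean_fn(up), mean_fn(dn))])
    # cols[j][i] holds the difference for mu_i and beta_j; transpose to n x p.
    return [list(row) for row in zip(*cols)]

X = [[1.0, 0.5], [1.0, 1.5], [1.0, -2.0]]  # hypothetical design matrix

def linear_mean(beta):
    """Identity link: mu = X beta."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]

def log_mean(beta):
    """Log link: mu = exp(X beta), whose exact derivative matrix is D(mu) X."""
    return [math.exp(m) for m in linear_mean(beta)]

beta = [0.2, -0.4]
D_lin = central_difference(linear_mean, beta)
D_log = central_difference(log_mean, beta)
mu = log_mean(beta)
err_lin = max(abs(D_lin[i][j] - X[i][j]) for i in range(3) for j in range(2))
err_log = max(abs(D_log[i][j] - mu[i] * X[i][j]) for i in range(3) for j in range(2))
```

Here `err_lin` reflects only rounding, while `err_log` is small but driven by the step size `eps`.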
BIBLIOGRAPHY
Agresti, A. (1984). Analysis of Ordinal Categorical Data. New York: John
Wiley.
Agresti, A. (1989). A Survey of Models for Repeated Ordered Categorical
Response Data. Statistics in Medicine. 8. 1209-1224.
Agresti, A. (1990). Categorical Data Analysis. New York: John Wiley.
Agresti, A. and Lang, J.B. (in press). Quasi-Symmetric Latent Class Models,
with Application to Rater Agreement. Biometrics.
Agresti, A., Lipsitz, S., and Lang, J.B. (in press). Comparing Marginal
Distributions of Large, Sparse Contingency Tables. Computational Statistics
and Data Analysis.
Agresti, A. and Pendergast, J. (1986). Comparing Mean Ranks for Repeated
Measures Data. Communications in Statistics. A15. 1417-1433.
Aitchison, J. and Silvey, S.D. (1958). Maximum-Likelihood Estimation of
Parameters Subject to Restraints. Annals of Mathematical Statistics. 29,
813-828.
Aitchison, J. and Silvey, S.D. (1960). Maximum-Likelihood Estimation
Procedures and Associated Tests of Significance. Journal of the Royal
Statistical Society - B, 1, 154-171.
Bartle, R.G. (1976). The Elements of Real Analysis. 2nd edn. New York:
John Wiley & Sons, Inc.
Becker, M.P. (1989). On the Bivariate Normal Distribution and Association
Models for Ordinal Categorical Data. Statist. Probab. Lett. 8. 435-440.
Becker, M.P. and Balagtas, C.C. (1991). A Log-Nonlinear Model for Binary
Crossover Data. Unpublished Technical Report.
Birch, M.W. (1963). Maximum Likelihood in Three-Way Contingency Tables.
Journal of the Royal Statistical Society. B25. 220-233.
Bishop, Y.M.M., Fienberg, S.E., and Holland, P.W. (1975). Discrete
Multivariate Analysis: Theory and Practice. Cambridge: MIT Press.
Cochran, W.G. (1950). The Comparison of Percentages in Matched Samples.
Biometrika. 37. 256-266.
Cohen, J. (1960). A Coefficient of Agreement for Nominal Scales. Educ.
Psychol. Meas. 20. 37-46.
Conaway, M.R. (1989). Conditional Likelihood Methods for Repeated
Categorical Responses. J. Amer. Statist. Assoc. 84. 53-62.
Conaway, M.R. (1990). A Random Effects Model for Binary Data.
Biometrics. 46. 317-328.
Cox, D.R. (1972). The Analysis of Multivariate Binary Data. Applied
Statistics. 21, 113-120.
Dale, J.R. (1986). Global Cross-Ratio Models for Bivariate, Discrete, Ordered
Responses. Biometrics. 42, 909-917.
Darroch, J.N. (1981). The Mantel-Haenszel Test and Tests of Marginal
Symmetry; Fixed Effects and Mixed Models for a Categorical Response.
International Statistical Review. 49. 285-307.
Das Gupta, S. and Perlman, M.D. (1974). Power of the Noncentral F-test:
Effect of Additional Variates on Hotelling's T2-test. Journal of the
American Statistical Association. 69, 174-180.
Davis, C.S. (1991). Semi-Parametric and Non-Parametric Methods for the
Analysis of Repeated Measurements, with Applications to Clinical Trials.
Unpublished Manuscript.
Dempster, A.P., Laird, N.M., and Rubin, D.B. (1977). Maximum Likelihood
Estimation From Incomplete Data Via the EM Algorithm. J.R. Statist.
Soc., B39, 1-38.
Efron, B. and Hinkley, D.V. (1978). Assessing the Accuracy of the Maximum
Likelihood Estimator: Observed Versus Expected Fisher Information.
Biometrika. 65. 457-488.
Ezzet, F. and Whitehead, J. (1991). A Random Effects Model for Ordinal
Responses From a Crossover Trial. Statistics in Medicine. 10. 901-907.
Foster, M.H., and Martin, M.L. (1966). Probability, Confirmation, and
Simplicity: Readings in the Philosophy of Inductive Logic. New York: The
Odyssey Press, Inc.
Goodman, L.A. (1974). Exploratory latent structure analysis using both
identifiable and unidentifiable models. Biometrika. 61. 215-231.
Goodman, L.A. (1981). Association Models and the Bivariate Normal for
Contingency Tables with Ordered Categories. Biometrika. 68. 347-355.
Gourieroux, C., Monfort, A., and Trognon, A. (1984). Pseudo Maximum
Likelihood Methods: Theory. Econometrica. 52. 681-700.
Grizzle, J.E., Starmer, C.F., and Koch, G.G. (1969). Analysis of Categorical
Data by Linear Models. Biometrics. 25. 489-504.
Haber, M. (1985a). Loglinear Models For Correlated Marginal Totals of a
Contingency Table. Communications in Statistics-Theory and Methods.
14, 2845-2856.
Haber, M. (1985b). Maximum Likelihood Methods for Linear and Log-Linear
Models in Categorical Data. Comp. Stat. & Data Anal. 3. 1-10.
Haber, M. and Brown, M. (1986). Maximum Likelihood Methods for
Log-Linear Models When Expected Frequencies are Subject to Linear
Constraints. Journal of the American Statistical Association. 81, 394,
477-482.
Haberman, S.J. (1978, 1979). Analysis of Qualitative Data. Vols. 1 & 2, New
York: Academic Press.
Haberman, S.J. (1988). A Stabilized Newton-Raphson Algorithm for
Log-Linear Models for Frequency Tables Derived by Indirect Observation, in
Sociological Methodology, ed. C.C. Clogg. San Francisco, CA: Jossey-Bass
Publishers. 193-211.
Hout, M., Duncan, O.D., and Sobel, M.E. (1987). Association and
Heterogeneity: Structural Models of Similarities and Differences, in Sociological
Methodology, ed. C.C. Clogg. San Francisco, CA: Jossey-Bass Publishers.
145-184.
Jørgensen, B. (1989). The Theory of Exponential Dispersion Models and
Analysis of Deviance. Preliminary version of book on generalized linear
models.
Koch, G.G., Amara, I.A., Stokes, M.E., and Gillings, D.B. (1980). Some
Views on Parametric and Non-Parametric Analysis for Repeated
Measurements and Selected Bibliography. International Statistical Review. 48.
249-265.
Laird, N.M. (1991). Topics In Likelihood-Based Methods For Longitudinal
Data Analysis. Statistica Sinica. 1, 33-50.
Laird, N.M., Lange, N., and Stram, D. (1987). Maximum Likelihood
Computations with Repeated Measures: Application of the EM Algorithm.
Journal of the American Statistical Association, 82, 97-105.
Landis, J.R. and Koch, G.G. (1975). A Review of Statistical Methods in the
Analysis of Data Arising from Observer Reliability Studies, Parts I, II.
Statist. Neerlandica. 29. 101-123, 151-161.
Landis, J.R. and Koch, G.G. (1979). The Analysis of Categorical Data in
Longitudinal Studies of Behavioral Development, in Longitudinal
Methodology in the Study of Behavior and Development, eds. J.R. Nesselroade
and P.B. Baltes. New York: Academic Press. 233-261.
Landis, J.R., Miller, M.E., Davis, C.S., and Koch, G.G. (1988). Some General
Methods for the Analysis of Categorical Data in Longitudinal Studies.
Statistics in Medicine. 7. 109-137.
Lang, J.B. (in press). Obtaining the Observed Information Matrix for the
Poisson Loglinear Model with Incomplete Data. Biometrika.
Lehmann, E.L. (1983). Theory of Point Estimation. New York: John Wiley
& Sons, Inc.
Liang, K.Y. and Zeger, S.L. (1986). Longitudinal Data Analysis Using
Generalized Linear Models. Biometrika. 73, 13-22.
Liang, K.Y., Zeger, S.L., and Qaqish, B. (1992). Multivariate Regression
Analyses for Categorical Data (With Discussion). J.R. Statist. Soc. B54.
3-40.
Lipsitz, S.R. (1988). Methods for Analyzing Repeated Categorical Outcomes.
Unpublished PhD Dissertation, Department of Biostatistics, Harvard
University.
Lipsitz, S.R., Laird, N.M., and Harrington, D.P. (1990). Maximum
Likelihood Regression Methods for Paired Binary Data. Statistics in Medicine.
9. 1517-1525.
Lipsitz, S.R., Laird, N.M., and Harrington, D.P. (1992). A three-stage
estimator for studies with repeated and possibly missing binary outcomes.
J. Roy. Statist. Soc. C41. 203-213.
Little, R.J.A. and Rubin, D.B. (1986). Statistical Analysis with Missing Data.
New York: John Wiley.
Louis, T.A. (1982). Finding the Observed Information Matrix when Using
the EM Algorithm. J.R. Statist. Soc. B, 44, 226-233.
MacRae, E.C. (1974). Matrix Derivatives with an Application to an Adaptive
Linear Decision Problem. Annals of Statistics. 2. 337-346.
Madansky, A. (1963). Tests of Homogeneity for Correlated Samples. J. Amer.
Statist. Assoc. 58. 97-119.
Magnus, J.R. and Neudecker, H. (1988). Matrix Differential Calculus with
Applications in Statistics and Econometrics. New York: John Wiley &
Sons, Ltd.
Mantel, N. (1963). Chi-Square Tests with One Degree of Freedom; Extensions
of the Mantel-Haenszel Procedure. J. Amer. Statist. Assoc. 58. 690-700.
Mantel, N. and Haenszel, W. (1959). Statistical Aspects of the Analysis of
Data from Retrospective Studies of Disease. J. Natl. Cancer Inst. 22.
719-748.
McCullagh, P. (1980). Regression Models for Ordinal Data (with discussion).
J. Roy. Statist. Soc. B42. 109-142.
McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models. London:
Chapman and Hall.
McNemar, Q. (1947). Note on the Sampling Error of the Difference Between
Correlated Proportions or Percentages. Psychometrika. 12. 153-157.
Meilijson, I. (1989). A Fast Improvement to the EM Algorithm on its Own
Terms. J.R. Statist. Soc., B51, 127-138.
Meng, X. and Rubin, D.B. (1991). Using EM to Obtain Asymptotic
Variance-Covariance Matrices: The SEM Algorithm. J. Amer. Statist. Assoc. 86.
899-909.
Palmgren, J. (1981). The Fisher Information Matrix For Log Linear Models
Arguing Conditionally on Observed Explanatory Variables. Biometrika.
68, 2, 563-566.
Palmgren, J., and Ekholm, A. (1987). Exponential Family Non-Linear Models
For Categorical Data With Errors Of Observation. Applied Stoch. Models
and Data Analysis, 3, 111-124.
Prentice, R.L. (1988). Correlated Binary Regression with Covariates Specific
to Each Binary Observation. Biometrics. 44. 1033-1048.
Prentice, R.L. and Zhao, L.P. (1991). Estimating Equations for
Parameters in Means and Covariances of Multivariate Discrete and Continuous
Responses. Biometrics. 47. 825-840.
Rao, C.R. (1974). Linear Statistical Inference and Its Applications. 2nd edn.
New York: John Wiley & Sons, Inc.
Rasch, G. (1961). On General Laws and the Meaning of Measurement in
Psychology, in Proc. 4th Berkeley Symp. Math. Statist. Probab. vol. 4.
ed. J. Neyman. Berkeley: University of California Press. 321-333.
Read, T.R.C. and Cressie, N.A.C. (1988). Goodness-of-Fit Statistics for
Discrete Multivariate Data. New York, NY: Springer Verlag.
Royall, R.M. (1986). Model Robust Confidence Intervals Using Maximum
Likelihood Estimators. Intl. Statist. Rev. 54. 221-226.
SAS Institute Inc. (1985). SAS User's Guide: Statistics, Version 5 Edition.
Cary, NC: SAS Institute Inc.
Silvey, S.D. (1959). The Lagrange-Multiplier Test. Ann. Math. Statist. 30.
389-407.
Stiratelli, R., Laird, N.M., and Ware, J.H. (1984). Random-Effects Models
for Serial Observations with Binary Response. Biometrics. 40. 961-971.
Stram, D.O., Wei, L.J. and Ware, J.H. (1988). Analysis of Repeated
Ordered Categorical Outcomes with Possibly Missing Observations and
Time-Dependent Covariates. J. Amer. Statist. Assoc. 83. 631-637.
Sundberg, R. (1974). Maximum Likelihood Theory for Incomplete Data from
an Exponential Family. Scand. J. Statist., 1, 49-58.
Sundberg, R. (1976). An Iterative Method for Solution of the Likelihood
Equations for Incomplete Data from Exponential Families. Comm. Statist.
- Simula. Computa. B5. 55-64.
Tjur, T. (1982). A Connection Between Rasch's Item Analysis Model and a
Multiplicative Poisson Model. Scand. J. Statist. 9. 23-30.
Ware, J.H., Lipsitz, S., and Speizer, F.E. (1988). Issues in the Analysis of
Repeated Categorical Outcomes. Statistics in Medicine. 7. 95-107.
Wedderburn, R.W.M. (1974). Quasi-likelihood Functions, Generalized Linear
Models, and the Gauss-Newton Method. Biometrika. 61. 439-447.
Wei, L.J. and Stram, D.O. (1988). Analysing Repeated Measurements
with Possibly Missing Observations by Modeling Marginal Distributions.
Statistics in Medicine. 7. 139-148.
White, A.A., Landis, J.R., and Cooper, M.M. (1982). A Note on the
Equivalence of Several Marginal Homogeneity Test Criteria for Categorical
Data. Internat. Statist. Rev. 50. 27-34.
White, H. (1980). A Heteroskedasticity-Consistent Covariance Matrix
Estimator and a Direct Test for Heteroskedasticity. Econometrica. 48.
817-838.
White, H. (1981). Consequences and Detection of Misspecified Nonlinear
Regression Models. J. Amer. Statist. Assoc. 76. 419-433.
White, H. (1982). Maximum Likelihood Estimation of Misspecified Models.
Econometrica. 50. 1-25.
Zhao, L.P. and Prentice, R.L. (1990). Correlated Binary Regression Using a
Quadratic Exponential Model. Biometrika. 77. 642-648.
Zeger, S.L. (1988). The Analysis of Discrete Longitudinal Data:
Commentary. Statistics in Medicine. 7, 161-168.
Zeger, S.L., and Liang, K.Y. (1986). Longitudinal Data Analysis for Discrete
and Continuous Outcomes. Biometrics. 42, 121-130.
Zeger, S.L., Liang, K.Y., and Albert, P.S. (1988). Models For Longitudinal
Data: A Generalized Estimating Equation Approach. Biometrics. 44.
1049-1060.
BIOGRAPHICAL SKETCH
Joseph Benedict Lang was born in St. Cloud, Minnesota, on February
12, 1963. In 1967, his parents, Ralph and Mary Jean Lang, moved the family
to Richmond, a small resort town in central Minnesota. He remained in the
central Minnesota area for 23 years. His parents, 7 sisters, and 1 brother
remain there to this day. In 1982, he decided to pursue a college degree. His
10-year "career" as bartender and cook looked to be nearing an end when
he began his postsecondary education at St. Cloud State University. After
a brief period of entertaining the idea of majoring in art, Joseph grew very
fond of mathematics and statistics and decided to focus his attention on these
more quantitative disciplines.
After receiving his Bachelor of Arts degree in mathematics from St.
Cloud State University in 1986, Joseph was encouraged to pursue his Master's
and Ph.D. degrees in statistics at the University of Florida in Gainesville.
He went on to receive a Master of Statistics degree in 1988 and, under the
direction of Professor Alan Agresti, was awarded a Ph.D. degree in statistics
in the spring of 1992. While working toward these degrees, he worked as
a teaching assistant, biostatistics consultant, and a research assistant. In
1992, Joseph accepted an academic position as assistant professor in the
Department of Statistics and Actuarial Science at the University of Iowa.
I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope
and quality, as a dissertation for the degree of Doctor of Philosophy.
Alan Agresti, Chair
Professor of Statistics
I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope
and quality, as a dissertation for the degree of Doctor of Philosophy.
Jane Pendergast
Associate Professor of Statistics
I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope
and quality, as a dissertation for the degree of Doctor of Philosophy.
Rocco Ballerini
Associate Professor of Statistics
I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope
and quality, as a dissertation for the degree of Doctor of Philosophy.
Carole Kimberlin
Associate Professor of Pharmacy
Health Care Administration
This dissertation was submitted to the Graduate Faculty of the
Department of Statistics in the College of Liberal Arts and Sciences and
to the Graduate School and was accepted as partial fulfillment of the
requirements for the degree of Doctor of Philosophy.
May, 1992
Dean, Graduate School
UNIVERSITY OF FLORIDA
3 1262 08554 0424
xml version 1.0 encoding UTF8
REPORT xmlns http:www.fcla.edudlsmddaitss xmlns:xsi http:www.w3.org2001XMLSchemainstance xsi:schemaLocation http:www.fcla.edudlsmddaitssdaitssReport.xsd
INGEST IEID EWPC9K8TS_TI4ND4 INGEST_TIME 20170713T22:07:54Z PACKAGE AA00003698_00001
AGREEMENT_INFO ACCOUNT UF PROJECT UFDC
FILES
PAGE 1
21 02'(/ ),77,1* )25 08/7,9$5,$7( 32/<720286 5(63216( '$7$ %\ 26(3+ % /$1* $ ',66(57$7,21 35(6(17(' 72 7+( *5$'8$7( 6&+22/ 2) 7+( 81,9(56,7< 2) )/25,'$ ,1 3$57,$/ )8/),//0(17 2) 7+( 5(48,5(0(176 )25 7+( '(*5(( 2) '2&725 2) 3+,/2623+< 81,9(56,7< 2) )/25,'$ 81,9(56,7< 2) )/5%$ /,%5$5,(6
PAGE 2
$&.12:/('*0(176 ZRXOG OLNH WR H[SUHVV P\ DSSUHFLDWLRQ WR 'U $ODQ $JUHVWL IRU VHUYLQJ DV P\ GLVVHUWDWLRQ DGYLVRU )RU WKH PDQ\ FRPPHQWV LGHDV DQG OHVVRQV KH KDV VKDUHG ZLWK PH DP JUHDWO\ LQGHEWHG 7KURXJK KLV DGYLVHPHQW DQG JXLGDQFH KH KDV WDXJKW PH WR DSSUHFLDWH DQG UHVSHFW JRRG VWDWLVWLFDO UHVHDUFK DQG WHDFKLQJ +H LV D PHQWRU ZRUWK\ RI HPXODWLRQ DOVR ZDQW WR H[SUHVV P\ JUDWLWXGH WR 'U DQH 3HQGHUJDVW ZKR DOVR VHUYHG RQ P\ GLVVHUWDWLRQ FRPPLWWHH OHDUQHG D JUHDW GHDO IURP KHU GXULQJ WKH WZR \HDUV WKDW ZRUNHG LQ WKH %LRVWDWLVWLFV 'HSDUWPHQW 7R DOO RI WKH IDFXOW\ DW WKH 8QLYHUVLW\ RI )ORULGD H[WHQG P\ WKDQNV 7KH VWDWLVWLFV GHSDUWPHQW ZLWK LWV VFKRODUO\ DQG IULHQGO\ DWPRVSKHUH SURYHG WR EH D ZRQGHUIXO SODFH WR OHDUQ 7KH LQIOXHQFHV RI SHUVRQV IURP P\ SDVW DUH QRW IRUJRWWHQ :LWKRXW 3DWULFN .HDULQfV VWLPXODWLQJ WHDFKLQJ RI KLJK VFKRRO PDWK PD\ QHYHU KDYH EHFRPH LQWHUHVWHG LQ WKLV VXEMHFW 7KH JHQXLQH H[FLWHPHQW GHOLYHUHG E\ 'U DPHV .HSQHU LQ KLV WHDFKLQJ RI XQGHUJUDGXDWH VWDWLVWLFV ZDV WKH UHDVRQ GHFLGHG WR SXUVXH DQ DGYDQFHG GHJUHH LQ VWDWLVWLFV ZRXOG OLNH WR WKDQN P\ SDUHQWV DQG WKH UHVW RI P\ IDPLO\ IRU DOO RI WKH VXSSRUW DQG HQFRXUDJHPHQW WKH\ KDYH JLYHQ RYHU WKH FRXUVH RI P\ VWXGLHV DQG UHVHDUFK 0\ IULHQGV DQG VWXGHQW FROOHDJXHV GHVHUYH PDQ\ WKDQNV DV ZHOO )LQDOO\ ZRXOG OLNH WR WKDQN .HQGUD 3DDU IRU DOZD\V EHLQJ WKHUH WR VXSSRUW DQG HQFRXUDJH PH ZKLOH ZDV ZULWLQJ WKLV SDSHU Q
PAGE 3
7$%/( 2) &217(176 SDJH $&.12:/('*0(176 LL /,67 2) 7$%/(6 Y $%675$&7 YL &+$37(56 ,1752'8&7,21 $ %ULHI ,QWURGXFWLRQ WR WKH 3UREOHP 2XWOLQH RI ([LVWLQJ 0HWKRGRORJLHVfÂ§1R 0LVVLQJ 'DWD 2XWOLQH RI ([LVWLQJ 0HWKRGRORJLHVfÂ§0LVVLQJ 'DWD )RUPDW RI 'LVVHUWDWLRQ 5(675,&7(' 0$;,080 /,.(/,+22' )25 $ *(1(5$/ &/$66 2) 02'(/6 )25 32/<720286 5(63216( '$7$ ,QWURGXFWLRQ 3DUDPHWULF 0RGHOLQJfÂ§$Q 2YHUYLHZ 0RGHO 6SHFLILFDWLRQ 0HDVXULQJ 0RGHO *RRGQHVV RI )LW 0XOWLYDULDWH 3RO\WRPRXV 5HVSRQVH 0RGHO )LWWLQJ $ *HQHUDO 0XOWLQRPLDO 5HVSRQVH 0RGHO 0D[LPXP /LNHOLKRRG (VWLPDWLRQ $V\PSWRWLF 'LVWULEXWLRQ RI 3URGXFW0XOWLQRPLDO 0/ (VWLPDWRU /DJUDQJHfV 0HWKRGfÂ§7KH $OJRULWKP &RPSDULVRQ RI 3URGXFW0XOWLQRPLDO DQG 3URGXFW3RLVVRQ (VWLPDWRUV 0LVFHOODQHRXV 5HVXOWV 'LVFXVVLRQ LLL
3 SIMULTANEOUSLY MODELING THE JOINT AND MARGINAL DISTRIBUTIONS OF MULTIVARIATE POLYTOMOUS RESPONSE VECTORS
    Introduction
    Product-Multinomial Sampling Model
    Joint and Marginal Models
    Numerical Examples
    Product-Multinomial Versus Product-Poisson Estimators: An Example
    Well-Defined Models and the Computation of Residual Degrees of Freedom
    Discussion
4 LOGLINEAR MODEL FITTING WITH INCOMPLETE DATA
    Introduction
    Review of the EM Algorithm
    General Results
    Exponential Family Results
    Loglinear Model Fitting with Incomplete Data
    The EM Algorithm for Poisson Loglinear Models
    Obtaining the Observed Information Matrix
    Inferences for Multinomial Loglinear Models
    Latent Class Model Fitting: An Application
    Modified EM/Newton-Raphson Algorithm
    Discussion
APPENDICES
A CALCULATIONS FOR CHAPTER
B CALCULATIONS FOR CHAPTER
BIBLIOGRAPHY
BIOGRAPHICAL SKETCH
LIST OF TABLES
Opinion Poll Data Configuration
Interest in Political Campaigns
Cross-Over Data
Joint Distribution Models: Goodness of Fit
Marginal Distribution Models: Goodness of Fit
Candidate Models in J(L x L D) ∩ M(U): Goodness of Fit
Estimates of Freedom Parameters for Model J(L x L D) ∩ M(CU)
Freedom Parameter Estimates and Standard Errors
Estimated Cell Means and Standard Errors
Cross-Over Data Models: Goodness of Fit
Freedom Parameter ML Estimates for Model J(U) ∩ M(U)
Children's Respiratory Illness Data
Product-Multinomial versus Product-Poisson Freedom Parameter Estimation
Observed cross-classification of respondents with respect to whether they tend toward universalistic (1) or particularistic (2) values in four situations (A, B, C, D) of role conflict
Parameter and Standard Error Estimates
Classification Probability Estimates
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
ON MODEL FITTING FOR MULTIVARIATE POLYTOMOUS RESPONSE DATA
By
Joseph B. Lang
May 1992
Chairman: Dr. Alan Agresti
Major Department: Statistics
A broad class of models that imply structure on both the joint and marginal distributions of multivariate categorical (ordinal or nominal) responses is introduced. These parsimonious models can be used to simultaneously describe the marginal distributions of the responses and the association structure among the responses. As a special case, this class of models includes classical log- and logit-linear models. In this sense, we address model fitting for multivariate polytomous response data from a very general perspective. Simultaneous models for joint and marginal distributions are useful in a variety of applications, including longitudinal studies and studies dealing with social mobility and inter-rater agreement.
We outline a maximum likelihood fitting algorithm that can be used for fitting a large class of models that includes the class of simultaneous models. The algorithm uses Lagrange's method of undetermined multipliers and a modified Newton-Raphson iterative scheme. We also discuss goodness-of-fit tests and model-based inferences. Inferences for certain model parameters are shown to be equivalent for product-Poisson and product-multinomial
sampling assumptions. This useful equivalence result generalizes existing results. The models and fitting method are illustrated for several applications.
Missing data are often a problem for multivariate response data. We consider inferences about loglinear models for which only certain disjoint sums of the data are observable. We derive an explicit formula for the observed information matrix associated with the loglinear parameters that is intuitively appealing and simple to evaluate. The observed information matrix can be evaluated at the maximum likelihood estimates and inverted to obtain an estimate of the precision of the loglinear parameter estimates. The EM algorithm can be used to fit these incomplete-data loglinear models. We describe this algorithm in some detail, paying special attention to the Poisson loglinear model fitting case. Alternative fitting algorithms are also outlined. One proposed alternative uses both the EM and Newton-Raphson algorithms, thereby resulting in a faster, more stable algorithm. We illustrate the utility of these results using latent class model fitting.
CHAPTER 1
INTRODUCTION
A Brief Introduction to the Problem
There are many situations when multiple responses are observed for each "subject" in a group or several groups. Here, "subject" is generically used to refer to a randomly chosen object that generates responses. The multiple responses could represent repeated measurements taken on subjects over time or occasions. They could be the ratings assigned by several judges that all viewed and rated the same set of slides (here the "subjects" are the slides). Or perhaps it may be that several distinct or noncommensurate responses are recorded for each subject. These responses are often categorical (ordinal or nominal) and inevitably interrelated. This dissertation addresses issues related to modeling and model fitting for multivariate categorical (ordinal or nominal) responses.
Models for multivariate categorical response data are usually developed to answer questions about (i) the association structure among the multiple responses or (ii) the behavior of the marginal distributions of the response variables. Specifically, a typical question of the first type is "How are the responses interrelated, and is this interrelationship the same across the levels of the covariates?" A typical type (ii) question is "How do the (marginal) responses depend on the covariates or occasions?" Historically, many models (e.g., log- and logit-linear models) have been developed for the primary
purpose of answering the type (i) questions. Many of these models can easily be fitted using maximum likelihood (ML) methods. These models typically, however, are not useful for answering the type (ii) questions (Cox). Marginal models, those models used to answer type (ii) questions, are not as well developed. One reason for this is that ML fitting of these marginal models is more difficult. At present, the method of weighted least squares (WLS) is used almost exclusively for fitting these models.
Suppose that we are interested in answering questions of both types (i) and (ii). Usually, the questions are addressed using two different models, a joint distribution model and a marginal model, and fitting them separately. It seems reasonable to want a model that can be used to address simultaneously both questions. That is, we would like a model that simultaneously implies structure on both the joint and marginal distribution parameters. To date, there has been very little work done on the development and fitting of these simultaneous models.
Whenever multiple responses are observed, it is inevitable that there will be missing data. There are several ways to fit the Poisson loglinear model with incomplete data. One popular method is to use the EM algorithm to find the ML estimates of the loglinear parameters. One drawback to this algorithm is that a precision estimate of the ML estimators is not produced as a by-product. Several numerical techniques have been developed to approximate the observed information matrix, which upon inversion will act as the precision estimate. However, it would be of some convenience to derive an explicit formula for the observed information matrix, at least in some special cases.
Outline of Existing Methodologies: No Missing Data
We begin our discussion by considering the case of no missing data. There are many methods for analyzing multivariate categorical (ordinal or nominal) response data. These methods usually involve fitting (separately) models for the joint or the marginal distributions of the response vectors. In rare instances, simultaneous models for both the joint and marginal distributions are considered.
Maximum likelihood fitting methods for the joint distribution models are simple and described in almost every standard text on categorical data analysis. The fitting of marginal models using ML methods is more difficult. Maximum likelihood fitting of the marginal homogeneity model was considered by Madansky and Lipsitz. The fitting of a more general class of marginal models was considered by Haber (a). Finally, the fitting of simultaneous models using ML methods has only been addressed in the bivariate response case. The fitting technique becomes very complicated when there are more than two categorical responses. To appreciate the complexity of extending the technique to multivariate response data, see McCullagh and Nelder or perhaps Dale. In contrast, the ML fitting method of Chapter 2 can easily be used to fit many marginal and simultaneous models.
In the next few paragraphs, we briefly describe the existing methods for modeling and model fitting for multivariate categorical response data.
Modeling Joint Distributions Separately. One common method for analyzing multivariate categorical responses is to model the joint distribution only. These models, which include classical log- and logit-linear models for the
joint probabilities, are useful for describing the association structure among the responses. Recent years have seen the development of these methods for analyzing multivariate categorical responses (Haberman; Bishop et al.; Agresti).
For specificity, consider the following panel study. One hundred randomly selected subjects were asked how interested they were in the political campaigns. They were to respond on the three-point ordinal scale: (1) Not Much, (2) Somewhat, and (3) Very Much. Then, four years later, the same group of subjects was asked to respond on the same scale to the same question. A separate investigation into the association structure would enable us to answer questions of a conditional nature. For example, we could estimate the probability of responding "Very Much" on the second occasion, given that the response at the first occasion was "Not Much". The description of these "transitional" probabilities, although very interesting, may not be completely satisfactory. We may also be interested in addressing questions with regard to the marginal distributions. Perhaps we would like to answer the question "Are the distributions of responses to the political interest question the same for each occasion?"
Laird, in a nice review of likelihood-based methods for longitudinal analysis, mentions that the utility of classical log- and logit-linear models is restricted to two situations: (1) modeling the dependence of a univariate response on a set of covariates and (2) modeling the association structure between a set of multivariate responses. These models place structure on the joint probabilities, and so they are not directly useful for studying the dependence of the marginal probabilities on occasion and other covariates. This problem was pointed out by several authors (Cox; Prentice; McCullagh and Nelder;
Liang et al.). An advantage of these models is that they are simple to fit using either WLS (Grizzle et al.), ML (McCullagh and Nelder), or iterative proportional fitting (Bishop et al.) methods. There are many standard statistical programs available for fitting these models (SAS, SPSS, BMDP, GLIM, GENSTAT).
Modeling Marginal Distributions Separately. A second approach to analyzing multivariate categorical responses is to model only the marginal distributions and to ignore the joint distribution structure. Full likelihood methods that consider only models for the marginal probabilities tacitly assume a saturated model for the joint distribution. Therefore, the models may be far from parsimonious.
In the non-Gaussian response setting, there is a distinction between these marginal models and the transitional (or conditional) models of the previous paragraph. Marginal models describe the occasion-specific distributions and the dependence of those distributions on the covariates. Transitional (or conditional) models describe the distribution of individual changes over occasions. Models for these transitions can be represented as probability distributions for the future state "given" the past states. Questions regarding transition probabilities can only be investigated with longitudinal data. On the other hand, questions regarding the marginal probabilities could theoretically be answered using cross-sectional data, provided the cohort (subject) effects were negligible. Panel studies, resulting in longitudinal data, result in more powerful tests for significance of within-cluster factors such as occasion effect. This follows because there is a reduced cohort effect; we are using the same panel of subjects at each occasion. For
further discussion about the distinction between marginal and transitional models, see Ware et al., Laird, and Zeger.
We will briefly discuss existing methods for making inferences about the marginal probabilities separately. We will group these methods into five categories: (1) nonmodel-based methods, (2) WLS methods, (3) ML methods, (4) semiparametric methods, and (5) other methods.
Nonmodel-based methods can be used to derive test statistics used for testing specific hypotheses regarding the marginal distributions. Examples include the Cochran-Mantel-Haenszel statistic, which can be used for testing the hypothesis of marginal homogeneity (MH) (cf. White et al.), McNemar's statistic, which can be used for testing the equality of two dependent proportions, and Madansky's likelihood-ratio statistic for MH. Madansky's statistic is a difference in fit of the model of marginal homogeneity to the fit of the unstructured (saturated) model (see also Lipsitz and Lipsitz et al.). Many other relevant test statistics, some of which are generalizations or modifications of the aforementioned (cf. Mantel; White et al.), exist. Cochran's Q statistic and Darroch's Wald-type statistic are examples of other test statistics that can be used to test for marginal homogeneity.
Presently, if one was to fit a marginal model, say a generalized loglinear model of the form $C \log A\mu = X\beta$, where $\mu$ is the vector of expected counts in the full contingency table, he or she would most likely use the WLS fitting algorithm. Most statistical software that fits these generalized loglinear models does so using WLS. There are some advantages to using WLS. It is computationally simple. Second-order marginal information is all that is
needed. And the estimates are asymptotically equivalent to ML estimates. Some disadvantages are that covariates must be categorical, sampling zeroes create problems, and estimates are sensitive when second-order marginal counts are small. The WLS method for analyzing categorical data was originally outlined by Grizzle, Starmer, and Koch. Subsequently, marginal models for longitudinal categorical data, or more generally multivariate categorical response data, have been introduced and fitted using the WLS method (Koch et al.; Landis and Koch; Landis et al.; Agresti).
Maximum likelihood fitting of marginal models is more difficult since the model utilizes marginal probabilities rather than the joint probabilities to which the likelihood refers. When the responses are correlated, as they invariably are, the marginal counts do not follow a product-multinomial distribution. The full-table likelihood must be maximized subject to the constraint that the marginal probabilities satisfy the model. Haber (a) considers fitting generalized loglinear models of the form $C \log A\mu = X\beta$ using Lagrange multipliers and an unmodified Newton-Raphson iterative scheme. The algorithm becomes very difficult to implement for even moderately large tables. This is primarily due to the difficulty of inverting the large Hessian matrix of the Lagrangian objective function. In this dissertation, we consider a modified Newton-Raphson that uses a much simpler matrix than the Hessian. The matrix is easily inverted even for relatively large tables. Haber (b) considers the estimation of the parameters $\beta$ in the special case $C \log \mu = X\beta$. We will use a modification of the method of Aitchison and Silvey, and of Silvey, to investigate the asymptotic behavior of the estimators of
$\beta$ in the more general model $C \log A\mu = X\beta$, thereby extending the work of Haber (b). Another relevant paper, Haber and Brown, considers ML fitting of a model for the expected counts $\mu$ that has loglinear and linear constraints. One can test hypotheses about the marginal probabilities by comparing the fit of relevant models. Haber (a, b) and Haber and Brown only consider fitting the marginal models separately. No attempt has been made to simultaneously model the joint and marginal distributions.
Semiparametric methods such as quasi-likelihood (Wedderburn) and a multivariate extension, generalized estimating equations (GEE), have become popular in recent years. The work of Liang and Zeger, which advocated the use of these GEEs, has been extended to cover the multivariate categorical response data setting (Prentice; Zhao and Prentice; Stram et al.; Liang et al.). With these semiparametric methods, the likelihood is not completely specified. Instead, generalized estimating equations are chosen so that when the marginal model holds, even if the association among the multiple responses is misspecified, the estimators are consistent and asymptotically normally distributed. These estimators, used in conjunction with a robust estimator of their covariance (Liang and Zeger; Zeger and Liang; White; Royall), result in consistent inference about the effects of interest. When the responses are truly independent, the estimating equations, with correlation matrix taken to be the identity matrix, are equivalent to the likelihood equations.
The GEE approach requires the specification of a "working" association or correlation matrix. Examples of working associations include those that imply all
pairwise associations (measured in terms of odds ratios) are the same and that the higher order associations are negligible (Liang et al.).
A related approach is known as GEE2. The consistency of these estimators follows only if both the marginal model and the pairwise association model are correctly specified. This approach is a second order extension of the GEEs of Liang and Zeger, which are now termed GEE1. It is second order because the estimation of the marginal model parameters and the pairwise association model parameters is considered simultaneously.
The focus of both approaches, GEE1 and GEE2, is usually on modeling the marginal distributions, that is, investigating how the marginal distributions depend on occasion and covariates. The association is considered a nuisance. Presently, there are no tests for goodness of fit of these models, and so the investigation into how well both models fit can be done only at an empirical level. The assumption that higher order effects are negligible may not be tenable. Testing procedures to assess the validity of these assumptions have yet to be developed. Also, in contrast to WLS and ML methods, which require only that the missing data be "missing at random" (MAR), the semiparametric approaches require the missing data to be "missing completely at random" (MCAR). The assumption that the missing data mechanism is MCAR is a much stronger assumption than MAR (Little and Rubin).
Finally, there are many other approaches to analyzing the marginal probability structure separately. There are random effects models, whereby subject-specific random effects induce a correlation structure on the multiple responses. The marginal approach, in which the full likelihood is obtained by averaging across the random effects, is computationally difficult (Stiratelli
et al.). An alternative is to condition on the sufficient statistics for the subject effects and consider finding the estimates by maximizing the conditional likelihood. For further details on these conditional and unconditional methods, see Rasch, Tjur, Agresti, Stiratelli et al., and Conaway. As yet another alternative, Koch et al. give a bibliography for relevant nonparametric methods for analyzing repeated measures data. Agresti and Pendergast consider replacing the actual observations by their within-cluster rank and testing for marginal homogeneity using the ordinary ANOVA statistic for repeated measures data.
A three-stage estimator for repeated measures studies with possibly missing binary responses has been developed by Lipsitz et al. This approach is very similar to a generalized least squares approach, but it has some of the nice features of the GEE approaches. One of these nice features is that the estimators and their variance estimates are consistent under very mild assumptions. An extension of this method to the polytomous response case has yet to be developed.
Simultaneous Investigation of Joint and Marginal Distributions. There has been very little work done to investigate simultaneously the joint and marginal distribution structure. In some ways, GEE2 is an attempt to describe both distributions. However, only the pairwise (not the joint) association structure is modeled; the higher-order associations are considered a nuisance. Tests comparing nested models have not been developed in this semiparametric setting.
Full likelihood approaches have been addressed by Dale, McCullagh and Nelder, and Becker and Balagtas. Dale models the joint distributions of bivariate ordered
categorical responses by assuming that the log global odds ratios follow a linear model. The marginal probabilities are assumed to follow a cumulative logit model. McCullagh and Nelder consider simultaneously modeling the joint and marginal probabilities of a bivariate dichotomous response (two distinct responses) by assuming that the log odds ratios follow a linear model and that the marginal probabilities follow a logit-linear model. Their example included age as a categorical covariate. Finally, Becker and Balagtas consider models for two-period cross-over data. The bivariate dichotomous response was the response to the two different treatments. Order of treatment application was considered a covariate. They assumed that the two log odds ratios followed a linear model and that the marginal probabilities satisfied a loglinear model. Because it is the marginal probabilities and not the joint probabilities that satisfy a loglinear model, Becker and Balagtas refer to the model as log nonlinear.
The ML model fitting approach used by each of these authors involves a reparameterization of the likelihood, which is a function of the joint probabilities, in terms of the joint and marginal model parameters. The reparameterization in the bivariate response case, the case each author considered, is somewhat complicated, especially for multi-level responses. To make matters worse, the extension of this method to general multivariate polytomous responses looks to be extremely difficult. If the reparameterizations are made so that the full likelihood is expressible in terms of the joint and marginal model parameters, the likelihood can be maximized using a Newton-Raphson-type algorithm. Basically, one must solve for the root of some nonlinear score equation. This maximization approach is very sensitive
to the starting value in that convergence to a local maximum is not likely unless the starting estimate is very close to the actual maximum. Finding reasonable starting values is not a simple task. Dale outlines a method, specifically for the models considered in that paper, for finding a starting estimate.
In this dissertation, we outline an ML fitting method that can easily be used to fit a large class of simultaneous models, including those considered by Dale, McCullagh and Nelder, and Becker and Balagtas. The approach involves using Lagrange's method of undetermined multipliers along with a modified Newton-Raphson iterative scheme. For all of the models considered, an initial estimate for the algorithm is the data counts themselves, along with a vector of zeroes corresponding to a first guess at the values of the Lagrange multipliers. The convergence of the algorithm is quite stable. The extension to multivariate polytomous response data is straightforward.
Outline of Existing Methodologies: Missing Data
Missing data is often an issue when the response is multivariate in nature. Missing data can also occur in more hypothetical situations. Examples include loglinear latent class models (Goodman; Haberman) and linear mixed or random effects models (Laird et al.). In latent class analyses, a latent variable, which is unobservable, is assumed to exist. Mixed or random effects models posit the existence of some unobservable random variables that affect the mean response. In this brief outline, we will consider ML methods for model fitting when the data are not completely observable. Little and Rubin provide a nice summary of methods
for model fitting with incomplete data.
There are many ways to find the maximum likelihood estimators when the data are not completely observable, each method having its positive and negative features. We could work directly with the incomplete-data likelihood, which is usually complicated relative to the complete-data likelihood, and use a Newton-Raphson or Fisher-scoring algorithm. Palmgren and Ekholm and Haberman use these methods to obtain maximum likelihood estimates and their standard errors. Alternatively, we could avoid the complicated likelihood altogether and use the Expectation-Maximization algorithm (Dempster et al.). Sundberg discusses the properties of the EM algorithm when it is used to fit models to data coming from the regular exponential family. The EM algorithm is one of the more flexible ML fitting algorithms for missing data situations. We will primarily focus on this method for fitting loglinear models with incomplete data.
Although the EM algorithm is easily implemented to fit loglinear models with incomplete data, the algorithm does not provide an estimate of precision of the model parameter estimators. Meng and Rubin outline a supplemental EM (SEM) algorithm whereby, upon convergence of the EM algorithm, the variance matrix for the model estimators is adjusted to account for missing data. The adjustment is a function of the rate of convergence of the EM algorithm, which in turn is a function of how much information is missing. Meng and Rubin numerically estimate the rate of convergence, thereby obtaining an estimate of precision that reflects missingness. Although this approach should prove to be applicable in the general situation, it still is desirable to derive an explicit formula for the variance matrix that reflects
missingness. Other authors (Meilijson; Louis) have discussed methods for estimating precision of model estimators when the data are incomplete and the EM algorithm is used. Meilijson's method involves EM-aided differentiation, which is essentially a numerical differentiation of the score vector. The method relies on the assumption that the observed data components are i.i.d. (identically and independently distributed). Louis gives an analytic formula for the observed information matrix based on the incomplete data. The computation of the observed information matrix based on this formula is not straightforward and must be considered separately for each special application.
Format of Dissertation
In Chapter 2, we develop a maximum likelihood method for fitting a large class of models for multivariate categorical response data. This development follows a general discussion about parametric modeling. Concepts such as degrees of freedom and model distances (or goodness of fit) are described at an intuitive level. We also describe and compare the asymptotic distributions of freedom parameter estimators under product-multinomial and product-Poisson sampling assumptions.
Chapter 3 has more of an applied flavor. We consider simultaneously modeling the joint and marginal distributions of multivariate categorical response vectors. A broad class of simultaneous models is introduced. The models can be fitted using the techniques of Chapter 2. Several numerical examples are considered.
Chapter 4 outlines the ML fitting technique known as the EM algorithm. This algorithm is used to fit models with incomplete data. Some advantages and disadvantages of using
the EM algorithm are addressed. The most important disadvantage is that the algorithm does not provide, as a by-product, a precision estimate of the ML estimators. We derive an explicit formula for the observed information matrix for the Poisson loglinear model parameters when only disjoint sums of the complete data are observable. An application to latent class modeling is considered. We also propose an ML fitting algorithm that uses both EM and Newton-Raphson steps. The modified algorithm should prove to have many positive features.
In this dissertation, we do not distinguish typographically between scalars, vectors, and matrices. Parameters and variables are treated as objects, their dimensions either being explicitly stated or implied contextually. By convention, functions that map scalars into scalars, when applied to vectors, will be defined componentwise. For example, if $\mu$ represents an $n \times 1$ vector, then $\log \mu = (\log \mu_1, \log \mu_2, \ldots, \log \mu_n)'$.
We frequently use abbreviations that are common in the statistical literature. They include ML (Maximum Likelihood), WLS (Weighted Least Squares), IWLS (Iterative (Re)Weighted Least Squares), and EM (Expectation-Maximization).
The range (or column) space of an $n \times p$ matrix $X$ is denoted by $\mathcal{M}(X)$ and is defined as $\{\mu : \mu = X\beta,\ \beta \in R^p\}$. The symbols $\otimes$ and $\oplus$ are the binary operators "direct product" and "direct sum". The direct (or Kronecker) product is taken to be the right-hand product. That is, $A \otimes B = \{A b_{ij}\}$.
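The two conventions stated here, componentwise functions and the right-hand direct product, can be illustrated with a minimal Python sketch on plain nested lists. The function names `componentwise_log` and `right_kron` are hypothetical and are introduced only for this illustration; they do not come from the dissertation.

```python
import math


def componentwise_log(mu):
    """Apply log to each component of a vector given as a list of positive numbers."""
    return [math.log(m) for m in mu]


def right_kron(A, B):
    """Right-hand direct product: the (i, j) block of the result is A * b_ij.

    A and B are matrices stored as lists of rows. This helper is an
    illustrative assumption, not code from the dissertation.
    """
    ra, ca = len(A), len(A[0])
    rb, cb = len(B), len(B[0])
    out = [[0] * (ca * cb) for _ in range(ra * rb)]
    for i in range(rb):              # block row, indexed by rows of B
        for j in range(cb):          # block column, indexed by columns of B
            for k in range(ra):      # row inside the block (row of A)
                for l in range(ca):  # column inside the block (column of A)
                    out[i * ra + k][j * ca + l] = A[k][l] * B[i][j]
    return out


A = [[1, 2]]
B = [[10, 20], [30, 40]]
print(right_kron(A, B))  # [[10, 20, 20, 40], [30, 60, 40, 80]]
```

Each block is a copy of A scaled by one entry of B, which is the reverse of the more common left-hand convention (blocks of the form a_ij B). Assuming NumPy were used instead, the same matrix would be produced by `numpy.kron(B, A)`, with the arguments swapped.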
The direct sum $C$ of two matrices $A$ and $B$ is defined as
$$C = A \oplus B = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}.$$
The symbol $D(\eta)$ represents a diagonal matrix with the elements of $\eta$ on the diagonal. That is,
$$D(\eta) = \begin{pmatrix} \eta_1 & & \\ & \ddots & \\ & & \eta_n \end{pmatrix}.$$
In a later chapter, we make use of the bracket notation often used by statistical and mathematical programming languages (e.g., Splus, Matlab). To illustrate the notation, consider a matrix A. The (sub)matrix A[,-2] is then matrix A with the second column deleted. Similarly, the matrix A[-3,] is the matrix A with the third row deleted.
Equation numbering is consecutive within sections of a chapter, the first number representing the chapter in which it appears; the thirteenth equation in a section, for example, has 13 as the last part of its number. Within each appendix, the equations are numbered consecutively. For example, the third equation in Appendix B is numbered (B.3). Tables are numbered consecutively within chapters, so that the second table within a chapter carries that chapter's number followed by 2. Theorems, lemmas, and corollaries are numbered independently of each other. All are numbered consecutively within sections, so the second corollary and the first theorem of a section are numbered 2 and 1, respectively, within that section.
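The matrix notation defined above (direct sum, the diagonal operator D, and bracket deletion of a row or column) can be sketched in a few lines of Python on nested lists. The helper names `direct_sum`, `D`, `drop_column`, and `drop_row` are hypothetical, chosen only for this illustration, and the 1-based indices are an assumption made to mirror the bracket notation in the text.

```python
def direct_sum(A, B):
    """Direct sum of A and B: block-diagonal matrix with A above-left and B below-right."""
    ca, cb = len(A[0]), len(B[0])
    top = [list(row) + [0] * cb for row in A]      # A padded with zeros on the right
    bottom = [[0] * ca + list(row) for row in B]   # B padded with zeros on the left
    return top + bottom


def D(eta):
    """Diagonal matrix with the elements of the vector eta on the diagonal."""
    n = len(eta)
    return [[eta[i] if i == j else 0 for j in range(n)] for i in range(n)]


def drop_column(A, j):
    """Counterpart of A[, -j]: delete column j, counting columns from 1."""
    return [row[:j - 1] + row[j:] for row in A]


def drop_row(A, i):
    """Counterpart of A[-i, ]: delete row i, counting rows from 1."""
    return A[:i - 1] + A[i:]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(direct_sum([[1, 2]], [[3], [4]]))  # [[1, 2, 0], [0, 0, 3], [0, 0, 4]]
print(D([5, 6]))                         # [[5, 0], [0, 6]]
print(drop_column(A, 2))                 # [[1, 3], [4, 6], [7, 9]]
print(drop_row(A, 3))                    # [[1, 2, 3], [4, 5, 6]]
```

The direct sum of a 1 x 2 and a 2 x 1 matrix is 3 x 3 here, which shows that the two blocks need not be square or of matching shape.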
CHAPTER 2
RESTRICTED MAXIMUM LIKELIHOOD FOR A GENERAL CLASS OF MODELS FOR POLYTOMOUS RESPONSE DATA
Introduction
In this chapter, we consider using maximum likelihood methods to fit a general class of parametric models for univariate or multivariate polytomous response data. The models will be specified in terms of freedom equations and/or constraint equations. These two ways of specifying models will be discussed at length later in this chapter. The model specification equations may be linear or nonlinear in the model parameters. Specifically, if $\mu$ represents the $s \times 1$ vector of expected cell means, the linear constraints will be of the form $L\mu = d$ and the nonlinear constraints will be of the form $U'C \log(A\mu) = 0$. The freedom equations will have form $C \log(A\mu) = X\beta$, where the components of the vector $\beta$ are referred to as the freedom parameters. In Chapter 3 of this dissertation, we discuss more specifically models that can be specified in terms of these constraint and freedom equations. The models of that chapter allow one to simultaneously model the joint and marginal distributions of multivariate polytomous response vectors.
The maximum likelihood model fitting algorithm of this chapter utilizes Lagrange multipliers and a modified Newton-Raphson iterative scheme. In particular, the models will be specified in terms of constraint equations and the log likelihood will be maximized subject to the constraint equations being
satisfied. One common optimization algorithm found in the mathematics literature is Lagrange's method of undetermined multipliers. We show that Lagrange's method is easily implemented for ML fitting of the models under consideration in this chapter.

One problem with Lagrange's method of undetermined multipliers for ML fitting of statistical models has been that it becomes computationally infeasible for large data sets. By using a modified Newton-Raphson method, which involves inverting a matrix of a simpler form than the more complicated Hessian, we consider fitting models to relatively large data sets.

We also explore the asymptotic behavior of the estimators within the framework of constraint models rather than freedom models. Usually, asymptotic properties of model and freedom parameter estimators are studied within the framework of freedom models. Aitchison and Silvey, and Silvey, studied the asymptotic behavior of the model parameter estimators when the model is specified in terms of constraint equations. Following the arguments of Aitchison and Silvey, we derive the asymptotic distributions of both the model and freedom parameter estimators.

Previous work by Haber (a) addressed maximum likelihood methods for fitting models of the form $C\log(A\mu) = X\beta$ to categorical response data. Subsequently, Haber and Brown discussed ML fitting for loglinear models that were also subject to the linear constraints $L\mu = d$, where these constraints necessarily include the identifiability constraint required of the vector of product-multinomial
cell means. Both of these papers advocated the use of Lagrange's method of undetermined multipliers to find the maximum likelihood estimates of the model parameters $\mu$. The method of Haber (a) involved using the (unmodified) Newton-Raphson method, which becomes computationally unattractive as the number of components in $\mu$ gets moderately large. Both Haber (a) and Haber and Brown were primarily concerned with measuring model goodness of fit and therefore did not consider estimation of freedom parameters. Haber (b) did consider estimation of freedom parameters, but only when the simpler model $C\log\mu = X\beta$ was used. One of the several ways that we extend the work of Haber (a, b) and Haber and Brown is to consider estimation of the freedom parameters when the more general model $C\log(A\mu) = X\beta$ is used.

Others have considered ML fitting of nonstandard models for multivariate polytomous response data. Laird outlines the different approaches taken by different authors. As an example, Dale considered ML fitting for a particular class of models for bivariate polytomous ordered response data, which were of the form
$$C_1\log(A_1\mu) = X_1\beta_1, \qquad g(A_2\mu) = X_2\beta_2.$$
Specifically, the first freedom equation specifies a loglinear model for the association between the two responses, measured by the global cross-ratios (cross-product ratios of quadrant probabilities), so that $C_1$ and $A_1$ are of a particular form. The second set of freedom equations specifies some generalized linear model (McCullagh and Nelder) for the marginal means or probabilities. Maximum likelihood estimators for the association
model freedom parameters $\beta_1$ and the marginal model freedom parameters $\beta_2$ were simultaneously computed by iteratively solving the score equations via a quasi-Newton approach. To use this maximization technique, the score functions, which involve the cell probabilities, must be written explicitly as a function of the freedom parameter $\beta = \mathrm{vec}(\beta_1, \beta_2)$. A nontrivial approach to finding reasonable starting values for $\beta$ is discussed by Dale.

Along with Dale, McCullagh and Nelder (section […]) and Becker and Balagtas consider writing the score as an explicit function of the freedom parameters, so that the marginal and association freedom parameter estimates may be computed simultaneously. In general, when there are more than two responses, this is not a simple task, and so an extension of this method to multivariate polytomous response data models will be very messy indeed. Also, convergence of the iterative scheme requires good initial estimates of the freedom parameter $\beta$. These may be very difficult to find.

In contrast, the maximization approach of this chapter, which is similar to that of Haber (a) and Haber and Brown, is shown to be easily implemented for fitting multivariate polytomous response data models. With this technique it is not necessary to write the cell means as an explicit function of the freedom parameters. Further, initial estimates of the freedom parameters, which are difficult to find, are not needed for this technique. Instead, only initial estimates of the cell means and undetermined multipliers are needed. Reasonable initial estimates of the cell means are the cell counts themselves, while a reasonable initial estimate of the vector of undetermined multipliers is the zero vector, which is the value of the undetermined multipliers when the model fits the data perfectly.
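The role of the two starting values can be illustrated on a toy problem. The sketch below is not the modified Newton-Raphson scheme of this chapter: it is an ordinary (unmodified) Newton iteration on the Lagrangian of a two-cell Poisson log likelihood, subject to the single constraint that the two log means are equal. The function names `solve` and `fit_equal_means`, the sign convention for the multiplier, and the data are all assumptions made for illustration.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_equal_means(y, iters=25):
    """Maximize sum(y_i * xi_i - exp(xi_i)) subject to xi_1 - xi_2 = 0."""
    xi = [math.log(v) for v in y]   # start at the observed counts (must be positive)
    lam = 0.0                       # start the undetermined multiplier at zero
    for _ in range(iters):
        mu = [math.exp(v) for v in xi]
        # Gradient of the Lagrangian l(xi) + lam * (xi_1 - xi_2), then the constraint.
        f = [y[0] - mu[0] + lam, y[1] - mu[1] - lam, xi[0] - xi[1]]
        jac = [[-mu[0], 0.0, 1.0],
               [0.0, -mu[1], -1.0],
               [1.0, -1.0, 0.0]]
        step = solve(jac, [-v for v in f])
        xi = [xi[0] + step[0], xi[1] + step[1]]
        lam += step[2]
    return [math.exp(v) for v in xi], lam


mu_hat, lam_hat = fit_equal_means([10.0, 20.0])
```

For the counts 10 and 20 the fitted means both converge to the average count, and the multiplier moves away from zero because the equal-means model does not fit these data perfectly.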
We will now introduce the class of models that we will consider for the remainder of this chapter and the next, more applied chapter. The models have form
$$C_1\log(A_1\mu) = X_1\beta_1, \qquad C_2\log(A_2\mu) = X_2\beta_2, \qquad L\mu = d,$$
where the linear constraints include the identifiability constraints. Later, when we study the asymptotic behavior of the ML estimators, we will require the components of $d$ to be zero unless they correspond to an identifiability constraint. These models, which are of the form $C\log(A\mu) = X\beta$, $L\mu = d$, will allow us to model both the joint and marginal distributions simultaneously when dealing with multivariate response data. The bivariate association model of Dale is a special case of these models, as we can specify the matrices $C_1$ and $A_1$ so that $C_1\log(A_1\mu)$ is the vector of log bivariate global cross-ratios. Restricting the marginal models to have form $C_2\log(A_2\mu) = X_2\beta_2$, rather than allowing the marginal means to follow a generalized linear model as Dale did, is not overly restrictive. In fact, many of the generalized linear models for multinomial cell means can be written in this form. For example, loglinear, multiple logit, and cumulative logit models are of this form.

Also, unlike Haber (a) and Haber and Brown, we will be concerned with estimation of the freedom parameter $\beta = \mathrm{vec}(\beta_1, \beta_2)$, thereby allowing for model-based inference. Model-based inferences usually refer to inferences based on freedom parameters. With freedom equations we have the luxury of choosing a parameterization that results in the freedom parameters having meaningful interpretations. For instance, a freedom parameter may be chosen to
represent a departure from independence in the form of a log odds ratio. More generally, we usually will try to parameterize in such a way that certain parameters will measure the magnitude of an effect of interest.

For example, consider an opinion poll where a group of subjects were asked on two different occasions whether they would vote for the President again in the next election. Suppose they were asked immediately after the President took office and again after the President had served for two years. The researcher may be interested in determining whether the distribution of response changed from Time 1 to Time 2 and, if so, in assessing the magnitude of the change. The data configuration can be displayed as in Table […].

Table […]. Opinion Poll Data Configuration

    Data                           Probabilities
                  Time 2                         Time 2
                  yes    no                      yes      no
    Time 1  yes   y11    y12       Time 1  yes   π11      π12      π1+
            no    y21    y22               no    π21      π22      π2+
                                                 π+1      π+2      1

We could formulate a model of the form $C\log(A\mu) = X\beta$ in such a way that the freedom parameter has a nice interpretation with respect to the hypothesis of interest. One such model is
$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \alpha + \beta_i, \qquad i = 1, 2, \qquad ([…])$$
where the parameter $\pi_i$ is a marginal probability, i.e.
$$\pi_i = \begin{cases} \pi_{1+} & \text{if } i = 1 \\ \pi_{+1} & \text{if } i = 2 \end{cases}$$
and, for identifiability of the freedom parameters, $\beta_1 = -\beta_2 = \beta$. Model ([…]) is a simple logit model for the marginal probabilities $\{\pi_{i+}\}$ and $\{\pi_{+j}\}$. The parameter $\beta$ measures the magnitude of departure from marginal homogeneity, in that $\beta = 0$ if and only if there is marginal homogeneity. One could use the Wald statistic $\hat\beta / se(\hat\beta)$ to test the hypothesis. If the null hypothesis is rejected, we can assess the magnitude of departure from marginal homogeneity by computing a confidence interval for $2\beta$, which is the log odds ratio comparing the odds that a randomly chosen subject responds 'yes' at Time 1 to the odds that a randomly chosen subject responds 'yes' at Time 2.

This simple example illustrates the utility of using freedom parameters and the corresponding model-based inferences. For this reason, this chapter will be concerned with making inferences about both the model parameters $\mu$ and the freedom parameters $\beta$.

The contents of the following sections are as follows. In section […], we provide an overview of parametric modeling. The two ways of specifying models, via constraint equations and via freedom equations, are discussed at length in section […]. It is shown that a model specified in terms of freedom equations can be respecified in terms of constraint equations. In particular, the freedom equation $C\log(A\mu) = X\beta$, which actually constrains the function $C\log(A\mu)$ to lie in some manifold spanned by the columns of $X$, is equivalent to the constraint equation $U'C\log(A\mu) = 0$, where the columns of $U$ form a basis for the null space of $X'$. Other topics covered in section […]
include interpretation and calculation of 'degrees of freedom' and measuring model goodness of fit.

We describe a general class of models for univariate or multivariate polytomous response data in section […]. The data vector $y$ is initially assumed to be a realization of a product-multinomial random vector. We describe the asymptotic behavior of the product-multinomial ML estimators in section […]. Lagrange's method of undetermined multipliers is used to find restricted maximum likelihood estimates of the model parameters and the freedom parameters. The actual algorithm is described in detail in section […].

In section […], we explore the relationship between the product-multinomial and product-Poisson ML estimators. General results that allow one to ascertain when inferences based on product-Poisson estimates are the same as inferences based on product-multinomial estimates are shown to follow quite directly when one works within the framework of constraint models. Theorem […] of this section represents a generalization of the results of Birch and Palmgren.

Parametric Modeling: An Overview

Inferences about the distribution of some $n \times 1$ random vector $Y$ are often based solely on a particular realization $y$ of $Y$. In parametric modeling it is often the case that the distribution of $Y$ is known up to an $s \times 1$ vector of model parameters $\theta$, i.e. it is 'known' that
$$Y \sim F(y; \theta), \qquad \theta \in \Theta, \qquad ([…])$$
where $\Theta$ is some $(s - q)$-dimensional ($q \ge 0$) subset of $R^s$ known to contain the true unknown parameter $\theta^*$. The cumulative distribution function $F$ maps points in $R^n$ into the unit interval $[0, 1]$ and is assumed to be known. In general, we will allow the dimension $s$ of $\theta$ to grow with $n$. For example, let $Y$ […] $\theta \in \Theta_M]$ or, more simply, by $[\Theta_M]$. We say the model $[\Theta_M]$ 'holds' if the true parameter value $\theta^*$ is a member of $\Theta_M$, i.e.
$$[\Theta_M] \text{ holds} \iff \theta^* \in \Theta_M.$$
A model does not hold if $\theta^* \notin \Theta_M$.

The objective of model fitting is to find a simple, parsimonious model that holds (or nearly holds). By parsimonious, we mean that the vector $\theta$ can be obtained as a function of relatively few unknown parameters. An example
of a parsimonious model for the distribution of an $n$-variate normal vector with unknown mean vector $\mu$ and known covariance is $[\Theta_X]$, where
$$\Theta_X = \{\mu \in R^n : \mu_j = \beta,\ j = 1, \dots, n,\ \beta \text{ unknown}\}.$$
Notice that all $n$ components of $\mu$ can be obtained as a function of one unknown parameter $\beta$. Thus, all of our estimation efforts can be directed towards the estimation of the common mean $\beta$. An example of a nonparsimonious model is the so-called saturated model $[\Theta]$, where
$$\Theta = \{\mu : \mu \in R^n\} = R^n.$$
In this case $\mu$ is a function of $n$ unknown parameters.

The question of whether or not the parsimonious model holds is an entirely different matter. Practically speaking, a model will rarely strictly hold. Therefore, we will often say a model holds if it nearly holds, i.e. for some small $\epsilon > 0$,
$$\inf_{\theta \in \Theta_M} \|\theta^* - \theta\| < \epsilon.$$
Without delving too much into the philosophy of model fitting and the simplicity principle (Foster and Martin), we point out that for a model to be practically useful it must be robust to the 'white noise' of the process generating $Y$. That is, it should account for only the obvious systematic variation. A model would be said to be robust to the white noise variability if the model parameter estimates based on different realizations of $Y$ are very similar. As an example, if instead of $[\Theta_X]$ the saturated model $[\Theta]$ was used to draw inferences about the normal mean vector $\mu$, we would find that the model fit perfectly, but that upon repeated sampling the model estimates
would change dramatically. Thus, the model is not robust to the white noise of the process. On the other hand, the parsimonious model $[\Theta_X]$ estimates would change very little from sample to sample, varying with the sample mean of $n$ observations. This model is robust to the white noise variability. Therefore, if the model would hold or nearly hold, we would say it was a good model.

Freedom Models

In the previous $n$-variate normal example, we specified a model $[\Theta_X]$ in terms of some unknown parameter $\beta$. Aitchison and Silvey, and Silvey, refer to the parameter $\beta$ as a 'freedom parameter' and the model $[\Theta_X]$ as a 'freedom model'. These labels are reasonable since we can measure the amount of freedom we have for estimating $\theta$ by noting the number of independent freedom parameters there are in the model. The model $[\Theta_X]$ has one degree of freedom for estimating the mean vector $\mu$. Thus, once an estimate of the single parameter $\beta$ is obtained, the entire vector $\mu$ can be estimated; it is a function of the one parameter $\beta$. Notice that 'degrees' of freedom correspond to integer dimension, in that a degree of freedom is gained (lost) if we introduce (omit) one independent freedom parameter, thereby increasing (decreasing) the dimensionality of $\Theta_X$ by one.

In general, we will denote a freedom model by $[\Theta_X]$, where
$$\Theta_X = \{\theta \in \Theta : g(\theta) = X\beta,\ \beta \in R^p\}.$$
The function $g$ is some differentiable vector-valued function mapping $\Theta$ into $r$-dimensional Euclidean space $R^r$. The 'model' matrix $X$ is an $r \times p$ full column rank matrix of known numbers. To calculate degrees of freedom for
$[\Theta_X]$, we initially assume $g$ satisfies
$$\forall \theta \in \Theta_X, \quad \frac{\partial g(\theta)}{\partial \theta'} \text{ is of full row rank } r.$$
It also will be assumed that the constraints implied by $g(\theta) = X\beta$ are independent of the $q$ constraints implied by the model $[\Theta]$ of ([…]). Well-defined models will satisfy these conditions. For example, any $g$ that is invertible satisfies the derivative condition. Actually, this derivative condition is not a necessary condition for the model to be well defined. Later we will show that $g$ need only satisfy a milder derivative condition.

The degrees of freedom for the model $[\Theta_X]$ can be obtained by subtracting the number of constraints implied by $[\Theta_X]$ from the total number of model parameters $s$. The number of constraints implied by $[\Theta_X]$ is $(r - p) + q$: the dimension of the null space of $X'$ plus the $q$ constraints implied by model $[\Theta]$. Hence, the model degrees of freedom for $[\Theta_X]$ is
$$df[\Theta_X] = s - (r - p + q). \qquad ([…])$$
In view of ([…]), the model degrees of freedom, an integer measure of the freedom one has for estimating $\theta$, is an increasing function of $p$, the number of freedom parameters. In fact, for the special case when $q = 0$ and $g(\theta) = \theta$ (so $s = r$), we have that the number of degrees of freedom for model $[\Theta_X]$ is simply $p$, the number of freedom parameters. This gives us another good reason for calling $\beta$ a freedom parameter and $[\Theta_X]$ a freedom model.

Constraint Models

Notice that
$$\{\theta \in \Theta : g(\theta) = X\beta,\ \beta \in R^p\} \qquad ([…])$$
can be rewritten as
$$\{\theta \in \Theta : U'g(\theta) = 0\},$$
where $U$ is an $r \times (r - p)$ full column rank matrix satisfying $U'X = 0$, i.e. the columns of $U$ form a minimal spanning set, or basis, for the null space of $X'$. Letting $u = r - p$ and $h^*(\theta) = 0$ be the $q$ constraints implied by $[\Theta]$, we can write the $(u + q) \times 1$ vector of constraining functions as
$$h(\theta) = [h_1(\theta)', h^*(\theta)']',$$
where $h_1 = U'g$. We rewrite the freedom model $[\Theta_X]$ of ([…]) as $[\Theta_h]$, where
$$\Theta_h = \{\theta \in R^s : h(\theta) = 0\}. \qquad ([…])$$
Aitchison and Silvey refer to model $[\Theta_h]$ as a constraint model. Every freedom model can be written as a constraint model. We present a few simple examples to illustrate the equivalence between the two model formulations, freedom and constraint.

Example […]. Let $Y_i \sim \text{ind } N(\mu_i, \sigma^2)$, $i = 1, \dots, n$, where $\sigma^2$ is known. This model can be specified as the freedom model $[\Theta_X]$, where
$$\Theta_X = \{\mu \in R^n : \mu = \beta 1_n,\ \beta \text{ unknown}\},$$
or, equivalently, it can be expressed as the constraint model $[\Theta_h]$, where
$$\Theta_h = \{\mu \in R^n : U'\mu = 0\}$$
and $U'$ is the $(n - 1) \times n$ matrix […]. It is easily seen that $\Theta_X = \Theta_h$ and that the model degrees of freedom is $df[\Theta_X] = n - (n - 1) = 1$.

Example […]. Let […] $[\Theta_X]$, where
$$\Theta_X = \{\mu \in R^n : \mu_i = \beta_0 + \beta_1 x_i,\ i = 1, \dots, n\}$$
or, assuming that each $x_i$ is distinct, as the constraint model $[\Theta_h]$, where
$$\Theta_h = \{\mu \in R^n : U'\mu = 0\}.$$
Here $U'$ is the $(n - 2) \times n$ matrix […]. Notice that $U'\mu = 0$ implies that
$$\frac{\mu_{j+1} - \mu_j}{x_{j+1} - x_j} = \frac{\mu_{k+1} - \mu_k}{x_{k+1} - x_k}, \qquad \forall k, j.$$
That is, the $n$ means fall on a line. As before, it can be seen that $\Theta_X = \Theta_h$ and that the model degrees of freedom is $df[\Theta_h] = n - (n - 2) = 2$.

Definitions

We will assume that the constraining function $h$ satisfies some reasonable conditions so that the model is well defined. We first present some definitions.

(1) A model $[\Theta_h]$ is said to be 'consistent' if $\Theta_h \neq \emptyset$.

(2) A consistent model $[\Theta_h]$ is said to be 'well-defined' if the Jacobian matrix for $h$ is of full row rank $v = u + q$ at every point in $\Theta_h$. That is, $\forall \theta \in \Theta_h$, $\partial h(\theta) / \partial \theta'$ is of full row rank $v$.

(3) A model $[\Theta_h]$ is said to be 'ill-defined' if it is not well-defined, i.e. $\exists \theta \in \Theta_h$ such that $\partial h(\theta) / \partial \theta'$ is not of full row rank $v$.
(4) An ill-defined model $[\Theta_h]$ is said to be 'inconsistent' or 'incompatible' if $\Theta_h = \emptyset$.

Briefly, any reasonable model will have a nonempty parameter space and hence will be consistent. The Jacobian condition of definition (2) is similar to the condition required in the Implicit Function Theorem (see Bartle). Basically, this condition requires the constraints to be nonredundant so that, at least theoretically, the constraint equations can be written uniquely as a function of a smaller set of parameters. An ill-defined model has been specified with a redundant set of constraint equations. Using the lingo of the optimization literature, two constraints are redundant if, for each point in the parameter space, both of the constraints are 'active' or both of the constraints are 'inactive'. That is, for all parameter values, if one constraint is active (inactive) then the other is necessarily active (inactive).

It should be noted that the above definitions are in terms of the constraint formulation of a model. This is sufficient, since freedom models can be written as constraint models. For convenience, we give sufficient conditions for a freedom model to be well-defined. A consistent freedom model is well-defined if it satisfies the following two conditions:

(i) The constraints implied by $g(\theta) = X\beta$ are independent of the $q$ constraints implied by $[\Theta]$.

(ii) The Jacobian matrix of $g$ evaluated at any point in $\Theta_X$ is of full row rank $r$, i.e. $\partial g(\theta) / \partial \theta'$ is of full row rank $r$, $\forall \theta \in \Theta_X$.
The sufficiency of conditions (i) and (ii) can be seen by observing that (ii) implies that $h_1 = U'g$ has a full row rank Jacobian, since $U'$ is of full row rank, and (i) implies that $h = (h_1', h^{*\prime})'$ has a full row rank Jacobian. These sufficient conditions are by no means necessary for a model to be well defined, as the Jacobian of $h$ may be of full row rank $v$ even when the Jacobian of $g$ is not of full row rank.

Notice that the model matrix has nothing to do with whether or not a model is well defined. In particular, one may think that the model $[\Theta_X]$ is ill-defined whenever the $r \times p$ matrix $X$ is not of full column rank, i.e. the freedom parameters are nonestimable. However, the model can be rewritten as a constraint model with the full column rank matrix $U$ spanning the null space of $X'$, which then has dimension $r - \mathrm{rank}(X) > r - p$. It follows that if $g$ satisfies (i) and (ii), then the model $[\Theta_X]$ will be well-defined. The only reason we have taken $X$ to be of full column rank is to avoid using generalized inverses when working with the freedom parameters.

To illustrate the use of these definitions, we consider the model $[\Theta_M]$, where
$$\Theta_M = \{\theta \in R^n : M\theta = d\}.$$
The model will be well defined if $\partial h / \partial \theta' = M$ is of full row rank. It is inconsistent if the linear system of equations $M\theta = d$ is inconsistent.

If a model $[\Theta_h]$ is well defined, then the constraints implied by the model are all independent, in that no constraint can be implied by the others. We will consider only well-defined models when calculating degrees of freedom.
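For the linear model $M\theta = d$, the definitions reduce to rank checks that are easy to sketch. The code below is a minimal illustration, assuming exact rational arithmetic and ignoring any positivity restrictions on $\theta$; the function names `rank` and `classify` and the example matrices are hypothetical. The model is treated as consistent when $M$ and the augmented matrix $[M \mid d]$ have the same rank, and as well-defined when, in addition, $M$ has full row rank.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (nested lists) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if piv is None:
            continue                      # no pivot in this column
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][col] / m[r][col]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def classify(M, d):
    """Classify the linear constraint model {theta : M theta = d}."""
    augmented = [row + [di] for row, di in zip(M, d)]
    if rank(M) != rank(augmented):
        return "inconsistent"             # the linear system has no solution
    return "well-defined" if rank(M) == len(M) else "ill-defined"


# Four parameters: a sum-to-one constraint plus one equality constraint.
M_ok = [[1, 1, 1, 1], [0, 1, -1, 0]]
# Adding a constraint implied by the second row makes the set redundant.
M_redundant = M_ok + [[0, -1, 1, 0]]

status_ok = classify(M_ok, [1, 0])
status_redundant = classify(M_redundant, [1, 0, 0])
status_bad = classify([[1, 1, 1, 1], [1, 1, 1, 1]], [1, 2])
df = 4 - rank(M_ok)   # s minus the number of independent constraints
```

Exact fractions avoid the tolerance choices that a floating-point rank computation would need; for large problems a numerical rank routine would be the practical choice.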
As before, we calculate degrees of freedom for a model as the difference between the number of model parameters $s$ and the number of independent constraints $v$ implied by the model, i.e.
$$df[\Theta_h] = s - (r - p + q) = s - (u + q) = s - v.$$
Notice that for the constraint model, model degrees of freedom is a decreasing function of the number of independent constraints $v$.

Finally, it should be noted that models may be specified in terms of both freedom equations and constraint equations. In fact, in subsequent sections this will be the case. However, without loss of generality we will concentrate on constraint models, since any model can be written in the form of a constraint model.

Measuring Model Goodness of Fit

Inferences about model parameters are reliable only if the model is 'good'. A good model should be well defined (or at least consistent). It should be simple and parsimonious. Finally, the model should be relatively close to holding. To assess whether or not the model holds, we will need the concept of a distance between two models.

To begin, we will assume there is some measure of distance $\delta$ between two hierarchical parametric models. Two models $[\Theta_1]$ and $[\Theta_2]$ are hierarchical if $\Theta_2 \subset \Theta_1$ and $df[\Theta_2] < df[\Theta_1]$ whenever $\Theta_2 \neq \Theta_1$. This (parametric) distance will be a quantitative comparison of how close the two models are to holding. Thus, if both models hold, the distance is zero. The distance will also be independent of the model degrees of freedom.
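Recall that the form of $F(\cdot\,; \theta)$ is assumed known. Therefore, the distance will measure how far the true parameter is from falling in the parametric model space.

Suppose, firstly, that $\Theta_1$ and $\Theta_2$ are general parameter spaces. That is, $\theta \in \Theta_1 \cup \Theta_2$ does not necessarily define a probability distribution. In other words, $\theta$ need not fall in a subset of an $(s - 1)$-dimensional simplex. Let $a(\theta)$ and $b(\theta)$ be vector- or matrix-valued functions of the unknown parameter $\theta$. Define a distance between two hierarchical models $[\Theta_1]$ and $[\Theta_2]$ ($\Theta_2 \subset \Theta_1$) as
$$\delta[\Theta_2; \Theta_1] = \inf_{\theta \in \Theta_2} \|b(\theta)(a(\theta) - a(\theta^*))\|^2 - \inf_{\theta \in \Theta_1} \|b(\theta)(a(\theta) - a(\theta^*))\|^2.$$
Notice that $a$ and $b$ can be chosen so that

(1) $\delta[\Theta_2; \Theta_1] \ge 0$,

(2) $\delta[\Theta_2; \Theta_1] = 0$ iff $[\Theta_1]$ and $[\Theta_2]$ hold.

For example, consider the case $Y \sim MVN(\mu, \sigma^2 I_n)$. Suppose that
$$\Theta_0 = \{(\mu, \sigma^2) : \mu \in R^n,\ \sigma^2 > 0\},$$
$$\Theta_1 = \{(\mu, \sigma^2) : \mu = \mu_0,\ \sigma^2 > 0\},$$
$$\Theta_2 = \{(\mu, \sigma^2) : \mu = X\beta,\ \beta \in R^p,\ \sigma^2 > 0\},$$
$$\Theta_3 = \{(\mu, \sigma^2) : \mu = \beta 1_n,\ \beta \in R,\ \sigma^2 > 0\}.$$
In this example, each component of $Y$ has a common variance $\sigma^2$. It seems reasonable that differences between any $\mu_j$ and the true mean $\mu_j^*$ are equally important. Hence, a natural distance between any two of these models is
$$\inf_{\theta \in \Theta_{M_2}} \|\mu - \mu^*\|^2 - \inf_{\theta \in \Theta_{M_1}} \|\mu - \mu^*\|^2.$$
Notice that $a(\theta) = \mu$ and $b(\theta) = I$. Hence, the measure of distance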
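between $[\Theta_0]$ and $[\Theta_1]$ is
$$\delta[\Theta_1; \Theta_0] = \inf_{\theta \in \Theta_1} \|\mu - \mu^*\|^2 - \inf_{\theta \in \Theta_0} \|\mu - \mu^*\|^2 = \|\mu_0 - \mu^*\|^2.$$
The second infimum is zero since the model $[\Theta_0]$ is known to hold. The measure of distance between $[\Theta_0]$ and $[\Theta_2]$ is
$$\delta[\Theta_2; \Theta_0] = \inf_{\theta \in \Theta_2} \|\mu - \mu^*\|^2 = \inf_{\beta} \|X\beta - \mu^*\|^2 = \|X(X'X)^{-1}X'\mu^* - \mu^*\|^2 = \|(I - X(X'X)^{-1}X')\mu^*\|^2 = \mu^{*\prime}(I - X(X'X)^{-1}X')\mu^*. \qquad ([…])$$
This is the squared length of the vector orthogonal to the projection of $\mu^*$ onto the range space of $X$. Notice that if $\mu^* = X\beta^*$, that is, $[\Theta_2]$ holds, then $\delta[\Theta_2; \Theta_0] = 0$. Finally, the distance between $[\Theta_2]$ and $[\Theta_3]$ is
$$\delta[\Theta_3; \Theta_2] = \inf_{\theta \in \Theta_3} \|\mu - \mu^*\|^2 - \inf_{\theta \in \Theta_2} \|\mu - \mu^*\|^2 = \mu^{*\prime}\left(I - \tfrac{1}{n} 1_n 1_n'\right)\mu^* - \mu^{*\prime}(I - X(X'X)^{-1}X')\mu^* = \mu^{*\prime}\left(X(X'X)^{-1}X' - \tfrac{1}{n} 1_n 1_n'\right)\mu^*. \qquad ([…])$$
As another example, consider a random vector $Y$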
where $E(Y_i) = \mu_i$ and $\mathrm{var}(Y_i) = \sigma_i^2(\mu)$. Let $V(\mu) = \oplus_i\, \sigma_i^2(\mu)$ and $\mu = (\mu_1, \dots, \mu_n)'$. Since the components of $Y$ have different variances, a natural measure of distance is
$$\delta[\Theta_{M_2}; \Theta_{M_1}] = \inf_{\theta \in \Theta_{M_2}} \|V(\mu)^{-1/2}(\mu - \mu^*)\|^2 - \inf_{\theta \in \Theta_{M_1}} \|V(\mu)^{-1/2}(\mu - \mu^*)\|^2. \qquad ([…])$$
That is, $a(\theta) = \mu$ and $b(\theta) = V(\mu)^{-1/2}$. Premultiplying the vector $(\mu - \mu^*)$ by $V(\mu)^{-1/2}$ has the effect of downplaying those differences $(\mu_i - \mu_i^*)$ for which the corresponding variance is large.

To assess the goodness of fit of a model relative to another, we can estimate the distance via some statistic based on the observed data. It is interesting to note that when $\delta = 0$, i.e. both models hold, our data-based estimate of this null distance will be some nonnegative (positive if the model is unsaturated) number reflecting the amount of white noise, or random variability, there is in $Y$. This is so because if both models hold, then the only reason that our estimate of distance would be nonzero would be because $Y$ has some random component. That is, the variability in $Y$ that is not explained by the model causes the data to fit the model imperfectly.

Let $D$ be an estimate of $\delta$. That is, $D[\Theta_2; \Theta_1]$ is a stochastic, data-based estimate of how far apart models $[\Theta_1]$ and $[\Theta_2]$ are. Potential candidates for $D$ are the weighted least squares, likelihood ratio, Wald, deviance, and Lagrange multiplier statistics.

For example, consider the $n$-variate normal case and the four candidate models $[\Theta_0]$, $[\Theta_1]$, $[\Theta_2]$, and $[\Theta_3]$. We will assume that both $[\Theta_0]$ and $[\Theta_2]$ hold. In view of ([…]), a reasonable estimate of $\delta[\Theta_2; \Theta_0]$ can be obtained by
replacing $\mu^*$ by $Y$, the estimate of $\mu$ under model $[\Theta_0]$, i.e.
$$D[\Theta_2; \Theta_0] = Y'(I - X(X'X)^{-1}X')Y.$$
Since $\delta[\Theta_2; \Theta_0]$ is known to be zero, $D[\Theta_2; \Theta_0]$ serves as our 'estimate of error'. Similarly, a reasonable estimate of $\delta[\Theta_3; \Theta_2]$ can be obtained by replacing $\mu^*$ in ([…]) by $Y$, the least restrictive estimate of $\mu^*$, i.e.
$$D[\Theta_3; \Theta_2] = Y'\left(X(X'X)^{-1}X' - \tfrac{1}{n} 1_n 1_n'\right)Y.$$
[…]
$$df[\Theta_3] = (n + 1) - (n - 1) = 2, \qquad df[\Theta_2] = (n + 1) - (n - p) = p + 1.$$
The degrees of freedom associated with estimating the distance between two models will be called the distance (or residual, or goodness-of-fit) degrees of freedom. The distance degrees of freedom for the two models $[\Theta_{M_1}]$ and $[\Theta_{M_2}]$ is defined to be the difference between the two model degrees of freedom, i.e.
$$df[\Theta_{M_2}; \Theta_{M_1}] = df[\Theta_{M_1}] - df[\Theta_{M_2}].$$
The number of distance degrees of freedom measures the dimensional distance between the two models, i.e. the difference in dimensions. It measures the difference in the amount of freedom one has for estimating $\theta$ for the two models.

It seems intuitive that if the degrees of freedom is large, that is, if the dimensional difference between the two models is great, the significance of the distance statistic may be difficult to ascertain. This follows since we expect the fit to be quite different for the two very different models, even when both
models hold. This is a reflection of both white noise and possibly lack of fit. Therefore, the distance statistic will tend to be large even when both models hold. But for many statistics a large mean implies a large variance, thereby making significant findings more difficult. It is for this reason that we say it is better to concentrate our efforts on relatively few degrees of freedom to detect lack of fit. That is, one should use the smallest alternative space possible when testing a null hypothesis. A more technical argument holds when the test statistic (distance statistic) is a chi-square or an $F$. Das Gupta and Perlman showed that for a fixed noncentrality parameter, i.e. a fixed distance between models, the power of the $F$-test or the chi-square test increases as the distance degrees of freedom decreases.

Example […]. Continuing with the $n$-variate normal example, we see that
$$df[\Theta_3; \Theta_2] = df[\Theta_2] - df[\Theta_3] = (p + 1) - 2 = p - 1.$$
Thus, $\Theta_3$ is of $p - 1$ fewer dimensions than $\Theta_2$. Now if we knew $\sigma^2$, the white noise variance, we could test $H_0: \theta^* \in \Theta_3$ vs. $H_1: \theta^* \in \Theta_2 - \Theta_3$ using the statistic
$$\frac{D[\Theta_3; \Theta_2]}{\sigma^2} = \frac{SS(\text{Reg})}{\sigma^2}, \qquad ([…])$$
which has a $\chi^2(p - 1)$ null distribution. However, $\sigma^2$ is not generally known and we must estimate it. One way of estimating $\sigma^2$ is by estimating the distance between $[\Theta_0]$ and $[\Theta_2]$, two models that are known to hold, and dividing by the distance degrees of freedom. Since the distance degrees of freedom is $df[\Theta_0] - df[\Theta_2] = (n + 1) - (p + 1) = n - p$, we have that the estimate of the white noise variance is
$$\frac{D[\Theta_2; \Theta_0]}{n - p} = \frac{SS(\text{Error})}{n - p}.$$
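The two estimated distances in this example are the familiar regression and error sums of squares, which can be computed directly. The sketch below assumes the simplest case of one covariate plus an intercept ($p = 2$); the function name `simple_regression_ss` and the data are made up for illustration, and the final ratio is the usual $F$-type statistic rather than anything specific to this chapter.

```python
def simple_regression_ss(x, y):
    """Return (SS(Reg), SS(Error)) for the least squares line through (x, y)."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    fitted = [ybar + slope * (xi - xbar) for xi in x]
    # Distance between the common-mean model and the regression model.
    ss_reg = sum((f - ybar) ** 2 for f in fitted)
    # Distance between the regression model and the saturated model.
    ss_err = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    return ss_reg, ss_err


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.0]
p = 2                                   # intercept and slope
ss_reg, ss_err = simple_regression_ss(x, y)
sigma2_hat = ss_err / (len(y) - p)      # estimated distance / distance df
f_stat = (ss_reg / (p - 1)) / sigma2_hat
```

The two sums of squares add up to the total sum of squares about the mean, which reflects the additivity of the distances between nested models.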
Notice that in the above example, the estimate of the parameter $\sigma^2$ was simply the estimated distance between two models that were known to hold, divided by their dimensional distance. Quite generally, when the data have an exponential dispersion distribution with common dispersion parameter $\sigma^2$, the estimated distance between two models that are known to hold, divided by their dimensional distance, gives us an estimate of $\sigma^2$. This is true when the estimated distance is taken to be the LR, Wald, deviance, LM, or weighted least squares statistic. These statistics are natural estimators of the weighted distance given in ([…]) for the exponential dispersion models.

Now let us assume that $\Theta_1$ and $\Theta_2$ are each subsets of an $(s - 1)$-dimensional simplex. For example, with count data, conditional on the total $n$, the distribution is often multinomial with index $n$ and parameter (alternatively, probability distribution vector) $\pi^*$.

Read and Cressie extensively study a family of distance measures called the power-divergence family. The power divergences have form
$$I^\lambda(\pi^* : \pi) = \frac{1}{\lambda(\lambda + 1)} \sum_{i=1}^{s} \pi_i^* \left[\left(\frac{\pi_i^*}{\pi_i}\right)^\lambda - 1\right], \qquad -\infty < \lambda < \infty,$$
where $I^0$ and $I^{-1}$ are defined to be the continuous limiting values as $\lambda \to 0$ and $\lambda \to -1$. It is assumed that $\pi^*$ and $\pi$ fall on an $(s - 1)$-dimensional simplex.

As usual, let $\pi^*$ represent the true unknown parameter. We define the family of distance measures between $[\Theta_1]$ and $[\Theta_2]$ ($\Theta_2 \subset \Theta_1$) to be proportional to
$$\delta[\Theta_2; \Theta_1] = 2n\left\{\inf_{\pi \in \Theta_2} I^\lambda(\pi^* : \pi) - \inf_{\pi \in \Theta_1} I^\lambda(\pi^* : \pi)\right\}.$$
By properties of $I^\lambda(\cdot : \cdot)$ (Read and Cressie, pp. […]) it follows that $\delta[\Theta_2; \Theta_1] \ge 0$, with equality if and only if both models hold.
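A small numerical sketch may help fix the power-divergence family. The function below evaluates $2n I^\lambda(y/n : m/n)$ for observed counts `y` and fitted means `m` with the same total, using the standard Read and Cressie form; the function name `power_divergence` and the counts are assumptions made for illustration.

```python
import math


def power_divergence(y, m, lam):
    """Return 2n * I^lam(y/n : m/n) for counts y and fitted means m (equal totals)."""
    if abs(lam) < 1e-12:
        # Limit as lam -> 0: the likelihood ratio form (cells with y = 0 contribute 0).
        return 2.0 * sum(yi * math.log(yi / mi) for yi, mi in zip(y, m) if yi > 0)
    if abs(lam + 1.0) < 1e-12:
        # Limit as lam -> -1: roles of observed and fitted are exchanged (needs y > 0).
        return 2.0 * sum(mi * math.log(mi / yi) for yi, mi in zip(y, m))
    return (2.0 / (lam * (lam + 1.0))) * sum(
        yi * ((yi / mi) ** lam - 1.0) for yi, mi in zip(y, m))


y = [30.0, 20.0, 50.0]
m = [25.0, 25.0, 50.0]
pearson = power_divergence(y, m, 1.0)     # equals sum((y - m)^2 / m)
lik_ratio = power_divergence(y, m, 0.0)   # equals 2 * sum(y * log(y / m))
```

At $\lambda = 1$ the value agrees with the Pearson form because the observed and fitted counts share the same total, which is the situation in the examples that follow.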
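To estimate $\delta[\Theta_2; \Theta_1]$ based on the data, we note that our least restrictive guess of $\pi^*$ is $Y/n$ […] $\delta[\Theta_2; \Theta_1]$ would be
$$D[\Theta_2; \Theta_1] = 2n\left\{\inf_{\pi \in \Theta_2} I^\lambda(Y/n : \pi) - \inf_{\pi \in \Theta_1} I^\lambda(Y/n : \pi)\right\}.$$
[…] $D[\Theta_2; \Theta_1]$ is equal to the likelihood ratio statistic when $\lambda = 0$. Also, if we assume that $[\Theta_1]$ holds, so that the second infimum is zero, we have that for $\lambda = 1$,
$$D[\Theta_2; \Theta_1] = \inf_{\pi \in \Theta_2} \sum_i \frac{(Y_i - n\pi_i)^2}{n\pi_i},$$
which is asymptotically equivalent to
$$X^2 = \sum_i \frac{(Y_i - n\hat\pi_i)^2}{n\hat\pi_i},$$
where $\hat\pi$ is the maximum likelihood estimator of $\pi^*$ over the space $\Theta_2$. This is the Pearson chi-square statistic. Other asymptotically equivalent distance estimates are the Wald statistic and the Lagrangian multiplier statistic. We now illustrate these results via examples.

Example […]. Suppose that $Y$ […] $[\Theta_1]$, where
$$\Theta_1 = \{\pi : \pi' 1 = 1,\ \pi_{ij} > 0,\ i, j = 1, 2\}.$$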
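Notice that $\Theta_1$ is really a 3-dimensional subset (simplex) of $R^4$, so that $df[\Theta_1] = 4 - 1 = 3$. We wish to test the independence hypotheses
$$H_0: \pi_{ij} = \pi_{i+}\pi_{+j} \quad \text{vs.} \quad H_1: \pi_{ij} \neq \pi_{i+}\pi_{+j}.$$
Writing the model of interest $[\Theta_I]$ as
$$\Theta_I = \left\{\pi \in \Theta_1 : \frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}} = 1\right\} = \left\{\pi : \pi' 1 = 1,\ \log\frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}} = 0,\ \pi_{ij} > 0\right\},$$
we can state the independence hypotheses as
$$H_0: \pi \in \Theta_I \quad \text{vs.} \quad H_1: \pi \notin \Theta_I.$$
Now the model degrees of freedom can be found by subtracting the number of constraints implied by $[\Theta_I]$ from the total number of parameters, which is 4. Hence, $df[\Theta_I] = 4 - 2 = 2$. Thus, the distance degrees of freedom, or measure of dimensional distance, is $df[\Theta_I; \Theta_1] = 3 - 2 = 1$.

Two distance (goodness-of-fit) statistics commonly used are the Pearson chi-square $X^2$ ($\lambda = 1$) and the likelihood ratio statistic $G^2$ ($\lambda = 0$). The forms of these two statistics are
$$D[\Theta_I; \Theta_1] = X^2 = \sum_i \sum_j \frac{(y_{ij} - n\hat\pi_{ij})^2}{n\hat\pi_{ij}}$$
and
$$D[\Theta_I; \Theta_1] = G^2 = 2 \sum_i \sum_j y_{ij} \log\frac{y_{ij}}{n\hat\pi_{ij}},$$
where $\hat\pi_{ij}$ is the ML estimate of $\pi_{ij}$ assuming that model $[\Theta_I]$ holds. Under the null hypothesis, i.e. if independence truly holds, the asymptotic distribution of both distance statistics $X^2$ and $G^2$ is $\chi^2(1)$.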
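For a $2 \times 2$ table the two statistics are straightforward to compute, since the ML fitted counts under independence are the products of the row and column totals divided by $n$. The sketch below uses a hypothetical function name `independence_stats` and made-up counts.

```python
import math


def independence_stats(table):
    """Return (X2, G2) for testing independence in a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = [a + b, c + d], [a + c, b + d]
    x2 = g2 = 0.0
    for i in range(2):
        for j in range(2):
            fitted = rows[i] * cols[j] / n      # n * pi_hat_ij under independence
            y = table[i][j]
            x2 += (y - fitted) ** 2 / fitted
            if y > 0:                           # a zero cell contributes 0 to G2
                g2 += 2.0 * y * math.log(y / fitted)
    return x2, g2


x2, g2 = independence_stats([[30, 20], [10, 40]])
x2_null, g2_null = independence_stats([[10, 20], [30, 60]])
```

Both values would be referred to a chi-square distribution with one degree of freedom, matching the distance degrees of freedom computed above; the second table has proportional rows, so both statistics are zero.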
Example […]. Continuing with example […], consider the model $[\Theta_{MH}]$, where
$$\Theta_{MH} = \{\pi : \pi' 1 = 1,\ \pi_{1+} = \pi_{+1},\ \pi_{ij} > 0\}.$$
This model implies that there is marginal homogeneity, i.e. the marginal distributions for both factors are the same. We would like to test the hypotheses
$$H_0: \pi \in \Theta_{MH} \quad \text{vs.} \quad H_1: \pi \notin \Theta_{MH}.$$
The model degrees of freedom is $df[\Theta_{MH}] = 4 - 2 = 2$, and so the distance degrees of freedom is $df[\Theta_{MH}; \Theta_1] = 3 - 2 = 1$. Once again, to illustrate what model degrees of freedom means, we observe that if $[\Theta_{MH}]$ holds and we specify two of the four probabilities, the remaining two are completely determined. Thus, we are free to estimate two of the probabilities based on the data. The other two are determined.

Two frequently used estimates of the model distance, or model goodness of fit, are the likelihood ratio statistic $G^2$ and the McNemar statistic $M^2$. For $2 \times 2$ tables the McNemar statistic and the Lagrange multiplier statistic are equivalent, since both are score statistics (Agresti; Aitchison and Silvey). The statistics take the following forms:
$$D[\Theta_{MH}; \Theta_1] = G^2 = 2 \sum_i \sum_j y_{ij} \log\frac{y_{ij}}{n\hat\pi_{ij}}$$
and
$$D[\Theta_{MH}; \Theta_1] = M^2 = \frac{(y_{12} - y_{21})^2}{y_{12} + y_{21}},$$
where the $\hat\pi_{ij}$ in the first expression is the ML estimate of $\pi_{ij}$ under the model $[\Theta_{MH}]$.
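Both statistics depend only on the off-diagonal counts of the $2 \times 2$ table. The sketch below relies on the standard closed form for this special case, in which the ML fit under marginal homogeneity leaves the diagonal counts unchanged and replaces each off-diagonal count by their average; the function name `marginal_homogeneity_stats` and the counts are assumptions for illustration.

```python
import math


def marginal_homogeneity_stats(table):
    """Return (M2, G2) for testing marginal homogeneity in a 2 x 2 table."""
    y12, y21 = table[0][1], table[1][0]
    m2 = (y12 - y21) ** 2 / (y12 + y21)          # McNemar statistic
    half = (y12 + y21) / 2.0                     # fitted off-diagonal count
    # Diagonal cells are fitted exactly, so only off-diagonal cells enter G2.
    g2 = 2.0 * sum(y * math.log(y / half) for y in (y12, y21) if y > 0)
    return m2, g2


m2, g2_mh = marginal_homogeneity_stats([[40, 15], [5, 40]])
m2_sym, g2_sym = marginal_homogeneity_stats([[40, 10], [10, 40]])
```

A table with equal off-diagonal counts fits the model exactly, so both statistics are zero in the second call.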
Under the null, i.e. when the marginal distributions are homogeneous, both of these statistics have asymptotic $\chi^2(1)$ distributions.

It is important to note that had the constraint $\pi_{2+} = \pi_{+2}$ been added, the model would remain consistent but would be ill defined. For $2 \times 2$ tables, this additional constraint is exactly the same as the constraint $\pi_{1+} = \pi_{+1}$.

Multivariate Polytomous Response Model Fitting

In this section, we describe ML model fitting for an integer-valued random vector $Y$ that is assumed to be distributed product-multinomially. We also investigate the asymptotic behavior of the ML estimators within the framework of constraint models. The models we will consider have form
$$\Theta_X = \{\xi \in \Theta : C\log(Ae^{\xi}) = X\beta,\ Le^{\xi} = 0\}$$
or, equivalently, for appropriately chosen $U$,
$$\Theta_h = \{\xi \in \Theta : U'C\log(Ae^{\xi}) = 0,\ Le^{\xi} = 0\},$$
where $e^{\xi}$ is the $s \times 1$ mean vector of $Y$, a product-multinomial random vector, and the model parameter space $\Theta$ is of dimension $s - q$, where $q$ is the number of identifiability constraints.

We use the parameter $\xi$ rather than $\mu = e^{\xi}$ for several reasons. One reason will become evident when we explore the asymptotic behavior of the ML estimator of $\xi$. It turns out that the random variable $\hat\mu - \mu_0$ is not bounded in probability, whereas $\hat\xi - \xi_0$ is. In fact, the random variable $\hat\xi - \xi_0$ converges in probability to 0. Another reason for using $\xi$ rather than $\mu$ is that the procedure for deriving the maximum likelihood estimate of $\xi$ is less sensitive to small (or zero) counts. The range of possible $\xi$ values is the whole real line, while the range of possible $\mu$ values is restricted
to the positive half of the real line. By using $\xi$, the problem of intermediate out-of-range values (e.g. negative cell mean estimates) is avoided.

As stated above, we initially assume that the vector of cell counts $Y$ has a product-multinomial distribution. This is not overly restrictive, since it will be shown that inferences based on maximum (multinomial) likelihood estimates are often the same as inferences based on maximum (Poisson) likelihood estimates. We will present some results in a later section that allow us to determine when these inferences are indeed the same.

We also consider an alternative method for computing the maximum likelihood estimators and their asymptotic covariances. The method of Lagrange undetermined multipliers is well suited for maximum likelihood fitting of the models we will be considering. This is so because we will specify the models in terms of constraint equations, and the fitting problem will be one of maximizing a function, namely the log likelihood, subject to some constraints, namely that $\xi \in \Theta_h$.

A General Multinomial Response Model

In this section, we specify a class of models that is directly applicable to a later chapter of this dissertation. Specifically, the models will be specified in such a way so as to include the class of simultaneous models for the joint and marginal distributions considered in that chapter.
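Let the random vector $Y = \mathrm{vec}(Y_1, \ldots, Y_K)$ [...] The model $[\Theta_X]$ can be specified as

$$\Theta_X = \{\xi \in R^s : C_1\log A_1e^{\xi} = X_1\beta_1,\ C_2\log A_2e^{\xi} = X_2\beta_2,\ Le^{\xi} = 0,\ e^{\xi\prime}(\oplus_{k=1}^{K}\mathbf{1}_R) = n'\},$$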
where

- $C_i = \oplus_{j=1}^{K} C_{ij}$, with $C_{ij} = C_{i1}$ of dimension $q_i \times m_i$, $i = 1, 2$;
- $A_i = \oplus_{j=1}^{K} A_{ij}$, with $A_{ij} = A_{i1}$ of dimension $m_i \times R$, $i = 1, 2$;
- $L = \oplus_{j=1}^{K} L_j$, with $L_j = L_1$ of dimension $d \times R$;
- $\xi = \mathrm{vec}(\xi_1, \ldots, \xi_K)$, and each $\xi_k$ is $R \times 1$;
- $X_i$ is $Kq_i \times p_i$ of full rank $p_i$, $i = 1, 2$;
- $n$ is the $K \times 1$ vector of multinomial indices;
- $s = RK$ is the total number of cells.

Let us say that a model that can be specified in this way satisfies assumption (A1). That is,

(A1) The multinomial response model can be specified as in the display above.

Notice that the matrices comprising $C_i$ are all identical, likewise with the matrices comprising $A_i$ and $L$. This requires that the model does not change across the $K$ populations (multinomials). Also, the two sets of freedom equations will allow us to use two different types of models for the expected cell means. This provides us with enough generality to fit many interesting models. For example, we may wish to simultaneously fit a linear-by-linear association loglinear model for the joint distribution and a cumulative logit model for the marginal distributions.

We can conveniently rewrite the model as

$$\Theta_X = \{\xi \in R^s : C\log(Ae^{\xi}) = X\beta,\ Le^{\xi} = 0,\ e^{\xi\prime}(\oplus_{k=1}^{K}\mathbf{1}_R) = n'\},$$

where $A' = [A_1', A_2']$, $C = C_1 \oplus C_2$, $X = X_1 \oplus X_2$, and $\beta = \mathrm{vec}(\beta_1, \beta_2)$. Notice that the model $[\Theta_X]$ is specified in terms of both freedom equations and constraint equations. We will rewrite $[\Theta_X]$ as a constraint
model, keeping in the back of our minds that the freedom parameters may be of interest also.

Let $U$ be a $K(q_1 + q_2) \times u$ matrix of full column rank $u$ such that $U'X = 0$. Here $u$ is the dimension of the null space of $X'$, written $N(X')$, i.e. $u = K(q_1 + q_2) - (p_1 + p_2)$. Since $U$ can be chosen to be of full column rank, it follows that the columns of $U$ form a basis for the null space of $X'$. Thus, the range space of $U$ equals the null space of $X'$, i.e. $M(U) = N(X')$. Multiplying both sides of the freedom equation $C\log(Ae^{\xi}) = X\beta$ by $U'$, we can rewrite the model as

$$\Theta_h = \{\xi \in R^s : U'C\log(Ae^{\xi}) = 0,\ Le^{\xi} = 0,\ e^{\xi\prime}(\oplus_{k=1}^{K}\mathbf{1}_R) = n'\}.$$

Thus, $\Theta_X = \Theta_h$, and the models $[\Theta_X]$ and $[\Theta_h]$ are one and the same.

At this point, we will assume that the constraints implied by the model $[\Theta_h]$ are nonredundant, so that the model is well defined. More specifically, let $h'(\xi) = [(U'C\log(Ae^{\xi}))',\ e^{\xi\prime}L']$ be the $1 \times (u + l)$ vector of constraint functions, where $l = Kd$. We will assume that the $u + l + K$ constraints implied by $h(\xi) = 0$ and $e^{\xi\prime}(\oplus\mathbf{1}_R) = n'$ are nonredundant. Notice that the constraints in $h(\xi)$ do not include the identifiability constraints. We treat the identifiability constraints separately for reasons that will become apparent when we actually fit the models.

As stated previously, one of our primary objectives is to estimate the model parameters $\xi$ and the freedom parameters $\beta$ under the assumption that $[\Theta_X]$ (and $[\Theta_h]$) holds. We will use the maximum likelihood estimates, which can be found by maximizing the log likelihood of $Y$ subject to the constraint that $[\Theta_h]$ holds.
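To make the requirement $U'X = 0$ concrete, the following is a minimal sketch of one way such a matrix could be built: project a matrix of random numbers onto the orthogonal complement of the column space of $X$. The helper names, the seed, and the small design matrix are all hypothetical, and the plain Gauss-Jordan inverse is only meant for tiny examples.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def null_space_basis(x, seed=0):
    """Return a q x (q - r) matrix U with U'X = 0 for a full column rank q x r X."""
    q, r = len(x), len(x[0])
    xt = transpose(x)
    proj = matmul(matmul(x, inverse(matmul(xt, x))), xt)      # X (X'X)^{-1} X'
    u_star = [[(1.0 if i == j else 0.0) - proj[i][j] for j in range(q)]
              for i in range(q)]                               # I - X (X'X)^{-1} X'
    rng = random.Random(seed)
    w = [[rng.random() for _ in range(q - r)] for _ in range(q)]
    return matmul(u_star, w)


x = [[1, 0], [1, 1], [1, 2], [1, 3]]     # hypothetical 4 x 2 design matrix
u = null_space_basis(x)
utx = matmul(transpose(u), x)            # should be numerically zero
utu = matmul(transpose(u), u)            # Gram matrix of the columns of U
```

With such a $U$ in hand, premultiplying $C\log(Ae^{\xi}) = X\beta$ by $U'$ removes $\beta$ and leaves only constraints on $\xi$.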
The (kernel of the) log likelihood under the product multinomial assumption was shown above. It is $\ell_M(\xi; y) = y'\xi$. Thus, we are to maximize the function $\ell_M(\xi; y) = y'\xi$ subject to $\xi \in \Theta_h$.

Maximum Likelihood Estimation

In this section, we will discuss two procedurally different approaches to maximizing the log likelihood $\ell_M(\xi; y)$ subject to $\xi \in \Theta_h$. The first approach, which is the more commonly used, requires that the model be specified entirely in terms of freedom equations. Often, when there are no identifiability constraints, the model can be completely specified as a freedom model. Models amenable to this approach include the Poisson loglinear model and the Normal linear model. The second approach, Lagrange's method of undetermined multipliers, can be directly applied when the model is specified completely in terms of constraint equations. Since the product multinomial model includes identifiability constraints, it can more easily be specified in terms of constraint equations. For this reason, this second method is the preferred choice. In the following sections we discuss some additional features of these two methods.

Freedom Parameter Approach. One approach often used in simple situations, namely those situations when the model can be specified completely in terms of freedom equations, is to write the parameter $\xi$ as a function of the freedom parameter $\beta$ and maximize $\ell_M(\xi(\beta); y)$ with respect to $\beta$. The vector $\xi(\hat\beta)$ will be in the model space, since the model was specified
completely in terms of $\beta$. For example, if the model could be specified as $\Theta_X = \{\xi \in R^s : \log e^{\xi} = X\beta\}$, then $\xi(\beta) = X\beta$.

Notice that the multinomial model, which includes the constraints $e^{\xi\prime}(\oplus\mathbf{1}_R) = n'$, is not directly amenable to this approach. In fact, we would have to reparameterize to a smaller set $\xi^*$ of $s - K$ model parameters that account for the constraints. This reparameterization results in an asymmetric treatment of the $\xi$ and for that reason is deemed undesirable. On the other hand, the Poisson model considered below will often lend itself to this maximization approach, since the constraints $e^{\xi\prime}(\oplus\mathbf{1}_R) = n'$ are not included.

Computationally, the method of maximizing the log likelihood with respect to the freedom parameters is usually simple. Assuming the log likelihood is concave and differentiable in $\beta$, we need only solve for the root of the "score equations", viz.

$$s(\beta; y) = \frac{\partial \ell(\xi(\beta); y)}{\partial \beta} = 0.$$

Many of the asymptotic properties of the maximum likelihood estimator for $\beta$ are derived by formally expanding the score vector $s(\hat\beta; y)$ about the true value $\beta_0$ in a linear Taylor expansion. That is,

$$0 = s(\hat\beta; y) = s(\beta_0; y) + \frac{\partial s(\beta_0; y)}{\partial \beta'}(\hat\beta - \beta_0) + o(\|\hat\beta - \beta_0\|).$$

In particular, in many situations, [...]
so that $\hat\beta$ has the same asymptotic distribution as [...]. Subsequently, we will derive the asymptotic distribution of $\hat\beta$ in a different way. This alternative derivation of the asymptotic distribution of the freedom parameter estimate will shed new light on the relationship between the asymptotic behavior of the estimates under the two sampling assumptions, product Poisson and product multinomial.

The Taylor expansion also gives some indication of how one might numerically solve for $\hat\beta$, the root of the score equation. A Newton-Raphson type algorithm is often used. This root finding algorithm involves the inversion of the derivative matrix $\partial s(\beta; y)/\partial\beta'$, which is usually of small dimension, since the model is usually specified in terms of a small number of freedom parameters. In fact, the dimension of the derivative matrix will not be larger than $s \times s$, which occurs when the model is saturated.

Constraint Equations Approach. In many situations it may be difficult to specify a model in terms of only freedom parameters, or perhaps it is possible but the researcher would like to treat the model parameters symmetrically, which would necessitate an additional constraint equation. It also could be that the function $C\log(Ae^{\xi})$ is not a one-to-one function of $\xi$, so that for given $\beta$ we cannot solve for $\xi$ explicitly. In any of these cases, we may not be able to use the aforementioned maximization approach. In this section, we consider an alternative method for finding that $\xi$ that maximizes the function $\ell_M(\xi; y)$ subject to $\xi \in \Theta_h$. The method we will use is Lagrange's method of undetermined multipliers. Aitchison and
Silvey, and Silvey in separate work, provide much of the essential underlying theory related to this approach.

Three positive features of this method are that (i) estimation of both $\xi$ and $\beta$ is possible, (ii) the method provides us with another enlightening way of deriving the asymptotic distribution of the freedom parameter estimators, and (iii) the method works quite generally.

A negative feature of this approach is the computational difficulty. Computationally, the method becomes burdensome as $s$, the number of log mean parameters, and $u + l$, the number of constraints implied by the model, become large. In fact, the algorithm involves the inversion of an $(s + u + l) \times (s + u + l)$ matrix. One positive note is that this potentially very large matrix does have a simple form, and one can invoke some simple matrix algebra results to reduce the inversion problem to one of inverting matrices of dimensions $(u + l) \times (u + l)$ and $s \times s$.

To best illustrate the difference in computational difficulty of the two methods, we consider the following normal linear model example. Let [...]
Even when we use the matrix algebra results that simplify the problem of working with this matrix, we still are left with a formidable task. It seems that when $s$ is large and the model is parsimonious, i.e. $u + l$, the number of constraints, is large, the undetermined multiplier method may not be the method of choice. However, in time, as computer efficiency gains are realized, we predict that the scope of candidate models to be fit using this method will increase tremendously. In fact, at present, many categorical models can easily be fit using Lagrange's method.

We discuss in more detail how we can use the method of undetermined multipliers to fit models like $[\Theta_h]$. We are to maximize the function $\ell_M(\xi; y) = y'\xi$ subject to the constraint $\xi \in \Theta_h$, where

$$\Theta_h = \{\xi \in R^s : U'C\log(Ae^{\xi}) = 0,\ Le^{\xi} = 0,\ e^{\xi\prime}(\oplus_{k=1}^{K}\mathbf{1}_R) = n'\} = \{\xi \in R^s : h(\xi) = 0,\ e^{\xi\prime}(\oplus_{k=1}^{K}\mathbf{1}_R) = n'\}$$

and $h'(\xi) = [\log(e^{\xi\prime}A')C'U,\ e^{\xi\prime}L']$.

Consider the Lagrangian objective function

$$F(\theta) = \ell_M(\xi; y) + (e^{\xi\prime}(\oplus_{k=1}^{K}\mathbf{1}_R) - n')\tau + h'(\xi)\lambda,$$

where $\theta = \mathrm{vec}(\xi, \tau, \lambda)$. The $K \times 1$ vector $\tau$ and the $(u + l) \times 1$ vector $\lambda$ are called either "Lagrange multipliers" or "undetermined multipliers". Provided a maximum $\hat\xi$ exists and the Jacobian of $[e^{\xi\prime}(\oplus\mathbf{1}_R) - n',\ h'(\xi)]$ is of full row rank $u + l + K$ for all $\xi$, we can solve for the maximum by solving the system of equations

$$\frac{\partial F(\theta)}{\partial \theta} = \begin{pmatrix} y + D(e^{\xi})(\oplus_{k=1}^{K}\mathbf{1}_R)\tau + H(\xi)\lambda \\ (\oplus_{k=1}^{K}\mathbf{1}_R')e^{\xi} - n \\ h(\xi) \end{pmatrix} = 0,$$
where the matrix $H(\xi) = \partial h'(\xi)/\partial\xi$. The Jacobian condition basically requires the constraints to be nonredundant, thereby making $[\Theta_h]$ a well defined model. From this point on, for notational convenience, the indices for the direct sum $\oplus$ will be omitted unless they are different from 1 and $K$.

We now require the matrices of models $[\Theta_X]$ and $[\Theta_h]$ to satisfy some additional conditions. Let us assume that

(A2) Either $C_i = I_{Kq_i}$ or $C_i(\oplus\mathbf{1}_{m_i}) = 0$, $i = 1, 2$; and

(A3) If $C_i = I_{Kq_i}$, then $M(X_i) \supset M(\oplus\mathbf{1}_{m_i})$.

The assumptions require $C_i$ to be either a contrast matrix (rows sum to zero), a zero matrix, or the identity matrix. If $C_i$ is the identity matrix, it will be required that there exists a set of columns in $X_i$ that spans a space containing the range space of $\oplus\mathbf{1}_{m_i}$. For most models of interest, these conditions are met. For example, any of the logit type models, such as cumulative or multiple logit models, can be specified with $C$ being a contrast matrix. For loglinear models, the condition (A3) is met whenever the model includes a parameter for each of the $K$ multinomials.

The following lemma will be useful in showing that the maximum likelihood estimates of $\xi$ and $\beta$ are equivalent under both sampling schemes, product-Poisson and product-multinomial. The lemma will also enable us to reduce the number of equations in the Lagrangian system that must be simultaneously solved when computing the maximum (multinomial) likelihood estimators.
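As a small illustration of the contrast-matrix case of (A2), consider a single binomial response ($K = 1$, $R = 2$) with an ordinary logit. This toy setup is an assumption made here for illustration and is not taken from the text; $A$ is taken to be the identity, and $e^{\xi_j} = n\pi_j$.

```latex
\[
C = \begin{pmatrix} 1 & -1 \end{pmatrix}, \qquad
C\log(e^{\xi}) = \xi_1 - \xi_2 = \log\frac{\pi_1}{\pi_2}, \qquad
C\,\mathbf{1}_2 = 1 - 1 = 0 .
\]
```

The row of $C$ sums to zero, so the multinomial index $n$ cancels out of the logit, which is why contrast-type specifications place no restriction on the design matrix under (A3).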
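Lemma. If the matrices of models $[\Theta_X]$ and $[\Theta_h]$ satisfy (A1), (A2), and (A3), then, provided the model holds,

$$(\oplus\mathbf{1}_R')H(\xi) = 0.$$

Proof: Using matrix derivatives (MacRae; Magnus and Neudecker), it follows that $H(\xi) = [D(e^{\xi})A'D^{-1}(Ae^{\xi})C'U,\ D(e^{\xi})L']$. Thus,

$$\begin{aligned}
(\oplus\mathbf{1}_R')H &= [(\oplus\mathbf{1}_R')D(e^{\xi})A'D^{-1}(Ae^{\xi})C'U,\ (\oplus\mathbf{1}_R')D(e^{\xi})L'] \\
&= [(\oplus e^{\xi_k\prime})A'D^{-1}(Ae^{\xi})C'U,\ \oplus e^{\xi_k\prime}L_k'] \\
&= [(\oplus e^{\xi_k\prime})[A_1', A_2']D^{-1}(Ae^{\xi})(C_1' \oplus C_2')U,\ 0] \\
&= [[\oplus(A_{1k}e^{\xi_k})',\ \oplus(A_{2k}e^{\xi_k})']D^{-1}(Ae^{\xi})(C_1' \oplus C_2')U,\ 0] \\
&= [[(\oplus\mathbf{1}_{m_1}')C_1',\ (\oplus\mathbf{1}_{m_2}')C_2']U,\ 0] \\
&= [0,\ 0].
\end{aligned}$$

The third equality follows since the model holding implies that $\oplus e^{\xi_k\prime}L_k' = 0$. The sixth equality can be seen via the following argument. If both $C_i$'s are contrast matrices or zero matrices, then (A2) implies that the matrix $[(\oplus\mathbf{1}_{m_1}')C_1',\ (\oplus\mathbf{1}_{m_2}')C_2']$ is the zero matrix. On the other hand, if both $C_1$ and $C_2$ are identity matrices, then, since the columns of $U$ span the null space of $X'$, (A3) implies that the columns of $U$ span a set contained in the null space of $[(\oplus\mathbf{1}_{m_1}'),\ (\oplus\mathbf{1}_{m_2}')]$;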
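we have that $[(\oplus\mathbf{1}_{m_1}'),\ (\oplus\mathbf{1}_{m_2}')]U = 0$. Any other combination of $C_1$ and $C_2$ can also be seen to result in the matrix equaling zero. ∎

The following theorem gives conditions under which we can find the ML estimators of $\xi$ by solving a reduced set of equations. The smaller system of equations no longer includes the identifiability constraint equations.

Theorem. Let $\hat\theta^{(M)} = \mathrm{vec}(\hat\xi^{(M)}, \hat\tau^{(M)}, \hat\lambda^{(M)})$ be the solution to the Lagrangian system of equations. Assuming that (A1), (A2), and (A3) hold, the subvector $\mathrm{vec}(\hat\xi^{(M)}, \hat\lambda^{(M)})$ is the solution to the reduced set of $s + u + l$ equations

$$\begin{pmatrix} y - e^{\xi} + H(\xi)\lambda \\ h(\xi) \end{pmatrix} = 0.$$

Proof: Premultiplying the first set of equations in the Lagrangian system by $\oplus\mathbf{1}_R'$, we arrive at

$$(\oplus\mathbf{1}_R')y + (\oplus\mathbf{1}_R')D(e^{\hat\xi^{(M)}})(\oplus\mathbf{1}_R)\hat\tau^{(M)} + (\oplus\mathbf{1}_R')H(\hat\xi^{(M)})\hat\lambda^{(M)} = 0.$$

Now $(\oplus\mathbf{1}_R')y = n$ and $(\oplus\mathbf{1}_R')D(e^{\hat\xi^{(M)}}) = \oplus e^{\hat\xi_k^{(M)}\prime}$. Also, since $(\oplus\mathbf{1}_R')e^{\hat\xi^{(M)}} = n$, it must be that $(\oplus e^{\hat\xi_k^{(M)}\prime})(\oplus\mathbf{1}_R) = D(n)$, the diagonal matrix with the multinomial indices on the diagonal. Further, by the lemma, $(\oplus\mathbf{1}_R')H(\hat\xi^{(M)}) = 0$. Therefore, the premultiplied equation can be rewritten as

$$n + D(n)\hat\tau^{(M)} = 0,$$

which implies that $\hat\tau^{(M)} = -\mathbf{1}_K$. Now, since the identifiability constraints have been explicitly accounted for when solving for $\hat\tau^{(M)}$, we can replace $\tau$ in the Lagrangian system by $-\mathbf{1}_K$ and omit the identifiability constraints. Thus, $\mathrm{vec}(\hat\xi^{(M)}, \hat\lambda^{(M)})$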
is the solution to the reduced set of equations

$$\begin{pmatrix} y - e^{\xi} + H(\xi)\lambda \\ h(\xi) \end{pmatrix} = 0.$$

This is what we set out to show. ∎

Before detailing the iterative scheme used for solving the reduced system, we will explore the asymptotic behavior of the estimator $\hat\theta^{(M)} = \mathrm{vec}(\hat\xi^{(M)}, \hat\lambda^{(M)})$ within the framework of constraint models.

Asymptotic Distribution of Product-Multinomial ML Estimators

In what follows, we will assume that the number of identifiability constraints $K$ is some fixed integer. We also will assume that the asymptotics hold as $n_* = \min\{n_i\}$ approaches infinity and that $n_* \sim n_i$, $i = 1, \ldots, K$. That is, we assume that the asymptotic approximations hold as each of the multinomial indices gets large at the same rate.

The derivation of the asymptotic distribution of $\hat\theta^{(M)}$ will follow closely that of Aitchison and Silvey. Briefly, Aitchison and Silvey show that if the score vector is $o_p(n)$ and the constraints are such that the derivative matrices $H(\xi)$ and $\partial H(\xi)/\partial\xi$ have elements that are bounded functions, then, provided certain mild regularity conditions hold, the maximum likelihood estimator $\hat\xi$ is an $n^{1/2}$-consistent estimator of $\xi_0$ and $\hat\lambda$ is an $n^{1/2}$-consistent estimator of 0. They show that the joint asymptotic distribution of $n^{1/2}\,\mathrm{vec}(\hat\xi - \xi_0, \hat\lambda)$ is multivariate normal with zero mean and covariance matrix

$$\begin{pmatrix} B^{-1} - B^{-1}H(H'B^{-1}H)^{-1}H'B^{-1} & 0 \\ 0 & (H'B^{-1}H)^{-1} \end{pmatrix},$$

where $B$ is the information matrix and $H$ is the derivative of the constraint function.
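The block structure of this covariance matrix is easy to check numerically in a toy case. The sketch below is an illustration under stated assumptions, not part of the original derivation: it takes a diagonal information matrix and a single constraint, with made-up numbers and a hypothetical function name, so that $H'B^{-1}H$ is a scalar.

```python
def constrained_covariance(b_diag, h):
    """Return (B^{-1} - B^{-1} H (H'B^{-1}H)^{-1} H'B^{-1}, (H'B^{-1}H)^{-1})
    for a diagonal B (given by its diagonal) and a single constraint column H."""
    binv_h = [hi / bi for hi, bi in zip(h, b_diag)]          # B^{-1} H
    scale = sum(hi * v for hi, v in zip(h, binv_h))          # H' B^{-1} H (scalar)
    n = len(h)
    cov = [[(1.0 / b_diag[i] if i == j else 0.0) - binv_h[i] * binv_h[j] / scale
            for j in range(n)] for i in range(n)]
    return cov, 1.0 / scale


b_diag = [2.0, 4.0, 5.0]      # hypothetical information matrix diagonal
h = [1.0, -1.0, 0.0]          # hypothetical constraint derivative
cov, multiplier_var = constrained_covariance(b_diag, h)

# H' times the parameter covariance: zero, because the estimator cannot
# vary in the direction that the constraint pins down.
h_cov = [sum(h[i] * cov[i][j] for i in range(3)) for j in range(3)]
```

The vanishing of `h_cov` reflects that the upper-left block is singular along the constraint direction, while the unconstrained third coordinate keeps its unrestricted variance $1/5$.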
In our application, however, there are some minor changes. With the parameterization we use, the information matrix is zero, since the (multinomial) log likelihood is linear in the parameter $\xi$. This happens because the identifiability constraints $e^{\xi\prime}(\oplus\mathbf{1}_R) = n'$ are ignored (to preserve symmetry) when differentiating. Also, in our parameterization, the constraints are in terms of $e^{\xi}$, the components of which are $e^{\xi_{ij}} = n_i\pi_{ij}$. Thus, the constraints and the corresponding derivative matrices may not be bounded. For example, a typical constraint is of the form $Le^{\xi} = 0$. It follows that the components of $Le^{\xi}$ and its derivatives increase without bound as the multinomial indices are allowed to increase without bound.

Fortunately, we can still use the results of Aitchison and Silvey by replacing the matrix $H$ and the vector $\lambda_n$ of Aitchison and Silvey by our $H/n_*$ and $\hat\lambda$, where $n_* = \min\{n_i\}$. The zero information problem can be solved by identifying the vector $Y - e^{\xi}$ as the "score vector". It is pointed out that in this case the asymptotic variance of the suitably normalized score vector is not equal to the negative derivative matrix $D(\pi_0)$, but instead is equal to $D(\pi_0) - \oplus\pi_{0i}\pi_{0i}'$. This happens because the components of $Y$ are not independent; $Y$ is product multinomial. Using this reparameterization, all of the necessary assumptions required by Aitchison and Silvey hold.

As previously mentioned, Aitchison and Silvey show that $\hat\lambda$ is an $n^{1/2}$-consistent estimator of 0. With our parameterization, having replaced $\lambda_n$ by $\hat\lambda$, it follows that $\hat\lambda^{(M)}$ will be $n_*^{1/2}$-consistent. We now derive the asymptotic distribution of $\hat\theta^{(M)}$.
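Define the stochastic function $g$ by

$$g(\theta) = \begin{pmatrix} y - e^{\xi} + H(\xi)\lambda \\ h(\xi) \end{pmatrix}.$$

The maximum likelihood estimator $\hat\theta^{(M)}$ is the solution to $g(\theta) = 0$. [...]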
Now the random variable $\oplus D^{-1}(n_i\mathbf{1}_R)(Y - e^{\xi_0})$ is a vector of normalized sample proportions, so that, suitably scaled, it has an asymptotic normal distribution with zero mean and covariance matrix $D(\pi_0) - \oplus\pi_{0i}\pi_{0i}'$. Therefore, by an extension of a theorem of Cramér and by the preceding expansion, it follows that $n_*^{1/2}\,\mathrm{vec}(\hat\xi^{(M)} - \xi_0, \hat\lambda^{(M)})$ has an asymptotic normal distribution with mean zero and covariance [...]. This covariance matrix is shown in the appendix to have the simple form

$$\begin{pmatrix} M_1 & 0 \\ 0 & M_2 \end{pmatrix},$$

where $M_1 = D^{-1}(\pi_0) - D^{-1}H(H'D^{-1}H)^{-1}H'D^{-1}$ [...]
It is

$$\begin{pmatrix} D^{-1} - D^{-1}H(H'D^{-1}H)^{-1}H'D^{-1} - \oplus n_i^{-1}\mathbf{1}_R\mathbf{1}_R' & 0 \\ 0 & (H'D^{-1}H)^{-1} \end{pmatrix},$$

where $D = D(e^{\xi_0})$ and $H = H(\xi_0)$.

Lagrange's Method: The Algorithm

In this section, we give details of how one can actually fit the models $[\Theta_X]$ or, equivalently, $[\Theta_h]$. We show how Lagrange's undetermined multipliers method can be used in conjunction with a modified Newton-Raphson iterative scheme to compute the ML estimators and their asymptotic covariances. We will assume that the model assumptions (A1), (A2), and (A3) hold. This section includes an outline of the algorithm used in the FORTRAN program "mlerestraint".

Recall that our objective is to find that $\xi \in \Theta_X$, where

$$\Theta_X = \{\xi \in R^s : C\log(Ae^{\xi}) = X\beta,\ (\oplus\mathbf{1}_R')e^{\xi} = n\},$$

that maximizes the multinomial log likelihood $\ell_M(\xi; y) = y'\xi$. Since the assumptions (A1), (A2), and (A3) hold, we see by the theorem above that our problem is reduced to one of solving the reduced system of equations. That is, to find the ML estimator $\hat\theta^{(M)} = \mathrm{vec}(\hat\xi^{(M)}, \hat\lambda^{(M)})$, we must simultaneously solve the system of $s + u + l$ equations

$$g(\theta) = \begin{pmatrix} y - e^{\xi} + H(\xi)\lambda \\ h(\xi) \end{pmatrix} = 0,$$
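where the $(u + l) \times 1$ vector $h$ and the $s \times (u + l)$ matrix $H$ are defined as follows: $h(\xi)$ is the vector of constraint functions, whose first $u$ components are $U'C\log(Ae^{\xi})$, and

$$H(\xi) = \frac{\partial h'(\xi)}{\partial \xi}.$$

It will be shown in a later section that $g(\theta)$ is actually the derivative of the Lagrangian objective function under the product-Poisson sampling assumption.

The iterative scheme used in the FORTRAN program "mlerestraint" is a modified Newton-Raphson algorithm. The algorithm can be sketched as follows:

(1) Find a starting value $\theta^{(0)}$ for $\theta$.
(2) Replace $\theta^{(t)}$ by $\theta^{(t+1)} = \theta^{(t)} - G^{-1}(\theta^{(t)})g(\theta^{(t)})$.
(3) If the norm-based convergence measure exceeds tol, go to (2). Else stop.

The matrix $G(\theta)$ used in step (2) is actually

$$G(\theta) = \begin{pmatrix} -D(e^{\xi}) & H(\xi) \\ H'(\xi) & 0 \end{pmatrix},$$

and the inverse of $G(\theta)$ is of the very simple form (see Aitchison and Silvey, or Rao)

$$G^{-1}(\theta) = \begin{pmatrix} -D^{-1} + D^{-1}H(H'D^{-1}H)^{-1}H'D^{-1} & D^{-1}H(H'D^{-1}H)^{-1} \\ (H'D^{-1}H)^{-1}H'D^{-1} & (H'D^{-1}H)^{-1} \end{pmatrix},$$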
where $D = D(e^{\xi})$. Since we use $G(\theta)$ in place of the Hessian matrix, the procedure is a modification of the Newton-Raphson method. Haber used the more complicated Hessian matrix.

Notice that the inversion of $G$, which may be performed at each iteration, is not nearly as difficult as inverting a general matrix of dimension $(s + u + l) \times (s + u + l)$. First of all, in view of the form of $G^{-1}$, to obtain the inverse of the partitioned matrix $G$ we need only invert the matrices $D$ and $H'D^{-1}H$, which are of dimension $s \times s$ and $(u + l) \times (u + l)$. Secondly, the inversion of $D$ is simple, since $D$ is a diagonal matrix with the $e^{\xi_i}$ on the diagonal. Hence, the most formidable task in the inversion process is the inversion of the symmetric, positive definite matrix $H'D^{-1}H$. There are many efficient ways to invert large symmetric positive definite matrices.

Upon convergence of the algorithm, estimates of the asymptotic covariances of $\hat\xi^{(M)}$ and $\hat\lambda^{(M)}$ are readily calculable. Write $G^{-1}(\hat\theta^{(M)})$ as

$$G^{-1}(\hat\theta^{(M)}) = \begin{pmatrix} P & Q \\ Q' & R \end{pmatrix},$$

where $P = -D^{-1} + D^{-1}H(H'D^{-1}H)^{-1}H'D^{-1}$, $Q = D^{-1}H(H'D^{-1}H)^{-1}$, and $R = (H'D^{-1}H)^{-1}$. The asymptotic covariance of $\hat\theta^{(M)} = \mathrm{vec}(\hat\xi^{(M)}, \hat\lambda^{(M)})$ can be estimated by

$$\begin{pmatrix} -P - \oplus n_i^{-1}\mathbf{1}_R\mathbf{1}_R' & 0 \\ 0 & R \end{pmatrix}.$$

Variance estimates for other continuous functions of $\hat\xi^{(M)}$, such as $\hat\mu^{(M)}$ and $\hat\beta^{(M)} = (X'X)^{-1}X'C\log(Ae^{\hat\xi^{(M)}})$, can be found by invoking
the delta method. For example,

$$\mathrm{var}(\hat\mu^{(M)}) = D(e^{\hat\xi^{(M)}})\,\mathrm{var}(\hat\xi^{(M)})\,D(e^{\hat\xi^{(M)}})$$

and

$$\mathrm{var}(\hat\beta^{(M)}) = (X'X)^{-1}X'C\,D^{-1}(A\hat\mu^{(M)})\,A\,\mathrm{var}(\hat\mu^{(M)})\,A'\,D^{-1}(A\hat\mu^{(M)})\,C'X(X'X)^{-1}.$$

Evidently, Lagrange's method of undetermined multipliers provides us with a convenient procedure for maximum likelihood fitting of models in a very general class of parametric models for multivariate polytomous data, with covariates possible. We now briefly outline the steps needed to perform the iterations of the algorithm.

Computing $U$: The first thing we must do is write the freedom model, which can easily be input by the user, as a constraint model. Therefore, we must compute a full column rank matrix $U$ that satisfies $U'X = 0$. The method we use to find $U$ is attributed to Haber.

Using the notation of "mlerestraint", let $X$ be a full column rank matrix of dimension $q \times r$. Let $u = q - r$ be the dimension of the null space of $X'$. Further, the matrices $A$ and $C$ will have dimensions $m \times s$ and $q \times m$, respectively. The relationship between these dimension variables and those used in the earlier sections is as follows: $q = K(q_1 + q_2)$, $r = p_1 + p_2$, $m = K(m_1 + m_2)$. We use the variables $q$, $r$, and $m$ for notational convenience.
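As a concrete illustration of the iteration whose steps are being outlined, here is a minimal sketch for the simplest possible case: a single linear constraint $h(\xi) = Le^{\xi} = 0$ and no freedom equations. It is not the "mlerestraint" program. The function name, starting values, tolerance, and counts are all assumptions made for illustration, and the stopping rule (largest absolute component of $g$) is one possible choice rather than the criterion used in the original. With one constraint, $D^{-1}H = L'$ and $H'D^{-1}H$ is a scalar, so the partitioned inverse needs no matrix routines.

```python
import math


def fit_linear_constraint(y, L, tol=1e-10, max_iter=100):
    """Solve y - e^xi + H(xi) lam = 0 and h(xi) = L e^xi = 0 for one constraint,
    using theta <- theta - G^{-1} g with G = [[-D, H], [H', 0]]."""
    xi = [math.log(v) for v in y]        # start at the observed counts (assumed > 0)
    lam = 0.0
    for _ in range(max_iter):
        mu = [math.exp(v) for v in xi]
        # H = D(mu) L', so the first block of g is y - mu + mu * L * lam.
        g1 = [yi - m + m * li * lam for yi, m, li in zip(y, mu, L)]
        g2 = sum(li * m for li, m in zip(L, mu))          # h(xi) = L mu
        if max(max(abs(v) for v in g1), abs(g2)) < tol:
            break
        # Partitioned inverse pieces: D^{-1} H = L' and R = (H' D^{-1} H)^{-1}.
        r = 1.0 / sum(li * li * m for li, m in zip(L, mu))
        t = r * (sum(li * v for li, v in zip(L, g1)) + g2)
        # Upper block of G^{-1} g is -D^{-1} g1 + L' t; lower block is t.
        xi = [x + v / m - li * t for x, v, m, li in zip(xi, g1, mu, L)]
        lam = lam - t
    return [math.exp(v) for v in xi], lam


# Hypothetical 2 x 2 table in the order (y11, y12, y21, y22), with the
# marginal homogeneity constraint written as mu12 - mu21 = 0.
y = [10.0, 5.0, 15.0, 20.0]
L = [0.0, 1.0, -1.0, 0.0]
mu_hat, lam_hat = fit_linear_constraint(y, L)
```

In this example the fitted off-diagonal means both converge to the average of the two off-diagonal counts, and the fitted means sum to the observed total even though no identifiability constraint was imposed, which is the behaviour the reduced system of equations is meant to deliver.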
Consider the matrix $U^* = I_q - X(X'X)^{-1}X'$. This $q \times q$ matrix is of rank $u = q - r$ and satisfies the property $U^{*\prime}X = 0$. Let $W$ denote a $q \times u$ matrix with random elements. Specifically, the $W_{ij}$, $i = 1, \ldots, q$, $j = 1, \ldots, u$, are uniform random variables. It follows that the matrix $W$ is of full column rank with probability one, and hence that the $q \times u$ matrix $U = U^*W$ is of full column rank $u$ with probability one. But the matrix $U$ satisfies $U'X = W'U^{*\prime}X = W'0 = 0$. Therefore, at least with probability one, we have found a full column rank matrix $U$ that satisfies the property $U'X = 0$. Using this $U$, we are able to write the freedom model as a constraint model.

Computing $h(\xi)$: We write the constraint model as

$$\Theta_h = \{\xi \in R^s : h(\xi) = 0,\ e^{\xi\prime}(\oplus\mathbf{1}_R) = n'\},$$

where the constraint function $h$ is defined as $h(\xi) = U'C\log(Ae^{\xi})$.

Computing $g(\theta)$: Notice that since (A1), (A2), and (A3) hold, the identifiability constraints present in the product multinomial model can be accounted for explicitly. It will follow by results of a later section that under either sampling scheme, product-Poisson or product-multinomial,
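the maximum likelihood estimators for $\xi$ and $\lambda$ can be found by solving the equation

$$g(\theta) = \begin{pmatrix} y - e^{\xi} + H(\xi)\lambda \\ h(\xi) \end{pmatrix} = 0,$$

where the matrix $H$ is the derivative of $h'$ with respect to $\xi$.

Computing $H(\xi)$: We will use matrix derivative results of MacRae to find the matrix of derivatives of the constraint function $h'(\xi)$:

$$H(\xi) = \frac{\partial h'(\xi)}{\partial \xi} = \frac{\partial}{\partial \xi}[\log(e^{\xi\prime}A')C'U,\ e^{\xi\prime}L'] = [D(e^{\xi})A'D^{-1}(Ae^{\xi})C'U,\ D(e^{\xi})L'].$$

The equality follows upon using the matrix version of the chain rule. Notice that $\partial[\log(e^{\xi\prime}A')C'U]/\partial\xi = D(e^{\xi})A'D^{-1}(Ae^{\xi})C'U$ and that $\partial(e^{\xi\prime}L')/\partial\xi = D(e^{\xi})L'$.

Computing $G(\theta)$: The iterative scheme used to solve the system of equations $g(\theta) = 0$ is actually a slight modification of the Newton-Raphson algorithm. It is a modification because we do not use the derivative matrix $G^* = \partial g(\theta)/\partial\theta'$ to adjust $\theta$ at each iteration, as Haber did, but rather a simpler matrix $G$ that is related to $G^*$ by $G^* = G + O_p(n_*^{1/2})$. The derivative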
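matrix $G^*$ can be computed as follows:

$$G^* = \frac{\partial g(\theta)}{\partial \theta'} = \begin{pmatrix} -D(e^{\xi}) + \dfrac{\partial H(\xi)\lambda}{\partial \xi'} & H(\xi) \\ H'(\xi) & 0 \end{pmatrix}.$$

The matrix $\partial H(\xi)\lambda/\partial\xi' = [\partial H(\xi)/\partial\xi'](I \otimes \lambda)$ is of order $O_p(n_*^{1/2})$ when it is evaluated at $\mathrm{vec}(\hat\xi, \hat\lambda)$, since $\partial H(\xi)/\partial\xi' = O(n_*)$ and $\hat\lambda = O_p(n_*^{-1/2})$. It follows that the matrix $G$, which is much simpler to invert than $G^*$, can be used to adjust the estimate at each iteration.

Computing the inverse of $G$: Although the matrix $G$ is of dimension $(s + u + l) \times (s + u + l)$, which may be very large in practice, its inverse is relatively simple to calculate. The inverse of the partitioned matrix $G$ is shown by Aitchison and Silvey to have form

$$G^{-1} = \begin{pmatrix} -D^{-1} + D^{-1}H(H'D^{-1}H)^{-1}H'D^{-1} & D^{-1}H(H'D^{-1}H)^{-1} \\ (H'D^{-1}H)^{-1}H'D^{-1} & (H'D^{-1}H)^{-1} \end{pmatrix}.$$

Therefore, only the matrices $D$ and $H'D^{-1}H$, which are of dimensions $s \times s$ and $(u + l) \times (u + l)$, need to be inverted. The inverse of $D$ is easily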
calculated, since $D$ is a diagonal matrix with the $e^{\xi_i}$ on the diagonal. The inverse of $H'D^{-1}H$, a symmetric, positive definite matrix, can be found quite easily, even when $u$, the number of constraints, is large.

It should be pointed out that when $s$, the total number of cell means, is large, the number of constraints $u$ may be large and on the same order as $s$. This will be the case for parsimonious models, those models with many constraints relative to the number of model parameters. One could choose to invert the matrix $G$ a limited number of times to mitigate the computational burden. In fact, in two of their papers, Aitchison and Silvey advocate an iterative method whereby the inverse of $G$ is computed only two times: once at the initial iteration and again at the final iteration, upon convergence. We feel, however, that in this special case, in which the matrix $G$ has a particularly simple form, the inverse can be computed at each iteration. Along with increased computing power, there are many efficient algorithms for inverting large symmetric positive definite matrices.

Comparison of Product-Multinomial and Product-Poisson Estimators

We begin this section by introducing notation for a product-Poisson random vector. The $s \times 1$ random vector $Y = \mathrm{vec}(Y_1, \ldots, Y_K)$ [...] $[\Theta_P]$, where

$$\Theta_P = \{\xi \in R^s : C\log(Ae^{\xi}) = X\beta,\ Le^{\xi} = 0\}$$
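or, equivalently, for appropriately chosen $U$,

$$\Theta_P = \Theta_h^{(P)} = \{\xi \in R^s : U'C\log(Ae^{\xi}) = 0,\ Le^{\xi} = 0\}.$$

This model implies all the same constraints on $\xi$ as the product multinomial model $[\Theta_h]$, with one exception: the identifiability constraints $e^{\xi\prime}(\oplus\mathbf{1}_R) = n'$ are not included.

Denote the maximum likelihood estimators computed under the product-Poisson assumptions by $\hat\xi^{(P)}$ and $\hat\beta^{(P)}$. Similarly, denote the maximum likelihood estimators computed under the product-multinomial assumptions by $\hat\xi^{(M)}$ and $\hat\beta^{(M)}$. Recall that the three product-multinomial model assumptions are

(A1) The multinomial response model can be specified as before. That is, the model parameter space can be represented as

$$\Theta_X = \{\xi \in R^s : C_1\log A_1e^{\xi} = X_1\beta_1,\ C_2\log A_2e^{\xi} = X_2\beta_2,\ Le^{\xi} = 0,\ e^{\xi\prime}(\oplus_{k=1}^{K}\mathbf{1}_R) = n'\},$$

where

- $C_i = \oplus_{j=1}^{K} C_{ij}$, with $C_{ij} = C_{i1}$ of dimension $q_i \times m_i$, $i = 1, 2$;
- $A_i = \oplus_{j=1}^{K} A_{ij}$, with $A_{ij} = A_{i1}$ of dimension $m_i \times R$, $i = 1, 2$;
- $L = \oplus_{j=1}^{K} L_j$, with $L_j = L_1$ of dimension $d \times R$;
- $\xi = \mathrm{vec}(\xi_1, \ldots, \xi_K)$, and each $\xi_k$ is $R \times 1$;
- $X_i$ is $Kq_i \times p_i$ of full rank $p_i$, $i = 1, 2$;
- $n$ is the $K \times 1$ vector of multinomial indices;
- $s = RK$ is the total number of cells.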
(A2) Either $C_i = I_{Kq_i}$ or $C_i(\oplus\mathbf{1}_{m_i}) = 0$, $i = 1, 2$; and

(A3) If $C_i = I_{Kq_i}$, then $M(X_i) \supset M(\oplus\mathbf{1}_{m_i})$.

The following theorem states that the maximum likelihood estimators for $\xi$, and hence $\beta$, are the same under the product-multinomial sampling scheme and the product-Poisson sampling scheme, provided that the three assumptions (A1), (A2), and (A3) hold.

Theorem. If the model satisfies assumptions (A1), (A2), and (A3), then $\hat\beta^{(P)} = \hat\beta^{(M)}$ and $\hat\xi^{(P)} = \hat\xi^{(M)}$. That is, the maximum likelihood estimators of $\beta$ and $\xi$ are the same under both sampling schemes, product-Poisson and product-multinomial.

Proof: Under the product Poisson assumption, the kernel of the log likelihood is $\ell_P(\xi; y) = y'\xi - e^{\xi\prime}\mathbf{1}$. Therefore, letting $\theta = \mathrm{vec}(\xi, \lambda)$, the corresponding Lagrangian objective function is

$$F_P(\theta) = y'\xi - e^{\xi\prime}\mathbf{1} + h'(\xi)\lambda,$$

and so to find the maximum (Poisson) likelihood estimator $(\hat\xi^{(P)}, \hat\lambda^{(P)})$ we must solve the system of equations

$$\frac{\partial F_P(\theta)}{\partial \theta} = \begin{pmatrix} y - e^{\hat\xi^{(P)}} + H(\hat\xi^{(P)})\hat\lambda^{(P)} \\ h(\hat\xi^{(P)}) \end{pmatrix} = 0.$$

The conclusion of the theorem now follows, since the reduced equations of
the earlier theorem and this system yield exactly the same solutions, and

$$\hat\beta^{(P)} = (X'X)^{-1}X'C\log(Ae^{\hat\xi^{(P)}}) = (X'X)^{-1}X'C\log(Ae^{\hat\xi^{(M)}}) = \hat\beta^{(M)}. \qquad ∎$$

As a corollary to the theorem, we have

Corollary. Provided the assumptions of the theorem hold, the estimated undetermined multipliers are invariant with respect to sampling scheme, i.e. $\hat\lambda^{(M)} = \hat\lambda^{(P)}$.

Proof: The proof follows immediately upon noting that the two systems of equations yield exactly the same solutions. ∎

A remark is in order. Basically, the theorem enables us to conclude that the sufficient and necessary condition of Birch holds. The condition is that the model be specified so that the Poisson ML estimators necessarily satisfy the identifiability constraints that are required for the multinomial model.

We now explore the asymptotic behavior of the (Poisson) ML estimator $\hat\theta^{(P)} = \mathrm{vec}(\hat\xi^{(P)}, \hat\lambda^{(P)})$. Under the product-Poisson assumptions, we can obtain the asymptotic distribution of $\hat\theta^{(P)}$ by formally replacing $n_* = \min\{n_i\}$ by $\mu_* = \min\{e^{\xi_i\prime}\mathbf{1}\}$ and using the same arguments as those used to derive the asymptotic distribution of $\hat\theta^{(M)}$. Jørgensen discusses limiting distributions for Poisson random variables as the mean parameters, or equivalently $\mu_*$, go to infinity. In this
case, [...]
Result 1. The asymptotic distributions of $\hat\lambda^{(P)}$ and $\hat\lambda^{(M)}$ are identical, and it follows that the Lagrange multiplier statistic, which has form $LM = \hat\lambda'(\mathrm{var}(\hat\lambda))^{-1}\hat\lambda$, is invariant with respect to the sampling scheme.

Result 2.

$$\mathrm{var}(\hat\xi^{(M)}) = \mathrm{var}(\hat\xi^{(P)}) - \oplus\frac{\mathbf{1}_R\mathbf{1}_R'}{n_i}.$$

Result 3.

$$\mathrm{var}(\hat\beta^{(M)}) = \mathrm{var}(\hat\beta^{(P)}) - \Lambda,$$

where $\Lambda = (X'X)^{-1}X'C$ [...] and is nonnegative definite.

The notation $\mathrm{var}(\cdot)$ used in these results denotes the asymptotic variance. This is important, since the finite sample variances may not even exist. The proofs of these results are straightforward. Basically, they involve using the delta method and the asymptotic covariance derived above. The interested reader will find an outline of the proofs in Appendix A.

In practice, it is of particular interest to evaluate the matrix $\Lambda$ of Result 3. Often, for convenience, the models are fit assuming the vector $Y$ is product Poisson, and then inferences based on the maximum likelihood estimates are made assuming that they are invariant with respect to the sampling assumption. Birch and Palmgren derive rules for
when these inferences based on the two different sampling assumptions will be equivalent. However, they assume that the model is of a simple loglinear form. That is, the Poisson model is assumed to have form $\Theta_X = \{\xi \in R^s : \xi = X\beta\}$. We will use the results of this section to derive more general rules for when the two inferences will be equal. As a special case of these results, we will arrive at the Birch and Palmgren results.

The following lemma will enable us to rewrite $\Lambda$ in a still simpler form.

Lemma. Let $Z = [Z_1, \ldots, Z_K]$ be an $r \times K$ matrix of full rank $K$. Suppose that $X = [X_1, \ldots, X_p]$ is an $r \times p$ ($r \geq p \geq K$) matrix of full rank $p$ such that $M(X) \supset M(Z)$, i.e. the range space of $X$ contains the range space of $Z$. Denote the $T$ ($K \leq T \leq p$) columns of $X$ that span a space that contains $M(Z)$ by $\{X_{v_1}, \ldots, X_{v_T}\}$. Without loss of generality, suppose that the set of vectors $\{X_{v_1}, \ldots, X_{v_T}\}$ is a minimal spanning subset, i.e. the span of any proper subset of these vectors does not contain the range space of $Z$. We conclude that there exists $W \in R^{T \times K}$ such that

$$(X'X)^{-1}X'Z = [e_{v_1}, \ldots, e_{v_T}]W,$$

where, in the $p \times T$ matrix $[e_{v_1}, \ldots, e_{v_T}]$, $e_{v_i}$ is the $p \times 1$ vector $(0, \ldots, 0, 1, 0, \ldots, 0)'$ with the "1" in the $v_i$th position.
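A tiny numerical check may make the conclusion of the lemma more tangible. The sketch below is a hypothetical illustration, not part of the original text: $X$ has two columns, $Z$ is a multiple of the first column only (so $T = 1$, $v_1 = 1$, and $W$ is the $1 \times 1$ matrix holding that multiple), and the function name and numbers are made up. The lemma then says that $(X'X)^{-1}X'Z$ should be $e_1W$, that is, nonzero only in the first position.

```python
def least_squares_coefficients(x, z):
    """Return (X'X)^{-1} X'z for a two-column X, using the explicit 2 x 2 inverse."""
    a = sum(r[0] * r[0] for r in x)
    b = sum(r[0] * r[1] for r in x)
    d = sum(r[1] * r[1] for r in x)
    u = sum(r[0] * zi for r, zi in zip(x, z))
    v = sum(r[1] * zi for r, zi in zip(x, z))
    det = a * d - b * b
    return [(d * u - b * v) / det, (a * v - b * u) / det]


x = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]   # hypothetical design: intercept and slope
z = [3.0, 3.0, 3.0]                        # Z = 3 * (first column of X), so W = [3]
coef = least_squares_coefficients(x, z)    # expected to equal e_1 * 3
```

The second coefficient vanishes because the second column of $X$ is not needed to span $M(Z)$, which is the mechanism that later produces the zero entries of $\Lambda$.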
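Proof: Let $X^* = [X_{v_1}, \ldots, X_{v_T}]$. Now, by assumption, $M(X^*) \supset M(Z)$. Hence, there must exist a matrix $W \in R^{T \times K}$ such that $Z = X^*W$. Therefore,

$$(X'X)^{-1}X'Z = (X'X)^{-1}X'X^*W = [e_{v_1}, \ldots, e_{v_T}]W,$$

where $(X'X)^{-1}X'X^*$ is as stated in the conclusion of the lemma. ∎

Before stating the next important theorem, let us write $\Lambda$ in another way. Assuming that (A1) holds, $\Lambda$ can be written as

$$\Lambda = \begin{pmatrix} \Lambda_{11} & \Lambda_{12} \\ \Lambda_{21} & \Lambda_{22} \end{pmatrix},$$

where

$$\Lambda_{ij} = (X_i'X_i)^{-1}X_i'C_i(\oplus\mathbf{1}_{m_i})\,D^{-1}(n)\,(\oplus\mathbf{1}_{m_j}')C_j'X_j(X_j'X_j)^{-1}.$$

Now, if $C_i$ is a contrast matrix, by assumption (A2) we can write

$$(X_i'X_i)^{-1}X_i'C_i(\oplus\mathbf{1}_{m_i}) = E^{(i)}W^{(i)},$$

where $E^{(i)}$ can arbitrarily be chosen to be equal to $X_i'$, and so $W^{(i)} = 0$. On the other hand, if $C_i = I_{Kq_i}$, then we have by (A3) that $M(X_i) \supset M(\oplus\mathbf{1}_{m_i})$. Therefore, we can invoke the result of the lemma by setting $Z = \oplus\mathbf{1}_{m_i}$. Since $M(X_i) \supset M(\oplus\mathbf{1}_{m_i}) = M(Z)$, the conditions for the lemma are satisfied. Let $X_i^* = [X^{(i)}_{v_1}, \ldots, X^{(i)}_{v_{T_i}}]$ be the $Kq_i \times T_i$ ($K \leq T_i \leq p_i$) submatrix of $X_i$ whose columns form a minimal spanning subset for $M(Z) = M(\oplus\mathbf{1}_{m_i})$. By the lemma, there exists $W^{(i)} \in R^{T_i \times K}$ such that

$$(X_i'X_i)^{-1}X_i'(\oplus\mathbf{1}_{m_i}) = E^{(i)}W^{(i)}.$$

Here $E^{(i)} = [e^{(i)}_{v_1}, \ldots, e^{(i)}_{v_{T_i}}]$, where the $T_i$ elementary vectors correspond to the $T_i$ columns $\{X^{(i)}_{v_1}, \ldots, X^{(i)}_{v_{T_i}}\}$ of $X_i$ that form a minimal spanning subset for the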
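range space of $\oplus\mathbf{1}_{m_i}$, i.e. the columns span a set that contains the range space of $\oplus\mathbf{1}_{m_i}$, and any smaller set of columns will not span a set containing the range space of $\oplus\mathbf{1}_{m_i}$. It follows that the matrices $\Lambda_{ij}$ can be written as

$$\Lambda_{ij} = E^{(i)}W^{(i)}D^{-1}(n)W^{(j)\prime}E^{(j)\prime},$$

where $E^{(i)} = [e^{(i)}_{v_1}, \ldots, e^{(i)}_{v_{T_i}}]$ if $C_i = I_{Kq_i}$ and $E^{(i)} = X_i'$ otherwise, and where $W^{(i)}$ is the matrix given by the lemma if $C_i = I_{Kq_i}$ and $W^{(i)} = 0$ otherwise.

We now state a theorem of substantive importance.

Theorem. Suppose that assumptions (A1), (A2), and (A3) hold. For $r = 1, 2$, if $C_r$ is the identity matrix, then let $\{v^{(r)}_1, \ldots, v^{(r)}_{T_r}\}$ be the set of indices that index those columns of $X_r$ that form a minimal spanning subset for $M(\oplus\mathbf{1}_{m_r})$. Then it follows that the relationship between the asymptotic variances of the two estimators $\hat\beta^{(M)}$ and $\hat\beta^{(P)}$ is

$$\mathrm{var}(\hat\beta^{(M)}) = \mathrm{var}(\hat\beta^{(P)}) - \begin{pmatrix} \Lambda_{11} & \Lambda_{12} \\ \Lambda_{21} & \Lambda_{22} \end{pmatrix},$$

where the $p_i \times p_j$ matrix $\Lambda_{ij}$ is a zero matrix whenever at least one of $C_i$ or $C_j$ is a contrast or zero matrix. Otherwise, if both $C_i$ and $C_j$ are identity matrices, then

$$(\Lambda_{ij})_{kl} = 0 \quad \text{if } (k, l) \notin \{v^{(i)}_1, \ldots, v^{(i)}_{T_i}\} \times \{v^{(j)}_1, \ldots, v^{(j)}_{T_j}\}.$$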
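Proof: Since (A1), (A2), and (A3) hold, we can rewrite $\Lambda_{ij}$ in the factored form given above. Now, if either $C_i$ or $C_j$ is a contrast or zero matrix, it is obvious from that form that $\Lambda_{ij}$ will have zero components as stated in the theorem, since at least one of $W^{(i)}$ or $W^{(j)}$ will be a zero matrix. On the other hand, if both $C_i$ and $C_j$ are identity matrices, then $\Lambda_{ij}$ can be rewritten in the factored form, where the matrices $W^{(i)}$ and $W^{(j)}$ are elements of $R^{T_i \times K}$ and $R^{T_j \times K}$. Hence,

$$\Lambda_{ij} = E^{(i)}W^*_{ij}E^{(j)\prime},$$

where $W^*_{ij} = W^{(i)}D^{-1}(n)W^{(j)\prime}$ is some $T_i \times T_j$ matrix. Now, since the $\{e^{(i)}_v\}$ are elementary vectors, we have that if $(k, l) \notin \{v^{(i)}_1, \ldots, v^{(i)}_{T_i}\} \times \{v^{(j)}_1, \ldots, v^{(j)}_{T_j}\}$, then the component $(\Lambda_{ij})_{kl} = 0$. Otherwise, if $(k, l)$ is a member of this set, it must be that $(\Lambda_{ij})_{kl}$ is one of the elements of the matrix $W^*_{ij}$. This completes the proof. ∎

The next two corollaries follow immediately from the theorem.

Corollary. If both $C_1$ and $C_2$ are contrast matrices, then $\mathrm{var}(\hat\beta^{(M)}) = \mathrm{var}(\hat\beta^{(P)})$.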
Proof: Since both $C_1$ and $C_2$ are contrast matrices, it follows that $W^{(1)}$ and $W^{(2)}$ are zero matrices. Therefore, the matrices $A_{ij}$ of the theorem are zero matrices. □

Corollary […]. Let […] and $C_2 = 0$, so that the model ([…]) becomes […], i.e. a simple loglinear model with […] subpopulations. Let $\{v_1, \ldots, v_T\}$ be the set of indices that index the columns of $X$ that form a minimal spanning subset for […]. Then

var($\hat\beta^{(M)}$) […] var($\hat\beta^{(P)}$) […] $A$,

where the elements of $A$ are such that $A_{kl} = 0$ if $(k, l) \notin \{v_1, \ldots, v_T\} \times \{v_1, \ldots, v_T\}$.

Proof: The proof is an immediate consequence of the theorem upon identifying $A_{11}$ of the theorem with $A$ of the corollary. The other matrices $A_{12}$, $A_{21}$ and $A_{22}$ will be zero since $C_2 = 0$. □

Corollary […] is of practical importance and is essentially the result shown by Palmgren ([…]). In particular, if we parameterize the model in such a way that there is a parameter included for each of the independent multinomials (or covariate levels), then the columns of $X$ corresponding to these "fixed by design" parameters will form a basis (and hence a minimal spanning subset) for $M(\oplus 1_{[…]})$. Therefore, if $i$ and $j$ are not one of the
$K$ parameters fixed by design, then cov($\hat\beta_i^{(M)}, \hat\beta_j^{(M)}$) = cov($\hat\beta_i^{(P)}, \hat\beta_j^{(P)}$).

We will illustrate the utility of the above results in the next chapter of this dissertation. The next section considers issues that may arise when computing the model degrees of freedom. It also states some other miscellaneous results with regard to the Lagrange multiplier statistic.

Miscellaneous Results

We begin this section by addressing practical issues that may arise during nonstandard model fitting. Specifically, we will consider computing the model and distance (or residual) degrees of freedom.

Computing model and distance degrees of freedom

Assuming the model $[Q_h]$ of ([…]) is well defined, i.e. the $u + l$ constraints are nonredundant, we can compute the model degrees of freedom as in section […]. In that section, we defined the model degrees of freedom as the number of model parameters minus the number of independent constraints implied by the model. Notice that in this application we have an additional $l$ linear constraints. The $l$ constraints were not present in section […]. It follows that the model degrees of freedom for $[Q_h]$ is

df$[Q_h] = s - (u + l + K)$, ([…])

where $s$ is the number of cell means, $u$ is the dimension of the null space of $X'$, $l$ is the number of linear constraints, and $K$ is the number of identifiability constraints.
To measure model goodness of fit, we can consider estimating some hypothetical distance between the model $[Q_h]$ and the saturated model ($u = l = 0$). This distance has degrees of freedom

df[saturated model] $-$ df$[Q_h] = (s - K) - (s - (u + l + K)) = u + l$. ([…])

Notice that, had we considered the product-Poisson model ([…]), the distance degrees of freedom would be

$s - (s - (u + l)) = u + l$,

which is identical to the product-multinomial distance degrees of freedom of ([…]).

We have assumed that the $u + l$ constraints are nonredundant, i.e. each constraint is not implied by the other constraints. This may not always be the case. To illustrate, consider the model specification for example […] of section […]. The model $[Q_{mh}]$ implies that the two marginal distributions are equal. We stated at the end of that example that the additional constraint […] was redundant. This can be seen since […]. That is, the constraints of model $[Q_{mh}]$ imply that […] equals zero. Had we blindly added this constraint, we may have incorrectly calculated the model degrees of freedom as […] and the distance degrees of freedom as […]. Therefore, we must be very careful to have a set of nonredundant constraints when computing degrees of freedom.
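As a minimal sketch of the bookkeeping above (not the dissertation's own code), the number of nonredundant constraints can be checked as the rank of a constraint coefficient matrix. The example assumes a single $d \times d$ table whose cell probabilities are listed row by row, and writes one marginal homogeneity constraint (row margin minus column margin) per response level; the helper names `matrix_rank` and `mh_constraint_row` are hypothetical.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot position with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def mh_constraint_row(level, d):
    """Coefficients of (row margin - column margin) at one level, cells in row-major order."""
    row = [0] * (d * d)
    for j in range(d):
        row[level * d + j] += 1  # cell (level, j) belongs to the row margin
        row[j * d + level] -= 1  # cell (j, level) belongs to the column margin
    return row


d = 3  # illustrative number of response levels (an assumption)
stated_constraints = [mh_constraint_row(level, d) for level in range(d)]
n_stated = len(stated_constraints)
n_independent = matrix_rank(stated_constraints)
print(n_stated, n_independent)
```

Counting the stated constraints instead of the independent ones would misstate both the model and the distance degrees of freedom, which is the pitfall described above.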
In practice, when models are more complicated, it may be difficult to ascertain whether or not the model constraints are nonredundant. Fortunately, there are two very useful results that help in this regard. The first result is that when the constraints are redundant, the matrix $H'D^{-1}H$, evaluated at some point in […], is of less than full rank and is not invertible. Therefore, in practice, if the algorithm ([…]) does not converge due to […] being singular, it may be due to redundant constraints, i.e. an ill defined model. The user should investigate and possibly respecify the model should this occur. A caveat is that, due to computational roundoff error, a singularity may not occur even when the model is ill defined, because the iterate estimates, including the final estimate, may not strictly lie in $Q_h$. The next result may mitigate this problem.

A result that is useful in practice is that a necessary condition for the constraints to be nonredundant, or equivalently for the model to be well defined, is that the Lagrange multiplier statistic be invariant to choice of $U$, a matrix with columns spanning the null space of $X$. Evidently, if the user fits the model several times, each time using a different $U$ matrix, and the Lagrange multiplier statistic varies (more so than can be explained by roundoff error), then it must be that the model is ill defined. Formally, this necessary condition can be stated as follows.

Theorem […]. Let $U_1$ and $U_2$ ($U_1 \neq U_2$) be any two full column rank matrices satisfying $U_i'X = 0$. Denote the Lagrange multiplier statistic evaluated using $U_i$ by $LM(U_i)$. If the matrix $H_i(\cdot) = […]$
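is such that $H_i(\hat\xi)$ is of full column rank, $i = 1, 2$ (and hence the models are well defined), then $LM(U_1) = LM(U_2)$, i.e. the value of the Lagrange multiplier statistic is invariant with respect to choice of $U$.

Proof: Denote the model specified in terms of $U_i$ by […], $i = 1, 2$. By the definition of $U_i$, we know that the constraints implied by the two models are equivalent. Hence, the solution $\hat\xi$ to ([…]), or equivalently ([…]), under either model is the same. Thus, in view of the first set of equations in ([…]), any solution $(\hat\xi, \hat\lambda_i)$ under model $i$ must satisfy

$y - e^{\hat\xi} + H_i(\hat\xi)\hat\lambda_i = 0$. ([…])

Notice that since $U_1 \neq U_2$, we have that $H_1(\hat\xi) \neq H_2(\hat\xi)$ and, by ([…]), $\hat\lambda_1 \neq \hat\lambda_2$. Now ([…]) implies that

$H_1(\hat\xi)\hat\lambda_1 = H_2(\hat\xi)\hat\lambda_2$. ([…])

Also, since $H_i(\hat\xi)$ is assumed to be of full column rank, the variance of $\hat\lambda_i$,

var($\hat\lambda_i$) $= [H_i'(\hat\xi)D^{-1}(\hat\mu)H_i(\hat\xi)]^{-1}$, ([…])

exists. Therefore, the Lagrange multiplier statistics $LM(U_i)$, which have form

$\hat\lambda_i'[\mathrm{var}(\hat\lambda_i)]^{-1}\hat\lambda_i$, $i = 1, 2$, ([…])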
exist. Finally, by ([…]) and ([…]), it follows that

$LM(U_1) = \hat\lambda_1'[\mathrm{var}(\hat\lambda_1)]^{-1}\hat\lambda_1 = \hat\lambda_2'[\mathrm{var}(\hat\lambda_2)]^{-1}\hat\lambda_2 = LM(U_2)$.

This completes the proof. □

The final result of this section states that the Lagrange multiplier statistic is exactly the same as the Pearson chi-squared statistic whenever the random vector $Y$ is product-Poisson or product-multinomial and the model satisfies assumptions (A1), (A2) and (A3).

Theorem […]. Assume that the product-multinomial model satisfies assumptions (A1), (A2) and (A3). Let $X^2$ denote the Pearson chi-squared statistic, i.e.

$X^2 = \sum_i (y_i - \hat\mu_i)^2 / \hat\mu_i$,

where $\hat\mu$ is the ML estimator under either of the sampling schemes, product-multinomial or product-Poisson. It follows that the Lagrange multiplier statistic $LM$ is equivalent to $X^2$. That is, $LM = X^2$.

Proof: By equations ([…]), ([…]) and ([…]) of the previous theorem's proof and the fact that $e^{\hat\xi} = \hat\mu$, we have that

$LM = (y - \hat\mu)'D^{-1}(\hat\mu)(y - \hat\mu) = X^2$.

This is what we set out to show.
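The last step of the proof can be checked numerically with a small sketch. It assumes that $D(\hat\mu)$ is the diagonal matrix with the fitted means on its diagonal, and it uses made-up counts and fitted values purely for illustration; the function names are hypothetical and this is not the author's FORTRAN program.

```python
def pearson_chi_squared(y, mu):
    """Pearson statistic: sum over cells of (y_i - mu_i)^2 / mu_i."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))


def quadratic_form(y, mu):
    """(y - mu)' D^{-1} (y - mu), with D assumed to be diag(mu)."""
    n = len(mu)
    resid = [yi - mi for yi, mi in zip(y, mu)]
    # Build D^{-1} explicitly so the matrix form of the statistic is visible.
    d_inv = [[(1.0 / mu[i]) if i == j else 0.0 for j in range(n)] for i in range(n)]
    d_inv_resid = [sum(d_inv[i][j] * resid[j] for j in range(n)) for i in range(n)]
    return sum(resid[i] * d_inv_resid[i] for i in range(n))


# Hypothetical observed counts and fitted means (illustration only).
y = [10.0, 20.0, 30.0, 40.0]
mu_hat = [12.5, 17.5, 27.5, 42.5]

x2 = pearson_chi_squared(y, mu_hat)
lm_form = quadratic_form(y, mu_hat)
print(x2, lm_form)
```

The two numbers agree up to floating-point error, which is all the final display of the proof asserts once the Lagrange multiplier statistic has been reduced to the quadratic form.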
Discussion

In this chapter, we discussed in some detail issues related to parametric modeling. In particular, we followed the lead of Aitchison and Silvey ([…]) and Silvey ([…]) and described two ways of specifying models: using constraint equations and using freedom equations.

In section […], distance measures for quantifying how far apart two models are, relative to how close they are to holding, were discussed. In particular, the power-divergence measures (Read and Cressie, […]) were used when the parameter spaces were subsets of an $(s - 1)$-dimensional simplex. Estimates of these distances were developed based on very intuitive notions. Also, a geometric interpretation of model and residual (or distance) degrees of freedom was given.

In section […], we described a general class of multivariate polytomous (categorical) response data models. The class of models that satisfy assumptions (A1), (A2) and (A3) was shown to satisfy the necessary and sufficient conditions of Birch ([…]), so that the models could be fitted using either the product-Poisson or product-multinomial sampling assumption.

An ML fitting method was developed using results of Aitchison and Silvey ([…]) and Haber ([…]a, […]b). The algorithm used Lagrangian undetermined multipliers in conjunction with a modified Newton-Raphson iterative scheme. The modification, which simplifies the method of Haber ([…]a), is to use a simpler matrix than the Hessian matrix. We replace the Hessian matrix (of the Lagrangian objective function) by its dominant part, which turns out to be easily inverted. Because the matrices used in the algorithm proposed in this chapter are very large and must be inverted, this
modification is a very important one. A FORTRAN program "mlerestraint" has been written by the author to implement this modified algorithm.

The asymptotic behavior of the ML estimators computed under the two sampling schemes (product-Poisson and product-multinomial) was investigated. The method for deriving the asymptotic distributions represents a modification to the technique of Aitchison and Silvey ([…]). A comparison of the limiting distributions of the two estimators was made in section […]. Some very interesting results were obtained by studying the asymptotic behavior in the constraint equation setting. In particular, Theorem […] represents a generalization of the results of Palmgren ([…]). The theorem provides a method for determining when the inferences about the freedom parameters of a generalized loglinear model of the form $C \log A\mu = X\beta$ will be invariant with respect to the sampling assumption. Palmgren ([…]) developed some similar results for the special case when the freedom parameters are part of a loglinear model.

It is important to note that the asymptotic results are only valid if the number of populations is considered fixed and the expected counts all get large at approximately the same rate. In particular, the asymptotic arguments do not hold when the covariates are continuous, since the number of populations (levels of the covariates) can theoretically run off to infinity. The reason the arguments do not hold is that when we use the method of Aitchison and Silvey ([…]), it is required that the vector […]
[…] could prove to be temporary. It seems reasonable to assume in many cases that, as long as the "information" about each parameter is increasing without bound, the estimators will be consistent and asymptotically normally distributed. For example, consider the logistic regression model with continuous covariates. Although the $n_k$'s may all be […], the ML estimators of the regression parameters are often consistent and asymptotically normal.

Section […] outlines some miscellaneous results. One result that is important to the practicing statistician is that the Lagrange multiplier statistic is shown to be invariant with respect to choice of the matrix $U$ of $U'C \log A\mu = 0$ ([…]), as long as the model is well defined. An important implication of this result is that if one fits the model several times, each time using a different $U$ matrix, and the Lagrange multiplier statistics vary (more so than can be explained by roundoff), then it could be that the model is not well defined. Another interesting result is that the Lagrange multiplier statistic is simply the Pearson chi-squared statistic $X^2$ whenever the assumptions (A1), (A2) and (A3) are satisfied.

Theoretically, the ML fitting algorithm will work for any size problem. Practically, however, the algorithm is certainly not a model fitting panacea. The number of parameters that must be estimated gets very large very fast. Consider the case where […] raters rate the same set of objects on a […]-point scale. Even without covariates, the number of cell probabilities that must be estimated is […]. It seems the ML fitting method developed in this chapter is, at least for now, useful for moderate size problems only. It can be used to analyze longitudinal categorical response data when the number
of measurements taken on each subject is somewhere in the neighborhood of […] to […]. This is not to take away from the utility of this chapter's algorithm, but rather to indicate its breadth of application. In time, with increasing computer efficiency, much larger data sets may be fitted using this algorithm.
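The growth in problem size described above can be made concrete with a short sketch. It assumes a single population with $T$ responses, each on the same $d$-point scale, so that the joint table has $d^T$ cells and $d^T - 1$ free cell probabilities; the function names and the particular values of $d$ and $T$ are illustrative assumptions, not figures taken from the text.

```python
def n_cells(d, T):
    """Cells in the joint table of T responses, each on a d-point scale."""
    return d ** T


def n_free_cell_probabilities(d, T):
    """Cell probabilities left free once they are required to sum to one."""
    return n_cells(d, T) - 1


d = 5  # illustrative scale length (an assumption)
growth = {T: n_free_cell_probabilities(d, T) for T in range(2, 9)}
for T, count in growth.items():
    print(f"T={T}: {count} free cell probabilities")
```

Each additional response multiplies the table size by $d$, which is why the method is described as practical for moderate size problems only.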
CHAPTER […]
SIMULTANEOUSLY MODELING THE JOINT AND MARGINAL DISTRIBUTIONS OF MULTIVARIATE POLYTOMOUS RESPONSE VECTORS

Introduction

Often times, when given an opportunity to analyze multivariate response data, the investigator may wish to describe both the joint and marginal distributions simultaneously. We consider a broad class of models which imply structure on both the joint and marginal distributions of multivariate polytomous response vectors. To illustrate the need for such models, we consider several settings where these models would be useful. For example, when the multivariate responses represent repeated measures of the same categorical response across time, one may be interested in how the marginal distributions are changing across time and how strongly the responses are associated.

The simultaneous investigation of both joint and marginal distributions is not restricted to the longitudinal data setting. Other examples include the analysis of rater agreement, cross-over, and social mobility data. The common thread tying all of these data types together is that the sampling scheme is such that the different responses are correlated. In longitudinal studies, the same subject responds on several occasions. In rater agreement studies, raters rate the same objects. In two-period cross-over studies, one group of subjects receives the two treatments in one order and the other group receives them in the other order. In social mobility studies, the socioeconomic
status of a father-son pair is recorded. When the responses are positively correlated, these designs result in increased power for detecting differences between the marginal distributions (Laird, […]; Zeger, […]).

This chapter considers the modeling of multivariate categorical responses in which the same response scale is used for each response. The classes of models used in this chapter are of the form considered in Chapter […] of this dissertation and hence are readily fit using the ML methods of that chapter.

In section […], we give several examples that may be analyzed by simultaneously modeling the joint and marginal distributions. We introduce the classes of simultaneous Joint-Marginal models in section […]. Several models are fitted to the data sets of section […].

Product-Multinomial Sampling Model

Initially, we assume that a random sample of $n_k$ subjects is taken from population $k$, $k = 1, \ldots, K$. The number $K$ of populations, or covariate profiles, is considered to be some fixed integer. The subscript $k$ is allowed to be compound, i.e. the subscript $k$ is allowed to represent a vector of subscripts such as $k = (k_1, k_2, \ldots, k_v)$.

Suppose that there are $T$ categorical responses $V^{(1)}, \ldots, V^{(T)}$ of interest and that each response is measured on the same response scale. Let $V_k = (V_k^{(1)}, \ldots, V_k^{(T)})'$ be the random vector of responses for population $k$, and let $V_{ku}$, $u = 1, \ldots, n_k$, be the $n_k$ independent and identically distributed copies of $V_k$, where $V_{ku}$ denotes the response profile for the $u$th randomly
chosen person within population $k$. Notationally, we have

$V_{ku} \sim$ iid $V_k$, $u = 1, \ldots, n_k$.

For our purposes, we can assume that each response takes on values in $\{1, \ldots, d\}$ with probability one. Denote the probability that a randomly selected subject from population $k$ has response profile $i = (i_1, \ldots, i_T)'$ by $\pi_{ik}$, i.e. $\pi_{ik} = P(V_k = i)$, where $i \in \{1, \ldots, d\} \times \cdots \times \{1, \ldots, d\}$. The joint distribution of $V_k = (V_k^{(1)}, \ldots, V_k^{(T)})'$ is specified as $\{\pi_{ik}\}$. The marginal distributions of $V_k$ will be denoted by $\{\phi_i(t; k)\}$, $t = 1, \ldots, T$, where […].

Our objective is to model simultaneously the $K$ joint distributions $\{\pi_{ik}\}$, $k = 1, \ldots, K$, and the $KT$ marginal distributions $\{\phi_i(t; k)\}$, $t = 1, \ldots, T$, $k = 1, \ldots, K$.

To help the reader better understand the notation, we consider the one population, bivariate case. When $T = 2$, the response profiles can be denoted by $i = (i, j)$, where $i = 1, \ldots, d$ and $j = 1, \ldots, d$. Since there is just one population (or covariate profile), the subscript $k$ is always 1 and is therefore dropped. It follows that $\{\pi_{ij}\}$ is the joint distribution of $(V^{(1)}, V^{(2)})'$ and $\{\phi_i(t)\}$, $t = 1, 2$, are the two marginal distributions. That is,

$\pi_{ij} = P(V^{(1)} = i, V^{(2)} = j)$, $i, j = 1, \ldots, d$,
and

$\phi_i(t) = \pi_{i+} = P(V^{(1)} = i)$ if $t = 1$,
$\phi_i(t) = \pi_{+i} = P(V^{(2)} = i)$ if $t = 2$,

for $i = 1, \ldots, d$.

Now, for each population $k$, consider the […] random vector of indicators […]. Notice that no information about the $V_k$ is lost, since […] is a one-to-one function of $V_k$. Also,

[…]$_k \sim$ ind Mult$(1, \{\pi_{ik}\})$, $k = 1, \ldots, K$.

Therefore, since we have randomly sampled $n_k$ subjects from each of the populations, we have that, for given $k$,

[…]$_{k1}$, […]$_{k2}$, $\ldots$, […]$_{kn_k} \sim$ iid Mult$(1, \{\pi_{ik}\})$,

and hence the vector
Evidently, […]
[…] we let $V^{(1)}$ and $V^{(2)}$ represent the responses in […] and […]. Let $y_{ij}$, $i, j = […]$, represent the number of the $N$ subjects responding at level $i$ in […] and level $j$ in […]. Notice that, since there is just one population of interest, we drop the population subscript altogether. Finally, for this bivariate response example, the compound subscript $i$ is replaced by $ij$. Table […] summarizes the bivariate responses.

As another example, consider the cross-over data of Ezzet and Whitehead ([…]).

Table […]. Cross-over Data
(Rows: response for device A; columns: response for device B; panels: AB Sequence (Group 1) and BA Sequence (Group 2).)

The counts displayed in Table […] are from a study conducted by 3M Health Care Ltd. to compare the suitability of two inhalation devices (A and B) in patients who are currently using a standard inhaler device delivering salbutamol. Two independent groups of subjects participated. Group 1 used device A for a week followed by device B (sequence AB). Group 2 used the devices in reverse order (sequence BA). The response variables $V^{(1)}$ (device A) and $V^{(2)}$ (device B) are ordinal polytomous. Specifically, they are the self-assessment on clarity of leaflet instructions accompanying the two devices, recorded on the ordinal four-point scale
1 = Easy
2 = Only clear after rereading
3 = Not very clear
4 = Confusing

For this example, there are two populations of interest: Group 1 and Group 2. Let $y_{ijk}$ represent the number of the $n_k$ subjects responding at level $i$ for device A and level $j$ for device B, where $n_1 = […]$ and $n_2 = […]$. Again, the bivariate response profiles can be denoted by $i = (i, j)$, where $i, j = 1, 2, 3, 4$. The bivariate responses are summarized in Table […].

Joint and Marginal Models

Two types of questions that can be posed about Table […] lead to quite distinct types of models. One question is whether the interest in the political campaigns was different at the two times. For example, the researcher may wish to test the hypothesis that there was more interest in the […] political campaign than the […] political campaign. An investigation into the marginal distributions is needed to test this hypothesis. For these bivariate response data, the marginal distributions correspond to the row and column distributions of Table […]. A second question that may be asked is whether the two responses are associated and, if so, how strong the association is. To answer these questions, we must describe the dependence displayed in the joint distribution of Table […].

The marginal models we consider will be used to investigate whether the probability that a randomly selected subject responds at level $i$ or lower in […] is different from the probability that a randomly selected subject responds at level $i$ or lower in […]. In this sense, the comparison of marginal
distributions gives a "population averaged" description of change. That is, we will describe how the marginal distribution changes on the whole, averaging over the entire population. In contrast, subject-specific modeling allows us to investigate how a randomly chosen subject's response changes from […] to […]. Zeger et al. ([…]) discuss at length the difference between population-averaged and subject-specific models.

The same types of questions may be posed about the distributions of Table […]. For example, one may wish to determine whether the leaflet instructions are perceived as clearer for one of the devices. Also, we may be interested in whether there is a sequence effect. That is, does the order of "exposure" to the two devices' instruction leaflets affect the perception of clarity? To answer these two questions, we must investigate the marginal distributions corresponding to the row and column totals of Table […]. Finally, one may be interested in testing whether the association between the two responses is the same for both sequences. We will consider modeling the joint distributions to answer this question.

Modeling of marginal distributions is usually conducted separately from the modeling of joint distributions. We use results from Chapter […] of this dissertation to show that these models can be fit simultaneously using maximum likelihood methods. Simultaneously modeling the joint and marginal distributions leads to several advantages. It will provide a single test for overall goodness of fit. Also, it provides improved model parsimony, potentially resulting in better estimates than one would obtain by fitting the models separately.
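The row and column (marginal) distributions referred to above, and the "level $i$ or lower" probabilities that the marginal models compare, can be computed from a two-way table of counts as in the following sketch. The counts are hypothetical and chosen only to illustrate the calculation; they are not the data of either example, and the function names are assumptions.

```python
def marginal_distributions(table):
    """Row and column sample proportions of a two-way table of counts."""
    n = sum(sum(row) for row in table)
    n_cols = len(table[0])
    row_margin = [sum(row) / n for row in table]
    col_margin = [sum(row[j] for row in table) / n for j in range(n_cols)]
    return row_margin, col_margin


def cumulative(probabilities):
    """Running sums: the probability of responding at each level or lower."""
    out, total = [], 0.0
    for p in probabilities:
        total += p
        out.append(total)
    return out


# Hypothetical 3 x 3 table: rows index the first response, columns the second.
counts = [
    [30, 10, 5],
    [15, 25, 10],
    [5, 10, 20],
]

row_margin, col_margin = marginal_distributions(counts)
row_cumulative = cumulative(row_margin)
col_cumulative = cumulative(col_margin)
print(row_cumulative)
print(col_cumulative)
```

Comparing `row_cumulative` with `col_cumulative` level by level gives the descriptive, population-averaged comparison; the models in this chapter put a formal structure on that comparison while also modeling the joint table.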
We consider four classes of simultaneous models. Let $J(S)$ represent the class of saturated joint distribution models. These models imply no structure on the joint distributions and therefore allow for general association between the $T$ responses. Similarly, let $M(S)$ be the class of marginal models that assume no structure on the marginal distributions; i.e. $M(S)$ is the class of saturated marginal models. Denote the classes of unsaturated models by $J(U)$ and $M(U)$. By simultaneously modeling the joint and marginal distributions, we can consider four classes of models: $J(S) \cap M(S)$, $J(U) \cap M(S)$, $J(S) \cap M(U)$, and $J(U) \cap M(U)$. The union of these four classes will be denoted by $J \cap M$.

We let the symbol $J(M_1) \cap M(M_2)$, where $M_1$ and $M_2$ are particular models, represent a specific model in $J \cap M$. Some examples of $M_1$ and $M_2$ are $M_1 = QSY$, the quasi-symmetry model, and $M_2 = MH$, the marginal homogeneity model. The two symbols $S$ and $U$ will represent either the "class" of saturated and unsaturated models or an arbitrary model in those classes.

The possibility that the joint distribution structure implied by the joint model $J(M_1)$ will imply that the marginal distributions are constrained in some way is always there. In this case, the model may not be well defined in the sense of Chapter […]. We address this issue in section […].

The first class of models, $J(S) \cap M(S)$, is the class of completely unstructured or fully saturated models. These models fit the data perfectly and are used primarily for exploratory purposes. If an estimated freedom parameter is small relative to its standard error, the corresponding effect may prove to be negligible. In this way, the fit of the saturated model may suggest simpler models that may fit the data well.
The models in class $J(U) \cap M(S)$ focus on modeling the joint distributions. No additional structure on the marginal distribution is assumed. This class includes ordinary loglinear models for the expected cell frequencies in the joint distributions. Fitting this simultaneous model is equivalent to separately fitting the joint distribution model $J(U)$, in that the goodness-of-fit statistic and joint model parameter estimates will be exactly the same. There is, however, some benefit to fitting the simultaneous model: marginal model parameter estimates are obtained.

In general, these $J(U)$ models are not designed to estimate effects in marginal distributions. There are exceptions. For example, the symmetry model for the joint distribution implies that all of the marginal distributions are equal. Bishop et al. ([…]) discuss comparing the fit of the symmetry (S[…]
[…] response data, an analogous test using the Lagrange multiplier statistic (which is shown to be equal to Pearson's chi-squared statistic in Chapter […]) is McNemar's ([…]) test.

In this chapter, we will focus primarily on the parsimonious models within the class $J(U) \cap M(U)$. Often times, a simple model can be found that fits the data relatively well. Simultaneous inferences about both the association structure and the marginal distribution structure can be made using the model (or freedom) parameter estimates or goodness-of-fit statistics. Also, by the parsimony principle, the parameter estimates may be more reliable than those based on less structured models. See Agresti ([…]) and Bishop et al. ([…]) for a discussion of the benefits of using parsimonious models.

We can use models within this class to test such things as $MH$ given that $QSY$ holds. This can be accomplished by comparing the fit of QS[…]
We consider models in the following classes:

$J$: $C_1 \log A_1\mu = X_1\beta_1$ (or […]),
$M$: $C_2 \log A_2\mu = X_2\beta_2$ (or […]). ([…])

The matrices $C_1$ and $C_2$ are either identity, contrast (rows sum to zero), or zero matrices. The model matrices $X_1$ and $X_2$ are assumed to be of full column rank. We refer to the parameters in vectors $\beta_1$ and $\beta_2$ as freedom parameters, whereas the components of the parameter vector $\mu$ will be called model parameters.

Evidently, the class of models $J \cap M$ of ([…]) is very broad. Permissible models for the joint distributions include simple loglinear models as well as models for log odds ratios using individual cells (e.g. local odds ratios) or groupings of cells (e.g. global odds ratios, which are cross-product ratios of quadrant probabilities; cf. Dale, […]). The marginal models of class $M$ can be loglinear (or corresponding logit models such as adjacent-categories or baseline-categories logit models), or they can be other types of multinomial response models such as cumulative or continuation-ratio logit models (Agresti, […]). The second form for each model in ([…]) allows for linear probability or mean response models (Grizzle et al., […]).

All of the models in $J \cap M$ can be fit using the methods of Chapter […]. We illustrate the usefulness of these models by way of example.

Numerical Examples

Example […]. We begin by simultaneously modeling the joint and marginal distributions for Table […]. Recall that response variable $V^{(1)}$ represents a randomly chosen subject's response to the political interest question in
[…] and $V^{(2)}$ is a randomly chosen subject's response to the political interest question in […]. Some candidate models for the joint distribution of $(V^{(1)}, V^{(2)})$ include the following:

$J(I)$: $\log \mu_{ij} = \alpha + \alpha$[…]
[…] homogeneity if there is no association between level of response ($R$) and response variable ($V$) (cf. Agresti, […]). When the number of levels of $V$ exceeds two (i.e. $T > 2$) and $V$ can be considered ordinal, rather than assume that there are general row effects for levels of $V$, one could account for the ordinality by introducing scores for the levels of $V$. That is, we could replace […] by $\beta^{RV}u_iv_t$ in the loglinear model and replace […]$_t$ by […]$v_t$ in the cumulative logit model. An example where we can consider $V$ as ordinal is when the $T$ responses represent repeated measures over time. The $T$ levels of $V$ are then naturally ordered: response at occasion 1 ($V^{(1)}$), response at occasion 2 ($V^{(2)}$), $\ldots$, response at occasion $T$ ($V^{(T)}$).

For model identifiability, certain parameters (or, more generally, linear combinations of parameters) were set to zero. For example, the parameter […] of model $M(CU)$ was set to zero.

To obtain information about which simultaneous models may fit well, we first investigate joint and marginal models separately. Table […] contains likelihood-ratio ($G^2$) and Pearson ($X^2$) goodness-of-fit statistics for several models in the class $J(U) \cap M(S)$. The associated distance (or residual) degrees of freedom are listed as well. The linear-by-linear terms used equally spaced scores for rows and for columns.

Table […]. Joint Distribution Models: Goodness of Fit
(Columns: Model, df, $G^2$, $X^2$; first row: $J(S) \cap M(S)$; second row: QS[…])
Both QS[…]
[…] computed as follows:

df$_{res}[J(L \times L + D) \cap M(U)]$ = df$_{res}[J(L \times L + D)]$ + df$_{res}[M(U)]$.

This follows since the model is well defined in the sense of Chapter […] and since, for well defined models, residual degrees of freedom is simply the difference between the number of constraints implied by the simpler model and the number of constraints implied by the less structured model.

Table […] contains the result of fitting several models in the class $J(L \times L + D) \cap M(U)$.

Table […]. Candidate Models in $J(L \times L + D) \cap M(U)$: Goodness of Fit
(Columns: Model, df, $G^2$, $X^2$; rows: $J(L \times L + D) \cap M(S)$, $J(L \times L + D) \cap M(CU)$, $J(L \times L + D) \cap M(L \times L)$, $J(L \times L + D) \cap M(MH)$.)

The simple model $J(L \times L + D) \cap M(CU)$ fits the data very well ($G^2 = […]$, df $= […]$). This model implies that the joint and marginal distributions simultaneously follow the models

$J(L \times L + D)$: $\log \mu_{ij} = \alpha + \alpha$[…]
Table […]. Estimates of Freedom Parameters for Model $J(L \times L + D) \cap M(CU)$
(Columns: Parameter, Estimate, Std. Error.)

To test for marginal homogeneity in the context of this model, we can use either of two asymptotically equivalent $\chi^2(1)$ test statistics: […], where $W$ is the squared Wald statistic. The P-values for both of these tests are less than […]. We conclude that there is strong evidence of marginal heterogeneity.

We need not, and should not, stop here. Since we are working with model and freedom parameters, we can continue with other model-based inferences. Interval estimation of certain interesting freedom parameters is considered next.

The interpretation of the parameter […] is as follows: the odds that a randomly selected subject would have responded at level $i$ or less in […] is exp([…]) times higher than the odds that a randomly selected subject would have responded at level $i$ or less in […]. Thus, the freedom parameter […] measures the departure from marginal homogeneity, in that the two odds are
identical if and only if […]. We use the delta method to compute a […]% confidence interval for the odds ratio exp([…]); it is […]. Thus, based on the data at hand, we estimate that the odds that a subject would respond at level $i$ or less in […] is between […] and […] times higher than the odds that a subject would respond at level $i$ or less in […]. There is significant evidence of increased political interest in […] relative to […].

Next, we consider the association between the two responses. The estimated odds that the response in […] was "very much" instead of "somewhat" is exp([…]) = […] times higher when the response in […] was "very much" than when it was "somewhat". The same estimated odds ratio applies when the response was "somewhat" instead of "not much". Similarly, the estimated odds that the response in […] was "very much" instead of "not much" is exp([…]) = […] times higher when the response in […] was "very much" than when the response in […] was "not much".

In summary, there is evidence of strong positive association between the response in […] and the response in […], and there is evidence that there was greater political interest in […] than in […].

Suppose we ignored the fact that the same subjects responded to the political interest question in […] and […]. If we treated the two responses as independent, then the row and column marginal counts would be distributed as independent multinomials with the same index $N = […]$ and probability vectors $\{\phi_i(1)\}$ and $\{\phi_i(2)\}$. Then, it follows that separately fitting the marginal model $M(U)$ under this independence assumption is equivalent to fitting the simultaneous model $J(I) \cap M(U)$. By results of Liang and Zeger ([…]), the estimates of parameters in $M(U)$ would be consistent even when the responses are not truly independent. However, the estimates of
the corresponding standard errors would no longer be valid.

One way to see that we are losing information by incorrectly assuming independence is by comparing the likelihood-ratio statistic for testing $MH$ assuming $J(I)$ holds to the likelihood-ratio statistic for testing $MH$ assuming that $J(L \times L + D)$ holds. The former is […] and the latter is […]. Both of these values would be compared to a tabled $\chi^2(1)$ value. Evidently, by accounting for the dependence between the responses, we have greater evidence of marginal heterogeneity.

Another way of illustrating the effect of wrongly assuming independence between the responses is by looking at the freedom parameter estimates and their estimated standard errors for different models. Table […] contains estimates of […] and the corresponding standard error estimate under three different models of interest. Notice that the standard errors are similar when one uses either the saturated or the diagonal parameter model for the joint distribution.

Table […]. Freedom Parameter Estimates and Standard Errors
(Columns: Model, df, estimate, standard error; rows: $J(S) \cap M(CU)$, $J(L \times L + D) \cap M(CU)$, $J(I) \cap M(CU)$.)

We have shown that there may be problems with assuming too much structure on the joint distribution, for example, unreasonably assuming independence. Similarly, we should be concerned with assuming too little structure on the joint distribution. In this case, too many freedom parameters require estimation, and the overall fit may be unreliable. A good model is one that fits the data at hand relatively well and is robust to the white noise
present in the data generation. That is, a good fitting model with model parameter estimates that change very little for different realizations of the random data vector is considered a good model. For example, the saturated model fits perfectly but has parameter estimates that may change greatly for different realizations. In this sense, the saturated model may not be a good one; it may be unreliable.

When we ignore the association structure by separately fitting marginal models, we are tacitly using the saturated model for the joint distribution. Table […] illustrates why we should search for a good fitting parsimonious model. Note that the standard errors of expected cell frequency estimates are inflated when we assume a saturated model for the joint distribution. The more parsimonious model $J(L \times L + D) \cap M(CU)$ fits as well as the less structured model $J(S) \cap M(CU)$, yet it is more reliable in the sense described above.

Table […]. Estimated Cell Means and Standard Errors for Models $J(S) \cap M(CU)$ and $J(L \times L + D) \cap M(CU)$
(Columns, for each model: $\hat\mu_{ij}$ and se($\hat\mu_{ij}$).)
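The interval estimate for an odds ratio exp($\beta$) used in this example relies on the delta method, which can be sketched as follows. The sketch assumes a scalar estimate with an estimated standard error and a normal critical value; the numerical inputs are hypothetical and are not the estimates reported in the tables, and `odds_ratio_interval` is an illustrative name.

```python
import math

Z_975 = 1.959963984540054  # standard normal 97.5th percentile (95% two-sided interval)


def odds_ratio_interval(beta_hat, se_beta, z=Z_975):
    """Delta-method interval for exp(beta).

    Since d/d(beta) exp(beta) = exp(beta), the delta method gives
    se(exp(beta_hat)) ~= exp(beta_hat) * se(beta_hat).
    """
    or_hat = math.exp(beta_hat)
    se_or = or_hat * se_beta
    return or_hat, se_or, (or_hat - z * se_or, or_hat + z * se_or)


# Hypothetical freedom parameter estimate and standard error (illustration only).
or_hat, se_or, (lower, upper) = odds_ratio_interval(beta_hat=0.5, se_beta=0.1)
print(or_hat, lower, upper)
```

An alternative that is also common is to exponentiate the endpoints of the interval for $\beta$ itself, which gives an asymmetric interval on the odds ratio scale; the two approaches agree closely when the standard error is small.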
Example […]. We continue with the cross-over data example of section […]. Denote the set of local odds ratios by $\{\theta_{ij(k)}\}$, where

$\theta_{ij(k)} = \dfrac{\pi_{ijk}\,\pi_{i+1,j+1,k}}{\pi_{i,j+1,k}\,\pi_{i+1,j,k}}$

and $\pi_{ijk}$ represents the probability that a randomly chosen subject from Group (G) $k$ responds at the $i$th level for device A ($V^{(1)}$) and the $j$th level for device B ($V^{(2)}$). Recall that cumulative marginal probabilities are denoted by

$\gamma_j(t;k) = \sum_{\nu \le j} \pi_{\nu + k}$ if $t = 1$ (device A), and $\gamma_j(t;k) = \sum_{\nu \le j} \pi_{+ \nu k}$ if $t = 2$ (device B),

where $j$ = […] and $k$ = […]. To elucidate, $\gamma_{[…]}([…];[…])$ represents the probability that a randomly chosen subject from Group […] will respond at level […] or lower for device B ($V^{(2)}$). Some possible models for the joint distributions of $(V^{(1)}, V^{(2)})'$, $k$ = […], include the following:

(S)
($V^{(1)}G$, $V^{(2)}G$, $V^{(1)}V^{(2)}$)
(L × L)
($V^{(1)}G$, $V^{(2)}G$)
($V^{(1)}$, $V^{(2)}$, $G$)
(UA(G))
(UA)

[The loglinear and local-odds-ratio specifications displayed beside each model are not recoverable from the scan.]
interaction and that the association between the ordinal responses can be accounted for by including a linear-by-linear association parameter. ($V^{(1)}$, $V^{(2)}$, $G$) is the mutual independence model, and ($V^{(1)}G$, $V^{(2)}G$) implies that $V^{(1)}$ and $V^{(2)}$ are conditionally independent given $G$. The model (UA(G)) implies uniform association within levels of $G$, and (UA) is the simple model that assumes this uniform association is the same for both levels of $G$. When the row and column scores $\{u_i\}$ and $\{v_j\}$ are equally spaced, models (L × L) and (UA) are equivalent.

It is shown in section […] that model ($V^{(1)}$, $V^{(2)}$, $G$) implies that the marginal distributions of $(V^{(1)}, V^{(2)})'$ do not depend on $G$. When this happens, the simultaneous model will be ill defined whenever the marginal model constrains the marginal distributions to be equal across levels of $G$. We will not consider this particular model for this reason. The rest of the models do not imply any structure on the marginal distributions. Also notice that simultaneously fitting ($V^{(1)}G$, $V^{(2)}G$) and some marginal model M(U) is equivalent to separately fitting M(U) when the row and column marginal counts are treated as independent multinomials within each level of $G$.

The marginal models we fitted include the following cumulative logit models:

M(S):     $\mathrm{logit}(\gamma_j(t;k)) = \beta_{jtk}$
M(VG):    $\mathrm{logit}(\gamma_j(t;k)) = \beta_j + \beta^V_t + \beta^G_k + \beta^{VG}_{tk}$
M(V, G):  $\mathrm{logit}(\gamma_j(t;k)) = \beta_j + \beta^V_t + \beta^G_k$
M(V):     $\mathrm{logit}(\gamma_j(t;k)) = \beta_j + \beta^V_t$
M( ):     $\mathrm{logit}(\gamma_j(t;k)) = \beta_j$
where M(S) is the saturated model and M(VG) is the proportional-odds cumulative-logit model for the marginal probabilities that allows for otherwise general association between the response variable $V$, the group or population variable $G$, and the response "level" $R$. In the literature on cross-over designs, a second-order interaction among $V$, $G$, and $R$ is said to be a "carry-over" effect. The model M(V, G) implies that there is no second-order interaction among the variables $V$, $G$, and $R$; i.e., the model implies that there is no carry-over effect. The model M(V) implies that there is no $G$ effect, i.e., no sequence effect. Finally, the simple model M( ) implies that there is no $V$ or $G$ effect. To make these models identifiable, we place the following restrictions on the freedom parameters:

[restrictions on $\beta^V$, $\beta^G$, and $\beta^{VG}$ not recoverable from the scan]

With this parameterization, $\beta^V$, $\beta^G$, and $\beta^{VG}$ measure device, sequence, and carry-over effects, respectively.

Table […] displays the goodness-of-fit statistics and their associated degrees of freedom for several simultaneous models. The L × L model used the equally spaced row and column scores $\{u_i = i\}$ and $\{v_j = j\}$.
Table […]. Cross-over Data Models: Goodness of Fit

Model                                                       df     $X^2$
(S) ∩ M(S)                                                  […]    […]
(S) ∩ M(VG)                                                 […]    […]
(UA) ∩ M(S)                                                 […]    […]
($V^{(1)}G$, $V^{(2)}G$, $V^{(1)}V^{(2)}$) ∩ M(VG)          […]    […]
(L × L) ∩ M(VG)                                             […]    […]
($V^{(1)}G$, $V^{(2)}G$) ∩ M(VG)                            […]    […]
(UA(G)) ∩ M(VG)                                             […]    […]
(UA) ∩ M(VG)                                                […]    […]
(UA) ∩ M(V, G)                                              […]    […]
(UA) ∩ M(V)                                                 […]    […]
(UA) ∩ M( )                                                 […]    […]

Evidently, the parsimonious model (UA) ∩ M(V) fits the data very well. This model implies that there is no period or carry-over effect and that the uniform association structure is the same for each sequence group. There is evidence of a significant device effect (df = […], […]). We will proceed to describe this device effect. The freedom parameter ML estimates and their corresponding standard error estimates are given in Table […].

Table […]. Freedom Parameter ML Estimates for Model (UA) ∩ M(V)

Parameter      Estimate     St. Error
[…]            […]          […]
$\beta_1$      […]          […]
$\beta_2$      […]          […]
$\beta_3$      […]          […]
$\beta^V$      […]          […]

These estimates also indicate that there is a significant device effect; the Wald statistic, which is based on […] degree of freedom, takes on the value of
$W = (\hat\beta^V / \mathrm{se}(\hat\beta^V))^2$ = […].

The magnitude of the device effect can be estimated using $\hat\beta^V$. Specifically, the odds of responding $j$ or higher for device B is estimated to be $e^{[…]}$ = […] times higher than the odds for device A. Using the delta method, an approximate […]% confidence interval for this odds ratio is ([…]). Since the higher responses correspond to less perceived clarity of the instructional leaflet, we conclude that there is evidence suggesting a significant improvement of device A over device B in terms of perceived clarity of instructions.

We can describe the association between the two responses using the estimated association parameter. For either sequence group, the odds of responding at level $i + 1$ instead of $i$ for device A is estimated to be exp([…]) = […] times higher when the response for device B was $j + 1$ rather than $j$. This holds for each $i$ and $j$.

In summary, there is a moderate positive association between the two responses, the strength of association being the same for both sequence groups. There also is significant evidence of increased perceived clarity for device A over device B.

Product-Multinomial Versus Product-Poisson Estimators: An Application

In this section and in section […], we explore some of the more practical aspects of model fitting for categorical data. In this section, we will illustrate, by way of example, how to determine when inferences based on freedom parameters will be the same under both sampling assumptions: product-multinomial and product-Poisson. The method of determination is a direct consequence of Theorem […]. In section […], we address, at least partially,
the issue of whether or not the model is well defined. Closely related to this is the computation of residual and model degrees of freedom.

Consider the data taken from the Harvard Study of Air Pollution and Health. The data, displayed in Table […], can be found in Agresti ([…], p. […]); they were supplied by Dr. James Ware.

Table […]. Children's Respiratory Illness Data

Column groups: No Maternal Smoking; Maternal Smoking
Row classification: Child's Respiratory Illness at Age […], Age […], Age […], Age […]
[cell counts not recoverable from the scan]
After fitting several simultaneous models, we finally settled on the following good-fitting (df = […], […]) simultaneous model:

[The displayed model equations, the matrices $C$, $A$, and $X$ that specify them, and the freedom parameter vector are not recoverable from the scan.]
and […]. Also, the vector of expected cell counts $\mu$ is a ([…]) × 1 vector and is defined as

$\mu = (\mu_{[…]}, \ldots, \mu_{[…]})'$.

That is, the last subscript (corresponding to the $s$th group) is changing the slowest, and the other subscripts are in lexicographical order.

In view of Theorem […], we must determine, for $i = 1, 2$, whether or not $C_i$ is a contrast matrix. If it is not, then we must find those columns of $X_i$ that span a set containing the range space of $\oplus 1_{m_i}$, where $q_1 = m_1$ = […] and $q_2$ = […]. Recall that $q_i$ is the number of response functions within each independent population for the $i$th model. For example, for this data set, the second model ($i = 2$), which is the marginal model ([…]), has $q_2$ = […] logits to be modeled within each of the two population groups (children with smoking mothers and children with nonsmoking mothers). As in the statement of the theorem, we will find a minimal spanning subset.

Since matrix $C_1$ is not a contrast matrix, we wish to find the columns of $X_1$ that span a space containing the range space of $\oplus 1_{m_1}$. With the parameterization we have used, we can easily see that the first and the sixth columns of $X_1$ span the required space. Also, $C_2$ is a contrast matrix. Therefore, it follows by Theorem […] that the two asymptotic variances of the freedom parameter estimators, computed under the two different sampling assumptions, are related as follows:

$\mathrm{var}(\hat\beta^{(M)}) = \mathrm{var}(\hat\beta^{(P)}) - \Lambda$,   ([…])
where $\Lambda_{11}$ is a […] × […] matrix with zeroes everywhere except in rows 1 and 6 and columns 1 and 6, and all the other $\Lambda$'s are zero matrices.

Table […] displays the freedom parameter estimators and their estimated standard errors, which were calculated under the two sampling assumptions. Notice that only those standard errors corresponding to the parameters $\alpha$ and […] of $\beta_1$ are different for the two sampling schemes. These are the parameters that correspond to the first and sixth columns of $X_1$.

Table […]. Product-Multinomial versus Product-Poisson Freedom Parameter Estimation

Parameter     Estimate     Product-Multinomial Standard Error     Product-Poisson Standard Error
[table entries not recoverable from the scan]

One last remark worth mentioning is with regard to the standard error estimates of the estimated expected cell counts. The precision
estimates will be different for the two sampling schemes. In fact, the relationship ([…]), viz.,

$\mathrm{var}(\hat\mu^{(M)}) = \mathrm{var}(\hat\mu^{(P)}) - \oplus_k \dfrac{\mu_k \mu_k'}{n_k}$,

allows us to determine how different the two variances will be. For example, the estimated expected cell count for cell ([…]) is […], and the standard errors are […] and […], corresponding to the product-multinomial and product-Poisson sampling assumptions. The difference in standard errors is substantial. In contrast, the estimated expected cell count for cell ([…]) is […], and the two standard errors are […] and […]. The product-Poisson standard error estimate is only slightly inflated.

Suppose that instead of assuming the logit model ([…]) for the marginal parameters, we used the equivalent loglinear model. That is, we will modify the matrices $C_2$, $A_2$, and $X_2$ and the vector $\beta_2$ so that the logit model is equivalently expressed as a loglinear model. Let $C_2 = \oplus I_{[…]}$, $A_2 = A_2$ (no modification is necessary for this example), and

$X_2$ = [matrix not recoverable from the scan]
With this specification, the logit model is equivalent to the loglinear model

$\log m_j(t;k)$ = [right-hand side not recoverable from the scan],   ([…])

where […] satisfies […] and $\{m_j(t;k)\}$ is the set of expected marginal counts. The vector $\beta_2$ is thus defined as

$\beta_2 = (\lambda, […])'$.

Notice that the loglinear model ([…]) includes the VS effect. This effect must be included so that the model is well defined. We will discuss this further in the next section, section […].

The matrix $C_2$ is not a contrast matrix for the loglinear representation of the marginal model. Therefore, to determine which freedom parameter estimators are unaffected by the sampling assumption, we must find, among the columns of $X_2$, the minimal spanning set for $\mathcal{M}(\oplus 1_{m_2})$. Notice that the number of response functions within each population for the marginal model is now $m_2 = q_2$ = […], not $q_2$ = […] as it was for the logit model. Again, with the parameterization we have chosen, we can easily see that the first and tenth columns of $X_2$ span a set that contains the range space of $\oplus 1_{m_2}$.

Invoking Theorem […], we have the following result. Letting the vector $\beta = (\beta_1', \beta_2')'$ represent the freedom parameter vector for the simultaneous model,

$\mathrm{var}(\hat\beta^{(M)}) = \mathrm{var}(\hat\beta^{(P)}) - \Lambda = \mathrm{var}(\hat\beta^{(P)}) - \begin{pmatrix} \Lambda_{11} & \Lambda_{12} \\ \Lambda_{21} & \Lambda_{22} \end{pmatrix}$,
where the elements of the partitioned matrix $\Lambda$ are

$(\Lambda_{11})_{kl}$ = […] if $(k, l) \in \{[…]\} \times \{[…]\}$, and 0 otherwise,
$(\Lambda_{12})_{kl}$ = […] if $(k, l) \in \{[…]\} \times \{[…]\}$, and 0 otherwise,
$(\Lambda_{22})_{kl}$ = […] if $(k, l) \in \{[…]\} \times \{[…]\}$, and 0 otherwise.

By expressing $\hat\beta$ as $(\hat\beta_1, \hat\beta_2, \ldots, \hat\beta_{[…]})'$, we can state the result in another way. If $(i, j) \notin \{[…]\} \times \{[…]\}$, then $\mathrm{cov}(\hat\beta_i, \hat\beta_j)$ is the same under both sampling assumptions. If $(i, j)$ is in the set, then the covariances may be different.

To illustrate, we compare the standard errors for the loglinear parameter estimators. It happens that all of the freedom parameter estimators are the same (see Theorem […]) and all of the standard errors are the same, except those associated with the […]th, […]th, […]th, and […]th parameters. For these four, the standard error estimates were related as follows:

[Four displayed comparisons of se(· | Poisson) with se(· | multinomial); the numerical values are not recoverable from the scan.]
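The pattern described above, in which only the variances tied to the "total count" columns of the design matrix change between sampling schemes, can be checked by hand in the smallest possible case. The following is a minimal sketch, not the dissertation's computation: it assumes a hypothetical two-cell table with counts `y1` and `y2` and the saturated loglinear model log μ1 = α, log μ2 = α + β. The Poisson covariance is obtained by inverting X' diag(μ̂) X, and the multinomial covariance is derived independently by the delta method from the covariance of the log sample proportions. All function names are illustrative.

```python
def inverse_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def poisson_cov(y1, y2):
    """Asymptotic covariance of (alpha-hat, beta-hat) under Poisson sampling.

    The information matrix is X' diag(mu) X with X = [[1, 0], [1, 1]],
    evaluated at the fitted means, which equal the counts here.
    """
    return inverse_2x2([[y1 + y2, y2], [y2, y2]])


def multinomial_cov(y1, y2):
    """Same covariance under multinomial sampling (n fixed), via the delta method.

    alpha-hat = log n + log p1-hat and beta-hat = log p2-hat - log p1-hat.
    """
    n = y1 + y2
    p1, p2 = y1 / n, y2 / n
    var_l1 = (1 - p1) / (n * p1)      # var(log p1-hat)
    var_l2 = (1 - p2) / (n * p2)      # var(log p2-hat)
    cov_l12 = -1.0 / n                # cov(log p1-hat, log p2-hat)
    var_alpha = var_l1
    cov_alpha_beta = cov_l12 - var_l1
    var_beta = var_l1 + var_l2 - 2 * cov_l12
    return [[var_alpha, cov_alpha_beta], [cov_alpha_beta, var_beta]]


# Hypothetical counts, chosen only for illustration.
P = poisson_cov(30, 70)
M = multinomial_cov(30, 70)
difference = [[P[r][c] - M[r][c] for c in range(2)] for r in range(2)]
```

In this toy case `difference` is zero everywhere except in the (α, α) position, where it equals 1/n. Only the intercept, whose column of X spans the vector of ones, has a sampling-scheme-dependent variance; the contrast parameter β does not.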
In summary, we were able to easily determine when inferences using certain freedom parameter estimators would be the same under both sampling schemes. This holds for a very broad class of generalized loglinear models of the form $C \log A\mu = X\beta$. Basically, if the matrix $C$ is a contrast matrix, that is, both $C_1$ and $C_2$ are contrast matrices, all of the inferences are the same. On the other hand, if, for example, $C_i$ of $C$ is an identity matrix, then we must look at the design matrix $X_i$ to determine which columns form a minimal spanning subset for the range space of some matrix of the form $\oplus 1_{m_i}$. When $C_i$ is an identity matrix, $m_i$ is the number of response functions within each population (or level of covariate) that are modeled via $C_i \log A_i \mu = X_i \beta_i$.

Well-Defined Models and the Computation of Residual Degrees of Freedom

We made some remarks above with regard to models being well or ill defined. To illustrate, we use the simple example in which the joint distribution model is S
[…] is the joint distribution model, and the marginal distribution model will not include redundant constraints and the simultaneous model will be well defined. For this example, S […]
are measured on the same scale, i.e., $I = J$; we will also address the sufficient conditions for model well-definedness in that case. Denote the […]-level covariate by $P$.

The following lemma identifies a large class of joint distribution models that only imply that the expected marginal counts satisfy the identifiability constraints. It is important to point out that we will be referring to two types of identifiability constraints. "Identifiability" constraints are those constraints associated with multinomial sampling, namely that certain sums of probabilities add up to 1. "Freedom identifiability" constraints are those constraints that are necessary to ensure that each freedom parameter in the model is estimable.

The identifiability constraints for $\mu$ will generically be labelled as ident($\mu$) in this section. Similarly, let the identifiability constraints for $m$, the vector of expected marginal counts, be denoted by ident($m$). These constraints are implied by ident($\mu$).

Lemma […]. Let the hierarchical loglinear model (AP, BP) be specified as either

$\log \mu = X^*\beta^*$, ident($\mu$)   or   $U^{*\prime} \log \mu = 0$, ident($\mu$).

Suppose that the joint distribution model $[\theta_J]$ can be specified as either

$\log \mu = X\beta$, ident($\mu$)   or   $U' \log \mu = 0$, ident($\mu$).

If $[\theta_J]$ is no more restrictive than (AP, BP), in the sense that $\mathcal{M}(X) \supseteq \mathcal{M}(X^*)$ or $\mathcal{M}(U) \subseteq \mathcal{M}(U^*)$, then $[\theta_J]$ only constrains the expected marginal counts to satisfy the identifiability constraints ident($m$).
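Proof: Write the model (AP, BP) as

$\log \mu_{ijk} = \alpha + \alpha^A_i + \alpha^B_j + \alpha^P_k + \alpha^{AP}_{ik} + \alpha^{BP}_{jk}$,

where, without loss of generality, the freedom identifiability constraints are […] and the identifiability constraints ident($\mu$) are

$\mu_{++k} = n_k$,   $k = 1, \ldots, K$.

Using the identifiability constraints, we can write

$n_k = \exp(\alpha + \alpha^P_k)\,\gamma^A_k\,\gamma^B_k$,   $k = 1, \ldots, K$,

where

$\gamma^A_k = \sum_i \exp(\alpha^A_i + \alpha^{AP}_{ik})$   and   $\gamma^B_k = \sum_j \exp(\alpha^B_j + \alpha^{BP}_{jk})$.

Hence,

$\alpha + \alpha^P_k = \log n_k - \log \gamma^A_k - \log \gamma^B_k$.

Now all of the freedom parameters not constrained by the freedom identifiability constraints or the identifiability constraints are completely arbitrary. It follows that $\{\gamma^A_k\}$ and $\{\gamma^B_k\}$, which are functions of these arbitrary freedom parameters, are also completely arbitrary. Therefore,

$m_i(1;k) = \mu_{i+k} = \exp(\log n_k - \log \gamma^A_k - \log \gamma^B_k + \alpha^A_i + \alpha^{AP}_{ik})\,\gamma^B_k$
$= \exp(\log n_k - \log \gamma^A_k + \alpha^A_i + \alpha^{AP}_{ik})$
$= \dfrac{n_k \exp(\alpha^A_i + \alpha^{AP}_{ik})}{\gamma^A_k}$.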
That is, this set of expected marginal counts follows a saturated multinomial loglinear model. Similarly,

$m_j(2;k) = \mu_{+jk} = \dfrac{n_k \exp(\alpha^B_j + \alpha^{BP}_{jk})}{\gamma^B_k}$.

[…]
than the hierarchical loglinear model (APQ, BPQ) for the conclusion of Lemma […] to hold.

Since most reasonable joint distribution models will be well defined, we assume this to be the case and hence are left to show that the marginal distribution model is well defined. To show this, we simply must show that the generalized loglinear (or linear) marginal model constraints and the identifiability constraints ident($m$) (which are implied by ident($\mu$)) are independent.

We will initially assume that $I$ need not equal $J$. Let the factors $R_1$ and $R_2$ represent the level of response to factors A and B. That is, $R_1$ is an $I$-level factor and $R_2$ is a $J$-level factor. A simple loglinear model for the expected marginal counts can be written as (($R_1$, $P$), ($R_2$, $P$)). What this means is that the expected marginal counts satisfy

$\log m_i(1;k) = \beta^{(1)} + \beta^{R_1}_i + \beta^{P(1)}_k$,   $i = 1, \ldots, I$,   $k = 1, \ldots, K$,
$\log m_j(2;k) = \beta^{(2)} + \beta^{R_2}_j + \beta^{P(2)}_k$,   $j = 1, \ldots, J$,   $k = 1, \ldots, K$,   ([…])
ident($m$).

Suppose now that $I = J$. As before, let the factor $R$ represent the common levels of response for both response factors A and B. Also, the factor $V$ will again be defined to be the response variable factor. For this example, $V$ is a two-level factor taking on the values 1, corresponding to the "first" response A, and 2, corresponding to the "second" response B. For longitudinal data, $V$ is referred to as the "Occasion" variable. Since $I = J$, we can consider an even simpler model. We could assume that $\beta^{R_1}_i = \beta^{R_2}_i$
and consider the model ($R$, $VP$), which can be specified as

$\log m_i(t;k) = \tau + \tau^R_i + \tau^V_t + \tau^P_k + \tau^{VP}_{tk}$,   $t = 1, 2$,   $k = 1, \ldots, K$,   ([…])

where […], the $\tau$ parameters satisfy the freedom constraints […], and the identifiability constraints ident($m$) are satisfied. Notice that the model ($R$, $VP$) only makes sense when $I = J$; it implies marginal homogeneity of the A and B response distributions.

The following lemma provides us with a way of identifying a large class of marginal distribution models that are well defined. It is concerned with the case when $I$ need not equal $J$. Lemma […] applies when $I = J$. Each of these lemmas is easily generalizable to situations when there are many response variables and many covariates.

Lemma […]. Suppose that the marginal distribution model (($R_1$, $P$), ($R_2$, $P$)) can be written as either

$\log m = X^*\beta^*$, ident($m$)   or   $U^{*\prime} \log m = 0$, ident($m$),

where ident($m$) are those identifiability constraints implied by ident($\mu$). Specify the marginal distribution model $[\theta_M]$ as

$\log m = X\beta$, ident($m$)   or   $U' \log m = 0$, ident($m$).
If $[\theta_M]$ is no more restrictive than (($R_1$, $P$), ($R_2$, $P$)), in the sense that $\mathcal{M}(X) \supseteq \mathcal{M}(X^*)$ or $\mathcal{M}(U) \subseteq \mathcal{M}(U^*)$, then $[\theta_M]$ is well defined.

Proof: By equation ([…]), the marginal model (($R_1$, $P$), ($R_2$, $P$)), without the identifiability constraints, implies that

$s_1(k) = \sum_i m_i(1;k) = \exp(\beta^{(1)} + \beta^{P(1)}_k) \times \sum_i \exp(\beta^{R_1}_i)$

and

$s_2(k) = \sum_j m_j(2;k) = \exp(\beta^{(2)} + \beta^{P(2)}_k) \times \sum_j \exp(\beta^{R_2}_j)$.

Hence, the $s_t(k)$, which are functions of […] arbitrary parameters, are arbitrary. Since the identifiability constraints ident($m$) constrain the $s_t(k)$ to satisfy

$s_t(k) = n_k$,   $k = 1, \ldots, K$,   $t = 1, 2$,

and the model constraints allow the $s_t(k)$ to be completely arbitrary, it follows that the model (($R_1$, $P$), ($R_2$, $P$)) is well defined. Also, any less restrictive marginal distribution model will also be well defined. ∎

Notice that in the proof of Lemma […], the conclusion would still hold if the sums $\sum_i \exp(\beta^{R_1}_i)$ and $\sum_j \exp(\beta^{R_2}_j)$ were constrained to equal each other. This will be important when we show that the model ($R$, $VP$) is well defined.

Suppose now that $I = J$, so that the model ($R$, $VP$) is reasonable. This next lemma identifies a large class of marginal distribution models that are well defined when the responses are measured on the same scale.
Lemma […]. Suppose that the model ($R$, $VP$) can be written as either

$\log m = X^*\beta^*$, ident($m$)   or   $U^{*\prime} \log m = 0$, ident($m$).

Specify the marginal distribution model $[\theta_M]$ as

$\log m = X\beta$, ident($m$)   or   $U' \log m = 0$, ident($m$).

If $[\theta_M]$ is no more restrictive than ($R$, $VP$), in the sense that $\mathcal{M}(X) \supseteq \mathcal{M}(X^*)$ or $\mathcal{M}(U) \subseteq \mathcal{M}(U^*)$, then it is well defined.

Proof: By equation ([…]), we can write the sums $s_t(k)$ as

$s_t(k) = \exp(\tau + \tau^V_t + \tau^P_k + \tau^{VP}_{tk}) \sum_i \exp(\tau^R_i)$.

Notice that the first exponential term is completely arbitrary; it is a function of […] independent parameters. Therefore, the set of sums $\{s_t(k)\}$ is not constrained in any way by the model constraints $\log m = X^*\beta^*$. As in the proof of Lemma […], it follows that the marginal distribution model ($R$, $VP$) is well defined. Finally, any less restrictive model will also be well defined. ∎

In view of the proof of Lemma […], the model ($R$, $V$, $P$) would not be well defined; neither would ($RV$, $P$). In order for the marginal distribution model to be well defined, the loglinear model must include the $VP$ effect.

We can easily generalize the results of Lemma […]. Suppose that there are two covariates, say $P$ and $Q$. It can be shown that any marginal distribution model that is no more restrictive than the loglinear model ($R$, $VPQ$) is well defined. A marginal distribution model that is specified as a cumulative- or
adjacent-categories-logit model would be well defined if the model allows the sums $\{s_t(k)\}$ to be completely arbitrary.

We now state an important theorem that addresses the issue of model well-definedness. The theorem is specifically for the case when the response variables A and B are measured on the same scale and there is just one covariate $P$. It can easily be generalized to the case of several distinct responses and several covariates.

Theorem […]. Suppose that the joint distribution model $[\theta_J]$ is no more restrictive than the loglinear model (AP, BP) and that the marginal distribution model $[\theta_M]$ is no more restrictive than the loglinear model ($R$, $VP$). It follows that the simultaneous model $[\theta_J \cap \theta_M]$ is well defined.

Proof: The proof follows immediately by Lemmas […] and […] and the fact that a simultaneous model is well defined if the following conditions hold: (1) both the joint and marginal distribution models are well defined, and (2) the joint distribution model only constrains the expected marginal counts to satisfy the identifiability constraints ident($m$). ∎

A few remarks about Theorem […] are in order. Firstly, when there is only one population of interest, the sufficient condition is that the main-effects parameters are allowed to be arbitrary. It follows that such models as quasi-symmetry (QS
For the example of section […], we see that had we left the effect VS out of the marginal loglinear model ([…]), the marginal model would have constrained the sums $\{s_t(k)\}$ to lie in some restricted space. This can be seen by noting that

$s_t(k)$ = […]

[…]

$\mathrm{df}_{res}[\theta_J \cap \theta_M] = \mathrm{df}_{res}[\theta_J] + \mathrm{df}_{res}[\theta_M]$,   ([…])

since the model constraints are nonredundant. For example, the residual degrees of freedom for measuring goodness of fit of the simultaneous model (L × L D) ∩ M(U) used in the political interest data example can be computed in this way. This follows since the model (L × L D) satisfies the sufficient conditions of Theorem […], and so, if M(U) is well defined, the
simultaneous model (L × L D) ∩ M(U) is well defined. In contrast, the model ($V^{(1)}$, $V^{(2)}$, $G$) used for the cross-over data example does not satisfy the conditions of the theorem, since the effects $V^{(1)}G$ and $V^{(2)}G$ are omitted. In fact, the model implies that there is no Group ($G$) by Response level ($R$) association. Therefore, the simultaneous model comprised of this joint distribution model along with the marginal cumulative-logit model M(V) is ill defined, since M(V) implies the same constraints. Equation ([…]) does not apply in this case.

Discussion

In this chapter, we introduced a broad class of models that imply structure on both the joint and marginal distributions of multivariate categorical response vectors when the response scale was the same for each response. We showed that these models can be fit using the ML fitting method of Chapter […]. Several numerical examples were considered, illustrating the usefulness of simultaneously modeling the joint and marginal distributions. All of the models were fitted using the FORTRAN program "mlerestraint", which was developed by the author.

Model parsimony was the impetus behind this entire chapter. Our objective was to find parsimonious models that both fit the data well and provided us with straightforward interpretations of freedom parameters. The models often included parameters that measured departures from independence among the responses as well as parameters that measured departure from marginal homogeneity. It was shown via a numerical example that parsimonious modeling may result in more efficient and reliable estimation
of both model and freedom parameters; the researcher must find a balance between a model that is too structured and one that is not structured enough. The author fully intends to conduct simulation studies to better understand the importance of parsimonious modeling in this setting.

Although we provide somewhat general results regarding compatibility of the joint and marginal models, there still is a need for more general results. We discuss the case when the joint and marginal models can be expressed (at least equivalently) as loglinear models. More general results are needed for other types of models, such as cumulative-logit and linear models. For these simultaneous models to be useful to the practitioner, a general method to determine whether the constraints implied by the two models are independent must be developed. The proposition in section […] is a step in the right direction.

A factor that could impede the use of this method to fit models to very large data sets is the input requirements. The algorithm requires a substantial amount of input. For example, consider the input required for the example in section […]. The matrices $C$, $A$, and $X$ all must be input. Although the required input is simple to determine, there is much energy expended inputting the information. An input program must be developed and implemented in the program "mlerestraint".

The assessment of model goodness of fit is straightforward when using the ML method. The (log) likelihood-ratio statistic $G^2$, the Pearson statistic $X^2$, or the Wald statistic $W^2$ can be used for this purpose. Of interest to the practicing statistician is the ability to assess how far wrong you can be by assuming that the responses are independent. The test statistic used
for this purpose is simply the likelihood-ratio statistic that measures how "far apart" the models (I) ∩ M(U) and (S) ∩ M(U) are. Because the model (I) ∩ M(U) is nested within the model (S) ∩ M(U), one can use as a measure of this distance the difference between the two likelihood-ratio statistics, viz.,

$G^2[(I) \cap M(U)] - G^2[(S) \cap M(U)]$.

More generally, there are many assumptions one can make about the association structure among the responses. With the methods of this dissertation, one can easily derive tests for the validity of the assumptions.

As an alternative to longitudinal-type sampling designs, a cross-sectional sample may be taken. Cross-sectional sampling involves sampling independent groups of subjects for each response. The research questions posed about the marginal distributions are such that they could be answered using cross-sectional data. In this sense, the marginal models are "population-averaged" models (Zeger et al., […]). However, a cross-sectional sampling design results in more subject variability, since nonhomogeneous subjects are used for each response, and the detection of differences in the marginal distributions may be clouded by these subject effects (Laird, […]). Further, with cross-sectional studies we are unable to explore the association structure among the responses. This information regarding the association structure may be of substantive importance in some situations.
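The nested-model comparison described above, the difference between the likelihood-ratio statistics of (I) ∩ M(U) and (S) ∩ M(U), can be sketched in a few lines. This is only an illustration under stated assumptions: the fitted means would come from whatever program fits the simultaneous models (none is reproduced here), the function names are hypothetical, the numbers in the usage line are made up, and the chi-squared tail probability uses the closed-form expressions that hold for integer degrees of freedom.

```python
import math


def g_squared(observed, fitted):
    """Likelihood-ratio statistic G^2 = 2 * sum y * log(y / mu-hat); zero counts contribute 0."""
    return 2.0 * sum(y * math.log(y / m) for y, m in zip(observed, fitted) if y > 0)


def chi2_sf(x, df):
    """Upper tail probability of a chi-squared variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2k: exp(-x/2) * sum_{i<k} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df = 2k+1: erfc(sqrt(x/2)) plus k correction terms.
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    total = 0.0
    for i in range(df // 2):
        if i > 0:
            term *= x / (2 * i + 1)
        total += term
    return math.erfc(math.sqrt(half)) + total


def nested_lr_test(g2_restricted, df_restricted, g2_general, df_general):
    """Difference in G^2 between a restricted model and the model it is nested in."""
    diff = g2_restricted - g2_general
    ddf = df_restricted - df_general
    return diff, ddf, chi2_sf(diff, ddf)


# Made-up statistics, for illustration only.
diff, ddf, p_value = nested_lr_test(10.0, 5, 4.0, 3)
```

A small p-value here would indicate that the extra structure imposed on the joint distribution (for instance, independence of the responses) is not supported by the data.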
CHAPTER […]
LOGLINEAR MODEL FITTING WITH INCOMPLETE DATA

Introduction

We consider making inferences about loglinear model parameters when only disjoint sums of the complete data are observed. Inferences will be made based on the maximum likelihood estimates of the model parameters and an estimate of precision of these estimates.

As an example, consider the data in Table […] of Goodman ([…]). Each of […] respondents was classified as being universalistic or particularistic when confronted by each of four situations (A, B, C, D) of role conflict. Goodman ([…]) postulated the presence of an underlying two-level latent factor $W$, which was not observed. Within a level of the latent factor, the manifest variables (A, B, C, D) are assumed to be mutually independent. Thus, the latent class structure would allow us to simply explain the relationship among the four manifest variables. In this setting, the unobservable complete data are the counts resulting from a cross-classification on the four manifest factors and the latent factor. The data, if observable, could be displayed in a $2^5$ contingency table. The observable incomplete data are the counts obtained by summing over the two levels of the latent factor; i.e., the incomplete data are disjoint sums of the complete data.

As in Goodman ([…]), we assume the complete data means follow a loglinear model, which implies conditional independence among the manifest factors (A, B, C, D) given the latent factor $W$. Our objectives include finding
the maximum likelihood estimates of the loglinear parameters based on the observed data, estimating their precision, computing other model-based estimators and their standard errors, and testing model goodness of fit.

There are many ways to find the maximum likelihood estimators, each method having its positive and negative features. For example, we could work directly with the incomplete-data likelihood, which is usually complicated relative to the complete-data likelihood, and use a Newton-Raphson or Fisher-scoring algorithm. Palmgren and Ekholm ([…]) and Haberman ([…]) use these methods to obtain maximum likelihood estimates and their standard errors. We could avoid the complicated likelihood altogether and use the Expectation-Maximization algorithm (Dempster et al., […]). Sundberg ([…]) discusses the properties of the EM algorithm when it is used to fit models to data coming from the regular exponential family.

In section […], the EM algorithm is explored in greater detail. Unlike the other approaches, the EM algorithm is insensitive to starting values. This is important in practice, since we seldom have any idea what a reasonable starting value is. Another positive feature not shared by the other methods is that the convergence to the maximum is monotonic, i.e., the likelihood is increased at each successive iteration. Drawbacks to the EM algorithm are that (1) it is relatively slow and (2) an estimate of precision of the parameter estimate is not obtained as a by-product of the algorithm. N-R and Fisher-scoring, on the other hand, are faster and, as a by-product, provide us with an estimate of precision. The slow convergence of the EM algorithm can be mitigated somewhat using the acceleration methods of Meilijson ([…]) or Louis ([…]). Also, increased computer efficiency has
made the slow convergence less of an issue.

In section […], we address the second drawback of the EM algorithm by deriving an explicit form for the observed information matrix when the complete data are independent Poissons with means following a loglinear model. The observed information matrix is computed upon convergence of the EM algorithm and then inverted. The inverse will serve as the estimate of precision. In section […], we explore an iterative scheme that uses both N-R and EM, exploiting each of their strong points.

Review of the EM Algorithm

The EM algorithm is generally used in those estimation problems in which the likelihood is complicated, rendering it difficult or impractical to maximize, but in which the data can be viewed as being some function of complete data for which, had they been observed, evaluating the maximum likelihood estimates would be simple. Unlike many other statistical root-finding algorithms, the EM algorithm does not require explicit calculation of the score vector or its derivative. It uses much simpler functions.

The EM algorithm is by no means a new method for finding maximum likelihood estimates. Goodman ([…]) essentially used it. Sundberg ([…]) discusses it at length when used in the exponential family case. Dempster, Laird, and Rubin ([…]) provide us with a review of the method as well as some of its properties. Subsequent work with the EM algorithm has been primarily devoted to improving the speed of its convergence (Louis, […]; Meilijson, […]).
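To make the review above concrete, here is a minimal sketch of the EM iteration for a two-class latent class model with four binary manifest variables, the kind of structure described in the Introduction. It rests on several assumptions: the counts are hypothetical (they are not Goodman's data), the model is written in the familiar probability parameterization (class weights and conditional item probabilities) rather than as a Poisson loglinear model, and all names are illustrative. The E-step splits each observed count across the two latent classes in proportion to the current fitted joint probabilities; the M-step then has a closed form because the complete-data model is simple.

```python
import itertools
import math

# The 16 observable response patterns for four binary items.
PATTERNS = list(itertools.product((0, 1), repeat=4))


def pattern_prob(pattern, probs):
    """P(pattern | class) under local independence; probs[j] = P(item j = 1 | class)."""
    p = 1.0
    for x, q in zip(pattern, probs):
        p *= q if x == 1 else 1.0 - q
    return p


def joint_probs(pattern, weights, item_probs):
    """P(pattern, class) for each latent class."""
    return [w * pattern_prob(pattern, q) for w, q in zip(weights, item_probs)]


def log_likelihood(counts, weights, item_probs):
    """Incomplete-data (observed) multinomial log likelihood, up to a constant."""
    return sum(y * math.log(sum(joint_probs(p, weights, item_probs)))
               for p, y in zip(PATTERNS, counts))


def em_latent_class(counts, weights, item_probs, iterations=100):
    n = float(sum(counts))
    history = [log_likelihood(counts, weights, item_probs)]
    for _ in range(iterations):
        # E-step: expected complete-data counts for each (pattern, class) cell.
        expected = []
        for p, y in zip(PATTERNS, counts):
            joint = joint_probs(p, weights, item_probs)
            total = sum(joint)
            expected.append([y * j / total for j in joint])
        # M-step: closed-form maximum likelihood estimates from the expected counts.
        class_totals = [sum(row[c] for row in expected) for c in range(2)]
        weights = [t / n for t in class_totals]
        item_probs = [
            [sum(row[c] for p, row in zip(PATTERNS, expected) if p[j] == 1) / class_totals[c]
             for j in range(4)]
            for c in range(2)
        ]
        history.append(log_likelihood(counts, weights, item_probs))
    return weights, item_probs, history


# Hypothetical observed counts, one per pattern in PATTERNS order.
toy_counts = [40, 10, 12, 8, 11, 7, 9, 14, 9, 6, 8, 13, 7, 12, 15, 39]
weights, item_probs, history = em_latent_class(
    toy_counts, [0.5, 0.5], [[0.3] * 4, [0.7] * 4])
```

The list `history` records the observed-data log likelihood after each iteration, which makes the monotonicity of EM easy to inspect. The sketch returns no standard errors, which is exactly the drawback that the observed information matrix is meant to address.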
General Results

Suppose the complete data $X$ has density $f_X(x; \beta)$ with respect to some measure $\nu$. Let $Y = Y(X)$, a function of the complete data, denote the observed data. It follows that the density of $Y$ is

$$f_Y(y; \beta) = \int_R f_X(x; \beta)\, d\nu(x),$$

where R = {x
The following properties of the EM algorithm are verified in the appendix. The proofs follow from Dempster et al. and Louis. In what follows $S$ denotes a score vector and $I$ an information matrix.

Property 1. If $\beta^{(m)}$ and $\beta^{(m+1)}$ are the $m$th and $(m+1)$st iterate estimates obtained via the EM algorithm, then $\ell_Y(\beta^{(m+1)}; y) \ge \ell_Y(\beta^{(m)}; y)$, i.e. the log likelihood is increased at each successive iteration.

Property 2. Whenever the sequence of EM iterates $\{\beta^{(m)}\}$ converges to $\beta^{(\infty)}$ as $m \to \infty$, it satisfies $S_Y(\beta^{(\infty)}; y) = 0$, i.e. the estimates converge to a zero of the score vector for $Y$.

Property 3. For any $\beta$,

$$\left.\frac{\partial}{\partial\beta} Q(\beta, \beta_0; y)\right|_{\beta_0 = \beta} = S_Y(\beta; y) = E_\beta\big(S_X(\beta; X) \mid Y = y\big).$$

Property 4. For any
can be used to find a zero of the incomplete-data score function, property 3 provides us with a way of evaluating the score function (see Meilijson), and finally property 4 gives us an expression for the observed information matrix based on the incomplete data. These four properties of the EM algorithm will be explored in detail in the next section, which deals with the special case in which the complete data have distribution in the regular exponential family.

Exponential Family Results

The exponential families of distributions play an important role in statistical inference. Many data generating mechanisms can be modeled assuming that the underlying distribution is a member of the regular exponential family. In this section we consider properties of the regular exponential family that are relevant to the use of the EM algorithm. Specifically, we will make use of the results of this section, which are due primarily to Sundberg, to justify results for Poisson loglinear models with missing data.

Let the complete data vector $X$ have density, with respect to some measure $\nu$, in the regular exponential family. That is, assume that

$$f_X(x; \beta) = a(x)\exp\{\beta' T(x) - c(\beta)\},$$

where $T(x) = (T_1(x), T_2(x), \ldots, T_p(x))'$ and $\beta$ is a canonical parameter vector of length $p$. Let $\mathcal{X} = \{x : f_X(x; \beta) > 0\}$. Some well known properties of the regular exponential family include: $T(X)$ is sufficient for $\beta$, and

$$\frac{\partial c(\beta)}{\partial\beta} = E_\beta(T(X)), \qquad \frac{\partial^2 c(\beta)}{\partial\beta\,\partial\beta'} = \operatorname{var}_\beta(T(X)).$$
These properties are shown in Lehmann. They follow immediately upon repeated differentiation of $\int_{\mathcal{X}} f_X(x; \beta)\, d\nu(x) = 1$ with respect to $\beta$. Lehmann showed that the derivative could be passed through the integral.

Suppose that the incomplete data vector $Y$ is a (many-to-one) function of $X$, i.e. $Y = Y(X)$. For notational convenience we let $t = T(x)$ and let $I_R(x)$ represent the indicator of membership in $R = \{x : Y(x) = y\}$. It follows that

$$f_{X|Y}(x \mid y; \beta) = \frac{f_X(x; \beta)\, I_R(x)}{f_Y(y; \beta)} = \frac{a(x)\exp\{\beta' t - c(\beta)\}\, I_R(x)}{\int_R a(x)\exp\{\beta' t - c(\beta)\}\, d\nu(x)} = a^*(x)\exp\{\beta' t - c^*(\beta; y)\},$$

where $a^*(x) = a(x) I_R(x)$ and $c^*(\beta; y) = \log \int_R a(x)\exp(\beta' t)\, d\nu(x)$. Hence the conditional distribution of $X$ given $Y = y$ is also a member of the exponential family (Sundberg). Again, by properties of the exponential family, we have

$$\frac{\partial c^*(\beta; y)}{\partial\beta} = E_\beta(T(X) \mid Y = y) \quad\text{and}\quad \frac{\partial^2 c^*(\beta; y)}{\partial\beta\,\partial\beta'} = \operatorname{var}_\beta(T(X) \mid Y = y).$$

Using these two densities we can re-express the density of $Y$ as f_Y(y; β) = f_X(x; β) I_R(x) / f_X|
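Our objective is to maximize $f_Y(y; \beta)$ with respect to $\beta$. Or equivalently, we are to maximize the log likelihood

$$\ell_Y(\beta; y) = c^*(\beta; y) - c(\beta)$$

with respect to $\beta$. For well behaved $\ell_Y(\beta; y)$ we can find the value of $\beta$, say $\hat\beta$, that maximizes it by solving the score equations

$$S_Y(\beta; y) = \frac{\partial \ell_Y(\beta; y)}{\partial\beta} = \frac{\partial c^*(\beta; y)}{\partial\beta} - \frac{\partial c(\beta)}{\partial\beta} = 0.$$

Notice that, by the exponential family properties given above, this is equivalent to solving the equation

$$S_Y(\beta; y) = E_\beta(T(X) \mid Y = y) - E_\beta(T(X)) = 0.$$

There are many ways to solve this equation. One possibility is to use the following iterative scheme:

(1) Find $E_{\beta^{(m)}}(T(X) \mid Y = y)$.
(2) Solve for $\beta^{(m+1)}$ in $E_{\beta^{(m+1)}}(T(X)) = E_{\beta^{(m)}}(T(X) \mid Y = y)$.
(3) If $\|\beta^{(m+1)} - \beta^{(m)}\| > \mathrm{TOL}$ then replace $\beta^{(m)}$ by $\beta^{(m+1)}$ and go to (1). Else stop.

We show in Appendix B that this iterative scheme is simply the EM algorithm. The convergence properties are discussed in Sundberg. One important note with regard to the EM algorithm is that if $\ell_Y(\beta; y)$ is not so well behaved, e.g. the score vector $S_Y(\beta; y)$ has multiple roots, some of which may be associated with a minimum, then the particular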
solution obtained via the EM algorithm will be a local maximum likelihood estimate. This follows since the likelihood increases monotonically with each successive EM iteration.

Upon convergence of the algorithm, we can use the negative Hessian matrix evaluated at $\hat\beta$ to estimate the observed information matrix based on the incomplete data. The negative Hessian is

$$-\frac{\partial^2 \ell_Y(\beta; y)}{\partial\beta\,\partial\beta'} = \frac{\partial^2 c(\beta)}{\partial\beta\,\partial\beta'} - \frac{\partial^2 c^*(\beta; y)}{\partial\beta\,\partial\beta'} = \operatorname{var}_\beta(T(X)) - \operatorname{var}_\beta(T(X) \mid Y = y) = I_X(\beta) - I_{X|Y}(\beta; y).$$

This expression for the negative Hessian was noted by Sundberg. He referred to the matrix $I_{X|Y}$ as a measure of information loss. With regard to lost information, let us suppose the observed data $Y$ are such that T(X) = g
can be used to this end. Notice that both $S_Y(\beta; y)$ and $I_Y(\beta; y)$ (or a numerical approximation thereof) would need to be computed at each iteration. Specifically, the iterative scheme can be written as

(1) Compute
estimates when the model parameters are estimated under the product Poisson assumption. The section on the EM algorithm for Poisson loglinear models shows that the EM algorithm takes on a particularly simple form when the complete data are assumed to be product Poisson with means following a loglinear model. In the section on obtaining the observed information matrix, we derive an explicit formula for the observed information matrix that is based on the observable incomplete data. The section on inferences for multinomial loglinear models discusses the multinomial case.

The EM Algorithm for Poisson Loglinear Models

Let $X = (X_1, X_2, \ldots, X_n)'$ represent the "complete" data vector of cell counts and suppose that $X_i \sim$ indep Poisson$(\mu_i)$, $i = 1, \ldots, n$, where $\mu = (\mu_i)$ satisfies the loglinear model

$$\log(\mu) = Z\beta.$$

Here $Z$ is some $n \times p$ full rank model matrix and $\beta$ is a $p \times 1$ parameter vector. Suppose only certain disjoint sums of $X$ are observable. Let Y
Denote realizations of $X$ and $Y$ by $x$ and $y$. The objective of this section is to find the maximum likelihood estimate of $\beta$, denoted by $\hat\beta$, based on the observed data. Writing the density of the complete data $X$ as

$$f_X(x; \beta) = a(x)\exp\{x'Z\beta - 1_n' e^{Z\beta}\},$$

we see that $f_X$ has the regular exponential family form and that a sufficient statistic for $\beta$ is $Z'X$. It follows that $Y = LX$ has log likelihood of the form

$$\ell_Y(\beta; y) = c^*(\beta; y) - c(\beta),$$

where $c^*$ and $c$ are the functions defined in the exponential family results above. But by properties of the matrix $L$, we know that $Y$ has a product Poisson distribution. Specifically
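the score vector. It is

$$S_Y(\beta; y) = \frac{\partial}{\partial\beta}\sum_{i=1}^{m}\left[y_i\log(L_i'\mu) - L_i'\mu\right] = \sum_{i=1}^{m}\left(\frac{y_i}{L_i'\mu} - 1\right) Z'D(\mu)L_i = Z'D(\mu)L'\left(y \div L\mu - 1_m\right) = Z'\left[\mu \cdot L'\left(y \div L\mu - 1_m\right)\right],$$

where $L_i'$ denotes the $i$th row of $L$, $D(\mu)$ is the diagonal matrix with the elements of $\mu$ on the diagonal, and in the last line $\cdot$ and $\div$ represent componentwise operators. As shown in the exponential family results, the derivative of the log likelihood of the incomplete data can alternatively be expressed as

$$\frac{\partial}{\partial\beta}\ell_Y(\beta; y) = E_\beta(Z'X \mid Y = y) - E_\beta(Z'X),$$

since $\partial c^*(\beta; y)/\partial\beta = E_\beta(Z'X \mid Y = y)$ and $\partial c(\beta)/\partial\beta = E_\beta(Z'X)$. Evidently, since $E_\beta(Z'X) = Z'\mu$, it must be that

$$E_\beta(Z'X \mid Y = y) = Z'\left[\mu - \mu \cdot L'\left(1_m - y \div L\mu\right)\right].$$

Therefore, the EM algorithm is simply

(1) Find $Z'\left[\mu^{(m)} - \mu^{(m)} \cdot L'\left(1_m - y \div L\mu^{(m)}\right)\right]$.
(2) Solve for $\beta^{(m+1)}$ in $Z'\mu^{(m+1)} = Z'\left[\mu^{(m)} - \mu^{(m)} \cdot L'\left(1_m - y \div L\mu^{(m)}\right)\right]$.
(3) If $\|\beta^{(m+1)} - \beta^{(m)}\| > \mathrm{TOL}$ then replace $\beta^{(m)}$ by $\beta^{(m+1)}$ and go to (1). Else stop.

In practice, finding a reasonable starting value for $\beta$, say $\beta^{(0)}$, is very difficult. However, in view of the first step of the EM algorithm, we need only be concerned with an initial estimate of $\mu$. Notice that if $\mu^{(0)}$, the initial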
guess for $\mu$, satisfies $L\mu^{(0)} = y$, then we have tacitly chosen an appropriate $\beta^{(0)}$ to start the algorithm. This is so since we can go to step (2) of the algorithm and calculate $\beta^{(1)}$, the solution to the equation. In fact, $\beta^{(0)} = (Z'Z)^{-1}Z'\log(\mu^{(0)})$. Thus the EM algorithm has the nice feature that not only is it insensitive to starting values, but also reasonable starting values are simple to find. A FORTRAN program, "emloglin", has been written to implement the EM algorithm as defined above.

Obtaining the Observed Information Matrix

In the previous section we showed how one can obtain maximum likelihood estimates of the loglinear model parameters using the EM algorithm. In this section we address the major drawback of the EM algorithm: an estimate of the precision of these ML estimates is not obtained as a by-product of the algorithm. We derive an explicit formula for the observed information matrix associated with the loglinear model parameters that is intuitively appealing and simple to evaluate. Upon convergence of the EM algorithm, the observed information matrix is evaluated at the ML estimates and inverted. The inverse information can be used as an estimate of precision (Agresti).

Notice that in this section we consider using the observed information rather than the expected information. We follow the lead of Efron and Hinkley, who build a case for the preferred use of the observed information. If desired, however, the expected information can easily be computed, since the observed information is shown to be a linear function of the incomplete data.
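As a rough companion to the EM scheme for Poisson loglinear models given earlier in this chapter, the following Python sketch runs its three steps for a generic model matrix Z and summing matrix L. It is not the emloglin program. The function names, the NumPy dependency, and the inner Newton-Raphson solver used for step (2) are all assumptions made for illustration.

```python
import numpy as np

def solve_complete_data_score(Z, target, beta, iters=50, tol=1e-12):
    """Step (2): solve Z' exp(Z beta) = target by Newton-Raphson (assumed inner solver)."""
    for _ in range(iters):
        mu = np.exp(Z @ beta)
        step = np.linalg.solve(Z.T @ (mu[:, None] * Z), target - Z.T @ mu)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def em_poisson_loglinear(Z, L, y, mu0, tol=1e-8, max_iter=10000):
    """EM for log(mu) = Z beta when only y = L x is observed (rows of L are disjoint 0/1 sums)."""
    mu = np.asarray(mu0, dtype=float)
    beta = np.linalg.lstsq(Z, np.log(mu), rcond=None)[0]   # tacit starting beta
    for _ in range(max_iter):
        # Step (1): E(X | Y = y) = mu - mu * L'(1 - y / (L mu)), componentwise
        x_hat = mu - mu * (L.T @ (1.0 - y / (L @ mu)))
        # Step (2): complete-data likelihood equations Z' mu_new = Z' x_hat
        beta_new = solve_complete_data_score(Z, Z.T @ x_hat, beta)
        converged = np.linalg.norm(beta_new - beta) < tol
        beta, mu = beta_new, np.exp(Z @ beta_new)
        if converged:                                       # Step (3)
            break
    return beta, mu
```

For a quick check on made-up data, take four cells observed only through the sums of cells 1-2 and 3-4, with Z holding an intercept and an indicator for the second pair. With y = (10, 30) the fitted means should be 5, 5, 15, 15, since the model is saturated for the two observed sums.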
Recall the setup in the previous section. Only disjoint sums of a complete data vector $X$, which is product Poisson, are observable. The complete data means are assumed to follow a loglinear model of the form $\log\mu = Z\beta$. By the expression for the negative Hessian given in the exponential family results, we see that the observed information matrix based on the incomplete data has form

$$I_Y(\beta; y) = \operatorname{var}_\beta(Z'X) - \operatorname{var}_\beta(Z'X \mid Y = y) = I_X(\beta) - (\text{Adjustment Matrix}).$$

This expression is intuitively appealing since $\operatorname{var}_\beta(Z'X) = Z'D(\mu)Z$ is the expected (and observed) information for $\beta$, treating the complete data $X$ as if it were observed, while $\operatorname{var}_\beta(Z'X \mid Y = y)$ is an adjustment that is necessary because we do not actually observe $X$, but only $LX = Y$. The amount of information lost by observing only $Y$ is determined by the conditional variance of the sufficient statistic $Z'X$ given $LX = y$.

At this point, one could derive a formula for the adjustment matrix as in a technical report by the author. The gist of the argument was that the distribution of $X \mid Y = y$ has a simple form when $Y$ represents disjoint sums of the independent Poisson random variables $X$, and so the conditional variance of $X$ (or $Z'X$) given $Y = y$ can easily be computed. A main result of that technical report was that

$$\operatorname{cov}(X_a, X_b \mid LX = y) = \begin{cases} y_{r(a)}\left[\dfrac{\mu_a}{L_{r(a)}'\mu}\,1(a = b) - \dfrac{\mu_a\mu_b}{(L_{r(a)}'\mu)^2}\right], & r(a) = r(b) > 0,\\[2ex] \mu_a\,1(a = b), & r(a) = r(b) = 0,\\[1ex] 0, & r(a) \ne r(b),\end{cases}$$

where $1(\cdot)$ is the indicator function and $r(j)$ is defined as follows: $r(j)$ is the row number in which a "1" occurs for the $j$th column of $L$, and $r(j) = 0$ if a "1" does not occur in column $j$ of $L$.
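To make the "complete-data information minus adjustment" form concrete, here is a small Python sketch, under the assumption that L is a 0/1 matrix whose rows pick out disjoint sets of cells. It builds cov(X | LX = y) group by group (a multinomial covariance within each observed sum, the Poisson variance for cells that enter no sum) and subtracts the resulting adjustment from Z'D(mu)Z. The function names are illustrative and are not taken from the dissertation's programs.

```python
import numpy as np

def conditional_cov(mu, L, y):
    """cov(X | LX = y) for independent Poisson X with means mu and disjoint 0/1 rows in L."""
    # Cells that appear in no observed sum keep their unconditional Poisson variance.
    V = np.diag(np.where(L.sum(axis=0) > 0, 0.0, mu))
    for i in range(L.shape[0]):
        idx = np.flatnonzero(L[i])
        p = mu[idx] / mu[idx].sum()            # conditional cell probabilities within sum i
        V[np.ix_(idx, idx)] = y[i] * (np.diag(p) - np.outer(p, p))
    return V

def observed_information(Z, mu, L, y):
    """I_Y = Z' D(mu) Z - Z' cov(X | LX = y) Z."""
    complete = Z.T @ (mu[:, None] * Z)
    adjustment = Z.T @ conditional_cov(mu, L, y) @ Z
    return complete - adjustment
```

One way to gain confidence in such a routine is to compare it with a finite-difference derivative of the incomplete-data score, which should agree with it up to numerical error.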
In this dissertation we will take a different approach. The explicit form of the score statistic for $Y$ was derived above. Since the observed information is nothing but the negative Hessian of the log likelihood, we can obtain an explicit formula for the observed information by simply differentiating the negative of the score function with respect to $\beta'$. The appendix shows how one arrives at

I_Y(β; y) = −∂S_Y(β; y)/∂β' =
Example: Missing Components. When certain components are unobservable, $L$ will be an identity matrix with rows missing. It follows that the observed information matrix is

$$I_Y(\beta; y) = Z'D(\mu)Z - Z'D(M\mu)Z,$$

where $M$ is a diagonal matrix with $j$th diagonal element $(M)_{jj} = 1(r(j) = 0)$.

Example: Latent Class Models. Suppose that counts resulting from a cross-classification on several factors are observable and that classification on an additional $K$-level latent factor is unobservable. We let the subscript $i$ represent a compound subscript identifying classification on observable factors, while the subscript $j$ indexes the latent classes. Denote the complete data vector of cell counts by $X = (X_{11}, \ldots, X_{1K}, \ldots, X_{m1}, \ldots, X_{mK})' = \{X_{ij}\}$ and the incomplete data by $Y = \{X_{i+}\}$. Notice that $Y = LX$, where $L = 1_K' \otimes I_m$. One possible latent class model assumes the means of the unobservable complete data, say $\mu_{ij}$, follow a loglinear model that implies conditional independence of observed factors, given the latent factor classification (Haberman). It follows that the observed information matrix is

$$I_Y(\beta; y) = Z'D(\mu)Z - Z'\Big(\bigoplus_{i=1}^{m} V_i\Big)Z,$$

where each $V_i$ is the covariance of a $K \times 1$ multinomial vector with index $y_i = x_{i+}$ and cell probabilities {μ_ij
a cross-classification on two factors $F_1$ and $F_2$, along with a dichotomous nonresponse indicator $R$. Suppose the $F_1$ classification is always known and that $R$ indicates whether or not the $F_2$ classification is known. To make inferences about the classification probabilities and missing data assumptions, Little and Rubin assume the complete data means follow a loglinear model. Variance estimates of the loglinear parameters are easily derived, since the observed data have form $Y = LX$ and $L$ satisfies the required conditions.

Inferences for Multinomial Loglinear Models

Previously, we assumed that the complete data were distributed as product Poisson, i.e. the complete data components are independent Poisson random variables. However, the sample size is often fixed by design, so that the distribution of the complete data vector may really be multinomial. This follows since a product Poisson vector, given the total, is multinomial. Since the total sample size is considered a random variable when the product Poisson assumption is used, the assumption seems to be unreasonable. Fortunately, Birch and Palmgren showed that maximum likelihood inferences about all of the loglinear parameters that are not fixed by design are the same whether one assumes the distribution is product Poisson or multinomial. Therefore, it is general practice to assume the data are product Poisson, since the Poisson distribution is in the regular exponential family and has an unconstrained canonical parameter. The Poisson loglinear model is an example of a generalized linear model (McCullagh and Nelder), which makes it simple to work with.
In this section we discuss making inferences about loglinear parameters when the sampling design is such that the total sample size is considered fixed, but the data are not completely observed, i.e. there are missing data. It is not obvious that the results of Birch extend to the case of incomplete data. Therefore, we provide a detailed discussion of the extension to the missing data case.

The Setup

In the following argument we assume that the matrix $L$ is such that each column has at least one "1" in it. This requirement results in the incomplete data $Y = LX$ having the same sum total as the complete data, i.e.

$$1_m'Y = 1_m'LX = 1_n'X = N.$$

We also require the loglinear model to include an intercept term. This intercept term will be the parameter that is fixed by design, since the total sample size $N$ will be considered fixed.

Full-Multinomial Sampling

Suppose that the complete data vector $X$ has a multinomial distribution, i.e. $X = (X_1, \ldots, X_n)' \sim \mathrm{Mult}(N, \pi)$, where $N = 1_n'X$ is the fixed total sample size and $\pi = (\pi_i)$ represents the vector of cell probabilities, which satisfy $\sum_i \pi_i = 1$. Since $N$ is considered fixed, it makes sense to write the cell means as $\mu = N\pi$, so that $\sum_i \mu_i = N$. Assume also that the cell means $\{\mu_i\}$ follow the loglinear model

$$\log\mu_i = \alpha + x_i'\beta, \qquad i = 1, \ldots, n,$$

where $\beta$ is a $p \times 1$ vector and $(\alpha, \beta')'$ contains the so-called loglinear parameters.
Further suppose that only Y
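We will call the new parameter space $\Theta_M$ and note that it is

$$\Theta_M = \{\theta = (\tau, \beta')' : \tau = N,\ \beta \in R^p\}.$$

The incomplete-data likelihood under the (M)ultinomial assumption can be written in terms of this new parameterization as

$$\ell^{(M)}(\theta; y) = \sum_i y_i \log E_\theta(Y_i) - N\log N = \sum_i y_i\left[\log\tau + \log\big(L_i'\exp(X\beta)\big) - \log\big(1_n'\exp(X\beta)\big)\right] - N\log N$$

$$= N\log\tau - N\log N + \sum_i y_i\log\big(L_i'\exp(X\beta)\big) - N\log\Big(\sum_i L_i'\exp(X\beta)\Big).$$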
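four properties that it did in the multinomial setup. The vector $Y$ is then distributed as product Poisson. Specifically, as above,

$$\log E_\theta(Y_i) = \log\tau + \log\big(L_i'\exp(X\beta)\big) - \log\big(1_n'\exp(X\beta)\big).$$

We will denote the model parameter space for the Poisson sampling case by

$$\Theta_P = \{\theta = (\tau, \beta')' : \tau \in R^+,\ \beta \in R^p\},$$

where the symbol $R^+$ represents the set of positive real numbers. It is important to note that $\Theta_M \subset \Theta_P$, since $\Theta_M$ constrains $\tau$ to equal $N$ while $\Theta_P$ requires $\tau$ only to be positive. The incomplete-data Poisson log likelihood can be written as

$$\ell^{(P)}(\theta; y) = \sum_i\left[y_i\log E_\theta(Y_i) - E_\theta(Y_i)\right] = \sum_i y_i\left[\log\tau + \log\big(L_i'\exp(X\beta)\big) - \log\big(1_n'\exp(X\beta)\big)\right] - \sum_i E_\theta(Y_i)$$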
functions are identical, implying that the maximum likelihood estimates of $\beta$ are the same for both sampling schemes. That is, if we let $\hat\beta^{(M)}$ and $\hat\beta^{(P)}$ denote the ML estimates of $\beta$ under the multinomial and Poisson sampling schemes, respectively, we have shown that $\hat\beta^{(M)} = \hat\beta^{(P)}$. Also, from the two log likelihoods we see that, upon differentiating a second time,

$$\frac{\partial^2 \ell^{(M)}}{\partial\beta\,\partial\beta'} = \frac{\partial^2 \ell^{(P)}}{\partial\beta\,\partial\beta'},$$

so that the portion of the information matrix that pertains to $\beta$ is the same for both sampling schemes. Further, the Poisson log likelihood shows that the log likelihood for incomplete Poisson components can be expressed as a sum of two parameter-independent log likelihoods. Thus, the parameters are orthogonal in that the information matrix is block diagonal, i.e. the parameter estimates are asymptotically uncorrelated. The inverse of a block diagonal matrix is simply the block diagonal matrix of the individual inverses. Hence, the estimated variance of the ML estimates of $\beta$ is the same for either sampling scheme.

Cell Mean Inference

Notice that not only is $\hat\beta^{(M)} = \hat\beta^{(P)}$, but also $\hat\tau^{(M)} = \hat\tau^{(P)} = N$. This follows since in the multinomial case $\tau$ is necessarily equal to the total sample size $N$, while in the Poisson case $\ell(\tau)$ is simply the log likelihood of the random variable $1'Y$, which is Poisson with mean $\tau$, implying that the ML estimate is $\hat\tau^{(P)} = 1'Y = N$. However, we must acknowledge the fact that the asymptotic variance of $\hat\tau$ under the Poisson assumption is approximately $N$; it is var
means (or cell probabilities) involve all of the loglinear parameters, even $\tau$. Thus, the variance of the cell mean estimates will depend upon which sampling scheme is used.

Briefly, using the EM algorithm, we can find the observed information for the loglinear parameters $(\alpha, \beta')'$, based on the assumption that the complete data are product Poisson. The complete data means $\mu_i$ are assumed to follow the loglinear model

$$\log\mu_i = \alpha + x_i'\beta, \qquad i = 1, \ldots, n.$$

If the sampling design is such that $\sum_i X_i = N$, the total sample size, is fixed, so that $X \sim \mathrm{Mult}(N, \pi)$, then the parameter $\alpha$ is "fixed by design". Actually, upon reparameterization, we see that $\beta$ is free of constraints but that $\alpha = \alpha(\beta, N)$, i.e. $\alpha$ is a function of $\beta$ and $N$. In fact,

$$\alpha = \log\Big(\sum_i \mu_i\Big) - \log\Big(\sum_i \exp(x_i'\beta)\Big) = \log N - \log\Big(\sum_i \exp(x_i'\beta)\Big).$$

Our objective is to find an estimate of the variance of the cell mean estimates under the multinomial assumption. The calculation of this variance estimate is complicated somewhat, since the variance estimate of $\hat\alpha$ is different for the two sampling schemes. It is a simple application of the delta method to find the variance of $\hat\mu$ under the Poisson assumption, since $\hat\mu = \exp(\hat\alpha 1_n + X\hat\beta)$. This follows since we have found the information for $(\alpha, \beta')'$, and hence the estimated variance-covariance matrix of $(\hat\alpha, \hat\beta')'$, based on the assumption that the complete data are product Poisson and that the incomplete data are of the form $Y = LX$, with $L$ satisfying the same four properties as above.
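Since, upon convergence of the EM algorithm, we compute the variance-covariance matrix of $(\hat\alpha, \hat\beta')'$ under the product Poisson assumption only, we must find a way to rewrite $\mu$ in terms of $\beta$ and $N$ only. But we have the relationship $\alpha = \log N - \log\big(\sum_i \exp(x_i'\beta)\big)$, so that

$$\mu = \exp(\alpha 1_n + X\beta) = \exp\Big\{1_n\Big(\log N - \log\sum_i \exp(x_i'\beta)\Big) + X\beta\Big\} = \frac{N\exp(X\beta)}{\sum_i \exp(x_i'\beta)} = \frac{N\exp(X\beta)}{1_n'\exp(X\beta)}.$$

Now, since the information for $\beta$ is the same under both sampling schemes, we can find an estimate of the variance of $\hat\mu$, assuming the complete data are multinomially distributed. We will actually find the variance of $\hat\pi$, which is nothing but $\hat\mu/N = \exp(X\hat\beta)/\big(1_n'\exp(X\hat\beta)\big)$, via the delta method.

Delta Method: Since the ML estimate $\hat\beta$ is consistent, a first order approximation to $\hat\pi$ can be found by using a Taylor expansion about the true parameter value, viz.

$$\hat\pi \approx \pi(\beta_0) + \left.\frac{\partial\pi}{\partial\beta'}\right|_{\beta_0}(\hat\beta - \beta_0).$$

Thus, the variance of $\hat\pi$ is approximately

$$\operatorname{var}(\hat\pi) \approx \operatorname{var}\left(\left.\frac{\partial\pi}{\partial\beta'}\right|_{\beta_0}(\hat\beta - \beta_0)\right),$$

where $\operatorname{var}(\hat\beta)$ is that portion of the variance-covariance matrix of $(\hat\alpha, \hat\beta')'$ pertaining to $\hat\beta$. Recall that it was shown above that this portion is the same for both sampling schemes.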
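The delta-method step can be sketched numerically. For $\pi(\beta) = \exp(X\beta)/(1_n'\exp(X\beta))$ the derivative with respect to $\beta'$ has the closed form $[D(\pi) - \pi\pi']X$, and the Python fragment below uses it to turn a covariance matrix for $\hat\beta$ into one for $\hat\pi$. The function names and the numbers in the usage note are hypothetical and serve only as an illustration.

```python
import numpy as np

def cell_probs(X, beta):
    """pi(beta) = exp(X beta) / (1' exp(X beta)); X has no intercept column."""
    w = np.exp(X @ beta)
    return w / w.sum()

def jacobian_pi(X, beta):
    """d pi / d beta' = [D(pi) - pi pi'] X."""
    pi = cell_probs(X, beta)
    return (np.diag(pi) - np.outer(pi, pi)) @ X

def multinomial_var_pi(X, beta, var_beta):
    """First-order (delta method) variance of pi-hat under multinomial sampling."""
    J = jacobian_pi(X, beta)
    return J @ var_beta @ J.T
```

Because the columns of $D(\pi) - \pi\pi'$ sum to zero, any variance matrix produced this way gives variance zero to the total $1_n'\hat\pi$, as it should for probabilities that are constrained to sum to one.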
It is shown in Appendix B that

$$\frac{\partial\pi}{\partial\beta'} = [D(\pi) - \pi\pi']X,$$

where $X$ is the design matrix $Z$ with the first column deleted. Hence, the variance of $\hat\pi$ under the multinomial assumption can be estimated by

$$\widehat{\operatorname{var}}_{\mathrm{Mult}}(\hat\pi) = [D(\hat\pi) - \hat\pi\hat\pi']\,X\,\widehat{\operatorname{var}}(\hat\beta)\,X'\,[D(\hat\pi) - \hat\pi\hat\pi'].$$

Latent Class Model Fitting: An Application

To further illustrate the utility of the above results, we explore the fitting of loglinear latent class models. For an exposition of latent class analysis, see Haberman.

Suppose we can observe (manifest) factors $A_1, A_2, \ldots, A_p$ with $I_1, I_2, \ldots, I_p$ levels, respectively, while a latent factor $W$ with $K$ levels is not observable. Consider the set of cells $C = \{(1, \ldots, 1), \ldots, (I_1, \ldots, I_p)\}$ resulting from a cross-classification on factors $A_1, \ldots, A_p$. Listing the elements of $C$ in lexicographical order, we denote the first cell by 1, the second by 2, and so on to $m$, where $m = \prod_{i=1}^{p} I_i$. With this representation the complete data (the $m \cdot K$ cell counts) are

$$X = (X_{11}, \ldots, X_{1K}, \ldots, X_{m1}, \ldots, X_{mK})^T.$$

The observed data $Y$ are the marginal counts collapsed over the latent factor $W$. Here, $Y = LX = (X_{1+}, \ldots, X_{m+})^T$, where $L = 1_K^T \otimes I_m$.

We initially assume that $X$ is composed of independent Poissons with means following the loglinear model

$$\log(\mu_{ij}) = \alpha + x_{ij}'\beta, \qquad i = 1, \ldots, m, \quad j = 1, \ldots, K.$$
We can use the EM algorithm given above to compute $(\hat\alpha, \hat\beta')'$ and the observed information formula to obtain an estimate of its variance. The adjustment matrix is $Z'\operatorname{var}(X \mid LX = y)Z$, with

$$\operatorname{var}(X \mid LX = y) = \bigoplus_{i=1}^{m} V_i = \begin{pmatrix} V_1 & & 0\\ & \ddots & \\ 0 & & V_m \end{pmatrix},$$

where the $(j, k)$ element of $V_i$ is

$$(V_i)_{jk} = y_i\left[\frac{\mu_{ij}}{\mu_{i+}}\,1(j = k) - \frac{\mu_{ij}\mu_{ik}}{\mu_{i+}^2}\right].$$

Notice that $V_i$ is the covariance of a $K \times 1$ multinomial vector with index $y_i = x_{i+}$ and cell probabilities $\{\mu_{ij}/\mu_{i+},\ j = 1, \ldots, K\}$.

Let $\hat\beta$ denote the final estimate of $\beta$ obtained upon convergence of the EM algorithm. We can derive an explicit estimate of the variance-covariance matrix of $\hat\beta$. It is

$$\Big[Z'D(\hat\mu)Z - Z'\Big(\bigoplus_{i=1}^{m}\hat V_i\Big)Z\Big]^{-1},$$

which is the inverse of the information matrix evaluated at $\hat\beta$.

Numerical Example

We consider the example introduced in an earlier section. The observed data are counts resulting from cross-classifying the respondents with respect to whether they tend toward universalistic or particularistic values in four different situations (A, B, C, D) of role conflict. The data are displayed in the table below.
Table: Observed cross-classification of respondents with respect to whether they tend toward universalistic or particularistic values in four situations (A, B, C, D) of role conflict.

A  B  C  D  Observed frequency    A  B  C  D  Observed frequency

We illustrate the results of the previous sections by fitting a simple loglinear latent class model to the data. The ordinary two-level latent class model fitted by Goodman is equivalent to the loglinear model

$$\log\mu_{ijklt} = \lambda + \lambda_i^A + \lambda_j^B + \lambda_k^C + \lambda_l^D + \lambda_t^W + \lambda_{it}^{AW} + \lambda_{jt}^{BW} + \lambda_{kt}^{CW} + \lambda_{lt}^{DW},$$

where $i$, $j$, $k$, $l$ and $t$ run from 1 to 2. Using the notation defined above, the set of observable cells is $C = \{(1,1,1,1), \ldots, (2,2,2,2)\}$ and $m = 16$. The complete data are $x = (x_{1,1}, x_{1,2}, \ldots, x_{16,1}, x_{16,2})^T$, where each element is the count in one cell of the full cross-classification on A, B, C, D and W. Although we assume that the complete data means satisfy this model, we are only able to observe $y = Lx$, where $L = 1_2' \otimes I_{16}$. Hence, we will fit the model using
the EM algorithm defined above. The FORTRAN program emloglin was used to fit the model. The input information needed is

(1) $\mu^{(0)}$, an initial estimate of the complete data means,
(2) $m$ and $n$, the lengths of the observed and complete data vectors,
(3) $p$, the number of independent loglinear parameters,
(4) $Z$, the design matrix,
(5) $L$, the $m \times n$ matrix that satisfies $Lx = y$.

As discussed earlier, a simple initial estimate of $\mu$, and hence of $\beta$, is one that satisfies $L\mu^{(0)} = y$. But by simply allocating approximately a half of each observed cell count to the two levels of the latent factor, we can find a $\mu^{(0)}$ that satisfies $L\mu^{(0)} = y$. This initial estimate of $\mu$ also allows us to omit the direct input of the observed data, which can be obtained via $L\mu^{(0)} = y$.

The two-level latent class model fit the data well, thereby giving us a simple way of interpreting the association among the four situations of role conflict. The table below displays the model parameter estimates and their estimated standard errors. To make the model identifiable, those parameters not displayed in the table were set to zero. The last column, entitled "Unadj. Std. Error", contains the standard error estimates that would be used if the complete data were actually observed. These are too small and are invalid.
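The starting-value recipe can be sketched in a few lines of Python. The function below splits each observed count across two latent levels, with a share that varies from cell to cell, so that $L\mu^{(0)} = y$ holds by construction. The function name, the `tilt` parameter, the particular pattern of shares, and the counts in the usage note are all hypothetical. The choice to vary the share across cells is an assumption on my part: an exactly even split, or the same uneven split in every cell, makes the latent factor independent of the manifest factors in the starting table, which one would expect to leave EM at a stationary point.

```python
import numpy as np

def initial_means(y, tilt=0.1):
    """Split each observed count between two latent levels so that L mu0 = y."""
    y = np.asarray(y, dtype=float)
    # Share of each count given to latent level 1; it varies smoothly across cells.
    share = 0.5 + tilt * np.linspace(-1.0, 1.0, y.size)
    # Latent index varies fastest: (x_11, x_12, x_21, x_22, ...).
    return np.column_stack([share * y, (1.0 - share) * y]).ravel()

def collapse_matrix(m, K=2):
    """Matrix summing over the latent index when that index varies fastest (NumPy's kron convention)."""
    return np.kron(np.eye(m), np.ones((1, K)))
```

For instance, with four made-up observed counts `y = [12, 7, 9, 20]`, `collapse_matrix(4) @ initial_means(y)` reproduces `y`, so the observed data need not be supplied separately.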
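Table: Parameter and Standard Error Estimates

Parameter    Estimate    Std. Error    Unadj. Std. Error

Parameters listed: $\lambda$, $\lambda^A$, $\lambda^B$, $\lambda^C$, $\lambda^D$, $\lambda^W$, $\lambda^{AW}$, $\lambda^{BW}$, $\lambda^{CW}$, $\lambda^{DW}$.

Estimates of certain classification probabilities and their estimated standard errors were also computed. These probabilities are defined as

$$\pi_t^W = P(W = t), \quad \pi_{1t}^{A|W} = P(A = 1 \mid W = t), \quad \pi_{1t}^{B|W} = P(B = 1 \mid W = t), \quad \pi_{1t}^{C|W} = P(C = 1 \mid W = t), \quad \pi_{1t}^{D|W} = P(D = 1 \mid W = t).$$

The standard errors were found using the multinomial-sampling arguments above and the delta method. For example, the conditional probabilities have form $b_1'\pi / b_2'\pi$, where $b_1$ and $b_2$ are vectors of length $n$ of known constants. Thus, by a direct application of the delta method, an estimate of the asymptotic variance is

$$\widehat{\operatorname{var}}\left(\frac{b_1'\hat\pi}{b_2'\hat\pi}\right) = g'\,\widehat{\operatorname{var}}(\hat\pi)\,g, \qquad g = \frac{(b_2'\hat\pi)\,b_1 - (b_1'\hat\pi)\,b_2}{(b_2'\hat\pi)^2},$$

where $\widehat{\operatorname{var}}(\hat\pi)$ is the variance of $\hat\pi$ under the multinomial assumption. Actually, since the conditional probabilities do not involve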
the intercept parameter, the variance of $\hat\mu$ under the Poisson assumption, which is

$$\widehat{\operatorname{var}}(\hat\mu) = D(\hat\mu)\,Z\,\widehat{\operatorname{var}}(\hat\alpha, \hat\beta)\,Z'\,D(\hat\mu),$$

could be used in this expression and the result would be the same. This is not true of the marginal probabilities, which have form $b_1'\pi$. An estimate of the variance of $b_1'\hat\pi$ is $b_1'\,\widehat{\operatorname{var}}(\hat\pi)\,b_1$, where $\widehat{\operatorname{var}}(\hat\pi)$ is the variance of $\hat\pi$ under the multinomial assumption. The estimate would be inflated if one used the variance under the Poisson assumption, reflecting the stochastic nature of the total sample size. To illustrate, we consider an extreme example. Let $b_1 = 1_n$, so that $b_1'\hat\pi = 1$ with probability one. That is, $b_1'\hat\pi$ is nonstochastic. If we use the multinomial variance estimator, we get zero as our estimate of the variance. This is what we know it to be. On the other hand, using the Poisson variance estimator we get some positive value as our estimate of the variance. This is known to be incorrect.

The estimated probabilities and their estimated standard deviations are displayed in the table below.

Table: Classification Probability Estimates (Standard Errors)

Latent Class t    $\pi_t^W$    $\pi_{1t}^{A|W}$    $\pi_{1t}^{B|W}$    $\pi_{1t}^{C|W}$    $\pi_{1t}^{D|W}$

From these estimated classification probabilities, we see that one level of the latent class $W$ can be labeled the "universalistic" level. That is, subjects
in that level of the latent class tend to have universalistic views for all four situations. Notice that, given a subject is in that level of the latent class, the probability that they respond "universalistic" is estimated to be high for each of the four situations. Similarly, one could label the other level of the latent class as the "particularistic" level. Except for situation A, the estimated probability that an individual in that latent level responds "particularistic" to the situations is also high.

Since the latent class model fits well, we conclude that, given a person is intrinsically particularistic or intrinsically universalistic, their responses to the four situations (A, B, C, D) of role conflict are independent.

Modified EM/Newton-Raphson Algorithm

In this section we present an alternative root-finding algorithm for the incomplete exponential family score functions given earlier. As mentioned above, the EM algorithm has both positive and negative features. Two very important positive features are (1) the EM algorithm is insensitive to starting values and (2) the EM algorithm finds a root that maximizes the likelihood. In contrast, since the incomplete-data log likelihood is not generally a concave function of the parameters, the Newton-Raphson (NR) or Fisher-scoring (FS) algorithms may not converge to a maximal root. In fact, they will be very sensitive to starting values and may not converge at all.

Negative features of the EM algorithm include its slow convergence and the lack of a precision estimate as a by-product. On the other hand, the NR and FS algorithms work well locally, in that if we implement these methods very near
a maximal root, the convergence, relative to EM, is fast and an estimate of precision of the ML estimator is obtained as a by-product.

In practice, the EM algorithm may quickly approach a small neighborhood around a maximal root, but then slowly converge to the root. For this reason, we present an alternative algorithm that uses both EM iterations and NR (or some modified NR such as FS or quasi-NR) iterations. Specifically, the EM algorithm will be used initially and then, upon reaching a neighborhood of the maximal root, the NR-type algorithms will be employed. Meilijson suggested this approach in a fine exposition of root-finding methods for incomplete-data score equations.

Recall that when the complete data has distribution in the regular exponential family, the incomplete-data log likelihood has the form

$$\ell_Y(\beta; y) = c^*(\beta; y) - c(\beta),$$

and that the score function has form

$$S_Y(\beta; y) = E_\beta(T(X) \mid Y = y) - E_\beta(T(X)).$$

To solve for a maximal root of the score equation, we can begin by using the EM iterative scheme described earlier. We will conclude that the iterate estimate is in a sufficiently small neighborhood of the maximal root as soon as

$$\|\beta^{(m+1)} - \beta^{(m)}\| < \mathrm{SWITCH(TOL)},$$

where SWITCH(TOL) $\ge$ TOL, the tolerance of the EM scheme. At this point we will employ the Newton-Raphson-type iterative scheme described earlier.

As a first step in that scheme, we must calculate the matrix $A_Y(\beta^{(m)}; y)$, which is an estimate of the negative Hessian of the incomplete-data log likelihood. At times the Hessian (or expected Hessian) can be explicitly calculated. This is
true in the Poisson loglinear case. Thence, the matrix $A_Y(\beta^{(m)}; y)$ can be explicitly calculated and inverted. Generally, however, the matrix $A_Y$ will only be an approximation. Since both $E_\beta(T(X) \mid Y = y)$ and $E_\beta(T(X))$ must be calculated during the EM algorithm, in view of the score equation we must have the ability to calculate $S_Y(\beta; y)$ at different values of $\beta$. We then could use, as an approximation to $A_Y(\beta^{(m)}; y)$, the matrix with $i$th row

$$A_Y(\beta^{(m)}; y)[i] = -\frac{\left[S_Y(\beta^{(m)} + e^{(i)}; y) - S_Y(\beta^{(m)}; y)\right]'}{\|e^{(i)}\|},$$

where the bracket notation $B[i]$ represents the $i$th row of matrix $B$ and $e^{(i)}$ is a $p \times 1$ vector with a small number $\epsilon$ in the $i$th position and zeros elsewhere. The value of $\epsilon$ should be determined by rules used for numerical differentiation. Meilijson discusses this approximation technique and refers to it as EM-aided differentiation. Evidently, if one uses this approximation, the only functions needed to be calculated are the score functions, which are differences between the conditional and marginal expected values of the sufficient statistic $T(X)$. Finally, upon convergence, we can use $[A_Y(\hat\beta; y)]^{-1}$ as an estimate of the precision of the ML estimates.

If one feels the EM algorithm will converge quickly enough, or that the matrix inversion of $A_Y$ is unnecessarily burdensome, then one can select SWITCH(TOL) = TOL. In which case $A_Y$ will be inverted just once, since the second iterative scheme will converge after one iteration. For SWITCH(TOL) = TOL, the modified algorithm is simply the EM algorithm supplemented by a single calculation of a precision estimate. If
SWITCH(TOL) > TOL, then the EM algorithm can be viewed as a procedure for finding an appropriate starting value for the faster iterative schemes such as NR or FS.

The modified iterative scheme can be described as follows:

(1) Solve for $\beta^{(m+1)}$ in $E_{\beta^{(m+1)}}(T(X)) = E_{\beta^{(m)}}(T(X) \mid Y = y)$.
(2) If $\|\beta^{(m+1)} - \beta^{(m)}\| >$ SWITCH(TOL), then replace $\beta^{(m)}$ by $\beta^{(m+1)}$ and go to (1). Else replace $\beta^{(m)}$ by $\beta^{(m+1)}$ and go to (3).
(3) Calculate $[A_Y(\beta^{(m)}; y)]^{-1}$ and $S_Y(\beta^{(m)}; y)$ as discussed above.
(4) Replace $\beta^{(m+1)}$ by $\beta^{(m)} + [A_Y(\beta^{(m)}; y)]^{-1} S_Y(\beta^{(m)}; y)$.
(5) If $\|\beta^{(m+1)} - \beta^{(m)}\| >$ TOL, then replace $\beta^{(m)}$ by $\beta^{(m+1)}$ and go to (3) (or (1))*. Else stop.

* If the faster, less stable algorithms are having trouble converging, reset SWITCH(TOL) to a smaller value and reuse the EM algorithm to get into a smaller neighborhood of the maximal root.

This algorithm should be stable, insensitive to starting values, and relatively fast, and it will provide an estimate of the precision of the ML estimate as a by-product.

As a special case, let us consider applying the modified algorithm to the Poisson loglinear model considered earlier. In that case we were able to derive an explicit formula for the observed and expected information for the incomplete data. For simplicity, we will use the expected information as our $A_Y$ matrix, i.e.

$$A_Y(\beta; y) = E\big(I_Y(\beta; Y)\big) = Z'D(\mu)L'D^{-1}(L\mu)LD(\mu)Z.$$
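A minimal Python sketch of the modified scheme for the Poisson loglinear case follows, assuming the expected information above is used as $A_Y$ and the incomplete-data score is $Z'[\mu \cdot L'(y \div L\mu - 1_m)]$. The function names, default tolerances, and the fixed number of inner Newton steps in the EM update are illustrative choices, not the author's implementation.

```python
import numpy as np

def score(Z, L, y, beta):
    """Incomplete-data score Z'[mu * L'(y / (L mu) - 1)]."""
    mu = np.exp(Z @ beta)
    return Z.T @ (mu * (L.T @ (y / (L @ mu) - 1.0)))

def expected_information(Z, L, beta):
    """A_Y = Z' D(mu) L' D^{-1}(L mu) L D(mu) Z."""
    mu = np.exp(Z @ beta)
    G = L @ (mu[:, None] * Z)                       # L D(mu) Z
    return G.T @ (G / (L @ mu)[:, None])

def em_step(Z, L, y, beta, inner=25):
    """One EM update: E-step, then Newton-Raphson on the complete-data equations."""
    mu = np.exp(Z @ beta)
    target = Z.T @ (mu - mu * (L.T @ (1.0 - y / (L @ mu))))
    b = beta.copy()
    for _ in range(inner):
        m = np.exp(Z @ b)
        b = b + np.linalg.solve(Z.T @ (m[:, None] * Z), target - Z.T @ m)
    return b

def em_then_scoring(Z, L, y, beta, switch_tol=1e-2, tol=1e-10, max_iter=1000):
    """EM until successive iterates differ by less than switch_tol, then Fisher scoring."""
    for _ in range(max_iter):                       # steps (1)-(2)
        new = em_step(Z, L, y, beta)
        close = np.linalg.norm(new - beta) < switch_tol
        beta = new
        if close:
            break
    for _ in range(max_iter):                       # steps (3)-(5)
        A = expected_information(Z, L, beta)
        new = beta + np.linalg.solve(A, score(Z, L, y, beta))
        done = np.linalg.norm(new - beta) < tol
        beta = new
        if done:
            break
    return beta, np.linalg.inv(expected_information(Z, L, beta))
```

The second return value is the inverse of $A_Y$ at the final iterate, which plays the role of the precision estimate obtained as a by-product. Note that the scoring phase needs $L D(\mu) Z$ to have full column rank, so this sketch only applies when the observed sums identify all $p$ parameters.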
By the expression derived earlier, we can write the score function as

$$S_Y(\beta; y) = Z'\left[\mu \cdot L'\left(y \div L\mu - 1_m\right)\right],$$

where $\cdot$ and $\div$ are componentwise operators.

To start the algorithm, we apply the EM iterative scheme, continuing until $\|\beta^{(m+1)} - \beta^{(m)}\| <$ SWITCH(TOL). At this point, we will go to step (3) of the modified scheme, using the two formulas above for $A_Y$ and $S_Y$. Repeat steps (3)-(5) until the convergence criterion is met.

Discussion

This chapter emphasized loglinear model fitting when the data are incomplete. As an example, a latent class loglinear model was fit to the data presented in Goodman. The primary method of obtaining ML estimates of the loglinear parameters was the EM algorithm, but other possibilities such as the Newton-Raphson algorithm were discussed.

We first reviewed the EM algorithm, with special attention given to the regular exponential family. For the regular exponential case, a simple iterative scheme was shown to be equivalent to the EM algorithm. We then derived the specific form for the EM algorithm when the data are product Poisson with means following a loglinear model. An explicit formula for the observed information matrix was derived in the section on obtaining the observed information matrix. An estimate of the variance of the ML estimates of latent class loglinear parameters was also given.

The assumption that the data are product Poisson is not as restrictive as it may seem. In the section on inferences for multinomial loglinear models, we discuss inference for loglinear parameters
PAGE 178
ZKHQ WKH FRPSOHWH GDWD DUH PXOWLQRPLDOO\ GLVWULEXWHG 7KH UHVXOWV IROORZ E\ DUJXPHQWV RI %LUFK f DQG 3DOPJUHQ f ,W LV VKRZQ WKDW ZKHQ WKH WRWDO VDPSOH VL]H LV FRQVLGHUHG IL[HG LQIHUHQFHV DERXW DOO ORJOLQHDU SDUDPHWHUV H[FHSW WKH RQH WKDW LV IL[HG E\ GHVLJQ DUH WKH VDPH IRU ERWK WKH SURGXFW 3RLVVRQ DVVXPSWLRQ DQG WKH PXOWLQRPLDO DVVXPSWLRQ $ PHWKRG RI HVWLPDWLQJ WKH YDULDQFH RI FODVVLILFDWLRQ SUREDELOLW\ HVWLPDWHV DQG IXQFWLRQV WKHUHRIf LV DOVR GHYHORSHG LQ WKLV VHFWLRQ :H LQWURGXFH DQ DOWHUQDWLYH URRW ILQGLQJ DOJRULWKP f IRU WKH LQFRPSOHWH H[SRQHQWLDO IDPLO\ VFRUH IXQFWLRQV LQ VHFWLRQ 7KH DOJRULWKP H[SORLWV WKH SRVLWLYH IHDWXUHV RI ERWK WKH (0 DQG 1HZWRQ5DSKVRQ W\SH DOJRULWKPV 6SHFLILFDOO\ WKH DOJRULWKP VKRXOG SURYH WR EH LQVHQVLWLYH WR VWDUWLQJ YDOXHV DQG UHODWLYHO\ IDVW FRPSDUHG WR VWUDLJKW (0f ,W DOVR ZLOO SURYLGH DQ HVWLPDWH RI WKH SUHFLVLRQ RI WKH HVWLPDWRUV DV D E\SURGXFW $V PHQWLRQHG DERYH PDQ\ PRGHOV WKDW FDQ EH ILW XVLQJ WKH (0 DOJRULWKP FDQ DOVR EH ILW PRUH GLUHFWO\ XVLQJ WKH 1HZWRQ5DSKVRQ DOJRULWKP $SSHQGL[ % LQFOXGHV D GLVFXVVLRQ DERXW WKH SURJUDP 1/,1 ZKLFK ILWV JHQHUDOL]HG OLQHDUQRQOLQHDU PRGHOV $OVR LQFOXGHG LQ WKH DSSHQGL[ LV WKH FRGH IRU WKH WZR PRGHO ILWWLQJ SURJUDPV fHPORJOLQf DQG f1/,1f 7KH )2575$1 SURJUDP fHPORJOLQf LV EDVHG RQ WKH LWHUDWLYH VFKHPH f DQG WKH IRUPXOD f IRU WKH REVHUYHG LQIRUPDWLRQ PDWUL[ 7KH 6SOXV SURJUDP f1/,1f FDQ EH XVHG WR ILW JHQHUDOL]HG OLQHDU DQG QRQOLQHDU PRGHOV 7KH GDWD DUH UHTXLUHG WR EH LQGHSHQGHQW DQG RI WKH H[SRQHQWLDO GLVSHUVLRQ W\SH VHH GLVFXVVLRQ RI 1/,1f 7KH DXWKRU SODQV RQ LPSOHPHQWLQJ WKH DOJRULWKP GHVFULEHG LQ f IRU WKH 3RLVVRQ ORJOLQHDU PRGHO FDVH
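The combined scheme summarized above (EM steps until successive iterates are within SWITCHTOL, then scoring steps until they are within TOL) can be sketched for the Poisson loglinear case. This is only an illustration under stated assumptions, not the author's program: it assumes log(mu) = Z beta, observed data y = L x with L a 0/1 matrix of disjoint sums, and the expected information as the A_Y matrix. The function names (`m_step`, `score_and_info`, `hybrid_fit`), the least-squares start inside the M-step, and the default tolerances are hypothetical choices.

```python
import numpy as np

def m_step(Z, x_hat, iters=100):
    """Fit log(mu) = Z beta to pseudo-counts x_hat by Newton-Raphson,
    i.e. solve Z' exp(Z beta) = Z' x_hat."""
    # Assumption: a least-squares start on the log scale keeps Newton stable.
    beta = np.linalg.lstsq(Z, np.log(x_hat + 0.5), rcond=None)[0]
    for _ in range(iters):
        mu = np.exp(Z @ beta)
        step = np.linalg.solve(Z.T @ (mu[:, None] * Z), Z.T @ (x_hat - mu))
        beta = beta + step
        if np.linalg.norm(step) < 1e-12:
            break
    return beta

def score_and_info(Z, L, y, beta):
    """Incomplete-data score S_Y and expected information A_Y."""
    mu = np.exp(Z @ beta)
    Lmu = L @ mu
    J = L @ (mu[:, None] * Z)        # L D(mu) Z
    S = J.T @ (y / Lmu - 1.0)        # Z' D(mu) L' (y / L mu - 1)
    A = J.T @ (J / Lmu[:, None])     # Z' D(mu) L' D^{-1}(L mu) L D(mu) Z
    return S, A

def hybrid_fit(Z, L, y, beta0, switchtol=1e-2, tol=1e-10, max_iter=1000):
    beta = np.asarray(beta0, dtype=float)
    # EM phase: iterate until successive estimates differ by < switchtol.
    for _ in range(max_iter):
        mu = np.exp(Z @ beta)
        x_hat = mu * (L.T @ (y / (L @ mu)))   # E-step: E(X | Y = y)
        new = m_step(Z, x_hat)                # M-step
        change = np.linalg.norm(new - beta)
        beta = new
        if change < switchtol:
            break
    # Scoring phase: beta <- beta + A_Y^{-1} S_Y until the change is < tol.
    for _ in range(max_iter):
        S, A = score_and_info(Z, L, y, beta)
        step = np.linalg.solve(A, S)
        beta = beta + step
        if np.linalg.norm(step) < tol:
            break
    _, A = score_and_info(Z, L, y, beta)
    return beta, np.linalg.inv(A)   # estimate and its estimated covariance
```

The inverse of the expected information returned at the end plays the role of the precision estimate that the scheme delivers as a by-product.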
APPENDIX A
CALCULATIONS FOR CHAPTER [...]

We set out to show that the matrix of equation [...] is equal to the matrix

$$\begin{pmatrix} M_1 & 0 \\ 0 & M_2 \end{pmatrix},$$

where $M_1$ and $M_2$ are [...].

Proof: For notational convenience, let $D = D(\pi)$ and let $H = H(\xi)$. We will state a basic matrix algebra result, the proof of which can be found in Aitchison and Silvey. Let $A$ be nonsingular and $B$ be of full column rank. Assuming compatibility,

$$\begin{pmatrix} A & B \\ B' & 0 \end{pmatrix}^{-1} = \begin{pmatrix} A^{-1} - A^{-1}B(B'A^{-1}B)^{-1}B'A^{-1} & A^{-1}B(B'A^{-1}B)^{-1} \\ (B'A^{-1}B)^{-1}B'A^{-1} & -(B'A^{-1}B)^{-1} \end{pmatrix}.$$

That is, the partitioned matrix has a simple inverse.
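A quick numerical check of a partitioned-inverse result of this kind can be reassuring. The sketch below assumes the bordered form [[A, B], [B', 0]] with A positive definite (so that B'A^{-1}B is invertible whenever B has full column rank). The function name `partitioned_inverse` and the random test matrices are illustrative assumptions, not part of the original derivation.

```python
import numpy as np

def partitioned_inverse(A, B):
    """Closed-form inverse of the bordered matrix [[A, B], [B', 0]]."""
    Ai = np.linalg.inv(A)
    S = np.linalg.inv(B.T @ Ai @ B)          # (B' A^{-1} B)^{-1}
    top_left = Ai - Ai @ B @ S @ B.T @ Ai
    top_right = Ai @ B @ S
    bottom_left = S @ B.T @ Ai
    return np.block([[top_left, top_right], [bottom_left, -S]])

# Illustrative check with a random positive definite A and full-rank B.
rng = np.random.default_rng(0)
G = rng.normal(size=(4, 4))
A = G @ G.T + np.eye(4)
B = rng.normal(size=(4, 2))
M = np.block([[A, B], [B.T, np.zeros((2, 2))]])
print(np.allclose(partitioned_inverse(A, B) @ M, np.eye(6)))
```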
Using this result, and identifying [...] with $A$ and $B$, we arrive at an equivalent form for the matrix. It is a product of three partitioned matrices [...]. Now, using the fact that [...], and by Lemma [...], we can multiply out these three partitioned matrices to get

$$\begin{pmatrix} M_1 & 0 \\ 0 & M_2 \end{pmatrix},$$

where $M_1$ and $M_2$ are [...]. This is what we set out to show.

Result: We wish to show that the asymptotic variances are related according to

$$\mathrm{var}(\hat\mu^{(M)}) = \mathrm{var}(\hat\mu^{(P)}) - \bigoplus_i \frac{\mu_i \mu_i'}{n_i}.$$

Proof: Since $\mu = e^{\xi}$, we can invoke the delta method to arrive at

$$\mathrm{var}(\hat\mu^{(M)}) = \mathrm{var}(e^{\hat\xi^{(M)}}) = D(e^{\xi})\,\mathrm{var}(\hat\xi^{(M)})\,D(e^{\xi}) = D(e^{\xi})\left[\mathrm{var}(\hat\xi^{(P)}) - \bigoplus_i \frac{1 1'}{n_i}\right] D(e^{\xi})$$
$$= D(e^{\xi})\,\mathrm{var}(\hat\xi^{(P)})\,D(e^{\xi}) - \bigoplus_i \frac{\mu_i \mu_i'}{n_i} = \mathrm{var}(\hat\mu^{(P)}) - \bigoplus_i \frac{\mu_i \mu_i'}{n_i},$$

where the equal signs represent asymptotic equivalence.

Result: We wish to show that the asymptotic variances of the freedom parameter estimates are related according to

$$\mathrm{var}(\hat\beta^{(M)}) = \mathrm{var}(\hat\beta^{(P)}) - \Lambda,$$

where $\Lambda = (X'X)^{-1}X'C\,[\ldots]\,C'X(X'X)^{-1}$.

Proof: In the following, the equal signs represent asymptotic equivalence. Now, since $\beta = (X'X)^{-1}X'C\log(A\mu)$, we can invoke the delta method to arrive at

$$\mathrm{var}(\hat\beta^{(M)}) = (X'X)^{-1}X'C\,\mathrm{var}(\log(A\hat\mu^{(M)}))\,C'X(X'X)^{-1}$$
$$= (X'X)^{-1}X'C\,D^{-1}(A\mu)A\,\mathrm{var}(\hat\mu^{(M)})\,A'D^{-1}(A\mu)\,C'X(X'X)^{-1}$$
$$= (X'X)^{-1}X'C\,D^{-1}(A\mu)A\left[\mathrm{var}(\hat\mu^{(P)}) - \bigoplus_i \frac{\mu_i \mu_i'}{n_i}\right]A'D^{-1}(A\mu)\,C'X(X'X)^{-1}$$
$$= \mathrm{var}(\hat\beta^{(P)}) - (X'X)^{-1}X'C\,D^{-1}(A\mu)A\left[\bigoplus_i \frac{\mu_i \mu_i'}{n_i}\right]A'D^{-1}(A\mu)\,C'X(X'X)^{-1}.$$

But by assumption (A1), [...]. Hence the asymptotic equivalence $\mathrm{var}(\hat\beta^{(M)}) = \mathrm{var}(\hat\beta^{(P)}) - \Lambda$ holds, where $\Lambda = (X'X)^{-1}X'C\,[\ldots]\,C'X(X'X)^{-1}$, which is what we set out to show.
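APPENDIX B
CALCULATIONS FOR CHAPTER [...]

We prove that the four properties of the EM algorithm introduced earlier do indeed hold. These proofs are essentially those of Dempster et al. and Little and Rubin.

Property 1: If $\beta^{(m)}$ and $\beta^{(m+1)}$ are the $m$th and $(m+1)$st iterate estimates obtained via the EM algorithm, then

$$\ell_Y(\beta^{(m+1)}; y) \ge \ell_Y(\beta^{(m)}; y),$$

i.e. the log likelihood does not decrease from one iteration to the next.

Proof: As before, we write the incomplete data log likelihood as

$$\ell_Y(\beta; y) = Q(\beta \mid \beta^{(m)}; y) - H(\beta \mid \beta^{(m)}; y).$$

Now, by Jensen's inequality, $H(\beta \mid \beta^{(m)}; y) \le H(\beta^{(m)} \mid \beta^{(m)}; y)$ for all $\beta$. This follows since

$$H(\beta \mid \beta^{(m)}; y) - H(\beta^{(m)} \mid \beta^{(m)}; y) = E_{\beta^{(m)}}\left\{\log\frac{f_{X|Y}(X \mid y; \beta)}{f_{X|Y}(X \mid y; \beta^{(m)})} \,\Big|\, Y = y\right\} \le \log\int f_{X|Y}(x \mid y; \beta)\,d\nu = 0, \qquad \text{(B.1)}$$

where the last inequality holds since the 'log' function is concave, whereby Jensen's inequality tells us that $E(\log W) \le \log E(W)$ for a positive random variable $W$. [...]

But this implies that $\partial H(\beta \mid \beta^{(m)}; y)/\partial\beta = 0$ at $\beta = \beta^{(m)}$, since [...]. Therefore $S_Y(\beta^{(m+1)}; y) \to 0$ as $m \to \infty$, since by definition of $\beta^{(m+1)}$ we have $\partial Q(\beta \mid \beta^{(m)}; y)/\partial\beta = 0$ at $\beta = \beta^{(m+1)}$, and because, as $\|\beta^{(m+1)} - \beta^{(m)}\|$ goes to zero, the function $\partial H(\beta \mid \beta^{(m)}; y)/\partial\beta$ evaluated at $\beta^{(m+1)}$ goes to zero. But by convergence properties of the EM algorithm, $\|\beta^{(m+1)} - \beta^{(m)}\| \to 0$ as $m \to \infty$. Thus equation (B.[...]) holds and is tantamount to $S_Y(\beta^{(\infty)}; y) = 0$.

Property: For any $\beta$,

$$I_Y(\beta; y) = E_\beta(I_X(\beta; X) \mid Y = y) - \mathrm{var}_\beta(S_X(\beta; X) \mid Y = y).$$

Proof: [...]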
PAGE 187
But

$$E_\beta\{S_{X|Y}S_{X|Y}' \mid Y = y\} = E_\beta\{[S_X(\beta; X) - S_Y(\beta; y)][S_X(\beta; X) - S_Y(\beta; y)]' \mid Y = y\}$$
$$= E_\beta\{[S_X(\beta; X) - E_\beta(S_X(\beta; X) \mid Y = y)][S_X(\beta; X) - E_\beta(S_X(\beta; X) \mid Y = y)]' \mid Y = y\}$$
$$= E_\beta(S_X S_X' \mid Y = y) - E_\beta(S_X \mid Y = y)E_\beta(S_X' \mid Y = y) = \mathrm{var}_\beta(S_X(\beta; X) \mid Y = y).$$

Hence

$$I_Y(\beta; y) = E_\beta(I_X(\beta; X) \mid Y = y) - \mathrm{var}_\beta(S_X(\beta; X) \mid Y = y),$$

which is what we set out to show.

Theorem: If the complete data vector $X$ has distribution in the regular exponential family, i.e. the density function has form

$$f_X(x; \beta) = a(x)\exp(T'(x)\beta - c(\beta))$$

with respect to some measure, then the EM algorithm can be used to find the MLE of $\beta$ based on incomplete data $Y = Y(X)$, and the algorithm takes the simple expected-sufficient-statistic form stated earlier.

Proof: Sundberg shows that the EM algorithm can be used to find the ML estimates of $\beta$ based on incomplete data. We will show that the general EM algorithm reduces to the simple form when the complete data have distribution in the regular exponential family. The general EM algorithm is defined as

$$Q(\beta^{(m+1)} \mid \beta^{(m)}; y) = \max_\beta Q(\beta \mid \beta^{(m)}; y), \quad \text{where} \quad Q(\beta \mid \beta^{(m)}; y) = E_{\beta^{(m)}}(\ell_X(\beta; X) \mid Y = y).$$

Now, since $X$ has a density of the exponential family form above, it follows that the log likelihood $\ell_X(\beta; X)$ has form

$$\ell_X(\beta; X) = \log a(X) + T'(X)\beta - c(\beta).$$

Hence

$$Q(\beta \mid \beta^{(m)}; y) = E_{\beta^{(m)}}(\log a(X) \mid Y = y) + \beta' E_{\beta^{(m)}}(T(X) \mid Y = y) - c(\beta).$$

Now, since

$$\frac{\partial^2 Q}{\partial\beta\,\partial\beta'} = -\frac{\partial^2 c(\beta)}{\partial\beta\,\partial\beta'} = -\mathrm{var}_\beta(T(X))$$

is negative definite, it follows that the solution, say $\beta^{(m+1)}$, to $\partial Q(\beta \mid \beta^{(m)}; y)/\partial\beta = 0$ is the value of $\beta$ that maximizes the function $Q(\beta \mid \beta^{(m)}; y)$. But

$$\frac{\partial Q(\beta \mid \beta^{(m)}; y)}{\partial\beta} = E_{\beta^{(m)}}(T(X) \mid Y = y) - \frac{\partial c(\beta)}{\partial\beta} = E_{\beta^{(m)}}(T(X) \mid Y = y) - E_\beta(T(X)).$$

Hence $\beta^{(m+1)}$ satisfies

$$E_{\beta^{(m+1)}}(T(X)) = E_{\beta^{(m)}}\{T(X) \mid Y = y\},$$

which is tantamount to showing the equivalence of the two iterative schemes.

We differentiate the score vector to obtain an explicit expression for the observed information matrix. Recall that we are to show that the information matrix can be expressed as

$$I_Y(\beta; y) = Z'D(\mu)L'D\!\left(y / (L\mu)^2\right)LD(\mu)Z - Z'D\!\left(L'(y / L\mu - 1)\right)D(\mu)Z,$$

where the quotient and the square are taken componentwise.

Proof: We know that the score vector for $Y$ is $S_Y(\beta; y) = Z'D(\mu)L'(y / L\mu - 1)$. [...] We set out to find the derivative of the score vector with respect to $\beta$. It is

$$\frac{\partial S_Y(\beta; y)}{\partial\beta'} = \frac{\partial}{\partial\beta'}\left[Z'D(\mu)L'(y / L\mu - 1)\right] = Z'D\!\left(L'(y / L\mu - 1)\right)D(\mu)Z - Z'D(\mu)L'D\!\left(y / (L\mu)^2\right)LD(\mu)Z.$$

Therefore

$$I_Y(\beta; y) = -\frac{\partial S_Y(\beta; y)}{\partial\beta'} = Z'D(\mu)L'D\!\left(y / (L\mu)^2\right)LD(\mu)Z - Z'D\!\left(L'(y / L\mu - 1)\right)D(\mu)Z,$$

which is what we set out to show.

Using the delta method, we can find the asymptotic variance of $\hat\pi$. The expression for the asymptotic variance involves the matrix $\partial\pi/\partial\beta'$. We show that

$$\frac{\partial\pi}{\partial\beta'} = [D(\pi) - \pi\pi']X.$$

Proof: We have that

$$\pi = \exp(X\beta)\left(1_N'\exp(X\beta)\right)^{-1}.$$

Here $\beta$ is an unconstrained parameter vector of length $p$. Notice that [...], and hence

$$\frac{\partial\pi}{\partial\beta'} = \frac{D(\exp(X\beta))X}{1_N'\exp(X\beta)} - \frac{\exp(X\beta)\exp(X\beta)'X}{(1_N'\exp(X\beta))^2} = [D(\pi) - \pi\pi']X.$$

DESCRIPTION OF COMPUTER PROGRAMS

emloglin

Briefly, emloglin is a FORTRAN program that can be used to obtain ML estimates of loglinear parameters, as well as an estimate of their precision, when only disjoint sums of the complete Poisson data are observable. The EM algorithm is used to find the ML estimates, and the explicit expression for the observed information is used to calculate the precision estimate. It is assumed that the complete data $X$ are distributed product Poisson with $n \times 1$ mean vector $\mu$ following the loglinear model $\log\mu = Z\beta$. The incomplete data must be expressible as $Y = LX$, where $L$ is an $m \times n$ matrix that satisfies the properties required of $L$ in the chapter.

The user must input the following information:
(1) $\mu^{(0)}$, an initial estimate of the complete data means that satisfies the equation $L\mu^{(0)} = y$, i.e. $\mu^{(0)}$ is consistent with the observed data $y$;
(2) $m$ and $n$, the lengths of the observed and complete data vectors;
(3) $p$, the number of loglinear parameters;
(4) $Z$, the $n \times p$ full column rank design matrix;
(5) $L$, the $m \times n$ matrix that satisfies $Y = LX$.

The output includes:
(1) $\hat\beta$, an ML estimate of the loglinear parameter vector;
(2) $\mathrm{var}(\hat\beta)$, an estimate of the precision of the ML estimate $\hat\beta$;
(3) $G^2$, the likelihood ratio goodness-of-fit statistic;
(4) df, the degrees of freedom associated with the null asymptotic chi-square distribution of $G^2$;
(5) $\hat\mu$, an estimate of the complete data cell means;
(6) $\mathrm{var}(\hat\mu)$, an estimate of the precision of $\hat\mu$ (Poisson sampling).

NLIN

NLIN is an S-plus (Becker et al.) program that fits generalized linear and nonlinear models to data with distributions in the exponential dispersion family (Jørgensen). We now briefly describe exponential dispersion models and how to fit them.
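Returning to the emloglin inputs listed above, one requirement is an initial mean vector that reproduces the observed sums. The sketch below shows one hypothetical way to build such a vector. It assumes that "disjoint sums" means L has 0/1 entries with exactly one 1 in every column and at least one 1 in every row; the function names `is_disjoint_sum_matrix` and `initial_means` are illustrative and are not taken from the FORTRAN program.

```python
import numpy as np

def is_disjoint_sum_matrix(L):
    """True if L is 0/1, every column has one 1, and no row is empty."""
    L = np.asarray(L)
    zero_one = np.all((L == 0) | (L == 1))
    one_per_column = np.all(L.sum(axis=0) == 1)
    no_empty_row = np.all(L.sum(axis=1) >= 1)
    return bool(zero_one and one_per_column and no_empty_row)

def initial_means(L, y):
    """Spread each observed count evenly over the cells it sums,
    so that L @ mu0 equals y exactly."""
    L = np.asarray(L, dtype=float)
    y = np.asarray(y, dtype=float)
    cells_per_sum = L.sum(axis=1)
    return L.T @ (y / cells_per_sum)

# Illustrative use: five complete cells collapsed into two observed sums.
L = np.array([[1, 1, 1, 0, 0],
              [0, 0, 0, 1, 1]])
y = np.array([9.0, 4.0])
mu0 = initial_means(L, y)
print(is_disjoint_sum_matrix(L), mu0, L @ mu0)
```

Any other positive vector satisfying the same constraint would serve equally well as a start; the even split is only the simplest choice.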
A General Algorithm For Fitting Generalized Linear/Nonlinear Models

Let [...]
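specifies a generalized linear model (GLM) (McCullagh and Nelder). In GLM parlance, the function $g$ is known as the 'link' function. Examples include:
(1) Poisson Loglinear Model: $\{\mathrm{Poisson}(\mu_i);\ \eta_i = \log(\mu_i) = x_i'\beta\}$
(2) Binomial Logistic Model: $\{\mathrm{Binomial}(n_i, \pi_i);\ \eta_i = \log(\pi_i/(1 - \pi_i)) = x_i'\beta\}$
(3) Normal Linear Model: $\{\mathrm{Normal}(\mu_i, \sigma^2);\ \eta_i = \mu_i = x_i'\beta\}$

Maximizing the Likelihood

Our objective is to make inference about the parameters in $\beta$ and hence about the means $\mu$. We will base our inference on the maximum likelihood estimates and their precision. Therefore, we must maximize the log likelihood with respect to $\beta$. The log likelihood for the sample $Y$ is

$$\ell(\beta; y) = \sum_i \log a(y_i; \sigma^2, w_i) + \frac{1}{\sigma^2}\sum_i w_i\,(y_i\theta_i - \kappa(\theta_i)),$$

where $\kappa'(\theta_i) = \mu_i = h_i(x_i; \beta)$.

The score function is

$$s(\beta; y) = \frac{\partial\ell(\beta; y)}{\partial\beta} = \frac{1}{\sigma^2}\sum_i w_i\,(y_i - \mu_i)\frac{\partial\theta_i}{\partial\beta} = \frac{1}{\sigma^2}D'WV^{-1}S,$$

where

$$D = \frac{\partial\mu}{\partial\beta'}, \quad W = \bigoplus_i w_i, \quad V = \bigoplus_i V(\mu_i), \quad S = y - \mu.$$

Here the matrix $D$ is referred to as the 'model matrix'. The maximum likelihood estimate may be found by solving for a zero of the score function (at least in many cases). To solve for this zero, we will use a Newton-Raphson type algorithm, which will require calculation of the Hessian matrix

$$\frac{\partial^2\ell(\beta; y)}{\partial\beta\,\partial\beta'} = \frac{1}{\sigma^2}\left(-D'V^{-1}WD + Z\right),$$

where $E(Z) = 0$, so that the expected value of the Hessian is

$$E\left(\frac{\partial^2\ell(\beta; y)}{\partial\beta\,\partial\beta'}\right) = -\frac{1}{\sigma^2}D'V^{-1}WD.$$

Therefore, for $\beta^{(k)}$ in a neighborhood of $\hat\beta$, the solution to the score equation, we have the following linear approximation:

$$\frac{\partial\ell(\beta; y)}{\partial\beta} \approx \frac{\partial\ell(\beta^{(k)}; y)}{\partial\beta} + \frac{\partial^2\ell(\beta^{(k)}; y)}{\partial\beta\,\partial\beta'}(\beta - \beta^{(k)}) \approx \frac{1}{\sigma^2}D'WV^{-1}S - \frac{1}{\sigma^2}D'WV^{-1}D(\beta - \beta^{(k)}) \equiv L(\beta; \beta^{(k)}),$$

with $D$, $V$ and $S$ evaluated at $\beta^{(k)}$. The next estimate of $\beta$ will be $\beta^{(k+1)}$, the solution to the linear equation $L(\beta; \beta^{(k)}) = 0$. The solution is

$$\beta^{(k+1)} = \beta^{(k)} + (D'WV^{-1}D)^{-1}D'WV^{-1}S = (D'WV^{-1}D)^{-1}D'WV^{-1}(D\beta^{(k)} + S) = (D'WV^{-1}D)^{-1}D'WV^{-1}\tilde{y},$$

where $\tilde{y} = D\beta^{(k)} + S$ is a 'local' dependent variable. This iterative algorithm, which is the Fisher-scoring algorithm, is also referred to as the iteratively reweighted least squares algorithm (IRLS). The reason for this label is evident from the last expression: for each $k$ it resembles a weighted least squares estimate, where the weight matrix is $WV^{-1}$, the model matrix is $D$, and the dependent variable is $\tilde{y}$.

Denoting the ML estimate by $\hat\beta$, we have that in many situations

$$\hat\beta \sim AN\left(\beta,\ \sigma^2(D'WV^{-1}D)^{-1}\right),$$

i.e. $\hat\beta$ has an asymptotic normal distribution. Also, we let $\hat\sigma^2$ denote a consistent estimator of the dispersion parameter $\sigma^2$. For example, dividing the deviance statistic by the degrees of freedom associated with its asymptotic distribution results in a consistent estimator of $\sigma^2$ (Jørgensen).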
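The Fisher-scoring / IRLS update just described can be sketched for a GLM whose model matrix is available in closed form. This is a minimal illustration, not the NLIN program: it assumes a linear predictor eta = X beta with an inverse link supplied by the caller, and the names `irls`, `inv_link`, `dmu_deta` and `var_fn` are hypothetical. The returned covariance is unscaled and would be multiplied by an estimate of the dispersion parameter.

```python
import numpy as np

def irls(X, y, w, inv_link, dmu_deta, var_fn, beta0, tol=1e-10, max_iter=100):
    """Iterate beta <- (D'WV^{-1}D)^{-1} D'WV^{-1} (D beta + S)."""
    beta = np.asarray(beta0, dtype=float)

    def pieces(b):
        eta = X @ b
        mu = inv_link(eta)
        D = dmu_deta(eta)[:, None] * X   # model matrix D = d mu / d beta'
        wv = w / var_fn(mu)              # diagonal of W V^{-1}
        return mu, D, wv

    for _ in range(max_iter):
        mu, D, wv = pieces(beta)
        y_local = D @ beta + (y - mu)    # 'local' dependent variable
        new = np.linalg.solve(D.T @ (wv[:, None] * D), D.T @ (wv * y_local))
        change = np.linalg.norm(new - beta)
        beta = new
        if change < tol:
            break
    _, D, wv = pieces(beta)
    cov_unscaled = np.linalg.inv(D.T @ (wv[:, None] * D))
    return beta, cov_unscaled

# Illustrative Poisson loglinear fit with a two-level factor.
X = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
y = np.array([2.0, 4.0, 6.0, 10.0])
beta_hat, cov = irls(X, y, np.ones(4), np.exp, np.exp, lambda mu: mu,
                     beta0=[np.log(y.mean()), 0.0])
print(beta_hat, np.sqrt(np.diag(cov)))
```

For the Poisson case the dispersion parameter is 1, so the unscaled matrix is already the estimated covariance.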
By evaluating $D$ and $V$ at $\hat\beta$ and using the consistent estimator $\hat\sigma^2$, we can consistently estimate the asymptotic variance of $\hat\beta$ by

$$\widehat{\mathrm{var}}(\hat\beta) = \hat\sigma^2(\hat{D}'W\hat{V}^{-1}\hat{D})^{-1}.$$

The astute reader will notice that, upon specification of the exponential dispersion distribution, the matrix $V$ is determined. Also, the matrix $W$ is a matrix of known constants. Hence the only matrix not determined as yet is the so-called 'model matrix' $D$. The matrix $D$ is a function of $\beta$ and $X$ through the function $\mu_i = h_i(x_i; \beta)$. When the model is a GLM, we have that the model matrix

$$D = \frac{\partial\mu}{\partial\beta'} = \frac{\partial\mu}{\partial\eta'}\frac{\partial\eta}{\partial\beta'} = \left(\frac{\partial\eta}{\partial\mu'}\right)^{-1}X$$

can be calculated explicitly. But more generally, when the model is $\{ED; h\}$, $D$ cannot be calculated explicitly, or at least is very difficult to calculate explicitly. However, it can be numerically estimated.

Numerical Approximation to D

We use a popular and simple technique to numerically approximate $D$. Recall that $D$ is the matrix of partial derivatives of $\mu$ with respect to $\beta$. Hence the problem is to approximate a derivative matrix. One such estimate, and the one used in the program NLIN, is

$$D \approx D_N = \tfrac{1}{2}\left[\mu(\beta + e_1) - \mu(\beta - e_1),\ \ldots,\ \mu(\beta + e_p) - \mu(\beta - e_p)\right]E^{-1},$$

where $e_i = (0, \ldots, \epsilon, \ldots, 0)'$ is a $p \times 1$ vector with the small constant $\epsilon$ in the $i$th position, and the matrix $E$ is a $p \times p$ diagonal matrix with $\epsilon$ on the diagonal. Thus $E = \epsilon I_p = [e_1, \ldots, e_p]$.
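A difference-quotient approximation to the model matrix of this kind takes only a few lines. The sketch below uses central differences, column by column. The function name `numerical_model_matrix` and the default step `eps` are assumptions made for illustration and are not taken from the S-plus code.

```python
import numpy as np

def numerical_model_matrix(mean_fn, beta, eps=1e-6):
    """Approximate D = d mu / d beta' by central differences.

    mean_fn maps a parameter vector to the vector of means mu(beta);
    column j is [mu(beta + e_j) - mu(beta - e_j)] / (2 * eps).
    """
    beta = np.asarray(beta, dtype=float)
    columns = []
    for j in range(beta.size):
        e = np.zeros(beta.size)
        e[j] = eps                       # small constant in position j
        columns.append((mean_fn(beta + e) - mean_fn(beta - e)) / (2.0 * eps))
    return np.column_stack(columns)

# Illustrative comparison with the analytic model matrix of a loglinear model.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
beta = np.array([0.3, -0.2])
D_numeric = numerical_model_matrix(lambda b: np.exp(X @ b), beta)
D_exact = np.exp(X @ beta)[:, None] * X
print(np.max(np.abs(D_numeric - D_exact)))
```

For a mean function that is linear in the parameters, the difference quotient reproduces the design matrix up to rounding error.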
Now the IRLS algorithm will involve just one additional step, and that is to calculate a numerical approximation to the model matrix $D$. The actual algorithm used in NLIN is:

(1) Input $y$, $w$, $\mu = h(X; \beta)$, $V(\mu)$, and the deviance function $\mathrm{Dev}(y, w; \mu)$.
(2) Find an initial estimate $\beta^{(0)}$ of $\beta$.
(3) Compute $D_N(\beta^{(m)})$, $V(\mu^{(m)})$, $V^{-1}(\mu^{(m)})$, and $\tilde{y}(\beta^{(m)})$.
(4) Compute $\beta^{(m+1)}$.
(5) Compute $\mathrm{Dev}(y, w; \mu(\beta^{(m+1)}))$.
(6) If $|\mathrm{Dev}(y, w; \mu^{(m+1)}) - \mathrm{Dev}(y, w; \mu^{(m)})| >$ TOL, replace $\beta^{(m)}$ by $\beta^{(m+1)}$ and go to (3). Else stop.

Notice that step (1) involves inputting the data, the weights, the mean function, the variance function, and the corresponding deviance function. It follows that this program can, more generally, be used to fit models via quasi-likelihood methods (McCullagh and Nelder).

Another remark is worth mentioning. When the model is $\{ED;\ g(\mu_i) = \mu_i = x_i'\beta\}$, i.e. a linear (identity link) model, the numerical approximation $D_N$ of $D$, which equals $X$, is exact. Specifically, for the Normal Linear Model, the approximation is exactly equal to the model matrix $X$. The argument is as follows:

$$(D_N)_{ij} = \frac{\mu_i(\beta + e_j) - \mu_i(\beta - e_j)}{2\epsilon}.$$

Thus

$$(D_N)_{ij} = \frac{x_i'(\beta + e_j) - x_i'(\beta - e_j)}{2\epsilon} = \frac{x_i'\beta + x_i'e_j - x_i'\beta + x_i'e_j}{2\epsilon} = \frac{x_{ij}\epsilon + x_{ij}\epsilon}{2\epsilon} = x_{ij} = (X)_{ij} = (D)_{ij}.$$
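The six steps listed above can be put together in a short sketch: IRLS with a numerically approximated model matrix and a stopping rule based on the change in deviance. This is an illustration under stated assumptions rather than the NLIN code itself; the function names, the step size `eps`, the tolerance, and the exponential-decay example are all hypothetical.

```python
import numpy as np

def central_diff_jacobian(mean_fn, beta, eps=1e-6):
    """Numerical model matrix D_N, one central difference per column."""
    cols = []
    for j in range(beta.size):
        e = np.zeros(beta.size)
        e[j] = eps
        cols.append((mean_fn(beta + e) - mean_fn(beta - e)) / (2.0 * eps))
    return np.column_stack(cols)

def nlin_fit(y, w, mean_fn, var_fn, dev_fn, beta0, tol=1e-12, max_iter=100):
    """IRLS for a possibly nonlinear mean function mu = mean_fn(beta)."""
    beta = np.asarray(beta0, dtype=float)           # step (2)
    dev_old = dev_fn(y, w, mean_fn(beta))
    for _ in range(max_iter):
        mu = mean_fn(beta)
        D = central_diff_jacobian(mean_fn, beta)    # step (3)
        wv = w / var_fn(mu)                         # diagonal of W V^{-1}
        y_local = D @ beta + (y - mu)
        beta = np.linalg.solve(D.T @ (wv[:, None] * D),
                               D.T @ (wv * y_local))  # step (4)
        dev_new = dev_fn(y, w, mean_fn(beta))       # step (5)
        if abs(dev_new - dev_old) <= tol:           # step (6)
            break
        dev_old = dev_new
    return beta, dev_new

# Illustrative nonlinear normal model: mu_i = b0 * exp(-b1 * x_i).
x = np.arange(5.0)
y = 2.0 * np.exp(-0.5 * x)                          # noise-free data
beta_hat, dev = nlin_fit(
    y, np.ones_like(y),
    mean_fn=lambda b: b[0] * np.exp(-b[1] * x),
    var_fn=lambda mu: np.ones_like(mu),
    dev_fn=lambda y, w, mu: np.sum(w * (y - mu) ** 2),
    beta0=[1.5, 0.3])
print(beta_hat, dev)
```

Because the user supplies only the mean, variance and deviance functions, the same loop serves for quasi-likelihood fits, which is the point made in the text.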
BIBLIOGRAPHY

Agresti, A. Analysis of Ordinal Categorical Data. New York.

[...]

Cochran, W.G. The Comparison of Percentages in Matched Samples. Biometrika.

Cohen, J. A Coefficient of Agreement for Nominal Scales. Educ. Psychol. Meas.

Conaway, M.R. Conditional Likelihood Methods for Repeated Categorical Responses. J. Amer. Statist. Assoc.

Conaway, M.R. A Random Effects Model for Binary Data. Biometrics.

Cox, D.R. The Analysis of Multivariate Binary Data. Applied Statistics.

Dale, J.R. Global Cross-Ratio Models for Bivariate, Discrete, Ordered Responses. Biometrics.

Darroch, J.N. The Mantel-Haenszel Test and Tests of Marginal Symmetry; Fixed Effects and Mixed Models for a Categorical Response. International Statistical Review.

Das Gupta, S. and Perlman, M.D. Power of the Noncentral F-test: Effect of Additional Variates on Hotelling's T²-test. Journal of the American Statistical Association.

Davis, C.S. Semi-Parametric and Non-Parametric Methods for the Analysis of Repeated Measurements with Applications to Clinical Trials. Unpublished Manuscript.

Dempster, A.P., Laird, N.M. and Rubin, D.B. Maximum Likelihood Estimation From Incomplete Data Via the EM Algorithm. J. R. Statist. Soc. B.

Efron, B. and Hinkley, D.V. Assessing the Accuracy of the Maximum Likelihood Estimator: Observed Versus Expected Fisher Information. Biometrika.

Ezzet, F. and Whitehead, J. A Random Effects Model for Ordinal Responses From a Crossover Trial. Statistics in Medicine.

Foster, M.H. and Martin, M.L. Probability, Confirmation and Simplicity: Readings in the Philosophy of Inductive Logic. New York.

[...]

Goodman, L.A. Association Models and the Bivariate Normal for Contingency Tables with Ordered Categories. Biometrika.

Gourieroux, C., Monfort, A. and Trognon, A. Pseudo Maximum Likelihood Methods: Theory. Econometrica.

Grizzle, J.E., Starmer, C.F. and Koch, G.G. Analysis of Categorical Data by Linear Models. Biometrics.

Haber, M. (a). Log-Linear Models For Correlated Marginal Totals of a Contingency Table. Communications in Statistics, Theory and Methods.

Haber, M. (b). Maximum Likelihood Methods for Linear and Log-Linear Models in Categorical Data. Comp. Stat. & Data Anal.

Haber, M. and Brown, M. Maximum Likelihood Methods for Log-Linear Models When Expected Frequencies are Subject to Linear Constraints. Journal of the American Statistical Association.

Haberman, S.J. Analysis of Qualitative Data, Vols. 1 & 2. New York.

[...]

Laird, N.M., Lange, N. and Stram, D. Maximum Likelihood Computations with Repeated Measures: Application of the EM Algorithm. Journal of the American Statistical Association.

Landis, J.R. and Koch, G.G. A Review of Statistical Methods in the Analysis of Data Arising from Observer Reliability Studies (Parts I, II). Statist. Neerlandica.

Landis, J.R. and Koch, G.G. The Analysis of Categorical Data in Longitudinal Studies of Behavioral Development, in Longitudinal Methodology in the Study of Behavior and Development, eds. J.R. Nesselroade and P.B. Baltes. New York.

[...]

MacRae, E.C. Matrix Derivatives with an Application to an Adaptive Linear Decision Problem. Annals of Statistics.

Madansky, A. Tests of Homogeneity for Correlated Samples. J. Amer. Statist. Assoc.

Magnus, J.R. and Neudecker, H. Matrix Differential Calculus with Applications in Statistics and Econometrics. New York.

[...]

Rao, C.R. Linear Statistical Inference and Its Applications, 2nd edn. New York.

[...]

White, A.A., Landis, J.R. and Cooper, M.M. A Note on the Equivalence of Several Marginal Homogeneity Test Criteria for Categorical Data. Internat. Statist. Rev.

White, H. A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity. Econometrica.

White, H. Consequences and Detection of Misspecified Nonlinear Regression Models. J. Amer. Statist. Assoc.

White, H. Maximum Likelihood Estimation of Misspecified Models. Econometrica.

Zhao, L.P. and Prentice, R.L. Correlated Binary Regression Using a Quadratic Exponential Model. Biometrika.

Zeger, S.L. The Analysis of Discrete Longitudinal Data: Commentary. Statistics in Medicine.

Zeger, S.L. and Liang, K.Y. Longitudinal Data Analysis for Discrete and Continuous Outcomes. Biometrics.

Zeger, S.L., Liang, K.Y. and Albert, P.S. Models For Longitudinal Data: A Generalized Estimating Equation Approach. Biometrics.
BIOGRAPHICAL SKETCH

Joseph Benedict Lang was born in St. Cloud, Minnesota, in February 1963. His parents, Ralph and Mary Jean Lang, later moved the family to Richmond, a small resort town in central Minnesota. He remained in the central Minnesota area for a number of years. His parents, sisters, and brother remain there to this day.

He then decided to pursue a college degree. His 'career' as bartender and cook looked to be nearing an end when he began his postsecondary education at St. Cloud State University. After a brief period of entertaining the idea of majoring in art, Joseph grew very fond of mathematics and statistics and decided to focus his attention on these more quantitative disciplines.

After receiving his Bachelor of Arts degree in mathematics from St. Cloud State University, Joseph was encouraged to pursue his Master's and Ph.D. degrees in statistics at the University of Florida in Gainesville. He went on to receive a Master of Statistics degree and, under the direction of Professor Alan Agresti, was awarded a Ph.D. degree in statistics in the spring of 1992. While working toward these degrees, he worked as a teaching assistant, biostatistics consultant, and research assistant.

Joseph accepted an academic position as assistant professor in the Department of Statistics and Actuarial Science at the University of Iowa.
I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Alan Agresti, Chair
Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Jane Pendergast
Associate Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Rocco Ballerini
Associate Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Carole Kimberlin
Associate Professor of Pharmacy Health Care Administration

This dissertation was submitted to the Graduate Faculty of the Department of Statistics in the College of Liberal Arts and Sciences and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

May 1992
Dean, Graduate School

