Full Citation 
Material Information 

Title: 
Generalized economic models and methods for markets in disequilibrium 

Physical Description: 
Book 

Language: 
English 

Creator: 
Mayer, Walter James, 1955 

Copyright Date: 
1986 
Record Information 

Bibliographic ID: 
UF00102774 

Volume ID: 
VID00001 

Source Institution: 
University of Florida 

Holding Location: 
University of Florida 

Rights Management: 
All rights reserved by the source institution and holding location. 

Resource Identifier: 
ltuf: AEN9870; oclc: 16167663 

Full Text 
GENERALIZED ECONOMETRIC MODELS AND METHODS FOR MARKETS IN
DISEQUILIBRIUM
BY
WALTER JAMES MAYER
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL
FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1986
ACKNOWLEDGEMENTS
I would like to thank my advisor Dr. S.R. Cosslett, for his support
throughout this project. Thanks are also extended to Dr. G.S. Maddala, Dr.
D.A. Denslow, and Dr. A.I. Khuri. Invaluable assistance provided by DeLayne
Redding in the typing of this document is much appreciated. This dissertation
is dedicated to my parents.
Abstract of Dissertation Presented to the
Graduate School of the University of Florida
in Partial Fulfillment of the Requirements
for the Degree of Doctor of Philosophy
GENERALIZED ECONOMETRIC MODELS AND METHODS FOR
MARKETS IN DISEQUILIBRIUM
By
WALTER JAMES MAYER
December 1986
Chairman: Dr. S. R. Cosslett
Cochairman: Dr. G. S. Maddala
Major Department: Economics
Empirical studies of markets in disequilibrium have relied on the
appropriateness of explicit price adjustment equations, serial
independence, normally distributed errors, and explicit equations
relating the observed quantity transacted to desired supply and demand.
For example, the asymptotic properties of "disequilibrium" estimators
and test statistics are sensitive to the parametric forms chosen for
price adjustment, the serial behavior of the observations, error
distributions, and the quantity transacted. In a word, "disequilibrium"
estimators and statistics are nonrobust. Unfortunately, economic
theory provides little basis for choosing the parametric forms. A lack
of economic-theoretic restrictions coupled with nonrobust estimators
and statistics has severely limited empirical studies of markets in
disequilibrium.
This dissertation develops new methods for more meaningful
estimation of disequilibrium models. The new methods involve more
general models and robust estimators.
A switching regression model with imperfect sample separation is
used to incorporate price adjustment into a disequilibrium model. The
model enables price adjustment to be incorporated with less a priori
information than usual. To estimate the model, maximum likelihood and
least squares estimators are proposed.
The asymptotic properties of the maximum likelihood estimator are
examined. Previous results for maximum likelihood estimators of
disequilibrium models are generalized with asymptotic theory for
serially dependent observations. The maximum likelihood estimator is
shown to be consistent and asymptotically normal even if the data are
characterized by unknown forms of serial dependence. Asymptotic test
statistics are also derived.
The methodology is illustrated with an empirical application to the
U.S. commercial loan market from 1979 to 1984.
Finally, I propose semiparametric models and estimators for markets
in disequilibrium. These methods are applicable when the error
distributions are unknown, and the quantity transacted is an unknown
function of supply and demand. Consistent estimators are derived using
the method of maximum score.
TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT

CHAPTERS

1 AN OVERVIEW
  1.1 The Problem
  1.2 Solutions
  NOTES

2 A GENERAL DISEQUILIBRIUM MODEL AND ESTIMATORS FOR LIMITED
  A PRIORI PRICE ADJUSTMENT INFORMATION
  2.1 Introduction
  2.2 The Model and Maximum Likelihood Estimation
  2.3 A Consistent Initial Least Squares Estimator
  2.4 Summary and Conclusions
  NOTES

3 SOME ASYMPTOTIC THEORY FOR SERIALLY DEPENDENT OBSERVATIONS
  3.1 Introduction
  3.2 Consistency
  3.3 Asymptotic Normality
  3.4 Consistent Covariance Estimation
  3.5 An Asymptotic Test for Serial Correlation
  3.6 Summary and Conclusions
  NOTES

4 AN EMPIRICAL EXAMPLE: THE U.S. COMMERCIAL LOAN MARKET
  4.1 Introduction
  4.2 The Empirical Model
  4.3 Hypothesis Testing Procedures
  4.4 The Results

5 SEMIPARAMETRIC ESTIMATION OF DISEQUILIBRIUM MODELS USING
  THE METHOD OF MAXIMUM SCORE
  5.1 Introduction
  5.2 A Directional Model and Consistent Estimation Up to Scale
  5.3 A Price Adjustment Model Identified Without a Loss of Scale
  5.4 Maximum Score Estimation of Models that Include the
      Quantity Transacted
  NOTES

6 CONCLUDING REMARKS AND DIRECTIONS FOR FURTHER RESEARCH

APPENDIX
  A.1 Inconsistency and Misclassified Observations
  A.2 The Computational Tractability and Asymptotic Properties
      of the Least Squares Estimator of Section 2.3
  A.3 Proofs of Theorems 3.2-3.7
  A.4 Quadratic Hill-Climbing and the Asymptotic Distribution
      of the (p+1)th-Round Estimates

BIBLIOGRAPHY
BIOGRAPHICAL SKETCH
CHAPTER 1
AN OVERVIEW
1.1 The Problem
Before Fair and Jaffee (1972) introduced their econometric
disequilibrium model, estimation of market behavior was confined to the
equilibrium assumption. The study of the econometrics of disequilibrium
was further developed by Fair and Kelejian (1974), Maddala and Nelson
(1974), Amemiya (1974), Goldfeld and Quandt (1975), Laffont and Garcia
(1977), Bowden (1978), and others. By allowing for disequilibrium,
Fair and Jaffee's work and the subsequent work it inspired represent an
important generalization, but one obtained by imposing
(1) explicit price adjustment equations,
(2) serial independence on time series data,
(3) normally distributed error terms, and
(4) explicit equations relating the observed quantity transacted
to desired supply and demand.
This contrasts with the equilibrium assumption where
(1) price adjustment is not an estimation issue,
(2) allowances are made for serial correlation,
(3) the errors are only assumed to be uncorrelated with a subset of the
explanatory variables, and
(4) desired supply and demand are directly observable.
The estimation of market behavior has been extended to the
disequilibrium assumption, but at a cost.
Economic theory for markets in disequilibrium is a relatively new
area of research that has been developed in the last few years by
Benassy (1982), and Fisher (1983), among others. Being a recent
development, however, the theories that have been proposed are rather
limited in scope and tentative. For the empirical researcher, the
theories provide little guidance for specifying price adjustment, and
the quantity transacted as a function of desired supply and demand; they
provide no basis for specifying the error distributions and serial
independence. A survey of the many empirical studies that have followed
Fair and Jaffee (1972) suggests that the basis for specifying these
aspects of the econometric disequilibrium model has been largely
computational tractability. This approach has led to several
drastically different disequilibrium specifications. The assumptions
of each specification generally do not represent well-defined
economic-theoretic restrictions, and thus differences among them seldom
reflect differences among well-defined alternative economic theories.
As a result, most disequilibrium specifications are as good (or bad) as
any other. Unfortunately, each specification produces estimates only as
reliable as the assumptions imposed, and differences among them can lead
to conflicting estimates of supply and demand equations.
The lack of economictheoretic restrictions alone does not prohibit
meaningful estimation of a disequilibrium model. The estimators
commonly applied are themselves part of the problem. Most proposed
"disequilibrium" estimators can be viewed as corrected versions of the "equilibrium"
least squares (LS) and maximum likelihood (ML) estimators. The
inequality of supply and demand introduces nonzero correlation between
the explanatory variables and the error terms. Given a model for the
inequality of supply and demand, the "equilibrium" LS and ML estimators
can be corrected for the nonzero correlation to yield consistent
"disequilibrium" LS and ML estimators. The correction approach provides
insight into the problem of relaxing the equilibrium assumption, but
generally requires restrictive assumptions to make it operational. In
particular, consistent LS and ML estimation of a disequilibrium model
depends on choosing the correct parametric forms for price
adjustment, the error distribution functions, and the quantity
transacted; useful inferences require allowances for serial correlation
as well as correct parametric forms. Nonrobustness coupled with a lack
of economic-theoretic restrictions severely limits the reliability of LS
and ML estimation.
To illustrate these points we consider the following model.
Dt = B1'xt + e1t                                   (1.1)

St = B2'xt + e2t                                   (1.2)

Data: (Qt, xt), t = 1,...,n.

Equilibrium assumption: Qt = Dt = St.

Disequilibrium assumption: Dt ≠ St,
    Qt = Tt(Dt, St),
    Δpt+1 = Πt(Dt − St).
Equations (1.1) and (1.2) are demand and supply functions; D denotes the
quantity demanded, S the quantity supplied, x a vector of explanatory
variables, and e1t and e2t denote random error terms. Under the equilibrium
assumption the observed quantity transacted, Q, is equal to both D and
S; data are observed after prices adjust, and therefore adjustment
models are irrelevant. Under the disequilibrium assumption D and S are
not necessarily observable; the function Tt(.,.) specifies the position
of D and S relative to the observable Qt; data reflect adjustments at
various stages, and therefore it becomes meaningful to model price
adjustment. Price adjustments are modeled as follows: the price change,
Δpt+1 = pt+1 − pt, depends on excess demand, Dt − St, through the
function Πt.
When LS and ML are applied under the disequilibrium assumption, it
becomes necessary to specify the distribution of (e1t, e2t) up to an
unknown parameter vector, and the functional forms for Tt and Πt. The
following example will illustrate this. Consider the problem of
obtaining a consistent LS estimate of B1. Under the equilibrium
assumption the data are conditional on the event Q=D=S, and therefore a
consistent "equilibrium" LS estimate requires E(xt e1t | Qt=Dt=St) = 0, or
equivalently E(xt e1t) = 0 since Qt=Dt=St is a sure event by assumption.
Under the disequilibrium assumption, by contrast, each observation is
conditional on either Dt<St or Dt>St, and therefore a consistent
"disequilibrium" LS estimate requires E(xt e1t | Dt<St) = 0. But since,
for example, the event Dt<St truncates the distribution of e1t, the
condition E(xt e1t | Dt<St) = 0 generally fails; on the contrary,
E(xt e1t | Dt<St) ≠ 0 generally holds. As a result, the LS estimator must
be corrected for the nonzero conditional correlation between xt and e1t;
that is, parametric models must be specified for E(xt e1t | Dt<St). For example,
suppose we specify

(e1t, e2t)' ~ N((0, 0)', diag(σe1², σe2²)),                  (1.3)

Qt = Tt(Dt, St) = min(Dt, St),                               (1.4)

Δpt+1 = Πt(Dt, St) = a(Dt − St),  a > 0,                     (1.5)

and for the first n1 observations we have Δpt+1 < 0. Then Qt = Dt,
t = 1,...,n1, by equations (1.4) and (1.5), and consistent LS estimates can
be obtained by solving the problem

min over (B1, σe1², B2, σe2²) of Σ (t=1 to n1) [Qt − B1'xt − E(e1t | Qt=Dt; B1, σe1², B2, σe2²)]².

The functional form for the "correction" term E(e1t | Qt=Dt) follows
from (1.3) and is given by3

E(e1t | Qt=Dt) = −[σe1²/(σe1² + σe2²)^1/2] φ(xt'γ)/Φ(xt'γ),  γ = (B2 − B1)/(σe1² + σe2²)^1/2,

where φ(.) and Φ(.) denote the standard normal density and distribution
function. Without a priori restrictions the specification of
assumptions (1.3), (1.4), and (1.5) is arbitrary, but obviously crucial
to the LS estimation of the parameters. For example, given what is
known about most markets, some alternative assumptions that are just as
plausible as (1.3), (1.4), and (1.5) are (1) any nonnormal symmetrical
distribution for the error terms, (2) Qt ≠ min(Dt, St), and (3)
Δpt+1 = a(Dt − St) + e3t, where e3t is a random error term. Alternatively,
one could derive the likelihood function of (Qt, xt), t = 1,...,n, and
obtain the ML estimates. Once again, however, the estimates are subject
to the validity of restrictive assumptions.
Empirical studies of markets in disequilibrium are concerned with
analyzing timeseries data, and therefore the possibility of serially
correlated errors also arises. Most disequilibrium studies, however,
completely ignore the possibility of serial correlation. One reason for
this practice is that maximizing the correct likelihood function for a
typical disequilibrium model with even a simple form of serial
correlation can be intractable. The problem is one of introducing further
complications into a highly nonlinear structure. (Equilibrium models,
by contrast, have simpler structures, and therefore it is relatively
straightforward to incorporate an ARMA process (say) into these models.)
The problem is further complicated when the true form of the serial
correlation is unknown; even if one were willing to incorporate a simple
process such as AR(1), the result would likely be a questionable
approximation at best. At the same time, failure to adequately account
for serial correlation can cause inconsistent covariance estimates, and
incorrectly interpreted test statistics.
In summary, the estimation of markets in disequilibrium has been
severely limited by the problems of specifying (1) price adjustment;
(2) serial correlation; (3) the distributions of the error terms up to
an unknown parameter vector; and (4) the quantity transacted as a
function of desired supply and demand.
1.2 Solutions
This thesis addresses the above problems by examining their
effects, and by proposing and demonstrating solutions.
Chapters 2, 3, and 4 are directed at the problems of specifying
price adjustment, and specifying serial correlation. In Chapter 2, we
propose using the switching regression model with imperfect sample
separation of Lee and Porter (1984) to incorporate price adjustment into
disequilibrium models. The model enables price adjustment to be
incorporated with less a priori information than usual. To estimate the
model, ML and LS estimators are proposed.
In Chapter 3, the asymptotic properties of the ML estimator are
examined in the context of possible serial correlation. This chapter
builds on previous results of Hartley and Mallela (1977), and Amemiya
and Sen (1977). By incorporating into their results some recent
developments in modeling serial correlation by White and Domowitz (1984)
and others, the analysis permits the data to be characterized by unknown
and general forms of serial correlation. At the same time, the
estimation problem remains computationally tractable.
In Chapter 4, the practical importance of the methodology developed
in chapters 2 and 3 is illustrated with an empirical example. The
methodology is applied to monthly data on the U.S. commercial loan
market from 1979 to 1984.
The final chapter, Chapter 5, proposes semiparametric models and
estimators for markets in disequilibrium. Unlike the previous chapters,
the results of Chapter 5 apply when the functional forms of the error
distribution functions are unknown, and the observed quantity transacted
is an unknown function of desired supply and demand. Consistent
semiparametric estimators are derived by extending the method of maximum
score of Manski (1975, 1985) to a new class of applications.
Although the focus is on the disequilibrium estimation problem,
many of the issues addressed are applicable to other important problems
as well. From a general perspective, the central issue is how to deal
with an estimation problem characterized by less information than what
is usually assumed. The methodology with which we confront the issue
brings together important works from the areas of limited dependent
variables, nonlinear estimation, asymptotic theory, data analysis,
maximum likelihood, least squares, and semiparametric estimation.
NOTES
1. One notable difference among many proposed disequilibrium
specifications is the treatment of price adjustment. Different price
adjustment models often produce different coefficient estimates and
inferences for given supply and demand equations. Most studies assume
normally distributed error terms, and that the quantity transacted is
the minimum of desired supply and demand. Surveys of disequilibrium
specifications commonly used in applied work can be found in Bowden
(1978) and Maddala (1983).
2. General discussions of LS and ML estimators for disequilibrium
models can be found in Bowden (1978) and Maddala (1983).
3. The random variable e1t conditional on Dt<St has a truncated
normal distribution. Formulae for means and variances of truncated
random variables can be found in Maddala (1983, pp. 365-370).
CHAPTER 2
A GENERAL DISEQUILIBRIUM MODEL AND ESTIMATORS FOR LIMITED
A PRIORI PRICE ADJUSTMENT INFORMATION
2.1 Introduction
Price adjustment has a well defined role in the equilibrium model:
prices adjust to clear the market; data are observed only after
adjustments terminate, and therefore are uninformative on the forces
which led to equilibrium. When we assume prices clear the market,
modeling price adjustment is trivial. In contrast, when we assume
disequilibrium, and therefore observe adjustments at various stages,
modeling the process becomes nontrivial and affects the estimation of
the supply and demand model. The research that has followed Fair and
Jaffee has given this issue only limited attention. To lessen the
neglect the present chapter examines the importance of price adjustment
to estimation, and offers a new approach for introducing price
adjustment into the disequilibrium model.
The estimation of a disequilibrium model carries the reservation
that estimates are sensitive to the price adjustment specification. This
sensitivity is evident in many of the empirical studies which followed
Fair and Jaffee. For example, Maddala and Nelson (1974) obtained the
maximum likelihood (ML) estimates of a housing market in disequilibrium
under two different price adjustment specifications,
PA1. the sign of excess demand is given by the direction of the price
change, or equivalently
Pr(Δpt+1>0 | Dt>St) = 1 and Pr(Δpt+1>0 | Dt<St) = 0;
PA2. ignore whatever information the direction of the price change
contains on excess demand. (This is a limitedinformation approach
as no attempt is made to model price adjustment.) In the next
section, we shall see that this specification can be usefully
viewed as imposing the following constraint:
Pr(Δpt+1>0 | Dt>St) = Pr(Δpt+1>0 | Dt<St).
For the two sets of estimates the following conflicts are apparent: one
estimated coefficient is negative under PA1 and positive under PA2;
another is statistically significant under PA2 but not under PA1; the
estimated variance of the supply error term is twenty-five times larger
under PA2, and the same parameter for the demand equation is ten times
larger.
Economic theory imposes few restrictions on the dynamics of price
adjustment, and consequently provides little basis for choosing between
specifications such as PA1 and PA2. Perhaps this explains why in many
studies the Fair-Jaffee models are applied rather mechanically with no
discussion of why a particular choice is appropriate for a given market.
The tendency has been either to specify a convenient but restrictive
price adjustment mechanism such as PA1, or to ignore potential relations
between price and excess demand as in PA2. Apart from the potential for
conflicting results, each approach has serious drawbacks. The
restrictive approach may misspecify the model, and therefore lead to
inconsistent estimates of the supply and demand parameters. On the
other hand, if there is some interaction between price and excess
demand, then efficiency will be lost if price adjustment is completely
ignored. In short, even if the estimates under PA1 are close to those
under PA2, problems remain.
The failure of many empirical studies to adequately represent price
adjustment stems from a failure to carefully assess what is known a
priori. For most applications PA1 imposes too much structure, and PA2
imposes too little. What is needed is an approach which allows price
and excess demand to interact, but at the same time is unrestrictive.
I propose nesting PA1 and PA2 in a more general model using a
method suggested by Lee and Porter (1984). In many respects the
approach is less restrictive than usual. Price adjustments are assumed
to be governed by the following condition:
PA3. The direction of the price change is most likely, but not certain
to follow the direction of excess demand, or equivalently
Pr(Δpt+1>0 | Dt>St) > Pr(Δpt+1>0 | Dt<St).
Although PA3 allows for the possibility that excess demand influences
price changes, it does not restrict the direction of the price change to
correspond to the sign of excess demand, impose a specific price
adjustment equation, or restrict price changes to obey a known
probability distribution. The approach entails estimating the
conditional probabilities in PA3, and hence the data rather than a
priori constraints such as PA1 or PA2 determine to what extent prices
are related to excess demand. Moreover, the problem of modeling price
adjustment is placed in a unified framework which permits a useful
discussion of the relationship between the price adjustment
specification, and the statistical properties of estimation. PA1 and
PA2 are special cases of PA3, and it is argued that imposing PA1 can
lead to inconsistent estimates, while imposing PA2 can suppress
exploitable information on the supply and demand parameters.
The model and its maximum likelihood estimator are discussed in
section 2.2. In section 2.3 a convenient least squares approach is
proposed which has not been previously available for disequilibrium
models. The LS estimator resembles that suggested by Heckman (1976) for
the Tobit model. Although the ML estimator presented in section 2.2 is
more efficient, the LS estimator is easier to compute, and provides
consistent starting values if the ML estimates are desired. An initial
consistent estimator is especially important when PA1 is relaxed since
the resulting likelihood generally has multiple solutions.
2.2 The Model and Maximum Likelihood Estimation
I propose the following model:
Dt = B1°'xt + e1t                              (demand)

St = B2°'xt + e2t                              (supply)

(e1t, e2t)' ~ N((0, 0)', diag(σe1°², σe2°²))   (normality)

Qt = min(Dt, St)                               (quantity transacted)

Pr(Δpt+1>0 | Dt>St) > Pr(Δpt+1>0 | Dt<St)      (PA3)
where the variables are as they were defined in Chapter one. The
specification of the demand and supply equations, normality for the
error terms, and the quantity transacted as the minimum of supply and
demand has become standard practice for empirical studies of markets in
disequilibrium. The model differs from previous disequilibrium
specifications with the introduction of PA3: shortages (D>S) and
surpluses (D<S) tend to be followed by price increases and decreases,
respectively, but in an unpredictable manner. The conditional
probabilities are unknowns which can assume any values in the unit
interval that satisfy the inequality.
The data consist of n observations on (Qt, xt, 1(Δpt+1>0)), where
1(.) is the indicator function, and the problem is to estimate the
unknown supply and demand parameters along with the conditional
distribution of 1(Δpt+1>0) subject to PA3.
To make the problem operational we will adopt the methodology of
Lee and Porter (1984) which entails the following assumptions:1
Assumption 2.1. Given Dt>St (or Dt<St), Qt and 1(Δpt+1>0) are mutually
independent for all t;
Assumption 2.2. the conditional probabilities of PA3 do not vary with
t; i.e., p11° = Pr(Δpt+1>0 | Dt>St, xt), p10° = Pr(Δpt+1>0 | Dt<St, xt).
Assumption 2.2 is the simplest assumption that allows the price
adjustment probabilities to be treated as estimable parameters, but is
not the only possible way of doing so. For example, if it is suspected
that price setting behavior differs between certain subsamples, then a
different pair of parameters could be defined for each. One possible
application might be a market where prices are regulated in some
periods, but not in others. Alternatively, a completely varying-parameter
approach is developed in Chapter 5. Although assumptions 2.1 and
2.2 are still somewhat restrictive, arguably the benefits obtained from
imposing them outweigh the costs. Price adjustment is incorporated
without an explicit adjustment equation, a specific distribution for
price changes, or the restriction that price changes reveal the sign of
excess demand. Furthermore, as we shall see next, estimation is
relatively straightforward under assumptions 2.1 and 2.2.
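The data generating process defined by the model and assumptions 2.1 and 2.2 can be sketched in a short simulation: the regime is set by min(Dt, St), and the price-change indicator is Bernoulli with probability p11 or p10 according to the sign of excess demand. The regressor design and all parameter values below are illustrative choices, not taken from the text.

```python
import numpy as np

def simulate_market(n, b1, b2, s1, s2, p11, p10, rng):
    """Draw (Qt, xt, 1(dpt+1>0)) from the section 2.2 model under
    assumptions 2.1 and 2.2.  Regressors are a constant plus one
    standard normal variable (an illustrative design)."""
    x = np.column_stack([np.ones(n), rng.normal(size=n)])
    D = x @ b1 + rng.normal(0.0, s1, n)   # desired demand
    S = x @ b2 + rng.normal(0.0, s2, n)   # desired supply
    Q = np.minimum(D, S)                  # quantity transacted
    excess = D > S                        # supply regime: Qt = St
    # PA3 with assumption 2.2: Pr(dp>0 | D>S) = p11, Pr(dp>0 | D<S) = p10
    up = (rng.random(n) < np.where(excess, p11, p10)).astype(int)
    return Q, x, up, excess
```

Note that the econometrician observes only (Q, x, up); the regime indicator `excess` is returned here purely so a simulation can verify the conditional frequencies of PA3.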
The log likelihood function of n independent observations on
(Qt, 1(Δpt+1>0)) given (xt, θ°, p11°, p10°), where θ° = (B1°, σe1°², B2°, σe2°²), is

Ln(θ, p11, p10) = Σ (t=1 to n) log ft(Qt, 1(Δpt+1>0))        (2.3)

where

ft(Qt, 1(Δpt+1>0)) = [p11 gst + p10 gdt]^1(Δpt+1>0) [(1−p11) gst + (1−p10) gdt]^(1−1(Δpt+1>0)),

gst = ∫ (Qt to ∞) gt(Dt, Qt) dDt,   gdt = ∫ (Qt to ∞) gt(Qt, St) dSt,

and gt(Dt, St) is the joint density of Dt and St given xt and θ. Under
fairly general conditions, a consistent and asymptotically normal
estimate of (θ°, p11°, p10°) can be obtained by maximizing Ln over an
appropriate parameter space. (The asymptotic properties of a maximizer
of Ln are developed in the next chapter.)
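With independent normal errors the regime densities gst and gdt factor into a normal density times a survival function, so Ln in (2.3) can be coded directly. A minimal sketch follows; the parameter packing (log standard deviations, probabilities passed as raw numbers) is an implementation convenience of mine, not the text's parameterization.

```python
import numpy as np
from scipy.stats import norm

def log_lik(theta, Q, x, up):
    """Ln of (2.3) for independent normal errors.  theta packs
    (B1, B2, log s1, log s2, p11, p10); up is 1(dpt+1 > 0)."""
    k = x.shape[1]
    b1, b2 = theta[:k], theta[k:2 * k]
    s1, s2 = np.exp(theta[2 * k]), np.exp(theta[2 * k + 1])
    p11, p10 = theta[2 * k + 2], theta[2 * k + 3]
    # demand-regime density g_dt: Qt = Dt and St > Qt
    g_d = norm.pdf(Q, x @ b1, s1) * norm.sf(Q, x @ b2, s2)
    # supply-regime density g_st: Qt = St and Dt > Qt
    g_s = norm.pdf(Q, x @ b2, s2) * norm.sf(Q, x @ b1, s1)
    f = np.where(up == 1, p11 * g_s + p10 * g_d,
                 (1 - p11) * g_s + (1 - p10) * g_d)
    return np.log(f).sum()
```

Maximizing this function over (θ, p11, p10), for example with a hill-climbing routine started from consistent initial estimates, yields the ML estimator discussed above.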
The Maddala-Nelson estimators discussed in section 2.1 are obtained
by maximizing Ln subject to

(PA1): (p11, p10) = (1, 0);
(PA2): p11 = p10.

As was noted, however, applying these two estimators to a given data set
can produce conflicting results. One advantage of specifying PA3 is
that the parameter space includes the entire region {(p11, p10):
1 ≥ p11 ≥ p10 ≥ 0}, and consequently it is not necessary to choose between
PA1 and PA2.
By viewing the Maddala-Nelson estimators as constrained maximizers of
Ln, two additional limitations that are overcome by specifying PA3
can be seen. First, if the direction of the price change does not
always follow the sign of excess demand, so that p11°<1 or p10°>0, then
the estimator obtained by maximizing Ln subject to (p11, p10) = (1, 0) is
inconsistent. In other words, if it is incorrectly assumed that
1(Δpt+1>0) separates the sample into the underlying demand (Qt=Dt) and
supply (Qt=St) regimes, then the resulting estimates will generally be
inconsistent. To see this denote the constrained estimator by θn(1,0),
and suppose that e1t, e2t are normally distributed independent random
variables, and p11°<1, p10°>0. Then it is shown in Appendix A.1 that

E(∂Ln(θ°;1,0)/∂B1) = (1−p11°) Σt xt E(Qt−Dt)/σe1°².        (2.4)

Since Pr(Dt>St)>0, and Pr(Qt = min(Dt,St)) = 1, we have E(Qt−Dt)<0,
and it follows from (2.4) that in general plim θn(1,0) ≠ θ°. (For further
details see Appendix A.1.)
The second limitation overcome by maximizing Ln over the
unconstrained space demonstrates the importance of incorporating price
adjustment into the model. If price changes are related to excess
demand, so that p11° ≠ p10°, then the observations on 1(Δpt+1>0) contain
information on θ° that is exploited by the maximizer of Ln only if the
restriction p11 = p10 is not imposed. Since imposing p11 = p10 is
equivalent to estimating the model without a price adjustment specification,
this implies that one is better off using even limited amounts of price
adjustment information rather than neglecting it altogether. This can
be seen by examining the difference between the corresponding
information matrices of the constrained and unconstrained estimators of
θ°. For this purpose note that p11° = p10° implies that Qt and 1(Δpt+1>0)
are independent, the marginal distribution of 1(Δpt+1>0) does not depend
on θ°, and therefore the θ-estimator obtained by maximizing Ln subject
to p11 = p10 can be written as
θn(p11=p10) = arg max over θ of Σ (t=1 to n) log gt(Qt),

where gt is the density of Qt given xt. Since θn(p11=p10) does not
require the joint observation (Qt, 1(Δpt+1>0)), it uses one more
observation on Q than the θ-estimator obtained by maximizing Ln over
the unconstrained space, and therefore we write the latter estimator as

θn−1(p11≠p10) = arg max over (θ, p11, p10) of Σ (t=1 to n−1) log ft(Qt, 1(Δpt+1>0)).
Unlike θn(p11=p10), the estimator θn−1(p11≠p10) uses the price
adjustment information implied by p11° ≠ p10°, namely the dependence of
1(Δpt+1>0) and Qt. For simplicity suppose that the observations are
identically distributed. The tradeoff between the extra observation on
Q used in θn(p11=p10), and the price adjustment information exploited by
θn−1(p11≠p10) is apparent in the difference between their corresponding
information matrices:

−(n−1) E(∂²log ft/∂θ∂θ') + n E(∂²log gt/∂θ∂θ')
    = −(n−1) E(∂²log ht/∂θ∂θ') + E(∂²log gt/∂θ∂θ'),

where ht is the density of 1(Δpt+1>0) given (Qt, xt). In large samples
the information provided by the extra observation on Q in θn(p11=p10) is
insignificant, and clearly θn−1(p11≠p10) is the more efficient
estimator.
Having developed a fairly unrestrictive approach for introducing
price adjustment into the disequilibrium model, an important question
remains: is maximization of Ln over (θ, p11, p10) computationally
tractable? This question is important given a common structure shared
by both Ln with (p11, p10) unrestricted, and Ln restricted by p11 = p10:
neither specification permits the observations to be separated into the
underlying supply and demand regimes, and hence both are switching
regression models with unknown sample separation. The question of
tractability arises because likelihood functions of unknown sample
separation models generally have an unknown number of local maxima, and
finding the consistent and asymptotically normal estimate (the global
maximum) usually requires an exhaustive set of local candidates. For
example, Maddala and Nelson found that three different starting values
produced three different sets of estimates, and were not able to rule
out the possibility of other solutions. Unfortunately, the extra
information provided by the joint observation (Qt, 1(Δpt+1>0)) does not
automatically eliminate the problem; in general, Ln is likely to have
multiple solutions. Fortunately, the problem can be circumvented. If
one is willing to assume that p11° > p10°, then it is possible to construct
a computationally simple and consistent estimator of (θ°, p11°, p10°), and
therefore obtain consistent starting values to iterate to a local maximum
of Ln. The consistency of the initial estimates generally guarantees the
consistency and asymptotic normality of the resulting solution. The
next task is to describe the initial estimator.
2.3 A Consistent Initial Least Squares Estimator
While computationally simple and consistent estimators have been
proposed for other limited dependent variable models such as the Tobit,
similar results have not been previously available for the
disequilibrium model with unknown sample separation. Ironically, the models for
which such estimators have been available generally possess tractable
likelihood functions, and therefore finding consistent initial estimates
is of limited value. A prime example is the Tobit model, for which
consistent initial estimators were proposed by Amemiya (1973) and
Heckman (1976); their estimators are not particularly useful for the
Tobit as this model has a globally concave likelihood function (when
suitably parameterized) which ensures convergence to the consistent and
asymptotically normal maximizer from any starting values.3 In contrast,
the likelihood functions of models with unknown sample separation are
likely to have multiple maxima, and therefore finding initial consistent
estimates for these models is crucial.
The estimator described below extends the approaches suggested by
Amemiya and Heckman to disequilibrium models with unknown sample
separation. The method requires the first moment of 1(Δpt+1>0)
(t=1,...,n), and the first and second moments of Qt (t=1,...,n). Least
squares is then applied successively to three estimation equations.
Assuming that ε1t and ε2t are independent normally distributed random
variables, the relevant equations are

1(Δpt+1>0) = E(1(Δpt+1>0)) + u1t, (2.5)

Qt = E(Qt) + u2t, (2.6)

Qt² = E(Qt²) + u3t, (2.7)

where

E(1(Δpt+1>0)) = p11° + (p10° − p11°)Φ(xtγ°), (2.8)

E(Qt) = Φ(xtγ°)xtβ1° + (1 − Φ(xtγ°))xtβ2° − (σ²ε1° + σ²ε2°)^(1/2) φ(xtγ°), (2.9)

E(Qt²) = Φ(xtγ°)((xtβ1°)² + σ²ε1°) + (1 − Φ(xtγ°))((xtβ2°)² + σ²ε2°)
− (xtβ1° + xtβ2°)(σ²ε1° + σ²ε2°)^(1/2) φ(xtγ°), (2.10)

γ° = (β2° − β1°)/(σ²ε1° + σ²ε2°)^(1/2), and φ(·) and Φ(·) denote the standard normal
density and c.d.f., respectively. Given appropriate regularity
conditions, nonlinear LS applied to equation (2.5) yields consistent
estimates of p11°, p10°, and γ°. These estimates are then used to estimate
the nonlinear functions, Φ and φ, in equation (2.6). Ordinary LS can
then be applied to (2.6) to consistently estimate β1°, β2°, and (σ²ε1° +
σ²ε2°)^(1/2). Finally, the nonlinear functions of equation (2.7) are estimated
so that OLS can be applied, and consistent estimates of σ²ε1° and σ²ε2°
obtained. The asymptotic properties of the LS estimator are developed
in appendix A.2.
Interestingly, the above approach is possible only if price changes
provide some information on whether there is excess demand or supply;
i.e., p11° ≠ p10°. This can be seen from equation (2.8), which can be
interpreted as the probability that 1(Δpt+1>0) is equal to one. If
price changes are completely uninformative on excess demand or supply,
then p11° = p10°, and it follows from (2.8) that the distribution of
1(Δpt+1>0) is independent of γ°. In this case the observations on
1(Δpt+1>0) contain no information on the supply and demand parameters,
and therefore equation (2.5) is irrelevant for the estimation of the
model.
2.4 Summary and Conclusions
The main points of this chapter are
(1) Estimates of disequilibrium models are sensitive to the price
adjustment specification.
(2) Economic theory imposes few restrictions on price adjustment, and
consequently provides little basis for choosing between specifications.
(3) Assumption PA3 serves as an unrestrictive approach for introducing
price adjustments into disequilibrium models; adjustment enters without
an explicit adjustment equation, a known probability distribution for
price changes, or the restriction that price changes reveal the sign of
excess demand.
(4) Assumption PA3 together with assumptions 2.1 and 2.2 permits a
straightforward derivation of the likelihood function. The parameter
space includes but is not limited to the important special cases PA1 and
PA2. Constraining the parameter space to PA1, as is often done in
practice, can lead to inconsistent estimation; constraining the space to
PA2 produces inefficient estimates.
(5) Under assumption PA3 the disequilibrium model is one of unknown
sample separation, and therefore its likelihood function generally has
multiple solutions. To resolve the problem of multiple solutions, the
least squares method described in section 2.3 provides consistent
initial estimates.
In Chapter 4 we apply the methodology developed in the present
chapter to monthly data on the U.S. commercial loan market. Before
proceeding to the application, however, the problem of serial
correlation must be addressed. In the next chapter we develop some
results which permit the data to be analyzed in the context of possible
serial correlation.
NOTES
1. There is an important difference between the model Lee and Porter
(1984) discuss and our model. The Lee-Porter model excludes an analog
to Qt=min(Dt,St), and consequently in their model the switching is
exogenous; i.e., the switching that occurs between the underlying
regimes is independent of the error terms. In contrast, the
disequilibrium model is one of endogenous switching (the "switch" depends
on (ε1t, ε2t)), and consequently many of the results, interpretations,
and expressions found in the Lee-Porter paper must be modified
accordingly.
2. In fact, given appropriate regularity conditions, consistent
initial estimates ensure the consistency and asymptotic normality of the
second-round estimates from a Newton-Raphson type algorithm. See, for
example, Amemiya (1973, pp. 1014-15).
3. Olsen (1978) proved that the likelihood function for the Tobit
model is globally concave when suitably parameterized, and thus has a
single maximum.
CHAPTER 3
SOME ASYMPTOTIC THEORY FOR SERIALLY DEPENDENT OBSERVATIONS
3.1 Introduction
In this chapter we examine the asymptotic properties of the
estimator discussed in section 2.1,

θ̂n^ml = arg max Ln(θp),

where θp = (θ, p11, p10), and Ln(θp) is defined on page 14, equation
(2.3). If the observations are serially independent, then obviously θ̂n^ml
is the MLE of θp°. However, for serially dependent observations, θ̂n^ml is
not the MLE and will be referred to as the partial-MLE.
Hartley and Mallela (1977) and Amemiya and Sen (1977) derive
asymptotic properties of the MLE for the special case of p11=p10. We
will extend their results to the case of serially dependent observations
and p11 ≠ p10 in sections 3.2 and 3.3. In section 3.4 we consider the
problem of consistently estimating the asymptotic covariance matrix. In
section 3.5 we derive a new test for serial correlation.
3.2 Consistency
Since disequilibrium models are typically estimated with time
series data, it is of interest that the property of consistency can be
extended to the partial MLE. Using some results and definitions
presented by White and Domowitz (1981), Levine (1983) has discussed how
and why a partial MLE can be consistent. Levine points out that the
consistency of an estimator θ̂n(y) which maximizes the product
∏_{t=1}^{n} ft(yt|θ) depends on each ft(yt|θ) satisfying certain regularity
conditions. In general, whether or not ft(yt|θ) satisfies such
conditions does not depend on the product being the joint density of y =
(y1,...,yn); rather, it usually suffices that ft(·|·) is the marginal
density of yt. The regularity conditions consist of identification
conditions, and moment restrictions sufficient to apply an appropriate
law of large numbers. We will show that the partial MLE for our model
can satisfy such conditions by extending some results proven by Hartley
and Mallela, and Amemiya and Sen. But first it is necessary to describe
the type of dependence we have in mind.
We will adopt the nonparametric approach of White and Domowitz
(1984) to allow for the possibility of serial correlation. The approach
of White and Domowitz is nonparametric in the sense that the
observations are not required to be generated by a known parametric
model such as an ARMA (p,q) process, but instead must obey general
memory requirements. The memory requirements are referred to as mixing
conditions, and a sequence of random variables which obey mixing
conditions is said to be a mixing sequence. More precisely, we have the
following definition.
Definition 3.1. Let {yt} denote a sequence of random vectors defined on
a probability space (Ω,F,P), and let F_a^b denote the Borel σ-field of
events generated by the random variables ya, ya+1, ..., yb. Define

φ(m) = sup_n sup{ |P(B|A) − P(B)| : A ∈ F_1^n, B ∈ F_{n+m}^∞ } and
α(m) = sup_n sup{ |P(B ∩ A) − P(B)P(A)| : A ∈ F_1^n, B ∈ F_{n+m}^∞ }.

(i) If φ(m) → 0 as m → ∞, and φ(m) = O(m^(−k)) for k > r/(2r−1), where r ≥ 1, then
{yt} is a mixing sequence with φ(m) of size r/(2r−1).
(ii) If α(m) → 0 as m → ∞, and α(m) = O(m^(−k)) for k > r/(r−1), where r > 1, then
{yt} is a mixing sequence with α(m) of size r/(r−1).
φ(m) and α(m) measure how much dependence exists between observations at
least m periods apart. A sequence such that φ(m) → 0 as m → ∞ is called
uniform mixing or φ-mixing, and a sequence for which α(m) → 0 as m → ∞ is
called strong mixing or α-mixing. Since the dependence coefficients,
φ(m) and α(m), are required to vanish asymptotically, mixing is a form
of asymptotic independence. A fairly large class of processes satisfy
mixing conditions. For example, finite order Gaussian ARMA processes are
strong mixing, as are stationary Markov chains under fairly general
conditions. White and Domowitz (1984) show that measurable functions of
mixing processes are mixing and of the same size. This is particularly
convenient for nonlinear problems. Mixing processes are useful for
modeling complex economic data since they are not required to be
stationary. In short, mixing conditions provide a convenient way to
model an economic phenomenon that is likely to be both heterogeneous and
time dependent.
The following law of large numbers, due to McLeish (1975), applies
for mixing sequences.2

Theorem 3.2. Let {yt} be a sequence with φ(m) of size r/(2r−1) or α(m)
of size r/(r−1), r ≥ 1, such that E|yt|^(r+d) ≤ Δ < ∞ for some d > 0 and all t.
Then

(1/n) Σ_{t=1}^{n} (yt − E(yt)) →p 0.

All proofs of theorems in this chapter are provided in Appendix A.3.
For Theorem 3.2 to be applicable to a given sequence, it is clear
that there is a tradeoff between the moment restriction that the
sequence must satisfy and the allowable dependence. The stronger the
moment restriction satisfied, the more dependence as measured by φ(m)
(or α(m)) is allowed. If the members of the sequence are independent,
then we can set r=1, and Theorem 3.2 collapses to the Markov law of
large numbers. For sequences with exponential memory decay, r can be
set arbitrarily close to one. In general, the longer the memory of a
sequence, the larger is the size of φ(m) and α(m), and consequently the
more stringent the moment restriction (which depends on r) becomes.
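Since a Gaussian AR(1) process with |ρ| < 1 is strong mixing with exponentially decaying α(m), Theorem 3.2 applies to it with r arbitrarily close to one. A minimal numerical illustration (the parameter values and sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.9   # strong dependence, but alpha(m) still decays exponentially

def ar1_sample_mean(n):
    # y_t = rho*y_{t-1} + e_t, e_t iid N(0,1); E(y_t) = 0
    y = np.empty(n)
    y[0] = rng.normal()
    for t in range(1, n):
        y[t] = rho * y[t - 1] + rng.normal()
    return y.mean()

# dependence slows, but does not prevent, convergence of the sample mean
means = {n: ar1_sample_mean(n) for n in (100, 10_000, 200_000)}
```

The variance of the sample mean here is larger by a factor of roughly (1+ρ)/(1−ρ) = 19 than it would be for independent draws with the same marginal variance, which is the practical cost of the dependence.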
By using mixing conditions to restrict the serial behavior of the
sequence (Qt, 1(Δpt+1>0), xt), it is not necessary to specify an
additional parametric model such as an ARMA (p,q) process.
Consequently, one possible source of model misspecification is
eliminated. Mixing conditions enable us to include a larger class of
models in the analysis. Of course, as Theorem 3.2 implies, the precise
size of the class will depend on what moment restrictions are satisfied.
We are now ready to state conditions which ensure the consistency of the
partial-MLE (and the MLE) of θp°.
In order to establish consistency for the partial-MLE we impose the
following assumptions on the disequilibrium model presented in section
2.2:

Assumption 3.3. (allowable serial dependence): The sequence
(Qt, 1(Δpt+1>0), xt) is a mixing sequence with φ(m) of size r/(2r−1), r ≥ 1,
or α(m) of size r/(r−1), r > 1.

Assumption 3.4. (distributions):
(i) The random vector (ε1t, ε2t) is normally distributed with mean zero
and covariance matrix:

[ σ²ε1   0   ]
[  0    σ²ε2 ]

(ii) Assumptions 2.1 and 2.2 hold. (See page 13.)
Assumption 3.5. (the regressors):
(i) The vector xt consists of only exogenous variables.
(ii) Each component of xt is uniformly bounded in t, has a finite
range for each t, and a support given by St=S for all t.
(iii) Any linear combination of the components of xt where the
coefficients are not all zero is not zero with probability one.
Assumption 3.6. (the parameter space):
(i) The parameter space Θ includes the true parameter vector
θp° = (β1°, β2°, σ²ε1°, σ²ε2°, p11°, p10°), excludes the region σ²εi ≤ 0 (i=1,2) and
the region p10 > p11, and is a compact subset of a Euclidean space.
(ii) If the set includes points such that p11 = p10, then it excludes
the point θp* = (β2°, β1°, σ²ε2°, σ²ε1°, p11°, p10°). Otherwise Θ may include
θp*.
With a few exceptions, the conditions on the regressors and the
parameter space are identical to those given by Hartley and Mallela
(1977), and Amemiya and Sen (1977). One exception is that we place no
restrictions on the limiting behavior of the empirical distribution of
the regressors, whereas Hartley and Mallela require it to converge
completely to a nondegenerate distribution. As pointed out by White
(1980), in sampling situations where the researcher has little control
over the data, it is important to allow for the possibility that the
empirical distribution does not converge. In contrast to Amemiya and
Sen, we do not require the regressors to be i.i.d., but for convenience
retain their assumption that the regressors are discrete random
variables.
Assumption 3.6(ii) is necessary to identify the true parameter
vector θp°. Without appropriate prior information on β1° and β2°, the
point θp* is indistinguishable from θp°, and the model cannot be
estimated. This is the problem of interchanging regimes which is
discussed by Hartley and Mallela, and Amemiya and Sen. Both studies
point out that θp* is eliminated from the parameter space if the usual
"order condition" holds. We will extend this result below by showing
that for θp° to be distinguishable from θp* it suffices to know a priori
that p11° > p10°. In this sense prior sample separation information
represents prior information on the supply and demand parameters.
Hoadley (1971) has generalized the Wald argument to the case of
independent but not identically distributed observations. Theorem 3.7 below
is an extension of Hoadley's argument to mixing sequences, and will be
used to verify that assumptions 3.3, 3.4, 3.5, and 3.6 imply consistency
for the partial-MLE, θ̂n^ml.3
Theorem 3.7. Suppose:
(i) The sequence {yt} is a mixing sequence with φ(m) of size r/(2r−1),
r ≥ 1, or α(m) of size r/(r−1), r > 1.
(ii) The parameter space Θ is a compact subset of a Euclidean space.
(iii) The function ft(yt|θ) is continuous on Θ, uniformly in t, a.e.
(iv) The function

sup{ ln(ft(yt|θ')/ft(yt|θ°)) : |θ' − θ| ≤ ρ }

is a measurable function of yt for each θ belonging to Θ.
(v) There exist ρ(θ) > 0 and d > 0 such that

E|sup{ ln(ft(yt|θ')/ft(yt|θ°)) : |θ' − θ| ≤ ρ }|^(r+d) ≤ Δ < ∞

for ρ ≤ ρ(θ).
(vi) For θ ≠ θ°,

lim sup_n { n^(−1) Σ_{t=1}^{n} E(ln(ft(yt|θ)/ft(yt|θ°))) } < 0.

Let θ̂n(y) be a function of the observations y=(y1,...,yn) which
solves the problem

max_θ ∏_{t=1}^{n} ft(yt|θ).

Then plim θ̂n(y) = θ°.
To show that the partial-MLE θ̂n^ml is a consistent estimator of θp° we
verify that Θ, (Qt, 1(Δpt+1>0), xt), and ft(Qt, 1(Δpt+1>0)|θp) satisfy
3.7(i)-(vi) given assumptions 3.3-3.6.
The fact that the mixing and compactness requirements 3.7(i) and
3.7(ii) are satisfied follows directly from assumptions 3.3 and 3.6(i).
Lemma 3.8 establishes that ft(Qt, 1(Δpt+1>0)|θp) satisfies the
continuity requirement 3.7(iii).

Lemma 3.8. Given assumptions 3.4-3.6, ft(Qt, 1(Δpt+1>0)|θp) is a
continuous function of θp, uniformly in t, a.e.

Lemma 3.9 establishes that the measurability requirement, 3.7(iv),
is satisfied.

Lemma 3.9. Given assumptions 3.3-3.6, the function

sup{ ln(ft(Qt, 1(Δpt+1>0)|θp')/ft(Qt, 1(Δpt+1>0)|θp°)) : |θp' − θp| ≤ ρ }

is a measurable function of (Qt, 1(Δpt+1>0), xt).
The moment restriction, 3.7(v), together with 3.7(i), determines the
amount of dependence allowable. The following lemma extends Hartley and
Mallela's Corollary 4.2, and establishes that 3.7(v) is satisfied for
large r+d.

Lemma 3.10. Given assumptions 3.3-3.6, for all sufficiently small
ρ = ρ(θp) > 0,

E|sup{ ln(ft(Qt, 1(Δpt+1>0)|θp')/ft(Qt, 1(Δpt+1>0)|θp°)) : |θp' − θp| ≤ ρ }|^k < ∞,

where k is any positive integer.
Finally, Lemma 3.11 establishes that the identification condition,
3.7(vi), is satisfied. Lemma 3.11 extends Amemiya and Sen's Lemmas 2
and 3 to the case of p11 ≠ p10.

Lemma 3.11. Given assumptions 3.3-3.6, for θp ≠ θp° there exists a
negative constant b(θp) such that

E(ln(ft(Qt, 1(Δpt+1>0)|θp)/ft(Qt, 1(Δpt+1>0)|θp°))) ≤ b(θp).

We have proven the following theorem.

Theorem 3.12. Given assumptions 3.3-3.6, plim θ̂n^ml = θp°.
3.3 Asymptotic Normality
Under the assumption that (Qt, 1(Δpt+1>0), xt) is a mixing sequence,
we consider the limiting distribution of

n^(1/2) Vn^(−1/2)(θp°) ∇Ln(θp°),

where ∇Ln(θp°) denotes the gradient vector corresponding to Ln(θp°), and
Vn(θp°) = var(n^(1/2) ∇Ln(θp°)). We will discuss conditions that imply
asymptotic normality; that is,

n^(1/2) Vn^(−1/2)(θp°) ∇Ln(θp°) →d N(0, I), (3.13)

where I denotes an identity matrix of appropriate dimensions. The
results in this section together with those in the next section permit
derivation of asymptotic test statistics.
As is well known, asymptotic normality is proven by an appropriate
application of a central limit theorem. As with consistency, the
conditions sufficient for asymptotic normality depend on the degree of
dependence and heterogeneity the sequence exhibits. For a sequence of
independent identically distributed random vectors, we have the
Lindeberg-Levy Theorem; for independent but not identically distributed,
the Lindeberg-Feller Theorem; for dependent identically distributed,
the central limit theorem of Gordin (1969); and for dependent not
identically distributed, the central limit theorem of Serfling (1968).

For the case of independent observations, Hartley and Mallela
(1977) prove the asymptotic normality result (3.13) by applying a
version of the Lindeberg-Feller Theorem. However, by specifying the
sequence (Qt, 1(Δpt+1>0), xt) as mixing, a more general result is
possible. The following theorem is based on Theorem 2.4 of White and
Domowitz (1984), which generalizes Serfling's (1968) central limit
theorem.
Theorem 3.14. Suppose:
(i) Let ∇Ln(θp°) = Σ_{t=1}^{n} ∇Lt(θp°). Then E(∇Lt(θp°)) = 0 for all t.
(ii) Let λ be any nonzero vector, and define

∇Ln,a(θp°) = n^(−1/2) Σ_{t=1+a}^{n+a} ∇Lt(θp°).

Then there exists a matrix V such that det(V) > 0, and

λE(∇Ln,a(θp°) ∇Ln,a(θp°)^T)λ^T − λVλ^T → 0

as n → ∞, uniformly in a.
(iii) E|∇Lt(θp°)|^(2r) ≤ Δ < ∞ for some r > 1.
If φ(m) or α(m) is of size r/(r−1), then (3.13) holds.
Condition 3.14(i) is the familiar condition that the vector of
likelihood equations, when evaluated at the true parameter vector θp°,
has zero expectation. Sufficient conditions for 3.14(i) are (1) the
model is correctly specified, and (2) the density of (Qt, 1(Δpt+1>0), xt)
is sufficiently regular to permit differentiation under the integral
sign.

Condition 3.14(ii) is somewhat restrictive, but unfortunately a
less restrictive replacement for it is currently not available.
Condition 3.14(ii) restricts the heterogeneity of ∇Lt(θp°) by requiring
it to be covariance stationary asymptotically.

Condition 3.14(iii) is a moment condition which depends on the
amount of dependence the sequence (Qt, 1(Δpt+1>0), xt) exhibits. If the
sequence is serially independent, then r can be set arbitrarily close to
one; as the amount of dependence increases, as measured by φ(m) or α(m),
r increases accordingly.
3.4 Consistent Covariance Estimation
We consider the problem of deriving consistent estimators for the
asymptotic covariance matrix of the partial-MLE θ̂n^ml. The expression for
the asymptotic covariance matrix is

n^(−1) ∇²L̄n(θp°)^(−1) Vn(θp°) ∇²L̄n(θp°)^(−1),

where Vn(θp°) = var(n^(1/2) ∇Ln(θp°)), and ∇²L̄n(θp°) is the matrix of second-
order partial derivatives of L̄n(θp°) = E(Ln(θp°)).
First consider the problem of consistently estimating the term
∇²L̄n(θp°)^(−1). The functional form of this term does not depend on the
serial dependence (or independence) of the observations, and therefore
consistent estimation of it is straightforward. The following theorem,
which combines Lemma 2.6 of White (1980) with Theorem 2.3 of White and
Domowitz (1984), provides conditions that imply

plim (∇²Ln(θ̂n^ml)^(−1) − ∇²L̄n(θp°)^(−1)) = 0.
Theorem 3.15. Let qt(yt,θ) be measurable for each θ belonging to a
compact set Θ, and continuous on Θ uniformly in t a.e.
Suppose
(i) The sequence {yt} is mixing as stated in Definition 3.1.
(ii) For r ≥ 1 and any d > 0,

sup_t E|qt(yt,θ)|^(r+d) < ∞.

If plim θ̂ = θ°, then plim n^(−1) Σ_{t=1}^{n} (qt(yt,θ̂) − E(qt(yt,θ°))) = 0.
Next consider the problem of consistently estimating Vn(θp°).
Unlike the term ∇²L̄n(θp°), the functional form of Vn(θp°) depends on the
nature of the serial correlation, and consequently special care must be
taken. The general form for Vn(θp°) is

Vn(θp°; c(n)) = n^(−1) Σ_{t=1}^{n} E(ft(θp°) ft(θp°)^T)
+ n^(−1) Σ_{s=1}^{c(n)} Σ_{t=s+1}^{n} E[ft(θp°) ft−s(θp°)^T + ft−s(θp°) ft(θp°)^T],

where ft(θp) = ∇log ft(θp), ft(·) is the density of (Qt, 1(Δpt+1>0), xt), and
c(n) is such that E(ft(θp°) ft−s(θp°)^T) = 0 for s ≥ c(n). The natural choice
for an estimator of Vn(θp°; c(n)) is the sample analogue Vn(θ̂n^ml; c(n)).
The consistency of such an estimator, however, depends on the asymptotic
behavior of c(n). We will consider two special cases.
Case 1. c(n) = c.
If c(n) is equal to a known finite constant c which is less than or
equal to the sample size minus one (if c = n, then the estimator
Vn(θ̂n^ml; c) = 0), then imposing the conditions of Theorem 3.15 will suffice
for

plim (Vn(θ̂n^ml; c) − Vn(θp°; c)) = 0.

An example of a sampling situation where c is a known finite constant
would be one in which the observations are known to be generated from a
moving-average process of order c.
If c is assumed to be constant and less than or equal to n−1, but
otherwise unknown, then the problem becomes more complicated. Let c*
denote the specified choice for an unknown c. In the next section we
derive an asymptotic test for the hypothesis c* = c. The test is a
possible criterion for specifying c*. The issues involved in specifying
c* are the following. If we specify c* < c, then the estimator
is inconsistent, since nonzero terms in Vn(θp°; c) are mistakenly
constrained to be zero. On the other hand, if we specify c* > c, then the
estimator is consistent, but inefficient, since restrictions of the form
E(fi(θp°) fj(θp°)^T) = 0 are neglected. When the purpose of estimating
Vn(θp°; c) is to construct asymptotic test statistics, however, the
essential requirement is consistency (rather than efficiency).
Therefore, when the purpose is hypothesis testing, the choice c* > c is
preferable to c* < c.
Case 2. lim_{n→∞} c(n) = ∞ and lim_{n→∞} E(ft(θp°) ft−c(n)(θp°)^T) = 0.
In this case the sequence {ft(θp°)} is only assumed to be
asymptotically uncorrelated. A sufficient condition for {ft(θp°)} to be
asymptotically uncorrelated is that the sequence (Qt, 1(Δpt+1>0), xt) be
mixing. Theoretical results for this case have been presented by White
and Domowitz (1984), White (1984), and Newey and West (1985). Their
results depend on restricting the growth rate of c(n). Unfortunately,
their results do not give any guidance concerning the choice of c(n) for
finite samples. The following theorem is due to Newey and West (1985),
and provides sufficient conditions for plim (Vn(θ̂n^ml; c(n)) − Vn(θp°; c(n)))
= 0.
Theorem 3.16. Suppose
(i) ft(θp) is measurable in (Qt, 1(Δpt+1>0), xt) for each θp, and
continuously differentiable in θp in a neighborhood N of θp°.
(ii) (a) sup_{θp∈N} E|∂ft(θp)/∂θp|² < ∞.
(b) There are finite constants d > 0 and r ≥ 1 such that
E|ft(θp°)|^(4(r+d)) < ∞.
(iii) (Qt, 1(Δpt+1>0), xt) is a mixing sequence with φ(m) of size 2 or
α(m) of size 2(r+d)/(r+d−1), r > 1.
(iv) For all t, E(ft(θp°)) = 0, and n^(1/2)(θ̂n^ml − θp°) is bounded in probability.
If lim c(n) = ∞ such that c(n) = o(n^(1/4)), then

plim (Vn(θ̂n^ml; c(n)) − Vn(θp°; c(n))) = 0.
One additional problem is that for c(n) ≥ 1 the estimate Vn(θ̂n^ml; c(n))
is not necessarily positive semidefinite. This can lead to negative
estimates of the variances and test statistics, which are clearly not
acceptable. To ensure that Vn(θ̂n^ml; c(n)) is positive definite, the
summands can be weighted according to a procedure described in Newey and
West (1985). This modification does not affect the consistency of the
estimate.
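The weighted ("Bartlett") version of the estimator can be sketched as follows, assuming the per-observation scores ft(θ̂n^ml) are available as the rows of an array; the function name and test data are invented for the illustration.

```python
import numpy as np

def newey_west(scores, c):
    # HAC estimate of V_n = var(n^(-1/2) sum_t f_t) from the n x k array of
    # scores f_t; Bartlett weights 1 - s/(c+1) keep the result positive
    # semidefinite (the weighting suggested by Newey and West, 1985).
    f = scores - scores.mean(axis=0)
    n, _ = f.shape
    V = f.T @ f / n                       # s = 0 term
    for s in range(1, c + 1):
        w = 1.0 - s / (c + 1.0)
        G = f[s:].T @ f[:-s] / n          # sum over t of f_t f_{t-s}^T
        V += w * (G + G.T)
    return V

# illustrative scores with MA(1) dependence: E(f_t f_{t-s}^T) = 0 for s >= 2
rng = np.random.default_rng(2)
e = rng.normal(size=(10001, 3))
scores = e[1:] + 0.5 * e[:-1]
V = newey_west(scores, c=4)
min_eig = np.linalg.eigvalsh(V).min()     # PSD: no negative variance estimates
```

Without the weights (w = 1 at every lag) the same calculation can produce negative eigenvalues, which is exactly the problem noted above.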
3.5 An Asymptotic Test for Serial Correlation
In this section we propose a test sensitive to serial correlation
in the gradient vectors ft(θp°). The test provides a criterion for
specifying the constant c of the covariance estimator Vn(θ̂n^ml; c).
The null hypothesis of interest is

H0: E(ft(θp°)i · ft−c(θp°)j) = 0 for all i,j,

where ft(θp°)i denotes the ith component of the vector ft(θp°). The
basis for a test of H0 comes from two observations.
(1) Under H0, linear combinations of the components of the vector
ft(θp°) are uncorrelated with linear combinations of the components
of the vector ft−c(θp°).
(2) Under H0, the products ft(θ̂n^ml)i · ft−c(θ̂n^ml)j should be close to zero
for sufficiently large n.
Therefore, a reasonable strategy for testing H0 would be to compute the
sample correlation between appropriate linear combinations, and reject
H0 if the sample correlation is too large in some sense. To this end,
for a k-dimensional vector ft(θp°), consider the artificial regression

Σ_{i=1}^{k} wit ft(θ̂n^ml)i = Σ_{i=1}^{k} ai ft−c(θ̂n^ml)i + error,

where the wit are known constants such that Σ_{i} wit = 1, and the ai are
unknowns to be estimated. The test we propose entails computing the OLS
estimates âi, i=1,...,k, and testing the hypothesis a1 = ... = ak = 0. More
formally, we have the following theorem.
Theorem 3.17. Define a = (a1,...,ak)^T; let fc(θp) denote the (n−c)×k
matrix whose (t,i) element is ft(θp)i, t = 1,...,n−c, i = 1,...,k; and let
g(θp) denote the (n−c)×1 vector whose tth element is Σ_{i=1}^{k} wi,c+t fc+t(θp)i,
t = 1,...,n−c.
In addition to H0, suppose
(i) The vector-valued function ft(θp) is continuously differentiable
(component by component) on an open convex set Θ0 containing θp°.
(ii) There exists an open neighborhood N of θp° such that

sup_{θp∈N} |ft(θp)i| ≤ Δ < ∞ and sup_{θp∈N} |∂ft(θp)i/∂θp| ≤ Δ < ∞.

(iii) plim n^(−1) Σ_{t=1}^{n} ft(θp°) = 0.
(iv) Let An(θp) = n^(−1) fc(θp)^T fc(θp), and Ān(θp) = E(An(θp)). Then
there exists an open neighborhood of θp°, N0, such that Ān(θp) is
positive definite on N0 for all n sufficiently large, and

plim sup_{θp∈N0} |An(θp) − Ān(θp)| = 0.

(v) Let Un(θp°) = var(n^(−1/2) fc(θp°)^T g(θp°)), and let Un(θ̂n^ml) denote the
sample analogue. Then Un(θp) is positive definite on an open
neighborhood of θp° for all n sufficiently large, and

plim (Un(θ̂n^ml) − Un(θp°)) = 0.

(vi) Under H0, Un(θp°)^(−1/2) n^(−1/2) fc(θp°)^T g(θp°) →d N(0, I).

Let Dn(θp) = An(θp) Un(θp)^(−1) An(θp). Then given conditions (i)-(vi), and
H0,

n â^T Dn(θ̂n^ml) â →d χ²(k),

where â denotes the vector of OLS estimates from the regression of
g(θ̂n^ml) on fc(θ̂n^ml).
3.6 Summary and Conclusions
The main points of this chapter are the following:
(1) The assumptions presented in sections 3.2 and 3.3 imply that the
partial-MLE of the disequilibrium model is consistent and asymptotically
normal. The assumptions allow for serial correlation of an unknown
form; for example, an arbitrary ARMA process is allowable for the
observations. At the same time, the estimator θ̂n^ml is computed as though
the observations were serially independent, and thus computational
tractability is retained.
(2) To calculate asymptotic test statistics, a consistent estimate of
the asymptotic covariance matrix is needed. Obtaining a consistent
covariance estimator is complicated by the need to specify a constant c
such that E(ft(θp°) ft−s(θp°)^T) = 0 for all s ≥ c. In general, c is unknown, but
consistent covariance estimation depends on specifying a c* such that c* ≥ c.
(3) The test statistic presented in section 3.5 permits a test of
H0: c* = c, and thus provides a criterion for specifying c*.
NOTES
1. Our discussion of mixing draws heavily on White and Domowitz
(1984), and White (1984, pp. 43-47).
2. Theorem 3.2 is a less general version of the law of large numbers
presented by McLeish (1975, Theorem 2.10). The version we present is
discussed in White (1984, Corollary 3.48), and imposes a stronger but
simpler moment restriction.
3. White and Domowitz (1984) extend Hoadley's Theorem A.5, which is a
uniform law of large numbers, to mixing sequences by applying Theorem
2.10 of McLeish (1975) instead of Markov's law of large numbers. Here
we merely point out that Hoadley's Theorem 1 can be extended to mixing
sequences using the same technique.
In some respects the conditions of Theorem 3.7 are stronger than
those stated in Hoadley's Theorem 1. For example, the requirement that
ft(yt|θ) is continuous can be replaced by upper semicontinuity. The
conditions that we state are sufficiently general for our purposes.
CHAPTER 4
AN EMPIRICAL EXAMPLE: THE U.S. COMMERCIAL LOAN MARKET
4.1 Introduction
In this chapter the disequilibrium model described in section 2.2
(page 12) is fitted to monthly data on the U.S. commercial loan market
from 1979 to 1984. The problem is to analyze disequilibrium supply and
demand behavior with limited a priori information imposed on the price
adjustment process. The model is estimated and tested with the
partial-MLE and least squares method described in sections 2.2 and 2.3,
respectively. The possibility of serial correlation is accounted for
using methods described in Chapter 3.
Disequilibrium models of commercial loan markets have been
estimated by Laffont and Garcia (1977), Sealy (1979), and Ito and Ueda
(1981). These works were consulted in designing the specification of
the supply and demand equations. Our model and estimation methods, however,
differ from the previous studies in three important respects. First,
price enters the model differently. Laffont and Garcia, and Ito and
Ueda constrained the price change to separate the sample, and Sealy
assumed that price changes were a linear function of normal random
variables. Second, the starting values we employ for maximizing the
likelihood function are consistent estimates, and therefore ensure
convergence to an asymptotically desirable solution. None of the above
studies employed methods that guarantee this. Third, we will adopt the
nonparametric approach developed in Chapter three to allow for the
possibility of serial correlation. Given that the data is a time
series, allowing for serial correlation is particularly important.
Failure to do so can cause inconsistent covariance estimates and
therefore misleading test statistics. In contrast, most existing
disequilibrium studies, including those mentioned above, apply methods
to time series data that are only appropriate for serially independent
observations. The nonparametric approach was chosen for its generality,
and computational ease. An arbitrary ARMA process is allowable for the
error terms, but at the same time the parameter estimators are computed
as though the errors were serially independent. Relative to an
assumption of serial independence, the only part of the problem that
changes is the calculation of the asymptotic covariance estimate.
4.2 The Empirical Model
The empirical model to be estimated and tested is specified as
follows:

Dt = β10 + β11(RLt − RAt) + β12 IPt + ε1t,
St = β20 + β21(RLt − RTt) + β22 TDt + ε2t,
Qt = min(Dt, St),
p11 > p10, where p11 = Pr(ΔRLt+1 > 0 | Dt > St), and
p10 = Pr(ΔRLt+1 > 0 | Dt < St).

The variables we use differ little from those of the previous
studies. The variable RL is the average prime rate charged by banks; RA
is the Aaa corporate bond rate, and reflects the price of alternative
financing to firms; IP is the industrial production index and measures
firms' expectations about future economic activity; RT is the three-month
treasury bill rate, and represents an alternative rate of return for
banks; TD is total bank deposits in billions of dollars, and is a scale
variable. The observed quantity transacted, Q, is specified as the sum
of commercial and industrial loans, and the relevant price change is
ΔRLt+1 = RLt+1 − RLt. All interest rates are expressed as percentages. The
sample consists of 72 observations on each variable, and can be found in
various issues of the Federal Reserve Bulletin.
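For concreteness, the objective the partial-MLE maximizes can be sketched directly from the model above. The function below assumes independent normal errors and follows the endogenous-switching structure of the likelihood (cf. equation (2.3)): the shortage regime Qt = St carries price-rise probability p11 and the surplus regime Qt = Dt carries p10. The parameterization and the simulated data are illustrative, not the commercial loan data.

```python
import numpy as np
from scipy.stats import norm

def partial_loglik(theta, Q, d, x):
    # theta = (b1, b2, log s1, log s2, p11, p10); d_t = 1(price change > 0).
    k = x.shape[1]
    b1, b2 = theta[:k], theta[k:2 * k]
    s1, s2 = np.exp(theta[2 * k]), np.exp(theta[2 * k + 1])
    p11, p10 = theta[2 * k + 2], theta[2 * k + 3]
    # shortage regime: Q = S and D > Q;  surplus regime: Q = D and S > Q
    f_short = norm.pdf(Q, x @ b2, s2) * norm.sf(Q, x @ b1, s1)
    f_surp = norm.pdf(Q, x @ b1, s1) * norm.sf(Q, x @ b2, s2)
    lik = (p11 ** d * (1 - p11) ** (1 - d)) * f_short \
        + (p10 ** d * (1 - p10) ** (1 - d)) * f_surp
    return np.mean(np.log(lik))

# simulated data consistent with the regime probabilities above (illustrative)
rng = np.random.default_rng(4)
n = 20000
x = np.column_stack([np.ones(n), rng.normal(size=n)])
b1, b2 = np.array([2.0, 1.0]), np.array([2.4, 0.2])
D = x @ b1 + rng.normal(size=n)
S = x @ b2 + rng.normal(size=n)
Q = np.minimum(D, S)
d = rng.binomial(1, np.where(D > S, 0.8, 0.3))
theta_true = np.concatenate([b1, b2, [0.0, 0.0, 0.8, 0.3]])
ll_true = partial_loglik(theta_true, Q, d, x)
```

Maximizing this average log-likelihood over theta, started from the consistent least squares estimates of section 2.3, yields the partial-MLE.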
4.3 Hypothesis Testing Procedures
Two hypotheses concerning the price adjustment process, and several
hypotheses concerning serial correlation were tested. The first price
adjustment hypothesis maintains that the direction of the price change
l(Apt+l>0) can be used to separate the sample into the underlying supply
(Qt=S ) and demand (Qt=Dt) regimes. The approach we have chosen to
model price adjustment permits the known sample separation hypothesis to
be conveniently expressed as
H o: (P11'Po)=(110).
The null hypothesis was tested by computing a Lagrange multiplier (LM)
test. The LM test was chosen over the Wald and likelihood ratio tests
because it only requires the estimates under the computationally
simpler null hypothesis.
The second price adjustment hypothesis maintains that price
adjustments are symmetrical in the following sense: the chance of a
price increase during a shortage is the same as that of a decrease
during a surplus. This hypothesis can be expressed as
H_o: p11 = 1 - p10.
To test the hypothesis of symmetrical price adjustment, a Wald test was
computed. The Wald test was chosen over the LM and likelihood ratio
tests because it only requires the unconstrained estimates. In this
case the constrained estimates (those obtained under H_o) offer no
computational advantage over the unconstrained estimates.
The LM and Wald test statistics converge to their usual chi-squared
limiting distributions provided that:
(1) n^{-1/2} V_n(θ_p^0)^{-1/2} ∇L_n(θ_p^0) converges in distribution to N(0, I);
(2) a constant c is chosen such that plim (V_n(θ̂_n; c) - V_n(θ_p^0; c)) = 0.
If ∇L_n(θ_p^0) is a k-dimensional vector, and both (1) and (2) hold, then we
can conclude that
n^{-1} ∇L_n(θ_p^0)' V_n(θ̂_n; c)^{-1} ∇L_n(θ_p^0) converges in distribution to χ²_k
(see, for example, White (1984, Theorem 4.30)).
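The quadratic-form construction above can be illustrated numerically. In the sketch below the score vector and covariance estimate are invented for illustration (they are not computed from the dissertation's likelihood); the statistic is compared to the 5% critical value of a chi-squared with k = 3 degrees of freedom:

```python
import numpy as np

# Hypothetical k = 3 score vector and covariance estimate (illustrative only).
score = np.array([0.8, -1.1, 0.4])
V = np.array([[0.50, 0.10, 0.00],
              [0.10, 0.40, 0.05],
              [0.00, 0.05, 0.30]])
n = 72

# Quadratic form n * score' V^{-1} score; under H_o it is asymptotically
# chi-squared with k degrees of freedom.
stat = n * score @ np.linalg.solve(V, score)

CHI2_3_05 = 7.815  # 5% critical value of chi-squared(3)
reject = stat > CHI2_3_05

assert stat > 0.0
assert reject
```

The same quadratic form serves for both the LM statistic (score evaluated at the constrained estimates) and the Wald statistic (restrictions evaluated at the unconstrained estimates).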
The specification of c was handled as follows. The LM statistic
for the first H_o and the Wald statistic for the second H_o were each
computed for several successive values of c. The LM statistic was
computed for c=1,...,12, and in each case the null hypothesis
(p11, p10)=(1,0) was rejected. The Wald statistic, however, produced
conflicting evidence for the hypothesis p11 = 1 - p10; for some values of c
the hypothesis was rejected, and for others it was accepted. To choose
among the conflicting evidence, the test statistic for serial
correlation (See Section 3.6) was computed for several values of c. On
this basis c was specified, and a single covariance estimate for the
Wald test was chosen. The covariance estimate chosen for the Wald test
was also used to compute the asymptotic standard errors of the parameter
estimates.
The test statistic for serial correlation depends on the
correlation between linear combinations of the components of f_t(θ_p^0) and
linear combinations of the components of f_{t-c}(θ_p^0). Therefore, the
conclusion of the test depends on how the linear combinations are
chosen, or in other words, on the specified weights w_it (see page 36).
For example, the test might reject H_o for some sets of weights, and not
reject H_o for others. To help cope with this difficulty, it was
decided to choose the weights randomly from a uniform distribution on
the interval (0,1). If there is a finite or countable number of sets of
weights such that H_o is incorrectly rejected or accepted, then choosing
the weights from a continuous distribution ensures that such weights
are chosen with probability zero. The weights were generated from a
uniform distribution by a SAS random number generator.
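A minimal sketch of the randomized-weights idea, with NumPy's uniform generator standing in for the SAS random number generator; the series playing the role of f_t and the lag c = 3 are synthetic illustrations:

```python
import numpy as np

rng = np.random.default_rng(1986)
n = 72

# Synthetic stand-in for the components of f_t(theta) (hypothetical data).
f_t = rng.normal(size=(n, 2))
f_tc = np.roll(f_t, 3, axis=0)  # components lagged by c = 3 periods

# Draw the weights from U(0,1) so that any fixed (measure-zero) set of
# "bad" weights is selected with probability zero.
w = rng.uniform(0.0, 1.0, size=2)
combo_t = f_t @ w    # linear combination at time t
combo_tc = f_tc @ w  # linear combination at time t - c

sample_corr = np.corrcoef(combo_t, combo_tc)[0, 1]
assert combo_t.shape == (n,)
assert -1.0 <= sample_corr <= 1.0
assert np.all((w >= 0.0) & (w < 1.0))
```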
4.4 The Results
The model was estimated under the assumption that the error terms
are independent normal variates with constant variances, but are not
necessarily serially independent. First the LS method was applied.
The LS estimates are reported in the first column of Table 1, and were
used as starting values to obtain the ML estimates presented in the
second and third columns. A computer program was written with the SAS
"Matrix Procedure" for the purpose of maximizing the likelihood
functions; the program uses the quadratic hill-climbing technique
presented in Goldfeld, Quandt, and Trotter (1966). In Appendix A.4 we
describe the quadratic hill-climbing technique, and show that consistent
initial estimates ensure that the second-round estimates obtained from
the technique have the same asymptotic distribution as the partial MLE.
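The core idea of quadratic hill-climbing, shifting the Hessian so that each step is a well-defined ascent direction even where the Hessian is not negative definite, can be sketched as follows. This is a simplified variant applied to a toy concave objective, not the dissertation's likelihood:

```python
import numpy as np

def quadratic_hill_climb(grad, hess, theta0, steps=100, tol=1e-8):
    """Sketch of the Goldfeld-Quandt-Trotter idea: subtract alpha*I from
    the Hessian so that H - alpha*I is negative definite at every step."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        g = grad(theta)
        if np.linalg.norm(g) < tol:
            break
        H = hess(theta)
        # Shift the spectrum so every eigenvalue of H - alpha*I is < 0.
        alpha = max(0.0, np.linalg.eigvalsh(H).max() + 1.0)
        theta = theta - np.linalg.solve(H - alpha * np.eye(len(theta)), g)
    return theta

# Toy concave "log likelihood": L(t) = -(t1 - 1)^2 - 2*(t2 + 0.5)^2
grad = lambda t: np.array([-2.0 * (t[0] - 1.0), -4.0 * (t[1] + 0.5)])
hess = lambda t: np.array([[-2.0, 0.0], [0.0, -4.0]])

est = quadratic_hill_climb(grad, hess, [0.0, 0.0])
assert np.allclose(est, [1.0, -0.5], atol=1e-6)
```

On a strictly concave quadratic the shift is zero and the step reduces to a Newton step, which reaches the maximizer in one iteration.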
The estimates in column two of Table 1 maximize the likelihood
subject to (p11, p10)=(1,0), or equivalently, under the assumption that
the direction of the price change separates the sample into the
underlying supply and demand regimes. Unlike previous studies, a test of
this hypothesis was carried out. The constrained estimates were used to
construct the Lagrange multiplier (LM) statistic. The LM statistic was
computed with twelve different covariance estimates (c=1,...,12). As
the figures in Table 2 indicate, the hypothesis of known sample
separation is rejected. The conclusion of the LM test has two important
implications for the analysis of the data and model. First it suggests
that the price change alone should not be used to determine whether the
sample period was characterized by excess demand, excess supply, or
both. In most disequilibrium studies this type of analysis is routinely
done. Second, as was shown in Section 2.1, incorrect sample separation
adversely affects the large-sample properties of the estimators. In
view of this problem the constrained estimates are suspect.
The next estimation was performed over the unconstrained space, and
consequently p11 and p10 were estimated along with the other parameters.
In this case all of the initial consistent estimates were employed, and
therefore the estimates in column three represent the consistent and
asymptotically normal solution. The ML estimates differ little from the
LS estimates because iteration was stopped before complete convergence
to a maximum of the likelihood function. The
iterative technique performed poorly for the unconstrained likelihood in
the sense that the speed of convergence was extremely slow. For this
reason, the final estimates were obtained from the 100th iteration where
the gradient is not significantly close to zero, and therefore are not
true ML estimates. However, since the initial estimates are consistent,
estimates obtained after the second iteration are asymptotically
equivalent to the ML estimates, and therefore nothing is lost by
stopping iteration before convergence, at least asymptotically. Further
details regarding this point are provided in Appendix A.4. The
particular specification chosen for the model performed well in the
sense that all of the estimates are of the correct sign, and most are
significant. The estimates of p11 and p10 are .8179 and .2455,
respectively, which means there is (1) an 81.79% chance of a price
increase and an 18.21% chance of a decrease during shortages, and (2) a
75.45% chance of a decrease and a 24.55% chance of an increase during
surpluses.
To select a covariance estimator for the Wald test of H_o: p11 = 1 - p10,
the serial correlation statistic was computed for c=1,2,3. (See Table
3.) The hypothesis c=3 was accepted. The Wald test statistic did
not reject the hypothesis H_o: p11 = 1 - p10 (see Table 4), suggesting that
price adjustments are symmetrical.
The differences which arise when the imperfect sample separation
given by the price change is ignored can be seen by comparing columns
two and three of Table 1. While both sets of estimates give the correct
signs for the supply and demand variables, the unconstrained estimates
suggest that demand and supply are less responsive to price changes than
do the constrained estimates. The unconstrained estimate of the price
parameter for the supply equation is approximately 40% less than the
constrained estimate, and the price coefficient for the demand equation
is approximately 14% smaller in absolute value for the unconstrained
estimate. Given the rejection of the known sample separation model,
however, we are more inclined to believe the unconstrained estimates.
The problem of determining whether the period 1979-84 was
characterized by excess demand or supply was also addressed with the
unconstrained estimates. This was accomplished by estimating the
probability of excess demand for each t conditional on the quantity
transacted and the direction of the price change. The expression for
this conditional probability is
Pr(D_t>S_t | Q_t, 1(Δp_{t+1}>0)) = (p11 g_st)^{1(Δp_{t+1}>0)} ((1-p11) g_st)^{1(Δp_{t+1}<=0)} / f_t(Q_t, 1(Δp_{t+1}>0)).
The results are reported in Table 5. As pointed out by Lee and Porter
(1984), the classification rule: Q_t=S_t if Pr(D_t>S_t | Q_t, 1(Δp_{t+1}>0)) > .5,
and Q_t=D_t otherwise, is optimal in the sense that it minimizes the
probability of misclassification. Applying this rule, we find that
54.12% of the observations are excess demand and 45.88% excess supply.
In contrast, if one were to rely solely on the direction of the price
change, the conclusion would be 31.9% excess demand, 43.1% excess
supply, and, for 25% of the observations, Δp_{t+1}=0. In Table 6, the
compatibility of the direction of the price change with the optimal
classification rule is further examined. Comparing the two rules,
excluding the observations for which Δp_{t+1}=0, we find that 9
observations out of 54 are classified differently.
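The Lee-Porter classification rule can be sketched directly. The posterior probabilities below are hypothetical illustrations, not the values of Table 5:

```python
import numpy as np

# Hypothetical posterior probabilities Pr(D_t > S_t | Q_t, 1(dp_{t+1} > 0));
# these are illustrative values, not those of Table 5.
post = np.array([0.92, 0.40, 0.55, 0.13, 0.78, 0.61, 0.49])

# Lee-Porter rule: classify Q_t = S_t (excess demand) when the posterior
# exceeds one half; this minimizes the probability of misclassification.
excess_demand = post > 0.5

assert excess_demand.tolist() == [True, False, True, False, True, True, False]
assert abs(excess_demand.mean() - 4.0 / 7.0) < 1e-12
```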
Table 1
Estimated parameters and statistics
(Asymptotic standard errors in parentheses; n = 72)

Variable         LS initial estimates   MLE (p11=1, p10=0)   MLE unconstrained
demand const.          79.6508               40.5262          79.6509  (169.61)
RL-RA                  14.9764               17.2779          14.9758  (2.918)
IP_{-1}                 2.2856                2.5429           2.2938  (1.170)
σ²_e1                 367.7335             2140.5700         367.7344  (94.36)
supply const.          60.6708               74.9844          60.6709  (145.87)
RL-RT                   4.4981                7.3034           4.4985  (0.3834)
TD                      0.3176                0.3266           0.3288  (0.982)
σ²_e2                1197.7623               77.4408        1197.7622  (87.40)
p11                     0.8526                1.0000           0.8178  (.0673)
p10                     0.2571                0.0000           0.2454  (.2752)
log likelihood            --                355.9850         317.3710
Table 2
Test of H_o: (p11, p10) = (1, 0)

V_n(θ̂_n; c)   LM Statistic   H_o rejected at α% level
c=1             16.7693          0.020%
c=2             28.8588          0.001%
c=3             18.9718          0.008%
c=4             65.5703          0.001%
c=5             22.9532          0.001%
c=6             17.3450          0.017%
c=7             10.7834          0.455%
c=8             12.2467          0.219%
c=9             14.4707          0.072%
c=10             5.7286          5.702%
c=11             6.5377          3.805%
c=12             6.5118          3.854%
Table 3
Test of H_o: c = c_o

c_o   Serial Correlation Statistic   H_o rejected at α% level
1             32.7552                      0.030%
2             14.2697                     16.104%
3             10.6509                     38.540%
Table 4
Test of H_o: p11 = 1 - p10

V_n(θ̂_n; c)   Wald Statistic   H_o rejected at α% level
c=3              0.0550411          94.34%
Table 6
Compatibility of the Direction of the Price Change with the
Optimal Classification Rule

                Pr(D_t>S_t | Q_t, 1(Δp_{t+1}>0)) > .5   Pr(D_t>S_t | Q_t, 1(Δp_{t+1}>0)) <= .5
Δp_{t+1} > 0
Δp_{t+1} < 0
Δp_{t+1} = 0

(Cell counts are not legible in the source.)
CHAPTER 5
SEMIPARAMETRIC ESTIMATION OF DISEQUILIBRIUM MODELS USING THE
METHOD OF MAXIMUM SCORE
5.1 Introduction
We consider an alternative estimation strategy not previously
analyzed for a disequilibrium model. The strategy is the so-called
"semiparametric" estimation developed in Manski (1975), Cosslett (1983),
Powell (1984), Manski (1985), and some others. Semiparametric
estimators have been shown to be consistent under more general
conditions than the conventional LS and ML estimators, and therefore
require fewer prior restrictions. For a number of cases where
consistent LS and ML estimation requires the functional form of the error
distribution, consistent semiparametric estimators have been derived
without imposing a functional form. Powell did so for the censored
regression model using the method of least absolute deviations, Cosslett
derived a distribution-free ML estimator for the binary choice model,
and Manski derived consistent estimators for the same model using the
method of maximum score. Semiparametric estimation is most useful when
parametric assumptions cannot be trusted, but are needed for consistent
LS and ML estimation. In particular, it offers an improved strategy for
estimating disequilibrium models.
We derive consistent semiparametric estimators for disequilibrium
models using the method of maximum score of Manski (1975, 1985).
Consistent score estimators are derived for the following situations:
the functional forms of the error distributions are unknown, the
quantity transacted is an unknown function of supply and demand, and the
price change is an unknown function of excess demand. The presentation
comprises three models and their score estimators. The models we
consider are all of the following form:
M (model): Given the supply and demand equations D_t = β_1^0'x_t + ε_1t and
S_t = β_2^0'x_t + ε_2t, the iid sequence of random vectors (Q_t, Δp_{t+1}, x_t), the
event S_pq involving either Δp_{t+1} or Q_t, and the event S_x involving x_t:
Pr(S_pq | S_x; β_1^0, β_2^0) > Pr(S_pq^c | S_x; β_1^0, β_2^0), and
Pr(S_pq^c | S_x^c; β_1^0, β_2^0) > Pr(S_pq | S_x^c; β_1^0, β_2^0),
where S^c denotes the complement of the event S. General, intuitive
considerations motivate the specification of (S_pq, S_x) for each model.
For example, the intuition that an expected shortage (excess demand) is
a better predictor of a positive price change than an expected surplus
motivates the model in Section 5.2. Given the model, consistent
estimation depends on general continuity and identification assumptions
which do not require prior knowledge of the functional forms of the
underlying distribution functions or explicit equations for quantity or
price.
The model in Section 5.2 concerns events involving the price
change, Δp_{t+1} = p_{t+1} - p_t, and expected excess demand, β^0'x_t = β_1^0'x_t - β_2^0'x_t,
or more specifically, the binary variables 1(Δp_{t+1}>0) and 1(β^0'x_t>0),
where 1(·) denotes the indicator function. The model maintains that
given 1(β^0'x_t>0), the best forecast of 1(Δp_{t+1}>0) corresponds to
1(Δp_{t+1}>0) = 1(β^0'x_t>0). A score estimator of β^0 is defined and
assumptions for consistency given. The model resembles the binary
response model studied by Manski (1975, 1985), and shares an
identification problem: β^0 is only identified up to an unknown
multiplicative scalar.
The model in Section 5.3 is a more restrictive version of that in
Section 5.2, but retains a considerable amount of generality. The model
is designed to exploit the fully observable Apt+1 (versus l(Ap t+>0)) to
identify B. A consistent score estimator is presented, and we show
that 6 is identified without a loss of scale. The model represents a
completely new application for maximum score estimation as it differs
significantly from the model studied by Manski.
The estimators presented in Sections 5.2 and 5.3 do not depend on
the quantity transacted, Q_t, and therefore impose no restrictions on it.
By neglecting the observations on Q_t, however, this generality involves a
loss of information. In Section 5.4 we specify a model for Q_t and
define a corresponding score estimator. The specification, however, is
insufficient to identify β^0 (even up to a multiplicative scalar) without
severely restricting the distribution of x_t. To eliminate the
identification problem, the models of the previous sections are added to
the specification, and the estimator is redefined. The resulting
estimator uses the entire sample (Q_t, Δp_{t+1}, x_t), t=1,...,n, and is shown
to be consistent under general conditions.
5.2 A Directional Model and Consistent Estimation Up to Scale
The directional model restricts the direction of the price change
to be most likely, but not certain, to follow the sign of expected excess
demand, or equivalently:
M5.1 (directional model): Pr(Δp_{t+1}>0 | β^0'x_t>0) > Pr(Δp_{t+1}<=0 | β^0'x_t>0), and
Pr(Δp_{t+1}<=0 | β^0'x_t<=0) > Pr(Δp_{t+1}>0 | β^0'x_t<=0).
The motivation for M5.1 is its compatibility with an intuitively
appealing forecast procedure: if a shortage is expected at time t,
β^0'x_t>0, then predict a positive price change, Δp_{t+1}>0; otherwise,
predict a nonpositive change. Given M5.1, the number of correct
forecasts must eventually exceed the number incorrect.
The forecast procedure in turn motivates a strategy for estimating
β^0 from n observations on (Δp_{t+1}, x_t): choose as an estimate of β^0 a
value β that maximizes the proportion of the observations characterized
by 1(Δp_{t+1}>0) = 1(β'x_t>0). This is the method of maximum score. We
propose the score estimator:
β̂_n = arg max_{β in B} g_n(β), where g_n(β) = n^{-1} Σ_{t=1}^n g_t(β), and
g_t(β) = 1(Δp_{t+1}>0)1(β'x_t>0) + 1(Δp_{t+1}<=0)1(β'x_t<=0).
The function g_t(·) "scores" one if a candidate β implies a forecast
compatible with the maintained model, M5.1, and zero otherwise.
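A minimal numerical sketch of the maximum score estimator, on simulated data satisfying M5.1; the true coefficients and the crude search over unit-length directions are illustrative assumptions, not part of the formal theory:

```python
import numpy as np

def score_fn(beta, dp, x):
    """Proportion of observations with 1(dp_{t+1} > 0) = 1(beta'x_t > 0)."""
    idx = x @ beta
    return np.mean(((dp > 0) & (idx > 0)) | ((dp <= 0) & (idx <= 0)))

rng = np.random.default_rng(7)
n = 1000
x = rng.normal(size=(n, 2))
beta0 = np.array([1.0, -2.0])                       # hypothetical truth
dp = np.sign(x @ beta0) + rng.normal(0.0, 0.3, n)   # noisy price changes

# Crude maximization over random directions on the unit circle; the
# normalization |beta| = 1 is the scale normalization discussed below.
cand = rng.normal(size=(2000, 2))
cand /= np.linalg.norm(cand, axis=1, keepdims=True)
scores = np.array([score_fn(b, dp, x) for b in cand])
b_hat = cand[int(scores.argmax())]

# Up to scale, b_hat should point in (nearly) the direction of beta0.
direction0 = beta0 / np.linalg.norm(beta0)
assert b_hat @ direction0 > 0.8
```

The objective is a step function of β, so gradient methods do not apply; in practice one uses direct search, as here, or combinatorial methods.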
Manski (1985) presents a consistent score estimator for a model of
the form MED(y|x) = b'x, where MED(z) denotes the median of the random
variable z. His consistency proof, however, depends on the weaker
model: Pr(y>0 | b'x>0) > Pr(y<=0 | b'x>0) and Pr(y<=0 | b'x<=0) > Pr(y>0 | b'x<=0).
We have postulated our model in the weaker form for two reasons. First,
the weaker model is easy to interpret as a price adjustment model;
positive price changes occur most frequently with expected shortages,
and negative changes with expected surpluses. Second, but not less
important, MED(Δp_{t+1} | x_t) = β^0'x_t is unnecessarily restrictive.
Manski's consistency proof (1985, p. 323) is directly applicable
to β̂_n, assuming appropriate regularity conditions are met. Theorem 5.2
below provides assumptions that imply β̂_n converges to β^0 almost
everywhere (a.e.) as n becomes indefinitely large.
Theorem 5.2. In addition to M5.1 assume:
A5.3 (continuity): E(g_t(β)) = g(β) is continuous in β on a compact set B.
A5.4 (identification): The set A_x(β) = {x: sgn(β^0'x) ≠ sgn(β'x)}¹ has
positive probability for all β in B such that β ≠ β^0.
Then lim β̂_n = β^0 a.e.
Proof:
Step 1. Uniform convergence.
The proof of uniform convergence uses the argument presented in Manski
(1985, pp. 321-2). Observe that
g_n(β) = P_n(Δp_{t+1}>0, β'x_t>0) + P_n(Δp_{t+1}<=0, β'x_t<=0), and
g(β) = P(Δp_{t+1}>0, β'x_t>0) + P(Δp_{t+1}<=0, β'x_t<=0),
where P_n and P represent the empirical and true distributions. Therefore,
the generalized Glivenko-Cantelli theorem of Rao (1962, Theorem 7.2)
implies
lim sup_{β in B} |g_n(β) - g(β)| = 0 a.e.
Step 2. Identification.
M5.1 and A5.4 imply that β^0 uniquely maximizes g(β). To see this,
consider
E(g_t(β^0) - g_t(β)) = ∫_{A_x^c(β)} E(g_t(β^0) - g_t(β) | x_t) dF_x
+ ∫_{A_x(β)} E(g_t(β^0) - g_t(β) | x_t) dF_x,
where A_x^c(β) denotes the complement of A_x(β), and F_x the distribution
function of x. The first term on the right-hand side vanishes given the
definition of g_t, and under M5.1 the second term is strictly positive.
Step 3. lim β̂_n = β^0 a.e.
Given A5.3, Step 1, and Step 2, a.e. convergence follows from Theorem 2
of Manski (1983).
Q.E.D.
The assumptions permit a fairly general disequilibrium model. The
consistency proof does not depend on the distributions of ε_1t and ε_2t,
or on how the market determines the quantity transacted. Consistency
depends on a price adjustment model which enters without an explicit
adjustment equation, or a known functional form for the probability
distribution of prices. It suffices to believe that an expected
shortage (surplus) is a better predictor of a positive (nonpositive)
price change than an expected surplus (shortage).
The generality of the assumptions, however, has costs. In
particular, a careful examination of A5.4 reveals that β^0 is only
identified up to an arbitrary scale factor. The identification problem
results from the failure of the obvious, but necessary, condition that
A_x(β) be nonempty for all β ≠ β^0. Observe that for any λ>0 we have
sgn(λβ^0'x) = sgn(β^0'x) for all vectors x, and therefore A_x(λβ^0) is an
empty set. Thus, if points of the form β = λβ^0 are included in the
parameter space, B, then A5.4 fails, as does identification (Step 2).
Manski (1985) resolves the problem by normalizing the parameter space
with respect to scale, which effectively eliminates the troublesome
points. Scale normalization suffices for A5.4, but the conclusion of
Theorem 5.2 becomes lim β̂_n = λβ^0 a.e., where λ is an unknown scalar.²
The loss of scale can be interpreted as arising from insufficient
information. The directional model represents prior information on the
stochastic behavior of the signs of Δp_{t+1} and β^0'x_t, but not their
magnitudes; by construction the estimator depends only on the signs.
The limited information permits a fairly general model, but limits what
can be learned about β^0. We shall see next that the loss of scale can
be eliminated by imposing assumptions on the magnitudes of Δp_{t+1} and
β^0'x_t. At the same time it is possible to retain a considerable amount of
generality.
5.3 A Price Adjustment Model with β^0 Identified (Without a Loss of
Scale)
Manski (1985) discusses the score estimator for a binary response
model where the dependent variable, y*, is unobservable, and the sample
consists of observations on 1(y*>0). In the last section the price
change was treated analogously to obtain a robust method of estimation.
Unlike the problem considered by Manski, however, Δp_{t+1} is generally
observable. To take advantage of the extra information, and thus obtain
a stronger result, we propose the following model.
M5.5 (directional-magnitude model): for appropriately specified numbers
ε>0 and δ>0,
Pr(Δp_{t+1}>ε | β^0'x_t>δ) > max(Pr(|Δp_{t+1}|<=ε | β^0'x_t>δ), Pr(Δp_{t+1}<-ε | β^0'x_t>δ)),
Pr(|Δp_{t+1}|<=ε | |β^0'x_t|<=δ) > max(Pr(Δp_{t+1}>ε | |β^0'x_t|<=δ), Pr(Δp_{t+1}<-ε | |β^0'x_t|<=δ)),
Pr(Δp_{t+1}<-ε | β^0'x_t<-δ) > max(Pr(Δp_{t+1}>ε | β^0'x_t<-δ), Pr(|Δp_{t+1}|<=ε | β^0'x_t<-δ)).
The directional-magnitude model quantifies the notion that large (small)
discrepancies between expected buy and sell decisions are most likely to
lead to relatively large (small) price changes. The model predicts a
small price change (|Δp_{t+1}|<=ε) if the expected market position lies
within a specified interval centered at equilibrium (|β^0'x_t|<=δ), and
larger changes (|Δp_{t+1}|>ε) otherwise.
Compared to M5.1, the model M5.5 is more restrictive as it
restricts both the direction and magnitude of the price change. We
shall see, however, that M5.5 distinguishes β^0 from λβ^0, and thus it
becomes meaningful to discuss estimators that converge unambiguously to
β^0.
Given M5.5 we define a score estimator of β^0 as follows:
β̂_n = arg max_{β in B} h_n(β), where h_n(β) = n^{-1} Σ_{t=1}^n h_t(β), and
h_t(β) = 1(Δp_{t+1}>ε)1(β'x_t>δ) + 1(|Δp_{t+1}|<=ε)1(|β'x_t|<=δ)
+ 1(Δp_{t+1}<-ε)1(β'x_t<-δ).
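The scoring function h_t can be sketched directly; the thresholds ε = δ = 0.1 and the data below are hypothetical illustrations:

```python
import numpy as np

def h_t(beta, dp, x, eps=0.1, delta=0.1):
    """Score for M5.5: large positive index with a large positive change,
    small index with a small change, large negative index with a large
    negative change (thresholds eps and delta are illustrative)."""
    idx = x @ beta
    s = (((dp > eps) & (idx > delta))
         | ((np.abs(dp) <= eps) & (np.abs(idx) <= delta))
         | ((dp < -eps) & (idx < -delta)))
    return s.astype(int)

dp = np.array([0.5, 0.02, -0.4])
x = np.array([[1.0, 0.0], [0.05, 0.0], [-1.0, 0.0]])
beta = np.array([1.0, 0.0])
assert h_t(beta, dp, x).tolist() == [1, 1, 1]

# Rescaling beta changes the score: scale matters under M5.5, which is
# why beta^0 can be identified without a scale normalization.
assert h_t(20.0 * beta, dp, x).tolist() == [1, 0, 1]
```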
To prove lim β̂_n = β^0 a.e. using the arguments in the proof of Theorem 5.2,
the relevant assumptions are:
A5.6 (continuity): h(β) is continuous in β on a compact set B.
A5.7 (identification): The set J_x(β) = {x: sgn(β^0'x - δ) ≠ sgn(β'x - δ)}
has positive probability for all β in B such that β ≠ β^0.
The important difference between the above assumptions and those of
Section 5.2 lies in the identification assumptions A5.4 and A5.7.
Specifically, assumption A5.7 does not require a normalized parameter
space since there generally exist vectors x such that sgn(β^0'x - δ) ≠
sgn(β'x - δ) for β ≠ β^0; i.e., the set J_x(β) is nonempty for β ≠ β^0.
Therefore, it is possible to restrict the distribution of x so that
J_x(β) has positive probability for β ≠ β^0, and to identify β^0 without a
loss of scale. We summarize the result in the following theorem.
Theorem 5.8. Suppose the ith component of the vector β^0 is nonzero.
Then for all β such that β ≠ β^0 and β_i ≠ 0, the set J_x(β) is nonempty.
Proof:
It suffices to show that there exists at least one solution x to the
system of linear equations M(β^0, β)x = r, where M(β^0, β) is the 2 x k
matrix with rows β^0' and β', and
r = (y^0, y)', with y^0 > δ > y > 0 or y > δ > y^0.
The existence of x is equivalent to rank(M(β^0, β)) = rank(M(β^0, β) | r).
If rank(M(β^0, β)) = 2, then the proof is complete. If rank(M(β^0, β)) = 1,
then β = cβ^0 with c = β_i/β_i^0 different from 0 and 1, and we need
y/y^0 = c. The existence of such points y^0 and y follows immediately
since
{y/y^0: y^0 > δ > y > 0} = (0, 1), and
{y/y^0: y > δ > y^0} = (-∞, 0) ∪ (1, ∞).
Q.E.D.
5.4 Maximum Score Estimation of Models That Include the Quantity
Transacted
The estimators presented in Sections 5.2 and 5.3 do not depend on
the observed quantity transacted, Q_t, and therefore neglect relevant
sample information. In this section we propose a model for Q_t, and
define a score estimator of β^0 that depends on n observations of Q_t. We
shall see, however, that the model for Q_t is insufficient to identify β^0
(even up to a multiplicative scalar). We resolve the identification
problem by combining the model for Q_t with the price adjustment models
described in Sections 5.2 and 5.3. The score estimator we define for
the combined model uses the entire sample (Q_t, Δp_{t+1}, x_t), t=1,...,n, and
therefore can be expected to be more efficient than the estimators of
Sections 5.2 and 5.3.
The observations on the quantity transacted are modeled as follows:
M5.9 (quantity model): For some given δ>0,
Pr(Q_t>δ | β_1^0'x_t > δ, β_2^0'x_t > δ) > Pr(Q_t<=δ | β_1^0'x_t > δ, β_2^0'x_t > δ),
and
Pr(Q_t<=δ | β_1^0'x_t <= δ, β_2^0'x_t <= δ) > Pr(Q_t>δ | β_1^0'x_t <= δ, β_2^0'x_t <= δ).
Two appealing assumptions that are sufficient for M5.9, and therefore
motivate it, are
A5.10 Q_t = min(D_t, S_t).
A5.11 MED(ε_1t) = MED(ε_2t) = 0, and ε_1t and ε_2t are independent.
Assumption A5.11 requires only independent error terms with
distributions symmetrical about zero.
To construct an estimator of β^0 given the quantity model, we define
the scoring function:
q_t(β) = 1(Q_t>δ)1(β_1'x_t>δ, β_2'x_t>δ) + 1(Q_t<=δ)1(β_1'x_t<=δ, β_2'x_t<=δ).
To prove consistency for a maximizer of q_n(β) using the arguments in the
proof of Theorem 5.2, the relevant assumptions are:
A5.12 (continuity): q(β) is continuous in β on a compact set B.
A5.13 (identification):
(i) The set U_x(β) = {x: sgn(β_1^0'x - δ) ≠ sgn(β_1'x - δ), sgn(β_2^0'x - δ) ≠
sgn(β_2'x - δ)} has positive probability for all β in B such that (β_1, β_2) ≠
(β_1^0, β_2^0).
(ii) The set Z_x(β) = {x: sgn(β_1'x - δ) ≠ sgn(β_2'x - δ)} has zero
probability.
The role of assumption A5.13 in proving consistency is analogous to that
of the previous identification assumptions A5.4 and A5.7. The two parts
of A5.13 imply that β^0 uniquely maximizes q(β). Part (i) compares to
the familiar order condition needed for identification in the
textbook simultaneous equation framework. For example, if the supply
and demand equations have no explanatory variables in common, and δ>0,
then Theorem 5.8 implies that U_x(β) is nonempty for β ≠ β^0.³ To see the
role of part (ii), suppose that the sets Z^cU = {Z_x^c(β^0) ∩ U_x(β)}, Z^cU^c,
ZU, and ZU^c each have positive probability for some β ≠ β^0. Then we can
write
E(q_t(β^0) - q_t(β)) = ∫_{Z^cU} E(q_t(β^0) - q_t(β) | x_t) dF_x
+ ∫_{Z^cU^c} E(q_t(β^0) - q_t(β) | x_t) dF_x + ∫_{ZU} E(q_t(β^0) - q_t(β) | x_t) dF_x
+ ∫_{ZU^c} E(q_t(β^0) - q_t(β) | x_t) dF_x.
It can be readily verified that the first term on the right-hand side is
positive, the second is nonnegative, the third is zero, and the last
term is negative. Therefore, given the negativity of the last term,
β ≠ β^0 does not necessarily imply E(q_t(β^0) - q_t(β)) > 0. To rule out this
possibility, we impose part (ii).
The requirement that Z_x(β) have zero probability, however, is too
restrictive to be generally applicable. It is difficult to imagine a
situation where such an assumption would be appropriate. Therefore,
unless one is willing to severely restrict the distribution of x_t, the
model M5.9 is insufficient to identify β^0. Assumption A5.13(ii) can be
relaxed, however, by combining the model for Q_t with the price adjustment
model of Section 5.2, and constructing a score estimator that exploits
both models. For this purpose we assume that the price adjustment model
M5.1 holds in addition to M5.9, and consider the scoring function:
q*_t(β, β^0) = 1(Z_x^c(β^0)) q_t(β) + 1(Z_x(β^0)) p_t(β),
where p_t(β) = 1(Δp_{t+1}<=0)1(β_1'x_t<=δ, β_2'x_t>δ) + 1(Δp_{t+1}>0)1(β_1'x_t>δ, β_2'x_t<=δ),
1(Z_x^c(β^0)) = 1(x_t in Z_x^c(β^0)), and Z_x^c(β^0) denotes the complement of Z_x(β^0).
Generally Z_x(β^0) will be unknown, but if a consistent estimate, say β̂_n,
is available, then Z_x(β^0) can be replaced by Z_x(β̂_n). One possible choice
for β̂_n is the estimator presented in Section 5.3. This forms the basis
for a "total" sample estimator of β^0:
β̃_n = arg max_{β in B} q*_n(β, β̂_n).
To show that β̃_n converges to β^0 a.e. we prove:
Theorem 5.14. Let lim β̂_n = β^0 a.e., and β̂_n in B for all n. In addition to
M5.1 and M5.9 assume:
(continuity): q*(β, β') is continuous in both arguments on a compact set
B.
(identification): Assumption A5.13(i) holds.
Then lim β̃_n = β^0 a.e.
Proof:
Step 1. Uniform convergence.
The proof is similar to Step 1 of Theorem 5.2. Theorem 7.2 of Rao
(1962) implies
lim sup_{β, β' in B} |q*_n(β, β') - q*(β, β')| = 0 a.e.
Step 2. Identification.
Let d_t(β, β^0) = q*_t(β^0, β^0) - q*_t(β, β^0). We will show that β ≠ β^0 implies
d(β, β^0) > 0. Consider
d(β, β^0) = ∫_{UZ} E(d_t(β, β^0) | x_t) dF_x + ∫_{UZ^c} E(d_t(β, β^0) | x_t) dF_x
+ ∫_{U^cZ} E(d_t(β, β^0) | x_t) dF_x + ∫_{U^cZ^c} E(d_t(β, β^0) | x_t) dF_x,
where UZ = {U_x(β) ∩ Z_x(β^0)}, UZ^c = {U_x(β) ∩ Z_x^c(β^0)},
U^cZ = {U_x^c(β) ∩ Z_x(β^0)}, and U^cZ^c = {U_x^c(β) ∩ Z_x^c(β^0)}. That β ≠ β^0 implies
d(β, β^0) > 0 follows from the first two terms being positive, and the
last two nonnegative. We will prove this for the first and last terms
only; the proof for the remaining terms is similar.
Consider the first term, and assume without loss of generality that
β_1^0'x_t - δ < 0 and β_2^0'x_t - δ > 0, and thus (β_1^0 - β_2^0)'x_t < 0. Since x_t
is in U_x(β), we have β_1'x_t - δ > 0 and β_2'x_t - δ < 0. Therefore,
E(d_t(β, β^0) | x_t in UZ) = Pr(Δp_{t+1} <= 0 | x_t) - Pr(Δp_{t+1} > 0 | x_t) > 0,
where the inequality follows from (β_1^0 - β_2^0)'x_t < 0 and M5.1.
For x_t in U^cZ^c assume without loss of generality that β_1^0'x_t - δ > 0
and β_2^0'x_t - δ > 0. Since x_t is in U_x^c(β), we have β_1'x_t - δ > 0 and
β_2'x_t - δ > 0, or β_1'x_t - δ > 0 and β_2'x_t - δ < 0, or β_1'x_t - δ < 0 and
β_2'x_t - δ > 0. Therefore, evaluating the conditional expectation case by
case, we find
E(d_t(β, β^0) | x_t in U^cZ^c) = Pr(Q_t > δ | x_t) - Pr(Q_t > δ | x_t) = 0, or
= Pr(Q_t > δ | x_t) > 0.
Step 3. lim sup_{β in B} |q*_n(β, β̂_n) - q*(β, β^0)| = 0 a.e.
Let γ > 0 be given. Step 1 implies
sup_{β in B} |q*_n(β, β̂_n) - q*(β, β̂_n)| < γ/2 a.e.
for sufficiently large n. The continuity of q*, and the compactness of
B, imply
sup_{β in B} |q*(β, β̂_n) - q*(β, β^0)| < γ/2 a.e.
for sufficiently large n, since lim β̂_n = β^0 a.e. Applying the triangle
inequality we get
sup_{β in B} |q*_n(β, β̂_n) - q*(β, β^0)| < γ a.e.
for sufficiently large n, which is the desired result.
Step 4. lim β̃_n = β^0 a.e.
Let N be an open neighborhood of β^0 and define
ε = q*(β^0, β^0) - sup_{β not in N} q*(β, β^0) > 0,
where the existence of ε follows from Step 2 and the compactness of B.
Now Step 3 implies
q*(β̃_n, β^0) > q*_n(β̃_n, β̂_n) - ε/2 >= q*_n(β^0, β̂_n) - ε/2 a.e. (5.15)
for large n, since β̃_n maximizes q*_n(·, β̂_n). Step 3 also implies
q*_n(β^0, β̂_n) > q*(β^0, β^0) - ε/2 a.e. (5.16)
for large n. Combining (5.15) and (5.16) we get
q*(β̃_n, β^0) > q*(β^0, β^0) - ε = sup_{β not in N} q*(β, β^0) a.e.,
and therefore β̃_n is in N a.e. for sufficiently large n.
Q.E.D.
NOTES
¹The signum function, sgn(·), is defined as follows: sgn(z) = 1 if
z > 0, and sgn(z) = -1 if z <= 0.
²Another significant cost is that no distributional theory for
maximum score estimators is currently known.
³Other comparisons with the so-called order condition for
identification are much more complicated, and beyond the scope of this
paper.
CHAPTER 6
CONCLUDING REMARKS AND DIRECTIONS FOR FURTHER RESEARCH
In this thesis, I have proposed several new solutions to the
problem of generalizing disequilibrium models and their estimators. The
empirical example in Chapter 4 demonstrates how to implement many of
these solutions in practice. However, as we have seen, while some of the
solutions solve old problems, they also introduce new complications.
For example, while the methods presented in Chapter 3 eliminate the need
to specify a parametric model for serial correlation, they also
introduce the complication of having to choose a single covariance
estimator from several candidates. Clearly, some of the results fall
short of completely generalizing disequilibrium models and their
estimators; there is a tradeoff. I believe, however, that this thesis
accomplishes more than merely shifting the problems faced by empirical
studies from old ones to new ones. In particular, it provides a solid
foundation for further research by clarifying many of the issues
involved. The following is a partial list of directions for further
research on the problem of generalizing disequilibrium models and their
estimators:
(1) the consequences of restricting the conditional probabilities
Pr(Δp_{t+1}>0 | D_t>S_t) and Pr(Δp_{t+1}>0 | D_t<S_t) to be constant with
respect to t, and how to relax this restriction;
(2) the problem of finding an optimal covariance estimator when the
serial correlation is modeled by mixing conditions;
(3) the power properties of the serial correlation test in Section 3.5;
(4) the small sample properties of estimators obtained from starting
iterative techniques with consistent estimates, but stopping
iteration before convergence;
(5) numerical studies examining the properties of the maximum score
estimators for disequilibrium models relative to parametric
estimators.
APPENDIX
A.1 Inconsistency and Misclassified Observations
We will show that constraining the direction of the price change
1(Δp_{t+1}>0) to separate the sample into the underlying demand (Q_t=D_t) and
supply (Q_t=S_t) regimes, when in fact 1(Δp_{t+1}>0) misclassifies
observations with positive probability, leads to inconsistent estimates.
Consider the estimator θ̂_n(1,0) which solves the problem
max over (θ, p11, p10) of L_n(θ, p11, p10) subject to (p11, p10) = (1, 0),
where L_n(θ, p11, p10) is defined on page 14, equation 2.3. We will show
that p11<1 and p10=0 imply plim θ̂_n(1,0) ≠ θ^0. The proof of plim
θ̂_n(1,0) ≠ θ^0 proceeds as follows: we derive a necessary condition for the
consistency of an estimator that solves a maximization problem, show
that the condition is violated, and hence conclude plim θ̂_n(1,0) ≠ θ^0.
n
The necessary condition for consistency can be viewed as either a
global or local condition depending on whether the estimator is a global
or local maximizer of L The global condition appears as the
n
conclusion of the following theorem.
Theorem A.1.1. Let θ̂_n(y) be a function of the observations such that
L_n(θ̂_n, y) ≥ L_n(θ, y) for all n and all θ ∈ Ξ, where Ξ is a subset of a
Euclidean space. Define

L̄_n(θ, θ', y, ρ) = sup{ L_n(t, y) − L_n(θ', y) : |t − θ| ≤ ρ },

and let L̄*_n(θ, θ', ρ) = E( L̄_n(θ, θ', y, ρ) ). Suppose

(i) for all sufficiently small ρ(ε) = ρ > 0,
plim( L̄_n(θ, θ', y, ρ) − L̄*_n(θ, θ', ρ) ) = 0;
(ii) L̄*_n(θ, θ', ρ) decreases to L̄*_n(θ, θ', 0) uniformly in n as ρ decreases
to zero.

If plim θ̂_n = θ°, then lim sup_n { L̄*_n(θ°, θ, 0) } ≥ 0 for all θ ∈ Ξ.
Proof:
Suppose there exists θ* ∈ Ξ such that lim sup_n{ L̄*_n(θ°, θ*, 0) } < 0.
Then by (ii) we can choose ρ > 0 such that lim sup_n{ L̄*_n(θ°, θ*, ρ) } < 0.
Now define N = { θ : |θ − θ°| ≤ ρ } and

R_n = sup{ L_n(t, y) − L_n(θ*, y) : |t − θ°| ≤ ρ }.

Since θ̂_n ∈ N implies R_n ≥ 0, it suffices to show that lim Pr(R_n < 0) = 1
as n → ∞. Let M_n = L̄*_n(θ°, θ*, ρ) and d = −lim sup_n M_n > 0. Now for
sufficiently large n we have M_n < −d/2, and hence

Pr(R_n < 0) ≥ Pr(R_n − M_n < d/4) → 1 as n → ∞, by (i).
Q.E.D.
Under additional regularity conditions, the conclusion of Theorem
A.1.1 can be viewed as a local condition. In what follows let
L̄_n(θ) = E( L_n(θ) ).
Theorem A.1.2. In addition to A.1.1(i) and A.1.1(ii), suppose
(i) ∂L̄_n(θ)/∂θ = E( ∂L_n(θ)/∂θ ); that is, the order of integration and
differentiation can be interchanged.
(ii) θ° is an interior point of Ξ.
(iii) ∂L̄_n(θ)/∂θ is continuous on a closed neighborhood N_1 of θ° with
radius ε_1 > 0, for all n sufficiently large.
Let ∂L̄_n(θ)/∂θ_i = L̄_{n,i}(θ). If for some i there exists a positive constant
m_i such that |L̄_{n,i}(θ)| ≥ m_i for all θ belonging to a closed neighborhood
of θ° with radius ε_2 > 0, N_2, for all n sufficiently large, then
plim θ̂_n ≠ θ°.
Proof:
We will prove plim θ̂_n ≠ θ° by showing that the hypothesis of the
theorem implies lim sup_n{ L̄_n(θ°) − L̄_n(θ*_n) } < 0 for some sequence (θ*_n)
belonging to Ξ.
Let ε_3 = min(ε_1, ε_2) and let N_3 be the closed neighborhood of θ° with
radius ε_3. Since N_3 is compact and L̄_n(θ) is continuous on N_3, there
exist points θ*_n belonging to N_3 such that L̄_n(θ*_n) = sup{ L̄_n(θ) : θ
belongs to N_3 }. Furthermore, since |L̄_{n,i}(θ)| > 0 on N_2, the points θ*_n
lie on the boundary of N_3. Therefore |θ*_n − θ°| = ε_3.
By the mean value theorem we have

L̄_n(θ*_n) − L̄_n(θ°) = Σ_{i=1}^{K} (θ*_{n,i} − θ°_i) L̄_{n,i}(θ'_n),   (2)

where θ'_n lies on the segment connecting θ*_n and θ°. Now if
L̄_{n,i}(θ'_n) ≥ m_i > 0, then we must have θ*_{n,i} − θ°_i ≥ 0. Otherwise,
since L̄_n is strictly increasing in its ith argument on N_3, we would have
L̄_n(θ*_{n,1}, ..., θ°_i, ..., θ*_{n,K}) > L̄_n(θ*_{n,1}, ..., θ*_{n,i}, ..., θ*_{n,K}),
which contradicts the fact that θ*_n is a maximizer of L̄_n on N_3.
Similarly, if L̄_{n,j}(θ'_n) ≤ −m_j < 0, then θ*_{n,j} − θ°_j ≤ 0.
Without loss of generality suppose

L̄_{n,i}(θ'_n) ≥ m_i > 0 for i = 1, ..., h, and
L̄_{n,i}(θ'_n) ≤ −m_i < 0 for i = h+1, ..., K.

Then by equation (2) we have

L̄_n(θ*_n) − L̄_n(θ°) ≥ Σ_{i=1}^{h} (θ*_{n,i} − θ°_i) m_i + Σ_{i=h+1}^{K} (θ°_i − θ*_{n,i}) m_i

≥ m Σ_{i=1}^{K} |θ*_{n,i} − θ°_i| ≥ m·d > 0,

for some d > 0, where m = min(m_1, ..., m_h, m_{h+1}, ..., m_K) and the last
inequality follows because |θ*_n − θ°| = ε_3. This implies
lim sup_n{ L̄_n(θ°) − L̄_n(θ*_n) } < 0.
Q.E.D.
Therefore, to prove plim θ̂_n(1,0) ≠ θ°, it suffices to show that
|∂L̄_n(θ°; 1,0)/∂β_1| is bounded away from zero. We establish this by
showing that

E( ∂L_n(θ°; 1,0)/∂β_1 ) = (1 − p°11) Σ_t x_t E(Q_t − D_t)/σ_1².   (3)

Let ∂log f_t(θ°; 1,0)/∂β_1 = f_t^b, 1(·) = 1(Δp_{t+1}>0), and note that

E(f_t^b) = ∫ f_t^b f( Q_t, 1(·) | θ°, p°11, p°10 = 0 ) dQ_t.   (4)

Now if p°11 = 1, then (4) is the expectation of a likelihood equation, and
therefore given the usual regularity conditions we have E(f_t^b) = 0 at
p°11 = 1. This condition implies

1(·) ∫ f_t^b g_{st} dQ_t = −(1 − 1(·)) ∫ f_t^b g_{dt} dQ_t.   (5)

Substituting (5) into (4) yields

E(f_t^b) = (1 − 1(·))(1 − p°11)( ∫ f_t^b g_{dt} dQ_t + ∫ f_t^b g_{st} dQ_t ).   (6)

For 1(·) = 0, given the normality of ε_{1t} and ε_{2t}, we have
f_t^b = (Q_t − x_tβ_1)x_t/σ_1². Substituting this into (6), and summing over
the observations, gives (3).
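The direction of this inconsistency can be illustrated numerically. The sketch below is a stylized switching regression with exogenous regimes and a hypothetical noisy separator, not the likelihood model of the text: when the regime indicator misclassifies 25 percent of the observations, the least squares slope on the labeled "demand" subsample converges to a mixture of the demand and supply slopes, just as constraining (p11, p10) = (1,0) distorts the constrained ML problem above.

```python
import numpy as np

# Stylized illustration (assumed model, not the text's): a switching
# regression with exogenous regimes. Correct separation gives a
# consistent slope; a separator that misclassifies with positive
# probability mixes in supply observations and biases the estimate.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
D = 1.0 * x + rng.normal(size=n)    # demand: slope +1 (hypothetical)
S = -1.0 * x + rng.normal(size=n)   # supply: slope -1 (hypothetical)
regime = rng.random(n) < 0.5        # exogenous demand/supply regime
Q = np.where(regime, D, S)          # observed quantity

def ols_slope(y, z):
    return np.sum(z * y) / np.sum(z * z)

# Correct separation: consistent for the demand slope (+1)
b_true = ols_slope(Q[regime], x[regime])

# Separator flips the regime label with probability 0.25: the labeled
# sample is a 75/25 demand/supply mixture, so the slope estimate
# converges to roughly 0.75*(+1) + 0.25*(-1) = 0.5
label = regime ^ (rng.random(n) < 0.25)
b_noisy = ols_slope(Q[label], x[label])
```

With 20,000 observations, b_true sits near the true demand slope while b_noisy is pulled toward the mixture value, mirroring the positive-probability misclassification argument above.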
A.2 The Computational Tractability and Asymptotic Properties of the
Least Squares Estimator of Section 2.3
In Section 2.3 we proposed using an LS estimator to find the
consistent and asymptotically normal solution to the likelihood
equations; i.e., use the LS estimates as starting values to iterate to
the consistent and asymptotically normal local maxima of the likelihood
function. The success of this strategy depends on:
(a) The objective functions to be solved for the LS estimates are not
characterized by an unknown number of local minima so that global minima
can be easily found; i.e., multiple solutions are not a problem.
(b) The LS estimators (defined as global minimizers) are consistent and
have a proper limiting distribution.
If (a) fails, then the LS method is no more computationally tractable
than the ML method, and thus one might as well use the ML method to
begin with. (b) ensures convergence to the consistent and
asymptotically normal local maxima of the likelihood function. (See, for
example, Amemiya (1973, pp. 1014-15).) In this section we will argue
that both (a) and (b) are likely to be satisfied in practice.
Condition (a) will obviously be satisfied if the following
optimization problems have unique solutions:

min over (p11, p10, γ):  (1/n) Σ_{t=1}^{n} ( 1(Δp_{t+1}>0) − E(1(Δp_{t+1}>0)) )²   (1)

min over (β_1, β_2, σ_1²+σ_2²):  (1/n) Σ_{t=1}^{n} ( Q_t − Ê(Q_t) )²   (2)

min over (σ_1², σ_2²):  (1/n) Σ_{t=1}^{n} ( Q_t² − Ê(Q_t²) )²   (3)

where Ê(Q_t) denotes the function E(Q_t) with γ estimated by γ̂ (obtained
from (1)), and Ê(Q_t²) denotes E(Q_t²) with β_1, β_2, (σ_1²+σ_2²), and γ
estimated by β̂_1, β̂_2, (σ̂_1²+σ̂_2²), and γ̂ (obtained from (1) and (2)).
Solutions to problems (2) and (3) are OLS estimates, and therefore
are unique if the appropriate matrices of explanatory variables have
full column rank. For example, unique LS estimates can be obtained by
solving (2) if the following matrix has full column rank:

[ (1 − Φ(x_1γ̂))x_1^s   Φ(x_1γ̂)x_1^d   φ(x_1γ̂) ]
[        ...                 ...            ...   ]
[ (1 − Φ(x_nγ̂))x_n^s   Φ(x_nγ̂)x_n^d   φ(x_nγ̂) ]

where x_t^s denotes the 1×k_s vector of explanatory variables of the supply
equation, and x_t^d the 1×k_d vector of demand explanatory variables. In
general, the matrices of explanatory variables for (2) and (3) will have
full column rank provided that the functions Φ(x_tγ̂) and φ(x_tγ̂) are not
constant for all t.
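The rank condition is easy to check directly. The sketch below is illustrative only: it assumes the supply and demand equations share the same two regressors and uses a hypothetical first-stage estimate γ̂, and it verifies that the stacked matrix loses full column rank exactly when Φ(x_tγ̂) is constant across t.

```python
import numpy as np
from math import erf, sqrt, pi

# Standard normal cdf/pdf without a scipy dependency
Phi = np.vectorize(lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0))))
phi = np.vectorize(lambda z: np.exp(-0.5 * z * z) / sqrt(2.0 * pi))

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=(n, 2))          # assumed shared regressors x_t^s = x_t^d
gamma_hat = np.array([0.5, -0.3])    # hypothetical first-stage estimate

def design(P, p):
    """Stack rows [(1 - Phi)x_t^s, Phi x_t^d, phi] as in the text."""
    return np.column_stack([(1 - P)[:, None] * x, P[:, None] * x, p])

# Non-constant index x_t*gamma: the 5 columns are linearly independent
idx = x @ gamma_hat
rank_varying = np.linalg.matrix_rank(design(Phi(idx), phi(idx)))

# Constant Phi (index identically zero): the (1 - Phi)x and Phi x
# blocks become collinear and the rank collapses
c = np.full(n, 0.5)
d = np.full(n, 1.0 / sqrt(2.0 * pi))
rank_constant = np.linalg.matrix_rank(design(c, d))
```

Here rank_varying equals 5 (full column rank) while rank_constant drops to 3, matching the condition that Φ(x_tγ̂) and φ(x_tγ̂) must not be constant for all t.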
Solutions to problem (1) are nonlinear LS estimates, and
consequently establishing their uniqueness is much more difficult.
Unfortunately, attempts to prove that problem (1) has a unique solution
have been inconclusive. However, there is some evidence suggesting that
problem (1) can be solved for a global minimum in practice. First, the
larger the sample size the more likely problem (1) will have a unique
solution. Lemma A.2.4 below provides a rank condition which ensures a
unique solution with probability approaching one as n approaches
infinity. Second, given the data discussed in Chapter 4, attempts to
solve problem (1) were successful in the sense that all starting values
iterated to the same solution. In contrast, attempts to maximize the
likelihood function were unsuccessful as different starting values
iterated to different solutions. Third, the objective function in
problem (1) is bounded below (by zero) which simplifies the search of
the parameter space for a global minimum. In contrast, a search for a
global maximum of the likelihood function is complicated by
unboundedness: L_n → ∞ as σ_1² → 0 or σ_2² → 0 (see, for example, Maddala
(1983, p. 300)). Therefore, any search for a global ML estimate will be
futile unless one is willing to arbitrarily bound the error variances
away from zero.
Next we discuss conditions that imply consistency for the LS
estimator. We will only consider conditions that imply consistency for
the nonlinear LS estimator defined as any global minimizer of problem
(1). (Given plim γ̂ = γ°, proving consistency for the OLS estimators
obtained from solving problems (2) and (3) involves repeated application
of Jennrich's (1969, Lemma 3) mean-value theorem for random functions,
and is quite tedious.) For simplicity, rather than necessity, we will
assume that all relevant random variables are independent identically
distributed across t. This enables us to apply the following simplified
version of White's (1980) Lemma 2.2 to the global minimizer of problem
(1).
Lemma A.2.1. Let Q_n(w, θ) be a measurable function on a measurable space
W, and for each w in W a continuous function on a compact set Ξ. Then
there exists a measurable function θ̂_n(w) such that

Q_n(w, θ̂_n(w)) = inf{ Q_n(w, θ) : θ ∈ Ξ } for all w in W.

If plim{ sup_{θ∈Ξ} |Q_n(w, θ) − Q̄(θ)| } = 0, and if Q̄(θ) has a unique minimum
at θ°, then plim θ̂_n = θ°.
Proof: See White (1980, Lemma 2.2).
The first part of Lemma A.2.1 ensures the existence of the
nonlinear LS estimator (defined as a global minimizer). The second part
will be used to show consistency. For this purpose we define

Q_n(θ) = (1/n) Σ_{t=1}^{n} ( 1(Δp_{t+1}>0) − E(1(Δp_{t+1}>0)) )²

       = (1/n) Σ_{t=1}^{n} ( z_t(θ) + u_{1t} )²,

where z_t(θ) = p°11 − p11 − (p°11 − p°10)Φ(x_tγ°) + (p11 − p10)Φ(x_tγ),
u_{1t} = 1(Δp_{t+1}>0) − E(1(Δp_{t+1}>0)), and θ = (p11, p10, γ). To apply the
second part of Lemma A.2.1 we need to show uniform convergence, and that
Q̄(θ) has a unique minimum at θ°. The next lemma, which is due to
Hoadley (1971), provides a moment restriction that implies uniform
convergence.
Lemma A.2.2. For the function defined in Lemma A.2.1 suppose
E|Q_n(θ)|^{1+d} ≤ M < ∞ for some d > 0. Then plim{ sup_θ |Q_n(θ) − Q̄(θ)| } = 0.
Proof: See Hoadley (1971, Theorem A.5).
The following lemma establishes that the moment restriction holds.
Lemma A.2.3. E|Q_n(θ)|^{1+d} ≤ M < ∞ for some d > 0.
Proof: Since z_t(θ) is bounded we have

( z_t(θ) + u_{1t} )² ≤ 2 z_t(θ)² + 2 u_{1t}².

Therefore, the conclusion of the lemma follows if E|u_{1t}|^{2+d} ≤ M < ∞ for
some d > 0. Let 1_t = 1(Δp_{t+1}>0), set d = 1, note that E|1_t|^k = E(1_t) ≤ 1
for all k > 0, and that u_{1t} = 1_t − E(1_t). Thus,

E|u_{1t}|³ ≤ E( |1_t| + E(1_t) )³ ≤ 8 < ∞.
Q.E.D.
Finally, we present a rank condition that implies Q̄(θ) has a
unique minimum at θ°, and therefore together with Lemma A.2.3 ensures
consistency for a global minimizer of Q_n(θ).
Lemma A.2.4. Suppose x_t is a discrete random variable, and let x^{(i)}
denote the ith member of the support of x_t. For each θ ∈ Ξ such that
θ ≠ θ°, suppose there exist k ≥ 3 members of the support of x_t such that
the following matrix has full column rank:

A_k = [ 1   Φ(x^{(i)}γ)   Φ(x^{(i)}γ°) ],  i = 1, ..., k.

If p°11 > p°10, then Q̄(θ) has a unique minimum at θ°.
Proof: Since E(u_{1t} | x_t) = 0, we have

Q̄(θ) = E( z_t(θ) + u_{1t} )² = E( z_t(θ)² ) + E( u_{1t}² ).

Obviously, Q̄(θ) has a minimum at θ° since E( z_t(θ°)² ) = 0. To prove
uniqueness it suffices to show that θ ≠ θ° implies E( z_t(θ)² ) > 0.
Suppose for some θ ≠ θ°, E( z_t(θ)² ) = 0. Since Pr( z_t(θ)² ≥ 0 ) = 1, we
have E( z_t(θ)² ) = 0 if and only if Pr( z_t(θ)² = 0 ) = 1. This implies that
for every x^{(i)} belonging to the support of x_t, z_t(θ) = 0. That is,

p°11 − p11 − (p°11 − p°10)Φ(x^{(i)}γ°) + (p11 − p10)Φ(x^{(i)}γ) = 0, i = 1, ..., k.

But this contradicts the assumption that A_k has full column rank unless
p11 = p°11, p10 = p°10, and γ = γ°.
Q.E.D.
Finally, we note without proof that Theorem 3.1 of White (1980) can
be applied to show that the nonlinear LS estimator obtained from solving
problem (1) is asymptotically normal. Therefore, the LS estimates have
a proper limiting distribution.
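The practical advantage of the bounded LS objective can be made concrete. The sketch below is a simplified stand-in for problem (1): it assumes the mean function E(1(Δp_{t+1}>0)) = p11 − (p11 − p10)Φ(γx_t) with a scalar regressor and hypothetical true values (0.9, 0.2, 1.0), and locates the global minimizer of the bounded-below LS objective by a coarse grid search, a strategy with no analogue for the unbounded likelihood.

```python
import numpy as np
from math import erf, sqrt

Phi = np.vectorize(lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0))))  # normal cdf

rng = np.random.default_rng(2)
n = 5000
x = rng.normal(size=n)

# Hypothetical true parameters theta = (p11, p10, gamma)
p11_0, p10_0, g_0 = 0.9, 0.2, 1.0
m0 = p11_0 - (p11_0 - p10_0) * Phi(g_0 * x)   # E(1(dp_{t+1}>0) | x_t)
ind = (rng.random(n) < m0).astype(float)      # simulated indicator 1_t

# LS objective of problem (1): bounded below by zero, so a global
# search over a parameter grid is well defined and cheap
def Qn(p11, p10, Pgx):
    return np.mean((ind - (p11 - (p11 - p10) * Pgx)) ** 2)

best_val, best_theta = np.inf, None
for g in np.linspace(0.2, 2.0, 10):
    Pgx = Phi(g * x)                          # precompute per gamma
    for p11 in np.linspace(0.5, 1.0, 11):
        for p10 in np.linspace(0.0, 0.5, 11):
            v = Qn(p11, p10, Pgx)
            if v < best_val:
                best_val, best_theta = v, (p11, p10, g)
p11_hat, p10_hat, g_hat = best_theta
```

The grid minimizer should land near the true values; in practice the grid point would then serve as a starting value for an iterative routine, as in Section 2.3.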
A.3 Proofs of Theorems 3.2-3.17
Proof of Theorem 3.2: See McLeish (1975, Theorem 2.10).
Proof of Theorem 3.7: The proof is the same as that of Hoadley's
(1971) Theorem 1, except that Theorem 3.2 is applied instead of Markov's
law of large numbers.
Proof of Theorem 3.8: For notational simplicity let 1_t = 1(Δp_{t+1}>0).
Consider an arbitrary point θ*_p ∈ Ξ. We will show that given ε > 0 there
exists d > 0 such that |θ_p − θ*_p| < d implies

|f_t(Q_t, 1_t | θ_p) − f_t(Q_t, 1_t | θ*_p)| < ε a.e. uniformly in t,

where ε and d do not depend on t.
Assumptions 3.4(i) (normality) and 3.6(i) (compactness) imply

lim_{|Q_t|→∞} sup{ |f_t(Q_t, 1_t | θ_p) − f_t(Q_t, 1_t | θ*_p)| : θ_p ∈ Ξ } = 0.   (1)

Let ε > 0 be chosen. Then equation (1) implies that there exist
a_t = a(x_t, 1_t) > 0 and d_t = d(x_t, 1_t) > 0 such that for |Q_t| > a_t and
|θ_p − θ*_p| < d_t we have

|f_t(Q_t, 1_t | θ_p) − f_t(Q_t, 1_t | θ*_p)| < ε.   (2)

By assumption 3.5(ii) (x_t has a finite support) equation (2) holds a.e.
for |Q_t| > a = max(a_1, ..., a_k) and |θ_p − θ*_p| < d = min(d_1, ..., d_k).
Thus, it remains to show that equation (2) holds a.e. for Q_t belonging
to [−a, a].
Let C = { (Q_t, 1_t, x_t, θ_p) : Q_t belongs to [−a, a] }. Since C is
compact, and f_t(Q_t, 1_t | θ_p) is continuous on C, it follows that
f_t(Q_t, 1_t | θ_p) is uniformly continuous on C. That is, there exists a
d > 0 such that equation (2) holds a.e. uniformly in t whenever
|θ_p − θ*_p| < d.
Q.E.D.
Proof of Lemma 3.9: The result follows from the fact that Ξ is
separable and f_t(Q_t, 1(Δp_{t+1}>0) | θ_p) is continuous on Ξ. See, for
example, Loeve (1960, p. 510).
Proof of Lemma 3.10: Hartley and Mallela (1977, Corollary 4.2)
prove that there exists ρ(θ_p) > 0 such that

E sup{ |ln f_t(Q_t, 1(Δp_{t+1}>0) | θ'_p)|^k : |θ'_p − θ_p| ≤ ρ(θ_p) } < ∞   (3)

for k = 2. In fact, their arguments can be used to show that (3) holds
for any even positive k, and therefore for any positive k.
Proof of Lemma 3.11: The proof involves minor modifications to the
proofs given in Amemiya and Sen (1977, Lemmas 2 and 3) to cover the case
of p11 ≠ p10.
Proof of Lemma 3.14: See White (1984, Theorem 2.4).
Proof of Lemma 3.15: By Theorem 2.3 of White and Domowitz (1984),
assumptions 3.15(i) and 3.15(ii) imply

plim{ sup_{θ∈Ξ} |(1/n) Σ_{t=1}^{n} ( q_t(y_t, θ) − E(q_t(y_t, θ)) )| } = 0.   (4)

Given (4) and plim θ̂_n = θ°, Lemma 2.6 of White (1980) implies

plim (1/n) Σ_{t=1}^{n} ( q_t(y_t, θ̂_n) − E(q_t(y_t, θ°)) ) = 0.
Q.E.D.
Proof of Theorem 3.16: See Newey and West (1985, Theorem 2).
Proof of Theorem 3.17:
Step 1. n^{1/2} â_n^{ls} is asymptotically equivalent to
( n^{-1} f_c(θ°_p)ᵀ f_c(θ°_p) )^{-1} n^{-1/2} f_c(θ°_p)ᵀ f(θ°_p).
We will show that Step 1 follows from

n^{1/2} â_n^{ls} = ( n^{-1} f_c(θ̂_n^{ml})ᵀ f_c(θ̂_n^{ml}) )^{-1} n^{-1/2} f_c(θ̂_n^{ml})ᵀ f(θ̂_n^{ml}).   (5)

Given 3.17(i), the mean-value theorem for random functions
(Jennrich (1969, Lemma 3)) allows us to write

f(θ̂_n^{ml}) = f(θ°_p) + ( ∂f(θ̄_n)/∂θ_p )( θ̂_n^{ml} − θ°_p ), and   (6)

f_c(θ̂_n^{ml})_i = f_c(θ°_p)_i + ( ∂f_c(θ̄_{ni})/∂θ_p )( θ̂_n^{ml} − θ°_p ), i = 1, ..., k,   (7)

where f_c(θ̂_n^{ml})_i denotes the ith column of the matrix f_c(θ̂_n^{ml}), and
θ̄_n and θ̄_{ni} each lie on the segment connecting θ̂_n^{ml} and θ°_p.
Given (7), 3.17(ii), and plim θ̂_n^{ml} = θ°_p, we have

n^{-1} f_c(θ̂_n^{ml})_iᵀ f_c(θ̂_n^{ml})_j = n^{-1} f_c(θ°_p)_iᵀ f_c(θ°_p)_j + o_p(1).   (8)

Given (6), (7), 3.17(ii), 3.17(iii), H_0, and plim θ̂_n^{ml} = θ°_p, we have

n^{-1/2} f_c(θ̂_n^{ml})_iᵀ f(θ̂_n^{ml}) = n^{-1/2} f_c(θ°_p)_iᵀ f(θ°_p) + o_p(1).   (9)

Substituting (8) and (9) into (5) we get the desired result:

n^{1/2} â_n^{ls} = ( n^{-1} f_c(θ°_p)ᵀ f_c(θ°_p) + o_p(1) )^{-1} ( n^{-1/2} f_c(θ°_p)ᵀ f(θ°_p) + o_p(1) ).

Step 2. n (â_n^{ls})ᵀ D_n(θ̂_n^{ml})^{-1} â_n^{ls} →d χ²_k.
By Step 1 we can write

n^{1/2} â_n^{ls} ≍ A_n(θ°_p)^{-1} n^{-1/2} f_c(θ°_p)ᵀ f(θ°_p),

where A_n(θ°_p) = n^{-1} f_c(θ°_p)ᵀ f_c(θ°_p) + o_p(1).
Therefore, by 3.17(iv), 3.17(v), and 3.17(vi), we have

plim( D_n(θ°_p)^{-1/2} n^{1/2} â_n^{ls} − D_n(θ°_p)^{-1/2} A_n(θ°_p)^{-1} n^{-1/2} f_c(θ°_p)ᵀ f(θ°_p) ) = 0.   (10)

Given 3.17(vi), by Corollary 4.24 of White (1984),

D_n(θ°_p)^{-1/2} A_n(θ°_p)^{-1} n^{-1/2} f_c(θ°_p)ᵀ f(θ°_p) →d N(0, I).   (11)

(10) and (11) imply

D_n(θ°_p)^{-1/2} n^{1/2} â_n^{ls} →d N(0, I_k),

and therefore by Corollary 4.28 of White (1984) we have

n (â_n^{ls})ᵀ D_n(θ°_p)^{-1} â_n^{ls} →d χ²_k.

Finally, since plim( D_n(θ̂_n^{ml})^{-1} − D_n(θ°_p)^{-1} ) = 0, by Theorem 4.30 of
White (1984)

n (â_n^{ls})ᵀ D_n(θ̂_n^{ml})^{-1} â_n^{ls} →d χ²_k.
Q.E.D.
A.4 Quadratic Hill-Climbing and the Asymptotic Distribution of the
(p+1)th-Round Estimates
The (p+1)th (p = 1, 2, ...) iteration of the quadratic hill-climbing
technique is given by

θ̂_n^{p+1} = θ̂_n^p − ( ∇²L_n(θ̂_n^p) − α_n I )^{-1} ∇L_n(θ̂_n^p),   (1)

where α_n = max( λ_n + r_n ‖∇L_n(θ̂_n^p)‖, 0 ), λ_n is the maximum eigenvalue
of ∇²L_n(θ̂_n^p), r_n is a scalar correction factor, and ‖∇L_n(θ̂_n^p)‖
denotes the length of the k-dimensional vector ∇L_n(θ̂_n^p).
Goldfeld, Quandt, and Trotter (1966) show that the technique
chooses θ̂_n^{p+1} to maximize the quadratic approximation of L_n(θ) on a
region centered at θ̂_n^p of radius

‖( ∇²L_n(θ̂_n^p) − α_n I )^{-1} ∇L_n(θ̂_n^p)‖.

If the quadratic approximation is good (that is, if the step increases
L_n(θ)), then in the next step r_n is decreased. Otherwise r_n is
increased. Further details can be found in Goldfeld, Quandt, and
Trotter (1966).
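The iteration can be written down compactly. The sketch below implements one hill-climbing step as in (1) and applies it, purely for illustration, to a logit log-likelihood standing in for L_n; the model, the simulated data, and the simple halving/quadrupling rule for r are assumptions of this sketch, not part of the text.

```python
import numpy as np

def hill_climb_step(grad, hess, theta, r):
    """One quadratic hill-climbing step for maximizing L:
    theta' = theta - (H - a I)^{-1} g, with a = max(lam_max(H) + r*||g||, 0)."""
    g, H = grad(theta), hess(theta)
    lam_max = np.linalg.eigvalsh(H).max()
    a = max(lam_max + r * np.linalg.norm(g), 0.0)
    return theta - np.linalg.solve(H - a * np.eye(len(theta)), g)

# Illustrative objective (assumed): a logit log-likelihood on simulated data
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, -1.0])                      # hypothetical
y = (rng.random(n) < 1 / (1 + np.exp(-X @ beta_true))).astype(float)

def loglik(b):
    xb = X @ b
    return np.sum(y * xb - np.logaddexp(0.0, xb))      # overflow-safe log(1+e^x)

def grad(b):
    return X.T @ (y - 1 / (1 + np.exp(-X @ b)))

def hess(b):
    p = 1 / (1 + np.exp(-X @ b))
    return -(X * (p * (1 - p))[:, None]).T @ X

theta, r = np.zeros(2), 1.0
for _ in range(50):
    cand = hill_climb_step(grad, hess, theta, r)
    if loglik(cand) > loglik(theta):
        theta, r = cand, r / 2      # good quadratic approximation: shrink r
    else:
        r *= 4                      # poor approximation: enlarge the correction
```

Near a maximum the gradient vanishes and, with a concave L_n, α_n = 0, so the step reduces to Newton-Raphson; this is the observation exploited in the asymptotic argument that follows.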
Next we show that the estimator defined by θ̂_n^{p+1} has the same
asymptotic distribution as the partial-MLE provided that plim θ̂_n^p = θ°
and √n( θ̂_n^p − θ° ) has a proper limiting distribution. More explicitly,
we show

√n( θ̂_n^{p+1} − θ° ) ≍ −( n^{-1} ∇²L_n(θ°) )^{-1} n^{-1/2} ∇L_n(θ°).   (2)

The implication is that when consistent initial estimates are employed,
iteration beyond the second round does not improve the final estimates,
at least asymptotically.
To prove (2), it suffices to show that plim n^{-1}α_n = 0. To see this,
consider the mean-value expansion

∇L_n(θ̂_n^p) = ∇L_n(θ°) + ∇²L_n(θ̄_n)( θ̂_n^p − θ° ).   (3)

Substituting (3) into (1) and rearranging, we get

√n( θ̂_n^{p+1} − θ° ) + ( n^{-1}∇²L_n(θ̂_n^p) − n^{-1}α_n I )^{-1} n^{-1/2} ∇L_n(θ°)

= [ I − ( n^{-1}∇²L_n(θ̂_n^p) − n^{-1}α_n I )^{-1} n^{-1}∇²L_n(θ̄_n) ] √n( θ̂_n^p − θ° ).   (4)

Therefore, if plim n^{-1}α_n = 0, then (2) follows from (4) since the
right-hand side of (4) converges in probability to zero.
The following theorem establishes that plim n^{-1}α_n = 0.
Theorem A.4.1. For ∇L_n(θ) = Σ_{t=1}^{n} ∂log f_t(θ)/∂θ, suppose

(i) plim sup_θ | n^{-1} Σ_{t=1}^{n} [ ∂log f_t(θ)/∂θ − E( ∂log f_t(θ)/∂θ ) ] | = 0.
(ii) plim θ̂_n^p = θ°.
(iii) E( ∂log f_t(θ)/∂θ ) is continuous.
(iv) plim n^{-1} λ_n ≤ 0.

Then plim n^{-1} α_n = 0.
Proof:
It suffices to show that plim n^{-1} ‖∇L_n(θ̂_n^p)‖ = 0. Now

n^{-1} ‖∇L_n(θ̂_n^p)‖ = n^{-1} ( Σ_{i=1}^{k} ( ∇L_n(θ̂_n^p)_i )² )^{1/2}

= n^{-1} ( Σ_{i=1}^{k} ( Σ_{t=1}^{n} ∂log f_t(θ̂_n^p)/∂θ_i )² )^{1/2}

≤ Σ_{i=1}^{k} | n^{-1} Σ_{t=1}^{n} ∂log f_t(θ̂_n^p)/∂θ_i | → 0 in probability

by (i), (ii), and (iii).
Q.E.D.
In effect, the proof of (2) follows from the observation that if
plim n^{-1}α_n = 0, then for sufficiently large n equation (1) reduces to
the Newton-Raphson technique with probability approaching one. Given
that the proof depends on (1) reducing to the Newton-Raphson technique
asymptotically, why not use the latter to begin with? Unfortunately, a
definitive answer to this question is not available. The answer lies in
the small sample properties of the estimators, which undoubtedly would
require Monte Carlo studies to help uncover. We have chosen quadratic
hill-climbing over Newton-Raphson because it is somewhat reassuring to
know that the former always moves in the direction of a maximizer of the
likelihood function, while the latter might not.
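The claim that iteration beyond the second round brings no asymptotic gain can be checked in simulation. The sketch below uses a logit MLE as a stand-in for the partial-MLE (the model and the half-sample starting estimator are assumptions of this sketch): starting from a root-n-consistent estimate computed on half of the sample, a single Newton-Raphson step on the full sample essentially reproduces the fully iterated MLE.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.3, 0.8])                      # hypothetical
y = (rng.random(n) < 1 / (1 + np.exp(-X @ beta_true))).astype(float)

def grad(b, X, y):
    return X.T @ (y - 1 / (1 + np.exp(-X @ b)))

def hess(b, X):
    p = 1 / (1 + np.exp(-X @ b))
    return -(X * (p * (1 - p))[:, None]).T @ X

def newton(b, X, y, steps):
    """Plain Newton-Raphson iteration on the logit log-likelihood."""
    for _ in range(steps):
        b = b - np.linalg.solve(hess(b, X), grad(b, X, y))
    return b

# Fully iterated MLE on the full sample
mle = newton(np.zeros(2), X, y, 25)

# Root-n-consistent start: the MLE computed on the first half only
start = newton(np.zeros(2), X[: n // 2], y[: n // 2], 25)

# One Newton step on the full sample from the consistent start
one_step = newton(start, X, y, 1)
```

Here one_step should be much closer to mle than start is, consistent with (2): the second-round estimate already carries the asymptotic distribution of the fully iterated estimator.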
BIBLIOGRAPHY
Amemiya, T. (1973). "Regression Analysis when the Dependent Variable
is Truncated Normal." Econometrica 41:997-1016.
Amemiya, T. (1974). "A Note on the Fair and Jaffee Model."
Econometrica 42:759-762.
Amemiya, T., and G. Sen (1977). "The Consistency of the Maximum
Likelihood Estimator in a Disequilibrium Model." Technical Report
238. Institute for Mathematical Studies in the Social Sciences,
Stanford University.
Benassy, J.P. (1982). The Economics of Market Disequilibrium. New
York: Academic Press.
Bowden, R.J. (1978). The Econometrics of Disequilibrium. Amsterdam:
North Holland.
Cosslett, S.R. (1983). "Distribution-free Maximum Likelihood Estimator
of the Binary Choice Model." Econometrica 51:765-782.
Fair, R.C., and D.M. Jaffee (1972). "Methods of Estimation for Markets
in Disequilibrium." Econometrica 40:497-514.
Fair, R.C., and H.H. Kelejian (1974). "Methods of Estimation for
Markets in Disequilibrium: A Further Study." Econometrica
42:177-190.
Fisher, F.M. (1983). Disequilibrium Foundations of Equilibrium
Economics. New York: Cambridge University Press.
Goldfeld, S.M., and R.E. Quandt (1975). "Estimation in a
Disequilibrium Model and the Value of Information." Journal of
Econometrics 3:325-348.
Goldfeld, S.M., R.E. Quandt, and H.F. Trotter (1966). "Maximization
by Quadratic Hill-climbing." Econometrica 34:541-551.
Gordin, M.I. (1969). "The Central Limit Theorem for Stationary
Processes." Soviet Mathematics 10:1174-1176.
Hartley, M.J., and P. Mallela (1977). "The Asymptotic Properties of a
Maximum Likelihood Estimator for a Model of Markets in
Disequilibrium." Econometrica 46:1251-1271.
Heckman, J.J. (1976). "The Common Structure of Statistical Models of
Truncation, Sample Selection and Limited Dependent Variables and a
Simple Estimator for Such Models." Annals of Economic and Social
Measurement 5:475-492.
Hoadley, B. (1971). "Asymptotic Properties of Maximum Likelihood
Estimators for the Independent Not Identically Distributed Case."
Annals of Mathematical Statistics 42:1977-1991.
Ito, T., and K. Ueda (1981). "Tests of the Equilibrium Hypothesis in
Disequilibrium Econometrics: An International Comparison of Credit
Rationing." International Economic Review 22:691-708.
Jennrich, R.I. (1969). "Asymptotic Properties of Nonlinear Least
Squares Estimators." Annals of Mathematical Statistics 40:633-643.
Laffont, J.J., and R. Garcia (1977). "Disequilibrium Econometrics for
Business Loans." Econometrica 45:1187-1204.
Lee, L.F., and R.H. Porter (1984). "Switching Regression Models with
Imperfect Sample Separation Information, with an Application to
Cartel Stability." Econometrica 52:391-418.
Levine, D. (1983). "A Remark on Serial Correlation in Maximum
Likelihood." Journal of Econometrics 23:337-342.
Loeve, M. (1960). Probability Theory. 2nd ed. Princeton: Van
Nostrand.
Maddala, G.S. (1983). Limited-dependent and Qualitative Variables in
Econometrics. New York: Cambridge University Press.
Maddala, G.S., and F. Nelson (1974). "Maximum Likelihood Methods for
Markets in Disequilibrium." Econometrica 42:1013-1030.
Manski, C.F. (1975). "The Maximum Score Estimation of the Stochastic
Utility Model of Choice." Journal of Econometrics 3:205-228.
Manski, C.F. (1983). "Closest Empirical Distribution Estimator."
Econometrica 51:305-320.
Manski, C.F. (1985). "Semiparametric Analysis of Discrete Response:
Asymptotic Properties of the Maximum Score Estimator." Journal of
Econometrics 27:313-333.
McLeish, D.L. (1975). "A Maximal Inequality and Dependent Strong Laws."
Annals of Probability 3:826-836.
Newey, W.K., and K.D. West (1985). "A Simple, Positive Definite,
Heteroscedasticity and Autocorrelation Consistent Covariance
Matrix." Discussion Paper 92, Woodrow Wilson School, Princeton
University.
Olsen, R.J. (1978). "Note on the Uniqueness of the Maximum Likelihood
Estimator for the Tobit Model." Econometrica 46:1211-1215.
Powell, J.L. (1984). "Least Absolute Deviations Estimation for the
Censored Regression Model." Journal of Econometrics 25:303-325.
Rao, R.R. (1962). "Relations between Weak and Uniform Convergence of
Measures with Applications." Annals of Mathematical Statistics
33:659-680.
Rudin, W. (1976). Principles of Mathematical Analysis. New York:
McGraw-Hill.
Sealey, C.W., Jr. (1979). "Credit Rationing in the Commercial Loan
Market: Estimates of a Structural Model Under Conditions of
Disequilibrium." Journal of Finance 34:689-702.
Serfling, R.J. (1968). "Contributions to Central Limit Theory for
Dependent Variables." Annals of Mathematical Statistics
39:1158-1175.
Wald, A. (1949). "Note on the Consistency of the Maximum Likelihood
Estimate." Annals of Mathematical Statistics 20:595-601.
White, H. (1980). "Nonlinear Regression on Cross-Section Data."
Econometrica 48:721-746.
White, H. (1984). Asymptotic Theory for Econometricians. New York:
Academic Press.
White, H., and I. Domowitz (1981). "Nonlinear Regression with Dependent
Observations." Unpublished paper, University of California, San
Diego.
White, H., and I. Domowitz (1984). "Nonlinear Regression with Dependent
Observations." Econometrica 52:143-162.
BIOGRAPHICAL SKETCH
Walter James Mayer was born in Detroit, Michigan, in 1955. He
received a Bachelor of Arts degree in economics from the University of
Missouri in 1982, and a Master of Arts degree from the University of
Florida in 1983.
I certify that I have read this study and that in my opinion it
conforms to acceptable standards of scholarly presentation and is fully
adequate, in scope and quality, as a dissertation for the degree of
Doctor of Philosophy.
Stephen R. Cosslett, Chairman
Associate Professor of Economics
I certify that I have read this study and that in my opinion it
conforms to acceptable standards of scholarly presentation and is fully
adequate, in scope and quality, as a dissertation for the degree of
Doctor of Philosophy.
G.S. Maddala
Professor of Economics
I certify that I have read this study and that in my opinion it
conforms to acceptable standards of scholarly presentation and is fully
adequate, in scope and quality, as a dissertation for the degree of
Doctor of Philosophy.
A.I. Khuri
Associate Professor of Statistics
This dissertation was submitted to the Graduate Faculty of the Department
of Economics in the College of Business Administration and to the Graduate
School and was accepted as partial fulfillment of the requirements for
the degree of Doctor of Philosophy.
December 1986
Dean, Graduate School