Title: Generalized economic models and methods for markets in disequilibrium
Full Citation
Permanent Link: http://ufdc.ufl.edu/UF00102774/00001
 Material Information
Title: Generalized economic models and methods for markets in disequilibrium
Physical Description: Book
Language: English
Creator: Mayer, Walter James, 1955-
Copyright Date: 1986
 Record Information
Bibliographic ID: UF00102774
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
Resource Identifier: ltuf - AEN9870
oclc - 16167663

Full Text








I would like to thank my advisor, Dr. S.R. Cosslett, for his support

throughout this project. Thanks are also extended to Dr. G.S. Maddala, Dr.

D.A. Denslow, and Dr. A.I. Khuri. Invaluable assistance provided by DeLayne

Redding in the typing of this document is much appreciated. This dissertation

is dedicated to my parents.

Abstract of Dissertation Presented to the
Graduate School of the University of Florida
in Partial Fulfillment of the Requirements
for the Degree of Doctor of Philosophy




December 1986

Chairman: Dr. S. R. Cosslett
Cochairman: Dr. G. S. Maddala
Major Department: Economics

Empirical studies of markets in disequilibrium have relied on the

appropriateness of explicit price adjustment equations, serial

independence, normally distributed errors, and explicit equations

relating the observed quantity transacted to desired supply and demand.

For example, the asymptotic properties of "disequilibrium" estimators

and test statistics are sensitive to the parametric forms chosen for

price adjustment, the serial behavior of the observations, error

distributions, and the quantity transacted. In a word, "disequilibrium"

estimators and statistics are non-robust. Unfortunately, economic

theory provides little basis for choosing the parametric forms. A lack

of economic-theoretic restrictions coupled with non-robust estimators

and statistics has severely limited empirical studies of markets in

disequilibrium.


This dissertation develops new methods for more meaningful

estimation of disequilibrium models. The new methods involve more

general models and robust estimators.

A switching regression model with imperfect sample separation is

used to incorporate price adjustment into a disequilibrium model. The

model enables price adjustment to be incorporated with less a priori

information than usual. To estimate the model, maximum likelihood and

least squares estimators are proposed.

The asymptotic properties of the maximum likelihood estimator are

examined. Previous results for maximum likelihood estimators of

disequilibrium models are generalized with asymptotic theory for

serially dependent observations. The maximum likelihood estimator is

shown to be consistent and asymptotically normal even if the data are

characterized by unknown forms of serial dependence. Asymptotic test

statistics are also derived.

The methodology is illustrated with an empirical application to the

U.S. commercial loan market from 1979 to 1984.

Finally, I propose semiparametric models and estimators for markets

in disequilibrium. These methods are applicable when the error

distributions are unknown, and the quantity transacted is an unknown

function of supply and demand. Consistent estimators are derived using

the method of maximum score.






1 AN OVERVIEW

1.1 The Problem
1.2 Solutions

ADJUSTMENT INFORMATION

2.1 Introduction
2.2 The Model and Maximum Likelihood Estimation
2.3 A Consistent Initial Least Squares Estimator
2.4 Summary and Conclusions

3.1 Introduction
3.2 Consistency
3.3 Asymptotic Normality
3.4 Consistent Covariance Estimation
3.5 An Asymptotic Test for Serial
3.6 Summary and Conclusions

COMMERCIAL LOAN MARKET

4.1 Introduction
4.2 The Empirical Model
4.3 Hypothesis Testing Procedures
4.4 The Results

MAXIMUM SCORE

5.1 Introduction
5.2 A Directional Model and Consistent Estimation Up to Scale
5.3 A Price Adjustment Model with Identified (Without a Loss of Scale)
5.4 Maximum Score Estimation of Models that Include the Quantity Transacted

NOTES

FURTHER RESEARCH

A.1 Inconsistency and Misclassified Observations
A.2 The Computational Tractability and Asymptotic Properties of the Least Squares Estimator of Section 2.3
A.3 Proofs of Theorems 3.2-3.7
A.4 Quadratic Hill-Climbing and the Asymptotic Distribution of the (p+1)th-Round Estimates

BIBLIOGRAPHY

BIOGRAPHICAL SKETCH


1.1 The Problem

Before Fair and Jaffee (1972) introduced their econometric

disequilibrium model, estimation of market behavior was confined to the

equilibrium assumption. The study of the econometrics of disequilibrium

was further developed by Fair and Kelejian (1974), Maddala and Nelson

(1974), Amemiya (1974), Goldfeld and Quandt (1975), Laffont and Garcia

(1977), Bowden (1978), and some others. By allowing for disequilibrium,

Fair and Jaffee's work and the subsequent work it inspired represent an

important generalization; but a generalization obtained by imposing

(1) explicit price adjustment equations,

(2) serial independence on time series data,

(3) normally distributed error terms, and

(4) explicit equations relating the observed quantity transacted

to desired supply and demand.

This contrasts with the equilibrium assumption where

(1) price adjustment is not an estimation issue,

(2) allowances are made for serial correlation,

(3) the errors are only assumed to be uncorrelated with a subset of the

explanatory variables, and

(4) desired supply and demand are directly observable.

The estimation of market behavior has been extended to the

disequilibrium assumption, but at a cost.

Economic theory for markets in disequilibrium is a relatively new

area of research that has been developed in the last few years by

Benassy (1982) and Fisher (1983), among others. Being recent

phenomena, however, the theories that have been proposed are rather

limited in scope and tentative. For the empirical researcher, the

theories provide little guidance for specifying price adjustment, and

the quantity transacted as a function of desired supply and demand; they

provide no basis for specifying the error distributions and serial

independence. A survey of the many empirical studies that have followed

Fair and Jaffee (1972) suggests that the basis for specifying these

aspects of the econometric disequilibrium model has been largely

computational tractability. This approach has led to several

drastically different disequilibrium specifications. The assumptions

of each specification generally do not represent well-defined

economic-theoretic restrictions, and thus differences among them seldom

reflect differences among well-defined alternative economic theories.

As a result, most disequilibrium specifications are as good (or bad) as

any other. Unfortunately, each specification produces estimates only as

reliable as the assumptions imposed, and differences among them can lead

to conflicting estimates of supply and demand equations.

The lack of economic-theoretic restrictions alone does not prohibit

meaningful estimation of a disequilibrium model. The estimators

commonly applied are also prohibitive. Most proposed "disequilibrium"

estimators can be viewed as corrected versions of the "equilibrium"

least squares (LS) and maximum likelihood (ML) estimators. The

inequality of supply and demand introduces nonzero correlation between

the explanatory variables and the error terms. Given a model for the

inequality of supply and demand, the "equilibrium" LS and ML estimators

can be corrected for the nonzero correlation to yield consistent

"disequilibrium" LS and ML estimators. The correction approach provides

insight into the problem of relaxing the equilibrium assumption, but

generally requires restrictive assumptions to make it operational. In

particular, consistent LS and ML estimation of a disequilibrium model

depends on choosing the correct parametric forms for price

adjustment, the error distribution functions, and the quantity

transacted; useful inferences require allowances for serial correlation

as well as correct parametric forms. Non-robustness coupled with a lack

of economic-theoretic restrictions severely limits the reliability of LS

and ML estimation.

To illustrate these points we consider the following model.

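$$D_t = \beta_1' x_t + \epsilon_{1t} \qquad (1.1)$$

$$S_t = \beta_2' x_t + \epsilon_{2t} \qquad (1.2)$$

Data: $(Q_t, x_t)_{t=1}^{n}$.

Equilibrium assumption: $Q_t = D_t = S_t$.

Disequilibrium assumption: $D_t \neq S_t$,

$$Q_t = T_t(D_t, S_t),$$

$$\Delta p_{t+1} = I_t(D_t - S_t).$$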

Equations (1.1) and (1.2) are demand and supply functions; $D_t$ denotes the

quantity demanded, $S_t$ the quantity supplied, $x_t$ a vector of explanatory

variables, and $\epsilon_{1t}$ and $\epsilon_{2t}$ random error terms. Under the equilibrium

assumption the observed quantity transacted, $Q_t$, is equal to both $D_t$ and

$S_t$; data are observed after prices adjust, and therefore adjustment

models are irrelevant. Under the disequilibrium assumption $D_t$ and $S_t$ are

not necessarily observable, and the function $T_t(\cdot,\cdot)$ specifies the position

of $D_t$ and $S_t$ relative to the observable $Q_t$; data reflect adjustments at

various stages, and therefore it becomes meaningful to model price

adjustment. Price adjustments are modeled as follows: the price change,

$\Delta p_{t+1} = p_{t+1} - p_t$, depends on excess demand, $D_t - S_t$, through the function

$I_t$.


When LS and ML are applied under the disequilibrium assumption, it

becomes necessary to specify the distribution of $(\epsilon_{1t}, \epsilon_{2t})$ up to an

unknown parameter vector, and the functional forms for $T_t$ and $I_t$. The

following example will illustrate this. Consider the problem of

obtaining a consistent LS estimate of $\beta_1$. Under the equilibrium

assumption the data are conditional on the event $Q_t = D_t = S_t$, and therefore a

consistent "equilibrium" LS estimate requires $E(x_t \epsilon_{1t} \mid Q_t = D_t = S_t) = 0$, or

equivalently $E(x_t \epsilon_{1t}) = 0$ since $Q_t = D_t = S_t$ is a sure event by assumption.

Under the disequilibrium assumption, by contrast, each observation is

conditional on either $D_t < S_t$ or $D_t > S_t$, and therefore a consistent

"disequilibrium" LS estimate requires $E(x_t \epsilon_{1t} \mid D_t < S_t) = 0$ and $E(x_t \epsilon_{1t} \mid D_t > S_t) = 0$.

But since, for example, $D_t < S_t$ is equivalent to $\epsilon_{1t} - \epsilon_{2t} < (\beta_2 - \beta_1)' x_t$, the

condition $E(x_t \epsilon_{1t} \mid D_t < S_t) = 0$ does not hold in general; on the

contrary, $E(x_t \epsilon_{1t} \mid D_t < S_t) \neq 0$ generally

holds. As a result, the LS estimator must be corrected for the nonzero

conditional correlation between $x_t$ and $\epsilon_{1t}$; that is, parametric models

must be specified for $E(x_t \epsilon_{1t} \mid D_t < S_t)$ and $E(x_t \epsilon_{1t} \mid D_t > S_t)$. For example,

suppose we specify

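$$\begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix} \right), \qquad (1.3)$$

$$Q_t = T_t(D_t, S_t) \equiv \min(D_t, S_t), \qquad (1.4)$$

$$\Delta p_{t+1} = I_t(D_t, S_t) \equiv \alpha (D_t - S_t), \quad \alpha > 0, \qquad (1.5)$$

and for the first $n_1$ observations we have $\Delta p_{t+1} < 0$. Then $Q_t = D_t$,
$t = 1, \ldots, n_1$, by equations (1.4) and (1.5), and consistent LS estimates can

be obtained by solving the problem

$$\min_{(\beta_1, \beta_2, \sigma_1^2, \sigma_2^2)} \sum_{t=1}^{n_1} \left( Q_t - \beta_1' x_t - E(\epsilon_{1t} \mid Q_t = D_t; \beta_1, \beta_2, \sigma_1^2, \sigma_2^2) \right)^2 .$$

The functional form for the "correction" term $E(\epsilon_{1t} \mid Q_t = D_t)$ follows
from (1.3) and is given by (see note 3)

$$E(\epsilon_{1t} \mid Q_t = D_t) = -\frac{\sigma_1^2}{(\sigma_1^2 + \sigma_2^2)^{1/2}} \cdot \frac{\phi\left( (\beta_2 - \beta_1)' x_t / (\sigma_1^2 + \sigma_2^2)^{1/2} \right)}{\Phi\left( (\beta_2 - \beta_1)' x_t / (\sigma_1^2 + \sigma_2^2)^{1/2} \right)},$$

where $\phi(\cdot)$ and $\Phi(\cdot)$ denote the standard normal density and distribution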

function. Without a priori restrictions the specification of

assumptions (1.3), (1.4), and (1.5) is arbitrary, but obviously crucial

to the LS estimation of the parameters. For example, given what is

known about most markets, some alternative assumptions that are just

as plausible as (1.3), (1.4), and (1.5) are (1) any nonnormal symmetrical

distribution for the error terms, (2) $Q_t \neq \min(D_t, S_t)$, and (3)

$\Delta p_{t+1} = \alpha (D_t - S_t) + \epsilon_{3t}$, where $\epsilon_{3t}$ is a random error term. Alternatively,

one could derive the likelihood function of $(Q_t, x_t)_{t=1}^{n}$, and obtain the

ML estimates. Once again, however, the estimates are subject to the

validity of restrictive assumptions.
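
A small simulation can make the role of the correction term concrete. The sketch below is illustrative only and is not taken from the dissertation: it assumes a single scalar regressor with no intercept, independent normal errors as in (1.3), the minimum rule (1.4), and hypothetical parameter values. It also evaluates the correction term at the true parameters rather than estimating them jointly, so it shows only why a correction is needed, not how the full minimization would be carried out. All function names are invented for the example.

```python
import math
import random


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def correction_term(x, b1, b2, s1, s2):
    """E(eps1 | D < S, x) for independent normal errors with std devs s1, s2."""
    s = math.sqrt(s1 * s1 + s2 * s2)
    z = (b2 - b1) * x / s
    return -(s1 * s1 / s) * std_normal_pdf(z) / std_normal_cdf(z)


def simulate_demand_regime(n, b1, b2, s1, s2, seed=0):
    """Draw (x, Q) pairs and keep only those with D < S, so that Q = D."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = rng.uniform(1.0, 3.0)
        d = b1 * x + rng.gauss(0.0, s1)
        s = b2 * x + rng.gauss(0.0, s2)
        if d < s:  # under (1.4)-(1.5) these are the periods with a price decrease
            sample.append((x, min(d, s)))
    return sample


def ls_slope(pairs):
    """Least squares slope for a regression through the origin."""
    sxy = sum(x * y for x, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    return sxy / sxx


# Hypothetical parameter values chosen only for illustration.
B1, B2, S1, S2 = 1.0, 1.2, 1.0, 1.0
data = simulate_demand_regime(20000, B1, B2, S1, S2)

naive = ls_slope(data)  # ignores the selection D < S
corrected = ls_slope(
    [(x, q - correction_term(x, B1, B2, S1, S2)) for x, q in data]
)
print(f"naive: {naive:.3f}  corrected: {corrected:.3f}  true: {B1}")
```

In this setup the uncorrected slope falls well below the true demand coefficient, while subtracting the conditional mean of the error restores it. The correction is only as good as the normality and minimum-rule assumptions behind it, which is the point of the surrounding discussion.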

Empirical studies of markets in disequilibrium are concerned with

analyzing time-series data, and therefore the possibility of serially

correlated errors also arises. Most disequilibrium studies, however,

completely ignore the possibility of serial correlation. One reason for

this practice is that maximizing the correct likelihood function for a

typical disequilibrium model with even a simple form of serial correlation

can be intractable. The problem is one of introducing further

complications into a highly nonlinear structure. (Equilibrium models,

by contrast, have simpler structures, and therefore it is relatively

straightforward to incorporate an ARMA process (say) into these models.)

The problem is further complicated when the true form of the serial

correlation is unknown; even if one were willing to incorporate a simple

process such as AR(1), the result would likely be a questionable

approximation at best. At the same time, failure to adequately account

for serial correlation can cause inconsistent covariance estimates, and

incorrectly interpreted test statistics.

In summary, the estimation of markets in disequilibrium has been

severely limited by the problems of specifying (1) price adjustment;

(2) serial correlation; (3) the distributions of the error terms up to

an unknown parameter vector; and (4) the quantity transacted as a

function of desired supply and demand.

1.2 Solutions

This thesis addresses the above problems by examining their

effects, and by proposing and demonstrating solutions.

Chapters 2, 3, and 4 are directed at the problems of specifying

price adjustment, and specifying serial correlation. In Chapter 2, we

propose using the switching regression model with imperfect sample

separation of Lee and Porter (1984) to incorporate price adjustment into

disequilibrium models. The model enables price adjustment to be

incorporated with less a priori information than usual. To estimate the

model, ML and LS estimators are proposed.

In Chapter 3, the asymptotic properties of the ML estimator are

examined in the context of possible serial correlation. This chapter

builds on previous results of Hartley and Mallela (1977), and Amemiya

and Sen (1977). By incorporating into their results some recent

developments in modeling serial correlation by White and Domowitz (1984)

and others, the analysis permits the data to be characterized by unknown

and general forms of serial correlation. At the same time, the

estimation problem remains computationally tractable.

In Chapter 4, the practical importance of the methodology developed

in chapters 2 and 3 is illustrated with an empirical example. The

methodology is applied to monthly data on the U.S. commercial loan

market from 1979 to 1984.

The final chapter, Chapter 5, proposes semiparametric models and

estimators for markets in disequilibrium. Unlike the previous chapters,

the results of Chapter 5 apply when the functional forms of the error

distribution functions are unknown, and the observed quantity transacted

is an unknown function of desired supply and demand. Consistent

semiparametric estimators are derived by extending the method of maximum

score of Manski (1975, 1985) to a new class of applications.


Although the focus is on the disequilibrium estimation problem,

many of the issues addressed are applicable to other important problems

as well. From a general perspective, the central issue is how to deal

with an estimation problem characterized by less information than what

is usually assumed. The methodology with which we confront the issue

brings together important works from the areas of limited dependent

variables, nonlinear estimation, asymptotic theory, data analysis,

maximum likelihood, least squares, and semiparametric estimation.


1. One notable difference among many proposed disequilibrium

specifications is the treatment of price adjustment. Different price

adjustment models often produce different coefficient estimates and

inferences for given supply and demand equations. Most studies assume

normally distributed error terms, and that the quantity transacted is

the minimum of desired supply and demand. Surveys of disequilibrium

specifications commonly used in applied work can be found in Bowden

(1978) and Maddala (1983).

2. General discussions of LS and ML estimators for disequilibrium

models can be found in Bowden (1978) and Maddala (1983).

3. The random variable $\epsilon_{1t}$ conditional on $D_t < S_t$ has a truncated
normal distribution. Formulae for means and variances of truncated

random variables can be found in Maddala (1983, pp. 365-370).


2.1 Introduction

Price adjustment has a well defined role in the equilibrium model:

prices adjust to clear the market; data are observed only after

adjustments terminate, and therefore are uninformative on the forces

which led to equilibrium. When we assume prices clear the market,

modeling price adjustment is trivial. In contrast, when we assume

disequilibrium, and therefore observe adjustments at various stages,

modeling the process becomes nontrivial and affects the estimation of

the supply and demand model. The research that has followed Fair and

Jaffee has given this issue only limited attention. To lessen the

neglect, the present chapter examines the importance of price adjustment

to estimation, and offers a new approach for introducing price

adjustment into the disequilibrium model.

The estimation of a disequilibrium model carries the reservation

that estimates are sensitive to the price adjustment specification. This

sensitivity is evident in many of the empirical studies which followed

Fair and Jaffee. For example, Maddala and Nelson (1974) obtained the

maximum likelihood (ML) estimates of a housing market in disequilibrium

under two different price adjustment specifications,

PA1. the sign of excess demand is given by the direction of the price

change, or equivalently

$\Pr(\Delta p_{t+1} > 0 \mid D_t > S_t) = 1$ and $\Pr(\Delta p_{t+1} > 0 \mid D_t < S_t) = 0$;

PA2. ignore whatever information the direction of the price change

contains on excess demand. (This is a limited-information approach

as no attempt is made to model price adjustment.) In the next

section, we shall see that this specification can be usefully

viewed as imposing the following constraint:

$\Pr(\Delta p_{t+1} > 0 \mid D_t > S_t) = \Pr(\Delta p_{t+1} > 0 \mid D_t < S_t)$.

For the two sets of estimates the following conflicts are apparent: one

estimated coefficient is negative under PA1 and positive under PA2;

another is statistically significant under PA2 but not under PA1; the

estimated variance of the supply error term is twenty-five times larger

under PA2, and the same parameter for the demand equation is ten times

larger.


Economic theory imposes few restrictions on the dynamics of price

adjustment, and consequently provides little basis for choosing between

specifications such as PA1 and PA2. Perhaps this explains why in many

studies the Fair-Jaffee models are applied rather mechanically with no

discussion of why a particular choice is appropriate for a given market.

The tendency has been either to specify a convenient but restrictive

price adjustment mechanism such as PA1, or to ignore potential relations

between price and excess demand as in PA2. Apart from the potential for

conflicting results, each approach has serious drawbacks. The

restrictive approach may misspecify the model, and therefore lead to

inconsistent estimates of the supply and demand parameters. On the

other hand, if there is some interaction between price and excess

demand, then efficiency will be lost if price adjustment is completely

ignored. In short, even if the estimates under PA1 are close to those

under PA2, problems remain.

The failure of many empirical studies to adequately represent price

adjustment stems from a failure to carefully assess what is known a

priori. For most applications PA1 imposes too much structure, and PA2

imposes too little. What is needed is an approach which allows price

and excess demand to interact, but at the same time is unrestrictive.

I propose nesting PA1 and PA2 in a more general model using a

method suggested by Lee and Porter (1984). In many respects the

approach is less restrictive than usual. Price adjustments are assumed

to be governed by the following condition:

PA3. The direction of the price change is most likely, but not certain,

to follow the direction of excess demand, or equivalently

$\Pr(\Delta p_{t+1} > 0 \mid D_t > S_t) > \Pr(\Delta p_{t+1} > 0 \mid D_t < S_t)$.

Although PA3 allows for the possibility that excess demand influences

price changes, it does not restrict the direction of the price change to

correspond to the sign of excess demand, impose a specific price

adjustment equation, or restrict price changes to obey a known

probability distribution. The approach entails estimating the

conditional probabilities in PA3, and hence the data rather than a

priori constraints such as PA1 or PA2 determine to what extent prices

are related to excess demand. Moreover, the problem of modeling price

adjustment is placed in a unified framework which permits a useful

discussion of the relationship between the price adjustment

specification, and the statistical properties of estimation. PA1 and

PA2 are special cases of PA3, and it is argued that imposing PA1 can

lead to inconsistent estimates, while imposing PA2 can suppress

exploitable information on the supply and demand parameters.

The model and its maximum likelihood estimator are discussed in

section 2.2. In section 2.3 a convenient least squares approach is

proposed which has not been previously available for disequilibrium

models. The LS estimator resembles that suggested by Heckman (1976) for

the Tobit model. Although the ML estimator presented in section 2.2 is

more efficient, the LS estimator is easier to compute, and provides

consistent starting values if the ML estimates are desired. An initial

consistent estimator is especially important when PA1 is relaxed since

the resulting likelihood generally has multiple solutions.

2.2 The Model and Maximum Likelihood Estimation

I propose the following model:

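$$D_t = \beta_1^{o\prime} x_t + \epsilon_{1t} \qquad \text{(demand)}$$

$$S_t = \beta_2^{o\prime} x_t + \epsilon_{2t} \qquad \text{(supply)}$$

$$\begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{\epsilon_1}^{o2} & 0 \\ 0 & \sigma_{\epsilon_2}^{o2} \end{pmatrix} \right) \qquad \text{(normality)}$$

$$Q_t = \min(D_t, S_t) \qquad \text{(quantity transacted)}$$

$$\Pr(\Delta p_{t+1} > 0 \mid D_t > S_t) > \Pr(\Delta p_{t+1} > 0 \mid D_t < S_t) \qquad \text{(PA3)}$$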

where the variables are as they were defined in Chapter one. The

specification of the demand and supply equations, normality for the

error terms, and the quantity transacted as the minimum of supply and

demand has become standard practice for empirical studies of markets in

disequilibrium. The model differs from previous disequilibrium

specifications with the introduction of PA3: shortages ($D_t > S_t$) and

surpluses ($D_t < S_t$) tend to be followed by price increases and price decreases,
respectively, but in an unpredictable manner. The conditional

probabilities are unknowns which can assume any values in the unit

interval that satisfy the inequality.

The data consist of $n$ observations on $(Q_t, x_t, 1(\Delta p_{t+1} > 0))$, where

$1(\cdot)$ is the indicator function, and the problem is to estimate the

unknown supply and demand parameters along with the conditional

distribution of $1(\Delta p_{t+1} > 0)$ subject to PA3.

To make the problem operational we will adopt the methodology of

Lee and Porter (1984) which entails the following assumptions:1

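Assumption 2.1. Given $D_t > S_t$ (or $D_t < S_t$), $Q_t$ and $1(\Delta p_{t+1} > 0)$ are mutually

independent for all $t$;

Assumption 2.2. the conditional probabilities of PA3 do not vary with

$t$; i.e., $p_{11} \equiv \Pr(\Delta p_{t+1} > 0 \mid D_t > S_t, x_t)$ and $p_{10} \equiv \Pr(\Delta p_{t+1} > 0 \mid D_t < S_t, x_t)$ for all $t$.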

Assumption 2.2 is the simplest assumption that allows the price

adjustment probabilities to be treated as estimable parameters, but is

not the only possible way of doing so. For example, if it is suspected

that price setting behavior differs between certain subsamples, then a

different pair of parameters could be defined for each. One possible

application might be a market where prices are regulated in some

periods, but not in others. Alternatively, a completely varying-parameter

approach is developed in Chapter 5. Although assumptions 2.1 and

2.2 are still somewhat restrictive, arguably the benefits obtained from

imposing them outweigh the costs. Price adjustment is incorporated

without an explicit adjustment equation, a specific distribution for

price changes, or the restriction that price changes reveal the sign of

excess demand. Furthermore, as we shall see next, estimation is

relatively straightforward under assumptions 2.1 and 2.2.
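
To make the data-generating process concrete, the following sketch simulates observations under PA3 with assumptions 2.1 and 2.2. It is an illustration rather than the author's procedure: it assumes a scalar regressor, hypothetical parameter values, and invented function names. The price-change indicator is drawn from a Bernoulli distribution whose success probability depends only on the latent regime, which is one simple way to satisfy the conditional independence in Assumption 2.1.

```python
import random


def simulate_pa3(n, b1, b2, s1, s2, p11, p10, seed=0):
    """Return (x, Q, price_up, shortage) tuples; shortage is latent in practice."""
    if not (0.0 <= p10 <= p11 <= 1.0):
        raise ValueError("this sketch requires 1 >= p11 >= p10 >= 0")
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(1.0, 3.0)
        d = b1 * x + rng.gauss(0.0, s1)      # desired demand
        s = b2 * x + rng.gauss(0.0, s2)      # desired supply
        shortage = d > s
        # The price rises with probability p11 in a shortage and p10 in a surplus,
        # independently of Q given the regime.
        p_up = p11 if shortage else p10
        price_up = 1 if rng.random() < p_up else 0
        rows.append((x, min(d, s), price_up, shortage))
    return rows


# Hypothetical parameter values chosen only for illustration.
rows = simulate_pa3(50000, b1=1.0, b2=1.2, s1=1.0, s2=1.0, p11=0.8, p10=0.3)
up_given_shortage = [r[2] for r in rows if r[3]]
up_given_surplus = [r[2] for r in rows if not r[3]]
freq11 = sum(up_given_shortage) / len(up_given_shortage)
freq10 = sum(up_given_surplus) / len(up_given_surplus)
print(f"Pr(up | D>S) ~ {freq11:.3f}, Pr(up | D<S) ~ {freq10:.3f}")
```

Setting `p11=1.0, p10=0.0` reproduces PA1 (the price change reveals the regime), while `p11 == p10` reproduces PA2 (the price change carries no information on the regime). An econometrician would observe only the first three fields of each tuple.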

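The log likelihood function of $n$ independent observations on

$(Q_t, 1(\Delta p_{t+1} > 0) \mid x_t; \theta, p_{11}, p_{10})$, where $\theta = (\beta_1', \sigma_{\epsilon_1}^2, \beta_2', \sigma_{\epsilon_2}^2)'$, is

$$L_n(\theta, p_{11}, p_{10}) = \sum_{t=1}^{n} \log f_t(Q_t, 1(\Delta p_{t+1} > 0)), \qquad (2.3)$$

where

$$f_t = \left[ p_{11} \int_{Q_t}^{\infty} g_t(D_t, Q_t)\, dD_t + p_{10} \int_{Q_t}^{\infty} g_t(Q_t, S_t)\, dS_t \right]^{1(\Delta p_{t+1} > 0)}$$

$$\times \left[ (1 - p_{11}) \int_{Q_t}^{\infty} g_t(D_t, Q_t)\, dD_t + (1 - p_{10}) \int_{Q_t}^{\infty} g_t(Q_t, S_t)\, dS_t \right]^{1 - 1(\Delta p_{t+1} > 0)},$$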

and $g_t(D_t, S_t)$ is the joint density of $D_t$ and $S_t$ given $x_t$ and $\theta$. Under

fairly general conditions, a consistent and asymptotically normal

estimate of $(\theta^o, p_{11}^o, p_{10}^o)$ can be obtained by maximizing $L_n$ over an

appropriate parameter space. (The asymptotic properties of a maximizer

of $L_n$ are developed in the next chapter.)
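
The structure of the log likelihood can be sketched in a few lines. The code below is a hedged illustration, not the author's implementation: it assumes a scalar regressor, independent normal errors (so the two integrals over the joint density have closed forms), and invented function and argument names. With independent errors, the density of $Q_t$ in the shortage regime is the supply density at $Q_t$ times the probability that demand exceeds $Q_t$, and symmetrically for the surplus regime.

```python
import math


def pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def regime_densities(q, x, b1, b2, s1, s2):
    """Return (density of Q with D > S, density of Q with D < S)."""
    zd = (q - b1 * x) / s1
    zs = (q - b2 * x) / s2
    shortage = pdf(zs) / s2 * (1.0 - cdf(zd))  # Q = S and D > Q
    surplus = pdf(zd) / s1 * (1.0 - cdf(zs))   # Q = D and S > Q
    return shortage, surplus


def log_likelihood(data, b1, b2, s1, s2, p11, p10):
    """Sum of log f_t over (Q, x, price_up) observations."""
    total = 0.0
    for q, x, price_up in data:
        shortage, surplus = regime_densities(q, x, b1, b2, s1, s2)
        if price_up:
            f = p11 * shortage + p10 * surplus
        else:
            f = (1.0 - p11) * shortage + (1.0 - p10) * surplus
        total += math.log(f)
    return total


# A tiny made-up data set of (Q, x, price_up) triples, for illustration only.
data = [(1.1, 1.0, 1), (2.3, 2.0, 0), (2.9, 3.0, 1), (1.6, 1.5, 0)]
theta = dict(b1=1.0, b2=1.2, s1=1.0, s2=1.0)
print(log_likelihood(data, p11=0.8, p10=0.3, **theta))
```

When `p11 == p10`, the function separates into the log likelihood of the quantities alone plus a Bernoulli term for the price indicator, which is why imposing that restriction amounts to discarding the price information about the supply and demand parameters.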

The Maddala-Nelson estimators discussed in section 2.1 are obtained

by maximizing $L_n$ subject to

(PA1): $(p_{11}, p_{10}) = (1, 0)$;

(PA2): $p_{11} = p_{10}$.

As was noted, however, applying these two estimators to a given data set

can produce conflicting results. One advantage of specifying PA3 is

that the parameter space includes the entire region

$\{(p_{11}, p_{10}) : 1 \geq p_{11} \geq p_{10} \geq 0\}$, and consequently it is not necessary to choose between
PA1 and PA2.

By viewing the Maddala-Nelson estimators as constrained maximizers of

$L_n$, two additional limitations that are overcome by specifying PA3

can be seen. First, if the direction of the price change does not

always follow the sign of excess demand so that $p_{11} < 1$ or $p_{10} > 0$, then the

estimator obtained by maximizing $L_n$ subject to $(p_{11}, p_{10}) = (1, 0)$ is

inconsistent. In other words, if it is incorrectly assumed that

$1(\Delta p_{t+1} > 0)$ separates the sample into the underlying demand ($Q_t = D_t$) and

supply ($Q_t = S_t$) regimes, then the resulting estimates will be generally

inconsistent. To see this denote the constrained estimator by $\hat\theta_n(1, 0)$,

and suppose that $\epsilon_{1t}$, $\epsilon_{2t}$ are normally distributed independent random

variables, and $p_{11} < 1$, $p_{10} = 0$. Then it is shown in Appendix A.1 that

$$E\left( \partial L_n(\theta^o; 1, 0) / \partial \beta_1 \right) = (1 - p_{11}) \sum_{t=1}^{n} x_t E(Q_t - D_t) / \sigma_{\epsilon_1}^{o2}. \qquad (2.4)$$

Since $\Pr(D_t > S_t) > 0$, and $\Pr(D_t \geq Q_t = \min(D_t, S_t)) = 1$, we have $E(Q_t - D_t) < 0$,

and it follows from (2.4) that in general $\operatorname{plim} \hat\theta_n(1, 0) \neq \theta^o$. (For further

details see Appendix A.1.)

The second limitation overcome by maximizing $L_n$ over the

unconstrained space demonstrates the importance of incorporating price

adjustment into the model. If price changes are related to excess

demand so that $p_{11} \neq p_{10}$, then the observations on $1(\Delta p_{t+1} > 0)$ contain

information on $\theta^o$ that is exploited by the maximizer of $L_n$ only if the

restriction $p_{11} = p_{10}$ is not imposed. Since imposing $p_{11} = p_{10}$ is equivalent

to estimating the model without a price adjustment specification,

this implies that one is better off using even limited amounts of price

adjustment information rather than neglecting it altogether. This can

be seen by examining the difference between the corresponding

information matrices of the constrained and unconstrained estimators of

$\theta^o$. For this purpose note that $p_{11} = p_{10}$ implies that $Q_t$ and $1(\Delta p_{t+1} > 0)$

are independent, the marginal distribution of $1(\Delta p_{t+1} > 0)$ does not depend

on $\theta^o$, and therefore the $\theta$-estimator obtained by maximizing $L_n$ subject

to $p_{11} = p_{10}$ can be written as

$$\hat\theta_n(p_{11} = p_{10}) = \arg\max_{\theta} \sum_{t=1}^{n} \log g_t(Q_t),$$

where $g_t$ is the density of $Q_t$ given $x_t$. Since $\hat\theta_n(p_{11} = p_{10})$ does not

require the joint observation $(Q_t, 1(\Delta p_{t+1} > 0))$, it uses one more

observation on $Q$ than the $\theta$-estimator obtained by maximizing $L_n$ over

the unconstrained space, and therefore we write the latter estimator as

$$\hat\theta_{n-1}(p_{11} \neq p_{10}) = \arg\max_{(\theta, p_{11}, p_{10})} \sum_{t=1}^{n-1} \log f_t(Q_t, 1(\Delta p_{t+1} > 0)).$$

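Unlike $\hat\theta_n(p_{11} = p_{10})$, the estimator $\hat\theta_{n-1}(p_{11} \neq p_{10})$ uses the price

adjustment information implied by $p_{11} \neq p_{10}$, namely the dependence of

$1(\Delta p_{t+1} > 0)$ and $Q_t$. For simplicity suppose that the observations are

identically distributed. The trade-off between the extra observation on

$Q$ used in $\hat\theta_n(p_{11} = p_{10})$, and the price adjustment information exploited by

$\hat\theta_{n-1}(p_{11} \neq p_{10})$ is apparent in the difference between their corresponding

information matrices:

$$(n-1) E\left( -\frac{\partial^2 \log f}{\partial\theta\, \partial\theta'} \right) - n E\left( -\frac{\partial^2 \log g}{\partial\theta\, \partial\theta'} \right) = (n-1) E\left( -\frac{\partial^2 \log h}{\partial\theta\, \partial\theta'} \right) - E\left( -\frac{\partial^2 \log g}{\partial\theta\, \partial\theta'} \right),$$

where $h_t$ is the density of $1(\Delta p_{t+1} > 0)$ given $(Q_t, x_t)$. In large samples

the information provided by the extra observation on $Q$ in $\hat\theta_n(p_{11} = p_{10})$ is

insignificant, and clearly $\hat\theta_{n-1}(p_{11} \neq p_{10})$ is the more efficient

estimator.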
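
The comparison of information matrices rests on factoring the joint density of the quantity and the price indicator into a conditional and a marginal part. The following lines sketch that step under the usual regularity conditions, with the subscript $t$ suppressed and the observations treated as identically distributed; the layout is an illustration and not a reproduction of the original derivation.

```latex
\begin{align*}
f\bigl(Q, 1(\Delta p > 0)\bigr) &= h\bigl(1(\Delta p > 0) \mid Q\bigr)\, g(Q)
  \quad\Longrightarrow\quad \log f = \log h + \log g, \\[4pt]
E\left(-\frac{\partial^2 \log f}{\partial\theta\,\partial\theta'}\right)
  &= E\left(-\frac{\partial^2 \log h}{\partial\theta\,\partial\theta'}\right)
   + E\left(-\frac{\partial^2 \log g}{\partial\theta\,\partial\theta'}\right), \\[4pt]
(n-1)\,E\left(-\frac{\partial^2 \log f}{\partial\theta\,\partial\theta'}\right)
  - n\,E\left(-\frac{\partial^2 \log g}{\partial\theta\,\partial\theta'}\right)
  &= (n-1)\,E\left(-\frac{\partial^2 \log h}{\partial\theta\,\partial\theta'}\right)
   - E\left(-\frac{\partial^2 \log g}{\partial\theta\,\partial\theta'}\right).
\end{align*}
```

The first term on the right grows with the sample size while the second does not. The conditional density $h$ depends on $\theta$ only when $p_{11} \neq p_{10}$, so the gain from the price indicator vanishes exactly when price changes are uninformative about the regime.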


Having developed a fairly unrestrictive approach for introducing

price adjustment into the disequilibrium model, an important question

remains: is maximization of $L_n$ over $(\theta, p_{11}, p_{10})$ computationally

tractable? This question is important given a common structure shared

by both $L_n$ with $(p_{11}, p_{10})$ unrestricted, and $L_n$ restricted by $p_{11} = p_{10}$:

neither specification permits the observations to be separated into the

underlying supply and demand regimes, and hence both are switching

regression models with unknown sample separation. The question of

tractability arises because likelihood functions of unknown sample

separation models generally have an unknown number of local maxima, and

finding the consistent and asymptotically normal estimate (global

maximum) usually requires an exhaustive set of local candidates. For

example, Maddala and Nelson found that three different starting values

produced three different sets of estimates, and were not able to rule

out the possibility of other solutions. Unfortunately, the extra

information provided by the joint observation $(Q_t, 1(\Delta p_{t+1} > 0))$ does not

automatically eliminate the problem; in general, $L_n$ is likely to have

multiple solutions. Fortunately, the problem can be circumvented. If

one is willing to assume that $p_{11} > p_{10}$, then it is possible to construct

a computationally simple and consistent estimator of $(\theta, p_{11}, p_{10})$, and

therefore obtain consistent starting values to iterate to a local maximum

of $L_n$. The consistency of the initial estimates generally guarantees the
consistency and asymptotic normality of the resulting solution. The

next task is to describe the initial estimator.

2.3. A Consistent Initial Least Squares Estimator

While computationally simple and consistent estimators have been

proposed for other limited dependent variable models such as the Tobit,

similar results have not been previously available for the disequilibrium

model with unknown sample separation. Ironically, the models for

which such estimators have been available generally possess tractable

likelihood functions, and therefore finding consistent initial estimates

is of limited value. A prime example is the Tobit model for which

consistent initial estimators were proposed by Amemiya (1973), and

Heckman (1976); their estimators are not particularly useful for the

Tobit as this model has a globally concave likelihood function (when

suitably parameterized) which ensures convergence to the consistent and
asymptotically normal maximizer from any starting values. In contrast,

the likelihood functions of models with unknown sample separation are

likely to have multiple maxima, and therefore finding initial consistent

estimates for these models is crucial.

The estimator described below extends the approaches suggested by

Amemiya and Heckman to disequilibrium models with unknown sample

separation. The method requires the first moment of $1(\Delta p_{t+1} > 0)$

($t = 1, \ldots, n$), and the first and second moments of $Q_t$ ($t = 1, \ldots, n$). Least

squares is then applied successively to three estimation equations.

Assuming that $\epsilon_{1t}$ and $\epsilon_{2t}$ are independent normally distributed random

variables, the relevant equations are

$$1(\Delta p_{t+1} > 0) = E(1(\Delta p_{t+1} > 0)) + u_{1t}, \qquad (2.5)$$

$$Q_t = E(Q_t) + u_{2t}, \qquad (2.6)$$

$$Q_t^2 = E(Q_t^2) + u_{3t}, \qquad (2.7)$$


E(1(Apt+ >)) = p (p1 P 0) D(xty), (2.8)

E(Qt)= (l-D(x yo))xtB2 + $(xtYo)xt (co2 2+ o2 (x to),(2.9)
tt t2 t t1 1 +e2) (o(9

E(Q) = (1-D(xto))(xt) o2 + (xt )(xt )2

+ t O 2 O(0 02 02+1
+ o2(1-0(x 2x 0 (x o)/(o2 + Co2)

+ o(2(xtY0) 2xtB 0(xt Y)/( o2 + co 2
E2 t t I t E 1 e2

+ ((xty o)xt (2.10)

0 0 0 o2 + 2 2
Y = ( )/( C2 + o2) and )(.) and (.*) denote the standard normal

density and c.d.f., respectively. Given appropriate regularity

conditions, nonlinear LS applied to equation (2.5) yields consistent

estimates of $p_{11}^0$, $p_{10}^0$, and $\gamma^0$. These estimates are then used to estimate

the nonlinear functions, $\phi$ and $\Phi$, in equation (2.6). Ordinary LS can

then be applied to (2.6) to consistently estimate $\beta_1^0$, $\beta_2^0$ and

$((\sigma_1^0)^2+(\sigma_2^0)^2)^{1/2}$. Finally, the nonlinear functions of equation (2.7) are estimated

so that OLS can be applied, and consistent estimates of $(\sigma_1^0)^2$ and $(\sigma_2^0)^2$

obtained. The asymptotic properties of the LS estimator are developed

in appendix A.2.
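
As a rough numerical illustration of the moments behind equations (2.8) to (2.10), the following Python sketch evaluates them for given values of the demand and supply means. The function names are hypothetical and do not come from the text. The second moment is written in a compact form that, for independent normal errors, is algebraically equivalent to the longer expression in (2.10); this simplification is an assumption of the sketch, not the author's notation.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal c.d.f., computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_price_increase(x_gamma, p11, p10):
    """Right-hand side of (2.8): Pr(price rises) given the index x_t * gamma."""
    return p10 + (p11 - p10) * norm_cdf(x_gamma)


def mean_q(mu_d, mu_s, var1, var2):
    """E[min(D, S)] for independent normal D and S, as in (2.9)."""
    s = math.sqrt(var1 + var2)
    g = (mu_d - mu_s) / s  # plays the role of x_t * gamma
    return (1.0 - norm_cdf(g)) * mu_d + norm_cdf(g) * mu_s - s * norm_pdf(g)


def second_moment_q(mu_d, mu_s, var1, var2):
    """E[min(D, S)^2] in compact form (equivalent to (2.10) under independence)."""
    s = math.sqrt(var1 + var2)
    g = (mu_d - mu_s) / s
    big_phi, small_phi = norm_cdf(g), norm_pdf(g)
    return ((1.0 - big_phi) * (mu_d ** 2 + var1)
            + big_phi * (mu_s ** 2 + var2)
            - (mu_d + mu_s) * s * small_phi)
```

When $p_{11}=p_{10}$ the first function returns the same value for every index, which is the sense in which the price-change indicator then carries no information about the supply and demand parameters.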

Interestingly, the above approach is possible only if price changes

provide some information on whether there is excess demand or supply;

i.e., $p_{11}^0 \ne p_{10}^0$. This can be seen from equation (2.8), which can be

interpreted as the probability that $1(\Delta p_{t+1}>0)$ is equal to one. If

price changes are completely uninformative on excess demand or supply,

then $p_{11}^0=p_{10}^0$, and it follows from (2.8) that the distribution of

$1(\Delta p_{t+1}>0)$ is independent of $\gamma^0$. In this case the observations on

$1(\Delta p_{t+1}>0)$ contain no information on the supply and demand parameters,

and therefore equation (2.5) is irrelevant for the estimation of the


2.4 Summary and Conclusions

The main points of this chapter are

(1) Estimates of disequilibrium models are sensitive to the price

adjustment specification.

(2) Economic theory imposes few restrictions on price adjustment, and

consequently provides little basis for choosing between specifications.

(3) Assumption PA3 serves as an unrestrictive approach for introducing

price adjustments into disequilibrium models; adjustment enters without

an explicit adjustment equation, a known probability distribution for

price changes, or the restriction that price changes reveal the sign of

excess demand.

(4) Assumption PA3 together with assumptions 2.1 and 2.2 permit a

straightforward derivation of the likelihood function. The parameter

space includes but is not limited to the important special cases PA1 and

PA2. Constraining the parameter space to PA1, as is often done in

practice, can lead to inconsistent estimation; constraining the space to

PA2 produces inefficient estimates.

(5) Under assumption PA3 the disequilibrium model is one of unknown

sample separation, and therefore its likelihood function generally has

multiple solutions. To resolve the problem of multiple solutions, the

least squares method described in section 2.3 provides consistent

initial estimates.

In Chapter 4 we apply the methodology developed in the present

chapter to monthly data on the U.S. commercial loan market. Before

proceeding to the application, however, the problem of serial

correlation must be addressed. In the next chapter we develop some

results which permit the data to be analyzed in the context of possible

serial correlation.


There is an important difference between the model Lee and Porter

(1984) discuss, and our model. The Lee-Porter model excludes an analog

to Qt=min(Dt,St), and consequently in their model the switching is

exogenous; i.e., the switching that occurs between the underlying

regimes is independent of the error terms. In contrast, the

disequilibrium model is one of endogenous switching (the "switch" depends

on $(\varepsilon_{1t}, \varepsilon_{2t})$), and consequently many of the results, interpretations,

and expressions found in the Lee-Porter paper must be modified


In fact, given appropriate regularity conditions, consistent

initial estimates ensure the consistency and asymptotic normality of the

second-round estimates from a Newton-Raphson type algorithm. See, for

example, Amemiya (1973, pp. 1014-15).

3Olsen (1978) proved that the likelihood function for the Tobit

model is globally concave when suitably parameterized, and thus has a

single maximum.


3.1 Introduction

In this chapter we examine the asymptotic properties of the

estimator discussed in section 2.1,

$\hat{\theta}_n^{ml} = \arg\max_{\theta_p} L_n(\theta_p)$,

where $\theta_p = (\theta, p_{11}, p_{10})$, and $L_n(\theta_p)$ is defined on page 14, equation

(2.3). If the observations are serially independent, then obviously $\hat{\theta}_n^{ml}$

is the MLE of $\theta_p^0$. However, for serially dependent observations, $\hat{\theta}_n^{ml}$ is
not the MLE and will be referred to as the partial-MLE.

Hartley and Mallela (1977), and Amemiya and Sen (1977) derive

asymptotic properties of the MLE for the special case of $p_{11}=p_{10}$. We

will extend their results to the case of serially dependent observations

and $p_{11} \ge p_{10}$ in sections 3.2 and 3.3. In section 3.4 we consider the

problem of consistently estimating the asymptotic covariance matrix. In

section 3.5 we derive a new test for serial correlation.

3.2 Consistency

Since disequilibrium models are typically estimated with time

series data, it is of interest that the property of consistency can be

extended to the partial MLE. Using some results and definitions

presented by White and Domowitz (1981), Levine (1983) has discussed how

and why a partial MLE can be consistent. Levine points out that the

consistency of an estimator $\hat{\theta}_n(y)$ which maximizes the product

$\prod_{t=1}^{n} f_t(y_t|\theta)$ depends on each $f_t(y_t|\theta)$ satisfying certain regularity

conditions. In general, whether or not $f_t(y_t|\theta)$ satisfies such

conditions does not depend on the product being the joint density of

$y = (y_1,\ldots,y_n)$, but rather it usually suffices that $f_t(\cdot|\theta)$ is the marginal

density of $y_t$. The regularity conditions consist of identification

conditions, and moment restrictions sufficient to apply an appropriate

law of large numbers. We will show that the partial MLE for our model

can satisfy such conditions by extending some results proven by Hartley

and Mallela, and Amemiya and Sen. But first it is necessary to describe

the type of dependence we have in mind.

We will adopt the nonparametric approach of White and Domowitz

(1984) to allow for the possibility of serial correlation. The approach

of White and Domowitz is nonparametric in the sense that the

observations are not required to be generated by a known parametric

model such as an ARMA (p,q) process, but instead must obey general

memory requirements. The memory requirements are referred to as mixing

conditions, and a sequence of random variables which obey mixing

conditions is said to be a mixing sequence. More precisely, we have the

following definition.

Definition 3.1. Let $\{y_t\}$ denote a sequence of random vectors defined on

a probability space $(\Omega, F, P)$, and let $F_a^b$ denote the Borel $\sigma$-field of

events generated by the random variables $y_a, y_{a+1}, \ldots, y_b$. Define

$\phi(m) = \sup_n \sup\{|P(B|A) - P(B)| : A \in F_{-\infty}^{n},\ B \in F_{n+m}^{\infty},\ P(A)>0\}$ and

$\alpha(m) = \sup_n \sup\{|P(B \cap A) - P(B)P(A)| : A \in F_{-\infty}^{n},\ B \in F_{n+m}^{\infty}\}$.

(i) If $\phi(m) \to 0$ as $m \to \infty$, and $\phi(m) = O(m^{-k})$ for $k > r/(2r-1)$, where $r \ge 1$, then

$\{y_t\}$ is a mixing sequence with $\phi(m)$ of size $r/(2r-1)$.

(ii) If $\alpha(m) \to 0$ as $m \to \infty$, and $\alpha(m) = O(m^{-k})$ for $k > r/(r-1)$, where $r > 1$, then

$\{y_t\}$ is a mixing sequence with $\alpha(m)$ of size $r/(r-1)$.

$\phi(m)$ and $\alpha(m)$ measure how much dependence exists between observations at

least m periods apart. A sequence such that $\phi(m) \to 0$ as $m \to \infty$ is called

uniform mixing or $\phi$-mixing, and a sequence for which $\alpha(m) \to 0$ as $m \to \infty$ is

called strong mixing or $\alpha$-mixing. Since the dependence coefficients,

$\phi(m)$ and $\alpha(m)$, are required to vanish asymptotically, mixing is a form

of asymptotic independence. A fairly large class of processes satisfy

mixing conditions. For example, finite order Gaussian ARMA processes are

strong mixing, as are stationary Markov chains under fairly general

conditions. White and Domowitz (1984) show that measurable functions of

mixing processes are mixing and of the same size. This is particularly

convenient for nonlinear problems. Mixing processes are useful for

modeling complex economic data since they are not required to be

stationary. In short, mixing conditions provide a convenient way to

model an economic phenomenon that is likely to be both heterogeneous and

time dependent.

The following law of large numbers, due to McLeish (1975), applies
for mixing sequences.

Theorem 3.2. Let $\{y_t\}$ be a sequence with $\phi(m)$ of size $r/(2r-1)$ or $\alpha(m)$

of size $r/(r-1)$, $r>1$, such that $E|y_t|^{r+d} \le A < \infty$ for some $d>0$, and all t. Then

$(1/n) \sum_{t=1}^{n} (y_t - E(y_t)) \overset{p}{\to} 0$.

All proofs of theorems in this Chapter are provided in Appendix A.3.

For Theorem 3.2 to be applicable to a given sequence, it is clear

that there is a trade-off between the moment restriction that the

sequence must satisfy, and allowable dependence. The stronger the

moment restriction satisfied, the more dependence as measured by $\phi(m)$

(or $\alpha(m)$) is allowed. If the members of the sequence are independent,

then we can set $r=1$, and Theorem 3.2 collapses to the Markov law of

large numbers. For sequences with exponential memory decay, r can be

set arbitrarily close to one. In general, the longer the memory of a

sequence, the larger is the size of $\phi(m)$ and $\alpha(m)$, and consequently the

more stringent the moment restriction (which depends on r) becomes.
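
To make the idea of a law of large numbers under dependence concrete, the following Python sketch computes the exact variance of the sample mean of a stationary Gaussian AR(1) process. The AR(1) process and the function name are illustrative assumptions, chosen only because finite order Gaussian ARMA processes are among the mixing examples mentioned above. The variance shrinks at rate 1/n despite the serial dependence, which (by Chebyshev's inequality) is enough for the sample mean to converge in probability.

```python
def var_of_sample_mean(n, rho, sigma2=1.0):
    """Exact variance of the sample mean of n observations from a
    stationary AR(1) process y_t = rho * y_{t-1} + e_t, var(e_t) = sigma2.

    Uses var(ybar) = (gamma0 / n) * [1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * rho**k],
    where gamma0 = sigma2 / (1 - rho**2) is the variance of y_t.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    gamma0 = sigma2 / (1.0 - rho ** 2)
    total = 1.0
    for k in range(1, n):
        total += 2.0 * (1.0 - k / n) * rho ** k
    return gamma0 * total / n
```

With rho = 0 the function reduces to the familiar sigma2 / n of the independent case; with rho close to one the variance is larger at every n but still tends to zero.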

By using mixing conditions to restrict the serial behavior of the

sequence $(Q_t, 1(\Delta p_{t+1}>0), x_t)$, it is not necessary to specify an

additional parametric model such as an ARMA (p,q) process.

Consequently, one possible source of model misspecification is

eliminated. Mixing conditions enable us to include a larger class of

models in the analysis. Of course, as Theorem 3.2 implies, the precise

size of the class will depend on what moment restrictions are satisfied.

We are now ready to state conditions which ensure the consistency of the

partial-MLE (and the MLE) of $\theta_p^0$.

In order to establish consistency for the partial-MLE we impose the

following assumptions on the disequilibrium model presented in section


Assumption 3.3. (allowable serial dependence): The sequence

$(Q_t, 1(\Delta p_{t+1}>0), x_t)$ is a mixing sequence with $\phi(m)$ of size $r/(2r-1)$, $r \ge 1$,

or $\alpha(m)$ of size $r/(r-1)$, $r>1$.

Assumption 3.4. (distributions):

(i) The random vector $(\varepsilon_{1t}, \varepsilon_{2t})$ is normally distributed with mean zero

and covariance matrix:

$\begin{pmatrix} (\sigma_1^0)^2 & 0 \\ 0 & (\sigma_2^0)^2 \end{pmatrix}$

(ii) Assumptions 2.1 and 2.2 hold. (See page 13.)

Assumption 3.5. (the regressors):

(i) The vector xt consists of only exogenous variables.

(ii) Each component of xt is uniformly bounded in t, has a finite

range for each t, and a support given by St=S for all t.

(iii) Any linear combination of the components of xt where the

coefficients are not all zero is not zero with probability one.

Assumption 3.6. (the parameter space):

(i) The parameter space $\Theta$ includes the true parameter vector
$\theta_p^0=(\beta_1^0,\beta_2^0,(\sigma_1^0)^2,(\sigma_2^0)^2,p_{11}^0,p_{10}^0)$, excludes the region $\sigma_i^2 \le 0$ ($i=1,2$) and

$p_{10}>p_{11}$, and is a compact subset of a Euclidean space.

(ii) If the set $\Theta$ includes points such that $p_{11}=p_{10}$, then it excludes
the point $\tilde{\theta}_p=(\beta_2^0,\beta_1^0,(\sigma_2^0)^2,(\sigma_1^0)^2,p_{11}^0,p_{10}^0)$. Otherwise $\Theta$ may include

With a few exceptions, the conditions on the regressors and the

parameter space are identical to those given by Hartley and Mallela

(1977), and Amemiya and Sen (1977). One exception is that we place no

restrictions on the limiting behavior of the empirical distribution of

the regressors, whereas Hartley and Mallela require it to converge

completely to a nondegenerate distribution. As pointed out by White

(1980), in sampling situations where the researcher has little control

over the data, it is important to allow for the possibility that the

empirical distribution does not converge. In contrast to Amemiya and

Sen, we do not require the regressors to be i.i.d., but for convenience

retain their assumption that the regressors are discrete random


Assumption 3.6(ii) is necessary to identify the true parameter
vector $\theta_p^0$. Without appropriate prior information on $\beta_1^0$ and $\beta_2^0$, the
interchanged point $\tilde{\theta}_p$ of Assumption 3.6(ii) is indistinguishable from $\theta_p^0$ and the model can not be
estimated. This is the problem of interchanging regimes which is

discussed by Hartley and Mallela, and Amemiya and Sen. Both studies
point out that $\tilde{\theta}_p$ is eliminated from the parameter space if the usual

"order condition" holds. We will extend this result below by showing

that for $\theta_p^0$ to be distinguishable from $\tilde{\theta}_p$ it suffices to know a priori
that $p_{11}^0 > p_{10}^0$. In this sense prior sample separation information

represents prior information on the supply and demand parameters.

Hoadley (1971) has generalized the Wald argument to the case of

independent not identically distributed observations. Theorem 3.7 below

is an extension of Hoadley's argument to mixing sequences, and will be

used to verify that assumptions 3.3, 3.4, 3.5, and 3.6 imply consistency

for the partial-MLE, $\hat{\theta}_n^{ml}$.³

Theorem 3.7. Suppose:

(i) The sequence $\{y_t\}$ is a mixing sequence with $\phi(m)$ of size $r/(2r-1)$,

$r \ge 1$, or $\alpha(m)$ of size $r/(r-1)$, $r>1$.

(ii) The parameter space $\Theta$ is a compact subset of a Euclidean space.

(iii) The function $f_t(y_t|\theta)$ is continuous on $\Theta$, uniformly in t, a.e.

(iv) The function

$\sup\{\ln(f_t(y_t|\theta')/f_t(y_t|\theta^0)) : |\theta'-\theta| \le \rho\}$

is a measurable function of $y_t$ for each $\theta$ belonging to $\Theta$.

(v) There exists $\rho(\theta)>0$, and $d>0$ such that

$E|\sup\{\ln(f_t(y_t|\theta')/f_t(y_t|\theta^0)) : |\theta'-\theta| \le \rho\}|^{r+d} \le A < \infty$

for $0 \le \rho \le \rho(\theta)$.

(vi) For $\theta \ne \theta^0$,

$\limsup_{n} \{ n^{-1} \sum_{t=1}^{n} E(\ln(f_t(y_t|\theta)/f_t(y_t|\theta^0))) \} < 0$.

Let $\hat{\theta}_n(y)$ be a function of the observations $y=(y_1,\ldots,y_n)$ which

solves the problem

$\max_{\theta} \prod_{t=1}^{n} f_t(y_t|\theta)$.

Then plim $\hat{\theta}_n(y)=\theta^0$.

To show that the partial-MLE $\hat{\theta}_n^{ml}$ is a consistent estimator of $\theta_p^0$ we
verify that $\Theta$, $(Q_t, 1(\Delta p_{t+1}>0), x_t)$, and $f_t(Q_t, 1(\Delta p_{t+1}>0)|\theta_p)$ satisfy

3.7(i)-(vi) given assumptions 3.3-3.6.

The fact that the mixing and compactness requirements 3.7(i) and

3.7(ii) are satisfied follows directly from assumptions 3.3 and 3.6(i).

Lemma 3.8 establishes that $f_t(Q_t, 1(\Delta p_{t+1}>0)|\theta_p)$ satisfies the

continuity requirement 3.7(iii).

Lemma 3.8. Given assumptions 3.4-3.6, $f_t(Q_t, 1(\Delta p_{t+1}>0)|\theta_p)$ is a

continuous function of $\theta_p$ uniformly in t, a.e.

Lemma 3.9 establishes that the measurability requirement, 3.7(iv),

is satisfied.

Lemma 3.9. Given assumptions 3.3-3.6, the function

$\sup\{\ln(f_t(Q_t, 1(\Delta p_{t+1}>0)|\theta_p')/f_t(Q_t, 1(\Delta p_{t+1}>0)|\theta_p^0)) : |\theta_p'-\theta_p| \le \rho\}$

is a measurable function of $(Q_t, 1(\Delta p_{t+1}>0), x_t)$.

The moment restriction, 3.7(v), together with 3.7(i) determines the

amount of dependence allowable. The following lemma extends Hartley and

Mallela's Corollary 4.2, and establishes that 3.7(v) is satisfied for

large r+d.

Lemma 3.10. Given assumptions 3.3-3.6, for all sufficiently small

$\rho=\rho(\theta_p)>0$,

$E|\sup\{\ln(f_t(Q_t, 1(\Delta p_{t+1}>0)|\theta_p')/f_t(Q_t, 1(\Delta p_{t+1}>0)|\theta_p^0)) : |\theta_p'-\theta_p| \le \rho\}|^{k} \le A < \infty$,

where k is any positive integer.

Finally, Lemma 3.11 establishes that the identification condition,

3.7(vi), is satisfied. Lemma 3.11 extends Amemiya and Sen's Lemmas 2

and 3 to the case of $p_{11} \ge p_{10}$.

Lemma 3.11. Given assumptions 3.3-3.6, for $\theta_p \ne \theta_p^0$ there exists a
negative constant $b(\theta_p)$ such that

$E(\ln(f_t(Q_t, 1(\Delta p_{t+1}>0)|\theta_p)/f_t(Q_t, 1(\Delta p_{t+1}>0)|\theta_p^0))) \le b(\theta_p)$.

We have proven the following theorem.

Theorem 3.12. Given assumptions 3.3-3.6, plim $\hat{\theta}_n^{ml} = \theta_p^0$.

3.3 Asymptotic Normality

Under the assumption that $(Q_t, 1(\Delta p_{t+1}>0), x_t)$ is a mixing sequence,

we consider the limiting distribution of

$n^{-1/2} V_n^{-1/2}(\theta_p^0) \nabla L_n(\theta_p^0)$,

where $\nabla L_n(\theta_p^0)$ denotes the gradient vector corresponding to $L_n(\theta_p^0)$, and
$V_n(\theta_p^0) = \mathrm{var}(n^{-1/2} \nabla L_n(\theta_p^0))$. We will discuss conditions that imply
asymptotic normality; that is,

$n^{-1/2} V_n^{-1/2}(\theta_p^0) \nabla L_n(\theta_p^0) \overset{A}{\sim} N(0,I)$,   (3.13)

where I denotes an identity matrix of appropriate dimensions. The

results in this section together with those in the next section permit

derivation of asymptotic test statistics.

As is well known, asymptotic normality is proven by an appropriate

application of a central limit theorem. As with consistency, the

conditions sufficient for asymptotic normality depend on the degree of

dependence and heterogeneity the sequence exhibits. For a sequence of

independent identically distributed random vectors, we have the

Lindeberg-Levy Theorem; for independent not identically distributed we

have the Lindeberg-Feller Theorem; for dependent identically distributed

we have the central limit theorem of Gordin (1969); for dependent not

identically distributed we have the central limit theorem of Serfling


For the case of independent observations, Hartley and Mallela

(1977) prove the asymptotic normality result (3.13) by applying a

version of the Lindeberg-Feller Theorem. However, by specifying the

sequence $(Q_t, 1(\Delta p_{t+1}>0), x_t)$ as mixing, a more general result is

possible. The following theorem is based on Theorem 2.4 of White and

Domowitz (1984) which generalizes Serfling's (1968) central limit


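Theorem 3.14. Suppose:

(i) Let $\nabla L_n(\theta_p^0) = \sum_{t=1}^{n} \nabla L_t(\theta_p^0)$. Then $E(\nabla L_t(\theta_p^0))=0$ for all t.

(ii) Let $\lambda$ be any nonzero vector, and define

$\nabla L_{n,a}(\theta_p^0) = n^{-1/2} \sum_{t=1+a}^{n+a} \nabla L_t(\theta_p^0)$.

Then there exists a matrix V such that det(V)>0, and

$\lambda E(\nabla L_{n,a}(\theta_p^0) \nabla L_{n,a}(\theta_p^0)^T) \lambda^T - \lambda V \lambda^T \to 0$

as $n \to \infty$ uniformly in a.

(iii) $E|\nabla L_t(\theta_p^0)|^{2r} \le A < \infty$ for some $r>1$.

If $\phi(m)$ or $\alpha(m)$ is of size $r/(r-1)$, then (3.13) holds.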

Condition 3.14(i) is the familiar condition that the vector of

likelihood equations, when evaluated at the true parameter vector $\theta_p^0$,

has zero expectation. Sufficient conditions for 3.14(i) are (1) the

model is correctly specified, and (2) the density of $(Q_t, 1(\Delta p_{t+1}>0), x_t)$

is sufficiently regular to permit differentiation under the integral


Condition 3.14(ii) is somewhat restrictive, but unfortunately a

less restrictive replacement for it is currently not available.

Condition 3.14(ii) restricts the heterogeneity of $\nabla L_t(\theta_p^0)$ by requiring
it to be covariance stationary asymptotically.

Condition 3.14(iii) is a moment condition which depends on the

amount of dependence the sequence $(Q_t, 1(\Delta p_{t+1}>0), x_t)$ exhibits. If the

sequence is serially independent, then r can be set arbitrarily close to

one; as the amount of dependence increases, as measured by $\phi(m)$ or $\alpha(m)$,

r increases accordingly.

3.4 Consistent Covariance Estimation

We consider the problem of deriving consistent estimators for the

asymptotic covariance matrix of the partial-MLE $\hat{\theta}_n^{ml}$. The expression for
the asymptotic covariance matrix is

$n^2\, \nabla^2 \bar{L}_n(\theta_p^0)^{-1}\, V_n(\theta_p^0)\, \nabla^2 \bar{L}_n(\theta_p^0)^{-1}$,

where $V_n(\theta_p^0) = \mathrm{var}(n^{-1/2} \nabla L_n(\theta_p^0))$, and $\nabla^2 \bar{L}_n(\theta_p^0)$ is the matrix of second
order partial derivatives of $\bar{L}_n(\theta_p^0) = E(L_n(\theta_p^0))$.

First consider the problem of consistently estimating the term

$n \nabla^2 \bar{L}_n(\theta_p^0)^{-1}$. The functional form of this term does not depend on the
serial dependence (or independence) of the observations, and therefore

consistent estimation of it is straightforward. The following theorem,

which combines Lemma 2.6 of White (1980) with Theorem 2.3 of White and

Domowitz (1984), provides conditions that imply

plim $n(\nabla^2 L_n(\hat{\theta}_n^{ml})^{-1} - \nabla^2 \bar{L}_n(\theta_p^0)^{-1}) = 0$.

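Theorem 3.15. Let $q_t(y_t,\theta)$ be measurable for each $\theta$ belonging to a

compact set $\Theta$, and continuous on $\Theta$ uniformly in t a.e. Suppose:

(i) The sequence $\{y_t\}$ is mixing as stated in Definition 3.1.

(ii) For $r \ge 1$ and any $d>0$,

$\sup_{\theta} E|q_t(y_t,\theta)|^{r+d} < \infty$.

If plim $\hat{\theta}_n=\theta^0$, then plim $n^{-1} \sum_{t=1}^{n} (q_t(y_t,\hat{\theta}_n) - E[q_t(y_t,\theta^0)]) = 0$.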

Next consider the problem of consistently estimating $V_n(\theta_p^0)$.

Unlike the term $\nabla^2 \bar{L}_n(\theta_p^0)$, the functional form of $V_n(\theta_p^0)$ depends on the
nature of the serial correlation, and consequently special care must be

taken. The general form for $V_n(\theta_p^0)$ is

$V_n(\theta_p^0;c(n)) = n^{-1} \sum_{t=1}^{n} E(f_t(\theta_p^0) f_t(\theta_p^0)^T)$

$\quad + n^{-1} \sum_{s=1}^{c(n)-1} \sum_{t=s+1}^{n} E[f_t(\theta_p^0) f_{t-s}(\theta_p^0)^T + f_{t-s}(\theta_p^0) f_t(\theta_p^0)^T]$,

where $f_t(\theta_p) \equiv \nabla \log f_t(\theta_p)$, $f_t(\cdot)$ is the density of $(Q_t, 1(\Delta p_{t+1}>0), x_t)$ and

c(n) is such that $E(f_t(\theta_p^0) f_{t-s}(\theta_p^0)^T) = 0$ for $s \ge c(n)$. The natural choice
for an estimator of $V_n(\theta_p^0;c(n))$ is the sample analogue $\hat{V}_n(\hat{\theta}_n^{ml};c(n))$.
The consistency of such an estimator, however, depends on the asymptotic

behavior of c(n). We will consider two special cases.

Case 1. c(n) = c

If c(n) is equal to a known finite constant c which is less than or

equal to the sample size minus one (if c=n, then the estimator

$\hat{V}_n(\hat{\theta}_n^{ml};c)=0$), then imposing the conditions of Theorem 3.15 will suffice for

plim $(\hat{V}_n(\hat{\theta}_n^{ml};c) - V_n(\theta_p^0;c))=0$.

An example of a sampling situation where c is a known finite constant

would be one in which the observations are known to be generated from a

moving-average process of order c.
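
The sample analogue for a fixed c can be sketched in a few lines of Python. This is a minimal illustration rather than the author's code: the function name is hypothetical, the score vectors are assumed to be supplied as plain lists, and the sketch assumes that lags 1 through c-1 enter the estimator, which is the reading suggested by the remark that the estimator vanishes when c=n.

```python
def covariance_estimate(scores, c):
    """Sample analogue of V_n for a fixed c.

    scores: list of n score vectors f_t (each a list of k floats),
            evaluated at the parameter estimate.
    c:      autocovariances at lags 1, ..., c-1 are included
            (an assumption of this sketch).
    Returns a k x k matrix as a list of lists.
    """
    n, k = len(scores), len(scores[0])
    v = [[0.0] * k for _ in range(k)]

    def add_outer(u, w):
        # Accumulate the outer product u w' into v.
        for i in range(k):
            for j in range(k):
                v[i][j] += u[i] * w[j]

    for t in range(n):
        add_outer(scores[t], scores[t])
    for s in range(1, c):
        for t in range(s, n):
            add_outer(scores[t], scores[t - s])
            add_outer(scores[t - s], scores[t])
    return [[entry / n for entry in row] for row in v]
```

When every lag is included (c equal to n) the sum collapses to the outer product of the summed scores, which is zero whenever the scores sum to zero, as they do at an interior maximizer of the likelihood.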

If c is assumed to be constant and less than or equal to n-1, but

otherwise unknown, then the problem becomes more complicated. Let $\bar{c}$

denote the specified choice for an unknown c. In the next section we

derive an asymptotic test for the hypothesis $\bar{c}=c$. The test is a

possible criterion for specifying $\bar{c}$. The issues involved in specifying

$\bar{c}$ are the following. If we specify $\bar{c}<c$, then the estimator
is inconsistent since nonzero terms in $V_n(\theta_p^0;c)$ are mistakenly
constrained to be zero. On the other hand, if we specify $\bar{c}>c$, then the

estimator is consistent, but inefficient since restrictions of the form

$E(f_i(\theta_p^0) f_j(\theta_p^0)^T)=0$ are neglected. When the purpose of estimating
$V_n(\theta_p^0;c)$ is to construct asymptotic test statistics, however, the
essential requirement is consistency (rather than efficiency).

Therefore, when the purpose is hypothesis testing, the choice $\bar{c}>c$ is

preferable to $\bar{c}<c$.

Case 2. $\lim_{n \to \infty} c(n)=\infty$ and $\lim_{c(n) \to \infty} E(f_t(\theta_p^0) f_{t-c(n)}(\theta_p^0)^T)=0$.

In this case the sequence $\{f_t(\theta_p^0)\}$ is only assumed to be

asymptotically uncorrelated. A sufficient condition for $\{f_t(\theta_p^0)\}$ to be

asymptotically uncorrelated is that the sequence $(Q_t, 1(\Delta p_{t+1}>0), x_t)$ be

mixing. Theoretical results for this case have been presented by White

and Domowitz (1984), White (1984), and Newey and West (1985). Their

results depend on restricting the growth rate of c(n). Unfortunately,

their results do not give any guidance concerning the choice of c(n) for

finite samples. The following theorem is due to Newey and West (1985),

and provides sufficient conditions for plim $(\hat{V}_n(\hat{\theta}_n^{ml};c(n)) - V_n(\theta_p^0;c(n))) = 0$.

Theorem 3.16. Suppose

(i) $f_t(\theta_p)$ is measurable in $(Q_t, 1(\Delta p_{t+1}>0), x_t)$ for each $\theta_p$ and
continuously differentiable in $\theta_p$ in a neighborhood N of $\theta_p^0$.

(ii) (a) sup Ift(o ) 12 0 EN
(b) There are finite constants $d>0$ and $r \ge 1$ such that

$E|f_t(\theta_p^0)|^{4(r+d)} < \infty$.

(iii) $(Q_t, 1(\Delta p_{t+1}>0), x_t)$ is a mixing sequence with $\phi(m)$ of size 2 or

$\alpha(m)$ of size $2(r+d)/(r+d-1)$, $r>1$.

(iv) For all t, $E(f_t(\theta_p^0))=0$, and $n^{1/2}(\hat{\theta}_n^{ml}-\theta_p^0)$ is bounded in probability.

If $\lim c(n)=\infty$ such that $c(n)=o(n^{1/4})$, then

plim $(\hat{V}_n(\hat{\theta}_n^{ml};c(n)) - V_n(\theta_p^0;c(n)))=0$.

One additional problem is that for c(n)>1 the estimate $\hat{V}_n(\hat{\theta}_n^{ml};c(n))$

is not necessarily positive semi-definite. This can lead to negative

estimates of the variances and test statistics, which are clearly not

acceptable. To ensure that $\hat{V}_n(\hat{\theta}_n^{ml};c(n))$ is positive semi-definite, the
summands can be weighted according to a procedure described in Newey and

West (1985). This modification does not affect the consistency of the estimator.
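
The problem of negative variance estimates, and the effect of down-weighting the autocovariance terms, can be seen in a scalar example. The Python sketch below is only an illustration: the function name is hypothetical, and the linearly declining (Bartlett-type) weights 1 - s/c are an assumption about the form of the weighting, chosen because weights of this kind are commonly associated with Newey and West. Lags 1 through c-1 are included, as assumed for the unweighted estimator.

```python
def scalar_variance_estimate(scores, c, weighted=True):
    """Variance estimate for scalar scores using lags 1, ..., c-1.

    weighted=True applies linearly declining weights 1 - s/c to the lag-s
    term (an assumed Bartlett-type scheme); weighted=False applies weight 1.
    """
    n = len(scores)
    v = sum(f * f for f in scores) / n
    for s in range(1, c):
        weight = 1.0 - s / c if weighted else 1.0
        autocov = sum(scores[t] * scores[t - s] for t in range(s, n)) / n
        v += 2.0 * weight * autocov
    return v


# Alternating scores produce a strongly negative first autocovariance.
example = [1.0, -1.0, 1.0, -1.0]
unweighted = scalar_variance_estimate(example, 2, weighted=False)  # -0.5
weighted = scalar_variance_estimate(example, 2, weighted=True)     # 0.25
```

In this example the unweighted estimate is negative while the weighted one is not, which is the behavior the weighting is meant to guarantee.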


3.5 An Asymptotic Test for Serial Correlation

In this section we propose a test sensitive to serial correlation

in the gradient vectors $f_t(\theta_p^0)$. The test provides a criterion for
specifying the constant c of the covariance estimator $\hat{V}_n(\hat{\theta}_n^{ml};c)$.

The null hypothesis of interest is

$H_0: E(f_t(\theta_p^0)_i\, f_{t-c}(\theta_p^0)_j)=0$ for all i,j,

where $f_t(\theta_p^0)_i$ denotes the i-th component of the vector $f_t(\theta_p^0)$. The
basis for a test of $H_0$ comes from two observations.

(1) Under $H_0$, linear combinations of the components of the vector

$f_t(\theta_p^0)$ are uncorrelated with linear combinations of the components
of the vector $f_{t-c}(\theta_p^0)$.

(2) Under $H_0$, the products $f_t(\hat{\theta}_n^{ml})_i\, f_{t-c}(\hat{\theta}_n^{ml})_j$ should be close to zero
for sufficiently large n.

Therefore, a reasonable strategy for testing $H_0$ would be to compute the

sample correlation between appropriate linear combinations, and reject

$H_0$ if the sample correlation is too large in some sense. To this end,
for a k-dimensional vector $f_t(\theta_p^0)$, consider the artificial regression

$\sum_{i=1}^{k} w_{it}\, f_t(\hat{\theta}_n^{ml})_i = \sum_{i=1}^{k} a_i\, f_{t-c}(\hat{\theta}_n^{ml})_i$,

where the $w_{it}$ are known constants such that $\sum w_{it}=1$, and the $a_i$ are

unknowns to be estimated. The test we propose entails computing the OLS
estimates $\hat{a}_i$, $i=1,\ldots,k$, and testing the hypothesis $a_1= \cdots =a_k=0$. More

formally, we have the following theorem.

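Theorem 3.17. Define $a = (a_1, \ldots, a_k)^T$,

$f_{-c}(\theta_p) = \begin{pmatrix} f_1(\theta_p)_1 & \cdots & f_1(\theta_p)_k \\ \vdots & & \vdots \\ f_{n-c}(\theta_p)_1 & \cdots & f_{n-c}(\theta_p)_k \end{pmatrix}$   $((n-c) \times k)$,

$f(\theta_p) = \begin{pmatrix} \sum_{i=1}^{k} w_{i,c+1}\, f_{c+1}(\theta_p)_i \\ \vdots \\ \sum_{i=1}^{k} w_{i,n}\, f_n(\theta_p)_i \end{pmatrix}$   $((n-c) \times 1)$.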

In addition to H suppose

(i) The vector-valued function f(0 ) is continuously differentiable
(component by component) on an open convex set E CR containing

(ii) There exists an open neighborhood of 0 ,N, such that

sup ft (6 )il SeNt p
8 EN

sup |lf (0 )i/90 | 0 EN P P-
n 0
(iii) plim n Z ft( )=0

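(iv) Let $A_n(\theta_p)=n^{-1} f_{-c}(\theta_p)^T f_{-c}(\theta_p)$, and $\bar{A}_n(\theta_p) = E(A_n(\theta_p))$. Then
there exists an open neighborhood of $\theta_p^0$, $N^0$, such that $\bar{A}_n(\theta_p)$ is

positive definite on $N^0$ for all n sufficiently large and

plim $\sup_{\theta_p \in N^0} |A_n(\theta_p) - \bar{A}_n(\theta_p)| = 0$.

(v) Let $U_n(\theta_p^0) = \mathrm{var}(n^{-1/2} f_{-c}(\theta_p^0)^T f(\theta_p^0))$, and let $\hat{U}_n(\hat{\theta}_n^{ml})$ denote the
sample analogue. Then $U_n(\theta_p)$ is positive definite on an open
neighborhood of $\theta_p^0$ for all n sufficiently large and

plim $(\hat{U}_n(\hat{\theta}_n^{ml}) - U_n(\theta_p^0)) = 0$.

(vi) Under $H_0$, $U_n(\theta_p^0)^{-1/2}\, n^{-1/2} f_{-c}(\theta_p^0)^T f(\theta_p^0) \overset{A}{\sim} N(0,I)$.

Let $D_n(\theta_p)=A_n(\theta_p)^{-1} U_n(\theta_p) A_n(\theta_p)^{-1}$. Then given conditions (i)-(vi), and
$H_0$,

$n\, \hat{a}_n^T\, \hat{D}_n(\hat{\theta}_n^{ml})^{-1}\, \hat{a}_n \overset{A}{\sim} \chi^2_k$.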
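
The first step of the test, the OLS fit of the artificial regression, can be sketched as follows. This Python fragment is a minimal illustration under stated assumptions: the function names are hypothetical, the score vectors and weights are supplied as plain lists, and only the coefficient estimates are computed. The chi-squared statistic of Theorem 3.17 additionally requires the covariance estimate for the regression, which is not attempted here.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for j in range(col, k + 1):
                m[r][j] -= factor * m[col][j]
    x = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, k))
        x[i] = (m[i][k] - tail) / m[i][i]
    return x


def artificial_regression(scores, weights, c):
    """OLS coefficients from regressing the weighted sum of the components
    of f_t on the components of f_{t-c}, for t = c+1, ..., n.

    scores[t] and weights[t] are lists of length k (0-based t).
    """
    n, k = len(scores), len(scores[0])
    y = [sum(w * f for w, f in zip(weights[t], scores[t])) for t in range(c, n)]
    x = [scores[t - c] for t in range(c, n)]
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)
```

If the weighted combination of current scores is an exact linear function of the lagged scores, the routine recovers those coefficients; under the null hypothesis the estimated coefficients should instead be close to zero in large samples.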

3.6 Summary and Conclusions

The main points of this chapter are the following:

(1) The assumptions presented in sections 3.2 and 3.3 imply that the

partial-MLE of the disequilibrium model is consistent and asymptotically

normal. The assumptions allow for serial correlation of an unknown

form; for example, an arbitrary ARMA process is allowable for the

observations. At the same time, the estimator $\hat{\theta}_n^{ml}$ is computed as though

the observations were serially independent, and thus computational

tractability is retained.

(2) To calculate asymptotic test statistics, a consistent estimate of

the asymptotic covariance matrix is needed. Obtaining a consistent

covariance estimator is complicated by the need to specify a constant c

such that $E(f_t(\theta_p^0) f_{t-s}(\theta_p^0)^T)=0$ for all $s \ge c$. In general, c is unknown but
consistent covariance estimation depends on specifying a $\bar{c}$ such that $\bar{c}>c$.

(3) The test statistic presented in Section 3.5 permits a test of

$H_0: \bar{c}=c$, and thus provides a criterion for specifying $\bar{c}$.


Our discussion of mixing draws heavily on White and Domowitz

(1984), and White (1984, pp. 43-47).

2Theorem 3.2 is a less general version of the law of large numbers

presented by McLeish (1975, Theorem 2.10). The version we present is

discussed in White (1984, Corollary 3.48), and imposes a stronger but

simpler moment restriction.

3White and Domowitz (1984) extend Hoadley's Theorem A.5, which is a

uniform law of large numbers, to mixing sequences by applying Theorem

2.10 of McLeish (1975) instead of Markov's law of large numbers. Here

we merely point out that Hoadley's Theorem 1 can be extended to mixing

sequences using the same technique.

In some respects the conditions of Theorem 3.7 are stronger than

those stated in Hoadley's Theorem 1. For example, the requirement that

$f_t(y_t|\theta)$ is continuous can be replaced by upper semi-continuity. The

conditions that we state are sufficiently general for our purposes.


4.1 Introduction

In this chapter the disequilibrium model described in section 2.2

(page 12) is fitted to monthly data on the U.S. commercial loan market

from 1979 to 1984. The problem is to analyze disequilibrium supply and

demand behavior with limited a priori information imposed on the price

adjustment process. The model is estimated and tested with the

partial-MLE and least squares method described in sections 2.2 and 2.3,

respectively. The possibility of serial correlation is accounted for

using methods described in Chapter three.

Disequilibrium models of commercial loan markets have been

estimated by Laffont and Garcia (1977), Sealy (1979), and Ito and Ueda

(1981). These works were consulted in designing the specification of the

supply and demand equations. Our model and estimation methods, however,

differ from the previous studies in three important respects. First,

price enters the model differently. Laffont and Garcia, and Ito and

Ueda constrained the price change to separate the sample, and Sealy

assumed that price changes were a linear function of normal random

variables. Second, the starting values we employ for maximizing the

likelihood function are consistent estimates, and therefore ensure

convergence to an asymptotically desirable solution. None of the above

studies employed methods that guarantee this. Third, we will adopt the

nonparametric approach developed in Chapter three to allow for the

possibility of serial correlation. Given that the data is a time

series, allowing for serial correlation is particularly important.

Failure to do so can cause inconsistent covariance estimates and

therefore misleading test statistics. In contrast, most existing

disequilibrium studies, including those mentioned above, apply methods

to time series data that are only appropriate for serially independent

observations. The nonparametric approach was chosen for its generality,

and computational ease. An arbitrary ARMA process is allowable for the

error terms, but at the same time the parameter estimators are computed

as though the errors are serially independent. As opposed to an

assumption of serial independence, the only part of the problem that

changes is the calculation of the asymptotic covariance estimate.

4.2 The Empirical Model

The empirical model to be estimated and tested is specified as


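$D_t = \beta_{10} + \beta_{11}(RL_t - RA_t) + \beta_{12} IP_{t-1} + \varepsilon_{1t}$,

$S_t = \beta_{20} + \beta_{21}(RL_t - RT_t) + \beta_{22} TD_t + \varepsilon_{2t}$,

$Q_t = \min(D_t, S_t)$,

$p_{11} \ge p_{10}$, where $p_{11} = \Pr(\Delta RL_{t+1}>0 \mid D_t>S_t)$, and

$p_{10} = \Pr(\Delta RL_{t+1}>0 \mid D_t<S_t)$.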
The variables we use will differ little from those of the previous

studies. The variable RL is the average prime rate charged by banks; RA

is the Aaa corporate bond rate, and reflects the price of alternative

financing to firms; IP is the industrial production index and measures

firms' expectations about future economic activity; RT is the three-month

treasury bill rate, and represents an alternative rate of return for

banks; TD is total bank deposits in billions of dollars, and is a scale

variable. The observed quantity transacted, Q, is specified as the sum

of commercial and industrial loans, and the relevant price change is

$\Delta RL_{t+1} = RL_{t+1} - RL_t$. All interest rates are expressed as percentages. The

sample consists of 72 observations on each variable, and can be found in

various issues of the Federal Reserve Bulletin.
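
The structure of the empirical model, a minimum condition on quantity combined with a price-change indicator whose probability depends on the regime, can be mimicked with simulated data. The Python sketch below is purely illustrative: it uses a single artificial regressor in place of the interest rate spreads, IP and TD, and every name and parameter value in it is hypothetical rather than taken from the study.

```python
import random


def simulate_market(n, beta1, beta2, sd1, sd2, p11, p10, seed=0):
    """Simulate n periods of a disequilibrium market.

    beta1, beta2: (intercept, slope) pairs for demand and supply.
    sd1, sd2:     standard deviations of the normal demand and supply errors.
    p11, p10:     Pr(price rises | excess demand), Pr(price rises | excess supply).
    Returns a list of tuples (q, price_up, x, d, s); in practice only
    q, price_up and x would be observed.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        d = beta1[0] + beta1[1] * x + rng.gauss(0.0, sd1)
        s = beta2[0] + beta2[1] * x + rng.gauss(0.0, sd2)
        q = min(d, s)                      # short side of the market
        p_up = p11 if d > s else p10       # regime-dependent probability
        price_up = 1 if rng.random() < p_up else 0
        rows.append((q, price_up, x, d, s))
    return rows


# With (p11, p10) = (1, 0) the price change reveals the regime exactly.
sample = simulate_market(200, (1.0, -0.5), (0.5, 0.8), 0.3, 0.3, 1.0, 0.0)
```

Setting $(p_{11}, p_{10})=(1,0)$ reproduces known sample separation, while intermediate values leave the regime of each observation uncertain.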

4.3 Hypothesis Testing Procedures

Two hypotheses concerning the price adjustment process, and several

hypotheses concerning serial correlation were tested. The first price

adjustment hypothesis maintains that the direction of the price change

$1(\Delta p_{t+1}>0)$ can be used to separate the sample into the underlying supply

($Q_t=S_t$) and demand ($Q_t=D_t$) regimes. The approach we have chosen to

model price adjustment permits the known sample separation hypothesis to

be conveniently expressed as

$H_0: (p_{11}, p_{10})=(1,0)$.

The null hypothesis was tested by computing a Lagrange multiplier (LM)

test. The LM test was chosen over the Wald and likelihood ratio tests

because it only requires the estimates under the computationally

simpler null hypothesis.

The second price adjustment hypothesis maintains that price

adjustments are symmetrical in the following sense: the chance of a

price increase during a shortage is the same as that of a decrease

during a surplus. This hypothesis can be expressed as

$H_0: p_{11}=1-p_{10}$.

To test the hypothesis of symmetrical price adjustment, a Wald test was

computed. The Wald test was chosen over the LM and likelihood ratio

tests because it only requires the unconstrained estimates. In this

case the constrained estimates (those obtained under $H_0$) offer no
computational advantage over the unconstrained estimates.
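
For a single linear restriction such as symmetric price adjustment, the Wald statistic reduces to a squared ratio. The following Python sketch shows the arithmetic only; the function name and the numerical inputs are hypothetical and are not estimates from this study. The variance and covariance arguments are assumed to be entries of a consistently estimated covariance matrix of the two probability estimates.

```python
CHI2_1DF_CRITICAL_5PCT = 3.841  # 5% critical value of chi-squared with 1 d.f.


def wald_symmetry(p11_hat, p10_hat, var_p11, var_p10, cov_p11_p10):
    """Wald statistic for the restriction p11 + p10 - 1 = 0.

    The restriction is linear with gradient (1, 1) in (p11, p10), so its
    estimated variance is var_p11 + var_p10 + 2 * cov_p11_p10.
    """
    restriction = p11_hat + p10_hat - 1.0
    variance = var_p11 + var_p10 + 2.0 * cov_p11_p10
    return restriction ** 2 / variance


# Hypothetical numbers for illustration only.
stat = wald_symmetry(0.9, 0.3, 0.01, 0.01, 0.0)
reject = stat > CHI2_1DF_CRITICAL_5PCT
```

Because the statistic depends on the covariance estimate, different choices of c can move it across the critical value, which is how conflicting test outcomes can arise.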

The LM and Wald test statistics converge to their usual chi-squared

limiting distributions provided that:

(1) n-2 (oo)VL (OAN(O,I);
n p n p
(2) a constant c is chosen such that plim (V (?lCc)-V (90;c))=0.
n n n p

If VL (G ) is a k-dimensional vector, and both (1) and (2) hold, then we
n p
can conclude

VL ( )TV (dV ) -1L (o) A 2
n p n n n p ()X"k

(See, for example, White (1984, Theorem 4.30)).

The specification of c was handled as follows. The LM statistic

for the first $H_0$ and the Wald statistic for the second $H_0$ were each
computed for several successive values of c. The LM statistic was
computed for c=1,...,12, and in each case the null hypothesis
$(p_{11}, p_{10})=(1,0)$ was rejected. The Wald statistic, however, produced
conflicting evidence for the hypothesis $p_{11}=1-p_{10}$; for some values of c
the hypothesis was rejected, and for others it was not. To choose

among the conflicting evidence, the test statistic for serial

correlation (See Section 3.6) was computed for several values of c. On

this basis c was specified, and a single covariance estimate for the

Wald test was chosen. The covariance estimate chosen for the Wald test

was also used to compute the asymptotic standard errors of the parameter
estimates.

The test statistic for serial correlation depends on the

correlation between linear combinations of the components of $f_t(\theta_p^o)$ and
linear combinations of the components of $f_{t-c}(\theta_p^o)$. Therefore, the
conclusion of the test depends on how the linear combinations are
chosen, or in other words, on the specified weights $w_{it}$ (see page 36).
For example, the test might reject $H_0$ for some set of weights, and not
reject $H_0$ for other sets. To help cope with this difficulty, it was
decided to choose the weights randomly from a uniform distribution on
the interval (0,1). If there is a finite or countable number of sets of
weights such that $H_0$ is incorrectly rejected or accepted, then choosing
the weights from a continuous distribution ensures that, with probability
one, none of these sets is chosen. The weights were generated from a

uniform distribution by a SAS random number generator.

4.4 The Results

The model was estimated under the assumption that the error terms

are independent normal variates with constant variances, but are not

necessarily serially independent. First the LS method was applied.

The LS estimates are reported in the first column of Table 1, and were

used as starting values to obtain the ML estimates presented in the

second and third columns. A computer program was written with the SAS

"Matrix Procedure" for the purpose of maximizing the likelihood

functions; the program uses the quadratic hill-climbing technique as

presented in Goldfeld, Quandt, and Trotter (1966). In Appendix A.4 we

describe the quadratic hill-climbing technique, and show that consistent

initial estimates ensure that the second-round estimates obtained from

the technique have the same asymptotic distribution as the partial-MLE.
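For readers unfamiliar with the method, the following is a simplified scalar sketch in the spirit of quadratic hill-climbing: a Newton step is taken on a Hessian that is shifted by a constant whenever it is not sufficiently negative. It is not the SAS program used for the estimates, and the objective function, the constant `r`, and all names are illustrative assumptions; Appendix A.4 remains the reference for the version actually used.

```python
def quadratic_hill_climb(grad, hess, theta, r=1.0, tol=1e-10, max_iter=100):
    """Maximize a scalar function by Newton steps on a shifted Hessian."""
    for _ in range(max_iter):
        g, h = grad(theta), hess(theta)
        if abs(g) < tol:
            break
        # Shift the Hessian only when it is not sufficiently negative, so the
        # local quadratic approximation always has a maximum.
        alpha = max(h + r * abs(g), 0.0)
        theta = theta - g / (h - alpha)
    return theta

# Illustrative objective f(theta) = theta**2 / 2 - theta**4 / 4, maxima at +/-1.
# Its Hessian is positive near zero, where a plain Newton step would fail.
grad = lambda th: th - th ** 3
hess = lambda th: 1.0 - 3.0 * th ** 2
theta_hat = quadratic_hill_climb(grad, hess, theta=0.2)
print(theta_hat)
```

Starting from 0.2, where the Hessian is positive, the shifted step still moves uphill; once the Hessian is negative enough the shift is zero and the iteration reduces to Newton's method.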

The estimates in column two of Table 1 maximize the likelihood

subject to (p11,P10)=(1,0), or equivalently, under the assumption that

the direction of the price change separates the sample into the

underlying supply and demand regimes. Unlike previous studies a test of

this hypothesis was carried out. The constrained estimates were used to

construct the Lagrange multiplier (LM) statistics. The LM statistic was

computed with twelve different covariance estimates (c=1,...,12). As

the figures in Table 2 indicate, the hypothesis of known sample

separation is rejected. The conclusion of the LM test has two important

implications for the analysis of the data and model. First it suggests

that the price change alone should not be used to determine whether the

sample period was characterized by excess demand, excess supply, or

both. In most disequilibrium studies this type of analysis is routinely

done. Second, as was shown in Section 2.1, incorrect sample separation

adversely affects the large sample properties of the estimators. In

view of this problem the constrained estimates are suspect.

The next estimation was performed over the unconstrained space, and

consequently $p_{11}$ and $p_{10}$ were estimated along with the other parameters.
In this case all of the initial consistent estimates were employed, and
therefore the estimates in column three represent the consistent and
asymptotically normal solution. The ML estimates are not much different
from the LS estimates. This is due to stopping iteration before
complete convergence to a maximum of the likelihood function. The

iterative technique performed poorly for the unconstrained likelihood in

the sense that the speed of convergence was extremely slow. For this

reason, the final estimates were taken from the 100th iteration, at which
the gradient is not yet sufficiently close to zero, and therefore are not
true ML estimates. However, since the initial estimates are consistent,

estimates obtained after the second iteration are asymptotically

equivalent to the ML estimates, and therefore nothing is lost by

stopping iteration before convergence, at least asymptotically. Further

details regarding this point are provided in Appendix A.4. The

particular specification chosen for the model performed well in the

sense that all of the estimates are of the correct sign, and most are

significant. The estimates of p11 and p10 are .8179 and .2455,

respectively, which means that there is (1) an 81.79% chance of a price
increase and an 18.21% chance of a decrease during shortages, and (2) a 75.45%
chance of a decrease and a 24.55% chance of an increase during surpluses.

To select a covariance estimator for the Wald test of $H_0: p_{11}=1-p_{10}$,
the serial correlation statistic was computed for c=1,2,3. (See Table
3.) The hypothesis of c=3 was not rejected. The Wald test statistic did
not reject the hypothesis $H_0: p_{11}=1-p_{10}$ (see Table 4), suggesting that
price adjustments are symmetrical.
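The significance levels reported in Table 3 can be checked with a short calculation. The sketch below evaluates the upper tail of a chi-squared distribution using the closed form available for even degrees of freedom. The degrees of freedom are not stated in this section; the value 10 used here is an assumption, chosen because it reproduces the reported levels, and the function name `chi2_sf_even` is hypothetical.

```python
import math

def chi2_sf_even(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variate with even df."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j          # builds (x/2)**j / j!
        total += term
    return math.exp(-half) * total

# Serial correlation statistics from Table 3, with df = 10 ASSUMED.
for c, stat in [(1, 32.7552), (2, 14.2697), (3, 10.6509)]:
    print(c, stat, f"{100 * chi2_sf_even(stat, 10):.3f}%")
```

Under that assumption the statistic for c=1 is significant at well below the 1% level, while those for c=2 and c=3 are not significant at conventional levels, in line with the choice of c=3.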

The differences which arise when the imperfect sample separation

given by the price change is ignored can be seen by comparing columns

two and three of Table 1. While both sets of estimates give the correct

signs for the supply and demand variables, the unconstrained estimates

suggest that demand and supply are less responsive to price changes than

do the constrained estimates. The unconstrained estimate of the price

parameter for the supply equation is approximately 40% less than the

constrained estimate, and the price coefficient for the demand equation

is approximately 14% smaller in absolute value for the unconstrained

estimate. Given the rejection of the known sample separation model,

however, we are more inclined to believe the unconstrained estimates.

The problem of determining whether the period 1979-84 was characterized

by excess demand or supply was also addressed with the

unconstrained estimates. This was accomplished by estimating the

probability of excess demand for each t conditional on the quantity

transacted and the direction of the price change. The expression for

this conditional probability is

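$$\Pr(D_t>S_t \mid Q_t, 1(\Delta p_{t+1}>0)) = \frac{(p_{11}\,g_{st})^{1(\Delta p_{t+1}>0)}\,((1-p_{11})\,g_{st})^{1(\Delta p_{t+1}\le 0)}}{f_t(Q_t, 1(\Delta p_{t+1}>0))}.$$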

The results are reported in Table 5. As pointed out by Lee and Porter

(1984), the classification rule: $Q_t=S_t$ if $\Pr(D_t>S_t \mid Q_t, 1(\Delta p_{t+1}>0)) > .5$

and Qt=Dt otherwise, is optimal in the sense that it minimizes the

probability of misclassification. Applying this rule, we find that

54.2% of the observations are excess demand and 45.8% excess supply.

In contrast, if one were to rely solely on the direction of the price

change, the conclusion would be 31.9% excess demand, 43.1% excess

supply, and for 25% of the observations, $\Delta p_{t+1}=0$. In Table 6, the
compatibility of the direction of the price change with the optimal
classification rule is further examined. Comparing the two rules,
excluding the observations for which $\Delta p_{t+1}=0$, we find that 9

observations out of 54 are classified differently.
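The classification rule can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the procedure used to produce Table 5: it assumes the denominator of the conditional probability is the sum of an excess-demand term and an excess-supply term, takes the regime densities `g_s` and `g_d` as given numbers, and uses hypothetical function names. The default values of `p11` and `p10` are the unconstrained estimates reported above.

```python
def prob_excess_demand(price_up, g_s, g_d, p11, p10):
    """Probability that D_t > S_t given Q_t and the direction of the price change.

    g_s: density of Q_t under the excess-demand regime (Q_t = S_t).
    g_d: density of Q_t under the excess-supply regime (Q_t = D_t).
    """
    w_demand = (p11 if price_up else 1.0 - p11) * g_s
    w_supply = (p10 if price_up else 1.0 - p10) * g_d
    return w_demand / (w_demand + w_supply)

def classify(price_up, g_s, g_d, p11=0.8179, p10=0.2455):
    """Assign Q_t = S_t (excess demand) when the probability exceeds .5."""
    prob = prob_excess_demand(price_up, g_s, g_d, p11, p10)
    return "excess demand" if prob > 0.5 else "excess supply"

print(classify(True, 1.0, 1.0))     # price rose, densities equal
print(classify(False, 10.0, 1.0))   # price did not rise, but Q_t fits supply far better
```

The second call shows how the rule can disagree with the direction of the price change: a sufficiently large density ratio outweighs the evidence from the price movement.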

Table 1

Estimated parameters and statistics.
(Asymptotic standard errors in parentheses)

Initial estimates          $p_{11}=1$, $p_{10}=0$

demand const.

supply const.

log likelihood

( 1.170)

( 94.36)

( 87.40)

( .0673)

( .2752)

Table 2

Test of $H_0: (p_{11}, p_{10}) = (1,0)$

$V_n(\cdot\,; c)$     LM Statistic     $H_0$ rejected at α% level

Table 3

Test of $H_0$: c=c

c     Serial Correlation Statistic     $H_0$ rejected at α% level

1     32.7552                          0.030%

2     14.2697                          16.104%

3     10.6509                          38.540%

Table 4

Test of $H_0: p_{11} = 1-p_{10}$

$V_n(\cdot\,; c)$     Wald Statistic     $H_0$ rejected at α% level





Table 6

Compatibility of the Direction of the Price Change with the
Optimal Classification Rule

                      $\Pr(D_t>S_t \mid Q_t, 1(\Delta p_{t+1}>0)) > .5$     $\Pr(D_t>S_t \mid Q_t, 1(\Delta p_{t+1}>0)) < .5$

$\Delta p_{t+1}>0$

$\Delta p_{t+1}<0$


5.1 Introduction

We consider an alternative estimation strategy not previously

analyzed for a disequilibrium model. The strategy is the so-called

"semiparametric" estimation developed in Manski (1975), Cosslett (1983),

Powell (1984), Manski (1985), and some others. Semiparametric

estimators have been shown to be consistent under more general

conditions than the conventional LS and ML estimators, and therefore

require fewer prior restrictions. For a number of cases where

consistent LS and ML estimation require the functional form of the error

distribution, consistent semiparametric estimators have been derived

without imposing functional form. Powell did so for the censored

regression model using the method of least absolute deviations, Cosslett

derived a distribution-free ML estimator for the binary choice model,

and Manski derived consistent estimators for the same model using the

method of maximum score. Semiparametric estimation is most useful when

parametric assumptions cannot be trusted, but are needed for consistent

LS and ML estimation. In particular, it offers an improved strategy for

estimating disequilibrium models.

We derive consistent semiparametric estimators for disequilibrium

models using the method of maximum score of Manski (1975, 1985).

Consistent score estimators are derived for the following situations:

the functional forms of the error distributions are unknown, the

quantity transacted is an unknown function of supply and demand, and the

price change is an unknown function of excess demand. The presentation

comprises three models and their score estimators. The models we

consider are all of the following form:

M (model): Given the supply and demand equations $S_t = \beta_2^o x_t + \epsilon_{2t}$ and
$D_t = \beta_1^o x_t + \epsilon_{1t}$, the iid sequence of random vectors $(Q_t, p_t, x_t)_{t=1}^{n+1}$, the
event $S_{pq}$ involving either $p_t$ or $Q_t$, and the event $S_x$ involving $x_t$:

$\Pr(S_{pq} \mid S_x; \beta_1^o, \beta_2^o) > \Pr(S_{pq}^c \mid S_x; \beta_1^o, \beta_2^o)$, and
$\Pr(S_{pq}^c \mid S_x^c; \beta_1^o, \beta_2^o) > \Pr(S_{pq} \mid S_x^c; \beta_1^o, \beta_2^o)$;

where $S^c$ denotes the complement of the event $S$. General, intuitive
considerations motivate the specification of $(S_{pq}, S_x)$ for each model.
For example, the intuition that an expected shortage (excess demand) is

a better predictor of a positive price change than an expected surplus

motivates the model in Section 5.2. Given the model, consistent

estimation depends on general continuity and identification assumptions

which do not require prior knowledge of the functional forms of the

underlying distribution functions or explicit equations for quantity or
price.

The model in Section 5.2 concerns events involving the price

change, $\Delta p_{t+1} = p_{t+1}-p_t$, and expected excess demand, $\beta^o x_t \equiv \beta_1^o x_t - \beta_2^o x_t$,
or more specifically, the binary variables $1(\Delta p_{t+1}>0)$ and $1(\beta^o x_t>0)$,
where $1(\cdot)$ denotes the indicator function. The model maintains that
given $1(\beta^o x_t>0)$, the best forecast of $1(\Delta p_{t+1}>0)$ corresponds to
$1(\Delta p_{t+1}>0) = 1(\beta^o x_t>0)$. A score estimator of $\beta^o$ is defined and
assumptions for consistency given. The model resembles the binary
response model studied by Manski (1975, 1985), and shares an
identification problem: $\beta^o$ is only identified up to an unknown
multiplicative scalar.

The model in Section 5.3 is a more restrictive version of that in

Section 5.2, but retains a considerable amount of generality. The model

is designed to exploit the fully observable $\Delta p_{t+1}$ (versus $1(\Delta p_{t+1}>0)$) to
identify $\beta^o$. A consistent score estimator is presented, and we show
that $\beta^o$ is identified without a loss of scale. The model represents a

completely new application for maximum score estimation as it differs

significantly from the model studied by Manski.

The estimators presented in Sections 5.2 and 5.3 do not depend on
the quantity transacted, $Q_t$, and therefore impose no restrictions on it.
By neglecting the observations on $Q_t$, however, the generality involves a
loss of information. In Section 5.4 we specify a model for $Q_t$ and
define a corresponding score estimator. The specification, however, is
insufficient to identify $\beta^o$ (even up to a multiplicative scalar) without
severely restricting the distribution of $x_t$. To eliminate the
identification problem the models of the previous sections are added to
the specification, and the estimator is redefined. The resulting
estimator uses the entire sample $(Q_t, \Delta p_{t+1}, x_t)_{t=1}^n$, and is shown to be

consistent under general conditions.

5.2 A Directional Model and Consistent Estimation Up to Scale

The directional model restricts the direction of the price change

to be most likely, but not certain, to follow the sign of expected excess
demand, or equivalently

M5.1 (directional model): $\Pr(\Delta p_{t+1}>0 \mid \beta^o x_t>0) > \Pr(\Delta p_{t+1}\le 0 \mid \beta^o x_t>0)$, and

$\Pr(\Delta p_{t+1}\le 0 \mid \beta^o x_t\le 0) > \Pr(\Delta p_{t+1}>0 \mid \beta^o x_t\le 0)$.

The motivation for M5.1 is its compatibility with an intuitively

appealing forecast procedure: if a shortage is expected at time t,

$\beta^o x_t>0$, then predict a positive price change, $\Delta p_{t+1}>0$; otherwise,
predict a nonpositive change. Given M5.1, the number of correct
forecasts must eventually exceed the number of incorrect forecasts.

The forecast procedure in turn motivates a strategy for estimating

$\beta^o$ from n observations on $(\Delta p_{t+1}, x_t)$: choose as an estimate of $\beta^o$ a
value $\beta$ that maximizes the proportion of the observations characterized
by $1(\Delta p_{t+1}>0) = 1(\beta x_t>0)$. This is the method of maximum score. We
propose the score estimator:

$\hat\beta_n = \arg\max_{\beta\in B} g_n(\beta)$, where $g_n(\beta) = n^{-1}\sum_{t=1}^n g_t(\beta)$, and

$g_t(\beta) = 1(\Delta p_{t+1}>0)\,1(\beta x_t>0) + 1(\Delta p_{t+1}\le 0)\,1(\beta x_t\le 0)$.

The function $g_t(\cdot)$ "scores" one if a candidate $\beta$ implies a forecast
compatible with the maintained model, M5.1, and zero otherwise.
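A minimal numerical sketch of the score estimator may help fix ideas. It evaluates the sample score (the share of observations whose price-change direction agrees with the sign of the index) and maximizes it by grid search over unit-length parameter vectors in two dimensions. The grid search, the simulated data, the true parameter `beta_true`, and all function names are illustrative assumptions, not part of the original analysis.

```python
import math
import random

def score(beta, dp, xs):
    """Sample score: share of observations with 1(dp > 0) == 1(beta . x > 0)."""
    hits = 0
    for d, x in zip(dp, xs):
        index = sum(b * xi for b, xi in zip(beta, x))
        hits += int((d > 0) == (index > 0))
    return hits / len(dp)

def max_score_unit_circle(dp, xs, n_grid=360):
    """Grid search over unit-length beta in R^2 (only the direction matters)."""
    best_beta, best = None, -1.0
    for k in range(n_grid):
        angle = 2.0 * math.pi * k / n_grid
        beta = (math.cos(angle), math.sin(angle))
        s = score(beta, dp, xs)
        if s > best:
            best_beta, best = beta, s
    return best_beta, best

# HYPOTHETICAL data: the median of the price change given x is beta_true . x.
rng = random.Random(0)
beta_true = (0.8, -0.6)
xs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]
dp = [beta_true[0] * x[0] + beta_true[1] * x[1] + rng.gauss(0, 0.3) for x in xs]
beta_hat, best = max_score_unit_circle(dp, xs)
print(beta_hat, best)
```

Because the score is a step function of the candidate vector, derivative-based optimizers are not applicable; a grid or other direct search is the natural choice in low dimensions.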

Manski (1985) presents a consistent score estimator for a model of

the form MED(y|x)=bx, where MED(z) denotes the median of the random
variable z. His consistency proof, however, depends only on the weaker
model: Pr(y>0 | bx>0) > Pr(y<0 | bx>0) and Pr(y<0 | bx<0) > Pr(y>0 | bx<0).

We have postulated our model in the weaker form for two reasons. First,

the weaker model is easy to interpret as a price adjustment model;

positive price changes occur most frequently with expected shortages,

and negative changes with expected surpluses. Second, but not less

important, $\mathrm{MED}(\Delta p_{t+1} \mid x_t)=\beta^o x_t$ is unnecessarily restrictive.

Manski's consistency proof (1985, p. 323) is directly applicable

for $\hat\beta_n$ assuming appropriate regularity conditions are met. Theorem 5.2
below provides assumptions that imply $\hat\beta_n$ converges to $\beta^o$ almost
everywhere (a.e.) as n becomes indefinitely large.

Theorem 5.2. In addition to M5.1 assume:

A5.3. (continuity): $E(g_t(\beta)) \equiv g(\beta)$ is continuous in $\beta$ on a compact set
$B$.

A5.4. (identification): The set $A_x(\beta) = \{x: \mathrm{sgn}(\beta^o x) \ne \mathrm{sgn}(\beta x)\}$ has
positive probability for all $\beta\in B$ such that $\beta\ne\beta^o$.

Then $\lim_{n\to\infty} \hat\beta_n = \beta^o$ a.e.


Step 1. Uniform convergence.

The proof of uniform convergence uses the argument presented in Manski

(1985, pp. 321-2). Observe that

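$g_n(\beta) = P_n(\Delta p_{t+1}>0, \beta x_t>0) + P_n(\Delta p_{t+1}\le 0, \beta x_t\le 0)$, and
$g(\beta) = P(\Delta p_{t+1}>0, \beta x_t>0) + P(\Delta p_{t+1}\le 0, \beta x_t\le 0)$,
where $P_n$ and $P$ represent the empirical and true distributions. Therefore,
the generalized Glivenko-Cantelli theorem of Rao (1962, Theorem 7.2)
implies

$\lim_{n\to\infty} \sup_{\beta\in B} |g_n(\beta) - g(\beta)| = 0$ a.e.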

Step 2. Identification.

M5.1 and A5.4 imply that $\beta^o$ uniquely maximizes $g(\beta)$. To see this,
write

$E(g_t(\beta^o) - g_t(\beta)) = \int_{A_x^c(\beta)} E(g_t(\beta^o) - g_t(\beta) \mid x_t)\,dF_x$

$\qquad + \int_{A_x(\beta)} E(g_t(\beta^o) - g_t(\beta) \mid x_t)\,dF_x$,

where $A_x^c(\beta)$ denotes the complement of $A_x(\beta)$, and $F_x$ the distribution
function of x. The first term on the right-hand side vanishes given the
definition of $g_t$, and under M5.1 the second term is strictly positive.
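The positivity of the second term can be spelled out for one of the two cases. The display below is a sketch under the assumption that the inequalities of M5.1 are read as holding conditionally on $x_t$ (the model itself states them conditionally on the sign of expected excess demand). Take a point $x_t$ in the set where the two sign patterns differ, with $\beta^o x_t > 0$ and $\beta x_t \le 0$; then the score at $\beta^o$ equals $1(\Delta p_{t+1} > 0)$ and the score at $\beta$ equals $1(\Delta p_{t+1} \le 0)$, so that

```latex
\begin{aligned}
E\bigl(g_t(\beta^o) - g_t(\beta) \mid x_t\bigr)
  &= E\bigl(1(\Delta p_{t+1} > 0) - 1(\Delta p_{t+1} \le 0) \mid x_t\bigr) \\
  &= \Pr(\Delta p_{t+1} > 0 \mid x_t) - \Pr(\Delta p_{t+1} \le 0 \mid x_t) \;>\; 0 .
\end{aligned}
```

The case $\beta^o x_t \le 0$ and $\beta x_t > 0$ is symmetric and uses the second inequality of M5.1.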

Step 3. $\lim_{n\to\infty} \hat\beta_n = \beta^o$ a.e.

Given A5.3, Step 1, and Step 2, a.e. convergence follows from Theorem 2

of Manski (1983).


The assumptions permit a fairly general disequilibrium model. The

consistency proof does not depend on the distributions of $\epsilon_{1t}$ and $\epsilon_{2t}$,

or how the market determines the quantity transacted. Consistency

depends on a price adjustment model which enters without an explicit

adjustment equation, or a known functional form for the probability

distribution of prices. It suffices to believe that an expected

shortage (surplus) is a better predictor of a positive (nonpositive)

price change than an expected surplus (shortage).

The generality of the assumptions, however, has costs. In

particular, a careful examination of A5.4 reveals that $\beta^o$ is only
identified up to an arbitrary scale factor. The identification problem
results from the failure of the obvious, but necessary condition that
$A_x(\beta)$ be nonempty for all $\beta\ne\beta^o$. Observe that for any $\lambda>0$ we have
$\mathrm{sgn}(\lambda\beta^o x) = \mathrm{sgn}(\beta^o x)$ for all vectors x, and therefore $A_x(\lambda\beta^o)$ is an
empty set. Thus, if points of the form $\beta=\lambda\beta^o$ are included in the
parameter space, $B$, then A5.4 fails as does identification (Step 2).
Manski (1985) resolves the problem by normalizing the parameter space
with respect to scale, which effectively eliminates the troublesome
points. Scale normalization suffices for A5.4, but the conclusion of
Theorem 5.2 becomes $\lim \hat\beta_n = \lambda\beta^o$ a.e., where $\lambda$ is an unknown scalar.$^2$
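The emptiness of the disagreement set under positive rescaling is easy to confirm numerically. In the sketch below, the parameter vector, the scale factor, and the randomly drawn x values are all hypothetical; the point is only that a positive multiple of a vector produces the same sign for every index, so the sample analogue of the set is empty.

```python
import random

def sign_pattern(beta, xs):
    """For each x, record whether the index beta . x is positive."""
    return [sum(b * xi for b, xi in zip(beta, x)) > 0 for x in xs]

rng = random.Random(1)
xs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(1000)]
beta = (1.0, -2.0)                      # hypothetical parameter vector
scaled = tuple(2.0 * b for b in beta)   # positive multiple, lambda = 2

# Points at which the two sign patterns differ: the sample analogue of A_x.
disagreements = [x for x, a, b in zip(xs, sign_pattern(beta, xs),
                                      sign_pattern(scaled, xs)) if a != b]
print(len(disagreements))
```

Since no observation can distinguish the two vectors, any criterion built only from these signs takes the same value at both, which is the loss of scale described above.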

The loss of scale can be interpreted as arising from insufficient

information. The directional model represents prior information on the

stochastic behavior of the signs of $\Delta p_{t+1}$ and $\beta^o x_t$, but not their
magnitudes; by construction the estimator depends only on the signs.
The limited information permits a fairly general model, but limits what
can be learned about $\beta^o$. We shall see next that the loss of scale can
be eliminated by imposing assumptions on the magnitudes of $\Delta p_{t+1}$ and
$\beta^o x_t$. At the same time it is possible to retain a considerable amount of
generality.

5.3 A Price Adjustment Model with $\beta^o$ Identified (Without a Loss of Scale)

Manski (1985) discusses the score estimator for a binary response

model where the dependent variable, y*, is unobservable, and the sample

consists of observations on 1(y*>0). In the last section the price
change was treated analogously to obtain a robust method of estimation.
Unlike the problem considered by Manski, however, $\Delta p_{t+1}$ is generally

observable. To take advantage of the extra information, and thus obtain

a stronger result, we propose the following model.

M5.5 (directional-magnitude model): for appropriately specified numbers

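$\epsilon>0$ and $\delta>0$,

$\Pr(\Delta p_{t+1}>\epsilon \mid \beta^o x_t>\delta) > \max(\Pr(|\Delta p_{t+1}|\le\epsilon \mid \beta^o x_t>\delta),\ \Pr(\Delta p_{t+1}<-\epsilon \mid \beta^o x_t>\delta))$,

$\Pr(|\Delta p_{t+1}|\le\epsilon \mid |\beta^o x_t|\le\delta) > \max(\Pr(\Delta p_{t+1}>\epsilon \mid |\beta^o x_t|\le\delta),\ \Pr(\Delta p_{t+1}<-\epsilon \mid |\beta^o x_t|\le\delta))$,

$\Pr(\Delta p_{t+1}<-\epsilon \mid \beta^o x_t<-\delta) > \max(\Pr(\Delta p_{t+1}>\epsilon \mid \beta^o x_t<-\delta),\ \Pr(|\Delta p_{t+1}|\le\epsilon \mid \beta^o x_t<-\delta))$.

The directional-magnitude model quantifies the notion that large (small)
discrepancies between expected buy and sell decisions are most likely to
lead to relatively large (small) price changes. The model predicts a
small price change ($|\Delta p_{t+1}|\le\epsilon$) if the expected market position lies
within a specified interval centered at equilibrium ($|\beta^o x_t|\le\delta$), and
larger changes ($|\Delta p_{t+1}|>\epsilon$) otherwise.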

Compared to M5.1, the model M5.5 is more restrictive as it

restricts both the direction and magnitude of the price change. We

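shall see, however, that M5.5 distinguishes $\beta^o$ from $\lambda\beta^o$, and thus it
becomes meaningful to discuss estimators that converge unambiguously to
$\beta^o$.

Given M5.5 we define a score estimator of $\beta^o$ as follows:

$\hat\beta_n = \arg\max_{\beta\in B} h_n(\beta)$, where $h_n(\beta) = n^{-1}\sum_{t=1}^n h_t(\beta)$, and

$h_t(\beta) = 1(\Delta p_{t+1}>\epsilon)\,1(\beta x_t>\delta) + 1(|\Delta p_{t+1}|\le\epsilon)\,1(|\beta x_t|\le\delta)$
$\qquad + 1(\Delta p_{t+1}<-\epsilon)\,1(\beta x_t<-\delta)$.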
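The following sketch illustrates why a score that uses magnitudes can separate a parameter vector from its positive multiples. It codes the three-part scoring rule (large increase with a large positive index, small change with a small index, large decrease with a large negative index) and evaluates it on four hand-made observations. The data, the thresholds `eps` and `delta`, and the function names are hypothetical and chosen only to make the arithmetic transparent.

```python
def h_t(beta, dp, x, eps, delta):
    """Score one if the size class of dp matches the size class of beta . x."""
    index = sum(b * xi for b, xi in zip(beta, x))
    up = dp > eps and index > delta
    flat = abs(dp) <= eps and abs(index) <= delta
    down = dp < -eps and index < -delta
    return int(up or flat or down)

def h_n(beta, data, eps, delta):
    """Average score over a list of (dp, x) observations."""
    return sum(h_t(beta, dp, x, eps, delta) for dp, x in data) / len(data)

# HYPOTHETICAL observations (price change, regressor vector).
data = [(1.0, (2.0,)), (0.1, (0.7,)), (-2.0, (-3.0,)), (0.0, (0.6,))]
beta0 = (1.0,)
doubled = (2.0,)

print(h_n(beta0, data, eps=0.5, delta=1.0))     # every observation scores
print(h_n(doubled, data, eps=0.5, delta=1.0))   # the two small-change cases fail
```

Doubling the parameter pushes the two small indices outside the interval around zero while the observed price changes remain small, so the score falls. A score built only from signs would not register this difference.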

To prove $\lim \hat\beta_n=\beta^o$ a.e. using the arguments in the proof of Theorem 5.2,
the relevant assumptions are:

A5.6 (continuity): $h(\beta)$ is continuous in $\beta$ on a compact set $B$.

A5.7 (identification): The set $J_x(\beta) = \{x: \mathrm{sgn}(\beta^o x-\delta) \ne \mathrm{sgn}(\beta x-\delta)\}$
has positive probability for all $\beta\in B$ such that $\beta\ne\beta^o$.

The important difference between the above assumptions and those of
Section 5.2 lies in the identification assumptions A5.4 and A5.7.
Specifically, assumption A5.7 does not require a normalized parameter
space since there generally exist vectors x such that $\mathrm{sgn}(\beta^o x-\delta) \ne
\mathrm{sgn}(\beta x-\delta)$ for $\beta\ne\beta^o$; i.e., the set $J_x(\beta)$ is nonempty for $\beta\ne\beta^o$.
Therefore, it is possible to restrict the distribution of x so that
$J_x(\beta)$ has positive probability for $\beta\ne\beta^o$, and to identify $\beta^o$ without a
loss of scale. We summarize the result in the following theorem.

Theorem 5.8. Suppose the i-th component of the vector $\beta^o$ is nonzero.
Then for all $\beta$ such that $\beta\ne\beta^o$ and $\beta_i\ne 0$, the set $J_x(\beta)$ is nonempty.

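It suffices to show that there exists at least one solution x to the
system of linear equations $M(\beta^o,\beta)x=r$, where

$M(\beta^o,\beta) = \begin{pmatrix} \beta_1^o & \cdots & \beta_k^o \\ \beta_1 & \cdots & \beta_k \end{pmatrix}$, $r = \begin{pmatrix} y^o \\ y \end{pmatrix}$, and $y^o>\delta>y$ or $y>\delta>y^o$.

The existence of x is equivalent to rank$(M(\beta^o,\beta))$ = rank$(M(\beta^o,\beta) \mid r)$.
If rank$(M(\beta^o,\beta)) = 2$, then the proof is
complete. If rank$(M(\beta^o,\beta)) = 1$, then we need $\beta_i/\beta_i^o = y/y^o$. The
existence of such points y and $y^o$ follows immediately since

$\{y/y^o: y>\delta>y^o\} = (-\infty,0)\cup(1,\infty)$, and

$\{y/y^o: y^o>\delta>y\} = (-\infty,1)$.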


5.4 Maximum Score Estimation of Models That Include the Quantity Transacted


The estimators presented in sections 5.2 and 5.3 do not depend on

the observed quantity transacted, Q, and therefore neglect relevant

sample information. In this section we propose a model for Q, and

define a score estimator of $\beta^o$ that depends on n observations of Q. We
shall see, however, that the model for Q is insufficient to identify $\beta^o$
(even up to a multiplicative scalar). We resolve the identification
problem by combining the model for Q with the price adjustment models
described in sections 5.2 and 5.3. The score estimator we define for
the combined model uses the entire sample $(Q_t, \Delta p_{t+1}, x_t)_{t=1}^n$, and

therefore can be expected to be more efficient than the estimators of

sections 5.2 and 5.3.

The observations on the quantity transacted are modeled as follows:

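M5.9 (quantity model): For some given $\delta>0$,

$\Pr(Q_t>\delta \mid \beta_1^o x_t>\delta, \beta_2^o x_t>\delta) > \Pr(Q_t\le\delta \mid \beta_1^o x_t>\delta, \beta_2^o x_t>\delta)$,

and

$\Pr(Q_t\le\delta \mid \beta_1^o x_t\le\delta, \beta_2^o x_t\le\delta) > \Pr(Q_t>\delta \mid \beta_1^o x_t\le\delta, \beta_2^o x_t\le\delta)$.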

Two appealing assumptions that are sufficient for M5.9, and therefore

motivate it, are

A5.10 Qt = min(Dt,St).

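A5.11 $\mathrm{MED}(\epsilon_{1t}) = \mathrm{MED}(\epsilon_{2t}) = 0$, and $\epsilon_{1t}$ and $\epsilon_{2t}$ are independent.

Assumption A5.11 requires only independent error terms with median
zero; distributions symmetrical about zero are a special case.

To construct an estimator of $\beta^o$ given the quantity model, we define
the scoring function:

$q_t(\beta) = 1(Q_t>\delta)\,1(\beta_1 x_t>\delta, \beta_2 x_t>\delta) + 1(Q_t\le\delta)\,1(\beta_1 x_t\le\delta, \beta_2 x_t\le\delta)$.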

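To prove consistency for a maximizer of $q_n(\beta)$ using the arguments in the
proof of Theorem 5.2, the relevant assumptions are:

A5.12 (continuity): $q(\beta)$ is continuous in $\beta$ on a compact set $B$.

A5.13 (identification):

(i) The set $U_x(\beta) = \{x: \mathrm{sgn}(\beta_1^o x-\delta) \ne \mathrm{sgn}(\beta_1 x-\delta),\ \mathrm{sgn}(\beta_2^o x-\delta) \ne
\mathrm{sgn}(\beta_2 x-\delta)\}$ has positive probability for all $\beta\in B$ such that $(\beta_1,\beta_2) \ne
(\beta_1^o,\beta_2^o)$.

(ii) The set $Z_x(\beta^o) = \{x: \mathrm{sgn}(\beta_1^o x-\delta) \ne \mathrm{sgn}(\beta_2^o x-\delta)\}$ has zero
probability.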

The role of assumption A5.13 in proving consistency is analogous to that

of the previous identification assumptions A5.4 and A5.7. The two parts

of A5.13 imply that $\beta^o$ uniquely maximizes $q(\beta)$. Part (i) compares to
the familiar order condition needed for identification in the
textbook simultaneous equation framework. For example, if the supply
and demand equations have no explanatory variables in common, and $\delta>0$,
then Theorem 5.8 implies that $U_x(\beta)$ is nonempty for $\beta\ne\beta^o$.$^3$ To see the
role of part (ii), suppose that the sets $Z^cU \equiv \{Z_x^c(\beta^o) \cap U_x(\beta)\}$, $Z^cU^c$, $ZU$,
and $ZU^c$ each have positive probability for some $\beta\in B$. Then we can write,

$E(q_t(\beta^o)-q_t(\beta)) = \int_{Z^cU} E(q_t(\beta^o)-q_t(\beta) \mid x_t)\,dF_x$

$\qquad + \int_{Z^cU^c} E(q_t(\beta^o)-q_t(\beta) \mid x_t)\,dF_x + \int_{ZU} E(q_t(\beta^o)-q_t(\beta) \mid x_t)\,dF_x$

$\qquad + \int_{ZU^c} E(q_t(\beta^o)-q_t(\beta) \mid x_t)\,dF_x$.

It can be readily verified that the first term on the right hand side is
positive, the second is nonnegative, the third is zero, and the last
term is negative. Therefore, given the negativity of the last term,
$\beta\ne\beta^o$ does not necessarily imply $E(q_t(\beta^o)-q_t(\beta)) > 0$. To rule out this
possibility, we impose part (ii).

The requirement that $Z_x(\beta^o)$ has zero probability, however, is too
restrictive to be generally applicable. It is difficult to imagine a
situation where such an assumption would be appropriate. Therefore,
unless one is willing to severely restrict the distribution of $x_t$, the
model M5.9 is insufficient to identify $\beta^o$. Assumption A5.13(ii) can be
relaxed, however, by combining the model for Q with the price adjustment
model of Section 5.2, and constructing a score estimator that exploits
both models. For this purpose we assume that the price adjustment model
M5.1 holds in addition to M5.9, and consider the scoring function:

$q_t^*(\beta,\beta^o) = 1(Z_x^c(\beta^o))\,q_t(\beta) + 1(Z_x(\beta^o))\,P_t(\beta)$,

where $P_t(\beta) = 1(\Delta p_{t+1}\le 0)\,1(\beta_1 x_t\le\delta, \beta_2 x_t>\delta) + 1(\Delta p_{t+1}>0)\,1(\beta_1 x_t>\delta, \beta_2 x_t\le\delta)$,
$1(Z_x^c(\beta^o)) \equiv 1(x_t \in Z_x^c(\beta^o))$, and $Z_x^c(\beta^o)$ denotes the complement of $Z_x(\beta^o)$.

Generally $Z_x(\beta^o)$ will be unknown, but if a consistent estimate, say $\tilde\beta_n$,
is available, then it can be replaced by $Z_x(\tilde\beta_n)$. One possible choice
for $\tilde\beta_n$ is the estimator presented in Section 5.3. This forms the basis
for a "total" sample estimator of $\beta^o$:

$\hat\beta_n = \arg\max_{\beta\in B} q_n^*(\beta,\tilde\beta_n)$.
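The combined scoring rule can be sketched as a switch between two scores. In the code below, the quantity score is used when the demand and supply indices of a preliminary estimate fall on the same side of the threshold, and the price score is used when they fall on opposite sides. The treatment of boundary cases (weak versus strict inequalities), the argument names, and the sample values are assumptions made for illustration; they are not taken from the original program.

```python
def dot(b, x):
    return sum(bi * xi for bi, xi in zip(b, x))

def q_t(b1, b2, q, x, delta):
    """Quantity score: Q_t and both indices lie on the same side of delta."""
    i1, i2 = dot(b1, x), dot(b2, x)
    high = q > delta and i1 > delta and i2 > delta
    low = q <= delta and i1 <= delta and i2 <= delta
    return int(high or low)

def p_t(b1, b2, dp, x, delta):
    """Price score: the direction of the price change matches the implied regime."""
    i1, i2 = dot(b1, x), dot(b2, x)
    surplus = dp <= 0 and i1 <= delta and i2 > delta     # demand low, supply high
    shortage = dp > 0 and i1 > delta and i2 <= delta     # demand high, supply low
    return int(surplus or shortage)

def q_star_t(b1, b2, pre1, pre2, q, dp, x, delta):
    """Switch on whether x lies in Z_x evaluated at the preliminary estimate."""
    in_z = (dot(pre1, x) > delta) != (dot(pre2, x) > delta)
    if in_z:
        return p_t(b1, b2, dp, x, delta)
    return q_t(b1, b2, q, x, delta)

# HYPOTHETICAL single observation with a scalar regressor.
print(q_star_t((2.0,), (3.0,), (2.0,), (3.0,), q=1.5, dp=-1.0, x=(1.0,), delta=1.0))
print(q_star_t((0.5,), (3.0,), (0.5,), (3.0,), q=1.5, dp=-1.0, x=(1.0,), delta=1.0))
```

In the first call both indices exceed the threshold, so the observation is scored by the quantity rule; in the second they straddle it, so the price rule applies. Averaging this function over the sample and maximizing over the candidate pair gives the total sample criterion.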

To show that $\hat\beta_n$ converges to $\beta^o$ a.e. we prove:

Theorem 5.14. Let $\lim \tilde\beta_n = \beta^o$ a.e., and $\tilde\beta_n \in B$ for all n. In addition to
M5.1 and M5.9 assume:

(continuity): $q^*(\beta,\beta')$ is continuous in both arguments on a compact set
$B$.

(identification): Assumption A5.13(i) holds.

Then $\lim \hat\beta_n = \beta^o$ a.e.


Step 1. Uniform convergence.

The proof is similar to Step 1 of Theorem 5.2. Theorem 7.2 of Rao

(1962) implies

$\lim_{n\to\infty} \sup_{\beta,\beta'\in B} |q_n^*(\beta,\beta') - q^*(\beta,\beta')| = 0$ a.e.

Step 2. Identification.

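Let $d_t(\beta,\beta^o) = q_t^*(\beta^o,\beta^o) - q_t^*(\beta,\beta^o)$ and $d(\beta,\beta^o) = E(d_t(\beta,\beta^o))$. We will show that $\beta\ne\beta^o$ implies
$d(\beta,\beta^o) > 0$. Consider,

$d(\beta,\beta^o) = \int_{UZ} E(d_t(\beta,\beta^o) \mid x_t)\,dF_x + \int_{UZ^c} E(d_t(\beta,\beta^o) \mid x_t)\,dF_x$

$\qquad + \int_{U^cZ} E(d_t(\beta,\beta^o) \mid x_t)\,dF_x + \int_{U^cZ^c} E(d_t(\beta,\beta^o) \mid x_t)\,dF_x$,

where $UZ = \{U_x(\beta) \cap Z_x(\beta^o)\}$, $UZ^c = \{U_x(\beta) \cap Z_x^c(\beta^o)\}$,
$U^cZ = \{U_x^c(\beta) \cap Z_x(\beta^o)\}$, and $U^cZ^c = \{U_x^c(\beta) \cap Z_x^c(\beta^o)\}$. That $\beta\ne\beta^o$ implies
$d(\beta,\beta^o) > 0$ follows from the first two terms being positive, and the
last two nonnegative. We will prove this for the first and last terms
only; the proof for the remaining terms is similar.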

Consider the first term, and assume without loss of generality that

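$\beta_1^o x_t-\delta < 0$, and $\beta_2^o x_t-\delta > 0$, and thus $(\beta_1^o-\beta_2^o)x_t < 0$. Since $x_t\in U_x(\beta)$, we
have $\beta_1 x_t-\delta > 0$, and $\beta_2 x_t-\delta < 0$. Therefore,

$E(d_t(\beta,\beta^o) \mid x_t \in UZ) = \Pr(\Delta p_{t+1} \le 0 \mid x_t) - \Pr(\Delta p_{t+1} > 0 \mid x_t) > 0$,

where the inequality follows from $(\beta_1^o-\beta_2^o)x_t < 0$, and M5.1.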

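For $x_t \in U^cZ^c$ assume without loss of generality that $\beta_1^o x_t-\delta > 0$,
and $\beta_2^o x_t-\delta > 0$. Since $x_t \in U_x^c(\beta)$, we have $\beta_1 x_t-\delta > 0$ and $\beta_2 x_t-\delta > 0$, or
$\beta_1 x_t-\delta > 0$ and $\beta_2 x_t-\delta < 0$, or $\beta_1 x_t-\delta < 0$ and $\beta_2 x_t-\delta > 0$. Therefore,
evaluating the conditional expectation case by case, we find

$E(d_t(\beta,\beta^o) \mid x_t \in U^cZ^c) = \Pr(Q_t>\delta \mid x_t) - \Pr(Q_t>\delta \mid x_t) = 0$, or

$= \Pr(Q_t>\delta \mid x_t) \ge 0$.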

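Step 3. $\lim_{n\to\infty} \sup_{\beta\in B} |q_n^*(\beta,\tilde\beta_n) - q^*(\beta,\beta^o)| = 0$ a.e.

Let $\gamma > 0$ be given. Step 1 implies

$\sup_{\beta\in B} |q_n^*(\beta,\tilde\beta_n) - q^*(\beta,\tilde\beta_n)| < \gamma/2$ a.e.

for sufficiently large n. The continuity of $q^*$, and the compactness of
$B$ imply

$\sup_{\beta\in B} |q^*(\beta,\tilde\beta_n) - q^*(\beta,\beta^o)| < \gamma/2$ a.e.

for sufficiently large n since $\lim \tilde\beta_n = \beta^o$ a.e. Applying the triangle
inequality we get

$\sup_{\beta\in B} |q_n^*(\beta,\tilde\beta_n) - q^*(\beta,\beta^o)| < \gamma$ a.e.

for sufficiently large n, which is the desired result.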
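For completeness, the triangle-inequality step can be written out in one line. The notation is an assumption where the scan is unclear: $q_n^*$ is the sample criterion, $q^*$ its population counterpart, $\tilde\beta_n$ the preliminary consistent estimate, $\beta^o$ the true parameter, and $\gamma$ the arbitrary positive bound of Step 3.

```latex
\sup_{\beta \in B} \bigl| q_n^*(\beta,\tilde\beta_n) - q^*(\beta,\beta^o) \bigr|
\;\le\; \sup_{\beta \in B} \bigl| q_n^*(\beta,\tilde\beta_n) - q^*(\beta,\tilde\beta_n) \bigr|
\;+\; \sup_{\beta \in B} \bigl| q^*(\beta,\tilde\beta_n) - q^*(\beta,\beta^o) \bigr|
\;<\; \frac{\gamma}{2} + \frac{\gamma}{2} \;=\; \gamma .
```

The first term on the right is controlled by the uniform convergence of Step 1, and the second by the continuity of $q^*$ together with the consistency of the preliminary estimate.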

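Step 4. $\lim \hat\beta_n = \beta^o$ a.e.
Let N be an open neighborhood of $\beta^o$ and define

$\epsilon = q^*(\beta^o,\beta^o) - \sup_{\beta\in N^c\cap B} q^*(\beta,\beta^o) > 0$,

where the existence of $\epsilon$ follows from Step 2, and the compactness of $B$.
Now Step 3 implies $q_n^*(\beta^o,\tilde\beta_n) > q^*(\beta^o,\beta^o) - \epsilon/2$ a.e. for large n, and
since $\hat\beta_n$ maximizes $q_n^*(\cdot,\tilde\beta_n)$ we have

$q_n^*(\hat\beta_n,\tilde\beta_n) > q^*(\beta^o,\beta^o) - \epsilon/2$ a.e. (5.15)

Step 3 also implies

$q^*(\hat\beta_n,\beta^o) > q_n^*(\hat\beta_n,\tilde\beta_n) - \epsilon/2$ a.e. (5.16)

for large n. Adding both sides of (5.15) and (5.16) we get

$q^*(\hat\beta_n,\beta^o) > q^*(\beta^o,\beta^o) - \epsilon = \sup_{\beta\in N^c\cap B} q^*(\beta,\beta^o)$ a.e.

and therefore $\hat\beta_n \in N$ a.e. for sufficiently large n.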


1The signum function, sgn(·), is defined as follows: sgn(z) = 1 if

z > 0, and sgn(z) = -1 if z < 0.

2Another significant cost is that no distributional theory for

maximum score estimators is currently known.

3Other comparisons with the so-called order condition for

identification are much more complicated, and beyond the scope of this



In this thesis, I have proposed several new solutions to the

problem of generalizing disequilibrium models and their estimators. The

empirical example in Chapter 4 demonstrates how to implement many of

these solutions in practice. However, as we have seen, while some of the

solutions solve old problems, they also introduce new complications.

For example, while the methods presented in Chapter 3 eliminate the need

to specify a parametric model for serial correlation, they also

introduce the complication of having to choose a single covariance

estimator from several candidates. Clearly, some of the results fall

short of completely generalizing disequilibrium models and their

estimators; there is a trade-off. I believe, however, that this thesis

accomplishes more than merely shifting the problems faced by empirical

studies from old ones to new ones. In particular, it provides a solid

foundation for further research by clarifying many of the issues

involved. The following is a partial list of directions for further

research on the problem of generalizing disequilibrium models and their
estimators:

(1) the consequences of restricting the conditional probabilities

$\Pr(\Delta p_{t+1}>0 \mid D_t>S_t)$ and $\Pr(\Delta p_{t+1}>0 \mid D_t\le S_t)$ to be constant with
respect to t, and how to relax this restriction;

(2) the problem of finding an optimal covariance estimator when the

serial correlation is modeled by mixing conditions;


(3) the power properties of the serial correlation test in section 3.5;

(4) the small sample properties of estimators obtained from starting

iterative techniques with consistent estimates, but stopping

iteration before convergence;

(5) numerical studies examining the properties of the maximum score

estimators for disequilibrium models relative to parametric



A.1 Inconsistency and Misclassified Observations

We will show that constraining the direction of the price change

$1(\Delta p_{t+1}>0)$ to separate the sample into the underlying demand ($Q_t=D_t$) and
supply ($Q_t=S_t$) regimes, when in fact $1(\Delta p_{t+1}>0)$ misclassifies
observations with positive probability, leads to inconsistent estimates.
Consider the estimator $\hat\theta_n(1,0)$ which solves the problem

$\max_\theta L_n(\theta, p_{11}, p_{10})$ subject to $(p_{11}, p_{10})=(1,0)$,

where $L_n(\theta, p_{11}, p_{10})$ is defined on page 14, equation 2.3. We will show
that $p_{11}<1$ and $p_{10}=0$ imply plim $\hat\theta_n(1,0) \ne \theta^o$. The proof of plim
$\hat\theta_n(1,0) \ne \theta^o$ proceeds as follows: we derive a necessary condition for the
consistency of an estimator that solves a maximization problem, show
that the condition is violated, and hence conclude plim $\hat\theta_n(1,0) \ne \theta^o$.

The necessary condition for consistency can be viewed as either a

global or local condition depending on whether the estimator is a global

or local maximizer of $L_n$. The global condition appears as the
conclusion of the following theorem.

Theorem A.1.1. Let 0 (y) be a function of the observations such that
L (0 ,y)>L (O,y) for all n and all OE, where E is a subset of a
n n n
Euclidean space. Define

L (O,O',y,p) = sup{Ln(t,y)-L (O',y): t-l

and let L (, O',p) -E(L ((0, ',y,p)). Suppose
n n

(i) For all sufficiently small p(e)=p>0,

plim(L( O,0' ,y,p)- (e,e ',p))=0.

(ii) L (e,Q',p) decreases to L (e,0',0) uniformly in n as p decreases

to zero.

If plim n=0 then lim sup n (e,E,0)} 0 for all E0s.


Suppose there exists 0*es such that lim sup{L (90 ,*,0) }<0.

Then by (ii) we can choose p>O such that limnsup{L n( ,0*,p)}<0. Now

define N={e: 1-0- |

R =sup{Ln(t,y)-L (0*,y): It-e I0p}

Since 0 EN implies R >0, it suffices to show that lim Pr(R <0)=1.
n n- n+- n
Let M = L (0,0*,p) and d=lim supMn 0. Now for sufficiently large
n n n n
n we have M n
Pr(R Pr(R -M <-d/4) + 1 as n + by (i).
n- n n- n n n-

Under additional regularity conditions, the conclusion of Theorem

A.1.1 can be viewed as a local condition.

Theorem A.1.2. In addition to A.1.1(i) and A.1.1(ii), suppose

(i) $E(\partial L_n(\theta)/\partial\theta) = \partial\bar L_n(\theta)/\partial\theta$; that is, the order of integration and differentiation can be interchanged.

(ii) $\theta^0$ is an interior point of $\Theta$.

(iii) $\partial\bar L_n(\theta)/\partial\theta$ is continuous on a closed neighborhood $N_1$ of $\theta^0$ with radius $\epsilon_1>0$, for all $n$ sufficiently large.

Let $\partial\bar L_n(\theta)/\partial\theta_i = \bar L_n(\theta)_i$. If for some $i$ there exists a positive constant $m_i$ such that $|\bar L_n(\theta)_i| > m_i$ for all $\theta$ belonging to a closed neighborhood $N_2$ of $\theta^0$ with radius $\epsilon_2>0$, for all $n$ sufficiently large, then plim $\hat\theta_n \ne \theta^0$.


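Proof: We will prove plim $\hat\theta_n \ne \theta^0$ by showing that the hypothesis of the theorem implies $\limsup_n\{\bar L_n(\theta^0) - \bar L_n(\theta_n^*)\} < 0$ for some sequence $(\theta_n^*)$ belonging to $\Theta$.

Let $\epsilon_3=\min(\epsilon_1,\epsilon_2)$, and let $N_3$ denote the closed neighborhood of $\theta^0$ with radius $\epsilon_3$. Since $N_3$ is compact and $\bar L_n(\theta)$ is continuous on $N_3$, there exist points $\theta_n^*$ belonging to $N_3$ such that $\bar L_n(\theta_n^*) = \sup\{\bar L_n(\theta) : \theta \text{ belongs to } N_3\}$. Furthermore, since $|\bar L_n(\theta)_i| > 0$ on $N_2$, the points $\theta_n^*$ lie on the boundary of $N_3$. Therefore, $|\theta_n^*-\theta^0| = \epsilon_3$.

By the mean value theorem we have

$$\bar L_n(\theta_n^*) - \bar L_n(\theta^0) = \sum_{i=1}^{k} (\theta_{n,i}^* - \theta_i^0)\,\bar L_n(\theta_n')_i, \tag{2}$$

where $\theta_n'$ lies on the segment connecting $\theta_n^*$ and $\theta^0$. Now if $\bar L_n(\theta_n')_i > m_i > 0$, then we must have $\theta_{n,i}^* - \theta_i^0 \ge 0$. Otherwise, since $\bar L_n$ is strictly increasing in its $i$-th argument on $N_3$, we would have $\bar L_n(\theta_{n,1}^*,\ldots,\theta_i^0,\ldots,\theta_{n,k}^*) > \bar L_n(\theta_{n,1}^*,\ldots,\theta_{n,i}^*,\ldots,\theta_{n,k}^*)$, which contradicts the fact that $\theta_n^*$ is a maximizer of $\bar L_n$. Similarly, if $\bar L_n(\theta_n')_j < m_j < 0$, then $\theta_{n,j}^* - \theta_j^0 \le 0$. Without loss of generality suppose

$\bar L_n(\theta_n')_i > m_i > 0$ for $i=1,\ldots,h$, and

$\bar L_n(\theta_n')_i < m_i < 0$ for $i=h+1,\ldots,k$.

Then by equation (2) we have

$$\bar L_n(\theta_n^*) - \bar L_n(\theta^0) \ge \sum_{i=1}^{h}(\theta_{n,i}^*-\theta_i^0)\,m_i + \sum_{i=1+h}^{k}(\theta_i^0-\theta_{n,i}^*)(-m_i) \ge m\sum_{i=1}^{k}|\theta_{n,i}^*-\theta_i^0| \ge m\,d > 0,$$

for some $d>0$, where $m=\min(m_1,\ldots,m_h,-m_{h+1},\ldots,-m_k)$. This implies

$$\limsup_n\{\bar L_n(\theta^0) - \bar L_n(\theta_n^*)\} < 0.$$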
n n n n

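Therefore, to prove plim $\hat\theta_n(1,0) \ne \theta^0$, it suffices to show that $|\partial\bar L_n(\theta^0;1,0)/\partial\beta_1|$ is bounded away from zero. We establish this by showing

$$E(\partial L_n(\theta^0;1,0)/\partial\beta_1) = (1-p_{11}^0)\sum_t x_t\,E(Q_t-D_t)/\sigma_1^2. \tag{3}$$

Let $\partial\log f_t(\theta^0;1,0)/\partial\beta_1 = f_t^b$, $1(\cdot)=1(\Delta p_{t+1}>0)$, and note that

$$E(f_t^b) = \int f_t^b\, f_t(Q_t, 1(\cdot)\mid\theta^0, p_{11}^0, p_{10}^0=0)\,dQ_t. \tag{4}$$

Now if $p_{11}^0=1$, then (4) is the expectation of a likelihood equation, and therefore given the usual regularity conditions we have $E(f_t^b)=0$ at $p_{11}^0=1$. This condition will imply

$$-1(\cdot)\int f_t^b\,g_{st}\,dQ_t = (1-1(\cdot))\int f_t^b\,g_{dt}\,dQ_t. \tag{5}$$

Substituting (5) into (4) yields

$$E(f_t^b) = (1-1(\cdot))(1-p_{11}^0)\left[\int f_t^b\,g_{dt}\,dQ_t + \int f_t^b\,g_{st}\,dQ_t\right]. \tag{6}$$

For $1(\cdot)=0$, given the normality of $\epsilon_{1t}$, $\epsilon_{2t}$, we have $f_t^b = (Q_t - x_t\beta_1^0)\,x_t/\sigma_1^2$. Substituting this into (6), and summing over the observations gives (3).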
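The mechanism behind this result can be illustrated with a deliberately stylized analogue, which is not the disequilibrium model itself. In the sketch below, each observation is drawn from a "demand" regime with mean `mu_d` or a "supply" regime with mean `mu_s`, a noisy label is correct with probability `p_correct`, and the estimator treats the label as exact. All names and numerical values are hypothetical. The expected gradient of the constrained objective at the true value is proportional to one minus `p_correct`, so it vanishes only when the label never misclassifies.

```python
import random

def constrained_mean_estimate(n, p_correct, mu_d=1.0, mu_s=-1.0, seed=0):
    """Estimate mu_d from the observations *labelled* as demand-regime.

    The label equals the true regime with probability p_correct, but the
    estimator treats it as exact: it maximizes -sum(label * (y - theta)**2),
    whose solution is the mean of the labelled observations.
    """
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n):
        is_demand = rng.random() < 0.5
        y = rng.gauss(mu_d if is_demand else mu_s, 1.0)
        label_correct = rng.random() < p_correct
        labelled_demand = is_demand if label_correct else not is_demand
        if labelled_demand:
            total += y
            count += 1
    return total / count

def probability_limit(p_correct, mu_d=1.0, mu_s=-1.0):
    # With equal regime probabilities, a share p_correct of the observations
    # labelled "demand" truly belong to the demand regime.
    return p_correct * mu_d + (1.0 - p_correct) * mu_s

def expected_gradient_at_truth(p_correct, mu_d=1.0, mu_s=-1.0):
    # Per-observation expectation of d/dtheta[-label * (y - theta)**2] at theta = mu_d.
    return (1.0 - p_correct) * (mu_s - mu_d)

estimate = constrained_mean_estimate(200000, p_correct=0.8)
print(estimate, probability_limit(0.8), expected_gradient_at_truth(0.8))
```

With `p_correct=1.0` the expected gradient is zero and the estimate settles near `mu_d`; with `p_correct=0.8` it settles near the mixture value returned by `probability_limit`, which differs from `mu_d`.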

A.2 The Computational Tractability and Asymptotic Properties of the

Least Squares Estimator of Section 2.3

In Section 2.3 we proposed using a LS estimator to find the

consistent and asymptotically normal solution to the likelihood

equations; i.e., use the LS estimates as starting values to iterate to

the consistent and asymptotically normal local maxima of the likelihood

function. The success of this strategy depends on:

(a) The objective functions to be solved for the LS estimates are not

characterized by an unknown number of local minima so that global minima

can be easily found; i.e., multiple solutions are not a problem.

(b) The LS estimators (defined as global minimizers) are consistent and

have a proper limiting distribution.

If (a) fails, then the LS method is no more computationally tractable

than the ML method, and thus one might as well use the ML method to

begin with. (b) ensures convergence to the consistent and

asymptotically normal local maxima of the likelihood function. (See, for

example, Amemiya (1973, pp. 1014-15).) In this section we will argue

that both (a) and (b) are likely to be satisfied in practice.

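Condition (a) will be obviously satisfied if the following optimization problems have unique solutions:

$$\operatorname*{local\text{-}min}_{(p_{11},\,p_{10},\,\gamma)}\; n^{-1}\sum_{t=1}^{n}\bigl(1(\Delta p_{t+1}>0) - E(1(\Delta p_{t+1}>0))\bigr)^2 \tag{1}$$

$$\operatorname*{local\text{-}min}_{(\beta_1,\,\beta_2,\,\sigma_1^2+\sigma_2^2)}\; n^{-1}\sum_{t=1}^{n}\bigl(Q_t - \hat E(Q_t)\bigr)^2 \tag{2}$$

$$\operatorname*{local\text{-}min}_{(\sigma_1^2,\,\sigma_2^2)}\; n^{-1}\sum_{t=1}^{n}\bigl(Q_t^2 - \hat E(Q_t^2)\bigr)^2 \tag{3}$$

where $\hat E(Q_t)$ denotes the function $E(Q_t)$ with $\gamma^0$ estimated by $\hat\gamma$ (obtained from (1)), and $\hat E(Q_t^2)$ denotes $E(Q_t^2)$ with $\beta_1^0$, $\beta_2^0$, $(\sigma_1^2+\sigma_2^2)^0$, and $\gamma^0$ estimated by $\hat\beta_1$, $\hat\beta_2$, $\widehat{(\sigma_1^2+\sigma_2^2)}$, and $\hat\gamma$ (obtained from (1) and (2)).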

Solutions to problems (2) and (3) are OLS estimates, and therefore are unique if the appropriate matrices of explanatory variables have full column rank. For example, unique LS estimates can be obtained by solving (2) if the following matrix has full column rank:

$$\begin{pmatrix} (1-\Phi(x_1\hat\gamma))\,x_1^s & \Phi(x_1\hat\gamma)\,x_1^d & \phi(x_1\hat\gamma) \\ \vdots & \vdots & \vdots \\ (1-\Phi(x_n\hat\gamma))\,x_n^s & \Phi(x_n\hat\gamma)\,x_n^d & \phi(x_n\hat\gamma) \end{pmatrix}$$

where $x_t^s$ denotes the $1\times k_s$ vector of explanatory variables of the supply equation, and $x_t^d$ the $1\times k_d$ vector of demand explanatory variables. In general, the matrices of explanatory variables for (2) and (3) will have full column rank provided that the functions $\Phi(x_t\hat\gamma)$ and $\phi(x_t\hat\gamma)$ are not constant for all $t$.
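The rank condition can be checked numerically before the second-stage regression is run. The following is a minimal sketch, assuming the first-stage index values $x_t\hat\gamma$ are already available; the function names, the toy data, and the tolerance are hypothetical. It builds rows of the form $[(1-\Phi)x_t^s,\ \Phi x_t^d,\ \phi]$ and computes the rank by Gaussian elimination.

```python
import math

def std_normal_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def matrix_rank(rows, tol=1e-10):
    """Rank of a matrix (list of rows) by Gaussian elimination with pivoting."""
    a = [list(r) for r in rows]
    n_rows, n_cols = len(a), len(a[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < tol:
            continue  # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, n_rows):
            factor = a[r][col] / a[rank][col]
            for c in range(col, n_cols):
                a[r][c] -= factor * a[rank][c]
        rank += 1
    return rank

def regressor_matrix(index_values, xs_rows, xd_rows):
    """Rows [(1 - Phi(v_t)) x_t^s, Phi(v_t) x_t^d, phi(v_t)], v_t = x_t * gamma_hat."""
    rows = []
    for v, xs, xd in zip(index_values, xs_rows, xd_rows):
        cdf, pdf = std_normal_cdf(v), std_normal_pdf(v)
        rows.append([(1.0 - cdf) * s for s in xs] + [cdf * d for d in xd] + [pdf])
    return rows

# Toy case: each equation contains only a constant, so there are 3 columns.
constants = [[1.0]] * 4
varying = regressor_matrix([-1.0, -0.3, 0.4, 1.2], constants, constants)
constant = regressor_matrix([0.5, 0.5, 0.5, 0.5], constants, constants)
print(matrix_rank(varying), matrix_rank(constant))
```

In this toy case the matrix has full column rank when the index varies across observations and loses rank when the index is constant.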

Solutions to problem (1) are nonlinear LS estimates, and consequently establishing their uniqueness is much more difficult. Unfortunately, attempts to prove that problem (1) has a unique solution have been inconclusive. However, there is some evidence suggesting that

problem (1) can be solved for a global minimum in practice. First, the

larger the sample size the more likely problem (1) will have a unique

solution. Lemma A.2.4 below provides a rank condition which ensures a

unique solution with probability approaching one as n approaches

infinity. Second, given the data discussed in Chapter 4, attempts to

solve problem (1) were successful in the sense that all starting values

iterated to the same solution. In contrast, attempts to maximize the

likelihood function were unsuccessful as different starting values

iterated to different solutions. Third, the objective function in

problem (1) is bounded below (by zero) which simplifies the search of

the parameter space for a global minimum. In contrast, a search for a global maximum of the likelihood function is complicated by unboundedness: $L_n\to\infty$ as $\sigma_1^2\to 0$ or $\sigma_2^2\to 0$ (see, for example, Maddala (1983, p. 300)). Therefore, any search for a global ML estimate will be futile unless one is willing to arbitrarily bound the error variances away from zero.
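The unboundedness can be seen in a small numerical sketch. As a stand-in for the likelihood of Chapter 2 (which also involves the price-change indicator), the code below uses the textbook density of $Q_t=\min(D_t,S_t)$ with independent normal $D_t$ and $S_t$ and no regressors; the data values and parameter names are hypothetical. Setting the demand mean equal to the largest observation and shrinking the demand standard deviation drives the log-likelihood upward without limit.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_likelihood(quantities, mu_d, sigma_d, mu_s, sigma_s):
    """Log-likelihood of Q_t = min(D_t, S_t), D_t and S_t independent normals."""
    total = 0.0
    for q in quantities:
        zd = (q - mu_d) / sigma_d
        zs = (q - mu_s) / sigma_s
        # density of the minimum: g_d(q) * Pr(S > q) + g_s(q) * Pr(D > q)
        density = (normal_pdf(zd) / sigma_d) * (1.0 - normal_cdf(zs)) \
            + (normal_pdf(zs) / sigma_s) * (1.0 - normal_cdf(zd))
        total += math.log(density)
    return total

quantities = [0.2, -0.5, 1.1, 0.7, -1.3]
mu_d = max(quantities)  # put the demand mean exactly on the largest observation
values = [log_likelihood(quantities, mu_d, s, 0.0, 1.0) for s in (1e-2, 1e-4, 1e-8)]
print(values)
```

A sum-of-squares objective such as problem (1), by contrast, can never fall below zero, which is the asymmetry the text points to.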

Next we discuss conditions that imply consistency for the LS

estimator. We will only consider conditions that imply consistency for

the nonlinear LS estimator defined as any global minimizer of problem

(1). (Given plim $\hat\gamma=\gamma^0$, proving consistency for the OLS estimators obtained from solving problems (2) and (3) involves repeated application of Jennrich's (1969, Lemma 3) mean-value theorem for random functions, and is quite tedious.) For simplicity, rather than necessity, we will assume that all relevant random variables are independent and identically distributed across $t$. This enables us to apply the following simplified version of White's (1980) Lemma 2.2 to the global minimizer of problem (1).


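Lemma A.2.1. Let $Q_n(w,\theta)$ be a measurable function on a measurable space $W$ and for each $w$ in $W$ a continuous function on a compact set $\Theta$. Then there exists a measurable function $\hat\theta_n(w)$ such that

$$Q_n(w,\hat\theta_n(w)) = \inf_{\theta\in\Theta} Q_n(w,\theta) \quad \text{for all } w \text{ in } W.$$

If plim$\{\sup_{\theta}|Q_n(w,\theta)-\bar Q_n(\theta)|\}=0$, and if $\bar Q_n(\theta)$ has a unique minimum at $\theta^0$, then plim $\hat\theta_n=\theta^0$.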

Proof: See White (1980, Lemma 2.2).

The first part of Lemma A.2.1 ensures the existence of the nonlinear LS estimator (defined as a global minimizer). The second part will be used to show consistency. For this purpose we define

$$Q_n(\theta) = n^{-1}\sum_{t=1}^{n}\bigl(1(\Delta p_{t+1}>0) - E(1(\Delta p_{t+1}>0))\bigr)^2 = n^{-1}\sum_{t=1}^{n}\bigl(z_t(\theta)+u_{1t}\bigr)^2,$$

where $z_t(\theta) = p_{11}^0 - p_{11} - (p_{11}^0-p_{10}^0)\Phi(x_t\gamma^0) + (p_{11}-p_{10})\Phi(x_t\gamma)$, and $\theta=(p_{11},p_{10},\gamma)$. To apply the second part of Lemma A.2.1 we need to show uniform convergence, and that $\bar Q(\theta)$ has a unique minimum at $\theta^0$. The next lemma, which is due to Hoadley (1971), provides a moment restriction that implies uniform convergence.

Lemma A.2.2. For the function defined in Lemma A.2.1 suppose $E|Q_n(\theta)|^{1+d}<\infty$, $d>0$. Then plim$\{\sup_{\theta}|Q_n(\theta)-\bar Q_n(\theta)|\}=0$.
n n n

Proof: See Hoadley (1971, Theorem A.5).

The following lemma establishes that the moment restriction holds.

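Lemma A.2.3. $E|Q_n(\theta)|^{1+d}<\infty$, $d>0$.

Proof: Since $z_t(\theta)$ is bounded we have

$$(z_t(\theta)+u_{1t})^2 \le 2\,z_t(\theta)^2 + 2\,u_{1t}^2.$$

Therefore, the conclusion of the lemma follows if $E|u_{1t}|^{2+d}<\infty$, $d>0$. Let $1_t=1(\Delta p_{t+1}>0)$, set $d=1$, note that $E(1_t^k)=E(1_t)$ for every positive integer $k$, and that $u_{1t}=1_t-E(1_t)$. Thus, since $|u_{1t}|\le 1$,

$$E|u_{1t}|^3 \le 1 < \infty.$$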
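The two facts used in this proof are easy to confirm numerically. The sketch below (function names are hypothetical) evaluates the third absolute central moment of a Bernoulli indicator in closed form and spot-checks the inequality $(z+u)^2\le 2z^2+2u^2$ on a small grid.

```python
def third_abs_central_moment(p):
    """E|1_t - E(1_t)|^3 when 1_t is a Bernoulli(p) indicator."""
    # 1_t - p equals 1 - p with probability p and -p with probability 1 - p.
    return p * (1.0 - p) ** 3 + (1.0 - p) * p ** 3

def square_bound_holds(z, u):
    """Check (z + u)^2 <= 2 z^2 + 2 u^2, which follows from (z - u)^2 >= 0."""
    return (z + u) ** 2 <= 2.0 * z * z + 2.0 * u * u + 1e-12

moments = [third_abs_central_moment(p / 10.0) for p in range(11)]
grid = [i / 4.0 for i in range(-8, 9)]
all_hold = all(square_bound_holds(z, u) for z in grid for u in grid)
print(max(moments), all_hold)
```

The moment is bounded by one for every success probability, so no restriction on the distribution of the price change is needed for this step.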

Finally, we present a rank condition that implies $\bar Q(\theta)$ has a unique minimum at $\theta^0$, and therefore together with Lemma A.2.3 ensures consistency for a global minimizer of $Q_n(\theta)$.

Lemma A.2.4. Suppose $x_t$ is a discrete random variable, and let $x^i$ denote the $i$-th member of the support of $x_t$. For each $\theta\in\Theta$ such that $\theta\ne\theta^0$ suppose there exist $k>1$ members of the support of $x_t$ such that the following matrix has full column rank:

$$A_k = \begin{pmatrix} 1 & \Phi(x^1\gamma) & \Phi(x^1\gamma^0) \\ \vdots & \vdots & \vdots \\ 1 & \Phi(x^k\gamma) & \Phi(x^k\gamma^0) \end{pmatrix}$$

If $p_{11}^0>p_{10}^0$, then $\bar Q(\theta)$ has a unique minimum at $\theta^0$.

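Proof: Since $E(u_{1t}\mid x_t)=0$, we have

$$\bar Q(\theta) = E(z_t(\theta)+u_{1t})^2 = E(z_t(\theta)^2) + E(u_{1t}^2).$$

Obviously, $\bar Q(\theta)$ has a minimum at $\theta^0$ since $E(z_t(\theta^0)^2)=0$. To prove uniqueness it suffices to show that $\theta\ne\theta^0 \iff E(z_t(\theta)^2)>0$.

Suppose for some $\theta\ne\theta^0$, $E(z_t(\theta)^2)=0$. Since $\Pr(z_t(\theta)^2\ge 0)=1$, we have $E(z_t(\theta)^2)=0 \iff \Pr(z_t(\theta)=0)=1$. This implies that for every $x^i$ belonging to the support of $x_t$, $z_t(\theta)=0$. That is,

$$A_k\,\bigl(p_{11}^0-p_{11},\; p_{11}-p_{10},\; -(p_{11}^0-p_{10}^0)\bigr)^{T} = 0.$$

But this contradicts the assumption that $A_k$ has full column rank unless

$$p_{11}^0-p_{11} = p_{11}^0-p_{10}^0 = p_{11}-p_{10} = 0,$$

which is ruled out by $p_{11}^0>p_{10}^0$.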
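The identification argument can be illustrated by evaluating $E(z_t(\theta)^2)$ on a small discrete support. The sketch below assumes $z_t(\theta)=p_{11}^0-p_{11}-(p_{11}^0-p_{10}^0)\Phi(x_t\gamma^0)+(p_{11}-p_{10})\Phi(x_t\gamma)$ with scalar $x_t$ and equal support probabilities; that form of $z_t$, the support points, the true values, and the grid are all assumptions made for illustration. The grid keeps $p_{11}>p_{10}$ throughout, because swapping the two probabilities and reversing the sign of $\gamma$ reproduces the same indicator probabilities.

```python
import math

def normal_cdf(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def z_value(x, theta, theta0):
    p11, p10, gamma = theta
    p11_0, p10_0, gamma_0 = theta0
    return ((p11_0 - p11)
            - (p11_0 - p10_0) * normal_cdf(x * gamma_0)
            + (p11 - p10) * normal_cdf(x * gamma))

def expected_z_squared(theta, theta0, support):
    """E(z_t(theta)^2) when x_t is uniform on the given support points."""
    return sum(z_value(x, theta, theta0) ** 2 for x in support) / len(support)

support = [-1.0, 0.0, 1.0, 2.0]
theta0 = (0.9, 0.2, 1.0)  # hypothetical true (p11, p10, gamma)
grid = [(p11, p10, g)
        for p11 in (0.6, 0.7, 0.8, 0.9)
        for p10 in (0.1, 0.2, 0.3)
        for g in (-1.0, 0.5, 1.0, 1.5)]
smallest_off_truth = min(expected_z_squared(t, theta0, support)
                         for t in grid if t != theta0)
print(expected_z_squared(theta0, theta0, support), smallest_off_truth)
```

On this grid the criterion is zero at the true value and strictly positive elsewhere, which is what the lemma requires of $\bar Q(\theta)$.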

Finally, we note without proof that Theorem 3.1 of White (1980) can be applied to show that the nonlinear LS estimator obtained from solving problem (1) is asymptotically normal. Therefore, the LS estimates have a proper limiting distribution.

A.3 Proofs of Theorems 3.2-3.17

Proof of Theorem 3.2: See McLeish (1975, Theorem 2.10).

Proof of Theorem 3.7: The proof is the same as Hoadley's (1971) Theorem 1 except Theorem 3.2 is applied instead of Markov's law of large numbers.


Proof of Theorem 3.8: For notational simplicity let $1_t=1(\Delta p_{t+1}>0)$. Consider an arbitrary point $\theta_p^*\in\Theta$. We will show that given $\epsilon>0$ there exists $d>0$ such that $|\theta_p-\theta_p^*|<d$ implies

$$|f_t(Q_t,1_t\mid\theta_p) - f_t(Q_t,1_t\mid\theta_p^*)| < \epsilon \quad \text{a.e.},$$

where $\epsilon$ and $d$ do not depend on $t$.

Assumptions 3.4(i) (normality) and 3.6(i) (compactness) imply

$$\lim_{|Q_t|\to\infty}\,\sup\{|f_t(Q_t,1_t\mid\theta_p) - f_t(Q_t,1_t\mid\theta_p^*)| : \theta_p\in\Theta\} = 0. \tag{1}$$

Let $\epsilon>0$ be chosen. Then equation (1) implies that there exist $a_t=a(x_t,1_t)>0$ and $d_t=d(x_t,1_t)>0$ such that for $|Q_t|>a_t$ and $|\theta_p-\theta_p^*|<d_t$,

$$|f_t(Q_t,1_t\mid\theta_p) - f_t(Q_t,1_t\mid\theta_p^*)| < \epsilon. \tag{2}$$

By assumption 3.5(ii) ($x_t$ has a finite support) equation (2) holds a.e. for $|Q_t|>a=\max(a_1,\ldots,a_k)$ and $|\theta_p-\theta_p^*|<d=\min(d_1,\ldots,d_k)$. Thus, it remains to show that equation (2) holds a.e. for $Q_t$ belonging to $[-a,a]$.

Let $C=\{(Q_t,1_t,x_t,\theta_p) : Q_t \text{ belongs to } [-a,a]\}$. Since $C$ is compact, and $f_t(Q_t,1_t\mid\theta_p)$ is continuous on $C$, it follows that $f_t(Q_t,1_t\mid\theta_p)$ is uniformly continuous on $C$. That is, there exists a $d>0$ such that equation (2) holds a.e. uniformly in $t$ whenever $|\theta_p-\theta_p^*|<d$. Q.E.D.

Proof of Lemma 3.9: The result follows from the fact that $\Theta$ is separable and $f_t(Q_t,1(\Delta p_{t+1}>0)\mid\theta_p)$ is continuous on $\Theta$. See, for example, Loeve (1960, p. 510).

Proof of Lemma 3.10: Hartley and Mallela (1977, Corollary 4.2) prove that there exists $\rho(\theta_p)>0$ such that

$$E\,\sup\{|\ln f_t(Q_t,1(\Delta p_{t+1}>0)\mid\theta_p')|^k : |\theta_p'-\theta_p|\le\rho(\theta_p)\} < \infty \tag{3}$$

for $k=2$. In fact, their arguments can be used to show that (3) holds for any even positive $k$, and therefore for any positive $k$.

Proof of Lemma 3.11: The proof involves minor modifications to the

proofs given in Amemiya and Sen (1977, lemmas 2 and 3) to cover the case

of PllP10'

Proof of Lemma 3.14: See White (1984, Theorem 2.4).

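Proof of Lemma 3.15: By Theorem 2.3 of White and Domowitz (1984), assumptions 3.15(i) and 3.15(ii) imply

$$\operatorname{plim}\Bigl\{\sup_{\theta\in\Theta}\Bigl|n^{-1}\sum_{t=1}^{n}\bigl(q_t(y_t,\theta) - E(q_t(y_t,\theta))\bigr)\Bigr|\Bigr\} = 0. \tag{4}$$

Given (4) and plim $\hat\theta_n=\theta^0$, Lemma 2.6 of White (1980) implies

$$\operatorname{plim}\; n^{-1}\sum_{t=1}^{n}\bigl(q_t(y_t,\hat\theta_n) - E(q_t(y_t,\theta^0))\bigr) = 0.$$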


Proof of Theorem 3.16: See Newey and West (1985, Theorem 2).

Proof of Theorem 3.17:

Step 1. n2 a (n-1 ()T (o0)) n-~ (o)Tf (O0)
n -c p -c p -c p -c p

We will show that Step 1 follows from

n as (n-lf ( Pml)Tf (fPl))-ln-f ( ml)Tf(ml0). (5)
n -c n -c n -c n n

Given 3.17(i), the mean-value theorem for random functions

(Jennrich (1969, Lemma 3)) allows us to write

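$$f(\hat\theta_n^{ml}) = f(\theta_p^0) + \bigl(\partial f(\bar\theta_n)/\partial\theta_p\bigr)(\hat\theta_n^{ml}-\theta_p^0), \quad \text{and} \tag{6}$$

$$f_c(\hat\theta_n^{ml})_i = f_c(\theta_p^0)_i + \bigl(\partial f_c(\tilde\theta_{ni})_i/\partial\theta_p\bigr)(\hat\theta_n^{ml}-\theta_p^0), \quad i=1,\ldots,k, \tag{7}$$

where $f_c(\hat\theta_n^{ml})_i$ denotes the $i$-th column of the matrix $f_c(\hat\theta_n^{ml})$, and $\bar\theta_n$ and $\tilde\theta_{ni}$ each lie on the segment connecting $\hat\theta_n^{ml}$ and $\theta_p^0$.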
n n p

Given (7), 3.17(ii), and plim ml=o0, we have
n p

n-l ( f1)Tf ( l) .=n-lf (6o)Tf (o).+o (1). (8)
-c n I -c n j -c p i -c p J p

Given (6), (7), 3.17(ii), 3.17(iii), H and plim Ol=00, we have
0 n p

nlf ( f( fl)=n- o (O O+o (1). (9)
-c n i n -c p i p p

Substituting (8) and (9) into (1) we get the desired result:

n c =(n f (o)Tf (e)+o ()) nf (o)Tf(o)+o (1).
n -c p -c p p -c p p p
I1sT mli-is 2
Step 2. n o1 D (ml ) A 2
n n n n 'XK

By Step 1 we can write

ls 1-1 o T
2 -A (00)-1 2f (oo)Tf(oo)=
n s -A (0) nf (0)
n n p -c p p

[(n f (e) f (o))-1-A (e0)-1n f (0o)Tf( o)+o (1)
-c p -c p n p -c p p p

Therefore, by 3.17(iv), 3.17(v), and 3.17(vi), we have

s1 2 2f1( e)T*
plim (D (o00) n 2as D (00)-A (bo)- n -f (e)Tf())= (10)
n p n n p n p -c p p

Given 3.17(vi), by Corollary 4.24 of White (1984),

D (0)-A (0)- -f (no Tf(o0) N(O,I) (11)
n p n p -c p p

(6) and (7) imply,

So- is A
D (op)nas n N(0,I),
n p n k

and therefore by Corollary 4.28 of White (1984) we have

is o -1 ls 2
n nD (0) an .
n n p n "

Finally, since plim (D ( l )-1-D (00)-1)0, by Theorem 4.30 of White
n n n p

Is T l-1 la s 2
na D (P ) a "
n n n n Xk


A.4 Quadratic Hill-Climbing and the Asymptotic Distribution of the (p+1)th-Round Estimates

The $(p+1)$th $(p=1,2,\ldots)$ iteration of the quadratic hill-climbing technique is given by

$$\theta_n^{p+1} = \theta_n^{p} - \bigl(\nabla^2 L_n(\theta_n^p) - a_n I\bigr)^{-1}\nabla L_n(\theta_n^p), \tag{1}$$

where $a_n=\max(\lambda_n + r_n\|\nabla L_n(\theta_n^p)\|,\,0)$, $\lambda_n$ is the maximum eigenvalue of $\nabla^2 L_n(\theta_n^p)$, $r_n$ is a scalar correction factor, and $\|\nabla L_n(\theta_n^p)\|$ denotes the length of the $k$-dimensional vector $\nabla L_n(\theta_n^p)$.
n n

Goldfeld, Quandt, and Trotter (1966) show that the technique chooses $\theta_n^{p+1}$ to maximize the quadratic approximation of $L_n(\theta)$ on a region centered at $\theta_n^p$ of radius

$$\bigl\|\bigl(\nabla^2 L_n(\theta_n^p) - a_n I\bigr)^{-1}\nabla L_n(\theta_n^p)\bigr\|.$$

If the quadratic approximation is good (that is, if the step increases $L_n(\theta)$), then in the next step $r_n$ is decreased. Otherwise $r_n$ is increased. Further details can be found in Goldfeld, Quandt, and Trotter (1966).
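A single iteration of the technique can be sketched for a two-parameter problem, where the largest eigenvalue of the symmetric Hessian has a closed form. This is a minimal illustration rather than the Goldfeld, Quandt, and Trotter algorithm in full: the correction factor `r` is held fixed instead of being adapted between steps, and the concave quadratic objective and all function names are hypothetical.

```python
import math

def max_eigenvalue_2x2(h):
    """Largest eigenvalue of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    a, b, d = h[0][0], h[0][1], h[1][1]
    return 0.5 * (a + d) + math.sqrt(0.25 * (a - d) ** 2 + b * b)

def solve_2x2(m, v):
    """Solve m x = v for a nonsingular 2x2 matrix m by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(m[1][1] * v[0] - m[0][1] * v[1]) / det,
            (m[0][0] * v[1] - m[1][0] * v[0]) / det]

def hill_climbing_step(theta, grad, hess, r):
    """theta_new = theta - (H - a I)^{-1} g with a = max(lambda_max + r*|g|, 0)."""
    g, h = grad(theta), hess(theta)
    norm = math.hypot(g[0], g[1])
    if norm == 0.0:
        return list(theta), 0.0  # already at a stationary point
    a = max(max_eigenvalue_2x2(h) + r * norm, 0.0)
    shifted = [[h[0][0] - a, h[0][1]], [h[1][0], h[1][1] - a]]
    step = solve_2x2(shifted, g)
    return [theta[0] - step[0], theta[1] - step[1]], a

# Hypothetical objective L(t) = -(t1 - 1)^2 - 2 (t2 + 0.5)^2, maximized at (1, -0.5).
def gradient(t):
    return [-2.0 * (t[0] - 1.0), -4.0 * (t[1] + 0.5)]

def hessian(t):
    return [[-2.0, 0.0], [0.0, -4.0]]

newton_like, shift = hill_climbing_step([3.0, 2.0], gradient, hessian, r=0.1)
print(newton_like, shift)
```

When the shift `a` is zero the update is exactly a Newton-Raphson step, which for a quadratic objective reaches the maximizer at once. A larger `r` shortens the step but still moves uphill.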

Next we show that the estimator defined by $\theta_n^{p+1}$ has the same asymptotic distribution as the partial-MLE provided that plim $\theta_n^p=\theta^0$ and $\sqrt{n}(\theta_n^p-\theta^0)$ has a proper limiting distribution. More explicitly, we show

$$\sqrt{n}(\theta_n^{p+1}-\theta^0) = -\bigl(n^{-1}\nabla^2 L_n(\theta^0)\bigr)^{-1} n^{-1/2}\nabla L_n(\theta^0) + o_p(1). \tag{2}$$

The implication is that when consistent initial estimates are employed, iteration beyond the second round does not improve the final estimates, at least asymptotically.

To prove (2), it suffices to show that plim $n^{-1}a_n=0$. To see this, consider the mean-value expansion

$$\nabla L_n(\theta_n^p) = \nabla L_n(\theta^0) + \nabla^2 L_n(\bar\theta_n)(\theta_n^p-\theta^0). \tag{3}$$

Substituting (3) into (1) and rearranging, we get

$$\sqrt{n}(\theta_n^{p+1}-\theta^0) = \bigl(n^{-1}a_n I - n^{-1}\nabla^2 L_n(\theta_n^p)\bigr)^{-1} n^{-1/2}\nabla L_n(\theta^0) + \bigl[I - \bigl(n^{-1}\nabla^2 L_n(\theta_n^p) - n^{-1}a_n I\bigr)^{-1} n^{-1}\nabla^2 L_n(\bar\theta_n)\bigr]\sqrt{n}(\theta_n^p-\theta^0). \tag{4}$$

Therefore, if plim $n^{-1}a_n=0$, then (2) follows from (4) since the second term on the right-hand side of (4) converges in probability to zero.
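The following theorem establishes that plim $n^{-1}a_n=0$.

Theorem A.4.1. For $\nabla L_n(\theta)=\sum_{t=1}^{n}\partial\log f_t(\theta)/\partial\theta$, suppose

(i) plim $\sup_{\theta}\bigl|n^{-1}\sum_{t=1}^{n}[\partial\log f_t(\theta)/\partial\theta - E(\partial\log f_t(\theta)/\partial\theta)]\bigr| = 0$.

(ii) plim $\theta_n^p=\theta^0$.

(iii) $E(\partial\log f_t(\theta)/\partial\theta)$ is continuous.

(iv) plim $n^{-1}\lambda_n<0$.

Then plim $n^{-1}a_n=0$.

Proof: It suffices to show that plim $n^{-1}\|\nabla L_n(\theta_n^p)\|=0$. Now

$$n^{-1}\|\nabla L_n(\theta_n^p)\| = n^{-1}\Bigl(\sum_{i=1}^{k}\nabla L_n(\theta_n^p)_i^2\Bigr)^{1/2} = n^{-1}\Bigl(\sum_{i=1}^{k}\Bigl(\sum_{t=1}^{n}\partial\log f_t(\theta_n^p)/\partial\theta_i\Bigr)^2\Bigr)^{1/2} \le \sum_{i=1}^{k}\Bigl|n^{-1}\sum_{t=1}^{n}\partial\log f_t(\theta_n^p)/\partial\theta_i\Bigr| \xrightarrow{p} 0$$

by (i), (ii) and (iii).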


In effect, the proof of (2) follows from the observation that if plim $n^{-1}a_n=0$, then for sufficiently large $n$ equation (1) reduces to the Newton-Raphson technique with probability approaching one. Given that the proof depends on (1) reducing to the Newton-Raphson technique asymptotically, why not use the latter to begin with? Unfortunately, a definitive answer to this question is not available. The answer lies in the small sample properties of the estimators, which undoubtedly would require Monte Carlo studies to help uncover. We have chosen quadratic hill-climbing over Newton-Raphson because it is somewhat reassuring to know that the former always moves in the direction of a maximizer of the likelihood function, while the latter might not.
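The difference between the two techniques shows up as soon as the Hessian fails to be negative definite. The one-dimensional sketch below uses a hypothetical objective, $x^2/2-x^4/4$, which has maxima at $\pm 1$ and a local minimum at zero, together with a fixed correction factor `r`; neither is taken from the application in this study. Starting at 0.3, where the second derivative is positive, the Newton-Raphson step heads for the local minimum and lowers the objective, while the hill-climbing step moves uphill.

```python
def objective(x):
    return 0.5 * x * x - 0.25 * x ** 4

def gradient(x):
    return x - x ** 3

def hessian(x):
    return 1.0 - 3.0 * x * x

def newton_step(x):
    return x - gradient(x) / hessian(x)

def hill_climbing_step(x, r):
    g, h = gradient(x), hessian(x)
    # In one dimension the maximum eigenvalue of the Hessian is h itself.
    a = max(h + r * abs(g), 0.0)
    return x - g / (h - a)

start = 0.3
after_newton = newton_step(start)
after_hill_climbing = hill_climbing_step(start, r=1.0)
print(objective(start), objective(after_newton), objective(after_hill_climbing))
```

Near a maximizer, where the second derivative is sufficiently negative, the shift `a` is zero and the two updates coincide.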


Amemiya, T. (1973). "Regression Analysis when the Dependent Variable
is Truncated Normal." Econometrica 41:997-1016.

Amemiya, T. (1974). "A Note on the Fair and Jaffee Model."
Econometrica 42:759-762.

Amemiya, T., and G. Sen (1977). "The Consistency of the Maximum
Likelihood Estimator in a Disequilibrium Model." Technical Report
238. Institute for Mathematical Studies in the Social Sciences,
Stanford University.

Benassy, J.P. (1982). The Economics of Market Disequilibrium. New
York: Academic Press.

Bowden, R.J. (1978). The Econometrics of Disequilibrium. Amsterdam:
North Holland.

Cosslett, S.R. (1983). "Distribution-free Maximum Likelihood Estimator
of the Binary Choice Model." Econometrica 51:765-782.

Fair, R.C., and D.M. Jaffee (1972). "Methods of Estimation for Markets
in Disequilibrium." Econometrica 40:497-514.

Fair, R.C., and H.H. Kelejian (1974). "Methods of Estimation for
Markets in Disequilibrium: A Further Study." Econometrica

Fisher, F.M. (1983). Disequilibrium Foundations of Equilibrium
Economics. New York: Cambridge University Press.

Goldfeld, S.M., and R.E. Quandt (1975). "Estimation in a
Disequilibrium Model and the Value of Information." Journal of
Econometrics 3:325-348.

Goldfeld, S.M., R.E. Quandt, and H.F. Trotter (1966). "Maximization
by Quadratic Hill-climbing." Econometrica 34:541-551.

Gordin, M.I. (1969). "The Central Limit Theorem for Stationary
Processes." Soviet Mathematics 10:1174-1176.

Hartley, M.J., and P. Mallela (1977). "The Asymptotic Properties of a
Maximum Likelihood Estimator for a Model of Markets in
Disequilibrium." Econometrica 45:1205-1220.

Heckman, J.J. (1976). "The Common Structure of Statistical Models of
Truncated, Sample Selection and Limited Dependent Variables and a
Simple Estimator for Such Models." Annals of Economic and Social
Measurement 5:475-492.

Hoadley, B. (1971). "Asymptotic Properties of Maximum Likelihood
Estimators for the Independent Not Identically Distributed Case."
Annals of Mathematical Statistics 42:1977-1991.

Ito, T., and K. Ueda (1981). "Tests of the Equilibrium Hypothesis in
Disequilibrium Econometrics: An International Comparison of Credit
Rationing." International Economic Review 22:691-708.

Jennrich, R.I. (1969). "Asymptotic Properties of Non-linear Least
Squares Estimators." Annals of Mathematical Statistics 40:633-643.

Laffont, J.J. and R. Garcia (1977). "Disequilibrium Econometrics for
Business Loans." Econometrica 45:1187-1204.

Lee, L.F., and R.H. Porter (1984). "Switching Regression Models with
Imperfect Sample Separation Information -- With an Application on
Cartel Stability." Econometrica 52:391-418.

Levine, D. (1983). "A Remark on Serial Correlation in Maximum
Likelihood." Journal of Econometrics 23:337-342.

Loeve, M. (1960). Probability Theory. 2nd ed. Princeton: Van

Maddala, G.S. (1983). Limited-dependent and Qualitative Variables in
Econometrics. New York: Cambridge University Press.

Maddala, G.S., and F. Nelson (1974). "Maximum Likelihood Methods for
Markets in Disequilibrium." Econometrica 42:1013-1030.

Manski, C.F. (1975). "The Maximum Score Estimation of the Stochastic
Utility Model of Choice." Journal of Econometrics 3:205-228.

Manski, C.F. (1983). "Closest Empirical Distribution Estimator."
Econometrica 51:305-320.

Manski, C.F. (1985). "Semiparametric Analysis of Discrete Response:
Asymptotic Properties of the Maximum Score Estimator." Journal of
Econometrics 27:313-333.

McLeish, D.C. (1975). "A Maximal Inequality and Dependent Strong Laws."
Annals of Probability 3:826-836.

Newey, W.K. and K.D. West (1985). "A Simple, Positive Definite,
Heteroscedasticity and Autocorrelation Consistent Covariance
Matrix." Discussion paper 92, Woodrow Wilson School, Princeton

Olsen, R.J. (1978). "Note on the Uniqueness of the Maximum Likelihood
Estimator for the Tobit Model." Econometrica 46:1211-1215.

Powell, J.L. (1984). "Least Absolute Deviations Estimation for the
Censored Regression Model." Journal of Econometrics 25:303-325.

Rao, R.R. (1962). "Relations between Weak and Uniform Convergence of
Measures with Applications." Annals of Mathematical Statistics

Rudin, W. (1976). Principles of Mathematical Analysis. New York:

Serfling, R.J. (1968). "Contributions to Central Limit Theory for
Dependent Variables." Annals of Mathematical Statistics

Sealy, C.W., Jr. (1979). "Credit Rationing in the Commercial Loan
Market: Estimates of a Structural Model Under Conditions of
Disequilibrium." Journal of Finance 34:689-702.

Wald, A. (1949). "Note on the Consistency of the Maximum Likelihood
Estimate." Annals of Mathematical Statistics 20:595-601.

White, H. (1980). "Nonlinear Regression on Cross-Section Data."
Econometrica 48:721-746.

White, H. (1984). Asymptotic Theory for Econometricians. New York:
Academic Press.

White, H., and I. Domowitz (1981). "Nonlinear Regression with Dependent
Observations." Unpublished paper, University of California, San

White, H., and I. Domowitz (1984). "Nonlinear Regression with Dependent
Observations." Econometrica 52:143-162.


Walter James Mayer was born in Detroit, Michigan, in 1955. He

received a Bachelor of Arts degree in economics from the University of

Missouri in 1982, and a Master of Arts degree from the University of

Florida in 1983.

I certify that I have read this study and that in my opinion it
conforms to acceptable standards of scholarly presentation and is fully
adequate, in scope and quality, as a dissertation for the degree of
Doctor of Philosophy.

Stephen R. Cosslett, Chairman
Associate Professor of Economics

I certify that I have read this study and that in my opinion it
conforms to acceptable standards of scholarly presentation and is fully
adequate, in scope and quality, as a dissertation for the degree of
Doctor of Philosophy.

G.S. Maddala
Professor of Economics

I certify that I have read this study and that in my opinion it
conforms to acceptable standards of scholarly presentation and is fully
adequate, in scope and quality, as a dissertation for the degree of
Doctor of Philosophy.

A.I. Khuri
Associate Professor of Statistics

This dissertation was submitted to the Graduate Faculty of the Department of
Economics in the College of Business Administration and to the Graduate
School and was accepted as partial fulfillment of the requirements for
the degree of Doctor of Philosophy.

December 1986

Dean, Graduate School

