Citation

## Material Information

Title:
Generalized econometric models and methods for markets in disequilibrium
Creator:
Mayer, Walter James, 1955- ( Dissertant )
Cosslett, Stephen R. ( Thesis advisor )
Khuri, A. I. ( Reviewer )
Lockhart, Madelyn ( Degree grantor )
Place of Publication:
Gainesville, Fla.
Publisher:
University of Florida
Publication Date:
1986
Language:
English

## Subjects

Subjects / Keywords:
Consistent estimators ( jstor )
Estimation methods ( jstor )
Estimators ( jstor )
Market disequilibrium ( jstor )
Maximum likelihood estimations ( jstor )
Price changes ( jstor )
Price level changes ( jstor )
Prices ( jstor )
Statistics ( jstor )
Supply and demand ( jstor )
Dissertations, Academic -- Economics -- UF
Econometric models
Economics thesis, Ph.D.
Equilibrium (Economics)
Markets
Genre:
bibliography ( marcgt )
non-fiction ( marcgt )

## Notes

Abstract:
Empirical studies of markets in disequilibrium have relied on the appropriateness of explicit price adjustment equations, serial independence, normally distributed errors, and explicit equations relating the observed quantity transacted to desired supply and demand. For example, the asymptotic properties of "disequilibrium" estimators and test statistics are sensitive to the parametric forms chosen for price adjustment, the serial behavior of the observations, error distributions, and the quantity transacted. In a word, "disequilibrium" estimators and statistics are non-robust. Unfortunately, economic theory provides little basis for choosing the parametric forms. A lack of economic-theoretic restrictions coupled with non-robust estimators and statistics has severely limited empirical studies of markets in disequilibrium. This dissertation develops new methods for more meaningful estimation of disequilibrium models. The new methods involve more general models and robust estimators. A switching regression model with imperfect sample separation is used to incorporate price adjustment into a disequilibrium model. The model enables price adjustment to be incorporated with less a priori information than usual. To estimate the model, maximum likelihood and least squares estimators are proposed. The asymptotic properties of the maximum likelihood estimator are examined. Previous results for maximum likelihood estimators of disequilibrium models are generalized with asymptotic theory for serially dependent observations. The maximum likelihood estimator is shown to be consistent and asymptotically normal even if the data are characterized by unknown forms of serial dependence. Asymptotic test statistics are also derived. The methodology is illustrated with an empirical application to the U.S. commercial loan market from 1979 to 1984. Finally, I propose semiparametric models and estimators for markets in disequilibrium. These methods are applicable when the error distributions are unknown, and the quantity transacted is an unknown function of supply and demand. Consistent estimators are derived using the method of maximum score.
Thesis:
Thesis (Ph.D).--University of Florida, 1986.
Bibliography:
Includes bibliographical references (leaves 86-88).
General Note:
Vita.
General Note:
Typescript.

## Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Resource Identifier:
AEN9870 ( ltuf )
16167663 ( oclc )
0030223414 ( ALEPH )

Full Text

GENERALIZED ECONOMETRIC MODELS AND METHODS FOR MARKETS IN DISEQUILIBRIUM

BY

WALTER JAMES MAYER

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

1986

ACKNOWLEDGEMENTS

I would like to thank my advisor, Dr. S.R. Cosslett, for his support throughout this project. Thanks are also extended to Dr. G.S. Maddala, Dr. D.A. Denslow, and Dr. A.I. Khuri. Invaluable assistance provided by DeLayne Redding in the typing of this document is much appreciated. This dissertation is dedicated to my parents.

Abstract of Dissertation Presented to the
Graduate School of the University of Florida
in Partial Fulfillment of the Requirements
for the Degree of Doctor of Philosophy

GENERALIZED ECONOMETRIC MODELS AND METHODS FOR MARKETS IN DISEQUILIBRIUM

By

WALTER JAMES MAYER

December 1986

Chairman: Dr. S. R. Cosslett
Cochairman: Dr. G. S. Maddala
Major Department: Economics

Empirical studies of markets in disequilibrium have relied on the appropriateness of explicit price adjustment equations, serial independence, normally distributed errors, and explicit equations relating the observed quantity transacted to desired supply and demand. For example, the asymptotic properties of "disequilibrium" estimators and test statistics are sensitive to the parametric forms chosen for price adjustment, the serial behavior of the observations, error distributions, and the quantity transacted. In a word, "disequilibrium" estimators and statistics are non-robust. Unfortunately, economic theory provides little basis for choosing the parametric forms. A lack of economic-theoretic restrictions coupled with non-robust estimators and statistics has severely limited empirical studies of markets in disequilibrium.

This dissertation develops new methods for more meaningful estimation of disequilibrium models. The new methods involve more general models and robust estimators.

A switching regression model with imperfect sample separation is used to incorporate price adjustment into a disequilibrium model. The model enables price adjustment to be incorporated with less a priori information than usual. To estimate the model, maximum likelihood and least squares estimators are proposed.

The asymptotic properties of the maximum likelihood estimator are examined. Previous results for maximum likelihood estimators of disequilibrium models are generalized with asymptotic theory for serially dependent observations. The maximum likelihood estimator is shown to be consistent and asymptotically normal even if the data are characterized by unknown forms of serial dependence. Asymptotic test statistics are also derived.

The methodology is illustrated with an empirical application to the U.S. commercial loan market from 1979 to 1984.

Finally, I propose semiparametric models and estimators for markets in disequilibrium. These methods are applicable when the error distributions are unknown, and the quantity transacted is an unknown function of supply and demand. Consistent estimators are derived using the method of maximum score.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT

CHAPTERS

1 AN OVERVIEW
  1.1 The Problem
  1.2 Solutions
  NOTES

2 A GENERAL DISEQUILIBRIUM MODEL AND ESTIMATORS FOR LIMITED A PRIORI PRICE ADJUSTMENT INFORMATION
  2.1 Introduction
  2.2 The Model and Maximum Likelihood Estimation
  2.3 A Consistent Initial Least Squares Estimator
  2.4 Summary and Conclusions
  NOTES

3 SOME ASYMPTOTIC THEORY FOR SERIALLY DEPENDENT OBSERVATIONS
  3.1 Introduction
  3.2 Consistency
  3.3 Asymptotic Normality
  3.4 Consistent Covariance Estimation
  3.5 An Asymptotic Test for Serial Correlation
  3.6 Summary and Conclusions
  NOTES

4 AN EMPIRICAL EXAMPLE: THE U.S. COMMERCIAL LOAN MARKET
  4.1 Introduction
  4.2 The Empirical Model
  4.3 Hypothesis Testing Procedures
  4.4 The Results

5 SEMIPARAMETRIC ESTIMATION OF DISEQUILIBRIUM MODELS USING THE METHOD OF MAXIMUM SCORE
  5.1 Introduction
  5.2 A Directional Model and Consistent Estimation Up to Scale
  5.3 A Price Adjustment Model with β Identified (Without a Loss of Scale)
  5.4 Maximum Score Estimation of Models that Include the Quantity Transacted
  NOTES

6 CONCLUDING REMARKS AND DIRECTIONS FOR FURTHER RESEARCH

APPENDIX
  A.1 Inconsistency and Misclassified Observations
  A.2 The Computational Tractability and Asymptotic Properties of the Least Squares Estimator of Section 2.3
  A.3 Proofs of Theorems 3.2 - 3.7
  Asymptotic Distribution of the (p+1)th-Round Estimates

BIBLIOGRAPHY

BIOGRAPHICAL SKETCH

CHAPTER 1
AN OVERVIEW

1.1 The Problem

Before Fair and Jaffee (1972) introduced their econometric disequilibrium model, estimation of market behavior was confined to the equilibrium assumption. The study of the econometrics of disequilibrium was further developed by Fair and Kelejian (1974), Maddala and Nelson (1974), Amemiya (1974), Goldfeld and Quandt (1975), Laffont and Garcia (1977), Bowden (1978), and some others. By allowing for disequilibrium, Fair and Jaffee's work and the subsequent work it inspired represent an important generalization, but a generalization obtained by imposing

(1) explicit price adjustment equations,

(2) serial independence on time series data,

(3) normally distributed error terms, and

(4) explicit equations relating the observed quantity transacted to desired supply and demand.

This contrasts with the equilibrium assumption where

(1) price adjustment is not an estimation issue,

(2) allowances are made for serial correlation,

(3) the errors are only assumed to be uncorrelated with a subset of the explanatory variables, and

(4) desired supply and demand are directly observable.

The estimation of market behavior has been extended to the disequilibrium assumption, but at a cost.

Economic theory for markets in disequilibrium is a relatively new area of research that has been developed in the last few years by Benassy (1982), and Fisher (1983), among others. Being recent phenomena, however, the theories that have been proposed are rather limited in scope and tentative. For the empirical researcher, the theories provide little guidance for specifying price adjustment, and the quantity transacted as a function of desired supply and demand; they provide no basis for specifying the error distributions and serial independence. A survey of the many empirical studies that have followed Fair and Jaffee (1972) suggests that the basis for specifying these aspects of the econometric disequilibrium model has been largely computational tractability. This approach has led to several drastically different disequilibrium specifications.¹ The assumptions of each specification generally do not represent well-defined economic-theoretic restrictions, and thus differences among them seldom reflect differences among well-defined alternative economic theories. As a result, most disequilibrium specifications are as good (or bad) as any other. Unfortunately, each specification produces estimates only as reliable as the assumptions imposed, and differences among them can lead to conflicting estimates of supply and demand equations.

The lack of economic-theoretic restrictions alone does not prohibit meaningful estimation of a disequilibrium model. The estimators commonly applied are also prohibitive. Most proposed "disequilibrium" estimators can be viewed as corrected versions of the "equilibrium" least squares (LS) and maximum likelihood (ML) estimators.² The inequality of supply and demand introduces nonzero correlation between the explanatory variables and the error terms. Given a model for the inequality of supply and demand, the "equilibrium" LS and ML estimators can be corrected for the nonzero correlation to yield consistent "disequilibrium" LS and ML estimators. The correction approach provides insight into the problem of relaxing the equilibrium assumption, but generally requires restrictive assumptions to make it operational. In particular, consistent LS and ML estimation of a disequilibrium model depends on choosing the correct parametric forms for price adjustment, the error distribution functions, and the quantity transacted; useful inferences require allowances for serial correlation as well as correct parametric forms. Non-robustness coupled with a lack of economic-theoretic restrictions severely limits the reliability of LS and ML estimation.

To illustrate these points we consider the following model.

$$D_t=\beta_1^{0\prime}x_t+\varepsilon_{1t} \tag{1.1}$$

$$S_t=\beta_2^{0\prime}x_t+\varepsilon_{2t} \tag{1.2}$$

Data: $(Q_t,x_t)_{t=1}^{n}$

Disequilibrium assumption: $D_t\neq S_t$, $Q_t=T_t(D_t,S_t)$, $\Delta p_{t+1}=\Pi_t(D_t-S_t)$.

Equations (1.1) and (1.2) are demand and supply functions; $D$ denotes the quantity demanded, $S$ the quantity supplied, $x$ a vector of explanatory variables, $\varepsilon_{1t}$ and $\varepsilon_{2t}$ denote random error terms. Under the equilibrium assumption the observed quantity transacted, $Q$, is equal to both $D$ and $S$; data are observed after prices adjust, and therefore adjustment models are irrelevant. Under the disequilibrium assumption $D$ and $S$ are not necessarily observable, the function $T_t(\cdot,\cdot)$ specifies the position of $D_t$ and $S_t$ relative to the observable $Q_t$; data reflect adjustments at various stages, and therefore it becomes meaningful to model price adjustment. Price adjustments are modeled as follows: the price change, $\Delta p_{t+1}\equiv p_{t+1}-p_t$, depends on excess demand, $D_t-S_t$, through the function $\Pi_t$.

When LS and ML are applied under the disequilibrium assumption, it becomes necessary to specify the distribution of $(\varepsilon_{1t},\varepsilon_{2t})$ up to an unknown parameter vector, and the functional forms for $T_t$ and $\Pi_t$. The following example will illustrate this. Consider the problem of obtaining a consistent LS estimate of $\beta_1^0$. Under the equilibrium assumption the data are conditional on the event $Q=D=S$, and therefore a consistent "equilibrium" LS estimate requires $E(x_t\varepsilon_{1t}\mid Q_t=D_t=S_t)=0$, or equivalently $E(x_t\varepsilon_{1t})=0$ since $Q_t=D_t=S_t$ is a sure event by assumption. Under the disequilibrium assumption, by contrast, each observation is conditional on either $D_t<S_t$ or $D_t>S_t$, and therefore a consistent "disequilibrium" LS estimate requires $E(x_t\varepsilon_{1t}\mid D_t<S_t)=0$. But since, for example, $D_t<S_t$ […] $D_t>S_t$). For example, suppose we specify

$$\begin{pmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{pmatrix}\sim N\!\left(\begin{pmatrix}0\\ 0\end{pmatrix},\begin{pmatrix}\sigma_1^2 & 0\\ 0 & \sigma_2^2\end{pmatrix}\right), \tag{1.3}$$

$$Q_t=T_t(D_t,S_t)\equiv\min(D_t,S_t), \tag{1.4}$$

$$\Delta p_{t+1}=\Pi_t(D_t,S_t)\equiv\alpha(D_t-S_t),\quad \alpha>0, \tag{1.5}$$

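and for the first $n_1$ observations we have $\Delta p_{t+1}<0$. Then $Q_t=D_t$, $t=1,\dots,n_1$, by equations (1.4) and (1.5), and consistent LS estimates can be obtained by solving the problem

$$\min_{\beta_1,\beta_2,\sigma_1^2,\sigma_2^2}\; n_1^{-1}\sum_{t=1}^{n_1}\Bigl(Q_t-\beta_1'x_t-E(\varepsilon_{1t}\mid D_t<S_t;\,\beta_1,\beta_2,\sigma_1^2,\sigma_2^2)\Bigr)^2 .$$

The functional form for the "correction" term is

$$E(\varepsilon_{1t}\mid D_t<S_t)=-\frac{\sigma_1^2}{(\sigma_1^2+\sigma_2^2)^{1/2}}\;\frac{\phi\bigl((\beta_2'x_t-\beta_1'x_t)/(\sigma_1^2+\sigma_2^2)^{1/2}\bigr)}{\Phi\bigl((\beta_2'x_t-\beta_1'x_t)/(\sigma_1^2+\sigma_2^2)^{1/2}\bigr)},$$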

where $\phi(\cdot)$ and $\Phi(\cdot)$ denote the standard normal density and distribution function. Without a priori restrictions the specification of assumptions (1.3), (1.4), and (1.5) is arbitrary, but obviously crucial to the LS estimation of the parameters. For example, given what is known about most markets, some alternative assumptions that are just as plausible as (1.3), (1.4), and (1.5) are (1) any nonnormal symmetrical distribution for the error terms, (2) $Q_t\neq\min(D_t,S_t)$, and (3) $\Delta p_{t+1}=\alpha(D_t-S_t)+\varepsilon_{3t}$, where $\varepsilon_{3t}$ is a random error term. Alternatively, one could derive the likelihood function of $(Q_t,x_t)_{t=1}^{n}$, and obtain the ML estimates. Once again, however, the estimates are subject to the validity of restrictive assumptions.
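The dependence of the LS correction on the normality and minimum-rule assumptions can be made concrete. The following is a minimal sketch, not the dissertation's own code: it assumes independent normal errors as in (1.3), treats the index values `b1x` and `b2x` (standing for $\beta_1'x_t$ and $\beta_2'x_t$) as given scalars, and uses hypothetical function names. It computes the closed-form value of $E(\varepsilon_{1t}\mid D_t<S_t)$ implied by those assumptions and compares it with a simulated conditional mean.

```python
import math
import random


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def demand_correction(b1x, b2x, var1, var2):
    """Closed-form E(eps1 | D < S) for independent N(0, var1), N(0, var2) errors.

    b1x and b2x are the demand and supply index values (illustrative scalars).
    """
    s = math.sqrt(var1 + var2)
    z = (b2x - b1x) / s
    return -(var1 / s) * normal_pdf(z) / normal_cdf(z)


def simulated_correction(b1x, b2x, var1, var2, draws=200_000, seed=0):
    """Monte Carlo estimate of E(eps1 | D < S) under the same assumptions."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(draws):
        e1 = rng.gauss(0.0, math.sqrt(var1))
        e2 = rng.gauss(0.0, math.sqrt(var2))
        if b1x + e1 < b2x + e2:  # demand regime: Q = D
            total += e1
            count += 1
    return total / count
```

If the errors were drawn from a nonnormal distribution instead, the simulated mean would generally no longer match `demand_correction`, which is the sense in which the LS correction depends on assumption (1.3).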

Empirical studies of markets in disequilibrium are concerned with analyzing time-series data, and therefore the possibility of serially correlated errors also arises. Most disequilibrium studies, however, completely ignore the possibility of serial correlation. One reason for this practice is that maximizing the correct likelihood function for a typical disequilibrium model with even a simple form of serial correlation can be intractable. The problem is one of introducing further complications into a highly nonlinear structure. (Equilibrium models, by contrast, have simpler structures, and therefore it is relatively straightforward to incorporate an ARMA process (say) into these models.) The problem is further complicated when the true form of the serial correlation is unknown; even if one were willing to incorporate a simple process such as AR(1), the result would likely be a questionable approximation at best. At the same time, failure to adequately account for serial correlation can cause inconsistent covariance estimates, and incorrectly interpreted test statistics.

In summary, the estimation of markets in disequilibrium has been severely limited by the problems of specifying (1) price adjustment; (2) serial correlation; (3) the distributions of the error terms up to an unknown parameter vector; and (4) the quantity transacted as a function of desired supply and demand.

1.2 Solutions

This thesis addresses the above problems by examining their effects, and by proposing and demonstrating solutions.

Chapters 2, 3, and 4 are directed at the problems of specifying price adjustment, and specifying serial correlation. In Chapter 2, we propose using the switching regression model with imperfect sample separation of Lee and Porter (1984) to incorporate price adjustment into disequilibrium models. The model enables price adjustment to be incorporated with less a priori information than usual. To estimate the model, ML and LS estimators are proposed.

In Chapter 3, the asymptotic properties of the ML estimator are examined in the context of possible serial correlation. This chapter builds on previous results of Hartley and Mallela (1977), and Amemiya and Sen (1977). By incorporating into their results some recent developments in modeling serial correlation by White and Domowitz (1984) and others, the analysis permits the data to be characterized by unknown and general forms of serial correlation. At the same time, the estimation problem remains computationally tractable.

In Chapter 4, the practical importance of the methodology developed in chapters 2 and 3 is illustrated with an empirical example. The methodology is applied to monthly data on the U.S. commercial loan market from 1979 to 1984.

The final chapter, Chapter 5, proposes semiparametric models and estimators for markets in disequilibrium. Unlike the previous chapters, the results of Chapter 5 apply when the functional forms of the error distribution functions are unknown, and the observed quantity transacted is an unknown function of desired supply and demand. Consistent semiparametric estimators are derived by extending the method of maximum score of Manski (1975, 1985) to a new class of applications.

Although the focus is on the disequilibrium estimation problem, many of the issues addressed are applicable to other important problems as well. From a general perspective, the central issue is how to deal with an estimation problem characterized by less information than what is usually assumed. The methodology with which we confront the issue brings together important works from the areas of limited dependent variables, nonlinear estimation, asymptotic theory, data analysis, maximum likelihood, least squares, and semiparametric estimation.

NOTES

¹ One notable difference among many proposed disequilibrium specifications is the treatment of price adjustment. Different price adjustment models often produce different coefficient estimates and inferences for given supply and demand equations. Most studies assume normally distributed error terms, and that the quantity transacted is the minimum of desired supply and demand. Surveys of disequilibrium specifications commonly used in applied work can be found in Bowden (1978) and Maddala (1983).

² General discussions of LS and ML estimators for disequilibrium models can be found in Bowden (1978) and Maddala (1983).

³ The random variable $\varepsilon_{1t}$ conditional on $D_t$

CHAPTER 2
A GENERAL DISEQUILIBRIUM MODEL AND ESTIMATORS FOR LIMITED A PRIORI PRICE ADJUSTMENT INFORMATION

2.1 Introduction

Price adjustment has a well-defined role in the equilibrium model: prices adjust to clear the market; data are observed only after adjustments terminate, and therefore are uninformative on the forces which led to equilibrium. When we assume prices clear the market, modeling price adjustment is trivial. In contrast, when we assume disequilibrium, and therefore observe adjustments at various stages, modeling the process becomes nontrivial and affects the estimation of the supply and demand model. The research that has followed Fair and Jaffee has given this issue only limited attention. To lessen the neglect, the present chapter examines the importance of price adjustment to estimation, and offers a new approach for introducing price adjustment into the disequilibrium model.

The estimation of a disequilibrium model carries the reservation that estimates are sensitive to the price adjustment specification. This sensitivity is evident in many of the empirical studies which followed Fair and Jaffee. For example, Maddala and Nelson (1974) obtained the maximum likelihood (ML) estimates of a housing market in disequilibrium under two different price adjustment specifications:

PA1. The sign of excess demand is given by the direction of the price change, or equivalently

$$\Pr(\Delta p_{t+1}>0\mid D_t>S_t)=1 \quad\text{and}\quad \Pr(\Delta p_{t+1}>0\mid D_t<S_t)=0.$$

PA2. Ignore whatever information the direction of the price change contains on excess demand. (This is a limited-information approach as no attempt is made to model price adjustment.) In the next section, we shall see that this specification can be usefully viewed as imposing the following constraint:

$$\Pr(\Delta p_{t+1}>0\mid D_t>S_t)=\Pr(\Delta p_{t+1}>0\mid D_t<S_t).$$

For the two sets of estimates the following conflicts are apparent: one estimated coefficient is negative under PA1 and positive under PA2; another is statistically significant under PA2 but not under PA1; the estimated variance of the supply error term is twenty-five times larger under PA2, and the same parameter for the demand equation is ten times larger.

Economic theory imposes few restrictions on the dynamics of price adjustment, and consequently provides little basis for choosing between specifications such as PA1 and PA2. Perhaps this explains why in many studies the Fair-Jaffee models are applied rather mechanically with no discussion of why a particular choice is appropriate for a given market. The tendency has been either to specify a convenient but restrictive price adjustment mechanism such as PA1, or to ignore potential relations between price and excess demand as in PA2. Apart from the potential for conflicting results, each approach has serious drawbacks. The restrictive approach may misspecify the model, and therefore lead to inconsistent estimates of the supply and demand parameters. On the other hand, if there is some interaction between price and excess demand, then efficiency will be lost if price adjustment is completely ignored. In short, even if the estimates under PA1 are close to those under PA2, problems remain.

The failure of many empirical studies to adequately represent price adjustment stems from a failure to carefully assess what is known a priori. For most applications PA1 imposes too much structure, and PA2 imposes too little. What is needed is an approach which allows price and excess demand to interact, but at the same time is unrestrictive.

I propose nesting PA1 and PA2 in a more general model using a method suggested by Lee and Porter (1984). In many respects the approach is less restrictive than usual. Price adjustments are assumed to be governed by the following condition:

PA3. The direction of the price change is most likely, but not certain, to follow the direction of excess demand, or equivalently

$$\Pr(\Delta p_{t+1}>0\mid D_t>S_t)\ \ge\ \Pr(\Delta p_{t+1}>0\mid D_t<S_t).$$

Although PA3 allows for the possibility that excess demand influences price changes, it does not restrict the direction of the price change to correspond to the sign of excess demand, impose a specific price adjustment equation, or restrict price changes to obey a known probability distribution. The approach entails estimating the conditional probabilities in PA3, and hence the data rather than a priori constraints such as PA1 or PA2 determine to what extent prices are related to excess demand. Moreover, the problem of modeling price adjustment is placed in a unified framework which permits a useful discussion of the relationship between the price adjustment specification, and the statistical properties of estimation. PA1 and PA2 are special cases of PA3, and it is argued that imposing PA1 can lead to inconsistent estimates, while imposing PA2 can suppress exploitable information on the supply and demand parameters.

The model and its maximum likelihood estimator are discussed in section 2.2. In section 2.3 a convenient least squares approach is proposed which has not been previously available for disequilibrium models. The LS estimator resembles that suggested by Heckman (1976) for the Tobit model. Although the ML estimator presented in section 2.2 is more efficient, the LS estimator is easier to compute, and provides consistent starting values if the ML estimates are desired. An initial consistent estimator is especially important when PA1 is relaxed since the resulting likelihood generally has multiple solutions.

2.2 The Model and Maximum Likelihood Estimation

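I propose the following model:

$$D_t=\beta_1'x_t+\varepsilon_{1t} \quad\text{(demand)}$$

$$S_t=\beta_2'x_t+\varepsilon_{2t} \quad\text{(supply)}$$

$$\begin{pmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{pmatrix}\sim N\!\left(\begin{pmatrix}0\\ 0\end{pmatrix},\begin{pmatrix}\sigma_1^2 & 0\\ 0 & \sigma_2^2\end{pmatrix}\right) \quad\text{(normality)}$$

$$Q_t=\min(D_t,S_t) \quad\text{(quantity transacted)}$$

$$\Pr(\Delta p_{t+1}>0\mid D_t>S_t)\ \ge\ \Pr(\Delta p_{t+1}>0\mid D_t<S_t) \quad\text{(PA3)}$$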

where the variables are as they were defined in Chapter 1. The specification of the demand and supply equations, normality for the error terms, and the quantity transacted as the minimum of supply and demand has become standard practice for empirical studies of markets in disequilibrium. The model differs from previous disequilibrium specifications with the introduction of PA3: shortages ($D>S$) and surpluses ($D<S$) […]

The data consist of $n$ observations on $(Q_t,x_t,1(\Delta p_{t+1}>0))$, where $1(\cdot)$ is the indicator function, and the problem is to estimate the unknown supply and demand parameters along with the conditional distribution of $1(\Delta p_{t+1}>0)$ subject to PA3.

To make the problem operational we will adopt the methodology of Lee and Porter (1984), which entails the following assumptions:¹

Assumption 2.1. Given $D_t>S_t$ (or $D_t<S_t$), $Q_t$ and $1(\Delta p_{t+1}>0)$ are mutually independent for all $t$;

Assumption 2.2. The conditional probabilities of PA3 do not vary with $t$; i.e., $p_{11}=\Pr(\Delta p_{t+1}>0\mid D_t>S_t,x_t)$ and $p_{10}=\Pr(\Delta p_{t+1}>0\mid D_t<S_t,x_t)$.

Assumption 2.2 is the simplest assumption that allows the price adjustment probabilities to be treated as estimable parameters, but is not the only possible way of doing so. For example, if it is suspected that price setting behavior differs between certain subsamples, then a different pair of parameters could be defined for each. One possible application might be a market where prices are regulated in some periods, but not in others. Alternatively, a completely varying-parameter approach is developed in Chapter 5. Although assumptions 2.1 and 2.2 are still somewhat restrictive, arguably the benefits obtained from imposing them outweigh the costs. Price adjustment is incorporated without an explicit adjustment equation, a specific distribution for price changes, or the restriction that price changes reveal the sign of excess demand. Furthermore, as we shall see next, estimation is relatively straightforward under assumptions 2.1 and 2.2.

The log likelihood function of $n$ independent observations on $(Q_t,1(\Delta p_{t+1}>0)\mid x_t,\theta,p_{11},p_{10})$, where $\theta=(\beta_1,\beta_2,\sigma_1^2,\sigma_2^2)$, is

$$L_n(\theta,p_{11},p_{10})=\sum_{t=1}^{n}\log f_t\bigl(Q_t,1(\Delta p_{t+1}>0)\bigr) \tag{2.3}$$

where

$$f_t\bigl(Q_t,1(\Delta p_{t+1}>0)\bigr)=\bigl(p_{11}g_{st}+p_{10}g_{dt}\bigr)^{1(\Delta p_{t+1}>0)}\bigl((1-p_{11})g_{st}+(1-p_{10})g_{dt}\bigr)^{1-1(\Delta p_{t+1}>0)},$$

$$g_{st}=\int_{Q_t}^{\infty}g_t(D_t,Q_t)\,dD_t,\qquad g_{dt}=\int_{Q_t}^{\infty}g_t(Q_t,S_t)\,dS_t,$$

and $g_t(D_t,S_t)$ is the joint density of $D_t$ and $S_t$ given $x_t$ and $\theta$. Under fairly general conditions, a consistent and asymptotically normal estimate of $(\theta^0,p_{11}^0,p_{10}^0)$ can be obtained by maximizing $L_n$ over an appropriate parameter space. (The asymptotic properties of a maximizer of $L_n$ are developed in the next chapter.)
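As an illustration of how the log likelihood (2.3) might be evaluated, here is a minimal sketch under the normality and independence assumptions of the model, in which case the two integrals reduce to a normal density times a normal tail probability. The function names, the scalar regressor `x`, and the tuple layout of `theta` are assumptions made for this example only; they are not taken from the dissertation.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def observation_density(q, price_up, x, theta, p11, p10):
    """Density of one observation (Q_t, 1(price change > 0)) given scalar x.

    theta = (b1, b2, var1, var2); independent normal errors are assumed.
    """
    b1, b2, var1, var2 = theta
    s1, s2 = math.sqrt(var1), math.sqrt(var2)
    zd = (q - b1 * x) / s1  # standardized residual if Q were demand
    zs = (q - b2 * x) / s2  # standardized residual if Q were supply
    g_s = normal_pdf(zs) / s2 * (1.0 - normal_cdf(zd))  # Q = S and D > Q
    g_d = normal_pdf(zd) / s1 * (1.0 - normal_cdf(zs))  # Q = D and S > Q
    if price_up:
        return p11 * g_s + p10 * g_d
    return (1.0 - p11) * g_s + (1.0 - p10) * g_d


def log_likelihood(data, theta, p11, p10):
    """Sum of log densities over rows (q, price_up, x), as in (2.3)."""
    return sum(
        math.log(observation_density(q, up, x, theta, p11, p10))
        for q, up, x in data
    )
```

Setting `p11 = 1.0` and `p10 = 0.0` gives the likelihood with sample separation by the price direction (PA1), while `p11 == p10` makes the price indicator uninformative about the regime (PA2). In practice `log_likelihood` would be passed to a numerical optimizer, which is outside the scope of this sketch.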

The Maddala-Nelson estimators discussed in section 2.1 are obtained by maximizing $L_n$ subject to

(PA1): $(p_{11},p_{10})=(1,0)$; (PA2): $p_{11}=p_{10}$.

As was noted, however, applying these two estimators to a given data set can produce conflicting results. One advantage of specifying PA3 is that the parameter space includes the entire region $\{(p_{11},p_{10}): p_{11}\ge p_{10}\}$, and consequently it is not necessary to choose between PA1 and PA2.

By viewing the Maddala-Nelson estimators as constrained maximizers of $L_n$, two additional limitations that are overcome by specifying PA3 can be seen. First, if the direction of the price change does not always follow the sign of excess demand so that $p_{11}<1$ or $p_{10}>0$, then the estimator obtained by maximizing $L_n$ subject to $(p_{11},p_{10})=(1,0)$ is inconsistent. In other words, if it is incorrectly assumed that $1(\Delta p_{t+1}>0)$ separates the sample into the underlying demand ($Q_t=D_t$) and supply ($Q_t=S_t$) regimes, then the resulting estimates will be generally inconsistent. To see this denote the constrained estimator by $\hat\theta_n(1,0)$, and suppose that $\varepsilon_{1t},\varepsilon_{2t}$ are normally distributed independent random variables, and $p_{10}^0$ […]

$$E\bigl(\partial L_n(\theta^0;1,0)/\partial\beta_1\bigr)=(1-p_{11}^0)\sum_t x_tE(Q_t-D_t)/\sigma_1^2. \tag{2.4}$$

Since $\Pr(D_t>S_t)>0$, and $\Pr(D_t\ge Q_t=\min(D_t,S_t))=1$, we have $E(Q_t-D_t)<0$, and it follows from (2.4) that in general $\operatorname{plim}\hat\theta_n(1,0)\neq\theta^0$. (For further details see Appendix A.1.)
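The two ingredients of this argument, namely that $E(Q_t-D_t)<0$ and that the price direction misclassifies some observations whenever $p_{11}<1$ or $p_{10}>0$, can be checked by simulation. The sketch below is illustrative only: the data-generating function, its parameter names, and the scalar index values `b1x` and `b2x` are assumptions made for this example and are not the dissertation's procedure.

```python
import random


def simulate_market(n, b1x, b2x, sd1, sd2, p11, p10, seed=0):
    """Draw n rows (D, S, Q, price_up) with Q = min(D, S).

    price_up is drawn with probability p11 when D > S and p10 otherwise.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        d = b1x + rng.gauss(0.0, sd1)
        s = b2x + rng.gauss(0.0, sd2)
        prob_up = p11 if d > s else p10
        price_up = rng.random() < prob_up
        rows.append((d, s, min(d, s), price_up))
    return rows


def mean_gap(rows):
    """Sample average of Q - D, which is negative whenever Pr(D > S) > 0."""
    return sum(q - d for d, s, q, up in rows) / len(rows)


def misclassified_share(rows):
    """Share of rows whose price direction contradicts the sign of excess demand."""
    wrong = sum(1 for d, s, q, up in rows if up != (d > s))
    return wrong / len(rows)
```

With `p11=1.0` and `p10=0.0` the price direction separates the regimes exactly and `misclassified_share` is zero; lowering `p11` makes a positive share of supply-regime observations look like demand-regime observations, which is the source of the inconsistency described above.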

The second limitation overcome by maximizing $L_n$ over the unconstrained space demonstrates the importance of incorporating price adjustment into the model. If price changes are related to excess demand so that $p_{11}^0\neq p_{10}^0$, then the observations on $1(\Delta p_{t+1}>0)$ contain information on $\theta^0$ that is exploited by the maximizer of $L_n$ only if the restriction $p_{11}=p_{10}$ is not imposed. Since imposing $p_{11}=p_{10}$ is equivalent to estimating the model without a price adjustment specification, this implies that one is better off using even limited amounts of price adjustment information rather than neglecting it altogether. This can be seen by examining the difference between the corresponding information matrices of the constrained and unconstrained estimators of $\theta^0$. For this purpose note that $p_{11}=p_{10}$ implies that $Q_t$ and $1(\Delta p_{t+1}>0)$ are independent, the marginal distribution of $1(\Delta p_{t+1}>0)$ does not depend on $\theta^0$, and therefore the $\theta^0$-estimator obtained by maximizing $L_n$ subject to $p_{11}=p_{10}$ can be written as

$$\hat\theta_n(p_{11}=p_{10})=\arg\max_{\theta}\sum_{t=1}^{n}\log g_t(Q_t),$$

where $g_t$ is the density of $Q_t$ given $x_t$. Since $\hat\theta_n(p_{11}=p_{10})$ does not require the joint observation $(Q_t,1(\Delta p_{t+1}>0))$, it uses one more observation on $Q$ than the $\theta$-estimator obtained by maximizing $L_n$ over the unconstrained space, and therefore we write the latter estimator as

$$\hat\theta_{n-1}(p_{11}\neq p_{10})=\arg\max_{(\theta,p_{11},p_{10})}\sum_{t=1}^{n-1}\log f_t\bigl(Q_t,1(\Delta p_{t+1}>0)\bigr).$$

Unlike $\hat\theta_n(p_{11}=p_{10})$, the estimator $\hat\theta_{n-1}(p_{11}\neq p_{10})$ uses the price adjustment information implied by $p_{11}\neq p_{10}$, namely the dependence of $1(\Delta p_{t+1}>0)$ and $Q_t$. For simplicity suppose that the observations are identically distributed. The trade-off between the extra observation on $Q$ used in $\hat\theta_n(p_{11}=p_{10})$, and the price adjustment information exploited by $\hat\theta_{n-1}(p_{11}\neq p_{10})$ is apparent in the difference between their corresponding information matrices:

$$(n-1)E\bigl(-\partial^2\log f/\partial\theta\,\partial\theta'\bigr)-nE\bigl(-\partial^2\log g/\partial\theta\,\partial\theta'\bigr)$$

$$=(n-1)E\bigl(-\partial^2\log h/\partial\theta\,\partial\theta'\bigr)-E\bigl(-\partial^2\log g/\partial\theta\,\partial\theta'\bigr),$$

where $h_t$ is the density of $1(\Delta p_{t+1}>0)$ given $(Q_t,x_t)$. In large samples the information provided by the extra observation on $Q$ in $\hat\theta_n(p_{11}=p_{10})$ is insignificant, and clearly $\hat\theta_{n-1}(p_{11}\neq p_{10})$ is the more efficient estimator.
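The step connecting the two expressions in the display above follows from factoring the joint density into a conditional and a marginal part. A short derivation, assuming identically distributed observations and the usual regularity conditions that allow differentiation under the expectation:

```latex
\begin{aligned}
f_t\bigl(Q_t,1(\Delta p_{t+1}>0)\bigr) &= h_t\bigl(1(\Delta p_{t+1}>0)\mid Q_t\bigr)\,g_t(Q_t),\\
\log f &= \log h+\log g,\\
E\bigl(-\partial^2\log f/\partial\theta\,\partial\theta'\bigr)
  &= E\bigl(-\partial^2\log h/\partial\theta\,\partial\theta'\bigr)
   + E\bigl(-\partial^2\log g/\partial\theta\,\partial\theta'\bigr),\\
(n-1)E\bigl(-\partial^2\log f/\partial\theta\,\partial\theta'\bigr)
  - nE\bigl(-\partial^2\log g/\partial\theta\,\partial\theta'\bigr)
  &= (n-1)E\bigl(-\partial^2\log h/\partial\theta\,\partial\theta'\bigr)
   - E\bigl(-\partial^2\log g/\partial\theta\,\partial\theta'\bigr).
\end{aligned}
```

The first term on the right grows with $n$ while the second does not, which is why the extra observation on $Q$ becomes negligible in large samples.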

Having developed a fairly unrestrictive approach for introducing price adjustment into the disequilibrium model, an important question remains: is maximization of $L_n$ over $(\theta,p_{11},p_{10})$ computationally tractable? This question is important given a common structure shared by both $L_n$ with $(p_{11},p_{10})$ unrestricted, and $L_n$ restricted by $p_{11}=p_{10}$: neither specification permits the observations to be separated into the underlying supply and demand regimes, and hence both are switching regression models with unknown sample separation. The question of tractability arises because likelihood functions of unknown sample separation models generally have an unknown number of local maxima, and finding the consistent and asymptotically normal estimate (global maximum) usually requires an exhaustive set of local candidates. For example, Maddala and Nelson found that three different starting values produced three different sets of estimates, and were not able to rule out the possibility of other solutions. Unfortunately, the extra information provided by the joint observation $(Q_t,1(\Delta p_{t+1}>0))$ does not automatically eliminate the problem; in general, $L_n$ is likely to have multiple solutions. Fortunately, the problem can be circumvented. If one is willing to assume that $p_{11}>p_{10}$, then it is possible to construct a computationally simple and consistent estimator of $(\theta,p_{11},p_{10})$, and therefore obtain consistent starting values to iterate to a local maximum of $L_n$. The consistency of the initial estimates generally guarantees the consistency and asymptotic normality of the resulting solution.² The next task is to describe the initial estimator.

2.3 A Consistent Initial Least Squares Estimator

While computationally simple and consistent estimators have been proposed for other limited dependent variable models such as the Tobit, similar results have not been previously available for the disequilibrium model with unknown sample separation. Ironically, the models for which such estimators have been available generally possess tractable likelihood functions, and therefore finding consistent initial estimates is of limited value. A prime example is the Tobit model, for which consistent initial estimators were proposed by Amemiya (1973) and Heckman (1976); their estimators are not particularly useful for the Tobit, as this model has a globally concave likelihood function (when suitably parameterized) which ensures convergence to the consistent and asymptotically normal maximizer from any starting values.³ In contrast, the likelihood functions of models with unknown sample separation are likely to have multiple maxima, and therefore finding initial consistent estimates for these models is crucial.

The estimator described below extends the approaches suggested by Amemiya and Heckman to disequilibrium models with unknown sample separation. The method requires the first moment of $1(\Delta p_{t+1}>0)$ $(t=1,\ldots,n)$, and the first and second moments of $Q_t$ $(t=1,\ldots,n)$. Least squares is then applied successively to three estimation equations. Assuming that $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are independent normally distributed random variables, the relevant equations are

$$1(\Delta p_{t+1}>0) = E\,1(\Delta p_{t+1}>0) + u_{1t} \qquad (2.5)$$

$$Q_t = E(Q_t) + u_{2t} \qquad (2.6)$$

$$Q_t^2 = E(Q_t^2) + u_{3t} \qquad (2.7)$$

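where

$$E(1(\Delta p_{t+1}>0)) = p_{11} - (p_{11} - p_{10})\,\Phi(x_t\gamma^0), \qquad (2.8)$$

$$E(Q_t) = (1-\Phi(x_t\gamma^0))\,x_t\beta_2^0 + \Phi(x_t\gamma^0)\,x_t\beta_1^0 - ((\sigma_1^0)^2+(\sigma_2^0)^2)^{1/2}\,\phi(x_t\gamma^0), \qquad (2.9)$$

$$E(Q_t^2) = (1-\Phi(x_t\gamma^0))(x_t\beta_2^0)^2 + \Phi(x_t\gamma^0)(x_t\beta_1^0)^2$$

$$\quad + (\sigma_1^0)^2\left(\Phi(x_t\gamma^0) - \frac{2x_t\beta_2^0\,\phi(x_t\gamma^0)}{((\sigma_1^0)^2+(\sigma_2^0)^2)^{1/2}} + \phi(x_t\gamma^0)\,x_t\gamma^0\right)$$

$$\quad + (\sigma_2^0)^2\left(1 - \Phi(x_t\gamma^0) - \frac{2x_t\beta_2^0\,\phi(x_t\gamma^0)}{((\sigma_1^0)^2+(\sigma_2^0)^2)^{1/2}} + \phi(x_t\gamma^0)\,x_t\gamma^0\right), \qquad (2.10)$$

$\gamma^0 = (\beta_2^0 - \beta_1^0)/((\sigma_1^0)^2+(\sigma_2^0)^2)^{1/2}$, and $\phi(\cdot)$ and $\Phi(\cdot)$ denote the standard normal density and c.d.f., respectively. Given appropriate regularity conditions, nonlinear LS applied to equation (2.5) yields consistent estimates of $p_{11}$, $p_{10}$, and $\gamma^0$. These estimates are then used to estimate the nonlinear functions, $\Phi$ and $\phi$, in equation (2.6). Ordinary LS can then be applied to (2.6) to consistently estimate $\beta_1^0$, $\beta_2^0$, and $((\sigma_1^0)^2+(\sigma_2^0)^2)^{1/2}$. Finally, the nonlinear functions of equation (2.7) are estimated so that OLS can be applied, and consistent estimates of $(\sigma_1^0)^2$ and $(\sigma_2^0)^2$ obtained. The asymptotic properties of the LS estimator are developed in appendix A.2.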
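The moment expressions behind (2.6) and (2.7) can be checked numerically. The following is a minimal sketch, not the dissertation's code: it assumes $D$ and $S$ are independent normals with known means and variances (the scalar `mu_d` stands in for $x_t\beta_1$ and `mu_s` for $x_t\beta_2$), uses the standard closed forms for the first two moments of the minimum of two independent normal variables, and compares them with Monte Carlo averages. The function name `min_moments` and all parameter values are hypothetical.

```python
import math
import numpy as np

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def min_moments(mu_d, mu_s, var_d, var_s):
    """E(Q) and E(Q^2) for Q = min(D, S), with D ~ N(mu_d, var_d) and
    S ~ N(mu_s, var_s) independent (assumed illustrative setup)."""
    sigma = math.sqrt(var_d + var_s)
    g = (mu_s - mu_d) / sigma            # plays the role of x_t * gamma
    Phi, phi = norm_cdf(g), norm_pdf(g)
    m1 = (1.0 - Phi) * mu_s + Phi * mu_d - sigma * phi
    common = -2.0 * mu_s * phi / sigma + phi * g
    m2 = ((1.0 - Phi) * mu_s ** 2 + Phi * mu_d ** 2
          + var_d * (Phi + common)
          + var_s * (1.0 - Phi + common))
    return m1, m2

# Hypothetical parameter values, chosen only for illustration.
mu_d, mu_s, var_d, var_s = 1.0, 1.5, 0.8, 1.2
rng = np.random.default_rng(0)
d = rng.normal(mu_d, math.sqrt(var_d), 400_000)
s = rng.normal(mu_s, math.sqrt(var_s), 400_000)
q = np.minimum(d, s)

m1, m2 = min_moments(mu_d, mu_s, var_d, var_s)
print("E(Q):   formula", m1, " simulation", q.mean())
print("E(Q^2): formula", m2, " simulation", (q ** 2).mean())
```

Because the second-moment expression is linear in the two variances once the other quantities are plugged in, an OLS regression on the bracketed terms is what recovers the variances in the third step.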

Interestingly, the above approach is possible only if price changes provide some information on whether there is excess demand or supply; i.e., $p_{11}\neq p_{10}$. This can be seen from equation (2.8), which can be interpreted as the probability that $1(\Delta p_{t+1}>0)$ is equal to one. If price changes are completely uninformative on excess demand or supply, then $p_{11}=p_{10}$, and it follows from (2.8) that the distribution of $1(\Delta p_{t+1}>0)$ is independent of $\gamma^0$. In this case the observations on $1(\Delta p_{t+1}>0)$ contain no information on the supply and demand parameters, and therefore equation (2.5) is irrelevant for the estimation of the model.
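The identification point can be illustrated with a few lines of code. This is a sketch under our reading of (2.8) as $p_{11} - (p_{11}-p_{10})\Phi(x_t\gamma)$; the function name and the numerical values are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_price_increase(p11, p10, x_gamma):
    """Probability of a price increase as a function of the index x_t * gamma,
    following our reading of equation (2.8)."""
    return p11 - (p11 - p10) * norm_cdf(x_gamma)

index_values = (-2.0, 0.0, 2.0)

# Informative price changes (p11 != p10): the probability moves with the index,
# so the indicator carries information about gamma.
informative = [prob_price_increase(0.9, 0.2, g) for g in index_values]

# Uninformative price changes (p11 == p10): the probability is flat in the index.
uninformative = [prob_price_increase(0.5, 0.5, g) for g in index_values]

print(informative)
print(uninformative)
```

In the flat case a nonlinear least squares fit of (2.5) cannot distinguish one value of $\gamma$ from another, which is why the first step of the procedure breaks down.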

2.4 Summary and Conclusions

The main points of this chapter are

(1) Estimates of disequilibrium models are sensitive to the price adjustment specification.

(2) Economic theory imposes few restrictions on price adjustment, and consequently provides little basis for choosing between specifications.

(3) Assumption PA3 serves as an unrestrictive approach for introducing price adjustments into disequilibrium models; adjustment enters without an explicit adjustment equation, a known probability distribution for price changes, or the restriction that price changes reveal the sign of excess demand.

(4) Assumption PA3 together with assumptions 2.1 and 2.2 permit a straightforward derivation of the likelihood function. The parameter space includes but is not limited to the important special cases PA1 and PA2. Constraining the parameter space to PA1, as is often done in practice, can lead to inconsistent estimation; constraining the space to PA2 produces inefficient estimates.

(5) Under assumption PA3 the disequilibrium model is one of unknown sample separation, and therefore its likelihood function generally has multiple solutions. To resolve the problem of multiple solutions, the least squares method described in section 2.3 provides consistent initial estimates.

In Chapter 4 we apply the methodology developed in the present chapter to monthly data on the U.S. commercial loan market. Before proceeding to the application, however, the problem of serial correlation must be addressed. In the next chapter we develop some results which permit the data to be analyzed in the context of possible serial correlation.

NOTES

1 There is an important difference between the model Lee and Porter (1984) discuss, and our model. The Lee-Porter model excludes an analog to $Q_t = \min(D_t, S_t)$, and consequently in their model the switching is exogenous; i.e., the switching that occurs between the underlying regimes is independent of the error terms. In contrast, the disequilibrium model is one of endogenous switching (the "switch" depends on $(\varepsilon_{1t},\varepsilon_{2t})$), and consequently many of the results, interpretations, and expressions found in the Lee-Porter paper must be modified accordingly.

2 In fact, given appropriate regularity conditions, consistent initial estimates ensure the consistency and asymptotic normality of the second-round estimates from a Newton-Raphson type algorithm. See, for example, Amemiya (1973, pp. 1014-15).

3 Olsen (1978) proved that the likelihood function for the Tobit model is globally concave when suitably parameterized, and thus has a single maximum.

CHAPTER 3
SOME ASYMPTOTIC THEORY FOR SERIALLY DEPENDENT OBSERVATIONS

3.1 Introduction

In this chapter we examine the asymptotic properties of the estimator discussed in section 2.1,

$$\hat\theta_n^{ml} = \arg\max_{\theta_p} L_n(\theta_p),$$

where $\theta_p = (\theta,p_{11},p_{10})$, and $L_n(\theta_p)$ is defined on page 14, equation (2.3). If the observations are serially independent, then obviously $\hat\theta_n^{ml}$ is the MLE of $\theta_p^0$. However, for serially dependent observations, $\hat\theta_n^{ml}$ is not the MLE and will be referred to as the partial-MLE.

Hartley and Mallela (1977), and Amemiya and Sen (1977) derive asymptotic properties of the MLE for the special case of $p_{11}=p_{10}$. We will extend their results to the case of serially dependent observations and $p_{11}\neq p_{10}$ in sections 3.2 and 3.3. In section 3.4 we consider the problem of consistently estimating the asymptotic covariance matrix. In section 3.5 we derive a new test for serial correlation.

3.2 Consistency

Since disequilibrium models are typically estimated with time series data, it is of interest that the property of consistency can be extended to the partial-MLE. Using some results and definitions presented by White and Domowitz (1981), Levine (1983) has discussed how and why a partial-MLE can be consistent. Levine points out that the consistency of an estimator $\hat\theta_n(y)$ which maximizes the product $\prod_{t=1}^n f_t(y_t|\theta)$ depends on each $f_t(y_t|\theta)$ satisfying certain regularity conditions. In general, whether or not $f_t(y_t|\theta)$ satisfies such conditions does not depend on the product being the joint density of $y = (y_1,\ldots,y_n)$, but rather it usually suffices that $f_t(\cdot|\theta)$ is the marginal density of $y_t$. The regularity conditions consist of identification conditions, and moment restrictions sufficient to apply an appropriate law of large numbers. We will show that the partial-MLE for our model can satisfy such conditions by extending some results proven by Hartley and Mallela, and Amemiya and Sen. But first it is necessary to describe the type of dependence we have in mind.

We will adopt the nonparametric approach of White and Domowitz (1984) to allow for the possibility of serial correlation. The approach of White and Domowitz is nonparametric in the sense that the observations are not required to be generated by a known parametric model such as an ARMA(p,q) process, but instead must obey general memory requirements. The memory requirements are referred to as mixing conditions, and a sequence of random variables which obey mixing conditions is said to be a mixing sequence. More precisely, we have the following definition.¹

Definition 3.1. Let $\{y_t\}$ denote a sequence of random vectors defined on a probability space $(\Omega,F,P)$, and let $F_a^b$ denote the Borel $\sigma$-field of events generated by the random variables $y_a, y_{a+1},\ldots,y_b$. Define

$$\phi(m) = \sup_n \sup\{|P(A|B)-P(A)| : A\in F_{n+m}^{\infty},\ B\in F_{-\infty}^{n}\}$$

and

$$\alpha(m) = \sup_n \sup\{|P(B\cap A)-P(B)P(A)| : A\in F_{n+m}^{\infty},\ B\in F_{-\infty}^{n}\}.$$

(i) If $\phi(m)\to 0$ as $m\to\infty$, and $\phi(m)=O(m^{-k})$ for $k>r/(2r-1)$, where $r\ge 1$, then $\{y_t\}$ is a mixing sequence with $\phi(m)$ of size $r/(2r-1)$.

(ii) If $\alpha(m)\to 0$ as $m\to\infty$, and $\alpha(m)=O(m^{-k})$ for $k>r/(r-1)$, where $r>1$, then $\{y_t\}$ is a mixing sequence with $\alpha(m)$ of size $r/(r-1)$.

$\phi(m)$ and $\alpha(m)$ measure how much dependence exists between observations at least $m$ periods apart. A sequence such that $\phi(m)\to 0$ as $m\to\infty$ is called uniform mixing or $\phi$-mixing, and a sequence for which $\alpha(m)\to 0$ as $m\to\infty$ is called strong mixing or $\alpha$-mixing. Since the dependence coefficients, $\phi(m)$ and $\alpha(m)$, are required to vanish asymptotically, mixing is a form of asymptotic independence. A fairly large class of processes satisfy mixing conditions. For example, finite order Gaussian ARMA processes are strong mixing, as are stationary Markov chains under fairly general conditions. White and Domowitz (1984) show that measurable functions of mixing processes are mixing and of the same size. This is particularly convenient for nonlinear problems. Mixing processes are useful for modeling complex economic data since they are not required to be stationary. In short, mixing conditions provide a convenient way to model an economic phenomenon that is likely to be both heterogeneous and time dependent.

The following law of large numbers, due to McLeish (1975), applies for mixing sequences.²

Theorem 3.2. Let $\{y_t\}$ be a sequence with $\phi(m)$ of size $r/(2r-1)$ or $\alpha(m)$ of size $r/(r-1)$, $r>1$, such that $E|y_t|^{r+d}\le\Delta<\infty$ for some $d>0$, and all $t$. Then

$$(1/n)\sum_{t=1}^{n}(y_t-E(y_t)) \xrightarrow{p} 0.$$

All proofs of theorems in this Chapter are provided in Appendix A.3.

For Theorem 3.2 to be applicable to a given sequence, it is clear that there is a trade-off between the moment restriction that the sequence must satisfy, and allowable dependence. The stronger the moment restriction satisfied, the more dependence as measured by $\phi(m)$ (or $\alpha(m)$) is allowed. If the members of the sequence are independent, then we can set $r=1$, and Theorem 3.2 collapses to the Markov law of large numbers. For sequences with exponential memory decay, $r$ can be set arbitrarily close to one. In general, the longer the memory of a sequence, the larger is the size of $\phi(m)$ and $\alpha(m)$, and consequently the more stringent the moment restriction (which depends on $r$) becomes.
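A small simulation can make the notions of asymptotic independence and the law of large numbers for dependent data concrete. The sketch below assumes a stationary Gaussian AR(1) process (one example of a strong mixing sequence mentioned above) with a hypothetical coefficient of 0.8; it shows that sample autocorrelations die out with the lag and that the sample mean settles near the true mean of zero despite the serial dependence. The helper names are ours.

```python
import numpy as np

def simulate_ar1(n, rho, rng):
    """Stationary Gaussian AR(1): y_t = rho * y_{t-1} + e_t, e_t ~ N(0, 1)."""
    e = rng.normal(size=n)
    y = np.empty(n)
    y[0] = e[0] / np.sqrt(1.0 - rho ** 2)   # draw from the stationary distribution
    for t in range(1, n):
        y[t] = rho * y[t - 1] + e[t]
    return y

def sample_autocorr(y, lag):
    """Sample autocorrelation of y at the given positive lag."""
    z = y - y.mean()
    return float(z[lag:] @ z[:-lag] / (z @ z))

rng = np.random.default_rng(1)
y = simulate_ar1(200_000, 0.8, rng)          # 0.8 is a hypothetical value

# Dependence between observations m periods apart vanishes as m grows.
for lag in (1, 5, 20):
    print("lag", lag, "autocorrelation", sample_autocorr(y, lag))

# The sample mean approaches E(y_t) = 0 as n grows, as the theorem suggests.
for n in (100, 10_000, 200_000):
    print("n", n, "sample mean", y[:n].mean())
```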

By using mixing conditions to restrict the serial behavior of the sequence $(Q_t,1(\Delta p_{t+1}>0),x_t)$, it is not necessary to specify an additional parametric model such as an ARMA(p,q) process. Consequently, one possible source of model misspecification is eliminated. Mixing conditions enable us to include a larger class of models in the analysis. Of course, as Theorem 3.2 implies, the precise size of the class will depend on what moment restrictions are satisfied. We are now ready to state conditions which ensure the consistency of the partial-MLE (and the MLE) of $\theta_p^0$.

In order to establish consistency for the partial-MLE we impose the following assumptions on the disequilibrium model presented in section 2.2:

Assumption 3.3. (allowable serial dependence): The sequence $(Q_t,1(\Delta p_{t+1}>0),x_t)$ is a mixing sequence with $\phi(m)$ of size $r/(2r-1)$, $r\ge 1$, or $\alpha(m)$ of size $r/(r-1)$, $r>1$.

Assumption 3.4. (distributions):

(i) The random vector $(\varepsilon_{1t},\varepsilon_{2t})$ is normally distributed with mean zero and covariance matrix

$$\begin{pmatrix} (\sigma_1^0)^2 & 0 \\ 0 & (\sigma_2^0)^2 \end{pmatrix}.$$

(ii) Assumptions 2.1 and 2.2 hold. (See page 13.)

Assumption 3.5. (the regressors):

(i) The vector $x_t$ consists of only exogenous variables.

(ii) Each component of $x_t$ is uniformly bounded in $t$, has a finite range for each $t$, and a support given by $S_t=S$ for all $t$.

(iii) Any linear combination of the components of $x_t$ where the coefficients are not all zero is not zero with probability one.

Assumption 3.6. (the parameter space):

(i) The parameter space $\Theta$ includes the true parameter vector $\theta_p^0 = (\beta_1^0, (\sigma_1^0)^2, \beta_2^0, (\sigma_2^0)^2, p_{11}^0, p_{10}^0)$, excludes the region $\sigma_i^2\le 0$ $(i=1,2)$ and $p_{10}>p_{11}$, and is a compact subset of a Euclidean space.

(ii) If the set $\Theta$ includes points such that $p_{11}=p_{10}$, then it excludes the point $\theta_p^* = (\beta_2^0, (\sigma_2^0)^2, \beta_1^0, (\sigma_1^0)^2, p_{11}^0, p_{10}^0)$. Otherwise $\Theta$ may include $\theta_p^*$.

With a few exceptions, the conditions on the regressors and the parameter space are identical to those given by Hartley and Mallela (1977), and Amemiya and Sen (1977). One exception is that we place no restrictions on the limiting behavior of the empirical distribution of the regressors, whereas Hartley and Mallela require it to converge completely to a nondegenerate distribution. As pointed out by White (1980), in sampling situations where the researcher has little control over the data, it is important to allow for the possibility that the empirical distribution does not converge. In contrast to Amemiya and Sen, we do not require the regressors to be i.i.d., but for convenience retain their assumption that the regressors are discrete random variables.

Assumption 3.6(ii) is necessary to identify the true parameter vector $\theta_p^0$. Without appropriate prior information on $\beta_1^0$ and $\beta_2^0$, the point $\theta_p^*$ is indistinguishable from $\theta_p^0$ and the model cannot be estimated. This is the problem of interchanging regimes which is discussed by Hartley and Mallela, and Amemiya and Sen. Both studies point out that $\theta_p^*$ is eliminated from the parameter space if the usual "order condition" holds. We will extend this result below by showing that for $\theta_p^0$ to be distinguishable from $\theta_p^*$ it suffices to know a priori that $p_{11}>p_{10}$. In this sense prior sample separation information represents prior information on the supply and demand parameters.

Hoadley (1971) has generalized the Wald argument to the case of independent not identically distributed observations. Theorem 3.7 below is an extension of Hoadley's argument to mixing sequences, and will be used to verify that assumptions 3.3, 3.4, 3.5, and 3.6 imply consistency for the partial-MLE, $\hat\theta_n^{ml}$.³

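Theorem 3.7. Suppose:

(i) The sequence $\{y_t\}$ is a mixing sequence with $\phi(m)$ of size $r/(2r-1)$, $r\ge 1$, or $\alpha(m)$ of size $r/(r-1)$, $r>1$.

(ii) The parameter space $\Theta$ is a compact subset of a Euclidean space.

(iii) The function $f_t(y_t|\theta)$ is continuous on $\Theta$, uniformly in $t$, a.e.

(iv) The function

$$\sup\{\ln(f_t(y_t|\theta')/f_t(y_t|\theta^0)) : |\theta'-\theta|\le\rho\}$$

is a measurable function of $y_t$ for each $\theta$ belonging to $\Theta$.

(v) There exists $\rho(\theta)>0$, and $d>0$ such that

$$E\left|\sup\{\ln(f_t(y_t|\theta')/f_t(y_t|\theta^0)) : |\theta'-\theta|\le\rho\}\right|^{r+d}\le\Delta<\infty$$

for $0\le\rho\le\rho(\theta)$.

(vi) For $\theta\neq\theta^0$,

$$\limsup_{n}\ \Big\{n^{-1}\sum_{t=1}^{n}E(\ln(f_t(y_t|\theta)/f_t(y_t|\theta^0)))\Big\}<0.$$

Let $\hat\theta_n(y)$ be a function of the observations $y=(y_1,\ldots,y_n)$ which solves the problem

$$\max_{\theta}\ \prod_{t=1}^{n} f_t(y_t|\theta).$$

Then plim $\hat\theta_n(y)=\theta^0$.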

To show that the partial-MLE $\hat\theta_n^{ml}$ is a consistent estimator of $\theta_p^0$ we verify that $\Theta$, $(Q_t,1(\Delta p_{t+1}>0),x_t)$, and $f_t(Q_t,1(\Delta p_{t+1}>0)\,|\,\theta_p)$ satisfy 3.7(i)-(vi) given assumptions 3.3 - 3.6.

The fact that the mixing and compactness requirements 3.7(i) and 3.7(ii) are satisfied follows directly from assumptions 3.3 and 3.6(i).

Lemma 3.8 establishes that $f_t(Q_t,1(\Delta p_{t+1}>0)\,|\,\theta_p)$ satisfies the continuity requirement 3.7(iii).

Lemma 3.8. Given assumptions 3.4 - 3.6, $f_t(Q_t,1(\Delta p_{t+1}>0)\,|\,\theta_p)$ is a continuous function of $\theta_p$ uniformly in $t$, a.e.

Lemma 3.9 establishes that the measurability requirement, 3.7(iv), is satisfied.

Lemma 3.9. Given assumptions 3.3 - 3.6, the function

$$\sup\{\ln(f_t(Q_t,1(\Delta p_{t+1}>0)\,|\,\theta_p')/f_t(Q_t,1(\Delta p_{t+1}>0)\,|\,\theta_p^0)) : |\theta_p'-\theta_p|\le\rho\}$$

is a measurable function of $(Q_t,1(\Delta p_{t+1}>0),x_t)$.

The moment restriction, 3.7(v), together with 3.7(i) determines the amount of dependence allowable. The following lemma extends Hartley and Mallela's Corollary 4.2, and establishes that 3.7(v) is satisfied for large $r+d$.

Lemma 3.10. Given assumptions 3.3 - 3.6, for all sufficiently small $\rho=\rho(\theta_p)>0$,

$$E\left|\sup\{\ln(f_t(Q_t,1(\Delta p_{t+1}>0)\,|\,\theta_p')/f_t(Q_t,1(\Delta p_{t+1}>0)\,|\,\theta_p^0)) : |\theta_p'-\theta_p|\le\rho\}\right|^{k}<\infty,$$

where $k$ is any positive integer.

Finally, Lemma 3.11 establishes that the identification condition, 3.7(vi), is satisfied. Lemma 3.11 extends Amemiya and Sen's Lemmas 2 and 3 to the case of $p_{11}\neq p_{10}$.

Lemma 3.11. Given assumptions 3.3 - 3.6, for $\theta_p\neq\theta_p^0$ there exists a negative constant $b(\theta_p)$ such that

$$E(\ln(f_t(Q_t,1(\Delta p_{t+1}>0)\,|\,\theta_p)/f_t(Q_t,1(\Delta p_{t+1}>0)\,|\,\theta_p^0)))\le b(\theta_p).$$

We have proven the following theorem.

Theorem 3.12. Given assumptions 3.3 - 3.6, plim $\hat\theta_n^{ml}=\theta_p^0$.

3.3 Asymptotic Normality

Under the assumption that $(Q_t,1(\Delta p_{t+1}>0),x_t)$ is a mixing sequence, we consider the limiting distribution of

$$n^{-1/2}\,V_n^{-1/2}(\theta_p^0)\,\nabla L_n(\theta_p^0),$$

where $\nabla L_n(\theta_p^0)$ denotes the gradient vector corresponding to $L_n(\theta_p^0)$, and $V_n(\theta_p^0) = \mathrm{var}(n^{-1/2}\nabla L_n(\theta_p^0))$. We will discuss conditions that imply asymptotic normality; that is,

$$n^{-1/2}\,V_n^{-1/2}(\theta_p^0)\,\nabla L_n(\theta_p^0)\ \overset{A}{\sim}\ N(0,I), \qquad (3.13)$$

where $I$ denotes an identity matrix of appropriate dimensions. The results in this section together with those in the next section permit derivation of asymptotic test statistics.

As is well known, asymptotic normality is proven by an appropriate application of a central limit theorem. As with consistency, the conditions sufficient for asymptotic normality depend on the degree of dependence and heterogeneity the sequence exhibits. For a sequence of independent identically distributed random vectors, we have the Lindeberg-Levy Theorem; for independent not identically distributed we have the Lindeberg-Feller Theorem; for dependent identically distributed we have the central limit theorem of Gordin (1969); for dependent not identically distributed we have the central limit theorem of Serfling (1968).

For the case of independent observations, Hartley and Mallela (1977) prove the asymptotic normality result (3.13) by applying a version of the Lindeberg-Feller Theorem. However, by specifying the sequence $(Q_t,1(\Delta p_{t+1}>0),x_t)$ as mixing, a more general result is possible. The following theorem is based on Theorem 2.4 of White and Domowitz (1984) which generalizes Serfling's (1968) central limit theorem.

Theorem 3.14. Suppose:

(i) Let $\nabla L_n(\theta_p^0) = \sum_{t=1}^{n}\nabla L_t(\theta_p^0)$. Then $E(\nabla L_t(\theta_p^0))=0$ for all $t$.

(ii) Let $\lambda$ be any nonzero vector, and define

$$\nabla L_{n,a}(\theta_p^0) = n^{-1/2}\sum_{t=1+a}^{n+a}\nabla L_t(\theta_p^0).$$

Then there exists a matrix $V$ such that $\det(V)>0$, and

$$E(\lambda\,\nabla L_{n,a}(\theta_p^0)\,\nabla L_{n,a}(\theta_p^0)^T\lambda^T) - \lambda V\lambda^T \to 0$$

as $n\to\infty$ uniformly in $a$.

(iii) $E|\nabla L_t(\theta_p^0)|^{2r}\le\Delta<\infty$ for some $r>1$.

If $\phi(m)$ or $\alpha(m)$ is of size $r/(r-1)$, then (3.13) holds.

Condition 3.14(i) is the familiar condition that the vector of likelihood equations, when evaluated at the true parameter vector $\theta_p^0$, has zero expectation. Sufficient conditions for 3.14(i) are (1) the model is correctly specified, and (2) the density of $(Q_t,1(\Delta p_{t+1}>0),x_t)$ is sufficiently regular to permit differentiation under the integral sign.

Condition 3.14(ii) is somewhat restrictive, but unfortunately a less restrictive replacement for it is currently not available. Condition 3.14(ii) restricts the heterogeneity of $\nabla L_t(\theta_p^0)$ by requiring it to be covariance stationary asymptotically.

Condition 3.14(iii) is a moment condition which depends on the amount of dependence the sequence $(Q_t,1(\Delta p_{t+1}>0),x_t)$ exhibits. If the sequence is serially independent, then $r$ can be set arbitrarily close to one; as the amount of dependence increases, as measured by $\phi(m)$ or $\alpha(m)$, $r$ increases accordingly.

3.4 Consistent Covariance Estimation

We consider the problem of deriving consistent estimators for the asymptotic covariance matrix of the partial-MLE $\hat\theta_n^{ml}$. The expression for the asymptotic covariance matrix is

$$n^{2}\,\nabla^2\bar L_n(\theta_p^0)^{-1}\,V_n(\theta_p^0)\,\nabla^2\bar L_n(\theta_p^0)^{-1},$$

where $V_n(\theta_p^0)=\mathrm{var}(n^{-1/2}\nabla L_n(\theta_p^0))$, and $\nabla^2\bar L_n(\theta_p^0)$ is the matrix of second order partial derivatives of $\bar L_n(\theta_p^0) = E(L_n(\theta_p^0))$.

First consider the problem of consistently estimating the term $n\nabla^2\bar L_n(\theta_p^0)^{-1}$. The functional form of this term does not depend on the serial dependence (or independence) of the observations, and therefore consistent estimation of it is straightforward. The following theorem, which combines Lemma 2.6 of White (1980) with Theorem 2.3 of White and Domowitz (1984), provides conditions that imply

$$\mathrm{plim}\ n\,(\nabla^2 L_n(\hat\theta_n^{ml})^{-1} - \nabla^2\bar L_n(\theta_p^0)^{-1}) = 0.$$

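Theorem 3.15. Let $q_t(y_t,\theta)$ be measurable for each $\theta$ belonging to a compact set $\Theta$, and continuous on $\Theta$ uniformly in $t$ a.e.

Suppose

(i) The sequence $\{y_t\}$ is mixing as stated in Definition 3.1.

(ii) For $r\ge 1$ and any $d>0$,

$$\sup_{\theta}\ E|q_t(y_t,\theta)|^{r+d}<\infty.$$

If plim $\hat\theta_n=\theta^0$, then

$$\mathrm{plim}\ n^{-1}\sum_{t=1}^{n}\big(q_t(y_t,\hat\theta_n) - E(q_t(y_t,\theta^0))\big)=0.$$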

Next consider the problem of consistently estimating $V_n(\theta_p^0)$. Unlike the term $\nabla^2\bar L_n(\theta_p^0)$, the functional form of $V_n(\theta_p^0)$ depends on the nature of the serial correlation, and consequently special care must be taken. The general form for $V_n(\theta_p^0)$ is

$$V_n(\theta_p^0;c(n)) = n^{-1}\sum_{t=1}^{n}E(f_t(\theta_p^0)f_t(\theta_p^0)^T) + n^{-1}\sum_{s=1}^{c(n)-1}\ \sum_{t=s+1}^{n}E[f_t(\theta_p^0)f_{t-s}(\theta_p^0)^T + f_{t-s}(\theta_p^0)f_t(\theta_p^0)^T],$$

where $f_t(\theta_p)=\nabla\log f_t(Q_t,1(\Delta p_{t+1}>0)\,|\,\theta_p)$ is the gradient vector, $f_t(\cdot\,|\,\theta_p)$ is the density of $(Q_t,1(\Delta p_{t+1}>0),x_t)$, and $c(n)$ is such that $E(f_t(\theta_p^0)f_{t-s}(\theta_p^0)^T) = 0$ for $s\ge c(n)$. The natural choice for an estimator of $V_n(\theta_p^0;c(n))$ is the sample analogue $\hat V_n(\hat\theta_n^{ml};c(n))$.

The consistency of such an estimator, however, depends on the asymptotic behavior of $c(n)$. We will consider two special cases.

Case 1. c(n) = c

If $c(n)$ is equal to a known finite constant $c$ which is less than or equal to the sample size minus one (if $c=n$, then the estimator $\hat V_n(\hat\theta_n^{ml};c)=0$), then imposing the conditions of Theorem 3.15 will suffice for

$$\mathrm{plim}\ (\hat V_n(\hat\theta_n^{ml};c) - V_n(\theta_p^0;c))=0.$$

An example of a sampling situation where $c$ is a known finite constant would be one in which the observations are known to be generated from a moving-average process of order $c$.

If $c$ is assumed to be constant and less than or equal to $n-1$, but otherwise unknown, then the problem becomes more complicated. Let $\bar c$ denote the specified choice for an unknown $c$. In the next section we derive an asymptotic test for the hypothesis $c=\bar c$. The test is a possible criterion for specifying $\bar c$. The issues involved in specifying $\bar c$ are the following. If we specify $\bar c<c$, then the estimator $\hat V_n(\hat\theta_n^{ml};\bar c)$ is inconsistent since nonzero terms in $V_n(\theta_p^0;c)$ are mistakenly constrained to be zero. On the other hand, if we specify $\bar c>c$, then the estimator is consistent, but inefficient since restrictions of the form $E(f_i(\theta_p^0)f_j(\theta_p^0)^T)=0$ are neglected. When the purpose of estimating $V_n(\theta_p^0;c)$ is to construct asymptotic test statistics, however, the essential requirement is consistency (rather than efficiency). Therefore, when the purpose is hypothesis testing, the choice $\bar c>c$ is preferable to $\bar c<c$.

Case 2. $\lim_{n\to\infty}c(n)=\infty$ and $\lim_{n\to\infty}E(f_t(\theta_p^0)f_{t-c(n)}(\theta_p^0)^T)=0$.

In this case the sequence $\{f_t(\theta_p^0)\}$ is only assumed to be asymptotically uncorrelated. A sufficient condition for $\{f_t(\theta_p^0)\}$ to be asymptotically uncorrelated is that the sequence $(Q_t,1(\Delta p_{t+1}>0),x_t)$ be mixing. Theoretical results for this case have been presented by White and Domowitz (1984), White (1984), and Newey and West (1985). Their results depend on restricting the growth rate of $c(n)$. Unfortunately, their results do not give any guidance concerning the choice of $c(n)$ for finite samples. The following theorem is due to Newey and West (1985), and provides sufficient conditions for plim $(\hat V_n(\hat\theta_n^{ml};c(n)) - V_n(\theta_p^0;c(n)))=0$.

Theorem 3.16. Suppose

(i) $f_t(\theta_p)$ is measurable in $(Q_t,1(\Delta p_{t+1}>0),x_t)$ for each $\theta_p$, and continuously differentiable in $\theta_p$ in a neighborhood $N$ of $\theta_p^0$.

(ii) (a) $\sup_{\theta_p\in N}|f_t(\theta_p)|^2\le\Delta<\infty$.

(b) There are finite constants $d>0$ and $r\ge 1$ such that $E|f_t(\theta_p^0)|^{4(r+d)}\le\Delta'<\infty$.

(iii) $(Q_t,1(\Delta p_{t+1}>0),x_t)$ is a mixing sequence with $\phi(m)$ of size 2 or $\alpha(m)$ of size $2(r+d)/(r+d-1)$, $r>1$.

(iv) For all $t$, $E(f_t(\theta_p^0))=0$, and $n^{1/2}(\hat\theta_n^{ml}-\theta_p^0)$ is bounded in probability.

If $\lim c(n)=\infty$ such that $c(n)=o(n^{1/4})$, then

$$\mathrm{plim}\ (\hat V_n(\hat\theta_n^{ml};c(n)) - V_n(\theta_p^0;c(n)))=0.$$

One additional problem is that for $c(n)>1$ the estimate $\hat V_n(\hat\theta_n^{ml};c(n))$ is not necessarily positive semi-definite. This can lead to negative estimates of the variances and test statistics which are clearly not acceptable. To ensure that $\hat V_n(\hat\theta_n^{ml};c(n))$ is positive definite, the summands can be weighted according to a procedure described in Newey and West (1985). This modification does not affect the consistency of the estimate.

3.5 An Asymptotic Test for Serial Correlation

In this section we propose a test sensitive to serial correlation in the gradient vectors $f_t(\theta_p^0)$. The test provides a criterion for specifying the constant $\bar c$ of the covariance estimator $\hat V_n(\hat\theta_n^{ml};\bar c)$. The null hypothesis of interest is

$$H_0:\ E(f_t(\theta_p^0)_i\,f_{t-c}(\theta_p^0)_j)=0 \text{ for all } i,j,$$

where $f_t(\theta_p^0)_i$ denotes the $i$-th component of the vector $f_t(\theta_p^0)$. The basis for a test of $H_0$ comes from two observations.

(1) Under $H_0$, linear combinations of the components of the vector $f_t(\theta_p^0)$ are uncorrelated with linear combinations of the components of the vector $f_{t-c}(\theta_p^0)$.

(2) Under $H_0$, the products $f_t(\hat\theta_n^{ml})_i\,f_{t-c}(\hat\theta_n^{ml})_j$ should be close to zero for sufficiently large $n$.

Therefore, a reasonable strategy for testing $H_0$ would be to compute the sample correlation between appropriate linear combinations, and reject $H_0$ if the sample correlation is too large in some sense. To this end, for a $k$-dimensional vector $f_t(\theta_p^0)$, consider the artificial regression

$$\sum_{i=1}^{k}w_{it}\,f_t(\hat\theta_n^{ml})_i = \sum_{i=1}^{k}a_i\,f_{t-c}(\hat\theta_n^{ml})_i + v_t,$$

where the $w_{it}$ are known constants such that $\sum_{i=1}^{k}w_{it}=1$, and the $a_i$ are unknowns to be estimated. The test we propose entails computing the OLS estimates $\hat a_i^{ls}$, $i=1,\ldots,k$, and testing the hypothesis $a_1=\cdots=a_k=0$. More formally, we have the following theorem.

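Theorem 3.17. Define $a^T = (a_1,\ldots,a_k)$,

$$f_{-c}(\theta_p) = \begin{pmatrix} f_1(\theta_p)_1 & \cdots & f_1(\theta_p)_k \\ \vdots & & \vdots \\ f_{n-c}(\theta_p)_1 & \cdots & f_{n-c}(\theta_p)_k \end{pmatrix}\ ((n-c)\times k), \qquad f(\theta_p) = \begin{pmatrix} \sum_{i=1}^{k}w_{i,c+1}\,f_{c+1}(\theta_p)_i \\ \vdots \\ \sum_{i=1}^{k}w_{i,n}\,f_n(\theta_p)_i \end{pmatrix}\ ((n-c)\times 1).$$

(i) The vector-valued function $f(\theta_p)$ is continuously differentiable (component by component) on an open convex set $\Theta_0\subset R^k$ containing $\theta_p^0$.

(ii) There exists an open neighborhood of $\theta_p^0$, $N$, such that

$$\sup_{\theta_p\in N}|f_t(\theta_p)_i|<\infty, \qquad \sup_{\theta_p\in N}|\partial f_t(\theta_p)_i/\partial\theta_p|<\infty.$$

(iii) plim $n^{-1}\sum_{t=1}^{n}f_t(\theta_p^0)=0$.

(iv) Let $A_n(\theta_p)=n^{-1}f_{-c}(\theta_p)^T f_{-c}(\theta_p)$, and $\bar A_n(\theta_p) = E(A_n(\theta_p))$. Then there exists an open neighborhood of $\theta_p^0$, $N^0$, such that $\bar A_n(\theta_p)$ is positive definite on $N^0$ for all $n$ sufficiently large and

$$\mathrm{plim}\ \sup_{\theta_p\in N^0}|A_n(\theta_p) - \bar A_n(\theta_p)|=0.$$

(v) Let $U_n(\theta_p^0) = \mathrm{var}(n^{-1/2}f_{-c}(\theta_p^0)^T f(\theta_p^0))$, and let $\hat U_n(\hat\theta_n^{ml})$ denote the sample analogue. Then $U_n(\theta_p)$ is positive definite on an open neighborhood of $\theta_p^0$ for all $n$ sufficiently large and

$$\mathrm{plim}\ (\hat U_n(\hat\theta_n^{ml}) - U_n(\theta_p^0)) = 0.$$

(vi) Under $H_0$, $U_n(\theta_p^0)^{-1/2}\,n^{-1/2}f_{-c}(\theta_p^0)^T f(\theta_p^0)\ \overset{A}{\sim}\ N(0,I)$.

Let $D_n(\theta_p)=A_n^{-1}(\theta_p)\,U_n(\theta_p)\,A_n^{-1}(\theta_p)$. Then given conditions (i)-(vi), and $H_0$,

$$n\,\hat a^{ls\,T}\,\hat D_n^{-1}(\hat\theta_n^{ml})\,\hat a^{ls}\ \overset{A}{\sim}\ \chi^2_k.$$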
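The mechanics of the artificial-regression test can be sketched in code. The following is a simplified illustration rather than the dissertation's procedure: it assumes weights that are constant over time (equal weights by default), normalizes by the number of usable rows `n - c`, and estimates the middle matrix by a simple outer-product (White-type) sample analogue, which presumes the regression products are themselves serially uncorrelated. The function name, the simulated scores, and the hard-coded 5% critical value for two degrees of freedom (5.991) are ours.

```python
import numpy as np

def serial_corr_stat(scores, c, weights=None):
    """Regress a weighted combination of the scores f_t on the lagged scores
    f_{t-c} and return (OLS coefficients, quadratic-form statistic).

    Under the null of no correlation at lag c the statistic is compared with
    a chi-square critical value with k degrees of freedom.
    """
    f = np.asarray(scores, dtype=float)
    n, k = f.shape
    if not 1 <= c < n:
        raise ValueError("lag c must satisfy 1 <= c < n")
    w = np.full(k, 1.0 / k) if weights is None else np.asarray(weights, float)
    y = f[c:] @ w                      # weighted combination of f_t, t = c+1..n
    x = f[:-c]                         # lagged scores f_{t-c}
    m = n - c
    a_ls, *_ = np.linalg.lstsq(x, y, rcond=None)
    a_mat = x.T @ x / m                # sample analogue of A_n
    g = x * y[:, None]                 # rows are f_{t-c} times the combination
    u_mat = g.T @ g / m                # simplified sample analogue of U_n
    a_inv = np.linalg.inv(a_mat)
    d_mat = a_inv @ u_mat @ a_inv
    stat = float(m * a_ls @ np.linalg.solve(d_mat, a_ls))
    return a_ls, stat

rng = np.random.default_rng(0)
n = 2000
iid_scores = rng.normal(size=(n, 2))               # no serial correlation
dep_scores = np.empty((n, 2))                      # AR(1) scores, coefficient 0.9
dep_scores[0] = rng.normal(size=2)
for t in range(1, n):
    dep_scores[t] = 0.9 * dep_scores[t - 1] + rng.normal(size=2)

_, stat_iid = serial_corr_stat(iid_scores, c=1)
_, stat_dep = serial_corr_stat(dep_scores, c=1)
print("independent scores:", stat_iid)
print("dependent scores:  ", stat_dep, "(5% critical value for k=2: 5.991)")
```

In practice the statistic would be computed for increasing lags until the null is no longer rejected, and that lag would guide the truncation point of the covariance estimator.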

3.6 Summary and Conclusions

The main points of this chapter are the following:

(1) The assumptions presented in sections 3.2 and 3.3 imply that the partial-MLE of the disequilibrium model is consistent and asymptotically normal. The assumptions allow for serial correlation of an unknown form; for example, an arbitrary ARMA process is allowable for the observations. At the same time, the estimator $\hat\theta_n^{ml}$ is computed as though the observations were serially independent, and thus computational tractability is retained.

(2) To calculate asymptotic test statistics, a consistent estimate of the asymptotic covariance matrix is needed. Obtaining a consistent covariance estimator is complicated by the need to specify a constant $c$ such that $E(f_t(\theta_p^0)f_{t-s}(\theta_p^0)^T)=0$ for all $s\ge c$. In general, $c$ is unknown but consistent covariance estimation depends on specifying a $\bar c$ such that $\bar c>c$.

(3) The test statistic presented in Section 3.5 permits a test of $H_0: c=\bar c$, and thus provides a criterion for specifying $\bar c$.

NOTES

1 Our discussion of mixing draws heavily on White and Domowitz (1984), and White (1984, pp. 43-47).

2 Theorem 3.2 is a less general version of the law of large numbers presented by McLeish (1975, Theorem 2.10). The version we present is discussed in White (1984, Corollary 3.48), and imposes a stronger but simpler moment restriction.

3 White and Domowitz (1984) extend Hoadley's Theorem A.5, which is a uniform law of large numbers, to mixing sequences by applying Theorem 2.10 of McLeish (1975) instead of Markov's law of large numbers. Here we merely point out that Hoadley's Theorem 1 can be extended to mixing sequences using the same technique.

In some respects the conditions of Theorem 3.7 are stronger than those stated in Hoadley's Theorem 1. For example, the requirement that $f_t(y_t|\theta)$ is continuous can be replaced by upper semi-continuity. The conditions that we state are sufficiently general for our purposes.

CHAPTER 4
AN EMPIRICAL EXAMPLE: THE U.S. COMMERCIAL LOAN MARKET

4.1 Introduction

In this chapter the disequilibrium model described in section 2.2 (page 12) is fitted to monthly data on the U.S. commercial loan market from 1979 to 1984. The problem is to analyze disequilibrium supply and demand behavior with limited a priori information imposed on the price adjustment process. The model is estimated and tested with the partial-MLE and least squares method described in sections 2.2 and 2.3, respectively. The possibility of serial correlation is accounted for using methods described in Chapter three.

Disequilibrium models of commercial loan markets have been estimated by Laffont and Garcia (1977), Sealy (1979), and Ito and Ueda (1981). These works were consulted in designing the specification of the supply and demand equations. Our model and estimation methods, however, differ from the previous studies in three important respects. First, price enters the model differently. Laffont and Garcia, and Ito and Ueda constrained the price change to separate the sample, and Sealy assumed that price changes were a linear function of normal random variables. Second, the starting values we employ for maximizing the likelihood function are consistent estimates, and therefore ensure convergence to an asymptotically desirable solution. None of the above studies employed methods that guarantee this. Third, we will adopt the nonparametric approach developed in Chapter three to allow for the possibility of serial correlation. Given that the data are a time series, allowing for serial correlation is particularly important. Failure to do so can cause inconsistent covariance estimates and therefore misleading test statistics. In contrast, most existing disequilibrium studies, including those mentioned above, apply methods to time series data that are only appropriate for serially independent observations. The nonparametric approach was chosen for its generality and computational ease. An arbitrary ARMA process is allowable for the error terms, but at the same time the parameter estimators are computed as though the errors are serially independent. Relative to an assumption of serial independence, the only part of the problem that changes is the calculation of the asymptotic covariance estimate.

4.2 The Empirical Model

The empirical model to be estimated and tested is specified as follows.

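$$D_t = \beta_{10} + \beta_{11}(RL_t - RA_t) + \beta_{12}\,IP_{t-1} + \varepsilon_{1t},$$

$$S_t = \beta_{20} + \beta_{21}(RL_t - RT_t) + \beta_{22}\,TD_t + \varepsilon_{2t},$$

$$Q_t = \min(D_t, S_t),$$

where $p_{11} = \Pr(\Delta RL_{t+1}>0 \mid D_t>S_t)$ and $p_{10} = \Pr(\Delta RL_{t+1}>0 \mid D_t<S_t)$.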

The variables we use will differ little from those of the previous studies. The variable RL is the average prime rate charged by banks; RA is the Aaa corporate bond rate, and reflects the price of alternative financing to firms; IP is the industrial production index and measures firms' expectations about future economic activity; RT is the three month treasury bill rate, and represents an alternative rate of return for banks; TD is total bank deposits in billions of dollars, and is a scale variable. The observed quantity transacted, Q, is specified as the sum of commercial and industrial loans, and the relevant price change is $\Delta RL_{t+1} = RL_{t+1} - RL_t$. All interest rates are expressed as percentages. The sample consists of 72 observations on each variable, and can be found in various issues of the Federal Reserve Bulletin.

4.3 Hypothesis Testing Procedures

Two hypotheses concerning the price adjustment process, and several hypotheses concerning serial correlation were tested. The first price adjustment hypothesis maintains that the direction of the price change $1(\Delta p_{t+1} > 0)$ can be used to separate the sample into the underlying supply ($Q_t = S_t$) and demand ($Q_t = D_t$) regimes. The approach we have chosen to model price adjustment permits the known sample separation hypothesis to be conveniently expressed as

$$H_0: (p_{11}, p_{10}) = (1, 0).$$

The null hypothesis was tested by computing a Lagrange multiplier (LM) test. The LM test was chosen over the Wald and likelihood ratio tests because it only requires the estimates under the computationally simpler null hypothesis.

The second price adjustment hypothesis maintains that price adjustments are symmetrical in the following sense: the chance of a price increase during a shortage is the same as that of a decrease during a surplus. This hypothesis can be expressed as

$$H_0: p_{11} = 1 - p_{10}.$$

To test the hypothesis of symmetrical price adjustment, a Wald test was computed. The Wald test was chosen over the LM and likelihood ratio tests because it only requires the unconstrained estimates. In this case the constrained estimates (those obtained under $H_0$) offer no computational advantage over the unconstrained estimates.

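The LM and Wald test statistics converge to their usual chi-squared limiting distributions provided that:

(1) $n^{-1/2}\, V_n(\theta_p^0)^{-1/2}\, \nabla L_n(\theta_p^0) \overset{A}{\sim} N(0, I)$;

(2) a constant c is chosen such that $\operatorname{plim}\,(V_n(\hat\theta_n; c) - V_n(\theta_p^0)) = 0$.

If $\nabla L_n(\theta_p^0)$ is a k-dimensional vector, and both (1) and (2) hold, then we can conclude

$$n^{-1}\, \nabla L_n(\theta_p^0)^{T}\, V_n(\hat\theta_n; c)^{-1}\, \nabla L_n(\theta_p^0) \overset{A}{\sim} \chi^2_k.$$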

(See, for example, White (1984, Theorem 4.30)).

The specification of c was handled as follows. The LM statistic for the first $H_0$ and the Wald statistic for the second $H_0$ were each computed for several successive values of c. The LM statistic was computed for c = 1, ..., 12, and in each case the null hypothesis $(p_{11}, p_{10}) = (1, 0)$ was rejected. The Wald statistic, however, produced conflicting evidence for the hypothesis $p_{11} = 1 - p_{10}$; for some values of c the hypothesis was rejected, and for others it was accepted. To choose among the conflicting evidence, the test statistic for serial correlation (see Section 3.6) was computed for several values of c. On this basis c was specified, and a single covariance estimate for the Wald test was chosen. The covariance estimate chosen for the Wald test was also used to compute the asymptotic standard errors of the parameter estimates.

The test statistic for serial correlation depends on the correlation between linear combinations of the components of $f_t(\theta_p^0)$ and linear combinations of the components of $f_{t-c}(\theta_p^0)$. Therefore, the conclusion of the test depends on how the linear combinations are chosen, or in other words, on the specified weights $w_{it}$ (see page 36). For example, the test might reject $H_0$ for some set of weights, and not reject $H_0$ for other sets. To help cope with this difficulty, it was decided to choose the weights randomly from a uniform distribution on the interval (0,1). If there is a finite or countable number of sets of weights such that $H_0$ is incorrectly rejected or accepted, then choosing the weights from a continuous distribution ensures that, with probability one, none of these sets is chosen. The weights were generated from a uniform distribution by a SAS random number generator.

4.4 The Results

The model was estimated under the assumption that the error terms are independent normal variates with constant variances, but are not necessarily serially independent. First the LS method was applied. The LS estimates are reported in the first column of Table 1, and were used as starting values to obtain the ML estimates presented in the second and third columns. A computer program was written with the SAS "Matrix Procedure" for the purpose of maximizing the likelihood functions; the program uses the quadratic hill-climbing technique as presented in Goldfeld, Quandt, and Trotter (1966). In Appendix A.4 we describe the quadratic hill-climbing technique, and show that consistent initial estimates ensure that the second-round estimates obtained from the technique have the same asymptotic distribution as the partial-MLE.
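The idea behind quadratic hill-climbing can be illustrated with a short sketch. The Python code below is not the SAS program used for the estimates; it is a simplified variant, written under the assumption that the gradient and Hessian of the objective are available as functions. The names `quadratic_hill_climb`, `margin`, `tol`, and `max_iter` are hypothetical, and the fixed shift rule is a simplification of the adaptive rule in Goldfeld, Quandt, and Trotter (1966).

```python
import numpy as np

def quadratic_hill_climb(grad, hess, theta0, margin=1.0, tol=1e-8, max_iter=100):
    """Maximize a smooth function with a shifted-Hessian Newton iteration.

    grad, hess: callables returning the gradient vector and Hessian matrix.
    margin: hypothetical tuning constant; after shifting, every eigenvalue
            of H - alpha*I is at most -margin.
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        g = grad(theta)
        if np.linalg.norm(g) < tol:
            break
        H = hess(theta)
        # alpha = 0 reproduces the Newton step when H is already
        # sufficiently negative definite; otherwise the Hessian is shifted.
        alpha = max(0.0, np.linalg.eigvalsh(H).max() + margin)
        # H - alpha*I is negative definite, so theta - step moves uphill.
        step = np.linalg.solve(H - alpha * np.eye(theta.size), g)
        theta = theta - step
    return theta

# Concave quadratic: maximum at (1, -0.5).
quad_max = quadratic_hill_climb(
    grad=lambda t: np.array([-2.0 * (t[0] - 1.0), -4.0 * (t[1] + 0.5)]),
    hess=lambda t: np.diag([-2.0, -4.0]),
    theta0=[0.0, 0.0],
)

# Non-concave objective f(x) = -(x^2 - 1)^2, started where the Hessian is positive.
bimodal_max = quadratic_hill_climb(
    grad=lambda t: np.array([-4.0 * t[0] * (t[0] ** 2 - 1.0)]),
    hess=lambda t: np.array([[4.0 - 12.0 * t[0] ** 2]]),
    theta0=[0.1],
)
```

The second call shows why the shift matters: at the starting point the Hessian is positive, so an unmodified Newton step would move toward the local minimum at zero rather than toward a maximum.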

The estimates in column two of Table 1 maximize the likelihood subject to $(p_{11}, p_{10}) = (1, 0)$, or equivalently, under the assumption that the direction of the price change separates the sample into the underlying supply and demand regimes. Unlike previous studies, a test of this hypothesis was carried out. The constrained estimates were used to construct the Lagrange multiplier (LM) statistics. The LM statistic was computed with twelve different covariance estimates (c = 1, ..., 12). As the figures in Table 2 indicate, the hypothesis of known sample separation is rejected. The conclusion of the LM test has two important implications for the analysis of the data and model. First, it suggests that the price change alone should not be used to determine whether the sample period was characterized by excess demand, excess supply, or both. In most disequilibrium studies this type of analysis is routinely done. Second, as was shown in Section 2.1, incorrect sample separation adversely affects the large sample properties of the estimators. In view of this problem the constrained estimates are suspect.

The next estimation was performed over the unconstrained space, and consequently $p_{11}$ and $p_{10}$ were estimated along with the other parameters. In this case all of the initial consistent estimates were employed, and therefore the estimates in column three represent the consistent and asymptotically normal solution. The ML estimates are not much different from the LS estimates. This is due to stopping iteration before complete convergence to a maximum of the likelihood function. The iterative technique performed poorly for the unconstrained likelihood in the sense that the speed of convergence was extremely slow. For this reason, the final estimates were obtained from the 100th iteration, at which the gradient is not sufficiently close to zero, and therefore are not true ML estimates. However, since the initial estimates are consistent, estimates obtained after the second iteration are asymptotically equivalent to the ML estimates, and therefore nothing is lost by stopping iteration before convergence, at least asymptotically. Further details regarding this point are provided in Appendix A.4. The particular specification chosen for the model performed well in the sense that all of the estimates are of the correct sign, and most are significant. The estimates of $p_{11}$ and $p_{10}$ are .8179 and .2455, respectively, which means there is (1) an 81.79% chance of a price increase and 18.21% of a decrease during shortages, and (2) a 75.45% chance of a decrease and 24.55% of an increase during surpluses.

To select a covariance estimator for the Wald test of $H_0: p_{11} = 1 - p_{10}$, the serial correlation statistic was computed for c = 1, 2, 3. (See Table 3.) The hypothesis c = 3 was accepted. The Wald test statistic did not reject the hypothesis $H_0: p_{11} = 1 - p_{10}$ (see Table 4), suggesting that price adjustments are symmetrical.
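The mechanics of a Wald test for a single linear restriction such as $p_{11} + p_{10} = 1$ can be sketched as follows. The point estimates and standard errors are the unconstrained values reported in Table 1. The covariance between the two estimates is not reported, so it is set to zero here purely as an assumption; the function name `wald_linear` is hypothetical, and the result should not be expected to reproduce Table 4 exactly.

```python
import math
import numpy as np

def wald_linear(theta, cov, R, r):
    """Wald statistic for H0: R @ theta = r and its chi-squared(1) p-value.

    The p-value formula used here is valid for a single restriction only.
    """
    theta = np.asarray(theta, dtype=float)
    R = np.atleast_2d(np.asarray(R, dtype=float))
    r = np.atleast_1d(np.asarray(r, dtype=float))
    diff = R @ theta - r
    W = float(diff @ np.linalg.solve(R @ cov @ R.T, diff))
    # For one degree of freedom, P(chi2 > W) = erfc(sqrt(W / 2)).
    p_value = math.erfc(math.sqrt(W / 2.0))
    return W, p_value

p_hat = np.array([0.8178, 0.2454])           # unconstrained estimates of p11, p10
cov = np.diag([0.0673 ** 2, 0.2752 ** 2])    # ASSUMPTION: zero covariance
W, p_value = wald_linear(p_hat, cov, R=[[1.0, 1.0]], r=[1.0])
print(f"Wald statistic = {W:.4f}, p-value = {p_value:.4f}")
```

Under the zero-covariance assumption the statistic comes out at about 0.05, far below conventional critical values, which agrees with the qualitative conclusion that symmetry is not rejected.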

The differences which arise when the imperfect sample separation given by the price change is ignored can be seen by comparing columns two and three of Table 1. While both sets of estimates give the correct signs for the supply and demand variables, the unconstrained estimates suggest that demand and supply are less responsive to price changes than do the constrained estimates. The unconstrained estimate of the price parameter for the supply equation is approximately 40% less than the constrained estimate, and the price coefficient for the demand equation is approximately 14% smaller in absolute value for the unconstrained estimate. Given the rejection of the known sample separation model, however, we are more inclined to believe the unconstrained estimates.

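The problem of determining whether the period 1979-84 was characterized by excess demand or supply was also addressed with the unconstrained estimates. This was accomplished by estimating the probability of excess demand for each t conditional on the quantity transacted and the direction of the price change. The expression for this conditional probability is

$$\Pr(D_t > S_t \mid Q_t, 1(\Delta p_{t+1} > 0)) = \frac{g_{1t}(Q_t)\, p_{11}^{I_t} (1 - p_{11})^{1 - I_t}}{f_t(Q_t, I_t)},$$

where $I_t = 1(\Delta p_{t+1} > 0)$, $g_{1t}$ denotes the component of the density of $Q_t$ associated with excess demand, and $f_t$ denotes the joint density of $(Q_t, I_t)$ entering the likelihood function.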

The results are reported in Table 5. As pointed out by Lee and Porter (1984), the classification rule: $Q_t = S_t$ if $\Pr(D_t > S_t \mid Q_t, 1(\Delta p_{t+1} > 0)) > .5$ and $Q_t = D_t$ otherwise, is optimal in the sense that it minimizes the probability of misclassification. Applying this rule, we find that 54.12% of the observations are excess demand and 45.8% excess supply. In contrast, if one were to rely solely on the direction of the price change, the conclusion would be 31.9% excess demand, 43.1% excess supply, and for 25% of the observations, $\Delta p_{t+1} = 0$. In Table 6, the compatibility of the direction of the price change with the optimal classification rule is further examined. Comparing the two rules, excluding the observations for which $\Delta p_{t+1} = 0$, we find that 9 observations out of 54 are classified differently.
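A minimal sketch of how such a conditional probability and the associated classification rule might be computed is given below. It assumes independent normal errors and $Q_t = \min(D_t, S_t)$, so that the density of $Q_t$ splits into an excess demand part and an excess supply part by Bayes' rule. The function names and the arguments `mu_d`, `sd_d`, `mu_s`, `sd_s` (the conditional means and standard deviations of demand and supply at time t) are hypothetical and are not taken from the source.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def prob_excess_demand(q, price_up, mu_d, sd_d, mu_s, sd_s, p11, p10):
    """Pr(D > S | Q = q, 1(price change > 0)) under independent normal errors."""
    # Excess demand: Q = S and D > S.
    g_shortage = norm_pdf((q - mu_s) / sd_s) / sd_s * (1.0 - norm_cdf((q - mu_d) / sd_d))
    # Excess supply: Q = D and S > D.
    g_surplus = norm_pdf((q - mu_d) / sd_d) / sd_d * (1.0 - norm_cdf((q - mu_s) / sd_s))
    # Probability of the observed price direction in each regime.
    w_shortage = p11 if price_up else 1.0 - p11
    w_surplus = p10 if price_up else 1.0 - p10
    numerator = g_shortage * w_shortage
    return numerator / (numerator + g_surplus * w_surplus)

def classify(prob):
    """Classification rule: excess demand when the probability exceeds one half."""
    return "excess demand" if prob > 0.5 else "excess supply"

# Symmetric illustration: identical demand and supply distributions, so the
# price direction alone drives the result.
prob_up = prob_excess_demand(100.0, True, 100.0, 10.0, 100.0, 10.0, p11=0.8, p10=0.2)
prob_down = prob_excess_demand(100.0, False, 100.0, 10.0, 100.0, 10.0, p11=0.8, p10=0.2)
```

When $(p_{11}, p_{10}) = (1, 0)$ the function returns one after a price increase and zero otherwise, which is the known sample separation case.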

Table 1

Estimated parameters and statistics. (Asymptotic standard errors in parentheses)

                    LS                   MLE                  MLE
Variables           Initial estimates    p11 = 1, p10 = 0     unconstrained

Demand
  const.                79.6508              40.5262              79.6509   (169.61)
  RL-RA                -14.9764             -17.2779             -14.9758   (2.918)
  IP                     2.2856               2.5429               2.2938   (1.170)
  sigma_1^2            367.7335            2140.5700             367.7344   (94.36)

Supply
  const.               -60.6708             -74.9844             -60.6709   (145.87)
  RL-RT                  4.4981               7.3034               4.4985   (0.3834)
  TD                     0.3176               0.3266               0.3288   (0.982)
  sigma_2^2           1197.7623              77.4408            1197.7622   (87.40)

p11                      0.8526               1.0000               0.8178   (.0673)
p10                      0.2571               0.0000               0.2454   (.2752)

log likelihood                             -355.9850            -317.3710

n = 72

Table 2

Test of H_0: (p11, p10) = (1, 0)

V_n(theta_n; c)     LM Statistic     H_0 rejected at alpha% level

c = 1                 16.7693            0.020%
c = 2                 28.8588            0.001%
c = 3                 18.9718            0.008%
c = 4                 65.5703            0.001%
c = 5                 22.9532            0.001%
c = 6                 17.3450            0.017%
c = 7                 10.7834            0.455%
c = 8                 12.2467            0.219%
c = 9                 14.4707            0.072%
c = 10                 5.7286            5.702%
c = 11                 6.5377            3.805%
c = 12                 6.5118            3.854%
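The rejection levels in Table 2 can be checked against the chi-squared distribution. The sketch below assumes two degrees of freedom, one for each restriction in $(p_{11}, p_{10}) = (1, 0)$, in which case the upper tail probability has the closed form $\exp(-x/2)$. The function name `chi2_sf_2df` is hypothetical.

```python
import math

def chi2_sf_2df(x):
    """Upper tail probability of a chi-squared variate with 2 degrees of freedom."""
    return math.exp(-x / 2.0)

# LM statistics from Table 2 for the three largest values of c.
lm_stats = {10: 5.7286, 11: 6.5377, 12: 6.5118}
levels = {c: 100.0 * chi2_sf_2df(lm) for c, lm in lm_stats.items()}

for c, level in levels.items():
    print(f"c = {c}: LM = {lm_stats[c]:.4f}, H_0 rejected at the {level:.3f}% level")
```

For these rows the computed levels agree closely with the tabulated ones, which supports reading the last column as the marginal significance level of each statistic.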

Table 3

Test of H_0 for c = 1, 2, 3

c     Serial Correlation Statistic     H_0 rejected at alpha% level

1         32.7552                          0.030%
2         14.2697                         16.104%
3         10.6509                         38.540%

Table 4

Test of H_0: p11 = 1 - p10

V_n(theta_n; c)     Wald Statistic     H_0 rejected at alpha% level

c = 3                 0.0550411          94.34%

Table 6

Compatibility of the Direction of the Price Change with the
Optimal Classification Rule

Columns: Pr(D_t > S_t | Q_t, 1(Δp_{t+1} > 0)) > .5; Pr(D_t > S_t | Q_t, 1(Δp_{t+1} > 0)) < .5

Rows: Δp_{t+1} > 0; Δp_{t+1} < 0; Δp_{t+1} = 0

CHAPTER 5
SEMIPARAMETRIC ESTIMATION OF DISEQUILIBRIUM MODELS USING THE METHOD OF MAXIMUM SCORE

5.1 Introduction

We consider an alternative estimation strategy not previously analyzed for a disequilibrium model. The strategy is the so-called "semiparametric" estimation developed in Manski (1975), Cosslett (1983), Powell (1984), Manski (1985), and some others. Semiparametric estimators have been shown to be consistent under more general conditions than the conventional LS and ML estimators, and therefore require fewer prior restrictions. For a number of cases where consistent LS and ML estimation require the functional form of the error distribution, consistent semiparametric estimators have been derived without imposing functional form. Powell did so for the censored regression model using the method of least absolute deviations, Cosslett derived a distribution-free ML estimator for the binary choice model, and Manski derived consistent estimators for the same model using the method of maximum score. Semiparametric estimation is most useful when parametric assumptions cannot be trusted, but are needed for consistent LS and ML estimation. In particular, it offers an improved strategy for estimating disequilibrium models.

We derive consistent semiparametric estimators for disequilibrium models using the method of maximum score of Manski (1975, 1985). Consistent score estimators are derived for the following situations: the functional forms of the error distributions are unknown, the quantity transacted is an unknown function of supply and demand, and the price change is an unknown function of excess demand. The presentation comprises three models and their score estimators. The models we consider are all of the following form:

M. (model): Given the supply and demand equations $S_t = \beta_2^0 x_t + \varepsilon_{2t}$ and $D_t = \beta_1^0 x_t + \varepsilon_{1t}$, the iid sequence of random vectors $(Q_t, p_t, x_t)_{t=1}^{n+1}$, the event $S_{pq}$ involving either $p_t$ or $Q_t$, and the event $S_x$ involving $x_t$,

$$\Pr(S_{pq} \mid S_x; \beta_1^0, \beta_2^0) > \Pr(S_{pq}^c \mid S_x; \beta_1^0, \beta_2^0) \quad \text{and}$$

$$\Pr(S_{pq}^c \mid S_x^c; \beta_1^0, \beta_2^0) > \Pr(S_{pq} \mid S_x^c; \beta_1^0, \beta_2^0),$$

where $S^c$ denotes the complement of the event S. General, intuitive considerations motivate the specification of $(S_{pq}, S_x)$ for each model.

For example, the intuition that an expected shortage (excess demand) is a better predictor of a positive price change than an expected surplus motivates the model in Section 5.2. Given the model, consistent estimation depends on general continuity and identification assumptions which do not require prior knowledge of the functional forms of the underlying distribution functions or explicit equations for quantity or price.

The model in Section 5.2 concerns events involving the price change, $\Delta p_{t+1} = p_{t+1} - p_t$, and expected excess demand, $\beta^0 x_t \equiv \beta_1^0 x_t - \beta_2^0 x_t$, or more specifically, the binary variables $1(\Delta p_{t+1} > 0)$ and $1(\beta^0 x_t > 0)$, where $1(\cdot)$ denotes the indicator function. The model maintains that given $1(\beta^0 x_t > 0)$, the best forecast of $1(\Delta p_{t+1} > 0)$ corresponds to $1(\Delta p_{t+1} > 0) = 1(\beta^0 x_t > 0)$. A score estimator of $\beta^0$ is defined and assumptions for consistency given. The model resembles the binary response model studied by Manski (1975, 1985), and shares an identification problem: $\beta^0$ is only identified up to an unknown multiplicative scalar.

The model in Section 5.3 is a more restrictive version of that in Section 5.2, but retains a considerable amount of generality. The model is designed to exploit the fully observable $\Delta p_{t+1}$ (versus $1(\Delta p_{t+1} > 0)$) to identify $\beta^0$. A consistent score estimator is presented, and we show that $\beta^0$ is identified without a loss of scale. The model represents a completely new application for maximum score estimation as it differs significantly from the model studied by Manski.

The estimators presented in Sections 5.2 and 5.3 do not depend on the quantity transacted, $Q_t$, and therefore impose no restrictions on it. By neglecting the observations on $Q_t$, however, the generality involves a loss of information. In Section 5.4 we specify a model for $Q_t$, and define a corresponding score estimator. The specification, however, is insufficient to identify $\beta^0$ (even up to a multiplicative scalar) without severely restricting the distribution of $x_t$. To eliminate the identification problem the models of the previous sections are added to the specification, and the estimator is redefined. The resulting estimator uses the entire sample $(Q_t, \Delta p_{t+1}, x_t)_{t=1}^{n}$ and is shown to be consistent under general conditions.

5.2 A Directional Model and Consistent Estimation Up to Scale

The directional model restricts the direction of the price change to be most likely, but not certain, to follow the sign of expected excess demand, or equivalently

M5.1 (directional model): $\Pr(\Delta p_{t+1} > 0 \mid \beta^0 x_t > 0) > \Pr(\Delta p_{t+1} \le 0 \mid \beta^0 x_t > 0)$, and $\Pr(\Delta p_{t+1} \le 0 \mid \beta^0 x_t \le 0) > \Pr(\Delta p_{t+1} > 0 \mid \beta^0 x_t \le 0)$.

The motivation for M5.1 is its compatibility with an intuitively appealing forecast procedure: if a shortage is expected at time t, $\beta^0 x_t > 0$, then predict a positive price change, $\Delta p_{t+1} > 0$; otherwise, predict a nonpositive change. Given M5.1, the number of correct forecasts must eventually exceed the number incorrect.

The forecast procedure in turn motivates a strategy for estimating $\beta^0$ from n observations on $(\Delta p_{t+1}, x_t)$: choose as an estimate of $\beta^0$ a value $\beta$ that maximizes the proportion of the observations characterized by $1(\Delta p_{t+1} > 0) = 1(\beta x_t > 0)$. This is the method of maximum score. We propose the score estimator:

$$\hat\beta_n = \arg\max_{\beta \in B} g_n(\beta), \quad \text{where } g_n(\beta) = n^{-1} \sum_{t=1}^{n} g_t(\beta), \text{ and}$$

$$g_t(\beta) = 1(\Delta p_{t+1} > 0)\,1(\beta x_t > 0) + 1(\Delta p_{t+1} \le 0)\,1(\beta x_t \le 0).$$

The function $g_t(\cdot)$ "scores" one if a candidate $\beta$ implies a forecast compatible with the maintained model, M5.1, and zero otherwise.
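The score estimator can be illustrated on simulated data. The sketch below assumes two regressors, a parameter space normalized to the unit circle, and a crude grid search over angles; the data generating process, the sample size, and the names `score` and `max_score_unit_circle` are all hypothetical choices made for illustration, not part of the source.

```python
import numpy as np

def score(beta, dp, x):
    """Sample score: share of observations whose price direction matches
    the sign of the index x @ beta."""
    idx = x @ beta
    return np.mean(((dp > 0) & (idx > 0)) | ((dp <= 0) & (idx <= 0)))

def max_score_unit_circle(dp, x, n_grid=720):
    """Grid search for the score maximizer over unit-length candidates."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    candidates = np.column_stack([np.cos(angles), np.sin(angles)])
    scores = np.array([score(b, dp, x) for b in candidates])
    best = scores.argmax()
    return candidates[best], scores[best]

# Simulated illustration: the price change follows expected excess demand
# plus noise with median zero, so the directional model holds.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 2))
beta0 = np.array([0.8, -0.6])                    # unit length
dp = x @ beta0 + rng.normal(scale=0.5, size=n)

beta_hat, best_score = max_score_unit_circle(dp, x)
print("estimate:", beta_hat, "score:", best_score)
```

A grid search is used because the sample score is a step function of the candidate, so gradient-based optimizers are not applicable; with more than two regressors a different search strategy would be needed.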

Manski (1985) presents a consistent score estimator for a model of the form $\mathrm{MED}(y \mid x) = bx$, where MED(z) denotes the median of the random variable z. His consistency proof, however, depends on the weaker model: $\Pr(y > 0 \mid bx > 0) > \Pr(y \le 0 \mid bx > 0)$ and $\Pr(y \le 0 \mid bx \le 0) > \Pr(y > 0 \mid bx \le 0)$. We have postulated our model in the weaker form for two reasons. First, the weaker model is easy to interpret as a price adjustment model; positive price changes occur most frequently with expected shortages, and negative changes with expected surpluses. Second, but not less important, $\mathrm{MED}(\Delta p_{t+1} \mid x_t) = \beta^0 x_t$ is unnecessarily restrictive.

Manski's consistency proof (1985, p. 323) is directly applicable for $\hat\beta_n$ assuming appropriate regularity conditions are met. Theorem 5.2 below provides assumptions that imply $\hat\beta_n$ converges to $\beta^0$ almost everywhere (a.e.) as n becomes indefinitely large.

Theorem 5.2. In addition to M5.1 assume:

A5.3. (continuity): $E(g_t(\beta)) \equiv g(\beta)$ is continuous in $\beta$ on a compact set B.

A5.4. (identification): The set $A_x(\beta) = \{x: \mathrm{sgn}(\beta^0 x) \ne \mathrm{sgn}(\beta x)\}$ has positive probability for all $\beta \in B$ such that $\beta \ne \beta^0$.

Then $\lim \hat\beta_n = \beta^0$ a.e.

Proof:

Step 1. Uniform convergence. The proof of uniform convergence uses the argument presented in Manski (1985, pp. 321-2). Observe that

$$g_n(\beta) = P_n(\Delta p_{t+1} > 0,\ \beta x_t > 0) + P_n(\Delta p_{t+1} \le 0,\ \beta x_t \le 0), \quad \text{and}$$

$$g(\beta) = P(\Delta p_{t+1} > 0,\ \beta x_t > 0) + P(\Delta p_{t+1} \le 0,\ \beta x_t \le 0),$$

where $P_n$, $P$ represent the empirical and true distributions. Therefore, the generalized Glivenko-Cantelli theorem of Rao (1962, Theorem 7.2) implies

$$\lim \sup_{\beta \in B} |g_n(\beta) - g(\beta)| = 0 \text{ a.e.}$$

Step 2. Identification. M5.1 and A5.4 imply that $\beta^0$ uniquely maximizes $g(\beta)$. To see this, consider

$$E(g_t(\beta^0) - g_t(\beta)) = \int_{A_x^c(\beta)} E(g_t(\beta^0) - g_t(\beta) \mid x_t)\, dF_x + \int_{A_x(\beta)} E(g_t(\beta^0) - g_t(\beta) \mid x_t)\, dF_x,$$

where $A_x^c(\beta)$ denotes the complement of $A_x(\beta)$, and $F_x$ the distribution function of x. The first term on the right-hand side vanishes given the definition of $g_t$, and under M5.1 the second term is strictly positive.

Step 3. $\lim \hat\beta_n = \beta^0$ a.e.

Given A5.3, Step 1, and Step 2, a.e. convergence follows from Theorem 2 of Manski (1983).

Q.E.D.

The assumptions permit a fairly general disequilibrium model. The consistency proof does not depend on the distributions of $\varepsilon_{1t}$ and $\varepsilon_{2t}$ or how the market determines the quantity transacted. Consistency depends on a price adjustment model which enters without an explicit adjustment equation, or a known functional form for the probability distribution of prices. It suffices to believe that an expected shortage (surplus) is a better predictor of a positive (nonpositive) price change than an expected surplus (shortage).

The generality of the assumptions, however, has costs. In particular, a careful examination of A5.4 reveals that $\beta^0$ is only identified up to an arbitrary scale factor. The identification problem results from the failure of the obvious, but necessary condition that $A_x(\beta)$ be nonempty for all $\beta \ne \beta^0$. Observe that for any $\lambda > 0$ we have $\mathrm{sgn}(\lambda\beta^0 x) = \mathrm{sgn}(\beta^0 x)$ for all vectors x, and therefore $A_x(\lambda\beta^0)$ is an empty set. Thus, if points of the form $\beta = \lambda\beta^0$ are included in the parameter space, B, then A5.4 fails as does identification (Step 2). Manski (1985) resolves the problem by normalizing the parameter space with respect to scale, which effectively eliminates the troublesome points. Scale normalization suffices for A5.4, but the conclusion of Theorem 5.2 becomes $\lim \hat\beta_n = \lambda\beta^0$ a.e., where $\lambda$ is an unknown scalar.²
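The loss of scale is easy to see numerically: the directional score depends on a candidate only through the sign of its index, so rescaling the candidate by any positive constant leaves the score unchanged. The short sketch below uses simulated data and a hypothetical helper `score`; none of the names or numbers come from the source.

```python
import numpy as np

def score(beta, dp, x):
    """Directional score: share of sign agreements between dp and x @ beta."""
    idx = x @ beta
    return np.mean(((dp > 0) & (idx > 0)) | ((dp <= 0) & (idx <= 0)))

rng = np.random.default_rng(1)
x = rng.normal(size=(500, 3))
beta = np.array([1.0, -2.0, 0.5])
dp = x @ beta + rng.normal(size=500)

# The same score is obtained for every positive multiple of beta.
scores = [score(lam * beta, dp, x) for lam in (0.25, 1.0, 8.0)]

# One common normalization: restrict candidates to unit length.
beta_unit = beta / np.linalg.norm(beta)
```

Restricting candidates to unit length removes the ambiguity in the search, at the price that only the direction of the parameter vector is recovered.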

The loss of scale can be interpreted as arising from insufficient information. The directional model represents prior information on the stochastic behavior of the signs of $\Delta p_{t+1}$ and $\beta^0 x_t$, but not their magnitudes; by construction the estimator depends only on the signs. The limited information permits a fairly general model, but limits what can be learned about $\beta^0$. We shall see next that the loss of scale can be eliminated by imposing assumptions on the magnitudes of $\Delta p_{t+1}$ and $\beta^0 x_t$. At the same time it is possible to retain a considerable amount of generality.

5.3 A Price Adjustment Model with $\beta^0$ Identified (Without a Loss of Scale)

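Manski (1985) discusses the score estimator for a binary response model where the dependent variable, y*, is unobservable, and the sample consists of observations on $1(y^* > 0)$. In the last section the price change was treated analogously to obtain a robust method of estimation. Unlike the problem considered by Manski, however, $\Delta p_{t+1}$ is generally observable. To take advantage of the extra information, and thus obtain a stronger result, we propose the following model.

M5.5 (directional-magnitude model): for appropriately specified numbers $\varepsilon > 0$ and $\delta > 0$,

$$\Pr(\Delta p_{t+1} > \varepsilon \mid \beta^0 x_t > \delta) > \max(\Pr(|\Delta p_{t+1}| < \varepsilon \mid \beta^0 x_t > \delta),\ \Pr(\Delta p_{t+1} < -\varepsilon \mid \beta^0 x_t > \delta)),$$

$$\Pr(|\Delta p_{t+1}| < \varepsilon \mid |\beta^0 x_t| < \delta) > \max(\Pr(\Delta p_{t+1} > \varepsilon \mid |\beta^0 x_t| < \delta),\ \Pr(\Delta p_{t+1} < -\varepsilon \mid |\beta^0 x_t| < \delta)),$$

$$\Pr(\Delta p_{t+1} < -\varepsilon \mid \beta^0 x_t < -\delta) > \max(\Pr(\Delta p_{t+1} > \varepsilon \mid \beta^0 x_t < -\delta),\ \Pr(|\Delta p_{t+1}| < \varepsilon \mid \beta^0 x_t < -\delta)).$$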

The directional-magnitude model quantifies the notion that large (small) discrepancies between expected buy and sell decisions are most likely to lead to relatively large (small) price changes. The model predicts a small price change ($|\Delta p_{t+1}| < \varepsilon$) if the expected market position lies within a specified interval centered at equilibrium ($|\beta^0 x_t| < \delta$), and larger changes ($|\Delta p_{t+1}| > \varepsilon$) otherwise.

Compared to M5.1, the model M5.5 is more restrictive as it restricts both the direction and magnitude of the price change. We shall see, however, that M5.5 distinguishes $\beta^0$ from $\lambda\beta^0$, and thus it becomes meaningful to discuss estimators that converge unambiguously to $\beta^0$.

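Given M5.5 we define a score estimator of $\beta^0$ as follows:

$$\tilde\beta_n = \arg\max_{\beta \in B} h_n(\beta), \quad \text{where } h_n(\beta) = n^{-1} \sum_{t=1}^{n} h_t(\beta), \text{ and}$$

$$h_t(\beta) = 1(\Delta p_{t+1} > \varepsilon)\,1(\beta x_t > \delta) + 1(|\Delta p_{t+1}| < \varepsilon)\,1(|\beta x_t| < \delta) + 1(\Delta p_{t+1} < -\varepsilon)\,1(\beta x_t < -\delta).$$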
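A small simulation can illustrate why the directional-magnitude score, unlike the purely directional score, is sensitive to the scale of a candidate. The sketch below is built on assumptions that are not in the source: the price change equals expected excess demand plus small normal noise, and the two thresholds are set equal. The names `dm_score`, `eps`, and `delta` are hypothetical.

```python
import numpy as np

def dm_score(beta, dp, x, eps, delta):
    """Directional-magnitude score: share of observations for which the
    size class of dp (up, flat, down) matches the size class of x @ beta."""
    idx = x @ beta
    up = (dp > eps) & (idx > delta)
    flat = (np.abs(dp) < eps) & (np.abs(idx) < delta)
    down = (dp < -eps) & (idx < -delta)
    return np.mean(up | flat | down)

rng = np.random.default_rng(2)
n = 5000
x = rng.normal(size=(n, 2))
beta0 = np.array([1.0, -0.5])
dp = x @ beta0 + rng.normal(scale=0.2, size=n)   # ASSUMED price adjustment
eps = delta = 0.5

s_true = dm_score(beta0, dp, x, eps, delta)
s_double = dm_score(2.0 * beta0, dp, x, eps, delta)
s_half = dm_score(0.5 * beta0, dp, x, eps, delta)
print(s_true, s_double, s_half)
```

In this design the score at the true parameter exceeds the score at its rescaled versions, because rescaling moves the thresholds on the index away from those on the price change.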

To prove $\lim \tilde\beta_n = \beta^0$ a.e. using the arguments in the proof of Theorem 5.2, the relevant assumptions are:

A5.6 (continuity): $h(\beta)$ is continuous in $\beta$ on a compact set B.

A5.7 (identification): The set $J_x(\beta) = \{x: \mathrm{sgn}(\beta^0 x - \delta) \ne \mathrm{sgn}(\beta x - \delta)\}$ has positive probability for all $\beta \in B$ such that $\beta \ne \beta^0$.

The important difference between the above assumptions and those of Section 5.2 lies in the identification assumptions A5.4 and A5.7. Specifically, assumption A5.7 does not require a normalized parameter space since there generally exist vectors x such that $\mathrm{sgn}(\beta^0 x - \delta) \ne \mathrm{sgn}(\beta x - \delta)$ for $\beta \ne \beta^0$; i.e., the set $J_x(\beta)$ is nonempty for $\beta \ne \beta^0$. Therefore, it is possible to restrict the distribution of x so that $J_x(\beta)$ has positive probability for $\beta \ne \beta^0$, and to identify $\beta^0$ without a loss of scale. We summarize the result in the following theorem.

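Theorem 5.8. Suppose the i-th component of the vector $\beta^0$ is nonzero. Then for all $\beta$ such that $\beta_i \ne \beta_i^0$ and $\beta_i \ne 0$, the set $J_x(\beta)$ is nonempty.

Proof:

It suffices to show that there exists at least one solution x to the system of linear equations $M(\beta^0, \beta)x = \Gamma$, where

$$M(\beta^0, \beta) = \begin{pmatrix} \beta^0 \\ \beta \end{pmatrix}, \quad \Gamma = \begin{pmatrix} \gamma^0 \\ \gamma \end{pmatrix}, \quad \text{and } \gamma^0 > \delta > \gamma \text{ or } \gamma > \delta > \gamma^0.$$

The existence of x is equivalent to $\mathrm{rank}(M(\beta^0, \beta)) = \mathrm{rank}(M(\beta^0, \beta), \Gamma)$. If $\mathrm{rank}(M(\beta^0, \beta)) = 2$, then the proof is complete. If $\mathrm{rank}(M(\beta^0, \beta)) = 1$, then we need $\beta_i / \beta_i^0 = \gamma / \gamma^0$. The existence of such points $\gamma$ and $\gamma^0$ follows immediately since

$$\{\gamma/\gamma^0: \gamma > \delta > \gamma^0\} = (-\infty, 0) \cup (1, \infty), \quad \text{and}$$

$$\{\gamma/\gamma^0: \gamma^0 > \delta > \gamma\} = (-\infty, 1).$$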

Q.E.D.

5.4 Maximum Score Estimation of Models That Include the Quantity Transacted

The estimators presented in sections 5.2 and 5.3 do not depend on the observed quantity transacted, Q, and therefore neglect relevant sample information. In this section we propose a model for Q, and define a score estimator of $\beta^0$ that depends on n observations of Q. We shall see, however, that the model for Q is insufficient to identify $\beta^0$ (even up to a multiplicative scalar). We resolve the identification problem by combining the model for Q with the price adjustment models described in sections 5.2 and 5.3. The score estimator we define for the combined model uses the entire sample $(Q_t, \Delta p_{t+1}, x_t)_{t=1}^{n}$, and therefore can be expected to be more efficient than the estimators of sections 5.2 and 5.3.

The observations on the quantity transacted are modeled as follows:

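M5.9 (quantity model): For some given $\delta > 0$,

$$\Pr(Q_t > \delta \mid \beta_1^0 x_t > \delta,\ \beta_2^0 x_t > \delta) > \Pr(Q_t < \delta \mid \beta_1^0 x_t > \delta,\ \beta_2^0 x_t > \delta),$$

and

$$\Pr(Q_t < \delta \mid \beta_1^0 x_t < \delta,\ \beta_2^0 x_t < \delta) > \Pr(Q_t > \delta \mid \beta_1^0 x_t < \delta,\ \beta_2^0 x_t < \delta).$$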

Two appealing assumptions that are sufficient for M5.9, and therefore motivate it, are

A5.10 $Q_t = \min(D_t, S_t)$.

A5.11 $\mathrm{MED}(\varepsilon_{1t}) = \mathrm{MED}(\varepsilon_{2t}) = 0$, and $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are independent.

Assumption A5.11 requires only independent error terms with distributions symmetrical about zero.

To construct an estimator of $\beta^0$ given the quantity model, we define the scoring function:

$$q_t(\beta) = 1(Q_t > \delta)\,1(\beta_1 x_t > \delta,\ \beta_2 x_t > \delta) + 1(Q_t < \delta)\,1(\beta_1 x_t < \delta,\ \beta_2 x_t < \delta).$$

To prove consistency for a maximizer of $q_n(\beta)$ using the arguments in the proof of Theorem 5.2, the relevant assumptions are:

A5.12 (continuity): $q(\beta)$ is continuous in $\beta$ on a compact set B.

A5.13 (identification):

(i) The set $U_x(\beta) = \{x: \mathrm{sgn}(\beta_1^0 x - \delta) \ne \mathrm{sgn}(\beta_1 x - \delta),\ \mathrm{sgn}(\beta_2^0 x - \delta) \ne \mathrm{sgn}(\beta_2 x - \delta)\}$ has positive probability for all $\beta \in B$ such that $(\beta_1, \beta_2) \ne (\beta_1^0, \beta_2^0)$.

(ii) The set $Z_x(\beta^0) = \{x: \mathrm{sgn}(\beta_1^0 x - \delta) \ne \mathrm{sgn}(\beta_2^0 x - \delta)\}$ has zero probability.

The role of assumption A5.13 in proving consistency is analogous to that of the previous identification assumptions A5.4 and A5.7. The two parts of A5.13 imply that $\beta^0$ uniquely maximizes $q(\beta)$. Part (i) compares to the familiar order condition needed for identification in the textbook simultaneous equation framework. For example, if the supply and demand equations have no explanatory variables in common, and $\delta > 0$, then Theorem 5.8 implies that $U_x(\beta)$ is nonempty for $\beta \ne \beta^0$.³ To see the role of part (ii), suppose that the sets $Z^cU = \{Z_x^c(\beta^0) \cap U_x(\beta)\}$, $Z^cU^c$, $ZU$, and $ZU^c$ each have positive probability for some $\beta \ne \beta^0$. Then we can write

$$E(q_t(\beta^0) - q_t(\beta)) = \int_{Z^cU} E(q_t(\beta^0) - q_t(\beta) \mid x_t)\, dF_x + \int_{Z^cU^c} E(q_t(\beta^0) - q_t(\beta) \mid x_t)\, dF_x$$

$$\qquad + \int_{ZU} E(q_t(\beta^0) - q_t(\beta) \mid x_t)\, dF_x + \int_{ZU^c} E(q_t(\beta^0) - q_t(\beta) \mid x_t)\, dF_x.$$

It can be readily verified that the first term on the right-hand side is positive, the second is nonnegative, the third is zero, and the last term is negative. Therefore, given the negativity of the last term, $\beta \ne \beta^0$ does not necessarily imply $E(q_t(\beta^0) - q_t(\beta)) > 0$. To rule out this possibility, we impose part (ii).

The requirement that $Z_x(\beta^0)$ has zero probability, however, is too restrictive to be generally applicable. It is difficult to imagine a situation where such an assumption would be appropriate. Therefore, unless one is willing to severely restrict the distribution of $x_t$, the model M5.9 is insufficient to identify $\beta^0$. Assumption A5.13(ii) can be relaxed, however, by combining the model for Q with the price adjustment model of Section 5.2, and constructing a score estimator that exploits both models. For this purpose we assume that the price adjustment model M5.1 holds in addition to M5.9, and consider the scoring function:

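$$q_t^*(\beta, \beta^0) = 1(Z_x^c(\beta^0))\, q_t(\beta) + 1(Z_x(\beta^0))\, P_t(\beta),$$

where $P_t(\beta) = 1(\Delta p_{t+1} \le 0)\,1(\beta_1 x_t < \delta,\ \beta_2 x_t > \delta) + 1(\Delta p_{t+1} > 0)\,1(\beta_1 x_t > \delta,\ \beta_2 x_t < \delta)$, $1(Z_x^c(\beta^0)) \equiv 1(x_t \in Z_x^c(\beta^0))$, and $Z_x^c(\beta^0)$ denotes the complement of $Z_x(\beta^0)$.

Generally $Z_x(\beta^0)$ will be unknown, but if a consistent estimate, say $\tilde\beta_n$, is available, then it can be replaced by $Z_x(\tilde\beta_n)$. One possible choice for $\tilde\beta_n$ is the estimator presented in Section 5.3. This forms the basis for a "total" sample estimator of $\beta^0$:

$$\hat\beta_n = \arg\max_{\beta \in B} q_n^*(\beta, \tilde\beta_n).$$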
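The combined scoring function can be written out as a short sketch. Each observation is scored with the quantity rule when a preliminary estimate places it outside the set Z (demand and supply indexes on the same side of the threshold), and with the price direction rule when it falls inside Z. The function names and the toy data are hypothetical; the preliminary estimates `b1_tilde` and `b2_tilde` are simply taken as given.

```python
import numpy as np

def q_score(b1, b2, Q, x, delta):
    """Quantity score: Q and both indexes on the same side of delta."""
    d, s = x @ b1, x @ b2
    high = (Q > delta) & (d > delta) & (s > delta)
    low = (Q < delta) & (d < delta) & (s < delta)
    return (high | low).astype(float)

def p_score(b1, b2, dp, x, delta):
    """Price score: nonpositive change with expected surplus, or positive
    change with expected shortage."""
    d, s = x @ b1, x @ b2
    fall = (dp <= 0) & (d < delta) & (s > delta)
    rise = (dp > 0) & (d > delta) & (s < delta)
    return (fall | rise).astype(float)

def in_Z(b1, b2, x, delta):
    """Membership in Z: the two indexes lie on opposite sides of delta."""
    return np.sign(x @ b1 - delta) != np.sign(x @ b2 - delta)

def total_score(b1, b2, b1_tilde, b2_tilde, Q, dp, x, delta):
    """Sample average of the combined score, with Z evaluated at the
    preliminary estimates (b1_tilde, b2_tilde)."""
    z = in_Z(b1_tilde, b2_tilde, x, delta)
    per_obs = np.where(z, p_score(b1, b2, dp, x, delta), q_score(b1, b2, Q, x, delta))
    return per_obs.mean()

# Toy data: four observations, two regressors.
x = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
Q = np.array([3.0, 1.5, 1.5, 0.5])
dp = np.array([0.1, 0.5, 0.3, -0.2])
b1 = np.array([2.0, 0.0])   # demand coefficients (illustrative)
b2 = np.array([0.0, 2.0])   # supply coefficients (illustrative)
toy_score = total_score(b1, b2, b1, b2, Q, dp, x, delta=1.0)
```

In the toy data the first and last observations are scored by the quantity rule and the middle two by the price rule; only the third observation, a price increase with an expected surplus, fails to score.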

To show that $\hat\beta_n$ converges to $\beta^0$ a.e. we prove:

Theorem 5.14. Let $\lim \tilde\beta_n = \beta^0$ a.e., and $\tilde\beta_n \in B$ for all n. In addition to M5.1 and M5.9 assume:

(continuity): $q^*(\beta, \beta')$ is continuous in both arguments on a compact set B.

(identification): Assumption A5.13(i) holds.

Then $\lim \hat\beta_n = \beta^0$ a.e.

Proof:

Step 1. Uniform convergence. The proof is similar to Step 1 of Theorem 5.2. Theorem 7.2 of Rao (1962) implies

$$\lim \sup_{\beta \in B,\ \beta' \in B} |q_n^*(\beta, \beta') - q^*(\beta, \beta')| = 0 \text{ a.e.}$$

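Step 2. Identification.

Let $d_t(\beta, \beta^0) = q_t^*(\beta^0, \beta^0) - q_t^*(\beta, \beta^0)$, and let $d(\beta, \beta^0)$ denote its expectation. We will show that $\beta \ne \beta^0$ implies $d(\beta, \beta^0) > 0$. Consider

$$d(\beta, \beta^0) = \int_{UZ} E(d_t(\beta, \beta^0) \mid x_t)\, dF_x + \int_{UZ^c} E(d_t(\beta, \beta^0) \mid x_t)\, dF_x$$

$$\qquad + \int_{U^cZ} E(d_t(\beta, \beta^0) \mid x_t)\, dF_x + \int_{U^cZ^c} E(d_t(\beta, \beta^0) \mid x_t)\, dF_x,$$

where $UZ = \{U_x(\beta) \cap Z_x(\beta^0)\}$, $UZ^c = \{U_x(\beta) \cap Z_x^c(\beta^0)\}$, $U^cZ = \{U_x^c(\beta) \cap Z_x(\beta^0)\}$, and $U^cZ^c = \{U_x^c(\beta) \cap Z_x^c(\beta^0)\}$. That $\beta \ne \beta^0$ implies $d(\beta, \beta^0) > 0$ follows from the first two terms being positive, and the last two nonnegative. We will prove this for the first and last terms only; the proof for the remaining terms is similar.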

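Consider the first term, and assume without loss of generality that $\beta_1^0 x_t - \delta < 0$, and $\beta_2^0 x_t - \delta > 0$, and thus $(\beta_1^0 - \beta_2^0)x_t < 0$. Since $x_t \in U_x(\beta)$, we have $\beta_1 x_t - \delta > 0$, and $\beta_2 x_t - \delta < 0$. Therefore,

$$E(d_t(\beta, \beta^0) \mid x_t \in UZ) = \Pr(\Delta p_{t+1} \le 0 \mid x_t) - \Pr(\Delta p_{t+1} > 0 \mid x_t) > 0,$$

where the inequality follows from $(\beta_1^0 - \beta_2^0)x_t < 0$, and M5.1.

For $x_t \in U^cZ^c$ assume without loss of generality that $\beta_1^0 x_t - \delta > 0$, and $\beta_2^0 x_t - \delta > 0$. Since $x_t \in U_x^c(\beta)$, we have $\beta_1 x_t - \delta > 0$ and $\beta_2 x_t - \delta > 0$, or $\beta_1 x_t - \delta > 0$ and $\beta_2 x_t - \delta < 0$, or $\beta_1 x_t - \delta < 0$ and $\beta_2 x_t - \delta > 0$. Therefore, evaluating the conditional expectation case by case, we find

$$E(d_t(\beta, \beta^0) \mid x_t \in U^cZ^c) = \Pr(Q_t > \delta \mid x_t) - \Pr(Q_t > \delta \mid x_t) = 0, \quad \text{or} \quad = \Pr(Q_t > \delta \mid x_t) > 0.$$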

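Step 3. $\lim \sup_{\beta \in B} |q_n^*(\beta, \tilde\beta_n) - q^*(\beta, \beta^0)| = 0$ a.e.

Let $\gamma > 0$ be given. Step 1 implies

$$\sup_{\beta \in B} |q_n^*(\beta, \tilde\beta_n) - q^*(\beta, \tilde\beta_n)| < \gamma/2 \text{ a.e.}$$

for sufficiently large n. The continuity of $q^*$, and the compactness of B imply

$$\sup_{\beta \in B} |q^*(\beta, \tilde\beta_n) - q^*(\beta, \beta^0)| < \gamma/2 \text{ a.e.}$$

for sufficiently large n since $\lim \tilde\beta_n = \beta^0$ a.e. Applying the triangle inequality we get

$$\sup_{\beta \in B} |q_n^*(\beta, \tilde\beta_n) - q^*(\beta, \beta^0)| < \gamma \text{ a.e.}$$

for sufficiently large n, which is the desired result.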

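Step 4. $\lim \hat\beta_n = \beta^0$ a.e.

Let N be an open neighborhood of $\beta^0$ and define

$$\varepsilon = q^*(\beta^0, \beta^0) - \sup_{\beta \in N^c \cap B} q^*(\beta, \beta^0) > 0,$$

where the existence of $\varepsilon$ follows from Step 2, and the compactness of B. Now Step 3 implies $q^*(\hat\beta_n, \beta^0) > q_n^*(\hat\beta_n, \tilde\beta_n) - \varepsilon/2$ a.e. for large n, and since $\hat\beta_n$ maximizes $q_n^*(\cdot, \tilde\beta_n)$ we have

$$q^*(\hat\beta_n, \beta^0) > q_n^*(\beta^0, \tilde\beta_n) - \varepsilon/2 \text{ a.e.} \qquad (5.15)$$

Step 3 also implies

$$q_n^*(\beta^0, \tilde\beta_n) > q^*(\beta^0, \beta^0) - \varepsilon/2 \text{ a.e.} \qquad (5.16)$$

for large n. Adding both sides of (5.15) and (5.16) we get

$$q^*(\hat\beta_n, \beta^0) > q^*(\beta^0, \beta^0) - \varepsilon = \sup_{\beta \in N^c \cap B} q^*(\beta, \beta^0) \text{ a.e.}$$

and therefore $\hat\beta_n \in N$ a.e. for sufficiently large n.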
Q.E.D.

NOTES

1 The signum function, sgn(·), is defined as follows: sgn(z) = 1 if z > 0, and sgn(z) = -1 if z < 0.

2 Another significant cost is that no distributional theory for maximum score estimators is currently known.

3 Other comparisons with the so-called order condition for identification are much more complicated, and beyond the scope of this paper.

CHAPTER 6
CONCLUDING REMARKS AND DIRECTIONS FOR FURTHER RESEARCH

In this thesis, I have proposed several new solutions to the problem of generalizing disequilibrium models and their estimators. The empirical example in Chapter 4 demonstrates how to implement many of these solutions in practice. However, as we have seen, while some of the solutions solve old problems, they also introduce new complications. For example, while the methods presented in Chapter 3 eliminate the need to specify a parametric model for serial correlation, they also introduce the complication of having to choose a single covariance estimator from several candidates. Clearly, some of the results fall short of completely generalizing disequilibrium models and their estimators; there is a trade-off. I believe, however, that this thesis accomplishes more than merely shifting the problems faced by empirical studies from old ones to new ones. In particular, it provides a solid foundation for further research by clarifying many of the issues involved. The following is a partial list of directions for further research on the problem of generalizing disequilibrium models and their estimators:

(1) the consequences of restricting the conditional probabilities $\Pr(\Delta p_{t+1}>0 \mid D_t>S_t)$ and $\Pr(\Delta p_{t+1}>0 \mid D_t<S_t)$ to be invariant with respect to $t$, and how to relax this restriction;

(2) the problem of finding an optimal covariance estimator when the serial correlation is modeled by mixing conditions;

(3) the power properties of the serial correlation test in section 3.5;

(4) the small sample properties of estimators obtained from starting iterative techniques with consistent estimates, but stopping iteration before convergence;

(5) numerical studies examining the properties of the maximum score estimators for disequilibrium models relative to parametric estimators.

APPENDIX

A.1 Inconsistency and Misclassified Observations

We will show that constraining the direction of the price change $1(\Delta p_{t+1}>0)$ to separate the sample into the underlying demand ($Q_t = D_t$) and supply ($Q_t = S_t$) regimes, when in fact $1(\Delta p_{t+1}>0)$ misclassifies observations with positive probability, leads to inconsistent estimates. Consider the estimator $\hat\theta_n(1,0)$ which solves the problem

$$\max_{(\theta,\, p_{11},\, p_{10})} L_n(\theta, p_{11}, p_{10}) \quad \text{subject to } (p_{11}, p_{10}) = (1, 0),$$

where $L_n(\theta, p_{11}, p_{10})$ is defined on page 14, equation 2.3. We will show that $p_{11}^o < 1$ and $p_{10}^o = 0$ imply plim $\hat\theta_n(1,0) \neq \theta^o$. The proof of plim $\hat\theta_n(1,0) \neq \theta^o$ proceeds as follows: we derive a necessary condition for the consistency of an estimator that solves a maximization problem, show that the condition is violated, and hence conclude plim $\hat\theta_n(1,0) \neq \theta^o$.
n
The necessary condition for consistency can be viewed as either a global or local condition depending on whether the estimator is a global or local maximizer of $L_n$. The global condition appears as the conclusion of the following theorem.

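Theorem A.1.1. Let $\hat\theta_n(y)$ be a function of the observations such that $L_n(\hat\theta_n, y) \ge L_n(\theta, y)$ for all $n$ and all $\theta \in \Theta$, where $\Theta$ is a subset of a Euclidean space. Define

$$L_n(\theta, \theta', y, \rho) = \sup\{L_n(t, y) - L_n(\theta', y) : |t - \theta| < \rho\}$$

and let $\bar L_n(\theta, \theta', \rho) = E(L_n(\theta, \theta', y, \rho))$. Suppose

(i) For all sufficiently small $\rho(\theta) = \rho > 0$,

$$\text{plim}\,\big(L_n(\theta^o, \theta', y, \rho) - \bar L_n(\theta^o, \theta', \rho)\big) = 0.$$

(ii) $\bar L_n(\theta^o, \theta', \rho)$ decreases to $\bar L_n(\theta^o, \theta', 0)$ uniformly in $n$ as $\rho$ decreases to zero.

If plim $\hat\theta_n = \theta^o$, then $\limsup_n \{\bar L_n(\theta^o, \theta, 0)\} \ge 0$ for all $\theta \in \Theta$.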

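Proof:

Suppose there exists $\theta^* \in \Theta$ such that $\limsup_n \{\bar L_n(\theta^o, \theta^*, 0)\} < 0$. Then by (ii) we can choose $\rho > 0$ such that $\limsup_n \{\bar L_n(\theta^o, \theta^*, \rho)\} < 0$. Now define $N = \{\theta : |\theta - \theta^o| < \rho\}$ and

$$R_n = \sup\{L_n(t, y) - L_n(\theta^*, y) : |t - \theta^o| < \rho\}.$$

Since $\hat\theta_n \in N$ implies $R_n \ge 0$, it suffices to show that $\lim_{n\to\infty} \Pr(R_n < 0) = 1$. Let $M_n = \bar L_n(\theta^o, \theta^*, \rho)$ and $d = \limsup_n M_n < 0$. Now for sufficiently large $n$ we have $M_n < d/4$, and therefore

$$\Pr(R_n < 0) \ge \Pr(R_n - M_n < -d/4) \to 1 \quad \text{as } n \to \infty$$

by (i).
Q.E.D.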

Under additional regularity conditions, the conclusion of Theorem A.1.1 can be viewed as a local condition.

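Theorem A.1.2. In addition to A.1.1(i) and A.1.1(ii), suppose, writing $\bar L_n(\theta) = E(L_n(\theta, y))$,

(i) $\partial \bar L_n(\theta)/\partial\theta = E(\partial L_n(\theta)/\partial\theta)$; that is, the order of integration and differentiation can be interchanged.

(ii) $\theta^o$ is an interior point of $\Theta$.

(iii) $\partial \bar L_n(\theta)/\partial\theta$ is continuous on a closed neighborhood $N_1$ of $\theta^o$ with radius $\epsilon_1 > 0$, for all $n$ sufficiently large.

Let $\partial \bar L_n(\theta)/\partial\theta_i = \bar L_n(\theta)_i$. If for some $i$ there exists a positive constant $m_i$ such that $|\bar L_n(\theta)_i| > m_i$ for all $\theta$ belonging to a closed neighborhood of $\theta^o$ with radius $\epsilon_2 > 0$, $N_2$, for all $n$ sufficiently large, then plim $\hat\theta_n \neq \theta^o$.

Proof: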

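We will prove plim $\hat\theta_n \neq \theta^o$ by showing that the hypothesis of the theorem implies $\limsup_n \{\bar L_n(\theta^o) - \bar L_n(\theta_n^*)\} < 0$ for some sequence $\{\theta_n^*\}$ belonging to $\Theta$, where $\bar L_n(\theta) = E(L_n(\theta, y))$.

Let $\epsilon_3 = \min(\epsilon_1, \epsilon_2)$, and let $N_3$ denote the closed neighborhood of $\theta^o$ with radius $\epsilon_3$. Since $N_3$ is compact and $\bar L_n(\theta)$ is continuous on $N_3$, there exist points $\theta_n^*$ belonging to $N_3$ such that $\bar L_n(\theta_n^*) = \sup\{\bar L_n(\theta) : \theta \text{ belongs to } N_3\}$. Furthermore, since $|\bar L_n(\theta)_i| > 0$ on $N_2$, the points $\theta_n^*$ lie on the boundary of $N_3$. Therefore, $|\theta_n^* - \theta^o| = \epsilon_3$.

By the mean value theorem we have

$$\bar L_n(\theta_n^*) - \bar L_n(\theta^o) = \sum_{i=1}^{K} (\theta_{n,i}^* - \theta_i^o)\, \bar L_n(\theta_n')_i, \tag{2}$$

where $\theta_n'$ lies on the segment connecting $\theta_n^*$ and $\theta^o$. Now if $\bar L_n(\theta_n')_i > m_i > 0$, then we must have $\theta_{n,i}^* - \theta_i^o \ge 0$. Otherwise, since $\bar L_n$ is strictly increasing in its $i$-th argument on $N_3$, we would have

$$\bar L_n(\theta_{n,1}^*, \ldots, \theta_{n,i}^*, \ldots, \theta_{n,K}^*) < \bar L_n(\theta_{n,1}^*, \ldots, \theta_i^o, \ldots, \theta_{n,K}^*),$$

which contradicts the fact that $\theta_n^*$ is a maximizer of $\bar L_n$. Similarly, if $\bar L_n(\theta_n')_j < m_j < 0$, then $\theta_{n,j}^* - \theta_j^o \le 0$.

Without loss of generality suppose $\bar L_n(\theta_n')_i > m_i > 0$ for $i = 1, \ldots, h$ and $\bar L_n(\theta_n')_i < m_i < 0$ for $i = h+1, \ldots, K$. Then by equation (2) we have

$$\bar L_n(\theta_n^*) - \bar L_n(\theta^o) \ge \sum_{i=1}^{h} (\theta_{n,i}^* - \theta_i^o)\, m_i + \sum_{i=h+1}^{K} (\theta_i^o - \theta_{n,i}^*)(-m_i) \ge m \sum_{i=1}^{K} |\theta_{n,i}^* - \theta_i^o| \ge d$$

for some $d > 0$, where $m = \min(m_1, \ldots, m_h, -m_{h+1}, \ldots, -m_K)$. This implies $\limsup_n \{\bar L_n(\theta^o) - \bar L_n(\theta_n^*)\} < 0$.
Q.E.D.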

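Therefore, to prove plim $\hat\theta_n(1,0) \neq \theta^o$, it suffices to show that the expected derivative $E(\partial L_n(\theta^o;1,0)/\partial\beta_1)$ is bounded away from zero. We establish this by showing that

$$E(\partial L_n(\theta^o;1,0)/\partial\beta_1) = (1 - p_{11}^o) \sum_t x_t E(Q_t - D_t)/\sigma_1^2. \tag{3}$$

Let $\partial \log f_t(\theta^o;1,0)/\partial\beta_1 = f_t^b$, $1(\cdot) = 1(\Delta p_{t+1}>0)$, and note that

$$E(f_t^b) = \sum_{1(\cdot)} \int f_t^b\, f_t(Q_t, 1(\cdot) \mid \theta^o, p_{11}^o, p_{10}^o = 0)\, dQ_t. \tag{4}$$

Now if $p_{11}^o = 1$, then (4) is the expectation of a likelihood equation, and therefore given the usual regularity conditions we have $E(f_t^b) = 0$ at $p_{11}^o = 1$. This condition will imply

$$-1(\cdot) \int f_t^b g_{st}\, dQ_t = (1 - 1(\cdot)) \int f_t^b g_{dt}\, dQ_t. \tag{5}$$

Substituting (5) into (4) yields

$$E(f_t^b) = (1 - 1(\cdot))(1 - p_{11}^o)\Big(\int f_t^b g_{dt}\, dQ_t + \int f_t^b g_{st}\, dQ_t\Big). \tag{6}$$

For $1(\cdot) = 0$, given the normality of $\epsilon_{1t}, \epsilon_{2t}$, we have $f_t^b = (Q_t - x_t\beta_1^o)x_t/\sigma_1^2$. Substituting this into (6), and summing over the observations gives (3).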

A.2 The Computational Tractability and Asymptotic Properties of the Least Squares Estimator of Section 2.3

In Section 2.3 we proposed using a LS estimator to find the consistent and asymptotically normal solution to the likelihood equations; i.e., use the LS estimates as starting values to iterate to the consistent and asymptotically normal local maxima of the likelihood function. The success of this strategy depends on:

(a) The objective functions to be solved for the LS estimates are not characterized by an unknown number of local minima so that global minima can be easily found; i.e., multiple solutions are not a problem.

(b) The LS estimators (defined as global minimizers) are consistent and have a proper limiting distribution.

If (a) fails, then the LS method is no more computationally tractable than the ML method, and thus one might as well use the ML method to begin with. Condition (b) ensures convergence to the consistent and asymptotically normal local maxima of the likelihood function. (See, for example, Amemiya (1973, pp. 1014-15).) In this section we will argue that both (a) and (b) are likely to be satisfied in practice.

Condition (a) will be obviously satisfied if the following optimization problems have unique solutions:

$$\underset{(p_{11},\, p_{10},\, \gamma)}{\text{local-min}}\ n^{-1} \sum_{t=1}^{n} \big(1(\Delta p_{t+1}>0) - E(1(\Delta p_{t+1}>0))\big)^2 \tag{1}$$

$$\text{local-min}\ n^{-1} \sum_{t=1}^{n} \big(Q_t - \hat E(Q_t)\big)^2 \tag{2}$$

$$\text{local-min}\ n^{-1} \sum_{t=1}^{n} \big(Q_t^2 - \hat E(Q_t^2)\big)^2 \tag{3}$$

where $\hat E(Q_t)$ denotes the function $E(Q_t)$ with $\gamma$ estimated by $\hat\gamma$ (obtained from (1)), and $\hat E(Q_t^2)$ denotes $E(Q_t^2)$ with $\gamma$ and the parameters of problem (2) replaced by their estimates (obtained from (1) and (2)).

Solutions to problems (2) and (3) are OLS estimates, and therefore are unique if the appropriate matrices of explanatory variables have full column rank. For example, unique LS estimates can be obtained by solving (2) if the following matrix has full column rank:

$$\big[(1 - \Phi(x_t\hat\gamma))\,x_t^s,\ \ \Phi(x_t\hat\gamma)\,x_t^d,\ \ \phi(x_t\hat\gamma)\big]$$

where $x_t^s$ denotes the $1 \times k_s$ vector of explanatory variables of the supply equation, and $x_t^d$ the $1 \times k_d$ vector of demand explanatory variables. In general, the matrices of explanatory variables for (2) and (3) will have full column rank provided that the functions $\Phi(x_t\gamma)$ and $\phi(x_t\gamma)$ are not constant for all $t$.
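The rank condition above is easy to check numerically once a first-stage estimate of $\gamma$ is in hand. The following is a minimal sketch, assuming that $\Phi$ and $\phi$ are the standard normal distribution and density functions; the function names, the simulated regressors, and the value used for the first-stage estimate are hypothetical and serve only to illustrate the check.

```python
import math
import numpy as np

def norm_cdf(v):
    # Standard normal distribution function, vectorized through math.erf.
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(v) / math.sqrt(2.0)))

def norm_pdf(v):
    # Standard normal density function.
    return np.exp(-0.5 * np.asarray(v) ** 2) / math.sqrt(2.0 * math.pi)

def stage2_design(x_s, x_d, x, gamma_hat):
    # Regressor matrix [(1 - Phi) x_s, Phi x_d, phi] evaluated at gamma_hat.
    index = x @ gamma_hat
    Phi = norm_cdf(index)[:, None]
    phi = norm_pdf(index)[:, None]
    return np.hstack([(1.0 - Phi) * x_s, Phi * x_d, phi])

def has_full_column_rank(M):
    return np.linalg.matrix_rank(M) == M.shape[1]

# Hypothetical data: a constant and one regressor in each equation.
rng = np.random.default_rng(0)
n = 200
x_s = np.column_stack([np.ones(n), rng.normal(size=n)])
x_d = np.column_stack([np.ones(n), rng.normal(size=n)])
x = np.column_stack([x_s, x_d[:, 1]])

# Illustrative first-stage estimate: Phi and phi vary with t.
W = stage2_design(x_s, x_d, x, np.array([0.2, 0.5, -0.5]))
# Degenerate case: gamma = 0 makes Phi and phi constant for all t.
W0 = stage2_design(x_s, x_d, x, np.zeros(3))

rank_ok = has_full_column_rank(W)
rank_ok_degenerate = has_full_column_rank(W0)
```

In the degenerate case the three constant columns are collinear, which is the situation the text rules out by requiring that the two functions not be constant for all $t$.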

Solutions to problem (1) are nonlinear LS estimates, and consequently establishing their uniqueness is much more difficult. Unfortunately, attempts to prove that problem (1) has a unique solution have been inconclusive. However, there is some evidence suggesting that problem (1) can be solved for a global minimum in practice. First, the larger the sample size the more likely problem (1) will have a unique solution. Lemma A.2.4 below provides a rank condition which ensures a unique solution with probability approaching one as $n$ approaches infinity. Second, given the data discussed in Chapter 4, attempts to solve problem (1) were successful in the sense that all starting values iterated to the same solution. In contrast, attempts to maximize the likelihood function were unsuccessful as different starting values iterated to different solutions. Third, the objective function in problem (1) is bounded below (by zero) which simplifies the search of the parameter space for a global minimum. In contrast, a search for a global maximum of the likelihood function is complicated by unboundedness: $L \to \infty$ as $\sigma_1^2 \to 0$ or $\sigma_2^2 \to 0$ (see, for example, Maddala (1983, p. 300)). Therefore, any search for a global ML estimate will be futile unless one is willing to arbitrarily bound the error variances away from zero.
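The unboundedness mentioned above can be seen in a much simpler setting. The sketch below does not use the disequilibrium likelihood of Chapter 2; as a stand-in, it assumes a two-component normal mixture (with hypothetical data and parameter names), which shares the feature that one component can be centered on a single observation while its variance shrinks.

```python
import numpy as np

def normal_pdf(v, mu, sigma):
    return np.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_loglik(y, mu1, sigma1, mu2, sigma2, w=0.5):
    # Log likelihood of a two-component normal mixture (illustrative stand-in).
    dens = w * normal_pdf(y, mu1, sigma1) + (1.0 - w) * normal_pdf(y, mu2, sigma2)
    return float(np.sum(np.log(dens)))

# Hypothetical sample of five observations.
y = np.array([-1.2, -0.4, 0.1, 0.7, 1.5])

# Center the first component on one observation and let its
# standard deviation shrink toward zero.
sigmas = [1e-2, 1e-4, 1e-6, 1e-8]
values = [mixture_loglik(y, mu1=y[2], sigma1=s, mu2=0.0, sigma2=1.0) for s in sigmas]
```

Each reduction of the standard deviation raises the log likelihood, so no finite global maximum exists unless the variances are bounded away from zero, which is the point made in the text.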

Next we discuss conditions that imply consistency for the LS estimator. We will only consider conditions that imply consistency for the nonlinear LS estimator defined as any global minimizer of problem (1). (Given plim $\hat\gamma = \gamma^o$, proving consistency for the OLS estimators obtained from solving problems (2) and (3) involves repeated application of Jennrich's (1969, Lemma 3) mean-value theorem for random functions, and is quite tedious.) For simplicity, rather than necessity, we will assume that all relevant random variables are independent identically distributed across $t$. This enables us to apply the following simplified version of White's (1980) Lemma 2.2 to the global minimizer of problem (1).

Lemma A.2.1. Let $Q_n(w, \theta)$ be a measurable function on a measurable space $W$ and for each $w$ in $W$ a continuous function on a compact set $\Theta$. Then there exists a measurable function $\hat\theta_n(w)$ such that

$$Q_n(w, \hat\theta_n(w)) = \inf_{\theta \in \Theta} Q_n(w, \theta) \quad \text{for all } w \text{ in } W.$$

If plim $\sup_\theta |Q_n(w, \theta) - \bar Q(\theta)| = 0$, and if $\bar Q(\theta)$ has a unique minimum at $\theta^o$, then plim $\hat\theta_n = \theta^o$.

Proof: See White (1980, Lemma 2.2).

The first part of Lemma A.2.1 ensures the existence of the nonlinear LS estimator (defined as a global minimizer). The second part will be used to show consistency. For this purpose we define

$$Q_n(\theta) = n^{-1} \sum_{t=1}^{n} \big(1(\Delta p_{t+1}>0) - E(1(\Delta p_{t+1}>0))\big)^2 = n^{-1} \sum_{t=1}^{n} (z_t(\theta) + u_{1t})^2,$$

where $z_t(\theta) = p_{11}^o - p_{11} - (p_{11}^o - p_{10}^o)\Phi(x_t\gamma^o) + (p_{11} - p_{10})\Phi(x_t\gamma)$, and $\theta = (p_{11}, p_{10}, \gamma)$. To apply the second part of Lemma A.2.1 we need to show uniform convergence, and that $\bar Q(\theta)$ has a unique minimum at $\theta^o$. The next lemma, which is due to Hoadley (1971), provides a moment restriction that implies uniform convergence.

Lemma A.2.2. For the function defined in Lemma A.2.1 suppose $E|Q_n(\theta)|^{1+d} < \infty$ for some $d > 0$. Then plim $\{\sup_\theta |Q_n(\theta) - \bar Q(\theta)|\} = 0$.

Proof: See Hoadley (1971, Theorem A.5).

The following lemma establishes that the moment restriction holds.

Lemma A.2.3. $E|Q_n(\theta)|^{1+d} < \infty$ for some $d > 0$.

Proof: Since $z_t(\theta)$ is bounded and $(z_t(\theta) + u_{1t})^2 \le 2 z_t(\theta)^2 + 2 u_{1t}^2$, it suffices to bound the moments of $u_{1t}$. Let $1_t = 1(\Delta p_{t+1}>0)$, set $d = 1$, and note that $E|1_t|^k = E(1_t)$ for every positive $k$ because $1_t$ takes only the values 0 and 1. Hence the required moments of $u_{1t}$ are finite.
Q.E.D.

Finally, we present a rank condition that implies $\bar Q(\theta)$ has a unique minimum at $\theta^o$, and therefore together with Lemma A.2.3 ensures consistency for a global minimizer of $Q_n(\theta)$.

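Lemma A.2.4. Suppose $x_t$ is a discrete random variable, and let $x_t^i$ denote the $i$-th member of the support of $x_t$. For each $\theta \in \Theta$ such that $\theta \neq \theta^o$, suppose there exist $k > 1$ members of the support of $x_t$ such that the following matrix has full column rank:

$$A_k = \begin{bmatrix} 1 & \Phi(x_t^1\gamma) & \Phi(x_t^1\gamma^o) \\ \vdots & \vdots & \vdots \\ 1 & \Phi(x_t^k\gamma) & \Phi(x_t^k\gamma^o) \end{bmatrix}$$

If $p_{11}^o > p_{10}^o > 0$, then $\bar Q(\theta)$ has a unique minimum at $\theta^o$.

Proof: Since $E(u_{1t} \mid x_t) = 0$, we have

$$\bar Q(\theta) = E(z_t(\theta) + u_{1t})^2 = E(z_t(\theta)^2) + E(u_{1t}^2).$$

Obviously, $\bar Q(\theta)$ has a minimum at $\theta^o$ since $E(z_t(\theta^o)^2) = 0$. To prove uniqueness it suffices to show that $\theta \neq \theta^o$ implies $E(z_t(\theta)^2) > 0$.

Suppose for some $\theta \neq \theta^o$, $E(z_t(\theta)^2) = 0$. Since $\Pr(z_t(\theta)^2 \ge 0) = 1$, we have $E(z_t(\theta)^2) = 0 \iff \Pr(z_t(\theta)^2 = 0) = 1$. This implies that for every $x_t^i$ belonging to the support of $x_t$, $z_t(\theta)^2 = 0$. That is,

$$p_{11}^o - p_{11} - (p_{11}^o - p_{10}^o)\Phi(x_t^i\gamma^o) + (p_{11} - p_{10})\Phi(x_t^i\gamma) = 0, \quad i = 1, 2, \ldots$$

But this contradicts the assumption that $A_k$ has full column rank unless $p_{11}^o - p_{11} = p_{11}^o - p_{10}^o = p_{11} - p_{10} = 0$.
Q.E.D.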

Finally, we note without proof that Theorem 3.1 of White (1980) can be applied to show that the nonlinear LS estimator obtained from solving problem (1) is asymptotically normal. Therefore, the LS estimates have a proper limiting distribution.
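The first-stage objective discussed in this section can be sketched in a few lines. This is an illustration under stated assumptions, not the computation used in Chapter 4: it assumes the regression function $E(1(\Delta p_{t+1}>0)) = p_{11} - (p_{11} - p_{10})\Phi(x_t\gamma)$ implied by the form of $z_t(\theta)$, takes $\Phi$ to be the standard normal distribution function, and uses simulated data with hypothetical parameter values.

```python
import math
import numpy as np

def norm_cdf(v):
    # Standard normal distribution function, vectorized through math.erf.
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(v) / math.sqrt(2.0)))

def prob_price_rise(theta, x):
    # Assumed regression function: p11 - (p11 - p10) * Phi(x gamma).
    p11, p10 = theta[0], theta[1]
    gamma = np.asarray(theta[2:])
    return p11 - (p11 - p10) * norm_cdf(x @ gamma)

def nls_objective(theta, rise, x):
    # Q_n(theta): mean squared deviation of the price-rise indicator
    # from its assumed conditional expectation.
    resid = rise - prob_price_rise(theta, x)
    return float(np.mean(resid ** 2))

# Simulated data with hypothetical true parameters (p11, p10, gamma).
rng = np.random.default_rng(0)
n = 5000
x = np.column_stack([np.ones(n), rng.normal(size=n)])
theta_true = np.array([0.9, 0.1, 0.3, 1.0])
rise = (rng.uniform(size=n) < prob_price_rise(theta_true, x)).astype(float)

q_true = nls_objective(theta_true, rise, x)
q_other = nls_objective(np.array([0.6, 0.4, 0.3, 1.0]), rise, x)
```

Because the objective is a mean of squares it is bounded below by zero, and in a large sample it is smaller at the true parameter than at a distinct parameter value, in line with the uniqueness argument of Lemma A.2.4.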

A.3 Proofs of Theorems 3.2-3.17

Proof of Theorem 3.2: See McLeish (1975, Theorem 2.10).

Proof of Theorem 3.7: The proof is the same as Hoadley's (1971) Theorem 1 except Theorem 3.2 is applied instead of Markov's law of large numbers.

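Proof of Theorem 3.8: For notational simplicity let $1_t = 1(\Delta p_{t+1}>0)$. Consider an arbitrary point $\theta_p^* \in \Theta_p$. We will show that given $\epsilon > 0$ there exists $d > 0$ such that $|\theta_p - \theta_p^*| < d$ implies

$$|f_t(Q_t, 1_t \mid \theta_p) - f_t(Q_t, 1_t \mid \theta_p^*)| < \epsilon,$$

where $\epsilon$ and $d$ do not depend on $t$.

Assumptions 3.4(i) (normality) and 3.6(i) (compactness) imply

$$\lim_{|Q_t| \to \infty} \sup\{|f_t(Q_t, 1_t \mid \theta_p) - f_t(Q_t, 1_t \mid \theta_p^*)| : \theta_p \in \Theta_p,\ \theta_p^* \in \Theta_p\} = 0. \tag{1}$$

Let $\epsilon > 0$ be chosen. Then equation (1) implies that there exist $a_t = a(x_t, 1_t) > 0$ and $d_t = d(x_t, 1_t) > 0$ such that for $|Q_t| > a_t$ and $|\theta_p - \theta_p^*| < d_t$,

$$|f_t(Q_t, 1_t \mid \theta_p) - f_t(Q_t, 1_t \mid \theta_p^*)| < \epsilon. \tag{2}$$

By assumption 3.5(ii) ($x_t$ has a finite support) equation (2) holds a.e. for $|Q_t| > a = \max(a_1, \ldots, a_k)$ and $|\theta_p - \theta_p^*| < d = \min(d_1, \ldots, d_k)$. Thus, it remains to show that equation (2) holds a.e. for $Q_t$ belonging to $[-a, a]$.

Let $C = \{(Q_t, 1_t, x_t, \theta_p) : Q_t \text{ belongs to } [-a, a]\}$. Since $C$ is compact, and $f_t(Q_t, 1_t \mid \theta_p)$ is continuous on $C$, it follows that $f_t(Q_t, 1_t \mid \theta_p)$ is uniformly continuous on $C$. That is, there exists a $d > 0$ such that equation (2) holds a.e. uniformly in $t$ whenever $|\theta_p - \theta_p^*| < d$.
Q.E.D.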

Proof of Lemma 3.9: The result follows from the fact that $\Theta_p$ is separable and $f_t(Q_t, 1(\Delta p_{t+1}>0) \mid \theta_p)$ is continuous on $\Theta_p$. See, for example, Loeve (1960, p. 510).

Proof of Lemma 3.10: Hartley and Mallela (1977, Corollary 4.2) prove that there exists $\rho(\theta_p) > 0$ such that

$$E\,\big|\sup\{\ln f_t(Q_t, 1(\Delta p_{t+1}>0) \mid \theta_p') : |\theta_p' - \theta_p| < \rho(\theta_p)\}\big|^k < \infty \tag{3}$$

for $k = 2$. In fact, their arguments can be used to show that (3) holds for any even positive $k$, and therefore for any positive $k$.

Proof of Lemma 3.11: The proof involves minor modifications to the proofs given in Amemiya and Sen (1977, lemmas 2 and 3) to cover the case of Pll1P10

Proof of Lemma 3.14: See White (1984, Theorem 2.4).

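Proof of Lemma 3.15: By Theorem 2.3 of White and Domowitz (1984), assumptions 3.15(i) and 3.15(ii) imply

$$\text{plim}\ \Big\{\sup_{\theta \in \Theta} \Big| n^{-1} \sum_{t=1}^{n} \big(q_t(y_t, \theta) - E\,q_t(y_t, \theta)\big) \Big| \Big\} = 0. \tag{4}$$

Given (4) and plim $\hat\theta_n = \theta^o$, Lemma 2.6 of White (1980) implies

$$\text{plim}\ n^{-1} \sum_{t=1}^{n} \big(q_t(y_t, \hat\theta_n) - E\,q_t(y_t, \theta^o)\big) = 0.$$

Q.E.D.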

Proof of Theorem 3.16: See Newey and West (1985, Theorem 2).
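For readers who have not seen the estimator of Newey and West (1985), the following is a minimal sketch of a covariance estimator with linearly declining (Bartlett) weights on the sample autocovariances. It is offered only as an illustration: the function name, the choice of lag truncation, and the simulated mean-zero score contributions are assumptions, and the exact statement and conditions of Theorem 3.16 are not reproduced here.

```python
import numpy as np

def newey_west(scores, lags):
    # scores: n x k array of (approximately mean-zero) score contributions.
    # Returns Gamma_0 + sum_j w_j (Gamma_j + Gamma_j'), w_j = 1 - j / (lags + 1).
    s = np.asarray(scores, dtype=float)
    n = s.shape[0]
    S = s.T @ s / n
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1.0)
        gamma_j = s[j:].T @ s[:-j] / n  # j-th sample autocovariance
        S += w * (gamma_j + gamma_j.T)
    return S

# Hypothetical serially correlated scores: a first-order moving average.
rng = np.random.default_rng(0)
e = rng.normal(size=(501, 2))
scores = e[1:] + 0.5 * e[:-1]

S0 = newey_west(scores, lags=0)  # ignores serial correlation
S3 = newey_west(scores, lags=3)  # allows for three lags
min_eigenvalue = float(np.linalg.eigvalsh(S3).min())
```

The declining weights are what keep the estimate positive semidefinite in finite samples, which is not guaranteed when the autocovariances are simply added with equal weights.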

Proof of Theorem 3.17:

Step 1. n2 als (n1 ( T (o)) n (o)f ( o)
n -c p -c p -c p -c p

We will show that Step 1 follows from

n ais (n-lf ( Pl)T' (TPl )-ln- f ( ml)Tf(ml). (5)
n -c n -c n -c n n

Given 3.17(i), the mean-value theorem for random functions

(Jennrich (1969, Lemma 3)) allows us to write

f(oml)=f(eo)+(af(_ )/ae )(D(i-o), and (6)
n p n p n p
ï¿½ ' ml0o,
f (oi ).=f (oo).+(af (' )/e )( ), i=l,.,k, (7)
-c n i -c p i -c ni p n p

where f (Ul). denotes the i-th column of the matrix f (1l), and o
-c n 1 -c n n
m0
and 0 each lie on the segment connecting ?l and 0.
n n p

Given (7), 3.17(ii), and plim m1=00o we have
n p

n-l (l)Tf (al) .=n-lf (0o)Tf (o).+o (1). (8)
-c n I -c n j -c p i -c p J p

Given (6), (7), 3.17(ii), 3.17(iii), H and plim Pl=00, we have
0 n p

n ; ( l)=n (+o (1). (9)
-c n 1 n -c p i p p

Substituting (8) and (9) into (1) we get the desired result:

n ct ls=(n f (o)T ()+o ( n f () f()+o (1).
n -c p -c p p -c p p p
A
I1sT ml-is 2
Step 2. n o D (m ) I A 2
n n n n uXK

By Step 1 we can write

ls o-1 1. T(o)
n 2s -A (0) nf (o0) T o
n n p -c p p

[(n f ( f ()) -1-A (00) -1n f (o) Tf(o)+o (1)
-c p -c p n p -c p p p

Therefore, by 3.17(iv), 3.17(v), and 3.17(vi), we have

1 2 2f1( e)T*
plim (D (eo00)n als - D (00)-A (bo)- n -f (o)Tf(0ï¿½))=0 (10)
n p n n p n p -c p p

Given 3.17(vi), by Corollary 4.24 of White (1984),

D (0)- A (0)- n f ()oTf(0o) N(0,I ). (11)
n p n p -c p p

(6) and (7) imply,

So- is A
D (O) n as n N(O,Ik) n p n k

and therefore by Corollary 4.28 of White (1984) we have

T A
is o -l s 2
nn D (O) a
n n p n "

Finally, since plim (D (l )-1I-D (o )-1)=0, by Theorem 4.30 of White n n n p
(1984)

T A
is D -(d1- s A 2
n n n n uXk*

Q.E.D.

A.4 Quadratic Hill-Climbing and the Asymptotic Distribution of the (p+1)th-Round Estimates

The $(p+1)$th ($p = 1, 2, \ldots$) iteration of the quadratic hill-climbing technique is given by

$$\theta_n^{p+1} = \theta_n^p - \big(\nabla^2 L_n(\theta_n^p) - \alpha_n I\big)^{-1} \nabla L_n(\theta_n^p), \tag{1}$$

where $\alpha_n = \max(\lambda_n + r_n \|\nabla L_n(\theta_n^p)\|, 0)$, $\lambda_n$ is the maximum eigenvalue of $\nabla^2 L_n(\theta_n^p)$, $r_n$ is a scalar correction factor, and $\|\nabla L_n(\theta_n^p)\|$ denotes the length of the $k$ dimensional vector $\nabla L_n(\theta_n^p)$.

Goldfeld, Quandt, and Trotter (1966) show that the technique chooses $\theta_n^{p+1}$ to maximize the quadratic approximation of $L_n(\theta)$ on a region centered at $\theta_n^p$ of radius

$$\big\|\big(\nabla^2 L_n(\theta_n^p) - \alpha_n I\big)^{-1} \nabla L_n(\theta_n^p)\big\| \le r_n^{-1}.$$

If the quadratic approximation is good (that is, if the step increases $L_n(\theta)$), then in the next step $r_n$ is decreased. Otherwise $r_n$ is increased. Further details can be found in Goldfeld, Quandt, and Trotter (1966).
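A single iteration of the update in equation (1) can be sketched as follows. This is a minimal illustration rather than the routine used for the estimates in Chapter 4: the function name and the concave quadratic test objective are hypothetical, and the rule for adjusting the correction factor between iterations is omitted.

```python
import numpy as np

def quadratic_hill_climbing_step(theta, grad, hess, r):
    # One iteration: theta - (H - alpha I)^{-1} g,
    # with alpha = max(lambda_max(H) + r * ||g||, 0).
    g = grad(theta)
    H = hess(theta)
    lam = np.linalg.eigvalsh(H).max()  # H is assumed symmetric
    alpha = max(lam + r * np.linalg.norm(g), 0.0)
    step = np.linalg.solve(H - alpha * np.eye(theta.size), g)
    return theta - step, alpha

# Hypothetical concave objective L(theta) = -0.5 theta'A theta + b'theta.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])

def loglik(theta):
    return -0.5 * theta @ A @ theta + b @ theta

def grad(theta):
    return b - A @ theta

def hess(theta):
    return -A

theta0 = np.array([5.0, -5.0])
# r = 0 gives alpha = 0 here, so the step is a plain Newton-Raphson step.
theta_newton, alpha_newton = quadratic_hill_climbing_step(theta0, grad, hess, r=0.0)
# A positive r shortens the step to a region of radius at most 1 / r.
theta_damped, alpha_damped = quadratic_hill_climbing_step(theta0, grad, hess, r=2.0)
```

When the Hessian is negative definite and the gradient is small, the correction term vanishes and the step coincides with Newton-Raphson, which is the fact exploited in the asymptotic argument that follows.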

Next we show that the estimator defined by $\theta_n^{p+1}$ has the same asymptotic distribution as the partial-MLE provided that plim $\theta_n^p = \theta^o$ and $\sqrt{n}(\theta_n^p - \theta^o)$ has a proper limiting distribution. More explicitly, we show

$$\sqrt{n}(\theta_n^{p+1} - \theta^o) \stackrel{A}{=} \big(-n^{-1} \nabla^2 L_n(\theta^o)\big)^{-1} n^{-1/2} \nabla L_n(\theta^o). \tag{2}$$

The implication is that when consistent initial estimates are employed, iteration beyond the second-round does not improve the final estimates, at least asymptotically.

To prove (2), it suffices to show that plim $n^{-1}\alpha_n = 0$. To see this, consider the mean-value expansion

$$\nabla L_n(\theta_n^p) = \nabla L_n(\theta^o) + \nabla^2 L_n(\bar\theta_n)(\theta_n^p - \theta^o). \tag{3}$$

Substituting (3) into (1) and rearranging, we get

$$\sqrt{n}(\theta_n^{p+1} - \theta^o) - \big(n^{-1}\alpha_n I - n^{-1} \nabla^2 L_n(\theta_n^p)\big)^{-1} n^{-1/2} \nabla L_n(\theta^o) = \Big[I - \big(n^{-1} \nabla^2 L_n(\theta_n^p) - n^{-1}\alpha_n I\big)^{-1} n^{-1} \nabla^2 L_n(\bar\theta_n)\Big] \sqrt{n}(\theta_n^p - \theta^o). \tag{4}$$

Therefore, if plim $n^{-1}\alpha_n = 0$, then (2) follows from (4) since the right hand side of (4) converges in probability to zero.

The following theorem establishes that plim $n^{-1}\alpha_n = 0$.

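Theorem A.4.1. For $\nabla L_n(\theta) = \sum_{t=1}^{n} \partial \log f_t(\theta)/\partial\theta$, suppose

(i) plim $\sup_\theta n^{-1} \sum_{t=1}^{n} \big[\partial \log f_t(\theta)/\partial\theta - E(\partial \log f_t(\theta)/\partial\theta)\big] = 0$.

(ii) plim $\theta_n^p = \theta^o$.

(iii) $E(\partial \log f_t(\theta)/\partial\theta)$ is continuous.

(iv) plim $n^{-1}\lambda_n < 0$.

Then plim $n^{-1}\alpha_n = 0$.

Proof:

It suffices to show that plim $n^{-1}\|\nabla L_n(\theta_n^p)\| = 0$. Now

$$n^{-1}\|\nabla L_n(\theta_n^p)\| = n^{-1}\Big(\sum_{i=1}^{k} \nabla L_n(\theta_n^p)_i^2\Big)^{1/2} = n^{-1}\Big(\sum_{i=1}^{k} \Big(\sum_{t=1}^{n} \partial \log f_t(\theta_n^p)/\partial\theta_i\Big)^2\Big)^{1/2} \le n^{-1} \sum_{i=1}^{k} \Big|\sum_{t=1}^{n} \partial \log f_t(\theta_n^p)/\partial\theta_i\Big| \stackrel{p}{\to} 0$$

by (i), (ii) and (iii).

Q.E.D.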

In effect, the proof of (2) follows from the observation that if plim $n^{-1}\alpha_n = 0$, then for sufficiently large $n$ equation (1) reduces to the Newton-Raphson technique with probability approaching one. Given that the proof depends on (1) reducing to the Newton-Raphson technique asymptotically, why not use the latter to begin with? Unfortunately, a definitive answer to this question is not available. The answer lies in the small sample properties of the estimators, which undoubtedly would require Monte Carlo studies to help uncover. We have chosen quadratic hill-climbing over Newton-Raphson because it is somewhat reassuring to know that the former always moves in the direction of a maximizer of the likelihood function, while the latter might not.

BIBLIOGRAPHY

Amemiya, T. (1973). "Regression Analysis when the Dependent Variable
is Truncated Normal." Econometrica 41:997-1016.

Amemiya, T. (1974). "A Note on the Fair and Jaffee Model."
Econometrica 42:759-762.

Amemiya, T., and G. Sen (1977). "The Consistency of the Maximum
Likelihood Estimator in a Disequilibrium Model." Technical Report
238. Institute for Mathematical Studies in the Social Sciences,
Stanford University.

Benassy, J.P. (1982). The Economics of Market Disequilibrium. New

Bowden, R.J. (1978). The Econometrics of Disequilibrium. Amsterdam:
North Holland.

Cosslett, S.R. (1983). "Distribution-free Maximum Likelihood Estimator
of the Binary Choice Model." Econometrica 51:765-782.

Fair, R.C., and D.M. Jaffee (1972). "Methods of Estimation for Markets
in Disequilibrium." Econometrica 40:497-514.

Fair, R.C., and H.H. Kelejian (1974). "Methods of Estimation for
Markets in Disequilibrium: A Further Study." Econometrica
42:117-190.

Fisher, F.M. (1983). Disequilibrium Foundations of Equilibrium
Economics. New York: Cambridge University Press.

Goldfeld, S.M., and R.E. Quandt (1975). "Estimation in a
Disequilibrium Model and the Value of Information." Journal of
Econometrics 3:325-348.

Goldfeld, S.M., R.E. Quandt, and H.F. Trotter (1966). "Maximization

Gordin, M.I. (1969). "The Central Limit Theorem for Stationary
Processes." Soviet Mathematics 10:1174-1176.

Hartley, M.J., and P. Mallela (1977). "The Asymptotic Properties of a
Maximum Likelihood Estimator for a Model of Markets in
Disequilibrium." Econometrica 46:1251-1271.

Heckman, J.J. (1976). "The Common Structure of Statistical Models of
Truncated, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models." Annals of Economic and Social
Measurement 5:475-492.

Hoadley, B. (1971). "Asymptotic Properties of Maximum Likelihood
Estimators for the Independent Not Identically Distributed Case."
Annals of Mathematical Statistics 42:1977-1991.

Ito, T., and K. Ueda (1981). "Tests of the Equilibrium Hypothesis in
Disequilibrium Econometrics: An International Comparison of Credit
Rationing." International Economic Review 22:691-708.

Jennrich, R.I. (1969). "Asymptotic Properties of Non-linear Least
Squares Estimators." Annals of Mathematical Statistics 40:633-643.

Laffont, J.J. and R. Garcia (1977). "Disequilibrium Econometrics for

Lee, L.F., and R.H. Porter (1984). "Switching Regression Models with
Imperfect Sample Separation Information -- With an Application on
Cartel Stability." Econometrica 52:391-418.

Levine, D. (1983). "A Remark on Serial Correlation in Maximum
Likelihood." Journal of Econometrics 23:337-342.

Loeve, M. (1960). Probability Theory. 2nd ed. Princeton: Van
Nostrand.

Maddala, G.S. (1983). Limited-dependent and Qualitative Variables in
Econometrics. New York: Cambridge University Press.

Maddala, G.S., and F. Nelson (1974). "Maximum Likelihood Methods for
Markets in Disequilibrium." Econometrica 42:1013-1030.

Manski, C.F. (1975). "The Maximum Score Estimation of the Stochastic
Utility Model of Choice." Journal of Econometrics 3:205-228.

Manski, C.F. (1983). "Closest Empirical Distribution Estimator."
Econometrica 51:305-320.

Manski, C.F. (1985). "Semiparametric Analysis of Discrete Response:
Asymptotic Properties of the Maximum Score Estimator." Journal of
Econometrics 27:313-333.

McLeish, D.C. (1975). "A Maximal Inequality and Dependent Strong Laws."
Annals of Probability 3:826-836.

Newey, W.K. and K.D. West (1985). "A Simple, Positive Definite,
Heteroscedasticity and Autocorrelation Consistent Covariance
Matrix." Discussion paper 92, Woodrow Wilson School, Princeton
University.

Olsen, R.J. (1978). "Note on the Uniqueness of the Maximum Likelihood
Estimator for the Tobit Model." Econometrica 46:1211-1215.

Powell, J.L. (1984). "Least Absolute Deviations Estimation for the
Censored Regression Model." Journal of Econometrics 25:303-325.

Rao, R.R. (1962). "Relations between Weak and Uniform Convergence of
Measures with Applications." Annals of Mathematical Statistics
33:659-680.

Rudin, W. (1976). Principles of Mathematical Analysis. New York:
McGraw-Hill.

Serfling, R.J. (1968). "Contributions to Central Limit Theory for
Dependent Variables." Annals of Mathematical Statistics
39:1158-1175.

Sealy, C.W., Jr. (1979). "Credit Rationing in the Commercial Loan
Market: Estimates of a Structural Model Under Conditions of
Disequilibrium." Journal of Finance 34:689-702.

Wald, A. (1949). "Note on the Consistency of the Maximum Likelihood
Estimate." Annals of Mathematical Statistics 20:595-601.

White, H. (1980). "Nonlinear Regression on Cross-Section Data."
Econometrica 48:721-746.

White, H. (1984). Asymptotic Theory for Econometricians. New York:

White, H., and I. Domowitz (1981). "Nonlinear Regression with Dependent
Observations." Unpublished paper, University of California, San
Diego.

White, H., and I. Domowitz (1984). "Nonlinear Regression with Dependent
Observations." Econometrica 52:143-162.

BIOGRAPHICAL SKETCH

Walter James Mayer was born in Detroit, Michigan, in 1955. He received a Bachelor of Arts degree in economics from the University of Missouri in 1982, and a Master of Arts degree from the University of Florida in 1983.

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Stephen R. Cosslett, Chairman
Associate Professor of Economics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Professor of Economics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

A.I. Khuri
Associate Professor of Statistics

This dissertation was submitted to the Graduate Faculty of the Department of Economics in the College of Business Administration and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

December 1986

UNIVERSITY OF FLORIDA


GENERALIZED ECONOMETRIC MODELS AND METHODS FOR MARKETS IN DISEQUILIBRIUM

BY

WALTER JAMES MAYER

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

1986

ACKNOWLEDGEMENTS

I would like to thank my advisor, Dr. S.R. Cosslett, for his support throughout this project. Thanks are also extended to Dr. G.S. Maddala, Dr. D.A. Denslow, and Dr. A.I. Khuri. Invaluable assistance provided by DeLayne Redding in the typing of this document is much appreciated. This dissertation is dedicated to my parents.

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

GENERALIZED ECONOMETRIC MODELS AND METHODS FOR MARKETS IN DISEQUILIBRIUM

By

WALTER JAMES MAYER

December 1986

Chairman: Dr. S. R. Cosslett
Cochairman: Dr. G. S. Maddala
Major Department: Economics

Empirical studies of markets in disequilibrium have relied on the appropriateness of explicit price adjustment equations, serial independence, normally distributed errors, and explicit equations relating the observed quantity transacted to desired supply and demand. For example, the asymptotic properties of "disequilibrium" estimators and test statistics are sensitive to the parametric forms chosen for price adjustment, the serial behavior of the observations, error distributions, and the quantity transacted. In a word, "disequilibrium" estimators and statistics are non-robust. Unfortunately, economic theory provides little basis for choosing the parametric forms. A lack of economic-theoretic restrictions coupled with non-robust estimators and statistics has severely limited empirical studies of markets in disequilibrium.

This dissertation develops new methods for more meaningful estimation of disequilibrium models. The new methods involve more general models and robust estimators. A switching regression model with imperfect sample separation is used to incorporate price adjustment into a disequilibrium model. The model enables price adjustment to be incorporated with less a priori information than usual. To estimate the model, maximum likelihood and least squares estimators are proposed. The asymptotic properties of the maximum likelihood estimator are examined. Previous results for maximum likelihood estimators of disequilibrium models are generalized with asymptotic theory for serially dependent observations. The maximum likelihood estimator is shown to be consistent and asymptotically normal even if the data are characterized by unknown forms of serial dependence. Asymptotic test statistics are also derived. The methodology is illustrated with an empirical application to the U.S. commercial loan market from 1979 to 1984. Finally, I propose semiparametric models and estimators for markets in disequilibrium. These methods are applicable when the error distributions are unknown, and the quantity transacted is an unknown function of supply and demand. Consistent estimators are derived using the method of maximum score.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT

CHAPTERS

1 AN OVERVIEW
1.1 The Problem
1.2 Solutions
NOTES

2 A GENERAL DISEQUILIBRIUM MODEL AND ESTIMATORS FOR LIMITED A PRIORI PRICE ADJUSTMENT INFORMATION
2.1 Introduction
2.2 The Model and Maximum Likelihood Estimation
2.3 A Consistent Initial Least Squares Estimator
2.4 Summary and Conclusions
NOTES

3 SOME ASYMPTOTIC THEORY FOR SERIALLY DEPENDENT OBSERVATIONS
3.1 Introduction
3.2 Consistency
3.3 Asymptotic Normality
3.4 Consistent Covariance Estimation
3.5 An Asymptotic Test for Serial Correlation
3.6 Summary and Conclusions
NOTES

4 AN EMPIRICAL EXAMPLE: THE U.S. COMMERCIAL LOAN MARKET
4.1 Introduction
4.2 The Empirical Model
4.3 Hypothesis Testing Procedures
4.4 The Results

5 SEMIPARAMETRIC ESTIMATION OF DISEQUILIBRIUM MODELS USING THE METHOD OF MAXIMUM SCORE
5.1 Introduction
5.2 A Directional Model and Consistent Estimation Up to Scale
5.3 A Price Adjustment Model with $\beta_0$ Identified (Without a Loss of Scale)
5.4 Maximum Score Estimation of Models that Include the Quantity Transacted
NOTES

6 CONCLUDING REMARKS AND DIRECTIONS FOR FURTHER RESEARCH

APPENDIX
A.1 Inconsistency and Misclassified Observations
A.2 The Computational Tractability and Asymptotic Properties of the Least Squares Estimator of Section 2.3
A.3 Proofs of Theorems 3.2-3.17
A.4 Quadratic Hill-Climbing and the Asymptotic Distribution of the (p+1)th-Round Estimates

BIBLIOGRAPHY
BIOGRAPHICAL SKETCH

PAGE 7

1.1 The Problem CHAPTER 1 AN OVERVIEW Before Fair and Jaffee (1972) introduced their econometric disequilibrium model, estimation of market behavior was confined to the equilibrium assumption. The study of the econometrics of disequilibrium was further developed by Fair and Kelejian (1974), Maddala and Nelson (1974), Amemiya (1974), Goldfeld and Quandt (1975), Laffont and Garcia (1977), Bowden (1978), and some others. By allowing for disequilibrium, Fair and Jaffee's work and the subsequent work it inspired represents an important generalization; but a generalization obtained by imposing (1) explicit price adjustment equations, (2) serial independence on time series data, (3) normally distributed error terms, and (4) explicit equations relating the observed quantity transacted to desired supply and demand. This contrasts with the equilibrium assumption where (1) price adjustment is not an estimation issue, (2) allowances are made for serial correlation, (3) the errors are only assumed to be uncorrelated with a subset of the explanatory variables, and (4) desired supply and demand are directly observable. 1

The estimation of market behavior has been extended to the disequilibrium assumption, but at a cost. Economic theory for markets in disequilibrium is a relatively new area of research that has been developed in the last few years by Benassy (1982) and Fisher (1983), among others. Being recent phenomena, however, the theories that have been proposed are rather limited in scope and tentative. For the empirical researcher, the theories provide little guidance for specifying price adjustment, and the quantity transacted as a function of desired supply and demand; they provide no basis for specifying the error distributions and serial independence.

A survey of the many empirical studies that have followed Fair and Jaffee (1972) suggests that the basis for specifying these aspects of the econometric disequilibrium model has been largely computational tractability. This approach has led to several drastically different disequilibrium specifications.¹ The assumptions of each specification generally do not represent well-defined economic-theoretic restrictions, and thus differences among them seldom reflect differences among well-defined alternative economic theories. As a result, most disequilibrium specifications are as good (or bad) as any other. Unfortunately, each specification produces estimates only as reliable as the assumptions imposed, and differences among them can lead to conflicting estimates of supply and demand equations.

The lack of economic-theoretic restrictions is not the only obstacle to meaningful estimation of a disequilibrium model; the estimators commonly applied are also restrictive. Most proposed "disequilibrium" estimators can be viewed as corrected versions of the "equilibrium"

least squares (LS) and maximum likelihood (ML) estimators.² The inequality of supply and demand introduces nonzero correlation between the explanatory variables and the error terms. Given a model for the inequality of supply and demand, the "equilibrium" LS and ML estimators can be corrected for the nonzero correlation to yield consistent "disequilibrium" LS and ML estimators. The correction approach provides insight into the problem of relaxing the equilibrium assumption, but generally requires restrictive assumptions to make it operational. In particular, consistent LS and ML estimation of a disequilibrium model depends on choosing the correct parametric forms for price adjustment, the error distribution functions, and the quantity transacted; useful inferences require allowances for serial correlation as well as correct parametric forms. Non-robustness coupled with a lack of economic-theoretic restrictions severely limits the reliability of LS and ML estimation.

To illustrate these points we consider the following model.

$$D_t = \beta_1^{0\prime} x_t + \varepsilon_{1t} \tag{1.1}$$
$$S_t = \beta_2^{0\prime} x_t + \varepsilon_{2t} \tag{1.2}$$

Data: $(Q_t, x_t)_{t=1}^{n}$

Equilibrium assumption: $Q_t = D_t = S_t$

Disequilibrium assumption: $Q_t = \tau_t(D_t, S_t)$, $\Delta p_{t+1} = \Pi_t(D_t, S_t)$.

Equations (1.1) and (1.2) are demand and supply functions; $D_t$ denotes the quantity demanded, $S_t$ the quantity supplied, $x_t$ a vector of explanatory variables, and $\varepsilon_{1t}$ and $\varepsilon_{2t}$ denote random error terms. Under the equilibrium

assumption the observed quantity transacted, $Q_t$, is equal to both $D_t$ and $S_t$; data are observed after prices adjust, and therefore adjustment models are irrelevant. Under the disequilibrium assumption $D_t$ and $S_t$ are not necessarily observable, and the function $\tau_t(\cdot,\cdot)$ specifies the position of $D_t$ and $S_t$ relative to the observable $Q_t$; data reflect adjustments at various stages, and therefore it becomes meaningful to model price adjustment. Price adjustments are modeled as follows: the price change, $\Delta p_{t+1} = p_{t+1} - p_t$, depends on excess demand, $D_t - S_t$, through the function $\Pi_t$.

When LS and ML are applied under the disequilibrium assumption, it becomes necessary to specify the distribution of $(\varepsilon_{1t}, \varepsilon_{2t})$ up to an unknown parameter vector, and the functional forms for $\tau_t$ and $\Pi_t$. The following example will illustrate this. Consider the problem of obtaining a consistent LS estimate of $\beta_1^0$. Under the equilibrium assumption the data are conditional on the event $Q_t = D_t = S_t$, and therefore a consistent "equilibrium" LS estimate requires $E(x_t \varepsilon_{1t} \mid Q_t = D_t = S_t) = 0$, or equivalently $E(x_t \varepsilon_{1t}) = 0$, since $Q_t = D_t = S_t$ is a sure event by assumption. Under the disequilibrium assumption, by contrast, each observation is conditional on either $D_t < S_t$ or $D_t > S_t$, and therefore a consistent "disequilibrium" LS estimate requires $E(x_t \varepsilon_{1t} \mid D_t \gtrless S_t) = 0$. But since, for example, $D_t < S_t$ [...] For example, suppose we specify

[...] (1.3)

$$Q_t = \tau_t(D_t, S_t) = \min(D_t, S_t), \tag{1.4}$$
$$\Delta p_{t+1} = \Pi_t(D_t, S_t) = \alpha (D_t - S_t), \quad \alpha > 0, \tag{1.5}$$

and for the first $n_1$ observations we have $\Delta p_{t+1}$ [...]

ML estimates. Once again, however, the estimates are subject to the validity of restrictive assumptions.

Empirical studies of markets in disequilibrium are concerned with analyzing time-series data, and therefore the possibility of serially correlated errors also arises. Most disequilibrium studies, however, completely ignore the possibility of serial correlation. One reason for this practice is that maximizing the correct likelihood function for a typical disequilibrium model with even a simple form of serial correlation can be intractable. The problem is one of introducing further complications into a highly nonlinear structure. (Equilibrium models, by contrast, have simpler structures, and therefore it is relatively straightforward to incorporate, say, an ARMA process into these models.) The problem is further complicated when the true form of the serial correlation is unknown; even if one were willing to incorporate a simple process such as AR(1), the result would likely be a questionable approximation at best. At the same time, failure to adequately account for serial correlation can cause inconsistent covariance estimates and incorrectly interpreted test statistics.

In summary, the estimation of markets in disequilibrium has been severely limited by the problems of specifying (1) price adjustment; (2) serial correlation; (3) the distributions of the error terms up to an unknown parameter vector; and (4) the quantity transacted as a function of desired supply and demand.

1.2 Solutions

This thesis addresses the above problems by examining their effects, and by proposing and demonstrating solutions.

Chapters 2, 3, and 4 are directed at the problems of specifying price adjustment and specifying serial correlation. In Chapter 2, we propose using the switching regression model with imperfect sample separation of Lee and Porter (1984) to incorporate price adjustment into disequilibrium models. The model enables price adjustment to be incorporated with less a priori information than usual. To estimate the model, ML and LS estimators are proposed.

In Chapter 3, the asymptotic properties of the ML estimator are examined in the context of possible serial correlation. This chapter builds on previous results of Hartley and Mallela (1977), and Amemiya and Sen (1977). By incorporating into their results some recent developments in modeling serial correlation by White and Domowitz (1984) and others, the analysis permits the data to be characterized by unknown and general forms of serial correlation. At the same time, the estimation problem remains computationally tractable.

In Chapter 4, the practical importance of the methodology developed in Chapters 2 and 3 is illustrated with an empirical example. The methodology is applied to monthly data on the U.S. commercial loan market from 1979 to 1984.

The final chapter, Chapter 5, proposes semiparametric models and estimators for markets in disequilibrium. Unlike the previous chapters, the results of Chapter 5 apply when the functional forms of the error distribution functions are unknown, and the observed quantity transacted is an unknown function of desired supply and demand. Consistent semiparametric estimators are derived by extending the method of maximum score of Manski (1975, 1985) to a new class of applications.

Although the focus is on the disequilibrium estimation problem, many of the issues addressed are applicable to other important problems as well. From a general perspective, the central issue is how to deal with an estimation problem characterized by less information than is usually assumed. The methodology with which we confront the issue brings together important works from the areas of limited dependent variables, nonlinear estimation, asymptotic theory, data analysis, maximum likelihood, least squares, and semiparametric estimation.

NOTES

1. One notable difference among many proposed disequilibrium specifications is the treatment of price adjustment. Different price adjustment models often produce different coefficient estimates and inferences for given supply and demand equations. Most studies assume normally distributed error terms, and that the quantity transacted is the minimum of desired supply and demand. Surveys of disequilibrium specifications commonly used in applied work can be found in Bowden (1978) and Maddala (1983).

2. General discussions of LS and ML estimators for disequilibrium models can be found in Bowden (1978) and Maddala (1983).

3. The random variable $\varepsilon_{1t}$ conditional on $D_t$ [...]

PA2. ignore whatever information the direction of the price change contains on excess demand. (This is a limited-information approach, as no attempt is made to model price adjustment.) In the next section, we shall see that this specification can be usefully viewed as imposing the following constraint:

$$\Pr(\Delta p_{t+1} > 0 \mid D_t > S_t) = \Pr(\Delta p_{t+1} > 0 \mid D_t < S_t).$$

[...]

The failure of many empirical studies to adequately represent price adjustment stems from a failure to carefully assess what is known a priori. For most applications PA1 imposes too much structure, and PA2 imposes too little. What is needed is an approach which allows price and excess demand to interact, but at the same time is unrestrictive. I propose nesting PA1 and PA2 in a more general model using a method suggested by Lee and Porter (1984). In many respects the approach is less restrictive than usual. Price adjustments are assumed to be governed by the following condition:

PA3. The direction of the price change is most likely, but not certain, to follow the direction of excess demand, or equivalently

$$\Pr(\Delta p_{t+1} > 0 \mid D_t > S_t) \ge \Pr(\Delta p_{t+1} > 0 \mid D_t < S_t).$$

[...]

The model and its maximum likelihood estimator are discussed in section 2.2. In section 2.3 a convenient least squares approach is proposed which has not been previously available for disequilibrium models. The LS estimator resembles that suggested by Heckman (1976) for the Tobit model. Although the ML estimator presented in section 2.2 is more efficient, the LS estimator is easier to compute, and provides consistent starting values if the ML estimates are desired. An initial consistent estimator is especially important when PA1 is relaxed, since the resulting likelihood generally has multiple solutions.

2.2 The Model and Maximum Likelihood Estimation

I propose the following model:

$$Q_t = \min(D_t, S_t)$$
$$\Pr(\Delta p_{t+1} > 0 \mid D_t > S_t) \ge \Pr(\Delta p_{t+1} > 0 \mid D_t < S_t)$$

[...] ($D_t > S_t$) and

surpluses ($D_t < S_t$) [...] $1(\Delta p_{t+1} > 0))$, where $1(\cdot)$ is the indicator function, and the problem is to estimate the unknown supply and demand parameters along with the conditional distribution of $1(\Delta p_{t+1} > 0)$ subject to PA3. To make the problem operational we will adopt the methodology of Lee and Porter (1984), which entails the following assumptions:¹

Assumption 2.1. Given $D_t > S_t$ (or $D_t < S_t$) [...] $1(\Delta p_{t+1} > 0)$ are mutually independent for all $t$;

Assumption 2.2. the conditional probabilities of PA3 do not vary with $t$.

Assumption 2.2 is the simplest assumption that allows the price adjustment probabilities to be treated as estimable parameters, but is not the only possible way of doing so. For example, if it is suspected that price setting behavior differs between certain subsamples, then a different pair of parameters could be defined for each. One possible application might be a market where prices are regulated in some periods, but not in others. Alternatively, a completely varying-parameter approach is developed in Chapter 5.

Although assumptions 2.1 and 2.2 are still somewhat restrictive, arguably the benefits obtained from imposing them outweigh the costs. Price adjustment is incorporated without an explicit adjustment equation, a specific distribution for price changes, or the restriction that price changes reveal the sign of

excess demand. Furthermore, as we shall see next, estimation is relatively straightforward under assumptions 2.1 and 2.2.

The log likelihood function of $n$ independent observations on $(Q_t, 1(\Delta p_{t+1} > 0))$ is

$$L_n = \sum_{t=1}^{n} \log f_t(Q_t, 1(\Delta p_{t+1} > 0)), \tag{2.3}$$

where [...]. Under fairly general conditions, a consistent and asymptotically normal estimate of $(\theta^0, p_{11}^0, p_{10}^0)$ can be obtained by maximizing $L_n$ over an appropriate parameter space. (The asymptotic properties of a maximizer of $L_n$ are developed in the next chapter.)

The Maddala-Nelson estimators discussed in section 2.1 are obtained by maximizing $L_n$ subject to

(PA1): $(p_{11}, p_{10}) = (1, 0)$;
(PA2): $p_{11} = p_{10}$.

As was noted, however, applying these two estimators to a given data set can produce conflicting results. One advantage of specifying PA3 is that the parameter space includes the entire region $\{(p_{11}, p_{10}) : 1 \ge p_{11} \ge p_{10} \ge 0\}$, and consequently it is not necessary to choose between PA1 and PA2.
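
A minimal sketch of how a log likelihood of this kind could be evaluated. It assumes independent normal errors (the case used in section 2.3), $Q_t = \min(D_t, S_t)$, and assumptions 2.1 and 2.2. The density below is derived from those assumptions rather than copied from the text, and all function and argument names are hypothetical.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def joint_density(q, price_up, mu_d, mu_s, sd_d, sd_s, p11, p10):
    """Density of (Q_t, 1(dp_{t+1} > 0)) under the assumptions stated above.

    p11 = Pr(dp > 0 | D > S), p10 = Pr(dp > 0 | D < S).
    """
    # Excess demand (D > S): Q equals S, and D lies above the observed Q.
    shortage = norm_pdf((q - mu_s) / sd_s) / sd_s * (1.0 - norm_cdf((q - mu_d) / sd_d))
    # Excess supply (D < S): Q equals D, and S lies above the observed Q.
    surplus = norm_pdf((q - mu_d) / sd_d) / sd_d * (1.0 - norm_cdf((q - mu_s) / sd_s))
    if price_up:
        return p11 * shortage + p10 * surplus
    return (1.0 - p11) * shortage + (1.0 - p10) * surplus


def log_likelihood(data, beta_d, beta_s, sd_d, sd_s, p11, p10):
    """Sum of log densities over observations (q, price_up, x)."""
    total = 0.0
    for q, price_up, x in data:
        mu_d = sum(b * v for b, v in zip(beta_d, x))
        mu_s = sum(b * v for b, v in zip(beta_s, x))
        total += math.log(joint_density(q, price_up, mu_d, mu_s, sd_d, sd_s, p11, p10))
    return total
```

In this sketch, setting `p11 = 1` and `p10 = 0` corresponds to PA1 (the price change separates the regimes), while `p11 == p10` corresponds to PA2 (the price indicator carries no information on the regime). A numerical optimizer would be applied to `log_likelihood` over the region $1 \ge p_{11} \ge p_{10} \ge 0$.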

By viewing the Maddala-Nelson estimators as constrained maximizers of $L_n$, two additional limitations that are overcome by specifying PA3 can be seen. First, if the direction of the price change does not always follow the sign of excess demand, so that $p_{11}^0 < 1$ or $p_{10}^0 > 0$, then the estimator obtained by maximizing $L_n$ subject to $(p_{11}, p_{10}) = (1, 0)$ is inconsistent. In other words, if it is incorrectly assumed that $1(\Delta p_{t+1} > 0)$ separates the sample into the underlying demand ($Q_t = D_t$) and supply ($Q_t = S_t$) regimes, then the resulting estimates will generally be inconsistent. To see this, denote the constrained estimator by $\hat{\theta}_n(1,0)$, and suppose that $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are normally distributed independent random variables, and $p_{11}^0 <$ [...] $1(\Delta p_{t+1} > 0)$ contain information on $\theta^0$ that is exploited by the maximizer of $L_n$ only if the restriction $p_{11} = p_{10}$ is not imposed. Since imposing $p_{11} = p_{10}$ is equivalent to estimating the model without a price adjustment specification, this implies that one is better off using even limited amounts of price adjustment information rather than neglecting it altogether. This can be seen by examining the difference between the corresponding information matrices of the constrained and unconstrained estimators of

$\theta$. For this purpose note that $p_{11} = p_{10}$ implies that $Q_t$ and $1(\Delta p_{t+1} > 0)$ are independent, the marginal distribution of $1(\Delta p_{t+1} > 0)$ does not depend on $\theta$, and therefore the $\theta$-estimator obtained by maximizing $L_n$ subject to $p_{11} = p_{10}$ can be written as

$$\hat{\theta}_n(p_{11} = p_{10}) = \arg\max_{\theta} \sum_{t=1}^{n} \log g_t(Q_t),$$

where $g_t$ is the density of $Q_t$ given $x_t$. Since $\hat{\theta}_n(p_{11} = p_{10})$ does not require the joint observation $(Q_t, 1(\Delta p_{t+1} > 0))$, it uses one more observation on $Q$ than the $\theta$-estimator obtained by maximizing $L_n$ over the unconstrained space, and therefore we write the latter estimator as

$$\hat{\theta}_{n-1}(p_{11} \neq p_{10}) = \arg\max_{(\theta, p_{11}, p_{10})} \sum_{t=1}^{n-1} \log f_t(Q_t, 1(\Delta p_{t+1} > 0)).$$

Unlike $\hat{\theta}_n(p_{11} = p_{10})$, the estimator $\hat{\theta}_{n-1}(p_{11} \neq p_{10})$ uses the price adjustment information implied by $p_{11}^0 \neq p_{10}^0$, namely the dependence of $1(\Delta p_{t+1} > 0)$ and $Q_t$. For simplicity suppose that the observations are identically distributed. The trade-off between the extra observation on $Q$ used in $\hat{\theta}_n(p_{11} = p_{10})$, and the price adjustment information exploited by $\hat{\theta}_{n-1}(p_{11} \neq p_{10})$, is apparent in the difference between their corresponding information matrices:

$$(n-1)E(-\partial^2 \log f / \partial\theta\,\partial\theta') - nE(-\partial^2 \log g / \partial\theta\,\partial\theta') = (n-1)E(-\partial^2 \log h / \partial\theta\,\partial\theta') - E(-\partial^2 \log g / \partial\theta\,\partial\theta'),$$

where $h_t$ is the density of $1(\Delta p_{t+1} > 0)$ given $(Q_t, x_t)$. In large samples the information provided by the extra observation on $Q$ in $\hat{\theta}_n(p_{11} = p_{10})$ is insignificant, and clearly $\hat{\theta}_{n-1}(p_{11} \neq p_{10})$ is the more efficient estimator.
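
The comparison of information matrices can be sketched in a few lines. The shorthand $I_f$, $I_g$, $I_h$ is introduced here for illustration only, and the usual regularity conditions for information matrices are assumed.

```latex
% Factor the joint density into the marginal of Q_t and the conditional of the price indicator:
f_t\bigl(Q_t, 1(\Delta p_{t+1} > 0)\bigr) = g_t(Q_t)\, h_t\bigl(1(\Delta p_{t+1} > 0) \mid Q_t\bigr)
\;\Longrightarrow\;
\log f_t = \log g_t + \log h_t .

% Write I_f = E(-\partial^2 \log f / \partial\theta\,\partial\theta'), and similarly I_g, I_h. Then
I_f = I_g + I_h ,
\qquad
(n-1)\, I_f - n\, I_g = (n-1)\, I_h - I_g .

% I_h is positive semidefinite, and it vanishes when p_{11} = p_{10},
% because h_t then does not depend on \theta.
% When I_h \neq 0, the term (n-1) I_h grows with n while I_g stays fixed.
```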

Having developed a fairly unrestrictive approach for introducing price adjustment into the disequilibrium model, an important question remains: is maximization of $L_n$ over $(\theta, p_{11}, p_{10})$ computationally tractable? This question is important given a common structure shared by both $L_n$ with $(p_{11}, p_{10})$ unrestricted, and $L_n$ restricted by $p_{11} = p_{10}$: neither specification permits the observations to be separated into the underlying supply and demand regimes, and hence both are switching regression models with unknown sample separation. The question of tractability arises because likelihood functions of unknown sample separation models generally have an unknown number of local maxima, and finding the consistent and asymptotically normal estimate (the global maximum) usually requires an exhaustive set of local candidates. For example, Maddala and Nelson found that three different starting values produced three different sets of estimates, and were not able to rule out the possibility of other solutions. Unfortunately, the extra information provided by the joint observation $(Q_t, 1(\Delta p_{t+1} > 0))$ does not automatically eliminate the problem; in general, $L_n$ is likely to have multiple solutions.

Fortunately, the problem can be circumvented. If one is willing to assume that $p_{11} > p_{10}$, then it is possible to construct a computationally simple and consistent estimator of $(\theta, p_{11}, p_{10})$, and therefore obtain consistent starting values to iterate to a local maximum of $L_n$. The consistency of the initial estimates generally guarantees the consistency and asymptotic normality of the resulting solution.² The next task is to describe the initial estimator.

2.3 A Consistent Initial Least Squares Estimator

While computationally simple and consistent estimators have been proposed for other limited dependent variable models such as the Tobit,

similar results have not been previously available for the disequilibrium model with unknown sample separation. Ironically, the models for which such estimators have been available generally possess tractable likelihood functions, and therefore finding consistent initial estimates is of limited value. A prime example is the Tobit model, for which consistent initial estimators were proposed by Amemiya (1973) and Heckman (1976); their estimators are not particularly useful for the Tobit, as this model has a globally concave likelihood function (when suitably parameterized) which ensures convergence to the consistent and asymptotically normal maximizer from any starting values.³ In contrast, the likelihood functions of models with unknown sample separation are likely to have multiple maxima, and therefore finding initial consistent estimates for these models is crucial.

The estimator described below extends the approaches suggested by Amemiya and Heckman to disequilibrium models with unknown sample separation. The method requires the first moment of $1(\Delta p_{t+1} > 0)$ $(t = 1, \ldots, n)$, and the first and second moments of $Q_t$ $(t = 1, \ldots, n)$. Least squares is then applied successively to three estimation equations. Assuming that $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are independent normally distributed random variables, the relevant equations are

$$1(\Delta p_{t+1} > 0) = E\,1(\Delta p_{t+1} > 0) + u_{1t} \tag{2.5}$$
$$Q_t = E(Q_t) + u_{2t} \tag{2.6}$$
$$Q_t^2 = E(Q_t^2) + u_{3t} \tag{2.7}$$

where

$$E\,1(\Delta p_{t+1} > 0) = p_{11}^0 \bigl(1 - \Phi(x_t\gamma^0)\bigr) + p_{10}^0\, \Phi(x_t\gamma^0), \tag{2.8}$$

$$E(Q_t) = \Phi(x_t\gamma^0)\, x_t\beta_1^0 + \bigl(1 - \Phi(x_t\gamma^0)\bigr)\, x_t\beta_2^0 - \bigl(\sigma_{\varepsilon_1}^{0\,2} + \sigma_{\varepsilon_2}^{0\,2}\bigr)^{1/2} \phi(x_t\gamma^0), \tag{2.9}$$

$$\begin{aligned}
E(Q_t^2) ={}& \bigl(1 - \Phi(x_t\gamma^0)\bigr)(x_t\beta_2^0)^2 + \Phi(x_t\gamma^0)(x_t\beta_1^0)^2 \\
&+ \sigma_{\varepsilon_2}^{0\,2}\Bigl(1 - \Phi(x_t\gamma^0) - 2x_t\beta_2^0\, \phi(x_t\gamma^0)\big/\bigl(\sigma_{\varepsilon_1}^{0\,2} + \sigma_{\varepsilon_2}^{0\,2}\bigr)^{1/2} + \phi(x_t\gamma^0)\, x_t\gamma^0\Bigr) \\
&+ \sigma_{\varepsilon_1}^{0\,2}\Bigl(\Phi(x_t\gamma^0) - 2x_t\beta_1^0\, \phi(x_t\gamma^0)\big/\bigl(\sigma_{\varepsilon_1}^{0\,2} + \sigma_{\varepsilon_2}^{0\,2}\bigr)^{1/2} - \phi(x_t\gamma^0)\, x_t\gamma^0\Bigr),
\end{aligned} \tag{2.10}$$

$\gamma^0 = (\beta_2^0 - \beta_1^0)\big/\bigl(\sigma_{\varepsilon_1}^{0\,2} + \sigma_{\varepsilon_2}^{0\,2}\bigr)^{1/2}$, and $\phi(\cdot)$ and $\Phi(\cdot)$ denote the standard normal density and c.d.f., respectively.

Given appropriate regularity conditions, nonlinear LS applied to equation (2.5) yields consistent estimates of $p_{11}^0$, $p_{10}^0$, and $\gamma^0$. These estimates are then used to estimate the nonlinear functions, $\phi$ and $\Phi$, in equation (2.6). Ordinary LS can then be applied to (2.6) to consistently estimate $\beta_1^0$, $\beta_2^0$, and $(\sigma_{\varepsilon_1}^{0\,2} + \sigma_{\varepsilon_2}^{0\,2})$. Finally, the nonlinear functions of equation (2.7) are estimated so that OLS can be applied, and consistent estimates of $\sigma_{\varepsilon_1}^{0\,2}$ and $\sigma_{\varepsilon_2}^{0\,2}$ obtained. The asymptotic properties of the LS estimator are developed in appendix A.2.

Interestingly, the above approach is possible only if price changes provide some information on whether there is excess demand or supply; i.e., $p_{11}^0 \neq p_{10}^0$. This can be seen from equation (2.8), which can be interpreted as the probability that $1(\Delta p_{t+1} > 0)$ is equal to one. If price changes are completely uninformative on excess demand or supply, then $p_{11}^0 = p_{10}^0$, and it follows from (2.8) that the distribution of $1(\Delta p_{t+1} > 0)$ is independent of $\gamma^0$. In this case the observations on $1(\Delta p_{t+1} > 0)$ contain no information on the supply and demand parameters, and therefore equation (2.5) is irrelevant for the estimation of the model.
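
The three regression functions can be checked numerically. The sketch below computes the first moment of the price indicator and the first two moments of $Q_t = \min(D_t, S_t)$ for independent normal errors, then compares them with a simulation. It is an illustration under stated assumptions, not the estimator itself: the sign convention $x_t\gamma = (x_t\beta_2 - x_t\beta_1)/(\sigma_{\varepsilon_1}^2 + \sigma_{\varepsilon_2}^2)^{1/2}$, so that $\Phi(x_t\gamma) = \Pr(D_t < S_t)$, is assumed here, and all names are hypothetical.

```python
import math
import random


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def moments(xb1, xb2, var1, var2, p11, p10):
    """Return (E 1(dp > 0), E Q, E Q^2) for Q = min(D, S).

    xb1, xb2 are the demand and supply means; var1, var2 the error variances.
    """
    sigma = math.sqrt(var1 + var2)
    c = (xb2 - xb1) / sigma                  # x_t * gamma (assumed convention)
    Phi, phi = norm_cdf(c), norm_pdf(c)      # Phi = Pr(D < S)
    e_up = p11 * (1.0 - Phi) + p10 * Phi
    e_q = Phi * xb1 + (1.0 - Phi) * xb2 - sigma * phi
    e_q2 = (Phi * xb1 ** 2 + (1.0 - Phi) * xb2 ** 2
            + var1 * (Phi - 2.0 * xb1 * phi / sigma - c * phi)
            + var2 * (1.0 - Phi - 2.0 * xb2 * phi / sigma + c * phi))
    return e_up, e_q, e_q2


def simulate(xb1, xb2, var1, var2, p11, p10, n, seed=0):
    """Monte Carlo averages of the same three quantities."""
    rng = random.Random(seed)
    s_up = s_q = s_q2 = 0.0
    for _ in range(n):
        d = xb1 + rng.gauss(0.0, math.sqrt(var1))
        s = xb2 + rng.gauss(0.0, math.sqrt(var2))
        q = min(d, s)
        p_up = p11 if d > s else p10         # price rises with prob. p11 or p10
        s_up += 1.0 if rng.random() < p_up else 0.0
        s_q += q
        s_q2 += q * q
    return s_up / n, s_q / n, s_q2 / n
```

Given consistent estimates of $\gamma$, the second and third expressions are linear in the remaining parameters, which is what would allow ordinary least squares to be used in the later steps. When `p11 == p10`, the first moment no longer depends on `c`, matching the remark that the first equation then carries no information on the supply and demand parameters.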

2.4 Summary and Conclusions

The main points of this chapter are

(1) Estimates of disequilibrium models are sensitive to the price adjustment specification.

(2) Economic theory imposes few restrictions on price adjustment, and consequently provides little basis for choosing between specifications.

(3) Assumption PA3 serves as an unrestrictive approach for introducing price adjustments into disequilibrium models; adjustment enters without an explicit adjustment equation, a known probability distribution for price changes, or the restriction that price changes reveal the sign of excess demand.

(4) Assumption PA3 together with assumptions 2.1 and 2.2 permits a straightforward derivation of the likelihood function. The parameter space includes but is not limited to the important special cases PA1 and PA2. Constraining the parameter space to PA1, as is often done in practice, can lead to inconsistent estimation; constraining the space to PA2 produces inefficient estimates.

(5) Under assumption PA3 the disequilibrium model is one of unknown sample separation, and therefore its likelihood function generally has multiple solutions. To resolve the problem of multiple solutions, the least squares method described in section 2.3 provides consistent initial estimates.

In Chapter 4 we apply the methodology developed in the present chapter to monthly data on the U.S. commercial loan market. Before proceeding to the application, however, the problem of serial correlation must be addressed. In the next chapter we develop some results which permit the data to be analyzed in the context of possible serial correlation.

NOTES

1. There is an important difference between the model Lee and Porter (1984) discuss, and our model. The Lee-Porter model excludes an analog to $Q_t = \min(D_t, S_t)$, and consequently in their model the switching is exogenous; i.e., the switching that occurs between the underlying regimes is independent of the error terms. In contrast, the disequilibrium model is one of endogenous switching (the "switch" depends on $(\varepsilon_{1t}, \varepsilon_{2t})$), and consequently many of the results, interpretations, and expressions found in the Lee-Porter paper must be modified accordingly.

2. In fact, given appropriate regularity conditions, consistent initial estimates ensure the consistency and asymptotic normality of the second-round estimates from a Newton-Raphson type algorithm. See, for example, Amemiya (1973, pp. 1014-15).

3. Olsen (1978) proved that the likelihood function for the Tobit model is globally concave when suitably parameterized, and thus has a single maximum.

CHAPTER 3
SOME ASYMPTOTIC THEORY FOR SERIALLY DEPENDENT OBSERVATIONS

3.1 Introduction

In this chapter we examine the asymptotic properties of the estimator discussed in section 2.1,

$$\hat{\theta}_n^{ML} = \arg\max_{\theta_p} L_n(\theta_p),$$

where $\theta_p = (\theta, p_{11}, p_{10})$, and $L_n(\theta_p)$ is defined on page 14, equation (2.3). If the observations are serially independent, then obviously $\hat{\theta}_n^{ML}$ is the MLE of $\theta_p^0$. However, for serially dependent observations, $\hat{\theta}_n^{ML}$ is not the MLE and will be referred to as the partial-MLE.

Hartley and Mallela (1977), and Amemiya and Sen (1977) derive asymptotic properties of the MLE for the special case of $p_{11} = p_{10}$. We will extend their results to the case of serially dependent observations and $p_{11} \ge p_{10}$ in sections 3.2 and 3.3. In section 3.4 we consider the problem of consistently estimating the asymptotic covariance matrix. In section 3.5 we derive a new test for serial correlation.

3.2 Consistency

Since disequilibrium models are typically estimated with time series data, it is of interest that the property of consistency can be extended to the partial-MLE. Using some results and definitions presented by White and Domowitz (1981), Levine (1983) has discussed how and why a partial-MLE can be consistent. Levine points out that the consistency of an estimator $\hat{\theta}_n(y)$ which maximizes the product

$\prod_{t=1}^{n} f_t(y_t \mid \theta)$ depends on each $f_t(y_t \mid \theta)$ satisfying certain regularity conditions. In general, whether or not $f_t(y_t \mid \theta)$ satisfies such conditions does not depend on the product being the joint density of $y = (y_1, \ldots, y_n)$, but rather it usually suffices that $f_t(\cdot)$ is the marginal density of $y_t$. The regularity conditions consist of identification conditions, and moment restrictions sufficient to apply an appropriate law of large numbers. We will show that the partial-MLE for our model can satisfy such conditions by extending some results proven by Hartley and Mallela, and Amemiya and Sen. But first it is necessary to describe the type of dependence we have in mind.

We will adopt the nonparametric approach of White and Domowitz (1984) to allow for the possibility of serial correlation. The approach of White and Domowitz is nonparametric in the sense that the observations are not required to be generated by a known parametric model such as an ARMA(p,q) process, but instead must obey general memory requirements. The memory requirements are referred to as mixing conditions, and a sequence of random variables which obey mixing conditions is said to be a mixing sequence. More precisely, we have the following definition.¹

Definition 3.1. Let $(y_t)$ denote a sequence of random vectors defined on a probability space $(\Omega, F, P)$, and let $F_a^b$ denote the Borel $\sigma$-field of events generated by the random variables $y_a, y_{a+1}, \ldots, y_b$. Define [...]

(i) If $\phi(m) \to 0$ as $m \to \infty$, and $\phi(m) = O(m^{-k})$ for $k > r/(2r-1)$, where $r \ge 1$, then $(y_t)$ is a mixing sequence with $\phi(m)$ of size $r/(2r-1)$.

(ii) If $\alpha(m) \to 0$ as $m \to \infty$, and $\alpha(m) = O(m^{-k})$ for $k > r/(r-1)$, where $r > 1$, then $(y_t)$ is a mixing sequence with $\alpha(m)$ of size $r/(r-1)$.

$\phi(m)$ and $\alpha(m)$ measure how much dependence exists between observations at least $m$ periods apart. A sequence such that $\phi(m) \to 0$ as $m \to \infty$ is called uniform mixing or $\phi$-mixing, and a sequence for which $\alpha(m) \to 0$ as $m \to \infty$ is called strong mixing or $\alpha$-mixing. Since the dependence coefficients, $\phi(m)$ and $\alpha(m)$, are required to vanish asymptotically, mixing is a form of asymptotic independence.

A fairly large class of processes satisfy mixing conditions. For example, finite order Gaussian ARMA processes are strong mixing, as are stationary Markov chains under fairly general conditions. White and Domowitz (1984) show that measurable functions of mixing processes are mixing and of the same size. This is particularly convenient for nonlinear problems. Mixing processes are useful for modeling complex economic data since they are not required to be stationary. In short, mixing conditions provide a convenient way to model an economic phenomenon that is likely to be both heterogeneous and time dependent.

The following law of large numbers, due to McLeish (1975), applies for mixing sequences.²

Theorem 3.2. Let $(y_t)$ be a sequence with $\phi(m)$ of size $r/(2r-1)$ or $\alpha(m)$ of size $r/(r-1)$, $r > 1$, such that $E|y_t|^{r+d} < \Delta < \infty$ for some $d > 0$, and all $t$. Then

$$(1/n)\sum_{t=1}^{n} \bigl(y_t - E(y_t)\bigr) \xrightarrow{p} 0.$$

All proofs of theorems in this chapter are provided in Appendix A.3.
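
As a small numerical illustration of a law of large numbers under serial dependence, the sketch below averages a Gaussian AR(1) series, one of the strong mixing processes mentioned above. It is a toy example, not part of the theory; the function name, the AR coefficient, and the start-up value of zero are assumptions made for illustration.

```python
import random


def ar1_sample_mean(n, rho, mean, seed=0):
    """Sample mean of y_t = mean + u_t, with u_t = rho * u_{t-1} + e_t, e_t ~ N(0, 1).

    The recursion starts from u_0 = 0 rather than from a stationary draw,
    so the series is mildly heterogeneous in its first few periods.
    """
    rng = random.Random(seed)
    u = 0.0
    total = 0.0
    for _ in range(n):
        u = rho * u + rng.gauss(0.0, 1.0)
        total += mean + u
    return total / n


if __name__ == "__main__":
    # The centered sample mean shrinks toward zero as n grows, despite the dependence.
    for n in (100, 10_000, 1_000_000):
        print(n, ar1_sample_mean(n, rho=0.5, mean=2.0, seed=3) - 2.0)
```

With `abs(rho) < 1` the dependence between observations dies out geometrically, which is the exponential memory decay case in which the moment restriction is least demanding.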

For Theorem 3.2 to be applicable to a given sequence, it is clear that there is a trade-off between the moment restriction that the sequence must satisfy, and allowable dependence. The stronger the moment restriction satisfied, the more dependence as measured by $\phi(m)$ (or $\alpha(m)$) is allowed. If the members of the sequence are independent, then we can set $r = 1$, and Theorem 3.2 collapses to the Markov law of large numbers. For sequences with exponential memory decay, $r$ can be set arbitrarily close to one. In general, the longer the memory of a sequence, the larger is the size of $\phi(m)$ and $\alpha(m)$, and consequently the more stringent the moment restriction (which depends on $r$) becomes.

By using mixing conditions to restrict the serial behavior of the sequence $(Q_t, 1(\Delta p_{t+1} > 0), x_t)$, it is not necessary to specify an additional parametric model such as an ARMA(p,q) process. Consequently, one possible source of model misspecification is eliminated. Mixing conditions enable us to include a larger class of models in the analysis. Of course, as Theorem 3.2 implies, the precise size of the class will depend on what moment restrictions are satisfied. We are now ready to state conditions which ensure the consistency of the partial-MLE (and the MLE) of $\theta_p^0$.

In order to establish consistency for the partial-MLE we impose the following assumptions on the disequilibrium model presented in section 2.2:

Assumption 3.3. (allowable serial dependence): The sequence $(Q_t, 1(\Delta p_{t+1} > 0), x_t)$ is a mixing sequence with $\phi(m)$ of size $r/(2r-1)$, $r \ge 1$, or $\alpha(m)$ of size $r/(r-1)$, $r > 1$.

Assumption 3.4. (distributions):

(i) The random vector $(\varepsilon_{1t}, \varepsilon_{2t})$ is normally distributed with mean zero and covariance matrix

$$\begin{pmatrix} \sigma_{\varepsilon_1}^{0\,2} & 0 \\ 0 & \sigma_{\varepsilon_2}^{0\,2} \end{pmatrix}.$$

(ii) Assumptions 2.1 and 2.2 hold. (See page 13.)

Assumption 3.5. (the regressors):

(i) The vector $x_t$ consists of only exogenous variables.

(ii) Each component of $x_t$ is uniformly bounded in $t$, has a finite range for each $t$, and a support given by $\mathcal{S}_t = \mathcal{S}$ for all $t$.

(iii) Any linear combination of the components of $x_t$ where the coefficients are not all zero is not zero with probability one.

Assumption 3.6. (the parameter space):

(i) The parameter space $\Xi$ includes the true parameter vector $\theta_p^0 = (\beta_1^0, \sigma_{\varepsilon_1}^{0\,2}, \beta_2^0, \sigma_{\varepsilon_2}^{0\,2}, p_{11}^0, p_{10}^0)$, excludes the region $\sigma_{\varepsilon_i}^2 \le 0$ $(i = 1, 2)$ and $p_{10} > p_{11}$, and is a compact subset of a Euclidean space.

(ii) If the set $\Xi$ includes points such that $p_{11} = p_{10}$, then it excludes the point $\theta_p^* = (\beta_2^0, \sigma_{\varepsilon_2}^{0\,2}, \beta_1^0, \sigma_{\varepsilon_1}^{0\,2}, p_{11}^0, p_{10}^0)$. Otherwise $\Xi$ may include $\theta_p^*$.

With a few exceptions, the conditions on the regressors and the parameter space are identical to those given by Hartley and Mallela (1977), and Amemiya and Sen (1977). One exception is that we place no restrictions on the limiting behavior of the empirical distribution of the regressors, whereas Hartley and Mallela require it to converge completely to a nondegenerate distribution. As pointed out by White

(1980), in sampling situations where the researcher has little control over the data, it is important to allow for the possibility that the empirical distribution does not converge. In contrast to Amemiya and Sen, we do not require the regressors to be i.i.d., but for convenience retain their assumption that the regressors are discrete random variables.

Assumption 3.6(ii) is necessary to identify the true parameter vector $\theta_p^0$. Without appropriate prior information on $\beta_1^0$ and $\beta_2^0$, the point $\theta_p^*$ is indistinguishable from $\theta_p^0$ and the model cannot be estimated. This is the problem of interchanging regimes which is discussed by Hartley and Mallela, and Amemiya and Sen. Both studies point out that $\theta_p^*$ is eliminated from the parameter space if the usual "order condition" holds. We will extend this result below by showing that for $\theta_p^*$ to be distinguishable from $\theta_p^0$, it suffices to know a priori that $p_{11}^0 > p_{10}^0$. In this sense prior sample separation information represents prior information on the supply and demand parameters.

Hoadley (1971) has generalized the Wald argument to the case of independent not identically distributed observations. Theorem 3.7 below is an extension of Hoadley's argument to mixing sequences, and will be used to verify that assumptions 3.3, 3.4, 3.5, and 3.6 imply consistency for the partial-MLE, $\hat{\theta}_n^{ML}$.³

Theorem 3.7. Suppose:

(i) The sequence $(y_t)$ is a mixing sequence with $\phi(m)$ of size $r/(2r-1)$, $r \ge 1$, or $\alpha(m)$ of size $r/(r-1)$, $r > 1$.

(ii) The parameter space $\Xi$ is a compact subset of a Euclidean space.

(iii) The function $f_t(y_t \mid \theta)$ is continuous on $\Xi$, uniformly in $t$, a.e.

PAGE 35

29

(iv) The function
$$\sup\{\ln(f_t(y_t \mid \theta')/f_t(y_t \mid \theta^0)) : |\theta' - \theta| \le \rho\}$$
is a measurable function of $y_t$ for each $\theta$ belonging to $\Theta$.
(v) There exist $\rho^*(\theta) > 0$ and $d > 0$ such that [...] for $0 \le \rho \le \rho^*(\theta)$.
(vi) $\limsup \{ n^{-1} \sum_{t=1}^{n} E(\ln(f_t(y_t \mid \theta)/f_t(y_t \mid \theta^0))) \} < 0$ [...]

[...] $(Q_t, 1(\Delta p_{t+1} > 0), x_t)$, and $f_t(Q_t, 1(\Delta p_{t+1} > 0) \mid \theta_p)$ satisfy 3.7(i)-(vi) given Assumptions 3.3-3.6. The fact that the mixing and compactness requirements 3.7(i) and 3.7(ii) are satisfied follows directly from Assumptions 3.3 and 3.6(i). Lemma 3.8 establishes that $f_t(Q_t, 1(\Delta p_{t+1} > 0) \mid \theta_p)$ satisfies the continuity requirement 3.7(iii).

PAGE 36

30

Lemma 3.8. Given Assumptions 3.4-3.6, $f_t(Q_t, 1(\Delta p_{t+1} > 0) \mid \theta_p)$ is a continuous function of $\theta_p$ uniformly in $t$, a.e.

Lemma 3.9 establishes that the measurability requirement, 3.7(iv), is satisfied.

Lemma 3.9. Given Assumptions 3.3-3.6, the function
$$\sup\{\ln(f_t(Q_t, 1(\Delta p_{t+1} > 0) \mid \theta_p')/f_t(Q_t, 1(\Delta p_{t+1} > 0) \mid \theta_p^0)) : |\theta_p' - \theta_p| \le \rho\}$$
is a measurable function of $(Q_t, 1(\Delta p_{t+1} > 0), x_t)$.

The moment restriction, 3.7(v), together with 3.7(i) determines the amount of dependence allowable. The following lemma extends Hartley and Mallela's Corollary 4.2, and establishes that 3.7(v) is satisfied for large $r+d$.

Lemma 3.10. Given Assumptions 3.3-3.6, for all sufficiently small $\rho = \rho(\theta_p) > 0$,
$$E\,\big|\sup\{\ln(f_t(Q_t, 1(\Delta p_{t+1} > 0) \mid \theta_p')/f_t(Q_t, 1(\Delta p_{t+1} > 0) \mid \theta_p^0)) : |\theta_p' - \theta_p| \le \rho\}\big|^k \le \Delta < \infty,$$
where $k$ is any positive integer.

Finally, Lemma 3.11 establishes that the identification condition, 3.7(vi), is satisfied. Lemma 3.11 extends Amemiya and Sen's Lemmas 2 and 3 to the case of $p_{11} \ge p_{10}$.

Lemma 3.11. Given Assumptions 3.3-3.6, for $\theta_p \ne \theta_p^0$ there exists a negative constant $b(\theta_p)$ such that [...]

We have proven the following theorem.

PAGE 37

31

Theorem 3.12. Given Assumptions 3.3-3.6, plim $\hat\theta_n^{ml} = \theta_p^0$.

3.3 Asymptotic Normality

Under the assumption that $(Q_t, 1(\Delta p_{t+1} > 0), x_t)$ is a mixing sequence, we consider the limiting distribution of
$$n^{-1/2} V_n^{-1/2}(\theta_p^0)\, \nabla L_n(\theta_p^0),$$
where $\nabla L_n(\theta_p)$ denotes the gradient vector corresponding to $L_n(\theta_p)$, and $V_n(\theta_p^0) = \mathrm{var}(n^{-1/2} \nabla L_n(\theta_p^0))$. We will discuss conditions that imply asymptotic normality; that is,
$$n^{-1/2} V_n^{-1/2}(\theta_p^0)\, \nabla L_n(\theta_p^0) \overset{A}{\sim} N(0, I), \qquad (3.13)$$
where $I$ denotes an identity matrix of appropriate dimensions. The results in this section together with those in the next section permit derivation of asymptotic test statistics.

As is well known, asymptotic normality is proven by an appropriate application of a central limit theorem. As with consistency, the conditions sufficient for asymptotic normality depend on the degree of dependence and heterogeneity the sequence exhibits. For a sequence of independent identically distributed random vectors, we have the Lindeberg-Levy Theorem; for independent not identically distributed we have the Lindeberg-Feller Theorem; for dependent identically distributed we have the central limit theorem of Gordin (1969); for dependent not identically distributed we have the central limit theorem of Serfling (1968). For the case of independent observations, Hartley and Mallela (1977) prove the asymptotic normality result (3.13) by applying a version of the Lindeberg-Feller Theorem. However, by specifying the sequence $(Q_t, 1(\Delta p_{t+1} > 0), x_t)$ as mixing, a more general result is

PAGE 38

32

possible. The following theorem is based on Theorem 2.4 of White and Domowitz (1984) which generalizes Serfling's (1968) central limit theorem.

Theorem 3.14. Suppose:
(i) Let $\nabla L_n(\theta_p) = \sum_{t=1}^{n} \nabla L_t(\theta_p)$. Then $E(\nabla L_t(\theta_p^0)) = 0$ for all $t$.
(ii) Let $\lambda$ be any nonzero vector, and define $\nabla L_{n,a}(\theta_p)$ [...]. Then there exists a matrix $V$ such that $\det(V) > 0$, and [...] as $n \to \infty$ uniformly in $a$.
(iii) [...]
If $\phi(m)$ or $\alpha(m)$ is of size $r/(r-1)$, then (3.13) holds.

Condition 3.14(i) is the familiar condition that the vector of likelihood equations, when evaluated at the true parameter vector $\theta_p^0$, has zero expectation. Sufficient conditions for 3.14(i) are (1) the model is correctly specified, and (2) the density of $(Q_t, 1(\Delta p_{t+1} > 0), x_t)$ is sufficiently regular to permit differentiation under the integral sign.

Condition 3.14(ii) is somewhat restrictive, but unfortunately a less restrictive replacement for it is currently not available. Condition 3.14(ii) restricts the heterogeneity of $\nabla L_t(\theta_p^0)$ by requiring it to be covariance stationary asymptotically.

PAGE 39

33

Condition 3.14(iii) is a moment condition which depends on the amount of dependence the sequence $(Q_t, 1(\Delta p_{t+1} > 0), x_t)$ exhibits. If the sequence is serially independent, then $r$ can be set arbitrarily close to one; as the amount of dependence increases, as measured by $\phi(m)$ or $\alpha(m)$, $r$ increases accordingly.

3.4 Consistent Covariance Estimation

We consider the problem of deriving consistent estimators for the asymptotic covariance matrix of the partial-MLE $\hat\theta_n^{ml}$. The expression for the asymptotic covariance matrix is [...] where $V_n(\theta_p^0) = \mathrm{var}(n^{-1/2} \nabla L_n(\theta_p^0))$, and $\nabla^2 \bar L_n(\theta_p^0)$ is the matrix of second order partial derivatives of $\bar L_n(\theta_p) = E(L_n(\theta_p))$.

First consider the problem of consistently estimating the term $n \nabla^2 \bar L_n(\theta_p^0)^{-1}$. The functional form of this term does not depend on the serial dependence (or independence) of the observations, and therefore consistent estimation of it is straightforward. The following theorem, which combines Lemma 2.6 of White (1980) with Theorem 2.3 of White and Domowitz (1984), provides conditions that imply [...]

Theorem 3.15. Let $q_t(y_t, \theta)$ be measurable for each $\theta$ belonging to a compact set $\Theta$, and continuous on $\Theta$ uniformly in $t$ a.e. Suppose
(i) The sequence $(y_t)$ is mixing as stated in Definition 3.1.
(ii) For $r \ge 1$ and any $d > 0$,

PAGE 40

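34

Next consider the problem of consistently estimating $V_n(\theta_p^0)$. Unlike the term $n \nabla^2 \bar L_n(\theta_p^0)^{-1}$, the functional form of $V_n(\theta_p^0)$ depends on the nature of the serial correlation, and consequently special care must be taken. The general form for $V_n(\theta_p^0)$ is
$$V_n(\theta_p^0; c(n)) = n^{-1} \sum_{t=1}^{n} E\big(f_t(\theta_p^0) f_t(\theta_p^0)^T\big) + n^{-1} \sum_{s=1}^{c(n)-1} \sum_{t=s+1}^{n} E\big(f_t(\theta_p^0) f_{t-s}(\theta_p^0)^T + f_{t-s}(\theta_p^0) f_t(\theta_p^0)^T\big),$$
where $f_t(\theta_p) \equiv \nabla \log f_t(Q_t, 1(\Delta p_{t+1} > 0) \mid \theta_p)$, and $c(n)$ is such that $E(f_t(\theta_p^0) f_{t-s}(\theta_p^0)^T) = 0$ for $s \ge c(n)$. A natural choice for an estimator of $V_n(\theta_p^0; c(n))$ is the sample analogue $V_n(\hat\theta_n^{ml}; c(n))$. The consistency of such an estimator, however, depends on the asymptotic behavior of $c(n)$. We will consider two special cases.

Case 1. $c(n) = c$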
PAGE 41

35

An example of a sampling situation where $c$ is a known finite constant would be one in which the observations are known to be generated from a moving-average process of order $c$. If $c$ is assumed to be constant and less than or equal to $n-1$, but otherwise unknown, then the problem becomes more complicated. Let $\tilde c$ denote the specified choice for an unknown $c$. In the next section we derive an asymptotic test for the hypothesis $c = \tilde c$. The test is a possible criterion for specifying $\tilde c$.

The issues involved in specifying $\tilde c$ are the following. If we specify $\tilde c < c$, then $V_n(\hat\theta_n^{ml}; \tilde c)$ is inconsistent since nonzero terms are omitted. If $\tilde c > c$, then the estimator is consistent, but inefficient since restrictions of the form $E(f_i(\theta_p) f_j(\theta_p)^T) = 0$ are neglected. When the purpose of estimating $V_n(\theta_p^0; c)$ is to construct asymptotic test statistics, however, the essential requirement is consistency (rather than efficiency). Therefore, when the purpose is hypothesis testing, the choice $\tilde c > c$ is preferable to $\tilde c < c$.

[...] $(Q_t, 1(\Delta p_{t+1} > 0), x_t)$ be mixing. Theoretical results for this case have been presented by White and Domowitz (1984), White (1984), and Newey and West (1985). Their results depend on restricting the growth rate of $c(n)$. Unfortunately, their results do not give any guidance concerning the choice of $c(n)$ for

PAGE 42

36

finite samples. The following theorem is due to Newey and West (1985), and provides sufficient conditions for plim $(V_n(\hat\theta_n^{ml}; c(n)) - V_n(\theta_p^0; c(n))) = 0$.

Theorem 3.16. Suppose
(i) $f_t(\theta_p)$ is measurable in $(Q_t, 1(\Delta p_{t+1} > 0), x_t)$ for each $\theta_p$, and continuously differentiable in $\theta_p$ in a neighborhood $N$ of $\theta_p^0$.
(ii) (a) $\sup_{\theta_p \in N} |f_t(\theta_p)|^2 < \Delta < \infty$.
(b) There are finite constants $d > 0$ and $r \ge 1$ such that [...]
(iii) $(Q_t, 1(\Delta p_{t+1} > 0), x_t)$ is a mixing sequence with $\phi(m)$ of size 2 or $\alpha(m)$ of size $2(r+d)/(r+d-1)$, $r > 1$.
(iv) For all $t$, $E(f_t(\theta_p^0)) = 0$, and $n^{1/2}(\hat\theta_n^{ml} - \theta_p^0)$ is bounded in probability.
If $\lim c(n) = \infty$ such that $c(n) = o(n^{1/4})$, then plim $(V_n(\hat\theta_n^{ml}; c(n)) - V_n(\theta_p^0; c(n))) = 0$.

One additional problem is that for $\tilde c(n) > 1$ the estimate $V_n(\hat\theta_n^{ml}; \tilde c(n))$ is not necessarily positive semi-definite. This can lead to negative estimates of the variances and test statistics, which are clearly not acceptable. To ensure that $V_n(\hat\theta_n^{ml}; \tilde c(n))$ is positive definite, the summands can be weighted according to a procedure described in Newey and West (1985). This modification does not affect the consistency of the estimate.
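
The weighting idea can be illustrated with a short numerical sketch. It assumes that the gradient vectors $f_t$, evaluated at the parameter estimate, are available as a list of rows, and it uses linearly declining weights $1 - s/c$ on the lag-$s$ cross products, which is the scheme usually associated with Newey and West; whether this matches the exact procedure used in the original program is an assumption. The function name `weighted_covariance` and the simulated moving-average input are hypothetical.

```python
import random


def weighted_covariance(f, c):
    """Weighted sample analogue of var(n^{-1/2} * sum_t f_t).

    f : list of n rows, each a list of k gradient components (assumed to
        have mean close to zero).
    c : lags s = 1, ..., c-1 enter with weight 1 - s/c (an assumed,
        Bartlett-type weighting); c = 1 uses only the s = 0 term.
    """
    n, k = len(f), len(f[0])
    v = [[0.0] * k for _ in range(k)]
    for s in range(c):
        weight = 1.0 if s == 0 else 1.0 - s / c
        for t in range(s, n):
            for i in range(k):
                for j in range(k):
                    cross = f[t][i] * f[t - s][j]
                    if s > 0:
                        # add the transposed term f_{t-s} f_t^T for s >= 1
                        cross += f[t - s][i] * f[t][j]
                    v[i][j] += weight * cross / n
    return v


# Hypothetical input: a bivariate first-order moving-average series.
random.seed(0)
shocks = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(201)]
f = [[shocks[t][i] + 0.8 * shocks[t - 1][i] for i in range(2)]
     for t in range(1, 201)]

v = weighted_covariance(f, 3)
for row in v:
    print(["%.4f" % entry for entry in row])
```

With declining weights of this kind the estimate is positive semi-definite by construction, which is the property the unweighted sample analogue can fail to have.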

PAGE 43

37

3.5 An Asymptotic Test for Serial Correlation

In this section we propose a test sensitive to serial correlation in the gradient vectors $f_t(\theta_p)$. The test provides a criterion for specifying the constant $c$ of the covariance estimator $V_n(\hat\theta_n^{ml}; c)$.

The null hypothesis of interest is
$$H_0: E(f_t(\theta_p)_i\, f_{t-c}(\theta_p)_j) = 0 \text{ for all } i, j,$$
where $f_t(\theta_p)_i$ denotes the $i$-th component of the vector $f_t(\theta_p)$. The basis for a test of $H_0$ comes from two observations.
(1) Under $H_0$, linear combinations of the components of the vector $f_t(\theta_p)$ are uncorrelated with linear combinations of the components of the vector $f_{t-c}(\theta_p)$.
(2) Under $H_0$, the products $f_t(\hat\theta_n^{ml})_i\, f_{t-c}(\hat\theta_n^{ml})_j$ should be close to zero for sufficiently large $n$.

Therefore, a reasonable strategy for testing $H_0$ would be to compute the sample correlation between appropriate linear combinations, and reject $H_0$ if the sample correlation is too large in some sense. To this end, for a $k$-dimensional vector $f_t(\theta_p)$, consider the artificial regression
$$\sum_{i=1}^{k} w_{it}\, f_t(\hat\theta_n^{ml})_i = \sum_{i=1}^{k} a_i\, f_{t-c}(\hat\theta_n^{ml})_i,$$
where the $w_{it}$ are known constants such that $\sum_{i=1}^{k} w_{it} = 1$, and the $a_i$ are unknowns to be estimated. The test we propose entails computing the OLS estimates $\hat a_i^{ls}$, $i = 1, \ldots, k$, and testing the hypothesis $a_1 = \cdots = a_k = 0$. More formally, we have the following theorem.

PAGE 44

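38

$$f_{-c}(\theta_p) = \begin{pmatrix} f_1(\theta_p)_1 & \cdots & f_1(\theta_p)_k \\ \vdots & & \vdots \\ f_{n-c}(\theta_p)_1 & \cdots & f_{n-c}(\theta_p)_k \end{pmatrix} \quad ((n-c) \times k), \qquad f(\theta_p) = \begin{pmatrix} \sum_{i=1}^{k} w_{i,c+1}\, f_{c+1}(\theta_p)_i \\ \vdots \\ \sum_{i=1}^{k} w_{i,n}\, f_n(\theta_p)_i \end{pmatrix} \quad ((n-c) \times 1).$$

In addition to $H_0$ suppose
(i) The vector-valued function $f(\theta_p)$ is continuously differentiable (component by component) on an open convex set $\Theta \subset R^k$ containing $\theta_p^0$.
(ii) There exists an open neighborhood of $\theta_p^0$, $N$, such that $\sup_{\theta_p \in N} |f_t(\theta_p)_i| < \Delta < \infty$, and $\sup_{\theta_p \in N} |\partial f_t(\theta_p)_i / \partial \theta_p| < \Delta' < \infty$.
(iii) plim $n^{-1} \sum_{t=1}^{n} f_t(\theta_p) = 0$.
(iv) Let $A_n(\theta_p) = n^{-1} f_{-c}(\theta_p)^T f_{-c}(\theta_p)$ and $\bar A_n(\theta_p) = E(A_n(\theta_p))$. Then there exists an open neighborhood of $\theta_p^0$, $N$, such that $\bar A_n(\theta_p)$ is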

PAGE 45

39

positive definite on $N$ for all $n$ sufficiently large and plim $\sup_{\theta_p \in N} |A_n(\theta_p) - \bar A_n(\theta_p)| = 0$.
(v) Let $U_n(\theta_p) = \mathrm{var}(n^{-1/2} f_{-c}(\theta_p)^T f(\theta_p))$, and let $U_n(\hat\theta_n^{ml})$ denote the sample analogue. Then $U_n(\theta_p)$ is positive definite on an open neighborhood of $\theta_p^0$ for all $n$ sufficiently large and plim $(U_n(\hat\theta_n^{ml}) - U_n(\theta_p)) = 0$.
(vi) Under $H_0$, $U_n^{-1/2}(\theta_p^0)\, n^{-1/2} f_{-c}(\theta_p^0)^T f(\theta_p^0) \overset{A}{\sim} N(0, I)$.
Let $D_n(\theta_p) = A_n^{-1}(\theta_p)\, U_n(\theta_p)\, A_n^{-1}(\theta_p)$. Then given conditions (i)-(vi), and $H_0$, [...]

3.6 Summary and Conclusions

The main points of this chapter are the following:

(1) The assumptions presented in sections 3.2 and 3.3 imply that the partial-MLE of the disequilibrium model is consistent and asymptotically normal. The assumptions allow for serial correlation of an unknown form; for example, an arbitrary ARMA process is allowable for the observations. At the same time, the estimator $\hat\theta_n^{ml}$ is computed as though the observations were serially independent, and thus computational tractability is retained.

(2) To calculate asymptotic test statistics, a consistent estimate of the asymptotic covariance matrix is needed. Obtaining a consistent covariance estimator is complicated by the need to specify a constant $c$ such that $E(f_t(\theta_p) f_{t-s}(\theta_p)^T) = 0$ for all $s \ge c$. In general, $c$ is unknown but consistent covariance estimation depends on specifying a $\tilde c$ such that $\tilde c > c$.

(3) The test statistic presented in Section 3.5 permits a test of $H_0: c = \tilde c$, and thus provides a criterion for specifying $\tilde c$.

PAGE 46

40

NOTES

1. Our discussion of mixing draws heavily on White and Domowitz (1984), and White (1984, pp. 43-47).

2. Theorem 3.2 is a less general version of the law of large numbers presented by McLeish (1975, Theorem 2.10). The version we present is discussed in White (1984, Corollary 3.48), and imposes a stronger but simpler moment restriction.

3. White and Domowitz (1984) extend Hoadley's Theorem A.5, which is a uniform law of large numbers, to mixing sequences by applying Theorem 2.10 of McLeish (1975) instead of Markov's law of large numbers. Here we merely point out that Hoadley's Theorem 1 can be extended to mixing sequences using the same technique. In some respects the conditions of Theorem 3.7 are stronger than those stated in Hoadley's Theorem 1. For example, the requirement that $f_t(y_t \mid \theta)$ is continuous can be replaced by upper semi-continuity. The conditions that we state are sufficiently general for our purposes.

PAGE 47

CHAPTER 4
AN EMPIRICAL EXAMPLE: THE U.S. COMMERCIAL LOAN MARKET

4.1 Introduction

In this chapter the disequilibrium model described in section 2.2 (page 12) is fitted to monthly data on the U.S. commercial loan market from 1979 to 1984. The problem is to analyze disequilibrium supply and demand behavior with limited a priori information imposed on the price adjustment process. The model is estimated and tested with the partial-MLE and least squares method described in sections 2.2 and 2.3, respectively. The possibility of serial correlation is accounted for using methods described in Chapter three.

Disequilibrium models of commercial loan markets have been estimated by Laffont and Garcia (1977), Sealy (1979), and Ito and Ueda (1981). These works were consulted in designing the specification of the supply and demand equations. Our model and estimation methods, however, differ from the previous studies in three important respects. First, price enters the model differently. Laffont and Garcia, and Ito and Ueda constrained the price change to separate the sample, and Sealy assumed that price changes were a linear function of normal random variables. Second, the starting values we employ for maximizing the likelihood function are consistent estimates, and therefore ensure convergence to an asymptotically desirable solution. None of the above

41

PAGE 48

42

studies employed methods that guarantee this. Third, we will adopt the nonparametric approach developed in Chapter three to allow for the possibility of serial correlation. Given that the data are a time series, allowing for serial correlation is particularly important. Failure to do so can cause inconsistent covariance estimates and therefore misleading test statistics. In contrast, most existing disequilibrium studies, including those mentioned above, apply methods to time series data that are only appropriate for serially independent observations. The nonparametric approach was chosen for its generality and computational ease. An arbitrary ARMA process is allowable for the error terms, but at the same time the parameter estimators are computed as though the errors are serially independent. Relative to an assumption of serial independence, the only part of the problem that changes is the calculation of the asymptotic covariance estimate.

4.2 The Empirical Model

The empirical model to be estimated and tested is specified as follows.
$$D_t = \beta_{10} + \beta_{11}(RL_t - RA_t) + \beta_{12}\, IP_{t-1} + \varepsilon_{1t},$$
$$S_t = \beta_{20} + \beta_{21}(RL_t - RT_t) + \beta_{22}\, TD_t + \varepsilon_{2t},$$
$$Q_t = \min(D_t, S_t), \qquad p_{11} > p_{10},$$
where $p_{11} = \Pr(\Delta RL_{t+1} > 0 \mid D_t > S_t)$, and $p_{10} = \Pr(\Delta RL_{t+1} > 0 \mid D_t < S_t)$. [...]
PAGE 49

43

financing to firms; IP is the industrial production index and measures firms' expectations about future economic activity; RT is the three month treasury bill rate, and represents an alternative rate of return for banks; TD is total bank deposits in billions of dollars, and is a scale variable. The observed quantity transacted, Q, is specified as the sum of commercial and industrial loans, and the relevant price change is $\Delta RL_{t+1} = RL_{t+1} - RL_t$. All interest rates are expressed as percentages. The sample consists of 72 observations on each variable, and can be found in various issues of the Federal Reserve Bulletin.

4.3 Hypothesis Testing Procedures

Two hypotheses concerning the price adjustment process, and several hypotheses concerning serial correlation were tested. The first price adjustment hypothesis maintains that the direction of the price change $1(\Delta p_{t+1} > 0)$ can be used to separate the sample into the underlying supply ($Q_t = S_t$) and demand ($Q_t = D_t$) regimes. The approach we have chosen to model price adjustment permits the known sample separation hypothesis to be conveniently expressed as
$$H_0: (p_{11}, p_{10}) = (1, 0).$$
The null hypothesis was tested by computing a Lagrange multiplier (LM) test. The LM test was chosen over the Wald and likelihood ratio tests because it only requires the estimates under the computationally simpler null hypothesis.

The second price adjustment hypothesis maintains that price adjustments are symmetrical in the following sense: the chance of a

PAGE 50

44

price increase during a shortage is the same as that of a decrease during a surplus. This hypothesis can be expressed as
$$H_0: p_{11} = 1 - p_{10}.$$
To test the hypothesis of symmetrical price adjustment, a Wald test was computed. The Wald test was chosen over the LM and likelihood ratio tests because it only requires the unconstrained estimates. In this case the constrained estimates (those obtained under $H_0$) offer no computational advantage over the unconstrained estimates.

The LM and Wald test statistics converge to their usual chi-squared limiting distributions provided that:
(1) $n^{-1/2} V_n^{-1/2}(\theta_p^0)\, \nabla L_n(\theta_p^0) \overset{A}{\sim} N(0, I)$,
(2) a constant $\tilde c$ is chosen such that plim $(V_n(\hat\theta_n^{ml}; \tilde c) - V_n(\theta_p^0; \tilde c)) = 0$.
If $\nabla L_n(\theta_p)$ is a $k$-dimensional vector, and both (1) and (2) hold, then we can conclude [...] (See, for example, White (1984, Theorem 4.30)).

The specification of $\tilde c$ was handled as follows. The LM statistic for the first $H_0$ and the Wald statistic for the second $H_0$ were each computed for several successive values of $\tilde c$. The LM statistic was computed for $\tilde c = 1, \ldots, 12$, and in each case the null hypothesis $(p_{11}, p_{10}) = (1, 0)$ was rejected. The Wald statistic, however, produced conflicting evidence for the hypothesis $p_{11} = 1 - p_{10}$; for some values of $\tilde c$ the hypothesis was rejected, and for others it was accepted. To choose among the conflicting evidence, the test statistic for serial

PAGE 51

45

correlation (see Section 3.5) was computed for several values of $\tilde c$. On this basis $\tilde c$ was specified, and a single covariance estimate for the Wald test was chosen. The covariance estimate chosen for the Wald test was also used to compute the asymptotic standard errors of the parameter estimates.

The test statistic for serial correlation depends on the correlation between linear combinations of the components of $f_t(\theta_p^0)$ and linear combinations of the components of $f_{t-c}(\theta_p^0)$. Therefore, the conclusion of the test depends on how the linear combinations are chosen, or in other words, on the specified weights $w_{it}$ (see page 36). For example, the test might reject $H_0$ for some set of weights, and not reject $H_0$ for other sets. To help cope with this difficulty, it was decided to choose the weights randomly from a uniform distribution on the interval (0,1). If there is a finite or countable number of sets of weights such that $H_0$ is incorrectly rejected or accepted, then choosing the weights from a continuous distribution ensures that, with probability one, these weights are not chosen. The weights were generated from a uniform distribution by a SAS random number generator.

4.4 The Results

The model was estimated under the assumption that the error terms are independent normal variates with constant variances, but are not necessarily serially independent. First the LS method was applied. The LS estimates are reported in the first column of Table 1, and were used as starting values to obtain the ML estimates presented in the second and third columns. A computer program was written with the SAS "Matrix Procedure" for the purpose of maximizing the likelihood

PAGE 52

46

functions; the program uses the quadratic hill-climbing technique as presented in Goldfeld, Quandt, and Trotter (1966). In Appendix A.4 we describe the quadratic hill-climbing technique, and show that consistent initial estimates ensure that the second-round estimates obtained from the technique have the same asymptotic distribution as the partial-MLE.

The estimates in column two of Table 1 maximize the likelihood subject to $(p_{11}, p_{10}) = (1, 0)$, or equivalently, under the assumption that the direction of the price change separates the sample into the underlying supply and demand regimes. Unlike previous studies, a test of this hypothesis was carried out. The constrained estimates were used to construct the Lagrange multiplier (LM) statistics. The LM statistic was computed with twelve different covariance estimates ($\tilde c = 1, \ldots, 12$). As the figures in Table 2 indicate, the hypothesis of known sample separation is rejected.

The conclusion of the LM test has two important implications for the analysis of the data and model. First, it suggests that the price change alone should not be used to determine whether the sample period was characterized by excess demand, excess supply, or both. In most disequilibrium studies this type of analysis is routinely done. Second, as was shown in Section 2.1, incorrect sample separation adversely affects the large sample properties of the estimators. In view of this problem the constrained estimates are suspect.

The next estimation was performed over the unconstrained space, and consequently $p_{11}$ and $p_{10}$ were estimated along with the other parameters. In this case all of the initial consistent estimates were employed, and therefore the estimates in column three represent the consistent and asymptotically normal solution. The ML estimates are not much different from the LS estimates. This is due to stopping iteration before

PAGE 53

47

complete convergence to a maximum of the likelihood function. The iterative technique performed poorly for the unconstrained likelihood in the sense that the speed of convergence was extremely slow. For this reason, the final estimates were obtained from the 100th iteration where the gradient is not significantly close to zero, and therefore are not true ML estimates. However, since the initial estimates are consistent, estimates obtained after the second iteration are asymptotically equivalent to the ML estimates, and therefore nothing is lost by stopping iteration before convergence, at least asymptotically. Further details regarding this point are provided in Appendix A.4.

The particular specification chosen for the model performed well in the sense that all of the estimates are of the correct sign, and most are significant. The estimates of $p_{11}$ and $p_{10}$ are .8179 and .2455, respectively, which means there is (1) an 81.79% chance of a price increase and an 18.21% chance of a decrease during shortages, and (2) a 75.45% chance of a decrease and a 24.55% chance of an increase during surpluses.

To select a covariance estimator for the Wald test of $H_0: p_{11} = 1 - p_{10}$, the serial correlation statistic was computed for $\tilde c = 1, 2, 3$. (See Table 3.) The hypothesis of $c = 3$ was accepted. The Wald test statistic did not reject the hypothesis $H_0: p_{11} = 1 - p_{10}$ (see Table 4), suggesting that price adjustments are symmetrical.

The differences which arise when the imperfect sample separation given by the price change is ignored can be seen by comparing columns two and three of Table 1. While both sets of estimates give the correct signs for the supply and demand variables, the unconstrained estimates suggest that demand and supply are less responsive to price changes than do the constrained estimates. The unconstrained estimate of the price

PAGE 54

48

parameter for the supply equation is approximately 40% less than the constrained estimate, and the price coefficient for the demand equation is approximately 14% smaller in absolute value for the unconstrained estimate. Given the rejection of the known sample separation model, however, we are more inclined to believe the unconstrained estimates.

The problem of determining whether the period 1979-84 was characterized by excess demand or supply was also addressed with the unconstrained estimates. This was accomplished by estimating the probability of excess demand for each $t$ conditional on the quantity transacted and the direction of the price change. The expression for this conditional probability is [...] The results are reported in Table 5. As pointed out by Lee and Porter [...] and $Q_t = D_t$ otherwise, is optimal in the sense that it minimizes the probability of misclassification. Applying this rule, we find that 54.12% of the observations are excess demand and 45.8% excess supply. In contrast, if one were to rely solely on the direction of the price change, the conclusion would be 31.9% excess demand, 43.1% excess supply, and for 25% of the observations, $\Delta p_{t+1} = 0$. In Table 6, the compatibility of the direction of the price change with the optimal classification rule is further examined. Comparing the two rules, excluding the observations for which $\Delta p_{t+1} = 0$, we find that 9 observations out of 54 are classified differently.

PAGE 55

49

Table 1
Estimated parameters and statistics.
(Asymptotic standard errors in parentheses)

                              LS                   MLE               MLE
Variables                     Initial estimates    p11=1, p10=0      p11, p10 unconstrained

demand
  const.                         79.6508              40.5262           79.6509  (169.61)
  RL-RA                         -14.9764             -17.2779          -14.9758  (2.918)
  IP(-1)                          2.2856               2.5429            2.2938  (1.170)
  $\sigma_d^2$                  367.7335            2140.5700          367.7344  (94.36)
supply
  const.                        -60.6708             -74.9844          -60.6709  (145.87)
  RL-RT                           4.4981               7.3034            4.4985  (0.3834)
  TD                              0.3176               0.3266            0.3288  (0.982)
  $\sigma_{\varepsilon 2}^2$   1197.7623              77.4408         1197.7622  (87.40)
$p_{11}$                          0.8526               1.0000            0.8178  (.0673)
$p_{10}$                          0.2571               0.0000            0.2454  (.2752)
log likelihood                                      -355.9850         -317.3710

n=72

PAGE 56

50

Table 2
Test of $H_0$: $(p_{11}, p_{10}) = (1, 0)$

$V_n(\hat\theta_n^{ml}, \tilde c)$    LM Statistic    $H_0$ rejected at $\alpha$% level

$\tilde c$=1       16.7693       0.020%
  =2               28.8588       0.001%
  =3               18.9718       0.008%
  =4               65.5703       0.001%
  =5               22.9532       0.001%
  =6               17.3450       0.017%
  =7               10.7834       0.455%
  =8               12.2467       0.219%
  =9               14.4707       0.072%
  =10               5.7286       5.702%
  =11               6.5377       3.805%
  =12               6.5118       3.854%
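
The significance levels in Table 2 can be checked with a short calculation. The sketch below assumes that the last column reports upper-tail probabilities of the limiting chi-squared distribution and that this distribution has two degrees of freedom, one for each restriction in $(p_{11}, p_{10}) = (1, 0)$; for two degrees of freedom the upper-tail probability of a value $x$ is exactly $\exp(-x/2)$. The function name `chi2_2df_upper_tail` is hypothetical, and only a few rows of the table are used.

```python
import math


def chi2_2df_upper_tail(x):
    """P(X > x) for a chi-squared variate with two degrees of freedom."""
    return math.exp(-x / 2.0)


# A few (c, LM statistic) pairs copied from Table 2.
lm_statistics = {7: 10.7834, 8: 12.2467, 10: 5.7286, 11: 6.5377, 12: 6.5118}

for c, lm in sorted(lm_statistics.items()):
    tail = 100.0 * chi2_2df_upper_tail(lm)
    print("c = %2d   LM = %8.4f   upper-tail probability = %.3f%%" % (c, lm, tail))
```

Under these assumptions the computed probabilities are close to the tabulated levels, which supports reading the last column as marginal significance levels.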

PAGE 57

51

Table 3
Test of $H_0$: $c = \tilde c$

$\tilde c$    Serial Correlation Statistic    $H_0$ rejected at $\alpha$% level

1             32.7552                          0.030%
2             14.2697                         16.104%
3             10.6509                         38.540%

Table 4
Test of $H_0$: $p_{11} = 1 - p_{10}$

$V_n(\hat\theta_n^{ml}, \tilde c)$    Wald Statistic    $H_0$ rejected at $\alpha$% level

$\tilde c$=3                          0.0550411         94.34%

PAGE 58

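[Table residue: only the column headings $\Delta p_{t+1} > 0$ and $\Delta p_{t+1} < 0$ survive; the remaining entries are not recoverable.]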
PAGE 59

CHAPTER 5
SEMIPARAMETRIC ESTIMATION OF DISEQUILIBRIUM MODELS USING THE METHOD OF MAXIMUM SCORE

5.1 Introduction

We consider an alternative estimation strategy not previously analyzed for a disequilibrium model. The strategy is the so-called "semiparametric" estimation developed in Manski (1975), Cosslett (1983), Powell (1984), Manski (1985), and some others. Semiparametric estimators have been shown to be consistent under more general conditions than the conventional LS and ML estimators, and therefore require fewer prior restrictions. For a number of cases where consistent LS and ML estimation require the functional form of the error distribution, consistent semiparametric estimators have been derived without imposing functional form. Powell did so for the censored regression model using the method of least absolute deviations, Cosslett derived a distribution-free ML estimator for the binary choice model, and Manski derived consistent estimators for the same model using the method of maximum score. Semiparametric estimation is most useful when parametric assumptions cannot be trusted, but are needed for consistent LS and ML estimation. In particular, it offers an improved strategy for estimating disequilibrium models.

We derive consistent semiparametric estimators for disequilibrium models using the method of maximum score of Manski (1975, 1985). Consistent score estimators are derived for the following situations: the functional forms of the error distributions are unknown, the

53

PAGE 60

54

quantity transacted is an unknown function of supply and demand, and the price change is an unknown function of excess demand. The presentation comprises three models and their score estimators. The models we consider are all of the following form:

M. (model): Given the supply and demand equations $S_t = \beta_2^0 x_t + \varepsilon_{2t}$ and $D_t = \beta_1^0 x_t + \varepsilon_{1t}$, the iid sequence of random vectors $(Q_t, p_t, x_t)_{t=1}^{n+1}$, the event $S_{pq}$ involving either $p_t$ or $Q_t$, and the event $S_x$ involving $x_t$,
$$\Pr(S_{pq} \mid S_x; \beta_1^0, \beta_2^0) > \Pr(S_{pq}^c \mid S_x; \beta_1^0, \beta_2^0), \text{ and}$$
$$\Pr(S_{pq}^c \mid S_x^c; \beta_1^0, \beta_2^0) > \Pr(S_{pq} \mid S_x^c; \beta_1^0, \beta_2^0);$$
where $S^c$ denotes the complement of the event $S$.

General, intuitive considerations motivate the specification of $(S_{pq}, S_x)$ for each model. For example, the intuition that an expected shortage (excess demand) is a better predictor of a positive price change than an expected surplus motivates the model in Section 5.2. Given the model, consistent estimation depends on general continuity and identification assumptions which do not require prior knowledge of the functional forms of the underlying distribution functions or explicit equations for quantity or price.

The model in Section 5.2 concerns events involving the price change, $\Delta p_{t+1} = p_{t+1} - p_t$, and expected excess demand, $\beta^0 x_t$, or more specifically, the binary variables $1(\Delta p_{t+1} > 0)$ and $1(\beta^0 x_t > 0)$, where $1(\cdot)$ denotes the indicator function. The model maintains that given $1(\beta^0 x_t > 0)$, the best forecast of $1(\Delta p_{t+1} > 0)$ corresponds to $1(\Delta p_{t+1} > 0) = 1(\beta^0 x_t > 0)$. A score estimator of $\beta^0$ is defined and assumptions for consistency given. The model resembles the binary response model studied by Manski (1975, 1985), and shares an

PAGE 61

55

identification problem: $\beta^0$ is only identified up to an unknown multiplicative scalar.

The model in Section 5.3 is a more restrictive version of that in Section 5.2, but retains a considerable amount of generality. The model is designed to exploit the fully observable $\Delta p_{t+1}$ (versus $1(\Delta p_{t+1} > 0)$) to identify $\beta^0$. A consistent score estimator is presented, and we show that $\beta^0$ is identified without a loss of scale. The model represents a completely new application for maximum score estimation as it differs significantly from the model studied by Manski.

The estimators presented in Sections 5.2 and 5.3 do not depend on the quantity transacted, $Q_t$, and therefore impose no restrictions on it. By neglecting the observations on $Q_t$, however, the generality involves a loss of information. In Section 5.4 we specify a model for $Q_t$, and define a corresponding score estimator. The specification, however, is insufficient to identify $\beta^0$ (even up to a multiplicative scalar) without severely restricting the distribution of $x_t$. To eliminate the identification problem the models of the previous sections are added to the specification, and the estimator is redefined. The resulting estimator uses the entire sample $(Q_t, \Delta p_{t+1}, x_t)_{t=1}^{n}$, and is shown to be consistent under general conditions.

5.2 A Directional Model and Consistent Estimation Up to Scale

The directional model restricts the direction of the price change to be most likely, but not certain, to follow the sign of expected excess demand, or equivalently

M5.1 (directional model):
$$\Pr(\Delta p_{t+1} > 0 \mid \beta^0 x_t > 0) > \Pr(\Delta p_{t+1} \le 0 \mid \beta^0 x_t > 0), \text{ and}$$
$$\Pr(\Delta p_{t+1} \le 0 \mid \beta^0 x_t < 0) > \Pr(\Delta p_{t+1} > 0 \mid \beta^0 x_t < 0).$$

PAGE 62

56

The motivation for M5.1 is its compatibility with an intuitively appealing forecast procedure: if a shortage is expected at time $t$, $\beta^0 x_t > 0$, then predict a positive price change, $\Delta p_{t+1} > 0$; otherwise, predict a nonpositive change. Given M5.1, the number of correct forecasts must eventually exceed the number incorrect. The forecast procedure in turn motivates a strategy for estimating $\beta^0$ from $n$ observations on $(\Delta p_{t+1}, x_t)$: choose as an estimate of $\beta^0$ a value $\beta$ that maximizes the proportion of the observations characterized by $1(\Delta p_{t+1} > 0) = 1(\beta x_t > 0)$. This is the method of maximum score.

We propose the score estimator:
$$\tilde\beta_n = \arg\max_{\beta \in B} g_n(\beta), \text{ where } g_n(\beta) = n^{-1} \sum_{t=1}^{n} g_t(\beta), \quad g_t(\beta) = 1(\Delta p_{t+1} > 0)\, 1(\beta x_t > 0) + 1(\Delta p_{t+1} \le 0)\, 1(\beta x_t \le 0).$$
The function $g_t(\cdot)$ "scores" one if a candidate $\beta$ implies a forecast compatible with the maintained model, M5.1, and zero otherwise.

Manski (1985) presents a consistent score estimator for a model of the form MED$(y \mid x) = bx$, where MED$(z)$ denotes the median of the random variable $z$. His consistency proof, however, depends on the weaker model: $\Pr(y > 0 \mid bx > 0) > \Pr(y \le 0 \mid bx > 0)$ and $\Pr(y \le 0 \mid bx \le 0) > \Pr(y > 0 \mid bx \le 0)$. We have postulated our model in the weaker form for two reasons. First, the weaker model is easy to interpret as a price adjustment model; positive price changes occur most frequently with expected shortages, and negative changes with expected surpluses. Second, but not less important, MED$(\Delta p_{t+1} \mid x_t) = \beta^0 x_t$ is unnecessarily restrictive.

Manski's consistency proof (1985, p. 323) is directly applicable for $\tilde\beta_n$ assuming appropriate regularity conditions are met. Theorem 5.2

below provides assumptions that imply $\tilde\beta_n$ converges to $\beta^0$ almost everywhere (a.e.) as $n$ becomes indefinitely large.

Theorem 5.2. In addition to M5.1 assume:

A5.3 (continuity): $E(g_t(\beta)) = g(\beta)$ is continuous in $\beta$ on a compact set $B$.

A5.4 (identification): The set $A_x(\beta) = \{x : \operatorname{sgn}(\beta^0 x) \ne \operatorname{sgn}(\beta x)\}$ has positive probability for all $\beta\in B$ such that $\beta\ne\beta^0$.$^1$

Then $\lim \tilde\beta_n = \beta^0$ a.e.

Proof:

Step 1. Uniform convergence. The proof of uniform convergence uses the argument presented in Manski (1985, pp. 321-2). Observe that

$$g_n(\beta) = P_n(\Delta p_{t+1}>0,\ \beta x_t>0) + P_n(\Delta p_{t+1}\le 0,\ \beta x_t\le 0),$$

and

$$g(\beta) = P(\Delta p_{t+1}>0,\ \beta x_t>0) + P(\Delta p_{t+1}\le 0,\ \beta x_t\le 0),$$

where $P_n$, $P$ represent the empirical and true distributions. Therefore, the generalized Glivenko-Cantelli theorem of Rao (1962, Theorem 7.2) implies

$$\lim \sup_{\beta\in B} |g_n(\beta) - g(\beta)| = 0 \quad\text{a.e.}$$

Step 2. Identification. M5.1 and A5.4 imply that $\beta^0$ uniquely maximizes $g(\beta)$. To see this, consider

$$g(\beta^0) - g(\beta) = \int_{A_x^c(\beta)} E(g_t(\beta^0) - g_t(\beta) \mid x_t)\,dF_x + \int_{A_x(\beta)} E(g_t(\beta^0) - g_t(\beta) \mid x_t)\,dF_x$$

where $A_x^c(\beta)$ denotes the complement of $A_x(\beta)$, and $F_x$ the distribution function of $x$. The first term on the right-hand side vanishes given the definition of $g_t$, and under M5.1 the second term is strictly positive.

Step 3. $\lim \tilde\beta_n = \beta^0$ a.e. Given A5.3, Step 1, and Step 2, a.e. convergence follows from Theorem 2 of Manski (1983). Q.E.D.

The assumptions permit a fairly general disequilibrium model. The consistency proof does not depend on the distributions of $\varepsilon_{1t}$ and $\varepsilon_{2t}$, or how the market determines the quantity transacted. Consistency depends on a price adjustment model which enters without an explicit adjustment equation, or a known functional form for the probability distribution of prices. It suffices to believe that an expected shortage (surplus) is a better predictor of a positive (nonpositive) price change than an expected surplus (shortage).

The generality of the assumptions, however, has costs. In particular, a careful examination of A5.4 reveals that $\beta^0$ is only identified up to an arbitrary scale factor. The identification problem results from the failure of the obvious, but necessary, condition that $A_x(\beta)$ be nonempty for all $\beta\ne\beta^0$. Observe that for any $\lambda>0$ we have $\operatorname{sgn}(\lambda\beta^0 x) = \operatorname{sgn}(\beta^0 x)$ for all vectors $x$, and therefore $A_x(\lambda\beta^0)$ is an empty set. Thus, if points of the form $\beta=\lambda\beta^0$ are included in the parameter space, $B$, then A5.4 fails as does identification (Step 2). Manski (1985) resolves the problem by normalizing the parameter space with respect to scale, which effectively eliminates the troublesome points. Scale normalization suffices for A5.4, but the conclusion of Theorem 5.2 becomes $\lim \tilde\beta_n = \lambda\beta^0$ a.e., where $\lambda$ is an unknown scalar.$^2$
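The scale problem and the normalization can be seen in a small numerical sketch. It assumes the score is the share of observations with $1(\Delta p_{t+1}>0) = 1(\beta x_t>0)$, a two-regressor linear index with normal noise, and a grid search over the unit circle as the scale normalization. The simulated design and the names `score` and `max_score_on_circle` are hypothetical, and the grid search stands in for whatever optimizer one would use in practice.

```python
import math
import random

def score(data, b):
    """Share of observations with 1(dp > 0) == 1(b'x > 0)."""
    hits = 0
    for dp, x in data:
        index = b[0] * x[0] + b[1] * x[1]
        hits += (dp > 0) == (index > 0)
    return hits / len(data)

def max_score_on_circle(data, steps=180):
    """Grid search for a maximizer of the score over the unit circle."""
    best_b, best_val = None, -1.0
    for k in range(steps):
        angle = 2.0 * math.pi * k / steps
        b = (math.cos(angle), math.sin(angle))
        val = score(data, b)
        if val > best_val:
            best_b, best_val = b, val
    return best_b, best_val

rng = random.Random(1)
beta0 = (2.0, -1.0)  # hypothetical true coefficients
data = []
for _ in range(2000):
    x = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    dp = beta0[0] * x[0] + beta0[1] * x[1] + rng.gauss(0.0, 1.0)
    data.append((dp, x))

# Rescaling the candidate by a positive factor leaves every sign unchanged,
# so the score cannot tell beta0 from 8 * beta0.
same = score(data, beta0) == score(data, (16.0, -8.0))

# After normalizing to unit length, only the direction of beta0 is estimated.
b_hat, val = max_score_on_circle(data)
norm = math.hypot(beta0[0], beta0[1])
direction = (beta0[0] / norm, beta0[1] / norm)
cosine = b_hat[0] * direction[0] + b_hat[1] * direction[1]
print(same, val, cosine)
```

In this sketch the maximizer recovers the direction of the true coefficient vector (cosine close to one) but carries no information about its length, which is the loss of scale discussed above.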

The loss of scale can be interpreted as arising from insufficient information. The directional model represents prior information on the stochastic behavior of the signs of $\Delta p_{t+1}$ and $\beta x_t$, but not their magnitudes; by construction the estimator depends only on the signs. The limited information permits a fairly general model, but limits what can be learned about $\beta^0$. We shall see next that the loss of scale can be eliminated by imposing assumptions on the magnitudes of $\Delta p_{t+1}$ and $\beta x_t$. At the same time it is possible to retain a considerable amount of generality.

5.3 A Price Adjustment Model with $\beta^0$ Identified Without a Loss of Scale

Manski (1985) discusses the score estimator for a binary response model where the dependent variable, $y^*$, is unobservable, and the sample consists of observations on $1(y^*>0)$. In the last section the price change was treated analogously to obtain a robust method of estimation. Unlike the problem considered by Manski, however, $\Delta p_{t+1}$ is generally observable. To take advantage of the extra information, and thus obtain a stronger result, we propose the following model.

M5.5 (directional-magnitude model): for appropriately specified numbers $\varepsilon\ge 0$ and $\delta>0$,

$$\Pr(\Delta p_{t+1}>\varepsilon \mid \beta^0 x_t>\delta) > \max\big(\Pr(|\Delta p_{t+1}|\le\varepsilon \mid \beta^0 x_t>\delta),\ \Pr(\Delta p_{t+1}<-\varepsilon \mid \beta^0 x_t>\delta)\big),$$

$$\Pr(|\Delta p_{t+1}|\le\varepsilon \mid |\beta^0 x_t|\le\delta) > \max\big(\Pr(\Delta p_{t+1}>\varepsilon \mid |\beta^0 x_t|\le\delta),\ \Pr(\Delta p_{t+1}<-\varepsilon \mid |\beta^0 x_t|\le\delta)\big),$$

$$\Pr(\Delta p_{t+1}<-\varepsilon \mid \beta^0 x_t<-\delta) > \max\big(\Pr(\Delta p_{t+1}>\varepsilon \mid \beta^0 x_t<-\delta),\ \Pr(|\Delta p_{t+1}|\le\varepsilon \mid \beta^0 x_t<-\delta)\big).$$

The directional-magnitude model quantifies the notion that large (small) discrepancies between expected buy and sell decisions are most likely to lead to relatively large (small) price changes. The model predicts a

60 small price change ( l6pt+l l_.::E) if the expected market position lies within a specified interval centered at equilibrium ( IB 0 x lE) otherwise. Compared to M5.1, the model M5.5 is more restrictive as it restricts both the direction and magnitude of the price change. We shall see, however, that M5,5 distinguishes B 0 from AB 0 , and thus it becomes meaningful to discuss estimators that converge unambiguously to Given M5.5 we define a score estimator of 6 as follows: 8 = n h ( 8) t arg max h (8), where BEB n = 1(6p >E)l(Bx >o) + 1( l6p l

61 Theorem 5.8. Suppose the i-th component of the vector Bo is nonzero. 0 Then for all B such that Bi ;rBi and 1\ ;t{), the set J/ B) is nonempty. Proof: It suffices to show that there exists at least one solution x to the system of linear equations: 0 M( B , B)x=r where 0 M( B , B) = r = The existence of xis equivalent to rank(M(B 0 ,B)) = rank(M(B 0 ,s) r), or det(M( 8, S)) = det(M( B 0 , S) r). 0 If det (M(B ,s)) = 2, then the proof is complete. 0 0 0 If det(M(S ,8)) = 1, then we need Bi/Bi= y/y. The 0 existence of such points y and y follows immediately since 0 y>6_2y} = (-oo,O)U(l,oo), and Q.E.D. 5.4 Maximum Score Estimation of Models That Include the Quantity Transacted The estimators presented in sections 5.2 and 5.3 do not depend on the observed quantity transacted, Q, and therefore neglect relevant sample information. In this section we propose a model for Q, and 0 define a score estimator of B that depends on n observations of Q. We shall see, however, that the model for Q is insufficient to identify s 0 (even up to a multiplicative scalar). We resolve the identification problem by combining the model for Q with the price adjustment models described in sections 5.2 and 5.3. The score estimator we define for n the combined model uses the entire sample (Qt,~Pt+l'xt)t=l' and

62 therefore can be expected to be more efficient than the estimators of sections 5.2 and 5.3. The observations on the quantity transacted are modeled as follows: M5.9 (quantity model): For some given o~, Pr( Qt >o I (3~xt > o, 0 > o) > Pr(Q o, 0 > o)' B2xt (32xt tt and Pr(Q Pr(Q >olB~x ~' 0 (32xt (32x < o), tt t t t Two appealing assumptions that are sufficient for M5.9, and therefore motivate it, are A5.10 Qt= min(Dt,St). A5.ll MED(e: 1 t) = MED(e: 2 t) = O, and e: 1 t and e: 2 t are independent. Assumption A5.11 requires only independent error terms with distributions symmetrical about zero. To construct an estimator of (3 given the quantity model, we define the scoring function: To prove consistency for a maximizer of q (8) using the arguments in the n proof of Theorem 5.2, the relevant assumptions are: A5.12 (continuity): q((3) is continuous in 8 on a compact set B. A5.13 (identification): (i) The set U (8) = {x: X sgn( s 2 x-o)} has positive probability for all 8e:B such that ( s 1 , s 2 ) ;z 0 0 ( 8 1' 8 2).

(ii) The set $Z_x(\beta^0) = \{x : \operatorname{sgn}(\beta_1^0 x-\delta) \ne \operatorname{sgn}(\beta_2^0 x-\delta)\}$ has zero probability.

The role of assumption A5.13 in proving consistency is analogous to that of the previous identification assumptions A5.4 and A5.7. The two parts of A5.13 imply that $\beta^0$ uniquely maximizes $q(\beta)$. Part (i) compares to the familiar order condition needed for identification in the textbook simultaneous equation framework. For example, if the supply and demand equations have no explanatory variables in common, and $\delta>0$, then Theorem 5.8 implies that $U_x(\beta)$ is nonempty for $\beta\ne\beta^0$.$^3$ To see the role of part (ii), suppose that the sets $Z^cU = \{Z_x^c(\beta^0)\cap U_x(\beta)\}$, $Z^cU^c$, $ZU$, and $ZU^c$ each have positive probability for some $\beta\ne\beta^0$. Then we can write

$$E(q_t(\beta^0)-q_t(\beta)) = \int_{Z^cU} E(q_t(\beta^0)-q_t(\beta) \mid x_t)\,dF_x + \int_{Z^cU^c} E(q_t(\beta^0)-q_t(\beta) \mid x_t)\,dF_x + \int_{ZU} E(q_t(\beta^0)-q_t(\beta) \mid x_t)\,dF_x + \int_{ZU^c} E(q_t(\beta^0)-q_t(\beta) \mid x_t)\,dF_x.$$

It can be readily verified that the first term on the right-hand side is positive, the second is nonnegative, the third is zero, and the last term is negative. Therefore, given the negativity of the last term, $\beta\ne\beta^0$ does not necessarily imply $E(q_t(\beta^0)-q_t(\beta)) > 0$. To rule out this possibility, we impose part (ii).

The requirement that $Z_x(\beta^0)$ has zero probability, however, is too restrictive to be generally applicable. It is difficult to imagine a situation where such an assumption would be appropriate. Therefore, unless one is willing to severely restrict the distribution of $x_t$, the model M5.9 is insufficient to identify $\beta^0$. Assumption A5.13(ii) can be

64 relaxed, however, by combining the model for Q with the price adjustment model of Section 5.2, and constructing a score estimator that exploits both models. For this purpose we assume that the price adjustment model M5.l holds in addition to M5.9, and consider the scoring function: q*t( S, S 0 ) = l(Zc( S 0 ) )q ( S) + l(Z ( s 0 ) )P ( S) X t X t where Pt(S) l(Zc( S 0 )) X = l(~pt+l_::O)l(S 1 xt_::o, s 2 xt>o) + l(~pt+l >O)l(S 1 xt>o, s 2 xt_::o), l(xtEZc(S 0 )), and Zc(S 0 ) denotes the complement of Z (S 0 ). X X X Generally Zx(S 0 ) will be unknown, but if a consistent estimate, say Sn' is available, then it can be replaced by Z (S ). One possible choice X n for S is the estimator presented in Section 5.3. This forms the basis n for a "total" sample estimator of e 0 : 0 To show that Sn converges to S a.e. we prove: Theorem 5.14. Let lim Sn M5.l and M5.9 assume: = 0 S a.e., and S EB for all n. n In addition to (continuity): q*(S,S') is continuous in both arguments on a compact set B. (identification): Assumption A5.13(i) holds. Then lim S n Proof: 0 = S a.e. Step 1. Uniform convergence. The proof is similar to Step 1 of Theorem 5.2. Theorem 7.2 of Rao ( 1962) implies lim sup jq~(S,S') q*(S,S') I= 0 a.e. SEB' S' EB

65 Step 2. Identification. Let dt(B,B 0 ) = q:(B 0 ,B 0 ) q~(B,B 0 ). We will show that B=B 0 implies d(B,8) > 0. Consider, d( B, 8) = JE(d ( 8, B 0 ) lxt)dF + J E(d/ 8, 8) Ix )dF UZ t X UZC t X where UZ = {U (B) X Z ( B 0 ) } UZc = {U ( 8) X ' X Zc( Bo)} X ' Zc(B 0 )}. That B~8 implies X 0 d(B,B) > 0 follows from the first two terms being positive, and the last two nonnegative. We will prove this for the first and last terms only; the proof for the remaining terms is similar. Consider the first term, and assume without loss of generality that 0 0 0 0 B 1 xt-o .:::_ 0, and B 2 xt-o > 0, and thus (8 1 8 2 )xt < 0. Since xtE:U/8), we have Bx -o > O, and s 2 x -o < 0. Therefore, 1 t t ( 0 0 where the inequality follows from 8 1 -8 2 )xt < 0, and MS .1. For xt UcZc assume without loss of generality that 8~xt-6 > 0, and B~-o > 0. Since xt U~(8), we have B 1 xt-o > 0 and 8 2 xt-o > 0, or 8 1 xt-o > 0 and B 2 xt-o .:::_ 0, or B 1 xt-o .:::_ 0 and 8 2 xt-o > 0. Therefore, evaluating the conditional expectation case by case, we find E(d ( B, 8) Ix E:UcZc) = Pr(Q >o Ix ) Pr(Qt>olxt) = 0, or t t t t = Pr(Q >o Ix ) t t > 0. A q*(B,B 0 ) I Step 3. lim sup lq*(B,B) = 0 a.e. BB n n

66 Let Y > 0 be given. Step 1 implies sup I q*( B, 8 ) q*( B, B ) I < y/2 a.e. t3B n n n for sufficiently large n. The continuity of q*, and the compactness of B imply sup BB q*( B, B ) q*( B, B 0 ) I < y/2 a.e. n for sufficiently large n since lim B n inequality we get 0 = B a.e. I o I sup q*(B,B)-q*(B,B) O SNT)B where the existence of follows from Step 2, and the compactness of B. 0 Now Step 3 implies q*(B ,B) > q*(S ,B) /2, a.e. for large n, and n n n n since CB ,B) maximizes q* we have n n n q * ( B , s 0 ) > q * ( B 0 , B ) / 2 a . e . n n n Step 3 also implies q * ( s 0 , B ) > q * ( B 0 , s 0 ) / 2 a. e n n n (5.15) (5.16)

67 for large n. Adding both sides of (5.15) and (5.16) we get 0 sup ~*(S,S) a.e. SE:N'TlB and therefore S E:.N a.e. for sufficiently large n. n Q,E,D.

NOTES

1 The signum function, sgn(·), is defined as follows: sgn(z) = 1 if z > 0, and sgn(z) = -1 if z < 0.

2 Another significant cost is that no distributional theory for maximum score estimators is currently known.

3 Other comparisons with the so-called order condition for identification are much more complicated, and beyond the scope of this paper.

CHAPTER 6
CONCLUDING REMARKS AND DIRECTIONS FOR FURTHER RESEARCH

In this thesis, I have proposed several new solutions to the problem of generalizing disequilibrium models and their estimators. The empirical example in Chapter 4 demonstrates how to implement many of these solutions in practice. However, as we have seen, while some of the solutions solve old problems, they also introduce new complications. For example, while the methods presented in Chapter 3 eliminate the need to specify a parametric model for serial correlation, they also introduce the complication of having to choose a single covariance estimator from several candidates. Clearly, some of the results fall short of completely generalizing disequilibrium models and their estimators; there is a trade-off. I believe, however, that this thesis accomplishes more than merely shifting the problems faced by empirical studies from old ones to new ones. In particular, it provides a solid foundation for further research by clarifying many of the issues involved.

The following is a partial list of directions for further research on the problem of generalizing disequilibrium models and their estimators:

(1) the consequences of restricting the conditional probabilities with respect to $t$, and how to relax this restriction;

(2) the problem of finding an optimal covariance estimator when the serial correlation is modeled by mixing conditions;

(3) the power properties of the serial correlation test in Section 3.5;

(4) the small sample properties of estimators obtained from starting iterative techniques with consistent estimates, but stopping iteration before convergence;

(5) numerical studies examining the properties of the maximum score estimators for disequilibrium models relative to parametric estimators.

APPENDIX A,l Inconsistency and Misclassified Observations We will show that constraining the direction of the price change l(~pt+l >0) to separate the sample into the underlying demand (Qt=Dt) and supply (Q =s) regimes, when in fact l(~p +l>O) misclassifies t t t observations with positive probability, leads to inconsistent estimates. Consider the estimator 8 (1,0) which solves the problem n where Ln(0,p 11 ,p 10 ) is defined on page 14, equation 2.3. We will show that p~ 1 L ( 0,y) for all n and all 0E::::, where ::: is a subset of a n n n Euclidean space. Define L (0 0' y p) = sup{L (t y)-L (0' y): Jt-0J

72 (i) For all sufficiently small p(0)=p>O, plim(L (0,0',y,p)-L (0,0' ,p))=O. n n (ii) L (0,0' ,p) decreases to L (0,0' ,0) uniformly inn as p decreases n n to zero. 0 0 If plim 0 = 0 , then lim sup {L ( 0 , 0, 0) }>O for all 0E::::â€¢ n n n Proof: Suppose there exists e"E:::: such that lim sup{L (0,(:Jk,0)}<0. n n Then by (ii) we can choose p >O such that lim sup {L ( 0, 0", p) } O, it suffices to show that lim Pr(R <0)=1. n nn-+-o:> n Let M = L (0,Ef:,p) and d=lim supM Pr(R -M <-d/4) + 1 as n + oo by (i). n n nQ.E.D. Under additional regularity conditions, the conclusion of Theorem A.1.1 can be viewed as a local condition. Theorem A.1.2. In addition to A.l.l(i) and A.l.l(ii), suppose (i) al, (0)/ae=ai (0)/a0; that is, the order of integration and n n differentiation can be interchanged. (ii) 0 is an interior point of :::â€¢ (iii) aLn ( 0)/ a0 is continuous on a closed neighborhood N 1 of 0 with radius El >O, for all n sufficiently large. Let aL ( 0)/ a0 1 =1 ( 0).. If for some i there exists a positive constant n n 1

73 mi such the II (0). I> m. for all 0 belonging to a closed neighborhood of n 1 1 0 0 0 with radius 2 >O, N 2 , for all n sufficiently large, then plim 0n ~0 Proof: We will prove plim 0 ~0 by showing that the hypothesis of the n theorem implies lim sup {L ( 0)-L ( 0*) }<0 for some sequence ( 0*) n n n n n belonging to :::. Let 3 =min( 1 2 ). Since N 3 is compact and Ln(0) is continuous on N 3 , there exist points 0" belonging to N 3 such that L (r-JIC)=sup{L (0):0 n n n n belongs to N 3 }. Furthermore, since 1In(0)i l>O on N 2 , the points~ lie on the boundary of N 3 Therefore, I ~-0 I=3 By the mean value theorem we have L C Ef:)-1 C 0) = n n n K E i=l ( E'fk .-0?)1 ( 0'). n,1 1 n n 1' (2) where 0~ lies on the segment connecting and 0. Now if Ln ( 0~) i >mi >O, then we must have EJk .-0?>0. Otherwise, since L is strictly increasing n,1 1n 0 in its i-th argument on N 3 , we would have L (EJk 1 , ,0., ,E'fk k)> n n, 1 n, L (E'fk 1 , ,tfk ., ,E'fk k) which contradicts the fact that E'fk is a n n, n,i n, n maximizer of L if then 0 . 0 .-0.<0. n Similarly, I C 0 1 ) j . <0, n n J n, J JWithout loss of generality suppose I C 0 1 ) >mi >O for i =l, ,h and n n 1 L ( 0'). ~. h E i=l ( E'fk 0?) m + E ( 0?0* . ) ( -m. ) n,1 1 i i=l+h 1 n,1 1 n n n K >m E i=l I EJk -0~ I >m. d >O, n,1 1

74 for some d>O, where m=min(m 1 , â€¢.. ,~, -~+ 1 , ,-~). This implies lim sup(L (0)-L (0*) }O), and note that (3) (4) Now if p~ 1 =1, then (4) is the expectation of a likelihood equation, and therefore given the usual regularity conditions we have E(f:)=O at 0 p 11 -l. This condition will imply b b -l( J ftgstdQt=(l-l( J ftgdtdQt (5) Substituting (5) into (4) yields (6) b / o2 For 1( â€¢)=O, given the normality of Elt'EZt we have ft=(Qt-xtB 1 )xt crE 1 Substituting this into (6), and summing over the observations gives (3). A.2 The Computational Tractability and Asymptotic Properties of the Least Squares Estimator of Section 2.3 In Section 2.3 we proposed using a LS estimator to find the consistent and asymptotically normal solution to the likelihood equations; i.e., use the LS estimates as starting values to iterate to the consistent and asymptotically normal local maxima of the likelihood function. The success of this strategy depends on:

75 (a) The objective functions to be solved for the LS estimates are not characterized by an unknown number of local minima so that global minima can be easily found; i.e., multiple solutions are not a problem. (b) The LS estimators (defined as global minimizers) are consistent and have a proper limiting distribution. If (a) fails, then the LS method is no more computationally tractable than the ML method, and thus one might as well use the ML method to begin with. (b) ensures convergence to the consistent and asymptotically normal local maxima of the likelihood function. (See, for example, Amemiya (1973, pp. 1014-15).) In this section we will argue that both (a) and (b) are likely to be satisfied in practice. Condition (a) will be obviously satisfied if the following optimization problems have unique solutions: 1 n 2 local-min nI: (l(tip +l >0)-E(l(tip +l >0))) t=l t t (1) (2) (3) where E(Qt) denotes the function E(Qt) with y 0 estimated by y (obtained ) "2 2. 00 2 2 0 from (1 ), and E(Qt) denotes E(Qt) with s 1 ,s 2 ,(cr 1 +cr 2 ), and y 2 2 estimated by B 1 , B 2 , (crd+cr), and y (obtained from (1) and (2)).

76 Solutions to problems (2) and (3) are OLS estimates, and therefore are unique if the appropriate matrices of explanatory variables have full column rank. For example, unique LS estimates can be obtained by solving (2) if the following matrix has full column rank: d (1-~(x y))xs, 4>(x y))x , q,(x y) n n n n n h s d h lxks f 1 . bl f h 1 were xt enotes t e vector o exp anatory varia es o t e supp y d d h lxkd f d d 1 . bl I equation, an xt t e vector o eman exp anatory varia es. n general, the matrices of explanatory variables for (2) and (3) will have full column rank provided that the functions ~(xty) and Cxty) are not constant for all t. Solutions to problem (1) are nonlinear LS estimates, and conse quently establishing their uniqueness is much more difficult. Unfortun ately, attempts to prove that problem (1) has a unique solution have been inconclusive. However, there is some evidence suggesting that problem (1) can be solved for a global minimum in practice. First, the larger the sample size the more likely problem (1) will have a unique solution. Lemma A.2.4 below provides a rank condition which ensures a unique solution with probability approaching one as n approaches infinity. Second, given the data discussed in Chapter 4, attempts to solve problem (1) were successful in the sense that all starting values iterated to the same solution. In contrast, attempts to maximize the likelihood function were unsuccessful as different starting values iterated to different solutions. Third, the objective function in problem (1) is bounded below (by zero) which simplifies the search of the parameter space for a global minimum. In contrast, a search for a

global maximum of the likelihood function is complicated by unboundedness: $L_n \to \infty$ as $\sigma_1^2 \to 0$ or $\sigma_2^2 \to 0$ (see, for example, Maddala (1983, p. 300)). Therefore, any search for a global ML estimate will be futile unless one is willing to arbitrarily bound the error variances away from zero.

Next we discuss conditions that imply consistency for the LS estimator. We will only consider conditions that imply consistency for the nonlinear LS estimator defined as any global minimizer of problem (1). (Given plim $\hat\gamma = \gamma^0$, proving consistency for the OLS estimators obtained from solving problems (2) and (3) involves repeated application of Jennrich's (1969, Lemma 3) mean-value theorem for random functions, and is quite tedious.) For simplicity, rather than necessity, we will assume that all relevant random variables are independent and identically distributed across $t$. This enables us to apply the following simplified version of White's (1980) Lemma 2.2 to the global minimizer of problem (1).

Lemma A.2.1. Let $Q_n(\omega,\theta)$ be a measurable function on a measurable space $W$ and for each $\omega$ in $W$ a continuous function on a compact set $\Xi$. Then there exists a measurable function $\hat\theta_n(\omega)$ such that

$$Q_n(\omega,\hat\theta_n(\omega)) = \inf_{\theta\in\Xi} Q_n(\omega,\theta) \quad\text{for all } \omega \text{ in } W.$$

If plim$\{\sup_{\theta\in\Xi} |Q_n(\omega,\theta)-Q(\theta)|\}=0$, and if $Q(\theta)$ has a unique minimum at $\theta^0$, then plim $\hat\theta_n=\theta^0$.

Proof: See White (1980, Lemma 2.2).

The first part of Lemma A.2.1 ensures the existence of the nonlinear LS estimator (defined as a global minimizer). The second part will be used to show consistency. For this purpose we define,

Q ( 0)= n -1 n -1 = n 78 n 2 I (1( l;pt+l >Q)-E(l( l;pt+l >Q))) t=l n 2 I ( z ( 0) + u 1 t) , t=l t 0 0 0 0 ) where 2 t( 0 )=pll-pll (pll-plO)~(xtY) + (pll-plO)~(xt Y' a nd 0=(p 11 ,p 1O ,y). To apply the second part of Lemma A.2.1 we need to show uniform convergence, and that Q(0) has a unique minimum at 0. The next lemma, which is due to Hoadley (1971), provides a moment restriction that implies uniform convergence. Lemma A.2.2. For the function defined in Lemma A.2.1 suppose E Jq (0) Jl+dO. n Then plim {sup Jq ( 0)-Q ( 0) I l=O. 0 n n .: Proof: See Hoadley (1971, Theorem A.5). The following lemma establishes that the moment restriction holds. I l l+d Lemma A. 2. 3. E Q ( 0) 0. n Proof: Since zt(0) is bounded we have I 1 2+d Therefore, the conclusion of the lemma follows if E u 1 O. t Let lt=l(l;pt+l >O), set d=l, note that EJlt Jk=E(lt)< 00 , k=l,2, , and recall that u =l -E(l ). Thus, lt t t Q.E.D. Finally, we present a rank condition that implies Q(0) has a 0 unique minimum at 0, and therefore together with Lemma A.2.3 ensures consistency for a global minimizer of Q (0). n

79 Lemma A.2.4. i Suppose xt is a discrete random variable, and let xt denote the i-th member of the support of xt. For each 0~ such that 0 0=0, suppose there exists k>l members of the support of xt such that the following matrix has full column rank: 1 1 If p~ 1 >p~ 0 , then Q(0) has a unique minimum at 0. Proof: Since E(u 1 t lxt)=O, we have Obviously, Q(0) has a minimum at 0 since E(zt(0) 2 )=0. To prove uniqueness it suffices to show that 0=0<=> E(z (0) 2 )>0. t Since Pr(z (0) 2 >0)=1, we have t 0 2 Suppose for some 0=0, E(zt(0) )=O. 2 2 E(z (0) )=O<=>Pr(z (0) =O)=l. t t This implies that for every x! belonging 2 to the support of xt,zt(0) =O. That is, But this contradicts the assumption that~ has full column rank unless 0 0 0 P11-P11=P11-P1o=P11-P1o=O. Q.E.D. Finally,we note without proof that Theorem 3.1 of White (1980) can be applied to show that the nonlinear LS estimator obtained from solving problem (1) is asymptotically normal. Therefore, the LS estimates have a proper limiting distribution.
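The consistency argument for the nonlinear LS estimator can be illustrated with a rough numerical sketch. The code minimizes the sample objective, the mean squared gap between the indicator $1(\Delta p_{t+1}>0)$ and its modeled expectation, over a grid in $\gamma$. Everything specific here is an assumption: the functional form `prob_up` (a mixture of two probabilities weighted by a normal c.d.f.), the scalar regressor, the parameter values, and the shortcut of holding $p_{11}$ and $p_{10}$ at their true values so that only $\gamma$ is searched.

```python
import math
import random

def norm_cdf(z):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_up(x, p11, p10, gamma):
    """Assumed form of E(1(dp > 0) | x); hypothetical, for illustration only."""
    return p10 + (p11 - p10) * norm_cdf(gamma * x)

def q_n(data, p11, p10, gamma):
    """Sample LS objective: mean squared gap between indicator and its mean."""
    total = 0.0
    for up, x in data:
        total += (up - prob_up(x, p11, p10, gamma)) ** 2
    return total / len(data)

rng = random.Random(7)
p11_0, p10_0, gamma_0 = 0.9, 0.2, 1.0  # hypothetical true values, p11 > p10
data = []
for _ in range(3000):
    x = rng.uniform(-3.0, 3.0)
    up = 1 if rng.random() < prob_up(x, p11_0, p10_0, gamma_0) else 0
    data.append((up, x))

# Profile the objective in gamma only, on a grid from 0.1 to 4.0.
grid = [0.1 * k for k in range(1, 41)]
gamma_hat = min(grid, key=lambda g: q_n(data, p11_0, p10_0, g))
print(gamma_hat, q_n(data, p11_0, p10_0, gamma_hat))
```

Because the objective is a mean of squares it is bounded below by zero, so a grid or multi-start search has a well-defined target. In this sketch the grid minimizer should land close to the assumed true value of $\gamma$.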

80 A.3 Proofs of Theorems 3.2-3.17 Proof of Theorem 3.2: See McLeish (1975, Theorem 2.10). Proof of Theorem 3.7: The proof is the same as Hoadley's (1971) Theorem 1 except Theorem 3.2 is applied instead of Markov's law of large numbers. Proof of Theorem 3.8: For notational simplicity let lt=l(~pt+l >0). Consider an arbitrary point El"'E~. We will show that given E>O there p exists d >O such that I 0 -0k I O be chosen. Then equation (1) implies that there exists at=a(xt,lt)>O and dt=d(xt,lt)>O such that for lqtl>at and l0p-e;l..:::a=max(a 1 , ,ak) has a finite support) equation (2) holds a.e. and 10 -0*lO such that equation (2) holds a.e. uniformly int whenever 10 -0*l

81 Proof of Lemma 3.9: The result follows from the fact that is separable and ft(Qt,1(6pt+l >0) l0p) is continuous on ~See, for example, Loeve (1960, p. 510). Proof of Lemma 3.10: Hartley and Mallela (1977, Corollary 4.2) prove that there exists p(0 )>0 such that p k E lsup{lnft(Q ,1(6pt+l>O) 10'): l0'-0l

82 We will show that Step 1 follows from Given 3.17(i), the mean-value theorem for random functions (Jennrich (1969, Lemma 3)) allows us to write ,..Jill o ,..Jill o f(tJ )=f(0 )+(af(0 )/30 )(tJ -0) and n p n p n p ' (5) ( 6) (7) where f (if1 1 ). denotes the i-th column of the matrix f (if1 1 ), and 0 -c n 1 -c n n and each lie on the segment connecting ef1 1 and 0. n n p Given (7), 3.17(ii), and plim e: 1 =0;, we have o T" o (0) 1 f (0).+o(l). -c p -c p J p (8) ,..Jill o Given (6), (7), 3.17(ii), 3.17(iii), H and plim tJ =0, we have o n p (9) Substituting (8) and (9) into (1) we get the desired result: By Step l we can write Therefore, by 3.17(iv), 3.17(v), and 3.17(vi), we have

83 Given 3.17(vi), by Corollary 4.24 of White (1984), (11) (6) and (7) imply, and therefore by Corollary 4.28 of White (1984) we have Finally, since plim (D (#1 1 )1 -D (0)1 )=0 by Theorem 4.30 of White n n n p ' (1984) Q.E.D. A.4 Quadratic Hill-Climbing and the Asymptotic Distribution of the (p+l)th-Round Estimates The (p+l)th (p=l,2, ..â€¢ ) iteration of the quadratic hill-climbing technique is given by (1) where a = max ( A +r 11 \71 ( ff) 11, 0), A is the maximum eigenvalue of n n n n n ..J-1 (ff), r is a scalar correction factor, and I I \71 ( Ef) 11 denotes the n n n n length of the k dimensional vector \71 (Ef). n n Goldfeld, Quandt, and Trotter (1966) show that the technique h _o+ l h d f L ( 0) c oases tr to maximize t e qua ratic approximation o on a n n region centered at ff of radius n

84 If the quadratic approximation is good, (that is, if the step increases L (0)), then in the next step r is decreased. Otherwise r is increased. n Further details can be found in Goldfeld, Quandt, and Trotter (1966). Next we show that the estimator defined by eJ>+l has the same n asymptotic distribution as the partial-MLE provided that plim #=0 and n Tn (el'-0) has a proper limiting distribution. More explicitly, we show n , ~P+l 0 vn(tr -0) n (2) The implication is that when consistent initial estiamtes are employed, iteration beyond the second-round does not improve the final estimates, at least asymptotically. -1 To prove (2), it suffices to show that plim n a =O. To see this, n consider the mean-value expansion (3) Substituting (3) into (1) and rearranging, we get ;~P+ 1 O -1 -1 2 ~P -1 1 o vn ( tr -0 ) ( n a I-n V-L ( tr ) ) n 2 171 ( 0 ) n n n n n -1 2 ~P -1 -1 -1 2 r -P o = [ I -( n V-L ( tr )-n a I) n V-L ( 0 ) ] vn ( tr 0 ) n n n n n n (4) Therefore, if plim n1 a =O, then (2) follows from (4) since the right n hand side of (4) converges in probability to zero. -1 The following theorem establishes that plim n a =O. n n Theorem A.4 .1. For 171 ( 0) = E alogft ( 0) / a0, suppose n t=l (i) -1 n plim sup n r [alogf (0)/a0E(alogf (0)/a0)] = O. 0 t=l t t

(ii) plim cf=e 0 n (iii) E(alogf (0)/a0) is continuous. t -1 (iv) plim n A

BIBLIOGRAPHY

Amemiya, T. (1973). "Regression Analysis when the Dependent Variable is Truncated Normal." Econometrica 41:997-1016.
Amemiya, T. (1974). "A Note on the Fair and Jaffee Model." Econometrica 42:759-762.
Amemiya, T., and G. Sen (1977). "The Consistency of the Maximum Likelihood Estimator in a Disequilibrium Model." Technical Report 238. Institute for Mathematical Studies in the Social Sciences, Stanford University.
Benassy, J.P. (1982). The Economics of Market Disequilibrium. New York: Academic Press.
Bowden, R.J. (1978). The Econometrics of Disequilibrium. Amsterdam: North Holland.
Cosslett, S.R. (1983). "Distribution-free Maximum Likelihood Estimator of the Binary Choice Model." Econometrica 51:765-782.
Fair, R.C., and D.M. Jaffee (1972). "Methods of Estimation for Markets in Disequilibrium." Econometrica 40:497-514.
Fair, R.C., and H.H. Kelejian (1974). "Methods of Estimation for Markets in Disequilibrium: A Further Study." Econometrica 42:117-190.
Fisher, F.M. (1983). Disequilibrium Foundations of Equilibrium Economics. New York: Cambridge University Press.
Goldfeld, S.M., and R.E. Quandt (1975). "Estimation in a Disequilibrium Model and the Value of Information." Journal of Econometrics 3:325-348.
Goldfeld, S.M., R.E. Quandt, and H.F. Trotter (1966). "Maximization by Quadratic Hill-climbing." Econometrica 34:541-551.
Gordin, M.I. (1969). "The Central Limit Theorem for Stationary Processes." Soviet Mathematics 10:1174-1176.
Hartley, M.J., and P. Mallela (1977). "The Asymptotic Properties of a Maximum Likelihood Estimator for a Model of Markets in Disequilibrium." Econometrica 46:1251-1271.

Heckman, J.J. (1976). "The Common Structure of Statistical Models of Truncated, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models." Annals of Economic and Social Measurement 5:475-492.
Hoadley, B. (1971). "Asymptotic Properties of Maximum Likelihood Estimators for the Independent Not Identically Distributed Case." Annals of Mathematical Statistics 42:1977-1991.
Ito, T., and K. Ueda (1981). "Tests of the Equilibrium Hypothesis in Disequilibrium Econometrics: An International Comparison of Credit Rationing." International Economic Review 22:691-708.
Jennrich, R.I. (1969). "Asymptotic Properties of Non-linear Least Squares Estimators." Annals of Mathematical Statistics 40:633-643.
Laffont, J.J., and R. Garcia (1977). "Disequilibrium Econometrics for Business Loans." Econometrica 45:1187-1204.
Lee, L.F., and R.H. Porter (1984). "Switching Regression Models with Imperfect Sample Separation Information - With an Application on Cartel Stability." Econometrica 52:391-418.
Levine, D. (1983). "A Remark on Serial Correlation in Maximum Likelihood." Journal of Econometrics 23:337-342.
Loeve, M. (1960). Probability Theory. 2nd ed. Princeton: Van Nostrand.
Maddala, G.S. (1983). Limited-dependent and Qualitative Variables in Econometrics. New York: Cambridge University Press.
Maddala, G.S., and F. Nelson (1974). "Maximum Likelihood Methods for Markets in Disequilibrium." Econometrica 42:1013-1030.
Manski, C.F. (1975). "The Maximum Score Estimation of the Stochastic Utility Model of Choice." Journal of Econometrics 3:205-228.
Manski, C.F. (1983). "Closest Empirical Distribution Estimator." Econometrica 51:305-320.
Manski, C.F. (1985). "Semiparametric Analysis of Discrete Response: Asymptotic Properties of the Maximum Score Estimator." Journal of Econometrics 27:313-333.
McLeish, D.C. (1975). "A Maximal Inequality and Dependent Strong Laws." Annals of Probability 3:826-836.
Newey, W.K., and K.D. West (1985). "A Simple, Positive Definite, Heteroscedasticity and Autocorrelation Consistent Covariance Matrix." Discussion paper 92, Woodrow Wilson School, Princeton University.

Olsen, R.J. (1978). "Note on the Uniqueness of the Maximum Likelihood Estimator for the Tobit Model." Econometrica 46:1211-1215.
Powell, J.L. (1984). "Least Absolute Deviations Estimation for the Censored Regression Model." Journal of Econometrics 25:303-325.
Rao, R.R. (1962). "Relations between Weak and Uniform Convergence of Measures with Applications." Annals of Mathematical Statistics 33:659-680.
Rudin, W. (1976). Principles of Mathematical Analysis. New York: McGraw-Hill.
Serfling, R.J. (1968). "Contributions to Central Limit Theory for Dependent Variables." Annals of Mathematical Statistics 39:1158-1175.
Sealy, C.W., Jr. (1979). "Credit Rationing in the Commercial Loan Market: Estimates of a Structural Model Under Conditions of Disequilibrium." Journal of Finance 34:689-702.
Wald, A. (1949). "Note on the Consistency of the Maximum Likelihood Estimate." Annals of Mathematical Statistics 20:595-601.
White, H. (1980). "Nonlinear Regression on Cross-Section Data." Econometrica 48:721-746.
White, H. (1984). Asymptotic Theory for Econometricians. New York: Academic Press.
White, H., and I. Domowitz (1981). "Nonlinear Regression with Dependent Observations." Unpublished paper, University of California, San Diego.
White, H., and I. Domowitz (1984). "Nonlinear Regression with Dependent Observations." Econometrica 52:143-162.

BIOGRAPHICAL SKETCH

Walter James Mayer was born in Detroit, Michigan, in 1955. He received a Bachelor of Arts degree in economics from the University of Missouri in 1982, and a Master of Arts degree from the University of Florida in 1983.
