Estimation and prediction for certain models of spatial time series


Material Information

Title:
Estimation and prediction for certain models of spatial time series
Physical Description:
viii, 134 leaves : ; 28 cm.
Language:
English
Creator:
Eby, Lloyd Marlin, 1951-
Publication Date:

Subjects

Subjects / Keywords:
Spatial analysis (Statistics)   ( lcsh )
Time-series analysis   ( lcsh )
Estimation theory   ( lcsh )
Prediction theory   ( lcsh )
Genre:
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Thesis:
Thesis--University of Florida.
Bibliography:
Includes bibliographical references (leaves 131-133).
Statement of Responsibility:
by Lloyd Marlin Eby.
General Note:
Typescript.
General Note:
Vita.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 000086058
notis - AAK1413
oclc - 05356240
System ID:
AA00003493:00001

Full Text










ESTIMATION AND PREDICTION FOR CERTAIN MODELS
OF SPATIAL TIME SERIES













By

LLOYD MARLIN EBY


A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF
THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY








UNIVERSITY OF FLORIDA


1978


































TO

MY PARENTS AND FAMILY

FOR THEIR LOVE AND SUPPORT















ACKNOWLEDGMENTS


My sincere thanks go to my advisors, Dr. Richard Scheaffer and

Dr. James McClave. I will always appreciate their patient guidance

throughout this project, from suggesting the problem to providing help-

ful comments on the first draft of this paper. To be able to draw on

their experience in research situations was always reassuring.

Special thanks go to the faculty and students of the

Department of Statistics for their encouragement during the pursuit

of this degree.

My appreciation extends to Professor Harry Canter at Millersville

State College who, because of his enthusiasm for statistics and special

interest in his students, was instrumental in my entering this field.

I am deeply grateful for the support of my family and friends.

Knowing of their loving concern and prayers for me, during this under-

taking, meant much to me.

My typist, Mrs. Edna Larrick, is especially deserving of my

thanks. She somehow deciphered my hieroglyphics and turned them into

this typing masterpiece. Her perseverance in this difficult task is

greatly appreciated.














TABLE OF CONTENTS


Page

ACKNOWLEDGMENTS .............................................. iii

LIST OF TABLES ............................................... vi

ABSTRACT ..................................................... vii

CHAPTER

I    INTRODUCTION ............................................ 1

     1.0 Preamble ............................................ 1
     1.1 Introduction to the Spatial Problem ................. 1
     1.2 A Literature Review ................................. 3
     1.3 Our Approach to the Problem ......................... 7
     1.4 An Outline of Our Results ........................... 10
     1.5 Notation and Format ................................. 11
     1.6 Review of Assumptions Introduced in Chapter I ....... 11

II   ESTIMATION OF MODEL PARAMETERS .......................... 15

     2.0 Preamble ............................................ 15
     2.1 The Usual Yule-Walker Estimators .................... 15
     2.2 The Known Weights Case .............................. 17
     2.3 The Variable Weights Case ........................... 22
     2.4 Review of Assumptions Introduced in Chapter II ...... 29

III  PROPERTIES OF ESTIMATORS ................................ 30

     3.0 Preamble ............................................ 30
     3.1 Results for the Usual Yule-Walker Estimators
         and Another Useful Lemma ............................ 30
     3.2 The Known Weights Case .............................. 32
     3.3 The Variable Weights Case ........................... 35
     3.4 Review of Assumptions Introduced in Chapter III ..... 62

IV   ESTIMATORS OF COVARIANCE MATRICES AND THEIR PROPERTIES .. 64

     4.0 Preamble ............................................ 64
     4.1 Results for the General First-Order Autoregressive
         Multivariate Model .................................. 64
     4.2 The Yule-Walker #1 and #2 Covariance Estimators ..... 67
     4.3 Consistency of the Yule-Walker #1 and #2
         Covariance Estimators ............................... 71

V    INFERENCE ............................................... 74

     5.0 Preamble ............................................ 74
     5.1 Asymptotic Single-Parameter Hypothesis Tests
         and Confidence Intervals ............................ 74
     5.2 Asymptotic Multiparameter Hypothesis Tests
         and Confidence Regions .............................. 79
     5.3 Prediction with the General First-Order Autoregressive
         Multivariate Time Series Model ...................... 82
     5.4 Prediction with the Spatial First-Order Autoregressive
         Multivariate Time Series Model ...................... 95
     5.5 Review of Assumptions Introduced in Chapter V ....... 97

VI   EMPIRICAL RESULTS ....................................... 98

     6.0 Preamble ............................................ 98
     6.1 Monte Carlo Studies ................................. 98
     6.2 A Real Data Example ................................. 127

BIBLIOGRAPHY ................................................. 131

BIOGRAPHICAL SKETCH .......................................... 134














LIST OF TABLES


Table                                                            Page

1.1   Notation ............................................... 12

1.2   Assumptions Introduced in Chapter I .................... 14

2.1   Assumptions Introduced in Chapter II ................... 29

3.1   Assumptions Introduced in Chapter III .................. 63

6.1   Minimum and Maximum Absolute Roots of
      f(z) = |I - B_r z| = 0 ................................. 102

6.2   Weights Assigned to the Neighbors of Location 7 ........ 103

6.3   The Negative First-Order Correlation of Each Location
      with Location 7 ........................................ 105

6.4   YW#1 Estimates of a with Actual and Estimated Asymptotic
      Standard Deviations of a_{T1} .......................... 107

6.5   YW#1 Estimates of b with Actual and Estimated Asymptotic
      Standard Deviations of b_{T1} .......................... 109

6.6   YW#1 Estimates of α with Actual and Estimated Asymptotic
      Standard Deviations of α_{T1} .......................... 111

6.7   Actual and Estimated Values of the Asymptotic Covariance
      of a_{T1} and b_{T1} ................................... 119

6.8   Actual and Estimated Values of the Asymptotic Covariance
      of a_{T1} and α_{T1} ................................... 120

6.9   Actual and Estimated Values of the Asymptotic Covariance
      of b_{T1} and α_{T1} ................................... 122

6.10  Mean Squared Errors for Usual Yule-Walker and
      Yule-Walker #1 Estimates ............................... 124

6.11  Mean, Range, and Standard Deviation of
      (a_{T1}, b_{T1}, α_{T1}) ............................... 125

6.12  Covariances and Correlations of (a_{T1}, b_{T1}, α_{T1}) 125

6.13  Names and Coordinates of Employment Exchange Cities .... 128

6.14  YW#1 Estimates of (a, b, α) and the Asymptotic
      Covariance Matrix of (a_{T1}, b_{T1}, α_{T1}) .......... 129










Abstract of Dissertation Presented to the Graduate Council
of the University of Florida in Partial Fulfillment of the Requirements
for the Degree of Doctor of Philosophy


ESTIMATION AND PREDICTION FOR CERTAIN MODELS
OF SPATIAL TIME SERIES

By

Lloyd Marlin Eby

August 1978


Chairman: Richard L. Scheaffer
Co-Chairman: James T. McClave
Major Department: Statistics


Our primary objective is to consider a special class of first-

order autoregressive multivariate time series models in which the

individual series correspond to locations on a plane. Conditioned on

the past, the expected response at a given location for a given time

period is taken to be a linear combination of the immediate past

response at that location and a weighted average of the immediate past

responses at the other locations. If the weights are not assumed to be

known, an exponential weight function of the interlocational distances

is used. (We refer to this as the variable weights case.) The form

of the weighting function is quite flexible in that it allows for a wide

range of weighting schemes which might be appropriate in various applica-

tions to both regular and irregular arrays of locations. Parameters of

interest are the two linear coefficients and a parameter in the weight

function (in the variable weights case).

An estimation procedure is proposed which takes into account

the spatial nature of the process through modification of the usual










Yule-Walker estimators. Using the results for the usual Yule-Walker

estimators, ours are shown to be consistent (in probability) and asymp-

totically normally distributed for both the known and variable weights

cases.

A benefit of our approach to the spatial time series problem is

that we obtain straightforward asymptotic tests for location, neighbor,

and distance effects. Asymptotic joint confidence ellipsoids are also

given for these parameters. We develop an approximation to the variance-

covariance matrix of the k-step prediction errors in using the fitted

general first-order autoregressive model. The necessary modifications

of this matrix for the spatial model are given.

We present consistent estimators of the variance-covariance

matrices of the error term and the time series. This allows us to

consistently estimate all other variance-covariance matrices encoun-

tered in our work.

Some simulation results are presented which indicate that the

performance of our estimators depends on the location, neighbor, and

distance effects as well as array characteristics. There does not

appear to be one model specification for which all estimators perform

well except for large (by time series standards) samples. An actual

data example is also analyzed.

The methodology developed is flexible so that it can have a wide

range of application. The procedures presented suggest the possibility

for extension of these results to other first-order autoregressive models,

both spatial and nonspatial, for which restrictions are placed on the

coefficient matrix.

















CHAPTER I


INTRODUCTION



1.0 Preamble

The spatial problem being investigated is introduced in Section 1.1

by considering several examples which serve as motivation for our work.

After a review of the literature in Section 1.2, we describe our approach

to this problem in Section 1.3. An outline of the results to be presented

is given in Section 1.4. Section 1.5 introduces our notation and format

for the dissertation.


1.1 Introduction to the Spatial Problem

Many physical processes generate multivariate responses for which

the components of the vector response are associated with distinct points

in a plane. These responses may be repeated over time. Such processes

are referred to as spatial-temporal processes. For example, several

weather stations might be located throughout a region, with each station

monitoring local conditions on a regular basis. Suppose temperature

readings are recorded every hour at each station. We can regard the vec-

tor responses of hourly temperatures as a multivariate time series.

In addition to expecting a relationship among the vector responses over

time, we might expect a spatial relationship among the components of the

vector since the individual variates correspond to particular locations

in a region. In particular, we might expect that there is a "distance











effect" among the locations, with the responses from those stations that

are close together being perhaps more strongly related than responses

from stations that are far apart. We refer to multivariate time series

of this type as spatial time series.

For our real data example in Chapter VI, we consider unemployment

rates for ten centers in southwestern England. Each month, the unemploy-

ment rate is determined for the region corresponding to each center.

These monthly rates for all ten centers constitute a spatial time series.

In modeling a spatial time series, our objective in this paper is

to model the nondeterministic component of the series. (We expect to con-

sider the deterministic component as well in future research.) With this

objective in mind, we consider the simplest autoregressive model, the

first-order model for which

y_t = B y_{t-1} + ε_t,                                      (1.1.1)

where y_t is the vector response at time t, ε_t is an unobservable random error vector, and B is an n x n matrix of coefficients. In the general

model, it is not assumed that B has a specific structure which would

reflect the spatial nature of the series. Consequently, in applying only

the general estimation schemes for B to a spatial problem, we are not ex-

plicitly accounting for the spatial aspects of the phenomenon under study.

It would seem desirable to assume a structure for B that reflects

the spatial nature of the process. In particular, in considering a

response at a given location at time t, it would be of interest to con-

sider the relationship of that response to a response at the same loca-

tion and to responses at neighboring locations in the previous time period.

Factors such as distance should enter into the consideration of the rela-

tionship with a particular neighbor.










By assuming such a structure for B and developing estimation

procedures based on this structure, we hope to model the underlying

process which generated the series. In addition, the structural assump-

tions would probably mean a reduction in the number of parameters in the

model. A parsimonious parameterization is desirable, provided that such

a model adequately describes the process, since such a parameterization

allows more efficient usage of the sample information. A model which

incorporates both the spatial and time aspects of the process would seem

to be a better forecasting tool than a model which only includes the time

aspect.

Before going into more detail on our approach to this problem,

we review the literature on related problems.


1.2 A Literature Review

Much of the work in the general area of spatially related random

variables has been done with purely spatial processes, where both joint

and conditional models have been considered. For the joint model, the

response at location i is related to the responses at the other locations,

simultaneously. Specializing a joint model to the linear case, we have

that

y_i = Σ_{j≠i} β_{ij} y_j + ε_i,                             (1.2.1)

where ε_i is a random error term. For the linear conditional model, the relationship is such that

E(y_i | responses at other locations) = Σ_{j≠i} γ_{ij} y_j.     (1.2.2)

At first glance, it would appear that taking the expectation of yi in

(1.2.1) conditional on the responses at the other locations would yield









(1.2.2) with γ_{ij} = β_{ij} for all i and j ≠ i. However, this is not the

case since the error term, ε_i, is not independent of the y_j's. Bartlett

(1974), Besag (1974), Brook (1964), Cliff and Ord (1975), and Ord (1975)

give more complete discussions of the differences between the two speci-

fications.

Many of the specific results for spatial processes are for

regular arrays (for example, rectangular) of locations. Restrictions are

usually placed on the coefficients in (1.2.1) and (1.2.2). For example,

a simple first-order joint model on a regular lattice is given by

y_{ij} = β (y_{i-1,j} + y_{i,j-1} + y_{i+1,j} + y_{i,j+1}) + ε_{ij},

where the subscripts correspond to the coordinates of the location.

The correlation structure (or spectral function) of some of the

joint models (or their continuous analogues) is considered by Bartlett

(1974), Besag (1972), Heine (1955), and Whittle (1954). Whittle (1954)

developed a maximum likelihood estimation scheme for the parameters of

the spectral function.

Besag (1974), a major proponent of a conditional approach,

discusses a class of conditional models called auto-models. Examples are

the auto-normal, auto-binomial, and auto-logistic. These models are

specified by the probability (or density) function of y. conditional on

the response at all other locations. Although these models can be speci-

fied for both regular and irregular arrays of locations, the statistical

analysis is generally limited to the regular lattice cases. Besag (1974)

shows that it can be quite difficult to use maximum likelihood procedures

directly and thus discusses two alternative approaches. The first relies

on a subsetting of the responses (which is called coding) which results










in a simpler likelihood. In the second, another simpler maximum likeli-

hood procedure results when a unilateral approximation to the original

process is used. (For the unilateral approach, the concept of one-

directional dependency in an autoregressive time series is extended to

two dimensions.) Besag and Moran (1975) use the coding procedure to

develop a test of spatial dependency for an auto-normal process.

Although irregular arrays may be less attractive mathematically,

they are of interest for practical reasons since many spatial processes

occur naturally on irregular arrays. Cliff and Ord have done extensive

work in this area. Their approach has been to specify weights that are

functions of array characteristics such as interlocational distances and

region size. (See Cliff and Ord (1969, 1975), Cliff, Haggett et al.

(1975:148-149, 161), or Mead (1971) for examples.) For example, a joint

model could be specified such that

y_i = ρ Σ_{j=1, j≠i}^{n} w_{ij} y_j + ε_i,                  (1.2.3)

where the w_{ij}'s are known weights and ε_i is a random error term. (The

approach also extends to the conditional case.) A natural extension would

be for a restricted parameterization of the weights so that sample infor-

mation could be used to estimate them.

Two types of inference problems are considered for Cliff and

Ord-type models. The first involves tests for spatial autocorrelation

and the second involves parameter estimation. Cliff, Haggett et al.

(1975:152-155) present a parametric test (under normal assumptions)

and a nonparametric test of H_0: ρ = 0, where ρ is as in (1.2.3) or its

conditional analogue. Both test statistics, under the null hypothesis,









have asymptotic normal distributions (as n → ∞). Cliff and Ord (1972)

develop a similar test for spatial correlation among the error residuals

in a linear regression.

Maximum likelihood estimation procedures (under normal assump-

tions) are presented by Ord (1975) for both the model in (1.2.3) and an

extension which included regressor variables. Maximum likelihood proce-

dures for some other models are outlined in Cliff and Ord (1975).

Another approach to modeling spatial processes has been to think

of the responses as a surface and fit polynomial models of the form,

y = Σ_{i=0}^{m} Σ_{j=0}^{r} β_{ij} x_1^i x_2^j + ε,

where x_1 and x_2 are the map coordinates and ε is a random error term.

(See Cliff, Haggett et al. (1975:49-70).) This is an example of a trend

surface model.

A somewhat different class of spatial processes is the class of

spatial point processes. These processes are characterized by the dis-

tribution of points across a region. The literature is fairly extensive

in this area. Two important types of analysis of point processes are the

distance methods and the quadrat count methods. A sampling of results

for these and related methods can be found in the work of Diggle (1975),

Holgate (1972), Mead (1974), Rogers (1974), and Strauss (1975).

Spatial-temporal processes are an extension of purely spatial

processes. Both Granger (1969) and Cliff, Haggett et al. (1975:107-141)

have used standard multivariate time series techniques (cross-spectral

analysis) in comparing time series corresponding to locations in a region.

Cross-sectional time series analysis may be appropriate for some

spatial problems where the cross sections are taken over regions or










locations. Swamy and Mehta (1977) consider a linear model for cross-

sectional time series in which the coefficient vector is taken to be the

sum of a mean vector and two random components. One component varies

over time and among individuals (which could be locations) and the other

varies only over individuals.

Fuller and Battese (1974) consider estimation of a linear model

for cross-sectional time series but assume an error term which is the

sum of location and individual components (possibly random) and another

random component. Both Maddala (1971) and Nerlove (1971) have studied

estimation for error-component linear models (somewhat similar to Fuller

and Battese's model) which contain a single lagged value of the depen-

dent (univariate) variable.

Cliff and Ord (1970) discuss estimation schemes and testing

procedures for the coefficient vectors of a linear model for cross-

sectional time series. Constraints on the coefficient vectors such as

equality for all individuals (or over time) are considered. They also

develop some estimation procedures when the coefficient vector is random.

Although we found a number of related problems in our literature

review, we found little evidence of statistical procedures developed for

a spatially restricted coefficient matrix for the model in (1.1.1).

This research develops such procedures.



1.3 Our Approach to the Problem

In Section 1.1, we suggested that a first-order spatial time

series model should incorporate location, neighbor, and distance effects

in the structure of B. We will do this by considering the response

for time t at location i, y_{t,i}, to be of the form

y_{t,i} = a y_{t-1,i} + b Σ_{j=1, j≠i}^{n} w_{ij} y_{t-1,j} + ε_{t,i},     (1.3.1)

where ε_{t,i} is a random error term, a and b are parameters whose values are unknown, and n is the number of locations in the array. The w_{ij}'s are weights which may be completely known or contain one or more parameters to be estimated from the sample information. We make three assumptions concerning the weights.

A1: For all w_{ij}, 0 ≤ w_{ij} ≤ 1.

A2: For all i, w_{ii} = 0.

A3: The weights are scaled to add to unity for each location.
    That is, Σ_{j=1, j≠i}^{n} w_{ij} = 1 for all i.
j#i

Since y_{t-1,i} already enters the model with a as its coefficient,

we set w_{ii} = 0 for all i. The other two assumptions are made to provide
a consistent class of models. (For example, the total weight should not

depend on the number of locations in the array.) The necessity of these

assumptions will be seen as they are used in the derivation of certain

results in later chapters.
By considering all three assumptions, we see that Σ_{j≠i} w_{ij} y_{t-1,j} is just a weighted average of the responses at time (t-1) for all locations other than i. It follows that the parameters, a and b, can be regarded as accounting for a location effect and a neighbor effect, respectively. If a is zero, only the neighboring locations of i are explicitly related to y_{t,i}. However, if b = 0, none of i's neighbors appears explicitly in the model for y_{t,i}. (By a neighbor of location i, we mean any location other than i and not just contiguous neighbors.)









The nature of the distance effect among the neighbors would determine the

form of the weights. If a distance effect is to be considered, there

must be at least two different interlocational distances, and thus, the

need for an additional assumption.

A4: There are at least three locations in the array. If there

are exactly three, the array is not in the form of an

equilateral triangle.

The model in (1.3.1) is a specific case of a more general model

suggested in Cliff, Haggett et al. (1975:202). By referring to the model

in (1.3.1) as "our model," we do not intend to suggest originality on our

part in the model formulation, but we do develop original methods of

parameter estimation, particularly in the variable weights case. We also

refer to this model as "the spatial model."

Writing the model in (1.3.1) in matrix form yields

y_t = (aI + bW) y_{t-1} + ε_t,

where W is the matrix of weights (all diagonal terms are zero) and I is

the n x n identity matrix. We summarize the restrictions on B as follows.

A5: For a first-order autoregressive spatial time series, the model in (1.1.1) is such that B = B_r, where

    B_{r,ii} = a for all i

and

    B_{r,ij} = b w_{ij} for all i and j ≠ i.

With this model specification, one objective is to estimate a, b,

any parameters in the weight function, and the variance-covariance matrix

of the error terms. Another objective is to make the modifications neces-

sary to use this model in forecasting.
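The construction of B_r is easily illustrated. The following sketch (in Python, with hypothetical coordinates and parameter values chosen only for illustration, not taken from this work) builds a weight matrix satisfying A2 and A3 from an exponential function of the interlocational distances of the kind described later, forms B_r as in A5, and simulates the model in (1.1.1). The assertion checks the stationarity condition introduced formally in Chapter II (A6), which holds exactly when the eigenvalues of B_r lie inside the unit circle.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
coords = rng.uniform(0.0, 10.0, size=(n, 2))         # hypothetical locations
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)

a, b, alpha = 0.4, 0.3, 0.5                          # illustrative values
W = np.exp(-alpha * d)
np.fill_diagonal(W, 0.0)                             # A2: w_ii = 0
W /= W.sum(axis=1, keepdims=True)                    # A3: each row sums to 1

B_r = a * np.eye(n) + b * W                          # A5: B_r = aI + bW
assert np.all(np.abs(np.linalg.eigvals(B_r)) < 1.0)  # stationarity (A6) holds

T = 200
y = np.zeros((T, n))
for t in range(1, T):                                # y_t = B_r y_{t-1} + e_t
    y[t] = B_r @ y[t - 1] + rng.standard_normal(n)
```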










1.4 An Outline of Our Results

We consider two cases of the spatial model. In the first the

weights are assumed to be completely specified (the known weights case)

and in the second, the weights are of a specific form but contain a

parameter to be estimated (the variable weights case).

In Chapter II, we develop estimation schemes for the location and

neighbor parameters in both cases and also for the distance effect

parameter in the variable weights case. These schemes involve modifica-

tion of the usual Yule-Walker estimators according to the specific struc-

ture assumed for B (i.e., Br).

In Chapter III, we show the existence of finite-valued estimators

using these schemes. These estimators are also shown to be consistent

(in probability) and asymptotically normally distributed. The asymptotics

are in terms of T, the number of vector responses observed in time, and

not n, the number of locations in the array.

Consistent estimators of the variance-covariance matrices of

both the random error term, ε_t, and y_t are presented in Chapter IV.

In Chapter V, we focus on inferential aspects. Procedures based

on asymptotic results are given for testing hypotheses and constructing

confidence ellipsoids for the location, neighbor, and distance (if appro-

priate) parameters. We also derive an approximation to the variance-

covariance matrix of the k-step prediction errors in using a fitted

general first-order autoregressive model and make the necessary modifica-

tions for the case of the fitted spatial model.

We conclude in Chapter VI by presenting simulation results which

provide insight into some of the procedures developed in earlier chapters.

We also analyze a real data set.











1.5 Notation and Format

Since notation in time series work can be quite cumbersome, we

summarize our notational system in Table 1.1.

From time to time, we introduce certain assumptions and as we

introduce each one, we give the rationale for it. It is to be under-

stood that the assumption is in effect for the remainder of the paper.

At the end of each chapter, we list all assumptions introduced in that

chapter.


1.6 Review of Assumptions Introduced in Chapter I

The assumptions introduced in Chapter I are summarized in Table 1.2.











Table 1.1

Notation

Notation              Interpretation

A_{i.}                row i of matrix A
A_{.j}                column j of matrix A
A_{ij}                the element in row i and column j of matrix A
A_{c,ij}              the element in row i and column j of matrix A_c
{A_{ij}}              the matrix comprised of the A_{ij}'s
(ABC)_{ij}            the element in row i and column j of matrix ABC
A'                    A transposed
|A|                   the determinant of the matrix A
A ⊗ C                 the Kronecker product of A and C
I_n                   the n x n identity matrix
x                     the vector, x
x_i                   the i-th element of x
x_{c,i}               the i-th element of x_c
{x_m}, m = 1, 2, ...  the sequence, x_1, x_2, x_3, ...
f(·)                  the function, f
x*                    a particular value of the random variable x
x_T →_P x             x_T converges to x in probability
x_T →_D x             x_T converges to x in distribution or law
r_T → r               convergence of a sequence of constants
≈                     is approximately equal to
~                     is distributed as
N_k(μ, Σ)             the k-variate normal distribution with mean μ
                      and variance-covariance matrix Σ
N(μ, σ²)              the univariate normal distribution with mean μ
                      and variance σ²










Table 1.1 (Continued)

Notation              Interpretation

θ_o                   the true value of θ when θ is a parameter
θ_T                   an estimator of θ based on T observations
R^k                   k-dimensional real space
iff                   if and only if
glb                   greatest lower bound
in 3.2.1              in Section 3.2.1
in (3.2.1)            in equation (3.2.1)
A1                    assumption #1
C1                    condition #1
R1                    result #1
Smith (1975:27)       page 27 of the reference authored by Smith and
                      published in 1975










Table 1.2

Assumptions Introduced in Chapter I

Section   Assumption

1.3       A1: For all w_{ij}, 0 ≤ w_{ij} ≤ 1.

1.3       A2: For all i, w_{ii} = 0.

1.3       A3: The weights are scaled to add to unity for each location.
              That is, Σ_{j=1, j≠i}^{n} w_{ij} = 1 for all i.

1.3       A4: There are at least three locations in the array. If there
              are exactly three, the array is not in the form of an
              equilateral triangle.

1.3       A5: For a first-order autoregressive spatial time series, the
              model in (1.1.1) is such that B = B_r, where
              B_{r,ii} = a for all i
              and
              B_{r,ij} = b w_{ij} for all i and j ≠ i.















CHAPTER II


ESTIMATION OF MODEL PARAMETERS



2.0 Preamble

In this chapter, we will consider estimation schemes for

parameters other than the variance and covariance terms in the special

first-order autoregressive model introduced in Chapter I. Since the

estimation procedures to be introduced involve modifications of the usual

Yule-Walker (YW) estimators, a review of the YW estimation procedure will

be presented in Section 2.1. The estimation procedures for the known

weights case and variable weights case are presented in Sections 2.2 and

2.3, respectively. The properties of the estimators will be derived

in Chapter III.


2.1 The Usual Yule-Walker Estimators

Hannan (1970:13-15, 326-333) and Fuller (1976:72-73) are the

primary references for the results of this section.

Again consider the model for the general first-order auto-

regressive multivariate time series,

y_t = B y_{t-1} + ε_t,                                      (2.1.1)

where y_t, y_{t-1}, and ε_t are vectors of length n and B is n x n. The following assumptions are made.

A6: All roots of f(z) = |I - Bz| = 0 lie outside the unit circle.










A7: The error terms, ε_t's, are independent and identically

distributed with mean, 0, and variance-covariance

matrix, G.

There are three implications of A6 and A7 that should be noted

at this stage. The results are given here without proof. The first

is that

E(y_t) = 0 for all t.                                       (2.1.2)

The second is that y_t is second-order stationary. That is, the covariance function has the following property:

E(y_s y_t') = Γ(s-t) for all s and t.                       (2.1.3)

A third implication of A6 and A7 is that ε_t is independent of y_{t-1}, y_{t-2}, ..., for all t.
It is now apparent that in making assumptions A6 and A7, we are

assigning a stability to the process in terms of its first two moments.

It should also be noted that since our special first-order model can be

included within the general framework of the model in (2.1.1), A6 and A7

will be assumed throughout for our special model and so results like

(2.1.2) and (2.1.3) will still follow.

If both sides of (2.1.1) are multiplied by y_{t-1}' and expectations are taken, we have, after applying (2.1.2), (2.1.3), A6 and A7,

Γ(-1) = B Γ(0).

This leads to

B = Γ(-1) Γ^{-1}(0).

The usual YW estimator of B, B_T, is found by replacing the parameters on the right-hand side of the above equation with their "moment" estimators. That is,

B_T = Γ_T(-1) Γ_T^{-1}(0),                                  (2.1.4)

where

Γ_{T,-k,ij} = (1/T) Σ_{t=1}^{T-k} y_{t+k,i} y_{t,j},   k = 0, 1.


This estimator, B_T, defines a process which satisfies, with

probability one, the conditions for stationarity given in A6.

As was noted, an implication of A6 and A7 is that E(y_t) = 0. This is somewhat unrealistic if y_t is regarded as the vector observation at time t. Thus, we will let x_t denote the vector observation at time t and assume y_t to be as in A8.

A8: Let y_t = x_t - μ for all t, where E(x_t) = μ.

The calculations necessary in (2.1.4) are then carried out using x_t - x̄, t = 1, 2, ..., T, where x̄_i = Σ_{t=1}^{T} x_{t,i} / T. Hannan shows that this
-t i It=1 ti

mean correction does not change any asymptotic properties of interest in

our work. Consequently, for the remainder of the theoretical consider-

ations in this paper, it will be assumed, without loss of generality,

that the mean correction has already been made.

A9: Let y_t = x_t - x̄, t = 1, 2, ..., T, be the observations used to fit the model in (2.1.1).
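For concreteness, the usual YW estimator in (2.1.4), together with the mean correction of A8 and A9, can be sketched in a few lines; the T x n observation array x is hypothetical input.

```python
import numpy as np

def yule_walker_B(x):
    """Usual YW estimator B_T of (2.1.4) from a T x n observation array."""
    y = x - x.mean(axis=0)            # A8/A9: mean-corrected observations
    T = y.shape[0]
    gamma0 = y.T @ y / T              # Gamma_T(0)
    gamma1 = y[1:].T @ y[:-1] / T     # Gamma_T(-1): (1/T) sum y_{t+1,i} y_{t,j}
    return gamma1 @ np.linalg.inv(gamma0)
```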




2.2 The Known Weights Case


2.2.1 Introduction

We now work with the special form of the coefficient matrix, B, which we denote by B_r, where

B_{r,ii} = a,   i = 1, 2, ..., n,

and

B_{r,ij} = b w_{ij},   i, j = 1, 2, ..., n;  i ≠ j.

The w_{ij}'s are known weights for which we assume A1, A2, and A3. Our objective then is to estimate a and b which, in turn, allows us to estimate B_r.

Since the weights are assumed to be known in this section, it

would be helpful to first consider some possible choices of weights.


2.2.2 Examples of Known Weights

It was stated in the introduction that most of the work done

with spatial processes on irregular lattices has been with known weights.

Ord (1974) states in the discussion on Besag's paper that one of the

specifications of a spatial model arises when the spatial relationship is

in the form of a time lag, which is true of our model although the spec-

ification is different. Consequently, some of the weighting patterns

that have been suggested or used in the literature for spatial models

are presented here, since they may be appropriate for the spatial-

temporal processes that we consider. If the researcher possesses con-

siderable insight into the process being studied, it may be reasonable

to completely specify an appropriate weighting scheme.

In the following examples, the weights will be presented in the unscaled form. The simplest weighting scheme is: w_{ij} = 1 if location j is a nearest neighbor of location i, j ≠ i, and w_{ij} = 0 otherwise. (See Cliff, Haggett et al. (1975:161).) We will refer to models with this weight structure as "closest-neighbor" models.











If one had regional data and wished to consider the relative size

of the regions and distances between their centers, one might use the

scheme,

w_{ij} = q_i(j) / d_{ij},   j ≠ i,

where q_i(j) is the proportion of location i's interior boundary which is in contact with the boundary of location j and d_{ij} is the distance between location i and location j. (See Ord (1975).)
location i and location j. (See Ord (1975).)

Both of the above weighting schemes assign nonzero weights only

to those locations which are direct or contiguous neighbors. If all

neighbors are to be taken into account in the weighting scheme, one might

use the following weights:

w_{ij} = d_{ij}^{-δ},   j ≠ i,

where δ is specified, or

w_{ij} = e^{-α d_{ij}},   j ≠ i,

where α is specified. (See Cliff and Ord (1975).) For both of these weight schemes, we see that the weights either increase or decrease monotonically as d_{ij} increases, the direction of change depending on the sign of δ and α.
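The following sketch computes scaled versions (satisfying A2 and A3) of two of these schemes, the closest-neighbor weights and the weights w_{ij} = d_{ij}^{-δ}; the distance matrix d is hypothetical input.

```python
import numpy as np

def scale_rows(w):
    np.fill_diagonal(w, 0.0)                     # A2: w_ii = 0
    return w / w.sum(axis=1, keepdims=True)      # A3: rows sum to one

def closest_neighbor_weights(d):
    off = d.astype(float).copy()
    np.fill_diagonal(off, np.inf)                # exclude the location itself
    w = (off == off.min(axis=1, keepdims=True)).astype(float)
    return scale_rows(w)                         # ties split the weight evenly

def power_distance_weights(d, delta):
    w = np.zeros_like(d, dtype=float)
    mask = ~np.eye(len(d), dtype=bool)
    w[mask] = d[mask] ** -delta                  # unscaled w_ij = d_ij^(-delta)
    return scale_rows(w)
```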


2.2.3 The Yule-Walker #2 Estimation Procedure
for the Known Weights Case

Since using the usual YW estimators to fit the first-order auto-

regressive time series model, when B = Br, does not account explicitly

for the spatial nature of the process being considered, it is desirable

to develop an estimation scheme which does account for this spatial nature










in a more direct fashion. By checking assumption A7, we see that no

distribution has been assumed for the ε_t's. Since this allows for a
wider range of application in our work, it would seem advantageous then

to develop an estimation procedure which is distribution-free. The dis-

tribution-free results given for B_T in Sections 2.1 and 3.1 suggest esti-

mation procedures which modify BT. There are various criteria by which

one might modify the usual YW estimator to reflect the spatial nature of

the process. One criterion is to use as an estimator of B those esti-
r
mators of a and b which make the B .'s as "close" as possible to the
rT,ij
usual YW estimators, the BT,ij 's. The criterion suggests a least squares

approach.

In this case, take as the estimators, a_{T2} and b_{T2}, those values of a_T and b_T which minimize

SS = Σ_{i=1}^{n} Σ_{j=1}^{n} (B_{T,ij} - B_{rT,ij})²,       (2.2.1)

where B_{rT,ii} = a_T and B_{rT,ij} = b_T w_{ij}, j ≠ i. Also, B_T is the matrix of

usual YW estimators given by (2.1.4). (This subscript "2" indicates that

these are the YW#2 estimators. The significance of the "2" will become

apparent later.)

Because of the form of BrT, the sum of squares function given

in (2.2.1) can be separated into two parts, the diagonal sum of squares

and the off-diagonal sum of squares, as shown below:

SS = Σ_{i=1}^{n} (B_{T,ii} - a_T)² + Σ_{i=1}^{n} Σ_{j≠i} (B_{T,ij} - b_T w_{ij})².     (2.2.2)

The value of aT which minimizes the above sum of squares is that value

which minimizes the left-hand component in (2.2.2). A similar statement











can be made about the minimizing bT-value relative to the right-hand

component in (2.2.2).

By taking partial first derivatives of (2.2.2) and equating to 0,

one finds that

a_{T2} = (1/n) Σ_{i=1}^{n} B_{T,ii}                         (2.2.3)

and

b_{T2} = Σ_{i=1}^{n} Σ_{j≠i} B_{T,ij} u_{ij},               (2.2.4)

where

u_{ij} = w_{ij} / (Σ_{k=1}^{n} Σ_{ℓ≠k} w_{kℓ}²).

From this discussion, we see that the YW#2 estimators of a and b

can be found through a two-step procedure.


Step 1: Find the usual YW estimator of B (and hence B_r) by using (2.1.4).

Step 2: Find aT2 and bT2 from (2.2.3) and (2.2.4),

respectively.
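A minimal sketch of this two-step procedure, given the usual YW estimate B_T and a known weight matrix W (both hypothetical inputs), is shown below.

```python
import numpy as np

def yw2_known_weights(B_T, W):
    """YW#2 estimators for the known weights case."""
    n = B_T.shape[0]
    off = ~np.eye(n, dtype=bool)                 # off-diagonal mask
    a_T2 = np.trace(B_T) / n                     # (2.2.3)
    u = W / (W[off] ** 2).sum()                  # u_ij of (2.2.4)
    b_T2 = (B_T[off] * u[off]).sum()             # (2.2.4)
    return a_T2, b_T2
```

Replacing u_{ij} by 1/n in the last step gives the YW#1 estimator of b presented in the next subsection.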


2.2.4 The Yule-Walker #1 Estimation Procedure
for the Known Weights Case

In this estimation scheme, a property of the weights is used to

find another estimator of b. Let a_{T1} be the same as a_{T2} given in (2.2.3).

To find b_{T1}, note that

Σ_{i=1}^{n} Σ_{j≠i} B_{r,ij} = Σ_{i=1}^{n} Σ_{j≠i} b w_{ij} = b Σ_{i=1}^{n} 1 = n b.

This suggests that b could be estimated by

b_{T1} = (1/n) Σ_{i=1}^{n} Σ_{j≠i} B_{T,ij}.                (2.2.5)

In comparing (2.2.5) with (2.2.4), it is seen that (2.2.5) is just a special case of (2.2.4) where u_{ij} = 1/n for all i, j ≠ i. Note that this will be the least squares estimator if the weights are w_{ij} = 1/(n-1) for all i, j ≠ i. This observation will make our theoretical considerations

in later chapters easier in the sense that we need only consider YW#2 in

the known weights case.

The two-step procedure for the YW#1 estimators is as follows.


Step 1: Find the usual YW estimator of B by using (2.1.4).

Step 2: Find a_{T1} and b_{T1} from (2.2.3) and (2.2.5),

respectively.


2.3 The Variable Weights Case

2.3.1 Introduction

In the study of spatial processes, one may be willing, in a

particular situation, to specify the form of the weights but not their










specific values. In these situations, the weight function would contain

a parameter or parameters to be estimated using the sample information.

We will consider the weight function of the form (before scaling),

v_{ij}(α) = e^{-α d_{ij}},   j ≠ i.                         (2.3.1)

(From this point on, the notation "w_{ij}" will be reserved for the known weights case.)

This weight function was introduced in 2.2.2, but now α is a parameter to be estimated from the sample information. This particular weight function will be investigated in the following section, after which we present procedures for estimating a, b, and α.


2.3.2 Properties of the Exponential
Weight Function

The exponential weight function takes distance into account in a reasonable way, exponentially decreasing or increasing as distance increases depending on the sign of α. If α = 0, each neighbor receives identical weight. For these reasons, one can label α as a "distance effect" parameter (assuming b ≠ 0). Because of the explicit dependence of these weights on distance, they are suitable for both regular and irregular arrays of locations.

This weight function has certain mathematical properties which allow one to develop the statistical and numerical properties of α_T. One such property is continuity everywhere as a function of α. Another concerns the limits of the functions as |α| tends to ∞. Now,










v_{ij}(α) = e^{-α d_{ij}} / Σ_{k≠i} e^{-α d_{ik}} = 1 / Σ_{k≠i} e^{α(d_{ij} - d_{ik})},   j ≠ i.     (2.3.2)

Let c_i = the number of locations j for which d_{ij} = min_{k≠i} {d_{ik}} and f_i = the number of locations j for which d_{ij} = max_{k≠i} {d_{ik}}.

Let us first consider the limiting case as α tends to +∞. It is enough to consider the limiting behavior of the components in the denominator of v_{ij}(α) in (2.3.2). For j ≠ i, as α → +∞,

e^{α(d_{ij} - d_{ik})} →  ∞  iff  d_{ij} > d_{ik},
                           1  iff  d_{ij} = d_{ik},
                           0  iff  d_{ij} < d_{ik}.

From the limiting behavior of these components, it is clear that

lim_{α→+∞} v_{ij}(α) =  1/c_i  if d_{ij} = min_{k≠i} {d_{ik}},
                        0      otherwise.

In the limiting case then, we have the weights corresponding to the closest-neighbor model introduced in 2.2.2.

Let us now consider the limiting case as α tends to -∞. By observing the result in the previous case, one might conjecture an analogous result here with the weights corresponding to a "farthest-neighbor" model. Indeed, it follows that

lim_{α→-∞} v_{ij}(α) =  1/f_i  if d_{ij} = max_{k≠i} {d_{ik}},
                        0      otherwise.










We thus see that the exponential function allows flexibility in the

weights for the spatial process.
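These limits are easy to verify numerically. In the sketch below (with a hypothetical three-location distance matrix), a large positive α reproduces the closest-neighbor weights and a large negative α the farthest-neighbor weights.

```python
import numpy as np

def v(alpha, d):
    """Scaled exponential weights v_ij(alpha) of (2.3.2)."""
    w = np.exp(-alpha * d)
    np.fill_diagonal(w, 0.0)
    return w / w.sum(axis=1, keepdims=True)

d = np.array([[0.0, 1.0, 2.0],      # hypothetical distances, n = 3
              [1.0, 0.0, 1.5],
              [2.0, 1.5, 0.0]])
print(np.round(v(25.0, d), 3))      # rows ~ closest-neighbor weights (1/c_i)
print(np.round(v(-25.0, d), 3))     # rows ~ farthest-neighbor weights (1/f_i)
```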


2.3.3 The Yule-Walker #2 Estimation Procedure
for the Variable Weights Case

The criterion used in deriving the YW#2 estimators here is the

same as that used in 2.2.3 (i.e., least squares). As before, the sum

of squares function to be minimized is split into two components as

follows:

SS = Σ_{i=1}^{n} (B_{T,ii} - a_T)² + Σ_{i=1}^{n} Σ_{j≠i} [B_{T,ij} - b_T v_{ij}(α_T)]²,     (2.3.3)

where v_{ij}(α) is given by (2.3.2).

Then a_{T2}, b_{T2}, and α_{T2} are those values of a_T, b_T, and α_T, respectively, which minimize the sum of squares function in (2.3.3).

Taking the first derivative of this function with respect to a_T and equating to 0 yields

a_{T2} = (1/n) Σ_{i=1}^{n} B_{T,ii}.                        (2.3.4)

Similar action in terms of b_T yields

b_{T2} = [ Σ_{i=1}^{n} Σ_{j≠i} B_{T,ij} v_{ij}(α_T) ] / [ Σ_{k=1}^{n} Σ_{ℓ≠k} [v_{kℓ}(α_T)]² ].     (2.3.5)

After seeing the form of our sum of squares function in (2.3.3),

it is not surprising that our results here agree with those for the









YW#2 estimators of a and b in the known weights case. Equation (2.3.4)

agrees with (2.2.3) and (2.3.5) agrees with (2.2.4) if a value of α_T is specified.

A result that will be useful in simplifying our work is,

for j ≠ i,

∂v_{ij}(α_T)/∂α_T = v_{ij}(α_T) [ Σ_{k≠i} d_{ik} v_{ik}(α_T) - d_{ij} ].     (2.3.6)

Now taking the first partial derivative of the sum of squares function in (2.3.3) with respect to α_T and equating to 0 yields

b_T Σ_{i=1}^{n} Σ_{j≠i} [B_{T,ij} - b_T v_{ij}(α_T)] v_{ij}(α_T) [ d_{ij} - Σ_{k≠i} d_{ik} v_{ik}(α_T) ] = 0.     (2.3.7)

Then α_{T2} is the solution to (2.3.7) with b_T replaced by b_{T2},

given in (2.3.5). The resulting equation can be simplified a bit by

dividing through by bT2. This modification necessitates the assumption

that bT2 is nonzero.

A10: In the variable weights case, b_{T2} ≠ 0.

The necessity of this assumption is seen by examining equation

(2.3.5) and the sum of squares function in (2.3.3). Any α_{T2}-value which

would lead to b_{T2} = 0 in (2.3.5) must be meaningless because it is

obvious from (2.3.3) that if b_{T2} = 0, α cannot be estimated.

Therefore α_{T2} is the solution to the following equation,

Σ_{i=1}^{n} Σ_{j≠i} [B_{T,ij} - b_{T2} v_{ij}(α_T)] v_{ij}(α_T) [ d_{ij} - Σ_{k≠i} d_{ik} v_{ik}(α_T) ] = 0,     (2.3.8)

where b_{T2} is given by (2.3.5).










The YW#2 estimation procedure can be summarized in two steps.

Step 1: Find the usual YW estimator of B by using (2.1.4).

Step 2: Find a_{T2} and b_{T2} explicitly from (2.3.4) and (2.3.5), respectively, after finding α_{T2} implicitly from (2.3.8).

Two problems arise with this estimation procedure. First, the

implicit solution (and its determination) to (2.3.8) is complicated by

the fact that b_{T2} is also a function of α_{T2}. Future research would

indicate whether or not this would be a problem numerically. In any

case though, the evaluation of the statistical properties would be

more difficult.

The second potential problem occurs in the presence of a weak

neighbor effect (i.e., b close to 0). Since a distance effect can be

identified only if a neighbor effect is present, it would seem that it

might be difficult to get a clear picture of any distance effect if the

neighbor effect itself is small. This suggests that α_{T2}'s behavior might

be erratic (i.e., large variance) in the presence of a weak neighbor

effect. However, since the estimators of b and α are intertwined in

the YW#2 procedure, it appears that there may be an effect on both b_{T2}

and α_{T2} in this case.

These problems, real and potential, should serve as motivation

to consider, at least initially, other estimation schemes for which b is

estimated independently of a. Such a scheme, YW#1, is presented in the

next section. All additional work for the variable weights case has

been for the YW#1 estimators. One aspect of future research will involve

the study of the YW#2 estimators in this case. At this stage of the











discussion, it might now be clear why the numerical labels for these

estimation procedures were given as they were.


2.3.4 The Yule-Walker #1 Estimation Procedure
for the Variable Weights Case

In 2.2.4, an estimator of b was introduced which did not use

any property of the weights other than that they were scaled to add to

one for each location. It is that estimator which will be used now in

the variable weights case. The estimator of a is unchanged. That is,

a_{T1} = (1/n) Σ_{i=1}^{n} B_{T,ii}                         (2.3.9)

and

b_{T1} = (1/n) Σ_{i=1}^{n} Σ_{j≠i} B_{T,ij}.                (2.3.10)

Then α_{T1} is that α_T-value which minimizes the following sum of squares:

SS = Σ_{i=1}^{n} Σ_{j≠i} [B_{T,ij} - b_{T1} v_{ij}(α_T)]².  (2.3.11)

Using (2.3.6) and taking the first derivative of the function in (2.3.11) with respect to α_T and equating to 0, we have, after simplifying,

b_{T1} Σ_{i=1}^{n} Σ_{j≠i} [B_{T,ij} - b_{T1} v_{ij}(α_T)] v_{ij}(α_T) [ d_{ij} - Σ_{k≠i} d_{ik} v_{ik}(α_T) ] = 0.

We assume that bT1 is nonzero for basically the same reasons as were

given in the previous section.










A11: In the variable weights case, b_{T1} ≠ 0.

It follows then that α_{T1} is a solution to the following equation:

Σ_{i=1}^{n} Σ_{j≠i} [B_{T,ij} - b_{T1} v_{ij}(α_T)] v_{ij}(α_T) [ d_{ij} - Σ_{k≠i} d_{ik} v_{ik}(α_T) ] = 0.     (2.3.12)

The YW#1 estimation procedure thus yields an estimator of b which

is functionally independent of α. The procedure can be summarized in

two steps.

Step 1: Find the usual YW estimator of B by using (2.1.4).

Step 2: Find a_{T1} and b_{T1} explicitly from (2.3.9) and (2.3.10), respectively, and then α_{T1} implicitly from (2.3.12).
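Since α_{T1} is defined implicitly by (2.3.12), in practice it can be computed by minimizing the sum of squares in (2.3.11) directly. A sketch follows; the bounded search interval and the inputs B_T and d are assumptions of the illustration, not part of the procedure itself.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def v(alpha, d):
    """Scaled exponential weights v_ij(alpha) of (2.3.2)."""
    w = np.exp(-alpha * d)
    np.fill_diagonal(w, 0.0)
    return w / w.sum(axis=1, keepdims=True)

def yw1_variable_weights(B_T, d):
    n = B_T.shape[0]
    off = ~np.eye(n, dtype=bool)
    a_T1 = np.trace(B_T) / n                      # (2.3.9)
    b_T1 = B_T[off].sum() / n                     # (2.3.10)

    def ss(alpha):                                # (2.3.11)
        return ((B_T[off] - b_T1 * v(alpha, d)[off]) ** 2).sum()

    # minimizing (2.3.11) solves the first-order condition (2.3.12);
    # the finite search interval is an assumption of this sketch
    alpha_T1 = minimize_scalar(ss, bounds=(-10.0, 10.0), method="bounded").x
    return a_T1, b_T1, alpha_T1
```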


2.4 Review of Assumptions Introduced in Chapter II

The assumptions introduced in Chapter II are summarized in

Table 2.1.

Table 2.1

Assumptions Introduced in Chapter II

Section   Assumption

2.1       A6: All roots of f(z) = |I - Bz| = 0 lie outside the unit circle.

2.1       A7: The error terms, ε_t's, are independent and identically
              distributed with mean, 0, and variance-covariance matrix, G.

2.1       A8: Let y_t = x_t - μ for all t, where E(x_t) = μ.

2.1       A9: Let y_t = x_t - x̄, t = 1, 2, ..., T, be the observations
              used to fit the model in (2.1.1).

2.3.3     A10: In the variable weights case, b_{T2} ≠ 0.

2.3.4     A11: In the variable weights case, b_{T1} ≠ 0.














CHAPTER III


PROPERTIES OF ESTIMATORS



3.0 Preamble

In this chapter, we will consider numerical and statistical

properties of the estimators developed in Chapter II. Numerical prop-

erties of existence and uniqueness are considered, and the statistical

properties of consistency and asymptotic distribution are investigated.

In Section 3.1, we review those properties of the usual YW estimators

which are beneficial in dealing with our estimators. This section also

contains a general lemma which will be applied in the remainder of the

chapter. We then discuss these properties for the known weights case

in Section 3.2 and the variable weights case in Section 3.3.


3.1 Results for the Usual Yule-Walker Estimators
and Another Useful Lemma

In terms of statistical properties, we will be concerned with

consistency (in probability) and asymptotic distributions. The first

two lemmas give these properties for the usual YW estimators. Hannan

(1970:329-332) gives proofs of the results which lead to these lemmas.


Lemma 3.1:

If y_t is generated as in (2.1.1) and A6 and A7 hold (that is, we

have a second-order stationary process), then for all i and j,


B_{T,ij} →_P B_{o,ij}   as T → ∞,

where B_T is defined in (2.1.4) and B_o is the true value of B.











Let

β_T' = (B_{T,11}, B_{T,12}, ..., B_{T,1n}, B_{T,21}, ..., B_{T,2n}, ..., B_{T,n1}, ..., B_{T,nn})     (3.1.1)

and

β_o' = (B_{o,11}, B_{o,12}, ..., B_{o,1n}, B_{o,21}, ..., B_{o,2n}, ..., B_{o,n1}, ..., B_{o,nn}).    (3.1.2)

Recall that for our model, B_o = B_{ro} (A5).

Lemma 3.2:

Under the same conditions as in Lemma 3.1, we have

√T (β_T - β_o) →_D N_{n²}[0, G ⊗ Γ^{-1}(0)]   as T → ∞,

where G is the variance-covariance matrix of ε_t defined in A7 and Γ(0) is given by (2.1.3).

The third lemma to be stated provides a useful result for the

asymptotic distribution of well-behaved functions of asymptotically

normal statistics. Rao (1973:388) gives a proof of the lemma.

Lemma 3.3:

Let θ_T be a k-dimensional statistic, (θ_{T,1}, ..., θ_{T,k})', for which

√T (θ_T - θ_o) →_D N_k(0, Σ)   as T → ∞.

Let h_1, ..., h_q be q functions of k variables and assume that each h_i is totally differentiable. Then the asymptotic distribution of √T [h_i(θ_T) - h_i(θ_o)], i = 1, 2, ..., q, as T → ∞, is q-variate normal with mean 0 and variance-covariance matrix H Σ H', where

H_{ij} = ∂h_i(θ)/∂θ_j evaluated at θ = θ_o.

The rank of the distribution is the rank of H Σ H'.











3.2 The Known Weights Case

As was mentioned in 2.2.4, we can concentrate on the properties

of the YW#2 estimators and simply note the slight adjustments necessary

for the special case of the YW#1 estimators.


3.2.1 Existence and Uniqueness

Recall from our work in 2.2.3 that

a_{T2} = (1/n) Σ_{i=1}^{n} B_{T,ii}                         (3.2.1)

and

b_{T2} = Σ_{i=1}^{n} Σ_{j≠i} B_{T,ij} u_{ij},               (3.2.2)

where u_{ij} = w_{ij} / (Σ_{k=1}^{n} Σ_{ℓ≠k} w_{kℓ}²). It is clear that a_{T2} and b_{T2} both exist and are unique. This result also holds for the YW#1 estimators, since in that case, u_{ij} = 1/n for all i and j ≠ i.


3.2.2 Consistency (in Probability)

From Lemma 3.1, we have that the B_{T,ij}'s are consistent (in probability). The modified estimators given in (3.2.1) and (3.2.2) are just linear combinations of consistent estimators and, hence, are both consistent (in probability) since

a_{T2} = (1/n) Σ_{i=1}^{n} B_{T,ii} →_P (1/n) Σ_{i=1}^{n} B_{ro,ii} = n a_o / n = a_o   as T → ∞

and

b_{T2} = Σ_{i=1}^{n} Σ_{j≠i} B_{T,ij} u_{ij} →_P Σ_{i=1}^{n} Σ_{j≠i} B_{ro,ij} u_{ij}

       = Σ_{i=1}^{n} Σ_{j≠i} b_o w_{ij} u_{ij}

       = b_o (Σ_{i=1}^{n} Σ_{j≠i} w_{ij}²) / (Σ_{k=1}^{n} Σ_{ℓ≠k} w_{kℓ}²)

       = b_o   as T → ∞.

Similarly,

a_{T1} →_P a_o   as T → ∞

and

b_{T1} →_P b_o   as T → ∞.


3.2.3 The Asymptotic Joint Distribution
of (aT, bT)

To find the asymptotic distribution of (a_{T2}, b_{T2}), we use Lemma 3.2 to satisfy the conditions of Lemma 3.3. For application of Lemma 3.3, let θ_T = β_T, θ_o = β_o, k = n², Σ = G ⊗ Γ^{-1}(0),

h_1(β) = (1/n) Σ_{i=1}^{n} B_{ii}

and

h_2(β) = Σ_{i=1}^{n} Σ_{j≠i} B_{ij} u_{ij},

where u_{ij} is given after (3.2.2). Then

H_2 = [ 1/n   0    ...  0     |  0     1/n   0    ...  0     | ... |  0     ...  0         1/n ]
      [ 0     u_12 ...  u_1n  |  u_21  0     u_23 ...  u_2n  | ... |  u_n1  ...  u_n,n-1   0   ]     (3.2.3)

That is, the first row of H_2 has 1/n in the positions corresponding to the diagonal elements of B and zeros elsewhere, and the second row has u_{ij} in the position corresponding to B_{ij}, j ≠ i, and zeros in the diagonal positions. Since the elements in H_2 are all constants, it follows that h_1 and h_2 are totally differentiable. Now,

h_1(β_T) = a_{T2}   and   h_2(β_T) = b_{T2}.

From our work in 3.2.2, we have

h_1(β_o) = a_o   and   h_2(β_o) = b_o.

Applying Lemma 3.3 yields the result that

√T [(a_{T2}, b_{T2}) - (a_o, b_o)] →_D N_2(0, H_2 Σ H_2')   as T → ∞,

where Σ = G ⊗ Γ^{-1}(0) and H_2 is given by (3.2.3). The univariate asymptotic distributions of both a_{T2} and b_{T2} follow easily from the joint result.

It also follows that

√T [(a_{T1}, b_{T1}) - (a_o, b_o)] →_D N_2(0, H_1 Σ H_1')   as T → ∞,

where Σ = G ⊗ Γ^{-1}(0) and H_1 is the same as H_2 except u_{ij} = 1/n for all i and j ≠ i.
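Given plug-in estimates of G and Γ(0) (consistent estimators are presented in Chapter IV), the 2 x 2 matrix H_2 Σ H_2' can be assembled directly. A sketch under those assumptions, using the row-by-row stacking of (3.1.1) and Σ = G ⊗ Γ^{-1}(0) as in Lemma 3.2:

```python
import numpy as np

def asym_cov_yw2(G, gamma0, W):
    """H_2 Sigma H_2' of (3.2.3); divide by T to approximate the
    finite-sample variance-covariance matrix of (a_T2, b_T2)."""
    n = G.shape[0]
    sigma = np.kron(G, np.linalg.inv(gamma0))   # Sigma = G kron Gamma^{-1}(0)
    off = ~np.eye(n, dtype=bool)
    u = W / (W[off] ** 2).sum()                 # u_ij, zero on the diagonal
    H = np.zeros((2, n * n))
    H[0] = (np.eye(n) / n).ravel()              # row 1: 1/n at diagonal slots
    H[1] = u.ravel()                            # row 2: u_ij at (i, j), j != i
    return H @ sigma @ H.T
```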










3.3 The Variable Weights Case

As one might expect, it will be more difficult to get the properties

of our estimators here, since there is no explicit solution for the esti-

mator of α. As was mentioned in 2.3.3, only the YW#1 estimators will be

considered.


3.3.1 Existence and Uniqueness

Recall from our work in 2.3.4 that

a_{T1} = (1/n) Σ_{i=1}^{n} B_{T,ii},                        (3.3.1)

b_{T1} = (1/n) Σ_{i=1}^{n} Σ_{j≠i} B_{T,ij},                (3.3.2)

and α_{T1} is the solution to the equation

Σ_{i=1}^{n} Σ_{j≠i} [B_{T,ij} - b_{T1} v_{ij}(α)] v_{ij}(α) [ d_{ij} - Σ_{k≠i} d_{ik} v_{ik}(α) ] = 0.     (3.3.3)

It was established in 3.2.1 that a_{T1} and b_{T1} both exist and are unique. In order to show the existence of α_{T1}, we will work with the sum of squares function, call it s_T(α), the partial derivative of which led to (3.3.3). That is,

s_T(α) = Σ_{i=1}^{n} Σ_{j≠i} [B_{T,ij} - b_{T1} v_{ij}(α)]²,     (3.3.4)

where v_{ij}(α) is the exponential weight function given in (2.3.2).

Define the following sets:

N_i = {locations j: d_{ij} = min_{k≠i} {d_{ik}}},

Q_i = {locations j: j ∉ N_i, j ≠ i},

F_i = {locations j: d_{ij} = max_{k≠i} {d_{ik}}},

and

P_i = {locations j: j ∉ F_i, j ≠ i}.

Recall from 2.3.2 that c_i = the number of elements in N_i and f_i = the number of elements in F_i.

Theorem 3.4:

Suppose the following conditions are met.

C1: The estimate, b_{T1}, is nonzero.

C2: It is not true that the usual YW estimator, B_T, is such that

    B_{T,ij} = b_{T1}/c_i for all i and j ∈ N_i, and B_{T,ij} = 0 for all i and j ∈ Q_i,

or

    B_{T,ij} = b_{T1}/f_i for all i and j ∈ F_i, and B_{T,ij} = 0 for all i and j ∈ P_i.

C3: There exists a location i for which c_i < n-1, where n ≥ 3.

Then there exists a finite α_{T1} such that s_T(α_{T1}) = min_α s_T(α).

Proof:

Let M_{T1} = lim_{α→+∞} s_T(α) and M_{T2} = lim_{α→-∞} s_T(α). Using the results from 2.3.2, it follows that

M_{T1} = Σ_{i=1}^{n} Σ_{j∈N_i} (B_{T,ij} - b_{T1}/c_i)² + Σ_{i=1}^{n} Σ_{j∈Q_i} B_{T,ij}²     (3.3.5)

and

M_{T2} = Σ_{i=1}^{n} Σ_{j∈F_i} (B_{T,ij} - b_{T1}/f_i)² + Σ_{i=1}^{n} Σ_{j∈P_i} B_{T,ij}².    (3.3.6)

From C2, we see that both M_{T1} and M_{T2} are positive. (They are also finite since the B_{T,ij}'s are finite.) Since s_T(α) is clearly a continuous function of α, we know that if there exists a finite α_3 such that s_T(α_3) < M_{T1} and s_T(α_3) < M_{T2}, then there exists a finite α_{T1} such that s_T(α_{T1}) = min_α s_T(α). Our objective is to show the existence of α_3.

First, we will show there exists a finite α_1 such that s_T(α_1) < M_{T1}. Let ε > 0 be such that

ε < min ( min_i {1/c_i}, R_1, R_2 ),                        (3.3.7)

where R_1 and R_2 are the positive roots of two quadratics to be introduced later in (3.3.15) and (3.3.18). Since lim_{α→+∞} v_{ij}(α) = 0 for all i and j ∈ Q_i and, for finite α, v_{ij}(α) > 0 for all i and j ≠ i, it follows that there exists a finite positive α_L such that for all finite α ≥ α_L,

0 < v_{ij}(α) < ε for all i and j ∈ Q_i.                    (3.3.8)

We also know that lim_{α→+∞} v_{ij}(α) = 1/c_i for all i and j ∈ N_i. Since these weights sum to 1 for each location, it follows that for all i and j ∈ N_i, v_{ij}(α) approaches 1/c_i from the left as α → +∞. In addition to satisfying (3.3.8), α_L can be chosen to also satisfy the condition that for all finite α ≥ α_L, 0 < 1/c_i - ε < v_{ij}(α) < 1/c_i for all i and j ∈ N_i. Consider the interval S = [α_L, 2α_L]. Since S is compact, there exists w_L > 0 such that

w_L = min_{i, j∈Q_i} min_{α∈S} v_{ij}(α),

and for each i there exists w_{ui} > 0 such that w_{ui} < 1/c_i and

w_{ui} = max_{j∈N_i} max_{α∈S} v_{ij}(α).

Thus, for all α ∈ S,

0 < w_L ≤ v_{ij}(α) < ε for all i and j ∈ Q_i,              (3.3.9)

0 < 1/c_i - ε < v_{ij}(α) ≤ w_{ui} < 1/c_i for all i and j ∈ N_i.     (3.3.10)

From (3.3.10), we have

0 < 1/c_i - w_{ui} ≤ 1/c_i - v_{ij}(α) < ε for all i and j ∈ N_i.     (3.3.11)

We now claim that for all α ∈ S, M_{T1} - s_T(α) > 0. From (3.3.4) and (3.3.5),

M_{T1} - s_T(α)

= Σ_{i=1}^{n} Σ_{j∈N_i} { (B_{T,ij} - b_{T1}/c_i)² - [B_{T,ij} - b_{T1} v_{ij}(α)]² } + Σ_{i=1}^{n} Σ_{j∈Q_i} { B_{T,ij}² - [B_{T,ij} - b_{T1} v_{ij}(α)]² }

= b_{T1} ( 2 [ Σ_{i=1}^{n} Σ_{j∈Q_i} B_{T,ij} v_{ij}(α) - Σ_{i=1}^{n} Σ_{j∈N_i} B_{T,ij} (1/c_i - v_{ij}(α)) ]
         + b_{T1} [ Σ_{i=1}^{n} Σ_{j∈N_i} (1/c_i² - v_{ij}²(α)) - Σ_{i=1}^{n} Σ_{j∈Q_i} v_{ij}²(α) ] ).     (3.3.12)

Case 1: Suppose that b_{T1} > 0.

It is enough to show that for all α ∈ S, the factor multiplying b_{T1} in (3.3.12) is positive; that is,

2 [ Σ_{i=1}^{n} Σ_{j∈Q_i} B_{T,ij} v_{ij}(α) - Σ_{i=1}^{n} Σ_{j∈N_i} B_{T,ij} (1/c_i - v_{ij}(α)) ] + b_{T1} [ Σ_{i=1}^{n} Σ_{j∈N_i} (1/c_i² - v_{ij}²(α)) - Σ_{i=1}^{n} Σ_{j∈Q_i} v_{ij}²(α) ] > 0.     (3.3.13)

Now using (3.3.9), (3.3.10), and (3.3.11), the left-hand side of (3.3.13) is bounded below by

b_{T1} [ Σ_{i=1}^{n} c_i (1/c_i² - w_{ui}²) - Σ_{i=1}^{n} (n-1-c_i) ε² ]
+ 2 [ Σ Σ_{j∈Q_i, B_{T,ij}>0} B_{T,ij} w_L + Σ Σ_{j∈Q_i, B_{T,ij}<0} B_{T,ij} ε - Σ Σ_{j∈N_i, B_{T,ij}≥0} B_{T,ij} ε - Σ Σ_{j∈N_i, B_{T,ij}<0} B_{T,ij} (1/c_i - w_{ui}) ]

= A_1 ε² + B_1 ε + C_1,                                     (3.3.14)

where

A_1 = -b_{T1} Σ_{i=1}^{n} (n-1-c_i),

B_1 = 2 [ Σ Σ_{j∈Q_i, B_{T,ij}<0} B_{T,ij} - Σ Σ_{j∈N_i, B_{T,ij}≥0} B_{T,ij} ],

and

C_1 = b_{T1} Σ_{i=1}^{n} c_i (1/c_i² - w_{ui}²) + 2 [ Σ Σ_{j∈Q_i, B_{T,ij}>0} B_{T,ij} w_L - Σ Σ_{j∈N_i, B_{T,ij}<0} B_{T,ij} (1/c_i - w_{ui}) ].

Now consider the following polynomial,

f_1(x) = A_1 x² + B_1 x + C_1,                              (3.3.15)

where A_1, B_1, and C_1 are as above. From C3, it follows that A_1 < 0. Clearly B_1 ≤ 0. From (3.3.10), we have C_1 > 0. With these conditions on the coefficients of the quadratic in (3.3.15), it follows that f_1(x) = 0 has two roots, one positive and one negative.

Let R_1 be the positive root. Since ε in (3.3.7) is such that 0 < ε < R_1, one can conclude that the lower bound in (3.3.14) is positive and that (3.3.13) is established.

Case 2: Suppose that b_{T1} < 0.

It is enough to show that for all α ∈ S, the factor multiplying b_{T1} in (3.3.12) is negative; that is,

2 [ Σ_{i=1}^{n} Σ_{j∈Q_i} B_{T,ij} v_{ij}(α) - Σ_{i=1}^{n} Σ_{j∈N_i} B_{T,ij} (1/c_i - v_{ij}(α)) ] + b_{T1} [ Σ_{i=1}^{n} Σ_{j∈N_i} (1/c_i² - v_{ij}²(α)) - Σ_{i=1}^{n} Σ_{j∈Q_i} v_{ij}²(α) ] < 0.     (3.3.16)

Now using (3.3.9), (3.3.10), and (3.3.11), the left-hand side of (3.3.16) is bounded above by

A_2 ε² + B_2 ε + C_2,                                       (3.3.17)

where

A_2 = -b_{T1} Σ_{i=1}^{n} (n-1-c_i),

B_2 = 2 [ Σ Σ_{j∈Q_i, B_{T,ij}>0} B_{T,ij} - Σ Σ_{j∈N_i, B_{T,ij}<0} B_{T,ij} ],

and

C_2 = b_{T1} Σ_{i=1}^{n} c_i (1/c_i² - w_{ui}²) + 2 [ Σ Σ_{j∈Q_i, B_{T,ij}<0} B_{T,ij} w_L - Σ Σ_{j∈N_i, B_{T,ij}>0} B_{T,ij} (1/c_i - w_{ui}) ].

Now consider the following polynomial,

f_2(x) = A_2 x² + B_2 x + C_2.                              (3.3.18)

From C3, it follows that A_2 > 0. Clearly B_2 ≥ 0. From (3.3.10), one can conclude that C_2 < 0. With these conditions on the coefficients of the quadratic in (3.3.18), it follows that f_2(x) = 0 has two roots, one positive and one negative. Let R_2 be the positive root. Since ε in (3.3.7) is such that 0 < ε < R_2, one can conclude that the upper bound in (3.3.17) is negative and (3.3.16) is thus established. Since b_{T1} ≠ 0 by C1, all cases have been considered. Thus, there exists an α_1 belonging to S (and hence, finite) such that s_T(α_1) < M_{T1}.

To complete the proof, it is necessary to show that there exists a finite α_2 such that s_T(α_2) < M_{T2}. However, a check of the form of M_{T2} in (3.3.6) reveals that the details would be analogous to those of the part just completed. Finally, one can conclude that α_3 = α_1 if M_{T1} ≤ M_{T2} and α_3 = α_2 otherwise. Thus, there exists a finite α_3 such that s_T(α_3) < M_{T1} and s_T(α_3) < M_{T2}, and the proof is complete.

Before going on, some discussion of the conditions of this theorem is in order. The first condition is just A11. The second condition is an assumption which seems to be reasonable.

A12: The usual YW estimator, $B_T$, is not such that

$$B_{T,ij} = \begin{cases} \dfrac{b_{T1}}{c_i} & \text{for all } i \text{ and } j \in N_i \\[4pt] 0 & \text{for all } i \text{ and } j \in Q_i, \end{cases}$$

nor of the analogous limiting form with $N_i$ and $Q_i$ replaced by $F_i$ and $P_i$.

It appears quite unlikely that one would observe a $B_T$ of either heavily restricted form excluded by A12. It should be noted that the second condition alone does not imply that $\alpha_{T1}$ is finite, only that $M_{T1}$ and $M_{T2}$ are both positive. Finally, C3 is met by assuming A4. That is, the array has either more than 3 locations or 3 locations not forming an equilateral triangle.

Only the existence of $\alpha_{T1}$ has been established. Although we were unable to show its uniqueness, most of our empirical investigations would support such a conjecture.
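
To make the estimation scheme concrete, the following is a minimal numerical sketch of the one-dimensional minimization that defines $\alpha_{T1}$. It assumes the exponential-decay form of $v_{ij}(\alpha)$ from (2.3.2), uses hypothetical distances and a hypothetical usual YW estimate $B_T$, and searches a finite interval $S$ on a grid; the grid, interval, and all numerical values are illustrative only.

```python
import numpy as np

def v(alpha, D):
    """Row-normalized exponential-decay weights v_ij(alpha) = e^{-alpha d_ij} / sum_k e^{-alpha d_ik},
    computed for all i and j != i from a matrix D of pairwise distances (form assumed from (2.3.2))."""
    W = np.exp(-alpha * D)
    np.fill_diagonal(W, 0.0)          # only j != i enters the weight structure
    return W / W.sum(axis=1, keepdims=True)

def s_T(alpha, B_T, b_T1, D):
    """The criterion s_T(alpha) = sum_i sum_{j != i} [B_{T,ij} - b_{T1} v_ij(alpha)]^2."""
    R = B_T - b_T1 * v(alpha, D)
    np.fill_diagonal(R, 0.0)          # exclude the diagonal terms
    return np.sum(R ** 2)

# Hypothetical inputs: 4 locations, pairwise distances, a noisy YW estimate B_T,
# and b_T1 taken as the average of the off-diagonal elements of B_T.
rng = np.random.default_rng(0)
pts = rng.uniform(0, 1, size=(4, 2))
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
B_T = 0.3 * v(1.0, D) + 0.02 * rng.standard_normal((4, 4))
b_T1 = B_T[~np.eye(4, dtype=bool)].sum() / 4

grid = np.linspace(-10, 10, 2001)     # a finite interval standing in for S
alpha_T1 = grid[np.argmin([s_T(a, B_T, b_T1, D) for a in grid])]
print(alpha_T1)
```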


3.3.2 Consistency (in Probability)

Since $a_{T1}$ and $b_{T1}$ in the variable weights case are the same as their counterparts in the known weights case, we already have from 3.2.2 that

$$a_{T1} \xrightarrow{P} a_o \quad \text{as } T \to \infty$$

and

$$b_{T1} \xrightarrow{P} b_o \quad \text{as } T \to \infty.$$

As would be expected, to arrive at the consistency of $\alpha_{T1}$ requires more effort. A general theorem will be presented and, after its proof, it will be applied to our particular problem.

Let $\theta_T$ be an estimator of $\theta_o$ based on $T$ observations, $\phi_o$ a parameter of interest, and $f(\cdot,\cdot)$ a finite-valued function of $\theta$ and $\phi$. We define

$$h_T(\phi) = f(\theta_T, \phi)$$

and

$$h_o(\phi) = f(\theta_o, \phi).$$










Theorem 3.5:

Suppose the following conditions are met.

C1: There exists a finite $\phi_T$ such that $h_T(\phi_T) = \min_\phi h_T(\phi)$, where $\phi_T$ is taken to be an estimator of $\phi_o$.

C2: The function $h_o(\cdot)$ is continuous everywhere.

C3: The function $h_o(\cdot)$ has a unique minimum at $\phi_o$.

C4: (i) The limit of $h_o(\phi)$ as $\phi \to +\infty$, $M_1$, is finite and greater than $h_o(\phi_o)$.
    (ii) The limit of $h_o(\phi)$ as $\phi \to -\infty$, $M_2$, is finite and greater than $h_o(\phi_o)$.

C5: We have that $\sup_\phi |h_T(\phi) - h_o(\phi)| \xrightarrow{P} 0$ as $T \to \infty$.

Then $\phi_T \xrightarrow{P} \phi_o$ as $T \to \infty$.

Proof:

This proof is patterned after one by Parzen (1962) and consists of establishing the following two results.

R1: As $T \to \infty$, $h_o(\phi_T) \xrightarrow{P} h_o(\phi_o)$.

R2: For every $\varepsilon > 0$, there exists $\eta > 0$ such that $|\phi_o - \phi| \ge \varepsilon$ implies that $|h_o(\phi_o) - h_o(\phi)| \ge \eta$.

Applying R1 to R2, with $\phi_T$ in place of $\phi$, yields the desired result.

Proof of R1:

If it can be shown that

$$|h_T(\phi_T) - h_o(\phi_o)| \le \sup_\phi |h_T(\phi) - h_o(\phi)|,$$

it follows that

$$|h_o(\phi_T) - h_o(\phi_o)| \le 2 \sup_\phi |h_T(\phi) - h_o(\phi)| \qquad (3.3.19)$$

since

$$|h_o(\phi_T) - h_o(\phi_o)| \le |h_o(\phi_T) - h_T(\phi_T)| + |h_T(\phi_T) - h_o(\phi_o)|.$$









Case 1: Suppose that $h_T(\phi_T) - h_o(\phi_o) \ge 0$.

Then it follows from C1 that

$$0 \le h_T(\phi_T) - h_o(\phi_o) = \inf_\phi h_T(\phi) - h_o(\phi_o) \le h_T(\phi_o) - h_o(\phi_o) = |h_T(\phi_o) - h_o(\phi_o)| \le \sup_\phi |h_T(\phi) - h_o(\phi)|.$$

Therefore,

$$|h_T(\phi_T) - h_o(\phi_o)| \le \sup_\phi |h_T(\phi) - h_o(\phi)|.$$

Case 2: Suppose that $h_T(\phi_T) - h_o(\phi_o) < 0$.

From C3, it follows that

$$0 < h_o(\phi_o) - h_T(\phi_T) = \inf_\phi h_o(\phi) - h_T(\phi_T) \le h_o(\phi_T) - h_T(\phi_T) = |h_o(\phi_T) - h_T(\phi_T)| \le \sup_\phi |h_T(\phi) - h_o(\phi)|.$$

Thus,

$$|h_T(\phi_T) - h_o(\phi_o)| \le \sup_\phi |h_T(\phi) - h_o(\phi)|.$$

Therefore, from our comments prior to (3.3.19), we can conclude that

$$|h_o(\phi_T) - h_o(\phi_o)| \le 2 \sup_\phi |h_T(\phi) - h_o(\phi)|.$$

Applying C5 to this result leads to the conclusion,

$$h_o(\phi_T) \xrightarrow{P} h_o(\phi_o) \quad \text{as } T \to \infty. \qquad (3.3.20)$$










Proof of R2:

The proof will be by contradiction. Suppose there exists $\varepsilon > 0$ such that for every $\eta > 0$, there exists $\phi$ such that $|\phi_o - \phi| \ge \varepsilon$ and $|h_o(\phi_o) - h_o(\phi)| < \eta$. Now choose a sequence of $\eta$'s of the form $\frac{1}{m}$, and let $\phi_m$ be the corresponding $\phi$-values. We then have a sequence $\{\phi_m\}_{m=1}^{\infty}$ for which $|\phi_o - \phi_m| \ge \varepsilon$ and $|h_o(\phi_o) - h_o(\phi_m)| < \frac{1}{m}$ for some $\varepsilon > 0$. Since, from C4, the limit of $h_o(\phi)$ as $\phi \to +\infty$, $M_1$, is finite, one can pick a finite $\phi$ large enough to get arbitrarily close to $M_1$. A similar statement can be made about small $\phi$ and $M_2$. Now since $M_1$ and $M_2$ are both greater than $h_o(\phi_o)$, and $h_o(\phi_m) \to h_o(\phi_o)$ as $m \to \infty$, there exist $M$, $\phi_1$, and $\phi_2$ such that for all $m \ge M$, $\phi_m \in S = [\phi_1, \phi_o - \varepsilon] \cup [\phi_o + \varepsilon, \phi_2]$. Since $S$ is compact and $h_o(\phi)$ is a continuous function of $\phi$, from C2, it follows that there exists $\phi_3 \in S$ such that $h_o(\phi_3) = h_o(\phi_o)$. This contradicts C3, and so R2 is established and the proof is complete.

In order to show the consistency of $\alpha_{T1}$, the conditions of Theorem 3.5 will now be verified for our particular case. We have:

$$\theta_T = (B_{T,12},\ldots,B_{T,1n},B_{T,21},B_{T,23},\ldots,B_{T,2n},\ldots,B_{T,n1},\ldots,B_{T,n\,n-1},b_{T1})', \qquad (3.3.21)$$

$$\theta_o = (B_{ro,12},\ldots,B_{ro,1n},B_{ro,21},B_{ro,23},\ldots,B_{ro,2n},\ldots,B_{ro,n1},\ldots,B_{ro,n\,n-1},b_o)' \qquad (3.3.22)$$

(where $B_{ro,ii} = a_o$ for all $i$ and $B_{ro,ij} = b_o v_{ij}(\alpha_o)$ for all $i$ and $j \ne i$, with $v_{ij}(\alpha)$ given in (2.3.2)),

$$\phi_o = \alpha_o,$$

and

$$f(\theta,\phi) = \sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n}[B_{ij} - b\,v_{ij}(\phi)]^2.$$










This implies that

$$h_T(\phi) = \sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n}[B_{T,ij} - b_{T1} v_{ij}(\alpha)]^2 = s_T(\alpha), \qquad (3.3.23)$$

which agrees with the notation in (3.3.4), and

$$h_o(\phi) = \sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n}[B_{ro,ij} - b_o v_{ij}(\alpha)]^2 = s_o(\alpha). \qquad (3.3.24)$$

Clearly $f$ is finite-valued, and we will now check the other conditions.

C1: In our case, $\phi_T = \alpha_{T1}$. This condition was established in Theorem 3.4.

C2: The function $s_o(\cdot)$ is clearly a continuous function of $\alpha$ since $v_{ij}(\alpha)$ is continuous for all $i$ and $j \ne i$.

C3: From (3.3.24),

$$s_o(\alpha) = \sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n}[B_{ro,ij} - b_o v_{ij}(\alpha)]^2 = b_o^2 \sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n}[v_{ij}(\alpha_o) - v_{ij}(\alpha)]^2. \qquad (3.3.25)$$

Therefore, $s_o(\alpha_o) = 0$, and so $\alpha_o$ is a minimum of $s_o(\cdot)$.

Now suppose $s_o(\alpha_1) = 0$. From (3.3.25), this implies that $b_o^2[v_{ij}(\alpha_o) - v_{ij}(\alpha_1)]^2 = 0$ for all $i$ and $j \ne i$. Since consideration of $\alpha$ is meaningful only if $b \ne 0$, it seems reasonable to consider the consistency of $\alpha_{T1}$ only if the following assumption is made.

A13: The true value of $b$, $b_o$, is nonzero.

With A13, it follows that

$$v_{ij}(\alpha_o) = v_{ij}(\alpha_1) \quad \text{for all } i \text{ and } j \ne i. \qquad (3.3.26)$$











Now pick a location $i$ for which there exist neighbors $j$ and $k$ such that $d_{ij} \ne d_{ik}$. Such a situation exists from A4. Since $v_{ij}(\alpha)$ is of the form given in (2.3.2), we have from (3.3.26),

$$\frac{v_{ij}(\alpha_o)}{v_{ik}(\alpha_o)} = \frac{v_{ij}(\alpha_1)}{v_{ik}(\alpha_1)},$$

which implies that

$$e^{\alpha_o(d_{ik} - d_{ij})} = e^{\alpha_1(d_{ik} - d_{ij})},$$

which implies that

$$\alpha_1 = \alpha_o.$$

Thus, $\alpha_o$ is a unique minimum of $s_o(\cdot)$.


C4: From results in 2.3.2, we have

$$\lim_{\alpha \to +\infty} s_o(\alpha) = M_1 = b_o^2\Big(\sum_{i=1}^{n}\sum_{j \in N_i}\Big[v_{ij}(\alpha_o) - \frac{1}{c_i}\Big]^2 + \sum_{i=1}^{n}\sum_{j \in Q_i} v_{ij}^2(\alpha_o)\Big),$$

where $c_i$, $N_i$, and $Q_i$ are defined in 3.3.1. This limit clearly exists and is finite. Since $b_o \ne 0$ by A13, the only way for $M_1$ to equal 0 is for $\alpha_o$ to equal $+\infty$. However, if one really felt that $\alpha_o$ equalled $+\infty$, one could use the identical closest-neighbor model in the known weights case (see 2.2.2). The considerations are similar in the case of $M_2$, and so it would seem reasonable to assume that $\alpha_o$ is finite when the variable weights model is employed.

A14: The true value of $\alpha$, $\alpha_o$, is finite.

With this assumption, C4 is established.

C5: Using (3.3.23) and (3.3.24), it follows from A1 that

$$|s_T(\alpha) - s_o(\alpha)| = \Big|\sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n}[B_{T,ij} - b_{T1} v_{ij}(\alpha)]^2 - \sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n}[B_{ro,ij} - b_o v_{ij}(\alpha)]^2\Big|$$

$$= \Big|\sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n}(B_{T,ij}^2 - B_{ro,ij}^2) + 2\sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n} v_{ij}(\alpha)(b_o B_{ro,ij} - b_{T1} B_{T,ij}) + \sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n} v_{ij}^2(\alpha)(b_{T1}^2 - b_o^2)\Big|$$

$$\le \sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n}|B_{T,ij}^2 - B_{ro,ij}^2| + 2\sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n}|b_o B_{ro,ij} - b_{T1} B_{T,ij}| + \sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n}|b_{T1}^2 - b_o^2|.$$

Thus,

$$\sup_\alpha |s_T(\alpha) - s_o(\alpha)| \le \sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n}\big[|B_{T,ij}^2 - B_{ro,ij}^2| + 2|b_o B_{ro,ij} - b_{T1} B_{T,ij}| + |b_{T1}^2 - b_o^2|\big]. \qquad (3.3.27)$$

From Lemma 3.1, $B_{T,ij} \xrightarrow{P} B_{o,ij} = B_{ro,ij}$ as $T \to \infty$ for all $i$ and $j$, and from 3.3.2, $b_{T1} \xrightarrow{P} b_o$ as $T \to \infty$. Therefore,

$$B_{T,ij}^2 \xrightarrow{P} B_{ro,ij}^2 \quad \text{as } T \to \infty \text{ for all } i \text{ and } j,$$

$$b_{T1}^2 \xrightarrow{P} b_o^2 \quad \text{as } T \to \infty,$$

$$b_{T1} B_{T,ij} \xrightarrow{P} b_o B_{ro,ij} \quad \text{as } T \to \infty \text{ for all } i \text{ and } j.$$

Applying these results to (3.3.27), we see that the upper bound converges to zero in probability as $T \to \infty$, and hence,

$$\sup_\alpha |s_T(\alpha) - s_o(\alpha)| \xrightarrow{P} 0 \quad \text{as } T \to \infty.$$

All conditions of Theorem 3.5 have been satisfied, so the consistency of $\alpha_{T1}$ has been established.











3.3.3 The Asymptotic Joint Distribution of $(a_{T1}, b_{T1}, \alpha_{T1})$

The format of this section will be similar to that by which $\alpha_{T1}$ was shown to be consistent in the previous section. Two lemmas will be presented first. These will be followed by a general theorem and an application of it to our problem.

The first lemma is just a specific statement of the multivariate Taylor's formula. A more general statement of the formula and its proof can be found in Fleming (1965:44-49).

Lemma 3.6:

Let $h(\cdot)$ be a function of $\phi = (\phi_1,\ldots,\phi_q)'$. If $\frac{\partial h(\phi)}{\partial \phi_j}$ exists and is continuous everywhere for $j = 1,2,\ldots,q$, and both $\phi_T$ and $\phi_o$ belong to $\mathbb{R}^q$, then there exists $x_T \in \mathbb{R}^q$ for which

$$h(\phi_T) = h(\phi_o) + \sum_{j=1}^{q}\frac{\partial h(\phi)}{\partial \phi_j}\Big|_{\phi = x_T}(\phi_{T,j} - \phi_{o,j}),$$

where $x_T = \phi_o + s\,h$, $0 < s < 1$, and $h = \phi_T - \phi_o$.

The second lemma is a result on asymptotic distributions. This lemma can be found in Fuller (1976:199).

Lemma 3.7:

Let $z_T \xrightarrow{D} z$ as $T \to \infty$ and $A_T \xrightarrow{P} A$ as $T \to \infty$, where $z$ is a random $k$-vector and $A$ is a nonsingular $k \times k$ matrix of constants. Then

$$A_T^{-1} z_T \xrightarrow{D} A^{-1} z \quad \text{as } T \to \infty.$$

Before stating and proving the general theorem, we make the following definitions. Let $\theta_T = (\theta_{T,1},\ldots,\theta_{T,k})'$ be an estimator of $\theta_o = (\theta_{o,1},\ldots,\theta_{o,k})'$ based on $T$ observations, and let $\phi_o = (\phi_{o,1},\phi_{o,2},\ldots,\phi_{o,q})'$ be a parameter of interest. For $i = 1,2,\ldots,q$, let $f_i(\cdot,\cdot)$ be a function of $\theta$ and $\phi$ and let

$$h_{o,i}(\phi) = f_i(\theta_o,\phi),$$

$$h_{T,i}(\phi) = f_i(\theta_T,\phi),$$

and

$$r_{o,i}(\theta) = f_i(\theta,\phi_o).$$

Let $f(\theta,\phi)$, $h_o(\phi)$, $h_T(\phi)$, and $r_o(\theta)$ be the corresponding vectors of functions.

The purpose of the following theorem is to allow one to find the asymptotic distribution of estimators when some of them are implicitly defined.


Theorem 3.8:

Suppose the following conditions are met.

C1: The statistic $\theta_T$ is such that

$$\sqrt{T}(\theta_T - \theta_o) \xrightarrow{D} N_k(0,\Sigma) \quad \text{as } T \to \infty.$$

C2: An estimator of $\phi_o$ is $\phi_T$, for which $h_T(\phi_T) = h_o(\phi_o)$.

C3: The statistic $\phi_{T,i}$ is a consistent (in probability) estimator of $\phi_{o,i}$ for all $i = 1,2,\ldots,q$.

C4: All partial derivatives of $h_{T,i}(\cdot)$ exist and are continuous everywhere for all $i = 1,2,\ldots,q$. We let

$$h_{T,ij}(\phi) = \frac{\partial h_{T,i}(\phi)}{\partial \phi_j}.$$

C5: All partial derivatives of $h_{o,i}(\cdot)$ exist and are continuous everywhere for all $i = 1,2,\ldots,q$. We let

$$h_{o,ij}(\phi) = \frac{\partial h_{o,i}(\phi)}{\partial \phi_j} \quad \text{and} \quad M_o = \{h_{o,ij}(\phi_o)\}.$$

C6: The matrix $M_o$ is nonsingular.

C7: We have that $\sup_\phi |h_{T,ij}(\phi) - h_{o,ij}(\phi)| \xrightarrow{P} 0$ as $T \to \infty$ for all $i$ and $j$.

C8: All partial derivatives of $r_{o,i}(\cdot)$ exist and are continuous everywhere for all $i = 1,2,\ldots,q$. We let

$$R_o = \Big\{\frac{\partial r_{o,i}(\theta)}{\partial \theta_j}\Big|_{\theta = \theta_o}\Big\}.$$

Then

$$\sqrt{T}(\phi_T - \phi_o) \xrightarrow{D} N_q\big(0,(M_o^{-1}R_o)\,\Sigma\,(M_o^{-1}R_o)'\big) \quad \text{as } T \to \infty.$$

Proof:

Several intermediate results will be necessary to reach the desired conclusion.

R1: For every $i = 1,2,\ldots,q$, there exists a random vector $x_{Ti}$ for which

$$h_{T,i}(\phi_T) = h_{T,i}(\phi_o) + \sum_{j=1}^{q} h_{T,ij}(x_{Ti})(\phi_{T,j} - \phi_{o,j}),$$

where $x_{Ti} = \phi_o + s_i h$, $0 < s_i < 1$, and $h = \phi_T - \phi_o$.

Proof of R1:

For a fixed $T$, use C4 and apply Lemma 3.6 to $h_{T,i}(\cdot)$ for $i = 1,2,\ldots,q$. The randomness is due to the fact that $h_{T,i}(\cdot)$ is a random function. Using R1, we have the following system of equations:

$$h_T(\phi_T) = h_T(\phi_o) + M_T(\phi_T - \phi_o), \qquad (3.3.28)$$

where $M_T = \{h_{T,ij}(x_{Ti})\}$.

R2: For all $i = 1,2,\ldots,q$, $x_{Ti} \xrightarrow{P} \phi_o$ as $T \to \infty$ in the sense that for all $\varepsilon > 0$,

$$P(|x_{Ti,1} - \phi_{o,1}| < \varepsilon,\ldots,|x_{Ti,q} - \phi_{o,q}| < \varepsilon) \to 1 \quad \text{as } T \to \infty.$$

Proof of R2:

From R1, it follows that for all $i = 1,2,\ldots,q$, $x_{Ti} - \phi_o = s_i h = s_i(\phi_T - \phi_o)$, where $0 < s_i < 1$. Thus, if

$$E_T = \{|\phi_{T,1} - \phi_{o,1}| < \varepsilon,\ldots,|\phi_{T,q} - \phi_{o,q}| < \varepsilon\}$$

occurs, then

$$F_T = \{|x_{Ti,1} - \phi_{o,1}| < \varepsilon,\ldots,|x_{Ti,q} - \phi_{o,q}| < \varepsilon\}$$

also occurs. This implies that $P(E_T) \le P(F_T)$. From C3, we have that $\phi_{T,j} \xrightarrow{P} \phi_{o,j}$ as $T \to \infty$ for all $j = 1,2,\ldots,q$. This implies that $P(E_T) \to 1$ as $T \to \infty$, which, in turn, implies that $P(F_T) \to 1$ as $T \to \infty$, which establishes the result.

R3: For all $i$ and $j$, $M_{T,ij} \xrightarrow{P} M_{o,ij}$ as $T \to \infty$.

Proof of R3:

For all $i$ and $j$,

$$|M_{T,ij} - M_{o,ij}| = |h_{T,ij}(x_{Ti}) - h_{o,ij}(\phi_o)| \le |h_{T,ij}(x_{Ti}) - h_{o,ij}(x_{Ti})| + |h_{o,ij}(x_{Ti}) - h_{o,ij}(\phi_o)| \le \sup_\phi |h_{T,ij}(\phi) - h_{o,ij}(\phi)| + |h_{o,ij}(x_{Ti}) - h_{o,ij}(\phi_o)|.$$

Now R2 and C5 imply that $h_{o,ij}(x_{Ti}) \xrightarrow{P} h_{o,ij}(\phi_o)$ as $T \to \infty$. Combining this result with C7 establishes the result.









R4: As $T \to \infty$,

$$\sqrt{T}\,[r_o(\theta_T) - r_o(\theta_o)] \xrightarrow{D} N_q(0,R_o \Sigma R_o').$$

Proof of R4:

From C8, we have that $r_{o,i}(\cdot)$ is totally differentiable for all $i = 1,2,\ldots,q$. The result follows by an application of Lemma 3.3 to C1.

R5: As $T \to \infty$,

$$\sqrt{T}\,M_T(\phi_T - \phi_o) \xrightarrow{D} N_q(0,R_o \Sigma R_o').$$

Proof of R5:

From (3.3.28) and C2, it follows that

$$\sqrt{T}\,M_T(\phi_T - \phi_o) = \sqrt{T}\,[h_T(\phi_T) - h_T(\phi_o)] = -\sqrt{T}\,[h_T(\phi_o) - h_o(\phi_o)].$$

A clarification of notation yields the result. From our definition of $h_T(\cdot)$, $r_o(\cdot)$, and $h_o(\cdot)$, we have

$$h_T(\phi_o) = f(\theta_T,\phi_o) = r_o(\theta_T)$$

and

$$h_o(\phi_o) = f(\theta_o,\phi_o) = r_o(\theta_o).$$

R6: As $T \to \infty$,

$$M_T^{-1}\,\sqrt{T}\,M_T(\phi_T - \phi_o) \xrightarrow{D} N_q\big(0,(M_o^{-1}R_o)\,\Sigma\,(M_o^{-1}R_o)'\big).$$

Proof of R6:

Using standard results for normal distributions, the result follows when C6, R3, and R5 are applied to Lemma 3.7.









It should be noted that for any finite $T$, $M_T^{-1}$ may not exist. However, the following result indicates that this fact has no effect on the asymptotic distribution of $\sqrt{T}(\phi_T - \phi_o)$.

R7: Let $A_T = |M_T|$ and $C_T = \{\omega : A_T(\omega) \ne 0\}$. Then $P(C_T) \to 1$ as $T \to \infty$.

Proof of R7:

It follows from R3 and standard results that $A_T \xrightarrow{P} |M_o|$ as $T \to \infty$. Since $|M_o| \ne 0$ (from C6), we have that $P(C_T) \to 1$ as $T \to \infty$.

The implication of R7 is that the behavior of $\sqrt{T}(\phi_T - \phi_o)$ need only be considered on $C_T$ for purposes of determining the asymptotic distribution. Thus, we can conclude from R6 that

$$\sqrt{T}(\phi_T - \phi_o) \xrightarrow{D} N_q\big(0,(M_o^{-1}R_o)\,\Sigma\,(M_o^{-1}R_o)'\big) \quad \text{as } T \to \infty.$$

With the completion of the proof of this general result, it is now necessary to verify that the conditions hold for our specific case. We have $\theta_T = \beta_T$ and $\theta_o = \beta_o$, where $\beta_T$ and $\beta_o$ are defined in (3.1.1) and (3.1.2), respectively; $\phi_T = (a_{T1},b_{T1},\alpha_{T1})'$; $\phi_o = (a_o,b_o,\alpha_o)'$; $\Sigma = G \otimes \Gamma^{-1}(0)$; $k = n^2$; and $q = 3$. Let

$$f_1(\theta,\phi) = \frac{\sum_{i=1}^{n} B_{ii}}{n} - a,$$

$$f_2(\theta,\phi) = \frac{\sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n} B_{ij}}{n} - b,$$

and

$$f_3(\theta,\phi) = \sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n}[B_{ij} - b\,v_{ij}(\alpha)]\,v_{ij}(\alpha)\Big[d_{ij} - \sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha)\Big],$$

where $v_{ij}(\alpha)$ is given in (2.3.2). Recall from A5 that $B_o = B_{ro}$. It follows then that


$$h_{o,1}(\phi) = \frac{\sum_{i=1}^{n} B_{ro,ii}}{n} - a = \frac{\sum_{i=1}^{n} a_o}{n} - a = a_o - a,$$

$$h_{o,2}(\phi) = \frac{\sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n} B_{ro,ij}}{n} - b = \frac{\sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n} b_o v_{ij}(\alpha_o)}{n} - b = \frac{\sum_{i=1}^{n} b_o}{n} - b = b_o - b,$$

and

$$h_{o,3}(\phi) = \sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n}[b_o v_{ij}(\alpha_o) - b\,v_{ij}(\alpha)]\,v_{ij}(\alpha)\Big[d_{ij} - \sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha)\Big].$$

Recalling the form of $a_{T1}$, $b_{T1}$, and $\alpha_{T1}$ from 2.3.4, it follows that

$$h_{T,1}(\phi) = \frac{\sum_{i=1}^{n} B_{T,ii}}{n} - a = a_{T1} - a,$$

$$h_{T,2}(\phi) = \frac{\sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n} B_{T,ij}}{n} - b = b_{T1} - b,$$

and

$$h_{T,3}(\phi) = \sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n}[B_{T,ij} - b\,v_{ij}(\alpha)]\,v_{ij}(\alpha)\Big[d_{ij} - \sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha)\Big].$$











We also have:

$$r_{o,1}(\theta) = \frac{\sum_{i=1}^{n} B_{ii}}{n} - a_o,$$

$$r_{o,2}(\theta) = \frac{\sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n} B_{ij}}{n} - b_o,$$

and

$$r_{o,3}(\theta) = \sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n}[B_{ij} - b_o v_{ij}(\alpha_o)]\,v_{ij}(\alpha_o)\Big[d_{ij} - \sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha_o)\Big].$$


We now check the conditions.

C1: Lemma 3.2 meets this condition.

C2: Evaluating $h_o(\phi_o)$, we have:

$$h_{o,1}(\phi_o) = a_o - a_o = 0,$$

$$h_{o,2}(\phi_o) = b_o - b_o = 0,$$

and

$$h_{o,3}(\phi_o) = \sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n}[b_o v_{ij}(\alpha_o) - b_o v_{ij}(\alpha_o)]\,v_{ij}(\alpha_o)\Big[d_{ij} - \sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha_o)\Big] = 0.$$

Evaluating $h_T(\phi_T)$, we have:

$$h_{T,1}(\phi_T) = a_{T1} - a_{T1} = 0,$$

$$h_{T,2}(\phi_T) = b_{T1} - b_{T1} = 0,$$

and

$$h_{T,3}(\phi_T) = \sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n}[B_{T,ij} - b_{T1} v_{ij}(\alpha_{T1})]\,v_{ij}(\alpha_{T1})\Big[d_{ij} - \sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha_{T1})\Big] = 0,$$

by the definition of $\alpha_{T1}$ in (2.3.12). Since $h_T(\phi_T) = h_o(\phi_o)$, the condition is established.

C3: The results from 3.3.2 satisfy this condition.

C4: Evaluation of the partial first derivatives yields:

$$h_{T,11}(\phi) = h_{T,22}(\phi) = -1,$$

$$h_{T,12}(\phi) = h_{T,13}(\phi) = h_{T,21}(\phi) = h_{T,23}(\phi) = h_{T,31}(\phi) = 0,$$

$$h_{T,32}(\phi) = -\sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n} v_{ij}^2(\alpha)\Big[d_{ij} - \sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha)\Big],$$

and

$$h_{T,33}(\phi) = \sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n}\Big\{v_{ij}(\alpha)\Big[d_{ij} - \sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha)\Big]^2[2b\,v_{ij}(\alpha) - B_{T,ij}] + [B_{T,ij} - b\,v_{ij}(\alpha)]\,v_{ij}(\alpha)\sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha)\Big[d_{ik} - \sum_{l=1}^{n} d_{il}\,v_{il}(\alpha)\Big]\Big\}.$$

Since $v_{ij}(\alpha)$ is a continuous function of $\alpha$ for all $i$ and $j$, it follows that $h_{T,ij}(\cdot)$ exists and is continuous everywhere for all $i$ and $j$.

C5: The forms of $h_o(\phi)$ and $h_T(\phi)$ imply that $h_{o,ij}(\phi) = h_{T,ij}(\phi)$ for all $i$ and $j$ except for $i = j = 3$, where we have

$$h_{o,33}(\phi) = \sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n}\Big\{v_{ij}(\alpha)\Big[d_{ij} - \sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha)\Big]^2[2b\,v_{ij}(\alpha) - b_o v_{ij}(\alpha_o)] + [b_o v_{ij}(\alpha_o) - b\,v_{ij}(\alpha)]\,v_{ij}(\alpha)\sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha)\Big[d_{ik} - \sum_{l=1}^{n} d_{il}\,v_{il}(\alpha)\Big]\Big\}.$$

For the same reason as in C4, we can conclude that $h_{o,ij}(\cdot)$ exists and is continuous everywhere for all $i$ and $j$.










C6: By definition,

$$M_o = \{h_{o,ij}(\phi_o)\} = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & h_{o,32}(\phi_o) & h_{o,33}(\phi_o) \end{pmatrix}, \qquad (3.3.29)$$

where

$$h_{o,32}(\phi_o) = -\sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n} v_{ij}^2(\alpha_o)\Big[d_{ij} - \sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha_o)\Big]$$

and

$$h_{o,33}(\phi_o) = b_o \sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n} v_{ij}^2(\alpha_o)\Big[d_{ij} - \sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha_o)\Big]^2.$$

Therefore $|M_o| = h_{o,33}(\phi_o)$. Since $b_o \ne 0$ (A13), it is enough to show there exist $i$ and $j$ such that $d_{ij} - \sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha_o) \ne 0$. From A4, it follows that there exists a location $i$ for which there exist neighbors $j$ and $k$ such that $d_{ij} \ne d_{ik}$. Choose location $j$ to be a closest neighbor of location $i$. Now, from A3, it follows that

$$d_{ij} - \sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha_o) = \sum_{k=1}^{n} d_{ij}\,v_{ik}(\alpha_o) - \sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha_o) = \sum_{k=1}^{n}(d_{ij} - d_{ik})\,v_{ik}(\alpha_o).$$

Now $d_{ij} \le d_{ik}$ for all $k \ne i$, and $d_{ij} < d_{ik}$ for some $k$. Therefore, since $v_{ik}(\alpha_o) > 0$ for all $i$ and $k \ne i$ (from A14 and (2.3.2)), it follows that $\sum_{k=1}^{n}(d_{ij} - d_{ik})\,v_{ik}(\alpha_o) < 0$, which implies that $d_{ij} - \sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha_o) < 0$. Thus, the condition is satisfied.









C7: Since the only $(i,j)$ combination for which $h_{T,ij}(\phi) \ne h_{o,ij}(\phi)$ is $(3,3)$, it is enough to show that

$$\sup_\phi |h_{T,33}(\phi) - h_{o,33}(\phi)| \xrightarrow{P} 0 \quad \text{as } T \to \infty.$$

First note that there exist $c$ and $d$, both finite, for which

$$\Big|\sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha)\Big| \le c \quad \text{and} \quad \Big|d_{ij} - \sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha)\Big| \le d,$$

since, from A1, it follows that

$$\Big|\sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha)\Big| = \sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha) \le \sum_{k=1}^{n} d_{ik} \le c$$

and

$$\Big|d_{ij} - \sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha)\Big| \le d_{ij} + c \le d.$$

Now, from A1 and the above results, it follows that

$$|h_{T,33}(\phi) - h_{o,33}(\phi)| = \Big|\sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n}\Big\{v_{ij}(\alpha)\Big[d_{ij} - \sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha)\Big]^2 - v_{ij}(\alpha)\sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha)\Big[d_{ik} - \sum_{l=1}^{n} d_{il}\,v_{il}(\alpha)\Big]\Big\}\,[b_o v_{ij}(\alpha_o) - B_{T,ij}]\Big| \le \sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n}|b_o v_{ij}(\alpha_o) - B_{T,ij}|\,(d^2 + cd).$$

Therefore,

$$\sup_\phi |h_{T,33}(\phi) - h_{o,33}(\phi)| \le \sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n}|b_o v_{ij}(\alpha_o) - B_{T,ij}|\,(d^2 + cd).$$

Applying Lemma 3.1 and A5 ($B_o = B_{ro}$) leads to the conclusion that

$$\sup_\phi |h_{T,33}(\phi) - h_{o,33}(\phi)| \xrightarrow{P} 0 \quad \text{as } T \to \infty.$$


C8: Evaluation of the partial first derivatives yields:

$$\frac{\partial r_{o,1}(\theta)}{\partial B_{ij}} = \begin{cases} \dfrac{1}{n} & \text{if } j = i \\[4pt] 0 & \text{otherwise,} \end{cases} \qquad \frac{\partial r_{o,2}(\theta)}{\partial B_{ij}} = \begin{cases} \dfrac{1}{n} & \text{if } j \ne i \\[4pt] 0 & \text{otherwise,} \end{cases}$$

and

$$\frac{\partial r_{o,3}(\theta)}{\partial B_{ij}} = \begin{cases} v_{ij}(\alpha_o)\Big[d_{ij} - \displaystyle\sum_{k=1}^{n} d_{ik}\,v_{ik}(\alpha_o)\Big] = r_{ij} & \text{if } j \ne i \\[4pt] 0 & \text{otherwise.} \end{cases}$$

Since all of these derivatives are constants, it follows that C8 is satisfied. Then, with the elements of $\theta$ ordered as $(B_{11},B_{12},\ldots,B_{1n},B_{21},\ldots,B_{nn})'$,

$$R_o = \Big\{\frac{\partial r_{o,i}(\theta)}{\partial \theta_j}\Big\} \qquad (3.3.30)$$

is the $3 \times n^2$ matrix whose first row contains $\frac{1}{n}$ in the positions of the diagonal elements $B_{ii}$ and zeroes elsewhere, whose second row contains $\frac{1}{n}$ in the positions of the off-diagonal elements $B_{ij}$, $j \ne i$, and zeroes elsewhere, and whose third row contains $r_{ij}$ in the position of $B_{ij}$ for $j \ne i$ and zeroes in the diagonal positions.


if j i










All of the conditions for Theorem 3.8 have been met without the need for any additional assumptions. We finally have that

$$\sqrt{T}\,[(a_{T1},b_{T1},\alpha_{T1}) - (a_o,b_o,\alpha_o)] \xrightarrow{D} N_3(0,\Sigma_1) \quad \text{as } T \to \infty,$$

where

$$\Sigma_1 = (M_o^{-1}R_o)\,\Sigma\,(M_o^{-1}R_o)',$$

$M_o$ is given in (3.3.29), $R_o$ is given in (3.3.30), and

$$\Sigma = G \otimes \Gamma^{-1}(0). \qquad (3.3.31)$$

The asymptotic univariate distributions of $a_{T1}$, $b_{T1}$, and $\alpha_{T1}$ follow directly from the joint result. One might fear that since the assumption, A13: $b_o \ne 0$, was made to satisfy C3 and C6, there may be some problem in a derivation of the asymptotic univariate distributions from (3.3.31). However, recall that A13 was made only in order to arrive at the consistency of $\alpha_{T1}$. For this reason, and also since $b_{T1}$ is explicitly defined, A13 enters into the univariate considerations only in the case of $\alpha_{T1}$. Consequently, one should consider the asymptotic distribution of $\alpha_{T1}$ only if one is willing to assume $b_o \ne 0$. In light of the earlier discussion on the justification of A13, it is seen that this restriction is quite reasonable. For similar reasons, A14 is also necessary only in the case of $\alpha_{T1}$.
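
As a computational companion to (3.3.31), the following is a minimal sketch, in Python, of how the asymptotic covariance $\Sigma_1 = (M_o^{-1}R_o)(G \otimes \Gamma^{-1}(0))(M_o^{-1}R_o)'$ could be assembled once $M_o$, $R_o$, $G$, and $\Gamma(0)$ (or their consistent estimates) are in hand. All matrix values below are placeholders, not estimates derived from data.

```python
import numpy as np

def joint_cov(M, R, G, Gamma0):
    """Asymptotic covariance (M^{-1} R)(G kron Gamma0^{-1})(M^{-1} R)' of
    sqrt(T)[(a_T1, b_T1, alpha_T1) - (a_o, b_o, alpha_o)], as in (3.3.31);
    M is the 3x3 matrix (3.3.29) and R the 3 x n^2 matrix (3.3.30)."""
    Sigma = np.kron(G, np.linalg.inv(Gamma0))   # Sigma = G kron Gamma^{-1}(0)
    MinvR = np.linalg.solve(M, R)               # M^{-1} R without forming M^{-1}
    return MinvR @ Sigma @ MinvR.T

# Hypothetical n = 3 illustration with placeholder inputs:
n = 3
G = np.eye(n)
Gamma0 = 2 * np.eye(n)
M = np.array([[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, -0.4, 0.8]])
R = np.zeros((3, n * n))
diag_idx = [i * n + i for i in range(n)]
off_idx = [i * n + j for i in range(n) for j in range(n) if j != i]
R[0, diag_idx] = 1.0 / n          # row 1: picks out the diagonal B_ii
R[1, off_idx] = 1.0 / n           # row 2: picks out the off-diagonal B_ij
R[2, off_idx] = 0.1               # row 3: the constants r_ij (placeholder values)
print(joint_cov(M, R, G, Gamma0))
```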



3.4 Review of Assumptions Introduced in Chapter III

The assumptions introduced in Chapter III are summarized in Table 3.1.

Table 3.1

Assumptions Introduced in Chapter III

Assumption                                                                Section

A12: The usual YW estimator, $B_T$, is not such that                       3.3.1
     $B_{T,ij} = b_{T1}/c_i$ for all $i$ and $j \in N_i$ and
     $B_{T,ij} = 0$ for all $i$ and $j \in Q_i$, nor of the
     analogous limiting form with $N_i$ and $Q_i$ replaced
     by $F_i$ and $P_i$.

A13: The true value of $b$, $b_o$, is nonzero.                             3.3.2

A14: The true value of $\alpha$, $\alpha_o$, is finite.                    3.3.2















CHAPTER IV


ESTIMATORS OF COVARIANCE MATRICES
AND THEIR PROPERTIES



4.0 Preamble

In this chapter we will consider the estimation of two covariance

matrices, $G$ and $\Gamma(0)$, where $G$ is the covariance matrix of the error term, $\varepsilon_t$, and $\Gamma(0)$ is the covariance matrix of $y_t$. As for the other parameters

of interest, estimation schemes will be introduced which exploit the

nature of the spatial first-order autoregressive model. The estimators

and some of their properties in the case of the general first-order auto-

regressive model will be presented in Section 4.1. This will serve as

motivation for the specialized estimation schemes which will be intro-

duced in Section 4.2. Some properties of these new estimators will be

examined in Section 4.3. The known weights and variable weights cases

are considered simultaneously, with differences noted when necessary.


4.1 Results for the General First-Order
Autoregressive Multivariate Model

4.1.1 Relationships Among Model Parameters

Just as there are relationships among the model parameters which

lead to the usual YW estimator of B in Section 2.1, there are relation-

ships which will lead to the usual YW estimator of G. We recall from

Section 2.1 that $\Gamma(0)$ is already estimated by a moment estimator, $\Gamma_T(0)$.










Consider again the model for the general first-order autoregressive time series,

$$y_t = B y_{t-1} + \varepsilon_t, \qquad (4.1.1)$$

for which we assume A6 and A7. That is, (A6) all roots of $f(z) = |I - Bz| = 0$ lie outside the unit circle, and (A7) the $\varepsilon_t$'s are independently and identically distributed with mean equal to $0$ and variance-covariance matrix equal to $G$.

If we multiply through (4.1.1) by $\varepsilon_t'$ and take expectations, we have from (2.1.2), A6, and A7,

$$E(y_t \varepsilon_t') = G \quad \text{for all } t. \qquad (4.1.2)$$

If we then multiply through (4.1.1) by $y_t'$, it follows from (2.1.3) and (4.1.2) that

$$\Gamma(0) = B\Gamma(1) + G. \qquad (4.1.3)$$

Another implication of A6 and A7 is the relationship,

$$\Gamma(0) = \sum_{j=0}^{\infty} B^j G B'^j, \qquad (4.1.4)$$

where $B^0 = I$. We recall from Section 2.1 that

$$\Gamma(-1) = B\Gamma(0). \qquad (4.1.5)$$

We now have a system of 3 equations involving the model parameters, $B$, $\Gamma(0)$, $\Gamma(-1)$, and $G$. (From the definition of $\Gamma(\cdot)$ in Section 2.1, it follows that $\Gamma(1) = \Gamma'(-1)$.) The results presented thus far in this section are known and can be found in Hannan (1970:13-15, 326-329) and Fuller (1976:72-73).

We will show that (4.1.4) and (4.1.5) imply (4.1.3). Since $\Gamma'(-1) = \Gamma(1)$, (4.1.5) implies that $\Gamma(1) = \Gamma(0)B'$, and it follows that

$$B\Gamma(1) + G = B\Gamma(0)B' + G = B\Big[\sum_{j=0}^{\infty} B^j G B'^j\Big]B' + G = \sum_{j=0}^{\infty} B^{j+1} G B'^{\,j+1} + G = \sum_{j=0}^{\infty} B^j G B'^j = \Gamma(0),$$

and the result is established.
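
The relationships (4.1.3)-(4.1.5) are easy to check numerically. The following sketch builds $\Gamma(0)$ from the series (4.1.4) for an arbitrary stable $B$ and positive definite $G$ and verifies that (4.1.3) then holds; the truncation length is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
B = rng.standard_normal((n, n))
B *= 0.5 / np.max(np.abs(np.linalg.eigvals(B)))   # enforce A6: spectral radius < 1

A = rng.standard_normal((n, n))
G = A @ A.T + n * np.eye(n)                       # a positive definite G (A7)

# Gamma(0) = sum_j B^j G B'^j from (4.1.4), truncated after the terms are negligible
Gamma0 = np.zeros((n, n))
term = G.copy()
for _ in range(200):
    Gamma0 += term
    term = B @ term @ B.T

Gamma_p1 = Gamma0 @ B.T                           # Gamma(1) = Gamma'(-1) = Gamma(0) B'
print(np.allclose(Gamma0, B @ Gamma_p1 + G))      # (4.1.3): Gamma(0) = B Gamma(1) + G -> True
```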


4.1.2 Results for the Usual Yule-Walker Estimators

The usual YW estimator of $G$, $G_T$, is found by using the relationship in (4.1.3) and letting

$$G_T = \Gamma_T(0) - B_T \Gamma_T(1), \qquad (4.1.6)$$

where $\Gamma_T(0)$, $\Gamma_T(1)$ $(= \Gamma_T'(-1))$, and $B_T$ are defined in Section 2.1.

The following results from Hannan (1970:209-210, 329) will be useful in determining a property of the estimators developed in Section 4.2.

Lemma 4.1:

If $y_t$ is generated as in (4.1.1) and A6 and A7 hold, then

$$G_{T,ij} \xrightarrow{P} G_{ij} \quad \text{as } T \to \infty \text{ for all } i \text{ and } j,$$

where $G_{T,ij}$ and $G_{ij}$ are the $(i,j)$ elements of $G_T$ and $G$, respectively.

Lemma 4.2:

Under the same conditions as in Lemma 4.1,

$$\Gamma_{T,ij}(-k) \xrightarrow{P} \Gamma_{ij}(-k) \quad \text{as } T \to \infty \text{ for all } i \text{ and } j, \text{ and } k = 0,1.$$










By the nature of the estimation procedures for the usual YW estimators, (4.1.3) and (4.1.5) are satisfied by $B_T$, $\Gamma_T(0)$, $\Gamma_T(-1)$, and $G_T$. It will now be shown that (4.1.4) also holds. Let

$$S_k = \sum_{j=0}^{k} B_T^j G_T B_T'^j = \sum_{j=0}^{k} B_T^j[\Gamma_T(0) - B_T \Gamma_T(0) B_T']B_T'^j = \sum_{j=0}^{k}[B_T^j \Gamma_T(0) B_T'^j - B_T^{j+1} \Gamma_T(0) B_T'^{\,j+1}] = \Gamma_T(0) - B_T^{k+1} \Gamma_T(0) B_T'^{\,k+1}.$$

We know that $B_T$ satisfies A6 and that $\Gamma_T(0)$ is nonsingular with probability one. (See Hannan (1970:329,332).) (Of course, $\Gamma_T(0)$ then must be positive definite with probability one.) If we consider a new first-order autoregressive process for which $B$ is replaced by $B_T$ and $G$ by $\Gamma_T(0)$, it follows that the sum in (4.1.4) must converge for that process. That is, $\sum_{j=0}^{\infty} B_T^j \Gamma_T(0) B_T'^j$ is a positive definite matrix with probability one. In order for this matrix sum to converge, the contribution of $B_T^{k+1} \Gamma_T(0) B_T'^{\,k+1}$ to this sum must tend to zero as $k \to \infty$. For this reason, one can conclude that $S_k \to \Gamma_T(0)$ as $k \to \infty$. Thus,

$$\Gamma_T(0) = \sum_{j=0}^{\infty} B_T^j G_T B_T'^j.$$

4.2 The Yule-Walker #1 and #2 Covariance Estimators

Our objective is to develop estimators of the covariance terms in our model which reflect the spatial nature of the model. A natural modification would be to estimate $G$ using the relationship in (4.1.3) and letting

$$G_{T1} = \Gamma_T(0) - B_{rT} \Gamma_T(1), \qquad (4.2.1)$$

where $B_{rT}$ is a general notation for the estimator of $B$ using either the YW#1 or YW#2 estimators in the known weights case or the YW#1 estimators in the variable weights case. In general,

$$B_{rT,ii} = a_T \quad \text{for all } i.$$

In the known weights case,

$$B_{rT,ij} = b_T w_{ij} \quad \text{for all } i \text{ and } j \ne i,$$

and in the variable weights case,

$$B_{rT,ij} = b_T v_{ij}(\alpha_T) \quad \text{for all } i \text{ and } j \ne i.$$

For specific estimation schemes, $B_{rT}$ is replaced by $B_{T1}$ in the YW#1 case and by $B_{T2}$ in the YW#2 case. We will use the general notation whenever possible in our discussion in this chapter and consider the specific cases only when necessary.

An undesirable property of $G_{T1}$ is apparent upon replacing $\Gamma_T(1)$ with $\Gamma_T(0)B_T'$ in (4.2.1). That is, we have

$$G_{T1} = \Gamma_T(0) - B_{rT} \Gamma_T(0) B_T'.$$

Since, in general, $B_{rT} \ne B_T$, it follows that $G_{T1}$ is not symmetric. A modification which corrects this problem would be to use $G_{rT}$ as an estimator of $G$, where

$$G_{rT} = \frac{1}{2}(G_{T1} + G_{T1}'). \qquad (4.2.2)$$

As in the case of $B_{rT}$, $G_{rT}$ will be the general notation for the estimator, and $G_{T1}$ and $G_{T2}$ will denote the YW#1 and YW#2 estimators of $G$, respectively.










It follows from the work in 3.2.3 and 3.3.3 that $\Gamma^{-1}(0)$ is a component which enters into the calculation of the asymptotic covariance matrices of the estimators of $(a,b)$ or $(a,b,\alpha)$. At this stage, we have only the moment estimator of $\Gamma(0)$, $\Gamma_T(0)$. It is desirable to develop another estimator which takes into account the special structure of our model. One criterion would be to develop an estimator, $\Gamma_{rT}(0)$ in general notation, which fits into the framework of the three relationships given by (4.1.3), (4.1.4), and (4.1.5). Using (4.1.4) gives

$$\Gamma_{rT}(0) = \sum_{j=0}^{\infty} B_{rT}^j G_{rT} B_{rT}'^j. \qquad (4.2.3)$$

It is not clear that the right-hand side converges, because it is not known that $B_{rT}$ satisfies A6, nor is it known what effect $G_{rT}$ might have. (This is an area for additional research.) Two suggestions are made for practical usage of (4.2.3):

(i) Calculate terms in the sum, (4.2.3), until convergence, according to a specified criterion, is established. In our work, a correlation matrix was calculated after each step in the summation. When the absolute change in the correlations from one step to the next was arbitrarily small for all elements in the matrix, convergence was assumed.

(ii) Calculate $\sum_{j=0}^{L} B_{rT}^j G_{rT} B_{rT}'^j$, where $L$ is a preassigned limit. The choice of $L$ would probably depend on when one would expect convergence to occur if $\Gamma(0)$ were calculated by using (4.1.4).

Combinations of (i) and (ii) could be used. For example, (i) could be used, with (ii) as a default if convergence does not occur before $L$ steps.
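
A minimal sketch of (i) and (ii) in Python; the tolerance, the cap $L$, and the placeholder inputs are illustrative choices, not values used in the dissertation.

```python
import numpy as np

def gamma_rT0(B_rT, G_rT, tol=1e-6, L=200):
    """Accumulate Gamma_rT(0) = sum_j B_rT^j G_rT B_rT'^j as in (4.2.3), stopping when
    the correlation matrix changes by less than tol between steps (suggestion (i)),
    with L as a default cap on the number of terms (suggestion (ii))."""
    def corr(S):
        d = 1.0 / np.sqrt(np.diag(S))
        return S * np.outer(d, d)

    total = G_rT.copy()
    term = G_rT.copy()
    for _ in range(L):
        term = B_rT @ term @ B_rT.T
        new_total = total + term
        if np.max(np.abs(corr(new_total) - corr(total))) < tol:
            return new_total
        total = new_total
    return total    # fall back to the first L terms if (i) never triggers

# Usage with placeholder inputs:
B_rT = np.array([[0.4, 0.2], [0.3, 0.4]])
G_rT = np.array([[1.0, 0.2], [0.2, 1.0]])
print(gamma_rT0(B_rT, G_rT))
```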









Empirical investigations like those presented in Chapter VI should provide more insight into the practical considerations of this problem.

Using (i) and/or (ii) provides an estimator that is modified according to the special structure of our model. One might then suggest using $\Gamma_{rT}(0)$ and $B_{rT}$ in (4.1.5) to get a modified estimator of $\Gamma(-1)$, $\Gamma_{rT}(-1)$, which in turn would be used along with $B_{rT}$ and $\Gamma_{rT}(0)$ in (4.1.3) to modify $G_{rT}$. However, since (4.1.4) and (4.1.5) imply (4.1.3), one, in theory, would not get any modification of $G_{rT}$. This would be the case if the sum in (4.2.3) did converge and it were possible to calculate all terms in the sum. However, only a finite number of these terms will be calculated in order to determine $\Gamma_{rT}(0)$. It would seem that using only a finite number of terms would not have a major modifying effect on $G_{rT}$ if the stopping rule (in summing terms in (4.2.3)) were reasonable. If the modification is not major, then $B_{rT}$, $G_{rT}$, $\Gamma_{rT}(0)$, and $\Gamma_{rT}(1)$ could be regarded, for practical purposes, as satisfying (4.1.3), (4.1.4), and (4.1.5).

The covariance estimation procedures discussed in this section can be summarized in three steps.

Step 1: Estimate $B$ using $(a_{T1},b_{T1})$ or $(a_{T1},b_{T1},\alpha_{T1})$ to calculate $B_{T1}$, or $(a_{T2},b_{T2})$ to calculate $B_{T2}$.

Step 2: Estimate $G$ with $G_{T1}$ or $G_{T2}$ using (4.2.1) and (4.2.2).

Step 3: Calculate a modified estimate of $\Gamma(0)$, $\Gamma_{T1}(0)$ or $\Gamma_{T2}(0)$, by using (4.2.3) and following (i) and/or (ii).










4.3 Consistency of the Yule-Walker #1 and #2 Covariance Estimators

In order to show the consistency (in probability) of $G_{rT}$, we will show the consistency of each of its components. The consistency of $G_{rT}$ will then follow from standard results. In the variable weights case, we have from 3.3.2 that $a_{T1}$, $b_{T1}$, and $\alpha_{T1}$ are all consistent (in probability). It then follows that

$$B_{T1,ii} = a_{T1} \xrightarrow{P} a_o = B_{ro,ii} \quad \text{as } T \to \infty \text{ for all } i$$

and

$$B_{T1,ij} = b_{T1}\,v_{ij}(\alpha_{T1}) \xrightarrow{P} b_o\,v_{ij}(\alpha_o) = B_{ro,ij} \quad \text{as } T \to \infty \text{ for all } i \text{ and } j \ne i,$$

since $v_{ij}(\alpha)$ is continuous. Therefore, in the variable weights case, $B_{T1}$ is a consistent estimator, element-wise, of $B_{ro}$.

From our earlier work in the known weights case, it is enough to consider only the YW#2 estimators of $(a,b)$. We have from 3.2.2 that both $a_{T2}$ and $b_{T2}$ are consistent. It follows then that

$$B_{T2,ii} = a_{T2} \xrightarrow{P} a_o = B_{ro,ii} \quad \text{as } T \to \infty \text{ for all } i$$

and

$$B_{T2,ij} = b_{T2}\,w_{ij} \xrightarrow{P} b_o\,w_{ij} = B_{ro,ij} \quad \text{as } T \to \infty \text{ for all } i \text{ and } j \ne i.$$

Therefore, in the known weights case, both $B_{T1}$ and $B_{T2}$ are consistent estimators, element-wise, of $B_{ro}$.

Since this covers all of our estimation procedures, we can say, in general, that

$$B_{rT,ij} \xrightarrow{P} B_{ro,ij} \quad \text{as } T \to \infty \text{ for all } i \text{ and } j.$$

That is, $B_{rT}$ is a consistent estimator, element-wise, of $B_{ro}$.

From Lemma 4.2, we have that both $\Gamma_T(0)$ and $\Gamma_T(1)$ are element-wise consistent estimators of $\Gamma(0)$ and $\Gamma(1)$, respectively. Using this result and the consistency of $B_{rT}$, we have by standard results,

$$G_{T1,ij} \xrightarrow{P} G_{ij} \quad \text{as } T \to \infty \text{ for all } i \text{ and } j.$$

It then follows from (4.2.2) that

$$G_{rT,ij} \xrightarrow{P} G_{ij} \quad \text{as } T \to \infty \text{ for all } i \text{ and } j.$$

Thus, $G_{rT}$ is an element-wise consistent and symmetric estimator of $G$. This result is analogous to Lemma 4.1.

In order to determine whether or not $\Gamma_{rT}(0)$ is a consistent estimator of $\Gamma(0)$, one must specify a stopping rule for the sum in (4.2.3). If (ii) is followed, let

$$\Gamma_L(0) = \sum_{j=0}^{L} B_{ro}^j\,G\,B_{ro}'^j.$$

The results of this section imply that

$$\Gamma_{rT,ij}(0) = \Big[\sum_{j=0}^{L} B_{rT}^j\,G_{rT}\,B_{rT}'^j\Big]_{ij} \xrightarrow{P} \Gamma_{L,ij}(0) \quad \text{as } T \to \infty \text{ for all } i \text{ and } j.$$

That is, $\Gamma_{rT}(0)$ is an element-wise consistent estimator of $\Gamma_L(0)$, which is a good approximation to $\Gamma(0)$. Of course, from Lemma 4.2, $\Gamma_T(0)$, itself, is a consistent estimator of $\Gamma(0)$, but it is hoped that $\Gamma_{rT}(0)$ would perform better than $\Gamma_T(0)$ for finite $T$, because the specific structure of our model is taken into account in the former.

If a stopping rule like (i) were used, the study of the consistency of $\Gamma_{rT}(0)$ would be more difficult because the number of terms included in the sum would be a random variable. Consistency of the estimator in this situation was not extensively studied.









Estimation of the covariance matrices of the asymptotic distributions derived in Chapter III requires an estimator of $\Gamma^{-1}(0)$. Thus, a desirable property for an estimator of $\Gamma(0)$ is nonsingularity. The question of the nonsingularity of $\Gamma_{rT}(0)$ has only been empirically considered. For the results reported in Chapter VI, $\Gamma_{rT}(0)$ was nonsingular in all cases.















CHAPTER V


INFERENCE



5.0 Preamble

The results in Chapters II through IV represent the foundation for the inferential procedures presented in this chapter. In Section 5.1, asymptotic single-parameter test statistics and confidence intervals for $a$, $b$, and $\alpha$ (if appropriate) will be presented. Joint confidence intervals and tests will be considered in Section 5.2. Prediction with the general first-order autoregressive model will be discussed in Section 5.3, and these results will be applied to the spatial model in Section 5.4.


5.1 Asymptotic Single-Parameter Hypothesis Tests and Confidence Intervals

5.1.1 The Known Weights Case

One of the advantages of the parameterization for our special first-order autoregressive model is that it allows for exploratory study of the underlying process. In the known weights case, there are two effects to be studied, the location effect, represented by the parameter $a$, and the neighbor effect, represented by the parameter $b$. To perform hypothesis tests and construct confidence intervals, we use the results from 3.2.3, where it was shown that

$$\sqrt{T}\,[(a_T,b_T) - (a_o,b_o)] \xrightarrow{D} N_2(0, H_r \Sigma H_r') \quad \text{as } T \to \infty, \qquad (5.1.1)$$

where $\Sigma = G \otimes \Gamma^{-1}(0)$ and $H_r$ is either $H_1$ (YW#1) or $H_2$ (YW#2) given in 3.2.3.

Let $\sigma_a^2 = (H_r \Sigma H_r')_{11}$ and $\sigma_b^2 = (H_r \Sigma H_r')_{22}$.

The results presented here can be applied to both the YW#1 and YW#2 estimators. Consequently, a general notation without subscripts "1" and "2" will be used.

The asymptotic univariate distributions follow directly from the asymptotic joint distributions. Thus, for large values of $T$, both

$$z_{a_o} = \frac{a_T - a_o}{\sigma_a/\sqrt{T}} \quad \text{and} \quad z_{b_o} = \frac{b_T - b_o}{\sigma_b/\sqrt{T}}$$

can be regarded as being approximate standard normal random variables. In order to test the hypothesis, $H_o: a = a_o$, the usual z-test would be used with the test statistic $z_{a_o}$. If $a_o = 0$, this would provide a test of the hypothesis of no locational effect. That is, we test that the response at a location at time $t$ is not explicitly related to the response at that location at time $(t-1)$.

In the same way, to test the hypothesis, $H_o: b = b_o$, one could use the test statistic $z_{b_o}$. If $b_o = 0$, this would provide a test of the hypothesis of no overall neighbor effect, in the sense that the response at a location at time $t$ is not explicitly related to the response at any of the other locations at time $(t-1)$. Note that if $b_{T2}$ were used, the test would represent a test of an overall neighbor effect with regard to a specific weight structure. However, if $b_{T1}$ were used, one considers the specific weight structure only through $\Sigma$ and the assumption that the weights are scaled to add to unity for each location. (The weights determine $B$ which, along with $G$, determines $\Gamma(0)$ and hence $\Sigma$.) While for theoretical considerations the YW#1 estimator can be regarded as fitting within the general framework of the YW#2 estimation scheme, it is seen that in application, at least in terms of calculating the estimate of $b$ and performing this hypothesis test, the YW#1 approach is more general.

If one were interested in estimating $a$ and $b$, the usual confidence intervals could be constructed. A $(1-\gamma)\,100\%$ confidence interval for $a$ would be

$$a_T \pm z_{\gamma/2}\,\sigma_a/\sqrt{T},$$

where $z_{\gamma/2}$ is such that $P(z > z_{\gamma/2}) = \gamma/2$ and $z \sim N(0,1)$. Likewise, a $(1-\gamma)\,100\%$ confidence interval for $b$ would be

$$b_T \pm z_{\gamma/2}\,\sigma_b/\sqrt{T}.$$

Upon observing the form of $\sigma_a$ or $\sigma_b$ for either estimation scheme, it is clear that for practical usage of these results, one would need to estimate $\sigma_a$ and $\sigma_b$ by $\sigma_{aT}$ and $\sigma_{bT}$, respectively. Since $H_1$ and $H_2$ are both constant matrices, $\Sigma$ is the only matrix that needs to be estimated. We estimate $\Sigma$ by replacing $G$ and $\Gamma(0)$ with their consistent estimators, $G_{rT}$ and $\Gamma_{rT}(0)$, presented in Section 4.2. It should be noted that in using $\sigma_{aT}$ and $\sigma_{bT}$ as consistent estimators of $\sigma_a$ and $\sigma_b$, we are taking into account the specific weight structure assumed for our model in both the YW#1 and YW#2 cases through the use of $G_{rT}$ and $\Gamma_{rT}(0)$.
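
For concreteness, a small sketch of the single-parameter procedure as it might be computed in practice; the numerical inputs are hypothetical.

```python
import numpy as np
from scipy import stats

def z_test_and_ci(est, hyp, sigma_hat, T, gamma=0.05):
    """Single-parameter z statistic and (1-gamma)100% confidence interval for a or b,
    given a consistent estimate sigma_hat of the asymptotic standard deviation."""
    se = sigma_hat / np.sqrt(T)
    z = (est - hyp) / se
    z_crit = stats.norm.ppf(1 - gamma / 2)
    return z, (est - z_crit * se, est + z_crit * se)

# Hypothetical values: a_T = 0.31 from T = 200 observations, testing H0: a = 0.
z, ci = z_test_and_ci(est=0.31, hyp=0.0, sigma_hat=0.9, T=200)
print(z, ci)
```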


5.1.2 The Variable Weights Case

For the variable weights case, the parameter $\alpha$ provides for a distance effect in addition to the location and neighbor effects. Recall that in 3.3.3, we showed that

$$\sqrt{T}\,[(a_{T1},b_{T1},\alpha_{T1}) - (a_o,b_o,\alpha_o)] \xrightarrow{D} N_3(0,\Sigma_1) \quad \text{as } T \to \infty, \qquad (5.1.2)$$

where $\Sigma_1 = (M_o^{-1}R_o)\,\Sigma\,(M_o^{-1}R_o)'$, $M_o$ is given in (3.3.29), $R_o$ is given in (3.3.30), and $\Sigma = G \otimes \Gamma^{-1}(0)$.

Let $\sigma_{a1}^2 = \Sigma_{1,11}$, $\sigma_{b1}^2 = \Sigma_{1,22}$, and $\sigma_{\alpha 1}^2 = \Sigma_{1,33}$.

As in 5.1.1, the appropriate test statistic to use in testing the hypothesis, $H_o: a = a_o$, would be

$$z_{a_o} = \frac{a_{T1} - a_o}{\sigma_{a1}/\sqrt{T}}.$$

The interpretation of the test when $a_o = 0$ is the same as in 5.1.1. Similarly, the appropriate test statistic for testing $H_o: b = b_o$ would be

$$z_{b_o} = \frac{b_{T1} - b_o}{\sigma_{b1}/\sqrt{T}}.$$

In order to test $H_o: \alpha = \alpha_o$, more care must be taken. We showed in 3.3.3 that the asymptotic distribution of $\sqrt{T}(\alpha_{T1} - \alpha_o)$ exists only if $b_o \ne 0$. Consequently, the following hypothesis test can be performed only if assumption A13 ($b_o \ne 0$) holds. In that case, the appropriate test statistic would be

$$z_{\alpha_o} = \frac{\alpha_{T1} - \alpha_o}{\sigma_{\alpha 1}/\sqrt{T}}.$$

If $\alpha_o = 0$, this would provide a test of the hypothesis of no distance effect among the neighbors. Under this hypothesis, the effect of a neighbor on a location does not depend on the distance between the location and the neighbor, according to our specific weight structure,





because $v_{ij}(0) = \frac{1}{n-1}$ for all $i$ and $j \ne i$. Because A13 must be assumed for the validity of this test, it follows that one should test first for a neighbor effect and then for a distance effect among the neighbors.

We can construct $(1-\gamma)\,100\%$ confidence intervals for $a$, $b$, and $\alpha$ as follows:

$$a_{T1} \pm z_{\gamma/2}\,\sigma_{a1}/\sqrt{T},$$

$$b_{T1} \pm z_{\gamma/2}\,\sigma_{b1}/\sqrt{T},$$

and

$$\alpha_{T1} \pm z_{\gamma/2}\,\sigma_{\alpha 1}/\sqrt{T}.$$

For practical usage of these results, we would need to estimate $\sigma_{a1}$, $\sigma_{b1}$, and $\sigma_{\alpha 1}$. In addition to estimating $G$ and $\Gamma(0)$ with $G_{rT}$ and $\Gamma_{rT}(0)$, respectively, $M_o$ and $R_o$ need to be estimated. The forms of $M_o$ and $R_o$ in (3.3.29) and (3.3.30) imply that they can be estimated consistently using $b_{T1}$ and $\alpha_{T1}$ to calculate $M_{T1}$ and $R_{T1}$. Since $b_{T1} \ne 0$ (A11), $M_{T1}^{-1}$ exists for the same reason that $M_o^{-1}$ exists. However, assumption of A11 is necessary only if the procedures involving $\alpha$ are used. Since all the components involved in the determination of $\sigma_{aT1}$, $\sigma_{bT1}$, and $\sigma_{\alpha T1}$ are consistent, it follows that $\sigma_{aT1}$, $\sigma_{bT1}$, and $\sigma_{\alpha T1}$ are consistent estimators of $\sigma_{a1}$, $\sigma_{b1}$, and $\sigma_{\alpha 1}$, respectively.

The results in this section, as well as the next, are asymptotic.

The empirical investigations reported in Chapter VI should provide some

insight into the use of the results for finite sample sizes.










5.2 Asymptotic Multiparameter Hypothesis Tests and Confidence Regions

5.2.1 A General Result

The following lemma provides a general result that will be useful in developing multiparameter hypothesis tests and confidence regions. Since this is a known result, it is stated without proof.

Lemma 5.1:

Let $\theta_T$ be an estimator of $\theta_o$ based on $T$ observations, with both $\theta_T$ and $\theta_o$ of length $k$. If $\sqrt{T}(\theta_T - \theta_o) \xrightarrow{D} N_k(0,\Sigma)$ as $T \to \infty$, then if $\Sigma^{-1}$ exists,

$$T(\theta_T - \theta_o)'\,\Sigma^{-1}\,(\theta_T - \theta_o) \xrightarrow{D} \chi^2(k) \quad \text{as } T \to \infty,$$

where $\chi^2(k)$ is the central chi-squared distribution with $k$ degrees of freedom.

5.2.2 The Known Weights Case

By using the result in (5.1.1) and applying Lemma 5.1, one can derive the asymptotic test of the joint hypothesis, $H_o: a = a_o$ and $b = b_o$. The test statistic is

$$\chi^2 = T\,[(a_T,b_T) - (a_o,b_o)]\,\Sigma_r^{-1}\,[(a_T,b_T) - (a_o,b_o)]', \qquad (5.2.1)$$

where $\Sigma_r = H_r \Sigma H_r'$, and $H_r = H_1$ or $H_2$ depending on whether the YW#1 estimators or the YW#2 estimators are being used. The form of the rejection region for a $\gamma$-level test would be $\chi^2 > \chi_\gamma^2(2)$, where $\chi_\gamma^2(k)$ is such that $P[\chi^2(k) > \chi_\gamma^2(k)] = \gamma$.

If $H_o$ is rejected, one might use the single-parameter procedures in 5.1.1 in an attempt to determine individual differences which could have led to the rejection of $H_o$.
o










For estimation purposes, an asymptotic $(1-\gamma)\,100\%$ joint confidence ellipsoid for $(a,b)$ could be constructed using a technique like that in Anderson (1958:55). The ellipsoid would consist of all values of $(a,b)$ for which

$$T\,[(a_T,b_T) - (a,b)]\,\Sigma_r^{-1}\,[(a_T,b_T) - (a,b)]' \le \chi_\gamma^2(2). \qquad (5.2.2)$$

Since both (5.2.1) and (5.2.2) contain $\Sigma_r^{-1}$, the question of whether or not $\Sigma_r$ is invertible must be answered, where

$$\Sigma_r = H_r\,(G \otimes \Gamma^{-1}(0))\,H_r'.$$

Now both $G$ and $\Gamma(0)$ are invertible, and $H_r$ ($H_1$, $H_2$) is clearly of rank 2, which implies that $\Sigma_r$ is of rank 2 and hence that $\Sigma_r^{-1}$ exists.

The results of this section still follow when, in practice, $\Sigma_r$ is estimated consistently by $\Sigma_{rT}$, where

$$\Sigma_{rT} = H_r\,(G_{rT} \otimes \Gamma_{rT}^{-1}(0))\,H_r'.$$

The matrix $\Sigma_{rT}$ will then be invertible if both $G_{rT}$ and $\Gamma_{rT}(0)$ are invertible.
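
A sketch of the joint test (5.2.1) as it might be computed in practice; the estimates and the estimated covariance below are placeholders.

```python
import numpy as np
from scipy import stats

def joint_chi2_test(theta_T, theta_0, Sigma_rT, T, gamma=0.05):
    """Joint test statistic (5.2.1): X^2 = T (theta_T - theta_0)' Sigma_rT^{-1} (theta_T - theta_0),
    compared against the upper-gamma chi-squared point with len(theta_T) degrees of freedom."""
    d = np.asarray(theta_T) - np.asarray(theta_0)
    X2 = T * d @ np.linalg.solve(Sigma_rT, d)
    return X2, X2 > stats.chi2.ppf(1 - gamma, df=len(d))

# Hypothetical estimates (a_T, b_T) with a placeholder estimated covariance:
X2, reject = joint_chi2_test([0.31, 0.22], [0.0, 0.0],
                             np.array([[0.8, 0.1], [0.1, 0.6]]), T=200)
print(X2, reject)
```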


5.2.3 The Variable Weights Case

More care must be taken in developing and using multiparameter procedures in the variable weights case, because $\alpha_{T1}$'s behavior is evaluated only if $b_o \ne 0$.

From (5.1.2) and Lemma 5.1, it follows that to test the joint hypothesis, $H_o: a = a_o$, $b = b_o \ne 0$, and $\alpha = \alpha_o$, for large $T$, one can employ the test statistic,

$$\chi^2 = T\,[(a_{T1},b_{T1},\alpha_{T1}) - (a_o,b_o,\alpha_o)]\,\Sigma_1^{-1}\,[(a_{T1},b_{T1},\alpha_{T1}) - (a_o,b_o,\alpha_o)]', \qquad (5.2.3)$$

where $\Sigma_1 = (M_o^{-1}R_o)\,(G \otimes \Gamma^{-1}(0))\,(M_o^{-1}R_o)'$. The form of the rejection region for a $\gamma$-level test would be $\chi^2 > \chi_\gamma^2(3)$.

If $H_o: a = a_o$, $b = b_o \ne 0$, and $\alpha = \alpha_o$ is rejected, the single-parameter procedures of 5.1.2 could then be used to detect individual differences.

If $b_o = 0$, one would need to first test $H_o: a = a_o$ and $b = 0$ using the YW#1 procedure given by (5.2.1). If $H_o$ is rejected, one could use the single-parameter tests in 5.1.2 to detect significant differences from the hypothesized values. The test of $H_o: \alpha = \alpha_o$ would be carried out only if one were willing to assume $b_o \ne 0$ (A13).

An asymptotic $(1-\gamma)\,100\%$ joint confidence ellipsoid for $(a_o,b_o,\alpha_o)$ would be all values of $(a,b,\alpha)$ for which

$$T\,[(a_{T1},b_{T1},\alpha_{T1}) - (a,b,\alpha)]\,\Sigma_1^{-1}\,[(a_{T1},b_{T1},\alpha_{T1}) - (a,b,\alpha)]' \le \chi_\gamma^2(3). \qquad (5.2.4)$$

Any points of the form $(a,0,\alpha)$ would need to be eliminated from the ellipsoid, since we consider the joint distribution of $(a_{T1},b_{T1},\alpha_{T1})$ only if $b_o \ne 0$. Using (5.2.2), a confidence interval could be constructed for $a$ in the case of $b = 0$. Even with the $(a,0,\alpha)$ values removed from the ellipsoid, it would seem that (5.2.4) would be a bit difficult to portray graphically. A better procedure might be to graph contours of $(a,\alpha)$ for selected nonzero $b$-values.

From (5.2.3) and (5.2.4), it is seen that $\Sigma_1$ must be invertible. If $M_o^{-1}R_o$ is of rank 3, it follows that $\Sigma_1$ is invertible, since both $G$ and $\Gamma(0)$ are invertible. Upon examining the form of $R_o$ in (3.3.30), it follows that the rank of $R_o$ is 3, since we assume that there are at least two different distances (A4). Since $M_o$ is clearly of rank 3, it follows that $M_o^{-1}R_o$ is of rank 3.

In practice, one would use $\Sigma_{T1}$, a consistent estimator of $\Sigma_1$, where

$$\Sigma_{T1} = (M_{T1}^{-1}R_{T1})\,(G_{T1} \otimes \Gamma_{T1}^{-1}(0))\,(M_{T1}^{-1}R_{T1})'.$$

Since $M_{T1}^{-1}R_{T1}$ is of rank 3, $\Sigma_{T1}$ will be invertible if $G_{T1}$ and $\Gamma_{T1}(0)$ are nonsingular.


5.3 Prediction with the General First-Order Autoregressive Multivariate Time Series Model

5.3.1 Introduction

One of the major purposes in developing a time series model is to use the model to predict or forecast future realizations of the series. Consider again the model for the first-order autoregressive multivariate time series,

$$y_t = B y_{t-1} + \varepsilon_t, \qquad (5.3.1)$$

where assumptions A6 and A7 are true.

Suppose this process is observed for $T$ time periods, $t = 1,2,\ldots,T$. This section deals with the problem of predicting $y_{T+k}$, $k = 1,2,\ldots$, that is, predicting $k$ time units ahead.

We begin by writing the model in (5.3.1) for $t = T+k$ in terms of the observations by time $T$. We have

$$y_{T+k} = B y_{T+k-1} + \varepsilon_{T+k} = B(B y_{T+k-2} + \varepsilon_{T+k-1}) + \varepsilon_{T+k} = B^2 y_{T+k-2} + B\varepsilon_{T+k-1} + \varepsilon_{T+k} = \cdots = B^k y_T + \sum_{j=0}^{k-1} B^j \varepsilon_{T+k-j}. \qquad (5.3.2)$$

It follows that

$$E(y_{T+k} \mid y_T = \tilde{y}_T,\, y_{T-1} = \tilde{y}_{T-1},\ldots) = B^k \tilde{y}_T,$$

since an implication of A6 and A7 is that $\varepsilon_t$ is independent of $y_{t-1}, y_{t-2},\ldots$, for all $t$. Practically, we will be interested in the expected value of $y_{T+k}$ given only a finite number of past values. But the Markovian nature of the autoregressive model implies that

$$E(y_{T+k} \mid y_T = \tilde{y}_T,\ldots,y_1 = \tilde{y}_1) = B^k \tilde{y}_T = E(y_{T+k} \mid y_T = \tilde{y}_T), \qquad (5.3.3)$$

so that this practical consideration imposes no limitations.
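
The predictor in (5.3.3) is a single matrix power applied to the last observation; a minimal sketch with placeholder values:

```python
import numpy as np

def predict_k_ahead(B, y_T, k):
    """Best linear predictor of y_{T+k} from the past: B^k y_T, as in (5.3.3)-(5.3.4)."""
    return np.linalg.matrix_power(B, k) @ y_T

# Placeholder coefficient matrix and last observation:
B = np.array([[0.5, 0.1], [0.2, 0.4]])
y_T = np.array([1.0, -0.5])
print(predict_k_ahead(B, y_T, 3))
```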


5.3.2 Prediction When B is Known to be $B_o$

From (5.3.2), it would seem natural for one to use $B_o^k y_T$ to predict $y_{T+k}$ if one wished to use only a linear combination of past observations (i.e., $y_T, y_{T-1},\ldots$). Call this predictor $\hat{y}_{T+k}$. An application of a more general result in Hannan (1970:127-130, 135-136) leads to the conclusion that $\hat{y}_{T+k} = B_o^k y_T$ is the best linear predictor of $y_{T+k}$ using the entire past, $y_T, y_{T-1},\ldots$. The predictor, $\hat{y}_{T+k}$, is best in the sense that the minimum of $E[(y_{T+k} - v_{T+k})'(y_{T+k} - v_{T+k})]$ is at $v_{T+k} = \hat{y}_{T+k}$, where the minimum is taken over all linear predictors, $v_{T+k}$, of $y_{T+k}$ based on the entire past. So

$$\hat{y}_{T+k} = B_o^k y_T \qquad (5.3.4)$$

is the mean square predictor of $y_{T+k}$.

For a particular realization of the series, $\tilde{y}_1, \tilde{y}_2,\ldots,\tilde{y}_T$, the predicted value at time $T+k$ would be

$$\tilde{y}_{T+k} = B_o^k \tilde{y}_T.$$

From (5.3.3), we see that this predicted value is such that

$$\tilde{y}_{T+k} = E(y_{T+k} \mid y_T = \tilde{y}_T,\ldots,y_1 = \tilde{y}_1) = E(y_{T+k} \mid y_T = \tilde{y}_T).$$

The error of prediction is defined to be the difference between the actual value at time $T+k$ and the predicted value. One important characteristic of these errors is their variance-covariance matrix. There are two approaches in evaluating this matrix that we will consider.










One is conditional on the part of the particular past realization that is used in the prediction, $\tilde{y}_T$, and the other is unconditional over all values of $y_T$. If these yield different results, the experimenter would then need to decide which approach would be appropriate to his experimental situation.

Case 1: We consider the conditional approach first.

Let $V_T(\cdot)$ and $E_T(\cdot)$ denote the conditional (on $y_T = \tilde{y}_T$) variance-covariance matrix and mean vector, respectively, and let $V(\cdot)$ and $E(\cdot)$ denote their unconditional counterparts. We then have from (5.3.2), (5.3.4), and assumptions A6 and A7, that

$$V_T(\text{error of prediction}) = V_T(y_{T+k} - \hat{y}_{T+k}) = V_T\Big(\sum_{j=0}^{k-1} B_o^j \varepsilon_{T+k-j}\Big) = \sum_{j=0}^{k-1} B_o^j\,V_T(\varepsilon_{T+k-j})\,B_o'^j = \sum_{j=0}^{k-1} B_o^j\,G\,B_o'^j. \qquad (5.3.5)$$

Case 2: We now consider the unconditional approach.

Using (5.3.5) and the arguments used in Case 1, we have

$$V(y_{T+k} - \hat{y}_{T+k}) = E[V_T(y_{T+k} - \hat{y}_{T+k})] + V[E_T(y_{T+k} - \hat{y}_{T+k})] = \sum_{j=0}^{k-1} B_o^j G B_o'^j + V(0) = \sum_{j=0}^{k-1} B_o^j G B_o'^j = V_T(y_{T+k} - \hat{y}_{T+k}). \qquad (5.3.6)$$

This result can be found by an application of the general result in Jones (1964). If $k = 1$, we see that $V(y_{T+1} - \hat{y}_{T+1}) = G$, which agrees with our intuition.

Another form of the variance-covariance matrix can be derived by using the form of $\Gamma(0)$ in (4.1.4). This implies that

$$\sum_{j=0}^{k-1} B_o^j G B_o'^j = \Gamma(0) - \sum_{j=k}^{\infty} B_o^j G B_o'^j = \Gamma(0) - B_o^k\Big[\sum_{j=0}^{\infty} B_o^j G B_o'^j\Big]B_o'^k = \Gamma(0) - B_o^k\,\Gamma(0)\,B_o'^k. \qquad (5.3.7)$$

By the same reasoning that was used in a similar case in 4.1.2, we can conclude that this variance-covariance matrix of prediction errors approaches $\Gamma(0)$ as $k \to \infty$. This result is intuitively appealing, since as one predicts farther ahead in time, the information provided by the past observations becomes less important. Consequently, the prediction variance-covariance matrix conditional on the past values approaches the unconditional variance-covariance matrix of the time series.
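
A quick numerical confirmation of (5.3.7): the k-term sum in (5.3.5) agrees with $\Gamma(0) - B_o^k\,\Gamma(0)\,B_o'^k$ when $\Gamma(0)$ is computed from (4.1.4); the matrices below are placeholders.

```python
import numpy as np

def pred_error_cov(B, G, k):
    """Prediction error covariance sum_{j=0}^{k-1} B^j G B'^j from (5.3.5)."""
    V = np.zeros_like(G)
    term = G.copy()
    for _ in range(k):
        V += term
        term = B @ term @ B.T
    return V

B = np.array([[0.5, 0.1], [0.2, 0.4]])
G = np.eye(2)
Gamma0 = pred_error_cov(B, G, 500)                # long sum approximates Gamma(0) via (4.1.4)
Bk = np.linalg.matrix_power(B, 3)
print(np.allclose(pred_error_cov(B, G, 3),
                  Gamma0 - Bk @ Gamma0 @ Bk.T))   # the identity (5.3.7) -> True
```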


5.3.3 Prediction When B is Unknown

The more realistic prediction situation is to treat the matrix of coefficients, $B$, as unknown. In this situation the predictor would be

$$\hat{y}_{T+k} = B_T^k\,y_T. \qquad (5.3.8)$$

An approximation to the variance-covariance matrix of the prediction errors will be derived here. Our approach will be similar in some respects to that of Box and Jenkins (1976:269) in the scalar case. We make the following assumption.









A15: The matrix $B_T^k$ can be regarded as being independent of $y_T$.

Since $y_T$ is used in the calculation of the usual YW estimator, $B_T$, we know that A15 is probably not true. However, if $T$ is large, it would seem that the effect of $y_T$ on $B_T$ would be relatively insignificant. Thus, A15 could be used in deriving an approximation to the variance-covariance matrix of the prediction errors. We derive this approximation by using the mean vectors and variance-covariance matrices of asymptotic distributions determined by repeated applications of Lemma 3.3 to Lemma 3.2. We use the notation "$\approx$" instead of "$=$" at each point where an actual moment is replaced by a moment of the corresponding asymptotic distribution. Let $B_o$ be the true (unknown) value of $B$.

Case 1: We first consider the conditional case.

An application of Lemma 3.3 to Lemma 3.2 yields

$$E(B_o^k - B_T^k) \approx Z,$$

where $Z$ is a matrix of zeroes. It follows then from A15 that

$$E_T[(B_o^k - B_T^k)\,y_T] = E(B_o^k - B_T^k)\,\tilde{y}_T \approx 0.$$

This result, along with (5.3.2), (5.3.8), A6, A7, and A15, implies that

$$V_T(y_{T+k} - \hat{y}_{T+k}) = V_T\Big[(B_o^k - B_T^k)\,y_T + \sum_{j=0}^{k-1} B_o^j \varepsilon_{T+k-j}\Big] = E_T[(y_{T+k} - \hat{y}_{T+k})(y_{T+k} - \hat{y}_{T+k})'] \approx V[(B_o^k - B_T^k)\,\tilde{y}_T] + \sum_{j=0}^{k-1} B_o^j G B_o'^j. \qquad (5.3.9)$$










The following lemmas will be used to derive an approximation to $V[(B_o^k - B_T^k)\,\tilde{y}_T]$.

Lemma 5.2:

Let $X$ be an $n \times n$ matrix of random variables for which the variance-covariance matrix of $x = (X_{11},X_{12},\ldots,X_{1n},X_{21},\ldots,X_{2n},\ldots,X_{n1},\ldots,X_{nn})'$ is $\Sigma$. Let $A_r$ and $B_r$ be $n \times n$ constant matrices, $r = 1,2,\ldots,k$, and let $S = \sum_{r=1}^{k} A_r X B_r$. Then the variance-covariance matrix of $s = (S_{11},S_{12},\ldots,S_{1n},S_{21},\ldots,S_{2n},\ldots,S_{n1},\ldots,S_{nn})'$ is $H_s \Sigma H_s'$, where

$$H_s = \sum_{r=1}^{k}(A_r \otimes B_r').$$

Proof:

Let $A$ and $B$ be $n \times n$ constant matrices and $P = AXB$. It is claimed that the variance-covariance matrix of $p = (P_{11},P_{12},\ldots,P_{1n},P_{21},\ldots,P_{2n},\ldots,P_{n1},\ldots,P_{nn})'$ is $H_p \Sigma H_p'$, where $H_p = (A \otimes B')$.

Let $R = AX$ and $r = (R_{11},R_{12},\ldots,R_{1n},R_{21},\ldots,R_{2n},\ldots,R_{n1},\ldots,R_{nn})'$. Since $R_{ij} = A_{i\cdot}\,X_{\cdot j}$, it follows from a standard result that the variance-covariance matrix of $r$ is $H_r \Sigma H_r'$, where

$$H_r = \Big\{\frac{\partial r_i}{\partial x_j}\Big\} = (A \otimes I_n), \qquad (5.3.10)$$

where $I_n$ is the $n \times n$ identity matrix.

Now let $T = X'$ and $t = (T_{11},T_{12},\ldots,T_{1n},T_{21},\ldots,T_{2n},\ldots,T_{n1},\ldots,T_{nn})'$. Then the variance-covariance matrix of $t$ is $H_t \Sigma H_t'$, where $H_t = \{\partial t_i/\partial x_j\}$. Since there exist unique $r$ and $s$ such that $t_i = T_{rs} = X_{sr}$, it follows that the only nonzero element of the $i$th row of $H_t$ is the one corresponding to $X_{sr}$, which contains "1". Since $x_i = X_{rs} = T_{sr}$, the only nonzero element of the $i$th column of $H_t$ is the one corresponding to $T_{sr}$, which contains "1". This implies that $H_t = H_t'$.

Let $Q = XB$. By following the route from $X$ to $X'$ to $B'X'$ to $XB$, the previous two results imply that the variance-covariance matrix of $q = (Q_{11},Q_{12},\ldots,Q_{1n},Q_{21},\ldots,Q_{2n},\ldots,Q_{n1},\ldots,Q_{nn})'$ is $H_t(B' \otimes I_n)H_t'\,\Sigma\,H_t(B' \otimes I_n)'H_t'$. It is known that $H_t(V \otimes W)H_t' = (W \otimes V)$, where $V$ and $W$ are $n \times n$ matrices. Then the above variance-covariance matrix can be written as $(I_n \otimes B')\,\Sigma\,(I_n \otimes B)$.

Since $P = AQ$, it follows from (5.3.10) that the variance-covariance matrix of $p$ is $(A \otimes I_n)(I_n \otimes B')\,\Sigma\,(I_n \otimes B)(A' \otimes I_n)$. Simplifying, we have

$$(A \otimes I_n)(I_n \otimes B')\,\Sigma\,(I_n \otimes B)(A' \otimes I_n) = (A \otimes B')\,\Sigma\,(A' \otimes B),$$

since $(A \otimes I_n)(I_n \otimes B') = (A \otimes B')$. Therefore,

$$H_p = (A \otimes B'). \qquad (5.3.11)$$

Now let $S_r = A_r X B_r$, let $s_r$ be the corresponding vector representation, and let $H_s = \{\partial s_i/\partial x_j\}$. Then the variance-covariance matrix of $s$ is $H_s \Sigma H_s'$. Now

$$H_s = \Big\{\frac{\partial s_i}{\partial x_j}\Big\} = \Big\{\sum_{r=1}^{k}\frac{\partial s_{r,i}}{\partial x_j}\Big\} = \sum_{r=1}^{k} H_{s_r},$$

where, from (5.3.11), $H_{s_r} = (A_r \otimes B_r')$. Therefore,

$$H_s = \sum_{r=1}^{k}(A_r \otimes B_r').$$










Lemma 5.3:

Let $A$ and $B$ be two $n \times n$ matrices. Then a first-order approximation to $(A-B)^k$ in terms of powers of $B$ is

$$A^k - \sum_{j=0}^{k-1} A^j B A^{k-1-j}.$$

Proof:

The proof will be by induction. For $k = 2$,

$$(A-B)^2 = (A-B)(A-B) = A^2 - BA - AB + B^2 \approx A^2 - BA - AB = A^2 - \sum_{j=0}^{1} A^j B A^{1-j}$$

as a first-order approximation in $B$. Suppose the result holds for $k$. It follows that

$$(A-B)^{k+1} = (A-B)(A-B)^k \approx (A-B)\Big(A^k - \sum_{j=0}^{k-1} A^j B A^{k-1-j}\Big) \approx A^{k+1} - \sum_{j=0}^{k-1} A^{j+1} B A^{k-1-j} - BA^k = A^{k+1} - \sum_{j=0}^{k} A^j B A^{k-j},$$

and the result holds for $k+1$.
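
A quick numerical check of Lemma 5.3: for a perturbation $B$ of small norm, the first-order expression tracks $(A-B)^k$ up to an error that is second order in $B$; the sizes below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 3, 4
A = rng.standard_normal((n, n))
B = 1e-4 * rng.standard_normal((n, n))       # small perturbation

mp = np.linalg.matrix_power
approx = mp(A, k) - sum(mp(A, j) @ B @ mp(A, k - 1 - j) for j in range(k))
print(np.max(np.abs(mp(A - B, k) - approx))) # O(|B|^2): tiny relative to the first-order terms
```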

To derive the asymptotic variance-covariance matrix of $(B_o^k - B_T^k)\,\tilde{y}_T$, we consider a two-stage transformation, from $B_T$ to $B_T^k$ to $B_T^k\,\tilde{y}_T$. The following lemma gives the asymptotic variance-covariance matrix for the first stage of the transformation.











Lemma 5.4:

Define $A_{Tk} = B_T^k$ and $A_{ok} = B_o^k$. Let $\lambda_{Tk}$ and $\lambda_{ok}$ be the corresponding vector representations of $B_T^k$ and $B_o^k$, respectively. Then

$$\sqrt{T}(\lambda_{Tk} - \lambda_{ok}) \xrightarrow{D} N_{n^2}(0,\Lambda_k) \quad \text{as } T \to \infty,$$

where $\Lambda_k = H_k \Sigma H_k'$, $\Sigma = (G \otimes \Gamma^{-1}(0))$, and $H_k = \sum_{j=0}^{k-1}(B_o^j \otimes B_o'^{\,k-1-j})$.

Proof:

From Lemma 5.3, we have

$$B^k = [B_o - (B_o - B)]^k \approx B_o^k - \sum_{j=0}^{k-1} B_o^j (B_o - B) B_o^{k-1-j},$$

so that

$$B^k - B_o^k \approx \sum_{j=0}^{k-1} B_o^j (B - B_o) B_o^{k-1-j} = \sum_{j=0}^{k-1} B_o^j B B_o^{k-1-j} - k B_o^k.$$

We consider a first-order approximation here because that is what was used in the asymptotic distribution results in Chapter III. The above approximation to $B^k - B_o^k$ is just the Taylor expansion (in matrix form) of $(B^k - B_o^k)$ about $B_o$ to the first-order term. Consequently, all first-order partial derivatives with respect to the $B_{ij}$'s evaluated at $B_{o,ij}$ can be found in this approximating matrix. Since this matrix will be evaluated at $B = B_T$, it is enough to consider $\sum_{j=0}^{k-1} B_o^j B B_o^{k-1-j}$ for determining the asymptotic variance-covariance matrix. By applying Lemma 5.2 to Lemma 3.2, we can conclude from Lemma 3.3 that

$$\sqrt{T}(\lambda_{Tk} - \lambda_{ok}) \xrightarrow{D} N_{n^2}(0,\Lambda_k) \quad \text{as } T \to \infty,$$

where $\Lambda_k = H_k \Sigma H_k'$, $\Sigma = (G \otimes \Gamma^{-1}(0))$, and $H_k = \sum_{j=0}^{k-1}(B_o^j \otimes B_o'^{\,k-1-j})$.


Theorem 5.5:

With $\Lambda_k$ defined as in Lemma 5.4, we have

$$V_T(y_{T+k} - \hat{y}_{T+k}) \approx \frac{1}{T}(I_n \otimes \tilde{y}_T')\,\Lambda_k\,(I_n \otimes \tilde{y}_T) + \sum_{j=0}^{k-1} B_o^j G B_o'^j = \frac{1}{T}(I_n \otimes \tilde{y}_T')\,\Lambda_k\,(I_n \otimes \tilde{y}_T) + \Gamma(0) - B_o^k\,\Gamma(0)\,B_o'^k.$$

Proof:

From the comments following (5.3.9), we know that our objective is to approximate $V[(B_o^k - B_T^k)\,\tilde{y}_T]$. Since $\tilde{y}_T$ is regarded as a constant vector, applying Lemma 3.3 to Lemma 5.4 implies that

$$V[(B_o^k - B_T^k)\,\tilde{y}_T] \approx \frac{1}{T}\,H_y\,\Lambda_k\,H_y',$$

where $H_y = \{\partial h_i(\lambda)/\partial \lambda_j\}$ and $h_i(\lambda) = \sum_{j=1}^{n} \lambda_{i,j}\,\tilde{y}_{T,j}$ for all $i$. We see that $H_y$ is the $n \times n^2$ matrix whose $i$th row contains $(\tilde{y}_{T1},\tilde{y}_{T2},\ldots,\tilde{y}_{Tn})$ in the $i$th block of $n$ positions and zeroes elsewhere; that is,

$$H_y = (I_n \otimes \tilde{y}_T').$$

The result follows by using (5.3.9) with (5.3.7).

Note that the asymptotic normality of $\sqrt{T}\,[(B_T^k - B_o^k)\,\tilde{y}_T]$ was an intermediate step in the preceding proof.
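
Putting Lemma 5.4 and Theorem 5.5 together, the following is a sketch of the plug-in computation of the approximate conditional prediction error covariance. In practice $B_o$ and $G$ would be replaced by estimates; the inputs below are placeholders.

```python
import numpy as np

def prediction_cov(B, G, y_T, k, T):
    """Approximate conditional covariance of the k-step prediction error when B is estimated:
    (1/T)(I kron y_T') Lambda_k (I kron y_T) + Gamma(0) - B^k Gamma(0) B'^k, per Theorem 5.5,
    with Sigma = G kron Gamma(0)^{-1} and H_k = sum_j (B^j kron B'^{k-1-j}) from Lemma 5.4."""
    n = B.shape[0]
    mp = np.linalg.matrix_power
    Gamma0 = np.zeros_like(G)
    term = G.copy()
    for _ in range(500):                                  # series (4.1.4) for Gamma(0)
        Gamma0 += term
        term = B @ term @ B.T
    Sigma = np.kron(G, np.linalg.inv(Gamma0))
    Hk = sum(np.kron(mp(B, j), mp(B.T, k - 1 - j)) for j in range(k))
    Lk = Hk @ Sigma @ Hk.T                                # Lambda_k = H_k Sigma H_k'
    Hy = np.kron(np.eye(n), y_T.reshape(1, -1))           # (I_n kron y_T')
    Bk = mp(B, k)
    return Hy @ Lk @ Hy.T / T + Gamma0 - Bk @ Gamma0 @ Bk.T

B = np.array([[0.5, 0.1], [0.2, 0.4]])
print(prediction_cov(B, np.eye(2), np.array([1.0, -0.5]), k=3, T=200))
```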