ESTIMATION AND PREDICTION FOR CERTAIN MODELS OF SPATIAL TIME SERIES

By

LLOYD MARLIN EBY

A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
1978

TO MY PARENTS AND FAMILY FOR THEIR LOVE AND SUPPORT

ACKNOWLEDGMENTS

My sincere thanks go to my advisors, Dr. Richard Scheaffer and Dr. James McClave. I will always appreciate their patient guidance throughout this project, from suggesting the problem to providing helpful comments on the first draft of this paper. To be able to draw on their experience in research situations was always reassuring. Special thanks go to the faculty and students of the Department of Statistics for their encouragement during the pursuit of this degree. My appreciation extends to Professor Harry Canter at Millersville State College who, because of his enthusiasm for statistics and special interest in his students, was instrumental in my entering this field.

I am deeply grateful for the support of my family and friends. Knowing of their loving concern and prayers for me during this undertaking meant much to me. My typist, Mrs. Edna Larrick, is especially deserving of my thanks. She somehow deciphered my hieroglyphics and turned them into this typing masterpiece. Her perseverance in this difficult task is greatly appreciated.

TABLE OF CONTENTS

ACKNOWLEDGMENTS ... iii
LIST OF TABLES ... vi
ABSTRACT ... vii

CHAPTER
I  INTRODUCTION ... 1
   1.0 Preamble ... 1
   1.1 Introduction to the Spatial Problem ... 1
   1.2 A Literature Review ... 3
   1.3 Our Approach to the Problem ... 7
   1.4 An Outline of Our Results ... 10
   1.5 Notation and Format ... 11
   1.6 Review of Assumptions Introduced in Chapter I ... 11
II  ESTIMATION OF MODEL PARAMETERS ... 15
   2.0 Preamble ... 15
   2.1 The Usual Yule-Walker Estimators ... 15
   2.2 The Known Weights Case ... 17
   2.3 The Variable Weights Case ... 22
   2.4 Review of Assumptions Introduced in Chapter II ... 29
III PROPERTIES OF ESTIMATORS ... 30
   3.0 Preamble ... 30
   3.1 Results for the Usual Yule-Walker Estimators and Another Useful Lemma ... 30
   3.2 The Known Weights Case ... 32
   3.3 The Variable Weights Case ... 35
   3.4 Review of Assumptions Introduced in Chapter III ... 62
IV  ESTIMATORS OF COVARIANCE MATRICES AND THEIR PROPERTIES ... 64
   4.0 Preamble ... 64
   4.1 Results for the General First-Order Autoregressive Multivariate Model ... 64
   4.2 The Yule-Walker #1 and #2 Covariance Estimators ... 67
   4.3 Consistency of the Yule-Walker #1 and #2 Covariance Estimators ... 71
V   INFERENCE ... 74
   5.0 Preamble ... 74
   5.1 Asymptotic Single-Parameter Hypothesis Tests and Confidence Intervals ... 74
   5.2 Asymptotic Multiparameter Hypothesis Tests and Confidence Regions ... 79
   5.3 Prediction with the General First-Order Autoregressive Multivariate Time Series Model ... 82
   5.4 Prediction with the Spatial First-Order Autoregressive Multivariate Time Series Model ... 95
   5.5 Review of Assumptions Introduced in Chapter V ... 97
VI  EMPIRICAL RESULTS ... 98
   6.0 Preamble ... 98
   6.1 Monte Carlo Studies ... 98
   6.2 A Real Data Example ... 127
BIBLIOGRAPHY ... 131
BIOGRAPHICAL SKETCH ... 134

LIST OF TABLES

Table
1.1  Notation ... 12
1.2  Assumptions Introduced in Chapter I ... 14
2.1  Assumptions Introduced in Chapter II ... 29
3.1  Assumptions Introduced in Chapter III ... 63
6.1  Minimum and Maximum Absolute Roots of f(z) = |I − B_r z| = 0 ... 102
6.2  Weights Assigned to the Neighbors of Location 7 ... 103
6.3  The Negative First-Order Correlation of Each Location with Location 7 ... 105
6.4  YW#1 Estimates of a with Actual and Estimated Asymptotic Standard Deviations of a_T1 ... 107
6.5  YW#1 Estimates of b with Actual and Estimated Asymptotic Standard Deviations of b_T1 ... 109
6.6  YW#1 Estimates of α with Actual and Estimated Asymptotic Standard Deviations of α_T1 ... 111
6.7  Actual and Estimated Values of the Asymptotic Covariance of a_T1 and b_T1 ... 119
6.8  Actual and Estimated Values of the Asymptotic Covariance of a_T1 and α_T1 ... 120
6.9  Actual and Estimated Values of the Asymptotic Covariance of b_T1 and α_T1 ... 122
6.10 Mean Squared Errors for Usual Yule-Walker and Yule-Walker #1 Estimates ... 124
6.11 Mean, Range, and Standard Deviation of (a_T1, b_T1, α_T1) ... 125
6.12 Covariances and Correlations of (a_T1, b_T1, α_T1) ... 125
6.13 Names and Coordinates of Employment Exchange Cities ... 128
6.14 YW#1 Estimates of (a, b, α) and the Asymptotic Covariance Matrix of (a_T1, b_T1, α_T1) ... 129

Abstract of Dissertation Presented to the Graduate Council of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

ESTIMATION AND PREDICTION FOR CERTAIN MODELS OF SPATIAL TIME SERIES

By Lloyd Marlin Eby
August 1978

Chairman: Richard L. Scheaffer
Co-Chairman: James T. McClave
Major Department: Statistics

Our primary objective is to consider a special class of first-order autoregressive multivariate time series models in which the individual series correspond to locations on a plane. Conditioned on the past, the expected response at a given location for a given time period is taken to be a linear combination of the immediate past response at that location and a weighted average of the immediate past responses at the other locations. If the weights are not assumed to be known, an exponential weight function of the interlocational distances is used. (We refer to this as the variable weights case.) The form of the weighting function is quite flexible in that it allows for a wide range of weighting schemes which might be appropriate in various applications to both regular and irregular arrays of locations.
Parameters of interest are the two linear coefficients and a parameter in the weight function (in the variable weights case). An estimation procedure is proposed which takes into account the spatial nature of the process through modification of the usual Yule-Walker estimators. Using the results for the usual Yule-Walker estimators, ours are shown to be consistent (in probability) and asymptotically normally distributed for both the known and variable weights cases. A benefit of our approach to the spatial time series problem is that we obtain straightforward asymptotic tests for location, neighbor, and distance effects. Asymptotic joint confidence ellipsoids are also given for these parameters.

We develop an approximation to the variance-covariance matrix of the k-step prediction errors in using the fitted general first-order autoregressive model. The necessary modifications of this matrix for the spatial model are given. We present consistent estimators of the variance-covariance matrices of the error term and the time series. This allows us to consistently estimate all other variance-covariance matrices encountered in our work.

Some simulation results are presented which indicate that the performance of our estimators depends on the location, neighbor, and distance effects as well as array characteristics. There does not appear to be one model specification for which all estimators perform well except for large (by time series standards) samples. An actual data example is also analyzed.

The methodology developed is flexible so that it can have a wide range of application. The procedures presented suggest the possibility for extension of these results to other first-order autoregressive models, both spatial and nonspatial, for which restrictions are placed on the coefficient matrix.

CHAPTER I
INTRODUCTION

1.0 Preamble

The spatial problem being investigated is introduced in Section 1.1 by considering several examples which serve as motivation for our work.
After a review of the literature in Section 1.2, we describe our approach to this problem in Section 1.3. An outline of the results to be presented is given in Section 1.4. Section 1.5 introduces our notation and format for the dissertation.

1.1 Introduction to the Spatial Problem

Many physical processes generate multivariate responses for which the components of the vector response are associated with distinct points in a plane. These responses may be repeated over time. Such processes are referred to as spatial-temporal processes. For example, several weather stations might be located throughout a region, with each station monitoring local conditions on a regular basis. Suppose temperature readings are recorded every hour at each station. We can regard the vector responses of hourly temperatures as a multivariate time series. In addition to expecting a relationship among the vector responses over time, we might expect a spatial relationship among the components of the vector since the individual variates correspond to particular locations in a region. In particular, we might expect that there is a "distance effect" among the locations, with the responses from those stations that are close together being perhaps more strongly related than responses from stations that are far apart. We refer to multivariate time series of this type as spatial time series.

For our real data example in Chapter VI, we consider unemployment rates for ten centers in southwestern England. Each month, the unemployment rate is determined for the region corresponding to each center. These monthly rates for all ten centers constitute a spatial time series.

In modeling a spatial time series, our objective in this paper is to model the nondeterministic component of the series. (We expect to consider the deterministic component as well in future research.)
With this objective in mind, we consider the simplest autoregressive model, the first-order model for which

    y_t = B y_{t−1} + ε_t,    (1.1.1)

where y_t is the vector response at time t, ε_t is an unobservable random error vector, and B is an n × n matrix of coefficients. In the general model, it is not assumed that B has a specific structure which would reflect the spatial nature of the series. Consequently, in applying only the general estimation schemes for B to a spatial problem, we are not explicitly accounting for the spatial aspects of the phenomenon under study.

It would seem desirable to assume a structure for B that reflects the spatial nature of the process. In particular, in considering a response at a given location at time t, it would be of interest to consider the relationship of that response to a response at the same location and to responses at neighboring locations in the previous time period. Factors such as distance should enter into the consideration of the relationship with a particular neighbor.

By assuming such a structure for B and developing estimation procedures based on this structure, we hope to model the underlying process which generated the series. In addition, the structural assumptions would probably mean a reduction in the number of parameters in the model. A parsimonious parameterization is desirable, provided that such a model adequately describes the process, since such a parameterization allows more efficient usage of the sample information. A model which incorporates both the spatial and time aspects of the process would seem to be a better forecasting tool than a model which only includes the time aspect. Before going into more detail on our approach to this problem, we review the literature on related problems.

1.2 A Literature Review

Much of the work in the general area of spatially related random variables has been done with purely spatial processes, where both joint and conditional models have been considered.
For the joint model, the response at location i is related to the responses at the other locations simultaneously. Specializing a joint model to the linear case, we have

    y_i = Σ_{j≠i} β_ij y_j + ε_i,    (1.2.1)

where ε_i is a random error term. For the linear conditional model, the relationship is such that

    E(y_i | responses at other locations) = Σ_{j≠i} γ_ij y_j.    (1.2.2)

At first glance, it would appear that taking the expectation of y_i in (1.2.1) conditional on the responses at the other locations would yield (1.2.2) with γ_ij = β_ij for all i and j ≠ i. However, this is not the case since the error term, ε_i, is not independent of the y_j's. Bartlett (1974), Besag (1974), Brook (1964), Cliff and Ord (1975), and Ord (1975) give more complete discussions of the differences between the two specifications.

Many of the specific results for spatial processes are for regular arrays (for example, rectangular) of locations. Restrictions are usually placed on the coefficients in (1.2.1) and (1.2.2). For example, a simple first-order joint model on a regular lattice is given by

    y_ij = β(y_{i−1,j} + y_{i,j−1} + y_{i+1,j} + y_{i,j+1}) + ε_ij,

where the subscripts correspond to the coordinates of the location. The correlation structure (or spectral function) of some of the joint models (or their continuous analogues) is considered by Bartlett (1974), Besag (1972), Heine (1955), and Whittle (1954). Whittle (1954) developed a maximum likelihood estimation scheme for the parameters of the spectral function. Besag (1974), a major proponent of a conditional approach, discusses a class of conditional models called automodels. Examples are the autonormal, autobinomial, and autologistic. These models are specified by the probability (or density) function of y_i conditional on the responses at all other locations. Although these models can be specified for both regular and irregular arrays of locations, the statistical analysis is generally limited to the regular lattice cases.
Besag (1974) shows that it can be quite difficult to use maximum likelihood procedures directly and thus discusses two alternative approaches. The first relies on a subsetting of the responses (called coding) which results in a simpler likelihood. In the second, another simpler maximum likelihood procedure results when a unilateral approximation to the original process is used. (For the unilateral approach, the concept of one-directional dependency in an autoregressive time series is extended to two dimensions.) Besag and Moran (1975) use the coding procedure to develop a test of spatial dependency for an autonormal process.

Although irregular arrays may be less attractive mathematically, they are of interest for practical reasons since many spatial processes occur naturally on irregular arrays. Cliff and Ord have done extensive work in this area. Their approach has been to specify weights that are functions of array characteristics such as interlocational distances and region size. (See Cliff and Ord (1969, 1975), Cliff, Haggett et al. (1975:148-149, 161), or Mead (1971) for examples.) For example, a joint model could be specified such that

    y_i = ρ Σ_{j=1, j≠i}^{n} w_ij y_j + ε_i,    (1.2.3)

where the w_ij's are known weights and ε_i is a random error term. (The approach also extends to the conditional case.) A natural extension would be a restricted parameterization of the weights so that sample information could be used to estimate them.

Two types of inference problems are considered for Cliff and Ord-type models. The first involves tests for spatial autocorrelation and the second involves parameter estimation. Cliff, Haggett et al. (1975:152-155) present a parametric test (under normal assumptions) and a nonparametric test of H_0: ρ = 0, where ρ is as in (1.2.3) or its conditional analogue. Both test statistics, under the null hypothesis, have asymptotic normal distributions (as n → ∞).
Cliff and Ord (1972) develop a similar test for spatial correlation among the error residuals in a linear regression. Maximum likelihood estimation procedures (under normal assumptions) are presented by Ord (1975) for both the model in (1.2.3) and an extension which includes regressor variables. Maximum likelihood procedures for some other models are outlined in Cliff and Ord (1975).

Another approach to modeling spatial processes has been to think of the responses as a surface and fit polynomial models of the form

    y = Σ_{i=0}^{m} Σ_{j=0}^{r} β_ij x_1^i x_2^j + ε,

where x_1 and x_2 are the map coordinates and ε is a random error term. (See Cliff, Haggett et al. (1975:49-70).) This is an example of a trend surface model.

A somewhat different class of spatial processes is the class of spatial point processes. These processes are characterized by the distribution of points across a region. The literature is fairly extensive in this area. Two important types of analysis of point processes are the distance methods and the quadrat count methods. A sampling of results for these and related methods can be found in the work of Diggle (1975), Holgate (1972), Mead (1974), Rogers (1974), and Strauss (1975).

Spatial-temporal processes are an extension of purely spatial processes. Both Granger (1969) and Cliff, Haggett et al. (1975:107-141) have used standard multivariate time series techniques (cross-spectral analysis) in comparing time series corresponding to locations in a region. Cross-sectional time series analysis may be appropriate for some spatial problems where the cross sections are taken over regions or locations. Swamy and Mehta (1977) consider a linear model for cross-sectional time series in which the coefficient vector is taken to be the sum of a mean vector and two random components. One component varies over time and among individuals (which could be locations) and the other varies only over individuals.
Fuller and Battese (1974) consider estimation of a linear model for cross-sectional time series but assume an error term which is the sum of location and individual components (possibly random) and another random component. Both Maddala (1971) and Nerlove (1971) have studied estimation for error-component linear models (somewhat similar to Fuller and Battese's model) which contain a single lagged value of the dependent (univariate) variable. Cliff and Ord (1970) discuss estimation schemes and testing procedures for the coefficient vectors of a linear model for cross-sectional time series. Constraints on the coefficient vectors, such as equality for all individuals (or over time), are considered. They also develop some estimation procedures when the coefficient vector is random.

Although we found a number of related problems in our literature review, we found little evidence of statistical procedures developed for a spatially restricted coefficient matrix for the model in (1.1.1). This research develops such procedures.

1.3 Our Approach to the Problem

In Section 1.1, we suggested that a first-order spatial time series model should incorporate location, neighbor, and distance effects in the structure of B. We will do this by considering the response for time t at location i, y_{t,i}, to be of the form

    y_{t,i} = a y_{t−1,i} + b Σ_{j=1, j≠i}^{n} w_ij y_{t−1,j} + ε_{t,i},    (1.3.1)

where ε_{t,i} is a random error term, a and b are parameters whose values are unknown, and n is the number of locations in the array. The w_ij's are weights which may be completely known or contain one or more parameters to be estimated from the sample information. We make three assumptions concerning the weights.

A1: For all w_ij, 0 ≤ w_ij ≤ 1.
A2: For all i, w_ii = 0.
A3: The weights are scaled to add to unity for each location. That is, Σ_{j=1, j≠i}^{n} w_ij = 1 for all i.

Since y_{t−1,i} already enters the model with a as its coefficient, we set w_ii = 0 for all i.
The other two assumptions are made to provide a consistent class of models. (For example, the total weight should not depend on the number of locations in the array.) The necessity of these assumptions will be seen as they are used in the derivation of certain results in later chapters.

By considering all three assumptions, we see that Σ_{j=1, j≠i}^{n} w_ij y_{t−1,j} is just a weighted average of the responses at time (t−1) for all locations other than i. It follows that the parameters, a and b, can be regarded as accounting for a location effect and a neighbor effect, respectively. If a is zero, only the neighboring locations of i are explicitly related to y_{t,i}. However, if b = 0, none of i's neighbors appears explicitly in the model for y_{t,i}. (By a neighbor of location i, we mean any location other than i, and not just contiguous neighbors.)

The nature of the distance effect among the neighbors would determine the form of the weights. If a distance effect is to be considered, there must be at least two different interlocational distances, and thus the need for an additional assumption.

A4: There are at least three locations in the array. If there are exactly three, the array is not in the form of an equilateral triangle.

The model in (1.3.1) is a specific case of a more general model suggested in Cliff, Haggett et al. (1975:202). By referring to the model in (1.3.1) as "our model," we do not intend to suggest originality on our part in the model formulation, but we do develop original methods of parameter estimation, particularly in the variable weights case. We also refer to this model as "the spatial model."

Writing the model in (1.3.1) in matrix form yields

    y_t = (a I + b W) y_{t−1} + ε_t,

where W is the matrix of weights (all diagonal terms are zero) and I is the n × n identity matrix. We summarize the restrictions on B as follows.

A5: For a first-order autoregressive spatial time series, the model in (1.1.1) is such that B = B_r, where B_r,ii = a for all i and B_r,ij = b w_ij for all i and j ≠ i.

With this model specification, one objective is to estimate a, b, any parameters in the weight function, and the variance-covariance matrix of the error terms. Another objective is to make the modifications necessary to use this model in forecasting.

1.4 An Outline of Our Results

We consider two cases of the spatial model. In the first, the weights are assumed to be completely specified (the known weights case), and in the second, the weights are of a specific form but contain a parameter to be estimated (the variable weights case). In Chapter II, we develop estimation schemes for the location and neighbor parameters in both cases and also for the distance effect parameter in the variable weights case. These schemes involve modification of the usual Yule-Walker estimators according to the specific structure assumed for B (i.e., B_r). In Chapter III, we show the existence of finite-valued estimators using these schemes. These estimators are also shown to be consistent (in probability) and asymptotically normally distributed. The asymptotics are in terms of T, the number of vector responses observed in time, and not n, the number of locations in the array. Consistent estimators of the variance-covariance matrices of both the random error term, ε_t, and y_t are presented in Chapter IV. In Chapter V, we focus on inferential aspects. Procedures based on asymptotic results are given for testing hypotheses and constructing confidence ellipsoids for the location, neighbor, and distance (if appropriate) parameters. We also derive an approximation to the variance-covariance matrix of the k-step prediction errors in using a fitted general first-order autoregressive model and make the necessary modifications for the case of the fitted spatial model. We conclude in Chapter VI by presenting simulation results which provide insight into some of the procedures developed in earlier chapters. We also analyze a real data set.
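To make the matrix form concrete, the following is a minimal simulation sketch of the spatial model, assuming NumPy; the function name is ours, and Gaussian errors are used purely for illustration (the model itself assumes no particular error distribution):

```python
import numpy as np

def simulate_spatial_ar1(a, b, W, G, T, burn_in=200, seed=0):
    """Draw T observations from y_t = (a*I + b*W) y_{t-1} + e_t.

    W : n x n scaled weight matrix (zero diagonal, rows summing to one; A1-A3),
    G : n x n variance-covariance matrix of the error term e_t (A7).
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    B = a * np.eye(n) + b * W            # the restricted matrix B_r of A5
    # A6 (stationarity): all roots of |I - Bz| = 0 outside the unit circle,
    # equivalently all eigenvalues of B strictly inside it.
    if np.abs(np.linalg.eigvals(B)).max() >= 1:
        raise ValueError("parameters violate the stationarity assumption A6")
    y = np.zeros(n)
    draws = []
    for t in range(burn_in + T):
        y = B @ y + rng.multivariate_normal(np.zeros(n), G)
        if t >= burn_in:                 # discard burn-in so the series is
            draws.append(y)              # effectively stationary
    return np.asarray(draws)             # shape (T, n)
```

For example, with three equally weighted locations (w_ij = 1/2 for j ≠ i), a = 0.4 and b = 0.3 give eigenvalues of B equal to 0.7, 0.25, 0.25, so the stationarity check passes.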
1.5 Notation and Format

Since notation in time series work can be quite cumbersome, we summarize our notational system in Table 1.1. From time to time, we introduce certain assumptions, and as we introduce each one, we give the rationale for it. It is to be understood that the assumption is in effect for the remainder of the paper. At the end of each chapter, we list all assumptions introduced in that chapter.

1.6 Review of Assumptions Introduced in Chapter I

The assumptions introduced in Chapter I are summarized in Table 1.2.

Table 1.1 Notation

Notation              Interpretation
A_i.                  row i of matrix A
A_.j                  column j of matrix A
A_ij                  the element in row i and column j of matrix A
A_c,ij                the element in row i and column j of matrix A_c
{A_ij}                the matrix comprised of the A_ij's
(ABC)_ij              the element in row i and column j of matrix ABC
A'                    A transposed
|A|                   the determinant of the matrix A
A ⊗ C                 the Kronecker product of A and C
I_n                   the n × n identity matrix
x                     the vector, x
x_i                   the i-th element of x
x_c,i                 the i-th element of x_c
{x_m}, m = 1, 2, ...  the sequence, x_1, x_2, x_3, ...
f(·)                  the function, f
x*                    a particular value of the random variable x
x_T →_P x             x_T converges to x in probability
x_T →_D x             x_T converges to x in distribution or law
r_T → r*              convergence of a sequence of constants
≈                     is approximately equal to
~                     is distributed as
N_k(μ, Σ)             the k-variate normal distribution with mean μ and variance-covariance matrix Σ
N(μ, σ²)              the univariate normal distribution with mean μ and variance σ²
θ_0                   the true value of θ when θ is a parameter
θ_T                   an estimator of θ based on T observations
R^k                   k-dimensional real space
iff                   if and only if
glb                   greatest lower bound
in 3.2.1              in Section 3.2.1
in (3.2.1)            in equation (3.2.1)
A1                    assumption #1
C1                    condition #1
R1                    result #1
Smith (1975:27)       page 27 of the reference authored by Smith and published in 1975

Table 1.2 Assumptions Introduced in Chapter I

Section  Assumption
1.3      A1: For all w_ij, 0 ≤ w_ij ≤ 1.
1.3      A2: For all i, w_ii = 0.
1.3      A3: The weights are scaled to add to unity for each location. That is, Σ_{j=1, j≠i}^{n} w_ij = 1 for all i.
1.3      A4: There are at least three locations in the array. If there are exactly three, the array is not in the form of an equilateral triangle.
1.3      A5: For a first-order autoregressive spatial time series, the model in (1.1.1) is such that B = B_r, where B_r,ii = a for all i and B_r,ij = b w_ij for all i and j ≠ i.

CHAPTER II
ESTIMATION OF MODEL PARAMETERS

2.0 Preamble

In this chapter, we will consider estimation schemes for parameters other than the variance and covariance terms in the special first-order autoregressive model introduced in Chapter I. Since the estimation procedures to be introduced involve modifications of the usual Yule-Walker (YW) estimators, a review of the YW estimation procedure will be presented in Section 2.1. The estimation procedures for the known weights case and variable weights case are presented in Sections 2.2 and 2.3, respectively. The properties of the estimators will be derived in Chapter III.

2.1 The Usual Yule-Walker Estimators

Hannan (1970:13-15, 326-333) and Fuller (1976:72-73) are the primary references for the results of this section. Again consider the model for the general first-order autoregressive multivariate time series,

    y_t = B y_{t−1} + ε_t,    (2.1.1)

where y_t, y_{t−1}, and ε_t are vectors of length n and B is n × n. The following assumptions are made.

A6: All roots of f(z) = |I − Bz| = 0 lie outside the unit circle.
A7: The error terms, the ε_t's, are independent and identically distributed with mean, 0, and variance-covariance matrix, G.

There are three implications of A6 and A7 that should be noted at this stage. The results are given here without proof. The first is that

    E(y_t) = 0 for all t.    (2.1.2)

The second is that y_t is second-order stationary. That is, the covariance function has the following property:

    E(y_s y_t') = Γ(s−t) for all s and t.    (2.1.3)

A third implication of A6 and A7 is that ε_t is independent of y_{t−1}, y_{t−2}, ..., for all t.
It is now apparent that in making assumptions A6 and A7, we are assigning a stability to the process in terms of its first two moments. It should also be noted that since our special first-order model can be included within the general framework of the model in (2.1.1), A6 and A7 will be assumed throughout for our special model, and so results like (2.1.2) and (2.1.3) will still follow.

If both sides of (2.1.1) are multiplied by y_{t−1}' and expectations are taken, we have, after applying (2.1.2), (2.1.3), A6 and A7,

    Γ(1) = B Γ(0).

This leads to

    B = Γ(1) Γ^{−1}(0).

The usual YW estimator of B, B_T, is found by replacing the parameters on the right-hand side of the above equation with their "moment" estimators. That is,

    B_T = Γ_T(1) Γ_T^{−1}(0),    (2.1.4)

where

    Γ_T,ij(k) = (1/T) Σ_{l=1}^{T−k} y_{l+k,i} y_{l,j},  k = 0, 1.

This estimator, B_T, defines a process which satisfies, with probability one, the conditions for stationarity given in A6.

As was noted, an implication of A6 and A7 is that E(y_t) = 0. This is somewhat unrealistic if y_t is regarded as the vector observation at time t. Thus, we will let x_t denote the vector observation at time t and assume y_t to be as in A8.

A8: Let y_t = x_t − μ for all t, where E(x_t) = μ.

The calculations necessary in (2.1.4) are then carried out using x_t − x̄, t = 1, 2, ..., T, where x̄ = Σ_{t=1}^{T} x_t / T. Hannan shows that this mean correction does not change any asymptotic properties of interest in our work. Consequently, for the remainder of the theoretical considerations in this paper, it will be assumed, without loss of generality, that the mean correction has already been made.

A9: Let y_t = x_t − x̄, t = 1, 2, ..., T, be the observations used to fit the model in (2.1.1).

2.2 The Known Weights Case

2.2.1 Introduction

We now work with the special form of the coefficient matrix, B, which we denote by B_r, where

    B_r,ii = a,  i = 1, 2, ..., n,

and

    B_r,ij = b w_ij,  i, j = 1, 2, ..., n;  i ≠ j.

The w_ij's are known weights for which we assume A1, A2, and A3.
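The moment calculation in (2.1.4), including the mean correction of A8 and A9, can be sketched as follows (a minimal illustration assuming NumPy; the function name is ours):

```python
import numpy as np

def yule_walker_B(x):
    """Usual Yule-Walker estimator B_T = Gamma_T(1) Gamma_T(0)^{-1}, eq. (2.1.4).

    x : array of shape (T, n), one vector observation per row.
    """
    y = x - x.mean(axis=0)          # mean correction: y_t = x_t - x-bar (A9)
    T = y.shape[0]
    G0 = y.T @ y / T                # Gamma_T(0) = (1/T) sum_l y_l y_l'
    G1 = y[1:].T @ y[:-1] / T       # Gamma_T(1) = (1/T) sum_l y_{l+1} y_l'
    return G1 @ np.linalg.inv(G0)
```

On data simulated from a stationary first-order model, the returned matrix should be close to the true B for large T (by time series standards).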
Our objective then is to estimate a and b, which, in turn, allows us to estimate B_r. Since the weights are assumed to be known in this section, it would be helpful to first consider some possible choices of weights.

2.2.2 Examples of Known Weights

It was stated in the introduction that most of the work done with spatial processes on irregular lattices has been with known weights. Ord (1974), in the discussion of Besag's paper, states that one of the specifications of a spatial model arises when the spatial relationship is in the form of a time lag, which is true of our model although the specification is different. Consequently, some of the weighting patterns that have been suggested or used in the literature for spatial models are presented here, since they may be appropriate for the spatial-temporal processes that we consider. If the researcher possesses considerable insight into the process being studied, it may be reasonable to completely specify an appropriate weighting scheme. In the following examples, the weights will be presented in the unscaled form.

The simplest weighting scheme is: w_ij = 1 if location j is a nearest neighbor of location i, j ≠ i, and w_ij = 0 otherwise. (See Cliff, Haggett et al. (1975:161).) We will refer to models with this weight structure as "closest-neighbor" models.

If one had regional data and wished to consider the relative size of the regions and distances between their centers, one might use the scheme,

    w_ij = q_i(j) / d_ij,  j ≠ i,

where q_i(j) is the proportion of location i's interior boundary which is in contact with the boundary of location j and d_ij is the distance between location i and location j. (See Ord (1975).)

Both of the above weighting schemes assign nonzero weights only to those locations which are direct or contiguous neighbors. If all neighbors are to be taken into account in the weighting scheme, one might use the weights

    w_ij = d_ij^δ,  j ≠ i,

where δ is specified, or an exponential function of distance,
    w_ij = e^{−α d_ij},  j ≠ i,

where α is specified. (See Cliff and Ord (1975).) For both of these weight schemes, we see that the weights either increase or decrease monotonically as d_ij increases, the direction of change depending on the signs of δ and α.

2.2.3 The Yule-Walker #2 Estimation Procedure for the Known Weights Case

Since using the usual YW estimators to fit the first-order autoregressive time series model, when B = B_r, does not account explicitly for the spatial nature of the process being considered, it is desirable to develop an estimation scheme which does account for this spatial nature in a more direct fashion. By checking assumption A7, we see that no distribution has been assumed for the ε_t's. Since this allows for a wider range of application in our work, it would seem advantageous to develop an estimation procedure which is distribution-free. The distribution-free results given for B_T in Sections 2.1 and 3.1 suggest estimation procedures which modify B_T.

There are various criteria by which one might modify the usual YW estimator to reflect the spatial nature of the process. One criterion is to use as an estimator of B_r those estimators of a and b which make the B_rT,ij's as "close" as possible to the usual YW estimators, the B_T,ij's. This criterion suggests a least squares approach. In this case, take as the estimators, a_T2 and b_T2, those values of a_T and b_T which minimize

    SS = Σ_{i=1}^{n} Σ_{j=1}^{n} (B_T,ij − B_rT,ij)²,    (2.2.1)

where B_rT,ii = a_T and B_rT,ij = b_T w_ij, j ≠ i. Also, B_T is the matrix of usual YW estimators given by (2.1.4). (The subscript "2" indicates that these are the YW#2 estimators; the significance of the "2" will become apparent later.) Because of the form of B_rT, the sum of squares function given in (2.2.1) can be separated into two parts, the diagonal sum of squares and the off-diagonal sum of squares, as shown below:

    SS = Σ_{i=1}^{n} (B_T,ii − a_T)² + Σ_{i=1}^{n} Σ_{j=1, j≠i}^{n} (B_T,ij − b_T w_ij)².    (2.2.2)
The value of a_T which minimizes the above sum of squares is the value which minimizes the left-hand component in (2.2.2). A similar statement can be made about the minimizing b_T-value relative to the right-hand component in (2.2.2). By taking partial first derivatives of (2.2.2) and equating them to 0, one finds that

    a_{T2} = ( Σ_{i=1}^{n} B_{T,ii} ) / n    (2.2.3)

and

    b_{T2} = Σ_{i=1}^{n} Σ_{j≠i} B_{T,ij} u_{ij},    (2.2.4)

where

    u_{ij} = w_{ij} / ( Σ_{k=1}^{n} Σ_{ℓ≠k} w_{kℓ}² ).

From this discussion, we see that the YW#2 estimators of a and b can be found through a two-step procedure.

Step 1: Find the usual YW estimator of B (and hence B_r) by using (2.1.4).
Step 2: Find a_{T2} and b_{T2} from (2.2.3) and (2.2.4), respectively.

2.2.4 The Yule-Walker #1 Estimation Procedure for the Known Weights Case

In this estimation scheme, a property of the weights is used to find another estimator of b. Let a_{T1} be the same as a_{T2} given in (2.2.3). To find b_{T1}, note that

    Σ_{i=1}^{n} Σ_{j≠i} B_{r,ij} = Σ_{i=1}^{n} Σ_{j≠i} b w_{ij} = b Σ_{i=1}^{n} Σ_{j≠i} w_{ij} = b Σ_{i=1}^{n} 1 = n b,

since the scaled weights sum to one for each location. This suggests that b could be estimated by

    b_{T1} = ( Σ_{i=1}^{n} Σ_{j≠i} B_{T,ij} ) / n.    (2.2.5)

In comparing (2.2.5) with (2.2.4), it is seen that (2.2.5) is just a special case of (2.2.4) in which u_{ij} = 1/n for all i and j ≠ i. Note that b_{T1} will be the least squares estimator if the weights are w_{ij} = 1/(n−1) for all i and j ≠ i. This observation will make our theoretical considerations in later chapters easier in the sense that we need only consider YW#2 in the known weights case. The two-step procedure for the YW#1 estimators is as follows.

Step 1: Find the usual YW estimator of B by using (2.1.4).
Step 2: Find a_{T1} and b_{T1} from (2.2.3) and (2.2.5), respectively.

2.3 The Variable Weights Case

2.3.1 Introduction

In the study of spatial processes, one may be willing, in a particular situation, to specify the form of the weights but not their specific values.
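The two-step known-weights procedures of 2.2.3 and 2.2.4 amount to a few lines of arithmetic once B_T is in hand. The following sketch is our own illustration: the YW estimate B_T and the scaled weights w are made-up values, not data from the dissertation.

```python
# Sketch (assumed inputs): given a usual YW estimate B_T and known scaled
# weights w, compute the YW#2 estimators (2.2.3)-(2.2.4) and the YW#1
# estimator (2.2.5).

def yw2_known_weights(B, w):
    n = len(B)
    a = sum(B[i][i] for i in range(n)) / n                          # (2.2.3)
    denom = sum(w[k][l] ** 2 for k in range(n) for l in range(n) if l != k)
    b = sum(B[i][j] * w[i][j] / denom
            for i in range(n) for j in range(n) if j != i)          # (2.2.4)
    return a, b

def yw1_known_weights(B):
    n = len(B)
    a = sum(B[i][i] for i in range(n)) / n
    b = sum(B[i][j] for i in range(n) for j in range(n) if j != i) / n  # (2.2.5)
    return a, b

# Made-up 3x3 YW estimate and scaled weights (off-diagonal rows sum to 1).
B_T = [[0.50, 0.20, 0.10],
       [0.15, 0.45, 0.15],
       [0.05, 0.25, 0.55]]
w = [[0.0, 0.6, 0.4],
     [0.5, 0.0, 0.5],
     [0.3, 0.7, 0.0]]
a2, b2 = yw2_known_weights(B_T, w)
a1, b1 = yw1_known_weights(B_T)

# With equal weights w_ij = 1/(n-1), YW#2 reduces to YW#1 (u_ij = 1/n):
w_eq = [[0.0 if j == i else 0.5 for j in range(3)] for i in range(3)]
b2_eq = yw2_known_weights(B_T, w_eq)[1]
```

The last two lines check numerically the observation made above: when every location weights its n − 1 neighbors equally, the YW#2 estimator of b coincides with the YW#1 estimator.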
In these situations, the weight function would contain a parameter or parameters to be estimated using the sample information. We will consider the weight function of the form (before scaling),

    v_{ij}(α) = e^{-α d_{ij}},    j ≠ i.    (2.3.1)

(From this point on, the notation "w_{ij}" will be reserved for the known weights case.) This weight function was introduced in 2.2.2, but now α is a parameter to be estimated from the sample information. This particular weight function is investigated in the following section, after which we present procedures for estimating a, b, and α.

2.3.2 Properties of the Exponential Weight Function

The exponential weight function takes distance into account in a reasonable way, exponentially decreasing or increasing as distance increases depending on the sign of α. If α = 0, each neighbor receives identical weight. For these reasons, one can label α a "distance effect" parameter (assuming b ≠ 0). Because of the explicit dependence of these weights on distance, the weights are suitable for both regular and irregular arrays of locations.

This weight function has certain mathematical properties which allow one to develop the statistical and numerical properties of α_T. One such property is continuity everywhere as a function of α. Another concerns the limits of the function as |α| tends to ∞. Now, in scaled form,

    v_{ij}(α) = e^{-α d_{ij}} / Σ_{k≠i} e^{-α d_{ik}} = 1 / Σ_{k≠i} e^{α(d_{ij} − d_{ik})},    j ≠ i.    (2.3.2)

Let c_i = the number of locations j for which d_{ij} = min_{k≠i} {d_{ik}}, and let f_i = the number of locations j for which d_{ij} = max_{k≠i} {d_{ik}}.

Let us first consider the limiting case as α tends to +∞. It is enough to consider the limiting behavior of the components in the denominator of v_{ij}(α) in (2.3.2). For k ≠ i,

    e^{α(d_{ij} − d_{ik})} → ∞  iff  d_{ij} > d_{ik},
    e^{α(d_{ij} − d_{ik})} → 1  iff  d_{ij} = d_{ik},
    e^{α(d_{ij} − d_{ik})} → 0  iff  d_{ij} < d_{ik}.

From the limiting behavior of these components, it is clear that

    lim_{α→+∞} v_{ij}(α) = 1/c_i  if  d_{ij} = min_{k≠i} {d_{ik}},  and 0 otherwise.
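This limiting behavior can be checked numerically. The sketch below (our own; the irregular distance matrix is made up) evaluates the scaled weights of (2.3.2) at α = 0 and at large positive and negative α.

```python
import math

# Sketch: scaled exponential weights v_ij(alpha) of (2.3.2) and their
# limiting behavior on a made-up irregular array.

def v_row(d, i, alpha):
    """Row i of v_ij(alpha) = e^{-alpha d_ij} / sum_{k != i} e^{-alpha d_ik}."""
    n = len(d)
    expo = [math.exp(-alpha * d[i][k]) if k != i else 0.0 for k in range(n)]
    s = sum(expo)
    return [e / s for e in expo]

d = [[0, 1, 2, 4],
     [1, 0, 3, 2],
     [2, 3, 0, 1],
     [4, 2, 1, 0]]

row0_zero = v_row(d, 0, 0.0)     # alpha = 0: every neighbor weighted equally
row0_big  = v_row(d, 0, 50.0)    # alpha -> +inf: closest-neighbor weights
row0_neg  = v_row(d, 0, -50.0)   # alpha -> -inf: farthest-neighbor weights
```

For location 0 the nearest neighbor is location 1 (d = 1) and the farthest is location 3 (d = 4), so at α = ±50 essentially all the weight concentrates on those locations, while at α = 0 each of the three neighbors receives weight 1/3.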
In the limiting case, then, we have the weights corresponding to the closest-neighbor model introduced in 2.2.2. Let us now consider the limiting case as α tends to −∞. By observing the result in the previous case, one might conjecture an analogous result here, with the weights corresponding to a "farthest-neighbor" model. Indeed, it follows that

    lim_{α→−∞} v_{ij}(α) = 1/f_i  if  d_{ij} = max_{k≠i} {d_{ik}},  and 0 otherwise.

We thus see that the exponential function allows flexibility in the weights for the spatial process.

2.3.3 The Yule-Walker #2 Estimation Procedure for the Variable Weights Case

The criterion used in deriving the YW#2 estimators here is the same as that used in 2.2.3 (i.e., least squares). As before, the sum of squares function to be minimized is split into two components:

    SS = Σ_{i=1}^{n} (B_{T,ii} − a_T)² + Σ_{i=1}^{n} Σ_{j≠i} [B_{T,ij} − b_T v_{ij}(α_T)]²,    (2.3.3)

where v_{ij}(α) is given by (2.3.2). Then a_{T2}, b_{T2}, and α_{T2} are those values of a_T, b_T, and α_T, respectively, which minimize the sum of squares function in (2.3.3). Taking the first derivative of this function with respect to a_T and equating it to 0 yields

    a_{T2} = ( Σ_{i=1}^{n} B_{T,ii} ) / n.    (2.3.4)

Similar action in terms of b_T yields

    b_{T2} = Σ_{i=1}^{n} Σ_{j≠i} B_{T,ij} v_{ij}(α_T) / Σ_{k=1}^{n} Σ_{ℓ≠k} [v_{kℓ}(α_T)]².    (2.3.5)

After seeing the form of our sum of squares function in (2.3.3), it is not surprising that our results here agree with those for the YW#2 estimators of a and b in the known weights case. Equation (2.3.4) agrees with (2.2.3), and (2.3.5) agrees with (2.2.4) if a value of α_T is specified.

A result that will be useful in simplifying our work is, for j ≠ i,

    ∂v_{ij}(α_T)/∂α_T = v_{ij}(α_T) [ Σ_{k=1}^{n} d_{ik} v_{ik}(α_T) − d_{ij} ].    (2.3.6)

Now taking the first partial derivative of the sum of squares function in (2.3.3) with respect to α_T and equating it to 0 yields

    b_T Σ_{i=1}^{n} Σ_{j≠i} [B_{T,ij} − b_T v_{ij}(α_T)] v_{ij}(α_T) [ d_{ij} − Σ_{k=1}^{n} d_{ik} v_{ik}(α_T) ] = 0.    (2.3.7)

Then α_{T2} is the solution to (2.3.7) with b_T replaced by b_{T2}, given in (2.3.5).
The resulting equation can be simplified a bit by dividing through by b_{T2}. This modification necessitates the assumption that b_{T2} is nonzero.

A10: In the variable weights case, b_{T2} ≠ 0.

The necessity of this assumption is seen by examining equation (2.3.5) and the sum of squares function in (2.3.3). Any α_{T2}-value which would lead to b_{T2} = 0 in (2.3.5) must be meaningless, because it is obvious from (2.3.3) that if b_{T2} = 0, α cannot be estimated. Therefore α_{T2} is the solution to the following equation,

    Σ_{i=1}^{n} Σ_{j≠i} [B_{T,ij} − b_{T2} v_{ij}(α_T)] v_{ij}(α_T) [ d_{ij} − Σ_{k=1}^{n} d_{ik} v_{ik}(α_T) ] = 0,    (2.3.8)

where b_{T2} is given by (2.3.5). The YW#2 estimation procedure can be summarized in two steps.

Step 1: Find the usual YW estimator of B by using (2.1.4).
Step 2: Find α_{T2} implicitly from (2.3.8), and then find a_{T2} and b_{T2} explicitly from (2.3.4) and (2.3.5), respectively.

Two problems arise with this estimation procedure. First, the implicit solution to (2.3.8), and its determination, is complicated by the fact that b_{T2} is also a function of α_{T2}. Future research would indicate whether or not this would be a problem numerically; in any case, the evaluation of the statistical properties would be more difficult. The second potential problem occurs in the presence of a weak neighbor effect (i.e., b close to 0). Since a distance effect can be identified only if a neighbor effect is present, it might be difficult to get a clear picture of any distance effect if the neighbor effect itself is small. This suggests that α_{T2}'s behavior might be erratic (i.e., large variance) in the presence of a weak neighbor effect. Moreover, since the estimators of b and α are intertwined in the YW#2 procedure, it appears that there may be an effect on both b_{T2} and α_{T2} in this case. These problems, real and potential, should serve as motivation to consider, at least initially, other estimation schemes in which b is estimated independently of α. Such a scheme, YW#1, is presented in the next section.
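The intertwining of b_{T2} and α_{T2} just described can be seen in a small numerical sketch. The code below is our own illustration, not the dissertation's computations: instead of solving (2.3.8), it profiles the off-diagonal sum of squares of (2.3.3) over a grid of α-values, recomputing b_{T2}(α) from (2.3.5) at every trial α; all input values are made up.

```python
import math

# Sketch: YW#2 in the variable weights case, with the implicit equation
# (2.3.8) replaced by a grid search.  For each trial alpha, b_T2(alpha)
# is computed from (2.3.5), showing how b and alpha are intertwined.

def scaled_v(d, alpha):
    """Scaled exponential weights v_ij(alpha) of (2.3.2)."""
    n = len(d)
    V = [[0.0] * n for _ in range(n)]
    for i in range(n):
        expo = [math.exp(-alpha * d[i][k]) if k != i else 0.0 for k in range(n)]
        s = sum(expo)
        for j in range(n):
            V[i][j] = expo[j] / s
    return V

def yw2_profile(B, d, grid):
    n = len(B)
    a = sum(B[i][i] for i in range(n)) / n                          # (2.3.4)
    def b_of(alpha):                                                # (2.3.5)
        V = scaled_v(d, alpha)
        num = sum(B[i][j] * V[i][j] for i in range(n) for j in range(n) if j != i)
        den = sum(V[k][l] ** 2 for k in range(n) for l in range(n) if l != k)
        return num / den
    def ss(alpha):                        # off-diagonal component of (2.3.3)
        V = scaled_v(d, alpha)
        b = b_of(alpha)
        return sum((B[i][j] - b * V[i][j]) ** 2
                   for i in range(n) for j in range(n) if j != i)
    alpha = min(grid, key=ss)
    return a, b_of(alpha), alpha

# Exact-model check: build B from known (a_o, b_o, alpha_o) and recover them.
d = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
V_o = scaled_v(d, 0.7)
B = [[0.4 if i == j else 0.3 * V_o[i][j] for j in range(3)] for i in range(3)]
grid = [k / 100 for k in range(-200, 201)]
a2, b2, alpha2 = yw2_profile(B, d, grid)
```

When B is built exactly from the model with (a_o, b_o, α_o) = (0.4, 0.3, 0.7), the profiled sum of squares is zero at α = 0.7 and the procedure recovers all three values; note that a non-equilateral array is used, as required for α to be identifiable.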
All additional work for the variable weights case has been for the YW#1 estimators. One aspect of future research will involve the study of the YW#2 estimators in this case. At this stage of the discussion, it should now be clear why the numerical labels for these estimation procedures were given as they were.

2.3.4 The Yule-Walker #1 Estimation Procedure for the Variable Weights Case

In 2.2.4, an estimator of b was introduced which did not use any property of the weights other than that they were scaled to add to one for each location. It is that estimator which will be used now in the variable weights case. The estimator of a is unchanged. That is,

    a_{T1} = ( Σ_{i=1}^{n} B_{T,ii} ) / n    (2.3.9)

and

    b_{T1} = ( Σ_{i=1}^{n} Σ_{j≠i} B_{T,ij} ) / n.    (2.3.10)

Then α_{T1} is the α_T-value which minimizes the following sum of squares:

    SS = Σ_{i=1}^{n} Σ_{j≠i} [B_{T,ij} − b_{T1} v_{ij}(α_T)]².    (2.3.11)

Using (2.3.6), taking the first derivative of the function in (2.3.11) with respect to α_T, and equating it to 0, we have, after simplifying,

    b_{T1} Σ_{i=1}^{n} Σ_{j≠i} [B_{T,ij} − b_{T1} v_{ij}(α_T)] v_{ij}(α_T) [ d_{ij} − Σ_{k=1}^{n} d_{ik} v_{ik}(α_T) ] = 0.

We assume that b_{T1} is nonzero for basically the same reasons as were given in the previous section.

A11: In the variable weights case, b_{T1} ≠ 0.

It follows, then, that α_{T1} is a solution to the following equation:

    Σ_{i=1}^{n} Σ_{j≠i} [B_{T,ij} − b_{T1} v_{ij}(α_T)] v_{ij}(α_T) [ d_{ij} − Σ_{k=1}^{n} d_{ik} v_{ik}(α_T) ] = 0.    (2.3.12)

The YW#1 estimation procedure thus yields an estimator of b which is functionally independent of α. The procedure can be summarized in two steps.

Step 1: Find the usual YW estimator of B by using (2.1.4).
Step 2: Find a_{T1} and b_{T1} explicitly from (2.3.9) and (2.3.10), respectively, and then find α_{T1} implicitly from (2.3.12).

2.4 Review of Assumptions Introduced in Chapter II

The assumptions introduced in Chapter II are summarized in Table 2.1.

Table 2.1 Assumptions Introduced in Chapter II

Section   Assumption
2.1       A6: All roots of f(z) = |I − Bz| = 0 lie outside the unit circle.
2.1       A7: The error terms, ε_t's, are independent and identically distributed with mean 0 and variance-covariance matrix G.
2.1       A8: Let y_t = x_t − μ for all t, where E(x_t) = μ.
2.1       A9: Let y_t = x_t − x̄, t = 1,2,...,T, be the observations used to fit the model in (2.1.1).
2.3.3     A10: In the variable weights case, b_{T2} ≠ 0.
2.3.4     A11: In the variable weights case, b_{T1} ≠ 0.

CHAPTER III
PROPERTIES OF ESTIMATORS

3.0 Preamble

In this chapter, we will consider numerical and statistical properties of the estimators developed in Chapter II. The numerical properties of existence and uniqueness are considered, and the statistical properties of consistency and asymptotic distribution are investigated. In Section 3.1, we review those properties of the usual YW estimators which are beneficial in dealing with our estimators. This section also contains a general lemma which will be applied in the remainder of the chapter. We then discuss these properties for the known weights case in Section 3.2 and the variable weights case in Section 3.3.

3.1 Results for the Usual Yule-Walker Estimators and Another Useful Lemma

In terms of statistical properties, we will be concerned with consistency (in probability) and asymptotic distributions. The first two lemmas give these properties for the usual YW estimators. Hannan (1970:329-332) gives proofs of the results which lead to these lemmas.

Lemma 3.1: If y_t is generated as in (2.1.1) and A6 and A7 hold (that is, we have a second-order stationary process), then for all i and j,

    B_{T,ij} →^P B_{o,ij}  as T → ∞,

where B_T is defined in (2.1.4) and B_o is the true value of B.

Let

    β_T' = (B_{T,11}, B_{T,12}, ..., B_{T,1n}, B_{T,21}, ..., B_{T,2n}, ..., B_{T,n1}, ..., B_{T,nn})    (3.1.1)

and

    β_o' = (B_{o,11}, B_{o,12}, ..., B_{o,1n}, B_{o,21}, ..., B_{o,2n}, ..., B_{o,n1}, ..., B_{o,nn}).    (3.1.2)

Recall that for our model, B_o = B_{ro} (A5).

Lemma 3.2: Under the same conditions as in Lemma 3.1, we have

    √T (β_T − β_o) →^D N_{n²}(0, G ⊗ Γ^{-1}(0))  as T → ∞,

where G is the variance-covariance matrix of ε_t defined in A7 and Γ(0) is given by (2.1.3).
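Lemma 3.1 can be illustrated by simulation. The sketch below is our own, not the dissertation's computations: it generates a bivariate first-order autoregression y_t = B y_{t−1} + ε_t and forms the usual YW estimator in the standard VAR(1) form B_T = C(1) C(0)^{-1}, where C(h) is the sample lag-h autocovariance matrix; we assume (2.1.4) is of this form. The chosen B satisfies A6 (its eigenvalues are 0.6 and 0.3, inside the unit circle).

```python
import random

# Sketch (our own illustration of Lemma 3.1): simulate a bivariate AR(1)
# process y_t = B y_{t-1} + e_t and form the Yule-Walker estimator
# B_T = C(1) C(0)^{-1}.  For a stationary B, B_T should be close to B
# when T is large.

random.seed(1)
B = [[0.5, 0.2],
     [0.1, 0.4]]
T = 20000

y = [0.0, 0.0]
ys = []
for _ in range(T):
    e = [random.gauss(0, 1), random.gauss(0, 1)]
    y = [B[0][0] * y[0] + B[0][1] * y[1] + e[0],
         B[1][0] * y[0] + B[1][1] * y[1] + e[1]]
    ys.append(y)

m = [sum(v[i] for v in ys) / T for i in range(2)]
def cov(h):
    """Sample autocovariance C(h)[i][j] = (1/T) sum_t (y_t,i - m_i)(y_{t-h},j - m_j)."""
    return [[sum((ys[t][i] - m[i]) * (ys[t - h][j] - m[j])
                 for t in range(h, T)) / T for j in range(2)] for i in range(2)]

C0, C1 = cov(0), cov(1)
det = C0[0][0] * C0[1][1] - C0[0][1] * C0[1][0]
C0inv = [[C0[1][1] / det, -C0[0][1] / det],
         [-C0[1][0] / det, C0[0][0] / det]]
B_T = [[sum(C1[i][k] * C0inv[k][j] for k in range(2)) for j in range(2)]
       for i in range(2)]
```

With T = 20000 the entries of B_T agree with those of B to roughly two decimal places, which is the consistency asserted by the lemma.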
The third lemma provides a useful result for the asymptotic distribution of well-behaved functions of asymptotically normal statistics. Rao (1973:388) gives a proof of the lemma.

Lemma 3.3: Let θ_T be a k-dimensional statistic, (θ_{T,1}, ..., θ_{T,k})', for which √T(θ_T − θ_o) →^D N_k(0, Σ) as T → ∞. Let h_1, ..., h_q be q functions of k variables, and assume that each h_i is totally differentiable. Then the asymptotic distribution of √T[h_i(θ_T) − h_i(θ_o)], i = 1,2,...,q, as T → ∞, is q-variate normal with mean 0 and variance-covariance matrix H Σ H', where

    H = { ∂h_i(θ)/∂θ_j }  evaluated at θ = θ_o.

The rank of the distribution is the rank of H Σ H'.

3.2 The Known Weights Case

As was mentioned in 2.2.4, we can concentrate on the properties of the YW#2 estimators and simply note the slight adjustments necessary for the special case of the YW#1 estimators.

3.2.1 Existence and Uniqueness

Recall from our work in 2.2.3 that

    a_{T2} = ( Σ_{i=1}^{n} B_{T,ii} ) / n    (3.2.1)

and

    b_{T2} = Σ_{i=1}^{n} Σ_{j≠i} B_{T,ij} u_{ij},    (3.2.2)

where u_{ij} = w_{ij} / ( Σ_{k=1}^{n} Σ_{ℓ≠k} w_{kℓ}² ). It is clear that a_{T2} and b_{T2} both exist and are unique. This result also holds for the YW#1 estimators, since in that case u_{ij} = 1/n for all i and j ≠ i.

3.2.2 Consistency (in Probability)

From Lemma 3.1, we have that the B_{T,ij}'s are consistent (in probability). The modified estimators given in (3.2.1) and (3.2.2) are just linear combinations of consistent estimators and, hence, are both consistent (in probability), since

    a_{T2} = ( Σ_{i=1}^{n} B_{T,ii} ) / n →^P ( Σ_{i=1}^{n} B_{ro,ii} ) / n = n a_o / n = a_o  as T → ∞

and

    b_{T2} = Σ_{i=1}^{n} Σ_{j≠i} B_{T,ij} u_{ij} →^P Σ_{i=1}^{n} Σ_{j≠i} B_{ro,ij} u_{ij} = Σ_{i=1}^{n} Σ_{j≠i} b_o w_{ij} u_{ij}
           = b_o Σ_{i=1}^{n} Σ_{j≠i} w_{ij}² / Σ_{k=1}^{n} Σ_{ℓ≠k} w_{kℓ}² = b_o  as T → ∞.

Similarly,

    a_{T1} →^P a_o  and  b_{T1} →^P b_o  as T → ∞.

3.2.3 The Asymptotic Joint Distribution of (a_T, b_T)

To find the asymptotic distribution of (a_{T2}, b_{T2}), we use Lemma 3.2 to satisfy the conditions of Lemma 3.3. For application of Lemma 3.3, let θ_T = β_T, θ_o = β_o, k = n², Σ = G ⊗ Γ^{-1}(0),

    h_1(θ) = ( Σ_{i=1}^{n} B_{ii} ) / n,

and

    h_2(θ) = Σ_{i=1}^{n} Σ_{j≠i} B_{ij} u_{ij},

where u_{ij} is given after (3.2.2). Then H_2 is the 2 × n² matrix

    H_2 = [ 1/n  0 ⋯ 0  0  1/n  0 ⋯ 0  ⋯  0 ⋯ 0  1/n ]
          [ 0  u_{12} ⋯ u_{1n}  u_{21}  0  u_{23} ⋯ u_{2n}  ⋯  u_{n1} ⋯ u_{n,n−1}  0 ],    (3.2.3)

that is, the first row has the entry 1/n in each position corresponding to a diagonal element B_{ii} and 0 elsewhere, while the second row has the entry u_{ij} in each position corresponding to an off-diagonal element B_{ij} and 0 in the diagonal positions. Since the elements in H_2 are all constants, it follows that h_1 and h_2 are totally differentiable. Now, h_1(β_T) = a_{T2} and h_2(β_T) = b_{T2}. From our work in 3.2.2, we have h_1(β_o) = a_o and h_2(β_o) = b_o. Applying Lemma 3.3 yields the result that

    √T [ (a_{T2}, b_{T2}) − (a_o, b_o) ] →^D N_2(0, H_2 Σ H_2')  as T → ∞,

where Σ = G ⊗ Γ^{-1}(0) and H_2 is given by (3.2.3). The univariate asymptotic distributions of both a_{T2} and b_{T2} follow easily from the joint result. It also follows that

    √T [ (a_{T1}, b_{T1}) − (a_o, b_o) ] →^D N_2(0, H_1 Σ H_1')  as T → ∞,

where Σ = G ⊗ Γ^{-1}(0) and H_1 is the same as H_2 except that u_{ij} = 1/n for all i and j ≠ i.

3.3 The Variable Weights Case

As one might expect, it will be more difficult to get the properties of our estimators here, since there is no explicit solution for the estimator of α. As was mentioned in 2.3.3, only the YW#1 estimators will be considered.

3.3.1 Existence and Uniqueness

Recall from our work in 2.3.4 that

    a_{T1} = ( Σ_{i=1}^{n} B_{T,ii} ) / n,    (3.3.1)

    b_{T1} = ( Σ_{i=1}^{n} Σ_{j≠i} B_{T,ij} ) / n,    (3.3.2)

and that α_{T1} is the solution to the equation

    Σ_{i=1}^{n} Σ_{j≠i} [B_{T,ij} − b_{T1} v_{ij}(α)] v_{ij}(α) [ d_{ij} − Σ_{k=1}^{n} d_{ik} v_{ik}(α) ] = 0.    (3.3.3)

It was established in 3.2.1 that a_{T1} and b_{T1} both exist and are unique. In order to show the existence of α_{T1}, we will work with the sum of squares function, call it s_T(α), the partial derivative of which led to (3.3.3). That is,

    s_T(α) = Σ_{i=1}^{n} Σ_{j≠i} [B_{T,ij} − b_{T1} v_{ij}(α)]²,    (3.3.4)

where v_{ij}(α) is the exponential weight function given in (2.3.2). Define the following sets:

    N_i = {locations j: d_{ij} = min_{k≠i} {d_{ik}}},
    Q_i = {locations j: j ∉ N_i, j ≠ i},
    F_i = {locations j: d_{ij} = max_{k≠i} {d_{ik}}},
    P_i = {locations j: j ∉ F_i, j ≠ i}.

Recall from 2.3.2 that c_i = the number of elements in N_i and f_i = the number of elements in F_i.

Theorem 3.4: Suppose the following conditions are met.
C1: The estimate b_{T1} is nonzero.
C2: It is not true that the usual YW estimator, B_T, is such that

    B_{T,ij} = b_{T1}/c_i for all i and j ∈ N_i  and  B_{T,ij} = 0 for all i and j ∈ Q_i,

or

    B_{T,ij} = b_{T1}/f_i for all i and j ∈ F_i  and  B_{T,ij} = 0 for all i and j ∈ P_i.

C3: There exists a location i for which c_i < n − 1.

Then there exists a finite α_{T1} such that s_T(α_{T1}) = min_α s_T(α).

Proof: Let M_{T1} = lim_{α→+∞} s_T(α) and M_{T2} = lim_{α→−∞} s_T(α). Using the results from 2.3.2, it follows that

    M_{T1} = Σ_{i=1}^{n} Σ_{j∈N_i} (B_{T,ij} − b_{T1}/c_i)² + Σ_{i=1}^{n} Σ_{j∈Q_i} B_{T,ij}²    (3.3.5)

and

    M_{T2} = Σ_{i=1}^{n} Σ_{j∈F_i} (B_{T,ij} − b_{T1}/f_i)² + Σ_{i=1}^{n} Σ_{j∈P_i} B_{T,ij}².    (3.3.6)

From C2, we see that both M_{T1} and M_{T2} are positive. (They are also finite, since the B_{T,ij}'s are finite.) Since s_T(α) is clearly a continuous function of α, we know that if there exists a finite α_3 such that s_T(α_3) < M_{T1} and s_T(α_3) < M_{T2}, then there exists a finite α_{T1} such that s_T(α_{T1}) = min_α s_T(α). Our objective is to show the existence of such an α_3.

First, we will show there exists a finite α_1 such that s_T(α_1) < M_{T1}. Let ε > 0 be such that

    ε < min ( min_i {1/c_i}, R_1, R_2 ),    (3.3.7)

where R_1 and R_2 are the positive roots of two quadratics to be introduced later in (3.3.15) and (3.3.18). Since lim_{α→+∞} v_{ij}(α) = 0 for all i and j ∈ Q_i and, for finite α, v_{ij}(α) > 0 for all i and j ≠ i, it follows that there exists a finite positive α_L such that for all finite α ≥ α_L,

    0 < v_{ij}(α) < ε  for all i and j ∈ Q_i.    (3.3.8)

We also know that lim_{α→+∞} v_{ij}(α) = 1/c_i for all i and j ∈ N_i. Since these weights sum to 1 for each location, it follows that for all i and j ∈ N_i, v_{ij}(α) → 1/c_i from the left as α → +∞. In addition to satisfying (3.3.8), α_L can be chosen to also satisfy the condition that for all finite α ≥ α_L,

    0 < 1/c_i − ε < v_{ij}(α) < 1/c_i  for all i and j ∈ N_i.

Consider the interval S = [α_L, 2α_L]. Since S is compact, there exists w_L > 0 such that

    w_L = min_i min_{j∈Q_i} min_{α∈S} v_{ij}(α),

and, for each i, there exists w_{ui} > 0 such that w_{ui} < 1/c_i and

    w_{ui} = max_{j∈N_i} max_{α∈S} v_{ij}(α).

Thus, for all α ∈ S,

    0 < w_L ≤ v_{ij}(α) < ε  for all i and j ∈ Q_i,    (3.3.9)
    0 < 1/c_i − ε < v_{ij}(α) ≤ w_{ui} < 1/c_i  for all i and j ∈ N_i,    (3.3.10)

and, from (3.3.10),

    0 < 1/c_i − w_{ui} ≤ 1/c_i − v_{ij}(α) < ε  for all i and j ∈ N_i.    (3.3.11)

We now claim that for all α ∈ S, M_{T1} − s_T(α) > 0. From (3.3.4) and (3.3.5),

    M_{T1} − s_T(α) = Σ_{i=1}^{n} Σ_{j∈N_i} { (B_{T,ij} − b_{T1}/c_i)² − [B_{T,ij} − b_{T1} v_{ij}(α)]² }
                      + Σ_{i=1}^{n} Σ_{j∈Q_i} { B_{T,ij}² − [B_{T,ij} − b_{T1} v_{ij}(α)]² }

                    = 2 b_{T1} { Σ_{i=1}^{n} Σ_{j∈Q_i} B_{T,ij} v_{ij}(α) − Σ_{i=1}^{n} Σ_{j∈N_i} B_{T,ij} [1/c_i − v_{ij}(α)] }
                      + b_{T1}² { Σ_{i=1}^{n} Σ_{j∈N_i} [1/c_i² − v_{ij}²(α)] − Σ_{i=1}^{n} Σ_{j∈Q_i} v_{ij}²(α) }.    (3.3.12)

Case 1: Suppose that b_{T1} > 0. Dividing (3.3.12) by b_{T1} > 0, it is enough to show that, for all α ∈ S,

    b_{T1} { Σ_{i=1}^{n} Σ_{j∈N_i} [1/c_i² − v_{ij}²(α)] − Σ_{i=1}^{n} Σ_{j∈Q_i} v_{ij}²(α) }
    + 2 { Σ_{i=1}^{n} Σ_{j∈Q_i} B_{T,ij} v_{ij}(α) − Σ_{i=1}^{n} Σ_{j∈N_i} B_{T,ij} [1/c_i − v_{ij}(α)] } > 0.    (3.3.13)

Now using (3.3.9), (3.3.10), and (3.3.11) to bound each term from below, according to the sign of each B_{T,ij}, the left-hand side of (3.3.13) is at least

    A_1 ε² + B_1 ε + C_1,    (3.3.14)

where

    A_1 = −b_{T1} Σ_{i=1}^{n} (n − 1 − c_i),

    B_1 = 2 [ Σ_{i=1}^{n} Σ_{j∈Q_i: B_{T,ij}<0} B_{T,ij} − Σ_{i=1}^{n} Σ_{j∈N_i: B_{T,ij}≥0} B_{T,ij} ],

and

    C_1 = b_{T1} Σ_{i=1}^{n} c_i (1/c_i² − w_{ui}²)
          + 2 [ Σ_{i=1}^{n} Σ_{j∈Q_i: B_{T,ij}>0} B_{T,ij} w_L − Σ_{i=1}^{n} Σ_{j∈N_i: B_{T,ij}≤0} B_{T,ij} (1/c_i − w_{ui}) ].

(Here n − 1 − c_i is the number of elements of Q_i.) Now consider the polynomial

    f_1(x) = A_1 x² + B_1 x + C_1.    (3.3.15)

From C3, it follows that A_1 < 0. Clearly B_1 ≤ 0, and from (3.3.9)–(3.3.11) we have C_1 > 0. With these conditions on the coefficients of the quadratic in (3.3.15), it follows that f_1(x) = 0 has two roots, one positive and one negative. Let R_1 be the positive root. Since ε in (3.3.7) is such that 0 < ε < R_1, one can conclude that the lower bound in (3.3.14) is positive, and (3.3.13) is established.

Case 2: Suppose that b_{T1} < 0. Dividing (3.3.12) by b_{T1} now reverses the inequality, so it is enough to show that, for all α ∈ S,

    b_{T1} { Σ_{i=1}^{n} Σ_{j∈N_i} [1/c_i² − v_{ij}²(α)] − Σ_{i=1}^{n} Σ_{j∈Q_i} v_{ij}²(α) }
    + 2 { Σ_{i=1}^{n} Σ_{j∈Q_i} B_{T,ij} v_{ij}(α) − Σ_{i=1}^{n} Σ_{j∈N_i} B_{T,ij} [1/c_i − v_{ij}(α)] } < 0.    (3.3.16)

Now using (3.3.9), (3.3.10), and (3.3.11) to bound each term from above, the left-hand side of (3.3.16) is at most

    A_2 ε² + B_2 ε + C_2,    (3.3.17)

where

    A_2 = −b_{T1} Σ_{i=1}^{n} (n − 1 − c_i),

    B_2 = 2 [ Σ_{i=1}^{n} Σ_{j∈Q_i: B_{T,ij}>0} B_{T,ij} − Σ_{i=1}^{n} Σ_{j∈N_i: B_{T,ij}<0} B_{T,ij} ],

and

    C_2 = b_{T1} Σ_{i=1}^{n} c_i (1/c_i² − w_{ui}²)
          + 2 [ Σ_{i=1}^{n} Σ_{j∈Q_i: B_{T,ij}<0} B_{T,ij} w_L − Σ_{i=1}^{n} Σ_{j∈N_i: B_{T,ij}≥0} B_{T,ij} (1/c_i − w_{ui}) ].

Now consider the polynomial

    f_2(x) = A_2 x² + B_2 x + C_2.    (3.3.18)

From C3, it follows that A_2 > 0. Clearly B_2 ≥ 0, and one can conclude that C_2 < 0. With these conditions on the coefficients of the quadratic in (3.3.18), it follows that f_2(x) = 0 has two roots, one positive and one negative. Let R_2 be the positive root. Since ε in (3.3.7) is such that 0 < ε < R_2, the upper bound in (3.3.17) is negative, and (3.3.16) is thus established.

Since b_{T1} ≠ 0 by C1, all cases have been considered. Thus, there exists an α_1 belonging to S (and hence finite) such that s_T(α_1) < M_{T1}.

To complete the proof, it is necessary to show that there exists a finite α_2 such that s_T(α_2) < M_{T2}. A check of the form of M_{T2} in (3.3.6) reveals that the details would be analogous to those of the part just completed. Finally, one can conclude that α_3 = α_1 if M_{T1} ≤ M_{T2} and α_3 = α_2 otherwise. Thus, there exists a finite α_3 such that s_T(α_3) < M_{T1} and s_T(α_3) < M_{T2}, and the proof is complete.

Before going on, some discussion of the conditions of this theorem is in order. The first condition is just A11. The second condition is an assumption which seems to be reasonable.

A12: The usual YW estimator, B_T, is not such that

    B_{T,ij} = b_{T1}/c_i for all i and j ∈ N_i  and  B_{T,ij} = 0 for all i and j ∈ Q_i,

or

    B_{T,ij} = b_{T1}/f_i for all i and j ∈ F_i  and  B_{T,ij} = 0 for all i and j ∈ P_i.
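The existence argument of Theorem 3.4 can be illustrated numerically. In the sketch below (entirely our own; B_T and the distance matrix are made-up values), s_T(α) evaluated on a grid drops below both tail limits M_{T1} of (3.3.5) and M_{T2} of (3.3.6), so the minimizing α is finite and interior.

```python
import math

# Sketch: numerical illustration of Theorem 3.4.  The off-diagonal sum of
# squares s_T(alpha) of (3.3.4) attains values below both of its tail
# limits, so the minimizer alpha_T1 is finite.

def scaled_v(d, alpha):
    """Scaled exponential weights v_ij(alpha) of (2.3.2)."""
    n = len(d)
    V = [[0.0] * n for _ in range(n)]
    for i in range(n):
        expo = [math.exp(-alpha * d[i][k]) if k != i else 0.0 for k in range(n)]
        s = sum(expo)
        for j in range(n):
            V[i][j] = expo[j] / s
    return V

d = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
B_T = [[0.40, 0.25, 0.05],
       [0.20, 0.45, 0.10],
       [0.08, 0.22, 0.35]]
n = 3
b_T1 = sum(B_T[i][j] for i in range(n) for j in range(n) if j != i) / n

def s_T(alpha):
    V = scaled_v(d, alpha)
    return sum((B_T[i][j] - b_T1 * V[i][j]) ** 2
               for i in range(n) for j in range(n) if j != i)

# Tail limits: at large |alpha| the weights are numerically indistinguishable
# from the closest-neighbor (alpha -> +inf) and farthest-neighbor
# (alpha -> -inf) weights; here every c_i = f_i = 1.
M_T1 = s_T(60.0)
M_T2 = s_T(-60.0)
grid = [k / 100 for k in range(-500, 501)]
alpha_hat = min(grid, key=s_T)
```

For these values s_T(0) is already well below both M_{T1} and M_{T2}, so the grid minimizer lies strictly inside the search interval, as the theorem guarantees.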
It appears quite unlikely that one would observe a B_T of either heavily restricted form excluded by A12. It should be noted that the second condition alone does not imply that α_{T1} is finite, only that M_{T1} and M_{T2} are both positive. Finally, C3 is met by assuming A4; that is, the array has either more than 3 locations, or 3 locations not forming an equilateral triangle.

Only the existence of α_{T1} has been established. Although we were unable to show its uniqueness, most of our empirical investigations would support such a conjecture.

3.3.2 Consistency (in Probability)

Since a_{T1} and b_{T1} in the variable weights case are the same as their counterparts in the known weights case, we already have from 3.2.2 that

    a_{T1} →^P a_o  and  b_{T1} →^P b_o  as T → ∞.

As would be expected, arriving at the consistency of α_{T1} requires more effort. A general theorem will be presented and, after its proof, it will be applied to our particular problem. Let θ_T be an estimator of θ_o based on T observations, φ a parameter of interest, and f(·,·) a finite-valued function of θ and φ. We define h_T(φ) = f(θ_T, φ) and h_o(φ) = f(θ_o, φ).

Theorem 3.5: Suppose the following conditions are met.

C1: There exists a finite φ_T such that h_T(φ_T) = min_φ h_T(φ), where φ_T is taken to be an estimator of φ_o.
C2: The function h_o(·) is continuous everywhere.
C3: The function h_o(·) has a unique minimum at φ_o.
C4: (i) The limit of h_o(φ) as φ → +∞, M_1, is finite and greater than h_o(φ_o). (ii) The limit of h_o(φ) as φ → −∞, M_2, is finite and greater than h_o(φ_o).
C5: We have that sup_φ |h_T(φ) − h_o(φ)| →^P 0 as T → ∞.

Then φ_T →^P φ_o as T → ∞.

Proof: This proof is patterned after one by Parzen (1962) and consists of establishing the following two results.

R1: As T → ∞, h_o(φ_T) →^P h_o(φ_o).
R2: For every ε > 0, there exists η > 0 such that |φ_o − φ| ≥ ε implies that h_o(φ) − h_o(φ_o) ≥ η.

Applying R1 to R2, with φ_T in place of φ, yields the desired result.
Proof of R1: If it can be shown that

    |h_T(φ_T) − h_o(φ_o)| ≤ sup_φ |h_T(φ) − h_o(φ)|,

it follows that

    |h_o(φ_T) − h_o(φ_o)| ≤ 2 sup_φ |h_T(φ) − h_o(φ)|,    (3.3.19)

since |h_o(φ_T) − h_o(φ_o)| ≤ |h_o(φ_T) − h_T(φ_T)| + |h_T(φ_T) − h_o(φ_o)|.

Case 1: Suppose that h_T(φ_T) − h_o(φ_o) ≥ 0. Then it follows from C1 that

    0 ≤ h_T(φ_T) − h_o(φ_o) = inf_φ h_T(φ) − h_o(φ_o) ≤ h_T(φ_o) − h_o(φ_o) = |h_T(φ_o) − h_o(φ_o)| ≤ sup_φ |h_T(φ) − h_o(φ)|.

Case 2: Suppose that h_T(φ_T) − h_o(φ_o) < 0. From C3, it follows that

    0 < h_o(φ_o) − h_T(φ_T) = inf_φ h_o(φ) − h_T(φ_T) ≤ h_o(φ_T) − h_T(φ_T) = |h_o(φ_T) − h_T(φ_T)| ≤ sup_φ |h_T(φ) − h_o(φ)|.

In either case, |h_T(φ_T) − h_o(φ_o)| ≤ sup_φ |h_T(φ) − h_o(φ)|. Therefore, from our comments prior to (3.3.19), we can conclude that |h_o(φ_T) − h_o(φ_o)| ≤ 2 sup_φ |h_T(φ) − h_o(φ)|. Applying C5 to this result leads to the conclusion

    h_o(φ_T) →^P h_o(φ_o)  as T → ∞.    (3.3.20)

Proof of R2: The proof will be by contradiction. Suppose there exists ε > 0 such that for every η > 0, there exists φ_η such that |φ_o − φ_η| ≥ ε and h_o(φ_η) − h_o(φ_o) < η. Now choose a sequence of η's of the form η_m = 1/m, and let φ_m be the corresponding φ-values. We then have a sequence {φ_m}_{m=1}^{∞} for which |φ_o − φ_m| ≥ ε and h_o(φ_m) − h_o(φ_o) < 1/m for every m. Since, from C4, the limits M_1 and M_2 of h_o(φ) as φ → +∞ and φ → −∞ are finite and both greater than h_o(φ_o), while h_o(φ_m) → h_o(φ_o) as m → ∞, there exist M, Δ_1, and Δ_2 such that for all m ≥ M, φ_m ∈ S = [Δ_1, φ_o − ε] ∪ [φ_o + ε, Δ_2]. Since S is compact and h_o(φ) is a continuous function of φ (from C2), {φ_m} has a limit point φ_3 ∈ S with h_o(φ_3) = h_o(φ_o). This contradicts C3, so R2 is established and the proof is complete.

In order to show the consistency of α_{T1}, the conditions of Theorem 3.5 will now be verified for our particular case. We have

    θ_T = (B_{T,12}, ..., B_{T,1n}, B_{T,21}, B_{T,23}, ..., B_{T,2n}, ..., B_{T,n1}, ..., B_{T,n,n−1}, b_{T1})'    (3.3.21)

and

    θ_o = (B_{ro,12}, ..., B_{ro,1n}, B_{ro,21}, B_{ro,23}, ..., B_{ro,2n}, ..., B_{ro,n1}, ..., B_{ro,n,n−1}, b_o)'    (3.3.22)

(where B_{ro,ii} = a_o for all i and B_{ro,ij} = b_o v_{ij}(α_o) for all i and j ≠ i, with v_{ij}(α) given in (2.3.2)),

    φ = α,  φ_o = α_o,

and

    f(θ, φ) = Σ_{i=1}^{n} Σ_{j≠i} [B_{ij} − b v_{ij}(α)]².

This implies that

    h_T(α) = Σ_{i=1}^{n} Σ_{j≠i} [B_{T,ij} − b_{T1} v_{ij}(α)]² = s_T(α),    (3.3.23)

which agrees with the notation in (3.3.4), and

    h_o(α) = Σ_{i=1}^{n} Σ_{j≠i} [B_{ro,ij} − b_o v_{ij}(α)]² = s_o(α).    (3.3.24)

Clearly f is finite-valued, and we now check the other conditions.

C1: In our case, φ_T = α_{T1}. This condition was established in Theorem 3.4.

C2: The function s_o(·) is clearly a continuous function of α, since v_{ij}(α) is continuous for all i and j ≠ i.

C3: From (3.3.24),

    s_o(α) = Σ_{i=1}^{n} Σ_{j≠i} [B_{ro,ij} − b_o v_{ij}(α)]² = b_o² Σ_{i=1}^{n} Σ_{j≠i} [v_{ij}(α_o) − v_{ij}(α)]².    (3.3.25)

Therefore s_o(α_o) = 0, and so α_o is a minimum of s_o(·). Now suppose s_o(α_1) = 0. From (3.3.25), this implies that b_o² [v_{ij}(α_o) − v_{ij}(α_1)]² = 0 for all i and j ≠ i. Since consideration of α is meaningful only if b ≠ 0, it seems reasonable to consider the consistency of α_{T1} only if the following assumption is made.

A13: The true value of b, b_o, is nonzero.

With A13, it follows that

    v_{ij}(α_o) = v_{ij}(α_1)  for all i and j ≠ i.    (3.3.26)

Now pick a location i for which there exist neighbors j and k such that d_{ij} ≠ d_{ik}. Such a situation exists from A4. Since v_{ij}(α) is of the form given in (2.3.2), we have from (3.3.26),

    v_{ij}(α_o) / v_{ik}(α_o) = v_{ij}(α_1) / v_{ik}(α_1),

which implies that

    e^{α_o (d_{ik} − d_{ij})} = e^{α_1 (d_{ik} − d_{ij})},

which implies that α_1 = α_o. Thus, α_o is a unique minimum of s_o(·).

C4: From the results in 2.3.2, we have

    M_1 = lim_{α→+∞} s_o(α) = b_o² { Σ_{i=1}^{n} Σ_{j∈N_i} [v_{ij}(α_o) − 1/c_i]² + Σ_{i=1}^{n} Σ_{j∈Q_i} v_{ij}²(α_o) },

where c_i, N_i, and Q_i are defined in 3.3.1. This limit clearly exists and is finite. Since b_o ≠ 0 by A13, the only way for M_1 to equal 0 is for α_o to equal +∞.
However, if one really felt that α_o equalled +∞, one could use the identical closest-neighbor model in the known weights case (see 2.2.2). The considerations are similar in the case of M_2, and so it would seem reasonable to assume that α_o is finite when the variable weights model is employed.

A14: The true value of α, α_o, is finite.

With this assumption, C4 is established.

C5: Using (3.3.23) and (3.3.24), it follows that

    |s_T(α) − s_o(α)| = | Σ_{i=1}^{n} Σ_{j≠i} [B_{T,ij} − b_{T1} v_{ij}(α)]² − Σ_{i=1}^{n} Σ_{j≠i} [B_{ro,ij} − b_o v_{ij}(α)]² |
                      = | Σ_{i=1}^{n} Σ_{j≠i} { (B_{T,ij}² − B_{ro,ij}²) + 2 v_{ij}(α) (b_o B_{ro,ij} − b_{T1} B_{T,ij}) + v_{ij}²(α) (b_{T1}² − b_o²) } |
                      ≤ Σ_{i=1}^{n} Σ_{j≠i} |B_{T,ij}² − B_{ro,ij}²| + 2 Σ_{i=1}^{n} Σ_{j≠i} v_{ij}(α) |b_o B_{ro,ij} − b_{T1} B_{T,ij}|
                        + Σ_{i=1}^{n} Σ_{j≠i} v_{ij}²(α) |b_{T1}² − b_o²|.

Since 0 < v_{ij}(α) < 1 for all i, all j ≠ i, and all finite α, we have

    sup_α |s_T(α) − s_o(α)| ≤ Σ_{i=1}^{n} Σ_{j≠i} [ |B_{T,ij}² − B_{ro,ij}²| + 2 |b_o B_{ro,ij} − b_{T1} B_{T,ij}| + |b_{T1}² − b_o²| ].    (3.3.27)

From Lemma 3.1, B_{T,ij} →^P B_{o,ij} = B_{ro,ij} as T → ∞ for all i and j, and from 3.3.2, b_{T1} →^P b_o as T → ∞. Therefore,

    B_{T,ij}² →^P B_{ro,ij}²  as T → ∞  for all i and j,
    b_{T1}² →^P b_o²  as T → ∞,
    b_{T1} B_{T,ij} →^P b_o B_{ro,ij}  as T → ∞  for all i and j.

Applying these results to (3.3.27), we see that the upper bound converges to zero in probability as T → ∞, and hence

    sup_α |s_T(α) − s_o(α)| →^P 0  as T → ∞.

All conditions of Theorem 3.5 have been satisfied, so the consistency of α_{T1} has been established.

3.3.3 The Asymptotic Joint Distribution of (a_{T1}, b_{T1}, α_{T1})

The format of this section will be similar to that by which α_{T1} was shown to be consistent in the previous section. Two lemmas will be presented first. These will be followed by a general theorem and an application of it to our problem. The first lemma is just a specific statement of the multivariate Taylor's formula. A more general statement of the formula and its proof can be found in Fleming (1965:44-49).
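As a toy numerical preview (entirely our own, with k = q = 1) of the kind of sandwich variance that Theorem 3.8 below delivers: let θ_T be the sample mean of iid N(θ_o, σ²) data, and define φ_T implicitly by h_T(φ) = θ_T − φ³ = 0. With f(θ, φ) = θ − φ³, one has M_o = −3φ_o² and R_o = 1, so the predicted asymptotic variance of √T(φ_T − φ_o) is (M_o^{-1}R_o)²σ² = σ²/(9φ_o⁴). A Monte Carlo check:

```python
import random, statistics

# Toy illustration (our own) of an implicitly defined estimator.  theta_T
# is a sample mean; phi_T solves theta_T - phi^3 = 0 (here solvable in
# closed form, which also lets the implicit solution be checked exactly).
# Predicted: Var[sqrt(T)(phi_T - phi_o)] -> sigma^2 / (9 phi_o^4).

random.seed(7)
theta_o, sigma, T, reps = 8.0, 1.0, 400, 2000
phi_o = 2.0                      # cube root of theta_o

scaled_errors = []
for _ in range(reps):
    theta_T = sum(random.gauss(theta_o, sigma) for _ in range(T)) / T
    phi_T = theta_T ** (1.0 / 3.0)       # solves h_T(phi) = 0
    scaled_errors.append(T ** 0.5 * (phi_T - phi_o))

mc_var = statistics.pvariance(scaled_errors)
predicted = sigma ** 2 / (9 * phi_o ** 4)   # = 1/144
```

The Monte Carlo variance of the scaled errors agrees with the sandwich prediction to within simulation noise, and the scaled errors are centered near zero, which is the pattern the theorem formalizes for the vector case (a_{T1}, b_{T1}, α_{T1}).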
Lemma 3.6: Let h(·) be a function of φ = (φ_1, ..., φ_q)'. If ∂h(φ)/∂φ_j exists and is continuous everywhere for j = 1,2,...,q, and both φ_T and φ_o belong to ℝ^q, then there exists x_T ∈ ℝ^q for which

    h(φ_T) = h(φ_o) + Σ_{j=1}^{q} [ ∂h(φ)/∂φ_j |_{φ=x_T} ] (φ_{T,j} − φ_{o,j}),

where x_T = φ_o + s h, 0 < s < 1, and h = φ_T − φ_o.

The second lemma is a result on asymptotic distributions. This lemma can be found in Fuller (1976:199).

Lemma 3.7: Let z_T →^D z as T → ∞ and A_T →^P A as T → ∞, where z is a random k-vector and A is a nonsingular k × k matrix of constants. Then A_T^{-1} z_T →^D A^{-1} z as T → ∞.

Before stating and proving the general theorem, we make the following definitions. Let θ_T = (θ_{T,1}, ..., θ_{T,k})' be an estimator of θ_o = (θ_{o,1}, ..., θ_{o,k})' based on T observations, and let φ_o = (φ_{o,1}, ..., φ_{o,q})' be a parameter of interest. For i = 1,2,...,q, let f_i(·,·) be a function of θ and φ, and let h_{o,i}(φ) = f_i(θ_o, φ), h_{T,i}(φ) = f_i(θ_T, φ), and r_{o,i}(θ) = f_i(θ, φ_o). Let f(θ,φ), h_o(φ), h_T(φ), and r_o(θ) be the corresponding vectors of functions. The purpose of the following theorem is to allow one to find the asymptotic distribution of estimators when some of them are implicitly defined.

Theorem 3.8: Suppose the following conditions are met.

C1: The statistic θ_T is such that √T(θ_T − θ_o) →^D N_k(0, Σ) as T → ∞.
C2: The estimator of φ_o is φ_T, for which h_T(φ_T) = h_o(φ_o).
C3: The statistic φ_{T,i} is a consistent (in probability) estimator of φ_{o,i} for all i = 1,2,...,q.
C4: All partial derivatives of h_{T,i}(·) exist and are continuous everywhere for all i = 1,2,...,q. We let h_{T,ij}(φ) = ∂h_{T,i}(φ)/∂φ_j.
C5: All partial derivatives of h_{o,i}(·) exist and are continuous everywhere for all i = 1,2,...,q. We let h_{o,ij}(φ) = ∂h_{o,i}(φ)/∂φ_j and M_o = {h_{o,ij}(φ_o)}.
C6: The matrix M_o is nonsingular.
C7: We have that sup_φ |h_{T,ij}(φ) − h_{o,ij}(φ)| →^P 0 as T → ∞ for all i and j.
C8: All partial derivatives of r_{o,i}(·) exist and are continuous everywhere for all i = 1,2,...,q. We let

    R_o = { ∂r_{o,i}(θ)/∂θ_j |_{θ=θ_o} }.

Then

    √T (φ_T − φ_o) →^D N_q( 0, (M_o^{-1} R_o) Σ (M_o^{-1} R_o)' )  as T → ∞.

Proof: Several intermediate results will be necessary to reach the desired conclusion.

R1: For every i = 1,2,...,q, there exists a random vector x_{Ti} for which

    h_{T,i}(φ_T) = h_{T,i}(φ_o) + Σ_{j=1}^{q} h_{T,ij}(x_{Ti}) (φ_{T,j} − φ_{o,j}),

where x_{Ti} = φ_o + s_i h, 0 < s_i < 1, and h = φ_T − φ_o.

Proof of R1: For a fixed T, use C4 and apply Lemma 3.6 to h_{T,i}(·) for i = 1,2,...,q. The randomness is due to the fact that h_{T,i}(·) is a random function.

Using R1, we have the following system of equations:

    h_T(φ_T) = h_T(φ_o) + M_T (φ_T − φ_o),    (3.3.28)

where M_T = {h_{T,ij}(x_{Ti})}.

R2: For all i = 1,2,...,q, x_{Ti} →^P φ_o as T → ∞, in the sense that for all ε > 0,

    P(|x_{Ti,1} − φ_{o,1}| < ε, ..., |x_{Ti,q} − φ_{o,q}| < ε) → 1  as T → ∞.

Proof of R2: From R1, it follows that for all i = 1,2,...,q, x_{Ti} − φ_o = s_i h = s_i (φ_T − φ_o), where 0 < s_i < 1. Thus, if E_T = {|φ_{T,1} − φ_{o,1}| < ε, ..., |φ_{T,q} − φ_{o,q}| < ε} occurs, then F_T = {|x_{Ti,1} − φ_{o,1}| < ε, ..., |x_{Ti,q} − φ_{o,q}| < ε} also occurs. This implies that P(E_T) ≤ P(F_T). From C3, we have that φ_{T,j} →^P φ_{o,j} as T → ∞ for all j = 1,2,...,q. This implies that P(E_T) → 1 as T → ∞, which, in turn, implies that P(F_T) → 1 as T → ∞, which establishes the result.

R3: For all i and j, M_{T,ij} →^P M_{o,ij} as T → ∞.

Proof of R3: For all i and j,

    |M_{T,ij} − M_{o,ij}| = |h_{T,ij}(x_{Ti}) − h_{o,ij}(φ_o)|
                          ≤ |h_{T,ij}(x_{Ti}) − h_{o,ij}(x_{Ti})| + |h_{o,ij}(x_{Ti}) − h_{o,ij}(φ_o)|
                          ≤ sup_φ |h_{T,ij}(φ) − h_{o,ij}(φ)| + |h_{o,ij}(x_{Ti}) − h_{o,ij}(φ_o)|.

Now R2 and C5 imply that h_{o,ij}(x_{Ti}) →^P h_{o,ij}(φ_o) as T → ∞. Combining this result with C7 establishes the result.

R4: As T → ∞, √T [r_o(θ_T) − r_o(θ_o)] →^D N_q(0, R_o Σ R_o').

Proof of R4: From C8, we have that r_{o,i}(·) is totally differentiable for all i = 1,2,...,q. The result follows by an application of Lemma 3.3 to C1.

R5: As T → ∞, √T M_T (φ_T − φ_o) →^D N_q(0, R_o Σ R_o').

Proof of R5: From (3.3.28) and C2, it follows that

    √T M_T (φ_T − φ_o) = √T [h_T(φ_T) − h_T(φ_o)] = √T [h_o(φ_o) − h_T(φ_o)].

A clarification of notation yields the result: from our definitions of h_o(·), h_T(·), and r_o(·), we have h_T(φ_o) = f(θ_T, φ_o) = r_o(θ_T) and h_o(φ_o) = f(θ_o, φ_o) = r_o(θ_o), so that √T M_T (φ_T − φ_o) = −√T [r_o(θ_T) − r_o(θ_o)], and R4 applies since the limiting normal distribution is symmetric about 0.

R6: As T → ∞, √T (φ_T − φ_o) →^D N_q( 0, (M_o^{-1} R_o) Σ (M_o^{-1} R_o)' ).
Proof of R6: Using standard results for normal distributions, the result follows when C6, R3, and R5 are applied to Lemma 3.7.

It should be noted that for any finite T, M_T⁻¹ may not exist. However, the following result indicates that this fact has no effect on the asymptotic distribution of √T(φ_T − φ_o).

R7: Let Δ_T = |M_T| and C_T = {ω: Δ_T(ω) ≠ 0}. Then P(C_T) → 1 as T → ∞.

Proof of R7: It follows from R3 and standard results that Δ_T →_P |M_o| as T → ∞. Since |M_o| ≠ 0 (from C6), we have that P(C_T) → 1 as T → ∞.

The implication of R7 is that the behavior of √T(φ_T − φ_o) need only be considered on C_T for purposes of determining the asymptotic distribution. Thus, we can conclude from R6 that

  √T(φ_T − φ_o) →_D N_q(0, (M_o⁻¹R_o) Σ (M_o⁻¹R_o)') as T → ∞.

With the completion of the proof of this general result, it is now necessary to verify that the conditions hold for our specific case. We have θ_T = B_T and θ_o = B_o, where B_T and B_o are defined in (3.1.1) and (3.1.2), respectively; φ_T = (a_T1, b_T1, α_T1)', φ_o = (a_o, b_o, α_o)', Σ = G ⊗ Γ⁻¹(0), k = n², and q = 3. Let

  f_1(θ,φ) = (1/n) Σ_{i=1}^n B_ii − a,

  f_2(θ,φ) = (1/n) Σ_{i=1}^n Σ_{j=1, j≠i}^n B_ij − b,

and

  f_3(θ,φ) = Σ_{i=1}^n Σ_{j=1, j≠i}^n [B_ij − b·v_ij(α)] v_ij(α) [d_ij − Σ_{k=1}^n d_ik v_ik(α)],

where v_ij(α) is given in (2.3.2). Recall from A5 that B_o = B_ro. It follows then that

  h_{o,1}(φ) = (1/n) Σ_i B_{ro,ii} − a = a_o − a,

  h_{o,2}(φ) = (1/n) Σ_i Σ_{j≠i} b_o v_ij(α_o) − b = b_o − b

(using the fact that the weights satisfy Σ_{j≠i} v_ij(α_o) = 1 for each i), and

  h_{o,3}(φ) = Σ_i Σ_{j≠i} [b_o v_ij(α_o) − b·v_ij(α)] v_ij(α) [d_ij − Σ_k d_ik v_ik(α)].

Recalling the forms of a_T1, b_T1, and α_T1 from 2.3.4, it follows that

  h_{T,1}(φ) = a_T1 − a,

  h_{T,2}(φ) = b_T1 − b,

and

  h_{T,3}(φ) = Σ_i Σ_{j≠i} [B_{T,ij} − b·v_ij(α)] v_ij(α) [d_ij − Σ_k d_ik v_ik(α)].

We also have:

  r_{o,1}(θ) = (1/n) Σ_i B_ii − a_o,

  r_{o,2}(θ) = (1/n) Σ_i Σ_{j≠i} B_ij − b_o,

and

  r_{o,3}(θ) = Σ_i Σ_{j≠i} [B_ij − b_o v_ij(α_o)] v_ij(α_o) [d_ij − Σ_k d_ik v_ik(α_o)].

We now check the conditions.
C1: Lemma 3.2 meets this condition.

C2: Evaluating h_o(φ_o), we have h_{o,1}(φ_o) = a_o − a_o = 0, h_{o,2}(φ_o) = b_o − b_o = 0, and

  h_{o,3}(φ_o) = Σ_i Σ_{j≠i} [b_o v_ij(α_o) − b_o v_ij(α_o)] v_ij(α_o) [d_ij − Σ_k d_ik v_ik(α_o)] = 0.

Evaluating h_T(φ_T), we have h_{T,1}(φ_T) = a_T1 − a_T1 = 0, h_{T,2}(φ_T) = b_T1 − b_T1 = 0, and

  h_{T,3}(φ_T) = Σ_i Σ_{j≠i} [B_{T,ij} − b_T1 v_ij(α_T1)] v_ij(α_T1) [d_ij − Σ_k d_ik v_ik(α_T1)] = 0,

by the definition of α_T1 in (2.3.12). Since h_T(φ_T) = h_o(φ_o), the condition is established.

C3: The results from 3.3.2 satisfy this condition.

C4: Evaluation of the partial first derivatives yields:

  h_{T,11}(φ) = h_{T,22}(φ) = −1,
  h_{T,12}(φ) = h_{T,13}(φ) = h_{T,21}(φ) = h_{T,23}(φ) = h_{T,31}(φ) = 0,
  h_{T,32}(φ) = −Σ_i Σ_{j≠i} v_ij²(α) [d_ij − Σ_k d_ik v_ik(α)],

and

  h_{T,33}(φ) = Σ_i Σ_{j≠i} { b·v_ij²(α) [d_ij − Σ_k d_ik v_ik(α)]²
    + [B_{T,ij} − b·v_ij(α)] v_ij(α) ( Σ_k d_ik v_ik(α)[d_ik − Σ_l d_il v_il(α)] − [d_ij − Σ_k d_ik v_ik(α)]² ) }.

Since v_ij(α) is a continuous function of α for all i and j, it follows that h_{T,ij}(·) exists and is continuous everywhere for all i and j.

C5: The forms of h_o(φ) and h_T(φ) imply that h_{o,ij}(φ) = h_{T,ij}(φ) for all i and j except for i = j = 3, where we have

  h_{o,33}(φ) = Σ_i Σ_{j≠i} { b·v_ij²(α) [d_ij − Σ_k d_ik v_ik(α)]²
    + [b_o v_ij(α_o) − b·v_ij(α)] v_ij(α) ( Σ_k d_ik v_ik(α)[d_ik − Σ_l d_il v_il(α)] − [d_ij − Σ_k d_ik v_ik(α)]² ) }.

For the same reason as in C4, we can conclude that h_{o,ij}(·) exists and is continuous everywhere for all i and j.

C6: By definition,

  M_o = {h_{o,ij}(φ_o)} = [ −1  0  0 ;  0  −1  0 ;  0  h_{o,32}(φ_o)  h_{o,33}(φ_o) ],   (3.3.29)

where

  h_{o,32}(φ_o) = −Σ_i Σ_{j≠i} v_ij²(α_o) [d_ij − Σ_k d_ik v_ik(α_o)]

and

  h_{o,33}(φ_o) = b_o Σ_i Σ_{j≠i} v_ij²(α_o) [d_ij − Σ_k d_ik v_ik(α_o)]².

Therefore |M_o| = h_{o,33}(φ_o). Since b_o ≠ 0 (A13), it is enough to show that there exist i and j such that d_ij − Σ_k d_ik v_ik(α_o) ≠ 0. From A4, it follows that there exists a location i which has neighbors j and k such that d_ij ≠ d_ik. Choose location j to be a closest neighbor of location i. Now, from A3, it follows that
  d_ij − Σ_{k=1}^n d_ik v_ik(α_o) = Σ_{k=1}^n d_ij v_ik(α_o) − Σ_{k=1}^n d_ik v_ik(α_o) = Σ_{k=1}^n (d_ij − d_ik) v_ik(α_o).

Now d_ij ≤ d_ik for all k ≠ i, and d_ij < d_ik for some k. Therefore, since v_ik(α_o) > 0 for all i and k ≠ i (from A14 and (2.3.2)), it follows that Σ_k (d_ij − d_ik) v_ik(α_o) < 0, which implies that d_ij − Σ_k d_ik v_ik(α_o) < 0. Thus, the condition is satisfied.

C7: Since the only (i,j) combination for which h_{T,ij}(φ) ≠ h_{o,ij}(φ) is (3,3), it is enough to show that sup_φ |h_{T,33}(φ) − h_{o,33}(φ)| →_P 0 as T → ∞. First note that there exist c and d, both finite, for which |Σ_{k=1}^n d_ik v_ik(α)| ≤ c and |d_ij − Σ_{k=1}^n d_ik v_ik(α)| ≤ d, since from A1 it follows that

  |Σ_k d_ik v_ik(α)| = Σ_k d_ik v_ik(α) ≤ Σ_k d_ik ≤ c

and

  |d_ij − Σ_k d_ik v_ik(α)| ≤ d_ij + c ≤ d.

Now, from A1 and the above results, it follows that

  |h_{T,33}(φ) − h_{o,33}(φ)|
    = |Σ_i Σ_{j≠i} [B_{T,ij} − b_o v_ij(α_o)] v_ij(α) ( Σ_k d_ik v_ik(α)[d_ik − Σ_l d_il v_il(α)] − [d_ij − Σ_k d_ik v_ik(α)]² )|
    ≤ Σ_i Σ_{j≠i} |b_o v_ij(α_o) − B_{T,ij}| (d² + cd).

Therefore,

  sup_φ |h_{T,33}(φ) − h_{o,33}(φ)| ≤ Σ_i Σ_{j≠i} |b_o v_ij(α_o) − B_{T,ij}| (d² + cd).

Applying Lemma 3.1 and A5 (B_o = B_ro) leads to the conclusion that sup_φ |h_{T,33}(φ) − h_{o,33}(φ)| →_P 0 as T → ∞.

C8: Evaluation of the partial first derivatives yields:

  ∂r_{o,1}(θ)/∂B_ij = 1/n if j = i, and 0 otherwise;
  ∂r_{o,2}(θ)/∂B_ij = 1/n if j ≠ i, and 0 otherwise;
  ∂r_{o,3}(θ)/∂B_ij = v_ij(α_o) [d_ij − Σ_k d_ik v_ik(α_o)] ≡ r_ij if j ≠ i, and 0 otherwise.

Since all of these derivatives are constants, it follows that C8 is satisfied. With θ ordered as (B_11, B_12, ..., B_1n, B_21, ..., B_nn)', R_o is then the 3 × n² matrix whose first row has 1/n in the n positions corresponding to the diagonal elements B_ii and zeros elsewhere, whose second row has 1/n in the positions corresponding to the off-diagonal elements B_ij (j ≠ i) and zeros elsewhere, and whose third row has r_ij in the positions corresponding to the off-diagonal elements and zeros elsewhere.   (3.3.30)

All of the conditions for Theorem 3.8 have been met without the need for any additional assumptions. We finally have that

  √T[(a_T1, b_T1, α_T1) − (a_o, b_o, α_o)] →_D N_3(0, Σ_1) as T → ∞,

where Σ_1 = (M_o⁻¹R_o) Σ (M_o⁻¹R_o)', M_o is given in (3.3.29), R_o is given in (3.3.30), and Σ = G ⊗ Γ⁻¹(0).
(3.3.31)

The asymptotic univariate distributions of a_T1, b_T1, and α_T1 follow directly from the joint result. One might fear that, since the assumption A13 (b_o ≠ 0) was made to satisfy C3 and C6, there may be some problem in deriving the asymptotic univariate distributions from (3.3.31). However, recall that A13 was made only in order to arrive at the consistency of α_T1. For this reason, and also since b_T1 is explicitly defined, A13 enters into the univariate considerations only in the case of α_T1. Consequently, one should consider the asymptotic distribution of α_T1 only if one is willing to assume b_o ≠ 0. In light of the earlier discussion on the justification of A13, this restriction is quite reasonable. For similar reasons, A14 is also necessary only in the case of α_T1.

3.4 Review of Assumptions Introduced in Chapter III

The assumptions introduced in Chapter III are summarized in Table 3.1.

Table 3.1 Assumptions Introduced in Chapter III

  Assumption                                                        Section
  A12: The usual YW estimator, B_T1, is such that
       B_{T1,ij} = b_T1 for all i and j ∈ N_i, and
       B_{T1,ij} = 0 for all i and j ∈ Q_i.                         3.3.1
  A13: The true value of b, b_o, is nonzero.                        3.3.2
  A14: The true value of α, α_o, is finite.                         3.3.2

CHAPTER IV
ESTIMATORS OF COVARIANCE MATRICES AND THEIR PROPERTIES

4.0 Preamble

In this chapter we will consider the estimation of two covariance matrices, G and Γ(0), where G is the covariance matrix of the error term ε_t and Γ(0) is the covariance matrix of y_t. As for the other parameters of interest, estimation schemes will be introduced which exploit the nature of the spatial first-order autoregressive model. The estimators, and some of their properties in the case of the general first-order autoregressive model, will be presented in Section 4.1. This will serve as motivation for the specialized estimation schemes which will be introduced in Section 4.2.
Some properties of these new estimators will be examined in Section 4.3. The known weights and variable weights cases are considered simultaneously, with differences noted when necessary.

4.1 Results for the General First-Order Autoregressive Multivariate Model

4.1.1 Relationships Among Model Parameters

Just as there are relationships among the model parameters which lead to the usual YW estimator of B in Section 2.1, there are relationships which will lead to the usual YW estimator of G. We recall from Section 2.1 that Γ(0) is already estimated by a moment estimator, Γ_T(0). Consider again the model for the general first-order autoregressive time series,

  y_t = B y_{t−1} + ε_t,   (4.1.1)

for which we assume A6 and A7. That is, (A6) all roots of f(z) = |I − Bz| = 0 lie outside the unit circle, and (A7) the ε_t's are independently and identically distributed with mean equal to 0 and variance-covariance matrix equal to G. If we multiply through (4.1.1) by ε_t' and take expectations, we have from (2.1.2), A6, and A7,

  E(y_t ε_t') = G for all t.   (4.1.2)

If we then multiply through (4.1.1) by y_t' and take expectations, it follows from (2.1.3) and (4.1.2) that

  Γ(0) = B Γ'(1) + G.   (4.1.3)

Another implication of A6 and A7 is the relationship

  Γ(0) = Σ_{j=0}^∞ B^j G B'^j,   (4.1.4)

where B^0 = I. We recall from Section 2.1 that

  Γ(1) = B Γ(0).   (4.1.5)

We now have a system of three equations involving the model parameters B, Γ(0), Γ(1), and G. (From the definition of Γ(·) in Section 2.1, it follows that Γ(−1) = Γ'(1).) The results presented thus far in this section are known and can be found in Hannan (1970:13-15, 326-329) and Fuller (1976:72-73). We will show that (4.1.4) and (4.1.5) imply (4.1.3). Since Γ(1) = B Γ(0) and Γ(0) is symmetric, we have Γ'(1) = Γ(0) B'. It follows that

  B Γ'(1) + G = B Γ(0) B' + G
             = B [Σ_{j=0}^∞ B^j G B'^j] B' + G
             = Σ_{j=0}^∞ B^{j+1} G B'^{j+1} + G
             = Σ_{j=0}^∞ B^j G B'^j
             = Γ(0),

and the result is established.
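The relationships above can be checked numerically. The following sketch (the values of B and G are made-up illustrative choices, not taken from the text) builds Γ(0) from the truncated series (4.1.4) and confirms the identity (4.1.3):

```python
import numpy as np

# Illustrative values only: a stable coefficient matrix B (A6 holds,
# since its eigenvalues lie inside the unit circle) and an error
# covariance G.
B = np.array([[0.4, 0.2],
              [0.1, 0.3]])
G = np.array([[1.0, 0.3],
              [0.3, 2.0]])

# Gamma(0) = sum_{j>=0} B^j G B'^j  (4.1.4), truncated after the terms
# become negligible.
gamma0 = np.zeros((2, 2))
term = G.copy()
for _ in range(200):
    gamma0 += term
    term = B @ term @ B.T

gamma1 = B @ gamma0                 # Gamma(1) = B Gamma(0)   (4.1.5)

# (4.1.3): Gamma(0) = B Gamma'(1) + G
max_err = np.max(np.abs(gamma0 - (B @ gamma1.T + G)))
```

With a spectral radius around 0.5, the tail of the series is negligible after a few dozen terms, so `max_err` is essentially zero.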
4.1.2 Results for the Usual Yule-Walker Estimators

The usual YW estimator of G, G_T, is found by using the relationship in (4.1.3) and letting

  G_T = Γ_T(0) − B_T Γ_T'(1),   (4.1.6)

where Γ_T(0), Γ_T(1) (= Γ_T'(−1)), and B_T are defined in Section 2.1. The following results from Hannan (1970:209-210, 329) will be useful in determining a property of the estimators developed in Section 4.2.

Lemma 4.1:
If y_t is generated as in (4.1.1) and A6 and A7 hold, then G_{T,ij} →_P G_ij as T → ∞ for all i and j, where G_{T,ij} and G_ij are the (i,j) elements of G_T and G, respectively.

Lemma 4.2:
Under the same conditions as in Lemma 4.1, Γ_{T,ij}(k) →_P Γ_ij(k) as T → ∞ for all i and j, and k = 0, 1.

By the nature of the estimation procedures for the usual YW estimators, (4.1.3) and (4.1.5) are satisfied by B_T, Γ_T(0), Γ_T(1), and G_T. It will now be shown that (4.1.4) also holds. Let

  S_k = Σ_{j=0}^k B_T^j G_T B_T'^j
      = Σ_{j=0}^k B_T^j [Γ_T(0) − B_T Γ_T(0) B_T'] B_T'^j
      = Σ_{j=0}^k [B_T^j Γ_T(0) B_T'^j − B_T^{j+1} Γ_T(0) B_T'^{j+1}]
      = Γ_T(0) − B_T^{k+1} Γ_T(0) B_T'^{k+1}.

We know that B_T satisfies A6 and that Γ_T(0) is nonsingular with probability one. (See Hannan (1970:329, 332).) (Of course, Γ_T(0) then must be positive definite with probability one.) If we consider a new first-order autoregressive process for which B is replaced by B_T and G by Γ_T(0), it follows that the sum in (4.1.4) must converge for that process. That is, Σ_{j=0}^∞ B_T^j Γ_T(0) B_T'^j is a positive definite matrix with probability one. In order for this matrix sum to converge, the contribution of B_T^{k+1} Γ_T(0) B_T'^{k+1} to this sum must tend to zero as k → ∞. For this reason, one can conclude that S_k → Γ_T(0) as k → ∞. Thus,

  Γ_T(0) = Σ_{j=0}^∞ B_T^j G_T B_T'^j.

4.2 The Yule-Walker #1 and #2 Covariance Estimators

Our objective is to develop estimators of the covariance terms in our model which reflect the spatial nature of the model.
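A small simulation can illustrate the usual YW estimators and the consistency asserted in Lemmas 4.1 and 4.2. The sketch below (the dimension, coefficient values, and sample size are hypothetical) forms the moment estimators Γ_T(0) and Γ_T(1), then B_T from Γ(1) = B Γ(0), and G_T as in (4.1.6):

```python
import numpy as np

rng = np.random.default_rng(0)
B_true = np.array([[0.4, 0.2],
                   [0.1, 0.3]])
T = 5000

# Simulate y_t = B y_{t-1} + e_t with iid N(0, I) errors (A6, A7).
y = np.zeros((T + 1, 2))
for t in range(1, T + 1):
    y[t] = B_true @ y[t - 1] + rng.standard_normal(2)
y = y[1:]

# Moment estimators of Gamma(0) and Gamma(1).
g0 = sum(np.outer(y[t], y[t]) for t in range(T)) / T
g1 = sum(np.outer(y[t], y[t - 1]) for t in range(1, T)) / T

B_T = g1 @ np.linalg.inv(g0)    # usual YW estimator, from (4.1.5)
G_T = g0 - B_T @ g1.T           # (4.1.6): G_T = Gamma_T(0) - B_T Gamma_T'(1)
```

Note that G_T here is automatically symmetric, since B_T Γ_T'(1) = Γ_T(1) Γ_T⁻¹(0) Γ_T'(1); the lack of this property for the spatial estimators motivates the symmetrization in Section 4.2.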
A natural modification would be to estimate G using the relationship in (4.1.3) and letting

  G*_rT = Γ_T(0) − B_rT Γ_T'(1),   (4.2.1)

where B_rT is a general notation for the estimator of B using either the YW#1 or YW#2 estimators in the known weights case, or the YW#1 estimators in the variable weights case. In general, B_{rT,ii} = a_T for all i. In the known weights case, B_{rT,ij} = b_T w_ij for all i and j ≠ i, and in the variable weights case, B_{rT,ij} = b_T v_ij(α_T) for all i and j ≠ i. For specific estimation schemes, B_rT is replaced by B_T1 in the YW#1 case and by B_T2 in the YW#2 case. We will use the general notation whenever possible in our discussion in this chapter and consider the specific cases only when necessary.

An undesirable property of G*_rT is apparent upon replacing Γ_T'(1) with Γ_T(0) B_T' in (4.2.1). That is, we have

  G*_rT = Γ_T(0) − B_rT Γ_T(0) B_T'.

Since, in general, B_rT ≠ B_T, it follows that G*_rT is not symmetric. A modification which corrects this problem would be to use G_rT as an estimator of G, where

  G_rT = (1/2)(G*_rT + G*_rT').   (4.2.2)

As in the case of B_rT, G_rT will be the general notation for the estimator, and G_T1 and G_T2 will denote the YW#1 and YW#2 estimators of G, respectively.

It follows from the work in 3.2.3 and 3.3.3 that Γ(0) is a component which enters into the calculation of the asymptotic covariance matrices of the estimators of (a,b) or (a,b,α). At this stage, we have only the moment estimator of Γ(0), Γ_T(0). It is desirable to develop another estimator which takes into account the special structure of our model. One criterion would be to develop an estimator, Γ_rT(0) in general notation, which fits into the framework of the three relationships given by (4.1.3), (4.1.4), and (4.1.5). Using (4.1.4) gives

  Γ_rT(0) = Σ_{j=0}^∞ B_rT^j G_rT B_rT'^j.   (4.2.3)

It is not clear that the right-hand side converges, because it is not known that B_rT satisfies A6, nor is it known what effect G_rT might have. (This is an area for additional research.)
Two suggestions are made for practical usage of (4.2.3):

(i) Calculate terms in the sum (4.2.3) until convergence, according to a specified criterion, is established. In our work, a correlation matrix was calculated after each step in the summation. When the absolute change in the correlations from one step to the next was arbitrarily small for all elements in the matrix, convergence was assumed.

(ii) Calculate Σ_{j=0}^L B_rT^j G_rT B_rT'^j, where L is a preassigned limit. The choice of L would probably depend on when one would expect convergence to occur if Γ(0) were calculated by using (4.1.4).

Combinations of (i) and (ii) could be used. For example, (i) could be used, with (ii) as a default if convergence does not occur before L steps. Empirical investigations like those presented in Chapter VI should provide more insight into the practical considerations of this problem.

Using (i) and/or (ii) provides an estimator that is modified according to the special structure of our model. One might then suggest using Γ_rT(0) and B_rT in (4.1.5) to get a modified estimator of Γ(1), Γ_rT(1), which in turn would be used along with B_rT and Γ_rT(0) in (4.1.3) to modify G_rT. However, since (4.1.4) and (4.1.5) imply (4.1.3), one would, in theory, not get any modification of G_rT. This would be the case if the sum in (4.2.3) did converge and it were possible to calculate all terms in the sum. In practice, however, only a finite number of these terms will be calculated in order to determine Γ_rT(0). It would seem that using only a finite number of terms would not have a major modifying effect on G_rT if the stopping rule (in summing terms in (4.2.3)) were reasonable. If the modification is not major, then B_rT, G_rT, Γ_rT(0), and Γ_rT(1) can be regarded, for practical purposes, as satisfying (4.1.3), (4.1.4), and (4.1.5).

The covariance estimation procedures discussed in this section can be summarized in three steps.
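The truncated-sum computation of (4.2.3) under rules (i) and (ii) can be sketched as follows; the function and variable names are ours, not the text's, and the inputs shown are illustrative:

```python
import numpy as np

def gamma0_estimate(B_rT, G_rT, tol=1e-6, L=500):
    """Estimate Gamma(0) by partial sums of B^j G B'^j, as in (4.2.3).

    Rule (i): stop when every entry of the correlation matrix changes
    by less than `tol` between successive partial sums.
    Rule (ii): stop after L terms regardless.
    """
    def corr(S):
        d = np.sqrt(np.diag(S))
        return S / np.outer(d, d)

    total = G_rT.copy()          # j = 0 term
    term = G_rT.copy()
    prev = corr(total)
    for _ in range(L):
        term = B_rT @ term @ B_rT.T
        total += term
        cur = corr(total)
        if np.max(np.abs(cur - prev)) < tol:
            break                # rule (i) triggered
        prev = cur
    return total                 # otherwise rule (ii) capped the sum

# Illustrative inputs (stand-ins for B_rT and G_rT).
B = np.array([[0.4, 0.2], [0.1, 0.3]])
G = np.eye(2)
g0 = gamma0_estimate(B, G)
```

When the stand-in B is stable, the result nearly satisfies the fixed-point relation Γ(0) = BΓ(0)B' + G, which gives a quick check on the stopping rule.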
Step 1: Estimate B using (a_T1, b_T1) or (a_T1, b_T1, α_T1) to calculate B_T1, or (a_T2, b_T2) to calculate B_T2.

Step 2: Estimate G with G_T1 or G_T2 using (4.2.1) and (4.2.2).

Step 3: Calculate a modified estimate of Γ(0), Γ_T1(0) or Γ_T2(0), by using (4.2.3) and following (i) and/or (ii).

4.3 Consistency of the Yule-Walker #1 and #2 Covariance Estimators

In order to show the consistency (in probability) of G_rT, we will show the consistency of each of its components. The consistency of G_rT will then follow from standard results. In the variable weights case, we have from 3.3.2 that a_T1, b_T1, and α_T1 are all consistent (in probability). It then follows that

  B_{T1,ii} = a_T1 →_P a_o = B_{ro,ii} as T → ∞ for all i

and

  B_{T1,ij} = b_T1 v_ij(α_T1) →_P b_o v_ij(α_o) = B_{ro,ij} as T → ∞ for all i and j ≠ i,

since v_ij(α) is continuous. Therefore, in the variable weights case, B_T1 is a consistent estimator, elementwise, of B_ro.

From our earlier work in the known weights case, it is enough to consider only the YW#2 estimators of (a,b). We have from 3.2.2 that both a_T2 and b_T2 are consistent. It follows then that

  B_{T2,ii} = a_T2 →_P a_o = B_{ro,ii} as T → ∞ for all i

and

  B_{T2,ij} = b_T2 w_ij →_P b_o w_ij = B_{ro,ij} as T → ∞ for all i and j ≠ i.

Therefore, in the known weights case, both B_T1 and B_T2 are consistent estimators, elementwise, of B_ro. Since this covers all of our estimation procedures, we can say, in general, that

  B_{rT,ij} →_P B_{ro,ij} as T → ∞ for all i and j.

That is, B_rT is a consistent estimator, elementwise, of B_ro.

From Lemma 4.2, we have that both Γ_T(0) and Γ_T(1) are elementwise consistent estimators of Γ(0) and Γ(1), respectively. Using this result and the consistency of B_rT, standard results give elementwise convergence in probability of the estimator in (4.2.1) to G. It then follows from the symmetrization in (4.2.2) that

  G_{rT,ij} →_P G_ij as T → ∞ for all i and j.

Thus, G_rT is an elementwise consistent and symmetric estimator of G. This result is analogous to Lemma 4.1.
In order to determine whether or not Γ_rT(0) is a consistent estimator of Γ(0), one must specify a stopping rule for the sum in (4.2.3). If (ii) is followed, let

  Γ_L(0) = Σ_{j=0}^L B_ro^j G B_ro'^j.

The results of this section imply that

  Γ_{rT,ij}(0) = [Σ_{j=0}^L B_rT^j G_rT B_rT'^j]_{ij} →_P Γ_{L,ij}(0) as T → ∞ for all i and j.

That is, Γ_rT(0) is an elementwise consistent estimator of Γ_L(0), which is a good approximation to Γ(0). Of course, from Lemma 4.2, Γ_T(0) itself is a consistent estimator of Γ(0), but it is hoped that Γ_rT(0) would perform better than Γ_T(0) for finite T, because the specific structure of our model is taken into account in the former. If a stopping rule like (i) were used, the study of the consistency of Γ_rT(0) would be more difficult, because the number of terms included in the sum would be a random variable. Consistency of the estimator in this situation was not extensively studied.

Estimation of the covariance matrices of the asymptotic distributions derived in Chapter III requires an estimator of Γ⁻¹(0). Thus, a desirable property for an estimator of Γ(0) is nonsingularity. The question of the nonsingularity of Γ_rT(0) has only been empirically considered. For the results reported in Chapter VI, Γ_rT(0) was nonsingular in all cases.

CHAPTER V
INFERENCE

5.0 Preamble

The results in Chapters II through IV represent the foundation for the inferential procedures presented in this chapter. In Section 5.1, asymptotic single-parameter test statistics and confidence intervals for a, b, and α (if appropriate) will be presented. Joint confidence intervals and tests will be considered in Section 5.2. Prediction with the general first-order autoregressive model will be discussed in Section 5.3, and these results will be applied to the spatial model in Section 5.4.
5.1 Asymptotic Single-Parameter Hypothesis Tests and Confidence Intervals

5.1.1 The Known Weights Case

One of the advantages of the parameterization of our special first-order autoregressive model is that it allows for exploratory study of the underlying process. In the known weights case, there are two effects to be studied: the location effect, represented by the parameter a, and the neighbor effect, represented by the parameter b. To perform hypothesis tests and construct confidence intervals, we use the results from 3.2.3, where it was shown that

  √T[(a_T, b_T) − (a_o, b_o)] →_D N_2(0, H_r Σ H_r') as T → ∞,   (5.1.1)

where Σ = G ⊗ Γ⁻¹(0) and H_r is either H_1 (YW#1) or H_2 (YW#2), given in 3.2.3. Let

  σ_a² = (H_r Σ H_r')_11 and σ_b² = (H_r Σ H_r')_22.

The results presented here can be applied to both the YW#1 and YW#2 estimators. Consequently, a general notation without the subscripts "1" and "2" will be used. The asymptotic univariate distributions follow directly from the asymptotic joint distributions. Thus, for large values of T, both

  z_a = (a_T − a) / (σ_a/√T) and z_b = (b_T − b) / (σ_b/√T)

can be regarded as approximate standard normal random variables. In order to test the hypothesis H_o: a = a_o, the usual z-test would be used with the test statistic z_{a_o}. If a_o = 0, this would provide a test of the hypothesis of no locational effect. That is, we test that the response at a location at time t is not explicitly related to the response at that location at time (t−1).

In the same way, to test the hypothesis H_o: b = b_o, one could use the test statistic z_{b_o}. If b_o = 0, this would provide a test of the hypothesis of no overall neighbor effect, in the sense that the response at a location at time t is not explicitly related to the response at any of the other locations at time (t−1). Note that if b_T2 were used, the test would represent a test of an overall neighbor effect with regard to a specific weight structure.
However, if b_T1 were used, one considers the specific weight structure only through Σ and the assumption that the weights are scaled to add to unity for each location. (The weights determine B which, along with G, determines Γ(0) and hence Σ.) While for theoretical considerations the YW#1 estimator can be regarded as fitting within the general framework of the YW#2 estimation scheme, it is seen that in application, at least in terms of calculating the estimate of b and performing this hypothesis test, the YW#1 approach is more general.

If one were interested in estimating a and b, the usual confidence intervals could be constructed. A (1−γ)·100% confidence interval for a would be

  a_T ± z_{γ/2} σ_a/√T,

where z_{γ/2} is such that P(z > z_{γ/2}) = γ/2 and z ~ N(0,1). Likewise, a (1−γ)·100% confidence interval for b would be

  b_T ± z_{γ/2} σ_b/√T.

Upon observing the form of σ_a or σ_b for either estimation scheme, it is clear that for practical usage of these results one would need to estimate σ_a and σ_b by σ̂_aT and σ̂_bT, respectively. Since H_1 and H_2 are both constant matrices, Σ is the only matrix that needs to be estimated. We estimate Σ by replacing G and Γ(0) with their consistent estimators, G_rT and Γ_rT(0), presented in Section 4.2. It should be noted that in using σ̂_aT and σ̂_bT as consistent estimators of σ_a and σ_b, we are taking into account the specific weight structure assumed for our model in both the YW#1 and YW#2 cases through the use of G_rT and Γ_rT(0).

5.1.2 The Variable Weights Case

For the variable weights case, the parameter α provides for a distance effect in addition to the location and neighbor effects. Recall that in 3.3.3 we showed that

  √T[(a_T1, b_T1, α_T1) − (a_o, b_o, α_o)] →_D N_3(0, Σ_1) as T → ∞,   (5.1.2)

where Σ_1 = (M_o⁻¹R_o) Σ (M_o⁻¹R_o)', M_o is given in (3.3.29), R_o is given in (3.3.30), and Σ = G ⊗ Γ⁻¹(0).
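In either the known or variable weights case, the single-parameter z statistics and intervals reduce to a few lines of arithmetic once an estimate and its asymptotic standard deviation are in hand. A sketch with made-up numbers (the estimate 0.42, standard deviation 1.1, and T = 400 are hypothetical, not from the text):

```python
import math

def z_statistic(est, hyp, sigma_hat, T):
    """z = (est - hyp) / (sigma_hat / sqrt(T)); approximately N(0,1) for large T."""
    return (est - hyp) / (sigma_hat / math.sqrt(T))

def confidence_interval(est, sigma_hat, T, z_crit=1.96):
    """(1 - gamma)100% interval: est -/+ z_{gamma/2} * sigma_hat / sqrt(T)."""
    half = z_crit * sigma_hat / math.sqrt(T)
    return est - half, est + half

# Test H0: a = 0 (no locational effect) with hypothetical values.
z = z_statistic(0.42, 0.0, 1.1, 400)
lo, hi = confidence_interval(0.42, 1.1, 400)
```

Here z exceeds 1.96, and consistently, the 95% interval excludes zero.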
Let σ_a1² = Σ_{1,11}, σ_b1² = Σ_{1,22}, and σ_α1² = Σ_{1,33}. As in 5.1.1, the appropriate test statistic to use in testing the hypothesis H_o: a = a_o would be

  z_{a_o} = (a_T1 − a_o) / (σ_a1/√T).

The interpretation of the test when a_o = 0 is the same as in 5.1.1. Similarly, the appropriate test statistic for testing H_o: b = b_o would be

  z_{b_o} = (b_T1 − b_o) / (σ_b1/√T).

In order to test H_o: α = α_o, more care must be taken. We showed in 3.3.3 that the asymptotic distribution of √T(α_T1 − α_o) exists only if b_o ≠ 0. Consequently, the following hypothesis test can be performed only if assumption A13 (b_o ≠ 0) holds. In that case, the appropriate test statistic would be

  z_{α_o} = (α_T1 − α_o) / (σ_α1/√T).

If α_o = 0, this would provide a test of the hypothesis of no distance effect among the neighbors. Under this hypothesis, the effect of a neighbor on a location does not depend on the distance between the location and the neighbor, according to our specific weight structure, because v_ij(0) = 1/n_i for all i and j ≠ i. Because A13 must be assumed for the validity of this test, it follows that one should test first for a neighbor effect and then for a distance effect among the neighbors.

We can construct (1−γ)·100% confidence intervals for a, b, and α as follows:

  a_T1 ± z_{γ/2} σ_a1/√T,  b_T1 ± z_{γ/2} σ_b1/√T,  and  α_T1 ± z_{γ/2} σ_α1/√T.

For practical usage of these results, we would need to estimate σ_a1, σ_b1, and σ_α1. In addition to estimating G and Γ(0) with G_rT and Γ_rT(0), respectively, M_o and R_o need to be estimated. The forms of M_o and R_o in (3.3.29) and (3.3.30) imply that they can be estimated consistently by using b_T1 and α_T1 to calculate M_T1 and R_T1. Since b_T1 ≠ 0 (A11), M_T1⁻¹ exists for the same reason that M_o⁻¹ exists. However, assumption A11 is necessary only if the procedures involving α_T1 are used. Since all the components involved in the determination of σ̂_aT1, σ̂_bT1, and σ̂_αT1 are consistent, it follows that σ̂_aT1, σ̂_bT1, and σ̂_αT1 are consistent estimators of σ_a1, σ_b1, and σ_α1, respectively.
The results in this section, as well as the next, are asymptotic. The empirical investigations reported in Chapter VI should provide some insight into the use of the results for finite sample sizes.

5.2 Asymptotic Multiparameter Hypothesis Tests and Confidence Regions

5.2.1 A General Result

The following lemma provides a general result that will be useful in developing multiparameter hypothesis tests and confidence regions. Since this is a known result, it is stated without proof.

Lemma 5.1:
Let θ_T be an estimator of θ_o based on T observations, with both θ_T and θ_o of length k. If √T(θ_T − θ_o) →_D N_k(0, Σ) as T → ∞, then, if Σ⁻¹ exists,

  T(θ_T − θ_o)' Σ⁻¹ (θ_T − θ_o) →_D χ²(k) as T → ∞,

where χ²(k) is the central chi-squared distribution with k degrees of freedom.

5.2.2 The Known Weights Case

By using the result in (5.1.1) and applying Lemma 5.1, one can derive the asymptotic test of the joint hypothesis H_o: a = a_o and b = b_o. The test statistic is

  χ² = T[(a_T, b_T) − (a_o, b_o)] Σ_r⁻¹ [(a_T, b_T) − (a_o, b_o)]',   (5.2.1)

where Σ_r = H_r Σ H_r' and H_r = H_1 or H_2, depending on whether the YW#1 or YW#2 estimators are being used. The form of the rejection region for a γ-level test would be χ² > χ²_γ(2), where χ²_γ(k) is such that P[χ²(k) > χ²_γ(k)] = γ. If H_o is rejected, one might use the single-parameter procedures in 5.1.1 in an attempt to determine individual differences which could have led to the rejection of H_o.

For estimation purposes, an asymptotic (1−γ)·100% joint confidence ellipsoid for (a,b) could be constructed using a technique like that in Anderson (1958:55). The ellipsoid would consist of all values of (a,b) for which

  T[(a_T, b_T) − (a, b)] Σ_r⁻¹ [(a_T, b_T) − (a, b)]' ≤ χ²_γ(2).   (5.2.2)

Since both (5.2.1) and (5.2.2) contain Σ_r⁻¹, the question of whether or not Σ_r is invertible must be answered, where

  Σ_r = H_r (G ⊗ Γ⁻¹(0)) H_r'.

Now both G and Γ⁻¹(0) are invertible, and H_r (H_1 or H_2) is clearly of rank 2, which implies that Σ_r is of rank 2 and hence that Σ_r⁻¹ exists.
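The quadratic form in (5.2.1) is straightforward to compute once an estimate of Σ_r is in hand. A sketch with hypothetical values (none of the numbers below come from the text):

```python
import numpy as np

def joint_chi2_stat(est, hyp, Sigma_r, T):
    """X^2 = T (est - hyp)' Sigma_r^{-1} (est - hyp), as in (5.2.1);
    compare to a chi-squared distribution with len(est) degrees of freedom."""
    d = np.asarray(est, dtype=float) - np.asarray(hyp, dtype=float)
    return T * d @ np.linalg.solve(Sigma_r, d)

# Hypothetical (a_T, b_T), hypothesized (0, 0), and an estimated
# asymptotic covariance Sigma_r for T = 400 observations.
est = [0.42, 0.31]
hyp = [0.0, 0.0]
Sigma_r = np.array([[1.2, 0.2],
                    [0.2, 0.9]])
x2 = joint_chi2_stat(est, hyp, Sigma_r, 400)
crit_95_df2 = 5.991          # chi-squared(2) upper 5% point
reject = x2 > crit_95_df2
```

Using `np.linalg.solve` rather than forming Σ_r⁻¹ explicitly is the standard numerically stable choice for evaluating such quadratic forms.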
The results of this section still follow when, in practice, Σ_r is estimated consistently by Σ_rT, where

  Σ_rT = H_r (G_rT ⊗ Γ_rT⁻¹(0)) H_r'.

The matrix Σ_rT will then be invertible if both G_rT and Γ_rT(0) are invertible.

5.2.3 The Variable Weights Case

More care must be taken in developing and using multiparameter procedures in the variable weights case, because α_T1's behavior can be evaluated only if b_o ≠ 0. From (5.1.2) and Lemma 5.1, it follows that to test the joint hypothesis H_o: a = a_o, b = b_o ≠ 0, and α = α_o, for large T, one can employ the test statistic

  χ² = T[(a_T1, b_T1, α_T1) − (a_o, b_o, α_o)] Σ_1⁻¹ [(a_T1, b_T1, α_T1) − (a_o, b_o, α_o)]',   (5.2.3)

where Σ_1 = (M_o⁻¹R_o)(G ⊗ Γ⁻¹(0))(M_o⁻¹R_o)'. The form of the rejection region for a γ-level test would be χ² > χ²_γ(3).

If H_o: a = a_o, b = b_o ≠ 0, and α = α_o is rejected, the single-parameter procedures of 5.1.2 could then be used to detect individual differences. If b_o = 0, one would need first to test H_o: a = a_o and b = 0 using the YW#1 procedure given by (5.2.1). If H_o is rejected, one could use the single-parameter tests in 5.1.2 to detect significant differences from the hypothesized values. The test of H_o: α = α_o would be carried out only if one were willing to assume b_o ≠ 0 (A13).

An asymptotic (1−γ)·100% joint confidence ellipsoid for (a_o, b_o, α_o) would be all values of (a, b, α) for which

  T[(a_T1, b_T1, α_T1) − (a, b, α)] Σ_1⁻¹ [(a_T1, b_T1, α_T1) − (a, b, α)]' ≤ χ²_γ(3).   (5.2.4)

Any points of the form (a, 0, α) would need to be eliminated from the ellipsoid, since we consider the joint distribution of (a_T1, b_T1, α_T1) only if b_o ≠ 0. Using (5.2.2), a confidence interval could be constructed for a in the case of b_o = 0. Even with the (a, 0, α) values removed from the ellipsoid, it would seem that (5.2.4) would be a bit difficult to portray graphically. A better procedure might be to graph contours of (a, α) for selected nonzero b-values. From (5.2.3) and (5.2.4), it is seen that Σ_1 must be invertible.
If M_o⁻¹R_o is of rank 3, it follows that Σ_1 is invertible, since both G and Γ(0) are invertible. Upon examining the form of R_o in (3.3.30), it follows that the rank of R_o is 3, since we assume that there are at least two different distances (A4). Since M_o is clearly of rank 3, it follows that M_o⁻¹R_o is of rank 3. In practice, one would use Σ_T1, a consistent estimator of Σ_1, where

  Σ_T1 = (M_T1⁻¹R_T1)(G_T1 ⊗ Γ_T1⁻¹(0))(M_T1⁻¹R_T1)'.

Since M_T1⁻¹R_T1 is of rank 3, Σ_T1 will be invertible if G_T1 and Γ_T1(0) are nonsingular.

5.3 Prediction with the General First-Order Autoregressive Multivariate Time Series Model

5.3.1 Introduction

One of the major purposes in developing a time series model is to use the model to predict or forecast future realizations of the series. Consider again the model for the first-order autoregressive multivariate time series,

  y_t = B y_{t−1} + ε_t,   (5.3.1)

where assumptions A6 and A7 are true. Suppose this process is observed for T time periods, t = 1, 2, ..., T. This section deals with the problem of predicting y_{T+k}, k = 1, 2, ..., that is, predicting k time units ahead. We begin by writing the model in (5.3.1) for t = T+k in terms of the observations by time T. We have

  y_{T+k} = B y_{T+k−1} + ε_{T+k}
         = B(B y_{T+k−2} + ε_{T+k−1}) + ε_{T+k}
         = B² y_{T+k−2} + B ε_{T+k−1} + ε_{T+k}
         ⋮
         = B^k y_T + Σ_{j=0}^{k−1} B^j ε_{T+k−j}.   (5.3.2)

It follows that

  E(y_{T+k} | y_T = ỹ_T, y_{T−1} = ỹ_{T−1}, ...) = B^k ỹ_T,

since an implication of A6 and A7 is that ε_t is independent of y_{t−1}, y_{t−2}, ... for all t. Practically, we will be interested in the expected value of y_{T+k} given only a finite number of past values. But the Markovian nature of the autoregressive model implies that

  E(y_{T+k} | y_T = ỹ_T, ..., y_1 = ỹ_1) = B^k ỹ_T = E(y_{T+k} | y_T = ỹ_T),   (5.3.3)

so that this practical consideration imposes no limitations.

5.3.2 Prediction When B is Known to be B_o

From (5.3.2), it would seem natural for one to use B_o^k y_T to predict y_{T+k} if one wished to use only a linear combination of past observations (i.e., y_T, y_{T−1}, ...).
Call this predictor ŷ_{T+k}. An application of a more general result in Hannan (1970:127-130, 135-136) leads to the conclusion that ŷ_{T+k} = B_o^k y_T is the best linear predictor of y_{T+k} using the entire past, y_T, y_{T−1}, .... The predictor ŷ_{T+k} is best in the sense that the minimum of E[(y_{T+k} − ỹ_{T+k})'(y_{T+k} − ỹ_{T+k})] is attained at ỹ_{T+k} = ŷ_{T+k}, where the minimum is taken over all linear predictors ỹ_{T+k} of y_{T+k} based on the entire past. So

  ŷ_{T+k} = B_o^k y_T   (5.3.4)

is the mean square predictor of y_{T+k}. For a particular realization of the series, ỹ_1, ỹ_2, ..., ỹ_T, the predicted value at time T+k would be ŷ_{T+k} = B_o^k ỹ_T. From (5.3.3), we see that this predicted value is such that

  ŷ_{T+k} = E(y_{T+k} | y_T = ỹ_T, ..., y_1 = ỹ_1) = E(y_{T+k} | y_T = ỹ_T).

The error of prediction is defined to be the difference between the actual value at time T+k and the predicted value. One important characteristic of these errors is their variance-covariance matrix. There are two approaches to evaluating this matrix that we will consider: one is conditional on the part of the particular past realization that is used in the prediction, ỹ_T, and the other is unconditional over all values of y_T. If these yield different results, the experimenter would then need to decide which approach is appropriate to his experimental situation.

Case 1: We consider the conditional approach first. Let V_T(·) and E_T(·) denote the conditional (on y_T = ỹ_T) variance-covariance matrix and mean vector, respectively, and let V(·) and E(·) denote their unconditional counterparts. We then have from (5.3.2), (5.3.4), and assumptions A6 and A7 that

  V_T(error of prediction) = V_T(y_{T+k} − ŷ_{T+k})
    = V_T(Σ_{j=0}^{k−1} B_o^j ε_{T+k−j})
    = Σ_{j=0}^{k−1} B_o^j G B_o'^j.   (5.3.5)

Case 2: We now consider the unconditional approach. Using (5.3.5) and the arguments used in Case 1, we have

  V(y_{T+k} − ŷ_{T+k}) = E[V_T(y_{T+k} − ŷ_{T+k})] + V[E_T(y_{T+k} − ŷ_{T+k})]
    = Σ_{j=0}^{k−1} B_o^j G B_o'^j + V(0)
    = Σ_{j=0}^{k−1} B_o^j G B_o'^j
    = V_T(y_{T+k} − ŷ_{T+k}).
This is (5.3.6): the unconditional variance-covariance matrix of the prediction errors equals the conditional one,
\[
V(Y_{T+k} - \hat Y_{T+k}) = \sum_{j=0}^{k-1} B_o^j\, G\, B_o'^{\,j} = V_T(Y_{T+k} - \hat Y_{T+k}). \tag{5.3.6}
\]
This result can also be found by an application of the general result in Jones (1964). If $k = 1$, we see that $V(Y_{T+1} - \hat Y_{T+1}) = G$, which agrees with our intuition.

Another form of the variance-covariance matrix can be derived by using the form of $\Gamma(0)$ in (4.1.4). This implies that
\[
\sum_{j=0}^{k-1} B_o^j G B_o'^{\,j}
= \Gamma(0) - \sum_{j=k}^{\infty} B_o^j G B_o'^{\,j}
= \Gamma(0) - B_o^k \Bigl[\sum_{j=0}^{\infty} B_o^j G B_o'^{\,j}\Bigr] B_o'^{\,k}
= \Gamma(0) - B_o^k\, \Gamma(0)\, B_o'^{\,k}. \tag{5.3.7}
\]
By the same reasoning that was used in a similar case in Section 4.1.2, we can conclude that this variance-covariance matrix of prediction errors approaches $\Gamma(0)$ as $k \to \infty$. This result is intuitively appealing: as one predicts farther ahead in time, the information provided by the past observations becomes less important. Consequently, the prediction variance-covariance matrix conditional on the past values approaches the unconditional variance-covariance matrix of the time series.

5.3.3 Prediction When B is Unknown

The more realistic prediction situation is to treat the matrix of coefficients, $B$, as unknown. In this situation the predictor would be
\[
\hat Y_{T+k} = B_T^k\, Y_T. \tag{5.3.8}
\]
An approximation to the variance-covariance matrix of the prediction errors will be derived here. Our approach will be similar in some respects to that of Box and Jenkins (1976:269) in the scalar case. We make the following assumption.

A15: The matrix $B_T^k$ can be regarded as being independent of $Y_T$.

Since $Y_T$ is used in the calculation of the usual Yule-Walker estimator $B_T$, we know that A15 is not strictly true. However, if $T$ is large, it would seem that the effect of $Y_T$ on $B_T$ would be relatively insignificant. Thus, A15 can be used in deriving an approximation to the variance-covariance matrix of the prediction errors. We derive this approximation by using the mean vectors and variance-covariance matrices of asymptotic distributions determined by repeated applications of Lemma 3.3 to Lemma 3.2.
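Identity (5.3.7) rewrites the finite sum in terms of the stationary covariance $\Gamma(0)$, which satisfies $\Gamma(0) = B_o\,\Gamma(0)\,B_o' + G$. A numerical check with illustrative $B$ and $G$; $\Gamma(0)$ is obtained here by vectorizing that stationarity equation:

```python
import numpy as np

# Illustrative stable coefficient matrix and innovation covariance.
B = np.array([[0.5, 0.1],
              [0.2, 0.4]])
G = np.array([[1.0, 0.3],
              [0.3, 2.0]])
n = B.shape[0]

# Stationary covariance Gamma(0) solves Gamma = B Gamma B' + G; in
# column-major vec form, vec(Gamma) = (I - B kron B)^{-1} vec(G).
vecG = G.reshape(-1, order="F")
Gamma0 = np.linalg.solve(np.eye(n * n) - np.kron(B, B),
                         vecG).reshape((n, n), order="F")

def finite_sum(B, G, k):
    """sum_{j=0}^{k-1} B^j G B'^j, the prediction-error covariance."""
    return sum(np.linalg.matrix_power(B, j) @ G @ np.linalg.matrix_power(B, j).T
               for j in range(k))

# Identity (5.3.7): finite sum = Gamma(0) - B^k Gamma(0) B'^k for each k.
for k in (1, 2, 5):
    Bk = np.linalg.matrix_power(B, k)
    assert np.allclose(finite_sum(B, G, k), Gamma0 - Bk @ Gamma0 @ Bk.T)
```

As $k$ grows, $B^k \to 0$ for a stable $B$, so the right-hand side tends to $\Gamma(0)$, matching the limit discussed above.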
We use the notation "$\doteq$" instead of "$=$" at each point where an actual moment is replaced by a moment of the corresponding asymptotic distribution. Let $B_o$ be the true (unknown) value of $B$.

Case 1: We first consider the conditional case. An application of Lemma 3.3 to Lemma 3.2 yields
\[
E(B_o^k - B_T^k) \doteq Z,
\]
where $Z$ is a matrix of zeroes. It follows then from A15 that
\[
E_T[(B_o^k - B_T^k)\, y_T] \doteq E(B_o^k - B_T^k)\, y_T \doteq 0.
\]
This result, along with (5.3.2), (5.3.8), A6, A7, and A15, implies that
\[
\begin{aligned}
V_T(Y_{T+k} - \hat Y_{T+k})
&= V_T\Bigl[(B_o^k - B_T^k)\, y_T + \sum_{j=0}^{k-1} B_o^j \varepsilon_{T+k-j}\Bigr] \\
&\doteq E_T[(Y_{T+k} - \hat Y_{T+k})(Y_{T+k} - \hat Y_{T+k})'] \\
&= E[(B_o^k - B_T^k)\, y_T\, y_T'\, (B_o^k - B_T^k)'] + E\Bigl[\sum_{j=0}^{k-1} B_o^j \varepsilon_{T+k-j}\varepsilon_{T+k-j}'\, B_o'^{\,j}\Bigr] \\
&\doteq V[(B_o^k - B_T^k)\, y_T] + \sum_{j=0}^{k-1} B_o^j G B_o'^{\,j}. \tag{5.3.9}
\end{aligned}
\]
The following lemmas will be used to derive an approximation to $V[(B_o^k - B_T^k)\, y_T]$.

Lemma 5.2: Let $X$ be an $n \times n$ matrix of random variables for which the variance-covariance matrix of $x = (X_{11}, X_{12}, \ldots, X_{1n}, X_{21}, \ldots, X_{2n}, \ldots, X_{n1}, \ldots, X_{nn})'$ is $\Sigma$. Let $A_r$ and $B_r$ be $n \times n$ constant matrices, $r = 1, 2, \ldots, k$, and let $S = \sum_{r=1}^{k} A_r X B_r$. Then the variance-covariance matrix of $s = (S_{11}, S_{12}, \ldots, S_{1n}, S_{21}, \ldots, S_{2n}, \ldots, S_{n1}, \ldots, S_{nn})'$ is $H_s \Sigma H_s'$, where
\[
H_s = \sum_{r=1}^{k} (A_r \otimes B_r').
\]
Proof: Let $A$ and $B$ be $n \times n$ constant matrices and $P = A X B$. It is claimed that the variance-covariance matrix of $p = (P_{11}, P_{12}, \ldots, P_{1n}, P_{21}, \ldots, P_{2n}, \ldots, P_{n1}, \ldots, P_{nn})'$ is $H_p \Sigma H_p'$, where $H_p = (A \otimes B')$.

Let $R = AX$ and $r = (R_{11}, R_{12}, \ldots, R_{1n}, R_{21}, \ldots, R_{2n}, \ldots, R_{n1}, \ldots, R_{nn})'$. Since $R_{ij} = A_{i\cdot}\, X_{\cdot j}$, it follows from a standard result that the variance-covariance matrix of $r$ is $H_r \Sigma H_r'$, where $H_r = \partial r / \partial x'$. Writing out these partial derivatives element by element shows that
\[
H_r = (A \otimes I_n), \tag{5.3.10}
\]
where $I_n$ is the $n \times n$ identity matrix.

Now let $T = X'$ and $t = (T_{11}, T_{12}, \ldots, T_{1n}, T_{21}, \ldots, T_{2n}, \ldots, T_{n1}, \ldots, T_{nn})'$.
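The vectorization behind Lemma 5.2 can be checked numerically: with the row-major ordering $x = (X_{11}, X_{12}, \ldots, X_{nn})'$ used throughout, the vector form of $\sum_r A_r X B_r$ is $\bigl[\sum_r (A_r \otimes B_r')\bigr] x$, from which the stated covariance $H_s \Sigma H_s'$ follows by linearity. The random matrices below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 3, 2
As = [rng.standard_normal((n, n)) for _ in range(k)]
Bs = [rng.standard_normal((n, n)) for _ in range(k)]
X = rng.standard_normal((n, n))

def rowvec(M):
    """Row-major vectorization (M11, M12, ..., Mnn)', as in Lemma 5.2."""
    return M.reshape(-1)

S = sum(As[r] @ X @ Bs[r] for r in range(k))
H_s = sum(np.kron(As[r], Bs[r].T) for r in range(k))

# s = H_s x, so Var(s) = H_s Sigma H_s' for any covariance Sigma of x.
assert np.allclose(rowvec(S), H_s @ rowvec(X))
```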
Then the variance-covariance matrix of $t$ is $H_t \Sigma H_t'$, where $H_t = \partial t / \partial x'$. Since there exist unique $r$ and $s$ such that $t_i = T_{rs} = X_{sr}$, it follows that the only nonzero element of the $i$th row of $H_t$ is the one corresponding to $X_{sr}$, which is a "1". Since $x_i = X_{rs} = T_{sr}$, the only nonzero element of the $i$th column of $H_t$ is the one corresponding to $T_{sr}$, which is a "1". This implies that $H_t = H_t'$.

Let $Q = XB$. By following the route from $X$ to $X'$ to $B'X'$ to $XB$, the previous two results imply that the variance-covariance matrix of $q = (Q_{11}, Q_{12}, \ldots, Q_{1n}, Q_{21}, \ldots, Q_{2n}, \ldots, Q_{n1}, \ldots, Q_{nn})'$ is
\[
H_t (B' \otimes I_n) H_t\, \Sigma\, H_t' (B' \otimes I_n)' H_t' = H_t (B' \otimes I_n) H_t\, \Sigma\, H_t (B \otimes I_n) H_t'.
\]
It is known that $H_t (V \otimes W) H_t' = (W \otimes V)$, where $V$ and $W$ are $n \times n$ matrices. Then the above variance-covariance matrix can be written as
\[
(I_n \otimes B')\, \Sigma\, (I_n \otimes B).
\]
Since $P = AQ$, it follows from (5.3.10) that the variance-covariance matrix of $p$ is $(A \otimes I_n)(I_n \otimes B')\, \Sigma\, (I_n \otimes B)(A' \otimes I_n)$. Simplifying, we have
\[
(A \otimes I_n)(I_n \otimes B')\, \Sigma\, (I_n \otimes B)(A' \otimes I_n) = (A \otimes B')\, \Sigma\, (A' \otimes B),
\]
since $(A \otimes I_n)(I_n \otimes B') = (A \otimes B')$. Therefore,
\[
H_p = (A \otimes B'). \tag{5.3.11}
\]
Now let $S = \sum_{r=1}^{k} A_r X B_r$, let $s$ be the corresponding vector representation, and let $H_s = \partial s / \partial x'$. Then the variance-covariance matrix of $s$ is $H_s \Sigma H_s'$. Writing $p_r$ for the vector representation of $A_r X B_r$, we have
\[
H_s = \frac{\partial s}{\partial x'} = \sum_{r=1}^{k} \frac{\partial p_r}{\partial x'} = \sum_{r=1}^{k} H_{p_r},
\]
where, from (5.3.11), $H_{p_r} = (A_r \otimes B_r')$. Therefore,
\[
H_s = \sum_{r=1}^{k} (A_r \otimes B_r').
\]
Lemma 5.3: Let $A$ and $B$ be two $n \times n$ matrices. Then a first-order approximation to $(A - B)^k$ in terms of powers of $B$ is
\[
(A - B)^k \approx A^k - \sum_{j=0}^{k-1} A^j B A^{k-1-j}.
\]
Proof: The proof will be by induction. For $k = 2$,
\[
(A - B)^2 = (A - B)(A - B) = A^2 - BA - AB + B^2 \approx A^2 - BA - AB = A^2 - \sum_{j=0}^{1} A^j B A^{1-j}
\]
as a first-order approximation in $B$. Suppose the result holds for $k$. It follows that
\[
(A - B)^{k+1} = (A - B)(A - B)^k
\approx (A - B)\Bigl(A^k - \sum_{j=0}^{k-1} A^j B A^{k-1-j}\Bigr)
\approx A^{k+1} - B A^k - \sum_{j=0}^{k-1} A^{j+1} B A^{k-1-j}
= A^{k+1} - \sum_{j=0}^{k} A^j B A^{k-j},
\]
where terms of second or higher order in $B$ have been dropped, and the result holds for $k+1$.

To derive the asymptotic variance-covariance matrix of $(B_o^k - B_T^k)\, y_T$, we consider a two-stage transformation, from $B_T$ to $B_T^k$ to $B_T^k\, y_T$.
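Lemma 5.3 can also be checked numerically: since the dropped terms are second order in $B$, shrinking $B$ by a factor of 10 should shrink the approximation error by roughly a factor of 100. The matrices below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
D = rng.standard_normal((3, 3))  # direction of the perturbation B = eps * D
k = 4

def first_order(A, B, k):
    """A^k - sum_{j=0}^{k-1} A^j B A^{k-1-j}, the expansion in Lemma 5.3."""
    Ap = [np.linalg.matrix_power(A, j) for j in range(k + 1)]
    return Ap[k] - sum(Ap[j] @ B @ Ap[k - 1 - j] for j in range(k))

errs = []
for eps in (1e-2, 1e-3):
    exact = np.linalg.matrix_power(A - eps * D, k)
    errs.append(np.abs(exact - first_order(A, eps * D, k)).max())

# Shrinking eps by 10 shrinks the error about 100-fold (quadratic decay).
assert errs[1] < 0.05 * errs[0]
```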
The following lemma gives the asymptotic variance-covariance matrix for the first stage of the transformation.

Lemma 5.4: Define $\Lambda_{Tk} = B_T^k$ and $\Lambda_{ok} = B_o^k$, and let $\lambda_{Tk}$ and $\lambda_{ok}$ be the corresponding vector representations of $B_T^k$ and $B_o^k$, respectively. Then
\[
\sqrt{T}\,(\lambda_{Tk} - \lambda_{ok}) \xrightarrow{D} N_{n^2}(0, \Delta_k) \quad \text{as } T \to \infty,
\]
where $\Delta_k = H_k \bar\Sigma H_k'$, $\bar\Sigma = (G \otimes \Gamma^{-1}(0))$, and $H_k = \sum_{j=0}^{k-1} (B_o^j \otimes B_o'^{\,k-1-j})$.

Proof: From Lemma 5.3, we have
\[
B_T^k - B_o^k = [B_o - (B_o - B_T)]^k - B_o^k
\approx -\sum_{j=0}^{k-1} B_o^j (B_o - B_T) B_o^{k-1-j}
= \sum_{j=0}^{k-1} B_o^j (B_T - B_o) B_o^{k-1-j}.
\]
We consider a first-order approximation here because that is what was used in the asymptotic distribution results in Chapter III. The above approximation to $B_T^k - B_o^k$ is just the Taylor expansion (in matrix form) of $B^k - B_o^k$ about $B_o$, to the first-order term. Consequently, all first-order partial derivatives with respect to the $B_{ij}$'s, evaluated at the $B_{o,ij}$'s, can be found in this approximating matrix. Since this matrix will be evaluated at $B = B_T$, it is enough to consider $\sum_{j=0}^{k-1} B_o^j (B_T - B_o) B_o^{k-1-j}$ for determining the asymptotic variance-covariance matrix. By applying Lemma 5.2 to Lemma 3.2, we can conclude from Lemma 3.3 that
\[
\sqrt{T}\,(\lambda_{Tk} - \lambda_{ok}) \xrightarrow{D} N_{n^2}(0, \Delta_k) \quad \text{as } T \to \infty,
\]
where $\Delta_k = H_k \bar\Sigma H_k'$, $\bar\Sigma = (G \otimes \Gamma^{-1}(0))$, and $H_k = \sum_{j=0}^{k-1} (B_o^j \otimes B_o'^{\,k-1-j})$.

Theorem 5.5: With $\Delta_k$ defined as in Lemma 5.4, we have
\[
\begin{aligned}
V_T(Y_{T+k} - \hat Y_{T+k})
&\doteq T^{-1} (I_n \otimes y_T')\, \Delta_k\, (I_n \otimes y_T) + \sum_{j=0}^{k-1} B_o^j G B_o'^{\,j} \\
&= T^{-1} (I_n \otimes y_T')\, \Delta_k\, (I_n \otimes y_T) + \Gamma(0) - B_o^k\, \Gamma(0)\, B_o'^{\,k}.
\end{aligned}
\]
Proof: From the comments following (5.3.9), we know that our objective is to approximate $V[(B_o^k - B_T^k)\, y_T]$. Since $y_T$ is regarded as a constant vector, applying Lemma 3.3 to Lemma 5.4 implies that
\[
V[(B_o^k - B_T^k)\, y_T] \doteq T^{-1} H\, \Delta_k\, H',
\]
where $H = \bigl[\partial h_i(\lambda)/\partial \lambda_j\bigr]$ and $h_i(\lambda) = \sum_{j=1}^{n} \lambda_{i,j}\, y_{T,j}$ for all $i$. We see that
\[
H =
\begin{pmatrix}
y_{T,1} & \cdots & y_{T,n} & 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
0 & \cdots & 0 & y_{T,1} & \cdots & y_{T,n} & \cdots & 0 & \cdots & 0 \\
\vdots & & & & & & \ddots & & & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 0 & \cdots & y_{T,1} & \cdots & y_{T,n}
\end{pmatrix}
= (I_n \otimes y_T').
\]
The result follows by using (5.3.9) with (5.3.7). Note that the asymptotic normality of $\sqrt{T}\,[(B_o^k - B_T^k)\, y_T]$ was an intermediate step in the preceding proof.
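Lemma 5.4 and Theorem 5.5 together give a computable approximation to the conditional prediction-error covariance when $B$ is estimated. The sketch below assembles the pieces under the row-major vec convention used in Lemma 5.2; all numerical values are illustrative, and the check at the end confirms the key step of the proof, that $(I_n \otimes y_T')$ maps the row-major vector form of a matrix $D$ to $D\,y_T$:

```python
import numpy as np

# Illustrative inputs (not from the dissertation).
n, T, k = 2, 200, 3
B = np.array([[0.5, 0.1],
              [0.2, 0.4]])   # plays the role of B_o
G = np.array([[1.0, 0.3],
              [0.3, 2.0]])   # innovation covariance
y_T = np.array([1.0, -2.0])  # last observed value

# Stationary covariance Gamma(0): solves Gamma = B Gamma B' + G.
Gamma0 = np.linalg.solve(np.eye(n * n) - np.kron(B, B),
                         G.reshape(-1, order="F")).reshape((n, n), order="F")

# Lemma 5.4: Delta_k = H_k Sigma_bar H_k' with Sigma_bar = G kron Gamma(0)^{-1}
# and H_k = sum_j (B^j kron B'^{k-1-j}).
Sigma_bar = np.kron(G, np.linalg.inv(Gamma0))
H_k = sum(np.kron(np.linalg.matrix_power(B, j),
                  np.linalg.matrix_power(B, k - 1 - j).T) for j in range(k))
Delta_k = H_k @ Sigma_bar @ H_k.T

# Theorem 5.5: approximate conditional prediction-error covariance.
H = np.kron(np.eye(n), y_T)  # (I_n kron y_T'), an n x n^2 array
Bk = np.linalg.matrix_power(B, k)
V_approx = H @ Delta_k @ H.T / T + Gamma0 - Bk @ Gamma0 @ Bk.T

# In row-major vec form, (I_n kron y_T') d equals D y_T for any n x n matrix D.
D = np.arange(n * n, dtype=float).reshape(n, n)
assert np.allclose(H @ D.reshape(-1), D @ y_T)
```

The first term shrinks like $1/T$, reflecting estimation error in $B_T$, while the second term is the known-$B$ covariance from (5.3.7).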