BAYESIAN ANALYSIS OF SMALL DOMAIN DATA IN REPEATED SURVEYS

By

NARINDER K. NANGIA

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

1992

ACKNOWLEDGEMENTS

I would like to express my sincere gratitude to Professor Malay Ghosh for his invaluable guidance and support throughout this project. I would like to thank Dr. Ronald H. Randles, Dr. P. V. Rao, Dr. Ramon C. Littell, Dr. Geoffrey G. Vining, and Dr. Sencer Yeralan for being on my committee. Much gratitude is owed to my parents, whose support, advice, guidance, and prayers throughout the years of my life have made this achievement possible. My debt to them can never be repaid in full. A very special thank you is offered to my wife, Uma, for her continuous comfort, trust, and understanding, and to my son, Nishant, for his loving support.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
CHAPTERS
1 INTRODUCTION
1.1 Literature Review
1.2 The Subject of This Dissertation
2 ESTIMATION OF SMALL AREA MEANS IN TIME DOMAIN
2.1 Introduction
2.2 Description of the Hierarchical Bayes Model
2.3 Empirical Bayes Estimation
2.4 An Application of Hierarchical Bayes Analysis
3 MULTIVARIATE ESTIMATION OF MEANS IN TIME DOMAIN
3.1 Introduction
3.2 The General Multivariate Hierarchical Bayes Model
3.3 An Illustrative Application to Survey Data
4 HIERARCHICAL BAYES PREDICTION IN FINITE POPULATIONS
4.1 Introduction
4.2 Description of the Hierarchical Bayes Model
4.3 The Hierarchical Bayes Predictor in a Special Case
4.4 Best Unbiased Prediction in Small Area Estimation
4.5 Stochastic Domination
4.6 Best Equivariant Prediction
5 SUMMARY AND FUTURE RESEARCH
5.1 Summary
5.2 Future Research
APPENDIX: PROOF OF THEOREM 4.2.1
BIBLIOGRAPHY
BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

BAYESIAN ANALYSIS OF SMALL DOMAIN DATA IN REPEATED SURVEYS

By

NARINDER K. NANGIA

December 1992

Chairman: Malay Ghosh
Major Department: Statistics

We consider in this dissertation estimation and prediction of local area means based on repeated surveys over time. A general hierarchical Bayes (HB) model is used for this purpose. The resulting HB estimators for a small area "borrow strength" over time in addition to borrowing strength from other local areas. The Bayesian procedure provides current as well as retrospective estimates simultaneously for all the small areas. The HB methodology, in infinite as well as finite population sampling situations, has been implemented by adopting a Monte Carlo integration technique, the Gibbs sampler. A multivariate HB model is considered to exploit the interrelationship among the parameters in improving the small area estimators. Both the univariate and multivariate methodologies have been implemented on the median income estimation problem. The results show that the estimators borrowing strength over time and from other local areas are substantially better than the estimators which borrow strength only from other local areas. In finite population sampling, the predictive distribution of a characteristic of interest for the unsampled population units is found given the observations on the sampled units and is used to draw inference.
In particular, we have provided HB predictors and the associated standard errors of the local area means over time. In a special case of this HB analysis, when the vector of the ratios of the variance components is known, the HB predictor of the vector of local area means is shown to possess some frequentist optimality properties. For example, among other properties, we have shown that the HB predictors are best unbiased and best equivariant predictors under a suitable group of transformations.

CHAPTER 1
INTRODUCTION

1.1 Literature Review

The existence of small area (domain) statistics based either on a census or on other administrative records aiming at complete enumeration can be traced back to the 11th century in England and the 17th century in Canada (see Brackstone (1987)). For the past few decades, sample surveys have for most purposes taken the place of complete enumeration as the means of data collection. These surveys have been able to provide sample-based estimates of adequate precision for the parameters of interest at the national and subnational levels. However, the overall sample size in such surveys is usually determined with the target of providing specified accuracy at a much higher level of aggregation than that of the small areas. The resulting small sample sizes for the small areas are usually insufficient to provide sample-based estimates of adequate precision for parameters at the small area level and are likely to lead to large estimation errors. On the other hand, the reliability of small area statistics is crucial for regional economic planning and administration. The evaluation of state economic policies and the identification of subgroups or regions of the population which are not keeping up with the average rely heavily on small area statistics. Furthermore, decisions concerning investments and the location of small industrial units by the corporate sector are usually based on local social, economic, and environmental conditions.
As a consequence, there is a growing interest in the development of reliable small area statistics. In recent years, several model-based estimation techniques have emerged, resulting in improved small area estimates. The essence of these small area estimation techniques is to borrow strength from similar neighboring areas for estimation and prediction purposes. These estimation techniques use models, either explicitly or implicitly, that "connect" the small areas through supplementary data (e.g., census and administrative data). Early work of Ericksen (1973, 1974) employed the simple regression method that combines sample data with symptomatic and census data for estimating the population of local areas. By adopting a regression model similar to that of Ericksen (1974), Fay and Herriot (1979) suggested a procedure that used a constrained weighted average of the model-dependent and sample-based estimators, with weights inversely proportional to the average lack of fit of the regression and to the variance of the sample estimate. These weighted average estimates were constrained to be within one standard error of the sample estimates. Ghosh and Rao (1991) have recently surveyed most of the existing small area estimation methods. The simultaneous estimation of the parameters at a given time for several small areas (strata) by employing a general linear model with covariates has been considered by Fay and Herriot (1979), Ericksen and Kadane (1985, 1987), Ghosh and Meeden (1986), Ghosh and Lahiri (1988), Battese, Harter and Fuller (1988), and Prasad and Rao (1990), using either a variance components approach or an empirical Bayes (EB) approach. An EB approach does not incorporate uncertainty due to the estimation of prior parameters and thus leads to underestimation of the true posterior variance. The underestimation problem gets more severe when one or more variance components of the model are unknown.
An alternative hierarchical Bayes (HB) procedure for small area estimation at a given time has been considered in Ghosh and Lahiri (1988) and Datta and Ghosh (1991). Both the EB and HB procedures often lead to similar point estimates of the parameters of interest. We may also add that, in contrast to the EB approach, the HB approach provides small area estimators that have natural measures of standard errors associated with them, namely the posterior standard deviations. In the sample survey literature, the importance of the model-based approach to small area estimation has been well recognized. However, the model-based techniques for small area estimation considered so far borrowed strength only from other similar areas to improve small area estimators. Often, however, these small area (strata) populations are subject to change over time. Usually the time series information is available through repeated surveys (e.g., market research surveys, crop cutting experiments, etc.) carried out at regular time intervals by industries and government agencies. As mentioned in Scott and Smith (1974), the Indian National Sample Survey, a multipurpose survey carried out annually since 1950, and the Medical Data Index in the U.K., a market research survey of doctors, are examples of nonoverlapping surveys with human populations. In such surveys, one has at one's disposal not only the current data, but also data from similar past experiments. Typically, only a few samples are available from an individual area. In the past, time series models have usefully been applied to the analysis of repeated survey data from random samples. Early results in this area (see, for example, Jessen (1942), Patterson (1950), Rao and Graham (1964), Raj (1965) and Singh (1968)) did not assume any relationship between successive values of the population mean.
Blight and Scott (1973) and Scott and Smith (1974) derived estimates for the mean of a time-dependent population using a first-order autoregressive model under the assumption that all the parameters of the model are known. Rodrigues and Bolfarine (1987) considered the prediction of the population total (in a finite population) using a Bayesian approach based on the Kalman filter estimating algorithm. However, no results are available concerning the estimation of the parameters for the small areas under dynamic superpopulation models wherein past information can be effectively used in conjunction with the current information.

1.2 The Subject of This Dissertation

In this dissertation, we present a Bayesian analysis of small domain data in repeated surveys. Estimation and prediction methodology have been developed for the infinite as well as finite population sampling situations. We consider simultaneous estimation of several strata means through a time series Bayesian estimation model. In Chapter 2, we introduce an HB model for simultaneous estimation of several small area means in infinite population sampling, assuming that the observations are available only at the aggregate level. The posterior distribution is provided through a fully Bayesian procedure which uses a Monte Carlo numerical integration technique, the Gibbs sampling algorithm. The resulting small area estimators "borrow strength" over time in addition to borrowing strength from other local areas. The methodology developed is applied to the problem of estimation of the median income of four-person families by states. In Chapter 3, we present a multivariate HB model for simultaneous estimation of several small area means in infinite population sampling, once again assuming that the observations are available only at the aggregate level. Our aim is to exploit the interrelationship (if any) among the parameters in improving the small area estimators.
We have also implemented the proposed analysis on the median income estimation problem considered in the previous chapter through the use of median incomes of three- and five-person families. The multivariate analysis in this case shows definite improvement in the median income estimates when compared to the univariate analysis in Chapter 2. Chapter 4 addresses the simultaneous estimation of several strata means at different time points, where each stratum at each time point contains a finite number of elements. We assume that the observations are available at the unit level. The HB model considered in this chapter can be regarded as an extension of the work of Datta and Ghosh (1991), who used general linear models for small area estimation. While the HB predictors of Datta and Ghosh (1991) utilize only the data available at a given time, our HB predictors for small areas utilize the data available at a given time along with all the data available at previous time points. Also, in addition to the current small area estimates, our model provides retrospective estimates for the small areas at all the previous time points. This chapter also considers a special case of the time domain HB model for finite population sampling, assuming known variance ratios. In such a setup, the HB predictors have been shown to possess certain optimal properties. It has been shown that the HB predictors are best unbiased predictors within the class of all unbiased predictors. Under milder assumptions, these predictors are best linear unbiased predictors. The stochastic dominance (cf. Hwang, 1985) of these predictors over the linear unbiased predictors for elliptically symmetric distributions has also been established in this chapter. Finally, we have shown that these predictors are also best equivariant predictors, for both the matrix loss and general quadratic loss, under a suitable group of transformations for a class of elliptically symmetric distributions.
CHAPTER 2
ESTIMATION OF SMALL AREA MEANS IN TIME DOMAIN

2.1 Introduction

This chapter considers the problem of small domain estimation based on data from repeated surveys. Estimation of the characteristic of interest, such as the population mean, the change in mean over two distinct time points, or other parameters of interest, is considered by using a hierarchical Bayes (HB) approach. Such models, in contrast to the ones considered in Chapter 4, will be appropriate when observations are available at an aggregate level rather than at a unit level. The previous work of Datta, Fay and Ghosh (1991) used an HB approach for analyzing such aggregate data for several small areas at a fixed point of time. Also, the empirical and subjective Bayes approaches considered earlier by Fay and Herriot (1979), Ericksen and Kadane (1985, 1987), and Ericksen, Kadane and Tukey (1989) involved similar analysis of small domain data where no time series model was used. One common theme in all these papers was to "borrow strength" from other similar areas to obtain improved small area estimators. The present study borrows strength over time in addition to borrowing strength from other local areas. On another front, a Bayesian analysis of data from repeated surveys which does not address the problem of simultaneous estimation for several small areas is also available in the literature. Mention may be made in this context of the work of Blight and Scott (1973) and Scott and Smith (1974), who considered estimation procedures for a population whose mean varied over time and applied standard normal time series methods. These results were later generalized by Rodrigues and Bolfarine (1987), who used a dynamic linear model and the Kalman filter estimating algorithm. Our model encompasses all these models considered earlier as special cases when the number of local areas is one and the variance components, or at least the variance ratios, are known.
In addition, the model that we consider in Section 2.2 has several novel features. Most of the models considered earlier for time series analysis of survey data assumed known variance components. The present work does not need such an assumption. Instead, the HB method assigns distributions (often noninformative) to these variance components and thereby, in effect, models the uncertainty due to unknown prior parameters. In consequence, the usual Kalman filter updating algorithm no longer works, but the problem is addressed in a more realistic manner without invoking the assumption of known variances or known ratios of variances. Also, the computational problems are overcome by using Monte Carlo integration techniques such as Gibbs sampling (see Gelfand and Smith (1990)). Our model also bears some analogy to the one considered in Sallas and Harville (1988) when the number of local areas is one. However, while the Sallas-Harville approach is primarily an empirical Bayes (EB) approach, ours is an HB approach. The advantage of the present approach over an EB approach is that while the latter does not incorporate uncertainty due to estimation of the prior parameters, and as such typically underestimates the standard errors that need to be reported with the EB estimates, the former produces more honest estimates of the standard errors. The outline of the remaining sections is as follows. Section 2 describes the basic hierarchical model, describes the Gibbs sampling algorithm, and computes the necessary posterior distributions needed to implement the Gibbs sampling procedure. This section also provides formulas for the posterior mean and variance via Gibbs sampling. An alternative EB procedure is described in Section 3. The methodology developed in Section 2 is illustrated in Section 4 with the problem of estimation of the median income of four-person families by states.
2.2 Description of the Hierarchical Bayes Model

Let Y_ij denote the standard sample survey estimate of some characteristic θ_ij, such as the population mean, based on a given sample at time j for the ith small area (i = 1, ..., m; j = 1, ..., t). Our objective is to optimally estimate a linear function τ = Σ_i Σ_j l_ij θ_ij, using all the data available up to time t. In particular, we may be interested in estimating the current population means for all the small areas based on several sets of sample means available over time. To this end, we consider the following HB model:

I. Y_ij | θ_ij ~ ind. N(θ_ij, V_ij);
II. θ_ij | α, b_j, ψ_j ~ ind. N(x_ij^T α + z_ij^T b_j, ψ_j) (i = 1, ..., m; j = 1, ..., t);
III. b_j | b_{j-1}, W ~ ind. N(H_j b_{j-1}, W) (j = 1, ..., t);
IV. Marginally α, ψ_1, ..., ψ_t, and W are mutually independent with α ~ Uniform(R^p), r_j = ψ_j^{-1} ~ Gamma(½a_j, ½h_j), and W ~ inverse Wishart(S, k).

In (II), x_ij and z_ij are known design vectors of size p and q respectively. We say that a random variable R has a Gamma(a, b) distribution if it has a pdf of the form

f(r) ∝ exp(−ar) r^{b−1} I_(0,∞)(r), a > 0, b > 0,

where I denotes the usual indicator function. We say that a q × q positive definite random matrix W has an inverse Wishart distribution with parameters S and k if it has a pdf of the form

f(w) ∝ |W|^{−(k+q+1)/2} exp[−½ trace(kSW^{−1})],

where k > 0 is a known scalar degrees of freedom parameter, and S is a q × q positive definite matrix. This means that W^{−1} has a Wishart distribution with degrees of freedom k and scale parameter S. In (IV), we shall allow the possibility of diffuse priors for r_j or W, e.g., a_j = 0, h_j = 0 for j = 1, ..., t, and S = 0.

In the case when t = 1, the suffix j can be omitted and Y_i1 = Y_i is the usual survey estimate of the small area parameter θ_i. Write ψ = ψ_1, x_i = x_i1, z_i = z_i1. In such a case, we can write the model I-III as

A. Y_i | θ_i ~ ind. N(θ_i, V_i);
B. θ_i | α, W ~ ind. N(x_i^T α, ψ + z_i^T W z_i).

Special cases of this model are considered in Fay and Herriot (1979) and Ericksen and Kadane (1985, 1987), where z_i = 0.
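To fix ideas, stages I-III above can be simulated forward. The sketch below is illustrative only: the dimensions m, t, p, q, the parameter values for α, ψ_j, W, and V_ij, the random design vectors, and the choice H_j = I are all assumptions for the example, not quantities from the data analysed later in this chapter.

```python
import numpy as np

# Minimal forward simulation of stages I-III of the HB model.
rng = np.random.default_rng(0)
m, t, p, q = 5, 4, 2, 1           # areas, time points, dim(alpha), dim(b_j)
alpha = np.array([1.0, 0.5])      # regression coefficients (assumed)
psi = np.full(t, 0.25)            # stage-II variances psi_j (assumed)
W = np.eye(q) * 0.1               # innovation covariance of the b_j chain
V = np.full((m, t), 0.5)          # known sampling variances V_ij (assumed)
x = rng.normal(size=(m, t, p))    # design vectors x_ij
z = np.ones((m, t, q))            # design vectors z_ij

b = np.zeros((t, q))
b_prev = np.zeros(q)              # b_0 = 0
theta = np.zeros((m, t))
Y = np.zeros((m, t))
for j in range(t):
    # Stage III: b_j | b_{j-1} ~ N(H_j b_{j-1}, W), taking H_j = I here
    b[j] = rng.multivariate_normal(b_prev, W)
    b_prev = b[j]
    for i in range(m):
        # Stage II: theta_ij ~ N(x_ij' alpha + z_ij' b_j, psi_j)
        mean_ij = x[i, j] @ alpha + z[i, j] @ b[j]
        theta[i, j] = rng.normal(mean_ij, np.sqrt(psi[j]))
        # Stage I: Y_ij | theta_ij ~ N(theta_ij, V_ij)
        Y[i, j] = rng.normal(theta[i, j], np.sqrt(V[i, j]))

print(Y.shape)  # (5, 4)
```

The matrix Y plays the role of the aggregate survey estimates; the inferential task of the chapter is to recover the θ_ij from Y alone.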
In the case when m = 1, the suffix i can be omitted and Y_1j = Y_j is the usual survey estimate of the parameter θ_j at time j. In such a case, our model in (I-III) can be compared with some of the models advocated for the analysis of repeated survey data. To see this, it is convenient to write (I-III) as:

(i) Y_j = x_j^T α + z_j^T b_j + e_j, e_j ~ N(0, V_j);
(ii) b_j = H_j b_{j-1} + w_j, w_j ~ N(0, W).

We first consider the time series model of Scott and Smith (1974) for the analysis of repeated surveys. The special case of their model in which a first-order autoregressive process is assumed for the characteristic of interest θ_j (which in this setup corresponds to b_j) can be identified with (i) and (ii). In general, stages I and II of the model can be combined into a single stage

Y_ij = x_ij^T α + z_ij^T b_j + e_ij, where e_ij ~ ind. N(0, V_ij + ψ_j).

Our model bears resemblance to the one considered in Sallas and Harville (1988). However, there is a slight difference in the variance structure, which can be seen from the combined stage above. In contrast, in Sallas and Harville (1988), V(e_j) = σ² R_j, with R_j known and σ² possibly unknown, where e_j = (e_1j, ..., e_mj)^T. We emphasize once again that the HB structure considered in this chapter assigns distributions to the parameters α, ψ_1, ..., ψ_t, and W. To our knowledge, the problem has not been addressed at this level of generality before, even within a Bayesian framework.

We now describe the Gibbs sampling algorithm in its general framework. Gibbs sampling is a Markovian updating scheme which requires sampling from full conditional distributions. Densities are denoted generically by square brackets, so that joint, conditional, and marginal densities appear as [U, V], [U | V], [V], etc. Given a collection of random variables (real or vector valued) U_1, ..., U_k, the joint density [U_1, ..., U_k] is uniquely determined by the full conditionals [U_s | U_r, r ≠ s], s = 1, ..., k, under certain mild conditions (see Besag (1974), pp. 194-195).
The interest is in finding the marginal distributions [U_s], s = 1, ..., k. The Gibbs sampling algorithm is as follows: given an arbitrary set of starting values U_1^(0), ..., U_k^(0), draw U_1^(1) from [U_1 | U_2^(0), ..., U_k^(0)], then draw U_2^(1) from [U_2 | U_1^(1), U_3^(0), ..., U_k^(0)], ..., and finally U_k^(1) from [U_k | U_1^(1), ..., U_{k-1}^(1)] to complete one iteration of the scheme. After t such iterations, we arrive at (U_1^(t), ..., U_k^(t)). There are two alternative ways to obtain i.i.d. k-tuples (U_1g^(t), ..., U_kg^(t)), g = 1, ..., G, from the joint distribution:

Parallel: replicate the entire process of t iterations in parallel G times, storing only the last (tth) iterate to provide (U_1g^(t), ..., U_kg^(t)), g = 1, ..., G.

Sequential: employ only one single long run of t × G Gibbs iterates, storing every tth iterate to provide (U_1g^(t), ..., U_kg^(t)), g = 1, ..., G.

Gelfand and Smith (1990) adopted the parallel algorithm in simulating samples from posterior distributions. These simulated sample observations can then be used for estimation of any of the marginal densities. In particular, assuming that the conditional density f(U_s | U_r, r ≠ s) is available in closed form, a Rao-Blackwellized sample-based estimate of the marginal density of U_s (see Gelfand and Smith (1991)) is given by

f̂(U_s) = G^{-1} Σ_{g=1}^G f(U_s | U_rg^(t), r ≠ s).

We will also be employing the parallel implementation of the Gibbs sampling algorithm in our analysis of the time domain HB model. In order to implement the Gibbs sampling algorithm, we require samples from the following complete conditional distributions:

α | y, θ, b_1, ..., b_t, r_1, ..., r_t, W
b_j | y, θ, α, b_1, ..., b_{j-1}, b_{j+1}, ..., b_t, r_1, ..., r_t, W, j = 1, ..., t
r_1, ..., r_t | y, θ, b_1, ..., b_t, W
W | y, θ, b_1, ..., b_t, r_1, ..., r_t
θ_ij | y, b_1, ..., b_t, r_1, ..., r_t, W (i = 1, ..., m; j = 1, ..., t)

The hierarchical structure of the model implies that

(i) α | y, θ, b_1, ..., b_t, r_1, ..., r_t, W ~ α | θ, b_1, ..., b_t, r_1, ..., r_t
(iia) b_j | y, θ, α, {b_l, l ≠ j}, r_1, ..., r_t, W ~ b_j | θ, α, b_{j-1}, b_{j+1}, r_j, W (j = 1, ..., t−1)
(iib) b_t | y, θ, α, b_1, ..., b_{t-1}, r_1, ..., r_t, W ~ b_t | θ, α, b_{t-1}, r_t, W
(iii) r_j | y, θ, α, b_1, ..., b_t, W ~ r_j | α, θ, b_1, ..., b_t (j = 1, ..., t)
(iv) W | y, θ, b_1, ..., b_t, r_1, ..., r_t ~ W | b_1, ..., b_t
(v) θ_ij | y, b_1, ..., b_t, r_1, ..., r_t, W ~ θ_ij | y_ij, r_j, b_j, α (i = 1, ..., m; j = 1, ..., t)

Formal derivation of the complete conditional distributions is now straightforward and is omitted. The complete conditional distributions are given by

(i) α | θ, b_1, ..., b_t, r_1, ..., r_t ~ N_p[(Σ_{i=1}^m Σ_{j=1}^t r_j x_ij x_ij^T)^{-1} Σ_{i=1}^m Σ_{j=1}^t r_j x_ij(θ_ij − z_ij^T b_j), (Σ_{i=1}^m Σ_{j=1}^t r_j x_ij x_ij^T)^{-1}]

(iia) For j = 1, ..., t−1:
b_j | θ, α, b_{j-1}, b_{j+1}, r_j, W ~ N_q[(r_j Σ_{i=1}^m z_ij z_ij^T + H_{j+1}^T W^{-1} H_{j+1} + W^{-1})^{-1} {r_j Σ_{i=1}^m z_ij(θ_ij − x_ij^T α) + H_{j+1}^T W^{-1} b_{j+1} + W^{-1} H_j b_{j-1}}, (r_j Σ_{i=1}^m z_ij z_ij^T + H_{j+1}^T W^{-1} H_{j+1} + W^{-1})^{-1}]

(iib) b_t | θ, α, b_{t-1}, r_t, W ~ N_q[(r_t Σ_{i=1}^m z_it z_it^T + W^{-1})^{-1} {r_t Σ_{i=1}^m z_it(θ_it − x_it^T α) + W^{-1} H_t b_{t-1}}, (r_t Σ_{i=1}^m z_it z_it^T + W^{-1})^{-1}]

(iii) r_j | α, θ, b_1, ..., b_t ~ Gamma(½(a_j + Σ_{i=1}^m (θ_ij − x_ij^T α − z_ij^T b_j)²), ½(h_j + m)) (j = 1, ..., t)

(iv) W | b_1, ..., b_t ~ inverse Wishart(S + Σ_{j=1}^t (b_j − H_j b_{j-1})(b_j − H_j b_{j-1})^T, k + t)

(v) θ_ij | y_ij, α, b_j, r_j ~ N[(V_ij^{-1} + r_j)^{-1}(V_ij^{-1} y_ij + r_j(x_ij^T α + z_ij^T b_j)), (V_ij^{-1} + r_j)^{-1}]

Our objective is to find the posterior expectation and variance of θ_ij given y. A Rao-Blackwellized estimate of E(θ_ij | y) based on simulated samples from the above conditional distributions can be approximated by

Ê(θ_ij | y) = G^{-1} Σ_{g=1}^G E(θ_ij | y, α = α_g^(t), W = W_g^(t), b_j = b_jg^(t), r_j = r_jg^(t), j = 1, ..., t).

The posterior variance can be approximated by

V̂(θ_ij | y) = G^{-1} Σ_{g=1}^G E²(θ_ij | y, α = α_g^(t), W = W_g^(t), b_j = b_jg^(t), r_j = r_jg^(t), j = 1, ..., t) − [G^{-1} Σ_{g=1}^G E(θ_ij | y, α = α_g^(t), W = W_g^(t), b_j = b_jg^(t), r_j = r_jg^(t), j = 1, ..., t)]² + G^{-1} Σ_{g=1}^G V(θ_ij | y, α = α_g^(t), W = W_g^(t), b_j = b_jg^(t), r_j = r_jg^(t), j = 1, ..., t).

2.3 Empirical Bayes Estimation

In an EB formulation, we assume (I-III) of the model, but not (IV). Also, it is assumed that W is a known matrix.
In such a case, we may write the model as

Y_ij | θ_ij ~ ind. N(θ_ij, V_ij), θ_ij ~ ind. N(x_ij^T α, ψ_j + z_ij^T (W + Σ_{l=1}^{j-1} H^l W (H^T)^l) z_ij),   (2.3.1)

assuming known b_0, i.e., b_0 = 0 without any loss of generality. It follows from (2.3.1) that

Y_ij ~ ind. N(x_ij^T α, D_ij + V_ij),   (2.3.2)

where

D_ij = ψ_j + z_ij^T (W + Σ_{l=1}^{j-1} H^l W (H^T)^l) z_ij.   (2.3.3)

From (2.3.2), the posterior distribution of θ_ij given Y_ij can be obtained as

θ_ij | Y_ij ~ N[x_ij^T α + D_ij(D_ij + V_ij)^{-1}(Y_ij − x_ij^T α), D_ij V_ij(D_ij + V_ij)^{-1}].   (2.3.4)

The Bayes estimator of θ_ij under any quadratic loss is its posterior mean, given by

θ̂_ij = x_ij^T α + D_ij(D_ij + V_ij)^{-1}(Y_ij − x_ij^T α),   (2.3.5)

which may also be seen as a weighted average of the survey estimate Y_ij and the regression fit x_ij^T α. In an EB analysis, at least one of the parameters α and ψ_j (j = 1, ..., t) is unknown and needs to be estimated from the marginal distribution of the Y_ij's. First we estimate α. Marginally, Y_ij ~ ind. N(x_ij^T α, D_ij + V_ij). Using the method of maximum likelihood, α can be estimated by

α̂ = (Σ_{i=1}^m Σ_{j=1}^t (D_ij + V_ij)^{-1} x_ij x_ij^T)^{-1} Σ_{i=1}^m Σ_{j=1}^t (D_ij + V_ij)^{-1} x_ij Y_ij.   (2.3.6)

An EB estimator of θ_ij is obtained by substituting α̂ for α in (2.3.5) and is given by

θ̂_ij^EB = x_ij^T α̂ + D_ij(D_ij + V_ij)^{-1}(Y_ij − x_ij^T α̂).   (2.3.7)

The EB approach now attempts to estimate ψ_j by the maximum likelihood method or some other standard method. For example, noting the expectation of Σ_{i=1}^m (Y_ij − x_ij^T α̂)²/(D_ij + V_ij), one can solve iteratively

Σ_{i=1}^m (Y_ij − x_ij^T α̂)²/(D_ij + V_ij) = m − q   (2.3.8)

in conjunction with (2.3.6) to find ψ̂_j if a positive solution for ψ_j exists, letting ψ̂_j = 0 when no positive solution can be found. We observe that the EB approach and the HB approach (considered in Section 2.2) will lead to the same point estimator of θ_ij. However, the EB procedure, by ignoring the uncertainty in estimating the unknown parameters, will lead to underestimation of the true posterior standard errors. The complete HB analysis, though involved, provides more reliable estimates of standard errors. It is for this reason that we employ only the HB approach in the illustrative example considered in Section 2.4.
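The shrinkage form of the EB point estimator (2.3.6)-(2.3.7) can be sketched numerically. The sketch takes the model variances D_ij as known for simplicity (the text estimates ψ_j, and hence D_ij, iteratively via (2.3.8)); the dimensions and all numerical values are illustrative assumptions.

```python
import numpy as np

# EB sketch: weighted-least-squares alpha_hat, then shrink each survey
# estimate Y_ij toward the regression fit with weight D_ij/(D_ij + V_ij).
rng = np.random.default_rng(2)
m, t, p = 10, 3, 2
x = rng.normal(size=(m, t, p))                 # design vectors x_ij (assumed)
alpha_true = np.array([1.0, -0.5])             # illustrative coefficients
D = np.full((m, t), 0.4)                       # model variances (taken known)
V = np.full((m, t), 0.6)                       # sampling variances
theta = np.einsum('ijk,k->ij', x, alpha_true) + rng.normal(0.0, np.sqrt(D))
Y = theta + rng.normal(0.0, np.sqrt(V))        # survey estimates

# (2.3.6): alpha_hat = (sum w x x')^{-1} sum w x Y, with w_ij = 1/(D_ij+V_ij)
w = 1.0 / (D + V)
A = np.einsum('ij,ijk,ijl->kl', w, x, x)
c = np.einsum('ij,ijk,ij->k', w, x, Y)
alpha_hat = np.linalg.solve(A, c)

# (2.3.7): shrink Y toward the fitted regression surface
fit = np.einsum('ijk,k->ij', x, alpha_hat)
shrink = D / (D + V)                           # weight on the data
theta_eb = fit + shrink * (Y - fit)
```

Because the shrinkage weight lies strictly between 0 and 1, each θ̂_ij^EB lies between the direct survey estimate Y_ij and the synthetic regression estimate x_ij^T α̂, which is the compromise the section describes.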
2.4 An Application of Hierarchical Bayes Analysis

In this section, we apply the HB procedure developed in Section 2.2 to an actual data set. Our application concerns estimation of the median income of four-person families by state. Before employing the HB model for median income estimation, we briefly give the background of this problem. The details are given in Fay (1987). The U.S. Department of Health and Human Services (HHS) administers a program of energy assistance to low-income families. Eligibility for the program is determined by a formula in which the most important variable is an estimate of the current median income of four-person families by state (the 50 states and the District of Columbia). The Bureau of the Census, by an informal agreement, has provided estimates of the median income of four-person families by state to HHS through a linear regression methodology since the latter part of the 1970s. Two basic approaches have been used, with the change in methodology occurring for income year 1985. In the earlier method, sample estimates Y of the state medians for the most current year c, as well as the associated standard errors, are first obtained through the Current Population Survey (CPS). The estimates Y were used as dependent variables in a linear regression procedure with the single predictor variable

Adjusted census median(c) = [BEA PCI(c)/BEA PCI(b)] × Census median(b),   (2.4.9)

where Census median(b) represents the median income of four-person families in the state for the base year b from the most recently available decennial census. In the above, BEA PCI(c) and BEA PCI(b) represent estimates of per-capita income produced by the Bureau of Economic Analysis of the U.S. Department of Commerce for the current year c and the base year b, respectively. Thus (2.4.9) attempts to adjust the base-year census median by the proportional growth in the BEA PCI to arrive at the current-year adjusted median.
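Formula (2.4.9) is a simple proportional adjustment, sketched below; the income and PCI figures in the example are made up for illustration and do not come from the CPS or BEA data.

```python
# (2.4.9): scale the base-year census median by the growth in BEA
# per-capita income between the base year b and the current year c.
def adjusted_census_median(census_median_b, bea_pci_c, bea_pci_b):
    """Adjusted census median(c) = [BEA PCI(c)/BEA PCI(b)] * Census median(b)."""
    return (bea_pci_c / bea_pci_b) * census_median_b

# e.g. a base-year median of 20000 with PCI growing from 8000 to 10000
print(adjusted_census_median(20000.0, 10000.0, 8000.0))  # 25000.0
```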
Moreover, in developing the model, the states were divided on the basis of population into four groups of 12 or 13 states each. Model variances for each group were computed as the average of squared residuals of 1969 census medians fitted by a linear regression with an intercept term along with the adjusted census median for 1969 (as given in (2.4.9)) as a covariate. Next, a weighted average of the CPS sample estimate of the current median income and the regression estimate is obtained, weighting the regression estimate inversely proportional to the model error due to the lack of fit for the census values of 1969 by the model with 1959 used as the base year, and weighting the sample estimate inversely proportional to the sampling variance. From a Bayesian angle, the above composite estimate is obtainable from the simple hierarchical model of Lindley and Smith (1972). Finally, the above composite estimate was constrained not to deviate by more than one standard deviation from the corresponding sample estimate. Thus the final estimates were analogous to the "limited translation estimates" as discussed, for example, in Efron and Morris (1971) or Fay and Herriot (1979). The model was reviewed once the median income of four-person families by state in 1979 became available from the 1980 census. The following modifications to the previous modelling were suggested on the basis of this review. First, it was decided to consider a single level of model error across all the states, the reason being that the differences between the estimated model errors for the larger and smaller states greatly exaggerated more current differences. Second, the predictive value of the regression appeared usefully improved if, in addition to the adjusted census median(c), the census median(b) was also included as a second independent variable. The idea was to adjust for any possible overstatement of the effect of change in BEA income upon the median incomes of four-person families.
Also, the one-standard-deviation constraint was removed. This matter is discussed in Fay (1987). So far, the entire time series of the sample median income estimates available for each year since the preceding census has not been used to obtain the median income estimates for four-person families. Our model in Section 2.2 exploits the past time series information available at a given time in addition to the most recent census medians. Before we describe the hierarchical model for median income estimation, we need to introduce some notation. Let Y_ij denote the sample estimate of the median income of four-person families in state i for year j. The true median corresponding to Y_ij is denoted by θ_ij. Let x_ij1 and x_ij2 denote respectively the adjusted census median income for year j (as given in (2.4.9)) and the base-year census median income for four-person families in the ith state (i = 1, ..., 51). Finally, let

x_ij = (1, x_ij1, x_ij2)^T, (i = 1, ..., 51; j = 1, ..., 10),   (2.4.10)

and denote by α = (α_1, α_2, α_3)^T the vector of regression coefficients. The following hierarchical model is used to estimate the θ_ij's:

I. Conditional on the θ_ij's, the Y_ij are mutually independent with Y_ij ~ N(θ_ij, v_ij) (i = 1, ..., 51; j = 1, ..., 10);
II. θ_ij | α, b_j, ψ_j ~ ind. N(x_ij^T α + b_j, ψ_j) (i = 1, ..., 51; j = 1, ..., 10);
III. b_j | b_{j-1}, τ² ~ ind. N(b_{j-1}, τ²) (j = 1, ..., 10);
IV.
Marginally a,, i/ ,1co, and r2 are mutually independent with a ~ uniform(R3), rj = ,1 ~, Gamma(0, 1), (j = 1, .., 10) ro = (r2)1 ~ Gamma(0,0), To find the estimates of the median income of fourperson families and their standard errors, we need to use the Gibbs sampling algorithm and require samples from the following complete conditional distributions: aly,O, bl, ,bio,ri,,rio,ro bjYy, a,bi, ..,bj,bj+l,..,b o,ri, ,roro, ,j= 1,...,10 ri,',rioly,O,bl,,blo, roJy,0, bi,, bo,ri,,rio Oijy, bl,6, ,blo, ri,,rio,ro, (i = 1,...,51; j = 1,...,10) Again, the hierarchical structure of the model implies that (i) caiy,0, bl,,blo, rl,,r,ro, ro~ a ctO, bl, ,blo, rl,' ,rio (iia) bj ly, 0, a, bj~, bj+i, r, ro ~ bj 10, a, bj1, bj+i, rj, ro (j = 1,...,9) (iib) bioly, 0, ba ,,, bs9,,r ro, ro ~ biol, q, ab, ,ro,ro (iii) rjly, 0,a,b .,bio, ro~ rj Qa, ,bi, .,b o(j = 1,...,10) (iv) roly, 0, bi,, bo, ri, ,rio ~ robl, bio (v) Oijly, bl, ,blo, rl, ,rlo,ro ~ o Oyiji, r, bj, a (i= 1,...,m;j= 1,...,10) These distributions under conjugate priors follow from standard normal and gamma distribution theory and are as follows: (i) cay, O, b, bio, ri, rio, ro N3[( 1 1 rJr2ij g))1 E1 r1jf(j  bj), (2i=l 2E'01 rjxTij)'] (iia) For(j=1,..,9): bjly, O, aI, jil, bj+l, rj, ro ~ N[(51 r, + 2 ro)1{rI E1(,ij a1a) + ro(bj+ + bj,)}, (51 r + ro)1] (ii b) bioly, 0, a, b, ro, ro SN[(51 ro + r),o){r rio ,(o ioT+) + rob09, (51 ro + ro)1] i Gamma(!iE1P LE,(0 a( m3a b1)2, 25) (iv) rojy,, 8, b b,, r, ,rio ~ Gamma( Ej (b bjb)(bj bj1)T, 5) (v) Oijly, bl,"1 blo, rl, rlo, ro ~ N[(v1 + rj) (viyij + r1(a + bj)), (vs1 + r1)1 Simulations from these complete conditional distributions can be made using the standard random number generators from the normal and gamma distributions. We can now estimate the marginal posterior density of Oij ly as described earlier in Section 2.2. 
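The cycle through conditionals (i)–(v) can be sketched as below. This is a minimal illustration on synthetic data — the covariates, sampling variances, and direct estimates are stand-ins, not the CPS series — using the limiting forms of the conditionals under the diffuse Gamma(0, 0) priors:

```python
import numpy as np

rng = np.random.default_rng(1)
m, t = 51, 10                                  # states, years

# Synthetic stand-ins: covariates x_ij = (1, x_ij1, x_ij2), sampling
# variances v_ij, and direct estimates y_ij.
x = np.dstack([np.ones((m, t)),
               rng.normal(20.0, 2.0, (m, t)),
               rng.normal(18.0, 2.0, (m, t))])
v = np.full((m, t), 1.0)
y = rng.normal(20.0, 3.0, (m, t))

theta, b = y.copy(), np.zeros(t)
r, r0 = np.ones(t), 1.0
draws = []

for sweep in range(1500):
    # (i) alpha | ... ~ N_3
    prec = np.einsum('j,ijk,ijl->kl', r, x, x)
    mean = np.linalg.solve(prec, np.einsum('j,ijk,ij->k', r, x, theta - b))
    alpha = rng.multivariate_normal(mean, np.linalg.inv(prec))
    resid = theta - x @ alpha                  # theta_ij - x_ij' alpha
    # (iia)/(iib) b_j | ... ~ N, with b_0 = 0
    for j in range(t):
        if j < t - 1:
            var = 1.0 / (m * r[j] + 2.0 * r0)
            nbr = r0 * (b[j + 1] + (b[j - 1] if j > 0 else 0.0))
        else:
            var = 1.0 / (m * r[j] + r0)
            nbr = r0 * b[j - 1]
        b[j] = rng.normal(var * (r[j] * resid[:, j].sum() + nbr), np.sqrt(var))
    # (iii) r_j and (iv) r_0 (numpy's gamma takes shape and scale = 1/rate)
    r = rng.gamma(m / 2.0, 1.0 / (0.5 * ((resid - b) ** 2).sum(axis=0)))
    r0 = rng.gamma(t / 2.0, 1.0 / (0.5 * (np.diff(np.r_[0.0, b]) ** 2).sum()))
    # (v) theta_ij | ...
    post_prec = 1.0 / v + r
    theta = rng.normal((y / v + r * (x @ alpha + b)) / post_prec,
                       np.sqrt(1.0 / post_prec))
    if sweep >= 500:                           # discard burn-in
        draws.append(theta.copy())

theta_hat = np.mean(draws, axis=0)             # HB point estimates
theta_se = np.std(draws, axis=0)               # posterior standard deviations
```

The posterior means of the retained theta draws play the role of the HB1 estimates, and the posterior standard deviations play the role of their standard errors.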
We find estimates of the median income of four-person families for each year from 1979 to 1989 (except for 1980), using all the sample estimates available since the 1979 census, by running 50 iterations of 100–2000 replications each. The necessary convergence is achieved after 500 samples. We denote these HB estimates by HB1. As a special case of our model, we have also computed the median income estimates for 1989 utilizing only the census median income figures, the CPS median income estimates for 1989, and the PCI's for the years 1979 and 1989. We denote these HB estimates by HB2. The HB2 estimates are not based on any time series model; they are based on a univariate version of the HB model considered in Datta, Fay and Ghosh (1991). Table 2.1 provides the statewise sample median incomes as well as the HB1 and HB2 estimates for 1989. The standard errors associated with these estimates are given in Table 2.2. Also, for most states, the time domain HB analysis provides reduced estimates of standard errors when compared to the corresponding standard errors under the non-time-series model of Datta, Fay and Ghosh (1991). The time domain HB procedure results in substantial reduction of the standard errors when compared to the sample estimates. The median income estimates along with their standard errors for the year 1979 are provided in Table 2.3. Since, in addition to the sample estimates, the 1979 incomes are also available from the 1980 census data, we compare the retrospective estimates obtained for 1979 by using the time series model against the 1980 census, treating the latter as the "truth." The 1980 census figures are labelled as Truth in Table 2.3. Suppose e_iTR denotes the true median income for the ith state and e_i any estimate of e_iTR. Then for the estimate e = (e_1, ..., e_51)^T of e_TR = (e_1TR, ..., e_51TR)^T, let

Average Relative Bias = (51)^{-1} Σ_{i=1}^{51} |e_i − e_iTR| / e_iTR,

Average Squared Relative Bias = (51)^{-1} Σ_{i=1}^{51} |e_i − e_iTR|² / e_iTR²,

Average Absolute Bias = (51)^{-1} Σ_{i=1}^{51} |e_i − e_iTR|,

Average Squared Deviation = (51)^{-1} Σ_{i=1}^{51} (e_i − e_iTR)².

The above quantities are provided in Table 2.4 for the sample estimates and the HB1 estimates. The HB1 estimates are clearly superior to the sample median estimates for each state. In terms of the average relative bias, the HB1 estimates improve on the sample medians by 80.32%. Using the criterion of average squared relative bias, the corresponding percentage improvement is 97.01. Similarly, with average absolute bias and average squared deviation as the criteria, the respective percentage improvements of the HB1 estimates over the sample medians are 80.88 and 96.27. Thus the time series HB procedure improves tremendously on the sample estimates.

Table 2.1. Median Income for 4-Person Family by State in 1989 (In Dollars)

State  Sample Median  HB1    HB2
1      37700          37332  37987
2      52720          47829  48232
3      41489          39604  40086
4      55729          52661  52817
5      47106          43424  43629
6      55048          53203  53020
7      43434          43855  43879
8      53128          53436  53305
9      40990          40745  40762
10     44241          42589  42550
11     40297          39736  39604
12     42590          43075  42817
13     43260          43101  42907
14     42226          41111  41006
15     45432          43631  43439
16     38711          37773  37713
17     39232          38932  38976
18     36966          34946  35068
19     32610          32116  32387
20     40894          38326  38419
21     42781          38820  38821
22     39819          41924  41724
23     48252          50346  49893
24     36587          40270  40384
25     49718          45511  45656

Table 2.1. continued

State  Sample Median  HB1    HB2
26     31604          32141  32217
27     37423          37399  37564
28     31834          34407  34626
29     45876          41050  41479
30     37194          37827  37876
31     32186          33546  33779
32     31327          33843  34041
33     32245          34079  34272
34     31330          31559  31922
35     33317          32480  32918
36     36219          35123  35118
37     40358          35680  35865
38     32679          34247  34117
39     29586          32598  32602
40     32227          33030  33161
41     38931          36451  35852
42     36732          39591  39294
43     29761          31238  31462
44     38969          38828  38868
45     37938          36913  36942
46     34579          39129  38722
47     42392          41821  41559
48     41419          39342  39256
49     41121          42010  41820
50     41379          48607  47146
51     44999          44785  44369

Table 2.2.
Estimated Standard Errors of Median Income Estimates for 4-Person Family by State in 1989 (In Dollars)

State  Sample Median  HB1   HB2
1      3230           2010  2147
2      4132           2327  2450
3      2848           1936  2011
4      2452           1946  1999
5      4512           2289  2355
6      3893           2245  2379
7      1512           1302  1318
8      1933           1575  1635
9      1584           1335  1349
10     1845           1562  1583
11     2970           1931  1974
12     1763           1440  1472
13     1597           1341  1374
14     2310           1722  1757
15     3251           2030  2083
16     2258           1698  1734
17     3677           2098  2141
18     2249           1757  1794
19     2130           1630  1679
20     2502           1869  1899
21     3359           2134  2184
22     3699           2117  2164
23     4665           2314  2450
24     4645           2280  2336
25     4349           2292  2362

Table 2.2. continued

State  Sample Median  HB1   HB2
26     2278           1695  1736
27     1513           1289  1311
28     2965           1971  2020
29     4385           2295  2377
30     1397           1232  1242
31     2885           1915  1968
32     2446           1830  1870
33     2690           1869  1913
34     2392           1733  1804
35     2435           1762  1836
36     3314           2039  2099
37     3923           2244  2311
38     1552           1405  1394
39     2557           1896  1918
40     2268           1691  1727
41     3516           2158  2356
42     3261           2045  2088
43     2811           1902  1962
44     5003           2293  2346
45     3209           2004  2050
46     2863           2059  2084
47     2525           1788  1842
48     3056           1993  2040
49     1586           1359  1369
50     3637           2299  2665
51     4805           2278  2369

Table 2.3. Median Income Estimates for 4-Person Family by State in 1979 (In Dollars)

State  Truth  Sample Median  Sample SE  HB1 Median  HB1 SE
1      18319  18084          1533       17967       516
2      22027  23129          1647       21919       495
3      19424  18438          1642       19055       508
4      23772  25324          1414       23805       501
5      22107  23597          2409       21967       511
6      25712  25037          1431       25596       509
7      22669  21852          766        22329       423
8      26014  25486          1264       25913       502
9      22266  23326          878        22330       460
10     23279  22397          843        22963       439
11     23014  22716          1041       22826       451
12     25410  24294          1025       25168       486
13     25111  24103          1006       24872       477
14     23320  23270          1202       23179       467
15     24044  25137          1571       24021       501
16     22351  23851          1592       22294       499
17     21891  20743          1255       21563       478
18     20511  18567          1535       20094       506
19     18674  19826          1588       18449       522
20     21438  22603          1802       21297       501
21     22127  22014          1586       21930       487
22     23627  21860          1900       23388       510
23     26203  26235          1722       26173       528
24     21862  20232          3801       21623       520
25     22757  24160          1418       22736       494

Table 2.3. continued

State  Truth  Sample Median  Sample SE  HB1 Median  HB1 SE
26     20214  18274          1380       19758       506
27     19772  20296          1012       19611       465
28     19944  19282          1795       19624       506
29     20668  22687          1196       20703       517
30     21086  19675          1042       20650       474
31     19685  18657          1285       19295       489
32     19693  19776          1274       19434       483
33     19926  17978          1282       19436       506
34     18150  19167          1762       17875       534
35     17893  18917          1507       17639       530
36     21412  18965          1444       20964       513
37     20659  19295          1675       20311       501
38     22521  22963          873        22466       435
39     20776  21216          1625       20571       494
40     19961  21533          1543       19827       510
41     24641  22013          1559       24327       530
42     23757  26746          1677       23853       531
43     19257  19968          1597       19017       509
44     21924  21733          1828       21712       496
45     21572  20727          1440       21285       484
46     24438  25447          2051       24388       516
47     24394  25463          1415       24401       498
48     22688  24251          1500       22665       499
49     24752  24265          767        24559       427
50     31018  30788          1891       31159       655
51     24966  23744          2263       24830       526

Table 2.4. Average Relative Bias, Average Squared Relative Bias, Average Absolute Bias, and Average Squared Deviations of the Estimates.

Criterion            Sample Median  HB1
Aver. Rel. Bias      0.04984        0.00980
Aver. Sq. Rel. Bias  0.00340        0.00014
Aver. Abs. Bias      1090.41        208.45
Aver. Sq. Dev.       1631203.47     60782.92

CHAPTER 3
MULTIVARIATE ESTIMATION OF MEANS IN TIME DOMAIN

3.1 Introduction

Periodic sample surveys usually provide sample-based estimates for several parameters. The interrelationship among the parameters can play an important role in improving the small area estimators. Fay (1987) demonstrated that, even when only one component of a parameter vector is of interest, estimating the joint vector may improve the estimation of that component compared to restricting attention only to the single parameter. In particular, he suggested improving on the estimation of median incomes of four-person families, discussed in the previous chapters, by including information regarding the median incomes of three- and five-person families as well, because of a strong statistical relationship among the three sets. While the HB model of Datta, Fay and Ghosh (1991) involved bivariate data, their analysis was restricted to a fixed point of time.
The univariate HB model considered in Chapter 2 can be extended to multivariate problems simply by taking the observations at each time as vectors rather than scalars. In Section 3.2 we present a general framework for multivariate HB analysis when the error variance-covariance matrices are known for all time. This section also provides a methodology for obtaining marginal posterior densities through the Gibbs sampling algorithm. We revisit the problem of median income estimation in Section 3.3. We find the median income estimates based on the model given in Section 3.2 for the period 1979–1989 for all the states. We have also compared the estimates for the year 1989 with the corresponding estimates obtained in Section 2.4.

3.2 The General Multivariate Hierarchical Bayes Model

Suppose, based on a given sample at time j (j = 1, ..., t), that Y_ij = (Y_ij1, ..., Y_ijs)^T is an s-dimensional column vector of sample survey estimators of some characteristics θ_ij = (θ_ij1, ..., θ_ijs)^T for the ith small area (i = 1, ..., m). As in Chapter 2, our objective is to optimally estimate a linear function Σ_{i=1}^m Σ_{j=1}^t L_ij^T θ_ij using all the data available up to time t. Consider the following HB model:

I. Y_ij | θ_ij ~ independently N(θ_ij, V_ij), (i = 1, ..., m; j = 1, ..., t);
II. θ_ij | α, b_j, Ψ_j ~ independently N(X_ij α + Z_ij b_j, Ψ_j), (i = 1, ..., m; j = 1, ..., t);
III. b_j | b_{j-1}, W ~ independently N(H_j b_{j-1}, W), (j = 1, ..., t);
IV. Marginally α, Ψ_1, ..., Ψ_t, and W are mutually independent with α ~ Uniform(R^p), Ψ_j ~ inverse Wishart(S_j, k_j), W ~ inverse Wishart(S_0, k_0).

In (II), X_ij (s × p) and Z_ij (s × q) are known design matrices. In (IV), we shall allow the possibility of diffuse priors for the Ψ_j or W, e.g. S_j = 0 for j = 1, ..., t, and S_0 = 0. The hierarchical structure of the model implies that

(i) α | y, θ, b_1, ..., b_t, Ψ_1, ..., Ψ_t, W ~ α | θ, b_1, ..., b_t, Ψ_1, ..., Ψ_t;
(iia) b_j | y, θ, α, b_1, ..., b_{j-1}, b_{j+1}, ..., b_t, Ψ_1, ..., Ψ_t, W ~ b_j | θ, α, b_{j-1}, b_{j+1}, Ψ_j, W, (j = 1, ..., t−1);
(iib) b_t | y, θ, α, b_1, ..., b_{t-1}, Ψ_1, ..., Ψ_t, W ~ b_t | θ, α, b_{t-1}, Ψ_t, W;
(iii) Ψ_j | y, θ, α, b_1, ..., b_t, W ~ Ψ_j | θ, α, b_1, ..., b_t, (j = 1, ..., t);
(iv) W | y, θ, α, b_1, ..., b_t, Ψ_1, ..., Ψ_t ~ W | b_1, ..., b_t;
(v) θ_ij | y, α, b_1, ..., b_t, Ψ_1, ..., Ψ_t, W ~ θ_ij | y_ij, Ψ_j, b_j, α, (i = 1, ..., m; j = 1, ..., t).

Formal derivation of the complete conditional distributions is now straightforward and is omitted. The complete conditional distributions are given by

(i) α | θ, b_1, ..., b_t, Ψ_1, ..., Ψ_t ~ N_p[(Σ_{i=1}^m Σ_{j=1}^t X_ij^T Ψ_j^{-1} X_ij)^{-1} Σ_{i=1}^m Σ_{j=1}^t X_ij^T Ψ_j^{-1} (θ_ij − Z_ij b_j), (Σ_{i=1}^m Σ_{j=1}^t X_ij^T Ψ_j^{-1} X_ij)^{-1}];

(iia) For j = 1, ..., t−1: b_j | θ, α, b_{j-1}, b_{j+1}, Ψ_j, W ~ N_q[(Σ_{i=1}^m Z_ij^T Ψ_j^{-1} Z_ij + H_{j+1}^T W^{-1} H_{j+1} + W^{-1})^{-1} {Σ_{i=1}^m Z_ij^T Ψ_j^{-1} (θ_ij − X_ij α) + H_{j+1}^T W^{-1} b_{j+1} + W^{-1} H_j b_{j-1}}, (Σ_{i=1}^m Z_ij^T Ψ_j^{-1} Z_ij + H_{j+1}^T W^{-1} H_{j+1} + W^{-1})^{-1}];

(iib) b_t | θ, α, b_{t-1}, Ψ_t, W ~ N_q[(Σ_{i=1}^m Z_it^T Ψ_t^{-1} Z_it + W^{-1})^{-1} {Σ_{i=1}^m Z_it^T Ψ_t^{-1} (θ_it − X_it α) + W^{-1} H_t b_{t-1}}, (Σ_{i=1}^m Z_it^T Ψ_t^{-1} Z_it + W^{-1})^{-1}];

(iii) Ψ_1, ..., Ψ_t | θ, α, b_1, ..., b_t ~ independently Inverse Wishart[S_j + Σ_{i=1}^m (θ_ij − X_ij α − Z_ij b_j)(θ_ij − X_ij α − Z_ij b_j)^T, k_j + m];

(iv) W | b_1, ..., b_t ~ Inverse Wishart[S_0 + Σ_{j=1}^t (b_j − H_j b_{j-1})(b_j − H_j b_{j-1})^T, k_0 + t];

(v) θ_ij | y_ij, Ψ_j, b_j, α ~ N_s[(V_ij^{-1} + Ψ_j^{-1})^{-1} (V_ij^{-1} y_ij + Ψ_j^{-1} (X_ij α + Z_ij b_j)), (V_ij^{-1} + Ψ_j^{-1})^{-1}].

A Rao-Blackwellized sample-based estimate of the density of θ_ij given y, based on simulated samples from the above conditional distributions, can be approximated by

(1/G) Σ_{g=1}^G f(θ_ij | y, α = α^(g), Ψ_j = Ψ_j^(g), b_j = b_j^(g)).

3.3 An Illustrative Application to Survey Data

In this section, we illustrate the implementation of the multivariate HB model described in Section 3.2. We again consider the problem of estimation of median income for four-person families by state. Fay (1987) suggested improving on the estimation of median incomes for four-person families by including information regarding the median incomes of three- and five-person families as well, because of a strong statistical relationship among the three sets. He advocated the use of a multivariate empirical Bayes (EB) or variance-components approach for the composite estimation of median incomes of four-person families by state.
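The Rao-Blackwellized density estimate can be illustrated in the scalar setting of Chapter 2, where the complete conditional (v) is a univariate normal; the inputs below are hypothetical Gibbs output, not actual draws from the survey analysis:

```python
import numpy as np

def rb_density(grid, y_ij, v_ij, x_ij, alpha_draws, b_draws, r_draws):
    """Rao-Blackwellized estimate of the posterior density of theta_ij:
    the average over G Gibbs draws of the normal complete conditional
    theta_ij | y, alpha, b_j, r_j ~ N(mu_g, s_g^2)."""
    dens = np.zeros_like(grid)
    for a, b, r in zip(alpha_draws, b_draws, r_draws):
        prec = 1.0 / v_ij + r
        mu = (y_ij / v_ij + r * (x_ij @ a + b)) / prec
        sd = np.sqrt(1.0 / prec)
        dens += np.exp(-0.5 * ((grid - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))
    return dens / len(alpha_draws)

# toy check with a single draw: the mixture is then exactly N(0, 1/2)
grid = np.linspace(-5.0, 5.0, 2001)
f = rb_density(grid, y_ij=0.0, v_ij=1.0, x_ij=np.zeros(3),
               alpha_draws=[np.zeros(3)], b_draws=[0.0], r_draws=[1.0])
```

The multivariate case replaces the scalar normal density with the s-variate normal density of conditional (v) evaluated at each draw of (α, Ψ_j, b_j).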
Datta, Fay and Ghosh (1991) adopted a bivariate hierarchical linear model for estimating the statewide median incomes of four-person families. In their procedure, one component was the median income of four-person families, while the other component was a convex combination of the median incomes of five- and three-person families. The analysis required evaluation of 3-dimensional integrals. Instead, we will use Monte Carlo integration via the Gibbs sampling algorithm. In our analysis, we will be considering as the basic data 51 three-component vectors corresponding to the median incomes of four-, three-, and five-person families for the years 1979–1989 (except for 1980). Our target is to estimate statewide median incomes of four-person families, although our procedure furnishes estimates of median incomes of three- and five-person families as well. Note that direct numerical integration in a trivariate setup would require evaluation of six-dimensional integrals, which is fairly complicated and often unreliable. The Gibbs sampling technique avoids such problems. Our analysis rests heavily on the implementation of the Gibbs sampling algorithm. We consider as the basic data the three-component vectors Y_ij = (Y_ij1, Y_ij2, Y_ij3)^T, (i = 1, ..., 51; j = 1, ..., 10), where Y_ij1, Y_ij2, and Y_ij3 are the sample median incomes of four-, three-, and five-person families in state i in year j. The true medians corresponding to the Y_ij are denoted by θ_ij = (θ_ij1, θ_ij2, θ_ij3)^T. We shall also write Y = (Y_11^T, ..., Y_51,10^T)^T and θ = (θ_11^T, ..., θ_51,10^T)^T. We would like to introduce a few more notations before describing the HB model. Let x_ij11 and x_ij12 denote respectively the adjusted census median income for year j and the base-year census median income for four-person families in the ith state (i = 1, ..., 51). Also, let x_ij21 and x_ij31 denote respectively the adjusted census median incomes for year j for three- and five-person families.
The corresponding base-year census median incomes for three- and five-person families are denoted by x_ij22 and x_ij32, respectively. Finally, let

X_ij = [ 1  x_ij11  x_ij12  0  0       0       0  0       0
         0  0       0       1  x_ij21  x_ij22  0  0       0
         0  0       0       0  0       0       1  x_ij31  x_ij32 ],   (3.3.1)

(i = 1, ..., 51; j = 1, ..., 10), and denote by α = (α_1, ..., α_9)^T the vector of regression coefficients. Also, b_j = (b_j1, b_j2, b_j3)^T, (j = 1, ..., 10), is the vector of random components for year j. The following HB model is used:

I. Y_ij | θ_ij ~ independently N(θ_ij, V_ij), (i = 1, ..., 51; j = 1, ..., 10);
II. θ_ij | α, b_j, Ψ_j ~ independently N(X_ij α + b_j, Ψ_j), (i = 1, ..., 51; j = 1, ..., 10);
III. b_j | b_{j-1}, W ~ independently N(b_{j-1}, W), (j = 1, ..., 10);
IV. Marginally α, Ψ_1, ..., Ψ_10, and W are mutually independent with α ~ Uniform(R^9), Ψ_j ~ inverse Wishart(S_j, k_j), W ~ inverse Wishart(S_0, k_0).

In (II), X_ij (3 × 9) is a known design matrix. In (IV) we assume diffuse priors for the Ψ_j and W, e.g. S_j = 0 for j = 1, ..., 10, and S_0 = 0. As mentioned earlier, closed-form expressions for E(θ_ij | y) and V(θ_ij | y) cannot be obtained in such a setup. It is also too complicated to estimate E(θ_ij | y) and V(θ_ij | y) through numerical integration. We will be using the Gibbs sampling procedure to simulate samples from the following complete conditional distributions:

(i) α | y, θ, b_1, ..., b_10, Ψ_1, ..., Ψ_10, W ~ N_9[(Σ_{i=1}^{51} Σ_{j=1}^{10} X_ij^T Ψ_j^{-1} X_ij)^{-1} Σ_{i=1}^{51} Σ_{j=1}^{10} X_ij^T Ψ_j^{-1} (θ_ij − b_j), (Σ_{i=1}^{51} Σ_{j=1}^{10} X_ij^T Ψ_j^{-1} X_ij)^{-1}];

(iia) For j = 1, ..., 9: b_j | y, θ, α, b_1, ..., b_{j-1}, b_{j+1}, ..., b_10, Ψ_1, ..., Ψ_10, W ~ N_3[(51 Ψ_j^{-1} + 2 W^{-1})^{-1} {Ψ_j^{-1} Σ_{i=1}^{51} (θ_ij − X_ij α) + W^{-1} (b_{j+1} + b_{j-1})}, (51 Ψ_j^{-1} + 2 W^{-1})^{-1}];

(iib) b_10 | y, θ, α, b_1, ..., b_9, Ψ_1, ..., Ψ_10, W ~ N_3[(51 Ψ_10^{-1} + W^{-1})^{-1} {Ψ_10^{-1} Σ_{i=1}^{51} (θ_{i,10} − X_{i,10} α) + W^{-1} b_9}, (51 Ψ_10^{-1} + W^{-1})^{-1}];

(iii) Ψ_1, ..., Ψ_10 | y, θ, α, b_1, ..., b_10, W ~ independently Inverse Wishart[S_j + Σ_{i=1}^{51} (θ_ij − X_ij α − b_j)(θ_ij − X_ij α − b_j)^T, k_j + 51];

(iv) W | y, θ, α, b_1, ..., b_10, Ψ_1, ..., Ψ_10 ~ Inverse Wishart[S_0 + Σ_{j=1}^{10} (b_j − b_{j-1})(b_j − b_{j-1})^T, k_0 + 10];

(v) θ_ij | y, α, b_1, ..., b_10, Ψ_1, ..., Ψ_10, W ~ N_3[(V_ij^{-1} + Ψ_j^{-1})^{-1} (V_ij^{-1} y_ij + Ψ_j^{-1} (X_ij α + b_j)), (V_ij^{-1} + Ψ_j^{-1})^{-1}].

With all the complete conditional distributions available in closed form, we can now estimate the marginal posterior density of θ_ij | y as described earlier in Section 3.2.
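Steps (iii) and (iv) call for draws from inverse Wishart complete conditionals. A minimal numpy-only sketch via the Bartlett decomposition; the residuals and the prior degrees of freedom k_j below are synthetic placeholders, not the survey data:

```python
import numpy as np

def sample_inv_wishart(rng, df, scale):
    # If X ~ Wishart(df, scale^{-1}), then X^{-1} ~ inverse Wishart(df, scale).
    # X is generated by the Bartlett decomposition: X = L A A^T L^T with
    # L = chol(scale^{-1}), A lower triangular, A_ii^2 ~ chi^2(df - i),
    # and standard normal subdiagonal entries.
    p = scale.shape[0]
    L = np.linalg.cholesky(np.linalg.inv(scale))
    A = np.zeros((p, p))
    for i in range(p):
        A[i, i] = np.sqrt(rng.chisquare(df - i))
        A[i, :i] = rng.normal(0.0, 1.0, size=i)
    X = L @ A @ A.T @ L.T
    return np.linalg.inv(X)

rng = np.random.default_rng(0)
m, s = 51, 3                          # states; 4-, 3-, 5-person components

# stand-in residuals theta_ij - X_ij alpha - b_j for one fixed year j
resid = rng.normal(0.0, 1.0, (m, s))

# conditional (iii): Psi_j ~ inverse Wishart with scale S_j + sum of outer
# products of the residuals and degrees of freedom k_j + m (diffuse S_j = 0;
# k_j = s + 1 is a hypothetical prior degrees of freedom)
k_j = s + 1
scale = resid.T @ resid
psi_j = sample_inv_wishart(rng, k_j + m, scale)
```

The draw for W in step (iv) is identical in form, with the scale built from the increments b_j − b_{j-1} and degrees of freedom k_0 + 10.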
We find estimates of the median income of four-person families for each year from 1979 to 1989 (except for 1980), using all the sample estimates available for three-, four-, and five-person families since the 1979 census (except for 1980), by running 50 iterations of 100–2000 replications each. The necessary convergence is achieved after 500 iterations. We denote these HB estimates by HB3. As in Section 2.4, we also consider the non-time-series analysis which utilizes only the census median income figures for 1979, the CPS median income estimates for 1989, and the PCI's for the years 1979 and 1989. In this case, we utilize the data available for three-, four-, and five-person families, in contrast to using only the information for four-person families as in Section 2.4. We denote these estimates by HB4. These estimates are based on a trivariate version of the HB model considered in Datta, Fay and Ghosh (1991). Table 3.1 provides the sample median incomes as well as the HB3 and HB4 estimates. The HB3 estimates are based on the time series HB model described earlier in this section. The standard errors associated with these estimates are given in Table 3.2. Again, in conformity with the results of the univariate analysis in Chapter 2, the time domain HB procedure results in substantial reduction of the standard errors when compared to the sample estimates. Also, the standard errors associated with the HB3 estimates are lower than the standard errors of the HB4 estimates for each state. The median income estimates along with their standard errors for the year 1979 are provided in Table 3.3. Also, as in Section 2.4, we compare the retrospective estimates obtained for 1979 by using the time series model against the 1980 census, treating the latter as the "truth." We also compute the criteria average relative bias, average squared relative bias, average absolute bias, and average squared deviation (as defined in Section 2.4) for the HB3 estimates. These quantities are provided in Table 3.4.
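The four criteria of Section 2.4, and the percentage improvements quoted in this chapter, can be computed as in this sketch; the vectors shown are toy stand-ins for the 51-state estimate and census-truth vectors:

```python
import numpy as np

def comparison_criteria(e, e_tr):
    """Average relative bias, average squared relative bias, average
    absolute bias, and average squared deviation of the estimate vector
    e against the census "truth" e_tr."""
    d = e - e_tr
    return {
        "avg_rel_bias": float(np.mean(np.abs(d) / e_tr)),
        "avg_sq_rel_bias": float(np.mean(d ** 2 / e_tr ** 2)),
        "avg_abs_bias": float(np.mean(np.abs(d))),
        "avg_sq_dev": float(np.mean(d ** 2)),
    }

def pct_improvement(crit_new, crit_base):
    # e.g. the 80.32% improvement of HB1 over the sample medians in Table 2.4
    return {k: 100.0 * (1.0 - crit_new[k] / crit_base[k]) for k in crit_new}

truth = np.array([20000.0, 25000.0])       # toy census values
sample = np.array([21000.0, 24000.0])      # toy direct estimates
hb = np.array([20200.0, 24800.0])          # toy HB estimates
impr = pct_improvement(comparison_criteria(hb, truth),
                       comparison_criteria(sample, truth))
```

Each criterion averages a per-state discrepancy over the 51 states; the relative criteria divide by the census truth state by state.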
Again, the HB3 estimates are clearly superior to the sample median estimates for each state. In terms of the average relative bias, the HB3 estimates improve on the sample medians by 87.62%. Using the criterion of average squared relative bias, the corresponding percentage improvement is 98.24. Similarly, with average absolute bias and average squared deviation as the criteria, the respective percentage improvements of the HB3 estimates over the sample medians are 87.69 and 98.28. Thus the time series HB procedure improves tremendously on the sample estimates. We also compare the HB3 estimates for the year 1979, based on the trivariate time series HB model, with the HB1 estimates based on the univariate analysis in Chapter 2. In terms of the average relative bias, the HB3 estimates improve on the HB1 estimates by 37.04%. Using the criterion of average squared relative bias, the corresponding percentage improvement is 57.14. Similarly, with average absolute bias and average squared deviation as the criteria, the respective percentage improvements of the HB3 estimates over the HB1 estimates are 33.67 and 53.94. Thus, the trivariate time series HB model captures the interrelationships between the median incomes of three-, four-, and five-person families to improve the median income estimates of four-person families.

Table 3.1. Median Income for 4-Person Family by State in 1989 (In Dollars)

State  Sample Median  HB3    HB4
1      37700          37826  38172
2      52720          49130  48988
3      41489          40889  41118
4      55729          53372  52986
5      47106          43723  43578
6      55048          54755  54140
7      43434          43760  43513
8      53128          53756  53076
9      40990          40582  40508
10     44241          42408  42326
11     40297          38250  38121
12     42590          42732  42360
13     43260          43405  43090
14     42226          41033  40932
15     45432          43043  42748
16     38711          37100  37122
17     39232          38512  38491
18     36966          35498  35790
19     32610          33137  33514
20     40894          38846  38988
21     42781          38934  38990
22     39819          43068  42751
23     48252          51029  50229
24     36587          39918  39842
25     49718          46250  46019

Table 3.1. continued

State  Sample Median  HB3    HB4
26     31604          31102  31346
27     37423          38078  38174
28     31834          35686  35865
29     45876          40665  40830
30     37194          36952  36919
31     32186          33722  33976
32     31327          33911  34048
33     32245          34192  34365
34     31330          31902  32284
35     33317          31516  32021
36     36219          34409  34543
37     40358          34252  34551
38     32679          34226  34137
39     29586          32865  32944
40     32227          33112  33319
41     38931          37177  36842
42     36732          40034  39727
43     29761          30098  30441
44     38969          38132  38100
45     37938          36890  36953
46     34579          38654  38221
47     42392          42561  42316
48     41419          39475  39404
49     41121          42787  42421
50     41379          49151  47671
51     44999          45806  45262

Table 3.2. Estimated Standard Errors of Median Income Estimates for 4-Person Family by State in 1989 (In Dollars)

State  Sample Median  HB3   HB4
1      3230           1867  2051
2      4132           2129  2281
3      2848           1825  1925
4      2452           1710  1848
5      4512           2065  2168
6      3893           2291  2399
7      1512           1167  1231
8      1933           1389  1545
9      1584           1157  1194
10     1845           1350  1399
11     2970           1774  1830
12     1763           1272  1342
13     1597           1225  1277
14     2310           1581  1630
15     3251           1865  1925
16     2258           1501  1560
17     3677           1875  1928
18     2249           1537  1615
19     2130           1494  1586
20     2502           1687  1737
21     3359           1952  2016
22     3699           1972  2024
23     4665           2214  2355
24     4645           2235  2295
25     4349           2017  2104

Table 3.2. continued

State  Sample Median  HB3   HB4
26     2278           1515  1594
27     1513           1140  1194
28     2965           1766  1842
29     4385           2049  2148
30     1397           1113  1154
31     2885           1774  1839
32     2446           1702  1761
33     2690           1706  1768
34     2392           1614  1701
35     2435           1587  1704
36     3314           1841  1931
37     3923           2031  2135
38     1552           1287  1292
39     2557           1693  1731
40     2268           1516  1573
41     3516           2030  2280
42     3261           1897  1944
43     2811           1661  1773
44     5003           2022  2089
45     3209           1745  1809
46     2863           1863  1914
47     2525           1679  1749
48     3056           1821  1881
49     1586           1216  1235
50     3637           2260  2607
51     4805           2144  2252

Table 3.3.
Median Income Estimates for 4-Person Family by State in 1979 (In Dollars)

State  Truth  Sample Median  Sample SE  HB3 Median  HB3 SE
1      18319  18084          1533       18206       511
2      22027  23129          1647       22032       481
3      19424  18438          1642       19246       498
4      23772  25324          1414       23808       467
5      22107  23597          2409       22094       499
6      25712  25037          1431       25631       493
7      22669  21852          766        22414       395
8      26014  25486          1264       25938       482
9      22266  23326          878        22440       453
10     23279  22397          843        23061       412
11     23014  22716          1041       22819       449
12     25410  24294          1025       25167       459
13     25111  24103          1006       24963       454
14     23320  23270          1202       23322       471
15     24044  25137          1571       23928       526
16     22351  23851          1592       22375       477
17     21891  20743          1255       21695       456
18     20511  18567          1535       20211       491
19     18674  19826          1588       18621       506
20     21438  22603          1802       21395       482
21     22127  22014          1586       22025       466
22     23627  21860          1900       23476       493
23     26203  26235          1722       26202       517
24     21862  20232          3801       21737       507
25     22757  24160          1418       22866       486

Table 3.3. continued

State  Truth  Sample Median  Sample SE  HB3 Median  HB3 SE
26     20214  18274          1380       19890       496
27     19772  20296          1012       19702       445
28     19944  19282          1795       19757       502
29     20668  22687          1196       20824       489
30     21086  19675          1042       20717       464
31     19685  18657          1285       19539       484
32     19693  19776          1274       19532       471
33     19926  17978          1282       19541       510
34     18150  19167          1762       18106       517
35     17893  18917          1507       17844       517
36     21412  18965          1444       21084       491
37     20659  19295          1675       20426       489
38     22521  22963          873        22501       416
39     20776  21216          1625       20646       482
40     19961  21533          1543       20008       500
41     24641  22013          1559       24412       505
42     23757  26746          1677       23944       518
43     19257  19968          1597       19167       496
44     21924  21733          1828       21861       489
45     21572  20727          1440       21366       469
46     24438  25447          2051       24416       498
47     24394  25463          1415       24399       477
48     22688  24251          1500       22720       481
49     24752  24265          767        24518       418
50     31018  30788          1891       30937       660
51     24966  23744          2263       24873       508

Table 3.4. Average Relative Bias, Average Squared Relative Bias, Average Absolute Bias, and Average Squared Deviations of the Estimates.

Criterion            Sample Median  HB3
Aver. Rel. Bias      0.04984        0.00617
Aver. Sq. Rel. Bias  0.00340        0.00006
Aver. Abs. Bias      1090.41        134.27
Aver. Sq. Dev.       1631203.47     27996.90

CHAPTER 4
HIERARCHICAL BAYES PREDICTION IN FINITE POPULATIONS

4.1 Introduction

In Chapters 2 and 3, we introduced HB procedures for small domain estimation based on data from repeated surveys of an infinite population. In those models, we assumed that the observations are available only at the aggregate level, and we derived the HB estimators, for the current as well as previous time points, simultaneously for all the small area population means. In this chapter, we consider the problem of small domain estimation in the context of finite population sampling. In the finite population setup, we assume that there are m strata, that the ith stratum at time j contains a finite number of units N_ij, and that unit level information on the characteristic of interest at time j, (j = 1, ..., t), is available through a sequence of random sample surveys of the given strata populations. We are interested in predicting certain parameters (such as the finite population mean or total, or the change in mean between two distinct time points, for each stratum) of the strata populations, which are changing over time. We describe a general HB approach in Section 4.2. There we consider the most general situation, where all the variance components are unknown. Prior distributions are assigned to the vector of regression coefficients and the variance components. This section provides the predictive distribution of the unobserved population units at all the time points given the time series data on the sampled units. The computation of the estimates of the population means, simultaneously for several small areas at all the time points, through the implementation of the Gibbs sampling algorithm (as described in Section 2.2.1), has also been discussed in this section. In Section 4.3, we consider a special case of the general HB model of Section 4.2. We assume that λ, the vector of ratios of variance components, is known.
In such a case, it is possible to find, in closed form, the posterior mean vector and posterior variance-covariance matrix corresponding to the characteristic vector of nonsampled units at all the time points. In the subsequent Sections 4.4–4.5, we assume this special case of the HB model. The corresponding HB predictor, as obtained in Section 4.3, is shown to possess some interesting optimality properties. Datta and Ghosh (1991) also assumed known ratios of the variance components in a special case of their general HB approach for small area estimation. The HB predictors in their special case have been shown to be best within the class of all linear unbiased predictors. For a class of spherically symmetric distributions, their HB predictors are also optimal within the class of all unbiased predictors. In Datta and Ghosh (1990), these HB predictors are shown to dominate the linear unbiased predictors stochastically in the sense of Hwang (1985). Furthermore, under a suitable group of transformations, Datta and Ghosh (1990, 1991) showed these predictors to be the "best" within the class of all equivariant predictors for elliptically symmetric distributions. In the setup of Datta and Ghosh, the HB predictor borrowed strength from other small areas. In our setup, as mentioned earlier in Chapter 2, the HB predictor borrows strength over time in addition to borrowing strength from other small areas. Our HB predictors, derived in Section 4.3 from a time series HB model, possess all the optimality properties listed above. In establishing this in Sections 4.4–4.6, we have followed the approach of Datta and Ghosh (1990, 1991). In Section 4.4, it is shown that the HB predictors of Section 4.3 are best unbiased predictors within the class of all unbiased predictors. Besides, it is shown that these predictors are BLUPs. Section 4.5 establishes the stochastic dominance (cf. Hwang, 1985) of these predictors over the LUPs for elliptically symmetric distributions.
Finally, in Section 4.6, we show that these predictors are also best equivariant predictors, under both the matrix loss and the standardized quadratic loss, for a suitable group of transformations and a class of elliptically symmetric distributions.

4.2 Description of the Hierarchical Bayes Model

In this section, we consider a hierarchical model for prediction of the characteristic of interest (such as the population total or mean for each small area or domain) in the context of finite population sampling. Let Y_ijk denote some characteristic of interest associated with the kth unit at time j in the ith small area (k = 1, ..., N_ij; j = 1, ..., t; i = 1, ..., m). For notational simplicity, we assume that the characteristic of interest is a scalar. The results can easily be extended to the case when Y_ijk is a vector. Let

Y_ij^(1) = (Y_ij1, ..., Y_ij,n_ij)^T,   Y_ij^(2) = (Y_ij,n_ij+1, ..., Y_ij,N_ij)^T.

The superscripts (1) and (2) are used to denote the sampled and nonsampled units in the ith small area at time j, respectively. N_ij is the population size of the ith small area at time j, from which a sample of size n_ij is taken (1 ≤ i ≤ m; 1 ≤ j ≤ t). X_ij^(1) (n_ij × p) and Z_ij^(1) (n_ij × q) are known design matrices corresponding to the sampled units; X_ij^(2) ((N_ij − n_ij) × p) and Z_ij^(2) ((N_ij − n_ij) × q) correspond to the nonsampled units. We write X_ij = (X_ij^(1)T, X_ij^(2)T)^T, Z_ij = (Z_ij^(1)T, Z_ij^(2)T)^T, and θ_ij = (θ_ij^(1)T, θ_ij^(2)T)^T. We consider the following Bayesian model:

I. (Y_ij^(1)T, Y_ij^(2)T)^T | θ_ij^(1), θ_ij^(2), σ² ~ independently N(θ_ij, σ² V_ij(λ)), (i = 1, ..., m; j = 1, ..., t);
II. θ_ij | α, b_j, σ² ~ independently N(X_ij α + Z_ij b_j, σ² Ψ_ij(λ)), (i = 1, ..., m; j = 1, ..., t);
III. b_j | b_{j-1}, W, σ² ~ independently N(H_j b_{j-1}, σ² W(λ)), (j = 1, ..., t).

We assume that V_ij(λ) and Ψ_ij(λ), (i = 1, ..., m; j = 1, ..., t), and W(λ) are structurally known except possibly for some unknown λ = (λ_1, ..., λ_s)^T. The H_j (q × q), (j = 1, ..., t), are known matrices. b_0 is assumed to be known; without any loss of generality, we take b_0 = 0.

Before writing stages I–III of the above model in a compact form, we need to introduce a few notations. We denote a u × u identity matrix by I_u; 0 denotes a null matrix of suitable order. Define

C_i^(1) = (Y_i1^(1)T, ..., Y_it^(1)T)^T,   C^(1) = (C_1^(1)T, ..., C_m^(1)T)^T,
C_i^(2) = (Y_i1^(2)T, ..., Y_it^(2)T)^T,   C^(2) = (C_1^(2)T, ..., C_m^(2)T)^T,

and let A_t^(1), A_t^(2) and B_t^(1), B_t^(2) denote the design matrices obtained by stacking the X_ij^(1), X_ij^(2) and the Z_ij^(1), Z_ij^(2) conformably with C^(1) and C^(2). Also let V_t = Block Diag(V_11, ..., V_1t, V_21, ..., V_2t, ..., V_m1, ..., V_mt) and let Ψ̃_t denote the block-diagonal matrix built correspondingly from the Ψ_ij. We now write the model as a compact linear mixed model which incorporates all the information up to and including time t for all the small areas. We have, without any loss of generality, listed the sampled units first:

C_t = (C^(1)T, C^(2)T)^T = A_t α + B_t w + e,   (4.2.1)

where w = (b_1^T, ..., b_t^T)^T ~ N(0, σ² (I_t ⊗ W(λ))) and e ~ N(0, σ² (Ψ̃_t(λ) + V_t(λ))). Thus, given α, λ, and σ²,

C_t ~ N(A_t α, σ² Σ_t),   where Σ_t = Σ_t(λ) = B_t (I_t ⊗ W) B_t^T + Ψ̃_t + V_t.

Furthermore, we partition Σ_t as

Σ_t = [ Σ_t11  Σ_t12 ; Σ_t21  Σ_t22 ],

where Σ_t11 is n_t × n_t, Σ_t12 (= Σ_t21^T) is n_t × (N_t − n_t), and Σ_t22 is (N_t − n_t) × (N_t − n_t), with n_t = Σ_{i=1}^m Σ_{j=1}^t n_ij and N_t = Σ_{i=1}^m Σ_{j=1}^t N_ij. We also define

A_t = (A_t^(1)T, A_t^(2)T)^T,   B_t = (B_t^(1)T, B_t^(2)T)^T,   R = (σ²)^{-1},   (4.2.2)

and

Σ_t22.1 = Σ_t22 − Σ_t21 Σ_t11^{-1} Σ_t12,
K_t = Σ_t11^{-1} − Σ_t11^{-1} A_t^(1) (A_t^(1)T Σ_t11^{-1} A_t^(1))^{-1} A_t^(1)T Σ_t11^{-1},
M_t = Σ_t21 K_t + A_t^(2) (A_t^(1)T Σ_t11^{-1} A_t^(1))^{-1} A_t^(1)T Σ_t11^{-1},
G_t = Σ_t22.1 + (A_t^(2) − Σ_t21 Σ_t11^{-1} A_t^(1)) (A_t^(1)T Σ_t11^{-1} A_t^(1))^{-1} (A_t^(2) − Σ_t21 Σ_t11^{-1} A_t^(1))^T.

Our objective is to use all the data available at a given time in predicting

ξ_t = ξ_t(C^(1), C^(2)) = P_1 C^(1) + P_2 C^(2),

where P_1 and P_2 are known matrices. In particular, we may be interested in predicting retrospectively the population totals for the small areas at the previous time points based on all the available sampled data at a given time. To this end, we derive the predictive distribution of C^(2) given C^(1) = c^(1) in the following theorem. A proof of this theorem is given in the Appendix.

Theorem 4.2.1. Consider the model in (4.2.1) or (4.2.2). Assume that α, R, λ_1 R, ...,
$\lambda_s R$ are independently distributed with
$$\alpha \sim \text{Uniform}(R^p); \qquad R \sim \text{Gamma}\left(\tfrac{1}{2}a_0,\, \tfrac{1}{2}h_0\right),\ a_0 > 0,\ h_0 \ge 0;$$
$$\lambda_l R \sim \text{Gamma}\left(\tfrac{1}{2}a_l,\, \tfrac{1}{2}h_l\right),\ l = 1,\ldots,s, \ \text{with}\ a_l > 0,\ h_l > 0 \ (l = 1,\ldots,s).$$
Assume that $n_t + \sum_{l=0}^{s}h_l - p > 2$. Then, conditional on $\lambda$ and $C_t^{(1)} = c_t^{(1)}$, $C_t^{(2)}$ is distributed as multivariate $t$ with degrees of freedom $n_t + \sum_{l=0}^{s}h_l - p$, location parameter $M_t c_t^{(1)}$ and scale parameter
$$\Omega_t = \left(n_t + \sum_{l=0}^{s}h_l - p\right)^{-1}\left(a_0 + \sum_{l=1}^{s}a_l\lambda_l + c_t^{(1)T}K_t c_t^{(1)}\right)G_t. \tag{4.2.3}$$
Also, the conditional distribution of $\lambda$ given $C_t^{(1)} = c_t^{(1)}$ has pdf
$$f(\lambda \mid c_t^{(1)}) \propto |\Sigma_{t11}|^{-\frac{1}{2}}\left|A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\right|^{-\frac{1}{2}}\left[\prod_{l=1}^{s}\lambda_l^{\frac{1}{2}h_l-1}\right]\left(a_0 + \sum_{l=1}^{s}a_l\lambda_l + c_t^{(1)T}K_t c_t^{(1)}\right)^{-\frac{1}{2}\left(n_t+\sum_{l=0}^{s}h_l-p\right)}. \tag{4.2.4}$$
Now the conditional expectation and variance of $C_t^{(2)}$ given $C_t^{(1)} = c_t^{(1)}$ can be derived as
$$E\left(C_t^{(2)} \mid c_t^{(1)}\right) = E\left(M_t \mid c_t^{(1)}\right)c_t^{(1)}, \tag{4.2.5}$$
$$V\left(C_t^{(2)} \mid c_t^{(1)}\right) = V\left(E\left(C_t^{(2)} \mid \lambda, c_t^{(1)}\right) \mid c_t^{(1)}\right) + E\left(V\left(C_t^{(2)} \mid \lambda, c_t^{(1)}\right) \mid c_t^{(1)}\right)$$
$$= V\left(M_t c_t^{(1)} \mid c_t^{(1)}\right) + \left(n_t + \sum_{l=0}^{s}h_l - p - 2\right)^{-1}E\left[\left(a_0 + \sum_{l=1}^{s}a_l\lambda_l + c_t^{(1)T}K_t c_t^{(1)}\right)G_t \,\Big|\, c_t^{(1)}\right]. \tag{4.2.6}$$
From (4.2.5) and (4.2.6), for $t \ge 1$ we can find the posterior mean and variance of $\xi_t = \xi_t(C_t^{(1)}, C_t^{(2)}) = P_1C_t^{(1)} + P_2C_t^{(2)}$, where $P_1$ and $P_2$ are known matrices. The Bayes estimate of $\xi_t(C_t^{(1)}, C_t^{(2)})$ under any quadratic loss is its posterior mean, given by
$$e(c_t^{(1)}) = \left[P_1 + P_2E\left(M_t \mid c_t^{(1)}\right)\right]c_t^{(1)}.$$
Also,
$$V\left[\xi_t(C_t^{(1)}, C_t^{(2)}) \mid c_t^{(1)}\right] = P_2\,V\left(C_t^{(2)} \mid c_t^{(1)}\right)P_2^T.$$
Note that when $P_1 = \bigoplus_{i=1}^{m}P_{1i}$ and $P_2 = \bigoplus_{i=1}^{m}P_{2i}$, where
$$P_{1i} = [0,\ldots,0,\, 1_{n_{ij}}^T,\, 0,\ldots,0] \quad \text{and} \quad P_{2i} = [0,\ldots,0,\, 1_{N_{ij}-n_{ij}}^T,\, 0,\ldots,0],$$
$\xi_t(C_t^{(1)}, C_t^{(2)})$ reduces to the vector of finite population totals for the $m$ small areas at time $j$ $(j = 1,2,\ldots,t)$, whereas for the choice $P_1 = \bigoplus_{i=1}^{m}P_{1i}/N_{ij}$ and $P_2 = \bigoplus_{i=1}^{m}P_{2i}/N_{ij}$, $\xi_t(C_t^{(1)}, C_t^{(2)})$ reduces to the vector of finite population means for the $m$ small areas at time $j$ $(j = 1,2,\ldots,t)$. Also, for $t \ge 2$, when $P_1 = \bigoplus_{i=1}^{m}P_{1i}$ and $P_2 = \bigoplus_{i=1}^{m}P_{2i}$ with
$$P_{1i} = \left[0,\ldots,0,\, -\tfrac{1}{N_{i(t-1)}}1_{n_{i(t-1)}}^T,\, \tfrac{1}{N_{it}}1_{n_{it}}^T\right], \qquad P_{2i} = \left[0,\ldots,0,\, -\tfrac{1}{N_{i(t-1)}}1_{N_{i(t-1)}-n_{i(t-1)}}^T,\, \tfrac{1}{N_{it}}1_{N_{it}-n_{it}}^T\right],$$
$\xi_t(C_t^{(1)}, C_t^{(2)})$ reduces to the vector of changes in mean at time $t$ from the previous time point for the $m$ small areas. However, in order to find the conditional distribution of $C_t^{(2)}$ given $C_t^{(1)} = c_t^{(1)}$ analytically, we need to find the posterior density $f(\lambda \mid c_t^{(1)})$ as given in (4.2.4).
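Theorem 4.2.1 identifies the predictive distribution as a multivariate $t$ in the location/scale parameterization. A standard way to simulate from such a distribution is the normal/chi-square mixture below. This is only a generic sketch: the degrees of freedom, location, and scale used here are illustrative placeholders standing in for $n_t + \sum h_l - p$, $M_t c_t^{(1)}$, and $\Omega_t$, not quantities computed from the model.

```python
import numpy as np

def rmvt(df, loc, scale, size, rng):
    """Multivariate t draws: loc + Z / sqrt(g), with Z ~ N(0, scale), g ~ chi2_df / df."""
    L = np.linalg.cholesky(scale)                 # scale must be positive definite
    z = rng.standard_normal((size, len(loc))) @ L.T
    g = rng.chisquare(df, size=size) / df
    return loc + z / np.sqrt(g)[:, None]

rng = np.random.default_rng(1)
df = 10.0                                         # stands in for n_t + sum_l h_l - p
loc = np.array([1.0, -2.0])                       # stands in for M_t c^(1)
scale = np.array([[2.0, 0.3],
                  [0.3, 1.0]])                    # stands in for Omega_t
draws = rmvt(df, loc, scale, 200_000, rng)
# sample mean -> loc; sample covariance -> df/(df-2) * scale
```

For $df > 2$ the mean of the draws converges to the location and the covariance to $\tfrac{df}{df-2}$ times the scale, which is the factor appearing in the variance formula (4.2.6).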
Even when $\lambda$ is a small-dimensional vector, it is not easy to find this density, even through the use of sophisticated numerical or analytic approximation techniques. In contrast, this posterior density, as well as the conditional expectation and variance in (4.2.5) and (4.2.6), can be found with considerable ease in the more general case when $\Psi_j$ $(j = 1,\ldots,t)$ and $W$ are unknown, by employing the Gibbs sampling algorithm. It is assumed that $V_{ij}$ is known. In such a case, we rewrite the model as:

I. $\begin{pmatrix} Y_{ij}^{(1)} \\ Y_{ij}^{(2)} \end{pmatrix} \Big|\, \theta_{ij}, R \sim N\left(\theta_{ij},\, r^{-1}V_{ij}\right)$, where $V_{ij} = \begin{pmatrix} V_{ij11} & V_{ij12} \\ V_{ij21} & V_{ij22} \end{pmatrix}$; $\quad (i = 1,\ldots,m;\ j = 1,\ldots,t)$

II. $\theta_{ij} \mid \alpha, b_j, \Psi_j, R \sim N\left(X_{ij}\alpha + Z_{ij}b_j,\, r^{-1}\Psi_j\right)$; $\quad (i = 1,\ldots,m;\ j = 1,\ldots,t)$

III. $b_j \mid b_{j-1}, W, R \sim N\left(H_j b_{j-1},\, r^{-1}W\right)$; $\quad (j = 1,\ldots,t)$

We assume that $\alpha$, $R$, $\Psi_1,\ldots,\Psi_t$, and $W$ are mutually independent, with $\alpha \sim \text{Uniform}(R^p)$, $R \sim \text{Gamma}(\tfrac{1}{2}a_0, \tfrac{1}{2}h_0)$, $a_0 \ge 0$, $h_0 > 0$, $\Psi_j \sim$ inverse Wishart$(0, k_j)$, and $W \sim$ inverse Wishart$(0, k_0)$. The Gibbs sampling algorithm requires samples from the following complete conditional distributions:

$\alpha \mid c_t^{(1)}, c_t^{(2)}, \theta, R, b_1,\ldots,b_t, \Psi_1,\ldots,\Psi_t, W$;
$b_j \mid c_t^{(1)}, c_t^{(2)}, \theta, R, \alpha, b_1,\ldots,b_{j-1}, b_{j+1},\ldots,b_t, \Psi_1,\ldots,\Psi_t, W$, $\ j = 1,\ldots,t$;
$\Psi_j \mid c_t^{(1)}, c_t^{(2)}, \theta, R, \alpha, b_1,\ldots,b_t, W$, $\ j = 1,\ldots,t$;
$W \mid c_t^{(1)}, c_t^{(2)}, \theta, R, \alpha, b_1,\ldots,b_t, \Psi_1,\ldots,\Psi_t$;
$\theta_{ij} \mid c_t^{(1)}, c_t^{(2)}, R, \alpha, b_1,\ldots,b_t, \Psi_1,\ldots,\Psi_t, W$, $\ (i = 1,\ldots,m;\ j = 1,\ldots,t)$;
$Y_{ij}^{(2)} \mid c_t^{(1)}, \theta, R, \alpha, b_1,\ldots,b_t, \Psi_1,\ldots,\Psi_t, W$, $\ (i = 1,\ldots,m;\ j = 1,\ldots,t)$.

The hierarchical structure of the model and the conjugacy of the priors lead to the following complete conditional distributions:

(i) $\alpha \mid c_t^{(1)}, c_t^{(2)}, \theta, r, b_1,\ldots,b_t, \Psi_1,\ldots,\Psi_t, W \ \equiv\ \alpha \mid \theta, r, b_1,\ldots,b_t, \Psi_1,\ldots,\Psi_t$
$$\sim N_p\left[\left(\sum_{i=1}^m\sum_{j=1}^t X_{ij}^T\Psi_j^{-1}X_{ij}\right)^{-1}\sum_{i=1}^m\sum_{j=1}^t X_{ij}^T\Psi_j^{-1}(\theta_{ij} - Z_{ij}b_j),\ r^{-1}\left(\sum_{i=1}^m\sum_{j=1}^t X_{ij}^T\Psi_j^{-1}X_{ij}\right)^{-1}\right];$$

(ii a) For $j = 1,\ldots,t-1$:
$$b_j \mid \cdots \sim N_q\Bigg[\Big(\sum_{i=1}^m Z_{ij}^T\Psi_j^{-1}Z_{ij} + H_{j+1}^TW^{-1}H_{j+1} + W^{-1}\Big)^{-1}\Big\{\sum_{i=1}^m Z_{ij}^T\Psi_j^{-1}(\theta_{ij} - X_{ij}\alpha) + H_{j+1}^TW^{-1}b_{j+1} + W^{-1}H_jb_{j-1}\Big\},$$
$$r^{-1}\Big(\sum_{i=1}^m Z_{ij}^T\Psi_j^{-1}Z_{ij} + H_{j+1}^TW^{-1}H_{j+1} + W^{-1}\Big)^{-1}\Bigg];$$

(ii b)
$$b_t \mid \cdots \sim N_q\Bigg[\Big(\sum_{i=1}^m Z_{it}^T\Psi_t^{-1}Z_{it} + W^{-1}\Big)^{-1}\Big\{\sum_{i=1}^m Z_{it}^T\Psi_t^{-1}(\theta_{it} - X_{it}\alpha) + W^{-1}H_tb_{t-1}\Big\},\ r^{-1}\Big(\sum_{i=1}^m Z_{it}^T\Psi_t^{-1}Z_{it} + W^{-1}\Big)^{-1}\Bigg];$$
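The normal updates just displayed illustrate the general pattern of the sampler: each pass cycles through the conjugate complete conditionals in turn. The following toy sketch shows this cycling on a deliberately reduced problem — a scalar mean with a flat prior and a precision $r$ with a Gamma$(\tfrac{1}{2}a_0, \tfrac{1}{2}h_0)$-type prior, on simulated data. It is an illustration of the Gibbs pattern only, not an implementation of the full model above; all data and hyperparameter values are made up.

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(5.0, 2.0, size=200)      # simulated data (illustrative only)
n, ybar = len(y), y.mean()
a0, h0 = 0.1, 0.1                       # hyperparameters of the Gamma(a0/2, h0/2) prior on r

mu, r = 0.0, 1.0
keep_mu, keep_r = [], []
for it in range(4000):
    # complete conditional: mu | r, y ~ N(ybar, 1/(n r))   (flat prior on mu)
    mu = rng.normal(ybar, 1.0 / np.sqrt(n * r))
    # complete conditional: r | mu, y ~ Gamma(shape (h0+n)/2, rate (a0 + sum(y-mu)^2)/2)
    rate = 0.5 * (a0 + np.sum((y - mu) ** 2))
    r = rng.gamma(0.5 * (h0 + n), 1.0 / rate)
    if it >= 1000:                      # discard burn-in
        keep_mu.append(mu)
        keep_r.append(r)

post_mu = np.mean(keep_mu)              # posterior mean of mu; close to ybar here
```

Averaging the retained iterates approximates the posterior means, exactly as the retained $\theta_{ij}$, $b_j$, and $Y_{ij}^{(2)}$ draws approximate the quantities in (4.2.5) and (4.2.6) in the full sampler.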
(iii) $\Psi_j \mid c_t^{(1)}, c_t^{(2)}, r, \alpha, \theta, b_1,\ldots,b_t, W \ \equiv\ \Psi_j \mid r, \alpha, \theta, b_1,\ldots,b_t$
$$\sim \text{Inverse Wishart}\left[r\sum_{i=1}^m(\theta_{ij} - X_{ij}\alpha - Z_{ij}b_j)(\theta_{ij} - X_{ij}\alpha - Z_{ij}b_j)^T,\ m + k_j\right];$$

(iv) $W \mid c_t^{(1)}, c_t^{(2)}, r, \alpha, \theta, b_1,\ldots,b_t, \Psi_1,\ldots,\Psi_t \ \equiv\ W \mid r, b_1,\ldots,b_t$
$$\sim \text{Inverse Wishart}\left[r\sum_{j=1}^t(b_j - H_jb_{j-1})(b_j - H_jb_{j-1})^T,\ t + k_0\right];$$

(v) $\theta_{ij} \mid c_t^{(1)}, c_t^{(2)}, r, \alpha, b_1,\ldots,b_t, \Psi_1,\ldots,\Psi_t, W$
$$\sim N\left[\left(V_{ij}^{-1} + \Psi_j^{-1}\right)^{-1}\left(V_{ij}^{-1}y_{ij} + \Psi_j^{-1}(X_{ij}\alpha + Z_{ij}b_j)\right),\ r^{-1}\left(V_{ij}^{-1} + \Psi_j^{-1}\right)^{-1}\right];$$

(vi) $Y_{ij}^{(2)} \mid c_t^{(1)}, \theta, r, \alpha, b_1,\ldots,b_t, \Psi_1,\ldots,\Psi_t, W$
$$\sim N_{N_{ij}-n_{ij}}\left[\theta_{ij}^{(2)} + V_{ij21}V_{ij11}^{-1}\left(y_{ij}^{(1)} - \theta_{ij}^{(1)}\right),\ r^{-1}\left(V_{ij22} - V_{ij21}V_{ij11}^{-1}V_{ij12}\right)\right].$$

4.3 The Hierarchical Bayes Predictor in a Special Case

This section considers a special case of the HB model in Section 4.2. We assume that the vector of ratios of variance components, $\lambda$, is known. Our target is again to find an optimal predictor of $\xi_t = \xi_t(C_t^{(1)}, C_t^{(2)}) = P_1C_t^{(1)} + P_2C_t^{(2)}$ for some appropriate $P_1$ and $P_2$, based on all the available data at a given time. To this end, it suffices to find the predictive distribution of $C_t^{(2)}$ given $C_t^{(1)} = c_t^{(1)}$. This is done in the following theorem.

Theorem 4.3.1 Consider the model given in (4.2.1)-(4.2.2) with $\lambda$ known. Assume that $\alpha$ and $R$ are independently distributed with $\alpha \sim \text{Uniform}(R^p)$ and $R \sim \text{Gamma}(\tfrac{1}{2}a_0, \tfrac{1}{2}h_0)$, $a_0 \ge 0$, $h_0 > 0$. Also assume that $n_t + h_0 - p > 2$. Then, conditional on $C_t^{(1)} = c_t^{(1)}$, $C_t^{(2)}$ is distributed as multivariate $t$ with degrees of freedom $n_t + h_0 - p$, location parameter $M_tc_t^{(1)}$ and scale parameter
$$\Omega_t = (n_t + h_0 - p)^{-1}\left(a_0 + c_t^{(1)T}K_tc_t^{(1)}\right)G_t. \tag{4.3.7}$$
The proof of this theorem is similar to that of Theorem 4.2.1 and is omitted. In this case, it is possible to obtain the Bayes estimate of $\xi_t = \xi_t(C_t^{(1)}, C_t^{(2)})$ in closed form; it is given by
$$e_t^{B*}(c_t^{(1)}) = \left(P_1 + P_2M_t\right)c_t^{(1)}. \tag{4.3.8}$$
Remark 4.3.1 We may note that the predictor $e_t^{B*}(c_t^{(1)})$ given in (4.3.8) can be obtained without using any numerical integration techniques or the Gibbs sampling algorithm.

Remark 4.3.2 The predictor $e_t^{B*}(c_t^{(1)})$ does not depend on the choice of the (proper) prior distribution of $R$. In this sense, the predictor $e_t^{B*}(c_t^{(1)})$ is robust against the choice of priors for $R$.
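With $\lambda$ known, (4.3.8) is a plain matrix-vector computation. The NumPy sketch below assembles $M_t$ from the partitioned blocks and applies the predictor for the choice of $P_1$ and $P_2$ corresponding to a finite population total. Everything here is an illustrative assumption — toy sizes, an intercept-only design, and an exchangeable stand-in for $\Sigma_t$ — not quantities from the dissertation's application.

```python
import numpy as np

rng = np.random.default_rng(3)
n, N2 = 5, 7                                    # sampled and unsampled unit counts
A1, A2 = np.ones((n, 1)), np.ones((N2, 1))      # intercept-only design (assumption)
Sigma = np.eye(n + N2) + 0.3                    # exchangeable stand-in for Sigma_t
S11, S21 = Sigma[:n, :n], Sigma[n:, :n]

S11i = np.linalg.inv(S11)
H = np.linalg.inv(A1.T @ S11i @ A1)             # (A1' S11^-1 A1)^-1
# M_t = S21 S11^-1 + (A2 - S21 S11^-1 A1) (A1' S11^-1 A1)^-1 A1' S11^-1
Mt = S21 @ S11i + (A2 - S21 @ S11i @ A1) @ H @ A1.T @ S11i

P1, P2 = np.ones((1, n)), np.ones((1, N2))      # P1, P2 for the population total
c1 = rng.normal(10.0, 1.0, size=n)              # observed sampled values (simulated)
eB = (P1 + P2 @ Mt) @ c1                        # e_t^{B*}(c^(1)) = (P1 + P2 M_t) c^(1)
```

A quick sanity check on the construction: since $M_tA_t^{(1)} = A_t^{(2)}$, a constant sample $c^{(1)} \equiv c$ yields a predicted total of exactly $c(n_t + (N_t - n_t)) = cN_t$.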
To see the robustness claimed in Remark 4.3.2, note that
$$E\left(C_t^{(2)} \mid C_t^{(1)}\right) = E\left[E\left\{E\left(C_t^{(2)} \mid \alpha, R, C_t^{(1)}\right) \mid R, C_t^{(1)}\right\} \mid C_t^{(1)}\right]$$
$$= E\left[E\left\{A_t^{(2)}\alpha + \Sigma_{t21}\Sigma_{t11}^{-1}\left(C_t^{(1)} - A_t^{(1)}\alpha\right) \mid R, C_t^{(1)}\right\} \mid C_t^{(1)}\right]$$
$$= \Sigma_{t21}\Sigma_{t11}^{-1}C_t^{(1)} + \left(A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)}\right)E\left[E\left\{\alpha \mid R, C_t^{(1)}\right\} \mid C_t^{(1)}\right]$$
$$= \Sigma_{t21}\Sigma_{t11}^{-1}C_t^{(1)} + \left(A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)}\right)\left(A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\right)^{-1}A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)} = M_tC_t^{(1)}.$$
It is assumed that all the expectations appearing above exist.

Remark 4.3.3 The predictor $e_t^{B*}(c_t^{(1)})$ of $\xi_t(C_t^{(1)}, C_t^{(2)})$ can also be obtained in an alternative manner. Suppose first that $\alpha$ is known, while $r$ may be known or unknown. In such a case, the best predictor (the best linear predictor without the normality assumption) of $\xi_t(C_t^{(1)}, C_t^{(2)})$, in the sense of having the smallest mean squared error matrix, is given by
$$E_{\alpha,R}\left[\xi_t(C_t^{(1)}, C_t^{(2)}) \mid C_t^{(1)}\right] = P_1C_t^{(1)} + P_2\left[\Sigma_{t21}\Sigma_{t11}^{-1}C_t^{(1)} + \left(A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)}\right)\alpha\right] \quad \text{a.e.}\ (C_t^{(1)}).$$
Now, when $\alpha$ is unknown, one replaces $\alpha$ by its UMVUE $\left(A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\right)^{-1}A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)}$. The resulting predictor of $\xi_t(C_t^{(1)}, C_t^{(2)})$ turns out to be $e_t^{B*}(c_t^{(1)})$. In this sense, $e_t^{B*}(c_t^{(1)})$ is also an EB predictor of $\xi_t(C_t^{(1)}, C_t^{(2)})$.

4.4 Best Unbiased Prediction in Small Area Estimation

In this section, we prove the optimality of the predictor $e_t^{B*}$ within the class of all unbiased predictors of $\xi_t(C_t^{(1)}, C_t^{(2)})$. We consider the model given in (4.2.1)-(4.2.2) with $\lambda$ known, but $\alpha$ and $r = (\sigma^2)^{-1}$ unknown. We do not assume any prior distributions for $\alpha$ and $r$, and treat them as unknown parameters. Write $\phi = (\alpha^T, r)^T$.

Definition 4.4.1 A predictor $T(C_t^{(1)})$ is said to be a BUP of $\xi_t(C_t^{(1)}, C_t^{(2)})$ if
$$E_\phi\left[T(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)})\right] = 0 \ \text{for all}\ \phi,$$
and, for every predictor $\delta(C_t^{(1)})$ of $\xi_t(C_t^{(1)}, C_t^{(2)})$ satisfying
$$E_\phi\left[\delta(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)})\right] = 0 \ \text{for all}\ \phi, \tag{4.4.9}$$
the difference
$$V_\phi\left[\delta(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)})\right] - V_\phi\left[T(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)})\right] \tag{4.4.10}$$
is nonnegative definite (n.n.d.) for all $\phi$, provided the quantities are finite.

The BUP property of $e_t^{B*}$ is proved in the following theorem.

Theorem 4.4.1 Consider the model given in (4.2.1)-(4.2.2) with $\alpha$ and $r = (\sigma^2)^{-1}$ treated as unknown parameters but $\lambda$ assumed known. Then $e_t^{B*}(C_t^{(1)}) = (P_1 + P_2M_t)\,C_t^{(1)}$ is the BUP of $\xi_t(C_t^{(1)}, C_t^{(2)})$.
The proof of the theorem is facilitated by the following general lemma, which is a simple extension of Lemma 3.3.1 of Datta (1990). The lemma concerns unbiased prediction of a general $g(C_t)$ $(u \times 1)$ based on $C_t^{(1)}$, where $g(C_t)$ is not necessarily equal to $\xi_t(C_t^{(1)}, C_t^{(2)})$ or linear in $C_t$. It is assumed that a finite second moment exists for each component $g_i(C_t)$, $i = 1,\ldots,u$.

Lemma 4.4.1 A predictor $T(C_t^{(1)}) \in U_g$ is BUP for $g(C_t)$ if and only if
$$\text{Cov}_\phi\left[T(C_t^{(1)}) - g(C_t),\ m(C_t^{(1)})\right] = 0 \tag{4.4.11}$$
for all $\phi$ and for every $m \in U_0$. Here $U_g$ is the class of all unbiased predictors $\delta(C_t^{(1)})$ of $g(C_t)$ with each component of $\delta(C_t^{(1)})$ having a finite second moment, and $U_0$ is the class of all real-valued statistics (i.e., functions of $C_t^{(1)}$) with finite second moments having zero expectation identically in $\phi$.

Proof of Lemma 4.4.1 (If.) Let $T(C_t^{(1)}) = (T_1(C_t^{(1)}),\ldots,T_u(C_t^{(1)}))^T$. If $\delta(C_t^{(1)}) = (\delta_1(C_t^{(1)}),\ldots,\delta_u(C_t^{(1)}))^T$ is another predictor in $U_g$, then
$$V_\phi\left[\delta(C_t^{(1)}) - g(C_t)\right] = V_\phi\left[T(C_t^{(1)}) - g(C_t)\right] + V_\phi\left[\delta(C_t^{(1)}) - T(C_t^{(1)})\right]$$
$$+\ \text{Cov}_\phi\left[T(C_t^{(1)}) - g(C_t),\ \delta(C_t^{(1)}) - T(C_t^{(1)})\right] + \text{Cov}_\phi\left[\delta(C_t^{(1)}) - T(C_t^{(1)}),\ T(C_t^{(1)}) - g(C_t)\right]. \tag{4.4.12}$$
Now $T_i(C_t^{(1)}) - \delta_i(C_t^{(1)}) \in U_0$ for every $i = 1,\ldots,u$. Hence, using the condition of the lemma, for $1 \le i \le u$,
$$\text{Cov}_\phi\left[T(C_t^{(1)}) - g(C_t),\ T_i(C_t^{(1)}) - \delta_i(C_t^{(1)})\right] = 0. \tag{4.4.13}$$
From (4.4.12) and (4.4.13) it follows that
$$V_\phi\left[\delta(C_t^{(1)}) - g(C_t)\right] = V_\phi\left[T(C_t^{(1)}) - g(C_t)\right] + V_\phi\left[\delta(C_t^{(1)}) - T(C_t^{(1)})\right] \tag{4.4.14}$$
for all $\phi$. Hence $T(C_t^{(1)})$ is BUP for $g(C_t)$.

(Only if.) Given that $T(C_t^{(1)})$ is BUP, we show that condition (4.4.11) holds. First we show that $T_i(C_t^{(1)})$ is BUP for $g_i(C_t)$ for every $i = 1,\ldots,u$. Let $U_i(C_t^{(1)})$ be any unbiased predictor for $g_i(C_t)$. Then $\delta^*(C_t^{(1)})$, the $u$-component vector with $i$th component equal to $U_i(C_t^{(1)})$ (and the remaining components equal to those of $T(C_t^{(1)})$), belongs to $U_g$. Then $V_\phi[\delta^*(C_t^{(1)}) - g(C_t)] - V_\phi[T(C_t^{(1)}) - g(C_t)]$ is n.n.d. by hypothesis, so that
$$V_\phi\left[U_i(C_t^{(1)}) - g_i(C_t)\right] - V_\phi\left[T_i(C_t^{(1)}) - g_i(C_t)\right] \ge 0,$$
and consequently $T_i(C_t^{(1)})$ is BUP for $g_i(C_t)$. Now, following the usual Lehmann-Scheffe (1950) technique (also Rao, 1952), for any $m \in U_0$,
$$\text{Cov}_\phi\left[T_i(C_t^{(1)}) - g_i(C_t),\ m(C_t^{(1)})\right] = 0 \tag{4.4.15}$$
for all $1 \le i \le u$.
Hence (4.4.11) holds, and the proof of the lemma is complete.

Remark 4.4.1 It follows from the proof of the above lemma that the BUP of $g(C_t)$ is unique with probability one.

Proof of Theorem 4.4.1 In view of Lemma 4.4.1, it suffices to show that for every $m(C_t^{(1)}) \in U_0$,
$$\text{Cov}_\phi\left[e_t^{B*}(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)}),\ m(C_t^{(1)})\right] = 0 \ \text{for all}\ \phi,$$
that is,
$$E_\phi\left[P_2\left(M_tC_t^{(1)} - C_t^{(2)}\right)m(C_t^{(1)})\right] = 0 \ \text{for all}\ \phi.$$
Since, under the model,
$$M_tC_t^{(1)} - \Sigma_{t21}\Sigma_{t11}^{-1}C_t^{(1)} = \left(A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)}\right)\left(A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\right)^{-1}A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)},$$
and using $E_\phi[m(C_t^{(1)})] = 0$, it suffices to show that
$$E_\phi\left[\left(A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)}\right)m(C_t^{(1)})\right] = 0 \tag{4.4.16}$$
for all $\phi$. Since $E_\phi[m(C_t^{(1)})] = 0$,
$$\int m(c_t^{(1)})\exp\left[-\tfrac{r}{2}\left(c_t^{(1)} - A_t^{(1)}\alpha\right)^T\Sigma_{t11}^{-1}\left(c_t^{(1)} - A_t^{(1)}\alpha\right)\right]dc_t^{(1)} = 0. \tag{4.4.17}$$
Differentiating both sides of this equation with respect to $\alpha$ (see p. 318 of Rao, 1973), one gets
$$\int A_t^{(1)T}\Sigma_{t11}^{-1}\left(c_t^{(1)} - A_t^{(1)}\alpha\right)m(c_t^{(1)})\exp\left[-\tfrac{r}{2}\left(c_t^{(1)} - A_t^{(1)}\alpha\right)^T\Sigma_{t11}^{-1}\left(c_t^{(1)} - A_t^{(1)}\alpha\right)\right]dc_t^{(1)} = 0. \tag{4.4.18}$$
Using $E_\phi[m(C_t^{(1)})] = 0$ again, (4.4.16) follows from (4.4.18). The proof of the theorem is complete.

Remark 4.4.2 We can also prove equation (4.4.16) in the following way. Note that since $C_t^{(1)} \sim N(A_t^{(1)}\alpha,\, r^{-1}\Sigma_{t11})$, the statistic $\left(\left(A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\right)^{-1}A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)},\ C_t^{(1)T}K_tC_t^{(1)}\right)$ is complete sufficient for $\phi$. Hence $A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)}$ must have zero covariance with every zero estimator $m(C_t^{(1)})$, i.e., $E_\phi\left[\left(A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)}\right)m(C_t^{(1)})\right] = 0$.

Next we generalize Theorem 4.4.1 to certain nonnormal general distributions. Suppose that $e^* = (w^T, \eta^T)^T$ and
$$\Lambda = \text{Block Diag}\left(I_t \otimes W,\ (I_m \otimes \Psi_t) + V_t\right).$$
Assume that, given $R = r$, $e^* \sim N(0, r^{-1}\Lambda)$, while marginally the distribution function of $R$ is an arbitrary member of the family $\mathcal{F} = \{F : F \text{ is absolutely continuous with pdf } f(r) = 0 \text{ for } r < 0\}$. Let $\mathcal{F}^*$ denote a subfamily of $\mathcal{F}$ such that each component of $e_t^{B*}(C_t^{(1)})$ and $\xi_t(C_t^{(1)}, C_t^{(2)})$ has finite second moment under the model (4.2.1) and the joint distribution of $e^*$ and $R$. In such a setup, we now prove the BUP property of the predictor $e_t^{B*}(C_t^{(1)})$ in the following theorem, which is a simple extension of Theorem 3.3.2 of Datta (1990).

Theorem 4.4.2 $e_t^{B*}(C_t^{(1)})$ is BUP of $\xi_t(C_t^{(1)}, C_t^{(2)})$ under the model in (4.2.1), with $e^* \mid R = r \sim N(0, r^{-1}\Lambda)$ and $R$ having a df from $\mathcal{F}^*$.
Proof of Theorem 4.4.2 Using Lemma 4.4.1 and following the proof of Theorem 4.4.1, it suffices to show that
$$E_{\alpha,F}\left[\left(A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)}\right)m(C_t^{(1)})\right] = 0 \tag{4.4.19}$$
for all $\alpha \in R^p$ and for all $F \in \mathcal{F}^*$, where
$$E_{\alpha,F}\left[m(C_t^{(1)})\right] = 0 \tag{4.4.20}$$
and $E_{\alpha,F}[m^2(C_t^{(1)})] < \infty$ for all $\alpha$ and for all $F \in \mathcal{F}^*$. Consider the subfamily $\mathcal{H} = \{R \sim \text{Gamma}(\tfrac{1}{2}c, \tfrac{1}{2}d) : c > 0,\ d > 2\}$ of $\mathcal{F}$. Since (4.4.20) holds for this subfamily, $E_{\alpha,F}[m(C_t^{(1)})] = 0$ for all $\alpha$ and for all $F \in \mathcal{H}$ gives
$$\int_0^\infty \exp\left(-\tfrac{1}{2}cr\right)r^{\frac{1}{2}(n_t+d)-1}\int \exp\left[-\tfrac{r}{2}\left(c_t^{(1)} - A_t^{(1)}\alpha\right)^T\Sigma_{t11}^{-1}\left(c_t^{(1)} - A_t^{(1)}\alpha\right)\right]m(c_t^{(1)})\,dc_t^{(1)}\,dr = 0 \tag{4.4.21}$$
for all $\alpha$, $c > 0$ and $d > 2$. Now, using the uniqueness property of Laplace transforms, it follows from (4.4.21) that
$$r^{\frac{1}{2}n_t}\int \exp\left[-\tfrac{r}{2}\left(c_t^{(1)} - A_t^{(1)}\alpha\right)^T\Sigma_{t11}^{-1}\left(c_t^{(1)} - A_t^{(1)}\alpha\right)\right]m(c_t^{(1)})\,dc_t^{(1)} = 0 \quad \text{a.e. Lebesgue}$$
for all $r > 0$ and all $\alpha$, i.e.,
$$\int \exp\left[-\tfrac{r}{2}\left(c_t^{(1)} - A_t^{(1)}\alpha\right)^T\Sigma_{t11}^{-1}\left(c_t^{(1)} - A_t^{(1)}\alpha\right)\right]m(c_t^{(1)})\,dc_t^{(1)} = 0 \tag{4.4.22}$$
a.e. Lebesgue for all $r > 0$ and all $\alpha$. Differentiation of both sides of (4.4.22) with respect to $\alpha$, and some simplification using (4.4.22), lead to
$$\int \left(A_t^{(1)T}\Sigma_{t11}^{-1}c_t^{(1)}\right)m(c_t^{(1)})\exp\left[-\tfrac{r}{2}\left(c_t^{(1)} - A_t^{(1)}\alpha\right)^T\Sigma_{t11}^{-1}\left(c_t^{(1)} - A_t^{(1)}\alpha\right)\right]dc_t^{(1)} = 0 \tag{4.4.23}$$
a.e. Lebesgue for all $r > 0$ and all $\alpha$. Multiplying both sides of (4.4.23) by $r^{\frac{1}{2}n_t}$ and integrating with respect to $dF(r)$, where $F \in \mathcal{F}^*$, one gets (4.4.19).

Remark 4.4.3 Since $\mathcal{F}^*$ does not contain the degenerate distributions of $R$ on $(0, \infty)$, Theorem 4.4.1 does not follow from Theorem 4.4.2.

Remark 4.4.4 In Theorem 4.4.2, if we take $\mathcal{H}$ for $\mathcal{F}^*$, we see that the marginal distribution of $C_t$ is given by the family of distributions $\{t(N_t;\, A_t\alpha,\, (c/d)\Sigma_t,\, d) : \alpha \in R^p,\ c > 0,\ d > 2\}$, and $e_t^{B*}(c_t^{(1)})$ is BUP for $\xi_t(C_t^{(1)}, C_t^{(2)})$ for this family, where $t(N_t;\, A_t\alpha,\, (c/d)\Sigma_t,\, d)$ is the $N_t$-variate $t$-distribution with location parameter $A_t\alpha$, scale parameter $(c/d)\Sigma_t$ and degrees of freedom $d$.

Next we dispense with any distributional assumptions in (4.2.1) and show that $e_t^{B*}(c_t^{(1)})$ is the BUP of $\xi_t(C_t^{(1)}, C_t^{(2)})$ within the class of all linear unbiased predictors. A predictor $\delta(C_t^{(1)})$ is said to be linear if $\delta(C_t^{(1)})$ has the form $L_tC_t^{(1)}$ for some known $u \times n_t$ matrix $L_t$.
If, in addition, $E_\phi[\delta(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)})] = 0$ for all $\phi$, we say that $\delta(C_t^{(1)})$ is a LUP of $\xi_t(C_t^{(1)}, C_t^{(2)})$. We now introduce another definition.

Definition 4.4.2 A LUP $S_tC_t^{(1)}$ of $\xi_t(C_t^{(1)}, C_t^{(2)})$ is said to be a BLUP if for every LUP $L_tC_t^{(1)}$ of $\xi_t(C_t^{(1)}, C_t^{(2)})$,
$$V_\phi\left[L_tC_t^{(1)} - \xi_t(C_t^{(1)}, C_t^{(2)})\right] - V_\phi\left[S_tC_t^{(1)} - \xi_t(C_t^{(1)}, C_t^{(2)})\right]$$
is n.n.d. for all $\phi$.

We now prove the BLUP property of $e_t^{B*}(C_t^{(1)})$ in predicting $\xi_t(C_t^{(1)}, C_t^{(2)})$ in the following theorem.

Theorem 4.4.3 Consider the model in (4.2.1) and assume that $E_\phi(w) = 0$, $E_\phi(\eta) = 0$, $E_\phi(w\eta^T) = 0$, $E_\phi(\eta^T\eta) < \infty$ and $E_\phi(w^Tw) < \infty$. Then $e_t^{B*}(C_t^{(1)})$ is the BLUP of $\xi_t(C_t^{(1)}, C_t^{(2)})$.

The proof of this theorem requires the following lemma.

Lemma 4.4.2 A LUP $S_tC_t^{(1)}$ of $\xi_t(C_t^{(1)}, C_t^{(2)})$ is a BLUP if and only if
$$\text{Cov}_\phi\left[S_tC_t^{(1)} - \xi_t(C_t^{(1)}, C_t^{(2)}),\ m^TC_t^{(1)}\right] = 0$$
for all $\phi$ and for every known $n_t \times 1$ vector $m$ satisfying $E_\phi(m^TC_t^{(1)}) = 0$ for all $\phi$. The proof of this lemma is similar to that of Lemma 4.4.1 and is omitted.

Proof of Theorem 4.4.3 If $E_\phi(m^TC_t^{(1)}) = m^TA_t^{(1)}\alpha = 0$ for all $\alpha$, then $m^TA_t^{(1)} = 0^T$. Hence
$$\text{Cov}_\phi\left[e_t^{B*}(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)}),\ m^TC_t^{(1)}\right] = \text{Cov}_\phi\left[P_2\left(M_tC_t^{(1)} - C_t^{(2)}\right),\ m^TC_t^{(1)}\right]$$
$$= r^{-1}P_2\left(M_t\Sigma_{t11} - \Sigma_{t21}\right)m = r^{-1}P_2\left(A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)}\right)\left(A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\right)^{-1}A_t^{(1)T}m = 0$$
for all $\phi$. The last two equalities follow from the definition of $M_t$ and from the fact that $m^TA_t^{(1)} = 0^T$. Applying Lemma 4.4.2, the result follows.

Remark 4.4.5 It can easily be proved that the BLUP of $g(C_t)$ is unique with probability one.

4.5 Stochastic Domination

This section examines the stochastic dominance (as in Hwang (1985)) of the HB predictor $e_t^{B*}(c_t^{(1)})$ within the class of all linear unbiased predictors of $\xi_t(C_t^{(1)}, C_t^{(2)})$. We consider the model of Section 4.2 with $\lambda$ known. As in the previous section, we treat $\alpha$ and $R$ as unknown parameters. Also, we write $e^* = (w^T, \eta^T)^T$, where $w$ and $\eta$ are as defined in Section 4.2, and we dispense with the normality assumptions on $w$ and $\eta$. In the previous section, we established the risk optimality of the HB predictor $e_t^{B*}(c_t^{(1)})$ within the class of all LUPs of $\xi_t(C_t^{(1)}, C_t^{(2)})$ without any distributional assumptions on $e^*$.
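The proof of Theorem 4.4.3 rests on the matrix identity $M_t\Sigma_{t11} - \Sigma_{t21} = (A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)})(A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)})^{-1}A_t^{(1)T}$, which is annihilated by any vector $m$ with $m^TA_t^{(1)} = 0^T$. The NumPy sketch below, with arbitrary simulated matrices (illustrative only, not from the dissertation's data), confirms this numerically.

```python
import numpy as np

rng = np.random.default_rng(4)
n, N2, p = 6, 4, 2
A1 = rng.standard_normal((n, p))                      # stand-in for A_t^(1)
A2 = rng.standard_normal((N2, p))                     # stand-in for A_t^(2)
Z = rng.standard_normal((n + N2, n + N2))
Sigma = Z @ Z.T + (n + N2) * np.eye(n + N2)           # positive definite stand-in for Sigma_t
S11, S21 = Sigma[:n, :n], Sigma[n:, :n]

S11i = np.linalg.inv(S11)
H = np.linalg.inv(A1.T @ S11i @ A1)
Mt = S21 @ S11i + (A2 - S21 @ S11i @ A1) @ H @ A1.T @ S11i

# any m with m' A1 = 0: project a random vector onto the orthocomplement of col(A1)
v = rng.standard_normal(n)
m = v - A1 @ np.linalg.inv(A1.T @ A1) @ (A1.T @ v)

cov_term = (Mt @ S11 - S21) @ m   # proportional to Cov[P2(Mt C1 - C2), m' C1]; should vanish
```

The vanishing of `cov_term` for every such `m` is exactly the orthogonality condition of Lemma 4.4.2, which delivers the BLUP property.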
The optimality of $e_t^{B*}(c_t^{(1)})$ within LUPs holds a fortiori under the quadratic loss
$$L = \|\delta - \xi_t\|_\Omega^2 = (\delta - \xi_t)^T\Omega(\delta - \xi_t) = \text{tr}\left[\Omega\,L_0(\xi_t, \delta)\right],$$
where $\Omega$ is a n.n.d. matrix. We shall refer to such a loss as generalized Euclidean error w.r.t. $\Omega$. Let
$$R_L(\phi, \xi_t; \delta) = E_\phi\left[L\left(\delta(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)})\right)\right]$$
be the risk function of the predictor $\delta$ for predicting $\xi_t$ under a loss function which is a function $L$ of the generalized Euclidean error w.r.t. $\Omega$. We now assume that $e^*$ has an elliptically symmetric distribution with pdf given by
$$h(e^* \mid \Lambda, r) \propto |r^{-1}\Lambda|^{-\frac{1}{2}}f\left(r\,e^{*T}\Lambda^{-1}e^*\right), \tag{4.5.24}$$
where $\Lambda = \text{Block Diag}\left(I_t \otimes W,\ (I_m \otimes \Psi_t) + V_t\right)$, and the known nonnegative function $f$ is such that
$$\int\left[\sum_{j=1}^{t}\sum_{k=1}^{q}w_{jk}^2 + \sum_{i=1}^{m}\sum_{j=1}^{t}\sum_{k=1}^{N_{ij}}\eta_{ijk}^2 + 1\right]f\left(r\,e^{*T}\Lambda^{-1}e^*\right)de^* < \infty, \tag{4.5.25}$$
where $w_j = (w_{j1}, w_{j2},\ldots,w_{jq})^T$. We denote this distribution by $\mathcal{E}_f(0, r^{-1}\Lambda)$, where $\mathcal{E}_f(\mu, \sigma^2\Psi^*)$ denotes the distribution whose pdf is given by
$$k(s \mid \mu, \sigma^2) \propto |\sigma^2\Psi^*|^{-\frac{1}{2}}f\left((s - \mu)^T\Psi^{*-1}(s - \mu)/\sigma^2\right), \tag{4.5.26}$$
where $s$ and $\mu$ are in $R^{\tilde p}$, $\Psi^*$ $(\tilde p \times \tilde p)$ is p.d., and $\sigma > 0$. Note that normality of $e^*$ with mean $0$ and variance-covariance matrix $r^{-1}\Lambda$ is sufficient but not necessary for (4.5.24) and (4.5.25) to hold. It follows from (4.5.25) that $E(e^*)$ exists, and from (4.5.24) that $E(e^*) = 0$.

Note that $e^{**} = r^{\frac{1}{2}}\Lambda^{-\frac{1}{2}}e^*$ has a spherically symmetric distribution $\mathcal{E}_f(0, I_{q^*})$ with characteristic function (c.f.) $E[\exp(iu^Te^{**})] = c(u^Tu)$ for some function $c$ (see Kelker, 1970), where $i = \sqrt{-1}$, $u = (u_1,\ldots,u_{q^*})^T$ and $q^* = N_t + tq$. Hence $e^*$ has c.f. given by
$$E\left[\exp(iu^Te^*)\right] = c\left(r^{-1}u^T\Lambda u\right). \tag{4.5.27}$$
Now write $D_t^{(j)} = B_t^{(j)}w + \eta^{(j)}$ $(j = 1, 2)$. Then $D_t = (D_t^{(1)T}, D_t^{(2)T})^T$ has c.f. given by
$$E\left[\exp(iu^{*T}D_t)\right] = c\left(r^{-1}u^{*T}\Sigma_tu^*\right), \tag{4.5.28}$$
where $\Sigma_t = B_t(I_t \otimes W)B_t^T + (I_m \otimes \Psi_t) + V_t$. Comparing (4.5.24) and (4.5.27), one can see from (4.5.28) that $D_t$ also has an elliptically symmetric distribution $\mathcal{E}_f(0, r^{-1}\Sigma_t)$, with pdf given by
$$h(d^* \mid \Sigma_t, r) \propto |r^{-1}\Sigma_t|^{-\frac{1}{2}}f\left(r\,d^{*T}\Sigma_t^{-1}d^*\right). \tag{4.5.29}$$
We now introduce the notion of "stochastic" or "universal" domination as given in Hwang (1985).

Definition 4.5.1
An estimator $\delta_1(C_t^{(1)})$ universally dominates $\delta_2(C_t^{(1)})$ (under the generalized Euclidean error w.r.t. $\Omega$) if for every $\phi$ and every nondecreasing loss function $L$,
$$R_L(\phi, \xi_t; \delta_1) \le R_L(\phi, \xi_t; \delta_2)$$
holds, and for a particular loss the risk functions are not identical.

In Hwang (1985), it has been shown that $\delta_1$ universally dominates $\delta_2$ under the generalized Euclidean error w.r.t. $\Omega$ if and only if $\|\delta_1(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)})\|_\Omega$ is stochastically smaller than $\|\delta_2(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)})\|_\Omega$. The following theorem establishes the stochastic dominance of $e_t^{B*}(C_t^{(1)})$ over every LUP of $\xi_t(C_t^{(1)}, C_t^{(2)})$ for a general class of elliptically symmetric distributions of $e^*$.

Theorem 4.5.1 Under the model given in (4.2.1) with $\lambda$ known, and assuming $e^*$ has an elliptically symmetric pdf as in (4.5.24), $e_t^{B*}(C_t^{(1)})$ universally dominates every LUP $\delta(C_t^{(1)}) = S_tC_t^{(1)}$ of $\xi_t(C_t^{(1)}, C_t^{(2)})$ for every p.d. $\Omega$.

To prove this theorem, we require the following lemma.

Lemma 4.5.1 If $D$ $(N_t \times 1)$ has pdf $h(d \mid I_{N_t}, r)$, then for every $L$ $(u \times N_t)$, $u \le N_t$,
$$LD \overset{d}{=} (LL^T)^{\frac{1}{2}}D_u, \qquad D_u = (I_u,\ 0)D,$$
where $0$ $(u \times (N_t - u))$ is a null matrix and $\overset{d}{=}$ means equal in distribution.

Proof of Lemma 4.5.1 The proof follows the arguments of Hwang (1985). From (4.5.27) it follows that $D$ has c.f. $E[\exp(is^TD)] = c(r^{-1}s^Ts)$. Hence
$$E\left[\exp\left(is_1^TLD\right)\right] = c\left(r^{-1}s_1^TLL^Ts_1\right), \tag{4.5.30}$$
where $s_1$ is a $u \times 1$ vector. Next,
$$E\left[\exp\left(is_1^T(LL^T)^{\frac{1}{2}}D_u\right)\right] = E\left[\exp\left(is_1^T(LL^T)^{\frac{1}{2}}(I_u,\ 0)D\right)\right]$$
$$= c\left(r^{-1}s_1^T(LL^T)^{\frac{1}{2}}(I_u,\ 0)(I_u,\ 0)^T(LL^T)^{\frac{1}{2}}s_1\right) = c\left(r^{-1}s_1^T(LL^T)s_1\right), \tag{4.5.31}$$
so that the lemma follows from (4.5.30) and (4.5.31).

It follows as a consequence of Lemma 4.5.1 that
$$D^TL^T\Omega LD = (LD)^T\Omega(LD) \overset{d}{=} \left((LL^T)^{\frac{1}{2}}D_u\right)^T\Omega\left((LL^T)^{\frac{1}{2}}D_u\right) = D_u^T(LL^T)^{\frac{1}{2}}\Omega(LL^T)^{\frac{1}{2}}D_u. \tag{4.5.32}$$
We shall use (4.5.32) repeatedly in proving Theorem 4.5.1.

Proof of Theorem 4.5.1 Let $S_tC_t^{(1)}$ be a LUP of $\xi_t(C_t^{(1)}, C_t^{(2)})$.
Then, under the time series model (4.2.1) with $\lambda$ known, and from the fact that $E(e^*) = 0$, it is easy to see that unbiasedness requires
$$(S_t - P_1)A_t^{(1)} = P_2A_t^{(2)}. \tag{4.5.33}$$
Writing $D_t = r^{-\frac{1}{2}}\Sigma_t^{\frac{1}{2}}D^*$, where $D^*$ has the spherically symmetric pdf $h(d^* \mid I_{N_t}, 1)$, and using (4.5.32) and (4.5.33), one gets
$$\left[S_tC_t^{(1)} - \xi_t(C_t^{(1)}, C_t^{(2)})\right]^T\Omega\left[S_tC_t^{(1)} - \xi_t(C_t^{(1)}, C_t^{(2)})\right]$$
$$= r^{-1}D^{*T}\Sigma_t^{\frac{1}{2}}\left[S_t - P_1,\ -P_2\right]^T\Omega\left[S_t - P_1,\ -P_2\right]\Sigma_t^{\frac{1}{2}}D^*$$
$$\overset{d}{=} r^{-1}D_u^{*T}\,\Omega^{\frac{1}{2}}\left[S_t - P_1,\ -P_2\right]\Sigma_t\left[S_t - P_1,\ -P_2\right]^T\Omega^{\frac{1}{2}}\,D_u^*. \tag{4.5.34}$$
Similarly,
$$\left[e_t^{B*}(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)})\right]^T\Omega\left[e_t^{B*}(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)})\right]$$
$$\overset{d}{=} r^{-1}D_u^{*T}\,\Omega^{\frac{1}{2}}\left[P_2M_t,\ -P_2\right]\Sigma_t\left[P_2M_t,\ -P_2\right]^T\Omega^{\frac{1}{2}}\,D_u^*. \tag{4.5.35}$$
Write $\Gamma = S_t - P_1 - P_2M_t$. Then
$$\text{rhs of (4.5.34)} = \text{rhs of (4.5.35)} + r^{-1}D_u^{*T}\,\Omega^{\frac{1}{2}}\Gamma\Sigma_{t11}\Gamma^T\Omega^{\frac{1}{2}}\,D_u^*, \tag{4.5.36}$$
since, using the definition of $M_t$,
$$\left[P_2M_t,\ -P_2\right]\Sigma_t\left[\Gamma,\ 0\right]^T = P_2\left(M_t\Sigma_{t11} - \Sigma_{t21}\right)\Gamma^T = P_2\left(A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)}\right)\left(A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\right)^{-1}A_t^{(1)T}\Gamma^T, \tag{4.5.37}$$
and, using (4.5.33),
$$\Gamma A_t^{(1)} = (S_t - P_1 - P_2M_t)A_t^{(1)} = P_2\left(A_t^{(2)} - M_tA_t^{(1)}\right) = 0. \tag{4.5.38}$$
Theorem 4.5.1 now follows from (4.5.34)-(4.5.36). Also, since $\Sigma_{t11}$ is positive definite, it follows from (4.5.36) that the rhs of (4.5.34) equals the rhs of (4.5.35) if and only if $\Gamma = 0$, that is, $S_t = P_1 + P_2M_t$.

4.6 Best Equivariant Prediction

In this section we concentrate on the equivariant prediction of $\xi_t(C_t^{(1)}, C_t^{(2)})$ on the basis of $C_t^{(1)}$ under suitable groups of transformations. Assuming $\lambda$ known, we rewrite the model in (4.2.1) as
$$C_t \sim N\left(A_t\alpha,\ \sigma^2\Sigma_t\right), \tag{4.6.39}$$
where
$$\Sigma_t = B_t(I_t \otimes W)B_t^T + (I_m \otimes \Psi_t) + V_t. \tag{4.6.40}$$
We redefine $\phi = (\sigma, \alpha^T)^T$. We are interested in equivariant estimation of $(P_1A_t^{(1)} + P_2A_t^{(2)})\alpha$, where the matrices $A_t^{(1)}$, $A_t^{(2)}$, $P_1$ and $P_2$ are as introduced in Section 4.2. It suffices to estimate the vector $\alpha$, where $A_t\alpha$ can be viewed as a location vector. Note that
$$C_t + A_t\beta \sim N\left(A_t(\alpha + \beta),\ \sigma^2\Sigma_t\right).$$
We are interested in developing an estimator of $\alpha$ which remains equivariant under a new origin of measurement.
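The heart of the stochastic domination proof is the exact risk decomposition (4.5.36): any LUP coefficient satisfying the unbiasedness constraint (4.5.33) can be written as $S_t = P_1 + P_2M_t + \Gamma$ with $\Gamma A_t^{(1)} = 0$, and the cross terms (4.5.37)-(4.5.38) vanish, leaving an n.n.d. penalty $\Gamma\Sigma_{t11}\Gamma^T$. The NumPy sketch below (arbitrary simulated matrices, purely illustrative) builds such an $S_t$ and verifies the decomposition numerically.

```python
import numpy as np

rng = np.random.default_rng(6)
n, N2, p, u = 5, 4, 2, 3
A1 = rng.standard_normal((n, p))
A2 = rng.standard_normal((N2, p))
Z = rng.standard_normal((n + N2, n + N2))
Sigma = Z @ Z.T + (n + N2) * np.eye(n + N2)           # positive definite stand-in for Sigma_t
S11, S21 = Sigma[:n, :n], Sigma[n:, :n]

S11i = np.linalg.inv(S11)
H = np.linalg.inv(A1.T @ S11i @ A1)
Mt = S21 @ S11i + (A2 - S21 @ S11i @ A1) @ H @ A1.T @ S11i

P1 = rng.standard_normal((u, n))
P2 = rng.standard_normal((u, N2))
# a generic LUP coefficient: S = P1 + P2 Mt + Gamma with Gamma A1 = 0 (constraint (4.5.33))
Proj = np.eye(n) - A1 @ np.linalg.inv(A1.T @ A1) @ A1.T   # projector annihilating col(A1)
Gamma = rng.standard_normal((u, n)) @ Proj
S = P1 + P2 @ Mt + Gamma

risk_S  = np.block([S - P1, -P2]) @ Sigma @ np.block([S - P1, -P2]).T
risk_HB = np.block([P2 @ Mt, -P2]) @ Sigma @ np.block([P2 @ Mt, -P2]).T
extra   = Gamma @ S11 @ Gamma.T                       # the Gamma Sigma_t11 Gamma' penalty
```

Since the penalty term is nonnegative definite, the HB choice $\Gamma = 0$ minimizes the error covariance matrix among all LUPs, which is what drives the universal domination result.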
So a "natural" group of transformations will be
$$\mathcal{G}_1 = \{g_\beta,\ \beta \in R^p : g_\beta(C_t) = C_t + A_t\beta\}. \tag{4.6.41}$$
Now assume that we partition $C_t$ and $A_t$ as
$$C_t = \begin{pmatrix} C_t^{(1)} \\ C_t^{(2)} \end{pmatrix}, \qquad A_t = \begin{pmatrix} A_t^{(1)} \\ A_t^{(2)} \end{pmatrix}.$$
If $\delta(C_t^{(1)})$ estimates $(P_1A_t^{(1)} + P_2A_t^{(2)})\alpha$, then $\delta(C_t^{(1)}) + (P_1A_t^{(1)} + P_2A_t^{(2)})\beta$ should estimate $(P_1A_t^{(1)} + P_2A_t^{(2)})(\alpha + \beta)$. Treating $A_t(\alpha + \beta)$ as the new location parameter, one can expect that $\delta(C_t^{(1)} + A_t^{(1)}\beta)$ (an estimator based on the "observed part" of $C_t$ and $A_t$) will estimate $(P_1A_t^{(1)} + P_2A_t^{(2)})(\alpha + \beta)$. So we should have
$$\delta(C_t^{(1)} + A_t^{(1)}\beta) = \delta(C_t^{(1)}) + (P_1A_t^{(1)} + P_2A_t^{(2)})\beta \tag{4.6.42}$$
for all $C_t^{(1)}$ and $\beta$. Now, if we are interested in predicting $\xi_t(C_t^{(1)}, C_t^{(2)})$ instead of $E_\phi\,\xi_t(C_t^{(1)}, C_t^{(2)}) = (P_1A_t^{(1)} + P_2A_t^{(2)})\alpha$, we can still use $\delta(C_t^{(1)})$, and again we impose (4.6.42) on $\delta$. Note that the induced group of transformations on the parameter space
$$\Phi = \{\phi : \phi = (\sigma, \alpha^T)^T,\ \alpha \in R^p,\ \sigma > 0\} \tag{4.6.43}$$
is given by
$$\bar{\mathcal{G}}_1 = \{\bar g_\beta,\ \beta \in R^p : \bar g_\beta(\phi) = (\sigma, \alpha^T + \beta^T)^T\}. \tag{4.6.44}$$
A loss function $L(\xi_t, \phi; \delta)$ is said to be invariant for predicting $\xi_t(C_t^{(1)}, C_t^{(2)})$ by $\delta(C_t^{(1)})$ if
$$L\left(\xi_t(C_t^{(1)}, C_t^{(2)}) + (P_1A_t^{(1)} + P_2A_t^{(2)})\beta,\ \bar g_\beta(\phi);\ \delta(C_t^{(1)}) + (P_1A_t^{(1)} + P_2A_t^{(2)})\beta\right) = L\left(\xi_t(C_t^{(1)}, C_t^{(2)}),\ \phi;\ \delta(C_t^{(1)})\right) \tag{4.6.45}$$
for all $C_t^{(1)}$, $C_t^{(2)}$, $\beta$, and $\phi$. An estimator $\delta$ satisfying (4.6.42) is said to be equivariant. We will now be interested in the best equivariant prediction of $\xi_t(C_t^{(1)}, C_t^{(2)})$. The following lemma provides a useful characterization of the class of equivariant predictors of $\xi_t(C_t^{(1)}, C_t^{(2)})$.

Lemma 4.6.1 Let $\delta_0(C_t^{(1)})$ be an equivariant predictor for $\xi_t(C_t^{(1)}, C_t^{(2)})$. Then a necessary and sufficient condition for a predictor $\delta(C_t^{(1)})$ to be equivariant is that
$$\delta(C_t^{(1)}) = \delta_0(C_t^{(1)}) + h(C_t^{(1)}) \tag{4.6.46}$$
for all $C_t^{(1)}$, where
$$h(C_t^{(1)} + A_t^{(1)}\beta) = h(C_t^{(1)}) \tag{4.6.47}$$
for all $C_t^{(1)}$ and all $\beta$. A function $h$ satisfying (4.6.47) is said to be invariant, and hence must be a function of a maximal invariant under the group of transformations
$$\mathcal{G}_1' = \{g_\beta',\ \beta \in R^p : g_\beta'(C_t^{(1)}) = C_t^{(1)} + A_t^{(1)}\beta\} \tag{4.6.48}$$
induced by $\mathcal{G}_1$ in the $C_t^{(1)}$-space. The next lemma finds a maximal invariant under the group of transformations $\mathcal{G}_1'$.
Lemma 4.6.2 A maximal invariant under the group of transformations $\mathcal{G}_1'$ is $K_tC_t^{(1)}$, where
$$K_t = \Sigma_{t11}^{-1} - \Sigma_{t11}^{-1}A_t^{(1)}\left(A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\right)^{-1}A_t^{(1)T}\Sigma_{t11}^{-1}.$$
Proof of Lemma 4.6.2 First we show that $K_tC_t^{(1)}$ is an invariant function. Note that $K_tA_t^{(1)} = 0$, and hence $K_t(C_t^{(1)} + A_t^{(1)}\beta) = K_tC_t^{(1)}$ for all $C_t^{(1)}$ and $\beta$. Now we prove the maximality. For $C_{t1}^{(1)}$ and $C_{t2}^{(1)}$ such that $K_tC_{t1}^{(1)} = K_tC_{t2}^{(1)}$, we have that $C_{t1}^{(1)} - C_{t2}^{(1)}$ is orthogonal to the row space of $K_t$. Also, $\text{rank}(K_t) = n_t - p$ and $\text{rank}(A_t^{(1)}) = p$. Again using $K_tA_t^{(1)} = 0$, it follows that the column space of $A_t^{(1)}$ forms an orthocomplement of the row space of $K_t$. Hence $C_{t1}^{(1)} - C_{t2}^{(1)} = A_t^{(1)}\beta$ for some $\beta$, i.e., $C_{t1}^{(1)} = C_{t2}^{(1)} + A_t^{(1)}\beta$, proving thereby the maximality of $K_tC_t^{(1)}$.

Since $e_t^{B*}(C_t^{(1)})$ is an equivariant predictor of $\xi_t(C_t^{(1)}, C_t^{(2)})$, using Lemmas 4.6.1 and 4.6.2 we can characterize the class of all equivariant predictors of $\xi_t(C_t^{(1)}, C_t^{(2)})$. This is given in the next lemma.

Lemma 4.6.3 Under the group of transformations $\mathcal{G}_1$ given in (4.6.41), a predictor $\delta(C_t^{(1)})$ of $\xi_t(C_t^{(1)}, C_t^{(2)})$ is an equivariant predictor if and only if $\delta$ has the representation
$$\delta(C_t^{(1)}) = e_t^{B*}(C_t^{(1)}) + l(K_tC_t^{(1)}) \tag{4.6.49}$$
for all $C_t^{(1)}$, where $l$ is a vector-valued function of the $n_t$-variate argument $K_tC_t^{(1)}$.

Define the loss functions
$$L_0(\xi_t, \delta) = (\delta - \xi_t)(\delta - \xi_t)^T, \qquad L_1(\xi_t, \delta) = \text{tr}\left[\Omega\,L_0(\xi_t, \delta)\right], \tag{4.6.50}$$
where $\Omega$ is a n.n.d. matrix. Note that $L_0$ and $L_1$ are invariant loss functions.

Definition 4.6.1 An equivariant predictor $\delta_0(C_t^{(1)})$ of $\xi_t(C_t^{(1)}, C_t^{(2)})$ is said to be a best equivariant predictor under the loss $L_0$ if for every other equivariant predictor $\delta(C_t^{(1)})$ of $\xi_t(C_t^{(1)}, C_t^{(2)})$,
$$E_\phi\left[L_0(\xi_t, \delta) - L_0(\xi_t, \delta_0)\right]$$
is n.n.d. Note that if $\delta_0(C_t^{(1)})$ is the best equivariant predictor of $\xi_t(C_t^{(1)}, C_t^{(2)})$ under the loss $L_0$, then it is so under the loss $L_1$ for every n.n.d. $\Omega$. Conversely, if $\delta_0(C_t^{(1)})$ is the best equivariant predictor of $\xi_t(C_t^{(1)}, C_t^{(2)})$ under the loss $L_1$ for every n.n.d. $\Omega$, then it is so under $L_0$.
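The two claims of Lemma 4.6.2 — invariance and maximality — are easy to probe numerically. The sketch below (arbitrary simulated $\Sigma_{t11}$ and $A_t^{(1)}$, purely illustrative) checks that $K_tA_t^{(1)} = 0$, that a translation along the column space of $A_t^{(1)}$ is invisible to $K_t$, and that $\text{rank}(K_t) = n_t - p$, the rank fact which forces any two points with equal image under $K_t$ to differ by an element of the column space of $A_t^{(1)}$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 8, 3
A1 = rng.standard_normal((n, p))                 # stand-in for A_t^(1)
Z = rng.standard_normal((n, n))
S11 = Z @ Z.T + n * np.eye(n)                    # stand-in for Sigma_t11 (positive definite)

S11i = np.linalg.inv(S11)
H = np.linalg.inv(A1.T @ S11i @ A1)
K = S11i - S11i @ A1 @ H @ A1.T @ S11i           # K_t

c = rng.standard_normal(n)
beta = rng.standard_normal(p)
shifted = K @ (c + A1 @ beta)                    # shift along col(A1); K_t cannot see it
```

The same matrix $K_t$ also appears in the posterior scale (4.2.3) and in the complete sufficient statistic of Remark 4.4.2, so this one construction underlies several of the chapter's results.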
The next theorem establishes the optimality of $e_t^{B*}(C_t^{(1)})$ within the class of all equivariant predictors $\delta(C_t^{(1)})$ of $\xi_t(C_t^{(1)}, C_t^{(2)})$ under the loss $L_0$, and a fortiori under the loss $L_1$ for every n.n.d. $\Omega$.

Theorem 4.6.1 Under the normal time series linear model given in (4.2.1), the group of transformations $\mathcal{G}_1$ given in (4.6.41) and the loss $L_0$ given in (4.6.50), the best equivariant predictor of $\xi_t(C_t^{(1)}, C_t^{(2)})$ is given by $e_t^{B*}(C_t^{(1)})$.

Proof of Theorem 4.6.1 Note that from (4.6.49) any equivariant predictor $\delta(C_t^{(1)})$ of $\xi_t(C_t^{(1)}, C_t^{(2)})$ is given by $\delta(C_t^{(1)}) = e_t^{B*}(C_t^{(1)}) + l(K_tC_t^{(1)})$. Only those $\delta$ are to be considered for which $E_\phi[L_0(\xi_t, \delta)]$ is finite. Note that $E_\phi[L_0(\xi_t, e_t^{B*})]$ exists finitely, and hence
$$E_\phi\left[l^T(K_tC_t^{(1)})\,l(K_tC_t^{(1)})\right] \le 2\,\text{tr}\left\{E_\phi\left[L_0(\xi_t, e_t^{B*})\right] + E_\phi\left[L_0(\xi_t, \delta)\right]\right\}.$$
Hence, by the Cauchy-Schwarz inequality, $E_\phi[l(K_tC_t^{(1)})\,l^T(K_tC_t^{(1)})]$ is finite. Under the matrix loss $L_0$,
$$E_\phi\left[L_0(\xi_t, \delta) - L_0(\xi_t, e_t^{B*})\right] = E_\phi\left[l(K_tC_t^{(1)})\,l^T(K_tC_t^{(1)})\right] + E_\phi\left[\left(e_t^{B*}(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)})\right)l^T(K_tC_t^{(1)})\right] \tag{4.6.51}$$
$$+\ E_\phi\left[l(K_tC_t^{(1)})\left(e_t^{B*}(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)})\right)^T\right]. \tag{4.6.52}$$
Now we show that the last two product terms are null matrices. Using the definitions of $e_t^{B*}(C_t^{(1)})$ and $M_t$, we have
$$E_\phi\left[e_t^{B*}(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)}) \mid C_t^{(1)}\right] = P_2\left[M_tC_t^{(1)} - E_\phi\left(C_t^{(2)} \mid C_t^{(1)}\right)\right]$$
$$= P_2\left[M_tC_t^{(1)} - A_t^{(2)}\alpha - \Sigma_{t21}\Sigma_{t11}^{-1}\left(C_t^{(1)} - A_t^{(1)}\alpha\right)\right] \quad \text{a.e. Leb.}$$
$$= P_2\left(A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)}\right)\left(A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\right)^{-1}\left(A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)} - A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\alpha\right). \tag{4.6.53}$$
Now
$$\text{Cov}_\phi\left[A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)},\ K_tC_t^{(1)}\right] = \sigma^2A_t^{(1)T}\Sigma_{t11}^{-1}\Sigma_{t11}K_t = \sigma^2A_t^{(1)T}K_t = 0.$$
Since $C_t^{(1)}$ is multivariate normal, and $A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)}$ and $K_tC_t^{(1)}$ are linear functions of $C_t^{(1)}$, it follows that $A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)}$ and $K_tC_t^{(1)}$ are independently distributed.
Now, using (4.6.53) and the independence of $A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)}$ and $K_tC_t^{(1)}$, we have
$$E_\phi\left[\left(e_t^{B*}(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)})\right)l^T(K_tC_t^{(1)})\right] = E_\phi\left[E_\phi\left\{e_t^{B*}(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)}) \mid C_t^{(1)}\right\}l^T(K_tC_t^{(1)})\right]$$
$$= P_2\left(A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)}\right)\left(A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\right)^{-1}E_\phi\left[A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)} - A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\alpha\right]E_\phi\left[l^T(K_tC_t^{(1)})\right] = 0, \tag{4.6.54}$$
since $E_\phi\left[A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)}\right] = A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\alpha$. Now, from (4.6.52) and (4.6.54),
$$E_\phi\left[L_0(\xi_t, \delta) - L_0(\xi_t, e_t^{B*})\right] = E_\phi\left[l(K_tC_t^{(1)})\,l^T(K_tC_t^{(1)})\right], \tag{4.6.55}$$
which is n.n.d., with equality of the risk matrices if and only if $P_\phi\left[l(K_tC_t^{(1)}) = 0\right] = 1$, i.e., $P_\phi\left[\delta(C_t^{(1)}) = e_t^{B*}(C_t^{(1)})\right] = 1$. The proof of the theorem is complete.

Next, instead of $\mathcal{G}_1$, we consider the group $\mathcal{G}_2$ of transformations given by
$$\mathcal{G}_2 = \{g_{\beta,d},\ \beta \in R^p,\ d > 0 : g_{\beta,d}(C_t) = d\,C_t + A_t\beta\}. \tag{4.6.56}$$
A predictor $\delta(C_t^{(1)})$ of $\xi_t(C_t^{(1)}, C_t^{(2)})$ is said to be equivariant under the group $\mathcal{G}_2$ of transformations if
$$\delta(d\,C_t^{(1)} + A_t^{(1)}\beta) = d\,\delta(C_t^{(1)}) + (P_1A_t^{(1)} + P_2A_t^{(2)})\beta \tag{4.6.57}$$
for all $C_t^{(1)}$, $\beta$ and $d > 0$. Note that $e_t^{B*}(C_t^{(1)})$ is again an equivariant predictor of $\xi_t(C_t^{(1)}, C_t^{(2)})$ under the above criterion. Also note that if $\delta$ is equivariant under $\mathcal{G}_2$, then taking $d = 1$ in (4.6.57) shows that it is equivariant under $\mathcal{G}_1$. So the class of equivariant predictors under $\mathcal{G}_2$ is smaller than that under $\mathcal{G}_1$. But we will prove that $e_t^{B*}(C_t^{(1)})$ is best inside this smaller class under a larger family of distributions of $C_t$, which includes the normal distribution as a special case. We apply the group of transformations $\mathcal{G}_2$ to the family of elliptically symmetric distributions with pdf given by
$$f^*(c_t) \propto |\sigma^2\Sigma_t|^{-\frac{1}{2}}f\left((c_t - A_t\alpha)^T\Sigma_t^{-1}(c_t - A_t\alpha)/\sigma^2\right), \tag{4.6.58}$$
where $f$ is a known nonnegative function satisfying
$$\int\left[1 + (c_t - A_t\alpha)^T\Sigma_t^{-1}(c_t - A_t\alpha)\right]f\left((c_t - A_t\alpha)^T\Sigma_t^{-1}(c_t - A_t\alpha)/\sigma^2\right)dc_t < \infty. \tag{4.6.59}$$
Note that $C_t \sim N(A_t\alpha, \sigma^2\Sigma_t)$ is a special case satisfying (4.6.58) and (4.6.59). It is easy to see that the above group of transformations $\mathcal{G}_2$ on the $N_t$-dimensional Euclidean space for $C_t$ induces a group of transformations
$$\bar{\mathcal{G}}_2 = \{\bar g_{\beta,d},\ \beta \in R^p,\ d > 0 : \bar g_{\beta,d}(\phi) = (d\sigma,\ d\alpha^T + \beta^T)^T\} \tag{4.6.60}$$
on the parameter space given in (4.6.43).
As before, a loss function $L(\xi_t, \phi; \delta)$ for predicting $\xi_t(C_t^{(1)}, C_t^{(2)})$ by $\delta(C_t^{(1)})$ is invariant under the group of transformations $\mathcal{G}_2$ if
$$L\left(d\,\xi_t(C_t^{(1)}, C_t^{(2)}) + (P_1A_t^{(1)} + P_2A_t^{(2)})\beta,\ \bar g_{\beta,d}(\phi);\ d\,\delta(C_t^{(1)}) + (P_1A_t^{(1)} + P_2A_t^{(2)})\beta\right) = L\left(\xi_t(C_t^{(1)}, C_t^{(2)}),\ \phi;\ \delta(C_t^{(1)})\right) \tag{4.6.61}$$
for all $C_t^{(1)}$, $C_t^{(2)}$, $\beta$, $d\,(> 0)$ and $\phi$. We shall consider the special losses
$$L_2(\xi_t, \phi; \delta) = L_0(\xi_t, \delta)/\sigma^2 \tag{4.6.62}$$
and
$$L_3(\xi_t, \phi; \delta) = L_1(\xi_t, \delta)/\sigma^2. \tag{4.6.63}$$
Both these losses satisfy (4.6.61). To find the best equivariant predictor of $\xi_t(C_t^{(1)}, C_t^{(2)})$ in this setup, we present two lemmas which provide a useful characterization of the class of equivariant predictors.

Lemma 4.6.4 Let $\delta_0(C_t^{(1)})$ be an equivariant predictor for $\xi_t(C_t^{(1)}, C_t^{(2)})$. Then a necessary and sufficient condition for a predictor $\delta(C_t^{(1)})$ of $\xi_t(C_t^{(1)}, C_t^{(2)})$ to be equivariant is that
$$\delta(C_t^{(1)}) = \delta_0(C_t^{(1)}) + h(C_t^{(1)}) \tag{4.6.64}$$
for all $C_t^{(1)}$, where
$$h(d\,C_t^{(1)} + A_t^{(1)}\beta) = d\,h(C_t^{(1)}) \tag{4.6.65}$$
for all $C_t^{(1)}$, $\beta$ and $d > 0$.

We follow the arguments of Lemmas 2 and 3 of Datta and Ghosh (1988) to find a representation of the function $h$ in (4.6.65). Define
$$p(K_tC_t^{(1)}) = K_tC_t^{(1)}\big/\left(C_t^{(1)T}K_tC_t^{(1)}\right)^{\frac{1}{2}} \quad \text{for } C_t^{(1)T}K_tC_t^{(1)} > 0. \tag{4.6.66}$$
Note that since $K_t\Sigma_{t11}K_t = K_t$, we have $C_t^{(1)T}K_tC_t^{(1)} = (K_tC_t^{(1)})^T\Sigma_{t11}(K_tC_t^{(1)})$, and so $p$ is indeed a function of $K_tC_t^{(1)}$. It can be shown that $p(K_tC_t^{(1)})$ is a maximal invariant under the group of transformations
$$\mathcal{G}_2' = \{g_{\beta,d}',\ \beta \in R^p,\ d > 0 : g_{\beta,d}'(C_t^{(1)}) = d\,C_t^{(1)} + A_t^{(1)}\beta\} \tag{4.6.67}$$
induced by $\mathcal{G}_2$ in the $C_t^{(1)}$-space. The following lemma characterizes the class of functions $h(C_t^{(1)})$ satisfying (4.6.65); we will use it to characterize the class of equivariant predictors.

Lemma 4.6.5 A function $h(C_t^{(1)})$ $(u \times 1)$ satisfies (4.6.65) if and only if $h$ has the representation
$$h(C_t^{(1)}) = \left(C_t^{(1)T}K_tC_t^{(1)}\right)^{\frac{1}{2}}s\left(p(K_tC_t^{(1)})\right), \tag{4.6.68}$$
where $s$ $(u \times 1)$ is an arbitrary function of $p(K_tC_t^{(1)})$.

Proof of Lemma 4.6.5 (If.) Assume $h$ has the representation given in (4.6.68).
Now, since $(d\,C_t^{(1)} + A_t^{(1)}\beta)^TK_t(d\,C_t^{(1)} + A_t^{(1)}\beta) = d^2\,C_t^{(1)T}K_tC_t^{(1)}$, and since $p(K_tC_t^{(1)})$ is a maximal invariant under $\mathcal{G}_2'$, so that $p(K_t(d\,C_t^{(1)} + A_t^{(1)}\beta)) = p(K_tC_t^{(1)})$, it follows that
$$h(d\,C_t^{(1)} + A_t^{(1)}\beta) = \left(d^2\,C_t^{(1)T}K_tC_t^{(1)}\right)^{\frac{1}{2}}s\left(p(K_tC_t^{(1)})\right) = d\,h(C_t^{(1)}).$$
Hence (4.6.65) is satisfied.

(Only if.) Since $h$ satisfies (4.6.65) for all $C_t^{(1)}$, $\beta$ and $d > 0$, taking $d = 1$ we see that $h$ must satisfy $h(C_t^{(1)} + A_t^{(1)}\beta) = h(C_t^{(1)})$ for all $C_t^{(1)}$ and $\beta$. This implies that $h$ must be invariant under $\mathcal{G}_1'$, and hence must be a function of $K_tC_t^{(1)}$. So $h(C_t^{(1)}) = s^*(K_tC_t^{(1)})$, where $s^*$ $(u \times 1)$ is an arbitrary function satisfying
$$s^*\left(K_t(d\,C_t^{(1)} + A_t^{(1)}\beta)\right) = d\,s^*(K_tC_t^{(1)}), \quad \text{i.e.,} \quad s^*(d\,K_tC_t^{(1)}) = d\,s^*(K_tC_t^{(1)}) \tag{4.6.69}$$
for all $d > 0$. Now, taking $d = \left(C_t^{(1)T}K_tC_t^{(1)}\right)^{-\frac{1}{2}}$ for all $C_t^{(1)}$ with $C_t^{(1)T}K_tC_t^{(1)} > 0$, we have from (4.6.69)
$$h(C_t^{(1)}) = s^*(K_tC_t^{(1)}) = \left(C_t^{(1)T}K_tC_t^{(1)}\right)^{\frac{1}{2}}s^*\left(\left(C_t^{(1)T}K_tC_t^{(1)}\right)^{-\frac{1}{2}}K_tC_t^{(1)}\right) = \left(C_t^{(1)T}K_tC_t^{(1)}\right)^{\frac{1}{2}}s^*\left(p(K_tC_t^{(1)})\right).$$
For $C_t^{(1)T}K_tC_t^{(1)} = 0$, if we take $h = 0$, then (4.6.65) is satisfied and we can represent $h$ by (4.6.68).

Since $e_t^{B*}(C_t^{(1)})$ is an equivariant predictor of $\xi_t(C_t^{(1)}, C_t^{(2)})$, it follows from Lemmas 4.6.4 and 4.6.5 that $\delta(C_t^{(1)})$ is an equivariant predictor of $\xi_t(C_t^{(1)}, C_t^{(2)})$ under the group $\mathcal{G}_2$ of transformations if and only if
$$\delta(C_t^{(1)}) = e_t^{B*}(C_t^{(1)}) + \left(C_t^{(1)T}K_tC_t^{(1)}\right)^{\frac{1}{2}}s\left(p(K_tC_t^{(1)})\right) \tag{4.6.70}$$
for all $C_t^{(1)}$.

Definition An equivariant predictor $\delta_0(C_t^{(1)})$ of $\xi_t(C_t^{(1)}, C_t^{(2)})$ is said to be a best equivariant predictor under the loss $L_2$ if for every other equivariant predictor $\delta(C_t^{(1)})$ of $\xi_t(C_t^{(1)}, C_t^{(2)})$,
$$E_\phi\left[L_2(\xi_t, \phi; \delta) - L_2(\xi_t, \phi; \delta_0)\right]$$
is nonnegative definite.

Remark Note that if $\delta_0(C_t^{(1)})$ is the best equivariant predictor of $\xi_t(C_t^{(1)}, C_t^{(2)})$ under the loss $L_2$, then it is so under the loss $L_3$ for every n.n.d. $\Omega$, and vice versa.

We now establish that $e_t^{B*}(C_t^{(1)})$ is the best equivariant predictor of $\xi_t(C_t^{(1)}, C_t^{(2)})$ under the loss $L_2$.

Theorem 4.6.2 Under the time domain HB model given in (4.2.1), with the family of elliptically symmetric distributions as given in (4.6.58) and (4.6.59), the group of transformations $\mathcal{G}_2$ given in (4.6.56), and the loss function $L_2$ given in (4.6.62), the best equivariant predictor of $\xi_t(C_t^{(1)}, C_t^{(2)})$ is given by $e_t^{B*}(C_t^{(1)})$.
Proof of Theorem 4.6.2 Note that from (4.6.70) any equivariant predictor $\delta(C_t^{(1)})$ of $\xi_t(C_t^{(1)}, C_t^{(2)})$ is given by
$$\delta(C_t^{(1)}) = e_t^{B*}(C_t^{(1)}) + \left(C_t^{(1)T}K_tC_t^{(1)}\right)^{\frac{1}{2}}s\left(p(K_tC_t^{(1)})\right).$$
Only those $\delta$ are considered for which $E_\phi(L_2(\xi_t, \phi; \delta))$ is finite; in that case, for the corresponding $s$,
$$E_\phi\left[\left(C_t^{(1)T}K_tC_t^{(1)}\right)s\left(p(K_tC_t^{(1)})\right)s^T\left(p(K_tC_t^{(1)})\right)\right]$$
is finite. Under the matrix loss $L_2$,
$$\sigma^2E_\phi\left[L_2(\xi_t, \phi; \delta) - L_2(\xi_t, \phi; e_t^{B*})\right] = E_\phi\left[\left(C_t^{(1)T}K_tC_t^{(1)}\right)s\left(p(K_tC_t^{(1)})\right)s^T\left(p(K_tC_t^{(1)})\right)\right]$$
$$+\ E_\phi\left[\left(e_t^{B*}(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)})\right)\left(C_t^{(1)T}K_tC_t^{(1)}\right)^{\frac{1}{2}}s^T\left(p(K_tC_t^{(1)})\right)\right]$$
$$+\ E_\phi\left[\left(C_t^{(1)T}K_tC_t^{(1)}\right)^{\frac{1}{2}}s\left(p(K_tC_t^{(1)})\right)\left(e_t^{B*}(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)})\right)^T\right]. \tag{4.6.71}$$
It is enough to prove that the last two product terms in (4.6.71) are null matrices. Now,
$$E_\phi\left[e_t^{B*}(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)}) \mid C_t^{(1)}\right] = P_2\left[M_tC_t^{(1)} - E_\phi\left(C_t^{(2)} \mid C_t^{(1)}\right)\right]. \tag{4.6.72}$$
From the property of elliptically symmetric distributions, it follows that
$$E_\phi\left(C_t^{(2)} \mid C_t^{(1)}\right) = A_t^{(2)}\alpha + \Sigma_{t21}\Sigma_{t11}^{-1}\left(C_t^{(1)} - A_t^{(1)}\alpha\right) \quad \text{a.e. Lebesgue},$$
and using this, we have from (4.6.72) that
$$E_\phi\left[e_t^{B*}(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)}) \mid C_t^{(1)}\right] = P_2\left(A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)}\right)\left(A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\right)^{-1}\left(A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)} - A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\alpha\right). \tag{4.6.73}$$
Then, the second term on the rhs of (4.6.71) equals
$$P_2\left(A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)}\right)\left(A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\right)^{-1}E_\phi\left[\left(C_t^{(1)T}K_tC_t^{(1)}\right)^{\frac{1}{2}}\left(A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)} - A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\alpha\right)s^T\left(p(K_tC_t^{(1)})\right)\right]. \tag{4.6.74}$$
Now we show that the expectation on the rhs of (4.6.74) is a null matrix. Note that since $E_\phi\left(C_t^{(1)T}K_tC_t^{(1)}\right) < \infty$ and $E_\phi(C_t^{(1)}) = A_t^{(1)}\alpha$, by the Cauchy-Schwarz inequality $\left(C_t^{(1)T}K_tC_t^{(1)}\right)^{\frac{1}{2}}A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)}$ has finite expectation.
Now, using the independence of \(\big((C_t^{(1)T} K_t C_t^{(1)})^{1/2},\, A_t^{(1)T}\Sigma_{t11}^{-1} C_t^{(1)}\big)\) and \(\rho(K_t C_t^{(1)})\), we have
\[ E_\phi\Big[\big(C_t^{(1)T} K_t C_t^{(1)}\big)^{1/2}\big(A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)} - A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\alpha\big)\, s^T\big(\rho(K_t C_t^{(1)})\big)\Big] \]
\[ = E_\phi\Big[\big(C_t^{(1)T} K_t C_t^{(1)}\big)^{1/2}\big(A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)} - A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\alpha\big)\Big]\, E_\phi\Big[s^T\big(\rho(K_t C_t^{(1)})\big)\Big]. \tag{4.6.75} \]
Next, using the elliptical symmetry of \(C_t^{(1)}\), it follows that \(C_t^{(1)} - A_t^{(1)}\alpha \overset{d}{=} -(C_t^{(1)} - A_t^{(1)}\alpha)\). Now, since \(K_t A_t^{(1)} = 0\) and \(K_t\) is n.n.d., one has
\[ \big[(C_t^{(1)} - A_t^{(1)}\alpha)^T K_t (C_t^{(1)} - A_t^{(1)}\alpha)\big]^{1/2} A_t^{(1)T}\Sigma_{t11}^{-1}\big(C_t^{(1)} - A_t^{(1)}\alpha\big) \overset{d}{=} -\big[(C_t^{(1)} - A_t^{(1)}\alpha)^T K_t (C_t^{(1)} - A_t^{(1)}\alpha)\big]^{1/2} A_t^{(1)T}\Sigma_{t11}^{-1}\big(C_t^{(1)} - A_t^{(1)}\alpha\big), \]
i.e.
\[ \big(C_t^{(1)T} K_t C_t^{(1)}\big)^{1/2} A_t^{(1)T}\Sigma_{t11}^{-1}\big(C_t^{(1)} - A_t^{(1)}\alpha\big) \overset{d}{=} -\big(C_t^{(1)T} K_t C_t^{(1)}\big)^{1/2} A_t^{(1)T}\Sigma_{t11}^{-1}\big(C_t^{(1)} - A_t^{(1)}\alpha\big). \tag{4.6.76} \]
Taking expectations on both sides, which exist finitely due to the previous observation, one gets
\[ E_\phi\Big[\big(C_t^{(1)T} K_t C_t^{(1)}\big)^{1/2} A_t^{(1)T}\Sigma_{t11}^{-1}\big(C_t^{(1)} - A_t^{(1)}\alpha\big)\Big] = 0. \tag{4.6.77} \]
Hence from (4.6.71), (4.6.74), (4.6.75), and (4.6.77) we have
\[ E_\phi\big[L_2(\gamma_t, \phi; \delta) - L_2(\gamma_t, \phi; e_t^{B*})\big] = E_\phi\Big[\big(C_t^{(1)T} K_t C_t^{(1)}\big)\, s\big(\rho(K_t C_t^{(1)})\big)\, s^T\big(\rho(K_t C_t^{(1)})\big)\Big], \]
which is n.n.d. for all \(\phi\); the two risk matrices are equal if and only if \(P_\phi\big[s(\rho(K_t C_t^{(1)})) = 0\big] = 1\) for all \(\phi\), i.e., \(P_\phi\big[\delta(C_t^{(1)}) = e_t^{B*}(C_t^{(1)})\big] = 1\) for all \(\phi\). This proves the theorem.

Remark. Although \(\big((C_t^{(1)T} K_t C_t^{(1)})^{1/2},\, A_t^{(1)T}\Sigma_{t11}^{-1} C_t^{(1)}\big)\) is sufficient and \(\rho(K_t C_t^{(1)})\) is ancillary, Basu's theorem cannot be applied since the sufficient statistic is not complete.

CHAPTER 5
SUMMARY AND FUTURE RESEARCH

5.1 Summary

In this dissertation, an estimation and prediction methodology is developed for the simultaneous estimation of several strata means at different time points, based on data from repeated surveys. The methodology adopts an HB model which incorporates time-dependent random components. The resulting HB estimators for a small area "borrow strength" over time, in addition to borrowing strength from other local areas as has been the case in most of the existing methods of small area estimation. The Bayesian procedure provides current as well as retrospective estimates simultaneously for all the small areas. Application of this HB procedure to the median income estimation problem in Chapters 2 and 3 has demonstrated its usefulness in providing substantially improved estimators for small areas.
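As this summary notes, the HB computations in this dissertation are carried out with the Gibbs sampler, which draws in turn from each full conditional distribution. A minimal sketch for a hypothetical one-parameter normal model with a gamma precision prior (a toy illustration only, not the dissertation's hierarchical model):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: y_i ~ N(theta, 1/r); priors theta ~ N(0, 100), r ~ Gamma(a0, b0).
y = rng.normal(2.0, 1.0, size=50)
n, a0, b0 = y.size, 0.1, 0.1

theta, r = 0.0, 1.0
draws = []
for it in range(5000):
    # Full conditional of theta: normal with precision n*r + 1/100.
    prec = n * r + 1.0 / 100.0
    mean = (r * y.sum()) / prec
    theta = rng.normal(mean, prec ** -0.5)
    # Full conditional of r: Gamma(a0 + n/2, rate b0 + SSE/2).
    sse = ((y - theta) ** 2).sum()
    r = rng.gamma(a0 + n / 2.0, 1.0 / (b0 + sse / 2.0))
    if it >= 1000:                     # discard burn-in draws
        draws.append(theta)

post_mean = np.mean(draws)
print(post_mean)   # should sit near the sample mean of y
```

Each sweep updates \(\theta\) from its normal full conditional and \(r\) from its gamma full conditional; after burn-in, the retained draws approximate the joint posterior, from which posterior means, variances, and density estimates follow.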
Implementation of the HB methodology, in finite as well as infinite population sampling situations, has been illustrated by adopting a Monte Carlo integration technique known as the Gibbs sampler. Using this procedure, in both the univariate and the multivariate cases, the posterior density as well as the conditional mean and variance can be obtained with considerable ease.

The HB predictors of a vector of linear functions of the finite population observations, in the special case of known ratios of variance components, are shown to be best unbiased linear predictors without any distributional assumptions. It is also shown that these predictors stochastically dominate all linear unbiased predictors for elliptically symmetric distributions and are best equivariant predictors under a suitable group of transformations.

5.2 Future Research

The HB analysis considered in this dissertation assumed the variance-covariance matrix of the basic observations to be known. However, often in practice only the sample variance-covariance matrix is available. We can extend our HB model by placing either proper or improper priors on the unknown variance-covariance matrices. We also assumed that the process through which the random components evolve through time is known. The case when this process is unknown needs further investigation. The general technique of "borrowing strength" over time can also be explored in other application areas, e.g., in multicenter clinical trials in biomedical research, or in on-line quality assessment of different production lines in a manufacturing industry. It is important to investigate the robustness of the HB procedure considered in this dissertation to heavy-tailed priors. It is also worthwhile to extend our Bayesian model to allow for the presence of outliers in the data by assuming scale-contaminated normal distributions.

APPENDIX
PROOF OF THEOREM 4.2.1
The joint pdf of \(C_t\), \(\alpha\), \(R\), and \(\Lambda\) is given by
\[ f(c_t, \alpha, r, \lambda) \propto |\Sigma_t|^{-1/2}\, r^{\frac{1}{2}(N_t + \sum_{l=1}^{u} h_l) - 1} \Big(\prod_{l=1}^{u} \lambda_l^{h_l/2 - 1}\Big) \exp\Big[-\frac{r}{2}\Big\{(c_t - A_t\alpha)^T \Sigma_t^{-1} (c_t - A_t\alpha) + a_0 + \sum_{l=1}^{u} a_l \lambda_l\Big\}\Big]. \tag{1} \]
Completing the square in \(\alpha\),
\[ (c_t - A_t\alpha)^T \Sigma_t^{-1} (c_t - A_t\alpha) = \big[\alpha - (A_t^T\Sigma_t^{-1}A_t)^{-1}A_t^T\Sigma_t^{-1}c_t\big]^T \big(A_t^T\Sigma_t^{-1}A_t\big) \big[\alpha - (A_t^T\Sigma_t^{-1}A_t)^{-1}A_t^T\Sigma_t^{-1}c_t\big] + c_t^T Q_t c_t, \tag{2} \]
where \(Q_t = \Sigma_t^{-1} - \Sigma_t^{-1}A_t(A_t^T\Sigma_t^{-1}A_t)^{-1}A_t^T\Sigma_t^{-1}\). Integrating (1) with respect to \(\alpha\), the joint pdf of \(C_t\), \(R\), and \(\Lambda\) is given by
\[ f(c_t, r, \lambda) \propto |\Sigma_t|^{-1/2}\, |A_t^T\Sigma_t^{-1}A_t|^{-1/2}\, r^{\frac{1}{2}(N_t + \sum_l h_l - p) - 1} \Big(\prod_{l=1}^{u} \lambda_l^{h_l/2 - 1}\Big) \exp\Big[-\frac{r}{2}\Big(a_0 + \sum_{l=1}^{u} a_l\lambda_l + c_t^T Q_t c_t\Big)\Big]. \tag{3} \]
Integrating (3) with respect to \(r\), the joint pdf of \(C_t\) and \(\Lambda\) is given by
\[ f(c_t, \lambda) \propto |\Sigma_t|^{-1/2}\, |A_t^T\Sigma_t^{-1}A_t|^{-1/2} \Big(\prod_{l=1}^{u} \lambda_l^{h_l/2 - 1}\Big) \Big(a_0 + \sum_{l=1}^{u} a_l\lambda_l + c_t^T Q_t c_t\Big)^{-\frac{1}{2}(N_t + \sum_l h_l - p)}. \tag{4} \]
Now, using a standard formula for partitioned matrices (e.g., Searle (1971), page 46), we have
\[ c_t^T \Sigma_t^{-1} c_t = c_t^{(1)T}\Sigma_{t11}^{-1}c_t^{(1)} + \big(c_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}c_t^{(1)}\big)^T \Sigma_{t22.1}^{-1} \big(c_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}c_t^{(1)}\big). \tag{5} \]
Similarly,
\[ A_t^T \Sigma_t^{-1} c_t = A_t^{(1)T}\Sigma_{t11}^{-1}c_t^{(1)} + \big(A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)}\big)^T \Sigma_{t22.1}^{-1} \big(c_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}c_t^{(1)}\big) = s_1 + s_2, \text{ say}, \tag{6} \]
and
\[ A_t^T \Sigma_t^{-1} A_t = A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)} + \big(A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)}\big)^T \Sigma_{t22.1}^{-1} \big(A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)}\big). \tag{7} \]
From (7),
\[ \big(A_t^T\Sigma_t^{-1}A_t\big)^{-1} = \big(A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\big)^{-1} - \big(A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\big)^{-1} \big(A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)}\big)^T G_t^{-1} \big(A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)}\big) \big(A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\big)^{-1} = M_1 + M_2, \text{ say}, \tag{8} \]
where \(G_t = \Sigma_{t22.1} + \big(A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)}\big)\big(A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\big)^{-1}\big(A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)}\big)^T\), \(M_1\) denotes the first term of (8), and \(M_2\) the second term (including its sign). Then
\[ c_t^T \Sigma_t^{-1} A_t \big(A_t^T\Sigma_t^{-1}A_t\big)^{-1} A_t^T \Sigma_t^{-1} c_t = (s_1 + s_2)^T (M_1 + M_2)(s_1 + s_2) = s_1^T M_1 s_1 + s_1^T M_2 s_1 + s_2^T M_1 s_2 + s_2^T M_2 s_2 + 2\, s_1^T (M_1 + M_2) s_2, \tag{9} \]
and in particular
\[ s_1^T M_1 s_1 = c_t^{(1)T} \big(\Sigma_{t11}^{-1} - K_t\big) c_t^{(1)}. \tag{10} \]
The remaining quadratic forms in (9) simplify in the same way ((11)-(13)), and combining them with (5) and (10) one obtains
\[ c_t^T Q_t c_t = c_t^{(1)T} K_t c_t^{(1)} + \big(c_t^{(2)} - M_t c_t^{(1)}\big)^T G_t^{-1} \big(c_t^{(2)} - M_t c_t^{(1)}\big). \tag{14} \]
Now, starting with the joint pdf of \(C_t^{(1)}\), \(\alpha\), \(R\), and \(\Lambda\), the same computation gives the joint pdf of \(C_t^{(1)}\) and \(\Lambda\):
\[ h(c_t^{(1)}, \lambda) \propto |\Sigma_{t11}|^{-1/2}\, |A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}|^{-1/2} \Big(\prod_{l=1}^{u} \lambda_l^{h_l/2 - 1}\Big) \Big(a_0 + \sum_{l=1}^{u} a_l\lambda_l + c_t^{(1)T} K_t c_t^{(1)}\Big)^{-\frac{1}{2}(n_t + \sum_l h_l - p)}. \tag{15} \]
So the conditional pdf of \(C_t^{(2)}\) given \(C_t^{(1)} = c_t^{(1)}\) and \(\Lambda = \lambda\) is
\[ h(c_t^{(2)} \mid c_t^{(1)}, \lambda) = \frac{f(c_t, \lambda)}{h(c_t^{(1)}, \lambda)} \propto \Big(\frac{|\Sigma_{t11}|\, |A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}|}{|\Sigma_t|\, |A_t^T\Sigma_t^{-1}A_t|}\Big)^{1/2} \frac{\big(a_0 + \sum_l a_l\lambda_l + c_t^{(1)T} K_t c_t^{(1)}\big)^{\frac{1}{2}(n_t + \sum_l h_l - p)}}{\big(a_0 + \sum_l a_l\lambda_l + c_t^T Q_t c_t\big)^{\frac{1}{2}(N_t + \sum_l h_l - p)}}. \tag{16} \]
Next,
\[ |A_t^T\Sigma_t^{-1}A_t| = \big|A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)} + \big(A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)}\big)^T \Sigma_{t22.1}^{-1} \big(A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)}\big)\big| = |A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}|\, |\Sigma_{t22.1}|^{-1}\, |G_t|, \tag{17} \]
by using Exercise 2.4 on page 32 of Rao (1973) twice. Also
\[ |\Sigma_t| = |\Sigma_{t11}|\, |\Sigma_{t22.1}|. \tag{18} \]
Using the definition of \(G_t\), (17), and (18), we have
\[ \frac{|\Sigma_t|\, |A_t^T\Sigma_t^{-1}A_t|}{|\Sigma_{t11}|\, |A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}|} = |G_t|. \tag{19} \]
So, by (14), (16), and (19), the conditional pdf of \(C_t^{(2)}\) given \(C_t^{(1)}\) and \(\lambda\) is
\[ h(c_t^{(2)} \mid c_t^{(1)}, \lambda) \propto |G_t|^{-1/2} \Big(a_0 + \sum_l a_l\lambda_l + c_t^{(1)T}K_tc_t^{(1)}\Big)^{-\frac{1}{2}(N_t - n_t)} \Big[1 + \Big(a_0 + \sum_l a_l\lambda_l + c_t^{(1)T}K_tc_t^{(1)}\Big)^{-1} \big(c_t^{(2)} - M_tc_t^{(1)}\big)^T G_t^{-1} \big(c_t^{(2)} - M_tc_t^{(1)}\big)\Big]^{-\frac{1}{2}(N_t + \sum_l h_l - p)}. \tag{20} \]
From (20) it follows that the conditional distribution of \(C_t^{(2)}\) given \(C_t^{(1)} = c_t^{(1)}\) and \(\Lambda = \lambda\) is an \((N_t - n_t)\)-variate \(t\) distribution with \(n_t + \sum_{l=1}^{u} h_l - p\) degrees of freedom, location parameter \(M_t c_t^{(1)}\), and scale parameter
\[ \Lambda_t = \Big(n_t + \sum_{l=1}^{u} h_l - p\Big)^{-1} \Big(a_0 + \sum_{l=1}^{u} a_l\lambda_l + c_t^{(1)T} K_t c_t^{(1)}\Big)\, G_t. \tag{21} \]
Also, since \(f(\lambda \mid c_t^{(1)}) = h(c_t^{(1)}, \lambda)/f(c_t^{(1)})\), we have from (15)
\[ f(\lambda \mid c_t^{(1)}) \propto |\Sigma_{t11}|^{-1/2}\, |A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}|^{-1/2} \Big(\prod_{l=1}^{u} \lambda_l^{h_l/2 - 1}\Big) \Big(a_0 + \sum_{l=1}^{u} a_l\lambda_l + c_t^{(1)T} K_t c_t^{(1)}\Big)^{-\frac{1}{2}(n_t + \sum_l h_l - p)}. \]

BIBLIOGRAPHY

Battese, G.E., Harter, R.M., and Fuller, W.A. (1988). An error-components model for prediction of county crop areas using survey and satellite data. J. Amer.
Statist. Assoc., 83, 28-36.

Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). J. R. Statist. Soc. B, 36, 192-236.

Blight, B.J.N. and Scott, A.J. (1973). A stochastic model for repeated surveys. J. R. Statist. Soc. B, 35, 61-66.

Brackstone, G.J. (1987). Small area data: policy issues and technical challenges. In Small Area Statistics. Eds. R. Platek, J.N.K. Rao, C.E. Sarndal, and M.P. Singh. Wiley, New York, pp. 3-20.

Datta, G.S. (1990). Bayesian prediction in mixed linear models with applications in small area estimation. Unpublished Ph.D. dissertation, Department of Statistics, University of Florida.

Datta, G.S., Fay, R.E., and Ghosh, M. (1991). Hierarchical and empirical multivariate Bayes analysis in small area estimation. Technical Report 409, Dept. of Statistics, Univ. of Florida.

Datta, G.S. and Ghosh, M. (1988). Minimum risk equivariant estimators of percentiles in location-scale families of distributions. Cal. Statist. Assoc. Bull., 37, 201-207.

Datta, G.S. and Ghosh, M. (1990). On the frequentist properties of some hierarchical Bayes predictors in finite population sampling. Technical Report, Dept. of Statistics, Univ. of Florida.

Datta, G.S. and Ghosh, M. (1991). Bayesian prediction in linear models: Applications to small area estimation. The Annals of Statistics, 19, 1748-1770.

Efron, B. and Morris, C. (1971). Limiting the risk of Bayes and empirical Bayes estimators, Part I: the Bayes case. J. Amer. Statist. Assoc., 66, 807-815.

Ericksen, E.P. (1973). A method of combining sample survey data and symptomatic indicators to obtain population estimates for local areas. Demography, 10, 137-160.