Bayesian analysis of small domain data in repeated surveys


Material Information

Title:
Bayesian analysis of small domain data in repeated surveys
Physical Description:
vi, 97 leaves : ; 29 cm.
Language:
English
Creator:
Nangia, Narinder K., 1955-
Publication Date:

Subjects

Genre:
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 1992.
Bibliography:
Includes bibliographical references (leaves 94-96).
Statement of Responsibility:
by Narinder K. Nangia.
General Note:
Typescript.
General Note:
Vita.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 001747534
notis - AJG0368
oclc - 29233108
System ID:
AA00003277:00001

Full Text

BAYESIAN ANALYSIS OF SMALL DOMAIN DATA IN REPEATED SURVEYS

By

NARINDER K. NANGIA


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA


1992



ACKNOWLEDGEMENTS


I would like to express my sincere gratitude to Professor Malay Ghosh for his

invaluable guidance and support throughout this project. I would like to thank

Dr. Ronald H. Randles, Dr. P. V. Rao, Dr. Ramon C. Littell, Dr. Geoffrey G.

Vining, and Dr. Sencer Yeralan for being on my committee.

Much gratitude is owed to my parents, whose support, advice, guidance and

prayers throughout the years of my life have made this achievement possible. My

debt to them can never be repaid in full. A very special thank you is offered to

my wife, Uma, for her continuous comfort, trust, and understanding; and my son,

Nishant, for his loving support.

TABLE OF CONTENTS




ACKNOWLEDGEMENTS

ABSTRACT

CHAPTERS

1 INTRODUCTION

    1.1 Literature Review
    1.2 The Subject of This Dissertation

2 ESTIMATION OF SMALL AREA MEANS IN TIME DOMAIN

    2.1 Introduction
    2.2 Description of the Hierarchical Bayes Model
    2.3 Empirical Bayes Estimation
    2.4 An Application of Hierarchical Bayes Analysis

3 MULTIVARIATE ESTIMATION OF MEANS IN TIME DOMAIN

    3.1 Introduction
    3.2 The General Multivariate Hierarchical Bayes Model
    3.3 An Illustrative Application to Survey Data

4 HIERARCHICAL BAYES PREDICTION IN FINITE POPULATIONS

    4.1 Introduction
    4.2 Description of the Hierarchical Bayes Model
    4.3 The Hierarchical Bayes Predictor in a Special Case
    4.4 Best Unbiased Prediction in Small Area Estimation
    4.5 Stochastic Domination
    4.6 Best Equivariant Prediction

5 SUMMARY AND FUTURE RESEARCH

    5.1 Summary
    5.2 Future Research

APPENDIX PROOF OF THEOREM 4.2.1

BIBLIOGRAPHY

BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment
of the Requirements for the Degree of
Doctor of Philosophy

BAYESIAN ANALYSIS OF SMALL DOMAIN DATA IN REPEATED SURVEYS


By

NARINDER K. NANGIA

December 1992
Chairman: Malay Ghosh
Major Department: Statistics

We consider in this dissertation estimation and prediction of local area means based on repeated surveys over time. A general hierarchical Bayes model is used for this purpose. The resulting HB estimators for a small area "borrow strength" over time in addition to borrowing strength from other local areas. The Bayesian procedure provides current as well as retrospective estimates simultaneously for all the small areas. The HB methodology, in infinite as well as finite population sampling situations, has been implemented by adopting a Monte Carlo integration technique, the Gibbs sampler.

A multivariate HB model is considered to exploit the inter-relationship among the parameters in improving the small area estimators. Both the univariate and multivariate methodologies have been implemented on the median income estimation problem. The results show that the estimators borrowing strength over time and from other local areas are substantially better than the estimators which borrow strength only from other local areas.

In finite population sampling, the predictive distribution of a characteristic of interest for the unsampled population units is found given the observations on the sampled units, and is used to draw inference. In particular, we have provided HB predictors and the associated standard errors of the local area means over time. In a special case of this HB analysis, when the vector of the ratios of the variance components is known, the HB predictor of the vector of local area means is shown to possess some frequentist optimality properties. For example, among other properties, we have shown that the HB predictors are best unbiased and best equivariant predictors under a suitable group of transformations.

CHAPTER 1

INTRODUCTION



1.1 Literature Review


The existence of small area (domain) statistics based either on a census or on other administrative records aiming at complete enumeration can be traced back to the 11th century in England and the 17th century in Canada (see Brackstone (1987)). For the past few decades, sample surveys have for most purposes taken the place of complete enumeration as the means of data collection. These surveys have been able to provide sample-based estimates of adequate precision for the parameters of interest at the national and subnational levels. However, the overall sample size in such surveys is usually determined with the target of providing specified accuracy at a much higher level of aggregation than that of the small areas. The resulting sample sizes for the small areas are usually insufficient to provide sample-based estimates of adequate precision at the small area level and are likely to lead to large estimation errors. On the other hand, the reliability of small area statistics is crucial for regional economic planning and administration. The evaluation of state economic policies and the identification of subgroups or regions of the population which are not keeping up with the average rely heavily on small area statistics. Furthermore, decisions concerning investments and the location of small industrial units by the corporate sector are usually based on local social, economic, and environmental conditions. As a consequence, there is a growing interest in the development of reliable small area statistics.

In recent years, several model-based estimation techniques have emerged, resulting in improved small area estimates. The essence of these small area estimation techniques is to borrow strength from similar neighboring areas for estimation and prediction purposes. These estimation techniques use models, either explicitly or implicitly, that "connect" the small areas through supplementary data (e.g., census and administrative data). Early work of Ericksen (1973, 1974) employed a simple regression method that combines sample data with symptomatic and census data for estimating the population of local areas. By adopting a regression model similar to that of Ericksen (1974), Fay and Herriot (1979) suggested a procedure that used a constrained weighted average of the model-dependent and sample-based estimators, with weights inversely proportional to the average lack of fit of the regression and the variance of the sample estimate. These weighted average estimates were constrained to be within one standard error of the sample estimates. Ghosh and Rao (1991) have recently surveyed most of the existing small area estimation methods.

The simultaneous estimation of the parameters at a given time for several small areas (strata) by employing a general linear model with covariates has been considered by Fay and Herriot (1979), Ericksen and Kadane (1985, 1987), Ghosh and Meeden (1986), Ghosh and Lahiri (1988), Battese, Harter and Fuller (1988), and Prasad and Rao (1990), using either a variance components approach or an empirical Bayes (EB) approach. An EB approach does not incorporate uncertainty due to the estimation of prior parameters and thus leads to underestimation of the true posterior variance. The underestimation problem becomes more severe when one or more variance components of the model are unknown.

An alternative hierarchical Bayes (HB) procedure for small area estimation at a given time has been considered in Ghosh and Lahiri (1988) and Datta and Ghosh (1991). Both the EB and HB procedures often lead to similar point estimates of the parameters of interest. We may also add that, in contrast to the EB approach, the HB approach provides small area estimators that have natural measures of standard errors associated with them, namely the posterior standard deviations.

In the sample survey literature, the importance of the model-based approach to small area estimation has been well recognized. However, the model-based techniques for small area estimation considered so far have borrowed strength only from other similar areas to improve small area estimators.

However, these small area (strata) populations are often subject to change over time. Usually the time series information is available through repeated surveys (e.g., market research surveys, crop-cutting experiments, etc.) carried out at regular time intervals by industries and government agencies. As mentioned in Scott and Smith (1974), the Indian National Sample Survey, a multipurpose survey carried out annually since 1950, and the Medical Data Index in the U.K., a market research survey of doctors, are examples of nonoverlapping surveys with human populations. In such surveys, one has at one's disposal not only the current data, but also data from similar past experiments. Typically, only a few samples are available from an individual area.

In the past, time series models have usefully been applied to the analysis of repeated survey data from random samples. Early results in this area (see, for example, Jessen (1942), Patterson (1950), Rao and Graham (1964), Raj (1965) and Singh (1968)) did not assume any relationship between successive values of the population mean. Blight and Scott (1973) and Scott and Smith (1974) derived estimates for the mean of a time-dependent population using a first-order autoregressive model under the assumption that all the parameters of the model are known. Rodrigues and Bolfarine (1987) considered the prediction of the population total (in a finite population) using a Bayesian approach based on the Kalman filter estimating algorithm.

However, no results are available concerning the estimation of the parameters for the small areas under dynamic superpopulation models, wherein past information can be effectively used in conjunction with the current information.


1.2 The Subject of This Dissertation


In this dissertation, we present a Bayesian analysis of small domain data in repeated surveys. Estimation and prediction methodology has been developed for infinite as well as finite population sampling situations. We consider simultaneous estimation of several strata means through a time series Bayesian estimation model.

In Chapter 2, we introduce a HB model for simultaneous estimation of several small area means in infinite population sampling, assuming that the observations are available only at the aggregate level. The posterior distribution is provided through a fully Bayesian procedure which uses a Monte Carlo numerical integration technique, the Gibbs sampling algorithm. The resulting small area estimators "borrow strength" over time in addition to borrowing strength from other local areas. The methodology developed is applied to the problem of estimation of median income of four-person families by states.

In Chapter 3, we present a multivariate HB model for simultaneous estimation of several small area means in infinite population sampling, once again assuming that the observations are available only at the aggregate level. Our aim is to exploit the inter-relationship (if any) among the parameters in improving the small area estimators. We have also implemented the proposed analysis on the median income estimation problem considered in the previous chapter through the use of median incomes of three- and five-person families. The multivariate analysis in this case shows definite improvement in the median income estimates when compared to the univariate analysis of Chapter 2.

Chapter 4 addresses the simultaneous estimation of several strata means at different time points, where each stratum at each time point contains a finite number of elements. We assume that the observations are available at the unit level. The HB model considered in this chapter can be regarded as an extension of the work of Datta and Ghosh (1991), who used general linear models for small area estimation. While the HB predictors of Datta and Ghosh (1991) utilize the data available at a given time, our HB predictors for small areas utilize the data available at a given time along with all the data available at previous time points. Also, in addition to the current small area estimates, our model provides retrospective estimates for the small areas at all the previous time points. This chapter also considers a special case of the time domain HB model for finite population sampling, assuming known variance ratios. In such a set-up, the HB predictors have been shown to possess certain optimal properties. It has been shown that the HB predictors are best unbiased predictors within the class of all unbiased predictors. Under milder assumptions, these predictors are best unbiased linear predictors. The stochastic dominance (cf. Hwang, 1985) of these predictors over linear unbiased predictors for elliptically symmetric distributions has also been established in this chapter. Finally, we have shown that these predictors are also best equivariant predictors for both the matrix loss and a general quadratic loss, under a suitable group of transformations for a class of elliptically symmetric distributions.

CHAPTER 2
ESTIMATION OF SMALL AREA MEANS IN TIME DOMAIN



2.1 Introduction


This chapter considers the problem of small domain estimation based on data from repeated surveys. Estimation of the characteristic of interest, such as the population mean, the change in mean over two distinct time points, or other parameters, is considered by using a hierarchical Bayes (HB) approach. Such models, in contrast to the ones considered in Chapter 4, will be appropriate when observations are available at an aggregate level, rather than at a unit level. The previous work of Datta, Fay and Ghosh (1991) used a HB approach for analyzing such aggregate data for several small areas at a fixed point of time. Also, the empirical and subjective Bayes approaches considered earlier by Fay and Herriot (1979), Ericksen and Kadane (1985, 1987), and Ericksen, Kadane and Tukey (1989) involved similar analysis of small domain data where no time series model was used. One common theme in all these papers was to "borrow strength" from similar other areas to obtain improved small area estimators. The present study borrows strength over time in addition to borrowing strength from other local areas.

On another front, Bayesian analysis of data from repeated surveys is also available in the literature, although it does not address the problem of simultaneous estimation for several small areas. Mention may be made in this context of the work of Blight and Scott (1973) and Scott and Smith (1974), who considered estimation procedures for a population whose mean varied over time and applied standard normal time series methods. These results were later generalized by Rodrigues and Bolfarine (1987), who used a dynamic linear model and the Kalman filter estimating algorithm.

Our model encompasses all these earlier models as special cases when the number of local areas is one and the variance components, or at least the variance ratios, are known.

In addition, the model that we consider in Section 2.2 has several novel features. Most of the models considered earlier for time series analysis of survey data assumed known variance components. The present work does not need such an assumption. Instead, the HB method assigns distributions (often noninformative) to these variance components and thereby, in effect, models the uncertainty due to unknown prior parameters. In consequence, the usual Kalman filter updating algorithm no longer works, but the problem is addressed in a more realistic manner without invoking the assumption of known variances, or known ratios of variances. Also, the computational problems are overcome by using Monte Carlo integration techniques such as Gibbs sampling (see Gelfand and Smith (1990)).

Our model also bears some analogy to the one considered in Sallas and Harville (1988) when the number of local areas is one. However, while the Sallas-Harville approach is primarily an empirical Bayes (EB) approach, ours is a HB approach. The advantage of the present approach over an EB approach is that, while the latter does not incorporate uncertainty due to estimation of the prior parameters and as such typically underestimates the standard errors that need to be reported with the EB estimates, the former produces more honest estimates of the standard errors.

The outline of the remaining sections is as follows. Section 2 describes the basic hierarchical model, describes the Gibbs sampling algorithm, and computes the necessary posterior distributions needed to implement the Gibbs sampling procedure. This section also provides formulas for the posterior mean and variance via Gibbs
sampling. An alternative EB procedure is described in Section 3. The methodology

developed in Section 2 is illustrated with the problem of estimation of median income
of four-person families by states in Section 4.


2.2 Description of the Hierarchical Bayes Model


Let $Y_{ij}$ denote the standard sample survey estimate of some characteristic $\theta_{ij}$, such as the population mean, based on a given sample at time $j$ for the $i$th small area ($i = 1, \ldots, m$; $j = 1, \ldots, t$). Our objective is to optimally estimate a linear combination $\sum_{j=1}^{t} l_{ij}\theta_{ij}$, using all the data available up to time $t$. In particular, we may be interested in estimating current population means for all the small areas based on several sets of sample means available over time. To this end, we consider the following HB model:

I. $Y_{ij} \mid \theta_{ij} \stackrel{ind}{\sim} N(\theta_{ij}, V_{ij})$;

II. $\theta_{ij} \mid a, b_j, \psi_j \stackrel{ind}{\sim} N(x_{ij}^T a + z_{ij}^T b_j,\ \psi_j)$ $\quad (i = 1, \ldots, m;\ j = 1, \ldots, t)$;

III. $b_j \mid b_{j-1}, W \stackrel{ind}{\sim} N(H_j b_{j-1},\ W)$ $\quad (j = 1, \ldots, t)$;

IV. Marginally $a$, $\psi_1, \ldots, \psi_t$, and $W$ are mutually independent with

$a \sim \mathrm{Uniform}(R^p)$,

$R_j\ (= \psi_j^{-1}) \sim \mathrm{Gamma}(\tfrac{1}{2}a_j, \tfrac{1}{2}h_j)$,

$W \sim \text{inverse Wishart}(S, k)$.

In (II), $x_{ij}$ and $z_{ij}$ are known design vectors of size $p$ and $q$ respectively.

We say that a random variable $R$ has a $\mathrm{Gamma}(a, b)$ distribution if it has a pdf of the form

$f(r) \propto \exp(-ar)\, r^{b-1} I_{(0,\infty)}(r); \quad a > 0,\ b > 0,$

where I denotes the usual indicator function.
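Note that in this parameterization $a$ is a rate and $b$ a shape parameter, the reverse of some software conventions. As a quick numerical check (an illustration, not part of the original analysis), NumPy's gamma sampler, which takes a shape and a scale, can be matched to this density as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gamma_rate_shape(a, b, size=None):
    # f(r) ∝ exp(-a r) r^(b-1): here 'a' is the rate and 'b' the shape,
    # so NumPy's sampler needs shape=b and scale=1/a.
    return rng.gamma(shape=b, scale=1.0 / a, size=size)

# The mean of this Gamma(a, b) distribution is b/a; check empirically.
draws = sample_gamma_rate_shape(a=2.0, b=3.0, size=200_000)
print(round(draws.mean(), 2))  # close to 3/2
```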

We say that a $q \times q$ positive definite random matrix $W$ has an inverse Wishart distribution with parameters $S$ and $k$ if it has a pdf of the form

$f(W) \propto |W|^{-(q+k+1)/2} \exp\big[-\tfrac{1}{2}\,\mathrm{trace}(kSW^{-1})\big],$

where $k > 0$ is a known scalar degrees-of-freedom parameter, and $S$ is a $q \times q$ positive definite matrix. This means that $W^{-1}$ has a Wishart distribution with degrees of freedom $k$ and scale parameter $S$.
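To make the structure of stages I-III concrete, the following sketch simulates data from the model in the scalar case $q = 1$, $z_{ij} = 1$, $H_j = 1$, $b_0 = 0$; all dimensions, coefficients, and variances are illustrative choices, not values from the dissertation:

```python
import numpy as np

rng = np.random.default_rng(1)
m, t, p = 5, 8, 2                       # small areas, time points, dim of a

a = np.array([10.0, 1.0])               # regression coefficients (illustrative)
psi = np.full(t, 0.5)                   # model variances psi_j
W = 0.25                                # innovation variance of the random walk
x = rng.normal(size=(m, t, p))          # design vectors x_ij
V = np.full((m, t), 1.0)                # known sampling variances V_ij

# Stage III: b_j = H_j b_{j-1} + w_j with H_j = 1 and b_0 = 0 (scalar case)
b = np.cumsum(rng.normal(0.0, np.sqrt(W), size=t))

# Stage II: theta_ij ~ N(x_ij' a + z_ij' b_j, psi_j), with z_ij = 1
theta = x @ a + b + rng.normal(size=(m, t)) * np.sqrt(psi)

# Stage I: Y_ij ~ N(theta_ij, V_ij)
Y = theta + rng.normal(size=(m, t)) * np.sqrt(V)
print(Y.shape)  # (5, 8)
```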

In (IV), we shall allow the possibility of diffuse priors for the $R_j$ or $W$, e.g., $a_j = 0$, $h_j = 0$ (for $j = 1, \ldots, t$), $S = 0$.

In the case when $t = 1$, the suffix $j$ can be omitted and $Y_{i1} = Y_i$ is the usual survey estimate of the small area parameter $\theta_i$. Write $\psi = \psi_1$, $x_i = x_{i1}$, $z_i = z_{i1}$. In such a case, we can write the model I-III as

A. $Y_i \mid \theta_i \stackrel{ind}{\sim} N(\theta_i, V_i)$;

B. $\theta_i \mid a, W \stackrel{ind}{\sim} N(x_i^T a,\ \psi + z_i^T W z_i)$.

Special cases of this model are considered in Fay and Herriot (1979) and Ericksen and Kadane (1985, 1987), where $z_i = 0$.
In the case when $m = 1$, the suffix $i$ can be omitted and $Y_{1j} = Y_j$ is the usual survey estimate of the parameter $\theta_j$ at time $j$. In such a case, our model in (I-III) can be compared with some of the models advocated for the analysis of repeated survey data. To see this, it is convenient to write (I-III) as:

(i) $Y_j = x_j^T a + z_j^T b_j + e_j$, $\quad e_j \sim N(0, V_j)$,

(ii) $b_j = H_j b_{j-1} + w_j$, $\quad w_j \sim N(0, W)$.

We first consider the time series model of Scott and Smith (1974) for analysis of
repeated surveys. The special case of their model when a first-order autoregressive
process is assumed for the characteristic of interest $\theta_j$ (which in this setup corresponds to $b_j$) can be identified with (i) and (ii).

In general, stages I and II of the model can be combined into a single stage

$Y_{ij} = x_{ij}^T a + z_{ij}^T b_j + e_{ij}$, $\quad$ where $e_{ij} \stackrel{ind}{\sim} N(0,\ V_{ij} + \psi_j)$.

Our model bears resemblance to the one considered in Sallas and Harville (1988). However, there is a slight difference in the variance structure, as the combined stage above shows. In contrast, in Sallas and Harville (1988), $V(e_j) = \sigma^2 R_j$, with $R_j$ known and $\sigma^2$ possibly unknown, where $e_j = (e_{1j}, \ldots, e_{mj})^T$.
We emphasize once again that the HB structure considered in this chapter assigns distributions to the parameters $a$, $\psi_1, \ldots, \psi_t$, and $W$. To our knowledge, the problem has not been addressed at this level of generality before, even within a Bayesian framework.

We now describe the Gibbs sampling algorithm in its general framework. Gibbs sampling is a Markovian updating scheme which requires sampling from full conditional distributions. Densities are denoted generically by square brackets, so that joint, conditional and marginal densities appear as $[U, V]$, $[U \mid V]$, $[V]$, etc. Given a collection of random variables (real or vector valued) $U_1, \ldots, U_k$, the joint density $[U_1, \ldots, U_k]$ is uniquely determined by $[U_s \mid U_r,\ r \neq s]$, $s = 1, \ldots, k$, under certain mild conditions (see Besag 1974, pp. 194-195). The interest is in finding the marginal distributions $[U_s]$, $s = 1, \ldots, k$.

The Gibbs sampling algorithm is as follows: given an arbitrary set of starting values $U_2^{(0)}, \ldots, U_k^{(0)}$, draw $U_1^{(1)}$ from $[U_1 \mid U_2^{(0)}, \ldots, U_k^{(0)}]$. Then draw $U_2^{(1)}$ from $[U_2 \mid U_1^{(1)}, U_3^{(0)}, \ldots, U_k^{(0)}]$, and so on, up to $U_k^{(1)}$ from $[U_k \mid U_1^{(1)}, \ldots, U_{k-1}^{(1)}]$, to complete one iteration
of the scheme. After $t$ such iterations, one arrives at $(U_1^{(t)}, \ldots, U_k^{(t)})$. There are two alternative ways to obtain i.i.d. $k$-tuples $(U_{1g}^{(t)}, \ldots, U_{kg}^{(t)})$, $g = 1, \ldots, G$, from the joint distribution:

Parallel: Replicate the entire process of $t$ iterations in parallel $G$ times, storing only the last ($t$th) iterate to provide $(U_{1g}^{(t)}, \ldots, U_{kg}^{(t)})$, $g = 1, \ldots, G$.

Sequential: Employ a single long run of $t \times G$ Gibbs iterates, storing every $t$th iterate to provide $(U_{1g}^{(t)}, \ldots, U_{kg}^{(t)})$, $g = 1, \ldots, G$.
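The two schemes can be sketched as follows; here `gibbs_step` is a stand-in toy update (a bivariate normal Gibbs sweep), since any model-specific full-conditional sweep can be slotted in its place:

```python
import random

RHO = 0.9  # correlation of the toy bivariate normal target

def gibbs_step(state, rng):
    # Stand-in for one full sweep of complete-conditional draws.
    u1, u2 = state
    u1 = RHO * u2 + rng.gauss(0.0, (1.0 - RHO ** 2) ** 0.5)
    u2 = RHO * u1 + rng.gauss(0.0, (1.0 - RHO ** 2) ** 0.5)
    return (u1, u2)

def parallel(t, G, seed=0):
    # G independent chains, each run for t iterations; keep only the final state.
    out = []
    for g in range(G):
        rng = random.Random(seed * 100_003 + g)
        state = (0.0, 0.0)
        for _ in range(t):
            state = gibbs_step(state, rng)
        out.append(state)
    return out

def sequential(t, G, seed=0):
    # One long run of t * G iterations; keep every t-th state.
    rng = random.Random(seed)
    state, out = (0.0, 0.0), []
    for i in range(1, t * G + 1):
        state = gibbs_step(state, rng)
        if i % t == 0:
            out.append(state)
    return out

print(len(parallel(50, 100)), len(sequential(50, 100)))  # 100 100
```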


Gelfand and Smith (1990) adopted the parallel algorithm in simulating samples from posterior distributions. These simulated sample observations can then be used for estimation of any of the marginal densities. In particular, assuming that the conditional density $f(U_s \mid U_r,\ r \neq s)$ is available in closed form, a Rao-Blackwellized sample-based estimate of the marginal density of $U_s$ (see Gelfand and Smith (1991)) is given by

$\hat{f}(U_s) = G^{-1} \sum_{g=1}^{G} f(U_s \mid U_{rg}^{(t)},\ r \neq s).$

We will also be employing the parallel implementation of the Gibbs sampling algorithm in our analysis of the time domain HB model.
In order to implement the Gibbs sampling algorithm, we require samples from the following complete conditional distributions:

$a \mid y, \theta, b_1, \ldots, b_t, r_1, \ldots, r_t, W$

$b_j \mid y, \theta, a, b_1, \ldots, b_{j-1}, b_{j+1}, \ldots, b_t, r_1, \ldots, r_t, W, \quad j = 1, \ldots, t$

$r_1, \ldots, r_t \mid y, \theta, a, b_1, \ldots, b_t, W$

$W \mid y, \theta, a, b_1, \ldots, b_t, r_1, \ldots, r_t$

$\theta_{ij} \mid y, a, b_1, \ldots, b_t, r_1, \ldots, r_t, W, \quad (i = 1, \ldots, m;\ j = 1, \ldots, t)$

The hierarchical structure of the model implies that

(i) $a \mid y, \theta, b_1, \ldots, b_t, r_1, \ldots, r_t, W \sim a \mid \theta, b_1, \ldots, b_t, r_1, \ldots, r_t$

(iia) $b_j \mid y, \theta, a, b_1, \ldots, b_{j-1}, b_{j+1}, \ldots, b_t, r_1, \ldots, r_t, W \sim b_j \mid \theta, a, b_{j-1}, b_{j+1}, r_j, W$ $\quad (j = 1, \ldots, t-1)$

(iib) $b_t \mid y, \theta, a, b_1, \ldots, b_{t-1}, r_1, \ldots, r_t, W \sim b_t \mid \theta, a, b_{t-1}, r_t, W$

(iii) $r_j \mid y, \theta, a, b_1, \ldots, b_t, W \sim r_j \mid a, \theta, b_1, \ldots, b_t$ $\quad (j = 1, \ldots, t)$

(iv) $W \mid y, \theta, a, b_1, \ldots, b_t, r_1, \ldots, r_t \sim W \mid b_1, \ldots, b_t$

(v) $\theta_{ij} \mid y, a, b_1, \ldots, b_t, r_1, \ldots, r_t, W \sim \theta_{ij} \mid y_{ij}, r_j, b_j, a$ $\quad (i = 1, \ldots, m;\ j = 1, \ldots, t)$

Formal derivation of the complete conditional distributions is now straightforward and is omitted. The complete conditional distributions are given by

(i) $a \mid y, \theta, b_1, \ldots, b_t, r_1, \ldots, r_t, W$
$\sim N_p\Big[\big(\sum_{i=1}^{m}\sum_{j=1}^{t} r_j x_{ij} x_{ij}^T\big)^{-1} \sum_{i=1}^{m}\sum_{j=1}^{t} r_j x_{ij}(\theta_{ij} - z_{ij}^T b_j),\ \big(\sum_{i=1}^{m}\sum_{j=1}^{t} r_j x_{ij} x_{ij}^T\big)^{-1}\Big]$

(iia) For $j = 1, \ldots, t-1$:

$b_j \mid y, \theta, a, b_1, \ldots, b_{j-1}, b_{j+1}, \ldots, b_t, r_1, \ldots, r_t, W$
$\sim N_q\Big[\big(r_j \sum_{i=1}^{m} z_{ij} z_{ij}^T + H_{j+1}^T W^{-1} H_{j+1} + W^{-1}\big)^{-1}$
$\quad \times \big\{r_j \sum_{i=1}^{m} z_{ij}(\theta_{ij} - x_{ij}^T a) + H_{j+1}^T W^{-1} b_{j+1} + W^{-1} H_j b_{j-1}\big\},$
$\quad \big(r_j \sum_{i=1}^{m} z_{ij} z_{ij}^T + H_{j+1}^T W^{-1} H_{j+1} + W^{-1}\big)^{-1}\Big]$

(iib) $b_t \mid y, \theta, a, b_1, \ldots, b_{t-1}, r_1, \ldots, r_t, W$
$\sim N_q\Big[\big(r_t \sum_{i=1}^{m} z_{it} z_{it}^T + W^{-1}\big)^{-1} \big\{r_t \sum_{i=1}^{m} z_{it}(\theta_{it} - x_{it}^T a) + W^{-1} H_t b_{t-1}\big\},$
$\quad \big(r_t \sum_{i=1}^{m} z_{it} z_{it}^T + W^{-1}\big)^{-1}\Big]$

(iii) $r_1, \ldots, r_t \mid y, \theta, a, b_1, \ldots, b_t, W$
$\stackrel{ind}{\sim} \mathrm{Gamma}\Big(\tfrac{1}{2}\big(a_j + \sum_{i=1}^{m}(\theta_{ij} - x_{ij}^T a - z_{ij}^T b_j)^2\big),\ \tfrac{1}{2}(h_j + m)\Big)$

(iv) $W \mid y, \theta, a, b_1, \ldots, b_t, r_1, \ldots, r_t$
$\sim \text{Inverse Wishart}\big(S + \sum_{j=1}^{t}(b_j - H_j b_{j-1})(b_j - H_j b_{j-1})^T,\ k + t\big)$

(v) $\theta_{ij} \mid y, b_1, \ldots, b_t, r_1, \ldots, r_t, W$
$\sim N\big[(V_{ij}^{-1} + r_j)^{-1}\big(V_{ij}^{-1} Y_{ij} + r_j(x_{ij}^T a + z_{ij}^T b_j)\big),\ (V_{ij}^{-1} + r_j)^{-1}\big]$

Our objective is to find the posterior expectation and variance of $\theta_{ij}$ given $y$. A Rao-Blackwellized estimate of $E(\theta_{ij} \mid y)$ based on simulated samples from the above conditional distributions is

$G^{-1} \sum_{g=1}^{G} E\big(\theta_{ij} \mid y,\ a = a^{(g)},\ W = W^{(g)},\ b_j = b_j^{(g)},\ r_j = r_j^{(g)},\ j = 1, \ldots, t\big).$

The posterior variance can be approximated by

$G^{-1} \sum_{g=1}^{G} E^2\big(\theta_{ij} \mid y,\ a = a^{(g)},\ W = W^{(g)},\ b_j = b_j^{(g)},\ r_j = r_j^{(g)},\ j = 1, \ldots, t\big)$
$- \Big[G^{-1} \sum_{g=1}^{G} E\big(\theta_{ij} \mid y,\ a = a^{(g)},\ W = W^{(g)},\ b_j = b_j^{(g)},\ r_j = r_j^{(g)},\ j = 1, \ldots, t\big)\Big]^2$
$+ G^{-1} \sum_{g=1}^{G} V\big(\theta_{ij} \mid y,\ a = a^{(g)},\ W = W^{(g)},\ b_j = b_j^{(g)},\ r_j = r_j^{(g)},\ j = 1, \ldots, t\big).$
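As a concrete illustration, the following sketch implements a full Gibbs cycle of the conditionals (i)-(v) together with the Rao-Blackwellized posterior mean, in the scalar case $q = 1$, $z_{ij} = 1$, $H_j = 1$, with diffuse choices $a_j = h_j = 0$ and illustrative values $S = k = 1$. The data are simulated; none of the sizes or numbers come from the dissertation.

```python
import numpy as np

rng = np.random.default_rng(2)
m, t, p = 10, 8, 2                      # illustrative sizes

# Illustrative data simulated from stages I-III (z_ij = 1, H_j = 1, b_0 = 0)
a_true = np.array([10.0, 1.0])
x = rng.normal(size=(m, t, p))          # design vectors x_ij
b_true = np.cumsum(rng.normal(0.0, 0.5, size=t))
theta_true = x @ a_true + b_true + rng.normal(0.0, 0.7, size=(m, t))
V = np.full((m, t), 1.0)                # known sampling variances V_ij
Y = theta_true + rng.normal(size=(m, t)) * np.sqrt(V)

S, k = 1.0, 1.0                         # illustrative prior values
G, burn = 2000, 500
a, b, r, W = np.zeros(p), np.zeros(t), np.ones(t), 1.0
rb_sum = np.zeros((m, t))               # Rao-Blackwellized running sum

for it in range(G + burn):
    # (v) theta_ij | ... : precision-weighted normal
    prec = 1.0 / V + r
    theta = (Y / V + r * (x @ a + b)) / prec + rng.normal(size=(m, t)) / np.sqrt(prec)

    # (i) a | ... : weighted least squares of theta_ij - b_j on x_ij, weights r_j
    Xf = x.reshape(m * t, p)
    w = np.tile(r, m)                   # row order of Xf runs j fastest within i
    A = Xf.T @ (w[:, None] * Xf)
    mu_a = np.linalg.solve(A, Xf.T @ (w * (theta - b).reshape(m * t)))
    a = rng.multivariate_normal(mu_a, np.linalg.inv(A))

    # (iia)/(iib) b_j | ... : normal, with random-walk neighbours (b_0 = 0)
    for j in range(t):
        pj = r[j] * m + (2.0 if j < t - 1 else 1.0) / W
        s = r[j] * np.sum(theta[:, j] - x[:, j, :] @ a)
        s += (b[j - 1] / W if j > 0 else 0.0)
        if j < t - 1:
            s += b[j + 1] / W
        b[j] = s / pj + rng.normal() / np.sqrt(pj)

    # (iii) r_j | ... : Gamma, here with diffuse prior a_j = h_j = 0
    ss = np.sum((theta - x @ a - b) ** 2, axis=0)
    r = rng.gamma(shape=m / 2.0, scale=2.0 / ss)

    # (iv) W | ... : the one-dimensional inverse Wishart, i.e. inverse gamma
    diffs = np.append(b[0], np.diff(b))
    W = 1.0 / rng.gamma(shape=(k + t) / 2.0, scale=2.0 / (S + np.sum(diffs ** 2)))

    # Rao-Blackwellization: accumulate E(theta_ij | y, a, b, r), not theta draws
    if it >= burn:
        rb_sum += (Y / V + r * (x @ a + b)) / (1.0 / V + r)

theta_hat = rb_sum / G
rmse = float(np.sqrt(np.mean((theta_hat - theta_true) ** 2)))
print(round(rmse, 2))  # typically well below the sampling s.d. sqrt(V_ij) = 1
```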

2.3 Empirical Bayes Estimation


In an EB formulation, we assume (I)-(III) of the model, but not (IV). Also, it is assumed that $W$ is a known matrix. In such a case, we may write the model as

$Y_{ij} \mid \theta_{ij} \stackrel{ind}{\sim} N(\theta_{ij}, V_{ij}),$

$\theta_{ij} \stackrel{ind}{\sim} N\Big(x_{ij}^T a,\ \psi_j + z_{ij}^T \big(W + \sum_{l=1}^{j-1} (H_j \cdots H_{l+1}) W (H_j \cdots H_{l+1})^T\big) z_{ij}\Big), \qquad (2.3.1)$

assuming known $b_0$, i.e., $b_0 = 0$ without any loss of generality; the variance in (2.3.1) is $\mathrm{Var}(b_j)$ mapped through $z_{ij}$, as follows from stage III.

It follows from (2.3.1) that

$Y_{ij} \stackrel{ind}{\sim} N\big(x_{ij}^T a,\ D_{ij} + V_{ij}\big), \qquad (2.3.2)$

where

$D_{ij} = \psi_j + z_{ij}^T \big(W + \sum_{l=1}^{j-1} (H_j \cdots H_{l+1}) W (H_j \cdots H_{l+1})^T\big) z_{ij}. \qquad (2.3.3)$

From (2.3.2), the posterior distribution of $\theta_{ij}$ given $Y_{ij}$ can be obtained as

$\theta_{ij} \mid Y_{ij} \sim N\big[x_{ij}^T a + D_{ij}(D_{ij} + V_{ij})^{-1}(Y_{ij} - x_{ij}^T a),\ D_{ij} V_{ij}(D_{ij} + V_{ij})^{-1}\big]. \qquad (2.3.4)$

The Bayes estimator of $\theta_{ij}$ under any quadratic loss is its posterior mean, and is given by

$\hat{\theta}_{ij} = x_{ij}^T a + D_{ij}(D_{ij} + V_{ij})^{-1}(Y_{ij} - x_{ij}^T a), \qquad (2.3.5)$

which may also be seen as a weighted average of the survey estimate $Y_{ij}$ and $x_{ij}^T a$.

In an EB analysis, at least one of the parameters $a$ and $\psi_j$ $(j = 1, \ldots, t)$ is unknown and needs to be estimated from the marginal distribution of the $Y_{ij}$'s.

First we estimate $a$. Marginally,

$Y_{ij} \stackrel{ind}{\sim} N\big(x_{ij}^T a,\ D_{ij} + V_{ij}\big).$

Using the method of maximum likelihood, $a$ can be estimated by

$\hat{a} = \Big(\sum_{i=1}^{m}\sum_{j=1}^{t} (D_{ij} + V_{ij})^{-1} x_{ij} x_{ij}^T\Big)^{-1} \sum_{i=1}^{m}\sum_{j=1}^{t} (D_{ij} + V_{ij})^{-1} x_{ij} Y_{ij}. \qquad (2.3.6)$

An EB estimator of $\theta_{ij}$ is obtained by substituting $\hat{a}$ for $a$ in (2.3.5) and is given by

$\hat{\theta}_{ij}^{EB} = x_{ij}^T \hat{a} + D_{ij}(D_{ij} + V_{ij})^{-1}(Y_{ij} - x_{ij}^T \hat{a}). \qquad (2.3.7)$

The EB approach now attempts to estimate $\psi_j$ by the method of maximum likelihood or some other standard method. For example, noting that

$E\Big[\sum_{i=1}^{m} (Y_{ij} - x_{ij}^T \hat{a})^2 / (D_{ij} + V_{ij})\Big] \approx m - q,$

one can solve iteratively

$\sum_{i=1}^{m} (Y_{ij} - x_{ij}^T \hat{a})^2 / (D_{ij} + V_{ij}) = m - q \qquad (2.3.8)$

in conjunction with (2.3.6) to find $\hat{\psi}_j$ if a positive solution for $\psi_j$ exists, letting $\hat{\psi}_j = 0$ when no positive solution can be found.
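A sketch of this iterative scheme in a stripped-down setting ($t = 1$, no $z^T W z$ term, so $D_{ij} = \psi$) alternates the weighted least-squares step (2.3.6) with a bisection solve of the moment equation (2.3.8). All data are simulated and all values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)
m, p, q = 40, 2, 1
x = rng.normal(size=(m, p)); x[:, 0] = 1.0     # design with intercept
V = rng.uniform(0.5, 1.5, size=m)              # known sampling variances
a_true, psi_true = np.array([5.0, 2.0]), 1.0
theta = x @ a_true + rng.normal(0.0, np.sqrt(psi_true), size=m)
Y = theta + rng.normal(size=m) * np.sqrt(V)

def solve_psi(resid2, V, target):
    # sum resid2/(psi+V) is decreasing in psi; bisect, returning 0
    # when no positive solution exists (the convention in the text).
    if np.sum(resid2 / V) <= target:
        return 0.0
    lo, hi = 0.0, 1e6
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if np.sum(resid2 / (mid + V)) > target else (lo, mid)
    return lo

psi = 1.0
for _ in range(50):                            # alternate (2.3.6) and (2.3.8)
    w = 1.0 / (psi + V)
    A = x.T @ (w[:, None] * x)
    a_hat = np.linalg.solve(A, x.T @ (w * Y))  # weighted least squares (2.3.6)
    psi = solve_psi((Y - x @ a_hat) ** 2, V, target=m - q)

# EB estimate (2.3.7): shrink Y toward the fitted regression surface
theta_eb = x @ a_hat + psi / (psi + V) * (Y - x @ a_hat)
print(theta_eb.shape, psi >= 0.0)  # (40,) True
```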

We observe that the EB approach and the HB approach (considered in Section 2.2) will lead to the same point estimator of $\theta_{ij}$. However, the EB procedure, by ignoring the uncertainty in estimating the unknown parameters, will lead to underestimation of the true posterior standard errors. The complete HB analysis, though involved, provides more reliable estimates of the standard errors. It is for this reason that we will employ only the HB approach in the illustrative example considered in Section 2.4.

2.4 An Application of Hierarchical Bayes Analysis


In this section, we apply the HB procedure developed in Section 2.2 to an actual data set. Our application concerns estimation of median income for four-person families by state. Before employing the HB model for median income estimation, we briefly give the background of this problem. The details are given in Fay (1987).

The U.S. Department of Health and Human Services (HHS) administers a program of energy assistance to low-income families. Eligibility for the program is determined by a formula in which the most important variable is an estimate of the current median income for four-person families by state (the 50 states and the District of Columbia). The Bureau of the Census, by an informal agreement, has provided estimates of the median income for four-person families by state to HHS through a linear regression methodology since the latter part of the 1970s. Two basic approaches have been used, with the change in methodology occurring for income year 1985. In the earlier method, sample estimates $Y$ of the state medians for the most current year $c$, as well as the associated standard errors, are first obtained through the Current Population Survey (CPS). The estimates $Y$ were used as dependent variables in a linear regression procedure with the single predictor variable

Adjusted census median$(c)$ = [BEA PCI$(c)$ / BEA PCI$(b)$] $\times$ Census median$(b)$, $\qquad (2.4.9)$

where census median$(b)$ represents the median income of four-person families in the state for the base year $b$ from the most recently available decennial census. In the above, BEA PCI$(c)$ and BEA PCI$(b)$ represent estimates of per-capita income produced by the Bureau of Economic Analysis of the U.S. Department of Commerce for the current year $c$ and the base year $b$, respectively. Thus (2.4.9) attempts to adjust the base-year census median by the proportional growth in the BEA PCI to arrive at the current-year adjusted median. Moreover, in developing the
model, the states were divided on the basis of population into four groups of 12 or 13 states each. Model variances for each group were computed as the average of squared residuals of the 1969 census medians fitted by a linear regression with an intercept term along with the adjusted census median for 1969 (as given in (2.4.9)) as a covariate.
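As a small numerical illustration of (2.4.9) with hypothetical figures (a base-year median of 20,000 and BEA per-capita income growing from 9,000 to 12,000; none of these numbers come from the actual data):

```python
def adjusted_census_median(census_median_b, bea_pci_c, bea_pci_b):
    # (2.4.9): scale the base-year census median by BEA per-capita income growth.
    return (bea_pci_c / bea_pci_b) * census_median_b

# Hypothetical numbers, for illustration only.
print(adjusted_census_median(20000.0, bea_pci_c=12000.0, bea_pci_b=9000.0))
```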

Next, a weighted average of the CPS sample estimate of the current median income and the regression estimate is obtained, weighting the regression estimate inversely proportionally to the model error due to the lack of fit for the census values of 1969 by the model with 1959 used as the base year, and weighting the sample estimate inversely proportionally to the sampling variance. From a Bayesian angle, the above composite estimate is obtainable from the simple hierarchical model of Lindley and Smith (1972). Finally, the above composite estimate was constrained not to deviate by more than one standard deviation from the corresponding sample estimate. Thus the final estimates were analogous to the "Limited Translation Estimates" discussed, for example, in Efron and Morris (1971) or Fay and Herriot (1979).

The model was reviewed once the median income of four-person families by state in 1979 became available from the 1980 census. The following modifications to the previous modelling were suggested on the basis of this review.

First, it was decided to consider a single level of model error across all the states. The reason was that the differences between the estimated model errors for the larger and smaller states greatly exaggerated the more current differences. Second, the predictive value of the regression appeared usefully improved if, in addition to the adjusted census median$(c)$, the census median$(b)$ was also included as a second independent variable. The idea was to adjust for any possible overstatement of the effect of change in BEA income upon the median incomes of four-person families. Also, the one standard deviation constraint was removed. This matter is discussed in Fay (1987).







So far, the entire time series of sample median income estimates available
for each year since the preceding census has not been used to obtain the median
income estimates for four-person families. Our model in Section 2.2 exploits
the past time series information available at a given time in addition to the
most recent census medians. Before we describe the hierarchical model for median income
estimation, we need to introduce some notation. Let Y_ij denote the sample median income
of four-person families in state i in year j. The true median corresponding to Y_ij is denoted by
θ_ij. Let x_ij1 and x_ij2 denote respectively the adjusted census median income for the
year j (as given in (2.4.9)) and the base-year census median income for four-person
families in the ith state (i = 1, ..., 51). Finally, let

x_ij = (1, x_ij1, x_ij2)^T, (i = 1, ..., 51; j = 1, ..., 10),   (2.4.10)

and denote by a = (a_1, a_2, a_3)^T the vector of regression coefficients. The following
hierarchical model is used to estimate the θ_ij's:

I. Conditionally on the θ_ij's, the Y_ij's are mutually independent with

Y_ij | θ_ij ~ N(θ_ij, v_ij), (i = 1, ..., 51; j = 1, ..., 10);

II. θ_ij | a, b_j, r_j ~ ind N(x_ij^T a + b_j, r_j^{-1}), (i = 1, ..., 51; j = 1, ..., 10);

III. b_j | b_{j-1}, r_0 ~ ind N(b_{j-1}, r_0^{-1}), (j = 1, ..., 10);

IV. Marginally a, r_1, ..., r_10, and r_0 are mutually independent with

a ~ Uniform(R^3),

r_j = ψ_j^{-1} ~ Gamma(0, 0), (j = 1, ..., 10),

r_0 = (τ^2)^{-1} ~ Gamma(0, 0),

where Gamma(α, p) denotes the gamma distribution with density proportional to exp(-αx) x^{p-1}.

To find the estimates of the median income of four-person families and their standard
errors, we need to use the Gibbs sampling algorithm and require samples from the
following complete conditional distributions:

a | y, θ, b_1, ..., b_10, r_1, ..., r_10, r_0;

b_j | y, θ, a, b_1, ..., b_{j-1}, b_{j+1}, ..., b_10, r_1, ..., r_10, r_0, (j = 1, ..., 10);

r_1, ..., r_10 | y, θ, a, b_1, ..., b_10, r_0;

r_0 | y, θ, a, b_1, ..., b_10, r_1, ..., r_10;

θ_ij | y, a, b_1, ..., b_10, r_1, ..., r_10, r_0, (i = 1, ..., 51; j = 1, ..., 10).

Again, the hierarchical structure of the model implies that

(i) a | y, θ, b_1, ..., b_10, r_1, ..., r_10, r_0 ~ a | θ, b_1, ..., b_10, r_1, ..., r_10;

(iia) b_j | y, θ, a, b_1, ..., b_{j-1}, b_{j+1}, ..., b_10, r_1, ..., r_10, r_0 ~ b_j | θ, a, b_{j-1}, b_{j+1}, r_j, r_0, (j = 1, ..., 9);

(iib) b_10 | y, θ, a, b_1, ..., b_9, r_1, ..., r_10, r_0 ~ b_10 | θ, a, b_9, r_10, r_0;

(iii) r_j | y, θ, a, b_1, ..., b_10, r_0 ~ r_j | θ, a, b_1, ..., b_10, (j = 1, ..., 10);

(iv) r_0 | y, θ, a, b_1, ..., b_10, r_1, ..., r_10 ~ r_0 | b_1, ..., b_10;

(v) θ_ij | y, a, b_1, ..., b_10, r_1, ..., r_10, r_0 ~ θ_ij | y_ij, r_j, b_j, a, (i = 1, ..., 51; j = 1, ..., 10).

These distributions under conjugate priors follow from standard normal and gamma
distribution theory and are as follows:

(i) a | θ, b_1, ..., b_10, r_1, ..., r_10
~ N_3[ (Σ_{i=1}^{51} Σ_{j=1}^{10} r_j x_ij x_ij^T)^{-1} Σ_{i=1}^{51} Σ_{j=1}^{10} r_j x_ij (θ_ij - b_j),
(Σ_{i=1}^{51} Σ_{j=1}^{10} r_j x_ij x_ij^T)^{-1} ];

(iia) for j = 1, ..., 9:
b_j | θ, a, b_{j-1}, b_{j+1}, r_j, r_0
~ N[ (51 r_j + 2 r_0)^{-1} { r_j Σ_{i=1}^{51} (θ_ij - x_ij^T a) + r_0 (b_{j+1} + b_{j-1}) },
(51 r_j + 2 r_0)^{-1} ];

(iib) b_10 | θ, a, b_9, r_10, r_0
~ N[ (51 r_10 + r_0)^{-1} { r_10 Σ_{i=1}^{51} (θ_{i,10} - x_{i,10}^T a) + r_0 b_9 },
(51 r_10 + r_0)^{-1} ];

(iii) r_j | θ, a, b_1, ..., b_10 ~ ind Gamma( (1/2) Σ_{i=1}^{51} (θ_ij - x_ij^T a - b_j)^2, 51/2 ), (j = 1, ..., 10);

(iv) r_0 | b_1, ..., b_10 ~ Gamma( (1/2) Σ_{j=1}^{10} (b_j - b_{j-1})^2, 5 );

(v) θ_ij | y_ij, r_j, b_j, a ~ N[ (v_ij^{-1} + r_j)^{-1} (v_ij^{-1} y_ij + r_j (x_ij^T a + b_j)), (v_ij^{-1} + r_j)^{-1} ].
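The pattern of cycling through normal and gamma full conditionals can be sketched numerically. The following toy Python sketch uses simulated data, a single time point, and no covariates, so a scalar mean mu plays the role of x_ij^T a + b_j and the b_j and r_0 updates drop out; it is a deliberate simplification of the model above, not the full sampler used in the dissertation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: m "states" with known sampling variances v_i (all simulated).
m = 51
theta_true = rng.normal(40000.0, 4000.0, m)
v = rng.uniform(1.0e6, 1.0e7, m)          # sampling variances v_i
y = rng.normal(theta_true, np.sqrt(v))    # direct survey estimates

# Gibbs sweep for: y_i | theta_i ~ N(theta_i, v_i),
#                  theta_i | mu, r ~ N(mu, 1/r), mu flat, r ~ Gamma(0, 0).
n_iter, burn = 2000, 500
theta, mu, r = y.copy(), y.mean(), 1.0 / y.var()
draws = np.empty((n_iter, m))
for it in range(n_iter):
    # mu | theta, r  ~  N(mean(theta), 1/(m r))
    mu = rng.normal(theta.mean(), 1.0 / np.sqrt(m * r))
    # r | theta, mu  ~  Gamma(shape = m/2, rate = (1/2) sum_i (theta_i - mu)^2)
    r = rng.gamma(m / 2.0, 1.0 / (0.5 * np.sum((theta - mu) ** 2)))
    # theta_i | y, mu, r  ~  N((y_i/v_i + r mu)/(1/v_i + r), 1/(1/v_i + r))
    prec = 1.0 / v + r
    theta = rng.normal((y / v + r * mu) / prec, 1.0 / np.sqrt(prec))
    draws[it] = theta

hb_est = draws[burn:].mean(axis=0)   # HB point estimates
hb_se = draws[burn:].std(axis=0)     # HB standard errors
```

The shrinkage toward mu is what reduces the standard errors relative to the direct estimates; the full model replaces mu with x_ij^T a + b_j and adds the random-walk update for the b_j's and r_0.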

Simulations from these complete conditional distributions can be made using
standard random number generators for the normal and gamma distributions. We
can now estimate the marginal posterior density of θ_ij | y as described earlier in Section
2.2. We find estimates of the median income of four-person families for each year from 1979 to 1989
(except for 1980), using all the sample estimates available since the 1979 census, by running
50 iterations of 100-2000 replications each. The necessary convergence is achieved
after 500 samples. We denote these HB estimates by HB1. As a special case of our
model, we have also computed the median income estimates for 1989 utilizing only
the census median income figures, the CPS median income estimates for 1989, and







the PCI's for the years 1979 and 1989. We denote these HB estimates by HB2. The
HB2 estimates are not based on any time series model, and are based on a univariate
version of the HB model considered in Datta, Fay and Ghosh (1991). Table 2.1

provides the statewise sample median incomes as well as the HB1 and HB2 estimates
for 1989. The standard errors associated with these estimates are given in Table 2.2.
Also, for most states, the time domain HB analysis provides reduced estimates of
standard errors, when compared to the corresponding standard errors under the non
time-series model of Datta, Fay and Ghosh (1991). The time domain HB procedure
results in substantial reduction of the standard errors when compared to the sample
estimates.
The median income estimates along with their standard errors for the year 1979
are provided in Table 2.3. Since, in addition to the sample estimates, the 1979 incomes
are also available from the 1980 census data, we compare the retrospective estimates
obtained for 1979 by using the time series model against the 1980 census, treating
the latter as the "truth." The 1980 census figures are labelled as Truth in Table 2.3.

Suppose e_iTR denotes the true median income for the ith state and e_i any estimate
of e_iTR. Then, for the estimate e = (e_1, ..., e_51)^T of e_TR = (e_1TR, ..., e_51TR)^T, let

Average Relative Bias = (51)^{-1} Σ_{i=1}^{51} |e_i - e_iTR| / e_iTR;

Average Squared Relative Bias = (51)^{-1} Σ_{i=1}^{51} (e_i - e_iTR)^2 / e_iTR^2;

Average Absolute Bias = (51)^{-1} Σ_{i=1}^{51} |e_i - e_iTR|;

Average Squared Deviation = (51)^{-1} Σ_{i=1}^{51} (e_i - e_iTR)^2.
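These four criteria are direct to compute. The Python sketch below packages them; the function name is ours, and taking the truth e_iTR as the denominator of the relative-bias criteria is an assumption, since the typescript is ambiguous on that point.

```python
import numpy as np

def comparison_criteria(e, e_tr):
    """The four accuracy criteria above, for an estimate vector e
    against the 'truth' vector e_tr (length 51 in the text)."""
    e, e_tr = np.asarray(e, float), np.asarray(e_tr, float)
    d = e - e_tr
    return {
        "avg_rel_bias": np.mean(np.abs(d) / e_tr),
        "avg_sq_rel_bias": np.mean(d ** 2 / e_tr ** 2),
        "avg_abs_bias": np.mean(np.abs(d)),
        "avg_sq_dev": np.mean(d ** 2),
    }
```

Applying this to the sample medians and to the HB1 estimates against the 1980 census figures yields the entries of Table 2.4.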

The above quantities are provided in Table 2.4 for the sample estimates and the HB1
estimates. The HB1 estimates are clearly superior to the sample median estimates
for each state. In terms of the average relative bias, the HB1 estimates improve on the
sample medians by 80.32%. Using the criterion of average squared relative bias, the
corresponding percentage improvement is 97.01. Similarly, for average absolute bias
and average squared deviation as the criteria, the respective percentage improvements
of the HB1 estimates over the sample medians are 80.88 and 96.27. Thus the time series
HB procedure improves tremendously on the sample estimates.








Table 2.1. Median Income for 4-Person Family by State in 1989

(In Dollars )
State  Sample Median  HB1    HB2
1 37700 37332 37987
2 52720 47829 48232
3 41489 39604 40086
4 55729 52661 52817
5 47106 43424 43629
6 55048 53203 53020
7 43434 43855 43879
8 53128 53436 53305
9 40990 40745 40762
10 44241 42589 42550
11 40297 39736 39604
12 42590 43075 42817
13 43260 43101 42907
14 42226 41111 41006
15 45432 43631 43439
16 38711 37773 37713
17 39232 38932 38976
18 36966 34946 35068
19 32610 32116 32387
20 40894 38326 38419
21 42781 38820 38821
22 39819 41924 41724
23 48252 50346 49893
24 36587 40270 40384
25 49718 45511 45656





Table 2.1. continued
State  Sample Median  HB1    HB2
26     31604          32141  32217
27     37423          37399  37564
28     31834          34407  34626
29     45876          41050  41479
30     37194          37827  37876
31     32186          33546  33779
32     31327          33843  34041
33     32245          34079  34272
34     31330          31559  31922
35     33317          32480  32918
36     36219          35123  35118
37     40358          35680  35865
38     32679          34247  34117
39     29586          32598  32602
40     32227          33030  33161
41     38931          36451  35852
42     36732          39591  39294
43     29761          31238  31462
44     38969          38828  38868
45     37938          36913  36942
46     34579          39129  38722
47     42392          41821  41559
48     41419          39342  39256
49     41121          42010  41820
50     41379          48607  47146
51     44999          44785  44369








Table 2.2. Estimated Standard Errors of Median Income Estimates
for 4-Person Family by State in 1989
(In Dollars )
State  Sample Median  HB1   HB2
1 3230 2010 2147
2 4132 2327 2450
3 2848 1936 2011
4 2452 1946 1999
5 4512 2289 2355
6 3893 2245 2379
7 1512 1302 1318
8 1933 1575 1635
9 1584 1335 1349
10 1845 1562 1583
11 2970 1931 1974
12 1763 1440 1472
13 1597 1341 1374
14 2310 1722 1757
15 3251 2030 2083
16 2258 1698 1734
17 3677 2098 2141
18 2249 1757 1794
19 2130 1630 1679
20 2502 1869 1899
21 3359 2134 2184
22 3699 2117 2164
23 4665 2314 2450
24 4645 2280 2336
25 4349 2292 2362








Table 2.2. Continued
State  Sample Median  HB1   HB2
26 2278 1695 1736
27 1513 1289 1311
28 2965 1971 2020
29 4385 2295 2377
30 1397 1232 1242
31 2885 1915 1968
32 2446 1830 1870
33 2690 1869 1913
34 2392 1733 1804
35 2435 1762 1836
36 3314 2039 2099
37 3923 2244 2311
38 1552 1405 1394
39 2557 1896 1918
40 2268 1691 1727
41 3516 2158 2356
42 3261 2045 2088
43 2811 1902 1962
44 5003 2293 2346
45 3209 2004 2050
46 2863 2059 2084
47 2525 1788 1842
48 3056 1993 2040
49 1586 1359 1369
50 3637 2299 2665
51 4805 2278 2369








Table 2.3. Median Income Estimates for 4-Person Family by State in 1979
(In Dollars)
State  Truth  Sample Median  Sample SE  HB1 Median  HB1 SE
1      18319  18084          1533       17967       516
2      22027  23129          1647       21919       495
3      19424  18438          1642       19055       508
4      23772  25324          1414       23805       501
5      22107  23597          2409       21967       511
6      25712  25037          1431       25596       509
7      22669  21852          766        22329       423
8      26014  25486          1264       25913       502
9      22266  23326          878        22330       460
10     23279  22397          843        22963       439
11     23014  22716          1041       22826       451
12     25410  24294          1025       25168       486
13     25111  24103          1006       24872       477
14     23320  23270          1202       23179       467
15     24044  25137          1571       24021       501
16     22351  23851          1592       22294       499
17     21891  20743          1255       21563       478
18     20511  18567          1535       20094       506
19     18674  19826          1588       18449       522
20     21438  22603          1802       21297       501
21     22127  22014          1586       21930       487
22     23627  21860          1900       23388       510
23     26203  26235          1722       26173       528
24     21862  20232          3801       21623       520
25     22757  24160          1418       22736       494








Table 2.3. continued
State  Truth  Sample Median  Sample SE  HB1 Median  HB1 SE
26     20214  18274          1380       19758       506
27     19772  20296          1012       19611       465
28     19944  19282          1795       19624       506
29     20668  22687          1196       20703       517
30     21086  19675          1042       20650       474
31     19685  18657          1285       19295       489
32     19693  19776          1274       19434       483
33     19926  17978          1282       19436       506
34     18150  19167          1762       17875       534
35     17893  18917          1507       17639       530
36     21412  18965          1444       20964       513
37     20659  19295          1675       20311       501
38     22521  22963          873        22466       435
39     20776  21216          1625       20571       494
40     19961  21533          1543       19827       510
41     24641  22013          1559       24327       530
42     23757  26746          1677       23853       531
43     19257  19968          1597       19017       509
44     21924  21733          1828       21712       496
45     21572  20727          1440       21285       484
46     24438  25447          2051       24388       516
47     24394  25463          1415       24401       498
48     22688  24251          1500       22665       499
49     24752  24265          767        24559       427
50     31018  30788          1891       31159       655
51     24966  23744          2263       24830       526






Table 2.4. Average Relative Bias, Average Squared Relative Bias, Average Absolute Bias, and Average Squared Deviations of the Estimates.

Sample Median HB1


Aver. Rel. Bias 0.04984 0.00980

Aver. Sq. Rel. Bias 0.00340 0.00014

Aver. Abs. Bias 1090.41 208.45

Aver. Sq. Dev. 1631203.47 60782.92














CHAPTER 3
MULTIVARIATE ESTIMATION OF MEANS IN TIME DOMAIN



3.1 Introduction


Periodic sample surveys usually provide sample-based estimates for several pa-

rameters. The inter-relationship among the parameters can play an important role

in improving the small area estimators. Fay (1987) demonstrated that, even when

one parameter of a vector is of interest, estimating the joint vector may improve the

estimation of a given parameter compared to restricting attention only to the single

parameter. In particular, he suggested improving on estimation of median incomes

for four-person families discussed in the previous chapters by including information

regarding the median incomes of three- and five- person families as well because of a

strong statistical relationship among the three sets. While the HB model of Datta,

Fay and Ghosh (1991) involved bivariate data, their analysis was restricted to a fixed

point of time.

The univariate HB model considered in Chapter 2 can be extended to the multivariate
problem simply by taking the observations at each time point as vectors rather
than scalars. In Section 3.2 we present a general framework for multivariate HB analysis
when the error variance-covariance matrices are known for all time points. This section
also provides a methodology for obtaining marginal posterior densities through the Gibbs
sampling algorithm. We revisit the problem of median income estimation in Section
3.3. We find the median income estimates based on the model given in Section 3.2







for the period 1979-1989 for all the states. We have also compared the estimates for
the year 1989 with the corresponding estimates obtained in Section 2.4.


3.2 The General Multivariate Hierarchical Bayes Model


Suppose that, based on a given sample at time j (j = 1, ..., t), Y_ij = (Y_ij1, ..., Y_ijs)^T
is an s-dimensional column vector of sample survey estimators of some characteristics
θ_ij = (θ_ij1, ..., θ_ijs)^T for the ith small area (i = 1, ..., m). As in Chapter 2, our
objective is to optimally estimate a linear function Σ_{i=1}^m Σ_{j=1}^t L_ij^T θ_ij, using all the data
available up to time t. Consider the following HB model:

I. Y_ij | θ_ij ~ ind N(θ_ij, V_ij), (i = 1, ..., m; j = 1, ..., t);

II. θ_ij | a, b_j, Ψ_j ~ ind N(X_ij a + Z_ij b_j, Ψ_j), (i = 1, ..., m; j = 1, ..., t);

III. b_j | b_{j-1}, W ~ ind N(H_j b_{j-1}, W), (j = 1, ..., t);

IV. Marginally a, Ψ_1, ..., Ψ_t, and W are mutually independent with

a ~ Uniform(R^p),

Ψ_j ~ ind Inverse Wishart(S_j, k_j), (j = 1, ..., t),

W ~ Inverse Wishart(S_0, k_0).

In (II), X_ij (s × p) and Z_ij (s × q) are known design matrices.

In (IV), we shall allow the possibility of diffuse priors for the Ψ_j or W, e.g. S_j = 0
for j = 1, ..., t, and S_0 = 0.
The hierarchical structure of the model implies that

(i) a | y, θ, b_1, ..., b_t, Ψ_1, ..., Ψ_t, W ~ a | θ, b_1, ..., b_t, Ψ_1, ..., Ψ_t;

(iia) b_j | y, θ, a, b_1, ..., b_{j-1}, b_{j+1}, ..., b_t, Ψ_1, ..., Ψ_t, W ~ b_j | θ, a, b_{j-1}, b_{j+1}, Ψ_j, W, (j = 1, ..., t-1);

(iib) b_t | y, θ, a, b_1, ..., b_{t-1}, Ψ_1, ..., Ψ_t, W ~ b_t | θ, a, b_{t-1}, Ψ_t, W;

(iii) Ψ_j | y, θ, a, b_1, ..., b_t, W ~ Ψ_j | θ, a, b_1, ..., b_t, (j = 1, ..., t);

(iv) W | y, θ, a, b_1, ..., b_t, Ψ_1, ..., Ψ_t ~ W | b_1, ..., b_t;

(v) θ_ij | y, a, b_1, ..., b_t, Ψ_1, ..., Ψ_t, W ~ θ_ij | y_ij, Ψ_j, b_j, a, (i = 1, ..., m; j = 1, ..., t).

Formal derivation of the complete conditional distributions is now straightforward
and is omitted. The complete conditional distributions are given by

(i) a | θ, b_1, ..., b_t, Ψ_1, ..., Ψ_t
~ N_p[ (Σ_{j=1}^t Σ_{i=1}^m X_ij^T Ψ_j^{-1} X_ij)^{-1} Σ_{j=1}^t Σ_{i=1}^m X_ij^T Ψ_j^{-1} (θ_ij - Z_ij b_j),
(Σ_{j=1}^t Σ_{i=1}^m X_ij^T Ψ_j^{-1} X_ij)^{-1} ];

(iia) for j = 1, ..., t-1:
b_j | θ, a, b_{j-1}, b_{j+1}, Ψ_j, W
~ N_q[ (Σ_{i=1}^m Z_ij^T Ψ_j^{-1} Z_ij + H_{j+1}^T W^{-1} H_{j+1} + W^{-1})^{-1}
× { Σ_{i=1}^m Z_ij^T Ψ_j^{-1} (θ_ij - X_ij a) + H_{j+1}^T W^{-1} b_{j+1} + W^{-1} H_j b_{j-1} },
(Σ_{i=1}^m Z_ij^T Ψ_j^{-1} Z_ij + H_{j+1}^T W^{-1} H_{j+1} + W^{-1})^{-1} ];

(iib) b_t | θ, a, b_{t-1}, Ψ_t, W
~ N_q[ (Σ_{i=1}^m Z_it^T Ψ_t^{-1} Z_it + W^{-1})^{-1} { Σ_{i=1}^m Z_it^T Ψ_t^{-1} (θ_it - X_it a) + W^{-1} H_t b_{t-1} },
(Σ_{i=1}^m Z_it^T Ψ_t^{-1} Z_it + W^{-1})^{-1} ];

(iii) Ψ_j | θ, a, b_1, ..., b_t ~ ind Inverse Wishart[ S_j + Σ_{i=1}^m (θ_ij - X_ij a - Z_ij b_j)
× (θ_ij - X_ij a - Z_ij b_j)^T, k_j + m ], (j = 1, ..., t);

(iv) W | b_1, ..., b_t ~ Inverse Wishart[ S_0 + Σ_{j=1}^t (b_j - H_j b_{j-1})(b_j - H_j b_{j-1})^T, k_0 + t ];

(v) θ_ij | y_ij, Ψ_j, b_j, a ~ N[ (V_ij^{-1} + Ψ_j^{-1})^{-1} (V_ij^{-1} y_ij + Ψ_j^{-1} (X_ij a + Z_ij b_j)),
(V_ij^{-1} + Ψ_j^{-1})^{-1} ].


A Rao-Blackwellized sample-based density estimate of θ_ij given y, based on G retained
samples from the above conditional distributions, can be approximated by

(1/G) Σ_{g=1}^G f(θ_ij | y, a = a^(g), W = W^(g), b_j = b_j^(g), Ψ_j = Ψ_j^(g)).
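Concretely, when the full conditional of a scalar component of θ_ij is normal (as in (v)), this mixture estimate is just an average of normal densities over the retained draws. A minimal Python sketch (the function name is ours):

```python
import numpy as np
from scipy.stats import norm

def rao_blackwell_density(grid, cond_means, cond_sds):
    """Average the normal full-conditional densities of a scalar theta
    over G retained Gibbs draws of the conditioning parameters."""
    dens = np.zeros_like(grid, dtype=float)
    for mu_g, sd_g in zip(cond_means, cond_sds):
        dens += norm.pdf(grid, loc=mu_g, scale=sd_g)
    return dens / len(cond_means)
```

Each (cond_mean, cond_sd) pair comes from evaluating the mean and standard deviation in (v) at one retained draw of (a, b_j, Ψ_j).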

3.3 An Illustrative Application to Survey Data

In this section, we illustrate the implementation of the multivariate HB model
described in Section 3.2. We again consider the problem of estimation of median
income for four-person families by state. Fay (1987) suggested improving on estimation
of median incomes for four-person families by including information regarding
the median incomes of three- and five-person families as well, because of a strong
statistical relationship among the three sets. He advocated the use of a multivariate
empirical Bayes (EB) or variance-components approach for the composite estimation
of median incomes of four-person families by state. Datta, Fay and Ghosh (1991)
adopted a bivariate hierarchical linear model for estimating the statewide median
incomes of four-person families. In their procedure, one component was the median
income of four-person families, while the other component was a convex combination
of the median incomes of five- and three-person families. The analysis required evaluation
of 3-dimensional integrals. Instead, we will use Monte Carlo integration via the
Gibbs sampling algorithm. In our analysis, we consider as the basic data
51 three-component vectors corresponding to the median incomes of four-, three-,
and five-person families for the years 1979-1989 (except for 1980). Our target is
to estimate statewide median incomes of four-person families, although our procedure
furnishes estimates of median incomes of three- and five-person families as well.
Note that direct numerical integration in a trivariate set-up would require evaluation
of six-dimensional integrals, which is fairly complicated and often unreliable. The
Gibbs sampling technique avoids such problems. Our analysis rests heavily on the
implementation of the Gibbs sampling algorithm.

We consider as the basic data the three-component vectors Y_ij = (Y_ij1, Y_ij2, Y_ij3)^T,
i = 1, ..., 51; j = 1, ..., 10, where Y_ij1, Y_ij2, and Y_ij3 are the sample median incomes
of four-, three-, and five-person families in state i in year j. The true medians corresponding
to the Y_ij's are denoted by the θ_ij's. Let θ_ij = (θ_ij1, θ_ij2, θ_ij3)^T. We
shall also write Y = (Y_11^T, ..., Y_51,10^T)^T and θ = (θ_11^T, ..., θ_51,10^T)^T.

We would like to introduce a little more notation before describing the HB model.
Let x_ij11 and x_ij12 denote respectively the adjusted census median income for the
year j and the base-year census median income for four-person families in the ith
state (i = 1, ..., 51). Also, let x_ij21 and x_ij31 denote respectively the adjusted census
median incomes for the year j for three- and five-person families. The corresponding
base-year census median incomes for three- and five-person families are denoted by
x_ij22 and x_ij32. Finally, let

X_ij = [ 1  x_ij11  x_ij12  0  0       0       0  0       0
         0  0       0       1  x_ij21  x_ij22  0  0       0
         0  0       0       0  0       0       1  x_ij31  x_ij32 ],   (3.3.1)

i = 1, ..., 51; j = 1, ..., 10, and denote by a = (a_1, ..., a_9)^T the vector of regression
coefficients. Also, b_j = (b_j1, b_j2, b_j3)^T, (j = 1, ..., 10), is the vector of random
components for the year j.
The following HB model is used:

I. Y_ij | θ_ij ~ ind N(θ_ij, V_ij), (i = 1, ..., 51; j = 1, ..., 10);

II. θ_ij | a, b_j, Ψ_j ~ ind N(X_ij a + b_j, Ψ_j), (i = 1, ..., 51; j = 1, ..., 10);

III. b_j | b_{j-1}, W ~ ind N(b_{j-1}, W), (j = 1, ..., 10);

IV. Marginally a, Ψ_1, ..., Ψ_10, and W are mutually independent with

a ~ Uniform(R^9),

Ψ_j ~ ind Inverse Wishart(S_j, k_j), (j = 1, ..., 10),

W ~ Inverse Wishart(S_0, k_0).

In (II), X_ij (3 × 9) is a known design matrix.

In (IV) we assume diffuse priors for the Ψ_j and W, i.e. S_j = 0 for j = 1, ..., 10, and
S_0 = 0.
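Inverse Wishart draws of the kind stage IV and the Ψ_j update require are available in standard software. The Python sketch below shows one such update for a 3 × 3 error covariance; the residuals are simulated stand-ins, and a proper prior with S_j = I_3 and k_j = 5 is used purely for illustration rather than the diffuse S_j = 0 limit assumed in the text.

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(1)

# Simulated stand-ins for the 51 residuals theta_ij - X_ij a - b_j.
resid = rng.normal(size=(51, 3))

# Posterior scale: S_j plus the sum of residual outer products;
# posterior degrees of freedom: k_j + 51 (illustrative S_j = I, k_j = 5).
scale = np.eye(3) + resid.T @ resid
psi_draw = invwishart.rvs(df=5 + 51, scale=scale, random_state=1)
```

The draw psi_draw is a symmetric positive definite 3 × 3 matrix, one Gibbs update of Ψ_j.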
As mentioned earlier, closed-form expressions for E(θ_ij | y) and V(θ_ij | y) in such
a set-up cannot be obtained. Also, it is too complicated to estimate E(θ_ij | y) and
V(θ_ij | y) through numerical integration. We will be using the Gibbs sampling procedure
to simulate samples from the following complete conditional distributions:

(i) a | θ, b_1, ..., b_10, Ψ_1, ..., Ψ_10
~ N_9[ (Σ_{j=1}^{10} Σ_{i=1}^{51} X_ij^T Ψ_j^{-1} X_ij)^{-1} Σ_{j=1}^{10} Σ_{i=1}^{51} X_ij^T Ψ_j^{-1} (θ_ij - b_j),
(Σ_{j=1}^{10} Σ_{i=1}^{51} X_ij^T Ψ_j^{-1} X_ij)^{-1} ];

(iia) for j = 1, ..., 9:
b_j | θ, a, b_{j-1}, b_{j+1}, Ψ_j, W
~ N_3[ (51 Ψ_j^{-1} + 2 W^{-1})^{-1} { Ψ_j^{-1} Σ_{i=1}^{51} (θ_ij - X_ij a) + W^{-1} (b_{j+1} + b_{j-1}) },
(51 Ψ_j^{-1} + 2 W^{-1})^{-1} ];

(iib) b_10 | θ, a, b_9, Ψ_10, W
~ N_3[ (51 Ψ_10^{-1} + W^{-1})^{-1} { Ψ_10^{-1} Σ_{i=1}^{51} (θ_{i,10} - X_{i,10} a) + W^{-1} b_9 },
(51 Ψ_10^{-1} + W^{-1})^{-1} ];

(iii) Ψ_j | θ, a, b_1, ..., b_10 ~ ind Inverse Wishart[ Σ_{i=1}^{51} (θ_ij - X_ij a - b_j)
× (θ_ij - X_ij a - b_j)^T, 51 ], (j = 1, ..., 10);

(iv) W | b_1, ..., b_10 ~ Inverse Wishart[ Σ_{j=1}^{10} (b_j - b_{j-1})(b_j - b_{j-1})^T, 10 ];

(v) θ_ij | y_ij, Ψ_j, b_j, a ~ N_3[ (V_ij^{-1} + Ψ_j^{-1})^{-1} (V_ij^{-1} y_ij + Ψ_j^{-1} (X_ij a + b_j)),
(V_ij^{-1} + Ψ_j^{-1})^{-1} ].




With all the complete conditional distributions available in closed form, we can
now estimate the marginal posterior density of θ_ij | y as described earlier in Section 2.2.
We find estimates of the median income of four-person families for each year from 1979 to 1989 (except
for 1980), using all the sample estimates available for three-, four-, and five-person
families since the 1979 census (except for 1980), by running 50 iterations of 100-2000
replications each. The necessary convergence is achieved after 500 iterations. We
denote these HB estimates by HB3. As in Section 2.4, we also consider the non time-series
analysis which utilizes only the census median income figures for 1979, the
CPS median income estimates for 1989, and the PCI's for the years 1979 and 1989.
In this case, we utilize the data available for three-, four-, and five-person families, in
contrast to using only the information for four-person families as in Section 2.4. We
denote these estimates by HB4. These estimates are based on a trivariate version of
the HB model considered in Datta, Fay, and Ghosh (1991). Table 3.1 provides the
sample median incomes as well as the HB3 and HB4 estimates. The HB3 estimates
are based on the time series HB model described earlier in this section. The standard errors
associated with these estimates are given in Table 3.2. Again, in conformity with the
results of the univariate analysis in Chapter 2, the time domain HB procedure
results in substantial reduction of the standard errors when compared to the sample
estimates. Also, the standard errors associated with the HB3 estimates are lower than
the standard errors of the HB4 estimates for each state.

The median income estimates along with their standard errors for the year
1979 are provided in Table 3.3. Also, as in Section 2.4, we compare the retrospective
estimates obtained for 1979 by using the time series model against the 1980 census,






treating the latter as the "truth." We also compute the criteria average relative bias,
average squared relative bias, average absolute bias, and average squared deviation (as
defined in Section 2.4) for the HB3 estimates. These quantities are provided in Table
3.4. Again, the HB3 estimates are clearly superior to the sample median estimates for
each state. In terms of the average relative bias, the HB3 estimates improve on the
sample medians by 87.62%. Using the criterion of average squared relative bias, the
corresponding percentage improvement is 98.24. Similarly, for average absolute bias
and average squared deviation as the criteria, the respective percentage improvements
of the HB3 estimates over the sample medians are 87.69 and 98.28. Thus the time series
HB procedure improves tremendously on the sample estimates.

We also compare the HB3 estimates for the year 1979, based on the trivariate time
series HB model, with the HB1 estimates based on the univariate analysis in Chapter 2.
In terms of the average relative bias, the HB3 estimates improve on the HB1 estimates
by 37.04%. Using the criterion of average squared relative bias, the corresponding
percentage improvement is 57.14. Similarly, for average absolute bias and average
squared deviation as the criteria, the respective percentage improvements of the HB3
estimates over the HB1 estimates are 33.67 and 53.94. Thus, the trivariate time series
HB model captures the inter-relationships among the median incomes of three-,
four-, and five-person families to improve the median income estimates of four-person
families.








Table 3.1. Median Income for 4-Person Family by State in 1989
(In Dollars)
State  Sample Median  HB3    HB4
1      37700          37826  38172
2      52720          49130  48988
3      41489          40889  41118
4      55729          53372  52986
5      47106          43723  43578
6      55048          54755  54140
7      43434          43760  43513
8      53128          53756  53076
9      40990          40582  40508
10     44241          42408  42326
11     40297          38250  38121
12     42590          42732  42360
13     43260          43405  43090
14     42226          41033  40932
15     45432          43043  42748
16     38711          37100  37122
17     39232          38512  38491
18     36966          35498  35790
19     32610          33137  33514
20     40894          38846  38988
21     42781          38934  38990
22     39819          43068  42751
23     48252          51029  50229
24     36587          39918  39842
25     49718          46250  46019








Table 3.1. continued
State  Sample Median  HB3    HB4
26     31604          31102  31346
27     37423          38078  38174
28     31834          35686  35865
29     45876          40665  40830
30     37194          36952  36919
31     32186          33722  33976
32     31327          33911  34048
33     32245          34192  34365
34     31330          31902  32284
35     33317          31516  32021
36     36219          34409  34543
37     40358          34252  34551
38     32679          34226  34137
39     29586          32865  32944
40     32227          33112  33319
41     38931          37177  36842
42     36732          40034  39727
43     29761          30098  30441
44     38969          38132  38100
45     37938          36890  36953
46     34579          38654  38221
47     42392          42561  42316
48     41419          39475  39404
49     41121          42787  42421
50     41379          49151  47671
51     44999          45806  45262








Table 3.2. Estimated Standard Errors of Median Income Estimates
for 4-Person Family by State in 1989
(In Dollars )
State  Sample Median  HB3   HB4
1 3230 1867 2051
2 4132 2129 2281
3 2848 1825 1925
4 2452 1710 1848
5 4512 2065 2168
6 3893 2291 2399
7 1512 1167 1231
8 1933 1389 1545
9 1584 1157 1194
10 1845 1350 1399
11 2970 1774 1830
12 1763 1272 1342
13 1597 1225 1277
14 2310 1581 1630
15 3251 1865 1925
16 2258 1501 1560
17 3677 1875 1928
18 2249 1537 1615
19 2130 1494 1586
20 2502 1687 1737
21 3359 1952 2016
22 3699 1972 2024
23 4665 2214 2355
24 4645 2235 2295
25 4349 2017 2104





Table 3.2. continued
State  Sample Median  HB3   HB4
26     2278           1515  1594
27     1513           1140  1194
28     2965           1766  1842
29     4385           2049  2148
30     1397           1113  1154
31     2885           1774  1839
32     2446           1702  1761
33     2690           1706  1768
34     2392           1614  1701
35     2435           1587  1704
36     3314           1841  1931
37     3923           2031  2135
38     1552           1287  1292
39     2557           1693  1731
40     2268           1516  1573
41     3516           2030  2280
42     3261           1897  1944
43     2811           1661  1773
44     5003           2022  2089
45     3209           1745  1809
46     2863           1863  1914
47     2525           1679  1749
48     3056           1821  1881
49     1586           1216  1235
50     3637           2260  2607
51     4805           2144  2252








Table 3.3. Median Income Estimates for 4-Person Family by State in 1979
(In Dollars)
State  Truth  Sample Median  Sample SE  HB3 Median  HB3 SE
1      18319  18084          1533       18206       511
2      22027  23129          1647       22032       481
3      19424  18438          1642       19246       498
4      23772  25324          1414       23808       467
5      22107  23597          2409       22094       499
6      25712  25037          1431       25631       493
7      22669  21852          766        22414       395
8      26014  25486          1264       25938       482
9      22266  23326          878        22440       453
10     23279  22397          843        23061       412
11     23014  22716          1041       22819       449
12     25410  24294          1025       25167       459
13     25111  24103          1006       24963       454
14     23320  23270          1202       23322       471
15     24044  25137          1571       23928       526
16     22351  23851          1592       22375       477
17     21891  20743          1255       21695       456
18     20511  18567          1535       20211       491
19     18674  19826          1588       18621       506
20     21438  22603          1802       21395       482
21     22127  22014          1586       22025       466
22     23627  21860          1900       23476       493
23     26203  26235          1722       26202       517
24     21862  20232          3801       21737       507
25     22757  24160          1418       22866       486








Table 3.3. continued
State  Truth  Sample Median  Sample SE  HB3 Median  HB3 SE
26     20214  18274          1380       19890       496
27     19772  20296          1012       19702       445
28     19944  19282          1795       19757       502
29     20668  22687          1196       20824       489
30     21086  19675          1042       20717       464
31     19685  18657          1285       19539       484
32     19693  19776          1274       19532       471
33     19926  17978          1282       19541       510
34     18150  19167          1762       18106       517
35     17893  18917          1507       17844       517
36     21412  18965          1444       21084       491
37     20659  19295          1675       20426       489
38     22521  22963          873        22501       416
39     20776  21216          1625       20646       482
40     19961  21533          1543       20008       500
41     24641  22013          1559       24412       505
42     23757  26746          1677       23944       518
43     19257  19968          1597       19167       496
44     21924  21733          1828       21861       489
45     21572  20727          1440       21366       469
46     24438  25447          2051       24416       498
47     24394  25463          1415       24399       477
48     22688  24251          1500       22720       481
49     24752  24265          767        24518       418
50     31018  30788          1891       30937       660
51     24966  23744          2263       24873       508






Table 3.4. Average Relative Bias, Average Squared Relative Bias, Average Absolute Bias, and Average Squared Deviations of the Estimates.

Sample Median HB3


Aver. Rel. Bias 0.04984 0.00617

Aver. Sq. Rel. Bias 0.00340 0.00006

Aver. Abs. Bias 1090.41 134.27

Aver. Sq. Dev. 1631203.47 27996.90














CHAPTER 4
HIERARCHICAL BAYES PREDICTION IN FINITE POPULATIONS



4.1 Introduction


In Chapter 2 and Chapter 3, we introduced HB procedures for small domain

estimation based on data from repeated surveys from an infinite population. In such

models, we assumed that the observations are available only at the aggregate level.

There we derived the HB estimators for the current as well as previous time points,

simultaneously for all the small area population means.

In this Chapter, we consider the problem of small domain estimation in the context

of finite population sampling. In the finite population setup, we assume that there are

m strata, the ith stratum at time j contains a finite number of units Nij, and the unit

level information on the characteristic of interest at time j (j = 1, ..., t) is available

through a sequence of random sample surveys of the given strata populations. We

are interested in predicting certain parameters (such as finite population mean or

total or change in mean over two distinct time points for each stratum) of the strata

populations which are changing over time.

We describe a general HB approach in Section 4.2. There we consider the most

general situation where all the variance components are unknown. Prior distributions
are assigned to the vector of regression coefficients and the variance components.
This section provides the predictive distribution of unobserved population







units at all the time points given the time series data on the sampled units. The
computation of the estimates of the population means simultaneously for several small
areas at all the time points through the implementation of the Gibbs sampling algorithm
(as described in Section 2.2.1) has also been discussed in this section.

In Section 4.3, we consider a special case of the general HB model in Section

4.2. We assume that λ, the vector of ratios of variance components, is known. In

such a case, it is possible to find the posterior mean vector and posterior variance-

covariance matrix corresponding to the characteristic vector of non-sampled units

at all the time points, in closed form. In the subsequent Sections 4.4-4.6, we assume

this special case of the HB model. The corresponding HB predictor as obtained

in Section 4.3 is shown to possess some interesting optimal properties. Datta and

Ghosh (1991) also assumed known ratios of the variance components in a special case

of their general HB approach for small area estimation. The HB predictors in their

special case have been shown to be the best within the class of all linear unbiased

predictors. For a class of spherically symmetric distributions, their HB predictors

are also optimal within the class of all unbiased predictors. In Datta and Ghosh

(1990) these HB predictors dominate stochastically the linear unbiased predictors in

the sense of Hwang (1985). Furthermore, under a suitable group of transformations,

Datta and Ghosh (1990-91) showed these predictors to be the "best" within the class

of all equivariant predictors for elliptically symmetric distributions. In the set up of

Datta and Ghosh, the HB predictor borrowed strength from other small areas. In our

set-up, as we have mentioned earlier in Chapter 2, the HB predictor borrows strength

over time as well in addition to borrowing strength from other small areas. Our HB

predictors, as derived in Section 4.3 based on a time series HB model, possess all

the optimal properties listed above. In establishing this in Sections 4.4-4.6, we have

followed the approach as in Datta and Ghosh (1990-91). In Section 4.4, it is shown

that the HB predictors of section 4.3 are best unbiased predictors within the class of







all unbiased predictors. Besides, it is shown that these predictors are BLUPs. Section

4.5 establishes the stochastic dominance (cf. Hwang, 1985) of these predictors over
the LUPs for elliptically symmetric distributions. Finally, in Section 4.6, we have

shown that these predictors are also best equivariant predictors under both the matrix
loss and the standardized quadratic loss, under a suitable group of transformations, for a
class of elliptically symmetric distributions.


4.2 Description of the Hierarchical Bayes Model


In this section, we consider a hierarchical model for prediction of the characteristic
of interest (such as the population total or mean for each small area or domain) in the
context of finite population sampling. Let Y_ijk denote some characteristic of interest
associated with the kth unit at time j in the ith small area (k = 1, ..., N_ij; j =
1, ..., t; i = 1, ..., m). For notational simplicity, we assume that the characteristic
of interest is a scalar. The results can easily be extended to the case when Y_ijk is a
vector. Let

Y_ij^(1) = (Y_ij1, ..., Y_ij n_ij)^T,

Y_ij^(2) = (Y_ij,n_ij+1, ..., Y_ij N_ij)^T.

The superscripts (1) and (2) are used to denote the sampled and non-sampled units
in the ith small area at time j, respectively. N_ij is the population size of the ith small
area at time j, from which a sample of size n_ij is taken (1 ≤ i ≤ m; 1 ≤ j ≤ t).
X_ij^(1) (n_ij × p) and Z_ij^(1) (n_ij × q) are known design matrices corresponding to the sampled
units; X_ij^(2) ((N_ij - n_ij) × p) and Z_ij^(2) ((N_ij - n_ij) × q) correspond to the non-sampled
units.

We consider the following Bayesian model:

I. Conditional on the θ_ij, α, b_1, ..., b_t, σ², and λ,

(Y_ij^(1)T, Y_ij^(2)T)^T ~ind N((θ_ij^(1)T, θ_ij^(2)T)^T, σ² V_ij(λ)), (i = 1, ..., m; j = 1, ..., t);

II. θ_ij | α, b_1, ..., b_t, σ², λ ~ind N(X_ij α + Z_ij b_j, σ² Ψ_j(λ)), (i = 1, ..., m; j = 1, ..., t);

III. b_j | b_{j-1}, W, σ², λ ~ N(H_j b_{j-1}, σ² W(λ)), (j = 1, ..., t).

We assume that V_ij(λ) (i = 1, ..., m; j = 1, ..., t), Ψ_j(λ) (j = 1, ..., t), and W(λ) are
structurally known except possibly for some unknown λ = (λ_1, ..., λ_s)^T. The H_j (q × q),
(j = 1, ..., t), are known matrices. b_0 is assumed to be known. Without any loss of
generality, we assume b_0 = 0.

Before writing stages I-III of the above model in a compact form, we need to
introduce some notation. We denote a u × u identity matrix by I_u; 0 denotes a
null matrix of suitable order.

We define

C_t^(1) = (C_1t^(1)T, C_2t^(1)T, ..., C_mt^(1)T)^T,

C_it^(1) = (Y_i1^(1)T, Y_i2^(1)T, ..., Y_it^(1)T)^T, (i = 1, ..., m);

C_t^(2) = (C_1t^(2)T, C_2t^(2)T, ..., C_mt^(2)T)^T,

C_it^(2) = (Y_i1^(2)T, Y_i2^(2)T, ..., Y_it^(2)T)^T, (i = 1, ..., m);

X_it^(1) = (X_i1^(1)T, X_i2^(1)T, ..., X_it^(1)T)^T, (i = 1, ..., m);

X_it^(2) = (X_i1^(2)T, X_i2^(2)T, ..., X_it^(2)T)^T, (i = 1, ..., m);

X_t^(1) = (X_1t^(1)T, X_2t^(1)T, ..., X_mt^(1)T)^T;

X_t^(2) = (X_1t^(2)T, X_2t^(2)T, ..., X_mt^(2)T)^T;

U_jl = H_j H_{j-1} ··· H_l, l = 1, 2, ..., j; j = 1, ..., t;

U_{j,j+1} = I_q, j = 1, ..., t;

B_it^(1) = the t × t block lower-triangular matrix whose (j, l)th block is
Z_ij^(1) U_{j,l+1} for 1 ≤ l ≤ j ≤ t and a null matrix for l > j, (i = 1, ..., m);

B_it^(2) = the corresponding block matrix with Z_ij^(2) in place of Z_ij^(1), (i = 1, ..., m);

B_t^(1) = (B_1t^(1)T, B_2t^(1)T, ..., B_mt^(1)T)^T;

B_t^(2) = (B_1t^(2)T, B_2t^(2)T, ..., B_mt^(2)T)^T;

Ψ_t = Block Diag(Ψ_1, ..., Ψ_t);

V_t = Block Diag(V_11, ..., V_1t, V_21, ..., V_2t, ..., V_m1, ..., V_mt).

We now write the model as a compact linear mixed model which incorporates all the
information up to and including time t for all the small areas. We have, without any
loss of generality, listed the sampled units first. Writing A_t^(1) = X_t^(1) and
A_t^(2) = X_t^(2),

C_t = (C_t^(1)T, C_t^(2)T)^T = A_t α + B_t w + η,   (4.2.1)

where

A_t = (A_t^(1)T, A_t^(2)T)^T,  B_t = (B_t^(1)T, B_t^(2)T)^T,

w ~ N(0, σ²(I_t ⊗ W(λ))),

η ~ N(0, σ²((I_m ⊗ Ψ_t(λ)) + V_t(λ))).

Equivalently, writing R = (σ²)^{-1}, given α, λ, and σ²,

C_t ~ N(A_t α, R^{-1} Σ_t),   (4.2.2)

where

Σ_t = Σ_t(λ) = B_t(I_t ⊗ W)B_t^T + (I_m ⊗ Ψ_t) + V_t.

Furthermore, we partition Σ_t as

Σ_t = [Σ_t11  Σ_t12; Σ_t21  Σ_t22],

where Σ_t11 is n_t × n_t, Σ_t12 (= Σ_t21^T) is n_t × (N_t − n_t), and Σ_t22 is
(N_t − n_t) × (N_t − n_t), with

n_t = Σ_{i=1}^m Σ_{j=1}^t n_ij,  N_t = Σ_{i=1}^m Σ_{j=1}^t N_ij.

We also define

Σ_t22.1 = Σ_t22 − Σ_t21 Σ_t11^{-1} Σ_t12;

K_t = Σ_t11^{-1} − Σ_t11^{-1} A_t^(1) (A_t^(1)T Σ_t11^{-1} A_t^(1))^{-1} A_t^(1)T Σ_t11^{-1};

M_t = Σ_t21 K_t + A_t^(2) (A_t^(1)T Σ_t11^{-1} A_t^(1))^{-1} A_t^(1)T Σ_t11^{-1};

G_t = Σ_t22.1 + (A_t^(2) − Σ_t21 Σ_t11^{-1} A_t^(1)) (A_t^(1)T Σ_t11^{-1} A_t^(1))^{-1} (A_t^(2) − Σ_t21 Σ_t11^{-1} A_t^(1))^T.

Our objective is to use all the data available at a given time in predicting

ξ_t = ξ_t(C_t^(1), C_t^(2)) = P_1 C_t^(1) + P_2 C_t^(2),

where P_1 and P_2 are known matrices. In particular, we may be interested in
predicting retrospectively the population totals for the small areas at the previous
time points based on all the available sampled data at a given time. To this end, we
derive the predictive distribution of C_t^(2) given C_t^(1) = c_t^(1) in the following
theorem. A proof of this theorem is given in the Appendix.

Theorem 4.2.1 Consider the model in (4.2.1) or (4.2.2). Assume that α, R, λ_1 R, ..., λ_s R
are independently distributed with

α ~ Uniform(R^p),

R ~ Gamma(½a_0, ½h_0); a_0 > 0, h_0 ≥ 0,

λ_l R ~ Gamma(½a_l, ½h_l); l = 1, ..., s,

with a_l > 0, h_l ≥ 0 (l = 1, ..., s). Assume that n_t + Σ_{l=0}^s h_l − p > 2. Then, conditional
on λ and C_t^(1) = c_t^(1), C_t^(2) is distributed as multivariate t with degrees of
freedom n_t + Σ_{l=0}^s h_l − p, location parameter M_t c_t^(1) and scale parameter

Ω_t = (n_t + Σ_{l=0}^s h_l − p)^{-1} (a_0 + Σ_{l=1}^s a_l λ_l + c_t^(1)T K_t c_t^(1)) G_t.   (4.2.3)

Also, the conditional distribution of λ given C_t^(1) = c_t^(1) has the pdf

f(λ | c_t^(1)) ∝ |Σ_t11|^{-1/2} |A_t^(1)T Σ_t11^{-1} A_t^(1)|^{-1/2} [Π_{l=1}^s λ_l^{h_l/2 − 1}]
× (a_0 + Σ_{l=1}^s a_l λ_l + c_t^(1)T K_t c_t^(1))^{−(n_t + Σ_{l=0}^s h_l − p)/2}.   (4.2.4)

Now the conditional expectation and variance of C_t^(2) given C_t^(1) = c_t^(1) can be derived
as

E(C_t^(2) | c_t^(1)) = E(M_t | c_t^(1)) c_t^(1),   (4.2.5)

V(C_t^(2) | c_t^(1)) = V(E(C_t^(2) | λ, c_t^(1)) | c_t^(1)) + E(V(C_t^(2) | λ, c_t^(1)) | c_t^(1))

= V(M_t c_t^(1) | c_t^(1)) + (n_t + Σ_{l=0}^s h_l − p − 2)^{-1}

× E[(a_0 + Σ_{l=1}^s a_l λ_l + c_t^(1)T K_t c_t^(1)) G_t | c_t^(1)].   (4.2.6)
From (4.2.5) and (4.2.6), for t ≥ 1 we can find the posterior mean and variance
of ξ_t = ξ_t(C_t^(1), C_t^(2)) = P_1 C_t^(1) + P_2 C_t^(2), where P_1 and P_2 are known
matrices. The Bayes estimate of ξ_t(C_t^(1), C_t^(2)) under any quadratic loss is its
posterior mean, and is given by

e_t^B(c_t^(1)) = [P_1 + P_2 E(M_t | c_t^(1))] c_t^(1).

Also,

V[ξ_t(c_t^(1), C_t^(2)) | c_t^(1)] = P_2 V(C_t^(2) | c_t^(1)) P_2^T.
Note that when P_1 = ⊕_{i=1}^m P_{1i} and P_2 = ⊕_{i=1}^m P_{2i}, where

P_{1i} = [0^T, ..., 0^T, 1_{n_ij}^T, 0^T, ..., 0^T]

and

P_{2i} = [0^T, ..., 0^T, 1_{N_ij − n_ij}^T, 0^T, ..., 0^T],

ξ_t(C_t^(1), C_t^(2)) reduces to the vector of finite population totals for the m small areas
at time j (j = 1, 2, ..., t), whereas for the choice P_1 = ⊕_{i=1}^m P_{1i}/N_ij and
P_2 = ⊕_{i=1}^m P_{2i}/N_ij, ξ_t(C_t^(1), C_t^(2)) reduces to the vector of finite population means
for the m small areas at time j (j = 1, 2, ..., t).

Also, for t ≥ 2, when P_1 = ⊕_{i=1}^m P_{1i} and P_2 = ⊕_{i=1}^m P_{2i}, where

P_{1i} = [0^T, ..., 0^T, −N_{i,t−1}^{-1} 1_{n_{i,t−1}}^T, N_{it}^{-1} 1_{n_it}^T]

and

P_{2i} = [0^T, ..., 0^T, −N_{i,t−1}^{-1} 1_{N_{i,t−1} − n_{i,t−1}}^T, N_{it}^{-1} 1_{N_it − n_it}^T],

ξ_t(C_t^(1), C_t^(2)) reduces to the vector of changes in mean at time t from the previous time
point, for the m small areas.
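The block structure of these selection matrices can be illustrated numerically. The
following sketch is not from the dissertation; all sizes and values are hypothetical.
It builds P_1 and P_2 for the population totals at a fixed period j and checks that
P_1 c^(1) + P_2 c^(2) reproduces the directly computed area totals.

```python
import numpy as np

rng = np.random.default_rng(2)
m, t, j = 3, 2, 1                        # m areas, t periods, target period j (0-based)
n = rng.integers(2, 5, size=(m, t))      # hypothetical n_ij (sampled counts)
N = n + rng.integers(1, 4, size=(m, t))  # hypothetical N_ij (population sizes)

def indicator_row(sizes, j):
    """Row [0,...,0, 1_{sizes[j]}^T, 0,...,0] over one area's stacked periods."""
    v = np.zeros(sizes.sum())
    start = sizes[:j].sum()
    v[start:start + sizes[j]] = 1.0
    return v

P1 = np.zeros((m, n.sum()))
P2 = np.zeros((m, (N - n).sum()))
o1 = o2 = 0
for i in range(m):
    P1[i, o1:o1 + n[i].sum()] = indicator_row(n[i], j)
    P2[i, o2:o2 + (N[i] - n[i]).sum()] = indicator_row(N[i] - n[i], j)
    o1 += n[i].sum()
    o2 += (N[i] - n[i]).sum()

c1 = rng.standard_normal(n.sum())        # stacked sampled values  (c_t^(1))
c2 = rng.standard_normal((N - n).sum())  # stacked unsampled values (c_t^(2))
totals = P1 @ c1 + P2 @ c2               # period-j population totals, one per area

# direct check against sums over the period-j slices of each area's stack
direct, o1, o2 = [], 0, 0
for i in range(m):
    s1 = n[i][:j].sum()
    s2 = (N[i] - n[i])[:j].sum()
    direct.append(c1[o1 + s1:o1 + s1 + n[i, j]].sum()
                  + c2[o2 + s2:o2 + s2 + N[i, j] - n[i, j]].sum())
    o1 += n[i].sum()
    o2 += (N[i] - n[i]).sum()
print(np.allclose(totals, direct))   # True
```

Replacing the indicator rows by rows divided by N_ij gives the corresponding means,
and differencing two such rows gives the change-in-mean choice above.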

However, in order to find the conditional distribution of C_t^(2) given C_t^(1) = c_t^(1)
analytically, we need the posterior density f(λ | c_t^(1)) given in (4.2.4). Even
when λ is a low-dimensional vector, this density is not easy to obtain, even with the
use of sophisticated numerical or analytic approximation techniques. In contrast, this
posterior density, as well as the conditional expectation and variance in (4.2.5) and
(4.2.6), can be found with considerable ease in the more general case when Ψ_j (j =
1, ..., t) and W are unknown, by employing the Gibbs sampling algorithm. It is
assumed that V_ij is known. In such a case, we rewrite the model as:








I. Conditional on the θ_ij, α, b_1, ..., b_t, R = r, Ψ_1, ..., Ψ_t, and W,

(Y_ij^(1)T, Y_ij^(2)T)^T ~ind N((θ_ij^(1)T, θ_ij^(2)T)^T, r^{-1} V_ij),

where V_ij = [V_ij11  V_ij12; V_ij21  V_ij22], (i = 1, ..., m; j = 1, ..., t);

II. θ_ij | α, b_1, ..., b_t, R = r, Ψ_1, ..., Ψ_t, W ~ind N(X_ij α + Z_ij b_j, r^{-1} Ψ_j),

(i = 1, ..., m; j = 1, ..., t);

III. b_j | b_{j-1}, W, R = r ~ N(H_j b_{j-1}, r^{-1} W), (j = 1, ..., t).

We assume that α, R, Ψ_1, ..., Ψ_t, and W are mutually independent with

α ~ Uniform(R^p),

R ~ Gamma(½a_0, ½h_0); a_0 ≥ 0, h_0 > 0,

Ψ_j ~ Inverse Wishart(0, k_j), (j = 1, ..., t),

W ~ Inverse Wishart(0, k_0).

The Gibbs sampling algorithm requires samples from the following complete conditional
distributions:

α | c_t^(1), c_t^(2), θ, R, b_1, ..., b_t, Ψ_1, ..., Ψ_t, W;

b_j | c_t^(1), c_t^(2), θ, R, α, Ψ_1, ..., Ψ_t, b_1, ..., b_{j-1}, b_{j+1}, ..., b_t, W, (j = 1, ..., t);

Ψ_1, ..., Ψ_t | c_t^(1), c_t^(2), θ, R, α, b_1, ..., b_t, W;

W | c_t^(1), c_t^(2), θ, R, α, b_1, ..., b_t, Ψ_1, ..., Ψ_t;

θ_ij | c_t^(1), c_t^(2), R, α, b_1, ..., b_t, Ψ_1, ..., Ψ_t, W, (i = 1, ..., m; j = 1, ..., t);

Y_ij^(2) | c_t^(1), θ, R, α, b_1, ..., b_t, Ψ_1, ..., Ψ_t, W, (i = 1, ..., m; j = 1, ..., t).

The hierarchical structure of the model and the conjugacy of the priors lead to the
following complete conditional distributions:






(i) α | c_t^(1), c_t^(2), θ, r, b_1, ..., b_t, Ψ_1, ..., Ψ_t, W

~ α | θ, r, b_1, ..., b_t, Ψ_1, ..., Ψ_t

~ N_p[(Σ_{i=1}^m Σ_{j=1}^t X_ij^T Ψ_j^{-1} X_ij)^{-1} Σ_{i=1}^m Σ_{j=1}^t X_ij^T Ψ_j^{-1} (θ_ij − Z_ij b_j),

r^{-1} (Σ_{i=1}^m Σ_{j=1}^t X_ij^T Ψ_j^{-1} X_ij)^{-1}];

(iia) For j = 1, ..., t − 1:

b_j | c_t^(1), c_t^(2), θ, r, α, b_1, ..., b_{j-1}, b_{j+1}, ..., b_t, Ψ_1, ..., Ψ_t, W

~ N_q[(Σ_{i=1}^m Z_ij^T Ψ_j^{-1} Z_ij + H_{j+1}^T W^{-1} H_{j+1} + W^{-1})^{-1}

× {Σ_{i=1}^m Z_ij^T Ψ_j^{-1} (θ_ij − X_ij α) + H_{j+1}^T W^{-1} b_{j+1} + W^{-1} H_j b_{j-1}},

r^{-1} (Σ_{i=1}^m Z_ij^T Ψ_j^{-1} Z_ij + H_{j+1}^T W^{-1} H_{j+1} + W^{-1})^{-1}];

(iib) b_t | c_t^(1), c_t^(2), θ, r, α, b_1, ..., b_{t-1}, Ψ_1, ..., Ψ_t, W

~ N_q[(Σ_{i=1}^m Z_it^T Ψ_t^{-1} Z_it + W^{-1})^{-1}

× {Σ_{i=1}^m Z_it^T Ψ_t^{-1} (θ_it − X_it α) + W^{-1} H_t b_{t-1}},

r^{-1} (Σ_{i=1}^m Z_it^T Ψ_t^{-1} Z_it + W^{-1})^{-1}];

(iii) Ψ_1, ..., Ψ_t | c_t^(1), c_t^(2), r, α, θ, b_1, ..., b_t, W

~ Ψ_1, ..., Ψ_t | r, α, θ, b_1, ..., b_t: independently,

Ψ_j ~ Inverse Wishart[r Σ_{i=1}^m (θ_ij − X_ij α − Z_ij b_j)(θ_ij − X_ij α − Z_ij b_j)^T, m + k_j];

(iv) W | c_t^(1), c_t^(2), r, α, θ, b_1, ..., b_t, Ψ_1, ..., Ψ_t

~ W | r, b_1, ..., b_t

~ Inverse Wishart[r Σ_{j=1}^t (b_j − H_j b_{j-1})(b_j − H_j b_{j-1})^T, t + k_0];

(v) θ_ij | c_t^(1), c_t^(2), r, α, b_1, ..., b_t, Ψ_1, ..., Ψ_t, W

~ N[(V_ij^{-1} + Ψ_j^{-1})^{-1} (V_ij^{-1} y_ij + Ψ_j^{-1} (X_ij α + Z_ij b_j)),

r^{-1} (V_ij^{-1} + Ψ_j^{-1})^{-1}];

(vi) Y_ij^(2) | c_t^(1), θ, r, α, b_1, ..., b_t, Ψ_1, ..., Ψ_t, W

~ N_{N_ij − n_ij}[θ_ij^(2) + V_ij21 V_ij11^{-1} (y_ij^(1) − θ_ij^(1)),

r^{-1} (V_ij22 − V_ij21 V_ij11^{-1} V_ij12)].
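A minimal sketch may make the cycle of complete conditionals concrete. The toy
sampler below is an illustration, not the dissertation's implementation: it takes one
unit per area and period (so each θ_ij is scalar), holds r, Ψ_j, V_ij, W and H_j fixed
at hypothetical scalar values, and cycles only through steps (i), (iia)-(iib) and (v);
a full sampler would add draws (iii), (iv) and (vi).

```python
import numpy as np

rng = np.random.default_rng(1)
m, t, p = 4, 5, 2                      # areas, periods, dim(alpha); one unit per (i, j)
r, psi, v, w = 1.0, 0.5, 1.0, 0.25     # hypothetical fixed r, Psi_j, V_ij, W (scalars)
H = np.ones(t + 1)                     # H_j = 1: a random-walk state equation
X = rng.standard_normal((m, t, p))     # X_ij (1 x p here); Z_ij = 1, q = 1
y = rng.standard_normal((m, t))        # observed scalar responses

alpha, b, theta = np.zeros(p), np.zeros(t), y.copy()
draws = []
for sweep in range(200):
    # (i) alpha | rest: Gaussian with precision r * sum_ij X'X / psi
    XtX = np.einsum('ijk,ijl->kl', X, X) / psi
    Xtr = np.einsum('ijk,ij->k', X, theta - b[None, :]) / psi   # Z_ij b_j = b_j here
    alpha = rng.multivariate_normal(np.linalg.inv(XtX) @ Xtr, np.linalg.inv(XtX) / r)
    # (iia)/(iib) b_j | rest, with b_0 = 0
    resid = theta - np.einsum('ijk,k->ij', X, alpha)
    for j in range(t):
        prec = m / psi + 1.0 / w + (H[j + 1] ** 2 / w if j < t - 1 else 0.0)
        mean = resid[:, j].sum() / psi + H[j] * (b[j - 1] if j > 0 else 0.0) / w
        if j < t - 1:
            mean += H[j + 1] * b[j + 1] / w
        b[j] = rng.normal(mean / prec, np.sqrt(1.0 / (r * prec)))
    # (v) theta_ij | rest
    prec = 1.0 / v + 1.0 / psi
    mean = (y / v + (np.einsum('ijk,k->ij', X, alpha) + b[None, :]) / psi) / prec
    theta = rng.normal(mean, np.sqrt(1.0 / (r * prec)))
    draws.append(theta.mean())
print(len(draws))
```

Posterior summaries such as (4.2.5)-(4.2.6) are then approximated by averaging the
retained draws after a burn-in period.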

4.3 The Hierarchical Bayes Predictor in a Special Case


This section considers a special case of the HB model in Section 4.2. We assume
that the vector of the ratios of variance components, λ, is known. Our target is again
to find an optimal predictor for

ξ_t = ξ_t(C_t^(1), C_t^(2)) = P_1 C_t^(1) + P_2 C_t^(2)

for some appropriate P_1 and P_2, based on all the available data at a given time. To
this end, it suffices to find the predictive distribution of C_t^(2) given C_t^(1) = c_t^(1). This
is done in the following theorem.







Theorem 4.3.1 Consider the model given in (4.2.1) or (4.2.2) with λ known. Assume
that α and R are independently distributed with

α ~ Uniform(R^p),

R ~ Gamma(½a_0, ½h_0); a_0 ≥ 0, h_0 > 0.

Also, assume that n_t + h_0 − p > 2. Then, conditional on C_t^(1) = c_t^(1), C_t^(2) is distributed
as multivariate t with degrees of freedom n_t + h_0 − p, location parameter M_t c_t^(1) and
scale parameter

Ω_t = (n_t + h_0 − p)^{-1} (a_0 + c_t^(1)T K_t c_t^(1)) G_t.   (4.3.7)

Proof of this theorem is similar to that of Theorem 4.2.1 and is omitted.
In this case, it is possible to obtain the Bayes estimate of ξ_t = ξ_t(C_t^(1), C_t^(2)) in
closed form; it is given by

e_t^B*(c_t^(1)) = (P_1 + P_2 M_t) c_t^(1).   (4.3.8)

Remark 4.3.1 We may note that the predictor e_t^B*(c_t^(1)) given in (4.3.8) can be
obtained without using any numerical integration techniques or the Gibbs sampling
algorithm.

Remark 4.3.2 The predictor e_t^B*(c_t^(1)) does not depend on the choice of the (proper)
prior distribution of R. In this sense, the predictor e_t^B*(c_t^(1)) is robust against the
choice of priors for R. To see this,

E(C_t^(2) | C_t^(1)) = E[E{E(C_t^(2) | α, R, C_t^(1)) | R, C_t^(1)} | C_t^(1)]

= E[E{A_t^(2) α + Σ_t21 Σ_t11^{-1} (C_t^(1) − A_t^(1) α) | R, C_t^(1)} | C_t^(1)]

= Σ_t21 Σ_t11^{-1} C_t^(1) + (A_t^(2) − Σ_t21 Σ_t11^{-1} A_t^(1))

× E[E{α | R, C_t^(1)} | C_t^(1)]

= Σ_t21 Σ_t11^{-1} C_t^(1) + (A_t^(2) − Σ_t21 Σ_t11^{-1} A_t^(1))

× (A_t^(1)T Σ_t11^{-1} A_t^(1))^{-1} A_t^(1)T Σ_t11^{-1} C_t^(1)

= M_t C_t^(1).

It is assumed that all the expectations appearing above exist.
Remark 4.3.3 The predictor e_t^B*(c_t^(1)) of ξ_t(C_t^(1), C_t^(2)) can also be obtained in
an alternative manner. Suppose first that α is known, and r may be known or unknown. In
such a case, the best predictor (best linear predictor without the normality assumption)
of ξ_t(C_t^(1), C_t^(2)), in the sense of having the smallest mean squared error matrix,
is given by

E_{α,R}[ξ_t(C_t^(1), C_t^(2)) | C_t^(1)] = P_1 C_t^(1) + P_2 [Σ_t21 Σ_t11^{-1} C_t^(1)

+ (A_t^(2) − Σ_t21 Σ_t11^{-1} A_t^(1)) α]  a.e. (C_t^(1)).

Now, when α is unknown, one replaces α by its UMVUE

(A_t^(1)T Σ_t11^{-1} A_t^(1))^{-1} A_t^(1)T Σ_t11^{-1} C_t^(1). The resulting predictor of ξ_t(C_t^(1), C_t^(2))

turns out to be e_t^B*(c_t^(1)). In this sense, e_t^B*(c_t^(1)) is also an EB predictor of ξ_t(C_t^(1), C_t^(2)).
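For a toy problem with λ known, the quantities K_t, M_t and G_t of Section 4.2, and
hence the closed-form predictor (4.3.8), can be computed directly. The sketch below
uses hypothetical dimensions and random inputs (not from the dissertation) and checks
that M_t c^(1) agrees with the plug-in form A_t^(2) α̂ + Σ_t21 Σ_t11^{-1}(c^(1) − A_t^(1) α̂)
noted in Remark 4.3.3.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N2, p = 6, 4, 2            # n_t sampled, N_t - n_t unsampled, dim(alpha); all hypothetical
A1 = rng.standard_normal((n, p))      # A_t^(1)
A2 = rng.standard_normal((N2, p))     # A_t^(2)
M0 = rng.standard_normal((n + N2, n + N2))
Sigma = M0 @ M0.T + (n + N2) * np.eye(n + N2)   # a p.d. stand-in for Sigma_t
S11, S12 = Sigma[:n, :n], Sigma[:n, n:]
S21, S22 = Sigma[n:, :n], Sigma[n:, n:]

S11_inv = np.linalg.inv(S11)
G_alpha = np.linalg.inv(A1.T @ S11_inv @ A1)             # (A1' S11^-1 A1)^-1
K = S11_inv - S11_inv @ A1 @ G_alpha @ A1.T @ S11_inv    # K_t
Mt = S21 @ K + A2 @ G_alpha @ A1.T @ S11_inv             # M_t
S22_1 = S22 - S21 @ S11_inv @ S12                        # Sigma_t22.1
B = A2 - S21 @ S11_inv @ A1
Gt = S22_1 + B @ G_alpha @ B.T                           # G_t

c1 = rng.standard_normal(n)
alpha_hat = G_alpha @ A1.T @ S11_inv @ c1   # UMVUE / GLS estimate of alpha
pred = A2 @ alpha_hat + S21 @ S11_inv @ (c1 - A1 @ alpha_hat)
print(np.allclose(Mt @ c1, pred))   # True
```

With selection matrices P_1, P_2 as in Section 4.2, the predictor (4.3.8) is then just
(P_1 + P_2 @ Mt) @ c1.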

4.4 Best Unbiased Prediction in Small Area Estimation

In this section, we prove the optimality of the predictor e_t^B* within the class of
all unbiased predictors of ξ_t(C_t^(1), C_t^(2)). We consider the model given in (4.2.1) or
(4.2.2) with λ known, but α and r = (σ²)^{-1} unknown. We do not assume any prior
distributions for α and r, and treat them as unknown parameters.
Write φ = (α^T, r)^T.
Definition 4.4.1. A predictor T(C_t^(1)) is said to be a BUP of ξ_t(C_t^(1), C_t^(2)) if

E_φ[T(C_t^(1)) − ξ_t(C_t^(1), C_t^(2))] = 0 for all φ,

and, for every predictor δ(C_t^(1)) of ξ_t(C_t^(1), C_t^(2)) satisfying

E_φ[δ(C_t^(1)) − ξ_t(C_t^(1), C_t^(2))] = 0 for all φ,   (4.4.9)

V_φ[δ(C_t^(1)) − ξ_t(C_t^(1), C_t^(2))] − V_φ[T(C_t^(1)) − ξ_t(C_t^(1), C_t^(2))]   (4.4.10)

is non-negative definite (n.n.d.) for all φ, provided the quantities are finite.
The BUP property of e_t^B* is proved in the following theorem.

Theorem 4.4.1 Consider the model given in (4.2.1) or (4.2.2) with α and r = (σ²)^{-1}
treated as unknown parameters but λ assumed to be known. Then
e_t^B*(C_t^(1)) = (P_1 + P_2 M_t) C_t^(1) is the BUP of ξ_t(C_t^(1), C_t^(2)).

The proof of the theorem is facilitated by the following general lemma, which is
a simple extension of Lemma 3.3.1 of Datta (1990). The lemma concerns unbiased
prediction of a general g(C_t) (u × 1) based on C_t^(1), where g(C_t) is not necessarily
equal to ξ_t(C_t^(1), C_t^(2)) or linear in C_t. It is assumed that a finite second moment
exists for each component g_i(C_t), i = 1, ..., u.

Lemma 4.4.1 A predictor T(C_t^(1)) ∈ U_g is BUP for g(C_t) if and only if

Cov_φ[T_i(C_t^(1)) − g_i(C_t), m(C_t^(1))] = 0   (4.4.11)

for all φ, for every i = 1, ..., u, and for every m ∈ U_0. Here U_g is the class of all
unbiased predictors δ(C_t^(1)) of g(C_t) with each component of δ(C_t^(1)) having a finite
second moment, and U_0 is the class of all real-valued statistics (i.e., functions of C_t^(1))
with finite second moments having zero expectations identically in φ.

Proof of Lemma 4.4.1 (If.) Let T(C_t^(1)) = (T_1(C_t^(1)), ..., T_u(C_t^(1)))^T.
If δ(C_t^(1)) = (δ_1(C_t^(1)), ..., δ_u(C_t^(1)))^T is another predictor in U_g, then

V_φ[δ(C_t^(1)) − g(C_t)] =

V_φ[T(C_t^(1)) − g(C_t)] + V_φ[δ(C_t^(1)) − T(C_t^(1))]

+ Cov_φ[T(C_t^(1)) − g(C_t), δ(C_t^(1)) − T(C_t^(1))]

+ Cov_φ[δ(C_t^(1)) − T(C_t^(1)), T(C_t^(1)) − g(C_t)].   (4.4.12)

Now T_i(C_t^(1)) − δ_i(C_t^(1)) ∈ U_0 for every i = 1, ..., u.
Hence, using the condition of the lemma, for 1 ≤ i, i′ ≤ u,

Cov_φ[T_i(C_t^(1)) − g_i(C_t), T_{i′}(C_t^(1)) − δ_{i′}(C_t^(1))] = 0.   (4.4.13)

From (4.4.12) and (4.4.13) it follows that

V_φ[δ(C_t^(1)) − g(C_t)] = V_φ[T(C_t^(1)) − g(C_t)]

+ V_φ[δ(C_t^(1)) − T(C_t^(1))]   (4.4.14)

for all φ. Hence T(C_t^(1)) is BUP for g(C_t).

(Only if.) Given that T(C_t^(1)) is BUP, we will show that condition (4.4.11) is true.
First we will show that T_i(C_t^(1)) is BUP for g_i(C_t) for every i = 1, ..., u. Let U_i(C_t^(1))
be any unbiased predictor for g_i(C_t). Then δ*(C_t^(1)), the u-component vector obtained
from T(C_t^(1)) by replacing its ith component with U_i(C_t^(1)), belongs to U_g.
Then

V_φ[δ*(C_t^(1)) − g(C_t)] − V_φ[T(C_t^(1)) − g(C_t)]

is n.n.d. by hypothesis. So we have

V_φ[U_i(C_t^(1)) − g_i(C_t)] − V_φ[T_i(C_t^(1)) − g_i(C_t)] ≥ 0,

and consequently T_i(C_t^(1)) is BUP for g_i(C_t).
Now, following the usual Lehmann-Scheffe (1950) technique (also Rao, 1952), for any
m ∈ U_0,

Cov_φ[T_i(C_t^(1)) − g_i(C_t), m(C_t^(1))] = 0   (4.4.15)

for all 1 ≤ i ≤ u. Hence, (4.4.11) holds, and the proof of the lemma is complete.

Remark 4.4.1. It follows from the proof of the above lemma that the BUP of g(Ct)
is unique with probability one.

Proof of Theorem 4.4.1 In view of Lemma 4.4.1, it suffices to show that for every
m(C_t^(1)) ∈ U_0,

Cov_φ[e_t^B*(C_t^(1)) − ξ_t(C_t^(1), C_t^(2)), m(C_t^(1))] = 0 for all φ,

that is, E_φ[P_2(M_t C_t^(1) − C_t^(2)) m(C_t^(1))] = 0 for all φ.
Since, under the model,

M_t C_t^(1) − E(C_t^(2) | C_t^(1)) = (A_t^(2) − Σ_t21 Σ_t11^{-1} A_t^(1))

× [(A_t^(1)T Σ_t11^{-1} A_t^(1))^{-1} A_t^(1)T Σ_t11^{-1} C_t^(1) − α],

using E_φ[m(C_t^(1))] = 0, it suffices to show that

E_φ[(A_t^(1)T Σ_t11^{-1} C_t^(1)) m(C_t^(1))] = 0   (4.4.16)

for all φ. Since E_φ[m(C_t^(1))] = 0,

∫ m(c_t^(1)) exp[−½r(c_t^(1) − A_t^(1)α)^T Σ_t11^{-1} (c_t^(1) − A_t^(1)α)] dc_t^(1) = 0.   (4.4.17)

Differentiating both sides of this equation w.r.t. α, one gets (see p. 318 of Rao, 1973)

∫ A_t^(1)T Σ_t11^{-1} (c_t^(1) − A_t^(1)α) m(c_t^(1))

× exp[−½r(c_t^(1) − A_t^(1)α)^T Σ_t11^{-1} (c_t^(1) − A_t^(1)α)] dc_t^(1)   (4.4.18)

= 0.

Using E_φ[m(C_t^(1))] = 0 again, (4.4.16) follows from (4.4.18). The proof of the theorem
is complete.

Remark 4.4.2. We can also prove Equation (4.4.16) in the following way. Note
that since C_t^(1) ~ N(A_t^(1)α, r^{-1} Σ_t11), (A_t^(1)T Σ_t11^{-1} C_t^(1), C_t^(1)T K_t C_t^(1)) is complete
sufficient for φ. Hence A_t^(1)T Σ_t11^{-1} C_t^(1) must have zero covariance with every zero
estimator m(C_t^(1)), i.e., E_φ[(A_t^(1)T Σ_t11^{-1} C_t^(1)) m(C_t^(1))] = 0.

Next we generalize Theorem 4.4.1 to certain non-normal distributions. Suppose that

e* = (w^T, η^T)^T,

Λ = Block Diag(I_t ⊗ W, (I_m ⊗ Ψ_t) + V_t).

Assume that given R = r, e* ~ N(0, r^{-1}Λ), while marginally the distribution
function of R is an arbitrary member of the family F = {F : F is absolutely continuous
with pdf f(r) = 0 for r ≤ 0}. Let F* denote a subfamily of F such that each
component of e_t^B*(C_t^(1)) and ξ_t(C_t^(1), C_t^(2)) has a finite second moment under the model
(4.2.1) and the joint distribution of e* and R. In such a set-up, we now prove the
BUP property of the predictor e_t^B*(C_t^(1)) in the following theorem, which is a simple
extension of Theorem 3.3.2 of Datta (1990).







Theorem 4.4.2 e_t^B*(C_t^(1)) is BUP of ξ_t(C_t^(1), C_t^(2)) under the model in (4.2.1), with
e* | R = r ~ N(0, r^{-1}Λ) and R having a df from F*.

Proof of Theorem 4.4.2 Using Lemma 4.4.1 and following the proof of Theorem
4.4.1, it suffices to show that

E_{α,F}[(A_t^(1)T Σ_t11^{-1} C_t^(1)) m(C_t^(1))] = 0   (4.4.19)

for all α ∈ R^p and for all F ∈ F*, where

E_{α,F}[m(C_t^(1))] = 0   (4.4.20)

and E_{α,F}[m²(C_t^(1))] < ∞ for all α and for all F ∈ F*. Consider the subfamily

H = {R ~ Gamma(½c, ½d) : c > 0, d > 2} of F.

Since (4.4.20) holds for this subfamily H, E_{α,F}[m(C_t^(1))] = 0 for all α and for all
F ∈ H gives

∫_0^∞ ∫ exp(−½cr) r^{(n_t+d)/2 − 1} exp[−½r(c_t^(1) − A_t^(1)α)^T Σ_t11^{-1} (c_t^(1) − A_t^(1)α)]

× m(c_t^(1)) dc_t^(1) dr = 0   (4.4.21)

for all α, c > 0 and d > 2. Now, using the uniqueness property of Laplace transforms,
it follows from (4.4.21) that

∫ r^{(n_t+d)/2 − 1} exp[−½r(c_t^(1) − A_t^(1)α)^T Σ_t11^{-1} (c_t^(1) − A_t^(1)α)]

× m(c_t^(1)) dc_t^(1) = 0

a.e. Lebesgue for all r > 0 and all α, i.e.,

∫ exp[−½r(c_t^(1) − A_t^(1)α)^T Σ_t11^{-1} (c_t^(1) − A_t^(1)α)]

× m(c_t^(1)) dc_t^(1) = 0   (4.4.22)

a.e. Lebesgue for all r > 0 and all α.
Differentiation of both sides of (4.4.22) with respect to α, and some simplification
using (4.4.22), lead to

∫ (A_t^(1)T Σ_t11^{-1} c_t^(1)) m(c_t^(1))

× exp[−½r(c_t^(1) − A_t^(1)α)^T Σ_t11^{-1} (c_t^(1) − A_t^(1)α)] dc_t^(1) = 0   (4.4.23)

a.e. Lebesgue for all r > 0 and all α.
Multiplying both sides of (4.4.23) by r^{n_t/2} and integrating with respect to dF(r), where
F ∈ F*, one gets (4.4.19).

Remark 4.4.3. Since F* does not contain the degenerate distributions of R on
(0, ∞), Theorem 4.4.1 does not follow from Theorem 4.4.2.

Remark 4.4.4. In Theorem 4.4.2, if we take H for F*, we see that the marginal
distribution of C_t is given by the family of distributions

{t(· | N_t, A_tα, (c/d)Σ_t, d) : α ∈ R^p, c > 0, d > 2}, and e_t^B*(c_t^(1)) is BUP of

ξ_t(C_t^(1), C_t^(2)) for this family, where t(· | N_t, A_tα, (c/d)Σ_t, d) is the N_t-variate
t-distribution with location parameter A_tα, scale parameter (c/d)Σ_t and degrees of
freedom d.

Next we dispense with any distributional assumptions in (4.2.1) and show that e_t^B*(c_t^(1))
is the BUP of ξ_t(C_t^(1), C_t^(2)) within the class of all linear unbiased predictors. A
predictor δ(C_t^(1)) is said to be linear if δ(C_t^(1)) has the form L_t C_t^(1) for some known
u × n_t matrix L_t. If, in addition, E_φ[δ(C_t^(1)) − ξ_t(C_t^(1), C_t^(2))] = 0 for all φ, we say
that δ(C_t^(1)) is a LUP of ξ_t(C_t^(1), C_t^(2)). We now introduce another definition.

Definition 4.4.2. A LUP S_t C_t^(1) of ξ_t(C_t^(1), C_t^(2)) is said to be a BLUP if, for every LUP
L_t C_t^(1) of ξ_t(C_t^(1), C_t^(2)),

V_φ[L_t C_t^(1) − ξ_t(C_t^(1), C_t^(2))] − V_φ[S_t C_t^(1) − ξ_t(C_t^(1), C_t^(2))]

is n.n.d. for all φ.







We now prove the BLUP property of e_t^B*(C_t^(1)) in predicting ξ_t(C_t^(1), C_t^(2)) in the
following theorem.

Theorem 4.4.3 Consider the model in (4.2.1) and assume that E_φ(w) = 0, E_φ(η) =
0, E_φ(wη^T) = 0, E_φ(η^Tη) < ∞ and E_φ(w^Tw) < ∞. Then e_t^B*(C_t^(1))
is the BLUP of ξ_t(C_t^(1), C_t^(2)).

Proof of this theorem requires the following lemma.

Lemma 4.4.2 A LUP S_t C_t^(1) of ξ_t(C_t^(1), C_t^(2)) is a BLUP if and only if

Cov_φ[S_t C_t^(1) − ξ_t(C_t^(1), C_t^(2)), m^T C_t^(1)] = 0

for all φ and for every known n_t × 1 vector m satisfying E_φ(m^T C_t^(1)) = 0 for all
φ.

Proof of this lemma is similar to the proof of Lemma 4.4.1 and is omitted.

Proof of Theorem 4.4.3 If E_φ(m^T C_t^(1)) = m^T A_t^(1) α = 0 for all α, then m^T A_t^(1) = 0^T.
Hence

Cov_φ[e_t^B*(C_t^(1)) − ξ_t(C_t^(1), C_t^(2)), m^T C_t^(1)]

= Cov_φ[P_2(M_t C_t^(1) − C_t^(2)), m^T C_t^(1)]

= σ² P_2(M_t Σ_t11 − Σ_t21) m

= σ² P_2(A_t^(2) − Σ_t21 Σ_t11^{-1} A_t^(1))(A_t^(1)T Σ_t11^{-1} A_t^(1))^{-1} A_t^(1)T m

= 0

for all φ. The last two equalities follow from the definition of M_t and from the fact
that m^T A_t^(1) = 0^T. Applying Lemma 4.4.2, the result follows.

Remark 4.4.5. It can be easily proved that the BLUP of g(C_t) is unique with probability
one.
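The key algebraic step in the proof of Theorem 4.4.3, namely
M_t Σ_t11 − Σ_t21 = (A_t^(2) − Σ_t21 Σ_t11^{-1} A_t^(1))(A_t^(1)T Σ_t11^{-1} A_t^(1))^{-1} A_t^(1)T,
can be checked numerically. The sketch below uses random hypothetical inputs and
also verifies that the expression annihilates any m with A_t^(1)T m = 0.

```python
import numpy as np

rng = np.random.default_rng(3)
n, N2, p = 5, 3, 2                     # hypothetical sizes
A1 = rng.standard_normal((n, p))       # A_t^(1)
A2 = rng.standard_normal((N2, p))      # A_t^(2)
M0 = rng.standard_normal((n + N2, n + N2))
Sigma = M0 @ M0.T + (n + N2) * np.eye(n + N2)   # p.d. stand-in for Sigma_t
S11, S21 = Sigma[:n, :n], Sigma[n:, :n]
S11i = np.linalg.inv(S11)
G = np.linalg.inv(A1.T @ S11i @ A1)
K = S11i - S11i @ A1 @ G @ A1.T @ S11i
Mt = S21 @ K + A2 @ G @ A1.T @ S11i

lhs = Mt @ S11 - S21
rhs = (A2 - S21 @ S11i @ A1) @ G @ A1.T
print(np.allclose(lhs, rhs))   # True

# an m with A1^T m = 0: project a random vector onto the null space of A1^T
z = rng.standard_normal(n)
m_vec = z - A1 @ np.linalg.lstsq(A1, z, rcond=None)[0]
print(np.allclose(lhs @ m_vec, 0))   # True
```

Since Cov_φ of the prediction error with m^T C_t^(1) is proportional to lhs @ m_vec, the
vanishing of this product is exactly the covariance condition of Lemma 4.4.2.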







4.5 Stochastic Domination


This section examines the stochastic dominance (as in Hwang (1985)) of the HB
predictor e_t^B*(c_t^(1)) within the class of all linear unbiased predictors of ξ_t(C_t^(1), C_t^(2)).
We consider the model of Section 4.2 with λ known. As in the previous section, we
treat α and R as unknown parameters. Also, we write e* = (w^T, η^T)^T, where w
and η are as defined in Section 4.2. We dispense with the normality assumptions on w
and η. In the previous section, we established the risk optimality of the HB predictor
e_t^B*(c_t^(1)) within the class of all LUPs of ξ_t(C_t^(1), C_t^(2)) without any distributional
assumptions on e*. The optimality of e_t^B*(c_t^(1)) within the LUPs holds a fortiori under
the quadratic loss

L_1 = ||δ − ξ_t||²_Ω

= (δ − ξ_t)^T Ω (δ − ξ_t)

= tr[Ω L_0(ξ_t, δ)],

where Ω is an n.n.d. matrix. We shall refer to such a loss as generalized Euclidean error
w.r.t. Ω. Let R_L(φ, ξ_t; δ) = E_φ[L(||δ(C_t^(1)) − ξ_t(C_t^(1), C_t^(2))||_Ω)] be the risk
function of the predictor δ for predicting ξ_t, under a loss function which is a function
of the generalized Euclidean error w.r.t. Ω for some function L.
We now assume that e* has an elliptically symmetric distribution with pdf given
by

h(e* | Λ, r) ∝ |r^{-1}Λ|^{-1/2} f(r e*^T Λ^{-1} e*),   (4.5.24)

Λ = Block Diag(I_t ⊗ W, (I_m ⊗ Ψ_t) + V_t),

and the known non-negative function f is such that

∫ [Σ_{j=1}^t Σ_{k=1}^q w_jk² + Σ_{i=1}^m Σ_{j=1}^t Σ_{k=1}^{N_ij} η_ijk² + 1] f(r e*^T Λ^{-1} e*) de* < ∞,   (4.5.25)

where

w_j = (w_j1, w_j2, ..., w_jq)^T.

We denote this distribution by E_f(0, r^{-1}Λ), where E_f(μ, σ²Ψ*) denotes the distribution
whose pdf is given by

k(s | μ, σ²) ∝ |σ²Ψ*|^{-1/2} f((s − μ)^T Ψ*^{-1} (s − μ)/σ²),   (4.5.26)

where s and μ are in R^p, Ψ* (p × p) is p.d. and σ > 0.
Note that the normality of e* with mean 0 and variance-covariance matrix r^{-1}Λ is
sufficient but not necessary for (4.5.24) and (4.5.25) to hold.
It follows from (4.5.25) that E(e*) exists, and from (4.5.24), E(e*) = 0.

Note that e** = r^{1/2} Λ^{-1/2} e* has a spherically symmetric distribution E_f(0, I_{q*}) with
characteristic function (c.f.) E[exp(iu^T e**)] = c(u^T u) for some function c (see
Kelker, 1970), where i = √(−1), u = (u_1, ..., u_{q*})^T and q* = N_t + tq.
Hence e* has c.f. given by

E[exp(iu^T e*)] = c(r^{-1} u^T Λ u).   (4.5.27)

Now write D_t^(j) = B_t^(j) w + η^(j) (j = 1, 2).

Then D_t = (D_t^(1)T, D_t^(2)T)^T has a c.f. given by

E[exp(iu*^T D_t)] = c(r^{-1} u*^T Σ_t u*),   (4.5.28)

where

Σ_t = B_t(I_t ⊗ W)B_t^T + (I_m ⊗ Ψ_t) + V_t.

Comparing (4.5.24) and (4.5.27), one can see from (4.5.28) that D_t also has an
elliptically symmetric distribution E_f(0, r^{-1}Σ_t) with pdf given by

h(d* | Σ_t, r) ∝ |r^{-1}Σ_t|^{-1/2} f(r d*^T Σ_t^{-1} d*).   (4.5.29)

We now introduce the notion of "stochastic" or "universal" domination as given
in Hwang (1985).

Definition 4.5.1. An estimator δ_1(C_t^(1)) universally dominates δ_2(C_t^(1)) (under the
generalized Euclidean error w.r.t. Ω) if for every φ and every nondecreasing loss
function L, R_L(φ, ξ_t; δ_1) ≤ R_L(φ, ξ_t; δ_2) holds, and for a particular loss the risk
functions are not identical.
In Hwang (1985), it has been shown that δ_1 universally dominates δ_2 under the
generalized Euclidean error w.r.t. Ω if

||δ_1(C_t^(1)) − ξ_t(C_t^(1), C_t^(2))||_Ω

is stochastically smaller than

||δ_2(C_t^(1)) − ξ_t(C_t^(1), C_t^(2))||_Ω.

The following theorem establishes the stochastic dominance of e_t^B*(C_t^(1)) over every
LUP of ξ_t(C_t^(1), C_t^(2)) for a general class of elliptically symmetric distributions of e*.

Theorem 4.5.1 Under the model given in (4.2.1) with λ known, and assuming e* has an
elliptically symmetric pdf as in (4.5.24), e_t^B*(C_t^(1)) universally dominates every LUP
δ(C_t^(1)) = S_t C_t^(1) of ξ_t(C_t^(1), C_t^(2)), for every p.d. Ω.

To prove this theorem, we require the following lemma.

Lemma 4.5.1 If D (N_t × 1) has pdf h(d* | Σ_t, r), then for every L (u × N_t), u ≤
N_t,

L D ≐ (L Σ_t L^T)^{1/2} D_u,  D_u = (I_u, 0) D**,

where D** = Σ_t^{-1/2} D, 0 (u × (N_t − u)) is a null matrix, and ≐ means equal in
distribution.

Proof of Lemma 4.5.1 The proof follows the arguments of Hwang (1985). From
(4.5.28)-(4.5.29) it follows that D has c.f. E[exp(is^T D)] = c(r^{-1} s^T Σ_t s). Hence

E[exp(is_1^T L D)] = c(r^{-1} s_1^T L Σ_t L^T s_1),   (4.5.30)

where s_1 is a u × 1 vector. Next, since D** = Σ_t^{-1/2} D has c.f. c(r^{-1} s^T s),

E[exp(is_1^T (L Σ_t L^T)^{1/2} D_u)]

= E[exp(is_1^T (L Σ_t L^T)^{1/2} (I_u, 0) D**)]

= c(r^{-1} s_1^T (L Σ_t L^T)^{1/2} (I_u, 0)(I_u, 0)^T (L Σ_t L^T)^{1/2} s_1)

= c(r^{-1} s_1^T (L Σ_t L^T) s_1),   (4.5.31)

so that the lemma follows from (4.5.30) and (4.5.31).

It follows as a consequence of Lemma 4.5.1 that

D^T L^T L D = (L D)^T (L D)

≐ ((L Σ_t L^T)^{1/2} D_u)^T ((L Σ_t L^T)^{1/2} D_u)

= D_u^T L Σ_t L^T D_u.   (4.5.32)

We shall use (4.5.32) repeatedly for proving Theorem 4.5.1.
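The identity (4.5.32) can be illustrated by Monte Carlo in the normal member of the
elliptical family (D ~ N(0, Σ_t), taking r = 1). The sketch below, with hypothetical
dimensions and a tolerance chosen for sampling error, compares the covariance of LD
with LΣ_tL^T and the sample means of the two quadratic forms in (4.5.32).

```python
import numpy as np

rng = np.random.default_rng(4)
Nt, u = 6, 2
M0 = rng.standard_normal((Nt, Nt))
Sigma = M0 @ M0.T + Nt * np.eye(Nt)     # a p.d. stand-in for Sigma_t
L = rng.standard_normal((u, Nt))

def psd_sqrt(A):
    """Symmetric square root of a p.s.d. matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

D = rng.multivariate_normal(np.zeros(Nt), Sigma, size=200_000)  # rows ~ D^T
LD = D @ L.T                                          # samples of (L D)^T
Du = (D @ np.linalg.inv(psd_sqrt(Sigma)))[:, :u]      # D_u = (I_u, 0) Sigma^{-1/2} D
RDu = Du @ psd_sqrt(L @ Sigma @ L.T)                  # ((L Sigma L^T)^{1/2} D_u)^T

target = L @ Sigma @ L.T
tol = 0.05 * np.abs(target).max()
print(np.allclose(np.cov(LD.T), target, atol=tol))    # same covariance
q1 = np.einsum('ij,ij->i', LD, LD)                    # D^T L^T L D
q2 = np.einsum('ij,ij->i', RDu, RDu)                  # D_u^T L Sigma L^T D_u
print(abs(q1.mean() - q2.mean()) < tol)               # quadratic forms agree in mean
```

Both quadratic forms have expectation tr(LΣ_tL^T), consistent with their equality in
distribution.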


Proof of Theorem 4.5.1 Let S_t C_t^(1) be a LUP of ξ_t(C_t^(1), C_t^(2)). Then, under the time
series model (4.2.1) with λ known, and from the fact that E(e*) = 0, it is easy to
see that

(S_t − P_1)A_t^(1) = P_2 A_t^(2).   (4.5.33)

Writing D = (D_t^(1)T, D_t^(2)T)^T and Ω = Ω^{1/2} Ω^{1/2}, and using (4.5.32) and (4.5.33), one gets

[S_t C_t^(1) − ξ_t(C_t^(1), C_t^(2))]^T Ω [S_t C_t^(1) − ξ_t(C_t^(1), C_t^(2))]

= D^T [S_t − P_1, −P_2]^T Ω [S_t − P_1, −P_2] D

= D^T {Ω^{1/2}[S_t − P_1, −P_2]}^T {Ω^{1/2}[S_t − P_1, −P_2]} D

≐ D_u^T Ω^{1/2}[S_t − P_1, −P_2] Σ_t [S_t − P_1, −P_2]^T Ω^{1/2} D_u.   (4.5.34)

Similarly,

[e_t^B*(C_t^(1)) − ξ_t(C_t^(1), C_t^(2))]^T Ω [e_t^B*(C_t^(1)) − ξ_t(C_t^(1), C_t^(2))]

= D^T [P_2 M_t, −P_2]^T Ω [P_2 M_t, −P_2] D

= D^T {Ω^{1/2}[P_2 M_t, −P_2]}^T {Ω^{1/2}[P_2 M_t, −P_2]} D

≐ D_u^T Ω^{1/2}[P_2 M_t, −P_2] Σ_t [P_2 M_t, −P_2]^T Ω^{1/2} D_u.   (4.5.35)

Write Γ = S_t − P_1 − P_2 M_t. Then

rhs of (4.5.34) = rhs of (4.5.35) + D_u^T Ω^{1/2} Γ Σ_t11 Γ^T Ω^{1/2} D_u,   (4.5.36)

since, using the definition of M_t,

[P_2 M_t, −P_2] Σ_t (Γ, 0)^T

= P_2(M_t Σ_t11 − Σ_t21) Γ^T

= P_2(A_t^(2) − Σ_t21 Σ_t11^{-1} A_t^(1))(A_t^(1)T Σ_t11^{-1} A_t^(1))^{-1} A_t^(1)T Γ^T   (4.5.37)

and, using (4.5.33),

Γ A_t^(1) = (S_t − P_1 − P_2 M_t) A_t^(1)

= P_2(A_t^(2) − M_t A_t^(1))

= 0.   (4.5.38)

Theorem 4.5.1 now follows from (4.5.34)-(4.5.36).
Also, since Σ_t11 is positive definite, it follows from (4.5.36) that

rhs of (4.5.34) = rhs of (4.5.35)

if and only if Ω^{1/2}Γ = 0, that is, S_t = P_1 + P_2 M_t.


4.6 Best Equivariant Prediction


In this section we concentrate on the equivariant prediction of ξ_t(C_t^(1), C_t^(2)) on
the basis of C_t^(1) under suitable groups of transformations. Assuming λ known, we
rewrite the model in (4.2.1) as

C_t ~ N(A_t α, r^{-1} Σ_t),   (4.6.39)

where

Σ_t = B_t(I_t ⊗ W)B_t^T + (I_m ⊗ Ψ_t) + V_t.   (4.6.40)

We redefine φ = (σ², α^T)^T.

We are interested in equivariant estimation of (P_1 A_t^(1) + P_2 A_t^(2))α, where the matrices
A_t^(1), A_t^(2), P_1 and P_2 are as introduced earlier in Section 4.2. It suffices to estimate
the vector α, where A_t α can be viewed as a location vector.
Note that C_t + A_t β ~ N(A_t(α + β), σ² Σ_t).
We are interested in developing an estimator of α which remains equivariant under
a new origin of measurement. So a "natural" group of transformations is

G_1 = {g_β, β ∈ R^p : g_β(C_t) = C_t + A_t β}.   (4.6.41)

Now assume that we partition C_t and A_t as

C_t = (C_t^(1)T, C_t^(2)T)^T,  A_t = (A_t^(1)T, A_t^(2)T)^T.

If δ(C_t^(1)) estimates (P_1 A_t^(1) + P_2 A_t^(2))α, then δ(C_t^(1)) + P_1 A_t^(1) β + P_2 A_t^(2) β should
estimate

(P_1 A_t^(1) + P_2 A_t^(2))(α + β).

Treating A_t(α + β) as the new location parameter, one can expect that δ(C_t^(1) + A_t^(1)β)
(the estimator based on the "observed part" of C_t and A_t) will estimate

(P_1 A_t^(1) + P_2 A_t^(2))(α + β).

So we should have

δ(C_t^(1) + A_t^(1)β) = δ(C_t^(1)) + (P_1 A_t^(1) + P_2 A_t^(2))β   (4.6.42)

for all C_t^(1) and β. Now if we are interested in predicting ξ_t(C_t^(1), C_t^(2)), instead of
estimating

E_φ ξ_t(C_t^(1), C_t^(2)) = (P_1 A_t^(1) + P_2 A_t^(2))α,

we can still use δ(C_t^(1)), and again we will impose (4.6.42) on δ.
Note that the induced group of transformations on the parameter space

Φ = {φ : φ = (σ², α^T)^T, α ∈ R^p, σ² > 0}   (4.6.43)

is given by

Ḡ_1 = {ḡ_β, β ∈ R^p : ḡ_β(φ) = (σ², α^T + β^T)^T}.   (4.6.44)






A loss function L(ξ_t, φ; δ) is said to be invariant for predicting ξ_t(C_t^(1), C_t^(2)) by
δ(C_t^(1)) if

L(ξ_t(C_t^(1), C_t^(2)) + (P_1 A_t^(1) + P_2 A_t^(2))β, ḡ_β(φ);

δ(C_t^(1)) + (P_1 A_t^(1) + P_2 A_t^(2))β)

= L(ξ_t(C_t^(1), C_t^(2)), φ; δ(C_t^(1)))   (4.6.45)

for all C_t^(1), C_t^(2), β, and φ. An estimator δ satisfying (4.6.42) is said to be equivariant.
We will now be interested in the best equivariant prediction of

ξ_t(C_t^(1), C_t^(2)).

The following lemma provides a useful characterization of the class of equivariant
predictors of ξ_t(C_t^(1), C_t^(2)).


Lemma 4.6.1 Let δ_0(C_t^(1)) be an equivariant predictor for ξ_t(C_t^(1), C_t^(2)). Then a
necessary and sufficient condition for a predictor δ(C_t^(1)) to be equivariant is that

δ(C_t^(1)) = δ_0(C_t^(1)) + h(C_t^(1))   (4.6.46)

for all C_t^(1), where

h(C_t^(1) + A_t^(1)β) = h(C_t^(1))   (4.6.47)

for all C_t^(1) and all β.

A function h satisfying (4.6.47) is said to be invariant and hence must be a function
of a maximal invariant function under the group of transformations

G′_1 = {g′_β, β ∈ R^p : g′_β(C_t^(1)) = C_t^(1) + A_t^(1)β}   (4.6.48)







induced by G_1 in the C_t^(1)-space. The next lemma finds a maximal invariant function
under the group of transformations G′_1.

Lemma 4.6.2 A maximal invariant function under the group of transformations G′_1 is
K_t C_t^(1), where

K_t = Σ_t11^{-1} − Σ_t11^{-1} A_t^(1) (A_t^(1)T Σ_t11^{-1} A_t^(1))^{-1} A_t^(1)T Σ_t11^{-1}.

Proof of Lemma 4.6.2 First we show that K_t C_t^(1) is an invariant function. Note that

K_t A_t^(1) = 0,

and hence

K_t(C_t^(1) + A_t^(1)β) = K_t C_t^(1)

for all C_t^(1) and β. Now we will prove the maximality.
For c_t1^(1), c_t2^(1) such that

K_t c_t1^(1) = K_t c_t2^(1),

we have that c_t1^(1) − c_t2^(1) is orthogonal to the row space of K_t.

Also, rank(K_t) = n_t − p and rank(A_t^(1)) = p.

Again using K_t A_t^(1) = 0, it follows that the column space of A_t^(1) forms an orthocomplement
of the row space of K_t.

Hence c_t1^(1) − c_t2^(1) = A_t^(1)β for some β, i.e.,

c_t1^(1) = c_t2^(1) + A_t^(1)β,

proving thereby the maximality of K_t C_t^(1).
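The projection-type properties of K_t used here (K_t A_t^(1) = 0, rank K_t = n_t − p)
and the identity K_t Σ_t11 K_t = K_t used later in this section can be checked
numerically on random hypothetical inputs, as the purely illustrative sketch below
does.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 7, 3                             # hypothetical n_t and p
A1 = rng.standard_normal((n, p))        # A_t^(1), full column rank w.p. 1
M0 = rng.standard_normal((n, n))
S11 = M0 @ M0.T + n * np.eye(n)         # p.d. stand-in for Sigma_t11
S11i = np.linalg.inv(S11)
G = np.linalg.inv(A1.T @ S11i @ A1)
K = S11i - S11i @ A1 @ G @ A1.T @ S11i  # K_t

beta = rng.standard_normal(p)
c = rng.standard_normal(n)
print(np.allclose(K @ (c + A1 @ beta), K @ c))   # invariance: K_t A_t^(1) = 0
print(np.linalg.matrix_rank(K) == n - p)         # rank K_t = n_t - p
print(np.allclose(K @ S11 @ K, K))               # K_t Sigma_t11 K_t = K_t
```

The last identity is what makes C_t^(1)T K_t C_t^(1) expressible as a quadratic form in
K_t C_t^(1) in the definition of p(K_t C_t^(1)) below.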

Since e_t^B*(C_t^(1)) is an equivariant predictor of ξ_t(C_t^(1), C_t^(2)), using Lemmas 4.6.1 and
4.6.2 we can characterize the class of all equivariant predictors of ξ_t(C_t^(1), C_t^(2)). This
is given in the next lemma.







Lemma 4.6.3 Under the group of transformations G_1 given in (4.6.41), a predictor

δ(C_t^(1)) of ξ_t(C_t^(1), C_t^(2)) is an equivariant predictor if and only if δ has the
representation

δ(C_t^(1)) = e_t^B*(C_t^(1)) + l(K_t C_t^(1))   (4.6.49)

for all C_t^(1), where l is a u-variate vector-valued function.



Define the loss functions

L_0(ξ_t, δ) = (δ − ξ_t)(δ − ξ_t)^T,  L_1(ξ_t, δ) = tr[Ω L_0(ξ_t, δ)],   (4.6.50)

where Ω is an n.n.d. matrix.
Note that L_0 and L_1 are invariant loss functions.

Definition 4.6.1. An equivariant predictor δ_0(C_t^(1)) of ξ_t(C_t^(1), C_t^(2)) is said to be a
best equivariant predictor under the loss L_0 if, for every other equivariant predictor

δ(C_t^(1)) of ξ_t(C_t^(1), C_t^(2)),

E_φ[L_0(ξ_t, δ) − L_0(ξ_t, δ_0)]

is n.n.d.
Note that if δ_0(C_t^(1)) is the best equivariant predictor of ξ_t(C_t^(1), C_t^(2)) under the loss

L_0, then it is so under the loss L_1 for every n.n.d. Ω. Conversely, if δ_0(C_t^(1)) is the

best equivariant predictor of ξ_t(C_t^(1), C_t^(2)) under the loss L_1 for every n.n.d. Ω, then

it is so under L_0. The next theorem establishes the optimality of e_t^B*(C_t^(1)) within the

class of all equivariant predictors δ(C_t^(1)) of ξ_t(C_t^(1), C_t^(2)) under the loss L_0, and a fortiori
under the loss L_1 for every n.n.d. Ω.

Theorem 4.6.1 Under the normal time series linear model given in (4.2.1), the group
of transformations G_1 given in (4.6.41) and the loss L_0 given in (4.6.50), the best

equivariant predictor of ξ_t(C_t^(1), C_t^(2)) is given by e_t^B*(C_t^(1)).






Proof of Theorem 4.6.1 Note that from (4.6.49) any equivariant predictor δ(C_t^(1)) of
ξ_t(C_t^(1), C_t^(2)) is given by

δ(C_t^(1)) = e_t^B*(C_t^(1)) + l(K_t C_t^(1)).

Only those δ are to be considered for which E_φ[L_0(ξ_t, δ)] is finite.
Note that E_φ[L_0(ξ_t, e_t^B*)] exists finitely, and hence

E_φ[l^T(K_t C_t^(1)) l(K_t C_t^(1))] ≤ 2 tr{E_φ[L_0(ξ_t, e_t^B*)] + E_φ[L_0(ξ_t, δ)]}.

Hence, by the Cauchy-Schwarz inequality, E_φ[l(K_t C_t^(1)) l^T(K_t C_t^(1))] is finite.
Under the matrix loss L_0,

E_φ[L_0(ξ_t, δ) − L_0(ξ_t, e_t^B*)]

= E_φ[l(K_t C_t^(1)) l^T(K_t C_t^(1))]

+ E_φ[(e_t^B*(C_t^(1)) − ξ_t(C_t^(1), C_t^(2))) l^T(K_t C_t^(1))]   (4.6.51)

+ E_φ[l(K_t C_t^(1)) (e_t^B*(C_t^(1)) − ξ_t(C_t^(1), C_t^(2)))^T].   (4.6.52)

Now we will show that the last two product terms are null matrices. Using the
definitions of e_t^B*(C_t^(1)) and M_t, we have

E_φ[e_t^B*(C_t^(1)) − ξ_t(C_t^(1), C_t^(2)) | C_t^(1)]

= P_2[M_t C_t^(1) − E_φ(C_t^(2) | C_t^(1))]

= P_2[M_t C_t^(1) − A_t^(2)α − Σ_t21 Σ_t11^{-1}(C_t^(1) − A_t^(1)α)]  a.e. Leb.

= P_2[A_t^(2) − Σ_t21 Σ_t11^{-1} A_t^(1)](A_t^(1)T Σ_t11^{-1} A_t^(1))^{-1}

× (A_t^(1)T Σ_t11^{-1} C_t^(1) − A_t^(1)T Σ_t11^{-1} A_t^(1) α).   (4.6.53)

Now

Cov_φ[A_t^(1)T Σ_t11^{-1} C_t^(1), K_t C_t^(1)] = σ² A_t^(1)T Σ_t11^{-1} Σ_t11 K_t

= σ² A_t^(1)T K_t

= 0.

Since C_t^(1) is multivariate normal, and A_t^(1)T Σ_t11^{-1} C_t^(1) and K_t C_t^(1) are linear functions
of C_t^(1), A_t^(1)T Σ_t11^{-1} C_t^(1) and K_t C_t^(1) are independently distributed. Now, using (4.6.53)
and the independence of A_t^(1)T Σ_t11^{-1} C_t^(1) and K_t C_t^(1), we have

E_φ[(e_t^B*(C_t^(1)) − ξ_t(C_t^(1), C_t^(2))) l^T(K_t C_t^(1))]

= E_φ[E_φ{e_t^B*(C_t^(1)) − ξ_t(C_t^(1), C_t^(2)) | C_t^(1)} l^T(K_t C_t^(1))]

= P_2(A_t^(2) − Σ_t21 Σ_t11^{-1} A_t^(1))(A_t^(1)T Σ_t11^{-1} A_t^(1))^{-1}

× E_φ[(A_t^(1)T Σ_t11^{-1} C_t^(1) − A_t^(1)T Σ_t11^{-1} A_t^(1) α) l^T(K_t C_t^(1))]

= P_2(A_t^(2) − Σ_t21 Σ_t11^{-1} A_t^(1))(A_t^(1)T Σ_t11^{-1} A_t^(1))^{-1}

× E_φ(A_t^(1)T Σ_t11^{-1} C_t^(1) − A_t^(1)T Σ_t11^{-1} A_t^(1) α) E_φ(l^T(K_t C_t^(1)))

= 0.   (4.6.54)

Similarly, the term in (4.6.52) is a null matrix. Now from (4.6.51)-(4.6.52) and (4.6.54),

E_φ[L_0(ξ_t, δ) − L_0(ξ_t, e_t^B*)] = E_φ[l(K_t C_t^(1)) l^T(K_t C_t^(1))],   (4.6.55)

which is n.n.d., with equality to the null matrix if and only if

P_φ[l(K_t C_t^(1)) = 0] = 1,

i.e., P_φ[δ(C_t^(1)) = e_t^B*(C_t^(1))] = 1.

The proof of the theorem is complete.

Next, instead of G_1, we consider the group G_2 of transformations given by

G_2 = {g_{β,d}, β ∈ R^p, d > 0 : g_{β,d}(C_t) = d C_t + A_t β}.   (4.6.56)

A predictor δ(C_t^(1)) of ξ_t(C_t^(1), C_t^(2)) is said to be equivariant under the group G_2 of
transformations if

δ(d C_t^(1) + A_t^(1) β) = d δ(C_t^(1)) + (P_1 A_t^(1) + P_2 A_t^(2))β   (4.6.57)

for all C_t^(1), β and d > 0. Note that e_t^B*(C_t^(1)) is again an equivariant predictor of

ξ_t(C_t^(1), C_t^(2)) under the above criterion. Also, note that if δ is equivariant under G_2,
taking d = 1 in (4.6.57), it follows that it is equivariant under G_1. So the class of
equivariant predictors under G_2 is smaller than that under G_1. But we will prove that

e_t^B*(C_t^(1)) is best inside this smaller class under a larger family of distributions of C_t,
which includes the normal distribution as a special case. We will apply the group
of transformations G_2 on the family of elliptically symmetric distributions with pdf
given by

f(c_t) ∝ |σ² Σ_t|^{-1/2} f((c_t − A_t α)^T Σ_t^{-1} (c_t − A_t α)/σ²),   (4.6.58)

where f is a known non-negative function satisfying

∫ [1 + (c_t − A_t α)^T Σ_t^{-1} (c_t − A_t α)]

× f((c_t − A_t α)^T Σ_t^{-1} (c_t − A_t α)/σ²) dc_t < ∞.   (4.6.59)

Note that C_t ~ N(A_t α, σ² Σ_t) is a special case satisfying (4.6.58) and (4.6.59). It is
easy to see that the above group of transformations G_2 on the N_t-dimensional Euclidean
space for C_t induces a group of transformations

Ḡ_2 = {ḡ_{β,d}, β ∈ R^p, d > 0 : ḡ_{β,d}(φ) = (d² σ², d α^T + β^T)^T}   (4.6.60)

on the parameter space given in (4.6.43).

As before, a loss function L(ξ_t, φ; δ) for predicting ξ_t(C_t^(1), C_t^(2)) by δ(C_t^(1)) is
invariant under the group of transformations G_2 if

L(d ξ_t(C_t^(1), C_t^(2)) + (P_1 A_t^(1) + P_2 A_t^(2))β, ḡ_{β,d}(φ);

d δ(C_t^(1)) + (P_1 A_t^(1) + P_2 A_t^(2))β)

= L(ξ_t(C_t^(1), C_t^(2)), φ; δ(C_t^(1)))   (4.6.61)

for all C_t^(1), C_t^(2), β, d (> 0) and φ.
We shall consider the special losses

L_2(ξ_t, φ; δ) = L_0(ξ_t, δ)/σ²   (4.6.62)

and

L_3(ξ_t, φ; δ) = L_1(ξ_t, δ)/σ².   (4.6.63)

Both these losses satisfy (4.6.61).

To find the best equivariant predictor of ,(C(1), C~)) in this set up, we will
present two lemmas which provide a useful characterization of the class of equivariant
predictors.

Lemma 4.6.4 Let \delta_0(C_t^{(1)}) be an equivariant predictor for \xi_t(C_t^{(1)}, C_t^{(2)}). Then a necessary and sufficient condition for a predictor \delta(C_t^{(1)}) of \xi_t(C_t^{(1)}, C_t^{(2)}) to be equivariant is that

\delta(C_t^{(1)}) = \delta_0(C_t^{(1)}) + h(C_t^{(1)})     (4.6.64)

for all C_t^{(1)}, where

h(d C_t^{(1)} + A_t^{(1)}\beta) = d h(C_t^{(1)})     (4.6.65)

for all C_t^{(1)}, \beta and d > 0.

We will follow the arguments of Lemmas 2 and 3 of Datta and Ghosh (1988) to find a representation of the function h in (4.6.65). Define

p(K_t C_t^{(1)}) = K_t C_t^{(1)} / (C_t^{(1)T} K_t C_t^{(1)})^{1/2},   C_t^{(1)T} K_t C_t^{(1)} > 0     (4.6.66)

Note that since K_t \Sigma_{t11} K_t = K_t,

C_t^{(1)T} K_t C_t^{(1)} = (K_t C_t^{(1)})^T \Sigma_{t11} (K_t C_t^{(1)}),

so p is indeed a function of K_t C_t^{(1)}. It can be shown that p(K_t C_t^{(1)}) is a maximal invariant under the group of transformations

G_2' = \{g'_{\beta,d}, \beta \in R^p, d > 0 : g'_{\beta,d}(C_t^{(1)}) = d C_t^{(1)} + A_t^{(1)}\beta\}     (4.6.67)

induced by G_2 in the C_t^{(1)}-space.
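The two algebraic facts just used, and the invariance of p itself, can be confirmed numerically. The sketch below (with arbitrary illustrative matrices, not the dissertation's) verifies K_t A^{(1)} = 0, K_t \Sigma_{t11} K_t = K_t, and p(K_t(dC^{(1)} + A^{(1)}\beta)) = p(K_t C^{(1)}).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 2                            # illustrative dimensions
A1 = rng.normal(size=(n, p))
L = rng.normal(size=(n, n))
S11 = L @ L.T + n * np.eye(n)          # positive definite, stands in for Sigma_t11
S11i = np.linalg.inv(S11)
# K_t = S11^{-1} - S11^{-1} A1 (A1' S11^{-1} A1)^{-1} A1' S11^{-1}
K = S11i - S11i @ A1 @ np.linalg.inv(A1.T @ S11i @ A1) @ A1.T @ S11i

def p_inv(c1):
    """Maximal invariant p(K c1) = K c1 / (c1' K c1)^{1/2}."""
    return K @ c1 / np.sqrt(c1 @ K @ c1)

c1, beta, d = rng.normal(size=n), rng.normal(size=p), 2.3
print(np.allclose(K @ A1, 0))                              # K_t A^(1) = 0
print(np.allclose(K @ S11 @ K, K))                         # K_t Sigma_t11 K_t = K_t
print(np.allclose(p_inv(d * c1 + A1 @ beta), p_inv(c1)))   # invariance under G_2'
```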
The following lemma characterizes the class of functions h(C_t^{(1)}) satisfying (4.6.65). We will use this lemma to characterize the class of equivariant predictors.

Lemma 4.6.5 A function h(C_t^{(1)}) (u x 1) satisfies (4.6.65) if and only if h has the representation

h(C_t^{(1)}) = (C_t^{(1)T} K_t C_t^{(1)})^{1/2} s(p(K_t C_t^{(1)}))     (4.6.68)

where s (u x 1) is an arbitrary function of p(K_t C_t^{(1)}).

Proof of Lemma 4.6.5 "If" part: Assume h has the representation given in (4.6.68). Since K_t A_t^{(1)} = 0,

(d C_t^{(1)} + A_t^{(1)}\beta)^T K_t (d C_t^{(1)} + A_t^{(1)}\beta) = d^2 C_t^{(1)T} K_t C_t^{(1)},

and since p(K_t C_t^{(1)}) is a maximal invariant under G_2', p(K_t(d C_t^{(1)} + A_t^{(1)}\beta)) = p(K_t C_t^{(1)}), and consequently

h(d C_t^{(1)} + A_t^{(1)}\beta) = (d^2 C_t^{(1)T} K_t C_t^{(1)})^{1/2} s(p(K_t C_t^{(1)})) = d h(C_t^{(1)}).

Hence (4.6.65) is satisfied.

"Only if" part: Since h satisfies (4.6.65) for all C_t^{(1)}, \beta and d > 0, taking d = 1 we see that h must satisfy h(C_t^{(1)} + A_t^{(1)}\beta) = h(C_t^{(1)}) for all C_t^{(1)} and \beta. This implies that h must be invariant under G_1, and hence must be a function of K_t C_t^{(1)}. So h(C_t^{(1)}) = s^*(K_t C_t^{(1)}), where s^* (u x 1) is an arbitrary function satisfying

s^*(K_t(d C_t^{(1)} + A_t^{(1)}\beta)) = d s^*(K_t C_t^{(1)}),

i.e., s^*(d K_t C_t^{(1)}) = d s^*(K_t C_t^{(1)})     (4.6.69)

for all d > 0. Now, taking d = (C_t^{(1)T} K_t C_t^{(1)})^{-1/2} for all C_t^{(1)} with C_t^{(1)T} K_t C_t^{(1)} > 0, we have from (4.6.69)

h(C_t^{(1)}) = s^*(K_t C_t^{(1)})
= (C_t^{(1)T} K_t C_t^{(1)})^{1/2} s^*((C_t^{(1)T} K_t C_t^{(1)})^{-1/2} K_t C_t^{(1)})
= (C_t^{(1)T} K_t C_t^{(1)})^{1/2} s^*(p(K_t C_t^{(1)})).

For C_t^{(1)T} K_t C_t^{(1)} = 0, if we take h = 0, then (4.6.65) is satisfied and we can represent h by (4.6.68).

Since e_t^{B*}(C_t^{(1)}) is an equivariant predictor of \xi_t(C_t^{(1)}, C_t^{(2)}), it follows from Lemma 4.6.4 and Lemma 4.6.5 that \delta(C_t^{(1)}) is an equivariant predictor of \xi_t(C_t^{(1)}, C_t^{(2)}) under the group G_2 of transformations if and only if

\delta(C_t^{(1)}) = e_t^{B*}(C_t^{(1)}) + (C_t^{(1)T} K_t C_t^{(1)})^{1/2} s(p(K_t C_t^{(1)}))     (4.6.70)

for all C_t^{(1)}.
Definition An equivariant predictor \delta_0(C_t^{(1)}) of \xi_t(C_t^{(1)}, C_t^{(2)}) is said to be a best equivariant predictor under the loss L_2 if for every other equivariant predictor \delta(C_t^{(1)}) of \xi_t(C_t^{(1)}, C_t^{(2)}),

E_\phi[L_2(\xi_t, \phi; \delta) - L_2(\xi_t, \phi; \delta_0)]

is nonnegative definite.

Remark. Note that if \delta_0(C_t^{(1)}) is the best equivariant predictor of \xi_t(C_t^{(1)}, C_t^{(2)}) under the loss L_2, then it is so under the loss L_3 for every n.n.d. S, and vice versa.

We now establish that e_t^{B*}(C_t^{(1)}) is the best equivariant predictor of \xi_t(C_t^{(1)}, C_t^{(2)}) under the loss L_2.

Theorem 4.6.2 Under the time domain HB model, the family of elliptically symmetric distributions given in (4.6.58) and (4.6.59), the group of transformations G_2 given in (4.6.56), and the loss function L_2 given in (4.6.62), the best equivariant predictor of \xi_t(C_t^{(1)}, C_t^{(2)}) is e_t^{B*}(C_t^{(1)}).

Proof of Theorem 4.6.2 Note that from (4.6.70) any equivariant predictor \delta(C_t^{(1)}) of \xi_t(C_t^{(1)}, C_t^{(2)}) is given by

\delta(C_t^{(1)}) = e_t^{B*}(C_t^{(1)}) + (C_t^{(1)T} K_t C_t^{(1)})^{1/2} s(p(K_t C_t^{(1)})).

Only those \delta are considered for which E_\phi(L_2(\xi_t, \phi; \delta)) is finite, and in that case, for the corresponding s,

E_\phi[C_t^{(1)T} K_t C_t^{(1)} s(p(K_t C_t^{(1)})) s^T(p(K_t C_t^{(1)}))]

is finite. Under the matrix loss L_2,

\sigma^2 E_\phi[L_2(\xi_t, \phi; \delta) - L_2(\xi_t, \phi; e_t^{B*})]
= E_\phi[C_t^{(1)T} K_t C_t^{(1)} s(p(K_t C_t^{(1)})) s^T(p(K_t C_t^{(1)}))]
+ E_\phi[(e_t^{B*}(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)})) (C_t^{(1)T} K_t C_t^{(1)})^{1/2} s^T(p(K_t C_t^{(1)}))]
+ E_\phi[(C_t^{(1)T} K_t C_t^{(1)})^{1/2} s(p(K_t C_t^{(1)})) (e_t^{B*}(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)}))^T]     (4.6.71)

It is enough to prove that the last two cross-product terms in (4.6.71) are null matrices. Note that

E_\phi[e_t^{B*}(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)}) | C_t^{(1)}] = P_2[M_t C_t^{(1)} - E(C_t^{(2)} | C_t^{(1)})]     (4.6.72)

Now, from the property of elliptically symmetric distributions, it follows that

E(C_t^{(2)} | C_t^{(1)}) = A_t^{(2)}\alpha + \Sigma_{t21}\Sigma_{t11}^{-1}(C_t^{(1)} - A_t^{(1)}\alpha)   a.e. Lebesgue,

and using this, we have from (4.6.72) that

E_\phi[e_t^{B*}(C_t^{(1)}) - \xi_t(C_t^{(1)}, C_t^{(2)}) | C_t^{(1)}]
= P_2 (A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)}) (A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)})^{-1}
  \times (A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)} - A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\alpha)     (4.6.73)

Then, the second term on the rhs of (4.6.71) equals

P_2 (A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)}) (A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)})^{-1}
  \times E_\phi[(C_t^{(1)T} K_t C_t^{(1)})^{1/2} (A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)} - A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\alpha) s^T(p(K_t C_t^{(1)}))]     (4.6.74)

Now we will show that the expectation on the rhs of (4.6.74) is a null matrix. Note that since E_\phi(C_t^{(1)T} K_t C_t^{(1)}) < \infty and E_\phi(C_t^{(1)}) = A_t^{(1)}\alpha, by the Cauchy-Schwarz inequality, (C_t^{(1)T} K_t C_t^{(1)})^{1/2} A_t^{(1)T}\Sigma_{t11}^{-1}(C_t^{(1)} - A_t^{(1)}\alpha) has finite expectation. Now, using the independence of (C_t^{(1)T} K_t C_t^{(1)})^{1/2} A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)} and p(K_t C_t^{(1)}), we have

E_\phi[(C_t^{(1)T} K_t C_t^{(1)})^{1/2} (A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)} - A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}\alpha) s^T(p(K_t C_t^{(1)}))]
= E_\phi[(C_t^{(1)T} K_t C_t^{(1)})^{1/2} A_t^{(1)T}\Sigma_{t11}^{-1}(C_t^{(1)} - A_t^{(1)}\alpha)]
  \times E_\phi[s(p(K_t C_t^{(1)}))]^T     (4.6.75)

Next, using the elliptical symmetry of C_t^{(1)}, it follows that

C_t^{(1)} - A_t^{(1)}\alpha   and   -(C_t^{(1)} - A_t^{(1)}\alpha)

have the same distribution. Now, since K_t A_t^{(1)} = 0 and K_t is n.n.d., one has

[(C_t^{(1)} - A_t^{(1)}\alpha)^T K_t (C_t^{(1)} - A_t^{(1)}\alpha)]^{1/2} A_t^{(1)T}\Sigma_{t11}^{-1}(C_t^{(1)} - A_t^{(1)}\alpha)
~ [\{-(C_t^{(1)} - A_t^{(1)}\alpha)\}^T K_t \{-(C_t^{(1)} - A_t^{(1)}\alpha)\}]^{1/2}
  \times A_t^{(1)T}\Sigma_{t11}^{-1}\{-(C_t^{(1)} - A_t^{(1)}\alpha)\},

i.e.,

(C_t^{(1)T} K_t C_t^{(1)})^{1/2} A_t^{(1)T}\Sigma_{t11}^{-1}(C_t^{(1)} - A_t^{(1)}\alpha)
~ -(C_t^{(1)T} K_t C_t^{(1)})^{1/2} A_t^{(1)T}\Sigma_{t11}^{-1}(C_t^{(1)} - A_t^{(1)}\alpha)     (4.6.76)

Now, taking expectations on both sides, which exist finitely due to the previous observation, one gets

E_\phi[(C_t^{(1)T} K_t C_t^{(1)})^{1/2} A_t^{(1)T}\Sigma_{t11}^{-1}(C_t^{(1)} - A_t^{(1)}\alpha)] = 0     (4.6.77)

Hence from (4.6.71), (4.6.74), (4.6.75), and (4.6.77), we have

\sigma^2 E_\phi[L_2(\xi_t, \phi; \delta) - L_2(\xi_t, \phi; e_t^{B*})]
= E_\phi[C_t^{(1)T} K_t C_t^{(1)} s(p(K_t C_t^{(1)})) s^T(p(K_t C_t^{(1)}))],

which is n.n.d. for all \phi, and the two risk matrices are equal if and only if

P_\phi[s(p(K_t C_t^{(1)})) = 0] = 1   for all \phi,

i.e., P_\phi[\delta(C_t^{(1)}) = e_t^{B*}(C_t^{(1)})] = 1   for all \phi.

Remark. Although ((C_t^{(1)T} K_t C_t^{(1)})^{1/2}, A_t^{(1)T}\Sigma_{t11}^{-1}C_t^{(1)}) is sufficient and p(K_t C_t^{(1)}) is ancillary, Basu's theorem cannot be applied since the sufficient statistic is not complete.














CHAPTER 5

SUMMARY AND FUTURE RESEARCH



5.1 Summary


In this dissertation, an estimation and prediction methodology is developed for

simultaneous estimation of several strata means at different time points, based on data

from repeated surveys. The methodology adopts an HB model which incorporates time-

dependent random components. The resulting HB estimators for a small area "borrow

strength" over time in addition to borrowing strength from other local areas as has

been the case in most of the existing methods of small area estimation. The Bayesian

procedure provides current as well as retrospective estimates simultaneously for all

the small areas. Application of this HB procedure to the median income estimation

problem in Chapters 2 and 3 has demonstrated the usefulness of this procedure in

providing substantially improved estimators for small areas.

Implementation of the HB methodology, in finite as well as infinite population

sampling situations, has been illustrated by adopting a Monte Carlo integration

technique known as the Gibbs sampler. Using this procedure, in both the univariate

and the multivariate cases, the posterior density as well as conditional mean and

variance can be obtained with considerable ease.
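To illustrate the flavor of the Gibbs sampler referred to above, here is a minimal sketch for a toy two-stage normal hierarchy; the data, the known variances, and the flat prior on the common mean are all illustrative assumptions, not the dissertation's time-series HB model. The sampler alternates draws from the two full conditionals.

```python
import numpy as np

# Toy hierarchy (illustrative only):
#   y_i | theta_i ~ N(theta_i, 1),  theta_i | mu ~ N(mu, tau2),  mu ~ flat prior,
# with tau2 known.  The Gibbs sampler alternates the two full conditionals.
rng = np.random.default_rng(2)
y = np.array([2.1, 0.3, 1.4, -0.5, 0.9])
tau2 = 1.0
m = len(y)
mu, draws = 0.0, []
for it in range(21000):
    prec = 1.0 + 1.0 / tau2                     # posterior precision of each theta_i
    theta = rng.normal((y + mu / tau2) / prec, np.sqrt(1.0 / prec))
    mu = rng.normal(theta.mean(), np.sqrt(tau2 / m))
    if it >= 1000:                              # discard burn-in
        draws.append(theta)
post_mean = np.mean(draws, axis=0)
# With unit sampling variance, tau2 = 1, and a flat prior on mu, the exact
# posterior mean of theta_i is 0.5*y_i + 0.5*mean(y); post_mean should be close.
print(np.round(post_mean, 2))
```

Each strata mean is shrunk toward the overall mean, which is the "borrowing strength" effect the text describes; the same alternation extends to the multivariate conditionals used in the dissertation.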

The HB predictors of a vector of linear functions of the finite population obser-

vations, in the special case of known ratios of variance components, are shown to

be best unbiased linear predictors without any distributional assumptions. It is also







shown that these predictors stochastically dominate all the linear unbiased predic-

tors for elliptically symmetric distributions and are best equivariant predictors under

a suitable group of transformations.


5.2 Future Research


The HB analysis considered in this dissertation assumed the variance-covariance

matrix for the basic observations to be known. However, often in practice, only the

sample variance-covariance matrix is available. We can extend our HB model by

setting either proper or improper priors on the unknown variance-covariance matri-

ces. We also assumed that the process through which the random components evolve

through time is known. The case when this process is unknown needs further in-

vestigation. Also, the application of the general technique of "borrowing strength" over

time can be explored in other application areas, e.g., in multicenter clinical trials in

biomedical research or in on-line quality assessment of different production lines in a

manufacturing industry.

It is important to investigate the robustness of the HB procedure considered in

this dissertation to the case of heavy-tailed priors. It is also worthwhile

to extend our Bayesian model to allow for the presence of outliers in the data by

assuming scale-contaminated normal distributions.














APPENDIX


PROOF OF THEOREM 4.2.1.



The joint pdf of C_t, \alpha, R and \Lambda is given by

f(c_t, \alpha, r, \lambda) \propto r^{(N_t + \sum_{l=0} h_l)/2 - 1} (\prod_{l=1} \lambda_l^{h_l/2 - 1})
  \times exp[-(r/2)\{(c_t - A_t\alpha)^T \Sigma_t^{-1} (c_t - A_t\alpha) + a_0 + \sum_{l=1} a_l\lambda_l\}]     (1)

Completing the square in \alpha,

(c_t - A_t\alpha)^T \Sigma_t^{-1} (c_t - A_t\alpha)
= [\alpha - (A_t^T\Sigma_t^{-1}A_t)^{-1} A_t^T\Sigma_t^{-1}c_t]^T (A_t^T\Sigma_t^{-1}A_t)
  [\alpha - (A_t^T\Sigma_t^{-1}A_t)^{-1} A_t^T\Sigma_t^{-1}c_t] + c_t^T Q_t c_t     (2)

where

Q_t = \Sigma_t^{-1} - \Sigma_t^{-1}A_t(A_t^T\Sigma_t^{-1}A_t)^{-1}A_t^T\Sigma_t^{-1}.

Integrating (1) w.r.t. \alpha, the joint pdf of C_t, R and \Lambda is given by

f(c_t, r, \lambda) \propto |\Sigma_t|^{-1/2} |A_t^T\Sigma_t^{-1}A_t|^{-1/2} r^{(N_t + \sum_{l=0} h_l - p)/2 - 1}
  \times (\prod_{l=1} \lambda_l^{h_l/2 - 1}) exp[-(r/2)(a_0 + \sum_{l=1} a_l\lambda_l + c_t^T Q_t c_t)]     (3)


Integrating (3) w.r.t. R, the joint pdf of C_t and \Lambda is given by

f(c_t, \lambda) \propto |\Sigma_t|^{-1/2} |A_t^T\Sigma_t^{-1}A_t|^{-1/2}
  (a_0 + \sum_{l=1} a_l\lambda_l + c_t^T Q_t c_t)^{-(N_t + \sum_{l=0} h_l - p)/2}
  \prod_{l=1} \lambda_l^{h_l/2 - 1}     (4)


Now, using a standard formula for partitioned matrices (e.g., Searle (1971), page 46), we have

c_t^T\Sigma_t^{-1}c_t = c_t^{(1)T}\Sigma_{t11}^{-1}c_t^{(1)}
  + (c_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}c_t^{(1)})^T \Sigma_{t22.1}^{-1}
    (c_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}c_t^{(1)})     (5)

Similarly,

c_t^T\Sigma_t^{-1}A_t = c_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}
  + (c_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}c_t^{(1)})^T \Sigma_{t22.1}^{-1}
    (A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)})
= s_1^T + s_2^T (say),     (6)

and

A_t^T\Sigma_t^{-1}A_t = A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}
  + (A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)})^T \Sigma_{t22.1}^{-1}
    (A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)})     (7)

From (7),

(A_t^T\Sigma_t^{-1}A_t)^{-1}
= (A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)})^{-1}
  - (A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)})^{-1} (A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)})^T
    \times [\Sigma_{t22.1} + (A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)})
      (A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)})^{-1} (A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)})^T]^{-1}
    \times (A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)}) (A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)})^{-1}

= (A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)})^{-1}
  - (A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)})^{-1} (A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)})^T
    G_t^{-1} (A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)}) (A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)})^{-1}

= M_1 + M_2 (say)     (8)









Hence

c_t^T Q_t c_t = c_t^T\Sigma_t^{-1}c_t - c_t^T\Sigma_t^{-1}A_t (A_t^T\Sigma_t^{-1}A_t)^{-1} A_t^T\Sigma_t^{-1}c_t
= c_t^T\Sigma_t^{-1}c_t - (s_1 + s_2)^T (M_1 + M_2) (s_1 + s_2)
= c_t^T\Sigma_t^{-1}c_t - [s_1^T M_1 s_1 + s_1^T M_2 s_1 + s_2^T (M_1 + M_2) s_2
  + 2 s_1^T (M_1 + M_2) s_2]     (9)

Writing \hat{\alpha} = (A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)})^{-1} A_t^{(1)T}\Sigma_{t11}^{-1}c_t^{(1)} and M_t c_t^{(1)} = A_t^{(2)}\hat{\alpha} + \Sigma_{t21}\Sigma_{t11}^{-1}(c_t^{(1)} - A_t^{(1)}\hat{\alpha}), the individual terms are

s_1^T M_1 s_1 = c_t^{(1)T} (\Sigma_{t11}^{-1} - K_t) c_t^{(1)},     (10)

s_1^T M_2 s_1 = -(M_t c_t^{(1)} - \Sigma_{t21}\Sigma_{t11}^{-1}c_t^{(1)})^T G_t^{-1}
  (M_t c_t^{(1)} - \Sigma_{t21}\Sigma_{t11}^{-1}c_t^{(1)}),     (11)

s_2^T (M_1 + M_2) s_2 = (c_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}c_t^{(1)})^T (\Sigma_{t22.1}^{-1} - G_t^{-1})
  (c_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}c_t^{(1)}),     (12)

s_1^T (M_1 + M_2) s_2 = (M_t c_t^{(1)} - \Sigma_{t21}\Sigma_{t11}^{-1}c_t^{(1)})^T G_t^{-1}
  (c_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}c_t^{(1)}).     (13)

Combining (5) and (9)-(13),

c_t^T Q_t c_t = c_t^{(1)T} K_t c_t^{(1)}
  + (c_t^{(2)} - M_t c_t^{(1)})^T G_t^{-1} (c_t^{(2)} - M_t c_t^{(1)})     (14)
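The decomposition in (14) is easy to verify numerically. The sketch below (with arbitrary illustrative dimensions and matrices) forms Q_t, K_t, G_t and M_t c^{(1)} directly from their definitions and checks that c'Q_tc equals c^{(1)T}K_tc^{(1)} + (c^{(2)} - M_tc^{(1)})'G_t^{-1}(c^{(2)} - M_tc^{(1)}).

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, p = 5, 4, 2                      # n_t observed, N_t - n_t unobserved, p (illustrative)
N = n + m
A = rng.normal(size=(N, p))
A1, A2 = A[:n], A[n:]
L = rng.normal(size=(N, N))
Sigma = L @ L.T + N * np.eye(N)        # positive definite Sigma_t
Si = np.linalg.inv(Sigma)
S11, S21, S22 = Sigma[:n, :n], Sigma[n:, :n], Sigma[n:, n:]
S11i = np.linalg.inv(S11)
S221 = S22 - S21 @ S11i @ S21.T        # Sigma_t22.1
c = rng.normal(size=N)
c1, c2 = c[:n], c[n:]

Q = Si - Si @ A @ np.linalg.inv(A.T @ Si @ A) @ A.T @ Si
K = S11i - S11i @ A1 @ np.linalg.inv(A1.T @ S11i @ A1) @ A1.T @ S11i
B = A2 - S21 @ S11i @ A1               # A^(2) - Sigma_t21 Sigma_t11^{-1} A^(1)
F = A1.T @ S11i @ A1
G = S221 + B @ np.linalg.inv(F) @ B.T  # G_t
alpha_hat = np.linalg.solve(F, A1.T @ S11i @ c1)
Mc1 = A2 @ alpha_hat + S21 @ S11i @ (c1 - A1 @ alpha_hat)   # M_t c^(1)

lhs = c @ Q @ c
rhs = c1 @ K @ c1 + (c2 - Mc1) @ np.linalg.inv(G) @ (c2 - Mc1)
print(np.isclose(lhs, rhs))
```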


Now, starting with the joint pdf of C_t^{(1)}, \alpha, R and \Lambda and integrating as above, the joint pdf of C_t^{(1)} and \Lambda is given by

h(c_t^{(1)}, \lambda) \propto |\Sigma_{t11}|^{-1/2} |A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}|^{-1/2}
  (a_0 + \sum_{l=1} a_l\lambda_l + c_t^{(1)T} K_t c_t^{(1)})^{-(n_t + \sum_{l=0} h_l - p)/2}
  \prod_{l=1} \lambda_l^{h_l/2 - 1}     (15)

So the conditional pdf of C_t^{(2)} given C_t^{(1)} = c_t^{(1)} and \Lambda = \lambda is given by

h(c_t^{(2)} | C_t^{(1)} = c_t^{(1)}, \Lambda = \lambda).


Thus

h(c_t^{(2)} | c_t^{(1)}, \lambda)
= f(c_t, \lambda) / h(c_t^{(1)}, \lambda)
\propto (|\Sigma_t| |A_t^T\Sigma_t^{-1}A_t|)^{-1/2} (|\Sigma_{t11}| |A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}|)^{1/2}
  \times (a_0 + \sum_{l=1} a_l\lambda_l + c_t^{(1)T} K_t c_t^{(1)})^{(n_t + \sum_{l=0} h_l - p)/2}
  \times (a_0 + \sum_{l=1} a_l\lambda_l + c_t^T Q_t c_t)^{-(N_t + \sum_{l=0} h_l - p)/2}     (16)

Now, from (7),

|A_t^T\Sigma_t^{-1}A_t|
= |A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)} + (A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)})^T \Sigma_{t22.1}^{-1}
   (A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)})|
= |A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}|
  |\Sigma_{t22.1} + (A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)})
   (A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)})^{-1} (A_t^{(2)} - \Sigma_{t21}\Sigma_{t11}^{-1}A_t^{(1)})^T|
  / |\Sigma_{t22.1}|     (17)

by using Exercise 2.4 on page 32 of Rao (1973) twice. Also,

|\Sigma_t| = |\Sigma_{t11}| |\Sigma_{t22.1}|     (18)

Using the definition of G_t, (17) and (18), we have

|\Sigma_t| |A_t^T\Sigma_t^{-1}A_t| = |\Sigma_{t11}| |A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}| |G_t|     (19)
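The determinant identity (19) can likewise be confirmed numerically with arbitrary matrices; the setup below is purely illustrative and mirrors the block notation of the derivation.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, p = 5, 4, 2                          # illustrative dimensions
N = n + m
A = rng.normal(size=(N, p))
A1, A2 = A[:n], A[n:]
L = rng.normal(size=(N, N))
Sigma = L @ L.T + N * np.eye(N)            # positive definite Sigma_t
Si = np.linalg.inv(Sigma)
S11, S21, S22 = Sigma[:n, :n], Sigma[n:, :n], Sigma[n:, n:]
S11i = np.linalg.inv(S11)
S221 = S22 - S21 @ S11i @ S21.T            # Sigma_t22.1
B = A2 - S21 @ S11i @ A1
F = A1.T @ S11i @ A1                       # A^(1)' Sigma_t11^{-1} A^(1)
G = S221 + B @ np.linalg.inv(F) @ B.T      # G_t

lhs = np.linalg.det(Sigma) * np.linalg.det(A.T @ Si @ A)
rhs = np.linalg.det(S11) * np.linalg.det(F) * np.linalg.det(G)
print(np.isclose(lhs, rhs))
```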


So, using (14) and (19) in (16), the conditional pdf of C_t^{(2)} given C_t^{(1)} and \Lambda is given by

h(c_t^{(2)} | c_t^{(1)}, \lambda)
\propto |G_t|^{-1/2} (a_0 + \sum_{l=1} a_l\lambda_l + c_t^{(1)T} K_t c_t^{(1)})^{-(N_t - n_t)/2}
  \times [1 + (a_0 + \sum_{l=1} a_l\lambda_l + c_t^{(1)T} K_t c_t^{(1)})^{-1}
    (c_t^{(2)} - M_t c_t^{(1)})^T G_t^{-1} (c_t^{(2)} - M_t c_t^{(1)})]^{-(N_t + \sum_{l=0} h_l - p)/2}     (20)

Writing \nu = n_t + \sum_{l=0} h_l - p and
\Omega_t = \nu^{-1}(a_0 + \sum_{l=1} a_l\lambda_l + c_t^{(1)T} K_t c_t^{(1)}) G_t, we have

f(c_t^{(2)} | c_t^{(1)}, \lambda)
\propto |\Omega_t|^{-1/2} [1 + \nu^{-1} (c_t^{(2)} - M_t c_t^{(1)})^T \Omega_t^{-1}
  (c_t^{(2)} - M_t c_t^{(1)})]^{-(N_t - n_t + \nu)/2}     (21)

From (21) it follows that the conditional distribution of C_t^{(2)} given C_t^{(1)} = c_t^{(1)} and \Lambda = \lambda is the (N_t - n_t)-variate t distribution with degrees of freedom \nu = n_t + \sum_{l=0} h_l - p, location parameter M_t c_t^{(1)}, and scale parameter

\Omega_t = (n_t + \sum_{l=0} h_l - p)^{-1}
  (a_0 + \sum_{l=1} a_l\lambda_l + c_t^{(1)T} K_t c_t^{(1)}) G_t     (22)

Also, since f(\lambda | c_t^{(1)}) = h(c_t^{(1)}, \lambda) / f(c_t^{(1)}), we have from (15)

f(\lambda | c_t^{(1)}) \propto |\Sigma_{t11}|^{-1/2} |A_t^{(1)T}\Sigma_{t11}^{-1}A_t^{(1)}|^{-1/2}
  (a_0 + \sum_{l=1} a_l\lambda_l + c_t^{(1)T} K_t c_t^{(1)})^{-(n_t + \sum_{l=0} h_l - p)/2}
  \prod_{l=1} \lambda_l^{h_l/2 - 1}
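As a sketch of how the conditional multivariate-t result above can be used computationally, the following draws from a multivariate t with given degrees of freedom, location, and scale via the standard normal/chi-square mixture representation, and checks the sample mean against the location. All inputs here are illustrative stand-ins; in the dissertation the location would be M_t c^{(1)}, the scale \Omega_t, and the inputs would come from the HB model and the drawn \lambda.

```python
import numpy as np

rng = np.random.default_rng(5)
m, nu = 4, 8                                # dimension N_t - n_t and df (illustrative)
loc = rng.normal(size=m)                    # stands in for M_t c^(1)
L = rng.normal(size=(m, m))
Omega = L @ L.T + m * np.eye(m)             # stands in for the scale matrix Omega_t
chol = np.linalg.cholesky(Omega)

def rmvt(size):
    """Multivariate t draws: loc + (chol @ z) / sqrt(chi2_nu / nu)."""
    z = rng.normal(size=(size, m)) @ chol.T
    w = np.sqrt(rng.chisquare(nu, size=size) / nu)
    return loc + z / w[:, None]

x = rmvt(200_000)
print(np.max(np.abs(x.mean(axis=0) - loc)) < 0.05)   # for nu > 1 the mean is loc
# For nu > 2 the covariance is (nu / (nu - 2)) * Omega.
```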















BIBLIOGRAPHY


Battese, G.E., Harter, R.M., and Fuller, W.A. (1988). An error-components model for pre-
diction of county crop areas using survey and satellite data. J. Amer. Statist. Assoc.,
83, 28-36.

Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems (with
discussion). J. R. Statist. Soc. B 36, 192-236.

Blight, B.J.N. and Scott, A.J. (1973). A stochastic model for repeated surveys. J. R. Statist.
Soc. B 35, 61-66.

Brackstone, G.J. (1987). Small area data: policy issues and technical challenges. Small Area
Statistics. Eds. R. Platek, J.N.K. Rao, C.E. Sarndal, and M.P. Singh. Wiley, New
York, pp. 3-20.

Datta, G.S. (1990). Bayesian prediction in mixed linear models with applications in small
area estimation. Unpublished Ph.D. dissertation. Department of Statistics, University
of Florida.

Datta, G.S., Fay, R.E. and Ghosh, M. (1991). Hierarchical and empirical multivariate Bayes
analysis in small area estimation. Technical Report 409, Dept. Statistics, Univ. Florida.

Datta, G.S. and Ghosh, M. (1988). Minimum risk equivariant estimators of percentiles in
location-scale families of distributions. Cal. Statist. Assoc. Bull., 37, 201-207.

Datta, G.S. and Ghosh, M. (1990). On the frequentist properties of some hierarchical Bayes
predictors in finite population sampling. Technical Report, Dept. Statistics, Univ.
Florida.

Datta, G.S. and Ghosh, M. (1991). Bayesian prediction in linear models: Applications to
small area estimation. The Annals of Statistics, 19, 1748-1770.

Efron, B., and Morris, C. (1971). Limiting the risk of Bayes and empirical Bayes estimators-
Part I: the Bayes case. J. Amer. Statist. Assoc., 66, 807-815.

Ericksen, E.P. (1973). A method of combining sample survey data and symptomatic indi-
cators to obtain population estimates for local areas. Demography, 10, 137-160.