BAYESIAN PREDICTION IN MIXED LINEAR MODELS WITH APPLICATIONS IN SMALL AREA ESTIMATION

By

GAURI SANKAR DATTA

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

To my parents and teachers, with regards

ACKNOWLEDGEMENTS

I would like to express my sincere gratitude to Professor Malay Ghosh for being my advisor, for originally proposing the problem, and for the attention I have received from him over the past five years. Without his enormous patience, encouragement and guidance, it would not have been possible to complete this work. Throughout my years in the graduate program, he has been my friend, philosopher and guide; I consider myself extremely lucky to have had him as my dissertation advisor. I would like to thank Professors Michael DeLorenzo and Ronald Randles for serving on my committee. Also, I am grateful to Professors Ramon Littell, Kenneth M. Portier and P.V. for being on my Part C and oral defense committees. Special thanks go to Professors Richard Scheaffer and Malay Ghosh for their genuine interest, incessant efforts and unlimited energy, which made it possible to remove every stumbling stone out of my way to join the University of Florida. I would also like to express my gratitude to my respected teachers, especially to those who were crucial to my coming to the United States; I feel very fortunate in being able to turn to them whenever necessary. I would also like to acknowledge the help and support I received from Krishnandu Ghosh in preparing me to come to the United States. I am also grateful and highly indebted to my Alma Mater, RamaKrishna Mission Residential College, Narendrapur, West Bengal, India, for the support I received. Had I not been admitted to Narendrapur, I would have never pursued my studies in statistics. In this respect, I will always remember our Principal, Respected Swami Suparnananda Maharaj, and our Head of the Department, P.K. Giri, for their care and concern about me. I would like to offer my humble regards to them.
I will take this opportunity to express my appreciation to Professor Ashok Kumar Hazra, who took great pleasure in introducing me to Narendrapur. I also acknowledge Professor Uttam Bandyopadhyay, to whom I definitely owe a lot for my basic understanding of statistics as an undergraduate. My interest was further stimulated by the insightful teaching of Professor S.K. Chatterjee of Calcutta University when I was a master's student. I would like to thank my parents, to whom I am deeply indebted. My thanks also go to Bibekananda Nandi, Professor K.M. Senapati, Mrs. Durga Senapati, K.C. Ghosh and Mrs. Bimala Ghosh, who have always considered me a part of their family; I consider myself extremely lucky to have had the affectionate concern of Mrs. Senapati, who has always treated me like her own son. My heartfelt thanks go to my "unofficial" host family in Gainesville, Malay Ghosh and his wife, our beloved Doladi. I would like to thank A.P. Reznek of the United States Census Bureau for providing me with computing facilities during my stay in the Bureau as an ASA/NSF Research Associate. Last but not least, I would like to thank Ms. Cindy Zimmerman for her skillful typing in putting a scribbled manuscript into final form.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT

CHAPTERS

ONE   INTRODUCTION
  1.1 Literature Review
  1.2 The Subject of This Dissertation

TWO   BAYESIAN PREDICTION OF MEANS IN LINEAR MODELS: GENERAL CASE
  2.1 Introduction
  2.2 Description of the Hierarchical Bayes Model with Examples
  2.3 Hierarchical Bayes Analysis
  2.4 Applications of Hierarchical Bayes Analysis
  2.5 Hierarchical Bayes Prediction of Finite Population Mean Vector in Absence of Unit Level Observations

THREE OPTIMALITY OF BAYES PREDICTORS FOR MEANS IN A SPECIAL CASE
  3.1 Introduction
  3.2 The Hierarchical Bayes Predictor
  3.3 Best Unbiased Prediction and Stochastic Domination in Small Area Estimation
  3.4 Best Unbiased Prediction and Stochastic Domination in Infinite Population
  3.5 Best Equivariant Prediction

FOUR  ASYMPTOTIC OPTIMALITY OF HIERARCHICAL BAYES PREDICTORS FOR MEANS
  4.1 Introduction
  4.2 Model, Loss, Prior and Predictors
    4.2.1 General Expressions for the Bayes Risk Difference
    4.2.2 Random Regression Coefficients Model
    4.2.3 Nested Error Regression Model
  4.3 Optimality with Known First Stage Variance Components: the Fay-Herriot Model
  4.4 Asymptotic Optimality with Unknown Variance Components

FIVE  SIMULTANEOUS BAYESIAN ESTIMATION OF SMALL AREA VARIANCES
  5.1 Introduction
  5.2 Bayes Estimation of Quadratic Functions when Ratios of Variance Components are Known
  5.3 Asymptotic Optimality in the Nested Error Regression Model with Known Error Variance
  5.4 Asymptotic Optimality in the Nested Error Regression Model with Unknown Variance Components

SIX   SUMMARY AND FUTURE RESEARCH
  6.1 Summary
  6.2 Future Research

APPENDICES
  A  PROOF OF THEOREM 2.3.1
  B  AN INDEPENDENCE RESULT IN A FAMILY OF ELLIPTICALLY SYMMETRIC DISTRIBUTIONS

BIBLIOGRAPHY

BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

BAYESIAN PREDICTION IN MIXED LINEAR MODELS WITH APPLICATIONS IN SMALL AREA ESTIMATION

By

GAURI SANKAR DATTA

August 1990

Chairman: Malay Ghosh
Major Department: Statistics

Small area estimation has been gaining increasing popularity in recent times. Government agencies in the United States and Canada have been involved in estimating unemployment rates, per capita income, crop yield, etc., for many state and local government regions simultaneously. Typically, only a few samples are available from an individual area.
Consequently, reliable estimators of "parameters," such as the mean or the variance for an area, need to "borrow strength" from similar neighboring areas, implicitly or explicitly, through a model. Such estimators usually have smaller mean squared error of prediction than the usual survey estimators. In this dissertation a general hierarchical Bayes (HB) mixed linear model is proposed; the models considered earlier by several authors are seen to be special cases of the proposed general model. The predictive distribution of the characteristic of interest for the unsampled population units, given the observations on the sampled units, is found and is used to draw several inferences; in particular, simultaneous estimators of small area means and variances are developed. A mixed linear model with a noninformative prior for the regression coefficients (or fixed effects) and independent gamma priors (possibly noninformative) for the inverse variance components is used. As a special case of this HB analysis, when the vector of ratios of variance components is known, the HB predictor of the vector of finite population means is shown to possess some frequentist optimal properties (such as best unbiased predictor, best equivariant predictor, etc.), basically under elliptical symmetry assumptions. Performance of this HB predictor is evaluated by comparing its Bayes risk with that of the subjective Bayes predictor based on the "true" or "elicited" prior for the superpopulation parameters. It is shown that, under a balanced one-way random effects model with covariates and average squared error loss, the difference in the Bayes risks of the two predictors tends to zero as the number of small areas tends to infinity.

CHAPTER ONE
INTRODUCTION

1.1 Literature Review

The use of linear models by astronomers for predicting the positions of celestial bodies goes back several centuries. Starting from those days, model-based inference and prediction have received considerable attention. In particular, animal and plant breeders have used such models for predicting some characteristics of future progeny.
Starting with the pioneering work of Henderson (1953), considerable attention has been devoted to this problem. We refer to Gianola and Fernando (1986) and Harville (in press), where other references are cited. On the other hand, survey analysts have used the model-based approach in finite population sampling, with the goal of predicting certain characteristics of the population on the basis of the unsampled units and the observed sample. Early work on this topic may be found in Cochran (1939, 1946), where the finite population is viewed as a realization from a hypothetical superpopulation. The use of small area statistics was in existence as early as the 11th century in England and the 17th century in Canada (see Brackstone, 1987). However, these early small area statistics were based on data obtained by complete enumeration. With the availability of limited resources and the advent of sophisticated statistical methodologies, in the past few decades sample surveys have, for most purposes, been widely used as the means of data collection, in contrast to complete enumeration. The data collected from these surveys have been very effectively used to provide suitable statistics at the national and state levels on a regular basis. However, below the state level (for example, at the county or other subdivision level), the use of survey data was limited, because the estimates for these small areas usually were based on small samples and produced unacceptably large standard errors and coefficients of variation. To improve the reliability of small area statistics, it is necessary to have a much larger sample size for an individual area than can be afforded with the limited resources available. Consequently, use of the survey data for an individual area alone is not adequate. During the last few years, many countries, including the United States and Canada, have recognized the importance of small area estimation. Recently there has been growing concern among several governments with the issues of distribution, equity and disparity. There may exist subgroups within a given population which are far below the average in certain respects, thereby necessitating remedial action on the part of the government.
Before taking such an action, there is accordingly a need to identify such subgroups, and statistical data at the relevant subgroup levels must be available. Different government agencies, like the Census Bureau, the Bureau of Labor Statistics, Statistics Canada and the Central Bureau of Statistics of Norway, have been involved in obtaining estimates of population counts, adjustment factors for census counts, unemployment rates, per capita income, etc., for state and local government areas. The techniques that have emerged to face this problem, that of small area estimation, "borrow strength" from similar neighboring areas for estimation and prediction purposes. Through the use of some appropriate model and auxiliary information (possibly obtained through a complete census), these techniques achieve improved precision over the survey estimators. For a good review of the small area estimation literature one may refer to Ghosh (1990). The necessity of "borrowing strength" has been realized by many statisticians. Ericksen (1974) advocated the use of the regression method for estimating population changes in local areas. Fay and Herriot (1979) proposed an adaptation of the James-Stein estimator to survey estimates of income for small areas. Survey estimates, being based on a small sample size (usually 20 percent for a population of size less than 1000), usually have large standard errors and coefficients of variation. To rectify this, these authors first fit a regression equation to the sample estimates, using as independent variables county values, tax return data for the year 1969 and data on housing from the 1970 census. The estimate they provided for each place was a weighted average of the sample estimate and the regression estimate. Battese, Harter and Fuller (1988) considered prediction of areas under corn and soybeans for 12 counties in north-central Iowa, based on the 1978 June Enumerative Survey and LANDSAT satellite data for the sampled counties. Fuller and Harter (1987) also considered a multivariate extension of this model. There is a similar problem of prediction faced by animal breeders.
For the purpose of selecting the best animals for future breeding, they need to come up with an index for each animal under consideration. Henderson (1953, 1975) advocated the use of the best linear unbiased predictor (BLUP) of certain linear combinations of fixed and random effects, using a mixed linear model. Harville (in press) used a mixed linear model for predicting the average weight of single-birth male lambs belonging to different population lines, the progeny of dams belonging to different age categories. Harville and Fenech (1985) considered this example for estimating the heritability. Other problems based on a linear model come from varietal trials and comparative experiments. In comparative experiments, several treatments have to be compared for their effects, or some suitable contrasts have to be estimated. Multicentered clinical trials are good examples of comparative experiments (see Fleiss, 1986). Problems of this type are similar to the ones mentioned in the preceding paragraphs. The methods that have usually been proposed in model-based inference use either a variance components approach or an empirical Bayes (EB) approach, although, as pointed out by Harville (1988, in press), the distinction between the two is often superfluous. Both these procedures use certain mixed linear models for prediction purposes. First, assuming the variance components to be known, certain BLUPs or EB predictors are obtained for the unknown parameters of interest. Then the unknown variance components are estimated, typically by Henderson's method of fitting of constants or by the restricted maximum likelihood (REML) method. The resulting estimators, which can be called estimated BLUPs or EBLUPs (see Harville, 1977), are used for final prediction purposes. The empirical Bayes approach in small area estimation was first given by Fay and Herriot (1979) and later also used by Ghosh and Meeden (1986) and Ghosh and Lahiri (1987a, 1988), among others. According to this procedure, first a Bayes estimate of the unknown parameter of interest is obtained by using a normal prior or by using a linear Bayes argument (Hartigan, 1969).
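The two-step recipe described above (derive the Bayes/BLUP predictor as if the variance components were known, then plug in estimates of those components) can be sketched in a toy area-level setting of the Fay-Herriot type. Everything in the sketch — the simulated data, the crude moment estimator of the prior variance, and all variable names — is our own illustrative assumption, not a computation from this dissertation.

```python
import numpy as np

def eb_small_area(y, D, x):
    """Toy EB estimator of area means theta_i (Fay-Herriot-type setting).

    Working model (illustrative):
        y_i | theta_i ~ N(theta_i, D_i),  D_i known sampling variance,
        theta_i = b * x_i + v_i,          v_i ~ N(0, A), A unknown.
    Step 1: with A known, the Bayes/BLUP predictor is a weighted average
            of the direct estimate y_i and the regression fit.
    Step 2 (the "empirical" step): estimate A by a crude method of
            moments, truncated at zero, and plug it in.
    """
    b = np.sum(x * y) / np.sum(x * x)            # OLS fit through the origin
    resid = y - b * x
    A_hat = max(np.mean(resid ** 2 - D), 0.0)    # E[resid^2] is about A + D_i
    gamma = A_hat / (A_hat + D)                  # shrinkage weights in [0, 1]
    return gamma * y + (1.0 - gamma) * b * x

# Simulated check that "borrowing strength" helps on average.
rng = np.random.default_rng(1)
m = 400
x = rng.uniform(1.0, 2.0, m)
theta = 2.0 * x + rng.normal(0.0, np.sqrt(0.5), m)   # true area means
D = np.full(m, 1.0)                                  # known sampling variances
y = theta + rng.normal(0.0, 1.0, m)                  # direct survey estimates

theta_eb = eb_small_area(y, D, x)
mse_direct = np.mean((y - theta) ** 2)
mse_eb = np.mean((theta_eb - theta) ** 2)
print(mse_direct, mse_eb)      # the EB value should be the smaller one here
```

As the chapter goes on to note, the hard part is not this point prediction but attaching honest standard errors to it, since the extra variability of `A_hat` has no closed-form expression.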
The unknown parameters of the prior are then estimated by some classical method, like the method of moments. Although the above approach of EBLUP or EB is usually quite satisfactory for point prediction, it is very difficult to estimate the standard errors associated with these predictors. This is primarily due to the lack of closed form expressions for the mean squared errors (MSEs) of the EBLUPs or the EB predictors. Kackar and Harville (1984) suggested an approximation to the MSEs (see also Harville, 1985, 1988, in press; Harville and Jeske, 1989). Prasad and Rao (1990) proposed estimates of these approximate MSEs in three specific mixed linear models. These approximations rest heavily on the normality assumption. Recently, Lahiri and Rao (1990) considered this problem, relaxing the normality assumption but assuming some moment conditions, without the presence of auxiliary information. The work of Prasad (1990) suggests that their approximations work well when the number of small areas is sufficiently large. It is not clear, though, how these approximations fare for a small or even moderately large number of small areas. Ghosh and Lahiri (in press) proposed an HB procedure as an alternative to the EBLUP or the EB procedure. In the HB procedure, one uses the posterior mean for prediction; though often complicated, it can be found exactly via numerical integration, without approximation. The model considered by Ghosh and Lahiri (in press) was, however, only a special case of the so-called nested error regression model, also used by BHF. A similar model was considered by Stroud (1987), but the general analysis was performed only for the balanced case, that is, when the number of samples was the same for each stratum. Other models have also been proposed. In a recent article, Choudhry and Rao (1988) considered five specific models for small area estimation not included in the earlier work of Fay and Herriot (1979) and Prasad and Rao (1990). Recently, Royall and Cumberland (1989) considered certain cross-classificatory models for small area estimation, carrying out a Bayesian analysis assuming the degeneracy of certain terms in a usual two-way linear model.
For a Bayesian analysis in the context of animal breeding, one may refer to Gianola and Fernando (1986). They consider an HB analysis with subjective informative priors constructed from previous data and experiments. Also, they noted an important special case which arises in all of the above approaches and which is also important in the theory of least squares: when the ratios of variance components are known, the predictors (least squares, empirical Bayes or hierarchical Bayes) are BLUPs (Henderson, 1963). For related BLUP results for predicting scalars in finite population sampling, one may refer to Royall (1979), Royall and Cumberland (1989), Prasad and Rao (1990) and several others. Harville (1985, 1988, in press) pointed out BLUP properties of Bayesian predictors of scalars in general mixed linear models (see also Harville, 1976). Ghosh and Lahiri (in press) have extended the scalar BLUP notion of Henderson and others to show that the Bayesian predictor of the vector of finite population means is a BLUP. To conclude this discussion, we will briefly mention another problem. So far, we have considered the problem of estimating the mean in finite population sampling. Another important problem in finite population sampling is estimating the finite population variance. Ericson (1969) found the Bayes estimator of the finite population variance under a normal theory set up. Empirical Bayes estimation of the finite population variance has also been considered.

1.2 The Subject of This Dissertation

In this dissertation, we present a unified Bayesian prediction theory for linear models and for small area estimation in the context of finite population sampling. A general Bayesian model is presented which can be regarded as an extension of the ideas of Lindley and Smith (1972) to prediction. This general model can also be applied to infinite population situations, for example, animal breeding and other applications where a mixed linear model is used. In Chapter Two, we introduce a general HB model and use this model for simultaneous estimation of several small area means in finite population sampling.
Some of the widely used models in small area estimation, including the nested error regression model (Battese et al., 1988; Prasad and Rao, 1990; Stroud, 1987; Ghosh and Lahiri, in press), the random regression coefficients model (Dempster et al., 1981; Prasad and Rao, 1990), cross-classificatory models (Royall, 1979; Royall and Cumberland, 1989) and multistage sampling models (Ghosh and Lahiri, 1988; Malec and Sedransk, 1985; Scott and Smith, 1969), can be regarded as special cases of our model. The posterior distribution of the nonsampled part of the population given the data, as well as the conditional distribution, conditional mean and conditional variance of the vector of effects, are provided. These two analyses are applied to two real data sets. It is worthwhile to mention that Bayesian analysis of linear models was initiated by Hill (1965); see also Hill (1977, 1980). For a good exposition of HB analysis see Berger (1985). In Chapter Three, a special case of the HB models discussed in the previous chapter is considered, which assumes known ratios of variance components. Based on this, certain optimal properties of the HB predictors proposed in that chapter are proved. Although these results are proved within a Bayesian framework, they should also appeal to frequentists. The BLUP notion developed for real valued parameters is extended to vector valued parameters, and it is shown that the Bayesian predictors derived in this chapter are indeed BLUPs. From this, as a special case, it follows that the Bayesian predictors of the finite population mean vector and other linear parameters are BLUPs as well. The BLUP result for the finite population mean vector unifies a number of similar results derived under specific models (e.g., Royall, 1979; Ghosh and Lahiri, in press). On a suitable subclass of elliptically symmetric distributions, including but not limited to the normal, the HB predictors are shown to be best unbiased; that is, they have the smallest variance-covariance matrix within the class of unbiased predictors. Also, following Hwang (1985), we have been able to show that the BLUPs "universally" (or "stochastically") dominate the linear unbiased predictors for elliptically symmetric distributions.
The notions of "universal" and "stochastic" domination will be made precise in Chapter Three. Also, it is established that under a suitable group of transformations, the predictors are best within the class of all equivariant predictors for elliptically symmetric distributions. Jeske and Harville (1987) have shown that scalar BLUPs are best equivariant within the class of linear equivariant predictors without any distributional assumption. However, to our knowledge, the equivariance results for vector valued predictors have not been addressed before in this context in their full generality. In Chapter Four, we have established some asymptotic results regarding the Bayes risk performance of certain HB predictors of the finite population mean vector. We have shown that under average squared error loss, the Bayes risk difference between the HB predictors and the subjective Bayes predictors for the "true" prior goes to zero as the number of small areas goes to infinity. This shows that our HB predictors are "asymptotically optimal" (A.O.) in the sense of Robbins (1955). The A.O. property of certain predictors arising naturally in the context of finite population sampling was proved in Ghosh and Meeden (1986), Ghosh and Lahiri (1987a) and Ghosh, Lahiri and Tiwari (1989). Chapter Five is devoted to the simultaneous estimation of the variances of several strata. Special cases of the nested error regression model are considered in detail. By the methods developed in Chapter Four, we have proved the A.O. property of these predictors. Ghosh and Lahiri (1987b), Stroud (1987) and Lahiri and Tiwari (in press) have proved the A.O. property of certain EB predictors of finite population variances. We reemphasize that the present dissertation provides a unified Bayesian analysis in both the finite and the infinite population frameworks. For the finite population, we unify a number of models considered earlier by different authors. To our knowledge, estimates of MSEs, or good approximations thereof, are not available except for a few specific models.
The Bayesian procedures of this dissertation, on the other hand, can serve as a general recipe for handling a greater variety of problems. Also, the inferential methods of the following chapters are implementable for data analysis, especially in these days of sophisticated computing facilities.

CHAPTER TWO
BAYESIAN PREDICTION OF MEANS IN LINEAR MODELS: GENERAL CASE

2.1 Introduction

In this chapter we will consider two similar but different prediction problems simultaneously. One problem deals with the small area estimation problem in the context of finite population sampling, and the other refers to comparative experiments in the context of ANOVA, ANOCOVA or linear regression in the infinite population situation. In both cases, a mixed linear model is used. In the first case, we are interested in predicting some finite population characteristics (e.g., finite population totals or means), whereas in the second case we are interested in predicting linear functions of fixed and random effects. In the finite population sampling set up, we assume that there are m strata, stratum U_i containing a finite number N_i of units labelled u_{i1}, ..., u_{iN_i}. Let Y_{ij} denote some characteristic of interest associated with the j-th unit of the i-th stratum (j = 1, ..., N_i; i = 1, ..., m), observable at some finite cost. We are interested in predicting some linear combinations of these observables (like the finite population total or mean for each small area or domain) using a quadratic loss. For notational convenience, we will denote by y_{i1}, y_{i2}, ..., y_{in_i} a sample of size n_i from the i-th stratum. On the other hand, in the infinite population set up, we are interested in predicting linear combinations (in particular, contrasts) of fixed and random effects. Note that in this set up these quantities are not observable. For this problem too, we use a quadratic loss function. We will use the word "predictands" to refer to the quantities we want to predict in both problems. The analysis will be done in two stages; in the first stage we assume the ratios of variance components are known, whereas in the second stage we consider the more general situation where the variance components are unknown.
In Section 2.2, a general HB model is described, and a number of interesting examples arising in finite population sampling or in the infinite population set up are considered. Some of the existing models used in the context of finite population sampling are shown to be special cases of this general model. Realizing the importance of the problem, the most general situation, where the variance components are unknown, is considered in this chapter, whereas the case of known ratios of variance components is considered in the next chapter. In Section 2.3, a general mixed linear model is considered, and some prior distribution is assigned to all unknown parameters, which consist of the vector of fixed effects and the variance components. In the first part of Section 2.3, for the model introduced in Section 2.2, we have found the posterior (predictive) distribution of the characteristic of interest for the nonsampled population units, given the values of that characteristic for the sampled units, in finite population sampling. The posterior mean vector and the posterior variance-covariance matrix corresponding to the characteristic vector of the nonsampled units are also obtained from this predictive distribution. In particular, the posterior means and variances of the finite population means of the small areas are obtained. In the second half of Section 2.3, we have obtained the posterior distribution of the vector of fixed and random effects for the model introduced in Section 2.2. In Section 2.4, we have applied the results of Section 2.3 to some actual data sets. First, we consider the corn and soybeans data which appeared in Battese, Harter and Fuller (1988). Using the HB analysis developed in Section 2.3, we have derived the posterior means and posterior standard deviations for the 12 small area (county) means. The second data set, containing the weights of 62 single-birth lambs, appeared in Harville (in press). This set is analyzed by the HB methods for the infinite population set up developed in the second half of Section 2.3.
Finally, in Section 2.5, an HB analysis of the model considered by Carter and Rolph (1974), and subsequently by Fay and Herriot (1979) to estimate the per capita income of small places, is considered. In this situation unit level observations are not available, and we are interested in predicting the finite population mean for each small area. Here the sampling variances are different and are assumed to be known; also, a uniform prior is placed on the regression coefficients and a gamma prior (proper or improper) on the inverse of the prior variance.

2.2 Description of the Hierarchical Bayes Model with Examples

Consider the following hierarchical model:

(A) conditional on b, v, r and λ, Y ~ N(Xb + Zv, r^{-1}Σ);
(B) conditional on b, r and λ, v ~ N(0, r^{-1}D(λ));
(C) (b, r, λ) have a certain joint prior distribution, proper or improper.

The stages of the model can be identified with a general mixed linear model. To see this, write

Y = Xb + Zv + e,   (2.2.1)

where b is the vector of fixed effects, and v and e are mutually independent with v ~ N(0, r^{-1}D(λ)) and e ~ N(0, r^{-1}Σ); X and Z are known design matrices, Σ is a known positive definite (p.d.) matrix, while D(λ) is a p.d. matrix which is structurally known except possibly for some unknown λ; in the examples to follow, λ involves ratios of variance components. In the context of small area estimation, partition Y (N_T x 1), X (N_T x p), Z (N_T x q) and e (N_T x 1) in conformity and rewrite the model given in (2.2.1) as

Y^(1) = X^(1)b + Z^(1)v + e^(1),   Y^(2) = X^(2)b + Z^(2)v + e^(2),   (2.2.2)

where Y^(1) (n_T x 1) corresponds to the vector of sampled units and Y^(2) ((N_T - n_T) x 1) corresponds to the vector of unsampled units. We will further partition Y^(1)T = (y_1^(1)T, ..., y_m^(1)T), where y_i^(1) = (Y_{i1}, ..., Y_{in_i})^T is the n_i-component vector of sampled units from the i-th small area. Similarly, Y^(2)T = (y_1^(2)T, ..., y_m^(2)T), where y_i^(2) = (Y_{i,n_i+1}, ..., Y_{i,N_i})^T is the (N_i - n_i)-component vector of unsampled units for the i-th small area. One of our primary objectives in small area estimation is to estimate the vector of finite population means γ = (γ_1, ..., γ_m)^T, where γ_i = Σ_{j=1}^{N_i} Y_{ij} / N_i. More generally, we may be interested in predicting a vector of linear combinations AY^(1) + CY^(2) (say), for known matrices A (u x n_T) and C (u x (N_T - n_T)). For this purpose it suffices to find the predictive distribution of Y^(2) given Y^(1) = y^(1).
In the next section this will be accomplished by using the model-based approach to survey sampling. Before we consider the other problem in the infinite population set up, we identify some existing small area estimation models of several authors as special cases of (2.2.2). In what follows, we shall use the notation col_{1≤i≤k}(B_i) to denote the matrix (B_1^T, ..., B_k^T)^T, and ⊕_{i=1}^k A_i to denote the block diagonal matrix Diag(A_1, ..., A_k). First, consider the nested error regression model

Y_{ij} = x_{ij}^T b + v_i + e_{ij}   (j = 1, ..., N_i; i = 1, ..., m).   (2.2.3)

This model was considered by Battese, Harter and Fuller (1988). They assumed the v_i and e_{ij} to be mutually independent with v_i ~ N(0, (λr)^{-1}) and e_{ij} ~ N(0, r^{-1}). In this case Σ = I, X^(1) = col_{1≤i≤m}(col_{1≤j≤n_i}(x_{ij}^T)), D(λ) = λ^{-1} I_m, Z^(1) = ⊕_{i=1}^m 1_{n_i} and Z^(2) = ⊕_{i=1}^m 1_{N_i - n_i}. In the further special case of Ghosh and Lahiri (in press), x_{ij} = 1 for every j = 1, ..., N_i and i = 1, ..., m. Note that λ = V(e_{ij})/V(v_i) is a ratio of variance components. The random regression coefficients model of Dempster, Rubin and Tsutakawa (1981) (see also Prasad and Rao, 1990) is also a special case of ours. In this set up, Y^(1), Y^(2), X^(1) and X^(2) are the same as in the nested error regression model, while D(λ) = λ^{-1} I_m, Z^(1) = ⊕_{i=1}^m x_i^(1) and Z^(2) = ⊕_{i=1}^m x_i^(2), where x_i^(1) = (x_{i1}, ..., x_{in_i})^T and x_i^(2) = (x_{i,n_i+1}, ..., x_{iN_i})^T. Some of the models of Choudhry and Rao (1988) can also be treated as special cases of ours. For example, one of their models is given by

Y_{ij} = bx_{ij} + v_i x_{ij}^{1/2} + e_{ij}   (j = 1, ..., N_i; i = 1, ..., m),   (2.2.4)

with v_i ~ N(0, (rλ)^{-1}) and e_{ij} ~ N(0, r^{-1}). Here p = 1, Σ = I, and Z^(1) and Z^(2) are as in the previous model with x_{ij} replaced by x_{ij}^{1/2}. Another model considered by these authors is similar to the one given in (2.2.4), with x_{ij} and x_{ij}^{1/2} as the respective multipliers of v_i and e_{ij}. Yet another model considered by Choudhry and Rao (1988) is

Y_{ij} = bx_{ij} + v_i x_{ij}^{1/2} + e_{ij} x_{ij}^{1/2}   (j = 1, ..., N_i; i = 1, ..., m),   (2.2.5)

with v_i and e_{ij} having the same distributions as in the previous model. Here Σ = Diag(x_{11}, ..., x_{1N_1}, ..., x_{m1}, ..., x_{mN_m}), Z^(1) = ⊕_{i=1}^m u_i^(1) and Z^(2) = ⊕_{i=1}^m u_i^(2), with u_i^(1) = (x_{i1}^{1/2}, ..., x_{in_i}^{1/2})^T and u_i^(2) = (x_{i,n_i+1}^{1/2}, ..., x_{iN_i}^{1/2})^T. Next we show that a certain cross-classificatory model is also a special case of our general linear model.
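To make the nested error regression model (2.2.3) concrete, the following sketch simulates it and forms the familiar shrinkage predictor of each finite population mean when the variance ratio λ = V(e_{ij})/V(v_i) is known. All of the settings (the numbers m, N, n, the value of λ, the covariate law, and the simplifying assumption that b is known) are illustrative choices of ours, not computations from the dissertation.

```python
import numpy as np

rng = np.random.default_rng(7)
m, N, n = 200, 100, 10        # areas, units per area, sampled per area
lam = 4.0                     # known ratio  lambda = V(e_ij) / V(v_i)
sigma_e = 1.0
sigma_v = sigma_e / np.sqrt(lam)
b = 3.0                       # regression coefficient, taken as known here

x = rng.uniform(0.0, 1.0, (m, N))          # unit-level covariates
v = rng.normal(0.0, sigma_v, m)            # area effects v_i
e = rng.normal(0.0, sigma_e, (m, N))
Y = b * x + v[:, None] + e                 # Y_ij = x_ij b + v_i + e_ij

gamma_true = Y.mean(axis=1)                # finite population means gamma_i

# Only the first n units in each area are observed.
ys, xs = Y[:, :n], x[:, :n]
resid_bar = (ys - b * xs).mean(axis=1)
v_hat = (n / (n + lam)) * resid_bar        # known-lambda shrinkage predictor

# Predict each unsampled unit, then assemble the finite-mean predictor.
y_unsampled_hat = b * x[:, n:] + v_hat[:, None]
gamma_hat = (ys.sum(axis=1) + y_unsampled_hat.sum(axis=1)) / N

naive = ys.mean(axis=1)                    # area sample means, for contrast
print(np.mean((gamma_hat - gamma_true) ** 2),
      np.mean((naive - gamma_true) ** 2))
```

The weight n/(n + λ) is exactly the shrinkage factor implied by v_i ~ N(0, (λr)^{-1}) and e_{ij} ~ N(0, r^{-1}): the posterior weight on the area's average residual is σ_v²/(σ_v² + σ_e²/n) = n/(n + λ).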
For example, suppose there are m small areas labelled Within 1 ,..., each small area, units are further classified into c subgroups (socioeconomic class, age, etc ) labelled 1 ,.. The cell sizes 1 ,..., 1 ,...,  1,..., Nij) denote 121 assumed to be known. the measurement on Yijk individual (i,j)th cell. Conditional r and suppose Yijk ijk = xT .b + 1J + eijk (2.2.6) = 1,..., iid N(O, N(O, Nij; mutually (A3r)1), (A1r)1). 1,..., independent N(0, this 1 ,..., with with N(O, (A2r)) and case col col col 1 *J Y., ilk col 1 ( col c \n. +l Y., 13k col 1 col 1i ol ij Tj)x, j~c ~ j1 c Ni.ni j)} r1), (l) (1) .(2) col 1 ln.i}' 1~v i '3 m i=lj c e In. =1 is a matrix similar to with (N~~n~~) replacing n lJ in defining the dimensions of vectors. Also, (r ,... , rm r~' . Y 1 *''* 7mc) = 3 , A (A1, pm~ A21 Ic = Diag(AllIm , A31 Imc). Special cases of this model have been considered several others. Cumberland 7. are degenerate (1989) considered zeroes. Also, a model they where assumed variance rat io to be known in deriving their estimators, and did not address issue unknown appropriately. Next we show that two stage sampling model with covariates and m strata is a special case our general linear model. Suppose that stratum contains primary units. Suppose also that primary unit within denote stratum value of contains subunits. characteristic interest Yijk for the subunit within primary unit from stratum 1,..., 1 ,..., 1,..., From the stratum, a sample primary units taken. selected orimarv unit within stratum. z ~2 'A' ,(2) 2(1) u ir~ uiir. I Assume conditional r and Yijk = xijb + (k= 1,..., Nij; j = 1,..., Li 1 1 i = 1,..., m), (2.2.7) where ', ij and are mutually independent with (i N(O, (A1r) "ili N(O, (Ar) 1), N(O, col 1 col { col 1_ col 1 col 1 col u .. + nij [j 1,..., = s( 2)T W2) col 1 (Q ol col i ('i~)) col col 1 LI 1i Al sn. let+ he defined~ i1 m 1 lnrl v(i) + eijk r1) ('i~~) eijk (Yij k) .(1) (Yij (2) ilY col col (1 x. , 1 where  uij C'')), m 1 ini i=l1 S ni. 
Here λ = (λ_1, λ_2)^T and D(λ) = Diag(λ_1^{-1} I_m, λ_2^{-1} I_{L·}), where L· = Σ_{i=1}^m L_i. The ideas can be extended directly to multistage sampling, with more complicated notations. We may mention here that a Bayesian analysis for two stage sampling was first introduced by Scott and Smith (1969) in a much simpler framework. A multistage analog of their work was provided by Malec and Sedransk (1985). Now we will consider the infinite population set up. In this context, we will use the model given in (A)-(C), with the mixed linear model representation given in (2.2.1). Here we assume that the data vector Y is n_T x 1 and that the associated design matrices X and Z are n_T x p and n_T x q, respectively. Without loss of generality, we also assume that rank(X) = p. Our objective is to predict t = Sb + Tv (say) on the basis of Y, where S (u x p) and T (u x q) are known matrices. Following the model-based approach to inference, it suffices to find the conditional distribution of t given Y = y; this (conditional) posterior distribution is provided in the next section. We will conclude this section by discussing a few specific models in the context of comparative trials and animal breeding which are special cases of the general model proposed in (2.2.1). First consider a multicentered clinical trial which is conducted in c participating clinics to compare two treatments, one already existing in the market and the other newly developed. Suppose there are n_{ij} subjects receiving the i-th treatment in the j-th clinic; some of the n_{ij} could be zero. We are interested in estimating the treatment difference. Consider the model

Y_{ijk} = μ + α_i + β_j + τ_{ij} + e_{ijk}   (k = 1, ..., n_{ij}; j = 1, ..., c; i = 1, 2),   (2.2.8)

where the β_j, τ_{ij} and e_{ijk} are mutually independent, the clinic effects β_j are N(0, (λ_1 r)^{-1}), the treatment-clinic interaction effects τ_{ij} are N(0, (λ_2 r)^{-1}), the subject errors e_{ijk} are N(0, r^{-1}), and α_i is the effect due to the i-th treatment.
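The claim that (2.2.8) is a special case of the general mixed model (2.2.1) can be checked mechanically: stack the observations and build the block design matrices for the fixed effects b = (μ, α_1, α_2)^T and the random effects v = (β_1, ..., β_c, τ_{11}, ..., τ_{2c})^T. The sketch below does this for a hypothetical set of cell counts; the counts and the effect ordering are our own illustrative assumptions.

```python
import numpy as np

def design_matrices(n):
    """Build X (fixed: mu, alpha_1, alpha_2) and Z (random: beta_j, tau_ij)
    for the two-treatment, c-clinic model
        Y_ijk = mu + alpha_i + beta_j + tau_ij + e_ijk,
    given cell counts n[i][j] (i = 0, 1 treatments; j = 0..c-1 clinics).
    """
    n = np.asarray(n)
    c = n.shape[1]
    rows_X, rows_Z = [], []
    for i in range(2):
        for j in range(c):
            for _ in range(n[i, j]):      # one row per observed subject
                xr = np.zeros(3)          # (mu, alpha_1, alpha_2)
                xr[0] = 1.0
                xr[1 + i] = 1.0
                zr = np.zeros(3 * c)      # (beta_1..beta_c, tau_11..tau_2c)
                zr[j] = 1.0               # clinic effect beta_j
                zr[c + i * c + j] = 1.0   # interaction tau_ij
                rows_X.append(xr)
                rows_Z.append(zr)
    return np.array(rows_X), np.array(rows_Z)

n = [[3, 2, 0],    # treatment 1 counts in clinics 1..3 (one empty cell)
     [1, 4, 2]]    # treatment 2 counts
X, Z = design_matrices(n)
print(X.shape, Z.shape)   # n_T = 12 observations in this toy configuration
```

Note that an empty cell (n_{ij} = 0, allowed by the text) simply contributes no rows, so the same construction covers unbalanced trials.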
Writing

Y = (Y111,...,Y11n_11,...,Y1c1,...,Y1cn_1c, Y211,...,Y2cn_2c)^T,
v = (gamma_11,...,gamma_1c, gamma_21,...,gamma_2c)^T,
e = (e111,...,e1cn_1c, e211,...,e2cn_2c)^T,

with X and Z the corresponding matrices of indicator columns attaching mu_i and gamma_ij to the observations in the (i,j)th cell, it is clear that (2.2.8) is a special case of (2.2.1) with b = (mu_1, mu_2)^T, A = (A1, A2)^T and D(A) = Diag(A1^{-1}I_c, A2^{-1}I_c), where n_T = sum_{i=1}^2 sum_{j=1}^c n_ij is the total number of observations.

A model of the form (2.2.8) is also used in sire evaluation in animal breeding, and once again one can use the linear model given in (2.2.8) for inferential purposes. In this case one may be interested in predicting some suitable linear functions of the random quantities gamma_ij, known as the breeding values. These predicted breeding values can be used as a selection index for selecting the most suitable breeds for future breeding purposes.

As a concrete example in animal breeding, we will discuss an example considered by Harville (in press) which involves prediction of the average birth weights of an infinite number of offspring of the sires of single-birth male lambs in different population lines. The data consist of the weights (at birth) of 62 single-birth male lambs. These lambs came from five distinct population lines. Each lamb was the progeny of one of the rams, and each lamb had a different dam. Age of dam was recorded as belonging to one of three categories, numbered 1 (1-2 years), 2 (2-3 years) and 3 (over 3 years). Let Y_ijkd represent the weight (at birth) of the dth of those lambs that are offspring of the kth sire from the jth population line and of a dam belonging to the ith age category. Following Harville (in press), we will use the model

Y_ijkd = mu + delta_i + pi_j + s_jk + e_ijkd,   (2.2.9)

where d = 1,...,n_ijk; k = 1,...,m_j; i = 1, 2, 3 and j = 1,...,5. Here n_ijk is the number of lambs whose dams belong to the ith age category when the population line is j and the sire is k, and m_j is the total number of sires whose offspring are from the jth population line. The age effects delta_i and line effects pi_j are considered fixed effects, the sire (within line) effects s_jk are iid N(0, (rA)^{-1}), independent of the error variables e_ijkd which are iid N(0, r^{-1}). To make the design matrix associated with the fixed effects of full rank, we can take delta_3 = 0 = pi_5, which is the usual formulation needed for the GLM procedure in SAS.
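The full-rank formulation just mentioned (dropping the last age and line dummies) can be sketched by building the fixed-effects design matrix for a toy version of (2.2.9); the cell layout below is hypothetical and much smaller than the lamb data.

```python
import numpy as np

# Toy layout for model (2.2.9): 3 dam-age categories, 2 population lines,
# 2 sires per line, one lamb per (age, line, sire) cell (hypothetical counts).
ages, lines, sires_per_line = 3, 2, 2
rows = [(i, j, k) for i in range(ages) for j in range(lines)
        for k in range(sires_per_line)]

def fixed_design(rows):
    """Columns: intercept, age dummies delta_1..delta_{ages-1}, line dummies
    pi_1..pi_{lines-1}; the last age and line are baselines (set to zero)."""
    X = []
    for i, j, _ in rows:
        age = [1.0 if i == a else 0.0 for a in range(ages - 1)]
        line = [1.0 if j == l else 0.0 for l in range(lines - 1)]
        X.append([1.0] + age + line)
    return np.array(X)

X = fixed_design(rows)
print(X.shape, np.linalg.matrix_rank(X))
```

With all ages and lines represented, the matrix has full column rank, which is what the constraint delta_3 = 0 = pi_5 buys in the real data.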
Note that E(Y_ijkd) = mu + delta_i + pi_j + s_jk, and there are n_ijk observations corresponding to the (i,j,k)th cell. We are interested in predicting

w_jk = mu + n_T^{-1} sum_{i=1}^{3} n_i.. delta_i + pi_j + s_jk,   (2.2.10)

where n_i.. = sum_j sum_k n_ijk and n_T = sum_i n_i... The value w_jk can be interpreted as the average birth weight of an infinite number of male lambs that are offspring of the kth sire from the jth line.

2.3 Hierarchical Bayes Analysis

In this section, for the finite population sampling set up we provide the predictive distribution of Y^(2) given Y^(1) = y^(1), and for the infinite population set up we provide the posterior distribution of the vector of effects W = (b^T, v^T)^T given Y = y. We will use the following notations to label certain distributions used in this section. A random variable Z is said to have a gamma(alpha, beta) distribution if its pdf is

f(z) = exp(-alpha z) alpha^beta z^{beta-1} {Gamma(beta)}^{-1} I_[z>0].   (2.3.1)

A random vector T = (T1,...,Tp)^T is said to have a multivariate t-distribution with location parameter mu, scale parameter C (a p.d. p x p matrix) and degrees of freedom (d.f.) nu if its pdf is

g(t) proportional to |C|^{-1/2} {1 + nu^{-1}(t - mu)^T C^{-1}(t - mu)}^{-(nu+p)/2}   (2.3.2)

(see Zellner, 1971, p. 383, or Press, 1972, p. 136). Here |C| denotes the determinant of the square matrix C. Assume nu > 2. Then E(T) = mu and V(T) = {nu/(nu - 2)}C.

The predictive distribution is derived under the following prior.

(C1) b, R and A1,...,At are independently distributed, with b ~ uniform(R^p), R ~ gamma(a0/2, g0/2) and A_i ~ gamma(a_i/2, g_i/2), where a_i >= 0 and g_i >= 0, i = 0, 1,...,t. Allowing some a_i and g_i to be zero, some improper gamma distributions are included as a possibility in our prior.

Before stating the predictive distribution of Y^(2) given Y^(1) = y^(1), we introduce a few matrix notations. Write Sigma(A) = I + Z D(A) Z^T, and partition Sigma into

Sigma = (Sigma11  Sigma12; Sigma21  Sigma22),   (2.3.3)

conformably with the partition of Y into Y^(1) and Y^(2). Also define

M = Sigma21 Sigma11^{-1} + (X^(2) - Sigma21 Sigma11^{-1} X^(1)) (X^(1)T Sigma11^{-1} X^(1))^{-1} X^(1)T Sigma11^{-1},   (2.3.4)

K = Sigma11^{-1} - Sigma11^{-1} X^(1) (X^(1)T Sigma11^{-1} X^(1))^{-1} X^(1)T Sigma11^{-1},   (2.3.5)

G = Sigma22 - Sigma21 Sigma11^{-1} Sigma12 + (X^(2) - Sigma21 Sigma11^{-1} X^(1)) (X^(1)T Sigma11^{-1} X^(1))^{-1} (X^(2) - Sigma21 Sigma11^{-1} X^(1))^T.   (2.3.6)

Now the predictive distribution of Y^(2) given Y^(1) = y^(1) is given in the following theorem in two steps.

Theorem 2.3.1. Consider the model given in (2.2.2) and the prior (C1).
Assume that n_T + sum_{i=0}^{t} g_i > p + 2. Then:

(i) Conditional on A and Y^(1) = y^(1), Y^(2) has a multivariate t-distribution with d.f. n_T + sum_{i=0}^{t} g_i - p, location parameter My^(1), and scale parameter

(n_T + sum_{i=0}^{t} g_i - p)^{-1} (a0 + sum_{i=1}^{t} a_i A_i + y^(1)T K y^(1)) G.

(ii) The conditional pdf of A given Y^(1) = y^(1) is

f(A|y^(1)) proportional to {prod_{i=1}^{t} A_i^{g_i/2 - 1}} |Sigma11(A)|^{-1/2} |X^(1)T Sigma11^{-1}(A) X^(1)|^{-1/2} (a0 + sum_{i=1}^{t} a_i A_i + y^(1)T K y^(1))^{-(n_T + sum_{i=0}^{t} g_i - p)/2}.   (2.3.7)

Using the moments of a multivariate t-distribution and the iterated formulas for conditional expectations and variances, it follows from the above theorem that

E(Y^(2)|y^(1)) = E(My^(1)|y^(1)),   (2.3.8)

V(Y^(2)|y^(1)) = V(My^(1)|y^(1)) + E[(n_T + sum_{i=0}^{t} g_i - p - 2)^{-1} (a0 + sum_{i=1}^{t} a_i A_i + y^(1)T K y^(1)) G | y^(1)],   (2.3.9)

where, on the rhs, the expectation and variance are taken with respect to the distribution of A given in (2.3.7).

Using (2.3.8) and (2.3.9), it is possible to find the posterior mean and variance of gamma(Y^(1), Y^(2)) = AY^(1) + CY^(2), where A and C are known matrices. The Bayes estimate of gamma(Y^(1), Y^(2)) under any quadratic loss is the posterior mean, and, using (2.3.8), is given by

e_B(y^(1)) = Ay^(1) + C E(My^(1)|y^(1)).   (2.3.10)

Similarly, using (2.3.9), one may obtain

V(gamma(Y^(1), Y^(2))|y^(1)) = C V(Y^(2)|y^(1)) C^T.   (2.3.11)

Note that when A = Diag(1_{n_1}^T,...,1_{n_m}^T) and C = Diag(1_{N_1-n_1}^T,...,1_{N_m-n_m}^T), gamma(Y^(1), Y^(2)) reduces to the vector of finite population totals for the m small areas, while for the choice A = Diag(N_1^{-1}1_{n_1}^T,...,N_m^{-1}1_{n_m}^T) and C = Diag(N_1^{-1}1_{N_1-n_1}^T,...,N_m^{-1}1_{N_m-n_m}^T), gamma(Y^(1), Y^(2)) reduces to the vector of finite population means for the m small areas.

Now we will get back to the infinite population set up to provide the posterior distribution of W = (b^T, v^T)^T given Y = y in the next theorem. The proof of this theorem will be omitted because of its similarity to the proof of Theorem 2.3.1. We will consider the model given in (A) of Section 2.2 and the prior (C1). Recall from the middle of Section 2.2 that we have redefined the dimensions of Y, X, Z and e appearing there as Y(n_T x 1), X(n_T x p), Z(n_T x q) and e(n_T x 1). Also, we have assumed rank(X) = p. Now we will state the theorem.

Theorem 2.3.2. Consider the model stated above, and assume that n_T + sum_{i=0}^{t} g_i > p + 2. Then, conditional on A and Y = y, W has a multivariate t-distribution with d.f. n_T + sum_{i=0}^{t} g_i - p, location parameter W*(y) and scale parameter (n_T + sum_{i=0}^{t} g_i - p)^{-1} (a0 + sum_{i=1}^{t} a_i A_i + y^T Q y) C*, where, writing D for D(A),

Q = Q(A) = Sigma^{-1} - Sigma^{-1} X (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1},   (2.3.12)

W*(y) = ((X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} y ; D Z^T Q y),   (2.3.13)

C* = ((X^T Sigma^{-1} X)^{-1},  -(X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} Z D ;  -D Z^T Sigma^{-1} X (X^T Sigma^{-1} X)^{-1},  D - D Z^T Q Z D).   (2.3.14)
Also, the conditional pdf of A given Y = y is

f(A|y) proportional to {prod_{i=1}^{t} A_i^{g_i/2 - 1}} |Sigma(A)|^{-1/2} |X^T Sigma^{-1}(A) X|^{-1/2} (a0 + sum_{i=1}^{t} a_i A_i + y^T Q y)^{-(n_T + sum_{i=0}^{t} g_i - p)/2}.   (2.3.15)

Again using the moments of the multivariate t-distribution and the iterated formulas for expectation and variance, the above theorem can be used to find computational formulas for E(W|y) and V(W|y) as in (2.3.8) and (2.3.9). Similarly, one can find e_B(y) (say) as in (2.3.10) and V(gamma(b, v)|y) as in (2.3.11), where gamma(b, v) = Sb + Tv with known matrices S(u x p) and T(u x q). Applications of these two theorems will be considered in Section 2.4 for some actual data sets. There we will carry out an HB analysis of the data sets which appeared in Battese et al. (1988) and Harville (in press).

Before we conclude this section, we will make a final observation. A comparison of (2.3.4)-(2.3.7) with (2.3.12)-(2.3.15) reveals that if we replace y by y^(1), X by X^(1), Sigma by Sigma11 and Q by K, then from f(A|y) as given in (2.3.15) we obtain f(A|y^(1)) as given in (2.3.7). This observation will be referred to in Section 2.4.

2.4 Applications of Hierarchical Bayes Analysis

This section concerns the analysis of two real data sets using the HB procedures suggested in Section 2.3. The first data set is related to the prediction of areas under corn and soybeans for 12 counties in north-central Iowa, based on the 1978 June Enumerative Survey as well as LANDSAT satellite data. It appeared in Battese, Harter and Fuller (BHF) (1988), who conducted a variance components analysis for this problem. The second data set originally appeared in Harville and Fenech (1985) and reappeared in Harville (in press), where he conducted a variance components as well as an HB analysis to predict w_jk, given in (2.2.10), the average birth weight of an infinite number of single-birth male lambs that are offspring of the kth sire from the jth population line.

We will first consider the first data set, and to start with we briefly give a background of this problem. The USDA Statistical Reporting Service field staff determined the area of corn and soybeans in 37 sample segments (each segment about 250 hectares) of 12 counties in north-central Iowa by interviewing farm operators.
Based on LANDSAT readings obtained during August and September 1978, the number of hectares of corn and soybeans (from the June Enumerative Survey), the number of pixels classified as corn and soybeans for each sample segment, and the county mean number of pixels classified as corn and soybeans (the total number of pixels classified as that crop in the county divided by the number of segments in that county) are reported in Table 3 of BHF and, for ready reference, are reproduced in our Table 2.1. In order to make our results comparable with those of BHF, the second segment in Hardin county was ignored. The model considered by BHF is

Y_ij = b0 + b1 x1ij + b2 x2ij + v_i + e_ij,   (2.4.1)

where i is a subscript for the county, j is a subscript for a segment within the given county, and n_i is the number of sample segments in the ith county (i = 1,...,12). Here x1ij is the number of pixels of corn and x2ij is the number of pixels of soybeans for the jth segment in the ith county. They assumed (in our notations) E(v_i) = E(e_ij) = 0, V(v_i) = (Ar)^{-1}, V(e_ij) = r^{-1}, Cov(v_i, v_i') = 0 (i not equal to i'), Cov(v_i, e_i'j) = 0, and Cov(e_ij, e_i'j') = 0 ((i,j) not equal to (i',j')). We are interested in predicting the finite population means Ybar_i, i = 1,...,12.

Table 2.1
Survey and Satellite Data for Corn and Soybeans in 12 Iowa Counties
(Columns: County; number of sample segments; reported hectares of corn and soybeans in each sample segment; number of pixels of corn and soybeans in each sample segment; county mean number of pixels of corn and soybeans per segment. The counties are Cerro Gordo, Hamilton, Worth, Humboldt, Franklin, Pocahontas, Winnebago, Wright, Webster, Hancock, Kossuth and Hardin; the entries are reproduced from Table 3 of BHF.)

The finite population mean Ybar_i can be written as

Ybar_i = b0 + b1 xbar_1i(p) + b2 xbar_2i(p) + v_i + ebar_i,

where ebar_i = N_i^{-1} sum_{j=1}^{N_i} e_ij, xbar_1i(p) = N_i^{-1} sum_{j=1}^{N_i} x1ij and xbar_2i(p) = N_i^{-1} sum_{j=1}^{N_i} x2ij. Under the assumptions of model (2.4.1),

theta_i = b0 + b1 xbar_1i(p) + b2 xbar_2i(p) + v_i

can be interpreted as the conditional mean hectares of corn (or soybeans) per segment, given the realized county effect and the values of the satellite data.
Clearly theta_i, being an average over all the segments in the county, is not equivalent to the finite population mean Ybar_i, since ebar_i is not identically 0. However, if the N_i (i = 1,...,m) are large, or if the sampling rates n_i/N_i are small, then either one is an appropriate predictor of the other. In this example the true sampling rates n_i/N_i are small, so a predictor of theta_i is also an appropriate predictor of Ybar_i. BHF were interested in predicting the theta_i. First assuming A and r known, they obtained the BLUPs of theta_i (i = 1,...,12). Then, using Henderson's Method 3, they obtained estimates of the variance components, so that their final predictors involved estimated variance components. Henderson's method, being an ANOVA method, could lead to negative estimates of the variance components; if this were the case, the estimate was set equal to zero.

Since (2.4.1) is a special case of the nested error regression model, we will now develop expressions for the posterior means and posterior variances of the Ybar_i given by (2.3.8) and (2.3.9). Here we have t = 1 and D(A) = A^{-1} I_m, so that

Sigma11^{-1} = direct sum over i = 1,...,m of (I_{n_i} - (n_i + A)^{-1} J_{n_i}),   |Sigma11| = prod_{i=1}^{m} (A + n_i)/A,

where J_n denotes the n x n matrix of ones. Also, writing x_ij = (1, x1ij, x2ij)^T, xbar_i(s) = n_i^{-1} sum_{j=1}^{n_i} x_ij and ybar_i(s) = n_i^{-1} sum_{j=1}^{n_i} y_ij, one gets

X^(1)T Sigma11^{-1} X^(1) = sum_{i=1}^{m} {sum_{j=1}^{n_i} x_ij x_ij^T - n_i^2 (n_i + A)^{-1} xbar_i(s) xbar_i(s)^T} = H(A) (say),   (2.4.2)

X^(1)T Sigma11^{-1} y^(1) = sum_{i=1}^{m} {sum_{j=1}^{n_i} x_ij y_ij - n_i^2 (n_i + A)^{-1} xbar_i(s) ybar_i(s)},

y^(1)T K y^(1) = sum_{i=1}^{m} {sum_{j=1}^{n_i} y_ij^2 - n_i^2 (n_i + A)^{-1} ybar_i(s)^2} - (X^(1)T Sigma11^{-1} y^(1))^T H^{-1}(A) (X^(1)T Sigma11^{-1} y^(1)) = Q0(A) (say),   (2.4.3)

and, with p = 3,

f(A|y^(1)) proportional to A^{(g1+m)/2 - 1} {prod_{i=1}^{m} (A + n_i)}^{-1/2} |H(A)|^{-1/2} (a0 + a1 A + Q0(A))^{-(n_T + g0 + g1 - p)/2}.   (2.4.4)

Next, writing bhat(A) = H^{-1}(A) X^(1)T Sigma11^{-1} y^(1), f_i = (N_i - n_i)/N_i and xtilde_i = (N_i - n_i)^{-1} sum_{j=n_i+1}^{N_i} x_ij, the posterior means and variances of the finite population means are given by

E(Ybar_i|y^(1)) = (1 - f_i) ybar_i(s) + f_i E[n_i (n_i + A)^{-1} ybar_i(s) + {xtilde_i - n_i (n_i + A)^{-1} xbar_i(s)}^T bhat(A) | y^(1)] = e_i^HB (say),   (2.4.5)

V(Ybar_i|y^(1)) = f_i^2 V[n_i (n_i + A)^{-1} ybar_i(s) + d_i(A)^T bhat(A) | y^(1)]
  + f_i^2 E[(n_T + g0 + g1 - p - 2)^{-1} (a0 + a1 A + Q0(A)) {(N_i - n_i)^{-1} + (n_i + A)^{-1} + d_i(A)^T H^{-1}(A) d_i(A)} | y^(1)]
  = s_i^HB2 (say),   (2.4.6)

where d_i(A) = xtilde_i - n_i (n_i + A)^{-1} xbar_i(s).
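The quantities H(A), X^(1)T Sigma11^{-1} y^(1) and bhat(A) involve only areawise sums, so they are cheap to evaluate at any trial value of A. A minimal numerical sketch on synthetic data (all sizes, coefficients and the value of A below are hypothetical, not the thesis's actual computation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic nested-error data: m areas, n_i sampled units, x_ij = (1, x1ij, x2ij).
m, p = 4, 3
n = np.array([3, 5, 4, 6])
X = [np.column_stack([np.ones(ni), rng.uniform(100, 400, (ni, 2))]) for ni in n]
y = [20 + 0.1*Xi[:, 1] + 0.05*Xi[:, 2] + rng.normal(0, 5, len(Xi)) for Xi in X]

def H_and_bhat(A):
    """H(A) = X(1)' Sigma11^-1 X(1) and bhat(A) = H^-1(A) X(1)' Sigma11^-1 y(1),
    accumulated areawise as in (2.4.2)."""
    H = np.zeros((p, p))
    c = np.zeros(p)
    for ni, Xi, yi in zip(n, X, y):
        xbar, ybar = Xi.mean(axis=0), yi.mean()
        shrink = ni**2 / (ni + A)
        H += Xi.T @ Xi - shrink * np.outer(xbar, xbar)
        c += Xi.T @ yi - shrink * xbar * ybar
    return H, np.linalg.solve(H, c)

A = 2.0
H, bhat = H_and_bhat(A)
# Conditional (on A) posterior mean for area 0, the bracketed term of (2.4.5),
# here using the sample covariate mean in place of xtilde_i for illustration:
w0 = n[0] / (n[0] + A)
cond_mean = w0 * y[0].mean() + (1 - w0) * (X[0].mean(axis=0) @ bhat)
print(H.shape, np.allclose(H, H.T))
```

Repeating this over a grid of A values and weighting by (2.4.4) gives the outer expectations in (2.4.5) and (2.4.6).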
One can similarly obtain the posterior covariances Cov(Ybar_i, Ybar_k|y^(1)) (i not equal to k). Although such covariances may be necessary for providing a simultaneous confidence set for the finite population mean vector, we will not use them here.

Before we find the posterior means and variances of the theta_i in the infinite population set up, we give a general discussion comparing HB predictors with EB predictors. Writing

g_1i(A) = (1 - f_i) ybar_i(s) + f_i [n_i (n_i + A)^{-1} ybar_i(s) + d_i(A)^T bhat(A)],   (2.4.8)

g_2i(A) = f_i^2 (n_T + g0 + g1 - p - 2)^{-1} (a0 + a1 A + Q0(A)) {(N_i - n_i)^{-1} + (n_i + A)^{-1} + d_i(A)^T H^{-1}(A) d_i(A)},   (2.4.9)

with bhat(A) = H^{-1}(A) X^(1)T Sigma11^{-1} y^(1) and d_i(A) = xtilde_i - n_i (n_i + A)^{-1} xbar_i(s), we have from (2.4.5) and (2.4.6) that

e_i^HB = E[g_1i(A)|y^(1)],   (2.4.10)

V1i = V[g_1i(A)|y^(1)],  V2i = E[g_2i(A)|y^(1)],   (2.4.11)

s_i^HB2 = V1i + V2i.   (2.4.12)

In an EB analysis, to obtain the EB predictor, one usually replaces E[g_1i(A)|y^(1)] by g_1i(Ahat) = e_i^EB (say), where Ahat is some estimate of A, which can be an ML, REML or ANOVA estimate, and one reports as a naive measure of the posterior variance s_i^EB2 = g_2i(Ahat) (say). Usually, the point estimates e_i^HB and e_i^EB are not too far apart. But in using s_i^EB2 to measure the posterior variance, we may underestimate the actual measure because of the failure to account for the uncertainty in the estimation of A. We may grossly underestimate the actual measure if g_1i(A) varies too much within the main body of the posterior distribution of A; in this case V1i will be significantly large. We will see in this example that for some of the counties this is indeed the case: s_i^HB - s_i^EB usually increases with the relative difference between e_i^HB and e_i^EB.

Now, to develop expressions for the posterior means and variances of the theta_i, we use Theorem 2.3.2. Note that, by the observation made at the end of Section 2.3, the posterior distribution of A is given by f(A|y^(1)) in (2.4.4). After considerable simplifications in this particular case, we obtain

E(theta_i|y^(1)) = E[n_i (n_i + A)^{-1} ybar_i(s) + {xbar_i(p) - n_i (n_i + A)^{-1} xbar_i(s)}^T bhat(A) | y^(1)] = e_i^HB* (say),   (2.4.13)

V(theta_i|y^(1)) = V[n_i (n_i + A)^{-1} ybar_i(s) + d_i*(A)^T bhat(A) | y^(1)]
  + E[(n_T + g0 + g1 - p - 2)^{-1} (a0 + a1 A + Q0(A)) {(n_i + A)^{-1} + d_i*(A)^T H^{-1}(A) d_i*(A)} | y^(1)] = s_i^HB*2 (say),   (2.4.14)

where xbar_i(p) = (1, xbar_1i(p), xbar_2i(p))^T, d_i*(A) = xbar_i(p) - n_i (n_i + A)^{-1} xbar_i(s), and H(A) and Q0(A) are as given in (2.4.2) and (2.4.3).
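The practical point of the decomposition s_i^HB2 = V1i + V2i is that the naive EB measure g_2i(Ahat) ignores V1i entirely. A small numerical sketch with a hypothetical discretized posterior for A (all numbers below are illustrative):

```python
import numpy as np

# Hypothetical posterior grid for A with the two components of (2.4.11).
A_grid = np.array([1.0, 2.0, 4.0, 8.0])
w = np.array([0.2, 0.4, 0.3, 0.1])           # f(A | y) weights on the grid
g1 = np.array([118.0, 119.0, 120.5, 121.5])  # g_1i(A): conditional posterior mean
g2 = np.array([30.0, 28.0, 26.0, 25.0])      # g_2i(A): conditional posterior variance

e_HB = np.sum(w * g1)                    # E[g_1i(A) | y], cf. (2.4.10)
V1 = np.sum(w * (g1 - e_HB)**2)          # V[g_1i(A) | y]
V2 = np.sum(w * g2)                      # E[g_2i(A) | y]
s2_HB = V1 + V2

A_hat = A_grid[np.argmax(w)]             # a stand-in point estimate of A
s2_EB = g2[np.argmax(w)]                 # naive EB measure g_2i(A_hat)
print(s2_EB < s2_HB)                     # the naive measure omits V1
```

Whenever g_1i(A) moves appreciably over the region where f(A|y) has mass, V1 is nonnegligible and the EB standard error is too small.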
Note that since f_i tends to 1 and xtilde_i - xbar_i(p) tends to 0 as N_i tends to infinity, it can be seen informally that the rhs of (2.4.5) and (2.4.6) approach in the limit those of (2.4.13) and (2.4.14), respectively.

We will now get back to the actual analysis of the data set given in Table 2.1. We use the formulas (2.4.5), (2.4.6), (2.4.8), (2.4.9), (2.4.13) and (2.4.14) to obtain the HB posterior means and variances of the population means for the 12 counties. The HB approach eliminates the possibility of obtaining zero estimates of the variance components. A number of different priors for R and A, both informative and noninformative, were tried. The results for the posterior means were quite similar, whereas the posterior variances varied by approximately as much as 10%. For illustration purposes, we have decided to report our analysis for the prior with a0 = 0.005, g0 = 0, a1 = 0.005 and g1 = 0; since the choice a1 = 0 gives an improper posterior distribution, we took a1 to be a small positive number.

Table 2.2
Predicted Hectares of Corn and Associated Standard Errors (a0 = .005, g0 = 0, a1 = .005, g1 = 0)

County       eHB    eEB    eBHF   sHB   sEB   sBHF
Cerro Gordo  122.1  122.2  122.2  9.3   9.4   10.3
Franklin     143.6  144.2  145.3  6.9   6.4   6.7
Hamilton     126.2  126.2  126.5  9.2   9.3   10.1
Hancock      124.6  124.4  124.2  5.3   5.3   5.5
Hardin       142.6  143.0  143.5  5.8   5.6   5.8
Humboldt     108.9  108.5  107.7  8.2   7.9   8.4
Kossuth      107.7  106.9  106.1  5.8   5.2   5.4
Pocahontas   111.8  112.1  112.9  6.6   6.4   6.8
Webster      114.9  115.3  116.0  5.9   5.7   6.0
Winnebago    113.3  112.8  112.1  6.6   6.4   6.8
Worth        107.1  106.8  105.6  9.9   9.1   10.0
Wright       122.0  122.0  122.1  6.4   6.5   6.9

Table 2.2 provides the values of e_i^HB, e_i^EB and e_i^BHF and their respective associated standard errors s_i^HB, s_i^EB and s_i^BHF for the corn data. Table 2.3 provides the values of e_i^HB, e_i^HB*, e_i^EB and e_i^BHF for the soybeans data for the same choice of the prior hyperparameters, whereas Table 2.4 provides their respective standard errors along with the components of s_i^HB2.
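The comparison columns appearing in Tables 2.3 and 2.4 below are straightforward to recompute from the tabulated values. Using the Franklin county row (the table entries were presumably computed from unrounded quantities, so the recomputed last digit can differ slightly):

```python
# Recomputing the comparison columns of Tables 2.3 and 2.4 for one county
# (Franklin; inputs are rounded table values, so outputs can differ in the
# last digit from the tabulated 1.80 and 18.0).
e_HB, e_EB = 67.1, 65.9          # predicted hectares of soybeans, Table 2.3
V1, V2 = 11.94, 54.92            # variance components, Table 2.4

rel_diff = abs(e_HB - e_EB) / e_HB * 100.0   # percent relative difference
contrib = V1 / (V1 + V2) * 100.0             # percent contribution of V1
print(round(rel_diff, 2), round(contrib, 1))
```

Both quantities are large for Franklin, which is exactly the pattern discussed below: a big HB-EB discrepancy goes with a big contribution of V1.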
The values of e_i^BHF and s_i^BHF presented in Tables 2.2 and 2.4 are computed using FORTRAN from the formulas given in their paper, and are slightly different from the values reported in Battese et al. (1988). From Tables 2.2 and 2.3, for predicting both corn and soybeans, one can see that e_i^HB, e_i^HB*, e_i^EB and e_i^BHF are quite close to each other. From Tables 2.2 and 2.4, s_i^EB and s_i^HB appear to be smaller than s_i^BHF. But since s_i^EB is a naive posterior s.d., it is probably an underestimate of the true measure. From Tables 2.3 and 2.4 we find hardly any difference either between e_i^HB and e_i^HB* or between their standard errors s_i^HB and s_i^HB*. This is what we anticipated for this data set. To draw a clear comparison between the HB and EB procedures, we added one extra column at the end of each of Tables 2.3 and 2.4. The last column of Table 2.3 measures the percent relative difference 100 x |e_i^HB - e_i^EB|/e_i^HB between the EB and HB predicted values, whereas the last column of Table 2.4 measures the percent contribution 100 x V1/(V1 + V2) of V1 to the posterior variance.

Table 2.3
The Predicted Hectares of Soybeans Obtained Using Different Procedures (a0 = .005, g0 = 0, a1 = .005, g1 = 0)

County       eHB    eHB*   eEB    eBHF   |eHB-eEB|/eHB x 100%
Cerro Gordo  78.8   78.8   78.2   77.5   0.78
Franklin     67.1   67.1   65.9   64.8   1.80
Hamilton     94.4   94.4   94.6   96.0   0.21
Hancock      100.4  100.4  100.8  101.1  0.40
Hardin       75.4   75.4   75.1   74.9   0.39
Humboldt     81.9   82.0   80.6   79.2   1.71
Kossuth      118.2  118.2  119.2  120.2  0.84
Pocahontas   113.9  113.9  113.7  113.8  0.18
Webster      110.0  110.0  109.7  109.6  0.37
Winnebago    97.3   97.3   98.0   98.7   0.72
Worth        87.8   87.8   87.2   86.6   0.68
Wright       111.9  111.9  112.4  112.9  0.45

Table 2.4
Standard Errors Associated with Different Predictors of Hectares of Soybeans (a0 = .005, g0 = 0, a1 = .005, g1 = 0)

County       sHB   sHB*  sEB   sBHF  V1     V2      V1/(V1+V2) x 100%
Cerro Gordo  11.7  11.7  11.6  12.7  7.67   128.59  5.1
Franklin     8.2   8.2   7.5   7.8   11.94  54.92   18.0
Hamilton     11.2  11.2  11.4  12.4  1.97   123.61  1.6
Hancock      6.2   6.3   6.1   6.3   1.35   37.59   3.4
Hardin       6.5   6.5   6.5   6.6   0.37   41.84   0.9
Humboldt     10.4  10.4  9.9   10.0  22.62  85.40   20.9
Kossuth      6.6   6.7   6.0   6.2   7.99   36.23   18.1
Pocahontas   7.5   7.5   7.5   7.9   0.06   55.98   0.1
Webster      6.6   6.7   6.6   6.8   0.64   43.51   1.5
Winnebago    7.7   7.8   7.5   7.9   4.11   55.70   6.9
Worth        11.1  11.1  11.1  12.1  4.06   118.17  3.3
Wright       7.7   7.7   7.6   8.0   1.62   57.48   2.7

The contribution of V1 usually increases with the relative difference |e_i^HB - e_i^EB|/e_i^HB. In particular, for the counties Franklin, Humboldt and Kossuth these relative differences are as high as 1.80%, 1.71% and 0.84%, and the corresponding contributions of V1 are as high as 18.0%, 20.9% and 18.1%, making s_i^EB nonnegligibly smaller than s_i^HB for these counties. Thus, if one uses a naive EB or estimated BLUP approach, he will tend to underestimate the mean squared error (MSE) of prediction. One should note that though BHF used an estimated BLUP, they tried to account for the uncertainty involved in the estimation of the variance components in their approximations of the MSE. Similar approximations of the MSE of prediction have been suggested by Kackar and Harville (1984), Prasad and Rao (1990) and Lahiri (1990).

Now we will consider the lamb-weight data set of Harville (in press) given in Table 2.5. The background of the data set was presented in the example of Section 2.2. We will use a model similar to the one given in (2.2.9) to analyze the data set. There we assumed, following Harville (in press), the population line effects as fixed. For the purpose of illustration, we will here assume the population line effects to be random.
This gives a model with three variance components. Since the design matrix associated with the (age of dam) fixed effects is of full column rank, we now have the following mixed linear model:

Y_ijkd = delta_i + pi_j + s_jk + e_ijkd,   (2.4.15)

d = 1,...,n_ijk; k = 1,...,m_j; i = 1, 2, 3; j = 1,...,5, where the n_ijk and m_j are the same as in (2.2.9) and the delta_i are fixed age effects. Moreover, we assume that the pi_j are N(0, (rA1)^{-1}), the s_jk are N(0, (rA2)^{-1}) and the e_ijkd are N(0, r^{-1}), all mutually independent. We want to predict w_jk given in (2.2.10). Using (2.4.15), we will rewrite w_jk as

w_jk = n_T^{-1} sum_{i=1}^{3} n_i.. delta_i + pi_j + s_jk,   (2.4.16)

where n_i.. and n_T are as given in (2.2.10). We will carry out a noninformative Bayesian analysis using a uniform(R^3) prior for delta = (delta_1, delta_2, delta_3)^T and independent gamma(a0/2, g0/2), gamma(a1/2, g1/2) and gamma(a2/2, g2/2) priors for R, A1 and A2 respectively. Using Theorem 2.3.2, w_jk being a linear combination of fixed and random effects, we can find its posterior mean E(w_jk|y) = e_jk^HB (say) and its posterior variance V(w_jk|y) = s_jk^HB2 (say).

For example, the choice a_i = 0 = g_i (i = 0, 1, 2) of the hyperparameters of the variance components gives a noninformative choice of prior, but the choice a1 = 0 or a2 = 0 will give an improper posterior distribution of (A1, A2). So we tried several combinations of these hyperparameters which are small positive numbers. Our findings for this data set are not different in spirit from those for the first data set. We report our analysis for a0 = 0.0005, a1 = 0.05, a2 = 0.01 and g0 = g1 = g2 = 0 in Table 2.6. The estimated BLUPs for w13 and w56 reported in Harville (in press) are 10.98 and 10.29 respectively, whereas the corresponding values we obtained using a noninformative HB analysis are 11.0 and 10.4 respectively. The agreement between the two sets of estimates is remarkably close considering the fact that the underlying models (2.2.9) and (2.4.15) are not identical. Harville (in press) also estimated the difference w13 - w56 and the associated MSE of prediction using both the variance components approach and the HB approach. The estimated MSE of w13 - w56 for the naive EBLUP approach was (0.955)^2, whereas it was (1.053)^2 with the Kackar and Harville (1984), or equivalently the Prasad and Rao (1990), approximation.
Table 2.5
Birth Weights (in pounds) of Lambs
(Columns: Sire, Dam Age, Weight, arranged by population line, Lines 1 to 5. The individual weights of the 62 lambs range from 5.9 to 15.5 pounds.)

Table 2.6
Predicted Birth Weights of Lambs and Associated Standard Errors (a0 = .0005, a1 = .05, a2 = .01, g0 = g1 = g2 = 0)
(Columns: Line, Sire, e*, s*, eHB, sHB. The predicted weights range from about 10.1 to 11.9 pounds, with standard errors from about 0.53 to 0.90.)

The HB estimate of w13 - w56 in this case was reported as 1.042, with 0.69 as the posterior s.d.; the corresponding values obtained using our approach are 0.99 and 0.60 respectively.

To conclude this section: from whatever we have learned from the noninformative HB analysis of these data sets, the HB method is clearly a viable alternative to the usual EB or variance components approach, and it should be given serious consideration for prediction both in finite population sampling and in the infinite population situation.

2.5 Hierarchical Bayes Prediction of the Finite Population Mean Vector in the Absence of Unit Level Observations

Sometimes it is either difficult or impossible to obtain information at the unit level for small areas. In this section we will derive the predictor of the finite population mean vector when we do not have observations at the unit level. For the ith (i = 1,...,m) small area with N_i units, assume that, based on a sample of size n_i, we know only the sample mean ybar_i(s) of the characteristic of interest; also, we have a vector x_i (p x 1) of information on the auxiliary variables. The finite population mean for the ith small area is Ybar_i = N_i^{-1} sum_{j=1}^{N_i} Y_ij, based on Y_i = (Y_i1,...,Y_iN_i)^T, and we want to predict gamma = (Ybar_1,...,Ybar_m)^T given ybar(s) = (ybar_1(s),...,ybar_m(s))^T. Consider the following model.

(i) Conditional on theta = (theta_1,...,theta_m)^T, b and delta, the Y_i (N_i x 1) ~ N(theta_i 1_{N_i}, R_i^{-1} I_{N_i}), i = 1,...,m, independently, where the R_i^{-1} are known sampling variances.

(ii) Conditional on b and delta, the theta_i ~ N(x_i^T b, delta^{-1}), i = 1,...,m, independently.
(iii) b and delta are independently distributed, with b ~ uniform(R^p) and delta ~ gamma(a/2, g/2), where a > 0 and g >= 0.

Combining (i) and (ii), we have, conditional on b and delta,

Y_i ~ N((x_i^T b) 1_{N_i}, R_i^{-1} I_{N_i} + delta^{-1} J_{N_i}),  i = 1,...,m, independently.

Carter and Rolph (1974) introduced this type of model. Fay and Herriot (1979) considered an EB approach to this problem in a special case. They assumed, in place of (i), that conditional on theta, b and delta, the sample means ybar_i(s) ~ N(theta_i, V_i), i = 1,...,m, independently, with sampling variances V_i. Not knowing the sampling variances, they estimated them, iteratively applying a generalized least squares procedure to

ybar_i(s) ~ N(x_i^T b, V_i + delta^{-1}),  i = 1,...,m, independently,

where ybar_i(s) is the sample mean based on n_i units. They estimated theta_i, i = 1,...,m, based on their superpopulation model, whereas we are interested in predicting the finite population means Ybar_i, i = 1,...,m, based on (i)-(iii).

Now we will get back to our problem. For the sake of notational simplicity, we will assume, without any loss of generality, that the sample mean is based on the first n_i units, ybar_i(s) = n_i^{-1} sum_{j=1}^{n_i} Y_ij. Now first define

alpha_i = (N_i - n_i)^{-1} sum_{j=n_i+1}^{N_i} Y_ij,  i = 1,...,m,

the mean of the nonsampled units, so that, with f_i = (N_i - n_i)/N_i, we have Ybar_i = (1 - f_i) ybar_i(s) + f_i alpha_i. Since each f_i is known, to predict the vector gamma it is enough to find the predictive distribution of alpha = (alpha_1,...,alpha_m)^T given the sample mean vector ybar(s). Under any quadratic loss, the predictor of Ybar_i is given by

E(Ybar_i|ybar(s)) = (1 - f_i) ybar_i(s) + f_i E(alpha_i|ybar(s)),   (2.5.1)

and its posterior variance is given by

V(Ybar_i|ybar(s)) = f_i^2 V(alpha_i|ybar(s)).   (2.5.2)

Now from (i), given theta, b and delta, (ybar_1(s),...,ybar_m(s), alpha_1,...,alpha_m)^T is multivariate normal (MVN) with mean (theta_1,...,theta_m, theta_1,...,theta_m)^T and variance Diag(sigma_1^2,...,sigma_{2m}^2), where sigma_i^2 = (R_i n_i)^{-1} and sigma_{i+m}^2 = {R_i (N_i - n_i)}^{-1}, i = 1,...,m. From this it is easy to derive that, given b and delta, alpha is MVN with

E(alpha_i|ybar(s), b, delta) = x_i^T b + delta^{-1} (delta^{-1} + sigma_i^2)^{-1} (ybar_i(s) - x_i^T b),   (2.5.3)

Cov(alpha_i, alpha_k|ybar(s), b, delta) = delta_ik {sigma_{i+m}^2 + delta^{-1} sigma_i^2 (delta^{-1} + sigma_i^2)^{-1}},   (2.5.4)

where delta_ik is the Kronecker delta, which is 1 if i = k and zero otherwise.
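Equations (2.5.1) and (2.5.3) combine into a simple composite form: the sampled fraction enters at face value, while the nonsampled part is a precision-weighted compromise between the direct estimate and the regression fit. A sketch with hypothetical numbers:

```python
# Sketch of (2.5.1) and (2.5.3): the finite-population mean splits into the
# sampled part and a shrinkage prediction for the nonsampled part.
# All numbers below are hypothetical.
N_i, n_i = 500, 50
f_i = (N_i - n_i) / N_i            # nonsampled fraction

ybar_s = 12.0                      # observed sample mean
xb = 10.0                          # regression fit x_i' b
sigma2_i = 0.4                     # sampling variance of ybar_s, (R_i n_i)^-1
delta_inv = 1.6                    # model variance delta^-1

w = delta_inv / (delta_inv + sigma2_i)          # weight on the data in (2.5.3)
alpha_hat = xb + w * (ybar_s - xb)              # E(alpha_i | ybar(s), b, delta)
Y_hat = (1 - f_i) * ybar_s + f_i * alpha_hat    # predictor (2.5.1), b and delta fixed
print(round(w, 2), round(Y_hat, 3))
```

As the sampling variance grows relative to delta^{-1}, the weight w falls and the prediction for the nonsampled units leans more heavily on the regression fit.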
Using the iterated formulas for expectation and variance, we have from (2.5.1)-(2.5.4)

E(Ybar_i|ybar(s)) = (1 - f_i) ybar_i(s) + f_i E[x_i^T b + delta^{-1} (delta^{-1} + sigma_i^2)^{-1} (ybar_i(s) - x_i^T b) | ybar(s)],   (2.5.5)

V(Ybar_i|ybar(s)) = f_i^2 {E[V(alpha_i|ybar(s), b, delta) | ybar(s)] + V[E(alpha_i|ybar(s), b, delta) | ybar(s)]}.   (2.5.6)

Note that from (i) and (ii) we have

(iv) given b and delta, ybar(s) ~ N(Ab, V), where

A = (x_1,...,x_m)^T,   (2.5.7)

V = Diag(delta^{-1} + sigma_1^2, ..., delta^{-1} + sigma_m^2),   (2.5.8)

and from (iii) we can write

(v) given delta, b ~ uniform(R^p).

From (iv) and (v), the joint pdf of ybar(s) and b given delta is

f(ybar(s), b|delta) proportional to {prod_{i=1}^{m} (delta^{-1} + sigma_i^2)^{-1/2}} exp{-(1/2)(ybar(s) - Ab)^T V^{-1} (ybar(s) - Ab)}.   (2.5.9)

We assume rank(A) = p and define

bhat = (A^T V^{-1} A)^{-1} A^T V^{-1} ybar(s).   (2.5.10)

With (2.5.10), (2.5.9) can be written as

f(ybar(s), b|delta) proportional to {prod (delta^{-1} + sigma_i^2)^{-1/2}} exp[-(1/2){(b - bhat)^T (A^T V^{-1} A)(b - bhat) + (ybar(s) - A bhat)^T V^{-1} (ybar(s) - A bhat)}].   (2.5.12)

From (2.5.9) and (2.5.12), it follows that, given ybar(s) and delta,

B ~ N(bhat, (A^T V^{-1} A)^{-1}).

Note from (2.5.10) that bhat depends on ybar(s) and delta, since V depends on delta. Again using the iterated formulas for expectation and variance, we have

E(B|ybar(s)) = E[bhat|ybar(s)],   (2.5.13)

V(B|ybar(s)) = E[(A^T V^{-1} A)^{-1}|ybar(s)] + V[bhat|ybar(s)].   (2.5.14)

To evaluate E(Ybar_i|ybar(s)) and V(Ybar_i|ybar(s)), it follows from (2.5.5), (2.5.6), (2.5.13) and (2.5.14) that it is enough to evaluate the conditional expectations, given ybar(s), of the quantities that appear on the rhs of (2.5.13) and (2.5.14); in order to evaluate them, we need to find the conditional distribution of delta given ybar(s). From (iii), (iv) and (v), the joint pdf of ybar(s), b and delta is

f(ybar(s), b, delta) proportional to {prod (delta^{-1} + sigma_i^2)^{-1/2}} exp{-(1/2)(ybar(s) - Ab)^T V^{-1}(ybar(s) - Ab)} delta^{g/2 - 1} exp(-a delta/2).   (2.5.15)

Using the fact that, given ybar(s) and delta, B ~ N(bhat, (A^T V^{-1} A)^{-1}), and integrating out b from (2.5.15), the joint pdf of ybar(s) and delta is

f(ybar(s), delta) proportional to {prod_{i=1}^{m} (delta^{-1} + sigma_i^2)^{-1/2}} |A^T V^{-1} A|^{-1/2} exp{-(1/2)(ybar(s) - A bhat)^T V^{-1}(ybar(s) - A bhat)} delta^{g/2 - 1} exp(-a delta/2).   (2.5.16)

Since f(delta|ybar(s)) is proportional to f(ybar(s), delta), the evaluations needed in (2.5.13) and (2.5.14) are now accomplished using (2.5.16), typically with some numerical integration techniques.
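The last step, averaging bhat over f(delta|ybar(s)), is a one-dimensional integration and is easily done on a grid. A sketch under hypothetical data, grid range and prior settings (illustrative, not the thesis's actual computation):

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical area-level data: m areas, p = 2 regression coefficients.
m, p = 8, 2
A = np.column_stack([np.ones(m), rng.uniform(0.0, 1.0, m)])   # rows x_i'
sigma2 = rng.uniform(0.2, 0.5, m)                             # known sampling variances
ybar = A @ np.array([5.0, 2.0]) + rng.normal(0.0, 1.0, m)
a, g = 0.01, 0.01                                             # gamma(a/2, g/2) prior on delta

def log_post(delta):
    """Unnormalized log f(delta | ybar(s)), following (2.5.16)."""
    V = np.diag(1.0/delta + sigma2)
    Vinv = np.linalg.inv(V)
    AVA = A.T @ Vinv @ A
    bhat = np.linalg.solve(AVA, A.T @ Vinv @ ybar)
    resid = ybar - A @ bhat
    return (-0.5*np.log(np.linalg.det(V)) - 0.5*np.log(np.linalg.det(AVA))
            - 0.5*(resid @ Vinv @ resid) + (g/2 - 1.0)*np.log(delta) - 0.5*a*delta)

grid = np.linspace(0.05, 20.0, 400)
lp = np.array([log_post(d) for d in grid])
w = np.exp(lp - lp.max())
w /= w.sum()                                  # posterior weights on the grid

def bhat_at(delta):
    Vinv = np.linalg.inv(np.diag(1.0/delta + sigma2))
    return np.linalg.solve(A.T @ Vinv @ A, A.T @ Vinv @ ybar)

E_b = sum(wi * bhat_at(d) for wi, d in zip(w, grid))   # E(B | ybar(s)) via (2.5.13)
print(E_b.shape, bool(np.isfinite(E_b).all()))
```

The same weights serve for the variance term in (2.5.14): average (A^T V^{-1} A)^{-1} and the squared deviations of bhat over the grid.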
CHAPTER THREE
OPTIMALITY OF BAYES PREDICTORS FOR MEANS IN A SPECIAL CASE

3.1 Introduction

In Chapter Two, a hierarchical Bayes procedure was introduced for prediction in mixed linear models, and in Section 2.4 the results were utilized for prediction purposes in both the finite population sampling and the infinite population set ups in the presence of auxiliary information. There we considered the general case of unknown variance components and derived the posterior distributions of interest by assigning a uniform prior to the fixed effects and independent gamma priors to the inverses of the variance components. In this chapter, we will consider a special case: we assume that the ratios of the variance components are known. We derive the HB predictors for the mean vector and prove some optimal properties of these predictors.

In Section 3.2, we consider the normal linear model (2.2.2) of Section 2.2 with the vector of ratios of variance components known. We assign a uniform prior to the fixed effects, derive the posterior distribution of the nonsampled units given the sampled units, and from this derive the HB predictor of the finite population mean vector. Later in this section, for the infinite population situation, the posterior distribution of the vector of fixed and random effects and the HB predictors for linear combinations of fixed and random effects are determined. Our approach to these problems can be regarded as an extension of the ideas of Lindley and Smith (1972) to prediction. Although developed within a Bayesian framework, our results should be of appeal also to frequentists. For both problems, the BLUP notion for real valued parameters (see, for example, Henderson, 1963; Royall, 1976) is extended in Sections 3.3 and 3.4 to vector valued parameters, and it is shown that the Bayesian predictors of Section 3.2 are indeed BLUPs. Like other related papers, our BLUP results do not require any normality assumption. With the added assumption of normality, the BLUPs indeed turn out to be best unbiased predictors (BUPs) within the class of all unbiased predictors. In addition, it is shown that these Bayes predictors are BUPs even for some nonnormal distributions.
In Sections 3.5 and 3.6 we show that these Bayes predictors are best equivariant predictors for both the matrix loss (or standardized matrix loss) and quadratic loss (or standardized quadratic loss) under suitable groups of transformations for elliptically symmetric distributions, a broad class of distributions including but not limited to the normal distribution.

We conclude this section by introducing a few notations. For a square matrix T (t x t), tr(T) denotes its trace. For a symmetric nonnegative definite (n.n.d.) matrix T, T^{1/2} is a symmetric n.n.d. matrix such that T = T^{1/2} T^{1/2}; for a symmetric p.d. matrix T, T^{-1/2} is a symmetric p.d. matrix such that T^{-1} = T^{-1/2} T^{-1/2}.

3.2 The Hierarchical Bayes Predictor in a Special Case

We will assume the normal linear model (2.2.2) of Section 2.2 when A, the vector of ratios of variance components, is known, while B and R are independently distributed with B ~ uniform(R^p) and R ~ gamma(a0/2, g0/2). We consider in this section the special case of finite population sampling in detail. We are still interested in finding the predictor of gamma(Y^(1), Y^(2)) = AY^(1) + CY^(2) (recall (2.3.10)); it suffices to find the predictive distribution of Y^(2) given Y^(1) = y^(1). Recall the notations M, K and G given in (2.3.4)-(2.3.6). Since A is known in this case, we have the following Theorem 3.2.1 instead of Theorem 2.3.1. The proof of Theorem 3.2.1 is similar to that of Theorem 2.3.1 and is omitted.

Theorem 3.2.1. Assume that n_T + g0 > p + 2. Then, under the model given in Section 2.2 with A known, and with independent priors uniform(R^p) for B and gamma(a0/2, g0/2) for R, the predictive distribution of Y^(2) given Y^(1) = y^(1) is multivariate t with d.f. n_T + g0 - p, location parameter My^(1) and scale parameter (n_T + g0 - p)^{-1} (a0 + y^(1)T K y^(1)) G.

Using the properties of the multivariate t-distribution, it is possible now to obtain closed form expressions for E(Y^(2)|Y^(1) = y^(1)) and V(Y^(2)|Y^(1) = y^(1)).
In particular, the Bayes estimate of gamma(Y^(1), Y^(2)) = AY^(1) + CY^(2) under any quadratic loss is now given by

e*_BF(y^(1)) = Ay^(1) + CMy^(1).   (3.2.1)

We may note that the predictor e*_BF(y^(1)) given by (3.2.1) is the outcome of the model given in Section 2.2 with A known and the use of the uniform(R^p) prior on B; it does not depend on the choice of the (proper) prior distribution of R. This can be formally seen, assuming all the expectations appearing below exist, as follows:

E(Y^(2)|y^(1)) = E{E(Y^(2)|B, R, y^(1)) | y^(1)}
  = E{X^(2)B + Sigma21 Sigma11^{-1}(y^(1) - X^(1)B) | y^(1)}
  = X^(2) E(B|y^(1)) + Sigma21 Sigma11^{-1}{y^(1) - X^(1) E(B|y^(1))}
  = X^(2)(X^(1)T Sigma11^{-1} X^(1))^{-1} X^(1)T Sigma11^{-1} y^(1) + Sigma21 Sigma11^{-1}{y^(1) - X^(1)(X^(1)T Sigma11^{-1} X^(1))^{-1} X^(1)T Sigma11^{-1} y^(1)}
  = My^(1),   (3.2.2)

where, in the above string of equalities, the second follows from the fact that, conditional on B = b, R = r and Y^(1) = y^(1), Y^(2) ~ N(X^(2)b + Sigma21 Sigma11^{-1}(y^(1) - X^(1)b), r^{-1}(Sigma22 - Sigma21 Sigma11^{-1} Sigma12)), the fourth follows from the fact that, conditional on R = r and Y^(1) = y^(1), B ~ N((X^(1)T Sigma11^{-1} X^(1))^{-1} X^(1)T Sigma11^{-1} y^(1), r^{-1}(X^(1)T Sigma11^{-1} X^(1))^{-1}), and the last follows from the definition of M given in (2.3.4). Thus the predictor e*_BF(y^(1)) is robust against the choice of the prior for R.

There are alternative ways to generate the same predictor e*_BF(Y^(1)) of gamma(Y^(1), Y^(2)). Suppose, for example, one assumes only b to be known (r may or may not be known). Then the best predictor (best linear predictor without the normality assumption) of gamma(Y^(1), Y^(2)), in the sense of having the smallest mean squared error matrix, is given by

e(Y^(1); b) = AY^(1) + C{X^(2)b + Sigma21 Sigma11^{-1}(Y^(1) - X^(1)b)} a.e.   (3.2.3)

(For two symmetric matrices F1 and F2, we say F1 >= F2 if F1 - F2 is n.n.d.) If b is unknown, then one replaces b in (3.2.3) by its UMVUE (BLUE without the normality assumption) (X^(1)T Sigma11^{-1} X^(1))^{-1} X^(1)T Sigma11^{-1} Y^(1); the resulting predictor of gamma(Y^(1), Y^(2)) turns out to be e*_BF(Y^(1)). Accordingly, e*_BF(Y^(1)) can also be given an empirical Bayes interpretation.

Similarly, in this special case, one can derive the Bayes predictor (for quadratic loss) of gamma(b, v) = Sb + Tv in the context of the infinite population set up. Denoting this HB predictor by e*_BI(Y), one can see that the arguments leading to the empirical Bayes interpretation of e*_BF(Y^(1)) work equally well to show that e*_BI(Y) also possesses the empirical Bayes interpretation.
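The equality used in (3.2.2), that applying M directly coincides with plugging the GLS estimator of b into the known-b predictor, can be checked numerically; the matrices below are arbitrary random instances, not data from the thesis.

```python
import numpy as np

rng = np.random.default_rng(3)

n1, n2, p = 8, 4, 2
X1 = rng.normal(size=(n1, p)); X2 = rng.normal(size=(n2, p))
B = rng.normal(size=(n1 + n2, n1 + n2))
Sigma = B @ B.T + (n1 + n2) * np.eye(n1 + n2)      # an arbitrary p.d. covariance
S11, S21 = Sigma[:n1, :n1], Sigma[n1:, :n1]
y1 = rng.normal(size=n1)

S11inv = np.linalg.inv(S11)
bhat = np.linalg.solve(X1.T @ S11inv @ X1, X1.T @ S11inv @ y1)   # GLS estimator

# Route 1: plug bhat into the known-b best predictor, as in (3.2.3)
pred1 = X2 @ bhat + S21 @ S11inv @ (y1 - X1 @ bhat)

# Route 2: apply M from (2.3.4) directly
M = S21 @ S11inv + (X2 - S21 @ S11inv @ X1) @ np.linalg.inv(
    X1.T @ S11inv @ X1) @ X1.T @ S11inv
pred2 = M @ y1
print(np.allclose(pred1, pred2))
```

The two routes agree identically, which is the algebraic content of the empirical Bayes interpretation of e*_BF.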
Harville (1985, 1988, in press) recognized this for predicting scalars. In the next four sections, we will discuss a few frequentist properties of e*_BF(Y^(1)) and e*_BI(Y). In Section 3.3 we show the best unbiased predictor property of e*_BF(Y^(1)) and consider its stochastic domination, whereas in Section 3.4 we consider these properties for e*_BI(Y). In Section 3.5 we show that e*_BF(Y^(1)) is the best equivariant predictor of gamma(Y^(1), Y^(2)) under suitable groups of transformations, whereas in Section 3.6 we consider the same best equivariance property for e*_BI(Y) under such groups of transformations. For scalar BLUPs, Jeske and Harville (1987) have shown that BLUPs are best equivariant within the class of all linear equivariant predictors without any distributional assumption. However, to our knowledge, the equivariance results for vector valued predictors have not been considered before.

3.3 Best Unbiased Prediction and Stochastic Domination in Small Area Estimation

In this section, we assume the normal linear model (2.2.2) with A known. No prior distribution for B and R is assumed; theta = (b^T, r)^T is treated as an unknown parameter. First, we prove the optimality of e*_BF(Y^(1)) within the class of unbiased predictors of gamma(Y^(1), Y^(2)). Next, we dispense with the normality assumption and prove the optimality of e*_BF(Y^(1)) within the class of all linear unbiased predictors (LUPs). We start with the following definition of a best unbiased predictor (BUP).

Definition 3.3.1. A predictor T(Y^(1)) is said to be a BUP of gamma(Y^(1), Y^(2)) if E_theta[T(Y^(1)) - gamma(Y^(1), Y^(2))] = 0 for all theta and, for every predictor delta(Y^(1)) of gamma(Y^(1), Y^(2)) satisfying E_theta[delta(Y^(1)) - gamma(Y^(1), Y^(2))] = 0 for all theta,

V_theta[delta(Y^(1)) - gamma(Y^(1), Y^(2))] - V_theta[T(Y^(1)) - gamma(Y^(1), Y^(2))]

is n.n.d. for all theta, provided the quantities involved are finite.

The following general lemma plays a key role in proving the best unbiasedness of the predictor e*_BF(Y^(1)) of gamma(Y^(1), Y^(2)). Let g(Y) = gamma(Y^(1), Y^(2)), and assume that each component of g(Y) has a finite second moment. Denote by U_g the class of all unbiased predictors delta(Y^(1)) of g(Y) with each component of delta(Y^(1)) having a finite second moment.
Also, denote by U_0 the class of all real-valued statistics (i.e., functions of Y^(1)) with finite second moments having zero expectations identically in θ.

Lemma 3.3.1. A predictor T(Y^(1)) in U_g is a BUP for g(Y) if and only if

Cov_θ[T(Y^(1)) - g(Y), m(Y^(1))] = 0    (3.3.1)

for every m(Y^(1)) in U_0 and all θ.

Proof of Lemma 3.3.1. ("If" part.) Let T(Y^(1)) satisfy (3.3.1). If δ(Y^(1)) in U_g is another predictor, then each component of δ(Y^(1)) - T(Y^(1)) belongs to U_0, and

V_θ[δ(Y^(1)) - g(Y)] = V_θ[T(Y^(1)) - g(Y)] + V_θ[δ(Y^(1)) - T(Y^(1))]
  + Cov_θ[T(Y^(1)) - g(Y), δ(Y^(1)) - T(Y^(1))] + Cov_θ[δ(Y^(1)) - T(Y^(1)), T(Y^(1)) - g(Y)].    (3.3.2)

By (3.3.1),

Cov_θ[T(Y^(1)) - g(Y), δ(Y^(1)) - T(Y^(1))] = 0.    (3.3.3)

From (3.3.2) and (3.3.3) it follows that

V_θ[δ(Y^(1)) - g(Y)] - V_θ[T(Y^(1)) - g(Y)] = V_θ[δ(Y^(1)) - T(Y^(1))],    (3.3.4)

which is n.n.d. for all θ. Hence T(Y^(1)) is a BUP for g(Y).

("Only if" part.) Given that T(Y^(1)) is a BUP, we will show that condition (3.3.1) holds. First we show that T_i(Y^(1)) is a BUP for g_i(Y) for every i = 1, ..., u. Let U_i(Y^(1)) be any unbiased predictor of g_i(Y) with a finite second moment. Then δ(Y^(1)), the u-component column vector equal to T(Y^(1)) except that its ith component is replaced by U_i(Y^(1)), belongs to U_g. Then V_θ[δ(Y^(1)) - g(Y)] - V_θ[T(Y^(1)) - g(Y)] is n.n.d., so

V_θ[U_i(Y^(1)) - g_i(Y)] - V_θ[T_i(Y^(1)) - g_i(Y)] ≥ 0,

and consequently T_i(Y^(1)) is a BUP for g_i(Y). Now, following the usual Lehmann-Scheffé (1950) technique (see also Rao, 1973), Cov_θ[T_i(Y^(1)) - g_i(Y), m(Y^(1))] = 0 for every m(Y^(1)) in U_0 and every i. Hence (3.3.1) holds, and the proof of the lemma is complete.

Remark 3.3.1. It follows from the above lemma (see (3.3.4)) that if T_1(Y^(1)) and T_2(Y^(1)) are both BUPs of g(Y), then V_θ[T_1(Y^(1)) - T_2(Y^(1))] = 0 for all θ, i.e., T_1(Y^(1)) = T_2(Y^(1)) with probability one.

Remark 3.3.2. It is also clear that the technique of the above lemma can be applied in more general contexts.

We will use the above lemma to prove the BUP property of e_BF(Y^(1)) in the following theorem. Recall from (3.2.1) that e_BF(Y^(1)) = A Y^(1) + C M Y^(1).

Theorem 3.3.1. Under the normal linear model (2.2.2), e_BF(Y^(1)) is the BUP of γ(Y^(1), Y^(2)).
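The orthogonality condition (3.3.1) can be checked in closed form for e_BF(Y^(1)) against the simplest members of U_0, the linear contrasts m(Y^(1)) = a^T Y^(1) with X^(1)T a = 0 (these have zero mean for every θ). A hedged numerical sketch (all matrices simulated, not from the dissertation):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, p = 6, 3, 2
X1 = rng.standard_normal((n1, p))
X2 = rng.standard_normal((n2, p))
G = rng.standard_normal((n1 + n2, n1 + n2))
S = G @ G.T + (n1 + n2) * np.eye(n1 + n2)   # Sigma, p.d.
S11, S21 = S[:n1, :n1], S[n1:, :n1]
S11i = np.linalg.inv(S11)
M = S21 @ S11i + (X2 - S21 @ S11i @ X1) @ np.linalg.solve(X1.T @ S11i @ X1, X1.T @ S11i)

# m(Y1) = a' Y1 has E_theta m = 0 for every (b, r) iff X1' a = 0;
# obtain such an a by projecting a random vector off col(X1).
z = rng.standard_normal(n1)
P = X1 @ np.linalg.solve(X1.T @ X1, X1.T)   # projection onto col(X1)
a = z - P @ z                               # X1' a = 0

# Cov_theta[e_BF - gamma, a'Y1] reduces (up to r^{-1} and the matrix C)
# to (M S11 - S21) a, which vanishes exactly:
cov_vec = (M @ S11 - S21) @ a
assert np.allclose(cov_vec, 0)

# For a generic direction z with X1' z != 0, the same vector is nonzero,
# so the orthogonality is special to zero-mean contrasts.
assert not np.allclose((M @ S11 - S21) @ z, 0)
```

This is exactly the algebra behind (3.3.14) below: M Σ_11 - Σ_21 is a matrix of the form (X^(2) - Σ_21 Σ_11^(-1) X^(1))(X^(1)T Σ_11^(-1) X^(1))^(-1) X^(1)T, which annihilates any a orthogonal to col(X^(1)).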
Proof of Theorem 3.3.1. In view of Lemma 3.3.1, it suffices to show that for every m(Y^(1)) in U_0,

Cov_θ[e_BF(Y^(1)) - γ(Y^(1), Y^(2)), m(Y^(1))] = 0, that is, E_θ[C(M Y^(1) - Y^(2)) m(Y^(1))] = 0 for all θ.    (3.3.5)

Since, under the model (2.2.2), E_θ(Y^(2) | Y^(1)) = X^(2) b + Σ_21 Σ_11^(-1)(Y^(1) - X^(1) b), using E_θ[m(Y^(1))] = 0 it suffices to show that

E_θ[(X^(1)T Σ_11^(-1) Y^(1)) m(Y^(1))] = 0.    (3.3.6)

Since E_θ[m(Y^(1))] = 0, i.e.,

∫ m(y^(1)) exp{-(r/2)(y^(1) - X^(1) b)^T Σ_11^(-1)(y^(1) - X^(1) b)} dy^(1) = 0,

differentiating both sides of this equation w.r.t. b (see p. 318 of Rao, 1973), one gets

∫ r X^(1)T Σ_11^(-1)(y^(1) - X^(1) b) m(y^(1)) exp{-(r/2)(y^(1) - X^(1) b)^T Σ_11^(-1)(y^(1) - X^(1) b)} dy^(1) = 0,    (3.3.7)

which is equivalent to (3.3.6).

Remark 3.3.3. Equation (3.3.6) can be alternatively proved in the following way. Note that (X^(1)T Σ_11^(-1) Y^(1), Y^(1)T Σ_11^(-1) Y^(1)) is complete sufficient for θ. Hence X^(1)T Σ_11^(-1) Y^(1) must have zero covariance with every zero estimator m(Y^(1)), i.e., E_θ[(X^(1)T Σ_11^(-1) Y^(1)) m(Y^(1))] = 0.

Next we show that the conclusion of Theorem 3.3.1 continues to hold even for certain nonnormal distributions. Suppose that, given R = r, e* is N(0, r^(-1) A), while the df of R is an arbitrary member of the family ℱ = {F : F is absolutely continuous with f(r) = 0 for r < 0}. Let ℱ* denote the subfamily of ℱ such that each component of e_BF(Y^(1)) and of γ(Y^(1), Y^(2)) has a finite second moment under the model (2.2.2) and the joint distribution of e*. We now prove the following theorem.

Theorem 3.3.2. Under the model (2.2.2), with e* ~ N(0, r^(-1) A) given R = r, where R has a df from ℱ*, e_BF(Y^(1)) is the BUP of γ(Y^(1), Y^(2)).

Proof of Theorem 3.3.2. Using Lemma 3.3.1 and following the proof of Theorem 3.3.1, it suffices to show that

E_{b,F}[(X^(1)T Σ_11^(-1) Y^(1)) m(Y^(1))] = 0    (3.3.8)

for all b and F in ℱ*, for every m(Y^(1)) with

E_{b,F}[m(Y^(1))] = 0 and E_{b,F}[m^2(Y^(1))] < ∞ for all b and F.    (3.3.9)

Consider the subfamily of gamma distributions {Gamma(c, d) : c > 0, d > 2} of ℱ*. Since (3.3.9) holds for this subfamily, writing out the expectation gives

∫_0^∞ r^{(n_T + d)/2 - 1} exp(-cr) [∫ exp{-(r/2)(y^(1) - X^(1) b)^T Σ_11^(-1)(y^(1) - X^(1) b)} m(y^(1)) dy^(1)] dr = 0    (3.3.10)

for all c > 0 and d > 2. Then, using the uniqueness property of Laplace transforms, it follows from (3.3.10) that the inner integral vanishes a.e.
Lebesgue on (0, ∞); that is,

∫ exp{-(r/2)(y^(1) - X^(1) b)^T Σ_11^(-1)(y^(1) - X^(1) b)} m(y^(1)) dy^(1) = 0    (3.3.11)

for a.e. r > 0. Similar simplifications using (3.3.11), differentiating w.r.t. b as in the proof of Theorem 3.3.1, lead to

∫ (X^(1)T Σ_11^(-1) y^(1)) m(y^(1)) exp{-(r/2)(y^(1) - X^(1) b)^T Σ_11^(-1)(y^(1) - X^(1) b)} dy^(1) = 0    (3.3.12)

a.e. Lebesgue on (0, ∞). Multiplying both sides of (3.3.12) by the appropriate normalizing constant and f(r), and integrating with respect to dF(r), where F is in ℱ*, one gets (3.3.8).

Remark 3.3.4. Since ℱ* does not contain the degenerate distributions of R on (0, ∞), Theorem 3.3.1 does not follow from Theorem 3.3.2.

Remark 3.3.5. If in Theorem 3.3.2 we take for F the Gamma(c, d) distributions, we see that the marginal distribution of Y is given by a family of multivariate t distributions, namely the N_T-variate t distributions with location parameter Xb, scale parameter (c/d)Σ and d degrees of freedom, c/d > 0, d > 2. Thus e_BF(Y^(1)) is the BUP of γ(Y^(1), Y^(2)) for this family as well.

Next we will show that the predictor e_BF(Y^(1)) (which is linear in Y^(1)) is a best linear unbiased predictor. If P Y^(1) satisfies E_θ[P Y^(1) - γ(Y^(1), Y^(2))] = 0 for all θ, we say that P Y^(1) is a LUP of γ(Y^(1), Y^(2)). We need the following definition.

Definition 3.3.2. A LUP P Y^(1) of γ(Y^(1), Y^(2)) is said to be a BLUP if

V_θ[H Y^(1) - γ(Y^(1), Y^(2))] - V_θ[P Y^(1) - γ(Y^(1), Y^(2))]

is n.n.d. for every LUP H Y^(1) of γ(Y^(1), Y^(2)) and for all θ.

We now prove the BLUP property of e_BF(Y^(1)) for predicting γ(Y^(1), Y^(2)). To this end, we first state a lemma whose proof is similar to that of Lemma 3.3.1 and hence will be omitted.

Lemma 3.3.2. A LUP P Y^(1) of γ(Y^(1), Y^(2)) is a BLUP if and only if

Cov_θ[P Y^(1) - γ(Y^(1), Y^(2)), m^T Y^(1)] = 0    (3.3.13)

for every known n_T × 1 vector m satisfying E_θ(m^T Y^(1)) = 0 for all θ.

The following theorem provides the BLUP property of e_BF(Y^(1)) for predicting γ(Y^(1), Y^(2)). In proving this BLUP property we do not need any distributional assumption on e*; we only assume E(e*) = 0 and V(e*) = r^(-1) A.

Theorem 3.3.3. Under the model (2.2.2), without any distributional assumption on e*, e_BF(Y^(1)) is a BLUP of γ(Y^(1), Y^(2)).

Proof of Theorem 3.3.3. If E_θ(m^T Y^(1)) = m^T X^(1) b = 0 for all θ, then m^T X^(1) = 0^T. Hence

Cov_θ[e_BF(Y^(1)) - γ(Y^(1), Y^(2)), m^T Y^(1)]
 = Cov_θ[C(M Y^(1) - Y^(2)), m^T Y^(1)]
 = r^(-1) C(M Σ_11 - Σ_21) m
 = r^(-1) C(X^(2) - Σ_21 Σ_11^(-1) X^(1))(X^(1)T Σ_11^(-1) X^(1))^(-1) X^(1)T m = 0    (3.3.14)

for all θ, where the last two equalities follow from the definition of M and from the fact that m^T X^(1) = 0^T. Applying Lemma 3.3.2, the result follows.

Remark 3.3.6. As already mentioned, the normality assumption on e* is not needed in Theorem 3.3.3.
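The BLUP property of Theorem 3.3.3 can be illustrated numerically: the exact MSE matrix of any competing LUP exceeds that of e_BF(Y^(1)) by an n.n.d. matrix. A hedged sketch with simulated matrices (A, C and the perturbation G0 are arbitrary illustrations, not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, p = 6, 3, 2
X1 = rng.standard_normal((n1, p)); X2 = rng.standard_normal((n2, p))
G = rng.standard_normal((n1 + n2, n1 + n2))
S = G @ G.T + (n1 + n2) * np.eye(n1 + n2)   # Sigma, p.d.
S11, S21 = S[:n1, :n1], S[n1:, :n1]
S11i = np.linalg.inv(S11)
M = S21 @ S11i + (X2 - S21 @ S11i @ X1) @ np.linalg.solve(X1.T @ S11i @ X1, X1.T @ S11i)

# Predict gamma = A Y1 + C Y2 with some illustrative A, C.
A = rng.standard_normal((n2, n1)); C = rng.standard_normal((n2, n2))

def mse_matrix(H):
    """Exact r * MSE matrix of the LUP H Y1 for gamma = A Y1 + C Y2,
    i.e. V[(H - A)W1 - C W2] with (W1, W2) having dispersion r^{-1} Sigma."""
    L = np.hstack([H - A, -C])
    return L @ S @ L.T

H0 = A + C @ M                    # e_BF as a linear map of Y1

# Any other LUP: H0 + G0 with G0 X1 = 0 (this preserves unbiasedness for all b).
G0 = rng.standard_normal((n2, n1))
G0 = G0 - G0 @ X1 @ np.linalg.solve(X1.T @ X1, X1.T)
H1 = H0 + G0

diff = mse_matrix(H1) - mse_matrix(H0)
# The difference must be n.n.d.: all eigenvalues >= 0 (up to roundoff).
assert np.min(np.linalg.eigvalsh(diff)) > -1e-8
```

The cross terms cancel by the same algebra as in (3.3.14), so diff works out to G0 Σ_11 G0^T, visibly n.n.d.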
Theorem 3.3.3 unifies and extends the available BLUP results related to the estimation of the finite population mean vector under different models (cf. Ghosh and Lahiri, in press; Royall, 1976; among others). As in Remark 3.3.1, one can prove that the BLUP is unique with probability one.

The optimality of e_BF(Y^(1)) within LUPs holds a fortiori under the quadratic loss

L_Ω(γ, a) = (a - γ)^T Ω (a - γ) = tr[Ω(a - γ)(a - γ)^T],    (3.3.15)

where Ω is a n.n.d. matrix. Such a loss will, henceforth, be referred to as the generalized Euclidean error w.r.t. Ω. The optimality results of Theorem 3.3.2 carry over via Theorem 3.3.1 under the added distributional assumption (which is not necessarily the normality assumption) on e*.

A natural question to ask now is whether the risk optimality of e_BF(Y^(1)) holds, within the class of unbiased predictors or at least within the class of LUPs, under certain other criteria and for a broader family of distributions of e*. To investigate this question, we need the notions of "universal" and "stochastic" domination and their interrelationship as given in Hwang (1985). Denote by R_L(θ, δ) the risk of a predictor δ(Y^(1)) for predicting γ(Y^(1), Y^(2)) under a loss which is a nondecreasing function L of the generalized Euclidean error w.r.t. Ω. The following definition is adapted from Hwang (1985).

Definition 3.3.3. An estimator δ_1(Y^(1)) universally dominates another estimator δ_2(Y^(1)) (under the generalized Euclidean error w.r.t. Ω) if, for every θ and every nondecreasing loss function L, R_L(θ, δ_1) ≤ R_L(θ, δ_2) holds, and for a particular L the risk functions are not identical.

Hwang (1985) has shown (see his Theorem 2.3) that δ_1(Y^(1)) universally dominates δ_2(Y^(1)) under the generalized Euclidean error w.r.t. Ω if and only if ‖δ_1(Y^(1)) - γ(Y^(1), Y^(2))‖_Ω is stochastically smaller than ‖δ_2(Y^(1)) - γ(Y^(1), Y^(2))‖_Ω. We say that a random variable Z_1 is stochastically smaller than Z_2 if P_θ(Z_1 > x) ≤ P_θ(Z_2 > x) for every x, with strict inequality for some x.
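Hwang's equivalence between stochastic smallness and universal domination can be seen in miniature with scalar losses: coupling two stochastically ordered variables through a common uniform variate makes one pointwise no larger than the other, so every nondecreasing loss preserves the risk ordering. A small sketch (the exponential distributions are illustrative assumptions, not from the text):

```python
import numpy as np

# Two survival functions with P(Z1 > x) <= P(Z2 > x): Exp(rate 2) vs Exp(rate 1).
# Quantile coupling: Z_i = F_i^{-1}(U) for a common U makes Z1 <= Z2 pointwise,
# hence E L(Z1) <= E L(Z2) for EVERY nondecreasing L.
u = np.linspace(0.001, 0.999, 2000)        # grid approximating U ~ Uniform(0, 1)
z1 = -np.log1p(-u) / 2.0                   # quantile function of Exp(rate 2)
z2 = -np.log1p(-u) / 1.0                   # quantile function of Exp(rate 1)
assert np.all(z1 <= z2)                    # pointwise ordering under the coupling

# The risk ordering then holds for each nondecreasing loss, including 0-1 type losses.
for L in (lambda t: t, np.sqrt, lambda t: t**2, lambda t: (t > 1.0).astype(float)):
    assert L(z1).mean() <= L(z2).mean()
```

The same coupling argument, applied to the common variable W_u in (3.3.26) and (3.3.27) below, is what turns the pointwise quadratic-form inequality into universal domination.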
The next theorem shows that, for a general class of elliptically symmetric distributions of e*, e_BF(Y^(1)) universally dominates every LUP H Y^(1) (≠ e_BF(Y^(1))) of γ(Y^(1), Y^(2)) under every generalized Euclidean error w.r.t. a p.d. Ω. Assume that, given R = r, e* has an elliptically symmetric pdf

h(e* | A, r) = |r^(-1) A|^(-1/2) f(r e*^T A^(-1) e*),    (3.3.16)

where f is such that

∫ (Σ_{i=1}^q v_i^2 + Σ_{i=1}^{N_T} e_i^2) f(e*^T A^(-1) e*) de* < ∞,    (3.3.17)

with e* = (v_1, ..., v_q, e_1, ..., e_{N_T})^T. We will denote this distribution by ℰ_f(0, r^(-1) A). More generally, ℰ_f(μ, σ^2 Ψ) denotes the distribution whose pdf is given by

k(t) = |σ^2 Ψ|^(-1/2) f((t - μ)^T Ψ^(-1)(t - μ)/σ^2),    (3.3.18)

where μ (p × 1) is a location vector and Ψ (p × p) is p.d. Note that normality with mean 0 and variance-covariance matrix r^(-1) A is sufficient but not necessary for (3.3.16) and (3.3.17) to hold, and it follows from (3.3.17) that V(e*) exists. Note also that e* with pdf (3.3.16) has a spherically symmetric characteristic function (c.f.), E[exp(iu^T A^(-1/2) e*)] = c(u^T u) for some function c (see Kelker, 1970), where u = (u_1, ..., u_m)^T and m = N_T + q. Hence the c.f. of e* is given by

E[exp(iu^T e*)] = c(r^(-1) u^T A u).    (3.3.19)

Consequently, W* = Y - Xb, being a linear function of e*, has c.f.

E[exp(iu^T W*)] = c(r^(-1) u^T Σ u),    (3.3.20)

where Σ = (Z : I_{N_T}) A (Z : I_{N_T})^T, which reduces to Z D Z^T plus the covariance block of e when A is block-diagonal in (v, e). Comparing (3.3.16), (3.3.19) and (3.3.20), one can see that W* also has an elliptically symmetric distribution, with pdf

h(w* | Σ, r) = |r^(-1) Σ|^(-1/2) f(r w*^T Σ^(-1) w*).    (3.3.21)

Theorem 3.3.4. Under the model (2.2.2), with (3.3.16) and (3.3.17), e_BF(Y^(1)) universally dominates every LUP H Y^(1) (≠ e_BF(Y^(1))) of γ(Y^(1), Y^(2)) under the generalized Euclidean error w.r.t. every p.d. Ω.

Remark 3.3.7. Theorem 3.3.4 does not contain Theorem 3.3.3, since Theorem 3.3.4 requires elliptical symmetry of the distribution of W*, while the other does not. It should be noted, though, that the model assumption made in (3.3.16) is not necessarily stronger than the usual assumption of finiteness of certain moments. This is because the assumptions of Theorem 3.3.4 can hold even if a distribution has an infinite second moment (e.g., for certain multivariate t distributions), whereas the BLUP property is meaningless in such an instance.

The proof of Theorem 3.3.4 rests crucially on the following lemma, which we now state and prove.

Lemma 3.3.3.
If W (N_T × 1) ~ ℰ_f(0, r^(-1) I_{N_T}) and L is a u × N_T matrix, then L W and (L L^T)^(1/2)(I_u : 0) W have the same distribution.

Proof of Lemma 3.3.3. The proof follows arguments of Hwang (1985). From (3.3.19) it follows that E[exp(it^T W)] = c(r^(-1) t^T t). Hence

E[exp(it^T L W)] = c(r^(-1) t^T L L^T t),    (3.3.22)

where t is a u × 1 vector. Next, using the same sphericity of the c.f. of W,

E[exp{it^T (L L^T)^(1/2)(I_u : 0) W}] = c(r^(-1) t^T (L L^T)^(1/2)(I_u : 0)(I_u : 0)^T (L L^T)^(1/2) t) = c(r^(-1) t^T L L^T t),    (3.3.23)

so that the lemma follows from (3.3.22) and (3.3.23). It follows as a consequence of Lemma 3.3.3 that

W^T L^T L W = (L W)^T (L W) =_d W^T (I_u : 0)^T (L L^T)(I_u : 0) W = W_u^T (L L^T) W_u,    (3.3.24)

where =_d denotes equality in distribution and W_u consists of the first u components of W. We shall use (3.3.24) repeatedly in proving Theorem 3.3.4.

Proof of Theorem 3.3.4. By (3.3.21),

W* =_d Σ^(1/2) W, where W ~ ℰ_f(0, r^(-1) I_{N_T}).    (3.3.25)

Let H Y^(1) be a LUP of γ(Y^(1), Y^(2)), so that (H - A) X^(1) = C X^(2). Writing Y^(1) = X^(1) b + W^(1), Y^(2) = X^(2) b + W^(2) and W* = (W^(1)T, W^(2)T)^T, we get H Y^(1) - γ(Y^(1), Y^(2)) = [H - A : -C] W*. Then, using (3.3.24) and (3.3.25),

‖H Y^(1) - γ(Y^(1), Y^(2))‖^2_Ω = W*^T [H - A : -C]^T Ω [H - A : -C] W*
 =_d W_u^T Ω^(1/2) [H - A : -C] Σ [H - A : -C]^T Ω^(1/2) W_u.    (3.3.26)

Similarly,

‖e_BF(Y^(1)) - γ(Y^(1), Y^(2))‖^2_Ω = W*^T [C M : -C]^T Ω [C M : -C] W*
 =_d W_u^T Ω^(1/2) [C M : -C] Σ [C M : -C]^T Ω^(1/2) W_u.    (3.3.27)

Write Δ = [H - A : -C] - [C M : -C] = [H - A - C M : 0]. Then

[H - A : -C] Σ [H - A : -C]^T = [C M : -C] Σ [C M : -C]^T + Δ Σ Δ^T + [C M : -C] Σ Δ^T + Δ Σ [C M : -C]^T.    (3.3.28)

Now

[C M : -C] Σ Δ^T = C(M Σ_11 - Σ_21)(H - A - C M)^T = C(X^(2) - Σ_21 Σ_11^(-1) X^(1))(X^(1)T Σ_11^(-1) X^(1))^(-1) X^(1)T (H - A - C M)^T,    (3.3.29)

and, since (H - A) X^(1) = C X^(2) and M X^(1) = X^(2),

(H - A - C M) X^(1) = C X^(2) - C X^(2) = 0,    (3.3.30)

so that the cross terms in (3.3.28) vanish. Hence the matrix appearing in (3.3.26) equals the matrix appearing in (3.3.27) plus the n.n.d. matrix Ω^(1/2) Δ Σ Δ^T Ω^(1/2), and for every realization of W_u the quadratic form in (3.3.27) is at most that in (3.3.26). Theorem 3.3.4 now follows from (3.3.26)-(3.3.28). Also, since Σ is positive definite, it follows from (3.3.28) that the r.h.s. of (3.3.26) equals the r.h.s. of (3.3.27) if and only if Δ = 0, that is, H = A + C M (cf. Hwang, 1985).

3.4 Best Unbiased Prediction and Stochastic Domination in the Infinite Population Setup

In this section we will briefly consider a few optimality properties of e_BI(Y) which are similar to those of e_BF(Y^(1)), following Section 3.3 closely. First, we note that e_BI(Y) is optimal within the class of all unbiased predictors of C(b, v) under the normal linear model (2.2.1), with the dispersion matrix known up to the unknown scalar r. As in the finite population case, no prior distribution for b and R is assigned; θ = (b^T, r)^T is treated as an unknown parameter. Next, dispensing with the normality assumption, we consider the BLUP property and, finally, stochastic domination.

Definition 3.4.1.
A predictor δ(Y) of C(b, v) is said to be an unbiased predictor if E_θ[δ(Y) - C(b, v)] = 0 for all θ. An unbiased predictor U(Y) of C(b, v) is said to be a BUP if

V_θ[δ(Y) - C(b, v)] - V_θ[U(Y) - C(b, v)]

is n.n.d. for every unbiased predictor δ(Y) of C(b, v) and for all θ, provided the quantities involved exist finitely.

The following lemma is analogous to Lemma 3.3.1 and concerns the characterization of a BUP, based on Y, of g(W) for some known function g, where each component of g(W) has a finite second moment.

Lemma 3.4.1. An unbiased predictor U(Y) of g(W) is a BUP of g(W) if and only if

Cov_θ[U(Y) - g(W), m(Y)] = 0    (3.4.1)

for every statistic m(Y) such that E_θ[m(Y)] = 0 and E_θ[m^2(Y)] < ∞ for all θ.

Lemma 3.4.1 can be proved similarly to Lemma 3.3.1, and its proof is omitted. We will use this lemma to sketch a proof of the following theorem, which concerns best unbiased prediction of C(b, v).

Theorem 3.4.1. Under the normal linear model (2.2.1), e_BI(Y) is the BUP of C(b, v).

Proof of Theorem 3.4.1. Note that, with P_θ-probability 1, e_BI(Y) is obtained from C(b, v) by substituting

b̂ = (X^T Σ^(-1) X)^(-1) X^T Σ^(-1) Y and v̂ = D Z^T Σ^(-1)(Y - X b̂)    (3.4.2)

for b and v. From (3.4.1) and (3.4.2), it suffices to show that E_θ[(X^T Σ^(-1) Y) m(Y)] = 0 for all θ and every m(Y) with E_θ[m(Y)] = 0 and E_θ[m^2(Y)] < ∞. This is proved similarly to (3.3.6).

Remark 3.4.1. The conclusion of the above theorem holds even for certain nonnormal distributions. As in Theorem 3.3.2, one can show that e_BI(Y) is the BUP of C(b, v) under the model (2.2.1) where e* ~ N(0, r^(-1) A) given R = r and R has a df from ℱ*, where e* and ℱ* are the same as in Theorem 3.3.2.

Next, note that the predictor e_BI(Y) is linear in Y. It can be proved, as in Theorem 3.3.3, that e_BI(Y) is the BLUP of C(b, v) under the linear model (2.2.1) without any distributional assumption on e*.

Now we will show that e_BI(Y) universally dominates every LUP of C(b, v) when e* has an elliptically symmetric distribution. Consider the generalized Euclidean error loss w.r.t. a u × u p.d. matrix Ω,

‖δ(Y) - C(b, v)‖^2_Ω = [δ(Y) - C(b, v)]^T Ω [δ(Y) - C(b, v)].    (3.4.3)

Denote by R_L(θ, δ) the risk function of a predictor δ(Y) for predicting C(b, v) under a loss which is a nondecreasing function L of the generalized Euclidean error w.r.t. Ω. The following definition is similar to Definition 3.3.3.

Definition 3.4.2. An estimator δ_1(Y) universally dominates another estimator δ_2(Y) (under the generalized Euclidean error w.r.t. Ω)
if, for every θ and every nondecreasing loss function L, R_L(θ, δ_1) ≤ R_L(θ, δ_2) holds, and for a particular L the risk functions are not identical.

We now state the following theorem on the stochastic domination of e_BI(Y); its proof will be omitted because of its similarity to that of Theorem 3.3.4.

Theorem 3.4.2. Under the model (2.2.1), with e* having an elliptically symmetric distribution satisfying (3.3.16) and (3.3.17), e_BI(Y) universally dominates every LUP δ(Y) (≠ e_BI(Y)) of C(b, v) under the generalized Euclidean error w.r.t. every p.d. Ω.
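As a closing numerical cross-check of the representation (3.4.2), b̂ and v̂ = D Z^T Σ^(-1)(Y - X b̂) coincide with the solution of Henderson's mixed-model equations, a standard identity from the BLUP literature (cf. Harville, 1985) rather than something derived in this chapter. A sketch with simulated matrices (all dimensions and inputs are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q = 8, 2, 3
X = rng.standard_normal((n, p))
Z = rng.standard_normal((n, q))
Dg = rng.standard_normal((q, q)); Dm = Dg @ Dg.T + q * np.eye(q)   # D = V(v), p.d.
Eg = rng.standard_normal((n, n)); E0 = Eg @ Eg.T + n * np.eye(n)   # V(e), p.d.
S = Z @ Dm @ Z.T + E0                                              # Sigma = Z D Z' + V(e)
Si = np.linalg.inv(S)
Y = rng.standard_normal(n)

# GLS b-hat and BLUP v-hat = D Z' Sigma^{-1} (Y - X b-hat), as in (3.4.2) (r = 1).
bhat = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ Y)
vhat = Dm @ Z.T @ Si @ (Y - X @ bhat)

# Henderson's mixed-model equations:
# [ X' E0^{-1} X        X' E0^{-1} Z          ] [b]   [X' E0^{-1} Y]
# [ Z' E0^{-1} X        Z' E0^{-1} Z + D^{-1} ] [v] = [Z' E0^{-1} Y]
E0i = np.linalg.inv(E0)
top = np.hstack([X.T @ E0i @ X, X.T @ E0i @ Z])
bot = np.hstack([Z.T @ E0i @ X, Z.T @ E0i @ Z + np.linalg.inv(Dm)])
sol = np.linalg.solve(np.vstack([top, bot]), np.concatenate([X.T @ E0i @ Y, Z.T @ E0i @ Y]))

assert np.allclose(sol[:p], bhat)   # same b-hat from either route
assert np.allclose(sol[p:], vhat)   # same v-hat from either route
```

The agreement of the two routes is the computational face of the linearity of e_BI(Y) in Y noted after Remark 3.4.1.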