Bayesian prediction in mixed linear models with applications in small area estimation


Material Information

Title:
Bayesian prediction in mixed linear models with applications in small area estimation
Physical Description:
ix, 214 leaves : ill. ; 29 cm.
Language:
English
Creator:
Datta, Gauri Sankar, 1962-
Publication Date:

Subjects

Subjects / Keywords:
Linear models (Statistics)   ( lcsh )
Estimation theory   ( lcsh )
Bayesian statistical decision theory   ( lcsh )
Population forecasting   ( lcsh )
Statistics thesis Ph. D
Dissertations, Academic -- Statistics -- UF
Genre:
bibliography   ( marcgt )
non-fiction   ( marcgt )

Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 1990.
Bibliography:
Includes bibliographical references (leaves 207-212)
Statement of Responsibility:
by Gauri Sankar Datta.
General Note:
Typescript.
General Note:
Vita.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 001687933
oclc - 25116672
notis - AHZ9963
System ID:
AA00002109:00001

Full Text













BAYESIAN PREDICTION IN MIXED LINEAR MODELS
WITH APPLICATIONS IN SMALL AREA ESTIMATION

GAURI SANKAR DATTA

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY


































to my parents and teachers, with regards

ACKNOWLEDGEMENTS


I would like to express sincere gratitude to Professor Malay Ghosh for being my advisor, for originally proposing the problem and for the attention I received from him for the past five years. Without his enormous patience, encouragement and guidance, it would not have been possible to complete the work. Throughout my years in the graduate program, he has been my friend, philosopher and guide; I consider myself extremely lucky to get him as my dissertation advisor.

I would like to thank Professors Michael DeLorenzo and Ronald Randles for serving on my committee. Also, I am grateful to Professors Ramon Littell, Kenneth M. Portier and P.V. Rao for being on my Part C and oral defense committees. Special thanks go to Professor Ghosh and Professor Richard Scheaffer for their genuine interest, incessant efforts and unlimited energy, which made it possible to remove a stumbling stone out of my way to join the University of Florida.

I would also like to express my gratitude to my respected teachers, who were crucial to my coming to the United States. I feel very fortunate being able to turn to them whenever necessary. I would also like to acknowledge the help and support I received from Krishnandu Ghosh in preparing me to come to the United States.

I am also grateful and highly indebted to my Alma Mater, RamaKrishna Mission Residential College, Narendrapur, West Bengal, India, for the support I received. Had I not been admitted to Narendrapur, I would have never pursued my studies in statistics. In this respect, I will always remember our Principal, Respected Swami Suparnananda Maharaj, and our Head of the Department, P.K. Giri, for their care and concern about me. I would like to offer my humble regards to them. I will take this opportunity to express my appreciation to Professor Ashok Kumar Hazra for his initiative in introducing me to Narendrapur. It is a great pleasure to acknowledge Professor Uttam Bandyopadhyay, to whom I definitely owe a lot for my basic understanding of statistics as an undergraduate. My interest was further stimulated by the insightful teaching of Professor S.K. Chatterjee of Calcutta University when I was a master's student.

I would like to thank my parents for all they have done for me. I am indebted to my numerous well-wishers, among them Bibekananda Nandi, Professor K.M. Senapati, Mrs. Durga Senapati, K.C. Ghosh and Mrs. Bimala Ghosh, who have always considered me as a part of their family. I also got the affectionate concern of Mrs. Senapati, who has always treated me like her own son. My heartfelt thanks are for my "unofficial" host family in Gainesville, Malay Ghosh and his wife, our beloved Doladi.

I would like to thank A.P. Reznek of the United States Census Bureau for providing me with computing facilities during my stay in the Bureau as an ASA/NSF Research Associate. Last, but not least, I would like to thank Ms. Cindy Zimmerman for her skillful typing in putting a scribbled manuscript into final form.

TABLE OF CONTENTS


                                                          Page

ACKNOWLEDGEMENTS
ABSTRACT ........................................... viii

CHAPTERS

ONE    INTRODUCTION
       Literature Review
       The Subject of This Dissertation ............... 10

TWO    BAYESIAN PREDICTION OF MEANS IN LINEAR MODELS:
       GENERAL CASE ................................... 15
       2.1  Introduction .............................. 15
       2.2  Description of the Hierarchical Bayes
            Model with Examples ....................... 18
       2.3  Hierarchical Bayes Analysis ............... 31
       2.4  Applications of Hierarchical Bayes
            Analysis .................................. 37
       2.5  Hierarchical Bayes Prediction of Finite
            Population Mean Vector in Absence of
            Unit Level Observations ................... 57

THREE  OPTIMALITY OF BAYES PREDICTORS FOR MEANS IN
       A SPECIAL CASE ................................. 65
       3.1  Introduction .............................. 65
       3.2  The Hierarchical Bayes Predictor .......... 67
       3.3  Best Unbiased Prediction and Stochastic
            Domination in Small Area Estimation:
            A Special Case
       3.4  Best Unbiased Prediction and Stochastic
            Domination in Infinite Population ......... 88
       3.5  Best Equivariant Prediction in Small
            Area Estimation

FOUR   ASYMPTOTIC OPTIMALITY OF HIERARCHICAL BAYES
       PREDICTORS FOR MEANS .......................... 114
       4.1  Introduction ............................. 114
       4.2  Model, Loss, Prior and Predictors ........ 117
            4.2.1  General Expressions for Bayes
                   Risk Difference ................... 119
            4.2.2  Random Regression Coefficients
                   Model ............................. 120
            4.2.3  Nested Error Regression Model ..... 129
       4.3  Asymptotic Optimality with Known First
            Stage Variance Component: Fay-Herriot
            Model .................................... 137
       4.4  Asymptotic Optimality with Unknown
            Variance Components ...................... 153

FIVE   SIMULTANEOUS BAYESIAN ESTIMATION OF SMALL
       AREA VARIANCES
       5.1  Introduction
       5.2  Bayes Estimation of a Quadratic when
            Ratios of Variance Components are Known
       5.3  Asymptotic Optimality in Nested Error
            Regression with Known Ratios of
            Variance Components ...................... 173
       5.4  Asymptotic Optimality in Nested Error
            Regression with Unknown Variance
            Components

SIX    SUMMARY AND FUTURE RESEARCH
       Summary ...................................... 189
       Future Research ............................... 190

APPENDICES

A      PROOF OF THEOREM 2.3.1 ....................... 194
B      AN INDEPENDENCE RESULT IN A FAMILY OF
       ELLIPTICALLY SYMMETRIC DISTRIBUTIONS ......... 203

BIBLIOGRAPHY ........................................ 207

BIOGRAPHICAL SKETCH ................................. 213

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

BAYESIAN PREDICTION IN MIXED LINEAR MODELS
WITH APPLICATIONS IN SMALL AREA ESTIMATION

Gauri Sankar Datta

August, 1990

Chairman: Malay Ghosh
Major Department: Statistics

Small area estimation is gaining increasing popularity in recent times. Government agencies in the United States and Canada have been involved in estimating unemployment rates, per capita income, crop yield, etc. for many state and local government regions simultaneously. Typically, only a few samples are available from an individual area. Consequently, reliable estimators of "parameters," such as the mean or the variance for the area, need to "borrow strength" from similar neighboring areas implicitly or explicitly through a model. Such estimators usually have smaller mean squared error of prediction than the survey estimators.

In this dissertation a general hierarchical Bayes (HB) model is proposed; the random regression coefficients model and other models considered earlier by several authors are seen to be special cases of the proposed general model. The predictive distribution of the characteristic of interest for the unsampled population units is found given the observations on the sampled units and is used to draw inference. In particular, simultaneous estimators of several small area means and variances are developed. A mixed linear model with a noninformative prior on the regression coefficients (or fixed effects) and independent gamma priors (possibly noninformative) for the inverse variance components is used.

As a special case of this HB analysis, when the vector of ratios of variance components is known, the predictor of the vector of finite population means is shown to possess some frequentist optimal properties (such as best unbiased predictor, best equivariant predictor, etc.), basically under elliptical symmetry assumptions. Performance of this HB predictor is evaluated by comparing its Bayes risk with that of the subjective Bayes predictor with "true" or "elicited" superpopulation prior parameters. It is shown that, under a balanced one-way random effects model with covariates and squared error loss, the difference between the Bayes risks of the two predictors goes to zero as the number of small areas goes to infinity.

CHAPTER ONE
INTRODUCTION

Literature Review

The use of linear models by astronomers for predicting the positions of celestial bodies goes back several centuries. Starting from these days, model-based inference and prediction received considerable attention. In particular, animal and plant breeders have used such models for predicting some characteristics of future progeny. Starting with the pioneering work of Henderson (1953), considerable attention has been devoted to this problem. We refer to Gianola and Fernando (1986) and Harville (in press), where other references are cited. On the other hand, survey analysts have used the model-based approach in finite population sampling with the goal of predicting certain characteristics of the unsampled units of the population on the basis of the observed sample. Early work on this topic may be found in Cochran (1939, 1946), where the finite population is viewed as a realization from a hypothetical superpopulation.

The concept of small area statistics was in existence as early as the 11th century in England and the 17th century in Canada (see Brackstone, 1987). However, these early small area statistics were based on data obtained from complete enumeration. With the availability of only limited resources and the advent of sophisticated statistical methodologies, in the past few decades sample surveys have, for most purposes, been widely used as the means of data collection, in contrast to complete enumeration. The data collected from these surveys have been very effectively used to provide suitable statistics at the national and state levels on a regular basis. However, below the state level (for example, at the county or other subdivision level), use of survey data was limited, because the estimates for these small areas usually were based on small samples and produced unacceptably large standard errors and coefficients of variation. To improve the reliability of small area statistics, it is necessary to have a much larger sample size for an individual area than can be afforded with the limited resources available. Consequently, the use of survey data (possibly ...

During the last few years, many countries, including the United States and Canada, have recognized the importance of small area estimation. Recently there has been growing concern among several governments with the issues of distribution, equity and disparity. There may exist subgroups within a given population which are far below the average in certain respects, thereby necessitating remedial action on the part of the government. Before taking such an action, there is a need to identify such subgroups; accordingly, the relevant statistical data at such subgroup levels must be available. Different government agencies, like the Census Bureau, the Bureau of Labor Statistics, Statistics Canada and the Central Bureau of Statistics of Norway, have been involved in obtaining estimates of population counts, adjustment factors for census counts, unemployment rates, per capita income, etc. for state and local government areas. To face this problem, techniques have emerged in small area estimation that "borrow strength" from similar neighboring areas for estimation and prediction purposes. Through the use of some appropriate model and auxiliary information (possibly obtained through complete ...

... over the survey estimators. For a good review of the small area estimation literature one may refer to Ghosh (1990).

The necessity of "borrowing strength" has been realized by many statisticians. Ericksen (1974) advocated the use of a regression method for estimating population changes for local areas. Fay and Herriot (1979) proposed an adaptation of the James-Stein estimator to survey estimates of income for small areas. Survey estimates, being based on a small sample size (which is usually 20 percent for a population of size less than 1000), usually have large standard errors and coefficients of variation. To rectify this, these authors first fit a regression equation to the sample estimates, using as independent variables county values, tax return data for the year 1969 and housing data from the 1970 census. The estimate they provided for each place was a weighted average of the sample estimate and the regression estimate. Battese, Harter and Fuller (1988) considered prediction of areas under corn and soybeans for 12 counties in north-central Iowa, based on the 1978 June Enumerative Survey and LANDSAT satellite data. Battese, Harter and Fuller (BHF) used ...
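The weighted average just described, a compromise between the direct sample estimate and the regression (synthetic) estimate, can be sketched numerically. The sketch below is an illustrative reconstruction in the spirit of Fay and Herriot (1979), not their exact procedure: the between-area model variance A is treated as known, the data are simulated, and the function name is ours.

```python
import numpy as np

def composite_estimate(y_direct, X, D, A):
    """Illustrative shrinkage estimator in the spirit of Fay and Herriot
    (1979): a weighted average of the direct survey estimate and a
    regression (synthetic) estimate.  D holds the known sampling
    variances; A is the between-area model variance (assumed known here,
    a simplification of the actual method)."""
    prec = 1.0 / (A + D)                     # marginal precision of each y_i
    XtP = X.T * prec                         # weighted least squares pieces
    beta = np.linalg.solve(XtP @ X, XtP @ y_direct)
    synthetic = X @ beta                     # regression (synthetic) estimate
    w = A / (A + D)                          # weight on the direct estimate
    return w * y_direct + (1.0 - w) * synthetic

rng = np.random.default_rng(0)
m = 12                                       # e.g. twelve small areas
X = np.column_stack([np.ones(m), rng.normal(size=m)])
D = rng.uniform(0.5, 2.0, size=m)            # known sampling variances
truth = X @ np.array([1.0, 2.0]) + rng.normal(size=m)
y = truth + rng.normal(scale=np.sqrt(D))     # direct survey estimates
est = composite_estimate(y, X, D, A=1.0)
```

Areas with large sampling variance D_i are pulled more strongly toward the regression line, which is exactly the "borrowing strength" described above.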
... sampled counties. Fuller and Harter (1987) also considered a multivariate extension of this model.

There is a similar problem of prediction faced by the animal breeders. For the purpose of selecting the best animals for future breeding, they need to come up with an index for each animal under consideration. Henderson (1953, 1975) advocated the use of the best linear unbiased predictor (BLUP) of certain linear combinations of fixed and random effects using a mixed linear model. Harville (in press) used a mixed linear model for predicting the average weight of single-birth male lambs which are progeny of sires belonging to different population lines and dams belonging to different categories. Harville and Fenech (1985) considered this example for estimating the heritability. Other problems based on a linear model come from varietal trials and comparative experiments. In comparative experiments, several treatments have to be compared and their effects, or some suitable contrasts, have to be estimated. Multicentered clinical trials are good examples of comparative experiments (see Fleiss, 1986). Problems of this type and the ones mentioned in the ...

The methods that have usually been proposed in model-based inference use either a variance components approach or an empirical Bayes (EB) approach, although, as pointed out by Harville (1988, in press), the distinction between the two is often superfluous. Both of these procedures use certain mixed linear models for prediction purposes. First, assuming the variance components to be known, certain BLUPs or EB predictors are obtained for the unknown parameters of interest. Then the unknown variance components are estimated, typically by Henderson's method of fitting of constants or the restricted maximum likelihood (REML) method. The resulting estimators, which can be called estimated BLUPs or EBLUPs (see Harville, 1977), are used for final prediction purposes.

The empirical Bayes approach in small area estimation was first given by Fay and Herriot (1979) and later also used by Ghosh and Meeden (1986) and Ghosh and Lahiri (1987a, 1988), among others. According to this procedure, first a Bayes estimate of the unknown parameter of interest is obtained by using a normal prior or a linear Bayes argument (Hartigan, 1969). The unknown parameters of the prior are then estimated by some classical methods like the method ...
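The two-step recipe just outlined, estimate the variance components first and then substitute them into the BLUP formula, can be sketched for the simplest balanced one-way random effects model y_ij = mu + v_i + e_ij. The ANOVA method-of-moments estimators below stand in for Henderson's fitting of constants or REML; this is an illustrative special case, not the general procedure.

```python
import numpy as np

def eblup_one_way(y):
    """Two-step EBLUP sketch for the balanced one-way random effects model
    y[i, j] = mu + v_i + e_ij (an illustrative special case).
    Step 1: ANOVA (method-of-moments) estimates of the variance components.
    Step 2: plug them into the BLUP shrinkage formula."""
    m, n = y.shape
    ybar_i = y.mean(axis=1)                  # area sample means
    ybar = y.mean()                          # grand mean (GLS fit when balanced)
    msw = ((y - ybar_i[:, None]) ** 2).sum() / (m * (n - 1))  # within-area MS
    msb = n * ((ybar_i - ybar) ** 2).sum() / (m - 1)          # between-area MS
    sig2_e = msw
    sig2_v = max((msb - msw) / n, 0.0)       # truncate negative estimates at 0
    w = n * sig2_v / (n * sig2_v + sig2_e) if sig2_v > 0 else 0.0
    return ybar + w * (ybar_i - ybar), sig2_v, sig2_e

rng = np.random.default_rng(1)
m, n = 20, 5
v = rng.normal(scale=2.0, size=m)            # area effects
y = 10.0 + v[:, None] + rng.normal(size=(m, n))
pred, s2v, s2e = eblup_one_way(y)
```

Because the estimated variance components replace the true ones, the resulting predictor is an "estimated BLUP" in the sense described above, and its extra variability is exactly what makes its MSE hard to assess.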
Although the above approach of EBLUP or EB is usually quite satisfactory for point prediction, it is very difficult to estimate the standard errors associated with these predictors. This is primarily due to the lack of closed form expressions for the mean squared errors (MSEs) of the EBLUPs or the EB predictors. Kackar and Harville (1984) suggested an approximation to the MSEs (see also Harville, 1985, 1988, in press; Harville and Jeske, 1989). Prasad and Rao (1990) proposed approximate estimates of the MSEs in three specific mixed linear models. These approximations rest heavily on the normality assumption. Recently, Lahiri and Rao (1990) considered this problem, relaxing the normality assumption and assuming some moment conditions, without the presence of auxiliary information. The work of Prasad and Rao (1990) suggests that their approximations work well when the number of small areas is sufficiently large. It is not clear, though, how these approximations fare for a small or even moderately large number of small areas. Ghosh and Lahiri (in press) proposed an HB procedure as an alternative to the EBLUP or the EB procedure. In the HB procedure, one uses the posterior mean for estimating ...

... which, though often complicated, can be found exactly via numerical integration without approximation. The model considered by Ghosh and Lahiri (in press) was, however, only a special case of the so-called nested error regression model, also used by BHF. A similar model was considered by Stroud (1987), but the general analysis was performed only for the balanced case, that is, when the number of samples was the same for each stratum. Other models have also been proposed. In a recent article, Choudhry (1988) considered five specific models for small area estimation not included in the earlier work of Fay and Herriot (1979), Prasad and Rao (1990) and Cumberland (1989). Recently, certain cross-classificatory models for small area estimation have also been considered, with a Bayesian analysis carried out assuming the degeneracy of certain terms in a usual two-way linear model. For a Bayesian analysis in the context of animal breeding, one may refer to Gianola and Fernando (1986). They also consider the HB analysis, using subjective informative priors which are constructed from previous data and experiments. Also, they showed ...
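The remark above that posterior quantities can be found exactly via numerical integration can be illustrated in a toy normal-normal hierarchy, where the posterior mean of each area effect reduces to a one-dimensional integral over the between-area variance A, evaluated here on a grid. The model, the flat grid prior and all values are illustrative assumptions, not the dissertation's general HB model.

```python
import numpy as np

def hb_posterior_means(y, D, A_grid, prior=None):
    """HB posterior means by one-dimensional numerical integration over the
    between-area variance A (toy sketch):
        y_i | theta_i ~ N(theta_i, D_i),  theta_i | A ~ N(0, A),  A ~ prior."""
    if prior is None:
        prior = np.ones_like(A_grid)         # flat prior on the grid (assumption)
    var = A_grid[:, None] + D[None, :]       # marginal variance of y_i given A
    loglik = -0.5 * (np.log(var) + y[None, :] ** 2 / var).sum(axis=1)
    w = prior * np.exp(loglik - loglik.max())
    w /= w.sum()                             # normalized posterior weights for A
    shrink = A_grid[:, None] / var           # E[theta_i | y, A] = shrink * y_i
    return (w[:, None] * shrink * y[None, :]).sum(axis=0)

rng = np.random.default_rng(2)
D = np.array([1.0, 0.5, 2.0, 1.5])           # known sampling variances
theta = rng.normal(scale=1.0, size=4)
y = theta + rng.normal(scale=np.sqrt(D))
post = hb_posterior_means(y, D, A_grid=np.linspace(1e-3, 20.0, 400))
```

Averaging the shrinkage factor over the posterior of A, rather than plugging in a point estimate of A, is the essential difference between the HB route and the EB/EBLUP route discussed earlier.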
An important special case arises in the above approaches which is also important in the theory of least squares. When the ratios of variance components are known, the predictors (least squares, empirical Bayes or hierarchical Bayes) are BLUPs (Henderson, 1963). For related BLUP results for predicting scalars in finite population sampling, one may refer to Royall (1979), Cumberland (1989), Prasad (1990) and several others. Harville (1985, 1988, in press) pointed out BLUP properties of Bayesian predictors of scalars in general mixed linear models (see also Harville, 1976). Ghosh and Lahiri (in press) have extended the scalar BLUP notion of Henderson and others to show that the Bayesian predictor of the vector of finite population means is BLUP.

To conclude this discussion, we will briefly mention another problem. So far, we have considered the problem of estimating the mean in finite population sampling. Another important problem in finite population sampling is estimating the finite population variance. Ericson (1969) found the Bayes estimator of the finite population variance under a normal theory set up. Empirical Bayes estimation of the finite population variance ...

The Subject of This Dissertation

In this dissertation, we present a unified Bayesian prediction theory for linear models and for small area estimation in the context of finite population sampling. A general Bayesian model is presented which can be regarded as an extension of the ideas of Lindley and Smith (1972) to prediction. This general model can also be applied in infinite population situations, for example, in animal breeding and other applications where a mixed linear model is used.

In Chapter Two, we introduce a general HB model and use this model for simultaneous estimation of several small area means in finite population sampling. Some of the widely used models in small area estimation, including the nested error regression model (Battese et al., 1988; Prasad and Rao, 1990; Stroud, 1987; Ghosh and Lahiri, in press), the random regression coefficients model (Dempster et al., 1981; Prasad and Rao, 1990), cross-classificatory models (Royall, 1979; Cumberland, 1989) and multi-stage sampling models (Ghosh and Lahiri, 1988; Malec and Sedransk, 1985; Scott and Smith, 1969), can be regarded as special cases of our model. The posterior distribution ...
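The nested error regression model listed above takes the form y_ij = x_ij' beta + v_i + e_ij, with one random effect v_i shared by all sampled units in area i. A minimal simulation sketch (sample sizes, coefficients and variances are all illustrative):

```python
import numpy as np

def simulate_nested_error(m, n_i, beta, sig_v, sig_e, rng):
    """Simulate the nested error regression model
        y_ij = x_ij' beta + v_i + e_ij,
    with v_i ~ N(0, sig_v^2) and e_ij ~ N(0, sig_e^2).
    Returns per-unit area labels, covariates and responses."""
    area, X, y = [], [], []
    for i in range(m):
        v_i = rng.normal(scale=sig_v)                       # shared area effect
        x = np.column_stack([np.ones(n_i[i]), rng.normal(size=n_i[i])])
        e = rng.normal(scale=sig_e, size=n_i[i])            # unit-level errors
        area.append(np.full(n_i[i], i))
        X.append(x)
        y.append(x @ beta + v_i + e)
    return np.concatenate(area), np.vstack(X), np.concatenate(y)

rng = np.random.default_rng(3)
area, X, y = simulate_nested_error(
    m=12, n_i=[3, 5, 4, 2, 6, 3, 4, 5, 2, 3, 4, 5],
    beta=np.array([1.0, 0.5]), sig_v=1.0, sig_e=0.5, rng=rng)
```

The shared v_i induces positive correlation among units within an area, which is precisely what lets estimates for one area borrow strength from the others.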
... of the characteristic of interest for the unsampled units of the finite population, given the data, and the conditional distribution, conditional mean and variance of the vector of effects are provided. These two analyses are applied to two real data sets. It is worthwhile to mention that Bayesian analysis of linear models was initiated by Hill (1965); see also Hill (1977, 1980). For a good exposition of HB analysis, see Berger (1985).

In Chapter Three, a special case of the HB models discussed in the previous chapter is considered. This model assumes known ratios of variance components. Based on this, certain optimal properties of the HB predictors are proved in this chapter. Although these properties are proposed within a Bayesian framework, the results should also appeal to frequentists. The BLUP notion developed for real valued parameters is extended to vector valued parameters, and it is shown that the Bayesian predictors derived in this chapter are indeed BLUPs. From this, as a special case, it follows that the Bayesian predictors of the finite population mean vector and other linear parameters are BLUPs as well. The BLUP result for the finite population mean vector unifies a number of similar results derived under specific models (e.g., Royall, 1979; Ghosh and ...

For a suitable subclass of elliptically symmetric distributions, including but not limited to the normal, the HB predictors are shown to be best unbiased; that is, they have the smallest variance-covariance matrix within the class of unbiased predictors. Also, following Hwang (1985), we have been able to show that the BLUPs also "universally" (or "stochastically") dominate linear unbiased predictors for elliptically symmetric distributions. The notions of "universal" and "stochastic" domination will be made precise in Chapter Three. Also, it is established that, under a suitable group of transformations, the predictors are best within the class of all equivariant predictors for elliptically symmetric distributions. Jeske and Harville (1987) have shown that scalar BLUPs are best equivariant within the class of linear equivariant predictors without any distributional assumption. However, to our knowledge, the equivariance results for vector valued predictors have not been addressed before in this context in their full generality.

In Chapter Four, we have established some asymptotic results regarding the Bayes risk performance of certain HB predictors of the finite population mean vector. We have shown that, under average squared error loss, the Bayes risk difference between the HB predictors and the subjective Bayes predictors for the "true" prior goes to zero as the number of small areas goes to infinity. This shows that our HB predictors are "asymptotically optimal" (A.O.) in the sense of Robbins (1955). The A.O. property of certain predictors arising naturally in the context of finite population sampling was proved in Ghosh and Meeden (1986), Ghosh and Lahiri (1987a) and Ghosh, Lahiri and Tiwari (1989).

Chapter Five is devoted to the simultaneous estimation of the variances of several strata. We have considered the nested error regression model of Ghosh and Lahiri (in press) and Stroud (1987) in detail. As in Chapter Four, we have proved the A.O. property of these predictors. Ghosh and Lahiri (1987b) and Ghosh, Lahiri and Tiwari (in press) have proved the A.O. property of certain EB predictors of finite population variances.

We reemphasize that the present dissertation provides a unified Bayesian analysis in both the finite and the infinite population frameworks. For finite populations, we unify a number of models considered earlier by different authors.

To our knowledge, estimates of the MSEs, or good approximations thereof, are not available except for a few specific models. The Bayesian procedures of this dissertation, on the other hand, can serve as a general recipe to handle a greater variety of problems. Also, the inferential methods of the following chapters are implementable for data analysis, especially in these days of sophisticated computing facilities.

CHAPTER TWO
BAYESIAN PREDICTION OF MEANS IN LINEAR MODELS:
GENERAL CASE

Introduction

In this chapter we will consider two similar prediction problems in different set ups simultaneously. One problem refers to the small area estimation problem in the context of finite population sampling, and the other problem deals with comparative experiments in the context of ANOVA, ANOCOVA or linear regression in the infinite population situation. In both these cases, a mixed linear model is used. In the first case, we are interested in predicting some finite population characteristic (e.g., finite population totals or means), whereas in the second case we are interested in predicting linear functions of fixed and random effects.

In the finite population sampling set up, we assume that there are m strata, the ith stratum U_i containing a finite number N_i of units, with the units labelled u_{i1}, ..., u_{iN_i}. Let Y_{ij} denote some characteristic of interest associated with unit j in stratum i, j = 1, ..., N_i, i = 1, ..., m ...

... at some finite cost. We are interested in predicting some linear combinations of these observables (like the finite population total or mean for each small area or domain) using a quadratic loss. For notational convenience, we will denote a sample of size n_i from the ith stratum by Y_{i1}, Y_{i2}, ..., Y_{in_i}. On the other hand, in the infinite population set up we are interested in predicting linear combinations (in particular, contrasts) of fixed and random effects. Note that in this set up these quantities are not observables. For this problem too, we use a quadratic loss function. We will use the word predictands to refer to the quantities we want to predict in both problems.

The analysis will be done in two stages; in the first stage we assume the ratios of variance components are known, whereas in the second stage we consider the more general situation where the variance components are unknown. In Section 2.2 a general HB model will be described, and a number of interesting examples arising in finite population sampling or in the infinite population set up will be considered. Some of the existing models used in the context of finite population sampling are shown to be ...

Realizing the importance of the problem, the most general situation, where the variance components are unknown, will be considered in this chapter, whereas the situation with known ratios of variance components will be considered in the next chapter. In Section 2.3 a general mixed linear model is considered, and some prior distribution is assigned to all unknown parameters, which consist of the vector of fixed effects and the variance components. In the first part of Section 2.3, for the model introduced in Section 2.2, we have found the posterior (predictive) distribution of the characteristic of interest of the nonsampled population units, given the values of that characteristic for the sampled units, in finite population sampling. Also, the posterior mean vector and the posterior variance-covariance matrix corresponding to the characteristic vector of the nonsampled units are obtained from this predictive distribution. In particular, the posterior means and variances of the finite population means of the small areas are obtained. In the second half of Section 2.3, we have obtained the posterior distribution of the vector of fixed and random effects for the model introduced in Section 2.2. In particular, ...

In Section 2.4, we have applied the results of Section 2.3 to some actual data sets. First, we consider the corn and soybeans data which appeared in Battese, Harter and Fuller (1988). Using the HB analysis developed in Section 2.3, we have derived the posterior means and posterior standard deviations for the 12 small area (county) means. The second data set, containing the weights of 62 single-birth lambs, appeared in Harville (in press). This data set is analyzed by the HB methods developed for the infinite population set up in the second half of Section 2.3. Finally, an HB analysis of the model considered by Carter and Rolph (1974), and subsequently by Fay and Herriot (1979), to estimate the per capita income of small places is considered. In this situation unit level observations are not available, and we are interested in predicting the finite population mean for each small area. Here the sampling variances are different and are assumed to be known; also, a uniform prior is placed on the regression coefficients and a gamma prior (proper or improper) on the inverse of the prior variance.

2.2 Description of the Hierarchical Bayes Model with Examples













Consider the following hierarchical model:

(A) conditional on b, v, R = r and Λ = λ, Y ~ N(Xb + Zv, r^{-1}Ψ);
(B) conditional on b, R = r and Λ = λ, v ~ N(0, r^{-1}D(λ));
(C) b, R and Λ have a certain joint prior distribution, proper or improper.

Stages (A) and (B) of the model can be identified as a general mixed linear model. To see this, write

Y = Xb + Zv + e,    (2.2.1)

where b is the vector of fixed effects, v and e are mutually independent with e ~ N(0, r^{-1}Ψ) and v ~ N(0, r^{-1}D(λ)); X and Z are known design matrices, Ψ is a known positive definite (p.d.) matrix, while D(λ) is a p.d. matrix which is structurally known except possibly for some unknown parameter λ which, in the examples to follow, involves ratios of variance components.
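The covariance structure induced by (2.2.1) can be made concrete with a small simulation. The following is a minimal sketch, not the dissertation's computation; all dimensions, seeds and parameter values (m areas, equal area sizes, Ψ = I, D(λ) = λ^{-1}I_m) are illustrative assumptions.

```python
import numpy as np

# Sketch of the mixed linear model Y = Xb + Zv + e of (2.2.1), with
# v ~ N(0, r^{-1} D(lambda)) and e ~ N(0, r^{-1} Psi). Illustrative values only.
rng = np.random.default_rng(0)

m, n_i, p = 4, 5, 2            # 4 small areas, 5 sampled units each, 2 covariates
n_T = m * n_i
r, lam = 1.0, 2.0              # precision r and variance ratio lambda

X = rng.normal(size=(n_T, p))               # known design matrix (fixed effects)
Z = np.kron(np.eye(m), np.ones((n_i, 1)))   # block of ones: unit -> area map
b = np.array([1.0, -0.5])                   # fixed effects
Psi = np.eye(n_T)                           # known p.d. matrix (identity here)
D = (1.0 / lam) * np.eye(m)                 # D(lambda) = lambda^{-1} I_m

v = rng.multivariate_normal(np.zeros(m), D / r)      # random area effects
e = rng.multivariate_normal(np.zeros(n_T), Psi / r)  # unit-level errors
Y = X @ b + Z @ v + e

# Marginal covariance of Y given b is r^{-1} Sigma(lambda), where
# Sigma(lambda) = Psi + Z D(lambda) Z^T: exchangeable within areas, 0 across.
Sigma = Psi + Z @ D @ Z.T
```

Units in the same area share the off-diagonal covariance λ^{-1} (here 0.5), while units in different areas are uncorrelated, which is exactly the structure exploited in Section 2.3.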


In the context of small area estimation, partition Y(N_T x 1), X(N_T x p), Z(N_T x q) and e(N_T x 1) with conformity and rewrite the model given in (2.2.1) as

Y^(1) = X^(1)b + Z^(1)v + e^(1),   Y^(2) = X^(2)b + Z^(2)v + e^(2),    (2.2.2)

where Y^(1)(n_T x 1) corresponds to the vector of sampled units and Y^(2)((N_T - n_T) x 1) corresponds to the vector of unsampled units.

We will further partition Y^(1)T into (Y_1^(1)T, ..., Y_m^(1)T), where Y_i^(1) = (Y_i1, ..., Y_in_i)^T is the n_i-component vector of sampled units from the ith small area. Similarly, Y^(2)T can be partitioned into (Y_1^(2)T, ..., Y_m^(2)T), where Y_i^(2) = (Y_i,n_i+1, ..., Y_i,N_i)^T is the (N_i - n_i)-component vector of unsampled units for the ith small area. One of our primary objectives in small area estimation is to estimate the vector (γ_1, ..., γ_m)^T, where γ_i = Σ_{j=1}^{N_i} Y_ij / N_i is the finite population mean for the ith small area (i = 1, ..., m).


More generally, we may be interested in predicting the vector AY^(1) + CY^(2) (say), where A(u x n_T) and C(u x (N_T - n_T)) are known matrices. For this purpose, it suffices to find the predictive distribution of Y^(2) given Y^(1) = y^(1). In the next section this will be accomplished by using a model-based approach to survey sampling.


Before we consider the other problem of the infinite population set up introduced earlier, we identify some existing small area estimation models of several authors as special cases of (2.2.2). In what follows, we shall use I_k to denote the identity matrix of order k and 1_k to denote the k-component column vector of ones. Also, let col_{1<=i<=k}(B_i) denote the matrix (B_1^T, ..., B_k^T)^T and let the direct sum ⊕_{i=1}^k A_i denote the block diagonal matrix Diag(A_1, ..., A_k).


First, consider the nested error regression model

Y_ij = x_ij^T b + v_i + e_ij    (j = 1, ..., N_i; i = 1, ..., m).    (2.2.3)

The model was considered by Battese, Harter and Fuller (1988). They assumed the v_i and e_ij to be mutually independent with v_i ~ N(0, (λr)^{-1}) and e_ij ~ N(0, r^{-1}). In this case X^(1) = col_{1<=i<=m}(col_{1<=j<=n_i}(x_ij^T)), Ψ = I_{n_T}, D(λ) = λ^{-1} I_m, Z^(1) = ⊕_{i=1}^m 1_{n_i} and Z^(2) = ⊕_{i=1}^m 1_{N_i - n_i}. A further special case with x_ij = 1 for every j = 1, ..., N_i and i = 1, ..., m was considered by Ghosh and Lahiri (in press). Note that λ = V(e_ij)/V(v_i) is a ratio of variance components.
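The block structures Z^(1) = ⊕ 1_{n_i} and Σ_11 = I + λ^{-1}Z^(1)Z^(1)T for the nested error model can be built directly. The sketch below uses illustrative sample sizes (a hypothetical helper `ones_block_diag`, not from the dissertation), not the BHF data.

```python
import numpy as np

# Sketch of the block structures in the nested error regression model (2.2.3):
# Z^(1) = direct sum of 1_{n_i}, Z^(2) = direct sum of 1_{N_i - n_i},
# D(lambda) = lambda^{-1} I_m. Sample sizes below are illustrative assumptions.
def ones_block_diag(sizes):
    """Return the block diagonal matrix with columns 1_{sizes[i]} (one per area)."""
    Z = np.zeros((sum(sizes), len(sizes)))
    row = 0
    for i, n in enumerate(sizes):
        Z[row:row + n, i] = 1.0
        row += n
    return Z

n = [3, 1, 4]           # sampled units per area
N = [10, 6, 8]          # population sizes per area
Z1 = ones_block_diag(n)
Z2 = ones_block_diag([Ni - ni for Ni, ni in zip(N, n)])

lam = 2.0
# Sigma_11 = I + lambda^{-1} Z1 Z1^T: exchangeable within each area's block
Sigma11 = np.eye(sum(n)) + (1.0 / lam) * Z1 @ Z1.T
```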


The random regression coefficients model of Dempster, Rubin and Tsutakawa (1981) (see also Prasad and Rao, 1990) is also a special case of ours. In this set up, X^(1) and X^(2) are the same as in the nested error regression model, while Z^(1) = ⊕_{i=1}^m col_{1<=j<=n_i}(x_ij^T) and Z^(2) = ⊕_{i=1}^m col_{n_i+1<=j<=N_i}(x_ij^T), with D(λ) again of the form λ^{-1}I.

Some of the models of Choudhry and Rao (1988) can also be treated as special cases of ours. For example, one of their models is given by

Y_ij = b x_ij + v_i x_ij^{1/2} + e_ij    (j = 1, ..., N_i; i = 1, ..., m),    (2.2.4)

with v_i ~ N(0, (rλ)^{-1}) and e_ij ~ N(0, r^{-1}). Here Ψ = I_{n_T}, and Z^(1) and Z^(2) are the same as in the nested error regression model with the vectors of ones replaced by the scalars x_ij^{1/2}. Another model considered by these authors is similar to the one given in (2.2.4), with x_ij replacing x_ij^{1/2} as multipliers. Yet another model considered by Choudhry and Rao (1988) is

Y_ij = b x_ij + v_i x_ij^{1/2} + e_ij x_ij^{1/2}    (j = 1, ..., N_i; i = 1, ..., m),    (2.2.5)

with the v_i and e_ij having the same distributions as in the previous model. Here Ψ = Diag(x_11, ..., x_1N_1, ..., x_m1, ..., x_mN_m), Z^(1) = ⊕_{i=1}^m u_i^(1) and Z^(2) = ⊕_{i=1}^m u_i^(2), with u_i^(1) = (x_i1^{1/2}, ..., x_in_i^{1/2})^T and u_i^(2) = (x_{i,n_i+1}^{1/2}, ..., x_{i,N_i}^{1/2})^T.














Models with a two-way classification structure are also special cases of our general mixed linear model. For example, suppose there are m small areas labelled 1, ..., m. Within each small area, units are further classified into c subgroups (socioeconomic class, age, etc.) labelled 1, ..., c. The cell sizes N_ij (i = 1, ..., m; j = 1, ..., c) are assumed to be known. Let Y_ijk (k = 1, ..., N_ij) denote the measurement on the kth individual in the (i, j)th cell. Conditional on b, R = r and Λ = λ, suppose

Y_ijk = x_ijk^T b + v_i + τ_j + γ_ij + e_ijk    (k = 1, ..., N_ij; j = 1, ..., c; i = 1, ..., m),    (2.2.6)

where the v_i, τ_j, γ_ij and e_ijk are mutually independent with v_i ~ N(0, (λ_1 r)^{-1}), τ_j ~ N(0, (λ_2 r)^{-1}), γ_ij ~ N(0, (λ_3 r)^{-1}) and e_ijk ~ N(0, r^{-1}). In this case v = (v_1, ..., v_m, τ_1, ..., τ_c, γ_11, ..., γ_mc)^T, λ = (λ_1, λ_2, λ_3)^T and D(λ) = Diag(λ_1^{-1} I_m, λ_2^{-1} I_c, λ_3^{-1} I_mc); Y^(1), Y^(2), Z^(1) and Z^(2) are formed by stacking the n_ij sampled and N_ij - n_ij nonsampled units in each cell, in the same manner as before.


Special cases of this model have been considered by several others. Cumberland (1989) considered a model where the τ_j are degenerate at zero. Also, they assumed the variance ratios to be known in deriving their estimators, and did not address the issue of unknown variance ratios appropriately.


Next we show that the two stage sampling model with covariates and m strata is a special case of our general mixed linear model. Suppose that the ith stratum contains L_i primary units, and that the jth primary unit within the ith stratum contains N_ij subunits. Let Y_ijk denote the value of the characteristic of interest for the kth subunit within the jth primary unit from the ith stratum (k = 1, ..., N_ij; j = 1, ..., L_i; i = 1, ..., m). From the ith stratum, a sample of primary units is taken, and subunits are then sampled from each selected primary unit within the stratum.














Assume that conditional on b, R = r and Λ = λ,

Y_ijk = x_ijk^T b + v_i + u_ij + e_ijk    (k = 1, ..., N_ij; j = 1, ..., L_i; i = 1, ..., m),    (2.2.7)

where the v_i, u_ij and e_ijk are mutually independent with v_i ~ N(0, (λ_1 r)^{-1}), u_ij ~ N(0, (λ_2 r)^{-1}) and e_ijk ~ N(0, r^{-1}). The vectors Y^(1) and Y^(2), and the corresponding matrices Z^(1) and Z^(2), are defined by stacking the n_ij sampled and N_ij - n_ij nonsampled subunits in each primary unit, as before. Here t = 2, λ = (λ_1, λ_2)^T and D(λ) = Diag(λ_1^{-1} I_m, λ_2^{-1} I_L.), where L. = Σ_{i=1}^m L_i. The ideas can be extended directly to multistage sampling with more complicated notations.


We may mention here that the Bayesian analysis for two stage sampling was first introduced by Scott and Smith (1969) in a much simpler framework. A multistage analog of their work was provided by Malec and Sedransk (1985).















Now we will consider the infinite population set up. In this context, we will use the model given by (A)-(C), with the mixed linear model representation given in (2.2.1). Here we will assume that the data vector Y is n_T x 1 and the associated design matrices X and Z are n_T x p and n_T x q respectively. Without loss of generality, we also assume that rank(X) = p. Our objective is to predict W = Sb + Tv (say) on the basis of Y, where S(u x p) and T(u x q) are known matrices. Following the model-based (conditional) approach to inference, it suffices to find the posterior distribution of W given Y = y. This is provided in the next section.


We will conclude this section by discussing a few specific models in the context of comparative trials and animal breeding which are special cases of the general model proposed in (2.2.1). First consider a multicentered clinical trial which is conducted in c participating clinics to compare two treatments, one already existing in the market and the other newly developed. Suppose there are n_ij subjects receiving the ith treatment in the jth clinic (i = 1, 2; j = 1, ..., c). Some of the n_ij could be zero. We are interested in estimating the effect of each treatment in each participating clinic.


Consider the model:

Y_ijk = μ_i + ζ_j + τ_ij + e_ijk    (k = 1, ..., n_ij; j = 1, ..., c; i = 1, 2),    (2.2.8)

where μ_i is the fixed effect due to the ith treatment, and the clinic effects ζ_j, the treatment-clinic interaction effects τ_ij and the subject effects e_ijk are mutually independent with ζ_j ~ N(0, (λ_1 r)^{-1}), τ_ij ~ N(0, (λ_2 r)^{-1}) and e_ijk ~ N(0, r^{-1}). Now we will write down the mixed linear model representation (2.2.1) for (2.2.8). For ease of presentation, assume n_ij > 0 for all i and j. Then writing

Y = (Y_111, ..., Y_11n_11, ..., Y_1c1, ..., Y_1cn_1c, Y_211, ..., Y_2cn_2c)^T,
v = (ζ_1, ..., ζ_c, τ_11, ..., τ_2c)^T,
e = (e_111, ..., e_2cn_2c)^T,

it is clear that (2.2.8) is a special case of (2.2.1) with λ = (λ_1, λ_2)^T and D(λ) = Diag(λ_1^{-1} I_c, λ_2^{-1} I_2c), where n_T = Σ_i Σ_j n_ij is the total number of observations. In the above set up we may want to estimate μ_i + ζ_j + τ_ij (i = 1, 2; j = 1, ..., c), which is a special case of W = Sb + Tv with suitably chosen S and T.












Once again, one can use the linear model given in (2.2.8) for inferential purposes. In this case one may be interested in predicting some suitable linear functions of the random quantities, known as the breeding values. These predicted breeding values can be used as a selection index for selecting the most suitable breeds for future breeding purposes.


As a concrete example in animal breeding, we will discuss the example considered by Harville (in press), which involves prediction of the average birth weights of an infinite number of single-birth male offspring of different sires of lambs in different population lines. The data consist of the weights (at birth) of 62 single-birth male lambs that came from five distinct population lines. Each lamb was the progeny of one of the rams, and each lamb had a different dam. Age of dam was recorded as belonging to one of three categories, numbered 1 (1-2 years), 2 (2-3 years) and 3 (over 3 years).

Let Y_ijkd represent the weight (at birth) of the dth of those lambs that are offspring of the kth sire in the jth population line, with a dam belonging to the ith age category. Following Harville (in press), we will use the model

Y_ijkd = μ + δ_i + π_j + s_jk + e_ijkd,    (2.2.9)














where d = 1, ..., n_ijk; i = 1, 2, 3; j = 1, ..., 5, and n_ijk is the number of lambs whose dams belong to the ith age category when the population line is j and the sire is k; the total number of lambs whose sires are from the jth population line is Σ_i Σ_k n_ijk. Here the age (of dam) effects δ_i and the line effects π_j are considered fixed effects, while the sire (within line) effects s_jk are iid N(0, (rλ)^{-1}) and independent of the error variables e_ijkd, which are iid N(0, r^{-1}). To make the design matrix associated with the fixed effects of full rank, we can take δ_3 = 0 = π_5, which is the usual formulation needed for the GLM procedures in SAS.


Writing μ_ijk = E(Y_ijkd) = μ + δ_i + π_j + s_jk, and noting that there are n_i.. observations corresponding to the ith age category, we are interested in predicting

w_jk = μ + Σ_{i=1}^3 (n_i../n_T) δ_i + π_j + s_jk,    (2.2.10)

where n_T = Σ_i n_i... The value w_jk can be interpreted as the average birth weight of an infinite number of male lambs that are offspring of the kth sire in the jth line.














2.3 Hierarchical Bayes Analysis

In this section, for the finite population sampling set up, we provide the predictive distribution of Y^(2) given Y^(1) = y^(1), and for the infinite population set up we provide the posterior distribution of the vector of effects (b^T, v^T)^T given Y = y.


We will use the following notations to label certain distributions used in this section. A random variable Z is said to have a gamma(α, g) distribution if its density is

f(z) = exp(-αz) α^g z^{g-1} / Γ(g) I_[z>0].    (2.3.1)

A random vector T = (T_1, ..., T_p)^T is said to have a multivariate t-distribution with location parameter μ, scale parameter Σ (a p.d. p x p matrix) and degrees of freedom (d.f.) ν if its density is

g(t) ∝ |Σ|^{-1/2} [1 + (t - μ)^T Σ^{-1} (t - μ)/ν]^{-(ν+p)/2}    (2.3.2)

(see Zellner, 1971, p. 383, or Press, 1972, p. 136). Here |Σ| denotes the determinant of a square matrix Σ. Assume ν > 2. Then E(T) = μ and V(T) = νΣ/(ν - 2).
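The t-kernel in (2.3.2) and the moment formulas can be checked numerically. The following sketch (function name `mvt_log_kernel` and all parameter values are illustrative assumptions) evaluates the unnormalized log-density and the variance formula V(T) = νΣ/(ν - 2).

```python
import numpy as np

# Sketch of the multivariate t density kernel of (2.3.2): for location mu,
# p.d. scale Sigma and d.f. nu,
#   g(t) proportional to [1 + (t-mu)' Sigma^{-1} (t-mu)/nu]^{-(nu+p)/2}.
def mvt_log_kernel(t, mu, Sigma, nu):
    d = t - mu
    p = len(mu)
    q = d @ np.linalg.solve(Sigma, d)    # quadratic form (t-mu)' Sigma^{-1} (t-mu)
    return -0.5 * (nu + p) * np.log1p(q / nu)

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
nu = 7.0
var_T = nu * Sigma / (nu - 2.0)          # V(T) = nu * Sigma / (nu - 2), nu > 2
```

The kernel is maximized at t = μ, consistent with E(T) = μ, and for ν = 7 the variance inflates Σ by the factor 7/5.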


Consider the model given by (A) and (B), with the prior distribution at stage (C) specified as follows:

(C1) b, R and Λ_1, ..., Λ_t are independently distributed with b ~ uniform(R^p), R ~ gamma(a_0/2, g_0/2) and Λ_i ~ gamma(a_i/2, g_i/2), with a_i > 0, i = 1, ..., t. Allowing g_i = 0 for some i, improper gamma distributions are included as a possibility in our prior.


Before stating the predictive distribution of Y^(2) given Y^(1) = y^(1), we need to introduce a few matrix notations. Write

Σ(λ) = Ψ + Z D(λ) Z^T    (2.3.3)

and partition Σ(λ) into the blocks Σ_11 (n_T x n_T), Σ_12, Σ_21 and Σ_22 conformably with the partition of Y. Also, let

Σ_22.1 = Σ_22 - Σ_21 Σ_11^{-1} Σ_12,    (2.3.4)

b_hat(λ) = (X^(1)T Σ_11^{-1} X^(1))^{-1} X^(1)T Σ_11^{-1} y^(1),    (2.3.5)

K(λ) = Σ_11^{-1} - Σ_11^{-1} X^(1) (X^(1)T Σ_11^{-1} X^(1))^{-1} X^(1)T Σ_11^{-1}.    (2.3.6)

Now the predictive distribution of Y^(2) given Y^(1) = y^(1) is given in the following theorem in two steps.












Theorem 2.3.1. Consider the model given by (2.2.2) and the prior (C1), and assume that n_T + Σ_{i=0}^t g_i - p > 0. Then:

(a) conditional on Λ = λ and Y^(1) = y^(1), Y^(2) has a multivariate t-distribution with d.f. n_T + Σ_{i=0}^t g_i - p, location parameter

μ(λ) = X^(2) b_hat(λ) + Σ_21 Σ_11^{-1} (y^(1) - X^(1) b_hat(λ)),

and scale parameter

(n_T + Σ_{i=0}^t g_i - p)^{-1} {a_0 + Σ_{i=1}^t a_i λ_i + y^(1)T K(λ) y^(1)} G(λ),

where G(λ) = Σ_22.1 + (X^(2) - Σ_21 Σ_11^{-1} X^(1)) (X^(1)T Σ_11^{-1} X^(1))^{-1} (X^(2) - Σ_21 Σ_11^{-1} X^(1))^T;

(b) the conditional (posterior) density of Λ given Y^(1) = y^(1) satisfies

f(λ | y^(1)) ∝ {Π_{i=1}^t λ_i^{g_i/2 - 1}} |Σ_11(λ)|^{-1/2} |X^(1)T Σ_11^{-1} X^(1)|^{-1/2} {a_0 + Σ_{i=1}^t a_i λ_i + y^(1)T K(λ) y^(1)}^{-(n_T + Σ_{i=0}^t g_i - p)/2}.    (2.3.7)

Using the moments of a multivariate t-distribution and the iterated formulas for conditional expectations and variances, it follows from the above theorem that

E(Y^(2) | y^(1)) = E[μ(Λ) | y^(1)],    (2.3.8)

V(Y^(2) | y^(1)) = V[μ(Λ) | y^(1)] + (n_T + Σ_{i=0}^t g_i - p - 2)^{-1} E[{a_0 + Σ_{i=1}^t a_i Λ_i + y^(1)T K(Λ) y^(1)} G(Λ) | y^(1)].    (2.3.9)
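For a fixed λ, the location parameter of part (a) is the classical GLS-based predictor. The following sketch computes it for small illustrative matrices (the function name and all inputs are assumptions, not the dissertation's data); in the uncorrelated case Σ_21 = 0 it reduces to X^(2) times the GLS estimate of b.

```python
import numpy as np

# Sketch of the conditional (known-lambda) predictive mean in Theorem 2.3.1:
#   b_hat = (X1' S11^{-1} X1)^{-1} X1' S11^{-1} y1
#   mu    = X2 b_hat + S21 S11^{-1} (y1 - X1 b_hat)
def predictive_mean(y1, X1, X2, S11, S21):
    W = np.linalg.solve(S11, X1)                   # S11^{-1} X1
    bhat = np.linalg.solve(X1.T @ W, W.T @ y1)     # GLS estimate of b
    resid = y1 - X1 @ bhat
    return X2 @ bhat + S21 @ np.linalg.solve(S11, resid), bhat

rng = np.random.default_rng(1)
X1 = rng.normal(size=(6, 2))
X2 = rng.normal(size=(3, 2))
y1 = rng.normal(size=6)
S11 = np.eye(6)
S21 = np.zeros((3, 6))                             # uncorrelated special case
mu, bhat = predictive_mean(y1, X1, X2, S11, S21)
```

With S11 = I the GLS estimate coincides with ordinary least squares, which gives a convenient correctness check.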


Using (2.3.8) and (2.3.9), it is possible to find the posterior mean and variance of ξ(Y^(1), Y^(2)) = AY^(1) + CY^(2), where A and C are known matrices. The Bayes estimate of ξ(Y^(1), Y^(2)) under any quadratic loss is the posterior mean, and is given by

ξ_hat = Ay^(1) + C E(Y^(2) | y^(1)),    (2.3.10)

using (2.3.8). Similarly, using (2.3.9), one may obtain

V(ξ | y^(1)) = C V(Y^(2) | y^(1)) C^T.    (2.3.11)

Note that when A = ⊕_{i=1}^m 1_{n_i}^T and C = ⊕_{i=1}^m 1_{N_i - n_i}^T, ξ(Y^(1), Y^(2)) reduces to the vector of finite population totals for the m small areas, while for the choice A = ⊕_{i=1}^m 1_{n_i}^T / N_i and C = ⊕_{i=1}^m 1_{N_i - n_i}^T / N_i, ξ(Y^(1), Y^(2)) reduces to the vector of finite population means for the m small areas.

Now we will get back to the infinite population set up to provide the posterior distribution of W = (b^T, v^T)^T given Y = y.
















This is done in the following theorem. The proof of this theorem will be omitted because of its similarity to the proof of Theorem 2.3.1. We will consider the model given by (2.2.1) along with the prior (C1) of this section. Recall from the middle of Section 2.2 that we have redefined the dimensions of Y, X, Z and e appearing there as Y(n_T x 1), X(n_T x p), Z(n_T x q) and e(n_T x 1). Also, we have assumed that rank(X) = p. Now we will state the theorem.


Theorem 2.3.2. Consider the model stated above, and assume that n_T + Σ_{i=0}^t g_i - p > 0. Then, conditional on Λ = λ and Y = y, W = (b^T, v^T)^T has a multivariate t-distribution with d.f. n_T + Σ_{i=0}^t g_i - p, location parameter

(b_tilde(λ)^T, v_tilde(λ)^T)^T, where b_tilde(λ) = (X^T Σ^{-1} X)^{-1} X^T Σ^{-1} y and v_tilde(λ) = D(λ) Z^T Q(λ) y,

with

Q(λ) = Σ^{-1} - Σ^{-1} X (X^T Σ^{-1} X)^{-1} X^T Σ^{-1},    (2.3.12)

and scale parameter

(n_T + Σ_{i=0}^t g_i - p)^{-1} {a_0 + Σ_{i=1}^t a_i λ_i + y^T Q(λ) y} C(λ),    (2.3.13)

where

C(λ) = [ (X^T Σ^{-1} X)^{-1}                          -(X^T Σ^{-1} X)^{-1} X^T Σ^{-1} Z D(λ)
         -D(λ) Z^T Σ^{-1} X (X^T Σ^{-1} X)^{-1}        D(λ) - D(λ) Z^T Q(λ) Z D(λ) ].    (2.3.14)















Also, the conditional (posterior) density of Λ given Y = y satisfies

f(λ | y) ∝ {Π_{i=1}^t λ_i^{g_i/2 - 1}} |Σ(λ)|^{-1/2} |X^T Σ^{-1} X|^{-1/2} {a_0 + Σ_{i=1}^t a_i λ_i + y^T Q(λ) y}^{-(n_T + Σ_{i=0}^t g_i - p)/2}.    (2.3.15)


Again, using the moments of a multivariate t-distribution and the iterated formulas for expectation and variance, the above theorem can be used to find computational formulas for E(W | y) and V(W | y), as in (2.3.8) and (2.3.9). Similarly, one can find the Bayes estimate of Sb + Tv (say) and its posterior variance, as in (2.3.10) and (2.3.11), where S(u x p) and T(u x q) are known matrices.


Applications of these two theorems will be considered in Section 2.4 with some actual data sets. There we will carry out an HB analysis of the data sets which appeared in Battese et al. (1988) and Harville (in press). Before we conclude this section, we will make a final observation. A comparison of (2.3.4)-(2.3.7) with (2.3.12)-(2.3.15) reveals that if we replace y by y^(1), X by X^(1), Σ by Σ_11 and Q by K in (2.3.15), we obtain f(λ | y^(1)) as given in (2.3.7) from f(λ | y). This observation will be referred to in Section 2.4.














2.4 Applications of Hierarchical Bayes Analysis

This section concerns the analysis of two real data sets using the HB procedures suggested in Section 2.3.


The first data set is related to the prediction of areas under corn and soybeans for 12 counties in north-central Iowa, based on the 1978 June Enumerative Survey as well as LANDSAT satellite data. It appeared in Battese, Harter and Fuller (1988) (BHF), who conducted a variance components analysis for this problem. The second data set originally appeared in Harville and Fenech (1985) and reappeared in Harville (in press), where he conducted a variance components as well as an HB analysis to predict w_jk, given in (2.2.10), the average weight of an infinite number of single-birth male lambs that are offspring of the kth sire in the jth population line.


We will first consider the BHF data set, and we start by briefly giving the background of this problem. The USDA Statistical Reporting Service field staff determined the area of corn and soybeans in 37 sample segments of 12 counties in north-central Iowa by interviewing farm operators. Based on LANDSAT readings obtained during August and September 1978, the USDA also determined the number of pixels classified as corn and soybeans for each segment. The number of hectares of corn and soybeans (from the June Enumerative Survey), the number of pixels classified as corn and soybeans for each sample segment, and the county mean number of pixels classified as corn and soybeans (the total number of pixels classified as that crop divided by the number of segments in that county) are reported in Table 1 of BHF and, for ready reference, are reproduced in Table 2.1. In order to make our results comparable with those of BHF, the second segment in Hardin county was ignored.


The model considered by BHF is

Y_ij = b_0 + b_1 x_1ij + b_2 x_2ij + v_i + e_ij,    (2.4.1)

where i is a subscript for the county and j is a subscript for a segment within a given county (j = 1, ..., n_i, the number of segments in the ith county; i = 1, ..., 12). Here x_1ij is the number of pixels of corn and x_2ij is the number of pixels of soybeans for the jth segment in the ith county. They assumed (in our notations) that E(v_i) = E(e_ij) = 0, V(v_i) = (λr)^{-1}, V(e_ij) = r^{-1}, Cov(v_i, v_i') = 0 (i ≠ i'), Cov(v_i, e_i'j) = 0 and Cov(e_ij, e_i'j') = 0 ((i, j) ≠ (i', j')). We are interested in predicting the finite population mean hectares of corn (and of soybeans) for each of the 12 counties.













Table 2.1 Survey and Satellite Data for Corn and Soybeans in 12 Iowa Counties

(Reproduced from Table 1 of BHF. Columns: county; number of sample segments; reported hectares of corn and soybeans in each sample segment; mean number of pixels per segment classified as corn and soybeans. The individual entries are not reproduced here.)











γ_i can be written as

γ_i = b_0 + b_1 x_bar_1i(p) + b_2 x_bar_2i(p) + v_i + e_bar_i,

where e_bar_i = N_i^{-1} Σ_{j=1}^{N_i} e_ij, x_bar_1i(p) = N_i^{-1} Σ_{j=1}^{N_i} x_1ij and x_bar_2i(p) = N_i^{-1} Σ_{j=1}^{N_i} x_2ij. Under the assumptions of model (2.4.1), θ_i = b_0 + b_1 x_bar_1i(p) + b_2 x_bar_2i(p) + v_i can be interpreted as the conditional mean hectares of corn (or soybeans) per segment, given the realized county effect and the values of the satellite data. Clearly, θ_i is not equivalent to the finite population mean γ_i, because e_bar_i, the average of the e_ij over the segments in the county, is not identically 0. However, if either the N_i are appropriately large or the sampling rates n_i/N_i (i = 1, ..., m) are small, then θ_i appears to be an appropriate predictor of γ_i; in this example either condition appears to be true.

For predicting the γ_i, BHF first assumed λ and r known and obtained the BLUPs of γ_i (i = 1, ..., 12). Then, using Henderson's method, they obtained estimates of the variance components, so that their final predictors involved the estimated variance components. Henderson's method, being an ANOVA method, could lead to negative estimates of the variance components; if this were the case, they set the estimate equal to zero.













Since the model (2.4.1) is a special case of the nested error regression model, we will now develop expressions for the posterior distribution of Λ and for the posterior means and variances of the γ_i given in Section 2.3. Here we have Ψ = I_{n_T} and D(λ) = λ^{-1} I_m. Then

Σ_11 = ⊕_{i=1}^m (I_{n_i} + λ^{-1} J_{n_i}),  so that  Σ_11^{-1} = ⊕_{i=1}^m {I_{n_i} - J_{n_i}/(λ + n_i)},

where J_n = 1_n 1_n^T. Also, writing x_ij = (1, x_1ij, x_2ij)^T and x_bar_i(s) = n_i^{-1} Σ_{j=1}^{n_i} x_ij, one gets

X^(1)T Σ_11^{-1} X^(1) = Σ_{i=1}^m {Σ_{j=1}^{n_i} x_ij x_ij^T - n_i^2 (n_i + λ)^{-1} x_bar_i(s) x_bar_i(s)^T} = H(λ)  (say).    (2.4.2)

Next, writing y_bar_i(s) = n_i^{-1} Σ_{j=1}^{n_i} y_ij, one gets

X^(1)T Σ_11^{-1} y^(1) = Σ_{i=1}^m {Σ_{j=1}^{n_i} x_ij y_ij - n_i^2 (n_i + λ)^{-1} x_bar_i(s) y_bar_i(s)},

and

Q_0(λ) = y^(1)T K(λ) y^(1) = Σ_{i=1}^m {Σ_{j=1}^{n_i} y_ij^2 - n_i^2 (n_i + λ)^{-1} y_bar_i(s)^2} - {X^(1)T Σ_11^{-1} y^(1)}^T H^{-1}(λ) {X^(1)T Σ_11^{-1} y^(1)}.    (2.4.3)












The posterior density of Λ is then

f(λ | y^(1)) ∝ λ^{(m + g_1)/2 - 1} {Π_{i=1}^m (λ + n_i)^{-1/2}} |H(λ)|^{-1/2} {a_0 + a_1 λ + Q_0(λ)}^{-(n_T + g_0 + g_1 - p)/2}.    (2.4.4)


Next, writing f_i = (N_i - n_i)/N_i and x_bar_i = (N_i - n_i)^{-1} Σ_{j=n_i+1}^{N_i} x_ij, the posterior means, variances and covariances of the finite population means are given by

E(γ_i | y^(1)) = (1 - f_i) y_bar_i(s) + f_i E[ x_bar_i^T b_hat(Λ) + n_i (n_i + Λ)^{-1} (y_bar_i(s) - x_bar_i(s)^T b_hat(Λ)) | y^(1) ] = e_i^HB  (say),    (2.4.5)

where b_hat(λ) = H^{-1}(λ) X^(1)T Σ_11^{-1} y^(1). The posterior variance V(γ_i | y^(1)) = s_i^HB2 (say) (2.4.6) and the posterior covariances Cov(γ_i, γ_k | y^(1)) (2.4.7) are obtained in the same way from Theorem 2.3.1; each consists of the variability of the conditional mean in (2.4.5) over the posterior distribution of Λ, plus the posterior expectation of the corresponding conditional variance or covariance given Λ, as in (2.3.9).
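For a fixed λ, the bracketed quantity in (2.4.5) is a shrinkage predictor: a convex combination of the regression-synthetic value and the direct sample information, with weight n_i/(n_i + λ) on the area effect. A minimal sketch follows; the helper `g1` and all numeric inputs are illustrative assumptions mirroring the form of (2.4.5), not the BHF computation.

```python
import numpy as np

# Sketch of the conditional (known-lambda) HB predictor of one finite
# population mean, in the form of (2.4.5): (1 - f_i) * sample mean plus
# f_i * predicted mean of the unsampled units.
def g1(lam, f, n, ybar_s, xbar_s, xbar_ns, bhat):
    w = n / (n + lam)                     # shrinkage weight for the area effect
    vhat = w * (ybar_s - xbar_s @ bhat)   # predicted realized area effect
    unsampled = xbar_ns @ bhat + vhat     # predicted mean of unsampled units
    return (1.0 - f) * ybar_s + f * unsampled

bhat = np.array([0.5, 1.0])               # illustrative regression estimate
val = g1(lam=2.0, f=0.8, n=8, ybar_s=10.0,
         xbar_s=np.array([1.0, 2.0]), xbar_ns=np.array([1.0, 2.0]), bhat=bhat)
```

As λ decreases toward 0 (area variance dominating), the weight n_i/(n_i + λ) tends to 1 and, with matching covariate means, the predictor collapses to the direct sample mean.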













Although (2.4.7) provides the posterior covariances, which may be necessary for providing a simultaneous confidence set for the finite population mean vector, we will not use it here.


Before we find the posterior means and variances of the θ_i in the infinite population set up, we give a general discussion comparing HB predictors with EB predictors.


Writing

g_1i(λ) = (1 - f_i) y_bar_i(s) + f_i {x_bar_i^T b_hat(λ) + n_i (n_i + λ)^{-1} (y_bar_i(s) - x_bar_i(s)^T b_hat(λ))}    (2.4.8)

for the conditional posterior mean of γ_i given Λ = λ, and writing

g_2i(λ) = (n_T + g_0 + g_1 - p - 2)^{-1} {a_0 + a_1 λ + Q_0(λ)} c_i(λ)    (2.4.9)

for the corresponding conditional posterior variance, where c_i(λ) is the scalar factor involving f_i, n_i, N_i, λ and H^{-1}(λ) arising from the scale parameter in Theorem 2.3.1,


we have from (2.4.5), (2.4.6), (2.4.8) and (2.4.9) that

e_i^HB = E[g_1i(Λ) | y^(1)],    (2.4.10)

s_i^HB2 = V[g_1i(Λ) | y^(1)] + E[g_2i(Λ) | y^(1)]    (2.4.11)

= V_1i + V_2i  (say).    (2.4.12)

In EB analysis, to obtain the EB predictor, one usually replaces E[g_1i(Λ) | y^(1)] by g_1i(λ_hat) = e_i^EB (say), where λ_hat is some estimate of λ, which can be the ML, REML or ANOVA estimate, and reports the naive measure of posterior variance g_2i(λ_hat) (say). Usually, the point estimates e_i^HB and e_i^EB are not too far apart. But using g_2i(λ_hat) to measure the posterior variance, we may underestimate the actual measure because of the failure to account for the estimation of λ. We may grossly underestimate the actual measure if g_1i(λ) varies too much within the effective body of the posterior distribution of Λ; in this case V_1i will be significantly large. We will see in this example that for some of the counties V_1i is substantial, and that the contribution of V_1i to s_i^HB2 usually increases with the relative difference between e_i^HB and e_i^EB.


Now, to develop expressions for the posterior means and variances of the θ_i, we use Theorem 2.3.2. Note that, by the observation made at the end of Section 2.3, the posterior distribution of Λ given y^(1) is given by f(λ | y^(1)) in (2.4.4). After considerable simplifications in this particular case, we obtain

E(θ_i | y^(1)) = E[ n_i (n_i + Λ)^{-1} y_bar_i(s) + {x_bar_i(p) - n_i (n_i + Λ)^{-1} x_bar_i(s)}^T b_hat(Λ) | y^(1) ] = e_tilde_i^HB  (say),    (2.4.13)

V(θ_i | y^(1)) = V[ n_i (n_i + Λ)^{-1} y_bar_i(s) + {x_bar_i(p) - n_i (n_i + Λ)^{-1} x_bar_i(s)}^T b_hat(Λ) | y^(1) ]
 + (n_T + g_0 + g_1 - p - 2)^{-1} E[ {a_0 + a_1 Λ + Q_0(Λ)} {(n_i + Λ)^{-1} + (x_bar_i(p) - n_i (n_i + Λ)^{-1} x_bar_i(s))^T H^{-1}(Λ) (x_bar_i(p) - n_i (n_i + Λ)^{-1} x_bar_i(s))} | y^(1) ] = s_tilde_i^HB2  (say),    (2.4.14)

where x_bar_i(p) = (1, x_bar_1i(p), x_bar_2i(p))^T, H(λ) is as given in (2.4.2) and Q_0(λ) is as given in (2.4.3). Note that since x_bar_i - x_bar_i(p) → 0 and f_i → 1 as N_i → ∞, it can be seen informally that the rhs of (2.4.5) and (2.4.6) approach in the limit (2.4.13) and (2.4.14), respectively.


We will now get back to the actual data analysis of the data set given in Table 2.1. We use the formulas (2.4.5), (2.4.6), (2.4.8), (2.4.9), (2.4.13) and (2.4.14) to obtain the HB and EB posterior means and variances of the finite population means for the 12 counties. The HB approach eliminates the possibility of obtaining zero estimates of the variance components. A number of different priors for R and Λ were tried, both informative and noninformative; the results for the posterior means were quite similar, whereas the posterior variances varied by approximately as much as 10%. For illustration purposes, we have decided to report our analysis for the prior with a_0 = 0.005, g_0 = 0, a_1 = 0.005 and g_1 = 0. Since the choice a_1 = 0 gives an improper posterior distribution, we took a_1 > 0.












Table 2.2 Predicted Hectares of Corn and Associated Standard Errors (a_0 = .005, a_1 = .005)

County eHB eEB eBHF sHB sEB sBHF


Cerro Gordo 122.1 122.2 122.2 9.3 9.4 10.3
Franklin 143.6 144.2 145.3 6.9 6.4 6.7
Hamilton 126.2 126.2 126.5 9.2 9.3 10.1
Hancock 124.6 124.4 124.2 5.3 5.3 5.5
Hardin 142.6 143.0 143.5 5.8 5.6 5.8
Humboldt 108.9 108.5 107.7 8.2 7.9 8.4
Kossuth 107.7 106.9 106.1 5.8 5.2 5.4
Pocahontas 111.8 112.1 112.9 6.6 6.4 6.8
Webster 114.9 115.3 116.0 5.9 5.7 6.0
Winnebago 113.3 112.8 112.1 6.6 6.4 6.8
Worth 107.1 106.8 105.6 9.9 9.1 10.0
Wright 122.0 122.0 122.1 6.4 6.5 6.9















Table 2.2 provides the values of eHB, eEB and eBHF and the respective associated standard errors sHB, sEB and sBHF for the corn data. Table 2.3 provides the values of eHB, e~HB, eEB and eBHF for the soybeans data for the same choice of prior hyperparameters, whereas Table 2.4 provides their respective standard errors along with the components of sHB. The values of eBHF and sBHF presented in Tables 2.2-2.4 are computed using FORTRAN from the formulas given in the BHF paper, and are slightly different from the values reported in Battese et al. (1988).


From Tables 2.2 and 2.3, for predicting both corn and soybeans, one can see that eHB, e~HB, eEB and eBHF are quite close to each other. From Tables 2.2 and 2.4, sEB and sHB appear to be smaller than sBHF. But since sEB is the naive posterior s.d., it probably underestimates the true measure. From Tables 2.3 and 2.4, we find hardly any difference either between eHB and e~HB or between their standard errors sHB and s~HB. This is what we anticipated for this data.


To draw a clearer comparison between the HB and EB procedures, we added one extra column at the end of each of Tables 2.3 and 2.4. The last column of Table 2.3 measures the percent relative difference 100 x |eHB - eEB| / eHB between the EB and HB predicted values, whereas the last column of Table 2.4 reports the percent contribution 100 x V1/(V1 + V2) of V1 to the posterior variance.










Table 2.3 The Predicted Hectares of Soybeans Obtained Using Different Procedures (a_0 = .005, a_1 = .005)

County eHB e~HB eEB eBHF |eHB - eEB|/eHB x 100%


Cerro Gordo 78.8 78.8 78.2 77.5 0.78
Franklin 67.1 67.1 65.9 64.8 1.80
Hamilton 94.4 94.4 94.6 96.0 0.21
Hancock 100.4 100.4 100.8 101.1 0.40
Hardin 75.4 75.4 75.1 74.9 0.39
Humboldt 81.9 82.0 80.6 79.2 1.71
Kossuth 118.2 118.2 119.2 120.2 0.84
Pocahontas 113.9 113.9 113.7 113.8 0.18
Webster 110.0 110.0 109.7 109.6 0.37
Winnebago 97.3 97.3 98.0 98.7 0.72
Worth 87.8 87.8 87.2 86.6 0.68
Wright 111.9 111.9 112.4 112.9 0.45













Table 2.4 Standard Errors Associated with Different Predictors of Hectares of Soybeans (a_0 = .005, a_1 = .005)

County sHB s~HB sEB sBHF V1 V2 V1/(V1+V2) x 100%


Cerro Gordo 11.7 11.7 11.6 12.7 7.67 128.59 5.1
Franklin 8.2 8.2 7.5 7.8 11.94 54.92 18.0
Hamilton 11.2 11.2 11.4 12.4 1.97 123.61 1.6
Hancock 6.2 6.3 6.1 6.3 1.35 37.59 3.4
Hardin 6.5 6.5 6.5 6.6 0.37 41.84 0.9
Humboldt 10.4 10.4 9.9 10.0 22.62 85.40 20.9
Kossuth 6.6 6.7 6.0 6.2 7.99 36.23 18.1
Pocahontas 7.5 7.5 7.5 7.9 0.06 55.98 0.1
Webster 6.6 6.7 6.6 6.8 0.64 43.51 1.5
Winnebago 7.7 7.8 7.5 7.9 4.11 55.70 6.9
Worth 11.1 11.1 11.1 12.1 4.06 118.17 3.3
Wright 7.7 7.7 7.6 8.0 1.62 57.48 2.7














The contribution of V1 usually increases with the relative difference between eHB and eEB. In particular, for the counties Franklin, Humboldt and Kossuth these relative differences are as high as 1.80%, 1.71% and 0.84%, and the corresponding contributions made by V1 are as nonnegligible as 18.0%, 20.9% and 18.1%. This makes sEB much smaller than sHB for these counties. Thus, if one uses a naive EB or estimated BLUP approach, he will tend to underestimate the mean squared error (MSE) of prediction.


One should note that though BHF used an estimated BLUP, they tried to account for the uncertainty involved in the estimation of the variance components in their approximations of the MSE. Similar approximations of the MSE of prediction have been suggested by Kackar and Harville (1984), Prasad and Rao (1990) and Lahiri (1990).


Now we will consider the lamb-weight data set given in Harville (in press) as an example. The background of the data set was presented in Section 2.2. We will use a model similar to the one given in (2.2.9) to analyze the data set. There it was assumed, following Harville (in press), that the population line effects are fixed. For the purpose of illustration, we will assume a model with three variance components, treating the population line effects as random. The design matrix corresponding to the age of dam effects is of full column rank.

Now we have the following mixed linear model:

Y_ijkd = μ_i + l_j + s_jk + e_ijkd,    (2.4.15)

d = 1, ..., n_ijk,  k = 1, ..., m_j,  j = 1, ..., 5,  i = 1, ..., 3,

where the μ_i are fixed age of dam effects, the l_j (line effects) are iid N(0, r_A1⁻¹), the s_jk (sire effects) are iid N(0, r_A2⁻¹), and the e_ijkd are iid N(0, r⁻¹); moreover, as in (2.2.9), the l_j, s_jk and e_ijkd are assumed to be mutually independent.
We want to predict w_jk given in (2.2.10). Using (2.4.15), we will rewrite it as

w_jk = n_T⁻¹ Σ_i n_i·· μ_i + l_j + s_jk,    (2.4.16)

where the n_i·· and n_T are given in (2.2.10).


We will carry out a noninformative Bayesian analysis using a uniform(R³) prior for the fixed effects and independent gamma(½a₀, ½g₀), gamma(½a₁, ½g₁) and gamma(½a₂, ½g₂) priors for R, R_A1 and R_A2, respectively.


Using Theorem 2.3.2, w_jk being a linear combination of the fixed and random effects, we can find its posterior mean E(w_jk | Y) = e^HB_jk (say) and its posterior variance V(w_jk | Y).

As in the previous example, a zero choice of the hyperparameters gives a noninformative choice of prior for the variance components, but g₁ = 0 or g₂ = 0 will give an improper posterior distribution of (A₁, A₂). So we tried several combinations of these hyperparameters which are small positive numbers. Our findings for this data set are not different from those for the previous data set. The data are provided in Table 2.5, and we report our analysis for a₀ = .0005, a₁ = .05 and a₂ = .01 in Table 2.6.


The estimated BLUPs for w₁₃ and w₅₆ reported in Harville (in press) are 10.98 and 10.29 respectively, whereas the corresponding values we obtained using the noninformative HB analysis are 11.0 and 10.4 respectively. The agreement between the two sets of estimates is remarkably close considering the fact that the underlying models of Section 2.2 and (2.4.15) are not identical.


Harville (in press) also estimated the difference w₁₃ − w₅₆ and the associated MSE of prediction using both the variance components approach and the HB approach. The estimated MSE of prediction of w₁₃ − w₅₆ given by the naive EBLUP approach was (0.955)², whereas for the Kackar and Harville (1984) approximation it was (1.053)², as it was for the Prasad and Rao (1990) approximation.













Table 2.5. Birth Weights (in pounds) of Lambs

(Columns: Sire, Dam Age, Weight, listed by population line, Lines 1-5.)
Table 2.6. Predicted Birth Weights of Lambs and Associated Standard Errors
(a₀ = .0005, g₀ = 0; a₁ = .05, g₁ = 0; a₂ = .01, g₂ = 0)

(Columns: Line, Sire, e*, s*, e^HB, s^HB.)
The HB estimate of w₁₃ − w₅₆ in this case was reported as 0.69 with posterior s.d. 1.042, whereas the corresponding values obtained using our approach are 0.60 and 0.99 respectively.


To conclude this section, we can recommend, from whatever we have learned from the noninformative HB analysis of these data sets, that the HB method is clearly a viable alternative to the usual EB or variance components approach, and should be given every serious consideration for prediction in both the finite population sampling and the infinite population situation.


2.5 Hierarchical Bayes Prediction of the Finite Population Mean Vector in the Absence of Unit Level Observations


Sometimes it is either difficult or impossible to obtain information at the unit level for small areas. In this section we will derive the HB predictor of the finite population mean vector when we do not have observations at the unit level.


For the ith (i = 1, ..., m) small area with N_i units, assume that based on a sample of size n_i we know only the sample mean Ȳ_i(s) of the characteristic of interest and the sample mean x̄_i(s) (p×1) of the auxiliary variables for which we have information. We want to predict the finite population mean vector γ = (γ₁, ..., γ_m)ᵀ, where γ_i = N_i⁻¹ Σ_{j=1}^{N_i} Y_ij (i = 1, ..., m), based on Ȳ(s) = (Ȳ₁(s), ..., Ȳ_m(s))ᵀ.


Consider the following model:

(i) conditional on θ_i (N_i×1), i = 1, ..., m, Y_i ~ N(θ_i, R_i I_{N_i}) independently, where the R_i are known sampling variances;

(ii) conditional on b (p×1) and δ, θ_i ~ N(X_i b, δ⁻¹ I_{N_i}) independently;

(iii) B and Δ are independent, with B ~ uniform(Rᵖ) and Δ having a gamma(½a, ½g) prior.

Combining (i) and (ii) we have, conditional on b and δ,

Y_i ~ N(X_i b, (R_i + δ⁻¹) I_{N_i}),  i = 1, ..., m,

independently.
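The two-stage variance in the combined model can be checked by simulation. The sketch below (an illustration only, with arbitrarily chosen R_i, δ and mean) draws θ_ij ~ N(x_iᵀb, δ⁻¹), then Y_ij ~ N(θ_ij, R_i), and compares the empirical variance of Y_ij with R_i + δ⁻¹:

```python
import random, statistics

random.seed(1)
Ri, delta, mean_i = 2.0, 0.5, 10.0   # hypothetical values for one area
n = 200_000

# stage (ii): theta ~ N(x_i'b, 1/delta); stage (i): Y ~ N(theta, R_i)
ys = []
for _ in range(n):
    theta = random.gauss(mean_i, (1.0 / delta) ** 0.5)
    ys.append(random.gauss(theta, Ri ** 0.5))

print(statistics.variance(ys))  # should be close to R_i + 1/delta = 4.0
```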


Carter and Rolph (1974) introduced this type of model, and Fay and Herriot (1979) considered an EB approach to this problem in a special case. In place of (i) they assumed that, conditional on θ₁, ..., θ_m, Ȳ_i(s) ~ N(θ_i, n_i⁻¹r_i), i = 1, ..., m, independently, and in place of (ii) that, conditional on b, θ_i ~ N(x_iᵀb, δ⁻¹), i = 1, ..., m, independently.













Since the variance was unknown, they estimated it iteratively, applying a generalized least squares procedure to Ȳ_i(s) ~ N(x_iᵀb, n_i⁻¹r_i + δ⁻¹), i = 1, ..., m, independently, where Ȳ_i(s) is the sample mean based on n_i units.
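The generalized least squares step can be sketched for a scalar regression coefficient (p = 1): each area mean is weighted by the reciprocal of its variance n_i⁻¹r_i + δ⁻¹. The numbers below are made up purely for illustration, not taken from any data set in this chapter:

```python
# Weighted (generalized) least squares for a scalar b in
#   Ybar_i ~ N(x_i * b, r_i/n_i + 1/delta), i = 1, ..., m.
def gls_scalar(x, ybar, r, n, delta):
    w = [1.0 / (ri / ni + 1.0 / delta) for ri, ni in zip(r, n)]
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, ybar))
    den = sum(wi * xi * xi for wi, xi in zip(w, x))
    return num / den

# tiny made-up example: equal weights and x_i = 1, so the GLS
# estimate reduces to the plain average of the area means
b_hat = gls_scalar([1, 1, 1], [9.0, 10.0, 11.0], [2.0, 2.0, 2.0],
                   [4, 4, 4], 0.5)
print(b_hat)  # 10.0
```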


They estimated the θ_i, i = 1, ..., m, based on their superpopulation model, whereas we are interested in predicting the finite population means γ_i, i = 1, ..., m, based on (i) - (iii).


Now we will get back to our problem. For the sake of notational simplicity, we will assume without any loss of generality that the sample mean Ȳ_i(s) = n_i⁻¹ Σ_{j=1}^{n_i} Y_ij is based on the first n_i units. Now first define

Ȳ_i(u) = (N_i − n_i)⁻¹ Σ_{j=n_i+1}^{N_i} Y_ij,  i = 1, ..., m.


Then, writing f_i = (N_i − n_i)/N_i, we have γ_i = (1 − f_i)Ȳ_i(s) + f_i Ȳ_i(u), i = 1, ..., m. Since the vector Ȳ(s) is known, to predict γ it is enough to find a predictor of Ȳ(u) = (Ȳ₁(u), ..., Ȳ_m(u))ᵀ (say); that is, it is enough to find the predictive distribution of Ȳ(u) given the sample mean vector Ȳ(s).


Under any quadratic loss, the predictor of γ_i is given by its posterior mean; we have

E(γ_i | Ȳ(s)) = (1 − f_i)Ȳ_i(s) + f_i E(Ȳ_i(u) | Ȳ(s)),    (2.5.1)

and its posterior variance is given by

V(γ_i | Ȳ(s)) = f_i² V(Ȳ_i(u) | Ȳ(s)).    (2.5.2)
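The decomposition behind (2.5.1) is easy to compute: once a predicted nonsampled mean is available, the predictor of γ_i is a weighted combination of the sampled and nonsampled means with weight f_i = (N_i − n_i)/N_i. A small numeric sketch with hypothetical N_i, n_i and means:

```python
# gamma_i = (1 - f_i) * sampled mean + f_i * predicted nonsampled mean,
# where f_i = (N_i - n_i) / N_i is the nonsampled fraction.
def predict_gamma(N, n, ybar_s, ybar_u_pred):
    f = (N - n) / N
    return (1 - f) * ybar_s + f * ybar_u_pred

# hypothetical area: 100 units, 20 sampled
g = predict_gamma(100, 20, ybar_s=12.0, ybar_u_pred=10.0)
print(g)  # 0.2*12 + 0.8*10 = 10.4
```

When the area is fully sampled (n_i = N_i), f_i = 0 and the predictor reduces to the observed sample mean, as it should.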


Now from (i) and (ii), given b and δ, (Ȳ₁(s), ..., Ȳ_m(s), Ȳ₁(u), ..., Ȳ_m(u))ᵀ is multivariate normal (MVN) with mean (μ₁, ..., μ_{2m})ᵀ and variance Diag(σ₁², ..., σ²_{2m}), where μ_i = x̄_i(s)ᵀb, μ_{i+m} = x̄_i(u)ᵀb, σ_i² = (R_i + δ⁻¹)n_i⁻¹ and σ²_{i+m} = (R_i + δ⁻¹)(N_i − n_i)⁻¹, i = 1, ..., m. From this, it is easy to derive that given Ȳ(s), b and δ, Ȳ(u) is MVN with

E(Ȳ_i(u) | Ȳ(s), b, δ) = x̄_i(u)ᵀb,    (2.5.3)

Cov(Ȳ_i(u), Ȳ_k(u) | Ȳ(s), b, δ) = δ_{ik} σ²_{i+m},    (2.5.4)

where δ_{ik} is the Kronecker delta, which is 1 if i = k and zero otherwise.

Using the iterative formulas for expectation and variance, we have from (2.5.1), (2.5.3) and (2.5.4)

E(γ_i | Ȳ(s)) = (1 − f_i)Ȳ_i(s) + f_i E[E(Ȳ_i(u) | Ȳ(s), B, Δ) | Ȳ(s)]
             = (1 − f_i)Ȳ_i(s) + f_i x̄_i(u)ᵀ E(B | Ȳ(s)),    (2.5.5)













V(γ_i | Ȳ(s)) = f_i² V(Ȳ_i(u) | Ȳ(s))
             = f_i² {E[V(Ȳ_i(u) | Ȳ(s), B, Δ) | Ȳ(s)] + V[E(Ȳ_i(u) | Ȳ(s), B, Δ) | Ȳ(s)]}
             = f_i² {E(σ²_{i+m} | Ȳ(s)) + x̄_i(u)ᵀ V(B | Ȳ(s)) x̄_i(u)}.    (2.5.6)
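The iterative (tower) formulas used in (2.5.5) and (2.5.6), E(Y) = E[E(Y|X)] and V(Y) = E[V(Y|X)] + V[E(Y|X)], can be verified numerically for a simple two-stage normal pair (an illustration only, unrelated to the survey data):

```python
import random, statistics

random.seed(2)
# two-stage draw: X ~ N(0, 1), then Y | X ~ N(X, 4)
xs = [random.gauss(0.0, 1.0) for _ in range(200_000)]
ys = [random.gauss(x, 2.0) for x in xs]

# V(Y) should match E[V(Y|X)] + V[E(Y|X)] = 4 + 1 = 5
print(statistics.variance(ys))
```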


Note that from (i) and (ii) we have

(iv) given b and δ, Ȳ(s) ~ N(Ab, V), where

A = (x̄₁(s), ..., x̄_m(s))ᵀ,    (2.5.7)

V = Diag(σ₁², ..., σ_m²);    (2.5.8)

and from (iii) we can write

(v) given Δ = δ, B ~ uniform(Rᵖ).

From (iv) and (v) we have the joint density of Ȳ(s) and B given δ:

f(ȳ(s), b | δ) ∝ {Π_{i=1}^m (1/σ_i)} exp{−½(ȳ(s) − Ab)ᵀV⁻¹(ȳ(s) − Ab)}.    (2.5.9)

We assume rank(A) = p and define

b̂ = (AᵀV⁻¹A)⁻¹AᵀV⁻¹Ȳ(s), with AᵀV⁻¹A = Σ_{i=1}^m x̄_i(s)x̄_i(s)ᵀ/σ_i²,    (2.5.10)

Q(Ȳ(s)) = (Ȳ(s) − Ab̂)ᵀV⁻¹(Ȳ(s) − Ab̂).    (2.5.11)











Then the exponent in (2.5.9) can be written as

(ȳ(s) − Ab)ᵀV⁻¹(ȳ(s) − Ab) = (b − b̂)ᵀ(AᵀV⁻¹A)(b − b̂) + Q(ȳ(s)).    (2.5.12)

From (2.5.9) and (2.5.12), it follows that given Ȳ(s) and δ, B ~ N(b̂, (AᵀV⁻¹A)⁻¹). Note in (2.5.10) that b̂ depends on Ȳ(s) and δ, since the σ_i² depend on δ.

Again using the iterative formulas for expectation and variance, we have

E(B | Ȳ(s)) = E[E(B | Ȳ(s), Δ) | Ȳ(s)] = E(b̂ | Ȳ(s)),    (2.5.13)

V(B | Ȳ(s)) = E[V(B | Ȳ(s), Δ) | Ȳ(s)] + V[E(B | Ȳ(s), Δ) | Ȳ(s)]
            = E[(AᵀV⁻¹A)⁻¹ | Ȳ(s)] + V(b̂ | Ȳ(s)).    (2.5.14)












In order to evaluate E(γ_i | Ȳ(s)) and V(γ_i | Ȳ(s)), it follows from (2.5.5), (2.5.6), (2.5.13) and (2.5.14) that it is enough to evaluate E(σ²_{i+m} | Ȳ(s)) and the quantities that appear on the right-hand sides of (2.5.13) and (2.5.14). To evaluate them, we need to find the conditional distribution of Δ given Ȳ(s).


From (iii), (iv) and (v), the joint density of Ȳ(s), B and Δ is given by

f(ȳ(s), b, δ) ∝ {Π_{i=1}^m (1/σ_i)} exp{−½(ȳ(s) − Ab)ᵀV⁻¹(ȳ(s) − Ab)} δ^{½a−1} exp(−½gδ).    (2.5.15)

Using the fact that given Ȳ(s) and δ, B ~ N(b̂, (AᵀV⁻¹A)⁻¹), and integrating out b from (2.5.15), one finds that the joint density of Ȳ(s) and Δ is given by

f(ȳ(s), δ) ∝ {Π_{i=1}^m (1/σ_i)} |AᵀV⁻¹A|^(-1/2) δ^{½a−1} exp{−½Q(ȳ(s)) − ½gδ}.    (2.5.16)

Since f(δ | ȳ(s)) = f(ȳ(s), δ)/f(ȳ(s)) ∝ f(ȳ(s), δ), the conditional density of Δ given Ȳ(s) follows from (2.5.16).















.5.14)


are


accomplished


now


using


.16)


typically


some


numerical


integration


techniques.
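The kind of numerical integration involved can be sketched as a simple quadrature: a posterior expectation of a function h(δ) is the ratio of the integrals of h(δ)f(ȳ(s), δ) and f(ȳ(s), δ) over δ > 0. The grid, the unnormalized density q and the function h below are stand-ins, not the actual posterior of this section:

```python
# Posterior expectation E[h(delta) | data] = Int h q / Int q,
# approximated on a grid by the trapezoidal rule; q is any
# unnormalized posterior density (here a stand-in gamma shape).
import math

def trapezoid(vals, grid):
    return sum((vals[i] + vals[i + 1]) * (grid[i + 1] - grid[i]) / 2
               for i in range(len(grid) - 1))

grid = [0.001 + i * 0.01 for i in range(3000)]
q = [d ** 1.5 * math.exp(-2.0 * d) for d in grid]   # unnormalized density
h = [1.0 / d for d in grid]                          # h(delta) = 1/delta

post_mean = trapezoid([hi * qi for hi, qi in zip(h, q)], grid) / trapezoid(q, grid)
print(post_mean)
```

For this stand-in q (a gamma shape with shape 2.5 and rate 2), the exact value of E[1/δ] is 2/1.5 ≈ 1.333, which the quadrature reproduces closely.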


















CHAPTER THREE
OPTIMALITY OF BAYES PREDICTORS FOR MEANS IN A SPECIAL CASE

3.1 Introduction


In Chapter Two, a hierarchical Bayes procedure was introduced for prediction in mixed linear models, and the results were utilized for the prediction purpose in both the finite population sampling setup and the infinite population setup in the presence of auxiliary information. There we considered the general case of unknown variance components and derived the posterior distributions of interest by assigning a uniform prior to the fixed effects and independent gamma priors to the inverse variance components.


In this chapter, we will consider a special case. We assume that the ratios of the variance components are known. We derive the HB predictor for the mean vector and prove some optimal properties of this predictor.


In Section 3.2, we consider the normal linear model (2.2.2) of Section 2.2 with the vector of ratios of variance components known, and assign a uniform prior to B. In the finite population sampling setup, we derive the posterior distribution of the nonsampled units given the sampled units, and from this the HB predictor of the finite population mean vector. Later in this section, for the infinite population situation, the posterior distribution of the vector of fixed and random effects and the HB predictors for linear combinations of fixed and random effects are determined.


Our approach to these problems can be regarded as an extension of the ideas of Lindley and Smith (1972) to prediction. Although developed within a Bayesian framework, our results should be of appeal also to frequentists.


In both problems, the BLUP notion for real valued parameters (see, for example, Henderson, 1963; Royall, 1976) is extended in Sections 3.3 and 3.4 to vector valued parameters, and it is shown that the Bayesian predictors of Section 3.2 are indeed BLUPs. Like other related papers, our BLUP results do not require any normality assumption. With the added assumption of normality, the BLUPs indeed turn out to be best unbiased predictors (BUPs) within the class of all unbiased predictors. In addition, it is shown that these Bayes predictors are BUPs even for some nonnormal distributions.


In Sections 3.5 and 3.6 we have shown that these Bayes predictors are best equivariant predictors for both the matrix loss (or standardized matrix loss) and quadratic loss (or standardized quadratic loss) under suitable groups of transformations for elliptically symmetric distributions, a broad class of distributions including but not limited to the normal distribution.


We conclude this section by introducing a few notations. For a square matrix T (t×t), tr(T) denotes its trace. For a symmetric nonnegative definite (n.n.d.) matrix T, T^(1/2) is a symmetric n.n.d. matrix such that T^(1/2)T^(1/2) = T; for a symmetric p.d. matrix T, T^(-1/2) is a symmetric p.d. matrix such that T^(-1/2) = (T^(1/2))⁻¹.
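For a diagonal n.n.d. matrix the symmetric square root is just the elementwise square root, which makes the notation concrete (a minimal sketch, using a made-up diagonal T):

```python
# Symmetric square root of a diagonal n.n.d. matrix T:
# T^{1/2} is diagonal with entries sqrt(t_ii), and T^{1/2} T^{1/2} = T.
import math

T = [4.0, 9.0, 0.0]                 # diagonal entries of T
T_half = [math.sqrt(t) for t in T]  # diagonal entries of T^{1/2}
print([x * x for x in T_half])      # recovers the diagonal of T
```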


3.2 The Hierarchical Bayes Predictor in a Special Case

We will assume the normal linear model (2.2.2) of Section 2.2 when A is known. We consider in this section the special case where the vector of ratios of variance components is known, while B and R are independently distributed with B ~ uniform(Rᵖ) and R ~ gamma(½a₀, ½g₀). Here we will consider the case of finite population sampling in detail.











We are still interested in finding the HB predictor of γ(Y(1), Y(2)) = ΛY(1) + CY(2) (recall Section 2.3). It suffices for this to find the predictive distribution of Y(2) given Y(1).

Recall the notations K, M and G given in (2.3.4) - (2.3.6). Since A is known in this case, we have the following Theorem 3.2.1 instead of Theorem 2.3.1. The proof of Theorem 3.2.1 is similar to that of Theorem 2.3.1 and is omitted.


Theorem 3.2.1. Assume that n_T + g₀ − p > 0. Then, under the model given in Section 2.2 with A known, the uniform(Rᵖ) prior for B and the independent gamma(½a₀, ½g₀) prior for R, the predictive distribution of Y(2) given Y(1) = y(1) is a multivariate t-distribution with d.f. n_T + g₀ − p, location parameter My(1) and scale parameter (n_T + g₀ − p)⁻¹(a₀ + y(1)ᵀKy(1))G.


Using the properties of the multivariate t distribution, it is possible now to obtain closed form expressions for E(Y(2) | Y(1) = y(1)) and V(Y(2) | Y(1) = y(1)). In particular, the Bayes estimate of γ(Y(1), Y(2)) given Y(1) = y(1) under any quadratic loss is now given by

e*_BF(y(1)) = E[γ(Y(1), Y(2)) | Y(1) = y(1)] = Λy(1) + C E(Y(2) | Y(1) = y(1)) = (Λ + CM)y(1).    (3.2.1)
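That the Bayes estimate under any quadratic loss is the predictive mean can be checked directly: among constants a, the expected loss E(Y − a)² is minimized at a = E(Y). A tiny grid-search illustration over an arbitrary discrete predictive distribution (the values and probabilities are invented):

```python
# For squared error loss, the expected loss E(Y - a)^2 over a discrete
# predictive distribution is minimized at the predictive mean.
ys = [1.0, 2.0, 6.0]
ps = [0.5, 0.3, 0.2]
mean = sum(p * y for p, y in zip(ps, ys))

def expected_loss(a):
    return sum(p * (y - a) ** 2 for p, y in zip(ps, ys))

best = min((a / 100 for a in range(0, 800)), key=expected_loss)
print(mean, best)  # the grid minimizer sits at the mean
```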











We may note that the predictor e*_BF(y(1)) given in (3.2.1) is the outcome of the model given in Section 2.2 with A known and the use of the uniform(Rᵖ) prior on B and a (proper) prior distribution of R, and that it does not depend on the choice of the prior distribution of R. This can be formally seen, assuming all the expectations appearing below exist, as follows:


E(Y(2) | Y(1)) = E[E(Y(2) | B, R, Y(1)) | Y(1)]
= E[X(2)B + Σ21Σ11⁻¹(Y(1) − X(1)B) | Y(1)]
= X(2)E(B | Y(1)) + Σ21Σ11⁻¹{Y(1) − X(1)E(B | Y(1))}
= (X(2) − Σ21Σ11⁻¹X(1))(X(1)ᵀΣ11⁻¹X(1))⁻¹X(1)ᵀΣ11⁻¹Y(1) + Σ21Σ11⁻¹Y(1)
= MY(1),    (3.2.2)

where, in the above string of equalities, the second equality follows from the fact that conditional on B = b and R = r, (Y(2) | Y(1) = y(1)) ~ N(X(2)b + Σ21Σ11⁻¹(y(1) − X(1)b), r⁻¹(Σ22 − Σ21Σ11⁻¹Σ12)); the fourth follows from the fact that conditional on R and Y(1), E(B | R, Y(1)) = (X(1)ᵀΣ11⁻¹X(1))⁻¹X(1)ᵀΣ11⁻¹Y(1), which is free of R; and the final equality is the definition of M given in (2.3.5). Thus, the predictor e*_BF(Y(1)) is robust against the choice of prior for R.


There are alternate ways to generate the same predictor e*_BF(Y(1)) of γ(Y(1), Y(2)). Suppose, for example, one assumes only that b is known (r may or may not be known). Then the best predictor (best linear predictor without the normality assumption) of γ(Y(1), Y(2)), in the sense of having the smallest mean squared error matrix, is given by

e(Y(1); b) = ΛY(1) + C{X(2)b + Σ21Σ11⁻¹(Y(1) − X(1)b)}  a.e.    (3.2.3)

We say that for two symmetric matrices E and F, E ≥ F if E − F is n.n.d. If b is unknown, then one replaces b by its UMVUE (BLUE without the normality assumption) b̂ = (X(1)ᵀΣ11⁻¹X(1))⁻¹X(1)ᵀΣ11⁻¹Y(1), and the resulting predictor of γ(Y(1), Y(2)) turns out to be e*_BF(Y(1)), an EB predictor of γ(Y(1), Y(2)).
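The plug-in construction can be traced in the simplest scalar case: with a single covariate equal to 1 and Σ11 = I, the BLUE of b is the sample mean of y(1), and substituting it into the best predictor gives the plug-in prediction. The numbers and cross-covariances below are hypothetical:

```python
# Plug-in predictor: replace the unknown b in the best predictor
#   x2*b + S21 S11^{-1} (y1 - X1*b)
# by its BLUE bhat (here, with X1 = 1 and S11 = I, the mean of y1).
y1 = [1.0, 3.0]
s21 = [0.5, 0.5]          # hypothetical cross-covariances
bhat = sum(y1) / len(y1)  # BLUE of b
pred = bhat + sum(s * (y - bhat) for s, y in zip(s21, y1))
print(pred)  # residuals cancel here, so pred = bhat = 2.0
```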













Similarly, in this special case, one can derive the HB predictor (for quadratic loss) of C(b, v) in the context of the infinite population setup. Denoting this HB predictor E[C(B, V) | Y] by e*_BI(Y), one can see that the arguments leading to the empirical Bayes interpretation of e*_BF(y(1)) work equally well to show that e*_BI(Y) also possesses the empirical Bayes interpretation. Harville (1985, 1988, in press) recognized this for predicting scalars.


In the next four sections, we will discuss a few frequentist properties of e*_BF(y(1)) and e*_BI(Y). In Section 3.3 we show that e*_BF(y(1)) is a best unbiased predictor and consider its stochastic domination, whereas in Section 3.4 we consider these properties for e*_BI(Y). In Section 3.5 we show that e*_BF(y(1)) is the best equivariant predictor of γ(Y(1), Y(2)) under suitable groups of transformations, whereas in Section 3.6 we consider the same best equivariance property of e*_BI(Y) under the same groups of transformations. For scalar valued parameters, Jeske and Harville (1987) have shown that the BLUPs are best equivariant within the class of all linear equivariant predictors without any distributional assumption. However, to our knowledge, the equivariance results for vector valued predictors have not been considered earlier.











3.3 Best Unbiased Prediction and Stochastic Domination in Small Area Estimation


In this section, we assume the normal linear model (2.2.2) with A known. No prior distribution for B and R is assumed, and θ = (bᵀ, r)ᵀ is treated as an unknown parameter.


First, we prove the optimality of e*_BF(y(1)) within the class of all unbiased predictors of γ(Y(1), Y(2)). Next, we dispense with the normality assumption on e*, and prove the optimality of e*_BF(y(1)) within the class of all linear unbiased predictors (LUPs). We start with the following definition of a best unbiased predictor (BUP).


Definition 3.3.1. A predictor T(Y(1)) is said to be a BUP of γ(Y(1), Y(2)) if E_θ[T(Y(1)) − γ(Y(1), Y(2))] = 0 for all θ, and if V_θ[δ(Y(1)) − γ(Y(1), Y(2))] − V_θ[T(Y(1)) − γ(Y(1), Y(2))] is n.n.d. for every predictor δ(Y(1)) of γ(Y(1), Y(2)) satisfying E_θ[δ(Y(1)) − γ(Y(1), Y(2))] = 0 for all θ, provided the quantities are finite.


The following general lemma plays a key role in proving the best unbiasedness of the predictor e*_BF(·) of γ(Y(1), Y(2)). Consider predicting a general (u×1) function g(Y) = g(Y(1), Y(2)); assume that each component of g(Y) has a finite second moment. Denote by U_g the class of all unbiased predictors δ(Y(1)) of g(Y) with each component of δ(Y(1)) having a finite second moment. Also, let U₀ denote the class of real valued statistics (i.e., functions of Y(1)) with finite second moments having zero expectations identically in θ.


Lemma 3.3.1. A predictor T(Y(1)) ∈ U_g is BUP for g(Y) if and only if

Cov_θ[T(Y(1)) − g(Y), m(Y(1))] = 0    (3.3.1)

for every m(Y(1)) ∈ U₀ and every θ.

Proof of Lemma 3.3.1. If: let T(Y(1)) ∈ U_g satisfy (3.3.1). If δ(Y(1)) ∈ U_g is another predictor, then, writing δ(Y(1)) − g(Y) = {T(Y(1)) − g(Y)} + {δ(Y(1)) − T(Y(1))},

V_θ[δ(Y(1)) − g(Y)] = V_θ[T(Y(1)) − g(Y)] + V_θ[δ(Y(1)) − T(Y(1))]
  + Cov_θ[T(Y(1)) − g(Y), δ(Y(1)) − T(Y(1))] + Cov_θ[δ(Y(1)) − T(Y(1)), T(Y(1)) − g(Y)].    (3.3.2)

Since each component of δ(Y(1)) − T(Y(1)) belongs to U₀, (3.3.1) gives

Cov_θ[T(Y(1)) − g(Y), δ(Y(1)) − T(Y(1))] = 0.    (3.3.3)

From (3.3.2) and (3.3.3) it follows that

V_θ[δ(Y(1)) − g(Y)] = V_θ[T(Y(1)) − g(Y)] + V_θ[δ(Y(1)) − T(Y(1))] ≥ V_θ[T(Y(1)) − g(Y)]    (3.3.4)

for all θ. Hence T(Y(1)) is BUP for g(Y).
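The decomposition (3.3.2) is the usual bilinearity of variance, Var(a + b) = Var(a) + Var(b) + 2Cov(a, b), which holds exactly for sample moments as well. A quick numeric check with arbitrary vectors playing the roles of T − g and δ − T:

```python
import statistics

def cov(a, b):
    ma, mb = statistics.mean(a), statistics.mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

a = [1.0, 4.0, 2.0, 8.0]      # plays the role of T - g
b = [0.5, -1.0, 3.0, 2.5]     # plays the role of delta - T
s = [x + y for x, y in zip(a, b)]

lhs = statistics.variance(s)
rhs = statistics.variance(a) + statistics.variance(b) + 2 * cov(a, b)
print(abs(lhs - rhs) < 1e-9)  # True: the decomposition is an identity
```

When the covariance term vanishes, as (3.3.3) asserts, the variance of the competing predictor's error can only exceed that of T, which is exactly (3.3.4).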


Only if: given that T(Y(1)) is BUP, we will show that condition (3.3.1) is true. First we will show that T_i(Y(1)) is BUP for g_i(Y) for every i = 1, ..., u. Let U_i(Y(1)) be any unbiased predictor for g_i(Y). Then δ_U(Y(1)), a u-component column vector with ith component equal to U_i(Y(1)) and the remaining components equal to those of T(Y(1)), belongs to U_g. Then V_θ[δ_U(Y(1)) − g(Y)] − V_θ[T(Y(1)) − g(Y)] is n.n.d., so we have

V_θ[U_i(Y(1)) − g_i(Y)] − V_θ[T_i(Y(1)) − g_i(Y)] ≥ 0,

and consequently T_i(Y(1)) is BUP for g_i(Y). Now following the usual Lehmann-Scheffe (1950) technique (see also Rao, 1973), Cov_θ[T_i(Y(1)) − g_i(Y), m(Y(1))] = 0 for every m(Y(1)) ∈ U₀ and every i. Hence, (3.3.1) holds, and the proof of the lemma is complete.


Remark 3.3.1. It follows from the above lemma (see (3.3.4)) that if T₁(Y(1)) and T₂(Y(1)) are both BUPs of g(Y), then Cov_θ[T₁(Y(1)) − g(Y), T₁(Y(1)) − T₂(Y(1))] = Cov_θ[T₂(Y(1)) − g(Y), T₁(Y(1)) − T₂(Y(1))] = 0 for all θ.
Remark 3.3.2. It is also clear that the technique of the above lemma can be applied in more general contexts.

We will use the above lemma to prove the BUP property of e*_BF(Y(1)) in the following theorem. Recall from (3.2.1) that e*_BF(Y(1)) = (Λ + CM)Y(1).
Theorem 3.3.1. Under the normal linear model (2.2.2), e*_BF(Y(1)) = (Λ + CM)Y(1) is the BUP of γ(Y(1), Y(2)).











Proof of Theorem 3.3.1. In view of Lemma 3.3.1, it suffices to show that for every m(Y(1)) ∈ U₀ and every θ,

Cov_θ[e*_BF(Y(1)) − γ(Y(1), Y(2)), m(Y(1))] = 0, that is, E_θ[C(MY(1) − Y(2))m(Y(1))] = 0.

Since, under the model (2.2.2), E_θ[Y(2) | Y(1)] = X(2)b + Σ21Σ11⁻¹(Y(1) − X(1)b), and since MY(1) − Σ21Σ11⁻¹Y(1) = (X(2) − Σ21Σ11⁻¹X(1))(X(1)ᵀΣ11⁻¹X(1))⁻¹X(1)ᵀΣ11⁻¹Y(1), using E_θ[m(Y(1))] = 0 it suffices to show that

E_θ[(X(1)ᵀΣ11⁻¹Y(1))m(Y(1))] = 0 for all θ.    (3.3.6)

Now E_θ[m(Y(1))] = 0 means

∫ m(y(1)) exp{−(r/2)(y(1) − X(1)b)ᵀΣ11⁻¹(y(1) − X(1)b)} dy(1) = 0.

Differentiating both sides of this equation w.r.t. b, one gets (see p. 318 of Rao, 1973)

∫ X(1)ᵀΣ11⁻¹(y(1) − X(1)b) m(y(1)) exp{−(r/2)(y(1) − X(1)b)ᵀΣ11⁻¹(y(1) − X(1)b)} dy(1) = 0,    (3.3.7)

which, together with E_θ[m(Y(1))] = 0, gives (3.3.6).












Remark 3.3.3. Equation (3.3.6) can be alternatively proved in the following way. Note that (X(1)ᵀΣ11⁻¹Y(1), Y(1)ᵀKY(1)) is complete sufficient for θ. Hence X(1)ᵀΣ11⁻¹Y(1) must have zero covariance vector with every zero estimator m(Y(1)), i.e., E_θ[(X(1)ᵀΣ11⁻¹Y(1))m(Y(1))] = 0.

Next we show that the conclusion of Theorem 3.3.1 continues to hold even for certain nonnormal distributions. Suppose θ = (bᵀ, r)ᵀ and A = Diag(D, Φ), where r⁻¹D and r⁻¹Φ are the variance-covariance matrices of v and e. Assume that, given R = r, e* ~ N(0, r⁻¹A), while the df of R is an arbitrary member of the family F = {F*: F* is absolutely continuous with pdf f satisfying f(r) = 0 for r < 0}. Let F₀ denote the subfamily of F such that each component of e*_BF(Y(1)) and of γ(Y(1), Y(2)) has a finite second moment under the model (2.2.2) and the joint distribution of e*. We now prove the following theorem.


Theorem 3.3.2. Under the model (2.2.2), with (e* | R = r) ~ N(0, r⁻¹A) and R having a df F* from F₀, e*_BF(Y(1)) is BUP of γ(Y(1), Y(2)).

Proof of Theorem 3.3.2. Using Lemma 3.3.1 and following the proof of Theorem 3.3.1, it suffices to show that, with (Y(1) | b, R = r) ~ N(X(1)b, r⁻¹Σ11),

E_{b,F*}[(X(1)ᵀΣ11⁻¹Y(1))m(Y(1))] = 0    (3.3.8)

for every m(Y(1)) satisfying

E_{b,F*}[m(Y(1))] = 0 and E_{b,F*}[m²(Y(1))] < ∞ for all b and F*.    (3.3.9)

Consider the subfamily F₁ = {gamma(½c, ½d): c > 0, d > 2} of F₀. Since (3.3.9) holds for this subfamily F₁, E_{b,F*}[m(Y(1))] = 0 gives

∫₀^∞ exp(−½cr) r^{½(n_T+d)−1} [∫ exp{−(r/2)(y(1) − X(1)b)ᵀΣ11⁻¹(y(1) − X(1)b)} m(y(1)) dy(1)] dr = 0    (3.3.10)

for all c > 0 and d > 2.


Now, using the uniqueness property of Laplace transforms, it follows from (3.3.10) that

r^{½(n_T+d)−1} ∫ exp{−(r/2)(y(1) − X(1)b)ᵀΣ11⁻¹(y(1) − X(1)b)} m(y(1)) dy(1) = 0,

that is,

∫ exp{−(r/2)(y(1) − X(1)b)ᵀΣ11⁻¹(y(1) − X(1)b)} m(y(1)) dy(1) = 0    (3.3.11)

a.e. Lebesgue in r > 0, for every b.










Differentiating (3.3.11) with respect to b, simplifications using (3.3.11) lead to

∫ X(1)ᵀΣ11⁻¹(y(1) − X(1)b) m(y(1)) exp{−(r/2)(y(1) − X(1)b)ᵀΣ11⁻¹(y(1) − X(1)b)} dy(1) = 0    (3.3.12)

a.e. Lebesgue in r > 0, for every b. Multiplying both sides of (3.3.12) by (r/2π)^{n_T/2}|Σ11|^(-1/2) and integrating with respect to dF*(r), where F* ∈ F₀, one gets (3.3.8).


Remark 3.3.4. Since F₀ does not contain the degenerate distributions of R on (0, ∞), Theorem 3.3.1 does not follow from Theorem 3.3.2.


Remark 3.3.5. If in Theorem 3.3.2 we take the gamma(½c, ½d) df for F*, we see that the marginal distribution of Y is given by the family of distributions {t_{N_T}(Xb, (c/d)Σ, d): c > 0, d > 2}, where t_{N_T}(Xb, (c/d)Σ, d) denotes the N_T-variate t distribution with location parameter Xb, scale parameter (c/d)Σ and d.f. d; thus e*_BF(Y(1)) is BUP for γ(Y(1), Y(2)) for this family.
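The t marginal of Remark 3.3.5 can be illustrated by simulation: in the remark's notation, taking c = d (so R ~ gamma(½d, ½d)) and Y | R = r ~ N(0, 1/r) gives marginally a Student t with d d.f., whose variance is d/(d − 2). A Monte Carlo sketch (illustrative only):

```python
import random, statistics

random.seed(3)
d = 10.0
ys = []
for _ in range(200_000):
    # gamma with shape d/2 and rate d/2 (i.e., scale 2/d)
    r = random.gammavariate(d / 2.0, 2.0 / d)
    ys.append(random.gauss(0.0, (1.0 / r) ** 0.5))

# marginal is t with d d.f.: variance should be near d/(d-2) = 1.25
print(statistics.variance(ys))
```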


Next we will show that the predictor e*_BF(Y(1)) (which is linear in Y(1)) is a best linear unbiased predictor. If E_θ[HY(1) − γ(Y(1), Y(2))] = 0 for all θ, we say that HY(1) is a LUP of γ(Y(1), Y(2)). We need the following definition.


Definition 3.3.2. A LUP PY(1) of γ(Y(1), Y(2)) is said to be a BLUP if V_θ[HY(1) − γ(Y(1), Y(2))] − V_θ[PY(1) − γ(Y(1), Y(2))] is n.n.d. for every LUP HY(1) of γ(Y(1), Y(2)) and for all θ.

We now prove the BLUP property of e*_BF(Y(1)) for predicting γ(Y(1), Y(2)). To this end, we will state a lemma whose proof is similar to the proof of Lemma 3.3.1, and hence the proof will be omitted.


Lemma 3.3.2. A LUP PY(1) of γ(Y(1), Y(2)) is a BLUP if and only if

Cov_θ[PY(1) − γ(Y(1), Y(2)), mᵀY(1)] = 0    (3.3.13)

for all θ and every known n_T×1 vector m satisfying E_θ(mᵀY(1)) = 0 for all θ.

The following theorem provides the BLUP property of e*_BF(Y(1)) for predicting γ(Y(1), Y(2)). In proving this BLUP property we do not need any distributional assumption on e*; we only assume that E_θ(e*) = 0 and V_θ(e*) = r⁻¹A.

Theorem 3.3.3. Under the model (2.2.2), e*_BF(Y(1)) is a BLUP of γ(Y(1), Y(2)).











Proof of Theorem 3.3.3. If E_θ(mᵀY(1)) = mᵀX(1)b = 0 for all b, then mᵀX(1) = 0ᵀ. Hence,

Cov_θ[e*_BF(Y(1)) − γ(Y(1), Y(2)), mᵀY(1)] = Cov_θ[C(MY(1) − Y(2)), mᵀY(1)]
= r⁻¹C(MΣ11 − Σ21)m
= r⁻¹C(X(2) − Σ21Σ11⁻¹X(1))(X(1)ᵀΣ11⁻¹X(1))⁻¹X(1)ᵀm
= 0

for all θ, where the last two equalities follow from the definition of M and from the fact that mᵀX(1) = 0ᵀ. Applying Lemma 3.3.2, the result follows.

Remark 3.3.6. As already mentioned, the normality assumption is not needed in proving the BLUP property of e*_BF(Y(1)). Theorem 3.3.3 unifies and extends the available BLUP results related to estimation of the finite population mean vector under different models (cf. Ghosh and Lahiri, in press; Royall, 1976; and others). Using Remark 3.3.1, one can prove that the BLUP is unique with probability one.











The BLUP property above holds under the model (2.2.2) without any distributional assumption on e*. The optimality of e*_BF(Y(1)) within LUPs holds a fortiori under the quadratic loss

L_Ω(γ, a) = (a − γ)ᵀΩ(a − γ),    (3.3.14)

where Ω is a n.n.d. matrix, since the corresponding risk of an unbiased predictor δ(Y(1)) is

R_{L_Ω}(θ; δ) = E_θ L_Ω(γ, δ(Y(1))) = tr[Ω V_θ(δ(Y(1)) − γ)].    (3.3.15)

Such a loss will, henceforth, be referred to as generalized Euclidean error w.r.t. Ω.


The optimality results carry over via Theorem 3.3.1 and Theorem 3.3.2 under the added distributional assumption (which is not necessarily the normality assumption) on e*.


A natural question to ask now is whether the risk optimality of e*_BF(Y(1)) holds within the class of all unbiased predictors, or at least within the class of LUPs, under certain other criteria for a broader family of distributions of e*. To investigate this question, we need the notions of "universal" and "stochastic" domination and their interrelationship as given in Hwang (1985).


For a predictor δ(Y(1)) of γ(Y(1), Y(2)), let R_L(θ; δ) = E_θ L(γ(Y(1), Y(2)), δ(Y(1))) denote its risk under a loss of the form L(γ, a) = W₀((a − γ)ᵀΩ(a − γ)) w.r.t. Ω, for some nondecreasing function W₀. The following definition is adapted from Hwang (1985).


Definition 3.3.3. An estimator δ₁(Y(1)) universally dominates δ₂(Y(1)) (under the generalized Euclidean error w.r.t. Ω) if for every θ and every nondecreasing function W₀, R_L(θ; δ₁) ≤ R_L(θ; δ₂) holds, and for some particular θ and loss the risk functions are not identical.


Hwang (1985) has shown (see his Theorem 2.3) that δ₁(Y(1)) universally dominates δ₂(Y(1)) under the generalized Euclidean error w.r.t. Ω if [δ₁(Y(1)) − γ(Y(1), Y(2))]ᵀΩ[δ₁(Y(1)) − γ(Y(1), Y(2))] is stochastically smaller than [δ₂(Y(1)) − γ(Y(1), Y(2))]ᵀΩ[δ₂(Y(1)) − γ(Y(1), Y(2))]. We say that a random variable Z₁ is stochastically smaller than Z₂ if P_θ(Z₁ > x) ≤ P_θ(Z₂ > x) for all x and θ, while for some x and θ, Z₁ and Z₂ have distinct distributions.
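The link between stochastic ordering and this notion of universal domination can be seen by coupling: if Z₂ = Z₁ + U with U ≥ 0, then Z₁ is stochastically smaller than Z₂ and W₀(Z₁) ≤ W₀(Z₂) pointwise for every nondecreasing W₀, so every such expected loss is ordered. A deterministic sketch with made-up values:

```python
# Coupling argument: Z2 = Z1 + U with U >= 0 implies Z1 is
# stochastically smaller than Z2, and E W(Z1) <= E W(Z2) for every
# nondecreasing W (here W(z) = z and W(z) = z**2 on z >= 0).
z1 = [0.2, 1.5, 0.7, 2.0, 0.1]
u = [0.0, 0.3, 1.0, 0.0, 0.5]          # nonnegative shifts
z2 = [a + b for a, b in zip(z1, u)]

for W in (lambda z: z, lambda z: z ** 2):
    risk1 = sum(map(W, z1)) / len(z1)
    risk2 = sum(map(W, z2)) / len(z2)
    print(risk1 <= risk2)  # True for every nondecreasing W
```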


The next theorem shows that for a general class of elliptically symmetric distributions of e*, e*_BF(Y(1)) universally dominates every LUP HY(1) of γ(Y(1), Y(2)) under the generalized Euclidean error w.r.t. a n.n.d. Ω. Assume that e* has an elliptically symmetric pdf given by

h(e* | A, r) = |r⁻¹A|^(-1/2) f(r e*ᵀA⁻¹e*),    (3.3.16)

with

∫ (Σ_{i=1}^q v_i² + Σ_{i=1}^{N_T} e_i²) f(e*ᵀA⁻¹e*) de* < ∞,    (3.3.17)

where e* = (vᵀ, eᵀ)ᵀ, v = (v₁, ..., v_q)ᵀ and e = (e₁, ..., e_{N_T})ᵀ. We will denote this distribution by E_f(0, r⁻¹A), where, in general, E_f(μ, σ²g*) denotes the distribution whose pdf is given by

k(t | μ, σ²g*) ∝ σ⁻ᵖ f((t − μ)ᵀg*⁻¹(t − μ)/σ²),    (3.3.18)

where μ (p×1) is arbitrary and g* (p×p) is p.d. Note that normality of e* with mean 0 and variance-covariance matrix r⁻¹A is sufficient but not necessary for (3.3.16) and (3.3.17) to hold.


It follows from (3.3.16) and (3.3.17) that the second moment of e* exists. Note also that from (3.3.16), r^(1/2)A^(-1/2)e* has a spherically symmetric distribution, with characteristic function (c.f.) E[exp(iuᵀr^(1/2)A^(-1/2)e*)] = c(uᵀu) for some function c (see Kelker, 1970), where u = (u₁, ..., u_{N*})ᵀ and N* = N_T + q. Hence the c.f. of e* is given by

E[exp(iuᵀe*)] = c(r⁻¹uᵀAu).    (3.3.19)

(4' *t I


B)


"f( II ,


gf(0


,,~














where


+ ZDZT


Comparing


(3.3.16)


(3.3.19)


one


can


see


from


(3.3.20)


that


has also an


elliptically


symmetric distribution


with


given


h(w*I ,


Ir-1 2 (rw


*TE-lw*).


(3.3.21)


Theorem 3.3.4. Under the model (2.2.2) with (3.3.16) and (3.3.17), e*_BF(Y(1)) universally dominates every LUP HY(1) ≠ e*_BF(Y(1)) of γ(Y(1), Y(2)) for every p.d. Ω.
Remark 3.3.7. Theorem 3.3.4 does not contain Theorem 3.3.3, since Theorem 3.3.4 requires the elliptical symmetry of the distribution of W*, while the other does not.

It should be noted, though, that the model assumption made in (3.3.16) is not necessarily stronger than the usual assumption of finiteness of certain moments. This is because the assumptions of Theorem 3.3.4 hold even if a distribution has an infinite second moment (e.g., for certain multivariate t distributions), whereas the BLUP property is meaningless in such an instance. Now we will state and prove a lemma; the proof of Theorem 3.3.4 rests crucially on this lemma.

Lemma 3.3.3. If W (N_T×1) ~ E_f(0, r⁻¹I_{N_T}) and L is a known u×N_T matrix (u ≤ N_T), then LW and (LLᵀ)^(1/2)(I_u, 0)W are identically distributed.













Proof of Lemma 3.3.3. The proof follows arguments of Hwang (1985). From (3.3.19) it follows that E[exp(itᵀW)] = c(r⁻¹tᵀt). Hence

E[exp(it₁ᵀLW)] = c(r⁻¹t₁ᵀLLᵀt₁),    (3.3.22)

where t₁ is a u×1 vector. Next, using (3.3.22),

E[exp{it₁ᵀ(LLᵀ)^(1/2)(I_u, 0)W}] = c(r⁻¹t₁ᵀ(LLᵀ)^(1/2)(I_u, 0)(I_u, 0)ᵀ(LLᵀ)^(1/2)t₁) = c(r⁻¹t₁ᵀLLᵀt₁),    (3.3.23)

so that the lemma follows from (3.3.22) and (3.3.23). It follows as a consequence of Lemma 3.3.3 that

WᵀLᵀLW = (LW)ᵀ(LW) and Wᵀ(I_u, 0)ᵀ(LLᵀ)(I_u, 0)W are identically distributed.    (3.3.24)

We shall use (3.3.24) repeatedly for proving Theorem 3.3.4.











Proof of Theorem 3.3.4. Unbiasedness of the LUP HY(1) gives

(H − Λ)X(1) − CX(2) = 0.    (3.3.25)

Writing W = Σ^(-1/2)W*, so that W is spherically symmetric, and using (3.3.24) and (3.3.25), one gets

[HY(1) − γ(Y(1), Y(2))]ᵀΩ[HY(1) − γ(Y(1), Y(2))]
= W*ᵀ[H − Λ, −C]ᵀΩ[H − Λ, −C]W*
= WᵀΣ^(1/2)[H − Λ, −C]ᵀΩ[H − Λ, −C]Σ^(1/2)W.    (3.3.26)

Similarly,

[e*_BF(Y(1)) − γ(Y(1), Y(2))]ᵀΩ[e*_BF(Y(1)) − γ(Y(1), Y(2))]
= W*ᵀ[CM, −C]ᵀΩ[CM, −C]W*
= WᵀΣ^(1/2)[CM, −C]ᵀΩ[CM, −C]Σ^(1/2)W.    (3.3.27)

Write Γ = [H − Λ, −C] − [CM, −C] = [H − Λ − CM, 0]. Then

WᵀΣ^(1/2)[H − Λ, −C]ᵀΩ[H − Λ, −C]Σ^(1/2)W
= WᵀΣ^(1/2)[CM, −C]ᵀΩ[CM, −C]Σ^(1/2)W + WᵀΣ^(1/2)ΓᵀΩΓΣ^(1/2)W,    (3.3.28)

the cross terms vanishing since

[CM, −C]ΣΓᵀ = C(MΣ11 − Σ21)(H − Λ − CM)ᵀ
= C(X(2) − Σ21Σ11⁻¹X(1))(X(1)ᵀΣ11⁻¹X(1))⁻¹X(1)ᵀ(H − Λ − CM)ᵀ    (3.3.29)

and, using (3.3.25) and MX(1) = X(2),

(H − Λ − CM)X(1) = C(X(2) − MX(1)) = 0.    (3.3.30)

Theorem 3.3.4 follows now from (3.3.26) - (3.3.28), Lemma 3.3.3 and Theorem 2.3 of Hwang (1985). Also, since Ω is positive definite, it follows from (3.3.28) that the rhs of (3.3.26) equals the rhs of (3.3.27) only if Γ = 0, that is, H = Λ + CM (cf. Hwang, 1985).


3.4 Best Unbiased Prediction and Stochastic Domination in the Infinite Population
In this section we will briefly consider a few optimal properties of e*_BI(Y) which are similar to those of e*_BF(Y(1)), following closely Section 3.3. First, we note that e*_BI(Y) is optimal within the class of all unbiased predictors of C(b, v) under the normal linear model with A known. As in the finite population case, no prior distribution for B and R is assigned, and θ = (bᵀ, r)ᵀ is treated as an unknown parameter. Next, we dispense with the normality assumption.













Definition 3.4.1. A predictor δ(Y) of C(b, v) is said to be an unbiased predictor of C(b, v) if E_θ[δ(Y) - C(b, v)] = 0 for all θ. An unbiased predictor U(Y) of C(b, v) is said to be a best unbiased predictor (BUP) of C(b, v) if V_θ[δ(Y) - C(b, v)] - V_θ[U(Y) - C(b, v)] is n.n.d. for every unbiased predictor δ(Y) of C(b, v) and for all θ, provided the quantities involved exist finitely.


Recall that θ = (b^T, r)^T. The following lemma is analogous to Lemma 3.3.1, and concerns the characterization of a BUP of g(W) based on Y, for some known function g where each component has a finite second moment.


Lemma 3.4.1. An unbiased predictor U(Y) of g(W), with E_θ[U^T(Y) U(Y)] < ∞ for all θ, is a BUP of g(W) if and only if Cov_θ{U(Y) - g(W), m(Y)} = 0 for every statistic m(Y) such that E_θ(m(Y)) = 0 and E_θ[m²(Y)] < ∞ for all θ.

Lemma 3.4.1 can be proved similarly to Lemma 3.3.1, and its proof is omitted.
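Lemma 3.4.1 is the usual zero-covariance characterization of best unbiased procedures. As a toy illustration outside the model of this chapter: for Y_1, ..., Y_n i.i.d. N(μ, 1), the sample mean is uncorrelated with every contrast Y_i - Ȳ, and those contrasts are zero-mean statistics whatever μ is. For linear statistics a^T Y and m^T Y with Cov(Y) = I_n, the covariance is just a^T m, so the check is pure linear algebra; n below is an arbitrary illustrative choice.

```python
import numpy as np

n = 5
a = np.full(n, 1.0 / n)       # coefficients of the sample mean a^T Y
for i in range(n):
    e_i = np.zeros(n)
    e_i[i] = 1.0
    m = e_i - a               # Y_i - Ybar: E(m^T Y) = 0 for every mu
    # Cov(a^T Y, m^T Y) = a^T Cov(Y) m = a^T m when Cov(Y) = I_n
    assert abs(a @ m) < 1e-12
```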


We will use this lemma to sketch a proof of the following theorem, which concerns best unbiased prediction of C(b, v).

Theorem 3.4.1. Under the normal linear model (2.2.1), e*_BI(Y) is the BUP of C(b, v).

Proof. By Lemma 3.4.1, e*_BI(Y) is the BUP of C(b, v) if and only if

    E_θ[{e*_BI(Y) - C(b, v)} m(Y)] = 0                                  (3.4.1)

for every statistic m(Y) such that E_θ(m(Y)) = 0 and E_θ[m²(Y)] < ∞ for all θ.


Note, however, that with P_θ-probability 1,

    E_θ[C(b, v) | Y] - e*_BI(Y)
        = T D Z^T Σ^{-1} (Y - Xb) - T D Z^T Σ^{-1} (Y - X b̂)
        = T D Z^T Σ^{-1} X (b̂ - b),                                     (3.4.2)

where b̂ = (X^T Σ^{-1} X)^{-1} X^T Σ^{-1} Y. From (3.4.1) and (3.4.2), it suffices to show that E_θ[(b̂ - b) m(Y)] = 0 for all θ. This is proved similarly to (3.3.6).
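The reduction around (3.4.2) involves the generalized least squares combination (X^T Σ^{-1} X)^{-1} X^T Σ^{-1} Y, an unbiased linear estimator of b, so that its deviation from b is a zero-mean linear statistic. A quick check, with illustrative X and Σ, that its coefficient matrix reproduces b exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 6, 2
X = rng.standard_normal((n, p))
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)        # an illustrative p.d. Sigma
Si = np.linalg.inv(Sigma)

# The GLS coefficient matrix G satisfies G X = I_p, so E(G Y) = G X b = b
# whatever b is: the estimator is unbiased and G Y - b has mean zero.
G = np.linalg.inv(X.T @ Si @ X) @ X.T @ Si
assert np.allclose(G @ X, np.eye(p))

b = rng.standard_normal(p)
assert np.allclose(G @ (X @ b), b)     # noiseless check: G maps Xb back to b
```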


Remark 3.4.1. The conclusion of the above theorem holds even for certain nonnormal distributions. As in Theorem 3.3.2, one can show that e*_BI(Y) is the BUP of C(b, v) under the model (2.2.1) with e* ~ N(0, r*^{-1} Λ) conditionally on r*, r* having a df, where e* and r* are the same as in Theorem 3.3.2.


Next, note that the predictor e*_BI(Y) is linear in Y. It can be proved as in Theorem 3.3.3 that e*_BI(Y) is the BLUP of C(b, v) under the linear model (2.2.1) without any normality assumption, since E_θ{e*_BI(Y) - C(b, v)} = 0 for all θ.












Now we will show that e*_BI(Y) universally dominates every other linear unbiased predictor of C(b, v) under an elliptically symmetric distribution. Consider the generalized Euclidean error w.r.t. a u x u p.d. matrix Ω,

    ||δ - C(b, v)||_Ω = [(δ - C(b, v))^T Ω (δ - C(b, v))]^{1/2}.        (3.4.3)
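Taking (3.4.3) to be the Ω-weighted norm of the prediction error, any nondecreasing function of it gives a loss of the type considered next. A minimal sketch of computing it; the weight matrix Ω and the vectors below are illustrative values, not quantities from the model:

```python
import numpy as np

def gen_euclidean_error(delta, zeta, Omega):
    """Omega-weighted distance [(delta - zeta)^T Omega (delta - zeta)]^{1/2}."""
    d = delta - zeta
    return float(np.sqrt(d @ Omega @ d))

Omega = np.array([[2.0, 0.5], [0.5, 1.0]])   # an illustrative p.d. 2x2 weight
delta = np.array([1.0, 0.0])                  # a predictor's value
zeta = np.array([0.0, 0.0])                   # the predictand
err = gen_euclidean_error(delta, zeta, Omega)
assert abs(err - np.sqrt(2.0)) < 1e-12        # here d^T Omega d = 2
```

With Ω = I_u this is the ordinary Euclidean error; a general p.d. Ω simply reweights the coordinates of the error vector.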


Consider the risk function R_L(θ, δ) of a predictor δ for predicting C(b, v) under a loss function which is a function of the generalized Euclidean error w.r.t. Ω, that is, a loss of the form L(||δ - C(b, v)||_Ω) for some function L. The following definition is similar to Definition 3.3.3.


Definition 3.4.2. An estimator δ_1(Y) universally dominates another estimator δ_2(Y) (under the generalized Euclidean error w.r.t. Ω) if R_L(θ, δ_1) ≤ R_L(θ, δ_2) holds for every θ and every nondecreasing function L, and if, for a particular θ and a particular loss, the risk functions are not identical.

Now we will state the following theorem on the stochastic domination of e*_BI(Y); its proof will be omitted because of its similarity to the proof of Theorem 3.3.4.