ROBUST BAYESIAN INFERENCE IN FINITE POPULATION SAMPLING
By
DALHO KIM
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA
1994
To my parents and teachers
ACKNOWLEDGEMENTS
I would like to express my sincere gratitude to Professor Malay Ghosh for being my advisor and for originally proposing the problem. Words simply cannot express how grateful I am for his patience, encouragement and invaluable guidance. Without his help it would not have been possible to complete this work. I would like to thank Professors Michael A. DeLorenzo, Alan Agresti, James G. Booth, Brett D. Presnell and Jeffrey A. Longmate for their encouragement and advice while serving on my committee.
Much gratitude is owed to my parents, brother, sisters, mother-in-law, brother-in-law and sisters-in-law, whose support, advice, guidance and prayers throughout the years of my life have made this achievement possible. My debt to them can never be repaid in full. Very special thanks are offered to my wife, Kyungok, for her love, patience, support and prayers, and to my daughters, Seunghyun and Grace, for being a joy to us.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
ABSTRACT
CHAPTERS
1 INTRODUCTION
1.1 Literature Review
1.2 The Subject of This Dissertation
2 ROBUST BAYES ESTIMATION OF THE FINITE POPULATION MEAN
2.1 Introduction
2.2 Robust Bayes Estimators
2.3 Symmetric Unimodal Contamination
2.4 An Example
3 ROBUST BAYES COMPETITORS OF THE RATIO ESTIMATOR
3.1 Introduction
3.2 ε-Contamination Model and the ML-II Prior
3.3 Symmetric Unimodal Contamination
3.4 An Example
4 BAYESIAN ANALYSIS UNDER HEAVY-TAILED PRIORS
4.1 Introduction
4.2 Known Variance
4.3 Unknown Variance
4.4 An Example
5 BAYESIAN ROBUSTNESS IN SMALL AREA ESTIMATION
5.1 Introduction
5.2 Contamination Class
5.3 Density Ratio Class
5.4 Class of Uniform Priors
5.5 An Example
6 SUMMARY AND FUTURE RESEARCH
6.1 Summary
6.2 Future Research
BIBLIOGRAPHY
BIOGRAPHICAL SKETCH
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
ROBUST BAYESIAN INFERENCE IN FINITE POPULATION SAMPLING
By
DalHo Kim
August 1994
Chairman: Malay Ghosh
Major Department: Statistics
This dissertation considers Bayesian robustness in the context of finite population sampling. Although we are concerned exclusively with the finite population mean, the general methods are applicable to other parameters of interest as well.
First, we consider some robust Bayes competitors of the sample mean and of the subjective Bayes estimators of the finite population mean. Specifically, we have proposed some robust Bayes estimators using ML-II priors from the ε-contamination class. Classes of contaminations that are considered include all possible contaminations and all symmetric, unimodal contaminations. These estimators are compared with the sample mean and subjective Bayes estimators in terms of "posterior robustness" and "procedure robustness". Similar robust Bayes estimators are introduced in the presence of auxiliary information, and these are compared again in terms of "posterior robustness" and "procedure robustness" with the ratio estimator and with subjective Bayes estimators utilizing the auxiliary information. Also, we provide the range in which the posterior mean lies under ε-contamination models.
Second, we consider the idea of developing prior distributions which provide Bayes rules that are inherently robust with respect to reasonable misspecification of the prior. We provide robust Bayes estimators of the finite population mean based on heavy-tailed prior distributions using scale mixtures of normals. A Monte Carlo method, the Gibbs sampler, has been used for implementation of the Bayesian program. Also, the asymptotic optimality properties of the proposed robust Bayes estimators are proved.
Finally, we address Bayesian robustness in the hierarchical Bayes setting for small area estimation. We provide robust hierarchical Bayes estimators of the small area means based on ML-II priors under the ε-contamination class, where the contamination class includes all unimodal distributions. Also, we provide the ranges in which the small area means lie under the ε-contamination class as well as the density ratio class of priors. For the class of priors that are uniform over a specified interval, we investigate the sensitivity to the choice of the intervals. The methods are illustrated using data related to the median income of four-person families in the 50 states and the District of Columbia.
CHAPTER 1
INTRODUCTION
1.1 Literature Review
When prior information about an unknown parameter, say θ, is available, Bayesian analysis requires quantification of this information in the form of a (prior) distribution on the parameter space. The most frequent criticism of subjective Bayesian analysis is that it supposedly assumes an ability to quantify available prior information completely and accurately in terms of a single prior distribution. Given the common and unavoidable practical limitations on factors such as available prior elicitation techniques and time, it is rather unrealistic to expect that prior information can be quantified in terms of one distribution with complete accuracy. In view of this difficulty in prior elicitation, there has long been a robust Bayesian viewpoint which assumes that subjective information can be quantified only in terms of a class Γ of prior distributions. A procedure is then said to be robust if its inferences are relatively insensitive to the variation of the prior distribution over Γ.
The robust Bayesian idea can be traced back to Good as early as 1950 (see, for example, Good (1965)), and has been popularized in recent years, notably by the stimulating article of Berger (1984). There is a rapidly growing literature in the area of robust Bayesian analysis. Berger (1984, 1985, 1990) and Wasserman (1992) provide reviews and discussion of the various issues and approaches. Berger (1984) discusses the philosophical or pragmatic reasons for adopting the robust Bayesian
viewpoint along with a review of some of the leading approaches to robust Bayesian analysis. More recently, Berger (1990) provides a review of different approaches to the selection of F and techniques used in the analyses. Wasserman (1992) discusses several approaches to computation of bounds on posterior expectations for certain classes of priors as well as different approaches to graphical and numerical summarization.
Various classes of priors have been proposed and studied in the literature. One such class is the ε-contamination class, having the form

Γ = {π : π(θ) = (1 - ε)π₀(θ) + εq(θ), q ∈ Q},   (1.1.1)

where Q is the class of allowable contaminations, and π₀ is a fixed prior, commonly called the base prior, which can be thought of as the prior one would use if a single prior had to be chosen. ε-contamination priors have been used by Huber (1973), Berger (1984, 1990), Berger and Berliner (1986), and Sivaganesan and Berger (1989), among others. Berger and Berliner (1986) studied the issue of selecting, in a data-dependent fashion, a "good" prior distribution (the type-II maximum likelihood prior) from the ε-contamination class, and using this prior in any subsequent analysis. Sivaganesan and Berger (1989) determined the ranges of the various posterior quantities of interest (e.g. the posterior mean, posterior variance, posterior probability of a set (allowing for credible sets or tests)) as the prior varies over Γ. Wasserman (1989) provides a robust interpretation of likelihood regions in the case of all possible contaminations. Also related is Berger and Sellke (1987), who carry out the determination of the range of the posterior probability of a hypothesis when ε = 1 in (1.1.1) (i.e., when there is no specified subjective prior π₀).
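As a small illustration of the class (1.1.1), the sketch below builds an ε-contaminated prior from a normal base prior and a uniform contamination and checks numerically that it remains a proper density. The particular choices of π₀, q and ε are illustrative, not taken from the text.

```python
# A minimal numerical sketch of the epsilon-contamination class (1.1.1):
# pi(theta) = (1 - eps) * pi0(theta) + eps * q(theta).
# The base prior pi0, the contamination q, and all numbers are illustrative.
import math

def norm_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

eps = 0.1
pi0 = lambda t: norm_pdf(t, 0.0, 1.0)               # base prior N(0, 1)
q   = lambda t: 0.1 if -5.0 <= t <= 5.0 else 0.0    # contamination: Uniform(-5, 5)
pi  = lambda t: (1 - eps) * pi0(t) + eps * q(t)

# The contaminated prior still integrates to 1 (trapezoid rule on [-10, 10]).
grid = [-10 + 20 * i / 20000 for i in range(20001)]
h = grid[1] - grid[0]
total = h * (sum(pi(t) for t in grid) - 0.5 * (pi(grid[0]) + pi(grid[-1])))
print(round(total, 4))
```

Any proper density q keeps π proper, which is what makes Q such a flexible modelling device.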
Another class of interest has been considered in DeRobertis and Hartigan (1981) to find ranges of general posterior quantities. This class is of the form

Γ = {π : π(θ₁)/π(θ₂) ≤ g₁(θ₁)/g₂(θ₂) ∀ θ₁, θ₂},   (1.1.2)

where g₂ ≤ g₁ are given positive functions. Such a class is called a density ratio class by Berger (1990). Wasserman and Kadane (1992) develop a Monte Carlo approach to computing bounds on posterior expectations over the density ratio class. Cano (1993) considers the range of the posterior mean in hierarchical Bayes models for several classes of priors, including the ε-contamination class and the density ratio class.
Leamer (1982) and Polasek (1985) consider a class of conjugate priors with constraints on the domain of parameters to get closed form expressions for posterior expectations, and carry out sensitivity analyses. DasGupta and Studden (1988) consider the above conjugate class, as well as a density ratio class, in the context of Bayesian design of experiments.
The difficulty of working with a class of priors makes it very appealing to find prior distributions which provide Bayes rules that are naturally robust with respect to reasonable misspecification of the prior. Since Box and Tiao (1968, 1973), Dawid (1973), and O'Hagan (1979), it has been recognized that insensitivity to outliers in Bayesian analysis can be achieved through the use of flat-tailed distributions. Use of t-priors when the data are normal has been specifically considered by West (1985) and O'Hagan (1989). Angers and Berger (1991) consider robust Bayesian estimation of exchangeable means using Cauchy priors. In Angers (1992), the Cauchy prior is replaced by a t-prior with an odd number of degrees of freedom. Andrews and Mallows (1974) and West (1987) study scale mixtures of normal distributions, which can be used for achieving Bayesian robustness. The Student t family, double exponential, logistic, and the exponential power family can all be constructed as scale mixtures of normals (cf. Andrews and Mallows (1974)). West (1984) uses scale mixtures of normal distributions for both the random errors and the priors in Bayesian linear modelling. Carlin and Polson (1991) discuss Bayesian model specification and provide a general paradigm for the Bayesian modelling of nonnormal errors by using scale mixtures of normal distributions.
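The scale-mixture representation just mentioned can be sketched numerically: drawing a gamma mixing variable and then a conditionally normal variate yields a Student t draw. The degrees of freedom and sample size below are illustrative choices.

```python
# Minimal sketch of the scale-mixture-of-normals representation of the
# Student t: if W ~ Gamma(nu/2, rate nu/2) and theta | W ~ N(0, 1/W),
# then marginally theta ~ t_nu. Parameter values are illustrative.
import math
import random

random.seed(0)
nu = 5.0
draws = []
for _ in range(200_000):
    # random.gammavariate takes (shape, scale); scale 2/nu gives rate nu/2
    w = random.gammavariate(nu / 2, 2.0 / nu)
    draws.append(random.gauss(0.0, 1.0 / math.sqrt(w)))

# The t_nu variance is nu/(nu - 2); the sample variance should be close.
m = sum(draws) / len(draws)
v = sum((x - m) ** 2 for x in draws) / (len(draws) - 1)
print(round(v, 2))   # close to 5/3 ~ 1.67
```

The same two-stage draw, with the mixing distribution changed, produces the double exponential or exponential power families.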
The references so far pertain to Bayesian analysis for infinite populations. Bayesian analysis in finite population sampling is of more recent vintage. A unified and elegant formulation of Bayes estimation in finite population sampling was given by Hill (1968) and Ericson (1969). Since then, many papers have appeared in the area of Bayes estimation in finite population sampling. But most of the Bayesian literature in survey sampling deals with subjective Bayesian analysis, in that the inference procedure is based on a single, completely specified prior distribution.
The need for robust Bayesian analysis in survey sampling, however, has been felt by some authors, although the methods described in the first few paragraphs of this section have not so far been used to achieve this end. Godambe and Thompson (1971) adopted a framework whereby the prior information could only be quantified up to a class Γ (C in their notation) of prior distributions. For estimating the population total in the presence of auxiliary information, they came up with the usual ratio and difference estimators, justifying these on the ground of location invariance. The model assumption there played a very minimal role, the main idea being that model-based inference statements could be replaced, in the case of model failure, by design-based inference. In a later study, Godambe (1982) considered the more common phenomenon of specific departures from the assumed model. His contention there was that sampling designs could be a useful companion of model-based inference procedures to generate "near-optimal" and "robust" estimators. However, the basic model assumed in that paper considered Y₁, …, Y_N to be independent, and attention was confined only to design and model unbiased estimators. Royall and Pfeffermann (1982) also consider robustness of certain Bayes estimators in finite populations. However, their main concern is to find conditions under which the Bayes estimators under an assumed model remain the same under certain departures from the model.
Much of the earlier work as well as some of the current work in finite population sampling relates to inference for a single stratum. However, there is now a need for simultaneous inference from several strata, particularly in the context of small area estimation, which is becoming increasingly popular in survey sampling. Agencies of the Federal Government have been involved in obtaining estimates of population counts, unemployment rates, per capita income, crop yields and so forth for state and local government areas. In typical instances, only a few samples are available from an individual area, and an estimate of a certain area mean or simultaneous estimates of several area means can be improved by incorporating information from similar neighboring areas. Ghosh and Rao (1994) have recently surveyed the early history as well as the recent developments in small area estimation.
Bayesian methodology has been implemented in improving small area estimators. The empirical Bayes (EB) approach has been considered for simultaneous estimation of the parameters of several small areas, where each area contains a finite number of elements, by Fay and Herriot (1979), Ghosh and Meeden (1986), and Ghosh and Lahiri (1987). Datta and Ghosh (1991), as an alternative to the EB procedure, propose a hierarchical Bayes (HB) approach for estimation of small area means under general mixed linear models, and also discuss the computational aspects. To handle the case of heavy-tailed priors in the setting of Fay and Herriot, Datta and Lahiri (1994) use t-priors, viewing them as scale mixtures of normals.
1.2 The Subject of This Dissertation
This dissertation considers Bayesian robustness in the context of finite population sampling. Although we are concerned exclusively with the finite population mean, the general methods are applicable to other parameters of interest as well.
In Chapter 2, we develop some robust Bayes estimators of the finite population mean using ML-II priors under ε-contamination models. We compare the performance of these estimators with the subjective Bayes estimators as well as the sample means using the criteria of "posterior robustness" and "procedure robustness" as suggested by Berger (1984). As a consequence of the results on procedure robustness, we have established some asymptotic optimality of the robust Bayes estimators even in the presence of a subjective prior. Furthermore, modifying slightly the arguments of Sivaganesan and Berger (1989), we have provided the range in which the posterior mean lies under ε-contamination models.
In Chapter 3, we provide expressions for variations of the posterior mean within the ε-contamination class of priors in the presence of auxiliary information, following Sivaganesan and Berger (1989). Moreover, similar robust Bayes predictors using the ML-II priors under this contamination class of priors are introduced in the presence of auxiliary information. We compare these estimators again in terms of "posterior robustness" and "procedure robustness" with the classical ratio estimator, and with subjective Bayes predictors utilizing the auxiliary information. In the course of calculating indices for procedure robustness, we have proved the asymptotic optimality of the robust Bayes predictors.
In Chapter 4, we propose certain robust Bayes estimators of the finite population mean from a different perspective. In order to overcome the problem associated with outliers in the context of finite population sampling, we develop Bayes estimators of the finite population mean based on heavy-tailed priors using scale mixtures of normals. Also, we study the asymptotic optimality property of the proposed Bayes estimators. For implementation we have devised a computer-intensive fully Bayesian procedure which uses Markov chain Monte Carlo integration techniques such as the Gibbs sampler.
Chapter 5 addresses Bayesian robustness in the hierarchical Bayes setting in the context of small area estimation. We provide robust hierarchical Bayes estimators of the small area means based on ML-II priors under the ε-contamination class of priors, where the contamination class includes all unimodal distributions. We compare these estimators with HB and EB estimators. Also, we provide the ranges in which the small area means lie under the ε-contamination class as well as the density ratio class of priors.
Finally, in Chapter 6, we discuss some open problems which will be topics for future research.
CHAPTER 2
ROBUST BAYES ESTIMATION OF THE FINITE POPULATION MEAN
2.1 Introduction
Consider a finite population U with units labeled 1, 2, …, N. Let y_i denote the value of a single characteristic attached to the unit i. The vector y = (y₁, …, y_N)ᵀ is the unknown state of nature, and is assumed to belong to Θ = R^N. A subset s of {1, 2, …, N} is called a sample. Let n(s) denote the number of elements belonging to s. The set of all possible samples is denoted by S. A design is a function p on S such that p(s) ∈ [0, 1] for all s ∈ S and Σ_{s∈S} p(s) = 1. Given y ∈ Θ and s = {i₁, …, i_{n(s)}} with 1 ≤ i₁ < … < i_{n(s)} ≤ N, let y(s) = {y_{i₁}, …, y_{i_{n(s)}}}. One of the main objectives in sample surveys is to draw inference about y or some function (real or vector valued) γ(y) of y on the basis of s and y(s).
Bayes estimators of the finite population total (or the mean) within the class of linear unbiased estimators were obtained by Godambe (1955). Subsequently, Godambe and Joshi (1965) found such estimators within the class of all unbiased estimators. A general Bayesian approach to finite population sampling was initiated by Hill (1968) and Ericson (1969). Since then, a huge literature has grown in this area. Ericson (1983) provides a recent account.
We have discussed the use of a single subjective prior in Section 1.1. In this chapter, we generate certain robust Bayes estimators using ε-contamination priors, and study their performance over a broad class of prior distributions on the parameter
space. As we mentioned in Section 1.2, we have restricted ourselves to the estimation of the finite population mean. It may be noted that in our framework, the sampling design does not play any role in the robustness study. More generally, in the Bayesian framework, the irrelevance of the sampling design at the inference stage has been pointed out earlier by Godambe (1966), Basu (1969) and Ericson (1969). Our findings are consistent with that. This, however, does not undermine the importance of sampling design in the actual choice of units in a survey. It is very crucial to design a survey before its actual execution, and we have made some comments on why a simple random sample is justified even within our framework.
Royall and Pfeffermann (1982) also consider robustness of certain Bayes estimators. However, their main concern is to find conditions under which the Bayes estimators under an assumed model remain the same under certain departures from the model.
The contents of the remaining sections are as follows. In Section 2.2, we develop some robust Bayes estimators of the finite population mean using ML-II priors under ε-contamination models (see Good (1965) and Berger and Berliner (1986)), where the contamination class includes all possible distributions. As in Berger (1984), the concepts of posterior and procedure robustness are introduced, and the proposed robust Bayes estimators are compared to the sample means and the subjective Bayes estimators under these criteria. It turns out that the robust Bayes estimator enjoys good posterior as well as procedure robustness relative to the sample mean, while the subjective Bayes estimator enjoys good posterior robustness, but lacks procedure robustness. As a consequence of the results on procedure robustness, we have established some asymptotic (as the sample size increases) optimality of the robust Bayes estimators in the presence of a subjective prior. Specifically, we have shown that the difference of the Bayes risks of the proposed robust Bayes estimator and the optimal Bayes estimator converges to zero at a certain rate as the sample size tends
to infinity. Also, modifying slightly the arguments of Sivaganesan (1988), we have provided the range in which the posterior mean lies under ε-contamination models. This pertains to the sensitivity analysis of the proposed robust Bayes procedure.
Section 2.3 contains the more realistic framework when the contamination class contains only symmetric unimodal distributions. We have developed robust Bayes estimators as competitors of the classical estimator as well as the subjective Bayes estimator, and have studied their properties.
Finally, in Section 2.4, a numerical example is provided to illustrate the methods described in Sections 2.2 and 2.3.
One of the highlights of this chapter is the analytical study of procedure robustness and the subsequent asymptotic optimality in the presence of a subjective prior. To our knowledge, this issue has not been addressed in previous work.
For simplicity, in the subsequent sections, only the case where p(s) > 0 if and only if n(s) = n will be considered. This amounts to considering only fixed samples of size n. Also, throughout this chapter, the loss is assumed to be squared error.
2.2 Robust Bayes Estimators
Consider the case when y_i | θ ~ iid N(θ, σ²) (i = 1, …, N) and θ ~ N(μ₀, τ²). Write M₀ = σ²/τ², B₀ = M₀/(M₀ + n), ȳ(s) = n⁻¹ Σ_{i∈s} y_i and y(s̄) = {y_i : i ∉ s}, the suffixes in y(s̄) being arranged in ascending order. From Ericson (1969), it follows that the posterior distribution of y(s̄) given s and y(s) is N({(1 - B₀)ȳ(s) + B₀μ₀}1_{N-n}, σ²(I_{N-n} + (M₀ + n)⁻¹J_{N-n})), where 1_u is a u-component column vector with each element equal to 1, J_u = 1_u 1_uᵀ and I_u is the identity matrix. Then the Bayes estimator of γ(y) = N⁻¹ Σ_{i=1}^N y_i is

δ⁰(s, y(s)) = E[γ(y) | s, y(s)] = f ȳ(s) + (1 - f){(1 - B₀)ȳ(s) + B₀μ₀}
= ȳ(s) - (1 - f)B₀(ȳ(s) - μ₀),   (2.2.1)

where f = n/N. Strictly speaking, we should call δ⁰ a predictor of γ(y), since δ⁰ is the sum of the observed y_i's plus a predictor of the total of the unobserved y_i's, but we shall use the two terms "estimators" and "predictors" interchangeably. Also, the associated posterior variance is given by

V(γ(y) | s, y(s)) = N⁻²(N - n)σ²(M₀ + N)/(M₀ + n).   (2.2.2)
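For concreteness, (2.2.1) and (2.2.2) can be evaluated numerically. The population and prior settings below are illustrative, not those of the example in Section 2.4.

```python
# Numerical sketch of the subjective Bayes estimator (2.2.1) and its
# posterior variance (2.2.2). All data and prior settings are illustrative.
N, n = 100, 20
sigma2, tau2, mu0 = 4.0, 2.0, 10.0
ybar = 11.5                      # sample mean of the n observed units

M0 = sigma2 / tau2               # variance ratio sigma^2 / tau^2
B0 = M0 / (M0 + n)               # shrinkage factor toward mu0
f = n / N                        # sampling fraction

# Two equivalent forms of delta^0 in (2.2.1)
delta0_a = f * ybar + (1 - f) * ((1 - B0) * ybar + B0 * mu0)
delta0_b = ybar - (1 - f) * B0 * (ybar - mu0)

# Posterior variance (2.2.2)
post_var = (N - n) * sigma2 * (M0 + N) / (N ** 2 * (M0 + n))

print(round(delta0_a, 6), round(delta0_b, 6), round(post_var, 6))
```

The second form makes the structure plain: δ⁰ shrinks the sample mean toward the prior mean by the factor (1 - f)B₀.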
The classical estimator of γ(y) is δ^C(s, y(s)) = ȳ(s), which is a design-unbiased estimator of γ(y) under simple random sampling. Also, δ^C is a model-unbiased estimator of γ(y) under any model which assumes that the y_i's have a common mean.
To derive robust Bayes estimators of γ(y), we first introduce the notion of ε-contaminated priors. Denote by π₀ the N(μ₀, τ²) distribution. The class Γ_ε of prior distributions is given by

Γ_ε = {π : π = (1 - ε)π₀ + εq, q ∈ Q},   (2.2.3)

where 0 < ε < 1 is given, and Q is the class of all distribution functions. We denote by m(y|π) the marginal (predictive) density of y under the prior π. If π ∈ Γ_ε, we can write m(y|π) = (1 - ε)m(y|π₀) + εm(y|q). This leads to

m(y(s)|π) = (1 - ε)m(y(s)|π₀) + εm(y(s)|q).   (2.2.4)

Our objective is to choose the prior π which maximizes m(y(s)|π) over Γ_ε. This amounts to maximization of m(y(s)|q) over q ∈ Q. Noting that

m(y(s)|q) = ∫_{-∞}^{∞} (2πσ²)^{-n/2} exp[-Σ_{i∈s}(y_i - θ)²/(2σ²)] q(dθ),   (2.2.5)

it follows that m(y(s)|q) is maximized with respect to the prior which is degenerate at ȳ(s). We shall denote this prior by δ_{ȳ(s)}, δ being the Dirac delta function. The resulting (estimated) prior, say π̂, is now given by

π̂ = (1 - ε)π₀ + εδ_{ȳ(s)}.   (2.2.6)

The prior π̂ is called the ML-II prior by Good (1965), and Berger and Berliner (1986). Under the prior π̂, y(s) is marginally distributed as

(1 - ε)N(μ₀1_n, σ²I_n + τ²J_n) + εF_ε,   (2.2.7)

where F_ε has (improper) pdf

f(y_i; i ∈ s) = (2πσ²)^{-n/2} exp[-Σ_{i∈s}(y_i - ȳ(s))²/(2σ²)].   (2.2.8)
We now prove the following theorem, which provides the posterior distribution of y(s̄) given s and y(s) under the ML-II prior π̂.

Theorem 2.2.1 Under the ML-II prior π̂, the conditional distribution of y(s̄) given s and y(s) is

λ_ML(ȳ(s)) N({(1 - B₀)ȳ(s) + B₀μ₀}1_{N-n}, σ²(I_{N-n} + (M₀ + n)⁻¹J_{N-n}))
+ (1 - λ_ML(ȳ(s))) N(ȳ(s)1_{N-n}, σ²I_{N-n}),   (2.2.9)

where

λ_ML(ȳ(s)) = [1 + ε(1 - ε)⁻¹B₀^{-1/2} exp(nB₀(ȳ(s) - μ₀)²/(2σ²))]⁻¹.   (2.2.10)

Proof of Theorem 2.2.1 The conditional pdf of y(s̄) given s and y(s) is

π̂(y(s̄)|s, y(s)) = ∫ f(y(s̄)|θ) π̂(θ|s, y(s)) dθ,   (2.2.11)

where

π̂(θ|s, y(s)) = λ(y(s)) π₀(θ|s, y(s)) + (1 - λ(y(s))) q̂(θ|s, y(s)),   (2.2.12)

and

λ(y(s)) = (1 - ε)m(y(s)|π₀)/m(y(s)|π̂).   (2.2.13)

From Ericson (1969), π₀(y(s̄)|s, y(s)) is the N[{(1 - B₀)ȳ(s) + B₀μ₀}1_{N-n}, σ²(I_{N-n} + (M₀ + n)⁻¹J_{N-n})] pdf. Also,

∫ f(y(s̄)|θ) q̂(θ|s, y(s)) dθ = ∫ f(y(s̄)|θ) f(y(s)|θ) q̂(dθ) / m(y(s)|q̂)
= m(y|q̂)/m(y(s)|q̂)
= (2πσ²)^{-(N-n)/2} exp[-Σ_{i∉s}(y_i - ȳ(s))²/(2σ²)].   (2.2.14)

Further, after some algebraic simplification,

λ(y(s)) = [1 + ε(1 - ε)⁻¹ m(y(s)|q̂)/m(y(s)|π₀)]⁻¹ = λ_ML(ȳ(s)).   (2.2.15)

This completes the proof of the theorem.
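The closed form (2.2.10) for λ_ML can be checked against the definition (2.2.13) by computing the marginal m(y(s)|π₀) by brute-force numerical integration over θ. The toy sample and prior settings below are illustrative.

```python
# Numerical check (illustrative data) that lambda in (2.2.13) reduces to the
# closed form lambda_ML in (2.2.10). The marginal m(y(s)|pi0) is computed by
# brute-force integration over theta rather than in closed form.
import math

sigma2, tau2, mu0, eps = 4.0, 2.0, 10.0, 0.1
ys = [11.2, 9.8, 12.4, 10.9, 11.7]          # a toy sample y(s)
n = len(ys)
ybar = sum(ys) / n

def norm_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def like(theta):
    # f(y(s) | theta): product of N(theta, sigma^2) densities
    p = 1.0
    for y in ys:
        p *= norm_pdf(y, theta, sigma2)
    return p

# m(y(s)|pi0) by the trapezoid rule; m(y(s)|q-hat) puts theta at ybar.
grid = [mu0 - 15 + 30 * i / 40000 for i in range(40001)]
h = grid[1] - grid[0]
m0 = h * sum(like(t) * norm_pdf(t, mu0, tau2) for t in grid)
mq = like(ybar)

lam_numeric = (1 - eps) * m0 / ((1 - eps) * m0 + eps * mq)

M0 = sigma2 / tau2
B0 = M0 / (M0 + n)
g = (eps / (1 - eps)) * B0 ** (-0.5)
lam_closed = 1.0 / (1.0 + g * math.exp(n * B0 * (ybar - mu0) ** 2 / (2 * sigma2)))

print(abs(lam_numeric - lam_closed) < 1e-5)
```

The agreement reflects the identity m(y(s)|q̂)/m(y(s)|π₀) = B₀^{-1/2} exp{nB₀(ȳ(s) - μ₀)²/(2σ²)} used in the proof.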
Note that under the posterior distribution given in (2.2.9), the Bayes estimator of γ(y) is given by

δ^{RB}(s, y(s)) = N⁻¹[nȳ(s) + (N - n){λ_ML(ȳ(s))((1 - B₀)ȳ(s) + B₀μ₀) + (1 - λ_ML(ȳ(s)))ȳ(s)}]
= ȳ(s) - (1 - f)λ_ML(ȳ(s))B₀(ȳ(s) - μ₀).   (2.2.16)

Note that for ε close to zero, i.e. when one is very confident about the N(μ₀, τ²) prior for θ, since λ_ML(ȳ(s)) is close to 1, it follows that δ^{RB} is very close to δ⁰. On the other hand, when ε is close to 1, i.e. there is very little confidence in the N(μ₀, τ²) prior, λ_ML(ȳ(s)) is close to zero, and δ^{RB} is very close to the sample mean ȳ(s). Thus, as one might expect, a robust Bayes estimator serves as a compromise between the subjective Bayes and the classical estimators of γ(y). Also, generalizing the formula (1.8) in Berger and Berliner (1986), i.e.,

V^{π̂}(x) = λ(x)V^{π₀}(x) + (1 - λ(x))V^{q̂}(x) + λ(x)(1 - λ(x))(δ^{π₀}(x) - δ^{q̂}(x))²,

one gets

V(γ(y)|s, y(s)) = N⁻²[(N - n)σ² + (N - n)²{λ_ML σ²/(M₀ + n) + λ_ML(1 - λ_ML)B₀²(ȳ(s) - μ₀)²}].   (2.2.17)
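A short numerical sketch of (2.2.16), with illustrative settings rather than the worked example of Section 2.4, shows the limiting behavior just described: δ^RB approaches δ⁰ as ε tends to 0 and the sample mean as ε tends to 1.

```python
# Sketch of the ML-II robust Bayes estimator (2.2.16); all numbers illustrative.
import math

N, n = 100, 20
sigma2, tau2, mu0 = 4.0, 2.0, 10.0
ybar = 11.5

M0 = sigma2 / tau2
B0 = M0 / (M0 + n)
f = n / N

def lam_ML(eps):
    # lambda_ML(ybar) from (2.2.10)
    g = (eps / (1 - eps)) * B0 ** (-0.5)
    return 1.0 / (1.0 + g * math.exp(n * B0 * (ybar - mu0) ** 2 / (2 * sigma2)))

def delta_RB(eps):
    return ybar - (1 - f) * lam_ML(eps) * B0 * (ybar - mu0)

delta0 = ybar - (1 - f) * B0 * (ybar - mu0)   # subjective Bayes (2.2.1)
print(round(delta_RB(1e-9), 4))   # ~ delta0 : near-total trust in pi0
print(round(delta_RB(0.999), 4))  # ~ ybar   : almost no trust in pi0
```

For intermediate ε the estimator interpolates between the two, with the data-dependent weight λ_ML discounting the base prior when ȳ(s) is far from μ₀.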
Next, we compare the performances of δ⁰, δ^C and δ^{RB} from the robustness perspective. The main idea is that we want to examine whether these estimators perform satisfactorily over a broad class of priors.

To this end, for a given prior ξ, denote by ρ(ξ, (s, y(s)), a) the posterior risk of an estimator a(s, y(s)) of γ(y), i.e., ρ(ξ, (s, y(s)), a) = E[{a(s, y(s)) - γ(y)}² | s, y(s)]. The following definition is taken from Berger (1984).

Definition 2.2.1 An estimator a₀(s, y(s)) is ζ-posterior robust with respect to Γ if, for the observed (s, y(s)),

POR_Γ(a₀) = sup_{ξ∈Γ} |ρ(ξ, (s, y(s)), a₀) - inf_{a∈A} ρ(ξ, (s, y(s)), a)| ≤ ζ.   (2.2.18)

We shall, henceforth, refer to the left hand side of (2.2.18) as the posterior robustness index of the estimator a₀(s, y(s)) of γ(y) under the class of priors Γ. POR_Γ(a₀) is, in a sense, the sensitivity index of the estimator a₀ of γ(y) as the prior varies over Γ. For any given ζ > 0, it is very clear that whether or not posterior robustness obtains will often depend on which (s, y(s)) is observed. This will be revealed in the examples to follow.
To examine the posterior robustness of δ⁰, δ^C and δ^{RB}, consider the class of N(μ, τ²) priors, μ (real) and τ²(> 0). Write M = σ²/τ², B = M/(M + n), where σ²(> 0) is known. Calculations similar to (2.2.1) now give the Bayes estimator of γ(y) under the N(μ, τ²) prior (to be denoted by ξ_{μ,B}) as

δ^{μ,B}(s, y(s)) = f ȳ(s) + (1 - f){Bμ + (1 - B)ȳ(s)}
= ȳ(s) - (1 - f)B(ȳ(s) - μ).   (2.2.19)

Then the following results hold:
ρ(ξ_{μ,B}, (s, y(s)), δ^{μ,B}) = N⁻²(N - n)σ²(M + N)/(M + n);   (2.2.20)

ρ(ξ_{μ,B}, (s, y(s)), δ⁰) - ρ(ξ_{μ,B}, (s, y(s)), δ^{μ,B})
= (1 - f)²[B₀(μ - μ₀) + (B₀ - B)(ȳ(s) - μ)]²;   (2.2.21)

ρ(ξ_{μ,B}, (s, y(s)), δ^C) - ρ(ξ_{μ,B}, (s, y(s)), δ^{μ,B})
= (1 - f)²B²(ȳ(s) - μ)²;   (2.2.22)

ρ(ξ_{μ,B}, (s, y(s)), δ^{RB}) - ρ(ξ_{μ,B}, (s, y(s)), δ^{μ,B})
= (1 - f)²[B₀λ_ML(ȳ(s))(ȳ(s) - μ₀) - B(ȳ(s) - μ)]².   (2.2.23)
From (2.2.21)-(2.2.23) it is very clear that if we consider the class Γ of all N(μ, τ²) priors, for each one of the estimators δ⁰, δ^C and δ^{RB}, the supremum (over μ (real)) of the left hand side of (2.2.21)-(2.2.23) becomes +∞, and all these estimators turn out to be nonrobust. One reason why this happens is that the N(μ, τ²) class of priors for all real μ and τ²(> 0) is indeed too big to be practically useful. As a next step, we consider the smaller class of N(μ₀, τ²) priors, where the mean μ₀ is specified. This is not too unrealistic since, very often, from prior experience, one can make a reasonable guess at the center of the distribution.
Note that when μ = μ₀, denoting ξ_{μ₀,B} by ξ_B and δ^{μ₀,B} by δ^B, (2.2.21)-(2.2.23) simplify to

ρ(ξ_B, (s, y(s)), δ⁰) - ρ(ξ_B, (s, y(s)), δ^B) = (1 - f)²(B₀ - B)²(ȳ(s) - μ₀)²;   (2.2.24)

ρ(ξ_B, (s, y(s)), δ^C) - ρ(ξ_B, (s, y(s)), δ^B) = (1 - f)²B²(ȳ(s) - μ₀)²;   (2.2.25)

ρ(ξ_B, (s, y(s)), δ^{RB}) - ρ(ξ_B, (s, y(s)), δ^B) = (1 - f)²(B₀λ_ML(ȳ(s)) - B)²(ȳ(s) - μ₀)².   (2.2.26)

Accordingly, from (2.2.24)-(2.2.26),

POR_Γ(δ⁰) = (1 - f)² max[B₀², (1 - B₀)²](ȳ(s) - μ₀)²,   (2.2.27)

POR_Γ(δ^C) = (1 - f)²(ȳ(s) - μ₀)²,   (2.2.28)

POR_Γ(δ^{RB}) = (1 - f)² max[(B₀λ_ML(ȳ(s)))², (1 - B₀λ_ML(ȳ(s)))²](ȳ(s) - μ₀)².   (2.2.29)

Thus, given any ζ > 0 and 0 < f < 1, the ζ-posterior robustness of all these procedures depends on the closeness of the sample mean to the prior mean μ₀. Also, it follows from (2.2.27)-(2.2.29) that both the subjective and robust Bayes estimators are more posterior robust than the sample mean for the {N(μ₀, τ²), τ² > 0} class of priors. A comparison between (2.2.27) and (2.2.29) also reveals that the robust Bayes estimator arrived at by employing the type-II ML prior enjoys a greater degree of posterior robustness than the subjective Bayes estimator if B₀λ_ML(ȳ(s)) > 1/2.
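The comparison of the three POR indices (2.2.27)-(2.2.29) can be illustrated numerically; the settings below are arbitrary illustrative choices.

```python
# Sketch of the posterior robustness indices (2.2.27)-(2.2.29) over the
# {N(mu0, tau^2), tau^2 > 0} class; all numbers illustrative.
import math

N, n = 100, 20
sigma2, mu0, ybar, eps = 4.0, 10.0, 11.5, 0.1
tau2 = 2.0                      # base-prior variance, which fixes B0

M0 = sigma2 / tau2
B0 = M0 / (M0 + n)
f = n / N
g = (eps / (1 - eps)) * B0 ** (-0.5)
lam = 1.0 / (1.0 + g * math.exp(n * B0 * (ybar - mu0) ** 2 / (2 * sigma2)))

d2 = (ybar - mu0) ** 2
POR_bayes   = (1 - f) ** 2 * max(B0 ** 2, (1 - B0) ** 2) * d2   # (2.2.27)
POR_classic = (1 - f) ** 2 * d2                                  # (2.2.28)
POR_robust  = (1 - f) ** 2 * max((B0 * lam) ** 2, (1 - B0 * lam) ** 2) * d2  # (2.2.29)

# Both Bayes-type estimators beat the sample mean on this index.
print(POR_bayes < POR_classic, POR_robust < POR_classic)
```

Since 0 < B₀ < 1 and 0 < λ_ML < 1, both maxima in (2.2.27) and (2.2.29) are strictly below 1, so the ordering printed here holds for any such settings.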
Although the Bayesian thinks conditionally on (s, y(s)), and accordingly in terms of posterior robustness, it is certainly possible to use the overall Bayes risk in defining a suitable robustness criterion. This may not be totally unappealing even to a Bayesian at the pre-experimental stage, when he/she perceives that y will be occurring according to a certain marginal distribution. The overall performance of an estimator a(s, y(s)) of γ(y) when the prior is ξ is given by r(ξ, a) = E[ρ(ξ, (s, y(s)), a)], the expectation being taken with respect to the resulting marginal distribution of y(s). The following method of measuring the robustness of a procedure is given in Berger (1984).
Definition 2.2.2 An estimator a₀(s, y(s)) of γ(y) is said to be η-procedure robust with respect to Γ if

PR_Γ(a₀) = sup_{ξ∈Γ} |r(ξ, a₀) - inf_{a∈A} r(ξ, a)| ≤ η.   (2.2.30)

We shall henceforth refer to PR_Γ(a₀) as the procedure robustness index of a₀.
Next we examine how the three estimators J0, Jc and jRB compare according to the PR criterion as given in (2.2.30) when we consider the {N(i0, T2), T2 > 0} class of priors. Using the same notation B as before for the N(yi0, T2) prior, from (2.2.24)
 (2.2.26) it follows that
r(B, SO)  r(B, SB)  (1  f)2(Bo  B)2a2/(nB); (2.2.31) r( B, JC)  r( B, SB) = (1  f)2BU2/n; (2.2.32) r( B, SRB)  r( B, 6B) = (1  f)2E[(BoAML(V(s))  B)2(g(s)  O)2]. (2.2.33) It is clear from (2.2.31)  (2.2.33) that PRr(J0) = +00, (2.2.34) PRr(C) = (1  f)2U2/n, (2.2.35) PRr(J'B) = (1  f)2 sup E[(BoAML(7(s))  B)2(y(s)  ko)2]. (2.2.36) 0
From (2.2.34) and (2.2.35) it is clear that the subjective Bayes estimator completely lacks procedure robustness, while the sample mean is quite procedure robust. To examine the procedure robustness of δ^RB, we proceed as follows.
Theorem 2.2.2. For every ε > 0, E[(B₀λ_ML(ȳ(s)) − B)²(ȳ(s) − μ₀)²] = O_e(B^{1/2}), where O_e denotes the exact order.
Proof of Theorem 2.2.2. Noting that, marginally, n(ȳ(s) − μ₀)² ~ (σ²/B)χ²₁, it follows from (2.2.36) that

E[(B₀λ_ML(ȳ(s)) − B)²(ȳ(s) − μ₀)²]
= (σ²/n) ∫₀^∞ [B₀/{1 + g exp(uB₀/(2B))} − B]² B⁻¹ u^{3/2−1} exp(−u/2)/(2^{3/2}Γ(3/2)) du,  (2.2.37)

where g = (ε/(1 − ε))B₀^{−1/2}, and the weight u^{3/2−1}exp(−u/2)/(2^{3/2}Γ(3/2)) is the χ²₃ pdf arising from the extra factor of u in the χ²₁ expectation. Next, using (a − b)² ≤ 2(a² + b²) and {1 + g exp(x)}² ≥ 2g exp(x), observe that

rhs of (2.2.37)
≤ (2σ²/n)[(B₀²/(2g))B⁻¹ ∫₀^∞ exp(−(1 + B₀B⁻¹)u/2) u^{3/2−1}/(2^{3/2}Γ(3/2)) du + B]
= (2σ²/n)[(B₀²/(2g))B⁻¹(1 + B₀B⁻¹)^{−3/2} + B]
= O(B^{1/2}).  (2.2.38)

Again, writing g′ = max(g, 1), so that 1 + g exp(x) ≤ 2g′ exp(x), and using B₀/{1 + g exp(x)} ≤ B₀g⁻¹exp(−x) for the cross-product term,

rhs of (2.2.37)
≥ (σ²/n)[(B₀/(2g′))²B⁻¹(1 + 2B₀B⁻¹)^{−3/2} − 2B₀g⁻¹(1 + B₀B⁻¹)^{−3/2} + B]
= O(B^{1/2}),  (2.2.39)

the first term being of exact order B^{1/2} while the remaining two terms are O(B). Combining (2.2.38) and (2.2.39), the result follows.
Thus, unlike the subjective Bayes estimator, the robust Bayes estimator using the ML-II prior does not suffer from lack of procedure robustness. Also, if one examines (2.2.32) and (2.2.33) (i.e., before taking the supremum over B), then using Theorem 2.2.2 it appears that the classical estimator δ^C has a certain edge over δ^RB for small B. Smallness of B can arise in two ways: either n is very large, or τ² is much larger than σ². This is, however, natural to expect, since small B signifies a small variance ratio σ²/τ², which amounts to instability in the assessment of the prior distribution of θ. It is not surprising that in such a situation it is safer to use ȳ(s) in estimating γ(y) if one is seriously concerned about the long-run performance of an estimator.

Theorem 2.2.2 (specifically the upper bound derived in this theorem for the difference of Bayes risks) clearly demonstrates the procedure robustness of δ^RB. The superior performance of the POR index of this estimator relative to the POR index of δ^C has already been established. On the other hand, δ⁰, which performs better than δ^C on its POR index, fails miserably in its procedure robustness. This also shows that the average long-term performance of a procedure can sometimes be highly misleading when used as a yardstick for measuring its performance in a given situation. In this case, δ^RB seems to achieve the balance between a frequentist and a subjective Bayesian.
In practice, however, σ² is not usually known. In such a situation, one can conceive of an inverse gamma prior for σ², independent of the prior for θ, to derive a Bayes estimator of γ(y) (see Ericson (1969)). In a robust Bayes approach, if one assumes a mixture of a normal-gamma prior and a type II ML prior for (θ, σ²), then the type II ML prior puts its contamination mass on the point (ȳ(s), Σ_{i∈s}(y_i − ȳ(s))²/n). It is possible to study the robustness of the resulting Bayes estimator, but this will not be pursued here.
Next we consider the problem of finding the range of the posterior mean of γ(y) = N⁻¹Σ_{i=1}^N y_i over Γ_ε in (2.2.3). Simple modifications of the arguments of Sivaganesan and Berger (1989) or Sivaganesan (1988) lead to the following theorem.

Theorem 2.2.3.

sup_{π∈Γ_ε} E[γ(y)|s, y(s)] = fȳ(s) + (1 − f)[aδ^{π₀}(y(s)) + θ_u f(y(s)|θ_u)]/[a + f(y(s)|θ_u)]  (2.2.40)

and

inf_{π∈Γ_ε} E[γ(y)|s, y(s)] = fȳ(s) + (1 − f)[aδ^{π₀}(y(s)) + θ_l f(y(s)|θ_l)]/[a + f(y(s)|θ_l)],  (2.2.41)

where a = ε⁻¹(1 − ε)m(y(s)|π₀), δ^{π₀}(y(s)) = (1 − B₀)ȳ(s) + B₀μ₀, θ_l = v_l σ/√n + ȳ(s), θ_u = v_u σ/√n + ȳ(s), and the values of v_l and v_u are the solutions in v of the equation

exp(v²/2) − cv² − bv + c = 0,  (2.2.42)

where c = a(2πσ²)^{n/2} exp[Σ_{i∈s}(y_i − ȳ(s))²/(2σ²)] and b = c√n(ȳ(s) − δ^{π₀}(y(s)))/σ.

Note that equation (2.2.42) has exactly two solutions, which may be obtained by rewriting it in the form v = ±[2 log(cv² + bv − c)]^{1/2} and solving iteratively, using the initial values

v = [b ± {b² + 2(1 + 2c)(1 + c)}^{1/2}]/(1 + 2c).
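The iteration suggested above can be coded directly. The sketch below (ours) implements only the positive (maximizing) branch; it makes no claim about the negative branch, for which the log argument may leave its domain for some (b, c):

```python
import math

def solve_v_upper(b, c, iters=200):
    # Iterates v = +sqrt(2*log(c*v^2 + b*v - c)) from the suggested starting
    # point v0 = (b + sqrt(b^2 + 2*(1+2c)*(1+c))) / (1 + 2c), seeking the
    # maximizing root of (2.2.42).  Returns None if the log argument leaves (1, inf).
    v = (b + math.sqrt(b * b + 2.0 * (1.0 + 2.0 * c) * (1.0 + c))) / (1.0 + 2.0 * c)
    for _ in range(iters):
        arg = c * v * v + b * v - c
        if arg <= 1.0:
            return None
        v = math.sqrt(2.0 * math.log(arg))
    return v
```

At a fixed point, exp(v²/2) = cv² + bv − c by construction, i.e., (2.2.42) is satisfied.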
2.3 Symmetric Unimodal Contamination
The contamination class Q of the previous section contains all possible distributions. Needless to say, such a class is too big to be useful in practice, as it may contain many unwanted distributions. In this section, we consider the smaller but sufficiently rich class Q which contains all unimodal distributions symmetric about μ₀. This class was considered by Berger and Berliner (1986). The contaminated prior pdf π is given by

π(θ) = (1 − ε)π₀(θ) + εq(θ),  (2.3.1)

where q ∈ Q, 0 ≤ ε < 1, and, as before, π₀(θ) is the N(μ₀, τ₀²) pdf. To find the ML-II prior in this case, we use the well-known fact that any symmetric unimodal distribution is a mixture of symmetric uniform distributions (cf. Berger and Sellke (1987)). Thus it suffices to restrict q to Q′ = {q_k : q_k is uniform(μ₀ − k, μ₀ + k), k ≥ 0}. The ML-II prior is then given by

π̂*(θ) = (1 − ε)π₀(θ) + εq̂(θ),  (2.3.2)

where q̂ is uniform(μ₀ − k̂, μ₀ + k̂), k̂ being the value of k which maximizes m(y(s)|q_k), the marginal pdf of y(s) under q_k.

To find k̂, note that

m(y(s)|q_k) = ∫ (2πσ²)^{−n/2} exp[−(1/(2σ²))Σ_{i∈s}(y_i − θ)²] q_k(dθ)
= (2k)⁻¹ ∫_{μ₀−k}^{μ₀+k} (2πσ²)^{−n/2} exp[−(1/(2σ²))Σ_{i∈s}(y_i − θ)²] dθ
= (2πσ²)^{−n/2}(2k√n/σ)⁻¹(2π)^{1/2} exp[−(1/(2σ²))Σ_{i∈s}(y_i − ȳ(s))²]
× [Φ(√n k/σ − z(s)) − Φ(−√n k/σ − z(s))],  (2.3.3)

where Φ is the N(0, 1) cdf and z(s) = √n(ȳ(s) − μ₀)/σ. The solution k̂ (cf. Berger and Sellke (1987), p. 117) is given by

k̂ = k*σ/√n if |z(s)| > 1;  k̂ = 0 if |z(s)| ≤ 1,  (2.3.4)

where k* satisfies

Φ(k* − z(s)) − Φ(−k* − z(s)) = k*[φ(k* − z(s)) + φ(k* + z(s))],

φ being the N(0, 1) pdf. Since both sides of the above equation are symmetric functions of z(s), we can replace z(s) by |z(s)| to get

Φ(k* − |z(s)|) − Φ(−k* − |z(s)|) = k*[φ(k* − |z(s)|) + φ(k* + |z(s)|)].  (2.3.5)
Clearly k* = 0 is a solution of (2.3.5). Berger (1985, p. 234) states without proof that there exists a unique solution k*(> 0) of (2.3.5) for |z(s)| > 1. Since the proof of the uniqueness result is nontrivial, and is very critical for many subsequent developments, we provide below a proof of the same.

Lemma 2.3.1. There exists a unique solution k* > |z(s)| of (2.3.5) for |z(s)| > 1.

Proof of Lemma 2.3.1. For notational simplicity, write k* = x and |z(s)| = y. Consider the function

h(x, y) = Φ(x − y) − Φ(−x − y) − x[φ(x − y) + φ(x + y)].  (2.3.6)

It suffices to show that for every y > 1, there exists a unique x > y for which h(x, y) = 0. To see this, first observe that

∂h(x, y)/∂x = x[(x − y)φ(x − y) + (x + y)φ(x + y)]
= xφ(x + y)[(x − y) exp(2xy) + x + y].  (2.3.7)

Hence, if for some x₀ > 0, ∂h(x, y)/∂x|_{x=x₀} > 0 (the existence of such an x₀ is guaranteed, since ∂h(x, y)/∂x|_{x=y} = 2y²φ(2y) > 0 for y > 0), then from (2.3.7), (x₀ − y)exp(2x₀y) + x₀ + y > 0. This implies that for every x₁ > x₀, (x₁ − y)exp(2x₁y) + x₁ + y > 0, which from (2.3.7) is equivalent to ∂h(x, y)/∂x|_{x=x₁} > 0. Thus, for fixed y, once h(x, y) starts increasing at some x = x* (say), it continues to do so from x* onwards for that y. Moreover, h(0, y) = 0 for all y, h(y, y) < 0 for all y > 1, and h(+∞, y) = 1 for all y. This shows that for fixed y > 1, h(x, y), as a function of x, first decreases from 0 to a negative number, and then increases up to 1. Since h(y, y) < 0, this guarantees the existence of a unique x > y for which h(x, y) = 0.

Remark 2.3.1. Berger (1985) used the fact that k* > |z(s)| if |z(s)| > 1 in order to provide an iterative procedure for finding k*. The above lemma substantiates Berger's claim in addition to justifying the uniqueness of k*.
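Lemma 2.3.1 also licenses a simple bracketing scheme for computing k*: for y = |z(s)| > 1, h(·, y) is negative at x = y and tends to 1, so bisection on [y, y + const] locates the unique root. A Python sketch (ours, not part of the original development):

```python
import math

def Phi(x):
    # N(0,1) cdf via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    # N(0,1) pdf.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def kstar(y, width=10.0, iters=100):
    # Unique root x > y of h(x, y) = Phi(x-y) - Phi(-x-y) - x*[phi(x-y) + phi(x+y)]
    # for y = |z(s)| > 1 (Lemma 2.3.1): h(y, y) < 0 and h(x, y) -> 1, so bisect.
    def h(x):
        return Phi(x - y) - Phi(-x - y) - x * (phi(x - y) + phi(x + y))
    lo, hi = y, y + width
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For instance, for |z(s)| = 2 the root lies between 2.8 and 2.9, consistent with k* > |z(s)|.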
The following theorem provides the conditional distribution of y(s̄) given s and y(s) under the ML-II prior π̂*. Recall M₀ = σ²/τ₀², B₀ = M₀/(M₀ + n).

Theorem 2.3.1. The posterior distribution of y(s̄) (given s and y(s)) under the prior π̂* is

π̂*(y(s̄)|s, y(s)) = λ̂(z(s))π₀(y(s̄)|s, y(s)) + (1 − λ̂(z(s)))m(y|q̂)/m(y(s)|q̂),  (2.3.8)

where for k̂ > 0,

λ̂(z(s)) = {1 + ε(1 − ε)⁻¹B₀^{−1/2}(2π)^{1/2}(1/2)[φ(k* − z(s)) + φ(k* + z(s))]exp(B₀z²(s)/2)}⁻¹
= λ̂₁(z(s)) (say),  (2.3.9)

while for k̂ = 0,

λ̂(z(s)) = {1 + ε(1 − ε)⁻¹B₀^{−1/2}exp(−(1 − B₀)z²(s)/2)}⁻¹
= λ̂₂(z(s)) (say).  (2.3.10)

Proof of Theorem 2.3.1. Arguing as in Theorem 2.2.1, the posterior pdf of y(s̄) given s and y(s) is

π̂*(y(s̄)|s, y(s)) = λ̂(y(s))π₀(y(s̄)|s, y(s)) + (1 − λ̂(y(s)))m(y|q̂)/m(y(s)|q̂),  (2.3.11)

where

λ̂(y(s)) = {1 + ε(1 − ε)⁻¹m(y(s)|q̂)/m(y(s)|π₀)}⁻¹.  (2.3.12)

But for k̂ > 0,

m(y(s)|q̂)/m(y(s)|π₀) = (2k̂)⁻¹B₀^{−1/2}exp[B₀n(ȳ(s) − μ₀)²/(2σ²)] ∫_{μ₀−k̂}^{μ₀+k̂} exp[−n(θ − ȳ(s))²/(2σ²)]dθ.  (2.3.13)

Recall that z(s) = √n(ȳ(s) − μ₀)/σ. Also, using (2.3.5) and the fact that k̂ = k*σ/√n,

∫_{μ₀−k̂}^{μ₀+k̂} exp[−n(θ − ȳ(s))²/(2σ²)]dθ
= (2πσ²/n)^{1/2}[Φ(√n(μ₀ + k̂ − ȳ(s))/σ) − Φ(√n(μ₀ − k̂ − ȳ(s))/σ)]
= (2πσ²/n)^{1/2}[Φ(k* − z(s)) − Φ(−k* − z(s))]
= (2πσ²/n)^{1/2}k*[φ(k* − z(s)) + φ(k* + z(s))].  (2.3.14)

Hence, from (2.3.13) and (2.3.14), for k̂ > 0,

m(y(s)|q̂)/m(y(s)|π₀) = B₀^{−1/2}(2π)^{1/2}(1/2)[φ(k* − z(s)) + φ(k* + z(s))]exp(B₀z²(s)/2).  (2.3.15)

Combine (2.3.12) and (2.3.15) to get (2.3.9). Next, for k̂ = 0,

m(y(s)|q̂)/m(y(s)|π₀)
= (2πσ²)^{−n/2}exp[−(1/(2σ²))Σ_{i∈s}(y_i − μ₀)²] / ((2πσ²)^{−n/2}B₀^{1/2}exp[−(1/(2σ²)){Σ_{i∈s}(y_i − μ₀)² − n(1 − B₀)(ȳ(s) − μ₀)²}])
= B₀^{−1/2}exp[−(1 − B₀)z²(s)/2].  (2.3.16)

Combine (2.3.12) and (2.3.16) to get (2.3.10). This completes the proof of the theorem.
From Theorem 2.3.1, the robust Bayes estimator of γ(y) = N⁻¹Σ_{i=1}^N y_i is

E[γ(y)|s, y(s)]
= fȳ(s) + N⁻¹Σ_{i∈s̄}{λ̂(z(s))[(1 − B₀)ȳ(s) + B₀μ₀]
+ (1 − λ̂(z(s))) ∫y_i m(y|q̂)dy(s̄)/∫m(y|q̂)dy(s̄)}.  (2.3.17)

But for i ∈ s̄ and k̂ > 0,

∫y_i m(y|q̂)dy(s̄)/∫m(y|q̂)dy(s̄)
= ∫∫ y_i (2πσ²)^{−N/2}exp[−(1/(2σ²))Σ_{j=1}^N(y_j − θ)²]q̂(dθ)dy(s̄) / ∫∫ (2πσ²)^{−N/2}exp[−(1/(2σ²))Σ_{j=1}^N(y_j − θ)²]q̂(dθ)dy(s̄)
= ∫_{μ₀−k̂}^{μ₀+k̂} θ exp[−n(θ − ȳ(s))²/(2σ²)]q̂(dθ) / ∫_{μ₀−k̂}^{μ₀+k̂} exp[−n(θ − ȳ(s))²/(2σ²)]q̂(dθ)
= ȳ(s) − (σ/√n)[φ(k* − z(s)) − φ(k* + z(s))]/[Φ(k* − z(s)) − Φ(−k* − z(s))]
= ȳ(s) − (σ/√n)[φ(k* − z(s)) − φ(k* + z(s))]/{k*[φ(k* − z(s)) + φ(k* + z(s))]}
= ȳ(s) − σ tanh(k*z(s))/(√n k*),  (2.3.18)

where the penultimate equality in (2.3.18) uses (2.3.5), and the last uses φ(k* − z(s))/φ(k* + z(s)) = exp(2k*z(s)).

From (2.3.17) and (2.3.18), one gets for k̂ > 0,

δ^SU(s, y(s)) = ȳ(s) − (1 − f)[λ̂₁(z(s))B₀(ȳ(s) − μ₀) + (1 − λ̂₁(z(s)))σ tanh(k*z(s))/(√n k*)].  (2.3.19)

Similarly, for k̂ = 0, noting that tanh(k*z(s))/k* → z(s) as k* → 0, one gets after some simplifications

δ^SU(s, y(s)) = ȳ(s) − (1 − f)(1 − λ̂₂(z(s))(1 − B₀))(ȳ(s) − μ₀).  (2.3.20)
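Putting (2.3.4), (2.3.5), (2.3.19) and (2.3.20) together, δ^SU can be evaluated as follows (a Python sketch, ours; the root-finder for (2.3.5) is the bisection justified by Lemma 2.3.1, and all names are illustrative):

```python
import math

def _Phi(x): return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
def _phi(x): return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _kstar(absz, iters=100):
    # Unique root k* > |z| of (2.3.5) for |z| > 1, by bisection (Lemma 2.3.1).
    h = lambda x: _Phi(x - absz) - _Phi(-x - absz) - x * (_phi(x - absz) + _phi(x + absz))
    lo, hi = absz, absz + 10.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def delta_su(ybar, mu0, sigma, n, N, B0, eps):
    # Robust Bayes estimate (2.3.19)/(2.3.20) under symmetric unimodal contamination.
    f = n / N
    z = math.sqrt(n) * (ybar - mu0) / sigma
    g = (eps / (1.0 - eps)) / math.sqrt(B0)
    if abs(z) <= 1.0:
        # k-hat = 0: the ML-II contamination collapses to a point mass at mu0.
        lam2 = 1.0 / (1.0 + g * math.exp(-0.5 * (1.0 - B0) * z * z))
        return ybar - (1.0 - f) * (1.0 - lam2 * (1.0 - B0)) * (ybar - mu0)
    ks = _kstar(abs(z))                 # k-hat = k* sigma / sqrt(n)
    lam1 = 1.0 / (1.0 + g * math.sqrt(math.pi / 2.0)
                  * (_phi(ks - z) + _phi(ks + z)) * math.exp(0.5 * B0 * z * z))
    shrink = lam1 * B0 * (ybar - mu0) \
        + (1.0 - lam1) * (sigma / (math.sqrt(n) * ks)) * math.tanh(ks * z)
    return ybar - (1.0 - f) * shrink
```

In either branch the estimate lies between μ₀ and ȳ(s), the shrinkage being damped when the data and the prior disagree.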
Also, generalizing formula (1.8) in Berger and Berliner (1986), one gets after some heavy algebra, for k̂ > 0,

V(γ(y)|s, y(s))
= N⁻²[(N − n)σ² + (N − n)²{λ̂₁σ²/(M₀ + n)
+ (1 − λ̂₁)(σ²/(k*n))tanh(k*z(s)){z(s) − k*⁻¹tanh(k*z(s))}
+ λ̂₁(1 − λ̂₁){B₀(ȳ(s) − μ₀) − σ tanh(k*z(s))/(√n k*)}²}],  (2.3.21)

while for k̂ = 0,

V(γ(y)|s, y(s))
= N⁻²[(N − n)σ² + (N − n)²{λ̂₂σ²/(M₀ + n) + λ̂₂(1 − λ̂₂)(1 − B₀)²(ȳ(s) − μ₀)²}],  (2.3.22)

where λ̂₁ = λ̂₁(z(s)) and λ̂₂ = λ̂₂(z(s)).
Next we study the posterior and procedure robustness of the robust Bayes estimators proposed in this section. For posterior robustness, calculations similar to those of the previous section provide, for k̂ > 0,

ρ(π_B, (s, y(s)), δ^SU) − ρ(π_B, (s, y(s)), δ^B)
= (1 − f)²[(λ̂₁B₀ − B)(ȳ(s) − μ₀) + (1 − λ̂₁)σ tanh(k*z(s))/(√n k*)]²,  (2.3.23)

while for k̂ = 0,

ρ(π_B, (s, y(s)), δ^SU) − ρ(π_B, (s, y(s)), δ^B)
= (1 − f)²[(1 − B) − λ̂₂(1 − B₀)]²(ȳ(s) − μ₀)².  (2.3.24)

Accordingly, from (2.3.23) and (2.3.24), for k̂ > 0,

POR_Γ(δ^SU)
= (1 − f)²max[{λ̂₁B₀(ȳ(s) − μ₀) + (1 − λ̂₁)σ tanh(k*z(s))/(√n k*)}²,
{(λ̂₁B₀ − 1)(ȳ(s) − μ₀) + (1 − λ̂₁)σ tanh(k*z(s))/(√n k*)}²],  (2.3.25)

while for k̂ = 0,
POR_Γ(δ^SU) = (1 − f)²max[(1 − B₀)²λ̂₂², {1 − λ̂₂(1 − B₀)}²](ȳ(s) − μ₀)².  (2.3.26)

In order to study the behavior of the difference in the Bayes risks of δ^SU and δ^B under the subjective N(μ₀, τ²) prior, we need the following lemma, which leads to the asymptotic relationship between k* and |z(s)| as |z(s)| → ∞.

Lemma 2.3.2. (i) For large |z(s)|, say |z(s)| ≥ M₀(> 0), there exists c₀ (0 < c₀ < 1) such that c₀/k* ≤ φ(k* − |z(s)|) + φ(k* + |z(s)|) ≤ 1/k*;
(ii) k* − |z(s)| → +∞ as |z(s)| → +∞;
(iii) for |z(s)| ≥ M₁, 0 < k* − |z(s)| ≤ c*(log k*)^{1/2} for some c*(> 0).

Proof of Lemma 2.3.2. (i) From (2.3.5),

φ(k* − |z(s)|) + φ(k* + |z(s)|) = [Φ(k* − |z(s)|) − Φ(−k* − |z(s)|)]/k* ≤ 1/k*.  (2.3.27)

Further, for |z(s)| ≥ M₀ > 0, using k* > |z(s)|,

Φ(k* − |z(s)|) − Φ(−k* − |z(s)|) = Φ(k* + |z(s)|) − Φ(|z(s)| − k*)
≥ Φ(2|z(s)|) − 1/2 ≥ Φ(2M₀) − 1/2 = c₀ (say).  (2.3.28)

Hence, combining (2.3.27) and (2.3.28), one gets (i).

(ii) From (i),

lim_{|z(s)|→∞} [φ(k* − |z(s)|) + φ(k* + |z(s)|)] = 0,

since k* > |z(s)| → ∞ as |z(s)| → ∞. Hence,

lim_{|z(s)|→∞} φ(k* − |z(s)|) = 0,

that is,

lim_{|z(s)|→∞} (2π)^{−1/2}exp[−(k* − |z(s)|)²/2] = 0.

This implies

lim_{|z(s)|→∞} (k* − |z(s)|)² = +∞,

which implies (ii), since k* > |z(s)| for |z(s)| > 1.

(iii) Using (2.3.5) and (ii),

lim_{|z(s)|→∞} k*φ(k* − |z(s)|) = 1.

This is equivalent to

0 = lim_{|z(s)|→∞} [log(k*/√(2π)) − (k* − |z(s)|)²/2],

which implies

(k* − |z(s)|)² = 2 log(k*/√(2π)) + o(1) as |z(s)| → ∞.

Consequently,

k* − |z(s)| = [2 log(k*/√(2π)) + o(1)]^{1/2} = (2 log k*)^{1/2}(1 + o(1)) as |z(s)| → ∞.

Hence, there exists M₁ such that for |z(s)| ≥ M₁, k* − |z(s)| ≤ c*(log k*)^{1/2} for some c* > 0.

This completes the proof of the lemma.
In order to study the procedure robustness of δ^SU, first find, under the N(μ₀, τ²) prior (denoted by π_B),

r(π_B, δ^SU) − r(π_B, δ^B)
= E[δ^SU(s, y(s)) − δ^B(s, y(s))]²
= (1 − f)²E[{λ̂₁B₀(ȳ(s) − μ₀) + (1 − λ̂₁)σ tanh(k*z(s))/(√n k*)}I_[k̂>0]
+ (1 − λ̂₂(1 − B₀))(ȳ(s) − μ₀)I_[k̂=0] − B(ȳ(s) − μ₀)]².  (2.3.29)

We shall now prove the following theorem.

Theorem 2.3.2. r(π_B, δ^SU) − r(π_B, δ^B) = O(B^{1/2}).

Proof of Theorem 2.3.2. First use the inequality

rhs of (2.3.29)
≤ 3(1 − f)²E[{λ̂₁B₀(ȳ(s) − μ₀) + (1 − λ̂₁)σ tanh(k*z(s))/(√n k*)}²I_[k̂>0]
+ (1 − λ̂₂(1 − B₀))²(ȳ(s) − μ₀)²I_[k̂=0] + B²(ȳ(s) − μ₀)²].  (2.3.30)

Next observe that, marginally, z²(s) = χ²₁/B, so that

E[B²(ȳ(s) − μ₀)²] = E[B²σ²(nB)⁻¹χ²₁] = Bσ²/n = O_e(B).  (2.3.31)

Also, k̂ = 0 if and only if z²(s) ≤ 1, i.e., χ²₁ ≤ B, and since

1 − λ̂₂(1 − B₀) = [B₀ + g exp(−(1 − B₀)z²(s)/2)]/[1 + g exp(−(1 − B₀)z²(s)/2)] ≤ 1,

one gets

E[(1 − λ̂₂(1 − B₀))²(ȳ(s) − μ₀)²I_[k̂=0]] ≤ (σ²/(nB))E[χ²₁I_[χ²₁≤B]].  (2.3.32)

Note that

E[χ²₁I_[χ²₁≤B]] = ∫₀^B x·x^{−1/2}exp(−x/2)/(2^{1/2}Γ(1/2)) dx
≤ (2π)^{−1/2}∫₀^B x^{1/2}dx = (2π)^{−1/2}(2/3)B^{3/2}.  (2.3.33)

Combine (2.3.32) and (2.3.33) to get

lhs of (2.3.32) = O(B^{1/2}).  (2.3.34)

Next use the inequality

E[{λ̂₁B₀(ȳ(s) − μ₀) + (1 − λ̂₁)σ tanh(k*z(s))/(√n k*)}²I_[k̂>0]]
≤ 2E[{λ̂₁²B₀²(ȳ(s) − μ₀)² + (1 − λ̂₁)²(σ²/(nk*²))tanh²(k*z(s))}I_[k̂>0]].  (2.3.35)

Now, writing g′ = g√(π/2), so that λ̂₁ = {1 + g′[φ(k* − |z(s)|) + φ(k* + |z(s)|)]exp(B₀z²(s)/2)}⁻¹,

E[λ̂₁²B₀²(ȳ(s) − μ₀)²I_[k̂>0]]
= B₀²(σ²/n)E[z²(s){1 + g′(φ(k* − |z(s)|) + φ(k* + |z(s)|))exp(B₀z²(s)/2)}⁻²I_[z²(s)>1]].  (2.3.36)

Let K = max(M₀, M₁, 2). Then, writing g″ = c₀g′ and using (i) of Lemma 2.3.2,

rhs of (2.3.36)
≤ B₀²(σ²/n)[E[z²(s)I_[1<z²(s)≤K²]]
+ E[k*²z²(s){k* + g″exp(B₀z²(s)/2)}⁻²I_[z²(s)>K²]]].  (2.3.37)

But

E[z²(s)I_[1<z²(s)≤K²]] = B⁻¹E[χ²₁I_[B<χ²₁≤K²B]]
= B⁻¹∫_B^{K²B} x^{1/2}exp(−x/2)(2π)^{−1/2}dx
≤ (2/3)(K³ − 1)B^{1/2}/√(2π).  (2.3.38)

Also, using (iii) of Lemma 2.3.2 and the bounds k*/(k* + g″exp(B₀z²(s)/2)) ≤ 1 and k* ≤ |z(s)| + c*(log k*)^{1/2},

E[k*²z²(s){k* + g″exp(B₀z²(s)/2)}⁻²I_[z²(s)>K²]]
≤ E[k*z²(s){k* + g″exp(B₀z²(s)/2)}⁻¹I_[z²(s)>K²]]
≤ E[|z(s)|³{g″exp(B₀z²(s)/2)}⁻¹I_[z²(s)>K²]]
+ c*E[z²(s){g″exp(B₀z²(s)/4)}⁻¹I_[z²(s)>K²]],  (2.3.39)

where the last step uses (log k*)^{1/2} = O(exp(B₀z²(s)/4)) for |z(s)| ≥ K. But

E[|z(s)|³exp(−B₀z²(s)/2)I_[z²(s)>K²]]
= E[(χ²₁/B)^{3/2}exp(−B₀χ²₁/(2B))I_[χ²₁>K²B]]
= B^{−3/2}∫_{K²B}^∞ x exp(−(x/2)(B₀B⁻¹ + 1))(2π)^{−1/2}dx
≤ (2π)^{−1/2}B^{−3/2}·4(B₀B⁻¹ + 1)⁻² = O(B^{1/2}).  (2.3.40)

Moreover,

E[z²(s)exp(−B₀z²(s)/4)I_[z²(s)>K²]]
= E[(χ²₁/B)exp(−B₀χ²₁/(4B))I_[χ²₁>K²B]]
= B⁻¹∫_{K²B}^∞ x^{1/2}exp(−(x/2)(B₀(2B)⁻¹ + 1))(2π)^{−1/2}dx = O(B^{1/2}).  (2.3.41)

Combine (2.3.37)-(2.3.41) to conclude that

E[λ̂₁²B₀²(ȳ(s) − μ₀)²I_[k̂>0]] = O(B^{1/2}).  (2.3.42)

Finally,

E[(1 − λ̂₁)²(σ²/(nk*²))tanh²(k*z(s))I_[k̂>0]]
≤ (σ²/n)E[k*⁻²tanh²(k*z(s))I_[|z(s)|>1]]
≤ (σ²/n){E[z²(s)I_[1<|z(s)|≤K]] + E[k*⁻²I_[|z(s)|>K]]},  (2.3.43)

where in the final inequality of (2.3.43) we use |tanh(k*z(s))| ≤ k*|z(s)| for 1 < |z(s)| ≤ K and |tanh(k*z(s))| ≤ 1 for |z(s)| > K. As before,

E[z²(s)I_[1<|z(s)|≤K]] = O(B^{1/2}).  (2.3.44)

Also, since k* > |z(s)|,

E[k*⁻²I_[|z(s)|>K]] ≤ E[z⁻²(s)I_[z²(s)>K²]] = E[B(χ²₁)⁻¹I_[χ²₁>K²B]]
= B∫_{K²B}^∞ x^{−3/2}exp(−x/2)(2π)^{−1/2}dx
≤ B(2π)^{−1/2}∫_{K²B}^∞ x^{−3/2}dx = (2π)^{−1/2}(2/K)B^{1/2} = O(B^{1/2}).  (2.3.45)

From (2.3.43)-(2.3.45),

lhs of (2.3.43) = O(B^{1/2}).

Combine (2.3.30), (2.3.31), (2.3.34), (2.3.35), (2.3.42) and the last display to get the theorem.
Remark 2.3.2. It follows from the above theorem that as n → ∞, i.e., B → 0, under the subjective N(μ₀, τ²) prior, δ^SU, the robust Bayes estimator of γ(y) = N⁻¹Σ_{i=1}^N y_i, is asymptotically optimal in the sense of Robbins (1955).
Next, to find the range of E[γ(y)|s, y(s)] over Γ_ε in (2.3.1), once again applying Sivaganesan and Berger (1989) or Sivaganesan (1988), we get the following theorem.

Theorem 2.3.3.

sup_{π∈Γ_ε} E[γ(y)|s, y(s)] = fȳ(s) + (1 − f)[aδ^{π₀}(y(s)) + H₁(z_u)]/[a + H(z_u)]  (2.3.46)

and

inf_{π∈Γ_ε} E[γ(y)|s, y(s)] = fȳ(s) + (1 − f)[aδ^{π₀}(y(s)) + H₁(z_l)]/[a + H(z_l)],  (2.3.47)

where

H(z) = (2z)⁻¹∫_{μ₀−z}^{μ₀+z} f(y(s)|θ)dθ if z ≠ 0;  H(z) = f(y(s)|μ₀) if z = 0,  (2.3.48)

H₁(z) = (2z)⁻¹∫_{μ₀−z}^{μ₀+z} θf(y(s)|θ)dθ if z ≠ 0;  H₁(z) = μ₀f(y(s)|μ₀) if z = 0,  (2.3.49)

and the values z_l and z_u are the solutions in z of the stationarity equation

H₁′(z)[a + H(z)] = [aδ^{π₀}(y(s)) + H₁(z)]H′(z)  (2.3.50)

for the ratio [aδ^{π₀}(y(s)) + H₁(z)]/[a + H(z)]. (In explicit form, (2.3.50) can be written out in terms of the differences Φ(√n(μ₀ + z − ȳ(s))/σ) − Φ(√n(μ₀ − z − ȳ(s))/σ) and the corresponding differences of N(0, 1) pdfs, but the expression is cumbersome and is omitted here.)

Note that equation (2.3.50) may be solved iteratively for z by taking a number larger than δ^{π₀}(y(s)) as the initial value of z when maximizing, and a number smaller than δ^{π₀}(y(s)) as the initial value of z when minimizing.
2.4 An Example
This section concerns the analysis of a real data set from Fortune magazine (May 1975 and May 1976) to illustrate the methods suggested in Sections 2.2 and 2.3. The data set consists of the 331 U.S. corporations with 1975 gross sales between one-half billion and ten billion dollars. For the complete finite population, the population mean is 1.7283 and the population variance is 2.2788. The population variance is assumed known. We select a 10% simple random sample without replacement from this finite population, so the sample size is n = 33. The sample mean and the corresponding standard error are then readily obtained. We use gross sales in the previous year as prior information to elicit the base prior π₀; the elicited prior π₀ is the N(1.6614, 6.4351 × 10⁻³) distribution. Under this elicited prior π₀, we use formulas (2.2.1) and (2.2.2) to obtain the subjective Bayes estimate and the associated posterior variance. Since we have some uncertainty in π₀ and the prior information, we choose ε = .1 in Γ_ε, and obtain the robust Bayes estimates and the associated posterior variances using formulas (2.2.16), (2.2.17), (2.3.19) and (2.3.21). A number of samples were tried, and we report our analysis for one sample. Table 2.1 provides the classical estimate δ^C, the subjective Bayes estimate δ⁰, the robust
Table 2.1. Estimates, Associated Standard Errors and Posterior Robustness Index

Estimator   Estimate   SE       |γ(y) − δ|       POR
δ^C         1.9881     0.2852   0.2598           8.6506 × 10⁻²
δ⁰          1.7191     0.0105   9.2187 × 10⁻³    7.2386 × 10⁻²
δ^RB        1.7704     0.1459   4.2075 × 10⁻²    4.7416 × 10⁻²
δ^SU        1.7313     0.1184   3.0239 × 10⁻³    6.5948 × 10⁻²
Bayes estimate δ^RB with all possible contaminations, the robust Bayes estimate δ^SU with all symmetric unimodal contaminations, and the respective associated standard errors. Table 2.1 also provides the posterior robustness index for each estimate, which is in a sense the sensitivity index of the estimate as the prior varies over Γ.

From Table 2.1, we find that the robust Bayes estimates δ^SU and δ^RB are well behaved, in the sense that they are closer to γ(y) than at least the classical estimate δ^C, and δ^SU is closer to γ(y) than even the subjective Bayes estimate δ⁰. The robust Bayes estimates δ^SU and δ^RB are also good from the viewpoint of the posterior robustness index.

For the range of the posterior mean of γ(y) = N⁻¹Σ_{i=1}^N y_i over Γ_ε, we find that the posterior mean lies in the interval (1.7120, 1.7868) in the all-possible-contaminations case and in the interval (1.7166, 1.7406) in the symmetric-unimodal-contaminations case. Observe that the second interval is much shorter than the first. Also note that robust inference is achieved for Γ_ε in either case. So if we feel that the true prior is close to a specific one, say π₀, we should model through Γ_ε using one or the other contamination model, gaining robustness in each case as compared to the use of a single subjective prior.
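The subjective and robust Bayes estimates in Table 2.1 can be reproduced (up to rounding of the quoted summary values) from ȳ(s) = 1.9881, read off the δ^C row, as the following sketch (ours) shows:

```python
import math

# Summary quantities from Section 2.4 (sample mean taken from the delta^C row).
ybar, mu0 = 1.9881, 1.6614
sigma_sq, tau0_sq = 2.2788, 6.4351e-3
n, N, eps = 33, 331, 0.1

f = n / N
M0 = sigma_sq / tau0_sq
B0 = M0 / (M0 + n)
z = math.sqrt(n) * (ybar - mu0) / math.sqrt(sigma_sq)

# Subjective Bayes estimate (2.2.1): shrink ybar toward mu0 by (1-f)*B0.
delta0 = ybar - (1.0 - f) * B0 * (ybar - mu0)

# Robust Bayes estimate (2.2.16): the ML-II weight damps the shrinkage.
g = (eps / (1.0 - eps)) / math.sqrt(B0)
lam = 1.0 / (1.0 + g * math.exp(0.5 * B0 * z * z))
delta_rb = ybar - (1.0 - f) * lam * B0 * (ybar - mu0)

print(round(delta0, 4), round(delta_rb, 4))
```

These agree with the table entries 1.7191 and 1.7704 to about 10⁻⁴; the residual discrepancy reflects rounding in the quoted summary values.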
CHAPTER 3
ROBUST BAYES COMPETITORS OF THE RATIO ESTIMATOR
3.1 Introduction
For most sample surveys, information is available for every unit i in the finite population on one or more auxiliary characteristics, that is, characteristics other than the one of direct interest. For example, if the characteristic of direct interest is the yield of a particular crop, the auxiliary characteristic could be the area devoted to that crop by the different farms in the list. We consider the simplest situation, when for every unit i in the population the value of a certain auxiliary characteristic, say x_i(> 0), is known (i = 1, 2, ..., N).
The classical estimator of the finite population mean N⁻¹Σ_{i=1}^N y_i in such cases is the ratio estimator e_R = N⁻¹(Σ_{i∈s}y_i/Σ_{i∈s}x_i)Σ_{i=1}^N x_i, which incorporates the auxiliary information in a very natural manner. Moreover, this estimator can be justified from both the model-based and the design-based approach. While Cochran (1977) provides many design-based properties of the ratio estimator, Royall (1970, 1971) justifies this estimator based on certain superpopulation models.
The ratio estimator can also be justified from a Bayesian viewpoint. To see this, consider the superpopulation model y_i = βx_i + e_i, where the e_i are independent N(0, σ²x_i), i = 1, 2, ..., N, while β is uniform over (−∞, ∞). In the above, σ²(> 0) may or may not be known. For unknown σ², one assigns an independent prior distribution to σ². Then, based on s and y(s), the posterior (conditional) mean of γ(y) = N⁻¹Σ_{i=1}^N y_i is given by e_R.
It is clear from the above that the ratio estimator can possibly be improved upon when one has more precise prior information about β. For example, if one wants to predict the mean yield of a certain crop based on a sample of farms, it is conceivable, on the basis of past experience, to specify a prior distribution for β with fairly accurate mean β₀ and variance τ₀². Ericson (1969) has indeed shown that with a N(β₀, τ₀²) prior for β, the Bayes predictor e_E of γ(y) is given by

e_E(s, y(s)) = N⁻¹[Σ_{i∈s}y_i + {(σ⁻²Σ_{i∈s}y_i + τ₀⁻²β₀)/(σ⁻²Σ_{i∈s}x_i + τ₀⁻²)}Σ_{i∉s}x_i].  (3.1.1)

Clearly e_E converges to e_R as τ₀ → ∞, that is, when the prior information is vague.
The above Bayesian approach has been frequently criticized on the ground that it presupposes an ability to quantify subjective information completely and accurately in terms of a single prior distribution. We shall see in the next section that failure to specify accurately one or both of the parameters β₀ and τ₀ can have serious consequences for the Bayes risk, and protection is often needed against the possibility of such an occurrence. A robust Bayesian viewpoint assumes that subjective information can be quantified only in terms of a class Γ of possible distributions. Inferences and decisions should be relatively insensitive to deviations as the prior distribution varies over Γ.
In this chapter, we consider an ε-contamination class of priors for β following the lines of Berger and Berliner (1986). In Section 3.2, the ε-contamination class includes all possible distributions. For every member within this class, the Bayes predictor (posterior mean) of γ(y) is obtained, and expressions for the variation of the posterior mean within this class of priors are given following Sivaganesan and Berger (1989) and Sivaganesan (1988). Moreover, the ML-II prior (see Good (1965) or Berger and Berliner (1986)) within this contamination class of priors is found. We also provide analytical expressions for the indices of posterior and procedure robustness (cf. Berger (1984)) of the proposed robust Bayes predictors based on ML-II priors for an entire class of priors, and compare these indices with similar indices found for the usual ratio estimator as well as for the subjective Bayes predictor. In the course of calculating the indices of procedure robustness, we prove the asymptotic optimality, in the sense of Robbins (1955), of the robust Bayes predictor, and also point out that the subjective Bayes predictor does not possess the asymptotic optimality property.
The above program is repeated in Section 3.3, with the exception that the contamination class now contains only symmetric unimodal distributions. Once again, robust Bayes predictors are found, and their optimality is studied.
Finally, in Section 3.4, a numerical example is provided to illustrate the results of the preceding sections.
The summary of our findings is that the robust Bayes predictors enjoy both posterior and procedure robustness for a fairly general class of priors. The subjective Bayes predictor enjoys posterior robustness, but fails miserably under the criterion of procedure robustness. The classical ratio estimator is inferior to the others under the criterion of posterior robustness, but enjoys a certain amount of procedure robustness. Thus, our recommendation is to give the robust Bayes predictors every serious consideration if one is concerned with both Bayesian and frequentist robustness. The other important finding is that when the sampling fraction goes to zero, that is, when we are essentially back to an infinite population, several asymptotic optimality properties of the estimators of Berger and Berliner (1986) are established which are hitherto unknown in the literature.
Royall and Pfeffermann (1982) have addressed a different robustness issue. They have shown that under the superpopulation models (i) y_i ~ind N(α + βx_i, σ²x_i) and (ii) y_i ~ind N(βx_i, σ²x_i), if β is uniform over (−∞, ∞), then under balanced samples, that is, x̄(s) = x̄, the posterior distribution of y(s̄) given (s, y(s)) is the same under the two models, the resulting estimator of the finite population mean being the ratio estimator, which equals the sample mean in this case.
For simplicity, we shall assume that n(s) ≠ n implies p(s) = 0, that is, we effectively consider only samples of fixed size n. Also, for notational simplicity, we shall henceforth assume that s = {i₁, ..., iₙ}, where 1 ≤ i₁ < ... < iₙ ≤ N. Let s̄ = {1, 2, ..., N} − s = {j₁, ..., j_{N−n}} (say), where 1 ≤ j₁ < ... < j_{N−n} ≤ N.

We shall write y(s) = (y_{i₁}, ..., y_{iₙ})ᵀ, y(s̄) = (y_{j₁}, ..., y_{j_{N−n}})ᵀ, x(s) = (x_{i₁}, ..., x_{iₙ})ᵀ, D(s) = Diag(x_{i₁}, ..., x_{iₙ}), x(s̄) = (x_{j₁}, ..., x_{j_{N−n}})ᵀ, and D(s̄) = Diag(x_{j₁}, ..., x_{j_{N−n}}).
Note that, writing M₀ = σ²/τ₀², B₀ ≡ B₀(s) = M₀(M₀ + nx̄(s))⁻¹ and f = n/N (the sampling fraction), the Bayes predictor e_E given in (3.1.1) can alternately be written as

e_{B₀}(s, y(s)) = fȳ(s) + (1 − f)x̄(s̄)[(1 − B₀(s))ȳ(s)/x̄(s) + B₀(s)β₀].  (3.1.2)

In later sections, we shall repeatedly use (3.1.2). Also, the associated posterior variance is given by

V(γ(y)|s, y(s)) = σ²N⁻²[(N − n)x̄(s̄) + (N − n)²x̄²(s̄)/(M₀ + nx̄(s))].  (3.1.3)
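A small numerical check (ours; the data values are illustrative) that (3.1.2) is indeed an alternate form of (3.1.1), and that the predictor tends to the ratio estimator as τ₀² → ∞:

```python
def ericson_predictor(y_s, x_s, x_all, beta0, sigma_sq, tau0_sq):
    # Ericson's Bayes predictor (3.1.1) under y_i = beta*x_i + e_i,
    # e_i ~ N(0, sigma^2 x_i), beta ~ N(beta0, tau0^2).
    N = len(x_all)
    post_beta = (sum(y_s) / sigma_sq + beta0 / tau0_sq) / (sum(x_s) / sigma_sq + 1.0 / tau0_sq)
    x_rest = sum(x_all) - sum(x_s)           # auxiliary total over the nonsampled units
    return (sum(y_s) + post_beta * x_rest) / N

def eB0_predictor(y_s, x_s, x_all, beta0, sigma_sq, tau0_sq):
    # The same predictor in the shrinkage form (3.1.2).
    N, n = len(x_all), len(y_s)
    f = n / N
    ybar, xbar_s = sum(y_s) / n, sum(x_s) / n
    xbar_rest = (sum(x_all) - sum(x_s)) / (N - n)
    M0 = sigma_sq / tau0_sq
    B0 = M0 / (M0 + n * xbar_s)
    return f * ybar + (1.0 - f) * xbar_rest * ((1.0 - B0) * ybar / xbar_s + B0 * beta0)
```

The two forms agree exactly, and for very large τ₀² both reduce to e_R = N⁻¹(Σ_{i∈s}y_i/Σ_{i∈s}x_i)Σ_{i=1}^N x_i.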
3.2 ε-Contamination Model and the ML-II Prior
Suppose that, conditional on β, the y_i are independent N(βx_i, σ²x_i), i = 1, 2, ..., N. Also, let β have the prior distribution π = (1 − ε)π₀ + εq, where ε ∈ [0, 1) is known, π₀ is the N(β₀, τ₀²) distribution, and q ∈ Q, the class of all possible distributions. Then the marginal pdf of y(s) is given by

m(y(s)|π) = (1 − ε)m(y(s)|π₀) + εm(y(s)|q),  (3.2.1)

where m(y(s)|π₀) denotes the N(β₀x(s), σ²D(s) + τ₀²x(s)xᵀ(s)) pdf, while

m(y(s)|q) = ∫(2πσ²)^{−n/2}(Π_{i∈s}x_i)^{−1/2}exp[−(1/(2σ²))Σ_{i∈s}(y_i − βx_i)²/x_i]q(dβ).  (3.2.2)

The posterior pdf of β given (s, y(s)) is now

π(β|s, y(s)) = λ(y(s))π₀(β|s, y(s)) + (1 − λ(y(s)))q(β|s, y(s)),  (3.2.3)

where

λ(y(s)) = (1 − ε)m(y(s)|π₀)/m(y(s)|π),  (3.2.4)

π₀(β|s, y(s)) denotes the N((1 − B₀(s))ȳ(s)/x̄(s) + B₀(s)β₀, σ²(M₀ + nx̄(s))⁻¹) pdf, and

q(β|s, y(s)) = f(y(s)|β)q(β)/m(y(s)|q).  (3.2.5)

This leads to the posterior pdf of y(s̄) given (s, y(s)):

π(y(s̄)|s, y(s)) = ∫f(y(s̄)|β)π(β|s, y(s))dβ
= λ(y(s))π₀(y(s̄)|s, y(s)) + (1 − λ(y(s)))q(y(s̄)|s, y(s)),  (3.2.6)

where π₀(y(s̄)|s, y(s)) is the N(((1 − B₀(s))ȳ(s)/x̄(s) + B₀(s)β₀)x(s̄), σ²(D(s̄) + (M₀ + nx̄(s))⁻¹x(s̄)xᵀ(s̄))) pdf while, using (3.2.5),

q(y(s̄)|s, y(s)) = m(y|q)/m(y(s)|q).  (3.2.7)

Then the posterior mean of γ(y) is given by
E[γ(y)|s, y(s)]
= fȳ(s) + (1 − f)x̄(s̄)·[(1 − ε)m(y(s)|π₀)δ^{π₀}(y(s)) + ε∫βf(y(s)|β)q(dβ)]/[(1 − ε)m(y(s)|π₀) + εm(y(s)|q)],  (3.2.8)

where

δ^{π₀}(y(s)) = (1 − B₀(s))ȳ(s)/x̄(s) + B₀(s)β₀.  (3.2.9)

From Sivaganesan and Berger (1989) or Sivaganesan (1988), it follows that

sup(inf)_{π∈Γ} E[γ(y)|s, y(s)] = fȳ(s) + (1 − f)x̄(s̄)
× sup(inf)_β [(1 − ε)m(y(s)|π₀)δ^{π₀}(y(s)) + εβf(y(s)|β)]/[(1 − ε)m(y(s)|π₀) + εf(y(s)|β)].  (3.2.10)

Hence, following Sivaganesan (1988), we get

sup_{π∈Γ} E[γ(y)|s, y(s)]
= fȳ(s) + (1 − f)x̄(s̄)[(1 − ε)m(y(s)|π₀)δ^{π₀}(y(s)) + εβ_U f(y(s)|β_U)]/[(1 − ε)m(y(s)|π₀) + εf(y(s)|β_U)],  (3.2.11)

while inf_{π∈Γ} E[γ(y)|s, y(s)] has an expression similar to (3.2.11), with β_L replacing β_U. In the above, β_U and β_L (β_L < β_U) are given by β_U = ȳ(s)/x̄(s) + v_U σ(nx̄(s))^{−1/2} and β_L = ȳ(s)/x̄(s) + v_L σ(nx̄(s))^{−1/2}, where v_U and v_L (< v_U) are the solutions in v of the equation

exp(v²/2) − c(v² − 1) − bv = 0,  (3.2.12)

where c = a(2πσ²)^{n/2}(Π_{i∈s}x_i^{1/2})exp[(1/(2σ²))Σ_{i∈s}(y_i − x_iȳ(s)/x̄(s))²/x_i], b = c(nx̄(s))^{1/2}(ȳ(s)/x̄(s) − δ^{π₀}(y(s)))/σ, and a = ε⁻¹(1 − ε)m(y(s)|π₀). We shall use (3.2.11) in Section 3.4 for numerical evaluation of the supremum and infimum of the posterior mean under the given class of priors.
Next we find the ML-II prior within the given class of priors. Since Σ_{i∈s}(y_i − βx_i)²/x_i is minimized with respect to β at β̂ = ȳ(s)/x̄(s), from (3.2.1) and (3.2.2) the ML-II prior, which maximizes the marginal likelihood m(y(s)|π) with respect to q ∈ Q, is given by

π̂_ML(β) = (1 − ε)π₀(β) + εq̂(β),  (3.2.13)

where q̂(β) is degenerate at β̂ = ȳ(s)/x̄(s). The posterior pdf of y(s̄) given (s, y(s)) under the ML-II prior π̂_ML is now given by

π̂_ML(y(s̄)|s, y(s))
= λ̂_ML(ȳ(s))π₀(y(s̄)|s, y(s)) + (1 − λ̂_ML(ȳ(s)))·{N((ȳ(s)/x̄(s))x(s̄), σ²D(s̄)) pdf},  (3.2.14)

where, for 0 < ε < 1, after some algebraic simplification,

λ̂_ML(ȳ(s)) = {1 + ε(1 − ε)⁻¹m(y(s)|q̂)/m(y(s)|π₀)}⁻¹
= {1 + ε(1 − ε)⁻¹B₀^{−1/2}(s)exp[nB₀(s)(ȳ(s) − β₀x̄(s))²/(2σ²x̄(s))]}⁻¹.  (3.2.15)

The robust Bayes predictor of γ(y) under the ML-II prior π̂_ML then simplifies to

e_RB(s, y(s)) = fȳ(s) + (1 − f)x̄(s̄){(1 − λ̂_ML(ȳ(s))B₀(s))ȳ(s)/x̄(s) + λ̂_ML(ȳ(s))B₀(s)β₀}.  (3.2.16)

Also, generalizing formula (1.8) in Berger and Berliner (1986), one gets the associated posterior variance

V(γ(y)|s, y(s))
= N⁻²[σ²(N − n)x̄(s̄) + (N − n)²x̄²(s̄){σ²λ̂_ML/(M₀ + nx̄(s))
+ λ̂_ML(1 − λ̂_ML)B₀²(s)(ȳ(s)/x̄(s) − β₀)²}].  (3.2.17)
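Since λ̂_ML ∈ (0, 1), the predictor (3.2.16) always lies between the ratio estimator e_R (the λ̂_ML = 0 extreme) and the subjective predictor e_{B₀} (the λ̂_ML = 1 extreme). A Python sketch (ours, with illustrative data):

```python
import math

def e_rb(y_s, x_s, x_all, beta0, sigma_sq, tau0_sq, eps):
    # Robust Bayes predictor (3.2.16) under the ML-II prior: the posterior
    # shrinkage toward beta0 is damped by the data-dependent weight (3.2.15).
    N, n = len(x_all), len(y_s)
    f = n / N
    ybar, xbar_s = sum(y_s) / n, sum(x_s) / n
    xbar_rest = (sum(x_all) - sum(x_s)) / (N - n)
    M0 = sigma_sq / tau0_sq
    B0 = M0 / (M0 + n * xbar_s)
    expo = n * B0 * (ybar - beta0 * xbar_s) ** 2 / (2.0 * sigma_sq * xbar_s)
    lam = 1.0 / (1.0 + eps / (1.0 - eps) / math.sqrt(B0) * math.exp(expo))
    bhat = ybar / xbar_s
    return f * ybar + (1.0 - f) * xbar_rest * ((1.0 - lam * B0) * bhat + lam * B0 * beta0)
```

The farther ȳ(s)/x̄(s) is from β₀, the smaller λ̂_ML becomes, and the closer the predictor moves to e_R.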
e_B(s, y(s)) = fȳ(s) + (1 − f)x̄(s̄)[(1 − B(s))ȳ(s)/x̄(s) + B(s)β₀],  (3.2.18)

where B ≡ B(s) = M/(M + nx̄(s)), M = σ²/τ².

The choice of the above class of priors may be justified as follows. Very often, based on prior elicitation, one can take a fairly accurate guess at the prior mean. However, the same need not be true for the prior variance, where there is a greater chance of vagueness. Note that when τ² ≠ τ₀², none of the estimators e_R, e_{B₀} or e_RB is the optimal (Bayes) estimator.
Based on Definition 2.2.1 introduced in Chapter 2, we examine the posterior robustness indices of e_RB, e_{B₀} and e_R. Note that whether or not posterior robustness obtains will often depend on which (s, y(s)) is observed. This will be revealed in the subsequent calculations.

To this end, first note that under the N(β₀, τ²) prior, denoted by π_{τ²}, the posterior risk of any estimator e of γ(y) is

ρ(π_{τ²}, (s, y(s)), e) = ρ(π_{τ²}, (s, y(s)), e_B) + (e − e_B)²,  (3.2.19)

where e_B is given in (3.2.18). Using Definition 2.2.1 and (3.2.19), one gets for the class Γ = {π_{τ²} : τ² > 0} of priors

POR_Γ(e_R) = sup_{0<B(s)<1} (1 − f)²x̄²(s̄)B²(s)[ȳ(s)/x̄(s) − β₀]²
= (1 − f)²x̄²(s̄)[ȳ(s)/x̄(s) − β₀]²;  (3.2.20)

POR_Γ(e_{B₀}) = sup_{0<B(s)<1} (1 − f)²x̄²(s̄)(B(s) − B₀(s))²[ȳ(s)/x̄(s) − β₀]²
= (1 − f)²x̄²(s̄)max[B₀²(s), (1 − B₀(s))²][ȳ(s)/x̄(s) − β₀]²;  (3.2.21)

POR_Γ(e_RB) = sup_{0<B(s)<1} (1 − f)²x̄²(s̄)(λ̂_ML(ȳ(s))B₀(s) − B(s))²[ȳ(s)/x̄(s) − β₀]²
= (1 − f)²x̄²(s̄)max[{λ̂_ML(ȳ(s))B₀(s)}², {1 − λ̂_ML(ȳ(s))B₀(s)}²][ȳ(s)/x̄(s) − β₀]².  (3.2.22)

Note from (3.2.20)-(3.2.22) that if we allowed as priors all possible distributions N(β*, τ²), where β* may be widely different from β₀, all the POR indices could become prohibitively large. It follows from (3.2.20)-(3.2.22) that both e_{B₀} and e_RB are superior to e_R in terms of posterior robustness. However, the ratio

POR_Γ(e_{B₀})/POR_Γ(e_RB)
= max[B₀², (1 − B₀)²]/max[{λ̂_ML(ȳ(s))B₀}², {1 − λ̂_ML(ȳ(s))B₀}²]  (3.2.23)

can take values both larger and smaller than 1, depending on the particular (s, y(s)).
Although the Bayesian thinks conditionally on (s, y(s)), it seems quite sensible to use the overall Bayes risk as a suitable robustness criteria, at least at a preexperimental stage. This issue is also addressed in Berger (1984) who introduced also the criterion of procedure robustness. We use Definition 2.3.2 to study the procedure robustness indices of eRB, eBo and eR.
Simple calculations yield for the class $\Gamma=\{\pi_{\tau^2}:\tau^2>0\}$ of priors
\[
\mathrm{PR}_\Gamma(e_R)=\sup_{\tau^2>0}(1-f)^2\bar x^2(\bar s)\sigma^2(n\bar x(s))^{-1}B(s)=(1-f)^2\bar x^2(\bar s)\sigma^2(n\bar x(s))^{-1}; \tag{3.2.24}
\]
\[
\mathrm{PR}_\Gamma(e_{B_0})=\sup_{\tau^2>0}(1-f)^2\bar x^2(\bar s)\sigma^2(n\bar x(s))^{-1}(B_0(s)-B(s))^2/B(s)=+\infty; \tag{3.2.25}
\]
\[
\mathrm{PR}_\Gamma(e_{RB})=\sup_{\tau^2>0}(1-f)^2\bar x^2(\bar s)E[(B_0(s)\hat\lambda_{ML}(\bar y(s))-B(s))^2(\bar y(s)/\bar x(s)-\beta_0)^2]. \tag{3.2.26}
\]
It is thus clear that the subjective Bayes predictor eBo lacks procedure robustness, while the ratio estimator eR is quite procedure robust. The procedure robustness of eRB can be examined on the basis of the following theorem.
Theorem 3.2.1 $E[(B_0(s)\hat\lambda_{ML}(\bar y(s))-B(s))^2(\bar y(s)/\bar x(s)-\beta_0)^2]=O_e(B^{1/2}(s))$ for every $\epsilon>0$, where $O_e$ denotes the exact order.
Proof of Theorem 3.2.1 Noting that $n\bar x(s)(\bar y(s)/\bar x(s)-\beta_0)^2\sim(\sigma^2/B)\chi_1^2$, it follows from (3.2.26) that
\[
E[(B_0\hat\lambda_{ML}(\bar y(s))-B)^2(\bar y(s)/\bar x(s)-\beta_0)^2]
=\frac{\sigma^2}{n\bar x(s)B}\int_0^\infty\Big\{\frac{B_0}{1+g\exp(uB_0/(2B))}-B\Big\}^2u\,\frac{u^{1/2-1}\exp(-u/2)}{2^{1/2}\Gamma(1/2)}\,du, \tag{3.2.27}
\]
where $g=(\epsilon/(1-\epsilon))B_0^{-1/2}$. Next observe that
\[
\text{rhs of (3.2.27)}\le\frac{\sigma^2}{n\bar x(s)}\Big[\frac{B_0^2}{2g}B^{-1}(1+B_0B^{-1})^{-3/2}+B\Big]=O(B^{1/2}). \tag{3.2.28}
\]
Again, writing $g'=\max(g,1)$,
\[
\text{rhs of (3.2.27)}\ge\frac{\sigma^2}{n\bar x(s)}\Big[\Big(\frac{B_0}{2g'}\Big)^2B^{-1}(1+2B_0B^{-1})^{-3/2}-2B_0g(1+B_0B^{-1})^{-3/2}-B\Big]=O(B^{1/2}). \tag{3.2.29}
\]
Combining (3.2.28) and (3.2.29), the result follows.
When $f\to0$, we get a result related to the procedure robustness of the robust Bayes procedure of Berger and Berliner (1986). To our knowledge, such a result is the first of its kind. In addition, as $n\to\infty$, that is, $B\to0$, it shows that the robust Bayes procedure is asymptotically optimal. In view of (3.2.24) and (3.2.26) and Theorem 3.2.1 (i.e., before taking the supremum over $B$), it appears that $e_R$ has a distinct advantage over $e_{RB}$ for small $B$. This is not surprising, though, since small $B$ signifies small $M=\sigma^2/\tau^2$, which amounts to greater instability in the assessment of the prior distribution of $\beta$ relative to the superpopulation model. In such circumstances, it is safer to use $e_R$ for estimating $\gamma(y)$ if one is seriously concerned about the long-run performance of the estimator.
A different point to note is that, in reality, $\sigma^2$ is not usually known. In such situations, however, one can conceive of an inverse gamma prior for $\sigma^2$, independent of the prior for $\beta$, to derive a Bayes estimator of $\gamma(y)$. In a robust Bayes approach, if one assumes a mixture of normal-gamma priors for $(\beta,\sigma^2)$, then the ML-II prior puts its mass on the point $\big(\bar y(s)/\bar x(s),\,n^{-1}\sum_{i\in s}(y_i-x_i\bar y(s)/\bar x(s))^2/x_i\big)$. We have not, however, pursued the resulting Bayesian analysis here.
It should be noted, though, that the class $\mathcal{Q}$ contains many priors which should possibly not come under consideration. In the following section, instead of $\mathcal{Q}$, we consider the class $\mathcal{Q}_{SU}$, which contains only symmetric unimodal distributions.
3.3 Symmetric Unimodal Contamination
Suppose now that the contaminated prior distribution is given by
\[
\pi(\beta)=(1-\epsilon)\pi_0(\beta)+\epsilon q(\beta), \tag{3.3.1}
\]
where $q\in\mathcal{Q}_{SU}$, the class of all symmetric (about $\beta_0$) unimodal distributions, and $\pi_0$ is as in Section 3.2. The expression for $E[\gamma(y)\mid s,y(s)]$ is the same as the one given in (3.2.8). Using the fact that any symmetric unimodal distribution is a mixture of symmetric uniform distributions, as in Sivaganesan and Berger (1989), one gets
\[
\sup_{\pi\in\Gamma}(\inf_{\pi\in\Gamma})E[\gamma(y)\mid s,y(s)]=f\bar y(s)+(1-f)\bar x(\bar s)\times\sup_k(\inf_k)\frac{(1-\epsilon)m(y(s)\mid\pi_0)\delta^{\pi_0}(y(s))+\epsilon H_1(k)}{(1-\epsilon)m(y(s)\mid\pi_0)+\epsilon H(k)}, \tag{3.3.2}
\]
where $m(y(s)\mid\pi_0)$ and $\delta^{\pi_0}(y(s))$ are given by (3.2.1) and (3.2.9) respectively, and
\[
H(k)=(2k)^{-1}\int_{\beta_0-k}^{\beta_0+k}f(y(s)\mid\beta)\,d\beta \ \text{ if } k\ne0;\qquad H(k)=f(y(s)\mid\beta_0) \ \text{ if } k=0; \tag{3.3.3}
\]
\[
H_1(k)=(2k)^{-1}\int_{\beta_0-k}^{\beta_0+k}\beta f(y(s)\mid\beta)\,d\beta \ \text{ if } k\ne0;\qquad H_1(k)=\beta_0 f(y(s)\mid\beta_0) \ \text{ if } k=0. \tag{3.3.4}
\]
Using the expression $f(y(s)\mid\beta)=(2\pi\sigma^2)^{-n/2}\big(\prod_{i\in s}x_i\big)^{-1/2}\exp[-(2\sigma^2)^{-1}\sum_{i\in s}(y_i-\beta x_i)^2/x_i]$, it follows after some simple algebra that
\[
\begin{aligned}
H(k)&=\exp\Big[-\frac{1}{2\sigma^2}\sum_{i\in s}(y_i-x_i\bar y(s)/\bar x(s))^2/x_i\Big](2\pi\sigma^2)^{-(n-1)/2}\Big(\prod_{i\in s}x_i\Big)^{-1/2}(2k)^{-1}(n\bar x(s))^{-1/2}\\
&\quad\times\Big[\Phi\Big(\frac{\beta_0+k-\bar y(s)/\bar x(s)}{\sigma(n\bar x(s))^{-1/2}}\Big)-\Phi\Big(\frac{\beta_0-k-\bar y(s)/\bar x(s)}{\sigma(n\bar x(s))^{-1/2}}\Big)\Big]
\end{aligned} \tag{3.3.5}
\]
and
\[
\begin{aligned}
H_1(k)&=H(k)\bar y(s)/\bar x(s)+\exp\Big[-\frac{1}{2\sigma^2}\sum_{i\in s}(y_i-x_i\bar y(s)/\bar x(s))^2/x_i\Big](2k)^{-1}(2\pi\sigma^2)^{-(n-1)/2}\Big(\prod_{i\in s}x_i\Big)^{-1/2}\sigma(n\bar x(s))^{-1}\\
&\quad\times\Big[\phi\Big(\frac{\beta_0-k-\bar y(s)/\bar x(s)}{\sigma(n\bar x(s))^{-1/2}}\Big)-\phi\Big(\frac{\beta_0+k-\bar y(s)/\bar x(s)}{\sigma(n\bar x(s))^{-1/2}}\Big)\Big]. \tag{3.3.6}
\end{aligned}
\]
Write $z=z(s)=(\bar y(s)/\bar x(s)-\beta_0)(n\bar x(s))^{1/2}/\sigma$, $k_0=k(n\bar x(s))^{1/2}/\sigma$, $t=\phi(k_0-z(s))+\phi(k_0+z(s))$, $u=\phi(k_0-z(s))-\phi(k_0+z(s))$, $w=\Phi(k_0-z(s))-\Phi(-k_0-z(s))$, $a=\epsilon^{-1}(1-\epsilon)m(y(s)\mid\pi_0)$ and $a_1=a\,\delta^{\pi_0}(y(s))$. Now, using (3.3.2)--(3.3.6) and solving $\frac{d}{dk}[(a_1+H_1(k))/(a+H(k))]=0$, it follows after some heavy algebra that
\[
\sup_{\pi\in\Gamma}E[\gamma(y)\mid s,y(s)]=f\bar y(s)+(1-f)\bar x(\bar s)(a_1+H_1(k_U))/(a+H(k_U)) \tag{3.3.7}
\]
and
\[
\inf_{\pi\in\Gamma}E[\gamma(y)\mid s,y(s)]=f\bar y(s)+(1-f)\bar x(\bar s)(a_1+H_1(k_L))/(a+H(k_L)), \tag{3.3.8}
\]
where $k_U$ and $k_L(<k_U)$ are the two solutions of the quadratic equation
\[
2k[auk+t(a\beta_0-a_1)+Gu/2]
=[w\bar y(s)/\bar x(s)-u\sigma(n\bar x(s))^{-1/2}][Gt+2a\sigma(n\bar x(s))^{1/2}]
-w(2a_1\sigma(n\bar x(s))^{1/2}+G\beta_0t), \tag{3.3.9}
\]
where
\[
G=\exp\Big[-\frac{1}{2\sigma^2}\sum_{i\in s}(y_i-x_i\bar y(s)/\bar x(s))^2/x_i\Big](2\pi\sigma^2)^{-(n-1)/2}\Big(\prod_{i\in s}x_i\Big)^{-1/2}(n\bar x(s))^{-1/2}. \tag{3.3.10}
\]
The formulas (3.3.7)--(3.3.10) will be utilized in Section 3.4 for numerical computations.
Next we find the ML-II prior in this case. Since any symmetric unimodal distribution is a mixture of symmetric uniform distributions, for finding the ML-II prior it suffices to restrict oneself to $\mathcal{Q}'=\{G_k:G_k\text{ is uniform}(\beta_0-k,\beta_0+k),\,k>0\}$. The ML-II prior is then given by
\[
\hat\pi=(1-\epsilon)\pi_0+\epsilon\hat q, \tag{3.3.11}
\]
where $\hat q$ is uniform$(\beta_0-\hat k,\beta_0+\hat k)$, $\hat k$ being the value of $k$ which maximizes $m(y(s)\mid q)$.

To find $\hat k$, first write
\[
\begin{aligned}
m(y(s)\mid q)&=(2k)^{-1}\int_{\beta_0-k}^{\beta_0+k}(2\pi\sigma^2)^{-n/2}\Big(\prod_{i\in s}x_i\Big)^{-1/2}\exp\Big[-\frac{1}{2\sigma^2}\sum_{i\in s}(y_i-\beta x_i)^2/x_i\Big]d\beta\\
&=(2\pi\sigma^2)^{-(n-1)/2}(2k)^{-1}(n\bar x(s))^{-1/2}\Big(\prod_{i\in s}x_i\Big)^{-1/2}\exp\Big[-\frac{1}{2\sigma^2}\sum_{i\in s}(y_i-x_i\bar y(s)/\bar x(s))^2/x_i\Big]\\
&\quad\times\big[\Phi\big(\sqrt{n\bar x(s)}\,k/\sigma-z(s)\big)-\Phi\big(-\sqrt{n\bar x(s)}\,k/\sigma-z(s)\big)\big], \tag{3.3.12}
\end{aligned}
\]
where $z(s)=(n\bar x(s))^{1/2}(\bar y(s)/\bar x(s)-\beta_0)/\sigma$. Differentiating $m(y(s)\mid q)$ with respect to $k$ and setting the derivative equal to zero, it follows from Berger and Sellke (1987) that
\[
\hat k=k^*\sigma/\sqrt{n\bar x(s)} \ \text{ if } |z(s)|>1;\qquad \hat k=0 \ \text{ if } |z(s)|\le1, \tag{3.3.13}
\]
where $k^*$ is a solution of the equation
\[
\Phi(k^*-|z(s)|)-\Phi(-k^*-|z(s)|)=k^*[\phi(k^*-|z(s)|)+\phi(k^*+|z(s)|)]. \tag{3.3.14}
\]
Remark 3.3.1 Clearly $k^*=0$ is a solution of (3.3.14). Berger (1985, p. 234) points out that there exists a unique solution $k^*(>0)$ of (3.3.14) for $|z(s)|>1$. Lemma 2.3.1 contains the stronger result that there exists a unique solution $k^*>|z(s)|$ of (3.3.14) for $|z(s)|>1$.
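Equation (3.3.14) has no closed-form solution, but for a given $|z(s)|>1$ the positive root $k^*$ is easily found numerically. The following sketch uses bisection (the bracket $[10^{-6}, 50]$ and the iteration count are our choices, not from the text), exploiting the fact that the left side minus the right side of (3.3.14) is negative near $0$ and positive for large $k^*$:

```python
import math

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def solve_kstar(abs_z, lo=1e-6, hi=50.0, iters=200):
    """Bisection for the positive root k* of (3.3.14):
    Phi(k - |z|) - Phi(-k - |z|) = k * (phi(k - |z|) + phi(k + |z|)),
    intended for |z(s)| > 1."""
    def h(k):
        return (Phi(k - abs_z) - Phi(-k - abs_z)
                - k * (phi(k - abs_z) + phi(k + abs_z)))
    # For |z| > 1, h < 0 near 0 and h > 0 for large k, so a root is bracketed.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Consistent with the remark above, for $|z(s)|=2$ the computed root lies above $|z(s)|$.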
Under the ML-II prior $\hat\pi$, the posterior distribution of $\beta$ given $(s,y(s))$ is
\[
\hat\pi(\beta\mid s,y(s))=\lambda_s(z(s))\pi_0(\beta\mid s,y(s))+(1-\lambda_s(z(s)))f(y(s)\mid\beta)\hat q(\beta)/m(y(s)\mid\hat q), \tag{3.3.15}
\]
where for $\hat k>0$,
\[
\lambda_s(z(s))=\Big[1+\epsilon(1-\epsilon)^{-1}(2\pi/B_0)^{1/2}(2k^*)^{-1}\{\Phi(k^*-z(s))-\Phi(-k^*-z(s))\}\exp(B_0z^2(s)/2)\Big]^{-1}=\lambda_1(z(s)) \ \text{(say)}, \tag{3.3.16}
\]
while for $\hat k=0$,
\[
\lambda_s(z(s))=\Big[1+\epsilon(1-\epsilon)^{-1}B_0^{-1/2}\exp(-(1-B_0)z^2(s)/2)\Big]^{-1}=\lambda_2(z(s)) \ \text{(say)}. \tag{3.3.17}
\]
The robust Bayes predictor of $\gamma(y)$ under the ML-II prior $\hat\pi$ is then given by
\[
\begin{aligned}
e_{SU}(s,y(s))&=f\bar y(s)+(1-f)\bar x(\bar s)\Big[\lambda_1(z(s))\{(1-B_0(s))\bar y(s)/\bar x(s)+B_0(s)\beta_0\}\\
&\quad+(1-\lambda_1(z(s)))\Big\{\bar y(s)/\bar x(s)-\frac{\sigma}{k^*\sqrt{n\bar x(s)}}\tanh(k^*z(s))\Big\}\Big] \tag{3.3.18}
\end{aligned}
\]
for $\hat k>0$, while for $\hat k=0$,
\[
e_{SU}(s,y(s))=f\bar y(s)+(1-f)\bar x(\bar s)\big[\lambda_2(z(s))\{(1-B_0(s))\bar y(s)/\bar x(s)+B_0(s)\beta_0\}+(1-\lambda_2(z(s)))\beta_0\big]. \tag{3.3.19}
\]
Also, generalizing the formula (1.8) in Berger and Berliner (1986), one gets after some heavy algebra the associated posterior variance
\[
\begin{aligned}
V(\gamma(y)\mid s,y(s))&=N^{-2}\Big[\sigma^2(N-n)\bar x(\bar s)+(N-n)^2\bar x^2(\bar s)\\
&\quad\times\Big\{\sigma^2\Big(\frac{\lambda_1}{M_0+n\bar x(s)}+\frac{1-\lambda_1}{n\bar x(s)}\Big(1-\frac{1}{k^*}\tanh(k^*z(s))\Big(z(s)-\frac{1}{k^*}\tanh(k^*z(s))\Big)\Big)\Big)\\
&\quad+\lambda_1(1-\lambda_1)\Big\{B_0(s)(\bar y(s)/\bar x(s)-\beta_0)-\frac{\sigma}{k^*\sqrt{n\bar x(s)}}\tanh(k^*z(s))\Big\}^2\Big\}\Big] \tag{3.3.20}
\end{aligned}
\]
for $\hat k>0$, while for $\hat k=0$,
\[
V(\gamma(y)\mid s,y(s))=N^{-2}\Big[\sigma^2(N-n)\bar x(\bar s)+(N-n)^2\bar x^2(\bar s)\Big\{\frac{\sigma^2\lambda_2}{M_0+n\bar x(s)}+\lambda_2(1-\lambda_2)(1-B_0(s))^2(\bar y(s)/\bar x(s)-\beta_0)^2\Big\}\Big]. \tag{3.3.21}
\]
Next we provide expressions for the indices of posterior robustness of the robust Bayes predictors proposed in this section under the $\{N(\beta_0,\tau^2),\tau^2>0\}$ class of priors.
Calculations similar to those of the previous section provide, for $\hat k>0$,
\[
\begin{aligned}
\mathrm{POR}_\Gamma(e_{SU})&=(1-f)^2\bar x^2(\bar s)\max\Big[\Big\{B_0(s)\lambda_1(\bar y(s)/\bar x(s)-\beta_0)+(1-\lambda_1)\frac{\sigma}{k^*\sqrt{n\bar x(s)}}\tanh(k^*z(s))\Big\}^2,\\
&\qquad\Big\{(B_0(s)\lambda_1-1)(\bar y(s)/\bar x(s)-\beta_0)+(1-\lambda_1)\frac{\sigma}{k^*\sqrt{n\bar x(s)}}\tanh(k^*z(s))\Big\}^2\Big], \tag{3.3.22}
\end{aligned}
\]
while for $\hat k=0$,
\[
\mathrm{POR}_\Gamma(e_{SU})=(1-f)^2\bar x^2(\bar s)\max\big[(1-B_0(s))^2\lambda_2^2,\{1-\lambda_2(1-B_0(s))\}^2\big](\bar y(s)/\bar x(s)-\beta_0)^2. \tag{3.3.23}
\]
In order to examine the procedure robustness of $e_{SU}$, first note that under the $N(\beta_0,\tau^2)$ prior (denoted by $\pi_{\tau^2}$),
\[
\begin{aligned}
r(\pi_{\tau^2},e_{SU})-r(\pi_{\tau^2},e^B)&=E[e_{SU}(s,y(s))-e^B(s,y(s))]^2\\
&=(1-f)^2\bar x^2(\bar s)E\Big[\Big\{\lambda_1B_0(s)(\bar y(s)/\bar x(s)-\beta_0)+(1-\lambda_1)\frac{\sigma}{k^*\sqrt{n\bar x(s)}}\tanh(k^*z(s))\Big\}I_{[\hat k>0]}\\
&\quad+\{1-\lambda_2(1-B_0(s))\}(\bar y(s)/\bar x(s)-\beta_0)I_{[\hat k=0]}-B(s)(\bar y(s)/\bar x(s)-\beta_0)\Big]^2. \tag{3.3.24}
\end{aligned}
\]
We now have the following theorem.
Theorem 3.3.1 $r(\pi_{\tau^2},e_{SU})-r(\pi_{\tau^2},e^B)=O(B^{1/2})$.

Proof of Theorem 3.3.1 First use the inequality
\[
\begin{aligned}
\text{rhs of (3.3.24)}&\le3(1-f)^2\bar x^2(\bar s)E\Big[\Big\{\lambda_1B_0(\bar y(s)/\bar x(s)-\beta_0)+(1-\lambda_1)\frac{\sigma}{k^*\sqrt{n\bar x(s)}}\tanh(k^*z(s))\Big\}^2I_{[\hat k>0]}\\
&\quad+\{1-\lambda_2(1-B_0)\}^2(\bar y(s)/\bar x(s)-\beta_0)^2I_{[\hat k=0]}+B^2(\bar y(s)/\bar x(s)-\beta_0)^2\Big]. \tag{3.3.25}
\end{aligned}
\]
Next observe that $E[B^2(\bar y(s)/\bar x(s)-\beta_0)^2]=E[B^2\sigma^2(n\bar x(s)B)^{-1}\chi_1^2]=B\sigma^2/(n\bar x(s))=O_e(B)$, and, using $z^2(s)=\chi_1^2/B$,
\[
\begin{aligned}
E[\{1-\lambda_2(1-B_0)\}^2(\bar y(s)/\bar x(s)-\beta_0)^2I_{[\hat k=0]}]
&=E[\{1-\lambda_2(1-B_0)\}^2(\bar y(s)/\bar x(s)-\beta_0)^2I_{[z^2(s)\le1]}]\\
&=\frac{\sigma^2}{n\bar x(s)B}E\Big[\Big\{1-\frac{1-B_0}{1+g\exp(-(1-B_0)\chi_1^2/(2B))}\Big\}^2\chi_1^2I_{[\chi_1^2\le B]}\Big] \tag{3.3.26}
\end{aligned}
\]
$(g=\epsilon(1-\epsilon)^{-1}B_0^{-1/2})$. Since the term inside the braces is bounded on $[\chi_1^2\le B]$, there is a constant $C>0$ with
\[
\text{lhs of (3.3.26)}\le C\frac{\sigma^2}{n\bar x(s)B}E[\chi_1^2I_{[\chi_1^2\le B]}]. \tag{3.3.27}
\]
Note that
\[
E[\chi_1^2I_{[\chi_1^2\le B]}]=\int_0^Bx\exp(-x/2)\frac{(x/2)^{1/2-1}}{2\Gamma(1/2)}\,dx\le(2\pi)^{-1/2}\int_0^Bx^{1/2}dx=(2\pi)^{-1/2}\tfrac23B^{3/2}. \tag{3.3.28}
\]
Combine (3.3.27) and (3.3.28) to get
\[
\text{lhs of (3.3.26)}=O(B^{1/2}). \tag{3.3.29}
\]
Next use the inequality
\[
\begin{aligned}
&E\Big[\Big\{\lambda_1B_0(\bar y(s)/\bar x(s)-\beta_0)+(1-\lambda_1)\frac{\sigma}{k^*\sqrt{n\bar x(s)}}\tanh(k^*z(s))\Big\}^2I_{[\hat k>0]}\Big]\\
&\quad\le2E\Big[\Big\{\lambda_1^2B_0^2(\bar y(s)/\bar x(s)-\beta_0)^2+(1-\lambda_1)^2\frac{\sigma^2}{(k^*)^2n\bar x(s)}\tanh^2(k^*z(s))\Big\}I_{[\hat k>0]}\Big]. \tag{3.3.30}
\end{aligned}
\]
Now, writing $g'=\max(g,1)$,
\[
E[\lambda_1^2B_0^2(\bar y(s)/\bar x(s)-\beta_0)^2I_{[\hat k>0]}]=E\Big[\lambda_1^2B_0^2\frac{\sigma^2}{n\bar x(s)}z^2(s)I_{[z^2(s)>1]}\Big]. \tag{3.3.31}
\]
Let $K=\max(M_0,M_1,2)$. Then, writing $g''=c_0g'$ and using (i) of Lemma 2.3.2,
\[
\text{rhs of (3.3.31)}\le B_0^2\frac{\sigma^2}{n\bar x(s)}\Big\{E[B^{-1}\chi_1^2I_{[B<\chi_1^2\le K^2B]}]+E\Big[\frac{k^*z^2(s)}{k^*+g''\exp(B_0z^2(s)/2)}I_{[z^2(s)>K^2]}\Big]\Big\}. \tag{3.3.32}
\]
But
\[
E[B^{-1}\chi_1^2I_{[B<\chi_1^2\le K^2B]}]\le(2\pi)^{-1/2}B^{-1}\int_B^{K^2B}x^{1/2}dx\le\tfrac23(2\pi)^{-1/2}K^3B^{1/2}. \tag{3.3.33}
\]
Also, using (iii) of Lemma 2.3.2 and $\log k^*/k^*\le1$ for $|z(s)|>K$,
\[
E\Big[\frac{k^*z^2(s)}{k^*+g''\exp(B_0z^2(s)/2)}I_{[z^2(s)>K^2]}\Big]
\le E\Big[\frac{|z(s)|^3}{g''\exp(B_0z^2(s)/2)}I_{[z^2(s)>K^2]}\Big]+cE[z^2(s)\exp(-B_0z^2(s)/4)I_{[z^2(s)>K^2]}]. \tag{3.3.34}
\]
But
\[
\begin{aligned}
E[|z(s)|^3\exp(-B_0z^2(s)/2)I_{[z^2(s)>K^2]}]&=B^{-3/2}E[(\chi_1^2)^{3/2}\exp(-B_0\chi_1^2/(2B))I_{[\chi_1^2>K^2B]}]\\
&\le(2\pi)^{-1/2}B^{-3/2}\int_{K^2B}^\infty x\exp\Big(-\frac x2\Big(\frac{B_0}B+1\Big)\Big)dx\\
&\le(2\pi)^{-1/2}B^{-3/2}\cdot4\Big(\frac{B_0}B+1\Big)^{-2}=O(B^{1/2}). \tag{3.3.35}
\end{aligned}
\]
Moreover,
\[
E[z^2(s)\exp(-B_0z^2(s)/4)I_{[z^2(s)>K^2]}]=B^{-1}E[\chi_1^2\exp(-B_0\chi_1^2/(4B))I_{[\chi_1^2>K^2B]}]=O(B^{1/2}). \tag{3.3.36}
\]
Combine (3.3.34)--(3.3.36) to conclude that
\[
E[\lambda_1^2B_0^2(\bar y(s)/\bar x(s)-\beta_0)^2I_{[\hat k>0]}]=O(B^{1/2}). \tag{3.3.37}
\]
Finally,
\[
\begin{aligned}
E\Big[(1-\lambda_1)^2\frac{\sigma^2}{(k^*)^2n\bar x(s)}\tanh^2(k^*z(s))I_{[\hat k>0]}\Big]
&\le\frac{\sigma^2}{n\bar x(s)}E[(k^*)^{-2}\tanh^2(k^*z(s))I_{[|z(s)|>1]}]\\
&\le\frac{\sigma^2}{n\bar x(s)}\big\{E[z^2(s)I_{[1<|z(s)|\le K]}]+E[(k^*)^{-2}I_{[|z(s)|>K]}]\big\}, \tag{3.3.38}
\end{aligned}
\]
where, in the final inequality of (3.3.38), we use $|\tanh(k^*z(s))|\le k^*|z(s)|$ for $1<|z(s)|\le K$ and $|\tanh(k^*z(s))|\le1$ for $|z(s)|>K$. As before,
\[
E[z^2(s)I_{[1<|z(s)|\le K]}]=B^{-1}E[\chi_1^2I_{[B<\chi_1^2\le K^2B]}]=O(B^{1/2}). \tag{3.3.39}
\]
Also, since $k^*>|z(s)|$,
\[
E[(k^*)^{-2}I_{[z^2(s)>K^2]}]\le E[B(\chi_1^2)^{-1}I_{[\chi_1^2>K^2B]}]\le B(2\pi)^{-1/2}\int_{K^2B}^\infty x^{-3/2}dx=2B(2\pi)^{-1/2}(K^2B)^{-1/2}=O(B^{1/2}). \tag{3.3.40}
\]
From (3.3.38)--(3.3.40), lhs of (3.3.38) $=O(B^{1/2})$. Combine (3.3.25), (3.3.26), (3.3.29), (3.3.30), (3.3.37) and (3.3.38)--(3.3.40) to get the theorem.

Remark 3.3.2 It follows from the above theorem that as $n\to\infty$, i.e., $B(s)\to0$, under the subjective $N(\beta_0,\tau^2)$ prior, $e_{SU}$, the robust Bayes estimator of $\gamma(y)=N^{-1}\sum_{i=1}^Ny_i$, is asymptotically optimal in the sense of Robbins (1955).
3.4 An Example
The example in this section considers one of the six real populations used in Royall and Cumberland (1981) for an empirical study of the ratio estimator and estimates of its variance. Our population consists of the 1960 and 1970 populations, in millions, of 125 US cities with 1960 population between 100,000 and 1,000,000. Here the auxiliary information is the 1960 population. The populations of the different cities are shown in Figure 3.1.
[Figure 3.1. Cities Populations: a scatterplot of the 1970 populations against the 1960 populations (in millions) of the 125 cities.]
The problem is to estimate the mean (or total) number of inhabitants in those 125 cities in 1970. For the complete population in 1970, the population mean is 0.29034. We select a 20% simple random sample without replacement from this population, so the sample size is $n=25$. Also, we use $\sigma^2=4.84844\times10^{-3}$, computed from the complete population and assumed to be known. The ratio estimate and the corresponding standard error are then easily obtained. To do a Bayesian analysis, we use both the 1950 and 1960 populations of the 125 cities to elicit the base prior $\pi_0$ for $\beta$. The elicited prior $\pi_0$ is the $N(1.15932,\,1.21097\times10^{-3})$ distribution. Under this elicited prior $\pi_0$, we use formulas (3.1.2) and (3.1.3) to obtain the subjective Bayes predictor and the associated posterior variance. But since we have some uncertainty in $\pi_0$ and the prior information, we choose $\epsilon=.1$ and we get the robust Bayes
Table 3.1. Predictors, Associated Standard Errors and Posterior Robustness Index

Predictor   Estimate    SE                 |gamma(y) - e|     POR
e_R         0.28426     1.10032 x 10^-2    6.08452 x 10^-3    5.38660 x 10^-4
e_B0        0.29336     5.61481 x 10^-3    3.01418 x 10^-3    3.27488 x 10^-4
e_RB        0.28660     5.49854 x 10^-3    3.74880 x 10^-3    4.35696 x 10^-4
e_SU        0.29027     5.74954 x 10^-3    7.83722 x 10^-5    3.29777 x 10^-4
predictors and the associated posterior variances using formulas (3.2.16), (3.2.17), (3.3.18) and (3.3.20). For illustrative purposes, we have decided to report our analysis for one sample. Table 3.1 provides the classical ratio estimate $e_R$, the subjective Bayes predictor $e_{B_0}$, the robust Bayes predictor $e_{RB}$ under all possible contaminations, the robust Bayes predictor $e_{SU}$ under all symmetric unimodal contaminations, and the respective associated standard errors. Table 3.1 also provides the posterior robustness index for each predictor, which is, in a sense, the sensitivity index of the predictor as the prior varies over the class $\{N(\beta_0,\tau^2),\tau^2>0\}$.
An inspection of Table 3.1 reveals that the robust Bayes predictors $e_{RB}$ and $e_{SU}$ are well behaved, in the sense that $e_{SU}$ is closest to $\gamma(y)$ and both $e_{RB}$ and $e_{SU}$ are closer to $\gamma(y)$ than the classical ratio estimate $e_R$. The subjective Bayes predictor $e_{B_0}$ is good in the sense of the posterior robustness index. Note that $e_R$ is worst both in closeness to $\gamma(y)$ and in the posterior robustness index.
Also, we find that the posterior mean of $\gamma(y)=N^{-1}\sum_{i=1}^Ny_i$ lies in the interval (0.28363, 0.29428) for all possible contaminations and in the interval (0.29003, 0.29364) for all symmetric unimodal contaminations. Note that the range of the posterior mean of $\gamma(y)$ is fairly small in both cases. So if we feel that the true prior is close to a specific one, say $\pi_0$, we can model via one of the contamination models and achieve very robust inference.
CHAPTER 4
BAYESIAN ANALYSIS UNDER HEAVYTAILED PRIORS
4.1 Introduction
In this chapter, we consider the idea of developing priors that are inherently robust in some sense. The idea is that it is perhaps easier to build robustness into the analysis at the beginning, than to attempt verifying robustness at the end.
Substantial evidence has been presented to the effect that priors with tails flatter than those of the likelihood function tend to be fairly robust (e.g., Box and Tiao (1968, 1973), Dawid (1973), O'Hagan (1979, 1989) and West (1985)). It is thus desirable to develop fairly broad classes of flat-tailed priors for use in "standard" Bayesian analyses. Andrews and Mallows (1974) and West (1987) studied scale mixtures of normal distributions, which can be used for simulation and in the analysis of outlier models. The Student t family, the double-exponential, the logistic, and the exponential power family can all be constructed as scale mixtures of normals. The exponential power family was introduced and popularized by Box and Tiao (1973) in the context of Bayesian modelling for robustness. Recently, Angers and Berger (1992) and Angers (1992) considered t priors in the hierarchical Bayes setting, while Datta and Lahiri (1994) considered general scale mixtures of normals, primarily with the aim of outlier detection in the context of small area estimation.
The price to be paid for the utilization of inherently robust procedures is computational; closed-form calculation is no longer possible. Recently, however, Markov chain Monte Carlo integration techniques, in particular Gibbs sampling (Geman and Geman (1984), Gelfand and Smith (1990), and Gelfand et al. (1990)), have proved to be a simple yet powerful tool for performing robust Bayes computations.
Ericson (1969) considered the superpopulation model $y_i=\theta+e_i$, where $\theta,e_1,\ldots,e_N$ are independently distributed with $\theta\sim N(\mu_0,\tau_0^2)$ and the $e_i$'s iid $N(0,\sigma^2)$. As we have seen in Chapter 2, under the $N(\mu_0,\tau_0^2)$ prior, the Bayes estimator of $\gamma(y)=N^{-1}\sum_{i=1}^Ny_i$ is
\[
\delta_0(s,y(s))=f\bar y(s)+(1-f)\{(1-B_0)\bar y(s)+B_0\mu_0\}. \tag{4.1.1}
\]
Recall that $f=n/N$, $M_0=\sigma^2/\tau_0^2$ and $B_0=M_0/(M_0+n)$. For unknown $\sigma^2$, a normal-gamma prior was used.
The purpose of this chapter is to develop inherently robust Bayes procedures to overcome the problem associated with outliers in the context of finite population sampling. We consider a refinement based on heavy-tailed prior distributions on $\theta$, using scale mixtures of normals for both known and unknown $\sigma^2$. We use the same notations as in Chapter 2.

The outline of the remaining sections is as follows. In Section 4.2, we provide the robust Bayes estimators of $\gamma(y)$ based on heavy-tailed prior distributions using scale mixtures of normals when $\sigma^2$ is known. Also, the asymptotic optimality (A.O.) in the sense of Robbins (1955) of the proposed robust Bayes estimators is proved. The above program is repeated in Section 4.3, with the exception that $\sigma^2$ is unknown. Once again, robust Bayes estimators are proposed using scale mixtures of normals, and their A.O. property is studied. Finally, in Section 4.4, a numerical example is provided to illustrate the results of the preceding sections.
4.2 Known Variance
Consider the case when (i) $y_i\mid\theta\stackrel{iid}\sim N(\theta,\sigma^2)$ $(i=1,\ldots,N)$ and (ii) $\theta\sim p(\theta-\mu_0)$, where $p(x)=\int_0^\infty\lambda^{1/2}\phi(x\lambda^{1/2})g(\lambda)\,d\lambda$; that is, $p(\cdot)$ is a scale mixture of normal distributions with mixing distribution $g(\cdot)$. Note that we can write (ii) in the following two steps: (iia) $\theta\mid\lambda\sim N(\mu_0,\lambda^{-1})$ and (iib) $\lambda$ has pdf $\tau_0^2g(\tau_0^2\lambda)$, where $\int_0^\infty g(x)\,dx=1$. The following list identifies the necessary functional form of $g(\lambda)$ to obtain a wide range of densities which represent departures from normality:

t priors: if $k\lambda\sim\chi_k^2$, then $\theta$ is Student t with $k$ degrees of freedom, location parameter $\mu_0$, and scale parameter $\tau_0$.

double-exponential priors: if $1/\lambda$ has an exponential distribution with mean 2, then $\theta$ is double-exponential with location parameter $\mu_0$ and scale parameter $\tau_0$.

exponential power family priors: if $\lambda$ has a positive stable distribution with index $\alpha/2$, then $\theta$ has an exponential power distribution with location parameter $\mu_0$ and scale parameter $\tau_0$.

logistic priors: if $\sqrt\lambda$ has the asymptotic Kolmogorov distance distribution, then $\theta$ is logistic with location parameter $\mu_0$ and scale parameter $\tau_0$. [A random variable $Z$ is said to have an asymptotic Kolmogorov distance distribution if it has a pdf of the form $f(z)=8z\sum_{j=1}^\infty(-1)^{j-1}j^2\exp(-2j^2z^2)I_{(0,\infty)}(z)$.]
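The two-step representation (iia)-(iib) is also how one simulates from these priors in practice: draw the mixing variable, then draw a normal with that precision. A minimal sketch for the t and double-exponential cases (function names are ours, not from the text):

```python
import math
import random

def draw_theta_t(mu0, tau0, k, rng):
    """t prior via the mixture: k*lam ~ chi2_k, i.e. lam ~ Gamma(shape k/2, scale 2/k),
    then theta | lam ~ N(mu0, tau0^2 / lam)."""
    lam = rng.gammavariate(k / 2.0, 2.0 / k)
    return rng.gauss(mu0, tau0 / math.sqrt(lam))

def draw_theta_dexp(mu0, tau0, rng):
    """Double-exponential prior: 1/lam ~ exponential with mean 2,
    then theta | lam ~ N(mu0, tau0^2 / lam)."""
    inv_lam = rng.expovariate(0.5)  # exponential with mean 2
    return rng.gauss(mu0, tau0 * math.sqrt(inv_lam))

rng = random.Random(2024)
draws = [draw_theta_t(0.0, 1.0, 3, rng) for _ in range(20000)]
```

The draws are centered at $\mu_0$ but exhibit the heavier-than-normal tails that drive the robustness results of this chapter.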
We shall use the notations $\bar y(s)=n^{-1}\sum_{i\in s}y_i$ and $y(\bar s)=\{y_i:i\notin s\}$, the suffixes in $y(\bar s)$ being arranged in ascending order. Then the posterior distribution of $y(\bar s)$ given $s$ and $y(s)$ is obtained as follows:

(i) conditional on $s$, $y(s)$ and $\lambda$, $y(\bar s)\sim N\big((B(\lambda)\mu_0+(1-B(\lambda))\bar y(s))1_{N-n},\ \sigma^2(I_{N-n}+(\lambda\sigma^2+n)^{-1}1_{N-n}1_{N-n}')\big)$, where $B(\lambda)=\lambda\sigma^2/(\lambda\sigma^2+n)$;

(ii) the conditional distribution of $\lambda$ given $s$ and $y(s)$ has pdf
\[
f(\lambda\mid s,y(s))\propto(\sigma^2+n\lambda^{-1})^{-1/2}\exp\Big[-\frac{n(\bar y(s)-\mu_0)^2}{2(\sigma^2+n\lambda^{-1})}\Big]g(\tau_0^2\lambda). \tag{4.2.1}
\]
Note that under the posterior distribution given in (4.2.1), the Bayes estimator of $\gamma(y)$ is given by
\[
\delta_1^{SM}(s,y(s))=E[\gamma(y)\mid s,y(s)]=f\bar y(s)+(1-f)\{E[B(\lambda)\mid s,y(s)]\mu_0+(1-E[B(\lambda)\mid s,y(s)])\bar y(s)\}. \tag{4.2.2}
\]
Also, one gets
\[
\begin{aligned}
V(\gamma(y)\mid s,y(s))&=E[V(\gamma(y)\mid s,y(s),\lambda)\mid s,y(s)]+V[E(\gamma(y)\mid s,y(s),\lambda)\mid s,y(s)]\\
&=N^{-2}\sigma^2\{(N-n)+(N-n)^2E((\lambda\sigma^2+n)^{-1}\mid s,y(s))\}\\
&\quad+(1-f)^2V(B(\lambda)\mu_0+(1-B(\lambda))\bar y(s)\mid s,y(s)). \tag{4.2.3}
\end{aligned}
\]
The calculations in (4.2.2) and (4.2.3) can be performed using one-dimensional numerical integration. Alternatively, one can use Monte Carlo integration techniques to generate the posterior distribution and the associated means and variances. More specifically, in this chapter we use Gibbs sampling, originally introduced in Geman and Geman (1984), and more recently popularized by Gelfand and Smith (1990) and Gelfand et al. (1990). Gibbs sampling is described below.

Gibbs sampling is a Markovian updating scheme. Given an arbitrary starting set of values $U_1^{(0)},\ldots,U_k^{(0)}$, we draw $U_1^{(1)}\sim[U_1\mid U_2^{(0)},\ldots,U_k^{(0)}]$, $U_2^{(1)}\sim[U_2\mid U_1^{(1)},U_3^{(0)},\ldots,U_k^{(0)}]$, $\ldots$, $U_k^{(1)}\sim[U_k\mid U_1^{(1)},\ldots,U_{k-1}^{(1)}]$, where $[\cdot\mid\cdot]$ denotes the relevant conditional distribution. Thus, each variable is visited in the natural order, and a cycle in this scheme requires $k$ random variate generations. After $t$ such iterations, one arrives at $(U_1^{(t)},\ldots,U_k^{(t)})$. As $t\to\infty$, $(U_1^{(t)},\ldots,U_k^{(t)})\to_d(U_1,\ldots,U_k)$.

Gibbs sampling through $q$ replications of the aforementioned $t$ iterations generates $q$ $k$-tuples $(U_{1j}^{(t)},\ldots,U_{kj}^{(t)})$ $(j=1,\ldots,q)$ for $t$ large enough. $U_1,\ldots,U_k$ could possibly be vectors in the above scheme.

Gelman and Rubin (1992) adopt multiple sequences, with starting points drawn from an overdispersed distribution, to monitor the convergence of the Gibbs sampler. Specifically, $m(\ge2)$ independent sequences are generated, each of length $2d$, but to diminish the effect of the starting distribution, the first $d$ iterations of each sequence are discarded. Hence, we have $m\times d$ simulated values for each parameter of interest.
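The between-sequence/within-sequence comparison underlying the Gelman and Rubin (1992) diagnostic can be sketched as follows; this is a simplified version of their potential scale reduction factor that omits the degrees-of-freedom correction:

```python
def gelman_rubin(chains):
    """Simplified potential scale reduction factor (R-hat) for m chains,
    each a list of d retained simulated values of one scalar parameter."""
    m = len(chains)
    d = len(chains[0])
    means = [sum(c) / d for c in chains]
    grand = sum(means) / m
    # Between-chain variance B and mean within-chain variance W.
    B = d * sum((mu - grand) ** 2 for mu in means) / (m - 1)
    W = sum(sum((x - mu) ** 2 for x in c) / (d - 1)
            for c, mu in zip(chains, means)) / m
    var_hat = (d - 1) / d * W + B / d   # pooled estimate of the posterior variance
    return (var_hat / W) ** 0.5
```

Values of R-hat near 1 indicate that the $m$ sequences have mixed; values well above 1 signal that more iterations, or less dispersed starting points, are needed.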
Using Gibbs sampling, the posterior distribution of $y(\bar s)$ is approximated by
\[
(md)^{-1}\sum_{i=1}^m\sum_{j=1}^d[y(\bar s)\mid s,y(s),\theta=\theta_{ij},\lambda=\lambda_{ij}]. \tag{4.2.4}
\]
To estimate the posterior moments, we use Rao-Blackwellized estimates as in Gelfand and Smith (1991). Note that $E[\gamma(y)\mid s,y(s)]$ is approximated by
\[
f\bar y(s)+(1-f)(md)^{-1}\sum_{i=1}^m\sum_{j=1}^d\big(B(\lambda_{ij})\mu_0+(1-B(\lambda_{ij}))\bar y(s)\big). \tag{4.2.5}
\]
Next one approximates $V(\gamma(y)\mid s,y(s))$ by
\[
\begin{aligned}
&N^{-2}\sigma^2\Big\{(N-n)+(N-n)^2(md)^{-1}\sum_{i=1}^m\sum_{j=1}^d(\lambda_{ij}\sigma^2+n)^{-1}\Big\}\\
&\quad+(1-f)^2\Big[(md)^{-1}\sum_{i=1}^m\sum_{j=1}^d\big(B(\lambda_{ij})\mu_0+(1-B(\lambda_{ij}))\bar y(s)\big)^2\\
&\qquad-\Big\{(md)^{-1}\sum_{i=1}^m\sum_{j=1}^d\big(B(\lambda_{ij})\mu_0+(1-B(\lambda_{ij}))\bar y(s)\big)\Big\}^2\Big]. \tag{4.2.6}
\end{aligned}
\]
The Gibbs sampling analysis is based on the following posterior distributions:

(i) $\theta\mid s,y(s),y(\bar s),\lambda\sim N\big((\lambda\mu_0+\sum_{i=1}^Ny_i/\sigma^2)/(\lambda+N/\sigma^2),\ (\lambda+N/\sigma^2)^{-1}\big)$;

(ii) $f(\lambda\mid s,y(s),y(\bar s),\theta)\propto\sqrt\lambda\exp[-\lambda(\theta-\mu_0)^2/2]\,g(\tau_0^2\lambda)$;

(iii) $y(\bar s)\mid s,y(s),\theta,\lambda\sim N(\theta1_{N-n},\sigma^2I_{N-n})$.

Note that if $k\lambda\sim\chi_k^2$, then $f(\lambda\mid s,y(s),y(\bar s),\theta)$ reduces to a Gamma$(\frac12\{\tau_0^2k+(\theta-\mu_0)^2\},\frac12(k+1))$ density. [A random variable $W$ is said to have a Gamma$(\alpha,\beta)$ distribution if it has a pdf of the form $f(w)\propto\exp(-\alpha w)w^{\beta-1}I_{(0,\infty)}(w)$, where $I$ denotes the usual indicator function.] Also, if $1/\lambda$ has an exponential distribution with mean 2, then $f(\lambda\mid s,y(s),y(\bar s),\theta)$ reduces to an IGN$\big(1/\sqrt{\tau_0^2(\theta-\mu_0)^2},\,1/\tau_0^2\big)$ density. [A random variable $V$ is said to have an IGN$(\eta_1,\eta_2)$ distribution if it has a pdf of the form $f(v)\propto v^{-3/2}\exp\big(-\eta_2(v-\eta_1)^2/(2\eta_1^2v)\big)I_{(0,\infty)}(v)$.]
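The full conditionals (i)-(iii) translate directly into code. The following is a minimal single-sequence sketch for the t(k) prior (function and variable names are ours; the actual analysis of Section 4.4 uses multiple sequences with burn-in, as described above), returning the Rao-Blackwellized estimate (4.2.5) of $E[\gamma(y)\mid s,y(s)]$:

```python
import math
import random

def gibbs_known_var(ys, N, sigma2, mu0, tau0, k=3, iters=3000, burn=1000, seed=1):
    """Gibbs sampler for the known-variance model of Section 4.2 with a t(k) prior on theta.
    Cycles through conditionals (i)-(iii); Rao-Blackwellizes B(lam)*mu0 + (1-B(lam))*ybar."""
    rng = random.Random(seed)
    n = len(ys)
    f = n / N
    ybar = sum(ys) / n
    theta, lam = mu0, 1.0 / tau0 ** 2          # starting values (our choice)
    y_unseen = [ybar] * (N - n)                # imputed unsampled units
    acc = 0.0
    for t in range(iters):
        # (i) theta | rest ~ N((lam*mu0 + sum_y/sigma2)/(lam + N/sigma2), 1/(lam + N/sigma2))
        total = sum(ys) + sum(y_unseen)
        prec = lam + N / sigma2
        theta = rng.gauss((lam * mu0 + total / sigma2) / prec, 1.0 / math.sqrt(prec))
        # (ii) lam | rest ~ Gamma(rate (tau0^2*k + (theta-mu0)^2)/2, shape (k+1)/2)
        rate = 0.5 * (tau0 ** 2 * k + (theta - mu0) ** 2)
        lam = rng.gammavariate(0.5 * (k + 1), 1.0 / rate)
        # (iii) unsampled units iid N(theta, sigma2)
        sd = math.sqrt(sigma2)
        y_unseen = [rng.gauss(theta, sd) for _ in range(N - n)]
        if t >= burn:
            B = lam * sigma2 / (lam * sigma2 + n)   # B(lam) of Section 4.2
            acc += B * mu0 + (1.0 - B) * ybar
    return f * ybar + (1.0 - f) * acc / (iters - burn)
```

The returned value is a Monte Carlo approximation of (4.2.2); the same draws of $\lambda$ would feed the variance approximation (4.2.6).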
We shall now evaluate the performance of the robust Bayes estimator $\delta_1^{SM}$ of $\gamma(y)$ for large $n$ under the $N(\mu_0,\tau_0^2)$ prior, say $\pi_0$. The Bayes estimator of $\gamma(y)$ under this prior is $\delta_0$, given by (4.1.1). Let $r(\pi_0,\delta)$ denote the Bayes risk of an estimator $\delta$ of $\gamma(y)$ under the prior $\pi_0$. Our aim is to show that $r(\pi_0,\delta_1^{SM})-r(\pi_0,\delta_0)\to0$ as $n\to\infty$.

Lemma 4.2.1 Assume $E(\lambda^{3/2})<\infty$. Then $E[B(\lambda)\mid s,y(s)]\to_p0$ as $n\to\infty$.

Proof of Lemma 4.2.1 Note that
\[
E[B(\lambda)\mid s,y(s)]=\frac{\int_0^\infty\sigma^2(\sigma^2+n\lambda^{-1})^{-3/2}\exp\big[-\frac{n(\bar y(s)-\mu_0)^2}{2(\sigma^2+n\lambda^{-1})}\big]g(\tau_0^2\lambda)\,d\lambda}{\int_0^\infty(\sigma^2+n\lambda^{-1})^{-1/2}\exp\big[-\frac{n(\bar y(s)-\mu_0)^2}{2(\sigma^2+n\lambda^{-1})}\big]g(\tau_0^2\lambda)\,d\lambda}\le\frac{P_n}{Q_n} \ \text{(say)}, \tag{4.2.7}
\]
where $P_n=\sigma^2n^{-1}E(\lambda^{3/2})$ and
\[
Q_n=E\big[(\sigma^2/n+\lambda^{-1})^{-1/2}\exp(-\lambda(\bar y(s)-\mu_0)^2/2)\big], \tag{4.2.8}
\]
the expectations being taken over the mixing distribution of $\lambda$; here the numerator is bounded using $\exp[\cdot]\le1$ and $(\sigma^2+n\lambda^{-1})^{-3/2}\le(n\lambda^{-1})^{-3/2}$, and the denominator using $n(\sigma^2+n\lambda^{-1})^{-1}\le\lambda$. Now $P_n\to0$, since $E(\lambda^{3/2})<\infty$. Also, note that $\bar y(s)-\mu_0$ is the centered mean of an exchangeable sequence of random variables, and hence is a centered backward martingale; thus $(\bar y(s)-\mu_0)^2$ is a backward submartingale. Since $\limsup_{n\to\infty}E(\bar y(s)-\mu_0)^2=\tau_0^2<\infty$, by the submartingale convergence theorem, $(\bar y(s)-\mu_0)^2$ converges a.s. to a rv, say $Y_0$. Hence, using Fatou's lemma,
\[
\liminf_{n\to\infty}Q_n\ge E[\lambda^{1/2}\exp(-\lambda Y_0/2)], \tag{4.2.9}
\]
where the lower bound is positive a.s. Hence $P_n/Q_n\to_p0$ as $n\to\infty$.
We now turn to the theorem which proves the A.O. property of $\delta_1^{SM}$ obtained in (4.2.2).

Theorem 4.2.1 Assume $E(\lambda^{3/2})<\infty$. Then $r(\pi_0,\delta_1^{SM})-r(\pi_0,\delta_0)\to0$ as $n\to\infty$.

Proof of Theorem 4.2.1 Standard Bayesian calculations yield
\[
r(\pi_0,\delta_1^{SM})-r(\pi_0,\delta_0)=E(\delta_1^{SM}-\delta_0)^2=(1-f)^2E[(E(B(\lambda)\mid s,y(s))-B_0)^2(\bar y(s)-\mu_0)^2]. \tag{4.2.10}
\]
By Lemma 4.2.1, $E[B(\lambda)\mid s,y(s)]\to_p0$ as $n\to\infty$. Also, $B_0\to0$ as $n\to\infty$. Hence $(E(B(\lambda)\mid s,y(s))-B_0)^2\to_p0$ as $n\to\infty$. Also, $|E(B(\lambda)\mid s,y(s))-B_0|\le1$ and $(\bar y(s)-\mu_0)^2$, being a backward submartingale, is uniformly integrable. Hence the rhs of (4.2.10) $\to0$ as $n\to\infty$. This completes the proof of the theorem.
4.3 Unknown Variance
In this section, somewhat more realistically, we consider the normal superpopulation model with unknown mean $\theta$ and unknown variance $r^{-1}$. Ericson (1969) used a normal-gamma prior on $(\theta,r)$ in this setting; that is, $\theta\mid r\sim N(\mu_0,r^{-1}\tau_0^2)$ and $r\sim\text{Gamma}(\frac12a_0,\frac12g_0)$. But in this case the ratio of the model variance and the prior variance is known.
Suppose now, more generally, that $y_i\mid\theta,r\stackrel{iid}\sim N(\theta,r^{-1})$ $(i=1,\ldots,N)$, and $\theta$ and $r$ are independently distributed with $\theta\sim N(\mu_0,\tau_0^2)$ and $r\sim\text{Gamma}(\frac12a_0,\frac12g_0)$. Then the posterior distribution of $y(\bar s)$ given $s$ and $y(s)$ is obtained via the following two steps:

(i) conditional on $s$, $y(s)$ and $r$, $y(\bar s)\sim N\big((B(r)\mu_0+(1-B(r))\bar y(s))1_{N-n},\ r^{-1}(I_{N-n}+(M(r)+n)^{-1}1_{N-n}1_{N-n}')\big)$, where $M(r)=r^{-1}/\tau_0^2$ and $B(r)=M(r)/(M(r)+n)$;
(ii) the conditional distribution of $r$ given $s$ and $y(s)$ has pdf
\[
f(r\mid s,y(s))\propto r^{(n+g_0-2)/2}(1+n\tau_0^2r)^{-1/2}\exp\Big[-\frac r2\Big\{a_0+\sum_{i\in s}(y_i-\bar y(s))^2+\frac{n(\bar y(s)-\mu_0)^2}{1+n\tau_0^2r}\Big\}\Big]. \tag{4.3.1}
\]
Note that under the posterior distribution given in (4.3.1), the Bayes estimator of $\gamma(y)$ is given by
\[
\delta^B(s,y(s))=E[\gamma(y)\mid s,y(s)]=f\bar y(s)+(1-f)\{E[B(r)\mid s,y(s)]\mu_0+(1-E[B(r)\mid s,y(s)])\bar y(s)\}. \tag{4.3.2}
\]
Also, one gets
\[
\begin{aligned}
V(\gamma(y)\mid s,y(s))&=E[V(\gamma(y)\mid s,y(s),r)\mid s,y(s)]+V[E(\gamma(y)\mid s,y(s),r)\mid s,y(s)]\\
&=N^{-2}(N-n)E[r^{-1}\{1+(N-n)n^{-1}(1-B(r))\}\mid s,y(s)]\\
&\quad+(1-f)^2(\bar y(s)-\mu_0)^2V(B(r)\mid s,y(s)). \tag{4.3.3}
\end{aligned}
\]
In order to robustify the above model, consider the case when (i) $y_i\mid\theta,r\stackrel{iid}\sim N(\theta,r^{-1})$ $(i=1,\ldots,N)$, (ii) $r\sim\text{Gamma}(\frac12a_0,\frac12g_0)$ and (iii) $\theta\sim p(\theta-\mu_0)$, where $p(x)=\int_0^\infty\lambda^{1/2}\phi(x\lambda^{1/2})g(\lambda)\,d\lambda$. Note that the prior pdf of $\theta$ does not depend on $r$. Recall that we can write (iii) in the following two steps: (iiia) $\theta\mid\lambda\sim N(\mu_0,\lambda^{-1})$ and (iiib) $\lambda$ has pdf $\tau_0^2g(\tau_0^2\lambda)$, where $\int_0^\infty g(x)\,dx=1$.
Then the posterior distribution of $y(\bar s)$ given $s$ and $y(s)$ is obtained as follows: (i) conditional on $s$, $y(s)$, $r$ and $\lambda$, $y(\bar s)\sim N\big((B(\lambda,r)\mu_0+(1-B(\lambda,r))\bar y(s))1_{N-n},\ r^{-1}(I_{N-n}+(\lambda r^{-1}+n)^{-1}1_{N-n}1_{N-n}')\big)$, where $B(\lambda,r)=1/(1+n\lambda^{-1}r)$;
(ii) the conditional distribution of $\lambda$ and $r$ given $s$ and $y(s)$ has pdf
\[
f(\lambda,r\mid s,y(s))\propto g(\tau_0^2\lambda)\,r^{(n+g_0-2)/2}(1+n\lambda^{-1}r)^{-1/2}\exp\Big[-\frac r2\Big\{a_0+\sum_{i\in s}(y_i-\bar y(s))^2+\frac{n(\bar y(s)-\mu_0)^2}{1+n\lambda^{-1}r}\Big\}\Big]. \tag{4.3.4}
\]
Note that under the posterior distribution given in (4.3.4), the Bayes estimator of $\gamma(y)$ is given by
\[
\delta_2^{SM}(s,y(s))=E[\gamma(y)\mid s,y(s)]=f\bar y(s)+(1-f)\{E[B(\lambda,r)\mid s,y(s)]\mu_0+(1-E[B(\lambda,r)\mid s,y(s)])\bar y(s)\}. \tag{4.3.5}
\]
Also, one gets
\[
\begin{aligned}
V(\gamma(y)\mid s,y(s))&=E[V(\gamma(y)\mid s,y(s),\lambda,r)\mid s,y(s)]+V[E(\gamma(y)\mid s,y(s),\lambda,r)\mid s,y(s)]\\
&=N^{-2}(N-n)E[r^{-1}\{1+(N-n)(\lambda r^{-1}+n)^{-1}\}\mid s,y(s)]\\
&\quad+(1-f)^2V(B(\lambda,r)\mu_0+(1-B(\lambda,r))\bar y(s)\mid s,y(s)). \tag{4.3.6}
\end{aligned}
\]
Using Gibbs sampling, the posterior distribution of $y(\bar s)$ is approximated by
\[
(md)^{-1}\sum_{i=1}^m\sum_{j=1}^d[y(\bar s)\mid s,y(s),\theta=\theta_{ij},\lambda=\lambda_{ij},r=r_{ij}]. \tag{4.3.7}
\]
To estimate the posterior moments, we once again use the Rao-Blackwellized estimates as in Gelfand and Smith (1991). Note that $E[\gamma(y)\mid s,y(s)]$ is approximated by
\[
f\bar y(s)+(1-f)(md)^{-1}\sum_{i=1}^m\sum_{j=1}^d\big(B(\lambda_{ij},r_{ij})\mu_0+(1-B(\lambda_{ij},r_{ij}))\bar y(s)\big). \tag{4.3.8}
\]
Next one approximates $V(\gamma(y)\mid s,y(s))$ by
\[
\begin{aligned}
&N^{-2}(N-n)(md)^{-1}\sum_{i=1}^m\sum_{j=1}^dr_{ij}^{-1}\{1+(N-n)(\lambda_{ij}r_{ij}^{-1}+n)^{-1}\}\\
&\quad+(1-f)^2\Big[(md)^{-1}\sum_{i=1}^m\sum_{j=1}^d\big(B(\lambda_{ij},r_{ij})\mu_0+(1-B(\lambda_{ij},r_{ij}))\bar y(s)\big)^2\\
&\qquad-\Big\{(md)^{-1}\sum_{i=1}^m\sum_{j=1}^d\big(B(\lambda_{ij},r_{ij})\mu_0+(1-B(\lambda_{ij},r_{ij}))\bar y(s)\big)\Big\}^2\Big]. \tag{4.3.9}
\end{aligned}
\]
The Gibbs sampling analysis is based on the following posterior distributions:

(i) $\theta\mid s,y(s),y(\bar s),\lambda,r\sim N\big((\lambda\mu_0+r\sum_{i=1}^Ny_i)/(\lambda+rN),\ (\lambda+rN)^{-1}\big)$;

(ii) $f(\lambda\mid s,y(s),y(\bar s),\theta,r)\propto\sqrt\lambda\exp[-\lambda(\theta-\mu_0)^2/2]\,g(\tau_0^2\lambda)$;

(iii) $r\mid s,y(s),y(\bar s),\lambda,\theta\sim\text{Gamma}\big(\frac12\{a_0+\sum_{i=1}^N(y_i-\theta)^2\},\frac12(N+g_0)\big)$;

(iv) $y(\bar s)\mid s,y(s),\theta,\lambda,r\sim N(\theta1_{N-n},r^{-1}I_{N-n})$.

Recall that if $k\lambda\sim\chi_k^2$, then $f(\lambda\mid s,y(s),y(\bar s),\theta,r)$ reduces to a Gamma$(\frac12\{\tau_0^2k+(\theta-\mu_0)^2\},\frac12(k+1))$ density. Also, if $1/\lambda$ has an exponential distribution with mean 2, then $f(\lambda\mid s,y(s),y(\bar s),\theta,r)$ reduces to an IGN$\big(1/\sqrt{\tau_0^2(\theta-\mu_0)^2},\,1/\tau_0^2\big)$ density.
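For the unknown-variance model, the Gibbs cycle gains the Gamma draw (iii) for the precision $r$. A minimal single-sequence sketch for the t(k) prior (names are ours, along the same lines as the known-variance case), returning the Rao-Blackwellized estimate (4.3.8):

```python
import math
import random

def gibbs_unknown_var(ys, N, mu0, tau0, a0=0.1, g0=0.1, k=3, iters=3000, burn=1000, seed=1):
    """Gibbs sampler for the Section 4.3 model: conditionals (i)-(iv), with
    B(lam, r) = lam / (lam + n*r) Rao-Blackwellized as in (4.3.8)."""
    rng = random.Random(seed)
    n = len(ys)
    f = n / N
    ybar = sum(ys) / n
    theta, lam, r = mu0, 1.0 / tau0 ** 2, 1.0   # starting values (our choice)
    y_unseen = [ybar] * (N - n)
    acc = 0.0
    for t in range(iters):
        allys = ys + y_unseen
        # (i) theta | rest ~ N((lam*mu0 + r*sum_y)/(lam + r*N), 1/(lam + r*N))
        prec = lam + r * N
        theta = rng.gauss((lam * mu0 + r * sum(allys)) / prec, 1.0 / math.sqrt(prec))
        # (ii) lam | rest: same Gamma((k+1)/2) update as in the known-variance case
        lam = rng.gammavariate(0.5 * (k + 1), 2.0 / (tau0 ** 2 * k + (theta - mu0) ** 2))
        # (iii) r | rest ~ Gamma(shape (N+g0)/2, rate (a0 + sum_i (y_i - theta)^2)/2)
        ss = sum((y - theta) ** 2 for y in allys)
        r = rng.gammavariate(0.5 * (N + g0), 2.0 / (a0 + ss))
        # (iv) unsampled units iid N(theta, 1/r)
        sd = 1.0 / math.sqrt(r)
        y_unseen = [rng.gauss(theta, sd) for _ in range(N - n)]
        if t >= burn:
            B = lam / (lam + n * r)
            acc += B * mu0 + (1.0 - B) * ybar
    return f * ybar + (1.0 - f) * acc / (iters - burn)
```

The retained draws of $(\lambda, r)$ would likewise feed the variance approximation (4.3.9).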
Now, to evaluate the performance of the robust Bayes estimator $\delta_2^{SM}$ of $\gamma(y)$ for large $n$, let $\xi$ denote the prior under which $\theta$ and $r$ are independent with $\theta\sim N(\mu_0,\tau_0^2)$ and $r\sim\text{Gamma}(\frac12a_0,\frac12g_0)$. The Bayes estimator of $\gamma(y)$ under $\xi$ is $\delta^B$, given by (4.3.2). Our goal is now to show that $r(\xi,\delta_2^{SM})-r(\xi,\delta^B)\to0$ as $n\to\infty$.
Lemma 4.3.1 Assume $E(\lambda^{3/2})<\infty$. Then $E[B(r)\mid s,y(s)]\to_p0$ as $n\to\infty$. Also, $E[B(\lambda,r)\mid s,y(s)]\to_p0$ as $n\to\infty$.

Proof of Lemma 4.3.1 First we show $E[B(r)\mid s,y(s)]\to_p0$ as $n\to\infty$. This amounts to showing $N_n/D_n\to_p0$, where
\[
N_n=\int_0^\infty\frac{r^{(n+g_0-2)/2}}{(1+n\tau_0^2r)^{3/2}}\exp\Big[-\frac r2\Big\{a_0+\sum_{i\in s}(y_i-\bar y(s))^2+\frac{n(\bar y(s)-\mu_0)^2}{1+n\tau_0^2r}\Big\}\Big]dr \tag{4.3.10}
\]
and
\[
\begin{aligned}
D_n&=\int_0^\infty\frac{r^{(n+g_0-2)/2}}{(1+n\tau_0^2r)^{1/2}}\exp\Big[-\frac r2\Big\{a_0+\sum_{i\in s}(y_i-\bar y(s))^2+\frac{n(\bar y(s)-\mu_0)^2}{1+n\tau_0^2r}\Big\}\Big]dr\\
&\ge\int_0^\infty\frac{r^{(n+g_0-2)/2}}{(1+n\tau_0^2r)^{1/2}}\exp\Big[-\frac r2\Big\{a_0+\sum_{i\in s}(y_i-\bar y(s))^2\Big\}-\frac{(\bar y(s)-\mu_0)^2}{2\tau_0^2}\Big]dr, \tag{4.3.11}
\end{aligned}
\]
using $nr/(1+n\tau_0^2r)\le\tau_0^{-2}$. Hence, defining $W_n\sim\text{Gamma}\big(\frac12\{a_0+\sum_{i\in s}(y_i-\bar y(s))^2\},\frac12(n+g_0-1)\big)$,
\[
N_n/D_n\le E[(1+n\tau_0^2W_n)^{-1}]\Big/E\Big[\Big(\frac{n\tau_0^2W_n}{1+n\tau_0^2W_n}\Big)^{1/2}\exp\Big(-\frac{(\bar y(s)-\mu_0)^2}{2\tau_0^2}\Big)\Big]=A_n/B_n \ \text{(say)}. \tag{4.3.12--4.3.13}
\]
Now, $A_n\le E[(n\tau_0^2W_n)^{-1}]=n^{-1}\tau_0^{-2}E(W_n^{-1})$. Note that
\[
n^{-1}E(W_n^{-1})=n^{-1}\frac{a_0+\sum_{i\in s}(y_i-\bar y(s))^2}{n+g_0-3}\to_p0, \tag{4.3.14}
\]
since, by the law of large numbers for exchangeable sequences, $\{a_0+\sum_{i\in s}(y_i-\bar y(s))^2\}/(n+g_0-3)$ converges in probability to a finite limit. Again,
\[
B_n=E\big[\{(n\tau_0^2W_n)^{-1}+1\}^{-1/2}\big]\exp\Big(-\frac{(\bar y(s)-\mu_0)^2}{2\tau_0^2}\Big)\ge\{n^{-1}\tau_0^{-2}E(W_n^{-1})+1\}^{-1/2}\exp\Big(-\frac{(\bar y(s)-\mu_0)^2}{2\tau_0^2}\Big), \tag{4.3.15}
\]
using Jensen's inequality, since $(x+1)^{-1/2}$ is a convex function of $x$. Hence,
\[
A_n/B_n\le n^{-1}\tau_0^{-2}E(W_n^{-1})\{n^{-1}\tau_0^{-2}E(W_n^{-1})+1\}^{1/2}\exp\Big(\frac{(\bar y(s)-\mu_0)^2}{2\tau_0^2}\Big)\to_p0, \tag{4.3.16}
\]
since $(\bar y(s)-\mu_0)^2=O_p(1)$ and $n^{-1}E(W_n^{-1})\to_p0$.

Next we need to show that $E[B(\lambda,r)\mid s,y(s)]\to_p0$ as $n\to\infty$, that is, $N_n'/D_n'\to_p0$, where
\[
N_n'=\int_0^\infty\int_0^\infty g(\tau_0^2\lambda)\frac{r^{(n+g_0-2)/2}}{(1+n\lambda^{-1}r)^{3/2}}\exp\Big[-\frac r2\Big\{a_0+\sum_{i\in s}(y_i-\bar y(s))^2+\frac{n(\bar y(s)-\mu_0)^2}{1+n\lambda^{-1}r}\Big\}\Big]dr\,d\lambda \tag{4.3.17}
\]
and
\[
D_n'=\int_0^\infty\int_0^\infty g(\tau_0^2\lambda)\frac{r^{(n+g_0-2)/2}}{(1+n\lambda^{-1}r)^{1/2}}\exp\Big[-\frac r2\Big\{a_0+\sum_{i\in s}(y_i-\bar y(s))^2+\frac{n(\bar y(s)-\mu_0)^2}{1+n\lambda^{-1}r}\Big\}\Big]dr\,d\lambda. \tag{4.3.18}
\]
Note that, using $nr/(1+n\lambda^{-1}r)\le\lambda$,
\[
D_n'\ge\int_0^\infty\int_0^\infty g(\tau_0^2\lambda)\frac{r^{(n+g_0-2)/2}}{(1+n\lambda^{-1}r)^{1/2}}\exp\Big[-\frac r2\Big\{a_0+\sum_{i\in s}(y_i-\bar y(s))^2\Big\}-\frac{\lambda(\bar y(s)-\mu_0)^2}{2}\Big]dr\,d\lambda. \tag{4.3.19}
\]
Hence, defining $W_n$ as before,
\[
N_n'/D_n'\le E\big[\lambda^{1/2}(1+n\lambda^{-1}W_n)^{-1}\big]\Big/E\Big[\lambda^{1/2}\Big(\frac{n\lambda^{-1}W_n}{1+n\lambda^{-1}W_n}\Big)^{1/2}\exp\Big(-\frac{\lambda(\bar y(s)-\mu_0)^2}{2}\Big)\Big]=A_n'/B_n' \ \text{(say)}, \tag{4.3.20}
\]
where the expectations are taken with respect to the joint distribution of $(\lambda,W_n)$, $\lambda$ and $W_n$ being independent (4.3.21). Now,
\[
A_n'\le E[\lambda^{1/2}\cdot n^{-1}\lambda W_n^{-1}]=n^{-1}E(\lambda^{3/2})E(W_n^{-1})=n^{-1}E(\lambda^{3/2})\frac{a_0+\sum_{i\in s}(y_i-\bar y(s))^2}{n+g_0-3}\to_p0, \tag{4.3.22}
\]
if $E(\lambda^{3/2})<\infty$; in the above, we have used the independence of $W_n$ and $\lambda$. Again,
\[
B_n'=E\Big[E\big\{(n^{-1}\lambda W_n^{-1}+1)^{-1/2}\mid\lambda\big\}\lambda^{1/2}\exp\Big(-\frac{\lambda(\bar y(s)-\mu_0)^2}{2}\Big)\Big]. \tag{4.3.23}
\]
Since $(x+1)^{-1/2}$ is a convex function of $x$, using Jensen's inequality for conditional expectations,
\[
E\big[(n^{-1}\lambda W_n^{-1}+1)^{-1/2}\mid\lambda\big]\ge\{n^{-1}\lambda E(W_n^{-1})+1\}^{-1/2}. \tag{4.3.24}
\]
Hence,
\[
B_n'\ge E\Big[\Big\{\frac{\lambda\{a_0+\sum_{i\in s}(y_i-\bar y(s))^2\}}{n(n+g_0-3)}+1\Big\}^{-1/2}\lambda^{1/2}\exp\Big(-\frac{\lambda(\bar y(s)-\mu_0)^2}{2}\Big)\Big]. \tag{4.3.25}
\]
By the same argument as in Lemma 4.2.1, $(\bar y(s)-\mu_0)^2$ converges a.s. to a rv, say $Y_0'$. Hence, using Fatou's lemma, it follows from (4.3.25) that
\[
\liminf_{n\to\infty}B_n'\ge E[\lambda^{1/2}\exp(-\lambda Y_0'/2)]. \tag{4.3.26}
\]
Hence $B_n'$ is bounded away from zero a.s. in the limit. Hence $A_n'/B_n'\to_p0$, so that $N_n'/D_n'\to_p0$.
We now turn to the theorem which proves the A.O. property of $\delta_2^{SM}$ obtained in (4.3.5).

Theorem 4.3.1 Assume $E(\lambda^{3/2})<\infty$. Then $r(\xi,\delta_2^{SM})-r(\xi,\delta^B)\to0$ as $n\to\infty$.

Proof of Theorem 4.3.1 Standard Bayesian calculations yield
\[
r(\xi,\delta_2^{SM})-r(\xi,\delta^B)=E(\delta_2^{SM}-\delta^B)^2=(1-f)^2E[\{E[B(\lambda,r)\mid s,y(s)]-E[B(r)\mid s,y(s)]\}^2(\bar y(s)-\mu_0)^2]. \tag{4.3.27}
\]
By Lemma 4.3.1, $E[B(\lambda,r)\mid s,y(s)]\to_p0$ and $E[B(r)\mid s,y(s)]\to_p0$ as $n\to\infty$. Hence $\{E[B(\lambda,r)\mid s,y(s)]-E[B(r)\mid s,y(s)]\}^2\to_p0$ as $n\to\infty$. Also, $|E[B(\lambda,r)\mid s,y(s)]-E[B(r)\mid s,y(s)]|\le1$ and $(\bar y(s)-\mu_0)^2$ is uniformly integrable. Hence the rhs of (4.3.27) $\to0$ as $n\to\infty$. This completes the proof of the theorem.
4.4 An Example
We illustrate the methods of Sections 4.2 and 4.3 with an analysis of data in Cochran (1977). The data set consists of the 1920 and 1930 numbers of inhabitants, in thousands, of 64 large cities in the United States. The data were obtained by taking the cities which ranked fifth to sixty-eighth in the United States in total number of inhabitants in 1920. The cities are arranged in two strata, the first containing the 16 largest cities and the second the remaining 48 cities; for our purpose, we use only the second stratum. For the complete population, the population mean is 197.875 and the population variance is 5580.92. We use the 1920 data to elicit the prior in our setting, so that $\mu_0=165.438$ and $\tau_0^2=71.424$. We want to estimate the average (or total) number of inhabitants in all 48 cities in 1930 based on a sample of size 16 (i.e., a 1/3 sample). For illustrative purposes, we have decided to report our analysis for one sample.
In deriving the robust Bayes estimates based on heavy-tailed prior distributions using scale mixtures of normals, we have used a Gibbs sampler with 10 independent sequences, each with a sample of size 5000 after a burn-in sample of another 5000.
Table 4.1 provides the Bayes estimates of γ(y) and the associated posterior standard deviations for the normal, double exponential and t priors with degrees of freedom 1, 3, 5, 10 and 15, in both the known and unknown σ² cases. For the unknown σ² case, we have used a₀ = g₀ = 0 to ensure some form of diffuse gamma prior for the inverse of the variance component in our superpopulation model. Note that the naive estimate, that is, the sample mean, is 207.69.
An inspection of Table 4.1 reveals that there can be significant improvement in the estimation of γ(y) by using heavy-tailed prior distributions rather than the normal prior distribution, in the sense of closeness to γ(y). For instance, using the double exponential and the t(1), t(3), t(5), t(10) and t(15) priors, the percentage improvements over the normal are given respectively by 45.78%, 89.05%, 52.06%, 30.68%, 15.53% and 9.06% for the known σ² case. Here the percentage improvement of e₁ over e₂ is calculated by

((e₂ − truth)² − (e₁ − truth)²)/(e₂ − truth)²,

where e₁ is the robust Bayes predictor based on heavy-tailed prior distributions and e₂ is the Bayes predictor using the normal prior. Also, as one might expect, the flatter the prior, the closer the Bayes estimate is to the sample mean. In general, for most cases we have considered, the Cauchy prior (i.e., the t prior with 1 degree of freedom) leads to an estimate which is closest to the population mean.
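As a quick check of this arithmetic, the percentage-improvement formula can be evaluated directly; a minimal sketch in Python, using the known-σ² estimates from Table 4.1 and the 1930 population mean 197.875 as the truth:

```python
def pct_improvement(e1, e2, truth):
    """Percentage improvement of estimate e1 over e2:
    ((e2 - truth)^2 - (e1 - truth)^2) / (e2 - truth)^2, times 100."""
    return 100.0 * ((e2 - truth) ** 2 - (e1 - truth) ** 2) / (e2 - truth) ** 2

truth = 197.875      # population mean of the 48 cities in 1930
e_normal = 184.31    # Bayes estimate under the normal prior (Table 4.1, known sigma^2)
e_t1 = 193.38        # Bayes estimate under the t(1) (Cauchy) prior

print(round(pct_improvement(e_t1, e_normal, truth), 2))  # ~89, as reported for t(1)
```

The small discrepancy from the reported 89.05% is rounding in the tabulated estimates.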
We adopt the basic approach of Gelman and Rubin (1992) to monitor the convergence of the Gibbs sampler. For θ, we simulate m = 10 independent sequences, each of length 2d = 10000, with starting points drawn from a t distribution with 2 degrees of freedom. The justification for t distributions, as well as the choice of the specific parameters of this distribution, is given below.

First note that from the posterior distribution of λ given s and y(s), as given in (4.2.1), we find the posterior mode, say λ̂, by using the Newton-Raphson algorithm. Also, we use ȳ(s) for yᵢ, i ∉ s, based on the sample. We could now very well use N[(λ̂μ₀ + Nȳ(s)/σ²)/(λ̂ + N/σ²), (λ̂ + N/σ²)⁻¹] as the starting posterior distribution for θ. But in order to start with an overdispersed distribution, as recommended by Gelman and Rubin, we take a t distribution with 2 degrees of freedom. Also, note that once the initial θ value has been generated, the rest of the procedure uses the posterior distributions as given in (i)-(iii) in Section 4.2. Similar procedures can be used for the unknown σ² case.
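The construction of overdispersed starting values can be sketched as follows. This is a minimal illustration only: the value of lam_hat below is a hypothetical placeholder for the posterior mode found by Newton-Raphson, and taking σ² equal to the 1920 population variance is purely illustrative.

```python
import math
import random

random.seed(1)

def t2_draw():
    """One draw from a standard t distribution with 2 degrees of freedom:
    Z / sqrt(V/2), where Z ~ N(0,1) and V ~ chi-square(2) = 2 * Exponential(1)."""
    z = random.gauss(0.0, 1.0)
    v = 2.0 * random.expovariate(1.0)   # chi-square variate with 2 df
    return z / math.sqrt(v / 2.0)

def overdispersed_starts(m, lam_hat, mu0, n_over_sigma2, ybar_s):
    """Starting values for theta: t_2 draws centered and scaled by the normal
    approximation N[(lam*mu0 + (N/sig^2)*ybar)/(lam + N/sig^2), (lam + N/sig^2)^(-1)]."""
    prec = lam_hat + n_over_sigma2
    loc = (lam_hat * mu0 + n_over_sigma2 * ybar_s) / prec
    scale = math.sqrt(1.0 / prec)
    return [loc + scale * t2_draw() for _ in range(m)]

# lam_hat = 0.5 is a placeholder for the Newton-Raphson posterior mode.
starts = overdispersed_starts(m=10, lam_hat=0.5, mu0=165.438,
                              n_over_sigma2=48 / 5580.92, ybar_s=207.69)
print(starts)
```

Each sequence of the Gibbs sampler would then be initialized at one of these draws.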
Next, as in Gelman and Rubin, we compute
B/5000 = the variance between the 10 sequence means θ̄_{j·}, each based on 5000 θ values; that is,

B/5000 = ∑_{j=1}^{10} (θ̄_{j·} − θ̄_{··})²/(10 − 1), where θ̄_{··} = (1/10) ∑_{j=1}^{10} θ̄_{j·};

W = the average of the 10 within-sequence variances s_j², each based on (5000 − 1) degrees of freedom; that is, W = (1/10) ∑_{j=1}^{10} s_j². Then, find

σ̂² = ((5000 − 1)/5000) W + (1/5000) B

and

V̂ = σ̂² + (1/((10)(5000))) B.

Finally, find R̂ = V̂/W. If R̂ is near 1 for all scalar estimands of interest, it is reasonable to assume that the desired convergence has been achieved in the Gibbs sampling algorithm (see Gelman and Rubin (1992) for the complete discussion).
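The computation above can be sketched in a few lines; a minimal version (note this computes R̂ = V̂/W directly, without the degrees-of-freedom correction in Gelman and Rubin's full method):

```python
import random
import statistics

def potential_scale_reduction(chains):
    """Gelman-Rubin potential scale reduction factor R = V / W for m parallel
    chains of equal length n (m = 10 and n = 5000 in the text)."""
    m, n = len(chains), len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    grand = statistics.fmean(means)
    b = n * sum((mu - grand) ** 2 for mu in means) / (m - 1)      # between-sequence
    w = statistics.fmean(statistics.variance(c) for c in chains)  # within-sequence
    sigma2_hat = (n - 1) / n * w + b / n
    v_hat = sigma2_hat + b / (m * n)
    return v_hat / w

# Ten well-mixed chains drawn from the same distribution should give R near 1.
random.seed(0)
chains = [[random.gauss(0.0, 1.0) for _ in range(5000)] for _ in range(10)]
print(round(potential_scale_reduction(chains), 3))
```

In practice the chains would be the Gibbs output for θ rather than independent normal draws.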
The second column of Table 4.2 provides the R̂ values (the potential scale reduction factors) corresponding to the estimand θ, using the Cauchy and double exponential priors, based on 10 × 5000 = 50000 simulated values. The third column provides the corresponding 97.5% quantiles, which are also equal to 1. The rightmost five columns of Table 4.2 show the simulated quantiles of the target posterior distribution of θ for each of the 4 cases, based on 10 × 5000 = 50000 simulated values.
Table 4.1. Bayes Estimates and Associated Posterior Standard Deviations
                Known σ²                      Unknown σ²
Priors          Bayes Estimate  Posterior SD  Bayes Estimate  Posterior SD
Normal          184.31          10.19         183.70          11.47
DE              187.89          12.16         187.09          13.14
t(1)            193.38          15.01         192.53          15.79
t(3)            188.48          12.63         187.69          13.60
t(5)            186.58          11.51         185.93          12.60
t(10)           185.41          10.75         184.68          11.91
t(15)           184.94          10.50         184.30          11.75
Table 4.2. Potential Scale Reduction and Simulated Quantiles
                Potential scale reduction     Simulated quantiles
Priors          R̂        97.5%                2.5%     25.0%    50.0%    75.0%    97.5%
Known σ²
  Cauchy        1.00      1.00                160.00   171.98   183.09   198.17   226.89
  DE            1.00      1.00                158.68   168.54   176.00   185.55   206.82
Unknown σ²
  Cauchy        1.00      1.00                157.54   170.38   181.46   196.96   226.55
  DE            1.00      1.00                156.63   167.27   174.63   184.55   206.39
CHAPTER 5
BAYESIAN ROBUSTNESS IN SMALL AREA ESTIMATION
5.1 Introduction
Small area estimation is becoming important in survey sampling due to a growing demand for reliable small area statistics from both the public and private sectors. In typical small area estimation problems, there exist a large number of small areas, but the samples available from an individual area are not usually adequate to achieve accuracy at a specified level. The reason behind this is that the original survey was designed to provide specific accuracy at a much higher level of aggregation than that for small areas. This makes it a necessity to "borrow strength" from related areas through implicit or explicit models that connect the small areas, to find more accurate estimates for a given area, or simultaneously, for several areas. Ghosh and Rao (1994) have recently surveyed the early history as well as recent developments in small area estimation.
Like frequentist methods, Bayesian methods have also been applied very extensively for solving small area estimation problems. Particularly effective in this regard has been the hierarchical or empirical Bayes (HB or EB) approach, which is especially suited for a systematic connection of the small areas through models. For a general discussion of the EB or HB methodology in the small area estimation context, we may refer to Fay and Herriot (1979), Ghosh and Meeden (1986), Ghosh and Lahiri (1987), Datta and Ghosh (1991), Datta and Lahiri (1994), among others.
In this chapter, we propose an alternative Bayesian approach, namely the robust Bayes (RB) approach, which has been discussed in the previous chapters in the context of a single stratum. Specifically, the HB procedure models the uncertainty in the prior information by assigning a single distribution (often noninformative or improper) to the prior parameters (usually called hyperparameters). Instead, as discussed in the earlier chapters, the RB procedure attempts to quantify the subjective information in terms of a class Γ of prior distributions.
In order to study Bayesian robustness in the context of small area estimation, we consider the following hierarchical Bayes model.
(A) Conditional on θ, β, and τ², let Y₁, …, Y_p be independently distributed with Yᵢ ~ N(θᵢ, Vᵢ), i = 1, …, p, where the Vᵢ's are known positive constants;
(B) Conditional on β and τ², θ₁, …, θ_p are independently distributed with θᵢ ~ N(xᵢᵀβ, τ²), i = 1, …, p, where x₁, …, x_p are known regression vectors of dimension s and β is s × 1;
(C) β ~ uniform(Rˢ) and τ² is assumed to be independent of β, having a distribution h(τ²) which belongs to a certain class of distributions Γ.

We shall use the notations Y = (Y₁, …, Y_p)ᵀ, θ = (θ₁, …, θ_p)ᵀ, X = (x₁, …, x_p)ᵀ. Write G = Diag{V₁, …, V_p} and assume rank(X) = s. Cano (1993) considered a special case of this model when xᵢ = 1 and Vᵢ = V for i = 1, …, p.
The outline of the remaining sections is as follows. In Section 5.2, we choose Γ to be the ε-contamination class of priors, where the contamination class includes all unimodal distributions. We develop the robust hierarchical Bayes estimators of the small area means and the associated measures of accuracy (i.e., the posterior variances) based on type-II maximum likelihood priors. Also we provide the range in which the small area means lie under the ε-contamination class.

In Section 5.3, we choose Γ to be the density ratio class of priors. As suggested by Wasserman and Kadane (1992), we use Gibbs sampling to compute bounds on posterior expectations over the density ratio class.

In Section 5.4, we choose Γ to be the class of uniform priors on τ² with τ_*² ≤ τ² ≤ τ*². We are interested in the sensitivity analysis of the posterior quantities over Γ.
Finally, Section 5.5 contains an analysis of real data to illustrate the results of the preceding sections.
5.2 ε-Contamination Class
In this section, we consider the class Γ of priors of the form

Γ = {h : h = (1 − ε)h₀ + εq, q ∈ Q},  (5.2.1)

where 0 < ε < 1 is given, h₀ is the inverse gamma distribution with pdf

h₀(τ²) ∝ (τ²)^{−(β₀+1)} exp(−α₀/τ²) I_(0,∞)(τ²),  (5.2.2)

denoted by IG(α₀, β₀), and Q is the class of all unimodal distributions with the same mode τ₀² as that of h₀.
The joint (improper) pdf of Y, θ, β and τ² is given by

f(y, θ, β, τ²) ∝ (τ²)^{−p/2} exp[−(1/2)(y − θ)ᵀG⁻¹(y − θ) − (1/(2τ²))‖θ − Xβ‖²] {(1 − ε)h₀(τ²) + εq(τ²)}.  (5.2.3)

Integrating with respect to β in (5.2.3), one finds the joint (improper) pdf of Y, θ and τ² given by

f(y, θ, τ²) ∝ (τ²)^{−(p−s)/2} exp[−(1/2)(y − θ)ᵀG⁻¹(y − θ) − (1/(2τ²))θᵀ(I_p − X(XᵀX)⁻¹Xᵀ)θ] {(1 − ε)h₀(τ²) + εq(τ²)}.  (5.2.4)

Write Σ⁻¹ = G⁻¹ + (τ²)⁻¹(I_p − X(XᵀX)⁻¹Xᵀ). Then, one can write

(y − θ)ᵀG⁻¹(y − θ) + (τ²)⁻¹θᵀ(I_p − X(XᵀX)⁻¹Xᵀ)θ
= θᵀΣ⁻¹θ − 2θᵀG⁻¹y + yᵀG⁻¹y
= (θ − ΣG⁻¹y)ᵀΣ⁻¹(θ − ΣG⁻¹y) + yᵀ(G⁻¹ − G⁻¹ΣG⁻¹)y.  (5.2.5)

From (5.2.4) and (5.2.5), we have

E[θ|y, τ²] = ΣG⁻¹y;  V[θ|y, τ²] = Σ.  (5.2.6)

Using (5.2.5) and integrating out with respect to θ in (5.2.4), one gets the joint (improper) pdf of Y and τ² given by

f(y, τ²) ∝ |Σ|^{1/2} (τ²)^{−(p−s)/2} exp[−(1/2) yᵀ(G⁻¹ − G⁻¹ΣG⁻¹)y] {(1 − ε)h₀(τ²) + εq(τ²)}.  (5.2.7)

We denote by m(y|h) the marginal distribution of y with respect to the prior h, namely

m(y|h) = ∫ f(y|τ²) h(dτ²).  (5.2.8)

For h ∈ Γ, we get

m(y|h) = (1 − ε)m(y|h₀) + ε m(y|q).  (5.2.9)

Our objective is to choose the ML-II prior ĥ which maximizes m(y|h) over Γ. This amounts to maximization of m(y|q) over q ∈ Q. Using the representation of each q ∈ Q as a mixture of uniform densities, the ML-II prior is given by

ĥ(τ²) = (1 − ε)h₀(τ²) + ε q̂(τ²),  (5.2.10)

where q̂ is uniform(τ₀², τ₀² + ẑ), ẑ being the solution of the equation

f(y|ẑ) = (1/ẑ) ∫_{τ₀²}^{τ₀²+ẑ} f(y|τ²) dτ²,  (5.2.11)

and τ₀² is the unique mode of h₀(τ²). Note that
f(y|τ²) ∝ |Σ|^{1/2} (τ²)^{−(p−s)/2} exp[−(1/2) yᵀ(G⁻¹ − G⁻¹ΣG⁻¹)y].  (5.2.12)

Write uᵢ = Vᵢ/(Vᵢ + τ²), i = 1, …, p, and D = Diag{1 − u₁, …, 1 − u_p}. Then, on simplification, it follows that

Σ = τ²(I_p − D) + τ²(I_p − D)X(XᵀDX)⁻¹Xᵀ(I_p − D);  (5.2.13)
ΣG⁻¹ = D + (I_p − D)X(XᵀDX)⁻¹XᵀD;  (5.2.14)
ΣG⁻¹y = [(1 − u₁)y₁ + u₁x₁ᵀβ̃, …, (1 − u_p)y_p + u_p x_pᵀβ̃]ᵀ,  (5.2.15)

where β̃ = (XᵀDX)⁻¹(XᵀDy). Then

G⁻¹ − G⁻¹ΣG⁻¹ = (τ²)⁻¹[D − DX(XᵀDX)⁻¹XᵀD].  (5.2.16)

Hence,

yᵀ(G⁻¹ − G⁻¹ΣG⁻¹)y = (τ²)⁻¹[∑_{i=1}^p (1 − uᵢ)yᵢ² − (∑_{i=1}^p (1 − uᵢ)yᵢxᵢ)ᵀ(XᵀDX)⁻¹(∑_{i=1}^p (1 − uᵢ)yᵢxᵢ)] = Q_{τ²}(y) (say).  (5.2.17)

Combining (5.2.12), (5.2.16) and (5.2.17), we can write

f(y|τ²) ∝ |Σ|^{1/2} (τ²)^{−(p−s)/2} exp[−(1/2) Q_{τ²}(y)].  (5.2.18)

Writing F = G⁻¹ + (τ²)⁻¹I_p, and using Exercise 2.4, p. 32 of Rao (1973), one gets

|Σ⁻¹| = det[ F  X ; Xᵀ  τ²(XᵀX) ] |τ²(XᵀX)|⁻¹ = |F| |τ²(XᵀX) − XᵀF⁻¹X| |τ²(XᵀX)|⁻¹ ∝ (τ²)^{−p} {∏_{i=1}^p (τ² + Vᵢ)} |XᵀDX|.  (5.2.19)

It is clear from (5.2.18) and (5.2.19) that

f(y|τ²) ∝ (τ²)^{s/2} {∏_{i=1}^p (τ² + Vᵢ)}^{−1/2} |XᵀDX|^{−1/2} exp[−(1/2) Q_{τ²}(y)].  (5.2.20)
Now to find the solution ẑ to the equation (5.2.11), we consider

z f(y|z) = ∫_{τ₀²}^{τ₀²+z} f(y|τ²) dτ².  (5.2.21)

By differentiating both sides with respect to z, we get

f(y|z) + z (d/dz) f(y|z) = f(y|τ₀² + z).  (5.2.22)

By Lemma A.4.4, Lemma A.4.5 and Theorem A.4.6 in Anderson (1984), recall that for an s × s matrix A = (a_{ij}),

(d/dz) A⁻¹ = −A⁻¹ ((d/dz) A) A⁻¹  (5.2.23)

and

(d/dz) |A| = ∑_{i=1}^s ∑_{j=1}^s A_{ij} (d a_{ij}/dz),  (5.2.24)

where A_{ij} = ∂|A|/∂a_{ij} is the cofactor of a_{ij}. Write A = A(z) = ∑_{i=1}^p xᵢxᵢᵀ/(z + Vᵢ), so that, by (5.2.20), f(y|z) ∝ {∏_{i=1}^p (z + Vᵢ)}^{−1/2} |A(z)|^{−1/2} exp{−(1/2)H(z)}. Then, after some calculations using (5.2.23) and (5.2.24), it can be shown that

(d/dz) f(y|z) = {∏_{i=1}^p (z + Vᵢ)}^{−1/2} |A(z)|^{−1/2} exp{−(1/2)H(z)} × [−(1/2) ∑_{i=1}^p (z + Vᵢ)⁻¹ − (1/2) tr(A⁻¹(z) (d/dz)A(z)) − (1/2) (d/dz)H(z)],  (5.2.25)

where

H(z) = ∑_{i=1}^p yᵢ²/(z + Vᵢ) − (∑_{i=1}^p yᵢxᵢ/(z + Vᵢ))ᵀ (∑_{i=1}^p xᵢxᵢᵀ/(z + Vᵢ))⁻¹ (∑_{i=1}^p yᵢxᵢ/(z + Vᵢ)).  (5.2.26)

Using (5.2.25), (5.2.22) leads to

{∏_{i=1}^p (z + Vᵢ)}^{−1/2} |A(z)|^{−1/2} exp{−(1/2)H(z)} [1 − (z/2){∑_{i=1}^p (z + Vᵢ)⁻¹ + tr(A⁻¹(z) (d/dz)A(z)) + (d/dz)H(z)}]
= {∏_{i=1}^p (z + τ₀² + Vᵢ)}^{−1/2} |A(z + τ₀²)|^{−1/2} exp{−(1/2)H(z + τ₀²)}.  (5.2.27)
Now we have the following HB model based on the ML-II prior:

(I) Conditional on θ, β, and τ², Yᵢ ~ N(θᵢ, Vᵢ), i = 1, …, p, where the Vᵢ's are known positive constants;
(II) Conditional on β and τ², θᵢ ~ N(xᵢᵀβ, τ²), i = 1, …, p, where x₁, …, x_p are known regression vectors of dimension s and β is s × 1;
(III) Marginally, β and τ² are mutually independent with β ~ uniform(Rˢ) and τ² having density ĥ(τ²) = (1 − ε)h₀(τ²) + εq̂(τ²), where h₀(τ²) is the IG(α₀, β₀) density with α₀ > 0 and β₀ > 0, and q̂ is uniform(τ₀², τ₀² + ẑ), with ẑ being the solution of (5.2.27) and τ₀² = α₀/(β₀ + 1).
It is clear from (5.2.20) that

f(y, τ²) ∝ (τ²)^{s/2} {∏_{i=1}^p (τ² + Vᵢ)}^{−1/2} |XᵀDX|^{−1/2} exp[−(1/2) Q_{τ²}(y)] {(1 − ε)h₀(τ²) + εq̂(τ²)}.  (5.2.28)

Now writing Uᵢ = Vᵢ/(τ² + Vᵢ) (i = 1, …, p), using (5.2.28), and the iterated formulas for conditional expectations and variances, one gets

E[θᵢ|y] = E[E(θᵢ|y, τ²)|y] = E[(1 − Uᵢ)yᵢ + Uᵢxᵢᵀβ̃ | y]  (5.2.29)

and

V[θᵢ|y] = V[E(θᵢ|y, τ²)|y] + E[V(θᵢ|y, τ²)|y]
= V[(1 − Uᵢ)yᵢ + Uᵢxᵢᵀβ̃ | y] + E[τ²Uᵢ + τ²Uᵢ²xᵢᵀ(XᵀDX)⁻¹xᵢ | y]
= V[Uᵢ(yᵢ − xᵢᵀβ̃) | y] + E[Vᵢ(1 − Uᵢ) + VᵢUᵢ(1 − Uᵢ)xᵢᵀ(XᵀDX)⁻¹xᵢ | y].  (5.2.30)
Thus, the posterior distribution of θ under the ML-II prior is obtained using (5.2.4) and (5.2.28). In addition, one uses (5.2.29) and (5.2.30) to find the posterior means and variances of θ under this prior. Similarly, by using the iterated formulas, posterior covariances may be obtained as well.
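For a fixed τ², the conditional posterior mean E[θ|y, τ²] of (5.2.15) is easy to compute; a minimal sketch with hypothetical data (the full posterior mean (5.2.29) would additionally average this quantity over the posterior of τ²):

```python
import numpy as np

def conditional_posterior_mean(y, X, V, tau2):
    """(5.2.15): E[theta_i | y, tau^2] = (1 - u_i) y_i + u_i x_i' beta_tilde,
    with u_i = V_i / (V_i + tau^2) and beta_tilde = (X'DX)^{-1} X'Dy,
    where D = diag(1 - u_1, ..., 1 - u_p)."""
    u = V / (V + tau2)
    d = 1.0 - u                                   # diagonal of D
    beta_tilde = np.linalg.solve(X.T @ (d[:, None] * X), X.T @ (d * y))
    return (1.0 - u) * y + u * (X @ beta_tilde)

# Hypothetical data: p = 4 small areas, s = 2 regression coefficients.
y = np.array([10.0, 12.0, 9.0, 11.0])
V = np.array([1.0, 2.0, 1.5, 0.5])
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
print(conditional_posterior_mean(y, X, V, tau2=2.0))
```

As τ² grows, the shrinkage weights uᵢ vanish and the estimates revert to the direct estimates yᵢ.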
Next we consider the problem of finding the range of the posterior mean of θᵢ over Γ in (5.2.1). Using the expression h(τ²|y) ∝ f(y|τ²)h(τ²), we have

E[θᵢ|y] = yᵢ − E[Uᵢ(yᵢ − xᵢᵀβ̃)|y] = yᵢ − ∫₀^∞ Uᵢ(yᵢ − xᵢᵀβ̃) f(y|τ²)h(τ²) dτ² / ∫₀^∞ f(y|τ²)h(τ²) dτ².  (5.2.31)

Simple modifications of the arguments of Sivaganesan and Berger (1989) or Cano (1993) lead to the following result:

sup(inf)_{h∈Γ} E[θᵢ|y] = sup(inf)_{z>0} [A + (ε/z) ∫_{τ₀²}^{τ₀²+z} {yᵢ − Uᵢ(yᵢ − xᵢᵀβ̃)} f(y|τ²) dτ²] / [B + (ε/z) ∫_{τ₀²}^{τ₀²+z} f(y|τ²) dτ²],  (5.2.32)

where

B = (1 − ε) ∫₀^∞ f(y|τ²)h₀(τ²) dτ²  (5.2.33)

and

A = yᵢB − (1 − ε) ∫₀^∞ Uᵢ(yᵢ − xᵢᵀβ̃) f(y|τ²)h₀(τ²) dτ².  (5.2.34)

The above sup (inf) can be obtained by numerical optimization. These formulas will be used in Section 5.5 in the context of estimation of median income of four-person families.
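The numerical optimization just mentioned can be sketched as a one-dimensional grid search over z. The sketch below is generic: the marginal likelihood f(y|τ²), the posterior functional g(τ²) = yᵢ − Uᵢ(yᵢ − xᵢᵀβ̃), and h₀ are supplied as functions, and the improper integrals are truncated at an assumed upper limit.

```python
def range_posterior_mean(g, f, h0, eps, tau0_sq, z_grid, upper=200.0, quad_n=400):
    """Grid search for sup and inf over z of the ratio in (5.2.32):
    (A + (eps/z) * int g*f) / (B + (eps/z) * int f), with A, B as in
    (5.2.33)-(5.2.34).  Midpoint-rule integration; `upper` truncates the
    integrals over (0, infinity)."""
    def integral(fn, lo, hi):
        h = (hi - lo) / quad_n
        return h * sum(fn(lo + (k + 0.5) * h) for k in range(quad_n))

    B = (1.0 - eps) * integral(lambda t: f(t) * h0(t), 1e-9, upper)
    A = (1.0 - eps) * integral(lambda t: g(t) * f(t) * h0(t), 1e-9, upper)

    vals = []
    for z in z_grid:
        num = A + eps * integral(lambda t: g(t) * f(t), tau0_sq, tau0_sq + z) / z
        den = B + eps * integral(f, tau0_sq, tau0_sq + z) / z
        vals.append(num / den)
    return min(vals), max(vals)

# Sanity check: a constant functional g must give inf = sup = that constant.
import math
lo, hi = range_posterior_mean(g=lambda t: 3.0,
                              f=lambda t: math.exp(-t),
                              h0=lambda t: math.exp(-t),
                              eps=0.1, tau0_sq=0.5,
                              z_grid=[0.1, 1.0, 5.0, 20.0])
print(lo, hi)
```

The toy f and h0 above are placeholders; in the application they would be (5.2.20) and the IG(α₀, β₀) density.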
5.3 Density Ratio Class
In this section we consider a class of priors, introduced by DeRobertis and Hartigan (1981), and called a density ratio class by Berger (1990):

Γ_{DR}(l, u) = {h : h(τ²)/h(τ'²) ≤ u(τ²)/l(τ'²) for all τ² and τ'²},  (5.3.1)

where l and u are two bounded nonnegative functions such that l(τ²) ≤ u(τ²) for all τ². This class can be viewed as specifying ranges for the ratio of the prior density between any two points. By taking u = kh₀ and l = h₀, we obtain a very interesting subclass,

Γ_k(h₀) = {h : h(τ²)/h(τ'²) ≤ k h₀(τ²)/h₀(τ'²) for all τ² and τ'²},  (5.3.2)

where k > 1 is a constant. This class may be thought of as a neighborhood around the target prior h₀. The interpretation is that the odds of any pair of points are not misspecified by more than a factor of k. This class is especially useful when h₀ is a default prior chosen mainly for convenience.
Because of the expression h(τ²|y) ∝ f(y|τ²)h(τ²), we can view our problem as having just the parameter τ², h being the prior and f(y|τ²) the likelihood. Since

E[θᵢ|y] = E[ yᵢ − {Vᵢ/(τ² + Vᵢ)}{yᵢ − xᵢᵀ(∑_{j=1}^p xⱼxⱼᵀ/(τ² + Vⱼ))⁻¹(∑_{j=1}^p yⱼxⱼ/(τ² + Vⱼ))} | y ],  (5.3.3)

our problem reduces to finding

sup(inf)_{h∈Γ_k(h₀)} E[b(τ²)|y],  (5.3.4)

where b(τ²) = yᵢ − {Vᵢ/(τ² + Vᵢ)}{yᵢ − xᵢᵀ(∑_{j=1}^p xⱼxⱼᵀ/(τ² + Vⱼ))⁻¹(∑_{j=1}^p yⱼxⱼ/(τ² + Vⱼ))}.
Wasserman and Kadane (1992) have developed a Monte Carlo technique which can be used to bound posterior expectations over the density ratio class. Wasserman (1992) has shown that the set of posteriors obtained by applying Bayes' theorem to the density ratio class Γ_k(h₀) is the density ratio class Γ_k(h₀^y), where h₀^y(τ²) = h₀(τ²|y) is the posterior corresponding to h₀. To see this, all we need to do is to write

Γ_k(h₀^y) = {h_y : h_y(τ²)/h_y(τ'²) ≤ k h₀^y(τ²)/h₀^y(τ'²) for all τ² and τ'²},  (5.3.5)

and observe that

h(τ²|y)/h(τ'²|y) = f(y|τ²)h(τ²)/{f(y|τ'²)h(τ'²)}  and  h₀^y(τ²)/h₀^y(τ'²) = f(y|τ²)h₀(τ²)/{f(y|τ'²)h₀(τ'²)},

so that h ∈ Γ_k(h₀) is equivalent to h_y ∈ Γ_k(h₀^y), where h_y(τ²) = h(τ²|y). Wasserman (1992) calls this the Bayes invariance property. Hence, to bound the posterior expectation of b(τ²), we need only bound the expectation of b(τ²) over Γ_k(h₀^y). To do so, we will need to draw a sample from the posterior h₀^y. Following Wasserman and Kadane (1992), we can rely on recent sampling-based methods for Bayesian inference. Note in this case that

h₀(τ²|y) ∝ (τ²)^{s/2} {∏_{i=1}^p (τ² + Vᵢ)}^{−1/2} |XᵀDX|^{−1/2} exp[−(1/2) Q_{τ²}(y)] h₀(τ²).  (5.3.6)
Let τ₁², …, τ_N² be a random sample from h₀(τ²|y). Let bᵢ = b(τᵢ²), i = 1, …, N, and let b₍₁₎ ≤ b₍₂₎ ≤ … ≤ b₍N₎ be the corresponding order statistics. Also, let cᵢ = −bᵢ, i = 1, …, N, and let c₍₁₎ ≤ c₍₂₎ ≤ … ≤ c₍N₎ be the corresponding order statistics. Let b̄ = ∑_{i=1}^N bᵢ/N and c̄ = ∑_{i=1}^N cᵢ/N. Then, following Wasserman and Kadane (1992), we get

sup_{h∈Γ_k(h₀)} E[b(τ²)|y] = sup_{h_y∈Γ_k(h₀^y)} E[b(τ²)|y] = max_{1≤j≤N} {(1 − j/N)Δ + 1}⁻¹ {Δ ∑_{i=j+1}^N b₍ᵢ₎/N + b̄}  (5.3.7)

and

inf_{h∈Γ_k(h₀)} E[b(τ²)|y] = inf_{h_y∈Γ_k(h₀^y)} E[b(τ²)|y] = −max_{1≤j≤N} {(1 − j/N)Δ + 1}⁻¹ {Δ ∑_{i=j+1}^N c₍ᵢ₎/N + c̄},  (5.3.8)

where Δ = k − 1. This gives the posterior bounds for a given k.
To generate the sample from h₀(τ²|y), we use Gibbs sampling. The Gibbs sampling analysis is based on the following posterior distributions:

(i) β | y, θ, z ~ N_s((XᵀX)⁻¹Xᵀθ, z⁻¹(XᵀX)⁻¹);
(ii) θ₁, …, θ_p | y, β, z are independently distributed with θᵢ ~ N((1 − Bᵢ)yᵢ + Bᵢxᵢᵀβ, z⁻¹Bᵢ);
(iii) τ² | y, β, θ ~ IG(α₀ + (1/2)∑_{i=1}^p (θᵢ − xᵢᵀβ)², p/2 + β₀),

where z = (τ²)⁻¹ and Bᵢ = z/(z + ψᵢ), with ψᵢ = 1/Vᵢ, i = 1, …, p.
Note that our target is to draw samples from h₀(τ²|y). However, as is well known in the Gibbs sampling literature, the histogram of the samples (τ²)^(t), t = 1, 2, …, q, drawn from the conditionals [τ² | y, β, θ] converges to the marginal posterior distribution [τ² | y] as q → ∞. This, along with (5.3.7) and (5.3.8), facilitates the computation of upper and lower bounds of E[b(τ²)|y].
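Given the Gibbs draws, the bounds (5.3.7) and (5.3.8) amount to a short computation over the ordered b-values; a minimal sketch (b_draws stands for the values b(τᵢ²) evaluated at the sampled τᵢ²):

```python
def density_ratio_bounds(b_draws, k):
    """Monte Carlo bounds on E[b(tau^2)|y] over the density ratio class
    Gamma_k(h0), in the spirit of (5.3.7)-(5.3.8): the upper bound inflates
    the upper tail of the ordered b-values by the factor k; the lower bound
    applies the same formula to the negated draws."""
    n = len(b_draws)
    delta = k - 1.0

    def upper(vals):
        v = sorted(vals)
        mean = sum(v) / n
        best = mean                      # j = n: nothing inflated, plain mean
        tail = 0.0
        for j in range(n - 1, -1, -1):   # tail = sum of v[j:], the top n - j values
            tail += v[j]
            best = max(best, (delta * tail / n + mean) / ((1 - j / n) * delta + 1))
        return best

    return upper(b_draws), -upper([-b for b in b_draws])

hi, lo = density_ratio_bounds([1.0, 2.0, 3.0, 4.0], k=2.0)
print(lo, hi)  # the bounds bracket the sample mean 2.5
```

At k = 1 the class collapses to {h₀} and both bounds reduce to the plain Monte Carlo mean.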
5.4 Class of Uniform Priors
In usual HB model in our setting, one might use the diffuse prior on T2. Instead, we consider the class of uniform priors on r2 with constraints of the form
r = ï¿½T2
(5.4.1)
where T2 and r2 are arbitrary nonnegative numbers such that T2 < T. This class of priors is attractive because of its mathematical convenience, and indeed give a good enough class of priors for an honest robustness check. Classes of conjugate priors having parameters in certain ranges have been studied by Learner (1982) and Polasek (1985).
From (5.2.6) and (5.2.15), we have

sup(inf)_{τ_*² ≤ τ² ≤ τ*²} E[θᵢ|y, τ²]
= yᵢ − inf(sup)_{τ_*² ≤ τ² ≤ τ*²} {Vᵢ/(τ² + Vᵢ)}{yᵢ − xᵢᵀ(∑_{j=1}^p xⱼxⱼᵀ/(τ² + Vⱼ))⁻¹(∑_{j=1}^p yⱼxⱼ/(τ² + Vⱼ))}.  (5.4.2)

Hence, we can find the range of the posterior quantity E[θᵢ|y, τ²] over Γ in (5.4.1), and investigate the sensitivity for different choices of τ_*² and τ*².
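The range in (5.4.2) can be explored numerically by evaluating the conditional posterior mean on a grid of τ² values; a minimal sketch with hypothetical data (a grid search stands in for exact optimization):

```python
import numpy as np

def posterior_mean_range(y, X, V, tau2_lo, tau2_hi, n_grid=200):
    """Componentwise min and max of E[theta_i | y, tau^2] as tau^2 ranges
    over [tau2_lo, tau2_hi], cf. (5.4.2)."""
    ests = []
    for tau2 in np.linspace(tau2_lo, tau2_hi, n_grid):
        u = V / (V + tau2)
        d = 1.0 - u
        beta_tilde = np.linalg.solve(X.T @ (d[:, None] * X), X.T @ (d * y))
        ests.append(y - u * (y - X @ beta_tilde))
    ests = np.array(ests)
    return ests.min(axis=0), ests.max(axis=0)

# Hypothetical data: p = 4 areas, s = 2 regression coefficients.
y = np.array([10.0, 12.0, 9.0, 11.0])
V = np.array([1.0, 2.0, 1.5, 0.5])
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
lo, hi = posterior_mean_range(y, X, V, tau2_lo=0.5, tau2_hi=50.0)
print(hi - lo)  # width of each area's interval
```

Varying tau2_lo and tau2_hi then gives the sensitivity check described in the text.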
5.5 An Example
In this section we illustrate the methods suggested in the preceding sections with an analysis of a real data set. The data set is related to the estimation of median income for four-person families by states. To start with, we briefly give the background of this problem.

The U.S. Department of Health and Human Services (HHS) administers a program of energy assistance to low-income families. Eligibility for the program is determined by a formula in which the most important variable is an estimate of the current median income for four-person families by states (the 50 states and the District of Columbia).
The Bureau of the Census has provided such estimates to the HHS annually since the latter part of the 1970s, using a linear regression methodology. The basic source of the data is the annual demographic supplement to the March Sample of the Current Population Survey (CPS), which provides median income for four-person families for the preceding year. In addition, once every ten years, similar figures are obtained from the decennial census for the calendar year immediately preceding the census year. The latter set of estimates is believed to be much more accurate and serves as the "gold standard" against which other estimates need to be tested.
Direct use of CPS estimates was found undesirable because of the large coefficient of variation associated with them (cf. Fay (1987)). The regression method used by the Bureau of the Census was intended to rectify this drawback. The method used since 1985 is as follows. The current year CPS estimates of statewide median incomes of four-person families are used as dependent variables in a linear regression. In addition to the intercept term, the regression equation uses as independent variables the base-year census median(b) and the adjusted census median(c), where census median(b) represents the median income of four-person families in the state for the base year b from the most recently available decennial census. The adjusted census median(c) is obtained from the formula
Adjusted census median(c) = [BEA PCI(c)/BEA PCI(b)] × Census median(b).  (5.5.1)
In the above, BEA PCI(c) and BEA PCI(b) represent estimates of per-capita income produced by the Bureau of Economic Analysis of the U.S. Department of Commerce for the current year c and the base year b, respectively. Thus (5.5.1) attempts to adjust the base-year census median by the proportional growth in the BEA PCI to arrive at the current-year adjusted median. The inclusion of the census median(b) as a second independent variable is to adjust for any possible overstatement of the effect of change in BEA PCI upon the median income of four-person families.
Finally, a weighted average of the CPS sample estimate of the current median income and the regression estimates is obtained, weighting the sample estimates inversely proportional to the sampling variances and the regression estimates inversely
proportional to the model error due to the lack of fit for the census values of 1979 by the model with 1969 used as the base year.
The data consist of Yᵢ, the sample median income of four-person families in state i, and the associated variance Vᵢ (i = 1, …, 51). The true medians corresponding to the Yᵢ's are denoted by θᵢ's respectively. The design matrix X = (x₁, …, x₅₁)ᵀ is of the form

xᵢᵀ = (1, xᵢ₁, xᵢ₂), (i = 1, …, 51),  (5.5.2)

where xᵢ₁ and xᵢ₂ denote respectively the adjusted census median income and the base-year census median income for four-person families in the ith state.
We consider the HB model as given in Section 5.1. We choose Γ to be the ε-contamination class of priors, where the contamination class includes all unimodal distributions. We find RB estimates of statewise median incomes of four-person families for 1989, using 1979 as the base year, based on ML-II priors. In finding the RB estimates, we have used (5.2.29) and (5.2.30) with α₀ = 1, β₀ = 10 and ε = .1. The alternatives in our setting are HB and EB estimates of θᵢ. For the HB analysis, we use a diffuse prior on τ² instead of using a class of priors. The EB method is due to Morris (1983b), which uses estimates of τ² and β rather than assigning any prior distribution to the hyperparameters. Since, in addition to the sample estimates, the 1989 incomes are also available from the 1990 census data, we compare all three Bayes estimates against the 1990 census figures, treating the latter as the "truth". Table 5.1 provides the "truth", the CPS estimates and the three Bayes estimates, whereas Table 5.2 provides the standard errors associated with the three Bayes estimates.
Now these estimates are compared according to several criteria. Suppose e_{iTR} denotes the true median income for the ith state, and eᵢ any estimate of e_{iTR}. Then for the estimate e = (e₁, …, e₅₁)ᵀ of e_{TR} = (e_{1TR}, …, e_{51TR})ᵀ, let

Average Relative Bias = (51)⁻¹ ∑_{i=1}^{51} |eᵢ − e_{iTR}|/e_{iTR};
Average Squared Relative Bias = (51)⁻¹ ∑_{i=1}^{51} |eᵢ − e_{iTR}|²/e_{iTR}²;
Average Absolute Bias = (51)⁻¹ ∑_{i=1}^{51} |eᵢ − e_{iTR}|;
Average Squared Deviation = (51)⁻¹ ∑_{i=1}^{51} (eᵢ − e_{iTR})².
The above quantities are provided in Table 5.3 for all three Bayes estimates.
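The four criteria are straightforward to compute; a minimal sketch (note the squared relative bias is taken here as |eᵢ − e_{iTR}|²/e_{iTR}², an assumption where the original denominator is ambiguous):

```python
def comparison_criteria(e, e_true):
    """Average relative bias, average squared relative bias, average absolute
    bias and average squared deviation of the estimates e against e_true."""
    n = len(e)
    return {
        "avg_rel_bias":    sum(abs(a - t) / t for a, t in zip(e, e_true)) / n,
        "avg_sq_rel_bias": sum((a - t) ** 2 / t ** 2 for a, t in zip(e, e_true)) / n,
        "avg_abs_bias":    sum(abs(a - t) for a, t in zip(e, e_true)) / n,
        "avg_sq_dev":      sum((a - t) ** 2 for a, t in zip(e, e_true)) / n,
    }

# Two hypothetical states: estimates vs. census "truth".
print(comparison_criteria([110.0, 95.0], [100.0, 100.0]))
```

In the application, e would be the 51 RB, HB or EB estimates and e_true the 1990 census figures.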
From Table 5.1, one can see that the three Bayes estimates of the small area means are quite close to each other. But from Table 5.2, it appears that the estimated standard errors given by the RB method are larger than those given by the HB or EB methods. Table 5.3 indicates that the EB estimates are the best and the HB estimates are the second best under all criteria for this data set. As anticipated, the HB standard errors are slightly bigger than the EB standard errors. As is well known, this phenomenon can be explained by the fact that, unlike the EB estimators, the HB estimators take into account the uncertainty in estimating the prior parameters.
Next, we find the ranges of the posterior means for the small areas under the ε-contamination class. Table 5.4a and Table 5.4b provide the ranges of posterior means under the ε-contamination class when (α₀, β₀) = (1, 10) and (α₀, β₀) = (7, 3) respectively. For (α₀, β₀) = (1, 10), the ranges are fairly small and robustness seems to be achieved using this class for all ε values. But for (α₀, β₀) = (7, 3), the ranges are relatively wider in comparison with (α₀, β₀) = (1, 10). As one might expect, the choice of h₀, that is, the elicited inverse gamma prior, seems to have some effect on the ranges of the posterior means for the ε-contamination class. Note that the inverse gamma prior of τ² with (α₀, β₀) = (1, 10) has coefficient of variation 1/√8 compared to 1 with (α₀, β₀) = (7, 3). Although the two priors have very similar tails, the former is much flatter than the latter even for small values of τ². This suggests that the bigger the
Figure 5.1. The IG(1, 10) Prior
coefficient of variation of the assumed inverse gamma prior, the wider is the range of the posterior means of the θᵢ's.
We also find the ranges of the posterior means under density ratio classes. In computing the bounds, we have used a Gibbs sampler with 10 independent sequences, each with a sample of size 1000 after a burn-in sample of another 1000. We adopt the basic approach of Gelman and Rubin (1992) to monitor the convergence of the Gibbs sampler. We have obtained the R̂ values (the potential scale reduction factors) corresponding to the estimand θᵢ based on 10 × 1000 = 10000 simulated values. The fact that all the point estimates R̂ are equal to 1, as well as the near equality of these point estimates and the corresponding 97.5% quantiles, suggests that convergence is achieved in the Gibbs sampler. Table 5.5a and Table 5.5b provide the ranges of the
Figure 5.2. The IG(7, 3) Prior
posterior means under density ratio classes when (α₀, β₀) = (1, 10) and (α₀, β₀) = (7, 3) respectively. Note, however, that here the bounds given for IG(7, 3) are much closer than the ones for IG(1, 10). The intuitive explanation for this phenomenon is that while the ratio h₀(τ²)/h₀(τ'²) can be extremely large for certain choices (τ², τ'²) corresponding to the IG(1, 10) prior, the ratio is more under control for the IG(7, 3) prior. The density ratio classes are very convenient for representing vague prior knowledge, and robustness seems to be achieved using the IG(7, 3) prior.
Finally, we consider the ranges of the posterior quantities E[θᵢ|y, τ²] over the class of uniform priors. From Table 5.6, we can see that the ranges are not sensitive to the choice of the upper bound of τ². Also, for most of the states, the ranges of the posterior means do not seem to be too wide.