Citation

## Material Information

Title:
Robust Bayesian inference in finite population sampling
Creator:
Kim, Dal-Ho, 1959-
Publication Date:
Language:
English
Physical Description:
vii, 122 leaves : ill. ; 29 cm.

## Subjects

Subjects / Keywords:
Bayes estimators ( jstor )
Bayesian analysis ( jstor )
Estimators ( jstor )
Income estimates ( jstor )
Mathematical robustness ( jstor )
Median income ( jstor )
Population estimates ( jstor )
Population mean ( jstor )
Statistical estimation ( jstor )
Statistics ( jstor )
Dissertations, Academic -- Statistics -- UF
Statistics thesis Ph.D
Genre:
bibliography ( marcgt )
non-fiction ( marcgt )

## Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 1994.
Bibliography:
Includes bibliographical references (leaves 116-121).
General Note:
Typescript.
General Note:
Vita.
Statement of Responsibility:
by Dal-Ho Kim.

## Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Resource Identifier:
020776191 ( ALEPH )
32801367 ( OCLC )

Full Text

ROBUST BAYESIAN INFERENCE IN FINITE POPULATION SAMPLING

By

DAL-HO KIM

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA

1994

To my parents and teachers

ACKNOWLEDGEMENTS

I would like to express my sincere gratitude to Professor Malay Ghosh for being my advisor and for originally proposing the problem. Words cannot simply express how grateful I am for his patience, encouragement and invaluable guidance. Without his help it would not have been possible to complete the work. I would like to thank Professors Michael A. DeLorenzo, Alan Agresti, James G. Booth, Brett D. Presnell and Jeffrey A. Longmate for their encouragement and advice while serving on my committee.
Much gratitude is owed to my parents, brother, sisters, mother-in-law, brother-in-law and sisters-in-law, whose support, advice, guidance and prayers throughout the years of my life have made this achievement possible. My debt to them can never be repaid in full. Very special thanks are offered to my wife, Kyungok, for her love, patience, support and prayers, and my daughters, Seunghyun and Grace, for being a joy to us.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS

ABSTRACT

CHAPTERS

1 INTRODUCTION

1.1 Literature Review
1.2 The Subject of This Dissertation

2 ROBUST BAYES ESTIMATION OF THE FINITE POPULATION MEAN

2.1 Introduction
2.2 Robust Bayes Estimators
2.3 Symmetric Unimodal Contamination
2.4 An Example

3 ROBUST BAYES COMPETITORS OF THE RATIO ESTIMATOR

3.1 Introduction
3.2 ε-Contamination Model and the ML-II Prior
3.3 Symmetric Unimodal Contamination
3.4 An Example

4 BAYESIAN ANALYSIS UNDER HEAVY-TAILED PRIORS

4.1 Introduction
4.2 Known Variance
4.3 Unknown Variance
4.4 An Example

5 BAYESIAN ROBUSTNESS IN SMALL AREA ESTIMATION

5.1 Introduction
5.2 ε-Contamination Class
5.3 Density Ratio Class .... 84
5.4 Class of Uniform Priors .... 86
5.5 An Example .... 87

6 SUMMARY AND FUTURE RESEARCH .... 114

6.1 Summary .... 114
6.2 Future Research .... 115

BIBLIOGRAPHY .... 116

BIOGRAPHICAL SKETCH .... 122

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy

ROBUST BAYESIAN INFERENCE IN FINITE POPULATION SAMPLING

By

Dal-Ho Kim

August 1994

Chairman: Malay Ghosh
Major Department: Statistics

This dissertation considers Bayesian robustness in the context of finite population sampling. Although we are concerned exclusively with the finite population mean, the general methods are applicable to other parameters of interest as well.

First, we consider some robust Bayes competitors of the sample mean and the subjective Bayes estimators of the finite population mean. Specifically, we have proposed some robust Bayes estimators using ML-II priors from the ε-contamination class. Classes of contaminations that are considered include all possible contaminations and all symmetric, unimodal contaminations. These estimators are compared with the sample mean and subjective Bayes estimators in terms of "posterior robustness" and "procedure robustness". Similar robust Bayes estimators are introduced in the presence of auxiliary information, and these are compared again in terms of "posterior robustness" and "procedure robustness" with the ratio estimator and subjective Bayes estimators utilizing the auxiliary information. Also, we provide the range where the posterior mean belongs under ε-contamination models.

Second, we consider the idea of developing prior distributions which provide Bayes rules that are inherently robust with respect to reasonable misspecification of the prior. We provide robust Bayes estimators of the finite population mean based on heavy-tailed prior distributions using scale mixtures of normals. A Monte Carlo method, the Gibbs sampler, has been used for implementation of the Bayesian program. Also, the asymptotic optimality properties of the proposed robust Bayes estimators are proved.

Finally, we address Bayesian robustness in the hierarchical Bayes setting for small area estimation. We provide robust hierarchical Bayes estimators of the small area means based on ML-II priors under the ε-contamination class, where the contamination class includes all unimodal distributions. Also, we provide the range where the small area means belong under the ε-contamination class as well as the density ratio class of priors. For the class of priors that are uniform over a specified interval, we investigate the sensitivity to the choice of the interval. The methods are illustrated using data related to the median income of four-person families in the 50 states and the District of Columbia.

CHAPTER 1

INTRODUCTION

1.1 Literature Review

When prior information about an unknown parameter, say θ, is available, Bayesian analysis requires quantification of this information in the form of a (prior) distribution on the parameter space. The most frequent criticism of subjective Bayesian analysis is that it supposedly assumes an ability to quantify available prior information completely and accurately in terms of a single prior distribution. Given the common and unavoidable practical limitations on factors such as available prior elicitation techniques and time, it is rather unrealistic to expect that prior information can be quantified in terms of one distribution with complete accuracy. In view of this difficulty in prior elicitation, there has long been a robust Bayesian viewpoint which assumes only that subjective information can be quantified in terms of a class Γ of prior distributions. A procedure is then said to be robust if its inferences are relatively insensitive to the variation of the prior distribution over Γ.

The robust Bayesian idea can be traced back to Good as early as 1950 (see, for example, Good (1965)), and has been popularized in recent years, notably by the stimulating article of Berger (1984). There is a rapidly growing literature in the area of robust Bayesian analysis. Berger (1984, 1985, 1990) and Wasserman (1992) provide reviews and discussion of the various issues and approaches. Berger (1984) discusses the philosophical or pragmatic reasons for adopting the robust Bayesian

viewpoint along with a review of some of the leading approaches to robust Bayesian analysis. More recently, Berger (1990) provides a review of different approaches to the selection of F and techniques used in the analyses. Wasserman (1992) discusses several approaches to computation of bounds on posterior expectations for certain classes of priors as well as different approaches to graphical and numerical summarization.

Various classes of priors have been proposed and studied in the literature. One such class is the ε-contamination class, having the form

Γ = {π : π(θ) = (1 - ε)π₀(θ) + εq(θ), q ∈ Q},    (1.1.1)

where Q is the class of allowable contaminations, and π₀ is a fixed prior, commonly called the base prior, which can be thought of as the prior one would use if a single prior had to be chosen. ε-contamination priors have been used by Huber (1973), Berger (1984, 1990), Berger and Berliner (1986), and Sivaganesan and Berger (1989), among others. Berger and Berliner (1986) studied the issue of selecting, in a data-dependent fashion, a "good" prior distribution (the type-II maximum likelihood prior) from the ε-contamination class, and using this prior in any subsequent analysis. Sivaganesan and Berger (1989) determined the ranges of the various posterior quantities of interest (e.g. the posterior mean, posterior variance, posterior probability of a set (allowing for credible sets or tests)) as the prior varies over Γ. Wasserman (1989) provides a robust interpretation of likelihood regions in the case of all possible contaminations. Also related is Berger and Sellke (1987), who carry out the determination of the range of the posterior probability of a hypothesis when ε = 1 in (1.1.1) (i.e., when there is no specified subjective prior π₀).
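As a small numerical aside (my own sketch, not part of the dissertation: the N(0, 1) base prior, the standard Cauchy contamination q, and ε = 0.1 are all hypothetical choices), one can check that a member of the class (1.1.1) is still a proper density:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # base prior pi_0: here a hypothetical N(0, 1)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def cauchy_pdf(x):
    # one allowable contamination q in Q (standard Cauchy, chosen arbitrarily)
    return 1.0 / (math.pi * (1.0 + x * x))

def contaminated_prior(x, eps=0.1):
    # pi(theta) = (1 - eps) pi_0(theta) + eps q(theta), as in (1.1.1)
    return (1.0 - eps) * normal_pdf(x) + eps * cauchy_pdf(x)

# trapezoidal mass over a wide grid: should be close to 1
xs = [-200.0 + 0.01 * i for i in range(40001)]
mass = sum(0.01 * 0.5 * (contaminated_prior(a) + contaminated_prior(b))
           for a, b in zip(xs, xs[1:]))
print(round(mass, 3))
```

The tiny deficit from 1 comes only from the Cauchy tails outside the grid; any distribution q in Q yields a proper mixture prior in the same way.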
Another class of interest was considered by DeRobertis and Hartigan (1981) to find ranges of general posterior quantities. This class is of the form

{π : π(θ₁)/π(θ₂) ≤ g₁(θ₁)/g₂(θ₂) for all θ₁, θ₂},    (1.1.2)

where g₂ ≤ g₁ are given positive functions. Such a class is called a density ratio class by Berger (1990). Wasserman and Kadane (1992) develop a Monte Carlo approach to computing bounds on posterior expectations over the density ratio class. Cano (1993) considers the range of the posterior mean in hierarchical Bayes models for several classes of priors, including the ε-contamination class and the density ratio class.

Leamer (1982) and Polasek (1985) consider a class of conjugate priors with constraints on the domain of parameters to get closed-form expressions for posterior expectations, and perform sensitivity analyses. DasGupta and Studden (1988) consider the above conjugate class, as well as a density ratio class, in the context of Bayesian design of experiments.
The difficulty of working with a class of priors makes it very appealing to find prior distributions which provide Bayes rules that are naturally robust with respect to reasonable misspecification of the prior. Since Box and Tiao (1968, 1973), Dawid (1973), and O'Hagan (1979), it has been recognized that insensitivity to outliers in Bayesian analysis can be achieved through the use of flat-tailed distributions. Use of t-priors when the data are normal has been specifically considered by West (1985) and O'Hagan (1989). Angers and Berger (1991) consider robust Bayesian estimation of exchangeable means using Cauchy priors. In Angers (1992), the Cauchy prior is replaced by a t-prior with an odd number of degrees of freedom. Andrews and Mallows (1974) and West (1987) study scale mixtures of normal distributions, which can be used for achieving Bayesian robustness. The Student t family, the double exponential, the logistic, and the exponential power family can all be constructed as scale mixtures of normals (cf. Andrews and Mallows (1974)). West (1984) uses scale mixtures of normal distributions for both the random errors and the priors in Bayesian linear modelling. Carlin and Polson (1991) discuss Bayesian model specification and provide a general paradigm for the Bayesian modelling of nonnormal errors by using scale mixtures of normal distributions.
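The scale-mixture representation mentioned above can be checked numerically. The sketch below is my own illustration (the degrees of freedom ν = 3 and the quadrature settings are arbitrary choices): it integrates a N(0, λ) density against an Inverse-Gamma(ν/2, ν/2) mixing distribution on the scale λ and compares the result with the Student-t density.

```python
import math

def t_pdf(x, nu):
    # Student-t density with nu degrees of freedom
    c = math.gamma((nu + 1) / 2.0) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2.0))
    return c * (1.0 + x * x / nu) ** (-(nu + 1) / 2.0)

def normal_pdf(x, var):
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def invgamma_pdf(lam, a, b):
    # Inverse-Gamma(a, b) mixing density for the scale lambda
    return (b ** a) * lam ** (-(a + 1.0)) * math.exp(-b / lam) / math.gamma(a)

def t_as_scale_mixture(x, nu, lam_max=200.0, steps=100000):
    # numerically integrate N(x | 0, lambda) against lambda ~ IG(nu/2, nu/2)
    a = b = nu / 2.0
    h = lam_max / steps
    total = 0.0
    for i in range(1, steps + 1):
        lam = i * h
        total += normal_pdf(x, lam) * invgamma_pdf(lam, a, b) * h
    return total

nu = 3
for x in (0.0, 1.0, 2.5):
    print(round(t_pdf(x, nu), 4), round(t_as_scale_mixture(x, nu), 4))
```

The two columns agree to within the quadrature error, which is the sense in which a t-prior can be handled as a normal prior with a random scale, as exploited in Chapter 4.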

The references so far pertain to Bayesian analysis for infinite populations. Bayesian analysis in finite population sampling is of more recent vintage. A unified and elegant formulation of Bayes estimation in finite population sampling was given by Hill (1968) and Ericson (1969). Since then, many papers have appeared in the area of Bayes estimation in finite population sampling. But most of the Bayesian literature in survey sampling deals with subjective Bayesian analysis, in that the inference procedure is based on a single, completely specified prior distribution.
The need for robust Bayesian analysis in survey sampling, however, has been felt by some authors, although the methods described in the first few paragraphs of this section have not so far been used to achieve this end. Godambe and Thompson (1971) adopted a framework whereby the prior information could only be quantified up to a class Γ (C in their notation) of prior distributions. For estimating the population total in the presence of auxiliary information, they came up with the usual ratio and difference estimators, justifying these on the ground of location invariance. The model assumption there played a very minimal role, the main idea being that model-based inference statements could be replaced, in the case of model failure, by design-based inference. In a later study, Godambe (1982) considered the more common phenomenon of specific departures from the assumed model. His contention there was that sampling designs could be a useful companion of model-based inference procedures to generate "near-optimal" and "robust" estimators. However, the basic model assumed in that paper considered Y₁, …, Y_N to be independent, and attention was confined only to design and model unbiased estimators. Royall and Pfeffermann (1982) also consider robustness of certain Bayes estimators in finite populations. However, their main concern is to find out conditions under which the Bayes estimators under an assumed model remain the same under certain departures from the model.

Much of the earlier work as well as some of the current work in finite population sampling relates to inference for a single stratum. However, there is now a need for simultaneous inference from several strata, particularly in the context of small area estimation, which is becoming increasingly popular in survey sampling. Agencies of the Federal Government have been involved in obtaining estimates of population counts, unemployment rates, per capita income, crop yields and so forth for state and local government areas. In typical instances, only a few samples are available from an individual area, and an estimate of a certain area mean or simultaneous estimates of several area means can be improved by incorporating information from similar neighboring areas. Ghosh and Rao (1994) have recently surveyed the early history as well as the recent developments in small area estimation.

Bayesian methodology has been implemented in improving small area estimators. The empirical Bayes (EB) approach has been considered for simultaneous estimation of the parameters for several small areas, where each area contains a finite number of elements, by Fay and Herriot (1979), Ghosh and Meeden (1986), and Ghosh and Lahiri (1987). Datta and Ghosh (1991), as an alternative to the EB procedure, propose a hierarchical Bayes (HB) approach for estimation of small area means under general mixed linear models, and also discuss the computational aspects. To handle the case of heavy-tailed priors in the setting of Fay and Herriot, Datta and Lahiri (1994) use t-priors, viewing them as scale mixtures of normals.

1.2 The Subject of This Dissertation

This dissertation considers Bayesian robustness in the context of finite population sampling. Although we are concerned exclusively with the finite population mean, the general methods are applicable to other parameters of interest as well.

In Chapter 2, we develop some robust Bayes estimators of the finite population mean using ML-II priors under ε-contamination models. We compare the performance

of these estimators with the subjective Bayes estimators as well as the sample means using the criteria of "posterior robustness" and "procedure robustness" as suggested by Berger (1984). As a consequence of the results on procedure robustness, we have established some asymptotic optimality of the robust Bayes estimators even in the presence of a subjective prior. Furthermore, modifying slightly the arguments of Sivaganesan and Berger (1989), we have provided the range where the posterior mean belongs under ε-contamination models.

In Chapter 3, we provide expressions for variations of the posterior mean within the ε-contamination class of priors in the presence of auxiliary information, following Sivaganesan and Berger (1989). Moreover, similar robust Bayes predictors using the ML-II priors under this contamination class of priors are introduced in the presence of auxiliary information. We compare these estimators again in terms of "posterior robustness" and "procedure robustness" with the classical ratio estimators, and subjective Bayes predictors utilizing the auxiliary information. In the course of calculating indices for procedure robustness, we have proved the asymptotic optimality of the robust Bayes predictors.
In Chapter 4, we propose certain robust Bayes estimators of the finite population mean from a different perspective. In order to overcome the problem associated with outliers in the context of finite population sampling, we develop Bayes estimators of the finite population mean based on heavy-tailed priors using scale mixtures of normals. Also, we study the asymptotic optimality property of the proposed Bayes estimators. For implementation we have devised a computer-intensive fully Bayesian procedure which uses Markov chain Monte Carlo integration techniques like the Gibbs sampler.

Chapter 5 addresses Bayesian robustness in the hierarchical Bayes setting in the context of small area estimation. We provide the robust hierarchical Bayes estimators of the small area means based on ML-II priors under the ε-contamination class of


priors where the contamination class includes all unimodal distributions. We compare these estimators with HB and EB estimators. Also, we provide the range where the small area means belong under the ε-contamination class as well as the density ratio class of priors.

Finally, in Chapter 6, we discuss some open problems which will be topics for future research.

CHAPTER 2
ROBUST BAYES ESTIMATION OF THE FINITE POPULATION MEAN

2.1 Introduction

Consider a finite population U with units labeled 1, 2, …, N. Let y_i denote the value of a single characteristic attached to the unit i. The vector y = (y₁, …, y_N)ᵀ is the unknown state of nature, and is assumed to belong to Θ = R^N. A subset s of {1, 2, …, N} is called a sample. Let n(s) denote the number of elements belonging to s. The set of all possible samples is denoted by S. A design is a function p on S such that p(s) ∈ [0, 1] for all s ∈ S and Σ_{s∈S} p(s) = 1. Given y ∈ Θ and s = {i₁, …, i_{n(s)}} with 1 ≤ i₁ < ⋯ < i_{n(s)} ≤ N, let y(s) = (y_{i₁}, …, y_{i_{n(s)}}). One of the main objectives in sample surveys is to draw inference about y or some function (real or vector valued) γ(y) of y on the basis of s and y(s).
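In this framework a design is simply a probability distribution on S. A minimal sketch (the population size, sample size, and y-values below are hypothetical) that enumerates S for a fixed sample size and verifies the defining property Σ_{s∈S} p(s) = 1 for simple random sampling:

```python
from itertools import combinations

N, n = 6, 2
units = range(1, N + 1)
S = list(combinations(units, n))           # all samples s with n(s) = n
p = {s: 1.0 / len(S) for s in S}           # simple random sampling design
assert abs(sum(p.values()) - 1.0) < 1e-12  # a design must satisfy sum_s p(s) = 1

y = {1: 2.0, 2: 3.5, 3: 1.0, 4: 4.0, 5: 2.5, 6: 3.0}  # hypothetical y-values
ybar = {s: sum(y[i] for i in s) / n for s in S}        # ybar(s) for each sample
# under SRS the design expectation of the sample mean is the population mean
E_ybar = sum(p[s] * ybar[s] for s in S)
print(round(E_ybar, 6), round(sum(y.values()) / N, 6))
```

The last line illustrates the design-unbiasedness of the sample mean that is invoked later in this section.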

Bayes estimators of the finite population total (or the mean) within the class of linear unbiased estimators were obtained by Godambe (1955). Subsequently, Godambe and Joshi (1965) found such estimators within the class of all unbiased estimators. A general Bayesian approach to finite population sampling was initiated by Hill (1968) and Ericson (1969). Since then, a huge literature has grown in this area. Ericson (1983) provides a recent account.

We have discussed the use of a single subjective prior in Section 1.1. In this chapter, we generate certain robust Bayes estimators using ε-contamination priors, and study their performance over a broad class of prior distributions on the parameter

space. As we mentioned in Section 1.2, we have restricted ourselves to the estimation of the finite population mean. It may be noted that in our framework, the sampling design does not play any role in the robustness study. More generally, in the Bayesian framework, the irrelevance of the sampling design at the inference stage has been pointed out earlier by Godambe (1966), Basu (1969) and Ericson (1969). Our findings are consistent with that. This, however, does not undermine the importance of sampling design in the actual choice of units in a survey. It is very crucial to design a survey before its actual execution, and we have made some comments on why a simple random sample is justified even within our framework.

Royall and Pfeffermann (1982) also consider robustness of certain Bayes estimators. However, their main concern is to find out conditions under which the Bayes estimators under an assumed model remain the same under certain departures from the model.
The contents of the remaining sections are as follows. In Section 2.2, we develop some robust Bayes estimators of the finite population mean using ML-II priors under ε-contamination models (see Good (1965) and Berger and Berliner (1986)), where the contamination class includes all possible distributions. As in Berger (1984), the concepts of posterior and procedure robustness are introduced, and the proposed robust Bayes estimators are compared to the sample means and the subjective Bayes estimators under these criteria. It turns out that the robust Bayes estimator enjoys good posterior as well as procedure robustness relative to the sample mean, while the subjective Bayes estimator enjoys good posterior robustness, but lacks procedure robustness. As a consequence of the results on procedure robustness, we have established some asymptotic (as the sample size increases) optimality of the robust Bayes estimators in the presence of a subjective prior. Specifically, we have shown that the difference of the Bayes risks of the proposed robust Bayes estimator and the optimal Bayes estimator converges to zero at a certain rate as the sample size tends

to infinity. Also, modifying slightly the arguments of Sivaganesan (1988), we have provided the range where the posterior mean belongs under ε-contamination models. This pertains to the sensitivity analysis of the proposed robust Bayes procedure.

Section 2.3 contains the more realistic framework when the contamination class contains only symmetric unimodal distributions. We have developed robust Bayes estimators as competitors of the classical estimator as well as the subjective Bayes estimator, and have studied their properties.
Finally, in Section 2.4, a numerical example is provided to illustrate the methods described in Sections 2.2 and 2.3.

One of the highlights of this chapter is the analytical study of procedure robustness and the subsequent asymptotic optimality in the presence of a subjective prior. To our knowledge, this issue has not been addressed in previous work.
For simplicity, in the subsequent sections, only the case where p(s) > 0 if and only if n(s) = n will be considered. This amounts to considering only fixed samples of size n. Also, throughout this chapter, the loss is assumed to be squared error.

2.2 Robust Bayes Estimators

Consider the case when y_i | θ ~ iid N(θ, σ²) (i = 1, …, N) and θ ~ N(μ₀, τ₀²). Write M₀ = σ²/τ₀², B₀ = M₀/(M₀ + n), ȳ(s) = n^{-1}Σ_{i∈s} y_i and y(s̄) = {y_i : i ∉ s}, the suffixes in y(s̄) being arranged in ascending order. From Ericson (1969), it follows that the posterior distribution of y(s̄) given s and y(s) is N({(1 - B₀)ȳ(s) + B₀μ₀}1_{N-n}, σ²(I_{N-n} + (M₀ + n)^{-1}J_{N-n})), where 1_u is a u-component column vector with each element equal to 1, J_u = 1_u 1_uᵀ, and I_u is the identity matrix of order u. Then the Bayes estimator of γ(y) = N^{-1}Σ_{i=1}^N y_i is

δ^0(s, y(s)) = E[γ(y) | s, y(s)] = f ȳ(s) + (1 - f){(1 - B₀)ȳ(s) + B₀μ₀}
             = ȳ(s) - (1 - f)B₀(ȳ(s) - μ₀),    (2.2.1)

where f = n/N. Strictly speaking, we should call δ^0 a predictor of γ(y), since δ^0 is the sum of the observed y_i's plus a predictor of the total of the unobserved y_i's, but we shall use the two terms "estimator" and "predictor" interchangeably. Also, the associated posterior variance is given by

V(γ(y) | s, y(s)) = N^{-2}(N - n)σ²(M₀ + N)/(M₀ + n).    (2.2.2)

The classical estimator of γ(y) is δ^C(s, y(s)) = ȳ(s), which is a design-unbiased estimator of γ(y) under simple random sampling. Also, δ^C is a model-unbiased estimator of γ(y) under any model which assumes that the y_i's have a common mean.
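The estimator (2.2.1) and the posterior variance (2.2.2) are straightforward to compute. A small sketch (the sample values and hyperparameters below are hypothetical, and the helper name is mine):

```python
def bayes_estimate(y_s, N, mu0, sigma2, tau2):
    # delta0 = ybar(s) - (1 - f) B0 (ybar(s) - mu0)               -- (2.2.1)
    n = len(y_s)
    f = n / N
    M0 = sigma2 / tau2
    B0 = M0 / (M0 + n)
    ybar = sum(y_s) / n
    est = ybar - (1 - f) * B0 * (ybar - mu0)
    # posterior variance N^{-2}(N - n) sigma^2 (M0 + N)/(M0 + n)  -- (2.2.2)
    var = (N - n) * sigma2 * (M0 + N) / (N ** 2 * (M0 + n))
    return est, var

y_s = [10.2, 9.1, 11.3, 10.8, 9.6]   # hypothetical sample of size n = 5
est, var = bayes_estimate(y_s, N=50, mu0=10.0, sigma2=4.0, tau2=2.0)
print(round(est, 4), round(var, 6))
```

As a sanity check, letting τ² grow (a vague prior) drives B₀ to 0 and the estimate back to the sample mean ȳ(s).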
To derive robust Bayes estimators of γ(y), we first introduce the notion of ε-contaminated priors. Denote by π₀ the N(μ₀, τ₀²) distribution. The class Γ_ε of prior distributions is given by

Γ_ε = {π : π = (1 - ε)π₀ + εq, q ∈ Q},    (2.2.3)

where 0 < ε < 1 is given, and Q is the class of all distribution functions. We denote by m(y|π) the marginal (predictive) density of y under the prior π. If π ∈ Γ_ε, we can write

m(y|π) = (1 - ε)m(y|π₀) + ε m(y|q).

This leads to

m(y(s)|π) = (1 - ε)m(y(s)|π₀) + ε m(y(s)|q).    (2.2.4)

Our objective is to choose the prior π which maximizes m(y(s)|π) over Γ_ε. This amounts to maximization of m(y(s)|q) over q ∈ Q. Noting that

m(y(s)|q) = ∫_{-∞}^{∞} (2πσ²)^{-n/2} exp[-Σ_{i∈s}(y_i - θ)²/(2σ²)] q(dθ),    (2.2.5)

it follows that m(y(s)|q) is maximized with respect to the prior which is degenerate at ȳ(s). We shall denote this prior by δ_{⟨ȳ(s)⟩}, δ being the Dirac delta function. The resulting (estimated) prior, say π̂_ε, is now given by

π̂_ε = (1 - ε)π₀ + ε δ_{⟨ȳ(s)⟩}.    (2.2.6)

The prior π̂_ε is called the ML-II prior by Good (1965), and Berger and Berliner (1986). Under the prior π̂_ε, y(s) is marginally distributed as

(1 - ε)N(μ₀1_n, σ²I_n + τ₀²J_n) + εF_s,    (2.2.7)

where F_s has (improper) pdf

f(y_i; i ∈ s) = (2πσ²)^{-n/2} exp[-Σ_{i∈s}(y_i - ȳ(s))²/(2σ²)].    (2.2.8)

We now prove the following theorem, which provides the posterior distribution of y(s̄) given s and y(s) under the ML-II prior π̂_ε.

Theorem 2.2.1 Under the ML-II prior π̂_ε, the conditional distribution of y(s̄) given s and y(s) is

λ_ML(ȳ(s)) N({(1 - B₀)ȳ(s) + B₀μ₀}1_{N-n}, σ²(I_{N-n} + (M₀ + n)^{-1}J_{N-n}))
+ (1 - λ_ML(ȳ(s))) N(ȳ(s)1_{N-n}, σ²I_{N-n}),    (2.2.9)

where

λ_ML(ȳ(s)) = [1 + ε(1 - ε)^{-1}B₀^{-1/2} exp(nB₀(ȳ(s) - μ₀)²/(2σ²))]^{-1}.    (2.2.10)

Proof of Theorem 2.2.1 The conditional pdf of y(s̄) given s and y(s) is

π̂_ε(y(s̄)|s, y(s)) = ∫ f(y(s̄)|θ) π̂_ε(θ|s, y(s)) dθ,    (2.2.11)

where

π̂_ε(θ|s, y(s)) = λ̂(y(s)) π₀(θ|s, y(s)) + (1 - λ̂(y(s))) q̂(θ|s, y(s)),    (2.2.12)

and

λ̂(y(s)) = (1 - ε) m(y(s)|π₀)/m(y(s)|π̂_ε).    (2.2.13)

From Ericson (1969), π₀(y(s̄)|s, y(s)) is the N[{(1 - B₀)ȳ(s) + B₀μ₀}1_{N-n}, σ²(I_{N-n} + (M₀ + n)^{-1}J_{N-n})] pdf. Also, with q̂ = δ_{⟨ȳ(s)⟩},

∫ f(y(s̄)|θ) q̂(θ|s, y(s)) dθ = ∫ f(y(s̄)|θ) f(y(s)|θ) q̂(θ) dθ / m(y(s)|q̂)
= m(y|q̂)/m(y(s)|q̂)
= (2πσ²)^{-(N-n)/2} exp[-Σ_{i∈s̄}(y_i - ȳ(s))²/(2σ²)].    (2.2.14)

Further, after some algebraic simplifications,

λ̂(y(s)) = [1 + ε(1 - ε)^{-1} m(y(s)|q̂)/m(y(s)|π₀)]^{-1} = λ_ML(ȳ(s)).    (2.2.15)

This completes the proof of the theorem.

Note that under the posterior distribution given in (2.2.9), the Bayes estimator of γ(y) is given by

δ^RB(s, y(s)) = N^{-1}[nȳ(s) + (N - n){λ_ML(ȳ(s))((1 - B₀)ȳ(s) + B₀μ₀) + (1 - λ_ML(ȳ(s)))ȳ(s)}]
             = ȳ(s) - (1 - f)λ_ML(ȳ(s))B₀(ȳ(s) - μ₀).    (2.2.16)

Note that for ε close to zero, i.e. when one is very confident about the N(μ₀, τ₀²) prior for θ, λ_ML(ȳ(s)) is close to 1, and it follows that δ^RB is very close to δ^0. On the other hand, when ε is close to 1, i.e. when there is very little confidence in the N(μ₀, τ₀²) prior, λ_ML(ȳ(s)) is close to zero, and δ^RB is very close to the sample mean ȳ(s). Thus, as one might expect, a robust Bayes estimator serves as a compromise between the

subjective Bayes and the classical estimators of γ(y). Also, generalizing the formula (1.8) in Berger and Berliner (1986), i.e.,

V^{π̂}(x) = λ(x)V^{π₀}(x) + (1 - λ(x))V^{q̂}(x) + λ(x)(1 - λ(x))(δ^{π₀}(x) - δ^{q̂}(x))²,

one gets

V(γ(y)|s, y(s)) = N^{-2}[(N - n)σ² + (N - n)²σ²λ_ML(ȳ(s))/(M₀ + n)
+ (N - n)²λ_ML(ȳ(s))(1 - λ_ML(ȳ(s)))B₀²(ȳ(s) - μ₀)²].    (2.2.17)
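The compromise behaviour of the robust Bayes estimator is easy to verify numerically. In the sketch below (the inputs are hypothetical, and the helper names lambda_ml and delta_rb are mine), the ML-II weight λ_ML is computed from (2.2.10) and δ^RB from (2.2.16), and the two limiting cases ε near 0 and ε near 1 are checked:

```python
import math

def lambda_ml(ybar, n, mu0, sigma2, B0, eps):
    # lambda_ML(ybar(s)) as in (2.2.10)
    g = (eps / (1.0 - eps)) * B0 ** (-0.5)
    return 1.0 / (1.0 + g * math.exp(n * B0 * (ybar - mu0) ** 2 / (2.0 * sigma2)))

def delta_rb(ybar, n, N, mu0, sigma2, tau2, eps):
    # delta_RB = ybar(s) - (1 - f) lambda_ML B0 (ybar(s) - mu0)   -- (2.2.16)
    f, M0 = n / N, sigma2 / tau2
    B0 = M0 / (M0 + n)
    lam = lambda_ml(ybar, n, mu0, sigma2, B0, eps)
    return ybar - (1 - f) * lam * B0 * (ybar - mu0)

ybar, n, N, mu0, sigma2, tau2 = 10.2, 5, 50, 10.0, 4.0, 2.0
f, M0 = n / N, sigma2 / tau2
B0 = M0 / (M0 + n)
delta0 = ybar - (1 - f) * B0 * (ybar - mu0)   # subjective Bayes estimator (2.2.1)
print(round(delta_rb(ybar, n, N, mu0, sigma2, tau2, eps=0.001), 4))  # close to delta0
print(round(delta0, 4))
print(round(delta_rb(ybar, n, N, mu0, sigma2, tau2, eps=0.999), 4))  # close to ybar
```

For intermediate ε the estimate falls between δ^0 and ȳ(s), which is exactly the compromise described in the text.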

Next, we compare the performances of δ^0, δ^C and δ^RB from the robustness perspective. The main idea is to examine whether these estimators perform satisfactorily over a broad class of priors.

To this end, for a given prior ξ, denote by ρ(ξ, (s, y(s)), a) the posterior risk of an estimator a(s, y(s)) of γ(y), i.e., ρ(ξ, (s, y(s)), a) = E[{a(s, y(s)) - γ(y)}² | s, y(s)]. The following definition is taken from Berger (1984).

Definition 2.2.1 An estimator a₀(s, y(s)) is ε-posterior robust with respect to Γ if, for the observed (s, y(s)),

POR_Γ(a₀) = sup_{ξ∈Γ} |ρ(ξ, (s, y(s)), a₀) - inf_{a∈A} ρ(ξ, (s, y(s)), a)| ≤ ε.    (2.2.18)

We shall, henceforth, refer to the left-hand side of (2.2.18) as the posterior robustness index of the estimator a₀(s, y(s)) of γ(y) under the class of priors Γ. POR_Γ(a₀) is, in a sense, the sensitivity index of the estimator a₀ of γ(y) as the prior varies over Γ. For any given ε > 0, it is very clear that whether or not posterior robustness holds will often depend on which (s, y(s)) is observed. This will be revealed in the examples to follow.

To examine the posterior robustness of δ^0, δ^C and δ^RB, consider the class of N(μ, τ²) priors, μ (real) and τ²(> 0). Write M = σ²/τ², B = M/(M + n), where σ²(> 0) is known. Calculations similar to (2.2.1) now give the Bayes estimator of γ(y) under the N(μ, τ²) prior (to be denoted by ξ_{μ,B}) as

δ^{μ,B}(s, y(s)) = f ȳ(s) + (1 - f){Bμ + (1 - B)ȳ(s)}
                 = ȳ(s) - (1 - f)B(ȳ(s) - μ).    (2.2.19)

Then the following results hold:

ρ(ξ_{μ,B}, (s, y(s)), δ^{μ,B}) = N^{-2}(N - n)σ²(M + N)/(M + n);    (2.2.20)

ρ(ξ_{μ,B}, (s, y(s)), δ^0) - ρ(ξ_{μ,B}, (s, y(s)), δ^{μ,B})
= (1 - f)²[B₀(μ - μ₀) + (B₀ - B)(ȳ(s) - μ)]²;    (2.2.21)

ρ(ξ_{μ,B}, (s, y(s)), δ^C) - ρ(ξ_{μ,B}, (s, y(s)), δ^{μ,B})
= (1 - f)²B²(ȳ(s) - μ)²;    (2.2.22)

ρ(ξ_{μ,B}, (s, y(s)), δ^RB) - ρ(ξ_{μ,B}, (s, y(s)), δ^{μ,B})
= (1 - f)²[B₀λ_ML(ȳ(s))(ȳ(s) - μ₀) - B(ȳ(s) - μ)]².    (2.2.23)

From (2.2.21) - (2.2.23) it is very clear that if we consider the class Γ of all N(μ, τ²) priors, for each one of the estimators δ^0, δ^C and δ^RB, the supremum (over μ (real)) of the left-hand side of (2.2.21) - (2.2.23) becomes +∞, and all these estimators turn out to be non-robust. One reason why this happens is that the N(μ, τ²) class of priors for all real μ and τ²(> 0) is indeed too big to be practically useful. As a next step, we consider the smaller class of N(μ₀, τ²) priors, where the mean μ₀ is specified. This is not too unrealistic since, very often from prior experience, one can make a reasonable guess at the center of the distribution.

Note that when μ = μ₀, denoting ξ_{μ₀,B} by ξ_B and δ^{μ₀,B} by δ^B, (2.2.21) - (2.2.23) simplify to

ρ(ξ_B, (s, y(s)), δ^0) - ρ(ξ_B, (s, y(s)), δ^B) = (1 - f)²(B₀ - B)²(ȳ(s) - μ₀)²;    (2.2.24)

ρ(ξ_B, (s, y(s)), δ^C) - ρ(ξ_B, (s, y(s)), δ^B) = (1 - f)²B²(ȳ(s) - μ₀)²;    (2.2.25)

ρ(ξ_B, (s, y(s)), δ^RB) - ρ(ξ_B, (s, y(s)), δ^B) = (1 - f)²(B₀λ_ML(ȳ(s)) - B)²(ȳ(s) - μ₀)².    (2.2.26)

Accordingly, from (2.2.24) - (2.2.26),

POR_Γ(δ^0) = (1 - f)² max[B₀², (1 - B₀)²](ȳ(s) - μ₀)²,    (2.2.27)

POR_Γ(δ^C) = (1 - f)²(ȳ(s) - μ₀)²,    (2.2.28)

POR_Γ(δ^RB) = (1 - f)² max[B₀²λ²_ML(ȳ(s)), (1 - B₀λ_ML(ȳ(s)))²](ȳ(s) - μ₀)².    (2.2.29)

Thus, given any ε > 0 and 0 < f < 1, the posterior ε-robustness of all these procedures depends on the closeness of the sample mean to the prior mean μ₀. Also, it follows from (2.2.27) - (2.2.29) that both the subjective and robust Bayes estimators are more posterior robust than the sample mean for the {N(μ₀, τ²), τ² > 0} class of priors. A comparison between (2.2.27) and (2.2.29) also reveals that the robust Bayes estimator arrived at by employing the type-II ML prior enjoys a greater degree of posterior robustness than the subjective Bayes estimator if B₀λ_ML(ȳ(s)) > 1/2.
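The posterior robustness indices (2.2.27)-(2.2.29) can be compared directly. A sketch (all inputs hypothetical; the helper name is mine) that computes the three indices and confirms that, for these inputs, both Bayes-type estimators are more posterior robust than the sample mean:

```python
import math

def por_indices(ybar, n, N, mu0, sigma2, tau2_0, eps):
    # posterior robustness indices over the {N(mu0, tau^2), tau^2 > 0} class
    f = n / N
    M0 = sigma2 / tau2_0
    B0 = M0 / (M0 + n)
    g = (eps / (1.0 - eps)) * B0 ** (-0.5)
    lam = 1.0 / (1.0 + g * math.exp(n * B0 * (ybar - mu0) ** 2 / (2.0 * sigma2)))
    d2 = (ybar - mu0) ** 2
    por_bayes = (1 - f) ** 2 * max(B0 ** 2, (1 - B0) ** 2) * d2             # (2.2.27)
    por_mean = (1 - f) ** 2 * d2                                            # (2.2.28)
    por_rb = (1 - f) ** 2 * max((B0 * lam) ** 2, (1 - B0 * lam) ** 2) * d2  # (2.2.29)
    return por_bayes, por_mean, por_rb

pb, pm, prb = por_indices(ybar=10.2, n=5, N=50, mu0=10.0, sigma2=4.0,
                          tau2_0=2.0, eps=0.1)
print(pb < pm, prb < pm)
```

All three indices scale with (ȳ(s) - μ₀)², so every procedure degrades as the sample mean moves away from the prior mean, as noted in the text.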
Although the Bayesian thinks conditionally on (s, y(s)), and accordingly in terms of posterior robustness, it is certainly possible to use the overall Bayes risk in defining a suitable robustness criterion. This may not be totally unappealing even to a Bayesian at the preexperimental stage, when he/she perceives that y will occur according to a certain marginal distribution. The overall performance of an estimator a(s, y(s)) of γ(y) when the prior is ξ is given by r(ξ, a) = E[ρ(ξ, (s, y(s)), a)], the expectation being taken with respect to the resulting marginal distribution of y(s). The following method of measuring the robustness of a procedure is given in Berger (1984).

Definition 2.2.2 An estimator a₀(s, y(s)) of γ(y) is said to be ε₀-procedure robust with respect to Γ if

PR_Γ(a₀) = sup_{ξ∈Γ} |r(ξ, a₀) − inf_{a∈A} r(ξ, a)| ≤ ε₀.  (2.2.30)

We shall henceforth refer to PR_Γ(a₀) as the procedure robustness index of a₀.

Next we examine how the three estimators δ⁰, δ^C and δ^RB compare according to the PR criterion as given in (2.2.30) when we consider the {N(μ₀, τ²), τ² > 0} class of priors. Using the same notation ξ_B as before for the N(μ₀, τ²) prior, from (2.2.24) - (2.2.26) it follows that

r(ξ_B, δ⁰) − r(ξ_B, δ^B) = (1 − f)²(B₀ − B)²σ²/(nB);  (2.2.31)

r(ξ_B, δ^C) − r(ξ_B, δ^B) = (1 − f)²Bσ²/n;  (2.2.32)

r(ξ_B, δ^RB) − r(ξ_B, δ^B) = (1 − f)²E[(B₀λ_ML(ȳ(s)) − B)²(ȳ(s) − μ₀)²].  (2.2.33)

It is clear from (2.2.31) - (2.2.33) that

PR_Γ(δ⁰) = +∞;  (2.2.34)

PR_Γ(δ^C) = (1 − f)²σ²/n;  (2.2.35)

PR_Γ(δ^RB) = (1 − f)² sup_{0<B<1} E[(B₀λ_ML(ȳ(s)) − B)²(ȳ(s) − μ₀)²].  (2.2.36)
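The expectation inside (2.2.33) and (2.2.36) has no closed form, but it is easy to approximate by Monte Carlo, drawing ȳ(s) from its N(μ₀, σ²/(nB)) marginal. A minimal sketch, taking as an assumption the point-mass ML-II weight λ_ML(ȳ(s)) = {1 + g exp(B₀z²(s)/2)}⁻¹ with g = (ε/(1−ε))B₀^(−1/2); the function name and inputs are illustrative.

```python
import math, random

def risk_gap_rb(B, B0, sigma2, n, eps, mu0=0.0, reps=20000, seed=1):
    """Monte Carlo estimate of E[(B0*lam_ML(ybar) - B)^2 (ybar - mu0)^2],
    the quantity appearing in (2.2.33) and (2.2.36)."""
    rng = random.Random(seed)
    g = (eps / (1 - eps)) * B0 ** (-0.5)     # constant in the assumed ML-II weight
    sd = math.sqrt(sigma2 / (n * B))         # marginal sd of ybar(s) under the prior
    total = 0.0
    for _ in range(reps):
        ybar = rng.gauss(mu0, sd)
        z2 = n * (ybar - mu0) ** 2 / sigma2
        lam = 1.0 / (1.0 + g * math.exp(min(B0 * z2 / 2, 700)))  # overflow guard
        total += (B0 * lam - B) ** 2 * (ybar - mu0) ** 2
    return total / reps

# For small B this gap behaves like B^(1/2) (Theorem 2.2.2 below), which is
# larger than the O(B) gap (2.2.32) of the sample mean.
gap = risk_gap_rb(B=0.01, B0=0.5, sigma2=1.0, n=50, eps=0.1)
assert gap >= 0.0
```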
From (2.2.34) and (2.2.35) it is clear that the subjective Bayes estimator completely lacks procedure robustness, while the sample mean is quite procedure robust. To examine the procedure robustness of δ^RB, we proceed as follows.

Theorem 2.2.2 For every ε ∈ (0, 1), E[(B₀λ_ML(ȳ(s)) − B)²(ȳ(s) − μ₀)²] = O_e(B^{1/2}), where O_e denotes the exact order.

Proof of Theorem 2.2.2 Noting that n(ȳ(s) − μ₀)² ~ (σ²/B)χ₁², it follows from (2.2.36) that

E[(B₀λ_ML(ȳ(s)) − B)²(ȳ(s) − μ₀)²]
= ∫₀^∞ [B₀/{1 + g exp(uB₀/(2B))} − B]² (σ²u/(nB)) exp(−u/2) u^{1/2−1}{2^{1/2}Γ(1/2)}^{−1} du,  (2.2.37)

where g = (ε/(1 − ε))B₀^{−1/2}. Next observe that

rhs of (2.2.37)

≤ (σ²/n)[∫₀^∞ B₀²{1 + g exp(uB₀/(2B))}^{−2} B^{−1} u exp(−u/2) u^{1/2−1}{2^{1/2}Γ(1/2)}^{−1} du + B]

≤ (σ²/n)[(B₀²/(2g)) B^{−1} ∫₀^∞ exp(−(u/2)(1 + B₀B^{−1})) u^{3/2−1}{2^{3/2}Γ(3/2)}^{−1} du + B]

= (σ²/n)[(B₀²/(2g)) B^{−1}(1 + B₀B^{−1})^{−3/2} + B]

= O(B^{1/2}),  (2.2.38)

where the first inequality drops the negative cross-product term, and the second uses {1 + g exp(x)}² ≥ 2g exp(x) together with u·u^{1/2−1}{2^{1/2}Γ(1/2)}^{−1} = u^{3/2−1}{2^{3/2}Γ(3/2)}^{−1}. Again, writing g′ = max(g, 1), so that 1 + g exp(uB₀/(2B)) ≤ 2g′ exp(uB₀/(2B)),

rhs of (2.2.37)

≥ (σ²/(nB)) ∫₀^∞ [{B₀/(2g′)}² exp(−uB₀/B) − 2BB₀g^{−1} exp(−uB₀/(2B)) + B²] u exp(−u/2) u^{1/2−1}{2^{1/2}Γ(1/2)}^{−1} du

= (σ²/n)[(B₀/(2g′))² B^{−1}(1 + 2B₀B^{−1})^{−3/2} − 2B₀g^{−1}(1 + B₀B^{−1})^{−3/2} + B],

which is again of exact order B^{1/2}.  (2.2.39)

Combining (2.2.38) and (2.2.39), the result follows.

Thus, unlike the subjective Bayes estimator, the robust Bayes estimator using the ML-II prior does not suffer from lack of procedure robustness. Also, if one examines (2.2.32) and (2.2.33) (i.e., before taking the supremum over B), then, using Theorem 2.2.2, it appears that the classical estimator δ^C has a certain edge over δ^RB for small B. Smallness of B can arise in two ways: either n is very large, or τ² is much larger than σ². This is, however, natural to expect, since small B signifies a small variance ratio σ²/τ², which amounts to instability in the assessment of the prior distribution of θ. It is not surprising that in such a situation it is safer to use ȳ(s) in estimating γ(y) if one is seriously concerned about the long-run performance of an estimator.
Theorem 2.2.2 (specifically the upper bound derived in this theorem for the difference of Bayes risks) clearly demonstrates the procedure robustness of δ^RB. The superior performance of the POR index of this estimator relative to the POR index of δ^C has already been established. On the other hand, δ⁰, which performs better than δ^C on its POR index, fails miserably in its procedure robustness. This also shows that the average long-term performance of a procedure can sometimes be highly misleading when used as a yardstick for measuring its performance in a given situation. In this case, δ^RB seems to achieve a balance between the frequentist and the subjective Bayesian viewpoints.
In practice, however, σ² is not usually known. In such a situation, one can conceive of an inverse gamma prior for σ², independent of the prior for θ, to derive a Bayes estimator of γ(y) (see Ericson (1969)). In a robust Bayes approach, if one assumes a mixture of a normal-gamma prior and a type-II ML prior for (θ, σ²), then the type-II ML prior puts its mass on the point (ȳ(s), Σ_{i∈s}(yᵢ − ȳ(s))²/n). It is possible to study the robustness of the resulting Bayes estimator, but this will not be pursued here.

Next we consider the problem of finding the range of the posterior mean of γ(y) = N^{−1}Σ_{i=1}^N yᵢ over Γ in (2.2.3). Simple modifications of the arguments of Sivaganesan and Berger (1989) or Sivaganesan (1988) lead to the following theorem.

Theorem 2.2.3

sup_{π∈Γ} E[γ(y)|s, y(s)] = fȳ(s) + (1 − f)[a δ^{π₀}(y(s)) + θ_u f(y(s)|θ_u)]/[a + f(y(s)|θ_u)]  (2.2.40)

and

inf_{π∈Γ} E[γ(y)|s, y(s)] = fȳ(s) + (1 − f)[a δ^{π₀}(y(s)) + θ_l f(y(s)|θ_l)]/[a + f(y(s)|θ_l)],  (2.2.41)

where a = ε^{−1}(1 − ε)m(y(s)|π₀), δ^{π₀}(y(s)) = (1 − B₀)ȳ(s) + B₀μ₀, θ_l = v_l σ/√n + ȳ(s), θ_u = v_u σ/√n + ȳ(s), and the values of v_l and v_u are the solutions in v of the equation

e^{−v²/2} − cv² − bv + c = 0,  (2.2.42)

where c = a(2πσ²)^{n/2} exp[Σ_{i∈s}(yᵢ − ȳ(s))²/(2σ²)] and b = c√n(ȳ(s) − δ^{π₀}(y(s)))/σ.

Note that the equation (2.2.42) has exactly two solutions, which may be obtained by rewriting it in the form v = ±[−2 log(cv² + bv − c)]^{1/2} and solving iteratively using the initial values

v = [−b ± {b² + 2(1 + 2c)(1 + c)}^{1/2}]/(1 + 2c).
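Rather than trusting the raw fixed-point iteration to converge, a self-contained sketch can use the structure of the equation directly: F(0) = 1 + c > 0 and F(v) → −∞ as |v| → ∞, so each root can be bracketed and refined by bisection. Here b and c are treated as given inputs; in practice they are computed from the data as in the theorem.

```python
import math

def solve_extremal_v(b, c):
    """Find the two roots v_l < v_u of e^{-v^2/2} - c v^2 - b v + c = 0,
    i.e. equation (2.2.42).  F(0) = 1 + c > 0 while F -> -infinity as |v|
    grows, so there is one positive and one negative root; bracket each
    and refine by bisection."""
    F = lambda v: math.exp(-v * v / 2) - c * v * v - b * v + c

    def bisect(lo, hi):
        for _ in range(100):
            mid = (lo + hi) / 2
            if F(lo) * F(mid) <= 0:
                hi = mid
            else:
                lo = mid
        return (lo + hi) / 2

    hi = 1.0
    while F(hi) > 0:          # expand to the right until F changes sign
        hi *= 2
    lo = -1.0
    while F(lo) > 0:          # and to the left
        lo *= 2
    return bisect(lo, 0.0), bisect(0.0, hi)

# The quadratic approximation e^{-v^2/2} ~ 1 - v^2/2 yields the starting
# guesses (-b +/- sqrt(b^2 + 2(1+2c)(1+c)))/(1+2c) quoted in the text;
# bisection makes the refinement insensitive to those guesses.
v_l, v_u = solve_extremal_v(b=0.5, c=2.0)
assert v_l < 0 < v_u
```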

2.3 Symmetric Unimodal Contamination

The contamination class Q of the previous section contains all possible distributions. Needless to say, such a class is too big to be useful in practice, as it may contain many unwanted distributions. In this section, we consider the smaller but sufficiently rich class Q which contains all unimodal distributions symmetric about μ₀. This class was considered by Berger and Berliner (1986). The contamination prior pdf π is given by

π(θ) = (1 − ε)π₀(θ) + εq(θ),  (2.3.1)

where q ∈ Q, 0 ≤ ε < 1, and, as before, π₀(θ) is the N(μ₀, τ₀²) pdf. To find the ML-II prior in this case, we use the well-known fact that any symmetric unimodal

distribution is a mixture of symmetric uniform distributions (cf. Berger and Sellke (1987)). Thus it suffices to restrict q to Q′ = {q_k : q_k is uniform(μ₀ − k, μ₀ + k), k > 0}. The ML-II prior is then given by

π̂(θ) = (1 − ε)π₀(θ) + εq̂(θ),  (2.3.2)

where q̂ is uniform(μ₀ − k̂, μ₀ + k̂), k̂ being the value of k which maximizes m(y(s)|q_k), the marginal pdf of y(s) under q_k.

To find k̂, note that

m(y(s)|q_k) = ∫(2πσ²)^{−n/2} exp[−(2σ²)^{−1}Σ_{i∈s}(yᵢ − θ)²] q_k(dθ)

= (2k)^{−1} ∫_{μ₀−k}^{μ₀+k} (2πσ²)^{−n/2} exp[−(2σ²)^{−1}Σ_{i∈s}(yᵢ − θ)²] dθ

= (2πσ²)^{−(n−1)/2}(2k√n)^{−1} exp[−(2σ²)^{−1}Σ_{i∈s}(yᵢ − ȳ(s))²]
  × [Φ(√n k/σ − z(s)) − Φ(−√n k/σ − z(s))],  (2.3.3)

where Φ is the N(0, 1) cdf and z(s) = √n(ȳ(s) − μ₀)/σ. The solution k̂ (cf. Berger and Sellke (1987), p. 117) is given by

k̂ = k*σ/√n  if |z(s)| > 1;  k̂ = 0  if |z(s)| ≤ 1,  (2.3.4)

where k* satisfies

Φ(k* − z(s)) − Φ(−k* − z(s)) = k*[φ(k* − z(s)) + φ(k* + z(s))],

φ being the N(0, 1) pdf. Since both sides of the above equation are symmetric functions of z(s), we can replace z(s) by |z(s)| to get

Φ(k* − |z(s)|) − Φ(−k* − |z(s)|) = k*[φ(k* − |z(s)|) + φ(k* + |z(s)|)].  (2.3.5)

Clearly k* = 0 is a solution of (2.3.5). Berger (1985, p. 234) states without proof that there exists a unique solution k*(> 0) of (2.3.5) for |z(s)| > 1. Since the proof of the uniqueness result is nontrivial, and is very critical for many subsequent developments, we provide below a proof of the same.

Lemma 2.3.1 There exists a unique solution k* > |z(s)| of (2.3.5) for |z(s)| > 1.

Proof of Lemma 2.3.1 For notational simplicity, write k* = x and |z(s)| = y. Consider the function

h(x, y) = Φ(x − y) − Φ(−x − y) − x[φ(x − y) + φ(x + y)].  (2.3.6)

It suffices to show that for every y > 1, there exists a unique x > y for which h(x, y) = 0. To see this, first observe that

∂h(x, y)/∂x = x[(x − y)φ(x − y) + (x + y)φ(x + y)]
            = xφ(x + y)[(x − y) exp(2xy) + x + y].  (2.3.7)

Hence, if for some x₀ > 0, ∂h(x, y)/∂x|_{x=x₀} > 0 (the existence of such an x₀ is guaranteed since ∂h(x, y)/∂x|_{x=y} = 2y²φ(2y) > 0 for y > 0), then from (2.3.7), (x₀ − y) exp(2x₀y) + x₀ + y > 0. This implies that for every x₁ > x₀, (x₁ − y) exp(2x₁y) + x₁ + y > 0, which from (2.3.7) is equivalent to ∂h(x, y)/∂x|_{x=x₁} > 0. Thus, for a fixed y, once h(x, y) starts increasing at some x = x₊ (say), it continues to do so from x₊ onwards for that y. Moreover, h(0, y) = 0 for all y, h(y, y) < 0 for all y > 1, and h(+∞, y) = 1 for all y. This shows that for a fixed y > 1, h(x, y) first decreases as a function of x from 0 to a negative number, and then increases up to 1. Since h(y, y) < 0, this guarantees the existence of a unique x > y for which h(x, y) = 0.

Remark 2.3.1 Berger (1985) used the fact that k* > |z(s)| if |z(s)| > 1 in order to provide an iterative procedure for finding k*. The above lemma substantiates Berger's claim in addition to justifying the uniqueness of k*.
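The iterative procedure alluded to in the remark is easy to realize: by the lemma, h(·, y) has a single sign change beyond y, so plain bisection suffices. A minimal sketch (function names illustrative):

```python
import math

def phi(t):   # standard normal pdf
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def Phi(t):   # standard normal cdf
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

def h(x, y):
    """The function (2.3.6); k* is its unique zero beyond y = |z(s)|."""
    return Phi(x - y) - Phi(-x - y) - x * (phi(x - y) + phi(x + y))

def k_star(z, tol=1e-12):
    """Solve (2.3.5) for k* > |z| when |z| > 1, by bisection; h(y, y) < 0
    and h(x, y) -> 1 as x -> infinity (Lemma 2.3.1)."""
    y = abs(z)
    if y <= 1:
        return 0.0            # the k-hat = 0 case of (2.3.4)
    lo, hi = y, y + 1.0
    while h(hi, y) < 0:       # expand until the sign changes
        hi += 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(mid, y) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

k = k_star(2.5)
# k* exceeds |z(s)| and satisfies (2.3.5), as Lemma 2.3.1 asserts.
assert k > 2.5
assert abs(Phi(k - 2.5) - Phi(-k - 2.5) - k * (phi(k - 2.5) + phi(k + 2.5))) < 1e-9
```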

The following theorem provides the conditional distribution of y(s̄) given s and y(s) under the ML-II prior π̂. Recall M₀ = σ²/τ₀², B₀ = M₀/(M₀ + n).

Theorem 2.3.1 The posterior distribution of y(s̄) (given s and y(s)) under the prior π̂ is

π̂(y(s̄)|s, y(s)) = λ̂(z(s))π₀(y(s̄)|s, y(s)) + (1 − λ̂(z(s)))m(y|q̂)/m(y(s)|q̂),  (2.3.8)

where for k̂ > 0,

λ̂(z(s)) = {1 + ε(1 − ε)^{−1}B₀^{−1/2}(2π)^{1/2}(1/2)[φ(k* − z(s)) + φ(k* + z(s))] exp(B₀z²(s)/2)}^{−1} = λ̂₁(z(s)) (say),  (2.3.9)

while for k̂ = 0,

λ̂(z(s)) = {1 + ε(1 − ε)^{−1}B₀^{−1/2} exp(−(1 − B₀)z²(s)/2)}^{−1} = λ̂₂(z(s)) (say).  (2.3.10)

Proof of Theorem 2.3.1 Arguing as in Theorem 2.2.1, the posterior pdf of y(s̄) given s and y(s) is

π̂(y(s̄)|s, y(s)) = λ̂(y(s))π₀(y(s̄)|s, y(s)) + (1 − λ̂(y(s)))m(y|q̂)/m(y(s)|q̂),  (2.3.11)

where

λ̂(y(s)) = {1 + ε(1 − ε)^{−1}m(y(s)|q̂)/m(y(s)|π₀)}^{−1}.  (2.3.12)

But for k̂ > 0,

m(y(s)|q̂)/m(y(s)|π₀) = (2k̂)^{−1}B₀^{−1/2} ∫_{μ₀−k̂}^{μ₀+k̂} exp[−(n/(2σ²))(ȳ(s) − θ)²]dθ / {(2πσ²/n)^{1/2} exp[−(nB₀/(2σ²))(ȳ(s) − μ₀)²]}.  (2.3.13)

Recall that z(s) = √n(ȳ(s) − μ₀)/σ. Also, using (2.3.5) and the fact that k̂ = k*σ/√n,

∫_{μ₀−k̂}^{μ₀+k̂} exp[−(n/(2σ²))(ȳ(s) − θ)²]dθ

= (2πσ²/n)^{1/2}[Φ(√n(μ₀ + k̂ − ȳ(s))/σ) − Φ(√n(μ₀ − k̂ − ȳ(s))/σ)]

= (2πσ²/n)^{1/2}[Φ(k* − z(s)) − Φ(−k* − z(s))]

= (2πσ²/n)^{1/2}k*[φ(k* − z(s)) + φ(k* + z(s))].  (2.3.14)

Hence, from (2.3.13) and (2.3.14), for k̂ > 0,

m(y(s)|q̂)/m(y(s)|π₀) = (1/2)B₀^{−1/2}(2π)^{1/2}[φ(k* − z(s)) + φ(k* + z(s))] exp(B₀z²(s)/2).  (2.3.15)

Combine (2.3.12) and (2.3.15) to get (2.3.9). Next, for k̂ = 0,

m(y(s)|q̂)/m(y(s)|π₀)

= (2πσ²)^{−n/2} exp[−(2σ²)^{−1}Σ_{i∈s}(yᵢ − μ₀)²] / {(2πσ²)^{−n/2}B₀^{1/2} exp[−(2σ²)^{−1}{Σ_{i∈s}(yᵢ − μ₀)² − n(1 − B₀)(ȳ(s) − μ₀)²}]}

= B₀^{−1/2} exp[−(1 − B₀)z²(s)/2].  (2.3.16)

Combine (2.3.12) and (2.3.16) to get (2.3.10). This completes the proof of the theorem.

From Theorem 2.3.1, the robust Bayes estimator of γ(y) = N^{−1}Σ_{i=1}^N yᵢ is

E[γ(y)|s, y(s)] = fȳ(s) + N^{−1}Σ_{i∈s̄}[λ̂(z(s)){(1 − B₀)ȳ(s) + B₀μ₀} + (1 − λ̂(z(s)))∫yᵢm(y|q̂)dy(s̄)/∫m(y|q̂)dy(s̄)].  (2.3.17)

But for i ∈ s̄ and k̂ > 0,

∫yᵢm(y|q̂)dy(s̄)/∫m(y|q̂)dy(s̄)

= ∫∫yᵢ(2πσ²)^{−N/2} exp(−(2σ²)^{−1}Σ_{j=1}^N(y_j − θ)²)q̂(dθ)dy(s̄) / ∫∫(2πσ²)^{−N/2} exp(−(2σ²)^{−1}Σ_{j=1}^N(y_j − θ)²)q̂(dθ)dy(s̄)

= ∫_{μ₀−k̂}^{μ₀+k̂} θ exp(−(n/(2σ²))(θ − ȳ(s))²)q̂(dθ) / ∫_{μ₀−k̂}^{μ₀+k̂} exp(−(n/(2σ²))(θ − ȳ(s))²)q̂(dθ)

= ȳ(s) + ∫_{μ₀−k̂}^{μ₀+k̂}(θ − ȳ(s)) exp(−(n/(2σ²))(θ − ȳ(s))²)q̂(dθ) / ∫_{μ₀−k̂}^{μ₀+k̂} exp(−(n/(2σ²))(θ − ȳ(s))²)q̂(dθ)

= ȳ(s) − (σ/√n)[φ(k* − z(s)) − φ(−k* − z(s))]/[Φ(k* − z(s)) − Φ(−k* − z(s))]

= ȳ(s) − (σ/√n)[φ(k* − z(s)) − φ(k* + z(s))]/{k*[φ(k* − z(s)) + φ(k* + z(s))]}

= ȳ(s) − σ(√n k*)^{−1} tanh(k*z(s)),  (2.3.18)

where the penultimate equality in (2.3.18) uses (2.3.5), and the last uses φ(k* − z(s))/φ(k* + z(s)) = exp(2k*z(s)).

From (2.3.17) and (2.3.18), one gets for k̂ > 0,

δ^SU(s, y(s)) = ȳ(s) − (1 − f)[λ̂₁B₀(ȳ(s) − μ₀) + (1 − λ̂₁)σ(√n k*)^{−1} tanh(k*z(s))].  (2.3.19)

Similarly, for k̂ = 0, noting that tanh(k*z(s))/k* → z(s) as k* → 0, one gets after some simplification

δ^SU(s, y(s)) = ȳ(s) − (1 − f)(1 − λ̂₂(1 − B₀))(ȳ(s) − μ₀).  (2.3.20)

Also, generalizing the formula (1.8) in Berger and Berliner (1986), one gets after some heavy algebra, for k̂ > 0,

V(γ(y)|s, y(s)) = N^{−2}[(N − n)σ² + (N − n)²{λ̂₁σ²/(M₀ + n)
  + (1 − λ̂₁)(σ²/(k*n)) tanh(k*z(s))(z(s) − (1/k*) tanh(k*z(s)))
  + λ̂₁(1 − λ̂₁){B₀(ȳ(s) − μ₀) − σ(√n k*)^{−1} tanh(k*z(s))}²}],  (2.3.21)

while for k̂ = 0,

V(γ(y)|s, y(s)) = N^{−2}[(N − n)σ² + (N − n)²{λ̂₂σ²/(M₀ + n) + λ̂₂(1 − λ̂₂)(1 − B₀)²(ȳ(s) − μ₀)²}].  (2.3.22)
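Formulas (2.3.4), (2.3.5), (2.3.19) and (2.3.20) assemble into a short computational sketch. The bisection solver for k* is written inline so the block is self-contained; the weight forms follow (2.3.9) and (2.3.10), and all numeric inputs are illustrative.

```python
import math

SQ2PI = math.sqrt(2 * math.pi)
phi = lambda t: math.exp(-t * t / 2) / SQ2PI
Phi = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))

def k_star(z):
    """Bisection for the root k* > |z| of (2.3.5); returns 0 when |z| <= 1."""
    y = abs(z)
    if y <= 1:
        return 0.0
    h = lambda x: Phi(x - y) - Phi(-x - y) - x * (phi(x - y) + phi(x + y))
    lo, hi = y, y + 1.0
    while h(hi) < 0:
        hi += 1.0
    for _ in range(80):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if h(mid) < 0 else (lo, mid)
    return (lo + hi) / 2

def delta_su(ybar, mu0, sigma2, n, N, eps, tau0_sq):
    """Robust Bayes estimate (2.3.19)/(2.3.20) under symmetric unimodal
    contamination of the N(mu0, tau0^2) base prior."""
    f = n / N
    M0 = sigma2 / tau0_sq
    B0 = M0 / (M0 + n)
    z = math.sqrt(n) * (ybar - mu0) / math.sqrt(sigma2)
    ks = k_star(z)
    g = (eps / (1 - eps)) * B0 ** (-0.5)
    if ks > 0:                                     # k-hat > 0: (2.3.9), (2.3.19)
        lam1 = 1 / (1 + g * SQ2PI * 0.5 * (phi(ks - z) + phi(ks + z))
                    * math.exp(B0 * z * z / 2))
        shrink = (lam1 * B0 * (ybar - mu0)
                  + (1 - lam1) * math.sqrt(sigma2 / n) * math.tanh(ks * z) / ks)
    else:                                          # k-hat = 0: (2.3.10), (2.3.20)
        lam2 = 1 / (1 + g * math.exp(-(1 - B0) * z * z / 2))
        shrink = (1 - lam2 * (1 - B0)) * (ybar - mu0)
    return ybar - (1 - f) * shrink

est = delta_su(ybar=2.0, mu0=1.7, sigma2=2.0, n=33, N=330, eps=0.1, tau0_sq=0.5)
assert 1.7 < est < 2.0   # shrinks the sample mean toward mu0, never past it
```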

Next we study the posterior and procedure robustness of the robust Bayes estimators proposed in this section. For posterior robustness, calculations similar to those of the previous section provide, for k̂ > 0,

ρ(ξ_B, (s, y(s)), δ^SU) − ρ(ξ_B, (s, y(s)), δ^B)
= (1 − f)²[(λ̂₁B₀ − B)(ȳ(s) − μ₀) + (1 − λ̂₁)σ(√n k*)^{−1} tanh(k*z(s))]²,  (2.3.23)

while for k̂ = 0,

ρ(ξ_B, (s, y(s)), δ^SU) − ρ(ξ_B, (s, y(s)), δ^B)
= (1 − f)²[(1 − B) − λ̂₂(1 − B₀)]²(ȳ(s) − μ₀)².  (2.3.24)

Accordingly, from (2.3.23) and (2.3.24), for k̂ > 0,

POR_Γ(δ^SU) = (1 − f)² max[{λ̂₁B₀(ȳ(s) − μ₀) + (1 − λ̂₁)σ(√n k*)^{−1} tanh(k*z(s))}²,
  {(λ̂₁B₀ − 1)(ȳ(s) − μ₀) + (1 − λ̂₁)σ(√n k*)^{−1} tanh(k*z(s))}²],  (2.3.25)

while for k̂ = 0,

POR_Γ(δ^SU) = (1 − f)² max[(1 − B₀)²λ̂₂², {1 − λ̂₂(1 − B₀)}²](ȳ(s) − μ₀)².  (2.3.26)

In order to study the behavior of the difference in the Bayes risks of δ^SU and δ^B under the subjective N(μ₀, τ²) prior, we need the following lemma, which gives the asymptotic relationship between k* and |z(s)| as |z(s)| → ∞.

Lemma 2.3.2 (i) For large |z(s)|, say |z(s)| ≥ M₀(> 0), there exists c₀ (0 < c₀ < 1) such that c₀/k* ≤ φ(k* − |z(s)|) + φ(k* + |z(s)|) ≤ 1/k*;
(ii) k* − |z(s)| → +∞ as |z(s)| → +∞;
(iii) for |z(s)| ≥ M₁, 0 < k* − |z(s)| ≤ c*(log k*)^{1/2} for some c*(> 0).

Proof of Lemma 2.3.2 (i) From (2.3.5),

φ(k* − |z(s)|) + φ(k* + |z(s)|) = [Φ(k* − |z(s)|) − Φ(−k* − |z(s)|)]/k* ≤ 1/k*.  (2.3.27)

Further, for |z(s)| ≥ M₀ > 0,

Φ(k* − |z(s)|) − Φ(−k* − |z(s)|) = Φ(k* + |z(s)|) − Φ(|z(s)| − k*) ≥ Φ(|z(s)|) − Φ(0) ≥ Φ(M₀) − 1/2 = c₀ (say).  (2.3.28)

Hence, combining (2.3.27) and (2.3.28), one gets (i).
(ii) From (i),

lim_{|z(s)|→∞} [φ(k* − |z(s)|) + φ(k* + |z(s)|)] = 0,

since k* > |z(s)| → ∞ as |z(s)| → ∞. Hence,

lim_{|z(s)|→∞} φ(k* − |z(s)|) = 0,

that is,

lim_{|z(s)|→∞} (2π)^{−1/2} exp[−(k* − |z(s)|)²/2] = 0.

This implies

lim_{|z(s)|→∞} (k* − |z(s)|)² = +∞,

which implies (ii), since k* > |z(s)| for |z(s)| > 1.

(iii) Using (2.3.5) and (ii),

lim_{|z(s)|→∞} k*φ(k* − |z(s)|) = 1.

This is equivalent to

0 = lim_{|z(s)|→∞} [log(k*/√(2π)) − (k* − |z(s)|)²/2],

which implies

(k* − |z(s)|)² = 2 log(k*/√(2π)) + o(1) as |z(s)| → ∞.

Consequently,

k* − |z(s)| = [2 log(k*/√(2π)) + o(1)]^{1/2} = (2 log k*)^{1/2}(1 + o(1)) as |z(s)| → ∞.

Hence, there exists M₁ such that for |z(s)| ≥ M₁, k* − |z(s)| ≤ c*(log k*)^{1/2} for some c* > 0.

This completes the proof of the lemma.
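The lemma is easy to probe numerically: solving (2.3.5) for increasing |z(s)| shows the gap k* − |z(s)| growing without bound, yet only at the logarithmic-in-k* rate of part (iii). A quick sketch with the bisection solver inlined:

```python
import math

phi = lambda t: math.exp(-t * t / 2) / math.sqrt(2 * math.pi)
Phi = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))

def k_star(y):
    """Root k* > y of Phi(k-y) - Phi(-k-y) = k*(phi(k-y) + phi(k+y)), y > 1."""
    h = lambda x: Phi(x - y) - Phi(-x - y) - x * (phi(x - y) + phi(x + y))
    lo, hi = y, y + 1.0
    while h(hi) < 0:
        hi += 1.0
    for _ in range(80):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if h(mid) < 0 else (lo, mid)
    return (lo + hi) / 2

gaps = {y: k_star(y) - y for y in (2.0, 5.0, 10.0, 20.0)}
# (ii): the gap k* - |z(s)| increases without bound ...
assert gaps[2.0] < gaps[5.0] < gaps[10.0] < gaps[20.0]
# (iii): ... but only at a (log k*)^(1/2) rate
assert gaps[20.0] <= 3.0 * math.sqrt(math.log(k_star(20.0)))
```

For y = 20 the gap is close to {2 log(k*/√(2π))}^{1/2}, in line with the asymptotic expansion derived in the proof.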


In order to study the procedure robustness of δ^SU, first find, under the N(μ₀, τ²) prior (denoted by ξ_B),

r(ξ_B, δ^SU) − r(ξ_B, δ^B) = E[δ^SU(s, y(s)) − δ^B(s, y(s))]²

= (1 − f)²E[{λ̂₁B₀(ȳ(s) − μ₀) + (1 − λ̂₁)σ(√n k*)^{−1} tanh(k*z(s))}I_{[k̂>0]}
+ (1 − λ̂₂(1 − B₀))(ȳ(s) − μ₀)I_{[k̂=0]} − B(ȳ(s) − μ₀)]².  (2.3.29)

We shall now prove the following theorem.

Theorem 2.3.2 r(ξ_B, δ^SU) − r(ξ_B, δ^B) = O(B^{1/2}).

Proof of Theorem 2.3.2 First use the inequality

rhs of (2.3.29) ≤ 3(1 − f)²E[{λ̂₁B₀(ȳ(s) − μ₀) + (1 − λ̂₁)σ(√n k*)^{−1} tanh(k*z(s))}²I_{[k̂>0]}
+ (1 − λ̂₂(1 − B₀))²(ȳ(s) − μ₀)²I_{[k̂=0]} + B²(ȳ(s) − μ₀)²].  (2.3.30)

Next observe that

E[B²(ȳ(s) − μ₀)²] = E[B²σ²(nB)^{−1}χ₁²] = Bσ²/n = O_e(B).  (2.3.31)

Next, since λ̂₂ = {1 + g exp(−(1 − B₀)z²(s)/2)}^{−1} with g = ε(1 − ε)^{−1}B₀^{−1/2}, we have 1 − λ̂₂(1 − B₀) = {B₀ + g exp(−(1 − B₀)z²(s)/2)}/{1 + g exp(−(1 − B₀)z²(s)/2)} ∈ (0, 1), so that, noting that k̂ = 0 if and only if z²(s) = χ₁²/B ≤ 1,

E[(1 − λ̂₂(1 − B₀))²(ȳ(s) − μ₀)²I_{[k̂=0]}] ≤ σ²(nB)^{−1}E[χ₁²I_{[χ₁²≤B]}].  (2.3.32)

Note that

E[χ₁²I_{[χ₁²≤B]}] = ∫₀^B x exp(−x/2)(x/2)^{1/2−1}{2Γ(1/2)}^{−1}dx ≤ (2π)^{−1/2}∫₀^B x^{1/2}dx = (2/3)(2π)^{−1/2}B^{3/2}.  (2.3.33)

Combining (2.3.32) and (2.3.33),

lhs of (2.3.32) = O(B^{1/2}).  (2.3.34)

Next use the inequality

E[{λ̂₁B₀(ȳ(s) − μ₀) + (1 − λ̂₁)σ(√n k*)^{−1} tanh(k*z(s))}²I_{[k̂>0]}]

≤ 2E[{λ̂₁²B₀²(ȳ(s) − μ₀)² + (1 − λ̂₁)²σ²(k*√n)^{−2} tanh²(k*z(s))}I_{[k̂>0]}].  (2.3.35)

Now, writing g′ = g(π/2)^{1/2} (so that, by (2.3.9), λ̂₁ = {1 + g′ exp(B₀z²(s)/2)(φ(k* − |z(s)|) + φ(k* + |z(s)|))}^{−1}) and using λ̂₁ ≤ 1,

E[λ̂₁²B₀²(ȳ(s) − μ₀)²I_{[k̂>0]}]

= E[λ̂₁²B₀²(σ²/n)z²(s)I_{[z²(s)>1]}]

≤ B₀²(σ²/n)E[z²(s){1 + g′ exp(B₀z²(s)/2)(φ(k* − |z(s)|) + φ(k* + |z(s)|))}^{−1}I_{[z²(s)>1]}].  (2.3.36)

Let K = max(M₀, M₁, 2). Then, writing g″ = c₀g′ and using (i) of Lemma 2.3.2 on the event [z²(s) > K²] (while bounding the denominator below by 1 on [1 < z²(s) ≤ K²]),


rhs of (2.3.36)

≤ B₀²(σ²/n){E[B^{−1}χ₁²I_{[B<χ₁²≤K²B]}] + E[k*z²(s){k* + g″ exp(B₀z²(s)/2)}^{−1}I_{[z²(s)>K²]}]}.  (2.3.37)

But

E[B^{−1}χ₁²I_{[B<χ₁²≤K²B]}] ≤ (2π)^{−1/2}B^{−1}∫_B^{K²B} x^{1/2}dx = (2/3)(2π)^{−1/2}(K³ − 1)B^{1/2}.  (2.3.38)

Also, using (iii) of Lemma 2.3.2 together with k* ≥ |z(s)| and (log k*)^{1/2} ≤ (k*)^{1/2} for |z(s)| > K,

E[k*z²(s){k* + g″ exp(B₀z²(s)/2)}^{−1}I_{[z²(s)>K²]}]

≤ E[z²(s)(|z(s)| + c*(log k*)^{1/2}){k* + g″ exp(B₀z²(s)/2)}^{−1}I_{[z²(s)>K²]}]

≤ E[|z(s)|³{g″ exp(B₀z²(s)/2)}^{−1}I_{[z²(s)>K²]}] + c*E[z²(s){2(g″)^{1/2} exp(B₀z²(s)/4)}^{−1}I_{[z²(s)>K²]}],  (2.3.39)

the last step bounding the denominator below by g″ exp(B₀z²(s)/2) in the first term and by 2(k*g″)^{1/2} exp(B₀z²(s)/4) in the second.

But

E[|z(s)|³ exp(−B₀z²(s)/2)I_{[z²(s)>K²]}]

= E[(χ₁²/B)^{3/2} exp(−B₀χ₁²/(2B))I_{[χ₁²>K²B]}]

= (2π)^{−1/2}B^{−3/2}∫_{K²B}^∞ x exp(−(x/2)(B₀B^{−1} + 1))dx

≤ (2π)^{−1/2}B^{−3/2}·4(B₀B^{−1} + 1)^{−2} = O(B^{1/2}).  (2.3.40)

Moreover,

E[z²(s) exp(−B₀z²(s)/4)I_{[z²(s)>K²]}]

= E[(χ₁²/B) exp(−B₀χ₁²/(4B))I_{[χ₁²>K²B]}]

= (2π)^{−1/2}B^{−1}∫_{K²B}^∞ x^{1/2} exp(−(x/2)(B₀(2B)^{−1} + 1))dx = O(B^{1/2}).  (2.3.41)

Combine (2.3.37) - (2.3.41) to conclude that

E[λ̂₁²B₀²(ȳ(s) − μ₀)²I_{[k̂>0]}] = O(B^{1/2}).  (2.3.42)

Finally,

E[(1 − λ̂₁)²σ²(k*√n)^{−2} tanh²(k*z(s))I_{[k̂>0]}]

≤ (σ²/n)E[(k*)^{−2} tanh²(k*z(s))I_{[|z(s)|>1]}]

≤ (σ²/n){E[z²(s)I_{[1<|z(s)|≤K]}] + E[(k*)^{−2}I_{[|z(s)|>K]}]},  (2.3.43)

where in the final inequality of (2.3.43) we use |tanh(k*z(s))| ≤ k*|z(s)| for 1 < |z(s)| ≤ K and |tanh(k*z(s))| ≤ 1 for |z(s)| > K. As before,

E[z²(s)I_{[1<|z(s)|≤K]}] = E[B^{−1}χ₁²I_{[B<χ₁²≤K²B]}] = O(B^{1/2});  (2.3.44)

E[(k*)^{−2}I_{[|z(s)|>K]}] ≤ E[(z²(s))^{−1}I_{[z²(s)>K²]}] = E[B(χ₁²)^{−1}I_{[χ₁²>K²B]}]

= (2π)^{−1/2}B∫_{K²B}^∞ x^{−3/2}e^{−x/2}dx ≤ (2π)^{−1/2}·2K^{−1}B^{1/2} = O(B^{1/2}).  (2.3.45)

From (2.3.43) - (2.3.45),

lhs of (2.3.43) = O(B^{1/2}).

Combine (2.3.30), (2.3.31), (2.3.32), (2.3.34), (2.3.35), (2.3.42), (2.3.43) and (2.3.45) to get the theorem.

Remark 2.3.2 It follows from the above theorem that as n → ∞, i.e., B → 0, under the subjective N(μ₀, τ²) prior, δ^SU, the robust Bayes estimator of γ(y) = N^{−1}Σ_{i=1}^N yᵢ, is asymptotically optimal in the sense of Robbins (1955).

Next, to find the range of E[γ(y)|s, y(s)] over Γ in (2.3.1), once again applying Sivaganesan and Berger (1989) or Sivaganesan (1988), we get the following theorem.

Theorem 2.3.3

sup_{π∈Γ} E[γ(y)|s, y(s)] = fȳ(s) + (1 − f)[a δ^{π₀}(y(s)) + H₁(z_u)]/[a + H(z_u)]  (2.3.46)

and

inf_{π∈Γ} E[γ(y)|s, y(s)] = fȳ(s) + (1 − f)[a δ^{π₀}(y(s)) + H₁(z_l)]/[a + H(z_l)],  (2.3.47)

where a = ε^{−1}(1 − ε)m(y(s)|π₀) as in Theorem 2.2.3,

H(z) = (2z)^{−1}∫_{μ₀−z}^{μ₀+z} f(y(s)|θ)dθ  if z ≠ 0;  = f(y(s)|μ₀)  if z = 0;  (2.3.48)

H₁(z) = (2z)^{−1}∫_{μ₀−z}^{μ₀+z} θf(y(s)|θ)dθ  if z ≠ 0;  = μ₀f(y(s)|μ₀)  if z = 0,  (2.3.49)

while the values of z_u and z_l are given by the solutions of the stationarity equation

H₁′(z)[a + H(z)] − H′(z)[a δ^{π₀}(y(s)) + H₁(z)] = 0,  (2.3.50)

obtained by setting the derivative of [a δ^{π₀}(y(s)) + H₁(z)]/[a + H(z)] with respect to z equal to zero; here H′ and H₁′ reduce to expressions involving only the endpoint values f(y(s)|μ₀ ± z) and the integrals above.

Note that the equation (2.3.50) may be iteratively solved for z by taking a number larger than δ^{π₀}(y(s)) as the initial value of z when maximizing, and a number smaller than δ^{π₀}(y(s)) as the initial value of z when minimizing.

2.4 An Example

This section concerns the analysis of a real data set from Fortune magazine (May 1975 and May 1976) to illustrate the methods suggested in Sections 2.2 and 2.3. The data set consists of 331 US corporations with 1975 gross sales, in billions, between one-half billion and ten billion dollars. For the complete finite population, we find the population mean to be 1.7283 and the population variance to be 2.2788. The population variance is assumed known. We select a 10% simple random sample without replacement from this finite population, so the sample size is n = 33. The sample mean and the corresponding standard error are easily obtained. We use gross sales in the previous year as prior information to elicit the base prior π₀. The elicited prior π₀ is the N(1.6614, 6.4351 × 10⁻³) distribution. Under this elicited prior π₀, we use formulas (2.2.1) and (2.2.2) to obtain the subjective Bayes estimate and the associated posterior variance. But we have some uncertainty in π₀ and the prior information, so we choose ε = .1 in Γ, and we obtain the robust Bayes estimates and the associated posterior variances using the formulas (2.2.16), (2.2.17), (2.3.19) and (2.3.21). A number of samples were tried, and we report our analysis for one sample. Table 2.1 provides the classical estimate δ^C, the subjective Bayes estimate δ⁰, the robust

Table 2.1. Estimates, Associated Standard Errors and Posterior Robustness Index

           Estimate    SE        |γ(y) − δ|      POR
  δ^C      1.9881      0.2852    0.2598          8.6506×10⁻²
  δ⁰       1.7191      0.0105    9.2187×10⁻³     7.2386×10⁻²
  δ^RB     1.7704      0.1459    4.2075×10⁻²     4.7416×10⁻²
  δ^SU     1.7313      0.1184    3.0239×10⁻³     6.5948×10⁻²

Bayes estimate δ^RB with all possible contaminations, the robust Bayes estimate δ^SU with all symmetric unimodal contaminations, and the respective associated standard errors. Table 2.1 also provides the posterior robustness index for each estimate, which is in a sense the sensitivity index of the estimate as the prior varies over Γ.

From Table 2.1, we find that the robust Bayes estimates δ^SU and δ^RB are well behaved in the sense that they are closer to γ(y) than at least the classical estimate δ^C, and δ^SU is closer to γ(y) than even the subjective Bayes estimate δ⁰. The robust Bayes estimates δ^SU and δ^RB are also good from the viewpoint of the posterior robustness index.

For finding the range of the posterior mean of γ(y) = N^{−1}Σ_{i=1}^N yᵢ over Γ, we find that the posterior mean lies in the interval (1.7120, 1.7868) for the case of all possible contaminations, and in the interval (1.7166, 1.7406) for the case of all symmetric unimodal contaminations. Observe that the second interval is much shorter than the first. Also note that robust inference is achieved for Γ in either case. So if we feel that the true prior is close to a specific one, say π₀, we should model through Γ using one or the other contamination model, gaining robustness in each case as compared to the use of a single subjective prior.
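Though the Fortune data are not reproduced here, the pipeline of this section is easy to rerun on a synthetic population. In the sketch below all data are simulated (none of the Table 2.1 values are reproduced), the closed forms for δ⁰ and δ^RB are the shrinkage forms implied by (2.2.24) and (2.2.26), and `analyze` is a hypothetical helper.

```python
import math, random

def analyze(y_pop, mu0, tau0_sq, eps, n, seed=7):
    """Classical, subjective Bayes, and robust Bayes (point-mass ML-II)
    estimates of the finite population mean, Section 2.4 style."""
    rng = random.Random(seed)
    N = len(y_pop)
    mean = sum(y_pop) / N
    sigma2 = sum((y - mean) ** 2 for y in y_pop) / N   # treated as known
    s = rng.sample(range(N), n)                        # SRS without replacement
    ybar = sum(y_pop[i] for i in s) / n
    f = n / N
    M0 = sigma2 / tau0_sq
    B0 = M0 / (M0 + n)
    z2 = n * (ybar - mu0) ** 2 / sigma2
    lam = 1 / (1 + (eps / (1 - eps)) * B0 ** (-0.5)
               * math.exp(min(B0 * z2 / 2, 700)))      # assumed ML-II weight
    delta_c = ybar                                     # classical
    delta_0 = ybar - (1 - f) * B0 * (ybar - mu0)       # subjective Bayes
    delta_rb = ybar - (1 - f) * lam * B0 * (ybar - mu0)  # robust Bayes
    return delta_c, delta_0, delta_rb

pop_rng = random.Random(0)
pop = [pop_rng.lognormvariate(0.3, 0.7) for _ in range(331)]
dc, d0, drb = analyze(pop, mu0=1.6, tau0_sq=0.01, eps=0.1, n=33)
# delta_0 shrinks the sample mean toward mu0; delta_rb lies between the two.
assert min(dc, 1.6) <= d0 <= max(dc, 1.6)
assert min(dc, d0) <= drb <= max(dc, d0)
```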

CHAPTER 3

ROBUST BAYES COMPETITORS OF THE RATIO ESTIMATOR

3.1 Introduction

For most sample surveys, information is available, for every unit i in the finite population, on one or more auxiliary characteristics, that is, characteristics other than the one of direct interest. For example, if the characteristic of direct interest is the yield of a particular crop, the auxiliary characteristic could be the area devoted to that crop by the different farms in the list. We consider the simplest situation, where for every unit i in the population the value of a single auxiliary characteristic, say xᵢ(> 0), is known (i = 1, 2, ..., N).

The classical estimator of the finite population mean N^{−1}Σ_{i=1}^N yᵢ in such cases is the ratio estimator e_R = N^{−1}(Σ_{i∈s} yᵢ/Σ_{i∈s} xᵢ)Σ_{i=1}^N xᵢ, which seems to incorporate the auxiliary information in a very natural manner. Moreover, this estimator can be justified from both the model- and design-based approaches. While Cochran (1977) provides many design-based properties of the ratio estimator, Royall (1970, 1971) justifies this estimator based on certain superpopulation models.

The ratio estimator can also be justified from a Bayesian viewpoint. To see this, consider the superpopulation model yᵢ = βxᵢ + eᵢ, where the eᵢ are independent N(0, σ²xᵢ), i = 1, 2, ..., N, while β is uniform over (−∞, ∞). In the above, σ²(> 0) may or may not be known. For unknown σ², one assigns an independent prior distribution to σ².

Then, based on s and y(s), the posterior (conditional) mean of γ(y) = N^{−1}Σ_{i=1}^N yᵢ is given by e_R.

It is clear from the above that the ratio estimator can possibly be improved upon when one has more precise prior information about β. For example, if one wants to predict the mean yield of a certain crop based on a sample of farms, it is conceivable, on the basis of past experience, to specify a prior distribution for β with fairly accurate mean β₀ and variance τ₀². Ericson (1969) has indeed shown that with a N(β₀, τ₀²) prior for β, the Bayes predictor e_E of γ(y) is given by

e_E(s, y(s)) = N^{−1}[Σ_{i∈s} yᵢ + {(σ^{−2}Σ_{i∈s} yᵢ + β₀τ₀^{−2})/(σ^{−2}Σ_{i∈s} xᵢ + τ₀^{−2})}Σ_{i∉s} xᵢ].  (3.1.1)

Clearly, e_E converges to e_R as τ₀² → ∞, that is, when the prior information is vague.

The above Bayesian approach has frequently been criticized on the ground that it presumes an ability to completely and accurately quantify subjective information in terms of a single prior distribution. We shall see in the next section that failure to specify accurately one or more of the parameters β₀ and τ₀² can have serious consequences when calculating the Bayes risk, and protection is often needed against the possibility of such an occurrence. A robust Bayesian viewpoint assumes that subjective information can be quantified only in terms of a class Γ of possible distributions. Inferences and decisions should be relatively insensitive to deviations as the prior distribution varies over Γ.

In this chapter, we consider an ε-contamination class of priors for β following the lines of Berger and Berliner (1986). In Section 3.2, the ε-contamination class includes all possible distributions. For every member within this class, the Bayes predictor (posterior mean) of γ(y) is obtained, and expressions for the variation of the posterior mean within this class of priors are given following Sivaganesan and Berger (1989) and Sivaganesan (1988). Moreover, the ML-II prior (see Good (1965) or Berger and Berliner (1986)) within this contamination class of priors is found. We also provide analytical expressions for the indices of posterior and procedure robustness (cf. Berger (1984)) of the proposed robust Bayes predictors based on ML-II priors for an entire class of priors, and compare these indices with similar indices found for the usual ratio estimator as well as the subjective Bayes predictor. In the course of calculating the indices for procedure robustness, we prove the asymptotic optimality, in the sense of Robbins (1955), of the robust Bayes predictor, and also point out that the subjective Bayes predictor does not possess the asymptotic optimality property.
The above program is repeated in Section 3.3, with the exception that the contamination class now contains only symmetric unimodal distributions. Once again, robust Bayes predictors are found, and their optimality is studied.

Finally, in Section 3.4, a numerical example is provided to illustrate the results of the preceding sections.

The summary of our findings is that the robust Bayes predictors enjoy both posterior and procedure robustness for a fairly general class of priors. The subjective Bayes predictor enjoys posterior robustness but fails miserably under the criterion of procedure robustness. The classical ratio estimator is inferior to the others under the criterion of posterior robustness, but enjoys a certain amount of procedure robustness. Thus, our recommendation is to give the robust Bayes predictors serious consideration if one is concerned with both Bayesian and frequentist robustness. The other important finding is that when the sampling fraction goes to zero, that is, when we are essentially back to an infinite population, several asymptotic optimality properties of the estimators of Berger and Berliner (1986) are established which are hitherto unknown in the literature.

Royall and Pfeffermann (1982) have addressed a different robustness issue. They have shown that under the superpopulation models (i) yᵢ ~ind N(α + βxᵢ, σ²xᵢ) and (ii) yᵢ ~ind N(βxᵢ, σ²xᵢ), if β is uniform over (−∞, ∞), then under balanced samples, that is, x̄(s) = x̄, the posterior distribution of y(s̄) given (s, y(s)) is the same under the two models, the resulting estimator of the finite population mean being the ratio estimator, which equals the sample mean in this case.

For simplicity, we shall assume that n(s) ≠ n implies p(s) = 0, that is, we effectively consider only samples of fixed size n. Also, for notational simplicity, we shall henceforth assume that s = {i₁, ..., iₙ}, where 1 ≤ i₁ < ··· < iₙ ≤ N. Let s̄ = {1, 2, ..., N} − s = {j₁, ..., j_{N−n}} (say), where 1 ≤ j₁ < ··· < j_{N−n} ≤ N. We shall write y(s) = (y_{i₁}, ..., y_{iₙ})ᵀ, y(s̄) = (y_{j₁}, ..., y_{j_{N−n}})ᵀ, x(s) = (x_{i₁}, ..., x_{iₙ})ᵀ, D(s) = Diag(x_{i₁}, ..., x_{iₙ}), x(s̄) = (x_{j₁}, ..., x_{j_{N−n}})ᵀ, and D(s̄) = Diag(x_{j₁}, ..., x_{j_{N−n}}).

Note that, writing M₀ = σ²/τ₀², B₀ ≡ B₀(s) = M₀(M₀ + nx̄(s))^{−1} and f = n/N (the sampling fraction), the Bayes predictor e_E given in (3.1.1) can alternately be written as

e_{B₀}(s, y(s)) = fȳ(s) + (1 − f)x̄(s̄)[(1 − B₀(s))ȳ(s)/x̄(s) + B₀(s)β₀].  (3.1.2)

In later sections, we shall repeatedly use (3.1.2). Also, the associated posterior variance is given by

V(γ(y)|s, y(s)) = σ²N^{−2}[(N − n)x̄(s̄) + (N − n)²x̄²(s̄)/(M₀ + nx̄(s))].  (3.1.3)
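Since (3.1.2) is just (3.1.1) rewritten, the identity is easy to check numerically. In the sketch below all numeric inputs are illustrative and the helper names are not from the text.

```python
import random

def e_E(y_s, x_s, x_all, N, sigma2, beta0, tau0_sq):
    """Ericson's Bayes predictor (3.1.1) under the N(beta0, tau0^2) prior."""
    post_beta = ((sum(y_s) / sigma2 + beta0 / tau0_sq)
                 / (sum(x_s) / sigma2 + 1 / tau0_sq))   # posterior mean of beta
    return (sum(y_s) + post_beta * (sum(x_all) - sum(x_s))) / N

def e_B0(y_s, x_s, x_all, N, sigma2, beta0, tau0_sq):
    """The same predictor in the shrinkage form (3.1.2)."""
    n = len(y_s)
    f = n / N
    xbar_s = sum(x_s) / n
    xbar_r = (sum(x_all) - sum(x_s)) / (N - n)          # mean of x over s-bar
    M0 = sigma2 / tau0_sq
    B0 = M0 / (M0 + n * xbar_s)
    ybar = sum(y_s) / n
    return f * ybar + (1 - f) * xbar_r * ((1 - B0) * ybar / xbar_s + B0 * beta0)

rng = random.Random(3)
x = [rng.uniform(1, 4) for _ in range(50)]
y = [2.0 * xi + rng.gauss(0, xi ** 0.5) for xi in x]
a = e_E(y[:10], x[:10], x, 50, sigma2=1.0, beta0=2.0, tau0_sq=0.5)
b = e_B0(y[:10], x[:10], x, 50, sigma2=1.0, beta0=2.0, tau0_sq=0.5)
assert abs(a - b) < 1e-9   # (3.1.1) and (3.1.2) agree
```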

3.2 ε-Contamination Model and the ML-II Prior

Suppose that, conditional on β, the yᵢ are independent N(βxᵢ, σ²xᵢ), i = 1, 2, ..., N. Also, let β have the prior distribution π = (1 − ε)π₀ + εq, where ε ∈ [0, 1) is known, π₀ is the N(β₀, τ₀²) distribution, and q ∈ Q, the class of all possible distributions. Then the marginal pdf of y(s) is given by

m(y(s)|π) = (1 − ε)m(y(s)|π₀) + εm(y(s)|q),  (3.2.1)

where m(y(s)|π₀) denotes the N(β₀x(s), σ²D(s) + τ₀²x(s)xᵀ(s)) pdf, while

m(y(s)|q) = ∫(2πσ²)^{−n/2}(Π_{i∈s} xᵢ)^{−1/2} exp[−(2σ²)^{−1}Σ_{i∈s}(yᵢ − βxᵢ)²/xᵢ] q(dβ).  (3.2.2)

The posterior pdf of β given (s, y(s)) is now

π(β|s, y(s)) = λ(y(s))π₀(β|s, y(s)) + (1 − λ(y(s)))q(β|s, y(s)),  (3.2.3)

where

λ(y(s)) = (1 − ε)m(y(s)|π₀)/m(y(s)|π),  (3.2.4)

π₀(β|s, y(s)) denotes the N((1 − B₀(s))ȳ(s)/x̄(s) + B₀(s)β₀, σ²(M₀ + nx̄(s))^{−1}) pdf, and

q(β|s, y(s)) = f(y(s)|β)q(β)/m(y(s)|q).  (3.2.5)

This leads to the posterior pdf of y(s̄) given (s, y(s)) as

π(y(s̄)|s, y(s)) = ∫f(y(s̄)|β)π(β|s, y(s))dβ

= λ(y(s))π₀(y(s̄)|s, y(s)) + (1 − λ(y(s)))q(y(s̄)|s, y(s)),  (3.2.6)

where π₀(y(s̄)|s, y(s)) is the N(((1 − B₀(s))ȳ(s)/x̄(s) + B₀(s)β₀)x(s̄), σ²(D(s̄) + (M₀ + nx̄(s))^{−1}x(s̄)xᵀ(s̄))) pdf, while, using (3.2.5),

q(y(s̄)|s, y(s)) = m(y|q)/m(y(s)|q).  (3.2.7)

E[y(y)Is, y(s)]

= fy() +(- (1 - E)m(y(s)ro)6PO(y(s)) + 6f13f(y(s)/3)q(d3) (1 - E)m(y(s)7ro) + Em(y(s)Iq) (3.2.8)

where
PO(y(s)) = (1 - Bo(s))9(s)/l(s) + Bo(s)Oo. (3.2.9) From Sivaganesan and Berger (1989) or Sivaganesan (1988), it follows that

sup(inf)_{π∈Γ} E[γ(y)|s, y(s)] = fȳ(s) + (1 − f)x̄(s̄)

× sup(inf)_β [(1 − ε)m(y(s)|π₀)δ^{π₀}(y(s)) + εβf(y(s)|β)]/[(1 − ε)m(y(s)|π₀) + εf(y(s)|β)].  (3.2.10)

Hence, following Sivaganesan (1988), we get

sup_{π∈Γ} E[γ(y)|s, y(s)] = fȳ(s) + (1 − f)x̄(s̄)[(1 − ε)m(y(s)|π₀)δ^{π₀}(y(s)) + εβ_U f(y(s)|β_U)]/[(1 − ε)m(y(s)|π₀) + εf(y(s)|β_U)],  (3.2.11)

while inf_{π∈Γ} E[γ(y)|s, y(s)] has an expression similar to (3.2.11), with β_L replacing β_U. In the above, β_U and β_L (β_L < β_U) are given by

β_U = ȳ(s)/x̄(s) + v_U σ(nx̄(s))^{−1/2},  β_L = ȳ(s)/x̄(s) + v_L σ(nx̄(s))^{−1/2},

where v_U and v_L (< v_U) are the solutions in v of the equation

e^{−v²/2} − c(v² − 1) − bv = 0,  (3.2.12)

where c = aε^{−1}(2πσ²)^{n/2}(Π_{i∈s} xᵢ^{1/2}) exp[(2σ²)^{−1}Σ_{i∈s}(yᵢ − xᵢȳ(s)/x̄(s))²/xᵢ], b = c(nx̄(s))^{1/2}(ȳ(s)/x̄(s) − δ^{π₀}(y(s)))/σ, and a = (1 − ε)m(y(s)|π₀). We shall use (3.2.11) in Section 3.4 for numerical evaluation of the supremum and infimum of the posterior mean under the given class of priors.
Next we find the ML-II prior within the given class of priors. Since Σ_{i∈s}(yᵢ − βxᵢ)²/xᵢ is minimized with respect to β at β̂ = ȳ(s)/x̄(s), from (3.2.1) and (3.2.2) the ML-II prior, which maximizes the marginal likelihood m(y(s)|π) with respect to q ∈ Q, is given by

π̂_ML(β) = (1 − ε)π₀(β) + εq̂(β),  (3.2.13)

where q̂(β) is degenerate at β = β̂. The posterior pdf of y(s̄) given (s, y(s)) under the ML-II prior π̂_ML is now given by

π̂_ML(y(s̄)|s, y(s)) = λ̂_ML(ȳ(s))π₀(y(s̄)|s, y(s)) + (1 − λ̂_ML(ȳ(s))) N((ȳ(s)/x̄(s))x(s̄), σ²D(s̄)) pdf,  (3.2.14)

where, for 0 < ε < 1, after some algebraic simplification,

λ̂_ML(ȳ(s)) = {1 + ε(1 − ε)^{−1}m(y(s)|q̂)/m(y(s)|π₀)}^{−1}

= {1 + ε(1 − ε)^{−1}B₀^{−1/2}(s) exp(B₀(s)n(ȳ(s) − β₀x̄(s))²/(2σ²x̄(s)))}^{−1}.  (3.2.15)

The robust Bayes predictor of γ(y) under the ML-II prior π̂_ML then simplifies to

e_{RB}(s, y(s)) = fȳ(s) + (1 − f)x̄(s̄){(1 − λ̂_ML(ȳ(s))B₀(s))ȳ(s)/x̄(s) + λ̂_ML(ȳ(s))B₀(s)β₀}.  (3.2.16)

Also, generalizing the formula (1.8) in Berger and Berliner (1986), one gets the associated posterior variance

V(γ(y)|s, y(s)) = N^{−2}[σ²(N − n)x̄(s̄) + (N − n)²x̄²(s̄){σ²λ̂_ML/(M₀ + nx̄(s)) + λ̂_ML(1 − λ̂_ML)B₀²(s)(ȳ(s)/x̄(s) − β₀)²}].  (3.2.17)

We shall now compare the robust Bayes predictor e_{RB} of γ(y) with the Bayes predictor e_{B₀} given in (3.1.2) and the ratio estimator e_R, in terms of posterior risks as well as Bayes risks under the {N(β₀, τ²), τ² > 0} class of priors. For a typical member N(β₀, τ²) of this class, the Bayes predictor of γ(y) is given by

e_B(s, y(s)) = fȳ(s) + (1 − f)x̄(s̄)[(1 − B(s))ȳ(s)/x̄(s) + B(s)β₀],  (3.2.18)

where B ≡ B(s) = M/(M + nx̄(s)), M = σ²/τ².
The choice of the above class of priors may be justified as follows. Very often, based on prior elicitation, one can take a fairly accurate guess at the prior mean. However, the same need not necessarily be true for the prior variance, where there is a greater chance of vagueness. Note that when τ² ≠ τ₀², none of the estimators e_R, e_{B₀} or e_{RB} is the optimal (Bayes) estimator.
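The predictor (3.2.16) with weight (3.2.15) can be sketched directly. All data below are synthetic, and the overflow guard on the exponent is a practical addition, not part of the text.

```python
import math, random

def e_RB(y_s, x_s, x_all, N, sigma2, beta0, tau0_sq, eps):
    """Robust Bayes predictor (3.2.16) with the ML-II weight (3.2.15)."""
    n = len(y_s)
    f = n / N
    xbar_s = sum(x_s) / n
    xbar_r = (sum(x_all) - sum(x_s)) / (N - n)   # mean of x over unsampled units
    ybar = sum(y_s) / n
    M0 = sigma2 / tau0_sq
    B0 = M0 / (M0 + n * xbar_s)
    expo = B0 * n * (ybar - beta0 * xbar_s) ** 2 / (2 * sigma2 * xbar_s)
    lam = 1 / (1 + (eps / (1 - eps)) * B0 ** (-0.5) * math.exp(min(expo, 700)))
    beta_hat = ybar / xbar_s                     # least-squares slope
    return f * ybar + (1 - f) * xbar_r * ((1 - lam * B0) * beta_hat
                                          + lam * B0 * beta0)

rng = random.Random(11)
x = [rng.uniform(1, 4) for _ in range(60)]
y = [1.5 * xi + rng.gauss(0, xi ** 0.5) for xi in x]
pred = e_RB(y[:12], x[:12], x, 60, sigma2=1.0, beta0=1.5, tau0_sq=0.25, eps=0.1)

# Sanity check: as eps -> 1 the ML-II weight -> 0 and e_RB -> the ratio predictor.
n, N = 12, 60
f = n / N
ybar = sum(y[:n]) / n
xbar_s = sum(x[:n]) / n
xbar_r = (sum(x) - sum(x[:n])) / (N - n)
ratio_pred = f * ybar + (1 - f) * xbar_r * ybar / xbar_s
heavy = e_RB(y[:n], x[:n], x, N, 1.0, 1.5, 0.25, eps=0.999)
assert abs(heavy - ratio_pred) < 1e-2
```

Conversely, as ε → 0 the weight tends to 1 and e_{RB} reduces to the subjective predictor e_{B₀}, which is exactly the interpolation property exploited in the robustness comparisons below.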
Based on Definition 2.2.1 introduced in Chapter 2, we examine the posterior robustness indices of e_{RB}, e_{B₀} and e_R. Note that whether or not posterior robustness obtains will often depend on which (s, y(s)) is observed. This will be revealed in the subsequent calculations.
To this end, first note that under the N(\beta_0, \tau^2) prior, denoted by \xi_{\tau^2}, the posterior risk of any estimator e of \gamma(y) is

\rho(\xi_{\tau^2}, (s, y(s)), e) = \rho(\xi_{\tau^2}, (s, y(s)), e_B) + (e - e_B)^2,   (3.2.19)

where e_B is given in (3.2.18). Using Definition 2.2.1 and (3.2.19), one gets for the class \Gamma = \{\xi_{\tau^2} : \tau^2 > 0\} of priors

POR_\Gamma(e_R) = \sup_{0<B(s)<1} (1 - f)^2\bar{x}^2(\bar{s})B^2(s)[\bar{y}(s)/\bar{x}(s) - \beta_0]^2
 = (1 - f)^2\bar{x}^2(\bar{s})[\bar{y}(s)/\bar{x}(s) - \beta_0]^2;   (3.2.20)

POR_\Gamma(e_{B_0}) = \sup_{0<B(s)<1} (1 - f)^2\bar{x}^2(\bar{s})(B(s) - B_0(s))^2[\bar{y}(s)/\bar{x}(s) - \beta_0]^2
 = (1 - f)^2\bar{x}^2(\bar{s})\max[B_0^2(s), (1 - B_0(s))^2][\bar{y}(s)/\bar{x}(s) - \beta_0]^2;   (3.2.21)

POR_\Gamma(e_{RB}) = \sup_{0<B(s)<1} (1 - f)^2\bar{x}^2(\bar{s})(\lambda_{ML}(\bar{y}(s))B_0(s) - B(s))^2[\bar{y}(s)/\bar{x}(s) - \beta_0]^2
 = (1 - f)^2\bar{x}^2(\bar{s})\max[\lambda_{ML}^2(\bar{y}(s))B_0^2(s), (1 - \lambda_{ML}(\bar{y}(s))B_0(s))^2][\bar{y}(s)/\bar{x}(s) - \beta_0]^2.   (3.2.22)

Note from (3.2.20) - (3.2.22) that if we allow all possible distributions N(\beta, \tau^2), where \beta is widely different from \beta_0, as our priors, all POR indices can become prohibitively

large. It follows from (3.2.20) - (3.2.22) that both e_{B_0} and e_{RB} are superior to e_R in terms of posterior robustness. However, the ratio

POR_\Gamma(e_{B_0})/POR_\Gamma(e_{RB}) = \max[B_0^2, (1 - B_0)^2]/\max[\lambda_{ML}^2(\bar{y}(s))B_0^2, (1 - \lambda_{ML}(\bar{y}(s))B_0)^2]   (3.2.23)

can take values both larger and smaller than 1 depending on the particular (s, y(s)).
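Each of the POR indices in (3.2.20) - (3.2.22) has the common form \sup_B (B - w)^2 times a constant, with shrinkage weight w = 0 for e_R, w = B_0 for e_{B_0} and w = \lambda_{ML}B_0 for e_{RB}, and the sup over 0 < B < 1 equals \max[w^2, (1 - w)^2]. A small numerical sketch (Python; the summary-statistic inputs are hypothetical):

```python
import numpy as np

def por(weight, f, xbar_sbar, beta_hat, beta0):
    # sup over B in (0,1) of (B - weight)^2 is max(weight^2, (1-weight)^2)
    return (1 - f) ** 2 * xbar_sbar ** 2 * max(weight ** 2, (1 - weight) ** 2) \
           * (beta_hat - beta0) ** 2

# hypothetical summary statistics
f, xbar_sbar, beta_hat, beta0 = 0.2, 0.5, 1.18, 1.15
B0, lam = 0.3, 0.8

por_R  = por(0.0, f, xbar_sbar, beta_hat, beta0)       # (3.2.20): weight 0
por_B0 = por(B0, f, xbar_sbar, beta_hat, beta0)        # (3.2.21): weight B0
por_RB = por(lam * B0, f, xbar_sbar, beta_hat, beta0)  # (3.2.22): weight lam*B0
print(por_R, por_B0, por_RB, por_B0 / por_RB)
```

Since \max[w^2, (1 - w)^2] \leq 1 for any weight w in [0, 1], both e_{B_0} and e_{RB} necessarily have smaller POR than e_R, while the ratio in (3.2.23) can fall on either side of 1.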

Although the Bayesian thinks conditionally on (s, y(s)), it seems quite sensible to use the overall Bayes risk as a robustness criterion, at least at a preexperimental stage. This issue is also addressed in Berger (1984), who introduced the criterion of procedure robustness. We use Definition 2.3.2 to study the procedure robustness indices of e_{RB}, e_{B_0} and e_R.

Simple calculations yield for the class \Gamma = \{\xi_{\tau^2} : \tau^2 > 0\} of priors

PR_\Gamma(e_R) = \sup_{0<B(s)<1} (1 - f)^2\bar{x}^2(\bar{s})\sigma^2(n\bar{x}(s))^{-1}B(s)
 = (1 - f)^2\bar{x}^2(\bar{s})\sigma^2(n\bar{x}(s))^{-1};   (3.2.24)

PR_\Gamma(e_{B_0}) = \sup_{0<B(s)<1} (1 - f)^2\bar{x}^2(\bar{s})\sigma^2(n\bar{x}(s))^{-1}(B_0(s) - B(s))^2/B(s)
 = +\infty;   (3.2.25)

PR_\Gamma(e_{RB}) = \sup_{0<B(s)<1} (1 - f)^2\bar{x}^2(\bar{s})E[(B_0(s)\lambda_{ML}(\bar{y}(s)) - B(s))^2(\bar{y}(s)/\bar{x}(s) - \beta_0)^2].   (3.2.26)
It is thus clear that the subjective Bayes predictor eBo lacks procedure robustness, while the ratio estimator eR is quite procedure robust. The procedure robustness of eRB can be examined on the basis of the following theorem.

Theorem 3.2.1 E[(B_0(s)\lambda_{ML}(\bar{y}(s)) - B(s))^2(\bar{y}(s)/\bar{x}(s) - \beta_0)^2] = O_e(B^{1/2}(s)) for every \epsilon > 0, where O_e denotes the exact order.

Proof of Theorem 3.2.1 Noting that n\bar{x}(s)(\bar{y}(s)/\bar{x}(s) - \beta_0)^2 \sim (\sigma^2/B)\chi_1^2, it follows from (3.2.26) that

E[(B_0\lambda_{ML}(\bar{y}(s)) - B)^2(\bar{y}(s)/\bar{x}(s) - \beta_0)^2]
 = (\sigma^2/(n\bar{x}(s))) \int_0^\infty \{B_0/(1 + g\exp(uB_0/(2B))) - B\}^2 B^{-1} u \exp(-u/2)u^{1/2-1}(2^{1/2}\Gamma(1/2))^{-1} du,   (3.2.27)

where g = (\epsilon/(1 - \epsilon))B_0^{-1/2}. Next observe that

rhs of (3.2.27)
 \leq (\sigma^2/(n\bar{x}(s)))[\int_0^\infty B_0^2\{1 + g\exp(uB_0/(2B))\}^{-2} B^{-1} u \exp(-u/2)u^{1/2-1}(2^{1/2}\Gamma(1/2))^{-1} du + B]
 \leq (\sigma^2/(n\bar{x}(s)))[(B_0^2/(2g))B^{-1}(1 + B_0 B^{-1})^{-3/2} + B]
 = O(B^{1/2}).   (3.2.28)

Again, writing g' = \max(g, 1),

rhs of (3.2.27)
 \geq (\sigma^2/(n\bar{x}(s)))[\int_0^\infty \{B_0/(2g'\exp(uB_0/(2B)))\}^2 B^{-1} u \exp(-u/2)u^{1/2-1}(2^{1/2}\Gamma(1/2))^{-1} du
 - 2B_0 g'^{-1}\int_0^\infty \exp(-uB_0/(2B)) u \exp(-u/2)u^{1/2-1}(2^{1/2}\Gamma(1/2))^{-1} du + B]
 = (\sigma^2/(n\bar{x}(s)))[(B_0/(2g'))^2 B^{-1}(1 + 2B_0 B^{-1})^{-3/2} - 2B_0 g'^{-1}(1 + B_0 B^{-1})^{-3/2} + B]
 = O_e(B^{1/2}).   (3.2.29)

Combining (3.2.28) and (3.2.29), the result follows.

When f \to 0, we get a result related to the procedure robustness of the robust Bayes procedure of Berger and Berliner (1986). To our knowledge, such a result is the first of its kind. In addition, as n \to \infty, that is, B \to 0, it shows that the robust Bayes procedure is asymptotically optimal. In view of (3.2.24), (3.2.26) and Theorem 3.2.1 (i.e., before taking the supremum over B), it appears that e_R has a distinct advantage over e_{RB} for small B. This is not surprising, though, since small B signifies small M = \sigma^2/\tau^2, which amounts to greater instability in the assessment of the prior distribution of \beta relative to the superpopulation model. In such circumstances it is safer to use e_R for estimating \gamma(y) if one is seriously concerned about the long-run performance of the estimator.
A different point to note is that, in reality, \sigma^2 is usually unknown. In such situations, however, one can conceive of an inverse gamma prior for \sigma^2, independent of the prior for \beta, to derive a Bayes estimator of \gamma(y). In a robust Bayes approach, if one assumes a mixture of a normal-gamma prior for (\beta, \sigma^2), then the ML-II prior puts its mass on the point (\bar{y}(s)/\bar{x}(s), n^{-1}\sum_{i \in s}(y_i - x_i\bar{y}(s)/\bar{x}(s))^2/x_i). We have not, however, pursued the resulting Bayesian analysis here.

It should be noted, though, that the class Q contains many priors which should possibly not come under consideration. In the following section, instead of Q, we consider the class Q_{su}, which contains only symmetric unimodal distributions.

3.3 Symmetric Unimodal Contamination

Suppose now the contamination prior distribution is given by

\pi(\beta) = (1 - \epsilon)\pi_0(\beta) + \epsilon q(\beta),   (3.3.1)

where q \in Q_{su}, the class of all symmetric unimodal distributions, and \pi_0 is as in Section 3.2. The expression for E[\gamma(y)|s, y(s)] is the same as the one given in (3.2.8). Using the fact that any symmetric unimodal distribution is a mixture of symmetric uniform distributions, as in Sivaganesan and Berger (1989), one gets

\sup(\inf)_{\pi \in \Gamma} E[\gamma(y)|s, y(s)] = f\bar{y}(s) + (1 - f)\bar{x}(\bar{s})
 \times \sup(\inf)_k [(1 - \epsilon)m(y(s)|\pi_0)\delta^{\pi_0}(\bar{y}(s)) + \epsilon H_1(k)]/[(1 - \epsilon)m(y(s)|\pi_0) + \epsilon H(k)],   (3.3.2)

where m(y(s)|\pi_0) and \delta^{\pi_0}(\bar{y}(s)) are given by (3.2.1) and (3.2.9) respectively, and

H(k) = (2k)^{-1}\int_{\beta_0-k}^{\beta_0+k} f(y(s)|\beta)d\beta   if k \neq 0
     = f(y(s)|\beta_0)   if k = 0;   (3.3.3)

H_1(k) = (2k)^{-1}\int_{\beta_0-k}^{\beta_0+k} \beta f(y(s)|\beta)d\beta   if k \neq 0
       = \beta_0 f(y(s)|\beta_0)   if k = 0.   (3.3.4)

Using the expression f(y(s)|\beta) = (2\pi\sigma^2)^{-n/2}(\prod_{i \in s} x_i)^{-1/2}\exp[-(2\sigma^2)^{-1}\sum_{i \in s}(y_i - \beta x_i)^2/x_i], it follows after some simple algebra that

H(k) = (2k)^{-1}(2\pi\sigma^2)^{-(n-1)/2}(\prod_{i \in s} x_i)^{-1/2}(n\bar{x}(s))^{-1/2}\exp[-(2\sigma^2)^{-1}\sum_{i \in s}(y_i - x_i\bar{y}(s)/\bar{x}(s))^2/x_i]
 \times [\Phi((\beta_0 + k - \bar{y}(s)/\bar{x}(s))/(\sigma(n\bar{x}(s))^{-1/2})) - \Phi((\beta_0 - k - \bar{y}(s)/\bar{x}(s))/(\sigma(n\bar{x}(s))^{-1/2}))]   (3.3.5)

and

H_1(k) = H(k)\bar{y}(s)/\bar{x}(s) + (2k)^{-1}(2\pi\sigma^2)^{-(n-1)/2}(\prod_{i \in s} x_i)^{-1/2}\sigma(n\bar{x}(s))^{-1}\exp[-(2\sigma^2)^{-1}\sum_{i \in s}(y_i - x_i\bar{y}(s)/\bar{x}(s))^2/x_i]
 \times [\phi((\beta_0 - k - \bar{y}(s)/\bar{x}(s))/(\sigma(n\bar{x}(s))^{-1/2})) - \phi((\beta_0 + k - \bar{y}(s)/\bar{x}(s))/(\sigma(n\bar{x}(s))^{-1/2}))].   (3.3.6)

Write z \equiv z(s) = (\bar{y}(s)/\bar{x}(s) - \beta_0)(n\bar{x}(s))^{1/2}/\sigma, k_0 = k(n\bar{x}(s))^{1/2}/\sigma, t = \phi(k_0 - z(s)) + \phi(-k_0 - z(s)), u = \phi(k_0 - z(s)) - \phi(-k_0 - z(s)), w = \Phi(k_0 - z(s)) - \Phi(-k_0 - z(s)), a = \epsilon^{-1}(1 - \epsilon)m(y(s)|\pi_0) and a_1 = a\delta^{\pi_0}(\bar{y}(s)). Now, using (3.3.2) - (3.3.6) and solving

(d/dk)[(a_1 + H_1(k))/(a + H(k))] = 0, it follows after some heavy algebra that

\sup_{\pi \in \Gamma} E[\gamma(y)|s, y(s)] = f\bar{y}(s) + (1 - f)\bar{x}(\bar{s})(a_1 + H_1(k_U))/(a + H(k_U))   (3.3.7)

and

\inf_{\pi \in \Gamma} E[\gamma(y)|s, y(s)] = f\bar{y}(s) + (1 - f)\bar{x}(\bar{s})(a_1 + H_1(k_L))/(a + H(k_L)),   (3.3.8)

where k_U and k_L (< k_U) are the two solutions of the quadratic equation

2k[auk + t(a\beta_0 - a_1) + Guw/2]
 = [w\bar{y}(s)/\bar{x}(s) - u\sigma(n\bar{x}(s))^{-1/2}][Gt + 2a\sigma(n\bar{x}(s))^{-1/2}]
 - w(2a_1\sigma(n\bar{x}(s))^{-1/2} + G\beta_0 t),   (3.3.9)

where

G = (2\pi\sigma^2)^{-(n-1)/2}(\prod_{i \in s} x_i)^{-1/2}\sigma(n\bar{x}(s))^{-1/2}\exp[-(2\sigma^2)^{-1}\sum_{i \in s}(y_i - x_i\bar{y}(s)/\bar{x}(s))^2/x_i].   (3.3.10)

The formulas (3.3.7) - (3.3.10) will be utilized in Section 3.4 for numerical computations.
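The sup and inf in (3.3.2) can also be obtained by a direct one-dimensional search over k, which gives a useful numerical check on (3.3.7) - (3.3.10). The sketch below (Python with scipy) works with the reduced likelihood \hat\beta \sim N(\beta, v), v = \sigma^2/(n\bar{x}(s)); all numerical inputs are our hypothetical choices:

```python
import numpy as np
from scipy.stats import norm

# hypothetical summaries: reduced likelihood beta_hat ~ N(beta, v)
beta_hat, v = 1.20, 0.01 ** 2            # v = sigma^2/(n*xbar(s))
beta0, tau02, eps = 1.15, 0.02 ** 2, 0.1

B0 = v / (v + tau02)
m0 = norm.pdf(beta_hat, beta0, np.sqrt(v + tau02))  # m(y(s)|pi_0)
delta0 = (1 - B0) * beta_hat + B0 * beta0           # posterior mean under pi_0
a = (1 - eps) / eps * m0
a1 = a * delta0

def post_mean(k):
    # posterior mean of beta under (1-eps)*pi_0 + eps*Unif(beta0-k, beta0+k)
    if k == 0:                                      # degenerate contamination at beta0
        H = norm.pdf(beta_hat, beta0, np.sqrt(v))
        H1 = beta0 * H
    else:
        grid = np.linspace(beta0 - k, beta0 + k, 2001)
        lik = norm.pdf(beta_hat, grid, np.sqrt(v))
        H = lik.mean()                              # (2k)^-1 * integral, cf. (3.3.3)
        H1 = (grid * lik).mean()                    # cf. (3.3.4)
    return (a1 + H1) / (a + H)

vals = np.array([post_mean(k) for k in np.linspace(0.0, 0.5, 501)])
print(vals.min(), vals.max())                       # range of the posterior mean
```

The extremes of this grid search correspond to the roots k_L and k_U of (3.3.9), and the whole range stays between the prior guess \beta_0 and the data estimate \hat\beta.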
Next we find the ML-II prior in this case. Since any symmetric unimodal distribution is a mixture of symmetric uniform distributions, for finding the ML-II prior it suffices to restrict attention to Q' = \{G_k : G_k uniform on (\beta_0 - k, \beta_0 + k), k \geq 0\}. The ML-II prior is then given by

\pi^* = (1 - \epsilon)\pi_0 + \epsilon\hat{q},   (3.3.11)

where \hat{q} is uniform on (\beta_0 - \hat{k}, \beta_0 + \hat{k}), \hat{k} being the value of k which maximizes m(y(s)|q).

To find \hat{k}, first write

m(y(s)|q) = (2k)^{-1}\int_{\beta_0-k}^{\beta_0+k} (2\pi\sigma^2)^{-n/2}(\prod_{i \in s} x_i)^{-1/2}\exp[-(2\sigma^2)^{-1}\sum_{i \in s}(y_i - \beta x_i)^2/x_i]d\beta
 = (2\pi\sigma^2)^{-(n-1)/2}(2k(n\bar{x}(s))^{1/2})^{-1}(\prod_{i \in s} x_i)^{-1/2}\exp[-(2\sigma^2)^{-1}\sum_{i \in s}(y_i - x_i\bar{y}(s)/\bar{x}(s))^2/x_i]
 \times [\Phi((n\bar{x}(s))^{1/2}k/\sigma - z(s)) - \Phi(-(n\bar{x}(s))^{1/2}k/\sigma - z(s))],   (3.3.12)

where z(s) = (n\bar{x}(s))^{1/2}(\bar{y}(s)/\bar{x}(s) - \beta_0)/\sigma. Differentiating m(y(s)|q) with respect to k and setting the derivative equal to zero, it follows from Berger and Sellke (1987) that

\hat{k} = k^*\sigma/(n\bar{x}(s))^{1/2}   if |z(s)| > 1
       = 0                            if |z(s)| \leq 1,   (3.3.13)

where k^* is a solution of the equation

\Phi(k^* - |z(s)|) - \Phi(-k^* - |z(s)|) = k^*[\phi(k^* - |z(s)|) + \phi(k^* + |z(s)|)].   (3.3.14)

Remark 3.3.1 Clearly k^* = 0 is a solution of (3.3.14). Berger (1985, p. 234) points out that there exists a unique solution k^*(> 0) of (3.3.14) for |z(s)| > 1. Lemma 2.3.1 contains the stronger result that there exists a unique solution k^* > |z(s)| of (3.3.14) for |z(s)| > 1.
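Equation (3.3.14) must be solved numerically for k^*. A small root-finding sketch (Python, standard library only; the bracket [0.01, 50] is our assumption):

```python
import math

def Phi(t):   # standard normal cdf
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def phi(t):   # standard normal pdf
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def g(k, z):  # lhs minus rhs of (3.3.14)
    return Phi(k - z) - Phi(-k - z) - k * (phi(k - z) + phi(k + z))

def k_star(z, lo=0.01, hi=50.0, iters=200):
    # bisection for the nonzero root; k* = 0 when |z| <= 1 (Remark 3.3.1)
    z = abs(z)
    if z <= 1.0 or g(lo, z) >= 0.0:
        return 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid, z) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(k_star(2.0))   # the unique positive root; larger than |z| = 2, per Lemma 2.3.1
```

Bisection suffices here because g(k, z) is negative just above 0 and tends to a positive limit as k grows, so the sign change brackets the unique positive root.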

Under the ML-II prior \pi^*, the posterior distribution of y(\bar{s}) given (s, y(s)) is

\pi^*(y(\bar{s})|s, y(s)) = \lambda_{SU}(z(s))\pi_0(y(\bar{s})|s, y(s)) + (1 - \lambda_{SU}(z(s)))m(y(\bar{s}), y(s)|\hat{q})/m(y(s)|\hat{q}),   (3.3.15)

where for \hat{k} > 0,

\lambda_{SU}(z(s)) = [1 + \epsilon(1 - \epsilon)^{-1}(2\pi/B_0)^{1/2}(1/2)\{\phi(k^* - z(s)) + \phi(k^* + z(s))\}\exp(B_0 z^2(s)/2)]^{-1}
 = \lambda_1(z(s)) (say),   (3.3.16)

while for \hat{k} = 0,

\lambda_{SU}(z(s)) = [1 + \epsilon(1 - \epsilon)^{-1}B_0^{-1/2}\exp(-(1 - B_0)z^2(s)/2)]^{-1}
 = \lambda_2(z(s)) (say).   (3.3.17)

The robust Bayes predictor of \gamma(y) under the ML-II prior \pi^* is then given by

e_{SU}(s, y(s)) = f\bar{y}(s) + (1 - f)\bar{x}(\bar{s})[\lambda_1(z(s))\{(1 - B_0(s))\bar{y}(s)/\bar{x}(s) + B_0(s)\beta_0\}
 + (1 - \lambda_1(z(s)))\{\bar{y}(s)/\bar{x}(s) - \sigma(k^*)^{-1}(n\bar{x}(s))^{-1/2}\tanh(k^* z(s))\}]   (3.3.18)

for \hat{k} > 0, while for \hat{k} = 0,

e_{SU}(s, y(s)) = f\bar{y}(s) + (1 - f)\bar{x}(\bar{s})[\lambda_2(z(s))\{(1 - B_0(s))\bar{y}(s)/\bar{x}(s) + B_0(s)\beta_0\}
 + (1 - \lambda_2(z(s)))\beta_0].   (3.3.19)

Also, generalizing the formula (1.8) in Berger and Berliner (1986), one gets after some heavy algebra the associated posterior variance

V(\gamma(y)|s, y(s)) = N^{-2}[\sigma^2(N - n)\bar{x}(\bar{s}) + (N - n)^2\bar{x}^2(\bar{s})
 \times \{\sigma^2\lambda_1/(M_0 + n\bar{x}(s)) + \sigma^2(1 - \lambda_1)(n\bar{x}(s))^{-1}(1 - (k^*)^{-1}\tanh(k^* z(s))(z(s) - (k^*)^{-1}\tanh(k^* z(s))))
 + \lambda_1(1 - \lambda_1)\{B_0(s)(\bar{y}(s)/\bar{x}(s) - \beta_0) - \sigma(k^*)^{-1}(n\bar{x}(s))^{-1/2}\tanh(k^* z(s))\}^2\}]   (3.3.20)

for \hat{k} > 0, while for \hat{k} = 0,

V(\gamma(y)|s, y(s)) = N^{-2}[\sigma^2(N - n)\bar{x}(\bar{s}) + (N - n)^2\bar{x}^2(\bar{s})\{\sigma^2\lambda_2/(M_0 + n\bar{x}(s))
 + \lambda_2(1 - \lambda_2)(1 - B_0(s))^2(\bar{y}(s)/\bar{x}(s) - \beta_0)^2\}].   (3.3.21)

Next we provide expressions for the indices of posterior robustness of the robust Bayes predictors proposed in this section under the \{N(\beta_0, \tau^2), \tau^2 > 0\} class of priors.

Calculations similar to those of the previous section provide, for \hat{k} > 0,

POR_\Gamma(e_{SU}) = (1 - f)^2\bar{x}^2(\bar{s})\max[\{B_0(s)\lambda_1(\bar{y}(s)/\bar{x}(s) - \beta_0) + (1 - \lambda_1)\sigma(k^*)^{-1}(n\bar{x}(s))^{-1/2}\tanh(k^* z(s))\}^2,
 \{(B_0(s)\lambda_1 - 1)(\bar{y}(s)/\bar{x}(s) - \beta_0) + (1 - \lambda_1)\sigma(k^*)^{-1}(n\bar{x}(s))^{-1/2}\tanh(k^* z(s))\}^2],   (3.3.22)

while for \hat{k} = 0,

POR_\Gamma(e_{SU}) = (1 - f)^2\bar{x}^2(\bar{s})\max[(1 - B_0(s))^2\lambda_2^2, \{1 - \lambda_2(1 - B_0(s))\}^2](\bar{y}(s)/\bar{x}(s) - \beta_0)^2.   (3.3.23)

In order to examine the procedure robustness of e_{SU}, first note that under the N(\beta_0, \tau^2) prior (denoted by \xi_{\tau^2}),

r(\xi_{\tau^2}, e_{SU}) - r(\xi_{\tau^2}, e_B) = E[e_{SU}(s, y(s)) - e_B(s, y(s))]^2
 = (1 - f)^2\bar{x}^2(\bar{s})E[\{\lambda_1 B_0(s)(\bar{y}(s)/\bar{x}(s) - \beta_0) + (1 - \lambda_1)\sigma(k^*)^{-1}(n\bar{x}(s))^{-1/2}\tanh(k^* z(s))\}I_{[\hat{k}>0]}
 + \{1 - \lambda_2(1 - B_0(s))\}(\bar{y}(s)/\bar{x}(s) - \beta_0)I_{[\hat{k}=0]} - B(s)(\bar{y}(s)/\bar{x}(s) - \beta_0)]^2.   (3.3.24)

We now have the following theorem.

Theorem 3.3.1 r(\xi_{\tau^2}, e_{SU}) - r(\xi_{\tau^2}, e_B) = O(B^{1/2}).

Proof of Theorem 3.3.1 First use the inequality

rhs of (3.3.24)
 \leq 3(1 - f)^2\bar{x}^2(\bar{s})E[\{\lambda_1 B_0(\bar{y}(s)/\bar{x}(s) - \beta_0) + (1 - \lambda_1)\sigma(k^*)^{-1}(n\bar{x}(s))^{-1/2}\tanh(k^* z(s))\}^2 I_{[\hat{k}>0]}
 + \{1 - \lambda_2(1 - B_0)\}^2(\bar{y}(s)/\bar{x}(s) - \beta_0)^2 I_{[\hat{k}=0]} + B^2(\bar{y}(s)/\bar{x}(s) - \beta_0)^2].   (3.3.25)

Next observe that E[B^2(\bar{y}(s)/\bar{x}(s) - \beta_0)^2] = E[B^2\sigma^2(n\bar{x}(s)B)^{-1}\chi_1^2] = B\sigma^2/(n\bar{x}(s)) = O_e(B), and

E[\{1 - \lambda_2(1 - B_0)\}^2(\bar{y}(s)/\bar{x}(s) - \beta_0)^2 I_{[\hat{k}=0]}]
 = E[\{1 - \lambda_2(1 - B_0)\}^2(\bar{y}(s)/\bar{x}(s) - \beta_0)^2 I_{[z^2(s)\leq 1]}]
 = E[\{1 - (1 - B_0)/(1 + g\exp(-(1 - B_0)\chi_1^2/(2B)))\}^2\sigma^2(n\bar{x}(s)B)^{-1}\chi_1^2 I_{[\chi_1^2\leq B]}]   (3.3.26)

(g = \epsilon(1 - \epsilon)^{-1}B_0^{-1/2})

 \leq \sigma^2(n\bar{x}(s)B)^{-1}E(\chi_1^2 I_{[\chi_1^2\leq B]}),   (3.3.27)

since the term in curly brackets is at most 1. Note that

E(\chi_1^2 I_{[\chi_1^2\leq B]}) = \int_0^B x\exp(-x/2)(x/2)^{1/2-1}(2\Gamma(1/2))^{-1}dx \leq (2\pi)^{-1/2}\int_0^B x^{1/2}dx = (2\pi)^{-1/2}(2/3)B^{3/2}.   (3.3.28)

Combine (3.3.27) and (3.3.28) to get

lhs of (3.3.26) = O(B^{1/2}).   (3.3.29)

Next use the inequality

E[\{\lambda_1 B_0(\bar{y}(s)/\bar{x}(s) - \beta_0) + (1 - \lambda_1)\sigma(k^*)^{-1}(n\bar{x}(s))^{-1/2}\tanh(k^* z(s))\}^2 I_{[\hat{k}>0]}]
 \leq 2E[\{\lambda_1^2 B_0^2(\bar{y}(s)/\bar{x}(s) - \beta_0)^2 + (1 - \lambda_1)^2\sigma^2(n\bar{x}(s))^{-1}(k^*)^{-2}\tanh^2(k^* z(s))\}I_{[\hat{k}>0]}].   (3.3.30)

Now, writing g' = (2\pi)^{1/2}g/2,

E[\lambda_1^2 B_0^2(\bar{y}(s)/\bar{x}(s) - \beta_0)^2 I_{[\hat{k}>0]}]
 = E[B_0^2(\sigma^2/(n\bar{x}(s)))z^2(s)I_{[z^2(s)>1]}/\{1 + g'\exp(B_0 z^2(s)/2)(\phi(k^* - |z(s)|) + \phi(k^* + |z(s)|))\}^2].   (3.3.31)

Let K = \max(M_0, M_1, 2), where M_0 and M_1 are as in Lemma 2.3.2. Then, writing g'' = c_0 g' and using (i) of Lemma 2.3.2,

rhs of (3.3.31)
 \leq B_0^2(\sigma^2/(n\bar{x}(s)))E[B^{-1}\chi_1^2 I_{[B<\chi_1^2\leq K^2 B]}]
 + (\sigma^2/(n\bar{x}(s)))E[k^* z^2(s)\{k^* + g''\exp(B_0 z^2(s)/2)\}^{-1}I_{[z^2(s)>K^2]}].   (3.3.32)

But

E[\chi_1^2 I_{[B<\chi_1^2\leq K^2 B]}] \leq (2\pi)^{-1/2}(2/3)B^{3/2}(K^3 - 1),   (3.3.33)

so that the first term on the rhs of (3.3.32) is O(B^{1/2}). Also, using (iii) of Lemma 2.3.2, and (\log k^*)/k^* \leq 1 for |z(s)| > K,

E[k^* z^2(s)\{k^* + g''\exp(B_0 z^2(s)/2)\}^{-1}I_{[z^2(s)>K^2]}]
 \leq E[z^2(s)(|z(s)| + c_1(\log k^*)^{1/2})\{k^* + g''\exp(B_0 z^2(s)/2)\}^{-1}I_{[z^2(s)>K^2]}]
 \leq E[|z(s)|^3\{g''\exp(B_0 z^2(s)/2)\}^{-1}I_{[z^2(s)>K^2]}] + c_2 E[z^2(s)\exp(-B_0 z^2(s)/4)I_{[z^2(s)>K^2]}].   (3.3.34)

But

E[|z(s)|^3\exp(-B_0 z^2(s)/2)I_{[z^2(s)>K^2]}]
 = E[(\chi_1^2/B)^{3/2}\exp(-(B_0/(2B))\chi_1^2)I_{[\chi_1^2>K^2 B]}]
 \leq (2\pi)^{-1/2}B^{-3/2}\int_0^\infty x\exp(-(x/2)(B_0 B^{-1} + 1))dx
 = (2\pi)^{-1/2}B^{-3/2}\cdot 4(B_0 B^{-1} + 1)^{-2} = O(B^{1/2}).   (3.3.35)

Moreover,

E[z^2(s)\exp(-B_0 z^2(s)/4)I_{[z^2(s)>K^2]}]
 = E[(\chi_1^2/B)\exp(-(B_0/(4B))\chi_1^2)I_{[\chi_1^2>K^2 B]}]
 \leq (2\pi)^{-1/2}B^{-1}\int_0^\infty x^{1/2}\exp(-(x/2)(B_0/(2B) + 1))dx = O(B^{1/2}).   (3.3.36)

Combine (3.3.34) - (3.3.36) to conclude that

E[\lambda_1^2 B_0^2(\bar{y}(s)/\bar{x}(s) - \beta_0)^2 I_{[\hat{k}>0]}] = O(B^{1/2}).   (3.3.37)

Finally,

E[(1 - \lambda_1)^2\sigma^2(n\bar{x}(s))^{-1}(k^*)^{-2}\tanh^2(k^* z(s))I_{[\hat{k}>0]}]
 \leq \sigma^2(n\bar{x}(s))^{-1}E[(k^*)^{-2}\tanh^2(k^* z(s))I_{[|z(s)|>1]}]
 \leq \sigma^2(n\bar{x}(s))^{-1}\{E[z^2(s)I_{[1<|z(s)|\leq K]}] + E[(k^*)^{-2}I_{[|z(s)|>K]}]\},   (3.3.38)

where in the final inequality of (3.3.38) we use |\tanh(k^* z(s))| \leq k^*|z(s)| for 1 < |z(s)| \leq K and |\tanh(k^* z(s))| \leq 1 for |z(s)| > K.

As before,

E[z^2(s)I_{[1<|z(s)|\leq K]}] = E[B^{-1}\chi_1^2 I_{[B<\chi_1^2\leq K^2 B]}] = O(B^{1/2}).   (3.3.39)

Also, since k^* > |z(s)|,

E[(k^*)^{-2}I_{[|z(s)|>K]}] \leq E[z^{-2}(s)I_{[z^2(s)>K^2]}] = E[B(\chi_1^2)^{-1}I_{[\chi_1^2>K^2 B]}]
 = B\int_{K^2 B}^\infty x^{-1}\exp(-x/2)x^{1/2-1}(2\pi)^{-1/2}dx
 \leq B(2\pi)^{-1/2}\int_{K^2 B}^\infty x^{-3/2}dx = 2B(2\pi)^{-1/2}(K^2 B)^{-1/2} = O(B^{1/2}).   (3.3.40)

From (3.3.38) - (3.3.40), lhs of (3.3.38) = O(B^{1/2}).

Combine (3.3.25), (3.3.26), (3.3.27), (3.3.29), (3.3.30), (3.3.37), (3.3.38), (3.3.39) and (3.3.40) to get the theorem.

Remark 3.3.2 It follows from the above theorem that as n \to \infty, i.e., B(s) \to 0, under the subjective N(\beta_0, \tau^2) prior, e_{SU}, the robust Bayes estimator of \gamma(y) = N^{-1}\sum_{i=1}^N y_i, is asymptotically optimal in the sense of Robbins (1955).

3.4 An Example

The example in this section considers one of the six real populations used in Royall and Cumberland (1981) for an empirical study of the ratio estimator and estimates of its variance. Our population consists of the 1960 and 1970 populations, in millions, of 125 US cities with 1960 population between 100,000 and 1,000,000. Here the auxiliary information is the 1960 population. The populations of the different cities are shown in Figure 3.1.

[Figure 3.1. Cities Populations: 1970 population (millions) plotted against 1960 population (millions).]

The problem is to estimate the mean (or total) number of inhabitants of those 125 cities in 1970. For the complete population in 1970, we find that the population mean is 0.29034. We select a 20% simple random sample without replacement from this population, so the sample size is n = 25. Also, we use \sigma^2 = 4.84844 \times 10^{-3}, computed from the complete population, which is assumed to be known. We can easily obtain the ratio estimate and the corresponding standard error. For the Bayesian analysis, we use both the 1950 and 1960 populations of the 125 cities to elicit the base prior \pi_0 for \beta. The elicited prior \pi_0 is the N(1.15932, 1.21097 \times 10^{-3}) distribution, based on the prior information. Under this elicited prior \pi_0, we use formulas (3.1.2) and (3.1.3) to obtain the subjective Bayes predictor and the associated posterior variance. But we have some uncertainty in \pi_0 and the prior information, so we choose \epsilon = .1 and we get the robust Bayes

Table 3.1. Predictors, Associated Standard Errors and Posterior Robustness Index

Predictor    Value      SE                 |\gamma(y) - e|     POR
e_R          0.28426    1.10032 x 10^-2    6.08452 x 10^-3     5.38660 x 10^-4
e_B0         0.29336    5.61481 x 10^-3    3.01418 x 10^-3     3.27488 x 10^-4
e_RB         0.28660    5.49854 x 10^-3    3.74880 x 10^-3     4.35696 x 10^-4
e_SU         0.29027    5.74954 x 10^-3    7.83722 x 10^-5     3.29777 x 10^-4

predictors and the associated posterior variances using formulas (3.2.16), (3.2.17), (3.3.18) and (3.3.20). For illustrative purposes, we have decided to report our analysis for one sample. Table 3.1 provides the classical ratio estimate e_R, the subjective Bayes predictor e_{B_0}, the robust Bayes predictor e_{RB} with all possible contaminations, the robust Bayes predictor e_{SU} with all symmetric unimodal contaminations, and the respective associated standard errors. Table 3.1 also provides the posterior robustness index for each predictor, which is in a sense the sensitivity index of the predictor as the prior varies over the class \{N(\beta_0, \tau^2), \tau^2 > 0\}.

An inspection of Table 3.1 reveals that the robust Bayes predictors e_{RB} and e_{SU} are well behaved in the sense that e_{SU} is closest to \gamma(y) and both e_{RB} and e_{SU} are closer to \gamma(y) than the classical ratio estimate e_R. The subjective Bayes predictor e_{B_0} is good in the sense of the posterior robustness index. Note that e_R is worst both in closeness to \gamma(y) and in the posterior robustness index.

Also, we find that the posterior mean of \gamma(y) = N^{-1}\sum_{i=1}^N y_i lies in the interval (0.28363, 0.29428) for all possible contaminations and in the interval (0.29003, 0.29364) for all symmetric unimodal contaminations. The range of the posterior mean of \gamma(y) is fairly small in both cases. So if we feel that the true prior is close to a specific one, say \pi_0, we can model via one of the contamination models and achieve very robust inference.

CHAPTER 4
BAYESIAN ANALYSIS UNDER HEAVY-TAILED PRIORS

4.1 Introduction

In this chapter, we consider the idea of developing priors that are inherently robust in some sense. The idea is that it is perhaps easier to build robustness into the analysis at the beginning, than to attempt verifying robustness at the end.

Substantial evidence has been presented to the effect that priors with tails flatter than those of the likelihood function tend to be fairly robust (e.g., Box and Tiao (1968, 1973), Dawid (1973), O'Hagan (1979, 1989) and West (1985)). It is thus desirable to develop fairly broad classes of flat-tailed priors for use in "standard" Bayesian analyses. Andrews and Mallows (1974) and West (1987) studied scale mixtures of normal distributions, which can be used for simulation and in the analysis of outlier models. The Student t family, the double-exponential, the logistic, and the exponential power family can all be constructed as scale mixtures of normals. The exponential power family was introduced and popularized by Box and Tiao (1973) in the context of Bayesian modelling for robustness. Recently, Angers and Berger (1992) and Angers (1992) considered t priors in the hierarchical Bayes setting, while Datta and Lahiri (1994) considered general scale mixtures of normals, primarily with the aim of outlier detection in the context of small area estimation.

The price to be paid for utilization of inherently robust procedures is computational; closed form calculation is no longer possible. Recently, however, Markov chain Monte Carlo integration techniques, in particular Gibbs sampling (Geman and Geman (1984), Gelfand and Smith (1990), and Gelfand et al. (1990)), have proved to be a simple yet powerful tool for performing robust Bayes computations.

Ericson (1969) considered the superpopulation model y_i = \theta + e_i, where \theta, e_1, ..., e_N are independently distributed with \theta \sim N(\mu_0, \tau_0^2) and the e_i's iid N(0, \sigma^2). As we have seen in Chapter 2, under the N(\mu_0, \tau_0^2) prior, the Bayes estimator of \gamma(y) = N^{-1}\sum_{i=1}^N y_i is

\delta_0(s, y(s)) = f\bar{y}(s) + (1 - f)\{(1 - B_0)\bar{y}(s) + B_0\mu_0\}.   (4.1.1)

Recall that f = n/N, M_0 = \sigma^2/\tau_0^2 and B_0 = M_0/(M_0 + n). For unknown \sigma^2, a normal-gamma prior was used.

The purpose of this chapter is to develop inherently robust Bayes procedures to overcome the problem associated with outliers in the context of finite population sampling. We consider a refinement based on heavy-tailed prior distributions on \theta using scale mixtures of normals for both known and unknown \sigma^2. We use the same notations as in Chapter 2.

The outline of the remaining sections is as follows. In Section 4.2, we provide the robust Bayes estimators of \gamma(y) based on heavy-tailed prior distributions using scale mixtures of normals when \sigma^2 is known. Also, the asymptotic optimality (A.O.), in the sense of Robbins (1955), of the proposed robust Bayes estimators is proved. The above program is repeated in Section 4.3, with the exception that \sigma^2 is unknown. Once again, robust Bayes estimators are proposed using scale mixtures of normals, and their A.O. property is studied. Finally, in Section 4.4, a numerical example is provided to illustrate the results of the preceding sections.

4.2 Known Variance

Consider the case when (i) y_i | \theta \sim iid N(\theta, \sigma^2) (i = 1, ..., N) and (ii) \theta \sim \tau_0^{-1}p((\theta - \mu_0)/\tau_0), where p(x) = \int_0^\infty \lambda^{1/2}\phi(x\lambda^{1/2})g(\lambda)d\lambda; that is, p(.) is a scale mixture of the normal distribution with mixing distribution g(.). Note that we can write (ii) in the following two steps: (iia) \theta | \lambda \sim N(\mu_0, \lambda^{-1}) and (iib) \lambda \sim \tau_0^2 g(\tau_0^2\lambda), where \int_0^\infty g(x)dx = 1. The following list identifies the necessary functional form for g(\lambda) to obtain a wide range of densities which represent departures from normality:

t-priors: If k\lambda \sim \chi_k^2, then \theta is Student t with k degrees of freedom, location parameter \mu_0, and scale parameter \tau_0.

double-exponential priors: If 1/\lambda has an exponential distribution with mean 2, then \theta is double-exponential with location parameter \mu_0 and scale parameter \tau_0.

exponential power family priors: If \lambda has a positive stable distribution with index \alpha/2, then \theta has an exponential power distribution with location parameter \mu_0 and scale parameter \tau_0.

logistic priors: If \sqrt{\lambda} has the asymptotic Kolmogorov distance distribution, then \theta is logistic with location parameter \mu_0 and scale parameter \tau_0. [A random variable Z is said to have an asymptotic Kolmogorov distance distribution if it has a pdf of the form f(z) = 8z\sum_{j=1}^\infty (-1)^{j-1}j^2\exp(-2j^2 z^2)I_{(0,\infty)}(z).]
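The mixing recipes above are easy to verify by simulation. A minimal sketch (Python; the sample sizes and the variance checks are our additions) draws \theta from two of the mixtures and checks the implied variances, k/(k - 2) \tau_0^2 for the t_k prior and 2\tau_0^2 for the double-exponential prior:

```python
import numpy as np

rng = np.random.default_rng(0)
m, mu0, tau0 = 200_000, 0.0, 1.0
z = rng.standard_normal(m)

# t-prior: k * lambda ~ chi^2_k  =>  theta ~ t_k(mu0, tau0)
k = 5
lam_t = rng.chisquare(k, m) / k                  # tau0^2 * lambda
theta_t = mu0 + tau0 * z / np.sqrt(lam_t)

# double-exponential prior: 1/lambda ~ Exp(mean 2)  =>  theta ~ Laplace(mu0, tau0)
w = rng.exponential(2.0, m)                      # 1/(tau0^2 * lambda)
theta_de = mu0 + tau0 * np.sqrt(w) * z

print(theta_t.var(), theta_de.var())             # near k/(k-2) = 5/3 and 2
```

Both draws use the two-step construction (iia)-(iib): sample the precision mixing variable first, then \theta conditionally normal.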

We shall use the notations \bar{y}(s) = n^{-1}\sum_{i \in s} y_i and y(\bar{s}) = (y_i : i \notin s), the suffixes in y(\bar{s}) being arranged in ascending order. Then the posterior distribution of y(\bar{s}) given s and y(s) is obtained as follows:
(i) conditional on s, y(s) and \lambda, y(\bar{s}) is N((B(\lambda)\mu_0 + (1 - B(\lambda))\bar{y}(s))1_{N-n}, \sigma^2(I_{N-n} + (\lambda\sigma^2 + n)^{-1}1_{N-n}1'_{N-n})), where B(\lambda) = \lambda\sigma^2/(\lambda\sigma^2 + n);

(ii) the conditional distribution of \lambda given s and y(s) has pdf

f(\lambda|s, y(s)) \propto (\sigma^2 + n\lambda^{-1})^{-1/2}\exp[-n(\bar{y}(s) - \mu_0)^2/\{2(\sigma^2 + n\lambda^{-1})\}]g(\tau_0^2\lambda).   (4.2.1)

Note that under the posterior distribution given in (4.2.1), the Bayes estimator of \gamma(y) is given by

\delta^{SM}(s, y(s)) = E[\gamma(y)|s, y(s)]
 = f\bar{y}(s) + (1 - f)\{E[B(\lambda)|s, y(s)]\mu_0 + (1 - E[B(\lambda)|s, y(s)])\bar{y}(s)\}.   (4.2.2)

Also, one gets

V(\gamma(y)|s, y(s)) = E[V(\gamma(y)|s, y(s), \lambda)|s, y(s)] + V[E(\gamma(y)|s, y(s), \lambda)|s, y(s)]
 = N^{-2}\sigma^2\{(N - n) + (N - n)^2 E((\lambda\sigma^2 + n)^{-1}|s, y(s))\}
 + (1 - f)^2 V(B(\lambda)\mu_0 + (1 - B(\lambda))\bar{y}(s)|s, y(s)).   (4.2.3)

The calculations in (4.2.2) and (4.2.3) can be performed using one-dimensional numerical integration. Alternatively, one can use Monte Carlo numerical integration techniques to generate the posterior distribution and the associated means and variances. More specifically, in this chapter we use Gibbs sampling, originally introduced in Geman and Geman (1984), and more recently popularized by Gelfand and Smith (1990) and Gelfand et al. (1990). Gibbs sampling is described below.
Gibbs sampling is a Markovian updating scheme. Given an arbitrary starting set of values U_1^{(0)}, ..., U_k^{(0)}, we draw U_1^{(1)} \sim [U_1 | U_2^{(0)}, ..., U_k^{(0)}], U_2^{(1)} \sim [U_2 | U_1^{(1)}, U_3^{(0)}, ..., U_k^{(0)}], ..., U_k^{(1)} \sim [U_k | U_1^{(1)}, ..., U_{k-1}^{(1)}], where [. | .] denotes the relevant conditional distributions. Thus each variable is visited in the natural order, and a cycle in this scheme requires k random variate generations. After t such iterations, one arrives at (U_1^{(t)}, ..., U_k^{(t)}). As t \to \infty, (U_1^{(t)}, ..., U_k^{(t)}) converges in distribution to (U_1, ..., U_k).

Gibbs sampling through q replications of the aforementioned t iterations generates q k-tuples (U_{1j}^{(t)}, ..., U_{kj}^{(t)}) (j = 1, ..., q) for t large enough. U_1, ..., U_k could possibly be vectors in the above scheme.

Gelman and Rubin (1992) adopt multiple sequences, with starting points drawn from an overdispersed distribution, to monitor the convergence of the Gibbs sampler. Specifically, m (\geq 2) independent sequences are generated, each of length 2d. But to diminish the effect of the starting distribution, the first d iterations of each sequence are discarded. Hence, we have m x d simulated values for each parameter of interest.

Using Gibbs sampling, the posterior distribution of y(\bar{s}) is approximated by

(md)^{-1}\sum_{i=1}^m\sum_{j=1}^d [y(\bar{s}) | s, y(s), \theta = \theta_{ij}, \lambda = \lambda_{ij}].   (4.2.4)

To estimate the posterior moments, we use Rao-Blackwellized estimates as in Gelfand and Smith (1991). Note that E[\gamma(y)|s, y(s)] is approximated by

f\bar{y}(s) + (1 - f)(md)^{-1}\sum_{i=1}^m\sum_{j=1}^d (B(\lambda_{ij})\mu_0 + (1 - B(\lambda_{ij}))\bar{y}(s)).   (4.2.5)

Next one approximates V(\gamma(y)|s, y(s)) by

N^{-2}\sigma^2\{(N - n) + (N - n)^2(md)^{-1}\sum_{i=1}^m\sum_{j=1}^d (\lambda_{ij}\sigma^2 + n)^{-1}\}
 + (1 - f)^2[(md)^{-1}\sum_{i=1}^m\sum_{j=1}^d (B(\lambda_{ij})\mu_0 + (1 - B(\lambda_{ij}))\bar{y}(s))^2
 - \{(md)^{-1}\sum_{i=1}^m\sum_{j=1}^d (B(\lambda_{ij})\mu_0 + (1 - B(\lambda_{ij}))\bar{y}(s))\}^2].   (4.2.6)

The Gibbs sampling analysis is based on the following posterior distributions:

(i) \theta | s, y(s), y(\bar{s}), \lambda \sim N[(\lambda\mu_0 + \sum_{i=1}^N y_i/\sigma^2)/(\lambda + N/\sigma^2), (\lambda + N/\sigma^2)^{-1}];

(ii) f(\lambda | s, y(s), y(\bar{s}), \theta) \propto \sqrt{\lambda}\exp[-\lambda(\theta - \mu_0)^2/2]g(\tau_0^2\lambda);

(iii) y(\bar{s}) | s, y(s), \theta, \lambda \sim N[\theta 1_{N-n}, \sigma^2 I_{N-n}].

Note that if k\lambda \sim \chi_k^2, then f(\lambda | s, y(s), y(\bar{s}), \theta) reduces to a Gamma(\frac{1}{2}\{\tau_0^2 k + (\theta - \mu_0)^2\}, \frac{1}{2}(k + 1)) density. [A random variable W is said to have a Gamma(\alpha, \beta) distribution if it has a pdf of the form f(w) \propto \exp(-\alpha w)w^{\beta-1}I_{(0,\infty)}(w), where I denotes the usual indicator function.] Also, if 1/\lambda has an exponential distribution with mean 2, then f(\lambda | s, y(s), y(\bar{s}), \theta) reduces to an IGN(1/\sqrt{\tau_0^2(\theta - \mu_0)^2}, 1/\tau_0^2) density. [A random variable V is said to have an IGN(\eta_1, \eta_2) distribution if it has a pdf of the form f(v) \propto v^{-3/2}\exp\{-\eta_2(v - \eta_1)^2/(2\eta_1^2 v)\}I_{(0,\infty)}(v).]
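Conditionals (i) - (iii) give a compact Gibbs sampler. The sketch below (Python) implements the t-prior case with known \sigma^2; the data, hyperparameters and chain lengths are all illustrative assumptions, and a single chain with burn-in is used in place of the Gelman-Rubin multiple-sequence scheme:

```python
import numpy as np

rng = np.random.default_rng(2)

# synthetic sampled data; sigma^2 known; t_k prior on theta via the mixing lambda
N, n, sigma2 = 125, 25, 1.0
mu0, tau02, k = 0.0, 1.0, 5
ys = rng.normal(0.3, np.sqrt(sigma2), n)           # observed y(s)

T, burn = 2000, 500
theta, lam = ys.mean(), 1.0 / tau02
y_miss = np.full(N - n, ys.mean())
draws = []
for it in range(T):
    tot = ys.sum() + y_miss.sum()
    prec = lam + N / sigma2                        # (i) theta | rest
    theta = rng.normal((lam * mu0 + tot / sigma2) / prec, np.sqrt(1.0 / prec))
    rate = 0.5 * (tau02 * k + (theta - mu0) ** 2)  # (ii) lambda | rest, t-prior case
    lam = rng.gamma(0.5 * (k + 1), 1.0 / rate)     # Gamma(rate, shape) of the text
    y_miss = rng.normal(theta, np.sqrt(sigma2), N - n)  # (iii) y(s_bar) | rest
    if it >= burn:
        draws.append((ys.sum() + y_miss.sum()) / N)

gamma_hat = np.mean(draws)                         # posterior mean of gamma(y)
print(gamma_hat)
```

Note that the Gamma(\alpha, \beta) of the text is rate-shape, so the numpy call uses shape \frac{1}{2}(k + 1) and scale 1/rate.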
We shall now evaluate the performance of the robust Bayes estimator \delta^{SM} of \gamma(y) for large n under the N(\mu_0, \tau_0^2) prior, say \pi_0. The Bayes estimator of \gamma(y) under this prior is \delta_0, given by (4.1.1). Let r(\pi_0, \delta) denote the Bayes risk of an estimator \delta of \gamma(y) under the prior \pi_0. Our aim is to show that r(\pi_0, \delta^{SM}) - r(\pi_0, \delta_0) \to 0 as n \to \infty.

Lemma 4.2.1 Assume E(\lambda^{3/2}) < \infty. Then E[B(\lambda)|s, y(s)] \to_p 0 as n \to \infty.

Proof of Lemma 4.2.1 Note that

E[B(\lambda)|s, y(s)]
 = [\int_0^\infty \sigma^2(\sigma^2 + n\lambda^{-1})^{-3/2}\exp[-n(\bar{y}(s) - \mu_0)^2/\{2(\sigma^2 + n\lambda^{-1})\}]g(\tau_0^2\lambda)d\lambda]
 / [\int_0^\infty (\sigma^2 + n\lambda^{-1})^{-1/2}\exp[-n(\bar{y}(s) - \mu_0)^2/\{2(\sigma^2 + n\lambda^{-1})\}]g(\tau_0^2\lambda)d\lambda]
 \leq P_n/Q_n (say),   (4.2.7)

where, bounding the exponential factor by 1 in the numerator and using n(\bar{y}(s) - \mu_0)^2/(\sigma^2 + n\lambda^{-1}) \leq \lambda(\bar{y}(s) - \mu_0)^2 in the denominator,

P_n \leq n^{-1/2}E(\lambda^{3/2}) and Q_n = E[(1 + n^{-1}\sigma^2\lambda)^{-1/2}\lambda^{1/2}\exp(-\lambda(\bar{y}(s) - \mu_0)^2/2)].   (4.2.8)

Now, P_n \to 0 if E(\lambda^{3/2}) < \infty. Also, note that \bar{y}(s) - \mu_0 is the centered mean of an exchangeable sequence of random variables, and hence is a centered backward martingale. Hence (\bar{y}(s) - \mu_0)^2 is a backward submartingale. Since \limsup_{n\to\infty} E(\bar{y}(s) - \mu_0)^2 < +\infty, by the submartingale convergence theorem, (\bar{y}(s) - \mu_0)^2 converges a.s. to a rv, say Y_0. Hence, using Fatou's lemma,

\liminf_{n\to\infty} Q_n \geq E[\lambda^{1/2}\exp(-\lambda Y_0/2)],   (4.2.9)

where the lower bound is bounded away from zero a.s. Hence P_n/Q_n \to 0 as n \to \infty.

We now turn to the theorem which proves the A.O. property of \delta^{SM} obtained in (4.2.2).

Theorem 4.2.1 Assume E(\lambda^{3/2}) < \infty. Then r(\pi_0, \delta^{SM}) - r(\pi_0, \delta_0) \to 0 as n \to \infty.

Proof of Theorem 4.2.1 Standard Bayesian calculations yield

r(\pi_0, \delta^{SM}) - r(\pi_0, \delta_0) = E(\delta^{SM} - \delta_0)^2
 = (1 - f)^2 E[(E(B(\lambda)|s, y(s)) - B_0)^2(\bar{y}(s) - \mu_0)^2].   (4.2.10)

By Lemma 4.2.1, E[B(\lambda)|s, y(s)] \to_p 0 as n \to \infty. Also, B_0 \to 0 as n \to \infty. Hence, (E(B(\lambda)|s, y(s)) - B_0)^2 \to_p 0 as n \to \infty. Also, |E(B(\lambda)|s, y(s)) - B_0| \leq 1 and (\bar{y}(s) - \mu_0)^2, being a backward submartingale, is uniformly integrable. Hence, the rhs of (4.2.10) \to 0 as n \to \infty. This completes the proof of the theorem.

4.3 Unknown Variance

In this section, somewhat more realistically, we consider the normal superpopulation model with unknown mean \theta and unknown variance r^{-1}. Ericson (1969) used a normal-gamma prior on (\theta, r) in this setting; that is, \theta | r \sim N(\mu_0, r^{-1}\tau_0^2) and r \sim Gamma(\frac{1}{2}a_0, \frac{1}{2}g_0). But in this case the ratio of the model variance and the prior variance is known.

Suppose now, more generally, that y_i | \theta, r \sim iid N(\theta, r^{-1}) (i = 1, ..., N), and \theta and r are independently distributed with \theta \sim N(\mu_0, \tau_0^2) and r \sim Gamma(\frac{1}{2}a_0, \frac{1}{2}g_0). Then the posterior distribution of y(\bar{s}) given s and y(s) is obtained via the following two steps:
(i) conditional on s, y(s) and r, y(\bar{s}) is N((B(r)\mu_0 + (1 - B(r))\bar{y}(s))1_{N-n}, r^{-1}(I_{N-n} + (M(r) + n)^{-1}1_{N-n}1'_{N-n})), where M(r) = r^{-1}/\tau_0^2 and B(r) = M(r)/(M(r) + n);

(ii) the conditional distribution of r given s and y(s) has pdf

f(r|s, y(s)) \propto r^{(n+g_0-2)/2}(1 + n\tau_0^2 r)^{-1/2}\exp[-(r/2)\{a_0 + \sum_{i \in s}(y_i - \bar{y}(s))^2 + n(\bar{y}(s) - \mu_0)^2/(1 + n\tau_0^2 r)\}].   (4.3.1)

Note that under the posterior distribution given in (4.3.1), the Bayes estimator of \gamma(y) is given by

\delta^B(s, y(s)) = E[\gamma(y)|s, y(s)]
 = f\bar{y}(s) + (1 - f)\{E[B(r)|s, y(s)]\mu_0 + (1 - E[B(r)|s, y(s)])\bar{y}(s)\}.   (4.3.2)

Also, one gets

V(\gamma(y)|s, y(s)) = E[V(\gamma(y)|s, y(s), r)|s, y(s)] + V[E(\gamma(y)|s, y(s), r)|s, y(s)]
 = N^{-2}(N - n)E[r^{-1}\{1 + (N - n)n^{-1}(1 - B(r))\}|s, y(s)]
 + (1 - f)^2(\bar{y}(s) - \mu_0)^2 V(B(r)|s, y(s)).   (4.3.3)

In order to robustify the above model, consider the case when (i) y_i | \theta, r \sim iid N(\theta, r^{-1}) (i = 1, ..., N), (ii) r \sim Gamma(\frac{1}{2}a_0, \frac{1}{2}g_0) and (iii) \theta \sim \tau_0^{-1}p((\theta - \mu_0)/\tau_0), where p(x) = \int_0^\infty \lambda^{1/2}\phi(x\lambda^{1/2})g(\lambda)d\lambda. Note that the prior pdf of \theta does not depend on r. Recall that we can write (iii) in the following two steps: (iiia) \theta | \lambda \sim N(\mu_0, \lambda^{-1}) and (iiib) \lambda \sim \tau_0^2 g(\tau_0^2\lambda), where \int_0^\infty g(x)dx = 1.
Then the posterior distribution of y(\bar{s}) given s and y(s) is obtained as follows:
(i) conditional on s, y(s), r and \lambda, y(\bar{s}) is N((B(\lambda, r)\mu_0 + (1 - B(\lambda, r))\bar{y}(s))1_{N-n}, r^{-1}(I_{N-n} + (\lambda r^{-1} + n)^{-1}1_{N-n}1'_{N-n})), where B(\lambda, r) = 1/(1 + n\lambda^{-1}r);

(ii) the conditional distribution of \lambda and r given s and y(s) has pdf

f(\lambda, r|s, y(s)) \propto g(\tau_0^2\lambda)r^{(n+g_0-2)/2}(1 + n\lambda^{-1}r)^{-1/2}\exp[-(r/2)\{a_0 + \sum_{i \in s}(y_i - \bar{y}(s))^2 + n(\bar{y}(s) - \mu_0)^2/(1 + n\lambda^{-1}r)\}].   (4.3.4)

Note that under the posterior distribution given in (4.3.4), the Bayes estimator of \gamma(y) is given by

\delta^{SM}(s, y(s)) = E[\gamma(y)|s, y(s)]
 = f\bar{y}(s) + (1 - f)\{E[B(\lambda, r)|s, y(s)]\mu_0 + (1 - E[B(\lambda, r)|s, y(s)])\bar{y}(s)\}.   (4.3.5)

Also, one gets

V(\gamma(y)|s, y(s)) = E[V(\gamma(y)|s, y(s), \lambda, r)|s, y(s)] + V[E(\gamma(y)|s, y(s), \lambda, r)|s, y(s)]
 = N^{-2}(N - n)E[r^{-1}\{1 + (N - n)(\lambda r^{-1} + n)^{-1}\}|s, y(s)]
 + (1 - f)^2 V(B(\lambda, r)\mu_0 + (1 - B(\lambda, r))\bar{y}(s)|s, y(s)).   (4.3.6)

Using Gibbs sampling, the posterior distribution of y(\bar{s}) is approximated by

(md)^{-1}\sum_{i=1}^m\sum_{j=1}^d [y(\bar{s}) | s, y(s), \theta = \theta_{ij}, \lambda = \lambda_{ij}, r = r_{ij}].   (4.3.7)

To estimate the posterior moments, we use once again the Rao-Blackwellized estimates as in Gelfand and Smith (1991). Note that E[\gamma(y)|s, y(s)] is approximated by

f\bar{y}(s) + (1 - f)(md)^{-1}\sum_{i=1}^m\sum_{j=1}^d (B(\lambda_{ij}, r_{ij})\mu_0 + (1 - B(\lambda_{ij}, r_{ij}))\bar{y}(s)).   (4.3.8)

Next one approximates V(\gamma(y)|s, y(s)) by

N^{-2}(N - n)(md)^{-1}\sum_{i=1}^m\sum_{j=1}^d [r_{ij}^{-1}\{1 + (N - n)(\lambda_{ij}r_{ij}^{-1} + n)^{-1}\}]
 + (1 - f)^2[(md)^{-1}\sum_{i=1}^m\sum_{j=1}^d (B(\lambda_{ij}, r_{ij})\mu_0 + (1 - B(\lambda_{ij}, r_{ij}))\bar{y}(s))^2
 - \{(md)^{-1}\sum_{i=1}^m\sum_{j=1}^d (B(\lambda_{ij}, r_{ij})\mu_0 + (1 - B(\lambda_{ij}, r_{ij}))\bar{y}(s))\}^2].   (4.3.9)

The Gibbs sampling analysis is based on the following posterior distributions:

(i) \theta | s, y(s), y(\bar{s}), \lambda, r \sim N[(\lambda\mu_0 + r\sum_{i=1}^N y_i)/(\lambda + rN), (\lambda + rN)^{-1}];

(ii) f(\lambda | s, y(s), y(\bar{s}), \theta, r) \propto \sqrt{\lambda}\exp[-\lambda(\theta - \mu_0)^2/2]g(\tau_0^2\lambda);

(iii) r | s, y(s), y(\bar{s}), \lambda, \theta \sim Gamma(\frac{1}{2}\{a_0 + \sum_{i=1}^N(y_i - \theta)^2\}, \frac{1}{2}(N + g_0));

(iv) y(\bar{s}) | s, y(s), \theta, \lambda, r \sim N[\theta 1_{N-n}, r^{-1}I_{N-n}].

Recall that if k\lambda \sim \chi_k^2, then f(\lambda | s, y(s), y(\bar{s}), \theta) reduces to a Gamma(\frac{1}{2}\{\tau_0^2 k + (\theta - \mu_0)^2\}, \frac{1}{2}(k + 1)) density. Also, if 1/\lambda has an exponential distribution with mean 2, then f(\lambda | s, y(s), y(\bar{s}), \theta) reduces to an IGN(1/\sqrt{\tau_0^2(\theta - \mu_0)^2}, 1/\tau_0^2) density.
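For the unknown-variance model, the only changes relative to the known-variance sampler are the r-draw in step (iii) and the use of r^{-1} as the model variance. A sketch (Python, t-prior case; data, hyperparameters and chain lengths are again illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
N, n = 125, 25
mu0, tau02, k = 0.0, 1.0, 5        # t_k prior on theta via the mixing lambda
a0, g0 = 1.0, 1.0                  # Gamma(a0/2, g0/2) prior on the precision r
ys = rng.normal(0.3, 1.0, n)       # observed y(s)

T, burn = 2000, 500
theta, lam, r = ys.mean(), 1.0 / tau02, 1.0
y_miss = np.full(N - n, ys.mean())
draws = []
for it in range(T):
    y_all = np.concatenate([ys, y_miss])
    prec = lam + r * N                                   # (i) theta | rest
    theta = rng.normal((lam * mu0 + r * y_all.sum()) / prec, np.sqrt(1.0 / prec))
    lam = rng.gamma(0.5 * (k + 1),                       # (ii) lambda | rest
                    1.0 / (0.5 * (tau02 * k + (theta - mu0) ** 2)))
    r = rng.gamma(0.5 * (N + g0),                        # (iii) r | rest
                  1.0 / (0.5 * (a0 + ((y_all - theta) ** 2).sum())))
    y_miss = rng.normal(theta, np.sqrt(1.0 / r), N - n)  # (iv) y(s_bar) | rest
    if it >= burn:
        draws.append((ys.sum() + y_miss.sum()) / N)

print(np.mean(draws))              # posterior mean of gamma(y)
```

As in the known-variance case, the Gamma(\alpha, \beta) notation of the text is rate-shape, so each numpy gamma call passes the shape first and the reciprocal rate as the scale.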

Now, to evaluate the performance of the robust Bayes estimator \delta^{SM} of \gamma(y) for large n, we denote by \xi the prior under which \theta and r are independent with \theta \sim N(\mu_0, \tau_0^2) and r \sim Gamma(\frac{1}{2}a_0, \frac{1}{2}g_0). The Bayes estimator of \gamma(y) under the prior \xi is given by (4.3.2). Our goal is now to show that r(\xi, \delta^{SM}) - r(\xi, \delta^B) \to 0 as n \to \infty.


Lemma 4.3.1 Assume E(A3/2) < cc. Then E[B(r) I s, y(s)] 4 0 as n -- oc. Also, E[B(A, r) I s, y(s)] - 0 as n --+ c. Proof of Lemma 4.3.1 First show E[B(r) I s, y(s)] 4 0 as n -4 cc. This amounts to showing Nn/D,, -- 0 as n -+ cc, where

r(n+go-2)/2 r 1 ï¿½ n(g(S) _1IO)2 }]dr
N 1 (1ï¿½ + n -r)3/2 exp[- {ao + i- YnT02r
" (+ -or32 2 iES 1 +n n r

(4.3.10)

and

00 r (n+go-2)/2
Dn = I) f expI-t{ao
0 (1 + n-r2r)1/2 ex 2a

r(n+gï¿½-2)/2 r (1 + nT0r)1/2 exp-{a0+

+ D(y
iEs

9 (s))2 + n(v(s) - /1o)2 }]dr.
I + nT2r

(Yi - Y(9s)) (s) - o)2]dr. iEs 2T

Hence, defining Wn -Gamma( {ao + EiES(Yi - Y(S))2}, (n + go -1)),

< E (+ n 02Wn

< E (1 + n o W ) 1 -

1

exp 9 -W ])

-t--

1
vrT2Wn]

exp (9 (iS) _ o)2)

n T-2Wn 1/2]
1+ nT2Wn )I

An
W- (say).

Note that

D0o

(4.3.11)

(4.3.12)

(4.3.13)

-t-

Now, A, < E[(1 + nT2Wn)-1] :_ n-l(rO)-1E(W1). Note that

n1 E(Wjl) = n-1 ao + Ei8s(Y - 9(S))2 . o n+go -3

(4.3.14)

since by the law of large numbers for exchangeable sequences, ao+Zjo,(Vi-V(S))' R1 g+in-3
Again,

Bₙ = E[(n⁻¹τ₀⁻²Wₙ⁻¹ + 1)^(−1/2)] exp(−(ȳ(s) − μ₀)²/(2τ₀²))
   ≥ (n⁻¹τ₀⁻²E(Wₙ⁻¹) + 1)^(−1/2) exp(−(ȳ(s) − μ₀)²/(2τ₀²)),   (4.3.15)

using Jensen's inequality, since (x + 1)^(−1/2) is a convex function of x. Hence,

Nₙ/Dₙ ≤ Aₙ/Bₙ ≤ n⁻¹τ₀⁻²E(Wₙ⁻¹) (n⁻¹τ₀⁻²E(Wₙ⁻¹) + 1)^(1/2) exp((ȳ(s) − μ₀)²/(2τ₀²)) → 0 in probability,   (4.3.16)

since (ȳ(s) − μ₀)² = O_p(1) and n⁻¹E(Wₙ⁻¹) → 0.

Next we need to show that E[B(λ, r) | s, y(s)] → 0 in probability as n → ∞, that is, Nₙ′/Dₙ′ → 0 in probability as n → ∞, where

Nₙ′ = ∫₀^∞ ∫₀^∞ g(τ₀²λ) r^((n+g₀−2)/2) (1 + nλ⁻¹r)^(−3/2) exp[−(r/2){a₀ + Σ_(i∈s)(yᵢ − ȳ(s))² + n(ȳ(s) − μ₀)²/(1 + nλ⁻¹r)}] dr dλ   (4.3.17)

and

Dₙ′ = ∫₀^∞ ∫₀^∞ g(τ₀²λ) r^((n+g₀−2)/2) (1 + nλ⁻¹r)^(−1/2) exp[−(r/2){a₀ + Σ_(i∈s)(yᵢ − ȳ(s))² + n(ȳ(s) − μ₀)²/(1 + nλ⁻¹r)}] dr dλ.   (4.3.18)

Note that

Dₙ′ ≥ ∫₀^∞ ∫₀^∞ g(τ₀²λ) r^((n+g₀−2)/2) (1 + nλ⁻¹r)^(−1/2) exp[−(r/2){a₀ + Σ_(i∈s)(yᵢ − ȳ(s))²} − λ(ȳ(s) − μ₀)²/2] dr dλ.   (4.3.19)

Hence, defining Wₙ as before,

Nₙ′/Dₙ′ ≤ E[λ^(1/2)(1 + nλ⁻¹Wₙ)⁻¹ exp(−(λ/2)(ȳ(s) − μ₀)²)] / E[{nλ⁻¹Wₙ/(1 + nλ⁻¹Wₙ)}^(1/2) λ^(1/2) exp(−(λ/2)(ȳ(s) − μ₀)²)] = Aₙ′/Bₙ′ (say),   (4.3.20)

where the expectation is taken with respect to the joint pdf

f(λ, w) ∝ g(τ₀²λ) w^((n+g₀−3)/2) exp[−(w/2){a₀ + Σ_(i∈s)(yᵢ − ȳ(s))²}],   (4.3.21)

under which λ and Wₙ are independent, Wₙ being distributed as before. Now,

Aₙ′ ≤ E[λ^(1/2) n⁻¹λWₙ⁻¹] = n⁻¹E(λ^(3/2))E(Wₙ⁻¹) = n⁻¹E(λ^(3/2)) {a₀ + Σ_(i∈s)(yᵢ − ȳ(s))²}/(n + g₀ − 3) → 0 in probability,   (4.3.22)

if E(λ^(3/2)) < ∞. In the above, we have used the independence of Wₙ and λ. Again,

Bₙ′ = E[(n⁻¹λWₙ⁻¹ + 1)^(−1/2) λ^(1/2) exp(−(λ/2)(ȳ(s) − μ₀)²)]
    = E[E{(n⁻¹λWₙ⁻¹ + 1)^(−1/2) | λ} λ^(1/2) exp(−(λ/2)(ȳ(s) − μ₀)²)].   (4.3.23)

Since (x + 1)^(−1/2) is a convex function of x, using Jensen's inequality for conditional expectations,

E[(n⁻¹λWₙ⁻¹ + 1)^(−1/2) | λ] ≥ (n⁻¹λE(Wₙ⁻¹) + 1)^(−1/2).   (4.3.24)

Hence,

Bₙ′ ≥ E[(n⁻¹λE(Wₙ⁻¹) + 1)^(−1/2) λ^(1/2) exp(−(λ/2)(ȳ(s) − μ₀)²)]
    = E[(λ{a₀ + Σ_(i∈s)(yᵢ − ȳ(s))²}/{n(n + g₀ − 3)} + 1)^(−1/2) λ^(1/2) exp(−(λ/2)(ȳ(s) − μ₀)²)].   (4.3.25)

By the same argument as in Lemma 4.2.1, (ȳ(s) − μ₀)² converges a.s. to a random variable, say Y₀². Hence, using Fatou's lemma, it follows from (4.3.25) that

liminf_(n→∞) Bₙ′ ≥ E[λ^(1/2) exp(−(λ/2)Y₀²)].   (4.3.26)

Hence, Bₙ′ is bounded away from zero a.s. in the limit. Hence Aₙ′/Bₙ′ → 0 in probability, so that Nₙ′/Dₙ′ → 0 in probability.

We now turn to the theorem which proves the asymptotic optimality (A.O.) property of δᴹ obtained in (4.3.5).

Theorem 4.3.1 Assume E(λ^(3/2)) < ∞. Then r(ξ, δᴹ) − r(ξ, δᴮ) → 0 as n → ∞.

Proof of Theorem 4.3.1 Standard Bayesian calculations yield

r(ξ, δᴹ) − r(ξ, δᴮ) = E(δᴹ − δᴮ)²
= (1 − f)² E[{E[B(λ, r) | s, y(s)] − E[B(r) | s, y(s)]}² (ȳ(s) − μ₀)²].   (4.3.27)

By Lemma 4.3.1, E[B(λ, r) | s, y(s)] → 0 in probability and E[B(r) | s, y(s)] → 0 in probability as n → ∞. Hence, {E[B(λ, r) | s, y(s)] − E[B(r) | s, y(s)]}² → 0 in probability as n → ∞. Also, |E[B(λ, r) | s, y(s)] − E[B(r) | s, y(s)]| ≤ 1, and (ȳ(s) − μ₀)² is uniformly integrable. Hence, the rhs of (4.3.27) → 0 as n → ∞. This completes the proof of the theorem.

4.4 An Example

We illustrate the methods of Sections 4.2 and 4.3 with an analysis of data in Cochran (1977). The data set consists of the 1920 and 1930 numbers of inhabitants, in thousands, of 64 large cities in the United States. The data were obtained by taking the cities which ranked fifth to sixty-eighth in the United States in total number of inhabitants in 1920. The cities are arranged in two strata, the first containing the 16 largest cities and the second the remaining 48 cities. For our purposes, we use only the second stratum. For the complete population, we find the population mean to be 197.875 and the population variance 5580.92. We use the 1920 data to elicit the prior in our setting, so that μ₀ = 165.438 and τ₀² = 71.424. We want to estimate the average (or total) number of inhabitants in all 48 cities in 1930 based on a sample of size 16 (i.e., a 1/3 sample). For illustrative purposes, we have decided to report our analysis for one sample.
In deriving the robust Bayes estimates based on heavy-tailed prior distributions using scale mixtures of normals, we have considered a Gibbs sampler with 10 independent sequences, each with a sample of size 5000 following a burn-in sample of another 5000.
Table 4.1 provides the Bayes estimates of γ(y) and the associated posterior standard deviations for the normal, double exponential and t priors with degrees of freedom 1, 3, 5, 10 and 15, in both the known and unknown σ² cases. For the unknown σ² case, we have used a₀ = g₀ = 0 to ensure some form of diffuse gamma prior for the inverse of the variance component in our superpopulation model. Note that the naive estimate, that is, the sample mean, is 207.69.
An inspection of Table 4.1 reveals that there can be significant improvement in the estimate of γ(y) by using heavy-tailed prior distributions rather than the normal prior distribution, in the sense of closeness to γ(y). For instance, using the double exponential and the t(1), t(3), t(5), t(10) and t(15) priors, the percentage improvements over the normal are given respectively by 45.78%, 89.05%, 52.06%, 30.68%, 15.53% and 9.06% for the known σ² case. Here the percentage improvement of e₁ over e₂ is calculated by

((e₂ − truth)² − (e₁ − truth)²)/(e₂ − truth)²,

where e₁ is the robust Bayes predictor based on heavy-tailed prior distributions and e₂ is the Bayes predictor using the normal prior. Also, as one might expect, the flatter the prior, the closer the Bayes estimate is to the sample mean. In general, for most cases we have considered, the Cauchy prior (i.e., the t prior with 1 degree of freedom) leads to an estimate which is closest to the population mean.
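The percentage-improvement calculation above can be checked directly from the rounded entries of Table 4.1 and the population mean 197.875; the small discrepancy from the reported 45.78% comes from rounding in the tabled estimates.

```python
# Percentage improvement of e1 over e2, as defined above:
# ((e2 - truth)^2 - (e1 - truth)^2) / (e2 - truth)^2.
truth = 197.875     # population mean of the second stratum (1930)
e_normal = 184.31   # Bayes estimate under the normal prior (Table 4.1, known variance)
e_de = 187.89       # robust Bayes estimate under the double exponential prior

def pct_improvement(e1, e2, truth):
    return ((e2 - truth) ** 2 - (e1 - truth) ** 2) / (e2 - truth) ** 2

imp = 100 * pct_improvement(e_de, e_normal, truth)
# imp is about 45.8, in line with the reported 45.78% up to table rounding
```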

We adopt the basic approach of Gelman and Rubin (1992) to monitor the convergence of the Gibbs sampler. For θ, we simulate m = 10 independent sequences, each of length 2d = 10000, with starting points drawn from a t distribution with 2 degrees of freedom. The justification for t distributions as well as the choice of the specific parameters of this distribution are given below.

First note that from the posterior distribution of λ given s and y(s), as given in (4.2.1), we find the posterior mode, say λ̂, by using the Newton-Raphson algorithm. Also, we use ȳ(s) for yᵢ, i ∈ s̄, based on the sample. We could now very well use N[(λ̂μ₀ + Nȳ(s)/σ²)/(λ̂ + N/σ²), (λ̂ + N/σ²)⁻¹] as the starting posterior distribution for θ. But in order to start with an overdispersed distribution, as recommended by Gelman and Rubin, we take a t distribution with 2 degrees of freedom. Also, note that once the initial θ values have been generated, the rest of the procedure uses the posterior distributions as given in (i)-(iii) in Section 4.2. Similar procedures can be used for the unknown σ² case.

Next, as in Gelman and Rubin, we compute
Next, as in Gelman and Rubin, we compute:

B/5000 = the variance between the 10 sequence means θ̄ⱼ., each based on 5000 θ values; that is, B/5000 = Σⱼ₌₁¹⁰ (θ̄ⱼ. − θ̄..)²/(10 − 1), where θ̄.. = (1/10) Σⱼ₌₁¹⁰ θ̄ⱼ.;

W = the average of the 10 within-sequence variances sⱼ², each based on (5000 − 1) degrees of freedom; that is, W = (1/10) Σⱼ₌₁¹⁰ sⱼ².

Then, find

σ̂² = {(5000 − 1)/5000} W + (1/5000) B

and

V̂ = σ̂² + B/{(10)(5000)}.

Finally, find R̂ = V̂/W. If R̂ is near 1 for all scalar estimands of interest, it is reasonable to assume that the desired convergence is achieved in the Gibbs sampling algorithm (see Gelman and Rubin (1992) for the complete discussion).

The second column of Table 4.2 provides the R̂ values (the potential scale reduction factors) corresponding to the estimand θ using the Cauchy and double exponential priors, based on 10 × 5000 = 50000 simulated values. The third column provides the corresponding 97.5% quantiles, which are also equal to 1. The rightmost five columns of Table 4.2 show the simulated quantiles of the target posterior distribution of θ for each of the 4 estimates, based on 10 × 5000 = 50000 simulated values.

Table 4.1. Bayes Estimates and Associated Posterior Standard Deviations

                    Known σ²                      Unknown σ²
Priors    Bayes Estimate  Posterior SD    Bayes Estimate  Posterior SD
Normal        184.31         10.19            183.70         11.47
DE            187.89         12.16            187.09         13.14
t(1)          193.38         15.01            192.53         15.79
t(3)          188.48         12.63            187.69         13.60
t(5)          186.58         11.51            185.93         12.60
t(10)         185.41         10.75            184.68         11.91
t(15)         184.94         10.50            184.30         11.75

Table 4.2. Potential Scale Reduction and Simulated Quantiles

             Potential scale reduction             Simulated quantiles
Priors          R̂        97.5%         2.5%     25.0%    50.0%    75.0%    97.5%
Known σ²
  Cauchy       1.00       1.00        160.00   171.98   183.09   198.17   226.89
  DE           1.00       1.00        158.68   168.54   176.00   185.55   206.82
Unknown σ²
  Cauchy       1.00       1.00        157.54   170.38   181.46   196.96   226.55
  DE           1.00       1.00        156.63   167.27   174.63   184.55   206.39

CHAPTER 5
BAYESIAN ROBUSTNESS IN SMALL AREA ESTIMATION

5.1 Introduction

Small area estimation is becoming important in survey sampling due to a growing demand for reliable small area statistics from both public and private sectors. In typical small area estimation problems, there exist a large number of small areas, but the samples available from an individual area are not usually adequate to achieve accuracy at a specified level. The reason behind this is that the original survey was designed to provide specific accuracy at a much higher level of aggregation than that for small areas. This makes it necessary to "borrow strength" from related areas through implicit or explicit models that connect the small areas, in order to find more accurate estimates for a given area or, simultaneously, for several areas. Ghosh and Rao (1994) have recently surveyed the early history as well as the recent developments in small area estimation.
Like frequentist methods, Bayesian methods have also been applied very extensively to small area estimation problems. Particularly effective in this regard has been the hierarchical or empirical Bayes (HB or EB) approach, which is especially suited for a systematic connection of the small areas through models. For a general discussion of the EB or HB methodology in the small area estimation context, we may refer to Fay and Herriot (1979), Ghosh and Meeden (1986), Ghosh and Lahiri (1987), Datta and Ghosh (1991), Datta and Lahiri (1994), among others.

In this chapter, we propose an alternative Bayesian approach, namely the robust Bayes (RB) idea which has been discussed in the previous chapters in the context of a single stratum. Specifically, the HB procedure models the uncertainty in the prior information by assigning a single distribution (often noninformative or improper) to the prior parameters (usually called hyperparameters). Instead, as discussed in the earlier chapters, the RB procedure attempts to quantify the subjective information in terms of a class Γ of prior distributions.

In order to study Bayesian robustness in the context of small area estimation, we consider the following hierarchical Bayes model.

(A) Conditional on θ, β, and τ², let Y₁, ..., Y_p be independently distributed with Yᵢ ∼ N(θᵢ, Vᵢ), i = 1, ..., p, where the Vᵢ's are known positive constants;

(B) Conditional on β and τ², θ₁, ..., θ_p are independently distributed with θᵢ ∼ N(xᵢᵀβ, τ²), i = 1, ..., p, where x₁, ..., x_p are known regression vectors of dimension s and β is s×1;

(C) β ∼ uniform(Rˢ) and τ² is assumed to be independent of β, having a distribution h(τ²) which belongs to a certain class of distributions Γ.

We shall use the notations Y = (Y₁, ..., Y_p)ᵀ, θ = (θ₁, ..., θ_p)ᵀ, X = (x₁, ..., x_p)ᵀ. Write G = Diag{V₁, ..., V_p} and assume rank(X) = s. Cano (1993) considered a special case of this model when xᵢ = 1 and Vᵢ = V for i = 1, ..., p.
The outline of the remaining sections is as follows. In Section 5.2, we choose Γ to be the ε-contamination class of priors, where the contamination class includes all unimodal distributions. We develop the robust hierarchical Bayes estimators of the small area means and the associated measures of accuracy (i.e., the posterior variances) based on type-II maximum likelihood (ML-II) priors. Also, we provide the range in which the small area means lie under the ε-contamination class.

In Section 5.3, we choose Γ to be the density ratio class of priors. As suggested by Wasserman and Kadane (1992), we use Gibbs sampling to compute bounds on posterior expectations over the density ratio class.

In Section 5.4, we choose Γ to be the class of uniform priors on τ² with τ₁² ≤ τ² ≤ τ₂². We are interested in the sensitivity analysis of the posterior quantities over Γ.

Finally, Section 5.5 contains the analysis of the real data to illustrate the results of the preceding sections.

5.2 ε-Contamination Class

In this section, we consider the class Γ of priors of the form

Γ = {h : h = (1 − ε)h₀ + εq, q ∈ Q},   (5.2.1)

where 0 < ε < 1 is given, h₀ is the inverse gamma distribution with pdf

h₀(τ²) ∝ (τ²)^(−(β₀+1)) exp(−α₀/τ²) I₍₀,∞₎(τ²),   (5.2.2)

denoted by IG(α₀, β₀), and Q is the class of all unimodal distributions with the same mode τ₀² as that of h₀.
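The ML-II construction below requires the mode τ₀² of h₀. Setting the derivative of the log of the IG(α₀, β₀) density in (5.2.2) to zero gives the mode α₀/(β₀ + 1); a quick numerical sanity check, using a simple grid search:

```python
import numpy as np

def ig_logpdf(t, a0, b0):
    # log of h0(tau^2), proportional to (tau^2)^{-(b0+1)} exp(-a0 / tau^2), per (5.2.2)
    return -(b0 + 1) * np.log(t) - a0 / t

a0, b0 = 1.0, 10.0
grid = np.linspace(1e-4, 1.0, 200000)
mode_numeric = grid[np.argmax(ig_logpdf(grid, a0, b0))]
mode_closed = a0 / (b0 + 1)   # the tau_0^2 used throughout Section 5.2
```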

The joint (improper) pdf of Y, θ, β and τ² is given by

f(y, θ, β, τ²) ∝ exp[−½(y − θ)ᵀG⁻¹(y − θ)] (τ²)^(−p/2) exp[−(1/(2τ²))‖θ − Xβ‖²] × {(1 − ε)h₀(τ²) + εq(τ²)}.   (5.2.3)

Integrating with respect to β in (5.2.3), one finds the joint (improper) pdf of Y, θ, and τ², given by

f(y, θ, τ²) ∝ (τ²)^(−(p−s)/2) exp[−½(y − θ)ᵀG⁻¹(y − θ) − (1/(2τ²))θᵀ(I_p − X(XᵀX)⁻¹Xᵀ)θ] × {(1 − ε)h₀(τ²) + εq(τ²)}.   (5.2.4)

Write Σ⁻¹ = G⁻¹ + (τ²)⁻¹(I_p − X(XᵀX)⁻¹Xᵀ). Then, one can write

(y − θ)ᵀG⁻¹(y − θ) + (τ²)⁻¹θᵀ(I_p − X(XᵀX)⁻¹Xᵀ)θ
= θᵀΣ⁻¹θ − 2θᵀG⁻¹y + yᵀG⁻¹y
= (θ − ΣG⁻¹y)ᵀΣ⁻¹(θ − ΣG⁻¹y) + yᵀ(G⁻¹ − G⁻¹ΣG⁻¹)y.   (5.2.5)

From (5.2.4) and (5.2.5), we have

E[θ | y, τ²] = ΣG⁻¹y;  V[θ | y, τ²] = Σ.   (5.2.6)

Using (5.2.5) and integrating out θ in (5.2.4), one gets the joint (improper) pdf of Y and τ², given by

f(y, τ²) ∝ |Σ|^(1/2) (τ²)^(−(p−s)/2) exp[−½yᵀ(G⁻¹ − G⁻¹ΣG⁻¹)y] {(1 − ε)h₀(τ²) + εq(τ²)}.   (5.2.7)

We denote by m(y|h) the marginal distribution of y with respect to the prior h, namely

m(y|h) = ∫ f(y|τ²) h(dτ²).   (5.2.8)

For h ∈ Γ, we get

m(y|h) = (1 − ε)m(y|h₀) + εm(y|q).   (5.2.9)

Our objective is to choose the ML-II prior ĥ which maximizes m(y|h) over Γ. This amounts to maximization of m(y|q) over q ∈ Q. Using the representation of each q ∈ Q as a mixture of uniform densities, the ML-II prior is given by

ĥ(τ²) = (1 − ε)h₀(τ²) + εq̂(τ²),   (5.2.10)

where q̂ is uniform(τ₀², τ₀² + ẑ), ẑ being the solution of the equation

f(y | τ₀² + z) = (1/z) ∫ from τ₀² to τ₀²+z of f(y|τ²) dτ²,   (5.2.11)

and τ₀² is the unique mode of h₀(τ²). Note that

f(y|τ²) ∝ |Σ|^(1/2) (τ²)^(−(p−s)/2) exp[−½yᵀ(G⁻¹ − G⁻¹ΣG⁻¹)y].   (5.2.12)

Write uᵢ = Vᵢ/(Vᵢ + τ²), i = 1, ..., p, and D = Diag{1 − u₁, ..., 1 − u_p}. Then, on simplification, it follows that

Σ = τ²(I_p − D) + τ²(I_p − D)X(XᵀDX)⁻¹Xᵀ(I_p − D);   (5.2.13)

ΣG⁻¹ = D + (I_p − D)X(XᵀDX)⁻¹XᵀD;   (5.2.14)

ΣG⁻¹y = [(1 − u₁)y₁ + u₁x₁ᵀβ̃, ..., (1 − u_p)y_p + u_px_pᵀβ̃]ᵀ,   (5.2.15)

where β̃ = (XᵀDX)⁻¹(XᵀDy). Then

G⁻¹ − G⁻¹ΣG⁻¹ = (τ²)⁻¹[D − DX(XᵀDX)⁻¹XᵀD].   (5.2.16)

Hence,

yᵀ(G⁻¹ − G⁻¹ΣG⁻¹)y = (τ²)⁻¹[Σᵢ₌₁ᵖ(1 − uᵢ)yᵢ² − (Σᵢ₌₁ᵖ(1 − uᵢ)yᵢxᵢ)ᵀ(XᵀDX)⁻¹(Σᵢ₌₁ᵖ(1 − uᵢ)yᵢxᵢ)] = Q_(τ²)(y) (say).   (5.2.17)

Combining (5.2.12), (5.2.16) and (5.2.17), we can write

f(y|τ²) ∝ |Σ|^(1/2) (τ²)^(−(p−s)/2) exp[−½Q_(τ²)(y)].   (5.2.18)

Writing F = G⁻¹ + (τ²)⁻¹I_p, and using Exercise 2.4, p. 32 of Rao (1973), one gets

|Σ⁻¹| = det[F  X; Xᵀ  τ²(XᵀX)] / |τ²(XᵀX)| = |F| |τ²(XᵀX) − XᵀF⁻¹X| / |τ²(XᵀX)| ∝ (τ²)^(−p) {Πᵢ₌₁ᵖ(τ² + Vᵢ)} |XᵀDX|.   (5.2.19)

It is clear from (5.2.18) and (5.2.19) that

f(y|τ²) ∝ (τ²)^(s/2) {Πᵢ₌₁ᵖ(τ² + Vᵢ)^(−1/2)} |XᵀDX|^(−1/2) exp[−½Q_(τ²)(y)].   (5.2.20)
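The componentwise shrinkage form (5.2.15) of E[θ|y, τ²] can be verified numerically against the direct matrix formula ΣG⁻¹y; the dimensions and data below are arbitrary stand-ins.

```python
import numpy as np

rng = np.random.default_rng(3)
p, s, tau2 = 8, 2, 4.0
V = rng.uniform(0.5, 3.0, p)   # known sampling variances
X = rng.normal(size=(p, s))
y = rng.normal(size=p)

# Direct formula: Sigma^{-1} = G^{-1} + tau^{-2}(I - X(X'X)^{-1}X'),
# and E[theta | y, tau^2] = Sigma G^{-1} y, per (5.2.6).
G_inv = np.diag(1.0 / V)
P = X @ np.linalg.solve(X.T @ X, X.T)
Sigma = np.linalg.inv(G_inv + (np.eye(p) - P) / tau2)
direct = Sigma @ G_inv @ y

# Componentwise form (5.2.14)-(5.2.15): u_i = V_i/(V_i + tau^2), D = diag(1 - u),
# beta_tilde = (X'DX)^{-1} X'Dy, E[theta_i | y, tau^2] = (1 - u_i) y_i + u_i x_i' beta_tilde.
u = V / (V + tau2)
D = np.diag(1.0 - u)
beta_tilde = np.linalg.solve(X.T @ D @ X, X.T @ D @ y)
componentwise = (1.0 - u) * y + u * (X @ beta_tilde)
```

The two vectors agree to machine precision, confirming the simplification.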

Now, to find the solution ẑ of equation (5.2.11), we consider

z f(y|z) = ∫ from τ₀² to τ₀²+z of f(y|τ²) dτ².   (5.2.21)

By differentiating both sides with respect to z, we get

f(y|z) + z (d/dz) f(y|z) = f(y | τ₀² + z).   (5.2.22)

By Lemma A.4.4, Lemma A.4.5 and Theorem A.4.6 in Anderson (1984), recall that for an s × s matrix A = (aᵢⱼ),

(d/dz) A⁻¹ = −A⁻¹ ((d/dz) A) A⁻¹   (5.2.23)

and

(d/dz) |A| = Σᵢ₌₁ˢ Σⱼ₌₁ˢ Aᵢⱼ (d/dz) aᵢⱼ,   (5.2.24)

where Aᵢⱼ = ∂|A|/∂aᵢⱼ is the cofactor of aᵢⱼ. Write A = Σᵢ₌₁ᵖ {z/(z + Vᵢ)} xᵢxᵢᵀ. Then, after some calculations using (5.2.23) and (5.2.24), it can be shown that

(d/dz) f(y|z) = f(y|z) [s/(2z) − ½Σᵢ₌₁ᵖ(z + Vᵢ)⁻¹ − ½ tr(A⁻¹ dA/dz) − ½ dH(z)/dz],   (5.2.25)

where

H(z) = Σᵢ₌₁ᵖ yᵢ²/(z + Vᵢ) − z (Σᵢ₌₁ᵖ yᵢxᵢ/(z + Vᵢ))ᵀ A⁻¹ (Σᵢ₌₁ᵖ yᵢxᵢ/(z + Vᵢ)).   (5.2.26)

Using (5.2.25), (5.2.22) leads to

z^(s/2) {Πᵢ₌₁ᵖ(z + Vᵢ)^(−1/2)} |A|^(−1/2) exp{−½H(z)} × [1 + s/2 − (z/2)Σᵢ₌₁ᵖ(z + Vᵢ)⁻¹ − (z/2) tr(A⁻¹ dA/dz) − (z/2) dH(z)/dz]
= (z + τ₀²)^(s/2) {Πᵢ₌₁ᵖ(z + τ₀² + Vᵢ)^(−1/2)} |Σᵢ₌₁ᵖ {(z + τ₀²)/(z + τ₀² + Vᵢ)} xᵢxᵢᵀ|^(−1/2) exp{−½H(z + τ₀²)}.   (5.2.27)

Now we have the following HB model based on the ML-II prior:

(I) Conditional on θ, β, and τ², Yᵢ ∼ N(θᵢ, Vᵢ) independently, i = 1, ..., p, where the Vᵢ's are known positive constants;

(II) Conditional on β and τ², θᵢ ∼ N(xᵢᵀβ, τ²) independently, i = 1, ..., p, where x₁, ..., x_p are known regression vectors of dimension s and β is s×1;

(III) Marginally, β and τ² are mutually independent with β ∼ uniform(Rˢ) and τ² having density ĥ(τ²) = (1 − ε)h₀(τ²) + εq̂(τ²), where h₀(τ²) is the IG(α₀, β₀) density, with α₀ > 0 and β₀ > 0, and q̂ is uniform(τ₀², τ₀² + ẑ), with ẑ being the solution of (5.2.27) and τ₀² = α₀/(β₀ + 1).

It is clear from (5.2.20) that

ĥ(τ²|y) ∝ (τ²)^(s/2) {Πᵢ₌₁ᵖ(τ² + Vᵢ)^(−1/2)} |XᵀDX|^(−1/2) exp[−½Q_(τ²)(y)] {(1 − ε)h₀(τ²) + εq̂(τ²)}.   (5.2.28)

Now writing Uᵢ = Vᵢ/(τ² + Vᵢ) (i = 1, ..., p), using (5.2.28), and the iterated formulas for conditional expectations and variances, one gets

E[θᵢ|y] = E[E(θᵢ|y, τ²)|y] = E[(1 − Uᵢ)yᵢ + Uᵢxᵢᵀβ̃ | y]   (5.2.29)

and

V[θᵢ|y] = V[E(θᵢ|y, τ²)|y] + E[V(θᵢ|y, τ²)|y]
= V[(1 − Uᵢ)yᵢ + Uᵢxᵢᵀβ̃ | y] + E[τ²Uᵢ + τ²Uᵢ²xᵢᵀ(XᵀDX)⁻¹xᵢ | y]
= V[Uᵢ(yᵢ − xᵢᵀβ̃) | y] + E[Vᵢ(1 − Uᵢ) + VᵢUᵢ(1 − Uᵢ)xᵢᵀ(XᵀDX)⁻¹xᵢ | y].   (5.2.30)

Thus, the posterior distribution of θ under the ML-II prior is obtained using (5.2.4) and (5.2.28). In addition, one uses (5.2.29) and (5.2.30) to find the posterior means and variances of the θᵢ under this prior. Similarly, by using the iterated formulas, posterior covariances may be obtained as well.
Next, we consider the problem of finding the range of the posterior mean of θᵢ over Γ in (5.2.1). Using the expression h(τ²|y) ∝ f(y|τ²)h(τ²), we have

E[θᵢ|y] = yᵢ − E[Uᵢ(yᵢ − xᵢᵀβ̃)|y] = yᵢ − {∫₀^∞ uᵢ(yᵢ − xᵢᵀβ̃) f(y|τ²)h(τ²) dτ²} / {∫₀^∞ f(y|τ²)h(τ²) dτ²}.   (5.2.31)

Simple modifications of the arguments of Sivaganesan and Berger (1989) or Cano (1993) lead to the following result:

sup_(h∈Γ) (inf) E[θᵢ|y] = sup_z (inf_z) [A + (ε/z) ∫ from τ₀² to τ₀²+z of {yᵢ − uᵢ(yᵢ − xᵢᵀβ̃)} f(y|τ²) dτ²] / [B + (ε/z) ∫ from τ₀² to τ₀²+z of f(y|τ²) dτ²],   (5.2.32)

where

B = (1 − ε) ∫₀^∞ f(y|τ²)h₀(τ²) dτ²   (5.2.33)

and

A = yᵢB − (1 − ε) ∫₀^∞ uᵢ(yᵢ − xᵢᵀβ̃) f(y|τ²)h₀(τ²) dτ².   (5.2.34)

The above sup (inf) can be obtained by numerical optimization. These formulas will be used in Section 5.5 in the context of estimation of median income of four-person families.

5.3 Density Ratio Class

In this section we consider a class of priors, introduced by DeRobertis and Hartigan (1981) and called a density ratio class by Berger (1990),

Γᴿ = {h : h(τ²)/h(τ′²) ≤ u(τ²)/l(τ′²) for all τ² and τ′²},   (5.3.1)

where l and u are two bounded nonnegative functions such that l(τ²) ≤ u(τ²) for all τ². This class can be viewed as specifying ranges for the ratios of the prior density between any two points. By taking u = kh₀ and l = h₀, we have the very interesting subclass

Γₖᴿ(h₀) = {h : h(τ²)/h(τ′²) ≤ k h₀(τ²)/h₀(τ′²) for all τ² and τ′²},   (5.3.2)

where k > 1 is a constant. This class may be thought of as a neighborhood around the target prior h₀. The interpretation is that the odds of any pair of points are not misspecified by more than a factor of k. This prior is especially useful when h₀ is a default prior chosen mainly for convenience.
Because of the expression h(τ²|y) ∝ f(y|τ²)h(τ²), we can view our problem as having just the parameter τ², with h the prior and f(y|τ²) the likelihood. Since

E[θᵢ|y, τ²] = yᵢ − {Vᵢ/(τ² + Vᵢ)} [yᵢ − xᵢᵀ(Σⱼ₌₁ᵖ {τ²/(τ² + Vⱼ)} xⱼxⱼᵀ)⁻¹ (Σⱼ₌₁ᵖ {τ²/(τ² + Vⱼ)} yⱼxⱼ)],   (5.3.3)

our problem reduces to finding

sup_(h∈Γₖᴿ(h₀)) (inf) E[b(τ²)|y],   (5.3.4)

where b(τ²) denotes the right-hand side of (5.3.3).

Wasserman and Kadane (1992) have developed a Monte Carlo technique which can be used to bound posterior expectations over the density ratio class. Wasserman (1992) has shown that the set of posteriors obtained by applying Bayes' theorem to the density ratio class Γₖᴿ(h₀) is the density ratio class Γₖᴿ(h₀ʸ), where h₀ʸ(τ²) = h₀(τ²|y) is the posterior corresponding to h₀. To see this, all we need do is write

Γₖᴿ(h₀ʸ) = {h_y : h_y(τ²)/h_y(τ′²) ≤ k h₀ʸ(τ²)/h₀ʸ(τ′²) for all τ² and τ′²},   (5.3.5)

and observe that h(τ²|y)/h(τ′²|y) = {f(y|τ²)h(τ²)}/{f(y|τ′²)h(τ′²)} and h₀ʸ(τ²)/h₀ʸ(τ′²) = {f(y|τ²)h₀(τ²)}/{f(y|τ′²)h₀(τ′²)}, so that h ∈ Γₖᴿ(h₀) is equivalent to h_y ∈ Γₖᴿ(h₀ʸ), where h_y(τ²) = h(τ²|y). Wasserman (1992) calls this the Bayes invariance property. Hence, to bound the posterior expectation of b(τ²), we need only bound the expectation of b(τ²) over Γₖᴿ(h₀ʸ). To do so, we will need to draw a sample from the posterior h₀ʸ. Following Wasserman and Kadane (1992), we can rely on recent sampling-based methods for Bayesian inference. Note in this case that

h₀(τ²|y) ∝ (τ²)^(s/2) {Πᵢ₌₁ᵖ(τ² + Vᵢ)^(−1/2)} |XᵀDX|^(−1/2) exp[−½Q_(τ²)(y)] h₀(τ²).   (5.3.6)

Let τ₁², ..., τ_N² be a random sample from h₀(τ²|y). Let bᵢ = b(τᵢ²), i = 1, ..., N, and let b₍₁₎ ≤ b₍₂₎ ≤ ... ≤ b₍N₎ be the corresponding order statistics. Also, let cᵢ = −b(τᵢ²), i = 1, ..., N, and let c₍₁₎ ≤ c₍₂₎ ≤ ... ≤ c₍N₎ be the corresponding order statistics. Let b̄ = Σᵢ₌₁ᴺ bᵢ/N and c̄ = Σᵢ₌₁ᴺ cᵢ/N. Then, following Wasserman and Kadane (1992), we get

sup_(h∈Γₖᴿ(h₀)) E[b(τ²)|y] = sup_(h_y∈Γₖᴿ(h₀ʸ)) E[b(τ²)|y] = max_(1≤i≤N) {(1 − i/N)Δ + 1}⁻¹ {Δ Σⱼ₌ᵢ₊₁ᴺ b₍ⱼ₎/N + b̄}   (5.3.7)

and

inf_(h∈Γₖᴿ(h₀)) E[b(τ²)|y] = inf_(h_y∈Γₖᴿ(h₀ʸ)) E[b(τ²)|y] = −max_(1≤i≤N) {(1 − i/N)Δ + 1}⁻¹ {Δ Σⱼ₌ᵢ₊₁ᴺ c₍ⱼ₎/N + c̄},   (5.3.8)

where Δ = k − 1. This gives the posterior bounds for a given k.
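Given a posterior sample, the bounds (5.3.7)-(5.3.8) are straightforward to compute; a minimal sketch, with `wk_sup` and `wk_inf` as our own helper names (the index i is allowed to run from 0 to N for convenience, which only adds the harmless candidate value b̄). Note that when k = 1 (Δ = 0) both bounds collapse to the ordinary posterior mean.

```python
import numpy as np

def wk_sup(b, k):
    """Upper bound on E[b(tau^2) | y] over the density ratio class, per (5.3.7)."""
    b = np.asarray(b, float)
    N = len(b)
    delta = k - 1.0
    b_sorted = np.sort(b)
    bbar = b_sorted.mean()
    # tail[i] = sum_{j = i+1}^{N} b_(j) / N, for i = 0, ..., N
    tail = np.concatenate([np.cumsum(b_sorted[::-1])[::-1], [0.0]]) / N
    i = np.arange(N + 1)
    return np.max((delta * tail + bbar) / ((1.0 - i / N) * delta + 1.0))

def wk_inf(b, k):
    # per (5.3.8), with c_i = -b_i
    return -wk_sup(-np.asarray(b, float), k)

rng = np.random.default_rng(7)
b = rng.normal(size=1000)            # stand-in for b(tau_i^2) evaluated on posterior draws
lo, hi = wk_inf(b, 2.0), wk_sup(b, 2.0)
```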

To generate the sample from h₀ʸ, we use Gibbs sampling. The Gibbs sampling analysis is based on the following posterior distributions:

(i) β | y, θ, z ∼ N_s((XᵀX)⁻¹Xᵀθ, z⁻¹(XᵀX)⁻¹);

(ii) θ₁, ..., θ_p | y, β, z ∼ind N((1 − Bᵢ)yᵢ + Bᵢxᵢᵀβ, z⁻¹Bᵢ);

(iii) τ² | y, β, θ ∼ IG(α₀ + ½Σᵢ₌₁ᵖ(θᵢ − xᵢᵀβ)², ½p + β₀),

where z = (τ²)⁻¹ and Bᵢ = z/(z + ξᵢ), with ξᵢ = 1/Vᵢ, i = 1, ..., p.

Note that our target is to draw samples from h₀(τ²|y). However, as is well known in the Gibbs sampling literature, the histogram of the samples (τ²)⁽ᵗ⁾, t = 1, 2, ..., q, drawn from the conditionals [τ² | y, β, θ] converges to the distribution [τ² | y] as q → ∞. This, along with (5.3.7) and (5.3.8), facilitates the computation of upper and lower bounds of E[b(τ²)|y].
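The conditionals (i)-(iii) above can be sketched as one Gibbs loop in Python; the dimensions, data and hyperparameters below are hypothetical stand-ins, not the median-income data. Recall that if τ² ∼ IG(a, b) in the parameterization of (5.2.2), then z = 1/τ² ∼ Gamma with shape b and rate a.

```python
import numpy as np

rng = np.random.default_rng(5)
p, s = 10, 2
V = rng.uniform(0.5, 2.0, p)       # known sampling variances
X = rng.normal(size=(p, s))
y = rng.normal(size=p)
a0, b0 = 1.0, 10.0                 # IG(alpha0, beta0) hyperparameters

theta, z = y.copy(), 1.0           # z = 1 / tau^2
draws = []
for t in range(3000):
    # (i) beta | y, theta, z ~ N_s((X'X)^{-1} X' theta, z^{-1}(X'X)^{-1})
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = rng.multivariate_normal(XtX_inv @ X.T @ theta, XtX_inv / z)
    # (ii) theta_i | y, beta, z ~ N((1-B_i) y_i + B_i x_i' beta, z^{-1} B_i),
    #      with B_i = z / (z + 1/V_i)
    B = z / (z + 1.0 / V)
    theta = rng.normal((1 - B) * y + B * (X @ beta), np.sqrt(B / z))
    # (iii) tau^2 | y, beta, theta ~ IG(a0 + sum(theta_i - x_i'beta)^2 / 2, p/2 + b0),
    #       i.e. z = 1/tau^2 ~ Gamma(shape p/2 + b0, rate a0 + ...)
    rate = a0 + 0.5 * np.sum((theta - X @ beta) ** 2)
    z = rng.gamma(p / 2.0 + b0, 1.0 / rate)
    if t >= 1000:                  # discard a burn-in sample
        draws.append(1.0 / z)

tau2_draws = np.asarray(draws)     # approximate sample from h0(tau^2 | y)
```

The resulting `tau2_draws` would then be fed into the bound formulas (5.3.7)-(5.3.8).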

5.4 Class of Uniform Priors

In the usual HB model in our setting, one might use a diffuse prior on τ². Instead, we consider the class of uniform priors on τ² with constraints of the form

Γ = {h : h is uniform(τ₁², τ₂²)},   (5.4.1)

where τ₁² and τ₂² are arbitrary nonnegative numbers such that τ₁² < τ₂². This class of priors is attractive because of its mathematical convenience, and indeed gives a good enough class of priors for an honest robustness check. Classes of conjugate priors having parameters in certain ranges have been studied by Leamer (1982) and Polasek (1985).

From (5.2.6) and (5.2.15), we have

sup_(τ₁²≤τ²≤τ₂²) (inf) E[θᵢ|y, τ²] = yᵢ − inf_(τ₁²≤τ²≤τ₂²) (sup) {Vᵢ/(τ² + Vᵢ)} [yᵢ − xᵢᵀ(Σⱼ₌₁ᵖ {τ²/(τ² + Vⱼ)} xⱼxⱼᵀ)⁻¹ (Σⱼ₌₁ᵖ {τ²/(τ² + Vⱼ)} yⱼxⱼ)].   (5.4.2)

Hence, we can find the range of the posterior quantity E[θᵢ|y, τ²] over Γ in (5.4.1), and investigate the sensitivity with different choices of τ₁² and τ₂².

5.5 An Example

In this section we illustrate the methods suggested in the preceding sections with an analysis of a real data set. The data set is related to the estimation of median income for four-person families by states. To start with, we briefly give the background of this problem.

The U.S. Department of Health and Human Services (HHS) administers a program of energy assistance to low-income families. Eligibility for the program is determined by a formula where the most important variable is an estimate of the current median income for four-person families by states (the 50 states and the District of Columbia).

The Bureau of the Census has provided such estimates to the HHS annually since the latter part of the 1970s, using a linear regression methodology. The basic source of the data is the annual demographic supplement to the March sample of the Current Population Survey (CPS), which provides median income for four-person families for

the preceding year. In addition, once every ten years, similar figures are obtained from the decennial census for the calendar year immediately preceding the census year. The latter set of estimates is believed to be much more accurate and serve as the "gold standard" against which other estimates need to be tested.

Direct use of the CPS estimates was found undesirable because of the large coefficient of variation associated with them (cf. Fay (1987)). The regression method used by the Bureau of the Census was intended to rectify this drawback. The method used since 1985 is as follows. The current year CPS estimates of statewide median incomes of four-person families are used as dependent variables in a linear regression. In addition to the intercept term, the regression equation uses as independent variables the base-year census median(b) and the adjusted census median(c), where census median(b) represents the median income of four-person families in the state for the base year b from the most recently available decennial census. The adjusted census median(c) is obtained from the formula

Adjusted census median(c)= [BEA PCI(c)/BEA PCI(b)]

x Census median(b). (5.5.1)

In the above BEA PCI(c) and BEA PCI(b) represent respectively estimates of percapita income produced by the Bureau of Economic Analysis of the U.S. Department of Commerce for the current year c, and the base year b, respectively. Thus (5.5.1) attempts to adjust the base year census median by the proportional growth in the BEA PCI to arrive at the current year adjusted median. The inclusion of the census median(b) as a second independent variable is to adjust for any possible overstatement of the effect of change in BEA PCI upon the median income of four-person families.

Finally, a weighted average of the CPS sample estimate of the current median income and the regression estimates is obtained, weighting the sample estimates inversely proportional to the sampling variances and the regression estimates inversely

proportional to the model error due to the lack of fit for the census values of 1979 by the model with 1969 used as the base year.

The data consist of Yᵢ, the sample median income of four-person families in state i, and the associated variance Vᵢ (i = 1, ..., 51). The true medians corresponding to the Yᵢ's are denoted by the θᵢ's, respectively. The design matrix X = (x₁, ..., x₅₁)ᵀ is of the form

xᵢᵀ = (1, xᵢ₁, xᵢ₂), (i = 1, ..., 51),   (5.5.2)

where xᵢ₁ and xᵢ₂ denote respectively the adjusted census median income and the base-year census median income for four-person families in the ith state.

We consider the HB model as given in Section 5.1. We choose Γ to be the ε-contamination class of priors, where the contamination class includes all unimodal distributions. We find RB estimates of statewide median incomes of four-person families for 1989, using 1979 as the base year, based on ML-II priors. In finding the RB estimates, we have used (5.2.29) and (5.2.30) with α₀ = 1, β₀ = 10 and ε = .1. The alternatives in our setting are HB and EB estimates of θᵢ. For the HB analysis, we use a diffuse prior on τ² instead of using a class of priors. The EB method is due to Morris (1983b), which uses estimates of τ² and β rather than assigning any prior distribution to the hyperparameters. Since, in addition to the sample estimates, the 1989 incomes are also available from the 1990 census data, we compare all three Bayes estimates against the 1990 census figures, treating the latter as the "truth". Table 5.1 provides the "truth", the CPS estimates and the three Bayes estimates, whereas Table 5.2 provides the standard errors associated with the three Bayes estimates.
Now these estimates are compared according to several criteria. Suppose e_iTR denotes the true median income for the ith state, and eᵢ any estimate of e_iTR. Then for the estimate e = (e₁, ..., e₅₁)ᵀ of e_TR = (e_1TR, ..., e_51TR)ᵀ, let

Average Relative Bias = (51)⁻¹ Σᵢ₌₁⁵¹ |eᵢ − e_iTR| / e_iTR;

Average Squared Relative Bias = (51)⁻¹ Σᵢ₌₁⁵¹ (eᵢ − e_iTR)² / e_iTR²;

Average Absolute Bias = (51)⁻¹ Σᵢ₌₁⁵¹ |eᵢ − e_iTR|;

Average Squared Deviation = (51)⁻¹ Σᵢ₌₁⁵¹ (eᵢ − e_iTR)².

The above quantities are provided in Table 5.3 for all three Bayes estimates.
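The four criteria above are simple to compute; a sketch, with made-up numbers rather than the actual 51-state data, and with the squared relative bias taken relative to e_iTR² (the transcription of that display is ambiguous on the denominator):

```python
import numpy as np

def comparison_criteria(e, e_true):
    """The four comparison criteria, for estimates e against the census 'truth' e_true."""
    e, e_true = np.asarray(e, float), np.asarray(e_true, float)
    return {
        "avg_relative_bias": np.mean(np.abs(e - e_true) / e_true),
        "avg_squared_relative_bias": np.mean((e - e_true) ** 2 / e_true ** 2),
        "avg_absolute_bias": np.mean(np.abs(e - e_true)),
        "avg_squared_deviation": np.mean((e - e_true) ** 2),
    }

# Hypothetical two-state check:
crit = comparison_criteria([10.0, 20.0], [10.0, 25.0])
```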

From Table 5.1, one can see that the three Bayes estimates of the small area means are quite close to each other. But from Table 5.2, it appears that the estimated standard errors given by the RB method are larger than those given by the HB or EB methods. Table 5.3 indicates that the EB estimates are the best and the HB estimates are the second best under all criteria for this data set. As anticipated, also, the HB standard errors are slightly bigger than the EB standard errors. As is well known, this phenomenon can be explained by the fact that, unlike the EB estimators, the HB estimators take into account the uncertainty in estimating the prior parameters.

Next, we find the ranges of the posterior means for the small areas under the ε-contamination class. Table 5.4a and Table 5.4b provide the ranges of posterior means under the ε-contamination class when (α₀, β₀) = (1, 10) and (α₀, β₀) = (7, 3), respectively. For (α₀, β₀) = (1, 10), the ranges are fairly small, and robustness seems to be achieved using this class for all ε values. But for (α₀, β₀) = (7, 3), the ranges are relatively wider in comparison with (α₀, β₀) = (1, 10). As one might expect, the choice of h₀, that is, the elicited inverse gamma prior, seems to have some effect on the ranges of the posterior means for the ε-contamination class. Note that the inverse gamma prior of τ² with (α₀, β₀) = (1, 10) has coefficient of variation 1/√8, compared to 1 with (α₀, β₀) = (7, 3). Although the two priors have very similar tails, the former is much flatter than the latter even for small values of τ². This suggests that the bigger the coefficient of variation of the assumed inverse gamma prior, the wider is the range of the posterior means of the θᵢ's.

Figure 5.1. The IG(1, 10) Prior
We also find the ranges of the posterior means under density ratio classes. In computing the bounds, we have considered a Gibbs sampler with 10 independent sequences, each with a sample of size 1000 following a burn-in sample of another 1000. We adopt the basic approach of Gelman and Rubin (1992) to monitor the convergence of the Gibbs sampler. We have obtained the R̂ values (the potential scale reduction factors) corresponding to the estimands θᵢ based on 10 × 1000 = 10000 simulated values. The fact that all the point estimates R̂ are equal to 1, as well as the near equality of these point estimates and the corresponding 97.5% quantiles, suggests that convergence is achieved in the Gibbs sampler. Table 5.5a and Table 5.5b provide the ranges of the posterior means under density ratio classes when (α₀, β₀) = (1, 10) and (α₀, β₀) = (7, 3), respectively. Note, however, that here the bounds given for IG(7, 3) are much closer than the ones for IG(1, 10). The intuitive explanation for this phenomenon is that while the ratio h₀(τ²)/h₀(τ′²) can be extremely large for certain choices (τ², τ′²) corresponding to the IG(1, 10) prior, the ratio is more under control for the IG(7, 3) prior. These density ratio classes are very convenient for representing vague prior knowledge, and robustness seems to be achieved using the IG(7, 3) prior.

Figure 5.2. The IG(7, 3) Prior

Finally, we consider the ranges of the posterior quantities E[θᵢ|y, τ²] over the class of uniform priors. From Table 5.6, we can see that the ranges are not sensitive to the choice of the upper bound of τ². Also, for most of the states, the ranges of the posterior means do not seem to be too wide.

Full Text

PAGE 1

ROBUST BAYESIAN INFERENCE IN FINITE POPULATION SAMPLING By DAL-HO KIM A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 1994

PAGE 2

To my parents and teachers

PAGE 3

ACKNOWLEDGEMENTS I would like to express my sincere gratitude to Professor Malay Ghosh for being my advisor and for originally proposing the problem. Words cannot simply express how grateful I am for his patience, encouragement and invaluable guidance. Without his help it would not have been possible to complete the work. I would like to thank Professors Michael A. DeLorenzo, Alan Agresti, James G. Booth, Brett D. Presnell and Jeffrey A. Longmate for their encouragement and advice while serving on my committee. Much gratitude is owed to my parents, brother, sisters, mother-in-law, brotherin-law and sisters-in-law, whose support, advice, guidence and prayers throughout the years of my life have made this achievement possible. My debt to them can never be repaid in full. A very special thanks are offered to my wife, Kyungok, for her love, patience, support and prayers, and my daughters, Seunghyun and Grace, for being a joy to us.


TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
CHAPTERS
1 INTRODUCTION
1.1 Literature Review
1.2 The Subject of This Dissertation
2 ROBUST BAYES ESTIMATION OF THE FINITE POPULATION MEAN
2.1 Introduction
2.2 Robust Bayes Estimators
2.3 Symmetric Unimodal Contamination
2.4 An Example
3 ROBUST BAYES COMPETITORS OF THE RATIO ESTIMATOR
3.1 Introduction
3.2 ε-Contamination Model and the ML-II Prior
3.3 Symmetric Unimodal Contamination
3.4 An Example
4 BAYESIAN ANALYSIS UNDER HEAVY-TAILED PRIORS
4.1 Introduction
4.2 Known Variance
4.3 Unknown Variance
4.4 An Example
5 BAYESIAN ROBUSTNESS IN SMALL AREA ESTIMATION
5.1 Introduction
5.2 ε-Contamination Class


5.3 Density Ratio Class
5.4 Class of Uniform Priors
5.5 An Example
6 SUMMARY AND FUTURE RESEARCH
6.1 Summary
6.2 Future Research
BIBLIOGRAPHY
BIOGRAPHICAL SKETCH


Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

ROBUST BAYESIAN INFERENCE IN FINITE POPULATION SAMPLING

By Dal-Ho Kim, August 1994
Chairman: Malay Ghosh
Major Department: Statistics

This dissertation considers Bayesian robustness in the context of finite population sampling. Although we are concerned exclusively with the finite population mean, the general methods are applicable to other parameters of interest as well. First, we consider some robust Bayes competitors of the sample mean and the subjective Bayes estimators of the finite population mean. Specifically, we have proposed some robust Bayes estimators using ML-II priors from the ε-contamination class. Classes of contaminations that are considered include all possible contaminations and all symmetric, unimodal contaminations. These estimators are compared with the sample mean and subjective Bayes estimators in terms of "posterior robustness" and "procedure robustness". Similar robust Bayes estimators are introduced in the presence of auxiliary information, and these are compared again in terms of "posterior robustness" and "procedure robustness" with the ratio estimator and subjective Bayes estimators utilizing the auxiliary information. Also, we provide the range in which the posterior mean lies under ε-contamination models.


Second, we consider the idea of developing prior distributions which provide Bayes rules that are inherently robust with respect to reasonable misspecification of the prior. We provide robust Bayes estimators of the finite population mean based on heavy-tailed prior distributions using scale mixtures of normals. A Monte Carlo method, the Gibbs sampler, has been used for implementation of the Bayesian procedure. Also, the asymptotic optimality properties of the proposed robust Bayes estimators are proved. Finally, we address Bayesian robustness in the hierarchical Bayes setting for small area estimation. We provide robust hierarchical Bayes estimators of the small area means based on ML-II priors under the ε-contamination class, where the contamination class includes all unimodal distributions. Also, we provide the range in which the small area means lie under the ε-contamination class as well as the density ratio class of priors. For the class of priors that are uniform over a specified interval, we investigate the sensitivity to the choice of the interval. The methods are illustrated using data on the median income of four-person families in the 50 states and the District of Columbia.


CHAPTER 1
INTRODUCTION

1.1 Literature Review

When prior information about an unknown parameter, say θ, is available, Bayesian analysis requires quantification of this information in the form of a (prior) distribution on the parameter space for any statistical analysis. The most frequent criticism of subjective Bayesian analysis is that it supposedly assumes an ability to quantify available prior information completely and accurately in terms of a single prior distribution. Given the common and unavoidable practical limitations on factors such as available prior elicitation techniques and time, it is rather unrealistic to expect to quantify prior information in terms of one distribution with complete accuracy. In view of this difficulty in prior elicitation, there has long been a robust Bayesian viewpoint, which assumes only that subjective information can be quantified in terms of a class Γ of prior distributions. A procedure is then said to be robust if its inferences are relatively insensitive to the variation of the prior distribution over Γ. The robust Bayesian idea can be traced back to Good as early as 1950 (see for example Good (1965)), and has been popularized in recent years, notably by the stimulating article of Berger (1984). There is a rapidly growing literature in the area of robust Bayesian analysis. Berger (1984, 1985, 1990) and Wasserman (1992) provide reviews and discussions of the various issues and approaches. Berger (1984) discusses the philosophical and pragmatic reasons for adopting the robust Bayesian


viewpoint, along with a review of some of the leading approaches to robust Bayesian analysis. More recently, Berger (1990) provides a review of different approaches to the selection of Γ and techniques used in the analyses. Wasserman (1992) discusses several approaches to the computation of bounds on posterior expectations for certain classes of priors, as well as different approaches to graphical and numerical summarization. Various classes of priors have been proposed and studied in the literature. One such class is the ε-contamination class, having the form

Γ = {π : π(θ) = (1 − ε)π₀(θ) + εq(θ), q ∈ Q},   (1.1.1)

where Q is the class of allowable contaminations, and π₀ is a fixed prior, commonly called the base prior, which can be thought of as the prior one would use if a single prior had to be chosen. ε-contamination priors have been used by Huber (1973), Berger (1984, 1990), Berger and Berliner (1986), and Sivaganesan and Berger (1989), among others. Berger and Berliner (1986) studied the issue of selecting, in a data-dependent fashion, a "good" prior distribution (the type-II maximum likelihood prior) from the ε-contamination class, and using this prior in any subsequent analysis. Sivaganesan and Berger (1989) determined the ranges of various posterior quantities of interest (e.g. the posterior mean, posterior variance, and posterior probability of a set (allowing for credible sets or tests)) as the prior varies over Γ. Wasserman (1989) provides a robust interpretation of likelihood regions in the case of all possible contaminations. Also related is Berger and Sellke (1987), who carry out the determination of the range of the posterior probability of a hypothesis when ε = 1 in (1.1.1) (i.e., when there is no specified subjective prior π₀). Another class of interest has been considered by DeRobertis and Hartigan (1981) to find ranges of general posterior quantities. This class is of the form

Γ = {π : π(θ₁)/π(θ₂) ≤ g₁(θ₁)/g₂(θ₂) for all θ₁, θ₂},   (1.1.2)


where g₂ < g₁ are given positive functions. Such a class is called a density ratio class by Berger (1990). Wasserman and Kadane (1992) develop a Monte Carlo approach to computing bounds on posterior expectations over the density ratio class. Cano (1993) considers the range of the posterior mean in hierarchical Bayes models for several classes of priors, including the ε-contamination class and the density ratio class. Leamer (1982) and Polasek (1985) consider a class of conjugate priors with constraints on the domain of parameters to get closed form expressions for posterior expectations, and carry out sensitivity analyses. DasGupta and Studden (1988) consider the above conjugate class, as well as a density ratio class, in the context of Bayesian design of experiments. The difficulty of working with a class of priors makes it very appealing to find prior distributions which provide Bayes rules that are naturally robust with respect to reasonable misspecification of the prior. Since Box and Tiao (1968, 1973), Dawid (1973), and O'Hagan (1979), it has been recognized that insensitivity to outliers in Bayesian analysis can be achieved through the use of flat-tailed distributions. Use of t-priors when the data are normal has been specifically considered by West (1985) and O'Hagan (1989). Angers and Berger (1991) consider robust Bayesian estimation of exchangeable means using Cauchy priors. In Angers (1992), the Cauchy prior is replaced by a t-prior with an odd number of degrees of freedom. Andrews and Mallows (1974) and West (1987) study scale mixtures of normal distributions, which can be used for achieving Bayesian robustness. The Student t family, double-exponential, logistic, and the exponential power family can all be constructed as scale mixtures of normals (cf. Andrews and Mallows (1974)). West (1984) uses scale mixtures of normal distributions for both the random errors and the priors in Bayesian linear modelling.
Carlin and Polson (1991) discuss Bayesian model specification and provide a general paradigm for the Bayesian modelling of nonnormal errors by using scale mixtures of normal distributions.
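The scale-mixture representation mentioned above is easy to verify by simulation. The sketch below is illustrative (the function name is ours): it draws from a Student t distribution by mixing normals over a Gamma-distributed precision, the construction of Andrews and Mallows (1974) that is exploited in Chapter 4.

```python
import random, statistics

def sample_t_scale_mixture(df, mu=0.0, tau=1.0, n=100000, seed=1):
    """Draw from a t_df location-scale distribution via its normal
    scale-mixture form: theta | lam ~ N(mu, tau^2 / lam) with
    lam ~ Gamma(df/2, rate = df/2)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # random.gammavariate takes (shape, scale); scale = 1/rate = 2/df
        lam = rng.gammavariate(df / 2.0, 2.0 / df)
        draws.append(rng.gauss(mu, tau / lam ** 0.5))
    return draws

draws = sample_t_scale_mixture(df=5)
```

For ν = 5 the mixture should reproduce the t₅ variance ν/(ν − 2) = 5/3, and the sample variance of the draws does so to within Monte Carlo error, confirming that the heavy-tailed prior can be generated entirely from normal and gamma draws — the fact that makes Gibbs sampling convenient.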


The references so far pertain to Bayesian analysis for infinite populations. Bayesian analysis in finite population sampling is of more recent vintage. A unified and elegant formulation of Bayes estimation in finite population sampling was given by Hill (1968) and Ericson (1969). Since then, many papers have appeared in the area of Bayes estimation in finite population sampling. But most of the Bayesian literature in survey sampling deals with subjective Bayesian analysis, in that the inference procedure is based on a single, completely specified prior distribution. The need for robust Bayesian analysis in survey sampling, however, has been felt by some authors, although the methods described in the first few paragraphs of this section have not so far been used to achieve this end. Godambe and Thompson (1971) adopted a framework whereby the prior information could only be quantified up to a class Γ (C in their notation) of prior distributions. For estimating the population total in the presence of auxiliary information, they came up with the usual ratio and difference estimators, justifying these on the ground of location invariance. The model assumption there played a very minimal role, the main idea being that model-based inference statements could be replaced, in the case of model failure, by design-based inference. In a later study, Godambe (1982) considered the more common phenomenon of specific departures from the assumed model. His contention there was that sampling designs could be a useful companion of model-based inference procedures to generate "near-optimal" and "robust" estimators. However, the basic model assumed in that paper considered y₁, …, y_N to be independent, and attention was confined only to design and model unbiased estimators. Royall and Pfeffermann (1982) also consider robustness of certain Bayes estimators in finite populations.
However, their main concern is to find out conditions under which the Bayes estimators under an assumed model remain the same under certain departures from the model.


Much of the earlier work, as well as some of the current work, in finite population sampling relates to inference for a single stratum. However, there is a recent need for simultaneous inference from several strata, particularly in the context of small area estimation, which is becoming increasingly popular in survey sampling. Agencies of the Federal Government have been involved in obtaining estimates of population counts, unemployment rates, per capita income, crop yields and so forth for state and local government areas. In typical instances, only a few samples are available from an individual area, and an estimate of a certain area mean, or simultaneous estimates of several area means, can be improved by incorporating information from similar neighboring areas. Ghosh and Rao (1994) have recently surveyed the early history as well as the recent developments in small area estimation. Bayesian methodology has been implemented in improving small area estimators. The empirical Bayes (EB) approach has been considered for simultaneous estimation of the parameters of several small areas, where each area contains a finite number of elements, by Fay and Herriot (1979), Ghosh and Meeden (1986), and Ghosh and Lahiri (1987). Datta and Ghosh (1991), as an alternative to the EB procedure, propose a hierarchical Bayes (HB) approach for estimation of small area means under general mixed linear models, and also discuss the computational aspects. To handle the case of heavy-tailed priors in the setting of Fay and Herriot, Datta and Lahiri (1994) use t-priors, viewing them as scale mixtures of normals.

1.2 The Subject of This Dissertation

This dissertation considers Bayesian robustness in the context of finite population sampling. Although we are concerned exclusively with the finite population mean, the general methods are applicable to other parameters of interest as well.
In Chapter 2, we develop some robust Bayes estimators of the finite population mean using ML-II priors under ε-contamination models. We compare the performance


of these estimators with the subjective Bayes estimators as well as the sample means, using the criteria of "posterior robustness" and "procedure robustness" suggested by Berger (1984). As a consequence of the results on procedure robustness, we have established some asymptotic optimality of the robust Bayes estimators even in the presence of a subjective prior. Furthermore, modifying slightly the arguments of Sivaganesan and Berger (1989), we have provided the range in which the posterior mean lies under ε-contamination models. In Chapter 3, we provide expressions for variations of the posterior mean within the ε-contamination class of priors in the presence of auxiliary information, following Sivaganesan and Berger (1989). Moreover, similar robust Bayes predictors using the ML-II priors under this contamination class of priors are introduced in the presence of auxiliary information. We compare these estimators again in terms of "posterior robustness" and "procedure robustness" with the classical ratio estimators and subjective Bayes predictors utilizing the auxiliary information. In the course of calculating indices for procedure robustness, we have proved the asymptotic optimality of the robust Bayes predictors. In Chapter 4, we propose certain robust Bayes estimators of the finite population mean from a different perspective. In order to overcome the problem associated with outliers in the context of finite population sampling, we develop Bayes estimators of the finite population mean based on heavy-tailed priors using scale mixtures of normals. Also, we study the asymptotic optimality property of the proposed Bayes estimators. For implementation, we have devised a computer-intensive fully Bayesian procedure which uses Markov chain Monte Carlo integration techniques like the Gibbs sampler. Chapter 5 addresses Bayesian robustness in the hierarchical Bayes setting in the context of small area estimation.
We provide robust hierarchical Bayes estimators of the small area means based on ML-II priors under the ε-contamination class of


priors, where the contamination class includes all unimodal distributions. We compare these estimators with HB and EB estimators. Also, we provide the range in which the small area means lie under the ε-contamination class as well as the density ratio class of priors. Finally, in Chapter 6, we discuss some open problems which will be topics for future research.


CHAPTER 2
ROBUST BAYES ESTIMATION OF THE FINITE POPULATION MEAN

2.1 Introduction

Consider a finite population U with units labeled 1, 2, …, N. Let y_i denote the value of a single characteristic attached to unit i. The vector y = (y₁, …, y_N)′ is the unknown state of nature, and is assumed to belong to Θ = R^N. A subset s of {1, 2, …, N} is called a sample. Let n(s) denote the number of elements belonging to s. The set of all possible samples is denoted by S. A design is a function p on S such that p(s) ∈ [0, 1] for all s ∈ S and Σ_{s∈S} p(s) = 1. Given y ∈ Θ and s = {i₁, …, i_{n(s)}} with 1 ≤ i₁ < ⋯ < i_{n(s)} ≤ N, let y(s) = {y_{i₁}, …, y_{i_{n(s)}}}. One of the main objectives in sample surveys is to draw inference about y, or some function (real or vector valued) γ(y) of y, on the basis of s and y(s). Bayes estimators of the finite population total (or the mean) within the class of linear unbiased estimators were obtained by Godambe (1955). Subsequently, Godambe and Joshi (1965) found such estimators within the class of all unbiased estimators. A general Bayesian approach to finite population sampling was initiated by Hill (1968) and Ericson (1969). Since then, a huge literature has grown in this area. Ericson (1983) provides a recent account. We have discussed the use of a single subjective prior in Section 1.1. In this chapter, we generate certain robust Bayes estimators using ε-contamination priors, and study their performance over a broad class of prior distributions on the parameter


space. As we mentioned in Section 1.2, we have restricted ourselves to estimation of the finite population mean. It may be noted that, in our framework, the sampling design does not play any role in the robustness study. More generally, in the Bayesian framework, the irrelevance of the sampling design at the inference stage has been pointed out earlier by Godambe (1966), Basu (1969) and Ericson (1969). Our findings are consistent with that. This, however, does not undermine the importance of sampling design in the actual choice of units in a survey. It is very crucial to design a survey before its actual execution, and we have made some comments on why a simple random sample is justified even within our framework. Royall and Pfeffermann (1982) also consider robustness of certain Bayes estimators. However, their main concern is to find out conditions under which the Bayes estimators under an assumed model remain the same under certain departures from the model. The contents of the remaining sections are as follows. In Section 2.2, we develop some robust Bayes estimators of the finite population mean using ML-II priors under ε-contamination models (see Good (1965) and Berger and Berliner (1986)), where the contamination class includes all possible distributions. As in Berger (1984), the concepts of posterior and procedure robustness are introduced, and the proposed robust Bayes estimators are compared to the sample means and the subjective Bayes estimators under this criterion. It turns out that the robust Bayes estimator enjoys good posterior as well as procedure robustness properties relative to the sample mean, while the subjective Bayes estimator enjoys good posterior robustness but lacks procedure robustness. As a consequence of the results on procedure robustness, we have established some asymptotic (as the sample size increases) optimality of the robust Bayes estimators in the presence of a subjective prior. Specifically, we have shown that the difference of the Bayes risks of the proposed robust Bayes estimator and the optimal Bayes estimator converges to zero at a certain rate as the sample size tends


to infinity. Also, modifying slightly the arguments of Sivaganesan (1988), we have provided the range in which the posterior mean lies under ε-contamination models. This pertains to the sensitivity analysis of the proposed robust Bayes procedure. Section 2.3 contains the more realistic framework where the contamination class contains only symmetric unimodal distributions. We have developed robust Bayes estimators as competitors of the classical estimator as well as the subjective Bayes estimator, and have studied their properties. Finally, in Section 2.4, a numerical example is provided to illustrate the methods described in Sections 2.2 and 2.3. One of the highlights of this chapter is the analytical study of procedure robustness and the subsequent asymptotic optimality in the presence of a subjective prior. To our knowledge, this issue has not been addressed in previous work. For simplicity, in the subsequent sections, only the case where p(s) > 0 if and only if n(s) = n will be considered. This amounts to considering only fixed samples of size n. Also, throughout this chapter, the loss is assumed to be squared error.

2.2 Robust Bayes Estimators

Consider the case where y_i | θ ~ N(θ, σ²) (i = 1, …, N) and θ ~ N(μ₀, τ₀²). Write M₀ = σ²/τ₀², B₀ = M₀/(M₀ + n), ȳ(s) = n⁻¹Σ_{i∈s} y_i and y(s̄) = {y_i : i ∉ s}, the suffixes in y(s̄) being arranged in ascending order. From Ericson (1969), it follows that the posterior distribution of y(s̄) given s and y(s) is N({(1 − B₀)ȳ(s) + B₀μ₀}1_{N−n}, σ²(I_{N−n} + (M₀ + n)⁻¹J_{N−n})), where 1_u is a u-component column vector with each element equal to 1, J_u = 1_u1_u′, and I_u is the identity matrix. Then the Bayes estimator of γ(y) = N⁻¹Σ_{i=1}^N y_i is

δ⁰(s, y(s)) = E[γ(y)|s, y(s)] = fȳ(s) + (1 − f){(1 − B₀)ȳ(s) + B₀μ₀}


= ȳ(s) − (1 − f)B₀(ȳ(s) − μ₀),   (2.2.1)

where f = n/N. Strictly speaking, we should call δ⁰ a predictor of γ(y), since δ⁰ is the sum of the observed y_i's plus a predictor of the total of the unobserved y_i's, but we shall use the two terms "estimators" and "predictors" interchangeably. Also, the associated posterior variance is given by

V(γ(y)|s, y(s)) = N⁻²(N − n)σ²(M₀ + N)/(M₀ + n).   (2.2.2)

The classical estimator of γ(y) is δ^c(s, y(s)) = ȳ(s), which is a design-unbiased estimator of γ(y) under simple random sampling. Also, δ^c is a model-unbiased estimator of γ(y) under any model which assumes that the y_i's have a common mean. To derive robust Bayes estimators of γ(y), we first introduce the notion of ε-contaminated priors. Denote by π₀ the N(μ₀, τ₀²) distribution. The class Γ* of prior distributions is given by

Γ* = {π : π = (1 − ε)π₀ + εq, q ∈ Q},   (2.2.3)

where 0 < ε < 1 is given, and Q is the class of all distribution functions. We denote by m(y|π) the marginal (predictive) density of y under the prior π. If π ∈ Γ*, we can write m(y|π) = (1 − ε)m(y|π₀) + εm(y|q). This leads to

m(y(s)|π) = (1 − ε)m(y(s)|π₀) + εm(y(s)|q).   (2.2.4)

Our objective is to choose the prior π which maximizes m(y(s)|π) over Γ*. This amounts to maximization of m(y(s)|q) over q ∈ Q. Noting that

m(y(s)|q) = ∫_{−∞}^{∞} (2πσ²)^{−n/2} exp[−Σ_{i∈s}(y_i − θ)²/(2σ²)] q(dθ),   (2.2.5)


it follows that m(y(s)|q) is maximized with respect to the prior which is degenerate at ȳ(s). We shall denote this prior q̂ by δ_{ȳ(s)}, δ being the Dirac delta function. The resulting (estimated) prior, say π̂_s, is now given by

π̂_s = (1 − ε)π₀ + εδ_{ȳ(s)}.   (2.2.6)

The prior π̂_s is called the ML-II prior by Good (1965), and Berger and Berliner (1986). Using the prior π̂_s, y(s) is marginally distributed as

(1 − ε)N(μ₀1_n, σ²I_n + τ₀²J_n) + εF_s,   (2.2.7)

where F_s has (improper) pdf f(y_i, i ∈ s) = (2πσ²)^{−n/2} exp[−Σ_{i∈s}(y_i − ȳ(s))²/(2σ²)].

and

λ(y(s)) = (1 − ε)m(y(s)|π₀)/m(y(s)|π̂_s).   (2.2.13)

From Ericson (1969), π₀(y(s̄)|s, y(s)) is the N[{(1 − B₀)ȳ(s) + B₀μ₀}1_{N−n}, σ²(I_{N−n} + (M₀ + n)⁻¹J_{N−n})] pdf. Also,

∫ f(y(s̄)|θ)q̂(dθ|s, y(s)) = ∫ f(y(s̄)|θ)f(y(s)|θ)q̂(dθ)/m(y(s)|q̂) = m(y|q̂)/m(y(s)|q̂) = (2πσ²)^{−(N−n)/2} exp[−Σ_{i∉s}(y_i − ȳ(s))²/(2σ²)].   (2.2.14)

Further, after some algebraic simplification,

λ⁻¹(y(s)) = 1 + ε(1 − ε)⁻¹ m(y(s)|q̂)/m(y(s)|π₀) = λ_ML⁻¹(y(s)).   (2.2.15)

This completes the proof of the theorem. Note that under the posterior distribution given in (2.2.9), the Bayes estimator of γ(y) is given by

δ^{RB}(s, y(s)) = N⁻¹[nȳ(s) + (N − n){λ_ML(y(s))((1 − B₀)ȳ(s) + B₀μ₀) + (1 − λ_ML(y(s)))ȳ(s)}]
  = ȳ(s) − (1 − f)λ_ML(y(s))B₀(ȳ(s) − μ₀).   (2.2.16)

Note that for ε close to zero, i.e., when one is very confident about the N(μ₀, τ₀²) prior for θ, λ_ML(y(s)) is close to 1, and it follows that δ^{RB} is very close to δ⁰. On the other hand, when ε is close to 1, i.e., when there is very little confidence in the N(μ₀, τ₀²) prior, λ_ML(y(s)) is close to zero, and δ^{RB} is very close to the sample mean ȳ(s). Thus, as one might expect, a robust Bayes estimator serves as a compromise between the


subjective Bayes and the classical estimators of γ(y). Also, generalizing formula (1.8) in Berger and Berliner (1986), i.e.,

V^π(x) = λ(x)V^{π₀}(x) + (1 − λ(x))V^q(x) + λ(x)(1 − λ(x))(δ^{π₀}(x) − δ^q(x))²,

one gets

V(γ(y)|s, y(s)) = N⁻²[(N − n)σ² + (N − n)²{λ_ML σ²/(M₀ + n) + λ_ML(1 − λ_ML)B₀²(ȳ(s) − μ₀)²}].   (2.2.17)

Next, we compare the performances of δ⁰, δ^c and δ^{RB} from the robustness perspective. The main idea is that we want to examine whether these estimators perform satisfactorily over a broad class of priors. To this end, for a given prior ξ, denote by ρ(ξ, (s, y(s)), a) the posterior risk of an estimator a(s, y(s)) of γ(y), i.e., ρ(ξ, (s, y(s)), a) = E[{a(s, y(s)) − γ(y)}²|s, y(s)]. The following definition is taken from Berger (1984).

Definition 2.2.1 An estimator a₀(s, y(s)) is ζ-posterior robust with respect to Γ if, for the observed (s, y(s)),

POR_Γ(a₀) = sup_{ξ∈Γ} |ρ(ξ, (s, y(s)), a₀) − inf_a ρ(ξ, (s, y(s)), a)| ≤ ζ.   (2.2.18)

We shall henceforth refer to the left hand side of (2.2.18) as the posterior robustness index of the estimator a₀(s, y(s)) of γ(y) under the class of priors Γ. POR_Γ(a₀) is, in a sense, the sensitivity index of the estimator a₀ of γ(y) as the prior varies over Γ. For any given ζ > 0, it is very clear that whether or not posterior robustness holds will often depend on which (s, y(s)) is observed. This will be revealed in the examples to follow.
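A small numerical sketch of (2.2.15)–(2.2.16) may help. It uses the closed form of the marginal ratio, m(y(s)|q̂)/m(y(s)|π₀) = B₀^{−1/2} exp{nB₀(ȳ(s) − μ₀)²/(2σ²)}, which follows from the two normal marginals above; the code itself is an illustration of the formulas, not part of the dissertation.

```python
import math

def robust_bayes_estimate(ys, N, mu0, sigma2, tau02, eps):
    """ML-II robust Bayes estimator (2.2.16):
    delta_RB = ybar - (1 - f) * lam_ML * B0 * (ybar - mu0),
    with lam_ML^{-1} = 1 + eps/(1-eps) * m(y|q_hat)/m(y|pi0) as in (2.2.15)."""
    n = len(ys)
    ybar = sum(ys) / n
    f = n / N
    M0 = sigma2 / tau02
    B0 = M0 / (M0 + n)
    # closed form of the marginal ratio m(y(s)|q_hat)/m(y(s)|pi0)
    ratio = B0 ** -0.5 * math.exp(n * B0 * (ybar - mu0) ** 2 / (2 * sigma2))
    lam = 1.0 / (1.0 + eps / (1.0 - eps) * ratio)
    return ybar - (1 - f) * lam * B0 * (ybar - mu0)
```

As ε → 0 the estimate reduces to the subjective Bayes estimator δ⁰, while for ε near 1 (or ȳ(s) far from μ₀, which drives λ_ML toward 0) it collapses to the sample mean ȳ(s) — the compromise described above.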


To examine the posterior robustness of δ⁰, δ^c and δ^{RB}, consider the class of N(μ, τ²) priors, μ (real) and τ² (> 0). Write M = σ²/τ², B = M/(M + n), where σ² (> 0) is known. Calculations similar to (2.2.1) now give the Bayes estimator of γ(y) under the N(μ, τ²) prior (to be denoted by δ^{μ,B}) as

δ^{μ,B}(s, y(s)) = fȳ(s) + (1 − f){Bμ + (1 − B)ȳ(s)} = ȳ(s) − (1 − f)B(ȳ(s) − μ).   (2.2.19)

Then the following results hold:

ρ(ξ_{μ,B}, (s, y(s)), δ^{μ,B}) = N⁻²(N − n)σ²(M + N)/(M + n).   (2.2.20)

However, the class of all N(μ, τ²) priors, with μ real and τ² (> 0), is indeed too big to be practically useful. As a next step, we consider the smaller class of N(μ₀, τ²) priors, where the mean μ₀ is specified. This is not too unrealistic, since, very often from prior experience, one can make a reasonable guess at the center of the distribution.
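All of the estimators being compared share the shrinkage form of (2.2.19); a minimal sketch (an illustrative helper of ours, not from the dissertation) makes the common structure explicit.

```python
def shrinkage_estimate(ys, N, mu, sigma2, tau2):
    """Bayes estimator (2.2.19) of the finite population mean under a
    N(mu, tau2) prior: delta = ybar - (1 - f) * B * (ybar - mu), where
    B = M/(M + n) and M = sigma2/tau2.  With (mu, tau2) = (mu0, tau0^2)
    this is the subjective Bayes estimator delta^0 of (2.2.1)."""
    n = len(ys)
    ybar = sum(ys) / n
    f = n / N
    M = sigma2 / tau2
    B = M / (M + n)
    return ybar - (1 - f) * B * (ybar - mu)
```

Taking τ² → ∞ (so B → 0) recovers the classical estimator δ^c = ȳ(s), while small τ² shrinks the estimate heavily toward the prior mean μ.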


Note that when μ = μ₀, denoting ξ_{μ₀,B} by ξ_B and δ^{μ₀,B} by δ^B, (2.2.21)–(2.2.23) simplify to

ρ(ξ_B, (s, y(s)), δ⁰) − ρ(ξ_B, (s, y(s)), δ^B) = (1 − f)²(B₀ − B)²(ȳ(s) − μ₀)²;   (2.2.24)
ρ(ξ_B, (s, y(s)), δ^c) − ρ(ξ_B, (s, y(s)), δ^B) = (1 − f)²B²(ȳ(s) − μ₀)²;   (2.2.25)
ρ(ξ_B, (s, y(s)), δ^{RB}) − ρ(ξ_B, (s, y(s)), δ^B) = (1 − f)²(B₀λ_ML(y(s)) − B)²(ȳ(s) − μ₀)².   (2.2.26)

Accordingly, from (2.2.24)–(2.2.26), for every B (> 0) and 0 < f < 1, the posterior ζ-robustness of all these procedures depends on the closeness of the sample mean to the prior mean μ₀. Also, it follows from (2.2.27)–(2.2.29) that both the subjective and robust Bayes estimators are more posterior robust than the sample mean for the {N(μ₀, τ²), τ² > 0} class of priors. A comparison between (2.2.27) and (2.2.29) also reveals that the robust Bayes estimator arrived at by employing the type-II ML prior enjoys a greater degree of posterior robustness than the subjective Bayes estimator if B₀λ_ML(y(s)) > 1/2. Although the Bayesian thinks conditionally on (s, y(s)), and accordingly in terms of posterior robustness, it is certainly possible to use the overall Bayes risk in defining a suitable robustness criterion. This may not be totally unappealing even to a Bayesian at the preexperimental stage, when he/she perceives that y will be occurring according to a certain marginal distribution. The overall performance of an estimator a(s, y(s)) of γ(y) when the prior is ξ is given by r(ξ, a) = E[ρ(ξ, (s, y(s)), a)], the expectation being taken with respect to the resulting marginal distribution of y(s). The following method of measuring the robustness of a procedure is given in Berger (1984).


Definition 2.2.2 An estimator a₀(s, y(s)) of γ(y) is said to be ζ-procedure robust with respect to Γ if

PR_Γ(a₀) = sup_{ξ∈Γ} |r(ξ, a₀) − inf_a r(ξ, a)| ≤ ζ.   (2.2.30)

We shall henceforth refer to PR_Γ(a₀) as the procedure robustness index of a₀. Next we examine how the three estimators δ⁰, δ^c and δ^{RB} compare according to the PR criterion as given in (2.2.30) when we consider the {N(μ₀, τ²), τ² > 0} class of priors. Using the same notation as before for the N(μ₀, τ²) prior, from (2.2.24)–(2.2.26) it follows that

r(ξ_B, δ⁰) − r(ξ_B, δ^B) = (1 − f)²(B₀ − B)²σ²/(nB);   (2.2.31)
r(ξ_B, δ^c) − r(ξ_B, δ^B) = (1 − f)²Bσ²/n;   (2.2.32)
r(ξ_B, δ^{RB}) − r(ξ_B, δ^B) = (1 − f)²E[(B₀λ_ML(ȳ(s)) − B)²(ȳ(s) − μ₀)²].   (2.2.33)

It is clear from (2.2.31)–(2.2.33) that

PR_Γ(δ⁰) = +∞,   (2.2.34)
PR_Γ(δ^c) = (1 − f)²σ²/n,   (2.2.35)
PR_Γ(δ^{RB}) = (1 − f)² sup_{0<B<1} E[(B₀λ_ML(ȳ(s)) − B)²(ȳ(s) − μ₀)²].   (2.2.36)

Theorem 2.2.2 As B → 0, E[(B₀λ_ML(ȳ(s)) − B)²(ȳ(s) − μ₀)²] = O_e(B^{1/2}), where O_e denotes the exact order.
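The O_e(B^{1/2}) rate in Theorem 2.2.2 can be checked numerically. The sketch below is illustrative (the parameter values are arbitrary choices of ours): it draws ȳ(s) − μ₀ from its N(0, σ²/(nB)) marginal, evaluates λ_ML through the ratio B₀^{−1/2} exp{nB₀(ȳ(s) − μ₀)²/(2σ²)} implied by (2.2.15), and estimates the expectation appearing in (2.2.33).

```python
import math, random

def risk_diff_term(B, B0, eps, sigma2, n, reps=200000, seed=2):
    """Monte Carlo estimate of E[(B0 lam_ML(ybar) - B)^2 (ybar - mu0)^2]
    from (2.2.33), with ybar - mu0 ~ N(0, sigma2/(n B)) marginally."""
    rng = random.Random(seed)
    sd = math.sqrt(sigma2 / (n * B))
    tot = 0.0
    for _ in range(reps):
        d = rng.gauss(0.0, sd)                    # d = ybar - mu0
        expo = n * B0 * d * d / (2 * sigma2)
        if expo > 500.0:                          # lam_ML underflows to zero
            lam = 0.0
        else:
            lam = 1.0 / (1.0 + eps / (1.0 - eps) * B0 ** -0.5 * math.exp(expo))
        tot += (B0 * lam - B) ** 2 * d * d
    return tot / reps
```

With B₀ = 0.5, ε = 0.3, σ² = 1 and n = 10, the rescaled values risk_diff_term(B, …)/B^{1/2} come out roughly equal for B = 10⁻² and B = 10⁻⁴, consistent with the exact order B^{1/2} claimed by the theorem.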


Proof of Theorem 2.2.2 Noting that n(ȳ(s) − μ₀)² ~ (σ²/B)χ²₁, it follows from (2.2.36) that

E[(B₀λ_ML(ȳ(s)) − B)²(ȳ(s) − μ₀)²] = (σ²/(nB)) ∫₀^∞ [B₀/{1 + g exp(uB₀/(2B))} − B]² u · u^{1/2−1} exp(−u/2)/{2^{1/2}Γ(1/2)} du,   (2.2.37)

where g = (ε/(1 − ε))B₀^{−1/2}. Next observe that

rhs of (2.2.37) ≤ (σ²/(nB)) ∫₀^∞ 2[B₀²/{1 + g exp(uB₀/(2B))}² + B²] u · u^{1/2−1} exp(−u/2)/{2^{1/2}Γ(1/2)} du
  ≤ (2σ²/(nB)) B₀²g⁻²(1 + 2B₀B⁻¹)^{−3/2} + 2σ²B/n = O(B^{1/2}).   (2.2.38)

Again, writing g′ = max(g, 1), and using B₀/{1 + g exp(uB₀/(2B))} ≥ B₀/{2g′ exp(uB₀/(2B))} together with B₀/{1 + g exp(uB₀/(2B))} ≤ (B₀/g) exp(−uB₀/(2B)),

rhs of (2.2.37) ≥ (σ²/(nB)) ∫₀^∞ [{B₀/(2g′)}² exp(−uB₀/B) − 2B(B₀/g) exp(−uB₀/(2B))] u · u^{1/2−1} exp(−u/2)/{2^{1/2}Γ(1/2)} du
  = (σ²/n)[{B₀/(2g′)}²B⁻¹(1 + 2B₀B⁻¹)^{−3/2} − 2(B₀/g)(1 + B₀B⁻¹)^{−3/2}],   (2.2.39)

which is of exact order B^{1/2} as B → 0, since the first term behaves like a constant multiple of B^{1/2} while the second is O(B^{3/2}). Combining (2.2.38) and (2.2.39) proves the theorem.

it appears that the classical estimator δ^c has a certain edge over δ^{RB} for small B. Smallness of B can arise in two ways: first, n is very large; second, τ² is much larger than σ². This is, however, natural to expect, since small B signifies a small variance ratio σ²/τ², which amounts to instability in the assessment of the prior distribution of θ. It is not surprising that in such a situation it is safer to use ȳ(s) in estimating γ(y) if one is seriously concerned about the long-run performance of an estimator. Theorem 2.2.2 (specifically, the upper bound derived in this theorem for the difference of Bayes risks) clearly demonstrates the procedure robustness of δ^{RB}. The superior performance of the POR index of this estimator relative to the POR index of δ^c has already been established. On the other hand, δ⁰, which performs better than δ^c on its POR index, fails miserably in its procedure robustness. This also shows that the average long-term performance of a procedure can sometimes be highly misleading when used as a yardstick for measuring its performance in a given situation. In this case, δ^{RB} seems to achieve the balance between a frequentist and a subjective Bayesian. In practice, however, σ² is not usually known. In such a situation, one can conceive of an inverse gamma prior for σ², independent of the prior for θ, to derive a Bayes estimator of γ(y) (see Ericson (1969)). In a robust Bayes approach, if one assumes a mixture of a normal-gamma prior and a type-II ML prior for (θ, σ²), then the type-II ML prior puts its mass on the point (ȳ(s), Σ_{i∈s}(y_i − ȳ(s))²/n). It is possible to study the robustness of the resulting Bayes estimator, but this will not be pursued here. Next we consider the problem of finding the range of the posterior mean of γ(y) = N⁻¹Σ_{i=1}^N y_i over Γ* in (2.2.3). Simple modifications of the arguments of Sivaganesan and Berger (1989) or Sivaganesan (1988) lead to the following theorem.


Theorem 2.2.3.

sup_{π∈Γ*} E[γ(y)|s, ȳ(s)] = fȳ(s) + (1 − f)[a δ^{π₀}(ȳ(s)) + θ_u f(ȳ(s)|θ_u)]/[a + f(ȳ(s)|θ_u)]   (2.2.40)

and

inf_{π∈Γ*} E[γ(y)|s, ȳ(s)] = fȳ(s) + (1 − f)[a δ^{π₀}(ȳ(s)) + θ_l f(ȳ(s)|θ_l)]/[a + f(ȳ(s)|θ_l)],   (2.2.41)

where a = ε^(−1)(1 − ε)m(ȳ(s)|π₀), δ^{π₀}(ȳ(s)) = (1 − B₀)ȳ(s) + B₀μ₀, θ_l = v_l σ/√n + ȳ(s), θ_u = v_u σ/√n + ȳ(s), and the values of v_l and v_u are the solutions in v of the equation

exp(−v²/2) − cv² − bv + c = 0,   (2.2.42)

where c = a(2πσ²)^(n/2) exp[Σ_{i∈s}(y_i − ȳ(s))²/(2σ²)] and b = c√n(ȳ(s) − δ^{π₀}(ȳ(s)))/σ. Note that the equation (2.2.42) has exactly two solutions, which may be obtained by rewriting it in the form v = ±[−2 log(cv² + bv − c)]^(1/2) and solving it iteratively using the initial values [−b ± {b² + 2(1 + 2c)(1 + c)}^(1/2)]/(1 + 2c).

2.3 Symmetric Unimodal Contamination

The contamination class Q of the previous section contains all possible distributions. Needless to say, such a class is too big to be useful in practice, as it may contain many unwanted distributions. In this section, we consider the smaller but sufficiently rich class Q which contains all unimodal distributions symmetric about μ₀. This class was considered by Berger and Berliner (1986). The contaminated prior pdf π is given by

π(θ) = (1 − ε)π₀(θ) + εq(θ),   (2.3.1)

where q ∈ Q, 0 ≤ ε < 1, and as before π₀(θ) is the N(μ₀, τ₀²) pdf. To find the ML-II prior in this case, we use the well-known fact that any symmetric unimodal


distribution is a mixture of symmetric uniform distributions (cf. Berger and Sellke (1987)). Thus it suffices to restrict q to Q′ = {G_k : G_k is uniform(μ₀ − k, μ₀ + k), k > 0}. The ML-II prior is then given by

π̂(θ) = (1 − ε)π₀(θ) + εq̂(θ),   (2.3.2)

where q̂ is uniform(μ₀ − k̂, μ₀ + k̂), k̂ being the value of k which maximizes m(y(s)|q), the marginal pdf of y(s) under q. To find k̂, note that

m(y(s)|q) = ∫(2πσ²)^(−n/2) exp[−(1/(2σ²))Σ_{i∈s}(y_i − θ)²] q(dθ)
= (2k)^(−1) ∫_{μ₀−k}^{μ₀+k} (2πσ²)^(−n/2) exp[−(1/(2σ²))Σ_{i∈s}(y_i − θ)²] dθ
= (2πσ²)^(−(n−1)/2)(2k√n)^(−1) exp[−(1/(2σ²))Σ_{i∈s}(y_i − ȳ(s))²] × [Φ(√n k/σ − z(s)) − Φ(−√n k/σ − z(s))],   (2.3.3)

where Φ is the N(0, 1) cdf and z(s) = √n(ȳ(s) − μ₀)/σ. The solution k̂ is given by (cf. Berger and Sellke (1987), p. 117)

k̂ = k*σ/√n if |z(s)| > 1; = 0 if |z(s)| ≤ 1,   (2.3.4)

where k* satisfies Φ(k* − z(s)) − Φ(−k* − z(s)) = k*[φ(k* − z(s)) + φ(k* + z(s))], φ being the N(0, 1) pdf. Since both sides of the above equation are symmetric functions of z(s), we can replace z(s) by |z(s)| to get

Φ(k* − |z(s)|) − Φ(−k* − |z(s)|) = k*[φ(k* − |z(s)|) + φ(k* + |z(s)|)].   (2.3.5)


Clearly k* = 0 is a solution of (2.3.5). Berger (1985, p. 234) states without proof that there exists a unique solution k*(> 0) of (2.3.5) for |z(s)| > 1. Since the proof of the uniqueness result is nontrivial, and is very critical for many subsequent developments, we provide below a proof of the same.

Lemma 2.3.1. There exists a unique solution k* > |z(s)| of (2.3.5) for |z(s)| > 1.

Proof of Lemma 2.3.1. For notational simplicity, write k* = x and |z(s)| = y. Consider the function

h(x, y) = Φ(x − y) − Φ(−x − y) − x[φ(x − y) + φ(x + y)].   (2.3.6)

It suffices to show that for every y > 1, there exists a unique x > y for which h(x, y) = 0. To see this, first observe that

∂h(x, y)/∂x = x[(x − y)φ(x − y) + (x + y)φ(x + y)] = xφ(x + y)[(x − y) exp(2xy) + x + y].   (2.3.7)

Hence, if for some x₀ > 0, ∂h(x, y)/∂x|_{x=x₀} > 0 (the existence of x₀ is guaranteed, since ∂h(x, y)/∂x|_{x=y} = 2y²φ(2y) > 0 for y > 0), then from (2.3.7), (x₀ − y) exp(2x₀y) + x₀ + y > 0. This implies that for every x₁ > x₀, (x₁ − y) exp(2x₁y) + x₁ + y > 0, which from (2.3.7) is equivalent to ∂h(x, y)/∂x|_{x=x₁} > 0. Thus, for a fixed y, once h(x, y) starts increasing at x = x* (say), it continues to do so from x* onwards for that y. Moreover, h(0, y) = 0 for all y, h(y, y) < 0 for all y > 1, and h(+∞, y) = 1 for all y. This shows that for a fixed y > 1, h(x, y) starts decreasing as a function of x from 0 to a negative number, and then starts increasing up to 1. Since h(y, y) < 0, this guarantees the existence of a unique x > y for which h(x, y) = 0.

Remark 2.3.1. Berger (1985) used the fact k* > |z(s)| if |z(s)| > 1 in order to provide an iterative procedure for finding k*. The above lemma substantiates Berger's claim in addition to justifying the uniqueness of k*.
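As a numerical illustration of Lemma 2.3.1, equation (2.3.5) can be solved for k* by bisection on h(x, |z(s)|) over [|z(s)|, K] for a suitably large K, since h is negative at the left endpoint and tends to 1 as x grows. A minimal sketch (the value |z(s)| = 2 and the bracket width are illustrative choices, not from the text):

```python
import math

def phi(x):
    # standard normal pdf
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def h(x, y):
    # the function h(x, y) of (2.3.6); its root in x > y (y = |z(s)| > 1) is k*
    return Phi(x - y) - Phi(-x - y) - x * (phi(x - y) + phi(x + y))

def solve_kstar(z, hi=50.0, tol=1e-12):
    # bisection on [|z|, hi]: h(|z|, |z|) < 0 for |z| well above 1, while
    # h(x, |z|) -> 1 as x -> infinity, so the interval brackets the root
    y = abs(z)
    lo = y
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid, y) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

kstar = solve_kstar(2.0)  # k* for |z(s)| = 2 (illustrative value)
```

Consistent with the lemma, the computed root exceeds |z(s)| itself.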


The following theorem provides the conditional distribution of ỹ(s̄) given s and y(s) under the ML-II prior π̂. Recall that M₀ = σ²/τ₀², so that B₀ = M₀/(M₀ + n).

Theorem 2.3.1. The posterior pdf of ỹ(s̄) given s and y(s) is

π̂(ỹ(s̄)|s, y(s)) = λ_SU(z(s))π₀(ỹ(s̄)|s, y(s)) + (1 − λ_SU(z(s)))m(y|q̂)/m(y(s)|q̂),   (2.3.8)

where for k̂ > 0,

λ_SU^(−1)(z(s)) = 1 + ε(1 − ε)^(−1)B₀^(−1/2)(2π)^(1/2)(1/2)[φ(k* − z(s)) + φ(k* + z(s))] exp(B₀z²(s)/2) = λ₁^(−1)(z(s)) (say),   (2.3.9)

while for k̂ = 0,

λ_SU^(−1)(z(s)) = 1 + ε(1 − ε)^(−1)B₀^(−1/2) exp[−(1 − B₀)z²(s)/2] = λ₂^(−1)(z(s)) (say).   (2.3.10)

Proof of Theorem 2.3.1. Arguing as in Theorem 2.2.1, the posterior pdf of ỹ(s̄) given s and y(s) is

π̂(ỹ(s̄)|s, y(s)) = λ(y(s))π₀(ỹ(s̄)|s, y(s)) + (1 − λ(y(s)))m(y|q̂)/m(y(s)|q̂),   (2.3.11)

where

λ^(−1)(y(s)) = 1 + ε(1 − ε)^(−1)m(y(s)|q̂)/m(y(s)|π₀).   (2.3.12)

But for k̂ > 0,

m(y(s)|q̂)/m(y(s)|π₀) = (2k̂)^(−1)B₀^(−1/2)(2πσ²/n)^(1/2) exp(B₀z²(s)/2) ∫_{μ₀−k̂}^{μ₀+k̂} (2πσ²/n)^(−1/2) exp[−n(ȳ(s) − θ)²/(2σ²)] dθ.   (2.3.13)


Recall that z(s) = √n(ȳ(s) − μ₀)/σ. Also, using (2.3.5) and the fact that k̂ = k*σ/√n,

∫_{μ₀−k̂}^{μ₀+k̂} (2πσ²/n)^(−1/2) exp[−n(ȳ(s) − θ)²/(2σ²)] dθ = Φ(k* − z(s)) − Φ(−k* − z(s)) = k*[φ(k* − z(s)) + φ(k* + z(s))].   (2.3.14)

Hence, from (2.3.13) and (2.3.14), for k̂ > 0,

m(y(s)|q̂)/m(y(s)|π₀) = 2^(−1)B₀^(−1/2)(2π)^(1/2)[φ(k* − z(s)) + φ(k* + z(s))] exp(B₀z²(s)/2).   (2.3.15)

Combine (2.3.12) and (2.3.15) to get (2.3.9). Next, for k̂ = 0,

m(y(s)|q̂)/m(y(s)|π₀) = (2πσ²)^(−n/2) exp[−(1/(2σ²))Σ_{i∈s}(y_i − μ₀)²] / {(2πσ²)^(−n/2)B₀^(1/2) exp[−(1/(2σ²)){Σ_{i∈s}(y_i − μ₀)² − n(1 − B₀)(ȳ(s) − μ₀)²}]} = B₀^(−1/2) exp[−(1 − B₀)z²(s)/2].   (2.3.16)

Combine (2.3.12) and (2.3.16) to get (2.3.10). This completes the proof of the theorem.

From Theorem 2.3.1, the robust Bayes estimator of γ(y) = N⁻¹Σ_{i=1}^N y_i is

E[γ(y)|s, y(s)] = fȳ(s) + (1 − f)[λ_SU(z(s)){(1 − B₀)ȳ(s) + B₀μ₀}


+ (1 − λ_SU(z(s))) ∫ y_i m(y|q̂) dỹ(s̄)/m(y(s)|q̂)], i ∉ s.   (2.3.17)

But for i ∉ s and k̂ > 0,

∫ y_i m(y|q̂) dỹ(s̄)/m(y(s)|q̂) = [∫∫ y_i (2πσ²)^(−N/2) exp(−(1/(2σ²))Σ_{j=1}^N (y_j − θ)²) q̂(dθ) dỹ(s̄)] / [∫∫ (2πσ²)^(−N/2) exp(−(1/(2σ²))Σ_{j=1}^N (y_j − θ)²) q̂(dθ) dỹ(s̄)]
= ȳ(s) − (σ/√n)[φ(k* − z(s)) − φ(k* + z(s))]/[Φ(k* − z(s)) − Φ(−k* − z(s))]
= ȳ(s) − (σ/√n)[φ(k* − z(s)) − φ(k* + z(s))]/[k*{φ(k* − z(s)) + φ(k* + z(s))}]
= ȳ(s) − (σ/(k*√n)) tanh(k*z(s)),   (2.3.18)

where the penultimate equality in (2.3.18) uses (2.3.5). From (2.3.17) and (2.3.18), one gets for k̂ > 0,

δ^SU(s, y(s)) = ȳ(s) − (1 − f)[λ₁B₀(ȳ(s) − μ₀) + (1 − λ₁)(σ/(k*√n)) tanh(k*z(s))].   (2.3.19)

Similarly, for k̂ = 0, noting that tanh(k*z(s))/k* → z(s) as k* → 0, one gets after some simplifications

δ^SU(s, y(s)) = ȳ(s) − (1 − f)(1 − λ₂(1 − B₀))(ȳ(s) − μ₀).   (2.3.20)


Also, generalizing the formula (1.8) in Berger and Berliner (1986), one gets after some heavy algebra, for k̂ > 0,

V(γ(y)|s, y(s)) = N⁻²[(N − n)σ² + (N − n)²{σ²λ₁/(M₀ + n) + (1 − λ₁)(σ²/(k*n)) tanh(k*z(s))(z(s) − (1/k*) tanh(k*z(s))) + λ₁(1 − λ₁){B₀(ȳ(s) − μ₀) − (σ/(k*√n)) tanh(k*z(s))}²}],   (2.3.21)

while for k̂ = 0,

V(γ(y)|s, y(s)) = N⁻²[(N − n)σ² + (N − n)²{σ²λ₂/(M₀ + n) + λ₂(1 − λ₂)(1 − B₀)²(ȳ(s) − μ₀)²}].   (2.3.22)

Next we study the posterior and procedure robustness of the robust Bayes estimators proposed in this section. For posterior robustness, calculations similar to those of the previous section provide, for k̂ > 0,

ρ(ξ_{τ²}, (s, y(s)), δ^SU) − ρ(ξ_{τ²}, (s, y(s)), δ^B) = (1 − f)²{(B₀λ₁ − B)(ȳ(s) − μ₀) + (1 − λ₁)(σ/(k*√n)) tanh(k*z(s))}²,   (2.3.23)

while for k̂ = 0,

ρ(ξ_{τ²}, (s, y(s)), δ^SU) − ρ(ξ_{τ²}, (s, y(s)), δ^B) = (1 − f)²{1 − λ₂(1 − B₀) − B}²(ȳ(s) − μ₀)².   (2.3.24)

Using Definition 2.2.1, one gets for k̂ > 0,


POR_Γ(δ^SU) = (1 − f)² max[{B₀λ₁(ȳ(s) − μ₀) + (1 − λ₁)(σ/(k*√n)) tanh(k*z(s))}², {(B₀λ₁ − 1)(ȳ(s) − μ₀) + (1 − λ₁)(σ/(k*√n)) tanh(k*z(s))}²],   (2.3.25)

while for k̂ = 0,

POR_Γ(δ^SU) = (1 − f)² max[(1 − B₀)²λ₂², {1 − λ₂(1 − B₀)}²](ȳ(s) − μ₀)².   (2.3.26)

In order to study the behavior of the difference in the Bayes risks of δ^SU and δ^B under the subjective N(μ₀, τ²) prior, we need the following lemma, which leads to the asymptotic relationship between k* and |z(s)| as |z(s)| → ∞.

Lemma 2.3.2. (i) For large |z(s)|, say |z(s)| ≥ M₀(> 0), there exists c₀ (0 < c₀ < 1) such that c₀/k* ≤ φ(k* − |z(s)|) + φ(k* + |z(s)|) ≤ 1/k*; (ii) k* − |z(s)| → +∞ as |z(s)| → +∞; (iii) for |z(s)| ≥ M₁, 0 < k* − |z(s)| ≤ c*(log k*)^(1/2) for some c*(> 0).

Proof of Lemma 2.3.2. (i) From (2.3.5),

φ(k* − |z(s)|) + φ(k* + |z(s)|) = [Φ(k* − |z(s)|) − Φ(−k* − |z(s)|)]/k* ≤ 1/k*.   (2.3.27)

Also, since k* > |z(s)|,

Φ(k* − |z(s)|) − Φ(−k* − |z(s)|) = Φ(k* + |z(s)|) − Φ(|z(s)| − k*) ≥ Φ(|z(s)|) − Φ(0) ≥ Φ(M₀) − 1/2 = c₀ (say).   (2.3.28)

Hence, combining (2.3.27) and (2.3.28), one gets (i).

(ii) From (i),

lim_{|z(s)|→∞} [φ(k* − |z(s)|) + φ(k* + |z(s)|)] = 0


since k* > |z(s)| → ∞ as |z(s)| → ∞. Hence, lim_{|z(s)|→∞} φ(k* − |z(s)|) = 0, that is,

lim_{|z(s)|→∞} (2π)^(−1/2) exp[−(k* − |z(s)|)²/2] = 0.

This implies lim_{|z(s)|→∞} (k* − |z(s)|)² = +∞, which implies (ii), since k* > |z(s)| for |z(s)| > 1.

(iii) Using (2.3.5) and (ii),

lim_{|z(s)|→∞} k*φ(k* − |z(s)|) = 1.

This is equivalent to 0 = lim_{|z(s)|→∞} [log(k*/√(2π)) − (k* − |z(s)|)²/2], which implies (k* − |z(s)|)² = 2 log(k*/√(2π)) + o(1) as |z(s)| → ∞. Consequently,

k* − |z(s)| = [2 log(k*/√(2π)) + o(1)]^(1/2) = (2 log k*)^(1/2)(1 + o(1)) as |z(s)| → ∞.

Hence, there exists M₁ such that for |z(s)| ≥ M₁, k* − |z(s)| ≤ c*(log k*)^(1/2) for some c* > 0. This completes the proof of the lemma.


In order to study the procedure robustness of δ^SU, first find, under the N(μ₀, τ²) prior (denoted by ξ_B),

r(ξ_B, δ^SU) − r(ξ_B, δ^B) = E[δ^SU(s, y(s)) − δ^B(s, y(s))]²
= (1 − f)²E[{λ₁B₀(ȳ(s) − μ₀) + (1 − λ₁)(σ/(k*√n)) tanh(k*z(s))}I_[k̂>0] + (1 − λ₂(1 − B₀))(ȳ(s) − μ₀)I_[k̂=0] − B(ȳ(s) − μ₀)]².   (2.3.29)

We shall now prove the following theorem.

Theorem 2.3.2. r(ξ_B, δ^SU) − r(ξ_B, δ^B) = O(B^(1/2)).

Proof of Theorem 2.3.2. First use the inequality

rhs of (2.3.29) ≤ 3(1 − f)²E[{λ₁B₀(ȳ(s) − μ₀) + (1 − λ₁)(σ/(k*√n)) tanh(k*z(s))}²I_[k̂>0] + (1 − λ₂(1 − B₀))²(ȳ(s) − μ₀)²I_[k̂=0] + B²(ȳ(s) − μ₀)²].   (2.3.30)

Next observe that

E[B²(ȳ(s) − μ₀)²] = E[B²σ²(nB)^(−1)χ²₁] = Bσ²/n = O_e(B),   (2.3.31)

and

E[(1 − λ₂(1 − B₀))²(ȳ(s) − μ₀)²I_[k̂=0]] = E[(1 − λ₂(1 − B₀))²(ȳ(s) − μ₀)²I_[z²(s)≤1]].   (2.3.32)

On the set [z²(s) ≤ 1],

1 − λ₂(1 − B₀) = {B₀ + g exp(−(1 − B₀)z²(s)/2)}/{1 + g exp(−(1 − B₀)z²(s)/2)} ≤ 1,

while under ξ_B, z²(s) = n(ȳ(s) − μ₀)²/σ² is distributed as B^(−1)χ²₁. Hence

E[(1 − λ₂(1 − B₀))²(ȳ(s) − μ₀)²I_[z²(s)≤1]] ≤ (σ²/(nB))E[χ²₁I_[χ²₁≤B]] ≤ (σ²/(nB))(2π)^(−1/2)∫₀^B x^(1/2) dx = (2/3)(2π)^(−1/2)(σ²/n)B^(1/2) = O(B^(1/2)).   (2.3.34)

Next,

E[{λ₁B₀(ȳ(s) − μ₀) + (1 − λ₁)(σ/(k*√n)) tanh(k*z(s))}²I_[k̂>0]] ≤ 2E[{λ₁²B₀²(ȳ(s) − μ₀)² + (1 − λ₁)²σ²(k*)^(−2)n^(−1) tanh²(k*z(s))}I_[k̂>0]].   (2.3.35)

Now, writing g′ = g√(2π),

E[λ₁²B₀²(ȳ(s) − μ₀)²I_[k̂>0]] = (σ²B₀²/n)E[λ₁²z²(s)I_[z²(s)>1]].   (2.3.36)

Let K = max(M₀, M₁, 2). Then, writing g″ = c₀g′/2 and using (i) of Lemma 2.3.2, one has λ₁ ≤ k*{k* + g″ exp(B₀z²(s)/2)}^(−1) whenever z²(s) > K².


Accordingly,

E[λ₁²z²(s)I_[z²(s)>1]] ≤ E[z²(s)I_[1<z²(s)≤K²]] + E[λ₁z²(s)I_[z²(s)>K²]] ≤ B^(−1)E[χ²₁I_[B<χ²₁≤K²B]] + E[k*z²(s){k* + g″ exp(B₀z²(s)/2)}^(−1)I_[z²(s)>K²]].   (2.3.37)

But

B^(−1)E[χ²₁I_[B<χ²₁≤K²B]] = B^(−1)∫_B^{K²B} x^(3/2−1) exp(−x/2)(2^(1/2)Γ(1/2))^(−1) dx ≤ (2π)^(−1/2)K³B^(1/2).   (2.3.38)

Also, using (iii) of Lemma 2.3.2 together with k* + g″ exp(B₀z²(s)/2) ≥ max{g″ exp(B₀z²(s)/2), 2(k*g″)^(1/2) exp(B₀z²(s)/4)},

E[k*z²(s){k* + g″ exp(B₀z²(s)/2)}^(−1)I_[z²(s)>K²]]
≤ E[z²(s){|z(s)| + c*(log k*)^(1/2)}{k* + g″ exp(B₀z²(s)/2)}^(−1)I_[z²(s)>K²]]
≤ E[{|z(s)|³/(g″ exp(B₀z²(s)/2))}I_[z²(s)>K²]] + E[c*z²(s){2(g″)^(1/2) exp(B₀z²(s)/4)}^(−1)I_[z²(s)>K²]].   (2.3.39)

Now

E[|z(s)|³ exp(−B₀z²(s)/2)I_[z²(s)>K²]] = E[(χ²₁/B)^(3/2) exp(−B₀χ²₁/(2B))I_[χ²₁>K²B]]
≤ (2π)^(−1/2)B^(−3/2)∫₀^∞ x exp(−(x/2)(1 + B₀/B)) dx = (2π)^(−1/2)B^(−3/2) · 4(B₀/B + 1)^(−2) = O(B^(1/2)).   (2.3.40)


Moreover,

E[z²(s) exp(−B₀z²(s)/4)I_[z²(s)>K²]] = E[(χ²₁/B) exp(−B₀χ²₁/(4B))I_[χ²₁>K²B]]
≤ B^(−1)(2π)^(−1/2)∫₀^∞ x^(1/2) exp(−(x/2)(1 + B₀/(2B))) dx = B^(−1)(2π)^(−1/2)Γ(3/2){2/(1 + B₀/(2B))}^(3/2) = O(B^(1/2)).   (2.3.41)

Combine (2.3.37)–(2.3.41) to conclude that

E[λ₁²B₀²(ȳ(s) − μ₀)²I_[k̂>0]] = O(B^(1/2)).   (2.3.42)

Finally,

E[(1 − λ₁)²σ²(k*)^(−2)n^(−1) tanh²(k*z(s))I_[k̂>0]] ≤ (σ²/n)E[(k*)^(−2) tanh²(k*z(s))I_[|z(s)|>1]]
≤ (σ²/n)E[z²(s)I_[1<|z(s)|≤K] + (k*)^(−2)I_[|z(s)|>K]],   (2.3.43)

where in the final inequality of (2.3.43) we use |tanh(k*z(s))| ≤ k*|z(s)| for 1 < |z(s)| ≤ K and |tanh(k*z(s))| ≤ 1 for |z(s)| > K. As before,

E[z²(s)I_[1<|z(s)|≤K]] = O(B^(1/2)).   (2.3.44)

Also, using the fact that k* > |z(s)| (Lemma 2.3.1),

E[(k*)^(−2)I_[|z(s)|>K]] ≤ E[(z²(s))^(−1)I_[|z(s)|>K]] = E[B(χ²₁)^(−1)I_[χ²₁>K²B]] = B(2π)^(−1/2)∫_{K²B}^∞ x^(−3/2) exp(−x/2) dx


≤ B(2π)^(−1/2)∫_{K²B}^∞ x^(−3/2) dx = 2B(2π)^(−1/2)(K²B)^(−1/2) = O(B^(1/2)).   (2.3.45)

From (2.3.43)–(2.3.45), lhs of (2.3.43) = O(B^(1/2)). Combine (2.3.30), (2.3.31), (2.3.32), (2.3.34), (2.3.35), (2.3.42), (2.3.43) and (2.3.45) to get the theorem.

Remark 2.3.2. It follows from the above theorem that as n → ∞, i.e. B → 0, under the subjective N(μ₀, τ²) prior, δ^SU, the robust Bayes estimator of γ(y) = N⁻¹Σ_{i=1}^N y_i, is asymptotically optimal in the sense of Robbins (1955).

Next, to find the range of E[γ(y)|s, ȳ(s)] over Γ* in (2.3.1), once again applying Sivaganesan and Berger (1989) or Sivaganesan (1988), we get the following theorem.

Theorem 2.3.3.

sup_{π∈Γ*} E[γ(y)|s, ȳ(s)] = fȳ(s) + (1 − f)(a₁ + H₁(z_u))/(a + H(z_u))   (2.3.46)

and

inf_{π∈Γ*} E[γ(y)|s, ȳ(s)] = fȳ(s) + (1 − f)(a₁ + H₁(z_l))/(a + H(z_l)),   (2.3.47)

where a = ε^(−1)(1 − ε)m(ȳ(s)|π₀), a₁ = a δ^{π₀}(ȳ(s)),

H(z) = (2z)^(−1)∫_{μ₀−z}^{μ₀+z} f(ȳ(s)|θ) dθ if z ≠ 0, = f(ȳ(s)|μ₀) if z = 0;   (2.3.48)

H₁(z) = (2z)^(−1)∫_{μ₀−z}^{μ₀+z} θf(ȳ(s)|θ) dθ if z ≠ 0, = μ₀f(ȳ(s)|μ₀) if z = 0,   (2.3.49)


while the values of z_l and z_u are given by the solutions of the equation ∂/∂z[(a₁ + H₁(z))/(a + H(z))] = 0, which after some algebra reduces to a transcendental equation in z, (2.3.50), expressible in terms of the quantities t = Φ((μ₀ + z − ȳ(s))/(σ/√n)) − Φ((μ₀ − z − ȳ(s))/(σ/√n)), u = φ((μ₀ + z − ȳ(s))/(σ/√n)) − φ((μ₀ − z − ȳ(s))/(σ/√n)), v = φ((μ₀ + z − ȳ(s))/(σ/√n)) + φ((μ₀ − z − ȳ(s))/(σ/√n)), and w = exp[−(1/(2σ²))Σ_{i∈s}(y_i − ȳ(s))²](2πσ²)^(−(n−1)/2)/√n. Note that the equation (2.3.50) may be iteratively solved for z by taking a number larger than δ^{π₀}(ȳ(s)) as the initial value of z when maximizing, and a number smaller than δ^{π₀}(ȳ(s)) as the initial value of z when minimizing.

2.4 An Example

This section concerns the analysis of a real data set from Fortune magazine, May 1975 and May 1976, to illustrate the methods suggested in Sections 2.2 and 2.3. The data set consists of 331 corporations in the US with 1975 gross sales, in billions, between one-half billion and ten billion dollars. For the complete finite population, the population mean is 1.7283 and the population variance 2.2788; the population variance is assumed known. We select a 10% simple random sample without replacement from this finite population, so that the sample size is n = 33, and the sample mean and the corresponding standard error are easily obtained. We use gross sales in the previous year as prior information to elicit the base prior π₀. The elicited prior π₀ is the N(1.6614, 6.4351×10⁻³) distribution. Under this elicited prior π₀, we use formulas (2.2.1) and (2.2.2) to obtain the subjective Bayes estimate and the associated posterior variance. Since we have some uncertainty in π₀ and the prior information, we choose ε = .1 in Γ* and obtain the robust Bayes estimates and the associated posterior variances using the formulas (2.2.16), (2.2.17), (2.3.19) and (2.3.21). A number of samples were tried, and we report our analysis for one sample. Table 2.1 provides the classical estimate δ_C, the subjective Bayes estimate δ⁰, the robust


Table 2.1. Estimates, Associated Standard Errors and Posterior Robustness Index

  Estimate   Value    SE       |γ(y) − δ|     POR
  δ_C        1.9881   0.2852   0.2598         8.6506×10⁻²
  δ⁰         1.7191   0.0105   9.2187×10⁻³    7.2386×10⁻²
  δ^RB       1.7704   0.1459   4.2075×10⁻²    4.7416×10⁻²
  δ^SU       1.7313   0.1184   3.0239×10⁻³    6.5948×10⁻²

Bayes estimate δ^RB with all possible contaminations, the robust Bayes estimate δ^SU with all symmetric unimodal contaminations, and the respective associated standard errors. Table 2.1 also provides the posterior robustness index for each estimate, which is in a sense the sensitivity index of the estimate as the prior varies over Γ. From Table 2.1, we find that the robust Bayes estimates δ^SU and δ^RB are well behaved in the sense that they are closer to γ(y) than at least the classical estimate δ_C, and δ^SU is closer to γ(y) than even the subjective Bayes estimate δ⁰. The robust Bayes estimates δ^SU and δ^RB are also good from the viewpoint of the posterior robustness index. For the range of the posterior mean of γ(y) = N⁻¹Σ_{i=1}^N y_i over Γ*, we find that the posterior mean lies in the interval (1.7120, 1.7868) in the case of all possible contaminations and in the interval (1.7166, 1.7406) in the case of all symmetric unimodal contaminations. Observe that the second interval is much shorter than the first. Also note that robust inference is achieved for Γ* in either case. So if we feel that the true prior is close to a specific one, say π₀, we should model through Γ*, using one or the other contamination class and gaining robustness in each case as compared to the use of a single subjective prior.
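As a check, the estimates δ⁰ and δ^SU in Table 2.1 can be reproduced from the summary figures reported above, using B₀ = M₀/(M₀ + n) with M₀ = σ²/τ₀², z(s) = √n(ȳ(s) − μ₀)/σ, and formulas (2.3.5), (2.3.9) and (2.3.19). A minimal Python sketch (the bisection bracket [|z(s)|, 50] is an implementation choice, not from the text):

```python
import math

N, n = 331, 33                    # population and sample size
f = n / N                         # sampling fraction
sigma2 = 2.2788                   # known population variance
mu0, tau0_2 = 1.6614, 6.4351e-3   # elicited N(mu0, tau0^2) base prior
eps = 0.1                         # contamination proportion
ybar = 1.9881                     # sample mean (= delta_C in Table 2.1)

B0 = sigma2 / (sigma2 + n * tau0_2)
z = math.sqrt(n) * (ybar - mu0) / math.sqrt(sigma2)

# subjective Bayes estimate delta^0 under pi_0
d0 = f * ybar + (1 - f) * ((1 - B0) * ybar + B0 * mu0)

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# k* from (2.3.5) by bisection on [|z|, 50]; here |z(s)| > 1, so k_hat > 0
def h(k):
    return Phi(k - z) - Phi(-k - z) - k * (phi(k - z) + phi(k + z))

lo, hi = abs(z), 50.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if h(mid) < 0 else (lo, mid)
kstar = 0.5 * (lo + hi)

# lambda_1 from (2.3.9) and delta^SU from (2.3.19)
g = (eps / (1 - eps)) / math.sqrt(B0)
lam1 = 1.0 / (1.0 + g * math.sqrt(2 * math.pi) * 0.5
              * (phi(kstar - z) + phi(kstar + z)) * math.exp(B0 * z * z / 2))
d_su = ybar - (1 - f) * (lam1 * B0 * (ybar - mu0)
                         + (1 - lam1) * math.sqrt(sigma2)
                         / (kstar * math.sqrt(n)) * math.tanh(kstar * z))
```

With the reported inputs, this yields δ⁰ ≈ 1.7191 and δ^SU ≈ 1.7313, in agreement with Table 2.1.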


CHAPTER 3
ROBUST BAYES COMPETITORS OF THE RATIO ESTIMATOR

3.1 Introduction

For most sample surveys, for every unit i in the finite population, information is available on one or more auxiliary characteristics, that is, characteristics other than the one of direct interest. For example, if the characteristic of direct interest is the yield of a particular crop, the auxiliary characteristic could be the area devoted to that crop by the different farms in the list. We consider the simplest situation, when for every unit i in the population the value of a certain auxiliary characteristic, say x_i(> 0), is known (i = 1, 2, ..., N). The classical estimator of the finite population mean N⁻¹Σ_{i=1}^N y_i in such cases is the ratio estimator e_R = N⁻¹(Σ_{i∈s}y_i/Σ_{i∈s}x_i)Σ_{i=1}^N x_i, which seems to incorporate the auxiliary information in a very natural manner. Moreover, this estimator can be justified both from the model- and the design-based approach. While Cochran (1977) provides many design-based properties of the ratio estimator, Royall (1970, 1971) justifies this estimator on the basis of certain superpopulation models.

The ratio estimator can also be justified from a Bayesian viewpoint. To see this, consider the superpopulation model y_i = βx_i + e_i, where the e_i are independent N(0, σ²x_i), i = 1, 2, ..., N, while β is uniform over (−∞, ∞). In the above, σ²(> 0) may or may not be known. For unknown σ², one assigns an independent prior distribution to σ².


Then based on s and y(s), the posterior (conditional) mean of γ(y) = N⁻¹Σ_{i=1}^N y_i is given by e_R.

It is clear from the above that the ratio estimator can possibly be improved upon when one has more precise prior information about β. For example, if one wants to predict the mean yield of a certain crop based on a sample of farms, it is conceivable, on the basis of past experience, to specify a prior distribution for β with fairly accurate mean β₀ and variance τ₀². Ericson (1969) has indeed shown that with a N(β₀, τ₀²) prior for β, the Bayes predictor e_E of γ(y) is given by

e_E(s, y(s)) = N⁻¹[Σ_{i∈s}y_i + {(σ⁻²Σ_{i∈s}y_i + τ₀⁻²β₀)/(σ⁻²Σ_{i∈s}x_i + τ₀⁻²)}Σ_{i∉s}x_i].   (3.1.1)

Clearly, e_E converges to e_R as τ₀² → ∞, that is, when the prior information is vague.

The above Bayesian approach has frequently been criticized on the ground that it presumes an ability to quantify subjective information completely and accurately in terms of a single prior distribution. We shall see in the next section that failure to specify accurately one or more of the parameters β₀ and τ₀² can have serious consequences when calculating the Bayes risk, and protection is often needed against the possibility of such an occurrence. A robust Bayesian viewpoint assumes that subjective information can be quantified only in terms of a class Γ of possible distributions. Inferences and decisions should be relatively insensitive to deviations as the prior distribution varies over Γ.

In this chapter, we consider an ε-contamination class of priors for β following the lines of Berger and Berliner (1986). In Section 3.2, the ε-contamination class includes all possible distributions. For every member within this class, the Bayes predictor (posterior mean) of γ(y) is obtained, and expressions for the variation of the posterior mean within this class of priors are given following Sivaganesan and Berger (1989) and Sivaganesan (1988).
Moreover, the ML-II prior (see Good (1965) or Berger and Berliner (1986)) within this contamination class of priors is found. We also provide


analytical expressions for the indices of posterior and procedure robustness (cf. Berger (1984)) of the proposed robust Bayes predictors based on ML-II priors for an entire class of priors, and compare these indices with similar indices found for the usual ratio estimator as well as for the subjective Bayes predictor. In the course of calculating the indices of procedure robustness, we prove the asymptotic optimality, in the sense of Robbins (1955), of the robust Bayes predictor, and also point out that the subjective Bayes predictor does not possess the asymptotic optimality property. The above program is repeated in Section 3.3, with the exception that the contamination class now contains only symmetric unimodal distributions. Once again, robust Bayes predictors are found, and their optimality is studied. Finally, in Section 3.4, a numerical example is provided to illustrate the results of the preceding sections.

The summary of our findings is that the robust Bayes predictors enjoy both posterior and procedure robustness for a fairly general class of priors. The subjective Bayes predictor enjoys posterior robustness, but fails miserably under the criterion of procedure robustness. The classical ratio estimator is inferior to the others under the criterion of posterior robustness, but enjoys a certain amount of procedure robustness. Thus, our recommendation is to give the robust Bayes predictors every serious consideration if one is concerned with both Bayesian and frequentist robustness. The other important finding is that when the sampling fraction goes to zero, that is, when we are essentially back to an infinite population, several asymptotic optimality properties of the estimators of Berger and Berliner (1986) are established which are hitherto unknown in the literature.

Royall and Pfeffermann (1982) have addressed a different robustness issue.
They have shown that under the superpopulation models (i) y_i independent N(α + βx_i, σ²x_i) and (ii) y_i independent N(βx_i, σ²x_i), if β is uniform over (−∞, ∞), then under balanced samples, that is, x̄(s) = x̄(s̄), the posterior distribution of ỹ(s̄) given (s, y(s)) is the same under


the two models, the resulting estimator of the finite population mean being the ratio estimator, which equals the sample mean in this case.

For simplicity, we shall assume that n(s) ≠ n ⟹ p(s) = 0, that is, we effectively consider only samples of fixed size n. Also, for notational simplicity, we shall henceforth assume that s = {i₁, ..., i_n}, where 1 ≤ i₁ < ··· < i_n ≤ N. Let s̄ = {1, 2, ..., N} − s = {j₁, ..., j_{N−n}} (say), where 1 ≤ j₁ < ··· < j_{N−n} ≤ N. We shall write y(s) = (y_{i₁}, ..., y_{i_n})ᵀ, ỹ(s̄) = (y_{j₁}, ..., y_{j_{N−n}})ᵀ, x(s) = (x_{i₁}, ..., x_{i_n})ᵀ, D(s) = Diag(x_{i₁}, ..., x_{i_n}), x(s̄) = (x_{j₁}, ..., x_{j_{N−n}})ᵀ, and D(s̄) = Diag(x_{j₁}, ..., x_{j_{N−n}}). Note that writing M₀ = σ²/τ₀², B₀ = B₀(s) = M₀(M₀ + nx̄(s))⁻¹ and f = n/N (the sampling fraction), the Bayes predictor e_E given in (3.1.1) can alternately be written as

e_{B₀}(s, y(s)) = fȳ(s) + (1 − f)x̄(s̄)[(1 − B₀(s))ȳ(s)/x̄(s) + B₀(s)β₀].   (3.1.2)

In later sections, we shall repeatedly use (3.1.2). Also, the associated posterior variance is given by

V(γ(y)|s, y(s)) = σ²N⁻²[(N − n)x̄(s̄) + (N − n)²x̄²(s̄)/(M₀ + nx̄(s))].   (3.1.3)

3.2 ε-Contamination Model and the ML-II Prior

Suppose that conditional on β, the y_i are independent N(βx_i, σ²x_i), i = 1, 2, ..., N. Also, let β have the prior distribution π = (1 − ε)π₀ + εq, where ε ∈ [0, 1) is known, π₀ is the N(β₀, τ₀²) distribution, and q ∈ Q, the class of all possible distributions. Then the marginal pdf of y(s) is given by

m(y(s)|π) = (1 − ε)m(y(s)|π₀) + εm(y(s)|q),   (3.2.1)

where m(y(s)|π₀) denotes the N(β₀x(s), σ²D(s) + τ₀²x(s)xᵀ(s)) pdf, while

m(y(s)|q) = ∫(2πσ²)^(−n/2)(Π_{i∈s}x_i^(−1/2)) exp[−(1/(2σ²))Σ_{i∈s}(y_i − βx_i)²/x_i] q(dβ).   (3.2.2)
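As a numerical sanity check, the identity between Ericson's form (3.1.1) and the shrinkage form (3.1.2), together with the convergence of e_E to the ratio predictor as τ₀² → ∞, can be verified on a small data set; a sketch (all x, y and prior values below are illustrative, not from the text):

```python
# Check that Ericson's predictor (3.1.1) equals the shrinkage form (3.1.2),
# and tends to the ratio predictor e_R as tau0^2 -> infinity (vague prior).
N = 6
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]     # auxiliary values, all known
y_s = [1.2, 2.1, 2.9]                  # sampled y's (units 0, 1, 2)
s, sbar = [0, 1, 2], [3, 4, 5]
n = len(s)
sigma2, beta0, tau0_2 = 1.0, 0.9, 0.5  # illustrative model/prior values

sum_y = sum(y_s)
sum_xs = sum(x[i] for i in s)
sum_xsbar = sum(x[i] for i in sbar)

# (3.1.1): posterior mean of beta is a precision-weighted combination
beta_post = (sum_y / sigma2 + beta0 / tau0_2) / (sum_xs / sigma2 + 1.0 / tau0_2)
e_E = (sum_y + beta_post * sum_xsbar) / N

# (3.1.2): the same predictor via B0 = M0/(M0 + n*xbar(s)), M0 = sigma2/tau0_2
f = n / N
xbar_s, xbar_sbar, ybar_s = sum_xs / n, sum_xsbar / (N - n), sum_y / n
M0 = sigma2 / tau0_2
B0 = M0 / (M0 + n * xbar_s)
e_B0 = f * ybar_s + (1 - f) * xbar_sbar * ((1 - B0) * ybar_s / xbar_s + B0 * beta0)

# ratio predictor, and the vague-prior limit of (3.1.1)
e_R = (sum_y + (sum_y / sum_xs) * sum_xsbar) / N
tau_vague = 1e12
beta_vague = (sum_y / sigma2 + beta0 / tau_vague) / (sum_xs / sigma2 + 1.0 / tau_vague)
e_E_vague = (sum_y + beta_vague * sum_xsbar) / N
```

The agreement of e_E and e_{B₀} is exact (the two formulas are algebraically identical), while e_E approaches e_R as the prior variance grows.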


The posterior pdf of β given (s, y(s)) is now

π(β|s, y(s)) = λ(y(s))π₀(β|s, y(s)) + (1 − λ(y(s)))q(β|s, y(s)),   (3.2.3)

where

λ(y(s)) = (1 − ε)m(y(s)|π₀)/m(y(s)|π),   (3.2.4)

π₀(β|s, y(s)) denotes the N((1 − B₀(s))ȳ(s)/x̄(s) + B₀(s)β₀, σ²(M₀ + nx̄(s))⁻¹) pdf, and

q(β|s, y(s)) = f(y(s)|β)q(β)/m(y(s)|q).   (3.2.5)

This leads to the posterior pdf of ỹ(s̄) given (s, y(s)) as

π(ỹ(s̄)|s, y(s)) = ∫f(ỹ(s̄)|β)π(β|s, y(s)) dβ = λ(y(s))π₀(ỹ(s̄)|s, y(s)) + (1 − λ(y(s)))q(ỹ(s̄)|s, y(s)),   (3.2.6)

where π₀(ỹ(s̄)|s, y(s)) is the N(((1 − B₀(s))ȳ(s)/x̄(s) + B₀(s)β₀)x(s̄), σ²(D(s̄) + (M₀ + nx̄(s))⁻¹x(s̄)xᵀ(s̄))) pdf, while, using (3.2.5),

q(ỹ(s̄)|s, y(s)) = m(y|q)/m(y(s)|q).   (3.2.7)

Then the posterior mean of γ(y) is given by

E[γ(y)|s, y(s)] = fȳ(s) + (1 − f)x̄(s̄)[(1 − ε)m(y(s)|π₀)δ^{π₀}(y(s)) + ε∫βf(y(s)|β)q(dβ)]/[(1 − ε)m(y(s)|π₀) + εm(y(s)|q)],   (3.2.8)

where

δ^{π₀}(y(s)) = (1 − B₀(s))ȳ(s)/x̄(s) + B₀(s)β₀.   (3.2.9)

From Sivaganesan and Berger (1989) or Sivaganesan (1988), it follows that

sup(inf)_{π∈Γ} E[γ(y)|s, y(s)] = fȳ(s) + (1 − f)x̄(s̄) sup(inf)_β [(1 − ε)m(y(s)|π₀)δ^{π₀}(y(s)) + εβf(y(s)|β)]/[(1 − ε)m(y(s)|π₀) + εf(y(s)|β)].   (3.2.10)


Hence, following Sivaganesan (1988), we get

sup_{π∈Γ} E[γ(y)|s, y(s)] = fȳ(s) + (1 − f)x̄(s̄)[(1 − ε)m(y(s)|π₀)δ^{π₀}(y(s)) + εβ_U f(y(s)|β_U)]/[(1 − ε)m(y(s)|π₀) + εf(y(s)|β_U)],   (3.2.11)

while inf_{π∈Γ} E[γ(y)|s, y(s)] has an expression similar to (3.2.11) with β_L replacing β_U. In the above, β_U and β_L (β_L < β_U) are given by β_U = ȳ(s)/x̄(s) + v_U σ(nx̄(s))^(−1/2), β_L = ȳ(s)/x̄(s) + v_L σ(nx̄(s))^(−1/2), where v_U and v_L (< v_U) are solutions in v of the equation

exp(−v²/2) − c(v² − 1) − bv = 0,   (3.2.12)

where c = a(2πσ²)^(n/2)(Π_{i∈s}x_i^(1/2)) exp[(1/(2σ²))Σ_{i∈s}(y_i − x_iȳ(s)/x̄(s))²/x_i], b = c(nx̄(s))^(1/2)(ȳ(s)/x̄(s) − δ^{π₀}(y(s)))/σ, and a = ε^(−1)(1 − ε)m(y(s)|π₀). We shall use (3.2.11) in Section 3.4 for numerical evaluation of the supremum and infimum of the posterior mean under the given class of priors.

Next we find the ML-II prior within the given class of priors. Since Σ_{i∈s}(y_i − βx_i)²/x_i is minimized with respect to β at β̂_s = ȳ(s)/x̄(s), from (3.2.1) and (3.2.2), the ML-II prior, which maximizes the marginal likelihood m(y(s)|π) with respect to q ∈ Q, is given by

π_ML(β) = (1 − ε)π₀(β) + εq_s(β),   (3.2.13)

where q_s(β) is degenerate at β = β̂_s. The posterior pdf of ỹ(s̄) given (s, y(s)) under the ML-II prior π_ML is now given by

π_ML(ỹ(s̄)|s, y(s)) = λ_ML(y(s))π₀(ỹ(s̄)|s, y(s)) + (1 − λ_ML(y(s))) N((ȳ(s)/x̄(s))x(s̄), σ²D(s̄)),   (3.2.14)


where for 0 < ε < 1, after some algebraic simplifications,

λ_ML^(−1)(y(s)) = 1 + ε(1 − ε)^(−1)m(y(s)|q_s)/m(y(s)|π₀) = 1 + ε(1 − ε)^(−1)B₀^(−1/2)(s) exp(nB₀(s)(ȳ(s) − β₀x̄(s))²/(2σ²x̄(s))).   (3.2.15)

The robust Bayes predictor of γ(y) under the ML-II prior π_ML then simplifies to

e_RB(s, y(s)) = fȳ(s) + (1 − f)x̄(s̄){(1 − λ_ML(y(s))B₀(s))ȳ(s)/x̄(s) + λ_ML(y(s))B₀(s)β₀}.   (3.2.16)

Also, generalizing the formula (1.8) in Berger and Berliner (1986), one gets the associated posterior variance, given by

V(γ(y)|s, y(s)) = N⁻²[σ²(N − n)x̄(s̄) + (N − n)²x̄²(s̄){σ²λ_ML/(M₀ + nx̄(s)) + λ_ML(1 − λ_ML)B₀²(s)(ȳ(s)/x̄(s) − β₀)²}].   (3.2.17)

We shall now compare the robust Bayes predictor e_RB of γ(y) with the Bayes predictor e_{B₀} given in (3.1.2) and the ratio estimator e_R in terms of posterior risks as well as the Bayes risks under the {N(β₀, τ²), τ² > 0} class of priors. For a typical member N(β₀, τ²) of this class, the Bayes predictor of γ(y) is given by

e_B(s, y(s)) = fȳ(s) + (1 − f)x̄(s̄)[(1 − B(s))ȳ(s)/x̄(s) + B(s)β₀],   (3.2.18)

where B = B(s) = M/(M + nx̄(s)), M = σ²/τ². The choice of the above class of priors may be justified as follows. Very often, based on prior elicitation, one can take a fairly accurate guess at the prior mean. However, the same need not necessarily be true for the prior variance, where there is


a greater chance of vagueness. Note that when τ² ≠ τ₀², none of the estimators e_R, e_{B₀} or e_RB is the optimal (Bayes) estimator.

Based on Definition 2.2.1 introduced in Chapter 2, we examine the posterior robustness indices of e_RB, e_{B₀} and e_R. Note that whether or not posterior robustness obtains will often depend on which (s, y(s)) is observed. This will be revealed in the subsequent calculations. To this end, first note that under the N(β₀, τ²) prior, denoted by ξ_{τ²}, the posterior risk of any estimator e of γ(y) is

ρ(ξ_{τ²}, (s, y(s)), e) = ρ(ξ_{τ²}, (s, y(s)), e_B) + (e − e_B)²,   (3.2.19)

where e_B is given in (3.2.18). Using Definition 2.2.1 and (3.2.19), one gets for the class Γ = {ξ_{τ²} : τ² > 0} of priors

POR_Γ(e_R) = sup_{0<B(s)<1} (1 − f)²x̄²(s̄)B²(s)[ȳ(s)/x̄(s) − β₀]² = (1 − f)²x̄²(s̄)[ȳ(s)/x̄(s) − β₀]²,   (3.2.20)

POR_Γ(e_{B₀}) = (1 − f)²x̄²(s̄) max[B₀²(s), (1 − B₀(s))²][ȳ(s)/x̄(s) − β₀]²,   (3.2.21)

and

POR_Γ(e_RB) = (1 − f)²x̄²(s̄) max[λ²_ML(y(s))B₀²(s), (1 − λ_ML(y(s))B₀(s))²][ȳ(s)/x̄(s) − β₀]².   (3.2.22)

The index in (3.2.20) involves no shrinkage at all, and can be quite

large. It follows from (3.2.20)–(3.2.22) that both e_{B₀} and e_RB are superior to e_R in terms of posterior robustness. However, the ratio

POR_Γ(e_{B₀})/POR_Γ(e_RB) = max[B₀², (1 − B₀)²]/max[λ²_ML(y(s))B₀², (1 − λ_ML(y(s))B₀)²]   (3.2.23)

can take values both larger and smaller than 1 depending on the particular (s, y(s)).

Although the Bayesian thinks conditionally on (s, y(s)), it seems quite sensible to use the overall Bayes risk as a suitable robustness criterion, at least at a preexperimental stage. This issue is also addressed in Berger (1984), who also introduced the criterion of procedure robustness. We use Definition 2.3.2 to study the procedure robustness indices of e_RB, e_{B₀} and e_R. Simple calculations yield for the class Γ = {ξ_{τ²} : τ² > 0} of priors

PR_Γ(e_R) = sup_{0<B(s)<1} (1 − f)²x̄²(s̄)σ²(nx̄(s))^(−1)B(s) = (1 − f)²x̄²(s̄)σ²(nx̄(s))^(−1),   (3.2.24)

while

PR_Γ(e_{B₀}) = sup_{0<B(s)<1} (1 − f)²x̄²(s̄)σ²(nx̄(s))^(−1)(B₀(s) − B(s))²B^(−1)(s) = +∞,   (3.2.25)

so that the subjective Bayes predictor e_{B₀} fails under the criterion of procedure robustness. For the robust Bayes predictor e_RB,

r(ξ_B, e_RB) − r(ξ_B, e_B) = (1 − f)²x̄²(s̄)E[(B₀(s)λ_ML(y(s)) − B)²(ȳ(s)/x̄(s) − β₀)²].   (3.2.26)

The order of (3.2.26) is given by the following theorem.

Theorem 3.2.1. r(ξ_B, e_RB) − r(ξ_B, e_B) = O_e(B^(1/2)) as B → 0, where O_e denotes the exact order.


Proof of Theorem 3.2.1. Noting that nx̄(s)(ȳ(s)/x̄(s) − β₀)² ~ (σ²/B)χ²₁, it follows from (3.2.26) that

E[(B₀(s)λ_ML(y(s)) − B)²(ȳ(s)/x̄(s) − β₀)²] = (σ²/(nx̄(s)B)) ∫₀^∞ {B₀/(1 + g exp(uB₀/(2B))) − B}² u exp(−u/2)u^(1/2−1)/(2^(1/2)Γ(1/2)) du,   (3.2.27)

where g = (ε/(1 − ε))B₀^(−1/2). Next observe that

rhs of (3.2.27) ≤ (σ²/(nx̄(s)B)) ∫₀^∞ [B₀²{1 + g exp(uB₀/(2B))}^(−2) + B²] u exp(−u/2)u^(1/2−1)/(2^(1/2)Γ(1/2)) du ≤ (σ²/(nx̄(s)))[B₀²(g²B)^(−1)(1 + 2B₀B^(−1))^(−3/2) + B] = O(B^(1/2)).   (3.2.28)

Again, writing g′ = max(g, 1),

rhs of (3.2.27) ≥ (σ²/(nx̄(s)))[(B₀/(2g′))²B^(−1)(1 + 2B₀B^(−1))^(−3/2) − 2B₀g^(−1)(1 + B₀B^(−1))^(−3/2) + B],   (3.2.29)

where the first term within brackets is of exact order B^(1/2) as B → 0. Combining (3.2.28) and (3.2.29), the result follows.

When f → 0, we get a result related to the procedure robustness of the robust Bayes procedure of Berger and Berliner (1986). To our knowledge, such a result is


the first of its kind. In addition, as n → ∞, that is, B → 0, it shows that the robust Bayes procedure is asymptotically optimal. In view of (3.2.24) and (3.2.26) and Theorem 3.2.1 (i.e., before taking the supremum over B), it appears that e_R has a distinct advantage over e_RB for small B. This is not surprising though, since small B signifies small M = σ²/τ², which amounts to instability in the assessment of the prior distribution of β.

3.3 Symmetric Unimodal Contamination

As in Section 2.3, we now take the contamination class Q to consist of all unimodal distributions symmetric about β₀. Arguing as before, one gets
PAGE 54

47 v -nnfinfl il ~ Â£ ) m (^( 5 )l 7 r o)^Â°(^( g )) + eH ^ k ) * Â’ (l-e)m(y(s)\n 0 ) + eH(k) (3.3.2) where m(j/(s)|7r 0 ) and S 7T Â°(y(s)) are given by (3.2.1) and (3.2.9) respectively, and 1 rpo+k H(k) = j _ f(y{s)\P)d/3 lf*/0 2k J0 o -k = f{y(s) \p 0 ) if k = 0; (3.3.3) i rPo+K H A k ) = uJy 0 k 0/foOOI/W if MO = A>/(l/(s)|#>) if A: = 0. (3.3.4) Using the expression /(j/(s)|/?) = (2na 2 )~* (ILe. 5 )exp[-^ E^fa-fo) 2 /**], it follows after some simple algebra that H(k) = exp[-Â—Y,(ViX iy( s )/ X ( s )) 2 /Xi\( 2k ) '(Zna 2 ) n i l (Hxi 2 ) lÂ£s iGs cr(nx(s )) -1 / 2 cr(nx(s )) -1 / 2 and ffi(A) H(k)y(s)/x(s) + exp[--^ Â£)(% ~ Â®il/(s)/*(s)) 2 /*t] /cr ies x(2/c) _1 (27rcr 2 ) ~ (IX 2 )^{nx(s))~ l k^-y{s)/x{s) _ /3 0 -k-y{s)/x{s ) a(nx(s))l l 2 ' ' cr(nz(s)) -1 / 2 (3.3.6) Write * = z(s) = ( y(s)/x(s)-po)(nx(s)y/ 2 /a , k 0 = k(nx{s)y/ 2 / a, A = 0(fc o -z(s)) + (j)(-k 0 z(s)), Â« = (f>(k 0 z{s )) (j)(-k 0 z(s)), vu = \$(k 0 2 ( 5 )) \$(-fc 0 *(Â«)), a Â— Â£ _1 (1 Â— e)m.(j/(s) |7r 0 ) and = aS ir Â°(y(s)). Now, using (3.3.2) (3.3.6) and solving

PAGE 55

48 Jr[(ai + Hi{k))/(a + H{k ))] = 0, it follows after some heavy algebra that sup E[y(y)\s, y(s)] = fy{s ) + (1 f)x(s)(a x + H x (ku))/(a + H(k v )) (3.3.7) Trgr and Â™l E h(y)\s,y(s)] = fy(s ) + (1 f)x(s)(a x + H x (k L ))/(a + H(k L )) (3.3.8) where ky and ki{< ku) are the two solutions of the quadratic equation 2 k[auk + t(aP 0 a a ) + Guv/2] = [ wy(s)/x(s ) au(nx(s))~ 1/2 ][Gt 42 ao(nx(s))~ l/2 } w(2aia(nx(s))~ 1/2 + G0 o t), (3.3.9) where G = exp f-^ _ x i y(s)/x(s)) 2 /rr i ](27ra 2 )zl 7 i (na:r^)(nx(s))1 / 2 . (3.3.10) The formulas (3.3.7) (3.3.10) will be utilized in Section 3.4 for numerical computations. Next we find the ML-II prior in this case. Since any symmetric unimodal distribution is a mixture of symmetric uniform distributions, for finding the ML-II prior, it suffices to restrict oneself to Q' t = {G k : G k is uniform (/3 0 -k,(3 0 + k),k > 0}. The ML-II prior is then given by 7r* = (1 e)n 0 + eq s , (3.3.11) where q s is uniform (/?o Â— k,/3o + k ), k being the value k which maximizes m(y(s)\q). To find k , first write m(y(s)\q) (2 k) / (2yra 2 ) 2(JJ^ 2 )exp[J fo~ k its 2a 2 ~ /3xi) 2 / Xi\dl3 its


= (2πσ²)^{−(n−1)/2} (∏_{i∈s} x_i)^{−1/2} (2k(n x̄(s))^{1/2}/σ)^{−1} exp[−(2σ²)^{−1} Σ_{i∈s}(y_i − x_i ȳ(s)/x̄(s))²/x_i]
  × [Φ((n x̄(s))^{1/2} k/σ − z(s)) − Φ(−(n x̄(s))^{1/2} k/σ − z(s))],   (3.3.12)

where z(s) = (n x̄(s))^{1/2}(ȳ(s)/x̄(s) − β₀)/σ. Differentiating m(y(s)|q) with respect to k and setting the derivative equal to zero, it follows from Berger and Sellke (1987) that

k̂ = k* σ/(n x̄(s))^{1/2}  if |z(s)| > 1,
  = 0  if |z(s)| ≤ 1,   (3.3.13)

where k* is a solution of the equation

Φ(k* − |z(s)|) − Φ(−k* − |z(s)|) = k*[φ(k* − |z(s)|) + φ(k* + |z(s)|)].   (3.3.14)

Remark 3.3.1 Clearly k* = 0 is a solution of (3.3.14). Berger (1985, p. 234) points out that there exists a unique solution k* (> 0) of (3.3.14) for |z(s)| > 1. Lemma 2.3.1 contains the stronger result that there exists a unique solution k* > |z(s)| of (3.3.14) for |z(s)| > 1.

Under the ML-II prior π̂, the posterior distribution of y(s̄) given (s, y(s)) is

π̂(y(s̄)|s, y(s)) = λ̂_SU(z(s)) π₀(y(s̄)|s, y(s)) + (1 − λ̂_SU(z(s))) m(y|q̂_s)/m(y(s)|q̂_s),   (3.3.15)

where for k̂ > 0,

λ̂_SU(z(s)) = {1 + ε(1−ε)^{−1} (2π/B₀)^{1/2} (2k*)^{−1}[Φ(k* − z(s)) + Φ(k* + z(s)) − 1] exp(B₀ z²(s)/2)}^{−1} = λ̂₁(z(s)) (say),   (3.3.16)

while for k̂ = 0,

λ̂_SU(z(s)) = {1 + ε(1−ε)^{−1} B₀^{−1/2} exp(−(1 − B₀) z²(s)/2)}^{−1} = λ̂₂(z(s)) (say).   (3.3.17)
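Equation (3.3.14) must be solved numerically. Since, by Lemma 2.3.1, the nonzero root lies above |z(s)|, and the left-hand side minus the right-hand side of (3.3.14) changes sign on (|z(s)|, ∞) for the |z(s)| values of interest, simple bisection suffices. The helper below is an illustrative sketch (not part of the thesis):

```python
import math

def Phi(t):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def phi(t):
    """Standard normal pdf."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def k_star(z, hi=50.0, iters=200):
    """Root k* of (3.3.14); returns 0 when |z| <= 1, per (3.3.13)."""
    a = abs(z)
    if a <= 1.0:
        return 0.0
    # g(k) = lhs - rhs of (3.3.14); negative just above |z|, positive for large k
    g = lambda k: (Phi(k - a) - Phi(-k - a)
                   - k * (phi(k - a) + phi(k + a)))
    lo = a
    if not (g(lo) < 0.0 < g(hi)):
        raise ValueError("bracket failed; |z| may be too close to 1")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0.0 else (lo, mid)
    return 0.5 * (lo + hi)
```

For instance, k_star(2.0) returns a root between 2.5 and 3; the ML-II half-width in (3.3.13) is then k̂ = k*σ/(n x̄(s))^{1/2}.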


The robust Bayes predictor of γ(y) under the ML-II prior π̂ is then given by

e_SU(s, y(s)) = f ȳ(s) + (1−f) x̄(s)[λ̂₁(z(s)){(1 − B₀(s)) ȳ(s)/x̄(s) + B₀(s) β₀}
               + (1 − λ̂₁(z(s))){ȳ(s)/x̄(s) − (σ/(k*(n x̄(s))^{1/2})) tanh(k* z(s))}]   (3.3.18)

for k̂ > 0, while for k̂ = 0,

e_SU(s, y(s)) = f ȳ(s) + (1−f) x̄(s)[λ̂₂(z(s)){(1 − B₀(s)) ȳ(s)/x̄(s) + B₀(s) β₀} + (1 − λ̂₂(z(s))) β₀].   (3.3.19)

Also, generalizing the formula (1.8) in Berger and Berliner (1986), and after some heavy algebra, one gets the associated posterior variance

V(γ(y)|s, y(s)) = N^{−2}[σ²(N−n) x̄(s) + (N−n)² x̄²(s)
   × {σ²(n x̄(s))^{−1}(λ̂₁(1 − B₀(s)) + (1 − λ̂₁)(1/k*) tanh(k* z(s))(z(s) − (1/k*) tanh(k* z(s))))
   + λ̂₁(1 − λ̂₁){B₀(s)(ȳ(s)/x̄(s) − β₀) − (σ/(k*(n x̄(s))^{1/2})) tanh(k* z(s))}²}]   (3.3.20)

for k̂ > 0, while for k̂ = 0,

V(γ(y)|s, y(s)) = N^{−2}[σ²(N−n) x̄(s) + (N−n)² x̄²(s){σ² λ̂₂(1 − B₀(s))/(n x̄(s))
   + λ̂₂(1 − λ̂₂)(1 − B₀(s))²(ȳ(s)/x̄(s) − β₀)²}].   (3.3.21)

Next we provide expressions for the indices of posterior robustness of the robust Bayes predictors proposed in this section under the {N(β₀, τ²), τ² > 0} class of priors.


Calculations similar to those of the previous section provide, for k̂ > 0,

POR_Γ(e_SU) = (1−f)² x̄²(s) max[{B₀(s) λ̂₁(ȳ(s)/x̄(s) − β₀) + (1 − λ̂₁)(σ/(k*(n x̄(s))^{1/2})) tanh(k* z(s))}²,
              {(B₀(s) λ̂₁ − 1)(ȳ(s)/x̄(s) − β₀) + (1 − λ̂₁)(σ/(k*(n x̄(s))^{1/2})) tanh(k* z(s))}²],   (3.3.22)

while for k̂ = 0,

POR_Γ(e_SU) = (1−f)² x̄²(s) max[(1 − B₀(s))² λ̂₂², {1 − λ̂₂(1 − B₀(s))}²](ȳ(s)/x̄(s) − β₀)².   (3.3.23)

In order to examine the procedure robustness of e_SU, first note that under the N(β₀, τ²) prior (denoted by ξ_{τ²}),

r(ξ_{τ²}, e_SU) − r(ξ_{τ²}, e_B) = E[e_SU(s, y(s)) − e_B(s, y(s))]²
 = (1−f)² x̄²(s) E[{λ̂₁ B₀(s)(ȳ(s)/x̄(s) − β₀) + (1 − λ̂₁)(σ/(k*(n x̄(s))^{1/2})) tanh(k* z(s))} I_{[k̂>0]}
   + {1 − λ̂₂(1 − B₀(s))}(ȳ(s)/x̄(s) − β₀) I_{[k̂=0]} − B(s)(ȳ(s)/x̄(s) − β₀)]².   (3.3.24)

We now have the following theorem.

Theorem 3.3.1 r(ξ_{τ²}, e_SU) − r(ξ_{τ²}, e_B) = O(B^{1/2}).

Proof of Theorem 3.3.1 First use the inequality

rhs of (3.3.24) ≤ 3(1−f)² x̄²(s) E[{λ̂₁ B₀(ȳ(s)/x̄(s) − β₀) + (1 − λ̂₁)(σ/(k*(n x̄(s))^{1/2})) tanh(k* z(s))}² I_{[k̂>0]}
   + {1 − λ̂₂(1 − B₀)}²(ȳ(s)/x̄(s) − β₀)² I_{[k̂=0]} + B²(ȳ(s)/x̄(s) − β₀)²].   (3.3.25)


Next observe that

E[B²(ȳ(s)/x̄(s) − β₀)²] = E[B² σ²(n x̄(s) B)^{−1} χ₁²] = B σ²/(n x̄(s)) = O_e(B)   (3.3.26)

and, writing g = ε(1−ε)^{−1} B₀^{−1/2},

E[{1 − λ̂₂(1 − B₀)}²(ȳ(s)/x̄(s) − β₀)² I_{[k̂=0]}] = E[{1 − λ̂₂(1 − B₀)}²(ȳ(s)/x̄(s) − β₀)² I_{[z²(s)≤1]}]
 = (σ²/(n x̄(s) B)) E[{1 − (1 − B₀)(1 + g exp(−(1 − B₀)χ₁²/(2B)))^{−1}}² χ₁² I_{[χ₁²≤B]}]
 ≤ 2(σ²/(n x̄(s) B)) E[{B₀² + g² exp(−(1 − B₀)χ₁²/B)} χ₁² I_{[χ₁²≤B]}].   (3.3.27)

Note that

E[χ₁² I_{[χ₁²≤B]}] = O(B^{3/2}),   (3.3.28)

so that

E[{1 − λ̂₂(1 − B₀)}²(ȳ(s)/x̄(s) − β₀)² I_{[k̂=0]}] = O(B^{1/2}).   (3.3.29)

Also,

E[{λ̂₁ B₀(ȳ(s)/x̄(s) − β₀) + (1 − λ̂₁)(σ/(k*(n x̄(s))^{1/2})) tanh(k* z(s))}² I_{[k̂>0]}]
 ≤ 2E[{λ̂₁² B₀²(ȳ(s)/x̄(s) − β₀)² + (1 − λ̂₁)² σ²(k*(n x̄(s))^{1/2})^{−2} tanh²(k* z(s))} I_{[k̂>0]}].   (3.3.30)


Now, writing g' = g(2π)^{1/2},

E[λ̂₁² B₀²(ȳ(s)/x̄(s) − β₀)² I_{[k̂>0]}] = E[λ̂₁² B₀²(σ²/(n x̄(s))) z²(s) I_{[z²(s)>1]}]
 ≤ E[B₀²(σ²/(n x̄(s))) z²(s){1 + g' exp(B₀ z²(s)/2)(φ(k* − |z(s)|) + φ(k* + |z(s)|))}^{−1} I_{[z²(s)>1]}].   (3.3.31)

Let K = max(M₀, M₁, 2). Then, writing g'' = c₀ g' and using (i) of Lemma 2.3.2,

rhs of (3.3.31) ≤ E[B₀²(σ²/(n x̄(s))) z²(s) I_{[1<z²(s)≤K²]}]
   + (σ²/(n x̄(s))) E[k* z²(s){k* + g'' exp(B₀ z²(s)/2)}^{−1} I_{[z²(s)>K²]}]   (3.3.32)

and

E[B₀² z²(s) I_{[1<z²(s)≤K²]}] = E[B₀² B^{−1} χ₁² I_{[B<χ₁²≤K²B]}] = O(B^{1/2}).   (3.3.33)

Also,

E[k* z²(s){k* + g'' exp(B₀ z²(s)/2)}^{−1} I_{[z²(s)>K²]}]
 ≤ E[(|z(s)| + c_*(log k*)^{1/2}) z²(s){k* + g'' exp(B₀ z²(s)/2)}^{−1} I_{[z²(s)>K²]}]
 ≤ E[{|z(s)|³/(g'' exp(B₀ z²(s)/2))} I_{[z²(s)>K²]} + c_* z²(s){2(g'')^{1/2} exp(B₀ z²(s)/4)}^{−1} I_{[z²(s)>K²]}].   (3.3.34)

But,

E[|z(s)|³ exp(−B₀ z²(s)/2) I_{[z²(s)>K²]}]


 = E[(χ₁²/B)^{3/2} exp(−(B₀/(2B))χ₁²) I_{[χ₁²>K²B]}]
 = B^{−3/2} ∫_{K²B}^{∞} x^{3/2} exp(−(B₀ x)/(2B) − x/2) x^{1/2−1}(2π)^{−1/2} dx
 ≤ (2π)^{−1/2} B^{−3/2} ∫_{0}^{∞} x exp(−(x/2)(B₀/B + 1)) dx
 = (2π)^{−1/2} B^{−3/2} 4(B₀/B + 1)^{−2} = O(B^{1/2}).   (3.3.35)

Moreover,

E[z²(s) exp(−(B₀/4) z²(s)) I_{[z²(s)>K²]}] = E[(χ₁²/B) exp(−(B₀/(4B))χ₁²) I_{[χ₁²>K²B]}]
 = B^{−1} ∫_{K²B}^{∞} x exp(−(x/2)(B₀/(2B) + 1)) x^{1/2−1}(2π)^{−1/2} dx = O(B^{1/2}).   (3.3.36)

Combine (3.3.34)–(3.3.36) to conclude that

E[λ̂₁² B₀²(ȳ(s)/x̄(s) − β₀)² I_{[k̂>0]}] = O(B^{1/2}).   (3.3.37)

Finally,

E[(1 − λ̂₁)² σ²(k*)^{−2}(n x̄(s))^{−1} tanh²(k* z(s)) I_{[k̂>0]}]
 ≤ (σ²/(n x̄(s))) E[(k*)^{−2} tanh²(k* z(s)) I_{[|z(s)|>1]}]
 ≤ (σ²/(n x̄(s))) E[z²(s) I_{[1<|z(s)|≤K]} + (k*)^{−2} I_{[|z(s)|>K]}],   (3.3.38)

where in the final inequality of (3.3.38), we use |tanh(k* z(s))| ≤ k*|z(s)| for 1 < |z(s)| ≤ K and |tanh(k* z(s))| ≤ 1 for |z(s)| > K.


As before,

E[z²(s) I_{[1<|z(s)|≤K]}] = O(B^{1/2}).   (3.3.39)

Also, using (ii) of Lemma 2.3.2,

E[(k*)^{−2} I_{[|z(s)|>K]}] ≤ E[(z(s))^{−2} I_{[|z(s)|>K]}] = E[B(χ₁²)^{−1} I_{[χ₁²>K²B]}]
 = B ∫_{K²B}^{∞} x^{−1} exp(−x/2) x^{1/2−1}(2π)^{−1/2} dx
 ≤ B(2π)^{−1/2} ∫_{K²B}^{∞} x^{−3/2} dx = 2B(2π)^{−1/2}(K²B)^{−1/2} = O(B^{1/2}).   (3.3.40)

From (3.3.38)–(3.3.40), lhs of (3.3.38) = O(B^{1/2}). Combine (3.3.25), (3.3.26), (3.3.27), (3.3.29), (3.3.30), (3.3.37), (3.3.38), (3.3.39) and (3.3.40) to get the theorem.

Remark 3.3.2 It follows from the above theorem that as n → ∞, i.e. B(s) → 0, under the subjective N(β₀, τ²) prior, e_SU, the robust Bayes estimator of γ(y) = N^{−1} Σ_{i=1}^N y_i, is asymptotically optimal in the sense of Robbins (1955).

3.4 An Example

The example in this section considers one of the six real populations which are used in Royall and Cumberland (1981) for an empirical study of the ratio estimator and estimates of its variance. Our population consists of the 1960 and 1970 populations, in millions, of 125 US cities with 1960 population between 100,000 and 1,000,000. Here the auxiliary information is the 1960 population. The populations of the different cities are shown in Figure 3.1.


The problem is to estimate the mean (or total) number of inhabitants in those 125 cities in 1970. For the complete population in 1970, we find that the population mean is 0.29034. We select a 20% simple random sample without replacement from this population, so the sample size is n = 25. Also, we use σ² = (N − 1)^{−1} Σ_{i=1}^N (y_i − βx_i)² = 4.84844×10^{−3}, which is assumed to be known. We can easily obtain the ratio estimate and the corresponding standard error.

To do a Bayesian analysis, we use both the 1950 and 1960 populations of the 125 cities to elicit the base prior π₀ for β. The elicited prior π₀ is the N(1.15932, 1.21097×10^{−3}) distribution based on prior information. Under this elicited prior π₀, we use formulas (3.1.2) and (3.1.3) to obtain the subjective Bayes predictor and the associated posterior variance. But we have some uncertainty in π₀ and the prior information, so we choose ε = .1 and we get the robust Bayes


predictors and the associated posterior variances using formulas (3.2.16), (3.2.17), (3.3.18) and (3.3.20). For illustration purposes, we have decided to report our analysis for one sample. Table 3.1 provides the classical ratio estimate e_R, the subjective Bayes predictor e_B0, the robust Bayes predictor e_RB with all possible contaminations, the robust Bayes predictor e_SU with all symmetric unimodal contaminations, and the respective associated standard errors. Table 3.1 also provides the posterior robustness index for each predictor, which is, in a sense, the sensitivity index of the predictor as the prior varies over the class {N(β₀, τ²), τ² > 0}.

Table 3.1. Predictors, Associated Standard Errors and Posterior Robustness Index

  Predictor   Estimate        SE              |γ(y) − e|        POR
  e_R         0.28426   1.10032×10^{-2}   6.08452×10^{-3}   5.38660×10^{-4}
  e_B0        0.29336   5.61481×10^{-3}   3.01418×10^{-3}   3.27488×10^{-4}
  e_RB        0.28660   5.49854×10^{-3}   3.74880×10^{-3}   4.35696×10^{-4}
  e_SU        0.29027   5.74954×10^{-3}   7.83722×10^{-5}   3.29777×10^{-4}

An inspection of Table 3.1 reveals that the robust Bayes predictors e_RB and e_SU are well behaved, in the sense that e_SU is closest to γ(y) and both e_RB and e_SU are closer to γ(y) than the classical ratio estimate e_R. The subjective Bayes predictor e_B0 is good in the sense of the posterior robustness index. Note that e_R is worst in both the closeness to γ(y) and the posterior robustness index. Also, we find that the posterior mean of γ(y) = N^{−1} Σ_{i=1}^N y_i is in the interval (0.28363, 0.29428) for all possible contaminations and in the interval (.29003, .29364) for all symmetric unimodal contaminations. Note that the range of the posterior mean of γ(y) = N^{−1} Σ_{i=1}^N y_i is fairly small in both cases. So, if we feel that the true prior is close to a specific one, say π₀, we can model it via one of the contamination models and achieve a very robust inference.
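The closeness column of Table 3.1 can be verified against the reported population mean γ(y) = 0.29034; the tiny discrepancies (order 10^{-5}) come from the rounding of the tabulated predictors. A sketch of the check:

```python
gamma_y = 0.29034   # 1970 population mean of the 125 cities (Section 3.4)

# (predictor value, reported |gamma(y) - e|) from Table 3.1
table = {"e_R":  (0.28426, 6.08452e-3),
         "e_B0": (0.29336, 3.01418e-3),
         "e_RB": (0.28660, 3.74880e-3),
         "e_SU": (0.29027, 7.83722e-5)}

for name, (e, reported) in table.items():
    # reported closeness agrees with the recomputed value to ~1e-5
    assert abs(abs(e - gamma_y) - reported) < 1e-4, name
```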


CHAPTER 4
BAYESIAN ANALYSIS UNDER HEAVY-TAILED PRIORS

4.1 Introduction

In this chapter, we consider the idea of developing priors that are inherently robust in some sense. The idea is that it is perhaps easier to build robustness into the analysis at the beginning than to attempt to verify robustness at the end. Substantial evidence has been presented to the effect that priors with tails that are flatter than those of the likelihood function tend to be fairly robust (e.g., Box and Tiao (1968, 1973), Dawid (1973), O'Hagan (1979, 1989) and West (1985)). It is thus desirable to develop fairly broad classes of flat-tailed priors for use in "standard" Bayesian analyses.

Andrews and Mallows (1974) and West (1987) studied scale mixtures of normal distributions, which can be used for simulation and in the analysis of outlier models. The Student t family, double-exponential, logistic, and the exponential power family can all be constructed as scale mixtures of normals. The exponential power family was introduced and popularized by Box and Tiao (1973) in the context of Bayesian modelling for robustness. Recently, Angers and Berger (1992) and Angers (1992) considered t priors in the hierarchical Bayes setting, while Datta and Lahiri (1994) considered general scale mixtures of normals, primarily with the aim of outlier detection in the context of small area estimation.

The price to be paid for utilization of inherently robust procedures is computational; closed form calculation is no longer possible. Recently, however, the Markov


chain Monte Carlo integration techniques, in particular Gibbs sampling (Geman and Geman (1984), Gelfand and Smith (1990), and Gelfand et al. (1990)), have proved to be a simple yet powerful tool for performing robust Bayes computations.

Ericson (1969) considered the superpopulation model y_i = θ + e_i, where θ, e_1, ..., e_N are independently distributed with θ ~ N(μ₀, τ₀²) and the e_i's are iid N(0, σ²). As we have seen in Chapter 2, under the N(μ₀, τ₀²) prior, the Bayes estimator of γ(y) = N^{−1} Σ_{i=1}^N y_i is given by (4.1.1).

4.2 Known Variance

Consider the case when (i) y_i | θ ~ N(θ, σ²) (i = 1, ..., N) and (ii) θ ~ (1/τ₀) p((θ − μ₀)/τ₀), where p(x) = ∫₀^∞ λ^{1/2} φ(xλ^{1/2}) g(λ) dλ, that is, p(·) is a scale mixture of the normal distribution with mixing distribution g(·). Note that we can write (ii) in the following two steps: (iia) θ | λ ~ N(μ₀, λ^{−1}) and (iib) λ ~ τ₀² g(τ₀² λ), where ∫₀^∞ g(x) dx = 1. The following list identifies the necessary functional form for g(λ) to obtain a wide range of densities which represent departures from normality:

t priors: If kλ ~ χ²_k, then θ is Student t with k degrees of freedom, location parameter μ₀, and scale parameter τ₀.

double-exponential priors: If 1/λ has the exponential distribution with mean 2, then θ is double-exponential with location parameter μ₀ and scale parameter τ₀.

exponential power family priors: If λ has the positive stable distribution with index α/2, then θ has the exponential power distribution with location parameter μ₀ and scale parameter τ₀.

logistic priors: If (1/2)λ^{−1/2} has the asymptotic Kolmogorov distance distribution, then θ is logistic with location parameter μ₀ and scale parameter τ₀. [A random variable Z is said to have an asymptotic Kolmogorov distance distribution if it has a pdf of the form f(z) = 8z Σ_{i=1}^∞ (−1)^{i−1} i² exp(−2i²z²) I_{(0,∞)}(z).]

We shall use the notations ȳ(s) = n^{−1} Σ_{i∈s} y_i and y(s̄) = {y_i : i ∉ s}, the suffixes in y(s̄) being arranged in ascending order. Then the posterior distribution of y(s̄) given s and y(s) is obtained as follows: (i) conditional on s, y(s) and λ, y(s̄) has the N((B(λ)μ₀ + (1 − B(λ))ȳ(s))1_{N−n}, σ²(I_{N−n} + (λσ² + n)^{−1} 1_{N−n} 1'_{N−n})) distribution, where B(λ) = λσ²/(λσ² + n); (ii) the conditional distribution of λ given s and y(s) has pdf

f(λ|s, y(s)) ∝ (σ² + nλ^{−1})^{−1/2} exp[−(n/2)(ȳ(s) − μ₀)²/(σ² + nλ^{−1})] g(τ₀² λ).   (4.2.1)
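The first item in this list is easy to check by simulation (an illustrative check, not part of the thesis): drawing the standardized mixing variable with kλ ~ χ²_k, which is the same as λ ~ Gamma(shape k/2, scale 2/k), and then θ | λ ~ N(μ₀, τ₀²/λ), should reproduce a Student t with k degrees of freedom and variance τ₀² k/(k − 2):

```python
import math
import random

random.seed(1)
mu0, tau0, k, n = 0.0, 2.0, 5, 100_000

draws = []
for _ in range(n):
    # k*lam ~ chi-square_k  <=>  lam ~ Gamma(shape k/2, scale 2/k)
    lam = random.gammavariate(k / 2.0, 2.0 / k)
    # theta | lam ~ N(mu0, tau0^2 / lam); marginally Student t_k
    # with location mu0 and scale tau0
    draws.append(random.gauss(mu0, tau0 / math.sqrt(lam)))

mean = sum(draws) / n
var = sum((d - mean) ** 2 for d in draws) / n
# var should be close to tau0^2 * k / (k - 2)
```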


Note that under the posterior distribution given in (4.2.1), the Bayes estimator of γ(y) is given by

δ¹_SM(s, y(s)) = E[γ(y)|s, y(s)] = f ȳ(s) + (1 − f){E[B(λ)|s, y(s)] μ₀ + (1 − E[B(λ)|s, y(s)]) ȳ(s)}.   (4.2.2)

Also, one gets

V(γ(y)|s, y(s)) = E[V(γ(y)|s, y(s), λ)|s, y(s)] + V[E(γ(y)|s, y(s), λ)|s, y(s)]
 = N^{−2} σ²{(N − n) + (N − n)² E[(λσ² + n)^{−1}|s, y(s)]} + (1 − f)² V(B(λ)μ₀ + (1 − B(λ))ȳ(s)|s, y(s)).   (4.2.3)

The calculations in (4.2.2) and (4.2.3) can be performed using one-dimensional numerical integration. Alternately, one can use Monte Carlo numerical integration techniques to generate the posterior distribution and the associated means and variances. More specifically, in this chapter we use Gibbs sampling, originally introduced in Geman and Geman (1984), and more recently popularized by Gelfand and Smith (1990) and Gelfand et al. (1990). Gibbs sampling is described below.

Gibbs sampling is a Markovian updating scheme. Given an arbitrary starting set of values U₁^{(0)}, ..., U_k^{(0)}, we draw U₁^{(1)} ~ [U₁ | U₂^{(0)}, ..., U_k^{(0)}], U₂^{(1)} ~ [U₂ | U₁^{(1)}, U₃^{(0)}, ..., U_k^{(0)}], ..., U_k^{(1)} ~ [U_k | U₁^{(1)}, ..., U_{k−1}^{(1)}], where [· | ·] denotes the relevant conditional distributions. Thus, each variable is visited in the natural order, and a cycle in this scheme requires k random variate generations. After t such iterations, one arrives at (U₁^{(t)}, ..., U_k^{(t)}). As t → ∞, (U₁^{(t)}, ..., U_k^{(t)}) converges in distribution to (U₁, ..., U_k). Gibbs sampling through q replications of the aforementioned t iterations generates q k-tuples (U₁j^{(t)}, ..., U_kj^{(t)}) (j = 1, ..., q) for t large enough. U₁, ..., U_k could possibly be vectors in the above scheme.
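As a minimal illustration of this updating scheme (a toy example, not from the thesis), take (U₁, U₂) standard bivariate normal with correlation ρ; the full conditionals are U₁ | U₂ ~ N(ρU₂, 1 − ρ²) and U₂ | U₁ ~ N(ρU₁, 1 − ρ²), and cycling between them recovers the joint distribution:

```python
import math
import random

random.seed(0)
rho = 0.8
s = math.sqrt(1.0 - rho * rho)

u1, u2 = 5.0, -5.0               # deliberately poor starting values
burn, keep = 1000, 20000
pairs = []
for t in range(burn + keep):
    u1 = random.gauss(rho * u2, s)    # draw U1 | U2 = u2
    u2 = random.gauss(rho * u1, s)    # draw U2 | U1 = u1
    if t >= burn:
        pairs.append((u1, u2))

n = len(pairs)
m1 = sum(p[0] for p in pairs) / n
m2 = sum(p[1] for p in pairs) / n
cov = sum((p[0] - m1) * (p[1] - m2) for p in pairs) / n
v1 = sum((p[0] - m1) ** 2 for p in pairs) / n
v2 = sum((p[1] - m2) ** 2 for p in pairs) / n
corr = cov / math.sqrt(v1 * v2)     # should be close to rho
```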


Gelman and Rubin (1992) adopt multiple sequences, with starting points drawn from an overdispersed distribution, to monitor the convergence of the Gibbs sampler. Specifically, m (≥ 2) independent sequences are generated, each of length 2d. To diminish the effect of the starting distribution, the first d iterations of each sequence are discarded. Hence, we have m × d simulated values for each parameter of interest. Using Gibbs sampling, the posterior distribution of y(s̄) is approximated by

(md)^{−1} Σ_{i=1}^m Σ_{j=1}^d [y(s̄) | s, y(s), θ = θ_ij, λ = λ_ij].   (4.2.4)

To estimate the posterior moments, we use Rao-Blackwellized estimates as in Gelfand and Smith (1991). Note that E[γ(y)|s, y(s)] is approximated by

f ȳ(s) + (1 − f)(md)^{−1} Σ_{i=1}^m Σ_{j=1}^d (B(λ_ij)μ₀ + (1 − B(λ_ij))ȳ(s)).   (4.2.5)

Next one approximates V(γ(y)|s, y(s)) by

N^{−2} σ²{(N − n) + (N − n)²(md)^{−1} Σ_{i=1}^m Σ_{j=1}^d (λ_ij σ² + n)^{−1}}
 + (1 − f)²[(md)^{−1} Σ_{i=1}^m Σ_{j=1}^d (B(λ_ij)μ₀ + (1 − B(λ_ij))ȳ(s))²
 − {(md)^{−1} Σ_{i=1}^m Σ_{j=1}^d (B(λ_ij)μ₀ + (1 − B(λ_ij))ȳ(s))}²].   (4.2.6)

The Gibbs sampling analysis is based on the following posterior distributions:
(i) θ | s, y(s), y(s̄), λ ~ N[(λμ₀ + Σ_{i=1}^N y_i/σ²)/(λ + N/σ²), (λ + N/σ²)^{−1}];
(ii) f(λ | s, y(s), y(s̄), θ) ∝ √λ exp[−(λ/2)(θ − μ₀)²] g(τ₀² λ);
(iii) y(s̄) | s, y(s), θ, λ ~ N[θ1_{N−n}, σ² I_{N−n}].

Note that if kλ ~ χ²_k, then f(λ | s, y(s), y(s̄), θ) reduces to a Gamma(½{τ₀²k + (θ − μ₀)²}, ½(k + 1)) density. [A random variable W is said to have a Gamma(α, β)
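For a t(k) prior, the λ step in (ii) is the Gamma draw noted above, and the whole scheme (i)–(iii), together with the Rao-Blackwellized estimate (4.2.5), can be sketched as follows. All numerical values (μ₀, τ₀², σ², the synthetic sample) are illustrative assumptions, not the thesis's data:

```python
import random

random.seed(42)

# illustrative constants and synthetic data (assumed, not the thesis's)
N, n = 48, 16                    # population and sample sizes
sigma2 = 25.0                    # known model variance
mu0, tau0sq, k = 10.0, 4.0, 5    # t(k) prior: location mu0, scale^2 tau0sq
ys = [random.gauss(12.0, sigma2 ** 0.5) for _ in range(n)]
ybar, f = sum(ys) / n, n / N

theta, lam = ybar, 1.0 / tau0sq
lam_draws = []
for t in range(6000):
    # (iii) y(s-bar) | theta, lam : iid N(theta, sigma2)
    y_rest = [random.gauss(theta, sigma2 ** 0.5) for _ in range(N - n)]
    # (i) theta | rest : N((lam*mu0 + sum(y)/sigma2)/(lam + N/sigma2), 1/(lam + N/sigma2))
    tot = sum(ys) + sum(y_rest)
    prec = lam + N / sigma2
    theta = random.gauss((lam * mu0 + tot / sigma2) / prec, prec ** -0.5)
    # (ii) lam | theta : Gamma with rate (tau0sq*k + (theta - mu0)^2)/2
    #      and shape (k + 1)/2, the reduction noted above
    rate = 0.5 * (tau0sq * k + (theta - mu0) ** 2)
    lam = random.gammavariate((k + 1) / 2.0, 1.0 / rate)
    if t >= 1000:                # discard burn-in
        lam_draws.append(lam)

# Rao-Blackwellized estimate (4.2.5) of gamma(y)
B = [l * sigma2 / (l * sigma2 + n) for l in lam_draws]
Bbar = sum(B) / len(B)
est = f * ybar + (1 - f) * (Bbar * mu0 + (1 - Bbar) * ybar)
```

By construction the estimate is a convex combination of ȳ(s) and μ₀, with the data-driven weight E[B(λ) | s, y(s)] estimated by Bbar.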


distribution if it has a pdf of the form f(w) ∝ exp(−αw) w^{β−1} I_{(0,∞)}(w), where I denotes the usual indicator function.] Also, if 1/λ has the exponential distribution with mean 2, then f(λ | s, y(s), y(s̄), θ) reduces to an IGN(1/√(τ₀²(θ − μ₀)²), 1/τ₀²) density. [A random variable V is said to have an IGN(η₁, η₂) distribution if it has a pdf of the form f(v) = (η₂/(2πv³))^{1/2} exp{−η₂(v − η₁)²/(2η₁²v)} I_{(0,∞)}(v).]

We shall now evaluate the performance of the robust Bayes estimator δ¹_SM of θ for large n under the N(μ₀, τ₀²) prior, say π₀. The Bayes estimator of θ under this prior is δ⁰, which is given by (4.1.1). Let r(π₀, δ) denote the Bayes risk of an estimator δ of θ under the prior π₀. Our aim is to show that r(π₀, δ¹_SM) − r(π₀, δ⁰) → 0 as n → ∞.

Lemma 4.2.1 Assume E(λ^{3/2}) < ∞. Then E[B(λ)|s, y(s)] → 0 a.s. as n → ∞.

Proof of Lemma 4.2.1 Note that

E[B(λ)|s, y(s)] = σ² ∫₀^∞ (σ² + nλ^{−1})^{−3/2} exp[−(n/2)(ȳ(s) − μ₀)²/(σ² + nλ^{−1})] g(τ₀² λ) dλ
                / ∫₀^∞ (σ² + nλ^{−1})^{−1/2} exp[−(n/2)(ȳ(s) − μ₀)²/(σ² + nλ^{−1})] g(τ₀² λ) dλ
 = P_n/Q_n (say).   (4.2.7)


Now,

n^{3/2} P_n → σ² E[λ^{3/2} exp(−(λ/2)Y₀)] a.s.,   (4.2.8)

while

n^{1/2} Q_n → E[λ^{1/2} exp(−(λ/2)Y₀)] a.s.,   (4.2.9)

where Y₀ denotes the a.s. limit of (ȳ(s) − μ₀)², and the limit in (4.2.9) is bounded away from zero a.s. Hence, P_n/Q_n → 0 a.s. as n → ∞.

We now turn to the theorem which proves the A.O. property of δ¹_SM obtained in (4.2.2).

Theorem 4.2.1 Assume E(λ^{3/2}) < ∞. Then r(π₀, δ¹_SM) − r(π₀, δ⁰) → 0 as n → ∞.

Proof of Theorem 4.2.1 Standard Bayesian calculations yield

r(π₀, δ¹_SM) − r(π₀, δ⁰) = E(δ¹_SM − δ⁰)² = (1 − f)² E[(E(B(λ)|s, y(s)) − B₀)²(ȳ(s) − μ₀)²].   (4.2.10)

By Lemma 4.2.1, E[B(λ)|s, y(s)] → 0 a.s. as n → ∞. Also, B₀ → 0 as n → ∞. Hence, (E(B(λ)|s, y(s)) − B₀)² → 0 a.s. as n → ∞. Also, |E(B(λ)|s, y(s)) − B₀| ≤ 1, and (ȳ(s) − μ₀)², being a backward submartingale, is uniformly integrable. Hence, the rhs of (4.2.10) → 0 as n → ∞. This completes the proof of the theorem.


4.3 Unknown Variance

In this section, somewhat more realistically, we consider the normal superpopulation model with unknown mean θ and unknown variance r^{−1}. Ericson (1969) used a normal-gamma prior on (θ, r) in this setting. That is, θ | r ~ N(μ₀, r^{−1}τ₀²) and r ~ Gamma(½a₀, ½g₀). But in this case the ratio of the model variance and the prior variance is known. Suppose now, more generally, that y_i | θ, r ~ N(θ, r^{−1}) (i = 1, ..., N), and θ and r are independently distributed with θ ~ N(μ₀, τ₀²) and r ~ Gamma(½a₀, ½g₀). Then the posterior distribution of y(s̄) given s and y(s) is obtained via the following two steps: (i) conditional on s, y(s) and r, y(s̄) has the N((B(r)μ₀ + (1 − B(r))ȳ(s))1_{N−n}, r^{−1}(I_{N−n} + (M(r) + n)^{−1} 1_{N−n} 1'_{N−n})) distribution, where M(r) = r^{−1}/τ₀² and B(r) = M(r)/(M(r) + n); (ii) the conditional distribution of r given s and y(s) has pdf

f(r|s, y(s)) ∝ r^{(n+g₀−2)/2}(1 + nτ₀² r)^{−1/2} exp[−(r/2){a₀ + Σ_{i∈s}(y_i − ȳ(s))² + n(ȳ(s) − μ₀)²/(1 + nτ₀² r)}].   (4.3.1)

Note that under the posterior distribution given in (4.3.1), the Bayes estimator of γ(y) is given by

δ_B(s, y(s)) = E[γ(y)|s, y(s)] = f ȳ(s) + (1 − f){E[B(r)|s, y(s)] μ₀ + (1 − E[B(r)|s, y(s)]) ȳ(s)}.   (4.3.2)

Also, one gets

V(γ(y)|s, y(s)) = E[V(γ(y)|s, y(s), r)|s, y(s)] + V[E(γ(y)|s, y(s), r)|s, y(s)]
 = N^{−2}(N − n) E[r^{−1}{1 + (N − n)(M(r) + n)^{−1}} | s, y(s)] + (1 − f)²(ȳ(s) − μ₀)² V(B(r)|s, y(s)).   (4.3.3)


In order to robustify the above model, consider the case when (i) y_i | θ, r ~ N(θ, r^{−1}) (i = 1, ..., N); (ii) r ~ Gamma(½a₀, ½g₀); and (iii) θ ~ (1/τ₀) p((θ − μ₀)/τ₀), where p(x) = ∫₀^∞ λ^{1/2} φ(xλ^{1/2}) g(λ) dλ. Note that the prior pdf of θ does not depend on r. Recall that we can write (iii) in the following two steps: (iiia) θ | λ ~ N(μ₀, λ^{−1}) and (iiib) λ ~ τ₀² g(τ₀² λ), where ∫₀^∞ g(x) dx = 1. Then the posterior distribution of y(s̄) given s and y(s) is obtained as follows: (i) conditional on s, y(s), r and λ, y(s̄) has the N((B(λ, r)μ₀ + (1 − B(λ, r))ȳ(s))1_{N−n}, r^{−1}(I_{N−n} + (λr^{−1} + n)^{−1} 1_{N−n} 1'_{N−n})) distribution, where B(λ, r) = 1/(1 + nλ^{−1} r); (ii) the conditional distribution of λ and r given s and y(s) has pdf

f(λ, r|s, y(s)) ∝ r^{(n+g₀−2)/2}(1 + nλ^{−1} r)^{−1/2} exp[−(r/2){a₀ + Σ_{i∈s}(y_i − ȳ(s))² + n(ȳ(s) − μ₀)²/(1 + nλ^{−1} r)}] g(τ₀² λ).   (4.3.4)

Note that under the posterior distribution given in (4.3.4), the Bayes estimator of γ(y) is given by

δ²_SM(s, y(s)) = E[γ(y)|s, y(s)] = f ȳ(s) + (1 − f){E[B(λ, r)|s, y(s)] μ₀ + (1 − E[B(λ, r)|s, y(s)]) ȳ(s)}.   (4.3.5)

Also, one gets

V(γ(y)|s, y(s)) = E[V(γ(y)|s, y(s), λ, r)|s, y(s)] + V[E(γ(y)|s, y(s), λ, r)|s, y(s)]
 = N^{−2}(N − n) E[r^{−1}{1 + (N − n)(λr^{−1} + n)^{−1}} | s, y(s)] + (1 − f)² V(B(λ, r)μ₀ + (1 − B(λ, r))ȳ(s) | s, y(s)).   (4.3.6)

Using Gibbs sampling, the posterior distribution of y(s̄) is approximated by

(md)^{−1} Σ_{i=1}^m Σ_{j=1}^d [y(s̄) | s, y(s), θ = θ_ij, λ = λ_ij, r = r_ij].   (4.3.7)


To estimate the posterior moments, we use once again the Rao-Blackwellized estimates as in Gelfand and Smith (1991). Note that E[γ(y)|s, y(s)] is approximated by

f ȳ(s) + (1 − f)(md)^{−1} Σ_{i=1}^m Σ_{j=1}^d (B(λ_ij, r_ij)μ₀ + (1 − B(λ_ij, r_ij))ȳ(s)).   (4.3.8)

Next one approximates V(γ(y)|s, y(s)) by

N^{−2}(N − n)(md)^{−1} Σ_{i=1}^m Σ_{j=1}^d [(r_ij)^{−1}{1 + (N − n)(λ_ij(r_ij)^{−1} + n)^{−1}}]
 + (1 − f)²[(md)^{−1} Σ_{i=1}^m Σ_{j=1}^d (B(λ_ij, r_ij)μ₀ + (1 − B(λ_ij, r_ij))ȳ(s))²
 − {(md)^{−1} Σ_{i=1}^m Σ_{j=1}^d (B(λ_ij, r_ij)μ₀ + (1 − B(λ_ij, r_ij))ȳ(s))}²].   (4.3.9)

The Gibbs sampling analysis is based on the following posterior distributions:
(i) θ | s, y(s), y(s̄), λ, r ~ N[(λμ₀ + r Σ_{i=1}^N y_i)/(λ + rN), (λ + rN)^{−1}];
(ii) f(λ | s, y(s), y(s̄), θ, r) ∝ √λ exp[−(λ/2)(θ − μ₀)²] g(τ₀² λ);
(iii) r | s, y(s), y(s̄), λ, θ ~ Gamma(½{a₀ + Σ_{i=1}^N (y_i − θ)²}, ½(N + g₀));
(iv) y(s̄) | s, y(s), θ, λ, r ~ N[θ1_{N−n}, r^{−1} I_{N−n}].

Recall that if kλ ~ χ²_k, then f(λ | s, y(s), y(s̄), θ) reduces to a Gamma(½{τ₀²k + (θ − μ₀)²}, ½(k + 1)) density. Also, if 1/λ has the exponential distribution with mean 2, then f(λ | s, y(s), y(s̄), θ) reduces to an IGN(1/√(τ₀²(θ − μ₀)²), 1/τ₀²) density.

Now, to evaluate the performance of the robust Bayes estimator δ²_SM of θ for large n, we denote by ξ the prior under which θ and r are independent with θ ~ N(μ₀, τ₀²) and r ~ Gamma(½a₀, ½g₀). The Bayes estimator of θ under the prior ξ is given by (4.3.2). Our goal is now to show that r(ξ, δ²_SM) − r(ξ, δ_B) → 0 as n → ∞.


Lemma 4.3.1 Assume E(λ^{3/2}) < ∞. Then E[B(r)|s, y(s)] → 0 a.s. as n → ∞. Also, E[B(λ, r)|s, y(s)] → 0 a.s. as n → ∞.

Proof of Lemma 4.3.1 First we show that E[B(r)|s, y(s)] → 0 a.s. as n → ∞. This amounts to showing that N_n/D_n → 0 a.s. as n → ∞, where

N_n = ∫₀^∞ r^{(n+g₀−2)/2}(1 + nτ₀² r)^{−3/2} exp[−(r/2){a₀ + Σ_{i∈s}(y_i − ȳ(s))² + n(ȳ(s) − μ₀)²/(1 + nτ₀² r)}] dr   (4.3.10)

and

D_n = ∫₀^∞ r^{(n+g₀−2)/2}(1 + nτ₀² r)^{−1/2} exp[−(r/2){a₀ + Σ_{i∈s}(y_i − ȳ(s))² + n(ȳ(s) − μ₀)²/(1 + nτ₀² r)}] dr.   (4.3.11)

Note that

D_n ≥ ∫₀^∞ r^{(n+g₀−2)/2}(1 + nτ₀² r)^{−1/2} exp[−(r/2){a₀ + Σ_{i∈s}(y_i − ȳ(s))²}] dr · exp(−(ȳ(s) − μ₀)²/(2τ₀²)),   (4.3.12)

since nr(1 + nτ₀² r)^{−1} ≤ τ₀^{−2}. Hence, defining W_n ~ Gamma(½{a₀ + Σ_{i∈s}(y_i − ȳ(s))²}, ½(n + g₀ − 1)),

N_n/D_n ≤ E[(nτ₀² W_n/(1 + nτ₀² W_n))^{1/2}(1 + nτ₀² W_n)^{−1}]
        / {E[(nτ₀² W_n/(1 + nτ₀² W_n))^{1/2}] exp(−(ȳ(s) − μ₀)²/(2τ₀²))}
 = A_n/B_n (say).   (4.3.13)


Now, A_n ≤ E[(1 + nτ₀² W_n)^{−1}] ≤ n^{−1}(τ₀²)^{−1} E(W_n^{−1}). Note that

n^{−1} E(W_n^{−1}) = n^{−1}(a₀ + Σ_{i∈s}(y_i − ȳ(s))²)/(n + g₀ − 3) → 0 a.s.,   (4.3.14)

since, by the law of large numbers for exchangeable sequences, n^{−1} Σ_{i∈s}(y_i − ȳ(s))² converges a.s. (to R^{−1}, say). Again,

B_n = E[(n^{−1}(τ₀²)^{−1} W_n^{−1} + 1)^{−1/2}] exp(−(ȳ(s) − μ₀)²/(2τ₀²))
    ≥ {n^{−1}(τ₀²)^{−1} E(W_n^{−1}) + 1}^{−1/2} exp(−(ȳ(s) − μ₀)²/(2τ₀²)),   (4.3.15)

using Jensen's inequality, since (x + 1)^{−1/2} is a convex function of x. Hence,

N_n/D_n ≤ n^{−1}(τ₀²)^{−1} E(W_n^{−1}){n^{−1}(τ₀²)^{−1} E(W_n^{−1}) + 1}^{1/2} exp((ȳ(s) − μ₀)²/(2τ₀²)) → 0 a.s.,   (4.3.16)

since (ȳ(s) − μ₀)² = O_p(1) and n^{−1} E(W_n^{−1}) → 0 a.s.

Next we need to show that E[B(λ, r)|s, y(s)] → 0 as n → ∞, that is, N'_n/D'_n → 0 as n → ∞, where

N'_n = ∫₀^∞ ∫₀^∞ τ₀² g(τ₀² λ) r^{(n+g₀−2)/2}(1 + nλ^{−1} r)^{−3/2} exp[−(r/2){a₀ + Σ_{i∈s}(y_i − ȳ(s))² + n(ȳ(s) − μ₀)²/(1 + nλ^{−1} r)}] dr dλ   (4.3.17)

and

D'_n = ∫₀^∞ ∫₀^∞ τ₀² g(τ₀² λ) r^{(n+g₀−2)/2}(1 + nλ^{−1} r)^{−1/2} exp[−(r/2){a₀ + Σ_{i∈s}(y_i − ȳ(s))² + n(ȳ(s) − μ₀)²/(1 + nλ^{−1} r)}] dr dλ.   (4.3.18)

Note that

D'_n ≥ ∫₀^∞ ∫₀^∞ τ₀² g(τ₀² λ) r^{(n+g₀−2)/2}(1 + nλ^{−1} r)^{−1/2} exp[−(r/2){a₀ + Σ_{i∈s}(y_i − ȳ(s))²} − (λ/2)(ȳ(s) − μ₀)²] dr dλ.   (4.3.19)


Hence, defining W_n as before,

N'_n/D'_n ≤ E[(nλ^{−1} W_n/(1 + nλ^{−1} W_n))^{1/2} λ^{1/2}(1 + nλ^{−1} W_n)^{−1}]
          / E[(nλ^{−1} W_n/(1 + nλ^{−1} W_n))^{1/2} λ^{1/2} exp(−(λ/2)(ȳ(s) − μ₀)²)]
 = A'_n/B'_n (say),   (4.3.20)

where the expectation is taken with respect to the joint pdf

f(λ, w) = τ₀² g(τ₀² λ) · ((a₀ + Σ_{i∈s}(y_i − ȳ(s))²)/2)^{(n+g₀−1)/2} w^{(n+g₀−1)/2−1} exp[−(w/2){a₀ + Σ_{i∈s}(y_i − ȳ(s))²}] / Γ((n + g₀ − 1)/2).   (4.3.21)

Now,

A'_n ≤ E[λ^{1/2} n^{−1} λ W_n^{−1}] = n^{−1} E(λ^{3/2}) E(W_n^{−1}) = n^{−1} E(λ^{3/2})(a₀ + Σ_{i∈s}(y_i − ȳ(s))²)/(n + g₀ − 3) → 0 a.s.,   (4.3.22)

if E(λ^{3/2}) < ∞. In the above, we have used the independence of W_n and λ. Again,

B'_n = E[(n^{−1} λ W_n^{−1} + 1)^{−1/2} λ^{1/2} exp(−(λ/2)(ȳ(s) − μ₀)²)]
     = E[E{(n^{−1} λ W_n^{−1} + 1)^{−1/2} | λ} λ^{1/2} exp(−(λ/2)(ȳ(s) − μ₀)²)].   (4.3.23)


Since (x + 1)^{−1/2} is a convex function of x, using Jensen's inequality for conditional expectations,

E[(n^{−1} λ W_n^{−1} + 1)^{−1/2} | λ] ≥ [E(n^{−1} λ W_n^{−1} | λ) + 1]^{−1/2} = (n^{−1} λ E(W_n^{−1}) + 1)^{−1/2}.   (4.3.24)

Hence,

B'_n ≥ E[{n^{−1} λ (a₀ + Σ_{i∈s}(y_i − ȳ(s))²)/(n + g₀ − 3) + 1}^{−1/2} λ^{1/2} exp(−(λ/2)(ȳ(s) − μ₀)²)].   (4.3.25)

By the same argument as in Lemma 4.2.1, (ȳ(s) − μ₀)² converges a.s. to a rv, say Y₀'. Hence, using Fatou's lemma, it follows from (4.3.25) that

lim inf_{n→∞} B'_n ≥ E[λ^{1/2} exp(−(λ/2)Y₀')].   (4.3.26)

Hence, B'_n is bounded away from zero a.s. in the limit. Hence A'_n/B'_n → 0, so that N'_n/D'_n → 0.

We now turn to the theorem which proves the A.O. property of δ²_SM obtained in (4.3.5).

Theorem 4.3.1 Assume E(λ^{3/2}) < ∞. Then r(ξ, δ²_SM) − r(ξ, δ_B) → 0 as n → ∞.

Proof of Theorem 4.3.1 Standard Bayesian calculations yield

r(ξ, δ²_SM) − r(ξ, δ_B) = E(δ²_SM − δ_B)² = (1 − f)² E[{E[B(λ, r)|s, y(s)] − E[B(r)|s, y(s)]}²(ȳ(s) − μ₀)²].   (4.3.27)


By Lemma 4.3.1, E[B(λ, r)|s, y(s)] → 0 a.s. as n → ∞ and E[B(r)|s, y(s)] → 0 a.s. as n → ∞. Hence, {E[B(λ, r)|s, y(s)] − E[B(r)|s, y(s)]}² → 0 a.s. as n → ∞. Also, |E[B(λ, r)|s, y(s)] − E[B(r)|s, y(s)]| ≤ 1 and (ȳ(s) − μ₀)² is uniformly integrable. Hence, the rhs of (4.3.27) → 0 as n → ∞. This completes the proof of the theorem.

4.4 An Example

We illustrate the methods of Sections 4.2 and 4.3 with an analysis of data in Cochran (1977). The data set consists of the 1920 and 1930 numbers of inhabitants, in thousands, of 64 large cities in the United States. The data were obtained by taking the cities which ranked fifth to sixty-eighth in the United States in total number of inhabitants in 1920. The cities are arranged in two strata, the first containing the 16 largest cities and the second the remaining 48 cities. For our purpose, we use only the second stratum. For the complete population, we find the population mean to be 197.875 and the population variance 5580.92. We use the 1920 data to elicit the prior in our setting, so that μ₀ = 165.438 and τ₀² = 71.424. We want to estimate the average (or total) number of inhabitants in all 48 cities in 1930 based on a sample of size 16 (i.e., a 1/3 sample). For illustrative purposes, we have decided to report our analysis for one sample.

In deriving the robust Bayes estimates based on heavy-tailed prior distributions using scale mixtures of normals, we have used a Gibbs sampler with 10 independent sequences, each with a sample of size 5000 after a burn-in sample of another 5000. Table 4.1 provides the Bayes estimates of γ(y) and the associated posterior standard deviations for the normal, double-exponential and t priors with 1, 3, 5, 10 and 15 degrees of freedom, in both the known and unknown σ² cases. For the unknown σ² case, we have used a₀ = g₀ = 0 to ensure some form of diffuse gamma prior for the inverse of


the variance component in our superpopulation model. Note that the naive estimate, that is, the sample mean, is 207.69.

An inspection of Table 4.1 reveals that there can be significant improvement in the estimate of γ(y) by using heavy-tailed prior distributions rather than the normal prior distribution, in the sense of closeness to γ(y). For instance, using the double-exponential and the t(1), t(3), t(5), t(10) and t(15) priors, the percentage improvements over the normal are given respectively by 45.78%, 89.05%, 52.06%, 30.68%, 15.53% and 9.06% for the known σ² case. Here the percentage improvement of e₁ over e₂ is calculated by ((e₂ − truth)² − (e₁ − truth)²)/(e₂ − truth)², where e₁ is the robust Bayes predictor based on heavy-tailed prior distributions and e₂ is the Bayes predictor using the normal prior. Also, as one might expect, the flatter the prior, the closer the Bayes estimate is to the sample mean. In general, for most cases we have considered, the Cauchy prior (i.e., the t prior with 1 degree of freedom) leads to an estimate which is closest to the population mean.

We adopt the basic approach of Gelman and Rubin (1992) to monitor the convergence of the Gibbs sampler. For θ, we simulate m = 10 independent sequences, each of length 2d = 10000, with starting points drawn from a t distribution with 2 degrees of freedom. The justification for t distributions, as well as the choice of the specific parameters of this distribution, is given below. First note that from the posterior distribution of λ given s and y(s) as given in (4.2.1), we find the posterior mode, say λ̂, by using the Newton-Raphson algorithm. Also, we use ȳ(s) for N^{−1} Σ_{i=1}^N y_i, based on the sample. We can now very well use the N[(λ̂μ₀ + Nȳ(s)/σ²)/(λ̂ + N/σ²), (λ̂ + N/σ²)^{−1}] distribution as the starting posterior distribution for θ. But in order to start with an overdispersed distribution, as recommended by Gelman and Rubin, we take a t distribution with 2 degrees of freedom. Also, note
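The quoted percentage improvements can be recomputed directly from the entries of Table 4.1 (agreement is to within about 0.05 percentage points, the residual being due to the rounding of the tabulated estimates):

```python
truth = 197.875        # 1930 population mean of the 48 cities
e_normal = 184.31      # normal-prior Bayes estimate, known sigma^2 (Table 4.1)
robust = {"DE": 187.89, "t(1)": 193.38, "t(3)": 188.48,
          "t(5)": 186.58, "t(10)": 185.41, "t(15)": 184.94}

# ((e2 - truth)^2 - (e1 - truth)^2) / (e2 - truth)^2, in percent
base = (e_normal - truth) ** 2
improvement = {name: 100.0 * (base - (e - truth) ** 2) / base
               for name, e in robust.items()}
```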


that once the initial θ value has been generated, the rest of the procedure uses the posterior distributions as given in (i)–(iii) in Section 4.2. Similar procedures can be used for the unknown σ² case.

Next, as in Gelman and Rubin, we compute

B/5000 = the variance between the 10 sequence means θ̄_i·, each based on 5000 θ values, that is, B/5000 = Σ_{i=1}^{10}(θ̄_i· − θ̄··)²/(10 − 1), where θ̄·· = (1/10) Σ_{i=1}^{10} θ̄_i·;

W = the average of the 10 within-sequence variances s_i², each based on (5000 − 1) degrees of freedom, that is, W = (1/10) Σ_{i=1}^{10} s_i².

Then, find

σ̂² = ((5000 − 1)/5000) W + B/5000  and  V = σ̂² + B/((10)(5000)).

Finally, find R = V/W. If R is near 1 for all scalar estimands of interest, it is reasonable to assume that the desired convergence is achieved in the Gibbs sampling algorithm (see Gelman and Rubin (1992) for the complete discussion). The second column of Table 4.2 provides the R values (the potential scale reduction factors) corresponding to the estimand θ using the Cauchy and double-exponential priors, based on 10 × 5000 = 50000 simulated values. The third column provides the corresponding 97.5% quantiles, which are also equal to 1. The rightmost five columns of Table 4.2 show the simulated quantiles of the target posterior distribution of θ for each of the 4 estimates, based on 10 × 5000 = 50000 simulated values.
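The quantities just defined translate directly into code. The function below computes R from any m sequences of equal length d; the synthetic, well-mixed sequences stand in for the θ draws (an illustrative sketch, not the thesis's implementation):

```python
import random

def potential_scale_reduction(seqs):
    """R = V/W with sigma2_hat = ((d-1)/d) W + B/d and
    V = sigma2_hat + B/(m d), as in Gelman and Rubin (1992)."""
    m, d = len(seqs), len(seqs[0])
    means = [sum(s) / d for s in seqs]
    grand = sum(means) / m
    B = d * sum((mi - grand) ** 2 for mi in means) / (m - 1)   # between
    W = sum(sum((x - mi) ** 2 for x in s) / (d - 1)            # within
            for s, mi in zip(seqs, means)) / m
    sigma2_hat = (d - 1) / d * W + B / d
    V = sigma2_hat + B / (m * d)
    return V / W

random.seed(3)
seqs = [[random.gauss(0.0, 1.0) for _ in range(5000)] for _ in range(10)]
R = potential_scale_reduction(seqs)   # close to 1 for well-mixed sequences
```

Sequences started from very different regions that have not yet mixed would instead give R far above 1.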


Table 4.1. Bayes Estimates and Associated Posterior Standard Deviations

                 Known σ²                      Unknown σ²
  Priors   Bayes Estimate  Posterior SD   Bayes Estimate  Posterior SD
  Normal       184.31         10.19           183.70         11.47
  DE           187.89         12.16           187.09         13.14
  t(1)         193.38         15.01           192.53         15.79
  t(3)         188.48         12.63           187.69         13.60
  t(5)         186.58         11.51           185.93         12.60
  t(10)        185.41         10.75           184.68         11.91
  t(15)        184.94         10.50           184.30         11.75

Table 4.2. Potential Scale Reduction and Simulated Quantiles

               Potential scale reduction              Simulated quantiles
  Priors          R        97.5%        2.5%     25.0%     50.0%     75.0%     97.5%
  Known σ²
    Cauchy       1.00      1.00       160.00    171.98    183.09    198.17    226.89
    DE           1.00      1.00       158.68    168.54    176.00    185.55    206.82
  Unknown σ²
    Cauchy       1.00      1.00       157.54    170.38    181.46    196.96    226.55
    DE           1.00      1.00       156.63    167.27    174.63    184.55    206.39


CHAPTER 5
BAYESIAN ROBUSTNESS IN SMALL AREA ESTIMATION

5.1 Introduction

Small area estimation is becoming increasingly important in survey sampling due to a growing demand for reliable small area statistics from both the public and private sectors. In typical small area estimation problems, there exist a large number of small areas, but the samples available from an individual area are usually not adequate to achieve accuracy at a specified level. The reason behind this is that the original survey was designed to provide a specified accuracy at a much higher level of aggregation than that of the small areas. This makes it a necessity to "borrow strength" from related areas, through implicit or explicit models that connect the small areas, to find more accurate estimates for a given area or, simultaneously, for several areas. Ghosh and Rao (1994) have recently surveyed the early history as well as the recent developments in small area estimation. Like frequentist methods, Bayesian methods have also been applied very extensively to small area estimation problems. Particularly effective in this regard has been the hierarchical or empirical Bayes (HB or EB) approach, which is especially suited to a systematic connection of the small areas through models. For general discussion of the EB or HB methodology in the small area estimation context, we may refer to Fay and Herriot (1979), Ghosh and Meeden (1986), Ghosh and Lahiri (1987), Datta and Ghosh (1991), and Datta and Lahiri (1994), among others.


In this chapter, we propose an alternative Bayesian approach, namely the robust Bayes (RB) idea which has been discussed in the previous chapters in the context of a single stratum. Specifically, the HB procedure models the uncertainty in the prior information by assigning a single distribution (often noninformative or improper) to the prior parameters (usually called hyperparameters). Instead, as discussed in the earlier chapters, the RB procedure attempts to quantify the subjective information in terms of a class Γ of prior distributions. In order to study Bayesian robustness in the context of small area estimation, we consider the following hierarchical Bayes model.

(A) Conditional on θ, β, and τ², let Y₁, ..., Y_p be independently distributed with Y_i ~ N(θ_i, V_i), i = 1, ..., p, where the V_i's are known positive constants;

(B) Conditional on β and τ², θ₁, ..., θ_p are independently distributed with θ_i ~ N(x_iᵀβ, τ²), i = 1, ..., p, where x₁, ..., x_p are known regression vectors of dimension s and β is s × 1;

(C) β ~ uniform(R^s), and τ² is assumed to be independent of β, having a distribution h(τ²) which belongs to a certain class of distributions Γ.

We shall use the notations Y = (Y₁, ..., Y_p)ᵀ, θ = (θ₁, ..., θ_p)ᵀ, X = (x₁, ..., x_p)ᵀ. Write G = Diag{V₁, ..., V_p} and assume rank(X) = s. Cano (1993) considered a special case of this model with x_i = 1 and V_i = V for i = 1, ..., p.

The outline of the remaining sections is as follows. In Section 5.2, we choose Γ to be the ε-contamination class of priors, where the contamination class includes all unimodal distributions. We develop the robust hierarchical Bayes estimators of the small area means and the associated measures of accuracy (i.e., the posterior variances) based on type-II maximum likelihood priors. Also, we provide the range in which the small area means lie under the ε-contamination class.
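Stages (A) and (B) can be sketched as a forward simulation (a minimal illustration assuming NumPy; the values of p, s, β, τ², the x_i and the V_i are illustrative choices of ours — under the full model, stage (C) would place priors on β and τ² rather than fixing them):

```python
import numpy as np

rng = np.random.default_rng(1)
p, s = 6, 2
X = np.column_stack([np.ones(p), rng.normal(size=p)])  # known x_i (s = 2)
V = rng.uniform(0.5, 2.0, size=p)                      # known sampling variances
beta = np.array([1.0, 0.5])                            # fixed here for illustration
tau2 = 0.8

theta = rng.normal(X @ beta, np.sqrt(tau2))            # stage (B): small area means
Y = rng.normal(theta, np.sqrt(V))                      # stage (A): direct estimates
```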


In Section 5.3, we choose Γ to be the density ratio class of priors. As suggested by Wasserman and Kadane (1992), we use Gibbs sampling to compute bounds on posterior expectations over the density ratio class. In Section 5.4, we choose Γ to be the class of uniform priors on τ² with τ₁² ≤ τ² ≤ τ₂². We are interested in the sensitivity analysis of the posterior quantities over Γ. Finally, Section 5.5 contains the analysis of real data to illustrate the results of the preceding sections.

5.2 ε-Contamination Class

In this section, we consider the class Γ of priors of the form

Γ = {h : h = (1 − ε)h₀ + εq, q ∈ Q},   (5.2.1)

where 0 ≤ ε ≤ 1 is given, h₀ is the inverse gamma distribution with pdf

h₀(τ²) = [α₀^{β₀}/Γ(β₀)] (τ²)^{−(β₀+1)} exp(−α₀/τ²) I_{(0,∞)}(τ²),   (5.2.2)

denoted by IG(α₀, β₀), and Q is the class of all unimodal distributions with the same mode τ₀² as that of h₀. The joint (improper) pdf of Y, θ, β and τ² is given by

f(y, θ, β, τ²) ∝ exp[−½(y − θ)ᵀG⁻¹(y − θ)] (τ²)^{−p/2} exp[−(1/(2τ²))‖θ − Xβ‖²] × {(1 − ε)h₀(τ²) + εq(τ²)}.   (5.2.3)

Integrating with respect to β in (5.2.3), one finds the joint (improper) pdf of Y, θ, and τ² given by

f(y, θ, τ²) ∝ (τ²)^{−(p−s)/2} exp[−½(y − θ)ᵀG⁻¹(y − θ) − (1/(2τ²)) θᵀ(I_p − X(XᵀX)⁻¹Xᵀ)θ] × {(1 − ε)h₀(τ²) + εq(τ²)}.   (5.2.4)


Write Σ⁻¹ = G⁻¹ + (τ²)⁻¹(I_p − X(XᵀX)⁻¹Xᵀ). Then, one can write

(y − θ)ᵀG⁻¹(y − θ) + (τ²)⁻¹θᵀ(I_p − X(XᵀX)⁻¹Xᵀ)θ
  = θᵀΣ⁻¹θ − 2θᵀG⁻¹y + yᵀG⁻¹y
  = (θ − ΣG⁻¹y)ᵀΣ⁻¹(θ − ΣG⁻¹y) + yᵀ(G⁻¹ − G⁻¹ΣG⁻¹)y.   (5.2.5)

From (5.2.4) and (5.2.5), we have

E[θ|y, τ²] = ΣG⁻¹y;   V[θ|y, τ²] = Σ.   (5.2.6)

Using (5.2.5) and integrating out θ in (5.2.4), one gets the joint (improper) pdf of Y and τ² given by

f(y, τ²) ∝ |Σ|^{1/2} (τ²)^{−(p−s)/2} exp[−½ yᵀ(G⁻¹ − G⁻¹ΣG⁻¹)y] {(1 − ε)h₀(τ²) + εq(τ²)}.   (5.2.7)

We denote by m(y|h) the marginal distribution of y with respect to the prior h, namely

m(y|h) = ∫ f(y|τ²) h(dτ²).   (5.2.8)

For h ∈ Γ, we get

m(y|h) = (1 − ε) m(y|h₀) + ε m(y|q).   (5.2.9)

Our objective is to choose the ML-II prior ĥ which maximizes m(y|h) over Γ. This amounts to maximization of m(y|q) over q ∈ Q. Using the representation of each q ∈ Q as a mixture of uniform densities, the ML-II prior is given by

ĥ(τ²) = (1 − ε)h₀(τ²) + εq̂(τ²),   (5.2.10)

where q̂ is uniform(τ₀², τ₀² + ẑ), ẑ being the solution of the equation

f(y|ẑ) = (1/ẑ) ∫_{τ₀²}^{τ₀²+ẑ} f(y|τ²) dτ²,   (5.2.11)


and τ₀² is the unique mode of h₀(τ²). Note that

f(y|τ²) ∝ |Σ|^{1/2} (τ²)^{−(p−s)/2} exp[−½ yᵀ(G⁻¹ − G⁻¹ΣG⁻¹)y].   (5.2.12)

Write u_i = V_i/(V_i + τ²), i = 1, ..., p, and D = Diag{1 − u₁, ..., 1 − u_p}. Then, on simplification, it follows that

Σ = τ²(I_p − D) + τ²(I_p − D)X(XᵀDX)⁻¹Xᵀ(I_p − D);   (5.2.13)

ΣG⁻¹ = D + (I_p − D)X(XᵀDX)⁻¹XᵀD;   (5.2.14)

ΣG⁻¹y = [(1 − u₁)y₁ + u₁x₁ᵀβ̃, ..., (1 − u_p)y_p + u_p x_pᵀβ̃]ᵀ,   (5.2.15)

where β̃ = (XᵀDX)⁻¹(XᵀDy). Then

G⁻¹ − G⁻¹ΣG⁻¹ = (τ²)⁻¹[D − DX(XᵀDX)⁻¹XᵀD].   (5.2.16)

Hence,

yᵀ(G⁻¹ − G⁻¹ΣG⁻¹)y = (τ²)⁻¹[yᵀDy − (yᵀDX)(XᵀDX)⁻¹(XᵀDy)].   (5.2.17)

Write

Q_{τ²}(y) = (τ²)⁻¹[yᵀDy − (yᵀDX)(XᵀDX)⁻¹(XᵀDy)],   (5.2.18)

and note that

|Σ| = (τ²)^p {∏_{i=1}^p u_i} |XᵀX| / |XᵀDX|.   (5.2.19)
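A small numerical sketch of (5.2.13)-(5.2.15) (assuming NumPy; the function name and the toy data are ours): given τ², the conditional posterior mean shrinks each y_i toward the fitted value x_iᵀβ̃ by the factor u_i.

```python
import numpy as np

def conditional_posterior_moments(y, X, V, tau2):
    """E[theta_i | y, tau^2] and Var(theta_i | y, tau^2), per (5.2.13)-(5.2.15)."""
    u = V / (V + tau2)                                  # u_i = V_i/(V_i + tau^2)
    w = 1.0 - u                                         # diagonal of D
    XtDX = X.T @ (w[:, None] * X)
    beta_tilde = np.linalg.solve(XtDX, X.T @ (w * y))   # GLS fit beta~
    mean = (1 - u) * y + u * (X @ beta_tilde)           # shrinkage estimator
    XtDX_inv = np.linalg.inv(XtDX)
    var = tau2 * u + tau2 * u**2 * np.einsum('ij,jk,ik->i', X, XtDX_inv, X)
    return mean, var

y = np.array([10.0, 12.0, 8.0, 11.0])
X = np.ones((4, 1))                  # intercept-only, as in Cano (1993)
V = np.array([1.0, 1.0, 1.0, 1.0])
mean, var = conditional_posterior_moments(y, X, V, tau2=1.0)
# Equal V_i = tau^2 = 1: each estimate lies halfway between y_i and ybar = 10.25.
```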

It is clear from (5.2.18) and (5.2.19) that

f(y|τ²) ∝ (τ²)^{s/2} {∏_{i=1}^p (τ² + V_i)^{−1/2}} |XᵀDX|^{−1/2} exp[−½ Q_{τ²}(y)].   (5.2.20)

Now, to find the solution ẑ to the equation (5.2.11), we consider

z f(y|z) = ∫_{τ₀²}^{τ₀²+z} f(y|τ²) dτ².   (5.2.21)

By differentiating both sides with respect to z, we get

f(y|z) + z (d/dz) f(y|z) = f(y|τ₀² + z).   (5.2.22)

By Lemma A.4.4, Lemma A.4.5 and Theorem A.4.6 in Anderson (1984), recall that for an s × s matrix A = (a_ij),

∂|A|/∂a_ij = A_ij,   (5.2.23)

where A_ij is the cofactor of a_ij, so that

(d/dz)|A| = Σ_{i=1}^s Σ_{j=1}^s A_ij (d a_ij/dz).   (5.2.24)

Write A = Σ_{t=1}^p (z + V_t)⁻¹ x_t x_tᵀ, and note that when τ² = z we have z^{s/2}|XᵀDX|^{−1/2} = |A|^{−1/2}, so that f(y|z) ∝ {∏_{t=1}^p (z + V_t)^{−1/2}} |A|^{−1/2} exp[−½H(z)]. Then, after some calculations using (5.2.23) and (5.2.24), it can be shown that

(d/dz) f(y|z) = −½ f(y|z) [Σ_{t=1}^p (z + V_t)⁻¹ + |A|⁻¹ (d/dz)|A| + (d/dz) H(z)],   (5.2.25)

where

H(z) = Q_z(y) = Σ_{t=1}^p (z + V_t)⁻¹ y_t² − b(z)ᵀ A⁻¹ b(z),  with b(z) = Σ_{t=1}^p (z + V_t)⁻¹ y_t x_t.   (5.2.26)

Substituting (5.2.25) into (5.2.22) yields the equation

f(y|z) {1 − (z/2)[Σ_{t=1}^p (z + V_t)⁻¹ + |A|⁻¹ (d/dz)|A| + (d/dz) H(z)]} = f(y|τ₀² + z),   (5.2.27)

which can be solved numerically for ẑ. To summarize, the ML-II prior is ĥ(τ²) = (1 − ε)h₀(τ²) + εq̂(τ²), where h₀ is the IG(α₀, β₀) prior (5.2.2) with α₀ > 0 and β₀ > 0, and q̂ is uniform(τ₀², τ₀² + ẑ), with ẑ being the solution of (5.2.27) and τ₀² = α₀/(β₀ + 1). It is clear from (5.2.20) that

ĥ(τ²|y) ∝ (τ²)^{s/2} {∏_{i=1}^p (τ² + V_i)^{−1/2}} |XᵀDX|^{−1/2} exp[−½ Q_{τ²}(y)] {(1 − ε)h₀(τ²) + εq̂(τ²)}.   (5.2.28)
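Rather than working with the closed-form derivative in (5.2.25), ẑ can also be approximated by directly maximizing (1/z)∫_{τ₀²}^{τ₀²+z} f(y|τ²)dτ² over a grid of z values — a minimal numerical sketch (assuming NumPy; the toy data, the grid, and the trapezoidal quadrature are illustrative choices of ours):

```python
import numpy as np

def log_marginal(y, X, V, tau2):
    # log f(y | tau^2) up to an additive constant, equivalent to (5.2.20)
    w = 1.0 / (tau2 + V)                          # 1/(tau^2 + V_i)
    A = X.T @ (w[:, None] * X)                    # sum_i x_i x_i'/(tau^2 + V_i)
    b = X.T @ (w * y)
    Q = np.sum(w * y**2) - b @ np.linalg.solve(A, b)
    return -0.5 * (np.sum(np.log(tau2 + V)) + np.linalg.slogdet(A)[1] + Q)

def ml2_z(y, X, V, tau0_sq, z_grid, n_quad=200):
    # pick z maximizing the averaged marginal (1/z) * int f(y|t) dt
    best_z, best_val = None, -np.inf
    ref = log_marginal(y, X, V, tau0_sq)          # for numerical stability
    for z in z_grid:
        ts = np.linspace(tau0_sq, tau0_sq + z, n_quad)
        fs = np.exp([log_marginal(y, X, V, t) - ref for t in ts])
        val = np.sum((fs[1:] + fs[:-1]) / 2 * np.diff(ts)) / z   # trapezoid rule
        if val > best_val:
            best_z, best_val = z, val
    return best_z

y = np.array([1.0, -0.5, 2.0, 0.3, -1.2])
X = np.ones((5, 1))
V = np.ones(5)
# tau0^2 = alpha0/(beta0 + 1) = 1/11 for the IG(1, 10) prior
z_hat = ml2_z(y, X, V, tau0_sq=1/11, z_grid=np.linspace(0.05, 5.0, 60))
```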


Now, writing U_i = V_i/(τ² + V_i) (i = 1, ..., p), using (5.2.28) and the iterated formulas for conditional expectations and variances, one gets

E[θ_i|y] = E[E(θ_i|y, τ²)|y] = E[(1 − U_i)y_i + U_i x_iᵀβ̃ | y]   (5.2.29)

and

V[θ_i|y] = V[E(θ_i|y, τ²)|y] + E[V(θ_i|y, τ²)|y]
  = V[(1 − U_i)y_i + U_i x_iᵀβ̃ | y] + E[τ² U_i + τ² U_i² x_iᵀ(XᵀDX)⁻¹x_i | y]
  = V[U_i(y_i − x_iᵀβ̃) | y] + E[V_i(1 − U_i) + V_i U_i(1 − U_i) x_iᵀ(XᵀDX)⁻¹x_i | y].   (5.2.30)

Thus, the posterior distribution of θ under the ML-II prior is obtained using (5.2.4) and (5.2.28). In addition, one uses (5.2.29) and (5.2.30) to find the posterior means and variances of the θ_i under this prior. Similarly, by using the iterated formulas, posterior covariances may be obtained as well.

Next, we consider the problem of finding the range of the posterior mean of θ_i over Γ in (5.2.1). Using the expression h(τ²|y) ∝ f(y|τ²)h(τ²), we have

E[θ_i|y] = y_i − E[U_i(y_i − x_iᵀβ̃)|y] = y_i − [∫₀^∞ U_i(y_i − x_iᵀβ̃) h(τ²|y) dτ²] / [∫₀^∞ h(τ²|y) dτ²].   (5.2.31)

Simple modifications of the arguments of Sivaganesan and Berger (1989) or Cano (1993) lead to the following result:

sup(inf)_{h∈Γ} E[θ_i|y] = sup(inf)_{z>0} { [A + (1/z)∫_{τ₀²}^{τ₀²+z} {y_i − U_i(y_i − x_iᵀβ̃)} f(y|τ²) dτ²] / [B + (1/z)∫_{τ₀²}^{τ₀²+z} f(y|τ²) dτ²] },   (5.2.32)

where

B = ((1 − ε)/ε) ∫₀^∞ f(y|τ²) h₀(τ²) dτ²   (5.2.33)


and

A = y_i B − ((1 − ε)/ε) ∫₀^∞ U_i(y_i − x_iᵀβ̃) f(y|τ²) h₀(τ²) dτ².   (5.2.34)

The above sup (inf) can be obtained by numerical optimization. These formulas will be used in Section 5.5 in the context of estimation of median income of four-person families.

5.3 Density Ratio Class

In this section, we consider a class of priors, introduced by DeRobertis and Hartigan (1981) and called a density ratio class by Berger (1990),

Γ_{l,u} = {h : h(τ²) l(τ'²) ≤ u(τ²) h(τ'²) for all τ² and τ'²},   (5.3.1)

where l and u are two bounded nonnegative functions such that l(τ²) ≤ u(τ²) for all τ². This class can be viewed as specifying ranges for the ratios of the prior density between any two points. By taking u = k h₀ and l = h₀, we obtain the very interesting subclass

Γ_k(h₀) = {h : h(τ²) h₀(τ'²) ≤ k h₀(τ²) h(τ'²) for all τ² and τ'²},   (5.3.2)

where k ≥ 1 is a constant. This class may be thought of as a neighborhood of the target prior h₀. The interpretation is that the odds of any pair of points are not misspecified by more than a factor of k. This class is especially useful when h₀ is a default prior chosen mainly for convenience.

Because of the expression h(τ²|y) ∝ f(y|τ²)h(τ²), we can view our problem as having just the parameter τ², with h the prior and f(y|τ²) the likelihood. Since

E[θ_i|y] = y_i − E[ {V_i/(τ² + V_i)} {y_i − x_iᵀ (Σ_{t=1}^p (τ² + V_t)⁻¹ x_t x_tᵀ)⁻¹ Σ_{t=1}^p (τ² + V_t)⁻¹ y_t x_t} | y ],   (5.3.3)


our problem reduces to finding

sup(inf)_{h ∈ Γ_k(h₀)} E[b(τ²)|y],   (5.3.4)

where b(τ²) = {V_i/(τ² + V_i)} {y_i − x_iᵀ (Σ_{t=1}^p (τ² + V_t)⁻¹ x_t x_tᵀ)⁻¹ Σ_{t=1}^p (τ² + V_t)⁻¹ y_t x_t}. Wasserman and Kadane (1992) have developed a Monte Carlo technique which can be used to bound posterior expectations over the density ratio class. Wasserman (1992) has shown that the set of posteriors obtained by applying Bayes' theorem to the density ratio class Γ_k(h₀) is the density ratio class Γ_k(h₀^y), where h₀^y(τ²) = h₀(τ²|y) is the posterior corresponding to h₀. To see this, all we need to do is to write

Γ_k(h₀^y) = {h_y : h_y(τ²) h₀^y(τ'²) ≤ k h₀^y(τ²) h_y(τ'²) for all τ² and τ'²},   (5.3.5)

and observe that h(τ²|y)/h(τ'²|y) = [f(y|τ²)h(τ²)]/[f(y|τ'²)h(τ'²)] and h₀(τ²|y)/h₀(τ'²|y) = [f(y|τ²)h₀(τ²)]/[f(y|τ'²)h₀(τ'²)], so that h ∈ Γ_k(h₀) is equivalent to h_y ∈ Γ_k(h₀^y), where h_y(τ²) = h(τ²|y). Wasserman (1992) calls this the Bayes invariance property. Hence, to bound the posterior expectation of b(τ²), we need only bound the expectation of b(τ²) over Γ_k(h₀^y). To do so, we will need to draw a sample from the posterior h₀^y. Following Wasserman and Kadane (1992), we can rely on recent sampling-based methods for Bayesian inference. Note in this case that

h₀(τ²|y) ∝ (τ²)^{s/2} {∏_{i=1}^p (τ² + V_i)^{−1/2}} |XᵀDX|^{−1/2} exp[−½ Q_{τ²}(y)] h₀(τ²).   (5.3.6)

Let τ₁², ..., τ_N² be a random sample from h₀(τ²|y). Let b_i = b(τ_i²), i = 1, ..., N, and let b_(1) ≤ b_(2) ≤ ... ≤ b_(N) be the corresponding order statistics. Also, let c_i = −b(τ_i²), i = 1, ..., N, and let c_(1) ≤ c_(2) ≤ ... ≤ c_(N) be the corresponding order statistics. Let b̄ = Σ_{i=1}^N b_i/N and c̄ = Σ_{i=1}^N c_i/N. Then, following Wasserman and Kadane (1992), we get

sup_{h ∈ Γ_k(h₀)} E[b(τ²)|y] = sup_{h_y ∈ Γ_k(h₀^y)} E[b(τ²)|y]


  ≈ max_{1≤j≤N} [Σ_{i=1}^j b_(i) + k Σ_{i=j+1}^N b_(i)] / [j + k(N − j)],   (5.3.7)

and similarly, using inf_{h∈Γ_k(h₀)} E[b(τ²)|y] = −sup_{h∈Γ_k(h₀)} E[−b(τ²)|y],

inf_{h ∈ Γ_k(h₀)} E[b(τ²)|y] ≈ −max_{1≤j≤N} [Σ_{i=1}^j c_(i) + k Σ_{i=j+1}^N c_(i)] / [j + k(N − j)].   (5.3.8)

In particular, when k = 1 both bounds reduce to the plain Monte Carlo averages b̄ and −c̄. These bounds are applied to the median income data in Section 5.5.
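A sketch of the Monte Carlo bound above (assuming NumPy; the function name is ours, and the normal draws below merely stand in for the b(τ_i²) evaluated at Gibbs draws from h₀(τ²|y)). The upper bound upweights the larger order statistics by the factor k, the lower bound the smaller ones:

```python
import numpy as np

def wk_bounds(b, k):
    """Bounds on E[b|y] over Gamma_k(h0) from posterior draws, following the
    order-statistics formula of Wasserman and Kadane (1992)."""
    b = np.sort(np.asarray(b, dtype=float))
    N = len(b)
    cum = np.concatenate([[0.0], np.cumsum(b)])   # cum[j] = sum of j smallest
    j = np.arange(N + 1)
    hi = np.max((cum[j] + k * (cum[N] - cum[j])) / (j + k * (N - j)))
    lo = np.min((k * cum[j] + (cum[N] - cum[j])) / (k * j + (N - j)))
    return lo, hi

rng = np.random.default_rng(2)
draws = rng.normal(size=2000)       # stand-in for b(tau_i^2), i = 1, ..., N
lo2, hi2 = wk_bounds(draws, k=2)
lo6, hi6 = wk_bounds(draws, k=6)
```

The bounds bracket the plain Monte Carlo mean and widen as k grows; the corresponding range of E[θ_i|y] then follows from E[θ_i|y] = y_i − E[b(τ²)|y].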

5.4 Class of Uniform Priors

In this section, we choose Γ to be the class of uniform priors on τ²,

Γ_U = {h : h = uniform(τ₁², τ₂²)},

where τ₁² and τ₂² are arbitrary nonnegative numbers such that τ₁² < τ₂². This class of priors is attractive because of its mathematical convenience, and it indeed gives a good enough class of priors for an honest robustness check. Classes of conjugate priors having parameters in certain ranges have been studied by Leamer (1982) and Polasek (1985). From (5.2.6) and (5.2.15), we have

sup(inf)_{τ₁² ≤ τ² ≤ τ₂²} E[θ_i|y, τ²] = sup(inf)_{τ₁² ≤ τ² ≤ τ₂²} [(1 − u_i)y_i + u_i x_iᵀβ̃],   (5.4.1)

which can be evaluated numerically over the range of τ².
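A sketch of this sensitivity check (assuming NumPy; the data are illustrative and the model is intercept-only): evaluate the conditional posterior mean on a grid of τ² values in [τ₁², τ₂²] and record the extremes for each small area.

```python
import numpy as np

y = np.array([10.0, 12.0, 8.0, 11.0])
V = np.array([1.0, 2.0, 0.5, 1.5])

def cond_mean(tau2):
    # E[theta | y, tau^2] for x_i = 1, per (5.2.15)
    w = 1.0 / (tau2 + V)
    beta = np.sum(w * y) / np.sum(w)          # GLS fit (weighted mean here)
    u = V / (tau2 + V)
    return (1 - u) * y + u * beta

grid = np.linspace(0.01, 10.0, 500)           # tau1^2 = .01, tau2^2 = 10
means = np.array([cond_mean(t) for t in grid])
lo, hi = means.min(axis=0), means.max(axis=0)  # range for each small area
```

Small ranges (hi − lo) indicate that the conditional posterior means are insensitive to τ² over the stated interval, which is exactly what Table 5.6 reports for the income data.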

5.5 Estimation of Median Income of Four-Person Families

The Current Population Survey (CPS) provides annual estimates of the statewide median income of four-person families for the preceding year. In addition, once every ten years, similar figures are obtained from the decennial census for the calendar year immediately preceding the census year. The latter set of estimates is believed to be much more accurate, and serves as the "gold standard" against which other estimates need to be tested. Direct use of the CPS estimates was found undesirable because of the large coefficients of variation associated with them (cf. Fay (1987)). The regression method used by the Bureau of the Census was intended to rectify this drawback. The method in use since 1985 is as follows. The current year CPS estimates of statewide median incomes of four-person families are used as dependent variables in a linear regression. In addition to the intercept term, the regression equation uses as independent variables the base year census median(b) and the adjusted census median(c), where census median(b) represents the median income of four-person families in the state for the base year b from the most recently available decennial census. The adjusted census median(c) is obtained from the formula

Adjusted census median(c) = [BEA PCI(c)/BEA PCI(b)] × Census median(b).   (5.5.1)

In the above, BEA PCI(c) and BEA PCI(b) represent estimates of per-capita income produced by the Bureau of Economic Analysis of the U.S. Department of Commerce for the current year c and the base year b, respectively. Thus, (5.5.1) attempts to adjust the base year census median by the proportional growth in the BEA PCI to arrive at the current year adjusted median. The inclusion of the census median(b) as a second independent variable is to adjust for any possible overstatement of the effect of change in the BEA PCI upon the median income of four-person families.
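The adjustment (5.5.1) is a simple proportional scaling, which can be sketched as follows (the dollar figures are hypothetical, chosen only to illustrate the arithmetic):

```python
def adjusted_census_median(census_median_b, bea_pci_c, bea_pci_b):
    """Formula (5.5.1): scale the base-year census median by BEA PCI growth."""
    return census_median_b * bea_pci_c / bea_pci_b

# 20% per-capita income growth inflates a $30,000 base-year median to $36,000.
adjusted = adjusted_census_median(30000, 24000, 20000)
```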
Finally, a weighted average of the CPS sample estimate of the current median income and the regression estimate is obtained, weighting the sample estimate inversely proportional to its sampling variance and the regression estimate inversely


proportional to the model error due to the lack of fit for the census values of 1979 by the model with 1969 used as the base year.

The data consist of Y_i, the sample median income of four-person families in state i, and the associated variance V_i (i = 1, ..., 51). The true medians corresponding to the Y_i are denoted by θ_i, respectively. The design matrix X = (x₁, ..., x₅₁)ᵀ is of the form

x_iᵀ = (1, x_{i1}, x_{i2}),  (i = 1, ..., 51),   (5.5.2)

where x_{i1} and x_{i2} denote respectively the adjusted census median income and the base-year census median income for four-person families in the i-th state. We consider the HB model as given in Section 5.1. We choose Γ to be the ε-contamination class of priors, where the contamination class includes all unimodal distributions. We find RB estimates of the statewide median incomes of four-person families for 1989, using 1979 as the base year, based on ML-II priors. In finding the RB estimates, we have used (5.2.29) and (5.2.30) with α₀ = 1, β₀ = 10 and ε = .1. The alternatives in our setting are the HB and EB estimates of θ_i. For the HB analysis, we use a diffuse prior on τ² instead of using a class of priors. The EB method is due to Morris (1983b), and uses estimates of τ² and β rather than assigning any prior distribution to the hyperparameters. Since, in addition to the sample estimates, the 1989 incomes are also available from the 1990 census data, we compare all three Bayes estimates against the 1990 census figures, treating the latter as the "truth". Table 5.1 provides the "truth", the CPS estimates, and the three Bayes estimates, whereas Table 5.2 provides the standard errors associated with the three Bayes estimates. These estimates are now compared according to several criteria. Suppose e_{iTR} denotes the true median income for the i-th state, and e_i any estimate of e_{iTR}. Then, for the estimate e = (e₁, ..., e₅₁)ᵀ of e_{TR} = (e_{1TR}, ..., e_{51TR})ᵀ, let

Average Relative Bias = (51)⁻¹ Σ_{i=1}^{51} |e_i − e_{iTR}|/e_{iTR},


Average Squared Relative Bias = (51)⁻¹ Σ_{i=1}^{51} |e_i − e_{iTR}|²/e_{iTR}²,

Average Absolute Bias = (51)⁻¹ Σ_{i=1}^{51} |e_i − e_{iTR}|,

Average Squared Deviation = (51)⁻¹ Σ_{i=1}^{51} (e_i − e_{iTR})².

The above quantities are provided in Table 5.3 for all three Bayes estimates. From Table 5.1, one can see that the three Bayes estimates of the small area means are quite close to each other. But from Table 5.2, it appears that the estimated standard errors given by the RB method are larger than those given by the HB or EB methods. Table 5.3 indicates that the EB estimates are the best and the HB estimates the second best under all criteria for this data set. As anticipated, the HB standard errors are slightly bigger than the EB standard errors. As is well known, this phenomenon can be explained by the fact that, unlike the EB estimators, the HB estimators take into account the uncertainty in estimating the prior parameters.

Next, we find the ranges of the posterior means for the small areas under the ε-contamination class. Tables 5.4a and 5.4b provide the ranges of posterior means under the ε-contamination class when (α₀, β₀) = (1, 10) and (α₀, β₀) = (7, 3), respectively. For (α₀, β₀) = (1, 10), the ranges are fairly small, and robustness seems to be achieved using this class for all ε values. But for (α₀, β₀) = (7, 3), the ranges are relatively wider in comparison with (α₀, β₀) = (1, 10). As one might expect, the choice of h₀, that is, the elicited inverse gamma prior, seems to have some effect on the ranges of the posterior means for the ε-contamination class. Note that the inverse gamma prior for τ² with (α₀, β₀) = (1, 10) has coefficient of variation 1/√8, compared to 1 for (α₀, β₀) = (7, 3). Although the two priors have very similar tails, the former is much flatter than the latter even for small values of τ². This suggests that the bigger the


Figure 5.1. The IG(1, 10) Prior

coefficient of variation of the assumed inverse gamma prior, the wider is the range of the posterior means of the θ_i's.

We also find the ranges of the posterior means under density ratio classes. In computing the bounds, we have used the Gibbs sampler with 10 independent sequences, each with a sample of size 1000, after a burn-in sample of another 1000. We adopt the basic approach of Gelman and Rubin (1992) to monitor the convergence of the Gibbs sampler. We have obtained the R values (the potential scale reduction factors) corresponding to the estimands θ_i based on 10 × 1000 = 10000 simulated values. The fact that all the point estimates R are equal to 1, as well as the near equality of these point estimates and the corresponding 97.5% quantiles, suggests that convergence is achieved in the Gibbs sampler. Tables 5.5a and 5.5b provide the ranges of the


Figure 5.2. The IG(7, 3) Prior

posterior means under density ratio classes when (α₀, β₀) = (1, 10) and (α₀, β₀) = (7, 3), respectively. Note, however, that here the bounds given for IG(7, 3) are much closer than the ones for IG(1, 10). The intuitive explanation for this phenomenon is that, while the ratio h₀(τ²)/h₀(τ'²) can be extremely large for certain choices (τ², τ'²) corresponding to the IG(1, 10) prior, the ratio is more under control for the IG(7, 3) prior. The density ratio classes are very convenient for representing vague prior knowledge, and robustness seems to be achieved using the IG(7, 3) prior. Finally, we consider the ranges of the posterior quantities E[θ_i|y, τ²] over the class of uniform priors. From Table 5.6, we can see that the ranges are not sensitive to


the choice of the upper bound of τ². Also, for most of the states, the ranges of the posterior means do not seem to be too wide.
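The four comparison criteria used in Table 5.3 can be sketched as follows (assuming NumPy; the dictionary keys and function name are ours, and the sample figures are the census "truth" and RB estimates for the first three states of Table 5.1):

```python
import numpy as np

def comparison_criteria(e, e_tr):
    """Average relative bias, squared relative bias, absolute bias, and
    squared deviation of estimates e against the 'truth' e_tr."""
    e, e_tr = np.asarray(e, float), np.asarray(e_tr, float)
    d = np.abs(e - e_tr)
    return {"avg_rel_bias": np.mean(d / e_tr),
            "avg_sq_rel_bias": np.mean((d / e_tr) ** 2),
            "avg_abs_bias": np.mean(d),
            "avg_sq_dev": np.mean(d ** 2)}

# RB estimates vs census truth, states 1-3 of Table 5.1
crit = comparison_criteria([38027, 48652, 40319], [38035, 46709, 39601])
```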


Table 5.1 Median Income Estimates for 4-Person Family in 1989 (in Dollars)

State   Census   CPS     RB      HB      EB
1       38035    37700   38027   38004   38011
2       46709    52720   48652   48389   48222
3       39601    41489   40319   40170   40109
4       51498    55729   53294   52986   52865
5       45072    47106   43905   43727   43607
6       55115    55048   53224   53092   53015
7       45757    43434   43800   43853   43869
8       54398    53128   53292   53300   53310
9       40833    40990   40820   40780   40774
10      40092    44241   42874   42657   42596
11      39727    40297   39684   39624   39594
12      43960    42590   42762   42792   42800
13      43166    43260   42973   42924   42913
14      40635    42226   41196   41065   41015
15      43212    45432   43653   43507   43425
16      36880    38711   37871   37761   37720
17      37674    39232   39012   38981   38969
18      33426    36966   35381   35171   35093
19      32289    32610   32447   32406   32398
20      36519    40894   38789   38543   38438
21      37997    42781   39232   38960   38798
22      46819    39819   41551   41651   41724
23      51082    48252   49791   49846   49898
24      43279    36587   40165   40294   40411
25      43223    49718   45988   45775   45632


Table 5.1 (continued)

State   Census   CPS     RB      HB      EB
26      31071    31604   32124   32179   32204
27      37139    37423   37555   37562   37569
28      35901    31834   34314   34510   34627
29      39036    45876   41845   41612   41455
30      38296    37194   37735   37828   37849
31      32566    32186   33603   33712   33778
32      35383    31327   33663   33907   34022
33      34153    32245   34021   34179   34265
34      29987    31330   31863   31900   31925
35      30673    33317   33015   32952   32935
36      32023    36219   35230   35147   35101
37      33111    40358   36251   35997   35827
38      36513    32679   33806   34008   34053
39      32677    29586   32177   32446   32574
40      33385    32227   33025   33110   33149
41      36619    38931   36089   35917   35798
42      40956    36732   39005   39178   39284
43      31046    29761   31261   31385   31456
44      37322    38969   38885   38863   38855
45      36567    37938   37056   36974   36932
46      41289    34579   38180   38517   38694
47      41870    42392   41666   41587   41552
48      37596    41419   39505   39336   39245
49      46088    41121   41669   41765   41788
50      51127    41379   46512   46890   47123
51      47341    44999   44395   44366   44347


Table 5.2 Estimated Standard Errors of Median Income Estimates for 4-Person Family by State in 1989 (in Dollars)

State   SD.RB   SD.HB   SD.EB
1       2300    2207    2152
2       2594    2526    2470
3       2120    2054    2005
4       2009    2029    1992
5       2546    2437    2352
6       2557    2458    2388
7       1361    1336    1329
8       1696    1661    1649
9       1399    1366    1356
10      1590    1595    1567
11      2119    2028    1976
12      1533    1494    1482
13      1420    1390    1381
14      1840    1790    1761
15      2227    2141    2087
16      1818    1766    1739
17      2328    2211    2131
18      1836    1816    1787
19      1759    1706    1683
20      1950    1929    1895
21      2285    2239    2209
22      2349    2240    2172
23      2675    2545    2449
24      2558    2435    2368
25      2536    2441    2372


Table 5.2 (continued)

State   SD.RB   SD.HB   SD.EB
26      1833    1772    1748
27      1357    1327    1321
28      2141    2082    2058
29      2540    2450    2383
30      1271    1257    1249
31      2102    2023    1986
32      1934    1917    1903
33      2025    1965    1938
34      1905    1840    1815
35      1932    1868    1836
36      2258    2160    2098
37      2447    2378    2338
38      1393    1412    1385
39      1980    1967    1948
40      1822    1765    1741
41      2504    2432    2405
42      2233    2155    2109
43      2086    2015    1981
44      2584    2440    2316
45      2205    2108    2047
46      2132    2139    2124
47      1952    1884    1850
48      2164    2092    2046
49      1410    1388    1376
50      2710    2718    2711
51      2599    2460    2346


Table 5.3 Average Relative Bias, Average Squared Relative Bias, Average Absolute Bias, and Average Squared Deviations of the Estimates

Estimate   Avg Rel Bias   Avg Sq Rel Bias   Avg Abs Bias   Avg Sq Dev
RB         .0399          .0025             1599.98        4115162.29
HB         .0376          .0022             1507.79        3690921.55
EB         .0365          .0021             1464.39        3489870.43


Table 5.4a Range of Posterior Mean of θ_i for ε-Contamination Class when α₀ = 1 and β₀ = 10 (In Dollars)

            ε = .1           ε = .2           ε = .5           ε = 1
State    inf     sup      inf     sup      inf     sup      inf     sup
1        38022   38028    38018   38029    38007   38031    37991   38036
2        48572   48677    48502   48705    48324   48800    48081   49112
3        40277   40330    40241   40344    40147   40388    40018   40528
4        53208   53318    53133   53345    52940   53436    52673   53722
5        43850   43922    43802   43942    43680   44009    43513   44229
6        53184   53235    53150   53249    53062   53295    52941   53444
7        43796   43814    43791   43827    43777   43859    43732   43905
8        53291   53295    53289   53297    53285   53304    53271   53313
9        40810   40822    40801   40825    40777   40834    40745   40862
10       42816   42890    42765   42907    42633   42966    42449   43145
11       39667   39689    39651   39695    39612   39714    39559   39776
12       42760   42770    42757   42777    42749   42795    42724   42821
13       42960   42976    42949   42980    42920   42992    42879   43030
14       41160   41206    41128   41218    41046   41257    40933   41377
15       43610   43666    43572   43680    43476   43730    43344   43891
16       37841   37879    37814   37889    37746   37921    37651   38021
17       39004   39015    38996   39018    38976   39027    38949   39056
18       35323   35397    35272   35415    35141   35477    34961   35667
19       32436   32450    32427   32453    32403   32463    32369   32493
20       38720   38809    38659   38831    38503   38906    38289   39142


Table 5.4a (continued) (In Dollars)

            ε = .1           ε = .2           ε = .5           ε = 1
State    inf     sup      inf     sup      inf     sup      inf     sup
21       39150   39256    39079   39284    38897   39380    38650   39689
22       41542   41583    41530   41610    41492   41679    41364   41773
23       49785   49810    49778   49826    49753   49867    49669   49921
24       40151   40209    40135   40246    40078   40341    39886   40468
25       45922   46008    45865   46032    45719   46111    45520   46374
26       32120   32140    32114   32153    32097   32189    32043   32237
27       37554   37557    37553   37559    37550   37565    37539   37572
28       34297   34373    34277   34424    34208   34555    33987   34733
29       41773   41867    41710   41892    41551   41979    41334   42265
30       37728   37759    37721   37781    37697   37837    37625   37915
31       33593   33636    33582   33665    33544   33737    33419   33836
32       33643   33733    33620   33794    33542   33951    33298   34166
33       34007   34067    33991   34108    33938   34212    33768   34354
34       31859   31874    31855   31884    31842   31909    31797   31943
35       32998   33020    32983   33025    32945   33041    32892   33092
36       35205   35237    35184   35245    35129   35273    35055   35364
37       36173   36275    36104   36303    35930   36398    35694   36708
38       33792   33859    33777   33905    33726   34026    33570   34194
39       32155   32254    32129   32322    32043   32495    31772   32733
40       33018   33049    33010   33071    32984   33125    32899   33200


Table 5.4a (continued) (In Dollars)

            ε = .1           ε = .2           ε = .5           ε = 1
State    inf     sup      inf     sup      inf     sup      inf     sup
41       36036   36106    35989   36125    35871   36190    35710   36404
42       38989   39057    38971   39103    38910   39219    38711   39377
43       31250   31298    31237   31330    31194   31412    31057   31525
44       38879   38886    38873   38888    38860   38894    38841   38913
45       37032   37063    37011   37071    36958   37098    36885   37184
46       38151   38278    38118   38365    38006   38585    37649   38887
47       41644   41672    41624   41679    41574   41704    41505   41781
48       39456   39520    39412   39537    39301   39593    39149   39775
49       41662   41694    41654   41716    41630   41774    41554   41854
50       46477   46626    46437   46726    46302   46980    45864   47327
51       44386   44398    44378   44401    44358   44411    44331   44447


Table 5.4b Range of Posterior Mean of θ_i for ε-Contamination Class when α₀ = 7 and β₀ = 3 (In Dollars)

            ε = .1           ε = .2           ε = .5           ε = 1
State    inf     sup      inf     sup      inf     sup      inf     sup
1        37895   38034    37890   38034    37881   38034    37813   38034
2        48389   51266    48389   51311    48389   51389    48389   51931
3        40170   41215    40170   41225    40170   41243    40170   41355
4        52986   55129    52986   55151    52986   55188    52986   55430
5        43727   45865    43727   45902    43727   45966    43727   46417
6        53092   54424    53092   54444    53092   54478    53092   54713
7        43522   43853    43519   43853    43513   43853    43478   43853
8        53179   53300    53177   53300    53174   53300    53154   53300
9        40780   40962    40780   40964    40780   40966    40780   40977
10       42657   43938    42657   43949    42657   43968    42657   44092
11       39624   40123    39624   40129    39624   40139    39624   40208
12       42623   42792    42622   42792    42619   42792    42606   42792
13       42924   43195    42924   43198    42924   43202    42924   43228
14       41065   41971    41065   41980    41065   41996    41065   42098
15       43507   44876    43507   44895    43507   44926    43507   45141
16       37761   38506    37761   38514    37761   38527    37761   38609
17       38981   39191    38981   39192    38981   39196    38981   39214
18       35171   36585    35171   36599    35171   36623    35171   36777
19       32406   32596    32406   32597    32406   32598    32406   32606
20       38543   40353    38543   40372    38543   40405    38543   40622


Table 5.4b (continued) (In Dollars)

            ε = .1           ε = .2           ε = .5           ε = 1
State    inf     sup      inf     sup      inf     sup      inf     sup
21       38960   41642    38960   41680    38960   41744    38960   42181
22       40445   41651    40426   41651    40392   41651    40158   41651
23       48922   49846    48903   49846    48870   49846    48634   49846
24       38170   40294    38126   40294    38049   40294    37494   40294
25       45775   48297    45775   48340    45775   48414    45775   48934
26       31749   32179    31743   32179    31735   32179    31678   32179
27       37467   37562    37465   37562    37463   37562    37446   37562
28       32613   34510    32587   34510    32543   34510    32242   34510
29       41612   44350    41612   44397    41612   44476    41612   45035
30       37314   37828    37309   37828    37302   37828    37253   37828
31       32637   33712    32622   33712    32597   33712    32423   33712
32       31971   33907    31948   33907    31910   33907    31656   33907
33       32772   34179    32754   34179    32724   34179    32518   34179
34       31505   31900    31499   31900    31490   31900    31423   31900
35       32952   33281    32952   33283    32952   33286    32952   33304
36       35147   35915    35147   35925    35147   35943    35147   36061
37       35997   38886    35997   38932    35997   39011    35997   39561
38       32916   34008    32907   34008    32892   34008    32795   34008
39       30299   32446    30274   32446    30231   32446    29950   32446
40       32447   33110    32439   33110    32426   33110    32340   33110


Table 5.4b (continued) (In Dollars)

            ε = .1           ε = .2           ε = .5           ε = 1
State    inf     sup      inf     sup      inf     sup      inf     sup
41       35917   37920    35917   37952    35917   38007    35917   38386
42       37457   39178    37433   39178    37392   39178    37113   39178
43       30223   31385    30208   31385    30181   31385    30003   31385
44       38863   38985    38863   38985    38863   38986    38863   38986
45       36974   37682    36974   37691    36974   37706    36974   37806
46       35615   38517    35579   38517    35518   38517    35111   38517
47       41587   42194    41587   42201    41587   42213    41587   42291
48       39336   40848    39336   40867    39336   40900    39336   41123
49       41236   41765    41232   41765    41224   41765    41177   41765
50       43049   46890    42994   46890    42900   46890    42261   46890
51       44366   44737    44366   44745    44366   44757    44366   44849


Table 5.5a Range of Posterior Mean of θ_i for Density Ratio Class when α₀ = 1 and β₀ = 10 (In Dollars)

            k = 2            k = 6            k = 10
State    inf     sup      inf     sup      inf     sup
1        35339   38450    31228   41677    28887   43795
2        50698   56652    43881   63947    39398   68553
3        38802   41828    34739   44414    32550   46181
4        53020   56279    48577   59580    45838   61773
5        46062   51233    40404   57775    36866   61899
6        51710   57020    44903   62638    40688   66174
7        43001   44363    41467   46142    40538   47298
8        49985   53007    45643   55203    42826   56608
9        40516   42089    38709   43999    37604   45166
10       43398   45319    41044   47518    39613   48883
11       37303   40512    33025   42826    30525   44454
12       41371   43116    39051   44876    37604   45953
13       42097   43464    40266   44660    39149   45440
14       40704   43328    37290   46100    35206   47813
15       43847   47653    39244   51952    36447   54577
16       37643   39925    35011   42709    33515   44577
17       38530   42162    34646   46930    32344   49980
18       36326   39099    33501   42840    31995   45223
19       31955   33558    29868   35135    28640   36166
20       40599   43778    37207   48068    35120   50698


Table 5.5a (continued) (In Dollars)

            k = 2            k = 6            k = 10
State    inf     sup      inf     sup      inf     sup
21       40447   44625    34844   49190    31459   52134
22       38674   41844    35025   45539    32898   47861
23       44755   50078    37859   55709    33486   59205
24       35982   39999    31955   45620    29616   49322
25       45929   50992    39207   56164    35095   59375
26       32502   35015    30717   38794    29529   41078
27       36603   37797    34922   39062    33914   39876
28       30289   32759    26979   35041    25025   36313
29       43997   49304    38161   55744    34661   59721
30       36419   38041    34435   40026    33223   41280
31       30357   32847    27069   35016    25162   36300
32       29975   32112    27258   34108    25625   35265
33       30063   32551    26821   34397    24807   35433
34       30572   32312    28396   34092    27160   35180
35       32168   34047    29700   35859    28302   37036
36       34516   37600    30761   40914    28736   43168
37       39132   42928    34801   47720    32348   50848
38       32353   33541    31025   35012    30244   35953
39       30363   33111    28185   37075    26887   39558
40       30740   32712    28236   34389    26795   35371


Table 5.5a (continued) (In Dollars)

            k = 2            k = 6            k = 10
State    inf     sup      inf     sup      inf     sup
41       36997   40490    32529   44172    29844   46531
42       33753   36833    29626   38937    27196   40405
43       28368   30490    25535   32384    23824   33416
44       37026   41695    31871   47537    29024   51430
45       37086   40774    32974   45572    30527   48652
46       32426   35104    28590   37595    26181   39108
47       40186   42782    36748   45140    34695   46746
48       39534   42560    35585   45807    33270   47904
49       40087   41610    38018   43146    36729   44090
50       39459   43772    34626   49113    31790   52690
51       43124   47568    37890   52609    34841   55692


Table 5.5b Range of Posterior Mean of θ_i for Density Ratio Class when α₀ = 7 and β₀ = 3 (In Dollars)

            k = 2            k = 6            k = 10
State    inf     sup      inf     sup      inf     sup
1        37415   38473    36148   39852    35357   40744
2        52118   54832    48964   58092    46912   60068
3        41298   42224    40298   43509    39695   44346
4        55216   56322    53884   57565    53049   58319
5        45976   48628    42561   51532    40351   53295
6        54653   56561    52646   58987    51472   60497
7        43354   43693    42996   44131    42787   44400
8        52955   53533    52340   54266    51985   54722
9        40934   41436    40438   42129    40149   42580
10       44043   44705    43282   45526    42820   46046
11       39649   40806    38300   42163    37523   43059
12       42313   42728    41805   43188    41510   43490
13       43080   43459    42626   43904    42345   44190
14       41836   42791    40672   43889    39957   44551
15       44856   46616    42781   48729    41477   50093
16       38925   39632    38335   40682    37989   41333
17       39843   41728    38580   44434    37818   46024
18       36965   37658    36262   38638    35846   39257
19       32470   32886    31957   33348    31653   33667
20       40943   41888    40001   43150    39446   43893


Table 5.5b (continued) (In Dollars)

            K = 2           K = 6           K = 10
State     inf     sup     inf     sup     inf     sup
  21    41972   43553   39801   45201   38419   46173
  22    39416   41067   37562   43122   36413   44445
  23    47784   50026   45403   52843   44017   54562
  24    37474   40325   35755   44437   34689   46848
  25    47758   50414   44277   53092   42137   54708
  26    31581   32173   30906   32897   30510   33400
  27    37305   37601   36926   37976   36693   38233
  28    31429   32302   30373   33205   29804   33864
  29    45246   47740   42459   50801   40682   52744
  30    37009   37350   36532   37721   36220   37953
  31    31693   32440   30692   33078   30140   33428
  32    31061   31591   30389   32139   30022   32477
  33    31947   32621   31118   33304   30678   33793
  34    31179   31620   30647   32106   30335   32429
  35    33124   33667   32486   34302   32111   34725
  36    36238   37341   35078   38807   34392   39776
  37    41063   42721   39864   45215   39042   46726
  38    32761   33142   32445   33718   32266   34087
  39    29756   30428   29108   31325   28735   31922
  40    32018   32523   31366   33014   30996   33352


Table 5.5b (continued) (In Dollars)

            K = 2           K = 6           K = 10
State     inf     sup     inf     sup     inf     sup
  41    38526   39808   36966   41270   36014   42204
  42    36745   37985   35528   39662   34798   40696
  43    29470   30044   28746   30616   28358   30956
  44    39599   42305   37375   46304   36107   48765
  45    38349   39587   37316   41386   36685   42446
  46    34691   35735   33738   37248   33183   38214
  47    41685   42659   40406   43696   39628   44386
  48    41167   42286   39959   43698   39253   44590
  49    40886   41279   40377   41700   40060   41964
  50    41023   42586   39343   44542   38362   45715
  51    44604   46627   42374   49173   41012   50744
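Density-ratio-class intervals of the kind tabulated above (with ratio bound K) admit a convenient characterization: over the class {π : π ∝ g·π0, 1 ≤ g ≤ K}, the posterior mean is maximized by a g that equals K above a cutoff and 1 below it, and the cutoff coincides with the supremum itself (a DeRobertis-Hartigan-type result). The sketch below is a hypothetical illustration, not the computation actually used for these tables; it finds the bounds by bisection from Monte Carlo draws of the base posterior.

```python
import numpy as np

def density_ratio_bounds(draws, k, tol=1e-8):
    """Bounds on the posterior mean over the density ratio class
    {pi : pi proportional to g * pi0, 1 <= g <= k}, given Monte Carlo
    draws from the base posterior pi0(theta | y).

    The supremum m solves  k * E[(theta - m)+] = E[(m - theta)+],
    since the extremal g puts weight k above m and weight 1 below m;
    the infimum follows by applying the same solver to -theta."""
    draws = np.asarray(draws, dtype=float)

    def solve(x):
        lo, hi = x.min(), x.max()
        while hi - lo > tol:  # the gap below is decreasing in m
            m = 0.5 * (lo + hi)
            gap = k * np.mean(np.clip(x - m, 0.0, None)) \
                    - np.mean(np.clip(m - x, 0.0, None))
            lo, hi = (m, hi) if gap > 0 else (lo, m)
        return 0.5 * (lo + hi)

    return -solve(-draws), solve(draws)
```

With k = 1 the class collapses to the base prior and both bounds reduce to the base posterior mean; as k grows the interval widens, mirroring the widening of the tabulated intervals from K = 2 to K = 10.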


Table 5.6 Range of E[θ_i | y, τ²] for the Class of Uniform Priors (In Dollars)

          0 < τ² < 1      0 < τ² < 5      0 < τ² < 10
State     inf     sup     inf     sup     inf     sup
   1    37495   38037   37495   38037   37495   38037
   2    45870   51893   45870   52535   45870   52626
   3    38390   41348   38390   41460   38390   41475
   4    49191   55414   49191   55664   49191   55696
   5    42058   46385   42058   46941   42058   47022
   6    51718   54697   51718   54970   51718   55009
   7    43480   44639   43444   44639   43439   44639
   8    53156   53332   53134   53332   53131   53332
   9    39919   40977   39919   40987   39919   40989
  10    39308   44084   39308   44209   39308   44225
  11    38932   40203   38932   40277   38932   40287
  12    42606   43215   42593   43215   42592   43215
  13    42014   43226   42014   43253   42014   43256
  14    39407   42092   39407   42198   39407   42212
  15    42026   45126   42026   45366   42026   45399
  16    36347   38604   36347   38689   36347   38700
  17    38565   39213   38565   39229   38565   39230
  18    32510   36767   32510   36925   32510   36945
  19    31720   32606   31720   32610   31720   32610
  20    35678   40608   35678   40834   35678   40864


Table 5.6 (continued) (In Dollars)

          0 < τ² < 1      0 < τ² < 5      0 < τ² < 10
State     inf     sup     inf     sup     inf     sup
  21    36356   42152   36356   42644   36356   42712
  22    40174   42403   39898   42403   39859   42403
  23    48651   50140   48345   50140   48300   50140
  24    37535   41159   36810   41159   36701   41159
  25    43795   48897   43795   49532   43795   49623
  26    31682   32750   31621   32750   31612   32750
  27    37447   37590   37428   37590   37426   37590
  28    32263   36279   31927   36279   31881   36279
  29    39461   44996   39461   45676   39461   45774
  30    37256   39608   37207   39608   37200   39608
  31    32435   34650   32240   34650   32213   34650
  32    31673   36497   31400   36497   31364   36497
  33    32532   35693   32307   35693   32276   35693
  34    31427   32145   31351   32145   31341   32145
  35    32020   33303   32020   33315   32020   33316
  36    34309   36053   34309   36183   34309   36201
  37    33701   39523   33701   40172   33701   40264
  38    32801   37643   32704   37643   32692   37643
  39    29969   35302   29667   35302   29627   35302
  40    32345   34012   32252   34012   32240   34012


Table 5.6 (continued) (In Dollars), States 41-51 [column values not recoverable from the source]

CHAPTER 6
SUMMARY AND FUTURE RESEARCH

6.1 Summary

In this dissertation, we have considered robust Bayesian inference in the context of finite population sampling. It turns out that the robust Bayes estimators enjoy both posterior and procedure robustness for a fairly general class of priors. The subjective Bayes estimator enjoys posterior robustness, but fails miserably under the criterion of procedure robustness. The sample mean is inferior to both under the criterion of posterior robustness, but enjoys a certain amount of procedure robustness. Thus, our recommendation is to give the robust Bayes estimators serious consideration if one is concerned with both Bayesian and frequentist robustness.

The other important finding is that when the sampling fraction goes to zero, that is, when we are essentially back to an infinite population, several asymptotic optimality properties of the estimators of Berger and Berliner (1986), hitherto unknown in the literature, are established. Also, by slightly modifying the arguments of Sivaganesan and Berger (1989), we have provided the range in which the posterior mean lies under ε-contamination models. This pertains to the sensitivity analysis of the proposed robust Bayes procedure. Similar results are proved in the presence of auxiliary information as well.

As another approach to prior robustness, we propose inherently robust Bayes estimators of the finite population mean based on heavy-tailed priors constructed as scale mixtures of normals. The implementation has been illustrated by adopting a Monte Carlo integration technique known as the Gibbs sampler. Asymptotic optimality properties of the proposed estimators are also proved as the sample size increases.

We have also considered Bayesian robustness in the context of small area estimation. We provide robust Bayes estimators of the small area means using ML-II priors, and compare these RB estimators with HB and EB estimators by analyzing real data. It turns out that the RB estimates are inferior to both the HB and EB estimates. We also provide the range in which the small area means lie under the ε-contamination class as well as the density ratio class. The ranges are fairly small, and robustness seems to be achieved using both classes in some cases.

6.2 Future Research

We have considered Bayesian robustness in finite population sampling using a class of priors when there is a single characteristic of interest for each unit in the population. A useful extension of this type of problem would be the multivariate case, where several characteristics of interest are available for each unit. We can also explore the situation where more than one auxiliary characteristic is available, and extend the robustness study based on heavy-tailed priors to the case where auxiliary information is present.

We have studied Bayesian robustness in small area estimation using a class of priors for the Fay-Herriot model. One can also explore Bayesian robustness in small area estimation for the more general HB model given by Datta and Ghosh (1991); the robustness study using heavy-tailed priors in this general setting is one important open problem.
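The scale-mixture construction mentioned in the summary can be made concrete with a toy version of the idea: a heavy-tailed t prior on a single mean θ is written as θ | λ ~ N(μ0, τ²/λ) with λ ~ Gamma(ν/2, ν/2), and the Gibbs sampler alternates between the two conjugate full conditionals. Everything below (the one-parameter model and the particular hyperparameter values) is an illustrative assumption, not the dissertation's actual finite population model.

```python
import numpy as np

def gibbs_t_prior_mean(y, sigma=1.0, mu0=0.0, tau=2.0, nu=3.0,
                       n_iter=5000, burn=1000, seed=0):
    """Posterior mean of theta for y_i | theta ~ N(theta, sigma^2)
    under a heavy-tailed t_nu prior expressed as a scale mixture of
    normals: theta | lam ~ N(mu0, tau^2 / lam), lam ~ Gamma(nu/2, nu/2)."""
    rng = np.random.default_rng(seed)
    n, ybar = len(y), float(np.mean(y))
    theta, lam = ybar, 1.0
    keep = []
    for it in range(n_iter):
        # theta | lam, y: normal with precision n/sigma^2 + lam/tau^2
        prec = n / sigma**2 + lam / tau**2
        mean = (n * ybar / sigma**2 + lam * mu0 / tau**2) / prec
        theta = rng.normal(mean, 1.0 / np.sqrt(prec))
        # lam | theta: Gamma((nu+1)/2, rate = (nu + (theta-mu0)^2/tau^2)/2)
        shape = 0.5 * (nu + 1.0)
        rate = 0.5 * (nu + (theta - mu0)**2 / tau**2)
        lam = rng.gamma(shape, 1.0 / rate)  # numpy parameterizes by scale
        if it >= burn:
            keep.append(theta)
    return float(np.mean(keep))
```

The heavy tail of the t prior means a sample mean far from μ0 is shrunk only slightly toward the prior location, which is exactly the inherent robustness the heavy-tailed priors are meant to deliver.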
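The ε-contamination sensitivity analysis summarized above can likewise be sketched numerically. For the class {(1-ε)π0 + εq} with q ranging over all distributions, the extreme posterior means are attained at point-mass contaminations, so scanning δ_t over a grid of t values traces out the range (in the spirit of Sivaganesan, 1988). The normal base prior, normal likelihood, and all numerical settings below are assumptions for illustration only.

```python
import numpy as np

def eps_contamination_range(ybar, n, sigma, mu0, tau, eps, grid):
    """Range of E[theta | y] over {(1 - eps) * N(mu0, tau^2) + eps * q}:
    the posterior mean under contamination delta_t is a weighted average
    of the base posterior mean m0 and t, with weights proportional to
    the base marginal and the likelihood at t; the extremes over all q
    are the extremes over t."""
    se2 = sigma**2 / n  # sampling variance of ybar
    # marginal density of ybar and posterior mean under the base prior pi0
    marg0 = np.exp(-(ybar - mu0)**2 / (2.0 * (se2 + tau**2))) \
            / np.sqrt(2.0 * np.pi * (se2 + tau**2))
    m0 = mu0 + tau**2 / (tau**2 + se2) * (ybar - mu0)
    # likelihood of ybar at each candidate point mass t
    lik = np.exp(-(ybar - grid)**2 / (2.0 * se2)) / np.sqrt(2.0 * np.pi * se2)
    post_mean = ((1.0 - eps) * m0 * marg0 + eps * grid * lik) \
                / ((1.0 - eps) * marg0 + eps * lik)
    return float(post_mean.min()), float(post_mean.max())
```

With ε = 0 the interval collapses to the single base posterior mean; as ε grows the interval widens, which is the qualitative pattern such sensitivity ranges display.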


BIBLIOGRAPHY

Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis (2nd edn.). Wiley, New York.
Andrews, D. F. and Mallows, C. L. (1974). Scale mixtures of normal distributions. The Journal of the Royal Statistical Society, Ser. B, 36, 99-102.
Angers, J.-F. (1992). Use of the Student-t prior for the estimation of normal means: a computational approach. Bayesian Statistics 4. Eds. J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith. Oxford Science Publications, Oxford, 567-575.
Angers, J.-F. and Berger, J. O. (1991). Robust hierarchical Bayes estimation of exchangeable means. The Canadian Journal of Statistics, 19, 39-56.
Basu, D. (1969). Role of the sufficiency and likelihood principles in sample survey theory. Sankhya, Ser. A, 31, 441-454.
Battese, G. E., Harter, R. M. and Fuller, W. A. (1988). An error-components model for prediction of county crop areas using survey and satellite data. Journal of the American Statistical Association, 83, 28-36.
Berger, J. O. (1984). The robust Bayesian viewpoint (with discussion). Robustness of Bayesian Analysis. Ed. J. Kadane. North-Holland, Amsterdam, 63-124.
Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis (2nd edn.). Springer-Verlag, New York.
Berger, J. O. (1990). Robust Bayesian analysis: Sensitivity to the prior. Journal of Statistical Planning and Inference, 25, 303-328.
Berger, J. O. and Berliner, L. M. (1986). Robust Bayes and empirical Bayes analysis with ε-contaminated priors. The Annals of Statistics, 14, 461-486.
Berger, J. O. and Sellke, T. (1987). Testing a point null hypothesis: The irreconcilability of p values and evidence (with discussion). Journal of the American Statistical Association, 82, 112-122.


Bolfarine, H. and Zacks, S. (1992). Prediction Theory for Finite Populations. Springer-Verlag, New York.
Box, G. E. P. and Tiao, G. C. (1968). A Bayesian approach to some outlier problems. Biometrika, 51, 153-167.
Box, G. E. P. and Tiao, G. C. (1973). Bayesian Inference in Statistical Analysis. Addison-Wesley, Reading, MA.
Cano, J. A. (1993). Robustness of the posterior mean in normal hierarchical models. Communications in Statistics - Theory and Methods, 22(7), 1999-2014.
Carlin, B. P. and Polson, N. G. (1991). Inference for nonconjugate Bayesian models using the Gibbs sampler. The Canadian Journal of Statistics, 19, 399-405.
Casella, G. and George, E. I. (1992). Explaining the Gibbs sampler. The American Statistician, 46, 167-174.
Chhikara, R. S. and Folks, J. L. (1989). The Inverse Gaussian Distribution: Theory, Methodology, and Applications. Marcel Dekker, New York.
Cochran, W. G. (1977). Sampling Techniques (3rd edn.). Wiley, New York.
DasGupta, A. and Studden, W. J. (1991). Robust Bayesian experimental designs in normal linear models. The Annals of Statistics, 19, 1244-1256.
Datta, G. S. and Ghosh, M. (1991). Bayesian prediction in linear models: Applications to small area estimation. The Annals of Statistics, 19, 1748-1770.
Datta, G. S. and Lahiri, P. (1994). Robust hierarchical Bayes estimation of small area characteristics in presence of covariates. Technical Report No. 94-28, Dept. of Statistics, University of Georgia.
Dawid, A. P. (1973). Posterior expectations for large observations. Biometrika, 60, 664-667.
DeRobertis, L. and Hartigan, J. A. (1981). Bayesian inference using intervals of measures. The Annals of Statistics, 9, 235-244.
Dharmadhikari, S. and Joag-dev, K. (1988). Unimodality, Convexity, and Applications. Academic Press, New York.
Ericson, W. A. (1969). Subjective Bayesian models in sampling finite populations (with discussion). The Journal of the Royal Statistical Society, Ser. B, 31, 195-233.


Ericson, W. A. (1983). A subjective Bayesian approach. Presented in a panel discussion on Modelling, Randomization and Robustness in Survey Sampling at the Annual ASA Meeting, Toronto, Canada.
Fay, R. E. (1987). Application of multivariate regression to small domain estimation. Small Area Statistics. Eds. R. Platek, J. N. K. Rao, C. E. Sarndal and M. P. Singh. Wiley, New York, 91-102.
Fay, R. E. and Herriot, R. A. (1979). Estimates of income for small places: An application of James-Stein procedures to census data. Journal of the American Statistical Association, 74, 269-277.
Gelfand, A. E., Hills, S. E., Racine-Poon, A. and Smith, A. F. M. (1990). Illustration of Bayesian inference in normal data models using Gibbs sampling. Journal of the American Statistical Association, 85, 972-985.
Gelfand, A. E. and Smith, A. F. M. (1990). Sampling based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398-409.
Gelfand, A. E. and Smith, A. F. M. (1991). Gibbs sampling for marginal posterior expectations. Communications in Statistics - Theory and Methods, 20(5 & 6), 1747-1766.
Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences (with discussion). Statistical Science, 7, 457-511.
Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-741.
Ghosh, M. (1992). Hierarchical and empirical Bayes multivariate estimation. Current Issues in Statistical Inference: Essays in Honor of D. Basu. Eds. M. Ghosh and P. K. Pathak. IMS Lecture Notes-Monograph Series 17, 151-177.
Ghosh, M. and Lahiri, P. (1987). Robust empirical Bayes estimation of means from stratified samples. Journal of the American Statistical Association, 82, 1153-1162.
Ghosh, M. and Lahiri, P. (1992). A hierarchical Bayes approach to small area estimation with auxiliary information. Bayesian Analysis in Statistics and Econometrics. Eds. P. K. Goel and S. Iyengar. Springer-Verlag, New York, 107-125.
Ghosh, M. and Meeden, G. (1986). Empirical Bayes estimation in finite population sampling. Journal of the American Statistical Association, 81, 1058-1062.
Ghosh, M. and Rao, J. N. K. (1994). Small area estimation: An appraisal (with discussion). Statistical Science, 9, 55-93.


Godambe, V. P. (1955). A unified theory of sampling from finite populations. The Journal of the Royal Statistical Society, Ser. B, 17, 268-278.
Godambe, V. P. (1966). A new approach to sampling from finite populations. The Journal of the Royal Statistical Society, Ser. B, 28, 310-328.
Godambe, V. P. (1982). Estimation in survey sampling: Robustness and optimality (with discussion). Journal of the American Statistical Association, 77, 393-406.
Godambe, V. P. and Joshi, V. M. (1965). Admissibility and Bayes estimation in sampling finite populations I. The Annals of Mathematical Statistics, 36, 1707-1722.
Godambe, V. P. and Thompson, M. E. (1971). The specification of prior knowledge by classes of prior distributions in survey sampling estimation. Foundations of Statistical Inference. Eds. V. P. Godambe and D. A. Sprott. Holt, Rinehart and Winston, Toronto, 243-258.
Good, I. J. (1965). The Estimation of Probabilities: An Essay on Modern Bayesian Methods. M.I.T. Press, Cambridge.
Hall, P. and Heyde, C. C. (1980). Martingale Limit Theory and Its Application. Academic Press, New York.
Hill, B. (1968). Posterior distribution of percentiles: Bayes' theorem for sampling from a population. Journal of the American Statistical Association, 63, 677-691.
Huber, P. J. (1973). The use of Choquet capacities in statistics. Bulletin of the International Statistical Institute, 45, 181-191.
Kingman, J. F. C. (1978). Uses of exchangeability. The Annals of Probability, 6, 183-197.
Lange, K. L., Little, R. J. A. and Taylor, J. M. G. (1989). Robust statistical modeling using the t distribution. Journal of the American Statistical Association, 84, 881-896.
Leamer, E. E. (1982). Sets of posterior means with bounded variance priors. Econometrica, 50, 725-736.
Lindley, D. V. and Smith, A. F. M. (1972). Bayes estimates for the linear model (with discussion). The Journal of the Royal Statistical Society, Ser. B, 34, 1-41.
Morris, C. (1983a). Parametric empirical Bayes confidence intervals. Scientific Inference, Data Analysis, and Robustness. Eds. G. E. P. Box, T. Leonard and C. F. Wu. Academic Press, New York, 25-50.
Morris, C. (1983b). Parametric empirical Bayes inference: theory and applications (with discussion). Journal of the American Statistical Association, 78, 47-65.


O'Hagan, A. (1979). On outlier rejection phenomena in Bayes inference. The Journal of the Royal Statistical Society, Ser. B, 41, 358-367.
O'Hagan, A. (1989). Modeling with heavy tails. Bayesian Statistics 3. Eds. J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith. Oxford University Press, Oxford, 349-359.
Polasek, W. (1985). Sensitivity analysis for general and hierarchical linear regression models. Bayesian Inference and Decision Techniques with Applications. Eds. P. K. Goel and A. Zellner. North-Holland, Amsterdam.
Rao, C. R. (1973). Linear Statistical Inference and Its Applications (2nd edn.). Wiley, New York.
Robbins, H. (1955). The empirical Bayes approach to statistics. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1, 157-164. University of California Press, Berkeley.
Royall, R. M. (1970). On finite population sampling theory under certain linear regression models. Biometrika, 57, 377-387.
Royall, R. M. (1971). Linear regression models in finite population sampling theory. Foundations of Statistical Inference. Eds. V. P. Godambe and D. A. Sprott. Holt, Rinehart and Winston, Toronto, 259-279.
Royall, R. M. and Cumberland, W. G. (1981). An empirical study of the ratio estimator and estimators of its variance (with discussion). Journal of the American Statistical Association, 76, 66-88.
Royall, R. M. and Pfeffermann, D. (1982). Balanced samples and robust Bayesian inference in finite population sampling. Biometrika, 69, 401-409.
Sen, P. K. (1970). The Hájek-Rényi inequality for sampling from a finite population. Sankhya, Ser. A, 32, 181-188.
Sivaganesan, S. (1988). Range of posterior measures for priors with arbitrary contaminations. Communications in Statistics - Theory and Methods, 17(5), 1591-1612.
Sivaganesan, S. and Berger, J. O. (1989). Ranges of posterior measures for priors with unimodal contaminations. The Annals of Statistics, 17, 868-889.
Steffey, D. (1992). Hierarchical Bayesian modeling with elicited prior information. Communications in Statistics - Theory and Methods, 21(3), 799-821.
Wasserman, L. A. (1989). A robust Bayesian interpretation of likelihood regions. The Annals of Statistics, 17, 1387-1393.


Wasserman, L. (1992). Recent methodological advances in robust Bayesian inference. Bayesian Statistics 4. Eds. J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith. Oxford Science Publications, Oxford, 483-502.
Wasserman, L. and Kadane, J. B. (1992). Computing bounds on expectations. Journal of the American Statistical Association, 87, 516-522.
West, M. (1984). Outlier models and prior distributions in Bayesian linear regression. The Journal of the Royal Statistical Society, Ser. B, 46, 431-439.
West, M. (1985). Generalized linear models: Scale parameters, outlier accommodation and prior distributions. Bayesian Statistics 2. Eds. J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith. North-Holland, Amsterdam, 531-557.
West, M. (1987). On scale mixtures of normal distributions. Biometrika, 74, 646-648.


BIOGRAPHICAL SKETCH

Dal-Ho Kim was born on December 27, 1959, in Korea. He was awarded a Bachelor of Science degree in statistics in 1980 and a Master of Science degree in statistics in 1982, both from Kyungpook National University, Taegu, Korea. He also received a Master of Science degree in statistics from the University of Minnesota in 1991. Since then he has been working toward the Ph.D. in statistics at the University of Florida.


I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Malay Ghosh, Chairman
Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Associate Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Brett D. Presnell
Assistant Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Associate Professor of Dairy Science


This dissertation was submitted to the Graduate Faculty of the Department of Statistics in the College of Liberal Arts and Sciences and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

August 1994

Dean, Graduate School