Title: Sequential shrinkage estimation
CITATION THUMBNAILS PAGE IMAGE ZOOMABLE
Full Citation
STANDARD VIEW MARC VIEW
Permanent Link: http://ufdc.ufl.edu/UF00102789/00001
 Material Information
Title: Sequential shrinkage estimation
Physical Description: Book
Language: English
Creator: Nickerson, David MacLeod, 1958-
Copyright Date: 1985
 Record Information
Bibliographic ID: UF00102789
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
Resource Identifier: ltuf - AEH3713
oclc - 14706811

Full Text















SEQUENTIAL SHRINKAGE ESTIMATION


BY






DAVID MACLEOD NICKERSON

















A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN
PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF DOCTOR OF PHILOSOPHY


UNIVERSITY OF FLORIDA


1985




























To Mom

and

Dad
















ACKNOWLEDGMENTS


I would very much like to thank Dr. Malay Ghosh for being my

di sser ta tion advi sor. Specifically, I would like to thank him for

his infinite patience, his incredibly deep understanding of

statistics and his dedication to teaching. I would like to thank Dr.

A.K. Varma and Dr. Dennis Wackerly for serving on my committee.

Also, I am grateful to Dr. Andrew Rosalsky and Dr. Randy Carter for

serving on my Part C and oral defense committees. Further, I would

like to thank Cynthia Zimmerman for her expert typing of this

di sser ta tion.

On a different note, I would like to thank Cindy Hewitt and

Steve Ghivizzani for their friendship and merriment.

Lastly, I would like to thank my parents for just about

everything.





























INTRODUCTION.....................................1

1.1 Sequen ti al Sampling.................................1
1.2 Shrinkage Estimation...............................8
1.3 Literature Review..................................1
1.4 The Subject of This Research.......................,16

SEQUENTIAL SHRINKAGE ESTIMATION OF THE MEAN OF A
MULTIVARIATE NORMAL DISTRIBUTION.........................18

2.1 Introduction.................................. 1

2.2 A Class of James-Stein Estimators Dominating ~N....19
23 Asymptotic Risk Expansion for ~N and -N........2

2.4 A Monte Carlo Study................................40

SEQUENTIAL SHRINKAGE ESTIMATION OF LINEAR REGRESSION
PARAMETERS.......................................5

3.1 Introduction.................................. 5

3.2 A Class of James-Stein Estimators Dominating 8 ....58
33 Asymptotic Risk Expansion for (N and -N........7

SEQUENTIAL SHRINKAGE ESTIMATION OF THE DIFFERENCE OF
TWO MULTIVARIATE NORMAL MEANS............................88

4.1 Introduction.................................. 8
4.2 A Class of James-Stein Estimators Dominating W,..93
bl'b2
4.3 Asymptotic Risk Expansion for NM and (N, .....101


TWO









THREE







FOUR


TABLE OF CONTENTS


Pa to

ACKNOWLEDGMENTS. ....................................i

ABSTRACT..............................................v

CHAPTERS


ONE












FIVE SEQUENTIAL SHRINKAGE ESTIMATION OF INDEPENDENT NORMAL
MEANS WITH UNKNOWN VARIANCES............................118

5.1 Introduction................................. 11

5.2 Asymptotic Risk Expansion for XN and q.......121

SIX SUMMARY AND FUTURE RESEARCH.............................146

6.1 Summary ................................4
6.2 Future Research................................. 14

BIBLIOGRAPHY.........................................14

BIOGRAPHICAL SKETCH.......................................152
















Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

SEQUENTIAL SHRINKAGE ESTIMATION

BY

DAVID MACLEOD NICKERSON

August, 1985


Chairman: Malay Ghosh
Major Department: Statistics


This dissertation is concerned with sequential estimation of the

multivariate normal mean, estimation of the regression coefficient in

a normal linear regression model, and estimation of the difference of

mean vectors of two multivariate normal distributions in the presence

of unknown and possibly unequal variance-covariance matrices.

For estimating the p(>3) variate normal mean, we consider two

different situations. In one case, the covariance matrix is known up

to a multiplicative constant; in the other situation, it is entirely

unknown but diagonal. In both cases, the sample mean is the maximum

likelihood estimator of the population mean. When the covariance

matrix is known up to a multiplicative constant, a class of James-

Stein estimators is developed which dominates the sample mean under

sequential sampling schemes of M. Ghosh, B.K. Sinha, and

N. Mukhopadhyay ([1976] Journal of Multivariate Statistics 6j,

281-294). Asymptotic risk expansions of the sample mean vector and










James-Stein estimators are provided up to the second order term.

Additionally, in this case, some Mlonte Carlo simulation is done to

compare the risks of the sample mean vector, the James-Stein

estimators, and a rival class of estimators. In the second case, a

class of James-Stein estimators is given which dominates the sample

mean asymptotically by considering second order risk expansions.

The next case is concerned with estimation of regression

parameters in a Gauss-M~arkoff setup. Here the classical estimator of

the regression coefficient is the least squares estimator, and the

sampling scheme used is that of N. Mukhopadhyay ([1974] Journal of

the Indian Statistical Association 12, 39-43). Once again, a class

of James-Stein estimators that dominates the least squares estimator

is developed, and asymptotic risk expansion is given for both the

least squares and James-Stein estimators.

Finally, we consider the estimation of the difference of two

normal mean vectors, and the sampling schemes developed in 1984 at

the Institute of Applied Mathematics National Tsing Hua University by

R. Chou and W. Hwang. A class of James-Stein estimators that

dominates the difference of sample mean vectors is given. Asymptotic

risk expansions are also provided.














CHAPTER ONE
INTRODUCTION


1.1 Sequential Sampling


There are two basic purposes for which sequential methods are

used in statistics. The first is to reduce the sample size on an

average as compared to the corresponding fixed sample size procedure

which meets the same error requirements. Wald's sequential

probability ratio test is a classic example of this. The second is
to solve certain problems which cannot be solved by any fixed sample

procedure. One important example of the latter is the fixed length
interval estimation of the normal mean with unknown variance when the

confidence coefficient is specified in advance. Another important

problem directly related to the topic of this dissertation is a point

estimation problem which is described below.

Let X1, X2, .. be a sequence of independent and identically

distributed N(9,o2) variables where GeR1 is unknown. Bsdo

X1, . X,, an estimator 6, = 6n(X1,...,X,) is used for
estimating e. Suppose the loss incurred is given by



L(e,6n) = A|6n ,2 + cn ,(1.1)


where A>0 is the known weight and c(>0) denotes the known cost per

sampling unit. The most natural estimator of a from a sample of size











n is X = 1 X. with associated risk(epcdlo)
n n i=1 1

R(e,X ) = EL(e,[,) =A 2 +c



= A e-+ cn .(1.2)



The above risk is minimized with respect to n at


2 1/2 ]
n =n, = [ (A |



where [|u|] denotes the integer closest to u. Here, for simplicity,
2 1/2
we will assume that (A 2-) is an integer. Hwvr o nnw

02 there does not exist any fixed sample size which minimizes (1.2)

simultaneously for all 02. In this case, motivated from the optimal

fixed sample size n1* (when a2 is known), the following sequential

procedure is proposed for determining the sample size.



N = inf (n > m (>2):n i (A 5 /c) 132 } (1.3)


where m denotes the initial sample size and


~2 _1 n 2
sn .~ C(X X ) ,(1.4)


for all n > 2.











The above sequential procedure was essentially introduced by

Robbins (1959) who considered case L(e,6,) = A|6n e1 + cn. Later,

Starr (1966) considered the more general case L(e,,n)

A|6, esf ,t with s>0 and t>0.

The sequential sampling procedure as introduced in (1.3) has

sometimes been described as ad hoc. Robbins (1959) did not give any

decision theoretic motivation for such a procedure, which as we shall

see below can be justified from the minimax criterion.

With this end, assume that 02 is known. Suppose O has a N(p,T2)

prior. Then the posterior distribution of a given Xi = xi,


2
N( v + (xn y) ,
2 2
S+L
n


2

1+a
2
nT


Here, under the loss (1.1), the Bayes estimator of a is




2 a
T + --


nr




n r

with corresponding posterior risk equal to A 1(2 ,2 +cn
1~~~ + ( /t

The posterior risk being independent of X ., the
1' sn










sequential Bayes rule becomes a fixed sample rule with sample

size, nB (determined by minimizing the posterior risk) and estimator
n n

Next, consider a sequence of {N(v,m), myl} priors. Then the

associated sequence of Bayes estimators is 11 +( -)
o2/n 1 + (02/nm) (n ~Y
with posterior risks A + cn. Since the sequence of
1 + (02/nm)
posterior risks converges (as m'") to K = (Ao2 nl+ nan

R(e8,[) = K for all e then the fixed sample rule with sample size
determined by minimizing (1.2) with respect to n is a minimax rule.

In summary, nl, the required sample size, and X 4, the
estimator, is a minimax rule. Therefore, one can conceive the

stopping rule (1.3) as an empirical minimax stopping rule and not as
ad hoc as one might think. (By empirical we mean having any unknowns,

in this case 0 replaced by their estimators, in this case s)

Next, consider the multivariate analogue to the above problem,

i.e., let / X .. be a sequence of independent and identically
distributed N (e o2y) random vectors (p > 3), with BERP unknown and

V, positive definite, known. Based on X ,...,X ,if
(X = (1" ) is used to estimate 2, suppose the loss incurred

is given by


L(8 8 ) = (6n f?) g (6n 0) + cn (1.5)



where S denotes a known positive definite (weight) matrix, and c(>0)

is the known cost per sampling unit. Again, from a sample of size n,










-1 n
the most natural estimator of a is X = X., with risk (expected
Nn n 1=LY1
loss)



R(e,a ,2 )) = n a~ tr (Q V) + cn (1.6)




As before, if a2 is known, the above risk is minimized at
1/2*
n = n = (tr QV/c) (here, too, we shall assume that np is an
integer). Once again, if 02 is unknown there does not exist any

fixed sample size which minimizes (1.6) simultaneously for all 02

Again, motivated by the definition of np, we propose the following
sequential sampling scheme.


N = inf {n 1 m(,2) : n 1 (~2c t )2 17



where m denotes the initial sample size and


~2={n1p- (X X ) V1 (X. (1.8)
n i=1 ~ n 1 ~Vn



for every n > 2.

As can easily be seen, the above stopping rule is a multivariate

generalization of the univariate stopping rule (1.3). Ghosh et al.

(1976) and Woodroofe (1977) considered more general stopping rules

where I was an unknown arbitrary positive definite matrix. To show

that the stopping rule (1.7) is more than just an ad hoc extension of

(1.3), we will motivate it again from a minimax criterion.











Assume a2 is known and suppose o has a N(v,T2B) prior, where B

is a positive definite matrix. Then, for fixed n, the posterior

distribution of O given X. = x., i=1,...,n, is
~ 1 ~1


n2 u L



with


v2B r2B(r2B + 2- V)-1 ,2B ,2 (r2B
n n


2 2 a-1 2
SB(r B + -- V) B) ,


~2 ~)- n ~


.2
- V(I +
n ~ ~p


.2
a -1 v-1
B V)


where Ip is the identity matrix of order p. Since the loss (1.5) is

quadratic, the Bayes estimator 6 of o is again the mean of its
~n~
posterior distribution, i.e.,



6 (X )= +T B(T B +--V) (Xb v)
~n ~n n ~ s


a2 -1-
= v + (I + V B) (X -g
-p 2 -n
nT


with the corresponding posterior risk equal to


.2
tr[Q V (I + B Vy) ] + cn.
n -p 2 ~
ATr










Again, the posterior risk is independent of the X.'s so that the

Bayes sequential decision rule is a fixed sample rule with sample

size, nB, determined by minimizing the posterior risk with respect to



Next, consider the sequence of {N(v,mB),m 1 1} priors on e.

Then the Bayes estimator of O becomes



8 a2 8-1 -
n ~n ~ p nm ~ n



with associated posterior risk



,2 a -1
--tr[Q V ( I + ---BV) ] + cn.
n ~p nm



By using a limiting Bayes argument as in the univariate case we see
.2
that the posterior risk converges (as m+.) to K tr(Q V) + cn
2 *-
where R(e,a ) = K for all 0. Hence, n the sample size, and X ,

the estimator, is a minimax rule. Consequently, the proposed

stopping rule (1.7) can also be viewed as an empirical minimax

stopping rule.

It is the stopping rule (1.7) and its eventual modifications to

fit a regression problem, its generalization to a more complex

covariance structure and its adaptation to a two sample situation,

that will be of interest to us (as far as sequential sampling goes)

in the subsequent chapters.










1.2 Shrinkage Estimation



Consider the situation X X .. are independent and

Identically distributed N (eO2V)? random vectors, eeRP is unknown, V
is a known positive definite matrix and p 3. Again, the problem is

to estimate O. For a sample of size n, assume the loss (1.5) in

estimating a by 6 =6 (X1,..X ). The minimax estimator

-1 n
of a is X = -E X.. However, Stein (1955) was able to establish

the existence of estimators of o (for known a2 and for Q = V =I)
~ p
which dominated 2n (and hence were minimax themselves). Later James

and Stein (1961) exhibited some explicit estimators dominating X.
~n
Since then, there have been a multitude of articles concerning

extensions of the James-Stein estimators to more general situations.

Specifically, we refer to Berger (1976) when a Q and V are known,

Efron and Miorris (1976) when Q = V = Ip and a' is unknown, and

Berger and Bock (1976) when Q is a known diagonal matrix and the

covariance matrix is diagonal and unknown. For a general class of

estimators that dominate the usual mean vector, we refer to Baranchik

(1970), Strawderman (1971), Alam (1973), Efron and Morris (1976),

Berger (1976) and Berger (1976).

To motivate the idea of a shrinkage estimator let us consider the

case Q = V = I .Suppose O has a N(U,T I ) prior. Then, by previous

results, the Bayes estimator of a given X. = x., i=1,...,n, is
~ 1 ~1











T2
T-
, 2 ~n


B
6 (x )= E +
~n ~n


a2n

n


11 + (1


Here, if a2 and r2 are unknown then they must be estimated from the

da ta. With this end, note that marginally


2 ~~2 .
~n ~n p


where n.n denotes the usual Euclidean norm.



~n~
-1 1
E[
p -2 2
.2
T2 +-



Also, marginally


Hence,


,2
(n-1)p (~n-1)p


~2
where we are referring to definition (1.8) of s .



E [s 2] =G.2


Consequently,











Therefore, replacing the unknowns in 6 (X ) by their estimates we
Nn
have


~2
JS (p-2)s
6 = v + (1 -)(2 u
~n ~2 ~n




which is known as the James-Stein estimator. The above empirical

Bayes interpretation of James-Steim estimators is due to Efron and

Morris (1973). In James and Stein (1961), the estimator was studied

from a purely frequentist perspective.

JS
For better understanding of the shrinking property of 6 ,note


that


JS
"nX Vil "~ >16 -Ul


(p-2)s
n -


112 vl |1 I
Nn


if and only if


Therefore, the distance between 2n and a will be greater than that


~2
JS (p-2)s,
of 6 and if and only if 1 > |1 |


JS
true an will shrink 2n towards the prior mean u.

the above condition is always met we sometimes use
JS
version of 6 given by
,n


SWhen this is


To assure that

the plus rule











+ (p-2)~2
JS n n )
6n




where (a)+ = max (0,a).
JS
The James-Stein estimator, 6 and its eventual modifications

to fit a regression problem, its generalizations to a more complex

covariance structure and its adaptation to a two-sample problem will

be of interest to us (as far as estimation goes) in the subsequent

chapters.



1.3 Literature Review



There are only two pertinent articles to date that address the

inadmissibility of the sample mean vector, X ,as an estimator of o in
~n
a setup where the sample size is random. Ghosh and Sen (1983-84)

consider the case of arbitrary Q and V, but consider a two-stage

sampling scheme. The main criticism of this approach is that a two

stage procedure requires on an average more observations than nP, the

optimal sample size when 02 is known (see for example Ghosh and

Mukhopadhyay, 1976). Takada (1984), on the other hand, considered

the case Q = V = I under a full sequential sampling scheme. The

Main objection to his approach is that at the n-th stage of sampling

his estimator (similar to the James-Stein estimator) uses only p of

the available (n-1)p degrees of freedom to estimate a2. This will be

made clear in Chapter 2.










Before leaving this section let us familiarize ourselves with a

particular identity from Stein (1981) which will be used repeatedly
in all the proofs. First we require the following lemma.

Lemma 1.1 Suppose Y ~ N(0,1). Then, if E[|g (Y)|] < m, one has


E[g (Y)] = E[Yg(Y)].


(1.9)


1 1 y2) an 'eta 1y y(
Proof: Write #(y) =--- ex (--y )adnoe ta ()=-y y)
q2
Now,


E~g'(Y)] =


J gl(y)4(y)dy


(1.10)


0
= gl(y)(y)ydy + J
0


gl')(y)(ydy


= J gl(y)[f z4(z)dzldy
0 y


0
+ J


g' (y)[f


-ze(z)dzldy


= j [g(z)-g(0)]zt(z)dz + J [g(0)-g(z)](-z6(z))dz



= ( [g(z)-g(0)]zO(z)dz


= E[Yg(Y)] + g(0)E[Y]

= E[Yg(Y)]











since E[Y] = 0. This brings us to the identity, referred to as

Stein's identity, which is stated as follows:


Lemma 1.2 Let X ~ N(O,a ). Then



E [~h'(X)] _




Proof: Define g(x) by g(x) = h(6

h (x) = ( ) Note that Y


ifE 2[jh (X)I ] < one has
8,cr



1E [(X-O)h(X)] (1.11)
2 2



+ ax). Then, h(x) = g ( ) and
X -
= ( )~(01.Hence,


= -E 2 9 E--)]
6,a


= 3 ECg'(Y)]


-E[Yg(Y)]


1 EX e g X e)


E 2[h (x)]
B,0


(1.12)


1
=--E 2[(X


- 9)h(X)] ,


where in the third equality in the rhs of (1.12) we use Lemma 1.1.

In subsequent chapters, however, we shall require a multivariate

version of the above identity which may be stated in the following

lemma.










ah(X)
Lemma 1.3 Le NeI). Then if Ee[| |](T
1 < i < p, one has


ah(X)
Ee [ -aX. ] = Ee [(Xi ei)h(X)] ,


(1.13)


for 1 i (p.

Proof: Note that for each 1 < i < p,


ah(X)
Ee IT -]


ah(X)
= EeEe.C-aK_-


( X1,...,Xi-1,Xi+1"...X ]


(1.14)


=EeE [C(X.-e.)h(X) IX ,..X ,X. ,..X]






where for the second equality we use Lemma 1.2 and the independence

of the X-'s.
An interesting consequence of Lemma 1.3 is the following: If

6(X) = (X1 + el X),..., Xp + b (X)) is a rival estimator of
X ~ N(9,I ) for estimating 9, then assuming the loss (1.5) with Q = Ip

and assuming that E, [|5 1 |] < = for 1 ( i ( p, the risk

difference is given by











R(e,6) R(e,X) (1.15)



=E [1 =1(Xi + ~i(X) 2 =(X-9)



=E [2C P4.o(X)(X. -.) +C 6 #2(X)]




= EBE[2 =1 aX I=14 ~X,1 ,



where the last equality is obtained from Lemma (1.3). Hence to

obtain estimators 6 which dominate the usual estimator X one needs

to find solutions to the differential inequality


B~(x) r=2
A~~~x)~ E 1+1 (x) < 0 (1.16)




Put b (x) = -a(xi qi)/lx un the left hand side of (1.16)

reduces to a(2(p-2)2a) < 0 if 0 < a < 2(p-2), p t 3. Hence, for


an estimator of the form X -dominates X under the loss


(1.5) with Q = I This is the classical result of James and Stein
~ p
(1961).











1.4 The Subject of This Research



In Chapter 2, we consider the case of arbitrary Q and V (and the

loss (1.5)) under a full sequential sampling scheme. We show the

exact dominance of a class of estimators over the usual mean vector

in estimating e. Further, we obtain asymptotic risk expansions for

such estimators up to the second order term. A Monte Carlo study is

also performed to compare the risks of the sample mean and two rival

classes of estimators.

In Chapter 3 we consider a multiple regression set up. Here we

wish to estimate the vector of regression parameters 8 under the

loss (1.5) with Q) = Zn Z t,Z being the usual design matrix of order

nxp, and V = (Z' Z ) 1. Again, the sampling is done sequentially

with each additional point consisting of p non-stochiastic regressor

variables and observing the response (a scalar). Here we show the

exact dominance of a class of estimators over the usual least squares

estimator. Further, we obtain asymptotic risk expansions for such

estimators up to the second order term.

In Chapter 4 we consider a multivariate version of the Behrens-

Fisher problem of estimating the difference of two normal means when

the variances are assumed to be unequal. The sequential analogue in

a univariate setting was first introduced by Ghosh and Mukhopadhyay

(1980), and was later generalized to a multivariate setting by Chou

and Hwang (1984). We have proposed in Chapter 4 an extension of the

stopping rule (1.7) to handle the two sample situation. Also, the











loss (1.5) is used in the estimation of v, the difference of two

normal mean vectors. We show the dominance of a class of estimators

over the difference of the sample mean vectors. Once more we produce

an asymptotic risk expansion for such estimators up to the second

order term.

In Chapter 5 we return to the one sample problem where O_ is

diagonal and the entire covariance matrix, formerly o2V, is assumed

to be diagonal and unknown. With the sequential sampling rule (1.7)

generalized to this situation and the loss (1.5), we have produced an

asymptotic risk expansion (up to the second order term) for a class

of estimators dominating asymptotically the usual mean vector.
















CHAPTER TWO
SEQUENTIAL SHRINKAGE ESTIMATION OF THE MEAN
OF A MULTIVARIATE NORMAL DISTRIBUTION


2.1 Introduction



Let X ,X,...b euneo needn n dnial
~1' ~2 easqec fineedn n dnia
distributed N(e,o2V) random vectors, where OERP is unknown, and V,

positive definite, is known. The problem is estimation of G. Based

on X1..., Xn if we estimate a with 6= 6 (X,..., Xn), let
the loss incurred be (1.5). We have seen from Section (1.2) that for

this situation the sequential sampling rule (1.7) and the estimator

Ep can be considered as an empirical minimax rule. Consequently,
producing a class of estimators which dominate (N takes on a greater
meaning than a mere pathology.

In Section 2.2 we exhibit a class of James-Stein estimators

which dominate Rp, the usual mean vector, and simultaneously, address
the two objections raised in Section 1.3. The first objection was in

reference to Ghosh and Sen (1983-84) and their two stage sampling

scheme. Unlike a two stage procedure which uses only s~ in defining
the stopping rule, the stopping rule (1.7) uses an updated estimator

of cr2 at each stage of the experiment, and thereby demands fewer

observations on an average. The second objection was in reference to

Takada (1984) and his version of the James-Stein estimator. We shall











see later in this section that Takada's class of estimators use only

p of the (n-1)p degrees of freedom available at stage n to estimate

,2. The class of James-Stein estimators we produce use all the

available degrees of freedom for estimating 02, and as a consequence

allows the estimator to stabilize for large n. In Section 2.3 we

produce an asymptotic risk expansion (as c+0O) for the proposed class

of estimators. In Section 2.4 we give the results of Mronte Carlo

simulations comparing the risks of the sample mean, the proposed

James-Stein estimators and Takada's estimators. As expected, our

estimators achieve greater risk reduction than Takada's.



2.2 A Class of James-Stein Estimators Dominating N



In this section, we consider the class of James-Stein estimators

6N(X1,..XN), where


bs
b n -1-1
6 (X ,..X ) = V ( _- x), (2.1)
~n 1' n ~n -, Q 1 -
n(X X) V QV (X X)
-n ~ ~ ~



for every n > 2, where s = --7 s (nt, 2), b is a constant and ERP

is the known point towards which we want to shrink. Very often X is

taken as the prior mean. The main result of this section is as

follows:

Theorem 2.1. Under the stopping rule (1.7), and the loss (1.5),






20




R(e,o2,6 ) < R(e,a2 ,X



for every be(0,2(p-2)).

Proof: First write


2 -1 -e
2 2b SN ~N ~ N
R ( ,a 2,6 ) -R(9,0,a ,)=-2bE [] (2.2)
~ N ~N 2 Y- -1 1 -1


+ b2E [sN4/(N {(X X)'V- -1 V -(X 1)})].
2N N ~ N
8 ,a



Since and V are both positive definite, using the simultaneous

diagonalization theorem, there exists a nonsingular D such that

DQ "D1 = I and DVDI = A, a diagonal matrix with all positive
~ p ~ ~
.th
diagonal elements. We write a. as the i diagonal element of A.

Use the transformation _Z. = D(_X ) =2.. so that the Z.'s

are iid N(c.~2 A), with C = D(6 1). Writing~i Z n E. =Z. for

every n 1 one can rewrite



2 -l1 n 1

sj = (nv-1)(+2) (Z Z )'A1~ (Z. Z), n 2 ; (2.3)
n n i=1 i ~n ~ ~1 n




(2i xv1)'V QV (X = Z' A 2 ng 1 (2.5)
n ~ n ~n n'











Then from (2.2), one gets


R(3,a ,6 )
~ ~


2
-R(e,a ,X )
~ N


= -bE (s /N)(Z 'A ( -c)/( 'A 2z )]
2 N N N N N


(2.6)


+ b2E [~(s /N )(Z 'A- Z
2 N -N ~ ~N)l





Next we use the Helmert orthogonal transformation Y = (Z Z )//7 ,
~2 1 ~2


Y = (Z + Z 2Z / .. Z+.. n1Z)/2-)
3)J~~ .. ~2 ~3 sn (~1+ +Zn-1 1 )/n n-
..to write


S2 n .A -1Y.n2
n i=2 ~1 ~ 1


(2.7)


where the Y.'s are iid N(0,a A).
~1 ~


Denote by Bn the a-algebra


genera ted by Y ,... Y Now,
~2 Nn




E [(s /) A ( )/Z A 2 )]
2 N -N N N N



= E E E [{I(s I n)(Z Z-c)
n=m 2 2 nN ~ n -


(2.8)


-2 -2
(_Z 'A _Z )}I (B ]~n


-1 2
n,,m 2C Sn


E (2I ( A- (Zn s))/(Z 1A' Z )JB ]
[N=n] sn n ~ ~











Z
= [-1s2 p a.-1(02 /n)a .E { ni ) IB ]
Cn=mE 2 C n [ N=n] 1=1 1~ ,n 2 -
5,0 5, aZ
n1 n sn


[n2sZI (p-2)/(Z 'A- Z )]
2 [ N n] ,n n


2a E
n=m


For the second equality in (2.8), one uses the independence of Z and
~n

(Y ,..Y ), while for the third equality, one uses Stein's identity
N2) Nn
(of Stein, 1981).

Next, note that for every n, nZ 'A- /,2 ~ .W /. h
,n N n 1=1 ni/a 1hr


2 "
the Wni's are independent and Wni ~ X2(- -) i ,.,.
20 a.
using the result (see, e.g., Cressie et al., 1981) that if

P(U > 0) = 1, then E(U-1) ; E(exp(-tUl))dt, one gets


Hence,


S-2 2 -1
E 2["ni A 3 /a ]


(2.9)


p
=o JO.x
i=1



= O~ n
i=1


E[exp(-t wni/ai)]dt


-1 2 2 2
{(1 +2tai ) exp[-ntqi/(c a. (a + 2t))]}dt


=g (hn) (say) ,



where qas 4(5,02). In this section we need only that g (hq) is











nonincreasing in n. Now, using again the independence of Zn and

(Y ,...,Y ), one gets from (2.6) (2.9),

~2

R(0,a2,6 ) R(9,a2,R ) (2.10)
)N N )N


bs
2 n
= -bZ, g (A )E [(sS /n)(2(p-2) )I



-1 2 s
< -2b(p-2)zn g (A )E [" n s(1- --)



where in the last step one uses 0 < b < 2 (p-2). Also, in the

above, EZ denotes expectation when the Y.'s arei ae iid N(0,a A).
Accordingly, for proving the theorem, it suffices to show that


2 g(A)E[ns 1- /0 )I ] > 0 for all a2 > 0 (2.11)
n=m p~n E2" Sn n [N=n]



To prove (2.11), first observe that tr(QV) = tr(QD-1ADI-1)

tr(D -QD 1A) = tr(A), since DQ D0' = I p. Let no denote the smallest

integer ? m such that p(p + 2)l cn /(tr A) to 0. Then, we write










1hs of (2.11)


(2.12)


2 [CN=n]




)I ]
2 [N~n ]
a 0


n -1
= o g()E[-1s2(1
~n=m 9p~) n C 2 n (



-1 2
+ g (An )E 2[n0 s (1 -
p n 0n


00 -1 2 n+1
+ Cn n (9p n+1l)E 2[~(n+1) sn+1 (1 ---)I[Nan+1]l



-1 2 n
9p(d,)E 2[n sn (1 2) [N>n+1]l '



where the first term in the rhs of (2.11) should be interpreted as

zero if n =m. Note that for n n ,on the set [N > n+1],
o o -
s 2> p(p+2) -cn2 /(tr A) > p(p+2) lcn /I(tr A) to 0. Accordingly,

since g ( n) is nonincreasing in n,


third term in the rhs of (2.12)


-1 2
E n~n 9p n+1 )E 2[{(n+1) sn+1(1


(2.13)


2
S
n+1
2


Note that I[Ngn+1] is a B measurable function. Then using the
2 2
representation s =l ((n-1)s, + U ~)/n, where












U =l (p+2) A -1Y 1 a2xP /(p+2) independently of
Y ,...,Y one gets with probability 1,
~2 -n


22


2 2 4
(n-1)s2 p- (n-1)2s4 + 2(n-1)s2 a
n p+2 nn p+2 p+2
- I[ 2
[Nan+1]L (n+1)n (n+1)n a



=I n-1s2~12 n(1 3-


(2.14)


[Nn+](s4 /62( L 2n) + 2 nl22nn+11p+2)


po 1 1
p+2(n+1)n 2
P+2 (n+1)n



[CNan+1]n sS~ (1 --y


I4 s
+ I sn 3n-1
[N n+1/ 2(n1)-


2
sn (n-1)p
2n2(n+1)(+ )+


po2 -n-1
-py*n (n+1)


of (2.14) is


Note that the multiple of I[Nn1 in the extreme rhs
2
a convex function of s ,where the minimum occurs at











(n-1)p
n +
= 2 p+2 ,2
3n-1


(2.15)


2 2
[Nin+1], s > a ,it follows that
n


Hence, recalling that on the set


2nd term in the extreme right of (2.14)


(2.16)


2a2 ( PE(n-1)p

n2(n+1)


(3n-1)a
[N n+1] 2(+)


2
o (n_-1)p _
(p+2)n2 (n+1)


= I a (( )( )) .
[N~n+1] 2 p+2
n (n+1)


Hence, from (2.13), (2.14) and (2.16),


third term in the rhs of (2.12) > 0 .


(2.17)


Next, note that if no m, since I[Nm 1 with probability 1


and E [sm(~1- S)] =_ 02 (m-1)p((m-1)p+2)1 (m-1)po2 2m2
02 02 p+2(m-1)2(p+2)2 (m-1)2(p+2)2


> 0, rhs of (2.12) > 0.


For no > m (, 2), first note that for


2 -1 2
n < no 1, on the set [N = n], sn < p(p+2) cn /(tr A) <

p(p+2)-1c(no-1)2/(tr A) < 02 so that using g (An) nonincreasing in n,


first term in the rhs of (2.12)


(2.18)


2
s
2 [N n]


n -1
> g (6 _)E~ E


[n-1s2(1
2 n













































(o-2 n -1
o


I


In view of (2.11), (2.12), (2.17) and (2.18), for proving the theorem

it suffices to show that


2
s
n
E [n1s2 (1 --2)
02 o no 2 [Nyn ]


2
s
2 [IN=n]3


n -1
to
ncm


E [n-1s2(1
02 n


(2.19)


s20 = ((n -2)sno-1


+ Un)/n 1)


To prove (2.19), first writing


one gets


2
S
n
--o' [~Nan ]]
ao


E 2[nos2 (1
ao


(2.20)


2
s
n
- 2-INan ] Bn -1
(I 0 0


= E2E 2[nos2 (1 -
2 0 0


2 4
pa pa
p+2 p+2


(o-) n -1+ p+2
o
= E {[--i7----- -
,2 ~ no~n-


+2(n -2)s2 -
o


no(n -1)2 2


= E2[{ s -1- b (s -1/2 + cn [ n] '


where


[N>no1














an 2' n
o no(n -1) o


(n -2)2 ( n-2)p
2 Cn2
n (n -1) o n (n -1) (p+2)


(2.21)


(p-2)(n -2)
no n (n -1)(p+2)


(> 0), so that


Let gn = b
n n


cn -. g = 2(n -2)/{n (n -1)(p+2)} > 0. Also, let
0 0


2(n -2)p
n n o(n -1)(p+2) S


d =2b
no n


that d E (0,1).
n
0


Now rewrite


extreme right of (2.20)



1 2 no
n -1 2 ~n (n -1- 7
o a o0 o


(2.22)


4
S
n -
o
o a


2
2s
n -1
o


+2)


n n [Nyn]
0e 0 0


4
s
n -1
_ of N~n ]
a o


>d E2 1p2-1
oa o no


S
n -1
_ 0_ Nan ]] > 0, noting again that


If E 2 51 (s2 1


< 2


on the set [N = n] for all n < no-1, one proves (2.19) from (2.20)

and (2.22). Otherwise noting that d, < 1, one gets from (2.19),
(2.20) and (2.22) that





(2.23)


1hs of (2.19)


2
s~lNn


n -2
> c E [n-1s2(1
n=m 2 n


2
S
n -1

2 [~iN~n -1]


-1 2
+ E [~(n -1) sn (1 -
2 o n -1


Proceed inductively to get either Ths of (2.19) > 0, or finally end

with


2

- s [N>m]] 0 ,


E 2[ 1sm(1


(2.24)


1hs of (2.19) >


as calculated earlier.


b
for X and a


2.3 Asymptotic Risk Expansion


First we obtain an asymptotic (as c + 0) expansion for the risk

of 2 Observe


+ cE [(N-n )2/N]
2


R(e,a2, ) = 2cn


(2.25)


-rd *
= 2c a(tr QV) + cE [(N-n ) /N]
2


a.s. a.s.
~2 2
+ = as c + 0 so that sN + o as c + 0.


From definition N












Also, using Anscombe's Theorem, /RFii(?N a ) + N(0,20 ) so that

using the delta method, /RFi(PN a) + N(0,(1/2)o ).
Next use the inequality


1/2, 1/2,
(tr QV/c) sN N ( m + (tr QV/c) SN (2.26)



Dividing all sides of (2.26) by n letting c + 0, and using
a~s. a~s.
SIN + as c + 0, it follows that N/n + 1 as c + 0. Thus,
L L
Jn*p(IN a) + N(0,(1/2)a2 ), and also Jn"p(sN-1 -0) + N(0,(1/2)a )
Again from (2.26), one gets the inequality


(n ) 1 (s -c)/o < (N-n )/(n ) /2< m/(n ) +/2(n* )/ (s- -a)/a.

(2.27)


Since m/(n*) 12+ 0 as c + 0, it follows from (2.27) that
(N-n*)/(n*) 12 L N(0, -) as c + 0 which implies that
(Nn)N12L L a~s.
(N-n*/N +N(0, --) as c + 0 since N/n* 1 as c + 0. Also,
2p
the uniform integrability of (N-n*)2/N for all c < co (specified)
can be proved by repeating essentially the arguments of Ghosh and
Mukhopadhyay (1980). Thus E 2[(N-n*)2/N] + --as c + 0, and it
follows now from (2.25) that












R(8,a2,X ) = 2c 1/2 o(tr VY) 12, c-+ o(c), as c +t 0
N 2p


(2.28)


Next, we find the asymptotic risk expansion for


It follows


from (2.6) and (2.8) (2.10) that


R(O,a2, g)


(2.29)


/N/ (1+2ta1)
0 i=1



(1+2ta. 2 exp(-


exp( N~ p1 aa+t)dt]
.2 i 1 1iai2


= R(e,o2 N )-2b(p-2)E 2[(sN
'-N~ ,


(2
i dt]
a.(a.+2t)
S1


+ b20- E2E [(4/N
2N


H P
0 i=1


Nt cPp
"TE i=1


1/2 a.s.
S(tr y)


2
04 p 2and Nc
(p+2)


a.s.
As c sN+


S=(tr A)


0 =


(c a.)


a. Hence, for C f 0,. as c + 0,


exp(- E a (ai 2t))dt


c-1(sN/N)/" I (1+2ta 1 2
0 i=1


(2.30)


(S4/e 1/2 )m P 1/2 1 -1/2
= (sN/(N 2 ))f (1+2tc a
0 i=1


N 1/2 5
exp(- Ntc~- 2P~ gp )
a.(a.+2tc)











1/2
c
1=1


2
+ ( P a/(o(tr A)
(p+2)


5 /a2)dt


)}{m exp(- t (tr A)
Oa


= P 4(tr A)
(p+2)2


P2 (4tr 9V
(p+2)


-1/2


-1/2
(tr A) (c


C2/a.)-1
1 1


-1 N 1 1 -1-


since tr A = tr QVj, and


cP 5/a


=' CA- 5 = ( O-A) 'D' (DVDI) 1(DVD I) 1D(8-A)


(2.31)


=(e-x)' y-1D--11 -1y-(e-x)







Next we prove the uniform integrability (in c y c ) of the left hand
side of (2.30) when C f 0. First note that


1hs of (2.30)


(2.32)


2
)dt
a.(a.+2)
1 1


<(4/(Nc))[fl exp(- Nt L
N i0


p
Sdt]


N
+ exp(-
QE


p 'i 2t
i=1 a.(a.+2) max a.










4 4 1/2 2
S(s /(Nc))(K/N) = Ks /(Nc


where K is a constant not depending on c. In deriving the first
inequality of (2.32), we have also used that t/(a.+ 2t) + in t for
all i since a. > 0 (i=1,...,p). Thus, it suffices to prove the
4 1/22
uniform integrability of s 4N/(Nc )2 in c < c First use the

inequality


4 2 ~(-6 E 4 /(2 )1+6 1 (.
E 2[(sN/(N c))I ]4N~ > dl E 2sN/Nc} ] (.3



for some 6 E(0, 1/2 ) Taking 6 in (6 1/2 ), and using Holder's
inequality, one gets



Ea2[{s N /(N2c)}1+6 ] (2.34)

(1+6)(1+6)

< [E (N2c) -(1+6) (1+6 )/(1+6)[E (s 4 6-61 (6-6 )/(1+6)
,2 a2 N


Next observe that for any arbitrary E, in (0,1),


2~~~ ~ -(+)2 (+)2 2 -(1+6)
E [( c] E [(N c)-is I ( *c (2.35)
2 2 [N

S(m c) ("P ,(N en*) + (E tr(QV)o2- +).







34



To proceed further we shall require the following lemmai.


= 0(c )


Lemma 2.1 For any 0 < e < 1 and pt 1 ,


P 2(N en*)


Proof: Note that


=Is* P (N=n) ,
n=m 2


P2 (Nlen*)


(2.36)


where [u] denotes the largest integer I u.


Now,


2
m
< c r q)


~2
P e(N=m) = P 2(s


(2.37)


2
m
Str(QV) *


2 (m-1)p
= P(x <
(m-1)p -2


Using the fact that


k1 1/2 k1/2 ,-d/2
< d) =
k
r( + 1)


2
P(x
k


2
+ P(x < d)
k+2


(2.38)


k+2
k/2 "l?"
= Oe(d ) +Oe( d)


= Oe(dk/2)


we get


P 2(N'm) = Oe(c


(2.39)











Therefore all we need show is that


[E~n*] 3 (-)
nc+ 2 (N=n) = 0(c
a~~


(2.40)


Now, for n > m+1,


< c )
-tr~QV)


(n-1)p c
)p 2


~2


2 n

= P(X2
(n-1


P 2(N=n)


(2.41)


2
n
tr(lV)


exp(h (1pc
,2


(n-1)p
exp(h c


n ~)E[exp(-hx 2(-p


2 (n-1)p
trq)1+2h1


- h>0


=inf
h>0


2(n-1)p
n 1 2
tr(qV) 2)


2
n 1
Str(QV) 2 I


(n-1)p
=exp[ ---(1
2


2 (n-1)p
n 1 2
tr(qVC) 2


2
n 1
-c ~ ~-Z)(c


= exp(1


where for the third inequality in (2.41) we use the fact that

P(Ugd) = inf chdE~e-hU]. Next observe that for n ( [En*],
h>0

c trqV E2 ye <. Since x exp(1-x) is increasing in x for

0 < x ( 1, one gets











(n-1)p
2
n 1 2
(c --)]
tr(QV) 2


2
[En*] [En*] n 1
c P (N=n)< gr exp(1 c ---)
n~m+1 2 nm+1tr(QV)2


(2.42)


(m-1)p
2
c


(m-1)p (n-m)p
2 [Cn*] (m-1)Pep l-2 22 2
n=m+1


r1 1 ,
tr(QV) 2


(m-1)p (m-1)p (n-m)p
2 1 1 2 Z (m-1)Pep l-2 22 2
< c (e) n [xp1 )]
arl) n=m+1


The series in the extreme rhs of (2.42) is convergent. Therefore


= 0(c


c P (N=n)
n=m+1 2


(2.43)


Hence, for m > 2, c (



rhs of (2.35) 5 K (m ~1I) c


+ (E tr(QV)a ~(+)


,(2.44)


where K (> 0) is a constant not depending on c. Further, using the

backwuard martingale property of Tn = {(p+2)/p~s2 n ob' aia

inequality for submartingales, it follows that for every r 1 ,


E (2r) T ]r J E 2(T )
E2( N ) 2 p+2 N2N


T r K E(Tr
Tn) 1 2 mr
a


2
a n>m


(2.45)











where K1 is a constant which only depends on r. Combining


(2.33) (2.35) and (2.44) (2.45), the uniform

integrability (in c < c,) of sN/(N 2c)2 follows.
e P x (i.e., s f 0), as c + 0,


Thus, for m > 2,


p 1/2 42
n (1+2tail) exp(- --; c=1 a (a+2t)dt]
i=1


c-1E 2[(sN/N)( 1


(2.46)


2

(p+2)


a (tr QV) I(e-X 1) V Q V (9-X)-1 )


Similarly, it can be shown that


11


. 1/2exp(-
ax( -;


c-1E4 2[(sN/N) p (1+2ta 1
a 0O i=1


(2.47)


p+2


as c + 0. Hence, from (2.28) (2.29) and (2.46) (2.47) it follows

that for m > 2, and e Z x, c + 0,



R(e,,2,6b (2.48)


1/2 1/2
= 2c a(tr QV) +


c -cb ~2(p2(4p l-4)p -~ tr
(p+2)


gV-1
QV)


x {(e-1) V 1 1 -1 -1 + o(c) .





















































x E(E =W./a.)- + o(c


For 6 = 1, i.e., 5 = 0, it follows from (2.29) that



R(X,o2,6 )


(2.49)


-1
(1+2ta i



dt.


-1/


= R(X,o 'N) 2b(p-2)E 2(NN0.H
0 1=1


121


2 -2 4 -1
+ be E2(sN/N'J0 .8 (1+2tai )
0 1=1


P -1
H (1+2tai)
i=1


dt = E(C =1 ia -)- where the W 's are


S1/2


Note that jO


. 1/2


as


a.s.


iid x2 Note sN/(Nc
1'


9 02/(o(tr QV)


) 9 o'tr QV)


o3(tr QV)


-1/


a.s. 2
) + 4/(,(tr QV)
2
(p+2)


2
) = P
(p+2)2


and sN/(Nc


c + 0. Hence, using the uniform integrability (in c ( co) of


S1/2
s /(Nc ) and s /(Nc


) for m 2, one gets from (2.28) and


(2.49) that for m > 2, as c + 0,


2b 12
R(X,a ,6 ) = 2c a(tr


12
Qv)


(2.50)


. 1/2


1/2 2
c b~y~(p+2) 2


p -4
p


- b)(tr QV)


1/











Hence, for 0 < b < 2(p-2), m > 2, asymptotically (as c + 0), for

O / x, the percent risk improvement of 6q over XN is



100{ 1/2 b ~-~2(2(p2-4)/p b]a)(tr QV)-3/2 (2.51)
(p+2) ~


x {(e-1)'V lQ V (8,-x1)}c + o(c).



For 9 = x, O < b < 2(p-2), m > 2, asymptotically (as c + 0) the

percent risk improvement of 6~ over 2N 1s

10I12(r V)1 b P ~N -4/ -N bE ~1+ol
2
(p+2)

(2.52)


Observe that the dominant term in both (2.51) and (2.52) is

maximized when b = (p2-4)/p. Unlike the fixed sample case, the

optimal choice of b in the sequential case depends on unknown

parameters O and 02. From an asymptotic point of view as evident in

(2.51) and (2.52) it appears that for small c, b = (p2-4)/p is the

optimal choice, which is different from p-2. We may also note that
the a.'s are the eigen-values of A = DVD i.e., the eigen-values of

D DV = QV. For Q = V =I the expressions given in (2.51) and

(2.52) simplify respectively to



50b 2 2(P2(p2- 4)/p b)op-3/2 eO-X)' (e0-1c-l 2+ o(c 2)(2.53)
(p+2)2











and


50b 2-(2(p 4)/p b)p (~p-2)- + o(1) (2.54)
(p+2)


Remark. It should be noted that in defining 6 n (21,fon

~2 2
uses sn instead of s then carries out the asymptotic calcula-

tions in the same way as before, one will find that the optimal

choice of b = p 2. Then, however, our method of proof does not

give an exact dominance result as in Theorem 2.1.



2.4 A Monte Carlo Study



For simplicity, consider in this section the case when

x = 0, O = V =I .In this special case, Takada (1984) has shown
~ ~ ~ ~ p
that if one defines



n ~ ly = (p2 n > 2 (2.55)
~=(p+2) n hn


b b
then the estimator 6N N 1"' X ), where




6 =X X (2.56)




dominates RN for all 0 < b < 2 (p-2). The difficulty with the

estimator n1 is that even for large n it does not stabilize since for
-1 2
every n > 2, rh ~ (p+2) x In this section, our objective is to











b b
compare the risk performance of IN N and 6 with b = p-2 and

b = (p2-4)/p.

For Monte Carlo simulation, we take p = 3. Also, in this

section, for actual risk simulation, we consider 6p-2 and 6p-2 or

2 b2
6124/ and 6!, -4 where 6 =6 (X ,..., X )=(1 )I \ ,
N ~ N, n ,n nX .X ~n
,n ~n

b bbn
n > 2, 6 =6 (X ,..Xn) =(1 -l )t X ,ng 2.
Nn, ~n, ~1' Nn nX
~Vn ~n

In the above a+ = max(a,0). Such plus rule versions of James-Stein

estimators prevent overshrinking, and in fixed sample situations,

perform better than the usual James-Stein estimators.

To simulate the sequential sampling procedure, and evaluate

the estimators under consideration, a large pool of trivariate

N(o'l3) variables was generated with o' = (0,0,0), (0, 1/2 *1/2 )
1 1
(0'2l 011,(,,) 0/,2,(,,) Also, c, the cost

per sample unit, is taken as c = .01, .05, .1, .25, .5 and 1. It
b b
should be noted that the risk of the estimators aN and 6N depends on

6 only through Hen, while XN has a risk which does not depend on

nen. For each fixed e and c under consideration, from the pool,

samples were taken sequentially from the top down until 1000

experiments were completed.

A single experiment would be taking sequential samples from the

pool until the stopping criterion was met. At this point, sampling

would stop, the number of samples would be recorded and the estimators











2 2
p -2 p -2 (p -4)/p g(p -4)/p, n hi soitdlse
~N' N 6N 6N adtei socae lse
are computed.

On the completion of 1000 experiments, we compute the average

p-2 (p2-4)/p p-2 6(p2-4)/p, n hs r h
losses for 2y N ,N ,N N, ndtes ae h
simulated versions of the corresponding risks. Also, at this point,

we compute the percentage loss improvements


100((OXp-2
100L~,R) -L(e,6N ))/L(e,X );



100(~eX(p -4)/p
10(~,_N) L(eN ))/L(e, ~N)


100(L(e, N) L(e,6N- ))/L(e, N


103((OX(p -4)/p
10(~ ,N) L(e,6N )/,e



Our simulation findings are summarized in Table 2.1 and Figures

2.1 to 2.6. It is clear from the table that as in the fixed sample

case, when X = 0, the risk improvement of all the estimators is most

substantial when 18en = 0, and the improvement keeps diminishing as IIen

moves further and further away from zero. Also, it is clearly evident

that with the proposed stopping rule b = (p2-4)/p does better than b=

p-2. This is also clear from the asymptotic risk expansion. Also, for

a fixed en~f0 as c decreases, i.e., the average sample size gets larger,

the percentage risk improvement decreases as in the fixed sample case.












Table 2.1 The Risk and the Percentage Risk Improvements


p2-4 p2-4 p2-4 p2-4

c lIIlI N R(XN g) R( N-2,) R(N 9 ) R(A~ 2,9 ) R(A~ Np N N0 2 Nsp-


9.09
10.20
11.30
12.24
13.57
14.76

6.57
6.45
6.31
4.93
4.02
2.06

4.82
4.73
3.89
3.04
1.70
.77


13.38
15.04
16.67
18.01
20.01
21.58

9.66
9.45
9.39
7.20
5.65
2.84

7.01
6.80
5.65
4.35
2.23


7.92
8.22
8.17
8.61
9.64
10.14

5.95
5.17
4.88
3.55
2.86
1.41

4.32
3.87
3.00
2.17
1.37


11.81
12.34
12.26
12.86
14.34
14.93

8.83
7.70
7.31
5.31
4.19
1.87

6.40
5.66
4.48
3.12
1.84
.51


2.27
2.72
3.62
5.54
7.89
17.60

2.26
2.73
3.56
5.45
7.82
17.48

2.27
2.78
3.58
5.39
7.83
17.58


3.6948
2.5424
1.8166
1.1751
.8052
.3490

3.6412
2.4466
1.8518
1.1591
.8002
.3694

3.6677
2.5991
1.8506
1.1681
.8107
.3524


3.3591
2.2831
1.6114
1.0313
.6959
.2975

3.4018
2.2889
1.7350
1.1020
.7680
.3618

3.4908
2.4761
1.7787
1.1326
.7969
.3497


3.2005
2.1600
1.5138
.9635
.6441
.2737

3.2895
2.2154
1.6779
1.0757
.7550
.3589

3.4107
2.4223
1.7461
1.1173
.7926
.3493


3.4022
2.3333
1.6681
1.0739
.7276
.3136

3.4246
2.3202
1.7614
1.1179
.7773
.3642

3.5091
2.4985
1.7950
1.1427
.7996
.3507


3.2586
2.2286
1.5939
1.0240
.6897
.2969

3.3197
2.2581
1.7164
1.0976
.7667
.3625

3.4329
2.4520
1.7677
1.1317
.7958
.3506


1 0
.50 0
.25 0
.10 0
.05 0
.01 0


1 47
.50 /IT7
.25 J/Lif
.10 /Lif
.OS JATE
.01 JUT

1 1
.50 1
.25 1
.10 1
.05 1
.01 1


.88 .48













1
.50
.25
.10o 2
.05
.01l 7


1 71
.50
.25
.10 71
.05
.01l 1

1 2
.50 2
.25 2
.10 2
.05 2


2.27
2.74
3.57
5.45
7.74
17.62


2.28
2.73
3.59
5.48
7.80
17.63

2.27
2.68
3.62
5.46
7.91


3.6316
2.5176
1.7857
1.1648
.7940
.3516


3.6506
2.5321
1.8786
1.1659
.8181
.3496

3.6174
2.5962
1.7980
1.1870
.7930


3.5422
2.4638
1.7611
1.1489
.7859
.3502


3.5938
2.4894
1.8575
1.1586
.8147
.3486

3.5750
2.5701
1.7867
1.1796
.7883


3.5054
2.4438
1.7544
1.1429
.7832
.3499


3.5726
2.4741
1.8497
1.1572
.8144
.3483

3.5641
2.5627
1.7849
1.1170
.7866


3.5470
2.4710
1.7656
1.1528
.7880
.3508


3.5954
2.4966
1.8639
1.1601
.8154
.3489

3.5779
3.5713
1.7884
1.1815
.7895


3.5084
2.4505
1.7590
1.1481
.7861
.3508


3.5710
2.4830
1.8580
1.1585
.8149
.3488

3.5660
2.5621
1.7859
1.1796
.7883


2.46
2.14
1.38
1.37
1.02
.40


1.56
1.69
1.12
.63
.42
.29

1.17
1.01
.63
.62
.59


3.48
2.93
1.75
1.88
1.36
.48


2.14
2.29
1.54
.75
.45
.37

1.47
1.29
.73
.84
.81


2.33
1.85
1.13
1.03
.76
.23


1.51
1.40
.78
.50
.33
.20

1.09
.96
.53
.46
.44


3.39
2.67
1.50
1.43
.99
.23


2.18
1.94
1.10
.63
.39
.23

1.42
1.31
.67
.62
.59


.01 2 17.63 .3501


.3492 .3490 .3494 .3491


.26 .31 .20 .29













p2-4p2-4 p2-4 p2-4
caili N R(XN,2) R(f6p-2,2) R(fN ) R(aN-2,) R(AN N- Np %P gp-2 N P
-~~ -N A ~N N N N


Table 2.1-continued


1 /
.50 /
.25 /
.10 /
.05 /
.01 /


2.28
2.73
3.53
5.57
7.84
17.44


3.5773
2.5431
1.7944
1.1658
.8193
.3496


3.5465
2.5247
1.7856
1.1620
.8159
.3491


3.5390
2.5201
1.7842
1.1616
.8147
.3490


3.5480
2.5274
1.7881
1.1634
.8169
.3492


3.5394
2.5221
1.7872
1.1634
.8161
.3491


1.07
.90
.57
.36
.56
.17


1.06
.83
.40
.21
.39
.14











The opposite is the case when 11ell = 0. The main reason is that

when 1 = 0,) NIlIXN 2 behaves as a multiple of c Ilell2 when

2 2
8 f 0, while NIIl II + x when O = 0.

Figure 2.1 plots the risk of the sample mean, RN, the
James-Stein estimator at b = p-2, 6Np-2, and Takada's estimator at

b = p-2, p2, versus nen, for six levels of cost. As can be seen

from the graphs, 6NP-2, for each level of cost, has smaller risk than
"~N,


p-2 p-2

Sp-2~~ ~~ bein bete ha ha f p2,f r each level of191 Also, we

c see that thepecn risk dimproemen is greatest at Ilel = 0 and n


dment decreas n oes as erases Further or fixed c. otst the percent

risk improvement derese as lie l moves awa fvro zN ero. Figure 2.3

plots the iss of ts.hee sape maean N the Jamcenrs-Stin oestmant ora


be =ha (p e 4)pren 6is and Takad s estiatort at b11 = (p -4)p, n

cesN sa vecresu s Ie, for te six r evi foush levels of cst. Throe-

panttdecrns an oclsos foder thes. ter plotare ixdentcalto thoe ofren


Figur te 2.1. s oFigue 2.4ploste percent rikimrvmeto

S(p2-4)/p, -~~4/ (p2-4)/p
6 versu Xe andth 6i orveru l ,versus cost, frthe fu


~VN ~N) N,











previous levels of netl The patterns and conclusions for these

plots are identical to those of Figure 2.2. Figure 2.5 is a plot of
the risks of X 6 and 6!. ,4/ versus net, for the six
~N' ~N ~N
previous levels of cost. This figure gives us an idea of how

Figures 2.1 and 2.3 compare as far as the James-Stein estimators are

concerned. Here, for the first time, we can graphically see that
(~p -4)/p (~p-2) wie bfr ot
N ~has smaller risk than a wie sbfrbt

dominate ~N). Hence, we have more evidence in favor of (p2-4)/p as
the optimal choice of b. Figure 2.6 plots the percent risk
impoveens p-2, p-2 (~p -4)/p ( 4/
imroemnt o 6 ,6 and 6!, ,4/ over X versus
~N, ~N ~N N ~NN
cost, for two levels of nea. Here, for the first time, we can
(p -4)/p
graphically see that SN has the greatest improvement while

6i2-4/ is second, 6N 2is third and aP- has the least

improvement over XNgiving once again more evidence in the favor of
(p2-4)/p as the optimal choice of b.










































-


.1- O


- -


) rT .


_I 1 I r I I__ rTI I_ IT1 _ Irl _ _I _____li____ __


48









i- cost=1. O



3.25-


3.00-1


2.75-



2.501a--~=""cost=.50


2.00-





1.50-


1.25-1


1.00-1





0.25-1


cost=.25


cost=.10


~------eP


cost=.05





A cost=.01


~denotes risk for XN


6p-2
-N,
Sp-2
~N


NB'RM THETA


O denotes risk for

Denotes risk for


Figure 2.1 Risk Analysis at b=p-2.







49











12-























10-(



CBS












0-N




A denotes percent risk improvement for gp-2


Figure 2.2 Percent Risk Improvement at b=p-2.







































1.00

0.75-


_ I r ~ _I 1_ r rT1_7 __1 ~1_ _ __ r_ __ __ __ __ ____


3.00
9.75-

i. 50-




3.00-






1. 75-


Y
~sr~bc--~c~ c.

~-~~
















~l~g~e


cost=1.0









cost= .50






cost=.25


cost=.10


cost=.05



cost=.01


NORMt THETAI
L denotes risk for XN
Odeote rik fr 6p2~N /
O denotes risk for 6 p2-4) /p
~N



Figure 2.3 Risk Analysis at b=(p2-4)/p.
















'20











S16-
















CBS

E 6 =)/
N 10- e ecntrs mrveetfr_
Tdntspretrs mrvmn o ~24/


Figure 2.4 Percent Risk Improvement at b=(p2-4)/p.


































































Figure 2.5 Risk Analysis for the James-Stein Estimators.


-~-


. B-c- a r


-r


RISK
3.751
j --LPe~ .ncost=1.0


3. 00


2.75-


2~.50-


2.25-


2.00O-


cost=.50


cost= .25






cost= .10



cost=.05


L 7


1.00


0.7'35

0.50-


cost=.01


0.00-


NCIRM THETAI
a denotes risk for XN

O denotes risk for 6p-2
~N

Denotes risk for g624/
-N





















18


IISli=O
~ ;

-a
~--~-E3------~---~







Ilel I=1


c 1P
E


0.0 0. 1 0.2? D.3 0.U 0.5 0.6 0.7 D.d 0..3 L,0
COST 2[Zq/
1 denotes percent risk improvement for
~ N

O denotes percent risk improvement for 6(24/

Denotes percent risk improvement for 6p-2

O denotes percent risk improvement for 6p-2



Figure 2.6 Percent Risk Improvement for All Estimators.
















CHAPTER THREE
SEQUENTIAL SHRINKAGE ESTIMATION OF
LINEAR REGRESSION PARAMETERS


3.1 Introduction



Consider the linear regression model Y. = z'.8 + E.,=,,..
1 ~1~ 1
where the E.'s are independent and identically distributed

N(0,o2), B(px1) is unknown, and z 2. . are known pxl
vectors. Let Z' =z (1.,z ) and Y, = (Y1"" 'n)'. Assume

that the rank of Z' Z is p. The problem is estimation of B. Given

Y let the loss incurred in estimating 8 by 6 (Y ) be



Ln n'~'' =i 8 IZn Zn)an B) + Cn (3.1)



The usual estimate of B is the least squares estimator

a = (Zn Z )-1Zn; Y which is distributed N(B,o2Zn~ Zn -1), with
risk


R(8, )~ = EL (8 ,8) (3.2)


= E[(B B) I(Z' Z n)(6 8)] + cn












n c


2 1/2
The above risk is minimized at n = n* E (pa ) .However, if

a' is unknown there does not exist any fixed sample size which

minimizes (3.2) simultaneously for all a Again, motivated from the

optimal fixed sample size n*, we propose the following sequential

procedure.


p MSE, 1/2
N = inf {n m : n ( n)}(3.3)


where m > p+1 is the initial sample size and


MSE = n Y Z B n /(n p) (3.4)
n ~n ~n ~n


for each n > p+1. The above stopping rule was first considered by

Mukhopadhyay (1974) who proposed the estimator 8N for B and studied

asymptotic properties of N and j .
Our next step is to show that the above procedure can be

motivated from a minimax criterion. With this end, first let 2 have

a N~~r2I) pror.Then the posterior distribution of 8 given

Yn = y_ is



N(u + .2Zn (2 nZ Z + 02 -n -1(y~ Z v), (3.5)












p-P 7 Zn InZn) + GI n) -Zn *



Next we prove the following matrix lemma.


Lemma 3.1


2Z Z' 2 -1 2 2 -12
Z (rZ Z GI = ( Z'Z+ 0 ) Z' .
~n- ~n n~n NP n



Proof: Use the identity


(r2ZnZn + 021 )Zn= Zn 2ZnZn + 02 n) .



Using Lemma 3.1 it follows that


T Z'(T +a )( Z U)



(T 22 2Z+ a2 -1Zn (n Zn9



2~ + 1 -1 -1 -2 -
= T(I +-m (n n)) (nZn ) Zn nN


(3.6)


(3.7)


2
= (I + 1,
T


(ZnZ -1- n '


Also, using Lemma 3.1,













I 2 n2Z + 2 n-1Z



=Ip (r2Z'Z + 02I -1 ,2 'Z )
~p n-n ~p -n-n


(3.8)


~2p ~ 2p)


= I -( (2Z Z + 02I -1 T2Z'Z
~pn n ~p Nn-n


= 0(, Z'Z + 62I )-
~n-n -


2
= --4Z Z
,2 ~ n n


2
a -1
+ --- I )
T2 ~p


2
a
2 ~p
T


2
2 ~n~n
T


1 -1 -


Thus from (3.5) and (3.7) (3.8) it follows that the posterior

distribution of 3 given Yn = yn Is


02
T


s -1 -1 ^
(Z Z ) ) (6 ) ,


N(1 + (I p


(3.9)


2
2 a
a(I +-q


S' -1 )-1Z Z -1)
(Zn~n Zn-n


The loss (3.1) being quadratic, the Bayes estimator of 8 is


2
a -1 -1
+ -r(ZnZ, ) ) (6 )
T


B
= p =+ (Ip












a2 a2 -11
with risk -- tr[ (I + (Z Z ) 1)]+ .
n p ,2 -n-n

Next, consider the sequence of {N(u,ml ), m>1} priors on f3.

Then the Bayes estimator becomes



B .2 11^
6 = +( +I + Z IZ )- )- (8-B
~n "p m -n n ~n



.2 ,2
with risk -- tr [(I + -(Zn)) + cn. Here, the Bayes

2a 2
risk converges to pa+ cn as m ", and R(B,o2 B ) p + cn
n ~n n

for all B. Hence n*, the sample size, and 8 the estimator, is a
~ ~n

sequential minimax rule. Consequently (3.3) can be viewed as an

empirical minimax stopping rule.

In Section 3.2 we show the dominance of a class of estimators

over SN for the loss (3.1) and the stopping rule (3.3). In Section

3.3 we provide an asymptotic risk expansion for 8N and an asymptotic

risk expansion for the shrinkage estimators dominating SN both up to
the second order terms.


3.2 A Class of James-Stein Estimators Dominating 6



In this section we consider the class of James-Stein estimators

6N(YN) where






















































r
~cN ~NNN ~N


n U- )
a ~


sbv )
~01 ~0


(3.10)


for every n > p+1, where s2 22Sn sacntn n e s

the known point towards which we shrink. The main result of this

section is as follows:


Theorem 3.1 Under the stopping rule (3.3) with m 2p, and under

the loss (3.1),


2 b
R(,a ,S )
N N


2
SR(8,a ,N
N'


for every bE(0,2(p-2)).


Proof: Note that



R(8,a2, ) (,2 )
N '-


(3.11)


=bE2 1


1
- 2bE 2 9
N


NN NZNN VN


~)( ~ ( ~ ~N







60



2 4

= E. E [(1 b
n B ,02 (B X)'(Z'Z )(B X)
~n ~n-n -n


2
bs
- 2 B n)( ) )
~n --n~-n


_B *(Z Z )(B ))}I].


To proceed further we shall require the following lemma concerning

the orthogonal decomposition of SSE (=(n-p)MSEn) in regression.


Lemma 3.2 Let Yn = 6+ e where e N( O a2In) for every n p+1.

Assume that (Z'Z ) to be invertable for every n 1 p+1 and define
^ 2n
SSE = HY Z B 8 Then the following results hold.
n ~n ~ln-


(i Zn+1Zn+ Zn n + zn+1zn+


(ii) (Z' Z ) 1
-n+1~n+1


= (Z'Z ) (1 -- zn~ (ZIZ ) 1), where
~n-n -p n n1 n+1~r
n


n =1 + z (ZIZ )- z ,
n ~n+1 YIUI ~n+1'


(iii) 8 = +n [ (Z Z )1 z ,wee
~n+1 ~nn nn ~0- ~+1'wr En


B = (Z'Z )Z'Y ;v


a 2
(iv) SSE,, = SSE, + ni


-z B and
~n+1 ~n


=Y
n+1







61




(v) Yn ZB is distributed independently of En.


Proof of (i):


Note that Z n+1l n+)I.


Zn
Z' Z = (Z'z )[ ]
~n+1~n+1 ~n~n+1
~n+1


Then


(3.12)


~n-n ~n+1~n+1



Proof of (ii): Using the matrix identity from Rao (1965) that for

any matrix A (pxp) and (px1) vectors u and v,


(A + uv )- = A 1



then


(Z ,Z ) =~ (Z Z + z
~n+n+ ~n~n


(A luv A )~,
1+v A lu





n+1~n1+1)-


(3.13)


(3.14)


(' -1_---~---(' 1-1,,31' )-1]
Nn n -1 ;n 0 n1n + Z ln
1+n+1 Znn / n+1


= (ZtnZ)- (I~p -n n+1Un+1 n 0 *l











Proof of (iii): Note that



6 (Z' Z )-Z Y
,n+1 Nn+1 n+1 sn+1-n+1


(3.15)


= (Zn -1 _1 )~En+11n+1 Zn n-1(nn
P T~n


n+11n+l


1 -1
--- z (Z'Z ) z )
n n+1 ~n~n ~n+1


+ Y (Z Z ) z2 (1
n+1 ~n~n ~n+1


= 8- --Z Z ) z (z 8 )
~n np" ~n +1 ~n+1~

1 -1
+ --- Y ~(Z'Z)z
l ,n+ n~n ~n+1


-1 -1
,n + ps(nZn ) n+1 '



where in the fourth equality in the rhs of (3.15) we use the

definition of np.


Proof of (iv): Note that


SSE,+ = "nYl Z e o~l


(3.16)


2
Y Z 8
sn sn-n+1

Y z 8
n+1 ~n+1~n+1










-1 -1 2
sIn inn E (ZI ) Z .11


n -12
+ (Y ~- z' --2' z (Z'Z ) z )
n+1 ~+1~n n ~n+1 ~Yn~n Nn+1

E E:
n nn2 + n+/nn-Z(nZi



+ (E z (
n n ~n+1 ~n~n ;n+1


2
E
= SSE, + -


2
En
= SSE + -
n np


En2
(? 1) + (E -i-)


the third equality in the

the fifth equality we use


rhs of (3.16) we use part (iii),

the definitions of ~n and 6 *


where in

while in


Proof of (v): Since Yn Z 8, and En are both normally distributed

then to prove their independence we need only show that their

covariance is zero. With this end note that


Cov(Y Z 8,n.)


(3.17)


=Cov(Y ni'1+ in+ n z



= Cov(1n'in+1 ) + Cov(Z nin' n+1







64




= Var(Y )Z (Z Zn ) zn~ + Zn Var(B )zn~


= Z (Z ~Z ) z + ZZn (Z Z ) znt
~n~~n ~n+1 ~n ~n~n +

= 0


Note that if we define X = Y Z B n and
1 ~p+1 ~p+1 p+1


X.= ,1P- i=2,...,n-p, then from Lemma 3.2 the X.'s are
1j nip


2 2
independent, and Xi ~ a x1, i=1,...,p. Also,


from Lemma 3.2, for


each n p+1,


SSE =nY Z B 5 = X1
p+1 ~p+1 ~p+1 p+1



p+2 p+1 tp+1 p+1 n~p+1 1


+ X2


2
"n-2
nn-2


SSE,_ = 'nY2 ~nZ 8 8





SSE =nY Z B 11 +
n ~n-1 ~n-1Nn-1


X1+X2+...+X ,p-


-X +X + .+X +X
n 1 2 n-p-1 n-p


be considered as the mean of


-1
Consequently, MSEn = (n-p) SSE, can


(n-p) independent and identically distributed x2 random variables.


Denote by Bn the o-algebra generated by X1, .. Xn-p.

using Lemma 3.2,


Then













E [ n I ] (3.18)
2 C N=n]
~Yn ~ ~n~ ~n



= E 2E 2- ^ nN=nljB ]
~~ ~ ~~ 2nnz' n ~n


4 1
=E [s I E [I JB }]
8,o 8,0lB (B x) (Z Z )(6 X)


4 1
= E 2[s I[Nn]E 2^ '
a 8, a (6- 1)(n)( )



where for the second equality in (3.18) we use the fact that I
[N=n]

is a function of X1, . ., Xn-p only, and in the third equality

we use the independence of X1, .. Xn and 8 .
Next, note that for every n ,p+1, since ZnZ is positive

definite, using the diagonalization theorem, there exists a

nonsingular Dn such that D~ (Z IZ ) D = A 1, a diagonal matrix with

all positive diagonal elements. We write a as the i-th diagonal
element of A n. Use the transformation Ln = D'(6 1), so that

LS ~ N(q,a~ A ), with 2q = Dn n ) Then


    $E_{\beta,\sigma^2}\Big[\dfrac{\sigma^2}{(\hat\beta_n-\lambda)'(Z_n'Z_n)(\hat\beta_n-\lambda)}\Big] = E_{\beta,\sigma^2}\Big[\dfrac{\sigma^2}{L_n'\Lambda_nL_n}\Big]$.    (3.19)

Here, for every $n$, $L_n'\Lambda_nL_n/\sigma^2 = \sum_{i=1}^p a_{ni}L_{ni}^2/\sigma^2 \sim \sum_{i=1}^p W_{ni}$, where the $W_{ni}$'s are independent and $W_{ni} \sim \chi_1^2\big(\xi_{ni}^2/(2\sigma^2/a_{ni})\big)$, $i = 1, \ldots, p$. Hence, using again the result that if $P(U > 0) = 1$ then $E(U^{-1}) = \int_0^\infty E[\exp(-tU)]\,dt$, one gets

    lhs of (3.19) $= \int_0^\infty E\big[\exp\big(-t\sum_{i=1}^pW_{ni}\big)\big]\,dt$    (3.20)
    $= \int_0^\infty \prod_{i=1}^p E[\exp(-tW_{ni})]\,dt$
    $= \int_0^\infty (1+2t)^{-p/2}\exp\Big\{-\dfrac{t}{1+2t}\cdot\dfrac{1}{\sigma^2}\sum_{i=1}^p a_{ni}\xi_{ni}^2\Big\}\,dt$
    $= \int_0^\infty (1+2t)^{-p/2}\exp\Big\{-\dfrac{t}{1+2t}\cdot\dfrac{(\beta-\lambda)'(Z_n'Z_n)(\beta-\lambda)}{\sigma^2}\Big\}\,dt$
    $\equiv g_p(\Delta_n)$  (say),

since $\sum_{i=1}^p a_{ni}\xi_{ni}^2 = \xi_n'\Lambda_n\xi_n = (\beta-\lambda)'(Z_n'Z_n)(\beta-\lambda)$.

Here, $\Delta_n = \sigma^{-2}(\beta-\lambda)'(Z_n'Z_n)(\beta-\lambda)$.
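For readers who want a sanity check on (3.19)-(3.20), the following Monte Carlo sketch (illustrative only; the helper name g_p is ours, not the dissertation's) compares $E[\sigma^2/\{(\hat\beta_n-\lambda)'(Z_n'Z_n)(\hat\beta_n-\lambda)\}]$ with the integral $g_p(\Delta_n)$:

    # Monte Carlo check that E[sigma^2 / Q_n] = g_p(Delta_n), where
    # Q_n = (beta_hat - lam)'(Z_n'Z_n)(beta_hat - lam).  Illustration only.
    import numpy as np
    from scipy.integrate import quad

    rng = np.random.default_rng(1)
    p, n, sigma = 5, 25, 1.0
    beta = np.array([0.5, -0.3, 0.8, 0.1, -0.6])
    lam = np.zeros(p)

    Z = rng.normal(size=(n, p))
    G = Z.T @ Z
    G_inv = np.linalg.inv(G)
    delta_n = (beta - lam) @ G @ (beta - lam) / sigma ** 2

    def g_p(d, p):
        # g_p(d) = int_0^inf (1+2t)^(-p/2) exp(-t d/(1+2t)) dt
        return quad(lambda t: (1 + 2 * t) ** (-p / 2)
                    * np.exp(-t * d / (1 + 2 * t)), 0, np.inf)[0]

    reps = 200_000
    L = np.linalg.cholesky(G_inv)        # beta_hat ~ N(beta, sigma^2 (Z'Z)^{-1})
    bhat = beta + sigma * rng.normal(size=(reps, p)) @ L.T
    Q = np.einsum('ij,jk,ik->i', bhat - lam, G, bhat - lam)
    print("Monte Carlo :", np.mean(sigma ** 2 / Q))
    print("g_p(Delta_n):", g_p(delta_n, p))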


We shall next show that $g_p(\Delta_n)$ is nonincreasing in $n$. Note that for any $n \times p$ design matrix $Z_n$,

    $Z_n'Z_n = \Big(\sum_{i=1}^n z_{ji}z_{ki}\Big)_{j,k=1,\ldots,p}$,    (3.21)

so that, for any $a \in \mathbb{R}^p$,

    $a'(Z_n'Z_n)a = \sum_{j=1}^p\sum_{k=1}^p a_ja_k(Z_n'Z_n)_{jk}$    (3.22)
    $= \sum_{j=1}^p a_j^2\sum_{i=1}^n z_{ji}^2 + 2\sum_{j<k} a_ja_k\sum_{i=1}^n z_{ji}z_{ki}$
    $= \sum_{i=1}^n\big[a_1z_{1i} + \cdots + a_pz_{pi}\big]^2$,

which is a nondecreasing function of $n$. Hence, using (3.21) and (3.22), $\Delta_n$ is nondecreasing in $n$, and since $g_p(\cdot)$ is decreasing in its argument, $g_p(\Delta_n)$ is a nonincreasing function of $n$.
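The monotonicity just established is easy to visualize numerically: in the sketch below (illustrative only) $\Delta_n$ grows by $(z_n'a)^2 \ge 0$ with each added design row, so $g_p(\Delta_n)$ can only go down.

    # Delta_n = (beta-lam)'(Z_n'Z_n)(beta-lam)/sigma^2 is nondecreasing in n,
    # since each new row adds (z_n'(beta-lam))^2 >= 0.  Illustration only.
    import numpy as np

    rng = np.random.default_rng(2)
    p, sigma = 3, 1.0
    a = np.array([1.0, -2.0, 0.5])                 # beta - lam
    Z = rng.normal(size=(40, p))
    deltas = [a @ (Z[:n].T @ Z[:n]) @ a / sigma ** 2 for n in range(p + 1, 41)]
    assert all(d1 <= d2 for d1, d2 in zip(deltas, deltas[1:]))
    print("Delta_n is nondecreasing in n")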


Next, observe that

    $E_{\beta,\sigma^2}\Big[\dfrac{s_n^2\,(\hat\beta_n-\lambda)'(Z_n'Z_n)(\hat\beta_n-\beta)}{(\hat\beta_n-\lambda)'(Z_n'Z_n)(\hat\beta_n-\lambda)}\,I_{[N=n]}\Big]$    (3.23)
    $= E_{\beta,\sigma^2}\Big[E_{\beta,\sigma^2}\Big\{\dfrac{s_n^2\,(\hat\beta_n-\lambda)'(Z_n'Z_n)(\hat\beta_n-\beta)}{(\hat\beta_n-\lambda)'(Z_n'Z_n)(\hat\beta_n-\lambda)}\,I_{[N=n]}\ \Big|\ \mathcal{B}_n\Big\}\Big]$
    $= E_{\beta,\sigma^2}[s_n^2I_{[N=n]}]\;E_{\beta,\sigma^2}\Big[\dfrac{(\hat\beta_n-\lambda)'(Z_n'Z_n)(\hat\beta_n-\beta)}{(\hat\beta_n-\lambda)'(Z_n'Z_n)(\hat\beta_n-\lambda)}\Big]$,

by the independence of $X_1, \ldots, X_{n-p}$ and $\hat\beta_n$, and the fact that $I_{[N=n]}$ is a function only of $X_1, \ldots, X_{n-p}$.

Using the diagonalization as before,

    $E_{\beta,\sigma^2}\Big[\dfrac{(\hat\beta_n-\lambda)'(Z_n'Z_n)(\hat\beta_n-\beta)}{(\hat\beta_n-\lambda)'(Z_n'Z_n)(\hat\beta_n-\lambda)}\Big]$    (3.24)
    $= E_{\beta,\sigma^2}\Big[\dfrac{L_n'\Lambda_n(L_n-\xi_n)}{L_n'\Lambda_nL_n}\Big]$
    $= \sum_{i=1}^p E_{\beta,\sigma^2}\Big[\dfrac{a_{ni}L_{ni}(L_{ni}-\xi_{ni})}{L_n'\Lambda_nL_n}\Big]$
    $= \sum_{i=1}^p a_{ni}\cdot\dfrac{\sigma^2}{a_{ni}}\,E_{\beta,\sigma^2}\Big[\dfrac{\partial}{\partial L_{ni}}\Big(\dfrac{L_{ni}}{L_n'\Lambda_nL_n}\Big)\Big]$
    $= \sigma^2\sum_{i=1}^p E_{\beta,\sigma^2}\Big[\dfrac{L_n'\Lambda_nL_n - 2a_{ni}L_{ni}^2}{(L_n'\Lambda_nL_n)^2}\Big]$
    $= (p-2)E_{\beta,\sigma^2}\Big[\dfrac{\sigma^2}{L_n'\Lambda_nL_n}\Big]$
    $= (p-2)g_p(\Delta_n)$.

For the third equality in (3.24), one uses Stein's identity, while for the last equality, one uses (3.19) and (3.20).

Combining (3.18)-(3.20) and (3.23)-(3.24),

    $R(\beta,\sigma^2,\delta_N^b) - R(\beta,\sigma^2,\hat\beta_N)$    (3.25)
    $= b\sigma^2\sum_{n=m}^\infty \frac1n\,g_p(\Delta_n)\,E_{\sigma^2}\Big[\frac{s_n^2}{\sigma^2}\Big\{b\,\frac{s_n^2}{\sigma^2} - 2(p-2)\Big\}I_{[N=n]}\Big]$
    $\le 2(p-2)b\sigma^2\sum_{n=m}^\infty \frac1n\,g_p(\Delta_n)\,E_{\sigma^2}\Big[\frac{s_n^2}{\sigma^2}\Big\{\frac{s_n^2}{\sigma^2} - 1\Big\}I_{[N=n]}\Big]$,

where in the last step one uses $0 < b \le 2(p-2)$. Accordingly, for proving the theorem, it suffices to show that

    $\sum_{n=m}^\infty \frac1n\,g_p(\Delta_n)\,E_{\sigma^2}\Big[\frac{s_n^2}{\sigma^2}\Big(\frac{s_n^2}{\sigma^2} - 1\Big)I_{[N=n]}\Big] \le 0$  for all $\sigma^2 > 0$.    (3.26)



With this end, let $n_0$ be the smallest integer $\ge m\ (\ge 2p)$ such that $p(p+2)^{-1}cn_0^2/p \ge \sigma^2$. Then, write

    lhs of (3.26) $= \sum_{n=m}^{n_0-1}\frac1n\,g_p(\Delta_n)\,E_{\sigma^2}\Big[\frac{s_n^2}{\sigma^2}\Big(\frac{s_n^2}{\sigma^2}-1\Big)I_{[N=n]}\Big]$    (3.27)
    $+ \frac{1}{n_0}\,g_p(\Delta_{n_0})\,E_{\sigma^2}\Big[\frac{s_{n_0}^2}{\sigma^2}\Big(\frac{s_{n_0}^2}{\sigma^2}-1\Big)I_{[N\ge n_0]}\Big]$
    $+ \sum_{n=n_0}^\infty\Big\{\frac{1}{n+1}\,g_p(\Delta_{n+1})\,E_{\sigma^2}\Big[\frac{s_{n+1}^2}{\sigma^2}\Big(\frac{s_{n+1}^2}{\sigma^2}-1\Big)I_{[N\ge n+1]}\Big] - \frac1n\,g_p(\Delta_n)\,E_{\sigma^2}\Big[\frac{s_n^2}{\sigma^2}\Big(\frac{s_n^2}{\sigma^2}-1\Big)I_{[N\ge n+1]}\Big]\Big\}$,

where the first term in the rhs of (3.27) should be interpreted as zero if $n_0 = m$. Note that for $n \ge n_0$, on the set $[N \ge n+1]$,

    $s_n^2 > p(p+2)^{-1}cn^2/p \ge p(p+2)^{-1}cn_0^2/p \ge \sigma^2$.

Also, by (3.21) and (3.22), $g_p(\Delta_n)$ is nonincreasing in $n$. Hence,

    third term in the rhs of (3.27)    (3.28)
    $\le \sum_{n=n_0}^\infty g_p(\Delta_{n+1})\,E_{\sigma^2}\Big[\Big\{\frac{1}{n+1}\frac{s_{n+1}^2}{\sigma^2}\Big(\frac{s_{n+1}^2}{\sigma^2}-1\Big) - \frac1n\frac{s_n^2}{\sigma^2}\Big(\frac{s_n^2}{\sigma^2}-1\Big)\Big\}I_{[N\ge n+1]}\Big]$.

Note that $I_{[N\ge n+1]}$ is a $\mathcal{B}_n$-measurable function. Then, using the representation $s_{n+1}^2 = \frac{1}{n-p+1}\big\{(n-p)s_n^2 + \frac{p}{p+2}X_{n-p+1}\big\}$ together with $E[X_{n-p+1}] = \sigma^2$ and $E[X_{n-p+1}^2] = 3\sigma^4$, one gets, with probability 1,

    $E_{\sigma^2}\Big[\Big\{\frac{1}{n+1}\frac{s_{n+1}^2}{\sigma^2}\Big(\frac{s_{n+1}^2}{\sigma^2}-1\Big) - \frac1n\frac{s_n^2}{\sigma^2}\Big(\frac{s_n^2}{\sigma^2}-1\Big)\Big\}I_{[N\ge n+1]}\ \Big|\ \mathcal{B}_n\Big]$    (3.29)
    $= I_{[N\ge n+1]}\Big[-\Big(\frac1n - \frac{1}{n+1}\Big)\frac{s_n^2}{\sigma^2}\Big(\frac{s_n^2}{\sigma^2}-1\Big)$
    $\qquad + \frac{1}{n+1}\Big\{\frac{(n-p)^2s_n^4/\sigma^4 + 2(n-p)\frac{p}{p+2}\,s_n^2/\sigma^2 + 3\big(\frac{p}{p+2}\big)^2}{(n+1-p)^2} - \frac{(n-p)s_n^2/\sigma^2 + \frac{p}{p+2}}{n+1-p} - \frac{s_n^2}{\sigma^2}\Big(\frac{s_n^2}{\sigma^2}-1\Big)\Big\}\Big]$.

Note that the multiple of $I_{[N\ge n+1]}$ in the extreme rhs of (3.29) is a concave function of $s_n^2$, where the maximum occurs at

    $s_*^2 = \dfrac{(n+1)(n+1-p)^2 - n(n-p)\{(n+1-p) - \frac{2p}{p+2}\}}{2\{(n+1)(n+1-p)^2 - n(n-p)^2\}}\,\sigma^2 < 2\sigma^2$.    (3.30)

Here, recalling that on the set $[N \ge n+1]$, $s_n^2 > \sigma^2$, the first term within the brackets in (3.29) is $\le 0$, while the second term, again a concave function of $s_n^2$ whose maximum occurs below $\sigma^2$, is bounded by its value at $s_n^2 = \sigma^2$. It follows that

    2nd term of the extreme rhs of (3.29)    (3.31)
    $\le \dfrac{I_{[N\ge n+1]}}{(n+1)(n+1-p)^2}\Big[\frac{2}{p+2}\{1 - (n-p)\} + 3\Big(\frac{p}{p+2}\Big)^2 - 1\Big]$
    $\le 0$,

since $n \ge 2p$.

Hence, from (3.28), (3.29) and (3.31),

    third term in the rhs of (3.27) $\le 0$.



Next, note that $I_{[N\ge m]} = 1$ with probability 1, so that

    $E_{\sigma^2}\Big[\frac{s_m^2}{\sigma^2}\Big(\frac{s_m^2}{\sigma^2}-1\Big)I_{[N\ge m]}\Big] = E_{\sigma^2}\Big[\frac{s_m^2}{\sigma^2}\Big(\frac{s_m^2}{\sigma^2}-1\Big)\Big]$,    (3.32)

and, recalling that $s_m^2 = \frac{p}{p+2}\mathrm{MSE}_m$ with $(m-p)\mathrm{MSE}_m/\sigma^2 \sim \chi^2_{m-p}$,

    $E_{\sigma^2}\Big[\frac{s_m^2}{\sigma^2}\Big(\frac{s_m^2}{\sigma^2}-1\Big)\Big] = \Big(\frac{p}{p+2}\Big)^2\frac{m-p+2}{m-p} - \frac{p}{p+2}$    (3.33)
    $= -\frac{2p}{(p+2)^2(m-p)}\,(m - 2p)$
    $\le 0$,

since $m \ge 2p$.

Thus, if $n_0 = m$, then rhs of (3.27) $\le 0$.
o


If $n_0 > m$, first note that for $n \le n_0 - 1$, on the set $[N = n]$,

    $s_n^2 \le p(p+2)^{-1}cn^2/p \le p(p+2)^{-1}c(n_0-1)^2/p < \sigma^2$,

so that, using the fact that $g_p(\Delta_n)$ is nonincreasing in $n$ and each summand below has nonpositive expectation,

    first term in the rhs of (3.27) $\le g_p(\Delta_{n_0})\sum_{n=m}^{n_0-1}\frac1n\,E_{\sigma^2}\Big[\frac{s_n^2}{\sigma^2}\Big(\frac{s_n^2}{\sigma^2}-1\Big)I_{[N=n]}\Big]$.    (3.34)

In view of (3.26), (3.27), (3.32) and (3.34), for proving the theorem it suffices to show that

    $\sum_{n=m}^{n_0-1}\frac1n\,E_{\sigma^2}\Big[\frac{s_n^2}{\sigma^2}\Big(\frac{s_n^2}{\sigma^2}-1\Big)I_{[N=n]}\Big] + \frac{1}{n_0}\,E_{\sigma^2}\Big[\frac{s_{n_0}^2}{\sigma^2}\Big(\frac{s_{n_0}^2}{\sigma^2}-1\Big)I_{[N\ge n_0]}\Big] \le 0$.    (3.35)


To prove (3.35), first writing $s_{n_0}^2 = \frac{1}{n_0-p}\big\{(n_0-1-p)s_{n_0-1}^2 + \frac{p}{p+2}X_{n_0-p}\big\}$, one gets

    $\frac{1}{n_0}\,E_{\sigma^2}\Big[\frac{s_{n_0}^2}{\sigma^2}\Big(\frac{s_{n_0}^2}{\sigma^2}-1\Big)I_{[N\ge n_0]}\Big]$    (3.36)
    $= E_{\sigma^2}\Big[E_{\sigma^2}\Big\{\frac{1}{n_0}\frac{s_{n_0}^2}{\sigma^2}\Big(\frac{s_{n_0}^2}{\sigma^2}-1\Big)I_{[N\ge n_0]}\ \Big|\ \mathcal{B}_{n_0-1}\Big\}\Big]$
    $= E_{\sigma^2}\Big[I_{[N\ge n_0]}\Big\{\frac{(n_0-1-p)^2s_{n_0-1}^4/\sigma^4 + 2(n_0-1-p)\frac{p}{p+2}s_{n_0-1}^2/\sigma^2 + 3\big(\frac{p}{p+2}\big)^2}{n_0(n_0-p)^2} - \frac{(n_0-1-p)s_{n_0-1}^2/\sigma^2 + \frac{p}{p+2}}{n_0(n_0-p)}\Big\}\Big]$
    $= E_{\sigma^2}\Big[\Big\{b_{n_0}\frac{s_{n_0-1}^4}{\sigma^4} - c_{n_0}\frac{s_{n_0-1}^2}{\sigma^2} - d_{n_0}\Big\}I_{[N\ge n_0]}\Big]$,

where

    $b_{n_0} = \dfrac{(n_0-1-p)^2}{n_0(n_0-p)^2}$,  $c_{n_0} = \dfrac{(n_0-1-p)\{(n_0-p) - \frac{2p}{p+2}\}}{n_0(n_0-p)^2}$, and  $d_{n_0} = \dfrac{p}{p+2}\cdot\dfrac{(n_0-p) - \frac{3p}{p+2}}{n_0(n_0-p)^2}$.

Note that

    $c_{n_0} - b_{n_0} = -\dfrac{n_0-1-p}{n_0(n_0-p)^2}\cdot\dfrac{p-2}{p+2} < 0$    (3.37)

and that

    $d_{n_0} + (c_{n_0} - b_{n_0}) = \dfrac{1}{(p+2)n_0(n_0-p)^2}\Big\{2n_0 - (p+2) - \dfrac{3p^2}{p+2}\Big\} \ge 0$,    (3.38)

since $n_0 \ge 2p$. Also, let

    $f_{n_0} = (n_0-1)(2b_{n_0} - c_{n_0}) = \dfrac{(n_0-1)(n_0-1-p)\{(n_0-p) - \frac{4}{p+2}\}}{n_0(n_0-p)^2}$,    (3.39)

so that $f_{n_0} \in (0,1)$. Now, rewrite

    extreme right of (3.36)    (3.40)
    $= E_{\sigma^2}\Big[\Big\{(2b_{n_0}-c_{n_0})\frac{s_{n_0-1}^2}{\sigma^2}\Big(\frac{s_{n_0-1}^2}{\sigma^2}-1\Big) + (c_{n_0}-b_{n_0})\Big(\frac{s_{n_0-1}^2}{\sigma^2}-1\Big)^2 - \{d_{n_0} + (c_{n_0}-b_{n_0})\}\Big\}I_{[N\ge n_0]}\Big]$
    $\le \frac{f_{n_0}}{n_0-1}\,E_{\sigma^2}\Big[\frac{s_{n_0-1}^2}{\sigma^2}\Big(\frac{s_{n_0-1}^2}{\sigma^2}-1\Big)I_{[N\ge n_0]}\Big]$,

by (3.37) and (3.38). If $E_{\sigma^2}[\frac{s_{n_0-1}^2}{\sigma^2}(\frac{s_{n_0-1}^2}{\sigma^2}-1)I_{[N\ge n_0]}] \le 0$, then, noting again that $s_n^2 < \sigma^2$ on the set $[N=n]$ for all $n \le n_0-1$, one gets (3.35) from (3.34), (3.36) and (3.40). Otherwise, using $f_{n_0} < 1$, one gets from (3.40) that

    lhs of (3.35) $\le \sum_{n=m}^{n_0-2}\frac1n\,E_{\sigma^2}\Big[\frac{s_n^2}{\sigma^2}\Big(\frac{s_n^2}{\sigma^2}-1\Big)I_{[N=n]}\Big]$    (3.41)
    $\qquad + \frac{1}{n_0-1}\,E_{\sigma^2}\Big[\frac{s_{n_0-1}^2}{\sigma^2}\Big(\frac{s_{n_0-1}^2}{\sigma^2}-1\Big)I_{[N=n_0-1]}\Big] + \frac{f_{n_0}}{n_0-1}\,E_{\sigma^2}\Big[\frac{s_{n_0-1}^2}{\sigma^2}\Big(\frac{s_{n_0-1}^2}{\sigma^2}-1\Big)I_{[N\ge n_0]}\Big]$
    $\le \sum_{n=m}^{n_0-2}\frac1n\,E_{\sigma^2}\Big[\frac{s_n^2}{\sigma^2}\Big(\frac{s_n^2}{\sigma^2}-1\Big)I_{[N=n]}\Big] + \frac{f_{n_0}}{n_0-1}\,E_{\sigma^2}\Big[\frac{s_{n_0-1}^2}{\sigma^2}\Big(\frac{s_{n_0-1}^2}{\sigma^2}-1\Big)I_{[N\ge n_0-1]}\Big]$.

Proceed inductively to get either lhs of (3.35) $\le 0$ at some stage, or finally end with

    lhs of (3.35) $\le \dfrac{f_{m+1}f_{m+2}\cdots f_{n_0}}{m}\,E_{\sigma^2}\Big[\frac{s_m^2}{\sigma^2}\Big(\frac{s_m^2}{\sigma^2}-1\Big)I_{[N\ge m]}\Big] \le 0$,    (3.42)

as shown earlier in (3.32)-(3.33).











Remark. Initially we defined $s_n^2 = \frac{n-p}{(n-p)+a_n}\,\mathrm{MSE}_n$, leaving $a_n$ unspecified; the dominance was then proven with conditions on $a_n$ established along the way. The particular choice of $a_n = \frac{2}{p}(n-p)$ was most appealing, since then

    $s_n^2 = \dfrac{n-p}{(n-p) + \frac{2}{p}(n-p)}\,\mathrm{MSE}_n = \dfrac{p}{p+2}\,\mathrm{MSE}_n$,  and hence  $E_{\sigma^2}[s_n^2] = \dfrac{p}{p+2}\,\sigma^2$.

Therefore, the bias would be negligible for large $p$ (as in Chapter 2). As a consequence of our choice we needed $m \ge 2p$, and thus we require such a large initial sample size.
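To make the procedure concrete, here is a small simulation sketch (Python; purely illustrative and not part of the dissertation). It assumes the stopping rule in the MSE form used later in the proof of Lemma 3.3, $N = \inf\{n \ge m : n \ge (p\,\mathrm{MSE}_n/c)^{1/2}\}$, the James-Stein form $\delta_N^b = \hat\beta_N - b\,s_N^2\{(\hat\beta_N-\lambda)'(Z_N'Z_N)(\hat\beta_N-\lambda)\}^{-1}(\hat\beta_N-\lambda)$ of Section 3.2, and the weighted loss $N^{-1}(\delta-\beta)'(Z_N'Z_N)(\delta-\beta) + cN$ implicit in (3.43); a randomly generated design stands in for a fixed one with $n^{-1}Z_n'Z_n \to K$, and $\beta = \lambda$ so that the shrinkage gain is maximal.

    # Simulation sketch of the Chapter 3 sequential shrinkage procedure.
    # Illustration only; parameter values are arbitrary.
    import numpy as np

    rng = np.random.default_rng(3)
    p, sigma, c = 5, 1.0, 0.0005
    m = 2 * p                                   # initial sample size, m >= 2p
    beta = np.zeros(p)                          # beta = lambda here
    lam = np.zeros(p)
    b = (p ** 2 - 4) / p                        # asymptotically optimal b
    n_star = np.sqrt(p * sigma ** 2 / c)

    def one_run():
        Z = rng.normal(size=(m, p))
        Y = Z @ beta + sigma * rng.normal(size=m)
        n = m
        while True:
            G = Z.T @ Z
            bhat = np.linalg.solve(G, Z.T @ Y)
            mse = np.sum((Y - Z @ bhat) ** 2) / (n - p)
            if n >= np.sqrt(p * mse / c):       # stopping rule
                break
            z = rng.normal(size=(1, p))
            Z = np.vstack([Z, z])
            Y = np.append(Y, z[0] @ beta + sigma * rng.normal())
            n += 1
        s2 = p * mse / (p + 2)                  # s_N^2 = p MSE_N/(p+2)
        q = (bhat - lam) @ G @ (bhat - lam)
        js = bhat - b * s2 / q * (bhat - lam)   # delta_N^b
        loss = lambda d: (d - beta) @ G @ (d - beta) / n + c * n
        return n, loss(bhat), loss(js)

    res = np.array([one_run() for _ in range(1000)])
    print("E[N] ~ %.1f (n* = %.1f)" % (res[:, 0].mean(), n_star))
    print("risk of beta_hat_N: %.4f   risk of delta_N^b: %.4f"
          % (res[:, 1].mean(), res[:, 2].mean()))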


3.3 Asymptotic Risk Expansion for $\hat\beta_N$ and $\delta_N^b$

In this section we first obtain an asymptotic risk expansion for $\hat\beta_N$ up to the second order term. Subsequently, we find an asymptotic risk expansion up to the second order term for $\delta_N^b$.

Observe that

    $R(\beta,\sigma^2,\hat\beta_N) = p\sigma^2E[N^{-1}] + cE[N]$    (3.43)
    $= 2cn^* + cE\Big[\dfrac{(N-n^*)^2}{N}\Big]$
    $= 2c^{1/2}(p\sigma^2)^{1/2} + cE\big[\{N^{-1/2}(N-n^*)\}^2\big]$.

From the definition of $N$, $N \to \infty$ a.s. as $c \to 0$, so that $\mathrm{MSE}_N \to \sigma^2$ a.s. as $c \to 0$, since $\mathrm{MSE}_n$ is the mean of $n-p$ independent and identically distributed random variables. Also, using Anscombe's theorem, $\sqrt N\,(\mathrm{MSE}_N - \sigma^2) \xrightarrow{L} N(0, 2\sigma^4)$, so that, using the delta method, $\sqrt N\,(\mathrm{MSE}_N^{1/2} - \sigma) \xrightarrow{L} N(0, \sigma^2/2)$.
Next use the inequality

    $(p\,\mathrm{MSE}_N/c)^{1/2} \le N \le m + (p\,\mathrm{MSE}_{N-1}/c)^{1/2}$.    (3.44)

Dividing both sides of (3.44) by $n^*$, letting $c \to 0$, and using $\mathrm{MSE}_N^{1/2} \to \sigma$ a.s. as $c \to 0$, it follows that $N/n^* \to 1$ a.s. as $c \to 0$. Thus $\sqrt{n^*-p}\,(\mathrm{MSE}_N^{1/2} - \sigma) \xrightarrow{L} N(0, \sigma^2/2)$, and also $\sqrt{n^*-p}\,(\mathrm{MSE}_{N-1}^{1/2} - \sigma) \xrightarrow{L} N(0, \sigma^2/2)$.

Again from (3.44), one gets the inequality

    $n^{*1/2}\,\dfrac{\mathrm{MSE}_N^{1/2} - \sigma}{\sigma} \le \dfrac{N - n^*}{n^{*1/2}} \le \dfrac{m}{n^{*1/2}} + n^{*1/2}\,\dfrac{\mathrm{MSE}_{N-1}^{1/2} - \sigma}{\sigma}$,    (3.45)

implying that $(N-n^*)/n^{*1/2} \xrightarrow{L} N(0, \frac12)$ as $c \to 0$. Hence $(N-n^*)/N^{1/2} \xrightarrow{L} N(0, \frac12)$, since $N/n^* \to 1$ a.s. as $c \to 0$. Also, the uniform integrability of $(N-n^*)^2/N$ for $c \le c_0$ (specified) can be proved by repeating essentially the arguments of Ghosh and Mukhopadhyay (1980). Thus $E_{\sigma^2}[(N-n^*)^2/N] \to \frac12$ as $c \to 0$, and it follows from (3.43) that, as $c \to 0$,

    $R(\beta,\sigma^2,\hat\beta_N) = 2c^{1/2}(p\sigma^2)^{1/2} + \tfrac12 c + o(c)$.    (3.46)



Next, we find the asymptotic risk expansion for $\delta_N^b$. It follows from (3.11), (3.18)-(3.20) and (3.23)-(3.25) that

    $R(\beta,\sigma^2,\delta_N^b) = R(\beta,\sigma^2,\hat\beta_N)$    (3.47)
    $- 2b(p-2)\,E_{\beta,\sigma^2}\Big[\dfrac{s_N^2}{N}\int_0^\infty(1+2t)^{-p/2}\exp\Big\{-\dfrac{t}{1+2t}\cdot\dfrac{(\beta-\lambda)'(Z_N'Z_N)(\beta-\lambda)}{\sigma^2}\Big\}dt\Big]$
    $+ b^2\,E_{\beta,\sigma^2}\Big[\dfrac{s_N^4}{N\sigma^2}\int_0^\infty(1+2t)^{-p/2}\exp\Big\{-\dfrac{t}{1+2t}\cdot\dfrac{(\beta-\lambda)'(Z_N'Z_N)(\beta-\lambda)}{\sigma^2}\Big\}dt\Big]$.

Now, for technical reasons, we shall handle the risk expansions separately for the cases $\beta \ne \lambda$ and $\beta = \lambda$. However, both cases will require the facts that, as $c \to 0$, $s_N^4 \to p^2\sigma^4/(p+2)^2$ a.s., $s_N^2 \to p\sigma^2/(p+2)$ a.s., and $Nc^{1/2} \to (p\sigma^2)^{1/2}$ a.s. Also, at this time, we shall make the assumption that $n^{-1}(Z_n'Z_n) \to K$ as $n \to \infty$, where $K$ is a positive definite matrix. Hence $N^{-1}(Z_N'Z_N) \to K$ a.s. as $c \to 0$.
Consider first the case $\beta \ne \lambda$. Here, as $c \to 0$,

    $c^{-1}\,\dfrac{s_N^4}{N\sigma^2}\int_0^\infty(1+2t)^{-p/2}\exp\Big\{-\dfrac{t}{1+2t}\cdot\dfrac{(\beta-\lambda)'(Z_N'Z_N)(\beta-\lambda)}{\sigma^2}\Big\}dt$    (3.48)
    $= \dfrac{s_N^4}{\sigma^2N^2c}\int_0^\infty\Big(1+\dfrac{2u}{N}\Big)^{-p/2}\exp\Big\{-\dfrac{u}{1+2u/N}\cdot\dfrac{(\beta-\lambda)'\{N^{-1}Z_N'Z_N\}(\beta-\lambda)}{\sigma^2}\Big\}du$  (substituting $t = u/N$)
    $\xrightarrow{\text{a.s.}}\ \dfrac{p^2\sigma^4}{(p+2)^2}\cdot\dfrac{1}{\sigma^2}\cdot\dfrac{1}{p\sigma^2}\int_0^\infty\exp\Big\{-u\,\dfrac{(\beta-\lambda)'K(\beta-\lambda)}{\sigma^2}\Big\}du$
    $= \dfrac{p\sigma^2}{(p+2)^2(\beta-\lambda)'K(\beta-\lambda)}$,

using $s_N^4 \to p^2\sigma^4/(p+2)^2$, $N^2c \to p\sigma^2$ and $N^{-1}(Z_N'Z_N) \to K$ a.s., together with the dominated convergence theorem.

Next we prove the uniform integrability (in $c \le c_0$) of the left hand side of (3.48) when $\beta \ne \lambda$. First note that, since $t/(1+2t) \ge t/3$ for $0 \le t \le 1$ and $t/(1+2t) \ge 1/3$ for $t \ge 1$,

    lhs of (3.48)    (3.49)
    $\le \dfrac{s_N^4}{N\sigma^2c}\Big[\int_0^1\exp\Big\{-\dfrac{Nt}{3\sigma^2}(\beta-\lambda)'\{N^{-1}Z_N'Z_N\}(\beta-\lambda)\Big\}dt + \exp\Big\{-\dfrac{N}{3\sigma^2}(\beta-\lambda)'\{N^{-1}Z_N'Z_N\}(\beta-\lambda)\Big\}\int_1^\infty(1+2t)^{-p/2}dt\Big]$
    $\le \dfrac{s_N^4}{N^2c}\cdot\dfrac{K_2}{(\beta-\lambda)'\{N^{-1}Z_N'Z_N\}(\beta-\lambda)}$,

where $K_2$ is a constant not depending on $c$ (for the second inequality we use $\int_0^1e^{-\alpha t}dt \le \alpha^{-1}$ and $e^{-\alpha} \le \alpha^{-1}$ for $\alpha > 0$). Thus it suffices to prove the uniform integrability of $\frac{s_N^4}{N^2c}\{(\beta-\lambda)'(N^{-1}Z_N'Z_N)(\beta-\lambda)\}^{-1}$.

First use the inequality: with $U \equiv \frac{s_N^4}{N^2c}\{(\beta-\lambda)'(N^{-1}Z_N'Z_N)(\beta-\lambda)\}^{-1}$ and some $\delta \in (0, \frac12)$,

    $E_{\sigma^2}[U\,I_{[U>d]}] \le d^{-\delta}E_{\sigma^2}[U^{1+\delta}]$  for every $d > 0$.    (3.50)

Taking $\delta' \in (\delta, \frac12)$ and using Hölder's inequality, one gets

    $E_{\sigma^2}[U^{1+\delta}] \le \big\{E_{\sigma^2}[(N^2c)^{-(1+\delta')}]\big\}^{\frac{1+\delta}{1+\delta'}}\big\{E_{\sigma^2}\big[\big(s_N^4\{(\beta-\lambda)'(N^{-1}Z_N'Z_N)(\beta-\lambda)\}^{-1}\big)^{v}\big]\big\}^{\frac{\delta'-\delta}{1+\delta'}}$,    (3.51)

where $v = (1+\delta)(1+\delta')/(\delta'-\delta)$. Using Hölder's inequality once more, one gets

    $E_{\sigma^2}\big[\big(s_N^4\{(\beta-\lambda)'(N^{-1}Z_N'Z_N)(\beta-\lambda)\}^{-1}\big)^{v}\big] \le \big\{E_{\sigma^2}[s_N^{8v}]\big\}^{1/2}\big\{E_{\sigma^2}\big[\{(\beta-\lambda)'(N^{-1}Z_N'Z_N)(\beta-\lambda)\}^{-2v}\big]\big\}^{1/2}$.    (3.52)


Next observe that, for any arbitrary $\varepsilon$ in $(0,1)$,

    $E_{\sigma^2}[(N^2c)^{-(1+\delta')}]$    (3.53)
    $= E_{\sigma^2}[(N^2c)^{-(1+\delta')}I_{[N\le\varepsilon n^*]}] + E_{\sigma^2}[(N^2c)^{-(1+\delta')}I_{[N>\varepsilon n^*]}]$
    $\le (m^2c)^{-(1+\delta')}P_{\sigma^2}(N \le \varepsilon n^*) + (\varepsilon^2p\sigma^2)^{-(1+\delta')}$,

since $N \ge m$ and $\varepsilon^2n^{*2}c = \varepsilon^2p\sigma^2$.

To proceed further we need the following lemma.

Lemma 3.3. For any $0 < \varepsilon < 1$, $p \ge 1$ and $m \ge p+1$, $P(N \le \varepsilon n^*) = O(c^{(1/2)(m-p)})$.

Proof: Note that $P_{\sigma^2}(N \le \varepsilon n^*) = \sum_{n=m}^{[\varepsilon n^*]}P_{\sigma^2}(N = n)$, where $[u]$ is the largest integer $\le u$. Now

    $P_{\sigma^2}(N = m) = P(\mathrm{MSE}_m \le cm^2/p)$    (3.54)
    $= P\Big(\chi^2_{m-p} \le \dfrac{m-p}{p}\cdot\dfrac{cm^2}{\sigma^2}\Big)$
    $= O(c^{(1/2)(m-p)})$,

since $P(\chi^2_k \le d) = O(d^{k/2})$ as $d \to 0$. Therefore, all we need show is that

    $\sum_{n=m+1}^{[\varepsilon n^*]}P_{\sigma^2}(N = n) = O(c^{(1/2)(m-p)})$.    (3.55)

Now, for $n \ge m+1$,

    $P_{\sigma^2}(N = n) \le P\Big(\mathrm{MSE}_n \le \dfrac{cn^2}{p}\Big) = P\Big(\chi^2_{n-p} \le \dfrac{n-p}{p}\cdot\dfrac{cn^2}{\sigma^2}\Big)$    (3.56)
    $\le \inf_{h>0}\exp\Big[h\,\dfrac{n-p}{p}\cdot\dfrac{cn^2}{\sigma^2}\Big]E[\exp(-h\chi^2_{n-p})]$
    $= \inf_{h>0}\exp\Big[h\,\dfrac{n-p}{p}\cdot\dfrac{cn^2}{\sigma^2}\Big](1+2h)^{-(n-p)/2}$
    $= \Big[\dfrac{cn^2}{p\sigma^2}\exp\Big(1 - \dfrac{cn^2}{p\sigma^2}\Big)\Big]^{(n-p)/2}$.

Note that for $n \le [\varepsilon n^*]$, $cn^2/(p\sigma^2) \le \varepsilon^2 < 1$. Also, $x\exp(1-x)$ is increasing in $x$ for $0 < x \le 1$. Therefore

    $\sum_{n=m+1}^{[\varepsilon n^*]}P_{\sigma^2}(N = n) \le \sum_{n=m+1}^{[\varepsilon n^*]}\Big[\dfrac{cn^2}{p\sigma^2}\exp\Big(1 - \dfrac{cn^2}{p\sigma^2}\Big)\Big]^{(m-p)/2}\big[\varepsilon^2\exp(1-\varepsilon^2)\big]^{(n-m)/2}$    (3.57)
    $\le c^{(m-p)/2}\Big(\dfrac{e}{p\sigma^2}\Big)^{(m-p)/2}\sum_{n=m+1}^{\infty}n^{m-p}\big[\varepsilon^2\exp(1-\varepsilon^2)\big]^{(n-m)/2}$
    $= O(c^{(1/2)(m-p)})$,

since $\sum_{n=m+1}^{\infty}n^{m-p}[\varepsilon^2e^{1-\varepsilon^2}]^{(n-m)/2}$ is a convergent series.

Hence, using Lemma 3.3 for $m \ge p+3$, and $c \le c_0 < 1$,

    lhs of (3.53) $\le K(m^2c)^{-(1+\delta')}c^{(1/2)(m-p)} + (\varepsilon^2p\sigma^2)^{-(1+\delta')}$,    (3.58)

which is bounded uniformly in $c \le c_0$, since $(m-p)/2 \ge 3/2 > 1+\delta'$; here $K\,(>0)$ is a constant not depending on $c$.
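The Chernoff-type bound used in (3.56) can be checked directly. The sketch below (illustrative, not from the dissertation; it uses SciPy's chi-square distribution) verifies $P(\chi_k^2 \le ka) \le \{a\exp(1-a)\}^{k/2}$ for $a < 1$:

    # Check of the bound P(chi^2_k <= k a) <= (a e^{1-a})^{k/2}, 0 < a < 1.
    import math
    from scipy.stats import chi2

    for k in (3, 10, 30):
        for a in (0.1, 0.5, 0.9):
            exact = chi2.cdf(k * a, df=k)
            bound = (a * math.exp(1.0 - a)) ** (k / 2)
            assert exact <= bound
            print("k=%2d a=%.1f  P=%.3e  bound=%.3e" % (k, a, exact, bound))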
Further, since $\mathrm{MSE}_n$ can be expressed as a mean of independent and identically distributed $\sigma^2\chi_1^2$ random variables, $\mathrm{MSE}_n$ is a one-sample U-statistic with a kernel of degree one and hence $\{\mathrm{MSE}_n,\ n \ge m\}$ is a backward martingale. Consequently, using Doob's maximal inequality for submartingales, it follows that for every $r \ge 1$,

    $E_{\sigma^2}[s_N^{2r}] = \Big(\dfrac{p}{p+2}\Big)^rE_{\sigma^2}[\mathrm{MSE}_N^r]$    (3.59)
    $\le \Big(\dfrac{p}{p+2}\Big)^rE_{\sigma^2}\Big[\sup_{n\ge m}\mathrm{MSE}_n^r\Big]$
    $\le K_1E_{\sigma^2}[\mathrm{MSE}_m^r] < \infty$,

where $K_1$ is a constant which depends only on $r$.

Next, observe that if $a_n \to a\ (>0)$ then for all $\varepsilon > 0$ there exists an $m_0$ such that for all $n \ge m_0$, $|a_n - a| < \varepsilon$, implying that for all $n \ge m_0$, $a_n^{-1} \le (a-\varepsilon)^{-1}$. Let $u = 2(1+\delta)(1+\delta')/(\delta'-\delta)$, $a_n = (\beta-\lambda)'\{n^{-1}Z_n'Z_n\}(\beta-\lambda)$, $a = (\beta-\lambda)'K(\beta-\lambda)$ and $\varepsilon \in (0, a)$. Then

    $E_{\sigma^2}\big[\{(\beta-\lambda)'(N^{-1}Z_N'Z_N)(\beta-\lambda)\}^{-u}\big]$    (3.60)
    $= E_{\sigma^2}[a_N^{-u}I_{[N<m_0]}] + E_{\sigma^2}[a_N^{-u}I_{[N\ge m_0]}]$
    $\le \big(\max_{m\le n<m_0}a_n^{-1}\big)^u + (a-\varepsilon)^{-u}$,

a bound independent of $c$. Combining (3.50)-(3.60), the uniform integrability in $c \le c_0$ of $\frac{s_N^4}{N^2c}\{(\beta-\lambda)'(N^{-1}Z_N'Z_N)(\beta-\lambda)\}^{-1}$ follows.
o Nc











Thus, for $m \ge p+3$ and $\beta \ne \lambda$, as $c \to 0$,

    $c^{-1}E_{\beta,\sigma^2}\Big[\dfrac{s_N^4}{N\sigma^2}\int_0^\infty(1+2t)^{-p/2}\exp\Big\{-\dfrac{t}{1+2t}\cdot\dfrac{(\beta-\lambda)'(Z_N'Z_N)(\beta-\lambda)}{\sigma^2}\Big\}dt\Big] \to \dfrac{p\sigma^2}{(p+2)^2(\beta-\lambda)'K(\beta-\lambda)}$.    (3.61)

Similarly, it can be shown that, as $c \to 0$,

    $c^{-1}E_{\beta,\sigma^2}\Big[\dfrac{s_N^2}{N}\int_0^\infty(1+2t)^{-p/2}\exp\Big\{-\dfrac{t}{1+2t}\cdot\dfrac{(\beta-\lambda)'(Z_N'Z_N)(\beta-\lambda)}{\sigma^2}\Big\}dt\Big] \to \dfrac{\sigma^2}{(p+2)(\beta-\lambda)'K(\beta-\lambda)}$.    (3.62)

Hence, from (3.46), (3.47) and (3.61)-(3.62), it follows that for $m \ge p+3$ and $\beta \ne \lambda$, as $c \to 0$,

    $R(\beta,\sigma^2,\delta_N^b) = 2c^{1/2}(p\sigma^2)^{1/2} + \tfrac12c - \dfrac{cb\sigma^2\{2(p^2-4) - bp\}}{(p+2)^2(\beta-\lambda)'K(\beta-\lambda)} + o(c)$.    (3.63)


Consider now the case $\beta = \lambda$. It follows from (3.47) that

    $R(\beta,\sigma^2,\delta_N^b) = R(\beta,\sigma^2,\hat\beta_N) - 2b(p-2)E_{\sigma^2}\Big[\dfrac{s_N^2}{N}\int_0^\infty(1+2t)^{-p/2}dt\Big] + b^2E_{\sigma^2}\Big[\dfrac{s_N^4}{N\sigma^2}\int_0^\infty(1+2t)^{-p/2}dt\Big]$.    (3.64)

Note that $\int_0^\infty(1+2t)^{-p/2}dt = \frac{1}{p-2}$. Hence, using the uniform integrability of $\frac{s_N^2}{Nc^{1/2}}$ and $\frac{s_N^4}{Nc^{1/2}\sigma^2}$ in $c \le c_0$ for $m \ge p+1$, one gets from (3.46) and (3.64) that, as $c \to 0$,

    $R(\beta,\sigma^2,\delta_N^b) = 2c^{1/2}(p\sigma^2)^{1/2} - c^{1/2}(p\sigma^2)^{1/2}\,\dfrac{p}{(p-2)(p+2)^2}\,b\Big\{\dfrac{2(p^2-4)}{p} - b\Big\} + o(c^{1/2})$.    (3.65)


From (3.63) it can be seen that, asymptotically (as $c \to 0$), the percent risk improvement of $\delta_N^b$ over $\hat\beta_N$ for $\beta \ne \lambda$ is

    $100\,\Big\{\dfrac{c^{1/2}b\sigma^2\{2(p^2-4) - bp\}}{2(p\sigma^2)^{1/2}(p+2)^2(\beta-\lambda)'K(\beta-\lambda)}\Big\} + o(c^{1/2})$,    (3.66)

while from (3.65), asymptotically, the percent risk improvement of $\delta_N^b$ over $\hat\beta_N$ for $\beta = \lambda$ is

    $100\,\Big\{\dfrac12\cdot\dfrac{p}{(p-2)(p+2)^2}\,b\Big(\dfrac{2(p^2-4)}{p} - b\Big)\Big\} + o(1)$.    (3.67)

Here the dominant term in both (3.66) and (3.67) is maximized when $b = (p^2-4)/p$. Therefore, from an asymptotic point of view, as is evident in (3.66) and (3.67), it appears that for small $c$, $(p^2-4)/p$ is an optimal choice for $b$ (as in Chapter 2).
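As a quick arithmetic check (ours, not the dissertation's), at $b = (p^2-4)/p$ the dominant term of (3.67) collapses to $100(p-2)/(2p)$ percent:

    # The dominant term of (3.67) at b = (p^2-4)/p equals 100 (p-2)/(2p) %.
    for p in (3, 5, 10, 50):
        b = (p ** 2 - 4) / p
        term = 0.5 * (p / ((p - 2) * (p + 2) ** 2)) * b * (2 * (p ** 2 - 4) / p - b)
        print("p=%2d: %6.2f%%  vs  100(p-2)/(2p) = %6.2f%%"
              % (p, 100 * term, 100 * (p - 2) / (2 * p)))

So the achievable improvement at $\beta = \lambda$ increases with $p$ and approaches 50 percent as $p \to \infty$.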

















CHAPTER FOUR
SEQUENTIAL SHRINKAGE ESTIMATION OF THE DIFFERENCE
OF TWO MULTIVARIATE NORMAL MEANS


4.1 Introduction


Let $X_1, X_2, \ldots$ and $Y_1, Y_2, \ldots$ be two independent sequences of random vectors with the $X_i$'s independent and identically distributed $N(\theta_1, \sigma_1^2V_1)$ and the $Y_j$'s independent and identically distributed $N(\theta_2, \sigma_2^2V_2)$. Here the $\theta_i$'s $\in \mathbb{R}^p$ are unknown, while the $V_i$'s are known positive definite matrices, $i = 1, 2$. The problem is estimation of $\mu = \theta_1 - \theta_2$. Based on $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$, if $\mu$ is estimated by $\delta_{n,m} = \delta_{n,m}(X_1, \ldots, X_n;\ Y_1, \ldots, Y_m)$, let the incurred loss be given by

    $L(\theta_1, \theta_2, \delta_{n,m}) = (\delta_{n,m} - \mu)'Q(\delta_{n,m} - \mu) + c(n+m)$,    (4.1)

where $Q$ denotes a known positive definite matrix and $c\,(>0)$ denotes the known cost per sampling unit. The most natural estimator of $\mu$ is $W_{n,m} = \bar X_n - \bar Y_m$, with risk

    $R(\theta_1, \theta_2, \sigma_1^2, \sigma_2^2, W_{n,m})$    (4.2)
    $= E[(W_{n,m} - \mu)'Q(W_{n,m} - \mu)] + c(n+m)$
    $= E[(\bar X_n - \theta_1)'Q(\bar X_n - \theta_1)] + E[(\bar Y_m - \theta_2)'Q(\bar Y_m - \theta_2)] - 2E[(\bar X_n - \theta_1)'Q(\bar Y_m - \theta_2)] + c(n+m)$
    $= \dfrac{\sigma_1^2}{n}\mathrm{tr}(QV_1) + \dfrac{\sigma_2^2}{m}\mathrm{tr}(QV_2) + c(n+m)$,

the cross-product term vanishing by the independence of the two sequences.

When $\sigma_1^2$ and $\sigma_2^2$ are known, the above risk is minimized at $n = n^* \equiv (\sigma_1^2\mathrm{tr}(QV_1)/c)^{1/2}$ and $m = m^* \equiv (\sigma_2^2\mathrm{tr}(QV_2)/c)^{1/2}$. Also, for this pair $(n^*, m^*)$ we have $n^*/m^* = [\sigma_1^2\mathrm{tr}(QV_1)/(\sigma_2^2\mathrm{tr}(QV_2))]^{1/2}$. However, when $\sigma_1^2$ and $\sigma_2^2$ are unknown, no fixed sample size rule will minimize (4.2) simultaneously for all $\sigma_1^2$ and $\sigma_2^2$. In this case, motivated by the optimal fixed sample sizes $n^*$ and $m^*$, and additionally by the ratio $n^*/m^*$, the following sequential procedure is proposed to determine the sample sizes.


$T = N + M$, where

    $N = \inf\{n \ge n_0 : n \ge (\hat s_n^2\,\mathrm{tr}(QV_1)/c)^{1/2}\}$,    (4.3)
    $M = \inf\{m \ge n_0 : m \ge (\tilde s_m^2\,\mathrm{tr}(QV_2)/c)^{1/2}\}$,

$n_0 \ge 2$, and observe $X_{n+1}$ or $Y_{m+1}$, if neither process has stopped, when

    $n/m \le [\hat s_n^2\,\mathrm{tr}(QV_1)]^{1/2}/[\tilde s_m^2\,\mathrm{tr}(QV_2)]^{1/2}$    (4.4)
    or
    $n/m > [\hat s_n^2\,\mathrm{tr}(QV_1)]^{1/2}/[\tilde s_m^2\,\mathrm{tr}(QV_2)]^{1/2}$,

respectively. Here,

    $\hat s_n^2 = [(n-1)p]^{-1}\sum_{i=1}^n(X_i - \bar X_n)'V_1^{-1}(X_i - \bar X_n)$,    (4.5)
    $\tilde s_m^2 = [(m-1)p]^{-1}\sum_{j=1}^m(Y_j - \bar Y_m)'V_2^{-1}(Y_j - \bar Y_m)$,

for every $n, m \ge 2$. The above stopping rule is a multivariate analogue of the stopping rule proposed in Ghosh and Mukhopadhyay (1980). Similar stopping rules were considered by Chou and Hwang (1984), who estimated $\mu$ by $\bar X_N - \bar Y_M$.
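The interplay of (4.3) and (4.4) may be easier to see in code. The sketch below (illustrative only, not from the dissertation; it takes $V_1 = V_2 = Q = I_p$, so $\mathrm{tr}(QV_i) = p$) alternates sampling between the two sequences according to (4.4) until both stopping conditions are met:

    # Sketch of the two-sample stopping rule (4.3) with allocation rule (4.4),
    # taking V1 = V2 = Q = I_p for simplicity.  Illustration only.
    import numpy as np

    rng = np.random.default_rng(4)
    p, c, n0 = 4, 0.001, 2
    sig1, sig2 = 1.0, 2.0                     # unknown to the procedure
    th1, th2 = np.zeros(p), np.ones(p)
    trQV = float(p)                           # tr(Q V_i) = p here

    def s2hat(A):                             # (4.5) with V_i = I_p
        xbar = A.mean(axis=0)
        return np.sum((A - xbar) ** 2) / ((len(A) - 1) * p)

    X = th1 + sig1 * rng.normal(size=(n0, p))
    Y = th2 + sig2 * rng.normal(size=(n0, p))
    doneN = doneM = False
    while True:
        n, m = len(X), len(Y)
        doneN = doneN or n >= np.sqrt(s2hat(X) * trQV / c)
        doneM = doneM or m >= np.sqrt(s2hat(Y) * trQV / c)
        if doneN and doneM:
            break
        take_x = n / m <= np.sqrt(s2hat(X) / s2hat(Y))    # rule (4.4)
        if (take_x and not doneN) or doneM:
            X = np.vstack([X, th1 + sig1 * rng.normal(size=(1, p))])
        else:
            Y = np.vstack([Y, th2 + sig2 * rng.normal(size=(1, p))])

    print("N=%d (n*=%.0f), M=%d (m*=%.0f)"
          % (len(X), np.sqrt(sig1 ** 2 * p / c),
             len(Y), np.sqrt(sig2 ** 2 * p / c)))
    print("W_NM =", X.mean(axis=0) - Y.mean(axis=0))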

A point of interest concerning the above stopping rule is that if $N = n$ is the first occurrence of $n \ge (\hat s_n^2\,\mathrm{tr}(QV_1)/c)^{1/2}$ while $m < (\tilde s_m^2\,\mathrm{tr}(QV_2)/c)^{1/2}$, then necessarily we must observe $Y_{m+1}$, since

    $n/m > [\hat s_n^2\,\mathrm{tr}(QV_1)]^{1/2}/[\tilde s_m^2\,\mathrm{tr}(QV_2)]^{1/2}$.
To see how the above sequential sampling procedure can be motivated from a minimax criterion, let $\theta_i$ have a $N(\nu_i, \tau_i^2B_i)$ prior, $i = 1, 2$. Then, following the arguments from Section 1.1, the joint posterior distribution of $(\theta_1', \theta_2')'$, given $X_i = x_i$, $i = 1, \ldots, n$, and $Y_j = y_j$, $j = 1, \ldots, m$, has independent components

    $\theta_1 \sim N\Big(\nu_1 + \Big(I_p + \dfrac{\sigma_1^2}{n\tau_1^2}V_1B_1^{-1}\Big)^{-1}(\bar x_n - \nu_1),\ \dfrac{\sigma_1^2}{n}V_1\Big(I_p + \dfrac{\sigma_1^2}{n\tau_1^2}B_1^{-1}V_1\Big)^{-1}\Big)$,
    $\theta_2 \sim N\Big(\nu_2 + \Big(I_p + \dfrac{\sigma_2^2}{m\tau_2^2}V_2B_2^{-1}\Big)^{-1}(\bar y_m - \nu_2),\ \dfrac{\sigma_2^2}{m}V_2\Big(I_p + \dfrac{\sigma_2^2}{m\tau_2^2}B_2^{-1}V_2\Big)^{-1}\Big)$.

Therefore, the posterior distribution of $\mu = \theta_1 - \theta_2$ becomes

    $\mu \sim N\Big(\nu_1 - \nu_2 + \Big(I_p + \dfrac{\sigma_1^2}{n\tau_1^2}V_1B_1^{-1}\Big)^{-1}(\bar x_n - \nu_1) - \Big(I_p + \dfrac{\sigma_2^2}{m\tau_2^2}V_2B_2^{-1}\Big)^{-1}(\bar y_m - \nu_2),$
    $\qquad\quad \dfrac{\sigma_1^2}{n}V_1\Big(I_p + \dfrac{\sigma_1^2}{n\tau_1^2}B_1^{-1}V_1\Big)^{-1} + \dfrac{\sigma_2^2}{m}V_2\Big(I_p + \dfrac{\sigma_2^2}{m\tau_2^2}B_2^{-1}V_2\Big)^{-1}\Big)$.

Hence, the Bayes estimator of $\mu$ becomes

    $\delta_{n,m}^B = \nu_1 - \nu_2 + \Big(I_p + \dfrac{\sigma_1^2}{n\tau_1^2}V_1B_1^{-1}\Big)^{-1}(\bar X_n - \nu_1) - \Big(I_p + \dfrac{\sigma_2^2}{m\tau_2^2}V_2B_2^{-1}\Big)^{-1}(\bar Y_m - \nu_2)$,

with posterior risk

    $\mathrm{tr}\Big[Q\Big\{\dfrac{\sigma_1^2}{n}V_1\Big(I_p + \dfrac{\sigma_1^2}{n\tau_1^2}B_1^{-1}V_1\Big)^{-1} + \dfrac{\sigma_2^2}{m}V_2\Big(I_p + \dfrac{\sigma_2^2}{m\tau_2^2}B_2^{-1}V_2\Big)^{-1}\Big\}\Big] + c(n+m)$.

The posterior risk being independent of the data (and of any unknown parameters) implies that the sequential Bayes rule is a fixed sample size rule, with sample sizes determined by minimizing

    $\mathrm{tr}\Big[Q\Big\{\dfrac{\sigma_1^2}{n}V_1\Big(I_p + \dfrac{\sigma_1^2}{n\tau_1^2}B_1^{-1}V_1\Big)^{-1} + \dfrac{\sigma_2^2}{m}V_2\Big(I_p + \dfrac{\sigma_2^2}{m\tau_2^2}B_2^{-1}V_2\Big)^{-1}\Big\}\Big] + c(n+m)$

with respect to $n$ and $m$.

Next, consider the sequence of $\{N(\nu_i, lB_i),\ l \ge 1\}$ priors on $\theta_i$, $i = 1, 2$. Then the corresponding sequence of Bayes estimators for $\mu$ is

    $\delta_{n,m}^{B,l} = \nu_1 - \nu_2 + \Big(I_p + \dfrac{\sigma_1^2}{nl}V_1B_1^{-1}\Big)^{-1}(\bar X_n - \nu_1) - \Big(I_p + \dfrac{\sigma_2^2}{ml}V_2B_2^{-1}\Big)^{-1}(\bar Y_m - \nu_2)$,

with risk

    $\mathrm{tr}\Big[Q\Big\{\dfrac{\sigma_1^2}{n}V_1\Big(I_p + \dfrac{\sigma_1^2}{nl}B_1^{-1}V_1\Big)^{-1} + \dfrac{\sigma_2^2}{m}V_2\Big(I_p + \dfrac{\sigma_2^2}{ml}B_2^{-1}V_2\Big)^{-1}\Big\}\Big] + c(n+m)$.

The latter converges to $K \equiv \frac{\sigma_1^2}{n}\mathrm{tr}(QV_1) + \frac{\sigma_2^2}{m}\mathrm{tr}(QV_2) + c(n+m)$ as $l \to \infty$, and $R(\theta_1, \theta_2, \sigma_1^2, \sigma_2^2, W_{n,m}) = K$ for all $\theta_1, \theta_2$. Hence, under the loss (4.1), the fixed sample size rule with sample sizes determined by minimizing $\frac{\sigma_1^2}{n}\mathrm{tr}(QV_1) + \frac{\sigma_2^2}{m}\mathrm{tr}(QV_2) + c(n+m)$ with respect to $n$ and $m$, and estimator $W_{n,m}$, is a minimax rule. Consequently, the stopping rule (4.3) can be interpreted as an empirical minimax stopping rule.

In Section 4.2 we show the dominance of a class of estimators over $W_{N,M}$ for the loss (4.1) under the stopping rule (4.3). In Section 4.3 we develop an asymptotic risk expansion, up to the second order term, for $W_{N,M}$ and for estimators belonging to the above class.


4.2 A Class of James-Stein Estimators Dominating $W_{N,M}$


In this section we consider the class of James-Stein estimators $\delta_{N,M}^{b_1,b_2} = \delta^{b_1,b_2}(X_1, \ldots, X_N;\ Y_1, \ldots, Y_M)$, where



