Full Citation 
Material Information 

Title: 
Sequential shrinkage estimation 

Physical Description: 
Book 

Language: 
English 

Creator: 
Nickerson, David MacLeod, 1958 

Copyright Date: 
1985 
Record Information 

Bibliographic ID: 
UF00102789 

Volume ID: 
VID00001 

Source Institution: 
University of Florida 

Holding Location: 
University of Florida 

Rights Management: 
All rights reserved by the source institution and holding location. 

Resource Identifier: 
ltuf: AEH3713; oclc: 14706811 

Full Text 
SEQUENTIAL SHRINKAGE ESTIMATION
BY
DAVID MACLEOD NICKERSON
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN
PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1985
To Mom
and
Dad
ACKNOWLEDGMENTS
I would very much like to thank Dr. Malay Ghosh for being my
dissertation advisor. Specifically, I would like to thank him for
his infinite patience, his incredibly deep understanding of
statistics and his dedication to teaching. I would like to thank Dr.
A.K. Varma and Dr. Dennis Wackerly for serving on my committee.
Also, I am grateful to Dr. Andrew Rosalsky and Dr. Randy Carter for
serving on my Part C and oral defense committees. Further, I would
like to thank Cynthia Zimmerman for her expert typing of this
dissertation.
On a different note, I would like to thank Cindy Hewitt and
Steve Ghivizzani for their friendship and merriment.
Lastly, I would like to thank my parents for just about
everything.
TABLE OF CONTENTS

ACKNOWLEDGMENTS

ABSTRACT

CHAPTERS

ONE     INTRODUCTION
        1.1 Sequential Sampling
        1.2 Shrinkage Estimation
        1.3 Literature Review
        1.4 The Subject of This Research

TWO     SEQUENTIAL SHRINKAGE ESTIMATION OF THE MEAN OF A
        MULTIVARIATE NORMAL DISTRIBUTION
        2.1 Introduction
        2.2 A Class of James-Stein Estimators Dominating the Sample Mean
        2.3 Asymptotic Risk Expansion
        2.4 A Monte Carlo Study

THREE   SEQUENTIAL SHRINKAGE ESTIMATION OF LINEAR REGRESSION
        PARAMETERS
        3.1 Introduction
        3.2 A Class of James-Stein Estimators Dominating the Least Squares Estimator
        3.3 Asymptotic Risk Expansion

FOUR    SEQUENTIAL SHRINKAGE ESTIMATION OF THE DIFFERENCE OF
        TWO MULTIVARIATE NORMAL MEANS
        4.1 Introduction
        4.2 A Class of James-Stein Estimators Dominating the Difference of Sample Means
        4.3 Asymptotic Risk Expansion

FIVE    SEQUENTIAL SHRINKAGE ESTIMATION OF INDEPENDENT NORMAL
        MEANS WITH UNKNOWN VARIANCES
        5.1 Introduction
        5.2 Asymptotic Risk Expansion

SIX     SUMMARY AND FUTURE RESEARCH
        6.1 Summary
        6.2 Future Research

BIBLIOGRAPHY

BIOGRAPHICAL SKETCH
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
SEQUENTIAL SHRINKAGE ESTIMATION
BY
DAVID MACLEOD NICKERSON
August, 1985
Chairman: Malay Ghosh
Major Department: Statistics
This dissertation is concerned with sequential estimation of the
multivariate normal mean, estimation of the regression coefficient in
a normal linear regression model, and estimation of the difference of
mean vectors of two multivariate normal distributions in the presence
of unknown and possibly unequal variancecovariance matrices.
For estimating the p (>= 3) variate normal mean, we consider two 
different situations. In one case, the covariance matrix is known up
to a multiplicative constant; in the other situation, it is entirely
unknown but diagonal. In both cases, the sample mean is the maximum
likelihood estimator of the population mean. When the covariance
matrix is known up to a multiplicative constant, a class of James-
Stein estimators is developed which dominates the sample mean under
sequential sampling schemes of M. Ghosh, B.K. Sinha, and
N. Mukhopadhyay ([1976] Journal of Multivariate Analysis 6,
281-294). Asymptotic risk expansions of the sample mean vector and
James-Stein estimators are provided up to the second order term.
Additionally, in this case, some Monte Carlo simulation is done to
compare the risks of the sample mean vector, the James-Stein
estimators, and a rival class of estimators. In the second case, a
class of James-Stein estimators is given which dominates the sample
mean asymptotically by considering second order risk expansions.
The next case is concerned with estimation of regression
parameters in a Gauss-Markoff setup. Here the classical estimator of
the regression coefficient is the least squares estimator, and the
sampling scheme used is that of N. Mukhopadhyay ([1974] Journal of
the Indian Statistical Association 12, 39-43). Once again, a class
of James-Stein estimators that dominates the least squares estimator
is developed, and asymptotic risk expansion is given for both the
least squares and James-Stein estimators.
Finally, we consider the estimation of the difference of two
normal mean vectors, and the sampling schemes developed in 1984 at
the Institute of Applied Mathematics, National Tsing Hua University,
by R. Chou and W. Hwang. A class of James-Stein estimators that
dominates the difference of sample mean vectors is given. Asymptotic
risk expansions are also provided.
CHAPTER ONE
INTRODUCTION
1.1 Sequential Sampling
There are two basic purposes for which sequential methods are
used in statistics. The first is to reduce the sample size on an
average as compared to the corresponding fixed sample size procedure
which meets the same error requirements. Wald's sequential
probability ratio test is a classic example of this. The second is
to solve certain problems which cannot be solved by any fixed sample
procedure. One important example of the latter is the fixed length
interval estimation of the normal mean with unknown variance when the
confidence coefficient is specified in advance. Another important
problem directly related to the topic of this dissertation is a point
estimation problem which is described below.
Let $X_1, X_2, \ldots$ be a sequence of independent and identically
distributed $N(\theta, \sigma^2)$ variables, where $\theta \in R^1$ is unknown. Based on
$X_1, \ldots, X_n$, an estimator $\delta_n = \delta_n(X_1,\ldots,X_n)$ is used for
estimating $\theta$. Suppose the loss incurred is given by

$$L(\theta, \delta_n) = A(\delta_n - \theta)^2 + cn, \qquad (1.1)$$

where $A>0$ is the known weight and $c(>0)$ denotes the known cost per
sampling unit. The most natural estimator of $\theta$ from a sample of size
$n$ is $\bar X_n = n^{-1}\sum_{i=1}^n X_i$, with associated risk (expected loss)

$$R(\theta, \bar X_n) = E\,L(\theta, \bar X_n) = A\sigma^2/n + cn. \qquad (1.2)$$
The above risk is minimized with respect to $n$ at

$$n = n^* = [(A\sigma^2/c)^{1/2}],$$

where $[u]$ denotes the integer closest to $u$. Here, for simplicity,
we will assume that $(A\sigma^2/c)^{1/2}$ is an integer. However, for unknown
$\sigma^2$, there does not exist any fixed sample size which minimizes (1.2)
simultaneously for all $\sigma^2$. In this case, motivated by the optimal
fixed sample size $n^*$ (when $\sigma^2$ is known), the following sequential
procedure is proposed for determining the sample size:

$$N = \inf\{n \ge m\,(\ge 2) : n \ge (A s_n^2/c)^{1/2}\}, \qquad (1.3)$$

where $m$ denotes the initial sample size and

$$s_n^2 = (n-1)^{-1}\sum_{i=1}^n (X_i - \bar X_n)^2, \qquad (1.4)$$

for all $n \ge 2$.
The above sequential procedure was essentially introduced by
Robbins (1959), who considered the case $L(\theta,\delta_n) = A|\delta_n - \theta| + cn$. Later,
Starr (1966) considered the more general case
$L(\theta,\delta_n) = A|\delta_n - \theta|^s + cn^t$, with $s>0$ and $t>0$.
The sequential sampling procedure introduced in (1.3) has
sometimes been described as ad hoc. Robbins (1959) did not give any
decision theoretic motivation for such a procedure, which, as we shall
see below, can be justified from the minimax criterion.
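As a minimal illustrative sketch of how the stopping rule (1.3) behaves in practice, the following pure-Python simulation (the helper name, seed, and the values $A = 1$, $c = 0.01$, $m = 5$ are illustrative choices, not part of the development above) draws $N(\theta, \sigma^2)$ observations one at a time, updating $s_n^2$ as in (1.4), and stops the first time $n \ge (A s_n^2/c)^{1/2}$:

```python
import math
import random

def sequential_sample_size(theta, sigma, A=1.0, c=0.01, m=5, rng=None):
    """Run the stopping rule N = inf{n >= m : n >= (A s_n^2 / c)^(1/2)}
    of (1.3) on a stream of N(theta, sigma^2) observations, with s_n^2
    as in (1.4); return the stopping time N and the sample mean."""
    rng = rng or random.Random(0)
    xs = [rng.gauss(theta, sigma) for _ in range(m)]
    while True:
        n = len(xs)
        mean = sum(xs) / n
        s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)  # (1.4)
        if n >= math.sqrt(A * s2 / c):                   # (1.3)
            return n, mean
        xs.append(rng.gauss(theta, sigma))

# With sigma = 2, the optimal fixed sample size of (1.2) is
# n* = (A sigma^2 / c)^(1/2) = 20; N should track it on average.
n_star = math.sqrt(1.0 * 4.0 / 0.01)
avg_N = sum(sequential_sample_size(0.0, 2.0, rng=random.Random(i))[0]
            for i in range(200)) / 200
```

The mild undershoot of the average $N$ relative to $n^*$ comes from occasional early stopping when $s_n^2$ underestimates $\sigma^2$.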
To this end, assume that $\sigma^2$ is known. Suppose $\theta$ has a $N(\mu, \tau^2)$
prior. Then the posterior distribution of $\theta$ given $X_i = x_i$, $i=1,\ldots,n$, is

$$N\left( \mu + \frac{\bar x_n - \mu}{1 + \sigma^2/(n\tau^2)},\; \frac{\sigma^2/n}{1 + \sigma^2/(n\tau^2)} \right).$$

Here, under the loss (1.1), the Bayes estimator of $\theta$ is

$$\mu + \frac{\bar X_n - \mu}{1 + \sigma^2/(n\tau^2)},$$

with corresponding posterior risk equal to $A\,\dfrac{\sigma^2/n}{1 + \sigma^2/(n\tau^2)} + cn$.
The posterior risk being independent of $X_1, \ldots, X_n$, the
sequential Bayes rule becomes a fixed sample rule with sample
size $n_B$ (determined by minimizing the posterior risk) and the above
estimator.

Next, consider a sequence of $\{N(\mu, m),\; m \ge 1\}$ priors. Then the
associated sequence of Bayes estimators is

$$\mu + \frac{\bar X_n - \mu}{1 + \sigma^2/(nm)},$$

with posterior risks $A\,\dfrac{\sigma^2/n}{1 + \sigma^2/(nm)} + cn$. Since the sequence of
posterior risks converges (as $m \to \infty$) to $K = A\sigma^2 n^{-1} + cn$, and
$R(\theta, \bar X_n) = K$ for all $\theta$, the fixed sample rule with sample size
determined by minimizing (1.2) with respect to $n$ is a minimax rule.
In summary, $n^*$, the required sample size, together with $\bar X_{n^*}$, the
estimator, is a minimax rule. Therefore, one can conceive of the
stopping rule (1.3) as an empirical minimax stopping rule and not as
ad hoc as one might think. (By empirical we mean having any unknowns,
in this case $\sigma^2$, replaced by their estimators, in this case $s_n^2$.)
Next, consider the multivariate analogue of the above problem;
i.e., let $X_1, X_2, \ldots$ be a sequence of independent and identically
distributed $N_p(\theta, \sigma^2 V)$ random vectors ($p \ge 3$), with $\theta \in R^p$ unknown and
$V$, positive definite, known. Based on $X_1, \ldots, X_n$, if
$\delta_n = \delta_n(X_1,\ldots,X_n)$ is used to estimate $\theta$, suppose the loss incurred
is given by

$$L(\theta, \delta_n) = (\delta_n - \theta)' Q (\delta_n - \theta) + cn, \qquad (1.5)$$

where $Q$ denotes a known positive definite (weight) matrix, and $c(>0)$
is the known cost per sampling unit. Again, from a sample of size $n$,
the most natural estimator of $\theta$ is $\bar X_n = n^{-1}\sum_{i=1}^n X_i$, with risk (expected
loss)

$$R(\theta, \sigma^2, \bar X_n) = n^{-1}\sigma^2\,\mathrm{tr}(QV) + cn. \qquad (1.6)$$

As before, if $\sigma^2$ is known, the above risk is minimized at
$n = n^* = (\sigma^2\,\mathrm{tr}(QV)/c)^{1/2}$ (here, too, we shall assume that $n^*$ is an
integer). Once again, if $\sigma^2$ is unknown there does not exist any
fixed sample size which minimizes (1.6) simultaneously for all $\sigma^2$.
Again, motivated by the definition of $n^*$, we propose the following
sequential sampling scheme:

$$N = \inf\{n \ge m\,(\ge 2) : n \ge (s_n^2\,\mathrm{tr}(QV)/c)^{1/2}\}, \qquad (1.7)$$

where $m$ denotes the initial sample size and

$$s_n^2 = \{(n-1)p\}^{-1}\sum_{i=1}^n (X_i - \bar X_n)' V^{-1} (X_i - \bar X_n), \qquad (1.8)$$

for every $n \ge 2$.
As can easily be seen, the above stopping rule is a multivariate
generalization of the univariate stopping rule (1.3). Ghosh et al.
(1976) and Woodroofe (1977) considered more general stopping rules
where $\Sigma$ was an unknown arbitrary positive definite matrix. To show
that the stopping rule (1.7) is more than just an ad hoc extension of
(1.3), we will motivate it again from a minimax criterion.
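In the same illustrative spirit as before, the multivariate rule (1.7)-(1.8) can be sketched for the special case $Q = V = I_p$, in which $\mathrm{tr}(QV) = p$ and $s_n^2$ pools all $(n-1)p$ degrees of freedom. The helper name, seed, and parameter values below are illustrative only:

```python
import math
import random

def mv_sequential_sample_size(theta, sigma, c=0.01, m=5, rng=None):
    """Stopping rule (1.7) with Q = V = I_p: stop at the first
    n >= m with n >= (s_n^2 * p / c)^(1/2), where s_n^2 is (1.8)."""
    rng = rng or random.Random(0)
    p = len(theta)
    xs = [[rng.gauss(t, sigma) for t in theta] for _ in range(m)]
    while True:
        n = len(xs)
        mean = [sum(x[j] for x in xs) / n for j in range(p)]
        ssq = sum((x[j] - mean[j]) ** 2 for x in xs for j in range(p))
        s2 = ssq / ((n - 1) * p)          # (1.8) with V = I_p
        if n >= math.sqrt(s2 * p / c):    # (1.7) with tr(QV) = p
            return n, mean
        xs.append([rng.gauss(t, sigma) for t in theta])

theta = [0.0, 0.0, 0.0, 0.0]              # p = 4
n_star = math.sqrt(1.0 * 4 / 0.01)        # sigma = 1: n* = (sigma^2 p/c)^(1/2) = 20
avg_N = sum(mv_sequential_sample_size(theta, 1.0, rng=random.Random(i))[0]
            for i in range(100)) / 100
```

Because $s_n^2$ here is based on $(n-1)p$ rather than $n-1$ degrees of freedom, it concentrates around $\sigma^2$ faster than in the univariate sketch, and $N$ hugs $n^*$ more tightly.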
Assume $\sigma^2$ is known and suppose $\theta$ has a $N_p(\mu, \tau^2 B)$ prior, where $B$
is a positive definite matrix. Then, for fixed $n$, the posterior
distribution of $\theta$ given $X_i = x_i$, $i=1,\ldots,n$, is normal with mean
$\mu + \tau^2 B\left(\tau^2 B + \frac{\sigma^2}{n}V\right)^{-1}(\bar x_n - \mu)$ and covariance matrix

$$\tau^2 B - \tau^2 B\left(\tau^2 B + \frac{\sigma^2}{n}V\right)^{-1}\tau^2 B = \frac{\sigma^2}{n}\,V\left(I_p + \frac{\sigma^2}{n\tau^2}B^{-1}V\right)^{-1},$$

where $I_p$ is the identity matrix of order $p$. Since the loss (1.5) is
quadratic, the Bayes estimator $\delta_n^B$ of $\theta$ is again the mean of its
posterior distribution, i.e.,

$$\delta_n^B(\bar X_n) = \mu + \tau^2 B\left(\tau^2 B + \frac{\sigma^2}{n}V\right)^{-1}(\bar X_n - \mu) = \mu + \left(I_p + \frac{\sigma^2}{n\tau^2}V B^{-1}\right)^{-1}(\bar X_n - \mu),$$

with the corresponding posterior risk equal to

$$\frac{\sigma^2}{n}\,\mathrm{tr}\left[Q V \left(I_p + \frac{\sigma^2}{n\tau^2}B^{-1}V\right)^{-1}\right] + cn.$$

Again, the posterior risk is independent of the $X_i$'s, so that the
Bayes sequential decision rule is a fixed sample rule with sample
size $n_B$ determined by minimizing the posterior risk with respect to $n$.
Next, consider the sequence of $\{N_p(\mu, mB),\; m \ge 1\}$ priors on $\theta$.
Then the Bayes estimator of $\theta$ becomes

$$\delta_n^B(\bar X_n) = \mu + \left(I_p + \frac{\sigma^2}{nm}V B^{-1}\right)^{-1}(\bar X_n - \mu),$$

with associated posterior risk

$$\frac{\sigma^2}{n}\,\mathrm{tr}\left[Q V \left(I_p + \frac{\sigma^2}{nm}B^{-1}V\right)^{-1}\right] + cn.$$

By using a limiting Bayes argument as in the univariate case, we see
that the posterior risk converges (as $m \to \infty$) to $K = \frac{\sigma^2}{n}\,\mathrm{tr}(QV) + cn$,
where $R(\theta, \sigma^2, \bar X_n) = K$ for all $\theta$. Hence $n^*$, the sample size, and $\bar X_{n^*}$,
the estimator, is a minimax rule. Consequently, the proposed
stopping rule (1.7) can also be viewed as an empirical minimax
stopping rule.

It is the stopping rule (1.7) and its eventual modifications to
fit a regression problem, its generalization to a more complex
covariance structure and its adaptation to a two sample situation,
that will be of interest to us (as far as sequential sampling goes)
in the subsequent chapters.
1.2 Shrinkage Estimation
Consider the situation where $X_1, X_2, \ldots$ are independent and
identically distributed $N_p(\theta, \sigma^2 V)$ random vectors, $\theta \in R^p$ is unknown, $V$
is a known positive definite matrix, and $p \ge 3$. Again, the problem is
to estimate $\theta$. For a sample of size $n$, assume the loss (1.5) in
estimating $\theta$ by $\delta_n = \delta_n(X_1,\ldots,X_n)$. The minimax estimator
of $\theta$ is $\bar X_n = n^{-1}\sum_{i=1}^n X_i$. However, Stein (1955) was able to establish
the existence of estimators of $\theta$ (for known $\sigma^2$ and for $Q = V = I_p$)
which dominated $\bar X_n$ (and hence were minimax themselves). Later, James
and Stein (1961) exhibited some explicit estimators dominating $\bar X_n$.
Since then, there have been a multitude of articles concerning
extensions of the James-Stein estimators to more general situations.
Specifically, we refer to Berger (1976) when $\sigma^2$, $Q$ and $V$ are known,
Efron and Morris (1976) when $Q = V = I_p$ and $\sigma^2$ is unknown, and
Berger and Bock (1976) when $Q$ is a known diagonal matrix and the
covariance matrix is diagonal and unknown. For a general class of
estimators that dominate the usual mean vector, we refer to Baranchik
(1970), Strawderman (1971), Alam (1973), Efron and Morris (1976),
and Berger (1976).
To motivate the idea of a shrinkage estimator, let us consider the
case $Q = V = I_p$. Suppose $\theta$ has a $N(\mu, \tau^2 I_p)$ prior. Then, by previous
results, the Bayes estimator of $\theta$ given $X_i = x_i$, $i=1,\ldots,n$, is

$$\delta_n^B(\bar x_n) = \mu + \frac{\tau^2}{\tau^2 + \sigma^2/n}\,(\bar x_n - \mu) = \mu + \left(1 - \frac{\sigma^2/n}{\tau^2 + \sigma^2/n}\right)(\bar x_n - \mu).$$

Here, if $\sigma^2$ and $\tau^2$ are unknown then they must be estimated from the
data. To this end, note that marginally

$$\|\bar X_n - \mu\|^2 \sim (\tau^2 + \sigma^2/n)\,\chi^2_p,$$

where $\|\cdot\|$ denotes the usual Euclidean norm. Hence,

$$E\left[\frac{p-2}{\|\bar X_n - \mu\|^2}\right] = \frac{1}{\tau^2 + \sigma^2/n}.$$

Also, marginally,

$$(n-1)p\,s_n^2/\sigma^2 \sim \chi^2_{(n-1)p},$$

where we are referring to definition (1.8) of $s_n^2$. Consequently,
$E[s_n^2] = \sigma^2$. Therefore, replacing the unknowns in $\delta_n^B(\bar X_n)$ by their
estimates, we have

$$\delta_n^{JS} = \mu + \left(1 - \frac{(p-2)s_n^2}{n\,\|\bar X_n - \mu\|^2}\right)(\bar X_n - \mu),$$

which is known as the James-Stein estimator. The above empirical
Bayes interpretation of James-Stein estimators is due to Efron and
Morris (1973). In James and Stein (1961), the estimator was studied
from a purely frequentist perspective.
For a better understanding of the shrinking property of $\delta_n^{JS}$, note
that

$$\|\bar X_n - \mu\| > \|\delta_n^{JS} - \mu\|$$

if and only if

$$1 > \left|1 - \frac{(p-2)s_n^2}{n\,\|\bar X_n - \mu\|^2}\right|.$$

Therefore, the distance between $\bar X_n$ and $\mu$ will be greater than that
between $\delta_n^{JS}$ and $\mu$ if and only if the above condition holds. When this is
true, $\delta_n^{JS}$ will shrink $\bar X_n$ towards the prior mean $\mu$. To assure that
the above condition is always met, we sometimes use the plus rule
version of $\delta_n^{JS}$, given by

$$\delta_n^{JS+} = \mu + \left(1 - \frac{(p-2)s_n^2}{n\,\|\bar X_n - \mu\|^2}\right)^+(\bar X_n - \mu),$$

where $(a)^+ = \max(0, a)$.
The James-Stein estimator $\delta_n^{JS}$ and its eventual modifications
to fit a regression problem, its generalizations to a more complex
covariance structure and its adaptation to a two-sample problem will
be of interest to us (as far as estimation goes) in the subsequent
chapters.
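The estimator $\delta_n^{JS}$ and its plus-rule version are easy to write down directly. The sketch below (a simplification with $Q = V = I_p$, shrinking toward a prior mean $\mu$, and using the pooled variance estimate (1.8); function names, the seed, and the configuration $\theta$ close to $\mu$ are illustrative choices) compares the squared-error loss of the plus rule with that of the sample mean over repeated samples:

```python
import random

def james_stein_plus(xbar, mu, s2, n, p):
    """Plus-rule James-Stein estimate: shrink xbar toward mu by the
    factor (1 - (p-2) s2 / (n ||xbar - mu||^2))^+ ."""
    dist2 = sum((xb - m) ** 2 for xb, m in zip(xbar, mu))
    shrink = max(0.0, 1.0 - (p - 2) * s2 / (n * dist2))
    return [m + shrink * (xb - m) for xb, m in zip(xbar, mu)]

def sq_err(est, theta):
    return sum((e - t) ** 2 for e, t in zip(est, theta))

rng = random.Random(1)
p, n, sigma = 6, 10, 1.0
theta, mu = [0.2] * p, [0.0] * p      # true mean near the shrink point
trials = 300
js_err = mean_err = 0.0
for _ in range(trials):
    data = [[rng.gauss(t, sigma) for t in theta] for _ in range(n)]
    xbar = [sum(row[j] for row in data) / n for j in range(p)]
    s2 = sum((row[j] - xbar[j]) ** 2
             for row in data for j in range(p)) / ((n - 1) * p)  # (1.8)
    js_err += sq_err(james_stein_plus(xbar, mu, s2, n, p), theta) / trials
    mean_err += sq_err(xbar, theta) / trials
```

The gain is largest when $\theta$ lies near $\mu$, as the empirical Bayes derivation above suggests; the dominance itself holds for every $\theta$.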
1.3 Literature Review
There are only two pertinent articles to date that address the
inadmissibility of the sample mean vector, $\bar X_n$, as an estimator of $\theta$ in
a setup where the sample size is random. Ghosh and Sen (1983-84)
consider the case of arbitrary $Q$ and $V$, but consider a two-stage
sampling scheme. The main criticism of this approach is that a two-
stage procedure requires on an average more observations than $n^*$, the
optimal sample size when $\sigma^2$ is known (see, for example, Ghosh and
Mukhopadhyay, 1976). Takada (1984), on the other hand, considered
the case $Q = V = I_p$ under a fully sequential sampling scheme. The
main objection to his approach is that at the $n$th stage of sampling
his estimator (similar to the James-Stein estimator) uses only $p$ of
the available $(n-1)p$ degrees of freedom to estimate $\sigma^2$. This will be
made clear in Chapter 2.
Before leaving this section, let us familiarize ourselves with a
particular identity from Stein (1981) which will be used repeatedly
in all the proofs. First we require the following lemma.
Lemma 1.1. Suppose $Y \sim N(0,1)$. Then, if $E|g'(Y)| < \infty$, one has

$$E[g'(Y)] = E[Y g(Y)]. \qquad (1.9)$$

Proof: Write $\phi(y) = (2\pi)^{-1/2}\exp(-\tfrac{1}{2}y^2)$, and note that $\phi'(y) = -y\,\phi(y)$.
Now,

$$E[g'(Y)] = \int_{-\infty}^{\infty} g'(y)\,\phi(y)\,dy \qquad (1.10)$$
$$= \int_{-\infty}^{0} g'(y)\,\phi(y)\,dy + \int_{0}^{\infty} g'(y)\,\phi(y)\,dy$$
$$= \int_{-\infty}^{0} g'(y)\left[-\int_{-\infty}^{y} z\,\phi(z)\,dz\right]dy + \int_{0}^{\infty} g'(y)\left[\int_{y}^{\infty} z\,\phi(z)\,dz\right]dy$$
$$= \int_{-\infty}^{0} [g(z)-g(0)]\,z\,\phi(z)\,dz + \int_{0}^{\infty} [g(z)-g(0)]\,z\,\phi(z)\,dz$$
$$= \int_{-\infty}^{\infty} [g(z)-g(0)]\,z\,\phi(z)\,dz$$
$$= E[Y g(Y)] - g(0)E[Y] = E[Y g(Y)],$$

since $E[Y] = 0$. This brings us to the identity, referred to as
Stein's identity, which is stated as follows:
Lemma 1.2. Let $X \sim N(\theta, \sigma^2)$. Then, if $E_{\theta,\sigma^2}|h'(X)| < \infty$, one has

$$E_{\theta,\sigma^2}[h'(X)] = \frac{1}{\sigma^2}\,E_{\theta,\sigma^2}[(X - \theta)h(X)]. \qquad (1.11)$$

Proof: Define $g(y)$ by $g(y) = h(\theta + \sigma y)$. Then $h(x) = g\!\left(\frac{x-\theta}{\sigma}\right)$ and
$h'(x) = \frac{1}{\sigma}\,g'\!\left(\frac{x-\theta}{\sigma}\right)$. Note that $Y = \frac{X-\theta}{\sigma} \sim N(0,1)$. Hence,

$$E_{\theta,\sigma^2}[h'(X)] = \frac{1}{\sigma}\,E_{\theta,\sigma^2}\left[g'\!\left(\frac{X-\theta}{\sigma}\right)\right] = \frac{1}{\sigma}\,E[g'(Y)] \qquad (1.12)$$
$$= \frac{1}{\sigma}\,E[Y g(Y)] = \frac{1}{\sigma}\,E\left[\frac{X-\theta}{\sigma}\,g\!\left(\frac{X-\theta}{\sigma}\right)\right] = \frac{1}{\sigma^2}\,E_{\theta,\sigma^2}[(X-\theta)h(X)],$$

where in the third equality in the rhs of (1.12) we use Lemma 1.1.
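Identities (1.9) and (1.11) are also easy to check numerically. The following sketch (illustrative only; it takes $g(y) = \sin y$, so that $g'(y) = \cos y$ and both sides of (1.9) equal $E[\cos Y] = e^{-1/2}$) compares Monte Carlo estimates of the two sides:

```python
import math
import random

def mc_stein_check(g, gprime, trials=200_000, seed=0):
    """Monte Carlo estimates of E[g'(Y)] and E[Y g(Y)] for Y ~ N(0,1);
    Stein's identity (Lemma 1.1) says the two expectations agree."""
    rng = random.Random(seed)
    lhs = rhs = 0.0
    for _ in range(trials):
        y = rng.gauss(0.0, 1.0)
        lhs += gprime(y) / trials
        rhs += y * g(y) / trials
    return lhs, rhs

lhs, rhs = mc_stein_check(math.sin, math.cos)
# Both estimates should be near E[cos Y] = exp(-1/2).
```

The analogous check for Lemma 1.2 follows by rescaling, exactly as in the proof above.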
In subsequent chapters, however, we shall require a multivariate
version of the above identity, which may be stated in the following
lemma.

Lemma 1.3. Let $X \sim N_p(\theta, I_p)$. Then, if $E_\theta\left|\frac{\partial h(X)}{\partial X_i}\right| < \infty$,
$1 \le i \le p$, one has

$$E_\theta\left[\frac{\partial h(X)}{\partial X_i}\right] = E_\theta[(X_i - \theta_i)h(X)], \qquad (1.13)$$

for $1 \le i \le p$.

Proof: Note that for each $1 \le i \le p$,

$$E_\theta\left[\frac{\partial h(X)}{\partial X_i}\right] = E_\theta E_\theta\left[\frac{\partial h(X)}{\partial X_i}\,\Big|\,X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_p\right] \qquad (1.14)$$
$$= E_\theta E_\theta[(X_i - \theta_i)h(X) \mid X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_p] = E_\theta[(X_i - \theta_i)h(X)],$$

where for the second equality we use Lemma 1.2 and the independence
of the $X_i$'s.
An interesting consequence of Lemma 1.3 is the following. If
$\delta(X) = (X_1 + \phi_1(X), \ldots, X_p + \phi_p(X))'$ is a rival estimator of
$X \sim N_p(\theta, I_p)$ for estimating $\theta$, then, assuming the loss (1.5) with $Q = I_p$
and assuming that $E_\theta\left|\frac{\partial \phi_i(X)}{\partial X_i}\right| < \infty$ for $1 \le i \le p$, the risk
difference is given by

$$R(\theta,\delta) - R(\theta,X) \qquad (1.15)$$
$$= E_\theta\left[\sum_{i=1}^p (X_i + \phi_i(X) - \theta_i)^2 - \sum_{i=1}^p (X_i - \theta_i)^2\right]$$
$$= E_\theta\left[2\sum_{i=1}^p \phi_i(X)(X_i - \theta_i) + \sum_{i=1}^p \phi_i^2(X)\right]$$
$$= E_\theta\left[2\sum_{i=1}^p \frac{\partial \phi_i(X)}{\partial X_i} + \sum_{i=1}^p \phi_i^2(X)\right],$$

where the last equality is obtained from Lemma 1.3. Hence, to
obtain estimators $\delta$ which dominate the usual estimator $X$, one needs
to find solutions to the differential inequality

$$2\sum_{i=1}^p \frac{\partial \phi_i(x)}{\partial x_i} + \sum_{i=1}^p \phi_i^2(x) < 0. \qquad (1.16)$$

Put $\phi_i(x) = -a(x_i - \mu_i)/\|x - \mu\|^2$; then the left hand side of (1.16)
reduces to $a(a - 2(p-2))/\|x - \mu\|^2 < 0$ if $0 < a < 2(p-2)$, $p \ge 3$. Hence, for
$0 < a < 2(p-2)$, an estimator of the form

$$\delta(X) = \mu + \left(1 - \frac{a}{\|X - \mu\|^2}\right)(X - \mu)$$

dominates $X$ under the loss (1.5) with $Q = I_p$. This is the classical
result of James and Stein (1961).
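The unbiased risk computation in (1.15)-(1.16) can also be verified by simulation. In the sketch below (a single observation $X \sim N(\theta, I_p)$, shrinkage toward the origin with $\phi_i(x) = -a x_i/\|x\|^2$; the helper name, seed, and the choices $p = 5$, $a = p-2$ are illustrative), the realized risk difference is compared with the Stein estimate $a(a - 2(p-2))\,E[1/\|X\|^2]$:

```python
import random

def risk_diff_mc(theta, a, trials=100_000, seed=2):
    """For X ~ N(theta, I_p) and delta(X) = (1 - a/||X||^2) X (shrinking
    toward 0), compare the realized risk difference R(delta) - R(X)
    with Stein's unbiased estimate a(a - 2(p-2)) E[1/||X||^2]."""
    rng = random.Random(seed)
    p = len(theta)
    diff = stein = 0.0
    for _ in range(trials):
        x = [rng.gauss(t, 1.0) for t in theta]
        nx2 = sum(v * v for v in x)
        d = [(1 - a / nx2) * v for v in x]
        diff += (sum((di - t) ** 2 for di, t in zip(d, theta))
                 - sum((xi - t) ** 2 for xi, t in zip(x, theta))) / trials
        stein += a * (a - 2 * (p - 2)) / nx2 / trials
    return diff, stein

# p = 5, a = p - 2 = 3, theta not at the shrink point
diff, stein = risk_diff_mc([1.0, 0.0, 0.0, 0.0, 0.0], a=3.0)
```

Both quantities should be negative and approximately equal, in accordance with (1.15)-(1.16).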
1.4 The Subject of This Research
In Chapter 2, we consider the case of arbitrary Q and V (and the
loss (1.5)) under a full sequential sampling scheme. We show the
exact dominance of a class of estimators over the usual mean vector
in estimating e. Further, we obtain asymptotic risk expansions for
such estimators up to the second order term. A Monte Carlo study is
also performed to compare the risks of the sample mean and two rival
classes of estimators.
In Chapter 3 we consider a multiple regression setup. Here we
wish to estimate the vector of regression parameters $\beta$ under the
loss (1.5) with $Q = Z_n' Z_n$, $Z_n$ being the usual design matrix of order
$n \times p$, and $V = (Z_n' Z_n)^{-1}$. Again, the sampling is done sequentially,
with each additional point consisting of $p$ nonstochastic regressor
variables and the observed response (a scalar). Here we show the
exact dominance of a class of estimators over the usual least squares
estimator. Further, we obtain asymptotic risk expansions for such
estimators up to the second order term.
In Chapter 4 we consider a multivariate version of the Behrens-
Fisher problem of estimating the difference of two normal means when
the variances are assumed to be unequal. The sequential analogue in
a univariate setting was first introduced by Ghosh and Mukhopadhyay
(1980), and was later generalized to a multivariate setting by Chou
and Hwang (1984). We have proposed in Chapter 4 an extension of the
stopping rule (1.7) to handle the two sample situation. Also, the
loss (1.5) is used in the estimation of v, the difference of two
normal mean vectors. We show the dominance of a class of estimators
over the difference of the sample mean vectors. Once more we produce
an asymptotic risk expansion for such estimators up to the second
order term.
In Chapter 5 we return to the one sample problem where $Q$ is
diagonal and the entire covariance matrix, formerly $\sigma^2 V$, is assumed
to be diagonal and unknown. With the sequential sampling rule (1.7)
generalized to this situation and the loss (1.5), we produce an
asymptotic risk expansion (up to the second order term) for a class
of estimators dominating the usual mean vector asymptotically.
CHAPTER TWO
SEQUENTIAL SHRINKAGE ESTIMATION OF THE MEAN
OF A MULTIVARIATE NORMAL DISTRIBUTION
2.1 Introduction
Let $X_1, X_2, \ldots$ be a sequence of independent and identically
distributed $N_p(\theta, \sigma^2 V)$ random vectors, where $\theta \in R^p$ is unknown, and $V$,
positive definite, is known. The problem is estimation of $\theta$. Based
on $X_1, \ldots, X_n$, if we estimate $\theta$ with $\delta_n = \delta_n(X_1,\ldots,X_n)$, let
the loss incurred be (1.5). We have seen in Section 1.1 that for
this situation the sequential sampling rule (1.7) and the estimator
$\bar X_N$ can be considered as an empirical minimax rule. Consequently,
producing a class of estimators which dominates $\bar X_N$ takes on a greater
meaning than a mere pathology.
In Section 2.2 we exhibit a class of James-Stein estimators
which dominate $\bar X_N$, the usual mean vector, and, simultaneously, address
the two objections raised in Section 1.3. The first objection was in
reference to Ghosh and Sen (1983-84) and their two-stage sampling
scheme. Unlike a two-stage procedure, which uses only the initial-sample
variance estimate in defining the stopping rule, the stopping rule (1.7)
uses an updated estimator of $\sigma^2$ at each stage of the experiment, and
thereby demands fewer observations on an average. The second objection
was in reference to Takada (1984) and his version of the James-Stein
estimator. We shall see later in this section that Takada's class of
estimators uses only $p$ of the $(n-1)p$ degrees of freedom available at
stage $n$ to estimate $\sigma^2$. The class of James-Stein estimators we produce
uses all the available degrees of freedom for estimating $\sigma^2$, and as a
consequence allows the estimator to stabilize for large $n$. In Section 2.3
we produce an asymptotic risk expansion (as $c \to 0$) for the proposed class
of estimators. In Section 2.4 we give the results of Monte Carlo
simulations comparing the risks of the sample mean, the proposed
James-Stein estimators and Takada's estimators. As expected, our
estimators achieve greater risk reduction than Takada's.
2.2 A Class of James-Stein Estimators Dominating $\bar X_N$

In this section, we consider the class of James-Stein estimators
$\delta_N^b(X_1,\ldots,X_N)$, where

$$\delta_n^b(X_1,\ldots,X_n) = \bar X_n - \frac{b\,s_n^2}{n\,(\bar X_n-\lambda)'V^{-1}Q^{-1}V^{-1}(\bar X_n-\lambda)}\,Q^{-1}V^{-1}(\bar X_n - \lambda), \qquad (2.1)$$

for every $n \ge 2$, where $s_n^2 = \{(n-1)(p+2)\}^{-1}\sum_{i=1}^n (X_i - \bar X_n)'V^{-1}(X_i - \bar X_n)$
(note that, in this chapter, $s_n^2$ is normalized by $(n-1)(p+2)$, in contrast
to the unbiased estimator (1.8) used in the stopping rule (1.7)), $b$ is a
constant, and $\lambda \in R^p$ is the known point towards which we want to shrink.
Very often $\lambda$ is taken as the prior mean. The main result of this section
is as follows:

Theorem 2.1. Under the stopping rule (1.7) and the loss (1.5),

$$R(\theta,\sigma^2,\delta_N^b) < R(\theta,\sigma^2,\bar X_N)$$

for every $b \in (0, 2(p-2))$.
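Theorem 2.1's dominance can be illustrated empirically, in the spirit of the Monte Carlo study of Section 2.4. The sketch below is a deliberate simplification (it takes $Q = V = I_p$ and $\lambda = 0$, and uses the unbiased pooled variance estimate (1.8) in both the rule and the shrinkage factor rather than the exact normalization of (2.1); the helper name, seed, and parameter values are illustrative assumptions), comparing the realized loss (1.5) of $\bar X_N$ and of the shrinkage estimator:

```python
import math
import random

def run_once(theta, sigma, c, b, m, rng):
    """One replication: sample sequentially by rule (1.7) with Q = V = I_p,
    then evaluate loss (1.5) for the sample mean and for a James-Stein
    style rule shrinking toward lambda = 0."""
    p = len(theta)
    xs = [[rng.gauss(t, sigma) for t in theta] for _ in range(m)]
    while True:
        n = len(xs)
        xbar = [sum(x[j] for x in xs) / n for j in range(p)]
        s2 = sum((x[j] - xbar[j]) ** 2
                 for x in xs for j in range(p)) / ((n - 1) * p)
        if n >= math.sqrt(s2 * p / c):
            break
        xs.append([rng.gauss(t, sigma) for t in theta])
    dist2 = sum(v * v for v in xbar)
    js = [(1 - b * s2 / (n * dist2)) * v for v in xbar]
    loss_mean = sum((v - t) ** 2 for v, t in zip(xbar, theta)) + c * n
    loss_js = sum((v - t) ** 2 for v, t in zip(js, theta)) + c * n
    return loss_mean, loss_js

rng = random.Random(3)
theta, sigma, c, m = [0.1] * 5, 1.0, 0.01, 5
b = len(theta) - 2          # b = p - 2 lies in (0, 2(p-2))
tot_mean = tot_js = 0.0
for _ in range(300):
    lm, lj = run_once(theta, sigma, c, b, m, rng)
    tot_mean += lm / 300
    tot_js += lj / 300
```

With $\theta$ near the shrink point, the average loss of the shrinkage rule falls well below that of $\bar X_N$, consistent with the theorem.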
Proof: First write

$$R(\theta,\sigma^2,\delta_N^b) - R(\theta,\sigma^2,\bar X_N) = -2b\,E_{\sigma^2}\left[\frac{s_N^2\,(\bar X_N-\theta)'V^{-1}(\bar X_N-\lambda)}{N\,(\bar X_N-\lambda)'V^{-1}Q^{-1}V^{-1}(\bar X_N-\lambda)}\right] \qquad (2.2)$$
$$\qquad\qquad + b^2\,E_{\sigma^2}\left[\frac{s_N^4}{N^2\,(\bar X_N-\lambda)'V^{-1}Q^{-1}V^{-1}(\bar X_N-\lambda)}\right].$$

Since $Q$ and $V$ are both positive definite, using the simultaneous
diagonalization theorem, there exists a nonsingular $D$ such that
$DQ^{-1}D' = I_p$ and $DVD' = \Lambda$, a diagonal matrix with all positive
diagonal elements. We write $a_i$ for the $i$th diagonal element of $\Lambda$.
Use the transformation $Z_i = D(X_i - \lambda)$, $i \ge 1$, so that the $Z_i$'s
are iid $N_p(\zeta, \sigma^2\Lambda)$, with $\zeta = D(\theta - \lambda)$. Writing $\bar Z_n = n^{-1}\sum_{i=1}^n Z_i$ for
every $n \ge 1$, one can rewrite

$$s_n^2 = \{(n-1)(p+2)\}^{-1}\sum_{i=1}^n (Z_i - \bar Z_n)'\Lambda^{-1}(Z_i - \bar Z_n), \quad n \ge 2; \qquad (2.3)$$
$$(\bar X_n - \theta)'V^{-1}(\bar X_n - \lambda) = (\bar Z_n - \zeta)'\Lambda^{-1}\bar Z_n; \qquad (2.4)$$
$$(\bar X_n - \lambda)'V^{-1}Q^{-1}V^{-1}(\bar X_n - \lambda) = \bar Z_n'\Lambda^{-2}\bar Z_n. \qquad (2.5)$$
Then from (2.2), one gets

$$R(\theta,\sigma^2,\delta_N^b) - R(\theta,\sigma^2,\bar X_N) = -2b\,E_{\sigma^2}\left[\frac{s_N^2}{N}\cdot\frac{(\bar Z_N-\zeta)'\Lambda^{-1}\bar Z_N}{\bar Z_N'\Lambda^{-2}\bar Z_N}\right] + b^2\,E_{\sigma^2}\left[\frac{s_N^4}{N^2\,\bar Z_N'\Lambda^{-2}\bar Z_N}\right]. \qquad (2.6)$$

Next we use the Helmert orthogonal transformation $Y_2 = (Z_1 - Z_2)/\sqrt{2}$,
$Y_3 = (Z_1 + Z_2 - 2Z_3)/\sqrt{6}$, $\ldots$, $Y_n = (Z_1 + \cdots + Z_{n-1} - (n-1)Z_n)/\sqrt{n(n-1)}$, $\ldots$
to write

$$s_n^2 = \{(n-1)(p+2)\}^{-1}\sum_{i=2}^n Y_i'\Lambda^{-1}Y_i, \qquad (2.7)$$

where the $Y_i$'s are iid $N_p(0, \sigma^2\Lambda)$. Denote by $B_n$ the $\sigma$-algebra
generated by $Y_2, \ldots, Y_n$; both $s_n^2$ and $I_{[N=n]}$ are $B_n$-measurable, while
$\bar Z_n$ is independent of $B_n$. Now,

$$E_{\sigma^2}\left[\frac{s_N^2}{N}\cdot\frac{(\bar Z_N-\zeta)'\Lambda^{-1}\bar Z_N}{\bar Z_N'\Lambda^{-2}\bar Z_N}\right] = \sum_{n=m}^\infty E_{\sigma^2}\left[I_{[N=n]}\,\frac{s_n^2}{n}\,E_{\sigma^2}\left\{\frac{(\bar Z_n-\zeta)'\Lambda^{-1}\bar Z_n}{\bar Z_n'\Lambda^{-2}\bar Z_n}\,\Big|\,B_n\right\}\right] \qquad (2.8)$$
$$= \sigma^2(p-2)\sum_{n=m}^\infty n^{-2}\,E_{\sigma^2}[I_{[N=n]}\,s_n^2]\;E_{\sigma^2}\left[\frac{1}{\bar Z_n'\Lambda^{-2}\bar Z_n}\right].$$

For the second equality in (2.8), one uses the independence of $\bar Z_n$ and
$(Y_2,\ldots,Y_n)$, together with Stein's identity (of Stein, 1981) applied
coordinatewise to $\bar Z_n \sim N_p(\zeta, (\sigma^2/n)\Lambda)$: writing $h_i(z) = a_i^{-1}z_i/(z'\Lambda^{-2}z)$,

$$E_{\sigma^2}\left[\frac{(\bar Z_n-\zeta)'\Lambda^{-1}\bar Z_n}{\bar Z_n'\Lambda^{-2}\bar Z_n}\right] = \sum_{i=1}^p \frac{\sigma^2 a_i}{n}\,E_{\sigma^2}\left[\frac{\partial h_i(\bar Z_n)}{\partial \bar Z_{ni}}\right] = \frac{\sigma^2(p-2)}{n}\,E_{\sigma^2}\left[\frac{1}{\bar Z_n'\Lambda^{-2}\bar Z_n}\right].$$
Next, note that for every $n$, $n\bar Z_n'\Lambda^{-2}\bar Z_n/\sigma^2 = \sum_{i=1}^p W_{ni}/a_i$, where
the $W_{ni}$'s are independent and $W_{ni} \sim \chi^2_1(n\zeta_i^2/(\sigma^2 a_i))$, $i = 1,\ldots,p$. Hence,
using the result (see, e.g., Cressie et al., 1981) that if
$P(U > 0) = 1$, then $E(U^{-1}) = \int_0^\infty E(\exp(-tU))\,dt$, one gets

$$E_{\sigma^2}\left[\frac{\sigma^2}{n\,\bar Z_n'\Lambda^{-2}\bar Z_n}\right] = \int_0^\infty \prod_{i=1}^p E[\exp(-tW_{ni}/a_i)]\,dt \qquad (2.9)$$
$$= \int_0^\infty \prod_{i=1}^p \left\{(1+2ta_i^{-1})^{-1/2}\exp\left(-\frac{nt\,\zeta_i^2}{\sigma^2 a_i(a_i+2t)}\right)\right\}dt = g_n(\zeta) \quad \text{(say)},$$

where $\zeta = D(\theta-\lambda)$. In this section we need only that $g_n(\zeta)$ is
nonincreasing in $n$. Now, using again the independence of $\bar Z_n$ and
$(Y_2,\ldots,Y_n)$, one gets from (2.6)-(2.9),
(Y ,...,Y ), one gets from (2.6) (2.9),
~2
R(0,a2,6 ) R(9,a2,R ) (2.10)
)N N )N
bs
2 n
= bZ, g (A )E [(sS /n)(2(p2) )I
1 2 s
< 2b(p2)zn g (A )E [" n s(1 )
where in the last step one uses 0 < b < 2 (p2). Also, in the
above, EZ denotes expectation when the Y.'s arei ae iid N(0,a A).
Accordingly, for proving the theorem, it suffices to show that
2 g(A)E[ns 1 /0 )I ] > 0 for all a2 > 0 (2.11)
n=m p~n E2" Sn n [N=n]
To prove (2.11), first observe that $\mathrm{tr}(QV) = \mathrm{tr}(\Lambda)$, since
$Q = D'D$ and $V = D^{-1}\Lambda(D')^{-1}$. Let $n_0$ denote the smallest integer
$\ge m$ such that $p(p+2)^{-1}cn_0^2/(\mathrm{tr}\,\Lambda) \ge \sigma^2$. Then we write

$$\text{lhs of (2.11)} = \sum_{n=m}^{n_0-1} g_n(\zeta)\,E_{\sigma^2}\left[n^{-1}s_n^2\left(1-\frac{s_n^2}{\sigma^2}\right)I_{[N=n]}\right] \qquad (2.12)$$
$$+\; g_{n_0}(\zeta)\,E_{\sigma^2}\left[n_0^{-1}s_{n_0}^2\left(1-\frac{s_{n_0}^2}{\sigma^2}\right)I_{[N\ge n_0]}\right]$$
$$+ \sum_{n=n_0}^\infty\left\{g_{n+1}(\zeta)\,E_{\sigma^2}\left[(n+1)^{-1}s_{n+1}^2\left(1-\frac{s_{n+1}^2}{\sigma^2}\right)I_{[N\ge n+1]}\right] - g_n(\zeta)\,E_{\sigma^2}\left[n^{-1}s_n^2\left(1-\frac{s_n^2}{\sigma^2}\right)I_{[N\ge n+1]}\right]\right\},$$

where the first term in the rhs of (2.12) should be interpreted as
zero if $n_0 = m$. Note that for $n \ge n_0$, on the set $[N \ge n+1]$, not stopping
at stage $n$ gives $s_n^2 > p(p+2)^{-1}cn^2/(\mathrm{tr}\,\Lambda) \ge p(p+2)^{-1}cn_0^2/(\mathrm{tr}\,\Lambda) \ge \sigma^2$
(recall $s_n^2 = p\,\tilde s_n^2/(p+2)$, where $\tilde s_n^2$ is the estimator (1.8) used in (1.7)),
so that $n^{-1}s_n^2(1 - s_n^2/\sigma^2) \le 0$ there. Accordingly, since $g_n(\zeta)$ is
nonincreasing in $n$,

$$\text{third term in the rhs of (2.12)} \ge \sum_{n=n_0}^\infty g_{n+1}(\zeta)\,E_{\sigma^2}\left[\left\{(n+1)^{-1}s_{n+1}^2\left(1-\frac{s_{n+1}^2}{\sigma^2}\right) - n^{-1}s_n^2\left(1-\frac{s_n^2}{\sigma^2}\right)\right\}I_{[N\ge n+1]}\right]. \qquad (2.13)$$
Note that $I_{[N\ge n+1]}$ is a $B_n$-measurable function. Then, using the
representation $s_{n+1}^2 = \{(n-1)s_n^2 + U_{n+1}\}/n$, where
$U_{n+1} = (p+2)^{-1}Y_{n+1}'\Lambda^{-1}Y_{n+1} \sim \sigma^2\chi^2_p/(p+2)$ independently of
$Y_2,\ldots,Y_n$, one gets, with probability 1,

$$E_{\sigma^2}[s_{n+1}^2 \mid B_n] = \frac{(n-1)s_n^2 + \frac{p\sigma^2}{p+2}}{n}, \qquad E_{\sigma^2}[s_{n+1}^4 \mid B_n] = \frac{(n-1)^2 s_n^4 + 2(n-1)s_n^2\,\frac{p\sigma^2}{p+2} + \frac{p\sigma^4}{p+2}}{n^2}, \qquad (2.14)$$

so that

$$E_{\sigma^2}\left[\left\{(n+1)^{-1}s_{n+1}^2\left(1-\frac{s_{n+1}^2}{\sigma^2}\right) - n^{-1}s_n^2\left(1-\frac{s_n^2}{\sigma^2}\right)\right\}\,\Big|\,B_n\right]$$
$$= \frac{3n-1}{n^2(n+1)\sigma^2}\,s_n^4 - \frac{2\{n+(n-1)p/(p+2)\}}{n^2(n+1)}\,s_n^2 + \frac{p(n-1)\sigma^2}{(p+2)n^2(n+1)}.$$

Note that the above is a convex quadratic function of $s_n^2$, where the
minimum occurs at

$$s_n^2 = \frac{n + (n-1)p/(p+2)}{3n-1}\,\sigma^2 \;(<\sigma^2). \qquad (2.15)$$

Hence, recalling that on the set $[N \ge n+1]$, $s_n^2 > \sigma^2$, it follows that

$$E_{\sigma^2}\left[\left\{(n+1)^{-1}s_{n+1}^2\left(1-\frac{s_{n+1}^2}{\sigma^2}\right) - n^{-1}s_n^2\left(1-\frac{s_n^2}{\sigma^2}\right)\right\}I_{[N\ge n+1]}\,\Big|\,B_n\right] \ge \frac{2(n-1)\sigma^2}{(p+2)n^2(n+1)}\,I_{[N\ge n+1]} \ge 0, \qquad (2.16)$$

the lower bound being the value of the quadratic at $s_n^2 = \sigma^2$. Hence,
from (2.13), (2.14) and (2.16),

$$\text{third term in the rhs of (2.12)} \ge 0. \qquad (2.17)$$
Next, note that if $n_0 = m$, then $I_{[N\ge m]} = 1$ with probability 1 and

$$E_{\sigma^2}\left[s_m^2\left(1-\frac{s_m^2}{\sigma^2}\right)\right] = \frac{p\sigma^2}{p+2} - \frac{(m-1)p\{(m-1)p+2\}\sigma^2}{(m-1)^2(p+2)^2} = \frac{2p(m-2)\sigma^2}{(m-1)(p+2)^2} \ge 0,$$

so that the rhs of (2.12) $\ge 0$.

For $n_0 > m\,(\ge 2)$, first note that for $n \le n_0 - 1$, on the set
$[N = n]$, $s_n^2 \le p(p+2)^{-1}cn^2/(\mathrm{tr}\,\Lambda) \le p(p+2)^{-1}c(n_0-1)^2/(\mathrm{tr}\,\Lambda) < \sigma^2$, so
that, using $g_n(\zeta)$ nonincreasing in $n$,

$$\text{first term in the rhs of (2.12)} \ge g_{n_0}(\zeta)\sum_{n=m}^{n_0-1}E_{\sigma^2}\left[n^{-1}s_n^2\left(1-\frac{s_n^2}{\sigma^2}\right)I_{[N=n]}\right]. \qquad (2.18)$$

In view of (2.11), (2.12), (2.17) and (2.18), for proving the theorem
it suffices to show that

$$\sum_{n=m}^{n_0-1}E_{\sigma^2}\left[n^{-1}s_n^2\left(1-\frac{s_n^2}{\sigma^2}\right)I_{[N=n]}\right] + E_{\sigma^2}\left[n_0^{-1}s_{n_0}^2\left(1-\frac{s_{n_0}^2}{\sigma^2}\right)I_{[N\ge n_0]}\right] \ge 0. \qquad (2.19)$$
To prove (2.19), first writing $s_{n_0}^2 = \{(n_0-2)s_{n_0-1}^2 + U_{n_0}\}/(n_0-1)$
and conditioning on $B_{n_0-1}$ (on which $I_{[N\ge n_0]}$ is measurable), one gets,
exactly as in (2.14),

$$E_{\sigma^2}\left[n_0^{-1}s_{n_0}^2\left(1-\frac{s_{n_0}^2}{\sigma^2}\right)I_{[N\ge n_0]}\right] = E_{\sigma^2}\left[\left\{-a_{n_0}s_{n_0-1}^4 + b_{n_0}s_{n_0-1}^2 + c_{n_0}\right\}I_{[N\ge n_0]}\right], \qquad (2.20)$$

where

$$a_{n_0} = \frac{(n_0-2)^2}{n_0(n_0-1)^2\sigma^2}\;(>0), \quad b_{n_0} = \frac{n_0-2}{n_0(n_0-1)} - \frac{2p(n_0-2)}{(p+2)n_0(n_0-1)^2}, \quad c_{n_0} = \frac{p(n_0-2)\sigma^2}{(p+2)n_0(n_0-1)^2}\;(>0). \qquad (2.21)$$

Let $d_{n_0} = (n_0-1)\sigma^2 a_{n_0} = (n_0-2)^2/\{n_0(n_0-1)\}$ and
$g_{n_0} = (p-2)(n_0-2)/\{(p+2)n_0(n_0-1)^2\}\;(\ge 0)$; note that $d_{n_0} \in (0,1)$. Then
one can rewrite the extreme right of (2.20) by means of the identity

$$-a_{n_0}s_{n_0-1}^4 + b_{n_0}s_{n_0-1}^2 + c_{n_0} = \frac{d_{n_0}}{n_0-1}\,s_{n_0-1}^2\left(1-\frac{s_{n_0-1}^2}{\sigma^2}\right) - g_{n_0}s_{n_0-1}^2 + c_{n_0}. \qquad (2.22)$$

If the resulting lower bound for the lhs of (2.19) is nonnegative, one
proves (2.19) from (2.20) and (2.22), noting again that $s_n^2 < \sigma^2$ on the
set $[N=n]$ for all $n \le n_0-1$. Otherwise, noting that $d_{n_0} < 1$, one gets
from (2.19), (2.20) and (2.22) that

$$\text{lhs of (2.19)} \ge \sum_{n=m}^{n_0-2}E_{\sigma^2}\left[n^{-1}s_n^2\left(1-\frac{s_n^2}{\sigma^2}\right)I_{[N=n]}\right] + E_{\sigma^2}\left[(n_0-1)^{-1}s_{n_0-1}^2\left(1-\frac{s_{n_0-1}^2}{\sigma^2}\right)I_{[N\ge n_0-1]}\right], \qquad (2.23)$$

which is the lhs of (2.19) with $n_0$ replaced by $n_0-1$. Proceed
inductively to get either lhs of (2.19) $\ge 0$ at some stage, or finally
end with

$$\text{lhs of (2.19)} \ge E_{\sigma^2}\left[m^{-1}s_m^2\left(1-\frac{s_m^2}{\sigma^2}\right)I_{[N\ge m]}\right] \ge 0, \qquad (2.24)$$

as calculated earlier. This completes the proof of the theorem.
2.3 Asymptotic Risk Expansion for $\bar X_N$ and $\delta_N^b$
First we obtain an asymptotic (as $c \to 0$) expansion for the risk
of $\bar X_N$. Observe that

$$R(\theta,\sigma^2,\bar X_N) = E_{\sigma^2}\left[\frac{\sigma^2\,\mathrm{tr}(QV)}{N} + cN\right] = 2cn^* + cE_{\sigma^2}[(N-n^*)^2/N] \qquad (2.25)$$
$$= 2c^{1/2}\sigma(\mathrm{tr}\,QV)^{1/2} + cE_{\sigma^2}[(N-n^*)^2/N],$$

since $\sigma^2\,\mathrm{tr}(QV) = c\,n^{*2}$. From the definition of $N$, $N \to \infty$ a.s. as $c \to 0$,
so that $\tilde s_N^2 \to \sigma^2$ a.s. as $c \to 0$, where $\tilde s_n^2$ denotes the estimator (1.8)
appearing in the stopping rule (1.7). Also, using Anscombe's theorem,
$\sqrt{N}(\tilde s_N^2 - \sigma^2) \to_L N(0, 2\sigma^4/p)$, so that, using the delta method,
$\sqrt{N}(\tilde s_N - \sigma) \to_L N(0, \sigma^2/(2p))$.

Next use the inequality

$$(\mathrm{tr}\,QV/c)^{1/2}\,\tilde s_N \le N \le m + (\mathrm{tr}\,QV/c)^{1/2}\,\tilde s_{N-1}. \qquad (2.26)$$

Dividing all sides of (2.26) by $n^*$, letting $c \to 0$, and using
$\tilde s_N \to \sigma$ a.s. as $c \to 0$, it follows that $N/n^* \to 1$ a.s. as $c \to 0$. Thus
$\sqrt{n^*}(\tilde s_N - \sigma)/\sigma \to_L N(0, 1/(2p))$, and also $\sqrt{n^*}(\tilde s_{N-1} - \sigma)/\sigma \to_L N(0, 1/(2p))$.
Again from (2.26), one gets the inequality

$$(n^*)^{1/2}(\tilde s_N - \sigma)/\sigma \le (N-n^*)/(n^*)^{1/2} \le m/(n^*)^{1/2} + (n^*)^{1/2}(\tilde s_{N-1} - \sigma)/\sigma. \qquad (2.27)$$

Since $m/(n^*)^{1/2} \to 0$ as $c \to 0$, it follows from (2.27) that
$(N-n^*)/(n^*)^{1/2} \to_L N(0, 1/(2p))$ as $c \to 0$, which implies that
$(N-n^*)/N^{1/2} \to_L N(0, 1/(2p))$ as $c \to 0$, since $N/n^* \to 1$ a.s. as $c \to 0$. Also,
the uniform integrability of $(N-n^*)^2/N$ for all $c \le c_0$ (specified)
can be proved by repeating essentially the arguments of Ghosh and
Mukhopadhyay (1980). Thus $E_{\sigma^2}[(N-n^*)^2/N] \to 1/(2p)$ as $c \to 0$, and it
follows now from (2.25) that

$$R(\theta,\sigma^2,\bar X_N) = 2c^{1/2}\sigma(\mathrm{tr}\,QV)^{1/2} + \frac{c}{2p} + o(c), \quad \text{as } c \to 0. \qquad (2.28)$$
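The leading term of the expansion (2.28) is easy to see in simulation: with $Q = V = I_p$, the average realized loss of $\bar X_N$ under the rule (1.7) should approach $2c^{1/2}\sigma(\mathrm{tr}\,QV)^{1/2} = 2c^{1/2}\sigma p^{1/2}$ as $c \to 0$. The sketch below uses illustrative values ($p = 4$, $\sigma = 1$, $c = 0.0004$, so $n^* = 100$); the $c/(2p)$ correction is far below Monte Carlo resolution at this scale:

```python
import math
import random

def loss_of_mean(p, sigma, c, m, rng):
    """Sample sequentially by rule (1.7) with Q = V = I_p and theta = 0;
    return the realized loss (1.5) of the sample mean at stopping,
    ||xbar||^2 + c N, using running sums for O(p) work per step."""
    n = 0
    s = [0.0] * p      # running coordinate sums
    ss = 0.0           # running sum of squared coordinates
    while True:
        x = [rng.gauss(0.0, sigma) for _ in range(p)]
        n += 1
        for j in range(p):
            s[j] += x[j]
            ss += x[j] * x[j]
        if n < max(m, 2):
            continue
        mean_sq = sum(v * v for v in s) / (n * n)   # ||xbar||^2
        s2 = (ss - n * mean_sq) / ((n - 1) * p)     # (1.8) with V = I_p
        if n >= math.sqrt(s2 * p / c):              # (1.7), tr(QV) = p
            return mean_sq + c * n

p, sigma, c = 4, 1.0, 0.0004
rng = random.Random(4)
avg_loss = sum(loss_of_mean(p, sigma, c, 10, rng) for _ in range(400)) / 400
leading = 2 * math.sqrt(c) * sigma * math.sqrt(p)   # 2 c^(1/2) sigma (tr QV)^(1/2)
```

Here the loss splits roughly evenly between estimation error ($\approx \sigma^2 p/N$) and sampling cost ($\approx cN$), exactly as the balancing of (1.6) at $n^*$ predicts.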
Next, we find the asymptotic risk expansion for $\delta_N^b$. It follows
from (2.6) and (2.8)-(2.10) that

$$R(\theta,\sigma^2,\delta_N^b) = R(\theta,\sigma^2,\bar X_N) - 2b(p-2)\,E_{\sigma^2}\left[\frac{s_N^2}{N}\int_0^\infty \prod_{i=1}^p (1+2ta_i^{-1})^{-1/2}\exp\left(-Nt\sum_{i=1}^p \frac{\zeta_i^2}{\sigma^2 a_i(a_i+2t)}\right)dt\right]$$
$$\qquad + b^2\,E_{\sigma^2}\left[\frac{s_N^4}{N\sigma^2}\int_0^\infty \prod_{i=1}^p (1+2ta_i^{-1})^{-1/2}\exp\left(-Nt\sum_{i=1}^p \frac{\zeta_i^2}{\sigma^2 a_i(a_i+2t)}\right)dt\right]. \qquad (2.29)$$

As $c \to 0$, $s_N^2 \to p\sigma^2/(p+2)$ a.s., $s_N^4 \to p^2\sigma^4/(p+2)^2$ a.s., and
$Nc^{1/2} \to \sigma(\mathrm{tr}\,\Lambda)^{1/2} = \sigma(\mathrm{tr}\,QV)^{1/2}$ a.s. Hence, for $\zeta \ne 0$, as $c \to 0$,

$$c^{-1}\,\frac{s_N^4}{N}\int_0^\infty \prod_{i=1}^p (1+2ta_i^{-1})^{-1/2}\exp\left(-Nt\sum_{i=1}^p \frac{\zeta_i^2}{\sigma^2 a_i(a_i+2t)}\right)dt$$
$$= \frac{s_N^4}{N^2c}\int_0^\infty \prod_{i=1}^p (1+2ua_i^{-1}N^{-1})^{-1/2}\exp\left(-u\sum_{i=1}^p \frac{\zeta_i^2}{\sigma^2 a_i(a_i+2uN^{-1})}\right)du \qquad (2.30)$$
$$\to \frac{p^2\sigma^4}{(p+2)^2}\cdot\frac{1}{\sigma^2\,\mathrm{tr}(QV)}\cdot\frac{\sigma^2}{\zeta'\Lambda^{-2}\zeta} = \frac{p^2\sigma^4}{(p+2)^2\,\mathrm{tr}(QV)\,(\theta-\lambda)'V^{-1}Q^{-1}V^{-1}(\theta-\lambda)},$$

since $\mathrm{tr}\,\Lambda = \mathrm{tr}(QV)$, and

$$\zeta'\Lambda^{-2}\zeta = (\theta-\lambda)'D'\Lambda^{-2}D(\theta-\lambda) = (\theta-\lambda)'V^{-1}Q^{-1}V^{-1}(\theta-\lambda). \qquad (2.31)$$
Next we prove the uniform integrability (in $c \le c_0$) of the left hand
side of (2.30) when $\zeta \ne 0$. First note that, using
$t/(a_i+2t) \ge \min(t,1)/(a_i+2)$ for all $i$ (since $a_i > 0$), together with
$\prod_i(1+2ta_i^{-1})^{-1/2} \le K' t^{-p/2}$ for $t \ge 1$,

$$\text{lhs of (2.30)} \le \frac{s_N^4}{Nc}\left[\int_0^1 \exp(-NtL)\,dt + K'\int_1^\infty t^{-p/2}\exp(-NL)\,dt\right] \le \frac{K\,s_N^4}{N^2c}, \qquad (2.32)$$

where $L = \sum_{i=1}^p \zeta_i^2/(\sigma^2 a_i(a_i+2))\,(>0)$ and $K$, $K'$ are constants not
depending on $c$. Thus, it suffices to prove the uniform integrability
of $s_N^4/(N^2c)$ in $c \le c_0$. First use the inequality

$$E_{\sigma^2}[(s_N^4/(N^2c))\,I_A] \le [P_{\sigma^2}(A)]^{\delta/(1+\delta)}\,\left[E_{\sigma^2}\{(s_N^4/(N^2c))^{1+\delta}\}\right]^{1/(1+\delta)}, \qquad (2.33)$$

for some $\delta \in (0,1/2)$. Taking $\delta'$ in $(\delta, 1/2)$, and using Hölder's
inequality, one gets

$$E_{\sigma^2}[\{s_N^4/(N^2c)\}^{1+\delta}] \le \left[E_{\sigma^2}\{(N^2c)^{-(1+\delta')}\}\right]^{(1+\delta)/(1+\delta')}\left[E_{\sigma^2}\{s_N^{4(1+\delta)(1+\delta')/(\delta'-\delta)}\}\right]^{(\delta'-\delta)/(1+\delta')}. \qquad (2.34)$$

Next observe that for any arbitrary $\varepsilon$ in $(0,1)$,

$$E_{\sigma^2}[(N^2c)^{-(1+\delta')}] \le (m^2c)^{-(1+\delta')}\,P_{\sigma^2}(N \le \varepsilon n^*) + (\varepsilon^2\,\mathrm{tr}(QV)\sigma^2)^{-(1+\delta')}. \qquad (2.35)$$

To proceed further we shall require the following lemma.
Lemma 2.1  For any 0 < ε < 1 and p ≥ 1, P_σ²(N ≤ εn*) = O(c^{(m−1)p/2}).

Proof: Note that

    P_σ²(N ≤ εn*) = Σ_{n=m}^{[εn*]} P_σ²(N = n),    (2.36)

where [u] denotes the largest integer ≤ u. Now,

    P_σ²(N = m) = P_σ²(s_m² ≤ cm²/tr(QV))
      = P(χ²_{(m−1)p} ≤ (m−1)p·cm²/(σ² tr(QV))).    (2.37)

Using the fact that

    P(χ²_k ≤ d) = d^{k/2}e^{−d/2}/{2^{k/2}Γ(k/2 + 1)} + P(χ²_{k+2} ≤ d)
      = O(d^{k/2}) + O(d^{(k+2)/2}) = O(d^{k/2}),    (2.38)

we get

    P_σ²(N = m) = O(c^{(m−1)p/2}).    (2.39)
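The chi-square cdf recurrence invoked in (2.38) is easy to check numerically. The sketch below (an added illustration, not part of the original text) approximates the cdf by direct trapezoidal integration of the density, using even degrees of freedom to avoid the integrable singularity at zero:

```python
import math
import numpy as np

def chi2_cdf(k, d, n_grid=200_000):
    """P(chi^2_k <= d) by trapezoidal integration of the density
    (adequate for a sanity check; k even avoids the singularity at 0)."""
    x = np.linspace(1e-12, d, n_grid)
    pdf = x ** (k / 2 - 1) * np.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))
    return float(((pdf[:-1] + pdf[1:]) * np.diff(x)).sum() / 2)

# Check the recurrence of (2.38):
# P(chi^2_k <= d) = d^{k/2} e^{-d/2} / (2^{k/2} Gamma(k/2+1)) + P(chi^2_{k+2} <= d)
for k in (2, 4, 6):
    for d in (0.5, 1.0, 3.0):
        term = d ** (k / 2) * math.exp(-d / 2) / (2 ** (k / 2) * math.gamma(k / 2 + 1))
        assert abs(chi2_cdf(k, d) - (term + chi2_cdf(k + 2, d))) < 1e-4
```

Since the added term is O(d^{k/2}) as d → 0, the recurrence immediately yields P(χ²_k ≤ d) = O(d^{k/2}), which is the order used in (2.39).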
Therefore all we need show is that

    Σ_{n=m+1}^{[εn*]} P_σ²(N = n) = O(c^{(m−1)p/2}).    (2.40)

Now, for n ≥ m+1,

    P_σ²(N = n) ≤ P_σ²(s²_{n−1} > c(n−1)²/tr(QV))
      = P(χ²_{(n−1)p} > (n−1)p·x_n)
      ≤ inf_{h>0} exp{−h(n−1)p·x_n}E[exp(hχ²_{(n−1)p})]
      = inf_{h>0} exp{−h(n−1)p·x_n}(1−2h)^{−(n−1)p/2}
      = [x_n exp(1 − x_n)]^{(n−1)p/2},    (2.41)

where x_n = c(n−1)²/(σ² tr(QV)), and where for the third inequality
in (2.41) we use the fact that P(U ≥ d) ≤ inf_{h>0} e^{−hd}E[e^{hU}].
Next observe that for n ≤ [εn*], x_n ≤ ε² < 1. Since x·exp(1−x) is
increasing in x for 0 < x ≤ 1, one gets

    c^{−(m−1)p/2} Σ_{n=m+1}^{[εn*]} P_σ²(N = n)
      ≤ c^{−(m−1)p/2} Σ_{n=m+1}^{[εn*]} [x_n exp(1 − x_n)]^{(n−1)p/2}    (2.42)
      ≤ {e/(σ² tr(QV))}^{(m−1)p/2} Σ_{n=m+1}^∞ (n−1)^{(m−1)p}
          [ε² exp(1 − ε²)]^{(n−m)p/2}.

The series in the extreme rhs of (2.42) is convergent. Therefore

    Σ_{n=m+1}^{[εn*]} P_σ²(N = n) = O(c^{(m−1)p/2}),    (2.43)

which, together with (2.39), proves the lemma.
Hence, for m > 2 (with δ and δ′ chosen suitably),

    rhs of (2.35) ≤ K(m²c)^{−(1+δ)(1+δ′)/(δ′−δ)} c^{(m−1)p/2}
      + (ε² tr(QV)σ²)^{−(1+δ)(1+δ′)/(δ′−δ)} = O(1),    (2.44)

where K (> 0) is a constant not depending on c. Further, using the
backward martingale property of T_n = {(p+2)/p}s_n² and Doob's
maximal inequality for submartingales, it follows that for every
r ≥ 1,

    E_σ²(s_N^{2r}) = {p/(p+2)}^r E_σ²(T_N^r)
      ≤ {p/(p+2)}^r E_σ²[(sup_{n≥m} T_n)^r] ≤ K₁E_σ²(T_m^r) < ∞,    (2.45)

where K₁ is a constant which only depends on r. Combining
(2.33)−(2.35) and (2.44)−(2.45), the uniform integrability
(in c ≤ c₀) of s_N⁴/(N²c) follows.
Thus, for m > 2 and θ ≠ λ (i.e., ζ ≠ 0), as c → 0,

    c⁻¹E_σ²[(s_N²/N)∫₀^∞ ∏_{i=1}^p (1+2ta_i⁻¹)^{−1/2}
        exp{−(Nt/σ²) Σ_{i=1}^p ζ_i²/(a_i(a_i+2t))} dt]
      → {pσ²/((p+2)(tr QV))} {(θ−λ)′V⁻¹Q⁻¹V⁻¹(θ−λ)}⁻¹.    (2.46)

Similarly, it can be shown that

    c⁻¹E_σ²[(s_N⁴/(Nσ²))∫₀^∞ ∏_{i=1}^p (1+2ta_i⁻¹)^{−1/2}
        exp{−(Nt/σ²) Σ_{i=1}^p ζ_i²/(a_i(a_i+2t))} dt]
      → {p²σ²/((p+2)²(tr QV))} {(θ−λ)′V⁻¹Q⁻¹V⁻¹(θ−λ)}⁻¹    (2.47)

as c → 0. Hence, from (2.28)−(2.29) and (2.46)−(2.47) it follows
that for m > 2 and θ ≠ λ, as c → 0,

    R(θ,σ²,δ_N^b) = 2c^{1/2}σ(tr QV)^{1/2} + {1/(2p)}c
      − cbσ²{p²/((p+2)²(tr QV))}(2(p²−4)/p − b)
        {(θ−λ)′V⁻¹Q⁻¹V⁻¹(θ−λ)}⁻¹ + o(c).    (2.48)
For θ = λ, i.e., ζ = 0, it follows from (2.29) that

    R(λ,σ²,δ_N^b) = R(λ,σ²,X̄_N)    (2.49)
      − 2b(p−2)E_σ²[s_N²/N]∫₀^∞ ∏_{i=1}^p (1+2ta_i⁻¹)^{−1/2} dt
      + b²E_σ²[s_N⁴/(Nσ²)]∫₀^∞ ∏_{i=1}^p (1+2ta_i⁻¹)^{−1/2} dt.

Note that ∫₀^∞ ∏_{i=1}^p (1+2ta_i⁻¹)^{−1/2} dt = E{(Σ_{i=1}^p W_i/a_i)⁻¹},
where the W_i's are iid χ₁². Note also that
s_N²/(Nc^{1/2}) → pσ/{(p+2)(tr QV)^{1/2}} a.s. and
s_N⁴/(Nc^{1/2}σ²) → p²σ/{(p+2)²(tr QV)^{1/2}} a.s. as c → 0. Hence,
using the uniform integrability (in c ≤ c₀) of s_N²/(Nc^{1/2}) and
s_N⁴/(Nc^{1/2}) for m ≥ 2, one gets from (2.28) and
(2.49) that for m > 2, as c → 0,
    R(λ,σ²,δ_N^b) = 2c^{1/2}σ(tr QV)^{1/2}
      − c^{1/2}σb{p²/(p+2)²}(2(p²−4)/p − b)
        E{(Σ_{i=1}^p W_i/a_i)⁻¹}(tr QV)^{−1/2} + o(c^{1/2}).    (2.50)

Hence, for 0 < b < 2(p−2) and m > 2, asymptotically (as c → 0), for
θ ≠ λ, the percent risk improvement of δ_N^b over X̄_N is

    100{(1/2)b(2(p²−4)/p − b){p²σ/(p+2)²}(tr QV)^{−3/2}
      {(θ−λ)′V⁻¹Q⁻¹V⁻¹(θ−λ)}⁻¹}c^{1/2} + o(c^{1/2}).    (2.51)

For θ = λ, 0 < b < 2(p−2), m > 2, asymptotically (as c → 0) the
percent risk improvement of δ_N^b over X̄_N is

    100{(1/2)b(2(p²−4)/p − b){p²/(p+2)²}(tr QV)⁻¹
      E{(Σ_{i=1}^p W_i/a_i)⁻¹}} + o(1).    (2.52)
Observe that the dominant term in both (2.51) and (2.52) is
maximized when b = (p²−4)/p. Unlike the fixed sample case, the
optimal choice of b in the sequential case depends on the unknown
parameters θ and σ². From an asymptotic point of view, however, as
is evident in (2.51) and (2.52), it appears that for small c,
b = (p²−4)/p is the optimal choice, which is different from p−2. We
may also note that the a_i's are the eigenvalues of Λ = DVD′, i.e.,
the eigenvalues of D′DV = QV. For Q = V = I_p, the expressions given
in (2.51) and (2.52) simplify respectively to

    50b(2(p²−4)/p − b){p^{1/2}σ/(p+2)²}{(θ−λ)′(θ−λ)}⁻¹c^{1/2} + o(c^{1/2})    (2.53)

and

    50b(2(p²−4)/p − b){p/((p−2)(p+2)²)} + o(1).    (2.54)
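The comparison between b = p−2 and b = (p²−4)/p can be checked numerically. The following sketch (an added illustration, not part of the original study) evaluates the dominant-term coefficient b(2(p²−4)/p − b) common to (2.53) and (2.54):

```python
def dominant_coeff(b, p):
    """Coefficient b * (2(p^2-4)/p - b) appearing in (2.53)-(2.54)."""
    return b * (2 * (p ** 2 - 4) / p - b)

p = 3
b_fixed = p - 2               # classical fixed-sample James-Stein choice
b_seq = (p ** 2 - 4) / p      # asymptotically optimal sequential choice

# grid search over the allowed range (0, 2(p-2)) confirms the maximizer
grid = [i / 1000 for i in range(1, 2 * (p - 2) * 1000)]
b_star = max(grid, key=lambda b: dominant_coeff(b, p))
```

For p = 3 the coefficient is b(10/3 − b), maximized at b = 5/3 = (p²−4)/p, strictly larger than its value at b = p−2 = 1, in line with the asymptotic expansions above.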
Remark. It should be noted that if, in defining δ_n^b, one uses the
usual variance estimator s̃_n² instead of s_n², and then carries out
the asymptotic calculations in the same way as before, one will find
that the optimal choice of b is p−2. Then, however, our method of
proof does not give an exact dominance result as in Theorem 2.1.
2.4 A Monte Carlo Study

For simplicity, consider in this section the case when
λ = 0, Q = V = I_p. In this special case, Takada (1984) has shown
that if one defines a statistic η̂_n, n ≥ 2, as in (2.55), then the
estimator δ̃_N^b = δ̃_N^b(X₁,..., X_N), where

    δ̃_n^b = (1 − bη̂_n/(nX̄_n′X̄_n))X̄_n,    (2.56)

dominates X̄_N for all 0 < b < 2(p−2). The difficulty with the
estimator η̂_n is that even for large n it does not stabilize, since
for every n ≥ 2, η̂_n is distributed as (p+2)⁻¹ times a χ² random
variable whose distribution does not change with n. In this section,
our objective is to compare the risk performance of X̄_N, δ_N^b and
δ̃_N^b with b = p−2 and b = (p²−4)/p.
For the Monte Carlo simulation, we take p = 3. Also, in this
section, for actual risk simulation, we consider the plus-rule
versions δ_N^{p−2} and δ̃_N^{p−2}, or δ_N^{(p²−4)/p} and
δ̃_N^{(p²−4)/p}, where

    δ_n^b = δ_n^b(X₁,..., X_n) = (1 − bs_n²/(nX̄_n′X̄_n))₊X̄_n, n ≥ 2,

    δ̃_n^b = δ̃_n^b(X₁,..., X_n) = (1 − bη̂_n/(nX̄_n′X̄_n))₊X̄_n, n ≥ 2.

In the above, a₊ = max(a, 0). Such plus-rule versions of James−Stein
estimators prevent overshrinking and, in fixed sample situations,
perform better than the usual James−Stein estimators.
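The sequential procedure just described can be sketched in a few lines of code. This is an illustrative reimplementation, not the original program: it assumes the stopping rule N = inf{n ≥ m : n ≥ (p s_n²/c)^{1/2}} (the natural empirical version of n* = c^{−1/2}σ(tr QV)^{1/2} with Q = V = I_p) and takes s_n² = {p/(p+2)} times the usual pooled variance estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

def one_experiment(theta, c, p=3, m=3, b=None):
    """One sequential experiment: sample X_i ~ N(theta, I_p) until the
    assumed stopping criterion n >= sqrt(p * s_n^2 / c) is met, then
    return the plus-rule James-Stein estimate, the sample mean and N.
    s_n^2 = (p/(p+2)) * pooled variance (an assumption of this sketch)."""
    if b is None:
        b = (p ** 2 - 4) / p
    X = rng.normal(theta, 1.0, size=(m, p))
    while True:
        n = len(X)
        xbar = X.mean(axis=0)
        s2 = (p / (p + 2)) * ((X - xbar) ** 2).sum() / ((n - 1) * p)
        if n >= np.sqrt(p * s2 / c):
            break
        X = np.vstack([X, rng.normal(theta, 1.0, size=(1, p))])
    shrink = max(0.0, 1.0 - b * s2 / (n * (xbar @ xbar)))  # plus rule
    return shrink * xbar, xbar, n

theta, c = np.zeros(3), 0.25
loss_js = loss_mean = 0.0
for _ in range(500):
    delta, xbar, n = one_experiment(theta, c)
    loss_js += (delta - theta) @ (delta - theta) + c * n
    loss_mean += (xbar - theta) @ (xbar - theta) + c * n
risk_js, risk_mean = loss_js / 500, loss_mean / 500
```

Since shrinking toward λ = 0 can only reduce ‖δ − θ‖ when θ = 0, risk_js ≤ risk_mean holds here by construction; for θ ≠ 0 the comparison is the content of Theorem 2.1.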
To simulate the sequential sampling procedure and evaluate the
estimators under consideration, a large pool of trivariate N(θ, I₃)
variables was generated, with θ′ = (0,0,0) and with several nonzero
values of θ (one for each value of ‖θ‖ considered in Table 2.1).
Also, c, the cost per sample unit, is taken as c = .01, .05, .1,
.25, .5 and 1. It should be noted that the risks of the estimators
δ_N^b and δ̃_N^b depend on θ only through ‖θ‖, while X̄_N has a risk
which does not depend on ‖θ‖. For each fixed θ and c under
consideration, samples were taken from the pool sequentially, from
the top down, until 1000 experiments were completed.
A single experiment consists of taking sequential samples from the
pool until the stopping criterion is met. At this point, sampling
stops, the number of samples is recorded, and the estimators X̄_N,
δ_N^{p−2}, δ̃_N^{p−2}, δ_N^{(p²−4)/p}, δ̃_N^{(p²−4)/p} and their
associated losses are computed.

On the completion of 1000 experiments, we compute the average losses
for X̄_N, δ_N^{p−2}, δ̃_N^{p−2}, δ_N^{(p²−4)/p} and δ̃_N^{(p²−4)/p};
these are the simulated versions of the corresponding risks. Also,
at this point, we compute the percentage loss improvements

    100{L(θ,X̄_N) − L(θ,δ_N^{p−2})}/L(θ,X̄_N);
    100{L(θ,X̄_N) − L(θ,δ_N^{(p²−4)/p})}/L(θ,X̄_N);
    100{L(θ,X̄_N) − L(θ,δ̃_N^{p−2})}/L(θ,X̄_N);
    100{L(θ,X̄_N) − L(θ,δ̃_N^{(p²−4)/p})}/L(θ,X̄_N).

Our simulation findings are summarized in Table 2.1 and Figures 2.1
to 2.6. It is clear from the table that, as in the fixed sample
case, when λ = 0 the risk improvement of all the estimators is most
substantial when ‖θ‖ = 0, and the improvement keeps diminishing as
‖θ‖ moves further and further away from zero. Also, it is clearly
evident that with the proposed stopping rule b = (p²−4)/p does
better than b = p−2. This is also clear from the asymptotic risk
expansion. Also, for a fixed θ ≠ 0, as c decreases, i.e., as the
average sample size gets larger, the percentage risk improvement
decreases, as in the fixed sample case.
Table 2.1 The Risk and the Percentage Risk Improvements

[Table 2.1 reports, for each cost c = 1, .50, .25, .10, .05, .01 and
each value of ‖θ‖ considered, the average sample size N̄, the
simulated risks R(θ,σ²,X̄_N), R(θ,σ²,δ_N^{p−2}), R(θ,σ²,δ̃_N^{p−2}),
R(θ,σ²,δ_N^{(p²−4)/p}) and R(θ,σ²,δ̃_N^{(p²−4)/p}), and the four
corresponding percentage risk improvements over X̄_N.]
The opposite is the case when ‖θ‖ = 0. The main reason is that,
when λ = 0, N‖δ_N − X̄_N‖² behaves as a multiple of c^{1/2}‖θ‖⁻² when
θ ≠ 0, while N‖δ_N − X̄_N‖² converges in law to a nondegenerate
random variable when θ = 0.
Figure 2.1 plots the risk of the sample mean X̄_N, the James−Stein
estimator at b = p−2, δ_N^{p−2}, and Takada's estimator at b = p−2,
δ̃_N^{p−2}, versus ‖θ‖, for six levels of cost. As can be seen from
the graphs, δ_N^{p−2}, for each level of cost, has smaller risk than
δ̃_N^{p−2}, both being better than X̄_N, for each level of ‖θ‖.
Figure 2.2 plots the percent risk improvement of δ_N^{p−2} and
δ̃_N^{p−2} over X̄_N, versus cost, for four levels of ‖θ‖. Here we
see that the percent risk improvement is greatest at ‖θ‖ = 0 and
decreases as cost decreases. Further, for fixed cost, the percent
risk improvement decreases as ‖θ‖ moves away from zero. Figure 2.3
plots the risks of the sample mean X̄_N, the James−Stein estimator at
b = (p²−4)/p, δ_N^{(p²−4)/p}, and Takada's estimator at
b = (p²−4)/p, δ̃_N^{(p²−4)/p}, versus ‖θ‖, for the six previous
levels of cost. The patterns and conclusions for these plots are
identical to those of Figure 2.1. Figure 2.4 plots the percent risk
improvement of δ_N^{(p²−4)/p} and δ̃_N^{(p²−4)/p} over X̄_N, versus
cost, for the four previous levels of ‖θ‖. The patterns and
conclusions for these plots are identical to those of Figure 2.2.
Figure 2.5 is a plot of the risks of X̄_N, δ_N^{p−2} and
δ_N^{(p²−4)/p} versus ‖θ‖, for the six previous levels of cost. This
figure gives us an idea of how Figures 2.1 and 2.3 compare as far as
the James−Stein estimators are concerned. Here, for the first time,
we can graphically see that δ_N^{(p²−4)/p} has smaller risk than
δ_N^{p−2} (while, as before, both dominate X̄_N). Hence, we have more
evidence in favor of (p²−4)/p as the optimal choice of b. Figure 2.6
plots the percent risk improvements of δ_N^{p−2}, δ̃_N^{p−2},
δ_N^{(p²−4)/p} and δ̃_N^{(p²−4)/p} over X̄_N, versus cost, for two
levels of ‖θ‖. Here, for the first time, we can graphically see that
δ_N^{(p²−4)/p} has the greatest improvement while δ̃_N^{(p²−4)/p} is
second, δ_N^{p−2} is third and δ̃_N^{p−2} has the least improvement
over X̄_N, giving once again more evidence in favor of (p²−4)/p as
the optimal choice of b.
[Figure 2.1: risks of X̄_N, δ_N^{p−2} and δ̃_N^{p−2} versus ‖θ‖, for
cost = 1.0, .50, .25, .10, .05, .01.]
Figure 2.1 Risk Analysis at b=p−2.

[Figure 2.2: percent risk improvement of δ_N^{p−2} and δ̃_N^{p−2}
over X̄_N, versus cost.]
Figure 2.2 Percent Risk Improvement at b=p−2.

[Figure 2.3: risks of X̄_N, δ_N^{(p²−4)/p} and δ̃_N^{(p²−4)/p} versus
‖θ‖, for the six levels of cost.]
Figure 2.3 Risk Analysis at b=(p²−4)/p.

[Figure 2.4: percent risk improvement of δ_N^{(p²−4)/p} and
δ̃_N^{(p²−4)/p} over X̄_N, versus cost.]
Figure 2.4 Percent Risk Improvement at b=(p²−4)/p.

[Figure 2.5: risks of X̄_N, δ_N^{p−2} and δ_N^{(p²−4)/p} versus ‖θ‖,
for the six levels of cost.]
Figure 2.5 Risk Analysis for the James−Stein Estimators.

[Figure 2.6: percent risk improvements of all four shrinkage
estimators over X̄_N, versus cost, for ‖θ‖ = 0 and ‖θ‖ = 1.]
Figure 2.6 Percent Risk Improvement for All Estimators.
CHAPTER THREE
SEQUENTIAL SHRINKAGE ESTIMATION OF
LINEAR REGRESSION PARAMETERS

3.1 Introduction

Consider the linear regression model Y_i = z_i′β + ε_i, i = 1,2,...,
where the ε_i's are independent and identically distributed N(0,σ²),
β (p×1) is unknown, and z₁, z₂,... are known p×1 vectors. Let
Z_n′ = (z₁,..., z_n) and Y_n = (Y₁,..., Y_n)′. Assume that the rank
of Z_n′Z_n is p. The problem is estimation of β. Given Y_n, let the
loss incurred in estimating β by δ_n(Y_n) be

    L_n(β, δ_n) = (δ_n − β)′(Z_n′Z_n)(δ_n − β) + cn.    (3.1)

The usual estimate of β is the least squares estimator
β̂_n = (Z_n′Z_n)⁻¹Z_n′Y_n, which is distributed N(β, σ²(Z_n′Z_n)⁻¹),
with risk

    R(β,σ²,β̂_n) = EL_n(β, β̂_n)
      = E[(β̂_n − β)′(Z_n′Z_n)(β̂_n − β)] + cn = pσ² + cn.    (3.2)

The above risk is minimized at n = n* ≡ c^{−1/2}(pσ²)^{1/2}. However,
if σ² is unknown, there does not exist any fixed sample size which
minimizes (3.2) simultaneously for all σ². Again, motivated from the
optimal fixed sample size n*, we propose the following sequential
procedure:

    N = inf{n ≥ m : n ≥ c^{−1/2}(p MSE_n)^{1/2}},    (3.3)

where m ≥ p+1 is the initial sample size and

    MSE_n = ‖Y_n − Z_nβ̂_n‖²/(n − p)    (3.4)

for each n ≥ p+1. The above stopping rule was first considered by
Mukhopadhyay (1974), who proposed the estimator β̂_N for β and
studied asymptotic properties of N and β̂_N.
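The sampling scheme defined by (3.3)−(3.4) can be sketched as follows. This is a minimal illustration, assuming a simulated standard normal design (the text only requires Z_n′Z_n to have rank p):

```python
import numpy as np

rng = np.random.default_rng(1)

def sequential_ls(beta, sigma, c, m=None):
    """Sample (z_i, Y_i) pairs one at a time and stop at
    N = inf{n >= m : n >= sqrt(p * MSE_n / c)}  (rule (3.3)),
    returning N and the least squares estimate beta_hat_N.
    The design vectors z_i are simulated as standard normal here."""
    p = len(beta)
    m = m if m is not None else p + 1
    Z, Y = [], []
    while True:
        z = rng.normal(size=p)
        Z.append(z)
        Y.append(z @ beta + sigma * rng.normal())
        n = len(Y)
        if n < m:
            continue
        Zn, Yn = np.array(Z), np.array(Y)
        beta_hat = np.linalg.lstsq(Zn, Yn, rcond=None)[0]
        mse = ((Yn - Zn @ beta_hat) ** 2).sum() / (n - p)   # (3.4)
        if n >= np.sqrt(p * mse / c):
            return n, beta_hat

N, beta_hat = sequential_ls(beta=np.ones(3), sigma=1.0, c=0.01)
# here n* = c^{-1/2}(p sigma^2)^{1/2} = sqrt(300) ~ 17.3; N is typically nearby
```

Because MSE_n estimates σ², the random boundary (p MSE_n/c)^{1/2} tracks the unknown optimal fixed sample size n*.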
Our next step is to show that the above procedure can be motivated
from a minimax criterion. To this end, first let β have a N(μ, τ²I_p)
prior. Then the posterior distribution of β given Y_n = y_n is

    N(μ + τ²Z_n′(τ²Z_nZ_n′ + σ²I_n)⁻¹(y_n − Z_nμ),
      τ²I_p − τ⁴Z_n′(τ²Z_nZ_n′ + σ²I_n)⁻¹Z_n).    (3.5)
Next we prove the following matrix lemma.

Lemma 3.1

    Z_n′(τ²Z_nZ_n′ + σ²I_n)⁻¹ = (τ²Z_n′Z_n + σ²I_p)⁻¹Z_n′.

Proof: Use the identity
(τ²Z_n′Z_n + σ²I_p)Z_n′ = Z_n′(τ²Z_nZ_n′ + σ²I_n).
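The identity in Lemma 3.1 is easy to verify numerically; the following added sketch checks it on a randomly generated design:

```python
import numpy as np

rng = np.random.default_rng(2)

# Numerical check of Lemma 3.1:
#   Z'(tau^2 Z Z' + sigma^2 I_n)^{-1} = (tau^2 Z'Z + sigma^2 I_p)^{-1} Z'
n, p, tau2, sig2 = 8, 3, 2.0, 0.5
Z = rng.normal(size=(n, p))

lhs = Z.T @ np.linalg.inv(tau2 * Z @ Z.T + sig2 * np.eye(n))
rhs = np.linalg.inv(tau2 * Z.T @ Z + sig2 * np.eye(p)) @ Z.T

assert np.allclose(lhs, rhs)
```

The practical value of the identity is that it lets the n×n posterior computation in (3.5) be carried out with p×p inverses only.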
Using Lemma 3.1 it follows that

    τ²Z_n′(τ²Z_nZ_n′ + σ²I_n)⁻¹(y_n − Z_nμ)
      = τ²(τ²Z_n′Z_n + σ²I_p)⁻¹Z_n′(y_n − Z_nμ)    (3.6)
      = {I_p + (σ²/τ²)(Z_n′Z_n)⁻¹}⁻¹(Z_n′Z_n)⁻¹Z_n′(y_n − Z_nμ)
      = {I_p + (σ²/τ²)(Z_n′Z_n)⁻¹}⁻¹(β̂_n − μ).    (3.7)

Also, using Lemma 3.1,

    τ²I_p − τ⁴Z_n′(τ²Z_nZ_n′ + σ²I_n)⁻¹Z_n
      = τ²{I_p − τ²(τ²Z_n′Z_n + σ²I_p)⁻¹Z_n′Z_n}    (3.8)
      = τ²σ²(τ²Z_n′Z_n + σ²I_p)⁻¹
      = σ²{Z_n′Z_n + (σ²/τ²)I_p}⁻¹
      = σ²{I_p + (σ²/τ²)(Z_n′Z_n)⁻¹}⁻¹(Z_n′Z_n)⁻¹.

Thus from (3.5) and (3.7)−(3.8) it follows that the posterior
distribution of β given Y_n = y_n is

    N(μ + {I_p + (σ²/τ²)(Z_n′Z_n)⁻¹}⁻¹(β̂_n − μ),
      σ²{I_p + (σ²/τ²)(Z_n′Z_n)⁻¹}⁻¹(Z_n′Z_n)⁻¹).    (3.9)

The loss (3.1) being quadratic, the Bayes estimator of β is

    δ_n^B = μ + {I_p + (σ²/τ²)(Z_n′Z_n)⁻¹}⁻¹(β̂_n − μ),

with risk σ² tr[{I_p + (σ²/τ²)(Z_n′Z_n)⁻¹}⁻¹] + cn.

Next, consider the sequence of {N(μ, mI_p), m ≥ 1} priors on β. Then
the Bayes estimator becomes

    δ_n^{B,m} = μ + {I_p + (σ²/m)(Z_n′Z_n)⁻¹}⁻¹(β̂_n − μ),

with risk σ² tr[{I_p + (σ²/m)(Z_n′Z_n)⁻¹}⁻¹] + cn. Here, the Bayes
risk converges to pσ² + cn as m → ∞, and R(β,σ²,β̂_n) = pσ² + cn
for all β. Hence n*, the sample size, and β̂_{n*}, the estimator,
form a sequential minimax rule. Consequently (3.3) can be viewed as
an empirical minimax stopping rule.
In Section 3.2 we show the dominance of a class of estimators over
β̂_N for the loss (3.1) and the stopping rule (3.3). In Section 3.3
we provide an asymptotic risk expansion for β̂_N and an asymptotic
risk expansion for the shrinkage estimators dominating β̂_N, both up
to the second order terms.

3.2 A Class of James−Stein Estimators Dominating β̂_N

In this section we consider the class of James−Stein estimators
δ_N^b(Y_N), where

    δ_n^b = δ_n^b(Y_n)
      = λ + (1 − bs_n²/{(β̂_n−λ)′(Z_n′Z_n)(β̂_n−λ)})(β̂_n − λ)    (3.10)

for every n ≥ p+1, where s_n² = {p/(p+2)}MSE_n, b is a constant, and
λ is the known point towards which we shrink. The main result of
this section is as follows:

Theorem 3.1  Under the stopping rule (3.3) with m ≥ 2p, and under
the loss (3.1),

    R(β,σ²,δ_N^b) ≤ R(β,σ²,β̂_N)

for every b ∈ (0, 2(p−2)).
Proof: Note that

    R(β,σ²,δ_N^b) − R(β,σ²,β̂_N)    (3.11)
      = b²E_{β,σ²}[s_N⁴/{(β̂_N−λ)′(Z_N′Z_N)(β̂_N−λ)}]
      − 2bE_{β,σ²}[s_N²(β̂_N−λ)′(Z_N′Z_N)(β̂_N−β)/{(β̂_N−λ)′(Z_N′Z_N)(β̂_N−λ)}]
      = Σ_{n=m}^∞ E_{β,σ²}[{b²s_n⁴/((β̂_n−λ)′(Z_n′Z_n)(β̂_n−λ))
          − 2bs_n²(β̂_n−λ)′(Z_n′Z_n)(β̂_n−β)/((β̂_n−λ)′(Z_n′Z_n)(β̂_n−λ))}I_{[N=n]}].
To proceed further we shall require the following lemma concerning
the orthogonal decomposition of SSE_n (= (n−p)MSE_n) in regression.

Lemma 3.2  Let Y_n = Z_nβ + e_n, where e_n ~ N(0, σ²I_n) for every
n ≥ p+1. Assume (Z_n′Z_n) to be invertible for every n ≥ p+1 and
define SSE_n = ‖Y_n − Z_nβ̂_n‖². Then the following results hold:

    (i) Z_{n+1}′Z_{n+1} = Z_n′Z_n + z_{n+1}z_{n+1}′;

    (ii) (Z_{n+1}′Z_{n+1})⁻¹
           = (Z_n′Z_n)⁻¹(I_p − η_n⁻¹z_{n+1}z_{n+1}′(Z_n′Z_n)⁻¹),
         where η_n = 1 + z_{n+1}′(Z_n′Z_n)⁻¹z_{n+1};

    (iii) β̂_{n+1} = β̂_n + η_n⁻¹(Z_n′Z_n)⁻¹z_{n+1}ε_n, where
          β̂_n = (Z_n′Z_n)⁻¹Z_n′Y_n and ε_n = Y_{n+1} − z_{n+1}′β̂_n;

    (iv) SSE_{n+1} = SSE_n + ε_n²/η_n;

    (v) Y_n − Z_nβ̂_n is distributed independently of ε_n.
Proof of (i): Note that Z_{n+1}′ = (Z_n′, z_{n+1}). Then

    Z_{n+1}′Z_{n+1} = Z_n′Z_n + z_{n+1}z_{n+1}′.    (3.12)

Proof of (ii): Using the matrix identity from Rao (1965) that for
any nonsingular matrix A (p×p) and (p×1) vectors u and v,

    (A + uv′)⁻¹ = A⁻¹ − A⁻¹uv′A⁻¹/(1 + v′A⁻¹u),    (3.13)

one gets

    (Z_{n+1}′Z_{n+1})⁻¹ = (Z_n′Z_n + z_{n+1}z_{n+1}′)⁻¹    (3.14)
      = (Z_n′Z_n)⁻¹ − η_n⁻¹(Z_n′Z_n)⁻¹z_{n+1}z_{n+1}′(Z_n′Z_n)⁻¹
      = (Z_n′Z_n)⁻¹(I_p − η_n⁻¹z_{n+1}z_{n+1}′(Z_n′Z_n)⁻¹).

Proof of (iii): Note that

    β̂_{n+1} = (Z_{n+1}′Z_{n+1})⁻¹Z_{n+1}′Y_{n+1}    (3.15)
      = (Z_n′Z_n)⁻¹{I_p − η_n⁻¹z_{n+1}z_{n+1}′(Z_n′Z_n)⁻¹}(Z_n′Y_n + z_{n+1}Y_{n+1})
      = β̂_n − η_n⁻¹(Z_n′Z_n)⁻¹z_{n+1}z_{n+1}′β̂_n
        + Y_{n+1}(Z_n′Z_n)⁻¹z_{n+1}{1 − η_n⁻¹z_{n+1}′(Z_n′Z_n)⁻¹z_{n+1}}
      = β̂_n − η_n⁻¹(Z_n′Z_n)⁻¹z_{n+1}z_{n+1}′β̂_n + η_n⁻¹Y_{n+1}(Z_n′Z_n)⁻¹z_{n+1}
      = β̂_n + η_n⁻¹(Z_n′Z_n)⁻¹z_{n+1}ε_n,

where in the fourth equality in the rhs of (3.15) we use the
definition of η_n (so that 1 − η_n⁻¹z_{n+1}′(Z_n′Z_n)⁻¹z_{n+1} = η_n⁻¹).

Proof of (iv): Note that

    SSE_{n+1} = ‖Y_{n+1} − Z_{n+1}β̂_{n+1}‖²    (3.16)
      = ‖Y_n − Z_nβ̂_{n+1}‖² + (Y_{n+1} − z_{n+1}′β̂_{n+1})²
      = ‖Y_n − Z_nβ̂_n − η_n⁻¹ε_nZ_n(Z_n′Z_n)⁻¹z_{n+1}‖²
        + {ε_n − η_n⁻¹ε_nz_{n+1}′(Z_n′Z_n)⁻¹z_{n+1}}²
      = SSE_n + η_n⁻²ε_n²z_{n+1}′(Z_n′Z_n)⁻¹z_{n+1} + η_n⁻²ε_n²
      = SSE_n + η_n⁻²ε_n²(η_n − 1) + η_n⁻²ε_n²
      = SSE_n + ε_n²/η_n,

where in the third equality in the rhs of (3.16) we use part (iii)
(the cross term vanishes since Z_n′(Y_n − Z_nβ̂_n) = 0), while in
the remaining equalities we use the definitions of η_n and ε_n.

Proof of (v): Since Y_n − Z_nβ̂_n and ε_n are both normally
distributed, to prove their independence we need only show that
their covariance is zero. To this end note that

    Cov(Y_n − Z_nβ̂_n, ε_n)    (3.17)
      = Cov(Y_n − Z_nβ̂_n, Y_{n+1} − z_{n+1}′β̂_n)
      = −{Cov(Y_n, β̂_n) − Z_n Var(β̂_n)}z_{n+1}
      = −{σ²Z_n(Z_n′Z_n)⁻¹ − σ²Z_n(Z_n′Z_n)⁻¹}z_{n+1}
      = 0.
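The updating formulas (i)−(iv) can be verified numerically; the sketch below (an added illustration) checks them against direct least squares computations on a random design:

```python
import numpy as np

rng = np.random.default_rng(3)

# Numerical check of the updating formulas in Lemma 3.2 (i)-(iv).
n, p = 10, 3
Z = rng.normal(size=(n + 1, p))
Y = Z @ np.ones(p) + rng.normal(size=n + 1)

Zn, zn1 = Z[:n], Z[n]
G = np.linalg.inv(Zn.T @ Zn)
beta_n = G @ Zn.T @ Y[:n]
sse_n = ((Y[:n] - Zn @ beta_n) ** 2).sum()

eta = 1.0 + zn1 @ G @ zn1                      # eta_n
eps = Y[n] - zn1 @ beta_n                      # epsilon_n

# (ii) rank-one inverse update
G_new = G @ (np.eye(p) - np.outer(zn1, zn1) @ G / eta)
assert np.allclose(G_new, np.linalg.inv(Z[:n + 1].T @ Z[:n + 1]))

# (iii) recursive least squares update of beta_hat
beta_new = beta_n + (G @ zn1) * eps / eta
bhat_full = np.linalg.lstsq(Z[:n + 1], Y[:n + 1], rcond=None)[0]
assert np.allclose(beta_new, bhat_full)

# (iv) SSE update
sse_full = ((Y[:n + 1] - Z[:n + 1] @ bhat_full) ** 2).sum()
assert np.allclose(sse_n + eps ** 2 / eta, sse_full)
```

These are the standard recursive least squares updates; the lemma's contribution here is the resulting orthogonal decomposition of SSE_n into independent σ²χ₁² increments, used in (3.18).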
Note that if we define X₁ = ‖Y_{p+1} − Z_{p+1}β̂_{p+1}‖² and
X_i = ε²_{p+i−1}/η_{p+i−1}, i = 2,..., n−p, then from Lemma 3.2 the
X_i's are independent, and X_i ~ σ²χ₁², i = 1,..., n−p. Also, from
Lemma 3.2, for each n ≥ p+1,

    SSE_{p+1} = ‖Y_{p+1} − Z_{p+1}β̂_{p+1}‖² = X₁,
    SSE_{p+2} = ‖Y_{p+2} − Z_{p+2}β̂_{p+2}‖² = X₁ + X₂,
    ⋮
    SSE_n = ‖Y_n − Z_nβ̂_n‖² = X₁ + X₂ + ⋯ + X_{n−p}.

Consequently, MSE_n = (n−p)⁻¹SSE_n can be considered as the mean of
(n−p) independent and identically distributed σ²χ₁² random
variables. Denote by B_n the σ-algebra generated by X₁,..., X_{n−p}.
Then, using Lemma 3.2,

    E_{β,σ²}[{s_N⁴/((β̂_N−λ)′(Z_N′Z_N)(β̂_N−λ))}I_{[N=n]}]    (3.18)
      = E_{β,σ²}E_{β,σ²}[{s_n⁴/((β̂_n−λ)′(Z_n′Z_n)(β̂_n−λ))}I_{[N=n]} | B_n]
      = E_σ²[s_n⁴I_{[N=n]}]E_{β,σ²}[{(β̂_n−λ)′(Z_n′Z_n)(β̂_n−λ)}⁻¹],

where for the second equality in (3.18) we use the fact that
I_{[N=n]} is a function of X₁,..., X_{n−p} only, and in the third
equality we use the independence of X₁,..., X_{n−p} and β̂_n.
Next, note that for every n ≥ p+1, since Z_n′Z_n is positive
definite, there exists a nonsingular D_n such that
D_n′(Z_n′Z_n)⁻¹D_n = I_p (take, e.g., D_n with D_nD_n′ = Z_n′Z_n).
Use the transformation L_n = D_n′(β̂_n − λ), so that
L_n ~ N(ζ_n, σ²I_p), with ζ_n = D_n′(β − λ). Then

    E_{β,σ²}[{(β̂_n−λ)′(Z_n′Z_n)(β̂_n−λ)}⁻¹] = E_{β,σ²}[(L_n′L_n)⁻¹].    (3.19)

Here, for every n, L_n′L_n/σ² ~ Σ_{i=1}^p W_{ni}, where the W_{ni}'s
are independent and W_{ni} ~ χ₁²(ζ_{ni}²/σ²) (noncentral
chi-square), i = 1,..., p. Hence, using again the result that if
P(U > 0) = 1 then E(U⁻¹) = ∫₀^∞ E[exp(−tU)] dt, one gets

    σ² · lhs of (3.19) = E[(Σ_{i=1}^p W_{ni})⁻¹]    (3.20)
      = ∫₀^∞ ∏_{i=1}^p E[exp(−tW_{ni})] dt
      = ∫₀^∞ (1+2t)^{−p/2} exp{−(t/(1+2t)) Σ_{i=1}^p ζ_{ni}²/σ²} dt
      = ∫₀^∞ (1+2t)^{−p/2} exp{−(t/(1+2t)) σ⁻²(β−λ)′(Z_n′Z_n)(β−λ)} dt
      ≡ g_p(Δ_n) (say),

since Σ_{i=1}^p ζ_{ni}² = ζ_n′ζ_n = (β−λ)′(Z_n′Z_n)(β−λ) ≡ Δ_n.
We shall next show that g_p(Δ_n) is nonincreasing in n. Note that
for any n×p design matrix Z_n, the (j,k)th element of Z_n′Z_n is

    (Z_n′Z_n)_{jk} = Σ_{i=1}^n z_{ji}z_{ki}, j,k = 1,..., p.    (3.21)

Then, for any a ∈ R^p,

    a′(Z_n′Z_n)a = Σ_{j=1}^p Σ_{k=1}^p a_j a_k Σ_{i=1}^n z_{ji}z_{ki}
      = Σ_{i=1}^n (a₁z_{1i} + ⋯ + a_p z_{pi})²,    (3.22)

which is a nondecreasing function of n. Taking a = β − λ, it follows
that Δ_n is nondecreasing in n; since g_p is decreasing in its
argument, g_p(Δ_n) is a nonincreasing function of n.

Next, observe that
    E_{β,σ²}[{s_n²(β̂_n−λ)′(Z_n′Z_n)(β̂_n−β)/((β̂_n−λ)′(Z_n′Z_n)(β̂_n−λ))}I_{[N=n]}]    (3.23)
      = E_{β,σ²}E_{β,σ²}[{s_n²(β̂_n−λ)′(Z_n′Z_n)(β̂_n−β)
          /((β̂_n−λ)′(Z_n′Z_n)(β̂_n−λ))}I_{[N=n]} | B_n]
      = E_σ²[s_n²I_{[N=n]}]E_{β,σ²}[(β̂_n−λ)′(Z_n′Z_n)(β̂_n−β)
          /{(β̂_n−λ)′(Z_n′Z_n)(β̂_n−λ)}],

by the independence of X₁,..., X_{n−p} and β̂_n, and the fact that
I_{[N=n]} is a function only of X₁,..., X_{n−p}. Using the
diagonalization as before,

    E_{β,σ²}[(β̂_n−λ)′(Z_n′Z_n)(β̂_n−β)/{(β̂_n−λ)′(Z_n′Z_n)(β̂_n−λ)}]
      = E_{β,σ²}[L_n′(L_n − ζ_n)/(L_n′L_n)]    (3.24)
      = Σ_{i=1}^p E_{β,σ²}[L_{ni}(L_{ni} − ζ_{ni})/(L_n′L_n)]
      = (p−2)σ²E_{β,σ²}[(L_n′L_n)⁻¹]
      = (p−2)g_p(Δ_n).

For the third equality in (3.24), one uses Stein's identity, while
for the last equality, one uses (3.19) and (3.20).

Combining (3.18)−(3.20) and (3.23)−(3.24),

    R(β,σ²,δ_N^b) − R(β,σ²,β̂_N)    (3.25)
      = bσ² Σ_{n=m}^∞ g_p(Δ_n)E_σ²[(s_n²/σ²){b(s_n²/σ²) − 2(p−2)}I_{[N=n]}]
      ≤ 2(p−2)bσ² Σ_{n=m}^∞ g_p(Δ_n)E_σ²[(s_n²/σ²){(s_n²/σ²) − 1}I_{[N=n]}],

where in the last step one uses 0 < b < 2(p−2). Accordingly, for
proving the theorem, it suffices to show that
    Σ_{n=m}^∞ g_p(Δ_n)E_σ²[(s_n²/σ²){(s_n²/σ²) − 1}I_{[N=n]}] ≤ 0
      for all σ² > 0.    (3.26)

To this end, let n₀ be the smallest integer ≥ m (≥ 2p) such that
{p/(p+2)}cn₀²/p ≥ σ². Then, write

    lhs of (3.26)    (3.27)
      = Σ_{n=m}^{n₀−1} g_p(Δ_n)E_σ²[(s_n²/σ²){(s_n²/σ²) − 1}I_{[N=n]}]
      + g_p(Δ_{n₀})E_σ²[(s_{n₀}²/σ²){(s_{n₀}²/σ²) − 1}I_{[N≥n₀]}]
      + Σ_{n=n₀}^∞ g_p(Δ_{n+1})E_σ²[{(s_{n+1}²/σ²)((s_{n+1}²/σ²) − 1)
          − (s_n²/σ²)((s_n²/σ²) − 1)}I_{[N≥n+1]}]
      + Σ_{n=n₀}^∞ {g_p(Δ_{n+1}) − g_p(Δ_n)}
          E_σ²[(s_n²/σ²){(s_n²/σ²) − 1}I_{[N≥n+1]}],

where the first term in the rhs of (3.27) should be interpreted as
zero if n₀ = m. Note that for n ≥ n₀, on the set [N ≥ n+1],
s_n² > {p/(p+2)}cn²/p ≥ {p/(p+2)}cn₀²/p ≥ σ². Also, by (3.21) and
(3.22), g_p(Δ_n) is nonincreasing in n. Hence the fourth term in the
rhs of (3.27) is ≤ 0, and

    third term in the rhs of (3.27)
      ≤ Σ_{n=n₀}^∞ g_p(Δ_{n+1})E_σ²[{(s_{n+1}²/σ²)((s_{n+1}²/σ²) − 1)
          − (s_n²/σ²)((s_n²/σ²) − 1)}I_{[N≥n+1]}].    (3.28)
Note that I_{[N≥n+1]} is B_n-measurable. Then, using the
representation s_{n+1}² = (n−p+1)⁻¹{(n−p)s_n² + (p/(p+2))X²_{n−p+1}},
where X²_{n−p+1} ~ σ²χ₁² independently of B_n, one gets with
probability 1, writing k = n−p and α = p/(p+2),

    E_σ²[{(s_{n+1}²/σ²)((s_{n+1}²/σ²) − 1)
      − (s_n²/σ²)((s_n²/σ²) − 1)}I_{[N≥n+1]} | B_n]
      = I_{[N≥n+1]} φ_n(s_n²/σ²),    (3.29)

where

    φ_n(u) = −{(2k+1)/(k+1)²}u² + {2kα/(k+1)² + 1/(k+1)}u
      + 3α²/(k+1)² − α/(k+1).

Note that φ_n is a concave function of u, where the maximum occurs
at

    u* = (2kα + k + 1)/{2(2k+1)} < 1.    (3.30)

Here, recalling that on the set [N ≥ n+1], s_n² > σ², it follows
that

    φ_n(s_n²/σ²) ≤ φ_n(1)
      = {2/((k+1)²(p+2))}{p(p−1)/(p+2) − k} ≤ 0,    (3.31)

since n ≥ 2p (so that k = n−p ≥ p > p(p−1)/(p+2)). Hence, from
(3.28), (3.29) and (3.31),

    third term in the rhs of (3.27) ≤ 0.    (3.32)
Next, note that I_{[N≥m]} = 1 with probability 1 and

    E_σ²[(s_m²/σ²){(s_m²/σ²) − 1}]
      = {p/(p+2)}[{p/(p+2)}{1 + 2/(m−p)} − 1]    (3.33)
      = {2p/(p+2)²}{p/(m−p) − 1} ≤ 0,

since m ≥ 2p. Thus, if n₀ = m then rhs of (3.27) ≤ 0. If n₀ > m,
first note that for n ≤ n₀ − 1, on the set [N = n],
s_n² ≤ {p/(p+2)}cn²/p ≤ {p/(p+2)}c(n₀−1)²/p < σ², so that, using
the fact that g_p(Δ_n) is nonincreasing in n,

    the first term in the rhs of (3.27)
      ≤ g_p(Δ_{n₀−1}) Σ_{n=m}^{n₀−1}
          E_σ²[(s_n²/σ²){(s_n²/σ²) − 1}I_{[N=n]}].    (3.34)

In view of (3.26), (3.27), (3.32) and (3.34), for proving the
theorem it suffices to show that

    Σ_{n=m}^{n₀−1} E_σ²[(s_n²/σ²){(s_n²/σ²) − 1}I_{[N=n]}]
      + E_σ²[(s_{n₀}²/σ²){(s_{n₀}²/σ²) − 1}I_{[N≥n₀]}] ≤ 0.    (3.35)
To prove (3.35), first writing
s_{n₀}² = (n₀−p)⁻¹{(n₀−1−p)s_{n₀−1}² + (p/(p+2))X²_{n₀−p}}, one gets

    E_σ²[(s_{n₀}²/σ²){(s_{n₀}²/σ²) − 1}I_{[N≥n₀]}]    (3.36)
      = E_σ²[E_σ²{(s_{n₀}²/σ²)((s_{n₀}²/σ²) − 1) | B_{n₀−1}}I_{[N≥n₀]}]
      = E_σ²[{b_{n₀}(s_{n₀−1}⁴/σ⁴) − c_{n₀}(s_{n₀−1}²/σ²) − d_{n₀}}I_{[N≥n₀]}],

where, with α = p/(p+2),

    b_{n₀} = (n₀−1−p)²/(n₀−p)²,
    c_{n₀} = (n₀−1−p)/(n₀−p) − 2α(n₀−1−p)/(n₀−p)²,
    d_{n₀} = α/(n₀−p) − 3α²/(n₀−p)².

Note that

    c_{n₀} − b_{n₀} = (n₀−1−p)(2−p)/{(p+2)(n₀−p)²} < 0    (3.37)

for p > 2, and that

    d_{n₀} + c_{n₀} − b_{n₀}
      = {2/((p+2)²(n₀−p)²)}{(n₀−1−p)(p+2) − p(p−1)} > 0,    (3.38)

since n₀ > m ≥ 2p. Also, using (3.37) and (3.38), one can find a
constant f_{n₀} with 0 < f_{n₀} < 1 such that

    b_{n₀}u² − c_{n₀}u − d_{n₀} ≤ f_{n₀}u(u − 1)
      for all relevant u > 0    (3.39)

(by (3.38) the left hand side of (3.39) is negative at u = 1, while
b_{n₀} < 1 controls its behavior for large u). Hence,

    extreme rhs of (3.36)
      ≤ f_{n₀}E_σ²[(s_{n₀−1}²/σ²){(s_{n₀−1}²/σ²) − 1}I_{[N≥n₀]}].    (3.40)

If E_σ²[(s_{n₀−1}²/σ²){(s_{n₀−1}²/σ²) − 1}I_{[N≥n₀]}] ≤ 0, then,
noting again that s_n² < σ² on the set [N = n] for all n ≤ n₀ − 1,
(3.35) follows at once from (3.36) and (3.40). Otherwise, using
f_{n₀} < 1, one gets from (3.40) that

    lhs of (3.35)
      ≤ Σ_{n=m}^{n₀−2} E_σ²[(s_n²/σ²){(s_n²/σ²) − 1}I_{[N=n]}]
      + E_σ²[(s_{n₀−1}²/σ²){(s_{n₀−1}²/σ²) − 1}I_{[N≥n₀−1]}].    (3.41)

Proceeding inductively, we either get lhs of (3.35) ≤ 0 at some
stage, or finally end with

    lhs of (3.35) ≤ E_σ²[(s_m²/σ²){(s_m²/σ²) − 1}I_{[N≥m]}] ≤ 0,    (3.42)

as shown earlier in (3.33). This completes the proof.
Remark. Initially we defined s_n² = {(n−p)/((n−p) + a_n)}MSE_n,
leaving a_n unspecified; the dominance was then proven with
conditions on a_n established along the way. The particular choice
a_n = 2(n−p)/p was most appealing, since then
s_n² = {p/(p+2)}MSE_n and hence E_σ²[s_n²] = pσ²/(p+2). Therefore,
the bias would be negligible for large p (as in Chapter 2). As a
consequence of our choice we needed m ≥ 2p, and thus we require such
a large initial sample size.
3.3 Asymptotic Risk Expansions for β̂_N and δ_N^b

In this section we first obtain an asymptotic risk expansion for
β̂_N up to the second order term. Subsequently, we find an asymptotic
risk expansion up to the second order term for δ_N^b.

Observe that

    R(β,σ²,β̂_N) = pσ²E[N⁻¹] + cE[N]    (3.43)
      = 2cn* + cE[(N − n*)²/N]
      = 2c^{1/2}(pσ²)^{1/2} + cE[(N − n*)²/N].

From the definition, N → ∞ a.s. as c → 0, so that MSE_N → σ² a.s. as
c → 0, since MSE_n is the mean of n − p independent and identically
distributed random variables. Also, using Anscombe's theorem,
√N(MSE_N − σ²) →_L N(0, 2σ⁴), so that, using the delta method,
√N(MSE_N^{1/2} − σ) →_L N(0, σ²/2).

Next use the inequality

    c^{−1/2}(p MSE_N)^{1/2} ≤ N ≤ m + c^{−1/2}(p MSE_{N−1})^{1/2}.    (3.44)

Dividing both sides of (3.44) by n*, making c → 0, and using
MSE_N^{1/2} → σ a.s. as c → 0, it follows that N/n* → 1 a.s. as
c → 0. Thus, √(n*)(MSE_N^{1/2} − σ)/σ →_L N(0, 1/2), and also
√(n*)(MSE_{N−1}^{1/2} − σ)/σ →_L N(0, 1/2).

Again from (3.44), one gets the inequality

    √(n*)(MSE_N^{1/2} − σ)/σ ≤ (N − n*)/√(n*)
      ≤ m/√(n*) + √(n*)(MSE_{N−1}^{1/2} − σ)/σ,    (3.45)

implying that (N − n*)/√(n*) →_L N(0, 1/2) as c → 0. Hence
(N − n*)/√N →_L N(0, 1/2), since N/n* → 1 a.s. as c → 0. Also, the
uniform integrability of (N − n*)²/N for c ≤ c₀ (specified) can be
proved by repeating essentially the arguments of Ghosh and
Mukhopadhyay (1980). Thus, E_σ²[(N − n*)²/N] → 1/2 as c → 0, and it
follows from (3.43) that, as c → 0,

    R(β,σ²,β̂_N) = 2c^{1/2}(pσ²)^{1/2} + (1/2)c + o(c).    (3.46)
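The normal limit of (N − n*)/√(n*) with variance 1/2 can be illustrated by simulating the stopping rule directly on the χ₁² increments of MSE_n (an added sketch; the simulation parameters are chosen only for speed):

```python
import math
import numpy as np

rng = np.random.default_rng(4)

def stopping_time(c, p=3, m=10):
    """Stopping rule (3.3), simulating MSE_n directly as the mean of
    n - p iid chi^2_1 variables (sigma^2 = 1)."""
    draws = []
    n = m
    while True:
        while len(draws) < n - p:
            draws.append(rng.chisquare(1))
        mse = sum(draws) / (n - p)
        if n >= math.sqrt(p * mse / c):
            return n
        n += 1

c = 1e-4
n_star = math.sqrt(3 / c)          # n* = c^{-1/2}(p sigma^2)^{1/2}, p = 3
z = np.array([(stopping_time(c) - n_star) / math.sqrt(n_star)
              for _ in range(400)])
# the sample mean of z should be near 0 and its variance near 1/2
```

With 400 replications the agreement is rough but clearly consistent with the N(0, 1/2) limit; the uniform integrability argument is what upgrades this distributional limit to the moment convergence used in (3.46).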
Next, we find the asymptotic risk expansion for δ_N^b. It follows
from (3.11), (3.18)−(3.20) and (3.23)−(3.25) that

    R(β,σ²,δ_N^b) = R(β,σ²,β̂_N)    (3.47)
      − 2b(p−2)E_σ²[(s_N²/N)∫₀^∞ (1+2t)^{−p/2}
          exp{−(Nt/(1+2t))σ⁻²(β−λ)′(N⁻¹Z_N′Z_N)(β−λ)} dt]
      + b²E_σ²[(s_N⁴/(Nσ²))∫₀^∞ (1+2t)^{−p/2}
          exp{−(Nt/(1+2t))σ⁻²(β−λ)′(N⁻¹Z_N′Z_N)(β−λ)} dt].

Now, for technical reasons, we shall handle the risk expansions
separately for the cases β ≠ λ and β = λ. Both cases will require
the facts that, as c → 0, s_N⁴ → σ⁴p²/(p+2)² a.s.,
s_N² → σ²p/(p+2) a.s., and Nc^{1/2} → (pσ²)^{1/2} a.s. Also, at this
time, we shall make the following assumption: n⁻¹Z_n′Z_n → K as
n → ∞, where K is a positive definite matrix. Hence
N⁻¹Z_N′Z_N → K a.s. as c → 0.
Consider first the case β ≠ λ. Here, as c → 0,

    c⁻¹(s_N⁴/(Nσ²))∫₀^∞ (1+2t)^{−p/2}
        exp{−(Nt/(1+2t))σ⁻²(β−λ)′(N⁻¹Z_N′Z_N)(β−λ)} dt    (3.48)
      = {s_N⁴/(Nc^{1/2}σ²)}∫₀^∞ (1+2tc^{1/2})^{−p/2}
          exp{−(Nc^{1/2}t/(1+2tc^{1/2}))σ⁻²(β−λ)′(N⁻¹Z_N′Z_N)(β−λ)} dt
      → {p^{3/2}σ/(p+2)²}∫₀^∞ exp{−t(pσ²)^{1/2}σ⁻²(β−λ)′K(β−λ)} dt a.s.
      = {pσ²/(p+2)²}{(β−λ)′K(β−λ)}⁻¹.
Next we prove the uniform integrability (in c ≤ c₀) of the left hand
side of (3.48) when β ≠ λ. First note that

    lhs of (3.48)
      ≤ {s_N⁴/(Ncσ²)}[∫₀^1 exp{−(Nt/3)σ⁻²(β−λ)′(N⁻¹Z_N′Z_N)(β−λ)} dt
        + ∫₁^∞ (1+2t)^{−p/2} exp{−(N/3)σ⁻²(β−λ)′(N⁻¹Z_N′Z_N)(β−λ)} dt]
      ≤ {s_N⁴/(N²c)}K₂{(β−λ)′(N⁻¹Z_N′Z_N)(β−λ)}⁻¹,    (3.49)

where K₂ is a constant not depending on c. Thus it suffices to prove
the uniform integrability of
{s_N⁴/(N²c)}{(β−λ)′(N⁻¹Z_N′Z_N)(β−λ)}⁻¹. Write
a_N = (β−λ)′(N⁻¹Z_N′Z_N)(β−λ). First use the inequality

    E_σ²[{s_N⁴/(N²c)}a_N⁻¹ I_{[{s_N⁴/(N²c)}a_N⁻¹ > d]}]
      ≤ d^{−δ}E_σ²[{(s_N⁴/(N²c))a_N⁻¹}^{1+δ}]    (3.50)

for some δ ∈ (0, 1/2). Taking δ′ ∈ (δ, 1/2) and using Hölder's
inequality, one gets

    E_σ²[{(s_N⁴/(N²c))a_N⁻¹}^{1+δ}]
      ≤ [E_σ²{(s_N⁴/(N²c))^{1+δ′}}]^{(1+δ)/(1+δ′)}
        [E_σ²{a_N^{−(1+δ)(1+δ′)/(δ′−δ)}}]^{(δ′−δ)/(1+δ′)}.    (3.51)

Taking δ₀ ∈ (δ′, 1/2) and using Hölder's inequality once more, one
gets

    rhs of (3.51)
      ≤ [E_σ²{(N²c)^{−(1+δ′)(1+δ₀)/(δ₀−δ′)}}]^{(1+δ)(δ₀−δ′)/((1+δ′)(1+δ₀))}
        × [E_σ²{s_N^{8(1+δ₀)}}]^{(1+δ)/(1+δ₀)}
        × [E_σ²{a_N^{−(1+δ)(1+δ′)/(δ′−δ)}}]^{(δ′−δ)/(1+δ′)}.    (3.52)

Next observe that for any arbitrary ε in (0,1), writing
q = (1+δ′)(1+δ₀)/(δ₀−δ′),

    E_σ²[(N²c)^{−q}] ≤ (m²c)^{−q}P_σ²(N ≤ εn*) + (ε²n*²c)^{−q}
      = (m²c)^{−q}P_σ²(N ≤ εn*) + (ε²pσ²)^{−q}.    (3.53)

To proceed further we need the following lemma.
Lemma 3.3  For any 0 < ε < 1, p ≥ 1 and m ≥ p+1,
P_σ²(N ≤ εn*) = O(c^{(1/2)(m−p)}).

Proof: Note that P_σ²(N ≤ εn*) = Σ_{n=m}^{[εn*]} P_σ²(N = n), where
[u] is the largest integer ≤ u. Now,

    P_σ²(N = m) = P_σ²(MSE_m ≤ cm²/p)    (3.54)
      = P(χ²_{m−p} ≤ (m−p)cm²/(pσ²))
      = O(c^{(1/2)(m−p)}),

since P(χ²_k ≤ d) = O(d^{k/2}). Therefore, all we need show is that

    Σ_{n=m+1}^{[εn*]} P_σ²(N = n) = O(c^{(1/2)(m−p)}).    (3.55)

Now, for n ≥ m+1,

    P_σ²(N = n) ≤ P_σ²(MSE_{n−1} > c(n−1)²/p)
      = P(χ²_{n−1−p} > (n−1−p)x_n)
      ≤ inf_{h>0} exp{−h(n−1−p)x_n}E[exp(hχ²_{n−1−p})]
      = inf_{h>0} exp{−h(n−1−p)x_n}(1−2h)^{−(n−1−p)/2}
      = [x_n exp(1 − x_n)]^{(n−1−p)/2},    (3.56)

where x_n = c(n−1)²/(pσ²). Note that for n ≤ [εn*], x_n ≤ ε² < 1.
Also, x·exp(1−x) is increasing in x for 0 < x ≤ 1. Therefore

    c^{−(1/2)(m−p)} Σ_{n=m+1}^{[εn*]} P_σ²(N = n)
      ≤ c^{−(1/2)(m−p)} Σ_{n=m+1}^{[εn*]} [x_n exp(1 − x_n)]^{(n−1−p)/2}    (3.57)
      ≤ {e/(pσ²)}^{(1/2)(m−p)} Σ_{n=m+1}^∞ (n−1)^{m−p}
          [ε² exp(1 − ε²)]^{(1/2)(n−1−m)}
      = O(1),

since Σ_{n=m+1}^∞ (n−1)^{m−p}[ε²e^{1−ε²}]^{(n−1−m)/2} is a
convergent series. This proves (3.55), and hence the lemma. Hence,
using Lemma 3.3 for m ≥ p+3 and c ≤ c₀ < 1,
    lhs of (3.53) ≤ K(m²c)^{−q}c^{(1/2)(m−p)} + (ε²pσ²)^{−q} = O(1),    (3.58)

where K (> 0) is a constant not depending on c, the O(1) bound
holding with δ′ and δ₀ chosen suitably. Further, since MSE_n can be
expressed as a mean of independent and identically distributed σ²χ₁²
random variables, MSE_n is a one-sample U-statistic with kernel of
degree one and hence is a backward martingale. Consequently, using
Doob's maximal inequality for submartingales, it follows that for
every r ≥ 1,

    E_σ²[s_N^{2r}] = {p/(p+2)}^r E_σ²[MSE_N^r]    (3.59)
      ≤ {p/(p+2)}^r E_σ²[(sup_{n≥m} MSE_n)^r] ≤ K₁E_σ²[MSE_m^r],

where K₁ is a constant which only depends on r. Next observe that if
a_n → a, then for all ε̃ > 0 there exists an m₀ such that
|a_n − a| < ε̃ for all n ≥ m₀. Let u = (1+δ)(1+δ′)/(δ′−δ),
a_n = (β−λ)′(n⁻¹Z_n′Z_n)(β−λ), a = (β−λ)′K(β−λ), and ε̃ ∈ (0, a).
Then

    E_σ²[a_N^{−u}] = E_σ²[a_N^{−u}I_{[N<m₀]}] + E_σ²[a_N^{−u}I_{[N≥m₀]}]    (3.60)
      ≤ 2^{u−1}{(max_{m≤n≤m₀} a_n⁻¹)^u + (a − ε̃)^{−u}},

which is independent of c. Combining (3.50)−(3.60), the uniform
integrability in c ≤ c₀ of
{s_N⁴/(N²c)}{(β−λ)′(N⁻¹Z_N′Z_N)(β−λ)}⁻¹ follows.
Thus, for m ≥ p+3 and β ≠ λ, as c → 0,

    c⁻¹E_σ²[(s_N⁴/(Nσ²))∫₀^∞ (1+2t)^{−p/2}
        exp{−(Nt/(1+2t))σ⁻²(β−λ)′(N⁻¹Z_N′Z_N)(β−λ)} dt]
      → {pσ²/(p+2)²}{(β−λ)′K(β−λ)}⁻¹.    (3.61)

Similarly, it can be shown that, as c → 0,

    c⁻¹E_σ²[(s_N²/N)∫₀^∞ (1+2t)^{−p/2}
        exp{−(Nt/(1+2t))σ⁻²(β−λ)′(N⁻¹Z_N′Z_N)(β−λ)} dt]
      → {σ²/(p+2)}{(β−λ)′K(β−λ)}⁻¹.    (3.62)

Hence, from (3.46)−(3.47) and (3.61)−(3.62) it follows that for
m ≥ p+3 and β ≠ λ, as c → 0,

    R(β,σ²,δ_N^b) = 2c^{1/2}(pσ²)^{1/2} + (1/2)c
      − cbσ²{p/(p+2)²}(2(p²−4)/p − b){(β−λ)′K(β−λ)}⁻¹ + o(c).    (3.63)
Consider now the case $\beta = \lambda$. It follows from (3.47) that

$R(\beta,\sigma^2,\delta_N^b) = R(\beta,\sigma^2,\hat\beta_N) - 2b(p-2)E_{\sigma^2}\Big[\frac{s_N^2}{N}\int_0^\infty\Big(\frac{1}{1+2t}\Big)^{p/2}dt\Big] + \frac{b^2 p}{p+2}E_{\sigma^2}\Big[\frac{s_N^4}{N\sigma^2}\int_0^\infty\Big(\frac{1}{1+2t}\Big)^{p/2}dt\Big].$   (3.64)

Note that $\int_0^\infty (1+2t)^{-p/2}\,dt = (p-2)^{-1}$. Hence, using the uniform integrability of $s_N^2/(Nc^{1/2})$ and $s_N^4/(Nc^{1/2})$ in $c \le c_0$ for $m \ge p+1$, one gets from (3.46) and (3.64) that, as $c \to 0$,
$R(\beta,\sigma^2,\delta_N^b) = 2c^{1/2}(p\sigma^2)^{1/2} - c^{1/2}(\sigma^2/p)^{1/2}\,b\{2 - bp/((p-2)(p+2))\} + o(c^{1/2}).$   (3.65)
From (3.63) it can be seen that, asymptotically (as $c \to 0$), the percent risk improvement of $\delta_N^b$ over $\hat\beta_N$ for $\beta \ne \lambda$ is

$100\,c^{1/2}\,b\sigma^3\{2(p^2-4) - bp\}\{2p^{1/2}(p+2)^2(\beta-\lambda)'K(\beta-\lambda)\}^{-1} + o(c^{1/2}),$   (3.66)

while from (3.65), asymptotically the percent risk improvement of $\delta_N^b$ over $\hat\beta_N$ for $\beta = \lambda$ is

$100\,(b/2p)\{2 - bp/((p-2)(p+2))\} + o(1).$   (3.67)

Here the dominant term in both (3.66) and (3.67) is maximized when $b = (p^2-4)/p$. Therefore, from an asymptotic point of view, as is evident in (3.66) and (3.67), it appears that for small $c$, $(p^2-4)/p$ is an optimal choice for $b$ (as in Chapter 2).
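As a numerical illustration of the optimality claim for $b$ (a sketch of our own, assuming the dominant improvement term is proportional to $b\{2(p^2-4) - bp\}$, which is how we read (3.66)), a grid search recovers $b = (p^2-4)/p$ as the maximizer.

```python
import numpy as np

def dominant_term(b, p):
    # Assumed shape of the leading risk-improvement term (up to positive factors).
    return b * (2.0 * (p**2 - 4.0) - b * p)

p = 5
bs = np.linspace(0.0, 2.0 * (p**2 - 4.0) / p, 100_001)  # search grid
b_grid = bs[np.argmax(dominant_term(bs, p))]
b_closed = (p**2 - 4.0) / p                              # stated optimal choice

print(abs(b_grid - b_closed) < 1e-3)
```

The agreement is immediate from calculus: $2b(p^2-4) - b^2 p$ is a downward parabola in $b$ with vertex at $b = (p^2-4)/p$.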
CHAPTER FOUR
SEQUENTIAL SHRINKAGE ESTIMATION OF THE DIFFERENCE
OF TWO MULTIVARIATE NORMAL MEANS
4.1 Introduction
Let $X_1, X_2, \ldots$ and $Y_1, Y_2, \ldots$ be two independent sequences of random vectors, with the $X_i$'s independent and identically distributed $N(\theta_1, \sigma_1^2 V_1)$ and the $Y_j$'s independent and identically distributed $N(\theta_2, \sigma_2^2 V_2)$. Here the $\theta_i$'s $\in R^p$ are unknown, while the $V_i$'s are known positive definite matrices, $i = 1, 2$. The problem is estimation of $\mu = \theta_1 - \theta_2$. Based on $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$, if $\mu$ is estimated by $\delta_{n,m} = \delta_{n,m}(X_1, \ldots, X_n; Y_1, \ldots, Y_m)$, let the incurred loss be given by

$L(\theta_1, \theta_2, \delta_{n,m}) = (\delta_{n,m} - \mu)'Q(\delta_{n,m} - \mu) + c(n+m),$   (4.1)

where $Q$ denotes a known positive definite matrix and $c\,(>0)$ denotes the known cost per sampling unit. The most natural estimator of $\mu$ is $W_{n,m} = \bar X_n - \bar Y_m$, with risk

$R(\theta_1, \theta_2, \sigma_1^2, \sigma_2^2, W_{n,m})$   (4.2)
$\qquad = E_{\theta_1,\theta_2,\sigma_1^2,\sigma_2^2}[(W_{n,m} - \mu)'Q(W_{n,m} - \mu)] + c(n+m)$
$\qquad = E_{\theta_1,\theta_2,\sigma_1^2,\sigma_2^2}[(\bar X_n - \theta_1)'Q(\bar X_n - \theta_1) + (\bar Y_m - \theta_2)'Q(\bar Y_m - \theta_2) - 2(\bar X_n - \theta_1)'Q(\bar Y_m - \theta_2)] + c(n+m)$
$\qquad = \frac{\sigma_1^2}{n}\,tr(QV_1) + \frac{\sigma_2^2}{m}\,tr(QV_2) + c(n+m).$

When $\sigma_1^2$ and $\sigma_2^2$ are known, the above risk is minimized at $n = n^* \equiv (\sigma_1^2/c)^{1/2}(tr(QV_1))^{1/2}$ and $m = m^* \equiv (\sigma_2^2/c)^{1/2}(tr(QV_2))^{1/2}$. Also, for this pair $(n^*, m^*)$ we have $n^*/m^* = [\sigma_1^2\,tr(QV_1)/\{\sigma_2^2\,tr(QV_2)\}]^{1/2}$. However, when $\sigma_1^2$ and $\sigma_2^2$ are unknown, no fixed sample size will minimize (4.2) simultaneously for all $\sigma_1^2$ and $\sigma_2^2$. In this case, motivated by the optimal fixed sample sizes $n^*$ and $m^*$, and additionally by the ratio $n^*/m^*$, the following sequential procedure is proposed to determine the sample sizes.
$T = N + M$, where

$N = \inf\{n \ge n_0 : n \ge (\hat s_{1n}^2/c)^{1/2}(tr(QV_1))^{1/2}\},$
$M = \inf\{m \ge n_0 : m \ge (\hat s_{2m}^2/c)^{1/2}(tr(QV_2))^{1/2}\},$   (4.3)

with $n_0 \ge 2$, and observe $X_{n+1}$ or $Y_{m+1}$ if neither process has stopped, according as

$n/m \le [\hat s_{1n}^2\,tr(QV_1)]^{1/2}/[\hat s_{2m}^2\,tr(QV_2)]^{1/2}$ or $n/m > [\hat s_{1n}^2\,tr(QV_1)]^{1/2}/[\hat s_{2m}^2\,tr(QV_2)]^{1/2},$   (4.4)

respectively. Here,

$\hat s_{1n}^2 = [(n-1)p]^{-1}\sum_{i=1}^n (X_i - \bar X_n)'V_1^{-1}(X_i - \bar X_n),$
$\hat s_{2m}^2 = [(m-1)p]^{-1}\sum_{j=1}^m (Y_j - \bar Y_m)'V_2^{-1}(Y_j - \bar Y_m),$   (4.5)

for every $n, m \ge 2$. The above stopping rule is a multivariate analogue of the stopping rule proposed in Ghosh and Mukhopadhyay (1980). Similar stopping rules were considered by Chou and Hwang (1984), who estimated $\mu$ by $W_{N,M}$.
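A minimal simulation may clarify how the two streams are interleaved by (4.3) and (4.4). Everything below is an illustrative sketch of our own, not part of the original analysis: we take $V_1 = V_2 = Q = I_p$ (so that $tr(QV_i) = p$), with arbitrary variances and an arbitrary cost $c$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n0, c = 3, 5, 0.01              # illustrative: dimension, pilot size, unit cost
sig1, sig2 = 1.0, 2.0              # with V1 = V2 = Q = I_p, tr(Q V_i) = p

def s2(sample):
    # (4.5) specialized to V = I_p: pooled per-coordinate sample variance
    xbar = sample.mean(axis=0)
    return ((sample - xbar) ** 2).sum() / ((len(sample) - 1) * p)

X = list(rng.normal(0.0, sig1, size=(n0, p)))   # pilot samples from each stream
Y = list(rng.normal(0.0, sig2, size=(n0, p)))
while True:
    n, m = len(X), len(Y)
    done_x = n >= np.sqrt(s2(np.array(X)) * p / c)   # stream 1 reached N of (4.3)
    done_y = m >= np.sqrt(s2(np.array(Y)) * p / c)   # stream 2 reached M of (4.3)
    if done_x and done_y:
        break
    ratio = np.sqrt(s2(np.array(X)) / s2(np.array(Y)))
    if not done_x and (done_y or n / m <= ratio):    # allocation rule (4.4)
        X.append(rng.normal(0.0, sig1, size=p))
    else:
        Y.append(rng.normal(0.0, sig2, size=p))

N, M = len(X), len(Y)
print(N, M)   # total sample size is T = N + M
```

Note the higher-variance stream ends up with the larger sample, in line with the target ratio $n^*/m^*$.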
A point of interest concerning the above stopping rule is that if $N = n$ is the first occurrence of $n \ge (\hat s_{1n}^2\,tr(QV_1)/c)^{1/2}$ while $m < (\hat s_{2m}^2\,tr(QV_2)/c)^{1/2}$, then necessarily we must observe $Y_{m+1}$, since

$n/m > [\hat s_{1n}^2\,tr(QV_1)]^{1/2}/[\hat s_{2m}^2\,tr(QV_2)]^{1/2}.$
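The implication is pure algebra: dividing $n \ge (a/c)^{1/2}$ by $m < (b/c)^{1/2}$ cancels $c^{1/2}$ and gives $n/m > (a/b)^{1/2}$. A brute-force check over random instances (an illustration of our own, with $a$ and $b$ standing for $\hat s_{1n}^2\,tr(QV_1)$ and $\hat s_{2m}^2\,tr(QV_2)$):

```python
import random

random.seed(0)
ok = True
for _ in range(10_000):
    a = random.uniform(0.1, 10.0)    # stands for s1n^2 * tr(Q V1)
    b = random.uniform(0.1, 10.0)    # stands for s2m^2 * tr(Q V2)
    c = random.uniform(0.001, 1.0)
    n = (a / c) ** 0.5 + random.uniform(0.0, 5.0)   # n >= (a/c)^(1/2)
    m = (b / c) ** 0.5 - random.uniform(0.0, 1.0)   # m <  (b/c)^(1/2)
    if m > 0 and not n / m > (a / b) ** 0.5:
        ok = False
print(ok)
```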
To see how the above sequential sampling procedure can be motivated from a minimax criterion, let $\theta_i$ have a $N(\nu_i, \tau_i^2 B_i)$ prior, $i = 1, 2$. Then, following the arguments from Section 1.1, the joint posterior distribution of $[\theta_1', \theta_2']'$ given $X_i = x_i$, $i = 1, \ldots, n$, and $Y_j = y_j$, $j = 1, \ldots, m$, has independent components

$\theta_1 \sim N\Big(\nu_1 + \Big(I_p + \frac{\sigma_1^2}{n\tau_1^2}V_1 B_1^{-1}\Big)^{-1}(\bar x_n - \nu_1),\ \frac{\sigma_1^2}{n}\Big(I_p + \frac{\sigma_1^2}{n\tau_1^2}V_1 B_1^{-1}\Big)^{-1}V_1\Big),$
$\theta_2 \sim N\Big(\nu_2 + \Big(I_p + \frac{\sigma_2^2}{m\tau_2^2}V_2 B_2^{-1}\Big)^{-1}(\bar y_m - \nu_2),\ \frac{\sigma_2^2}{m}\Big(I_p + \frac{\sigma_2^2}{m\tau_2^2}V_2 B_2^{-1}\Big)^{-1}V_2\Big).$

Therefore, the posterior distribution of $\mu = \theta_1 - \theta_2$ becomes

$N\Big(\nu_1 - \nu_2 + \Big(I_p + \frac{\sigma_1^2}{n\tau_1^2}V_1 B_1^{-1}\Big)^{-1}(\bar x_n - \nu_1) - \Big(I_p + \frac{\sigma_2^2}{m\tau_2^2}V_2 B_2^{-1}\Big)^{-1}(\bar y_m - \nu_2),$
$\qquad \frac{\sigma_1^2}{n}\Big(I_p + \frac{\sigma_1^2}{n\tau_1^2}V_1 B_1^{-1}\Big)^{-1}V_1 + \frac{\sigma_2^2}{m}\Big(I_p + \frac{\sigma_2^2}{m\tau_2^2}V_2 B_2^{-1}\Big)^{-1}V_2\Big).$

Hence, the Bayes estimator of $\mu$ becomes

$\nu_1 - \nu_2 + \Big(I_p + \frac{\sigma_1^2}{n\tau_1^2}V_1 B_1^{-1}\Big)^{-1}(\bar X_n - \nu_1) - \Big(I_p + \frac{\sigma_2^2}{m\tau_2^2}V_2 B_2^{-1}\Big)^{-1}(\bar Y_m - \nu_2),$

with posterior risk

$tr\Big[Q\Big\{\frac{\sigma_1^2}{n}\Big(I_p + \frac{\sigma_1^2}{n\tau_1^2}V_1 B_1^{-1}\Big)^{-1}V_1 + \frac{\sigma_2^2}{m}\Big(I_p + \frac{\sigma_2^2}{m\tau_2^2}V_2 B_2^{-1}\Big)^{-1}V_2\Big\}\Big] + c(n+m).$

The posterior risk being independent of any unknown parameters implies that the sequential Bayes rule is a fixed sample rule, with sample sizes determined by minimizing

$tr\Big[Q\Big\{\frac{\sigma_1^2}{n}\Big(I_p + \frac{\sigma_1^2}{n\tau_1^2}V_1 B_1^{-1}\Big)^{-1}V_1 + \frac{\sigma_2^2}{m}\Big(I_p + \frac{\sigma_2^2}{m\tau_2^2}V_2 B_2^{-1}\Big)^{-1}V_2\Big\}\Big] + c(n+m)$

with respect to $n$ and $m$.
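The posterior moments above are the standard conjugate normal updates. As a sanity check (our own, with arbitrary positive definite matrices), the form $(I_p + (\sigma^2/(n\tau^2))VB^{-1})^{-1}$ agrees with the usual precision-weighted combination for one component:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 3, 7
sigma2, tau2 = 1.3, 0.8                                    # illustrative values
A1 = rng.normal(size=(p, p)); V = A1 @ A1.T + p * np.eye(p)    # known p.d. V
A2 = rng.normal(size=(p, p)); B = A2 @ A2.T + p * np.eye(p)    # prior p.d. B
nu = rng.normal(size=p)                                    # prior mean
xbar = rng.normal(size=p)                                  # observed sample mean

# Form stated in the text.
M = (sigma2 / (n * tau2)) * V @ np.linalg.inv(B)
mean_text = nu + np.linalg.inv(np.eye(p) + M) @ (xbar - nu)
cov_text = (sigma2 / n) * np.linalg.inv(np.eye(p) + M) @ V

# Standard precision-weighted conjugate update for comparison.
prior_prec = np.linalg.inv(tau2 * B)
like_prec = (n / sigma2) * np.linalg.inv(V)
cov_std = np.linalg.inv(prior_prec + like_prec)
mean_std = cov_std @ (prior_prec @ nu + like_prec @ xbar)

print(np.allclose(mean_text, mean_std), np.allclose(cov_text, cov_std))
```

The identity follows from factoring $(\tau^2 B)^{-1} + (n/\sigma^2)V^{-1} = (n/\sigma^2)V^{-1}(I_p + (\sigma^2/(n\tau^2))VB^{-1})$.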
Next, consider the sequence of $N(\nu_i, lB_i)$, $l \ge 1$, priors on $\theta_i$, $i = 1, 2$. Then the corresponding sequence of Bayes estimators for $\mu$ is

$\nu_1 - \nu_2 + \Big(I_p + \frac{\sigma_1^2}{nl}V_1 B_1^{-1}\Big)^{-1}(\bar X_n - \nu_1) - \Big(I_p + \frac{\sigma_2^2}{ml}V_2 B_2^{-1}\Big)^{-1}(\bar Y_m - \nu_2),$

with Bayes risk

$tr\Big[Q\Big\{\frac{\sigma_1^2}{n}\Big(I_p + \frac{\sigma_1^2}{nl}V_1 B_1^{-1}\Big)^{-1}V_1 + \frac{\sigma_2^2}{m}\Big(I_p + \frac{\sigma_2^2}{ml}V_2 B_2^{-1}\Big)^{-1}V_2\Big\}\Big] + c(n+m).$

The latter converges to $K \equiv \frac{\sigma_1^2}{n}tr(QV_1) + \frac{\sigma_2^2}{m}tr(QV_2) + c(n+m)$ as $l \to \infty$, and $R(\theta_1, \theta_2, \sigma_1^2, \sigma_2^2, W_{n,m}) = K$ for all $\theta_1, \theta_2$. Hence, under the loss (4.1), the fixed sample rule with sample sizes determined by minimizing $\frac{\sigma_1^2}{n}tr(QV_1) + \frac{\sigma_2^2}{m}tr(QV_2) + c(n+m)$ with respect to $n$ and $m$, and estimator $W_{n,m}$, is a minimax rule. Consequently, the stopping rule (4.3) can be interpreted as an empirical minimax stopping rule.
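The minimization defining this minimax fixed-sample rule separates in $n$ and $m$: treating $n$ as continuous, setting $\frac{d}{dn}\{(\sigma_1^2/n)tr(QV_1) + cn\} = 0$ gives $n^* = (\sigma_1^2\,tr(QV_1)/c)^{1/2}$, and similarly for $m^*$. A numerical sketch with illustrative values of our own:

```python
import numpy as np

sig1sq, sig2sq, c = 2.0, 3.0, 0.01     # illustrative variances and cost
t1, t2 = 4.0, 9.0                      # stand-ins for tr(Q V1), tr(Q V2)

n_star = np.sqrt(sig1sq * t1 / c)      # closed-form minimizer in n
m_star = np.sqrt(sig2sq * t2 / c)      # closed-form minimizer in m

# Each coordinate can be minimized separately on a fine grid.
grid = np.arange(1.0, 200.0, 0.01)
n_grid = grid[np.argmin(sig1sq * t1 / grid + c * grid)]
m_grid = grid[np.argmin(sig2sq * t2 / grid + c * grid)]

print(abs(n_grid - n_star) < 0.02, abs(m_grid - m_star) < 0.02)
```

Dividing the two closed forms also reproduces the target ratio $n^*/m^* = [\sigma_1^2\,tr(QV_1)/\{\sigma_2^2\,tr(QV_2)\}]^{1/2}$ that motivates the allocation rule (4.4).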
In Section 4.2 we show the dominance of a class of estimators over $W_{N,M}$ for the loss (4.1) under the stopping rule (4.3). In Section 4.3 we develop an asymptotic risk expansion, up to the second order term, for $W_{N,M}$ and for estimators belonging to the above class.

4.2 A Class of James-Stein Estimators Dominating $W_{N,M}$

In this section we consider the class of James-Stein estimators $\{\delta_{N,M}\}$, where