Citation
Asymptotically pointwise optimal stopping rules in multiparameter estimation

Material Information

Title:
Asymptotically pointwise optimal stopping rules in multiparameter estimation
Creator:
Hoekstra, Robert Michael, 1954-
Publication Date:

Subjects

Subjects / Keywords:
Bayes theorem ( jstor )
Bayesian networks ( jstor )
Martingales ( jstor )
Mathematical theorems ( jstor )
Mathematical vectors ( jstor )
Matrices ( jstor )
Perceptron convergence procedure ( jstor )
Random variables ( jstor )
Regression analysis ( jstor )
Statistical estimation ( jstor )

Record Information

Rights Management:
Copyright [name of dissertation author]. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Resource Identifier:
20382762 ( ALEPH )
21576609 ( OCLC )

Full Text

ASYMPTOTICALLY POINTWISE OPTIMAL STOPPING RULES IN MULTIPARAMETER ESTIMATION

By

ROBERT MICHAEL HOEKSTRA

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

1989

To Leslie and Michelle

ACKNOWLEDGMENTS

As others have said before me, I would very much like to thank Dr. Malay Ghosh for being my dissertation advisor. His support was invaluable. I extend thanks to Dr. Joseph Glover, Dr. Ronald Randles, Dr. Murali Rao, and Dr. Dennis Wackerly for serving on my committee, and for their support. Also, I would like to thank Dr. Myron Chang for serving on my preliminary and final oral defense committees. On the practical side, thanks go to Cindy Zimmerman for taking one last dissertation through the word processor. Lastly, I thank the many people who seemed to think I would finish.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT

CHAPTERS

1 INTRODUCTION
  1.1 Sequential Estimation
  1.2 Hierarchical and Empirical Bayes Models
  1.3 Subject of Research
  1.4 Some Tools

2 ASYMPTOTICALLY POINTWISE OPTIMAL STOPPING RULES FOR THE ESTIMATION OF A VECTOR OF MEANS UNDER A HIERARCHICAL BAYES MODEL
  2.1 Introduction
  2.2 A.P.O. Rule Under a Hierarchical Bayes Model
  2.3 A.P.O. Rule Under an Improper Prior Model
  2.4 Summary and Comparison

3 ASYMPTOTICALLY POINTWISE OPTIMAL STOPPING RULES FOR THE ESTIMATION OF A VECTOR OF REGRESSION PARAMETERS UNDER A HIERARCHICAL BAYES MODEL
  3.1 Introduction
  3.2 A.P.O. Rule Under a Hierarchical Bayes Model
  3.3 A.P.O. Rule Under an Improper Prior Model
  3.4 Summary and Comparison

4 SUMMARY AND FUTURE RESEARCH
  4.1 Summary
  4.2 Further Research

APPENDICES

A DETAILS OF THEOREM 2.1
B DETAILS OF THEOREM 2.2
C DETAILS OF THEOREM 2.3
D DETAILS OF THEOREM 2.4
E DETAILS OF THEOREM 3.1
F DETAILS OF THEOREM 3.2
G DETAILS OF THEOREM 3.3
H DETAILS OF THEOREM 3.4

BIBLIOGRAPHY
BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

ASYMPTOTICALLY POINTWISE OPTIMAL STOPPING RULES IN MULTIPARAMETER ESTIMATION

By ROBERT MICHAEL HOEKSTRA

August, 1989

Chairman: Malay Ghosh
Cochairman: Dennis Wackerly
Major Department: Statistics

This dissertation is concerned with the sequential estimation of a multivariate normal mean vector and of a vector of regression parameters from a general linear model with multivariate normal error structure. In both cases the probabilistic setting is that of a hierarchical Bayes model. We propose approximations to the optimal sequential Bayes stopping rules associated with these problems. These approximations are asymptotically pointwise optimal (A.P.O.) in the sense of Bickel and Yahav (1967, Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 401-413), and are developed in that sense.

Using these A.P.O. stopping rules in conjunction with several natural estimators, we establish second order asymptotic Bayes risk expansions for the various estimation procedures. In addition we obtain second order risk expansions for the Bayes risks of the optimal procedures. These provide a standard for comparison of the performances of the approximate procedures in both the vector of means and vector of regression parameters cases.

We find the A.P.O. stopping rules, in conjunction with estimation by posterior mean vectors, yield asymptotically "nondeficient" procedures in the sense of Woodroofe (1981, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, pp. 331-341). We note the nature of the deficiencies associated with the other approximate procedures.

CHAPTER 1
INTRODUCTION

1.1 Sequential Estimation

It is well known that in sequential problems, once a Bayes rule tells one to stop, the Bayes action is independent of the stopping rule, and is the same as its fixed sample counterpart. However, the problem of determining Bayes stopping rules is usually more formidable, and although such stopping rules exist under fairly general conditions (see, e.g., Theorems 4.4 and 4.5 of Chow, Robbins and Siegmund, 1970), their exact determination is usually very difficult. Bickel and Yahav (1967, 1968, 1969a, 1969b), in a series of articles, have developed stopping rules which are asymptotically equivalent to Bayes stopping rules as $c$ (the cost per unit sample) goes to zero. They have called these rules "asymptotically pointwise optimal" (A.P.O.) rules. These authors have proved certain asymptotic optimality properties of the A.P.O. rules in sequential estimation and hypothesis testing for a general class of distributions including but not limited to the one-parameter exponential family.

In the context of sequential estimation, stronger asymptotic optimality results are proved by Woodroofe (1981) for the one-parameter exponential family. In particular, he has shown that A.P.O. rules are asymptotically "nondeficient," i.e., the difference between the Bayes risks of a Bayes estimator under the optimal Bayes rule and the A.P.O. rule is $o(c)$ as $c \to 0$. A similar result is proved by Woodroofe for estimation of the univariate normal mean with unknown variance. Woodroofe's results are extended by Rehalia (1984) without the assumption of conjugate priors. Recently, Finster (1987) has extended Woodroofe's results for a normal regression model. All of these results establish nondeficiency via second-order

asymptotic expansions of the Bayes risks of the various procedures. Our results continue in this vein.

We can illustrate the concepts discussed so far in an example which foreshadows the results of the dissertation. Suppose that given $\Theta = \theta$ and $R = r$, $X_1, X_2, \ldots$ are iid $N(\theta, r^{-1})$. Also, suppose that given $R = r$, $\Theta \sim N(m, (\lambda r)^{-1})$. Finally, suppose that $R \sim \text{Gamma}(\tfrac12 a, \tfrac12 b)$. That is, the $X_i$'s have a conjugate prior distribution. Suppose we want to estimate $\Theta$ sequentially where our loss function is squared error plus linear cost. A Bayes sequential estimation procedure exists under this scenario, but it is not immediately accessible. As an alternative, an A.P.O. stopping rule could be determined as follows. The posterior risk for this problem, using the posterior mean as estimator, say $\hat\theta_n$, is given by

\[
\begin{aligned}
E\big[(\Theta-\hat\theta_n)^2 \mid X_1,\ldots,X_n\big] + nc
&= E\big\{E\big[(\Theta-\hat\theta_n)^2 \mid M, R, X_1,\ldots,X_n\big] \mid X_1,\ldots,X_n\big\} + nc \\
&= E\big[((\lambda+n)R)^{-1} \mid X_1,\ldots,X_n\big] + nc \\
&= (\lambda+n)^{-1}E(R^{-1} \mid X_1,\ldots,X_n) + nc.
\end{aligned}
\]

Asymptotically pointwise optimal stopping rules derive from the fact that for large $n$, $E(R^{-1} \mid X_1,\ldots,X_n) \approx R^{-1}$, and hence that the posterior risk behaves as an essentially deterministic function with a computable minimum with respect to $n$. Differentiation yields that $(\lambda+n)^{-1}R^{-1} + nc$ is minimized at $n$ such that $R^{-1} = (\lambda+n)^2 c$. A stopping rule based on this idea could be "stop at the first $n$ such that $E(R^{-1} \mid X_1,\ldots,X_n) \le (\lambda+n)^2 c$." It turns out that this rule possesses asymptotic pointwise optimality in the following sense: suppose $T = T(c)$ is the rule described and $S = S(c)$ is any other stopping rule, then

\[
\lim_{c\to 0}\;\frac{E\big[(\Theta-\hat\theta_T)^2 \mid X_1,\ldots,X_T\big] + Tc}{E\big[(\Theta-\hat\theta_S)^2 \mid X_1,\ldots,X_S\big] + Sc} \le 1 \quad \text{a.s.}
\]
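As a rough numerical illustration of the rule just described, the following Python sketch computes $E(R^{-1} \mid X_1,\ldots,X_n)$ from the normal-gamma conjugate update and stops at the first $n$ for which it is at most $(\lambda+n)^2 c$. The closed form used for the posterior expectation, $\{a + \sum_i (x_i-\bar x_n)^2 + \lambda n(\lambda+n)^{-1}(\bar x_n - m)^2\}/(n+b-2)$, is our own working under the stated model and should be read as an assumption, as should the function names, which are purely illustrative.

```python
def expected_inverse_precision(n, total, total_sq, lam, m, a, b):
    """E(1/R | X_1..X_n) in the conjugate normal-gamma example (needs n + b > 2).

    Assumed closed form: (a + S_n + lam*n/(lam+n) * (xbar - m)^2) / (n + b - 2),
    where S_n is the within-sample sum of squares.
    """
    xbar = total / n
    s_n = max(total_sq - n * xbar * xbar, 0.0)      # guard against rounding below 0
    shrink = lam * n / (lam + n) * (xbar - m) ** 2
    return (a + s_n + shrink) / (n + b - 2.0)


def apo_stopping_time(observations, lam, m, a, b, c):
    """First n with E(1/R | data) <= (lam + n)^2 * c, or None if never reached."""
    total = 0.0
    total_sq = 0.0
    for n, x in enumerate(observations, start=1):
        total += x
        total_sq += x * x
        if n + b <= 2.0:
            continue                                 # posterior mean of 1/R not finite yet
        risk_scale = expected_inverse_precision(n, total, total_sq, lam, m, a, b)
        if risk_scale <= (lam + n) ** 2 * c:
            return n
    return None


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    theta, r = 1.0, 4.0                              # hypothetical "true" values
    draws = (rng.gauss(theta, r ** -0.5) for _ in range(100000))
    print(apo_stopping_time(draws, lam=1.0, m=0.0, a=2.0, b=3.0, c=1e-4))
```

For small $c$ the stopping time returned should be close to the deterministic minimizer, roughly $(R^{-1}/c)^{1/2} - \lambda$, which is the heuristic behind the rule.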

This property of $T$ follows because of the stability of $nE\big[(\Theta-\hat\theta_n)^2 \mid X_1,\ldots,X_n\big]$ as $n \to \infty$, and is proved in Bickel and Yahav (1967). As Woodroofe points out in his 1981 paper, an implication of being A.P.O. is that such a rule must be asymptotically equivalent to the Bayes stopping rule. It also follows from their arguments that any rule asymptotically equivalent to $T$ is also A.P.O. This is a weakness of the property and led Woodroofe to consider nondeficiency as a more refined approximation.

1.2 Hierarchical and Empirical Bayes Models

This dissertation concerns development of A.P.O. rules for hierarchical and empirical Bayes models. The idea behind a hierarchical Bayes model is that it is often convenient to model the subjective prior information in stages. In an empirical Bayes scenario, one can exploit relationships among the coordinates of a parameter vector, say $\theta = (\theta_1,\ldots,\theta_p)^T$, by first putting a prior distribution on $\theta$, and then estimating the prior parameters (usually smaller in number than $p$) from the joint marginal distribution of the observations. For example, in a field trial scenario, $\theta_1,\ldots,\theta_p$ might be mean yields associated with $p$ randomly selected plots. The $\theta$'s can be thought of as separate realizations of a common random variable with mean $\mu$. In this way, for drawing inference about a particular coordinate, say $\theta_i$, information from other coordinates is also used.

We also illustrate hierarchical and empirical Bayes models by example. Consider the following hierarchical Bayes model (Lindley and Smith, 1972). Suppose $X_1,\ldots,X_p$ are independent with distributions $N(\theta_i, 1)$ $(i = 1,\ldots,p)$, given the $\theta_i$'s. Also, suppose $\theta_1,\ldots,\theta_p$ are iid $N(\mu, 1)$. The hierarchical Bayes approach takes $\mu$ as unknown and places a prior distribution on it. We take the improper prior, $\mu \sim \text{uniform}(-\infty, \infty)$. The end result of this model is that the joint posterior distribution of $\theta_1,\ldots,\theta_p$, given $X_i = x_i$ $(i = 1,\ldots,p)$, is dependent. Specifically, $\theta = (\theta_1,\ldots,\theta_p)^T$, given $X = x = (x_1,\ldots,x_p)^T$, is distributed $N_p\big(\tfrac12 x + \tfrac12\bar x 1_p,\; \tfrac12 I_p + \tfrac{1}{2p}1_p1_p^T\big)$, where $\bar x = p^{-1}\sum_{i=1}^p x_i$ and $1_p = (1,\ldots,1)^T$.

For an example of the empirical Bayes approach, suppose we have, as before, $X_i$'s $\sim N(\theta_i, 1)$ and

$\theta_i$'s $\sim N(\mu, 1)$, iid. Under this model, marginally $X_1,\ldots,X_p$ are iid $N(\mu, 2)$. The unknown, $\mu$, is estimated from the data. We use $\bar x = p^{-1}\sum_{i=1}^p x_i$. This gives us an estimated posterior distribution for $\theta$. Specifically, given $X = x$, $\theta \sim N_p\big(\tfrac12 x + \tfrac12\bar x 1_p,\; \tfrac12 I_p\big)$. Note that in the two examples we have a common posterior mean, but the empirical model does not adjust for the error associated with estimating $\mu$.

The implication is that the hierarchical Bayes approach can be used effectively in an empirical Bayes set up, typically by putting a suitable prior distribution on $\theta$ at the first stage, and then putting another distribution on the prior parameters (usually referred to as hyperparameters) at the second stage. The second stage prior is intended to model the uncertainty in our knowledge of the hyperparameters.

1.3 Subject of Research

The subject of this research is the performance of A.P.O. stopping rules in the context of estimating both a vector of population means and a vector of regression parameters under hierarchical Bayes models.

In Chapter 2, we develop an A.P.O. rule for the estimation of a vector of means under a hierarchical Bayes model originally introduced by Lindley and Smith (1972). Our A.P.O. rule remains asymptotically nondeficient for a wide class of proper (bona fide distribution) priors on the hyperparameters. Further, we consider the performances of two other procedures in the context of the hierarchical model. The first is derived from calculations based on placing a diffuse or improper prior on the hyperparameters. This yields the same stopping rule but a different estimator for $\theta$. The second retains the original stopping rule and uses the sample mean vector to estimate $\theta$. We determine asymptotic risk expansions for all three procedures, and provide performance comparisons.

There is one article to date pertinent to the discussion of Chapter 2; that is, the estimation of means under hierarchical Bayes models, via A.P.O. stopping rules. Martinsek (1987) develops A.P.O. stopping rules under empirical Bayes models and applies them to

establish nondeficiency of a procedure for estimating a single normal mean. The corresponding features of our results are discussed in Chapter 2.

In Chapter 3, we develop an A.P.O. rule for the estimation of a vector of regression parameters from a general linear model with a hierarchical prior distribution structure. Again, our A.P.O. rule remains asymptotically nondeficient for a wide class of priors on the hyperparameters. We again consider the performances of two other procedures, under the proper hierarchical model: a diffuse prior procedure and a "classical" procedure which uses the weighted least squares estimate of $\beta$, the vector of regression parameters. Both of these use the original A.P.O. stopping rule. We determine asymptotic risk expansions for all three procedures, and provide performance comparisons. One should note that the regression model contains the vector of means model as a special case.

There is also one paper relevant to the discussion of Chapter 3. Finster (1987) considers a single-stage conjugate prior set-up for estimating $\beta$ under the same regression model that we use. He shows that the classical one-step look ahead, or myopic, stopping rule is A.P.O. and, in fact, nondeficient for estimating $\beta$, under a loss structure different from ours. The different loss and lack of hyperparameters make direct comparison tenuous, but Finster obtains a risk expansion which matches those obtained in Chapter 3 in their first order terms and in two second order terms. This correspondence is noted in Section 3.4.

1.4 Some Tools

We will be considering vector parameters associated with vector observations, and will be performing numerous matrix computations. Also, we will operate on numerous martingales and submartingales. At this point it is convenient to establish two lemmas. The first lemma is very useful in establishing inequalities and will typically be used without comment. Recall, for concreteness, that the spectral norm of a real, square matrix $A$ is given by $\{\text{largest eigenvalue of } A^TA\}^{1/2}$. It has the defining representation $\|A\| = \sup_{\|z\|=1}\|Az\|$, where $z$ is a vector, and the r.h.s. norms are Euclidean vector norms.
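To make the two descriptions of the spectral norm concrete, here is a small Python sketch for the $2 \times 2$ case. It computes the norm once from the largest eigenvalue of $A^TA$ and once by a grid search over unit vectors; the restriction to $2 \times 2$ matrices, the grid size, and the function names are simplifications of ours.

```python
import math


def spectral_norm_2x2(a):
    """Square root of the largest eigenvalue of A^T A for a 2x2 matrix A."""
    (a11, a12), (a21, a22) = a
    # Entries of the symmetric matrix G = A^T A.
    g11 = a11 * a11 + a21 * a21
    g12 = a11 * a12 + a21 * a22
    g22 = a12 * a12 + a22 * a22
    # Largest eigenvalue of a symmetric 2x2 matrix in closed form.
    half_trace = 0.5 * (g11 + g22)
    radius = math.hypot(0.5 * (g11 - g22), g12)
    return math.sqrt(half_trace + radius)


def sup_norm_on_unit_circle(a, steps=20000):
    """Grid approximation of sup ||A z|| over unit vectors z = (cos t, sin t)."""
    (a11, a12), (a21, a22) = a
    best = 0.0
    for k in range(steps):
        t = math.pi * k / steps          # a half circle suffices: ||A(-z)|| = ||Az||
        z1, z2 = math.cos(t), math.sin(t)
        best = max(best, math.hypot(a11 * z1 + a12 * z2, a21 * z1 + a22 * z2))
    return best
```

The grid value can never exceed the eigenvalue-based value, and the two agree to within the resolution of the grid.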

Lemma 1.1: Suppose we have two sequences of positive definite matrices, $\{A_n, n \ge 1\}$ and $\{B_n, n \ge 1\}$, such that $A_n \to A$, positive definite, and $B_n \to B$, positive definite, where the convergence is elementwise. Then, for any sequence of vectors, $x_n$, there exists a positive constant $k$ such that $x_n^TA_nx_n \le k\,x_n^TB_nx_n$ for all $n$.

Proof: The case of a null vector is immediate, so we assume $x_n \ne 0$, $n \ge 1$. Since elementwise convergence is equivalent to convergence in, for example, spectral norm, easy matrix norm properties can be used to show that $B_n^{-1}A_n \to B^{-1}A$, which has the same eigenvalues as the positive definite matrix $A^{1/2}B^{-1}A^{1/2}$. It follows that the supremum over $n$ of the largest eigenvalue of $B_n^{-1}A_n$ is finite. We take this number as $k$. The result follows from the fact that for any pair of positive definite matrices, $C$ and $D$,

\[
\sup_{x \ne 0}\frac{x^TCx}{x^TDx} \le \lambda,
\]

where $\lambda$ is the largest eigenvalue of $D^{-1}C$.

The second lemma is a collection of basic martingale results, which will be used without reference.
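The closing fact in the proof of Lemma 1.1 can be checked numerically in two dimensions. The sketch below, with names of our own choosing, finds the largest eigenvalue of $D^{-1}C$ as the largest root of $\det(C - tD) = 0$ and compares it with the ratio $x^TCx / x^TDx$ over a grid of directions. It is an illustration under the assumption of symmetric $2 \times 2$ inputs, not part of the proof.

```python
import math


def largest_generalized_eigenvalue(c, d):
    """Largest root t of det(C - t D) = 0, for symmetric 2x2 C and positive definite D."""
    (c11, c12), (_, c22) = c
    (d11, d12), (_, d22) = d
    det_c = c11 * c22 - c12 * c12
    det_d = d11 * d22 - d12 * d12
    # det(C - t D) = det_d * t^2 - mid * t + det_c
    mid = c11 * d22 + c22 * d11 - 2.0 * c12 * d12
    disc = max(mid * mid - 4.0 * det_c * det_d, 0.0)
    return (mid + math.sqrt(disc)) / (2.0 * det_d)


def quadratic_form(m, x):
    """x^T M x for a symmetric 2x2 matrix M."""
    (m11, m12), (_, m22) = m
    return m11 * x[0] ** 2 + 2.0 * m12 * x[0] * x[1] + m22 * x[1] ** 2


def max_ratio_on_grid(c, d, steps=3600):
    """Largest value of x^T C x / x^T D x over a grid of unit directions."""
    best = 0.0
    for k in range(steps):
        t = math.pi * k / steps
        x = (math.cos(t), math.sin(t))
        best = max(best, quadratic_form(c, x) / quadratic_form(d, x))
    return best
```

Because the ratio is invariant to rescaling $x$, searching over unit directions covers all nonzero vectors.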

Lemma 1.2: Suppose $\{S_n, \mathcal{F}_n, n \ge 1\}$ is a submartingale.

a) If $\mathcal{G}_n$ is a sequence of $\sigma$-fields such that $\mathcal{G}_n \subset \mathcal{F}_n$, $n \ge 1$, and $S_n$ is $\mathcal{G}_n$ measurable, then $\{S_n, \mathcal{G}_n, n \ge 1\}$ is a submartingale.

b) If $\phi$ is any real nondecreasing convex function with $E|\phi(S_n)| < \infty$, then $\{\phi(S_n), \mathcal{F}_n, n \ge 1\}$ is a submartingale. If $S_n$ is a martingale, then $\phi$ need only be convex.

c) If $\{S_n, n \ge 1\}$ are uniformly integrable, then $S_\infty = \lim S_n$ exists almost surely and $\{S_n, \mathcal{F}_n, 1 \le n \le \infty\}$ is a submartingale, where $\mathcal{F}_\infty = \sigma\big(\bigcup_{n=1}^\infty \mathcal{F}_n\big)$.

d) If $T$ is a finite stopping time with $E|S_T| < \infty$ and

\[
\lim_{n\to\infty}\int_{[T>n]}|S_n| = 0,
\]

then $E(S_T) \ge E(S_1)$. If $S_n$ is a martingale, then there is equality in the expression above.

Proof: See Chow and Teicher (1978), Chapter Seven.

CHAPTER 2
ASYMPTOTICALLY POINTWISE OPTIMAL STOPPING RULES FOR THE ESTIMATION OF A VECTOR OF MEANS UNDER A HIERARCHICAL BAYES MODEL

2.1 Introduction

In this chapter we consider the sequential estimation of a multivariate normal mean, $\Theta$, under a hierarchical Bayes model. Section 2.2 describes the model and develops the A.P.O. stopping rule, under a pure sequential sampling scheme. With these in place, we give, in Theorem 2.1, an asymptotic risk expansion for our sequential procedure (which is to stop according to the stopping rule, and estimate using the posterior mean). Next, in Theorem 2.2, we establish that this procedure is asymptotically nondeficient, by exhibiting an expansion for the sequential Bayes procedure.

In Section 2.3 we look at two procedures that use the same stopping rule as before, but which estimate differently. The first estimates via a posterior mean from a model with an improper prior on the hyperparameters. The second estimates by the sample mean vector, which arises as a limit of the posterior mean. These two procedures are evaluated under the original hierarchical model, with the results given in Theorems 2.3 and 2.4. In Section 2.4 we compare the performances of our procedures.

2.2 A.P.O. Rule Under a Hierarchical Bayes Model

Consider the following hierarchical Bayes model. Suppose that conditional on $\Theta = \theta = (\theta_1,\ldots,\theta_p)^T$ $(p \ge 2)$ and $R = r$, $X_1, X_2, \ldots$ are iid $N_p(\theta, r^{-1}\Sigma)$, where $\Sigma$ is a known

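positive definite (p.d.) matrix. Suppose also that conditional on $M = m$ and $R = r$, $\Theta \sim N_p\big(m1_p, (\lambda r)^{-1}D\big)$, where $1_p$ is a $p$-component column vector with all elements equal to 1, $\lambda$ $(> 0)$ is known, and $D$ is a known p.d. matrix. It is assumed that marginally $M$ and $R$ are independently distributed with $M$ having a proper pdf $g(m)$ on $(-\infty, \infty)$ such that $\int_{-\infty}^{\infty} m^2 g(m)\,dm < \infty$, while $R \sim \text{Gamma}(\tfrac12 a, \tfrac12 b)$ with $a > 0$ and $b > 0$. In the above and in what follows, we say that $Z \sim \text{Gamma}(\alpha, \beta)$, $\alpha > 0$, $\beta > 0$, when $Z$ has pdf $f(z) = \exp(-\alpha z)z^{\beta-1}\alpha^\beta/\Gamma(\beta)$, $z > 0$.

In order to motivate the A.P.O. rule, first we need to find the posterior distribution of $\Theta$ given $X_i = x_i$ $(i = 1,\ldots,n)$. Note that the joint pdf of $X_1,\ldots,X_n$, $\Theta$, $M$ and $R$ is given by

\[
\begin{aligned}
f(x_1,\ldots,x_n,\theta,m,r) \propto{}& r^{\frac12 np}\exp\Big[-\tfrac12 r\sum_{i=1}^n (x_i-\theta)^T\Sigma^{-1}(x_i-\theta)\Big] \\
&\times r^{\frac12 p}\exp\Big[-\tfrac12\lambda r(\theta-m1_p)^TD^{-1}(\theta-m1_p)\Big] \\
&\times \exp(-\tfrac12 ar)\,r^{\frac12 b-1}g(m).
\end{aligned}
\tag{2.2.1}
\]

Next write

\[
\sum_{i=1}^n (x_i-\theta)^T\Sigma^{-1}(x_i-\theta) = \sum_{i=1}^n (x_i-\bar x_n)^T\Sigma^{-1}(x_i-\bar x_n) + n(\bar x_n-\theta)^T\Sigma^{-1}(\bar x_n-\theta),
\tag{2.2.2}
\]

where $\bar x_n = n^{-1}\sum_{i=1}^n x_i$. Also, it follows after some simplifications that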

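\[
\begin{aligned}
& n(\bar x_n-\theta)^T\Sigma^{-1}(\bar x_n-\theta) + \lambda(\theta-m1_p)^TD^{-1}(\theta-m1_p) \\
&\quad= \big[\theta - (n\Sigma^{-1}+\lambda D^{-1})^{-1}(n\Sigma^{-1}\bar x_n + \lambda mD^{-1}1_p)\big]^T(n\Sigma^{-1}+\lambda D^{-1}) \\
&\qquad\times\big[\theta - (n\Sigma^{-1}+\lambda D^{-1})^{-1}(n\Sigma^{-1}\bar x_n + \lambda mD^{-1}1_p)\big] \\
&\qquad+ \Big\{n\bar x_n^T\Sigma^{-1}\bar x_n + \lambda m^2 1_p^TD^{-1}1_p \\
&\qquad\quad - (n\Sigma^{-1}\bar x_n + \lambda mD^{-1}1_p)^T(n\Sigma^{-1}+\lambda D^{-1})^{-1}(n\Sigma^{-1}\bar x_n + \lambda mD^{-1}1_p)\Big\}.
\end{aligned}
\tag{2.2.3}
\]

Next, using Exercise 2.9 (p. 33) of Rao (1973), one gets

\[
(n\Sigma^{-1}+\lambda D^{-1})^{-1} = n^{-1}\Sigma - n^{-2}\Sigma(n^{-1}\Sigma+\lambda^{-1}D)^{-1}\Sigma;
\tag{2.2.4}
\]

\[
= \lambda^{-1}D - \lambda^{-2}D(n^{-1}\Sigma+\lambda^{-1}D)^{-1}D;
\tag{2.2.5}
\]

\[
= (n\lambda)^{-1}D(n^{-1}\Sigma+\lambda^{-1}D)^{-1}\Sigma.
\tag{2.2.6}
\]

Using (2.2.4)-(2.2.6), one gets, after some simplifications,

\[
\text{"the term within braces in the right hand side of (2.2.3)"} = (\bar x_n - m1_p)^T(n^{-1}\Sigma+\lambda^{-1}D)^{-1}(\bar x_n - m1_p).
\tag{2.2.7}
\]

Now, writing

\[
s_{n1} = \sum_{i=1}^n (x_i-\bar x_n)^T\Sigma^{-1}(x_i-\bar x_n)
\quad\text{and}\quad
s_{n2}(m) = (\bar x_n - m1_p)^T(n^{-1}\Sigma+\lambda^{-1}D)^{-1}(\bar x_n - m1_p),
\]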

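(and letting $S_{n1}$ and $S_{n2}$ denote the corresponding random variables) one gets from (2.2.1)-(2.2.3) and (2.2.7),

\[
\begin{aligned}
f(x_1,\ldots,x_n,\theta,m,r) \propto{}& r^{\frac12 p}\exp\Big[-\tfrac12 r\big\{\theta - (n\Sigma^{-1}+\lambda D^{-1})^{-1}(n\Sigma^{-1}\bar x_n + \lambda mD^{-1}1_p)\big\}^T(n\Sigma^{-1}+\lambda D^{-1}) \\
&\qquad\times\big\{\theta - (n\Sigma^{-1}+\lambda D^{-1})^{-1}(n\Sigma^{-1}\bar x_n + \lambda mD^{-1}1_p)\big\}\Big] \\
&\times r^{\frac12(np+b)-1}\exp\big[-\tfrac12 r\big(s_{n1}+s_{n2}(m)+a\big)\big]\,g(m).
\end{aligned}
\tag{2.2.8}
\]

Formula (2.2.8) has several important consequences. First, conditional on $X_i = x_i$ $(i = 1,\ldots,n)$, $M = m$, and $R = r$,

\[
\Theta \sim N_p\Big[(n\Sigma^{-1}+\lambda D^{-1})^{-1}(n\Sigma^{-1}\bar x_n + \lambda mD^{-1}1_p),\; r^{-1}(n\Sigma^{-1}+\lambda D^{-1})^{-1}\Big].
\tag{2.2.9}
\]

Second, the joint marginal pdf of $X_1,\ldots,X_n$, $M$, and $R$ is given by

\[
f(x_1,\ldots,x_n,m,r) \propto \exp\big[-\tfrac12 r\big(s_{n1}+s_{n2}(m)+a\big)\big]\,r^{\frac12(np+b)-1}g(m).
\tag{2.2.10}
\]

Hence, conditional on $X_i = x_i$ $(i = 1,\ldots,n)$ and $M = m$,

\[
R \sim \text{Gamma}\big(\tfrac12(s_{n1}+s_{n2}(m)+a),\; \tfrac12(np+b)\big).
\tag{2.2.11}
\]

Finally, the joint marginal pdf of $X_1,\ldots,X_n$, and $M$ is given by

\[
f(x_1,\ldots,x_n,m) \propto \big[s_{n1}+s_{n2}(m)+a\big]^{-\frac12(np+b)}g(m).
\tag{2.2.12}
\]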
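The three equivalent forms (2.2.4)-(2.2.6) of $(n\Sigma^{-1}+\lambda D^{-1})^{-1}$ are easy to confirm numerically. The following Python sketch does so for $p = 2$ with small hand-written matrix helpers; the helper names and the restriction to $2 \times 2$ matrices are our own simplifications, chosen only to keep the example self-contained.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mul2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def lin2(alpha, x, beta, y):
    """Entrywise alpha * X + beta * Y."""
    return [[alpha * x[i][j] + beta * y[i][j] for j in range(2)] for i in range(2)]


def posterior_scale_forms(n, lam, sigma, d):
    """Return the direct inverse and the right-hand sides of (2.2.4), (2.2.5), (2.2.6)."""
    direct = inv2(lin2(n, inv2(sigma), lam, inv2(d)))
    mid = inv2(lin2(1.0 / n, sigma, 1.0 / lam, d))   # (Sigma/n + D/lam)^{-1}
    form_224 = lin2(1.0 / n, sigma, -1.0 / n ** 2, mul2(mul2(sigma, mid), sigma))
    form_225 = lin2(1.0 / lam, d, -1.0 / lam ** 2, mul2(mul2(d, mid), d))
    form_226 = lin2(1.0 / (n * lam), mul2(mul2(d, mid), sigma), 0.0, sigma)
    return direct, form_224, form_225, form_226


def max_abs_diff(x, y):
    """Largest entrywise absolute difference between two 2x2 matrices."""
    return max(abs(x[i][j] - y[i][j]) for i in range(2) for j in range(2))
```

All four matrices returned by `posterior_scale_forms` should agree up to rounding error for any positive definite `sigma` and `d`.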

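Consider now the problem of sequentially estimating $\Theta$ by an estimate $e$ under squared error loss, and cost $c$ $(> 0)$ per unit sample. Let $\mathcal{A}_0$ denote the trivial $\sigma$-algebra and $\mathcal{A}_n$ the $\sigma$-algebra generated by $X_1,\ldots,X_n$ $(n \ge 1)$. The Bayes sequential decision problem is to find a stopping time $\tau$ and an $\mathcal{A}_\tau$-measurable function $e_\tau = e_\tau(X_1,\ldots,X_\tau)$ for which $E\big[\|\Theta - e_\tau\|^2 + c\tau\big]$ is minimized. For every stopping time $\tau$, the Bayes risk is minimized by $e_\tau = E(\Theta \mid \mathcal{A}_\tau) = \hat\theta_\tau$, say, where it follows from (2.2.9) that for every $n \ge 1$,

\[
\hat\theta_n = E(\Theta \mid \mathcal{A}_n) = (n\Sigma^{-1}+\lambda D^{-1})^{-1}\big(n\Sigma^{-1}\bar X_n + \lambda E(M \mid \mathcal{A}_n)D^{-1}1_p\big).
\tag{2.2.13}
\]

Also $\hat\theta_0 = E(\Theta \mid \mathcal{A}_0) = E(M \mid \mathcal{A}_0)1_p = (EM)1_p$. Next, observe that if $b > 2$ in the Gamma prior of $R$,

\[
\begin{aligned}
E\big[\|\Theta-\hat\theta_n\|^2 \mid \mathcal{A}_n\big]
&= \operatorname{tr}\{V(\Theta \mid \mathcal{A}_n)\} \\
&= \operatorname{tr}\big\{E[V(\Theta \mid \mathcal{A}_n, M, R) \mid \mathcal{A}_n] + V[E(\Theta \mid \mathcal{A}_n, M, R) \mid \mathcal{A}_n]\big\} \\
&= \operatorname{tr}\big\{E\big[R^{-1}(n\Sigma^{-1}+\lambda D^{-1})^{-1} \mid \mathcal{A}_n\big] \\
&\qquad + V\big[(n\Sigma^{-1}+\lambda D^{-1})^{-1}(n\Sigma^{-1}\bar X_n + \lambda MD^{-1}1_p) \mid \mathcal{A}_n\big]\big\} \\
&= \operatorname{tr}\big\{(n\Sigma^{-1}+\lambda D^{-1})^{-1}\big\}E[R^{-1} \mid \mathcal{A}_n] \\
&\qquad + \operatorname{tr}\big\{(n\Sigma^{-1}+\lambda D^{-1})^{-1}(\lambda D^{-1}1_p)(\lambda D^{-1}1_p)^T(n\Sigma^{-1}+\lambda D^{-1})^{-1}\big\}V[M \mid \mathcal{A}_n].
\end{aligned}
\tag{2.2.14}
\]

Write $F_n = (\Sigma^{-1}+\lambda n^{-1}D^{-1})^{-1}$ and $G = D^{-1}1_p1_p^TD^{-1}$. Note that

\[
E[R^{-1} \mid \mathcal{A}_n, M] = \frac{S_{n1}+S_{n2}(M)+a}{np+b-2}.
\tag{2.2.15}
\]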
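The matrix $F_n$ just defined tends to $\Sigma$ as $n$ grows, with $n(F_n - \Sigma) \to -\lambda\Sigma D^{-1}\Sigma$; this is the source of the $\lambda\operatorname{tr}(\Sigma D^{-1}\Sigma)$ term in the risk expansions. A minimal Python sketch of this rate, assuming for simplicity that $\Sigma$ and $D$ are diagonal (an assumption of ours, made so that no matrix library is needed), is:

```python
def trace_f_n(n, lam, sigma_diag, d_diag):
    """tr(F_n) for diagonal Sigma and D, with F_n = (Sigma^{-1} + lam * D^{-1} / n)^{-1}."""
    return sum(1.0 / (1.0 / s + lam / (n * d)) for s, d in zip(sigma_diag, d_diag))


def scaled_trace_gap(n, lam, sigma_diag, d_diag):
    """n * (tr F_n - tr Sigma), which should approach -lam * tr(Sigma D^{-1} Sigma)."""
    return n * (trace_f_n(n, lam, sigma_diag, d_diag) - sum(sigma_diag))


def limiting_trace_gap(lam, sigma_diag, d_diag):
    """-lam * tr(Sigma D^{-1} Sigma) in the diagonal case."""
    return -lam * sum(s * s / d for s, d in zip(sigma_diag, d_diag))
```

Evaluating `scaled_trace_gap` for increasing `n` shows the gap to `limiting_trace_gap` shrinking at rate $n^{-1}$.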

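It follows now from (2.2.14) and (2.2.15) that

\[
\begin{aligned}
E\big[\|\Theta-\hat\theta_n\|^2 \mid \mathcal{A}_n\big]
&= \frac{n^{-1}\operatorname{tr}(F_n)\big\{S_{n1}+E[S_{n2}(M) \mid \mathcal{A}_n]+a\big\}}{np+b-2} + n^{-2}\lambda^2\operatorname{tr}(F_nGF_n)V[M \mid \mathcal{A}_n] \\
&= n^{-1}U_n + W_n \quad\text{(say)},
\end{aligned}
\tag{2.2.16}
\]

where

\[
U_n = \frac{\operatorname{tr}(\Sigma)S_{n1}}{np+b-2},
\tag{2.2.17}
\]

\[
W_n = \frac{n^{-1}\operatorname{tr}(F_n-\Sigma)S_{n1}}{np+b-2} + \frac{n^{-1}\operatorname{tr}(F_n)\big\{E[S_{n2}(M) \mid \mathcal{A}_n]+a\big\}}{np+b-2} + n^{-2}\lambda^2\operatorname{tr}(F_nGF_n)V[M \mid \mathcal{A}_n].
\tag{2.2.18}
\]

We shall now show that $n^{-1}U_n$ is the dominating term in (2.2.16). To see this, first observe that using the strong law of large numbers and regularity of conditional probability measures, $RS_{n1}/(np+b-2) \to 1$ a.s. (P) as $n \to \infty$, where $P$ denotes the probability measure on $(\Omega, \mathcal{F})$, the basic probability space on which all the random variables are defined. It remains to show that $n$ times $W_n$ converges to zero in probability (P) as $n \to \infty$. Since $F_n = \Sigma + O(n^{-1})$, $n$ times the first term in the right hand side of (2.2.18) converges to zero a.s. (P) as $n \to \infty$. Since $E(M^2) < \infty$, $\{E[M^2 \mid \mathcal{A}_n], n \ge 1\}$ and $\{E[M \mid \mathcal{A}_n], n \ge 1\}$ are uniformly integrable martingales. Hence $V[M \mid \mathcal{A}_n] \to V[M \mid \mathcal{A}_\infty]$ a.s. (P), where $\mathcal{A}_\infty = \sigma(X_1, X_2, \ldots)$. Also, $V[M \mid \mathcal{A}_\infty]$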

converges to zero a.s. as $n \to \infty$. Finally, since $E(S_{n2}(M)) = E\{R^{-1}E[RS_{n2}(M) \mid R, M]\} = E\{R^{-1}p\} = pa/(b-2)$, it follows that $E[S_{n2}(M) \mid \mathcal{A}_n] = O_p(1)$. Hence $n$ times the second term in the right hand side of (2.2.18) converges to zero in probability. Thus, $nW_n \to 0$ as $n \to \infty$.

Next, notice that using (2.2.16),

\[
E\big[\|\Theta-\hat\theta_\tau\|^2 + c\tau\big] = E\big[\tau^{-1}U_\tau + W_\tau + c\tau\big].
\tag{2.2.19}
\]

Explicit determination of $\tau$ minimizing the right hand side of (2.2.19) is quite formidable. As a good approximation to the Bayes stopping rule, we neglect the middle term, $W_\tau$, and follow Bickel and Yahav (1967, 1968) to define the A.P.O. rule $T = T_c = \inf\{n \ge n_0 : U_n$

Lemma 2.1: Suppose $b > 2$. Then

(i) $U_T \to \operatorname{tr}(\Sigma)R^{-1}$ a.s. (P) as $c \to 0$;

(ii) $E(U_T) \to \operatorname{tr}(\Sigma)E(R^{-1})$ as $c \to 0$;

(iii) $c^{1/2}T \to \{\operatorname{tr}(\Sigma)\}^{1/2}R^{-1/2}$ a.s. (P) as $c \to 0$;

(iv) $E(c^{1/2}T) \to \{\operatorname{tr}(\Sigma)\}^{1/2}E(R^{-1/2})$ as $c \to 0$.

Proof: First we prove (i). Write, using the Helmert orthogonal transformation, $Y_i = [X_1 + \cdots + X_{i-1} - (i-1)X_i]/\{i(i-1)\}^{1/2}$, $i = 2, 3, \ldots$. Then one has $S_{n1} = \sum_{i=2}^n Y_i^T\Sigma^{-1}Y_i$. Note that conditional on $R = r$, the $Y_i$'s are iid $N_p(0, r^{-1}\Sigma)$, so that the $Y_i^T\Sigma^{-1}Y_i$ are iid $r^{-1}\chi^2_p$. Hence, using the strong law of large numbers, $U_n \to r^{-1}\operatorname{tr}(\Sigma)$ a.s. with respect to the conditional probability measure, as $n \to \infty$. The conditional measure being regular, $U_n \to R^{-1}\operatorname{tr}(\Sigma)$ a.s. (P) as $n \to \infty$. In addition, since $T \to \infty$ a.s. (P) as $c \to 0$, one gets part (i) of the lemma.

To prove part (ii), first observe that, conditional on $R = r$ $(> 0)$, $S_{n1}/(n-1)p$ is the average of $(n-1)$ iid random variables each with first moment $r^{-1}$. An application of Doob's maximal inequality for backward martingales now provides

\[
E\Big[\sup_{n \ge n_0} S_{n1}/(n-1)p \;\Big|\; R = r\Big] \le kr^{-1} \quad\text{a.e.}
\]

for $n_0 \ge 2$, where $k$ $(> 0)$ is a generic constant which may depend on $n_0$ and $p$, but not on $r$. Now since $E[R^{-1}] < \infty$, one gets $E\big[\sup_{n \ge n_0} S_{n1}/(n-1)p\big] < \infty$ for $n_0 \ge 2$. This implies immediately that $E\big[\sup_{n \ge n_0} U_n\big] < \infty$ for $n_0 \ge 2$. Using part (i) of the lemma and the dominated convergence theorem, one gets part (ii) of the lemma.

Next, to prove parts (iii) and (iv), one uses the inequalities $U_T^{1/2} \le c^{1/2}T \le c^{1/2} + U_{T-1}^{1/2}$ (defining $U_0 = 0$) as well as parts (i) and (ii) of the lemma.
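Part (iii) of Lemma 2.1 can be explored by simulation. The sketch below is hedged in two ways: it assumes the rule has the form "stop at the first $n \ge n_0$ with $U_n \le cn^2$", by analogy with the rule (2.2.34) given in Remark 2.2, and it takes $\Sigma = I_p$ so that $U_n = pS_{n1}/(np+b-2)$ can be updated coordinate by coordinate. The function name, the seed handling, and the fixed values of $\theta$ and $r$ are illustrative choices of ours.

```python
import math
import random


def apo_time_identity_cov(theta, r, c, b, n0=2, seed=0, n_max=10 ** 6):
    """First n >= n0 with U_n <= c * n^2, assuming Sigma = I_p.

    theta and r are held fixed (as if conditioning on them); observations are
    drawn as N_p(theta, I_p / r). Returns None if n_max is reached first.
    """
    rng = random.Random(seed)
    p = len(theta)
    sd = 1.0 / math.sqrt(r)
    mean = [0.0] * p      # running coordinate means
    s_n1 = 0.0            # running within-sample sum of squares, all coordinates
    for n in range(1, n_max + 1):
        for j in range(p):
            x = rng.gauss(theta[j], sd)
            delta = x - mean[j]
            mean[j] += delta / n
            s_n1 += delta * (x - mean[j])    # Welford update of the sum of squares
        u_n = p * s_n1 / (n * p + b - 2.0)
        if n >= n0 and u_n <= c * n * n:
            return n
    return None


if __name__ == "__main__":
    p, r = 3, 4.0
    for cost in (1e-2, 1e-3, 1e-4):
        t = apo_time_identity_cov([0.0] * p, r, cost, b=3.0, n0=5)
        # Lemma 2.1 (iii) suggests sqrt(c) * T should approach sqrt(p / r).
        print(cost, t, math.sqrt(cost) * t, math.sqrt(p / r))
```

As $c$ decreases, the printed values of $c^{1/2}T$ should settle near $\{\operatorname{tr}(\Sigma)/r\}^{1/2}$.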

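Lemma 2.2: Suppose $E[M^2] < \infty$. Then $V(M \mid \mathcal{A}_T) \to V(M \mid \Theta, R)$ a.s. (P) as $c \to 0$.

Proof: First note that

\[
\begin{aligned}
V(M \mid \mathcal{A}_n) &= E[M^2 \mid \mathcal{A}_n] - E^2[M \mid \mathcal{A}_n] \\
&= E\{E[M^2 \mid \mathcal{A}_n, \Theta, R] \mid \mathcal{A}_n\} - E^2\{E[M \mid \mathcal{A}_n, \Theta, R] \mid \mathcal{A}_n\}
\end{aligned}
\tag{2.2.21}
\]

\[
= E\{E[M^2 \mid \Theta, R] \mid \mathcal{A}_n\} - E^2\{E[M \mid \Theta, R] \mid \mathcal{A}_n\}
\tag{2.2.22}
\]

\[
\to E\{E[M^2 \mid \Theta, R] \mid \mathcal{A}_\infty\} - E^2\{E[M \mid \Theta, R] \mid \mathcal{A}_\infty\}
\tag{2.2.23}
\]

\[
= E[M^2 \mid \Theta, R] - E^2[M \mid \Theta, R] = V(M \mid \Theta, R) \quad\text{a.s. (P) as } n \to \infty.
\tag{2.2.24}
\]

Equation (2.2.22) follows upon examination of the joint distribution of $X_1,\ldots,X_n$, $M$, $\Theta$, and $R$ (see (2.2.1)); (2.2.23) follows from the fact that given $E[M^2] < \infty$, $V(M \mid \mathcal{A}_n)$ is a positive supermartingale and thus a.s. convergent; and (2.2.24) follows since $\Theta$ and $R$ are $\mathcal{A}_\infty$ measurable (by the a.s. (P) convergence of $\bar X_n$ to $\Theta$ and of $U_n^{-1}\operatorname{tr}(\Sigma)$ to $R$), which implies that $E[M^2 \mid \Theta, R]$ and $E[M \mid \Theta, R]$ are $\mathcal{A}_\infty$ measurable. Now using $T \to \infty$ a.s. (P) as $c \to 0$, the lemma follows from (2.2.24).

The actual expression for $V(M \mid \Theta, R)$, though involved, can easily be computed from the joint distribution of $\Theta$, $M$, and $R$.

We now proceed to prove the first main theorem of this section, which provides a Bayes risk expansion of $\hat\theta_T$ in the form $\omega_1c^{1/2} + \omega_2c + o(c)$. (Note: Henceforth all almost sure probability statements will be w.r.t. $P$, unless otherwise stated.)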

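Theorem 2.1: If $(n_0-1)p > 9$, $b > 2$, and $E(M^4) < \infty$, then

\[
\begin{aligned}
E\big[\|\Theta-\hat\theta_T\|^2 + cT\big]
={}& 2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}\big\{\Gamma(\tfrac12(b-1))/\Gamma(\tfrac12 b)\big\}(a/2)^{1/2} \\
&+ c\Big[(2p)^{-1} - \lambda\operatorname{tr}(\Sigma D^{-1}\Sigma)/\operatorname{tr}(\Sigma) + \lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}(\Sigma)\}E\{RV(M \mid \Theta, R)\}\Big] + o(c)
\end{aligned}
\tag{2.2.25}
\]

as $c \to 0$, where we may recall that $G = D^{-1}1_p1_p^TD^{-1}$.

Proof: Let $W_n = E[R^{-1} \mid \mathcal{A}_n]$. Using (2.2.14) we write

\[
\begin{aligned}
E\big[\|\Theta-\hat\theta_T\|^2 + cT\big]
&= E\big\{E\big[\|\Theta-\hat\theta_T\|^2 \mid \mathcal{A}_T\big] + cT\big\} \\
&= E\big[T^{-1}(\operatorname{tr}F_T)W_T + T^{-2}\lambda^2\operatorname{tr}(F_TGF_T)V(M \mid \mathcal{A}_T) + cT\big]
\end{aligned}
\tag{2.2.26}
\]

\[
= E\big[T^{-1}(\operatorname{tr}\Sigma)W_T + T^{-1}\big(\operatorname{tr}(F_T)-\operatorname{tr}(\Sigma)\big)W_T + T^{-2}\lambda^2\operatorname{tr}(F_TGF_T)V(M \mid \mathcal{A}_T) + cT\big]
\tag{2.2.27}
\]

\[
\begin{aligned}
= E\Big[& 2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}E(R^{-1/2} \mid \mathcal{A}_T) + 2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}\big\{W_T^{1/2} - E(R^{-1/2} \mid \mathcal{A}_T)\big\} \\
&+ T^{-1}\big\{(\operatorname{tr}\Sigma)^{1/2}W_T^{1/2} - c^{1/2}T\big\}^2 + T^{-1}\big[\operatorname{tr}(F_T)-\operatorname{tr}(\Sigma)\big]W_T \\
&+ T^{-2}\lambda^2\operatorname{tr}(F_TGF_T)V(M \mid \mathcal{A}_T)\Big].
\end{aligned}
\tag{2.2.28}
\]

Note that in going from (2.2.27) to (2.2.28) we have applied the basic algebraic identity in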

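Woodroofe (1981) to $T^{-1}(\operatorname{tr}\Sigma)W_T + cT$. In view of the decomposition in (2.2.28) we prove the theorem by showing that as $c \to 0$,

\[
E\big[2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}E(R^{-1/2} \mid \mathcal{A}_T)\big] = 2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}\big\{\Gamma(\tfrac12(b-1))/\Gamma(\tfrac12 b)\big\}(a/2)^{1/2};
\tag{2.2.29}
\]

\[
E\big[2c^{-1/2}(\operatorname{tr}\Sigma)^{1/2}\big\{W_T^{1/2} - E[R^{-1/2} \mid \mathcal{A}_T]\big\}\big] \to (2p)^{-1};
\tag{2.2.30}
\]

\[
E\big[c^{-1}T^{-1}(\operatorname{tr}F_T - \operatorname{tr}\Sigma)W_T\big] \to -\lambda\operatorname{tr}(\Sigma D^{-1}\Sigma)/\operatorname{tr}\Sigma;
\tag{2.2.31}
\]

\[
E\big[c^{-1}T^{-1}\big\{(\operatorname{tr}\Sigma)^{1/2}W_T^{1/2} - c^{1/2}T\big\}^2\big] \to 0;
\tag{2.2.32}
\]

\[
E\big[c^{-1}T^{-2}\lambda^2\operatorname{tr}(F_TGF_T)V(M \mid \mathcal{A}_T)\big] \to \lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}\Sigma\}E[RV(M \mid \Theta, R)].
\tag{2.2.33}
\]

This is done in Appendix A.

Remark 2.2. It is possible to modify the stopping rule given in (2.2.20) and get the same conclusion as given in Theorem 2.1. For example, defining $U_n^* = (\operatorname{tr}\Sigma)S_{n1}/(n-1)p$ $(n \ge 2)$, and the stopping rule $T^*$ as

\[
T^* = \inf\{n \ge n_0 : U_n^* \le cn^2\},
\tag{2.2.34}
\]

we get a Robbins-type stopping rule as proposed by Ghosh, Sinha, and Mukhopadhyay (1976) (see also Ghosh, Nickerson, and Sen, 1987). However, examining the proof of Theorem 2.1, it is clear that $E\big[\|\Theta-\hat\theta_{T^*}\|^2 + cT^*\big]$ has the same expansion as that given in (2.2.25), under the same conditions. Alternatively if we define

\[
T^* = \inf\{n \ge n_0 : (\operatorname{tr}\Sigma)W_n \le cn^2\},
\tag{2.2.35}
\]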

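a natural prior dependent stopping rule, we again get the same expansion, using arguments similar to those of Theorem 2.1.

We shall now prove that if $N = N_c$ denotes the Bayes stopping rule, then $E\big[\|\Theta-\hat\theta_N\|^2 + cN\big]$ has the same expression as given in the r.h.s. of (2.2.25).

Theorem 2.2: If $(n_0-1)p > 9$, $b > 2$ and $E(M^4) < \infty$, then for the Bayes stopping rule $N$,

\[
\begin{aligned}
E\big[\|\Theta-\hat\theta_N\|^2 + cN\big]
={}& 2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}\big\{\Gamma(\tfrac12(b-1))/\Gamma(b/2)\big\}(a/2)^{1/2} \\
&+ c\Big[(2p)^{-1} - \lambda\operatorname{tr}(\Sigma D^{-1}\Sigma)/\operatorname{tr}(\Sigma) + \lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}(\Sigma)\}E[RV(M \mid \Theta, R)]\Big] + o(c)
\end{aligned}
\tag{2.2.36}
\]

as $c \to 0$, where $G = D^{-1}1_p1_p^TD^{-1}$.

Proof: The decomposition of Theorem 2.1, equation (2.2.28), is valid here, with "N" replacing "T". It follows that to establish this theorem, it suffices to prove a set of asymptotic relationships as in (2.2.29)-(2.2.33):

\[
E\big[2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}E(R^{-1/2} \mid \mathcal{A}_N)\big] = 2c^{1/2}(\operatorname{tr}\Sigma)^{1/2}\big\{\Gamma(\tfrac12(b-1))/\Gamma(\tfrac12 b)\big\}(a/2)^{1/2},
\tag{2.2.37}
\]

\[
E\big[2c^{-1/2}(\operatorname{tr}\Sigma)^{1/2}\big\{W_N^{1/2} - E[R^{-1/2} \mid \mathcal{A}_N]\big\}\big] \to (2p)^{-1},
\tag{2.2.38}
\]

\[
E\big[c^{-1}N^{-1}(\operatorname{tr}F_N - \operatorname{tr}\Sigma)W_N\big] \to -\lambda\operatorname{tr}(\Sigma D^{-1}\Sigma)/\operatorname{tr}(\Sigma),
\tag{2.2.39}
\]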

\[
E\big[c^{-1}N^{-2}\lambda^2\operatorname{tr}(F_NGF_N)V(M \mid \mathcal{A}_N)\big] \to \lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}(\Sigma)\}E[RV(M \mid \Theta, R)],
\tag{2.2.40}
\]

\[
E\big[c^{-1}N^{-1}\big\{(\operatorname{tr}\Sigma)^{1/2}W_N^{1/2} - c^{1/2}N\big\}^2\big] \to 0, \quad\text{as } c \to 0.
\tag{2.2.41}
\]

Here, as in Theorem 2.1, we need the behavior of $N$ to even get started. Write $L_n(c) = E\big[\|\Theta-\hat\theta_n\|^2 \mid \mathcal{A}_n\big] + cn$. Following Bickel and Yahav (1967), it can be shown that for any stopping rule $\tau = \tau_c$, $L_\tau(c) \sim \inf_n L_n(c)$ if and only if $\tau_c \sim T_c$ a.s. as $c \to 0$. $N$ being the Bayes rule, we must have $N_c \sim T_c$ a.s. as $c \to 0$. Then, using Lemma 2.1, $cN^2 \to (\operatorname{tr}\Sigma)R^{-1}$ a.s. as $c \to 0$ ($N_c \to \infty$ a.s. as $c \to 0$). This behavior for $N$ is sufficient to establish the appropriate pointwise convergences associated with the integrands of (2.2.37)-(2.2.40). (See the proof of Theorem 2.1.) We will find that (2.2.41) is forced, given the other relationships, and the fact that $N$ is Bayes. It remains to establish uniform integrabilities. The details are provided in Appendix B.

Remark 2.3. It follows from Theorems 2.1 and 2.2 that the Bayes risks of the A.P.O. rule $T$ and the Bayes rule $N$ agree up to the coefficient of $c$. Thus the proposed A.P.O. rule is asymptotically nondeficient in the sense of Woodroofe (1981). It is apparent that the rule $T^*$ of Ghosh, Sinha, and Mukhopadhyay is also nondeficient.

We now turn to procedures that follow from a variant of the Section 2.2 model that puts an improper uniform prior on $M$, as is often done in hierarchical Bayes analysis (see, e.g., Lindley and Smith, 1972).

2.3 A.P.O. Rule Under an Improper Prior

Consider the following variant of the hierarchical Bayes model given in Section 2.2. As before, conditional on $\Theta = \theta$ and $R = r$, let $X_1, X_2, \ldots$ be iid $N_p(\theta, r^{-1}\Sigma)$, and conditional on $M = m$ and $R = r$, let $\Theta \sim N_p\big(m1_p, (\lambda r)^{-1}D\big)$, where $\lambda$ $(> 0)$ is known, and $\Sigma$ and $D$ are known positive definite matrices. Marginally $M$ and $R$ are assumed to be

independently distributed with $R \sim \text{Gamma}(\tfrac12 a, \tfrac12 b)$, but $M \sim \text{uniform}(-\infty, \infty)$. The resulting prior distribution for $\Theta$ will be improper, but the posterior distribution of $\Theta$ given $X_i = x_i$ $(1 \le i \le n)$ will be proper for every $n \ge 1$. It follows that the Bayes risk of any procedure under this model is infinite. We shall proceed formally, to develop our estimator and stopping rule, and then evaluate their performance in the context of the model of Section 2.2.

In order to motivate the A.P.O. rule we must first find the formal posterior distribution of $\Theta$ given $X_i = x_i$ $(1 \le i \le n)$. Note that the joint pdf of $X_1, X_2, \ldots, X_n$, $\Theta$, $M$ and $R$ is given by

\[
\begin{aligned}
f(x_1,\ldots,x_n,\theta,m,r) \propto{}& r^{\frac12 np}\exp\Big[-\tfrac12 r\sum_{i=1}^n (x_i-\theta)^T\Sigma^{-1}(x_i-\theta)\Big] \\
&\times r^{\frac12 p}\exp\Big[-\tfrac12\lambda r(\theta-m1_p)^TD^{-1}(\theta-m1_p)\Big]\exp(-\tfrac12 ar)\,r^{\frac12 b-1}.
\end{aligned}
\tag{2.3.1}
\]

Writing $C = D^{-1} - (1_p^TD^{-1}1_p)^{-1}D^{-1}1_p1_p^TD^{-1}$, it follows that

\[
(\theta-m1_p)^TD^{-1}(\theta-m1_p) = (1_p^TD^{-1}1_p)\big\{m - (1_p^TD^{-1}1_p)^{-1}(1_p^TD^{-1}\theta)\big\}^2 + \theta^TC\theta.
\tag{2.3.2}
\]

Note that $C$ is singular since $C1_p = 0$. Also $C$ is nonnegative definite (n.n.d.) since for every $p \times 1$ vector $a$,

\[
a^TCa = a^TD^{-1}a - \frac{(a^TD^{-1}1_p)^2}{1_p^TD^{-1}1_p} \ge 0,
\]

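using the Schwarz inequality. Thus $\Theta$ has an improper prior distribution, since the prior pdf is proportional to $\exp(-\tfrac12\lambda r\,\theta^TC\theta)$. Integrating the joint pdf given in (2.3.1) with respect to $m$, and using (2.3.2), one finds the joint pdf of $X_1,\ldots,X_n$, $\Theta$, and $R$ given by

\[
\begin{aligned}
f(x_1,\ldots,x_n,\theta,r) \propto{}& r^{\frac12 np}\exp\Big[-\tfrac12 r\sum_{i=1}^n (x_i-\theta)^T\Sigma^{-1}(x_i-\theta)\Big] \\
&\times r^{\frac12(p-1)}\exp\big[-\tfrac12\lambda r(\theta^TC\theta)\big]\exp(-\tfrac12 ar)\,r^{\frac12 b-1}.
\end{aligned}
\tag{2.3.3}
\]

Using (2.2.2) and writing $Q_n = n\Sigma^{-1} + \lambda C$, it follows after some algebra that

\[
n(\bar x_n-\theta)^T\Sigma^{-1}(\bar x_n-\theta) + \lambda\theta^TC\theta = \theta^TQ_n\theta - 2\theta^T\Sigma^{-1}(n\bar x_n) + n\bar x_n^T\Sigma^{-1}\bar x_n.
\tag{2.3.4}
\]

Since $\Sigma$ is p.d. and $C$ n.n.d., $Q_n$ is p.d. for every $n \ge 1$. Hence

\[
\text{r.h.s. of (2.3.4)} = \big[\theta - Q_n^{-1}\Sigma^{-1}(n\bar x_n)\big]^TQ_n\big[\theta - Q_n^{-1}\Sigma^{-1}(n\bar x_n)\big] + n\bar x_n^T\big(\Sigma^{-1} - n\Sigma^{-1}Q_n^{-1}\Sigma^{-1}\big)\bar x_n.
\tag{2.3.5}
\]

We write $s_{n1} = \sum_{i=1}^n (x_i-\bar x_n)^T\Sigma^{-1}(x_i-\bar x_n)$ and $s_{n2}^* = n\bar x_n^T\big(\Sigma^{-1} - n\Sigma^{-1}Q_n^{-1}\Sigma^{-1}\big)\bar x_n$, and denote by $S_{n1}$ and $S_{n2}^*$ the corresponding random variables. Then from (2.3.3)-(2.3.5) one gets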

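$$f(x_1,\ldots,x_n,\theta,r) \propto r^{p/2}\exp\left[-\tfrac{r}{2}\left\{\theta - Q_n^{-1}\Sigma^{-1}(n\bar{x}_n)\right\}^{T}Q_n\left\{\theta - Q_n^{-1}\Sigma^{-1}(n\bar{x}_n)\right\}\right] \times r^{\frac12(np+b-3)}\exp\left[-\tfrac{r}{2}\left(s_{n1}+s_{n2}^{*}+a\right)\right]. \tag{2.3.6}$$
Formula (2.3.6) leads to two important conclusions. First, conditional on $X_i = x_i$ ($i=1,\ldots,n$) and $R=r$, $\theta\sim N_p(\tilde\theta_n, r^{-1}Q_n^{-1})$, where $\tilde\theta_n = Q_n^{-1}\Sigma^{-1}(n\bar{x}_n)$. Note that $\tilde\theta_n$ does not depend on $r$. Second, observe that the joint marginal pdf of $X_1,\ldots,X_n$ and $R$ is given by
$$f(x_1,\ldots,x_n,r) \propto r^{\frac12(np+b-3)}\exp\left[-\tfrac{r}{2}\left(s_{n1}+s_{n2}^{*}+a\right)\right]. \tag{2.3.7}$$
It follows from (2.3.7) that the conditional pdf of $R$ given $X_i = x_i$ ($i=1,\ldots,n$) is $\mathrm{Gamma}\left(\tfrac12(s_{n1}+s_{n2}^{*}+a), \tfrac12(np+b-1)\right)$. Thus, formally,
$$E\left(\|\theta-\tilde\theta_n\|^{2}\mid\mathcal{A}_n\right) = \mathrm{tr}(Q_n^{-1})\,E[R^{-1}\mid\mathcal{A}_n] = \mathrm{tr}(Q_n^{-1})\left\{(S_{n1}+S_{n2}^{*}+a)/(np+b-3)\right\}. \tag{2.3.8}$$
The next step is to establish a stopping rule based on a dominant term from (2.3.8). To obtain such a dominant term we need a valid probability model. We assume the actual observations, $X_i$, $i\ge 1$, come from the model of Section 2.2. Then, via Lemma 2.1, $S_{n1}/(np+b-3)\to R^{-1}$ a.s. as $n\to\infty$. Also, writing $C = SS^{T}$, and applying Exercise 2.9, p. 33 of Rao (1973) again, one can write
$$S_{n2}^{*} = \lambda\bar{X}_n^{T}S\left(\lambda n^{-1}S^{T}\Sigma S + I\right)^{-1}S^{T}\bar{X}_n \le K\bar{X}_n^{T}\Sigma^{-1}\bar{X}_n. \tag{2.3.9}$$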

Thus $S_{n2}^{*}$ is $O_p(1)$ since $\bar{X}_n^{T}\Sigma^{-1}\bar{X}_n$ is a backward submartingale. Finally, $\mathrm{tr}(Q_n^{-1})\sim\mathrm{tr}(n^{-1}\Sigma)$. We have as a dominant term $n^{-1}(\mathrm{tr}\,\Sigma)\{S_{n1}/(np+b-3)\}$. But this is essentially the same as $n^{-1}U_n$ from Section 2.2, and so we propose the same stopping rule $T$ as given in (2.2.20). Thus, our procedure is to stop at time $T$, and then estimate $\theta$ by $\tilde\theta_T$.

Remark 2.4. In the special case when $\Sigma = D = I_p$, the above procedure can be compared to a recent A.P.O. rule of Martinsek (1987). To see this, first note that when $\Sigma = D = I_p$, using the formula (see Rao, 1973, p. 33, Exercise 2.8)
$$(A+uv^{T})^{-1} = A^{-1} - (1+v^{T}A^{-1}u)^{-1}(A^{-1}uv^{T}A^{-1}),$$
where both $A$ and $A+uv^{T}$ are invertible, one gets
$$(n\Sigma^{-1}+\lambda C)^{-1} = \left[(n+\lambda)I_p - \lambda p^{-1}1_p1_p^{T}\right]^{-1} = (n+\lambda)^{-1}\left[I_p + \lambda(np)^{-1}1_p1_p^{T}\right]. \tag{2.3.10}$$
Hence
$$\mathrm{tr}\left(\Sigma^{-1}+\lambda n^{-1}C\right)^{-1} = n(n+\lambda)^{-1}(p+\lambda n^{-1}) = (n+\lambda)^{-1}(np+\lambda). \tag{2.3.11}$$
Moreover the expression for $S_{n2}^{*}$ simplifies to

$$S_{n2}^{*} = n\bar{X}_n^{T}\left[I_p - n\left\{(n+\lambda)I_p - \lambda p^{-1}1_p1_p^{T}\right\}^{-1}\right]\bar{X}_n = (n+\lambda)^{-1}n\lambda\left[\|\bar{X}_n\|^{2} - p^{-1}(\bar{X}_n^{T}1_p)^{2}\right] = (n+\lambda)^{-1}n\lambda\sum_{j=1}^{p}(\bar{X}_{nj}-\bar{\bar{X}}_n)^{2}, \tag{2.3.12}$$
where $\bar{\bar{X}}_n = p^{-1}\sum_{j=1}^{p}\bar{X}_{nj}$. Using (2.3.11) and (2.3.12) one gets
$$E\left(\|\theta-\tilde\theta_n\|^{2}\mid\mathcal{A}_n\right) = n^{-1}(n+\lambda)^{-1}(np+\lambda)(np+b-3)^{-1}\left[\sum_{i=1}^{n}\|X_i-\bar{X}_n\|^{2} + (n+\lambda)^{-1}n\lambda\sum_{j=1}^{p}(\bar{X}_{nj}-\bar{\bar{X}}_n)^{2} + a\right]. \tag{2.3.13}$$
Identifying our $\lambda$, $a$, and $b$ with $l_0$, $b_0$ and $a_0$ respectively of Martinsek (1987), it follows that the rhs expression given in (2.3.13) is similar to the $U_n$ given in p. 133 of Martinsek (1987) when $p = 1$. One basic difference is that instead of $\bar{\bar{X}}_n$ in (2.3.13), Martinsek uses a $Y$ which is independent of the $X_i$'s, and uses this independence in his proof. The other major difference is that Martinsek (1987) uses the entire expression $U_n$ (similar to the rhs of (2.3.13)) given in his p. 133 in defining his A.P.O. rule, in the spirit of the rule proposed at (2.2.35). We use the dominant term $\sum_{i=1}^{n}\|X_i-\bar{X}_n\|^{2}/(np+b-2)$ in defining our rule. We shall also show that our procedure is not asymptotically more deficient than that of Martinsek (1987) because of the extra simplification in defining the A.P.O. rule (see Remark 2.5).

We now develop the asymptotic Bayes risk expansion of our "noninformative" procedure, say $(T,\tilde\theta_T)$, under the general model of Section 2.2.

Theorem 2.3: Consider the model as proposed in Section 2.2. If $(n_0-1)p > 9$, $b > 2$, and $E(M^{4})$

Theorems 2.1 and 2.3 involving $V(M\mid\theta,R)$ are zeroes),
$$c\lambda\,\mathrm{tr}\left(\Sigma D^{-1}1_p1_p^{T}D^{-1}\Sigma\right)/\left\{(\mathrm{tr}\,\Sigma)(1_p^{T}D^{-1}1_p)\right\} + o(c). \tag{2.3.18}$$
Further simplifying to the case $D = \Sigma = I_p$, (2.3.18) becomes $c\lambda p^{-1} + o(c)$. This agrees with the findings of Martinsek (1987). Thus, we may conclude that even for known $M$, the proposed diffuse prior procedure is first order efficient, but is not nondeficient unless $p = p_c\to\infty$ as $c\to 0$. The same phenomenon appears in Martinsek (1987).

Remark 2.6. At this point it is useful to consider the performance of $(T,\bar{X}_T)$ relative to $(T,\hat\theta_T)$ (or equivalently the $T^{*}$ case of Ghosh et al., 1976). We find that both procedures are first order efficient, but $\hat\theta_T$ has a larger second order efficiency than $\bar{X}_T$. This follows from the theorem below.

Theorem 2.4: Consider the hierarchical model proposed in Section 2.2. If $(n_0-1)p > 9$, $b > 2$, and $E(M^{4}) < \infty$, then
$$E\left[\|\theta-\bar{X}_T\|^{2} + cT\right] = 2c^{1/2}(\mathrm{tr}\,\Sigma)^{1/2}\left\{\Gamma(\tfrac12(b-1))/\Gamma(\tfrac12 b)\right\}(a/2)^{1/2} + c(2p)^{-1} + o(c) \quad\text{as } c\to 0.$$
Proof: Standard Bayesian calculations lead to
$$E\left[\|\theta-\bar{X}_T\|^{2} + cT\right] = E\left[\|\theta-\hat\theta_T\|^{2} + cT\right] + E\|\hat\theta_T-\bar{X}_T\|^{2}.$$
In view of Theorem 2.1, it suffices to show that

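$$c^{-1}E\|\hat\theta_T-\bar{X}_T\|^{2} \to \lambda\,\mathrm{tr}(\Sigma D^{-1}\Sigma)/\mathrm{tr}(\Sigma) - \lambda^{2}\{\mathrm{tr}(\Sigma G\Sigma)/\mathrm{tr}(\Sigma)\}E\{R\,V(M\mid\theta,R)\} \quad\text{as } c\to 0. \tag{2.3.19}$$
This is done in Appendix D.

2.4 Summary and Comparison

It is useful at this point to collect our results for an overall comparison. We have three "competing" procedures: $(T,\hat\theta_T)$, $(T,\tilde\theta_T)$ and $(T,\bar{X}_T)$. They can be viewed as methods based on decreasing awareness of, or confidence in, the "true" model. Their asymptotic performances take the following forms:
$$E\left[\|\theta-\hat\theta_T\|^{2}+cT\right] = \omega_1c^{1/2} + c\left[(2p)^{-1} - \lambda\,\mathrm{tr}(\Sigma D^{-1}\Sigma)/\mathrm{tr}(\Sigma) + \lambda^{2}\{\mathrm{tr}(\Sigma G\Sigma)/\mathrm{tr}(\Sigma)\}E[R\,V(M\mid\theta,R)]\right] + o(c)$$
$$= \omega_1c^{1/2} + c\left[(2p)^{-1} - \lambda\,\mathrm{tr}(\Sigma D^{-1}\Sigma)/\mathrm{tr}(\Sigma) + \lambda\{\mathrm{tr}(\Sigma(D^{-1}-C)\Sigma)/\mathrm{tr}(\Sigma)\}E[V(M\mid\theta,R)/V^{*}]\right] + o(c), \tag{2.4.1}$$
where $V^{*} = \{\lambda(1_p^{T}D^{-1}1_p)R\}^{-1} = V(M\mid\theta,R)$ under a Uniform$(-\infty,\infty)$ prior,
$$E\left[\|\theta-\tilde\theta_T\|^{2}+cT\right] = \omega_1c^{1/2} + c\left[(2p)^{-1} - \lambda\,\mathrm{tr}(\Sigma C\Sigma)/\mathrm{tr}(\Sigma)\right] + o(c), \tag{2.4.2}$$
$$E\left[\|\theta-\bar{X}_T\|^{2}+cT\right] = \omega_1c^{1/2} + c(2p)^{-1} + o(c). \tag{2.4.3}$$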

A comparison of the second expression for the risk of $(T,\hat\theta_T)$ and the expression for the risk of $(T,\tilde\theta_T)$ shows the effect of the "lost" information concerning $M$. If $g(m)$ expresses a precise knowledge of $M$, then $V(M\mid\theta,R)/V^{*}$ will tend to be close to zero. If $g(m)$ expresses vague knowledge of $M$, then $V(M\mid\theta,R)/V^{*}$ will tend to be close to one, in which case the two risk expansions would coincide.

A comparison of the asymptotic risks of either $(T,\hat\theta_T)$ or $(T,\tilde\theta_T)$ and $(T,\bar{X}_T)$ brings out the role of $\lambda$ in the model. As a measure of the relative variation in the $X$'s given $\theta$, $R$ compared to the variation in $\theta$ given $M$, $R$, we see that the use of posterior means in estimating $\theta$ is most advantageous when $\lambda$ is large; that is, when there is relatively more variation in the $X$'s than in $\theta$. Large values of $\lambda$ correspond to more precise priors on $\theta$. Recall that $\hat\theta_T = (T\Sigma^{-1}+\lambda D^{-1})^{-1}\left(T\Sigma^{-1}\bar{X}_T + \lambda E(M\mid\mathcal{A}_T)D^{-1}1_p\right)$, and thus that $\bar{X}_T$ appears in the limit as $\lambda\to 0$. Similarly, $\bar{X}_T$ appears as the limit of $\tilde\theta_T$ as $\lambda\to 0$.
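The leading terms of these expansions are easy to evaluate numerically. The following is a minimal sketch, not part of the original development. It assumes that $\omega_1$ denotes the first order coefficient $2(\mathrm{tr}\,\Sigma)^{1/2}\{\Gamma(\tfrac12(b-1))/\Gamma(\tfrac12 b)\}(a/2)^{1/2}$ appearing in Theorem 2.4, and it evaluates the two leading terms of (2.4.3) for the sample mean procedure, dropping the $o(c)$ remainder. The function names `omega1` and `sample_mean_risk` are hypothetical.

```python
import math


def omega1(tr_sigma, a, b):
    """Assumed first order coefficient:
    2 * (tr Sigma)^(1/2) * Gamma((b-1)/2) / Gamma(b/2) * (a/2)^(1/2).
    The theorems of this chapter require b > 2."""
    if tr_sigma <= 0 or a <= 0 or b <= 2:
        raise ValueError("need tr_sigma > 0, a > 0 and b > 2")
    # Work on the log scale so that large b does not overflow the gamma function.
    log_ratio = math.lgamma((b - 1) / 2) - math.lgamma(b / 2)
    return 2.0 * math.sqrt(tr_sigma) * math.exp(log_ratio) * math.sqrt(a / 2)


def sample_mean_risk(c, p, tr_sigma, a, b):
    """Leading terms of (2.4.3): omega1 * c^(1/2) + c / (2p); the o(c) term is dropped."""
    if c <= 0 or p < 1:
        raise ValueError("need c > 0 and p >= 1")
    return omega1(tr_sigma, a, b) * math.sqrt(c) + c / (2 * p)
```

For a small cost $c$ the $c^{1/2}$ term dominates, which is why the three procedures cannot be separated at first order and differ only through their coefficients of $c$.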

CHAPTER 3
ASYMPTOTICALLY POINTWISE OPTIMAL STOPPING RULES FOR THE ESTIMATION OF A VECTOR OF REGRESSION PARAMETERS UNDER A HIERARCHICAL BAYES MODEL

3.1 Introduction

In this chapter we consider the estimation of a vector of regression parameters, $\beta$, from a generalized linear model with multivariate normal error structure, under a hierarchical Bayes model. We proceed as in Chapter 2, noting that the regression model can be considered a generalization of the model of that chapter. Section 3.2 describes the model and develops the A.P.O. stopping rule. Then, Theorem 3.1 gives an asymptotic expansion for the procedure which uses the posterior mean estimator. In Theorem 3.2 we show that the procedure is asymptotically nondeficient, again showing an expansion for the sequential Bayes procedure. In Section 3.3 we look at an improper prior based estimator and the weighted least squares estimator, in conjunction with the A.P.O. stopping rule. Risk expansions for these two procedures, under the original model, appear in Theorems 3.3 and 3.4. In Section 3.4 we compare the performances of the procedures.

Before continuing, we should note one of the new features of our regression scenario. Observations will come at different points in "design" space, and will not be identically distributed. We will require some additional constraints, and some additional tools, to handle this fact. It is convenient to give some of the tools now. The first proposition provides for considerable simplification of argument, and is, perhaps, not well known.

Proposition 3.1: Suppose we observe a sequence of random ($k\times 1$) vectors, $\{Y_n, n\ge 1\}$, generated via the linear model $Y_i = X_i\beta + \varepsilon_i$, where the $X_i$'s are ($k\times p$) matrices of rank $p$, with $k\ge p$. Also, the $\varepsilon_i$ are independently distributed as $N_k(0,\sigma^{2}V)$ random vectors. Then the error sum of squares, SSE, has the following representation:
$$\mathrm{SSE} = \sum_{i=1}^{n}(Y_i-X_i\bar\beta_n)^{T}V^{-1}(Y_i-X_i\bar\beta_n) = \sum_{i=1}^{n}Z_i,$$
where $\bar\beta_n$ is the weighted least squares estimator of $\beta$, and where the $Z$'s are independently distributed with $Z_1\sim\sigma^{2}\chi^{2}_{k-p}$ and $Z_i\sim\sigma^{2}\chi^{2}_{k}$, $i = 2,\ldots,n$.

Proof: See Finster (1983), Section 3.

The second proposition provides an integrability result for dealing with averages of independent, nonidentically distributed random variables, when we do not have a backward martingale structure.

Proposition 3.2: Suppose we have a sequence of independent random variables $\{Y_i, i\ge 1\}$ such that $\sup_{i\ge 1}E|Y_i|^{\nu} < \infty$, $\nu$ an integer $\ge 2$. Then, for $0 < r < \nu$,
$$E\left[\sup_{n\ge 1}\left|n^{-1}\sum_{i=1}^{n}Y_i\right|^{r}\right] < \infty.$$

32 Proof : First we show that it suffices to consider JYj, i>lj such that EYj 0, all i. Let EY. = //.. Then sup n nl iS"' But sup n n>l n is finite, so we take EYj = 0, all i. This gives us that Sn 1/ _ n n > 1, is a nonnegative L^ submartingale, which will be useful. We have E sup n n t dt n > t" dt 00 / < r + r / t^~h\ max n"*^ J \n

33 Y, k~''E(Su-Su i) is bounded above by some constant, for all m. We have < c g k-4 E^ e(s[_j)1 k=2 li=2 ^ ^J where C is a generic constant, and we have used a global bound on the moments of |Yj^|. At this point, since SJ _. — k-1 j=l ^ is the i absolute power of a sum of independent zero mean random variables, we apply a result shown in Chung (1951), to obtain m / \ m ,M-lf o~^k-l i :=2 ^ '^ '^ ^^ k=2 i=2V j=l ' •" > m ,^-1 9 k=2 i=2 1 ''-I

Proposition 3.3: Suppose we have a sequence of positive definite matrices, $\{A_i, i\ge 1\}$, plus a p.d. matrix $B$, and a symmetric matrix $C$. If $\sum_{i=1}^{n}(A_i-B)\to C$ as $n\to\infty$, then
$$n\left(\left(n^{-1}\sum_{i=1}^{n}A_i\right)^{-1} - B^{-1}\right) \to -B^{-1}CB^{-1} \quad\text{as } n\to\infty.$$
Proof: We write
$$n\left(\left(n^{-1}\sum_{i=1}^{n}A_i\right)^{-1} - B^{-1}\right) = \left(n^{-1}\sum_{i=1}^{n}A_i\right)^{-1}n\left(B - n^{-1}\sum_{i=1}^{n}A_i\right)B^{-1} = \left(n^{-1}\sum_{i=1}^{n}A_i\right)^{-1}\left(\sum_{i=1}^{n}(B-A_i)\right)B^{-1} \to -B^{-1}CB^{-1}$$
as $n\to\infty$, since $\sum_{i=1}^{n}(A_i-B)\to C$ as $n\to\infty$ implies that $n^{-1}\sum_{i=1}^{n}A_i\to B$ as $n\to\infty$. Thus, the proposition holds.

We now turn to the main development of the regression model results.

3.2 A.P.O. Rule Under a Hierarchical Bayes Model

Consider the following hierarchical Bayes model, an expansion of the model of Section 2.2. Suppose that conditional on $\beta = (\beta_1,\ldots,\beta_p)^{T}$ and $R = r$, $Y_1, Y_2, \ldots$ are independently distributed $N_k(X_i\beta, r^{-1}V)$, where $V$ is a known positive definite (p.d.) matrix, and $\{X_n, n\ge 1\}$ is a sequence of $k\times p$ matrices of rank $p$, with $k\ge p$. Suppose also that conditional on $M = m$ and $R = r$, $\beta\sim N_p(m1_p, (\lambda r)^{-1}D)$, where $1_p$ is a $p$-component vector of 1's, $\lambda$ ($>0$) is known, and $D$ is a known p.d. matrix. It is assumed, as before, that $M$ and $R$ are marginally independent, with $M$ having a proper pdf, $g(m)$, on $(-\infty,\infty)$ such that $\int_{-\infty}^{\infty}m^{2}g(m)\,dm < \infty$, while $R\sim\mathrm{Gamma}(\tfrac12 a, \tfrac12 b)$. Finally, we require a certain amount of stability in the behavior of the design matrices to develop asymptotics. Here, we

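suppose that $\sum_{i=1}^{n}X_i^{T}V^{-1}X_i - n\Sigma^{-1}\to H$ as $n\to\infty$, for some pair of matrices, $\Sigma$ and $H$, with $\Sigma$ positive definite. An example of a situation where this condition is satisfied is that where the design matrices are constant, equal to $X$ say, from some fixed point $n_0$ on. Then we have
$$\sum_{i=1}^{n}X_i^{T}V^{-1}X_i = \sum_{i=1}^{n_0}X_i^{T}V^{-1}X_i + \sum_{i=n_0+1}^{n}X^{T}V^{-1}X.$$
We would choose $\Sigma^{-1}$ equal to $X^{T}V^{-1}X$ and $H$ equal to $\sum_{i=1}^{n_0}(X_i^{T}V^{-1}X_i - \Sigma^{-1})$. In this case the stability condition gives a measure of how far the observations are from being identically distributed.

We again need the posterior distribution of the parameters to determine a stopping rule. From our discussion, the joint pdf of $Y_1,\ldots,Y_n$, $\beta$, $M$, and $R$ is given by
$$f(y_1,\ldots,y_n,\beta,m,r) \propto r^{nk/2}\exp\left[-\tfrac{r}{2}\sum_{i=1}^{n}(y_i-X_i\beta)^{T}V^{-1}(y_i-X_i\beta)\right] \times r^{p/2}\exp\left[-\tfrac{\lambda r}{2}(\beta-m1_p)^{T}D^{-1}(\beta-m1_p)\right] \times r^{b/2-1}\exp\left[-\tfrac{ar}{2}\right]g(m). \tag{3.2.1}$$
Expanding the quadratics in (3.2.1), and using techniques like those of Chapter 2, we obtain
$$f(y_1,\ldots,y_n,\beta,m,r) \propto r^{p/2}\exp\left[-\tfrac{r}{2}\left\{\beta-(\lambda D^{-1}+M_n)^{-1}(\lambda mD^{-1}1_p+P_n)\right\}^{T}(\lambda D^{-1}+M_n)\left\{\beta-(\lambda D^{-1}+M_n)^{-1}(\lambda mD^{-1}1_p+P_n)\right\}\right] \times r^{\frac12(nk+b)-1}\exp\left[-\tfrac{r}{2}\left(S_{n1}+S_{n2}(m)+a\right)\right]g(m), \tag{3.2.2}$$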

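where
$$M_n = \sum_{i=1}^{n}X_i^{T}V^{-1}X_i, \qquad P_n = \sum_{i=1}^{n}X_i^{T}V^{-1}y_i,$$
and
$$S_{n1} = \sum_{i=1}^{n}(y_i-X_i\bar\beta_n)^{T}V^{-1}(y_i-X_i\bar\beta_n), \qquad S_{n2}(m) = (\bar\beta_n-m1_p)^{T}(\lambda^{-1}D+M_n^{-1})^{-1}(\bar\beta_n-m1_p),$$
with $\bar\beta_n = M_n^{-1}P_n$, the weighted least squares and maximum likelihood estimate of $\beta$. We will abuse notation, and not distinguish between $P_n$, $S_{n1}$ and $S_{n2}(m)$ as realizations, and the corresponding random variables. It follows from (3.2.2) that conditional on $Y_1 = y_1,\ldots,Y_n = y_n$, $M = m$, and $R = r$,
$$\beta\sim N_p\left((\lambda D^{-1}+M_n)^{-1}(\lambda mD^{-1}1_p+P_n),\; r^{-1}(\lambda D^{-1}+M_n)^{-1}\right). \tag{3.2.3}$$
Also, conditional on $Y_1 = y_1,\ldots,Y_n = y_n$, $M = m$,
$$R\sim\mathrm{Gamma}\left(\tfrac12(S_{n1}+S_{n2}(m)+a),\; \tfrac12(nk+b)\right), \tag{3.2.4}$$
and the joint marginal of $Y_1,\ldots,Y_n$, and $M$ is given by
$$f(y_1,\ldots,y_n,m) \propto \left[S_{n1}+S_{n2}(m)+a\right]^{-\frac12(nk+b)}g(m). \tag{3.2.5}$$
The posterior mean of $\beta$ is thus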

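$$\hat\beta_n = E(\beta\mid\mathcal{A}_n) = (\lambda D^{-1}+M_n)^{-1}\left(\lambda E(M\mid\mathcal{A}_n)D^{-1}1_p + P_n\right),$$
and $\hat\beta_0 = E(\beta\mid\mathcal{A}_0) = E(M)1_p$, where $\mathcal{A}_n$ is the $\sigma$-algebra generated by $Y_1,\ldots,Y_n$ and $\mathcal{A}_0$ is taken as the trivial $\sigma$-algebra. If we estimate $\beta$ by $\hat\beta_n$ and $b > 2$, we obtain the following posterior estimation risk, via the same calculations as those of Chapter 2:
$$E\left[\|\beta-\hat\beta_n\|^{2}\mid\mathcal{A}_n\right] = \mathrm{tr}(\lambda D^{-1}+M_n)^{-1}E[R^{-1}\mid\mathcal{A}_n] + \mathrm{tr}\left\{(\lambda D^{-1}+M_n)^{-1}(\lambda D^{-1}1_p)(\lambda D^{-1}1_p)^{T}(\lambda D^{-1}+M_n)^{-1}\right\}V(M\mid\mathcal{A}_n)$$
$$= n^{-1}\mathrm{tr}(F_n)E[R^{-1}\mid\mathcal{A}_n] + \lambda^{2}n^{-2}\mathrm{tr}(F_nGF_n)V(M\mid\mathcal{A}_n), \tag{3.2.6}$$
where $F_n = (\lambda n^{-1}D^{-1}+n^{-1}M_n)^{-1}$ and $G = D^{-1}1_p1_p^{T}D^{-1}$. Note that
$$E(R^{-1}\mid\mathcal{A}_n, M) = \frac{S_{n1}+S_{n2}(M)+a}{nk+b-2}. \tag{3.2.7}$$
We can write (3.2.6) as
$$E\left[\|\beta-\hat\beta_n\|^{2}\mid\mathcal{A}_n\right] = n^{-1}\mathrm{tr}(F_n)\left\{\left(S_{n1}+E(S_{n2}(M)\mid\mathcal{A}_n)+a\right)/(nk+b-2)\right\} + n^{-2}\lambda^{2}\mathrm{tr}(F_nGF_n)V[M\mid\mathcal{A}_n] = n^{-1}U_n + n^{-1}R_n, \tag{3.2.8}$$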

where
$$U_n = \mathrm{tr}(\Sigma)S_{n1}/(nk+b-2); \qquad R_n = \mathrm{tr}(F_n-\Sigma)S_{n1}/(nk+b-2) + \mathrm{tr}(F_n)\left\{\left(E[S_{n2}(M)\mid\mathcal{A}_n]+a\right)/(nk+b-2)\right\} + n^{-1}\lambda^{2}\mathrm{tr}(F_nGF_n)V[M\mid\mathcal{A}_n], \tag{3.2.9}$$
and $\Sigma$ is a p.d. matrix. We want $U_n$ to be the dominant element of (3.2.8), as in Chapter 2. There, however, $F_n$ automatically converged to $\Sigma$. Here, the imposed stability constraint, $\sum_{i=1}^{n}X_i^{T}V^{-1}X_i - n\Sigma^{-1}\to H$, implies that $n^{-1}\sum_{i=1}^{n}X_i^{T}V^{-1}X_i = n^{-1}M_n\to\Sigma^{-1}$ and hence that $F_n\to\Sigma$, as $n\to\infty$. With this implication, we can show that $U_n$ is the dominant term, much as in Chapter 2. First note that conditional on $R = r$ and $\beta$, $S_{n1}/nk$ is the m.l.e. of $r^{-1}$. It follows that $U_n\to(\mathrm{tr}\,\Sigma)R^{-1}$ a.s. as $n\to\infty$. Also, $V(M\mid\mathcal{A}_n)$ is a positive supermartingale, and thus almost surely convergent. Finally, since
$$E[S_{n2}(M)] = E\left\{R^{-1}E[RS_{n2}(M)\mid M,R]\right\} = E\left[R^{-1}E(\chi^{2}_{p})\right] = \frac{ap}{b-2},$$
it follows that $E[S_{n2}(M)\mid\mathcal{A}_n]$ is $O_p(1)$. Combining these facts we have that $U_n\to(\mathrm{tr}\,\Sigma)R^{-1}$ and $R_n\to 0$ as $n\to\infty$. It follows that the stopping rule
$$T = T_c = \inf\{n\ge n_0: U_n\le cn^{2}\} \tag{3.2.10}$$
is A.P.O. in the sense of Bickel and Yahav (1967). Note that this is the "same" rule as that developed in Chapter 2, and, again, we can take $n_0 = 1$, for the moment.

Remark 3.1. As in Chapter 2, our A.P.O. rule is not dependent on the prior distribution of $M$.
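To make the rule concrete, the following is a minimal sketch of the stopping time in the simplest special case. It is an illustration under stated assumptions rather than the general procedure: we take $X_i = I_p$ and $V = \Sigma = I_p$ (so that $k = p$ and the model reduces to a vector of means), in which case $U_n = p\,S_{n1}/(np+b-2)$ with $S_{n1} = \sum_i\|y_i-\bar{y}_n\|^{2}$. The stopping inequality is read as $U_n\le cn^{2}$. The function name `apo_stopping_time` and the default `n0=2` are hypothetical choices; the default avoids the trivial stop at $n = 1$, where $S_{11} = 0$ in this special case.

```python
def apo_stopping_time(observations, c, b, n0=2):
    """First n >= n0 with U_n <= c * n**2, or None if the data run out first.

    Assumed special case: X_i = I_p and V = Sigma = I_p (k = p), so that
    U_n = p * S_n1 / (n*p + b - 2) with S_n1 = sum_i ||y_i - ybar_n||^2.
    """
    if c <= 0 or b <= 2:
        raise ValueError("need c > 0 and b > 2")
    total = None    # running coordinate-wise sum of the observations
    sum_sq = 0.0    # running sum of squared norms ||y_i||^2
    for n, y in enumerate(observations, start=1):
        if total is None:
            total = [0.0] * len(y)
        for j, value in enumerate(y):
            total[j] += value
            sum_sq += value * value
        p = len(total)
        # S_n1 = sum ||y_i||^2 - n ||ybar_n||^2, clipped at zero against rounding error.
        s_n1 = max(sum_sq - sum(t * t for t in total) / n, 0.0)
        u_n = p * s_n1 / (n * p + b - 2)
        if n >= n0 and u_n <= c * n * n:
            return n
    return None
```

Only running sums are stored, so the rule can be checked after each new observation without revisiting earlier data. Note that the prior on $M$ never enters the computation, in line with Remark 3.1.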

Before establishing a risk expansion, and proving the asymptotic nondeficiency of the proposed A.P.O. rule, we state a lemma containing the first order properties of the procedure.

Lemma 3.1: Suppose $b > 2$. Then
(i) $U_T\to\mathrm{tr}(\Sigma)R^{-1}$ a.s. as $c\to 0$;
(ii) $E(U_T)\to\mathrm{tr}(\Sigma)E(R^{-1})$ as $c\to 0$;
(iii) $c^{1/2}T\to(\mathrm{tr}\,\Sigma)^{1/2}R^{-1/2}$ a.s. as $c\to 0$;
(iv) $E(c^{1/2}T)\to(\mathrm{tr}\,\Sigma)^{1/2}E(R^{-1/2})$ as $c\to 0$;
(v) $V(M\mid\mathcal{A}_T)\to V(M\mid\beta,R)$ a.s. as $c\to 0$.

Proof: First we prove (i). It follows from the definition of the stopping rule that $T\to\infty$ a.s. as $c\to 0$. In conjunction with the previously noted fact that $U_n\to(\mathrm{tr}\,\Sigma)R^{-1}$ a.s. as $n\to\infty$, the result follows. To establish (ii), use Proposition 3.1 to write $S_{n1} = \sum_{i=1}^{n}Z_i$, where the $Z$'s are independent with $Z_1\sim r^{-1}\chi^{2}_{k-p}$ and $Z_i\sim r^{-1}\chi^{2}_{k}$, $i\ge 2$, given $\beta$ and $R = r$. Then
$$E\left[\sup_{n\ge n_0}S_{n1}/(nk+b-2)\right] \le E(Z_1) + K\,E\left[\sup_{n\ge n_0}\sum_{i=2}^{n}Z_i/(n-1)\right] < \infty,$$
using Doob's maximal inequality applied to the backward martingale $\left\{\sum_{i=2}^{n}Z_i/(n-1)\right\}$, and where $K$ is some constant. Dominated convergence then gives (ii). Arguments as in Lemmas 2.1 and 2.2 suffice to establish (iii)-(v).

We now develop the second order properties of our A.P.O. procedure. We start with a Bayes risk expansion. The result should be compared to that of Theorem 2.1.

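Theorem 3.1: If $(n_0-1)k > 9$, $b > 2$, $E(M^{4}) < \infty$, and $M_n - n\Sigma^{-1}\to H$ as $n\to\infty$, then
$$E\left[\|\beta-\hat\beta_T\|^{2}+cT\right] = 2c^{1/2}(\mathrm{tr}\,\Sigma)^{1/2}\left\{\Gamma(\tfrac12(b-1))/\Gamma(\tfrac12 b)\right\}(a/2)^{1/2} + c\left[(2k)^{-1} - \mathrm{tr}(\Sigma H\Sigma)/\mathrm{tr}(\Sigma) - \lambda\,\mathrm{tr}(\Sigma D^{-1}\Sigma)/\mathrm{tr}(\Sigma) + \lambda^{2}\{\mathrm{tr}(\Sigma G\Sigma)/\mathrm{tr}(\Sigma)\}E[R\,V(M\mid\beta,R)]\right] + o(c) \quad\text{as } c\to 0. \tag{3.2.11}$$
Proof: Let $W_n = E(R^{-1}\mid\mathcal{A}_n)$. Then, exactly as in (2.2.28),
$$E\left[\|\beta-\hat\beta_T\|^{2}+cT\right] = E\Big\{2c^{1/2}(\mathrm{tr}\,\Sigma)^{1/2}E(R^{-1/2}\mid\mathcal{A}_T) + 2c^{1/2}(\mathrm{tr}\,\Sigma)^{1/2}\left(W_T^{1/2}-E(R^{-1/2}\mid\mathcal{A}_T)\right) + T^{-1}\left((\mathrm{tr}\,\Sigma)^{1/2}W_T^{1/2}-c^{1/2}T\right)^{2} + T^{-1}[\mathrm{tr}(F_T)-\mathrm{tr}(\Sigma)]W_T + T^{-2}\lambda^{2}\mathrm{tr}(F_TGF_T)V(M\mid\mathcal{A}_T)\Big\}. \tag{3.2.12}$$
As in Theorem 2.1, we establish Theorem 3.1 by showing that as $c\to 0$,
$$E\left[2c^{1/2}(\mathrm{tr}\,\Sigma)^{1/2}E(R^{-1/2}\mid\mathcal{A}_T)\right] = 2c^{1/2}(\mathrm{tr}\,\Sigma)^{1/2}\left\{\Gamma(\tfrac12(b-1))/\Gamma(\tfrac12 b)\right\}(a/2)^{1/2}; \tag{3.2.13}$$
$$E\left[2c^{-1/2}(\mathrm{tr}\,\Sigma)^{1/2}\left(W_T^{1/2}-E(R^{-1/2}\mid\mathcal{A}_T)\right)\right] \to (2k)^{-1}; \tag{3.2.14}$$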

$$E\left[c^{-1}T^{-1}\left(\mathrm{tr}(F_T)-\mathrm{tr}(\Sigma)\right)W_T\right] \to -\mathrm{tr}(\Sigma H\Sigma)/\mathrm{tr}(\Sigma) - \lambda\,\mathrm{tr}(\Sigma D^{-1}\Sigma)/\mathrm{tr}(\Sigma); \tag{3.2.15}$$
$$E\left[c^{-1}T^{-2}\lambda^{2}\mathrm{tr}(F_TGF_T)V(M\mid\mathcal{A}_T)\right] \to \lambda^{2}\{\mathrm{tr}(\Sigma G\Sigma)/\mathrm{tr}(\Sigma)\}E[R\,V(M\mid\beta,R)]; \tag{3.2.16}$$
$$E\left[c^{-1}T^{-1}\left((\mathrm{tr}\,\Sigma)^{1/2}W_T^{1/2}-c^{1/2}T\right)^{2}\right] \to 0. \tag{3.2.17}$$
This is done in Appendix E.

Remark 3.2. The chief technical difference between the vector of means model and the regression model is that the second provides independent, but not identically distributed, observations. This has a direct impact on the proofs of (3.2.13) through (3.2.17). Nonetheless, Remark 2.1 is appropriate here, too. That is, other stopping rules, both prior independent and prior dependent, exhibit the same risk expansion. The chief structural difference is the presence of a "design effect," $-\mathrm{tr}(\Sigma H\Sigma)/\mathrm{tr}(\Sigma)$, among the second order terms.

We shall now prove that if $N = N_c$ denotes the Bayes stopping rule, then $E\left[\|\beta-\hat\beta_N\|^{2}+cN\right]$ has the same expression as given in the r.h.s. of (3.2.11).

Theorem 3.2: If $(n_0-1)k > 9$, $b > 2$, $E(M^{4}) <$

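$$E\left[\|\beta-\hat\beta_N\|^{2}+cN\right] = 2c^{1/2}(\mathrm{tr}\,\Sigma)^{1/2}\left\{\Gamma(\tfrac12(b-1))/\Gamma(b/2)\right\}(a/2)^{1/2} + c\left[(2k)^{-1} - \mathrm{tr}(\Sigma H\Sigma)/\mathrm{tr}(\Sigma) - \lambda\,\mathrm{tr}(\Sigma D^{-1}\Sigma)/\mathrm{tr}(\Sigma) + \lambda^{2}\{\mathrm{tr}(\Sigma G\Sigma)/\mathrm{tr}(\Sigma)\}E[R\,V(M\mid\beta,R)]\right] + o(c) \quad\text{as } c\to 0, \tag{3.2.18}$$
where $G = D^{-1}1_p1_p^{T}D^{-1}$.

Proof: The decomposition of Theorem 3.1, equation (3.2.12), is valid here, with "$N$" replacing "$T$". It follows that to establish this theorem, it suffices to prove a set of asymptotic relationships as in (3.2.13)-(3.2.17):
$$E\left[2c^{1/2}(\mathrm{tr}\,\Sigma)^{1/2}E(R^{-1/2}\mid\mathcal{A}_N)\right] = 2c^{1/2}(\mathrm{tr}\,\Sigma)^{1/2}\left\{\Gamma(\tfrac12(b-1))/\Gamma(\tfrac12 b)\right\}(a/2)^{1/2}, \tag{3.2.19}$$
$$E\left[2c^{-1/2}(\mathrm{tr}\,\Sigma)^{1/2}\left(W_N^{1/2}-E(R^{-1/2}\mid\mathcal{A}_N)\right)\right] \to (2k)^{-1}, \tag{3.2.20}$$
$$E\left[c^{-1}N^{-1}(\mathrm{tr}\,F_N-\mathrm{tr}\,\Sigma)W_N\right] \to -\mathrm{tr}(\Sigma H\Sigma)/\mathrm{tr}(\Sigma) - \lambda\,\mathrm{tr}(\Sigma D^{-1}\Sigma)/\mathrm{tr}(\Sigma), \tag{3.2.21}$$
$$E\left[c^{-1}N^{-2}\lambda^{2}\mathrm{tr}(F_NGF_N)V(M\mid\mathcal{A}_N)\right] \to \lambda^{2}\{\mathrm{tr}(\Sigma G\Sigma)/\mathrm{tr}(\Sigma)\}E[R\,V(M\mid\beta,R)], \tag{3.2.22}$$
$$E\left[c^{-1}N^{-1}\left((\mathrm{tr}\,\Sigma)^{1/2}W_N^{1/2}-c^{1/2}N\right)^{2}\right] \to 0, \quad\text{as } c\to 0. \tag{3.2.23}$$
Here, as in Theorem 3.1, we need the behavior of $N$ to even get started. Write $L_n(c) = E[\|\beta-\hat\beta_n\|^{2}\mid\mathcal{A}_n] + cn$. Following Bickel and Yahav (1967), it can be shown that for any stopping rule $\tau = \tau_c$, $L_\tau(c)\sim\inf_n L_n(c)$ if and only if $\tau_c\sim T_c$ a.s. as $c\to 0$. $N$ being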

the Bayes rule, we must have $N_c\sim T_c$ a.s. as $c\to 0$. Then, using Lemma 3.1, $cN_c^{2}\to(\mathrm{tr}\,\Sigma)R^{-1}$ a.s. as $c\to 0$ ($N_c\to\infty$ a.s. as $c\to 0$). This behavior for $N$ is sufficient to establish the appropriate pointwise convergences associated with the integrands of (3.2.19)-(3.2.23). (See the proof of Theorem 3.1.) We will find that (3.2.23) is forced, given the other relationships, and the fact that $N$ is Bayes. It remains to establish uniform integrabilities. The details are provided in Appendix F.

3.3 A.P.O. Rule Under an Improper Prior

Suppose we have the model of Section 3.2, with the following exception: $g(m) = 1$ on $(-\infty,\infty)$. The resulting prior distribution on $\beta$ will be improper, but the posterior distribution of $\beta$ given $Y_i = y_i$ ($1\le i\le n$) will be proper for every $n\ge 1$. It follows that the Bayes risk of any procedure under this model is infinite. We shall proceed formally, to develop our estimator and stopping rule, and then evaluate their performance in the context of the model of Section 3.2.

Once again, in order to motivate the A.P.O. rule we must first find the formal posterior distribution of $\beta$ given $Y_i = y_i$ ($1\le i\le n$). Note that the joint pdf of $Y_1,\ldots,Y_n$, $\beta$, $M$, and $R$ is given by
$$f(y_1,\ldots,y_n,\beta,m,r) \propto r^{nk/2}\exp\left[-\tfrac{r}{2}\sum_{i=1}^{n}(y_i-X_i\beta)^{T}V^{-1}(y_i-X_i\beta)\right] \times r^{p/2}\exp\left[-\tfrac{\lambda r}{2}(\beta-m1_p)^{T}D^{-1}(\beta-m1_p)\right]r^{b/2-1}e^{-ar/2}. \tag{3.3.1}$$
Writing $C = D^{-1} - (1_p^{T}D^{-1}1_p)^{-1}D^{-1}1_p1_p^{T}D^{-1}$, it follows that

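$$(\beta-m1_p)^{T}D^{-1}(\beta-m1_p) = (1_p^{T}D^{-1}1_p)\left[m-(1_p^{T}D^{-1}1_p)^{-1}1_p^{T}D^{-1}\beta\right]^{2} + \beta^{T}C\beta. \tag{3.3.2}$$
Integrating the joint pdf in (3.3.1) with respect to $m$, using (3.3.2), we find
$$f(y_1,\ldots,y_n,\beta,r) \propto r^{nk/2}\exp\left[-\tfrac{r}{2}\sum_{i=1}^{n}(y_i-X_i\beta)^{T}V^{-1}(y_i-X_i\beta)\right] \times r^{(p-1)/2}\exp\left[-\tfrac{\lambda r}{2}\beta^{T}C\beta\right]r^{b/2-1}e^{-ar/2}. \tag{3.3.3}$$
As before, let
$$M_n = \sum_{i=1}^{n}X_i^{T}V^{-1}X_i, \qquad P_n = \sum_{i=1}^{n}X_i^{T}V^{-1}y_i. \tag{3.3.4}$$
Then, it follows after some algebra that
$$\sum_{i=1}^{n}(y_i-X_i\beta)^{T}V^{-1}(y_i-X_i\beta) + \lambda\beta^{T}C\beta = \left\{\beta-(M_n+\lambda C)^{-1}P_n\right\}^{T}(M_n+\lambda C)\left\{\beta-(M_n+\lambda C)^{-1}P_n\right\} + \sum_{i=1}^{n}y_i^{T}V^{-1}y_i - P_n^{T}(M_n+\lambda C)^{-1}P_n$$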

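$$= \left\{\beta-(M_n+\lambda C)^{-1}P_n\right\}^{T}(M_n+\lambda C)\left\{\beta-(M_n+\lambda C)^{-1}P_n\right\} + \sum_{i=1}^{n}(y_i-X_iM_n^{-1}P_n)^{T}V^{-1}(y_i-X_iM_n^{-1}P_n) + P_n^{T}M_n^{-1}S(S^{T}M_n^{-1}S+I)^{-1}S^{T}M_n^{-1}P_n, \tag{3.3.5}$$
where we have written $\lambda C$ as $SS^{T}$ and expanded $(M_n+\lambda C)^{-1}$, since $C$ is n.n.d. with rank $p-1$. Now write
$$S_{n1} = \sum_{i=1}^{n}(y_i-X_iM_n^{-1}P_n)^{T}V^{-1}(y_i-X_iM_n^{-1}P_n) = \sum_{i=1}^{n}(y_i-X_i\bar\beta_n)^{T}V^{-1}(y_i-X_i\bar\beta_n),$$
$$S_{n2}^{*} = P_n^{T}M_n^{-1}S(S^{T}M_n^{-1}S+I)^{-1}S^{T}M_n^{-1}P_n,$$
where $\bar\beta_n = M_n^{-1}P_n$ is the weighted least squares estimator, and, again, we will allow context to distinguish random variables and their realizations. From (3.3.3)-(3.3.5) we get
$$f(y_1,\ldots,y_n,\beta,r) \propto r^{p/2}\exp\left[-\tfrac{r}{2}\left\{\beta-(M_n+\lambda C)^{-1}P_n\right\}^{T}(M_n+\lambda C)\left\{\beta-(M_n+\lambda C)^{-1}P_n\right\}\right] \times r^{\frac12(nk+b-3)}\exp\left[-\tfrac{r}{2}\left(S_{n1}+S_{n2}^{*}+a\right)\right]. \tag{3.3.6}$$
Formula (3.3.6) leads to two important conclusions. First, conditional on $Y_i = y_i$ ($i = 1,\ldots,n$), and $R = r$, $\beta\sim N_p\left((M_n+\lambda C)^{-1}P_n,\; r^{-1}(M_n+\lambda C)^{-1}\right)$. Let $\tilde\beta_n = (M_n+\lambda C)^{-1}P_n$ and observe that it does not depend on $r$. Second, from the joint marginal pdf of $Y_1,\ldots,Y_n$ and $R$, we can obtain that conditional on $Y_i = y_i$ ($i = 1,\ldots,n$), $R\sim\mathrm{Gamma}\left(\tfrac12(S_{n1}+S_{n2}^{*}+a),\; \tfrac12(nk+b-1)\right)$. Thus, formally, the posterior risk, using $\tilde\beta_n$, is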

$$E\left[\|\beta-\tilde\beta_n\|^{2}\mid\mathcal{A}_n\right] = \mathrm{tr}(M_n+\lambda C)^{-1}E[R^{-1}\mid\mathcal{A}_n] = \left\{\mathrm{tr}(M_n+\lambda C)^{-1}\right\}(S_{n1}+S_{n2}^{*}+a)/(nk+b-3)$$
$$= n^{-1}\mathrm{tr}(\Sigma)\{S_{n1}/(nk+b-3)\} + \mathrm{tr}\left\{(M_n+\lambda C)^{-1}-n^{-1}\Sigma\right\}S_{n1}/(nk+b-3) + \left\{\mathrm{tr}(M_n+\lambda C)^{-1}\right\}(S_{n2}^{*}+a)/(nk+b-3). \tag{3.3.7}$$
The next step is to establish a stopping rule based on a dominant term from (3.3.7). To obtain such a dominant term we need a valid probability model, and we assume the observations $Y_i$, $i\ge 1$, come from the model of Section 3.2. Essentially the same argument as that of Section 2.3 for $\theta$ establishes that $n^{-1}(\mathrm{tr}\,\Sigma)\{S_{n1}/(nk+b-3)\}$ is a dominant term. This is almost $n^{-1}U_n$ from equation (3.2.8), and so we propose the same stopping rule $T$ as given in (3.2.10). Thus, our "noninformative" prior procedure is to stop at time $T$, and then to estimate $\beta$ by $\tilde\beta_T$.

We now develop the asymptotic Bayes risk expansion for our noninformative procedure, say $(T,\tilde\beta_T)$, under the model of Section 3.2.

Theorem 3.3: Consider the model proposed in Section 3.2. If $(n_0-1)k > 9$, $b > 2$, $E(M^{4}) < \infty$, and $M_n - n\Sigma^{-1}\to H$ as $n\to\infty$, then

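$$E\left[\|\beta-\tilde\beta_T\|^{2}+cT\right] = 2c^{1/2}(\mathrm{tr}\,\Sigma)^{1/2}\left\{\Gamma(\tfrac12(b-1))/\Gamma(\tfrac12 b)\right\}(a/2)^{1/2} + c\left[(2k)^{-1} - \mathrm{tr}(\Sigma H\Sigma)/\mathrm{tr}(\Sigma) - \lambda\,\mathrm{tr}(\Sigma C\Sigma)/\mathrm{tr}(\Sigma)\right] + o(c) \quad\text{as } c\to 0. \tag{3.3.8}$$
Proof: Using standard Bayesian arguments,
$$E\left[\|\beta-\tilde\beta_T\|^{2}+cT\right] = E\left[\|\beta-\hat\beta_T\|^{2}+cT\right] + E\|\hat\beta_T-\tilde\beta_T\|^{2}. \tag{3.3.9}$$
In view of Theorem 3.1, it suffices to show that
$$E\left[c^{-1}\|\hat\beta_T-\tilde\beta_T\|^{2}\right] \to \lambda\,\mathrm{tr}\{\Sigma(D^{-1}-C)\Sigma\}/\mathrm{tr}(\Sigma) - \lambda^{2}\{\mathrm{tr}(\Sigma G\Sigma)/\mathrm{tr}(\Sigma)\}E[R\,V(M\mid\beta,R)]. \tag{3.3.10}$$
This is done in Appendix G.

We again evaluate the performance of the usual frequentist estimator, the weighted least squares estimator $\bar\beta_T$, in this sequential context. We have, for the procedure $(T,\bar\beta_T)$, the following result.

Theorem 3.4: Again, consider the proper regression model. If $(n_0-1)k > 9$, $b > 2$, $E(M^{4}) < \infty$, and $M_n - n\Sigma^{-1}\to H$ as $n\to\infty$, then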

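$$E\left[\|\beta-\bar\beta_T\|^{2}+cT\right] = 2c^{1/2}(\mathrm{tr}\,\Sigma)^{1/2}\left\{\Gamma(\tfrac12(b-1))/\Gamma(\tfrac12 b)\right\}(a/2)^{1/2} + c\left[(2k)^{-1} - \mathrm{tr}(\Sigma H\Sigma)/\mathrm{tr}(\Sigma)\right] + o(c) \quad\text{as } c\to 0. \tag{3.3.11}$$
Proof: Standard Bayesian calculation leads to
$$E\left[\|\beta-\bar\beta_T\|^{2}+cT\right] = E\left[\|\beta-\hat\beta_T\|^{2}+cT\right] + E\|\hat\beta_T-\bar\beta_T\|^{2}. \tag{3.3.12}$$
In view of Theorem 3.1, it suffices to show that
$$E\left[c^{-1}\|\hat\beta_T-\bar\beta_T\|^{2}\right] \to \lambda\,\mathrm{tr}(\Sigma D^{-1}\Sigma)/\mathrm{tr}(\Sigma) - \lambda^{2}\{\mathrm{tr}(\Sigma G\Sigma)/\mathrm{tr}(\Sigma)\}E[R\,V(M\mid\beta,R)] \quad\text{as } c\to 0. \tag{3.3.13}$$
This is done in Appendix H.

Remark 3.3. If we specialize our results to the case where $\Sigma = D = I$, $H = 0$ and $M$ has a degenerate distribution, then it is possible to make a comparison with the results of Finster (1987). He uses a different loss structure, but because the posterior estimation risk is similar to ours, the overall risk expansions show comparable terms. In our case we obtain, for $(T,\hat\beta_T)$, that
$$\omega_1c^{1/2} + \omega_2c = 2c^{1/2}p^{1/2}E(R^{-1/2}) + c\{(2k)^{-1}-\lambda\}.$$
This agrees with Finster's expansion, upon equating our $R^{-1}$, $p$, $k$ and $\lambda$ with his $\sigma^{2}$, $k$, $m$ and $T_0$. By constraining $M$ to be degenerate, we return to a conjugate prior set-up, which is what Finster uses.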

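3.4 Summary and Comparison

It is useful at this point to collect our results in order to make comparisons. We have discussed three procedures: $(T,\hat\beta_T)$, $(T,\tilde\beta_T)$ and $(T,\bar\beta_T)$. As in Chapter 2, they can be viewed as methods based on decreasing awareness of, or confidence in, the "true" model. Their asymptotic performances take the following forms:
$$E\left[\|\beta-\hat\beta_T\|^{2}+cT\right] = \omega_1c^{1/2} + c\left[(2k)^{-1} - \mathrm{tr}(\Sigma H\Sigma)/\mathrm{tr}(\Sigma) - \lambda\,\mathrm{tr}(\Sigma D^{-1}\Sigma)/\mathrm{tr}(\Sigma) + \lambda^{2}\{\mathrm{tr}(\Sigma G\Sigma)/\mathrm{tr}(\Sigma)\}E[R\,V(M\mid\beta,R)]\right] + o(c)$$
$$= \omega_1c^{1/2} + c\left[(2k)^{-1} - \mathrm{tr}(\Sigma H\Sigma)/\mathrm{tr}(\Sigma) - \lambda\,\mathrm{tr}(\Sigma D^{-1}\Sigma)/\mathrm{tr}(\Sigma) + \lambda\{\mathrm{tr}(\Sigma(D^{-1}-C)\Sigma)/\mathrm{tr}(\Sigma)\}E[V(M\mid\beta,R)/V^{*}]\right] + o(c), \tag{3.4.1}$$
where $V^{*} = \{\lambda(1_p^{T}D^{-1}1_p)R\}^{-1} = V(M\mid\beta,R)$ under a uniform $(-\infty,\infty)$ prior,
$$E\left[\|\beta-\tilde\beta_T\|^{2}+cT\right] = \omega_1c^{1/2} + c\left[(2k)^{-1} - \mathrm{tr}(\Sigma H\Sigma)/\mathrm{tr}(\Sigma) - \lambda\,\mathrm{tr}(\Sigma C\Sigma)/\mathrm{tr}(\Sigma)\right] + o(c), \tag{3.4.2}$$
$$E\left[\|\beta-\bar\beta_T\|^{2}+cT\right] = \omega_1c^{1/2} + c\left[(2k)^{-1} - \mathrm{tr}(\Sigma H\Sigma)/\mathrm{tr}(\Sigma)\right] + o(c). \tag{3.4.3}$$
A comparison of the second expression for the risk of $(T,\hat\beta_T)$ and the expression for the risk of $(T,\tilde\beta_T)$ shows the effect of the lost information concerning $M$. If $g(m)$ expresses a precise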

knowledge of $M$, then $V(M\mid\beta,R)/V^{*}$ will tend to be close to zero. If $g(m)$ expresses vague knowledge of $M$, then $V(M\mid\beta,R)/V^{*}$ will tend to be close to one, in which case the two risk expressions will coincide.

A comparison of the asymptotic risks of either $(T,\hat\beta_T)$ or $(T,\tilde\beta_T)$ and $(T,\bar\beta_T)$ brings out the role of $\lambda$ in the model, just as in Chapter 2. Large values of $\lambda$, indicating relatively more knowledge of the variation in $\beta$ about $M1_p$, produce greater risk reduction by $(T,\hat\beta_T)$ or $(T,\tilde\beta_T)$ over $(T,\bar\beta_T)$. Recall that $\hat\beta_T = (\lambda D^{-1}+M_T)^{-1}\left(\lambda E(M\mid\mathcal{A}_T)D^{-1}1_p + P_T\right)$, which goes to $M_T^{-1}P_T = \bar\beta_T$ as $\lambda$ goes to zero. Similarly $\tilde\beta_T$ goes to $\bar\beta_T$ as $\lambda\to 0$.

It is also useful to look at the differences generated by going from the vector of means case to the regression parameters case. Comparing the statements of Theorems 2.1 through 2.4 with those of Theorems 3.1 through 3.4, we find only minor structural differences. There is the dimension reduction effect of going to the regression model, which changes the $c(2p)^{-1}$ term to $c(2k)^{-1}$. The greater the dimensional reduction provided by $\beta$, the relatively smaller the risk contribution. Note that the asymptotic nature of the term in both models is second order. The other difference is in the appearance of a "design" term, $-c\,\mathrm{tr}(\Sigma H\Sigma)/\mathrm{tr}(\Sigma)$. If the design matrices, $X_i$, are all the same, and we write $X_i^{T}V^{-1}X_i = \Sigma^{-1}$, then $H$ must be null, and the extra term in the risk expansion disappears. When present, the design term has a negative coefficient, because of the matrix calculus involved. The term itself is not always negative, however. To get a better idea of what is going on, we note that a sufficient condition for $-\mathrm{tr}(\Sigma H\Sigma)/\mathrm{tr}(\Sigma)$ to be negative is for $H$ to be positive definite. In this case the deviations of the $X_i^{T}V^{-1}X_i$'s from $\Sigma^{-1}$ are points, on the whole, "farther out" in the design space. These points contribute to more accurate estimation of $\beta$. The situation of $H$ negative definite produces the opposite effect. Again, the contribution is second order.
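The sign behavior of the design term can be checked directly for small examples. The sketch below is illustrative only: it evaluates $-\mathrm{tr}(\Sigma H\Sigma)/\mathrm{tr}(\Sigma)$ for matrices supplied as nested lists, and the helper names `matmul`, `trace` and `design_term` are hypothetical. The matrices used in any call are made-up inputs, not values taken from the text.

```python
def matmul(a, b):
    """Product of two matrices given as nested lists (rows of a by columns of b)."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("incompatible dimensions")
    cols = len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for row in a]


def trace(a):
    """Sum of the diagonal entries of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))


def design_term(sigma, h):
    """Coefficient -tr(Sigma H Sigma) / tr(Sigma) of c in the risk expansion.

    sigma is assumed positive definite (so tr(Sigma) > 0); h is the limit
    of M_n - n * Sigma^{-1}.
    """
    shs = matmul(matmul(sigma, h), sigma)
    return -trace(shs) / trace(sigma)
```

With $\Sigma = I_2$, for instance, the function returns $-\mathrm{tr}(H)/2$, so a positive definite $H$ lowers the second order risk and a negative definite $H$ raises it, matching the discussion above.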

CHAPTER 4
SUMMARY AND FUTURE RESEARCH

4.1 Summary

In Chapter 2 we considered sequential estimation of a vector of normal means, $\theta$, under a hierarchical Bayes model with loss given by sum of errors squared plus (linear) cost. We determined an A.P.O. stopping rule and showed that the corresponding sequential procedure was asymptotically nondeficient with respect to the Bayes sequential estimation procedure, having the same Bayes risk expansion: $\omega_1c^{1/2} + \omega_2c + o(c)$. We then proceeded to look at the performance of estimators other than the posterior mean in conjunction with our A.P.O. rule: a diffuse prior based estimator and the traditional sample mean vector. We found risk expansions for these procedures, and identified their performance losses, which occurred in the order $c$ terms.

In Chapter 3 we considered sequential estimation of a vector of regression parameters, $\beta$, from a generalized linear model with normal error structure, under a hierarchical Bayes model. In an analysis parallel to that of Chapter 2, we developed an A.P.O. stopping rule and evaluated the performance of estimation procedures based on the A.P.O. rule in conjunction with the true posterior mean of $\beta$, a posterior mean arrived at via diffuse prior calculations, and the traditional weighted least squares estimator. Again, the basic procedure was shown nondeficient, and the remaining procedures showed deficiencies in the order $c$ terms of their risk expansions, with the least squares estimator showing the greatest deficiency. We also indicated the practical differences produced by extending to a regression model.

4.2 Further Research

There are many directions in which to pursue further research, staying in the framework of A.P.O. rules for sequential multiparameter estimation. Three quickly identified "fronts" are the model, the loss structure and the stopping rule. In the context of the model we can consider adding a general hyperprior component to the gamma parameters associated with the prior distribution of $R$. We can also consider putting a prior distribution on $\lambda$. In the context of loss structure, we can look at various extensions, perhaps to the generality of Bickel and Yahav (1968), who looked at general smooth losses, having specified behavior near zero and infinity. In the context of stopping rule, we have noted certain simple modifications of the A.P.O. rules developed, which also prove nondeficient. Specifying a class of nondeficient stopping rules could be quite interesting.

Finally, outside the realm of A.P.O. rules, it would be very practical to consider multiparameter estimation under hierarchical models when sampling is done in two or three stages. Full sequential sampling is not always even possible, much less practical. An asymptotic approach could allow the sizes of the stagewise samples to increase when cost decreases, as for example in Hall (1981).

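APPENDIX A
DETAILS OF THEOREM 2.1

Here we establish the details of Theorem 2.1. For convenience, we rewrite equations (2.2.29)-(2.2.33). First we prove (2.2.29).

Equation (2.2.29): For all $c$,
$$E\left[2c^{1/2}(\mathrm{tr}\,\Sigma)^{1/2}E(R^{-1/2}\mid\mathcal{A}_T)\right] = 2c^{1/2}(\mathrm{tr}\,\Sigma)^{1/2}\left\{\Gamma(\tfrac12(b-1))/\Gamma(\tfrac12 b)\right\}(a/2)^{1/2}.$$
Proof: Since $\{E(R^{-1/2}\mid\mathcal{A}_n)\}$ is a uniformly integrable martingale, the optional stopping theorem applies to the l.h.s. of (2.2.29). The result follows upon computing the prior mean of $R^{-1/2}$.

We now consider (2.2.32).

Equation (2.2.32): As $c\to 0$,
$$E\left[c^{-1}T^{-2}\lambda^{2}\mathrm{tr}(F_TGF_T)V(M\mid\mathcal{A}_T)\right] \to \lambda^{2}\{\mathrm{tr}(\Sigma G\Sigma)/\mathrm{tr}\,\Sigma\}E[R\,V(M\mid\theta,R)].$$
Proof: From Lemmas 2.1 and 2.2, and the fact that $T\to\infty$ a.s. as $c\to 0$, we obtain
$$c^{-1}T^{-2}\lambda^{2}\mathrm{tr}(F_TGF_T)V(M\mid\mathcal{A}_T) \to \lambda^{2}\{\mathrm{tr}(\Sigma G\Sigma)/\mathrm{tr}\,\Sigma\}R\,V(M\mid\theta,R) \quad\text{a.s. as } c\to 0. \tag{A.1}$$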

54 The result will follow by showing that the l.h.s. of (A.l) is uniformly integrable. We exhibit an integrable dominating function. Note that, using (2.2.17), (2.2.20), and Lemma 1.2, c"^T"2A2tr(FjGFrj,)V(M|A^) < A2u;^^tr(EGi;) E(m2|A^) < A^ftrEGE/tri;) sup E(M2|An)(s i/(n-l)p) . '^n>n (A.2) — z Note that the sequence •|(S i/(n-l)pj , n>nQ^ is a backward submartingale, since S 2/(n-l) is a mean of exchangeables, and that ^E (M |An), nQnn ^ °^ ' >-"0 < E sup E2(M2|An) n>no E 1"P (Sn„l/("-l)p) n>nn^ < K E^ [ e2(m2|Aoo)]e5 (Sn„l/("0-^)p) -r n = KE^Le2(m2|Aoo)J E R^E Knl/(no-l)p) -2 R, < K E^(M^)E^(r2)E^(x2^^_i)p)^^ < cx), (A.3) since (nQ-l)p > 5 and ER < oo. In the above, and in what follows, K is a positive generic constant which may depend on Uq and p, and need not be the same at different steps. Thus,

55 the l.h.s. of (2.2.34) is uniformly integrabie, and so from (A.l) and (A. 3) we obtain (2.2.32). We next prove (2.2.31). Equation 2.2.31: As c -+ 0, c"^T~VtrFrp trEWrp -A tr ED"'l] /tri:. Proof : Note that |Wn, 0 since R is Aqo measurable. Also, F^ E = -T ^AE^D+AT ^e] E. It follows that c"^T"l(tr(F^ E))w.j, = -c~^T"2tr(AE(P+AT"^E) E)Wrp -A|tr(Ep ^E)/trEJ a.s. as c^O, (A.4) via Lemma 2.1, the martingale convergence theorem, and the fact that T—kx a.s. as c— 0. It remains to show uniform integrability. We have, using (2.2.20), -1 c~^T~2tr(AE(P + AT~^E) eW-j < AU;^^tr(Ep~^E)Wrj, < Atr(ED~^E) sup fUn^Wn) n>nr "^"0 < A{tr(Ep-lE)/trE} sup 1 + [E(Sjj2W|An) + a] (S^^j) -1 (A.S) where the last inequality follows from the fact that Wjj = (^nl + ErSjj2('^)|An1 + aV(np+b-2). It now suffices to show that sup ErSjj2('^)lAn]S~J

PAGE 63

56 is integrable since the argument will imply E T sup S^} n>n(j < oo. Note that 1/E[S^2(^)l^n] < E (Xn Mlp) (n"^? + AD ^) (Xn Mlp)|An < KE _(Xn-Mlp)'^?-HXn-Mlp)|An] nQ| and jXn (R?~ )Xn, n>nQ|, the Schwarz inequality, and the maximal inequality for submartingales, we can obtain " sup E[S„2(M)|An]S-;' n>no < K(no-l)p) X E sup |(xJ(R?-l)Xn)(RS„i/(n-l)p) + E[M2|An](s^i/(n-l)p) < K^E' sup (xJ(RE-l)X„) n>n 1 E^ sup (rS ,/(n-l)p) n>no + E < kJe^ _, 1 sup E(M^|An) n>no E' r2 sup (RSjji/(n-l)p)" n>nQ (xT (R?-l)XnJ E^ (RS l/(no-l)p) + e5[e(m4|Aoo)] E^ r2(rS ,/(no-l)p)"

PAGE 64

57 = K-^E (xT (RE-l)Xn J^l E^ E (RS ,/(no-l)p)"' e, R 1 1 + e^Cm'*) e^ R^E (RS i/(no-l)p)"' e, R < K^E^ (xT (R?-l)XnJ E^ (xf i)p) 1 1 1 + E^(M'') e^(r2)e2 (^(nO-l)p) (xT(R?-^)XnJ (A.7) is finite. We Since (nQ-l)p > 5, ER < oo, it remains to show tiiat E have, using Theorem 1, p. 55 of Searle (1971), concerning cumulants of quadratic forms for normal random vectors. (xT(RE-l)XnJ nQ^E<^E (xT(noRE-l)XnJ^ e, R 2p + p2 + (4+2p)eT(nQRE-i)e + (e'r(nQRE-i)e) (A.8) Now it suffices to show E (e'^(ARi;-i)e) < CO. Applying the same theorem, we have (g'^(ARE"^)6) = E
PAGE 65

58 {9 9 ~ 2tr(i;~^P) + (trE^^P) + 4AljE"^PE"hp + 2A(trE~^p)lJ?"hp RM^ + X^il^V'^lp)^^^^^ \ < oc> (A-9) using the marginal indejiendence of M and R. We now turn to (2.2.30). Equation 2.2.30: As c -* 0, 1 1/1 2c ItrE)^ W|, E[R ^|Anp] (2p) Proof : We will break up the l.h.s. of (2.2.30) into more accessible comjwnents. First, define Vn = e(r ^|An) = dn^EJgs^i + S^2(M)+ a)/(np+b-2)]2|An I, (A.IO) where dn^ = rr(i(np+b-l))/r(l(np+b-2))1(l(np+b-2)) I Now, in the r.h.s. of (A.IO) xpand Us^^+S^2(^)+ay{np+h-2)J = (E(R-Vn, M))^ about Wq = E(R-l|An) = S ^ + E[Sjj2(^)l^n) + a )/(np+b-2) in both oneand two-term Taylor series, to obtain we e: dnVn = W| + iE|cn'[E(R-l|An, M) E(R Vn)]An l = wi lE|^n^[E(R-l|An, M) E(R-l|An)] JAn (A.ll) (A.12) where Cxx and ^n are both between E(R ^|An, M) and E(R ^|An), and are not An measurable. Now write

PAGE 66

59 c ^(trE)5(w|,-V^) 1 , 1 (trEWw|>-d^V^) + E c 2(trS)2(d^-l)Vj (A.13) We first show that the first term on the r.h.s. of (A. 10) goes to with c. We will use (A. 12) to establish pointwise convergence of the integrand, and (A. 11) for uniform integrability. To that end, using (A.12), and noting the cancellation in E(R |An, M) and E(R |An), < c 1 11 (trS) ( YYnn— UnpVnp W4^-dT,Vr ) (trE)^EU^^^E(R-l|A^, M) E(R~1|A^)] |a^ 1 1 f 3 _i 1 _2 f _3p ,-|2 = Ic 2(trE)2(Tp+b-2) E^T St2(M) e(s^2(^)|At) < K(Tp+b-2) U;^2E{s|,2(M)|Arp} < K(Tp+b-2)~ V^^j (x!f X^) + E(M'*|A^) a^. as c — 0, (A.14) _1 _1 _3 _3 where we have used the facts that c T < Urp , ^rp < KUrp , S.p2(M) e(s^2('*^)I^t) < E s4^2(^^)|At, and an inequality similar to that of (A. 6). To obtain uniform integrability, we need (A.ll) to stay within our moment constraints. We have

PAGE 67

60 E -1 I, 1 2(trE)^fW^-d^V.p -iEic ^(tri;)^E Crp^(E(R"VT' ^) E(R"^|At)) nQL (S^j/(n-l)p) E(s„2(^)l^n) < CO, (A.15) where we have used the facts that c T~ < f Srp, /Tp+b-2 j and C-p < [Srp^ /Tp+b-2 ] ^, and the last line is justified via the same argument as that of (A. 7). To handle the second term we need to know the behavior of (dn-1)Observe that dn = ( ^(np+b-2) ) tI i(np+b-2) ) / FI i(np+b-l) (A.16) From Lemma 1 of Alvo (1977) we can obtain two facts: z2r(z)/r(z+l) = 1 + (8z)-i + Oe(z-2), (A.17) for large z (>0), where Oe denotes exact order, and 1 1 < z2r(z)/r(z+l) < 1 + (4z) -1 (A.18)
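The two facts taken from Alvo (1977) can be checked numerically. The sketch below reads (A.17) as z^(1/2) Γ(z)/Γ(z + 1/2) = 1 + (8z)^(-1) + O(z^(-2)) and (A.18) as 1 < z^(1/2) Γ(z)/Γ(z + 1/2) < 1 + (4z)^(-1), and reads (A.16) as d_n being that ratio at z = (np + b - 2)/2. These readings of the scanned formulas, the function names, and the sample values are assumptions of this illustration.

```python
import math


def gamma_ratio(z: float) -> float:
    """Return z^(1/2) * Gamma(z) / Gamma(z + 1/2), computed on the log scale."""
    return math.exp(0.5 * math.log(z) + math.lgamma(z) - math.lgamma(z + 0.5))


def d_n(n: int, p: int, b: float) -> float:
    """d_n as read from (A.16): the gamma ratio at z = (n p + b - 2) / 2."""
    return gamma_ratio(0.5 * (n * p + b - 2))


if __name__ == "__main__":
    for z in (1.0, 5.0, 50.0, 500.0):
        ratio = gamma_ratio(z)
        # Remainder after the (8z)^(-1) term, rescaled by z^2; it should stay bounded.
        scaled_remainder = z * z * (ratio - 1 - 1 / (8 * z))
        print(z, ratio, 1 + 1 / (4 * z), scaled_remainder)
    # 4 n p (d_n - 1) should approach 1, matching d_n - 1 = (4np)^(-1) + O(n^(-2)).
    print(4 * 10_000 * 3 * (d_n(10_000, 3, 5.0) - 1))
```

Working with `math.lgamma` rather than `math.gamma` avoids overflow for the large arguments that arise when n is large.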

PAGE 68

61 _ 1 Putting z =: ^(np+b-2), we get the following: and Since have, easily {%'V^'-' dn-1 = (4np) + Oe(n 2) < dn-1 < [4(np+b-2)] (A.19) (A.20) Vqo = R > is a uniformly integrable martingale, with (A.19) we 1 1 -1 c 2(tri:)^(d^-l)V^ ^ (4p) ' a.s. as c ^ 0. (A.21) With (A.20) we have, by arguments similar to those for (2.2.31), uniform integrability, since ER~ < 00. These results, combined, yield (2.2.30). Finally, we establish the validity of (2.2.33). Equation 2.2.33: As c -+ 0, lT-l({trE} 11 1 Proof: Note that it is sufficient to show 1 1 1 E|c"lT~V(trE)%|, Ut)^} -* as c ^ 0, and E h~^T~'^(v?^ cHy\ ^ as c ^ 0. (A.22) (A.23) We begin with (A.22). We will find the following elementary inequalities useful. For a, b > 0,

PAGE 69

Using (2.2.20), (A. 25), and the fact that (tri;)W:3. > U^ > a.s., we obtain 62 (a^-b^)
PAGE 70

63 + al ^^ P(St2(m)IAt) Sti/(t-i)p j < K^ sup ^ "^,, ., ^ + sup (S„i/(n-l)p < K< -r (A.27) The arguments for (2.2.31) can be applied to show that the expression is integrable. Dominated convergence completes the justification of (A. 22). To establish (A. 23), note that from (2.2.20) we can obtain 1111 111 Jrp tt2 _ „2 , „2/rr 1N tt2 ^ ,2 , it2 < c^T U|, = c^ + c^(T-l) U;^ < c^ + V^_-^ V^ (A.28) Hence, using (A.28), E ^ ^ 2 c~^T~^(u|> c^T) < E 2c"^T"^ 1 -2 + (u^_i uy < E 2T~^ + 2c~h~^\]:^^(\]rj._^ Vrj.) {• -2/ < 2E(T"-') + 2El TU^''(Uy_^ U-j) (A.29) The dominated convergence theorem can be applied to show that E(T ) ^ 0, so it remains to consider the second term in the r.h.s. of (A.29). To this end, note

PAGE 71

|Un-l Unl 64 < (tr?) < (tr?) P^n-l, i((n-l)P+b-2) (np+b-2) V(YTi;-lYn)(i.p+b-2) pS^j((n-l)p+b-2) (np+b-2)"V(YTE-lYn)(nP+b-2)"^ (A.30) He n(U^_l Un)^Un2 < n(p((n-l)p+b-2) + (yTs-1y„)s-; < 2n p2((n-l)p+b-2) + [Y'^^-hn) S"^ < Kjn-1 + K2(n-1) ^(yJs'^Yn) [s^i/i^-l)) (A.31) ,-ln 2 -2 < K^n-l + K2{(n-l)''f: (y?'(RE-1)Yj) }(RS„i/(n-l)) i=2' < K^n^l + K2I ^s^p Un-1) ^J (Y^^lRE-^Yi) ^sup (RSj^i/(n-l)) (A.32) 2 _9 p Equation (A.31) can be used to show that T(Urp , Urp) Urp" — > as c — > 0, noting that the .Tv-1 e Y. S Y(i = 2,...) are identically distributed. The Schwarz inequality applied to (A.32) provides an integrable dominating function when (un l)p > 9, since |(n-l)" X) (Yi^(R-?^^)Yi) } and (RSj^2/("^l)) ^^^ backward submartingales. Thus i=2 (A.23) follows from (A.29) (A.32), and so (2.2.33) is established.

PAGE 72

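APPENDIX B
DETAILS OF THEOREM 2.2

Here, we provide the details necessary to complete the arguments of Theorem 2.2. We need a preliminary lemma.

Lemma B.1. Suppose N is the Bayes stopping rule for the given sequential estimation problem. Then there exists a number $B > 0$ such that

$$W_N \le BcN^2. \qquad \text{(B.1)}$$

Proof: Since N is Bayes, on the set $[N = n]$ we must have, for any k,

$$E\bigl[E(\|\theta-\hat\theta_{n+k}\|^2\mid\mathcal{A}_{n+k})\mid\mathcal{A}_n\bigr] + c(n+k) \ge E\bigl[\|\theta-\hat\theta_n\|^2\mid\mathcal{A}_n\bigr] + cn. \qquad \text{(B.2)}$$

That is, the immediate stopping risk must not exceed the expected risk of taking any fixed number of additional observations. Note that

$$E\bigl[\|\theta-\hat\theta_n\|^2\mid\mathcal{A}_n\bigr] = (\operatorname{tr}G_n)E[R^{-1}\mid\mathcal{A}_n] + (\operatorname{tr}H_n)V(M\mid\mathcal{A}_n), \qquad \text{(B.3)}$$

where $\operatorname{tr}G_n = \operatorname{tr}(n\Sigma^{-1}+\lambda D^{-1})^{-1}$ is decreasing in n, and $\operatorname{tr}H_n = \lambda^2\operatorname{tr}(G_nD^{-1}1_p1_p^TD^{-1}G_n)$. In view of (B.3), (B.2) translates to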

PAGE 73

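$$(\operatorname{tr}G_{n+k})E[R^{-1}\mid\mathcal{A}_n] + (\operatorname{tr}H_{n+k})E[V(M\mid\mathcal{A}_{n+k})\mid\mathcal{A}_n] + c(n+k) \ge (\operatorname{tr}G_n)E[R^{-1}\mid\mathcal{A}_n] + (\operatorname{tr}H_n)V(M\mid\mathcal{A}_n) + cn. \qquad \text{(B.4)}$$

Rearranging (B.4), and using the fact that $V(M\mid\mathcal{A}_n)$ is a positive supermartingale, gives us

$$(\operatorname{tr}G_n-\operatorname{tr}G_{n+k})E(R^{-1}\mid\mathcal{A}_n) + (\operatorname{tr}H_n-\operatorname{tr}H_{n+k})V(M\mid\mathcal{A}_n) \le ck. \qquad \text{(B.5)}$$

Note that

$$G_{n+k} = \bigl((n+k)\Sigma^{-1}+\lambda D^{-1}\bigr)^{-1} = \bigl(G_n^{-1}+k\Sigma^{-1}\bigr)^{-1} = G_n - G_n(G_n+k^{-1}\Sigma)^{-1}G_n. \qquad \text{(B.6)}$$

Substituting in (B.5), we obtain, writing $A = D^{-1}1_p1_p^TD^{-1}$,

$$\operatorname{tr}\bigl[G_n(G_n+k^{-1}\Sigma)^{-1}G_n\bigr]E(R^{-1}\mid\mathcal{A}_n) + \lambda^2\operatorname{tr}\bigl\{2G_nAG_n(G_n+k^{-1}\Sigma)^{-1}G_n - G_n(G_n+k^{-1}\Sigma)^{-1}G_nAG_n(G_n+k^{-1}\Sigma)^{-1}G_n\bigr\}V(M\mid\mathcal{A}_n) \le ck. \qquad \text{(B.7)}$$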
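The update identity (B.6) is an instance of the Woodbury matrix identity and can be verified numerically. The sketch below uses hand-rolled 2 x 2 matrix helpers so that it needs only the standard library; the particular matrices for Σ and D, the value of λ, and all function names are hypothetical choices made for this illustration.

```python
def mat_add(A, B):
    """Entrywise sum of two 2 x 2 matrices."""
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]


def mat_scale(s, A):
    """Scalar multiple of a 2 x 2 matrix."""
    return [[s * A[i][j] for j in range(2)] for i in range(2)]


def mat_mul(A, B):
    """Product of two 2 x 2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def mat_inv(A):
    """Inverse of a nonsingular 2 x 2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]


def trace(A):
    return A[0][0] + A[1][1]


def G(n, Sigma, D, lam):
    """Direct definition: G_n = (n Sigma^{-1} + lam D^{-1})^{-1}."""
    return mat_inv(mat_add(mat_scale(n, mat_inv(Sigma)), mat_scale(lam, mat_inv(D))))


def G_update(Gn, Sigma, k):
    """Right-hand side of (B.6): G_n - G_n (G_n + Sigma / k)^{-1} G_n."""
    inner = mat_inv(mat_add(Gn, mat_scale(1.0 / k, Sigma)))
    correction = mat_mul(mat_mul(Gn, inner), Gn)
    return mat_add(Gn, mat_scale(-1.0, correction))


if __name__ == "__main__":
    Sigma = [[2.0, 0.5], [0.5, 1.0]]  # hypothetical covariance matrix
    D = [[1.0, 0.2], [0.2, 3.0]]      # hypothetical prior scale matrix
    lam, n, k = 0.7, 4, 3
    direct = G(n + k, Sigma, D, lam)
    updated = G_update(G(n, Sigma, D, lam), Sigma, k)
    print(direct)
    print(updated)
    print(trace(G(n, Sigma, D, lam)) - trace(direct))  # positive: tr G_n decreases in n
```

The difference of traces printed last equals the trace of the correction term, which is the coefficient of E(R^(-1) | A_n) in (B.7).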

PAGE 74

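Using $G_n = n^{-1}F_n$, we further obtain

$$\operatorname{tr}\bigl[F_n(G_n+k^{-1}\Sigma)^{-1}F_n\bigr]E(R^{-1}\mid\mathcal{A}_n) + \lambda^2n^{-1}\operatorname{tr}\bigl\{2F_nAF_n(G_n+k^{-1}\Sigma)^{-1}F_n - G_n(G_n+k^{-1}\Sigma)^{-1}F_nAF_n(G_n+k^{-1}\Sigma)^{-1}F_n\bigr\}V(M\mid\mathcal{A}_n) \le ckn^2. \qquad \text{(B.8)}$$

We want to argue that the second term on the l.h.s. of (B.8) is nonnegative, and hence that it can be dropped from the inequality. Recall that k is arbitrary. Note that

$$\operatorname{tr}\bigl\{2F_nAF_n(G_n+k^{-1}\Sigma)^{-1}F_n - G_n(G_n+k^{-1}\Sigma)^{-1}F_nAF_n(G_n+k^{-1}\Sigma)^{-1}F_n\bigr\}$$
$$\to \operatorname{tr}(2k\Sigma A\Sigma) \quad\text{as } n \to \infty, \text{ for fixed } k, \qquad \text{(B.9)}$$
$$\to (2n-n)\operatorname{tr}[F_nAF_n] = n\operatorname{tr}[F_nAF_n] \quad\text{as } k \to \infty, \text{ for fixed } n. \qquad \text{(B.10)}$$

Both of these expressions are nonnegative. It follows that

$$\operatorname{tr}\bigl[F_n(G_n+k^{-1}\Sigma)^{-1}F_n\bigr]E(R^{-1}\mid\mathcal{A}_n) \le ckn^2 \qquad \text{(B.11)}$$

for all n larger than $n^*$, say, and $k = 1$. For $1 \le n \le n^*$, we can choose values $k_1, \ldots, k_{n^*}$, say, such that (B.11) holds. We obtain

$$\min_i\bigl\{\operatorname{tr}\bigl[F_i(G_i+k_i^{-1}\Sigma)^{-1}F_i\bigr]\bigr\}E(R^{-1}\mid\mathcal{A}_n) \le (cn^2)\max_i\{k_i\}, \qquad \text{(B.12)}$$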
PAGE 75

where $k_n = 1$ for all n beyond the threshold in (B.11). Since $\operatorname{tr}[F_n(G_n+k^{-1}\Sigma)^{-1}F_n] \to k\operatorname{tr}(\Sigma) > 0$ for each fixed k, as $n \to \infty$, we have

$$E(R^{-1}\mid\mathcal{A}_n) \le Bcn^2 \qquad \text{(B.13)}$$

for some $B > 0$, on the set $[N = n]$. This is sufficient for the lemma.

We can now establish (2.2.37)-(2.2.41). Note that (2.2.37) is immediate, since the integrand is a uniformly integrable martingale. With regard to (2.2.39), we have

$$c^{-1}N^{-2}W_N \le BW_N^{-1}W_N = B, \qquad \text{(B.14)}$$

applying the lemma; the result follows via the Dominated Convergence Theorem. For (2.2.40), we have

$$c^{-1}N^{-2}V(M\mid\mathcal{A}_N) \le BW_N^{-1}E(M^2\mid\mathcal{A}_N) \le BE(R\mid\mathcal{A}_N)E(M^2\mid\mathcal{A}_N) \le B\Bigl\{\sup_{n\ge n_0}E(R\mid\mathcal{A}_n)\Bigr\}\Bigl\{\sup_{n\ge n_0}E(M^2\mid\mathcal{A}_n)\Bigr\}, \qquad \text{(B.15)}$$

using Jensen's inequality and Lemma B.1. The fact that the r.h.s. of (B.15) is integrable follows from the Schwarz inequality and Doob's maximal inequality, since $E(R\mid\mathcal{A}_n)$ and $E(M^2\mid\mathcal{A}_n)$ form uniformly integrable martingales, and hence have integrable last elements. Equation (2.2.40) follows via the Dominated Convergence Theorem.

Now, for (2.2.38), using (A.8), arguments in (A.12) and Jensen's inequality,

PAGE 76

69 1 1 2 1 , 1 {K Vn) = ^"'K ^N^n) + ^''(^N 1)V N 1 1,1 1 1 < B%Wj^^(w^ dj^Vj^) + B5NWj;^^(dp^ 1)Vn < K < K < K 1 1 E Wn'(^n)"'[e(^N2(M)|An) Sn2(M)] An + ^ n'Vn 'Nl ,(N-1)P, e(Sn2('^)|An) + 1 n>j{(^) K^n2(M)|An)+l (B.16) which is integrable, via the arguments of (A. 7) (A. 9). Again, we can apply dominated convergence to obtain the desired result. To establish (2.2.41), note that for all c. < c~^Ie Q-Orj.1 + CT E \6-i^f + cN = E 2c ^(trE)^jwi,-E(R ^|Arp) 1 1/1 2c ^(trS)2 W2^-e(r 2|Aj^) + Erc"^T~\trFrp trE)W J Erc~%"^(trFj^ trE)Wj^1 + E[c'^T~2A2tr(F^GFY)V(M|A^)] E[c"%"2^2jj(p^gp^)y(j^|^^^)'j + E c~^T"M (trE)%|, -c^T o(l) E / 111 c"%"M (trE)^W^ c^N / 11 1 " c"%~^ (trS)^W|j c^N as c — + 0, (B.17) since N is Bayes, and Theorem 2.1 holds. It follows that E o(l) as c — + also; i.e., (2.2.41) holds, and thus Theorem 2.2 is proved / 11 1 ^ c~%~M (trS)2W^ c^N

PAGE 77

APPENDIX C DETAILS OF THEOREM 2.3 Under the conditions of Theorem 2.3, ,-1 "X ~ -T Atr|?(P"^-C)E|/tr(E) A2|tr(EGE) /tr(E)} E[RV(M|e, R)] as c -» 0. (C.l) Proof : We first establish jX)intwise convergence of the integrand. Recall — 1 9n = UV~^ + AD"^) (nE~^Xn + Ap~hpE(M|An)) and -1 ^n = (nE~^ + AC) nE'^Xn, -1 here C = D"^ (lpP~hp) P"hpl^p~^ Hence, for n > 1, l^n-^nl (ni;~^Ac) (nE"l)Xn (nE"^Ap) {(ni;"MXn + Ap~^E(M|An)lp} 70

PAGE 78

71 = Xn-(n?~^Ac) {AC)Xn -Xn + (n?-^Ap-l) (Ap-^){Xn E(M|An)lp}| = a2 {(nE-^Ap-1) p-1 (nE-^Ac) c}(Xn E(M|An)lp)| , (C.2) since Clp = 0. Hence, using (C.2), c J2'~p~P'p| = aV^t~2 {(E"^AT"lp"^) p-1 (e-^AT-^c) c}(x^ E(M|A^)lp)| (C.3) We have, via arguments in Lemma 2.2, and the continuity of matrix inversion, that (X^ E(M|A^)lp) (e E(M|0, R)lp); (C.4) P"^(i:~^AT~^P~^) c(e~^AT"^c) -^ P"^ C a.s. as c -^ 0. (C.5) Using these and Lemma 2.1, we have ^"l^T'^xf ^ ^^(ti-?)"^R|?(P~-^-C)(e E(M|e, R)lp) = 4" (say), a.s. cis c — ^ 0. (C.6)

PAGE 79

72 At this point, $ does not look like the right expression. We first show uniform integrability of ]c~ j^rp-^rpr, c > oK and the indicate how E($) is correct. Using (2.2.14), (C.3), and basic projserties of the natural matrix norm, it can be shown that C 12'T'~2t -1 -1 P _ < A2u;^M{(i:"^AT"^D~n P"^ (e'^AT'^c) c} |X^ E(M|A^)lp < KU;^%rp E(M|Arp)lp"^ < k{(Sti/(T-1)p) |XtP + (St^/(T-1)p) e2(m|At) (C.7) Uniform integrability follows via arguments similar to those used to establish (2.2.31). f See (A.7)). We can now look at E($). It is important to note that in its current form, E(0), obviously nonnegative, demonstrates that there is a cost associated with using (T, dj,) versus (T, ^rp). This is not so apparent in (2.3.14). We have E($) = E A2(tri;) R|E(P"^-C)(e E(M|e, R)lp = A(trE) E AR^e Mlp + Mlp E(M|e, R)lpJ x(p~^-C)i;E(p~^-C)('e Mlp + Mlp E(M|e, R)lp)

PAGE 80

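$$= \lambda(\operatorname{tr}\Sigma)^{-1}\Bigl\{E\bigl[\lambda R(\theta-M1_p)^T(D^{-1}-C)\Sigma^2(D^{-1}-C)(\theta-M1_p)\bigr]$$
$$+ E\bigl[\lambda R(M1_p-E(M\mid\theta,R)1_p)^T(D^{-1}-C)\Sigma^2(D^{-1}-C)(M1_p-E(M\mid\theta,R)1_p)\bigr]$$
$$+ 2E\bigl[\lambda R(\theta-M1_p)^T(D^{-1}-C)\Sigma^2(D^{-1}-C)(M1_p-E(M\mid\theta,R)1_p)\bigr]\Bigr\}. \qquad \text{(C.8)}$$

We examine the three expectations in (C.8) individually. First,

$$E\bigl[\lambda R(\theta-M1_p)^T(D^{-1}-C)\Sigma^2(D^{-1}-C)(\theta-M1_p)\bigr]$$
$$= E\bigl\{\lambda R\,E\bigl[(\theta-M1_p)^T(D^{-1}-C)\Sigma^2(D^{-1}-C)(\theta-M1_p)\mid M,R\bigr]\bigr\}$$
$$= E\bigl\{\lambda R\operatorname{tr}\bigl((\lambda R)^{-1}D(D^{-1}-C)\Sigma^2(D^{-1}-C)\bigr)\bigr\}$$
$$= \operatorname{tr}\{\Sigma(D^{-1}-C)D(D^{-1}-C)\Sigma\} = \operatorname{tr}\{\Sigma(D^{-1}-C)\Sigma\}, \qquad \text{(C.9)}$$

where we have used the fact that, given M and R, $\theta$ has mean vector $M1_p$ and covariance matrix $(\lambda R)^{-1}D$. Second,

$$E\bigl[\lambda R(M1_p-E(M\mid\theta,R)1_p)^T(D^{-1}-C)\Sigma^2(D^{-1}-C)(M1_p-E(M\mid\theta,R)1_p)\bigr]$$
$$= E\bigl\{\lambda R\,E\bigl[(M1_p-E(M\mid\theta,R)1_p)^T(D^{-1}-C)\Sigma^2(D^{-1}-C)(M1_p-E(M\mid\theta,R)1_p)\mid\theta,R\bigr]\bigr\}$$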

PAGE 81

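$$= E\bigl[\lambda RV(M\mid\theta,R)\,1_p^T(D^{-1}-C)\Sigma^2(D^{-1}-C)1_p\bigr]$$
$$= \lambda\operatorname{tr}\bigl(\Sigma(D^{-1}-C)1_p1_p^T(D^{-1}-C)\Sigma\bigr)E[RV(M\mid\theta,R)]$$
$$= \lambda\operatorname{tr}(\Sigma G\Sigma)E[RV(M\mid\theta,R)], \qquad \text{(C.10)}$$

where, recall, $G = D^{-1}1_p1_p^TD^{-1}$. Finally,

$$E\bigl[\lambda R(\theta-M1_p)^T(D^{-1}-C)\Sigma^2(D^{-1}-C)(M1_p-E(M\mid\theta,R)1_p)\bigr]$$
$$= E\bigl[\lambda R(\theta-E(M\mid\theta,R)1_p+E(M\mid\theta,R)1_p-M1_p)^T(D^{-1}-C)\Sigma^2(D^{-1}-C)(M1_p-E(M\mid\theta,R)1_p)\bigr]$$
$$= E\bigl\{\lambda R\,E\bigl[(\theta-E(M\mid\theta,R)1_p)^T(D^{-1}-C)\Sigma^2(D^{-1}-C)(M1_p-E(M\mid\theta,R)1_p)\mid\theta,R\bigr]\bigr\}$$
$$- E\bigl\{\lambda R\,E\bigl[(M1_p-E(M\mid\theta,R)1_p)^T(D^{-1}-C)\Sigma^2(D^{-1}-C)(M1_p-E(M\mid\theta,R)1_p)\mid\theta,R\bigr]\bigr\} \qquad \text{(C.11)}$$
$$= -\lambda\operatorname{tr}(\Sigma G\Sigma)E[RV(M\mid\theta,R)], \qquad \text{(C.12)}$$

where the first term of (C.11) is identically 0, and the value of the second follows from (C.10). Using (C.9), (C.10), and (C.12), we find that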

PAGE 82

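$$0 \le E(\Phi) = \lambda\operatorname{tr}(\Sigma(D^{-1}-C)\Sigma)/\operatorname{tr}(\Sigma) - \lambda^2\{\operatorname{tr}(\Sigma G\Sigma)/\operatorname{tr}(\Sigma)\}E[RV(M\mid\theta,R)],$$

and the theorem is proved.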

PAGE 83

APPENDIX D DETAILS OF THEOREM 2.4 Under the conditions of Theorem 2.4, c-1e C7 rp — J\. rr\ | Atr(i:p"^i;)/tr(E) A2/tr(SGi;)/tr(i;)}E[RV(M|e, R)] as c -* 0. (D.l) Proof : We first establish pointwise convergence of the integrand. Recall that ^n = (nS^^AD"^) (nS"^Xn + Ap~^lpE(M|An)) I n~^E(n~li;+Ap~^) Xn + (nS^^AD'M Ap"hpE(M|An). (D.2) Hence C I T^^-^T^ T"^I;(t'^^S+A"^P) Xrp T"^(S;"l+AT~^p"n AD"hpE(M|A^) 76

PAGE 84

77 -1 = A^c'^T"^ s(aT"^E+P) X^ (E-^AT^lp-^) p"^lpE(M|A^) A2(trS) RlED"Ve E(M|0, R)lp)r a.s. a^ c -^ 0, (D.3) using (as in Theorem 2.3) Lemma 2.1, arguments in Lemma 2.2, and the continuity of matrix inversion. Uniform integrability follows from the following inequality, similar to (C.7), and arguments like those establishing (2.2.31): c"^|e^-x^|^ < KA2u;^^i|x^|^ + e2(m|a^)1 < k|(Stj/(T-1)p) fX^f + (Sti/(T-1)p) e2(M|At) (D.4) It remains to show that — 1 H A2(trE) R SP"Ve E(M|0, R)1j = Atr(Ep~^E)/tr(?) A2/tr(EG?)/tr(E)|E[RV(Mie, R)]. (D.5) This follows from arguments virtually identical to those of (C.8) (C.IO) and (C.12). Thus, the theorem is proved.

PAGE 85

APPENDIX E DETAILS OF THEOREM 3.1 Here we prove the relationships which establish Theorem 3.1. We start with two propositions that are used to show uniform integrability. Proposition E.l : Under the regression model of Section 3.2, if xir^k > 2u, then E sup f RS 1 /nk + b 2) '^"0 < 00. Proof : Using Proposition 3.1, we can write E sup (RSjjj/nk + b 2) n>n = E < sup — ^ — i — E n>n(j sup (rJz. /n1 n>noV i=2 / 78

PAGE 86

79 < KE E^ sup RX^Zj /n 1 ln>nr,V i=2 /?, R < KE <<0 < oo when nQk/2 > u, where we have used Doob's maximal inequality on the backward submartingale {ERZ-Zn-l}. 'i=2 Note : It follows easily that E sup fs„i/nk + b 2) n>n(j < 00 under the additional constraint that b/2 > u, which ensures that E(R") < 00. Proposition E.2: Under the conditions of Theorem 3.1 sup (e{s„2(M)|A„}(s„i) n>nQ < 00. Proof : Note that, using Lemma 1.1, E[S„2(M)lAn](S^l)" E _(Mn^Pn-Mlpr(Ap-^Mn^)"\MnlPn-Mlp)(S^l)"Vn] = E r-1 1,,,_1-L,,_1 (Mn'Pn-^+^-Mlp) (Ap-^+M-^) (Mn'Pn-^+^-Mlp)(S 'nl)'Vn]

PAGE 87

80 < KE Mn'Pn-^l +I^-Mlp|r (Sj^i) An = KE ||R'^'(Mn^Pn-^)f + |R''''(^-Mlp)f|(RS^i)" Note that Mn^Pn"/? = Mn^EX?'v"^Y.-/? = Mn' 1 1 i:xJY"i(YpXj^) . It, follows that 1 H^ I 1 I R'(Mn^Pn-^)| < K n-lR2 Jx^'Y'^CYi-X;^) . So, continuing, we have E[S^2(M)|An](S„i)-^ < KE ln-lR'/'f:X?'Y"\Yi-Xi^)| + |R'/'(^-Mlp)||"l(RS^i)-^ A, Using this relation, we have sup (E{s„2(M)|An}(S^i)-l < K<^E /|l jj ||2 sup E n-lR^/'EXi^y-lCYj-X./?) (RS^i)"^ n>nn VB 1 'I An + E sup e( |r'/'(^-M1p)| (RS„i)-1 n>n An

PAGE 88

81 < K<^E^ sup n>nf n-lR'/'f:xjrY-l(Yi-X.^)| sup (RSjjj/n) n>nQ + E 1/2 sup |R'/'(^-Mlp)f n>ng .1/2 sup (RSj^j/n) -2 n>n(j Via a slight modification of Proposition E.l, E sup (RS i/n) n>nQ < oo if Uok > 5. Also, n>no E sup R''(^-Mlp) = E<^E R'^'(^-Mlp) R, M < KE{ J3 E U=l L L R, M < 00, since, given R = r and M = m, r^(/3--m) ~ N(0, A d-.) where d-is the j diagonal element of D. 1 Finally, if we write R^X^'y^^Yj-Xj^) = Qj, then given R = r and ^ = ^, Qj ~ Np(0, XY X-), independently and, since X^Y Xn ^ S , the components of the Qj's have bounded moments of arbitrary order. Using Proposition 3.2, an argument similar to that above for |R^(/9-Mlp)i gives us E 1-1 " I 1 = Eno /?, R < oo. This establishes the prof>osition. We now proceed to the proofs of (3.2.13) (3.2.17).

PAGE 89

Equation (3.2.13): For all c, 82 1 1,1 1 1 2c5(trE)^E(R~^|AT) = 2c5(trE)5 t[^) / r(|) (|)^ Proof : This is immediate via the optional stopping theorem. Equation (3.2.16) : As c — 0, Erc"^T''2A2tr(FrpGFrp)V(M|Arp)1 -^ A2/tr(SG?)/tr(i:)}E[RV(M|^, R)]. Proof : From Lemma 3.1, the behavior of Fn, and the fact that T — oo a.s. as c — 0, we obtain c~^T~2A2tr(F^GFrp)V(M|A^) A2{tr(i;Gi;)/tr(E)}RV(M|^, R) a.s. as c -^ 0. (E.l) The result will follow by showing that the l.h.s. of (E.l) is uniformly integrable. We exhibit an integrable dominating function. Note that, using (3.2.8) and (3.2.10), and the fact that tr(FiiGFii) is convergent and thus bounded. c~^T"^A^tr(F^GF^)V(M|A^) < Ka2u^^E(m2|A^) -1< K sup |E(M2|An)(S ,/nk + b 2) j. (E.2) n>no The Schwarz inequality coupled with Proposition E.l, the fact that

PAGE 90

83 is a martingale imply -1 -9 -1 c ^T '^W-p -^ (tr?) a.s. as c -* 0. We need the behavior of T(Fj-i:). Note that that n(Fn-?) = n|(An~^p~^n~^Mn) ?} = n|(nMn^-E) nMn^(A~^nP+nMn^)" nMn^} n(nMn^-E) nMn^(A~^P+Mnb nMn^ EHE E(AP"^)E asn ^ oo, applying Proposition 3.3, and the fact that M^ nE — * H as n — > oo. It follows that c"^T"^(tr(F^,) tr(E)W^ -tr(EHE)/tr(E) Atr(Ep ^E)/tr(E) a.s. as c -^ (E.3)

PAGE 91

It remains to show uniform integrability. We have, using (3.2.10), ^T"Vtr(F:j,-E))w^ < K(s^^/Tk + b 2) e[r Vj < KE < KE (RS^^/Tk + b 2) -1 At sup fRS„i/nk + b 2) ^ n>n„^ "' ' '^"0 A. 84 (E.4) using the facts that ntr(Fn-?) is bounded, and that fS-p^/Tk + b 2J is Ar^-measurable. The last expression is integrable via a minor adjustment to Proposition E.l. The result is proved. Equation (3.2.14): As c-^ 0, 1 1,1 2c 2(trE)^(w|, E(R ^[A^)) -(2k) Proof : We will break up the l.h.s. of (3.2.14) into more accessible components. First, define, as in Chapter 2, Vn = E(R 2|An) = dn'E (Snl + Sn2(^) + ^)/("P + ^ -,1 An (E.5) where dn^ = rr(i(np+b-l))/r(i(np+b-2)) (i(np+b-2)) I Now, in the r.h.s. of (E.5) we expand Rs^^^ + Sj^2(^) + ^)A"P + ^ " 2) ' = (E(R"Vn, M))' about Wn = E(R~^|An) = ( Sjjj + E^S^r,(M)|An) + a j /(np + b 2) in both oneand two-term Taylor series, to obtain

PAGE 92

85 dnVn = W| + isi Cn'[E(R-l|An, M) E(R-Vn)] |An 1 r _3|_ 9| = W^ iEJ^elE(R"^|An, M) E(R-l|An)J"|An (E.6) (E.7) where (n and ^n are both between E(R |A, M) and E(R lAjj), and are not Anmeasurable. Now write E \tTY,f(w\-Vr^) = E 1 , 1 (trEj^^W^-d^V^) + E 1 1 c ^(tr5)''(d^-l)V^ (E.8) We first show that the first term on the r.h.s. of (E.8) goes to with c. We will use (E.7) to establish pwintwise convergence of the integrand, and (E.6) for uniform integrability. To that end, using (E.7), and noting the cancellation in E(R~ |An, M) and E(R~ |An), 1 1.1 < c ^(trE)^(wi,-drfVrj,) _1 1 ( _3p -i^l 1 Ic ^(trE)^E<^ ^x^l E(R~ Vt' M) E(R~ VtHI^T f |c 2(tri;)^(Tp+b-2) E^T^ S^2('*^) " e(s^2(^)I^t) < K(Tp+b-2) U^2e|s|>2(M)|A-p} < K(Tp+b-2)-lu;^2||^w|4 ^ ^^i\j^^^ a.s. as c — 0, (E.9)

PAGE 93

86 1 2rp-l 1 _3 2 where we have used the facts that /?rp — ^ /? a.s., c T < U^p", ^^p < KUrp", Srp2(M) e(s^2(M)I^t) I I^T < E S4^2(^)I-^T ' ^"^ ^'^ inequality similar to ones from Proposition E.2. To obtain uniform integrability, we need (E.6) to stay within our moment constraints. We have E .11.1 , ^(trE)^fW^-dp,VrpJ 1 1 -^eIc ^(tri:)^E Crp^(E(R~VT' M) E(R~^|A^)) f _i 1 _i, .-1
PAGE 94

87 11 1 9 ^T~V(trE)%|, U^" i ^ as c ^ (E.ll) and 1 1 1t~^(u| c^t) I -* 35 c ^ 0. (E.12) We begin with (E.ll). Arguing as in Appendix A, we obtain 11 1 2 11 1 2 c"^T"M(trS)2W^ Uy < TUj^((trE)^W^ U^ , KT-'( 0. To establish uniform integrability we apply 1 1,2 th e inequality (a^-b^j < |a-b|, to obtain 11 1.2 ^T~l((tri:)^w|, U|,) < TU:^^|(trE)W^ Urp| < K sup ^ X , .. > + sup (S„i/nk) \. (E.14) ln>nn S^lM n>n.^ "^ ' J Propositions E.l and E.2 can be applied here, with slight modification, to show that the r.h.s. of (E.14) is integrable. Dominated convergence completes the justification of (E.ll). To establish (E.12) we apply the stopping rule based inequality as in Appendix A. Identical argument gives us that

PAGE 95

88 1t.-1 c~'T < 2E(T ^) + 2E Tu;;.-(u T I'-'T-l -%)' (E.15) Dominated convergence can be applied to show that E(T ^) -+ 0, so it remains to consider the second term. Recall that S^j has the representation ^ Z-, where the Z.'s are i=l ' ^ conditionally independent. Using this, we have Un-l-Vj| (trE) kSjj_j^j((n-l)k+b-2) (nk+b-2) ^ + Zn(nk+b-2) ^ < (tr?; kSj^j((n-l)k+b-2) (nk+b-2) + Zn(nk+b-2) (E.16) Hence ^^n'^i^n-r'^4 < nf k((n-l)k+b-2) +ZnS^}j < 2n| k2((n-l)k+b-2) + Z^S'J -1 < K^n-l + K2(n-1)" z2(s^^/(n-l)) -2 (E.17) < K^n-1 + K2|(n-1)~^ J (RZj)n(RS„^/(n-l)) _2 < K^np 1 + kJ sup ( (n-1)"^ J (RZj)^ ) sup ( ( J RZj) /(n-1)) | [n>nQV i=2 /n>nQVM=2 'V J j (E.18)

PAGE 96

89 —9 2 P Equation (E.17) can be used to show that TUj (Urpj-Urp) ^ as c ^ 0, noting that the Zj (i = 2, 3,...) are identically distributed. The Schwarz inequality applied to (E.18) provides f -1" 21^ an integrable dominating function when (nf,-l)k > 9, since < (n-1) T" (RZ-) > and r -in ]-^ I i=2 ' i <(n-l) ^ RZ. > are backward submartingales. Thus (E.12) follows from (E.15) I >=2 'J (E.18) and so (3.2.17) is established.

PAGE 97

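APPENDIX F
DETAILS OF THEOREM 3.2

Here, we provide the details necessary to complete the arguments of Theorem 3.2. We need a preliminary lemma.

Lemma F.1: Suppose N is the Bayes stopping rule for the regression parameter estimation problem. Then there exists a number $B > 0$ such that

$$W_N \le BcN^2. \qquad \text{(F.1)}$$

Proof: Since N is Bayes, on the set $[N = n]$ we must have

$$E\bigl[E(\|\beta-\hat\beta_{n+k}\|^2\mid\mathcal{A}_{n+k})\mid\mathcal{A}_n\bigr] + c(n+k) \ge E\bigl[\|\beta-\hat\beta_n\|^2\mid\mathcal{A}_n\bigr] + cn. \qquad \text{(F.2)}$$

Recalling (3.2.7), (F.2) translates to

$$(\operatorname{tr}G_{n+k})E[R^{-1}\mid\mathcal{A}_n] + (\operatorname{tr}H_{n+k})E[V(M\mid\mathcal{A}_{n+k})\mid\mathcal{A}_n] + c(n+k) \ge (\operatorname{tr}G_n)E[R^{-1}\mid\mathcal{A}_n] + (\operatorname{tr}H_n)V(M\mid\mathcal{A}_n) + cn, \qquad \text{(F.3)}$$

where $\operatorname{tr}G_n = \operatorname{tr}(\lambda D^{-1}+M_n)^{-1}$ is decreasing in n, and $\operatorname{tr}H_n = \lambda^2\operatorname{tr}(G_nD^{-1}1_p1_p^TD^{-1}G_n)$.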

PAGE 98

fl n+k T,,-l, The remainder of the proof is identical to that of Lemma B.l, substituting J^ XV Xfor n+r' ' kE-1. We can now establish (3.2.19) (3.2.23). Note that (3.2.19) is immediate, since the integrand is a uniformly integrable martingale. With regard to (3.2.21), we have c-1n-2Wj^ < BWj^%j^ = B, (F.4) applying Lemma F.l; the result follows via the Dominated Convergence Theorem. For (3.2.22), we have c % 2v(M|Aj^) < BWJ^^E(m2|Aj^) < be(r|Aj^) e(m2|An) < B^ sup E(R|An)^<^ sup E(M^|An)^, ln>nQ J [n>nQ J (F.5) using Jensen's inequality and Lemma F.l. The fax:t that the r.h.s. of (F.15) is integrable follows from the cauchy Schwartz inequality and Doob's maximal inequality, since E(R |An) and E(M (An) form uniformly integrable martingales, and hence have integrable last elements. Equation (3.2.22) follows via the Dominated Convergence Theorem. Now, for (3.2.20), using (E.8), arguments in (E.12) and Jensen's inequality. 1 , 1 1 , 1 «="'(W?J Vn) = c"5(wij d^y^) + c'\6^ 1)Vn < B'NW^'{wi^ dj^V^) + B^NWr,5(d^ 1)V, 'N ^"N ^^''N < K 1 1 E Wn'(^n)~'[K^N2(m)IAn) SnsW An + Wj^^v^

PAGE 99

92 < K < K -1 'Nl .(N-l)p e(Sn2(^1)|An) + 1 ,^->"n rfVp E(Sn2(M)|An) + 1 n>nQU("-l)P (F.6) which is integrable, via Proposition E.2. Again, we can apply dominated convergence to obtain the desired result. To establish (3.2.23), note that for all c, < c"^iE e-^T +^T E e-e^l + cN = E -1 1/ ^ -^ ^ 2c 2(trE)^ W|,-e(r 2|a^) 1 1/1 2c ^(trE)2 W2j-e(r 2|Aj^,) + e[c ^T ^(trFrp trE)W J Erc~%"l(trFj^ tri;)Wj^Tl + e[c ^T~2A2tr(F^GF^)V(M|A^)] E[c"%~2A2tr(Fj^GFj^)V(M|Aj^)] + E 1 1 1 c'h-M (trE)%|, c^T E ,-Kt-1 1 1 1 = 0(1) E 1 1 1 c'^N"-' (trEj'W^ -c-'N c'^N"-' (trE)'W^ c as c — + 0, N since N is Bayes, and Theorem 3.1 holds. It follows that E 0(1) as c — > also; i.e., (3.2.43) holds, and thus Theorem 3.2 is proved (F.7) 1 1 1 c"%"M (trE)^W^ c^N

PAGE 100

APPENDIX G DETAILS OF THEOREM 3.3 Here, we establish that, under the conditions of Theorem 3.1, c"1/?-/?^ Atr|x:(p~^-C)s}/tr(i;) A2{tr(i;Gi;)/tr(i:)}E[RV(M|^, R)] as c ^ 0. (G.l) Proof : We first establish pointwise convergence of the integrand. Recall ^n = (Mn+AP ^) ^(Pn + AE(M|An)p-hp /5n = (Mn+AC) Pn, whereC = p ^ (iJd hp) D hplpP"'^ Then, recalling that Mn^Fn = ^n /?n-/?n (Mn + AC) \SnPn " (Mn + Ap-^) \m„0J( + Ap-lE(M|An)lp 93

PAGE 101

94 = jPn (Mn+AC) Vn^S' ^n + (Mn+AD"!) \\D~^)[pJ E(M|An)lp)| = A2|{(M„+Ap-1)~^D-1 (Mn+AC)"^c}(^S' E(M|AJlp)| , (G.2) since Clp = 0. We have, using (G.2), ''\'^T-hf = A W2 {(T-^M^+AT-^p-^) ^p"^ (T-^M^-AT^^C) ^c}(^^ E(M|A^)lp)l . (G.3) Now, c~^T~2 -^ (trE)" R, pl^ -^ 13, E(M|A-p) -^ E(M|^, R), and the matrix expression converges to E(p~ -C), a.s. as c —^ 0. Hence 1 1I ii2 1 A2(trE)" r|e(P-1-C)(^ E(M|^, R)lp)|' a.s. as c -+ 0. (G.4) To obtain an integrable dominating function, we note < KU;^%;^ E(M|A^)lp|^

PAGE 102

95 < K sup (s^^/nk + b 2) fMn^Pn E(M|An)lp| n>no = K sup (Sj^i/nk + b 2) [E^Mn^Pn Mlp|An -1 / < IC sup (Sj^i/nk + b 2) E |Mn^Pn Mlp| n>no An (G.5) The fact that the r.h.s. of (G.5) is integrable follows quickly from Proposition E.2. The dominated convergence theorem is thus applicable. It remains to show that A2(trS) r[e(P-1-C)(^ E(M|^, R)lp)|' = A tr|s(P"^-C)E}/tr(i;) '{tr(EGE)/tr(i;)}E[RV(M|^, R)]. (G.5) The argument is identical to that of Appendix C, involving B_, since 6 and /? have the same distributional forms. The proof is then complete.

PAGE 103

APPENDIX H DETAILS OF THEOREM 3.4 Here, we establish that, under the conditions of Theorem 3.1, c-1e w|0-pjl Atr(Ep~^E)/tr(i;) A2/tr(EGE)/tr(E)}E[RV(M|^, R)] as c -^ 0. (H.l) Proof : We first establish pointwise convergence of the integrand. Recall that §n = (Mn+Ap-l)"^(Mn^S' + Ap-llpE(M|An)) -1\ ^\T^-lflW , r-.. , xr>-l\ ^n-li = K (Mn+Ap-^) Ap-Vn + (Mn+AP"') Ap-^lpE(M|An). (H.2) Hence .-'\i-r,f c ^|(M^+AP"^) AP"^^;^ (Mrp+AD"^) ApipE(M|A^) A^c"^T"2 -1 T~%^+AT~^P"^) P'^^f^T " E(M|A^)lp) . (H.3) 96

PAGE 104

97 Now, c~^T"2 -1 (T~%rp+AT"lp"^) ^p ^ ED"^ (trE) R, pl^ -^ p, E(M|A^) — E(M|^, R) and a.s. as c — 0. It follows that 11^, ^wl|2 ,2,..^,-! ^T-^xt ^ ^^^^'?) ^|?P"^(^ E(M|^, R)lp)|' a.s. as c — + 0. (H.4) A comparison of (G.3) and (H.3) shows that we can use the same dominating function here as there. It remains to show that E A2(trE) r|eD"1(^ E(M|/?, R)lp)l = Atr(Ep~lE)/tr(?) A2|tr(EGE)/tr(E)}E[RV(M|^, R)]. (H.5) Again, the argument follows that of the case for 6 of Appendix D. The proof is then complete.

PAGE 105

BIBLIOGRAPHY

Alvo, M. (1977). Bayesian sequential estimation. Ann. Statist., 5, 955-968.

Bickel, P., and Yahav, J. (1967). Asymptotically pointwise optimal procedures in sequential analysis. Proc. 5th Berk. Symp., VI, 401-413. Univ. of California Press, Berkeley.

Bickel, P., and Yahav, J. (1968). Asymptotically optimal Bayes and minimax procedure in sequential estimation. Ann. Math. Statist., 39, 442-456.

Bickel, P., and Yahav, J. (1969a). An A.P.O. rule in sequential estimation with quadratic loss. Ann. Math. Statist., 40, 417-426.

Bickel, P., and Yahav, J. (1969b). Some contributions to the asymptotic theory of Bayes solutions. Z. Wahr. Ver. und Gebiete, 11, 257-276.

Chow, Y.S., Robbins, H., and Siegmund, D. (1970). Great Expectations: The Theory of Optimal Stopping. Houghton-Mifflin, Boston.

Chow, Y.S., and Teicher, H. (1978). Probability Theory: Independence, Interchangeability, Martingales. Springer Verlag, New York.

Chung, K.L. (1951). The strong law of large numbers. Proc. Second Berkeley Symp., 341-352. Univ. of California Press, Berkeley.

Finster, M. (1983). A frequentist approach to sequential estimation in the general linear model. J. Amer. Statist. Assoc., 78, 403-407.

Finster, M.P. (1987). A frequentistic and Bayesian analysis of Zellner's economic regression model under an informative prior. Sequential Analysis, 6, 139-153.

Ghosh, M., Nickerson, D., and Sen, P.K. (1987). Sequential shrinkage estimation. Ann. Statist., 15, 817-829.

Ghosh, M., Sinha, B.K., and Mukhopadhyay, N. (1976). Multivariate sequential point estimation. J. Multivariate Anal., 6, 281-294.

Hall, P. (1981). Asymptotic theory of triple sampling for sequential estimation of a mean. The Annals of Statistics, 9, 1229-1238.

PAGE 106

Lindley, D.V., and Smith, A.F.M. (1972). Bayes estimates for the linear model (with discussion). J. R. Statist. Soc. B, 1-41.

Martinsek, A.T. (1987). Empirical Bayes methods in sequential estimation. Sequential Analysis, 6, 119-137.

Rao, C.R. (1973). Linear Statistical Inference and its Applications. 2nd Edition. Wiley, New York.

Rehalia, M.E.H. (1984). Asymptotic sequential analysis on the A.P.O. rule performance. Sequential Analysis, 3, 155-174.

Searle, S.R. (1971). Linear Models. Wiley, New York.

Woodroofe, M. (1981). A.P.O. rules are asymptotically non-deficient for estimation with squared error loss. Z. Wahr. und Verw. Gebiete, 58, 331-341.

PAGE 107

BIOGRAPHICAL SKETCH

Robert Michael Hoekstra was born on the 18th of December, 1954, in Lake City, Florida. His family moved to Charlottesville, Virginia, in 1960, to Alexandria, Virginia, in 1965, to Lima, Peru, in 1966, to Rome, Italy, in 1967, and back to Alexandria in 1968. There he completed high school, and enrolled at the University of Virginia in 1972. He graduated with a Bachelor of Science in Applied Mathematics from the School of Engineering in 1981, having taken time off for a four-year enlistment in the U.S. Army. He then enrolled at the University of North Carolina at Chapel Hill, and received a Master of Science in Statistics in 1983. In the fall of 1983 he enrolled at the University of Florida to pursue a doctorate in statistics. Upon graduation, he will join the Department of Mathematics of the University of North Carolina at Charlotte, as an assistant professor.

PAGE 108

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Malay Ghosh, Chair
Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Dennis Wackerly, Cochair
Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Ronald Randles
Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Joseph Glover
Professor of Mathematics

This dissertation was submitted to the Graduate Faculty of the Department of Statistics in the College of Liberal Arts and Sciences and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

August, 1989

Dean, Graduate School

