Group Title: multivariate one-way classification model with random effects
Title: The multivariate one-way classification model with random effects
CITATION PDF VIEWER THUMBNAILS PAGE IMAGE ZOOMABLE
Full Citation
STANDARD VIEW MARC VIEW
Permanent Link: http://ufdc.ufl.edu/UF00099371/00001
 Material Information
Title: The multivariate one-way classification model with random effects
Physical Description: vii, 110 leaves : ; 28 cm.
Language: English
Creator: Schott, James R., 1955-
Copyright Date: 1981
 Subjects
Subject: Multivariate analysis   ( lcsh )
Analysis of variance   ( lcsh )
Analysis of covariance   ( lcsh )
Matrices   ( lcsh )
Statistics thesis Ph. D
Dissertations, Academic -- Statistics -- UF
Genre: bibliography   ( marcgt )
non-fiction   ( marcgt )
 Notes
Thesis: Thesis (Ph. D.)--University of Florida, 1981.
Bibliography: Bibliography: leaves 108-109.
General Note: Typescript.
General Note: Vita.
Statement of Responsibility: by James Robert Schott.
 Record Information
Bibliographic ID: UF00099371
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
Resource Identifier: alephbibnum - 000296339
oclc - 08075798
notis - ABS2701

Downloads

This item has the following downloads:

PDF ( 4 MBs ) ( PDF )


Full Text


/ii
I /
/ /






THE MULTIVARIATE ONE-WAY CLASSIFICATION
MODEL WITH RANDOM EFFECTS










BY

JAMES ROBERT SCHOTT


A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL
OF THE UNIVERSITY OF FLORIDA IN
PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF DOCTOR OF PHILOSOPHY



UNIVERSITY OF FLORIDA


1981

































To Susan

and

My Parents



















ACKNOWLEDGMENTS


I would like to express my deepest appreciation to

Dr. John Saw for suggesting this topic to me and for

constantly providing guidance and assistance. He has made

this project not only a very rewarding educational expe-

rience but also an enjoyable one. I also wish to thank

Dr. Andre Khuri Dr. Mark Yang, and Dr. Dennis Wackerly for

their willingness to provide help when called upon.

Finally, a special thanks goes to Mrs. Edna Larrick

who has turned a somewhat unreadable rough draft into

a nicely typed manuscript.















TABLE OF CONTENTS


Page

ACKNOWLEDGMENTS . . . . . . . .. .. iii

ABSTRACT . . . . . . . . ... . . vi

CHAPTER
1 INTRODUCTION . . . . . . . . 1

1.1 The Random Effects Model, Scalar Case 1
1.2 The Multivariate Random Effects Model 11
1.3 Notation . . . . . . . .. 14

2 MAXIMIZATION OF THE LIKELIHOOD FUNCTION
FOR GENERAL Z . . . . . . .. 17

2.1 The Likelihood Function . . . .. .17
2.2 Some Lemmas . . . . . ... 27
2.3 The Maximum Likelihood Estimates . .. 40
2.4 The Likelihood Ratio Test . . .. .46
th
3 PROPERTIES OF THE s LARGEST ROOT TEST . 51

3.1 Introduction . . . . . . .. 51
3.2 The Uniformly Most Powerful Test
for m = 1 . . . . . . . 51
3.3 An Invariance Property . . . . 55
3.4 The Union-Intersection Principle . .. 56
3.5 A Monotonicity Property of the
Power Function . . . . . .. 60
3.6 The Limiting Distribution of s . .. 66

4 MAXIMIZATION OF THE LIKELIHOOD FUNCTION
WHEN Z = o21 . . . . . . .. 75

4.1 The Likelihood Function . . . .. .75
4.2 The Maximum Likelihood Estimates . .. 79
4.3 The Likelihood Ratio Test. . . .. .88

5 AN ALTERNATIVE TEST WHEN Z = o2I
AND ITS PROPERTIES . . . . . .. .94

5.1 Introduction . . . . . . .. 94
5.2 An Invariance Property . . . .. .95











TABLE OF CONTENTS (Continued)


CHAPTER Page

5 (Continued)

5.3 A Monotonicity Property of the
Power Function . . . . ... 96
m
5.4 The Limiting Distribution of Z 101
i=s


BIBLIOGRAPHY . . . . . . . . . 108

BIOGRAPHICAL SKETCH . . . . . . . ... 110















Abstract of Dissertation Presented to the Graduate Council
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy


THE MULTIVARIATE ONE-WAY CLASSIFICATION
MODEL WITH RANDOM EFFECTS
By

James Robert Schott

August 1981

Chairman: Dr. John G. Saw
Major Department: Statistics

A well-known model in univariate statistical analysis

is the one-way random effects model. In this paper we

investigate the multivariate generalization of this model,

that is, the multivariate one-way random effects model.

Two specific situations, regarding the structure of

the variance-covariance matrix of the random error vectors,

are considered. In the first and most general case, it is

only assumed that this variance-covariance matrix is sym-

metric and positive definite. In the second case, it is

assumed, in addition, that the variance-covariance matrix

is a scalar multiple of the identity matrix.

Maximum likelihood estimates are obtained and the

likelihood ratio test for a hypothesis test on the rank

of the variance-covariance matrix of the random effect

vectors is derived. Properties of the likelihood ratio










test are investigated for the general case, while for the

second case an alternative test is developed and its prop-

erties are investigated. In each case a sequential proce-

dure for determining the rank of the variance-covariance

matrix of the random effect vectors is presented.














CHAPTER 1


INTRODUCTION



1.1 The Random Effects Model, Scalar Case

Suppose a physician is considering administering some

particular blood test to his patients as a part of their

physical examination. He suspects that the test results

vary with the presence and severity of a particular path-

ological condition. In order to examine variability in

the results of the blood test, the physician chooses to

administer the blood test n times to each of g patients.

This results in the observations x..: i = 1,2,...,g;
13
j = 1,2,...,n.

A suitable model to explain the different values of

x : i = 1,2,...,g; j = 1,2,...,n would be


X.. = V + ai + i... (1.1.1)

th
Here 9 is an overall mean, a. is an effect due to the i

patient, and z.. represents a random error due to the

measuring process. We assume that z..: i = 1,2,...,g;

j = 1,2,...,n are independent and have a normal distribu-
2
tion with mean zero and variance a 2
Z










If the physician is interested in using the blood test

as a diagnostic tool, he will certainly be interested to know

whether a major source of variation in the results of the

blood test is due to variation between the patients. Since

the physician will administer the test to an unlimited num-

ber of patients in the future, we should properly regard the

g patients involved as a sample from the entire population

of patients. The patient effects, a : i = 1,2,...,g, now

have the role of random variables, and (1.1.1) is a random

effects model. Again we assume that ai: i = 1,2,...,g are

independent and have a normal distribution with mean zero
2
and variance a Thus, from our model (1.1.1) we deduce that

x.. has a normal distribution with mean p and variance
2 2
a + G .
a z
The variation in the results of the blood test is

governed by a2 + a The portion of this attributable to
a z
2 2 2
the patients is, of course, oa/(Oa+o ), and the physician

would like to know whether this or, correspondingly,
22 22
a /a is sizeable. If 0 /2 is sufficiently large, he would
a z a z
choose to investigate the possible use of this test as a

means of detecting the pathological condition; otherwise

he would find the blood test essentially useless as a diag-

nostic tool. Hence, the physician might be interested in
2
testing the hypothesis Ho: o = 0 against the hypothesis

H1: 2 > 0.
1 a










In order to derive the likelihood ratio test for testing

the hypothesis H against H we first need to obtain the
2 2
likelihood function of (p,o ,o ). This is most easily done

by making a transformation. Let C be an orthogonal matrix,

with the element in the ith row and the jth column denoted by

cij, such that Clj = 1//n : j = 1,2,...,n. Since C is

orthogonal,
n n
E ckj = /n Z clj ckj = 0 for k = 2,3,...,n.
j=1 j=1 (1.1.2)

Consider the orthogonal transformation

(Yil'Yi2'' Yin)' = C(xilxi2... xin) (1.1.3)

Upon replacing xij by the right side of (1.1.1) and using

(1.1.2), we observe that
n
yil = E xik/ = / i.'
k=l
n
Yij = k cjk zik for j = 2,3,...,n,

n
where x. = Z xik/n. Thus,
S k=l
Cov(ij,yik) = 0 for j # k,

V (Yij) = -2 for j = 2,3,...,n,
ij~ z

and {xi ,Yi2,Yi3, "'in =1 is a set of gn mutually

independent random variables, where x. has a normal dis-
ti.
2 2
tribution with mean p and variance a /n + o and yij
z aij











has a normal distribution with mean zero and variance o .
z
n 2 n 2
Note also from (1.1.3) that (x..-x. ) = y and
j=l i j=2 1

denote this quantity by u.. We can now write the joint

density function of yi2,yi3,...,yin as

n
2 2 2 2
f(yi2',yi3". in;o) = I(2-Ta) exp[-y /20
j=2

2 -(n-l) n 2 2
= (22a )-(n- exp[- y /2 2]
j=2

n 2 2 2
= ( Z y j;oz) = g(ui;oz)
j=2

so that from the set {x. ,y.i2,yi3 .... in}, (xi.,ui)


2
is sufficient for (i,oz). Thus, we may assume that we have,

independently, u. and x. for i = 1,2,...,n, where ui/oa

has a chi-square distribution with v = n-i degrees of

freedom, and x. has a normal distribution with mean p and
i.
2 2
variance o /n + Note that with
z a
g
x = x. /g,
.. i=1 .

g 2 g- 2 2
S(x. -I) = Z (xi.-x ) + g(x -)
i=l i= 1

2 2 2
Then putting a = z /n+a we can write the joint density

function of x1.,x2..... x as









2 2 2 2
f(xl ,x2 ...,x ;X 2) = = (2T2) exp[-(xi.- ) /22]
1.2.i=l

2 -5g g 2 2
= (2ro)g exp[- X (xi. -) /20]
i=l1

= 2-g 2 2 2
= (2n2)g exp[-( E (x. -x ) +g(x -w) /2o2]
i=l
= g(x ,v;p,o ) ,

g 2
where v = n E (x. -x ) Hence, from the set
i=l
2 2
{x ,x2.,...,x }, (x ,v) is sufficient for (,a z/n + a )

Also, if we let c denote a constant, we can write the joint

density function of ul,u2,...,u as

2 2g 2 u-1 2 v
f(ulu2 '"'... ;z) = c exp (-ui/2a )u /(o
i=l







where u =ii u.. Thus, from the set {ul,u2,...,u }, u is
2-gv g 2 g v-l
= (a) z expl(- Z u./2o) Hlcu.
i=l i=l





2
where u Z u. Thus, from the set fulu,. .,u 1 u is

sufficient for o .
z
We may now assume that we have, independently, x ,

u, and v, where x has a normal distribution with mean
2 2 2
p and variance (a +no )/gn; u/o has a chi-square distri-
z a 2
2 2
bution with e = g(n-l) degrees of freedom, and v/(a +noa)

has a chi-square distribution with h = g-l degrees of
2 2
freedom. The likelihood function of (u,,a o ) can be

expressed as









2 2 2 e-l 2
exp[-(x -p) gn/2(o +nac2) u e-lexp[-u/2a ]
f ( ,u,v) = 2 z2 ae
S(2T(ao +na )/gn) (20 ) e (e)


v h-l exp[-v/2(o2+no2)

(2(o2 +n2 ))hF(h)
z a

Let the set = {(a2): a = 0} and the set
2 2 2
S= {(oa ) : o > 0}. We seek the maximum likelihood
2 2
estimators, i and 8 of v and a when the parameters
( Z Z
are restricted to lie within w, and the maximum likelihood
^2 ^2 2 2
estimators, u, oz and &2 of 2, oz, and a2

when the parameters are restricted to lie within Q.
2
In ao = 0 and
2 2 e-l 2
exp[-(x --) gn/2az] u exp[-u/2a i
f(x ,u,v) = (2- T/gn (2Y2z)eF (e)
(2iro /gn) (2a2) (e)
z z

v h-1 exp[-v/2o2]
zz
(202) h (h)

so that the logarithm of the likelihood function, omitting

a function of the observations, is

(x..-P) gn u+v (e+h+l) n2
in c(1.1.4)
2a2 22 2 z
z z
2
Differentiating (1.1.4) with respect to p and oz, we

obtain, respectively, the equations
2
(x..-p)gn (x ) gn+u+v e+h+l
2 2' 2 2 0,
z 2( ) 22z










which yield the maximum likelihood solutions

11 = x..

2
82 = (u+v)/(e+h+l).
z0

In S the logarithm of the likelihood function, omitting

a function of the observations, is
2
.. -) gn u e 2 v (h+l) 2 2
_____ - n a 2 2 Zn(o +n ).
2 2+a 22 2 z 2(a 2+2 z
2(a +no ) 20 2( +no )
z a z z a
(1.1.5)
2 2
Differentiation of (1.1.5) with respect to i, ao, and a

yields, respectively,

(x -j)gn
-2 = 0
(0 +na )
z a
2
n((x -j) gn+v) n(h+
2 2 n(h+
2( 2+no2 ) 2( 2+na )
z a z a


(x -11) gn+v u e (h+1)
2 22 + 22 2 2 2 0.
2(a +na ) 2(a ) 20 2(0 +nn )
z a z z z a
2 2
Solving these equations for J, a and a we obtain the
Z' C1'
maximal solution of the likelihood function in 0, (-,2 ,
2
-2




o = u/e = u,,
2


2 = (v/(h+l)-u/e)
a = (v/(h+l)-u/e) /n = (v*-u,)/n,











where u, = u/e and v, = v/(h+l). Since we insist that
-
a2 be greater than or equal to zero, the solution above

is the maximum likelihood solution only if v,-u, 1 0.

Suppose, however, that v, < u*. Clearly (1.1.5) is still

maximized when V = x so that we need to minimize

U + e Zn o2 + + (h+l) in(a2+nG2)
2 z (2 +n2 za
z z a
2 2
subject to the contraints 2 > 0 and a2 0. Equivalently,
Z a
we consider the problem of minimizing

i(x,t) = u/x + e in x + v/t + (h+l) Zn t

subject to the constraint t x>0. For fixed x W(x,t) is

concave upward in t with its absolute minimum at t = v,.

For each x i(x,t) is, therefore, minimized with respect to

t a x when

rv, if v, 2 x,
t =
S if v, < x.

Thus, p(x,t) is minimized over {(t,x): t > x > 0} by

setting

t = v, and x = u, if v, ? u,,

t = x = (u+v)/(e+h+l) if v, < u,.

Hence, for the maximum likelihood estimators when the

parameters are restricted to be within Q, we obtain

S= x
^2
Gz2 = u*,


S2
8a = (v,-u,)/n,










if v, > u,, and
=

2 =

2
a =


x(

(u+v) / (e+h+l),


if v, < u,.

Substituting the maximum likelihood estimates into

the likelihood function, we see that in w


max f(x ,u,v) = ue- vh- exp[-(e+h+l)/2]
W u+v (e+h+l) 2(e+h+l)
r (-e) F (h) (e+h+ )( l) (i/gn) 2

and in Q


ue- vh-1 exp
u
F (e)F (h) ()

max f(x ,u,v)=

max f(x ,u,v)

The likelihood ratio, A, is


max f(x ,u,v)
max f(R ,u,v)


[-(e+h+l)/2]
e (h+l)
(E-1) (i/gn)


(u/e) e(v/ (h+l)) (h+l)
[(u+v)/(e+h+l) (e+h+l)



1


2 (e+h+l)

if v, 2 u*,

if v, < u*.





if v, 2 u.,




if v* < u,.


Now putting w = u/(u+v) and noting that

v, 2 u, if and only if (iff)

v u
h+ -e iff

u+v > 1 1
Su(- + ) iff
h+l e h+1

e u
e+h+ u+v w










we can rewrite the likelihood ratio

(e+hl)(e+h+l) (e(_w) (h+l) if w e
ee (h+)l) w (w) if w e+h+l'
e (h+1)



1 if w > e
e+h+l

Since X is an increasing function of w, and Ho is rejected

for small values of X, it follows that H is rejected for

small values of w or large values of 1/w. Now

1 = u+v = + = + h v/h
w u u e u/e

so the likelihood ratio test rejects Ho for ev/hu large.

Recall that u/C has a chi-square distribution with
z
2 2
e degrees of freedom, and v/(a +na ) has a chi-square dis-
z a
tribution with h degrees of freedom, independent of u.
2 2 2
Hence, the quantity a ev/(a +na )hu has an F distribution
z z a
with h and e degrees of freedom. If we let F(h,e,a) denote

the constant for which P(F(h,e) F(h,e,a)) = a where F(h,e)

has an F distribution with h and e degrees of freedom, then

we will reject Ho if ev/hu 2 F(h,e,a). The power function
2 2
of this test is a function of = a /az and is given by

B(e) = P(F(h,e) 2 F(h,e,a)/(l+n9)).

Although the analysis which we have just outlined is,

by now, quite standard to any graduate level course in design

and analysis, we have reproduced it since it motivates the

more general problem to be described in the next section.










Indeed the situation we wish to consider contains the one-

way random effects model as a special case to which we can

return on occasion to check our work.


1.2 The Multivariate Random Effects Model

Suppose a physician is considering administering a bat-

tery of m distinct types of blood tests to his patients as

a part of their physical examination. He believes that,

based on the results of these tests, he may be able to detect

any one of several particular pathological conditions. In

order to examine variability in the results of the blood

tests, the physician chooses to administer the battery of

blood tests n times to each of g patients. This results in

the observations x.. (mxl): i = 1,2,...,g; j = 1,2,...,n.

A suitable model to explain the different values of

x..: i = 1,2,...,g; j = 1,2,... ,n would be

.ij = + a + j. (1.2.1)
-13 -1 -1

Here j(mxl) is an overall mean, (mxl) is an effect due to

the ith patient, and z. (mxl) represents a vector of random

errors due to the measuring process. We assume that

z: i = 1,2,...,g; j = 1,2,...,n are independent and have an

m-variate normal distribution with mean 0 and variance-covar-

iance matrix Z.

Since the physician will administer the tests to an

unlimited number of patients in the future, we should prop-

erly regard the g patients involved as a sample from the










entire population of patients. The patient effects,

a.: i = 1,2,...,g, now have the role of random vectors, and
--1
(1.2.1) is a multivariate random effects model. We will

assume that ai: i = 1,2,...,g are independent and have an

m-variate normal distribution with mean 0 and variance-

covariance matrix A. Hence, from our model (1.2.1) we see

that x.. has an m-variate normal distribution with mean y
-13
and variance-covariance matrix A + X.

While there are m different blood tests, it is believed

that there are some groups of tests for which the tests within

a group vary quite strongly together. In other words, the

data from some of the tests are highly correlated. For this

reason the number of sources of variation between the patients,

which we will denote by p, may be less than the number of tests,

m. That is, the rank of the variance-covariance matrix A is

p where p : m. Since A is symmetric, nonnegative definite,

and of rank p, there exists a matrix L(mxp) such that A = LL'.

Clearly L is not unique since if A = LL' and P(pxp) is such

that PP' = I, then A = L,L* where L = LP. This enables us to

rewrite (1.2.1) as

x. = + Lf. + z. (1.2.2)


where f (pxl): i = 1,2,...,g are independently distributed,

having a p-variate normal distribution with mean 0 and

variance-covariance matrix equal to the identity matrix.










If the physician is interested in using the blood tests

as a diagnostic tool, he will certainly be interested in deter-

mining the value p, since the p sources of variation may

correspond to p different pathological .disorders. So of

particular interest to the physician is a test of the hypoth-

esis H( : the rank of the matrix LL' < s-1 against the
0
hypothesis H(s) the rank of the matrix LL' = s. With such a

test procedure he could develop a sequential test procedure

for determining the rank of LL'. He would first test H()
0
(m) (m)
against H and if he rejects H he would stop and take

the rank of LL' to be m; otherwise, he would proceed to test
(m-1) (m-l)
H against H The procedure continues until either

some hypothesis H( is rejected, in which case he then takes
(1)
the rank of LL' to be s, or the hypothesis H is accepted,
0
in which case he would conclude that there is no significant

variation between patients.

In this paper we investigate the multivariate one-way

classification model with random effects, given by (1.2.2).

Two specific cases, regarding the structure of the variance-

covariance matrix E, will be considered. In the first and

most general case we will assume no more than that E is sym-

metric and positive definite. In the second case we will

assume that the vector of random errors, z.., is such that
--1]
its components are independent and have the same variance.

That is, we assume that Z is equal to some constant multiple

of the identity matrix. In each case we develop a test











procedure for testing the hypothesis H(S: the rank of
o
LL' < s-1 against the hypothesis Hs): the rank of LL' = s.
1
In addition, we investigate some of the properties of these

test procedures and present a numerical example to illustrate

the use of these procedures.


1.3 Notation

The following

Notation

(A) .

1,
(A)



a..



A-1

A'

IAI

tr A

dg(A)





diag(al,a2,...,am)



chi (A)


notation will be used whenever convenient:

Interpretation

row i of the matrix A

column j of the matrix A

the element in row i and column j

of the matrix A

the element in row i and column j

of the matrix A

the inverse of the matrix A

the transpose of the matrix A

the determinant of the matrix A

the trace of the matrix A

the diagonal matrix which has as its

diagonal elements the diagonal

elements of A

the diagonal matrix which has al,

a2'../aa as its diagonal elements

the ith largest latent root of the

matrix A










Notation Interpretation

rank (A) the rank of the matrix A

Im the m x m identity matrix

I the identity matrix (used when the

order of the matrix is obvious)

(0) the matrix which has all of its

elements equal to zero

x a vector

x. the ith element of the vector x

0 the vector which has all of its

elements equal to zero

E(x) the expected value of x

V(x) the variance of x

Cov(x,y) the covariance of x and y

P(A) the probability of event A

P(AIB) the probability of event A given

event B

S(x) the gamma function

x --> x n converges to x in distribution

a --> a convergence of a sequence of constants

exp(x) Euler's constant, "e," raised to the

x power

E is contained in

is distributed as

N(p,o2) the normal distribution with mean
Sand variance
v and variance o










Notation

N (_, E)


V1
F
V2




W (E,v,0)





Jones [1973]



Jones [1973:1]


Interpretation

the m-variate normal distribution with

mean p and variance-covariance matrix Z

the central chi-square distribution with

v degrees of freedom

the central F distribution with vl numer-

ator degrees of freedom and v2 denomina-

tor degrees of freedom

the central Wishart distribution with

variance-covariance matrix Z and degrees

of freedom v

the reference authored by Jones and

published in 1973

page 1 of the reference authored by Jones

and published in 1973














CHAPTER 2


MAXIMIZATION OF THE LIKELIHOOD FUNCTION
FOR GENERAL Z



2.1 The Likelihood Function

Suppose the vectors x. (mxl): i = 1,2,...,g; j = 1,2,

...,n can be modeled by

ix. = E + Lf + (2.1.1)


wherein Q(mxl) is a fixed but unknown vector, L(mxp) is

a fixed but unknown matrix, f. N (0,I): i = 1, 2,...,g,

and z.. ~ N (0,Z): i = 1,2,...,g; j = 1,2,...,n. We assume

that the set of random vectors {f ,fl .... g'zll .. '/Zgi

are mutually independent. Thus, xij Nm (,V) with V =

LL' + Z. However, for any orthogonal matrix P(pxp), V =

LL' + Z = LP(LP)' + Z so that L is not unique whereas LL'

is unique. The purpose of this section is to derive the like-

lihood function for v, LL', and E. Although xij and xkz are

independent for all (j,Z) when i # k, x.j and x are not

independent even when j # I, since Cov(xij,xi ) = LL'(j#).

Thus, the likelihood function is not simply the product of

the density functions of the x.i's. A transformation of the

x. 's will expedite the derivation of the likelihood function.
--D









Consider the Helmert transformation (see, for example,

Kendall and Stuart [1963:250]) given below:

il= xi. + (2*1)-yil + (3"2)-yi2 + "'" + (n(n-l))-yi ,
x = x. (2l)- i + (3-2)-y + + (n(n-l)) yi
-i2 --. il i2
x. 2 (3"2) i 2 -yi
x = 2(3*2) + *** + (n(n-1)) ,

-i3 -i. i2




x. = x. (n-1) (n(n-l))-yi,
-in -i. i'

where v = n-l. It will be helpful to note that the above

equations imply the following:

- -i -i -i
x n x il + n-i2 + ... + n xin'


Yil = 2-il 2-xi2'

Yi2= (3.2)- x + (3*2)xi2 2(3-2)-xi3'



y = (n(n-l)) x-i + *+ (n(n-l)) -i,n (n-1)(n(n-1)) -xin

In matrix formulation we have

(Xil ... in)' = H(xi.'Yil ...'. Yiv) '

and we note that, while not an orthogonal matrix, the columns

of H are orthogonal. The matrix H fails to be orthogonal

since H'H = diag(n,l,l,...,l). Observe that, upon replacing

xij by the right side of (2.1.1), we have









x. = I + Lfi + zi.,
-1. -1..

Yil = (2) (z il-i2)

i2 = (32) (z +z 2-2z ),
i2 i -12 -3



Yiv = {n(n-l)} (zil+i2 ..+i,n- (n-l)zin.

Thus

E(xi.Y j) = (0),

E(YijY q) = (0) if j # q,

E(yij.Yij) = .

Hence, it follows that {x 'Yil"'i are a set of

gn mutually independent vectors with x ~N (p,(1/n)Z+LL'):

i = 1,2,...,g and yij Nm(0,'): i = 1,2,...,g; j = 1,2,...,n.
n v
Note also that Z (xi-x ) (x -x. )' = Yijj and denote
j=l -'1 ij2 -i. j=l
this matrix by Ei. We can now write the joint density function

of Yil... 'Yi as

f(Yil .... Yi;Z) = 1l 27ZI-exp[-yi E yij]
j=l l
= 12ZI-vexp[-j Z ( z-yij)]


j=l
= 2rZl-vexp[-h E tr(y!.j yj )]
j=1 L










2Z-vv exp[- -1
= 12 exp[- EZ tr(Z Yij Y)]
j=l
j-1
= 27irZ exp[- tr E.]

= g(Ei;z)

so that from the set {x. ,Yil',i2 ..... yi, (E i'i.) is

sufficient.

Thus, we may assume that we have, independently,

E. Wr ( ,' 0)
x1 i r g.

x N (1 Z + LL')

Note that

9 _
g g
i.l -- -- .... ..

where x = Z x. /g. Then putting W = (l/n)Z + LL',
--.. -1.
i=l

we can write the joint density function of x ,x ,...,x as
1. 2. -g.

S-1 -
f(x ,x ... ;,W) = 2exW- exp[- (x -i_)'W (x. --
1. 2 .. g. i. i.

ii=

9 -
= 2zrW I-exp[- Z tr(W (x. -) (x -i))]
i=l 1i. i
i=1

4 9 -1-

= J2ffW-gexp[-j tr(W- (xi.-_) (xi.-)')]
i=i=l
= |2nW| exp[- tr(W Z (x. v) (x -v)')]
i-I 1-






21





-1g
= j27yWI -hexp[-Ig tr W 1 (x -vi) (x -ii)'



-' tr(W Z (x. -x ) (x. -x )' )]
i=l1

= 2lTW -gexp[-g tr W-(x -i) (x -)' ]


-1 g
x exp[- tr W E (x. -x ) (x. -x }']
i=l 1

= g(x ,H;jp,W),

g
where H = n E (x. -x ) (x. -x )' Hence, from the set
i=l

{xl. ... ,x }, (x ,H) is sufficient for (_, (l/n)E + LL' .

Also if we let c denote a constant, we can write the joint

density function of E1,...,E as


f(E1,...,E ;; ) = c IE 2(v-m-l)exp[- tr(EZE)]
S i=l


= c exp[- tr(Z- E) E. (-m-1)
i=l i=l

= gl (E; z)g92(E1,E2" ..E ) ,

g
where E = Z E.. Thus, from the set {E1,...,E 1, E is
i=l 1 g

sufficient for Z.

Then we may assume that we have, independently,

x.. ~ (g, (E+nLL' )),

E ~W (E,e,0),

H~ W (Z+nLL' ,h,0),
m










where e = g(n-l) and h = g-l. The problem is to estimate

j_, Z, and LL' or, equivalently, to estimate H, Z, and M where

M = nLL'. Recall that L is not uniquely defined so that if

LL' is an estimate of LL', then any L, such that iL' = LL',

is an estimate of L. The likelihood function of (_,Z,M) can

be expressed as

Km(I,e)Km (I,h)
f(x ,E,H) 2j (Ie)K mh) e H (h-m-) E \(e-m-)
I-(E+M)| I Z+Mllx 1 2e
gn

x exp[-(x -i) ((+M)) (M --1E)-1tr(Z -E)-tr (Z+M) H] ,
gn

weeK1 2 fm(m-l)
where K (I,v) 2 m(m-l) ((v-j+l)).
m j=1

The logarithm of the likelihood function, omitting a function

of the observations, is

-trZ E-e n[ Z -tr (s+M) 1H-h Zn I +MI

-l -
ZnlZ+MI (x /-)' ((/gn) (Z+M)) (x -H).

We seek the solution, (_, ,,M), which maximizes the equation

above, or equivalently, the solution which minimizes

tr Z-1E~e E-an + tr(Z+M)-1H+(h+l)ZnlZ+M|

-1 -
+ (x -Hj) '((l/gn) (Z+M))- (x -H). (2.1.2)

Before we can minimize the above equation, we need some results

on differentiation. Let W(mxm), X(mxm), and Y(mxm) be sym-

metric matrices, and let z(mxl) and a(mxl) be vectors. The

proof of the first result can be found in Graybill [1969:267].









Lemma 2.1.1:

3an|X| -1 -1
anlX = 2X dg(X ).


Lemma 2.1.2:

aZnlX+Y = 2(X+Y) dg((X+Y)-).
ax

Proof: Let V = X + Y. Then

3an x+Yl 3 anlv| av p n k V
ax. ax.. lDpaqm v x < ;v.
13 13 pq 13 13

so nX+Y ZV = 2V-l-dg(V-l) = 2(X+Y)-l-dg((X+Y)-).

Lemma 2.1.3:
-1
Str(X+Y) W -1 -1 -1 -1
X = -2(X+Y) W(X+Y) +dg((X+Y) W(X+Y)) .


Proof: Let V = X + Y. Note that


3X .= (0) = V + (v- 1 )
ID \1x131 a -i
"ij ij/
-1
so that =- V -V .

Then atr(X+Y) -1W Dtrv W = av
Then ^= tr W
ax.. 3x.- 3x.i


S- tr V' 1 V V-w


m m av
=- ZE (V WV-1) n
p=l q=l qp axij










S-(V WV 1)ji (V WV -). if i j,


(V-lwv-1)ii if i =
11

2 (V WV- ). if i = j.





-(V- WV-1) i if i j.

-1
Hence, 3tr(X+Y) W = -2V-1WV-1 + dg(V-1WV-1

= -2(X+Y)- W(X+Y)-1 + dg((X+Y)-1W(X+Y)-)


Lemma 2.1.4:

3(z-a)'W(z-a)
8z = 2W(z-a).

3 (z-a)' W(z-a) m m
Proof: -a ) (z -aq)w
1z z =i q=l

m m
= Z ( -a )w. + E (z -a)w
q=1 p=l


= 2 Z ( -a )wiq = 2(W). (z-a)
q=l qq i
S(z-a)' W(z-a)
so that z = 2W(z-a).

If we ignore the constraints that 2 is positive definite

and M is nonnegative definite and seek the stationary values

of (2.1.2) over all possible (j,Z,M), we find, upon taking the

partial derivatives of (2.1.2) with respect to E, M, and

and setting them equal to zero, that










e- h+1 -1 -1 -1_ (+M)H(E+M)-1
eX +(h+l) (Z+M) 1 Z HZ -(-+M) H(Z+M)

gn(X+M)1 (x ) (x -1)' (E+M)1 = (0),

(h+l)(E+M)-1-(E+M)- H(E+M)- gn(+M)- (x -. ) (x -_)' (+M)-


= (0),

gn(Z+M)1 (x 0) = 0,

for which the solutions are



= (l/e)E,

M = (/ (h+l))H-(l/e)E.

Since M is a nonnegative definite matrix, its maximum likeli-

hood estimate must also be nonnegative definite, so the solu-

tions above are the maximum likelihood estimates only if

(l/(h+l))H-(l/e)E is nonnegative definite. We find that,

while the solutions for 4 and Z are the natural unbiased esti-

mates, the solution for M is not. That is,

E(M) = (l/(h+l)) (hM-E).

Hence, we see that E(M) is also not necessarily nonnegative

definite.

Suppose that instead of using the likelihood function

of (,ZE,M) we use the marginal likelihood function of (E,M).

Justification for this follows from the fact that (E,H) is

"marginally sufficient" for (Z,M) or, in other words, (E,H)

is "sufficient for (Z,M) in the absence of knowledge of u."









For a detailed description of the principle of marginal suffi-

ciency see Barnard [1963]. There is ample precedent for the

use of this principle in multivariate theory. For example,

Bartlett's test has two forms, one involving the sample size

and the other involving the degrees of freedom. The marginal

likelihood function of (E,M) can be written

= Km(I,e)Km(I,h) HI(h-m-1) IE (e-m-1)

[Z+MIhiZ e

x exp[- tr ZE-E tr(E+M) -H],

where

-1 Vmv m(m-1) m
K (I,) = 2 i r (- H F( (-j+l)).
m j=l
The logarithm of the likelihood, omitting a function of the

observations, is

tr ~ E -_e n tr( +M) -H Jh kn|l+Ml.

We seek the solution, (Z,,M,), which maximizes the above

equation, or equivalently, the solution which minimizes

tr ZEE + e knlZI + tr(Z+M)1 H + h ZnlZ+M|. (2.1.3)

Again if we ignore the constraints that Z is positive defin-

ite and M is nonnegative definite and seek the stationary

values of (2.1.3) over all possible (Z,M), we find,

upon taking the partial derivatives of (2.1.3) with respect to

Z and M and setting them equal to zero, that

eZ-l+h(E+M)-1 -Z1EE-- (+M)- 1H(+M)-1 = (0),

h(E+M)- (E+M-I1H(E+M)-1 = (0),










for which the solutions are

*, = (l/e)E,

M, = (l/h)H-(l/e)E.

We see that these solutions are the natural unbiased estimates

of Z and M, and thus E(M,) = M is clearly nonnegative definite.

For this reason, we choose to continue our work with the mar-

ginal likelihood function of (E,M). Note that since M is non-

negative definite, the solutions above are the maximum likeli-

hood estimates only if (1/h)H-(l/e)E is also nonnegative

definite. In the next two sections we will derive maximum

likelihood estimates for E and M which are valid for all

possible (E,H).


2.2 Some Lemmas

Consider the function

p(A,B;D,e,h) = e[tr A- +ZnlAI] +h[tr B-1D +enlBJ],

where A, B, and D are m x m matrices. We assume that D is

diagonal with distinct, descending, positive diagonal elements;

that is, D = diag(dl,d2,... ,dm) with dI > d2 > ... > dm > 0.

We are interested in minimizing )(A,B;D,e,h) subject to
s
(A,B) E C = {(A,B): A E P B EP B A E U P. where P.
j=0

is the set of all symmetric, nonnegative definite matrices

of rank j. In this section it will be shown that the required

absolute minimum occurs when both A and B are diagonal.










The proof of this result relies mainly on a lemma regard-
-l
ing the stationary points of the function g(P) = tr PB P'D

where P(mxm) is orthogonal.

Lemma 2.2.1: Consider g(P) = tr PXP'D where P(mxm) is

such that PP' = I, and X(mxm) and D(mxm) are both symmetric

and positive definite. It is assumed that D is diagonal with

distinct, descending, positive diagonal elements. Then the

stationary points of g(P) occur when PXP' is diagonal.

Further, the absolute maximum of g(P) is

m
max g(P) = Z d.ch.(X),
P:PP'=I i=l

and the absolute minimum of g(P) is

m
min g(P) Z dm+-ich (X).
P:PP'=I i=l

Proof: Using the method of Lagrange multipliers, we look at

L(P,A) = tr PXP'D + tr A(PP'-I),

where A = A'. Let A be the matrix that has 1 in row i,
ij
column j, and 0's elsewhere. Then

DL = tr(Aij.XP'D+PXA..D) + trA(A. .P'+PA .)
dp 1] ]1 1] J1
aPij 3 ji

= tr(DPXA..+PXA ..D) + tr(PA..A+APA..)

= 2tr(AjiDPX) + 2tr(AjiAP)

= 2(DPX)ij + 2(AP)ij,










aL
-= tr(Ai+Ai..)(PP'-I) = 2(PP'-I). if i # j,
13

DL
L = tr A..(PP'-I) = (PP'-I)...
dA. 11 11
11

Thus, the stationary values of g(P) occur at the solutions to

2DPX + 2AP = 0,
I. (2.2.1)
PP' = I.

From (2.2.1) it follows that

A = -DPXP',

so that A = A' implies that

DPXP' = PXP'D,

or DY = YD, (2.2.2)

where Y = PXP'.

Examining the (i,j)th term on each side of (2.2.2), we see that

we must have d.yij = Yijd.. Since d. # d.: i # j, it follows

that y = 0: i # j. Thus, Y = PXP' is diagonal. It is clear

then that the stationary values of tr PXP'D are given by the

set of values

m
d (i) d chi(X),
i=l

where {t(l),t(2),...,t (m)} is a permutation of {1,2,...,m}, the

set being formed over all such permutations. Further, the

absolute maximum of tr PXP'D is, clearly,

m
max tr PXP'D = Z d. ch. (X),
P:PP'=I i=l










and the absolute minimum is, clearly,

m
min tr PXP'D = Z d +li ch (X).
P:PP'=I i=l

We will also need the following results, the first of

which can be found in Bellman [1970:117].

Lemma 2.2.2: Let X(mxm) and Y(mxm) be symmetric matrices

with Y nonnegative definite. Then

ch. (X+Y) > ch (X) for i = 1,2,...,m.

If Y is positive definite, then

ch. (X+Y) > chi(X) for i = 1,2,...,m.
1 1

Lemma 2.2.3: The function p(A,B;D,e,h) has an absolute

minimum over the set of solutions C = { (A,B) : A P B E P ,
s
B A U P.}.
j=0 3
-l
Proof: Since B is positive definite, it follows that B is

also positive definite, so that the diagonal elements of B-

are positive. Then we find that

m m
1 m -1 -1
tr B D = z (B d d (B ). = d tr B
i= 1 m i=l m

m m
= d I ch.(B 1) =d Z (ch.(B))-1
m i=l mi=l

since ch. (B ) = (ch +l(B)) Hence, using the fact that

m
for any matrix X(mxm), tr X = Z ch (X) and
i=l
m
XI = n ch.(x), we see that
i=l









m -1
q(A,B;D,e,h) 2 e Z ((ch (A)) + Zn(ch (A)))
i=l

m -1
+ h Z (d (ch.(B)) + nn(ch. (B))). (2.2.3)
i=l

From Lemma 2.2.2 we know that ch.(B-A) < ch.(B), since A is

positive definite. Then Cs can be written

Cs = {(A,B): chi(A) > 0: i = 1,2,...,m; chi(B) > 0:

i = 1,2,... ,m; 0 < chi(B-A) < chi(B): i = 1,2,...,s;

ch. (B-A) = 0: i = s+l,...,m; A = A', B = B'}. The closure,
1
CS, of C is {(A,B): ch.(A) 2 0: i = 1,2,...,m: ch.(B) > 0:

i = 1,2,...,m; 0 ch.(B-A) I ch.(B) : i = 1,2,...,s; ch (B-A)

= 0: i = s+l,...,m; A = A',B = B'}. Since (A,B;D,e,h)> 0,

it has an absolute minimum over C, since Cs is closed.

Note that from Lemma 2.2.2 if chi(B-A) = ch (B) for some i,

then it must be true that chm(A) = 0, since A must then be

positive semidefinite. Thus, for every (A,B) Cs Cs it

must be true that ch (A) = 0 or ch (B) = 0 or both. It then

follows from (2.2.3) that t(A,B;D,e,h) = whenever (A,B)

s C Hence, (A,B; D,e,h) has an absolute minimum

over C
s

Lemma 2.2.4: Suppose the function f(x), minimized

over x E S, achieves a minimum at x = a. Let the set S1

be such that for any x S-S1, there exists an xl E S1

such that f(x1) < f(x). Similarly, let the set S2 be such

that for any x ES-S2' there exists an x2 E S2 such that

f(x2) < f(x). Then it follows that a E S1l 2










Proof: Suppose a ? S1 n S2. Then either a g S1 or a S S2

or both. However, if a t S1, then a E S-S1, and there exists

no x1 E S1 such that f(xl) < f(a), since f is minimized at a.

This then is a contradiction, so it must be true that a E S.

Similarly, if a g S2, then a E S-S2, and there exists no

x2 E S2 such that f(x2) < f(a). This also is a contradiction,

so it must be true that a E S2. Hence, it follows that

a E S1 n S2'

In Lemma 2.2.3 it was seen that the function (A,B;D,e,h)

has an absolute minimum over the set C We will now show

that this absolute minimum will occur only when both A and B

are diagonal.

Lemma 2.2.5: The absolute minimum of O(A,B;D,e,h)

subject to (A,B) E Cs occurs when both A and B are diagonal.

We offer two proofs.

Proof 1: Define the sets S1 and S2 as follows:

S1 = {(A,B) E Cs: A is diagonal},

S2 = {(A,B) E Cs: B is diagonal}.
We want to show that if A(A,B;D,e,h) achieves a minimum at

(A,B) = (A,,B,), then (A,,B,) E S1 n S2. Now with

A = D- AD- and B = D-BD-,

t(A,B;D,e,h) = e[trA-l+inlAl] + h[trB-1D+Zn BI]


= e[trA-1D- +ni[] + h[trB-l+nlB ] +(e+h) inDI

= (B,A ;D-1,h,e) + (e+h)nIDI.
ttA;D,h,e] + (e+h)ZniDI.










Note that since D is positive definite, (A,B) E C if and
s
only if (D- AD -,D BD ) = (A,B) E C Thus, minimizing

A(A,B;D,e,h) subject to (A,B) E C is equivalent to minimiz-
-1
ing ((B,A;D ,h,e) subject to (A,B) C Moreover, if
- 1 1-2
(A,,B,) minimizes ((B,A;D ,h,e), then (D AD ,D DD) min-

imizes (A,B;D,e,h). Now arbitrarily fix (A,B) E C and
-l
consider 4(PBP',PAP';D- ,h,e) for all orthogonal P. Clearly

the terms n|lPAP'I, tr PB P', and InlPBP'I are constant for
-l
all orthogonal P, so that ((PBP',PAP';D ,h,e) is minimized

with respect to P when tr PAIP 'D- is minimized. It follows

from Lemma 2.2.1 that all the stationary points, and thus the

absolute minimum, occur when PAP' is diagonal. Hence, for

any (A,B) E Cs S1 there exists an (A1,B1) E S1 such that

p(BlAl;D ,h,e) < t(B,A;D- ,h,e).


But since A = D AD we know that A is diagonal if and only

if A is diagonal. So we find that for any (A,B) E Cs S ,

there exists an (Al,B1) E S1 such that

(A1,BI; D, e,h) < (A,B;D,e,h).


In a similar manner now arbitrarily fix (A,B) E C and con-

sider <(PAP',PBP';D,e,h) for all orthogonal P. Clearly this

is minimized with respect to P when tr PB P'D is minimized,

since the terms tr PA- P', ZnlPAP'I, and Zn|PBP'I are con-

stant for all orthogonal P. So from Lemma 2.2.1 it follows

that all the stationary points, and therefore the absolute

minimum, of ((PAP',PBP';D,e,h) occur when PBP' is diagonal.










This implies that for any (A,B) E Cs S2, there exists an

(A2,B2) E S2 such that


S(A2B 2;D,e,h) < O(A,B;D,e,h).


The result now follows from Lemma 2.2.4. Furthermore, from

Lemma 2.2.1 we see that if (A,,B,) minimizes 4(A,B;D,e,h),

then the diagonal elements of D- AD- are increasing and

the diagonal elements of B, are decreasing.

The second proof of Lemma 2.2.5 utilizes the concept of

"majorization" (see Marshall and Olkin [1974]).

Definition 2.2.6: Let x and y be real mxl vectors

with it element x. and y, respectively, and ith largest

element x(i) and y i), respectively. We say that x major-

izes y and write x > y, if

s s
i x() E y(i for s = 1,2,... ,m,
i=l (i) i=l )

with equality when s = m.

We will need some results which, while well known to

workers in the area of majorization, may not be readily

accessible to others. We prove the results here for the

benefit of the uninitiated reader.

Lemma 2.2.7: If S (mxm) is doubly stochastic, then
m
x > SLx = y.

Proof: Since S is doubly stochastic, it follows that

s.. 0 for all (i,j), and
13











for i = 1,2,...,m,



for j = 1,2,...,m.


Thus, for 1 t s m there exists kl,k2,...,kt such that


m
j=E (sklJ k i +...+s ktj)x
j=l ^I 2: ^t3


Clearly when t < m,


m
skj +s k2+...+s < j isij = 1
k1j k2j kt- i=l-


for j = 1,2,...,m,


m

j=1 (sklj+sk2 +...+sk = t

Then when t < m,

t m
i Y(i j (s +s +...+sk )x
i=l(i) j= Sklj+ k .+Sktj)x

t
E x .
i=l (


If t = m, then


s +s +...+sk
ks kt2j kt

so that


t

i=ly (i)


t
i= s.
i=1 1J


= 1,


t

j=l (sklj+sk2J ktj)x


t t
= E x. = x .
j=1 3 i=l (


m
E s.. = 1
j=1

m
E s..= 1
i=l 13


t

iZ y (i)
i=l










m
Lemma 2.2.8: If x > y and a (1) aa (2) ... > o() 0,

then
m m
S(i i) Z yi(i)
i=l i=l

Proof: Put d. = x y Then
---- (i) i *^

m m
E (x(i-y)i) = d(i)
i=l i=l

= d(o()-o(2) +

(dl+d2) (o(2)- (3)) +

(dl+d2+d3) (o(3)-a(4) +




(dl+d2+...+dm 1)((m- )- ()

(dl+d2+...+dm) o(m)"

The last term is zero, since

m m m m
d = (x()-Yi) = Z x yi) = 0.
i=1 I i=l i=1 i=li)

The partial sums are nonnegative, since
t t t t t
Sd = x ) Yi E x Z y(i) 0.
i=l i=l i=l i=l i=1

Further, the differences a(1)-o(2) (2)- (3) .....(m-1)-c(m)

are nonnegative. Hence, the result follows.

Lemma 2.2.9, Corollary: If x is an ordered vector,

that is, x >- x2 L ... 2 xm, S is doubly stochastic, and

a is also an ordered vector, then x'a (Sx) '.










m
Lemma 2.2.10: If x > y and (1) > a(2) >...> (m) > 0,

then

m m
Z x(i) (m+l-i) < Z i(m+l-i)'
i=1 i=l

Proof: The proof is similar to that of Lemma 2.2.8.

Letting di = x (i)-i, we have

m m
E (x )-yi )o = E d i(o+-i)
i^l(i) m+l-i) i(m+l-i)
i=1 i=l

= d1 (o(m)- (m-l)) +

(dl+d2) (m-1)- (m-2) +

(dl++2+d3) ((m-2)- (m-3) +




(dl+d2+...+dm-1) ((2)-o()) +

(dl+d2+...+dm)a(l)
t
We have seen that the partial sums, Z d.:t=l,...,m-l, are
i=l
m
nonnegative and Z d. is zero, so that the last term is zero.
i=l
Further, the differences o(m)-o(m-l) (m-1)- (m-2)" .'

0(2)-0(1) are negative or zero. Hence, the result follows.

Lemma 2.2.11, Corollary: If x is an ordered vector,

and y = Sx with S doubly stochastic, then

m m
Zix(m+l-i) i yio(m+l-i)"
i=l i=l

Furthermore, if o(1) > 0(2) > ...> (m) then there is

equality only if y = x.










We are now ready for the second proof of Lemma 2.2.5.

Recall that we need to show that the absolute minimum of

p(A,B;D,e,h) subject to (A,B) E C occurs when both A and B
s
are diagonal.

Proof 2 (Lemma 2.2.5): Let S1 and S2 be defined as before;

that is,

S1 = {(A,B) c Cs : A is diagonal},

S2 = {(A,B) E Cs : B is diagonal},

and recall that we need to show that if 4(A,B:D,e,h) is min-

imized at (A,,B,), then (A*,B*) E S1 n S2. Let

-l
B1 > B2 >" 8rm > 0 be the latent roots of B-, and
PB P' = diag(H ,m,-m_ ..', 1). Then

{((PAP',PBP';D,e,h) P(A,B;D,e,h)}/h

= tr PB1 P'D tr B D

m m (m 2
Z dm+1- Z m+-i d
j=1 m+l-j j=1 i=1


m m
= ji j d yjd ,m+ (2.2.4)
m+-j j m+-
= j=l


where y = = 2(B1' ,2' .'.. ), and P2 is the matrix

th 2
with (i,j) element Pm+l-j,m+l-i. Since PP' = P'P = I,

we see that P2 is doubly stochastic. Also d = (dld2,...,d )

and 3 are ordered vectors, so by Lemma 2.2.11, equation (2.2.4)

is not positive. Furthermore, d1 > d2 > ... > dm so that

6(PAP',PBP';D,e,h) cb(A,B;D,e,h),










with equality holding only when B-1 = diag(Bm ,m_1 .." 1).

Therefore, for any (A,B) E C S2 there exists an (A2,B2) E S2

such that

4(A2,B2;D,e,h) < q(A,B;D,e,h).

Now with A = D- AD- and B = D-BD-

o(A,B;D,e,h) = e[trA- D +Zn A ] + h[trB-1 +inlB]

+ (e+h)Zn|ID (2.2.5)

= 4(B,A;D- ,h,e) + (e+h)ZnIDI.

Let l 1 2 ". > 0 be the latent roots of A

and QA1Q' = diag (a1,a2,... ,m). Then by an argument iden-

tical to the previous one we find that

4(QBQ',QAQ';D- ,h,e) < 4(B,A;D ,h,e),

with equality holding only when A-1 = diag(ala2,...a '' ).

From (2.2.5) it follows that

S(DQAQ'D' ,DQBQ'D;D,e,h) 1 f(DAD ,DBD ;D,e,h),

with equality holding only when A = diag(&l,a2" .... ).

Note that DQAQ'D = diag(dl1 a ....d ma ). Thus, for any

(A,B) E Cs S1 there exists an (A1,B ) E S1 such that

q(A1,B1;D,e,h) < *(A,B;D,e,h).

The result now follows from Lemma 2.2.4.

Lemma 2.2.12, Corollary: Let R be some restriction on

the latent roots of A or B or both, and let CR be the subset
s
of C such that (A,B) e CR implies that R is satisfied.
s s
Since (A,B) e CR if and only if (PAP',PBP') E CR for any
s s
orthogonal P, it follows that the minimal value of










*(A,B;D,e,h) over (A,B) E CR occurs when A and B are diagonal.
S
For example, if the latent roots of A were known to be pro-

portional to a given set, then the minimal value of

p(A,B;D,e,h) over (A,B) E CR occurs when A is a diagonal
s
matrix with diagonal elements proportional to this set.


2.3 The Maximum Likelihood Estimates

In this section we seek the maximum likelihood estimates
s
of Z and M subject to the constraints Z e P and M e U P..
m j=0 3
Recall that the likelihood function of (E,M) is

Km(I,e)Km(I,h) H(h-m-l) (e-m-l)
f(E,H) H1(h-m-l)I(eml)
IZ+MJIhI e

x exp[-trZ E-tr(Z+M)1 H].

The logarithm of the likelihood function, omitting a function

of the observations, is

trI-1E-eZnIII tr(E+M)-1H-hnl|+MI.

We seek the solution, (2,M), which maximizes the above equa-

tion, or equivalently, the solution which minimizes

trZ-lE+eZnlI + tr(Z+M)-IH+htnl +MI (2.3.1)
s
subject to Z E P and M e U P,.
m j=0 3

Let E, = (l/e)E and H, = (l/h)H. Note that since E* and
m
H, are both symmetric matrices, and E*E Pm and H, E U P.,
j=0 j
there exists a nonsingular matrix K(mxm) such that

KEK' = I and KHK' = D, where D = diag(dl,d2,...,dm), and

d > d2 > ... > d > 0 are the latent roots of H,E,1
1 > 2 > m










Then with Z = KZK' and M = KMK', (2.3.1) can be rewritten

etrK' -E-1K-1I+etnl E+htrK'-1 (+M)- K D+hZn E+M

= e[tr-l+ZnI |l] + h[tr(+M)- D+En i+M ]

(e+h)n] K[2

= (((,Z+M;D,e,h) (e+h)nIK 2.

Thus, the problem has been reduced to that of minimizing
s
$(f,Z+M;D,e,h) subject to E P and M e U P. or,
m j=0 3
equivalently, (X,Z+M) e C But from Lemma 2.2.5 it is

known that the minimal solution to p(Z,X+M;D,e,h) is such

that E and E+M are diagonal, and in addition, it is known

that the diagonal elements of D -D are increasing while

the diagonal elements of D+M are decreasing.

Consider the function

g(x,y) = e( + nx) + IXd+Zny), (2.3.2)
x y

where d > 0. Differentiating (2.3.2) with respect to x and

y, we get the equations

1 1
+ = 0,
2 x
x

d 1
+ = 0,
2 y

which yield the minimal solution x0 = 1 and y0 = d. If

instead we wanted to minimize (2.3.2) subject to x = y,

(2.3.2) would reduce to


1 d
g(x) = e( + inx) + h(d + nx).
x x


(2.3.3)










Then

dg(x) 1 1 d 1
= e(- + ) + h(- + ) = 0
dx 2 x 2 x
x x
e+dh
so that x1 = =e+dh minimizes (2.3.3).

Now let

f(d) = g(xly1) g(x0,Y0)

Se+h e+dh d(e+h) e+dh
= e( + Rn( )) + h( dh + an(t--- )
e+dh e+h e+dh e+h

(e+h+hind)


= e(l l)h + h + (d-le) + (e+h)n(e+dh
e+dh e+dh e+h

(e+h+hind)


e+dh
= (e+h)Zn( e ) hind.
e+h
Differentiating f(d) with respect to d and noting that e > 1,

h 1, we find that when d > 1

df(d) h(e+h) h dh(e+h)-h(e+dh) eh(d-l)
--=> 0.
dd e+dh d (e+dh)d (e+dh)d

In other words, the difference g(xl,y1) g(x0,YO) is an

increasing function of d when d > 1.

Now with X = diag(xl,x2,...,xm) and Y = diag(yly2" .., m)

consider minimizing

m m d.
>(X,Y;D,e,h) = e L (- + in x.) + h Z (--- + in yi) (2.3.4)
i=l 1 i=l i

subject to (X,Y) E Cs, which in this case implies that

Yi xi > 0 for all i, and x. = y. for at least m s of the

i's. Suppose that d1 > d2 > ... > d > 1 > d > >d > 0.

Using the fact that f(d) is increasing in d for d > 1,










it then follows that the minimal solution to (2.3.4) is

(Xs,Ys), where if r I s,

si = Ysi = (e+dih)/(e+h) for s+l < i < m,

xsi = y si = di for 1 < i < s,

and if r < s,

xsi = si = (e+dih)/(e+h) for r+l < i < m,

xsi= 1, ysi = d. for 1 < i < r.
Thus, (i,Z,+M;D,e,h) is minimized subject to

(,S+M) E C at
s


so that the maximum

Z and M, where


M = Y -X,
likelihood estimates of and M are
likelihood estimates of T and M are


S= KX K',-

M= K (Y -X )K'-1

We now present an example to illustrate the computation

involved in deriving the maximum likelihood estimates.

Consider model (2.1.1) in which we take m = 4, g = 21,

n = 6, Z = I, and M = diag(99,24,0,0). Hence,e = g(n-l) =

105 and h = g-l = 20. Generating a matrix E from the dis-

tribution W4(I,105,0) and a matrix H from the distribution

W4 (I+M,20,0), we obtain










69.1329 4.07476 -5.12762 -9.94924

E= 127.055 -3.77638 20.4629
116.342 8.12511
100.186

1845.85 63.5986 -16.5227 -1.43363
H = 688.962 1.14908 -8.61601
20.1453 -,0100181
12.2617

With E, = (1/105)E and H, = (1/20)H we need to find a non-

singular matrix K such that KEK' = I and KHK' = D, where

D is a diagonal matrix. Let D1 = diag(chl (E),...,ch4(E*)),

and let P be the orthogonal matrix for which the ith column

is the characteristic vector of E, corresponding to chi(E,),

then, since E, is symmetric, P'EP = D Similarly, let

D = diag(chl(D1 P'HPD1 ),...,ch (D1 P'HPD1 )), and let Q

be the orthogonal matrix for which the ith column is the

characteristic vector of D P'H PD corresponding to

chi (D P'H*PD1) then, since D P'HPD1 is symmetric,

Q'D1 P'H PD -Q = D. Thus,we may take K = Q'D P'. Using

the above decomposition for K, we find that, for our example,

1.24522 -.0464049 .0380069 .130133
K = .0181884 -.925978 -.0477042 .213476
.00831611 -.00767896 .940375 -.237376
-.000914637 -.0158712 -.153048 -.994866

and D = diag(142.729,29.6669,.91847,.625404). Note that

d2 > 1 and d3 < 1, so that r = 2. Simple calculation yields

X0 = Y0 = diag(23.6766,5.5867,.986955,.940065),

X1 = diag(1,5.5867,.986955,.940065),










Y1 = diag(142.729, 5.5867, .986955, .940065),

X2=X3=X4= diag(l, 1, .986955, .940065),

Y2=Y3=Y4= diag(142.729, 29.6669, .986955, .940065).

Hence, if we let Z. and M. be the -vmximum likelihood estimates

of Z and M, respectively, subject to the constraints Z E P4


and M e U P., we find that
j=0

15.3199 .541388

6.52813
0 =


-.173203

-.0210184

1.0919


-.0910632

.0947756

.064921

.899582


M0 = (0),


6.5222
z1 =





91.5881 1.84178

.0370371
M1


.6578


.040037

1.20736


-.046356

-.0184676

1.0908




-.792796

-.0359426

.00686252


-.0471092

-.0378372

1.09073


-.0924036

.0947486

.0649326

.899582


.00837764

.000168469

-.000072518

.000000766-


-.0889834

.182707

.0652532

.898126










91.6383 3.13344 -.788088 -.0129987

33.2548 .105117 -.549569
M2=M =M =
.00730371 -.002076

.00909865


Further commentary on these data will be made in

Sections 3.6, 4.2, and 5.4.


2.4 The Likelihood Ratio Test
s
Recall that C = {(A,B):ACPm,EEPm,B-As U P.,
Sm M j=0

and suppose we know that (Z,Z+M) e Q = C We wish to test,

say, the null hypothesis that (E,Z+M)E w = C 1 C C The

alternative hypothesis is then that (Z,Z+M)c 0 w =

Cs C s. Thus, we are testing the hypothesis
(s)
H : rank (M) $ s-1

against the hypothesis
(s)
H1: rank (M) = s.

We adopt the likelihood approach and compare max f(E,H)

with mx f(E,H). Specifically, we look at

max f(E,H)/ria' f(E,H) = i e (0,1]

With the matrices X = diag(x s,xs2 ,...,sm) and

Ys = diag(ysl'Ys2'' 'Ysm given by


si = Ysi = (e+d h)/(e+h) for s+l i m,

si = l,si = di for 1 i s,
Sl Sl 1










if r > s, and

x i = Ysi = (e+d.h)/(e+h) for r+l < i < m,

si = 1, si = di for 1 5 i < r,

if r < s, the maximum likelihood estimators, Z, of Z and,

M., of M when the parameters are restricted to lie within

Q, are given by

S= K X K'-
0 S

Mn =K (Ys-Xs)K',

where K is a nonsingular matrix. Similarly, the maximum

likelihood estimators, r of Eand, M of M where the

parameters are restricted to lie within w, are given by

S= K-1X K ,1
o s-l

S= K- (Ys-l-Xs-lK'-

It should be noted that if r < s, then X = Xs1 and

Y = Ys-l' and if r s, xsi = Xs_,i and ysi = s-l,i

only for i # s.

The likelihood ratio, \, is

max f(E.H)

max f(E,H)


-1 -1 -h e
exp[-tr2l E-tr( +MN ) H] +Mj hIle

exp[-tr2 E-tr( +9)-1H] +A h he
~2 P- n^ ) 11]h Q ^ WI W










-etrX-1 -1] IYslEhh e
exp[-etrX -htrY D] IY Ih X e
-1 -1 h e
exp[-ietrX- -htrY D] IY | h X O
s s s-1 s-1


IYs Ih Xs e

IYs-1 Ih XsX e '

since, if r < s,

-1 -1 -1 -1
etr(X -X ) + htr(Y 1-Y )D
s-1 s s-1 s

-1 -1 -1 -1
= etr(X -X 1) + htr(Y -Y )D
s s s s

= 0,

and, if r 2 s, (2.4.1) becomes


-1 -1 -1 -1
e(x1 ,s-x) + h(Ys l, -Yss)ds
s-1,s ss s-ls ss


d h(e+h)
e(e+h) e+ (es
e+d h e+d h
s s


(e+dsh) (e+h)
e+d h (e+h) = 0.
e+d h
s

So we have


if r > s,

if r < s.


Since dl > d2 > ... > d >1 > dr+

clearly, r > s if and only if d > 1.

A as
e+d h -(e+h)
A d e+h if

h1 if


> ...> d > 0,
m
Hence, we can write



d > 1,
s


0 < d 1.
s


(2.4.1)










Now upon taking the derivative of A with respect to d over

the range ds > 1, we get

-(e+h)-1

dd s e+h e+h s

e+d h h-(e+h)-l e-de

= h d ( L J
s e+h e+h

which is negative for ds > 1. Thus, A is a decreasing

function of ds over the range ds > 1. In addition,


h e+dsh h(e+h)
-(e+h)
s (e+h- 1 for ds ,

with equality when ds = 1, so that A is a decreasing func-

tion of d
s
S(S)
The likelihood ratio test rejects Ho for small values

of A. Since A is a decreasing function of ds, the likeli-

hood ratio test rejects H(s) for large values of ds. Now

recall that with H, = (1/h)H and E, = (l/e)E, there exists

a nonsingular matrix K such that KH,K' = D and KEK' = I.

It follows then that d.: i = 1,2,...,m are the solutions to

H.-dEI = IKHK'-dKEK'I = ID-dlI = 0,

so we observe that ds is the sth largest solution to

IH,-dE*I = 0. (2.4.2)

With pi = dih/e : i = 1,2,...,m, (2.4.2) can be written

|H hEl = 0,

or


I H $E = 0


(2.4.3)










Hence, we would reject H) for large values of s = dsh/e,

where s is the sth largest solution to (2.4.3). It is of

particular importance to recall that H ~ W (Z+M,h,0) and

E Wm(Z,e,0), independently.
(s)
We have seen that the likelihood ratio test rejects H0

when 4s >c for some constant c. Now we want to choose for

the constant c some number, which we will denote by c(a,m,s)

to indicate its dependence upon a, m, and s, such that

P(Os > c(a,m,s) (Z,M)) a for all (E,Z+M) E Cs_1. For

c(a,m,s) we propose the a level critical value for the

largest root, el, from amongst the m-s+l roots of 1w1- W21 = 0,

where W1 W_ +(I,h-s+l,0) and W ~ W _+l(I,e,0), inde-

pendently. That is, we take c(a,m,s) such that

P(91 > c(a,m,s)) = a. Justification for this choice of

c(a,m,s) will be given in the next chapter.















CHAPTER 3


PROPERTIES OF THE sth LARGEST ROOT TEST



3.1 Introduction

In this chapter we investigate some properties of the
th
s largest root test presented in the previous chapter.

It would be desirable to show that this test is the uni-

formly most powerful test, but we were unable to do so for

general m. However, in Section 3.2 we show that for m = 1

the test is uniformly most powerful. Also, in Sections 3.3

and 3.4 it is shown that the sth largest root test is an
(s) (5)
invariant test of HO against H1 and is the test obtained

by the union-intersection principle (see Roy [1953]). Finally,

in the last two sections we discuss an important monotonicity

property of the roots ci : i = 1,2,...,m, and then use this

property in deriving the asymptotic distribution of s


3.2 The Uniformly Most Powerful
Test for m = 1

For m = 1 the problem reduces to that of the univariate

random effects model discussed in Section 1.1. Recall that
2 2 22
we have x -. N(p,(oz+na )/gn), u ocXe, and
2 2 2 2 2
v ( z+na) h, independently, where a,Oz, and o are all
2
unknown, and we wish to test the hypothesis H0:o = 0 against

H : 2 > 0.
S a










Suppose that for some set of points, y', in the space

of (x ,u,v), we reject H0 whenever the experimental (x ,u,v)
22 2 2
belongs to y'. Let B(y';,U ,oa ) = P[(x ,u,v)ey' po ,o ]
2
and require that y' be such that B(y';p,o2 0 ) = a. Let
z Z
x = u + v and y = v/u, so that u = x/(y+l) and v = xy/(y+l).

Then the Jacobian, II of x ,u, and v with respect to

x x, and y is Ij = (x ,u,v)/ (x ,x,y)II = x/(y+l)

Then
22 2 2 22
B(y'; a g (u;o a)g2(v;o2,o)g3(x ;p ,o2)du dv dx..
z (X z Od g . a Ot

2 2 2 2
= /// f(x,yza2 )f (x ;J )dx dy dx ,


where y = {(x ,x,y): (x ,u,v) E y'} and where, independently,
2 2
u ~ zXe'

2 22
v ~ (oz+no)2)Xh

2 2
x N(,,(a +nc) /gn),

so that


and (x


2 2 x(e+h)-l expf-x/2(o2+no2)]
f(x,y;o ,o ) = f(x,y) =z
z2 2 a(e+h)
az (2(02+no 2)) ((e+h)((e+h))
z a

2 2 e -(e+h) 2 2 2 2
(l+noa /2 ) y hl(y+l) (e+hexp[-xno /22 (2 +no ) (y+l)]
x a z a z z a
B (e,h)
,y) is independent of x .We note that when o2 = 0,


x and y are independent; that is,

2 2= x
f(x,y;oz ,0) = fl(x;z)f2(y) = fl()f2(Y)


1











where
2 2
x zXe+h


y (h/e)Fh.

Letting y(x,x ) = {y: (x ,x,y) e y}, we can write


(' 2,0) = f f f(x,y)f0(x )dy dx dx
z --m0 y(x,x )


= f fl(x)f(x ) f(y)dy dx dx
-o 0 1" y(x,x )

Putting

h(x,x ;o ) = h(x,x ) = f f2(y)dy,
z y.. (x,x )

we see that


(Y';oy2,o0) = if fl(x)f0(x )h(x,x )dx dx .
z -om 0
2 2
When o = 0, (x,x ) is sufficient for (a ,U). Further,
aO z
{fl(x) f (x..) -o < i < -, o2 > 0} is a complete family (see,
2
for example, Lehmann [1959:130]). Thus, since B(y';V,o,O0)

= a0' we must have h(x,x .) = 0.

Now let γ* = {(x̄..,u,v): y = v/u > c}, where c is the constant for which ∫_c^∞ f_2(y) dy = α_0, so that γ* is also of size α_0. Then with q = (q_1,q_2)', where q_1 ~ (σ_z² + nσ_a²)χ²_(e+h) and q_2 ~ N(μ, (σ_z² + nσ_a²)/gn), independently,

[B(γ*; μ, σ_z², σ_a²) − B(γ'; μ, σ_z², σ_a²)](1 + nσ_a²/σ_z²)^(−e/2) = E(d(q; σ_z², σ_a²)),

where the expectation is with respect to the distribution of q. Here

d(q; σ_z², σ_a²) = ∫_γ*() f_2(y) Q(y,q_1) dy − ∫_γ() f_2(y) Q(y,q_1) dy,

where γ*() = γ*(q_1,q_2), γ() = γ(q_1,q_2), and

Q(y,q_1) = exp[−q_1nσ_a²/{2σ_z²(σ_z² + nσ_a²)(y+1)}].

Therefore,

d(q; σ_z², σ_a²) = ∫_{γ*()−γ()} f_2(y) Q(y,q_1) dy − ∫_{γ()−γ*()} f_2(y) Q(y,q_1) dy.

Since

Q(y,q_1) > Q(c,q_1) when y ∈ γ*() − γ(),
Q(y,q_1) < Q(c,q_1) when y ∈ γ() − γ*(),

we find that

d(q; σ_z², σ_a²) ≥ Q(c,q_1) [∫_{γ*()−γ()} f_2(y) dy − ∫_{γ()−γ*()} f_2(y) dy]
= Q(c,q_1) [∫_{γ*()} f_2(y) dy − ∫_{γ()} f_2(y) dy]
= Q(c,q_1) [α_0 − α_0] = 0.

Thus,

E(d(q; σ_z², σ_a²)) ≥ 0,

so that

B(γ*; μ, σ_z², σ_a²) ≥ B(γ'; μ, σ_z², σ_a²).

Therefore, amongst all critical regions of size α_0 the critical region which rejects H_0 when v/u > c is uniformly most powerful in a test of H_0: σ_a² = 0 against H_1: σ_a² > 0. That is, the critical region φ > c, where φ is the only root of (v − φu) = 0, is uniformly most powerful.
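In computational terms, the m = 1 case is just the familiar one-way random effects F test: under H_0 the ratio (v/h)/(u/e) has the F distribution with h and e degrees of freedom, so the size-α_0 critical value for v/u is (h/e)F(h,e,α_0). The sketch below (Python with scipy) carries this out; the numerical inputs are hypothetical and only illustrate the call.

```python
# A minimal sketch of the m = 1 test: u and v play the roles of the within-
# and between-group sums of squares, with e and h degrees of freedom.
from scipy.stats import f

def univariate_test(u, v, e, h, alpha=0.05):
    """Reject H0: sigma_a^2 = 0 when v/u exceeds c = (h/e) * F(h, e, alpha)."""
    c = (h / e) * f.ppf(1.0 - alpha, h, e)   # upper-alpha point of (h/e)*F(h,e)
    return v / u, c, (v / u) > c

# Hypothetical numbers, for illustration of the call only.
ratio, c, reject = univariate_test(u=52.3, v=18.7, e=100, h=20)
print(ratio, c, reject)
```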









3.3 An Invariance Property

Consider the group of transformations G = {g_K: K(m×m) is nonsingular}, where g_K(E,H) = (KEK', KHK'). Since E ~ W_m(Σ,e,0) and H ~ W_m(Σ+M,h,0), it follows that KEK' ~ W_m(KΣK',e,0), KHK' ~ W_m(KΣK' + KMK',h,0), and rank(KMK') = rank(M). Thus, the problem of testing the hypothesis H_0^(s): rank(M) ≤ s−1 against H_1^(s): rank(M) = s is invariant under the group G.

We will need the following definition.

Definition 3.3.1: Let X be a space and G, a group of

transformations on X. A function T(x) on X is said to be

a maximal invariant with respect to G if

a) T(g(x)) = T(x) for all x ∈ X and g ∈ G;

b) T(x_1) = T(x_2) implies x_1 = g(x_2) for some g ∈ G.

We will also need the following well-known result (see,

for example, Lehmann [1959:216]).

Lemma 3.3.2: Let X be a space, let G be a group of

transformations on X, and let T(x) be a maximal invariant

with respect to G. A function f(x) is invariant with respect

to G if and only if f(x) is a function of T(x).

Now consider the roots, φ_1 > φ_2 > ... > φ_m, of |H − φE| = 0 and the roots, θ_1 > θ_2 > ... > θ_m, of |KHK' − θKEK'| = 0, where K is nonsingular. Clearly

|KHK' − θKEK'| = 0

implies |K| |H − θE| |K'| = 0,

so that |H − θE| = 0,

and hence, θ_i = φ_i: i = 1,2,...,m. Suppose now that θ_i = φ_i: i = 1,2,...,m are the roots of |H_1 − θE_1| = 0 and |H_2 − φE_2| = 0, respectively, where E_1, E_2, H_1, and H_2 are all positive definite, symmetric matrices. Then there exist nonsingular matrices K_1 and K_2 such that

E_1 = K_1K_1',   H_1 = K_1ΦK_1',
E_2 = K_2K_2',   H_2 = K_2ΦK_2',

where Φ = diag(φ_1,φ_2,...,φ_m). It then follows that

g_{K_2K_1^{-1}}(E_1,H_1) = (K_2K_1^{-1}E_1K_1'^{-1}K_2', K_2K_1^{-1}H_1K_1'^{-1}K_2')
= (K_2K_1^{-1}K_1K_1'K_1'^{-1}K_2', K_2K_1^{-1}K_1ΦK_1'K_1'^{-1}K_2')
= (K_2K_2', K_2ΦK_2')
= (E_2,H_2),

where g_{K_2K_1^{-1}} ∈ G since, clearly, K_2K_1^{-1} is nonsingular. So by Definition 3.3.1, {φ: |H − φE| = 0} is the maximal invariant with respect to G. The sth largest root, φ_s, is clearly a function of (φ_1,φ_2,...,φ_m), and hence, by Lemma 3.3.2, the test statistic φ_s is an invariant test statistic for testing the hypothesis H_0^(s) against the hypothesis H_1^(s).
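The maximal-invariant argument above is easy to check numerically: the roots of |H − φE| = 0 do not change when (E,H) is replaced by (KEK',KHK'). A minimal sketch (Python with numpy and scipy; the matrices are arbitrary positive definite examples, not data from the model):

```python
# Numerical check that the roots of |H - phi*E| = 0 are invariant under
# (E, H) -> (K E K', K H K') for nonsingular K.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
m = 4
A = rng.standard_normal((m, 10)); E = A @ A.T        # a positive definite "E"
B = rng.standard_normal((m, 12)); H = B @ B.T        # a positive definite "H"
K = rng.standard_normal((m, m))                      # nonsingular with probability one

phi = eigh(H, E, eigvals_only=True)                  # roots of |H - phi E| = 0
phi_K = eigh(K @ H @ K.T, K @ E @ K.T, eigvals_only=True)
print(np.allclose(np.sort(phi), np.sort(phi_K)))     # True
```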

3.4 The Union-Intersection Principle

Suppose that in testing H_0^(m): rank(M) ≤ m−1 against H_1^(m): rank(M) = m, we adopt the rule

R(m:m): reject H_0^(m) if φ_m > c(α,m,m).

Here φ_1 > φ_2 > ... > φ_m > 0 are the roots of |H − φE| = 0, E ~ W_m(Σ,e,0) and H ~ W_m(Σ+M,h,0), independently, and c(α,m,m) is chosen such that P(φ_m > c(α,m,m) | H_0^(m)) = α.

Consider now testing H_0^(s): rank(M) ≤ s−1 against H_1^(s): rank(M) = s. The hypothesis H_0^(s) is true if and only if the hypothesis H_0F^(s): rank(FMF') ≤ s−1 is true for all F ∈ S(m,s), where S(m,s) is the class of all (s×m) matrices of rank s. Similarly, the hypothesis H_0^(s) is false if and only if the hypothesis H_0F^(s) is false, and the hypothesis H_1F^(s): rank(FMF') = s is true, for at least one, and in fact all, F ∈ S(m,s). Hence, we could think of H_0^(s) as ∩_{F∈S(m,s)} H_0F^(s) and H_1^(s) as ∪_{F∈S(m,s)} H_1F^(s), and reject H_0^(s) if (E,H) ∈ γ = ∪_{F∈S(m,s)} γ(F), where γ(F) is the rejection region appropriate to a test of the hypothesis H_0F^(s). The sizes of γ(F): F ∈ S(m,s) should be such as to produce an overall error of the first kind of the desired size. This procedure is known as the union-intersection procedure.

Note that we will reject H_0^(s): rank(M) ≤ s−1 if for some F ∈ S(m,s), we reject H_0F^(s): rank(FMF') ≤ s−1. Let φ_1F ≥ φ_2F ≥ ... ≥ φ_sF > 0 be the roots of |FHF' − φFEF'| = 0, where, clearly, FEF' ~ W_s(FΣF',e,0) and FHF' ~ W_s(FΣF' + FMF',h,0), independently. Then by the rule R(s:s) we reject H_0F^(s) if φ_sF > c(α',s,s), where α' is chosen to give an overall error of the first kind of the desired size. Hence, we will reject H_0^(s) if for some F ∈ S(m,s), φ_sF > c(α',s,s), or equivalently, if max_{F∈S(m,s)} φ_sF > c(α',s,s).











We need the following results, the first two of which can be found in Bellman [1970:115].

Lemma 3.4.1: Let A(m×m) be a symmetric matrix. Then the smallest latent root of A may be defined as follows:

ch_m(A) = min_{u'u=1} u'Au,

where u is an (m×1) vector.

The next result is well known as the Poincaré separation theorem.

Lemma 3.4.2: Let A(m×m) be a symmetric matrix. Then for any matrix F(s×m) such that FF' = I,

ch_j(A) ≥ ch_j(FAF') ≥ ch_{m−s+j}(A)

for j = 1,2,...,s.

We need Lemma 3.4.2 to prove the following lemma.

Lemma 3.4.3: Let A(m×m) be a symmetric matrix. Then

max_{F:FF'=I} min_{u'u=1} u'FAF'u = ch_s(A),   (3.4.1)

where F is an s×m matrix and u is an (s×1) vector.

Proof: Since A is symmetric, there exists an orthogonal matrix P(m×m) such that P'AP = Λ = diag(ch_1(A), ch_2(A),...,ch_m(A)), and hence, for any F such that FF' = I,

min_{u'u=1} u'FAF'u = min_{u'u=1} u'BΛB'u,

where B = FP and BB' = FPP'F' = FF' = I. Then we can rewrite (3.4.1) as

max_{B:BB'=I} min_{u'u=1} u'BΛB'u.

Let F*(s×m) be the matrix with (F*)_ii = 1 for all i, and (F*)_ij = 0 for all i ≠ j. Then

max_{B:BB'=I} min_{u'u=1} u'BΛB'u ≥ min_{u'u=1} u'F*ΛF*'u = ch_s(A).

Now by Lemma 3.4.2, for any B such that BB' = I, we know that

min_{u'u=1} u'BΛB'u ≤ ch_s(A),

so that

max_{B:BB'=I} min_{u'u=1} u'BΛB'u ≤ ch_s(A).

Therefore, it follows that

max_{F:FF'=I} min_{u'u=1} u'FAF'u = ch_s(A).
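Lemma 3.4.3 can be illustrated directly: for any s × m matrix F with FF' = I the smallest root of FAF' never exceeds ch_s(A), and equality is attained when the rows of F are the first s characteristic vectors of A. A brief numerical sketch (numpy; the matrix A is an arbitrary symmetric example):

```python
# Illustration of Lemma 3.4.3: the smallest root of FAF' is <= ch_s(A) for any
# F with FF' = I, and the bound is attained by the first s eigenvectors of A.
import numpy as np

rng = np.random.default_rng(1)
m, s = 6, 3
X = rng.standard_normal((m, m)); A = (X + X.T) / 2    # an arbitrary symmetric A
vals, vecs = np.linalg.eigh(A)                        # eigenvalues in ascending order
ch = vals[::-1]                                       # ch_1(A) >= ... >= ch_m(A)

F_star = vecs[:, ::-1][:, :s].T                       # rows = first s characteristic vectors
print(np.linalg.eigvalsh(F_star @ A @ F_star.T).min(), ch[s - 1])   # both equal ch_s(A)

Q, _ = np.linalg.qr(rng.standard_normal((m, s)))      # a random F with FF' = I
F = Q.T
print(np.linalg.eigvalsh(F @ A @ F.T).min() <= ch[s - 1] + 1e-12)   # True
```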

We have seen that the union-intersection principle leads to the rule which rejects H_0^(s): rank(M) ≤ s−1 in favor of H_1^(s): rank(M) = s if max_{F∈S(m,s)} φ_sF > c(α',s,s). Note that with T(m×m) and F_1(s×m) such that TT' = E and F_1 = FT, then for fixed F ∈ S(m,s),

|FHF' − φFEF'| = 0

implies

|FTT^{-1}HT'^{-1}T'F' − φFTT^{-1}ET'^{-1}T'F'| = 0,

or

|F_1T^{-1}HT'^{-1}F_1' − φF_1F_1'| = 0.   (3.4.2)

Since F is of rank s, so also is F_1F_1'(s×s), and thus, there exists a nonsingular matrix S(s×s) such that SF_1F_1'S' = I. So with F_2 = SF_1 we find that (3.4.2) implies

|F_2T^{-1}HT'^{-1}F_2' − φI| = 0,

and clearly, F_2F_2' = SF_1F_1'S' = I. Hence, it follows that

max_{F∈S(m,s)} φ_sF = max_{F_2:F_2F_2'=I} min{φ: |F_2T^{-1}HT'^{-1}F_2' − φI| = 0}
= max_{F_2:F_2F_2'=I} min_{u'u=1} u'F_2T^{-1}HT'^{-1}F_2'u,

with the final equality due to Lemma 3.4.1. Now using Lemma 3.4.3 and the fact that the latent roots of T^{-1}HT'^{-1} are the roots of |H − φE| = 0, we observe that max_{F∈S(m,s)} φ_sF = φ_s, and thus, the union-intersection principle leads to the rule which rejects H_0^(s) if φ_s > c(α',s,s).


3.5 A Monotonicity Property
of the Power Function

The test procedure developed in the previous sections depends on the latent roots, φ_1, φ_2, ..., φ_m, of the random matrix HE^{-1}. The distribution of these roots (see James [1964]), and hence the power function of our test procedure, depends upon the latent roots of the corresponding population matrix (Σ+M)Σ^{-1} as parameters. Let δ_1 ≥ δ_2 ≥ ... ≥ δ_m ≥ 1 be the latent roots of (Σ+M)Σ^{-1} and note that with T defined such that Σ = TT',

|(Σ+M)Σ^{-1} − δI| = 0

implies

|M − (δ−1)Σ| = 0,

so that |T^{-1}MT'^{-1} − (δ−1)I| = 0.

Since Σ is nonsingular, T is also nonsingular, and so the rank of T^{-1}MT'^{-1} is the same as the rank of M. Hence, M has rank of at most s−1 if and only if δ_s = 1, and testing the hypothesis H_0^(s): rank(M) ≤ s−1 against H_1^(s): rank(M) = s is equivalent to testing the hypothesis H_0^(s): δ_s = 1 against H_1^(s): δ_s > 1. A desirable property of the test statistic φ_s would be that it stochastically increases in δ_s, and thus, that the power function increases monotonically in δ_s. In this section we not only show that φ_s stochastically increases in δ_s, but also that it stochastically increases in each δ_i: i = 1,2,...,m. This more general result will be utilized in the following section.

We will first prove the result for the largest latent root, φ_1. That is, we will show that φ_1 stochastically increases in δ_i: i = 1,2,...,m.
Lemma 3.5.1: The test with the acceptance region

φ_1 = ch_1(HE^{-1}) ≤ c

has power function which is monotonically increasing in each population root δ_i.

The proof of Lemma 3.5.1 involves the following three results, the first of which is due to Anderson [1955].

Lemma 3.5.2: Let y ~ N_m(0,Σ_1) and u ~ N_m(0,Σ_2), where Σ_2 − Σ_1 is nonnegative definite. If ω is a convex set, symmetric about the origin, then P(y ∈ ω) ≥ P(u ∈ ω).










Lemma 3.5.3: Let the random vectors y_1, y_2, ..., y_n and the matrix U be mutually independent, the distribution of y_i being N_m(0,Σ): i = 1,2,...,n. Let the set ω, in the space of {y_1,y_2,...,y_n,U}, be convex and symmetric in each y_i given the other y_j's and U. Denote by P_{Σ_i}(ω) the probability of the set ω when Σ = Σ_i. Then whenever Σ_2 − Σ_1 is nonnegative definite, P_{Σ_1}(ω) ≥ P_{Σ_2}(ω).

Proof: Since Σ_1 and Σ_2 are symmetric, Σ_1 ∈ P_m, and Σ_2 ∈ ∪_{j=0}^m P_j, it follows that there exists a nonsingular matrix K such that KΣ_1K' = I and KΣ_2K' = Δ = diag(δ_1,δ_2,...,δ_m). Since it is assumed that Σ_2 − Σ_1 ∈ ∪_{j=0}^m P_j, we know that δ_i ≥ 1: i = 1,2,...,m. Then Ky_i ~ N_m(0,I) if Σ = Σ_1, and Ky_i ~ N_m(0,Δ) if Σ = Σ_2. Let ω* = {(Ky_1,Ky_2,...,Ky_n,U): (y_1,y_2,...,y_n,U) ∈ ω}; then P_{Σ_1}(ω) = P_I(ω*) and P_{Σ_2}(ω) = P_Δ(ω*). So without loss of generality we can take Σ_1 = I and Σ_2 = Δ. Let

Δ_i* = diag(δ_1', δ_2', ..., δ_{i−1}', 1, δ_{i+1}', ..., δ_m'),
Δ_i = diag(δ_1', δ_2', ..., δ_{i−1}', δ_i, δ_{i+1}', ..., δ_m'),
R_i = {y_i: (y_1,y_2,...,y_n,U) ∈ ω; y_j: j ≠ i and U fixed},

where δ_j' ∈ {1, δ_j}: j ≠ i. Then from Lemma 3.5.2 it follows that

P_{Δ_i*}(R_i | y_j: j ≠ i, U) ≥ P_{Δ_i}(R_i | y_j: j ≠ i, U).   (3.5.1)

Multiplying both sides of the inequality (3.5.1) by the joint density of the temporarily fixed variables and integrating with respect to them, we obtain

P_{Δ_i*}(ω) ≥ P_{Δ_i}(ω).

Then by induction we have

P_I(ω) ≥ P_Δ(ω),

or equivalently,

P_{Σ_1}(ω) ≥ P_{Σ_2}(ω).
Finally, the third result we need is due to Das Gupta,

Anderson, and Mudholkar [1964].

Lemma 3.5.4: For any symmetric matrix B(m×m) the region {A(m×n): ch_1(AA'B) ≤ c} is convex in A.

Proof (Lemma 3.5.1): Recall that H ~ W_m(Σ+M,h,0) and E ~ W_m(Σ,e,0). Since the problem is invariant under transformations g_K(E,H) = (KEK',KHK'), we may assume, without loss of generality, that Σ + M = Δ = diag(δ_1,δ_2,...,δ_m) and Σ = I. Then we can write H = YY', where Y = (y_1,y_2,...,y_h) and y_i ~ N_m(0,Δ): i = 1,2,...,h, independently. So the acceptance region can be written as {Y: ch_1(YY'E^{-1}) ≤ c}. From Lemma 3.5.4 it follows that the acceptance region is convex in Y, and clearly we see that the acceptance region is also symmetric in each of the column vectors of Y. Note that the vectors y_1,y_2,...,y_h and E are mutually independent, and the distribution of y_i is N_m(0,Δ). The result now follows from Lemma 3.5.3.

The main result of this section follows from a result

due to Anderson and Das Gupta [1964].











Lemma 3.5.5: Suppose V ~ W_m(Σ_1,v,0) and U ~ W_m(Σ_2,u,0), independently. Let λ_1 ≥ λ_2 ≥ ... ≥ λ_m be the latent roots of UV^{-1} and let ω be a set in the space of λ_1,λ_2,...,λ_m such that when a point (λ_1,λ_2,...,λ_m) is in ω, so is every point (λ_1*,λ_2*,...,λ_m*) for which λ_i* ≤ λ_i: i = 1,2,...,m. Then the probability of the set ω depends on Σ_1 and Σ_2 only through the latent roots of Σ_2Σ_1^{-1} and is a monotonically decreasing function of each of the latent roots of Σ_2Σ_1^{-1}.

Clearly, the set ω = {(φ_1,φ_2,...,φ_m): φ_s ≤ c} satisfies the conditions of Lemma 3.5.5, so it follows that the probability of the set ω is monotonically decreasing in each of the latent roots δ_1,δ_2,...,δ_m of (Σ+M)Σ^{-1}. In other words, the power function of the sth largest root test is a monotonically increasing function of δ_i: i = 1,2,...,m.

We now know that as δ_s → ∞, P(φ_s > c) increases monotonically. We will show that actually, as δ_s → ∞, P(φ_s > c) → 1, and hence, for sufficiently large values of δ_s the probability of rejecting H_0^(s): δ_s = 1 will be arbitrarily close to one. Recall that there exists a nonsingular matrix K such that KEK' ~ W_m(I,e,0) and KHK' ~ W_m(Δ,h,0). Let K_1(m×m) be such that

K_1 = diag(ak_1, ak_2, ..., ak_s, 1, ..., 1).

Note that K_1KHK'K_1' ~ W_m(K_1ΔK_1',h,0) and

K_1ΔK_1' = diag(a²k_1²δ_1, a²k_2²δ_2, ..., a²k_s²δ_s, δ_{s+1}, ..., δ_m),

so that as a → ∞, a²k_i²δ_i → ∞, and hence ch_i(K_1ΔK_1') → ∞, for i = 1,2,...,s. Thus, we need to show that P(ch_s(K_1KHK'K_1'(KEK')^{-1}) > c) → 1 as a → ∞. The following lemma provides the necessary result.

Lemma 3.5.6: Let V ~ W_m(Σ_1,v,0) and U ~ W_m(Σ_2,u,0), independently, and let

K_1(m×m) = diag(ak_1, ak_2, ..., ak_s, 1, ..., 1).

Then P(ch_s(K_1UK_1'V^{-1}) > c) → 1 as a → ∞.

Proof: Let

U = ( U_11  U_12 )
    ( U_21  U_22 ),

where U_11 is s×s, U_21 is (m−s)×s, U_12 is s×(m−s), and U_22 is (m−s)×(m−s). Similarly, define V_11, V_21, V_12, and V_22. Let F* be the s×m matrix with (F*)_ii = 1: i = 1,2,...,s and (F*)_ij = 0: i ≠ j, and let

K_2(s×s) = diag(k_1, k_2, ..., k_s).

Recall from Section 3.4 that

ch_s(K_1UK_1'V^{-1}) = max_{F∈S(m,s)} min{λ: |FK_1UK_1'F' − λFVF'| = 0}
≥ min{λ: |F*K_1UK_1'F*' − λF*VF*'| = 0}
= min{λ: |a²K_2U_11K_2 − λV_11| = 0}
= a² ch_s(K_2U_11K_2V_11^{-1}).

Thus,

P(ch_s(K_1UK_1'V^{-1}) > c) ≥ P(ch_s(K_2U_11K_2V_11^{-1}) > c/a²),

and K_2U_11K_2V_11^{-1} is positive definite with probability one, so that

lim_{a→∞} P(ch_s(K_1UK_1'V^{-1}) > c) ≥ lim_{a→∞} P(ch_s(K_2U_11K_2V_11^{-1}) > c/a²)
= P(ch_s(K_2U_11K_2V_11^{-1}) > 0) = 1.


3.6 The Limiting Distribution of φ_s

We have seen that the likelihood ratio test for testing the hypothesis H_0^(s): rank(M) ≤ s−1 against H_1^(s): rank(M) = s is based on the sth largest root, φ_s. However, if φ_s is to be used as a test statistic, it is necessary to compute the significance level, α, where

α = sup_{H_0^(s)} P(φ_s > c | H_0^(s)).

With δ_1 ≥ δ_2 ≥ ... ≥ δ_m as the latent roots of (Σ+M)Σ^{-1}, the null hypothesis can be written H_0^(s): δ_s = 1, or, more precisely, H_0^(s): δ_1 ≥ δ_2 ≥ ... ≥ δ_{s−1} ≥ 1, δ_s = δ_{s+1} = ... = δ_m = 1. We will write φ_{s:m}(δ_1,δ_2,...,δ_m) to indicate that φ_s is the sth largest of m roots and depends on the population roots δ_1,δ_2,...,δ_m. Then we may write α, the significance level, as

α = sup_{δ_1≥δ_2≥...≥δ_{s−1}≥1} P(φ_{s:m}(δ_1,δ_2,...,δ_{s−1},1,...,1) > c).

But we saw in the previous section that φ_s is stochastically increasing in each δ_i: i = 1,2,...,m. It then follows that

α = P(φ_{s:m}(∞,...,∞,1,...,1) > c),

where φ_{s:m}(∞,...,∞,1,...,1) denotes the random variable which has the limiting distribution of φ_{s:m}(δ_1,δ_2,...,δ_{s−1},1,...,1) as δ_i → ∞: i = 1,2,...,s−1. So the problem at hand is to determine the distribution of φ_{s:m}(∞,...,∞,1,...,1).

Recall that E ~ W_m(Σ,e,0), H ~ W_m(Σ+M,h,0), and there exists a matrix K such that KΣK' = I and K(Σ+M)K' = Δ = diag(δ_1,δ_2,...,δ_m). If we define E* and H* as

E* = Δ^{-1/2} KEK' Δ^{-1/2} ~ W_m(Δ^{-1},e,0),
H* = Δ^{-1/2} KHK' Δ^{-1/2} ~ W_m(I,h,0),

where Δ^{1/2} = diag(δ_1^{1/2},δ_2^{1/2},...,δ_m^{1/2}), then clearly φ_{s:m}(δ_1,δ_2,...,δ_m) = ch_s(HE^{-1}) = ch_s(H*E*^{-1}). Hence, if we let E_n ~ W_m(Λ_n^{-1},e,0), where Λ_n = diag(nδ_1,nδ_2,...,nδ_{s−1},1,...,1), then we need to find the limiting distribution of ch_s(H*E_n^{-1}) as n → ∞. Since we can write E_n = Y_nY_n', where Y_n = (y_1^(n),y_2^(n),...,y_e^(n)) and y_i^(n) ~ N_m(0,Λ_n^{-1}): i = 1,2,...,e, independently, we can restate the problem as that of determining the limiting distribution of ch_s(H*(Y_nY_n')^{-1}). Consider the following elementary result.

Lemma 3.6.1: If u_n ~ N(0,1/n), then u_n →_d u, where u is a degenerate random variable with all of its probability at zero.

We also need the following results, the first of which

is well known as the continuity theorem (see, for example,

Breiman [1968:236]).










Lemma 3.6.2: Let x_1,x_2,x_3,... be a sequence of random vectors. Then x_n →_d x if and only if

lim_{n→∞} E[exp(ix_n't)] = E[exp(ix't)]

for all t, where i = √−1.

Lemma 3.6.3: Suppose that as n → ∞, x_j^(n) →_d x_j: j = 1,2,...,m, and suppose {x_1^(n),...,x_m^(n)} are mutually independent for all n. Then

x^(n) = (x_1^(n)', x_2^(n)', ..., x_m^(n)')' →_d (x_1', x_2', ..., x_m')' = x,

where the components x_1, x_2, ..., x_m of x are mutually independent.

Proof: Note that it follows from Lemma 3.6.2 that

lim_{n→∞} E[exp(ix_j^(n)'t_j)] = E[exp(ix_j't_j)].

Also, because of independence,

E[exp(ix^(n)'t)] = E[exp(i Σ_{j=1}^m x_j^(n)'t_j)] = Π_{j=1}^m E[exp(ix_j^(n)'t_j)],

so

lim_{n→∞} E[exp(ix^(n)'t)] = Π_{j=1}^m lim_{n→∞} E[exp(ix_j^(n)'t_j)]
= Π_{j=1}^m E[exp(ix_j't_j)]
= E[exp(i Σ_{j=1}^m x_j't_j)]
= E[exp(ix't)].

The result now follows from Lemma 3.6.2.

From Lemma 3.6.1 and Lemma 3.6.3 we observe that Y_n →_d Y with

Y = ( Y_1 )
    ( Y_2 ),

where the elements of Y_1((s−1)×e) are all equal to zero with probability one, and Y_2 = (y_21,y_22,...,y_2e) with y_2i ~ N_{m−s+1}(0,I): i = 1,2,...,e, independently.

Consider the following result, the proof of which can be found in Ostrowski [1973:334].

Lemma 3.6.4: Let A(n×n) and B(n×n) be two matrices, and suppose the latent roots of A and B are λ_i and λ_i': i = 1,2,...,n, respectively. Put

N = max_{1≤i≤n, 1≤j≤n} (|a_ij|, |b_ij|)

and

δ = (1/(nN)) Σ_{i=1}^n Σ_{j=1}^n |a_ij − b_ij|.

Then to every root λ_i' of B there belongs a certain root λ_i of A such that we have

|λ_i − λ_i'| ≤ (n+2)Nδ^{1/n}.

Further, for a suitable ordering of λ_i and λ_i' we have

|λ_i − λ_i'| ≤ 2(n+1)²Nδ^{1/n}.











Lemma 3.6.5, Corollary: If A is an n×n matrix, then for each i, ch_i(A) is a continuous function of the elements of A.

Lemma 3.6.6, Corollary: Let A be an n×n matrix and B an n×p matrix. Then the roots of the equation

|A − λBB'| = 0   (3.6.1)

are continuous functions of the elements of A and B except at B such that |BB'| = 0.

Proof: Let λ_i: i = 1,2,...,n be the roots of (3.6.1). Then when |BB'| ≠ 0, it follows that these roots are also the latent roots of A(BB')^{-1}. So, from Lemma 3.6.5, for each i, λ_i is a continuous function of the elements of A(BB')^{-1}. But clearly, when |BB'| ≠ 0, the elements of A(BB')^{-1} are continuous functions of the elements of A and B. Hence, for each i, λ_i is a continuous function of A and B except when |BB'| = 0.

We need one final result involving the limiting distribution of a function of random vectors (see Mann and Wald [1943]).

Lemma 3.6.7: Let x_n →_d x, and let g(x) be a Borel measurable function such that the set R of discontinuity points of g(x) is closed and P(x ∈ R) = 0. Then

g(x_n) →_d g(x).
Now recall that we seek the limiting distribution of ch_s(H*(Y_nY_n')^{-1}). In order to use Lemma 3.6.7 it is necessary to show that ch_s(H*(YY')^{-1}) is continuous with probability one under the distribution of (H*,Y). Now with

H* = ( H_11  H_12 )
     ( H_21  H_22 ),

where H_11 is (s−1)×(s−1), H_12 is (s−1)×(m−s+1), H_21 is (m−s+1)×(s−1), and H_22 is (m−s+1)×(m−s+1), the roots under the distribution of (H*,Y) are the solutions to

| H* − φ ( (0)    (0)    ) | = 0.   (3.6.2)
         ( (0)  Y_2Y_2'  )

Since H* is nonsingular with probability one, we may put

H*^{-1} = G = ( G_11  G_12 )
              ( G_21  G_22 ),

so (3.6.2) can be written

| I_m − φG ( (0)    (0)    ) | = 0,
           ( (0)  Y_2Y_2'  )

or

| ( I_{s−1}   −φG_12Y_2Y_2'            ) | = 0.
  ( (0)       I_{m−s+1} − φG_22Y_2Y_2' )

Thus, it must be true that

|I_{m−s+1} − φG_22Y_2Y_2'| = 0,

or

|G_22^{-1} − φY_2Y_2'| = 0.   (3.6.3)

Then we see that with probability one ch_1(H*(YY')^{-1}), ch_2(H*(YY')^{-1}), ..., ch_{s−1}(H*(YY')^{-1}) are undefined, and ch_s(H*(YY')^{-1}) is now the largest solution to (3.6.3); that is, since YY' is of rank m−s+1 with probability one, there are only m−s+1 solutions to |H* − φYY'| = 0. It can be shown (see, for example, Graybill [1969:165]) that G_22 = (H_22 − H_21H_11^{-1}H_12)^{-1}, since H_22 − H_21H_11^{-1}H_12 is nonsingular with probability one, so that (3.6.3) can be written

|H_22 − H_21H_11^{-1}H_12 − φY_2Y_2'| = 0.

Clearly, Y_2Y_2' is also nonsingular with probability one, and hence by Lemma 3.6.6, ch_s(H*(YY')^{-1}) is continuous with probability one under the distribution of (H*,Y). The set of discontinuity points, R, is closed, since R = {(H*,Y): |Y_2Y_2'| = 0}. Note also that, as is well known (see, for example, Anderson [1958:85]), H_22 − H_21H_11^{-1}H_12 ~ W_{m−s+1}(I,h−s+1,0). Therefore, from Lemma 3.6.7, since (H*,Y_n) →_d (H*,Y), it follows that

φ_{s:m}(∞,...,∞,1,...,1) ~ φ_{1:m−s+1}(1,1,...,1),

where φ_{1:m−s+1}(1,1,...,1) denotes the distribution of the largest root of |W_1 − φW_2| = 0, where W_1 ~ W_{m−s+1}(I,h−s+1,0) and W_2 ~ W_{m−s+1}(I,e,0), independently.

So in testing H_0^(s): rank(M) ≤ s−1 against H_1^(s): rank(M) = s we choose as our critical value c(α,m,s), where P(ch_1(W_1W_2^{-1}) > c(α,m,s)) = α. By so doing we will guarantee that

sup_{H_0^(s)} P(φ_{s:m}(δ_1,δ_2,...,δ_m) > c(α,m,s) | H_0^(s)) = α.

Charts and tables of the distribution of the largest root, θ_1, of |W_1 − θ(W_1+W_2)| = 0 are available (see, for example, Morrison [1976:379], Pillai [1965,1967]). These may be used to calculate c(α,m,s), since θ_1 = φ_1/(1+φ_1), where φ_1 is the largest root of |W_1 − φW_2| = 0.
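When the charts and tables are not at hand, c(α,m,s) can also be approximated by simulation directly from the limiting form just derived: it is the upper-α point of the largest root of W_1W_2^{-1} with W_1 ~ W_{m−s+1}(I,h−s+1,0) and W_2 ~ W_{m−s+1}(I,e,0). A sketch of such a simulation (Python with scipy; it assumes h − s + 1 ≥ m − s + 1 so that W_1 is nonsingular, and the number of replications is arbitrary):

```python
# Monte Carlo approximation of c(alpha, m, s): the upper-alpha point of the
# largest root of W1 W2^{-1}, where W1 ~ W_{m-s+1}(I, h-s+1) and W2 ~ W_{m-s+1}(I, e).
import numpy as np
from scipy.stats import wishart
from scipy.linalg import eigh

def crit_value(alpha, m, s, h, e, reps=20000, seed=0):
    p = m - s + 1                                   # dimension of the limiting problem
    rng = np.random.default_rng(seed)
    roots = np.empty(reps)
    for k in range(reps):
        W1 = np.atleast_2d(wishart.rvs(df=h - s + 1, scale=np.eye(p), random_state=rng))
        W2 = np.atleast_2d(wishart.rvs(df=e, scale=np.eye(p), random_state=rng))
        roots[k] = eigh(W1, W2, eigvals_only=True)[-1]   # largest root of |W1 - phi W2| = 0
    return np.quantile(roots, 1.0 - alpha)

# With h = 20 and e = 105, crit_value(0.05, 4, 3, 20, 105) should be near the
# chart value .36 quoted below, up to simulation error.
```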

In order to determine the rank of M, a sequential test procedure is used. To illustrate this procedure, we will return to the example presented in Section 2.3. Recall that D = diag(142.729, 29.6669, .91847, .625404), h = 20, e = 105, so that φ_1 = 27.1865, φ_2 = 5.65084, φ_3 = .174947, and φ_4 = .119125. First we consider testing the hypothesis H_0^(4): rank(M) ≤ 3 against H_1^(4): rank(M) = 4. The null hypothesis, H_0^(4), is rejected if φ_4 > c(.05,4,4), where c(.05,4,4) = 17 F(17,105,.05)/105, and F(17,105,.05) is the constant for which P(F(17,105) > F(17,105,.05)) = .05 when F(17,105) has the central F distribution with 17 and 105 degrees of freedom. Thus, c(.05,4,4) is approximately equal to .28, and clearly, φ_4 = .119125 < .28, so that we do not reject H_0^(4). Since H_0^(4) is not rejected, we now consider testing the hypothesis H_0^(3): rank(M) ≤ 2 against H_1^(3): rank(M) = 3. The null hypothesis, H_0^(3), is rejected if φ_3 > c(.05,4,3). Using the charts mentioned earlier we find that c(.05,4,3) is approximately equal to .36. Since φ_3 = .174947 < .36, we do not reject H_0^(3), and so next consider testing the hypothesis H_0^(2): rank(M) ≤ 1 against H_1^(2): rank(M) = 2. We find that c(.05,4,2) is approximately equal to .42, and therefore, since φ_2 = 5.65084 > .42, we reject H_0^(2) and conclude that the rank of M could very reasonably be taken as two.









The procedure above is open to objections on the grounds

that the significance level for the test criterion has not

been adjusted to take into account the fact that a sequence

of hypotheses is being tested, with each one dependent on

the previous ones not being rejected. The mathematical com-

plications involved in controlling the overall error make

such an adjustment virtually impossible to carry out.
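For completeness, the bookkeeping of the sequential procedure can be written out as a short routine: beginning with s = m, compare φ_s with c(α,m,s) and step down until some hypothesis is rejected, taking the rank of M to be the first s at which rejection occurs (or zero if none is rejected). The sketch below (Python) simply restates that logic; the critical values entered for the example are the approximate values read from the charts above, and crit stands for whatever routine supplies c(α,m,s).

```python
# Sequential determination of the rank of M: test H0^(s) for s = m, m-1, ..., 1
# and stop at the first rejection.  phi must be ordered phi_1 >= ... >= phi_m.
def estimate_rank(phi, crit):
    """phi: the roots of |H - phi E| = 0; crit(s): the critical value c(alpha, m, s)."""
    m = len(phi)
    for s in range(m, 0, -1):
        if phi[s - 1] > crit(s):          # reject H0^(s): rank(M) <= s - 1
            return s
    return 0                              # nothing rejected: take rank(M) = 0

# Values from the example above; the table holds only the critical values used there.
phi = [27.1865, 5.65084, 0.174947, 0.119125]
c_table = {4: 0.28, 3: 0.36, 2: 0.42}
print(estimate_rank(phi, lambda s: c_table.get(s, float("inf"))))   # prints 2
```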















CHAPTER 4


MAXIMIZATION OF THE LIKELIHOOD FUNCTION
WHEN Σ = σ²I



4.1 The Likelihood Function

Suppose the vectors x_ij(m×1): i = 1,2,...,g; j = 1,2,...,n can be modeled by

x_ij = μ + Lf_i + z_ij,   (4.1.1)

wherein μ(m×1) is a fixed but unknown vector, L(m×p) is a fixed but unknown matrix, f_i ~ N_p(0,I): i = 1,2,...,g, and z_ij ~ N_m(0,σ²I): i = 1,2,...,g; j = 1,2,...,n. We assume that the set of random vectors {f_1,f_2,...,f_g,z_11,...,z_gn} are mutually independent. Thus, x_ij ~ N_m(μ,V) with V = LL' + σ²I. For any orthogonal matrix P(p×p) it follows that V = LL' + σ²I = LP(LP)' + σ²I, so that, while LL' is unique, L is not unique. In this section we will derive the likelihood function for μ, LL', and σ².

By methods identical to those presented in Section 2.1 it can be shown that (x̄..,E,H) is sufficient for (μ,σ²,nLL'), where

x̄.. = Σ_{i=1}^g Σ_{j=1}^n x_ij/gn ~ N_m(μ, (1/gn)(σ²I + nLL')),

E = Σ_{i=1}^g Σ_{j=1}^n (x_ij − x̄_i.)(x_ij − x̄_i.)' ~ W_m(σ²I,e,0),

H = n Σ_{i=1}^g (x̄_i. − x̄..)(x̄_i. − x̄..)' ~ W_m(σ²I + nLL',h,0),

and e = g(n−1); h = g−1. In addition, if we let c denote a constant, we find that the density of E can be written

f(E) = c|E|^((e−m−1)/2) exp[−(1/2)tr(σ²I)^{-1}E]
= c|E|^((e−m−1)/2) exp[−(Σ_{i=1}^m e_ii)/2σ²]
= g_1(Σ_{i=1}^m e_ii; σ²) g_2(E).

Hence, from the set {e_11,e_12,...,e_{m,m−1},e_mm}, b = trE = Σ_{i=1}^m e_ii is sufficient for σ².

We may assume, then, that we have, independently, x̄.., b, and H, where

x̄.. ~ N_m(μ, (1/gn)(σ²I + nLL')),
b/σ² ~ χ²_β,
H ~ W_m(σ²I + nLL', h, 0),

and β = mg(n−1). The problem is to estimate μ, σ², and LL', or equivalently, to estimate μ, σ², and M where M = nLL'. We have seen that L is not uniquely defined, and so any L̂ satisfying L̂L̂' = M̂/n, where M̂ is an estimate of M, serves as an estimate of L. The likelihood function of (μ,σ²,M) can be expressed as

f(x̄..,b,H) = [K_m(I,h)(gn)^{m/2} / (2^{β/2}Γ(β/2)(2π)^{m/2})] b^{β/2−1} |H|^((h−m−1)/2) (σ²)^{−β/2} |σ²I+M|^{−(h+1)/2}
× exp[−gn(x̄..−μ)'(σ²I+M)^{-1}(x̄..−μ)/2 − b/2σ² − (1/2)tr(σ²I+M)^{-1}H],









where, as before,

K_m^{-1}(I,h) = 2^{mh/2} π^{m(m−1)/4} Π_{j=1}^m Γ((h−j+1)/2).

The logarithm of the likelihood function, omitting a function of the observations, is

−b/2σ² − (β/2)ln σ² − (1/2)tr(σ²I+M)^{-1}H − ((h+1)/2)ln|σ²I+M| − (gn/2)(x̄..−μ)'(σ²I+M)^{-1}(x̄..−μ).

We seek the solution which maximizes the expression above, or equivalently, the solution which minimizes

b/σ² + β ln σ² + tr(σ²I+M)^{-1}H + (h+1)ln|σ²I+M| + gn(x̄..−μ)'(σ²I+M)^{-1}(x̄..−μ).   (4.1.2)

If we ignore the constraints that σ² is positive and M is nonnegative definite and seek the stationary values of (4.1.2) over all possible (μ,σ²,M), we find, upon taking the partial derivatives of (4.1.2) with respect to σ², M, and μ and setting them equal to zero, that

−b/(σ²)² + β/σ² − tr[(σ²I+M)^{-1}H(σ²I+M)^{-1}] + (h+1)tr(σ²I+M)^{-1}
− tr[gn(σ²I+M)^{-1}(x̄..−μ)(x̄..−μ)'(σ²I+M)^{-1}] = 0;

−(σ²I+M)^{-1}H(σ²I+M)^{-1} + (h+1)(σ²I+M)^{-1} − gn(σ²I+M)^{-1}(x̄..−μ)(x̄..−μ)'(σ²I+M)^{-1} = (0);

gn(σ²I+M)^{-1}(x̄..−μ) = 0;

for which the solutions are

μ̂ = x̄..,
σ̂² = b/β,
M̂ = (h+1)^{-1}H − (b/β)I.









Since M is a nonnegative definite matrix, its maximum likelihood estimate must also be nonnegative definite, so the solutions above are the maximum likelihood estimates only if (h+1)^{-1}H − (b/β)I is nonnegative definite. Although the solutions for μ and σ² are the natural unbiased estimates, the solution for M is not, since E(M̂) = (h+1)^{-1}(hM − σ²I). In addition, we observe that E(M̂) is also not necessarily nonnegative definite.

Using the principle of marginal sufficiency referred to in Chapter 2, we see that (b,H) is marginally sufficient for (σ²,M). Hence, we choose to use the marginal likelihood function of (σ²,M) instead of the likelihood function of (μ,σ²,M). The marginal likelihood function of (σ²,M) can be expressed as

f(b,H) = [K_m(I,h) / (2^{β/2}Γ(β/2))] b^{β/2−1} |H|^((h−m−1)/2) (σ²)^{−β/2} |σ²I+M|^{−h/2} exp[−b/2σ² − (1/2)tr(σ²I+M)^{-1}H].

The logarithm of the likelihood, omitting a function of the observations, is

−b/2σ² − (β/2)ln σ² − (1/2)tr(σ²I+M)^{-1}H − (h/2)ln|σ²I+M|,

and we seek the solution which maximizes this expression, or equivalently, the solution which minimizes

b/σ² + β ln σ² + tr(σ²I+M)^{-1}H + h ln|σ²I+M|.   (4.1.3)

Again, if we ignore the constraints that σ² is positive and M is nonnegative definite and seek the stationary value of (4.1.3) over all possible (σ²,M), we find, upon taking the partial derivatives of (4.1.3) with respect to σ² and M and setting them equal to zero, that

−b/(σ²)² + β/σ² − tr[(σ²I+M)^{-1}H(σ²I+M)^{-1}] + h tr(σ²I+M)^{-1} = 0,

and

−(σ²I+M)^{-1}H(σ²I+M)^{-1} + h(σ²I+M)^{-1} = (0),

for which the solutions are

σ²_* = b/β,
M_* = (1/h)H − (b/β)I.

Note that these solutions are the natural unbiased estimates of σ² and M and, clearly, E(M_*) = M is nonnegative definite. Hence, we choose to continue our work with the marginal likelihood function of (σ²,M). Since M is nonnegative definite, the solutions above are the maximum likelihood estimates only if (1/h)H − (b/β)I is also nonnegative definite. In the next section we will derive maximum likelihood estimates for σ² and M which are valid for all possible (b,H).
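The stationary point of the marginal likelihood is immediate to compute, and whether (1/h)H − (b/β)I is nonnegative definite can be checked through its smallest characteristic root. A minimal sketch (Python with numpy; b, H, β, and h are assumed to have been formed from the data as above):

```python
# Stationary point of the marginal likelihood of (sigma^2, M):
# sigma^2_* = b/beta and M_* = (1/h) H - (b/beta) I; it is the MLE only when
# M_* turns out to be nonnegative definite.
import numpy as np

def marginal_stationary_point(b, H, beta, h):
    m = H.shape[0]
    sigma2_star = b / beta
    M_star = H / h - sigma2_star * np.eye(m)
    is_nnd = np.linalg.eigvalsh(M_star).min() >= 0.0
    return sigma2_star, M_star, is_nnd
```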


4.2 The Maximum Likelihood Estimates

In this section we seek the maximum likelihood estimates
2 2 s
of a and M subject to the constraints a > 0 and M U P..
j=0
Recall that, aside from a constant, the logarithm of the

likelihood function of (2 ,M) is

-b/2o2- Betnc-tr(c2I+M) -H-hZnl 2I+Ml

We seek the solution which maximizes the equation above, or

equivalently, the solution which minimizes

b/a +Bnca +tr( 2I+M)-1H+hinl 12+Ml









2 s
subject to a > 0 and ME U P.. Note that this can be
j=0 3
rewritten as

tr( 2I)-1 (I)+ inl o2I +tr(c2 I+M) -1H+hinj 2I+M (4.2.1)
m m
Since (b/B)I and H, = (l/h)H are both symmetric matrices,
m
and (b/B)I eP and HE U P., there exists a nonsingular
j=0
matrix K(mxm) such that K((b/B)I)K' =I and KH,K' = D where

D = diag(dl,d2, .. ,dm) and dl>d2> ... >dm > 0 are the latent

roots of H((b/B)I)- = (B/b)H,. Then with 2 = BG /b and

M = KMK', (4.2.1) can be rewritten


-tr K' (a 2I)-1 K + ,n I o2II+htrK'- ( 2I+M)- K- D
m m
+ hnlao2I+M

= [tr(2 I)-l+in 2II]+h[tr(2I+M)-D+n]2I+M]

(-+h) n K 2
m

= (&2I,2 I+M;D,Ih)-t( +h)nlK 12,

where c is the function discussed in Section 2.2. Thus, the
,2 -2
problem has been reduced to that of minimizing ( 2 Ia I+M;
o 2 ~ s
D,-,h) subject to 2 > 0 and M U P., or equivalently,
m j0 j=0
2 2 s
(8 I,B I+M) Cs since C = {(A,B): AEP BEP B-AE U P.}.
s s m m j=0

Now for fixed (02I,B 2I+M) E C consider t(P(2 I)P',
2- B 2 2- B
P( 2I+M)P';D,-,h) = (2 I,('2I+M)P';D,-,h) for all orthog-
r m
onal P. Note that this is minimized with respect to P when

tr P(2 12+)- P'D is minimized. So from Lemma 2.2.1 it fol-

lows that all stationary points, and therefore the absolute

minimum, of (2 I,P(o2I+M)P';D, ,h) occur when P(2 I+M)P'


__











is diagonal. Hence, in searching for the absolute minimum

of (a&2I,32I+M;D,I,h) over all (o21,21 +M) E C we may assume
m s
that 2 I+M is diagonal. This result also follows immediately

from Lemma 2.2.12.

Now with V = diag(v_1,v_2,...,v_m) and f_i(v_i) = d_i/v_i + ln v_i, consider minimizing

ℓ(uI,V;D,β/m,h) = β(1/u + ln u) + h Σ_{i=1}^m (d_i/v_i + ln v_i)   (4.2.2)
= β(1/u + ln u) + h Σ_{i=1}^m f_i(v_i),

subject to (uI,V) ∈ C_s. The constraint (uI,V) ∈ C_s can be equivalently written as

v_i ≥ u > 0 for i = 1,2,...,m,   (4.2.3)

and

v_i = u for i ∈ J,   (4.2.4)

where J ⊂ {1,2,...,m} is a set which has at least m − s elements. Now

df_i(v_i)/dv_i = (1 − d_i/v_i)/v_i,

so that the function f_i decreases monotonically in v_i for v_i ∈ (0,d_i], increases monotonically in v_i for v_i ∈ [d_i,∞), and is minimized over all v_i ∈ (0,∞) when v_i = d_i. Thus, the unrestricted minimum of (4.2.2) occurs when u = 1 and V = D. It is evident from the structure of f_i that if the unrestricted minimum does not satisfy the constraints (4.2.3) and (4.2.4), then the restricted minimum will occur when u = v_{i_1} = v_{i_2} = ... = v_{i_k} for some set of integers {i_1,i_2,...,i_k} ⊂ {1,2,...,m}. We need to determine k, the number of integers, and also we need to know exactly which k integers from amongst the integers 1,2,...,m comprise the set {i_1,i_2,...,i_k}.

First, we will consider the constraint given by (4.2.3). Let the variable r be defined in the following manner. If 1 ≤ d_m ≤ d_{m−1} ≤ ... ≤ d_1, then let r = 0. If d_m ≤ ... ≤ d_{t+1} < 1 ≤ d_t ≤ ... ≤ d_1, then let r, 1 ≤ r ≤ m−t, be the smallest value for which

d_{m−r} > (β + h Σ_{j=m−r+1}^m d_j)/(β + rh).

Finally, if d_m ≤ d_{m−1} ≤ ... ≤ d_1 < 1, then let r, 1 ≤ r ≤ m−1, be the smallest value for which the inequality above is satisfied. If the inequality is not satisfied for any choice of r, 1 ≤ r ≤ m−1, then let r = m. Now if r = 0, the minimum of (4.2.2) subject to (4.2.3) is simply the unrestricted minimum of (4.2.2), and if r > 0, the minimum of (4.2.2) subject to (4.2.3) is just the minimum of (4.2.2) subject to u = v_m = v_{m−1} = ... = v_{m−r+1}, which occurs at

u = v_m = ... = v_{m−r+1} = (β + h Σ_{j=m−r+1}^m d_j)/(β + rh),
v_i = d_i for i = 1,2,...,m−r.

Now consider the constraint given by (4.2.4). If r ≥ m − s, then the minimum of (4.2.2) subject to (4.2.3) and (4.2.4) is simply the minimum of (4.2.2) subject to (4.2.3). If r < m − s, the minimum of (4.2.2) subject to (4.2.3) and (4.2.4) is obtained by minimizing (4.2.2) subject to

u = v_{j_1} = v_{j_2} = ... = v_{j_{m−s}} if r = 0,
u = v_m = ... = v_{m−r+1} = v_{j_1} = v_{j_2} = ... = v_{j_{m−s−r}} if r > 0,   (4.2.5)

where j_1, j_2, ... are distinct elements of {1,2,...,m−r−1,m−r}. We will now show that, in fact, j_1 = m−r, j_2 = m−r−1, ..., j_{m−s−r} = s+1. Note that for q = 1,2,...,m−1, (4.2.2) is minimized subject to u = v_m = ... = v_{m−q+1} when

u = v_m = ... = v_{m−q+1} = (β + h Σ_{j=m−q+1}^m d_j)/(β + qh),
v_j = d_j for j = 1,2,...,m−q,

and has a minimal value equal to

(β+qh) ln[(β + h Σ_{j=m−q+1}^m d_j)/(β+qh)] + (β+qh) + h(m−q) + h Σ_{j=1}^{m−q} ln d_j.   (4.2.6)

Similarly, (4.2.2) is minimized subject to u = v_m = ... = v_{m−q+1} = v_i, where i ∈ {1,2,...,m−q−1,m−q}, when

u = v_m = ... = v_{m−q+1} = v_i = (β + h Σ_{j=m−q+1}^m d_j + hd_i)/(β + (q+1)h),
v_j = d_j for j = 1,...,i−1,i+1,...,m−q,

and has a minimal value equal to

(β+(q+1)h) ln[(β + h Σ_{j=m−q+1}^m d_j + hd_i)/(β+(q+1)h)] + (β+(q+1)h) + h(m−q−1) + h Σ_{j=1,j≠i}^{m−q} ln d_j.   (4.2.7)










Now subtracting (4.2.6) from (4.2.7), we obtain

(β+(q+1)h) ln[(β + h Σ_{j=m−q+1}^m d_j + hd_i)/(β+(q+1)h)] − (β+qh) ln[(β + h Σ_{j=m−q+1}^m d_j)/(β+qh)] − h ln d_i,   (4.2.8)

which is the increase in the minimal value of (4.2.2) due to the additional constraint u = v_i. Differentiation of (4.2.8) with respect to d_i yields

h(β+(q+1)h)/(β + h Σ_{j=m−q+1}^m d_j + hd_i) − h/d_i,

which is negative when d_i < (β + h Σ_{j=m−q+1}^m d_j)/(β+qh) and positive when d_i > (β + h Σ_{j=m−q+1}^m d_j)/(β+qh). Hence, (4.2.8) is an increasing function of d_i when d_i > (β + h Σ_{j=m−q+1}^m d_j)/(β+qh), so that if d_{m−q} > (β + h Σ_{j=m−q+1}^m d_j)/(β+qh), choosing i = m−q will yield a smaller minimum value than any other choice of i < m−q. In a similar manner, subtracting the unrestricted minimal value of (4.2.2) from the minimal value of (4.2.2) subject to u = v_i, where i ∈ {1,2,...,m}, we obtain

(β+h) ln[(β + hd_i)/(β+h)] − h ln d_i,

which is an increasing function of d_i for d_i > 1. Thus, if d_m ≥ 1, choosing i = m will yield a smaller minimum value than any other choice of i.

We can now identify the optimal choice of the indices j_1, j_2, ... in (4.2.5) when r < m − s. If r = 0, then d_m ≥ 1, so that m − r = m is the optimal choice for j_1. Further, since d_i ≥ 1 for i = 1,2,...,m when r = 0, we have

(β + h Σ_{j=m−q+1}^m d_j)/(β+qh) ≤ (β+qh)(Σ_{j=m−q+1}^m d_j/q)/(β+qh) = Σ_{j=m−q+1}^m d_j/q ≤ d_{m−q}

for q = 1,2,...,m−1, and hence, when r = 0, choosing j_1 = m, j_2 = m−1, ..., j_{m−s} = s+1 in (4.2.5) will yield a smaller minimum than any other choice of {j_1,j_2,...,j_{m−s}} ⊂ {1,2,...,m−1,m}. Now from the definition of r we see that d_{m−r} > (β + h Σ_{j=m−r+1}^m d_j)/(β+rh) if 1 ≤ r ≤ m−1. In addition, for q = 1,2,...,m−2, if d_{m−q} > (β + h Σ_{j=m−q+1}^m d_j)/(β+qh), then

(β + h Σ_{j=m−q}^m d_j)/(β+(q+1)h) = [(β+qh)·(β + h Σ_{j=m−q+1}^m d_j)/(β+qh) + hd_{m−q}]/(β+(q+1)h)
< [(β+qh)d_{m−q} + hd_{m−q}]/(β+(q+1)h)
= d_{m−q} ≤ d_{m−q−1}.

Thus, d_{m−q} > (β + h Σ_{j=m−q+1}^m d_j)/(β+qh) for q = r,r+1,...,m−1. It then follows that, when 1 ≤ r < m−s, choosing j_1 = m−r, j_2 = m−r−1, ..., j_{m−s−r} = s+1 in (4.2.5) will yield a smaller minimum than any other choice of {j_1,j_2,...,j_{m−s−r}} ⊂ {1,2,...,m−r−1,m−r}.

We can now obtain the minimal solution to (4.2.2) subject to (4.2.3) and (4.2.4). Denoting the minimal solution by (u_s, V_s), we find that if r ≥ m − s,

u_s = v_{s,m} = v_{s,m−1} = ... = v_{s,m−r+1} = (β + h Σ_{j=m−r+1}^m d_j)/(β + rh),
v_{s,j} = d_j for j = 1,2,...,m−r,

and if r < m − s,

u_s = v_{s,m} = v_{s,m−1} = ... = v_{s,s+1} = (β + h Σ_{j=s+1}^m d_j)/(β + (m−s)h),
v_{s,j} = d_j for j = 1,2,...,s.

Thus, ℓ(σ̃²I, σ̃²I+M̃; D, β/m, h) is minimized subject to (σ̃²I, σ̃²I+M̃) ∈ C_s at

σ̃² = u_s,
M̃ = V_s − u_sI.

The maximum likelihood estimates of σ² and M are, therefore, σ̂² and M̂ given by

σ̂² = bu_s/β,
M̂ = K^{-1}(V_s − u_sI)K'^{-1}.
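Once the roots d_1 ≥ d_2 ≥ ... ≥ d_m and the matrix K are available, the whole construction reduces to a small amount of pooling arithmetic: determine r from (4.2.3), pool down to s+1 if r < m − s, and transform back through K. The following sketch (Python with numpy) follows the formulas of this section; it is an illustration of the computation only, and assumes d is already sorted in decreasing order.

```python
# Constrained maximum likelihood estimates of sigma^2 and M when Sigma = sigma^2 I.
# d = (d_1 >= ... >= d_m) are the latent roots of (beta/b) H_*, and K satisfies
# K((b/beta) I)K' = I and K H_* K' = diag(d).
import numpy as np

def pooled_value(d, beta, h, k):
    """u when v_m = ... = v_{m-k+1} = u: (beta + h * sum of the k smallest d) / (beta + k h)."""
    return (beta + h * d[len(d) - k:].sum()) / (beta + k * h)

def constrained_mle(d, K, b, beta, h, s):
    d = np.asarray(d, dtype=float)
    m = len(d)
    # r: the smallest amount of pooling for which d_{m-r} exceeds the pooled value.
    r = 0
    if d[m - 1] < 1.0:
        r = m
        for k in range(1, m):
            if d[m - k - 1] > pooled_value(d, beta, h, k):
                r = k
                break
    k = max(r, m - s)                       # the rank constraint forces at least m - s pooled
    u = pooled_value(d, beta, h, k) if k > 0 else 1.0
    V = d.copy()
    if k > 0:
        V[m - k:] = u
    Kinv = np.linalg.inv(K)
    sigma2_hat = b * u / beta
    M_hat = Kinv @ np.diag(V - u) @ Kinv.T
    return sigma2_hat, M_hat

# With the example of this section, constrained_mle(d, K, b, 420, 20, 2) would
# reproduce the estimates subject to rank(M) <= 2 given below.
```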

To illustrate the computation involved in deriving the maximum likelihood estimates, we will again consider the example presented in Section 2.3. Recall that with m = 4, g = 21, n = 6, Σ = I, and M = diag(99, 24, 0, 0), a matrix E from the distribution W_4(I, 105, 0) and a matrix H from the distribution W_4(I+M, 20, 0) were generated. With β = mg(n−1) = 420, b = trE, and H_* = (1/20)H, we need to find a nonsingular matrix K such that K((b/β)I)K' = I and KH_*K' = D, where D is a diagonal matrix. Let D_1 = diag(ch_1(H_*), ch_2(H_*), ..., ch_m(H_*)), and let P be the orthogonal matrix for which the ith column is the characteristic vector of H_* corresponding to ch_i(H_*); then, since H_* is symmetric, P'H_*P = D_1. Clearly, ((β/b)^(1/2)P)'((b/β)I)((β/b)^(1/2)P) = P'P = I and ((β/b)^(1/2)P)'H_*((β/b)^(1/2)P) = (β/b)D_1, so we may take K = (β/b)^(1/2)P'. Hence, we find that, for our example,

K = (  1.00723      .0551967    −.00906271   −.00104477 )
    (  .0551796    −1.00719     −.00310948    .0127707  )
    (  .00921922   −.00261055    1.00874     −.00010739 )
    ( −.000345628  −.0128084    −.000137374  −1.0087    )

and D = diag(94.1065, 34.8845, 1.01721, .618312). Note that d_4 < 1 ≤ d_3, so that r = 1. Computing (u_s, V_s) for s = 0, 1, 2, 3, 4 yields u_0 = 6.06506, V_0 = 6.06506 I, u_1 = 2.39667, V_1 = diag(94.1065, 2.39667, 2.39667, 2.39667), u_2 = .984153, V_2 = diag(94.1065, 34.8845, .984153, .984153), and u_3 = u_4 = .982651, V_3 = V_4 = diag(94.1065, 34.8845, 1.01721, .982651). Thus, if we let σ̂_i² and M̂_i denote the maximum likelihood estimates of σ² and M, respectively, subject to the constraints σ² > 0 and M ∈ ∪_{j=0}^i P_j, we see that

σ̂_0² = 5.95987,   M̂_0 = (0),

σ̂_1² = 2.3551,

M̂_1 = ( 89.8421   4.92338    −.808366     −.0931905   )
      (           .269803    −.0442987    −.00510687  )
      (                       .00727337    .000838493 )
      (                                     .0000966637)

(here and below only the upper triangle of each symmetric matrix is shown),

σ̂_2² = .967085,

M̂_2 = ( 91.3255   3.17993    −.826433     −.0715582   )
      (           33.4811     .0575388    −.426237    )
      (                       .00770191   −.000448497 )
      (                                     .0054369   )

σ̂_3² = σ̂_4² = .965608,

M̂_3 = M̂_4 = ( 91.327    3.17993    −.826136    −.0715587   )
            (           33.4825     .0574547   −.426256    )
            (                        .0416617  −.000452157 )
            (                                    .00543713  )


4.3 The Likelihood Ratio Test
Recall that C_s = {(A,B): A ∈ P_m, B ∈ P_m, B−A ∈ ∪_{j=0}^s P_j}, and suppose we know that (σ²I, σ²I+M) ∈ Ω = C_s. We wish to test, say, the null hypothesis that (σ²I, σ²I+M) ∈ ω = C_{s−1} ⊂ C_s. The alternative hypothesis, then, is that (σ²I, σ²I+M) ∈ Ω − ω = C_s − C_{s−1}. Thus, we are testing the hypothesis

H_0^(s): rank(M) ≤ s − 1

against the hypothesis

H_1^(s): rank(M) = s.










We adopt the likelihood approach and look at

λ = max_ω f(b,H)/max_Ω f(b,H) ∈ (0,1].

With u_s and the matrix V_s = diag(v_{s1},v_{s2},...,v_{sm}) given by

u_s = v_{sm} = ... = v_{s,m−r+1} = (β + h Σ_{j=m−r+1}^m d_j)/(β + rh),
v_{sj} = d_j for j = 1,2,...,m−r,

if r ≥ m − s, and

u_s = v_{sm} = ... = v_{s,s+1} = (β + h Σ_{j=s+1}^m d_j)/(β + (m−s)h),
v_{sj} = d_j for j = 1,2,...,s,

if r < m − s, the maximum likelihood estimators σ̂_Ω², of σ², and M̂_Ω, of M, when the parameters are restricted to lie within Ω, are given by

σ̂_Ω² = bu_s/β,
M̂_Ω = K^{-1}(V_s − u_sI)K'^{-1},

where K is a nonsingular matrix. Similarly, the maximum likelihood estimators σ̂_ω², of σ², and M̂_ω, of M, when the parameters are restricted to lie within ω, are given by

σ̂_ω² = bu_{s−1}/β,
M̂_ω = K^{-1}(V_{s−1} − u_{s−1}I)K'^{-1}.

It should be noted that if r ≥ m − s + 1, then V_s = V_{s−1} and u_s = u_{s−1}.










The likelihood ratio, λ, is

λ = max_ω f(b,H)/max_Ω f(b,H)

= {(σ̂_ω²)^{−β/2} |σ̂_ω²I+M̂_ω|^{−h/2} exp[−b/2σ̂_ω² − (1/2)tr(σ̂_ω²I+M̂_ω)^{-1}H]} / {(σ̂_Ω²)^{−β/2} |σ̂_Ω²I+M̂_Ω|^{−h/2} exp[−b/2σ̂_Ω² − (1/2)tr(σ̂_Ω²I+M̂_Ω)^{-1}H]}

= [u_s^{β/2} |V_s|^{h/2} / (u_{s−1}^{β/2} |V_{s−1}|^{h/2})] exp{−(1/2)[β(u_{s−1}^{-1} − u_s^{-1}) + h tr(V_{s−1}^{-1} − V_s^{-1})D]}

= u_s^{β/2} |V_s|^{h/2} / (u_{s−1}^{β/2} |V_{s−1}|^{h/2}),

since, if r ≥ m − s + 1, u_{s−1} = u_s and V_{s−1} = V_s, so that

β(u_{s−1}^{-1} − u_s^{-1}) + h tr(V_{s−1}^{-1} − V_s^{-1})D   (4.3.1)

is trivially zero, and if r < m − s + 1, (4.3.1) becomes

{β(β+(m−s+1)h)/(β + h Σ_{j=s}^m d_j) + h(s−1) + h(β+(m−s+1)h) Σ_{j=s}^m d_j/(β + h Σ_{j=s}^m d_j)}
− {β(β+(m−s)h)/(β + h Σ_{j=s+1}^m d_j) + hs + h(β+(m−s)h) Σ_{j=s+1}^m d_j/(β + h Σ_{j=s+1}^m d_j)}

= [(β+(m−s+1)h) + h(s−1)] − [(β+(m−s)h) + hs] = 0.









So we have

λ = d_s^{h/2} [(β+(m−s+1)h)^{(β+(m−s+1)h)/2} / (β+(m−s)h)^{(β+(m−s)h)/2}] (β + h Σ_{j=s+1}^m d_j)^{(β+(m−s)h)/2} / (β + h Σ_{j=s}^m d_j)^{(β+(m−s+1)h)/2}   if r < m−s+1,

λ = 1   if r ≥ m−s+1.

Putting t_s = hd_s/(β + h Σ_{j=s}^m d_j), we can rewrite λ as

λ = [(β+(m−s+1)h)^{(β+(m−s+1)h)/2} / (h^{h/2}(β+(m−s)h)^{(β+(m−s)h)/2})] t_s^{h/2} (1−t_s)^{(β+h(m−s))/2}   if r < m−s+1,

λ = 1   if r ≥ m−s+1.


We will now show that r < m−s+1 if and only if t_s > h/(β+(m−s+1)h). First consider the case in which s = m. Then r < m−s+1 = 1 if and only if d_m > 1, and

t_m = hd_m/(β + hd_m) = h/(β/d_m + h) > h/(β+h) if d_m > 1,

while

t_m = h/(β/d_m + h) ≤ h/(β+h) if d_m ≤ 1.

Consider now the case in which 1 ≤ s ≤ m−1. Again we want to show that r < m−s+1 if and only if t_s > h/(β+(m−s+1)h). If r = 0, clearly

d_{m−i} > (β + h Σ_{j=m−i+1}^m d_j)/(β+ih)

for i = 1,2,...,m−1. Also, if 0 < r < m−s+1, then

d_{m−r} > (β + h Σ_{j=m−r+1}^m d_j)/(β+rh),

and we have seen that this implies that

d_{m−q} > (β + h Σ_{j=m−q+1}^m d_j)/(β+qh)

for q = r,r+1,...,m−1 and, more specifically, for q = m−s. Hence, if r < m−s+1, then

d_s > (β + h Σ_{j=s+1}^m d_j)/(β+(m−s)h),

which implies

β + h Σ_{j=s}^m d_j < d_s(β+(m−s)h) + hd_s,

so that

β + h Σ_{j=s}^m d_j < hd_s(β/h + m−s+1),

and thus

t_s = hd_s/(β + h Σ_{j=s}^m d_j) > 1/(β/h + m−s+1) = h/(β+(m−s+1)h).

Also, if r ≥ m−s+1, then it must be true that

d_s ≤ (β + h Σ_{j=s+1}^m d_j)/(β+(m−s)h),

which implies that

t_s = hd_s/(β + h Σ_{j=s}^m d_j) ≤ h/(β+(m−s+1)h).










It follows that the likelihood ratio, λ, can be written as

λ = [(β+(m−s+1)h)^{(β+(m−s+1)h)/2} / (h^{h/2}(β+(m−s)h)^{(β+(m−s)h)/2})] t_s^{h/2} (1−t_s)^{(β+h(m−s))/2}   if t_s > h/(β+(m−s+1)h),

λ = 1   if t_s ≤ h/(β+(m−s+1)h).

Consider the function g(t_s) = t_s^h (1−t_s)^{(β+h(m−s))}; on the range t_s > h/(β+(m−s+1)h), λ is a constant multiple of g(t_s)^{1/2}. The derivative of g(t_s) with respect to t_s is

t_s^{h−1} (1−t_s)^{(β+h(m−s))−1} [h(1−t_s) − (β+h(m−s))t_s],

which is negative for t_s ∈ (h/(β+(m−s+1)h), 1). Thus, λ is a decreasing function of t_s when t_s ∈ (h/(β+(m−s+1)h), 1). In addition,

[(β+(m−s+1)h)^{(β+(m−s+1)h)/2} / (h^{h/2}(β+(m−s)h)^{(β+(m−s)h)/2})] t_s^{h/2} (1−t_s)^{(β+h(m−s))/2} ≤ 1

for t_s ∈ (h/(β+(m−s+1)h), 1), with equality when t_s = h/(β+(m−s+1)h), so that λ is a decreasing function of t_s throughout. Since the likelihood ratio test rejects H_0^(s) for small values of λ, it equivalently rejects H_0^(s) for large values of t_s. However, the distribution of t_s is intractable, and so use of t_s in a test of H_0^(s) versus H_1^(s) is not practical. In the following chapter we present an alternative test statistic for testing H_0^(s) versus H_1^(s).
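Although its null distribution is intractable, the statistic t_s itself, and the corresponding value of λ, are trivial to evaluate from d_1,...,d_m, β, and h. A short sketch (Python with numpy), using the notation of this section and working on the log scale to avoid overflow in the constant:

```python
# The statistic t_s = h d_s / (beta + h(d_s + ... + d_m)) and the likelihood
# ratio lambda of this section; lambda = 1 when t_s is at or below its threshold.
import numpy as np

def t_stat(d, beta, h, s):
    d = np.asarray(d, dtype=float)            # d_1 >= d_2 >= ... >= d_m
    return h * d[s - 1] / (beta + h * d[s - 1:].sum())

def likelihood_ratio(d, beta, h, s):
    m = len(d)
    t = t_stat(d, beta, h, s)
    if t <= h / (beta + (m - s + 1) * h):
        return 1.0
    c = beta + h * (m - s)
    # log of the constant times t^{h/2} (1-t)^{c/2}, then exponentiate
    log_lam = (0.5 * (h + c) * np.log(h + c) - 0.5 * h * np.log(h) - 0.5 * c * np.log(c)
               + 0.5 * h * np.log(t) + 0.5 * c * np.log1p(-t))
    return float(np.exp(log_lam))
```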



