THE MULTIVARIATE ONE-WAY CLASSIFICATION
MODEL WITH RANDOM EFFECTS
BY
JAMES ROBERT SCHOTT
A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL
OF THE UNIVERSITY OF FLORIDA IN
PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1981
To Susan
and
My Parents
ACKNOWLEDGMENTS
I would like to express my deepest appreciation to
Dr. John Saw for suggesting this topic to me and for
constantly providing guidance and assistance. He has made
this project not only a very rewarding educational experience
but also an enjoyable one. I also wish to thank
Dr. Andre Khuri, Dr. Mark Yang, and Dr. Dennis Wackerly for
their willingness to provide help when called upon.
Finally, a special thanks goes to Mrs. Edna Larrick
who has turned a somewhat unreadable rough draft into
a nicely typed manuscript.
TABLE OF CONTENTS

ACKNOWLEDGMENTS

ABSTRACT

CHAPTER

  1  INTRODUCTION
     1.1  The Random Effects Model, Scalar Case
     1.2  The Multivariate Random Effects Model
     1.3  Notation

  2  MAXIMIZATION OF THE LIKELIHOOD FUNCTION FOR GENERAL Σ
     2.1  The Likelihood Function
     2.2  Some Lemmas
     2.3  The Maximum Likelihood Estimates
     2.4  The Likelihood Ratio Test

  3  PROPERTIES OF THE s-TH LARGEST ROOT TEST
     3.1  Introduction
     3.2  The Uniformly Most Powerful Test for m = 1
     3.3  An Invariance Property
     3.4  The Union-Intersection Principle
     3.5  A Monotonicity Property of the Power Function
     3.6  The Limiting Distribution of the s-th Largest Root

  4  MAXIMIZATION OF THE LIKELIHOOD FUNCTION WHEN Σ = σ²I
     4.1  The Likelihood Function
     4.2  The Maximum Likelihood Estimates
     4.3  The Likelihood Ratio Test

  5  AN ALTERNATIVE TEST WHEN Σ = σ²I AND ITS PROPERTIES
     5.1  Introduction
     5.2  An Invariance Property
     5.3  A Monotonicity Property of the Power Function
     5.4  The Limiting Distribution of the Sum Σ_{i=s}^m of the Smallest Roots

BIBLIOGRAPHY

BIOGRAPHICAL SKETCH
Abstract of Dissertation Presented to the Graduate Council
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
THE MULTIVARIATE ONE-WAY CLASSIFICATION
MODEL WITH RANDOM EFFECTS
By
James Robert Schott
August 1981
Chairman: Dr. John G. Saw
Major Department: Statistics
A well-known model in univariate statistical analysis
is the one-way random effects model. In this paper we
investigate the multivariate generalization of this model,
that is, the multivariate one-way random effects model.

Two specific situations, regarding the structure of
the variance-covariance matrix of the random error vectors,
are considered. In the first and most general case, it is
only assumed that this variance-covariance matrix is symmetric
and positive definite. In the second case, it is
assumed, in addition, that the variance-covariance matrix
is a scalar multiple of the identity matrix.

Maximum likelihood estimates are obtained and the
likelihood ratio test for a hypothesis test on the rank
of the variance-covariance matrix of the random effect
vectors is derived. Properties of the likelihood ratio
test are investigated for the general case, while for the
second case an alternative test is developed and its properties
are investigated. In each case a sequential procedure
for determining the rank of the variance-covariance
matrix of the random effect vectors is presented.
CHAPTER 1
INTRODUCTION
1.1 The Random Effects Model, Scalar Case
Suppose a physician is considering administering some
particular blood test to his patients as a part of their
physical examination. He suspects that the test results
vary with the presence and severity of a particular
pathological condition. In order to examine variability in
the results of the blood test, the physician chooses to
administer the blood test n times to each of g patients.
This results in the observations x_ij: i = 1,2,...,g;
j = 1,2,...,n.
A suitable model to explain the different values of
x_ij: i = 1,2,...,g; j = 1,2,...,n would be

    x_ij = μ + a_i + z_ij.     (1.1.1)

Here μ is an overall mean, a_i is an effect due to the i-th
patient, and z_ij represents a random error due to the
measuring process. We assume that z_ij: i = 1,2,...,g;
j = 1,2,...,n are independent and have a normal distribution
with mean zero and variance σ_z².
If the physician is interested in using the blood test
as a diagnostic tool, he will certainly be interested to know
whether a major source of variation in the results of the
blood test is due to variation between the patients. Since
the physician will administer the test to an unlimited number
of patients in the future, we should properly regard the
g patients involved as a sample from the entire population
of patients. The patient effects, a_i: i = 1,2,...,g, now
have the role of random variables, and (1.1.1) is a random
effects model. Again we assume that a_i: i = 1,2,...,g are
independent and have a normal distribution with mean zero
and variance σ_a². Thus, from our model (1.1.1) we deduce that
x_ij has a normal distribution with mean μ and variance
σ_a² + σ_z².
The variation in the results of the blood test is
governed by σ_a² + σ_z². The portion of this attributable to
the patients is, of course, σ_a²/(σ_a² + σ_z²), and the physician
would like to know whether this or, correspondingly,
σ_a²/σ_z² is sizeable. If σ_a²/σ_z² is sufficiently large, he would
choose to investigate the possible use of this test as a
means of detecting the pathological condition; otherwise
he would find the blood test essentially useless as a diagnostic
tool. Hence, the physician might be interested in
testing the hypothesis H_0: σ_a² = 0 against the hypothesis
H_1: σ_a² > 0.
In order to derive the likelihood ratio test for testing
the hypothesis H_0 against H_1, we first need to obtain the
likelihood function of (μ, σ_z², σ_a²). This is most easily done
by making a transformation. Let C be an orthogonal matrix,
with the element in the i-th row and the j-th column denoted by
c_ij, such that c_1j = 1/√n: j = 1,2,...,n. Since C is
orthogonal,

    Σ_{j=1}^n c_kj = √n Σ_{j=1}^n c_1j c_kj = 0  for k = 2,3,...,n.     (1.1.2)
Consider the orthogonal transformation

    (y_i1, y_i2, ..., y_in)' = C(x_i1, x_i2, ..., x_in)'.     (1.1.3)

Upon replacing x_ij by the right side of (1.1.1) and using
(1.1.2), we observe that

    y_i1 = Σ_{k=1}^n x_ik/√n = √n x̄_i.,

    y_ij = Σ_{k=1}^n c_jk z_ik  for j = 2,3,...,n,

where x̄_i. = Σ_{k=1}^n x_ik/n. Thus,

    Cov(y_ij, y_ik) = 0  for j ≠ k,

    V(y_ij) = σ_z²  for j = 2,3,...,n,

and {x̄_i., y_i2, y_i3, ..., y_in}_{i=1}^g is a set of gn mutually
independent random variables, where x̄_i. has a normal
distribution with mean μ and variance σ_z²/n + σ_a², and y_ij
has a normal distribution with mean zero and variance σ_z².
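The transformation (1.1.3) can be checked numerically. In the sketch below the orthogonal matrix C is built from Helmert contrast rows, which is only one convenient choice satisfying c_1j = 1/√n (any orthogonal C with that first row works), and the data vector is made up; the assertions verify that y_i1 = √n x̄_i. and that the remaining y_ij carry the within-patient sum of squares.

```python
import math

def helmert_c(n):
    """Orthogonal n x n matrix whose first row is (1/sqrt(n), ..., 1/sqrt(n));
    the remaining rows are the standard Helmert contrasts."""
    C = [[1.0 / math.sqrt(n)] * n]
    for k in range(2, n + 1):          # row k contrasts the first k observations
        s = math.sqrt(k * (k - 1))
        C.append([1.0 / s] * (k - 1) + [-(k - 1) / s] + [0.0] * (n - k))
    return C

n = 5
C = helmert_c(n)
x = [2.0, 4.0, 1.0, 7.0, 3.0]          # one patient's n test results (hypothetical)
y = [sum(C[j][k] * x[k] for k in range(n)) for j in range(n)]
xbar = sum(x) / n

# y_1 recovers sqrt(n) * xbar; the other y_j hold the within-patient scatter:
print(abs(y[0] - math.sqrt(n) * xbar) < 1e-12)                          # True
print(abs(sum(t * t for t in y[1:]) - sum((t - xbar) ** 2 for t in x)) < 1e-9)  # True
```

The second identity is exactly the quantity u_i defined in the text below.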
Note also from (1.1.3) that Σ_{j=1}^n (x_ij − x̄_i.)² = Σ_{j=2}^n y_ij², and
denote this quantity by u_i. We can now write the joint
density function of y_i2, y_i3, ..., y_in as

    f(y_i2, y_i3, ..., y_in; σ_z²) = Π_{j=2}^n (2πσ_z²)^{−½} exp[−y_ij²/2σ_z²]

        = (2πσ_z²)^{−½(n−1)} exp[−Σ_{j=2}^n y_ij²/2σ_z²]

        = g(Σ_{j=2}^n y_ij²; σ_z²) = g(u_i; σ_z²),

so that from the set {x̄_i., y_i2, y_i3, ..., y_in}, (x̄_i., u_i)
is sufficient for (μ, σ_z²). Thus, we may assume that we have,
independently, u_i and x̄_i. for i = 1,2,...,g, where u_i/σ_z²
has a chi-square distribution with ν = n−1 degrees of
freedom, and x̄_i. has a normal distribution with mean μ and
variance σ_z²/n + σ_a². Note that with

    x̄.. = Σ_{i=1}^g x̄_i./g,

    Σ_{i=1}^g (x̄_i. − μ)² = Σ_{i=1}^g (x̄_i. − x̄..)² + g(x̄.. − μ)².

Then putting σ² = σ_z²/n + σ_a², we can write the joint density
function of x̄_1., x̄_2., ..., x̄_g. as

    f(x̄_1., x̄_2., ..., x̄_g.; μ, σ²) = Π_{i=1}^g (2πσ²)^{−½} exp[−(x̄_i. − μ)²/2σ²]

        = (2πσ²)^{−½g} exp[−Σ_{i=1}^g (x̄_i. − μ)²/2σ²]

        = (2πσ²)^{−½g} exp[−(Σ_{i=1}^g (x̄_i. − x̄..)² + g(x̄.. − μ)²)/2σ²]

        = g(x̄.., v; μ, σ²),

where v = n Σ_{i=1}^g (x̄_i. − x̄..)². Hence, from the set
{x̄_1., x̄_2., ..., x̄_g.}, (x̄.., v) is sufficient for (μ, σ_z²/n + σ_a²).
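The decomposition Σ(x̄_i. − μ)² = Σ(x̄_i. − x̄..)² + g(x̄.. − μ)² used above is an exact algebraic identity (the cross term vanishes); a minimal numeric check, with hypothetical patient means:

```python
mu = 1.5                                   # hypothetical value of the mean
xbars = [2.0, 0.5, 3.5, 1.0]               # hypothetical patient means x̄_i.
g = len(xbars)
grand = sum(xbars) / g                     # x̄..

lhs = sum((xb - mu) ** 2 for xb in xbars)
rhs = sum((xb - grand) ** 2 for xb in xbars) + g * (grand - mu) ** 2
print(abs(lhs - rhs) < 1e-12)              # True: the cross term vanishes
```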
Also, if we let c denote a constant, we can write the joint
density function of u_1, u_2, ..., u_g as

    f(u_1, u_2, ..., u_g; σ_z²) = (σ_z²)^{−½gν} exp[−Σ_{i=1}^g u_i/2σ_z²] Π_{i=1}^g c u_i^{½ν−1}

        = g_1(u; σ_z²) g_2(u_1, u_2, ..., u_g),

where u = Σ_{i=1}^g u_i. Thus, from the set {u_1, u_2, ..., u_g}, u is
sufficient for σ_z².
We may now assume that we have, independently, x̄..,
u, and v, where x̄.. has a normal distribution with mean
μ and variance (σ_z² + nσ_a²)/gn; u/σ_z² has a chi-square
distribution with e = g(n−1) degrees of freedom, and v/(σ_z² + nσ_a²)
has a chi-square distribution with h = g−1 degrees of
freedom. The likelihood function of (μ, σ_z², σ_a²) can be
expressed as

    f(x̄.., u, v) = exp[−(x̄.. − μ)² gn/2(σ_z² + nσ_a²)] / (2π(σ_z² + nσ_a²)/gn)^{½}

        × u^{½e−1} exp[−u/2σ_z²] / ((2σ_z²)^{½e} Γ(½e))

        × v^{½h−1} exp[−v/2(σ_z² + nσ_a²)] / ((2(σ_z² + nσ_a²))^{½h} Γ(½h)).
z a
Let the set = {(a2): a = 0} and the set
2 2 2
S= {(oa ) : o > 0}. We seek the maximum likelihood
2 2
estimators, i and 8 of v and a when the parameters
( Z Z
are restricted to lie within w, and the maximum likelihood
^2 ^2 2 2
estimators, u, oz and &2 of 2, oz, and a2
when the parameters are restricted to lie within Q.
In ω, σ_a² = 0 and

    f(x̄.., u, v) = exp[−(x̄.. − μ)² gn/2σ_z²] / (2πσ_z²/gn)^{½}

        × u^{½e−1} exp[−u/2σ_z²] / ((2σ_z²)^{½e} Γ(½e))

        × v^{½h−1} exp[−v/2σ_z²] / ((2σ_z²)^{½h} Γ(½h)),

so that the logarithm of the likelihood function, omitting
a function of the observations, is

    −(x̄.. − μ)² gn/2σ_z² − (u+v)/2σ_z² − ½(e+h+1) ln σ_z².     (1.1.4)

Differentiating (1.1.4) with respect to μ and σ_z², we
obtain, respectively, the equations

    (x̄.. − μ)gn/σ_z² = 0,    ((x̄.. − μ)² gn + u + v)/2(σ_z²)² − (e+h+1)/2σ_z² = 0,

which yield the maximum likelihood solutions

    μ̃ = x̄..,

    σ̃_z² = (u+v)/(e+h+1).
In Ω the logarithm of the likelihood function, omitting
a function of the observations, is

    −(x̄.. − μ)² gn/2(σ_z² + nσ_a²) − u/2σ_z² − ½e ln σ_z²
        − v/2(σ_z² + nσ_a²) − ½(h+1) ln(σ_z² + nσ_a²).     (1.1.5)
Differentiation of (1.1.5) with respect to μ, σ_a², and σ_z²
yields, respectively,

    (x̄.. − μ)gn/(σ_z² + nσ_a²) = 0,

    n((x̄.. − μ)² gn + v)/2(σ_z² + nσ_a²)² − n(h+1)/2(σ_z² + nσ_a²) = 0,

    ((x̄.. − μ)² gn + v)/2(σ_z² + nσ_a²)² + u/2(σ_z²)² − e/2σ_z² − (h+1)/2(σ_z² + nσ_a²) = 0.

Solving these equations for μ, σ_z², and σ_a², we obtain the
maximal solution of the likelihood function in Ω:

    μ = x̄..,

    σ_z² = u/e = u*,

    σ_a² = (v/(h+1) − u/e)/n = (v* − u*)/n,

where u* = u/e and v* = v/(h+1). Since we insist that
σ_a² be greater than or equal to zero, the solution above
is the maximum likelihood solution only if v* − u* ≥ 0.
Suppose, however, that v* < u*. Clearly (1.1.5) is still
maximized when μ = x̄.., so that we need to minimize

    u/σ_z² + e ln σ_z² + v/(σ_z² + nσ_a²) + (h+1) ln(σ_z² + nσ_a²)

subject to the constraints σ_z² > 0 and σ_a² ≥ 0. Equivalently,
we consider the problem of minimizing

    ψ(x,t) = u/x + e ln x + v/t + (h+1) ln t

subject to the constraint t ≥ x > 0. For fixed x, ψ(x,t) is
concave upward in t with its absolute minimum at t = v*.
For each x, ψ(x,t) is, therefore, minimized with respect to
t ≥ x when

    t = v*  if v* ≥ x,
    t = x   if v* < x.

Thus, ψ(x,t) is minimized over {(t,x): t ≥ x > 0} by
setting

    t = v* and x = u*        if v* ≥ u*,
    t = x = (u+v)/(e+h+1)    if v* < u*.
Hence, for the maximum likelihood estimators when the
parameters are restricted to be within Ω, we obtain

    μ̂ = x̄..,
    σ̂_z² = u*,
    σ̂_a² = (v* − u*)/n,

if v* ≥ u*, and

    μ̂ = x̄..,
    σ̂_z² = (u+v)/(e+h+1),
    σ̂_a² = 0,

if v* < u*.
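The two-case estimators above are a short computation once the sufficient statistics (x̄.., u, v) are in hand; a minimal sketch, with hypothetical numerical inputs:

```python
def mle_random_effects(xbar, u, v, g, n):
    """Maximum likelihood estimates of (mu, sigma_z^2, sigma_a^2) in Omega,
    following the two cases v* >= u* and v* < u* derived above."""
    e, h = g * (n - 1), g - 1
    u_star, v_star = u / e, v / (h + 1)
    if v_star >= u_star:
        return xbar, u_star, (v_star - u_star) / n
    # boundary case: sigma_a^2 is estimated as zero
    return xbar, (u + v) / (e + h + 1), 0.0

# hypothetical sufficient statistics for g = 10 patients, n = 4 replicates
print(mle_random_effects(xbar=5.2, u=60.0, v=90.0, g=10, n=4))
# interior case: v* = 9.0 > u* = 2.0, so sigma_a^2-hat = (9.0 - 2.0)/4 = 1.75
```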
Substituting the maximum likelihood estimates into
the likelihood function, we see that in ω

    max_ω f(x̄.., u, v) = u^{½e−1} v^{½h−1} exp[−½(e+h+1)]
        / {Γ(½e) Γ(½h) ((u+v)/(e+h+1))^{½(e+h+1)} 2^{½(e+h+1)} (π/gn)^{½}},

and in Ω

    max_Ω f(x̄.., u, v) = u^{½e−1} v^{½h−1} exp[−½(e+h+1)]
        / {Γ(½e) Γ(½h) (u/e)^{½e} (v/(h+1))^{½(h+1)} 2^{½(e+h+1)} (π/gn)^{½}}
                                                                  if v* ≥ u*,

    max_Ω f(x̄.., u, v) = max_ω f(x̄.., u, v)                      if v* < u*.

The likelihood ratio, λ, is

    λ = max_ω f(x̄.., u, v) / max_Ω f(x̄.., u, v)

      = (u/e)^{½e} (v/(h+1))^{½(h+1)} / ((u+v)/(e+h+1))^{½(e+h+1)}   if v* ≥ u*,

      = 1                                                            if v* < u*.

Now putting w = u/(u+v) and noting that

    v* ≥ u*  if and only if (iff)

    v/(h+1) ≥ u/e  iff

    (u+v)/(h+1) ≥ u(1/e + 1/(h+1))  iff

    e/(e+h+1) ≥ u/(u+v) = w,

we can rewrite the likelihood ratio

    λ = {(e+h+1)^{½(e+h+1)} / (e^{½e} (h+1)^{½(h+1)})} w^{½e} (1−w)^{½(h+1)}
                                  if w ≤ e/(e+h+1),

    λ = 1                         if w > e/(e+h+1).
Since λ is an increasing function of w, and H_0 is rejected
for small values of λ, it follows that H_0 is rejected for
small values of w or large values of 1/w. Now

    1/w = (u+v)/u = 1 + v/u = 1 + (h/e)(v/h)/(u/e),

so the likelihood ratio test rejects H_0 for ev/hu large.
Recall that u/σ_z² has a chi-square distribution with
e degrees of freedom, and v/(σ_z² + nσ_a²) has a chi-square
distribution with h degrees of freedom, independent of u.
Hence, the quantity σ_z² ev/(σ_z² + nσ_a²)hu has an F distribution
with h and e degrees of freedom. If we let F(h,e,α) denote
the constant for which P(F(h,e) ≥ F(h,e,α)) = α, where F(h,e)
has an F distribution with h and e degrees of freedom, then
we will reject H_0 if ev/hu ≥ F(h,e,α). The power function
of this test is a function of θ = σ_a²/σ_z² and is given by

    β(θ) = P(F(h,e) ≥ F(h,e,α)/(1+nθ)).
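The statistic ev/hu is the familiar between-over-within mean-square ratio. A sketch computing it from a raw g × n table of observations follows; the data are made up, and the critical value F(h, e, α) would be taken from an F table or a statistics library, which is not assumed here.

```python
def f_statistic(x):
    """Compute the one-way F statistic e*v/(h*u) from a g x n table x[i][j],
    together with its degrees of freedom (h, e)."""
    g, n = len(x), len(x[0])
    e, h = g * (n - 1), g - 1
    means = [sum(row) / n for row in x]            # patient means x̄_i.
    grand = sum(means) / g                         # grand mean x̄..
    u = sum((x[i][j] - means[i]) ** 2 for i in range(g) for j in range(n))
    v = n * sum((m - grand) ** 2 for m in means)
    return (e * v) / (h * u), h, e

x = [[4.1, 3.9, 4.4], [5.0, 5.2, 4.8], [3.2, 3.0, 3.4]]   # g = 3, n = 3 (hypothetical)
stat, h, e = f_statistic(x)
print(h, e)   # 2 6 : reject H0 when stat >= F(2, 6, alpha)
```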
Although the analysis which we have just outlined is,
by now, quite standard to any graduate level course in design
and analysis, we have reproduced it since it motivates the
more general problem to be described in the next section.
Indeed, the situation we wish to consider contains the
one-way random effects model as a special case to which we can
return on occasion to check our work.
1.2 The Multivariate Random Effects Model
Suppose a physician is considering administering a bat
tery of m distinct types of blood tests to his patients as
a part of their physical examination. He believes that,
based on the results of these tests, he may be able to detect
any one of several particular pathological conditions. In
order to examine variability in the results of the blood
tests, the physician chooses to administer the battery of
blood tests n times to each of g patients. This results in
the observations x.. (mxl): i = 1,2,...,g; j = 1,2,...,n.
A suitable model to explain the different values of
x..: i = 1,2,...,g; j = 1,2,... ,n would be
.ij = + a + j. (1.2.1)
13 1 1
Here j(mxl) is an overall mean, (mxl) is an effect due to
the ith patient, and z. (mxl) represents a vector of random
errors due to the measuring process. We assume that
z: i = 1,2,...,g; j = 1,2,...,n are independent and have an
mvariate normal distribution with mean 0 and variancecovar
iance matrix Z.
Since the physician will administer the tests to an
unlimited number of patients in the future, we should properly
regard the g patients involved as a sample from the
entire population of patients. The patient effects,
a_i: i = 1,2,...,g, now have the role of random vectors, and
(1.2.1) is a multivariate random effects model. We will
assume that a_i: i = 1,2,...,g are independent and have an
m-variate normal distribution with mean 0 and variance-covariance
matrix Λ. Hence, from our model (1.2.1) we see
that x_ij has an m-variate normal distribution with mean μ
and variance-covariance matrix Λ + Σ.
While there are m different blood tests, it is believed
that there are some groups of tests for which the tests within
a group vary quite strongly together. In other words, the
data from some of the tests are highly correlated. For this
reason the number of sources of variation between the patients,
which we will denote by p, may be less than the number of tests,
m. That is, the rank of the variance-covariance matrix Λ is
p where p ≤ m. Since Λ is symmetric, nonnegative definite,
and of rank p, there exists a matrix L(m×p) such that Λ = LL'.
Clearly L is not unique since if Λ = LL' and P(p×p) is such
that PP' = I, then Λ = L*L*' where L* = LP. This enables us to
rewrite (1.2.1) as

    x_ij = μ + Lf_i + z_ij,     (1.2.2)

where f_i (p×1): i = 1,2,...,g are independently distributed,
having a p-variate normal distribution with mean 0 and
variance-covariance matrix equal to the identity matrix.
If the physician is interested in using the blood tests
as a diagnostic tool, he will certainly be interested in
determining the value p, since the p sources of variation may
correspond to p different pathological disorders. So of
particular interest to the physician is a test of the hypothesis
H_0^(s): the rank of the matrix LL' ≤ s−1 against the
hypothesis H_1^(s): the rank of the matrix LL' = s. With such a
test procedure he could develop a sequential test procedure
for determining the rank of LL'. He would first test H_0^(m)
against H_1^(m), and if he rejects H_0^(m) he would stop and take
the rank of LL' to be m; otherwise, he would proceed to test
H_0^(m−1) against H_1^(m−1). The procedure continues until either
some hypothesis H_0^(s) is rejected, in which case he then takes
the rank of LL' to be s, or the hypothesis H_0^(1) is accepted,
in which case he would conclude that there is no significant
variation between patients.
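The sequential procedure just described can be sketched as a simple step-down loop; the callable `reject_h0` standing in for the rank-s test is hypothetical (the actual tests are developed in the later chapters).

```python
def sequential_rank(m, reject_h0):
    """Step down from s = m to s = 1, testing H0(s): rank(LL') <= s-1.
    Stop at the first rejection and report that s as the rank; if no
    H0(s) is rejected, report 0 (no significant patient variation)."""
    for s in range(m, 0, -1):
        if reject_h0(s):
            return s
    return 0

# toy stand-in: suppose the test rejects H0(s) exactly when s <= 2
print(sequential_rank(5, lambda s: s <= 2))   # 2
```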
In this paper we investigate the multivariate one-way
classification model with random effects, given by (1.2.2).
Two specific cases, regarding the structure of the variance-covariance
matrix Σ, will be considered. In the first and
most general case we will assume no more than that Σ is symmetric
and positive definite. In the second case we will
assume that the vector of random errors, z_ij, is such that
its components are independent and have the same variance.
That is, we assume that Σ is equal to some constant multiple
of the identity matrix. In each case we develop a test
procedure for testing the hypothesis H_0^(s): the rank of
LL' ≤ s−1 against the hypothesis H_1^(s): the rank of LL' = s.
In addition, we investigate some of the properties of these
test procedures and present a numerical example to illustrate
the use of these procedures.
1.3 Notation
The following notation will be used whenever convenient:

    Notation                Interpretation

    (A)_i.                  row i of the matrix A
    (A)_.j                  column j of the matrix A
    (A)_ij                  the element in row i and column j of the matrix A
    a_ij                    the element in row i and column j of the matrix A
    A^{−1}                  the inverse of the matrix A
    A'                      the transpose of the matrix A
    |A|                     the determinant of the matrix A
    tr A                    the trace of the matrix A
    dg(A)                   the diagonal matrix which has as its diagonal
                            elements the diagonal elements of A
    diag(a_1,a_2,...,a_m)   the diagonal matrix which has a_1, a_2, ..., a_m
                            as its diagonal elements
    ch_i(A)                 the i-th largest latent root of the matrix A
    rank(A)                 the rank of the matrix A
    I_m                     the m × m identity matrix
    I                       the identity matrix (used when the order of the
                            matrix is obvious)
    (0)                     the matrix which has all of its elements equal
                            to zero
    x                       a vector
    x_i                     the i-th element of the vector x
    0                       the vector which has all of its elements equal
                            to zero
    E(x)                    the expected value of x
    V(x)                    the variance of x
    Cov(x,y)                the covariance of x and y
    P(A)                    the probability of event A
    P(A|B)                  the probability of event A given event B
    Γ(x)                    the gamma function
    x_n →_d x               x_n converges to x in distribution
    a_n → a                 convergence of a sequence of constants
    exp(x)                  Euler's constant, "e," raised to the x power
    ∈                       is contained in
    ~                       is distributed as
    N(μ,σ²)                 the normal distribution with mean μ and
                            variance σ²
    N_m(μ,Σ)                the m-variate normal distribution with mean μ
                            and variance-covariance matrix Σ
    χ²_ν                    the central chi-square distribution with ν
                            degrees of freedom
    F_{ν1,ν2}               the central F distribution with ν1 numerator
                            degrees of freedom and ν2 denominator degrees
                            of freedom
    W_m(Σ,ν,0)              the central Wishart distribution with
                            variance-covariance matrix Σ and degrees of
                            freedom ν
    Jones [1973]            the reference authored by Jones and published
                            in 1973
    Jones [1973:1]          page 1 of the reference authored by Jones and
                            published in 1973
CHAPTER 2
MAXIMIZATION OF THE LIKELIHOOD FUNCTION
FOR GENERAL Z
2.1 The Likelihood Function
Suppose the vectors x_ij (m×1): i = 1,2,...,g; j = 1,2,...,n
can be modeled by

    x_ij = μ + Lf_i + z_ij,     (2.1.1)

wherein μ(m×1) is a fixed but unknown vector, L(m×p) is
a fixed but unknown matrix, f_i ~ N_p(0,I): i = 1,2,...,g,
and z_ij ~ N_m(0,Σ): i = 1,2,...,g; j = 1,2,...,n. We assume
that the set of random vectors {f_1, ..., f_g, z_11, ..., z_gn}
are mutually independent. Thus, x_ij ~ N_m(μ,V) with V =
LL' + Σ. However, for any orthogonal matrix P(p×p), V =
LP(LP)' + Σ, so that L is not unique whereas LL'
is unique. The purpose of this section is to derive the
likelihood function for μ, LL', and Σ. Although x_ij and x_kl are
independent for all (j,l) when i ≠ k, x_ij and x_il are not
independent even when j ≠ l, since Cov(x_ij, x_il) = LL' (j ≠ l).
Thus, the likelihood function is not simply the product of
the density functions of the x_ij's. A transformation of the
x_ij's will expedite the derivation of the likelihood function.
Consider the Helmert transformation (see, for example,
Kendall and Stuart [1963:250]) given below:

    x_i1 = x̄_i. + (2·1)^{−½} y_i1 + (3·2)^{−½} y_i2 + ... + (n(n−1))^{−½} y_iν,

    x_i2 = x̄_i. − (2·1)^{−½} y_i1 + (3·2)^{−½} y_i2 + ... + (n(n−1))^{−½} y_iν,

    x_i3 = x̄_i. − 2(3·2)^{−½} y_i2 + ... + (n(n−1))^{−½} y_iν,

    ...

    x_in = x̄_i. − (n−1)(n(n−1))^{−½} y_iν,

where ν = n−1. It will be helpful to note that the above
equations imply the following:

    x̄_i. = n^{−1} x_i1 + n^{−1} x_i2 + ... + n^{−1} x_in,

    y_i1 = 2^{−½} x_i1 − 2^{−½} x_i2,

    y_i2 = (3·2)^{−½} x_i1 + (3·2)^{−½} x_i2 − 2(3·2)^{−½} x_i3,

    ...

    y_iν = (n(n−1))^{−½} x_i1 + ... + (n(n−1))^{−½} x_{i,n−1} − (n−1)(n(n−1))^{−½} x_in.
In matrix formulation we have

    (x_i1, ..., x_in)' = H(x̄_i., y_i1, ..., y_iν)',

and we note that, while H is not an orthogonal matrix, the columns
of H are orthogonal. The matrix H fails to be orthogonal
since H'H = diag(n,1,1,...,1). Observe that, upon replacing
x_ij by the right side of (2.1.1), we have

    x̄_i. = μ + Lf_i + z̄_i.,

    y_i1 = (2·1)^{−½} (z_i1 − z_i2),

    y_i2 = (3·2)^{−½} (z_i1 + z_i2 − 2z_i3),

    ...

    y_iν = (n(n−1))^{−½} (z_i1 + z_i2 + ... + z_{i,n−1} − (n−1)z_in).

Thus

    E(x̄_i. y'_ij) = (0),

    E(y_ij y'_iq) = (0)  if j ≠ q,

    E(y_ij y'_ij) = Σ.
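The matrix H can be written down explicitly and its defining property H'H = diag(n,1,...,1) checked numerically. The sketch below builds the scalar (one-coordinate) version of H; in the multivariate model the same transformation is applied coordinatewise.

```python
import math

def helmert_h(n):
    """n x n matrix H with first column all ones and remaining columns the
    normalized Helmert contrasts, so that H'H = diag(n, 1, ..., 1)."""
    H = [[1.0] + [0.0] * (n - 1) for _ in range(n)]
    for k in range(2, n + 1):                 # column carrying y_{k-1}
        s = math.sqrt(k * (k - 1))
        for row in range(k - 1):
            H[row][k - 1] = 1.0 / s
        H[k - 1][k - 1] = -(k - 1) / s
    return H

n = 4
H = helmert_h(n)
HtH = [[sum(H[r][a] * H[r][b] for r in range(n)) for b in range(n)]
       for a in range(n)]
print(all(abs(HtH[a][b] - (n if a == b == 0 else (1.0 if a == b else 0.0))) < 1e-12
          for a in range(n) for b in range(n)))   # True
```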
Hence, it follows that {x̄_i., y_i1, ..., y_iν}_{i=1}^g are a set of
gn mutually independent vectors with x̄_i. ~ N_m(μ, (1/n)Σ + LL'):
i = 1,2,...,g and y_ij ~ N_m(0,Σ): i = 1,2,...,g; j = 1,2,...,ν.
Note also that

    Σ_{j=1}^n (x_ij − x̄_i.)(x_ij − x̄_i.)' = Σ_{j=1}^ν y_ij y'_ij,

and denote this matrix by E_i. We can now write the joint density function
of y_i1, ..., y_iν as

    f(y_i1, ..., y_iν; Σ) = Π_{j=1}^ν |2πΣ|^{−½} exp[−½ y'_ij Σ^{−1} y_ij]

        = |2πΣ|^{−½ν} exp[−½ Σ_{j=1}^ν y'_ij Σ^{−1} y_ij]

        = |2πΣ|^{−½ν} exp[−½ Σ_{j=1}^ν tr(Σ^{−1} y_ij y'_ij)]

        = |2πΣ|^{−½ν} exp[−½ tr(Σ^{−1} Σ_{j=1}^ν y_ij y'_ij)]

        = |2πΣ|^{−½ν} exp[−½ tr(Σ^{−1} E_i)]

        = g(E_i; Σ),

so that from the set {x̄_i., y_i1, y_i2, ..., y_iν}, (E_i, x̄_i.) is
sufficient.
Thus, we may assume that we have, independently,

    E_i ~ W_m(Σ, ν, 0):  i = 1,2,...,g,

    x̄_i. ~ N_m(μ, (1/n)Σ + LL'):  i = 1,2,...,g.

Note that

    Σ_{i=1}^g (x̄_i. − μ)(x̄_i. − μ)' = Σ_{i=1}^g (x̄_i. − x̄..)(x̄_i. − x̄..)' + g(x̄.. − μ)(x̄.. − μ)',

where x̄.. = Σ_{i=1}^g x̄_i./g. Then putting W = (1/n)Σ + LL',
we can write the joint density function of x̄_1., x̄_2., ..., x̄_g. as

    f(x̄_1., ..., x̄_g.; μ, W) = Π_{i=1}^g |2πW|^{−½} exp[−½(x̄_i. − μ)'W^{−1}(x̄_i. − μ)]

        = |2πW|^{−½g} exp[−½ Σ_{i=1}^g tr(W^{−1}(x̄_i. − μ)(x̄_i. − μ)')]

        = |2πW|^{−½g} exp[−½ tr(W^{−1} Σ_{i=1}^g (x̄_i. − μ)(x̄_i. − μ)')]

        = |2πW|^{−½g} exp[−½g tr(W^{−1}(x̄.. − μ)(x̄.. − μ)')
              − ½ tr(W^{−1} Σ_{i=1}^g (x̄_i. − x̄..)(x̄_i. − x̄..)')]

        = |2πW|^{−½g} exp[−½g tr(W^{−1}(x̄.. − μ)(x̄.. − μ)')]
              × exp[−½ tr(W^{−1} Σ_{i=1}^g (x̄_i. − x̄..)(x̄_i. − x̄..)')]

        = g(x̄.., H; μ, W),

where H = n Σ_{i=1}^g (x̄_i. − x̄..)(x̄_i. − x̄..)'. Hence, from the set
{x̄_1., ..., x̄_g.}, (x̄.., H) is sufficient for (μ, (1/n)Σ + LL').
{xl. ... ,x }, (x ,H) is sufficient for (_, (l/n)E + LL' .
Also if we let c denote a constant, we can write the joint
density function of E1,...,E as
f(E1,...,E ;; ) = c IE 2(vml)exp[ tr(EZE)]
S i=l
= c exp[ tr(Z E) E. (m1)
i=l i=l
= gl (E; z)g92(E1,E2" ..E ) ,
g
where E = Z E.. Thus, from the set {E1,...,E 1, E is
i=l 1 g
sufficient for Z.
Then we may assume that we have, independently,

    x̄.. ~ N_m(μ, (1/gn)(Σ + nLL')),

    E ~ W_m(Σ, e, 0),

    H ~ W_m(Σ + nLL', h, 0),

where e = g(n−1) and h = g−1. The problem is to estimate
μ, Σ, and LL' or, equivalently, to estimate μ, Σ, and M where
M = nLL'. Recall that L is not uniquely defined, so that if
L̂L̂' is an estimate of LL', then any L̂* such that L̂*L̂*' = L̂L̂'
is an estimate of L. The likelihood function of (μ,Σ,M) can
be expressed as

    f(x̄.., E, H) = K_m^{−1}(I,e) K_m^{−1}(I,h) |H|^{½(h−m−1)} |E|^{½(e−m−1)}
        / {|Σ+M|^{½h} |Σ|^{½e} |2π(Σ+M)/gn|^{½}}
        × exp[−½(x̄.. − μ)'((1/gn)(Σ+M))^{−1}(x̄.. − μ)
              − ½ tr(Σ^{−1}E) − ½ tr((Σ+M)^{−1}H)],

where K_m(I,ν) = 2^{½νm} π^{¼m(m−1)} Π_{j=1}^m Γ(½(ν−j+1)).
m j=1
The logarithm of the likelihood function, omitting a function
of the observations, is
trZ Ee n[ Z tr (s+M) 1Hh Zn I +MI
l 
ZnlZ+MI (x /)' ((/gn) (Z+M)) (x H).
We seek the solution, (_, ,,M), which maximizes the equation
above, or equivalently, the solution which minimizes
tr Z1E~e Ean + tr(Z+M)1H+(h+l)ZnlZ+M
1 
+ (x Hj) '((l/gn) (Z+M)) (x H). (2.1.2)
Before we can minimize the above equation, we need some results
on differentiation. Let W(m×m), X(m×m), and Y(m×m) be symmetric
matrices, and let z(m×1) and a(m×1) be vectors. The
proof of the first result can be found in Graybill [1969:267].

Lemma 2.1.1:

    ∂ln|X|/∂X = 2X^{−1} − dg(X^{−1}).

Lemma 2.1.2:

    ∂ln|X+Y|/∂X = 2(X+Y)^{−1} − dg((X+Y)^{−1}).

Proof: Let V = X + Y. Then

    ∂ln|X+Y|/∂x_ij = Σ_p Σ_q (∂ln|V|/∂v_pq)(∂v_pq/∂x_ij) = ∂ln|V|/∂v_ij,

so ∂ln|X+Y|/∂X = ∂ln|V|/∂V = 2V^{−1} − dg(V^{−1}) = 2(X+Y)^{−1} − dg((X+Y)^{−1}).
Lemma 2.1.3:

    ∂tr((X+Y)^{−1}W)/∂X = −2(X+Y)^{−1}W(X+Y)^{−1} + dg((X+Y)^{−1}W(X+Y)^{−1}).

Proof: Let V = X + Y. Note that

    ∂(VV^{−1})/∂x_ij = (0) = (∂V/∂x_ij)V^{−1} + V(∂V^{−1}/∂x_ij),

so that ∂V^{−1}/∂x_ij = −V^{−1}(∂V/∂x_ij)V^{−1}. Then

    ∂tr((X+Y)^{−1}W)/∂x_ij = ∂tr(V^{−1}W)/∂x_ij = tr(W ∂V^{−1}/∂x_ij)

        = −tr(V^{−1}WV^{−1} ∂V/∂x_ij)

        = −Σ_{p=1}^m Σ_{q=1}^m (V^{−1}WV^{−1})_qp (∂V/∂x_ij)_pq

        = −(V^{−1}WV^{−1})_ji − (V^{−1}WV^{−1})_ij = −2(V^{−1}WV^{−1})_ij  if i ≠ j,

        = −(V^{−1}WV^{−1})_ii  if i = j.

Hence,

    ∂tr((X+Y)^{−1}W)/∂X = −2V^{−1}WV^{−1} + dg(V^{−1}WV^{−1})

        = −2(X+Y)^{−1}W(X+Y)^{−1} + dg((X+Y)^{−1}W(X+Y)^{−1}).
Lemma 2.1.4:

    ∂(z−a)'W(z−a)/∂z = 2W(z−a).

Proof:

    ∂(z−a)'W(z−a)/∂z_i = ∂/∂z_i Σ_{p=1}^m Σ_{q=1}^m (z_p − a_p)(z_q − a_q)w_pq

        = Σ_{q=1}^m (z_q − a_q)w_iq + Σ_{p=1}^m (z_p − a_p)w_pi

        = 2 Σ_{q=1}^m (z_q − a_q)w_iq = 2(W(z−a))_i,

so that ∂(z−a)'W(z−a)/∂z = 2W(z−a).
If we ignore the constraints that Σ is positive definite
and M is nonnegative definite and seek the stationary values
of (2.1.2) over all possible (μ,Σ,M), we find, upon taking the
partial derivatives of (2.1.2) with respect to Σ, M, and μ
and setting them equal to zero, that

    eΣ^{−1} + (h+1)(Σ+M)^{−1} − Σ^{−1}EΣ^{−1} − (Σ+M)^{−1}H(Σ+M)^{−1}
        − gn(Σ+M)^{−1}(x̄.. − μ)(x̄.. − μ)'(Σ+M)^{−1} = (0),

    (h+1)(Σ+M)^{−1} − (Σ+M)^{−1}H(Σ+M)^{−1}
        − gn(Σ+M)^{−1}(x̄.. − μ)(x̄.. − μ)'(Σ+M)^{−1} = (0),

    gn(Σ+M)^{−1}(x̄.. − μ) = 0,

for which the solutions are

    μ̂ = x̄..,
    Σ̂ = (1/e)E,
    M̂ = (1/(h+1))H − (1/e)E.

Since M is a nonnegative definite matrix, its maximum likelihood
estimate must also be nonnegative definite, so the solutions
above are the maximum likelihood estimates only if
(1/(h+1))H − (1/e)E is nonnegative definite. We find that,
while the solutions for μ and Σ are the natural unbiased
estimates, the solution for M is not. That is,

    E(M̂) = (1/(h+1))(hM − Σ).

Hence, we see that E(M̂) is also not necessarily nonnegative
definite.
Suppose that instead of using the likelihood function
of (μ,Σ,M) we use the marginal likelihood function of (Σ,M).
Justification for this follows from the fact that (E,H) is
"marginally sufficient" for (Σ,M) or, in other words, (E,H)
is "sufficient for (Σ,M) in the absence of knowledge of μ."
For a detailed description of the principle of marginal sufficiency
see Barnard [1963]. There is ample precedent for the
use of this principle in multivariate theory. For example,
Bartlett's test has two forms, one involving the sample size
and the other involving the degrees of freedom. The marginal
likelihood function of (Σ,M) can be written

    f(E,H) = K_m^{−1}(I,e) K_m^{−1}(I,h) |H|^{½(h−m−1)} |E|^{½(e−m−1)} / {|Σ+M|^{½h} |Σ|^{½e}}
        × exp[−½ tr(Σ^{−1}E) − ½ tr((Σ+M)^{−1}H)],

where

    K_m(I,ν) = 2^{½νm} π^{¼m(m−1)} Π_{j=1}^m Γ(½(ν−j+1)).
The logarithm of the likelihood, omitting a function of the
observations, is

    −½ tr(Σ^{−1}E) − ½e ln|Σ| − ½ tr((Σ+M)^{−1}H) − ½h ln|Σ+M|.

We seek the solution, (Σ*, M*), which maximizes the above
equation, or equivalently, the solution which minimizes

    tr(Σ^{−1}E) + e ln|Σ| + tr((Σ+M)^{−1}H) + h ln|Σ+M|.     (2.1.3)

Again if we ignore the constraints that Σ is positive definite
and M is nonnegative definite and seek the stationary
values of (2.1.3) over all possible (Σ,M), we find,
upon taking the partial derivatives of (2.1.3) with respect to
Σ and M and setting them equal to zero, that

    eΣ^{−1} + h(Σ+M)^{−1} − Σ^{−1}EΣ^{−1} − (Σ+M)^{−1}H(Σ+M)^{−1} = (0),

    h(Σ+M)^{−1} − (Σ+M)^{−1}H(Σ+M)^{−1} = (0),

for which the solutions are

    Σ* = (1/e)E,
    M* = (1/h)H − (1/e)E.

We see that these solutions are the natural unbiased estimates
of Σ and M, and thus E(M*) = M is clearly nonnegative definite.
For this reason, we choose to continue our work with the
marginal likelihood function of (Σ,M). Note that since M is
nonnegative definite, the solutions above are the maximum
likelihood estimates only if (1/h)H − (1/e)E is also nonnegative
definite. In the next two sections we will derive maximum
likelihood estimates for Σ and M which are valid for all
possible (E,H).
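The unconstrained solutions Σ* = (1/e)E and M* = (1/h)H − (1/e)E are elementwise computations on the sufficient statistics; a minimal numeric sketch, with E, H, e, and h made up, followed by the nonnegative-definiteness check that the text says must still be verified:

```python
# hypothetical sufficient statistics for m = 2, g = 11, n = 3
e, h = 22, 10                       # e = g(n-1), h = g-1
E = [[44.0, 11.0], [11.0, 66.0]]    # pooled within-patient matrix
H = [[50.0, 20.0], [20.0, 80.0]]    # scaled between-patient matrix

sigma_star = [[E[i][j] / e for j in range(2)] for i in range(2)]
m_star = [[H[i][j] / h - E[i][j] / e for j in range(2)] for i in range(2)]

# 2 x 2 nonnegative-definiteness check: trace and determinant both >= 0
tr = m_star[0][0] + m_star[1][1]
det = m_star[0][0] * m_star[1][1] - m_star[0][1] * m_star[1][0]
print(tr >= 0 and det >= 0)   # True for these inputs
```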
2.2 Some Lemmas
Consider the function

    φ(A,B;D,e,h) = e[tr A^{−1} + ln|A|] + h[tr B^{−1}D + ln|B|],

where A, B, and D are m × m matrices. We assume that D is
diagonal with distinct, descending, positive diagonal elements;
that is, D = diag(d_1, d_2, ..., d_m) with d_1 > d_2 > ... > d_m > 0.
We are interested in minimizing φ(A,B;D,e,h) subject to

    (A,B) ∈ C_s = {(A,B): A ∈ P_m, B ∈ P_m, B − A ∈ ∪_{j=0}^s P_j},

where P_j is the set of all symmetric, nonnegative definite matrices
of rank j. In this section it will be shown that the required
absolute minimum occurs when both A and B are diagonal.

The proof of this result relies mainly on a lemma regarding
the stationary points of a function of the form g(P) = tr PXP'D,
where P(m×m) is orthogonal.

Lemma 2.2.1: Consider g(P) = tr PXP'D, where P(m×m) is
such that PP' = I, and X(m×m) and D(m×m) are both symmetric
and positive definite. It is assumed that D is diagonal with
distinct, descending, positive diagonal elements. Then the
stationary points of g(P) occur when PXP' is diagonal.
Further, the absolute maximum of g(P) is

    max_{P:PP'=I} g(P) = Σ_{i=1}^m d_i ch_i(X),

and the absolute minimum of g(P) is

    min_{P:PP'=I} g(P) = Σ_{i=1}^m d_{m+1−i} ch_i(X).
Proof: Using the method of Lagrange multipliers, we look at

    L(P,Λ) = tr PXP'D + tr Λ(PP' − I),

where Λ = Λ'. Let A_ij be the matrix that has 1 in row i,
column j, and 0's elsewhere. Then

    ∂L/∂p_ij = tr(A_ij XP'D + PXA_ji D) + tr Λ(A_ij P' + PA_ji)

             = 2 tr(A_ji DPX) + 2 tr(A_ji ΛP)

             = 2(DPX)_ij + 2(ΛP)_ij,

    ∂L/∂λ_ij = tr(A_ij + A_ji)(PP' − I) = 2(PP' − I)_ij  if i ≠ j,

    ∂L/∂λ_ii = tr A_ii(PP' − I) = (PP' − I)_ii.

Thus, the stationary values of g(P) occur at the solutions to

    2DPX + 2ΛP = (0),
    PP' = I.     (2.2.1)

From (2.2.1) it follows that

    Λ = −DPXP',

so that Λ = Λ' implies that

    DPXP' = PXP'D,

or

    DY = YD,     (2.2.2)

where Y = PXP'. Examining the (i,j)-th term on each side of (2.2.2),
we see that we must have d_i y_ij = y_ij d_j. Since d_i ≠ d_j: i ≠ j,
it follows that y_ij = 0: i ≠ j. Thus, Y = PXP' is diagonal. It is clear
then that the stationary values of tr PXP'D are given by the
set of values

    Σ_{i=1}^m d_i ch_{t(i)}(X),

where {t(1), t(2), ..., t(m)} is a permutation of {1,2,...,m}, the
set being formed over all such permutations. Further, the
absolute maximum of tr PXP'D is, clearly,

    max_{P:PP'=I} tr PXP'D = Σ_{i=1}^m d_i ch_i(X),

and the absolute minimum is, clearly,

    min_{P:PP'=I} tr PXP'D = Σ_{i=1}^m d_{m+1−i} ch_i(X).
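Since the stationary points put PXP' in diagonal form, the candidate values of tr PXP'D are exactly the permutation sums Σ d_i ch_{t(i)}(X). A sketch checking the max/min formulas by enumerating permutations (the d's and latent roots are made-up numbers):

```python
from itertools import permutations

d = [5.0, 3.0, 1.0]          # diagonal of D, descending
ch = [4.0, 2.0, 0.5]         # latent roots ch_1(X) >= ch_2(X) >= ch_3(X)

values = [sum(di * c for di, c in zip(d, perm)) for perm in permutations(ch)]

# max pairs d and ch in the same order; min pairs them in opposite order
print(max(values) == sum(di * ci for di, ci in zip(d, ch)))               # True
print(min(values) == sum(d[len(d) - 1 - i] * ch[i] for i in range(len(d))))  # True
```

This is the rearrangement inequality: similarly ordered sequences maximize, oppositely ordered sequences minimize, the sum of products.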
We will also need the following results, the first of
which can be found in Bellman [1970:117].

Lemma 2.2.2: Let X(m×m) and Y(m×m) be symmetric matrices
with Y nonnegative definite. Then

    ch_i(X+Y) ≥ ch_i(X)  for i = 1,2,...,m.

If Y is positive definite, then

    ch_i(X+Y) > ch_i(X)  for i = 1,2,...,m.
Lemma 2.2.3: The function φ(A,B;D,e,h) has an absolute
minimum over the set of solutions C_s = {(A,B): A ∈ P_m, B ∈ P_m,
B − A ∈ ∪_{j=0}^s P_j}.

Proof: Since B is positive definite, it follows that B^{−1} is
also positive definite, so that the diagonal elements of B^{−1}
are positive. Then we find that

    tr B^{−1}D = Σ_{i=1}^m (B^{−1})_ii d_i ≥ d_m Σ_{i=1}^m (B^{−1})_ii = d_m tr B^{−1}

        = d_m Σ_{i=1}^m ch_i(B^{−1}) = d_m Σ_{i=1}^m (ch_i(B))^{−1},

since ch_i(B^{−1}) = (ch_{m+1−i}(B))^{−1}. Hence, using the fact that
for any matrix X(m×m), tr X = Σ_{i=1}^m ch_i(X) and
|X| = Π_{i=1}^m ch_i(X), we see that

    φ(A,B;D,e,h) ≥ e Σ_{i=1}^m ((ch_i(A))^{−1} + ln(ch_i(A)))
        + h Σ_{i=1}^m (d_m (ch_i(B))^{−1} + ln(ch_i(B))).     (2.2.3)

From Lemma 2.2.2 we know that ch_i(B−A) < ch_i(B), since A is
positive definite. Then C_s can be written

    C_s = {(A,B): ch_i(A) > 0: i = 1,2,...,m; ch_i(B) > 0: i = 1,2,...,m;
          0 ≤ ch_i(B−A) < ch_i(B): i = 1,2,...,s;
          ch_i(B−A) = 0: i = s+1,...,m; A = A', B = B'}.

The closure, C̄_s, of C_s is

    C̄_s = {(A,B): ch_i(A) ≥ 0: i = 1,2,...,m; ch_i(B) ≥ 0: i = 1,2,...,m;
          0 ≤ ch_i(B−A) ≤ ch_i(B): i = 1,2,...,s;
          ch_i(B−A) = 0: i = s+1,...,m; A = A', B = B'}.

Since φ(A,B;D,e,h) > 0, it has an absolute minimum over C̄_s,
since C̄_s is closed. Note that from Lemma 2.2.2 if ch_i(B−A) = ch_i(B)
for some i, then it must be true that ch_m(A) = 0, since A must then be
positive semidefinite. Thus, for every (A,B) ∈ C̄_s − C_s it
must be true that ch_m(A) = 0 or ch_m(B) = 0 or both. It then
follows from (2.2.3) that φ(A,B;D,e,h) = ∞ whenever (A,B)
∈ C̄_s − C_s. Hence, φ(A,B;D,e,h) has an absolute minimum
over C_s.
Lemma 2.2.4: Suppose the function f(x), minimized
over x ∈ S, achieves a minimum at x = a. Let the set S1
be such that for any x ∈ S−S1, there exists an x1 ∈ S1
such that f(x1) < f(x). Similarly, let the set S2 be such
that for any x ∈ S−S2, there exists an x2 ∈ S2 such that
f(x2) < f(x). Then it follows that a ∈ S1 ∩ S2.

Proof: Suppose a ∉ S1 ∩ S2. Then either a ∉ S1 or a ∉ S2
or both. However, if a ∉ S1, then a ∈ S−S1, and there exists
no x1 ∈ S1 such that f(x1) < f(a), since f is minimized at a.
This then is a contradiction, so it must be true that a ∈ S1.
Similarly, if a ∉ S2, then a ∈ S−S2, and there exists no
x2 ∈ S2 such that f(x2) < f(a). This also is a contradiction,
so it must be true that a ∈ S2. Hence, it follows that
a ∈ S1 ∩ S2.
In Lemma 2.2.3 it was seen that the function φ(A,B;D,e,h)
has an absolute minimum over the set C_s. We will now show
that this absolute minimum will occur only when both A and B
are diagonal.

Lemma 2.2.5: The absolute minimum of φ(A,B;D,e,h)
subject to (A,B) ∈ C_s occurs when both A and B are diagonal.
We offer two proofs.
Proof 1: Define the sets S1 and S2 as follows:

    S1 = {(A,B) ∈ C_s: A is diagonal},
    S2 = {(A,B) ∈ C_s: B is diagonal}.

We want to show that if φ(A,B;D,e,h) achieves a minimum at
(A,B) = (A*,B*), then (A*,B*) ∈ S1 ∩ S2. Now with
Ã = D^{-1/2}AD^{-1/2} and B̃ = D^{-1/2}BD^{-1/2},

    φ(A,B;D,e,h) = e[tr A^{-1} + ln|A|] + h[tr B^{-1}D + ln|B|]
                 = e[tr Ã^{-1}D^{-1} + ln|Ã|] + h[tr B̃^{-1} + ln|B̃|] + (e+h)ln|D|
                 = φ(B̃,Ã;D^{-1},h,e) + (e+h)ln|D|.

Note that since D is positive definite, (A,B) ∈ C_s if and
only if (D^{-1/2}AD^{-1/2}, D^{-1/2}BD^{-1/2}) = (Ã,B̃) ∈ C_s. Thus, minimizing
φ(A,B;D,e,h) subject to (A,B) ∈ C_s is equivalent to minimiz-
ing φ(B̃,Ã;D^{-1},h,e) subject to (Ã,B̃) ∈ C_s. Moreover, if
(Ã*,B̃*) minimizes φ(B̃,Ã;D^{-1},h,e), then (D^{1/2}Ã*D^{1/2}, D^{1/2}B̃*D^{1/2}) min-
imizes φ(A,B;D,e,h). Now arbitrarily fix (Ã,B̃) ∈ C_s and
consider φ(PB̃P',PÃP';D^{-1},h,e) for all orthogonal P. Clearly
the terms ln|PÃP'|, tr PB̃^{-1}P', and ln|PB̃P'| are constant for
all orthogonal P, so that φ(PB̃P',PÃP';D^{-1},h,e) is minimized
with respect to P when tr PÃ^{-1}P'D^{-1} is minimized. It follows
from Lemma 2.2.1 that all the stationary points, and thus the
absolute minimum, occur when PÃP' is diagonal. Hence, for
any (Ã,B̃) ∈ C_s − S1 there exists an (Ã1,B̃1) ∈ S1 such that

    φ(B̃1,Ã1;D^{-1},h,e) < φ(B̃,Ã;D^{-1},h,e).

But since A = D^{1/2}ÃD^{1/2}, we know that A is diagonal if and only
if Ã is diagonal. So we find that for any (A,B) ∈ C_s − S1,
there exists an (A1,B1) ∈ S1 such that

    φ(A1,B1;D,e,h) < φ(A,B;D,e,h).

In a similar manner now arbitrarily fix (A,B) ∈ C_s and con-
sider φ(PAP',PBP';D,e,h) for all orthogonal P. Clearly this
is minimized with respect to P when tr PB^{-1}P'D is minimized,
since the terms tr PA^{-1}P', ln|PAP'|, and ln|PBP'| are con-
stant for all orthogonal P. So from Lemma 2.2.1 it follows
that all the stationary points, and therefore the absolute
minimum, of φ(PAP',PBP';D,e,h) occur when PBP' is diagonal.
This implies that for any (A,B) ∈ C_s − S2, there exists an
(A2,B2) ∈ S2 such that

    φ(A2,B2;D,e,h) < φ(A,B;D,e,h).

The result now follows from Lemma 2.2.4. Furthermore, from
Lemma 2.2.1 we see that if (A*,B*) minimizes φ(A,B;D,e,h),
then the diagonal elements of D^{-1/2}A*D^{-1/2} are increasing and
the diagonal elements of B* are decreasing.
The second proof of Lemma 2.2.5 utilizes the concept of
"majorization" (see Marshall and Olkin [1974]).

Definition 2.2.6: Let x and y be real mx1 vectors
with ith elements x_i and y_i, respectively, and ith largest
elements x_{(i)} and y_{(i)}, respectively. We say that x major-
izes y, and write x ≻ y, if

    Σ_{i=1}^s x_{(i)} ≥ Σ_{i=1}^s y_{(i)} for s = 1,2,...,m,

with equality when s = m.
We will need some results which, while well known to
workers in the area of majorization, may not be readily
accessible to others. We prove the results here for the
benefit of the uninitiated reader.
Lemma 2.2.7: If S(mxm) is doubly stochastic, then
x ≻ Sx = y.

Proof: Since S is doubly stochastic, it follows that
s_ij ≥ 0 for all (i,j), and

    Σ_{j=1}^m s_ij = 1 for i = 1,2,...,m,
    Σ_{i=1}^m s_ij = 1 for j = 1,2,...,m.

Thus, for 1 ≤ t ≤ m there exist k_1,k_2,...,k_t such that

    Σ_{i=1}^t y_{(i)} = Σ_{j=1}^m (s_{k_1 j} + s_{k_2 j} + ... + s_{k_t j}) x_j.

Clearly when t < m,

    s_{k_1 j} + s_{k_2 j} + ... + s_{k_t j} ≤ Σ_{i=1}^m s_ij = 1
    for j = 1,2,...,m,

while

    Σ_{j=1}^m (s_{k_1 j} + s_{k_2 j} + ... + s_{k_t j}) = t.

Then when t < m,

    Σ_{i=1}^t y_{(i)} = Σ_{j=1}^m (s_{k_1 j} + s_{k_2 j} + ... + s_{k_t j}) x_j
                      ≤ Σ_{i=1}^t x_{(i)}.

If t = m, then

    s_{k_1 j} + s_{k_2 j} + ... + s_{k_t j} = Σ_{i=1}^m s_ij = 1,

so that

    Σ_{i=1}^t y_{(i)} = Σ_{j=1}^m (s_{k_1 j} + s_{k_2 j} + ... + s_{k_t j}) x_j
                      = Σ_{j=1}^m x_j = Σ_{i=1}^t x_{(i)}.
m
Lemma 2.2.8: If x ≻ y and σ_(1) ≥ σ_(2) ≥ ... ≥ σ_(m) ≥ 0,
then

    Σ_{i=1}^m x_{(i)}σ_{(i)} ≥ Σ_{i=1}^m y_{(i)}σ_{(i)}.

Proof: Put d_i = x_{(i)} − y_{(i)}. Then

    Σ_{i=1}^m (x_{(i)} − y_{(i)})σ_{(i)} = Σ_{i=1}^m d_iσ_{(i)}
        = d_1(σ_(1) − σ_(2)) +
          (d_1+d_2)(σ_(2) − σ_(3)) +
          (d_1+d_2+d_3)(σ_(3) − σ_(4)) + ... +
          (d_1+d_2+...+d_{m-1})(σ_{(m-1)} − σ_(m)) +
          (d_1+d_2+...+d_m)σ_(m).

The last term is zero, since

    Σ_{i=1}^m d_i = Σ_{i=1}^m (x_{(i)} − y_{(i)}) = Σ_{i=1}^m x_{(i)} − Σ_{i=1}^m y_{(i)} = 0.

The partial sums are nonnegative, since

    Σ_{i=1}^t d_i = Σ_{i=1}^t x_{(i)} − Σ_{i=1}^t y_{(i)} ≥ 0.

Further, the differences σ_(1)−σ_(2), σ_(2)−σ_(3), ..., σ_{(m-1)}−σ_(m)
are nonnegative. Hence, the result follows.
Lemma 2.2.9, Corollary: If x is an ordered vector,
that is, x_1 ≥ x_2 ≥ ... ≥ x_m, S is doubly stochastic, and
σ is also an ordered vector, then x'σ ≥ (Sx)'σ.
m
Lemma 2.2.10: If x ≻ y and σ_(1) ≥ σ_(2) ≥ ... ≥ σ_(m) ≥ 0,
then

    Σ_{i=1}^m x_{(i)}σ_{(m+1-i)} ≤ Σ_{i=1}^m y_{(i)}σ_{(m+1-i)}.

Proof: The proof is similar to that of Lemma 2.2.8.
Letting d_i = x_{(i)} − y_{(i)}, we have

    Σ_{i=1}^m (x_{(i)} − y_{(i)})σ_{(m+1-i)} = Σ_{i=1}^m d_iσ_{(m+1-i)}
        = d_1(σ_(m) − σ_{(m-1)}) +
          (d_1+d_2)(σ_{(m-1)} − σ_{(m-2)}) +
          (d_1+d_2+d_3)(σ_{(m-2)} − σ_{(m-3)}) + ... +
          (d_1+d_2+...+d_{m-1})(σ_(2) − σ_(1)) +
          (d_1+d_2+...+d_m)σ_(1).

We have seen that the partial sums, Σ_{i=1}^t d_i: t = 1,...,m−1, are
nonnegative and that Σ_{i=1}^m d_i is zero, so that the last term is zero.
Further, the differences σ_(m)−σ_{(m-1)}, σ_{(m-1)}−σ_{(m-2)}, ...,
σ_(2)−σ_(1) are negative or zero. Hence, the result follows.
Lemma 2.2.11, Corollary: If x is an ordered vector,
and y = Sx with S doubly stochastic, then

    Σ_{i=1}^m x_iσ_{(m+1-i)} ≤ Σ_{i=1}^m y_iσ_{(m+1-i)}.

Furthermore, if σ_(1) > σ_(2) > ... > σ_(m), then there is
equality only if y = x.
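A quick numerical illustration of Lemmas 2.2.7, 2.2.9, and 2.2.11 (a sketch of ours; the doubly stochastic S is built as an average of permutation matrices, which is doubly stochastic by construction):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 5

# a doubly stochastic S: average of random permutation matrices
perms = [np.eye(m)[rng.permutation(m)] for _ in range(10)]
S = sum(perms) / len(perms)

x = np.sort(rng.standard_normal(m))[::-1]   # an ordered vector
y = S @ x

# Lemma 2.2.7: x majorizes y = Sx
xs, ys = np.sort(x)[::-1], np.sort(y)[::-1]
for t in range(1, m + 1):
    assert xs[:t].sum() >= ys[:t].sum() - 1e-9
assert np.isclose(xs.sum(), ys.sum())       # equality when t = m

# ordered nonnegative weights sigma_(1) >= ... >= sigma_(m) >= 0
sigma = np.sort(rng.uniform(0, 1, m))[::-1]
assert x @ sigma >= y @ sigma - 1e-9              # Lemma 2.2.9
assert x @ sigma[::-1] <= y @ sigma[::-1] + 1e-9  # Lemma 2.2.11
```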
We are now ready for the second proof of Lemma 2.2.5.
Recall that we need to show that the absolute minimum of
φ(A,B;D,e,h) subject to (A,B) ∈ C_s occurs when both A and B
are diagonal.

Proof 2 (Lemma 2.2.5): Let S1 and S2 be defined as before;
that is,

    S1 = {(A,B) ∈ C_s: A is diagonal},
    S2 = {(A,B) ∈ C_s: B is diagonal},

and recall that we need to show that if φ(A,B;D,e,h) is min-
imized at (A*,B*), then (A*,B*) ∈ S1 ∩ S2. Let
β_1 ≥ β_2 ≥ ... ≥ β_m > 0 be the latent roots of B, and let P be
orthogonal with PBP' = diag(β_1,β_2,...,β_m). Then

    {φ(PAP',PBP';D,e,h) − φ(A,B;D,e,h)}/h
        = tr PB^{-1}P'D − tr B^{-1}D
        = Σ_{j=1}^m β_{m+1-j}^{-1}d_{m+1-j} − Σ_{j=1}^m d_{m+1-j} Σ_{i=1}^m p_{m+1-i,m+1-j}² β_{m+1-i}^{-1}
        = Σ_{j=1}^m x_jd_{m+1-j} − Σ_{j=1}^m y_jd_{m+1-j},        (2.2.4)

where x = (β_m^{-1},β_{m-1}^{-1},...,β_1^{-1})', y = P_2x, and P_2 is the matrix
with (i,j)th element p_{m+1-j,m+1-i}². Since PP' = P'P = I,
we see that P_2 is doubly stochastic. Also d = (d_1,d_2,...,d_m)'
and x are ordered vectors, so by Lemma 2.2.11, equation (2.2.4)
is not positive. Furthermore, d_1 > d_2 > ... > d_m, so that

    φ(PAP',PBP';D,e,h) ≤ φ(A,B;D,e,h),

with equality holding only when B = diag(β_1,β_2,...,β_m).
Therefore, for any (A,B) ∈ C_s − S2 there exists an (A2,B2) ∈ S2
such that

    φ(A2,B2;D,e,h) < φ(A,B;D,e,h).

Now with Ã = D^{-1/2}AD^{-1/2} and B̃ = D^{-1/2}BD^{-1/2},

    φ(A,B;D,e,h) = e[tr Ã^{-1}D^{-1} + ln|Ã|] + h[tr B̃^{-1} + ln|B̃|]
                 + (e+h)ln|D|                                        (2.2.5)
                 = φ(B̃,Ã;D^{-1},h,e) + (e+h)ln|D|.

Let α_1 ≥ α_2 ≥ ... ≥ α_m > 0 be the latent roots of Ã,
and let Q be orthogonal with QÃQ' = diag(α_m,α_{m-1},...,α_1). Then
by an argument identical to the previous one we find that

    φ(QB̃Q',QÃQ';D^{-1},h,e) ≤ φ(B̃,Ã;D^{-1},h,e),

with equality holding only when Ã = diag(α_m,α_{m-1},...,α_1).
From (2.2.5) it follows that

    φ(D^{1/2}QÃQ'D^{1/2}, D^{1/2}QB̃Q'D^{1/2};D,e,h) ≤ φ(D^{1/2}ÃD^{1/2}, D^{1/2}B̃D^{1/2};D,e,h),

with equality holding only when Ã = diag(α_m,α_{m-1},...,α_1).
Note that D^{1/2}QÃQ'D^{1/2} = diag(d_1α_m, d_2α_{m-1}, ..., d_mα_1). Thus, for any
(A,B) ∈ C_s − S1 there exists an (A1,B1) ∈ S1 such that

    φ(A1,B1;D,e,h) < φ(A,B;D,e,h).

The result now follows from Lemma 2.2.4.
Lemma 2.2.12, Corollary: Let R be some restriction on
the latent roots of A or B or both, and let C_s^R be the subset
of C_s such that (A,B) ∈ C_s^R implies that R is satisfied.
Since (A,B) ∈ C_s^R if and only if (PAP',PBP') ∈ C_s^R for any
orthogonal P, it follows that the minimal value of
φ(A,B;D,e,h) over (A,B) ∈ C_s^R occurs when A and B are diagonal.
For example, if the latent roots of A were known to be pro-
portional to a given set, then the minimal value of
φ(A,B;D,e,h) over (A,B) ∈ C_s^R occurs when A is a diagonal
matrix with diagonal elements proportional to this set.
2.3 The Maximum Likelihood Estimates

In this section we seek the maximum likelihood estimates
of Σ and M subject to the constraints Σ ∈ P_m and M ∈ ∪_{j=0}^s P_j.
Recall that the likelihood function of (Σ,M) is

    f(E,H) = K_m(I,e)K_m(I,h) (|H|^{h-m-1}|E|^{e-m-1} / (|Σ+M|^h|Σ|^e))
             × exp[−tr Σ^{-1}E − tr(Σ+M)^{-1}H].

The logarithm of the likelihood function, omitting a function
of the observations, is

    −tr Σ^{-1}E − e ln|Σ| − tr(Σ+M)^{-1}H − h ln|Σ+M|.

We seek the solution, (Σ̂,M̂), which maximizes the above equa-
tion, or equivalently, the solution which minimizes

    tr Σ^{-1}E + e ln|Σ| + tr(Σ+M)^{-1}H + h ln|Σ+M|        (2.3.1)

subject to Σ ∈ P_m and M ∈ ∪_{j=0}^s P_j.

Let E* = (1/e)E and H* = (1/h)H. Note that since E* and
H* are both symmetric matrices, and E* ∈ P_m and H* ∈ ∪_{j=0}^m P_j,
there exists a nonsingular matrix K(mxm) such that
KE*K' = I and KH*K' = D, where D = diag(d_1,d_2,...,d_m), and
d_1 > d_2 > ... > d_m > 0 are the latent roots of H*E*^{-1}.
Then with Σ̃ = KΣK' and M̃ = KMK', (2.3.1) can be rewritten

    e tr Σ̃^{-1}KE*K' + e ln|Σ̃| + h tr(Σ̃+M̃)^{-1}KH*K' + h ln|Σ̃+M̃| − (e+h)ln|K|²
        = e[tr Σ̃^{-1} + ln|Σ̃|] + h[tr(Σ̃+M̃)^{-1}D + ln|Σ̃+M̃|] − (e+h)ln|K|²
        = φ(Σ̃,Σ̃+M̃;D,e,h) − (e+h)ln|K|².

Thus, the problem has been reduced to that of minimizing
φ(Σ̃,Σ̃+M̃;D,e,h) subject to Σ̃ ∈ P_m and M̃ ∈ ∪_{j=0}^s P_j or,
equivalently, (Σ̃,Σ̃+M̃) ∈ C_s. But from Lemma 2.2.5 it is
known that the minimal solution to φ(Σ̃,Σ̃+M̃;D,e,h) is such
that Σ̃ and Σ̃+M̃ are diagonal, and in addition, it is known
that the diagonal elements of D^{-1/2}Σ̃D^{-1/2} are increasing while
the diagonal elements of Σ̃+M̃ are decreasing.
Consider the function

    g(x,y) = e(1/x + ln x) + h(d/y + ln y),        (2.3.2)

where d > 0. Differentiating (2.3.2) with respect to x and
y, we get the equations

    −1/x² + 1/x = 0,
    −d/y² + 1/y = 0,

which yield the minimal solution x_0 = 1 and y_0 = d. If
instead we wanted to minimize (2.3.2) subject to x = y,
(2.3.2) would reduce to

    g(x) = e(1/x + ln x) + h(d/x + ln x).        (2.3.3)

Then

    dg(x)/dx = e(−1/x² + 1/x) + h(−d/x² + 1/x) = 0,

so that x_1 = (e+dh)/(e+h) minimizes (2.3.3).
Now let

    f(d) = g(x_1,y_1) − g(x_0,y_0)    (with y_1 = x_1)
         = e((e+h)/(e+dh) + ln((e+dh)/(e+h)))
           + h(d(e+h)/(e+dh) + ln((e+dh)/(e+h))) − (e + h + h ln d)
         = (e+h)ln((e+dh)/(e+h)) − h ln d.

Differentiating f(d) with respect to d and noting that e ≥ 1,
h ≥ 1, we find that when d > 1,

    df(d)/dd = h(e+h)/(e+dh) − h/d = (dh(e+h) − h(e+dh))/((e+dh)d)
             = eh(d−1)/((e+dh)d) > 0.

In other words, the difference g(x_1,y_1) − g(x_0,y_0) is an
increasing function of d when d > 1.
Now with X = diag(x_1,x_2,...,x_m) and Y = diag(y_1,y_2,...,y_m),
consider minimizing

    φ(X,Y;D,e,h) = e Σ_{i=1}^m (1/x_i + ln x_i) + h Σ_{i=1}^m (d_i/y_i + ln y_i)   (2.3.4)

subject to (X,Y) ∈ C_s, which in this case implies that
y_i ≥ x_i > 0 for all i, and x_i = y_i for at least m−s of the
i's. Suppose that d_1 > d_2 > ... > d_r > 1 > d_{r+1} > ... > d_m > 0.
Using the fact that f(d) is increasing in d for d > 1,
it then follows that the minimal solution to (2.3.4) is
(X_s,Y_s), where if r ≥ s,

    x_si = y_si = (e+d_ih)/(e+h) for s+1 ≤ i ≤ m,
    x_si = 1, y_si = d_i for 1 ≤ i ≤ s,

and if r < s,

    x_si = y_si = (e+d_ih)/(e+h) for r+1 ≤ i ≤ m,
    x_si = 1, y_si = d_i for 1 ≤ i ≤ r.
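The case analysis above is mechanical and can be scripted. The following sketch computes (X_s,Y_s) from the roots d_i (the function name `xy_solution` is ours; the numerical values used below anticipate the worked example that follows, where e = 105 and h = 20):

```python
import numpy as np

def xy_solution(d, e, h, s):
    """Constrained minimizers (X_s, Y_s) of (2.3.4): y_i >= x_i > 0,
    with x_i = y_i for at least m - s of the i's; d must be decreasing."""
    d = np.asarray(d, float)
    r = int(np.sum(d > 1.0))          # number of roots exceeding 1
    k = min(r, s)                     # indices allowed to split
    x = (e + d * h) / (e + h)         # pooled solution where x_i = y_i
    y = x.copy()
    x[:k] = 1.0                       # unconstrained minimum of e(1/x + ln x)
    y[:k] = d[:k]                     # unconstrained minimum of h(d/y + ln y)
    return x, y

# latent roots of H*E*^{-1} from the example, with e = 105, h = 20
d = [142.729, 29.6669, 0.91847, 0.625404]
x0, y0 = xy_solution(d, 105, 20, 0)
x1, y1 = xy_solution(d, 105, 20, 1)
x2, y2 = xy_solution(d, 105, 20, 2)
print(np.round(x0, 4))   # pooled values (e + d_i h)/(e + h)
```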
Thus, φ(Σ̃,Σ̃+M̃;D,e,h) is minimized subject to
(Σ̃,Σ̃+M̃) ∈ C_s at Σ̃ = X_s and M̃ = Y_s − X_s, so that the maximum
likelihood estimates of Σ and M are

    Σ̂ = K^{-1}X_sK'^{-1},
    M̂ = K^{-1}(Y_s − X_s)K'^{-1}.
We now present an example to illustrate the computation
involved in deriving the maximum likelihood estimates.
Consider model (2.1.1) in which we take m = 4, g = 21,
n = 6, Σ = I, and M = diag(99,24,0,0). Hence, e = g(n−1) =
105 and h = g−1 = 20. Generating a matrix E from the dis-
tribution W_4(I,105,0) and a matrix H from the distribution
W_4(I+M,20,0), we obtain (showing the upper triangles of the
symmetric matrices)

    E = [ 69.1329   4.07476   5.12762    9.94924 ]
        [          127.055    3.77638   20.4629  ]
        [                    116.342     8.12511 ]
        [                               100.186  ]

    H = [ 1845.85   63.5986   16.5227    1.43363  ]
        [           688.962    1.14908   8.61601  ]
        [                      20.1453    .0100181 ]
        [                                12.2617  ]

With E* = (1/105)E and H* = (1/20)H we need to find a non-
singular matrix K such that KE*K' = I and KH*K' = D, where
D is a diagonal matrix. Let D_1 = diag(ch_1(E*),...,ch_4(E*)),
and let P be the orthogonal matrix for which the ith column
is the characteristic vector of E* corresponding to ch_i(E*);
then, since E* is symmetric, P'E*P = D_1. Similarly, let
D = diag(ch_1(D_1^{-1/2}P'H*PD_1^{-1/2}),...,ch_4(D_1^{-1/2}P'H*PD_1^{-1/2})), and let Q
be the orthogonal matrix for which the ith column is the
characteristic vector of D_1^{-1/2}P'H*PD_1^{-1/2} corresponding to
ch_i(D_1^{-1/2}P'H*PD_1^{-1/2}); then, since D_1^{-1/2}P'H*PD_1^{-1/2} is symmetric,
Q'D_1^{-1/2}P'H*PD_1^{-1/2}Q = D. Thus we may take K = Q'D_1^{-1/2}P'. Using
the above decomposition for K, we find that, for our example,

    K = [ 1.24522     .0464049   .0380069   .130133 ]
        [  .0181884   .925978    .0477042   .213476 ]
        [  .00831611  .00767896  .940375    .237376 ]
        [  .000914637 .0158712   .153048    .994866 ]

and D = diag(142.729, 29.6669, .91847, .625404). Note that
d_2 > 1 and d_3 < 1, so that r = 2. Simple calculation yields

    X_0 = Y_0 = diag(23.6766, 5.5867, .986955, .940065),
    X_1 = diag(1, 5.5867, .986955, .940065),
    Y_1 = diag(142.729, 5.5867, .986955, .940065),
    X_2 = X_3 = X_4 = diag(1, 1, .986955, .940065),
    Y_2 = Y_3 = Y_4 = diag(142.729, 29.6669, .986955, .940065).
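The reduction KE*K' = I, KH*K' = D used above is just two symmetric eigendecompositions chained together; a sketch (the helper name `simultaneous_reduce` is ours, and random matrices stand in for the E* and H* of the example):

```python
import numpy as np

def simultaneous_reduce(E_star, H_star):
    """Nonsingular K with K E* K' = I and K H* K' = D (diagonal),
    D holding the latent roots of H* E*^{-1} in decreasing order."""
    w1, P = np.linalg.eigh(E_star)              # P' E* P = D_1
    D1_inv_half = np.diag(w1 ** -0.5)
    mid = D1_inv_half @ P.T @ H_star @ P @ D1_inv_half
    w2, Q = np.linalg.eigh(mid)                 # Q' mid Q = diag(w2)
    order = np.argsort(w2)[::-1]                # d_1 > d_2 > ...
    Q = Q[:, order]
    K = Q.T @ D1_inv_half @ P.T                 # K = Q' D_1^{-1/2} P'
    return K, w2[order]

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 8)); E_star = A @ A.T / 8
B = rng.standard_normal((4, 8)); H_star = B @ B.T / 8
K, d = simultaneous_reduce(E_star, H_star)
assert np.allclose(K @ E_star @ K.T, np.eye(4), atol=1e-8)
assert np.allclose(K @ H_star @ K.T, np.diag(d), atol=1e-8)
```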
Hence, if we let Σ̂_s and M̂_s be the maximum likelihood estimates
of Σ and M, respectively, subject to the constraints Σ ∈ P_4
and M ∈ ∪_{j=0}^s P_j, we find that (again showing upper triangles)

    Σ̂_0 = [ 15.3199    .541388   .173203    .0910632 ]
          [            6.52813   .0210184   .0947756 ]
          [                      1.0919     .064921  ]
          [                                 .899582  ]

    M̂_0 = (0),

while Σ̂_1 = K^{-1}X_1K'^{-1}, M̂_1 = K^{-1}(Y_1−X_1)K'^{-1}, and
Σ̂_2 = Σ̂_3 = Σ̂_4 = K^{-1}X_2K'^{-1} follow in the same way, with

    M̂_2 = M̂_3 = M̂_4 = [ 91.6383   3.13344    .788088    .0129987 ]
                      [           33.2548    .105117    .549569  ]
                      [                      .00730371  .002076  ]
                      [                                 .00909865 ]
Further commentary on these data will be made in
Sections 3.6, 4.2, and 5.4.
2.4 The Likelihood Ratio Test

Recall that C_s = {(A,B): A ∈ P_m, B ∈ P_m, B−A ∈ ∪_{j=0}^s P_j},
and suppose we know that (Σ,Σ+M) ∈ Ω = C_s. We wish to test,
say, the null hypothesis that (Σ,Σ+M) ∈ ω = C_{s-1} ⊂ C_s. The
alternative hypothesis is then that (Σ,Σ+M) ∈ Ω − ω =
C_s − C_{s-1}. Thus, we are testing the hypothesis

    H_0^{(s)}: rank(M) ≤ s−1

against the hypothesis

    H_1^{(s)}: rank(M) = s.
We adopt the likelihood approach and compare max_ω f(E,H)
with max_Ω f(E,H). Specifically, we look at

    λ = max_ω f(E,H) / max_Ω f(E,H) ∈ (0,1].

With the matrices X_s = diag(x_s1,x_s2,...,x_sm) and
Y_s = diag(y_s1,y_s2,...,y_sm) given by

    x_si = y_si = (e+d_ih)/(e+h) for s+1 ≤ i ≤ m,
    x_si = 1, y_si = d_i for 1 ≤ i ≤ s,

if r ≥ s, and

    x_si = y_si = (e+d_ih)/(e+h) for r+1 ≤ i ≤ m,
    x_si = 1, y_si = d_i for 1 ≤ i ≤ r,

if r < s, the maximum likelihood estimators, Σ̂_Ω of Σ and
M̂_Ω of M, when the parameters are restricted to lie within
Ω, are given by

    Σ̂_Ω = K^{-1}X_sK'^{-1},
    M̂_Ω = K^{-1}(Y_s − X_s)K'^{-1},

where K is a nonsingular matrix. Similarly, the maximum
likelihood estimators, Σ̂_ω of Σ and M̂_ω of M, when the
parameters are restricted to lie within ω, are given by

    Σ̂_ω = K^{-1}X_{s-1}K'^{-1},
    M̂_ω = K^{-1}(Y_{s-1} − X_{s-1})K'^{-1}.

It should be noted that if r < s, then X_s = X_{s-1} and
Y_s = Y_{s-1}, and if r ≥ s, x_si = x_{s-1,i} and y_si = y_{s-1,i}
only for i ≠ s.
The likelihood ratio, λ, is

    λ = max_ω f(E,H) / max_Ω f(E,H)
      = (exp[−tr Σ̂_ω^{-1}E − tr(Σ̂_ω+M̂_ω)^{-1}H] / exp[−tr Σ̂_Ω^{-1}E − tr(Σ̂_Ω+M̂_Ω)^{-1}H])
        × |Σ̂_Ω+M̂_Ω|^h|Σ̂_Ω|^e / (|Σ̂_ω+M̂_ω|^h|Σ̂_ω|^e)
      = (exp[−e tr X_{s-1}^{-1} − h tr Y_{s-1}^{-1}D] / exp[−e tr X_s^{-1} − h tr Y_s^{-1}D])
        × |Y_s|^h|X_s|^e / (|Y_{s-1}|^h|X_{s-1}|^e)
      = |Y_s|^h|X_s|^e / (|Y_{s-1}|^h|X_{s-1}|^e),

since

    e tr(X_{s-1}^{-1} − X_s^{-1}) + h tr(Y_{s-1}^{-1} − Y_s^{-1})D = 0.        (2.4.1)

Indeed, if r < s, then X_s = X_{s-1} and Y_s = Y_{s-1}, so (2.4.1)
holds trivially; and if r ≥ s, the left side of (2.4.1) reduces
to its sth term,

    e(x_{s-1,s}^{-1} − x_{s,s}^{-1}) + h(y_{s-1,s}^{-1} − y_{s,s}^{-1})d_s
      = e((e+h)/(e+d_sh) − 1) + h((e+h)/(e+d_sh) − d_s^{-1})d_s
      = (e+d_sh)(e+h)/(e+d_sh) − (e+h) = 0.
So we have, since d_1 > d_2 > ... > d_r > 1 > d_{r+1} > ... > d_m > 0
and, clearly, r ≥ s if and only if d_s > 1,

    λ = d_s^h ((e+d_sh)/(e+h))^{-(e+h)}    if d_s > 1,
    λ = 1                                  if 0 < d_s ≤ 1.

Now upon taking the derivative of λ with respect to d_s over
the range d_s > 1, we get

    dλ/dd_s = h d_s^{h-1}((e+d_sh)/(e+h))^{-(e+h)} − h d_s^h((e+d_sh)/(e+h))^{-(e+h)-1}
            = h d_s^{h-1}((e+d_sh)/(e+h))^{-(e+h)-1} e(1−d_s)/(e+h),

which is negative for d_s > 1. Thus, λ is a decreasing
function of d_s over the range d_s > 1. In addition,

    d_s^h ((e+d_sh)/(e+h))^{-(e+h)} ≤ 1 for d_s ≥ 1,

with equality when d_s = 1, so that λ is a decreasing func-
tion of d_s.
The likelihood ratio test rejects H_0^{(s)} for small values
of λ. Since λ is a decreasing function of d_s, the likeli-
hood ratio test rejects H_0^{(s)} for large values of d_s. Now
recall that with H* = (1/h)H and E* = (1/e)E, there exists
a nonsingular matrix K such that KH*K' = D and KE*K' = I.
It follows then that d_i: i = 1,2,...,m are the solutions to

    |H* − dE*| = 0,                                        (2.4.2)

since |KH*K' − dKE*K'| = |D − dI| = 0 has the same solutions;
in particular, d_s is the sth largest solution to (2.4.2).
With φ_i = d_ih/e: i = 1,2,...,m, (2.4.2) can be written

    |(1/h)H − (φ/h)E| = 0,

or

    |H − φE| = 0.                                          (2.4.3)

Hence, we would reject H_0^{(s)} for large values of φ_s = d_sh/e,
where φ_s is the sth largest solution to (2.4.3). It is of
particular importance to recall that H ~ W_m(Σ+M,h,0) and
E ~ W_m(Σ,e,0), independently.
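Computationally, (2.4.3) is a generalized symmetric eigenproblem. A sketch (our own function name; the data are simulated under the null with Σ = I and M = 0, so rank(M) = 0):

```python
import numpy as np
from scipy.linalg import eigh

def phi_roots(H, E):
    """Solutions of |H - phi E| = 0 in decreasing order (E positive definite)."""
    return np.sort(eigh(H, E, eigvals_only=True))[::-1]

rng = np.random.default_rng(3)
m, e, h = 4, 105, 20
Z = rng.standard_normal((m, e)); E = Z @ Z.T        # E ~ W_m(I, e, 0)
W = rng.standard_normal((m, h)); H = W @ W.T        # H ~ W_m(I, h, 0)
phi = phi_roots(H, E)
d = phi * e / h                                     # d_i = phi_i e / h

# the likelihood ratio as a function of d_s (here s = 1)
ds = d[0]
lam = ds**h * ((e + h) / (e + ds * h))**(e + h) if ds > 1 else 1.0
print(phi, lam)
```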
We have seen that the likelihood ratio test rejects H_0^{(s)}
when φ_s > c for some constant c. Now we want to choose for
the constant c some number, which we will denote by c(α,m,s)
to indicate its dependence upon α, m, and s, such that
P(φ_s > c(α,m,s) | (Σ,M)) ≤ α for all (Σ,Σ+M) ∈ C_{s-1}. For
c(α,m,s) we propose the α level critical value for the
largest root, θ_1, from amongst the m−s+1 roots of |W_1 − θW_2| = 0,
where W_1 ~ W_{m-s+1}(I,h−s+1,0) and W_2 ~ W_{m-s+1}(I,e,0), inde-
pendently. That is, we take c(α,m,s) such that
P(θ_1 > c(α,m,s)) = α. Justification for this choice of
c(α,m,s) will be given in the next chapter.
CHAPTER 3
PROPERTIES OF THE sth LARGEST ROOT TEST

3.1 Introduction

In this chapter we investigate some properties of the
sth largest root test presented in the previous chapter.
It would be desirable to show that this test is the uni-
formly most powerful test, but we were unable to do so for
general m. However, in Section 3.2 we show that for m = 1
the test is uniformly most powerful. Also, in Sections 3.3
and 3.4 it is shown that the sth largest root test is an
invariant test of H_0^{(s)} against H_1^{(s)} and is the test obtained
by the union-intersection principle (see Roy [1953]). Finally,
in the last two sections we discuss an important monotonicity
property of the roots φ_i: i = 1,2,...,m, and then use this
property in deriving the asymptotic distribution of φ_s.
3.2 The Uniformly Most Powerful
Test for m = 1

For m = 1 the problem reduces to that of the univariate
random effects model discussed in Section 1.1. Recall that
we have x̄.. ~ N(μ,(σ_z²+nσ_a²)/gn), u ~ σ_z²χ²_e, and
v ~ (σ_z²+nσ_a²)χ²_h, independently, where μ, σ_z², and σ_a² are all
unknown, and we wish to test the hypothesis H_0: σ_a² = 0 against
H_1: σ_a² > 0.
Suppose that for some set of points, γ', in the space
of (x̄..,u,v), we reject H_0 whenever the experimental (x̄..,u,v)
belongs to γ'. Let B(γ';μ,σ_z²,σ_a²) = P[(x̄..,u,v) ∈ γ' | μ,σ_z²,σ_a²]
and require that γ' be such that B(γ';μ,σ_z²,0) = α_0. Let
x = u + v and y = v/u, so that u = x/(y+1) and v = xy/(y+1).
Then the Jacobian, |J|, of x̄.., u, and v with respect to
x̄.., x, and y is |J| = |∂(x̄..,u,v)/∂(x̄..,x,y)| = x/(y+1)².
Then

    B(γ';μ,σ_z²,σ_a²) = ∫∫∫_{γ'} g_1(u;σ_z²)g_2(v;σ_z²,σ_a²)g_3(x̄..;μ,σ_z²,σ_a²) du dv dx̄..
                     = ∫∫∫_{γ} f(x,y;σ_z²,σ_a²)f_0(x̄..;μ,σ_z²,σ_a²) dx dy dx̄..,

where γ = {(x̄..,x,y): (x̄..,u,v) ∈ γ'} and where, independently,

    u ~ σ_z²χ²_e,
    v ~ (σ_z²+nσ_a²)χ²_h,
    x̄.. ~ N(μ,(σ_z²+nσ_a²)/gn),

so that

    f(x,y;σ_z²,σ_a²) = f(x,y)
      = [x^{(e+h)/2−1} exp[−x/2(σ_z²+nσ_a²)] / ((2(σ_z²+nσ_a²))^{(e+h)/2} Γ((e+h)/2))]
        × [(1+nσ_a²/σ_z²)^{e/2} y^{h/2−1}(y+1)^{−(e+h)/2}
           exp[−xnσ_a²/2σ_z²(σ_z²+nσ_a²)(y+1)] / B(e/2,h/2)],

and (x,y) is independent of x̄... We note that when σ_a² = 0,
x and y are independent; that is,

    f(x,y;σ_z²,0) = f_1(x;σ_z²)f_2(y) = f_1(x)f_2(y),

where

    x ~ σ_z²χ²_{e+h},
    y ~ (h/e)F_{h,e}.
Letting γ(x,x̄..) = {y: (x̄..,x,y) ∈ γ}, we can write

    B(γ';μ,σ_z²,0) = ∫∫ ∫_{γ(x,x̄..)} f_1(x)f_2(y)f_0(x̄..) dy dx dx̄..
                   = ∫∫ f_1(x)f_0(x̄..) [∫_{γ(x,x̄..)} f_2(y) dy] dx dx̄...

Putting

    h(x,x̄..;σ_z²) = h(x,x̄..) = ∫_{γ(x,x̄..)} f_2(y) dy,

we see that

    B(γ';μ,σ_z²,0) = ∫∫ f_1(x)f_0(x̄..)h(x,x̄..) dx dx̄...

When σ_a² = 0, (x,x̄..) is sufficient for (σ_z²,μ). Further,
{f_1(x)f_0(x̄..): −∞ < μ < ∞, σ_z² > 0} is a complete family (see,
for example, Lehmann [1959:130]). Thus, since B(γ';μ,σ_z²,0)
= α_0, we must have h(x,x̄..) = α_0.
Now let γ* = {(x̄..,u,v): y = v/u > c}, where c is some
constant chosen to give size α_0. Then with q = (q_1,q_2)', where
q_1 ~ (σ_z²+nσ_a²)χ²_{e+h} and q_2 ~ N(μ,(σ_z²+nσ_a²)/gn), independently,

    [B(γ*;μ,σ_z²,σ_a²) − B(γ';μ,σ_z²,σ_a²)](1+nσ_a²/σ_z²)^{−e/2} = E(d(q;σ_z²,σ_a²)),

where the expectation is with respect to the distribution
of q. Here

    d(q;σ_z²,σ_a²) = ∫_{γ*()} f_2(y)Q(y,q_1) dy − ∫_{γ()} f_2(y)Q(y,q_1) dy,

where γ*() = γ*(q_1,q_2), γ() = γ(q_1,q_2), and

    Q(y,q_1) = exp[−q_1nσ_a²/2σ_z²(σ_z²+nσ_a²)(y+1)].

Therefore,

    d(q;σ_z²,σ_a²) = ∫_{γ*()−γ()} f_2(y)Q(y,q_1) dy − ∫_{γ()−γ*()} f_2(y)Q(y,q_1) dy.

Since

    Q(y,q_1) > Q(c,q_1) when y ∈ γ*()−γ(),
    Q(y,q_1) ≤ Q(c,q_1) when y ∈ γ()−γ*(),

we find that

    d(q;σ_z²,σ_a²) ≥ Q(c,q_1)[∫_{γ*()−γ()} f_2(y) dy − ∫_{γ()−γ*()} f_2(y) dy]
                   = Q(c,q_1)[∫_{γ*()} f_2(y) dy − ∫_{γ()} f_2(y) dy]
                   = Q(c,q_1)[α_0 − α_0] = 0.

Thus,

    E(d(q;σ_z²,σ_a²)) ≥ 0,

so that

    B(γ*;μ,σ_z²,σ_a²) ≥ B(γ';μ,σ_z²,σ_a²).

Therefore, amongst all critical regions of size α_0 the crit-
ical region which rejects H_0 when v/u > c is uniformly most
powerful in a test of H_0: σ_a² = 0 against H_1: σ_a² > 0. That
is, the critical region φ > c, where φ is the only root of
|v − φu| = 0, is uniformly most powerful.
3.3 An Invariance Property

Consider the group of transformations G = {g_K: K(mxm)
is nonsingular}, where g_K(E,H) = (KEK',KHK'). Since
E ~ W_m(Σ,e,0) and H ~ W_m(Σ+M,h,0), it follows that
KEK' ~ W_m(KΣK',e,0), KHK' ~ W_m(KΣK'+KMK',h,0), and rank
(KMK') = rank(M). Thus, the problem of testing the hypoth-
esis H_0^{(s)}: rank(M) ≤ s−1 against H_1^{(s)}: rank(M) = s is
invariant under the group G.
We will need the following definition.

Definition 3.3.1: Let X be a space and G a group of
transformations on X. A function T(x) on X is said to be
a maximal invariant with respect to G if

    a) T(g(x)) = T(x) for all x ∈ X and g ∈ G;
    b) T(x_1) = T(x_2) implies x_1 = g(x_2) for some g ∈ G.

We will also need the following well-known result (see,
for example, Lehmann [1959:216]).

Lemma 3.3.2: Let X be a space, let G be a group of
transformations on X, and let T(x) be a maximal invariant
with respect to G. A function f(x) is invariant with respect
to G if and only if f(x) is a function of T(x).
Now consider the roots, φ_1 > φ_2 > ... > φ_m, of
|H − φE| = 0 and the roots, θ_1 > θ_2 > ... > θ_m, of
|KHK' − θKEK'| = 0, where K is nonsingular. Clearly

    |KHK' − θKEK'| = 0
    implies |K| |H − θE| |K'| = 0,
    so that |H − θE| = 0,

and hence, θ_i = φ_i: i = 1,2,...,m. Suppose now that
φ_i: i = 1,2,...,m are the roots of both |H_1 − φE_1| = 0 and
|H_2 − φE_2| = 0, where E_1, E_2, H_1, and H_2 are all
positive definite, symmetric matrices. Then there exist
nonsingular matrices K_1 and K_2 such that

    E_1 = K_1K_1', H_1 = K_1ΦK_1',
    E_2 = K_2K_2', H_2 = K_2ΦK_2',

where Φ = diag(φ_1,φ_2,...,φ_m). It then follows that

    g_{K_2K_1^{-1}}(E_1,H_1) = (K_2K_1^{-1}E_1K_1'^{-1}K_2', K_2K_1^{-1}H_1K_1'^{-1}K_2')
                             = (K_2K_1^{-1}K_1K_1'K_1'^{-1}K_2', K_2K_1^{-1}K_1ΦK_1'K_1'^{-1}K_2')
                             = (K_2K_2', K_2ΦK_2')
                             = (E_2,H_2),

where g_{K_2K_1^{-1}} ∈ G since, clearly, K_2K_1^{-1} is nonsingular. So by
Definition 3.3.1, {φ: |H − φE| = 0} is the maximal invariant
with respect to G. The sth largest root, φ_s, is clearly a
function of (φ_1,φ_2,...,φ_m), and hence, by Lemma 3.3.2 the
test statistic φ_s is an invariant test statistic for testing
the hypothesis H_0^{(s)} against the hypothesis H_1^{(s)}.
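The invariance of the roots is easy to confirm numerically; in the sketch below (ours, with arbitrary simulated matrices) a random nonsingular K leaves the solutions of |H − φE| = 0 unchanged:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(5)
m = 4
A = rng.standard_normal((m, 10)); E = A @ A.T     # positive definite E
B = rng.standard_normal((m, 10)); H = B @ B.T     # positive definite H
K = rng.standard_normal((m, m))                   # nonsingular w.p. one

phi = np.sort(eigh(H, E, eigvals_only=True))
phi_K = np.sort(eigh(K @ H @ K.T, K @ E @ K.T, eigvals_only=True))
assert np.allclose(phi, phi_K, rtol=1e-6, atol=1e-8)
```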
3.4 The Union-Intersection Principle

Suppose that in testing H_0^{(m)}: rank(M) ≤ m−1 against
H_1^{(m)}: rank(M) = m, we adopt the rule

    R(m:m): reject H_0^{(m)} if φ_m > c(α,m,m).

Here φ_1 > φ_2 > ... > φ_m > 0 are the roots of |H − φE| = 0,
E ~ W_m(Σ,e,0) and H ~ W_m(Σ+M,h,0), independently, and
c(α,m,m) is chosen such that P(φ_m > c(α,m,m) | H_0^{(m)}) ≤ α.

Consider now testing H_0^{(s)}: rank(M) ≤ s−1 against H_1^{(s)}:
rank(M) = s. The hypothesis H_0^{(s)} is true if and only if
the hypothesis H_0F^{(s)}: rank(FMF') ≤ s−1 is true for all
F ∈ S(m,s), where S(m,s) is the class of all (sxm) matrices
of rank s. Similarly, the hypothesis H_0^{(s)} is false if and
only if the hypothesis H_0F^{(s)} is false, and the hypothesis
H_1F^{(s)}: rank(FMF') = s is true, for at least one, and in fact
all, F ∈ S(m,s). Hence, we could think of H_0^{(s)} as
∩_{F∈S(m,s)} H_0F^{(s)} and H_1^{(s)} as ∪_{F∈S(m,s)} H_1F^{(s)}, and reject H_0^{(s)}
if (E,H) ∈ γ = ∪_{F∈S(m,s)} γ(F), where γ(F) is the rejection
region appropriate to a test of the hypothesis H_0F^{(s)}. The
sizes of γ(F): F ∈ S(m,s) should be chosen to produce an
overall error of the first kind of the desired size.
This procedure is known as the union-intersection procedure.

Note that we will reject H_0^{(s)}: rank(M) ≤ s−1 if for
some F ∈ S(m,s), we reject H_0F^{(s)}: rank(FMF') ≤ s−1. Let
φ_1F > φ_2F > ... > φ_sF > 0 be the roots of |FHF' − φFEF'| = 0,
where, clearly, FEF' ~ W_s(FΣF',e,0) and FHF' ~ W_s(FΣF'
+ FMF',h,0), independently. Then by the rule R(s:s) we
reject H_0F^{(s)} if φ_sF > c(α',s,s), where α' is chosen to give
an overall error of the first kind of the desired
size. Hence, we will reject H_0^{(s)} if for some F ∈ S(m,s),
φ_sF > c(α',s,s), or equivalently, if max_{F∈S(m,s)} φ_sF > c(α',s,s).
We need the following results, the first two of which
can be found in Bellman [1970:115].

Lemma 3.4.1: Let A(mxm) be a symmetric matrix. Then
the smallest latent root of A may be defined as follows:

    ch_m(A) = min_{u'u=1} u'Au,

where u is an (mx1) vector.

The next result is well known as the Poincaré separation
theorem.

Lemma 3.4.2: Let A(mxm) be a symmetric matrix. Then
for any matrix F(sxm) such that FF' = I,

    ch_j(A) ≥ ch_j(FAF') ≥ ch_{m-s+j}(A)

for j = 1,2,...,s.
We need Lemma 3.4.2 to prove the following lemma.

Lemma 3.4.3: Let A(mxm) be a symmetric matrix. Then

    max_{F:FF'=I} min_{u'u=1} u'FAF'u = ch_s(A),        (3.4.1)

where F is an s x m matrix, and u is an s x 1 vector.

Proof: Since A is symmetric, there exists an orthogonal
matrix P(mxm) such that P'AP = Λ = diag(ch_1(A),
ch_2(A),...,ch_m(A)), and hence, for any F such that FF' = I,

    min_{u'u=1} u'FAF'u = min_{u'u=1} u'F̃ΛF̃'u,

where F̃ = FP and F̃F̃' = FPP'F' = FF' = I. Then we can
rewrite (3.4.1) as

    max_{F̃:F̃F̃'=I} min_{u'u=1} u'F̃ΛF̃'u.

Let F*(sxm) be the matrix with (F*)_{ii} = 1 for all i, and
(F*)_{ij} = 0 for all i ≠ j. Then

    max_{F̃:F̃F̃'=I} min_{u'u=1} u'F̃ΛF̃'u ≥ min_{u'u=1} u'F*ΛF*'u = ch_s(A).

Now by Lemma 3.4.2, for any F such that FF' = I, we know
that

    min_{u'u=1} u'FAF'u = ch_s(FAF') ≤ ch_s(A),

so that

    max_{F:FF'=I} min_{u'u=1} u'FAF'u ≤ ch_s(A).

Therefore, it follows that

    max_{F:FF'=I} min_{u'u=1} u'FAF'u = ch_s(A).
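The separation theorem and the extremal property in Lemma 3.4.3 can be checked numerically; a sketch of ours, sampling random F's with orthonormal rows:

```python
import numpy as np

rng = np.random.default_rng(4)
m, s = 6, 3
X = rng.standard_normal((m, m)); A = (X + X.T) / 2
ch = np.sort(np.linalg.eigvalsh(A))[::-1]          # ch_1(A) >= ... >= ch_m(A)

for _ in range(200):
    # random F with FF' = I: s rows of a random orthogonal matrix
    Q, _ = np.linalg.qr(rng.standard_normal((m, m)))
    F = Q[:s, :]
    chF = np.sort(np.linalg.eigvalsh(F @ A @ F.T))[::-1]
    for j in range(s):
        # Lemma 3.4.2 (Poincare separation), 0-indexed
        assert ch[j] >= chF[j] - 1e-9 >= ch[m - s + j] - 2e-9
    # hence min_{u'u=1} u'FAF'u = ch_s(FAF') <= ch_s(A)
    assert chF[s - 1] <= ch[s - 1] + 1e-9

# the bound is attained when the rows of F span the top-s eigenspace
w, V = np.linalg.eigh(A)
F_star = V[:, ::-1][:, :s].T
```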
We have seen that the union-intersection principle leads
to the rule which rejects H_0^{(s)}: rank(M) ≤ s−1 in favor of
H_1^{(s)}: rank(M) = s if max_{F∈S(m,s)} φ_sF > c(α',s,s). Note that
with T(mxm) and F̃(sxm) such that TT' = E and F̃ = FT, then for
fixed F ∈ S(m,s),

    |FHF' − φFEF'| = 0

implies

    |FT T^{-1}HT'^{-1} T'F' − φFTT'F'| = 0,

or

    |F̃T^{-1}HT'^{-1}F̃' − φF̃F̃'| = 0.        (3.4.2)

Since F is of rank s, so also is F̃F̃'(sxs), and thus, there
exists a nonsingular matrix S(sxs) such that SF̃F̃'S' = I.
So with F̄ = SF̃ we find that (3.4.2) implies

    |F̄T^{-1}HT'^{-1}F̄' − φI| = 0,

and clearly, F̄F̄' = SF̃F̃'S' = I. Hence, it follows that

    max_{F∈S(m,s)} φ_sF = max_{F̄:F̄F̄'=I} min{φ: |F̄T^{-1}HT'^{-1}F̄' − φI| = 0}
                        = max_{F̄:F̄F̄'=I} min_{u'u=1} u'F̄T^{-1}HT'^{-1}F̄'u,

with the final equality due to Lemma 3.4.1. Now using
Lemma 3.4.3 and the fact that the latent roots of T^{-1}HT'^{-1}
are the roots of |H − φE| = 0, we observe that max_{F∈S(m,s)} φ_sF = φ_s,
and thus, the union-intersection principle leads to the rule
which rejects H_0^{(s)} if φ_s > c(α',s,s).
3.5 A Monotonicity Property
of the Power Function

The test procedure developed in the previous sections
depends on the latent roots, φ_1,φ_2,...,φ_m, of the random
matrix HE^{-1}. The distribution of these roots (see James
[1964]), and hence the power function of our test procedure,
depends upon the latent roots of the corresponding population
matrix (Σ+M)Σ^{-1} as parameters. Let δ_1 ≥ δ_2 ≥ ... ≥ δ_m ≥ 1
be the latent roots of (Σ+M)Σ^{-1} and note that with T defined
such that Σ = TT',

    |(Σ+M)Σ^{-1} − δI| = 0

implies

    |M − (δ−1)Σ| = 0,
    so that |T^{-1}MT'^{-1} − (δ−1)I| = 0.

Since Σ is nonsingular, T is also nonsingular, and so the
rank of T^{-1}MT'^{-1} is the same as the rank of M. Hence, M
has rank of at most s−1 if and only if δ_s = 1, and testing
the hypothesis H_0^{(s)}: rank(M) ≤ s−1 against H_1^{(s)}: rank(M) = s
is equivalent to testing the hypothesis H_0^{(s)}: δ_s = 1 against
H_1^{(s)}: δ_s > 1. A desirable property of the test statistic φ_s
would be that it stochastically increases in δ_s, and thus,
that the power function increases monotonically in δ_s. In
this section we not only show that φ_s stochastically increases
in δ_s, but also that it stochastically increases in each
δ_i: i = 1,2,...,m. This more general result will be utilized
in the following section.
We will first prove the result for the largest latent
root, φ_1. That is, we will show that φ_1 stochastically
increases in δ_i: i = 1,2,...,m.

Lemma 3.5.1: The test with the acceptance region

    φ_1 = ch_1(HE^{-1}) ≤ c

has a power function which is monotonically increasing in
each population root δ_i.
The proof of Lemma 3.5.1 involves the following three
results, the first of which is due to Anderson [1955].

Lemma 3.5.2: Let y ~ N_m(0,Σ_1) and u ~ N_m(0,Σ_2),
where Σ_2 − Σ_1 is nonnegative definite. If ω is a convex
set, symmetric about the origin, then P(y ∈ ω) ≥ P(u ∈ ω).

Lemma 3.5.3: Let the random vectors y_1,y_2,...,y_n and the
matrix U be mutually independent, the distribution of y_i being
N_m(0,Σ): i = 1,2,...,n. Let the set ω, in the space of
{y_1,y_2,...,y_n,U}, be convex and symmetric in each y_i given
the other y_i's and U. Denote by P_{Σ_i}(ω) the probability of
the set ω when Σ = Σ_i. Then whenever Σ_2 − Σ_1 is nonnegative
definite, P_{Σ_1}(ω) ≥ P_{Σ_2}(ω).
Proof: Since Σ_1 and Σ_2 are symmetric, Σ_1 ∈ P_m, and
Σ_2 − Σ_1 ∈ ∪_{j=0}^m P_j, it follows that there exists a nonsingular matrix
K such that KΣ_1K' = I and KΣ_2K' = Δ = diag(δ_1,δ_2,...,δ_m).
Since it is assumed that Σ_2 − Σ_1 ∈ ∪_{j=0}^m P_j, we know that
δ_i ≥ 1: i = 1,2,...,m. Then ỹ = Ky ~ N_m(0,I) if Σ = Σ_1,
and ỹ = Ky ~ N_m(0,Δ) if Σ = Σ_2. Let ω̃ = {(ỹ_1,ỹ_2,...,ỹ_n,U):
(y_1,y_2,...,y_n,U) ∈ ω}; then P_{Σ_1}(ω) = P_I(ω̃) and P_{Σ_2}(ω) =
P_Δ(ω̃). So without loss of generality we can take Σ_1 = I and
Σ_2 = Δ. Let

    Δ_i = diag(θ_1,θ_2,...,θ_{i-1},1,θ_{i+1},...,θ_m),
    Δ_i* = diag(θ_1,θ_2,...,θ_{i-1},δ_i,θ_{i+1},...,θ_m),
    R_i = {y_i: (y_1,y_2,...,y_n,U) ∈ ω; y_j: j ≠ i and U fixed},

where θ_j ∈ {1,δ_j}: j ≠ i. Then from Lemma 3.5.2 it follows
that

    P_{Δ_i}(R_i | y_j: j ≠ i, U) ≥ P_{Δ_i*}(R_i | y_j: j ≠ i, U).        (3.5.1)

Multiplying both sides of the inequality (3.5.1) by the joint
density of the temporarily fixed variables and integrating
with respect to them, we obtain

    P_{Δ_i}(ω) ≥ P_{Δ_i*}(ω).

Then by induction we have

    P_I(ω) ≥ P_Δ(ω),

or equivalently,

    P_{Σ_1}(ω) ≥ P_{Σ_2}(ω).
Finally, the third result we need is due to Das Gupta,
Anderson, and Mudholkar [1964].

Lemma 3.5.4: For any symmetric matrix B(mxm) the region
a = {A(mxn): ch_1(AA'B) ≤ c} is convex in A.

Proof (Lemma 3.5.1): Recall that H ~ W_m(Σ+M,h,0) and
E ~ W_m(Σ,e,0). Since the problem is invariant under trans-
formations g_K(E,H) = (KEK',KHK'), we may assume, without loss
of generality, that Σ + M = Δ = diag(δ_1,δ_2,...,δ_m) and Σ = I.
Then we can write H = YY', where Y = (y_1,y_2,...,y_h) and
y_i ~ N_m(0,Δ): i = 1,2,...,h, independently. So the accep-
tance region can be written as {Y: ch_1(YY'E^{-1}) ≤ c}. From
Lemma 3.5.4 it follows that the acceptance region is convex
in Y, and clearly we see that the acceptance region is also
symmetric in each of the column vectors of Y. Note that the
vectors y_1,y_2,...,y_h and E are mutually independent, and the
distribution of y_i is N_m(0,Δ). The result now follows from
Lemma 3.5.3.
The main result of this section follows from a result
due to Anderson and Das Gupta [1964].

Lemma 3.5.5: Suppose V ~ W_m(Σ_1,v,0) and U ~ W_m(Σ_2,u,0),
independently. Let λ_1 ≥ λ_2 ≥ ... ≥ λ_m be the latent roots
of UV^{-1} and let ω be a set in the space of λ_1,λ_2,...,λ_m such
that when a point (λ_1,λ_2,...,λ_m) is in ω, so is every point
(λ_1*,λ_2*,...,λ_m*) for which λ_i* ≤ λ_i: i = 1,2,...,m. Then the
probability of the set ω depends on Σ_1 and Σ_2 only through the
latent roots of Σ_2Σ_1^{-1} and is a monotonically decreasing func-
tion of each of the latent roots of Σ_2Σ_1^{-1}.

Clearly, the set ω = {(φ_1,φ_2,...,φ_m): φ_s ≤ c} satisfies
the conditions of Lemma 3.5.5, so it follows that the proba-
bility of the set ω is monotonically decreasing in each of
the latent roots δ_1,δ_2,...,δ_m of (Σ+M)Σ^{-1}. In other words,
the power function of the sth largest root test is a
monotonically increasing function of δ_i: i = 1,2,...,m.
We now know that as $\delta_s \to \infty$, $P(\phi_s > c)$ increases monotonically. We will show that, in fact, as $\delta_s \to \infty$, $P(\phi_s > c) \to 1$, and hence, for sufficiently large values of $\delta_s$ the probability of rejecting $H_0^{(s)}$ will be arbitrarily close to one. Recall that there exists a nonsingular matrix $K$ such that $KEK' \sim W_m(I, e, 0)$ and $KHK' \sim W_m(\Delta, h, 0)$. Let $K_1(m \times m)$ be such that

$$K_1 = \mathrm{diag}(ak_1, ak_2, \ldots, ak_s, 1, \ldots, 1).$$

Note that $K_1KHK'K_1' \sim W_m(K_1\Delta K_1', h, 0)$ and

$$K_1\Delta K_1' = \mathrm{diag}(a^2k_1^2\delta_1, a^2k_2^2\delta_2, \ldots, a^2k_s^2\delta_s, \delta_{s+1}, \ldots, \delta_m),$$

so that as $a \to \infty$, $a^2k_i^2\delta_i \to \infty$, and hence $\mathrm{ch}_i(K_1\Delta K_1') \to \infty$, for $i = 1, 2, \ldots, s$. Thus, we need to show that $P(\mathrm{ch}_s(K_1KHK'K_1'(KEK')^{-1}) > c) \to 1$ as $a \to \infty$. The following lemma provides the necessary result.
Lemma 3.5.6: Let $V \sim W_m(\Sigma_1, v, 0)$ and $U \sim W_m(\Sigma_2, u, 0)$, independently, and let

$$K_1(m \times m) = \mathrm{diag}(ak_1, ak_2, \ldots, ak_s, 1, \ldots, 1).$$

Then $P(\mathrm{ch}_s(K_1UK_1'V^{-1}) > c) \to 1$ as $a \to \infty$.
Proof: Let

$$U = \begin{pmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{pmatrix},$$

where $U_{11}$ is $s \times s$, $U_{21}$ is $(m-s) \times s$, $U_{12}$ is $s \times (m-s)$, and $U_{22}$ is $(m-s) \times (m-s)$. Similarly, define $V_{11}$, $V_{21}$, $V_{12}$, and $V_{22}$. Let $F_*$ be the $s \times m$ matrix with $(F_*)_{ii} = 1$: $i = 1, 2, \ldots, s$, and $(F_*)_{ij} = 0$: $i \neq j$, and let

$$K_2(s \times s) = \mathrm{diag}(k_1, k_2, \ldots, k_s).$$

Recall from Section 3.4 that

$$\mathrm{ch}_s(K_1UK_1'V^{-1}) = \max_{F \in S(m,s)} \min\{\lambda\colon |FK_1UK_1'F' - \lambda FVF'| = 0\}$$
$$\geq \min\{\lambda\colon |F_*K_1UK_1'F_*' - \lambda F_*VF_*'| = 0\}$$
$$= \min\{\lambda\colon |a^2K_2U_{11}K_2 - \lambda V_{11}| = 0\}$$
$$= a^2\,\mathrm{ch}_s(K_2U_{11}K_2V_{11}^{-1}).$$

Thus,

$$P(\mathrm{ch}_s(K_1UK_1'V^{-1}) > c) \geq P(\mathrm{ch}_s(K_2U_{11}K_2V_{11}^{-1}) > c/a^2),$$

and $K_2U_{11}K_2V_{11}^{-1}$ is positive definite with probability one, so that

$$\lim_{a \to \infty} P(\mathrm{ch}_s(K_1UK_1'V^{-1}) > c) \geq \lim_{a \to \infty} P(\mathrm{ch}_s(K_2U_{11}K_2V_{11}^{-1}) > c/a^2)$$
$$= P(\mathrm{ch}_s(K_2U_{11}K_2V_{11}^{-1}) > 0) = 1.$$
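The key inequality in the proof, $\mathrm{ch}_s(K_1UK_1'V^{-1}) \geq a^2\,\mathrm{ch}_s(K_2U_{11}K_2V_{11}^{-1})$, can be checked numerically. The sketch below is not part of the dissertation; the matrix sizes, the constant $a$, and the random positive definite matrices standing in for the Wishart draws are arbitrary choices.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
m, s, a = 4, 2, 3.0

def rand_pd(m):
    # random positive definite matrix, a stand-in for a Wishart draw
    A = rng.standard_normal((m, m + 2))
    return A @ A.T

U, V = rand_pd(m), rand_pd(m)
k = rng.uniform(0.5, 2.0, size=s)
K1 = np.diag(np.concatenate([a * k, np.ones(m - s)]))
K2 = np.diag(k)

# s-th largest root of K1 U K1' V^{-1}: generalized eigenvalues of (K1 U K1', V)
roots = np.sort(eigh(K1 @ U @ K1.T, V, eigvals_only=True))[::-1]
lhs = roots[s - 1]

# smallest root of the restricted s x s pencil (a^2 K2 U11 K2, V11)
U11, V11 = U[:s, :s], V[:s, :s]
rhs = a**2 * np.sort(eigh(K2 @ U11 @ K2, V11, eigvals_only=True))[0]

assert lhs >= rhs - 1e-9
```

The inequality is an instance of the Courant-Fischer characterization for symmetric-definite pencils: restricting the maximization to the fixed subspace spanned by the first $s$ coordinates can only lower the value.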
3.6 The Limiting Distribution of $\phi_s$

We have seen that the likelihood ratio test for testing the hypothesis $H_0^{(s)}$: $\mathrm{rank}(M) \leq s-1$ against $H_1^{(s)}$: $\mathrm{rank}(M) = s$ is based on the $s$th largest root, $\phi_s$. However, if $\phi_s$ is to be used as a test statistic, it is necessary to compute the significance level, $\alpha$, where

$$\alpha = \sup_{H_0^{(s)}} P(\phi_s > c).$$
With $\delta_1 \geq \delta_2 \geq \cdots \geq \delta_m$ as the latent roots of $(\Sigma+M)\Sigma^{-1}$, the null hypothesis can be written $H_0^{(s)}$: $\delta_s = 1$, or more precisely, $H_0^{(s)}$: $\delta_1 \geq \delta_2 \geq \cdots \geq \delta_{s-1} \geq 1$, $\delta_s = \delta_{s+1} = \cdots = \delta_m = 1$. We will write $\phi_{s:m}(\delta_1, \delta_2, \ldots, \delta_m)$ to indicate that $\phi_s$ is the $s$th largest root of $m$ roots and depends on the population roots $\delta_1, \delta_2, \ldots, \delta_m$. Then we may write $\alpha$, the significance level, as

$$\alpha = \sup_{\delta_1 \geq \delta_2 \geq \cdots \geq \delta_{s-1} \geq 1} P(\phi_{s:m}(\delta_1, \delta_2, \ldots, \delta_{s-1}, 1, \ldots, 1) > c).$$

But we saw in the previous section that $\phi_s$ is stochastically increasing in each $\delta_i$: $i = 1, 2, \ldots, m$. It then follows that

$$\alpha = P(\phi_{s:m}(\infty, \ldots, \infty, 1, \ldots, 1) > c),$$

where $\phi_{s:m}(\infty, \ldots, \infty, 1, \ldots, 1)$ denotes the random variable which has the limiting distribution of $\phi_{s:m}(\delta_1, \delta_2, \ldots, \delta_{s-1}, 1, \ldots, 1)$ as $\delta_i \to \infty$: $i = 1, 2, \ldots, s-1$. So the problem at hand is to determine the distribution of $\phi_{s:m}(\infty, \ldots, \infty, 1, \ldots, 1)$.
Recall that $E \sim W_m(\Sigma, e, 0)$, $H \sim W_m(\Sigma+M, h, 0)$, and there exists a matrix $K$ such that $K\Sigma K' = I$ and $K(\Sigma+M)K' = \Delta = \mathrm{diag}(\delta_1, \delta_2, \ldots, \delta_m)$. If we define $\tilde{E}$ and $\tilde{H}$ as

$$\tilde{E} = \Delta^{-\frac{1}{2}}KEK'\Delta^{-\frac{1}{2}} \sim W_m(\Delta^{-1}, e, 0),$$
$$\tilde{H} = \Delta^{-\frac{1}{2}}KHK'\Delta^{-\frac{1}{2}} \sim W_m(I, h, 0),$$

where $\Delta^{-\frac{1}{2}} = \mathrm{diag}(\delta_1^{-\frac{1}{2}}, \delta_2^{-\frac{1}{2}}, \ldots, \delta_m^{-\frac{1}{2}})$, then clearly $\phi_{s:m}(\delta_1, \delta_2, \ldots, \delta_m) = \mathrm{ch}_s(HE^{-1}) = \mathrm{ch}_s(\tilde{H}\tilde{E}^{-1})$. Hence, if we let $\tilde{E}_n \sim W_m(\Delta_n^{-1}, e, 0)$, where $\Delta_n = \mathrm{diag}(n\delta_1, n\delta_2, \ldots, n\delta_{s-1}, 1, \ldots, 1)$, then we need to find the limiting distribution of $\mathrm{ch}_s(\tilde{H}\tilde{E}_n^{-1})$ as $n \to \infty$. Since we can write $\tilde{E}_n = Y_nY_n'$, where $Y_n = (y_1^{(n)}, y_2^{(n)}, \ldots, y_e^{(n)})$ and $y_i^{(n)} \sim N_m(0, \Delta_n^{-1})$: $i = 1, 2, \ldots, e$, independently, we can restate the problem as that of determining the limiting distribution of $\mathrm{ch}_s(\tilde{H}(Y_nY_n')^{-1})$. Consider the following elementary result.
Lemma 3.6.1: If $u_n \sim N(0, 1/n)$, then $u_n \xrightarrow{d} u$, where $u$ is a degenerate random variable with all of its probability at zero.

We also need the following results, the first of which is well known as the continuity theorem (see, for example, Breiman [1968:236]).
Lemma 3.6.2: Let $x_1, x_2, x_3, \ldots$ be a sequence of random vectors. Then $x_n \xrightarrow{d} x$ if and only if

$$\lim_{n \to \infty} E[\exp(ix_n't)] = E[\exp(ix't)]$$

for all $t$, where $i = \sqrt{-1}$.
Lemma 3.6.3: Suppose that as $n \to \infty$, $x_j^{(n)} \xrightarrow{d} x_j$: $j = 1, 2, \ldots, m$, where $x_1, x_2, \ldots, x_m$ are mutually independent, and suppose $\{x_1^{(n)}, \ldots, x_m^{(n)}\}$ are mutually independent for all $n$. Then

$$\begin{pmatrix} x_1^{(n)} \\ x_2^{(n)} \\ \vdots \\ x_m^{(n)} \end{pmatrix} \xrightarrow{d} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{pmatrix}.$$
Proof: Note that it follows from Lemma 3.6.2 that

$$\lim_{n \to \infty} E[\exp(ix_j^{(n)\prime}t_j)] = E[\exp(ix_j't_j)].$$

Also, with $x^{(n)\prime} = (x_1^{(n)\prime}, \ldots, x_m^{(n)\prime})$ and $t' = (t_1', \ldots, t_m')$, because of independence,

$$E[\exp(ix^{(n)\prime}t)] = E\Big[\exp\Big(i\sum_{j=1}^m x_j^{(n)\prime}t_j\Big)\Big] = \prod_{j=1}^m E[\exp(ix_j^{(n)\prime}t_j)],$$

so that

$$\lim_{n \to \infty} E[\exp(ix^{(n)\prime}t)] = \prod_{j=1}^m \lim_{n \to \infty} E[\exp(ix_j^{(n)\prime}t_j)] = \prod_{j=1}^m E[\exp(ix_j't_j)] = E\Big[\exp\Big(i\sum_{j=1}^m x_j't_j\Big)\Big] = E[\exp(ix't)].$$

The result now follows from Lemma 3.6.2.
From Lemma 3.6.1 and Lemma 3.6.3 we observe that $Y_n \xrightarrow{d} Y$ with

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix},$$

where the elements of $Y_1((s-1) \times e)$ are all equal to zero with probability one, and $Y_2 = (y_{21}, y_{22}, \ldots, y_{2e})$ with $y_{2i} \sim N_{m-s+1}(0, I)$: $i = 1, 2, \ldots, e$, independently.
Consider the following result, the proof of which can be found in Ostrowski [1973:334].

Lemma 3.6.4: Let $A(n \times n)$ and $B(n \times n)$ be two matrices, and suppose the latent roots of $A$ and $B$ are $\lambda_i$ and $\lambda_i'$: $i = 1, 2, \ldots, n$, respectively. Put

$$N = \max_{1 \leq i \leq n,\, 1 \leq j \leq n}(|a_{ij}|, |b_{ij}|),$$

and

$$\delta = \frac{1}{nN}\sum_{i=1}^n \sum_{j=1}^n |a_{ij} - b_{ij}|.$$

Then to every root $\lambda_i'$ of $B$ there belongs a certain root $\lambda_i$ of $A$ such that we have

$$|\lambda_i - \lambda_i'| \leq (n+2)N\delta^{1/n}.$$

Further, for a suitable ordering of the $\lambda_i$ and $\lambda_i'$ we have

$$|\lambda_i - \lambda_i'| \leq 2(n+1)^2N\delta^{1/n}.$$
Lemma 3.6.5, Corollary: If $A$ is an $n \times n$ matrix, then for each $i$, $\mathrm{ch}_i(A)$ is a continuous function of the elements of $A$.

Lemma 3.6.6, Corollary: Let $A$ be an $n \times n$ matrix and $B$, an $n \times p$ matrix. Then the roots of the equation

$$|A - \lambda BB'| = 0 \tag{3.6.1}$$

are continuous functions of the elements of $A$ and $B$ except at $B$ such that $|BB'| = 0$.
Proof: Let $\lambda_i$: $i = 1, 2, \ldots, n$ be the roots of (3.6.1). Then, when $|BB'| \neq 0$, it follows that these roots are also the latent roots of $A(BB')^{-1}$. So, from Lemma 3.6.5, for each $i$, $\lambda_i$ is a continuous function of the elements of $A(BB')^{-1}$. But clearly, when $|BB'| \neq 0$, the elements of $A(BB')^{-1}$ are continuous functions of the elements of $A$ and $B$. Hence, for each $i$, $\lambda_i$ is a continuous function of $A$ and $B$ except when $|BB'| = 0$.
We need one final result involving the limiting distribution of a function of random vectors (see Mann and Wald [1943]).

Lemma 3.6.7: Let $x_n \xrightarrow{d} x$, and let $g(x)$ be a Borel measurable function such that the set $R$ of discontinuity points of $g(x)$ is closed and $P(x \in R) = 0$. Then $g(x_n) \xrightarrow{d} g(x)$.
Now recall that we seek the limiting distribution of $\mathrm{ch}_s(\tilde{H}(Y_nY_n')^{-1})$. In order to use Lemma 3.6.7 it is necessary to show that $\mathrm{ch}_s(\tilde{H}(YY')^{-1})$ is continuous with probability one under the distribution of $(\tilde{H}, Y)$. Now with

$$\tilde{H} = \begin{pmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{pmatrix},$$

where $H_{11}$ is $(s-1) \times (s-1)$, $H_{12}$ is $(s-1) \times (m-s+1)$, $H_{21}$ is $(m-s+1) \times (s-1)$, and $H_{22}$ is $(m-s+1) \times (m-s+1)$, the roots under the distribution of $(\tilde{H}, Y)$ are the solutions to

$$\left|\tilde{H} - \lambda \begin{pmatrix} (0) & (0) \\ (0) & Y_2Y_2' \end{pmatrix}\right| = 0. \tag{3.6.2}$$

Since $\tilde{H}$ is nonsingular with probability one, we may put

$$\tilde{H}^{-1} = G = \begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix},$$

so (3.6.2) can be written

$$\left|I_m - \lambda \begin{pmatrix} (0) & G_{12}Y_2Y_2' \\ (0) & G_{22}Y_2Y_2' \end{pmatrix}\right| = 0,$$

or

$$\left|\begin{pmatrix} I_{s-1} & -\lambda G_{12}Y_2Y_2' \\ (0) & I_{m-s+1} - \lambda G_{22}Y_2Y_2' \end{pmatrix}\right| = 0.$$

Thus, it must be true that

$$|I_{m-s+1} - \lambda G_{22}Y_2Y_2'| = 0,$$

or

$$|G_{22}^{-1} - \lambda Y_2Y_2'| = 0. \tag{3.6.3}$$

Then we see that with probability one $\mathrm{ch}_1(\tilde{H}(YY')^{-1}), \mathrm{ch}_2(\tilde{H}(YY')^{-1}), \ldots, \mathrm{ch}_{s-1}(\tilde{H}(YY')^{-1})$ are undefined, and $\mathrm{ch}_s(\tilde{H}(YY')^{-1})$ is the largest solution to (3.6.3); that is, since $YY'$ is of rank $m-s+1$ with probability one, there are only $m-s+1$ solutions to $|\tilde{H} - \lambda YY'| = 0$. It can be shown (see, for example, Graybill [1969:165]) that $G_{22} = (H_{22} - H_{21}H_{11}^{-1}H_{12})^{-1}$, since $H_{22} - H_{21}H_{11}^{-1}H_{12}$ is nonsingular with probability one, so that (3.6.3) can be written

$$|H_{22} - H_{21}H_{11}^{-1}H_{12} - \lambda Y_2Y_2'| = 0.$$

Clearly, $Y_2Y_2'$ is also nonsingular with probability one, and hence by Lemma 3.6.6, $\mathrm{ch}_s(\tilde{H}(YY')^{-1})$ is continuous with probability one under the distribution of $(\tilde{H}, Y)$. The set of discontinuity points, $R$, is closed, since $R = \{(\tilde{H}, Y)\colon |Y_2Y_2'| = 0\}$. Note also that, as is well known (see, for example, Anderson [1958:85]), $H_{22} - H_{21}H_{11}^{-1}H_{12} \sim W_{m-s+1}(I, h-s+1, 0)$. Therefore, from Lemma 3.6.7, since $(\tilde{H}, Y_n) \xrightarrow{d} (\tilde{H}, Y)$, it follows that

$$\phi_{s:m}(\infty, \ldots, \infty, 1, \ldots, 1) \sim \phi_{1:m-s+1}(1, 1, \ldots, 1),$$

where $\phi_{1:m-s+1}(1, 1, \ldots, 1)$ denotes the distribution of the largest root of $|W_1 - \phi W_2| = 0$, where $W_1 \sim W_{m-s+1}(I, h-s+1, 0)$ and $W_2 \sim W_{m-s+1}(I, e, 0)$, independently.
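The block-inverse identity cited from Graybill, namely that $G_{22}^{-1}$ equals the Schur complement $H_{22} - H_{21}H_{11}^{-1}H_{12}$, is easy to verify numerically. The sketch below is not from the dissertation; the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
m, s = 5, 3  # H is m x m, H11 is (s-1) x (s-1)
A = rng.standard_normal((m, m + 2))
H = A @ A.T          # symmetric positive definite, like a Wishart draw
G = np.linalg.inv(H)

k = s - 1
H11, H12 = H[:k, :k], H[:k, k:]
H21, H22 = H[k:, :k], H[k:, k:]
G22 = G[k:, k:]

# G22^{-1} equals the Schur complement of H11 in H
schur = H22 - H21 @ np.linalg.inv(H11) @ H12
assert np.allclose(np.linalg.inv(G22), schur)
```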
So in testing $H_0^{(s)}$: $\mathrm{rank}(M) \leq s-1$ against $H_1^{(s)}$: $\mathrm{rank}(M) = s$, we choose as our critical value $c(\alpha, m, s)$, where $P(\mathrm{ch}_1(W_1W_2^{-1}) > c(\alpha, m, s)) = \alpha$. By so doing we will guarantee that

$$\sup P(\phi_{s:m}(\delta_1, \delta_2, \ldots, \delta_m) > c(\alpha, m, s) \mid H_0^{(s)}) = \alpha.$$

Charts and tables of the distribution of the largest root, $\theta_1$, of $|W_1 - \theta(W_1+W_2)| = 0$ are available (see, for example, Morrison [1976:379], Pillai [1965, 1967]). These may be used to calculate $c(\alpha, m, s)$, since $\theta_1 = \psi_1/(1+\psi_1)$, where $\psi_1$ is the largest root of $|W_1 - \psi W_2| = 0$.
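When $s = m$ the Wisharts $W_1$ and $W_2$ are $1 \times 1$, so the largest root reduces to a ratio of independent chi-squares, that is, to a scaled $F$ variate, and the critical value quoted in the example below can be reproduced from an $F$ quantile. The following sketch assumes that reduction and uses scipy, which is of course not part of the dissertation.

```python
from scipy.stats import f

h_df, e_df = 17, 105   # h - s + 1 = 17 and e = 105, as in the example
alpha = 0.05
# With 1 x 1 Wisharts, psi = chi2_17 / chi2_105 = (17/105) F(17, 105),
# so c(.05, 4, 4) = 17 F(17, 105, .05) / 105
c = h_df * f.ppf(1 - alpha, h_df, e_df) / e_df
print(round(c, 2))
```

The printed value agrees with the approximate figure of .28 cited in the example.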
In order to determine the rank of $M$, a sequential test procedure is used. To illustrate this procedure, we will return to the example presented in Section 2.3. Recall that $D = \mathrm{diag}(142.729,\ 29.6669,\ .91847,\ .625404)$, $h = 20$, $e = 105$, so that $\phi_1 = 27.1865$, $\phi_2 = 5.65084$, $\phi_3 = .174947$, and $\phi_4 = .119125$. First we consider testing the hypothesis $H_0^{(4)}$: $\mathrm{rank}(M) \leq 3$ against $H_1^{(4)}$: $\mathrm{rank}(M) = 4$. The null hypothesis, $H_0^{(4)}$, is rejected if $\phi_4 > c(.05, 4, 4)$, where $c(.05, 4, 4) = 17F(17, 105, .05)/105$, and $F(17, 105, .05)$ is the constant for which $P(F(17, 105) > F(17, 105, .05)) = .05$ if $F(17, 105)$ has the central $F$ distribution with 17 and 105 degrees of freedom. Thus, $c(.05, 4, 4)$ is approximately equal to .28, and clearly, $\phi_4 = .119125 < .28$, so that we do not reject $H_0^{(4)}$. Since $H_0^{(4)}$ is not rejected, we now consider testing the hypothesis $H_0^{(3)}$: $\mathrm{rank}(M) \leq 2$ against $H_1^{(3)}$: $\mathrm{rank}(M) = 3$. The null hypothesis, $H_0^{(3)}$, is rejected if $\phi_3 > c(.05, 4, 3)$. Using the charts mentioned earlier we find that $c(.05, 4, 3)$ is approximately equal to .36. Since $\phi_3 = .174947 < .36$, we do not reject $H_0^{(3)}$, and so next consider testing the hypothesis $H_0^{(2)}$: $\mathrm{rank}(M) \leq 1$ against $H_1^{(2)}$: $\mathrm{rank}(M) = 2$. We find that $c(.05, 4, 2)$ is approximately equal to .42, and therefore, since $\phi_2 = 5.65084 > .42$, we reject $H_0^{(2)}$ and conclude that the rank of $M$ could very reasonably be taken as two.
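The sequential procedure amounts to a loop that stops at the first rejected hypothesis. The sketch below is not from the dissertation; it hard-codes the roots and the approximate critical values of the example above, and tests only $s = 4, 3, 2$ since those are the values worked through in the text.

```python
# roots phi_s and approximate critical values c(.05, 4, s) from the example
phi = {4: 0.119125, 3: 0.174947, 2: 5.65084, 1: 27.1865}
crit = {4: 0.28, 3: 0.36, 2: 0.42}

rank = 0
for s in (4, 3, 2):           # test H0(s): rank(M) <= s-1, largest s first
    if phi[s] > crit[s]:      # reject H0(s): conclude rank(M) = s
        rank = s
        break
print(rank)
```

As in the example, the loop accepts $H_0^{(4)}$ and $H_0^{(3)}$, rejects $H_0^{(2)}$, and so settles on rank two.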
The procedure above is open to objections on the grounds that the significance level for the test criterion has not been adjusted to take into account the fact that a sequence of hypotheses is being tested, with each one dependent on the previous ones not being rejected. The mathematical complications involved in controlling the overall error make such an adjustment virtually impossible to carry out.
CHAPTER 4
MAXIMIZATION OF THE LIKELIHOOD FUNCTION
WHEN $\Sigma = \sigma^2 I$
4.1 The Likelihood Function

Suppose the vectors $x_{ij}(m \times 1)$: $i = 1, 2, \ldots, g$; $j = 1, 2, \ldots, n$ can be modeled by

$$x_{ij} = \mu + Lf_i + z_{ij}, \tag{4.1.1}$$

wherein $\mu(m \times 1)$ is a fixed but unknown vector, $L(m \times p)$ is a fixed but unknown matrix, $f_i \sim N_p(0, I)$: $i = 1, 2, \ldots, g$, and $z_{ij} \sim N_m(0, \sigma^2 I)$: $i = 1, 2, \ldots, g$; $j = 1, 2, \ldots, n$. We assume that the set of random vectors $\{f_1, f_2, \ldots, f_g, z_{11}, \ldots, z_{gn}\}$ is mutually independent. Thus, $x_{ij} \sim N_m(\mu, V)$ with $V = LL' + \sigma^2 I$. For any orthogonal matrix $P(p \times p)$ it follows that $V = LL' + \sigma^2 I = LP(LP)' + \sigma^2 I$, so that, while $LL'$ is unique, $L$ is not unique. In this section we will derive the likelihood function for $\mu$, $LL'$, and $\sigma^2$.
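The rotation invariance that makes $L$ unidentifiable is immediate to check numerically. In the sketch below, which is not from the dissertation, the dimensions are arbitrary and $P$ is a random orthogonal matrix obtained from a QR factorization.

```python
import numpy as np

rng = np.random.default_rng(2)
m, p = 4, 2
L = rng.standard_normal((m, p))
P, _ = np.linalg.qr(rng.standard_normal((p, p)))  # random orthogonal P
sigma2 = 0.5

V1 = L @ L.T + sigma2 * np.eye(m)
V2 = (L @ P) @ (L @ P).T + sigma2 * np.eye(m)
assert np.allclose(V1, V2)  # V is unchanged, so only LL' is identifiable
```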
By methods identical to those presented in Section 2.1 it can be shown that $(\bar{x}_{..}, E, H)$ is sufficient for $(\mu, \sigma^2, nLL')$, where

$$\bar{x}_{..} = \sum_{i=1}^g \sum_{j=1}^n x_{ij}/gn \sim N_m(\mu, (1/gn)(\sigma^2 I + nLL')),$$

$$E = \sum_{i=1}^g \sum_{j=1}^n (x_{ij} - \bar{x}_{i.})(x_{ij} - \bar{x}_{i.})' \sim W_m(\sigma^2 I, e, 0),$$

$$H = n\sum_{i=1}^g (\bar{x}_{i.} - \bar{x}_{..})(\bar{x}_{i.} - \bar{x}_{..})' \sim W_m(\sigma^2 I + nLL', h, 0),$$

and $e = g(n-1)$; $h = g-1$. In addition, if we let $c$ denote a constant, we find that the density of $E$ can be written

$$f(E) = c|E|^{\frac{1}{2}(e-m-1)}\exp[-\tfrac{1}{2}\mathrm{tr}(\sigma^2 I)^{-1}E]$$
$$= c|E|^{\frac{1}{2}(e-m-1)}\exp\Big[-\Big(\sum_{i=1}^m e_{ii}\Big)\big/2\sigma^2\Big]$$
$$= g_1\Big(\sum_{i=1}^m e_{ii};\, \sigma^2\Big)g_2(E).$$

Hence, from the set $\{e_{11}, e_{12}, \ldots, e_{m,m-1}, e_{mm}\}$, $b = \mathrm{tr}\,E = \sum_{i=1}^m e_{ii}$ is sufficient for $\sigma^2$.
We may assume, then, that we have, independently, $\bar{x}_{..}$, $b$, and $H$, where

$$\bar{x}_{..} \sim N_m(\mu, (1/gn)(\sigma^2 I + nLL')),$$
$$b/\sigma^2 \sim \chi^2_\beta,$$
$$H \sim W_m(\sigma^2 I + nLL', h, 0),$$

and $\beta = mg(n-1)$. The problem is to estimate $\mu$, $\sigma^2$, and $LL'$, or equivalently, to estimate $\mu$, $\sigma^2$, and $M$, where $M = nLL'$. We have seen that $L$ is not uniquely defined, and so if $\widehat{LL'}$ is an estimate of $LL'$, then any $\hat{L}$, such that $\hat{L}\hat{L}' = \widehat{LL'}$, is an estimate of $L$. The likelihood function of $(\mu, \sigma^2, M)$ can be expressed as

$$f(\bar{x}_{..}, b, H) = \frac{K_m(I,h)}{\Gamma(\tfrac{1}{2}\beta)2^{\beta/2}}\left(\frac{gn}{2\pi}\right)^{m/2} b^{\frac{1}{2}\beta - 1}|H|^{\frac{1}{2}(h-m-1)}(\sigma^2)^{-\beta/2}|\sigma^2 I + M|^{-\frac{1}{2}(h+1)}$$
$$\times \exp[-\tfrac{1}{2}gn(\bar{x}_{..} - \mu)'(\sigma^2 I + M)^{-1}(\bar{x}_{..} - \mu) - b/2\sigma^2 - \tfrac{1}{2}\mathrm{tr}(\sigma^2 I + M)^{-1}H],$$

where, as before,

$$[K_m(I,h)]^{-1} = 2^{\frac{1}{2}mh}\pi^{\frac{1}{4}m(m-1)}\prod_{j=1}^m \Gamma(\tfrac{1}{2}(h-j+1)).$$
The logarithm of the likelihood function, omitting a function of the observations, is

$$-b/2\sigma^2 - \tfrac{1}{2}\beta\ln\sigma^2 - \tfrac{1}{2}\mathrm{tr}(\sigma^2 I + M)^{-1}H - \tfrac{1}{2}(h+1)\ln|\sigma^2 I + M| - \tfrac{1}{2}gn(\bar{x}_{..} - \mu)'(\sigma^2 I + M)^{-1}(\bar{x}_{..} - \mu).$$

We seek the solution which maximizes the equation above, or equivalently, the solution which minimizes

$$b/\sigma^2 + \beta\ln\sigma^2 + \mathrm{tr}(\sigma^2 I + M)^{-1}H + (h+1)\ln|\sigma^2 I + M| \tag{4.1.2}$$
$$+\ gn(\bar{x}_{..} - \mu)'(\sigma^2 I + M)^{-1}(\bar{x}_{..} - \mu).$$
If we ignore the constraints that $\sigma^2$ is positive and $M$ is nonnegative definite and seek the stationary values of (4.1.2) over all possible $(\mu, \sigma^2, M)$, we find, upon taking the partial derivatives of (4.1.2) with respect to $\sigma^2$, $M$, and $\mu$ and setting them equal to zero, that

$$-b/(\sigma^2)^2 + \beta/\sigma^2 - \mathrm{tr}(\sigma^2 I + M)^{-1}H(\sigma^2 I + M)^{-1} + (h+1)\mathrm{tr}(\sigma^2 I + M)^{-1}$$
$$-\ \mathrm{tr}\big(gn(\sigma^2 I + M)^{-1}(\bar{x}_{..} - \mu)(\bar{x}_{..} - \mu)'(\sigma^2 I + M)^{-1}\big) = 0;$$

$$-(\sigma^2 I + M)^{-1}H(\sigma^2 I + M)^{-1} + (h+1)(\sigma^2 I + M)^{-1}$$
$$-\ gn(\sigma^2 I + M)^{-1}(\bar{x}_{..} - \mu)(\bar{x}_{..} - \mu)'(\sigma^2 I + M)^{-1} = (0);$$

$$gn(\sigma^2 I + M)^{-1}(\bar{x}_{..} - \mu) = 0,$$

for which the solutions are

$$\mu = \bar{x}_{..},$$
$$\sigma^2 = b/\beta,$$
$$M = (h+1)^{-1}H - (b/\beta)I.$$

Since $M$ is a nonnegative definite matrix, its maximum likelihood estimate must also be nonnegative definite, so the solutions above are the maximum likelihood estimates only if $(h+1)^{-1}H - (b/\beta)I$ is nonnegative definite. Although the solutions for $\mu$ and $\sigma^2$ are the natural unbiased estimates, the solution for $M$ is not, since $E[(h+1)^{-1}H - (b/\beta)I] = (h+1)^{-1}(hM - \sigma^2 I)$. In addition, we observe that this expectation is also not necessarily nonnegative definite.
Using the principle of marginal sufficiency referred to in Chapter 2, we see that $(b, H)$ is marginally sufficient for $(\sigma^2, M)$. Hence, we choose to use the marginal likelihood function of $(\sigma^2, M)$ instead of the likelihood function of $(\mu, \sigma^2, M)$. The marginal likelihood function of $(\sigma^2, M)$ can be expressed as

$$f(b, H) = \frac{K_m(I,h)}{\Gamma(\tfrac{1}{2}\beta)2^{\beta/2}}b^{\frac{1}{2}\beta - 1}|H|^{\frac{1}{2}(h-m-1)}(\sigma^2)^{-\beta/2}|\sigma^2 I + M|^{-h/2}$$
$$\times \exp[-b/2\sigma^2 - \tfrac{1}{2}\mathrm{tr}(\sigma^2 I + M)^{-1}H].$$

The logarithm of the likelihood, omitting a function of the observations, is

$$-b/2\sigma^2 - \tfrac{1}{2}\beta\ln\sigma^2 - \tfrac{1}{2}\mathrm{tr}(\sigma^2 I + M)^{-1}H - \tfrac{1}{2}h\ln|\sigma^2 I + M|,$$

and we seek the solution which maximizes this equation, or equivalently, the solution which minimizes

$$b/\sigma^2 + \beta\ln\sigma^2 + \mathrm{tr}(\sigma^2 I + M)^{-1}H + h\ln|\sigma^2 I + M|. \tag{4.1.3}$$

Again, if we ignore the constraints that $\sigma^2$ is positive and $M$ is nonnegative definite and seek the stationary values of (4.1.3) over all possible $(\sigma^2, M)$, we find, upon taking the partial derivatives of (4.1.3) with respect to $\sigma^2$ and $M$ and setting them equal to zero, that

$$-b/(\sigma^2)^2 + \beta/\sigma^2 - \mathrm{tr}(\sigma^2 I + M)^{-1}H(\sigma^2 I + M)^{-1} + h\,\mathrm{tr}(\sigma^2 I + M)^{-1} = 0,$$

and

$$-(\sigma^2 I + M)^{-1}H(\sigma^2 I + M)^{-1} + h(\sigma^2 I + M)^{-1} = (0),$$

for which the solutions are

$$\sigma_*^2 = b/\beta,$$
$$M_* = (1/h)H - (b/\beta)I.$$

Note that these solutions are the natural unbiased estimates of $\sigma^2$ and $M$ and, clearly, $E(M_*) = M$ is nonnegative definite. Hence, we choose to continue our work with the marginal likelihood function of $(\sigma^2, M)$. Since $M$ is nonnegative definite, the solutions above are the maximum likelihood estimates only if $(1/h)H - (b/\beta)I$ is also nonnegative definite. In the next section we will derive maximum likelihood estimates for $\sigma^2$ and $M$ which are valid for all possible $(b, H)$.
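A quick sanity check on the two candidate solutions for $M$: evaluating them at the expected values $E(H) = h(\sigma^2 I + M)$ and $E(b) = \beta\sigma^2$ recovers $M$ exactly for the marginal solution $M_*$, but not for $(h+1)^{-1}H - (b/\beta)I$. The sketch below is not from the dissertation; the parameter values are arbitrary.

```python
import numpy as np

m, h, beta = 3, 20, 420
sigma2 = 1.0
M = np.diag([9.0, 4.0, 0.0])          # a rank-2 nonnegative definite M

EH = h * (sigma2 * np.eye(m) + M)     # E(H)
Eb = beta * sigma2                    # E(b)

M_marginal = EH / h - (Eb / beta) * np.eye(m)        # (1/h)H - (b/beta)I
M_joint = EH / (h + 1) - (Eb / beta) * np.eye(m)     # (h+1)^{-1}H - (b/beta)I

assert np.allclose(M_marginal, M)                    # unbiased at the mean
assert not np.allclose(M_joint, M)                   # biased solution
assert np.allclose(M_joint, (h * M - sigma2 * np.eye(m)) / (h + 1))
```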
4.2 The Maximum Likelihood Estimates

In this section we seek the maximum likelihood estimates of $\sigma^2$ and $M$ subject to the constraints $\sigma^2 > 0$ and $M \in \bigcup_{j=0}^s P_j$. Recall that, aside from a constant, the logarithm of the likelihood function of $(\sigma^2, M)$ is

$$-b/2\sigma^2 - \tfrac{1}{2}\beta\ln\sigma^2 - \tfrac{1}{2}\mathrm{tr}(\sigma^2 I + M)^{-1}H - \tfrac{1}{2}h\ln|\sigma^2 I + M|.$$

We seek the solution which maximizes the equation above, or equivalently, the solution which minimizes

$$b/\sigma^2 + \beta\ln\sigma^2 + \mathrm{tr}(\sigma^2 I + M)^{-1}H + h\ln|\sigma^2 I + M|$$

subject to $\sigma^2 > 0$ and $M \in \bigcup_{j=0}^s P_j$. Note that this can be rewritten as

$$\frac{\beta}{m}\Big[\mathrm{tr}\big((\sigma^2 I_m)^{-1}((b/\beta)I_m)\big) + \ln|\sigma^2 I_m|\Big] + h\Big[\mathrm{tr}\big((\sigma^2 I + M)^{-1}H_*\big) + \ln|\sigma^2 I + M|\Big], \tag{4.2.1}$$

where $H_* = (1/h)H$. Since $(b/\beta)I$ and $H_*$ are both symmetric matrices, and $(b/\beta)I \in P_m$ and $H_* \in \bigcup_{j=0}^m P_j$, there exists a nonsingular matrix $K(m \times m)$ such that $K((b/\beta)I)K' = I$ and $KH_*K' = D$, where $D = \mathrm{diag}(d_1, d_2, \ldots, d_m)$ and $d_1 \geq d_2 \geq \cdots \geq d_m \geq 0$ are the latent roots of $H_*((b/\beta)I)^{-1} = (\beta/b)H_*$. Then with $\tilde{\sigma}^2 = \beta\sigma^2/b$ and $\tilde{M} = KMK'$, so that $K(\sigma^2 I + M)K' = \tilde{\sigma}^2 I + \tilde{M}$, (4.2.1) can be rewritten

$$\frac{\beta}{m}\Big[\mathrm{tr}(\tilde{\sigma}^2 I_m)^{-1} + \ln|\tilde{\sigma}^2 I_m|\Big] + h\Big[\mathrm{tr}(\tilde{\sigma}^2 I + \tilde{M})^{-1}D + \ln|\tilde{\sigma}^2 I + \tilde{M}|\Big] - \Big(\frac{\beta}{m} + h\Big)\ln|K|^2$$
$$= \phi(\tilde{\sigma}^2 I, \tilde{\sigma}^2 I + \tilde{M}; D, \beta, h) - \Big(\frac{\beta}{m} + h\Big)\ln|K|^2,$$
where $\phi$ is the function discussed in Section 2.2. Thus, the problem has been reduced to that of minimizing $\phi(\tilde{\sigma}^2 I, \tilde{\sigma}^2 I + \tilde{M}; D, \beta, h)$ subject to $\tilde{\sigma}^2 > 0$ and $\tilde{M} \in \bigcup_{j=0}^s P_j$, or equivalently, $(\tilde{\sigma}^2 I, \tilde{\sigma}^2 I + \tilde{M}) \in C_s$, since $C_s = \{(A, B)\colon A \in P_m, B \in P_m, B - A \in \bigcup_{j=0}^s P_j\}$. Now for fixed $(\tilde{\sigma}^2 I, \tilde{\sigma}^2 I + \tilde{M}) \in C_s$ consider $\phi(P(\tilde{\sigma}^2 I)P', P(\tilde{\sigma}^2 I + \tilde{M})P'; D, \beta, h) = \phi(\tilde{\sigma}^2 I, P(\tilde{\sigma}^2 I + \tilde{M})P'; D, \beta, h)$ for all orthogonal $P$. Note that this is minimized with respect to $P$ when $\mathrm{tr}\,P(\tilde{\sigma}^2 I + \tilde{M})^{-1}P'D$ is minimized. So from Lemma 2.2.1 it follows that all stationary points, and therefore the absolute minimum, of $\phi(\tilde{\sigma}^2 I, P(\tilde{\sigma}^2 I + \tilde{M})P'; D, \beta, h)$ occur when $P(\tilde{\sigma}^2 I + \tilde{M})P'$ is diagonal. Hence, in searching for the absolute minimum of $\phi(\tilde{\sigma}^2 I, \tilde{\sigma}^2 I + \tilde{M}; D, \beta, h)$ over all $(\tilde{\sigma}^2 I, \tilde{\sigma}^2 I + \tilde{M}) \in C_s$ we may assume that $\tilde{\sigma}^2 I + \tilde{M}$ is diagonal. This result also follows immediately from Lemma 2.2.12.
Now with $V = \mathrm{diag}(v_1, v_2, \ldots, v_m)$ and $f_i(v_i) = d_i/v_i + \ln v_i$, consider minimizing

$$\phi(uI, V; D, \beta, h) = \beta\Big(\frac{1}{u} + \ln u\Big) + h\sum_{i=1}^m\Big(\frac{d_i}{v_i} + \ln v_i\Big) \tag{4.2.2}$$
$$= \beta\Big(\frac{1}{u} + \ln u\Big) + h\sum_{i=1}^m f_i(v_i),$$

subject to $(uI, V) \in C_s$. The constraint $(uI, V) \in C_s$ can be equivalently written as

$$v_i \geq u > 0 \quad \text{for } i = 1, 2, \ldots, m, \tag{4.2.3}$$

and

$$v_i = u \quad \text{for } i \in J, \tag{4.2.4}$$

where $J \subset \{1, 2, \ldots, m\}$ is a set which has at least $m - s$ elements. Now

$$\frac{df_i(v_i)}{dv_i} = (1 - d_i/v_i)/v_i,$$

so that the function $f_i$ decreases monotonically in $v_i$ for $v_i \in (0, d_i]$, increases monotonically in $v_i$ for $v_i \in [d_i, \infty)$, and is minimized over all $v_i \in (0, \infty)$ when $v_i = d_i$. Thus, the unrestricted minimum of (4.2.2) occurs when $u = 1$ and $V = D$. It is evident from the structure of $f_i$ that if the unrestricted minimum does not satisfy the constraints (4.2.3) and (4.2.4), then the restricted minimum will occur when $u = v_{i_1} = v_{i_2} = \cdots = v_{i_k}$ for some set of integers $\{i_1, i_2, \ldots, i_k\} \subset \{1, 2, \ldots, m\}$. We need to determine $k$, the number of integers, and also we need to know exactly which $k$ integers from amongst the integers $1, 2, \ldots, m$ comprise the set $\{i_1, i_2, \ldots, i_k\}$.
First, we will consider the constraint given by (4.2.3). Let the variable $r$ be defined in the following manner. If $1 \leq d_m \leq d_{m-1} \leq \cdots \leq d_1$, then let $r = 0$. If $d_m \leq \cdots \leq d_{t+1} < 1 \leq d_t \leq \cdots \leq d_1$, then let $r$, $1 \leq r \leq m-1$, be the smallest value for which

$$d_{m-r} > \Big(\beta + h\sum_{j=m-r+1}^m d_j\Big)\Big/(\beta + rh).$$

Finally, if $d_m \leq d_{m-1} \leq \cdots \leq d_1 < 1$, then let $r$, $1 \leq r \leq m-1$, be the smallest value for which the inequality above is satisfied. If the inequality is not satisfied for any choice of $r$, $1 \leq r \leq m-1$, then let $r = m$. Now if $r = 0$, the minimum of (4.2.2) subject to (4.2.3) is simply the unrestricted minimum of (4.2.2), and if $r > 0$, the minimum of (4.2.2) subject to (4.2.3) is just the minimum of (4.2.2) subject to $u = v_m = v_{m-1} = \cdots = v_{m-r+1}$, which occurs at

$$u = v_m = \cdots = v_{m-r+1} = \Big(\beta + h\sum_{j=m-r+1}^m d_j\Big)\Big/(\beta + rh),$$
$$v_i = d_i \quad \text{for } i = 1, 2, \ldots, m-r.$$
Now consider the constraint given by (4.2.4). If $r \geq m-s$, then the minimum of (4.2.2) subject to (4.2.3) and (4.2.4) is simply the minimum of (4.2.2) subject to (4.2.3). If $r < m-s$, the minimum of (4.2.2) subject to (4.2.3) and (4.2.4) is obtained by minimizing (4.2.2) subject to

$$u = v_{j_1} = v_{j_2} = \cdots = v_{j_{m-s}} \quad \text{if } r = 0,$$
$$u = v_m = \cdots = v_{m-r+1} = v_{j_1} = \cdots = v_{j_{m-s-r}} \quad \text{if } r > 0, \tag{4.2.5}$$

where $\{j_1, j_2, \ldots, j_{m-s-r}\} \subset \{1, 2, \ldots, m-r-1, m-r\}$. We will now show that, in fact, $j_1 = m-r$, $j_2 = m-r-1$, ..., $j_{m-s-r} = s+1$. Note that for $q = 1, 2, \ldots, m-1$, (4.2.2) is minimized subject to $u = v_m = \cdots = v_{m-q+1}$ when

$$u = v_m = \cdots = v_{m-q+1} = \Big(\beta + h\sum_{j=m-q+1}^m d_j\Big)\Big/(\beta + qh),$$
$$v_j = d_j \quad \text{for } j = 1, 2, \ldots, m-q,$$

and has a minimal value equal to

$$(\beta + qh)\ln\left(\frac{\beta + h\sum_{j=m-q+1}^m d_j}{\beta + qh}\right) + (\beta + qh) + h(m-q) + h\sum_{j=1}^{m-q}\ln d_j. \tag{4.2.6}$$
Similarly, (4.2.2) is minimized subject to $u = v_m = \cdots = v_{m-q+1} = v_i$, where $i \in \{1, 2, \ldots, m-q-1, m-q\}$, when

$$u = v_m = \cdots = v_{m-q+1} = v_i = \Big(\beta + h\sum_{j=m-q+1}^m d_j + hd_i\Big)\Big/(\beta + (q+1)h),$$
$$v_j = d_j \quad \text{for } j = 1, \ldots, i-1, i+1, \ldots, m-q,$$

and has a minimal value equal to

$$(\beta + (q+1)h)\ln\left(\frac{\beta + h\sum_{j=m-q+1}^m d_j + hd_i}{\beta + (q+1)h}\right) + (\beta + (q+1)h) + h(m-q-1) + h\sum_{\substack{j=1 \\ j \neq i}}^{m-q}\ln d_j. \tag{4.2.7}$$
Now subtracting (4.2.6) from (4.2.7), we obtain

$$(\beta + (q+1)h)\ln\left(\frac{\beta + h\sum_{j=m-q+1}^m d_j + hd_i}{\beta + (q+1)h}\right) - (\beta + qh)\ln\left(\frac{\beta + h\sum_{j=m-q+1}^m d_j}{\beta + qh}\right) - h\ln d_i, \tag{4.2.8}$$

which is the increase in the minimal value of (4.2.2) due to the additional constraint, $u = v_i$. Differentiation of (4.2.8) with respect to $d_i$ yields

$$\frac{h(\beta + (q+1)h)}{\beta + h\sum_{j=m-q+1}^m d_j + hd_i} - \frac{h}{d_i},$$

which is negative when $d_i < (\beta + h\sum_{j=m-q+1}^m d_j)/(\beta + qh)$ and positive when $d_i > (\beta + h\sum_{j=m-q+1}^m d_j)/(\beta + qh)$. Hence, (4.2.8) is an increasing function of $d_i$ when $d_i > (\beta + h\sum_{j=m-q+1}^m d_j)/(\beta + qh)$, so that if $d_{m-q} > (\beta + h\sum_{j=m-q+1}^m d_j)/(\beta + qh)$, choosing $i = m-q$ will yield a smaller minimum value than any other choice of $i < m-q$.
In a similar manner, subtracting the unrestricted minimal value of (4.2.2) from the minimal value of (4.2.2) subject to $u = v_i$, where $i \in \{1, 2, \ldots, m\}$, we obtain

$$(\beta + h)\ln\left(\frac{\beta + hd_i}{\beta + h}\right) - h\ln d_i,$$

which is an increasing function of $d_i$ for $d_i > 1$. Thus, if $d_m > 1$, choosing $i = m$ will yield a smaller minimum value than any other choice of $i$. We can now determine the minimum of (4.2.2) subject to (4.2.3) and (4.2.4) when $r < m-s$. If $r = 0$, then $d_m > 1$, so that $m-r = m$ is the optimal choice for $j_1$. Further, since $d_i > 1$ for $i = 1, 2, \ldots, m$ when $r = 0$, we have

$$\frac{\beta + h\sum_{j=m-q+1}^m d_j}{\beta + qh} < \frac{(\beta + qh)\Big(\sum_{j=m-q+1}^m d_j/q\Big)}{\beta + qh} = \sum_{j=m-q+1}^m d_j\big/q \leq d_{m-q}$$

for $q = 1, 2, \ldots, m-1$, and hence, when $r = 0$, choosing $j_1 = m$, $j_2 = m-1$, ..., $j_{m-s} = s+1$ in (4.2.5) will yield a smaller minimum than any other choice of $\{j_1, j_2, \ldots, j_{m-s}\} \subset \{1, 2, \ldots, m-1, m\}$. Now from the definition of $r$ we see that $d_{m-r} > (\beta + h\sum_{j=m-r+1}^m d_j)/(\beta + rh)$ if $1 \leq r \leq m-1$. In addition, for $q = 1, 2, \ldots, m-2$, if $d_{m-q} > (\beta + h\sum_{j=m-q+1}^m d_j)/(\beta + qh)$, then

$$\frac{\beta + h\sum_{j=m-q}^m d_j}{\beta + (q+1)h} = \frac{(\beta + qh)\Big(\big(\beta + h\sum_{j=m-q+1}^m d_j\big)\big/(\beta + qh)\Big) + hd_{m-q}}{\beta + (q+1)h}$$
$$< \frac{(\beta + qh)d_{m-q} + hd_{m-q}}{\beta + (q+1)h} = d_{m-q} \leq d_{m-q-1}.$$

Thus, $d_{m-q} > (\beta + h\sum_{j=m-q+1}^m d_j)/(\beta + qh)$ for $q = r, r+1, \ldots, m-1$. It then follows that, when $1 \leq r < m-s$, choosing $j_1 = m-r$, $j_2 = m-r-1$, ..., $j_{m-s-r} = s+1$ in (4.2.5) will yield a smaller minimum than any other choice of $\{j_1, j_2, \ldots, j_{m-s-r}\} \subset \{1, 2, \ldots, m-r-1, m-r\}$.
We can now obtain the minimal solution to (4.2.2) subject to (4.2.3) and (4.2.4). Denoting the minimal solution by $(u_s, V_s)$, where $V_s = \mathrm{diag}(v_{s1}, v_{s2}, \ldots, v_{sm})$, we find that if $r \geq m-s$,

$$u_s = v_{sm} = v_{s,m-1} = \cdots = v_{s,m-r+1} = \Big(\beta + h\sum_{j=m-r+1}^m d_j\Big)\Big/(\beta + rh),$$
$$v_{sj} = d_j \quad \text{for } j = 1, 2, \ldots, m-r,$$

and if $r < m-s$,

$$u_s = v_{sm} = v_{s,m-1} = \cdots = v_{s,s+1} = \Big(\beta + h\sum_{j=s+1}^m d_j\Big)\Big/(\beta + (m-s)h),$$
$$v_{sj} = d_j \quad \text{for } j = 1, 2, \ldots, s.$$

Thus, $\phi(\tilde{\sigma}^2 I, \tilde{\sigma}^2 I + \tilde{M}; D, \beta, h)$ is minimized subject to $(\tilde{\sigma}^2 I, \tilde{\sigma}^2 I + \tilde{M}) \in C_s$ at

$$\tilde{\sigma}^2 = u_s,$$
$$\tilde{M} = V_s - u_sI.$$

The maximum likelihood estimates of $\sigma^2$ and $M$ are, therefore, $\hat{\sigma}^2$ and $\hat{M}$ given by

$$\hat{\sigma}^2 = bu_s/\beta,$$
$$\hat{M} = K^{-1}(V_s - u_sI)K'^{-1}.$$
To illustrate the computation involved in deriving the maximum likelihood estimates, we will again consider the example presented in Section 2.3. Recall that with $m = 4$, $g = 21$, $n = 6$, $\Sigma = I$, and $M = \mathrm{diag}(99, 24, 0, 0)$, a matrix $E$ from the distribution $W_4(I, 105, 0)$ and a matrix $H$ from the distribution $W_4(I+M, 20, 0)$ were generated. With $\beta = mg(n-1) = 420$, $b = \mathrm{tr}\,E$, and $H_* = (1/20)H$, we need to find a nonsingular matrix $K$ such that $K((b/\beta)I)K' = I$ and $KH_*K' = D$, where $D$ is a diagonal matrix. Let $D_1 = \mathrm{diag}(\mathrm{ch}_1(H_*), \mathrm{ch}_2(H_*), \ldots, \mathrm{ch}_m(H_*))$, and let $P$ be the orthogonal matrix for which the $i$th column is the characteristic vector of $H_*$ corresponding to $\mathrm{ch}_i(H_*)$; then, since $H_*$ is symmetric, $P'H_*P = D_1$. Clearly, $((\beta/b)^{\frac{1}{2}}P)'((b/\beta)I)((\beta/b)^{\frac{1}{2}}P) = P'P = I$ and $((\beta/b)^{\frac{1}{2}}P)'H_*((\beta/b)^{\frac{1}{2}}P) = (\beta/b)D_1$. Hence, we find that, for our example,

$$K = \begin{pmatrix} 1.00723 & .0551967 & .00906271 & .00104477 \\ .0551796 & 1.00719 & .00310948 & .0127707 \\ .00921922 & .00261055 & 1.00874 & .00010739 \\ .000345628 & .0128084 & .000137374 & 1.0087 \end{pmatrix}$$

and $D = \mathrm{diag}(94.1065,\ 34.8845,\ 1.01721,\ .618312)$. Note that $d_4 < 1 \leq d_3$, so that $r = 1$, and the procedure of this section yields $u_0 = 6.06506$, $V_0 = 6.06506I$, $u_1 = 2.39667$, $V_1 = \mathrm{diag}(94.1065,\ 2.39667,\ 2.39667,\ 2.39667)$, $u_2 = .984153$, $V_2 = \mathrm{diag}(94.1065,\ 34.8845,\ .984153,\ .984153)$, and $u_3 = u_4 = .982651$, $V_3 = V_4 = \mathrm{diag}(94.1065,\ 34.8845,\ 1.01721,\ .982651)$.
Thus, if we let $\hat{\sigma}_i^2$ and $\hat{M}_i$ denote the maximum likelihood estimates of $\sigma^2$ and $M$, respectively, subject to the constraints $\sigma^2 > 0$ and $M \in \bigcup_{j=0}^i P_j$, we see that

$$\hat{\sigma}_0^2 = 5.95987, \qquad \hat{M}_0 = (0),$$

$$\hat{\sigma}_1^2 = 2.3551, \qquad \hat{M}_1 = \begin{pmatrix} 89.8421 & 4.92338 & .808366 & .0931905 \\ & .269803 & .0442987 & .00510687 \\ & & .00727337 & .000838493 \\ & & & .0000966637 \end{pmatrix},$$

$$\hat{\sigma}_2^2 = .967085, \qquad \hat{M}_2 = \begin{pmatrix} 91.3255 & 3.17993 & .826433 & .0715582 \\ & 33.4811 & .0575388 & .426237 \\ & & .00770191 & .000448497 \\ & & & .0054369 \end{pmatrix},$$

$$\hat{\sigma}_3^2 = \hat{\sigma}_4^2 = .965608, \qquad \hat{M}_3 = \hat{M}_4 = \begin{pmatrix} 91.327 & 3.17993 & .826136 & .0715587 \\ & 33.4825 & .0574547 & .426256 \\ & & .0416617 & .000452157 \\ & & & .00543713 \end{pmatrix},$$

where only the upper triangle of each symmetric matrix $\hat{M}_i$ is displayed.
4.3 The Likelihood Ratio Test

Recall that $C_s = \{(A, B)\colon A \in P_m, B \in P_m, B - A \in \bigcup_{j=0}^s P_j\}$, and suppose we know that $(\sigma^2 I, \sigma^2 I + M) \in \Omega = C_s$. We wish to test, say, the null hypothesis that $(\sigma^2 I, \sigma^2 I + M) \in \omega = C_{s-1} \subset C_s$. The alternative hypothesis, then, is that $(\sigma^2 I, \sigma^2 I + M) \in \Omega - \omega = C_s - C_{s-1}$. Thus, we are testing the hypothesis

$$H_0^{(s)}\colon \mathrm{rank}(M) \leq s-1$$

against the hypothesis

$$H_1^{(s)}\colon \mathrm{rank}(M) = s.$$

We adopt the likelihood ratio approach and look at

$$\max_\omega f(b, H)\big/\max_\Omega f(b, H) = \lambda \in (0, 1].$$
With $u_s$ and the matrix $V_s = \mathrm{diag}(v_{s1}, v_{s2}, \ldots, v_{sm})$ given by

$$u_s = v_{sm} = \cdots = v_{s,m-r+1} = \Big(\beta + h\sum_{j=m-r+1}^m d_j\Big)\Big/(\beta + rh),$$
$$v_{sj} = d_j \quad \text{for } j = 1, 2, \ldots, m-r,$$

if $r \geq m-s$, and

$$u_s = v_{sm} = \cdots = v_{s,s+1} = \Big(\beta + h\sum_{j=s+1}^m d_j\Big)\Big/(\beta + (m-s)h),$$
$$v_{sj} = d_j \quad \text{for } j = 1, 2, \ldots, s,$$

if $r < m-s$, the maximum likelihood estimators $\hat{\sigma}_\Omega^2$, of $\sigma^2$, and $\hat{M}_\Omega$, of $M$, when the parameters are restricted to lie within $\Omega$, are given by

$$\hat{\sigma}_\Omega^2 = bu_s/\beta,$$
$$\hat{M}_\Omega = K^{-1}(V_s - u_sI)K'^{-1},$$

where $K$ is a nonsingular matrix. Similarly, the maximum likelihood estimators $\hat{\sigma}_\omega^2$, of $\sigma^2$, and $\hat{M}_\omega$, of $M$, when the parameters are restricted to lie within $\omega$, are given by

$$\hat{\sigma}_\omega^2 = bu_{s-1}/\beta,$$
$$\hat{M}_\omega = K^{-1}(V_{s-1} - u_{s-1}I)K'^{-1}.$$

It should be noted that if $r \geq m-s+1$, then $V_s = V_{s-1}$ and $u_s = u_{s-1}$.
The likelihood ratio, $\lambda$, is

$$\lambda = \max_\omega f(b, H)\big/\max_\Omega f(b, H)$$
$$= \frac{\exp[-b/2\hat{\sigma}_\omega^2 - \tfrac{1}{2}\mathrm{tr}(\hat{\sigma}_\omega^2 I + \hat{M}_\omega)^{-1}H]\,(\hat{\sigma}_\omega^2)^{-\beta/2}\,|\hat{\sigma}_\omega^2 I + \hat{M}_\omega|^{-h/2}}{\exp[-b/2\hat{\sigma}_\Omega^2 - \tfrac{1}{2}\mathrm{tr}(\hat{\sigma}_\Omega^2 I + \hat{M}_\Omega)^{-1}H]\,(\hat{\sigma}_\Omega^2)^{-\beta/2}\,|\hat{\sigma}_\Omega^2 I + \hat{M}_\Omega|^{-h/2}}$$
$$= \frac{\exp[-\beta/2u_{s-1} - \tfrac{1}{2}h\,\mathrm{tr}\,V_{s-1}^{-1}D]\,u_{s-1}^{-\beta/2}\,|V_{s-1}|^{-h/2}}{\exp[-\beta/2u_s - \tfrac{1}{2}h\,\mathrm{tr}\,V_s^{-1}D]\,u_s^{-\beta/2}\,|V_s|^{-h/2}}$$
$$= \frac{|V_s|^{h/2}\,u_s^{\beta/2}}{|V_{s-1}|^{h/2}\,u_{s-1}^{\beta/2}},$$

since, if $r \geq m-s+1$,

$$\beta\Big(\frac{1}{u_{s-1}} - \frac{1}{u_s}\Big) + h\,\mathrm{tr}(V_{s-1}^{-1} - V_s^{-1})D \tag{4.3.1}$$
$$= 0,$$

and if $r < m-s+1$, (4.3.1) becomes

$$\frac{\beta(\beta + (m-s+1)h)}{\beta + h\sum_{j=s}^m d_j} + h\left[(s-1) + \frac{(\beta + (m-s+1)h)\sum_{j=s}^m d_j}{\beta + h\sum_{j=s}^m d_j}\right]$$
$$-\ \frac{\beta(\beta + (m-s)h)}{\beta + h\sum_{j=s+1}^m d_j} - h\left[s + \frac{(\beta + (m-s)h)\sum_{j=s+1}^m d_j}{\beta + h\sum_{j=s+1}^m d_j}\right]$$
$$= [\beta + (m-s+1)h + (s-1)h] - [\beta + (m-s)h + sh] = 0.$$
So we have

$$\lambda = \begin{cases} d_s^{h/2}\left(\dfrac{\beta + h\sum_{j=s+1}^m d_j}{\beta + (m-s)h}\right)^{(\beta + (m-s)h)/2}\Bigg/\left(\dfrac{\beta + h\sum_{j=s}^m d_j}{\beta + (m-s+1)h}\right)^{(\beta + (m-s+1)h)/2} & \text{if } r < m-s+1, \\[2ex] 1 & \text{if } r \geq m-s+1. \end{cases}$$

Putting $t_s = hd_s/(\beta + h\sum_{j=s}^m d_j)$, we can rewrite $\lambda$ as

$$\lambda = \begin{cases} \left[\dfrac{(\beta + (m-s+1)h)^{\beta + (m-s+1)h}}{h^h(\beta + (m-s)h)^{\beta + (m-s)h}}\right]^{1/2} t_s^{h/2}(1 - t_s)^{(\beta + (m-s)h)/2} & \text{if } r < m-s+1, \\[2ex] 1 & \text{if } r \geq m-s+1. \end{cases}$$
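The boundary behavior of $\lambda$ can be checked numerically: at $t_s = h/(\beta + (m-s+1)h)$ the expression equals one, and it decreases beyond that point. The sketch below is not from the dissertation; it works on the log scale to avoid overflow and borrows $\beta$, $h$, $m$, and $s$ from the example of Section 4.2.

```python
import numpy as np

beta, h, m, s = 420.0, 20.0, 4, 2
A = beta + (m - s + 1) * h   # beta + (m-s+1)h = 480
B = beta + (m - s) * h       # beta + (m-s)h = 460

def log_lam(t):
    # log of the likelihood ratio as a function of t_s (r < m-s+1 case)
    return 0.5 * (A * np.log(A) - h * np.log(h) - B * np.log(B)
                  + h * np.log(t) + B * np.log(1 - t))

t_star = h / A               # h / (beta + (m-s+1)h)
assert abs(log_lam(t_star)) < 1e-8          # lambda = 1 at the boundary
assert log_lam(0.2) < log_lam(0.1) < 0.0    # lambda decreases beyond t_star
```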
We will now show that $r < m-s+1$ if and only if $t_s > h/(\beta + (m-s+1)h)$. First consider the case in which $s = m$. Then $r < m-s+1 = 1$ if and only if $d_m > 1$, and

$$t_m = hd_m/(\beta + hd_m) = h/(\beta/d_m + h) > h/(\beta + h) \quad \text{if } d_m > 1,$$

while

$$t_m = h/(\beta/d_m + h) \leq h/(\beta + h) \quad \text{if } d_m \leq 1.$$

Consider now the case in which $1 \leq s \leq m-1$. Again we want to show that $r < m-s+1$ if and only if $t_s > h/(\beta + (m-s+1)h)$. If $r = 0$, clearly

$$d_{m-i} > \Big(\beta + h\sum_{j=m-i+1}^m d_j\Big)\Big/(\beta + ih),$$

for $i = 1, 2, \ldots, m-1$. Also, if $0 < r < m-s+1$, then

$$d_{m-r} > \Big(\beta + h\sum_{j=m-r+1}^m d_j\Big)\Big/(\beta + rh),$$

and we have seen that this implies that

$$d_{m-q} > \Big(\beta + h\sum_{j=m-q+1}^m d_j\Big)\Big/(\beta + qh),$$

for $q = r, r+1, \ldots, m-1$ and, more specifically, for $q = m-s$. Hence, if $r < m-s+1$,

$$d_s > \Big(\beta + h\sum_{j=s+1}^m d_j\Big)\Big/(\beta + (m-s)h),$$

which implies

$$\beta + h\sum_{j=s}^m d_j < d_s(\beta + (m-s)h) + hd_s,$$

so that

$$\beta + h\sum_{j=s}^m d_j < hd_s\Big(\frac{\beta}{h} + m - s + 1\Big),$$

and thus

$$t_s = \frac{hd_s}{\beta + h\sum_{j=s}^m d_j} > \frac{1}{\beta/h + m - s + 1} = \frac{h}{\beta + (m-s+1)h}.$$

Also, if $r \geq m-s+1$, then it must be true that

$$d_s \leq \Big(\beta + h\sum_{j=s+1}^m d_j\Big)\Big/(\beta + (m-s)h),$$

which implies that

$$t_s = \frac{hd_s}{\beta + h\sum_{j=s}^m d_j} \leq \frac{h}{\beta + (m-s+1)h}.$$
It follows that the likelihood ratio, $\lambda$, can be written as

$$\lambda = \begin{cases} \left[\dfrac{(\beta + (m-s+1)h)^{\beta + (m-s+1)h}}{h^h(\beta + (m-s)h)^{\beta + (m-s)h}}\right]^{1/2} t_s^{h/2}(1 - t_s)^{(\beta + (m-s)h)/2} & \text{if } t_s > \dfrac{h}{\beta + (m-s+1)h}, \\[2ex] 1 & \text{if } t_s \leq \dfrac{h}{\beta + (m-s+1)h}. \end{cases}$$

Consider the function $g(t_s) = t_s^h(1 - t_s)^{\beta + (m-s)h}$. The derivative of $g(t_s)$ with respect to $t_s$ is

$$t_s^{h-1}(1 - t_s)^{\beta + (m-s)h - 1}[h(1 - t_s) - (\beta + (m-s)h)t_s],$$

which is negative for $t_s \in (h/(\beta + (m-s+1)h), 1)$. Thus, $\lambda$ is a decreasing function of $t_s$ when $t_s \in (h/(\beta + (m-s+1)h), 1)$. In addition,

$$\frac{(\beta + (m-s+1)h)^{\beta + (m-s+1)h}}{h^h(\beta + (m-s)h)^{\beta + (m-s)h}}\,t_s^h(1 - t_s)^{\beta + (m-s)h} \leq 1$$

for $t_s \in [h/(\beta + (m-s+1)h), 1)$, with equality when $t_s = h/(\beta + (m-s+1)h)$, so that $\lambda$ is a decreasing function of $t_s$. Since the likelihood ratio test rejects $H_0^{(s)}$ for small values of $\lambda$, it equivalently rejects $H_0^{(s)}$ for large values of $t_s$. However, the distribution of $t_s$ is intractable, and so use of $t_s$ in a test of $H_0^{(s)}$ versus $H_1^{(s)}$ is not practical. In the following chapter we present an alternative test statistic for testing $H_0^{(s)}$ versus $H_1^{(s)}$.