Solutions to the multivariate G-sample Behrens-Fisher problem based upon generalizations of the Brown-Forsythe F* and Wi...


Material Information

Title:
Solutions to the multivariate G-sample Behrens-Fisher problem based upon generalizations of the Brown-Forsythe F* and Wilcox Hm tests
Physical Description:
v, 148 leaves : ill. ; 29 cm.
Language:
English
Creator:
Coombs, William Thomas, 1954-
Publication Date:

Subjects

Subjects / Keywords:
Statistical hypothesis testing   ( lcsh )
Multivariate analysis   ( lcsh )
Education -- Research -- Statistical methods   ( lcsh )
Genre:
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 1992.
Bibliography:
Includes bibliographical references (leaves 141-147).
Statement of Responsibility:
by William Thomas Coombs.
General Note:
Typescript.
General Note:
Vita.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 001801844
notis - AJM5613
oclc - 27719758
System ID:
AA00002084:00001

Full Text









SOLUTIONS TO THE MULTIVARIATE G-SAMPLE
BEHRENS-FISHER PROBLEM BASED UPON GENERALIZATIONS
OF THE BROWN-FORSYTHE F* AND WILCOX Hm TESTS












By

WILLIAM THOMAS COOMBS

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

1992












ACKNOWLEDGEMENTS


I would like to express my sincerest appreciation to the individuals that have assisted me in completing this study. First, I would like to thank Dr. James J. Algina, chairperson of my doctoral committee, for suggesting the topic of my dissertation, guiding me through difficult applied and theoretical barriers, debugging computer errors, providing editorial suggestions, and fostering my professional and personal growth through encouragement, support, and friendship. Second, I am indebted to and grateful for the other members of my committee, Dr. Linda Crocker, Dr. David Miller, and Dr. Ronald Randles, for patiently reading the manuscript, offering constructive suggestions, providing editorial assistance, and giving continuous support. Third, I must thank John Newell who, as a fifth and unofficial member of my committee, still attended committee meetings, read the manuscript, and vigilantly inquired as to the progress of the project.

Finally, I would like to express my heartfelt thanks to my wife Laura and son Tommy. Space limitations prevent me from enumerating the many personal sacrifices, both large and small, required of my wife so that I was able to accomplish this task. Although I shall never [...] committee and family, let me begin simply and sincerely--thank you.













TABLE OF CONTENTS

ACKNOWLEDGEMENTS

ABSTRACT

CHAPTERS

1 INTRODUCTION
    The Problem
    Purpose of the Study
    Significance of the Study

2 REVIEW OF LITERATURE
    The Independent Samples t Test
    Alternatives to the Independent Samples t Test
    ANOVA F Test
    Alternatives to the ANOVA F Test
    Hotelling's T2 Test
    Alternatives to the Hotelling's T2 Test
    MANOVA Criteria
    Alternatives to the MANOVA Criteria

3 METHODOLOGY
    Development of Test Statistics
        Brown-Forsythe Generalizations
            Scale of the measures of between and within group variability
            Equality of expectation of the measures of between and within group dispersion
        Wilcox Generalization
    Invariance Property of the Test Statistics
        Brown-Forsythe Generalizations
        Wilcox Generalization
    Design
    Simulation Procedure
    Summary

4 RESULTS AND DISCUSSION
    Brown-Forsythe Generalizations
    Johansen Test and Wilcox Generalization

5 CONCLUSIONS
    General Observations
    Suggestions to Future Researchers

APPENDIX ESTIMATED TYPE I ERROR RATES

REFERENCES

BIOGRAPHICAL SKETCH













Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

SOLUTIONS TO THE MULTIVARIATE G-SAMPLE
BEHRENS-FISHER PROBLEM BASED UPON GENERALIZATIONS
OF THE BROWN-FORSYTHE F* AND WILCOX Hm TESTS

William Thomas Coombs

August 1992

Chairperson: James J. Algina
Major Department: Foundations of Education

The Brown-Forsythe $F^*$ and Wilcox $H_m$ tests are generalized to form multivariate alternatives to MANOVA for use in situations where dispersion matrices are heteroscedastic. Four generalizations of the Brown-Forsythe $F^*$ test are included. Type I error rates for the Johansen test and the five new generalizations were estimated using simulated data for a variety of conditions. The design of the experiment was a $2^8$ factorial. The factors were (a) type of distribution, (b) number of dependent variables, (c) number of groups, (d) ratio of total sample size to number of dependent variables, (e) form of the sample size ratio, (f) degree of the sample size ratio, (g) degree of heteroscedasticity, and (h) relationship of sample size to dispersion matrices. Only conditions in which dispersion matrices were heterogeneous were included. In controlling Type I error rates, the four generalizations of the Brown-Forsythe $F^*$ test greatly outperform both the Johansen test and the generalization of the Wilcox $H_m$ test.












CHAPTER 1
INTRODUCTION

Comparing two population means using data from independent samples is one of the most fundamental problems in statistical hypothesis testing. One solution to this problem, the independent samples t test, is based on the assumption that the samples are drawn from populations with equal variances. According to Yao (1965), Behrens (1929) was the first to solve the testing problem without making the assumption of equal population variances. Fisher (1935, 1939) showed that Behrens's solution could be derived from Fisher's theory of statistical inference called fiducial probability. Others (Aspin, 1948; Welch, 1938, 1947) have proposed solutions to the two-sample Behrens-Fisher problem as well.

The independent samples t test has been generalized to the analysis of variance (ANOVA) F test, a test of the equality of G population means. This procedure assumes homoscedasticity, that is, $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_G^2$. Several authors have proposed procedures to test $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_G$ without assuming equal population variances. Welch (1951) extended his 1938 work and arrived at an approximate degrees of freedom (APDF) solution. Brown and Forsythe (1974), James (1951), and Wilcox (1988, 1989) have proposed other solutions to the G-sample Behrens-Fisher problem.

Hotelling (1931) generalized the independent samples t test to a test of the equality of two population mean vectors. This procedure makes the assumption of equal population dispersion (variance-covariance) matrices, that is, $\Sigma_1 = \Sigma_2$. Several authors have proposed procedures to test $H_0\colon \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2$ without assuming equal population dispersion matrices. James (1954) generalized his 1951 work and arrived at a series solution. Anderson (1958), Bennett (1951), Ito (1969), Nel and van der Merwe (1986), Scheffe (1943), and Yao (1965) have proposed additional solutions to the multivariate two-sample Behrens-Fisher problem.

Bartlett (1939), Hotelling (1951), Lawley (1938), Pillai (1955), Roy (1945), and Wilks (1932) have proposed multivariate generalizations of the ANOVA F test, creating the four basic multivariate analysis of variance (MANOVA) procedures for testing $H_0\colon \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \cdots = \boldsymbol{\mu}_G$. These procedures make the assumption of equal population dispersion matrices. James (1954) and Johansen (1980) proposed procedures to test the equality of mean vectors without making the assumption of homoscedasticity, that is, $\Sigma_1 = \Sigma_2 = \cdots = \Sigma_G$. James extended James's (1951) univariate procedures to produce first-order and second-order series solutions.









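The Problem

To date, neither the Brown-Forsythe (1974) nor the Wilcox (1989) procedure has been extended to the multivariate setting. To test $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_G$, Brown and Forsythe (1974) proposed the statistic

$$F^* = \frac{\sum_{i=1}^{G} n_i (\bar{x}_i - \bar{x})^2}{\sum_{i=1}^{G} \left(1 - \dfrac{n_i}{N}\right) s_i^2},$$

where $n_i$ denotes the number of observations in the $i$th group, $\bar{x}_i$ the mean for the $i$th group, $\bar{x}$ the grand mean, $s_i^2$ the variance of the $i$th group, $N$ the total number of observations, and $G$ the number of groups. The statistic $F^*$ is approximately distributed as F with $G - 1$ and $f$ degrees of freedom, where

$$f^{-1} = \sum_{i=1}^{G} \frac{c_i^2}{n_i - 1}, \qquad c_i = \frac{\left(1 - \dfrac{n_i}{N}\right) s_i^2}{\sum_{j=1}^{G} \left(1 - \dfrac{n_j}{N}\right) s_j^2}.$$

The degrees of freedom, $f$, were determined using a procedure due to Satterthwaite (1941).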
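The computation of $F^*$ and its approximate denominator degrees of freedom can be sketched in a few lines of Python. This is only a minimal illustration, not code from the study: the function name `brown_forsythe_fstar` and the example data are hypothetical, and the degrees of freedom are assumed to follow the usual Satterthwaite form $f^{-1} = \sum c_i^2 / (n_i - 1)$. The sketch returns the statistic and both degrees of freedom and leaves the lookup in the F distribution to the reader.

```python
from statistics import mean, variance


def brown_forsythe_fstar(groups):
    """Return (F*, numerator df, approximate denominator df f).

    `groups` is a list of samples, each a list of at least two numbers.
    The name and interface are illustrative assumptions.
    """
    n = [len(g) for g in groups]               # group sizes n_i
    total_n = sum(n)                           # N
    means = [mean(g) for g in groups]          # group means
    variances = [variance(g) for g in groups]  # unbiased s_i^2
    grand_mean = sum(ni * mi for ni, mi in zip(n, means)) / total_n

    # Numerator: between-group sum of squares.
    between = sum(ni * (mi - grand_mean) ** 2 for ni, mi in zip(n, means))
    # Denominator: group variances weighted by (1 - n_i / N).
    weighted = [(1 - ni / total_n) * vi for ni, vi in zip(n, variances)]
    denom = sum(weighted)

    f_star = between / denom
    # Satterthwaite-type approximation: 1/f = sum of c_i^2 / (n_i - 1).
    c = [w / denom for w in weighted]
    f = 1.0 / sum(ci ** 2 / (ni - 1) for ci, ni in zip(c, n))
    return f_star, len(groups) - 1, f


if __name__ == "__main__":
    # Hypothetical data: two small groups with unequal sizes and variances.
    print(brown_forsythe_fstar([[1, 2, 3], [2, 4, 6, 8]]))
```

The weights $(1 - n_i/N)$ make the expected value of the denominator equal that of the numerator under the null hypothesis, which is why no division by $G - 1$ appears in the statistic itself.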


To test $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_G$, Wilcox (1989) proposed the statistic


-y


where


i cxi


Clz


n)


N











n, ( n


-X
+ 1) 2


+ 1)


and


i

i=1


t w.
1


In the equation, $x_{i n_i}$ denotes the last observation in the $i$th sample. The statistic $H_m$ is approximately distributed as chi-square with $G - 1$ degrees of freedom.


Purpose of the Study

The purpose of this study is to extend the univariate procedures proposed by Brown and Forsythe (1974) and Wilcox (1989) to test $H_0\colon \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \cdots = \boldsymbol{\mu}_G$ and to compare Type I error rates of the proposed multivariate generalizations to the error rates of Johansen's (1980) test under varying distributions, numbers of dependent (criterion) variables, numbers of groups, forms of the sample size ratio, degrees of the sample size ratio, ratios of total sample size to number of dependent variables, degrees of heteroscedasticity, and relationships of sample size to dispersion matrices.


Significance of the Study

The application of multivariate analysis of variance [...] the future [...] data analysis (Bray & Maxwell, 1985, p. 7). Stevens suggested three reasons why multivariate analysis is prominent:

1. Any worthwhile treatment will affect the subjects in more than one way, hence the problem for the investigator is to determine in which specific ways the subjects will be affected and then find sensitive measurement techniques for those variables.

2. Through the use of multiple criterion measures we can obtain a more complete and detailed description of the phenomenon under investigation.

3. Treatments can be expensive to implement, while the cost of obtaining data on several dependent variables is relatively small and maximizes information gain. (1986, p. 2)


Hotelling's $T^2$ is sensitive to violations of homoscedasticity, particularly when sample sizes are unequal (Algina & Oshima, 1990; Algina, Oshima, & Tang, 1991; Hakstian, Roed, & Lind, 1979; Holloway & Dunn, 1967; Hopkins & Clay, 1963; Ito & Schull, 1964). Yao's (1965) test, James's (1954) first- and second-order tests, and Johansen's (1980) test are alternatives to Hotelling's $T^2$ that do not have the underlying assumption of homoscedasticity. In controlling Type I error rates under heteroscedasticity, Yao's test is superior to James's first-order test (Algina & Tang, 1988; Yao, 1965). Algina, Oshima, and Tang (1991) studied Type I error rates of the four procedures when applied to data sampled from multivariate distributions composed of independent [...] can be seriously nonrobust with extremely skewed distributions such as the exponential and lognormal, but are fairly robust with moderately skewed distributions such as the beta(5, [...]). They also appear to be robust with non-normal symmetric distributions such as the uniform, t, and Laplace. The performance of Yao's test, James's second-order test, and Johansen's test was slightly superior to the performance of James's first-order test (Algina, Oshima, & Tang, 1991).

MANOVA criteria are relatively robust to non-normality (Olson, 1974, 1976) but are sensitive to violations of homoscedasticity (Korin, 1972; Olson, 1974, 1979; Pillai & Sudjana, 1975; Stevens, 1979). The Pillai-Bartlett trace criterion is the most robust of the four basic MANOVA criteria for protection against non-normality and heteroscedasticity of dispersion matrices (Olson, 1974, 1976, 1979). Alternatives to the MANOVA criteria that are not based on the homoscedasticity assumption include James's first- and second-order tests, and Johansen's test. When sample sizes are unequal, dispersion matrices are unequal, and data are sampled from multivariate normal distributions, Johansen's test and James's second-order test outperform the Pillai-Bartlett trace criterion and James's first-order test (Tang, 1989).

In the univariate case, the Brown-Forsythe $F^*$ test and the Wilcox $H_m$ test, which do not require equality of population variances, [...] test. This suggests that generalizations of the Brown-Forsythe procedure and the Wilcox procedure might have advantages over the commonly used MANOVA procedure in cases of heteroscedasticity.


Brown and Forsythe (1974) used Monte Carlo techniques to examine the ANOVA F test, Brown-Forsythe $F^*$ test, Welch APDF test, and James first-order procedure. The critical value proposed by Welch is a better approximation for small sample sizes than that proposed by James. Under normality and inequality of variances both Welch's test and the $F^*$ test tend to have actual Type I error rates near nominal error rates in a wide variety of conditions. However, there are conditions in which each fails to control the Type I error rate. In terms of power, the choice between Welch (the specialization of Johansen's test and, in the case of two groups, of Yao's test) and $F^*$ depends upon the magnitude of the means and their standard errors. The Welch test is preferred to the $F^*$ test if extreme means coincide with small variances. When the extreme means coincide with large variances, the power of the $F^*$ test is greater than that of the Welch test.

In a limited simulation, Clinch and Keselman (1982) indicated that under conditions of heteroscedasticity, the Brown-Forsythe test is less sensitive to non-normality than is Welch's test. In fact, Clinch and Keselman concluded the user [...] normal data, in some conditions the $F^*$ test has better control over $\tau$ (the actual Type I error rate) than does James's second-order test, Welch's test, or Wilcox's $H_m$ test. In other conditions the $F^*$ test has substantially worse control. Oshima and Algina concluded that James's second-order test should be used with symmetric distributions and Wilcox's $H_m$ test should be used with moderately asymmetric distributions. With markedly asymmetric distributions none of the tests had good control of $\tau$.

Extensive simulations (Wilcox, 1988) indicated that under normality the Wilcox $H$ procedure always gave the experimenter more control over Type I error rates than did the $F^*$ or Welch test and has error rates similar to James's second-order method, regardless of the degree of heteroscedasticity. Wilcox (1989) proposed $H_m$, an improvement to the Wilcox (1988) $H$ method; the improved test is much easier to use than James's second-order method. Wilcox (1990) indicated that the $H_m$ test is more robust to non-normality than is the Welch test.

Because the Johansen (1980) procedure is the extension of the Welch test, the results reported by Clinch and Keselman and by Wilcox suggest generalizations of the Brown-Forsythe procedure and the Wilcox procedure might have advantages over the Johansen procedure in some cases of heteroscedasticity and/or skewness. Thus the construction and comparison of new procedures which may be competitive or even superior under [...]













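CHAPTER 2
REVIEW OF LITERATURE

The Independent Samples t Test

The independent samples t test is used to test the hypothesis of the equality of two population means when independent random samples are drawn from two populations which are normally distributed and have equal population variances. The test statistic

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}},$$

where

$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2},$$

has a t distribution with $n_1 + n_2 - 2$ degrees of freedom.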
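For concreteness, the pooled-variance t statistic can be computed as in the following minimal Python sketch. The function name `pooled_t` and the example data are hypothetical and are not taken from the study; the sketch assumes the standard pooled-variance form of the statistic and returns the statistic together with its degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x1, x2):
    """Return (t, df) for the independent samples t test.

    Assumes equal population variances, so the two sample variances
    are pooled, weighted by their degrees of freedom.
    """
    n1, n2 = len(x1), len(x2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / df
    t = (mean(x1) - mean(x2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


if __name__ == "__main__":
    # Hypothetical data with unequal sample sizes.
    print(pooled_t([1, 2, 3], [2, 4, 6, 8]))
```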


The degree of robustness of the independent samples t test to violations of the assumption of homoscedasticity has been well documented (Boneau, 1960; Glass, Peckham, & Sanders, 1972; Holloway & Dunn, 1967; Hsu, 1938; Scheffe, 1959). In cases where there are unequal population variances, the relationship between the actual Type I error rate ($\tau$) and the nominal Type I error rate ($\alpha$) is influenced by the sample [...] large, $\tau$ and $\alpha$ are near one another. In fact, Scheffe (1959, p. 339) has shown that for equal-sized samples t is asymptotically standard normal, even though the two populations are non-normal or have unequal variances. However, Ramsey (1980) found there are boundary conditions where t is no longer robust to violations of homoscedasticity even with equal-sized samples selected from normal populations. Results from numerous studies (Boneau, 1960; Hsu, 1938; Pratt, 1964; Scheffe, 1959) have shown that when the sample sizes are unequal and the larger sample is selected from the population with larger variance (known as the positive condition), the t test is conservative (that is, $\tau < \alpha$). Conversely, when the larger sample is selected from the population with the smaller variance (known as the negative condition), the t test is liberal (that is, $\tau > \alpha$).
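The direction of these effects is easy to reproduce with a small Monte Carlo experiment. The following Python sketch is only an illustration under stated assumptions and does not reproduce any study cited here: the sample sizes (40 and 10), standard deviations (3 and 1), replication count, and seed are all hypothetical choices, and the critical value 2.0106 is the hard-coded two-sided .05 fractile of t with 48 degrees of freedom.

```python
import random
from math import sqrt


def pooled_t(x1, x2):
    """Independent samples t statistic with pooled variance."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((x - m1) ** 2 for x in x1)
    ss2 = sum((x - m2) ** 2 for x in x2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))


def estimated_tau(n1, sd1, n2, sd2, reps=2000, crit=2.0106, seed=1):
    """Proportion of rejections when both population means are zero.

    `crit` is the two-sided .05 critical value of t with 48 degrees of
    freedom, so the default only applies when n1 + n2 - 2 == 48.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x1 = [rng.gauss(0.0, sd1) for _ in range(n1)]
        x2 = [rng.gauss(0.0, sd2) for _ in range(n2)]
        if abs(pooled_t(x1, x2)) > crit:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    # Positive condition: the larger sample has the larger variance.
    print("positive condition:", estimated_tau(40, 3.0, 10, 1.0))
    # Negative condition: the larger sample has the smaller variance.
    print("negative condition:", estimated_tau(10, 3.0, 40, 1.0))
```

With these settings the estimated rejection rate falls well below .05 in the positive condition and well above it in the negative condition, because the pooled variance is dominated by whichever group contributes more observations.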


Alternatives to the Independent Samples t Test

According to Yao (1965), Behrens (1929) was the first to propose a solution to the problem of testing the equality of two population means without assuming equal population variances. The problem has come to be known as the Behrens-Fisher problem. Fisher (1935, 1939) noted that Behrens's solution could be derived using Fisher's concept of fiducial distributions.








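A number of other tests have been developed to test the hypothesis $H_0\colon \mu_1 = \mu_2$ in situations in which $\sigma_1^2 \neq \sigma_2^2$. Welch (1947) reported several tests in which the test statistic is

$$\frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}.$$

The critical value is different for the various tests.

There are two types of critical values: (a) approximate degrees of freedom (APDF), and (b) series. The APDF critical value (Welch, 1938) is a fractile of Student's t distribution with

$$\frac{\left(\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)^2}{\dfrac{\sigma_1^4}{n_1^2 (n_1 - 1)} + \dfrac{\sigma_2^4}{n_2^2 (n_2 - 1)}}$$

degrees of freedom. In practice, the estimator of the degrees of freedom is obtained by replacing parameters by statistics, that is, $s_i^2$ replaces $\sigma_i^2$ ($i = 1, 2$). In the literature the test using this estimator is referred to as the Welch test.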
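The Welch test described above can be sketched in Python as follows. This is a minimal illustration rather than code from the study: the function name `welch_apdf` and the example data are hypothetical, and the degrees of freedom are the usual estimate obtained by substituting sample variances for population variances.

```python
from math import sqrt
from statistics import mean, variance


def welch_apdf(x1, x2):
    """Return (statistic, estimated df) for the two-sample Welch test."""
    a = variance(x1) / len(x1)   # s_1^2 / n_1
    b = variance(x2) / len(x2)   # s_2^2 / n_2
    statistic = (mean(x1) - mean(x2)) / sqrt(a + b)
    # Approximate degrees of freedom with sample variances plugged in.
    df = (a + b) ** 2 / (a ** 2 / (len(x1) - 1) + b ** 2 / (len(x2) - 1))
    return statistic, df


if __name__ == "__main__":
    # Hypothetical data with unequal sizes and unequal variances.
    print(welch_apdf([1, 2, 3], [2, 4, 6, 8]))
```

The estimated degrees of freedom are generally not an integer; the statistic is referred to Student's t distribution with that fractional value.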


Welch (1947) expressed the series critical value as a function of $s_1^2$, $s_2^2$, and $\alpha$, and developed a series critical value in powers of $(n_i - 1)^{-1}$. The first three terms in the series critical value are shown in Table 1. The zero-order term is simply a fractile of the standard normal distribution; using the zero-order term as the critical value [...]










Table 1

Critical Value Terms for Welch's (1947) Zero-, First-, and Second-Order Series Solutions


Power of
(n1 1) 1 Term


Zero


2
Si)2


z [ 2
4


2

1=1


z [-


Sc
i=1


-1)


i in


3+5z2+Z4 i=l


2
Si
n.
.1


15+32z


+9z4 2=1









[...] whereas the second-order critical value is the sum of all three terms. As the sample sizes decline, there is a greater need for the more complicated critical values.

James (1951) and James (1954) generalized the Welch series solutions to the G-sample case and multivariate cases respectively. Consequently, tests using the series solution are referred to as James's first-order and second-order tests. The zero-order test is often referred to as the asymptotic test. Aspin (1948) reported the third- and fourth-order terms, and investigated, for equal-sized samples, variation in the first- through fourth-order critical values.


Wilcox (1989) proposed a modification to the asymptotic test. The Wilcox statistic


2
s1
2+


where


224-1


n, (n+l1


.i (n1i -1


is asymptotically distributed as a standard normal distribution. Here $\tilde{x}_i$ ($i = 1, 2$) are biased estimators of the population means which result in improved empirical Type I error rates (Wilcox, 1989).


The literature suggests the following conclusions in the two-sample case regarding the control of Type I error rates [...] series tests, Brown-Forsythe test, and Wilcox test: (a) the performance of the Welch test and Brown-Forsythe test is superior to the t test; (b) the Wilcox test and James second-order test are superior to the Welch APDF test; and (c) for most applications in education and the social sciences where data are sampled from normal distributions under heteroscedasticity, the Welch APDF test is adequate.


Scheffe (1970) examined different tests including the Welch APDF test from the standpoint of the Neyman-Pearson school of thought. Scheffe concluded the Welch test, which requires only the easily accessible t-table, is a satisfactory practical solution to the Behrens-Fisher problem.

Wang (1971) examined the Behrens-Fisher test, Welch APDF test, and Welch-Aspin series test (Aspin, 1948; Welch, 1947). Wang found the Welch APDF test to be superior to the Behrens-Fisher test when combining over all the experimental conditions considered. Wang found $|\tau - \alpha|$ was smaller for the Welch-Aspin series test than for the Welch APDF test. Wang noted, however, that the Welch-Aspin series critical values were limited to a select set of sample sizes and nominal Type I error rates. Wang concluded that, in practice, one can just use the usual t-table to carry out the Welch APDF test without much loss of accuracy. However, the Welch APDF test becomes conservative with very long-tailed symmetric distributions (Yuen, 1974).


Wilcox [...] the Wilcox test tended to outperform the Welch test. Moreover, over all conditions, the range of $\tau$ was (.032, .065) for $\alpha = .05$, indicating the Wilcox $H_m$ test may have appropriate Type I error rates under heteroscedasticity and non-normality.

In summary, the independent samples t test is generally acceptable in terms of controlling Type I error rates provided there are sufficiently large equal-sized samples, even when the assumption of homoscedasticity is violated. For unequal-sized samples, however, an alternative that does not assume equal population variances, such as the Wilcox $H_m$ test or the James second-order series test, is preferable.


ANOVA F Test

The ANOVA F is used to test the hypothesis of the equality of G population means when independent random samples are drawn from populations which are normally distributed and have equal population variances. The test statistic

$$F = \frac{\sum_{i=1}^{G} n_i (\bar{x}_i - \bar{x})^2 / (G - 1)}{\sum_{i=1}^{G} (n_i - 1) s_i^2 / (N - G)}$$

has an F distribution with $G - 1$ and $N - G$ degrees of freedom.
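A minimal Python sketch of the ANOVA F statistic follows. The function name `anova_f` and the example data are hypothetical and not taken from the study; the sketch assumes the standard one-way form of the statistic and returns it with its two degrees of freedom.

```python
from statistics import mean, variance


def anova_f(groups):
    """Return (F, df_between, df_within) for a one-way layout."""
    n = [len(g) for g in groups]
    total_n = sum(n)
    g_count = len(groups)
    means = [mean(g) for g in groups]
    grand_mean = sum(ni * mi for ni, mi in zip(n, means)) / total_n

    # Between-group and within-group mean squares.
    ms_between = sum(
        ni * (mi - grand_mean) ** 2 for ni, mi in zip(n, means)
    ) / (g_count - 1)
    ms_within = sum(
        (ni - 1) * variance(g) for ni, g in zip(n, groups)
    ) / (total_n - g_count)
    return ms_between / ms_within, g_count - 1, total_n - g_count


if __name__ == "__main__":
    # Hypothetical data; with two groups F equals the squared pooled t.
    print(anova_f([[1, 2, 3], [2, 4, 6, 8]]))
```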


Numerous studies have shown that the ANOVA F test is not robust to violations of the assumption of homoscedasticity (Clinch & Keselman, 1982; Brown & Forsythe, 1974; Kohr [...] test with one exception. Whereas the independent samples t test is generally robust when large sample sizes are equal, the ANOVA F may not maintain adequate control of Type I error rates even with equal-sized samples if the degree of heteroscedasticity is large (Rogan & Keselman, 1977; Tomarken & Serlin, 1986). In the positive condition the F test is conservative and in the negative condition the F test is liberal (Box, 1954; Clinch & Keselman, 1982; Brown & Forsythe, 1974; Horsnell, 1953; Rogan & Keselman, 1977; Wilcox, 1988).


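Alternatives to the ANOVA F Test

A number of tests have been developed to test the hypothesis $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_G$ in situations in which $\sigma_i^2 \neq \sigma_j^2$ (for at least one pair of $i$ and $j$). Welch (1951) generalized the Welch (1938) APDF solution and proposed the statistic

$$\frac{\sum_{i=1}^{G} w_i (\bar{x}_i - \tilde{x})^2 / (G - 1)}{1 + \dfrac{2(G - 2)}{G^2 - 1} \sum_{i=1}^{G} \dfrac{1}{n_i - 1} \left(1 - \dfrac{w_i}{w}\right)^2},$$

where

$$w = \sum_{i=1}^{G} w_i, \qquad \tilde{x} = \frac{\sum_{i=1}^{G} w_i \bar{x}_i}{w}, \qquad \text{and} \qquad w_i = \frac{n_i}{s_i^2}, \quad i = 1, \ldots, G.$$

The statistic is approximately distributed as F with $G - 1$ and

$$\left[\frac{3}{G^2 - 1} \sum_{i=1}^{G} \frac{1}{n_i - 1} \left(1 - \frac{w_i}{w}\right)^2\right]^{-1}$$

degrees of freedom.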
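The Welch (1951) APDF statistic can be sketched in Python as follows. This is a hedged illustration, not code from the study: the function name `welch_1951` and the example data are hypothetical, and the formulas are assumed to take the standard form, with weights equal to each group's sample size divided by its sample variance.

```python
from statistics import mean, variance


def welch_1951(groups):
    """Return (statistic, df1, approximate df2) for Welch's APDF test."""
    n = [len(g) for g in groups]
    g_count = len(groups)
    means = [mean(g) for g in groups]
    w = [ni / variance(g) for ni, g in zip(n, groups)]   # w_i = n_i / s_i^2
    w_total = sum(w)
    # Variance-weighted grand mean.
    center = sum(wi * mi for wi, mi in zip(w, means)) / w_total
    # Correction term shared by the statistic and its degrees of freedom.
    lam = sum((1 - wi / w_total) ** 2 / (ni - 1) for wi, ni in zip(w, n))

    numerator = sum(
        wi * (mi - center) ** 2 for wi, mi in zip(w, means)
    ) / (g_count - 1)
    denominator = 1 + 2 * (g_count - 2) / (g_count ** 2 - 1) * lam
    df2 = (g_count ** 2 - 1) / (3 * lam)
    return numerator / denominator, g_count - 1, df2


if __name__ == "__main__":
    # Hypothetical data for three groups with unequal sizes and variances.
    print(welch_1951([[1, 2, 3], [2, 4, 6, 8], [0, 1, 5]]))
```

Groups with small variances receive large weights, so the weighted grand mean is pulled toward the most precisely estimated group means.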


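James (1951) generalized the Welch (1947) series solutions, proposing the test statistic

$$\sum_{i=1}^{G} w_i (\bar{x}_i - \tilde{x})^2,$$

where $w_i = n_i / s_i^2$, $\tilde{x} = \sum_{i=1}^{G} w_i \bar{x}_i / w$, and $w = \sum_{i=1}^{G} w_i$. [...] freedom. If sample sizes are not sufficiently large, however, the distribution of the test statistic may not be accurately approximated by a chi-square distribution with $G - 1$ degrees of freedom. James (1951) derived a series expression $2h(\alpha)$, which is a function of the sample variances, such that the probability that the test statistic exceeds $2h(\alpha)$ is $\alpha$. James found approximations to $2h(\alpha)$ of orders $1/f_i$ and $1/f_i^2$, where $f_i = n_i - 1$. For the first-order test, James found to order $1/f_i$ the critical value

$$2h(\alpha) = \chi^2_{G-1;\alpha} \left[1 + \frac{3\chi^2_{G-1;\alpha} + G + 1}{2(G^2 - 1)} \sum_{i=1}^{G} \frac{1}{f_i} \left(1 - \frac{w_i}{w}\right)^2\right].$$

The null hypothesis is rejected in favor of the alternative hypothesis if the test statistic exceeds $2h(\alpha)$. James provided a second-order solution which approximates $2h(\alpha)$ to order $1/f_i^2$. James noted that the second-order test is very computationally intensive.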


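Brown and Forsythe (1974) proposed the test statistic

$$F^* = \frac{\sum_{i=1}^{G} n_i (\bar{x}_i - \bar{x})^2}{\sum_{i=1}^{G} \left(1 - \dfrac{n_i}{N}\right) s_i^2}.$$

The statistic is approximately distributed as F with $G - 1$ and

$$f = \left[\sum_{i=1}^{G} \frac{c_i^2}{n_i - 1}\right]^{-1}, \qquad c_i = \frac{\left(1 - \dfrac{n_i}{N}\right) s_i^2}{\sum_{j=1}^{G} \left(1 - \dfrac{n_j}{N}\right) s_j^2},$$

degrees of freedom. In the case of two groups, both the Brown-Forsythe test and the Welch (1951) APDF test are equivalent to the Welch (1938) APDF test.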
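The two-group equivalence can be checked numerically. The Python sketch below is an illustration only: the three function names and the example data are hypothetical, and each function implements the standard form of the corresponding statistic and its approximate denominator degrees of freedom. For any two samples the three returned pairs should agree up to rounding error, with the two-sample statistic squared.

```python
from math import sqrt
from statistics import mean, variance


def welch_1938(x1, x2):
    """Squared two-sample Welch statistic and its approximate df."""
    a = variance(x1) / len(x1)
    b = variance(x2) / len(x2)
    t = (mean(x1) - mean(x2)) / sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (len(x1) - 1) + b ** 2 / (len(x2) - 1))
    return t ** 2, df


def brown_forsythe(groups):
    """Brown-Forsythe F* and its approximate denominator df."""
    n = [len(g) for g in groups]
    total_n = sum(n)
    m = [mean(g) for g in groups]
    grand = sum(ni * mi for ni, mi in zip(n, m)) / total_n
    parts = [(1 - ni / total_n) * variance(g) for ni, g in zip(n, groups)]
    denom = sum(parts)
    stat = sum(ni * (mi - grand) ** 2 for ni, mi in zip(n, m)) / denom
    df = 1 / sum((p / denom) ** 2 / (ni - 1) for p, ni in zip(parts, n))
    return stat, df


def welch_1951(groups):
    """Welch (1951) APDF statistic and its approximate denominator df."""
    n = [len(g) for g in groups]
    g_count = len(groups)
    m = [mean(g) for g in groups]
    w = [ni / variance(g) for ni, g in zip(n, groups)]
    w_total = sum(w)
    center = sum(wi * mi for wi, mi in zip(w, m)) / w_total
    lam = sum((1 - wi / w_total) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    top = sum(wi * (mi - center) ** 2 for wi, mi in zip(w, m)) / (g_count - 1)
    bottom = 1 + 2 * (g_count - 2) / (g_count ** 2 - 1) * lam
    return top / bottom, (g_count ** 2 - 1) / (3 * lam)


if __name__ == "__main__":
    x1, x2 = [1.0, 2.5, 3.0, 4.5], [2.0, 7.0, 9.0]   # hypothetical data
    print(welch_1938(x1, x2))
    print(brown_forsythe([x1, x2]))
    print(welch_1951([x1, x2]))
```

With three or more groups the statistics no longer coincide, which is why their relative performance has to be compared by simulation.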


Wilcox (1989) proposed the statistic


- )


-
i11


where


G

2=1


ni (n,+l1)


n, (n,+1)


i= G


G
2=1


w
W


The statistic $H_m$ is approximately distributed as chi-square with $G - 1$ degrees of freedom.


The literature suggests the following conclusions about [...] Brown-Forsythe, and Wilcox tests: (a) the performance of each of these alternatives to ANOVA F is superior to F; (b) the Welch test outperforms the James first-order test; (c) the Welch and Brown-Forsythe tests are generally competitive with one another, however, the Welch test is preferred with data sampled from normal distributions while the Brown-Forsythe test is preferred with data sampled from skewed distributions; and (d) the Wilcox and James second-order tests outperform these other alternatives to ANOVA F under the greatest variety of conditions.


Brown and Forsythe (1974) used Monte Carlo techniques to examine the ANOVA F, Brown-Forsythe $F^*$, Welch APDF, and James zero-order procedures when (a) equal and unequal-sized samples were selected from normal populations; (b) G was [...] or 10; (c) the ratio of the largest to the smallest sample size was [...]; (d) the ratio of the largest to smallest standard deviation was [...]; and (e) total sample size ranged between [...]. For small sample sizes the critical value proposed by Welch is a better approximation to the true critical value than that proposed by James. Both the Welch APDF test and Brown-Forsythe test have $\tau$ near $\alpha$ under the inequality of variances.

Kohr and Games (1974) examined the ANOVA F test, Box test, and Welch APDF test when (a) equal and unequal [...] 1.5, or 2; (d) the ratio of the largest to the smallest standard deviation was [...], $\sqrt{10}$, or $\sqrt{13}$; and (e) total sample size ranged between [...] and [...]. The best control of Type I error rates was demonstrated by the Welch APDF test. Kohr and Games concluded the Welch test may be used with confidence with the unequal-sized samples and heteroscedastic conditions examined in their study. Kohr and Games concluded the Welch test was slightly liberal under heteroscedastic conditions; however, this bias was trivial compared to the inflated error rates for the F test and Box test under comparable conditions.


Levy (1978) examined the Welch test when data were sampled from either the uniform, chi-square, or exponential distributions and found that under heteroscedasticity, the Welch test can be liberal.

Dijkstra and Werter (1981) compared the James second-order, Welch APDF, and Brown-Forsythe tests when (a) equal and unequal-sized samples were selected from normal populations; (b) G was [...]; (c) the ratio of largest to smallest sample size was [...]; (d) total sample size ranged between 12 and [...]; and (e) the ratio of the largest to the smallest standard deviation was [...] or 3. Dijkstra and Werter concluded the James second-order test gave better control of Type I error rates than either the Brown-Forsythe $F^*$ or Welch APDF test.

Clinch and Keselman (1982) studied the ANOVA F, Welch APDF, [...] when (a) equal and unequal-sized samples were selected from normal distributions, chi-square distributions with [...] degrees of freedom, or t distributions with five degrees of freedom; (b) G was [...]; (c) the ratio of largest to smallest sample size was [...] or 3; (d) total sample size was [...] 144; and (e) variances were either homoscedastic or heteroscedastic. The ANOVA F test was most affected by assumption violations. The Type I error rates of the Welch test were above $\alpha$, especially in the negative case. The $F^*$ test provided the best Type I error control in that it generally only became nonrobust with extreme heteroscedasticity. Although both the Brown-Forsythe test and Welch test were liberal with skewed distributions, the tendency was stronger for the Welch test.


Tomarken and Serlin (1986) examined tests including the ANOVA F test, Brown-Forsythe test, and Welch APDF test when (a) equal and unequal-sized samples were selected from normal populations; (b) G was [...]; (c) the ratio of largest to the smallest sample size was [...]; (d) total sample size ranged between 36 and [...]; and (e) the ratio of the largest to smallest standard deviation was [...], 6, or [...]. Tomarken and Serlin found that the Brown-Forsythe $F^*$ test, though generally acceptable, was at least slightly liberal whether sample sizes were equal or directly or inversely [...]









Wilcox, Charlin, and Thompson (1986) examined Monte Carlo results on the robustness of the ANOVA, the Brown-Forsythe, and the Welch APDF tests when equal- and unequal-sized samples were selected from normal populations;


G was


or 6;


ratio


of the


largest


to the


smallest


sample


was


, 3


, 3.3


total


sample


size


ranged


between


smallest


and


standard


and


deviation


the


was


ratio


or 4.


the


Wilcox


largest


, Charlin,


Thompson


gave


practical


situations


where


both


the


Welch


and F*


tests


may


not


provide


adequate


control


over


Type


error


rates.


Welch


unequal


For


test


equal


should


-sized


variances


but


be avoided


samples


and


unequal


favor


possibly


samples


the


unequal


test


, the


but


variances


the


Welch


test


was


preferred


the


test.


Wilcox (1988) proposed the H test, a competitor to the Brown-Forsythe, Welch APDF, and James second-order tests. Simulated equal- and unequal-sized samples were selected where distributions were either normal, light-tailed symmetric, heavy-tailed symmetric, medium-tailed asymmetric, or exponential-like;


was


, or 10;


the


ratio


of the


largest


smallest


sample


size


was


, or


total


the


ratio


sample


size


largest


ranged


the


between


smallest


and


100;


standard


deviation


was


, 4,


, or 9


These


simulations


indicated


that


under


..









than


did


the


test


or Welch


APDF


test.


Wilcox


showed


that,


under


have


normality


r much


, James'


closer


second


than


-order


the


test


Welch


Wilcox'


Brown


test


-Forsythe


tests


The


Wilcox


gave


conservative


results


provided


(i=l


. ,G)


Wilcox' results indicate that the H procedure has a Type I error rate similar to that of James' second-order method, regardless of the degree of heteroscedasticity. Although it is computationally more tedious, Wilcox recommended James' second-order procedure for general use.

Wilcox (1989) proposed $H_m$, an improvement on Wilcox' (1988) H method, designed to be more comparable in power to James' second-order test.


Wilcox


compared


James'


second-


order


test


with


when


data


were


sampled


from


normal


populations


was


or 6


ratio


of the largest


small


sample


size


was


, or


total


sample


size


ranged


between


121;


and


ratio


largest


Wilcox'


to the


results


smallest


indicate


standard


that


deviation


when


was


applied


or 6


normal


heteroscedasti


data,


has


T near


a and


slightly


ess


power


than


James'


second


-order


test.


The main advantage of the improved Wilcox procedure is that it is much easier to use than James' second-order test and is easily extended to higher-way designs.


Oshima


and


Algina


press)


studied


Type


error


rates


-A- *-----1 -a--- .-


.._LL


L~ 1


r ....


F L









These


conditions


were


obtained


crossing


the


31 conditions


defined


sample


sizes


and


standard


deviations


Wilcox


(1988)


study


with


five


distributions--normal,


uniform,


beta(1.5,8


, and


exponential.


The James second-order test and the Wilcox $H_m$ test were both affected by non-normality. When samples were selected from symmetric non-normal distributions, both James' second-order test and Wilcox' $H_m$ test maintained $\tau$ near $\alpha$. When the tests were applied to data sampled from asymmetric distributions, $|\tau - \alpha|$ increased. Further, as the degree of asymmetry increased, $|\tau - \alpha|$ tended to increase. The Brown-Forsythe test outperformed the Wilcox $H_m$ test and James' second-order test under some conditions; however, the reverse held under other conditions. Oshima and Algina concluded that the Wilcox $H_m$ test and James' second-order test were preferable to the Brown-Forsythe test; James' second-order test was recommended for data sampled from a symmetric distribution, and Wilcox' $H_m$ test was recommended for data sampled from a moderately skewed distribution.


In summary, when data are sampled from a normal distribution, the Wilcox test and James' second-order test have better control of Type I error rates, particularly as the degree of heteroscedasticity gets large. All of these alternatives to the ANOVA are affected by skewed data.


t(5)








Hotelling's $T^2$ Test

Hotelling's (1931) $T^2$ is a test of the equality of two population mean vectors when independent random samples are selected from populations that are distributed multivariate normal and have equal dispersion matrices.


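The test statistic is given by

$$T^2 = \frac{n_1 n_2}{n_1 + n_2}\,(\bar{x}_1 - \bar{x}_2)'\,S^{-1}\,(\bar{x}_1 - \bar{x}_2),$$

where

$$S = \frac{(n_1 - 1)S_1 + (n_2 - 1)S_2}{n_1 + n_2 - 2}.$$

Hotelling demonstrated that the transformation

$$F = \frac{n_1 + n_2 - p - 1}{p\,(n_1 + n_2 - 2)}\,T^2$$

has an F distribution with $p$ and $n_1 + n_2 - p - 1$ degrees of freedom.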
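The two formulas above can be sketched in Python as follows. This is only an illustrative sketch, assuming NumPy is available; the function name `hotelling_t2` and its return layout are hypothetical and are not taken from the study.

```python
import numpy as np


def hotelling_t2(x1, x2):
    """Two-sample Hotelling T^2 with its F transformation.

    x1, x2: arrays of shape (n_i, p), one row per observation.
    Returns (T^2, F, (df1, df2)).
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, p = x1.shape
    n2 = x2.shape[0]

    # Difference of the sample mean vectors.
    d = x1.mean(axis=0) - x2.mean(axis=0)

    # Pooled dispersion matrix S (assumes equal population dispersion).
    s1 = np.atleast_2d(np.cov(x1, rowvar=False))
    s2 = np.atleast_2d(np.cov(x2, rowvar=False))
    s = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)

    t2 = (n1 * n2) / (n1 + n2) * (d @ np.linalg.solve(s, d))
    f = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return float(t2), float(f), (p, n1 + n2 - p - 1)
```

With a single dependent variable (p = 1), $T^2$ reduces to the square of the pooled-variance t statistic, which gives a convenient check on the implementation.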


The sensitivity of Hotelling's $T^2$ to violations of the assumption of homoscedasticity is well documented and has been investigated both analytically (Ito & Schull, 1964) and empirically (Algina & Oshima, 1990; Hakstian, Roed, & Lind, 1979; Holloway & Dunn, 1967; Hopkins & Clay, 1963).


Ito and Schull (1964) investigated the large-sample properties of $T^2$ in the presence of unequal dispersion matrices.


Schull


showed


that


in the


case


two


very


large


equal


sized


samples


well


behaved


even


when


dispersion


gC


n,+n,


of T2


r T2









inequality


dispersion


matrices


provided


the


samples


are


very


large.


However, if the two samples are of unequal size, quite a large effect occurs on the level of significance from even moderate variations.


Ito and Schull indicated that, asymptotically, with fixed $n_1/(n_1 + n_2)$ and equal eigenvalues of $\Sigma_2\Sigma_1^{-1}$, $\tau < \alpha$ when the eigenvalues are greater than one and $\tau > \alpha$ when the eigenvalues are less than one.


Hopkins


Clay


(1963)


examined


stributions


Hotelling'


with


sample


sizes


, 10


, and


selected


from


either


bivariate


normal


populations


with


zero


means,


dispersion


matri


ces


the


form


aI -


where


a,/01


was


circular


bivariate


symmetrical


leptokurtic


populations


with


zero


means


, equal


variances,


was


Hopkins and Clay reported that $T^2$ was robust to violations of homoscedasticity when $n_1 = n_2$ but that this robustness does not extend to disparate sample sizes.


Hopkins


Clay


reported


that


upper


tail


frequencies


distribution


Hotelling'


are


substantially


affected


moderate


degrees


symmetrical


leptokurtosis.


Holloway and Dunn (1967) examined the robustness of Hotelling's $T^2$ to violations of the homoscedasticity assumption when equal- and unequal-sized samples were selected from multivariate normal distributions;


was


, 1


,,


__


*


L









eigenvalues


s2;-'I


were


Holloway and Dunn found that equal-sized samples help in keeping $\tau$ close to $\alpha$.


Further


Holloway


and


Dunn


found


that


large


equal


-sized


samples


control


Type


error


rates


depends


number


dependent


variable


example


, when


i = 50


(i=l1


the


and

and


eigenvalues


but


r markedly


of S2Z,


departs


= 10,

from


T is near


a when


for

or p


= 10


Holloway and Dunn found that, generally, as the number of dependent variables increases or as the sample size decreases, $\tau$ increases.


Hakstian, Roed, and Lind (1979) obtained empirical sampling distributions of Hotelling's $T^2$ when equal- and unequal-sized samples were selected from multivariate normal populations;


was


or 10;


(n1+n2)


was


or 10;


was


or 5


dispersion


matrices


were


form


where


was


d2I,


diag( 1


S. d2, d2


= 1


I...,


, or


Hakstian, Roed, and Lind found that for equal-sized samples the $T^2$ procedure is generally robust. With unequal-sized samples, $T^2$ was shown to become increasingly less robust as dispersion heteroscedasticity and the number of dependent variables increase. Consequently, Hakstian, Roed, and Lind argued against the use of $T^2$ in the negative condition and for cautious use in the positive condition.


n1/ n2


r








number


of dependent


variables


was


or 20;


and


the


majority


conditions


= d2Z1


3.0) .


Algina


Oshima


found


that


even


with


a small


sample


size


ratio


example,


procedure


with


can


and


be seriously


.25S1,


sample


nonrobust


size


For


ratio


small


Algina


1.1:1


and


can


Oshima


produce

also


unacceptable


confirmed


Type


earlier


error


findings


rates.

that


Hotelling'


test


became


ess


robust


the


number


dependent


variable


and


degree


heteroscedasti


city


increased.


In summary, Hotelling's $T^2$ test is not robust to violations of the assumption of homoscedasticity even when there are equal-sized samples, especially if the ratio of total sample size to number of dependent variables is small. When the larger sample is selected from the population with the larger dispersion matrix, $\tau < \alpha$. When the larger sample is selected from the population with the smaller dispersion matrix, $\tau > \alpha$. These tendencies increase with the inequality in the size of the two samples, the degree of heteroscedasticity, and the number of dependent variables.


Therefore, the behavior of Hotelling's $T^2$ test and that of the independent samples t test are similar under violations of the assumption of homoscedasticity. Hence, it is desirable to examine robust alternatives that do not require this basic assumption.









Alternatives to Hotelling's $T^2$ Test

A number of tests have been developed to test the hypothesis $H_0\colon \mu_1 = \mu_2$ in a situation in which $\Sigma_1 \neq \Sigma_2$.


Alternatives to the Hotelling $T^2$ procedure that do not assume equality of the two population dispersion matrices include James' (1954) first- and second-order tests, Yao's (1965) test, and Johansen's (1980) test. Differing only in their critical values, all four tests use the test statistic

$$(\bar{x}_1 - \bar{x}_2)'\left(\frac{S_1}{n_1} + \frac{S_2}{n_2}\right)^{-1}(\bar{x}_1 - \bar{x}_2),$$

where $\bar{x}_i$ and $S_i$ are, respectively, the sample mean vector and sample dispersion matrix for the $i$th sample ($i = 1, 2$).


The literature suggests the following conclusions about control of Type I error rates under heteroscedastic conditions for Hotelling's $T^2$ test, James' first- and second-order tests, Yao's test, and Johansen's test: (a) Yao's test, James' second-order test, and Johansen's test are superior to James' first-order test; (b) these alternatives to Hotelling's $T^2$ are sensitive to data sampled from skewed populations.


Yao (1965) conducted a Monte Carlo study to compare Type I error rates between the James first-order test and Yao's test when equal- and unequal-sized samples were selected,


was


, (c)


ratio


total


sample


size


to number









were


unequal.


Although both procedures have $\tau$ near $\alpha$ under heteroscedasticity, Yao's test was superior to James' test.


Algina and Tang (1988) examined the performance of Hotelling's $T^2$, James' first-order test, and Yao's test


when


was


of the


or 10;


largest


N:p


smallest


, 10


was


sample


or 20;


was


ratio


, 1.25


and


dispersion


matri


ces


were


form


and


where


was


diag{3,1,1


..., 1)


, diag{3,


. a


...,1)


diag{1/3,3,3


S.. .,3)


or


diag{ 1/3,1/3,


,3,3,S


.,3}


Algina and Tang confirmed the superiority of Yao's test.


Yao'


test


produced


appropriate


Type


error


rates


when


, and


For


appropriate


error


rates


occurred


when


applied


both


specific


cases


where


one


dispersion


matrix


was


multiple


the


second


d2ES)


and


more


complex


cases


of heteroscedasticity


When


N:p


and


, Algina


Tang


found


Yao'


test


to be


liberal


Algina, Oshima, and Tang (1991) studied Type I error rates of James' first- and second-order, Yao's, and Johansen's tests in various conditions defined by the degree of heteroscedasticity and non-normality


(uniform,


Laplace,


beta(5


exponential


, and


lognormal


distributions)


The


study


indicated


ese


four


alternatives


to Hotelling'


, 4,


, or


1: n2


t(5)


115),












positive


kurtosis.


Although the four procedures were seriously nonrobust with exponential and lognormal distributions, they were fairly robust with the remaining distributions. The performance of Yao's test, James' second-order test, and Johansen's test was slightly superior to the performance of James' first-order test.


Algina,


Oshima


, and


Tang


indicate


that


test


also


sensitive


to skewn


ess.


In summary, Yao's test, James' second-order test, and Johansen's test work reasonably well under normality. Although all of these alternatives to Hotelling's $T^2$ test have elevated Type I error rates with skewed data, Johansen's test has the practical advantages of generalizing to G groups and being relatively easy to compute.


MANOVA


Criteria


The four basic multivariate analysis of variance (MANOVA) criteria are used to test the equality of G population mean vectors when independent random samples are selected from populations which are distributed multivariate normal and have equal dispersion matrices.


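Define

$$H = \sum_{i=1}^{G} n_i(\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})'$$

and

$$E = \sum_{i=1}^{G} (n_i - 1)S_i .$$

The basic MANOVA criteria are functions of the eigenvalues of $HE^{-1}$. Define $\lambda_i$ to be the $i$th eigenvalue of $HE^{-1}$ ($i = 1, \ldots, s$), where $s = \min(p, G-1)$ and $\lambda_1$ is the largest eigenvalue. Those criteria are Roy's (1945) largest root criterion,

$$R = \frac{\lambda_1}{1 + \lambda_1};$$

the Hotelling-Lawley trace criterion (Hotelling, 1951; Lawley, 1938),

$$U = \operatorname{trace}(HE^{-1}) = \sum_{i=1}^{s} \lambda_i;$$

the Pillai-Bartlett trace criterion (Pillai, 1955; Bartlett, 1939),

$$V = \operatorname{trace}[H(H+E)^{-1}] = \sum_{i=1}^{s} \frac{\lambda_i}{1 + \lambda_i};$$

and Wilks' (1932) likelihood ratio criterion,

$$L = \frac{|E|}{|H+E|} = \prod_{i=1}^{s} \frac{1}{1 + \lambda_i}.$$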
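The four criteria can be computed directly from the eigenvalues of $HE^{-1}$. The following is a minimal sketch, assuming NumPy; the function name `manova_criteria` is hypothetical, and the matrices H and E are assumed to have been formed already as defined above.

```python
import numpy as np


def manova_criteria(h_mat, e_mat):
    """Return (R, U, V, L) from the hypothesis matrix H and error matrix E."""
    h_mat = np.asarray(h_mat, dtype=float)
    e_mat = np.asarray(e_mat, dtype=float)

    # Eigenvalues of H E^{-1}; they are real and non-negative in theory,
    # so any tiny imaginary or negative parts are rounding noise.
    lam = np.linalg.eigvals(h_mat @ np.linalg.inv(e_mat)).real
    lam = np.clip(lam, 0.0, None)

    roy = lam.max() / (1.0 + lam.max())       # Roy's largest root
    hotelling_lawley = lam.sum()              # U
    pillai_bartlett = (lam / (1.0 + lam)).sum()  # V
    wilks = np.prod(1.0 / (1.0 + lam))        # L
    return float(roy), float(hotelling_lawley), float(pillai_bartlett), float(wilks)
```

Zero eigenvalues beyond the first s contribute nothing to the sums and a factor of one to the product, so using all p eigenvalues gives the same values as using only the s nonzero ones.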


Both analytic (Pillai & Sudjana, 1975) and empirical (Korin, 1972; Olson, 1974) investigations have been conducted on the robustness of MANOVA criteria with respect to violations of homoscedasticity. Pillai and Sudjana (1975) examined violations of homoscedasticity on the four basic MANOVA criteria.


Although


the


generalizability


the


study


- a -


IS)


ft f


I .


1 I..


m









heteroscedasticity,


results were consistent--modest departures from $\alpha$ for minor degrees of heteroscedasticity and more pronounced departures with greater heteroscedasticity.


Korin (1972) studied Roy's largest root criterion, the Hotelling-Lawley trace criterion, and Wilks' likelihood ratio criterion when equal- and unequal-sized samples were selected from normal populations;


p was


or 4;


G was


or 6;


the


ratio of total


sample


size


to number of dependent variables was 8.25,


, 15.


, 18 or


dispersion matrices were of the form I or D,


where


was


2d2I


1.5


10) .


For small samples, even when the sample sizes were equal, dispersion heteroscedasticity produced Type I error rates greater than $\alpha$. Korin reported that the error rates for R were greater than those for U and L.


Olson (1974) conducted a Monte Carlo study on the comparative robustness of six multivariate tests, including the four basic MANOVA criteria, when equal-sized samples were selected


; (b)


p was


or 10; (c


G was


was


dispersion


matrices


were


form


where D represented either a low or high degree of contamination. For the low degree of contamination, $D = d^2 I$, whereas for the high degree of contamination, $D = \operatorname{diag}(pd^2 - p + 1, 1, 1, \ldots, 1)$


= 2,


, 10









R should be avoided, while V may be recommended as the most robust of the MANOVA tests.


In terms


of the


magnitude


of the


departure


of r from


tendency


order


increased


the


was


typically


degree


hetero


> V.


scedasticity


increased.

increased


The

with


departure


from

the


increase


number


dependent


variable


, however,


the


impact


was


well


defined.


Additionally


, for


, and


7 decreased as


sample


increased


except


when


When


, 7 increased


four


basi


MANOVA


procedures


, although


the


increase


was


least


for


Stevens (1979) contested Olson's (1976) claim that V is superior to L and U for general use in multivariate analysis of variance because of greater robustness against unequal dispersion matrices. Stevens believed Olson's conclusions were tainted by using an example which had extreme subgroup variance differences, which occur very infrequently in practice. Stevens conceded V was the clear choice for diffuse structures; however, for concentrated noncentrality structures with dispersion heteroscedasticity, the actual Type I error rates for V, U, and L are very similar.


Olson (1979) refuted Stevens' (1979) objections on practical grounds.


experimenter,


faced


with


real


data


unknown


noncentrality


and


trying


follow


Stevens


recommendation


use









Alternatives to the MANOVA Criteria

A number of tests have been developed to test the hypothesis $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_G$ in a situation in which $\Sigma_i \neq \Sigma_j$ (for at least one pair $i \neq j$).
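James (1954) generalized James' (1951) series solutions and proposed the statistic

$$J = \sum_{i=1}^{G} (\bar{x}_i - \tilde{x})'\,W_i\,(\bar{x}_i - \tilde{x}),$$

where

$$W_i = n_i S_i^{-1}, \qquad W = \sum_{i=1}^{G} W_i, \qquad \tilde{x} = W^{-1}\sum_{i=1}^{G} W_i \bar{x}_i .$$

James (1954) gave zero-, first-, and second-order critical values paralleling those developed by James (1951).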


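Johansen (1980) generalized the Welch (1951) test and proposed using the James (1954) test statistic divided by

$$c = p(G-1) + 2A - \frac{6A}{p(G-1) + 2},$$

where

$$A = \sum_{i=1}^{G} \frac{\operatorname{trace}\left[(I - W^{-1}W_i)^2\right] + \left[\operatorname{trace}(I - W^{-1}W_i)\right]^2}{2(n_i - 1)} .$$

The critical value for the Johansen test is a fractile of an F distribution with $p(G-1)$ and $p(G-1)[p(G-1)+2]/(3A)$ degrees of freedom.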
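As a hedged illustration of the computation just described, the sketch below evaluates the James statistic, the constant A, the divisor c, and the two degrees of freedom. It assumes NumPy and the weight definition $W_i = n_i S_i^{-1}$; the function name `johansen_test` is hypothetical.

```python
import numpy as np


def johansen_test(groups):
    """Johansen-type statistic for G groups of shape (n_i, p).

    Returns (statistic, df1, df2); the statistic is referred to an
    F distribution with df1 and df2 degrees of freedom.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    p = groups[0].shape[1]
    k = p * (len(groups) - 1)                      # p(G - 1)
    n = [g.shape[0] for g in groups]
    means = [g.mean(axis=0) for g in groups]
    covs = [np.atleast_2d(np.cov(g, rowvar=False)) for g in groups]

    # Weights W_i = n_i S_i^{-1} and the weighted grand mean.
    w = [ni * np.linalg.inv(si) for ni, si in zip(n, covs)]
    w_sum = sum(w)
    center = np.linalg.solve(w_sum, sum(wi @ mi for wi, mi in zip(w, means)))

    # James (1954) statistic.
    j_stat = sum((mi - center) @ wi @ (mi - center) for wi, mi in zip(w, means))

    # Constant A built from I - W^{-1} W_i.
    a = 0.0
    for ni, wi in zip(n, w):
        q = np.eye(p) - np.linalg.solve(w_sum, wi)
        a += (np.trace(q @ q) + np.trace(q) ** 2) / (2 * (ni - 1))

    c = k + 2 * a - 6 * a / (k + 2)
    return float(j_stat / c), k, float(k * (k + 2) / (3 * a))
```

For p = 1 and G = 2 the result coincides with the squared Welch t statistic and its Satterthwaite degrees of freedom, which is a useful sanity check.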


The literature suggests the following conclusions about control of Type I error rates when sampling from multivariate normal populations under heteroscedastic conditions for the four basic MANOVA criteria, James' first- and second-order tests, and Johansen's test: (a) the Pillai-Bartlett trace criterion is the most robust of the four basic MANOVA criteria; (b) with unequal-sized samples, Johansen's test and James' second-order test outperform the Pillai-Bartlett trace criterion and James' first-order test.


Ito (1969) analytically examined Type I error rates for James' zero-order test and showed that $|\tau - \alpha|$ increased as the variation in the sample sizes, the degree of heteroscedasticity, and the number of dependent variables increased, whereas $|\tau - \alpha|$ decreased as the total sample size increased.


Tang (1989) studied James' first- and second-order tests, the Pillai-Bartlett trace criterion, and Johansen's test when equal- and unequal-sized samples were selected from multivariate normal populations;


was


or 6;


was


-rw1


-1)+


3/ (3A








number


dependent


variabi


was


dispersion


matri


ces


were


either


form


or D


, where


was


, diag{(l


,d2,d2)


or diag{ 1/d2


,dd2,d2}


for p=3


or D was d'I


diag(l,1,1,d2


,d2)


or diag( 1/d2


,1/d2


,1/d2


or 3).


Results of the study indicate that when sample sizes are unequal and dispersion matrices are unequal, Johansen's test and James' second-order test perform better than the Pillai-Bartlett trace criterion and James' first-order test. While both Johansen's test and James' second-order test tended to have Type I error rates reasonably near $\alpha$, Johansen's test was slightly liberal whereas James' second-order test was slightly conservative. Additionally, the ratio of total sample size to number of dependent variables has a strong impact on the performance of the tests. Generally, as N:p increases the test becomes more robust.


In summary, the Pillai-Bartlett trace criterion appears to be the most robust of the four basic MANOVA criteria to violations of the assumption of dispersion homoscedasticity. In controlling Type I error rates, the Johansen test and James' second-order test are more effective than either the Pillai-Bartlett trace criterion or James' first-order test. Finally, Johansen's test has the practical advantage of being less computationally intensive than James' second-order test.













CHAPTER


METHODOLOGY


In this chapter, the development of the test statistics, the design, and the simulation procedure are described. The test statistics extend the work of Brown and Forsythe (1974) and Wilcox (1989). The design is based upon a review of relevant literature and upon the consideration that the experimental conditions used in the simulation should be similar to those found in educational research.


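Development of Test Statistics

Brown-Forsythe Generalizations

To test $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_G$, Brown and Forsythe (1974) proposed the statistic

$$F^* = \frac{\sum_{i=1}^{G} n_i(\bar{x}_i - \bar{x})^2}{\sum_{i=1}^{G} (1 - n_i/N)\,s_i^2} .$$

The statistic $F^*$ is approximately distributed as F with $G - 1$ and $f$ degrees of freedom, where

$$f = \left[\sum_{i=1}^{G} \frac{c_i^2}{n_i - 1}\right]^{-1}, \qquad c_i = \frac{(1 - n_i/N)\,s_i^2}{\sum_{j=1}^{G} (1 - n_j/N)\,s_j^2} .$$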
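The univariate Brown-Forsythe statistic and its approximate denominator degrees of freedom can be sketched with the Python standard library alone. The function name `brown_forsythe` is a hypothetical choice for illustration; the formulas are the standard Brown and Forsythe (1974) ones, with the Satterthwaite-type approximation for f.

```python
from statistics import mean, variance


def brown_forsythe(groups):
    """Brown-Forsythe F* for a list of samples.

    Returns (F*, numerator df, approximate denominator df f).
    """
    n = [len(g) for g in groups]
    total = sum(n)
    means = [mean(g) for g in groups]
    variances = [variance(g) for g in groups]   # unbiased sample variances

    # Grand mean and between-group sum of squares (numerator).
    grand = sum(ni * mi for ni, mi in zip(n, means)) / total
    numerator = sum(ni * (mi - grand) ** 2 for ni, mi in zip(n, means))

    # Denominator: sum of (1 - n_i/N) s_i^2.
    terms = [(1 - ni / total) * vi for ni, vi in zip(n, variances)]
    denominator = sum(terms)

    # Satterthwaite-type approximation for the denominator df.
    c = [t / denominator for t in terms]
    f = 1.0 / sum(ci ** 2 / (ni - 1) for ci, ni in zip(c, n))

    return numerator / denominator, len(groups) - 1, f
```

Because the weights $1 - n_i/N$ sum to $G - 1$, numerator and denominator have the same expectation under the null hypothesis, which is the property the multivariate generalization is designed to preserve.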








Suppose $\bar{x}_1, \ldots, \bar{x}_G$ are p-dimensional sample mean vectors and $S_1, \ldots, S_G$ are p-dimensional dispersion matrices of independent random samples of sizes $n_1, \ldots, n_G$, respectively, from G multivariate normal distributions $N_p(\mu_1, \Sigma_1), \ldots, N_p(\mu_G, \Sigma_G)$. To extend the Brown-Forsythe statistic to the multivariate setting, replace means by their corresponding mean vectors and replace variances by their corresponding dispersion matrices.


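Define

$$H = \sum_{i=1}^{G} n_i(\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})'$$

and

$$M = \sum_{i=1}^{G} \left(1 - \frac{n_i}{N}\right) S_i .$$

The $S_i$ ($i = 1, \ldots, G$) are distributed independently as scaled Wishart matrices, $(n_i - 1)S_i \sim W_p(n_i - 1, \Sigma_i)$, and $M$ is said to have a sum of Wisharts distribution, denoted as

$$M \sim SW\!\left(n_1 - 1, \ldots, n_G - 1;\ (1 - n_1/N)\Sigma_1, \ldots, (1 - n_G/N)\Sigma_G\right).$$

Nel and van der Merwe (1986) have generalized Satterthwaite's (1946) results and approximated the sum of Wisharts distribution by a Wishart distribution with $f$ degrees of freedom. Applying the Nel and van der Merwe results to $M$, the quantity $f$ is the approximate degrees of freedom of $M$ and is given by

$$f = \frac{\operatorname{trace}\left[\left(\sum_{i=1}^{G} c_i S_i\right)^2\right] + \left[\operatorname{trace}\left(\sum_{i=1}^{G} c_i S_i\right)\right]^2}{\sum_{i=1}^{G} (n_i - 1)^{-1}\left\{\operatorname{trace}\left[(c_i S_i)^2\right] + \left[\operatorname{trace}(c_i S_i)\right]^2\right\}},$$

where $c_i = 1 - n_i/N$.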


The problem is to construct test statistics and determine critical values. The approach used in this study is to construct test statistics analogous to those developed by Lawley-Hotelling (U), Pillai-Bartlett (V), and Wilks (L). Define

$$H = \sum_{i=1}^{G} n_i(\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})'$$

and

$$E = \sum_{i=1}^{G} (n_i - 1)S_i .$$

Then the test statistics for the Hotelling-Lawley trace criterion, the Pillai-Bartlett trace criterion, and the Wilks likelihood ratio criterion are, respectively,

$$U = \operatorname{trace}(HE^{-1}), \qquad V = \operatorname{trace}[H(H+E)^{-1}], \qquad L = \frac{|E|}{|H+E|} .$$

Approximate F transformations can be used with each of these test statistics.


Define the following variables:

p = number of dependent variables
h = G - 1 (the degrees of freedom for the multivariate analog to sums of squares between groups)
s = min(p,h)
e = N - G (the degrees of freedom for the multivariate analog to sums of squares within groups)
m = .5(|p - h| - 1)
n = .5(e - p - 1)


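For the Hotelling-Lawley criterion, transformations developed by Hughes and Saw (1972) and McKeon (1974), respectively, are given by

$$F_U^{(1)} = \frac{2(sn+1)}{s^2(2m+s+1)}\,U \sim F_{s(2m+s+1),\,2(sn+1)}$$

and

$$F_U^{(2)} = \frac{2n}{a-2}\cdot\frac{a}{ph}\,U \sim F_{ph,\,a},$$

where

$$a = 4 + \frac{ph + 2}{b - 1} \qquad\text{and}\qquad b = \frac{(2n + h)(2n + p)}{2(n - 1)(2n + 1)} .$$

For the Pillai-Bartlett criterion, the SAS (1985, p.12) transformation is given by

$$F_V = \frac{2n+s+1}{2m+s+1}\cdot\frac{V}{s - V} \sim F_{s(2m+s+1),\,s(2n+s+1)} .$$

For the Wilks criterion, the Rao (1952, p.262) transformation is given by

$$F_L = \frac{1 - L^{1/t}}{L^{1/t}}\cdot\frac{rt - 2q}{ph} \sim F_{ph,\,rt-2q},$$

where

$$q = \frac{ph - 2}{4}, \qquad t = \begin{cases}\sqrt{\dfrac{p^2h^2 - 4}{p^2 + h^2 - 5}} & \text{if } p^2 + h^2 - 5 > 0,\\[1ex] 1 & \text{otherwise,}\end{cases} \qquad\text{and}\qquad r = e - \frac{p - h + 1}{2} .$$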
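The Hughes-Saw, Pillai-Bartlett (SAS), and Rao transformations can be sketched in a few lines of Python. This is an illustration only: the function name `manova_f_approximations` and the dictionary layout are hypothetical, and the parameter definitions (s, m, n, r, q, t) are the conventional ones assumed here.

```python
import math


def manova_f_approximations(u, v, l, p, h, e):
    """Approximate F statistics and degrees of freedom for U, V, and L.

    p: number of dependent variables, h: hypothesis df, e: error df.
    """
    s = min(p, h)
    m = 0.5 * (abs(p - h) - 1)
    n = 0.5 * (e - p - 1)

    # Hughes and Saw transformation of the Hotelling-Lawley trace U.
    f_u = 2 * (s * n + 1) * u / (s ** 2 * (2 * m + s + 1))
    df_u = (s * (2 * m + s + 1), 2 * (s * n + 1))

    # Transformation of the Pillai-Bartlett trace V.
    f_v = (2 * n + s + 1) / (2 * m + s + 1) * v / (s - v)
    df_v = (s * (2 * m + s + 1), s * (2 * n + s + 1))

    # Rao transformation of Wilks' L.
    r = e - (p - h + 1) / 2
    q = (p * h - 2) / 4
    d = p ** 2 + h ** 2 - 5
    t = math.sqrt((p ** 2 * h ** 2 - 4) / d) if d > 0 else 1.0
    root = l ** (1 / t)
    f_l = (1 - root) / root * (r * t - 2 * q) / (p * h)
    df_l = (p * h, r * t - 2 * q)

    return {"U": (f_u, df_u), "V": (f_v, df_v), "L": (f_l, df_l)}
```

Passing the approximate degrees of freedom f in place of e gives the corresponding transformations for statistics built on $(f/h)M$ rather than $E$.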


Scale


the


measures


between


within


qrouD


variability.


Consider


the


univariate


(p=1)


case


denominator


the


Brown-Forsythe


statisti


-z


- z


= G [


1


- i
N


G
_n .S


G 2
--' N


--2
= Gs


Here


is the


arithmetic


average


G sample


variances


the


their


respective


average

'e sample


the


sizes.


G sample


Because


variances

both are a


weighted


approaches


+h2


t
Y








freedom


for the


sum


squares


between


groups.


Because the numerator is the between group sum of squares, the Brown-Forsythe statistic is in the metric of the ratio of two mean squares. Now the MANOVA criteria are in the metric of the ratio of two sums of squares. Consider the common MANOVA criteria in the univariate setting. For Hotelling-Lawley, Pillai-Bartlett, and Wilks, respectively, U = SSBG/SSWG, V = SSBG/(SSBG+SSWG), and L = SSWG/(SSBG+SSWG). In each case the test statistics are functions of the sums of squares rather than mean squares.


Hence, in order to use criteria analogous to U, V, and L, E must be replaced by (f/h)M. Let $\phi_i$, $i = 1, \ldots, s$, denote the $i$th eigenvalue of the characteristic equation $|H - \phi(f/h)M| = 0$. One statistic to consider would be analogous to Roy's (1945) largest root criterion. Of the four basic MANOVA criteria, however, Roy's largest root criterion is most affected by heteroscedasticity (Olson, 1974, 1976, 1979; Stevens, 1979). Consequently, it will be omitted.

The Lawley-Hotelling trace (Hotelling, 1951; Lawley, 1938) is based upon the same characteristic equation as Roy's (1945) largest root criterion. In this case, the analogous statistic

$$U^* = \operatorname{trace}\left(H[(f/h)M]^{-1}\right) = \sum_{i=1}^{s} \phi_i$$

provides one of the test statistics of interest. Let $\theta_i$, $i = 1, \ldots, s$, denote the $i$th eigenvalue of the characteristic equation $|H - \theta[H + (f/h)M]| = 0$. Then the analog of the Pillai-Bartlett trace (Bartlett, 1939; Pillai, 1955),

$$V^* = \operatorname{trace}\left(H[H + (f/h)M]^{-1}\right) = \sum_{i=1}^{s} \theta_i,$$

provides another test statistic of interest. Similarly, if $\xi_i$ is the $i$th eigenvalue of the characteristic equation $|(f/h)M - \xi(H + (f/h)M)| = 0$, then the analogous Wilks (1932) criterion is defined as

$$L^* = \frac{|(f/h)M|}{|H + (f/h)M|} = \prod_{i} \xi_i .$$


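To conduct hypothesis testing, approximate F transformations were used with each of these analogous test statistics, replacing N - G, the error degrees of freedom, by the approximate degrees of freedom f. Thus, the variables are defined as follows:

p = number of dependent variables
h = G - 1 (the degrees of freedom for the multivariate analog to sums of squares between groups)
s = min(p,h)

$$f = \frac{\operatorname{trace}\left[\left(\sum_{i=1}^{G} c_i S_i\right)^2\right] + \left[\operatorname{trace}\left(\sum_{i=1}^{G} c_i S_i\right)\right]^2}{\sum_{i=1}^{G} (n_i - 1)^{-1}\left\{\operatorname{trace}\left[(c_i S_i)^2\right] + \left[\operatorname{trace}(c_i S_i)\right]^2\right\}},$$

where $c_i = 1 - n_i/N$, $i = 1, \ldots, G$.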
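To make the construction concrete, the sketch below computes H, M, the approximate degrees of freedom f, and the three analogous criteria from raw data. It assumes NumPy; the function name `bf_generalized_criteria` and the returned dictionary keys are hypothetical and serve only this illustration.

```python
import numpy as np


def bf_generalized_criteria(groups):
    """Brown-Forsythe-type analogs of U, V, and L for G groups of shape (n_i, p)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    n = np.array([g.shape[0] for g in groups])
    total = n.sum()
    h = len(groups) - 1
    means = [g.mean(axis=0) for g in groups]
    covs = [np.atleast_2d(np.cov(g, rowvar=False)) for g in groups]

    # Between-group matrix H around the grand mean vector.
    grand = sum(ni * mi for ni, mi in zip(n, means)) / total
    h_mat = sum(ni * np.outer(mi - grand, mi - grand) for ni, mi in zip(n, means))

    # Within-group matrix M = sum of (1 - n_i/N) S_i.
    scaled = [(1 - ni / total) * si for ni, si in zip(n, covs)]
    m_mat = sum(scaled)

    # Approximate degrees of freedom f for M.
    num = np.trace(m_mat @ m_mat) + np.trace(m_mat) ** 2
    den = sum((np.trace(a @ a) + np.trace(a) ** 2) / (ni - 1)
              for a, ni in zip(scaled, n))
    f = num / den

    # Replace E by (f/h) M in the usual criteria.
    e_star = (f / h) * m_mat
    u_star = np.trace(h_mat @ np.linalg.inv(e_star))
    v_star = np.trace(h_mat @ np.linalg.inv(h_mat + e_star))
    l_star = np.linalg.det(e_star) / np.linalg.det(h_mat + e_star)
    return {"f": float(f), "U": float(u_star), "V": float(v_star), "L": float(l_star)}
```

With one dependent variable, f reduces to the univariate Brown-Forsythe degrees of freedom and $U^* f/h$ equals $F^*$, so the univariate test is recovered as a special case.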









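For the modified Hotelling-Lawley criterion, the Hughes and Saw (1972) and McKeon (1974) transformations, respectively, are now given by

$$F_{U^*}^{(1)} = \frac{2(sn^*+1)}{s^2(2m+s+1)}\,U^* \sim F_{s(2m+s+1),\,2(sn^*+1)}$$

and

$$F_{U^*}^{(2)} = \frac{2n^*}{a^*-2}\cdot\frac{a^*}{ph}\,U^* \sim F_{ph,\,a^*},$$

where

$$a^* = 4 + \frac{ph + 2}{b^* - 1}, \qquad b^* = \frac{(2n^* + h)(2n^* + p)}{2(n^* - 1)(2n^* + 1)},$$

and $n^* = .5(f - p - 1)$ is $n$ with the approximate degrees of freedom $f$ in place of $e$.

For the modified Pillai-Bartlett criterion, the SAS (1985, p.12) transformation is now given by

$$F_{V^*} = \frac{2n^*+s+1}{2m+s+1}\cdot\frac{V^*}{s - V^*} \sim F_{s(2m+s+1),\,s(2n^*+s+1)} .$$

For the modified Wilks criterion, the Rao (1952, p.262) transformation is now given by

$$F_{L^*} = \frac{1 - (L^*)^{1/t}}{(L^*)^{1/t}}\cdot\frac{r^*t - 2q}{ph} \sim F_{ph,\,r^*t-2q},$$

where $q$ and $t$ are defined as before, with $t = \sqrt{(p^2h^2 - 4)/(p^2 + h^2 - 5)}$ when $p^2 + h^2 - 5 > 0$ and $t = 1$ otherwise, and $r^* = f - (p - h + 1)/2$.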









=3-


Equality of expectation of the measures of between and within group dispersion. The Brown-Forsythe statistic was constructed so that, under the null hypothesis, the expectations of the numerator and denominator are equal. To show that the proposed multivariate generalization of the Brown-Forsythe statistic possesses the analogous property (that is, E(H) = E(M), assuming the null hypothesis is true), the following results are useful:


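With $\mu$ the common mean vector under the null hypothesis and $\bar{x} = \sum_{i=1}^{G} n_i\bar{x}_i/N$ the grand mean vector,

$$E(\bar{x}_i) = \mu, \qquad E(\bar{x}) = \mu,$$

$$E(\bar{x}_i\bar{x}_i') = \operatorname{var}(\bar{x}_i) + \mu\mu' = \frac{\Sigma_i}{n_i} + \mu\mu',$$

$$E(\bar{x}\bar{x}') = \operatorname{var}(\bar{x}) + \mu\mu' = \frac{1}{N^2}\sum_{i=1}^{G} n_i\Sigma_i + \mu\mu' .$$

Using these results, E(M) is given by

$$E(M) = E\left[\sum_{i=1}^{G}\left(1 - \frac{n_i}{N}\right)S_i\right] = \sum_{i=1}^{G}\Sigma_i - \frac{1}{N}\sum_{i=1}^{G} n_i\Sigma_i .$$

Similarly, using these results, E(H) is given by

$$E(H) = E\left[\sum_{i=1}^{G} n_i(\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})'\right] = \sum_{i=1}^{G} n_i E(\bar{x}_i\bar{x}_i') - N\,E(\bar{x}\bar{x}')$$

$$= \sum_{i=1}^{G} n_i\left(\frac{\Sigma_i}{n_i} + \mu\mu'\right) - N\left(\frac{1}{N^2}\sum_{i=1}^{G} n_i\Sigma_i + \mu\mu'\right) = \sum_{i=1}^{G}\Sigma_i - \frac{1}{N}\sum_{i=1}^{G} n_i\Sigma_i .$$

Hence, E(H) = E(M).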


Thus the modified Brown-Forsythe generalizations parallel the basic MANOVA criteria in terms of the measure of between group dispersion, the measure of within group dispersion, the metric of between and within group dispersion, and equality of the expectation of the measures of between and within group dispersion.

Wilcox Generalization


test


Wilcox


(1989)


proposed


using


test


statistic


-E
2=1


where


approximately


distributed


-square


with


degrees


freedom.


extend


thi


the


multivariate


. ._1


A .* I -.


jZ)


"


1 I I


1


L1~











- i)


where


- ni,1


+1) i12i


i=1


+ 1)


-1 I~~
1=1


The statistic is approximately distributed as chi-square with p(G-1) degrees of freedom.


Invariance Property of the Test Statistics

Samples in this experiment were selected from either a contaminated population or an uncontaminated population. The subset of populations labeled uncontaminated had the identity matrix as their common dispersion matrix. The subset of populations categorized as contaminated had a common diagonal matrix. That these matrix forms entail no loss of generality beyond the limited form of heteroscedasticity investigated is due to a well known theorem in Anderson (1958) and the invariance characteristic of the test statistics.


First, if $\Sigma_i$ and $\Sigma_j$ are positive definite, there exists a p×p nonsingular matrix $T$ such that $T\Sigma_iT' = I$ and $T\Sigma_jT' = D$, where $I$ is the p×p identity matrix and $D$ is a p×p diagonal matrix (Anderson, 1958). Hence, when the design includes two population subsets with common dispersion matrices within a given subset, including only diagonal matrices in each simulated experiment imposes no additional limitation on generalizability. Second, the test statistics are invariant with respect to transformations $y = Tx$, where $T$ is a p×p nonsingular transformation.


Brown-Forsythe Generalizations


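Let $y_{ij} = Tx_{ij}$, and let $\bar{y}_i$ and $S_i^*$ denote the sample mean vector and sample dispersion matrix for the $i$th sample calculated using the $y_{ij}$. It is well known that $\bar{y}_i = T\bar{x}_i$ and $S_i^* = TS_iT'$. Let $H^*$ and $M^*$ be $H$ and $M$ calculated using the $y_{ij}$. It is well known that $H^* = THT'$. Now

$$M^* = \sum_{i=1}^{G}\left(1 - \frac{n_i}{N}\right)S_i^* = \sum_{i=1}^{G}\left(1 - \frac{n_i}{N}\right)TS_iT' = TMT'.$$

For the modified Hotelling-Lawley trace criterion,

$$\operatorname{trace}\{H^*[M^*]^{-1}\} = \operatorname{trace}\{THT'[TMT']^{-1}\} = \operatorname{trace}\{HM^{-1}\}.$$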
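The invariance of the trace criterion rests only on the nonsingularity of $T$ and the cyclic property of the trace. As a brief sketch of the intermediate steps, writing $H$ and $M$ for the between and within group dispersion measures used in this section:

```latex
\begin{aligned}
\operatorname{trace}\{THT'[TMT']^{-1}\}
  &= \operatorname{trace}\{THT'(T')^{-1}M^{-1}T^{-1}\} \\
  &= \operatorname{trace}\{THM^{-1}T^{-1}\} \\
  &= \operatorname{trace}\{T^{-1}THM^{-1}\} \\
  &= \operatorname{trace}\{HM^{-1}\}.
\end{aligned}
```

The second line uses $(ABC)^{-1} = C^{-1}B^{-1}A^{-1}$ for nonsingular factors, and the third uses $\operatorname{trace}(AB) = \operatorname{trace}(BA)$.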


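Similarly, for the modified Pillai-Bartlett trace criterion,

$$\operatorname{trace}\{H^*[H^* + \tfrac{f}{h}M^*]^{-1}\} = \operatorname{trace}\{THT'[THT' + \tfrac{f}{h}TMT']^{-1}\} = \operatorname{trace}\{H[H + \tfrac{f}{h}M]^{-1}\}.$$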


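For the modified Wilks likelihood ratio criterion,

$$\frac{|\tfrac{f}{h}M^*|}{|H^* + \tfrac{f}{h}M^*|} = \frac{|\tfrac{f}{h}TMT'|}{|THT' + \tfrac{f}{h}TMT'|} = \frac{|T|\,|\tfrac{f}{h}M|\,|T'|}{|T|\,|H + \tfrac{f}{h}M|\,|T'|} = \frac{|\tfrac{f}{h}M|}{|H + \tfrac{f}{h}M|}.$$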


Wilcox Generalization


trace {


hM
h"


f
+ -M
h


J











- 12i


rs;1-


[ T'] -lnis-1


t~w~i
2 =1


G
(T') [r
2=1


ni9;1]-2.


1=1


- Tii


G
2=1


'-n S
i


{Z w1}T11
1i-


r21~


2=1


n'Si.i,


2 =1


G=1
i-1


i=1


Using results 1-4, the statistic is shown to be invariant as follows:


- T) 'W*


i(TX,


- Ti)


G

-T
i=1


- ) )


TI) -W1T-[T(fi


- 2)]


-r
. 11


- f) 'wi


-X)


ml-.~~ ~ ~ ~ ~ -% -~ a C a Sr ew ~ -r,.4 e TV -*t -


4-1,a


4-, 1-4-


.1=1


= T(


- Ti


alr


i=1


mk AHA CAHA


|


ann










There is no loss of generality in solely using diagonal matrices to simulate experiments in which there are only two sets of dispersion matrices. It should be noted, however, that when there are more than two sets of differing dispersion matrices, the matrices cannot always be simultaneously diagonalized by a transformation matrix T.


Design


Eight factors were considered in the study. These are described in the following paragraphs.


Distribution type (DT). Two types of distributions--normal and exponential--were included in the study.


Pearson and Please (1975) suggested that studies of robustness should focus on distributions with skewness and kurtosis having magnitudes less than 0.8 and 0.6, respectively.


However, there is evidence to suggest these boundaries are unnecessarily restrictive. For example, Kendall and Stuart (1963, p. 57) reported the age at time of marriage of over 300,000 Australians.


The skewness and kurtosis were


2.0 and


respectively.


Micceri (1989) investigated the distributional characteristics of achievement and psychometric measures. Of these 440 data sets, 15.2% had both tails with weights at or about the Gaussian, 49.1% had at least one extremely heavy tail, and 18.0% had both tail weights less than the Gaussian.


U


1-n ri',


found


28.4%


-IUI aUY aL 'a)( S.. a- -F -


~h P


YLIU








being


extremely


asymmetric.


Of the distributions considered, 11.4% were classified within a category having extreme skewness. The Micceri study underscores the common occurrence of distributions that are non-normal. Further, the Micceri study suggests the Pearson and Please criterion may be too restrictive.


For the normal distribution the coefficients of skewness and kurtosis ($\mu_4/\mu_2^2 - 3$) are respectively 0.00 and 0.00. For the exponential distribution the coefficients of skewness and kurtosis are respectively 2.00 and 6.00. The Micceri study provides evidence that the proposed normal and exponential distributions are reasonable representations of data that may be found in educational research.


Number of dependent variables (p). Data were generated to simulate experiments in which there are p=3 or p=6 dependent variables. This choice is reasonably consistent with the range of variables commonly examined in educational research (Algina & Oshima, 1990; Algina & Tang, 1988; Hakstian, Roed, & Lind, 1979; Lin, 1991; Olson, 1974; Tang, 1989).


Number of populations sampled (G). Data were generated to simulate experiments in which there is sampling from either G=3 or G=6 populations. Dijkstra and Werter (1981)


simulated


experiments


simulated


with


equal


experiments


, and


with


eaual


Olson


to 2. 3


(1974)


and


* f


(,/2/23) 1/2


6


.


.









rare


educational


research


(Tang


1989)


Hence, the chosen number of populations sampled should provide a reasonably adequate examination of this factor.


Degree of the sample size ratio (NR). Only unequal sample sizes are used in the study. Sample size ratios were chosen to range from small to moderately large. The basic ratios of n1:n2:n3 used in the simulation when sampling from three different populations are given in Table 2. Similarly, when sampling from six different populations, the basic ratios of n1:n2:...:n6 used in the simulation are given in Table 3. Fairly large ratios were used in the Algina and Tang (1988) study, with an extreme ratio of 5.


In experimental


and


studi


common


to have


sample


-size


ratios


between


(Lin,


1991)


Olson (1974) examined only the case of equal-sized samples. Since error rates increase as the degree of the sample size ratio increases (Algina & Oshima, 1990), if nominal error rates are excessively exceeded using small to moderately large sample size ratios, then the procedure presumably will have difficulty with extreme sample size ratios. Conversely, if the procedure performs well under this range of sample size ratios, then it should work well with equal sample size ratios; the question of extreme sample size ratios is still open.


Hence


, sample


size


ratios


were


chosen


under


the


constraint


i


4m









Table 2
Sample Size Ratios (n1:n2:n3)

n1     n2     n3
1      1      1.3
1      1      2
1      1.3    1.3
1      2      2


Table 3
Sample Size Ratios (n1:n2:n3:n4:n5:n6)

n1     n2     n3     n4     n5     n6
1      1      1      1      1.3    1.3
1      1      1      1      2      2
1      1      1.3    1.3    1.3    1.3
1      1      2      2      2      2









sample


size


and


largest


sample


size


populations


sampled.


In some cases these basic ratios could not be maintained because of the restriction on the ratio of total sample size to number of dependent variables. Departure from these basic ratios was minimized.


Form of the sample size ratio (NRF). When there are three groups, either the sample size ratio is of the form n1 = n2 < n3, denoted NRF=1, or the sample size ratio is of the form n1 < n2 = n3, denoted NRF=2. When there are six groups, either the sample size ratio is of the form n1 = n2 = n3 = n4 < n5 = n6, denoted NRF=1, or the sample size ratio is of the form n1 = n2 < n3 = n4 = n5 = n6, denoted NRF=2.


Ratio of total sample size to number of dependent variables (N:p). The ratios chosen were N:p=10 and N:p=20.


Hakstian


, Roed


and Lind


(1979)


simulated


experiments


with N


equal


With some notable exceptions (Algina & Tang, 1988; Lin, 1991), current studies tend to avoid smaller N:p ratios. Yao's test (which is generally more robust than James's first-order test) should have N:p at least 10 to be robust (Algina & Tang, 1988).


With


, Lin


(1991)


reasoned


seems


likely


that


will


need


to be at least


robustness


obtained


The upper limit was chosen to represent moderately large experiments.


These


*


.l-.-. ,,


r .c -on


I; I J


7


1


7









Degree of heteroscedasticity (d). Each population with dispersion matrix equal to a p×p identity matrix will be called an uncontaminated population. Each population with a p×p diagonal dispersion matrix with at least one diagonal element not equal to one will be called a contaminated population.


The forms of the dispersion matrices, which depend upon the number of dependent variables, are shown in Table 4.


Two


level


d=J2


were


used


simulate

matrices


the


degree


Olson


of hetero


(1974)


scedasticity


simulated


experiments


the

with


dispersion


d equal


, 3.0,


and


Algina


and


Tang


(1988)


simulated


experiments


(1989)


chose


with


equal


equal


1.5,


and


, and


Algina


Tang


Oshima


(1990)


selected


d equal


to 1.5 and


3.0


For this study, $d=\sqrt{2}$ was used to simulate a small degree of heteroscedasticity and $d=3.0$ was selected to represent a larger degree of heteroscedasticity. These values were selected to represent a range of heteroscedasticity more likely to be common in educational experiments (Tang, 1989).


Relationship of sample size to dispersion matrices (S). Both positive and negative relationships between sample size and dispersion matrices were investigated. In the positive relationship, the larger samples correspond to D. In the negative relationship, the smaller samples correspond to D.


- S U U a- a -- -


d=J


Ilr ii


^


rrrr 1


,


I* r








Table 4
Forms of Dispersion Matrices

Matrix    p=3                  p=6
D         Diag{1, d^2, d^2}    Diag{1, 1, d^2, d^2, d^2, d^2}
I         Diag{1, 1, 1}        Diag{1, 1, 1, 1, 1, 1}








Table 5
Relationship of Sample Size to Heteroscedasticity (G=3)


Sample


Size


1 : n2


Ratios


* n3


Relationship


Positive


Negative


IID


IDD


Table 6
Relationship of Sample Size to Heteroscedasticity (G=6)


Sample


Size


Ratios


Relationship


1 : n2


: n4


: n5


Positive


Negative


IIIIDD


IIIIDD


IIDDDD


IIDDDD


DDDDII

DDDDII


DDIIII

DDIIII









Design Layout. The sample sizes were determined once the values of p, G, N:p, NRF, and NR were specified. These sample sizes are summarized in Table 7 and Table 8, respectively. Each of these conditions was crossed with two distributions, two levels of heteroscedasticity, and two relationships of sample size to dispersion matrices to generate the experimental conditions from which to draw conclusions regarding the competitiveness of the proposed statistics with the established Johansen procedure.
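The step from a basic ratio to integer group sizes can be sketched as follows. This is only an illustration: the function name `allocate_sample_sizes` and the largest-remainder rounding rule are assumptions made here, since the source states only that departures from the basic ratios were minimized.

```python
def allocate_sample_sizes(p, n_to_p, ratios):
    """Split N = p * n_to_p into integer group sizes proportional to `ratios`.

    Largest-remainder rounding is an assumed rule, chosen so that the
    sizes sum to N exactly; it is not taken from the source.
    """
    total_n = p * n_to_p
    weight = sum(ratios)
    exact = [total_n * r / weight for r in ratios]
    sizes = [int(x) for x in exact]  # floor, since all values are positive
    shortfall = total_n - sum(sizes)
    # Hand the leftover units to the groups with the largest fractional parts.
    order = sorted(range(len(ratios)),
                   key=lambda i: exact[i] - sizes[i],
                   reverse=True)
    for i in order[:shortfall]:
        sizes[i] += 1
    return sizes


if __name__ == "__main__":
    # p = 6, N:p = 20, basic ratio 1:1:2:2:2:2 gives N = 120.
    print(allocate_sample_sizes(6, 20, [1, 1, 2, 2, 2, 2]))
    # p = 3, N:p = 10, basic ratio 1:1:2 gives N = 30; the ratio cannot be met exactly.
    print(allocate_sample_sizes(3, 10, [1, 1, 2]))
```

When N is not divisible by the sum of the ratio terms, rounding forces a small departure from the basic ratio, which is the situation the text describes.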


Simulation Procedure

The simulation was conducted as separate runs for each condition, with 2000 replications per condition. For each condition, the performance of Johansen's test, the two variations of the modified Hotelling-Lawley test, the modified Pillai-Bartlett test, the modified Wilks test, and the modified Wilcox test were evaluated using the generated data.


For each sample, an $n_i \times p$ ($i=1,\ldots,G$) matrix of uncorrelated pseudo-random observations was generated (using PROC IML in SAS) from the target distribution--normal or exponential. When the target distribution was an exponential, the random observations for each of the variates were


*


.3 a ~ a a e r a a *1 -. a -A a 4-a ~.3 -


i G


,,1


-YI1L ~ -1


1


LL


-I









Table 7
Sample Sizes (G=3)

p    G    N:p    N    n1    n2    n3

Note. J is occasionally altered to maintain the ratio as closely as manageable.









Table 8
Sample Sizes (G=6)

p    G    N:p    N    n1    n2    n3    n4    n5    n6

Note. J is occasionally altered to maintain the ratio as closely as manageable.








variates were identically distributed with mean equal to zero, variance equal to one, and covariances among the variates equal to zero. Each $n_i \times p$ matrix of observations corresponding to a contaminated population was post multiplied by an appropriate D to simulate dispersion heteroscedasticity.
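A minimal sketch of this generation step is given below, in Python rather than the PROC IML code actually used. The names `standardized_exponential` and `generate_sample` are hypothetical. The sketch also assumes that the post-multiplying matrix is Diag(1, d, d), so that the resulting variances are 1, d^2, d^2; whether D in the text denotes the scaling matrix or the resulting dispersion matrix is not settled by the passage.

```python
import random


def standardized_exponential(rng):
    # A rate-1 exponential variate has mean 1 and variance 1,
    # so subtracting 1 gives mean 0 and variance 1.
    return rng.expovariate(1.0) - 1.0


def standard_normal(rng):
    return rng.gauss(0.0, 1.0)


def generate_sample(n, p, scale_diag, rng, exponential=False):
    """Return an n x p list of rows of uncorrelated standardized variates,
    post-multiplied by the diagonal matrix Diag(scale_diag)."""
    if len(scale_diag) != p:
        raise ValueError("scale_diag must have p entries")
    draw = standardized_exponential if exponential else standard_normal
    return [[draw(rng) * s for s in scale_diag] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1992)
    d = 3.0
    contaminated = generate_sample(10, 3, [1.0, d, d], rng, exponential=True)
    uncontaminated = generate_sample(10, 3, [1.0, 1.0, 1.0], rng, exponential=True)
    print(len(contaminated), len(uncontaminated))
```

Post-multiplying by a diagonal matrix rescales each column separately, so the variates remain uncorrelated and only their variances change.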


For each replication, the data were analyzed using Johansen's test, the two variations of the modified Hotelling-Lawley trace criterion, the modified Pillai-Bartlett trace criterion, the modified Wilks likelihood ratio criterion, and the modified Wilcox test. The proportion of 2000 replications that yielded significant results


at a= 0


were


recorded


Summary


Two distribution types (DT = normal or exponential), two levels of dependent variables (p=3 or 6), two levels of populations sampled (G=3 or 6), two levels of the form of the sample size ratio, two levels of the degree of the sample size ratio, two levels of the ratio of total sample size to number of dependent variables (N:p=10 or 20), two levels of degree of heteroscedasticity ($d=\sqrt{2}$ or 3.0), and two levels of the relationship of sample size to dispersion matrices (S = positive or negative condition) combine to give 256 experimental conditions. The Johansen test, the two variations of the modified Hotelling-Lawley test, the modified Pillai-Bartlett test, the modified Wilks test, and the modified Wilcox test were applied to each of these experimental conditions. Generalizations of the behavior of these tests will be based upon the collective results of these experimental conditions.
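The size of the fully crossed design can be checked by enumeration. The sketch below is illustrative only: the level labels for NR are placeholders, and treating the test criterion T as a ninth factor follows the analysis described in Chapter 4.

```python
from itertools import combinations, product

# Eight two-level design factors; the NR labels are hypothetical placeholders.
design_factors = {
    "DT": ["normal", "exponential"],
    "p": [3, 6],
    "G": [3, 6],
    "NRF": [1, 2],
    "NR": ["smaller", "larger"],
    "N:p": [10, 20],
    "d": ["sqrt(2)", "3.0"],
    "S": ["positive", "negative"],
}

# Every combination of one level per factor is one experimental condition.
conditions = list(product(*design_factors.values()))

# With the test criterion T added as a ninth factor, count the main effects
# and the two-way through four-way interactions.
analysis_factors = list(design_factors) + ["T"]
effects = [combo
           for order in range(1, 5)
           for combo in combinations(analysis_factors, order)]

if __name__ == "__main__":
    print(len(conditions))  # 2**8 conditions
    print(len(effects))     # 9 + 36 + 84 + 126 effects
```

The count of effects up to four-way interactions among nine factors agrees with the index range i=1,...,255 used for the variance components in the next chapter.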











CHAPTER 4


RESULTS AND DISCUSSION


In this chapter analyses of estimated Type I error rates ($\hat{\tau}$) for $\alpha=.05$ are presented. Results with regard to $\hat{\tau}$ for $\alpha=.01$ and for $\alpha=.10$ are similar. The analyses are based on data presented in the Appendix.


Distributions of $\hat{\tau}$ for the six tests are depicted in Figures 1-6. In each of these six figures, the interval labelled .05 denotes .0250 to .0749, the interval labelled .10 denotes .0750 to .1249, and so forth. From these figures it is clear that in terms of controlling Type I error rates, (a) the performance of the Johansen and modified Wilcox tests are similar; (b) the performance of the first modified Hotelling-Lawley ($U_1^*$), second modified Hotelling-Lawley ($U_2^*$), modified Pillai-Bartlett, and modified Wilks tests are similar; (c) the performance of these two sets of tests greatly differ from one another; (d) the performance of the Johansen test is superior to that of the Wilcox generalization; and (e) the performance of each of the Brown-Forsythe generalizations is superior to that of either the Johansen test or Wilcox generalization.


Because the performance of the Johansen and modified Wilcox tests was so different from that of the Brown-Forsythe generalizations, separate analyses were conducted for each of these two sets of tests.































Figure 1. Frequency Histogram of Estimated Type I Error Rates for the Johansen Test
































Figure 2. Frequency Histogram of Estimated Type I Error Rates for the Modified Wilcox Test
































Figure 3. Frequency Histogram of Estimated Type I Error Rates for the First Modified Hotelling-Lawley Test

































Figure 4. Frequency Histogram of Estimated Type I Error Rates for the Second Modified Hotelling-Lawley Test











Figure 5. Frequency Histogram of Estimated Type I Error Rates for the Modified Pillai-Bartlett Test
































Figure 6. Frequency Histogram of Estimated Type I Error Rates for the Modified Wilks Test









Analysis of variance was used to investigate the effect of the following factors: Distribution Type (DT), Number of Dependent Variables (p), Number of Populations Sampled (G), Degree of the Sample Size Ratio (NR), Form of the Sample Size Ratio (NRF), Ratio of Total Sample Size to Number of Dependent Variables (N:p), Degree of Heteroscedasticity (d), Relationship of Sample Size to Dispersion Matrices (S), and Test Criteria (T).


Brown-Forsythe Generalizations

Because there are nine factors, initial analyses were conducted to determine which effects to enter into the analysis of variance model. A forward selection approach was used, with main effects entered first, followed by two-way interactions, three-way interactions, and four-way interactions. Because $R^2$ was .96 for the model with four-way interactions, more complex models were not examined. The $R^2$ for all models are shown in Table 9. The model with main effects and two-way through four-way interactions was selected.

Variance components were computed for each main effect, two-way, three-way, and four-way interaction. The variance component ($\theta_i$, $i=1,\ldots,255$) for each effect was computed using the formula $10^6(MS_{EF} - MSE)/2^x$, where $MS_{EF}$ was the mean square for that given effect, MSE was the mean square error for the four-factor interaction model, and $2^x$ was the number of levels for the factors included in that given effect. Negative









Table 9
Magnitudes of R^2 for Main Effects, Two-Way Interaction, Three-Way Interaction, and Four-Way Interaction Models when using the Four Brown-Forsythe Generalizations

Highest-Order Terms       R^2
Main Effects              0.52
Two-Way Interactions      0.77
Three-Way Interactions    0.89
Four-Way Interactions     0.96








variance components were set to zero. Using the sum of these variance components plus $MSE \times 10^6$ as a measure of total variance, the proportion of total variance in estimated Type I error rates was computed for the $i$th effect ($i=1,\ldots,255$) using the formula $\theta_i/[(\theta_1 + \cdots + \theta_{255}) + 10^6 MSE]$.
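The bookkeeping described here can be sketched in a few lines. The function name `variance_proportions` and the example mean squares are hypothetical, and the divisor is passed in per effect rather than computed, because the exact form of the divisor is reconstructed from a damaged formula in the source.

```python
def variance_proportions(mean_squares, mse, divisors, scale=1e6):
    """Return each effect's share of total variance.

    mean_squares: dict mapping effect name -> mean square for that effect.
    mse:          mean square error of the four-way interaction model.
    divisors:     dict mapping effect name -> divisor for that effect
                  (supplied by the caller; an assumption of this sketch).
    scale:        rescaling constant (10**6 in this analysis).
    """
    components = {}
    for effect, ms in mean_squares.items():
        theta = scale * (ms - mse) / divisors[effect]
        components[effect] = max(theta, 0.0)  # negative components set to zero
    total = sum(components.values()) + scale * mse
    return {effect: theta / total for effect, theta in components.items()}


if __name__ == "__main__":
    # Made-up mean squares, for illustration only.
    shares = variance_proportions(
        mean_squares={"A": 3e-6, "B": 0.5e-6},
        mse=1e-6,
        divisors={"A": 2, "B": 2},
    )
    print(shares)
```

Because the error term is included in the total, the shares of the listed effects sum to less than one; the remainder is the share attributed to error.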


Shown in Table 10 are effects that (a) were statistically significant and (b) accounted for at least 1% of the total variance in estimated Type I error rates.


Because N:p, T, G, and GxT are among the largest effects and--in contrast to factors such as d and DT--do not have to be inferred from data, their effects were examined by calculating percentiles of $\hat{\tau}$ for each combination of T, G, and N:p. These percentiles should provide insight into the functioning of the four tests.


The DTxNRFxSxd interaction was significant and the second largest effect. Consequently the effects of the four factors involved in this interaction were examined by constructing cell mean plots involving combinations of the four factors. Other interaction effects with large variance components that included these factors were checked to determine whether they change the findings significantly.


The DTxG interaction will be examined because it accounts for 4.0% of the total variance in estimated Type I error rates and is not explained in terms of either the effect of T, N:p, and G or the effect of DT, NRF, S, and d. The factor p has neither a large main effect nor large interactions with any


i-i,









Table 10
Variance Components for the First Modified Hotelling-Lawley, Second Modified Hotelling-Lawley, Modified Pillai-Bartlett, and Modified Wilks Tests

Effect    Percent of Variance


N:p

DTxNRFxSxd

T

DTxNRFxS

NRFxSxd

DTxd

DTxG

NRFxS

G

GxT


DTxGxd

DTxGxNRFxS

Sxd

d

S

NRFxSxdxT











pxNRFxSxd


DTxGxN:pxd


GxN:p

NRFxSxT

DTxNRxS

dxT

DTxS


Others









variance,


effect


was


examined by


inspecting


cell


means


and


--C.


Finally, the influence of the degree of the sample size ratio (NR) was minimal. The main effect of NR accounted for only .1% of the total variance in estimated Type I error rates. The three-way interaction DTxNRxS was the effect with the largest variance component which included NR, and it still accounted for only a small share of the total variance in estimated Type I error rates.


Effect of T, G, and N:p. Percentiles of $\hat{\tau}$ for the two modified Hotelling-Lawley tests are displayed in Table 11; percentiles for the modified Pillai-Bartlett and modified Wilks tests are shown in Table 12. Using Bradley's liberal criterion ($.5\alpha \le \tau \le 1.5\alpha$), the following patterns emerge regarding control of Type I error rates for the Brown-Forsythe generalizations: (a) the first modified Hotelling-Lawley test ($U_1^*$) was adequate when N:p was 10; however, the test tended to be liberal when N:p was 20; (b) the second modified Hotelling-Lawley test ($U_2^*$) was adequate when either N:p was 10 and G was 3 or when N:p was 20 and G was 6; the second modified Hotelling-Lawley test tended to be conservative when N:p was 10 and G was 6, whereas the test tended to be slightly liberal when N:p was 20 and G was 3; (c) the modified Pillai-Bartlett test was adequate when N:p was 20 and G was 3; the modified Pillai-Bartlett test tended to be conservative when N:p was 10 or when N:p was 20 and G was 6; (d) the modified Wilks test was adequate when









Table 11
Percentiles of $\hat{\tau}$ for the First Modified Hotelling-Lawley Test ($U_1^*$) and Second Modified Hotelling-Lawley Test ($U_2^*$) for Combinations of Ratio of Total Sample Size to Number of Dependent Variables (N:p) and Number of Populations Sampled (G)

                                 N:p=10              N:p=20
Test       Percentile      G=3       G=6       G=3       G=6
U1*        95th            .0795*    .0770     .0855*    .0885*
           90th            .0710     .0715     .0795*    .0835*
           75th            .0555     .0595     .0610     .0708
           50th            .0505     .0500     .0538     .0625
           25th            .0430     .0398     .0493     .0540
           10th            .0375     .0315     .0460     .0490
           5th             .0345     .0295     .0435     .0485
U2*        95th            .0730     .0510     .0815*    .0710
           90th            .0625     .0460     .0785*    .0650
           75th            .0513     .0388     .0590     .0565
           50th            .0453     .0290     .0510     .0483
           25th            .0385     .0198*    .0470     .0388
           10th            .0325     .0140*    .0430     .0355
           5th             .0290     .0135*    .0405     .0330









Table 12
Percentiles of $\hat{\tau}$ for the Modified Pillai-Bartlett Test and Modified Wilks Test for Combinations of Ratio of Total Sample Size to Number of Dependent Variables (N:p) and Number of Populations Sampled (G)

                                       N:p=10              N:p=20
Test               Percentile    G=3       G=6       G=3       G=6
Pillai-Bartlett    95th          .0555     .0365     .0695     .0510
                   90th          .0495     .0310     .0660     .0500
                   75th          .0430     .0258     .0533     .0455
                   50th          .0370     .0210*    .0480     .0380
                   25th          .0318     .0145*    .0425     .0315
                   10th          .0240*    .0110*    .0365     .0275
                   5th           .0200*    .0070*    .0345     .0235*
Wilks              95th          .0705     .0465     .0780*    .0615
                   90th          .0635     .0425     .0745     .0580
                   75th          .0483     .0360     .0575     .0533
                   50th          .0440     .0288     .0513     .0450
                   25th          .0388     .0215*    .0455     .0375
                   10th          .0330     .0155*    .0415     .0345
                   5th           .0310     .0130*    .0405     .0325







modified Wilks test was conservative when N:p was 10 and G was 6.
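Bradley's liberal criterion amounts to a simple interval check on each estimated Type I error rate. The sketch below is illustrative; the function name `bradley_liberal` and the three labels are choices made here, not the source's code.

```python
def bradley_liberal(tau_hat, alpha=0.05):
    """Classify an estimated Type I error rate under Bradley's liberal
    criterion, 0.5 * alpha <= tau <= 1.5 * alpha."""
    lower, upper = 0.5 * alpha, 1.5 * alpha
    if tau_hat < lower:
        return "conservative"
    if tau_hat > upper:
        return "liberal"
    return "adequate"


if __name__ == "__main__":
    # Estimated rates are proportions of significant results over 2000 replications.
    for rejections in (159, 101, 40):
        tau_hat = rejections / 2000
        print(tau_hat, bradley_liberal(tau_hat))
```

At $\alpha=.05$ the acceptable interval is .025 to .075, which matches the entries marked with an asterisk in the percentile tables, such as .0795 and .0198.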


Effect of DT, NRF, S, and d. As shown in Figure 7 and Figure 8, when data were sampled from a normal distribution, regardless of the form of the sample size ratio, mean $\hat{\tau}$ increased as degree of heteroscedasticity increased in the positive condition whereas mean $\hat{\tau}$ decreased as degree of heteroscedasticity increased in the negative condition.


However, as shown in Figures 9 and 10, when data were sampled from an exponential distribution, mean $\hat{\tau}$ increased as degree of heteroscedasticity increased regardless of the relationship of sample sizes and dispersion matrices. The mean difference in $\hat{\tau}$ between the higher and lower degree of heteroscedasticity was greater in the positive condition when the sample was selected in the first form of the sample size ratio, whereas when the sample was selected in the second form of the sample size ratio, the mean difference was greater in the negative condition.


With data sampled from an exponential distribution, the Brown-Forsythe generalizations tend to be conservative when (a) there was a slight degree of heteroscedasticity (that is, $d=\sqrt{2}$), (b) the degree of heteroscedasticity increased ($d=3$) and the first form of the sample size ratio was paired with the negative condition, or (c) the degree of heteroscedasticity increased and the second


fnrm nf *1


cz mn1 aS


r^31+ i n


- a a -~


.. -* a


Cl ~ ~ ~ ~ ~ u '7 'I. *~ rb.. I aII


.. j~i .


nh~;t~trr


d=J


E 1 '7 d











Figure 7. Estimated Type I Error Rates for the Two Levels of the Degree of Heteroscedasticity ($d = \sqrt{2}$ or 3) and Relationship of Sample Size to Dispersion Matrices (S = positive or negative condition) When Data Were Sampled as in the First Form of the Sample Size Ratio from a Normal Distribution










Figure 8. Estimated Type I Error Rates for the Two Levels of the Degree of Heteroscedasticity ($d = \sqrt{2}$ or 3) and Relationship of Sample Size to Dispersion Matrices (S = positive or negative condition) When Data Were Sampled as in the Second Form of the Sample Size Ratio from a Normal Distribution











Figure 9. Estimated Mean Type I Error Rates for the Two Levels of the Degree of Heteroscedasticity ($d = \sqrt{2}$ or 3) and Relationship of Sample Size to Dispersion Matrices (S = positive or negative condition) When Data Were Sampled as in the First Form of the Sample Size Ratio from an Exponential Distribution











Figure 10. Estimated Mean Type I Error Rates for the Two Levels of the Degree of Heteroscedasticity ($d = \sqrt{2}$ or 3) and Relationship of Sample Size to Dispersion Matrices (S = positive or negative condition) When Data Were Sampled as in the Second Form of the Sample Size Ratio from an Exponential Distribution









distribution, the Brown-Forsythe generalizations tended to be liberal when the first form of the sample size ratio was paired with the positive condition, or the second form of the sample size ratio was paired with the negative condition.


Effect of DTxG interaction. As shown in Figure 11, mean $\hat{\tau}$ for the Brown-Forsythe generalizations was nearer $\alpha$ when G was 3 than when G was 6, regardless of the type of distribution from which the data were sampled. When data were sampled from a normal distribution, the tests tended to be slightly conservative. Mean $\hat{\tau}$ was near $\alpha$ when data were sampled from an exponential distribution and G was 3. However, when data were sampled from an exponential distribution and G was 6, the Brown-Forsythe generalizations tended to be conservative.


Effect


Shown


Figure


mean


was


near


a for


the


Brown


-Forsythe


general


zations


when


was


When


was


the


tests


tended


to be slightly


conservative.










Figure 11. Estimated Mean Type I Error Rates for Combinations of Distribution Type and Number of Populations Sampled











Figure 12. Estimated Mean Type I Error Rates for the Brown-Forsythe Generalizations for the Two Levels of the Number of Dependent Variables









Johansen Test and Wilcox Generalization


Because there are nine factors, initial analyses were conducted to determine which effects to enter into the analysis of variance model. A forward selection approach was used, with main effects entered first, followed by two-way interactions, three-way interactions, and four-way interactions. Because $R^2$ was .997 for the model with four-way interactions, more complex models were not examined. The $R^2$ for all models are shown in Table 13. The model with main effects and two-way through four-way interactions was selected.

Variance components were computed for each main effect, two-way, three-way, and four-way interaction. The variance component ($\theta_i$, $i=1,\ldots,255$) for each effect was computed using the formula $10^4(MS_{EF} - MSE)/2^x$, where $MS_{EF}$ was the mean square for that given effect, MSE was the mean square error for the four-factor interaction model, and $2^x$ was the number of levels for the factors included in that given effect. Negative variance components were set to zero. Using the sum of these variance components plus $MSE \times 10^4$ as a measure of total variance, the proportion of total variance in estimated Type I error rates was computed for the $i$th effect ($i=1,\ldots,255$) using the formula $\theta_i/[(\theta_1 + \cdots + \theta_{255}) + 10^4 MSE]$.


Shown in Table 14 are effects that (a) were statistically significant and (b) accounted for at least 1% of the total variance in estimated Type I error rates.









Table 13
Magnitudes of R^2 for Main Effects, Two-Way Interaction, Three-Way Interaction, and Four-Way Interaction Models when using the Johansen Test and Wilcox Generalization

Highest-Order Terms       R^2
Main Effects              0.767
Two-Way Interactions      0.963
Three-Way Interactions    0.988
Four-Way Interactions     0.997








Table 14
Variance Components for the Johansen and Modified Wilcox Tests

Effect    Percent of Variance


36.2


GxN


227


14.8

11.4


GxT


GxNRFxNR

P

pxG


GxNR


NRFxNR


Others









Because N:p, G, GxN:p, and GxT are among the largest effects and--in contrast to factors such as d and DT--do not have to be inferred from data, their effects will be examined by calculating percentiles of $\hat{\tau}$ for each combination of G and N:p. These percentiles should provide insight into the functioning of these two tests.


Effect of T, G, and N:p. Percentiles of $\hat{\tau}$ are displayed in Table 15. Using Bradley's liberal criterion ($.5\alpha \le \tau \le 1.5\alpha$), the following patterns emerge regarding control of Type I error rates for the Johansen test and Wilcox generalization: (a) the Johansen test was adequate


only


when N


was


20 and


was


and


(b) the Wilcox generalization was inadequate over the range of experimental conditions considered in the experiment. Since the performance of the Johansen test and the Wilcox generalization was inadequate, no further analysis was warranted on either of these two tests.


Summary


It is clear that in terms of controlling Type I error rates under the heteroscedastic experimental conditions considered, (a) the four Brown-Forsythe generalizations are much more effective than either the modified Wilcox test or the Johansen test, and (b) the Johansen test is more effective