SOLUTIONS TO THE MULTIVARIATE G-SAMPLE BEHRENS-FISHER PROBLEM BASED UPON GENERALIZATIONS OF THE BROWN-FORSYTHE F* AND WILCOX Hm TESTS

By

WILLIAM THOMAS COOMBS

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

1992

ACKNOWLEDGEMENTS

I would like to express my sincerest appreciation to the individuals that have assisted me in completing this study. First, I would like to thank Dr. James J. Algina, chairperson of my doctoral committee, for suggesting the topic of my dissertation, guiding me through difficult applied and theoretical barriers, debugging computer errors, providing editorial suggestions, and fostering my professional and personal growth through encouragement, support, and friendship. Second, I am indebted and grateful to the other members of my committee, Dr. Linda Crocker, Dr. David Miller, and Dr. Ronald Randles, for patiently reading the manuscript, offering constructive suggestions, providing editorial assistance, and giving continuous support. Third, I must thank Dr. John Newell who, as an unofficial fifth member of my committee, still attended committee meetings, read the manuscript, and vigilantly inquired as to the progress of the project. Finally, I would like to express my heartfelt thanks to my wife Laura and my son Tommy. Space limitations prevent me from enumerating the many personal sacrifices, both large and small, required of my wife so that I was able to accomplish this task. Although I shall never [...] my committee and family, let me begin simply and sincerely: thank you.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
CHAPTERS
1 INTRODUCTION
    The Problem
    Purpose of the Study
    Significance of the Study
2 REVIEW OF LITERATURE
    The Independent Samples t Test
    Alternatives to the Independent Samples t Test
    ANOVA F Test
    Alternatives to the ANOVA F Test
    Hotelling's T2 Test
    Alternatives to the Hotelling's T2 Test
    MANOVA Criteria
    Alternatives to the MANOVA Criteria
3 METHODOLOGY
    Development of Test Statistics
        Brown-Forsythe Generalizations
            Scale of the measures of between and within group variability
            Equality of expectation of the measures of between and within group dispersion
        Wilcox Generalization
    Invariance Property of the Test Statistics
        Brown-Forsythe Generalizations
        Wilcox Generalization
    Design
    Simulation Procedure
    Summary
4 RESULTS AND DISCUSSION
    Brown-Forsythe Generalizations
    Johansen Test and Wilcox Generalization
5 CONCLUSIONS
    General Observations
    Suggestions to Future Researchers
APPENDIX ESTIMATED TYPE I ERROR RATES
REFERENCES
BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

SOLUTIONS TO THE MULTIVARIATE G-SAMPLE BEHRENS-FISHER PROBLEM BASED UPON GENERALIZATIONS OF THE BROWN-FORSYTHE F* AND WILCOX Hm TESTS

By

William Thomas Coombs

August 1992

Chairperson: James J. Algina
Major Department: Foundations of Education

The Brown-Forsythe $F^*$ and Wilcox $H_m$ tests are generalized to form multivariate alternatives to MANOVA for use in situations where dispersion matrices are heteroscedastic. Four generalizations of the Brown-Forsythe $F^*$ test are included. Type I error rates for the Johansen test and the five new generalizations were estimated using simulated data for a variety of conditions. The design of the experiment was a 2^[...] factorial. The factors were (a) type of distribution, (b) number of dependent variables, (c) number of groups, (d) ratio of total sample size to number of dependent variables, (e) form of the sample size ratio, (f) degree of the sample size ratio, (g) degree of heteroscedasticity, and (h) relationship of sample size to dispersion matrices. Only conditions in which dispersion matrices were heterogeneous were included.
In controlling Type I error rates, the four generalizations of the Brown-Forsythe $F^*$ test greatly outperform both the Johansen test and the generalization of the Wilcox $H_m$ test.

CHAPTER 1
INTRODUCTION

Comparing two population means using data from independent samples is one of the most fundamental problems in statistical hypothesis testing. One solution to this problem, the independent samples t test, is based on the assumption that the samples are drawn from populations with equal variances. According to Yao (1965), Behrens (1929) was the first to solve the testing problem without making the assumption of equal population variances. Fisher (1935, 1939) showed that Behrens's solution could be derived from Fisher's theory of statistical inference called fiducial probability. Others (Aspin, 1948; Welch, 1938, 1947) have proposed solutions to the two-sample Behrens-Fisher problem as well.

The independent samples t test has been generalized to the analysis of variance (ANOVA) F test, a test of the equality of G population means. This procedure assumes homoscedasticity, that is, $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_G^2$. Several authors have proposed procedures to test $H_0: \mu_1 = \mu_2 = \cdots = \mu_G$ without assuming equal population variances. Welch (1951) extended his 1938 work and arrived at an approximate degrees of freedom (APDF) solution. Brown and Forsythe (1974), James (1951), and Wilcox (1988, 1989) have proposed other solutions to the G-sample Behrens-Fisher problem.

Hotelling (1931) generalized the independent samples t test to a test of the equality of two population mean vectors. This procedure makes the assumption of equal population dispersion (variance-covariance) matrices, that is, $\Sigma_1 = \Sigma_2$. Several authors have proposed procedures to test $H_0: \mu_1 = \mu_2$ without assuming equal population dispersion matrices. James (1954) generalized his 1951 work and arrived at a series solution. Anderson (1958), Bennett (1951), Ito (1969), Nel and van der Merwe (1986), Scheffe (1943), and Yao (1965) have proposed additional solutions to the multivariate two-sample Behrens-Fisher problem.
Bartlett (1939), Hotelling (1951), Lawley (1938), Pillai (1955), Roy (1945), and Wilks (1932) have proposed multivariate generalizations of the ANOVA F test, creating the four basic multivariate analysis of variance (MANOVA) procedures for testing $H_0: \mu_1 = \mu_2 = \cdots = \mu_G$. These procedures make the assumption of equal population dispersion matrices. James (1954) and Johansen (1980) proposed procedures to test the equality of mean vectors without making the assumption of homoscedasticity, that is, $\Sigma_1 = \Sigma_2 = \cdots = \Sigma_G$. James extended the James (1951) univariate procedures to produce first-order and second-order series solutions.

The Problem

To date, neither the Brown-Forsythe (1974) nor the Wilcox (1989) procedure has been extended to the multivariate setting. To test $H_0: \mu_1 = \mu_2 = \cdots = \mu_G$, Brown and Forsythe (1974) proposed the statistic

$$F^* = \frac{\sum_{i=1}^{G} n_i (\bar{x}_i - \bar{x})^2}{\sum_{i=1}^{G} (1 - n_i/N)\, s_i^2}$$

where $n_i$ denotes the number of observations in the $i$th group, $\bar{x}_i$ the mean of the $i$th group, $\bar{x}$ the grand mean, $s_i^2$ the variance of the $i$th group, $N$ the total number of observations, and $G$ the number of groups. The statistic $F^*$ is approximately distributed as $F$ with $G - 1$ and $f$ degrees of freedom, where

$$\frac{1}{f} = \sum_{i=1}^{G} \frac{c_i^2}{n_i - 1}, \qquad c_i = \frac{(1 - n_i/N)\, s_i^2}{\sum_{j=1}^{G} (1 - n_j/N)\, s_j^2}.$$

The degrees of freedom, $f$, were determined using a procedure due to Satterthwaite (1941).

To test $H_0: \mu_1 = \mu_2 = \cdots = \mu_G$, Wilcox (1989) proposed the statistic

$$H_m = \sum_{i=1}^{G} w_i (\tilde{x}_i - \tilde{x})^2$$

where $w_i = [\ldots]$, $\tilde{x}_i = [\ldots]$, $w = \sum_{i=1}^{G} w_i$, and $\tilde{x} = \sum_{i=1}^{G} w_i \tilde{x}_i / w$. In the equation for $\tilde{x}_i$, $x_{in_i}$ denotes the last observation in the $i$th sample. The statistic $H_m$ is approximately distributed as chi-square with $G - 1$ degrees of freedom.

Purpose of the Study

The purpose of this study is to extend the univariate procedures proposed by Brown and Forsythe (1974) and Wilcox (1989) to test $H_0: \mu_1 = \mu_2 = \cdots = \mu_G$ in the multivariate setting, and to compare the Type I error rates of the proposed multivariate generalizations to the error rates of Johansen's (1980) test under varying distributions, numbers of dependent (criterion) variables, numbers of groups, forms of the sample size ratio, degrees of the sample size ratio, ratios of total sample size to number of dependent variables, degrees of heteroscedasticity, and relationships of sample size to dispersion matrices.
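The Brown-Forsythe computation described under The Problem can be sketched in a few lines of Python. This is a minimal illustration, not the dissertation's own code: it assumes the usual form of $F^*$ (weighted between-group sum of squares over $\sum (1 - n_i/N) s_i^2$) and the Satterthwaite-type approximation for $f$. The function name `brown_forsythe_fstar` and the sample data are hypothetical. A p-value would additionally require an F distribution routine, which is omitted here.

```python
from statistics import mean, variance


def brown_forsythe_fstar(groups):
    """Return (F*, numerator df, approximate denominator df f).

    `groups` is a list of samples, each a list of numbers with at
    least two observations and nonzero variance in at least one group.
    """
    n = [len(g) for g in groups]
    N = sum(n)
    G = len(groups)
    means = [mean(g) for g in groups]
    variances = [variance(g) for g in groups]  # unbiased sample variances
    grand_mean = sum(sum(g) for g in groups) / N

    # Numerator: between-group sum of squares weighted by group size.
    numerator = sum(ni * (mi - grand_mean) ** 2 for ni, mi in zip(n, means))

    # Denominator: each variance weighted by (1 - n_i / N).
    terms = [(1 - ni / N) * vi for ni, vi in zip(n, variances)]
    denominator = sum(terms)

    # Satterthwaite-type approximation: 1/f = sum of c_i^2 / (n_i - 1).
    c = [t / denominator for t in terms]
    f = 1 / sum(ci ** 2 / (ni - 1) for ci, ni in zip(c, n))

    return numerator / denominator, G - 1, f


# Hypothetical data: two small groups with unequal variances.
f_star, df1, df2 = brown_forsythe_fstar([[1, 2, 3], [4, 6, 8]])
print(f_star, df1, df2)
```

With only two groups, the value returned here equals the square of the Welch two-sample statistic, which is a convenient sanity check on the arithmetic.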
Significance of the Study

The application of multivariate analysis of variance [...] the future of data analysis (Bray & Maxwell, 1985, p. 7). Stevens suggested three reasons why multivariate analysis is prominent:

1. Any worthwhile treatment will affect the subjects in more than one way, hence the problem for the investigator is to determine in which specific ways the subjects will be affected and then find sensitive measurement techniques for those variables.
2. Through the use of multiple criterion measures we can obtain a more complete and detailed description of the phenomenon under investigation.
3. Treatments can be expensive to implement, while the cost of obtaining data on several dependent variables is relatively small and maximizes information gain. (1986, p. 2)

Hotelling's $T^2$ is sensitive to violations of homoscedasticity, particularly when sample sizes are unequal (Algina & Oshima, 1990; Algina, Oshima, & Tang, 1991; Hakstian, Roed, & Lind, 1979; Holloway & Dunn, 1967; Hopkins & Clay, 1963; Ito & Schull, 1964). Yao's (1965), James's (1954) first- and second-order, and Johansen's (1980) tests are alternatives to Hotelling's $T^2$ that do not have the underlying assumption of homoscedasticity. In controlling Type I error rates under heteroscedasticity, Yao's test is superior to James's first-order test (Algina & Tang, 1988; Yao, 1965). Algina, Oshima, and Tang (1991) studied Type I error rates of the four procedures when applied to data sampled from multivariate distributions composed of independent [...]. [...] can be seriously nonrobust with extremely skewed distributions such as the exponential and lognormal, but are fairly robust with moderately skewed distributions such as the beta(5, 1.5). They also appear to be robust with non-normal symmetric distributions such as the uniform, t(5), and Laplace.
The performance of Yao's test, James's second-order test, and Johansen's test was slightly superior to the performance of James's first-order test (Algina, Oshima, & Tang, 1991).

MANOVA criteria are relatively robust to nonnormality (Olson, 1974, 1976) but are sensitive to violations of homoscedasticity (Korin, 1972; Olson, 1974, 1979; Pillai & Sudjana, 1975; Stevens, 1979). The Pillai-Bartlett trace criterion is the most robust of the four basic MANOVA criteria for protection against nonnormality and heteroscedasticity of dispersion matrices (Olson, 1974, 1976, 1979). Alternatives to MANOVA criteria that are not based on the homoscedasticity assumption include James's first- and second-order tests, and Johansen's test. When sample sizes are unequal, dispersion matrices are unequal, and data are sampled from multivariate normal distributions, Johansen's test and James's second-order test outperform the Pillai-Bartlett trace criterion and James's first-order test (Tang, 1989).

In the univariate case, neither the Brown-Forsythe $F^*$ test nor the Wilcox $H_m$ test requires equality of population variances. This suggests that generalizations of the Brown-Forsythe procedure and the Wilcox procedure might have advantages over the commonly used MANOVA procedures in cases of heteroscedasticity.

Brown and Forsythe (1974) used Monte Carlo techniques to examine the ANOVA F test, Brown-Forsythe $F^*$ test, Welch APDF test, and James first-order procedure. The critical value proposed by Welch is a better approximation for small sample sizes than that proposed by James.
Under normality and inequality of variances, both Welch's test and the $F^*$ test tend to have actual Type I error rates ($\tau$) near nominal error rates ($\alpha$) in a wide variety of conditions. However, there are conditions in which each fails to control $\tau$. In terms of power, the choice between Welch's test (the specialization of Johansen's test and, in the case of two groups, of Yao's test) and $F^*$ depends upon the magnitude of the means and their standard errors. The Welch test is preferred to the $F^*$ test if extreme means coincide with small variances. When the extreme means coincide with large variances, the power of the $F^*$ test is greater than that of the Welch test. In a limited simulation, Clinch and Keselman (1982) indicated that under conditions of heteroscedasticity, the Brown-Forsythe test is less sensitive to nonnormality than is Welch's test. In fact, Clinch and Keselman concluded the user [...] normal data, in some conditions the $F^*$ test has better control over $\tau$ than does James's second-order test, Welch's test, or Wilcox's $H_m$ test. In other conditions the $F^*$ test has substantially worse control. Oshima and Algina concluded that James's second-order test should be used with symmetric distributions and Wilcox's $H_m$ test should be used with moderately asymmetric distributions. With markedly asymmetric distributions none of the tests had good control of $\tau$.

Extensive simulations (Wilcox, 1988) indicated that under normality the Wilcox $H$ procedure always gave the experimenter more control over Type I error rates than did the $F^*$ or Welch test and has error rates similar to James's second-order method, regardless of the degree of heteroscedasticity. Wilcox (1989) proposed $H_m$, an improvement to the Wilcox (1988) $H$ method; the improved test is much easier to use than James's second-order method. Wilcox (1990) indicated that the $H_m$ test is more robust to nonnormality than is the Welch test.
Because the Johansen (1980) procedure is an extension of the Welch test, the results reported by Clinch and Keselman and by Wilcox suggest that generalizations of the Brown-Forsythe procedure and the Wilcox procedure might have advantages over the Johansen procedure in some cases of heteroscedasticity and/or skewness. Thus the construction and comparison of new procedures which may be competitive or even superior under [...]

CHAPTER 2
REVIEW OF LITERATURE

Independent Samples t Test

The independent samples t test is used to test the hypothesis of the equality of two population means when independent random samples are drawn from two populations which are normally distributed and have equal population variances. The test statistic

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}, \qquad \text{where } s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},$$

has a t distribution with $n_1 + n_2 - 2$ degrees of freedom.

The degree of robustness of the independent samples t test to violations of the assumption of homoscedasticity has been well documented (Boneau, 1960; Glass, Peckham, & Sanders, 1972; Holloway & Dunn, 1967; Hsu, 1938; Scheffe, 1959). In cases where there are unequal population variances, the relationship between the actual Type I error rate ($\tau$) and the nominal Type I error rate ($\alpha$) is influenced by the sample sizes. When the samples are of equal size and sufficiently large, $\tau$ and $\alpha$ are near one another. In fact, Scheffe (1959, p. 339) has shown that for equal sized samples t is asymptotically standard normal, even though the two populations are non-normal or have unequal variances. However, Ramsey (1980) found there are boundary conditions where t is no longer robust to violations of homoscedasticity even with equal sized samples selected from normal populations.
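Results from numerous studies (Boneau, 1960; Hsu, 1938; Pratt, 1964; Scheffe, 1959) have shown that when the sample sizes are unequal and the larger sample is selected from the population with the larger variance (known as the positive condition), the t test is conservative (that is, $\tau < \alpha$). Conversely, when the larger sample is selected from the population with the smaller variance (known as the negative condition), the t test is liberal (that is, $\tau > \alpha$).

Alternatives to the Independent Samples t Test

According to Yao (1965), Behrens (1929) was the first to propose a solution to the problem of testing the equality of two population means without assuming equal population variances. The problem has come to be known as the Behrens-Fisher problem. Fisher (1935, 1939) noted that Behrens's solution could be derived using Fisher's concept of fiducial distributions.

A number of other tests have been developed to test the hypothesis $H_0: \mu_1 = \mu_2$ in situations in which $\sigma_1^2 \neq \sigma_2^2$. Welch (1947) reported several tests in which the test statistic is

$$t_v = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}.$$

The critical value is different for the various tests. There are two types of critical values: (a) approximate degrees of freedom (APDF), and (b) series. The APDF critical value (Welch, 1938) is a fractile of Student's t distribution with

$$f = \frac{\left( \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2} \right)^2}{\dfrac{\sigma_1^4}{n_1^2 (n_1 - 1)} + \dfrac{\sigma_2^4}{n_2^2 (n_2 - 1)}}$$

degrees of freedom. In practice, the estimator of $f$ is obtained by replacing the parameters $\sigma_i^2$ by the statistics $s_i^2$ ($i = 1, 2$). In the literature, the test using this estimator is referred to as the Welch test.

Welch (1947) expressed the series critical value as a function of $s_1^2$, $s_2^2$, and $\alpha$, and developed the series critical value in powers of $(n_i - 1)^{-1}$. The first three terms of the series critical value are shown in Table 1. The zero-order term is simply a fractile ($z$) of the standard normal distribution; using the zero-order term as the critical value [...]

Table 1
Critical Value Terms for Welch's (1947) Zero-, First-, and Second-Order Series Solutions

Power of $(n_i - 1)^{-1}$: Zero
Term: $z$

Power of $(n_i - 1)^{-1}$: One
Term: $z \left[ \dfrac{1 + z^2}{4} \right] \dfrac{\sum_{i=1}^{2} s_i^4 / [n_i^2 (n_i - 1)]}{\left( \sum_{i=1}^{2} s_i^2 / n_i \right)^2}$

Power of $(n_i - 1)^{-1}$: Two
Term: $-z \left[ \dfrac{1 + z^2}{2} \right] \dfrac{\sum_{i=1}^{2} s_i^4 / [n_i^2 (n_i - 1)^2]}{\left( \sum_{i=1}^{2} s_i^2 / n_i \right)^2} + z \left[ \dfrac{3 + 5z^2 + z^4}{3} \right] \dfrac{\sum_{i=1}^{2} s_i^6 / [n_i^3 (n_i - 1)^2]}{\left( \sum_{i=1}^{2} s_i^2 / n_i \right)^3} - z \left[ \dfrac{15 + 32z^2 + 9z^4}{32} \right] \dfrac{\left( \sum_{i=1}^{2} s_i^4 / [n_i^2 (n_i - 1)] \right)^2}{\left( \sum_{i=1}^{2} s_i^2 / n_i \right)^4}$

[...] whereas the second-order critical value is the sum of all three terms. As the sample sizes decline, there is a greater need for the more complicated critical values.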
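The Welch APDF procedure described above can be sketched as follows. This is an illustrative computation under the usual form of the statistic, with the sample variances substituted for the population variances in the degrees of freedom, as the text describes. The function name `welch_apdf` and the data are hypothetical; the critical value itself (a fractile of Student's t with the estimated degrees of freedom) is not computed here.

```python
from math import sqrt
from statistics import mean, variance


def welch_apdf(sample1, sample2):
    """Return (t_v, estimated degrees of freedom) for two samples."""
    n1, n2 = len(sample1), len(sample2)
    # Estimated squared standard errors of the two sample means.
    v1 = variance(sample1) / n1
    v2 = variance(sample2) / n2

    # Test statistic: mean difference over the unpooled standard error.
    t_v = (mean(sample1) - mean(sample2)) / sqrt(v1 + v2)

    # Approximate degrees of freedom with s_i^2 in place of sigma_i^2.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_v, df


# Hypothetical data: equal sample sizes, unequal variances.
t_v, df = welch_apdf([1, 2, 3], [4, 6, 8])
print(t_v, df)
```

Because the estimated degrees of freedom are generally fractional, software that evaluates the t distribution at non-integer degrees of freedom is needed to obtain the critical value.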
James (1951) and James (1954) generalized the Welch series solutions to the G-sample case and multivariate cases, respectively. Consequently, tests using the series solution are referred to as James's first-order and second-order tests. The zero-order test is often referred to as the asymptotic test. Aspin (1948) reported the third- and fourth-order terms, and investigated, for equal sized samples, variation in the first- through fourth-order critical values.

Wilcox (1989) proposed a modification to the asymptotic test. The Wilcox statistic [...] is asymptotically distributed as a standard normal distribution. Here $\tilde{x}_i$ ($i = 1, 2$) are biased estimators of the population means which result in improved empirical Type I error rates (Wilcox, 1989).

The literature suggests the following conclusions in the two-sample case regarding the control of Type I error rates [...] series tests, Brown-Forsythe test, and Wilcox test: (a) the performance of the Welch test and Brown-Forsythe test is superior to the t test; (b) the Wilcox test and James second-order test are superior to the Welch APDF test; and (c) for most applications in education and the social sciences where data are sampled from normal distributions under heteroscedasticity, the Welch APDF test is adequate.

Scheffe (1970) examined [...] different tests including the Welch APDF test from the standpoint of the Neyman-Pearson school of thought. Scheffe concluded the Welch test, which requires only the easily accessible t-table, is a satisfactory practical solution to the Behrens-Fisher problem. Wang (1971) examined the Behrens-Fisher test, Welch APDF test, and Welch-Aspin series test (Aspin, 1948; Welch, 1947). Wang found the Welch APDF test to be superior to the Behrens-Fisher test when combining over the experimental conditions considered. Wang found $|\tau - \alpha|$ was smaller for the Welch-Aspin series test than for the Welch APDF test. Wang noted, however, that the Welch-Aspin series critical values were limited to a select set of sample sizes and nominal Type I error rates.
Wang concluded that, in practice, one can just use the usual t-table to carry out the Welch APDF test without much loss of accuracy. However, the Welch APDF test becomes conservative with very long-tailed symmetric distributions (Yuen, 1974). Wilcox [...] Wilcox test tended to outperform the Welch test. Moreover, over all conditions, the range of $\tau$ was (.032, .065) for $\alpha = .05$, indicating the Wilcox test may have appropriate Type I error rates under heteroscedasticity and nonnormality.

In summary, the independent samples t test is generally acceptable in terms of controlling Type I error rates provided there are sufficiently large equal sized samples, even when the assumption of homoscedasticity is violated. For unequal sized samples, however, an alternative that does not assume equal population variances, such as the Wilcox test or the James second-order series test, is preferable.

ANOVA F Test

The ANOVA F is used to test the hypothesis of the equality of G population means when independent random samples are drawn from populations which are normally distributed and have equal population variances. The test statistic

$$F = \frac{\sum_{i=1}^{G} n_i (\bar{x}_i - \bar{x})^2 / (G - 1)}{\sum_{i=1}^{G} (n_i - 1) s_i^2 / (N - G)}$$

has an F distribution with $G - 1$ and $N - G$ degrees of freedom.

Numerous studies have shown that the ANOVA F test is not robust to violations of the assumption of homoscedasticity (Clinch & Keselman, 1982; Brown & Forsythe, 1974; Kohr [...]). [...] test with one exception. Whereas the independent samples t is generally robust when large sample sizes are equal, the ANOVA F may not maintain adequate control of Type I error rates even with equal sized samples if the degree of heteroscedasticity is large (Rogan & Keselman, 1977; Tomarken & Serlin, 1986). In the positive condition the F test is conservative and in the negative condition the F test is liberal (Box, 1954; Brown & Forsythe, 1974; Clinch & Keselman, 1982; Horsnell, 1953; Rogan & Keselman, 1977; Wilcox, 1988).

Alternatives to the ANOVA F Test

A number of tests have been developed to test the hypothesis $H_0: \mu_1 = \mu_2 = \cdots$
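$= \mu_G$ in situations in which $\sigma_i^2 \neq \sigma_j^2$ (for at least one pair of $i$ and $j$). Welch (1951) generalized the Welch (1938) APDF solution and proposed the statistic

$$F_v = \frac{\sum_{i=1}^{G} w_i (\bar{x}_i - \tilde{x})^2 / (G - 1)}{1 + \dfrac{2(G - 2)}{G^2 - 1} \sum_{i=1}^{G} \dfrac{1}{f_i} \left( 1 - \dfrac{w_i}{w} \right)^2}$$

where $w_i = n_i / s_i^2$, $w = \sum_{i=1}^{G} w_i$, $\tilde{x} = \sum_{i=1}^{G} w_i \bar{x}_i / w$, and $f_i = n_i - 1$, $i = 1, \ldots, G$. The statistic is approximately distributed as F with $G - 1$ and

$$\left[ \frac{3}{G^2 - 1} \sum_{i=1}^{G} \frac{1}{f_i} \left( 1 - \frac{w_i}{w} \right)^2 \right]^{-1}$$

degrees of freedom.

James (1951) generalized the Welch (1947) series solutions, proposing the test statistic

$$J = \sum_{i=1}^{G} w_i (\bar{x}_i - \tilde{x})^2$$

where $w_i = n_i / s_i^2$, $w = \sum_{i=1}^{G} w_i$, and $\tilde{x} = \sum_{i=1}^{G} w_i \bar{x}_i / w$. [...] freedom. If sample sizes are not sufficiently large, however, the distribution of the test statistic may not be accurately approximated by a chi-square distribution with $G - 1$ degrees of freedom. James (1951) derived a series expression $2h(\alpha)$, which is a function of the sample variances, such that $P[J > 2h(\alpha)] = \alpha$. James found approximations to $2h(\alpha)$ of orders $1/f_i$ and $1/f_i^2$, where $f_i = n_i - 1$. For the first-order test, James found, to order $1/f_i$, the critical value

$$2h(\alpha) = \chi^2_{G-1;\alpha} \left[ 1 + \frac{3\chi^2_{G-1;\alpha} + G + 1}{2(G^2 - 1)} \sum_{i=1}^{G} \frac{1}{f_i} \left( 1 - \frac{w_i}{w} \right)^2 \right].$$

The null hypothesis is rejected in favor of the alternative hypothesis if $J > 2h(\alpha)$. James also provided a second-order solution which approximates $2h(\alpha)$ to order $1/f_i^2$. James noted that the second-order test is very computationally intensive.

Brown and Forsythe (1974) proposed the test statistic

$$F^* = \frac{\sum_{i=1}^{G} n_i (\bar{x}_i - \bar{x})^2}{\sum_{i=1}^{G} (1 - n_i/N)\, s_i^2}.$$

The statistic is approximately distributed as F with $G - 1$ and

$$f = \left[ \sum_{i=1}^{G} \frac{c_i^2}{n_i - 1} \right]^{-1}, \qquad c_i = \frac{(1 - n_i/N)\, s_i^2}{\sum_{j=1}^{G} (1 - n_j/N)\, s_j^2},$$

degrees of freedom. In the case of two groups, both the Brown-Forsythe test and the Welch (1951) APDF test are equivalent to the Welch (1938) APDF test.

Wilcox (1989) proposed the statistic

$$H_m = \sum_{i=1}^{G} w_i (\tilde{x}_i - \tilde{x})^2$$

where $w_i = [\ldots]$, $\tilde{x}_i = [\ldots]$, $w = \sum_{i=1}^{G} w_i$, and $\tilde{x} = \sum_{i=1}^{G} w_i \tilde{x}_i / w$. The statistic is approximately distributed as chi-square with $G - 1$ degrees of freedom.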
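As a rough sketch of the Welch (1951) APDF procedure summarized above, the function below computes the statistic and its two degrees of freedom. It assumes the commonly cited form of the test, with weights $w_i = n_i / s_i^2$ and a variance-weighted grand mean. The function name `welch_1951` and the data are hypothetical, and no p-value is computed.

```python
from statistics import mean, variance


def welch_1951(groups):
    """Return (statistic, numerator df, approximate denominator df)."""
    G = len(groups)
    n = [len(g) for g in groups]
    means = [mean(g) for g in groups]

    # Weights w_i = n_i / s_i^2 and their total w.
    w = [ni / variance(g) for ni, g in zip(n, groups)]
    w_total = sum(w)

    # Variance-weighted grand mean.
    x_tilde = sum(wi * mi for wi, mi in zip(w, means)) / w_total

    # Common term: sum of (1 - w_i / w)^2 / (n_i - 1).
    lam = sum((1 - wi / w_total) ** 2 / (ni - 1) for wi, ni in zip(w, n))

    numerator = sum(wi * (mi - x_tilde) ** 2 for wi, mi in zip(w, means)) / (G - 1)
    denominator = 1 + 2 * (G - 2) / (G ** 2 - 1) * lam
    df2 = (G ** 2 - 1) / (3 * lam)
    return numerator / denominator, G - 1, df2


# Hypothetical data: with two groups the result should agree with the
# squared two-sample Welch statistic and its approximate df.
stat, df1, df2 = welch_1951([[1, 2, 3], [4, 6, 8]])
print(stat, df1, df2)
```

Running the same two-group data through a Brown-Forsythe computation gives the same statistic and degrees of freedom, which illustrates the equivalence of the two tests when G = 2 that the text notes.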
The literature suggests the following conclusions about the performance of the [...] Brown-Forsythe, and Wilcox tests: (a) each of these alternatives to the ANOVA F is superior to [...]; (b) the Welch test outperforms the James first-order test; (c) the Welch and Brown-Forsythe tests are generally competitive with one another, however, the Welch test is preferred with data sampled from normal distributions while the Brown-Forsythe test is preferred with data sampled from skewed distributions; and (d) the Wilcox test and James second-order test outperform these other alternatives to the ANOVA F under the greatest variety of conditions.

Brown and Forsythe (1974) used Monte Carlo techniques to examine the ANOVA F, Brown-Forsythe, Welch APDF, and James zero-order procedures when (a) equal and unequal sized samples were selected from normal populations; (b) G was [...] or 10; (c) the ratio of the largest to the smallest sample size was [...]; (d) the ratio of the largest to the smallest standard deviation was [...]; and (e) total sample size ranged between [...]. For small sample sizes the critical value proposed by Welch is a better approximation to the true critical value than that proposed by James. Both the Welch APDF test and the Brown-Forsythe test have $\tau$ near $\alpha$ under the inequality of variances.

Kohr and Games (1974) examined the ANOVA F test, Box test, and Welch APDF test when (a) equal and unequal [...] 1.5, or 2; the ratio of the largest to the smallest standard deviation was [...] $\sqrt{10}$, or $\sqrt{13}$; and total sample size ranged between [...] and [...]. The best control of Type I error rates was demonstrated by the Welch APDF test. Kohr and Games concluded the Welch test may be used with confidence with the unequal sized samples and heteroscedastic conditions examined in their study. Kohr and Games concluded the Welch test was slightly liberal under heteroscedastic conditions; however, this bias was trivial compared to the inflated error rates of the F test and [...] test under comparable conditions.
Levy (1978) examined the Welch test when data were sampled from either the uniform, chi-square, or exponential distributions and found that under heteroscedasticity, the Welch test can be liberal.

Dijkstra and Werter (1981) compared the James second-order, Welch APDF, and Brown-Forsythe tests when (a) equal and unequal sized samples were selected from normal populations; (b) G was [...]; (c) the ratio of the largest to the smallest sample size was [...]; (d) total sample size ranged between 12 and [...]; and (e) the ratio of the largest to the smallest standard deviation was [...] or 3. Dijkstra and Werter concluded the James second-order test gave better control of Type I error rates than either the Brown-Forsythe or Welch APDF test.

Clinch and Keselman (1982) studied the ANOVA F, Welch [...] when (a) equal and unequal sized samples were selected from normal distributions, chi-square distributions with [...] degrees of freedom, or t distributions with five degrees of freedom; (b) G was [...]; (c) the ratio of the largest to the smallest sample size was [...] or 3; (d) total sample size was [...] 144; and (e) variances were either homoscedastic or heteroscedastic. The ANOVA F test was most affected by assumption violations. Type I error rates of the Welch test were above $\alpha$, especially in the negative case. The $F^*$ test provided the best Type I error control in that it generally only became nonrobust with extreme heteroscedasticity. Although both the Brown-Forsythe test and the Welch test were liberal with skewed distributions, the tendency was stronger for the Welch test.
Tomarken and Serlin (1986) examined [...] tests including the ANOVA F test, Brown-Forsythe test, and Welch APDF test when (a) equal and unequal sized samples were selected from normal populations; (b) G was [...]; (c) the ratio of the largest to the smallest sample size was [...]; (d) total sample size ranged between 36 and [...]; and (e) the ratio of the largest to the smallest standard deviation was [...], 6, or [...]. Tomarken and Serlin found that the Brown-Forsythe test, though generally acceptable, was at least slightly liberal whether sample sizes were equal or directly or inversely [...].

Wilcox, Charlin, and Thompson (1986) examined Monte Carlo results on the robustness of the ANOVA F, Brown-Forsythe $F^*$, and the Welch APDF test when (a) equal and unequal sized samples were selected from normal populations; (b) G was [...] or 6; (c) the ratio of the largest to the smallest sample size was [...], 3, 3.3 [...]; (d) total sample size ranged between [...] and [...]; and (e) the ratio of the largest to the smallest standard deviation was [...] or 4. Wilcox, Charlin, and Thompson gave practical situations where both the Welch and $F^*$ tests may not provide adequate control over Type I error rates. For equal variances but unequal sized samples, the Welch test should be avoided in favor of the $F^*$ test, but for unequal sized samples and possibly unequal variances, the Welch test was preferred to the $F^*$ test.

Wilcox (1988) proposed $H$, a competitor to the Brown-Forsythe $F^*$, Welch APDF, and James second-order tests. Simulated equal and unequal sized samples were selected where (a) distributions were either normal, light-tailed symmetric, heavy-tailed symmetric, medium-tailed asymmetric, or exponential-like; (b) G was [...], or 10; (c) the ratio of the largest to the smallest sample size was [...]; (d) total sample size ranged between [...] and 100; and (e) the ratio of the largest to the smallest standard deviation was [...], 4, [...], or 9. These simulations indicated that under normality the Wilcox $H$ procedure always gave the experimenter more control over Type I error rates than did the $F^*$ test or Welch APDF test. Wilcox showed that, under normality, James's second-order test and Wilcox's $H$ test have $\tau$ much closer to $\alpha$ than the Welch or Brown-Forsythe tests. The Wilcox $H$ gave conservative results provided $n_i$ [...] ($i = 1, \ldots$
$, G$). Wilcox's results indicate the $H$ procedure has a Type I error rate that is similar to James's second-order method, regardless of the degree of heteroscedasticity. Although it is computationally more tedious, Wilcox recommended James's second-order procedure for general use.

Wilcox (1989) proposed $H_m$, an improvement to Wilcox's (1988) $H$ method, designed to be more comparable in power to James's second-order test. Wilcox compared James's second-order test with $H_m$ when (a) data were sampled from normal populations; (b) G was [...] or 6; (c) the ratio of the largest to the smallest sample size was [...]; (d) total sample size ranged between [...] and 121; and (e) the ratio of the largest to the smallest standard deviation was [...] or 6. Wilcox's results indicate that when applied to normal heteroscedastic data, $H_m$ has $\tau$ near $\alpha$ and slightly less power than James's second-order test. The main advantage of the improved Wilcox procedure is that it is much easier to use than James's second-order test, and is easily extended to higher-way designs.

Oshima and Algina (in press) studied Type I error rates [...]. These conditions were obtained by crossing the 31 conditions defined by sample sizes and standard deviations in the Wilcox (1988) study with five distributions: normal, uniform, t(5), beta(1.5, 8[...]), and exponential. The James second-order test and Wilcox $H_m$ test were both affected by nonnormality. When samples were selected from symmetric non-normal distributions, both James's second-order test and Wilcox's $H_m$ test maintained $\tau$ near $\alpha$. When the tests were applied to data sampled from asymmetric distributions, $|\tau - \alpha|$ increased. Further, as the degree of asymmetry increased, $|\tau - \alpha|$ tended to increase. The Brown-Forsythe test outperformed the Wilcox $H_m$ test and James's second-order test under some conditions; however, the reverse held under other conditions. Oshima and Algina concluded (a) the Wilcox $H_m$ test and James's second-order test were preferable to the Brown-Forsythe test, (b) James's second-order test was recommended for data sampled from a symmetric distribution, and (c) Wilcox's $H_m$ test was recommended for data sampled from a moderately skewed distribution.
In summary, when data are sampled from a normal distribution, the Wilcox $H_m$ test and James second-order test have better control of Type I error rates, particularly as the degree of heteroscedasticity gets large. All of these alternatives to the ANOVA F are affected by skewed data.

Hotelling's $T^2$ Test

Hotelling's (1931) $T^2$ is used to test the equality of two population mean vectors when independent random samples are selected from populations which are distributed multivariate normal and have equal dispersion matrices. The test statistic is given by

$$T^2 = \frac{n_1 n_2}{n_1 + n_2} (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)' \mathbf{S}^{-1} (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)$$

where

$$\mathbf{S} = \frac{\sum_{i=1}^{2} (n_i - 1) \mathbf{S}_i}{n_1 + n_2 - 2}.$$

Hotelling demonstrated that the transformation

$$F = \frac{n_1 + n_2 - p - 1}{p (n_1 + n_2 - 2)} T^2$$

has an F distribution with $p$ and $n_1 + n_2 - p - 1$ degrees of freedom.

The sensitivity of Hotelling's $T^2$ to violations of the assumption of homoscedasticity is well documented and has been investigated both analytically (Ito & Schull, 1964) and empirically (Algina & Oshima, 1990; Hakstian, Roed, & Lind, 1979; Holloway & Dunn, 1967; Hopkins & Clay, 1963). Ito and Schull (1964) investigated the large sample properties of $T^2$ in the presence of unequal dispersion matrices. Ito and Schull showed that in the case of two very large equal sized samples, $T^2$ is well behaved even when the dispersion [...] inequality of dispersion matrices provided the samples are very large. However, if the two samples are of unequal size, quite a large effect occurs on the level of significance from even moderate variations. Ito and Schull indicated that, asymptotically, with fixed $n_1/(n_1 + n_2)$ [...] and equal eigenvalues of $\Sigma_2 \Sigma_1^{-1}$, $\tau < \alpha$ when the eigenvalues are greater than one and $\tau > \alpha$ when the eigenvalues are less than one.

Hopkins and Clay (1963) examined distributions of Hotelling's $T^2$ with sample sizes of [...], 10, and [...] selected from either (a) bivariate normal populations with zero means and dispersion matrices of the form $\sigma_i^2 \mathbf{I}$, where $\sigma_2/\sigma_1$ was [...]; or (b) circular bivariate symmetrical leptokurtic populations with zero means, equal variances, and [...] was [...]. Hopkins and Clay reported that $T^2$ is robust to violations of homoscedasticity when $n_1 = n_2$ but that this robustness does not extend to disparate sample sizes.
Hopkins and Clay reported that upper tail frequencies of the distribution of Hotelling's $T^2$ are not substantially affected by moderate degrees of symmetrical leptokurtosis.

Holloway and Dunn (1967) examined the robustness of Hotelling's $T^2$ to violations of the homoscedasticity assumption when (a) equal and unequal sized samples were selected from multivariate normal distributions; (b) $p$ was [...]; [...] the eigenvalues of $\Sigma_2 \Sigma_1^{-1}$ were [...]. Holloway and Dunn found that equal sized samples help in keeping $\tau$ close to $\alpha$. Further, Holloway and Dunn found that for large equal sized samples, control of Type I error rates depends on the number of dependent variables ($p$). For example, when $n_i = 50$ ($i = 1, 2$) and the eigenvalues of $\Sigma_2 \Sigma_1^{-1} = 10$, $\tau$ is near $\alpha$ for $p = [\ldots]$ but $\tau$ markedly departs from $\alpha$ when $p = [\ldots]$ or $p = 10$. Holloway and Dunn found that generally as the number of dependent variables increases, or as the sample size decreases, $\tau$ increases.

Hakstian, Roed, and Lind (1979) obtained empirical sampling distributions of Hotelling's $T^2$ when (a) equal and unequal sized samples were selected from multivariate normal populations; (b) $p$ was [...] or 10; (c) $(n_1 + n_2)/p$ was [...] or 10; (d) $n_1/n_2$ was [...] or 5; and (e) dispersion matrices were of the form $\mathbf{I}$ and $\mathbf{D}$, where $\mathbf{D}$ was $d^2 \mathbf{I}$ or $\mathrm{diag}(1, \ldots, 1, d^2, \ldots, d^2)$ and $d$ was [...]. Hakstian, Roed, and Lind found that for equal sized samples the $T^2$ procedure is generally robust. With unequal sized samples, $T^2$ was shown to become increasingly less robust as dispersion heteroscedasticity and the number of dependent variables increase. Consequently, Hakstian, Roed, and Lind argued against the use of $T^2$ in the negative condition and for cautious use in the positive condition.

[...] $n_1/n_2$ [...]; the number of dependent variables was [...] or 20; and [...] the majority of conditions $\Sigma_2 = d^2 \Sigma_1$ ([...] 3.0). Algina and Oshima found that even with a small sample size ratio the $T^2$ procedure can be seriously nonrobust. For example, with [...] and [...] $.25\Sigma_1$, a sample size ratio as small as 1.1:1 can produce unacceptable Type I error rates. Algina and Oshima also confirmed earlier findings that Hotelling's $T^2$ test became less robust as the number of dependent variables and the degree of heteroscedasticity increased.
In summary, Hotelling's $T^2$ test is not robust to violations of the assumption of homoscedasticity even when there are equal sized samples, especially if the ratio of total sample size to number of dependent variables is small. When the larger sample is selected from the population with the larger dispersion matrix, $\tau < \alpha$. When the larger sample is selected from the population with the smaller dispersion matrix, $\tau > \alpha$. These tendencies increase with the inequality of the size of the two samples, the degree of heteroscedasticity, and the number of dependent variables. Therefore, the behavior of Hotelling's $T^2$ test is similar to that of the independent samples t test under violations of the assumption of homoscedasticity. Hence, it is desirable to examine robust alternatives that do not require this basic assumption.

Alternatives to the Hotelling's T² Test

A number of tests have been developed to test the hypothesis $H_0: \mu_1 = \mu_2$ in a situation in which $\Sigma_1 \neq \Sigma_2$. Alternatives to the Hotelling $T^2$ procedure that do not assume equality of the two population dispersion matrices include James's (1954) first- and second-order tests, Yao's (1965) test, and Johansen's (1980) test. Differing only in their critical values, all four tests use the test statistic

$$(\bar{x}_1 - \bar{x}_2)'\left(\frac{S_1}{n_1} + \frac{S_2}{n_2}\right)^{-1}(\bar{x}_1 - \bar{x}_2),$$

where $\bar{x}_i$ and $S_i$ are respectively the sample mean vector and sample dispersion matrix for the ith sample (i = 1, 2).

The literature suggests the following conclusions about control of Type I error rates under heteroscedastic conditions by Hotelling's $T^2$ test, James's first- and second-order tests, Yao's test, and Johansen's test: (a) Yao's test, James's second-order test, and Johansen's test are superior to James's first-order test; and (b) all of these alternatives to Hotelling's $T^2$ are sensitive to data sampled from skewed populations.

Yao (1965) conducted a Monte Carlo study to compare Type I error rates between the James first-order test and the Yao test when (a) equal and unequal sized samples were selected, (b) p was [...], (c) the ratio of total sample size to number of dependent variables was [...], and (d) the dispersion matrices were unequal. Although both procedures have $\tau$ near $\alpha$ under heteroscedasticity, Yao's test was superior to James's test.
Algina and Tang (1988) examined the performance of Hotelling's $T^2$, James's first-order test, and Yao's test when (a) p was [...] or 10; (b) N:p was [...], 10, or 20; (c) the ratio of the largest to the smallest sample size was [...], 1.25, [...]; and (d) the dispersion matrices were of the form I and D, where D was [...], diag{3, 1, 1, ..., 1}, diag{3, [...], ..., 1}, diag{1/3, 3, 3, ..., 3}, or diag{1/3, 1/3, ..., 3, 3, ..., 3}. Algina and Tang confirmed the superiority of Yao's test. Yao's test produced appropriate Type I error rates when [...], and for [...] appropriate error rates occurred when [...]. These results applied both to specific cases where one dispersion matrix was a multiple of the second ($\Sigma_2 = d^2\Sigma_1$) and to more complex cases of heteroscedasticity. When N:p [...] and [...], Algina and Tang found Yao's test to be liberal.

Algina, Oshima, and Tang (1991) studied Type I error rates of James's first- and second-order tests, Yao's test, and Johansen's test in various conditions defined by the degree of heteroscedasticity and nonnormality (uniform, Laplace, t(5), beta(5, 1.5), exponential, and lognormal distributions). The study indicated that these four alternatives to Hotelling's $T^2$ [...] positive kurtosis. Although the four procedures were seriously nonrobust with exponential and lognormal distributions, they were fairly robust with the remaining distributions. The performance of Yao's test, James's second-order test, and Johansen's test was slightly superior to the performance of James's first-order test. Algina, Oshima, and Tang indicate that [...] test is also sensitive to skewness.

In summary, Yao's test, James's second-order test, and Johansen's test work reasonably well under normality. Although all of these alternatives to Hotelling's $T^2$ test have elevated Type I error rates with skewed data, Johansen's test has the practical advantages of generalizing to G > 2 and being relatively easy to compute.
MANOVA Criteria

The four basic multivariate analysis of variance (MANOVA) criteria are used to test the equality of G population mean vectors when independent random samples are selected from populations which are distributed multivariate normal and have equal dispersion matrices. Define

$$H = \sum_{i=1}^{G} n_i(\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})' \quad\text{and}\quad E = \sum_{i=1}^{G}(n_i - 1)S_i.$$

The basic MANOVA criteria are functions of the eigenvalues of $HE^{-1}$. Define $\lambda_i$ to be the ith eigenvalue of $HE^{-1}$ (i = 1, ..., s), where s = min(p, G-1). Those criteria are

(a) Roy's (1945) largest root criterion, $R = \lambda_1/(1 + \lambda_1)$;

(b) the Hotelling-Lawley trace criterion (Hotelling, 1951; Lawley, 1938), $U = \operatorname{trace}(HE^{-1}) = \sum_{i=1}^{s}\lambda_i$;

(c) the Pillai-Bartlett trace criterion (Pillai, 1955; Bartlett, 1939), $V = \operatorname{trace}[H(H+E)^{-1}] = \sum_{i=1}^{s}\lambda_i/(1+\lambda_i)$; and

(d) Wilks's (1932) likelihood ratio criterion, $L = |E|/|H+E| = \prod_{i=1}^{s} 1/(1+\lambda_i)$.

Both analytic (Pillai & Sudjana, 1975) and empirical (Korin, 1972; Olson, 1974) investigations have been conducted on the robustness of the MANOVA criteria with respect to violations of homoscedasticity. Pillai and Sudjana (1975) examined violations of homoscedasticity on the four basic MANOVA criteria. Although the generalizability of the study is limited [...] heteroscedasticity, results were consistent: modest departures from $\alpha$ for minor degrees of heteroscedasticity and more pronounced departures with greater heteroscedasticity.

Korin (1972) studied Roy's largest root criterion (R), the Hotelling-Lawley trace criterion (U), and Wilks's likelihood ratio criterion (L) when (a) equal and unequal sized samples were selected from normal populations; (b) p was [...] or 4; (c) G was [...] or 6; (d) the ratio of total sample size to number of dependent variables was 8.25, [...], 15.[...], 18, or [...]; and (e) dispersion matrices were of the form I or D, where D was [...] 2$d^2$I ([...] 1.5 [...] 10). For small samples, even when the sample sizes were equal, dispersion heteroscedasticity produced Type I error rates greater than $\alpha$. Korin reported that the error rates for R were greater than those for U and L.
Olson (1974) conducted a Monte Carlo study of the comparative robustness of six multivariate tests, including the four basic MANOVA criteria, when (a) equal-sized samples were selected; (b) p was [...] or 10; (c) G was [...]; (d) [...] was [...]; and (e) dispersion matrices were of the form I or D, where D represented either a low or high degree of contamination. For the low degree of contamination, D = $d^2$I, whereas for the high degree of contamination, D = diag($pd^2 - p + 1$, 1, 1, ..., 1) ([...] = 2, [...], 10). Olson concluded that R should be avoided, while V may be recommended as the most robust of the MANOVA tests. In terms of the magnitude of the departure of $\tau$ from $\alpha$, the order was typically [...] > V. This tendency increased as the degree of heteroscedasticity increased. The departure from $\alpha$ increased with the increase in the number of dependent variables; however, the impact was not well defined. Additionally, for [...], $\tau$ decreased as sample size increased except when [...]. When [...], $\tau$ increased for all four basic MANOVA procedures, although the increase was least for [...].

Stevens (1979) contested Olson's (1976) claim that V is superior to L and U for general use in multivariate analysis of variance because of its greater robustness against unequal dispersion matrices. Stevens believed Olson's conclusions were tainted by using an example which had extreme subgroup variance differences, which occur very infrequently in practice. Stevens conceded V was the clear choice for diffuse structures; however, for concentrated noncentrality structures with dispersion heteroscedasticity, the actual Type I error rates for V, U, and L are very similar. Olson (1979) refuted Stevens's (1979) objections on practical grounds: the experimenter, faced with real data of unknown noncentrality and trying to follow Stevens's recommendation to use [...].

Alternatives to the MANOVA Criteria

A number of tests have been developed to test the hypothesis $H_0: \mu_1 = \mu_2 = \cdots = \mu_G$ in a situation in which $\Sigma_i \neq \Sigma_j$ (for at least one pair i ≠ j). James (1954) generalized James's (1951) series solutions and proposed the statistic

$$\sum_{i=1}^{G}(\bar{x}_i - \tilde{x})'\,W_i\,(\bar{x}_i - \tilde{x}),$$

where

$$W_i = \left(\frac{S_i}{n_i}\right)^{-1},\qquad W = \sum_{i=1}^{G}W_i,\qquad \tilde{x} = W^{-1}\sum_{i=1}^{G}W_i\bar{x}_i.$$

James (1954) derived zero-, first-, and second-order critical values which parallel those developed by James (1951).
Johansen (1980) generalized the Welch (1951) test and proposed using the James (1954) test statistic divided by

$$c = p(G-1) + 2A - \frac{6A}{p(G-1) + 2},$$

where

$$A = \sum_{i=1}^{G}\frac{\operatorname{trace}\left[(I - W^{-1}W_i)^2\right] + \left[\operatorname{trace}(I - W^{-1}W_i)\right]^2}{2(n_i - 1)}.$$

The critical value for the Johansen test is a fractile of an F distribution with $p(G-1)$ and $p(G-1)[p(G-1)+2]/(3A)$ degrees of freedom.

The literature suggests the following conclusions about control of Type I error rates when sampling from multivariate normal populations under heteroscedastic conditions by the four basic MANOVA criteria, James's first- and second-order tests, and Johansen's test: (a) the Pillai-Bartlett trace criterion is the most robust of the four basic MANOVA criteria; and (b) with unequal sized samples, Johansen's test and James's second-order test outperform the Pillai-Bartlett trace criterion and James's first-order test.

Ito (1969) analytically examined Type I error rates for James's zero-order test and showed that $|\tau - \alpha|$ increased as the variation in the sample sizes, the degree of heteroscedasticity, and the number of dependent variables increased, whereas $|\tau - \alpha|$ decreased as the total sample size increased.

Tang (1989) studied James's first- and second-order tests, the Pillai-Bartlett trace criterion, and Johansen's test when (a) equal and unequal sized samples were selected from multivariate normal populations; (b) p was 3 or 6; (c) G was [...]; (d) the ratio of total sample size to number of dependent variables was [...]; and (e) dispersion matrices were either of the form I or D, where D was $d^2$I, diag{1, $d^2$, $d^2$}, or diag{1/$d^2$, $d^2$, $d^2$} for p = 3, or D was $d^2$I, diag{1, 1, 1, $d^2$, $d^2$, $d^2$}, or diag{1/$d^2$, 1/$d^2$, 1/$d^2$, $d^2$, $d^2$, $d^2$} for p = 6 (d = [...] or 3). Results of the study indicate that when the sample sizes are unequal and the dispersion matrices are unequal, Johansen's test and James's second-order test perform better than the Pillai-Bartlett trace criterion and James's first-order test. While both Johansen's test and James's second-order test tended to have Type I error rates reasonably near $\alpha$, Johansen's test was slightly liberal whereas James's second-order test was slightly conservative. Additionally, the ratio of total sample size to number of dependent variables has a strong impact on the performance of the tests. Generally, as N:p increases the tests become more robust.
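The computation of Johansen's statistic can be sketched as follows. This is a minimal illustration under the assumption that the weights are $W_i = (S_i/n_i)^{-1}$ and that the scaling constant and degrees of freedom follow Johansen's (1980) published form; the function name `johansen_test` and the generated data are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def johansen_test(samples):
    """Johansen-type statistic for G samples (rows = observations).

    Returns (statistic, df1, df2), where the statistic is the James-type
    quadratic form divided by c and is referred to an F(df1, df2) distribution.
    """
    samples = [np.asarray(x, dtype=float) for x in samples]
    g = len(samples)
    p = samples[0].shape[1]

    means = [x.mean(axis=0) for x in samples]
    # W_i = (S_i / n_i)^{-1}
    weights = [
        np.linalg.inv(np.atleast_2d(np.cov(x, rowvar=False)) / x.shape[0])
        for x in samples
    ]
    w_sum = sum(weights)

    # Weighted grand mean: W^{-1} * sum_i W_i xbar_i
    grand = np.linalg.solve(w_sum, sum(w @ m for w, m in zip(weights, means)))

    # James-type statistic: sum_i (xbar_i - grand)' W_i (xbar_i - grand)
    james = sum((m - grand) @ w @ (m - grand) for w, m in zip(weights, means))

    # A = sum_i { tr[(I - W^{-1} W_i)^2] + [tr(I - W^{-1} W_i)]^2 } / [2 (n_i - 1)]
    a = 0.0
    for x, w in zip(samples, weights):
        r = np.eye(p) - np.linalg.solve(w_sum, w)
        a += (np.trace(r @ r) + np.trace(r) ** 2) / (2 * (x.shape[0] - 1))

    q = p * (g - 1)
    c = q + 2 * a - 6 * a / (q + 2)
    return james / c, q, q * (q + 2) / (3 * a)


# Made-up example: three bivariate samples with unequal sizes and spreads.
rng = np.random.default_rng(0)
groups = [rng.normal(0.0, scale, size=(n, 2)) for n, scale in [(10, 1.0), (13, 1.5), (20, 3.0)]]
statistic, df1, df2 = johansen_test(groups)
print(statistic, df1, df2)
```

With p = 1 and G = 2 the constant c reduces to 1, so the sketch returns the squared Welch t statistic with the Satterthwaite degrees of freedom, which is one way to sanity-check an implementation.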
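In summary, the Pillai-Bartlett trace criterion appears to be the most robust of the four basic MANOVA criteria to violations of the assumption of dispersion homoscedasticity. In controlling Type I error rates, the Johansen test and the James second-order test are more effective than either the Pillai-Bartlett trace criterion or the James first-order test. Finally, the Johansen test has the practical advantage of being less computationally intensive than the James second-order test.

CHAPTER 3
METHODOLOGY

In this chapter, the development of the test statistics, the design, and the simulation procedure are described. The test statistics extend the work of Brown and Forsythe (1974) and Wilcox (1989). The design is based upon a review of the relevant literature and upon the consideration that the experimental conditions used in the simulation should be similar to those found in educational research.

Development of Test Statistics

Brown-Forsythe Generalizations

To test $H_0: \mu_1 = \cdots = \mu_G$, Brown and Forsythe (1974) proposed the statistic

$$F^* = \frac{\sum_{i=1}^{G} n_i(\bar{x}_i - \bar{x})^2}{\sum_{i=1}^{G}\left(1 - \frac{n_i}{N}\right)s_i^2}.$$

The statistic $F^*$ is approximately distributed as F with G-1 and f degrees of freedom, where

$$f = \left[\sum_{i=1}^{G}\frac{c_i^2}{n_i - 1}\right]^{-1},\qquad c_i = \frac{\left(1 - \frac{n_i}{N}\right)s_i^2}{\sum_{j=1}^{G}\left(1 - \frac{n_j}{N}\right)s_j^2}.$$

Suppose $\bar{x}_1, \ldots, \bar{x}_G$ are p-dimensional sample mean vectors and $S_1, \ldots, S_G$ are p-dimensional dispersion matrices of independent random samples of sizes $n_1, \ldots, n_G$, respectively, from G multivariate normal distributions $N_p(\mu_1, \Sigma_1), \ldots, N_p(\mu_G, \Sigma_G)$. To extend the Brown-Forsythe statistic to the multivariate setting, replace means by their corresponding mean vectors and replace variances by their corresponding dispersion matrices. Define

$$H = \sum_{i=1}^{G} n_i(\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})' \quad\text{and}\quad M = \sum_{i=1}^{G}\left(1 - \frac{n_i}{N}\right)S_i.$$

The $(1 - n_i/N)S_i$ (i = 1, ..., G) are distributed independently as Wishart, and M is said to have a sum of Wisharts distribution. Nel and van der Merwe (1986) have generalized Satterthwaite's (1946) results and approximated the sum of Wisharts distribution by a Wishart distribution with f degrees of freedom. Applying the Nel and van der Merwe results to M, the quantity f is the approximate degrees of freedom of M and is given by

$$f = \frac{\operatorname{trace}\left[\left(\sum_{i=1}^{G}c_iS_i\right)^2\right] + \left[\operatorname{trace}\left(\sum_{i=1}^{G}c_iS_i\right)\right]^2}{\sum_{i=1}^{G}(n_i - 1)^{-1}\left\{\operatorname{trace}\left[(c_iS_i)^2\right] + \left[\operatorname{trace}(c_iS_i)\right]^2\right\}},\qquad c_i = 1 - \frac{n_i}{N}.$$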
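The problem is to construct a test statistic and determine critical values. The approach used in this study is to construct test statistics analogous to those developed by Lawley-Hotelling (U), Pillai-Bartlett (V), and Wilks (L). Define

$$H = \sum_{i=1}^{G} n_i(\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})' \quad\text{and}\quad E = \sum_{i=1}^{G}(n_i - 1)S_i.$$

Then the test statistics for the Hotelling-Lawley trace criterion, the Pillai-Bartlett trace criterion, and the Wilks likelihood ratio criterion are, respectively,

$$U = \operatorname{trace}(HE^{-1}),\qquad V = \operatorname{trace}[H(H+E)^{-1}],\qquad L = \frac{|E|}{|H+E|}.$$

Approximate F transformations can be used with each of these test statistics. Define the following variables:

h = G - 1 = number of independent variables (the degrees of freedom for the multivariate analog to sums of squares between groups)
e = N - G (the degrees of freedom for the multivariate analog to sums of squares within groups)
s = min(p, h)
m = .5(|p - h| - 1)
n = .5(e - p - 1)

For the Hotelling-Lawley criterion, the transformations developed by Hughes and Saw (1972) and McKeon (1974), respectively, are given by

$$\frac{2(sn+1)}{s^2(2m+s+1)}\,U \sim F_{s(2m+s+1),\,2(sn+1)}$$

and

$$\frac{2n\,a}{ph\,(a-2)}\,U \sim F_{ph,\,a},$$

where

$$a = 4 + \frac{ph+2}{b-1} \quad\text{and}\quad b = \frac{(2n+h)(2n+p)}{2(n-1)(2n+1)}.$$

For the Pillai-Bartlett criterion, the SAS (1985, p.12) transformation is given by

$$\frac{2n+s+1}{2m+s+1}\cdot\frac{V}{s-V} \sim F_{s(2m+s+1),\,s(2n+s+1)}.$$

For the Wilks criterion, the Rao (1952, p.262) transformation is given by

$$\frac{1 - L^{1/t}}{L^{1/t}}\cdot\frac{rt - 2q}{ph} \sim F_{ph,\,rt-2q},$$

where

$$r = e - \frac{p-h+1}{2},\qquad q = \frac{ph-2}{4},\qquad t = \sqrt{\frac{p^2h^2-4}{p^2+h^2-5}}\ \text{ if } p^2+h^2-5>0,\quad t = 1 \text{ otherwise}.$$

Scale of the measures of between and within group variability. Consider the univariate (p = 1) case. The denominator of the Brown-Forsythe statistic is

$$\sum_{i=1}^{G}\left(1 - \frac{n_i}{N}\right)s_i^2 = \sum_{i=1}^{G}s_i^2 - \sum_{i=1}^{G}\frac{n_i}{N}s_i^2 = G\bar{s}^2 - \tilde{s}^2.$$

Here $\bar{s}^2$ is the arithmetic average of the G sample variances and $\tilde{s}^2$ is the average of the G sample variances weighted by their respective sample sizes. Because both are averages of the sample variances, the denominator is on the order of G - 1 times an average variance, where G - 1 is the degrees of freedom for the sum of squares between groups. Because the numerator is the between group sum of squares, the Brown-Forsythe statistic is in the metric of the ratio of two mean squares. The MANOVA criteria, in contrast, are in the metric of the ratio of two sums of squares. Consider the common MANOVA criteria in the univariate setting.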
For Hotelling-Lawley, Pillai-Bartlett, and Wilks, respectively, U = SSBG/SSWG, V = SSBG/(SSBG+SSWG), and L = SSWG/(SSBG+SSWG). In each case the test statistics are functions of the sums of squares rather than mean squares. Hence, in order to use criteria analogous to U, V, and L, E must be replaced by (f/h)M.

Let $\phi_i$ (i = 1, ..., s) denote the ith eigenvalue of the characteristic equation $|H - \phi(f/h)M| = 0$. One statistic to consider would be analogous to Roy's largest root criterion (1945). Of the four basic MANOVA criteria, Roy's largest root criterion is the most affected by heteroscedasticity (Olson, 1974, 1976, 1979; Stevens, 1979). Consequently, it will be omitted. The Lawley-Hotelling trace (Hotelling, 1951; Lawley, 1938) is based upon the same characteristic equation as Roy's largest root criterion (1945). In this case, the analogous statistic

$$U^* = \operatorname{trace}\left(H[(f/h)M]^{-1}\right) = \sum_{i=1}^{s}\phi_i$$

provides one of the test statistics of interest. Let $\theta_i$ (i = 1, ..., s) denote the ith eigenvalue of the characteristic equation $|H - \theta[H + (f/h)M]| = 0$. Then the Pillai-Bartlett trace (Bartlett, 1939; Pillai, 1955)

$$V^* = \operatorname{trace}\left(H[H + (f/h)M]^{-1}\right) = \sum_{i=1}^{s}\theta_i$$

provides another test statistic of interest. Similarly, if $\xi_i$ is the ith eigenvalue of the characteristic equation $|(f/h)M - \xi(H + (f/h)M)| = 0$, then the analog of Wilks's (1932) criterion is defined as

$$L^* = \frac{|(f/h)M|}{|H + (f/h)M|} = \prod_{i=1}^{s}\xi_i.$$

To conduct hypothesis testing, approximate F transformations were used with each of these analogous test statistics, replacing N - G, the degrees of freedom for E, by the approximate degrees of freedom f. Thus, the variables are defined as follows:

h = G - 1 = number of independent variables (the degrees of freedom for the multivariate analog to sums of squares between groups)
s = min(p, h)
m = .5(|p - h| - 1)
n* = .5(f - p - 1)

where

$$f = \frac{\operatorname{trace}\left[\left(\sum_{i=1}^{G}c_iS_i\right)^2\right] + \left[\operatorname{trace}\left(\sum_{i=1}^{G}c_iS_i\right)\right]^2}{\sum_{i=1}^{G}(n_i - 1)^{-1}\left\{\operatorname{trace}\left[(c_iS_i)^2\right] + \left[\operatorname{trace}(c_iS_i)\right]^2\right\}},\qquad c_i = 1 - \frac{n_i}{N}.$$

For the modified Hotelling-Lawley criterion, the Hughes and Saw (1972) and McKeon (1974) transformations, respectively, are now given by

$$\frac{2(sn^*+1)}{s^2(2m+s+1)}\,U^* \sim F_{s(2m+s+1),\,2(sn^*+1)}$$

and

$$\frac{2n^*a^*}{ph\,(a^*-2)}\,U^* \sim F_{ph,\,a^*},$$

where

$$a^* = 4 + \frac{ph+2}{b^*-1} \quad\text{and}\quad b^* = \frac{(2n^*+h)(2n^*+p)}{2(n^*-1)(2n^*+1)}.$$

For the modified Pillai-Bartlett criterion, the SAS (1985, p.12) transformation is now given by

$$\frac{2n^*+s+1}{2m+s+1}\cdot\frac{V^*}{s-V^*} \sim F_{s(2m+s+1),\,s(2n^*+s+1)}.$$

For the modified Wilks criterion, the Rao (1952, p.262) transformation is now given by

$$\frac{1 - L^{*1/t}}{L^{*1/t}}\cdot\frac{r^*t - 2q}{ph} \sim F_{ph,\,r^*t-2q},$$

where

$$r^* = f - \frac{p-h+1}{2},\qquad q = \frac{ph-2}{4},\qquad t = \sqrt{\frac{p^2h^2-4}{p^2+h^2-5}}\ \text{ if } p^2+h^2-5>0,\quad t = 1 \text{ otherwise}.$$

Equality of expectation of the measures of between and within group dispersion. The Brown-Forsythe statistic was constructed so that, under the null hypothesis, the expectations of the numerator and denominator are equal. To show that the proposed multivariate generalization of the Brown-Forsythe statistic possesses the analogous property (that is, E(H) = E(M), assuming $H_0$ true), the following results are useful:

$$E(\bar{x}_i) = \mu,\qquad E(\bar{x}) = \mu,$$

$$E(\bar{x}_i\bar{x}_i') = \operatorname{var}(\bar{x}_i) + \mu\mu' = \frac{\Sigma_i}{n_i} + \mu\mu',$$

$$E(\bar{x}\bar{x}') = \operatorname{var}(\bar{x}) + \mu\mu' = \frac{1}{N^2}\sum_{i=1}^{G}n_i\Sigma_i + \mu\mu'.$$

Using these results, E(M) is given by

$$E(M) = E\left[\sum_{i=1}^{G}\left(1 - \frac{n_i}{N}\right)S_i\right] = \sum_{i=1}^{G}\left(1 - \frac{n_i}{N}\right)\Sigma_i.$$

Similarly, writing $H = \sum_{i=1}^{G}n_i\bar{x}_i\bar{x}_i' - N\bar{x}\bar{x}'$, E(H) is given by

$$E(H) = \sum_{i=1}^{G}n_i\left(\frac{\Sigma_i}{n_i} + \mu\mu'\right) - N\left(\frac{1}{N^2}\sum_{i=1}^{G}n_i\Sigma_i + \mu\mu'\right) = \sum_{i=1}^{G}\Sigma_i - \frac{1}{N}\sum_{i=1}^{G}n_i\Sigma_i = \sum_{i=1}^{G}\left(1 - \frac{n_i}{N}\right)\Sigma_i.$$

Hence, E(H) = E(M). Thus the modified Brown-Forsythe generalizations parallel the basic MANOVA criteria in terms of the measure of between group dispersion, the measure of within group dispersion, the metric of between and within group dispersion, and the equality of the expectation of the measures of between and within group dispersion.

Wilcox Generalization

To test $H_0$, Wilcox (1989) proposed using the test statistic

$$H = \sum_{i=1}^{G}w_i(\tilde{x}_i - \tilde{x})^2,$$

where $w_i = n_i/s_i^2$, $\tilde{x} = \sum_i w_i\tilde{x}_i / \sum_i w_i$, and $\tilde{x}_i$ [...]. The statistic is approximately distributed as chi-square with G - 1 degrees of freedom. To extend this test to the multivariate setting, define

$$\sum_{i=1}^{G}(\tilde{x}_i - \tilde{x})'\,W_i\,(\tilde{x}_i - \tilde{x}),$$

where $W_i = n_iS_i^{-1}$, $W = \sum_i W_i$, $\tilde{x} = W^{-1}\sum_i W_i\tilde{x}_i$, and $\tilde{x}_i$ [...]. The statistic is approximately distributed as chi-square with p(G - 1) degrees of freedom.

Invariance Property of the Test Statistics

Samples in this experiment were selected from either a contaminated population or an uncontaminated population. The subset of populations labeled as uncontaminated had the identity matrix as their common dispersion matrix.
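The subset of populations categorized as contaminated had a common diagonal matrix. That the matrices are limited to these forms entails no loss of generality beyond the forms of heteroscedasticity investigated, due to a well known theorem (Anderson, 1958) and the invariance characteristic of the test statistics.

First, if $\Sigma_i$ and $\Sigma_j$ are positive definite, there exists a p×p nonsingular matrix T such that $T\Sigma_iT' = I$ and $T\Sigma_jT' = D$, where I is a p×p identity matrix and D is a p×p diagonal matrix (Anderson, 1958). Hence, when the design includes two population subsets with common dispersion matrices within a given subset, including only diagonal matrices in each simulated experiment is not an additional limitation on generalizability.

Second, the test statistics are invariant with respect to transformations y = Tx, where T is a p×p nonsingular transformation matrix.

Brown-Forsythe Generalizations. Let $\bar{x}_i^*$ and $S_i^*$ denote the sample mean vector and sample dispersion matrix calculated using y = Tx. It is well known that $\bar{x}_i^* = T\bar{x}_i$ and $S_i^* = TS_iT'$. Let $H^*$ and $M^*$ be calculated using $\bar{x}_i^*$ and $S_i^*$. It is well known that $H^* = THT'$. Now

$$M^* = \sum_{i=1}^{G}\left(1 - \frac{n_i}{N}\right)TS_iT' = TMT'.$$

For the modified Hotelling-Lawley trace criterion,

$$\operatorname{trace}\left\{H^*\left[\tfrac{f}{h}M^*\right]^{-1}\right\} = \operatorname{trace}\left\{THT'\left[\tfrac{f}{h}TMT'\right]^{-1}\right\} = \operatorname{trace}\left\{H\left[\tfrac{f}{h}M\right]^{-1}\right\}.$$

Similarly, for the modified Pillai-Bartlett trace criterion,

$$\operatorname{trace}\left\{H^*\left[H^* + \tfrac{f}{h}M^*\right]^{-1}\right\} = \operatorname{trace}\left\{THT'\left[THT' + \tfrac{f}{h}TMT'\right]^{-1}\right\} = \operatorname{trace}\left\{H\left[H + \tfrac{f}{h}M\right]^{-1}\right\}.$$

For the modified Wilks likelihood ratio criterion,

$$\frac{\left|\tfrac{f}{h}M^*\right|}{\left|H^* + \tfrac{f}{h}M^*\right|} = \frac{|T|\left|\tfrac{f}{h}M\right||T'|}{|T|\left|H + \tfrac{f}{h}M\right||T'|} = \frac{\left|\tfrac{f}{h}M\right|}{\left|H + \tfrac{f}{h}M\right|}.$$

Wilcox Generalization. With $W_i = n_iS_i^{-1}$,

$$W_i^* = n_i(TS_iT')^{-1} = (T')^{-1}W_iT^{-1},\qquad W^* = \sum_{i=1}^{G}W_i^* = (T')^{-1}WT^{-1},$$

and

$$\tilde{x}^* = (W^*)^{-1}\sum_{i=1}^{G}W_i^*\,T\tilde{x}_i = T\,W^{-1}\sum_{i=1}^{G}W_i\tilde{x}_i = T\tilde{x}.$$

Using these results, the statistic is shown to be invariant as follows:

$$\sum_{i=1}^{G}(T\tilde{x}_i - T\tilde{x})'\,W_i^*\,(T\tilde{x}_i - T\tilde{x}) = \sum_{i=1}^{G}(\tilde{x}_i - \tilde{x})'\,T'(T')^{-1}W_iT^{-1}T\,(\tilde{x}_i - \tilde{x}) = \sum_{i=1}^{G}(\tilde{x}_i - \tilde{x})'\,W_i\,(\tilde{x}_i - \tilde{x}).$$

Hence there is no loss of generality in solely using diagonal matrices to simulate experiments in which there are only two sets of dispersion matrices. It should be noted, however, that when there are more than two sets of differing dispersion matrices, the matrices cannot always be simultaneously diagonalized by a transformation matrix T.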
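The invariance argument for the Brown-Forsythe generalizations can be checked numerically. The sketch below is an illustration only: the helper names, the generated data, the transformation matrix, and the fixed scalar `k` (standing in for the multiplier f/h, which is held constant here as it is in the algebra above) are all assumptions made for the example. NumPy is assumed to be available.

```python
import numpy as np


def between_within(samples):
    """H = sum n_i (xbar_i - xbar)(xbar_i - xbar)',  M = sum (1 - n_i/N) S_i."""
    n_total = sum(len(x) for x in samples)
    grand = np.vstack(samples).mean(axis=0)
    p = samples[0].shape[1]
    h = np.zeros((p, p))
    m = np.zeros((p, p))
    for x in samples:
        d = x.mean(axis=0) - grand
        h += len(x) * np.outer(d, d)
        m += (1 - len(x) / n_total) * np.cov(x, rowvar=False)
    return h, m


def criteria(h, m, k):
    """Trace and determinant criteria with k*M in the role of (f/h)M."""
    e = k * m
    u = np.trace(h @ np.linalg.inv(e))        # Hotelling-Lawley analog
    v = np.trace(h @ np.linalg.inv(h + e))    # Pillai-Bartlett analog
    l = np.linalg.det(e) / np.linalg.det(h + e)  # Wilks analog
    return u, v, l


rng = np.random.default_rng(1)
samples = [rng.normal(size=(n, 3)) for n in (8, 10, 13)]

# Any nonsingular p x p matrix will do; this one has determinant 7.
T = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])
transformed = [x @ T.T for x in samples]  # y = T x applied to every observation

k = 5.0  # hypothetical fixed value standing in for f/h
original = criteria(*between_within(samples), k)
after = criteria(*between_within(transformed), k)
print(np.allclose(original, after))
```

The check mirrors the derivation term by term: H and M transform to THT' and TMT', and the traces and the determinant ratio are unchanged.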
Design

Eight factors were considered in the study. These are described in the following paragraphs.

Distribution type (DT). Two types of distributions, normal and exponential, were included in the study. Pearson and Please (1975) suggested that studies of robustness should focus on distributions with skewness and kurtosis having magnitudes less than [...] and 0.6, respectively. However, there is evidence to suggest these boundaries are unnecessarily restrictive. For example, Kendall and Stuart (1963, p.57) reported data on the time of marriage for over 300,000 Australians. The skewness and kurtosis were 2.0 and [...], respectively. Micceri (1989) investigated the distributional characteristics of 440 achievement and psychometric measures. Of these 440 data sets, 15.2% had both tails with weights at or about Gaussian, 49.1% had at least one extremely heavy tail, and 18.0% had both tail weights less than Gaussian. With respect to asymmetry, Micceri found 28.4% of the distributions [...] being extremely asymmetric. Of the distributions considered, 11.4% were classified within a category having skewness as extreme as [...]. The Micceri study underscores the common occurrence of distributions that are nonnormal. Further, the Micceri study suggests the Pearson and Please criterion may be too restrictive.

For the normal distribution the coefficients of skewness ($\mu_3/\mu_2^{3/2}$) and kurtosis ($\mu_4/\mu_2^2 - 3$) are respectively 0.00 and 0.00. For the exponential distribution the coefficients of skewness and kurtosis are respectively 2 and 6. The Micceri study provides evidence that the proposed normal and exponential distributions are reasonable representations of data that may be found in educational research.

Number of dependent variables (p).
Data were generated to simulate experiments in which there are p=3 or p=6 dependent variables. This choice is reasonably consistent with the range of variables commonly examined in educational research (Algina & Oshima, 1990; Algina & Tang, 1988; Hakstian, Roed, & Lind, 1979; Lin, 1991; Olson, 1974; Tang, 1989).

Number of populations sampled (G). Data were generated to simulate experiments in which there is sampling from either G=3 or G=6 populations. Dijkstra and Werter (1981) simulated experiments with G equal to [...], and Olson (1974) simulated experiments with G equal to 2, 3, [...]. [...] rare in educational research (Tang, 1989). Hence, the chosen number of populations sampled should provide a reasonably adequate examination of this factor.

Degree of the sample size ratio (NR). Only unequal sample sizes are used in the study. Sample size ratios were chosen to range from small to moderately large. The basic ratios used in the simulation when sampling from three different populations are given in Table [...]. Similarly, the basic ratios used when sampling from six different populations are given in Table [...]. Fairly large ratios were used in the Algina and Tang (1988) study, with an extreme ratio of 5[...]. In experimental and [...] studies it is common to have sample size ratios between [...] (Lin, 1991). Olson (1974) examined only the case of equal sized samples. Since error rates increase as the degree of the sample size ratio increases (Algina & Oshima, 1990), if nominal error rates are excessively exceeded using small to moderately large sample size ratios, then the procedure presumably will have difficulty with extreme sample size ratios. Conversely, if the procedure performs well under this range of sample size ratios, then it should work well with equal sample sizes; the question of extreme sample size ratios is still open. Hence, sample size ratios were chosen under the constraint [...] sample size and largest sample size [...] populations sampled.

Table [...]
Sample Size Ratios (G=3)

n1 : n2 : n3
1  : 1   : 1.3
1  : 1   : 2
1  : 1.3 : 1.3
1  : 2   : 2

Table [...]
Sample Size Ratios (G=6)

n1 : n2 : n3  : n4  : n5  : n6
1  : 1  : 1   : 1   : 1.3 : 1.3
1  : 1  : 1   : 1   : 2   : 2
1  : 1  : 1.3 : 1.3 : 1.3 : 1.3
1  : 1  : 2   : 2   : 2   : 2

In some cases these basic ratios could not be maintained because of the restriction on the ratio of total sample size to number of dependent variables. Departure from these basic ratios was minimized.

Form of the sample size ratio (NRF). When there are three groups, either the sample size ratio is of the form $n_1 = n_2 < n_3$, denoted NRF=1, or the sample size ratio is of the form $n_1 < n_2 = n_3$, denoted NRF=2. When there are six groups, either the sample size ratio is of the form $n_1 = n_2 = n_3 = n_4 < n_5 = n_6$, denoted NRF=1, or the sample size ratio is of the form $n_1 = n_2 < n_3 = n_4 = n_5 = n_6$, denoted NRF=2.

Ratio of total sample size to number of dependent variables (N:p). The ratios chosen were N:p=10 and N:p=20. Hakstian, Roed, and Lind (1979) simulated experiments with N:p equal to [...]. With some notable exceptions (Algina & Tang, 1988; Lin, 1991), current studies tend to avoid N:p smaller than [...]. Yao's test (which is generally more robust than James's first-order test) should have N:p of at least 10 to be robust (Algina & Tang, 1988). With [...], Lin (1991) reasoned that it seems likely that N:p will need to be at least [...] for robustness to be obtained. The upper limit was chosen to represent moderately large experiments. [...]

Degree of heteroscedasticity (d). Each population with dispersion matrix equal to a p×p identity matrix (I) will be called an uncontaminated population. Each population with a p×p diagonal dispersion matrix (D) with at least one diagonal element not equal to one will be called a contaminated population.
The forms of the dispersion matrices, which depend upon the number of dependent variables, are shown in Table 4. Two levels of d (d=√2 and d=3.0) were used to simulate the degree of heteroscedasticity of the dispersion matrices. Olson (1974) simulated experiments with d equal to [...], 3.0, and [...]; Algina and Tang (1988) simulated experiments with d equal to 1.5, [...], and [...]; Tang (1989) chose d equal to [...]; and Algina and Oshima (1990) selected d equal to 1.5 and 3.0. For this study, d=√2 was used to simulate a small degree of heteroscedasticity and d=3.0 was selected to represent a larger degree of heteroscedasticity. These values were selected to represent a range of heteroscedasticity more likely to be common in educational experiments (Tang, 1989).

Relationship of sample size to dispersion matrices (S). Both positive and negative relationships between sample size and dispersion matrices were investigated. In the positive relationship, the larger samples correspond to D. In the negative relationship, the smaller samples correspond to D.

Table 4
Forms of Dispersion Matrices

Matrix    p=3                 p=6
D         Diag{1,d²,d²}       Diag{1,1,d²,d²,d²,d²}
I         Diag{1,1,1}         Diag{1,1,1,1,1,1}

Table 5
Relationship of Sample Size to Heteroscedasticity (G=3)

Sample Size Ratios        Relationship
n1 : n2 : n3              Positive    Negative
[...]                     IID         [...]
[...]                     IDD         [...]

Table 6
Relationship of Sample Size to Heteroscedasticity (G=6)

Sample Size Ratios                    Relationship
n1 : n2 : n3 : n4 : n5 : n6           Positive    Negative
1 : 1 : 1 : 1 : 1.3 : 1.3             IIIIDD      DDDDII
1 : 1 : 1 : 1 : 2 : 2                 IIIIDD      DDDDII
1 : 1 : 1.3 : 1.3 : 1.3 : 1.3         IIDDDD      DDIIII
1 : 1 : 2 : 2 : 2 : 2                 IIDDDD      DDIIII

Design Layout. The sample sizes were determined once values of p, G, N:p, NRF, and NR were specified. These sample sizes are summarized in Table 7 and Table 8, respectively. Each of these conditions was crossed with two distributions, two levels of heteroscedasticity, and two relationships of sample size to dispersion matrices to generate 256 experimental conditions from which to draw conclusions regarding the competitiveness of the proposed statistics with the established Johansen procedure.
Simulation Procedure

The simulation was conducted as separate runs for each condition, with 2000 replications per condition. For each condition, the performance of the Johansen test, the two variations of the modified Hotelling-Lawley test, the modified Pillai-Bartlett test, the modified Wilks test, and the modified Wilcox test were evaluated using the generated data.

For the ith sample, an $n_i$×p (i = 1, ..., G) matrix of uncorrelated pseudo-random observations was generated (using PROC IML in SAS) from the target distribution, normal or exponential. When the target distribution was exponential, the random observations for each of the variates were transformed so that the variates were identically distributed with mean equal to zero, variance equal to one, and covariances among the variates equal to zero. Each $n_i$×p matrix of observations corresponding to a contaminated population was post multiplied by an appropriate D to simulate dispersion heteroscedasticity.

Table 7
Sample Sizes (G=3)

p    G    N:p    N    n1    n2    n3
[...]

Note. [...] is occasionally altered to maintain the ratio as closely as manageable.

Table 8
Sample Sizes (G=6)

p    G    N:p    N    n1    n2    n3    n4    n5    n6
[...]

Note. [...] is occasionally altered to maintain the ratio as closely as manageable.

For each replication, the data were analyzed using Johansen's test, the two variations of the modified Hotelling-Lawley trace criterion, the modified Pillai-Bartlett trace criterion, the modified Wilks likelihood ratio criterion, and the modified Wilcox test.
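The data-generation step described above can be sketched with the Python standard library. This is an illustration rather than the PROC IML program used in the study: the function names are hypothetical, the exponential variates are standardized by subtracting the population mean of 1 (the population standard deviation is already 1), and "post multiplied by an appropriate D" is interpreted here, as an assumption, to mean scaling each variate by the square root of the corresponding diagonal element of the dispersion matrix. The second function gives the binomial standard error of an estimated Type I error rate.

```python
import random


def generate_sample(n, dispersion_diag, distribution, rng):
    """Return n rows of uncorrelated variates whose variances equal dispersion_diag.

    distribution: "normal" or "exponential"; both are standardized to mean 0
    and variance 1 before the heteroscedasticity scaling is applied.
    """
    rows = []
    for _ in range(n):
        row = []
        for variance in dispersion_diag:
            if distribution == "normal":
                z = rng.gauss(0.0, 1.0)
            elif distribution == "exponential":
                # Exp(1) has mean 1 and variance 1, so subtracting 1 standardizes it.
                z = rng.expovariate(1.0) - 1.0
            else:
                raise ValueError("distribution must be 'normal' or 'exponential'")
            row.append(z * variance ** 0.5)
        rows.append(row)
    return rows


def rejection_rate_se(alpha, replications):
    """Standard error of an estimated rejection rate when the true rate is alpha."""
    return (alpha * (1.0 - alpha) / replications) ** 0.5


rng = random.Random(1992)
d = 3.0
uncontaminated = generate_sample(10, [1.0, 1.0, 1.0], "exponential", rng)
contaminated = generate_sample(20, [1.0, d ** 2, d ** 2], "exponential", rng)
print(len(uncontaminated), len(contaminated), rejection_rate_se(0.05, 2000))
```

With 2000 replications and a true rate of .05, the standard error is roughly .005, which gives a sense of how much of the spread in estimated error rates could be sampling noise.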
The proportion of 2000 replications that yielded significant results at α = .01, .05, and .10 were recorded.

Summary

Two distribution types (DT = normal or exponential), two levels of dependent variables (p=3 or 6), two levels of populations sampled (G=3 or 6), two levels of the form of the sample size ratio, two levels of the degree of the sample size ratio, two levels of the ratio of total sample size to number of dependent variables (N:p=10 or 20), two levels of degree of heteroscedasticity (d=√2 or 3.0), and two levels of the relationship of sample size to dispersion matrices (S = positive or negative condition) combine to give 256 experimental conditions. The Johansen test, the two variations of the modified Hotelling-Lawley test ($U_1^*$, $U_2^*$), the modified Pillai-Bartlett test ($V^*$), the modified Wilks test ($L^*$), and the modified Wilcox test were applied to each of these experimental conditions. Generalizations about the behavior of these tests will be based upon the collective results of these experimental conditions.

CHAPTER 4
RESULTS AND DISCUSSION

In this chapter, analyses for α=.05 are presented. Results with regard to $\tau$ for α=.01 and for α=.10 are similar. The analyses are based on data presented in the Appendix.

Distributions of $\tau$ for the six tests are depicted in Figures [...]. In each of these six figures, the interval labelled .05 denotes .0250 to .0749, the interval labelled .10 denotes .0750 to .1249, and so forth. From these figures it is clear that in terms of controlling Type I error rates (a) the performance of the Johansen and modified Wilcox tests are similar; (b) the performance of the first modified Hotelling-Lawley ($U_1^*$), second modified Hotelling-Lawley ($U_2^*$), modified Pillai-Bartlett ($V^*$), and modified Wilks ($L^*$) tests are similar; (c) the performance of these two sets of tests greatly differ from one another; (d) the performance of the Johansen test is superior to that of the Wilcox generalization; and (e) the performance of each of the Brown-Forsythe generalizations is superior to that of either the Johansen test or the Wilcox generalization.
Because the performance of the Johansen and modified Wilcox tests were so different from that of the Brown-Forsythe generalizations, separate analyses were conducted for each of these two sets of tests.

Figure [...]. Frequency Histogram of Estimated Type I Error Rates for the Johansen Test

Figure [...]. Frequency Histogram of Estimated Type I Error Rates for the Modified Wilcox Test

Figure [...]. Frequency Histogram of Estimated Type I Error Rates for the First Modified Hotelling-Lawley Test

Figure [...]. Frequency Histogram of Estimated Type I Error Rates for the Second Modified Hotelling-Lawley Test

Figure [...]. Frequency Histogram of Estimated Type I Error Rates for the Modified Pillai-Bartlett Test

Figure [...]. Frequency Histogram of Estimated Type I Error Rates for the Modified Wilks Test

Analysis of variance was used to investigate the effect of the following factors: Distribution Type (DT), Number of Dependent Variables (p), Number of Populations Sampled (G), Degree of the Sample Size Ratio (NR), Form of the Sample Size Ratio (NRF), Ratio of Total Sample Size to Number of Dependent Variables (N:p), Degree of Heteroscedasticity (d), Relationship of Sample Size to Dispersion Matrices (S), and Test Criteria (T).

Brown-Forsythe Generalizations

Because there are nine factors, initial analyses were conducted to determine which effects to enter into the analysis of variance model. A forward selection approach was used, with main effects entered first, followed by two-way interactions, three-way interactions, and four-way interactions. Because R² was .96 for the model with four-way interactions, more complex models were not examined.
The R² for all models are shown in Table 9. The model with main effects and two-way through four-way interactions was selected. Variance components were computed for each main effect and each two-way, three-way, and four-way interaction. The variance component $e_i$ (i = 1, ..., 255) for each effect was computed using the formula $10^6(\text{MSEF} - \text{MSE})/2^x$, where MSEF was the mean square for that given effect, MSE was the mean square error for the four-factor interaction model, and $2^x$ was the number of levels [...].

Table 9
Magnitudes of R² for Main Effects, Two-Way Interaction, Three-Way Interaction, and Four-Way Interaction Models when using the Four Brown-Forsythe Generalizations

Highest-Order Terms        R²
Main Effects               0.52
Two-Way Interactions       0.77
Three-Way Interactions     0.89
Four-Way Interactions      0.96

Negative variance components were set to zero. Using the sum of these variance components plus MSE×10⁶ as a measure of total variance, the proportion of total variance in estimated Type I error rates was computed for the ith effect (i = 1, ..., 255) using the formula $e_i/[(e_1 + \cdots + e_{255}) + 10^6\,\text{MSE}]$.

Shown in Table 10 are effects that were statistically significant and accounted for at least 1% of total variance in estimated Type I error rates. Because N:p, T, G, and GxT are among the largest effects and, in contrast to factors such as d and DT, do not have to be inferred from data, their effects were examined by calculating percentiles of $\tau$ for each combination of G and N:p. These percentiles should provide insight into the functioning of the four tests. The DTxNRFxSxd interaction was significant and the second largest effect. Consequently, the effects of the four factors involved in this interaction were examined by constructing cell mean plots involving combinations of the four factors. Other interaction effects with large variance components that included these factors were checked to see if they change the findings significantly.
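The variance-component bookkeeping described above can be sketched as follows. The mean squares, effect names, and divisors in the example are made up for illustration; the divisor is passed in explicitly because it plays the role of the $2^x$ term in the formula. The count of 255 effects is what nine factors yield when main effects and two-way through four-way interactions are included.

```python
from math import comb


def variance_component(ms_effect, ms_error, divisor):
    """10^6 (MSEF - MSE) / divisor, with negative components set to zero."""
    raw = 1e6 * (ms_effect - ms_error) / divisor
    return max(raw, 0.0)


def proportions_of_variance(components, ms_error):
    """Each component divided by (sum of all components + 10^6 MSE)."""
    total = sum(components.values()) + 1e6 * ms_error
    return {effect: value / total for effect, value in components.items()}


# Nine factors: main effects plus two-, three-, and four-way interactions.
n_effects = sum(comb(9, order) for order in range(1, 5))

# Hypothetical mean squares and divisors for two effects, "A" and "B".
ms_error = 2e-6
mean_squares = {"A": (1.0e-4, 8), "B": (1.0e-6, 8)}
components = {
    name: variance_component(ms, ms_error, divisor)
    for name, (ms, divisor) in mean_squares.items()
}
shares = proportions_of_variance(components, ms_error)
print(n_effects, components, shares)
```

Truncating negative components at zero keeps every proportion between 0 and 1, which is why the error term must be added to the denominator separately.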
The DT×G interaction will be examined because it accounts for 4.0% of the total variance in estimated Type I error rates and is not explained in terms of either the effect of T, N:p, and G or the effect of DT, NRF, S, and d. The factor p has neither a large main effect nor large interactions with any other factor; its effect was examined by inspecting cell means. Finally, the influence of the degree of the sample size ratio (NR) was minimal. The main effect accounted for only .1% of the total variance in estimated Type I error rates. The three-way interaction DT×NR×S was the effect with the largest variance component that included NR, and it still accounted for only a small share of the total variance in estimated Type I error rates.

Table 10. Variance Components for the First Modified Hotelling-Lawley, Second Modified Hotelling-Lawley, Modified Pillai-Bartlett, and Modified Wilks Tests

Effects, in table order (percent-of-variance values not legible):
N:p; DT×NRF×S×d; T; DT×NRF×S; NRF×S×d; DT×d; DT×G; NRF×S; G; G×T; DT×G×d; DT×G×NRF×S; S×d; d; S; NRF×S×d×T; p×NRF×S×d; DT×G×N:p×d; G×N:p; NRF×S×T; DT×NR×S; d×T; DT×S; Others

Effect of T, N:p, and G.
Percentiles for the first and second modified Hotelling-Lawley tests are displayed in the first table below, and percentiles for the modified Pillai-Bartlett and modified Wilks tests are shown in the second. Using Bradley's liberal criterion (.5α ≤ τ̂ ≤ 1.5α), the following patterns emerge regarding control of Type I error rates for the Brown-Forsythe generalizations: (a) the first modified Hotelling-Lawley test (U1*) was adequate when N:p was 10; however, the test tended to be liberal when N:p was 20; (b) the second modified Hotelling-Lawley test (U2*) was adequate when either N:p was 10 and G was 3 or N:p was 20 and G was 6; the test tended to be conservative when N:p was 10 and G was 6, whereas it tended to be slightly liberal when N:p was 20 and G was 3; (c) the modified Pillai-Bartlett test was adequate when N:p was 20 and G was 3; the test tended to be conservative when N:p was 10 or when N:p was 20 and G was 6; and (d) the modified Wilks test was adequate when [...]; the modified Wilks test was conservative when N:p was 10 and G was 6.

Table. Percentiles of Estimated Type I Error Rates for the First Modified Hotelling-Lawley Test (U1*) and Second Modified Hotelling-Lawley Test (U2*) for Combinations of Ratio of Total Sample Size to Number of Dependent Variables (N:p = 10, N:p = 20) and Number of Populations Sampled (G)

Rows are given in source order; the row labels (test, G, and N:p) are not legible.

95th     90th     75th     50th     25th     10th     5th
.0795*   .0710    .0555    .0505    .0430    .0375    .0345
.0730    .0625    .0513    .0453    .0385    .0325    .0290
.0770    .0715    .0595    .0500    .0398    .0315    .0295
.0510    .0460    .0388    .0290    .0198*   .0140*   .0135*
.0855*   .0795*   .0610    .0538    .0493    .0460    .0435
.0815*   .0785*   .0590    .0510    .0470    .0430    .0405
.0885*   .0835*   .0708    .0625    .0540    .0490    .0485
.0710    .0650    .0565    .0483    .0388    .0355    .0330

Table. Percentiles of Estimated Type I Error Rates for the Modified Pillai-Bartlett Test and Modified Wilks Test (L*) for Combinations of Ratio of Total Sample Size to Number of Dependent Variables (N:p = 10, N:p = 20) and Number of Populations Sampled (G)

Rows are given in source order; the row labels (test, G, and N:p) are not legible.

95th     90th     75th     50th     25th     10th     5th
.0555    .0495    .0430    .0370    .0318    .0240*   .0200*
.0705    .0635    .0483    .0440    .0388    .0330    .0310
.0365    .0310    .0258    .0210*   .0145*   .0110*   .0070*
.0465    .0425    .0360    .0288    .0215*   .0155*   .0130*
.0695    .0660    .0533    .0480    .0425    .0365    .0345
.0780*   .0745    .0575    .0513    .0455    .0415    .0405
.0510    .0500    .0455    .0380    .0315    .0275    .0235*
.0615    .0580    .0533    .0450    .0375    .0345    .0325

Effect of DT, NRF, S, and d. As shown in the two figures for the normal distribution, when data were sampled from a normal distribution, regardless of the form of the sample size ratio, the mean estimated Type I error rate increased as the degree of heteroscedasticity increased in the positive condition, whereas the mean decreased as the degree of heteroscedasticity increased in the negative condition. However, as shown in Figures 9 and 10, when data were sampled from an exponential distribution, the mean increased as the degree of heteroscedasticity increased regardless of the relationship of sample sizes and dispersion matrices. The mean difference in estimated Type I error rates between the higher and lower degree of heteroscedasticity was greater in the positive condition when the sample was selected in the first form of the sample size ratio, whereas when the sample was selected in the second form of the sample size ratio, the mean difference was greater in the negative condition. With data sampled from the exponential distribution, the Brown-Forsythe generalizations tended to be conservative when (a) there was a slight degree of heteroscedasticity (that is, d = √2), (b) the degree of heteroscedasticity increased (d = 3) and the first form of the sample size ratio was paired with the negative condition, or (c) the degree of heteroscedasticity increased and the second form of the sample size ratio was paired with the positive condition.
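The labels conservative, adequate, and liberal used in this section follow Bradley's liberal criterion, under which an estimated Type I error rate is acceptable when it lies between .5α and 1.5α. A minimal sketch of that classification is given below; the function name `bradley_liberal` is hypothetical, and the default α = .05 is an assumption about the nominal level.

```python
def bradley_liberal(rate, alpha=0.05):
    """Classify an estimated Type I error rate by Bradley's liberal criterion.

    alpha is the nominal level; 0.05 is assumed here for illustration.
    """
    lower, upper = 0.5 * alpha, 1.5 * alpha
    if rate < lower:
        return "conservative"   # rejects too rarely
    if rate > upper:
        return "liberal"        # rejects too often
    return "adequate"

# Three values taken from the percentile tables in this section.
for rate in (0.0795, 0.0505, 0.0135):
    print(rate, bradley_liberal(rate))
```

With α = .05 the acceptable interval is .025 to .075, so .0795 is classified as liberal, .0505 as adequate, and .0135 as conservative.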
[Figure. Estimated Mean Type I Error Rates for the Two Levels of the Degree of Heteroscedasticity (d = √2 or 3) and Relationship of Sample Size to Dispersion Matrices (S = positive or negative condition) When Data Were Sampled as in the First Form of the Sample Size Ratio from a Normal Distribution. Vertical axis: Mean Type I Error Rate. Horizontal axis: Sample Size to Dispersion Relationship.]

[Figure. Estimated Mean Type I Error Rates for the Two Levels of the Degree of Heteroscedasticity (d = √2 or 3) and Relationship of Sample Size to Dispersion Matrices (S = positive or negative condition) When Data Were Sampled as in the Second Form of the Sample Size Ratio from a Normal Distribution. Vertical axis: Mean Type I Error Rate. Horizontal axis: Sample Size to Dispersion Relationship.]

[Figure. Estimated Mean Type I Error Rates for the Two Levels of the Degree of Heteroscedasticity (d = √2 or 3) and Relationship of Sample Size to Dispersion Matrices (S = positive or negative condition) When Data Were Sampled as in the First Form of the Sample Size Ratio from an Exponential Distribution. Vertical axis: Mean Type I Error Rate. Horizontal axis: Sample Size to Dispersion Relationship.]

[Figure. Estimated Mean Type I Error Rates for the Two Levels of the Degree of Heteroscedasticity (d = √2 or 3) and Relationship of Sample Size to Dispersion Matrices (S = positive or negative condition) When Data Were Sampled as in the Second Form of the Sample Size Ratio from an Exponential Distribution. Vertical axis: Mean Type I Error Rate. Horizontal axis: Sample Size to Dispersion Relationship.]

With data sampled from the exponential distribution, the Brown-Forsythe generalizations tended to be liberal when (a) the first form of the sample size ratio was paired with the positive condition or (b) the second form of the sample size ratio was paired with the negative condition.

Effect of DT×G interaction.
As shown in Figure 11, the mean estimated Type I error rate for the Brown-Forsythe generalizations was nearer α at one level of G than at the other, regardless of the type of distribution from which the data were sampled. When data were sampled from a normal distribution, the tests tended to be slightly conservative. When data were sampled from an exponential distribution, the mean was near α at one level of G; at the other level of G, however, the Brown-Forsythe generalizations tended to be conservative.

Effect of p. As shown in the figure for the number of dependent variables, the mean estimated Type I error rate was near α for the Brown-Forsythe generalizations at one level of p. At the other level of p, the tests tended to be slightly conservative.

[Figure 11. Estimated Mean Type I Error Rates for Combinations of Distribution Type and Number of Populations Sampled. Vertical axis: Mean Type I Error Rate. Horizontal axis: Distribution Type (Normal, Exponential).]

[Figure. Estimated Mean Type I Error Rates of the Brown-Forsythe Generalizations for the Two Levels of the Number of Dependent Variables. Vertical axis: Mean Type I Error Rate. Horizontal axis: Number of Dependent Variables.]

Johansen Test and Wilcox Generalization

Because there are nine factors, initial analyses were conducted to determine which effects to enter into the analysis of variance model. A forward selection approach was used, with main effects entered first, followed by two-way interactions, three-way interactions, and four-way interactions. Because R² was .997 for the model with four-way interactions, more complex models were not examined. The R² values for all models are shown in the accompanying table. The model with main effects and two-way through four-way interactions was selected. Variance components were computed for each main effect and for each two-way, three-way, and four-way interaction. The variance component $\theta_i$ ($i = 1, \ldots, 255$) for each effect was computed using the formula $10^4 (MS_{EF} - MS_E)/2^x$, where $MS_{EF}$ was the mean square for that given effect, $MS_E$ was the mean square error for the four-factor interaction model, and $2^x$ was the number of levels for the factors included in that given effect. Negative variance components were set to zero.
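The variance-component calculation just described, together with the proportion-of-variance step that follows it, can be sketched as below. This is a hedged illustration rather than the study's code: the function names are hypothetical, `divisors` stands for the denominator term of the formula for each effect, `scale` is the multiplier (10^4 in this analysis, 10^6 in the Brown-Forsythe analysis), and the mean squares in the demonstration are invented values.

```python
def variance_components(mean_squares, ms_error, divisors, scale=1e4):
    """Scaled variance component per effect, with negatives set to zero.

    mean_squares: dict mapping effect name -> mean square for that effect
    divisors:     dict mapping effect name -> denominator term of the formula
    """
    components = {}
    for effect, ms_effect in mean_squares.items():
        raw = scale * (ms_effect - ms_error) / divisors[effect]
        components[effect] = max(raw, 0.0)  # negative components set to zero
    return components

def variance_proportions(components, ms_error, scale=1e4):
    """Share of total variance per effect; total = sum of components + scaled MSE."""
    total = sum(components.values()) + scale * ms_error
    return {effect: value / total for effect, value in components.items()}

# Hypothetical mean squares, for illustration only (not the study's data).
ms = {"G": 0.0030, "GxT": 0.0005}
div = {"G": 2, "GxT": 4}
comps = variance_components(ms, ms_error=0.0010, divisors=div)
props = variance_proportions(comps, ms_error=0.0010)
print(comps, props)
```

In this toy example the G×T mean square is smaller than the error mean square, so its component is truncated at zero and contributes nothing to the total.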
Using the sum of these variance components plus $10^4 MS_E$ as a measure of total variance, the proportion of total variance in estimated Type I error rates was computed for the ith effect ($i = 1, \ldots, 255$) using the formula $\theta_i / [(\theta_1 + \cdots + \theta_{255}) + 10^4 MS_E]$. Shown in Table 14 are effects that (a) were statistically significant and (b) [...].

Table. Magnitudes of R² for Main Effects, Two-Way Interaction, Three-Way Interaction, and Four-Way Interaction Models When Using the Johansen Test and Wilcox Generalization

Highest-Order Terms         R²
Main Effects                0.767
Two-Way Interactions        0.963
Three-Way Interactions      0.988
Four-Way Interactions       0.997

Table 14. Variance Components for the Johansen and Modified Wilcox Tests

The pairing of effects with percent-of-variance values is not legible. Legible effect labels: G×N:p; G×T; G×NRF×NR; p; p×G; G×NR; NRF×NR; Others. Legible values, as printed: 36.2; 227; 14.8; 11.4.

Because N:p, [...], G×N:p, and G×T are among the largest effects and, in contrast to factors such as d and DT, do not have to be inferred from data, their effects will be examined by calculating percentiles of the estimated Type I error rates for each combination of G and N:p. These percentiles should provide insight into the functioning of these two tests.

Effect of T, N:p, and G. Percentiles are displayed in the accompanying table. Using Bradley's liberal criterion (.5α ≤ τ̂ ≤ 1.5α), the following patterns emerge regarding control of Type I error rates for the Johansen test and the Wilcox generalization: (a) the Johansen test was adequate only when N:p was 20 and G was [...], and (b) the Wilcox generalization was inadequate over the range of experimental conditions considered in the experiment. Since the performance of the Johansen test and the Wilcox generalization was inadequate, no further analysis was warranted on either of these two tests.

Summary

It is clear that, in terms of controlling Type I error rates under the heteroscedastic experimental conditions considered, (a) the four Brown-Forsythe generalizations are much more effective than either the modified Wilcox test or the Johansen test, (b) the Johansen test is more effective