Citation |

- Permanent Link: http://ufdc.ufl.edu/UF00101030/00001

## Material Information
- Title: Robustness of four multivariate tests under variance-covariance heteroscedasticity
- Creator: Tang, Kezhen Linda, 1953-
- Copyright Date: 1989
- Language: English

## Subjects
- Subjects / Keywords: Covariance (jstor); Critical values (jstor); Degrees of freedom (jstor); False positive errors (jstor); Mathematical robustness (jstor); Matrices (jstor); Population mean (jstor); Sample size (jstor); Statistical discrepancies (jstor); Statistics (jstor)

## Record Information
- Source Institution: University of Florida
- Holding Location: University of Florida
- Rights Management: Copyright Kezhen Linda Tang. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
- Resource Identifier: 22215575 (OCLC); AHD6997 (LTUF); 0021228149 (ALEPH)
ROBUSTNESS OF FOUR MULTIVARIATE TESTS UNDER VARIANCE-COVARIANCE HETEROSCEDASTICITY

BY KEZHEN LINDA TANG

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

1989

## ACKNOWLEDGMENTS

I would like to express my sincerest appreciation to Dr. James J. Algina for his guidance, advice, and encouragement. Starting with choosing the dissertation topic, through writing the computer program, to conducting the study, and summarizing the results, he has guided me step by step. I am also grateful to Dr. Frank G. Martin, who provided many valuable comments and suggestions. I also am indebted to Dr. Robert R. Sherman and Dr. Linda M. Crocker, who patiently read the manuscript and offered editorial assistance, and who also helped me build my confidence when I encountered difficulties. I also appreciate and thank Dr. Ramon C. Littell for his statistical assistance and for allowing me to use the computer facilities in the Department of Statistics. Many thanks are extended to Ms. Pamela S. Somerville for her assistance in using the computer program to process the manuscript. Without her help the formulas and symbols could not be presented clearly. Finally, I would like to express my most profound appreciation and gratitude to my family for their deep love and strong support.

## TABLE OF CONTENTS

- ACKNOWLEDGMENTS
- ABSTRACT
- CHAPTERS
  - 1 INTRODUCTION
    - The Problem
    - Purpose of the Study
    - Significance of the Study
  - 2 REVIEW OF LITERATURE
    - The Behrens-Fisher Problem
    - Welch's Solutions
    - James' Series Solution for Comparing Several Means
    - Welch's APDF Solution for Comparing Several Means
    - Behavior of Welch's and James' Tests
    - Hotelling's Two Sample $T^2$
    - Behavior of $T^2$ When Covariance Matrices Are Unequal
    - Yao's and James' Tests
    - Behavior of Yao's and James' Tests
    - MANOVA
    - James' and Johansen's Solutions
    - Summary
  - 3 METHODOLOGY
    - Design
    - Invariance Property of the Test Statistic
    - Simulation Procedure
    - Summary
  - 4 RESULTS AND DISCUSSION
    - The Performance of the Four Multivariate Tests When $n_1 = n_2 = n_3$
    - The Performance of the Four Multivariate Tests When Sample Sizes Are Unequal
  - 5 CONCLUSIONS
    - Experiments with Equal Sample Sizes
    - Experiments with Unequal Sample Sizes
- APPENDICES
  - A RESULTS OF THE SIMULATED EXPERIMENTS WHEN $n_1 = n_2 = n_3$, $\alpha = .01$, AND $\alpha = .10$
  - B RESULTS OF THE SIMULATED EXPERIMENTS WHEN SAMPLE SIZES ARE UNEQUAL AND $\alpha = .01$
  - C RESULTS OF THE SIMULATED EXPERIMENTS WHEN SAMPLE SIZES ARE UNEQUAL AND $\alpha = .10$
- REFERENCES
- BIOGRAPHICAL SKETCH

## ABSTRACT

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

ROBUSTNESS OF FOUR MULTIVARIATE TESTS UNDER VARIANCE-COVARIANCE HETEROSCEDASTICITY

By Kezhen Linda Tang

December, 1989

Chairman: James J. Algina

Major Department: Foundations of Education

Three hundred and sixty simulated experiments were conducted to compare Type I error rates of the Pillai-Bartlett trace, Johansen's test, James' first order test, and James' second order test when the covariance matrices are heterogeneous. Both equal and unequal sample-size conditions were investigated.

Results of this study indicated that the ratio of total sample size to the number of variables had a strong impact on the performance of the tests. As the ratio increased, the tests, except the Pillai-Bartlett trace, became more robust. With one exception, as the covariance heteroscedasticity increased, the performance of the tests declined. The exception was James' second order test, which performed better when heteroscedasticity increased and the ratio of total sample size to the number of variables was 20. When the smaller samples were associated with the covariance matrices with smaller elements, the performance of Johansen's test and James' first order test improved. When the smaller samples were associated with the covariance matrices with larger elements, the performance of James' second order test improved. The number of variables affected only James' second order test: the performance of the test declined as the number of variables increased. The sample size ratio affected the tests only when the ratio of the total sample size to the number of variables was small. The form of the covariance matrices affected only the performance of the Pillai-Bartlett trace.
In conclusion, Johansen's test and James' second order test performed better than the Pillai-Bartlett trace and James' first order test when covariance matrices were unequal. Johansen's test tended to have estimated Type I error rates larger than $\alpha$ and James' second order test tended to have estimated Type I error rates smaller than $\alpha$. However, for both tests the Type I error rates were reasonably near $\alpha$.

## CHAPTER 1 INTRODUCTION

It is common in educational research to compare two or more samples on several criterion variables. That is, $p$ response variables are observed in $k$ treatment groups of $n_i$ experimental units per group ($i = 1, \ldots, k$). Multivariate analysis of variance (MANOVA) can be employed to analyze the data. When $p = 1$, the univariate F test can be used. When the assumptions of the F test are met, it is the uniformly most powerful, invariant test of the null hypothesis that the means of the populations from which the $k$ samples are drawn are equal. However, in MANOVA there is no single invariant test that is uniformly most powerful. Four MANOVA test criteria are commonly employed:

1. Roy's largest root (R);
2. The Hotelling-Lawley trace (T);
3. Wilks' likelihood ratio (W);
4. The Pillai-Bartlett trace (V).

When either $k = 2$ or $p = 1$, the four test criteria lead to the same decision about the null hypothesis.

One of the major assumptions of MANOVA is homogeneity of the covariance matrices in the $k$ populations from which the samples are drawn. Stevens (1986) pointed out that the restrictiveness of the assumption becomes more strikingly apparent when we realize that the corresponding assumption for the univariate ANOVA is that the variances on only a single variable are equal. Therefore, it is very unlikely that the equal covariance matrices assumption ever would be absolutely satisfied in practice.

Olson (1974) provided a general picture of the behavior of the MANOVA criteria under violation of the covariance homogeneity assumption.
According to Olson, for protection against covariance heterogeneity in the population, the T, W, and R tests should be avoided. Although the V test stands up best to violations of homogeneity of covariance matrices, its actual Type I error rate ($\tau$) is still somewhat high. For example, under various simulated experimental conditions, Olson (1973) reported that the estimated Type I error rate, $\hat{\tau}$, of the V test ranged from .061 to .186 when the nominal rate was $\alpha = .05$.

James (1954) and Johansen (1980) proposed multivariate tests for the situation in which the covariance homogeneity assumption is violated. In their tests, the covariance matrices of the $k$ populations are estimated separately. Therefore, the common covariance matrix assumption is not required for these tests, which are solutions to the multivariate Behrens-Fisher problem.

The univariate Behrens-Fisher problem is that of testing equality of the means of two populations without assuming that the population variances are equal. According to Yao (1965), Behrens was the first to offer a solution to the Behrens-Fisher problem. Fisher (1935, 1939) showed that Behrens' solution could be derived by using his theory of statistical inference called "fiducial probability." Welch (1938) proposed an approximate degrees of freedom (APDF) solution to the Behrens-Fisher problem, a solution in which the test statistic is approximately distributed as Student's t with appropriately adjusted degrees of freedom. Later, Welch (1951) proposed an APDF solution for the k-sample case, a solution in which the distribution of the test statistic is approximated by an F distribution with adjusted degrees of freedom.

Welch (1947) proposed a series solution to the two-sample Behrens-Fisher problem and presented the solution to terms of order $1/f_i^2$, where $f_i = n_i - 1$. Aspin (1948) extended Welch's approximation to the terms of order $1/f_i^4$. James (1951) proposed a solution to the k-sample Behrens-Fisher problem. He also used the series expansion technique.
The distribution of his test statistic is approximated by a function of the $\chi^2$ distribution.

Welch's (1938, 1951) APDF solution was generalized by Yao (1965) to solve the multivariate two-sample Behrens-Fisher problem. It was further generalized by Johansen (1980) to solve the multivariate k-sample Behrens-Fisher problem. James' (1951) series solution was generalized to the multivariate k-sample situation by James (1954), who reported both a first order and a second order test.

### The Problem

The behavior of Welch's APDF test and James' series test has been investigated by several authors (Brown & Forsythe, 1974; Wang, 1971). The APDF solution in the two-sample multivariate situation has also been studied (Yao, 1965; Algina & Tang, 1988). However, the behavior of Johansen's APDF solution and James' series solution to the multivariate k-sample Behrens-Fisher problem is an unexplored topic.

The studies cited in the preceding paragraph demonstrated that with heteroscedastic covariance matrices (variances in the univariate case), the APDF and series solutions to both the univariate and multivariate Behrens-Fisher problems performed better than the usual tests, such as t, F, and Hotelling's $T^2$, for which homoscedasticity is assumed. Furthermore, the studies demonstrated that the APDF tests are superior to James' first order tests in controlling $\tau$. Whether this is still true when the tests are applied in the multivariate k-sample situation is unknown.

Dijkstra and Werter (1981), on the other hand, showed that James' second order test gives better control of $\tau$ than does Welch's APDF test in the univariate k-sample case. Their findings suggest that the superiority of Welch's APDF test to James' first order test may be due to omitting the second order term in James' test. With modern computer facilities it may be preferable to include James' second order term in the multivariate test.

The behavior of MANOVA has been studied by Olson (1974).
He concluded that among the four criteria only the V test is relatively robust against violation of the equal covariance matrices assumption. However, his study was restricted to an equal number of experimental units in each of the $k$ samples. Therefore, investigating the performance of James' and Johansen's tests when the sample sizes are unequal and comparing them to the performance of V under the same condition is merited.

### Purpose of the Study

The purpose of the present study was to compare Type I error rates of the Pillai-Bartlett trace (V), Johansen's test, and James' first and second order tests when the covariance matrices are heterogeneous. Both equal and unequal sample-size conditions were investigated.

### Significance of the Study

The application of multivariate analysis of variance (MANOVA) in education and the behavioral sciences has increased dramatically, and it appears that it will be used frequently for data analysis in the future (Bray & Maxwell, 1985). Stevens (1986) provided three reasons why multivariate analysis is necessary:

1. Any worthwhile treatment will affect the subject in more than one way. Hence the problem for the investigator is to determine in which specific ways the subjects will be affected, and then find sensitive measurement techniques for those variables.
2. Through the use of multiple criterion measures we can obtain a more complete and detailed description of the phenomenon under investigation....
3. Treatments can be expensive to implement, while the cost of obtaining data on several dependent variables is relatively small, and maximizes information gain. (p. 2)

Because the assumption of equal covariance matrices of the $k$ populations is unlikely to be satisfied in practice, there is potential danger that $\tau$ will not be equal to $\alpha$ when using the usual MANOVA criteria.
In this study, estimates of $\tau$ were reported for four tests that can be employed in MANOVA: the Pillai-Bartlett trace, James' first order test, James' second order test, and Johansen's test. Both equal and unequal sample sizes were employed. The results provided evidence about whether or not the latter three tests are good alternatives to the Pillai-Bartlett trace, which was reported by Olson to be more robust when sample sizes are equal than Roy's, Wilks', or the Hotelling-Lawley test.

## CHAPTER 2 REVIEW OF LITERATURE

### The Behrens-Fisher Problem

One of the most common situations in educational research is the comparison of the means of two populations. If we assume the variable is normally distributed in each population and has equal variances in the two populations, the t test is the uniformly most powerful test for the hypothesis that the means are equal.

The robustness of t when the assumption of variance equality is violated has been studied extensively. Scheffé (1959) summarized results of studies by Hsu (1938) and Welch (1938, 1951). In general, if the two samples are of equal size, the distribution of t is changed very little by inequality of variance, especially for large sample sizes. However, if the sample sizes are unequal, inequality of the variances affects the distribution considerably. The t test is liberal when the larger sample is drawn from the population with the smaller variance. It is conservative otherwise.

Yao (1965) pointed out that Behrens was the first to offer a solution to the problem of testing equality of the means of two populations without assuming that the variances are equal. This problem has become known as the Behrens-Fisher problem. Fisher (1935, 1939) showed that Behrens' solution could be derived using his theory of statistical inference called "fiducial probability." Fisher and Healy (1956) calculated tables of the Behrens-Fisher distribution. James (1959) and Ray and Pitman (1961) further studied the distribution.
Six other solutions to the Behrens-Fisher problem were discussed and compared by Scheffé (1970). According to Scheffé, within the Neyman-Pearson school of thought, Welch's approximate degrees of freedom (APDF) solution, which requires only the ubiquitous t-table, is a satisfactory practical solution to the Behrens-Fisher problem.

### Welch's Solutions

#### APDF Solution

Let

$$v = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

where $\bar{X}_i$ is the mean of the sample drawn from population $i$, and $S_i^2$ is the sample variance. The distribution of $v$ depends to a certain extent on $\sigma_1^2/\sigma_2^2$. Welch (1938) approximated the distribution of $S_1^2/n_1 + S_2^2/n_2$ by the $\chi^2$ distribution such that the mean and variance of the $\chi^2$ distribution agree with the true mean and variance of $S_1^2/n_1 + S_2^2/n_2$. This enabled Welch to approximate the distribution of $v$ by a t distribution with degrees of freedom

$$f = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{S_1^4}{n_1^2 (n_1 - 1)} + \dfrac{S_2^4}{n_2^2 (n_2 - 1)}}$$

Welch's approximation involves referring $v$ to a table of the t distribution, entered with $f$ degrees of freedom. Welch (1947) showed that the errors involved in making probability statements about $v$ on the basis of this particular approximation are of order $1/f_i^2$.

#### Series Solution

In addition to his APDF solution, Welch (1947) derived an exact solution, in the sense that it is independent of the irrelevant population parameters $\sigma_i^2$. Let $\bar{X}_1$ and $\bar{X}_2$, the means of two samples, be distributed independently and normally with means $\mu_1$, $\mu_2$ and variances $\sigma_1^2/n_1$, $\sigma_2^2/n_2$, respectively. Suppose that the observed data provide estimates $S_i^2$ of the variances $\sigma_i^2$, and that these estimates are distributed independently of each other and of $\bar{X}_1$, $\bar{X}_2$. Then,

$$\Pr\left[(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2) < h(S_1^2, S_2^2, P)\right] = P$$

where $0 < P < 1$. The quantity $h(S_1^2, S_2^2, P)$ is a function of the $S_i^2$ and $P$ such that $P$ is the probability that $(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)$ is smaller than $h(S_1^2, S_2^2, P)$. To test the hypothesis $H_0: \mu_1 = \mu_2$, $h(S_1^2, S_2^2, P)$ is calculated, and serves as a critical value for comparing the population mean differences.
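The APDF statistic $v$ and the approximate degrees of freedom $f$ defined under "APDF Solution" can be computed directly from two samples. The following is a minimal sketch, not code from the dissertation; the function name `welch_apdf` is hypothetical, and only the Python standard library is assumed.

```python
import math
from statistics import mean, variance


def welch_apdf(x1, x2):
    """Return (v, f): Welch's two-sample statistic and its approximate df.

    Hypothetical helper for illustration; x1 and x2 are sequences of numbers.
    """
    n1, n2 = len(x1), len(x2)
    # a_i = S_i^2 / n_i, using the unbiased sample variance S_i^2
    a1 = variance(x1) / n1
    a2 = variance(x2) / n2
    v = (mean(x1) - mean(x2)) / math.sqrt(a1 + a2)
    # f = (a1 + a2)^2 / (a1^2/(n1-1) + a2^2/(n2-1)), same as the formula above
    f = (a1 + a2) ** 2 / (a1 ** 2 / (n1 - 1) + a2 ** 2 / (n2 - 1))
    return v, f
```

The statistic `v` would then be referred to a t table with `f` degrees of freedom. When the two samples have equal sizes and equal sample variances, `f` reduces to $n_1 + n_2 - 2$, the degrees of freedom of the ordinary t test.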
Welch (1947) provided a formula for $h(S_1^2, S_2^2, P)$ up to terms of order $1/f_i^2$:

$$h(S_1^2, S_2^2, P) = \xi \sqrt{\sum \lambda_i S_i^2}\,\left[1 + \frac{1 + \xi^2}{4}\,\frac{\sum \lambda_i^2 S_i^4 / f_i}{\left(\sum \lambda_i S_i^2\right)^2} - \frac{1 + \xi^2}{2}\,\frac{\sum \lambda_i^2 S_i^4 / f_i^2}{\left(\sum \lambda_i S_i^2\right)^2} + \frac{3 + 5\xi^2 + \xi^4}{3}\,\frac{\sum \lambda_i^3 S_i^6 / f_i^2}{\left(\sum \lambda_i S_i^2\right)^3} - \frac{15 + 32\xi^2 + 9\xi^4}{32}\,\frac{\left(\sum \lambda_i^2 S_i^4 / f_i\right)^2}{\left(\sum \lambda_i S_i^2\right)^4}\right]$$

where $\xi$ is the normal deviate such that $\Phi(\xi) = P$, and $\lambda_i = 1/n_i$. Aspin (1948) extended the work further to the terms of order $1/f_i^4$. In addition, tables of critical values were computed by Aspin (1948, 1949) and Trickett, James, and Welch (1954, 1956). The limiting behavior of the series solution was studied by Wald (1955). Welch's series solution to the Behrens-Fisher problem agrees with Welch's (1938) APDF solution up to and including the term of order $1/f_i$.

### James' Series Solution for Comparing Several Means

James (1951) derived, for the k-sample case, a series solution to the Behrens-Fisher problem. Let $\bar{X}_1, \ldots, \bar{X}_k$, the means of $k$ samples, be distributed independently and normally with means $\mu_i$ and variances $\sigma_1^2/n_1, \ldots, \sigma_k^2/n_k$, respectively. Let $S_i^2$ ($i = 1, \ldots, k$) be estimates of the $\sigma_i^2$. The $S_i^2$ are distributed independently of the $\bar{X}_i$ and of each other. To test the hypothesis $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$, we calculate the statistic

$$\sum_{i=1}^{k} W_i (\bar{X}_i - \hat{X})^2$$

where

$$W_i = (S_i^2 / n_i)^{-1}, \qquad W = \sum_{i=1}^{k} W_i, \qquad \hat{X} = \frac{1}{W}\sum_{i=1}^{k} W_i \bar{X}_i .$$

If the $n_i$ are all large, $\sum W_i (\bar{X}_i - \hat{X})^2$ can be taken to have the $\chi^2$ distribution with $k - 1$ degrees of freedom. However, if the $n_i$ are not all large, the test statistic may not have a $\chi^2$ distribution with $k - 1$ degrees of freedom. James (1951) sought to derive a series function of the $S_i^2$ such that

$$\Pr\left[\sum W_i (\bar{X}_i - \hat{X})^2 < 2h(S_i^2)\right] = P .$$

James found approximations to $2h(S_i^2)$ of orders $1/f_i$ and $1/f_i^2$. To order $1/f_i$, he found

$$2h(S_i^2) = \chi^2_{k-1;\alpha}\left[1 + \frac{3\chi^2_{k-1;\alpha} + (k + 1)}{2(k^2 - 1)} \sum_{i=1}^{k} \frac{1}{f_i}\left(1 - \frac{W_i}{W}\right)^2\right]$$

The quantity $2h(S_i^2)$ serves as James' critical value. The test statistic, $\sum W_i (\bar{X}_i - \hat{X})^2$, is compared with this critical value, and $H_0$ is rejected if $\sum W_i (\bar{X}_i - \hat{X})^2 > 2h(S_i^2)$. James also provided the approximation of order $1/f_i^2$, which can be shown to agree with Welch's (1947) result when $k = 2$.
However, James pointed out that this second order approximation involves a good deal of numerical calculation.

### Welch's APDF Solution for Comparing Several Means

Welch (1951) derived an alternative solution for the k-sample case. By comparing the cumulant-generating function of $\sum W_i (\bar{X}_i - \hat{X})^2$ with that of the F distribution, Welch brought the two into agreement. Welch found that to order $1/f_i$, $\sum W_i (\bar{X}_i - \hat{X})^2$ is distributed as $C$ times F, where

$$C = (k - 1) + \frac{2(k - 2)}{k + 1} \sum \frac{1}{f_i}\left(1 - \frac{W_i}{W}\right)^2$$

This F distribution has degrees of freedom $df_1 = k - 1$ and

$$df_2 = \left[\frac{3}{k^2 - 1} \sum \frac{1}{f_i}\left(1 - \frac{W_i}{W}\right)^2\right]^{-1}$$

The steps in the approximate test procedure are calculating

$$v^2 = \frac{\sum W_i (\bar{X}_i - \hat{X})^2 / (k - 1)}{1 + \dfrac{2(k - 2)}{k^2 - 1} \sum \dfrac{1}{f_i}\left(1 - \dfrac{W_i}{W}\right)^2}$$

and referring $v^2$ to the F table entered with degrees of freedom $df_1$ and $df_2$.

### Behavior of Welch's and James' Tests

Wang (1971) compared the Behrens-Fisher (Fisher, 1935, 1939) test and the Welch (1938) APDF test for two samples. The results show that at $n_1 = 7$ and $n_2 = 13$, the maximum deviation from the nominal $\alpha$ level is 0.0144 for the Behrens-Fisher test and 0.0017 for Welch's APDF test. Combining all of the experimental conditions, Wang found that the maximum deviation of Welch's APDF test from the nominal $\alpha$ level is about 0.001. For larger sample sizes in both samples, such as 13:19, the maximum deviation from the nominal $\alpha$ level decreases to 0.0005. Wang also found that the Welch-Aspin series solution (Welch, 1947; Aspin, 1948) gives even smaller deviations from the nominal $\alpha$ than Welch's APDF test. However, Wang pointed out that because the Welch-Aspin critical values are available for only a selected set of $\alpha$, $f_1$, and $f_2$, and the actual computation of the critical values is very tedious, it seems reasonable to use the Welch APDF test, which requires only the t-table. Scheffé (1970) also opined, as mentioned earlier, that Welch's APDF solution is a satisfactory practical solution of the Behrens-Fisher problem.
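The univariate k-sample procedures described above share the weighted statistic $\sum W_i(\bar{X}_i - \hat{X})^2$ and the quantity $\sum (1/f_i)(1 - W_i/W)^2$. The sketch below, offered only as an illustration, computes James' first order critical value and Welch's APDF statistic with its degrees of freedom. All function names are hypothetical, and the $\chi^2$ critical value is passed in as an argument (an assumption of this sketch) so that no statistical tables or extra libraries are needed.

```python
from statistics import mean, variance


def _weights(samples):
    """W_i = n_i / S_i^2, their sum W, and the weighted grand mean."""
    w = [len(x) / variance(x) for x in samples]
    total = sum(w)
    grand = sum(wi * mean(x) for wi, x in zip(w, samples)) / total
    return w, total, grand


def weighted_statistic(samples):
    """Sum of W_i * (mean_i - weighted grand mean)^2."""
    w, _, grand = _weights(samples)
    return sum(wi * (mean(x) - grand) ** 2 for wi, x in zip(w, samples))


def _lam(samples):
    """Sum of (1/f_i) * (1 - W_i/W)^2 with f_i = n_i - 1."""
    w, total, _ = _weights(samples)
    return sum((1 - wi / total) ** 2 / (len(x) - 1)
               for wi, x in zip(w, samples))


def james_first_order_critical(samples, chi2_crit):
    """James' critical value 2h to order 1/f_i.

    chi2_crit is the upper-alpha chi-square point with k - 1 df,
    supplied by the caller.
    """
    k = len(samples)
    factor = (3 * chi2_crit + k + 1) / (2 * (k ** 2 - 1))
    return chi2_crit * (1 + factor * _lam(samples))


def welch_apdf_k(samples):
    """Return (v2, df1, df2) for Welch's k-sample APDF test."""
    k = len(samples)
    lam = _lam(samples)
    stat = weighted_statistic(samples)
    v2 = (stat / (k - 1)) / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    df2 = (k ** 2 - 1) / (3 * lam)
    return v2, k - 1, df2
```

For $k = 2$ the correction in the denominator of $v^2$ vanishes, so `welch_apdf_k` returns the square of Welch's two-sample statistic and the two-sample APDF degrees of freedom. This is a convenient sanity check on the formulas.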
Brown and Forsythe (1974) reported that in the k-sample situation, $\tau$ for the ANOVA F test deviates markedly from the nominal $\alpha$ level when the variances of the populations are unequal. The Type I error rates of Welch's (1951) APDF solution and James' (1951) solution, on the other hand, are near the nominal $\alpha$ levels. They further pointed out that Welch's (1951) APDF solution is superior to James' (1951) series solution in controlling $\tau$, especially when the sample sizes are small. Dijkstra and Werter (1981) showed that James' second order test gives better control of $\tau$ than does Welch's test. Their conclusions were supported by Wilcox (1987).

### Hotelling's Two Sample $T^2$

Suppose the purpose is to compare the mean vectors $\mu_1$ and $\mu_2$ of two p-dimensional multivariate normal populations on the basis of a random sample from each. The two populations are assumed to have the same, though unknown, covariance matrix $\Sigma$ of full rank $p$. Let the samples be denoted by $\{x_{ij},\ j = 1, 2, \ldots, n_i;\ i = 1, 2\}$. Define the sample mean vectors and covariance matrices as

$$\bar{x}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij}, \qquad S_i = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)(x_{ij} - \bar{x}_i)'$$

For testing the hypothesis $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$, Hotelling (1931) proposed the $T^2$ statistic, which is a generalization of the univariate t test. The test statistic is

$$T^2 = \frac{n_1 n_2}{n_1 + n_2} (\bar{x}_1 - \bar{x}_2)' S^{-1} (\bar{x}_1 - \bar{x}_2)$$

where

$$S = \frac{(n_1 - 1) S_1 + (n_2 - 1) S_2}{n_1 + n_2 - 2}$$

Hotelling showed that the quantity

$$\frac{n_1 + n_2 - p - 1}{(n_1 + n_2 - 2)\,p}\, T^2$$

has an F distribution with degrees of freedom $p$ and $n_1 + n_2 - p - 1$. $H_0$ is rejected if

$$T^2 > \frac{(n_1 + n_2 - 2)\,p}{n_1 + n_2 - p - 1}\, F_{\alpha;\,p,\,\nu}$$

where $\nu = n_1 + n_2 - p - 1$.

### Behavior of $T^2$ When Covariance Matrices Are Unequal

The robustness of Hotelling's $T^2$ to violation of the homogeneous covariance matrices assumption has been investigated by many researchers both analytically (Ito & Schull, 1964) and empirically (Algina & Oshima, 1989; Hakstian, Roed, & Lind, 1979; Holloway & Dunn, 1967; Hopkins & Clay, 1963).
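The definitions of the pooled covariance matrix $S$, the $T^2$ statistic, and its F transformation given in the section on Hotelling's two sample $T^2$ translate directly into a few lines of linear algebra. The following is a minimal sketch assuming NumPy is available; the function name `hotelling_t2` is hypothetical, and each sample is assumed to be an array with one row per observation and $p$ columns.

```python
import numpy as np


def hotelling_t2(x1, x2):
    """Return (T2, F, (df1, df2)) for Hotelling's two-sample test.

    Hypothetical helper; assumes a common covariance matrix, as T2 does.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, p = x1.shape
    n2 = x2.shape[0]
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    # Unbiased sample covariance matrices (atleast_2d handles p = 1)
    s1 = np.atleast_2d(np.cov(x1, rowvar=False))
    s2 = np.atleast_2d(np.cov(x2, rowvar=False))
    s_pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    # diff' S^{-1} diff computed with a linear solve instead of an inverse
    t2 = n1 * n2 / (n1 + n2) * (diff @ np.linalg.solve(s_pooled, diff))
    f_stat = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2
    return t2, f_stat, (p, n1 + n2 - p - 1)
```

With $p = 1$ the returned `T2` equals the square of the pooled-variance t statistic, and `T2` is unchanged when both samples are multiplied by the same nonsingular matrix.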
The inequality of covariance matrices, $\Sigma_1 \neq \Sigma_2$, is reflected by the eigenvalues $\theta_i$, $i = 1, 2, \ldots, p$, of the matrix $\Sigma_1 \Sigma_2^{-1}$.

Ito and Schull (1964) investigated the large-sample properties of the $T^2$ statistic in the presence of unequal covariance matrices $\Sigma_1$ and $\Sigma_2$. They demonstrated that when $n_1$ and $n_2$ are equal and large, and the eigenvalues of $\Sigma_1 \Sigma_2^{-1}$ are equal to one another but are not equal to one, heteroscedasticity has no effect upon $\tau$. Ito and Schull also investigated the situation in which the eigenvalues of $\Sigma_1 \Sigma_2^{-1}$ are distinct and both samples are very large, but only for $p = 2$. They reported $\tau$ near $\alpha$ provided the eigenvalues are within the range (.5, 2).

Hakstian, Roed, and Lind (1979) also found that with equal sample sizes the $T^2$ procedure is generally robust with respect to violation of the homogeneity of covariance matrices assumption. However, even when $n_1 = n_2$, other factors, such as sample sizes, $p$, and $\theta_i$, also play important roles in the robustness of $T^2$. Based on 1000 replications, Hopkins and Clay (1963) showed that when $n_1 = n_2 > 10$, $T^2$ is rather robust against variance inequality but that this robustness does not extend to smaller sample sizes. For example, when $n_1 = n_2 = 5$, $p = 2$, $\theta_1 = \theta_2 = 10.24$, and $\alpha = 0.05$, $\tau$ was .083. Holloway and Dunn (1967) found that for large and equal sample sizes, the robustness of $T^2$ depends on $p$. For example, when $n_1 = n_2$ and $\theta_1 = \theta_2 = 10$, $\tau$ is close to $\alpha$ for $p = 2$ and $p = 3$, but departs fairly markedly from $\alpha$ when $p = 7$ or $p = 10$.

For unequal sample sizes, in all of the preceding studies, researchers found that the test leads to unacceptable Type I error rates as the degree of population covariance matrix heterogeneity is increased. Even for $p = 2$ and relatively mild heterogeneity, for example, $\Sigma_2 = 2.25\Sigma_1$, a sample size ratio of only 1.1:1 can produce an unacceptable Type I error rate (Algina & Oshima, 1989).
When the larger sample is drawn from the population with the smaller covariance matrix, the test results in overestimation of significance. When the larger sample is drawn from the population with the larger covariance matrix, the test results in underestimation of significance. This tendency increases with the magnitude of the inequality of the two sample sizes, with the degree of heterogeneity, and with $p$.

In general, with large and equal sample sizes and a fairly large ratio of $(n_1 + n_2)$ to $p$, the $T^2$ test is robust. Otherwise, the test is less robust, being conservative when the larger sample is drawn from the population with generally greater dispersion, and liberal in the opposite situation. Therefore, the behavior of $T^2$ under violation of the homogeneous covariance matrices assumption is similar to that of the univariate t test.

### Yao's and James' Tests

James (1954) and Yao (1965) proposed tests to deal with the multivariate Behrens-Fisher problem when there are two samples. The tests allow the assumption of equal population covariance matrices to be relaxed. James' (1954) test is a generalization of James' (1951) univariate solution of the Behrens-Fisher problem. Yao's test is a generalization of Welch's (1938, 1951) APDF solution. The test statistic is

$$T = (\bar{x}_1 - \bar{x}_2)' \left(\frac{S_1}{n_1} + \frac{S_2}{n_2}\right)^{-1} (\bar{x}_1 - \bar{x}_2)$$

This test statistic is asymptotically distributed as chi-square with $p$ degrees of freedom. James proposed a critical value, $2h(S_i)$, which is a generalization of his univariate critical value discussed in the previous section. To order $1/f_i$ this critical value is

$$2h(S_i) = \chi^2_{p;\alpha}\left(A + B\,\chi^2_{p;\alpha}\right)$$

where

$$A = 1 + \frac{1}{2p} \sum_{i=1}^{2} \frac{\left[\operatorname{tr}(I - W^{-1} W_i)\right]^2}{f_i}$$

$$B = \frac{1}{p(p + 2)} \left\{ \sum_{i=1}^{2} \frac{\operatorname{tr}(I - W^{-1} W_i)^2}{f_i} + \frac{1}{2} \sum_{i=1}^{2} \frac{\left[\operatorname{tr}(I - W^{-1} W_i)\right]^2}{f_i} \right\}$$

In $A$ and $B$, $W_i$ and $W$ are defined in the same manner as these quantities are defined in James' univariate test, but with sample covariance matrices replacing sample variances.

Yao (1965) generalized Welch's (1938, 1951) solution to the Behrens-Fisher problem.
His critical region is based on Hotelling's $T^2$ with $p$ and $f_T$ degrees of freedom. Let $y = \bar{x}_1 - \bar{x}_2$, $A_i = S_i/n_i$, and $A = A_1 + A_2$. Define $V_i = y' A^{-1} A_i A^{-1} y$. The approximate degrees of freedom (APDF) $f_T$ is defined by

$$\frac{1}{f_T} = \sum_{i=1}^{2} \frac{1}{f_i} \left(\frac{V_i}{T_v}\right)^2$$

with $T_v = y' A^{-1} y$ denoting the test statistic. Therefore, the critical region is $T_v > T^2_{\alpha}(p, f_T)$. If tables of $T^2$ are not available, the transformation

$$F_v = \frac{f_T - p + 1}{f_T\, p}\, T_v$$

can be used. Thus we reject $H_0: \mu_1 = \mu_2$ if $F_v > F_{\alpha}(p, f_T - p + 1)$.

### Behavior of Yao's and James' Tests

Yao (1965) conducted a simulation study to estimate $\tau$ for James' test and his own test. The results indicated that both tests are quite robust. However, the generality of this conclusion is limited because Yao only studied $p = 2$.

Ito (1969) investigated the behavior of James' test analytically. He studied both equal and unequal sample sizes. For $k = 2$, the total sample sizes, $n_1 + n_2$, were 20, 300, and 600. He found that when $n_1 + n_2 = 300$ or 600, the test was quite robust. When the total sample size was 20, the test was liberal; i.e., $\tau$ was greater than $\alpha$. This tendency increases as the sample sizes decrease, the inequality of sample sizes increases, the inequality of covariance matrices increases, and $p$ increases. The behavior of James' test for sample sizes between 20 and 300 was not included in Ito's study.

Algina and Tang (1988) investigated the performance of $T^2$, Yao's test, and James' test under a broader set of conditions than Yao (1965) studied. In the Algina-Tang study, $p = 2$, 6, or 10; $(n_1 + n_2)/p = 6$, 10, or 20; $n_1 : n_2$ ranged from 1:1.25 to 1:5; and $\theta_i$ ranged from 1.5 to 3.0. They found that the deviation of $\tau$ from $\alpha$ is larger for Hotelling's $T^2$ than for either Yao's or James' test when $\theta_i \neq 1$. Neither Yao's nor James' test tends to be conservative. The $\tau$ for James' test tends to be larger than that for Yao's test. Algina and Tang concluded that in terms of control of Type I error rates, Yao's test is preferable to James' test.
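Yao's approximate degrees of freedom can be computed from the quantities $y$, $A_i$, $A$, and $V_i$ defined above. The following is a minimal sketch assuming NumPy; the function name `yao_apdf` is hypothetical, and the sketch assumes the two sample mean vectors differ, so that the statistic is nonzero and the ratios $V_i/T_v$ are defined.

```python
import numpy as np


def yao_apdf(x1, x2):
    """Return (T_v, f_T, F_v, (df1, df2)) for Yao's two-sample test.

    Hypothetical helper; rows are observations, columns are variables.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, p = x1.shape
    n2 = x2.shape[0]
    y = x1.mean(axis=0) - x2.mean(axis=0)
    a1 = np.atleast_2d(np.cov(x1, rowvar=False)) / n1   # A_1 = S_1 / n_1
    a2 = np.atleast_2d(np.cov(x2, rowvar=False)) / n2   # A_2 = S_2 / n_2
    ainv_y = np.linalg.solve(a1 + a2, y)                # A^{-1} y
    t_v = y @ ainv_y                                    # y' A^{-1} y
    # V_i = y' A^{-1} A_i A^{-1} y (A is symmetric, so this form is valid)
    v1 = ainv_y @ a1 @ ainv_y
    v2 = ainv_y @ a2 @ ainv_y
    inv_f = (v1 / t_v) ** 2 / (n1 - 1) + (v2 / t_v) ** 2 / (n2 - 1)
    f_t = 1.0 / inv_f
    f_v = (f_t - p + 1) / (f_t * p) * t_v
    return t_v, f_t, f_v, (p, f_t - p + 1)
```

For $p = 1$ the result coincides with Welch's two-sample APDF solution: `T_v` is the square of Welch's $v$ and `f_T` is Welch's $f$. In general `f_T` lies between $\min(n_1, n_2) - 1$ and $n_1 + n_2 - 2$.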
When the sample sizes are equal ($n_1 = n_2 = n$), Yao's test has appropriate Type I error rates provided $(2n/p) > 10$. When $n_1 \neq n_2$ and $10 < (n_1 + n_2)/p$, Yao's test can also be safely used, provided the ratio of the larger to the smaller sample size is less than or equal to 2:1. When $n_1 \neq n_2$ and $(n_1 + n_2)/p > 20$, Yao's test can be safely used a) if $p = 2$ and the ratio of the two sample sizes is 5:1 or smaller; b) if $p = 6$ and the ratio is 3:1 or smaller; and c) if $p = 10$ and the ratio is 4:1 or smaller.

### MANOVA

Suppose the purpose is to compare the mean vectors $\mu_1, \mu_2, \ldots, \mu_k$ of $k$ p-dimensional normal populations on the basis of a random sample from each. Let $H$ ($p \times p$) and $E$ ($p \times p$) be the sums-of-products matrices for hypothesis and error, respectively, defined in the one-way case as

$$H = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})'$$

$$E = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)(x_{ij} - \bar{x}_i)'$$

where $x_{ij}$ is the jth observation vector in group $i$, $\bar{x}_i$ is the mean vector for the ith group, and $\bar{x}$ is the grand mean vector.

It is known that all invariant test criteria are functions of the eigenvalues of $HE^{-1}$ (the rank of which is $s = \min(p, k - 1)$). There are four popular test statistics, each a different function of the eigenvalues of $HE^{-1}$. Let $\lambda_m$ be the mth eigenvalue of $HE^{-1}$, $m = 1, \ldots, s$. The four test criteria are:

1. Roy's (1945, 1953) largest root (R): $R = \lambda_1/(1 + \lambda_1)$, where $\lambda_1$ is the largest eigenvalue of $HE^{-1}$;
2. The Hotelling-Lawley (Lawley, 1938, 1939; Hotelling, 1951) trace (T): $T = \sum_m \lambda_m$;
3. Wilks' (1932) likelihood ratio (W): $W = \prod_m 1/(1 + \lambda_m)$;
4. The Pillai-Bartlett (Bartlett, 1939; Pillai, 1955) trace (V): $V = \sum_m \lambda_m/(1 + \lambda_m)$.

When $s = 1$, i.e., when either $k = 2$ or $p = 1$, the four criteria lead to identical conclusions.

The robustness of MANOVA is, for the most part, an unexplored topic. Korin (1972) studied the behavior of R, T, and W under violation of the homogeneity of covariance matrices assumption. In his study, the base population had a covariance matrix equal to $I$, the identity matrix.
The other populations had covariance matrices of the form $d^2 I$, where $d = 1.5$, 3.0, or $\sqrt{10.0}$. Korin reported that covariance heterogeneity in small samples, even when the sample sizes are all equal, produces higher Type I error rates than the supposed significance level. Among these three test criteria, R has the highest Type I error rate. He did not include the V test in his study.

Olson (1974) compared the robustness of R, T, W, and V to violations of the homogeneity of covariance matrices assumption. In his study, the number of dimensions, $p$, was 2, 3, 6, or 10, the number of groups, $k$, was 2, 3, 6, or 10, and the sample sizes, $n$, were 5, 10, or 50. Olson referred to the sample drawn from a population that had a different covariance matrix than the remaining populations as a contaminated group. There were two levels of contamination, namely, low concentration of contamination and high concentration of contamination. In the low concentration condition, contamination occurred equally in all dimensions of p-space. Under high concentration, however, contamination occurred in only one dimension of p-space. The degree of contamination, $d$, was 2, 3, or 6.

Olson (1974) found that in the low concentration of contamination condition, all four criteria became liberal when the assumption of equal covariance matrices was violated. The departure from the nominal $\alpha$ can be serious if $d > 3$. The order of the tests in terms of the magnitude of the departure of $\tau$ from $\alpha$ was typically R > T > W > V. The size of the departure increased as heteroscedasticity increased. With increasing values of $p$, the $\tau$'s for R, T, and W also increased. However, $\tau$ for V did not respond in any clear-cut way to changes in $p$. Increasing sample sizes decreased $\tau$ for R, T, and W. It slightly increased $\tau$ for V when $k > 6$. However, this increased $\tau$ for V was still smaller than those of R, T, and W.
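The four MANOVA criteria compared in these studies are all functions of the eigenvalues of $HE^{-1}$, as defined in the MANOVA section. The sketch below builds $H$ and $E$ for a one-way layout and evaluates R, T, W, and V. It is an illustration only, assuming NumPy; the function name `manova_criteria` and the dictionary keys are hypothetical.

```python
import numpy as np


def manova_criteria(groups):
    """Return Roy's R, Hotelling-Lawley T, Wilks' W and Pillai-Bartlett V.

    Hypothetical helper; each group is an (n_i x p) array of observations.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    grand = np.vstack(groups).mean(axis=0)      # grand mean vector
    p = grand.size
    H = np.zeros((p, p))                        # hypothesis sums of products
    E = np.zeros((p, p))                        # error sums of products
    for g in groups:
        m = g.mean(axis=0)
        d = (m - grand)[:, None]
        H += len(g) * (d @ d.T)
        c = g - m
        E += c.T @ c
    s = min(p, len(groups) - 1)
    # E^{-1} H has the same eigenvalues as H E^{-1}; keep the s largest
    eig = np.linalg.eigvals(np.linalg.solve(E, H)).real
    lam = np.sort(eig)[::-1][:s]
    return {
        "R": lam[0] / (1 + lam[0]),
        "T": lam.sum(),
        "W": np.prod(1 / (1 + lam)),
        "V": np.sum(lam / (1 + lam)),
    }
```

When $k = 2$ there is a single nonzero eigenvalue, so `V` equals `R` and `W` equals `1 - R`, which is one way to see why the criteria lead to identical conclusions when $s = 1$.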
Under the high concentration-of-contamination condition, Olson (1974) found that the departure from the nominal significance level was mild, even though one dimension was severely contaminated.

Olson (1974) pointed out that, above all, every effort should be made to maintain samples of equal size, or at least to prevent contaminated groups from being smaller than other groups. Olson concluded that the V test is the most robust among the four against heterogeneity of covariance matrices, although its Type I error rate is somewhat greater than $\alpha$.

James' and Johansen's Solutions

James (1954) and Johansen (1980) proposed tests to deal with the multivariate k-sample Behrens-Fisher problem. James' (1954) solution is an extension of his solution to the two-sample multivariate Behrens-Fisher problem. Johansen (1980), on the other hand, generalized Welch's (1951) APDF solution.

James' (1954) Solution

The test statistic is

$$\sum_{i=1}^{k} (\bar{x}_i - \bar{x})' W_i (\bar{x}_i - \bar{x})$$

where

$$\bar{x}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij}, \qquad \bar{x} = W^{-1} \sum_{i=1}^{k} W_i \bar{x}_i, \qquad W_i = (S_i/n_i)^{-1}, \qquad W = \sum_{i=1}^{k} W_i.$$

The critical value is

$$2h(S) = \chi^2 (A + \chi^2 B)$$

where $\chi^2$ is the upper $\alpha$ critical value of the chi-square distribution with r = p(k - 1) degrees of freedom,

$$A = 1 + \frac{1}{2r} \sum_{i=1}^{k} \frac{[\operatorname{tr}(I - W^{-1}W_i)]^2}{n_i - 1},$$

$$B = \frac{1}{r(r + 2)} \left\{ \sum_{i=1}^{k} \frac{\operatorname{tr}(I - W^{-1}W_i)^2}{n_i - 1} + \frac{1}{2} \sum_{i=1}^{k} \frac{[\operatorname{tr}(I - W^{-1}W_i)]^2}{n_i - 1} \right\}.$$

James (1954) also proposed a second order test. It was presented as formula 6.7 in his paper.

Johansen's (1980) Solution

The test statistic is

$$\frac{1}{c} \sum_{i=1}^{k} (\bar{x}_i - \bar{x})' W_i (\bar{x}_i - \bar{x})$$

where

$$c = p(k - 1) + 2A - \frac{6A}{p(k - 1) + 2}$$

and

$$A = \sum_{i=1}^{k} \frac{\operatorname{tr}(I - W^{-1}W_i)^2 + [\operatorname{tr}(I - W^{-1}W_i)]^2}{2(n_i - 1)}.$$

The critical value is F with degrees of freedom $df_1$ and $df_2$, where

$$df_1 = p(k - 1), \qquad df_2 = p(k - 1)[p(k - 1) + 2]/3A.$$

Ito (1969) studied the large sample behavior of James' (1954) test. He used total sample sizes of 300 or 600. For k = 3 and 5, he found that James' test always tends to result in a slight overestimation of significance. This tendency is more severe for unequal sample sizes than for equal sample sizes. In addition, the discrepancy from nominal $\alpha$ increased as p increased from 1 to 4. The robustness of James' test for k > 2 when sample sizes are small to moderate has not been investigated.
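The quantities entering James' first order critical value and Johansen's test can be sketched numerically as follows. This is a hedged illustration under stated assumptions, not the PROC MATRIX program used in the study: the function name `james_johansen`, its arguments, and the dictionary keys are hypothetical, the divisors $n_i - 1$ follow the usual statement of the Welch-James quantities, and the chi-square critical value is passed in by the caller rather than computed.

```python
import numpy as np

def james_johansen(means, covs, ns, chi2_crit):
    """means: k mean vectors; covs: k sample covariance matrices; ns: k sample sizes.
    chi2_crit: upper-alpha chi-square critical value with r = p(k-1) df (supplied by caller)."""
    means = [np.atleast_1d(np.asarray(m, dtype=float)) for m in means]
    k, p = len(means), means[0].size
    r = p * (k - 1)
    W_i = [np.linalg.inv(np.atleast_2d(np.asarray(S, dtype=float)) / n)
           for S, n in zip(covs, ns)]
    W = sum(W_i)
    W_inv = np.linalg.inv(W)
    grand = W_inv @ sum(Wi @ m for Wi, m in zip(W_i, means))   # weighted grand mean
    stat = float(sum((m - grand) @ Wi @ (m - grand) for Wi, m in zip(W_i, means)))

    sum_tr_sq = 0.0   # sum of tr[(I - W^-1 W_i)^2] / (n_i - 1)
    sum_sq_tr = 0.0   # sum of [tr(I - W^-1 W_i)]^2 / (n_i - 1)
    for Wi, n in zip(W_i, ns):
        M = np.eye(p) - W_inv @ Wi
        sum_tr_sq += np.trace(M @ M) / (n - 1)
        sum_sq_tr += np.trace(M) ** 2 / (n - 1)

    # Johansen: statistic / c is referred to F(df1, df2)
    A_joh = (sum_tr_sq + sum_sq_tr) / 2
    c = r + 2 * A_joh - 6 * A_joh / (r + 2)
    df2 = r * (r + 2) / (3 * A_joh)

    # James first order: statistic is compared with chi2 * (A + B * chi2)
    A_james = 1 + sum_sq_tr / (2 * r)
    B_james = (sum_tr_sq + sum_sq_tr / 2) / (r * (r + 2))
    james_crit = chi2_crit * (A_james + B_james * chi2_crit)

    return {"statistic": stat, "johansen_F": stat / c, "df1": r, "df2": df2,
            "james_crit": james_crit}

# Hypothetical univariate example (p = 1, k = 3); 5.991 is the .05 critical value of chi-square(2).
out = james_johansen([0.0, 1.0, 3.0], [1.0, 4.0, 9.0], [10, 15, 20], chi2_crit=5.991)
print(out)
```

With p = 1 the Johansen quantities reduce to Welch's (1951) univariate APDF test, which gives a convenient check on the sketch.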
By using the method of James (1954) to derive the APDF test of Johansen (1980), Kaiser (1983) showed that equivalence to terms of order $1/n_i$ holds between the solutions of Johansen and James.

Summary

Based on the preceding review, one can conclude that the t, F, $T^2$, and MANOVA criteria are not robust to violation of the assumption of equal variances (covariance matrices in the multivariate case) when sample sizes are not equal. Tests proposed to deal with the Behrens-Fisher problem, on the other hand, are more robust than the usual tests. These tests follow one of two approaches. One is to develop the critical value of the test statistic through a series expansion (series solution). The other approach is to approximate the distribution of a test statistic through approximate degrees of freedom (APDF solution). Results of these two approaches agree to order $1/n_i$. The solutions to the multivariate problem are generalizations of the univariate solution.

In general, the tests designed to deal with the Behrens-Fisher problem are liberal, i.e., they tend to have higher Type I error rates than the nominal $\alpha$. However, this liberal tendency is mild except when the sample-size ratios are extreme. Therefore, these tests generally perform much better than the usual tests in controlling the Type I error rate when the assumption of homogeneity of variance (covariance matrices in the multivariate case) is violated.

The preceding studies indicate that APDF solutions are more effective in controlling the Type I error rate than the series solutions of the first order. However, in the univariate case when a second order series solution is used, its Type I error rate is closer to $\alpha$ than that of the APDF solution.

CHAPTER 3
METHODOLOGY

The design of the study and the simulation procedure employed to carry out the study are described in this chapter.
Four factors are the most important in designing a multivariate experiment: number of populations sampled, sample sizes, number of variables, and the significance level. The design of the present study was based on these four factors. In addition, the degree and form of heterogeneity were, of course, studied.

Design

The design of the study was based on the consideration that experimental conditions included in the simulation should reflect the reality of multivariate experiments in educational research.

Number of dimensions (p). Data were generated to simulate experiments in which there are p = 3 or p = 6 response variates. It seems likely that p > 6 occurs seldom in educational research, and therefore p = 6 was the upper limit chosen for the study.

Number of populations sampled (k). All of the simulated experiments had a simple one-way design with k = 3. Algina and Tang (1988) investigated the behavior of James' (1954) and Yao's (1965) tests for k = 2. Therefore, the present study will concentrate on investigating the behavior of James' and Johansen's (1980) tests when k > 2. In the univariate literature (Dijkstra & Werter, 1981), it appears that the advantage of James' second order test is in experiments with many groups. However, experiments, especially multivariate experiments, with a large number of groups seem to be rare in educational research. Therefore, k was kept at three. The simplification of this factor allowed more levels of another factor, the form of covariance matrix heteroscedasticity, to be investigated more thoroughly.

Sample-size ratios ($n_1:n_2:n_3$). Both equal and unequal sample sizes were used in the study. When the sample sizes were unequal, the sample-size ratios were small to moderately large. The basic ratios of $n_1:n_2:n_3$ used in the simulation are presented in Table 1.
Table 1
Sample-Size Ratios ($n_1:n_2:n_3$)

  n1 : n2  : n3
   1 : 1   : 1.3
   1 : 1   : 2
   1 : 1.3 : 1.3
   1 : 2   : 2

In some cases these basic ratios could not be maintained exactly because of the restriction of a fixed total sample size (N). The departure from these basic ratios was made as small as possible.

Ratio of total sample size (N) to number of variables. The ratio of N to p was 10, 15, or 20. Total sample sizes of 10p, 15p, or 20p were assigned to the k groups based on the ratios shown in Table 1. The sample sizes used in the present study are displayed in Tables 2 and 3. Algina and Tang (1988) studied the case of k = 2 samples and included N:p = 6, 10, and 20. They reported that N:p should be at least 10 to be reasonably sure that Yao's test, which was generally more robust than James' test, would be robust. With k > 2, it seems likely that N:p will need to be at least 10 for robustness to obtain. Therefore, in the present study, the upper limit of 20 was chosen to represent moderately large experiments: a total sample size of 60 when p = 3 and a total sample size of 120 when p = 6.

Heterogeneity of covariance matrices. Each population with a covariance matrix equal to a p x p identity matrix (I) will be called an "uncontaminated" population. Each population with a p x p diagonal covariance matrix D with at least one diagonal element not equal to one will be called a "contaminated" population. Three forms of D were used in this study. They are shown in Table 4.

Table 2
Sample Sizes When p = 3

  N/p    n1    n2    n3
  10      9     9    12
  10      7     7    16
  10      8    11    11
  10      6    12    12
  15     13    13    19
  15     11    11    23
  15     13    16    16
  15      9    18    18
  20     18    18    24
  20     15    15    30
  20     16    22    22
  20     12    24    24

Two levels of d, $d = \sqrt{1.5}$ and d = 3.0, were employed to simulate heterogeneity of covariance matrices. In Algina and Tang's (1988) study, d was chosen to be 1.5, 2.0, 2.5, or 3.0.
In Olson's (1974) study, d was chosen to be 2.0, 3.0, or 6.0. They found that the smaller the d value, the more robust the test. In the present study, $d = \sqrt{1.5}$ was used to simulate a small degree of heteroscedasticity, which seems more likely to be common in educational experiments than extreme heteroscedasticity, such as d = 6.0. In the present study, d = 3.0 was selected to investigate a larger degree of heteroscedasticity and to permit comparison of the results of the present study with those of Algina and Tang (1988) and Olson (1974).

Table 3
Sample Sizes When p = 6

  N/p    n1    n2    n3
  10     18    18    24
  10     15    15    30
  10     16    22    22
  10     12    24    24
  15     27    27    36
  15     22    22    46
  15     26    32    32
  15     18    36    36
  20     36    36    48
  20     30    30    60
  20     32    44    44
  20     24    48    48

Table 4
Forms of Covariance Matrix D

  Form   p = 3                          p = 6
  1      $diag(d^2, d^2, d^2)$          $diag(d^2, d^2, d^2, d^2, d^2, d^2)$
  2      $diag(1, d^2, d^2)$            $diag(1, 1, 1, d^2, d^2, d^2)$
  3      $diag(1/d^2, d^2, d^2)$        $diag(1/d^2, 1/d^2, 1/d^2, d^2, d^2, d^2)$

Two relationships between sample size and covariance matrices were employed. In the positive relationship, the larger samples were associated with D. In the negative relationship, the smaller samples were associated with D. These two relationships are summarized in Table 5. For equal sample sizes, the two nonredundant heteroscedasticity patterns, (I, I, D) and (I, D, D), were employed.

Table 5
Combination of Sample-Size Ratios and Contamination

  Sample-Size Ratios        Relationship
  n1  : n2  : n3            Positive      Negative
  1.0 : 1.0 : 1.3           I  I  D       D  D  I
  1.0 : 1.0 : 2.0           I  I  D       D  D  I
  1.0 : 1.3 : 1.3           I  D  D       D  I  I
  1.0 : 2.0 : 2.0           I  D  D       D  I  I

Significance levels. Three levels of significance were employed: .10, .05, and .01.

Invariance Property of the Test Statistic

The form of heteroscedasticity investigated in this study is that the populations in an experiment can be divided into two subsets; each subset has a common variance-covariance matrix, with the matrices varying across subsets.
In the simulated experiments the variance-covariance matrix for one subset was an identity matrix, whereas the matrix for the second subset was a diagonal matrix. These matrix forms entail no loss of generality beyond that due to the limited form of heteroscedasticity investigated in this study. The reasons there is no additional loss of generality are twofold. First, by a well-known theorem (Anderson, 1958), for any pair of positive definite matrices there exists a matrix C which can transform the matrices into an identity matrix and a diagonal matrix simultaneously. Second, as shown below, the test statistics are invariant under transformations $y_i = C\bar{x}_i$, i = 1, ..., k, where $\bar{x}_i$ is the observed mean vector, and C is a p x p nonsingular matrix.

In this section, the invariance property of the test statistic and the critical values are proved. The estimated variance-covariance matrix of $\bar{x}_i$ is $A_i = S_i/n_i$. That of $y_i$ is $CA_iC'$. The test statistic of both James' and Johansen's procedures is

$$t'V^{-1}t = \sum_{i=1}^{k} (\bar{x}_i - \bar{x})' W_i (\bar{x}_i - \bar{x}),$$

where $W_i = A_i^{-1}$ and $W = \sum_{i=1}^{k} W_i$; $t' = (t_1', t_2', \ldots, t_{k-1}')$, and $t_i = \bar{x}_i - \bar{x}_k$, i = 1, ..., k - 1. The covariance matrix of t is

$$\operatorname{Cov}(t) = V = \begin{pmatrix} A_1 + A_k & A_k & \cdots & A_k \\ A_k & A_2 + A_k & \cdots & A_k \\ \vdots & \vdots & \ddots & \vdots \\ A_k & A_k & \cdots & A_{k-1} + A_k \end{pmatrix}.$$

Let $t_i^* = Ct_i$, and $M = I \otimes C$, where $\otimes$ denotes the Kronecker product. Then $t^* = Mt$, and

$$\operatorname{Cov}(t^*) = \operatorname{Cov}(Mt) = \begin{pmatrix} C(A_1 + A_k)C' & CA_kC' & \cdots & CA_kC' \\ CA_kC' & C(A_2 + A_k)C' & \cdots & CA_kC' \\ \vdots & \vdots & \ddots & \vdots \\ CA_kC' & CA_kC' & \cdots & C(A_{k-1} + A_k)C' \end{pmatrix} = MVM' = V^*.$$

Now,

$$t^{*\prime}V^{*-1}t^* = (Mt)'(MVM')^{-1}(Mt) = t'M'M'^{-1}V^{-1}M^{-1}Mt = t'V^{-1}t.$$

Therefore the test statistic is invariant under nonsingular transformation.

For both James' first order and Johansen's critical values, the quantities that are relevant to the invariance property are

$$G = \operatorname{tr}(I - W^{-1}W_i)^r, \qquad r = 1, 2, 3, \ldots$$

The critical value of James' second order procedure involves G and traces of cyclical products such as

$$E = W^{-1}W_i W^{-1}W_j W^{-1}W_l.$$

Let $\lambda_j$, j = 1, ..., p, be the eigenvalues of $I - W^{-1}W_i$. Therefore $\lambda_j^r$, j = 1, ..., p, are the eigenvalues of $(I - W^{-1}W_i)^r$.
We need only to show that the eigenvalues of $I - W^{-1}W_i$ are invariant, since

$$\operatorname{tr}(I - W^{-1}W_i)^r = \sum_{j=1}^{p} \lambda_j^r.$$

The $\lambda$s are the solutions to the equation

$$|I - W^{-1}W_i - \lambda I| = 0, \quad \text{or} \quad |W^{-1}W_i - (1 - \lambda)I| = 0.$$

For the transformed variables, $W_i^* = (CA_iC')^{-1} = C'^{-1}W_iC^{-1}$ and $W^{*-1} = CW^{-1}C'$, so that

$$|CW^{-1}C'C'^{-1}W_iC^{-1} - (1 - \lambda^*)I| = 0, \quad \text{or} \quad |CW^{-1}W_iC^{-1} - (1 - \lambda^*)I| = 0.$$

If we premultiply by $C^{-1}$ and postmultiply by C, the eigenvalues remain unchanged. Carrying out these multiplications gives

$$|W^{-1}W_i - (1 - \lambda^*)I| = 0,$$

which proves $\lambda^* = \lambda$. Therefore $G = \operatorname{tr}(I - W^{-1}W_i)^r$ is invariant.

By the property that tr(XY) = tr(YX), it is obvious that

$$\operatorname{tr}(E^*) = \operatorname{tr}(CW^{-1}W_i W^{-1}W_j W^{-1}W_l C^{-1}) = \operatorname{tr}(E)$$

and that the traces of other cyclical products of $W^{-1}W_i$ and $W^{-1}W_j$ will also be invariant.

By the preceding results, using only diagonal matrices to simulate experiments in which there are only two sets of covariance matrices is justified. However, when there are more than two sets of different matrices, the matrices cannot always be simultaneously diagonalized by a matrix C.

Simulation Procedure

For each condition, data were generated, and the performance of the Pillai-Bartlett (V) test, James' first order test, James' second order test, and Johansen's test was evaluated using the generated data.

Generating mean vectors. In order to generate a mean vector, a p x 1 vector of uncorrelated standard normal pseudorandom variables was generated by using the RANNOR function in SAS (1985). Multiplying the vector by $1/\sqrt{n_i}$, a sample mean vector from an uncontaminated population was obtained. Multiplying the vector by $(1/\sqrt{n_i})D^{1/2}$, a sample mean vector from a contaminated population was obtained.

Generating covariance matrices. Let T be a p x p lower triangular matrix. Kshirsagar (1959) showed that if $T_{ij} \sim N(0, 1)$ for i > j, and $T_{ii}^2 \sim \chi^2_{n-i}$, where n is the number of observations in a sample, then $TT' \sim W(n - 1, I_p)$, a Wishart distribution with n - 1 degrees of freedom. Thus $S = (n - 1)^{-1}TT'$ is a covariance matrix for a sample of size n from a population with covariance matrix I.
That is, S is a sample covariance matrix from an uncontaminated population. Pre- and postmultiplying S by $D^{1/2}$, a covariance matrix from a contaminated population is obtained. By using the RANNOR function described above, the $T_{ij}$s are generated; by using the RANGAM function, the $T_{ii}$s are generated.

Evaluating the performance of test statistics. For each replication, the data generated were analyzed by the Pillai-Bartlett trace (V), James' first order and second order tests, and Johansen's test. The test statistics and the critical values were calculated using PROC MATRIX in SAS. The proportion of 2000 replications that yielded significant results at $\alpha$ = .10, .05, and .01 was recorded.

Summary

Combining one level of k (k = 3), two levels of p (p = 3 or 6), four levels of sample-size ratios (see Table 1), three levels of N/p (10, 15, 20), two levels of d ($d = \sqrt{1.5}$ or 3.0), three forms of D, and two forms of contamination (positive or negative), 288 experimental conditions are obtained with unequal sample sizes. In addition there were 72 conditions with equal sample sizes. The Pillai-Bartlett trace (V), James' first order and second order tests, and Johansen's test were applied in these experimental conditions. A general picture of the behavior of these tests was obtained from these 360 simulated experiments.

CHAPTER 4
RESULTS AND DISCUSSIONS

In this chapter, the results for nominal $\alpha$ = .05 are presented. In general, the major patterns of results are similar for $\alpha$ = .01, .05, and .10. Consequently, the results for $\alpha$ = .01 and $\alpha$ = .10 are tabled in Appendices A, B, and C. The chapter is divided into two major sections. In the first section the results are presented for conditions with equal sample sizes. The results for conditions with unequal sample sizes are reported in the second section of the chapter.
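The data-generation steps described in the Simulation Procedure section of Chapter 3 can be sketched as follows. This is an illustration under stated assumptions rather than the original SAS program: NumPy's random generator stands in for the RANNOR and RANGAM functions, the function name `sample_mean_and_cov` and its arguments are hypothetical, and the chi-square degrees of freedom n - i on the diagonal follow the usual statement of the triangular (Bartlett-type) decomposition.

```python
import numpy as np

def sample_mean_and_cov(n, d_sq_diag, rng):
    """Draw a sample mean vector and sample covariance matrix for a sample of size n
    from a p-variate normal population with mean 0 and covariance diag(d_sq_diag).
    Pass all ones in d_sq_diag for an uncontaminated population."""
    root_D = np.sqrt(np.asarray(d_sq_diag, dtype=float))    # diagonal of D^(1/2)
    p = root_D.size
    # Mean vector: (1/sqrt(n)) D^(1/2) z, with z a vector of standard normal deviates.
    mean = root_D * rng.standard_normal(p) / np.sqrt(n)
    # Lower triangular T: N(0, 1) below the diagonal, sqrt(chi-square) on the diagonal,
    # with n - i degrees of freedom for the ith diagonal element (i = 1, ..., p).
    T = np.tril(rng.standard_normal((p, p)), k=-1)
    dfs = n - 1 - np.arange(p)
    T[np.diag_indices(p)] = np.sqrt(rng.chisquare(dfs))
    S = (T @ T.T) / (n - 1)                                  # covariance matrix for Sigma = I
    # Pre- and postmultiply by D^(1/2) for a contaminated population.
    S = root_D[:, None] * S * root_D[None, :]
    return mean, S

# Hypothetical example: p = 3, D = diag(1, d^2, d^2) with d = 3, n = 12, arbitrary seed.
rng = np.random.default_rng(1989)
mean, S = sample_mean_and_cov(n=12, d_sq_diag=[1.0, 9.0, 9.0], rng=rng)
print(mean)
print(S)
```

Generating the mean vector and covariance matrix directly, rather than raw observations, relies on their independence under normality and keeps each replication cheap.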
The Performance of the Four Multivariate Tests When $n_1 = n_2 = n_3$

The Performance of the Four Tests When $d = \sqrt{1.5}$

Estimated Type I error rates ($\hat{\tau}$s) are reported in Tables 6, 7, and 8 for the Pillai-Bartlett trace, Johansen's test, James' first order test, and James' second order test; these results are for $\alpha$ = .05 in conditions in which $n_1 = n_2 = n_3$ and $d = \sqrt{1.5}$. Thirty-six $\hat{\tau}$s are reported for each test. For the Pillai-Bartlett trace, one of the $\hat{\tau}$s falls outside the criterion interval $\alpha \pm 2\sqrt{\alpha(1 - \alpha)/2000}$. For Johansen's test, five $\hat{\tau}$s fall outside the criterion interval. The largest $\hat{\tau}$ was .065; four of the five occur when $n_i = 10$, i = 1, 2, 3. The number of $\hat{\tau}$s outside the interval is 21 and 16 for James' first order test and second order test, respectively. For James' first order test, the 21 $\hat{\tau}$s outside the interval are larger than $\alpha$ = .05. The $\hat{\tau}$s for the first order test tend to be larger than those for Johansen's test. For James' second order test, the 16 $\hat{\tau}$s outside the interval are less than $\alpha$ = .05. James' second order test exhibited some tendency to perform better with p = 3 than with p = 6. However, in general p did not appear to have a strong effect for any of the tests. Neither did the pattern of contamination (I:I:D vs. I:D:D). As N/p increased, there was some tendency for the performance of James' and Johansen's tests to improve. The second order test appears to have $\hat{\tau}$ nearer $\alpha$ for $D \neq d^2 I$ than for $D = d^2 I$. The first order test and Johansen's test did not appear to be affected by the form of D.

Table 6
Actual Type I Error Rates for Four Multivariate Tests: $n_1 = n_2 = n_3$, $D = d^2 I$, $d = \sqrt{1.5}$, and $\alpha$ = .05

  p   N/p   Pillai-Bartlett   Johansen's   James' 1st    James' 2nd
            trace             test         order test    order test
  I:I:D
  3   10    .0450             .0460        .0605         .0330
  3   15    .0485             .0530        .0615         .0455
  3   20    .0520             .0560        .0615         .0505
  6   10    .0445             .0520        .0635         .0330
  6   15    .0480             .0510        .0550         .0365
  6   20    .0475             .0460        .0495         .0375
  I:D:D
  3   10    .0430             .0540        .0665         .0385
  3   15    .0495             .0565        .0635         .0500
  3   20    .0485             .0535        .0570         .0480
  6   10    .0455             .0560        .0695         .0255
  6   15    .0450             .0475        .0530         .0375
  6   20    .0555             .0605        .0655         .0500

Note. The underlined $\hat{\tau}$s are outside the interval $\alpha \pm 2\sqrt{\alpha(1 - \alpha)/2000}$.

Table 7
Actual Type I Error Rates for Four Multivariate Tests: $n_1 = n_2 = n_3$, $D = diag(d^2, d^2, 1)$ or $D = diag(d^2, d^2, d^2, 1, 1, 1)$, $d = \sqrt{1.5}$, and $\alpha$ = .05

  p   N/p   Pillai-Bartlett   Johansen's   James' 1st    James' 2nd
            trace             test         order test    order test
  I:I:D
  3   10    .0455             .0625        .0745         .0430
  3   15    .0465             .0540        .0585         .0490
  3   20    .0525             .0555        .0620         .0500
  6   10    .0570             .0650        .0855         .0355
  6   15    .0545             .0590        .0665         .0420
  6   20    .0570             .0540        .0580         .0455
  I:D:D
  3   10    .0480             .0610        .0755         .0410
  3   15    .0505             .0555        .0660         .0455
  3   20    .0465             .0480        .0540         .0460
  6   10    .0435             .0565        .0715         .0275
  6   15    .0435             .0475        .0525         .0350
  6   20    .0495             .0525        .0570         .0390

Note. The underlined $\hat{\tau}$s are outside the interval $\alpha \pm 2\sqrt{\alpha(1 - \alpha)/2000}$.

Table 8
Actual Type I Error Rates for Four Multivariate Tests: $n_1 = n_2 = n_3$, $D = diag(d^2, d^2, 1/d^2)$ or $D = diag(d^2, d^2, d^2, 1/d^2, 1/d^2, 1/d^2)$, $d = \sqrt{1.5}$, and $\alpha$ = .05

  p   N/p   Pillai-Bartlett   Johansen's   James' 1st    James' 2nd
            trace             test         order test    order test
  I:I:D
  3   10    .0470             .0555        .0730         .0380
  3   15    .0505             .0515        .0565         .0440
  3   20    .0385             .0435        .0475         .0400
  6   10    .0465             .0515        .0675         .0265
  6   15    .0450             .0555        .0620         .0400
  6   20    .0570             .0570        .0620         .0515
  I:D:D
  3   10    .0515             .0535        .0660         .0420
  3   15    .0515             .0510        .0595         .0455
  3   20    .0405             .0500        .0555         .0455
  6   10    .0505             .0625        .0790         .0405
  6   15    .0485             .0475        .0535         .0365
  6   20    .0565             .0535        .0580         .0440

Note. The underlined $\hat{\tau}$s are outside the interval $\alpha \pm 2\sqrt{\alpha(1 - \alpha)/2000}$.

The Performance of the Four Tests When d = 3

In Tables 9, 10, and 11, 36 $\hat{\tau}$s are reported for each test under the condition in which $n_1 = n_2 = n_3$ and d = 3. For the Pillai-Bartlett trace, 29 fall outside the criterion interval. For Johansen's test, 12 fall outside the interval. All of these are larger than .05. Eight of the $\hat{\tau}$s outside the interval occur in conditions in which $n_i = 10$. For James' first order test, 22 fall outside the interval; all are larger than .05.
For James' second order test, 17 fall outside the interval; all are smaller than .05. In general, the effects of the factors on James' and Johansen's tests are similar to their effects for $d = \sqrt{1.5}$. However, the effect of p on the second order test appears to be stronger when d = 3 than when $d = \sqrt{1.5}$.

Table 9
Actual Type I Error Rates for Four Multivariate Tests: $n_1 = n_2 = n_3$, $D = d^2 I$, d = 3, and $\alpha$ = .05

  p   N/p   Pillai-Bartlett   Johansen's   James' 1st    James' 2nd
            trace             test         order test    order test
  I:I:D
  3   10    .0920             .0665        .0830         .0430
  3   15    .0855             .0530        .0590         .0425
  3   20    .0890             .0555        .0595         .0505
  6   10    .0860             .0690        .0790         .0395
  6   15    .0905             .0550        .0635         .0395
  6   20    .0795             .0435        .0505         .0320
  I:D:D
  3   10    .0505             .0585        .0800         .0315
  3   15    .0615             .0575        .0665         .0425
  3   20    .0625             .0520        .0580         .0435
  6   10    .0635             .0770        .0875         .0325
  6   15    .0655             .0610        .0680         .0340
  6   20    .0690             .0605        .0665         .0475

Note. The underlined $\hat{\tau}$s are outside the interval $\alpha \pm 2\sqrt{\alpha(1 - \alpha)/2000}$.

Table 10
Actual Type I Error Rates for Four Multivariate Tests: $n_1 = n_2 = n_3$, $D = diag(d^2, d^2, 1)$ or $D = diag(d^2, d^2, d^2, 1, 1, 1)$, d = 3, and $\alpha$ = .05

  p   N/p   Pillai-Bartlett   Johansen's   James' 1st    James' 2nd
            trace             test         order test    order test
  I:I:D
  3   10    .0715             .0590        .0775         .0430
  3   15    .0685             .0405        .0470         .0350
  3   20    .0640             .0450        .0485         .0390
  6   10    .0725             .0640        .0800         .0315
  6   15    .0680             .0475        .0560         .0365
  6   20    .0765             .0515        .0545         .0410
  I:D:D
  3   10    .0535             .0590        .0750         .0385
  3   15    .0595             .0610        .0665         .0510
  3   20    .0590             .0490        .0540         .0430
  6   10    .0480             .0580        .0720         .0280
  6   15    .0460             .0535        .0560         .0375
  6   20    .0560             .0470        .0505         .0395

Note. The underlined $\hat{\tau}$s are outside the interval $\alpha \pm 2\sqrt{\alpha(1 - \alpha)/2000}$.

Table 11
Actual Type I Error Rates for Four Multivariate Tests: $n_1 = n_2 = n_3$, $D = diag(d^2, d^2, 1/d^2)$ or $D = diag(d^2, d^2, d^2, 1/d^2, 1/d^2, 1/d^2)$, d = 3, and $\alpha$ = .05

  p   N/p   Pillai-Bartlett   Johansen's   James' 1st    James' 2nd
            trace             test         order test    order test
  I:I:D
  3   10    .0780             .0665        .0810         .0460
  3   15    .0735             .0520        .0605         .0425
  3   20    .0740             .0475        .0515         .0430
  6   10    .0735             .0675        .0800         .0370
  6   15    .0760             .0665        .0755         .0480
  6   20    .0775             .0505        .0545         .0425
  I:D:D
  3   10    .0705             .0700        .0845         .0365
  3   15    .0745             .0580        .0655         .0400
  3   20    .0710             .0505        .0565         .0410
  6   10    .0665             .0625        .0785         .0440
  6   15    .0735             .0525        .0610         .0435
  6   20    .0685             .0540        .0600         .0485

Note. The underlined $\hat{\tau}$s are outside the interval $\alpha \pm 2\sqrt{\alpha(1 - \alpha)/2000}$.

Summary

The Pillai-Bartlett test worked well when the degree of heterogeneity was small, $d = \sqrt{1.5}$, but had excessive Type I error rates when the degree of heterogeneity was large, d = 3.0. Johansen's test worked reasonably well, particularly with N/p > 15. Consequently, it probably should be used in place of the usual multivariate criteria. James' first order test is not competitive; it had more $\hat{\tau}$s outside the interval than Johansen's test had and tended to have $\hat{\tau}$ larger than that for Johansen's test. James' second order test had more $\hat{\tau}$s outside the criterion interval than Johansen's test had. This seems to favor Johansen's test. However, some researchers may favor the second order test because this test tended to have $\hat{\tau}$ smaller than $\alpha$, whereas Johansen's test tended to have $\hat{\tau}$ larger than $\alpha$.

The Performance of the Four Multivariate Tests When Sample Sizes Are Unequal

Estimated Type I error rates are reported for conditions in which the sample sizes are unequal. The section is divided into three subsections defined by N/p (10, 15, or 20). In each subsection, 96 $\hat{\tau}$s are reported for each of the four tests: Pillai-Bartlett, Johansen, James' first order, and James' second order.
For each subsection, there are eight tables, one for each combination of p (3 or 6), direction of the relationship between the sample sizes and covariance matrices (positive or negative), and d ($\sqrt{1.5}$ or 3.0).

The Performance of the Four Tests When N/p = 10

Forty-eight estimated $\hat{\tau}$s for each of the four tests for the conditions in which N/p = 10 and $d = \sqrt{1.5}$ are reported in Tables 12 through 15. The number of $\hat{\tau}$s outside the interval is 20 for the Pillai-Bartlett trace, 30 for the second order test, 37 for Johansen's test, and 48 for the first order test. Estimated $\hat{\tau}$s for conditions in which d = 3.0 are reported in Tables 16 through 19. The number of $\hat{\tau}$s outside the criterion interval is 44, 31, 37, and 48 for the Pillai-Bartlett trace, James' second order test, Johansen's test, and James' first order test, respectively.

For the Pillai-Bartlett trace, when d changes from $\sqrt{1.5}$ to 3.0, the number of $\hat{\tau}$s outside the criterion interval increases by 120%. The decline in performance is most notable for the negative condition, in which only five $\hat{\tau}$s are outside the criterion interval for $d = \sqrt{1.5}$ but all 24 $\hat{\tau}$s are outside the criterion interval for d = 3.0. For the other tests the number of $\hat{\tau}$s outside the criterion interval does not change dramatically when d increases from $\sqrt{1.5}$ to 3. Based on the number of $\hat{\tau}$s outside the criterion interval it appears that there was only a slight decline in the performance of Johansen's test and the second order test as d increased. These frequencies are misleading for Johansen's test: there is a marked increase in $\hat{\tau}$ in the negative condition when d changes from $\sqrt{1.5}$ to 3. Specifically, the mean $\hat{\tau}$ increases from .0710 to .0863.

Table 12
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 10, p = 3, $d = \sqrt{1.5}$, and $\alpha$ = .05

  n1   n2   n3   Pillai-Bartlett   Johansen's   James' 1st    James' 2nd
                 trace             test         order test    order test
  $D = d^2 I$
   9    9   12   .0375             .0520        .0625         .0375
   7    7   16   .0240             .0735        .0930         .0455
   8   11   11   .0380             .0530        .0620         .0380
   6   12   12   .0380             .0765        .0955         .0540
  $D = diag(d^2, d^2, 1)$
   9    9   12   .0345             .0515        .0640         .0350
   7    7   16   .0285             .0675        .0920         .0370
   8   11   11   .0370             .0520        .0735         .0365
   6   12   12   .0430             .0775        .0990         .0515
  $D = diag(d^2, d^2, 1/d^2)$
   9    9   12   .0410             .0580        .0725         .0375
   7    7   16   .0325             .0705        .0965         .0445
   8   11   11   .0435             .0610        .0780         .0375
   6   12   12   .0455             .0765        .0970         .0485

Note. When $n_1 = n_2 \neq n_3$, the form of contamination is I:I:D; when $n_1 \neq n_2 = n_3$, the form of contamination is I:D:D. The underlined $\hat{\tau}$s are outside the interval $\alpha \pm 2\sqrt{\alpha(1 - \alpha)/2000}$.

Table 13
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 10, p = 3, $d = \sqrt{1.5}$, and $\alpha$ = .05

  n1   n2   n3   Pillai-Bartlett   Johansen's   James' 1st    James' 2nd
                 trace             test         order test    order test
  $D = d^2 I$
   9    9   12   .0540             .0555        .0720         .0395
   7    7   16   .0725             .0845        .1115         .0540
   8   11   11   .0535             .0630        .0820         .0435
   6   12   12   .0580             .0810        .1035         .0495
  $D = diag(d^2, d^2, 1)$
   9    9   12   .0475             .0575        .0785         .0385
   7    7   16   .0580             .0725        .0980         .0490
   8   11   11   .0540             .0685        .0880         .0460
   6   12   12   .0675             .0855        .1100         .0590
  $D = diag(d^2, d^2, 1/d^2)$
   9    9   12   .0435             .0630        .0845         .0425
   7    7   16   .0530             .0695        .0875         .0420
   8   11   11   .0525             .0610        .0745         .0415
   6   12   12   .0580             .0825        .1030         .0490

Note. When $n_1 = n_2 \neq n_3$, the form of contamination is D:D:I; when $n_1 \neq n_2 = n_3$, the form of contamination is D:I:I. The underlined $\hat{\tau}$s are outside the interval $\alpha \pm 2\sqrt{\alpha(1 - \alpha)/2000}$.

Table 14
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 10, p = 6, $d = \sqrt{1.5}$, and $\alpha$ = .05

  n1   n2   n3   Pillai-Bartlett   Johansen's   James' 1st    James' 2nd
                 trace             test         order test    order test
  $D = d^2 I$
  18   18   24   .0335             .0530        .0695         .0280
  15   15   30   .0255             .0645        .0820         .0375
  16   22   22   .0335             .0560        .0685         .0265
  12   24   24   .0320             .0635        .0820         .0295
  $D = diag(d^2, d^2, d^2, 1, 1, 1)$
  18   18   24   .0410             .0615        .0755         .0335
  15   15   30   .0380             .0835        .1000         .0335
  16   22   22   .0375             .0650        .0770         .0330
  12   24   24   .0400             .0685        .0900         .0330
  $D = diag(d^2, d^2, d^2, 1/d^2, 1/d^2, 1/d^2)$
  18   18   24   .0470             .0690        .0870         .0320
  15   15   30   .0445             .0715        .0920         .0280
  16   22   22   .0510             .0630        .0730         .0330
  12   24   24   .0540             .0805        .0995         .0405

Note. When $n_1 = n_2 \neq n_3$, the form of contamination is I:I:D; when $n_1 \neq n_2 = n_3$, the form of contamination is I:D:D. The underlined $\hat{\tau}$s are outside the interval $\alpha \pm 2\sqrt{\alpha(1 - \alpha)/2000}$.

Table 15
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 10, p = 6, $d = \sqrt{1.5}$, and $\alpha$ = .05

  n1   n2   n3   Pillai-Bartlett   Johansen's   James' 1st    James' 2nd
                 trace             test         order test    order test
  $D = d^2 I$
  18   18   24   .0530             .0565        .0670         .0290
  15   15   30   .0725             .0675        .0870         .0315
  16   22   22   .0595             .0720        .0855         .0355
  12   24   24   .0825             .0940        .1190         .0425
  $D = diag(d^2, d^2, d^2, 1, 1, 1)$
  18   18   24   .0570             .0595        .0735         .0290
  15   15   30   .0560             .0720        .0895         .0305
  16   22   22   .0585             .0700        .0840         .0365
  12   24   24   .0670             .0865        .1070         .0425
  $D = diag(d^2, d^2, d^2, 1/d^2, 1/d^2, 1/d^2)$
  18   18   24   .0560             .0670        .0825         .0360
  15   15   30   .0500             .0680        .0865         .0310
  16   22   22   .0475             .0675        .0820         .0335
  12   24   24   .0540             .0790        .0995         .0305

Note. When $n_1 = n_2 \neq n_3$, the form of contamination is D:D:I; when $n_1 \neq n_2 = n_3$, the form of contamination is D:I:I. The underlined $\hat{\tau}$s are outside the interval $\alpha \pm 2\sqrt{\alpha(1 - \alpha)/2000}$.

Table 16
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 10, p = 3, d = 3, and $\alpha$ = .05

  n1   n2   n3   Pillai-Bartlett   Johansen's   James' 1st    James' 2nd
                 trace             test         order test    order test
  $D = d^2 I$
   9    9   12   .0365             .0525        .0645         .0360
   7    7   16   .0065             .0565        .0775         .0375
   8   11   11   .0375             .0675        .0815         .0460
   6   12   12   .0200             .0605        .0750         .0455
  $D = diag(d^2, d^2, 1)$
   9    9   12   .0425             .0580        .0740         .0365
   7    7   16   .0125             .0585        .0785         .0390
   8   11   11   .0355             .0660        .0810         .0450
   6   12   12   .0280             .0520        .0695         .0360
  $D = diag(d^2, d^2, 1/d^2)$
   9    9   12   .0490             .0565        .0720         .0375
   7    7   16   .0450             .0695        .0920         .0435
   8   11   11   .0770             .0580        .0715         .0445
   6   12   12   .0960             .0650        .0775         .0460

Note. When $n_1 = n_2 \neq n_3$, the form of contamination is I:I:D; when $n_1 \neq n_2 = n_3$, the form of contamination is I:D:D. The underlined $\hat{\tau}$s are outside the interval $\alpha \pm 2\sqrt{\alpha(1 - \alpha)/2000}$.

Table 17
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 10, p = 3, d = 3, and $\alpha$ = .05

  n1   n2   n3   Pillai-Bartlett   Johansen's   James' 1st    James' 2nd
                 trace             test         order test    order test
  $D = d^2 I$
   9    9   12   .1055             .0885        .1145         .0590
   7    7   16   .2495             .1050        .1475         .0580
   8   11   11   .1815             .0890        .1075         .0660
   6   12   12   .2900             .1305        .1625         .0855
  $D = diag(d^2, d^2, 1)$
   9    9   12   .0865             .0725        .0985         .0470
   7    7   16   .1555             .0860        .1215         .0530
   8   11   11   .1345             .0775        .0970         .0495
   6   12   12   .2050             .1025        .1295         .0715
  $D = diag(d^2, d^2, 1/d^2)$
   9    9   12   .0685             .0725        .0895         .0500
   7    7   16   .1225             .0910        .1160         .0570
   8   11   11   .1145             .0615        .0775         .0430
   6   12   12   .1795             .0825        .1010         .0575

Note. When $n_1 = n_2 \neq n_3$, the form of contamination is D:D:I; when $n_1 \neq n_2 = n_3$, the form of contamination is D:I:I. The underlined $\hat{\tau}$s are outside the interval $\alpha \pm 2\sqrt{\alpha(1 - \alpha)/2000}$.

Table 18
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 10, p = 6, d = 3, and $\alpha$ = .05

  n1   n2   n3   Pillai-Bartlett   Johansen's   James' 1st    James' 2nd
                 trace             test         order test    order test
  $D = d^2 I$
  18   18   24   .0290             .0575        .0750         .0275
  15   15   30   .0055             .0685        .0860         .0395
  16   22   22   .0310             .0770        .0950         .0375
  12   24   24   .0145             .0650        .0740         .0315
  $D = diag(d^2, d^2, d^2, 1, 1, 1)$
  18   18   24   .0340             .0565        .0675         .0285
  15   15   30   .0120             .0700        .0840         .0265
  16   22   22   .0380             .0545        .0715         .0290
  12   24   24   .0285             .0735        .0905         .0335
  $D = diag(d^2, d^2, d^2, 1/d^2, 1/d^2, 1/d^2)$
  18   18   24   .0675             .0710        .0880         .0375
  15   15   30   .0720             .0795        .1000         .0360
  16   22   22   .1060             .0625        .0770         .0285
  12   24   24   .1680             .0710        .0885         .0345

Note. When $n_1 = n_2 \neq n_3$, the form of contamination is I:I:D; when $n_1 \neq n_2 = n_3$, the form of contamination is I:D:D. The underlined $\hat{\tau}$s are outside the interval $\alpha \pm 2\sqrt{\alpha(1 - \alpha)/2000}$.

Table 19
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 10, p = 6, d = 3, and $\alpha$ = .05

  n1   n2   n3   Pillai-Bartlett   Johansen's   James' 1st    James' 2nd
                 trace             test         order test    order test
  $D = d^2 I$
  18   18   24   .1280             .0915        .1155         .0320
  15   15   30   .2650             .1070        .1370         .0385
  16   22   22   .1990             .0875        .1105         .0435
  12   24   24   .3940             .1480        .1705         .0680
  $D = diag(d^2, d^2, d^2, 1, 1, 1)$
  18   18   24   .0745             .0685        .0800         .0275
  15   15   30   .1365             .0930        .1115         .0350
  16   22   22   .1275             .0720        .0870         .0355
  12   24   24   .1925             .0890        .1085         .0405
  $D = diag(d^2, d^2, d^2, 1/d^2, 1/d^2, 1/d^2)$
  18   18   24   .0495             .0590        .0720         .0260
  15   15   30   .0620             .0685        .0870         .0330
  16   22   22   .1095             .0670        .0800         .0325
  12   24   24   .1485             .0620        .0725         .0340

Note. When $n_1 = n_2 \neq n_3$, the form of contamination is D:D:I; when $n_1 \neq n_2 = n_3$, the form of contamination is D:I:I. The underlined $\hat{\tau}$s are outside the interval $\alpha \pm 2\sqrt{\alpha(1 - \alpha)/2000}$.

When $d = \sqrt{1.5}$, the estimated $\hat{\tau}$s outside the criterion interval for the Pillai-Bartlett trace are smaller than $\alpha$ = .05 under positive conditions (larger samples obtained from the populations having the contaminated variance-covariance matrices) and larger than $\alpha$ under negative conditions (smaller samples obtained from the populations having the contaminated variance-covariance matrices). This is similar to the performance of Hotelling's $T^2$ found by Hakstian, Roed, and Lind (1971). The same pattern is observed for d = 3.0 except when $D = diag(d^2, d^2, 1/d^2)$ or $diag(d^2, d^2, d^2, 1/d^2, 1/d^2, 1/d^2)$. Then $\hat{\tau}$ is near or larger than $\alpha$ in both positive and negative conditions. For Johansen's test and James' first order test, the $\hat{\tau}$s outside the criterion interval tend to be larger than $\alpha$. Estimated Type I error rates for Johansen's test are generally closer to $\alpha$ than the $\hat{\tau}$s for the first order test are.
This is similar to the performance of Yao's test and James' first order test when p = 2 (Algina and Tang, 1988). For James' second order test, the $\hat{\tau}$s outside the criterion interval are smaller than $\alpha$.

For $d = \sqrt{1.5}$ and d = 3.0, mean $\hat{\tau}$s for various levels in the design are reported in Tables 20 and 21, respectively.

Table 20
The Effects of Four Factors on the Means of $\hat{\tau}$s of the Four Tests: $d = \sqrt{1.5}$

  Factor                     Pillai-Bartlett   Johansen's   James' 1st    James' 2nd
                             trace             test         order test    order test
  Sample-size-covariance-matrix relationship
    Positive                 .0383             .0654        .0826         .0371
    Negative                 .0577             .0710        .0898         .0401
  n1:n2:n3                   P / N(a)
    1:1:1.3                  .0391/.0518       .0587        .0741         .0348
    1:1:2                    .0322/.0603       .0721        .0929         .0387
    1:1.3:1.3                .0401/.0542       .0627        .0773         .0367
    1:2:2                    .0421/.0645       .0793        .1004         .0442
  p
    3                        .0464             .0672        .0866         .0440
    6                        .0496             .0691        .0858         .0332
  D
    $d^2 I$                  .0480             .0666        .0839         .0388
    diag($d^2$ or 1)         .0478             .0687        .0875         .0390
    diag($d^2$ or $1/d^2$)   .0483             .0692        .0872         .0380

(a) The mean $\hat{\tau}$s under different sample size ratios are presented separately for the positive condition (P) and the negative condition (N) for the Pillai-Bartlett trace.

Table 21
The Effects of Four Factors on the Means of $\hat{\tau}$s of the Four Tests: d = 3

  Factor                     Pillai-Bartlett   Johansen's   James' 1st    James' 2nd
                             trace             test         order test    order test
  Sample-size-covariance-matrix relationship
    Positive                 .0455             .0636        .0796         .0368
    Negative                 .1575             .0863        .1081         .0476
  n1:n2:n3                   P / N(a)
    1:1:1.3                  .0431/.0854       .0670        .0842         .0371
    1:1:2                    .0256/.1652       .0794        .1032         .0414
    1:1.3:1.3                .0542/.1444       .0700        .0864         .0417
    1:2:2                    .0592/.2349       .0834        .1016         .0487
  p
    3                        .0991             .0741        .0949         .0496
    6                        .1038             .0758        .0929         .0348
  D
    $d^2 I$                  .1246             .0845        .1059         .0470
    diag($d^2$ or 1)         .0840             .0719        .0906         .0396
    diag($d^2$ or $1/d^2$)   .0959             .0686        .0851         .0401

(a) The mean $\hat{\tau}$s under different sample size ratios are presented separately for the positive condition (P) and the negative condition (N) for the Pillai-Bartlett trace.

Two factors, sample-size-covariance-matrix (SSCM) relationship and sample size ratio, tended to have larger effects than the other factors had on the performance of the four tests. The $\hat{\tau}$s for all four tests tend to be larger in the negative condition than in the positive condition. For the second order test this trend resulted in better performance in the negative condition. For Johansen's test and the first order test, it resulted in poorer performance in the negative condition. For $d = \sqrt{1.5}$, the Pillai-Bartlett trace performed better in the negative condition: of the 20 $\hat{\tau}$s outside the criterion interval, only five came from the negative condition. The same pattern was observed with d = 3.0 and p = 3. However, when p increased to 6 the test performed more poorly in the negative condition.

The $\hat{\tau}$s for Johansen's test and the first order test tend to increase as the sample size ratio becomes more extreme, resulting in poorer performance. For the Pillai-Bartlett trace the effect of the sample size ratio depended on the SSCM relationship. With a negative relationship the test tended to become more liberal as the ratio became more extreme. The effect was larger for d = 3.0 than it was for $d = \sqrt{1.5}$. With a positive relationship, the effect of the sample size ratio was quite complicated, depending on d, p, and the form of D. For James' second order test, there is a tendency for $\hat{\tau}$ to increase as the sample size ratio becomes more extreme. This increase tended to result in better performance for the test.

Type I error rates appear to be unaffected by variation in p for the Pillai-Bartlett trace, Johansen's test, and James' first order test. James' second order test was affected negatively by an increase in p. In fact, when $d = \sqrt{1.5}$, the poorer performance of the second order test, in comparison to the Pillai-Bartlett trace, was due to its performance when p = 6. For p = 3, the second order test has nine $\hat{\tau}$s outside the criterion interval. For p = 6, the number of $\hat{\tau}$s outside the criterion interval increases to 20.
The form of D had little effect on the τ̂s for the tests except for the Pillai-Bartlett trace when d = 3.0. Then the test performed better when D = diag(d² d² 1/d²) and D = diag(d² d² d² 1/d² 1/d² 1/d²). For the contaminated covariance matrices with these forms, only one of 16 τ̂s is outside the criterion interval.

The Performance of the Four Tests When N/p = 15

Estimated τ̂s for the Pillai-Bartlett trace, Johansen's test, James' first order test, and James' second order test for conditions in which N/p = 15 and d = √1.5 are reported in Tables 22, 23, 24, and 25. Forty-eight τ̂s are reported for each test. For the Pillai-Bartlett trace, 24 τ̂s are outside the criterion interval. For Johansen's test, 13 τ̂s are outside the criterion interval. For James' first order test, 37 τ̂s are outside the criterion interval. For James' second order test, 15 τ̂s are outside the criterion interval. Given d = √1.5, as N/p increases from 10 to 15, the number of τ̂s outside the criterion interval decreases for all tests except the Pillai-Bartlett trace. There is a 65%, 23%, and 48% decrease for Johansen's test, James' first order test, and James' second order test, respectively.

Table 22
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 15, p = 3, d = √1.5, and α = .05

[The body of this table is not legible in the scanned source.]

Note. When n1 = n2 ≠ n3, the form of contamination is I:I:D; when n1 ≠ n2 = n3, the form of contamination is I:D:D. The underlined τ̂s are outside the interval α ± 2√(α(1−α)/2000).

Table 23
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 15, p = 3, d = √1.5, and α = .05

n1  n2  n3    Pillai-     Johansen's   James'       James'
              Bartlett    test         1st order    2nd order
              trace                    test         test
D = d²I
13  13  19    .0715       .0620        .0715        .0535
11  11  23    .0785       .0555        .0675        .0420
13  16  16    .0650       .0555        .0605        .0475
 9  18  18    .0665       .0570        .0685        .0455
D = diag(d² d² 1)
13  13  19    .0610       .0550        .0635        .0460
11  11  23    .0630       .0545        .0670        .0405
13  16  16    .0510       .0640        .0690        .0495
 9  18  18    .0570       .0555        .0635        .0405
D = diag(d² d² 1/d²)
13  13  19    .0565       .0570        .0660        .0480
11  11  23    .0600       .0630        .0735        .0505
13  16  16    .0530       .0550        .0635        .0435
 9  18  18    .0530       .0635        .0705        .0500

Note. When n1 = n2 ≠ n3, the form of contamination is D:D:I; when n1 ≠ n2 = n3, the form of contamination is D:I:I. The underlined τ̂s are outside the interval α ± 2√(α(1−α)/2000).

Table 24
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 15, p = 6, d = √1.5, and α = .05

n1  n2  n3    Pillai-     Johansen's   James'       James'
              Bartlett    test         1st order    2nd order
              trace                    test         test
D = d²I
27  27  36    .0345       .0445        .0480        .0315
22  22  46    .0275       .0585        .0660        .0425
26  32  32    .0445       .0545        .0625        .0415
18  36  36    .0310       .0605        .0695        .0400
D = diag(d² d² d² 1 1 1)
27  27  36    .0365       .0540        .0610        .0400
22  22  46    .0345       .0570        .0620        .0365
26  32  32    .0545       .0685        .0760        .0510
18  36  36    .0400       .0555        .0620        .0375
D = diag(d² d² d² 1/d² 1/d² 1/d²)
27  27  36    .0420       .0520        .0575        .0345
22  22  46    .0470       .0540        .0605        .0340
26  32  32    .0465       .0470        .0550        .0320
18  36  36    .0580       .0655        .0765        .0415

Note. When n1 = n2 ≠ n3, the form of contamination is I:I:D; when n1 ≠ n2 = n3, the form of contamination is I:D:D. The underlined τ̂s are outside the interval α ± 2√(α(1−α)/2000).

Table 25
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 15, p = 6, d = √1.5, and α = .05

n1  n2  n3    Pillai-     Johansen's   James'       James'
              Bartlett    test         1st order    2nd order
              trace                    test         test
D = d²I
27  27  36    .0485       .0420        .0525        .0280
22  22  46    .0870       .0710        .0820        .0415
26  32  32    .0565       .0640        .0690        .0465
18  36  36    .0820       .0735        .0815        .0455
D = diag(d² d² d² 1 1 1)
27  27  36    .0610       .0570        .0660        .0435
22  22  46    .0730       .0630        .0765        .0420
26  32  32    .0565       .0510        .0600        .0330
18  36  36    .0645       .0655        .0750        .0415
D = diag(d² d² d² 1/d² 1/d² 1/d²)
27  27  36    .0530       .0560        .0660        .0395
22  22  46    .0540       .0590        .0680        .0405
26  32  32    .0405       .0530        .0575        .0385
18  36  36    .0540       .0645        .0700        .0420

Note. When n1 = n2 ≠ n3, the form of contamination is D:D:I; when n1 ≠ n2 = n3, the form of contamination is D:I:I. The underlined τ̂s are outside the interval α ± 2√(α(1−α)/2000).

The results for d = 3.0 are reported in Tables 26 through 29. For the Pillai-Bartlett trace, all but five τ̂s are outside the criterion interval. For Johansen's test, 21 τ̂s are outside the criterion interval. For James' first order test, 38 τ̂s are outside the criterion interval. For James' second order test, 23 τ̂s are outside the criterion interval. As with d = √1.5, the number of τ̂s outside the criterion interval decreases for all tests except the Pillai-Bartlett trace as N/p increases from 10 to 15. There is a 47%, 21%, and 22% decrease for Johansen's test, James' first order test, and James' second order test, respectively. These percentage decreases as N/p increases are smaller for d = 3 than the corresponding decreases were for d = √1.5. Given N/p = 15, when d increases from √1.5 to 3, there is an 87%, 54%, and 60% increase in the number of τ̂s outside the criterion interval for the Pillai-Bartlett trace, Johansen's test, and James' second order test.
For James' first order test, the increase is trivial.

Table 26
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 15, p = 3, d = 3, and α = .05

n1  n2  n3    Pillai-     Johansen's   James'       James'
              Bartlett    test         1st order    2nd order
              trace                    test         test
D = d²I
13  13  19    .0220       .0525        .0585        .0405
11  11  23    .0120       .0600        .0690        .0510
13  16  16    .0440       .0580        .0665        .0485
 9  18  18    .0250       .0570        .0685        .0470
D = diag(d² d² 1)
13  13  19    .0350       .0530        .0630        .0445
11  11  23    .0195       .0590        .0700        .0465
13  16  16    .0485       .0580        .0655        .0490
 9  18  18    .0310       .0530        .0585        .0445
D = diag(d² d² 1/d²)
13  13  19    .0550       .0685        .0800        .0585
11  11  23    .0480       .0580        .0665        .0435
13  16  16    .0745       .0595        .0675        .0475
 9  18  18    .0935       .0440        .0510        .0345

Note. When n1 = n2 ≠ n3, the form of contamination is I:I:D; when n1 ≠ n2 = n3, the form of contamination is I:D:D. The underlined τ̂s are outside the interval α ± 2√(α(1−α)/2000).

Table 27
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 15, p = 3, d = 3, and α = .05

n1  n2  n3    Pillai-     Johansen's   James'       James'
              Bartlett    test         1st order    2nd order
              trace                    test         test
D = d²I
13  13  19    .1110       .0590        .0695        .0410
11  11  23    .1975       .0750        .0910        .0470
13  16  16    .1400       .0650        .0740        .0535
 9  18  18    .2670       .0855        .0965        .0555
D = diag(d² d² 1)
13  13  19    .0875       .0675        .0745        .0500
11  11  23    .1430       .0620        .0795        .0415
13  16  16    .0980       .0520        .0580        .0420
 9  18  18    .1955       .0685        .0800        .0480
D = diag(d² d² 1/d²)
13  13  19    .0705       .0545        .0670        .0435
11  11  23    .0960       .0615        .0765        .0475
13  16  16    .1035       .0530        .0605        .0455
 9  18  18    .1935       .0640        .0765        .0500

Note. When n1 = n2 ≠ n3, the form of contamination is D:D:I; when n1 ≠ n2 = n3, the form of contamination is D:I:I. The underlined τ̂s are outside the interval α ± 2√(α(1−α)/2000).

Table 28
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 15, p = 6, d = 3, and α = .05

n1  n2  n3    Pillai-     Johansen's   James'       James'
              Bartlett    test         1st order    2nd order
              trace                    test         test
D = d²I
27  27  36    .0360       .0455        .0555        .0330
22  22  46    .0025       .0530        .0595        .0325
26  32  32    .0370       .0550        .0620        .0370
18  36  36    .0150       .0565        .0625        .0390
D = diag(d² d² d² 1 1 1)
27  27  36    .0325       .0465        .0535        .0310
22  22  46    .0150       .0515        .0595        .0325
26  32  32    .0465       .0555        .0625        .0395
18  36  36    .0300       .0530        .0605        .0360
D = diag(d² d² d² 1/d² 1/d² 1/d²)
27  27  36    .0600       .0480        .0540        .0305
22  22  46    .0745       .0520        .0605        .0340
26  32  32    .0850       .0495        .0585        .0350
18  36  36    .1605       .0555        .0650        .0375

Note. When n1 = n2 ≠ n3, the form of contamination is I:I:D; when n1 ≠ n2 = n3, the form of contamination is I:D:D. The underlined τ̂s are outside the interval α ± 2√(α(1−α)/2000).

Table 29
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 15, p = 6, d = 3, and α = .05

n1  n2  n3    Pillai-     Johansen's   James'       James'
              Bartlett    test         1st order    2nd order
              trace                    test         test
D = d²I
27  27  36    .1135       .0600        .0710        .0345
22  22  46    .2745       .0600        .0770        .0290
26  32  32    .1555       .0570        .0650        .0395
18  36  36    .3810       .0740        .0880        .0390
D = diag(d² d² d² 1 1 1)
27  27  36    .0800       .0605        .0660        .0395
22  22  46    .1415       .0655        .0765        .0325
26  32  32    .1045       .0650        .0690        .0405
18  36  36    .2050       .0645        .0760        .0385
D = diag(d² d² d² 1/d² 1/d² 1/d²)
27  27  36    .0630       .0505        .0615        .0345
22  22  46    .0755       .0625        .0720        .0380
26  32  32    .0975       .0600        .0690        .0430
18  36  36    .1715       .0645        .0690        .0460

Note. When n1 = n2 ≠ n3, the form of contamination is D:D:I; when n1 ≠ n2 = n3, the form of contamination is D:I:I. The underlined τ̂s are outside the interval α ± 2√(α(1−α)/2000).

For d = √1.5 the τ̂s for all four tests tend to be larger in the negative condition than in the positive condition. For the second order test this trend resulted in better performance in the negative condition. For Johansen's test and the first order test it resulted in better performance in the positive condition. In the positive condition, Johansen's test has three of 24 τ̂s outside the criterion interval.
In the negative condition, James' second order test has three of 24 τ̂s outside the criterion interval. For the Pillai-Bartlett trace, performance was poor under both positive and negative conditions. In the positive condition, the test tended to yield τ̂s much smaller than α; in the negative condition, the test tended to yield τ̂s much larger than α. However, when D = diag(d² d² 1/d²) or D = diag(d² d² d² 1/d² 1/d² 1/d²) the test tended to be liberal in both the positive and negative conditions. With one exception, the effect of the SSCM relationship on the tests was similar when d = 3.0 to its effect when d = √1.5: this factor did not appear to affect τ̂ for the second order test when d = 3.0.

Unlike the conditions in which N/p = 10, changing the sample size ratio did not have a clear impact on the performance of the tests. For d = 3.0, when the sample size ratio becomes more extreme, the τ̂s for Johansen's test and James' first order test increase, which leads to poorer performance of the tests. However, the impact of this factor was smaller than that of the SSCM relationship. The sample size ratio did not have an effect on James' second order test. For the Pillai-Bartlett trace, as the sample size ratio becomes more extreme, the test became more liberal in the negative condition and more conservative in the positive condition except when D had the form diag(d² d² 1/d²) or diag(d² d² d² 1/d² 1/d² 1/d²). With this form for D, the Pillai-Bartlett trace tended to become more liberal, in both the positive and the negative condition, as the sample size ratio became more extreme.

For d = √1.5, both Johansen's test and the second order test performed better when p = 3 than when p = 6. When p = 3, both of them have four τ̂s outside the criterion interval. When p increases to 6, there are nine τ̂s outside the criterion interval for Johansen's test and 11 for the second order test.
For the Pillai-Bartlett trace and the first order test, variation in p did not have an impact on the performance of the tests when d = √1.5. For James' second order test, when d increased to 3.0, the effect of the number of variables (p) became even stronger than when d = √1.5. When p = 3, the τ̂s for the second order test are close to α in both the positive and negative conditions. Only two τ̂s are outside the criterion interval. However, when p increases to 6, the τ̂s are smaller than α in both conditions; only three of 24 τ̂s are in the criterion interval. The number of variables (p) did not have a strong impact on the other tests when d = 3.0.

For the Pillai-Bartlett trace, the effect of the form of D when d = √1.5 was similar to the effect observed when N/p = 10 and d = √1.5. The test performed better with D = diag(d² d² 1/d²) or D = diag(d² d² d² 1/d² 1/d² 1/d²) than with D in the other forms. With contaminated covariance matrices of these forms, only one τ̂ of the 16 estimated for the Pillai-Bartlett trace is outside the criterion interval. When d increases to 3.0, the nature of the effect is that when D = diag(d² d² 1/d²) or D = diag(d² d² d² 1/d² 1/d² 1/d²), the Pillai-Bartlett trace tended to perform better in the positive condition than it did in the negative condition. The form of D did not affect the other tests.

In general, when N/p = 15, Johansen's test and James' second order test performed better than either the Pillai-Bartlett trace or James' first order test. When judged in terms of the number of τ̂s outside the criterion interval, Johansen's test performed well in the positive conditions and, except for conditions in which p = 6 and d = 3, the second order test performed well in the negative conditions. In the exceptional condition, both the second order test and Johansen's test had the vast majority of τ̂s outside the criterion interval. However, the average τ̂ for Johansen's test was .062 and for the second order test was .038.
The τ̂s for Johansen's test were not too extreme, even in the negative condition, the condition in which Johansen's test performed more poorly. Of 48 τ̂s, five were larger than .07 and only one was larger than .08. Similarly, τ̂s for the second order test were not too extreme in the positive condition. None of these τ̂s was smaller than .03. Since τ̂ for each test is reasonably near α, in principle either test might be used. However, in practice, programming the second order test is substantially more complicated than programming Johansen's test.

The Performance of the Four Tests When N/p = 20

Estimated τ̂s for the Pillai-Bartlett trace, Johansen's test, James' first order test, and James' second order test in conditions in which N/p = 20 and d = √1.5 are reported in Tables 30 through 33. Forty-eight τ̂s are reported for each test. For the Pillai-Bartlett trace, 19 τ̂s are outside the criterion interval. For Johansen's test, three τ̂s are outside the criterion interval, all of them in the p = 6 condition. For James' first order test, 18 τ̂s are outside the criterion interval. For James' second order test, 13 τ̂s are outside the criterion interval. Given d = √1.5, all tests had a notable decrease in the number of τ̂s outside the criterion interval as N/p increased from 15 to 20. There were 21%, 77%, 51%, and 13% decreases for the Pillai-Bartlett trace, Johansen's test, James' first order test, and James' second order test, respectively.

Estimated τ̂s for the Pillai-Bartlett trace, Johansen's test, James' first order test, and James' second order test in conditions in which N/p = 20 and d = 3 are reported in Tables 34, 35, 36, and 37. Forty-eight τ̂s are reported for each test. For the Pillai-Bartlett trace, 42 τ̂s fall outside the criterion interval. For Johansen's test, seven τ̂s fall outside the criterion interval. For James' first order test, 26 τ̂s fall outside the criterion interval. For James' second order test, nine τ̂s fall outside the criterion interval.
Except for the Pillai-Bartlett trace, there is a decrease in the number of τ̂s outside the criterion interval as N/p increases from 15 to 20. There is a 65%, 31%, and 62% decrease for Johansen's test, James' first order test, and James' second order test, respectively. Given N/p = 20, as d increases from √1.5 to 3, the number of τ̂s outside the criterion interval increases 121%, 133%, and 44% for the Pillai-Bartlett trace, Johansen's test, and James' first order test, respectively. For James' second order test, there is a 31% decrease.

The SSCM relationship affected the performance of all tests except James' second order test. The latter test performed almost equally well in both positive and negative conditions. There are 13 τ̂s outside the criterion interval for the positive condition and nine for the negative condition. For the other tests, the number of τ̂s outside the criterion interval is larger in the negative condition than in the positive condition. This resulted in a better performance for the first order test in the positive condition. Johansen's test performed better in the positive condition except when d = √1.5. When d = √1.5 it performed well in both conditions.

Table 30
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 20, p = 3, d = √1.5, and α = .05

n1  n2  n3    Pillai-     Johansen's   James'       James'
              Bartlett    test         1st order    2nd order
              trace                    test         test
D = d²I
18  18  24    .0340       .0440        .0490        .0395
15  15  30    .0320       .0450        .0515        .0415
16  22  22    .0415       .0480        .0515        .0435
12  24  24    .0335       .0470        .0510        .0395
D = diag(d² d² 1)
18  18  24    .0510       .0550        .0575        .0485
15  15  30    .0295       .0540        .0585        .0470
16  22  22    .0450       .0505        .0550        .0435
12  24  24    .0365       .0520        .0580        .0435
D = diag(d² d² 1/d²)
18  18  24    .0500       .0545        .0605        .0510
15  15  30    .0420       .0520        .0600        .0480
16  22  22    .0515       .0500        .0545        .0460
12  24  24    .0400       .0485        .0560        .0410

Note. When n1 = n2 ≠ n3, the form of contamination is I:I:D; when n1 ≠ n2 = n3, the form of contamination is I:D:D. The underlined τ̂s are outside the interval α ± 2√(α(1−α)/2000).

Table 31
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 20, p = 3, d = √1.5, and α = .05

n1  n2  n3    Pillai-     Johansen's   James'       James'
              Bartlett    test         1st order    2nd order
              trace                    test         test
D = d²I
18  18  24    .0560       .0475        .0525        .0430
15  15  30    .0675       .0500        .0560        .0405
16  22  22    .0620       .0560        .0625        .0500
12  24  24    .0640       .0515        .0620        .0420
D = diag(d² d² 1)
18  18  24    .0595       .0545        .0600        .0465
15  15  30    .0615       .0595        .0645        .0530
16  22  22    .0490       .0485        .0545        .0405
12  24  24    .0590       .0530        .0610        .0450
D = diag(d² d² 1/d²)
18  18  24    .0470       .0480        .0500        .0410
15  15  30    .0515       .0470        .0535        .0380
16  22  22    .0575       .0545        .0610        .0500
12  24  24    .0535       .0520        .0600        .0440

Note. When n1 = n2 ≠ n3, the form of contamination is D:D:I; when n1 ≠ n2 = n3, the form of contamination is D:I:I. The underlined τ̂s are outside the interval α ± 2√(α(1−α)/2000).

Table 32
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 20, p = 6, d = √1.5, and α = .05

n1  n2  n3    Pillai-     Johansen's   James'       James'
              Bartlett    test         1st order    2nd order
              trace                    test         test
D = d²I
36  36  48    .0370       .0450        .0480        .0355
30  30  60    .0315       .0600        .0660        .0445
32  44  44    .0350       .0465        .0495        .0385
24  48  48    .0275       .0475        .0530        .0360
D = diag(d² d² d² 1 1 1)
36  36  48    .0405       .0515        .0555        .0415
30  30  60    .0340       .0520        .0590        .0410
32  44  44    .0470       .0525        .0545        .0475
24  48  48    .0415       .0590        .0675        .0435
D = diag(d² d² d² 1/d² 1/d² 1/d²)
36  36  48    .0535       .0475        .0565        .0380
30  30  60    .0520       .0585        .0655        .0410
32  44  44    .0490       .0490        .0550        .0385
24  48  48    .0565       .0585        .0645        .0445

Note. When n1 = n2 ≠ n3, the form of contamination is I:I:D; when n1 ≠ n2 = n3, the form of contamination is I:D:D. The underlined τ̂s are outside the interval α ± 2√(α(1−α)/2000).
Table 33
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 20, p = 6, d = √1.5, and α = .05

n1  n2  n3    Pillai-     Johansen's   James'       James'
              Bartlett    test         1st order    2nd order
              trace                    test         test
D = d²I
36  36  48    .0510       .0470        .0515        .0400
30  30  60    .0805       .0560        .0625        .0425
32  44  44    .0520       .0415        .0485        .0310
24  48  48    .0815       .0635        .0670        .0460
D = diag(d² d² d² 1 1 1)
36  36  48    .0470       .0505        .0550        .0420
30  30  60    .0635       .0530        .0580        .0385
32  44  44    .0590       .0505        .0520        .0430
24  48  48    .0690       .0520        .0590        .0400
D = diag(d² d² d² 1/d² 1/d² 1/d²)
36  36  48    .0460       .0505        .0570        .0420
30  30  60    .0555       .0630        .0725        .0455
32  44  44    .0530       .0590        .0630        .0495
24  48  48    .0550       .0555        .0645        .0410

Note. When n1 = n2 ≠ n3, the form of contamination is D:D:I; when n1 ≠ n2 = n3, the form of contamination is D:I:I. The underlined τ̂s are outside the interval α ± 2√(α(1−α)/2000).

Table 34
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 20, p = 3, d = 3, and α = .05

n1  n2  n3    Pillai-     Johansen's   James'       James'
              Bartlett    test         1st order    2nd order
              trace                    test         test
D = d²I
18  18  24    .0380       .0535        .0555        .0475
15  15  30    .0125       .0540        .0585        .0480
16  22  22    .0360       .0515        .0580        .0475
12  24  24    .0275       .0550        .0585        .0505
D = diag(d² d² 1)
18  18  24    .0365       .0480        .0530        .0420
15  15  30    .0165       .0570        .0600        .0505
16  22  22    .0445       .0535        .0575        .0475
12  24  24    .0305       .0555        .0620        .0510
D = diag(d² d² 1/d²)
18  18  24    .0490       .0540        .0570        .0480
15  15  30    .0445       .0565        .0630        .0485
16  22  22    .0755       .0510        .0575        .0455
12  24  24    .0920       .0530        .0595        .0475

Note. When n1 = n2 ≠ n3, the form of contamination is I:I:D; when n1 ≠ n2 = n3, the form of contamination is I:D:D. The underlined τ̂s are outside the interval α ± 2√(α(1−α)/2000).
Table 35
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 20, p = 3, d = 3, and α = .05

n1  n2  n3    Pillai-     Johansen's   James'       James'
              Bartlett    test         1st order    2nd order
              trace                    test         test
D = d²I
18  18  24    .0995       .0540        .0615        .0455
15  15  30    .1680       .0555        .0630        .0435
16  22  22    .1630       .0540        .0600        .0485
12  24  24    .2845       .0635        .0720        .0500
D = diag(d² d² 1)
18  18  24    .0680       .0445        .0490        .0355
15  15  30    .1280       .0640        .0730        .0510
16  22  22    .1330       .0605        .0665        .0550
12  24  24    .1990       .0565        .0655        .0440
D = diag(d² d² 1/d²)
18  18  24    .0695       .0575        .0615        .0505
15  15  30    .0960       .0540        .0600        .0430
16  22  22    .1090       .0580        .0645        .0500
12  24  24    .1740       .0555        .0620        .0450

Note. When n1 = n2 ≠ n3, the form of contamination is D:D:I; when n1 ≠ n2 = n3, the form of contamination is D:I:I. The underlined τ̂s are outside the interval α ± 2√(α(1−α)/2000).

Table 36
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 20, p = 6, d = 3, and α = .05

n1  n2  n3    Pillai-     Johansen's   James'       James'
              Bartlett    test         1st order    2nd order
              trace                    test         test
D = d²I
36  36  48    .0330       .0500        .0550        .0400
30  30  60    .0025       .0450        .0480        .0375
32  44  44    .0335       .0510        .0565        .0420
24  48  48    .0155       .0535        .0610        .0465
D = diag(d² d² d² 1 1 1)
36  36  48    .0390       .0520        .0575        .0425
30  30  60    .0170       .0485        .0535        .0390
32  44  44    .0410       .0485        .0535        .0420
24  48  48    .0205       .0410        .0445        .0350
D = diag(d² d² d² 1/d² 1/d² 1/d²)
36  36  48    .0545       .0465        .0515        .0375
30  30  60    .0730       .0605        .0665        .0510
32  44  44    .1155       .0580        .0635        .0480
24  48  48    .1795       .0555        .0620        .0405

Note. When n1 = n2 ≠ n3, the form of contamination is I:I:D; when n1 ≠ n2 = n3, the form of contamination is I:D:D. The underlined τ̂s are outside the interval α ± 2√(α(1−α)/2000).
Table 37
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 20, p = 6, d = 3, and α = .05

n1  n2  n3    Pillai-     Johansen's   James'       James'
              Bartlett    test         1st order    2nd order
              trace                    test         test
D = d²I
36  36  48    .1230       .0565        .0655        .0445
30  30  60    .2560       .0730        .0835        .0440
32  44  44    .1915       .0565        .0625        .0440
24  48  48    .3790       .0705        .0760        .0490
D = diag(d² d² d² 1 1 1)
36  36  48    .0790       .0575        .0620        .0420
30  30  60    .1345       .0535        .0625        .0375
32  44  44    .1175       .0530        .0590        .0425
24  48  48    .2085       .0655        .0735        .0475
D = diag(d² d² d² 1/d² 1/d² 1/d²)
36  36  48    .0590       .0505        .0535        .0400
30  30  60    .0730       .0510        .0605        .0390
32  44  44    .1115       .0530        .0575        .0455
24  48  48    .1765       .0520        .0550        .0410

Note. When n1 = n2 ≠ n3, the form of contamination is D:D:I; when n1 ≠ n2 = n3, the form of contamination is D:I:I. The underlined τ̂s are outside the interval α ± 2√(α(1−α)/2000).

For the Pillai-Bartlett trace, τ̂s in the positive condition tend to be much smaller than α; τ̂s in the negative condition tend to be much larger than α, especially when the sample size ratio is more extreme. However, this trend does not hold for the Pillai-Bartlett trace when D is in the form of diag(d² d² 1/d²) or diag(d² d² d² 1/d² 1/d² 1/d²). When D is in those forms, τ̂s are in the criterion interval for d = √1.5 and τ̂s tend to be larger than α for d = 3.0, in both the positive and negative conditions.

A more extreme sample size ratio resulted in poorer performance of the Pillai-Bartlett trace and James' first order test. When the sample size ratio is extreme, the Pillai-Bartlett trace has smaller τ̂s in the positive condition and larger τ̂s in the negative condition. For James' first order test, the extreme ratio leads to larger τ̂s. This factor did not influence Johansen's test or the second order test.

The number of variables affected only the second order test. The τ̂s tended to get smaller as p increased. Given N/p = 20, when p = 3 only four τ̂s are outside the criterion interval.
However, when p = 6, 18 τ̂s are outside the criterion interval.

In general, both the Pillai-Bartlett trace and James' first order test have too many τ̂s outside the criterion interval in comparison with Johansen's test and the second order test. They are not recommended for use when N/p = 20. Johansen's test performed well in all conditions, but performed better in the positive conditions. Its performance in the negative conditions was better for d = √1.5 than for d = 3.0. In terms of the number of τ̂s outside the criterion interval, the second order test performed more poorly than Johansen's test. However, its performance was not bad and it may be preferred by researchers who do not want to risk τ > α. A practical problem for such researchers will be the difficulty in programming the second order test.

Summary

Six factors, the ratio of N to p, the magnitude of d in D, the sample-size-covariance relationship, the value of p, the sample size ratio, and the form of D, affected the performance of the four tests studied. However, the magnitude and the direction of the effect were not the same across tests.

In general, as N/p increased, all of the tests except the Pillai-Bartlett trace became more robust. The effect of the other factors on Johansen's test and James' tests seemed to decrease as N/p increased: the number of τ̂s within the criterion interval increased as N/p increased. Even though James' first order test performed better as N/p increased to 20, it had more τ̂s outside the criterion interval than the second order test and Johansen's test.

As d increased, with the exception of the second order test when N/p = 20, the performance of all the tests declined. For N/p = 20, James' second order test performed as well or better as d increased.

The τ̂s tend to be larger in the negative condition than in the positive condition.
This effect led to a better performance in the negative condition for the second order test and a poorer performance for Johansen's test and the first order test. For the Pillai-Bartlett trace, τ̂s are much smaller than α in the positive condition and much larger in the negative condition.

When the sample size ratio becomes more extreme, the performance of the tests, except the second order test, declines. The extreme sample size ratio led to a moderately better performance of the second order test when N/p was less than 20.

The change in p had a strong impact on the performance of the second order test. The test was more robust when p = 3 than when p = 6. This factor did not affect the other tests.

The form of D affected only the Pillai-Bartlett trace. When D is in the form of diag(d² or 1/d²) the performance of the test improved.

Tables 38, 39, and 40 were constructed using the rule of thumb that a test is robust for a particular condition if three or fewer τ̂s are outside the criterion interval. These serve to summarize the major effects of the factors on the four tests. Similar results are reported in Tables 41 to 43 for α = .01 and in Tables 44 to 46 for α = .10. These substantiate the claim that major patterns of effects are similar for all three α levels. The most notable exception to this claim is for the second order test when α = .01. It was more robust than Johansen's test for both N/p = 15 and 20. This is a floor effect: the second order test tended to have τ̂ < α.

Table 38
Number of τ̂s Outside the Criterion Interval of the Four Multivariate Tests for N/p = 10 and α = .05

d      p   Positive/   Pillai-     Johansen's   James'       James'
           Negative    Bartlett    test         1st order    2nd order
                       trace                    test         test
√1.5   3   P           8           7            12           7
           N           2*          10           12           2*
       6   P           7           10           12           11
           N           3*          10           12           10
3      3   P           9           5            12           6
           N           12          12           12           3*
       6   P           12          9            12           12
           N           11          11           12           10

Note. * indicates the test is relatively robust. The maximum number of τ̂s in each cell is 12.
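The rule of thumb behind Tables 38 through 40 can be sketched in a few lines of Python. This is only an illustration, not the author's program: it assumes the criterion interval reads α ± 2√(α(1−α)/2000), with 2000 taken to be the number of replications per condition, and the function names are hypothetical. The sample values are the Johansen and Pillai-Bartlett columns of Table 33 (negative condition, N/p = 20, p = 6, d = √1.5).

```python
import math


def criterion_interval(alpha, n_reps=2000):
    """Return (lower, upper) = alpha -/+ 2*sqrt(alpha*(1-alpha)/n_reps).

    n_reps = 2000 is an assumption read off the table notes.
    """
    half_width = 2.0 * math.sqrt(alpha * (1.0 - alpha) / n_reps)
    return alpha - half_width, alpha + half_width


def count_outside(tau_hats, alpha, n_reps=2000):
    """Count estimated Type I error rates falling outside the interval."""
    lower, upper = criterion_interval(alpha, n_reps)
    return sum(1 for t in tau_hats if t < lower or t > upper)


def is_relatively_robust(tau_hats, alpha, max_outside=3):
    """Rule of thumb from the text: three or fewer estimates outside."""
    return count_outside(tau_hats, alpha) <= max_outside


# Table 33 columns (12 estimates per test in this cell).
johansen_table33 = [.0470, .0560, .0415, .0635, .0505, .0530,
                    .0505, .0520, .0505, .0630, .0590, .0555]
pillai_table33 = [.0510, .0805, .0520, .0815, .0470, .0635,
                  .0590, .0690, .0460, .0555, .0530, .0550]

for name, values in (("Johansen", johansen_table33),
                     ("Pillai-Bartlett", pillai_table33)):
    print(name, count_outside(values, 0.05),
          is_relatively_robust(values, 0.05))
```

Under these assumptions the counts for this cell come out as 2 for Johansen's test and 4 for the Pillai-Bartlett trace, which agrees with the corresponding entries of Table 40.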
Table 39
Number of τ̂s Outside the Criterion Interval of the Four Multivariate Tests for N/p = 15 and α = .05

d      p   Positive/   Pillai-     Johansen's   James'       James'
           Negative    Bartlett    test         1st order    2nd order
                       trace                    test         test
√1.5   3   P           6           0*           6            4
           N           7           4            12           0*
       6   P           6           3*           9            8
           N           5           6            10           3*
3      3   P           8           2*           9            2*
           N           12          8            11           0*
       6   P           11          1*           6            12
           N           12          10           12           9

Note. * indicates the test is relatively robust. The maximum number of τ̂s in each cell is 12.

Table 40
Number of τ̂s Outside the Criterion Interval of the Four Multivariate Tests for N/p = 20 and α = .05

d      p   Positive/   Pillai-     Johansen's   James'       James'
           Negative    Bartlett    test         1st order    2nd order
                       trace                    test         test
√1.5   3   P           6           0*           2*           2*
           N           4           0*           7            1*
       6   P           5           1*           4            6
           N           4           2*           5            4
3      3   P           9           0*           3*           0*
           N           12          3*           11           1*
       6   P           10          1*           4            5
           N           11          3*           8            3*

Note. * indicates the test is relatively robust. The maximum number of τ̂s in each cell is 12.

Table 41
Number of τ̂s Outside the Criterion Interval of the Four Multivariate Tests for N/p = 10 and α = .01

d      p   Positive/   Pillai-     Johansen's   James'       James'
           Negative    Bartlett    test         1st order    2nd order
                       trace                    test         test
√1.5   3   P           5           4            12           2*
           N           5           4            12           0*
       6   P           6           7            12           5
           N           10          11           12           4
3      3   P           5           6            12           4
           N           10          7            12           9
       6   P           4           8            12           8
           N           11          11           12           5

Note. * indicates the test is relatively robust. The maximum number of τ̂s in each cell is 12.
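The totals and percentage changes quoted in the N/p = 15 and N/p = 20 discussions follow from cell counts like those in Table 39. The short Python sketch below illustrates the arithmetic only; the variable and function names are hypothetical. It sums the d = √1.5 cells of Table 39 and compares them with the N/p = 20 totals stated in the text (19, 3, 18, and 13).

```python
TESTS = ("Pillai-Bartlett trace", "Johansen's test",
         "James' 1st order test", "James' 2nd order test")

# Table 39, d = sqrt(1.5), N/p = 15: one row per (p, condition) cell,
# one column per test, in the order given by TESTS.
table39_d15 = [
    (6, 0, 6, 4),    # p = 3, positive
    (7, 4, 12, 0),   # p = 3, negative
    (6, 3, 9, 8),    # p = 6, positive
    (5, 6, 10, 3),   # p = 6, negative
]

# Column sums give the number of estimates (out of 48) outside the interval.
totals_np15 = [sum(column) for column in zip(*table39_d15)]

# Totals for N/p = 20 and d = sqrt(1.5), as stated in the text.
totals_np20 = [19, 3, 18, 13]


def percent_change(old, new):
    """Signed percentage change from old to new."""
    return 100.0 * (new - old) / old


changes = [round(percent_change(old, new))
           for old, new in zip(totals_np15, totals_np20)]

for test, old, new, change in zip(TESTS, totals_np15, totals_np20, changes):
    print(f"{test}: {old} -> {new} ({change:+d}%)")
```

The column sums reproduce the totals of 24, 13, 37, and 15 reported for N/p = 15, and the resulting changes match the 21%, 77%, 51%, and 13% decreases reported for the move from N/p = 15 to N/p = 20.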

TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT

CHAPTERS

1 INTRODUCTION
   The Problem
   Purpose of the Study
   Significance of the Study
2 REVIEW OF LITERATURE
   The Behrens-Fisher Problem
   Welch's Solutions
   James' Series Solution for Comparing Several Means
   Welch's APDF Solution for Comparing Several Means
   Behavior of Welch's and James' Tests
   Hotelling's Two Sample T²
   Behavior of T² When Covariance Matrices Are Unequal
   Yao's and James' Tests
   Behavior of Yao's and James' Tests
   MANOVA
   James' and Johansen's Solutions
   Summary
3 METHODOLOGY
   Design
   Invariance Property of the Test Statistic
   Simulation Procedure
   Summary
4 RESULTS AND DISCUSSION
   The Performance of the Four Multivariate Tests When n₁ = n₂ = n₃
   The Performance of the Four Multivariate Tests When Sample Sizes Are Unequal
5 CONCLUSIONS
   Experiments with Equal Sample Sizes
   Experiments with Unequal Sample Sizes

APPENDICES

A RESULTS OF THE SIMULATED EXPERIMENTS WHEN n₁ = n₂ = n₃, α = .01, AND α = .10
B RESULTS OF THE SIMULATED EXPERIMENTS WHEN SAMPLE SIZES ARE UNEQUAL AND α = .01
C RESULTS OF THE SIMULATED EXPERIMENTS WHEN SAMPLE SIZES ARE UNEQUAL AND α = .10

REFERENCES
BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

ROBUSTNESS OF FOUR MULTIVARIATE TESTS UNDER VARIANCE-COVARIANCE HETEROSCEDASTICITY

By
Kezhen Linda Tang

December, 1989

Chairman: James J. Algina
Major Department: Foundations of Education

Three hundred and sixty simulated experiments were conducted to compare Type I error rates of the Pillai-Bartlett trace, Johansen's test, James' first order test, and James' second order test when the covariance matrices are heterogeneous. Both equal and unequal sample-size conditions were investigated.
Results of this study indicated that the ratio of total sample size to the number of variables had a strong impact on the performance of the tests. As the ratio increased, the tests became more robust, except for the Pillai-Bartlett trace. With one exception, as the covariance heteroscedasticity increased, the performance of the tests declined. The exception was James' second order test, which performed better when heteroscedasticity increased and the ratio of total sample size to the number of variables was 20. When the smaller samples were associated with the covariance matrices with smaller elements, the performance of Johansen's test and James' first order test improved. When the smaller samples were associated with the covariance matrices with larger elements, the performance of James' second order test improved. The number of variables affected only James' second order test: the performance of that test declined as the number of variables increased. The sample size ratio affected the tests only when the ratio of the total sample size to the number of variables was small. The form of the covariance matrices affected only the performance of the Pillai-Bartlett trace.

In conclusion, Johansen's test and James' second order test performed better than the Pillai-Bartlett trace and James' first order test when covariance matrices were unequal. Johansen's test tended to have estimated Type I error rates larger than α, and James' second order test tended to have estimated Type I error rates smaller than α. However, for both tests the Type I error rates were reasonably near α.

CHAPTER 1
INTRODUCTION

It is common in educational research to compare two or more samples on several criterion variables. That is, p response variables are observed in k treatment groups of nᵢ experimental units per group (i = 1, ..., k). Multivariate analysis of variance (MANOVA) can be employed to analyze the data. When p = 1, the univariate F test can be used.
When the assumptions of the F test are met, it is the uniformly most powerful invariant test of the null hypothesis that the means of the populations from which the k samples are drawn are equal. However, in MANOVA there is no single invariant test that is uniformly most powerful. Four MANOVA test criteria are commonly employed:

1. Roy's largest root (R);
2. The Hotelling-Lawley trace (T);
3. Wilks' likelihood ratio (W);
4. The Pillai-Bartlett trace (V).

When either k = 2 or p = 1, the four test criteria lead to the same decision about the null hypothesis.

One of the major assumptions of MANOVA is homogeneity of the covariance matrices in the k populations from which the samples are drawn. Stevens (1986) pointed out that the restrictiveness of the assumption becomes more strikingly apparent when we realize that the corresponding assumption for the univariate ANOVA is that the variances on only a single variable are equal. Therefore, it is very unlikely that the equal covariance matrices assumption would ever be exactly satisfied in practice.

Olson (1974) provided a general picture of the behavior of the MANOVA criteria under violation of the covariance homogeneity assumption. According to Olson, for protection against covariance heterogeneity in the population, the T, W, and R tests should be avoided. Although the V test stands up best to violations of homogeneity of covariance matrices, its actual Type I error rate (τ) is still somewhat high. For example, under various simulated experimental conditions, Olson (1973) reported that the estimated Type I error rate, τ̂, of the V test ranged from .061 to .186 when the nominal rate was α = .05.

James (1954) and Johansen (1980) proposed multivariate tests for the situation in which the covariance homogeneity assumption is violated. In their tests, the covariance matrices of the k populations are estimated separately.
Therefore, the common covariance matrix assumption is not required for these tests, which are solutions to the multivariate Behrens-Fisher problem.

The univariate Behrens-Fisher problem is that of testing equality of the means of two populations without assuming that the population variances are equal. According to Yao (1965), Behrens was the first to offer a solution to the Behrens-Fisher problem. Fisher (1935, 1939) showed that Behrens' solution could be derived by using his theory of statistical inference called "fiducial probability." Welch (1938) proposed an approximate degrees of freedom (APDF) solution to the Behrens-Fisher problem, a solution in which the test statistic is approximately distributed as Student's t with appropriately adjusted degrees of freedom. Later, Welch (1951) proposed an APDF solution for the k-sample case, a solution in which the distribution of the test statistic is approximated by an F distribution with adjusted degrees of freedom.

Welch (1947) proposed a series solution to the two-sample Behrens-Fisher problem and presented the solution to terms of order 1/fᵢ², where fᵢ = nᵢ − 1. Aspin (1948) extended Welch's approximation to the terms of order 1/fᵢ⁴. James (1951) proposed a solution to the k-sample Behrens-Fisher problem. He also used the series expansion technique. The distribution of his test statistic is approximated by a function of the χ² distribution.

Welch's (1938, 1951) APDF solution was generalized by Yao (1965) to solve the multivariate two-sample Behrens-Fisher problem. It was further generalized by Johansen (1980) to solve the multivariate k-sample Behrens-Fisher problem. James' (1951) series solution was generalized to the multivariate k-sample situation by James (1954), who reported both a first order and a second order test.

The Problem

The behavior of Welch's APDF test and James' series test has been investigated by several authors (Brown & Forsythe, 1974; Wang, 1971).
The APDF solution in the two-sample multivariate situation has also been studied (Yao, 1965; Algina & Tang, 1988). However, the behavior of Johansen's APDF solution and James' series solution to the multivariate k-sample Behrens-Fisher problem is an unexplored topic.

The studies cited in the preceding paragraph demonstrated that with heteroscedastic covariance matrices (variances in the univariate case), the APDF and series solutions to both the univariate and multivariate Behrens-Fisher problems performed better than the usual tests, such as t, F, and Hotelling's T², for which homoscedasticity is assumed. Furthermore, the studies demonstrated that the APDF tests are superior to James' first order tests in controlling τ. Whether this is still true when the tests are applied in the multivariate k-sample situation is unknown. Dijkstra and Werter (1981), on the other hand, showed that James' second order test gives better control of τ than does Welch's APDF test in the univariate k-sample case. Their findings suggest that the superiority of Welch's APDF test to James' first order test may be due to omitting the second order term in James' test. With modern computer facilities it may be preferable to include James' second order term in the multivariate test.

The behavior of MANOVA has been studied by Olson (1974). He concluded that among the four criteria only the V test is relatively robust against violation of the equal covariance matrices assumption. However, his study was restricted to an equal number of experimental units in each of the k samples. Therefore, it is worthwhile to investigate the performance of James' and Johansen's tests when the sample sizes are unequal and to compare it with the performance of V under the same conditions.

Purpose of the Study

The purpose of the present study was to compare Type I error rates of the Pillai-Bartlett trace (V), Johansen's test, and James' first and second order tests when the covariance matrices are heterogeneous.
Both equal and unequal sample-size conditions were investigated.

Significance of the Study

The application of multivariate analysis of variance (MANOVA) in education and the behavioral sciences has increased dramatically, and it appears that it will be used frequently for data analysis in the future (Bray & Maxwell, 1985). Stevens (1986) provided three reasons why multivariate analysis is necessary:

1. Any worthwhile treatment will affect the subject in more than one way. Hence the problem for the investigator is to determine in which specific ways the subjects will be affected, and then find sensitive measurement techniques for those variables.
2. Through the use of multiple criterion measures we can obtain a more complete and detailed description of the phenomenon under investigation.
3. Treatments can be expensive to implement, while the cost of obtaining data on several dependent variables is relatively small, and maximizes information gain. (p. 2)

Because the assumption of equal covariance matrices of the k populations is unlikely to be satisfied in practice, there is a potential danger that τ will not equal α when the usual MANOVA criteria are used. In this study, estimates of τ were reported for four tests that can be employed in MANOVA: the Pillai-Bartlett trace, James' first order test, James' second order test, and Johansen's test. Both equal and unequal sample sizes were employed. The results provided evidence about whether or not the latter three tests are good alternatives to the Pillai-Bartlett trace, which was reported by Olson to be more robust when sample sizes are equal than Roy's, Wilks', or the Hotelling-Lawley test.

CHAPTER 2
REVIEW OF LITERATURE

The Behrens-Fisher Problem

One of the most common situations in educational research is the comparison of the means of two populations.
If we assume the variable is normally distributed in each population and has equal variances in the two populations, the t test is the uniformly most powerful test for the hypothesis that the means are equal. The robustness of t when the assumption of variance equality is violated has been studied extensively. Scheffe (1959) summarized results of studies by Hsu (1938) and Welch (1938, 1951). In general, if the two samples are of equal size, the distribution of t is changed very little by inequality of variance, especially for large sample sizes. However, if the sample sizes are unequal, inequality of the variances affects the distribution considerably. The t test is liberal when the larger sample is drawn from the population with the smaller variance. It is conservative otherwise.

Yao (1965) pointed out that Behrens was the first to offer a solution to the problem of testing equality of the means of two populations without assuming that the variances are equal. This problem has become known as the Behrens-Fisher problem. Fisher (1935, 1939) showed that Behrens' solution could be derived using his theory of statistical inference called "fiducial probability." Fisher and Healy (1956) calculated tables of the Behrens-Fisher distribution. James (1959) and Ray and Pitman (1961) further studied the distribution. Six other solutions to the Behrens-Fisher problem were discussed and compared by Scheffe (1970). According to Scheffe, within the Neyman-Pearson school of thought, Welch's approximate degrees of freedom (APDF) solution, which requires only the ubiquitous t-table, is a satisfactory practical solution to the Behrens-Fisher problem.

Welch's Solutions

APDF Solution

Let x̄ᵢ and Sᵢ² (i = 1, 2) denote the sample means and variances of two samples of sizes n₁ and n₂, and let

$$
v = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}, \qquad
f = \frac{\left(S_1^2/n_1 + S_2^2/n_2\right)^2}{\dfrac{S_1^4}{n_1^2 (n_1 - 1)} + \dfrac{S_2^4}{n_2^2 (n_2 - 1)}}
$$

Welch's approximation involves referring v to a table of the t distribution, entered with f degrees of freedom.
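As a rough illustration of the APDF idea, the sketch below computes a two-sample statistic and its approximate degrees of freedom from summary statistics. It assumes the standard form of Welch's statistic (difference in means over the square root of the summed variance-to-size ratios); the function name `welch_apdf` and the example numbers are hypothetical and are not taken from the dissertation.

```python
import math


def welch_apdf(mean1, var1, n1, mean2, var2, n2):
    """Return (v, f): the two-sample statistic and its approximate df.

    var1 and var2 are assumed to be unbiased sample variances.
    """
    a1 = var1 / n1  # estimated variance of the first sample mean
    a2 = var2 / n2  # estimated variance of the second sample mean
    v = (mean1 - mean2) / math.sqrt(a1 + a2)
    # f = (a1 + a2)^2 / [a1^2/(n1 - 1) + a2^2/(n2 - 1)]
    f = (a1 + a2) ** 2 / (a1 ** 2 / (n1 - 1) + a2 ** 2 / (n2 - 1))
    return v, f


# Hypothetical summary statistics: equal variances and equal sizes,
# in which case f reduces to n1 + n2 - 2.
v, f = welch_apdf(52.0, 16.0, 10, 48.0, 16.0, 10)
print(round(v, 3), round(f, 1))
```

The value of f always lies between the smaller of n₁ − 1 and n₂ − 1 and the pooled value n₁ + n₂ − 2, which is why the statistic can be referred to an ordinary t-table.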
Welch (1947) showed that the errors involved in making probability statements about v on the basis of this particular approximation are of order 1/fᵢ.

Series Solution

In addition to his APDF solution, Welch (1947) derived an exact solution, in the sense that it is independent of the irrelevant population parameters σᵢ². Let X̄₁ and X̄₂, the means of two samples, be distributed independently and normally with means μ₁, μ₂ and variances σ₁²/n₁, σ₂²/n₂ respectively. Suppose that the observed data provide estimates Sᵢ² of the variances σᵢ², and that these estimates are distributed independently of each other and of X̄₁, X̄₂. Then

$$
\Pr\left[(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2) < h(S_1^2, S_2^2, P)\right] = P
$$

where 0 < P < 1. The quantity h(S₁², S₂², P) is a function of the Sᵢ² and P such that P is the probability that (X̄₁ − X̄₂) − (μ₁ − μ₂) is smaller than h(S₁², S₂², P). To test the hypothesis H₀: μ₁ = μ₂, h(S₁², S₂², P) is calculated and serves as a critical value for the difference between the sample means. Welch (1947) provided a formula for h(S₁², S₂², P) up to terms of order 1/fᵢ²:

$$
h(S_1^2, S_2^2, P) = \xi \Bigl(\sum_i \lambda_i S_i^2\Bigr)^{1/2}
\Biggl[ 1
+ \frac{1 + \xi^2}{4}\,\frac{\sum_i \lambda_i^2 S_i^4 / f_i}{\bigl(\sum_i \lambda_i S_i^2\bigr)^2}
- \frac{1 + \xi^2}{2}\,\frac{\sum_i \lambda_i^2 S_i^4 / f_i^2}{\bigl(\sum_i \lambda_i S_i^2\bigr)^2}
+ \frac{3 + 5\xi^2 + \xi^4}{3}\,\frac{\sum_i \lambda_i^3 S_i^6 / f_i^2}{\bigl(\sum_i \lambda_i S_i^2\bigr)^3}
- \frac{15 + 32\xi^2 + 9\xi^4}{32}\,\frac{\bigl(\sum_i \lambda_i^2 S_i^4 / f_i\bigr)^2}{\bigl(\sum_i \lambda_i S_i^2\bigr)^4}
\Biggr]
$$

where ξ is the normal deviate such that Φ(ξ) = P, and λᵢ = 1/nᵢ. Aspin (1948) extended the work further to the terms of order 1/fᵢ⁴. In addition, tables of critical values were computed by Aspin (1948, 1949) and Trickett, James, and Welch (1954, 1956). The limiting behavior of the series solution was studied by Wald (1955). Welch's series solution to the Behrens-Fisher problem agrees with Welch's (1938) APDF solution up to and including the term of order 1/fᵢ.

James' Series Solution for Comparing Several Means

James (1951) derived, for the k-sample case, a series solution to the Behrens-Fisher problem. Let X̄₁, ..., X̄ₖ, the means of k samples, be distributed independently and normally with means μ₁, ..., μₖ and variances σ₁²/n₁, ..., σₖ²/nₖ respectively.
Let Sᵢ² (i = 1, ..., k) be estimates of the σᵢ². The Sᵢ² are distributed independently of the X̄ᵢ and of each other. To test the hypothesis H₀: μ₁ = μ₂ = ... = μₖ we calculate the statistic

$$
\sum_{i=1}^{k} W_i (\bar{X}_i - \bar{X})^2
$$

where

$$
W_i = (S_i^2 / n_i)^{-1}, \qquad W = \sum_{i=1}^{k} W_i, \qquad \bar{X} = \frac{\sum_{i=1}^{k} W_i \bar{X}_i}{W}
$$

If the nᵢ are all large, Σᵢ Wᵢ(X̄ᵢ − X̄)² can be taken to have the χ² distribution with k − 1 degrees of freedom. However, if the nᵢ are not all large, the test statistic may not have a χ² distribution with k − 1 degrees of freedom. James (1951) sought to derive a series function of the Sᵢ² such that

$$
\Pr\Bigl[\sum_i W_i (\bar{X}_i - \bar{X})^2 < 2h(S_i^2)\Bigr] = P
$$

James found approximations to 2h(Sᵢ²) of orders 1/fᵢ and 1/fᵢ². To order 1/fᵢ, he found

$$
2h(S_i^2) = \chi^2_{k-1;\alpha} \left[ 1 + \frac{3\chi^2_{k-1;\alpha} + (k + 1)}{2(k^2 - 1)} \sum_{i=1}^{k} \frac{1}{f_i}\left(1 - \frac{W_i}{W}\right)^2 \right]
$$

The quantity 2h(Sᵢ²) serves as James' critical value. The test statistic, Σᵢ Wᵢ(X̄ᵢ − X̄)², is compared with this critical value and H₀ is rejected if Σᵢ Wᵢ(X̄ᵢ − X̄)² > 2h(Sᵢ²). James also provided the approximation of order 1/fᵢ², which can be shown to agree with Welch's (1947) result when k = 2. However, James pointed out that this second order approximation involves a good deal of numerical calculation.

Welch's APDF Solution for Comparing Several Means

Welch derived an alternative solution for the k-sample case. By comparing the cumulant generating function of Σᵢ Wᵢ(X̄ᵢ − X̄)² with that of the F distribution, Welch brought the two into agreement. Welch found that, to order 1/fᵢ, Σᵢ Wᵢ(X̄ᵢ − X̄)² is distributed as C times F, where

$$
C = (k - 1) + \frac{2(k - 2)}{k + 1} \sum_{i=1}^{k} \frac{1}{f_i}\left(1 - \frac{W_i}{W}\right)^2
$$

This F distribution has degrees of freedom

$$
df_1 = k - 1 \qquad \text{and} \qquad df_2 = \left[ \frac{3}{k^2 - 1} \sum_{i=1}^{k} \frac{1}{f_i}\left(1 - \frac{W_i}{W}\right)^2 \right]^{-1}
$$

... nominal α level is about 0.001. For larger sample sizes in both samples, such as 13:19, the maximum deviation from the nominal α level decreases to 0.0005. Wang also found that the Welch-Aspin series solution (Welch, 1947; Aspin, 1948) gives even smaller deviations from the nominal α than Welch's APDF test.
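The k-sample quantities described in this section (the weighted statistic, James' first order critical value, and Welch's APDF scaling) can be sketched from summary statistics as follows. This is a minimal sketch, not the author's program: the function names are hypothetical, the formulas used are the standard first order James (1951) and Welch (1951) expressions, and the upper-α χ² quantile is assumed to be supplied by the caller (for example, from a table).

```python
def weighted_stat(means, variances, sizes):
    """Return (stat, lam) for k independent samples.

    stat = sum_i W_i (xbar_i - xbar)^2 with W_i = n_i / S_i^2
    lam  = sum_i (1 - W_i / W)^2 / f_i  with f_i = n_i - 1
    """
    w = [n / s2 for s2, n in zip(variances, sizes)]
    w_total = sum(w)
    grand = sum(wi * m for wi, m in zip(w, means)) / w_total
    stat = sum(wi * (m - grand) ** 2 for wi, m in zip(w, means))
    lam = sum((1 - wi / w_total) ** 2 / (n - 1) for wi, n in zip(w, sizes))
    return stat, lam


def james_first_order_critical(chi2_crit, k, lam):
    """First order critical value 2h; chi2_crit is the chi-square quantile, k - 1 df."""
    return chi2_crit * (1 + (3 * chi2_crit + k + 1) / (2 * (k ** 2 - 1)) * lam)


def welch_apdf_k(stat, k, lam):
    """Return (F, df1, df2) for the k-sample APDF test."""
    c = (k - 1) + 2 * (k - 2) / (k + 1) * lam
    return stat / c, k - 1, (k ** 2 - 1) / (3 * lam)


# Hypothetical summary statistics for k = 3 groups.
stat, lam = weighted_stat([10.0, 12.0, 15.0], [4.0, 9.0, 25.0], [10, 15, 20])
crit = james_first_order_critical(5.991, 3, lam)  # 5.991: chi-square .95 quantile, 2 df
print(stat > crit, welch_apdf_k(stat, 3, lam))
```

Both procedures use the same statistic and the same heteroscedasticity summary `lam`; they differ only in whether the correction is applied to the critical value (James) or to the reference distribution (Welch).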
However, Wang pointed out that because the Welch-Aspin critical values are available for only a selected set of α, f₁, and f₂, and the actual computation of the critical values is very tedious, it seems reasonable to use the Welch APDF test, which requires only the t-table. Scheffe (1970) also opined, as mentioned earlier, that Welch's APDF solution is a satisfactory practical solution of the Behrens-Fisher problem.

Brown and Forsythe (1974) reported that in the k-sample situation, τ for the ANOVA F test deviates markedly from the nominal α level when the variances of the populations are unequal. The Type I error rates of Welch's (1951) APDF solution and James' (1951) solution, on the other hand, are near the nominal α levels. They further pointed out that Welch's (1951) APDF solution is superior to James' (1951) series solution in controlling τ, especially when the sample sizes are small.

Dijkstra and Werter (1981) showed that James' second order test gives better control of τ than does Welch's test. Their conclusions were supported by Wilcox (1987).

Hotelling's Two Sample T²

Suppose the purpose is to compare the mean vectors μ₁ and μ₂ of two p-dimensional multivariate normal populations on the basis of a random sample from each. The two populations are assumed to have the same, though unknown, covariance matrix Σ of full rank p. Let the samples be denoted by xᵢⱼ (j = 1, 2, ..., nᵢ; i = 1, 2). Define the sample mean vectors and covariance matrices as

$$
\bar{x}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij}, \qquad
S_i = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)(x_{ij} - \bar{x}_i)'
$$

For testing the hypothesis H₀: μ₁ = μ₂ vs H₁: μ₁ ≠ μ₂, Hotelling (1931) proposed the T² statistic, which is a generalization of the univariate t test. The test statistic is

$$
T^2 = \frac{n_1 n_2}{n_1 + n_2} (\bar{x}_1 - \bar{x}_2)' S^{-1} (\bar{x}_1 - \bar{x}_2)
$$

where

$$
S = \frac{(n_1 - 1) S_1 + (n_2 - 1) S_2}{n_1 + n_2 - 2}
$$

Hotelling showed that the quantity

$$
\frac{n_1 + n_2 - p - 1}{(n_1 + n_2 - 2)\,p}\, T^2
$$

has an F distribution with degrees of freedom p and n₁ + n₂ − p − 1.
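A short numerical sketch of the two-sample T² and its F transformation follows. It assumes NumPy is available and that each sample is stored as an nᵢ × p array; the function name `hotelling_t2` and the simulated data are illustrative only.

```python
import numpy as np


def hotelling_t2(x1, x2):
    """Return (T2, F, (df1, df2)) for two samples stored as n_i x p arrays."""
    n1, p = x1.shape
    n2 = x2.shape[0]
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    # np.cov with rowvar=False uses the n_i - 1 divisor, matching S_i.
    s1 = np.atleast_2d(np.cov(x1, rowvar=False))
    s2 = np.atleast_2d(np.cov(x2, rowvar=False))
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    t2 = n1 * n2 / (n1 + n2) * (diff @ np.linalg.solve(pooled, diff))
    f_stat = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2
    return t2, f_stat, (p, n1 + n2 - p - 1)


# Simulated data under the null hypothesis with a common covariance matrix.
rng = np.random.default_rng(0)
sample1 = rng.normal(size=(12, 3))
sample2 = rng.normal(size=(15, 3))
t2, f_stat, df = hotelling_t2(sample1, sample2)
print(t2, f_stat, df)
```

Because the pooled matrix S is used, the statistic relies on the equal covariance assumption; the tests discussed later in the chapter replace S with separately estimated matrices.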
H₀ is rejected if

$$
T^2 > \frac{(n_1 + n_2 - 2)\,p}{n_1 + n_2 - p - 1}\, F_{\alpha;\,p,\,\nu}
$$

where ν = n₁ + n₂ − p − 1.

Behavior of T² When Covariance Matrices Are Unequal

The robustness of Hotelling's T² to violation of the homogeneous covariance matrices assumption has been investigated by many researchers both analytically (Ito & Schull, 1964) and empirically (Algina & Oshima, 1989; Hakstian, Roed, & Lind, 1979; Holloway & Dunn, 1967; Hopkins & Clay, 1963). The inequality of the covariance matrices, Σ₁ ≠ Σ₂, is reflected by the eigenvalues θᵢ, i = 1, 2, ..., p, of the matrix Σ₁Σ₂⁻¹.

Ito and Schull (1964) investigated the large-sample properties of the T² statistic in the presence of unequal covariance matrices Σ₁ and Σ₂. They demonstrated that when n₁ and n₂ are equal and large, and the eigenvalues of Σ₁Σ₂⁻¹ are equal to one another but are not equal to one, heteroscedasticity has no effect upon τ. Ito and Schull also investigated the situation in which the eigenvalues of Σ₁Σ₂⁻¹ are distinct and both samples are very large, but only for p = 2. They reported τ near α provided the eigenvalues are within the range (.5, 2).

Hakstian, Roed, and Lind (1979) also found that with equal sample sizes the T² procedure is generally robust with respect to violation of the homogeneity of covariance matrices assumption. However, even when n₁ = n₂, other factors, such as the sample sizes, p, and θ, also play important roles in the robustness of T². Based on 1000 replications, Hopkins and Clay (1963) showed that when n₁ = n₂ > 10, T² is rather robust against variance inequality, but that this robustness does not extend to smaller sample sizes. For example, when n₁ = n₂ = 5, p = 2, θ₁ = θ₂ = 10.24, and α = 0.05, τ̂ was .083. Holloway and Dunn (1967) found that for large and equal sample sizes, the robustness of T² depends on p. For example, when n₁ = n₂ = 50 and the θᵢ equal 10, τ̂ is close to α for p = 2 and p = 3, but departs fairly markedly from α when p = 7 or p = 10.
For unequal sample sizes, in all of the preceding studies, researchers found that the test leads to unacceptable Type I error rates as the degree of population covariance matrix heterogeneity increases. Even for p = 2 and relatively mild heterogeneity, for example Σ₂ = 2.25Σ₁, a sample size ratio of only 1.1:1 can produce an unacceptable Type I error rate (Algina & Oshima, 1989). When the larger sample is drawn from the population with the smaller covariance matrix, the test results in overestimation of significance. When the larger sample is drawn from the population with the larger covariance matrix, the test results in underestimation of significance. This tendency increases with the magnitude of the inequality of the two sample sizes, with the degree of heterogeneity, and with p.

In general, with large and equal sample sizes and a fairly large ratio of (n₁ + n₂) to p, the T² test is robust. Otherwise, the test is less robust, being conservative when the larger sample is drawn from the population with generally greater dispersion, and liberal in the opposite situation. Therefore, the behavior of T² under violation of the homogeneous covariance matrices assumption is similar to that of the univariate t test.

Yao's and James' Tests

James (1954) and Yao (1965) proposed tests to deal with the multivariate Behrens-Fisher problem when there are two samples. The tests allow the assumption of equal population covariance matrices to be relaxed. James' (1954) test is a generalization of James' (1951) univariate solution of the Behrens-Fisher problem. Yao's test is a generalization of Welch's (1938, 1951) APDF solution. The test statistic is

$$
T_v^2 = (\bar{x}_1 - \bar{x}_2)' \left( \frac{S_1}{n_1} + \frac{S_2}{n_2} \right)^{-1} (\bar{x}_1 - \bar{x}_2)
$$

This test statistic is asymptotically distributed as chi-square with p degrees of freedom. James proposed a critical value, 2h(Sᵢ), which is a generalization of his univariate critical value discussed in the previous section.
To order 1/fᵢ, this critical value is

$$
2h(S_i) = \chi^2_{p;\alpha} \left( A + B \chi^2_{p;\alpha} \right)
$$

where

$$
A = 1 + \frac{1}{2p} \sum_{i=1}^{2} \frac{\left[\operatorname{tr}(I - W^{-1} W_i)\right]^2}{f_i}, \qquad
B = \frac{1}{p(p + 2)} \left\{ \sum_{i=1}^{2} \frac{\operatorname{tr}\left[(I - W^{-1} W_i)^2\right]}{f_i} + \frac{1}{2} \sum_{i=1}^{2} \frac{\left[\operatorname{tr}(I - W^{-1} W_i)\right]^2}{f_i} \right\}
$$

with Wᵢ = (Sᵢ/nᵢ)⁻¹ and W = W₁ + W₂.

For Yao's test, let y = x̄₁ − x̄₂, Aᵢ = Sᵢ/nᵢ, and A = A₁ + A₂. Define Vᵢ = y′A⁻¹AᵢA⁻¹y. The approximate degrees of freedom (APDF) f_T is defined by

$$
\frac{1}{f_T} = \sum_{i=1}^{2} \frac{1}{f_i} \left( \frac{V_i}{T_v^2} \right)^2
$$

Therefore, the critical region is T_v² > T²_α(p, f_T). If tables of T² are not available, the transformation

$$
F_v = \frac{f_T - p + 1}{f_T\, p}\, T_v^2
$$

can be used. Thus we reject H₀: μ₁ = μ₂ if F_v > F_α(p, f_T − p + 1).

Behavior of Yao's and James' Tests

Yao (1965) conducted a simulation study to estimate τ for James' test and his own test. The results indicated that both tests are quite robust. However, the generality of this conclusion is limited because Yao studied only p = 2. Ito (1969) investigated the behavior of James' test analytically. He studied both equal and unequal sample sizes. For k = 2, the total sample sizes, n₁ + n₂, were 20, 300, and 600. He found that when n₁ + n₂ = 300 or 600, the test was quite robust. When the total sample size was 20, the test was liberal; i.e., τ was greater than α. This tendency increases as the sample sizes decrease, the inequality of the sample sizes increases, the inequality of the covariance matrices increases, and p increases. The behavior of James' test for sample sizes between 20 and 300 was not included in Ito's study.

Algina and Tang (1988) investigated the performance of T², Yao's test, and James' test under a broader set of conditions than Yao (1965) studied. In the Algina-Tang study, p = 2, 6, or 10; (n₁ + n₂)/p = 6, 10, or 20; n₁:n₂ ranged from 1:1.25 to 1:5; and θᵢ ranged from 1.5 to 3.0. They found that the deviation of τ from α is larger for Hotelling's T² than for either Yao's or James' test when θ ≠ 1. Neither Yao's nor James' test tends to be conservative. The τ for James' test tends to be larger than that for Yao's test. Algina and Tang concluded that in terms of control of Type I error rates, Yao's test is preferable to James' test.
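Yao's approximate degrees of freedom can be illustrated with the sketch below. It assumes NumPy, data stored as nᵢ × p arrays, and the usual statement of Yao's (1965) procedure, in which the squared shares Vᵢ/T_v² are weighted by 1/(nᵢ − 1); the function name `yao_test` and the simulated data are hypothetical.

```python
import numpy as np


def yao_test(x1, x2):
    """Return (T2_v, f_T, F_v) for two samples stored as n_i x p arrays."""
    n1, p = x1.shape
    n2 = x2.shape[0]
    y = x1.mean(axis=0) - x2.mean(axis=0)
    a1 = np.atleast_2d(np.cov(x1, rowvar=False)) / n1  # A_1 = S_1 / n_1
    a2 = np.atleast_2d(np.cov(x2, rowvar=False)) / n2  # A_2 = S_2 / n_2
    a_inv_y = np.linalg.solve(a1 + a2, y)              # A^{-1} y
    t2 = y @ a_inv_y                                   # y' A^{-1} y
    v1 = a_inv_y @ a1 @ a_inv_y                        # y' A^{-1} A_1 A^{-1} y
    v2 = a_inv_y @ a2 @ a_inv_y
    inv_f = (v1 / t2) ** 2 / (n1 - 1) + (v2 / t2) ** 2 / (n2 - 1)
    f_t = 1.0 / inv_f
    f_v = (f_t - p + 1) / (f_t * p) * t2               # refer to F(p, f_T - p + 1)
    return t2, f_t, f_v


# Simulated heteroscedastic data: the smaller sample has the larger dispersion.
rng = np.random.default_rng(2)
small = 3.0 * rng.normal(size=(8, 2))
large = rng.normal(size=(20, 2))
print(yao_test(small, large))
```

Since V₁ + V₂ = T_v², the two shares sum to one, so f_T always falls between the smaller of n₁ − 1 and n₂ − 1 and the pooled value n₁ + n₂ − 2.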
When the sample sizes are equal (n₁ = n₂ = n), Yao's test has appropriate Type I error rates provided (2n/p) > 10. When n₁ ≠ n₂ and 10 < (n₁ + n₂)/p, Yao's test can also be safely used, provided the ratio of the larger to the smaller sample size is less than or equal to 2:1. When n₁ ≠ n₂ and (n₁ + n₂)/p > 20, Yao's test can be safely used a) if p = 2 and the ratio of the two sample sizes is 5:1 or smaller; b) if p = 6 and the ratio is 3:1 or smaller; and c) if p = 10 and the ratio is 4:1 or smaller.

MANOVA

Suppose the purpose is to compare the mean vectors μ₁, μ₂, ..., μₖ of k p-dimensional normal populations on the basis of a random sample from each. Let H (p × p) and E (p × p) be the sums-of-products matrices for hypothesis and error, respectively, defined in the one-way case as

$$
H = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})'
$$

$$
E = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)(x_{ij} - \bar{x}_i)'
$$

where xᵢⱼ is the jth observation vector in group i, x̄ᵢ is the mean vector for the ith group, and x̄ is the grand mean vector. It is known that all invariant test criteria are functions of the eigenvalues of HE⁻¹ (the rank of which is s = min(p, k − 1)). There are four popular test statistics, each a different function of the eigenvalues of HE⁻¹. Let λₘ be the mth eigenvalue of HE⁻¹, m = 1, ..., s. The four test criteria are:

1. Roy's (1945, 1953) largest root (R): R = λ₁/(1 + λ₁), where λ₁ is the largest eigenvalue of HE⁻¹;
2. The Hotelling-Lawley (Lawley, 1938, 1939; Hotelling, 1951) trace (T): T = Σₘ λₘ;
3. Wilks' (1932) likelihood ratio (W): W = Πₘ 1/(1 + λₘ);
4. The Pillai-Bartlett (Bartlett, 1939; Pillai, 1955) trace (V): V = Σₘ λₘ/(1 + λₘ).

When s = 1, i.e., when either k = 2 or p = 1, the four criteria lead to identical conclusions.

The robustness of MANOVA is, for the most part, an unexplored topic. Korin (1972) studied the behavior of R, T, and W under violation of the homogeneity of covariance matrices assumption. In his study, the base population had a covariance matrix equal to I, the identity matrix.
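The four MANOVA criteria defined in this section can be computed from the eigenvalues of HE⁻¹ as in the following sketch. It assumes NumPy and a list of nᵢ × p arrays, one per group; the function name `manova_criteria` and the simulated groups are illustrative only.

```python
import numpy as np


def manova_criteria(groups):
    """Return Roy's R, Hotelling-Lawley T, Wilks' W, and Pillai-Bartlett V."""
    grand = np.vstack(groups).mean(axis=0)  # grand mean over all observations
    p = grand.size
    h = np.zeros((p, p))
    e = np.zeros((p, p))
    for g in groups:
        d = g.mean(axis=0) - grand
        h += g.shape[0] * np.outer(d, d)     # hypothesis sums of products
        centered = g - g.mean(axis=0)
        e += centered.T @ centered           # error sums of products
    # Keep the s = min(p, k - 1) largest eigenvalues of H E^{-1}.
    lam = np.sort(np.linalg.eigvals(h @ np.linalg.inv(e)).real)[::-1]
    lam = lam[: min(p, len(groups) - 1)]
    return {
        "R": lam[0] / (1 + lam[0]),
        "T": lam.sum(),
        "W": np.prod(1 / (1 + lam)),
        "V": np.sum(lam / (1 + lam)),
    }


# Simulated one-way layout with k = 3 groups and p = 3 variables.
rng = np.random.default_rng(3)
data = [rng.normal(size=(n, 3)) for n in (10, 10, 10)]
print(manova_criteria(data))
```

With two groups there is a single nonzero eigenvalue, so R equals V and W equals 1 − V, which is one way to see why the four criteria lead to the same decision when s = 1.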
The other populations had covariance matrices of the form d²I, where d = √1.5, √3.0, or √10.0. Korin reported that covariance heterogeneity in small samples, even when the sample sizes are all equal, produces Type I error rates higher than the nominal significance level. Among these three test criteria, R has the highest Type I error rate. He did not include the V test in his study.

Olson (1974) compared the robustness of R, T, W, and V to violations of the homogeneity of covariance matrices assumption. In his study, the number of dimensions, p, was 2, 3, 6, or 10; the number of groups, k, was 2, 3, 6, or 10; and the sample sizes, n, were 5, 10, or 50. Olson referred to a sample drawn from a population that had a different covariance matrix than the remaining populations as a contaminated group. There were two levels of contamination, namely, low concentration of contamination and high concentration of contamination. In the low concentration condition, contamination occurred equally in all dimensions of p-space. Under high concentration, however, contamination occurred in only one dimension of p-space. The degree of contamination, d, was 2, 3, or 6.

Olson (1974) found that in the low concentration of contamination condition, all four criteria became liberal when the assumption of equal covariance matrices was violated. The departure from the nominal α can be serious if d > 3. The order of the tests in terms of the magnitude of the departure of τ from α was typically R > T > W > V. The size of the departure increased as heteroscedasticity increased. With increasing values of p, the τs for R, T, and W also increased. However, τ for V did not respond in any clear-cut way to changes in p. Increasing the sample sizes decreased τ for R, T, and W. It slightly increased τ for V when k > 6. However, this increased τ for V was still smaller than those of R, T, and W.
Under the high concentration-of-contamination condition, Olson (1974) found that the departure from the nominal significance level was mild, even though one dimension was severely contaminated.

Olson (1974) pointed out that, above all, every effort should be made to maintain samples of equal size, or at least to prevent contaminated groups from being smaller than other groups. Olson concluded that the V test is the most robust among the four against heterogeneity of covariance matrices, although its Type I error rate is somewhat greater than α.

James' and Johansen's Solutions

James (1954) and Johansen (1980) proposed tests to deal with the multivariate k-sample Behrens-Fisher problem. James' (1954) solution is an extension of his solution to the two-sample multivariate Behrens-Fisher problem. Johansen (1980), on the other hand, generalized Welch's (1951) APDF solution.

James' (1954) Solution

The test statistic is

$$
\sum_{i=1}^{k} (\bar{x}_i - \bar{x})' W_i (\bar{x}_i - \bar{x})
$$

where

$$
\bar{x} = W^{-1} \sum_{i=1}^{k} W_i \bar{x}_i, \qquad W_i = (S_i / n_i)^{-1}, \qquad W = \sum_{i=1}^{k} W_i
$$

The critical value is

$$
2h(S_i) = \chi^2_{r;\alpha} \left( A + B \chi^2_{r;\alpha} \right)
$$

where r = p(k − 1),

$$
A = 1 + \frac{1}{2r} \sum_{i=1}^{k} \frac{\left[\operatorname{tr}(I - W^{-1} W_i)\right]^2}{f_i}, \qquad
B = \frac{1}{r(r + 2)} \left\{ \sum_{i=1}^{k} \frac{\operatorname{tr}\left[(I - W^{-1} W_i)^2\right]}{f_i} + \frac{1}{2} \sum_{i=1}^{k} \frac{\left[\operatorname{tr}(I - W^{-1} W_i)\right]^2}{f_i} \right\}
$$

and fᵢ = nᵢ − 1. James (1954) also proposed a second order test. It was presented as formula 6.7 in his paper.

Johansen's (1980) Solution

The test statistic is

$$
\frac{1}{C} \sum_{i=1}^{k} (\bar{x}_i - \bar{x})' W_i (\bar{x}_i - \bar{x})
$$

where

$$
C = p(k - 1) + 2A - \frac{6A}{p(k - 1) + 2}
$$

and

$$
A = \sum_{i=1}^{k} \frac{\operatorname{tr}\left[(I - W^{-1} W_i)^2\right] + \left[\operatorname{tr}(I - W^{-1} W_i)\right]^2}{2 f_i}
$$

The critical value is F with degrees of freedom df₁ and df₂, where

$$
df_1 = p(k - 1), \qquad df_2 = \frac{p(k - 1)\left[p(k - 1) + 2\right]}{3A}
$$

Ito (1969) studied the large sample behavior of James' (1954) test. He used total sample sizes of 300 or 600. For k = 3 and 5, he found that James' test always tends to result in a slight overestimation of significance. This tendency is more severe for unequal sample sizes than for equal sample sizes.
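The k-sample statistic shared by James' first order test and Johansen's test, together with Johansen's scaling and James' first order critical value, can be sketched as follows. This is an illustrative sketch rather than the program used in the study: it assumes NumPy, one nᵢ × p array per group, and a χ² quantile supplied by the caller; all function names are hypothetical. James' second order test is not covered.

```python
import numpy as np


def bf_pieces(groups):
    """Return the weighted statistic and, per group, (tr Q_i, tr Q_i^2, f_i),
    where Q_i = I - W^{-1} W_i and W_i = (S_i / n_i)^{-1}."""
    p = groups[0].shape[1]
    means = [g.mean(axis=0) for g in groups]
    w = [np.linalg.inv(np.atleast_2d(np.cov(g, rowvar=False)) / g.shape[0])
         for g in groups]
    w_total = sum(w)
    grand = np.linalg.solve(w_total, sum(wi @ m for wi, m in zip(w, means)))
    stat = sum((m - grand) @ wi @ (m - grand) for wi, m in zip(w, means))
    traces = []
    for wi, g in zip(w, groups):
        q = np.eye(p) - np.linalg.solve(w_total, wi)
        traces.append((np.trace(q), np.trace(q @ q), g.shape[0] - 1))
    return stat, traces


def johansen(groups):
    """Return (F, df1, df2) for the APDF test."""
    stat, traces = bf_pieces(groups)
    r = groups[0].shape[1] * (len(groups) - 1)
    a = sum((t2 + t1 ** 2) / (2 * f) for t1, t2, f in traces)
    c = r + 2 * a - 6 * a / (r + 2)
    return stat / c, r, r * (r + 2) / (3 * a)


def james_first_order_critical(groups, chi2_crit):
    """First order critical value; chi2_crit is the chi-square quantile, r df."""
    _, traces = bf_pieces(groups)
    r = groups[0].shape[1] * (len(groups) - 1)
    s1 = sum(t1 ** 2 / f for t1, _, f in traces)
    s2 = sum(t2 / f for _, t2, f in traces)
    return chi2_crit * ((1 + s1 / (2 * r)) + (s2 + s1 / 2) / (r * (r + 2)) * chi2_crit)


# Simulated data: k = 3 groups, p = 2 variables, unequal sizes and dispersions.
rng = np.random.default_rng(1)
data = [s * rng.normal(size=(n, 2)) for n, s in ((10, 1.0), (12, 2.0), (15, 3.0))]
print(johansen(data))
```

For p = 1 these expressions reduce to Welch's (1951) and James' (1951) univariate forms, which gives a convenient check on the implementation.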
In addition, the discrepancy from the nominal α increased as p increased from 1 to 4. The robustness of James' test for k > 2 when sample sizes are small to moderate has not been investigated.

By using the method of James (1954) to derive the APDF test of Johansen (1980), Kaiser (1983) showed that equivalence to terms of order 1/n holds between the solutions of Johansen and James.

Summary

Based on the preceding review, one can conclude that the t, F, T², and MANOVA criteria are not robust to violation of the assumption of equal variances (covariance matrices in the multivariate case) when sample sizes are not equal. Tests proposed to deal with the Behrens-Fisher problem, on the other hand, are more robust than the usual tests. These tests follow one of two approaches. One is to develop the critical value of the test statistic through a series expansion (series solution). The other approach is to approximate the distribution of a test statistic through approximate degrees of freedom (APDF solution). Results of these two approaches agree to order 1/n. The solutions to the multivariate problem are generalizations of the univariate solutions.

In general, the tests designed to deal with the Behrens-Fisher problem are liberal, i.e., they tend to have higher Type I error rates than the nominal α. However, this liberal tendency is mild except when the sample-size ratios are extreme. Therefore, these tests generally perform much better than the usual tests in controlling the Type I error rate when the assumption of homogeneity of variance (covariance matrices in the multivariate case) is violated.

The preceding studies indicate that APDF solutions are more effective in controlling the Type I error rate than the series solutions of the first order. However, in the univariate case when a second order series solution is used, its Type I error rate is closer to α than that of the APDF solution.
CHAPTER 3
METHODOLOGY

The design of the study and the simulation procedure employed to carry out the study are described in this chapter. Four factors are the most important in designing a multivariate experiment: number of populations sampled, sample sizes, number of variables, and the significance level. The design of the present study was based on these four factors. In addition, the degree and form of heterogeneity were, of course, studied.

Design

The design of the study was based on the consideration that experimental conditions included in the simulation should reflect the reality of multivariate experiments in educational research.

Number of dimensions (p). Data were generated to simulate experiments in which there are p = 3 or p = 6 response variates. It seems likely that p > 6 seldom occurs in educational research, and therefore p = 6 was the upper limit chosen for the study.

Number of populations sampled (k). All of the simulated experiments had a simple one-way design with k = 3. Algina and Tang (1988) investigated the behavior of James' (1954) and Yao's (1965) tests for k = 2. Therefore, the present study concentrates on investigating the behavior of James' and Johansen's (1980) tests when k > 2. In the univariate literature (Dijkstra & Werter, 1981), it appears that the advantage of James' second order test is in experiments with many groups. However, experiments, especially multivariate experiments, with a large number of groups seem to be rare in educational research. Therefore, k was kept at three. Fixing this factor allowed another factor, the form of covariance matrix heteroscedasticity, to be investigated more thoroughly.

Sample-size ratios ($n_1 : n_2 : n_3$). Both equal and unequal sample sizes were used in the study. When the sample sizes were unequal, the sample-size ratios were small to moderately large. The basic ratios of $n_1 : n_2 : n_3$ used in the simulation are presented in Table 1.
Table 1
Sample-Size Ratios ($n_1 : n_2 : n_3$)

n_1 : n_2 : n_3
1 : 1 : 1
1 : 1 : 1.3
1 : 1 : 2
1 : 1.3 : 1.3
1 : 2 : 2

In some cases these basic ratios could not be maintained exactly because of the restriction of a fixed total sample size (N). The departure from these basic ratios was made as small as possible.

Ratio of total sample size (N) to number of variables. The ratio of N to p was 10, 15, or 20. Total sample sizes of 10p, 15p, or 20p were assigned to the k groups based on the ratios shown in Table 1. The sample sizes used in the present study are displayed in Tables 2 and 3. Algina and Tang (1988) studied the case of k = 2 samples and included N:p = 6, 10, and 20. They reported that N:p should be at least 10 to be reasonably sure that Yao's test, which was generally more robust than James' test, would be robust. With k > 2, it seems likely that N:p will need to be at least 10 for robustness to obtain. Therefore, in the present study, the upper limit of 20 was chosen to represent moderately large experiments: a total sample size of 60 when p = 3 and a total sample size of 120 when p = 6.

Heterogeneity of covariance matrices. Each population with a covariance matrix equal to a p x p identity matrix (I) will be called an "uncontaminated" population. Each population with a p x p diagonal covariance matrix D with at least one diagonal element not equal to one will be called a "contaminated" population. Three forms of D were used in this study. They are shown in Table 4.
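The three forms of D listed in Table 4 can be written out compactly. The sketch below is illustrative only: the function name `contaminated_diag` and the lookup `LEADING` are hypothetical, and the encoding (one leading entry differs from $d^2$ when p = 3, three when p = 6) is read directly from Table 4 rather than from a general rule stated in the study.

```python
import math
from typing import List

# Number of leading diagonal entries that differ from d^2 in Forms 2 and 3,
# as read from Table 4 (an assumption limited to p = 3 and p = 6).
LEADING = {3: 1, 6: 3}


def contaminated_diag(form: int, p: int, d: float) -> List[float]:
    """Return the diagonal of D for the given form (1, 2, or 3)."""
    if p not in LEADING:
        raise ValueError("only p = 3 and p = 6 appear in the study")
    d2 = d * d
    lead_values = {1: d2, 2: 1.0, 3: 1.0 / d2}
    if form not in lead_values:
        raise ValueError("form must be 1, 2, or 3")
    m = LEADING[p]
    return [lead_values[form]] * m + [d2] * (p - m)


if __name__ == "__main__":
    for form in (1, 2, 3):
        print(form, contaminated_diag(form, 3, math.sqrt(1.5)))
```

Form 1 scales every variance by $d^2$, Form 2 contaminates only part of the variables, and Form 3 inflates some variances while deflating others.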
Table 2
Sample Sizes When p = 3
Columns: N/p, n_1, n_2, n_3

Table 3
Sample Sizes When p = 6
Columns: N/p, n_1, n_2, n_3

Table 4
Forms of Covariance Matrix D

Form   p = 3                        p = 6
1      diag(d^2, d^2, d^2)          diag(d^2, d^2, d^2, d^2, d^2, d^2)
2      diag(1, d^2, d^2)            diag(1, 1, 1, d^2, d^2, d^2)
3      diag(1/d^2, d^2, d^2)        diag(1/d^2, 1/d^2, 1/d^2, d^2, d^2, d^2)

[...] heteroscedasticity, which seems more likely to be common in educational experiments than extreme heteroscedasticity, such as d = 6.0. In the present study, d = 3.0 was selected to investigate a larger degree of heteroscedasticity and to permit comparison of the results of the present study with those of Algina and Tang (1988) and Olson (1974).

Two relationships between sample size and covariance matrices were employed. In the positive relationship, the larger samples were associated with D. In the negative relationship, the smaller samples were associated with D. These two relationships are summarized in Table 5. For equal sample sizes, the two nonredundant heteroscedasticity patterns, (I, I, D) and (I, D, D), were employed.

Significance levels. Three levels of significance were employed: .10, .05, and .01.

Table 5
Combination of Sample-Size Ratios and Contamination

n_1 : n_2 : n_3      Positive     Negative
1.0 : 1.0 : 1.3      I, I, D      D, D, I
1.0 : 1.0 : 2.0      I, I, D      D, D, I
1.0 : 1.3 : 1.3      I, D, D      D, I, I
1.0 : 2.0 : 2.0      I, D, D      D, I, I

[...] twofold. First, by a well-known theorem (Anderson, 1958), for any pair of positive definite matrices there exists a matrix C that transforms the matrices into an identity matrix and a diagonal matrix simultaneously. Second, as shown below, the test statistics are invariant under the transformations $y_i = C\bar{x}_i$, $i = 1, \ldots, k$, where $\bar{x}_i$ is the observed mean vector and C is a p x p nonsingular matrix. In this section, the invariance of the test statistic and of the critical values is proved.
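The estimated variance-covariance matrix of $\bar{x}_i$ is $A_i = S_i/n_i$. That of $y_i$ is $CA_iC'$. The test statistic of both James' and Johansen's procedures is

$$t'V^{-1}t = \sum_{i=1}^{k} (\bar{x}_i - \bar{x})' W_i (\bar{x}_i - \bar{x}),$$

where $W_i = A_i^{-1}$ and $W = \sum_i W_i$, $i = 1, \ldots, k$; $t' = (t_1', t_2', \ldots, t_{k-1}')$, $t_i = \bar{x}_i - \bar{x}_k$, $i = 1, \ldots, k - 1$; and the covariance matrix of t is

$$\mathrm{Cov}(t) = V = \begin{pmatrix} A_1 + A_k & A_k & \cdots & A_k \\ A_k & A_2 + A_k & \cdots & A_k \\ \vdots & \vdots & \ddots & \vdots \\ A_k & A_k & \cdots & A_{k-1} + A_k \end{pmatrix}.$$

Let $t_i^* = Ct_i$ and $M = I \otimes C$, where $\otimes$ denotes the Kronecker product. Then $t^* = Mt$ and

$$\mathrm{Cov}(t^*) = \mathrm{Cov}(Mt) = \begin{pmatrix} C(A_1 + A_k)C' & CA_kC' & \cdots & CA_kC' \\ CA_kC' & C(A_2 + A_k)C' & \cdots & CA_kC' \\ \vdots & \vdots & \ddots & \vdots \\ CA_kC' & CA_kC' & \cdots & C(A_{k-1} + A_k)C' \end{pmatrix} = MVM' = V^*.$$

Now,

$$t^{*\prime} V^{*-1} t^* = (Mt)'(MVM')^{-1}(Mt) = t'M'M'^{-1}V^{-1}M^{-1}Mt = t'V^{-1}t.$$

Therefore the test statistic is invariant under nonsingular transformation.

For both James' first order and Johansen's critical values, the quantities that are relevant to the invariance property are

$$G_r = \mathrm{tr}[(I - W^{-1}W_i)^r], \qquad r = 1, 2, 3, \ldots$$

The critical value of James' second order procedure involves $G_r$ and traces of cyclical products of the matrices $W^{-1}W_i$ and $W^{-1}W_j$.

Let $\lambda_j$, $j = 1, \ldots, p$, be the eigenvalues of $I - W^{-1}W_i$. Then $\lambda_j^r$, $j = 1, \ldots, p$, are the eigenvalues of $(I - W^{-1}W_i)^r$. We need only show that the eigenvalues of $I - W^{-1}W_i$ are invariant, since

$$\mathrm{tr}[(I - W^{-1}W_i)^r] = \sum_{j=1}^{p} \lambda_j^r .$$

The $\lambda$s are the solutions to the equation

$$|I - W^{-1}W_i - \lambda I| = 0, \quad \text{or} \quad |W^{-1}W_i - (1 - \lambda)I| = 0.$$

For the transformed variables, $A_i^* = CA_iC'$, so that $W_i^* = C'^{-1}W_iC^{-1}$ and $W^{*-1} = CW^{-1}C'$, and we have

$$|CW^{-1}C'\,C'^{-1}W_iC^{-1} - (1 - \lambda^*)I| = 0, \quad \text{or} \quad |CW^{-1}W_iC^{-1} - (1 - \lambda^*)I| = 0.$$

If we premultiply by $C^{-1}$ and postmultiply by $C$, the eigenvalues remain unchanged. Carrying out these multiplications gives

$$|W^{-1}W_i - (1 - \lambda^*)I| = 0,$$

which proves $\lambda = \lambda^*$. Therefore $G_r = \mathrm{tr}[(I - W^{-1}W_i)^r]$ is invariant. By the property that $\mathrm{tr}(XY) = \mathrm{tr}(YX)$, for any cyclical product E of the matrices $W^{-1}W_i$ and $W^{-1}W_j$ the transformed product is $E^* = CEC^{-1}$, so that $\mathrm{tr}(E^*) = \mathrm{tr}(CEC^{-1}) = \mathrm{tr}(E)$; the traces of the cyclical products are therefore also invariant.

By the preceding results, using only diagonal matrices to simulate experiments in which there are only two sets of covariance matrices is justified.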
However, when there are more than two sets of different matrices, the matrices cannot always be simultaneously diagonalized by a matrix C.

Simulation Procedure

For each condition, data were generated, and the performance of the Pillai-Bartlett (V) test, James' first order test, James' second order test, and Johansen's test was evaluated using the generated data.

Generating mean vectors. To generate a mean vector, a p x 1 vector of uncorrelated standard normal pseudorandom variables was generated by using the RANNOR function in SAS (1985). Multiplying the vector by $1/\sqrt{n_i}$ gave a sample mean vector from an uncontaminated population. Multiplying the vector by $(1/\sqrt{n_i})D^{1/2}$ gave a sample mean vector from a contaminated population.

Generating covariance matrices. Let T be a p x p lower triangular matrix. Kshirsagar (1959) showed that if $T_{ij} \sim N(0, 1)$ for $i > j$ and $T_{ii}^2 \sim \chi^2_{n-i}$, where n is the number of observations in a sample, then $TT' \sim W(n - 1, I_p)$, a Wishart distribution with n - 1 degrees of freedom. Thus $S = (n - 1)^{-1}TT'$ is a covariance matrix for a sample of size n from a population with covariance matrix I. That is, S is a sample covariance matrix from an uncontaminated population. Pre- and postmultiplying S by $D^{1/2}$ gives a covariance matrix from a contaminated population. The $T_{ij}$s were generated by using the RANNOR function described above; the $T_{ii}^2$s were generated by using the RANGAM function.

Evaluating the performance of test statistics. For each replication, the data generated were analyzed by the Pillai-Bartlett trace (V), James' first order and second order tests, and Johansen's test. The test statistics and the critical values were calculated using PROC MATRIX in SAS. The proportion of 2000 replications that yielded significant results at $\alpha$ = .10, .05, and .01 was recorded.
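The generation steps above can be sketched in a few lines. This is not the study's SAS program: it is a minimal standard-library illustration in which `random.gauss` stands in for RANNOR and `random.gammavariate` stands in for RANGAM, and the function names `bartlett_cov` and `simulate_group` are hypothetical. It assumes D is diagonal, so that multiplying by $D^{1/2}$ amounts to scaling by the square roots of the diagonal entries.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def bartlett_cov(n: int, p: int, rng: random.Random) -> Matrix:
    """Sample covariance S = TT'/(n - 1), where TT' ~ W(n - 1, I_p)."""
    if n <= p:
        raise ValueError("need n > p for positive chi-square degrees of freedom")
    t = [[0.0] * p for _ in range(p)]
    for i in range(p):
        # Diagonal: T_ii^2 ~ chi-square with n - (i + 1) df (i is 0-based);
        # a chi-square with v df is Gamma(shape = v / 2, scale = 2).
        t[i][i] = math.sqrt(rng.gammavariate((n - i - 1) / 2.0, 2.0))
        for j in range(i):
            t[i][j] = rng.gauss(0.0, 1.0)  # below-diagonal N(0, 1) entries
    return [[sum(t[r][m] * t[c][m] for m in range(p)) / (n - 1)
             for c in range(p)] for r in range(p)]


def simulate_group(n: int, d_diag: Sequence[float],
                   rng: random.Random) -> Tuple[List[float], Matrix]:
    """Return (sample mean vector, sample covariance matrix) for one group.

    d_diag holds the diagonal of the population covariance matrix:
    all ones for an uncontaminated population, the diagonal of D otherwise.
    """
    p = len(d_diag)
    root = [math.sqrt(v) for v in d_diag]
    mean = [root[j] * rng.gauss(0.0, 1.0) / math.sqrt(n) for j in range(p)]
    s = bartlett_cov(n, p, rng)
    cov = [[root[r] * s[r][c] * root[c] for c in range(p)] for r in range(p)]
    return mean, cov


if __name__ == "__main__":
    rng = random.Random(1989)
    mean, cov = simulate_group(10, [9.0, 9.0, 1.0 / 9.0], rng)
    print(mean)
    print(cov)
```

Generating the mean vector and the covariance matrix directly, rather than generating n raw observations per group, relies on their independence under multivariate normality and keeps the cost of each replication independent of the sample size.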
Summary

Combining one level of k (k = 3), two levels of p (p = 3 or 6), four levels of unequal sample-size ratios (see Table 1), three levels of N/p (10, 15, 20), two levels of d ($d = \sqrt{1.5}$ or 3.0), three forms of D, and two levels of the relationship between sample size and contamination (positive or negative) gives 288 experimental conditions with unequal sample sizes. In addition, there were 72 conditions with equal sample sizes. The Pillai-Bartlett trace (V), James' first order and second order tests, and Johansen's test were applied in these experimental conditions. A general picture of the behavior of these tests was obtained from these 360 simulated experiments.

CHAPTER 4
RESULTS AND DISCUSSIONS

In this chapter, the results for nominal $\alpha$ = .05 are presented. In general, the major patterns of results are similar for $\alpha$ = .01, .05, and .10. Consequently, the results for $\alpha$ = .01 and $\alpha$ = .10 are tabled in Appendices A, B, and C. The chapter is divided into two major sections. In the first section the results are presented for conditions with equal sample sizes. The results for conditions with unequal sample sizes are reported in the second section of the chapter.

The Performance of the Four Multivariate Tests When $n_1 = n_2 = n_3$

The Performance of the Four Tests When $d = \sqrt{1.5}$

Estimated Type I error rates ($\hat{\tau}$s) are reported in Tables 6, 7, and 8 for the Pillai-Bartlett trace, Johansen's test, James' first order test, and James' second order test; these results are for $\alpha$ = .05 in conditions in which $n_1 = n_2 = n_3$ and $d = \sqrt{1.5}$. Thirty-six $\hat{\tau}$s are reported for each test. For the Pillai-Bartlett trace, one of the $\hat{\tau}$s falls outside the criterion interval $\alpha \pm 2\sqrt{\alpha(1 - \alpha)/2000}$. For Johansen's test, five $\hat{\tau}$s fall outside the criterion interval. The largest $\hat{\tau}$ was .065; four of the five occur when $n_i = 10$, i = 1, 2, 3. The number of $\hat{\tau}$s outside the interval is 21 and 16 for James' first order test and second order test respectively.
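The criterion interval used throughout this chapter is simple to evaluate. The sketch below computes $\alpha \pm 2\sqrt{\alpha(1 - \alpha)/2000}$ and flags an estimated rate as outside it; the function names `criterion_interval` and `outside` are hypothetical and serve only as an illustration.

```python
import math
from typing import Tuple


def criterion_interval(alpha: float, reps: int = 2000) -> Tuple[float, float]:
    """Return alpha -/+ two standard errors of a proportion based on reps trials."""
    half_width = 2.0 * math.sqrt(alpha * (1.0 - alpha) / reps)
    return alpha - half_width, alpha + half_width


def outside(tau_hat: float, alpha: float, reps: int = 2000) -> bool:
    """True when an estimated Type I error rate falls outside the interval."""
    lo, hi = criterion_interval(alpha, reps)
    return tau_hat < lo or tau_hat > hi


if __name__ == "__main__":
    for a in (0.10, 0.05, 0.01):
        lo, hi = criterion_interval(a)
        print(f"alpha = {a:.2f}: [{lo:.4f}, {hi:.4f}]")
```

For $\alpha$ = .05 and 2000 replications the interval is approximately .0403 to .0597, so an estimate of .065 is counted as outside it.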
Table 6
Actual Type I Error Rates for Four Multivariate Tests: $n_1 = n_2 = n_3$, $D = d^2 I$, $d = \sqrt{1.5}$, and $\alpha$ = .05
Columns: N/p, Pillai-Bartlett trace, Johansen's test, James' 1st order test, James' 2nd order test

Table 7
Actual Type I Error Rates for Four Multivariate Tests: $n_1 = n_2 = n_3$, D = diag(d^2, d^2, 1) or D = diag(d^2, d^2, d^2, 1, 1, 1), $d = \sqrt{1.5}$, and $\alpha$ = .05
Columns: N/p, Pillai-Bartlett trace, Johansen's test, James' 1st order test, James' 2nd order test

Table 8
Actual Type I Error Rates for Four Multivariate Tests: $n_1 = n_2 = n_3$, D = diag(d^2, d^2, 1/d^2) or D = diag(d^2, d^2, d^2, 1/d^2, 1/d^2, 1/d^2), $d = \sqrt{1.5}$, and $\alpha$ = .05
Columns: N/p, Pillai-Bartlett trace, Johansen's test, James' 1st order test, James' 2nd order test

For James' first order test, the 21 $\hat{\tau}$s outside the interval are larger than $\alpha$ = .05. The $\hat{\tau}$s for the first order test tend to be larger than those for Johansen's test. For James' second order test, the 16 $\hat{\tau}$s outside the interval are less than $\alpha$ = .05. James' second order test exhibited some tendency to perform better with p = 3 than with p = 6. However, in general p did not appear to have a strong effect for any of the tests. Neither did the pattern of contamination (I:I:D vs. I:D:D). As N/p increased, there was some tendency for the performance of James' and Johansen's tests to improve. The second order test appears to have $\hat{\tau}$ nearer $\alpha$ for $D \neq d^2 I$ than for $D = d^2 I$. The first order test and Johansen's test did not appear to be affected by the form of D.

The Performance of the Four Tests When d = 3

In Tables 9, 10, and 11, 36 $\hat{\tau}$s are reported for each test under the condition in which $n_1 = n_2 = n_3$ and d = 3. For the Pillai-Bartlett trace, 29 fall outside the criterion interval. For Johansen's test, 12 fall outside the interval. All of these are larger than .05. Eight of the $\hat{\tau}$s outside the interval occur in conditions in which $n_i$ = 10. For James' first order test, 22 fall outside the interval; all are larger than .05.
For James' second order test, 17 fall outside the interval; all are smaller than .05. In general, the effects of the factors on James' and Johansen's tests are similar to their effects for $d = \sqrt{1.5}$. However, the effect of p on the second order test appears to be stronger when d = 3 than when $d = \sqrt{1.5}$.

Table 9
Actual Type I Error Rates for Four Multivariate Tests: $n_1 = n_2 = n_3$, $D = d^2 I$, d = 3, and $\alpha$ = .05
Columns: N/p, Pillai-Bartlett trace, Johansen's test, James' 1st order test, James' 2nd order test

Table 10
Actual Type I Error Rates for Four Multivariate Tests: $n_1 = n_2 = n_3$, D = diag(d^2, d^2, 1) or D = diag(d^2, d^2, d^2, 1, 1, 1), d = 3, and $\alpha$ = .05
Columns: p, N/p, Pillai-Bartlett trace, Johansen's test, James' 1st order test, James' 2nd order test

Table 11
Actual Type I Error Rates for Four Multivariate Tests: $n_1 = n_2 = n_3$, D = diag(d^2, d^2, 1/d^2) or D = diag(d^2, d^2, d^2, 1/d^2, 1/d^2, 1/d^2), d = 3, and $\alpha$ = .05
Columns: N/p, Pillai-Bartlett trace, Johansen's test, James' 1st order test, James' 2nd order test

Summary

The Pillai-Bartlett test worked well when the degree of heterogeneity was small, $d = \sqrt{1.5}$, but had excessive Type I error rates when the degree of heterogeneity was large, d = 3.0. Johansen's test worked reasonably well, particularly with N/p $\geq$ 15. Consequently, it probably should be used in place of the usual multivariate criteria. James' first order test is not competitive; it had more $\hat{\tau}$s outside the interval than Johansen's test had and tended to have $\hat{\tau}$ larger than that for Johansen's test. James' second order test had more $\hat{\tau}$s outside the criterion interval than Johansen's test had. This seems to favor Johansen's test. However, some researchers may favor the second order test because this test tended to have $\hat{\tau}$ smaller than $\alpha$, whereas Johansen's test tended to have $\hat{\tau}$ larger than $\alpha$.
The Performance of the Four Multivariate Tests When Sample Sizes Are Unequal

Estimated Type I error rates are reported for conditions in which the sample sizes are unequal. The section is divided into three subsections defined by N/p (10, 15, or 20). In each subsection, 96 $\hat{\tau}$s are reported for each of the four tests: Pillai-Bartlett, Johansen, James' first order, and James' second order. For each subsection, there are eight tables, one for each combination of p (3 or 6), direction of relationship between the sample sizes and covariance matrices (positive or negative), and d ($\sqrt{1.5}$ or 3.0).

The Performance of the Four Tests When N/p = 10

Forty-eight $\hat{\tau}$s for each of the four tests for the conditions in which N/p = 10 and $d = \sqrt{1.5}$ are reported in Tables 12 through 15. The number of $\hat{\tau}$s outside the interval is 20 for the Pillai-Bartlett trace, 30 for the second order test, 37 for Johansen's test, and 48 for the first order test. Estimated $\hat{\tau}$s for conditions in which d = 3.0 are reported in Tables 16 through 19. The number of $\hat{\tau}$s outside the criterion interval is 44, 31, 37, and 48 for the Pillai-Bartlett trace, James' second order test, Johansen's test, and James' first order test respectively.

For the Pillai-Bartlett trace, when d changes from $\sqrt{1.5}$ to 3.0, the increase in the number of $\hat{\tau}$s outside the criterion interval is 120%. The decline in performance is most notable for the negative condition, in which only five $\hat{\tau}$s are outside the criterion interval for $d = \sqrt{1.5}$ but all 24 are outside the criterion interval for d = 3.0. For the other tests the number of $\hat{\tau}$s outside the criterion interval does not change dramatically when d increases from $\sqrt{1.5}$ to 3. Based on the number of $\hat{\tau}$s outside the criterion interval, it appears that there was only a slight decline in the performance of Johansen's test and the second order test as d increased. These frequencies are misleading for Johansen's test: there is a marked increase in $\hat{\tau}$ in the negative condition when d changes from $\sqrt{1.5}$ to 3. Specifically, the mean $\hat{\tau}$ increases from .0710 to .0863.

Table 12
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 10, p = 3, $d = \sqrt{1.5}$, and $\alpha$ = .05
Columns: n_1, n_2, n_3, Pillai-Bartlett trace, Johansen's test, James' 1st order test, James' 2nd order test
Entries: .0625 .0375 .0930 .0455 .0620 .0380 .0955 .0540

Table 13
Actual Type I Error Rates for Four Multivariate Tests: Negative [...]

Table 14
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 10, p = 6, $d = \sqrt{1.5}$, and $\alpha$ = .05
Columns: n_1, n_2, n_3, Pillai-Bartlett trace, Johansen's test, James' 1st order test, James' 2nd order test
Entries: .0695 .0280

Table 15
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 10, p = 6, $d = \sqrt{1.5}$, and $\alpha$ = .05
Columns: n_1, n_2, n_3, Pillai-Bartlett trace, Johansen's test, James' 1st order test, James' 2nd order test

Table 16
Positive [...]

Table 17
Negative [...]

Table 18
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 10, p = 6, d = 3, and $\alpha$ = .05
Columns: n_1, n_2, n_3, Pillai-Bartlett trace, Johansen's test, James' 1st order test, James' 2nd order test

Table 19
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 10, p = 6, d = 3, and $\alpha$ = .05
Columns: n_1, n_2, n_3, Pillai-Bartlett trace, Johansen's test, James' 1st order test, James' 2nd order test

When $d = \sqrt{1.5}$, the $\hat{\tau}$s outside the criterion interval for the Pillai-Bartlett trace are smaller than $\alpha$ = .05 under positive conditions (larger samples obtained from the populations having the contaminated variance-covariance matrices) and larger than $\alpha$ under negative conditions (smaller samples obtained from the populations having the contaminated variance-covariance matrices). This is similar to the performance of Hotelling's $T^2$ found by Hakstian, Roed, and Lind (1971).
The same pattern is observed for d = 3.0 except when D = diag(d^2, d^2, 1/d^2) or diag(d^2, d^2, d^2, 1/d^2, 1/d^2, 1/d^2). Then $\hat{\tau}$ is near or larger than $\alpha$ in both positive and negative conditions.

For Johansen's test and James' first order test, the $\hat{\tau}$s outside the criterion interval tend to be larger than $\alpha$. Estimated Type I error rates for Johansen's test are generally closer to $\alpha$ than the $\hat{\tau}$s for the first order test are. This is similar to the performance of Yao's test and James' first order test when p = 2 (Algina and Tang, 1988). For James' second order test, the $\hat{\tau}$s outside the criterion interval are smaller than $\alpha$.

Mean $\hat{\tau}$s for various levels in the design are reported in Tables 20 and 21 for $d = \sqrt{1.5}$ and d = 3.0 respectively. Two factors, the sample-size-covariance-matrix (SSCM) relationship and the sample size ratio, tended to have larger effects than the other factors had on the performance of the four tests.

Table 20
The Effects of Four Factors on the Means of $\hat{\tau}$s of the Four Tests: $d = \sqrt{1.5}$
Columns: Factor, Pillai-Bartlett trace, Johansen's test, James' 1st order test, James' 2nd order test

Table 21
The Effects of Four Factors on the Means of $\hat{\tau}$s of the Four Tests: d = 3
Columns: Factor, Pillai-Bartlett trace, Johansen's test, James' 1st order test, James' 2nd order test

The $\hat{\tau}$s for all four tests tend to be larger in the negative condition than in the positive condition. For the second order test this trend resulted in better performance in the negative condition. For Johansen's test and the first order test, it resulted in poorer performance in the negative condition. For $d = \sqrt{1.5}$ the Pillai-Bartlett trace performed better in the negative condition: of the 20 $\hat{\tau}$s outside the criterion interval, only five came from the negative condition. The same pattern was observed with d = 3.0 and p = 3. However, when p increased to 6 the test performed more poorly in the negative condition.
The $\hat{\tau}$s for Johansen's test and the first order test tend to increase as the sample size ratio becomes more extreme, resulting in poorer performance. For the Pillai-Bartlett trace the effect of the sample size ratio depended on the SSCM relationship. With a negative relationship the test tended to become more liberal as the ratio became more extreme. The effect was larger for d = 3.0 than it was for $d = \sqrt{1.5}$. With a positive relationship, the effect of the sample size ratio was quite complicated, depending on d, p, and the form of D. For James' second order test, there is a tendency for $\hat{\tau}$ to increase as the sample size ratio becomes more extreme. This increase tended to result in better performance for the test.

Type I error rates appear to be unaffected by variation in p for the Pillai-Bartlett trace, Johansen's test, and James' first order test. James' second order test was affected negatively by an increase in p. In fact, when $d = \sqrt{1.5}$, the poorer performance of the second order test in comparison to the Pillai-Bartlett trace was due to its performance when p = 6. For p = 3, the second order test has nine $\hat{\tau}$s outside the criterion interval. For p = 6, the number of $\hat{\tau}$s outside the criterion interval increases to 20.

The form of D had little effect on the $\hat{\tau}$s for the tests except for the Pillai-Bartlett trace when d = 3.0. Then the test performed better when D = diag(d^2, d^2, 1/d^2) or D = diag(d^2, d^2, d^2, 1/d^2, 1/d^2, 1/d^2). For the contaminated covariance matrices with these forms, only one of 16 $\hat{\tau}$s is outside the criterion interval.

The Performance of the Four Tests When N/p = 15

Estimated $\hat{\tau}$s for the Pillai-Bartlett trace, Johansen's test, James' first order test, and James' second order test for conditions in which N/p = 15 and $d = \sqrt{1.5}$ are reported in Tables 22, 23, 24, and 25. Forty-eight $\hat{\tau}$s are reported for each test. For the Pillai-Bartlett trace, 24 $\hat{\tau}$s are outside the criterion interval. For Johansen's test, 13 $\hat{\tau}$s are outside the criterion interval.
For James' first order test, 37 $\hat{\tau}$s are outside the criterion interval. For James' second order test, 15 $\hat{\tau}$s are outside the criterion interval. Given $d = \sqrt{1.5}$, as N/p increases from 10 to 15, the number of $\hat{\tau}$s outside the criterion interval decreases for all tests except the Pillai-Bartlett trace. There is a 65%, 23%, and 48% decrease for Johansen's test, James' first order test, and James' second order test respectively.

Table 22
[...]

Table 23
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 15, p = 3, $d = \sqrt{1.5}$, and $\alpha$ = .05
Columns: n_1, n_2, n_3, Pillai-Bartlett trace, Johansen's test, James' 1st order test, James' 2nd order test

Table 24
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 15, p = 6, $d = \sqrt{1.5}$, and $\alpha$ = .05
Columns: n_1, n_2, n_3, Pillai-Bartlett trace, Johansen's test, James' 1st order test, James' 2nd order test
Entries: .0345

Table 25
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 15, p = 6, $d = \sqrt{1.5}$, and $\alpha$ = .05
Columns: n_1, n_2, n_3, Pillai-Bartlett trace, Johansen's test, James' 1st order test, James' 2nd order test
Entries for D = diag(d^2, d^2, d^2, 1/d^2, 1/d^2, 1/d^2), listed as n_1, n_2, n_3 followed by the two rates that remain:
27 27 36 .0530 .0560
22 22 46 .0540 .0590
26 32 32 .0405 .0530
18 36 36 .0540 .0645

The results for d = 3.0 are reported in Tables 26 through 29. For the Pillai-Bartlett trace, all but five $\hat{\tau}$s are outside the criterion interval. For Johansen's test, 21 $\hat{\tau}$s are outside the criterion interval. For James' first order test, 38 $\hat{\tau}$s are outside the criterion interval. For James' second order test, 23 $\hat{\tau}$s are outside the criterion interval. As with $d = \sqrt{1.5}$, the number of $\hat{\tau}$s outside the criterion interval decreases for all tests except the Pillai-Bartlett trace as N/p increases from 10 to 15. There is a 47%, 21%, and 22% decrease for Johansen's test, James' first order test, and James' second order test respectively.
These percentage decreases as N/p increases are smaller for d = 3 than the corresponding decreases were for $d = \sqrt{1.5}$. Given N/p = 15, when d increases from $\sqrt{1.5}$ to 3, there is an 87%, 54%, and 60% increase in the number of $\hat{\tau}$s outside the criterion interval for the Pillai-Bartlett trace, Johansen's test, and James' second order test. For James' first order test, the increase is trivial.

Table 26
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 15, p = 3, d = 3, and $\alpha$ = .05
Columns: n_1, n_2, n_3, Pillai-Bartlett trace, Johansen's test, James' 1st order test, James' 2nd order test

Table 27
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 15, p = 3, d = 3, and $\alpha$ = .05
Columns: n_1, n_2, n_3, Pillai-Bartlett trace, Johansen's test, James' 1st order test, James' 2nd order test

Table 28
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 15, p = 6, d = 3, and $\alpha$ = .05
Columns: n_1, n_2, n_3, Pillai-Bartlett trace, Johansen's test, James' 1st order test, James' 2nd order test
Entries: .0330

Table 29
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 15, p = 6, d = 3, and $\alpha$ = .05
Columns: n_1, n_2, n_3, Pillai-Bartlett trace, Johansen's test, James' 1st order test, James' 2nd order test

For $d = \sqrt{1.5}$ the $\hat{\tau}$s for all four tests tend to be larger in the negative condition than in the positive condition. For the second order test this trend resulted in better performance in the negative condition. For Johansen's test and the first order test it resulted in better performance in the positive condition. In the positive condition, Johansen's test has three of 24 $\hat{\tau}$s outside the criterion interval. In the negative condition, James' second order test has three of 24 $\hat{\tau}$s outside the criterion interval. For the Pillai-Bartlett trace, performance was poor under both positive and negative conditions.
In the positive condition, the test tended to yield $\hat{\tau}$s much smaller than $\alpha$; in the negative condition, the test tended to yield $\hat{\tau}$s much larger than $\alpha$. However, when D = diag(d^2, d^2, 1/d^2) or D = diag(d^2, d^2, d^2, 1/d^2, 1/d^2, 1/d^2), the test tended to be liberal in both the positive and negative conditions. With one exception, the effect of the SSCM relationship on the tests was similar when d = 3.0 to its effect when $d = \sqrt{1.5}$: this factor did not appear to affect $\hat{\tau}$ for the second order test when d = 3.0.

Unlike the conditions in which N/p = 10, changing the sample size ratio did not have a clear impact on the performance of the tests. For d = 3.0, when the sample size ratio becomes more extreme, the $\hat{\tau}$s for Johansen's test and James' first order test increase, which leads to poorer performance of the tests. However, the impact of this factor was smaller than that of the SSCM relationship. The sample size ratio did not have an effect on James' second order test. For the Pillai-Bartlett trace, as the sample size ratio became more extreme, the test became more liberal in the negative condition and more conservative in the positive condition, except when D had the form diag(d^2, d^2, 1/d^2) or diag(d^2, d^2, d^2, 1/d^2, 1/d^2, 1/d^2). With this form for D, the Pillai-Bartlett trace tended to become more liberal, in both the positive and the negative condition, as the sample size ratio became more extreme.

For $d = \sqrt{1.5}$ both Johansen's test and the second order test performed better when p = 3 than when p = 6. When p = 3, both of them have four $\hat{\tau}$s outside the criterion interval. When p increases to 6, there are nine $\hat{\tau}$s outside the criterion interval for Johansen's test and 11 for the second order test.
For the Pillai-Bartlett trace and the first order test, variation in p did not have an impact on the performance of the tests when $d = \sqrt{1.5}$. For James' second order test, when d increased to 3.0, the effect of the number of variables (p) became even stronger than when $d = \sqrt{1.5}$. When p = 3, the $\hat{\tau}$s for the second order test are close to $\alpha$ in both the positive and the negative condition. Only two $\hat{\tau}$s are outside the criterion interval. However, when p increases to 6, the $\hat{\tau}$s are smaller than $\alpha$ in both conditions; only three of 24 $\hat{\tau}$s are in the criterion interval. The number of variables (p) did not have a strong impact on the other tests when d = 3.0.

For the Pillai-Bartlett trace, the effect of the form of D when $d = \sqrt{1.5}$ was similar to the effect observed when N/p = 10 and $d = \sqrt{1.5}$. The test performed better with D = diag(d^2, d^2, 1/d^2) or D = diag(d^2, d^2, d^2, 1/d^2, 1/d^2, 1/d^2) than with D in the other forms. With contaminated covariance matrices of these forms, only one $\hat{\tau}$ of the 16 estimated for the Pillai-Bartlett trace is outside the criterion interval. When d increases to 3.0, the nature of the effect is that when D = diag(d^2, d^2, 1/d^2) or D = diag(d^2, d^2, d^2, 1/d^2, 1/d^2, 1/d^2), the Pillai-Bartlett trace tended to perform better in the positive condition than it did in the negative condition. The form of D did not affect the other tests.

In general, when N/p = 15, Johansen's test and James' second order test performed better than either the Pillai-Bartlett trace or James' first order test. When judged in terms of the number of $\hat{\tau}$s outside the criterion interval, Johansen's test performed well in the positive conditions, and, except for conditions in which p = 6 and d = 3, the second order test performed well in the negative conditions. In the exceptional condition, both the second order test and Johansen's test had the vast majority of $\hat{\tau}$s outside the criterion interval. However, the average $\hat{\tau}$ for Johansen's test was .062 and for the second order test was .038.
The $\hat{\tau}$s for Johansen's test were not too extreme, even in the negative condition, the condition in which Johansen's test performed more poorly. Of 48 $\hat{\tau}$s, five were larger than .07 and only one was larger than .08. Similarly, the $\hat{\tau}$s for the second order test were not too extreme in the positive condition. None of these $\hat{\tau}$s was smaller than .03. Since $\hat{\tau}$ for each test is reasonably near $\alpha$, in principle either test might be used. However, in practice, programming the second order test is substantially more complicated than programming Johansen's test.

The Performance of the Four Tests When N/p = 20

Estimated $\hat{\tau}$s for the Pillai-Bartlett trace, Johansen's test, James' first order test, and James' second order test in conditions in which N/p = 20 and $d = \sqrt{1.5}$ are reported in Tables 30 through 33. Forty-eight $\hat{\tau}$s are reported for each test. For the Pillai-Bartlett trace, 19 $\hat{\tau}$s are outside the criterion interval. For Johansen's test, three $\hat{\tau}$s are outside the criterion interval, all of them in the p = 6 condition. For James' first order test, 18 $\hat{\tau}$s are outside the criterion interval. For James' second order test, 13 $\hat{\tau}$s are outside the criterion interval. Given $d = \sqrt{1.5}$, all tests had a notable decrease in the number of $\hat{\tau}$s outside the criterion interval as N/p increased from 15 to 20. There were 21%, 77%, 51%, and 13% decreases for the Pillai-Bartlett trace, Johansen's test, James' first order test, and James' second order test respectively.

Estimated $\hat{\tau}$s for the Pillai-Bartlett trace, Johansen's test, James' first order test, and James' second order test in conditions in which N/p = 20 and d = 3 are reported in Tables 34, 35, 36, and 37. Forty-eight $\hat{\tau}$s are reported for each test. For the Pillai-Bartlett trace, 42 $\hat{\tau}$s fall outside the criterion interval. For Johansen's test, seven $\hat{\tau}$s fall outside the criterion interval. For James' first order test, 26 $\hat{\tau}$s fall outside the criterion interval. For James' second order test, nine $\hat{\tau}$s fall outside the criterion interval.
Except for the Pillai-Bartlett trace, there is a decrease in the number of τ̂s outside the criterion interval as N/p increases from 15 to 20. There is a 65%, 31%, and 62% decrease for Johansen's test, James' first order test, and James' second order test, respectively. Given N/p = 20, as d increases from √1.5 to 3, the number of τ̂s outside the criterion interval increases 121%, 133%, and 44% for the Pillai-Bartlett trace, Johansen's test, and James' first order test, respectively. For James' second order test, there is a 31% decrease.

The SSCM relationship affected the performance of all tests except James' second order test. The latter test performed almost equally well in both positive and negative conditions. There are 13 τ̂s outside the criterion interval for the positive condition and nine for the negative condition. For the other tests, the number of τ̂s outside the criterion interval is larger in the negative condition than in the positive condition. This resulted in a better performance for the first order test in the positive condition. Johansen's test performed better in the positive condition except when d = √1.5. When d = √1.5 it performed well in both conditions.

Table 30
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 20, p = 3, d = √1.5, and α = .05

Table 31
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 20, p = 3, d = √1.5, and α = .05

Table 32
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 20, p = 6, d = √1.5, and α = .05
n1  n2  n3  Pillai-Bartlett trace  Johansen's test  James' 1st order test  James' 2nd order test
D = diag(d², d², d², 1/d², 1/d², 1/d²)
36  36  48  .0535  .0475
30  30  60  .0520  .0585
32  44  44  .0490  .0490
24  48  48  .0565  .0585
.0565  .0655  .0550  .0645

Table 33
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 20, p = 6, d = √1.5, and α = .05
.0510

Table 34
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 20, p = 3, d = 3, and α = .05
.0535

Table 35
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 20, p = 3, d = 3, and α = .05

Table 36
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 20, p = 6, d = 3, and α = .05

Table 37
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 20, p = 6, d = 3, and α = .05
.1230

For the Pillai-Bartlett trace, τ̂s in the positive condition tend to be much smaller than α; τ̂s in the negative condition tend to be much larger than α, especially when the sample size ratio is more extreme. However, this trend does not hold for the Pillai-Bartlett trace when D is in the form of diag(d², d², 1/d²) or diag(d², d², d², 1/d², 1/d², 1/d²). When D is in those forms, τ̂s are in the criterion interval for d = √1.5 and τ̂s tend to be larger than α for d = 3.0, in both the positive and negative conditions.
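The conservative/liberal pattern described for the Pillai-Bartlett trace can be illustrated with a small Monte Carlo sketch. This is not the program used in the study. It assumes NumPy and SciPy, a standard F approximation to the Pillai-Bartlett trace, and a deliberately simplified contamination scheme (identity covariance for uncontaminated groups, D = d²I with d = 3 for contaminated groups, with the "negative" condition formed by giving the two smaller samples the larger covariance matrix). All function names are hypothetical.

```python
import numpy as np
from scipy import stats


def pillai_p_value(groups):
    """P-value for the Pillai-Bartlett trace using its usual F approximation."""
    k = len(groups)
    p = groups[0].shape[1]
    n_total = sum(len(g) for g in groups)
    grand_mean = np.vstack(groups).mean(axis=0)

    hypothesis = np.zeros((p, p))  # between-group SSCP matrix H
    error = np.zeros((p, p))       # within-group SSCP matrix E
    for g in groups:
        diff = g.mean(axis=0) - grand_mean
        hypothesis += len(g) * np.outer(diff, diff)
        centered = g - g.mean(axis=0)
        error += centered.T @ centered

    v = np.trace(hypothesis @ np.linalg.inv(hypothesis + error))
    s = min(p, k - 1)
    m = (abs(p - (k - 1)) - 1) / 2
    n = (n_total - k - p - 1) / 2
    f_stat = (2 * n + s + 1) / (2 * m + s + 1) * v / (s - v)
    return stats.f.sf(f_stat, s * (2 * m + s + 1), s * (2 * n + s + 1))


def estimate_type1(sizes, covariances, alpha=0.05, reps=2000, seed=0):
    """Proportion of rejections when all population mean vectors are equal."""
    rng = np.random.default_rng(seed)
    p = covariances[0].shape[0]
    factors = [np.linalg.cholesky(c) for c in covariances]
    rejections = 0
    for _ in range(reps):
        # Multivariate normal samples with zero means and the given covariances.
        groups = [rng.standard_normal((n, p)) @ f.T
                  for n, f in zip(sizes, factors)]
        rejections += int(pillai_p_value(groups) < alpha)
    return rejections / reps


if __name__ == "__main__":
    identity = np.eye(3)
    contaminated = 3.0 ** 2 * identity  # D = d^2 I with d = 3 (assumed form)
    sizes = (15, 15, 30)                # N/p = 20 with p = 3
    positive = estimate_type1(sizes, [identity, identity, contaminated])
    negative = estimate_type1(sizes, [contaminated, contaminated, identity])
    print(f"positive condition: {positive:.4f}")
    print(f"negative condition: {negative:.4f}")
```

Under these assumptions the estimate for the positive pairing should fall below α and the estimate for the negative pairing above it, which is the direction reported in the text; the exact values depend on the seed and on the simplified design.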
More extreme sample size ratios resulted in poorer performance of the Pillai-Bartlett trace and James' first order test. When the sample size ratio is extreme, the Pillai-Bartlett trace has smaller τ̂s in the positive condition and larger τ̂s in the negative condition. For James' first order test, the extreme ratio leads to larger τ̂s. This factor did not influence Johansen's test or the second order test.

The number of variables affected only the second order test. The τ̂s tended to get smaller as p increased. Given N/p = 20, when p = 3 only four τ̂s are outside the criterion interval. However, when p = 6, 18 τ̂s are outside the criterion interval.

In general, both the Pillai-Bartlett trace and James' first order test have too many τ̂s outside the criterion interval in comparison with Johansen's test and the second order test. They are not recommended for use when N/p = 20. Johansen's test performed well in all conditions, but performed better in the positive conditions. Its performance in the negative conditions was better for d = √1.5 than for d = 3.0. In terms of the number of τ̂s outside the criterion interval, the second order test performed more poorly than Johansen's test. However, its performance was not bad and it may be preferred by researchers who do not want to risk τ > α. A practical problem for such researchers will be the difficulty in programming the second order test.

Summary

Six factors (the ratio of N to p, the magnitude of d in D, the sample-size-covariance relationship, the value of p, the sample size ratio, and the form of D) affected the performance of the four tests studied. However, the magnitude and the direction of the effect were not the same across tests.

In general, as N/p increased, all of the tests except the Pillai-Bartlett trace became more robust. The effect of the other factors on Johansen's test and James' tests seems to decrease as N/p increased: the number of τ̂s within the criterion interval increased as N/p increased.
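The criterion interval referred to throughout is stated in the table notes as α ± 2√(α(1 − α)/2000). A minimal sketch of that calculation follows, assuming 2000 replications per condition as the denominator suggests; the function name is hypothetical.

```python
import math


def criterion_interval(alpha, replications=2000):
    """Interval alpha +/- 2 * sqrt(alpha * (1 - alpha) / replications)."""
    half_width = 2 * math.sqrt(alpha * (1 - alpha) / replications)
    return alpha - half_width, alpha + half_width


for alpha in (0.01, 0.05, 0.10):
    lower, upper = criterion_interval(alpha)
    print(f"alpha = {alpha:.2f}: [{lower:.4f}, {upper:.4f}]")
# alpha = 0.01: [0.0056, 0.0144]
# alpha = 0.05: [0.0403, 0.0597]
# alpha = 0.10: [0.0866, 0.1134]
```

An estimated rate is counted as outside the criterion interval when it falls below the lower bound or above the upper bound for the nominal α in question.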
Even though James' first order test performed better as N/p increased to 20, it had more τ̂s outside the criterion interval than the second order test and Johansen's test.

As d increased, with the exception of the second order test when N/p = 20, the performance of all the tests declined. For N/p = 20, James' second order test performed as well or better as d increased.

The τ̂s tend to be larger in the negative condition than in the positive condition. This effect led to a better performance in the negative condition for the second order test and a poorer performance in that condition for Johansen's test and the first order test. For the Pillai-Bartlett trace, τ̂s are much smaller than α in the positive condition and much larger in the negative condition.

When the sample size ratio becomes more extreme, the performance of the tests, except the second order test, declines. The extreme sample size ratio led to a moderately better performance of the second order test when N/p was less than 20.

The change in p had a strong impact on the performance of the second order test. The test was more robust when p = 3 than when p = 6. This factor did not affect the other tests.

The form of D affected only the Pillai-Bartlett trace. When D is in the form of diag(d² or 1/d²) the performance of the test improved.

Tables 38, 39, and 40 were constructed using the rule of thumb that a test is robust for a particular condition if three or fewer τ̂s are outside the criterion interval. These serve to summarize the major effects of the factors on the four tests.
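The rule of thumb can be applied mechanically to any cell of the summary tables. The sketch below uses the counts that survive for Table 41 (N/p = 10, α = .01, each count out of 12); the assignment of rows to d and p follows the layout of the other summary tables and is an assumption, as are all names in the code.

```python
ROBUST_MAX_OUTSIDE = 3  # rule of thumb: three or fewer estimates outside

TESTS = (
    "Pillai-Bartlett trace",
    "Johansen's test",
    "James' first order test",
    "James' second order test",
)

# (d, p, condition) -> counts for the four tests, as read from Table 41.
# Row labels are inferred from the table layout (assumption).
TABLE_41 = {
    ("sqrt(1.5)", 3, "P"): (5, 4, 12, 2),
    ("sqrt(1.5)", 3, "N"): (5, 4, 12, 0),
    ("sqrt(1.5)", 6, "P"): (6, 7, 12, 5),
    ("sqrt(1.5)", 6, "N"): (10, 11, 12, 4),
    ("3", 3, "P"): (5, 6, 12, 4),
    ("3", 3, "N"): (10, 7, 12, 9),
    ("3", 6, "P"): (4, 8, 12, 8),
    ("3", 6, "N"): (11, 11, 12, 5),
}


def robust_tests(counts, max_outside=ROBUST_MAX_OUTSIDE):
    """Names of the tests whose count does not exceed the threshold."""
    return [name for name, count in zip(TESTS, counts) if count <= max_outside]


for cell, counts in TABLE_41.items():
    print(cell, robust_tests(counts))
```

With these counts only James' second order test, for d = √1.5 and p = 3, meets the rule, which agrees with the asterisks that survive in the extracted table.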
Similar results for α = .01 and α = .10 are summarized in Tables 41 through 46.

Table 38
Number of τ̂s Outside the Criterion Interval of the Four Multivariate Tests for N/p = 10 and α = .05

Table 39
Number of τ̂s Outside the Criterion Interval of the Four Multivariate Tests for N/p = 15 and α = .05

Table 40
Number of τ̂s Outside the Criterion Interval of the Four Multivariate Tests for N/p = 20 and α = .05

Table 41
Number of τ̂s Outside the Criterion Interval of the Four Multivariate Tests for N/p = 10 and α = .01
d      p  Positive/Negative  Pillai-Bartlett trace  Johansen's test  James' 1st order test  James' 2nd order test
√1.5   3  P                   5                      4               12                      2*
          N                   5                      4               12                      0*
       6  P                   6                      7               12                      5
          N                  10                     11               12                      4
3      3  P                   5                      6               12                      4
          N                  10                      7               12                      9
       6  P                   4                      8               12                      8
          N                  11                     11               12                      5
Note. An asterisk indicates the test is relatively robust. The maximum number of τ̂s in each cell is 12.

Table 42
Number of τ̂s Outside the Criterion Interval of the Four Multivariate Tests for N/p = 15 and α = .01
d  p  Positive/Negative  Pillai-Bartlett trace  Johansen's test  James' 1st order test  James' 2nd order test
4  5  3*  10  5  8  3*  12  1*  1*  1*  7  4  4  2*  5  6  12  8  9  9  10  8  11
Note. An asterisk indicates the test is relatively robust. The maximum number of τ̂s in each cell is 12.
Table 43
Number of τ̂s Outside the Criterion Interval of the Four Multivariate Tests for N/p = 20 and α = .01

Table 44
Number of τ̂s Outside the Criterion Interval of the Four Multivariate Tests for N/p = 10 and α = .10

Table 45
Number of τ̂s Outside the Criterion Interval of the Four Multivariate Tests for N/p = 15 and α = .10

Table 46
Number of τ̂s Outside the Criterion Interval of the Four Multivariate Tests for N/p = 20 and α = .10
d  p  Positive/Negative  Pillai-Bartlett trace  Johansen's test  James' 1st order test  James' 2nd order test
6  7  7  5  11  12  10  10  2*  0*  3*  3*  0*  4  2*  2*  3*  5  4  7  3*  7  6  10  3*  0*  4  6  0*  2*  4  5
Note. An asterisk indicates the test is relatively robust. The maximum number of τ̂s in each cell is 12.

CHAPTER 5
CONCLUSIONS

The results of the study provide guidelines for conducting multivariate tests when variance-covariance matrices in the k populations from which the samples are drawn are heterogeneous. The results obtained can be applied to experiments which have experimental conditions similar to the 360 simulated experiments conducted in this study.

The general conclusion is that no test among the four performs uniformly well; that is, the performance of the tests depends on the experimental conditions. James' first order test performed more poorly than the other tests in terms of controlling Type I error rate under all 360 experiments in this study. Therefore it is not recommended for use. The Pillai-Bartlett trace performed poorly except when n1 = n2 = n3 and d = √1.5.
When the test is not robust, it tends to be conservative when the smaller samples are from populations with smaller variance-covariance matrices and liberal when the smaller samples are from populations with larger variance-covariance matrices. Johansen's test and James' second order test perform better than the other two tests. When Johansen's test is not robust, it tends to be liberal. When James' second order test is not robust, it tends to be conservative.

Six factors affected the performance of the tests, namely, the ratio of total sample size to the number of variables (N/p), the magnitude of dispersion of the covariance matrices (d), the sample-size-covariance-matrix relationship (negative or positive), the sample size ratio (n1:n2:n3), the number of variables (p), and the form of the covariance matrices (D). Among these six factors, the ratio of total sample size to number of variables (N/p) had the dominant effect on the performance of the tests. As N/p increased from 10 to 20, the performance of the tests, except the Pillai-Bartlett trace, improved.

When d increased from √1.5 to 3.0, the performance of the tests typically declined. The degree of decline increased as N/p increased. There is an exception for James' second order test. When N/p = 20, the test performs better when d is large.

The sample-size-covariance-matrix relationship also had a relatively strong impact on the performance of the tests. The performance of the Pillai-Bartlett trace follows the pattern of Hotelling's T²: the test is conservative under the positive condition and liberal under the negative condition. Johansen's test and James' first order test follow the same pattern as Yao's and James' tests when p = 2. They perform better in the positive condition than in the negative condition. James' second order test performs better in the negative condition than in the positive condition. When N/p increased to 20, the effect of this factor on the tests decreased.
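Because Johansen's test is described as comparatively easy to program, a short sketch may help. It is one possible implementation and not the program used in the study. It assumes the statistic in the form commonly stated in later literature: W_i = (S_i/n_i)⁻¹, W = ΣW_i, x̄ = W⁻¹ΣW_i x̄_i, T = Σ(x̄_i − x̄)′W_i(x̄_i − x̄), A = Σ[tr{(I − W⁻¹W_i)²} + {tr(I − W⁻¹W_i)}²]/[2(n_i − 1)], c = p(k − 1) + 2A − 6A/[p(k − 1) + 2], with T/c referred to an F distribution on p(k − 1) and p(k − 1)[p(k − 1) + 2]/(3A) degrees of freedom. These formulas should be checked against Johansen (1980) before any serious use. NumPy and SciPy are assumed and the function name is hypothetical.

```python
import numpy as np
from scipy import stats


def johansen_test(groups):
    """Johansen-type test of equal mean vectors without assuming equal covariances.

    `groups` is a list of (n_i, p) arrays. Returns (F, df1, df2, p_value).
    The formulas follow the form assumed in the accompanying text.
    """
    k = len(groups)
    p = groups[0].shape[1]
    means = [g.mean(axis=0) for g in groups]

    # W_i = (S_i / n_i)^(-1); reshape keeps the p = 1 case as a 1 x 1 matrix.
    weights = [
        np.linalg.inv(np.cov(g, rowvar=False).reshape(p, p) / len(g))
        for g in groups
    ]
    w_total = sum(weights)
    w_total_inv = np.linalg.inv(w_total)

    # Weighted grand mean vector and the weighted between-group statistic T.
    center = w_total_inv @ sum(w @ m for w, m in zip(weights, means))
    t_stat = sum((m - center) @ w @ (m - center) for w, m in zip(weights, means))

    identity = np.eye(p)
    a = 0.0
    for g, w in zip(groups, weights):
        m_i = identity - w_total_inv @ w
        a += (np.trace(m_i @ m_i) + np.trace(m_i) ** 2) / (2 * (len(g) - 1))

    df1 = p * (k - 1)
    c = df1 + 2 * a - 6 * a / (df1 + 2)
    df2 = df1 * (df1 + 2) / (3 * a)
    f_stat = t_stat / c
    return f_stat, df1, df2, stats.f.sf(f_stat, df1, df2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = [
        rng.standard_normal((15, 3)),
        rng.standard_normal((15, 3)),
        3.0 * rng.standard_normal((30, 3)),  # larger sample, larger covariance
    ]
    print(johansen_test(samples))
```

For a single variable (p = 1) the form assumed here reduces to Welch's (1951) test for several means, which gives a convenient numerical check on an implementation.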
When the sample size ratio was more extreme, the performance of the tests except James' second order test declined. Extreme sample size ratios led to a moderately better performance of the second order test when N/p was less than 20.

The change in p has a strong impact on the performance of the second order test. The test was more robust when p = 3 than when p = 6. This factor did not affect the other tests.

The form of D affected only the Pillai-Bartlett trace. When D is in the form of diag(d² or 1/d²) the performance of the test improved.

The generalizability of the results is limited by the range of variation in the factors and the use of multivariate normal data. Keeping these limitations in mind, the following conclusions and recommendations can be set forth.

Experiments with Equal Sample Sizes

Conclusion 1. When the heterogeneity among the three population covariance matrices is small, such as d ≤ √1.5, the Pillai-Bartlett trace, the traditional multivariate test, can be used safely. Johansen's test also can be used; its performance is as good as the Pillai-Bartlett trace when N/p ≥ 15 and almost as good when N/p = 10.

Conclusion 2. When d ≥ 3, Johansen's test should be used to obtain an accurate Type I error rate.

Recommendation 1. When sample sizes are equal, Johansen's test should replace the multivariate criteria based on the homogeneity assumption. Researchers should attempt to obtain an N/p ratio of at least 15.

Experiments with Unequal Sample Sizes

Conclusion 3. When N/p = 10, d ≤ √1.5, and the sample-size-covariance-matrix (SSCM) relationship is negative, the Pillai-Bartlett trace can be used for both p = 3 and p = 6.

Conclusion 4. With N/p = 10, p = 3, and a negative SSCM relationship, James' second order test performs well.

Recommendation 2. When N/p = 10 the best choice is James' second order test. It is typically conservative, but not extremely so. The other tests can be extremely liberal.
It is, however, best to avoid N/p ratios this small because programming the second order test is extremely complicated.

Conclusion 5. When N/p = 15 and the SSCM relationship is positive, Johansen's test works well.

Conclusion 6. When N/p = 15, d ≤ √1.5, and the SSCM relationship is negative, James' second order test works well. It also performs well when d ≤ 3 and p ≤ 3. When d ≥ 3 and p ≥ 6, no test performs well.

Conclusion 7. When N/p = 20, Johansen's test works well. The second order test works well provided p ≤ 3.

Conclusion 8. When N/p = 20, the SSCM relationship is negative, and p ≤ 6, the second order test works well.

Recommendation 3. When N/p ≥ 15 both Johansen's test and James' second order test can be used because they each have τ reasonably near α. The choice can be made based on the following considerations: James' second order test can be somewhat conservative; Johansen's test can be somewhat liberal. Johansen's test is substantially easier to program.

APPENDIX A
RESULTS OF THE SIMULATED EXPERIMENTS WHEN n1 = n2 = n3, α = .01, AND α = .10

Table 47
Actual Type I Error Rates for Four Multivariate Tests: n1 = n2 = n3, D = d²I, d = √1.5, and α = .01

Table 48
Actual Type I Error Rates for Four Multivariate Tests: n1 = n2 = n3, D = diag(d², d², 1) or D = diag(d², d², d², 1, 1, 1), d = √1.5, and α = .01

Table 49
Actual Type I Error Rates for Four Multivariate Tests: n1 = n2 = n3, D = diag(d², d², 1/d²) or D = diag(d², d², d², 1/d², 1/d², 1/d²), d = √1.5, and α = .01

Table 50

Table 51
Actual Type I Error Rates for Four Multivariate Tests: n1 = n2 = n3, D = diag(d², d², 1) or D = diag(d², d², d², 1, 1, 1), d = 3, and α = .01

Table 52
Actual Type I Error Rates for Four Multivariate Tests: n1 = n2 = n3, D = diag(d², d², 1/d²) or D = diag(d², d², d², 1/d², 1/d², 1/d²), d = 3, and α = .01

Table 53
Actual Type I Error Rates for Four Multivariate Tests: n1 = n2 = n3, D = d²I, d = √1.5, and α = .10

Table 54
Actual Type

Table 55
Actual Type I Error Rates for Four Multivariate Tests: n1 = n2 = n3, D = diag(d², d², 1/d²) or D = diag(d², d², d², 1/d², 1/d², 1/d²), d = √1.5, and α = .10

Table 56
Actual Type I Error Rates for Four Multivariate Tests: n1 = n2 = n3, D = d²I, d = 3, and α = .10

Table 57
Actual Type I Error Rates for Four Multivariate Tests: n1 = n2 = n3, D = diag(d², d², 1) or D = diag(d², d², d², 1, 1, 1), d = 3, and α = .10

Table 58
Actual Type I Error Rates for Four Multivariate Tests: n1 = n2 = n3, D = diag(d², d², 1/d²) or D = diag(d², d², d², 1/d², 1/d², 1/d²), d = 3, and α = .10

APPENDIX B
RESULTS OF THE SIMULATED EXPERIMENTS WHEN SAMPLE SIZES ARE UNEQUAL AND α = .01

Table 59
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 10, p = 3, d = √1.5, and α = .01

Table 60
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 10, p = 3, d = √1.5, and α = .01
n1  n2  n3  Pillai-Bartlett trace  Johansen's test  James' 1st order test  James' 2nd order test
D = d²I
 9   9  12
 7   7  16
 8  11  11
 6  12  12
D = diag(d², d², 1)
 9   9  12
 7   7  16
 8  11  11
 6  12  12
D = diag(d², d², 1/d²)
 9   9  12  .0130
 7   7  16  .0110
 8  11  11  .0250
 6  12  12  .0330
.0095

Table 61
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 10, p = 6, d = √1.5, and α = .01
.0040

Table 62
Negative

Table 63
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 10, p = 3, d = 3, and α = .01
.0070

Table 64
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 10, p = 3, d = 3, and α = .01

Table 65
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 10, p = 6, d = 3, and α = .01
n1  n2  n3  Pillai-Bartlett trace  Johansen's test  James' 1st order test  James' 2nd order test
D = d²I
18  18  24  .0040
15  15  30  .0015
16  22  22  .0050
12  24  24  .0015
D = diag(d², d², d², 1, 1, 1)
.0115

Table 66

Table 67
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 15, p = 3, d = √1.5, and α = .01

Table 68
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 15, p = 3, d = √1.5, and α = .01

Table 69
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 15, p = 6, d = √1.5, and α = .01
n1  n2  n3  Pillai-Bartlett trace  Johansen's test  James' 1st order test  James' 2nd order test
D = d²I
27  27  36
22  22  46
26  32  32
18  36  36
D = diag(d², d², d², 1, 1, 1)
13  13  19  .0115
11  11  23  .0135
13  16  16  .0090
 9  18  18  .0145
D = diag(d², d², d², 1/d², 1/d², 1/d²)
13  13  19  .0085  .0095
11  11  23  .0105  .0130
13  16  16  .0075  .0095
 9  18  18  .0095  .0145
.0120  .0110  .0115  .0140
James' 1st order test: .0160  .0170  .0145  .0210
James' 2nd order test: .0065

Table 70

Table 71
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 15, p = 3, d = 3, and α = .01

Table 72
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 15, p = 3, d = 3, and α = .01
n1  n2  n3  Pillai-Bartlett trace  Johansen's test  James' 1st order test  James' 2nd order test
D = d²I
13  13  19  .0330  .0100  .0190  .0055
11  11  23  .0805  .0160  .0250  .0125
13  16  16  .0535  .0210  .0265  .0140
 9  18  18  .0140  .0230  .0340  .0140
D = diag(d², d², 1)
27  27  36
22  22  46
26  32  32
18  36  36
D = diag(d², d², 1/d²)
27  27  36  .0105
22  22  46  .0155
26  32  32  .0315
18  36  36  .0720
.0050

Table 73
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 15, p = 6, d = 3, and α = .01
n1  n2  n3  Pillai-Bartlett trace  Johansen's test  James' 1st order test  James' 2nd order test
D = d²I
27  27  36
22  22  46
26  32  32
18  36  36
D = diag(d², d², d², 1, 1, 1)
27  27  36  .0110
22  22  46  .0185
26  32  32  .0100
18  36  36  .0115
D = diag(d², d², d², 1/d², 1/d², 1/d²)
27  27  36  .0135  .0115
22  22  46  .0090  .0130
26  32  32  .0120  .0120
18  36  36  .0110  .0135
.0060

Table 74
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 15, p = 6, d = 3, and α = .01
.0385

Table 75
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 20, p = 3, d = √1.5, and α = .01
n1  n2  n3  Pillai-Bartlett trace  Johansen's test  James' 1st order test  James' 2nd order test
D = d²I
18  18  24
15  15  30
16  22  22
12  24  24
D = diag(d², d², 1)
18  18  24
15  15  30
16  22  22
12  24  24
D = diag(d², d², 1/d²)
18  18  24  .0100
15  15  30  .0085
16  22  22  .0090
12  24  24  .0085
.0060

Table 76
Actual Type I Error Rates for Four Multivariate Tests:

Table 77
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 20, p = 6, d = √1.5, and α = .01

Table 78
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 20, p = 6, d = √1.5, and α = .01

Table 79
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 20, p = 3, d = 3, and α = .01

Table 80
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 20, p = 3, d = 3, and α = .01

Table 81
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 20, p = 6, d = 3, and α = .01
.0060

Table 82
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 20, p = 6, d = 3, and α = .01
n1  n2  n3  Pillai-Bartlett trace  Johansen's test  James' 1st order test  James' 2nd order test
D = diag(d², d², d², 1/d², 1/d², 1/d²)
36  36  48  .0105  .0100
30  30  60  .0185  .0080
32  44  44  .0410  .0125  .0125
24  48  48  .0765

APPENDIX C
RESULTS OF THE SIMULATED EXPERIMENTS WHEN SAMPLE SIZES ARE UNEQUAL AND α = .10

Table 83
Actual Type I Error Rates for Four Multivariate Tests:

Table 84
Negative

Table 85
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 10, p = 6, d = √1.5, and α = .10
n1  n2  n3  Pillai-Bartlett trace  Johansen's test  James' 1st order test  James' 2nd order test
D = d²I
18  18  24  .0815  .1220  .1375  .0680
15  15  30  .0605  .1345  .1500  .0665
16  22  22  .0830  .1340  .1195  .0680
12  24  24  .0645  .1295  .1470  .0675
D = diag(d², d², d², 1, 1, 1)
18  18  24  .0905
15  15  30  .0785
16  22  22  .0885
12  24  24  .0810
D = diag(d², d², d², 1/d², 1/d², 1/d²)
18  18  24
15  15  30
16  22  22
12  24  24
.1155

Table 86
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 10, p = 6, d = √1.5, and α = .10
n1  n2  n3  Pillai-Bartlett trace  Johansen's test  James' 1st order test  James' 2nd order test
D = d²I
18  18  24  .1120
15  15  30  .1395
16  22  22  .1190
12  24  24  .1500
D = diag(d², d², d², 1, 1, 1)
18  18  24  .1080
15  15  30  .1140
16  22  22  .1140
12  24  24  .1280
D = diag(d², d², d², 1/d², 1/d², 1/d²)
18  18  24  .0970  .1245
15  15  30  .0880
16  22  22  .0945
12  24  24  .1015
.1250  .1225  .1245  .1670  .1260  .1395  .1260  .1465  .1340  .1235  .1400
.1410  .1485  .1415  .1905  .1450  .1610  .1390  .1635  .1425  .1505  .1380  .1570
.0625  .0625  .0765  .0880  .0680  .0690  .0760  .0840  .0775  .0675  .0770  .0760
Note. When n1 = n2 < n3 the form of contamination is I:I:D; when n1 < n2 = n3 the form of contamination is I:D:D. The underlined τ̂s are outside the interval α ± 2√(α(1 − α)/2000).

Table 87
Rates for Four Multivariate Tests: Positive

Table 88
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 10, p = 3, d = 3, and α = .10

Table 89
Positive

Table 90
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 10, p = 6, d = 3, and α = .10

Table 91
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 15, p = 3, d = √1.5, and α = .10

Table 92
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 15, p = 3, d = √1.5, and α = .10
.1165

Table 93
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 15, p = 6, d = √1.5, and α = .10
.0955

Table 94
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 15, p = 6, d = √1.5, and α = .10
n1  n2  n3  Pillai-Bartlett trace  Johansen's test  James' 1st order test  James' 2nd order test
D = d²I
27  27  36
22  22  46
26  32  32
18  36  36
D = diag(d², d², d², 1, 1, 1)
27  27  36  .1095  .1110
22  22  46  .1380  .1170
26  32  32  .1090  .1125
18  36  36  .1175  .1265
D = diag(d², d², d², 1/d², 1/d², 1/d²)
27  27  36  .0995  .1050
22  22  46  .0100  .1195
26  32  32  .0915  .1005
18  36  36  .0990  .1120
James' 1st order test: .1195  .1315  .1220  .1355  .1110  .1305  .1095  .1210
James' 2nd order test: .1040

Table 95
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 15, p = 3, d = 3, and α = .10

Table 96
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 15, p = 3, d = 3, and α = .10

Table 97
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 15, p = 6, d = 3, and α = .10
n1  n2  n3  Pillai-Bartlett trace  Johansen's test  James' 1st order test  James' 2nd order test
D = d²I
27  27  36
22  22  46
26  32  32
18  36  36
.0670
Johansen's test: .1040  .1090  .1150  .1025  .0070  .0795  .0350
D = diag(d², d², d², 1, 1, 1)
27  27  36  .0710
22  22  46  .0335
26  32  32  .0855
18  36  36  .0650
D = diag(d², d², d², 1/d², 1/d², 1/d²)
27  27  36  .1045  .1000
22  22  46  .1245  .1070
26  32  32  .1510  .1030
18  36  36  .2390  .1120
James' 1st order test: .1160  .1175  .1260  .1100
.0995

Table 98

Table 99
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 20, p = 3, d = √1.5, and α = .10
n1  n2  n3  Pillai-Bartlett trace  Johansen's test  James' 1st order test  James' 2nd order test
D = d²I
18  18  24
15  15  30
16  22  22
12  24  24
D = diag(d², d², 1)
18  18  24
15  15  30
16  22  22
12  24  24
D = diag(d², d², 1/d²)
18  18  24  .0995
15  15  30  .0900
16  22  22  .0915
12  24  24  .0910
.0810

Table 100
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 20, p = 3, d = √1.5, and α = .10

Table 101
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 20, p = 6, d = √1.5, and α = .10
n1  n2  n3  Pillai-Bartlett trace  Johansen's test  James' 1st order test  James' 2nd order test
D = d²I
36  36  48  .0705
30  30  60  .0660
32  44  44  .0725
24  48  48  .0635
D = diag(d², d², d², 1, 1, 1)
36  36  48  .0805
30  30  60  .0725
32  44  44  .1025
24  48  48  .0845
D = diag(d², d², d², 1/d², 1/d², 1/d²)
36  36  48  .1100  .1140
30  30  60  .0100  .1145
32  44  44  .1020  .1000
24  48  48  .1110  .1160
.0880

Table 102
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 20, p = 6, d = √1.5, and α = .10
n1  n2  n3  Pillai-Bartlett trace  Johansen's test  James' 1st order test  James' 2nd order test
D = d²I
36  36  48
30  30  60
32  44  44
24  48  48
D = diag(d², d², d², 1, 1, 1)
36  36  48  .0950
30  30  60  .1250
32  44  44  .1045
24  48  48  .1265
D = diag(d², d², d², 1/d², 1/d², 1/d²)
36  36  48  .0980  .1050
30  30  60  .1075  .1210
32  44  44  .1075  .1075
24  48  48  .1095  .1175
.0970  .1055  .1000  .1105
James' 1st order test: .1010  .1140  .1050  .1160
James' 2nd order test: .1190

Table 103
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 20, p = 3, d = 3, and α = .10

Table 104
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 20, p = 3, d = 3, and α = .10

Table 105
Actual Type I Error Rates for Four Multivariate Tests: Positive Condition, N/p = 20, p = 6, d = 3, and α = .10

Table 106
Actual Type I Error Rates for Four Multivariate Tests: Negative Condition, N/p = 20, p = 6, d = 3, and α = .10

REFERENCES

Algina, J., & Oshima, T. (1989). Robustness of the independent sample Hotelling's T² to variance-covariance heteroscedasticity when sample sizes are unequal or in small ratios. Manuscript submitted for publication.

Algina, J., & Tang, K. L. (1988). Type I error rates for Yao's and James' tests of equality of mean vectors under variance-covariance heteroscedasticity. Journal of Educational Statistics, 13, 281-290.

Anderson, T. W. (1958). An introduction to multivariate statistical analysis. New York: Wiley.

Aspin, A. A. (1948). An examination and further development of formula arising in the problem of comparing two mean values. Biometrika, 35, 88-96.

Aspin, A. A. (1949). Tables for use in comparisons whose accuracy involves two variances, separately estimated (with appendix by B. L. Welch). Biometrika, 36, 290-296.

Bartlett, M. S. (1939). A note on tests of significance in multivariate analysis. Proceedings of the Cambridge Philosophical Society, 35, 180-185.

Bray, J. H., & Maxwell, S. E. (1985).
Multivariate analysis of variance. Beverly Hills, CA: Sage Publications. Brown, M. B., & Forsyth, A. B. (1974). The small sample behavior of some statistics which test the equality of several means. Technometrics 16 129-132. Dijkstra, J. B. & Werter, S. P. J. (1981). Testing the equality for several means when the population variances are unequal. Communications in Statistics-Simulation and Computation BIO 557-569. Fisher, R. A. (1935). The fiducial argument in statistical inference. Annals of Eugenics 6, 391-398. 165 PAGE 172 166 Fisher, R. A. (1939). The comparison of samples with possible unequal variances. Annals of Eugenics 9, 174-180. Fisher, R. A. & Healy, M. J. R. (1956). New tables of Behrens test of significance. Journal of the Royal Statistical Society Ser B 18, 212-216. Hakstian, A. R. Roed, J. C, & Lind, J. C. (1979). Two-sample T 2 procedure and the assumption of homogeneous covariance matrices. Psychological Bulletin 86, 1255-1263. Holloway, L. N. & Dunn, 0. J. (1967). The robustness of Hotelling's T 2 Journal of the American Statistical Association 62, 124-136. Hopkins, J. W. & Clay, P. P. F. (1963). Some empirical distributions of bivariate T 2 and homoscedasticity criterion M under unequal variance and leptokurtosis Journal of the American Statistical Association, 58 1048-1053. Hotelling, H. (1931). The generalization of Student's ratio. Annals of Mathematical Statistics 2, 360-378. Hotelling, H. (1951). A generalized T test and measure of multivariate dispersion. In Jerzy Neyman (Ed.), Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability (pp. Press 23-41). Berkeley: University of California Hsu, P. L. (1938). Contributions to the theory of 'Student's' t-test as applied to the problem of two samples. Statistical Research Memoirs 2., 1-24. Ito, K. (1969). On the effect of heteroscedasticity and nonnormality upon some multivariate test procedures. In P. R. Krishnaiah (Ed.), Multivariate Analysis-II (pp. 87-120). 
New York: Academic Press. Ito, K. & Schull, W. J. (1964). On the robustness of the T 2 test in multivariate analysis of variance when variance -covariance matrices are not equal. Biometrika 51 71-78. James, G. S. (1951). The comparison of several groups of observations when the ratios of the population variances are unknown. Biometrika 38 324-329. James, G. S. (1954). Tests of linear hypothesis in univariate and multivariate analysis when the ratios of the population variances are unknown. Biometrika 41, 19-43. PAGE 173 167 James, G. S. (1959). The BehrensFisher distribution and weighted means. Journal of the Royal Statistical Society Ser. B. 21, 78-79. Johansen, S. (1980). The WelchJames approximation to the distribution of the residual sum of squares in a weighted linear regression. Biometrika 67, 85-92. Kaiser, L. (1983). Asymptotic equivalence of an expansion test and an approximate degrees of freedom test. Biometrika 70 505-509. Korin, B.P. (1972). Some comments of the homoscedasticity criterion M and the multivariate analysis of variance tests T, W and R. Biometrika 59, 215-216. Kshirsagar, A. M. (1959). Bartlett decomposition and Wishart distribution. Annals of Mathematical Statistics 30, 239-241. Lawley, D. N. (1938). A generalization of Fisher's z test. Biometrika 30, 180-187. Lawley, D. N. (1939). A correction to 'A generalization of Fisher's z test'. Biometrika 30 467-469. Olson, C. L. (1973). A Monto Carlo investigation of the robustness of multivariate analysis of variance Unpublished doctoral dissertation, University of Toronto. Olson, C. L. (1974). Comparative robustness of six tests in multivariate analysis of variance. Journal of the American Statistical Association 69 894-908. Pillai, K. C. S. (1955). Some new test criteria in multivariate analysis. Annals of Mathematical Statistics 26 117-121. Ray, W. D. & Pitman, E. N. T. (1961). An exact distribution of the Fisher-Behrens-Welch statistic. Journal of the Royal Statistical Society 23, 377-384. 
Roy, S. N. (1945). The individual sampling distribution of the maximum, the minimum, and any intermediate of the p-statistics on the null-hypothesis. Sankhya, 7, Part 2, 133-158.
Roy, S. N. (1953). On a heuristic method of test construction and its use in multivariate analysis. Annals of Mathematical Statistics, 24, 220-238.
SAS Institute Inc. (1985). SAS user's guide: Basics, version 5 edition. Cary, NC: Author.
Scheffé, H. (1959). Analysis of variance. New York: John Wiley and Sons, Inc.
Scheffé, H. (1970). Practical solutions of the Behrens-Fisher problem. Journal of the American Statistical Association, 65, 1051-1058.
Stevens, J. (1986). Applied multivariate statistics for the social sciences. Hillsdale, NJ: Lawrence Erlbaum Associates.
Trickett, W. H., James, G. S., & Welch, B. L. (1954). On the comparison of two means, further discussion of iterative methods for calculation tables. Biometrika, 41, 361-374.
Trickett, W. H., James, G. S., & Welch, B. L. (1956). Further critical values for the two-means problem. Biometrika, 43, 203-203.
Wald, A. (1955). Testing the difference between the means of two normal populations with unknown standard deviations. In A. Wald, Selected Papers in Statistics and Probability (pp. 669-695). New York: McGraw-Hill Book Co.
Wang, Y. Y. (1971). Probabilities of the type I errors of the Welch tests for the Behrens-Fisher problem. Journal of the American Statistical Association, 66, 605-608.
Welch, B. L. (1938). The significance of the difference between two means when the population variances are unequal. Biometrika, 29, 350-362.
Welch, B. L. (1947). The generalization of 'Student's' problem when several different population variances are involved. Biometrika, 34, 28-35.
Welch, B. L. (1951). On the comparison of several mean values: An alternative approach. Biometrika, 38, 330-336.
Wilcox, R. R. (1987). New statistical procedures for the social sciences: Modern solutions to basic problems.
Hillsdale, NJ: Lawrence Erlbaum Associates.
Wilks, S. S. (1932). Certain generalizations in the analysis of variance. Biometrika, 24, 471-494.
Yao, Y. (1965). An approximate degrees of freedom solution to the multivariate Behrens-Fisher problem. Biometrika, 52, 139-147.

BIOGRAPHICAL SKETCH

Kezhen Linda Tang was born on August 21, 1953, in Beijing, the People's Republic of China, where she had a very happy childhood. She moved to Nanjing, a southeastern city, with her parents when she was eleven years old and finished high school there. She entered Nanjing Teachers University in February, 1978, and graduated with the Bachelor of Arts degree in Chinese Language and Literature in January, 1982, after which she taught literature in Jiangsu Educational College for one year. In June, 1983, Ms. Tang enrolled in the graduate program in Foundations of Education at the University of Florida. She earned the master's degree in educational psychology in August, 1986. Recognizing the need to use statistical techniques in educational research, she transferred to the Research and Evaluation Methodology program in the Foundations of Education Department, University of Florida, for her Ph.D. study. She is concurrently pursuing a master's degree in statistics. She will graduate with the Ph.D. in December, 1989.

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
James J. Algina, Chair, Professor of Foundations of Education

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Linda M.
Crocker, Professor of Foundations of Education

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Robert R. Sherman, Professor of Foundations of Education

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Frank Martin, Professor of Statistics

This dissertation was submitted to the Graduate Faculty of the College of Education and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.
December, 1989
Chairman, Foundations of Education
Dean, College of Education
Dean, Graduate School