## Citation

- Permanent Link:
- https://ufdc.ufl.edu/UF00098279/00001
## Material Information

- Title:
- A comparison of methods for combining tests of significance
- Creator:
- Louv, William C., 1952-
- Publication Date:
- 1979
- Copyright Date:
- 1979
- Language:
- English
- Physical Description:
- vii, 122 leaves : ill. ; 28 cm.
## Subjects

- Subjects / Keywords:
- Approximation ( jstor )
- Binomials ( jstor )
- Degrees of freedom ( jstor )
- Null hypothesis ( jstor )
- Probabilities ( jstor )
- Random variables ( jstor )
- Sample size ( jstor )
- Significance level ( jstor )
- Statistical discrepancies ( jstor )
- Statistics ( jstor )
- Dissertations, Academic -- Statistics -- UF
- Statistical hypothesis testing ( lcsh )
- Statistics thesis Ph. D
- Genre:
- bibliography ( marcgt )
- non-fiction ( marcgt )
## Notes

- Thesis:
- Thesis--University of Florida.
- Bibliography:
- Bibliography: leaves 118-121.
- General Note:
- Typescript.
- General Note:
- Vita.
- Statement of Responsibility:
- by William C. Louv.
## Record Information

- Source Institution:
- University of Florida
- Holding Location:
- University of Florida
- Rights Management:
- Copyright [name of dissertation author]. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
- Resource Identifier:
- 023309024 ( AlephBibNum )
- 06429717 ( OCLC )
- AAL1950 ( NOTIS )
## Full Text
A COMPARISON OF METHODS FOR COMBINING TESTS OF SIGNIFICANCE

BY WILLIAM C. LOUV

A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA 1979

Digitized by the Internet Archive in 2009 with funding from University of Florida, George A. Smathers Libraries
http://www.archive.org/details/comparisonofmeth00louv

ACKNOWLEDGMENTS

I am indebted to Dr. Ramon C. Littell for his guidance and encouragement, without which this dissertation would not have been completed. I also wish to thank Dr. John G. Saw for his careful proofreading and many helpful suggestions. The assistance of Dr. Dennis D. Wackerly throughout my course of graduate study is greatly appreciated. My special thanks go to Dr. William Mendenhall, who gave me the opportunity to come to the University of Florida and who encouraged me to pursue the degree of Doctor of Philosophy.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT
CHAPTER I   INTRODUCTION AND LITERATURE REVIEW
  1.1 Statement of the Combination Problem
  1.2 Non-Parametric Combination Methods
  1.3 A Comparison of Non-Parametric Methods
  1.4 Parametric Combination Methods
  1.5 Weighted Methods of Combination
  1.6 The Combination of Dependent Tests
  1.7 The Combination of Tests Based on Discrete Data
  1.8 A Preview of Chapters II, III, and IV
CHAPTER II  BAHADUR EFFICIENCIES OF GENERAL COMBINATION METHODS
  2.1 The Notion of Bahadur Efficiency
  2.2 The Exact Slopes for T(A) and T(F)
  2.3 Further Results on Bahadur Efficiencies
  2.4 Optimality of T(F) in the Discrete Data Case
CHAPTER III THE COMBINATION OF BINOMIAL EXPERIMENTS
  3.1 Introduction
  3.2 Parametric Combination Methods
  3.3 Exact Slopes of Parametric Methods
  3.4 Approximate Slopes of Parametric Methods
  3.5 Powers of Combination Methods
  3.6 A Synthesis of Comparisons
  3.7 Approximation of the Null Distributions of T(F), T(LR), and T(ALR)
CHAPTER IV  APPLICATIONS AND FUTURE RESEARCH
  4.1 Introduction
  4.2 Estimation: Confidence Regions Based on Non-parametric Combination Methods
  4.3 The Combination of 2 x 2 Tables
  4.4 Testing for the Heterogeneity of Variances
  4.5 Testing for the Difference of Means with Incomplete Data
  4.6 Asymptotic Efficiencies for k → ∞
BIBLIOGRAPHY
BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate Council of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

A COMPARISON OF METHODS FOR COMBINING TESTS OF SIGNIFICANCE

By William C. Louv
August 1979
Chairman: Ramon C. Littell
Major Department: Statistics

Given test statistics X(1),...,X(k) for testing the null hypotheses H1,...,Hk, respectively, the combining problem is to select a function of X(1),...,X(k) to be used as an overall test of the hypothesis H = H1 n H2 n ... n Hk. Functions based on the probability integral transformation, that is, on the significance levels attained by X(1),...,X(k), form a class of non-parametric combining methods. These methods are compared in a general setting with respect to Bahadur asymptotic relative efficiency. It is concluded that Fisher's omnibus method is at least as efficient as all other methods, whether X(1),...,X(k) arise from continuous or discrete distributions. Given a specific parametric setting, it may be possible to improve upon the non-parametric methods.
The problem of combining binomial experiments is studied in detail. Parametric methods analogous to the sum of chi's procedure and the Cochran-Mantel-Haenszel procedure, as well as the likelihood ratio test and an approximate likelihood ratio test, are compared to Fisher's method. Comparisons are made with respect to Bahadur efficiency and with respect to exact power. The power comparisons take the form of plots of contours of equal power. If prior information concerning the nature of the unknown binomial success probabilities is unavailable, Fisher's method is recommended. Other methods are preferred when specific assumptions can be made concerning the success probabilities. For instance, the Cochran-Mantel-Haenszel procedure is optimal when the success probabilities have a common value.

Fisher's statistic has a chi-square distribution with 2k degrees of freedom when X(1),...,X(k) are continuous. In the discrete case, however, the exact distribution of Fisher's statistic is difficult to obtain. Several approximate methods are compared and Lancaster's mean chi-square approximation is recommended.

The combining problem is also approached from the standpoint of estimation. Non-parametric methods are inverted to form k-dimensional confidence regions. Several examples for k=2 are graphically displayed.

CHAPTER I
INTRODUCTION AND LITERATURE REVIEW

1.1 Statement of the Combination Problem

The problem of combining tests of significance has been studied by several writers over the past fifty years. The problem is: Given test statistics X(1),...,X(k) for testing null hypotheses H1,...,Hk, respectively, to select a function of X(1),...,X(k) to be used as the combined test of the hypothesis H = H1 n H2 n ... n Hk. In most of the work cited, the X(i) are assumed to be mutually independent, and, except where stated otherwise, that is true in this paper. Some practical situations in which an experimenter may wish to combine tests are:

i.
The data from k separate experiments, each conducted to test the same H, yield the respective test statistics X(1),...,X(k). It is desired to pool the information from the separate experiments to form a combined test of H. It would be desirable to pool the information by combining the X(i) if (a) only the X(i), instead of the raw data, are available, (b) the information from the ith experiment is sufficiently contained in X(i), or (c) a theoretically optimal test based on all the data is intractable.

ii. The ith of k experiments yields X(i) to test a hypothesis Hi, i = 1,...,k, and a researcher wishes to simultaneously test the truth of H1,...,Hk. Considerations (a), (b), and (c) in the preceding paragraph again lead to the desirability of combining the X(i) as a test of H = H1 n ... n Hk.

iii. A simultaneous test of H = H1 n ... n Hk is desired, and the data from a single experiment yield X(1),...,X(k) as tests of H1,...,Hk, respectively. Combining the X(i) can provide a test of H.

In Section 1.2 several non-parametric methods of combination are introduced. A literature review of comparisons of these procedures is given in Section 1.3. The remainder of this chapter is primarily a literature review of more specific aspects of the combination problem. We make some minor extensions, which are identified as such.

1.2 Non-parametric Combination Methods

Suppose that Hi is rejected for large values of X(i). Define L(i) = 1 - Fi(X(i)), where Fi is the cumulative distribution function of X(i) under Hi. If X(i) is a continuous random variable, then L(i) is uniformly distributed on (0,1) under Hi.
Many of the well-known methods of combination may be expressed in terms of the L(i). Such methods considered here are:

(1) T(F) = -2 Σ ln L(i)           (Omnibus method, Fisher [13])
(2) T(N) = -Σ Φ^(-1)(L(i))        (Normal transform, Liptak [26])
(3) T(m) = -min L(i)              (Minimum significance level, Tippett [42])
(4) T(M) = -max L(i)              (Maximum significance level, Wilkinson [44])
(5) T(P) = 2 Σ ln(1 - L(i))       (Pearson [36])
(6) T(A) = -Σ L(i)                (Edgington [12]).

As the statistics are defined here, H is rejected when large values are observed. Figure 1 shows the rejection regions for the statistics defined above when k = 2.

In the continuous case, the null distributions of these statistics are easily obtained. They are all based upon the fact that the L(i) are uniformly distributed under H. It is easily established that this is true. The cumulative distribution function of L(i) is

P{L(i) ≤ ℓ} = P{1 - F(X) ≤ ℓ} = 1 - P{F(X) < 1 - ℓ}
            = 1 - P{X < F^(-1)(1 - ℓ)} = 1 - F{F^(-1)(1 - ℓ)} = 1 - (1 - ℓ) = ℓ.

That T(N) has a normal distribution with mean 0 and variance k follows trivially. The statistics T(m) and T(M) are seen to be based on the order statistics of a uniform random variable on (0,1) and are therefore distributed according to beta distributions.

That T(F) (and, with a change of sign, T(P)) is distributed as a chi-square on 2k degrees of freedom is established as follows. The probability density function of L(i) is f(ℓ) = I(0,1)(ℓ). Let S = -2 ln L. Then

L = e^(-S/2),   dL/dS = -(1/2) e^(-S/2).

It follows that

f(S) = (1/2) e^(-S/2) I(0,∞)(S),

the chi-square density with 2 degrees of freedom; summing k independent such variables yields a chi-square on 2k degrees of freedom.

[Figure 1. Rejection Regions in Terms of the Significance Levels for k = 2.]

Edgington's statistic, T(A), is a sum of uniform random variables. As shown by Edgington, significance levels can be established for values of T(A) on the basis of the following equation [12]:

P{T(A) ≥ -t} = (1/k!) [t^k - (k choose 1)(t-1)^k + (k choose 2)(t-2)^k - ... ± (k choose S-1)(t-S+1)^k],   (1.1)

where S is the smallest integer greater than t.
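The closed forms above give attained significance levels directly for several of these statistics. As a concrete illustration (the function names are ours, not the dissertation's), the following sketch computes combined levels for Fisher's T(F), Tippett's T(m), and Edgington's T(A), using the even-degrees-of-freedom chi-square survival series and the Irwin-Hall form of equation (1.1):

```python
import math

def fisher_level(levels):
    """T(F) = -2 * sum(ln L_i); chi-square on 2k df under H.
    For even df the survival function is exp(-t/2) * sum_{j<k} (t/2)^j / j!."""
    k = len(levels)
    t = -2.0 * sum(math.log(l) for l in levels)
    p = math.exp(-t / 2.0) * sum((t / 2.0) ** j / math.factorial(j)
                                 for j in range(k))
    return t, p

def tippett_level(levels):
    """T(m) = -min L_i; the minimum of k uniforms is Beta(1, k),
    so the attained level is 1 - (1 - min L)^k."""
    k, m = len(levels), min(levels)
    return -m, 1.0 - (1.0 - m) ** k

def edgington_level(levels):
    """T(A) = -sum L_i; equation (1.1) as an Irwin-Hall tail:
    P{T(A) >= -t} = (1/k!) * sum_{j<S} (-1)^j C(k,j) (t-j)^k."""
    k, t = len(levels), sum(levels)
    s = sum((-1) ** j * math.comb(k, j) * (t - j) ** k
            for j in range(math.floor(t) + 1))
    return -t, s / math.factorial(k)

levels = [0.08, 0.12, 0.30]   # attained levels L(1), L(2), L(3)
tF, pF = fisher_level(levels)
tm, pm = tippett_level(levels)
tA, pA = edgington_level(levels)
```

Note how the three methods disagree on the same data: Fisher pools all three moderate levels into one fairly small combined level, while Tippett looks only at the smallest of them.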
1.3 A Comparison of Non-parametric Methods

The general non-parametric methods of combination are rules prescribing that H should be rejected for certain values of (L(1), L(2),..., L(k)). Several basic theoretical results for non-parametric methods of combination are due to Birnbaum [7]. Some of these results are summarized in the following paragraphs.

Under Hi, L(i) is distributed uniformly on (0,1) in the continuous case. When Hi is not true, L(i) is distributed according to a non-increasing density function on (0,1), say gi(L(i)), if X(i) has a distribution belonging to the exponential family. Some overall alternative hypotheses that may be considered are:

HA: One or more of the L(i)'s have non-uniform densities gi(L(i)).
HB: All of the L(i)'s have the same non-uniform density g(L).
HC: One of the L(i)'s has a non-uniform density g(L(i)).

HA is the appropriate alternative hypothesis in most cases where prior knowledge of the alternative densities gi(L(i)) is unavailable [7]. The following condition is satisfied by all of the methods introduced in Section 1.2.

Condition 1: If H is rejected for any given set of L(i)'s, then it will also be rejected for all sets of L(i)*'s such that L(i)* ≤ L(i) for each i [7].

It can be shown that the best test of H versus any particular alternative in HA must satisfy Condition 1. It seems reasonable, therefore, that any method not satisfying Condition 1 can be eliminated from consideration [7]. In the present context, Condition 1 does little to restrict the class of methods from which to choose. In fact, "for each non-parametric method of combination satisfying Condition 1, we can find some alternative H represented by non-increasing functions g1(L(1)),...,gk(L(k)) against which that method of combination gives a best test of H" [7].

It should be noted that direct comparison of general combining methods with respect to power is difficult in typical contexts.
The precise distributions of the gi(L(i)) under the alternative hypothesis are intractable except in very special cases.

When the X(i) have distributions belonging to the one-parameter exponential family, the overall null hypothesis can be written H: θ(1) = θ0(1),..., θ(k) = θ0(k). Rejection of H is based upon (X(1),...,X(k)). It is reasonable to reject the use of inadmissible tests. A test is inadmissible if there exists another test which is at least as powerful for all alternatives and more powerful for at least one alternative. Birnbaum proves that a necessary condition for the admissibility of a test is convexity of the acceptance region in the (X(1),...,X(k)) hyperplane. For X(i) with distributions in the exponential family, T(P) and T(M) do not have convex acceptance regions and are therefore inadmissible [7].

Although Birnbaum does not consider Edgington's method, it is clear that T(A) must also be inadmissible. For instance, for k=2, consider the points (0,c), (c,0), and (c/2,c/2) in the (L(1),L(2)) plane, which fall on the boundary of the acceptance region of T(A). The points in the (X(1),X(2)) plane corresponding to (0,c) and (c,0) would fall on the axes at ∞. The point corresponding to (c/2,c/2) certainly falls interior to the boundaries described by the points corresponding to (c,0) and (0,c). The acceptance region cannot, therefore, be convex, and hence T(A) is inadmissible. This argument is virtually the same as that used by Birnbaum to establish the inadmissibility of T(P) and T(M).

For a given inadmissible test it is not known how to find a particular test which dominates it. Birnbaum, however, argues that the choice of which test to use should be restricted to admissible tests. The choice of a test from the class of admissible tests is then contingent upon which test has more power against alternatives of interest [7].
In summary of Birnbaum's observations: since T(P) and T(M) do not in general form convex acceptance regions in the (X(1),...,X(k)) hyperplane, they are not in general admissible and can be eliminated as viable methods. We can extend Birnbaum's reasoning to reach the same conclusion about T(A). By inspecting the acceptance regions formed by the various methods, Birnbaum also observes that T(m) is more sensitive to HC (departure from H by exactly one parameter) than T(F). The test T(F), however, has better overall sensitivity to HA [7].

Littell and Folks have carried out comparisons of general non-parametric methods with respect to exact Bahadur asymptotic relative efficiency. A detailed account of the notion of Bahadur efficiency is deferred to Section 2.1. In their first investigation [26], Littell and Folks compare T(F), T(N), T(M), and T(m). The actual values of the efficiencies are given in Section 2.3. The authors show that T(F) is superior to the other three procedures according to this criterion. They also observe that the relative efficiency of T(m) is consistent with Birnbaum's observation that T(m) performs well versus HC.

Further, Littell and Folks show that T(F), with some restrictions on the parameter space, is optimal among all tests based on the X(i) as long as the X(i) are optimal. This result is extended in a subsequent paper [28] by showing that T(F) is at least as efficient as any other combination procedure. The only condition necessary for this extension is equivalent to Birnbaum's Condition 1. A formal statement of this result is given in Section 2.3.

1.4 Parametric Combination Methods

The evidence thus far points strongly to T(F) as the choice among general non-parametric combination procedures when prior knowledge of the alternative space is unavailable.
When the distributions of the X(i) belong to some parametric family, or when the alternative parameter space can be characterized, it may be possible that T(F) and the other general non-parametric methods can be improved upon. A summary of such investigations follows.

Oosterhoff [33] considers the combination of k normally distributed random variables with known variances and unknown means μ1, μ2,..., μk. The null hypothesis tested is H: μ1 = μ2 = ... = μk = 0 versus the alternative HA: μi ≥ 0, with strict inequality for at least one i. He observed that many combination problems reduce to this situation asymptotically. The difference in power between a particular test and the optimal test for a given (μ1, μ2,..., μk) is called the shortcoming. Oosterhoff proves that the shortcomings of T(F) and the maximum likelihood test go to zero for all (μ1,...,μk) as the overall significance level tends to zero. The maximum shortcoming of the likelihood ratio test is shown to be smaller than the maximum shortcoming of T(F).

Oosterhoff derives a most stringent Bayes test with respect to a least favorable prior. According to numerical comparisons (again with respect to shortcomings), the most stringent test performs similarly to the likelihood ratio test. The likelihood ratio test is much easier to implement than the most stringent test and is therefore preferable. Fisher's statistic, T(F), is seen to be slightly more powerful than the likelihood ratio test when the means are similar; the opposite is true when the means are dissimilar. A simple summing of the normal variates performs better than all other methods when the means are very similar [33].

Koziol and Perlman [20] study the combination of chi-square variates X(i) ~ χ²(νi; δi). The hypothesis test considered is H: δ1 = ... = δk = 0 versus HA: δi ≥ 0 (strict inequality for at least one i), where the δi are non-centrality parameters and the νi are the respective degrees of freedom.
An earlier Monte Carlo study by Bhattacharya [6] also addressed this problem and compared the statistics T(F), T(m), and ΣX(i). Bhattacharya concluded that ΣX(i) and T(F) were almost equally powerful and that both of these methods clearly dominated T(m). Koziol and Perlman endeavor to establish the power of T(F) and ΣX(i) in some absolute sense. To do this, they compare T(F) and ΣX(i) to Bayes procedures, since Bayes procedures are admissible and have good power in an absolute sense [20].

When the νi are equal, ΣX(i) is Bayes with respect to priors giving high probability to (δ1,...,δk) central to the parameter space (Type B alternatives). The test Σ exp{X(i)} is Bayes with respect to priors which assign high probability to the extremes of the parameter space (Type C alternatives). For unequal νi's the Bayes tests have slightly altered forms. The Bayes procedures are compared to T(F), T(m), and T(N) for k=2 for various values of (δ1, δ2) via numerical tabulations and via the calculation of power contours.

The statistic T(m) is seen to have better power than the other tests for Type C alternatives but performs rather poorly in other situations. The Bayes test performs comparably to T(m) for Type C alternatives and is much more sensitive to Type B alternatives than T(m). The statistic T(N) is relatively powerful over only a small region at the center of the parameter space and is seen to be dominated by some other procedure for each value of k investigated. The statistics T(F) and ΣX(i) are good overall procedures, with T(F) more sensitive to Type C alternatives and ΣX(i) more sensitive to Type B alternatives. For νi ≥ 2, T(F) is more sensitive to Type B alternatives than ΣX(i) is to Type C alternatives, and T(F) is therefore recommended. The opposite is true for νi = 1. These observations were supported for k>2 through Monte Carlo simulations. Koziol and Perlman also consider the maximum shortcomings of the tests.
In the context of no prior information, they show that T(F) minimizes the maximum shortcoming for νi ≥ 2, while ΣX(i) minimizes the maximum shortcoming for νi = 1. An additional statistic can be considered when νi = 1. It is T(X) = Σ(X(i))^(1/2), the sum of chi's procedure. For k=2, T(X) is powerful only for a small region in the center of the parameter space. For large k, the performance of T(X) becomes progressively worse. It can be said that T(X) performs similarly to T(N).

1.5 Weighted Methods of Combination

Good [14] suggests a weighted version of Fisher's statistic, T(G) = -Σ λi ln L(i). He showed that, if the λi are all different, significance probabilities can be found by the relationship

P{T(G) > x} = Σ (r=1 to k) Ar exp(-x/λr),

where

Ar = λr^(k-1) / [(λr - λ1)(λr - λ2)...(λr - λ(r-1))(λr - λ(r+1))...(λr - λk)].

Zelen [45] illustrates the use of T(G) in the analysis of incomplete block designs. In such designs, it is often possible to perform two independent analyses of the data. The usual analysis (intrablock analysis) depends only on comparisons within blocks. The second analysis (interblock analysis) makes use of the block totals only. Zelen defines independent F-ratios corresponding to the two types of analysis. The attained significance level corresponding to the interblock analysis is weighted according to the interblock efficiency, which is a function of the estimated block and error variances. A similar example is given by Pape [34], who extends Zelen's method to the more general context of a multi-way completely random design.

Koziol and Perlman [20] also considered weighted methods for the problem of combining independent chi-squares. They conclude that when prior information about the non-centrality parameters is available, increased power can be achieved at the appropriate alternative by a weighted version of the sum test, Σ bi X(i), if νi > 2 for all i, and by the weighted Fisher statistic, T(G), when νi ≤ 2 for all i.
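Good's tail formula can be coded directly; the partial-fraction coefficients Ar require only that the weights be distinct. A small sketch (our illustration; `good_tail` is a name we introduce), checked for k = 2 and weights (1, 2) against the direct convolution, which gives P{T > x} = 2e^(-x/2) - e^(-x):

```python
import math

def good_tail(x, lams):
    """P{T(G) > x} for Good's weighted Fisher statistic
    T(G) = -sum(lam_r * ln L_r), all lam_r distinct:
    sum_r A_r * exp(-x/lam_r), A_r = lam_r^(k-1) / prod_{s != r}(lam_r - lam_s)."""
    k = len(lams)
    total = 0.0
    for r, lr in enumerate(lams):
        a = lr ** (k - 1)
        for s, ls in enumerate(lams):
            if s != r:
                a /= (lr - ls)
        total += a * math.exp(-x / lr)
    return total

# For weights (1, 2): T = E1 + 2*E2 with E_r ~ Exponential(1), and
# convolving the two exponential densities gives 2*exp(-x/2) - exp(-x).
p = good_tail(1.7, [1.0, 2.0])
```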
1.6 The Combination of Dependent Tests

The combinations considered up to this point have been based on mutually independent L(i) arising from mutually independent statistics, X(i). As previously indicated, in such cases the functions of the L(i) which comprise the general methods have null distributions which are easily obtained. When the X(i) (and thus the L(i)) are not independent, the null distributions are not tractable in typical cases.

Brown [9] considers a particular example of the problem of combining dependent statistics. The statistics to be combined are assumed to have a joint multivariate normal distribution with known covariance matrix and unknown mean vector (μ1, μ2,..., μk)'. The hypothesis test of interest is H: μi = μi0 versus HA: μi ≥ μi0 (strict inequality for at least one i). A likelihood ratio test can be derived [31], but obtaining significance values from this approach is difficult. Brown bases his solution on T(F). The null distribution of T(F) is not chi-square on 2k degrees of freedom in this case. The mean of T(F) is 2k, as in the independent case. The variance has covariance terms which Brown approximates. The approximation is expressed as a function of the correlations between the normal variates. These first two moments are equated to the first two moments of a gamma distribution. The resultant gamma distribution is used to obtain approximate significance levels.

1.7 The Combination of Tests Based on Discrete Data

As noted in previous sections, the literature tends to support T(F) as a non-parametric combining method in the general, continuous data framework. Those authors who have addressed the problem of combining discrete statistics have utilized T(F), assuming that the optimality properties established in the continuous case are applicable. The problem then becomes one of determining significance probabilities, since T(F) is no longer distributed as a chi-square on 2k degrees of freedom.
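The size distortion in the discrete case can be exhibited by brute-force enumeration for small experiments (a sketch of ours, in the spirit of, but not taken from, the algorithms discussed below). For two Binomial(5, 1/2) experiments, the exact null probability that T(F) exceeds the chi-square(4 df) 5% point is well below 0.05:

```python
import math
from itertools import product

n, k = 5, 2
# Null pmf of X ~ Binomial(n, 1/2) and attainable levels L_j = P{X >= j}.
pmf = [math.comb(n, x) * 0.5 ** n for x in range(n + 1)]
level = [sum(pmf[x:]) for x in range(n + 1)]

crit = 9.488  # upper 5% point of chi-square on 2k = 4 df (standard table value)
exact = 0.0
for xs in product(range(n + 1), repeat=k):
    t = -2.0 * sum(math.log(level[x]) for x in xs)
    if t >= crit:
        exact += math.prod(pmf[x] for x in xs)
# exact is the true size of the nominal 0.05 test; it is smaller than 0.05
# because T(F) is stochastically smaller here than in the continuous case.
```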
We describe the problem as follows. Suppose L(i)* derives from a discontinuous statistic, X(i)*, and that a and b are possible values of L(i)*, 0 < a < b ≤ 1, such that a < L(i)* < b is impossible. For a < ℓ < b, P{L(i)* ≤ ℓ} = P{L(i)* ≤ a} = a. If L(i) derives from a continuous statistic, X(i), then P{L(i) ≤ ℓ} = ℓ. Since a < ℓ, L(i)* is stochastically larger than L(i). It follows that Fisher's statistic is stochastically smaller in the discrete case than in the continuous case. The ultimate result is that if T(F) is compared to a chi-square distribution with 2k degrees of freedom when the data are discrete, the null hypothesis will be rejected with too low a probability.

When k, the number of statistics to be combined, is small and (n1, n2,..., nk), the numbers of attainable levels of the discrete statistics, are small, the exact distribution of T(F) can be determined. Wallis [43] gives algorithms to generate null distributions when all of the X(i) are discrete and when one X(i) is discrete. Generating null distributions via Wallis' algorithms becomes intractable very quickly as k and the number of attainable levels of the X(i) increase. The generation of complete null distributions is even beyond the capability of usual computer storage limitations in experiments of modest size. The significance level attained by a particular value of T(F) can be obtained for virtually any situation with a computer, however. A transformation of T(F) which can be referred to standard tables is indicated.

A method suggested by Pearson [37] involves the addition, by a separate random experiment, of a continuous variable to the original discrete variable, thus yielding a continuous variable. Suppose X(i) can take on values 0, 1, 2,..., ni with probabilities p0, p1,..., p(ni). Let P_j(i) = Σ (x=j to ni) p_x. Note that
P_ni(i) < P_(ni-1)(i) < ... < P_1(i) < P_0(i) = 1. If Hi is rejected for large values of X(i), then the P_j(i), j = 0,1,2,...,ni, are the observable significance levels for the ith test; that is, the observable values of the random variable L(i) = 1 - F(X(i) - 1) under the null hypothesis.

Assume that an exact slope exists for all tests; that is, assume that there exist functions c1(θ), c2(θ),...,ck(θ) such that

-(2/ni) ln L(i) → ci(θ) as ni → ∞

with probability one under θ, for i = 1,2,...,k.

Proposition 3. Let T(d) = (-2 Σ ln L(i))^(1/2), where L(i) is based on ni observations, n = n1 + ... + nk, and λi denotes the limit of ni/n. If lim -n^(-1) ln[1 - Fn(d)(√n t)] exists, then the exact slope of T(d) is

c_d(θ) = Σ (i=1 to k) λi ci(θ).

Proof: This proof utilizes the Bahadur-Savage Theorem (Theorem 1). To establish the first part of Theorem 1, observe that

T(d)/√n = [Σ (ni/n)(-(2/ni) ln L(i))]^(1/2) → (λ1 c1(θ) + ... + λk ck(θ))^(1/2)

as n → ∞ with probability one under θ. Consistent with the notation of Theorem 1, denote this limiting quantity b_d(θ).

Now, to establish f(t) of Part 2 of Theorem 1, choose ℓi ∈ (0,1), i = 1,2,...,k. For each i, there exists a j such that ℓi ∈ (P_j(i), P_(j-1)(i)]. Now, since L(i) is a discrete random variable,

P{L(i) ≤ ℓi} = P{L(i) ≤ P_j(i)} = P_j(i) ≤ ℓi.

Thus

P{L(i) ≤ ℓi} ≤ P{U(i) ≤ ℓi},

where the U(i) are mutually independent uniform random variables on (0,1). It follows that

P{-2 ln L(i) ≥ t} ≤ P{-2 ln U(i) ≥ t} for every t,

and hence that

P{Σ (i=1 to k) -2 ln L(i) ≥ t} ≤ P{Σ (i=1 to k) -2 ln U(i) ≥ t}.

Thus,

-n^(-1) ln P{[Σ -2 ln L(i)]^(1/2) ≥ √n t} ≥ -n^(-1) ln P{[Σ -2 ln U(i)]^(1/2) ≥ √n t}.   (2.4)

The quantity Zn = [Σ -2 ln U(i)]^(1/2) is distributed as the square root of a chi-square with 2k degrees of freedom. It follows that the density of Zn is

f(z) = [1/(2^(k-1) Γ(k))] z^(2k-1) e^(-z²/2).

Thus, from the result given in (2.1), the limit of the right-hand side of (2.4) can be written as

lim -n^(-1) ln f(√n t)
  = lim -n^(-1) ln{[1/(2^(k-1) Γ(k))] (√n t)^(2k-1) exp(-(√n t)²/2)}
  = lim -n^(-1) [(2k-1) ln(√n t) - (1/2) n t²]
  = (1/2) t².

Hence, it follows from (2.4) that

f(t) ≥ (1/2) t²,

since it is assumed that the limit of the left-hand side exists. Applying the result of Theorem 1,

c_d(θ) ≥ Σ λi ci(θ).

By Theorem 2,

c_d(θ) ≤ Σ λi ci(θ).

Hence,

c_d(θ) = Σ λi ci(θ).

That is, the exact slope of T(F) is Σ λi ci(θ) regardless of whether the X(i) are continuous or discrete. The condition imposed in Proposition 3, that lim -n^(-1) ln[1 - Fn(d)(√n t)] exist, is not very restrictive. It is satisfied in most typical cases and in every example considered in this dissertation.
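The key inequality in the proof, P{L(i) ≤ ℓ} ≤ P{U(i) ≤ ℓ}, says that a discrete attained level is stochastically larger than a Uniform(0,1) variable. A quick numerical check of this ordering for a single Binomial(6, 1/2) experiment (our sketch, not part of the dissertation):

```python
import math

n = 6
pmf = [math.comb(n, x) * 0.5 ** n for x in range(n + 1)]
atoms = [sum(pmf[x:]) for x in range(n + 1)]  # attainable levels P_j = P{X >= j}

def cdf_L(l):
    """P{L <= l} for the discrete attained level L = P{X >= X_obs}."""
    return sum(pmf[x] for x in range(n + 1) if atoms[x] <= l)

# At each atom the cdf equals the atom itself; between atoms it stays flat,
# so P{L <= l} <= l everywhere: L is stochastically larger than Uniform(0,1),
# hence -2 ln L is stochastically smaller than -2 ln U.
ordering_holds = all(cdf_L(i / 200) <= i / 200 + 1e-12 for i in range(1, 201))
```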
CHAPTER III
THE COMBINATION OF BINOMIAL EXPERIMENTS

3.1 Introduction

Chapter III deals with the combination of binomial experiments. That is, suppose k binomial experiments are performed. Let n1, n2,...,nk and X(1), X(2),...,X(k) be the sizes and the observed numbers of successes, respectively, for the experiments. Denote the unknown success probabilities as p1, p2,...,pk. Suppose one wishes to test the overall null hypothesis H: p1 = p10, p2 = p20,..., pk = pk0 versus the alternative hypothesis HA: p1 ≥ p10,..., pk ≥ pk0 (with strict inequality for at least one pi). The problem, then, is to choose the best function of (X(1),...,X(k)) for this hypothesis test.

The results of Chapters I and II support T(F) as a non-parametric method with good overall power when there is no prior information concerning the unknown parameters. The method based on the minimum significance level, T(m), is sensitive to situations where exactly one of the individual hypotheses is rejected. That is, T(m) is powerful versus the alternative HC of Section 1.3.

The investigations of Koziol and Perlman [20] and Oosterhoff [33] show that the general non-parametric combining methods can be improved on for certain parametric combining problems. It follows that there may be combination methods based directly on (X(1),...,X(k)) that are superior to Fisher's omnibus procedure. Chapter III is a detailed comparison of T(F) and several parametric combination methods.
3.2 Parametric Combination Methods

As stated in Section 1.3, no method of combination is most powerful versus all possible alternative hypotheses. There are, however, certain restricted alternative hypotheses against which most powerful tests do exist.

Let the likelihood function for the ith binomial experiment be denoted by

L(pi) = (ni choose X(i)) pi^X(i) (1 - pi)^(ni - X(i)).   (3.1)

According to the Neyman-Pearson Lemma, if a most powerful test of the null hypothesis H: pi = pi0, all i, versus the alternative hypothesis HA: pi ≥ pi0 (with strict inequality for at least one i) exists, it is to reject H if

Π (i=1 to k) L(pi0)/L(pi) < C.

Upon substituting (3.1) and taking logs, an equivalent form of the test is to reject H if

Σ X(i) ln{pi(1 - pi0)/[pi0(1 - pi)]} > C.   (3.2)

It follows that rejecting H when

Σ X(i) > C   (3.3)

is most powerful if pi(1 - pi0)/[pi0(1 - pi)] is constant on i.
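The Neyman-Pearson statistic (3.2) weights each count by its log odds ratio; when those weights are equal it collapses to the unweighted sum (3.3). A minimal sketch of this reduction (illustrative; the function name is ours):

```python
import math

def np_combined_stat(xs, p0, p1):
    """Left side of (3.2): sum_i X(i) * ln{p1_i(1 - p0_i) / [p0_i(1 - p1_i)]},
    the most powerful combination against the simple alternative (p1_1,...,p1_k)."""
    return sum(x * math.log(pa * (1 - pn) / (pn * (1 - pa)))
               for x, pn, pa in zip(xs, p0, p1))

xs = [3, 7, 5]
# Equal log odds ratios across experiments: the statistic is sum(X(i))
# times a positive constant, so the test reduces to (3.3).
stat = np_combined_stat(xs, [0.5] * 3, [0.7] * 3)
```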
The problem of combining 2 x2 contingency tables is closely
related to the problem being considered. The purpose of each 2x2 table
can be interpreted as testing for the equality of success probabilities
between a standard treatment and an experimental treatment. The overall
null hypothesis is that the experimental and standard success probabil-
ities are equal in all experiments. The overall alternative hypothesis
is that the experimental success probability is superior in at least
one experiment.
Cochran [10] and Mantel and Haenszel [29] suggest the statistic

    Σ w_i d_i / (Σ w_i p̄_i q̄_i)^(1/2)    (3.4)

where

    w_i = n_i1 n_i2/(n_i1 + n_i2),    d_i = p̂_i1 - p̂_i2,

for combining 2×2 tables. Mantel and Pasternack [34] have discussed
this statistic in the context of combining binomial experiments. Each
individual binomial experiment is similar to an experiment resulting
in a 2×2 table, two cells of which are empty because the control
success probability is considered known and need not be estimated from
the data. The statistic defined by (3.4) will be denoted by T^(CMH).
It can easily be shown that the test T^(CMH) > c is equivalent to the
test Σ X^(i) > C; thus T^(CMH) is the most powerful test when
p_i(1 - p_i0)/[p_i0(1 - p_i)] is constant in i.
In many practical combination problems with binomial experiments,
p_i0 = 1/2 for all i. The null hypothesis is then H: p_i = 1/2, i = 1,2,...,k,
and the general alternative hypothesis is H_A: p_i >= 1/2 (strict inequality
for at least one i). This is the hypothesis testing problem under
consideration throughout the remainder of Chapter III. For p_i0 = 1/2,
all i, T^(CMH) is uniformly most powerful for testing H: p_i = 1/2, all i,
versus H_B: p_1 = p_2 = p_3 = ... = p_k > 1/2.
For the hypothesis test just described, T^(CMH) can be written

    T^(CMH) = Σ_{i=1}^k (X^(i) - n_i/2) / (Σ_{i=1}^k n_i/4)^(1/2).    (3.5)

This variate is asymptotically standard normal. It is of note that
this form is standardized by a pooled estimate of the standard deviation.
An alternative statistic can be formed by standardizing each X^(i),
yielding

    T^(X) = k^(-1/2) Σ_{i=1}^k (X^(i) - n_i/2) / (n_i/4)^(1/2),

which also has an asymptotic standard normal distribution. The statistic
T^(X) is analogous to the sum of chi's procedure which has been recommended
for combining 2×2 tables. The statistic T^(X) is not in general equivalent
to T^(CMH); in fact, the test T^(X) > c is equivalent to the test
Σ_{i=1}^k n_i^(-1/2) X^(i) > c'. When the n_i are all equal, T^(X) and
T^(CMH) are equivalent.
Weighted sums of the X^(i), say S(g) = Σ g_i X^(i), form a general
class of statistics. Oosterhoff [32] considers this class and makes the
following observations concerning their relationship with the individual
sample sizes. It follows from (3.2) that if ln(p_i/(1 - p_i)) = a g_i,
then the most powerful test of H versus H_A is Σ g_i X^(i) > c.
Let p_i = 1/2 + ε_i. It follows that

    ln(p_i/(1 - p_i)) = ln((1 + 2ε_i)/(1 - 2ε_i))
                      = 2{2ε_i + (2ε_i)³/3 + (2ε_i)⁵/5 + ...}
                      = 4ε_i + O(ε_i³) as ε_i → 0.
This implies that for alternatives close to the null hypothesis H,
S(g) is most powerful if ε_i = ε g_i; that is, if the deviations from the
null values of the p_i are proportional to the respective g_i. The sum
of chi's procedure, T^(X), is a special case where g_i = n_i^(-1/2). It
follows that the set of alternatives against which T^(X) is powerful is
strongly related to the sample sizes n_1, n_2, ..., n_k.
The weighted sum, S(g), may be a viable statistic if prior
information concerning the p. is available. Under the null hypothesis,
S(g) is a linear combination of binomial random variables, each with
success probability 1/2. The null distribution of S(g) will therefore
be asymptotically normal. The proper normalization of S(g) is analogous
to that of T^(CMH) given in (3.5).
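Under H, S(g) has mean (1/2)Σ g_i n_i and variance (1/4)Σ g_i² n_i, so a normalization parallel to (3.5) can be sketched as follows (the helper name is ours):

```python
import math

def s_g_standardized(x, n, g):
    """Standardize S(g) = sum g_i X^(i) by its null mean and standard deviation."""
    s = sum(gi * xi for gi, xi in zip(g, x))
    mean = 0.5 * sum(gi * ni for gi, ni in zip(g, n))                # E S(g) under H
    sd = 0.5 * math.sqrt(sum(gi * gi * ni for gi, ni in zip(g, n)))  # sd of S(g) under H
    return (s - mean) / sd
```

With g_i = 1 for all i this reduces to the standardization in (3.5); with g_i = n_i^(-1/2) it reduces to the sum of chi's form.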
A well-known generalization of the likelihood ratio test is to
reject the null hypothesis for large values of
-2 ln{sup_{θ∈Θ_0} L(θ,X)/sup_{θ∈Θ} L(θ,X)}.
It is easily shown that for the hypothesis test being considered, the
likelihood ratio statistic is

    T^(LR) = 2 Σ_{i=1}^k δ^(i) n_i {(X^(i)/n_i) ln(2 X^(i)/n_i)
                 + (1 - X^(i)/n_i) ln(2(1 - X^(i)/n_i))}

where

    δ^(i) = 1 if X^(i)/n_i >= 1/2,
    δ^(i) = 0 if X^(i)/n_i < 1/2.

Under broad conditions, which are satisfied in this instance, the
statistic T^(LR) has an asymptotic chi-square distribution.
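The statistic above can be computed directly from the counts; a minimal sketch (function name ours, with a guard for the boundary case X^(i) = n_i, where the second log term vanishes):

```python
import math

def t_lr(x, n):
    """-2 ln(likelihood ratio) for H: p_i = 1/2 versus p_i >= 1/2."""
    total = 0.0
    for xi, ni in zip(x, n):
        phat = xi / ni
        if phat <= 0.5:        # delta^(i) = 0 (and the phat = 1/2 term is zero anyway)
            continue
        term = xi * math.log(2 * phat)
        if xi < ni:            # second log term vanishes when phat = 1
            term += (ni - xi) * math.log(2 * (1 - phat))
        total += 2 * term
    return total
```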
Suppose z_i, i = 1,2,...,k, are normal random variables with
means μ_i and variance 1. The likelihood ratio test for H: μ_i = 0,
i = 1,...,k, versus H_1: μ_i >= 0 (with strict inequality for at least one i)
is to reject H for large values of

    Σ_{i=1}^k z_i² I{z_i > 0}.    (3.6)

For the binomial problem, an "approximate likelihood ratio" test is then
to reject for large values of

    T^(ALR) = Σ_{i=1}^k [(X^(i) - n_i/2)² / (n_i/4)] I{X^(i) > n_i/2}
i=l
1 1 1
since (X n )/( n. ) is asymptotically a standard normal random var-
iable under H. The exact null distribution of (3.6) is easily derived.
Critical values are tabled in Oosterhoff. When p = 1/2, the normal
approximation to the binomial is considered satisfactory for even
fairly small sample sizes. It follows that the exact null distribution
of (3.6) should serve as an adequate approximation to the null distribu-
tion of T(R)
3.3 Exact Slopes of Parametric Methods
In this section, the exact slopes of T^(F), T^(CMH), and T^(LR)
are compared. We have not been successful in deriving the exact slope
for T^(X). A more complete comparison of methods is given in Section 3.4
with respect to approximate slopes.
Suppose X^(i) is a binomial random variable based on n_i observations
with unknown success probability p_i. Consider testing the single
null hypothesis H: p_i = 1/2 versus the single alternative hypothesis
H_A: p_i > 1/2.

Proposition 4. Let T^(i) = X^(i)/n_i^(1/2). The exact slope of T^(i) is

    c_i(θ) = 2{p_i ln 2p_i + (1 - p_i) ln 2(1 - p_i)}.
Theorem 1 is used to prove Proposition 4. There are several
means by which the function f(t) of Part 2 of Theorem 1 can be obtained.
Perhaps the most straightforward way is by using Chernoff's Theorem [1].
Bahadur, in fact, suggests that the derivation of f(t) provides a good
exercise in the application of Chernoff's Theorem.
Theorem 3 (Chernoff's Theorem). Let y be a real-valued random
variable, and let φ(t) = E(e^{ty}) be the moment generating function of y.
Then 0 < φ(t) <= ∞ for each t and φ(0) = 1. Let ρ = inf{φ(t): t >= 0}.
Let y_1, y_2, ... denote a sequence of independent replicates of y, and
for n = 1,2,..., let P_n = P{y_1 + ... + y_n >= 0}. Then

    (1/n) ln P_n → ln ρ as n → ∞.
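Chernoff's Theorem can be checked numerically in the Bernoulli(1/2) case used below: with y - a in place of y, ln ρ has the closed form derived in the proof of Proposition 4, and the exact binomial tail approaches it as n grows (illustration ours; the tail point a = 0.7 is arbitrary):

```python
import math

a = 0.7  # an arbitrary tail point with a > 1/2

# For y ~ Bernoulli(1/2), phi(t) = E exp(t(y - a)) = e^{-at}(1/2)(1 + e^t),
# minimized at t = ln(a/(1-a)), giving the closed form of (3.7):
log_rho = -a * math.log(a / (1 - a)) + math.log(0.5 * (1 + a / (1 - a)))

def log_tail(n):
    """Exact (1/n) ln P{Bin(n, 1/2) >= n a}, from binomial coefficients."""
    c = math.ceil(n * a)
    p = sum(math.comb(n, x) for x in range(c, n + 1)) / 2 ** n
    return math.log(p) / n
```

The convergence is slow (the error is of order (ln n)/n), but the agreement is already visible at moderate n.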
Proof of Proposition 4. For Part 1 of Theorem 1,

    T^(i)/n_i^(1/2) = X^(i)/n_i → p_i

with probability one [θ], giving b(θ) = p_i. For the binomial problem,
θ = (p_1, p_2, ..., p_k). Now, as n_i tends to infinity,

    lim -(1/n_i) ln[1 - F_{n_i}(n_i^(1/2) a)]
        = lim -(1/n_i) ln P{X^(i)/n_i > a}
        = lim -(1/n_i) ln P{X^(i) - n_i a > 0}.

The random variable X^(i) - n_i a can be expressed as

    (y_1 - a) + (y_2 - a) + ... + (y_{n_i} - a),

where the y_j are independent replicates of a Bernoulli random variable
y with parameter 1/2. Therefore, φ(t) of Chernoff's Theorem, applied to
y - a, is

    φ(t) = e^{-at} (1/2)(1 + e^t).

It follows that

    ρ = inf{(1/2) e^{-at} (1 + e^t): t >= 0}.

The quantity e^{-at}(1 + e^t) is minimized for

    t = ln(a/(1 - a)).

Thus,

    ρ = (1/2) e^{-a ln(a/(1-a))} (1 + e^{ln(a/(1-a))})

and

    ln ρ = -a ln(a/(1-a)) + ln((1/2)(1 + a/(1-a))).    (3.7)

Hence,

    lim -(1/n_i) ln P{X^(i) > n_i a}
        = a ln(a/(1-a)) - ln((1/2)(1 + a/(1-a)))    (3.8)

giving f(a) of Part 2 of Theorem 1. Thus,

    c_i(θ) = 2{p_i ln(p_i/(1-p_i)) - ln((1/2)(1 + p_i/(1-p_i)))}
           = 2{p_i ln 2p_i + (1 - p_i) ln 2(1 - p_i)}.
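The closed form for c_i(θ) can be cross-checked against a direct numerical minimization of the Chernoff function (grid search; function names ours):

```python
import math

def exact_slope(p):
    """c_i(theta) = 2{p ln 2p + (1-p) ln 2(1-p)} for 1/2 <= p < 1."""
    return 2 * (p * math.log(2 * p) + (1 - p) * math.log(2 * (1 - p)))

def f_numeric(a):
    """f(a) = -ln inf_{t>=0} (1/2) e^{-at}(1 + e^t), by grid search over t in [0, 20)."""
    best = min(0.5 * math.exp(-a * t) * (1 + math.exp(t))
               for t in (i * 1e-4 for i in range(200000)))
    return -math.log(best)
```

Since c_i(θ) = 2 f(b(θ)) with b(θ) = p_i, the two agree to grid accuracy; c_i vanishes at the null value p = 1/2 and increases with p.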
Following the notation of Section 2.2, suppose k binomial
experiments are to be combined and the sample sizes n_1, n_2, ..., n_k
satisfy n_1 + n_2 + ... + n_k = nk and

    lim n_i/n = λ_i, i = 1,...,k.

Then λ_1 + ... + λ_k = k and

    -(2/n) ln L^(i) → λ_i c_i(θ) as n → ∞,

L^(i) denoting the significance level attained by the i-th statistic.
According to Proposition 3, c_F(θ) = Σ λ_i c_i(θ) in both the continuous and
discrete case if lim -(1/n_i) ln[1 - F_{n_i}(n_i^(1/2) t)] exists. The
existence of this limit for a single binomial experiment is shown in (3.8)
of the proof of Proposition 4. Therefore, for the binomial combination
problem, the exact slope for Fisher's method, T^(F), is

    c_F(θ) = 2 Σ_{i=1}^k λ_i {p_i ln 2p_i + (1 - p_i) ln 2(1 - p_i)}.
A property of likelihood ratio test statistics is that they
achieve the maximum possible exact slope [2]. Theorem 2 states that
the exact slope for the combination problem is bounded above by
Σ λ_i c_i(θ). Proposition 3 shows that T^(F) achieves this bound. It
follows that T^(F) and the likelihood ratio test have the same exact
slope; that is,

    c_F(θ) = c_LR(θ).

This relationship is true regardless of whether the data are discrete
or continuous.
Let T^(CMH) = (1/(k n^(1/2))) Σ X^(i). This form of the Cochran-Mantel-
Haenszel statistic is equivalent to those previously given in (3.3)
and (3.5).

Proposition 5. The exact slope of T^(CMH) is

    c_CMH(θ) = 2k{p̄ ln 2p̄ + (1-p̄) ln 2(1-p̄)}, where p̄ = (1/k) Σ λ_i p_i.
Proof. To get b(θ) of Part 1 of Theorem 1,

    T^(CMH)/n^(1/2) = Σ X^(i)/(nk)
                    = (1/k) Σ (n_i/n)(X^(i)/n_i)
                    → (1/k) Σ λ_i p_i = p̄

with probability one [θ]. Now, for Part 2 of Theorem 1, as n tends to
infinity,

    lim -(1/n) ln[1 - F_n(n^(1/2) a)]
        = lim -(1/n) ln P{(1/(k n^(1/2))) Σ X^(i) > n^(1/2) a}
        = lim -(1/n) ln P{Σ X^(i) > nka} = f(a).    (3.9)

Under the null hypothesis, Σ X^(i) is a binomial random variable based
on n_1 + ... + n_k = nk trials with success probability 1/2. The quantity
(3.9) is the same as the quantity (3.7) except that n_i has been replaced
by nk. Theorem 3 can be directly applied to line (3.9), yielding

    lim -(1/n) ln[1 - F_n(n^(1/2) a)]
        = k{a ln(a/(1-a)) - ln((1/2)(1 + a/(1-a)))} = f(a)

and therefore

    c_CMH(θ) = 2 f(b(θ))
             = 2k{p̄ ln(p̄/(1-p̄)) - ln((1/2)(1 + p̄/(1-p̄)))}
             = 2k{p̄ ln 2p̄ + (1-p̄) ln 2(1-p̄)}.
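The two exact slopes can now be compared directly. Writing h(p) = 2{p ln 2p + (1-p) ln 2(1-p)}, we have c_F(θ) = Σ λ_i h(p_i) and c_CMH(θ) = k h(p̄); since h is convex, Jensen's inequality gives c_F >= c_CMH, with equality when all p_i are equal. A small sketch (helper names ours):

```python
import math

def h(p):
    """2{p ln 2p + (1-p) ln 2(1-p)}; h(1/2) = 0 and h is convex."""
    return 2 * (p * math.log(2 * p) + (1 - p) * math.log(2 * (1 - p)))

def slope_fisher(ps, lams):
    """c_F(theta) = sum lambda_i h(p_i)."""
    return sum(l * h(p) for p, l in zip(ps, lams))

def slope_cmh(ps, lams):
    """c_CMH(theta) = k h(p-bar), with p-bar = (1/k) sum lambda_i p_i."""
    k = len(ps)
    pbar = sum(l * p for p, l in zip(ps, lams)) / k
    return k * h(pbar)
```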
A comparison of T^(CMH) with T^(F) and T^(LR) with respect
to exact slopes is given in the next section.
Derivation of the exact slope of the sum of chi's procedure,
T^(X), has not been accomplished. An incomplete approach to the problem
follows.

Let T^(X) = Σ n_i^(-1/2) X^(i). To derive b(θ) of Part 1 of Theorem 1,

    T^(X)/n^(1/2) = Σ (n_i/n)^(1/2) (X^(i)/n_i) → Σ λ_i^(1/2) p_i as n → ∞

with probability one [θ]. Now, as n tends to infinity,

    lim -(1/n) ln[1 - F_n(n^(1/2) a)]
        = lim -(1/n) ln P{n_1^(-1/2) X^(1) + ... + n_k^(-1/2) X^(k) >= n^(1/2) a}.

The left-hand side of the above probability statement is a weighted sum
of independent binomial random variables based on varying sample sizes
n_1, n_2, ..., n_k, each with success probability 1/2. The moment generating
function of this random variable is therefore

    ((1/2) + (1/2) e^{n_1^(-1/2) t})^{n_1} ··· ((1/2) + (1/2) e^{n_k^(-1/2) t})^{n_k}.    (3.10)

From the form of the moment generating function given in (3.10), it is
apparent that the random variable in question can be regarded as a sum of
n independent identically distributed variates Ỹ_1, ..., Ỹ_n, each with
moment generating function

    ((1/2) + (1/2) e^{n_1^(-1/2) t})^{n_1/n} ··· ((1/2) + (1/2) e^{n_k^(-1/2) t})^{n_k/n}.

Then, since as n tends to infinity

    lim -(1/n) ln P{n_1^(-1/2) X^(1) + ... + n_k^(-1/2) X^(k) >= n^(1/2) a}
        = lim -(1/n) ln P{Σ_{j=1}^n (Ỹ_j - a) >= 0},

φ(t) of Theorem 3 is

    φ(t) = e^{-at} [((1/2) + (1/2) e^{n_1^(-1/2) t})^{n_1/n} ··· ((1/2) + (1/2) e^{n_k^(-1/2) t})^{n_k/n}]

and ρ = inf{φ(t): t >= 0}. The quantity ρ has not been found.
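Although ρ resists a closed form, it can be explored numerically for any fixed set of sample sizes. The sketch below (names ours; the finite n_i stand in for the limiting regime, and a is an arbitrary value in the interior range where the infimum is attained at a finite t) simply grid-searches φ(t):

```python
import math

def phi(t, a, n_sizes):
    """phi(t) = e^{-a t} * prod_i ((1/2) + (1/2) e^{t/sqrt(n_i)})^(n_i/n)."""
    n = sum(n_sizes)
    prod = 1.0
    for ni in n_sizes:
        prod *= (0.5 + 0.5 * math.exp(t / math.sqrt(ni))) ** (ni / n)
    return math.exp(-a * t) * prod

def rho_numeric(a, n_sizes, t_max=50.0, steps=100000):
    """Grid-search approximation to rho = inf{phi(t): t >= 0}."""
    return min(phi(i * t_max / steps, a, n_sizes) for i in range(steps + 1))
```

Since φ(t) is pointwise decreasing in a for t >= 0, the numeric ρ decreases as a moves further into the tail, as one would expect of a large-deviation quantity.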
3.4 Approximate Slopes of Parametric Methods
Exact slopes are defined in Section 2.1. In Section 3.3 some
comparisons among methods are made with respect to exact slopes and
corresponding efficiencies. Bahadur also defines a quantity called the
approximate slope [3]. Suppose that X_n has an asymptotic null
distribution F; that is,

    lim_{n→∞} F_n(x) = F(x)

for all x. For each n, let

    L_n^(a) = 1 - F(X_n)

be the approximate level attained. (Consistent with Bahadur's notation,
the superscript a stands for approximate.) If there exists a c^(a)(θ)
such that

    -(2/n) ln L_n^(a) → c^(a)(θ)

with probability one [θ], then c^(a)(θ) is called the approximate slope
of {X_n}.
If c_i^(a)(θ) is the approximate slope of a sequence {X_n^(i)},
i = 1,2, then c_1^(a)(θ)/c_2^(a)(θ) is known as the approximate asymptotic
efficiency of {X_n^(1)} relative to {X_n^(2)}.
A result similar to Theorem 1 is given by Bahadur [3] for the
calculation of approximate slopes. Suppose that there exists a
function b(θ), 0 < b(θ) < ∞, such that

    T_n/n^(1/2) → b(θ)

with probability one [θ]. Suppose that for some a, 0 < a < ∞, the
limiting null distribution F satisfies

    ln[1 - F(t)] ~ -(1/2) a t² as t → ∞.

Then the approximate slope is c^(a)(θ) = a[b(θ)]². This result is
applicable with a = 1 for statistics with asymptotic standard normal
distributions [3]. This result can be shown directly by applying the
result of Killeen et al. given in (2.1).
The approximate slope, c^(a)(θ), and the exact slope, c(θ),
of a sequence of test statistics are guaranteed to be in agreement only
for alternative hypotheses close to the null hypothesis. Otherwise,
they may result in very different quantities. One notable exception is
the likelihood ratio statistic. When the asymptotic null distribution
is taken to be the chi-square distribution from the well-known
-2 ln(likelihood ratio) approximation, the approximate slope of the
likelihood ratio statistic is the same as the exact slope. The approximate
slope is based upon the asymptotic distribution of the statistic.
Equivalent test statistics may have different asymptotic null distributions,
giving rise to different approximate slopes. This apparent shortcoming
does not exist with exact slopes.
In typical applied situations, the significance levels attained
by T^(CMH) and T^(X) will be ascertained by appealing to their asymptotic
normal distributions. Similarly, T^(LR) will be compared to the appropriate
chi-square distribution, and approximate levels for T^(ALR) will be
obtained from the asymptotic distribution given in Section 3.2. Approximate
slopes based upon these asymptotic distributions would therefore
seem to afford a more appropriate comparison of the methods. In other
words, it is appealing to consider the null distribution that will be
used to obtain significance levels in practice when comparing the
statistics. The only statistic which will not usually be compared to
an asymptotic distribution is perhaps T^(CMH). The null distribution of
T^(CMH) is binomial, based on n_1 + ... + n_k trials with success
probability 1/2. However, even with the availability of extensive binomial
tables, T^(CMH) will often be standardized as in (3.5) and compared to
standard normal tables, since the normal approximation to the binomial
when p = 1/2 is satisfactory even for fairly small sample sizes.
The asymptotic null distribution of T^(F) in the discrete case
is easily shown to be chi-square with 2k degrees of freedom. This is
also the exact distribution of T^(F) in the continuous case. It follows
that the approximate slope in the discrete case is the same as the
exact slope in the continuous case. In summary,

    c_LR^(a)(θ) = c_F^(a)(θ) = c_LR(θ) = c_F(θ)
        = 2 Σ_{i=1}^k λ_i {p_i ln 2p_i + (1-p_i) ln 2(1-p_i)}.
In order to derive the approximate slopes for T^(CMH) and T^(X),
consider the linear combination Σ n_i^α X^(i), of which T^(CMH) and T^(X)
are special cases. The variate X^(i) has an asymptotic normal distribution
with mean (1/2)n_i and variance (1/4)n_i under the null hypothesis. It
follows directly that

    T^(α) = (Σ n_i^α X^(i) - (1/2) Σ n_i^(α+1)) / ((1/2)(Σ n_i^(2α+1))^(1/2))

is asymptotically standard normal.

Proposition 6. The approximate slope of T^(α) is

    c_α^(a)(θ) = [Σ λ_i^(α+1) (2p_i - 1)]² / Σ λ_i^(2α+1).

Proof. First, to get b(θ),

    T^(α)/n^(1/2) = (Σ n_i^α X^(i) - (1/2) Σ n_i^(α+1)) / ((1/2) n^(1/2) (Σ n_i^(2α+1))^(1/2))
                  → [Σ λ_i^(α+1) (p_i - 1/2)] / ((1/2)(Σ λ_i^(2α+1))^(1/2))
                  = [Σ λ_i^(α+1) (2p_i - 1)] / (Σ λ_i^(2α+1))^(1/2) as n → ∞

with probability one [θ]. Now, since T^(α) is asymptotically standard
normal,

    c_α^(a)(θ) = [b(θ)]² = [Σ λ_i^(α+1) (2p_i - 1)]² / Σ λ_i^(2α+1).

Letting α = 0 yields the approximate slope of T^(CMH),

    c_CMH^(a)(θ) = [Σ λ_i (2p_i - 1)]²/k.

Letting α = -1/2 yields the approximate slope of T^(X),

    c_X^(a)(θ) = [Σ λ_i^(1/2) (2p_i - 1)]²/k.
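Proposition 6 gives the whole family of slopes at once; a small sketch (function name ours) evaluates the slope as a function of α and recovers the two special cases:

```python
def approx_slope(alpha, ps, lams):
    """[sum lambda_i^(alpha+1) (2p_i - 1)]^2 / sum lambda_i^(2 alpha + 1)."""
    num = sum(l ** (alpha + 1) * (2 * p - 1) for p, l in zip(ps, lams)) ** 2
    den = sum(l ** (2 * alpha + 1) for l in lams)
    return num / den
```

With equal λ_i the α = 0 and α = -1/2 cases coincide (T^(CMH) and T^(X) are then equivalent); with unequal λ_i and the larger p_i attached to the smaller λ_i, the α = -1/2 slope is the larger one.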
By inspection of the above approximate slopes, it is apparent
that T^(CMH) is more efficient when the p_i are proportional to the λ_i
(the relative sample sizes) and T^(X) is more efficient when the p_i are
inversely related to the λ_i. The boundary of the parameter space where
c_CMH^(a)(θ) = c_X^(a)(θ) is not p_1 = p_2 = ... = p_k, however. The
statistic T^(CMH) is more efficient than T^(X) in more than half of the
parameter space. As a further comparison, e^(a)(T^(CMH), T^(X)), the
approximate efficiency of T^(CMH) with respect to T^(X), can be integrated
over the parameter space. The result is greater than one, which again
supports use of T^(CMH). It should be noted, however, that when the p_i
are proportional to the λ_i both tests have high efficiencies relative
to when the p_i are inversely related to the λ_i. Therefore T^(X) is
more efficient in a region of the parameter space where both tests have
relatively low efficiency. This is a good property for T^(X).
An "approximate" likelihood ratio test is introduced in
Section 3.2. A statistic which is equivalent to the form given in
Section 3.2 is

    T^(ALR) = (Σ_{i=1}^k [(X^(i) - (1/2)n_i)² / ((1/4)n_i)] I{X^(i) > (1/2)n_i})^(1/2).

Proposition 7. The approximate slope of T^(ALR) is

    c_ALR^(a)(θ) = Σ λ_i (2p_i - 1)².

Proof. To find b(θ) of Part 1 of Theorem 1,

    T^(ALR)/n^(1/2) = ((1/n) Σ [(X^(i) - (1/2)n_i)² / ((1/4)n_i)] I{X^(i) > (1/2)n_i})^(1/2)
                    → {4 Σ λ_i (p_i - 1/2)²}^(1/2) as n → ∞

with probability one [θ]. Thus,

    b(θ) = {4 Σ λ_i (p_i - 1/2)²}^(1/2) = {Σ λ_i (2p_i - 1)²}^(1/2).
To find f(t) of Part 2 of Theorem 1, the asymptotic null distribution
of T^(ALR) is required. According to Oosterhoff [33],

    P{Σ_{j=1}^k z_j² I{z_j > 0} >= s} = 2^(-k) Σ_{j=1}^k (k choose j) P{χ²_j >= s},

where the z_j are independent standard normal random variables. Since,
under the null hypothesis,

    (X^(i) - (1/2)n_i)/((1/4)n_i)^(1/2) → z_i

in distribution, it follows that

    P{T^(ALR) > s} = P{(T^(ALR))² > s²}
        → 2^(-k) Σ_{j=1}^k (k choose j) P{χ²_j > s²}    (3.11)

as n → ∞ for all s. It follows that the associated density function is
a linear combination of chi-square densities. The result of Killeen et al.
can be applied to verify that

    ln[1 - F(t)] ~ -(1/2) t² as t → ∞,

where F is the asymptotic null distribution of T^(ALR). Hence,

    c_ALR^(a)(θ) = [b(θ)]² = Σ λ_i (2p_i - 1)².
Before proceeding to a further comparison of approximate slopes, the
slopes are summarized in the following listing.

                                               Approximate Slope
    Fisher's (T^(F))                           2 Σ λ_i {p_i ln 2p_i + (1-p_i) ln 2(1-p_i)}
    Likelihood Ratio (T^(LR))                  2 Σ λ_i {p_i ln 2p_i + (1-p_i) ln 2(1-p_i)}
    "Approximate Likelihood Ratio" (T^(ALR))   Σ λ_i (2p_i - 1)²
    Sum of Chi's (T^(X))                       (1/k)[Σ λ_i^(1/2) (2p_i - 1)]²
    Cochran-Mantel-Haenszel (T^(CMH))          (1/k)[Σ λ_i (2p_i - 1)]²

Letting A_i = λ_i^(1/2)(2p_i - 1), it is easy to see that
c_ALR^(a)(θ) >= c_X^(a)(θ) since

    Σ_{i=1}^k A_i² >= (1/k)[Σ_{i=1}^k A_i]².

It is also true that c_ALR^(a)(θ) >= c_CMH^(a)(θ). Let B_i = (2p_i - 1).
It can easily be shown that

    Σ λ_i B_i² - (1/k)[Σ λ_i B_i]² = (1/k) Σ_{i<j} λ_i λ_j (B_i - B_j)²
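This identity (which uses Σ λ_i = k, as assumed throughout) can be verified numerically; since the right-hand side is a sum of non-negative terms, it also delivers c_ALR^(a)(θ) >= c_CMH^(a)(θ) directly (function names ours):

```python
def lhs(lams, bs):
    """sum lambda_i B_i^2 - (1/k)[sum lambda_i B_i]^2, with sum lambda_i = k."""
    k = len(lams)
    return (sum(l * b * b for l, b in zip(lams, bs))
            - sum(l * b for l, b in zip(lams, bs)) ** 2 / k)

def rhs(lams, bs):
    """(1/k) sum_{i<j} lambda_i lambda_j (B_i - B_j)^2."""
    k = len(lams)
    return sum(lams[i] * lams[j] * (bs[i] - bs[j]) ** 2
               for i in range(k) for j in range(i + 1, k)) / k
```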

Full Text |

xml version 1.0 encoding UTF-8
REPORT xmlns http:www.fcla.edudlsmddaitss xmlns:xsi http:www.w3.org2001XMLSchema-instance xsi:schemaLocation http:www.fcla.edudlsmddaitssdaitssReport.xsd INGEST IEID EO9W3CVIT_MXC3YO INGEST_TIME 2017-07-17T20:25:47Z PACKAGE UF00098279_00001 AGREEMENT_INFO ACCOUNT UF PROJECT UFDC FILES PAGE 1 A COMPARISON OF METHODS FOR COMBINING TESTS OF SIGNIFICANCE BY WILLIAM C. LOUV A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIRE^fENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 1979 PAGE 2 Digitized by the Internet Archive XO JILL in 2009 with funaing from University of Florida, George A. Smathers Libraries http://www.archive.org/details/comparisonofmethOOIouv PAGE 3 ACKNOWLEDGMENTS I am indebted to Dr. Ramon C. Littell for his guidance and encouragement, without which this dissertation would not have been completed. I also wish to thank Dr. John G. Saw for his careful proofreading and many helpful suggestions. The assistance of Dr. Dennis D. Wackerly throughout my course of graduate study is greatly appreciated. My special thanks go to Dr. William Mendenhall who gave me the opportunity to come to the University of Florida and who encouraged me to pursue the degree of Doctor of Philosophy. Ill PAGE 4 TABLE OF CONTENTS Page ACKNOWLEDGMENTS iii ABSTRACT vi CHAPTER I INTRODUCTION AND LITERATURE REVIEW 1 1.1 Statement of the Combination Problem 1 1.2 Non-Pararaetric Combination Methods 2 1.3 A Comparison of Non-Parametric Methods 5 1.4 Parametric Combination Methods 8 1.5 Weighted Methods of Combination 11 1.6 The Combination of Dependent Tests 12 1.7 The Combination of Tests Based on Discrete Data ... 13 1.8 A Preview of Chapters II, III, and IV 18 II BAHADUR EFFICIENCIES OF GENERAL COMBINATION METHODS .... 19 2.1 The Notion of Bahadur Efficiency 19 2.2 The Exact Slopes for T^'^^ and T^^"* 21 2.3 Further Results on Bahadur Efficiencies 26 (F) 2. 
A Optimality of T in tlie Discrete Data Case 28 III THE COMBINATION OF BINOMIAL EXPERIMENTS 32 3.1 Introduction 32 3.2 Parametric Combination Methods 33 3.3 Exact Slopes of Parametric Methods 37 3.4 Approximate Slopes of Parametric Methods 44 3.5 Powers of Combination Methods 54 3.6 A Synthesis of Comparisons 57 (F) 3.7 Approximnt ion of the Null Distributions of T , T^'''^\ T^'^'-'^) 79 IV APPLfCATIONS AND FUTURE RESEARCH 96 4.1 Introduction 96 4.2 Estimation: Confidence Regions Based on Non-parametric Combination Methods 96 iv PAGE 5 TABLE OF CONTENTS (Continued) CHAPTER IV (Continued) Page 4.3 The Combination of 2 x 2 Tables 110 4.4 Testing for the Heterogeneity of Variances 113 4.5 Testing for the Difference of Means with Incomplete Data 115 4.6 Asymptotic Efficiencies for k^Â°Â° 115 BIBLIOGRAPHY ng BIOGRAPHICAL SKETCH 122 PAGE 6 Abstract of Dissertation Presented to the Graduate Council of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy A COMPARISON OF METHODS FOR COMBINING TESTS OF SIGNIFICANCE By William C. Louv August 1979 Chairman: Ramon C. Littell Major Department: Statistics Given test statistics X ,...,X for testing the null hypotheses H ,...,H , respectively, the combining problem is to select a function of X ,...,X to be used as an overall test of the hypothesis H = H "^ H '^ Â•Â•Â• 1^ ^Ik. Â• Functions based on the probability integral transformation, that is, the significance levels attained by X ,...,X , form a class of non-parametric combining methods. These methods are compared in a general setting with respect to Bahadur asymptotic relative efficiency. It is concluded that Fisher's omnibus method is at least as efficient as all other methods whether X ,...,X arise from continuous or discrete distributions. Given a specific parametric setting, it may be possible to improve upon the non-parametric methods. The problem of combining binomial experiments is studied in detail. 
Parametric methods analogous to the sum of chi's procedure and the Cochran-Mantel-Haenszel procedure as well as the likelihood ratio test and an approximate likelihood ratio test are compared to Fisher's method. Comparisons are made with respect to Bahadur efficiency and with respect to exact power. The power vi PAGE 7 comparisons Cake tlie form of plots of contours of equal power. If prior information concerning the nature of the unknown binomial success probabilities is unavailable, Fisher's method is recommended. Other methods are preferred when specific assumptions can be made concerning the success probabilities. For instance, the Cochran-Mantel-Haenszel procedure is optimal when the success probabilities have a common value. Fisher's statistic has a chi-square distribution with 2k degrees of freedom when X ,...,X are continuous. In the discrete case, however, the exact distribution of Fisher's statistic is difficult to obtain. Several approximate methods are compared and Lancaster's mean chi-square approximation is recommended. The combining problem is also approached from the standpoint of estimation. Non-parametric methods are inverted to form k-diraensional confidence regions. Several examples for k = 2 are graphically displayed.. Vll PAGE 8 CHAPTER I INTRODUCTION AND LITERATURE REVIEW 1.1 Statement of the Combination Problem The problem of combining tests of significance has been studied by several writers over the past fifty years. The problem is: Given test statistics X ,...,X for testing null hypotheses H ,...,H , respectively, to select a function of X ,...,X to be used as the combined test of the hypothesis H = H, n H. n . . . n H, . In most of the 12 k work cited, the X are assumed to be mutually independent, and, except where stated otherwise, that is true in this paper. Some practical situations in which an experimenter may wish to combine tests are: i. 
The data from k separate experiments, each conducted to test the same H, yield the respective test statistics X ,...,X It is desired to pool the information from the separate experiments to form a combined test of H. It would be desirable to pool the information by combining the X if (a) only the X ^ , instead of the raw data, are available, if (b) the information from the i experiment is sufficiently contained in X , or if (c) a theoretically optimal test based on all the data is intractible. ii. The i of k experiments yields X to test a hypothesis H., i = 1 k, and a researcher wishes to simultaneously test PAGE 9 the truth of H ,...,11 . Considerations (a), (b) , and (c) in the preceding paragraph again lead to the desirability of combining X as a test of H = H, n ... n H, . 1 k iii. A simultaneous test of H = H n ... n H is desired, and the data from a single experiment yield X , . . . ,X as tests of H , . . . ,H , respectively. Combining the X can provide a test of H. In Section 1.2 several non-parametric methods of combination are introduced. A literature review of comparisons of these procedures is given in Section 1.3. The remainder of this chapter is primarily a literature review of more specific aspects of the combination problem. We make some minor extensions which are identified as such. 1 . 2 Non-parametric Combination Methods Suppose that H. is rejected for large values of X . Define L = 1 F (X J, where F. is the cumulative distribution function of X under H.. If X is a continuous random variable, then L is uniformly distributed on (0,1) under }) . . Many of the well-known 1 methods of combination may be expressed in terms of the L . 
Such methods considered here are: (1) T^^^= -2ZÂ£nL^^^ (Omnibus method, Fisher [13]) (2) T^^-* = -z$"^(l*-^^) (Normal transform, Liptak [26]) (3) T = -inin 1^ (Minimum significance level , Tippet t [42]) (4) T = -max L (Maximum significance level, Wilkinson [44]) .,.0') _ ,,Â„r, ,(i) (5) T = 2);Â£n(l L^ ^] (I'earson [36,]) (6) T^^^ = -EL*-"-^ (Edgington [12]). PAGE 10 As the statistics are defined here, H is rejected when large values are observed. Figure 1 (page 4) shows the rejection regions for the statistics defined above when k = 2. In the continuous case, the null distributions of these statistics are easily obtained. They are all based upon the fact that the L are uniformly distributed under H. It is easily established that this is true. The cumulative distribution function for L is P{L 11}= P{1 F(x) < i.) = 1 P{F(x) < 1 1} = 1 P{x < F~''^(l Â£)} = 1 f{f"-'-(i I)} = 1 (1 Â£) = Â£. (N) That T has a normal distribution with mean and variance k (m) (H) follows trivially. The statistics T and T are seen to be based on the order statistics of a uniform random variable on (0,1) and therefore distributed according to beta distributions. (F) (F) That T and T are distributed as chi-squares on 2k degrees of freedom is established as follows. The probability density function of L^i^ is Let S = -2J.nL. Then It follows that -S/2 dL -1 -S/2 ^ = ^ ' ^ = T^ Edgington's statistic, T , is a sum of uniform random variables. As shown by Edgington, significance levels can be established for PAGE 11 L 2) J(m) L (i) J(M) T (p) Figure 1. L^" I (I) J(A) Rejection Regions in Terms of the Significance Levels for k = 2. PAGE 12 1 c ^(A) values ot T on Lhe basis of the following equation [12]: (A)_._t^ ^^At::!)^ k(t^ u^^^ uu-s)"^ k! 4'' k! S^ k! ^Â•Â•Â•^^S^~k!~ ^^-1^ P{t(^) >-t} =f-(SlLli)_+ (k lt^2)__ k (t-3)% k (t-S) k! ^r k! 4'' k! 4'^ k! +'--+(c) Â— nwhere S is the smallest integer greater than t. 
1.3 A Comparison of Non-parametric Methods The general non-parametric methods of combination are rules prescribing that H should be rejected for certain values of A^^) t(2) ,(k)^ ^ , ^ L'> L ,..., L J. Several basic theoretical results for nonparametric methods of combination are due to Birnbaum [7], Some of these results are summarized in the following paragraphs. Under H^, L is distributed uniformly on (0,1) in the continuous case. When H^ is not true, L^^^ is distributed according to a non-increasing density function on (0,1), say g.(L^^^), if X^^^ has a distribution belonging to the exponential family. Some overall alternative hypotheses that may be considered are: H^: One or more of the L ^ 's have non-uniform densities H^: All of the L 's have the same non-uniform density g(L). H^: One of the L 's has a non-uniform density g.(L ^'*). '^\ ^^ ^^^ appropriate alternative hypothesis in most cases where prior knowledge of tlic alternative densities g . (l ) is unavailable (7J. The following condition is satisfied by all of the metliods introduced in Section 1.2. Condition 1: If H is rejected for any given set of L 's, then it will also be rejected for all sets of L^^^^'s such that L^^^* 2 L*"^'* for each i [7] . PAGE 13 It can be shown that the best test of H versus any particular alternative in H must satisfy Condition 1. It seems reasonable, therefore, that any method not satisfying Condition 1 can be eliminated from consideration [7]. In the present context. Condition 1 does little to restrict the class of methods from which to choose. In fact, "for each nonparametric method of combination satisfying Condition 1, we can find some alternative H represented by non-increasing functions g, (l J,...,g (l J against which that method of combination gives a best test of H" [7] . It should be noted that direct comparison of general combining methods with respect to power is difficult in typical contexts. The precise distributions of the g . 
(l ) under the alternative hypothesis are intractible except in very special cases. When the X have distributions belonging to the one-parameter exponential family, the overall null hypothesis can be written H: g(l) _ q(1) (k) _ (k) Â„ . ,. . u u A ~ 'Â•Â•Â•Â» " Rejection of H is based upon (X ,...,X J. It is reasonable to reject the use of inadmissible tests. A test is inadmissible if there exists another test which is at least as powerful for all alternatives and more powerful for at least one alternative. Birnbaum proves that a necessary condition for the admissibility of a test is convexity of the acceptance region In the (X ,...,X J hyperplane. For X with distributions in the (p) (M) exponential family, T and T do not have convex acceptance regions and are therefore inadmissible [7]. Although Birnbaum does not consider Edgington's method, we see (A) that it is clear that T must also be inadmissible. For instance. PAGE 14 for k=2, consider the points (0,c), (c,0), and (c/2,c/2) in the (l ,L } plane which fall on the boundary of the acceptance region T > c. The points in the (x^^\x^^^3 plane corresponding to (0,c) and (c,0) would fall on tlie axes at (and --) . The point corresponding to (c/2,c/2) certainly falls interior to the boundaries described by the points corresponding to (c,0) and (0,c). The acceptance region can not, therefore, be convex and hence T^'^^ is inadmissible. This argument is virtually the same as that used by Birnbaum to establish the inadmissibility of T and T . For a given inadmissible test it is not known how to find a particular test which dominates. Birnbaum, however, argues that the choice of which test to use should be restricted to admissible tests. The choice of a test from the class of admissible tests is then contingent upon which test has more power against alternatives of interest [7] In summary of Birnbaum's observations, since T^'''^ and T^^^ do not in general form convex acceptance regions in the (x , . . . 
,X^'^^) hyperplane, they are not in general admissible and can be eliminated as viable methods. We can extend Birnbaum's reasoning to reach the same conclusion about T . By inspecting the acceptance regions formed by the various methods, Birnbaum also observes than T ^^ is more sensitive to H^ (departure from H by exactly one parameter) that J^^K The test 1 , however, has better overall sensitivity to H (71. A Littell and Folks have carried out comparisons of general nonparametric methods with respect to exact Bahadur asymptotic relative efficiency. A detailed account of the notion of Bahadur efficiencies is deferred to Section 2.1. PAGE 15 In ti.eir first investigation [26], Littell and Folks compare M) ^(N) Â„(M) ^ ^(m) ^^ ' ' ^ ' ^ ' ^^Â° ^ ' The actual values of the efficiencies are given in Section 2.3. The authors show that T^''^ is superior to the other three procedures according to this criterion. They also observe that the relative efficiency of T^'"^ is consistent with Birnbaum's observation that T^'"^ performs well versus H . Further, Littell and Folks show that T^^\ with some restrictions on the parameter space, is optimal among all tests based on the X " as long as the X ^^ are optimal. This result is extended in a subsequent paper [28] by showing that T^^^ is at least as efficient as any other combination procedure. The only condition necessary for this extension is equivalent to Birnbaum's Condition 1. A formal statement of this result is given in Section 2.3. ^Â•^ Parametric Combination Methods The evidence thus far points strongly to T^^^ as the choice among general non-parametric combination procedures when prior knowledge of the alternative space is unavailable. When the distributions of the X belong to some parametric family, or when the alternative parameter space can be characterized, it may be possible that T^^^ and the other general non-parametric methods can be improved upon. A summary of such investigations follows. 
Oosterhoff [33] considers the combination of k normally distributed random variables with known variances and unknown means μ_1, μ_2, ..., μ_k. The null hypothesis tested is H: μ_1 = μ_2 = ... = μ_k = 0 versus the alternative H_A: μ_i ≥ 0, with strict inequality for at least one i. He observed that many combination problems reduce to this situation asymptotically. The difference in power between a particular test and the optimal test for a given (μ_1, ..., μ_k) is called the shortcoming. Oosterhoff proves that the shortcomings of T^(F) and the maximum likelihood test go to zero for all (μ_1, ..., μ_k) as the overall significance level tends to zero. The maximum shortcoming of the likelihood ratio test is shown to be smaller than the maximum shortcoming of T^(F). Oosterhoff also derives a most stringent Bayes test with respect to a least favorable prior. According to numerical comparisons (again with respect to shortcomings), the most stringent test performs similarly to the likelihood ratio test. The likelihood ratio test is much easier to implement than the most stringent test and is therefore preferable. Fisher's statistic, T^(F), is seen to be slightly more powerful than the likelihood ratio test when the means are similar; the opposite is true when the means are dissimilar. A simple summing of the normal variates performs better than all other methods when the means are very similar [33].

Koziol and Perlman [20] study the combination of chi-square variates X^(i) ~ χ²_{ν_i}(θ_i). The hypothesis test considered is H: θ_1 = ... = θ_k = 0 versus H_A: θ_i ≥ 0 (strict inequality for at least one i), where the θ_i are non-centrality parameters and the ν_i are the respective degrees of freedom. An earlier Monte Carlo study by Bhattacharya [6] also addressed this problem and compared the statistics T^(F), T^(m), and ΣX^(i). Bhattacharya concluded that ΣX^(i) and T^(F) were almost equally powerful and that both of these methods clearly dominated T^(m).
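A sketch may make the two statistics Bhattacharya compared concrete (the chi-square values and degrees of freedom below are hypothetical). Note that when every ν_i = 2 the two tests coincide, since then L_i = exp(−X^(i)/2) and −2 Σ ln L_i = Σ X^(i).

```python
import math

from scipy import stats

# Hypothetical independent chi-square statistics and their degrees of freedom.
x = [5.2, 7.1, 2.3]
nu = [2, 3, 2]

# Sum test: under the null, sum X^(i) is chi-square on sum nu_i df.
t_sum = sum(x)
p_sum = stats.chi2.sf(t_sum, sum(nu))

# Fisher's method applied to the individual attained levels L_i.
L = [stats.chi2.sf(xi, vi) for xi, vi in zip(x, nu)]
t_F = -2.0 * sum(math.log(l) for l in L)
p_F = stats.chi2.sf(t_F, 2 * len(x))
```

The coincidence at ν_i = 2 explains why the two procedures behave so similarly in Bhattacharya's study when the degrees of freedom are small.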
Koziol and Perlman endeavor to establish the power of T^(F) and ΣX^(i) in some absolute sense. To do this, they compare T^(F) and ΣX^(i) to Bayes procedures, since Bayes procedures are admissible and have good power in an absolute sense [20]. When the ν_i are equal, ΣX^(i) is Bayes with respect to priors giving high probability to (θ_1, ..., θ_k) central to the parameter space (Type B alternatives). The test Σ exp{γX^(i)} is Bayes with respect to priors which assign high probability to the extremes of the parameter space (Type C alternatives). For unequal ν_i the Bayes tests have slightly altered forms. The Bayes procedures are compared to T^(F), T^(m), and T^(N) for k=2, for various values of (ν_1, ν_2), via numerical tabulations and via the calculation of power contours. The statistic T^(m) is seen to have better power than the other tests for Type C alternatives but performs rather poorly in other situations. The Bayes test performs comparably to T^(m) for Type C alternatives and is much more sensitive to Type B alternatives than T^(m). The statistic T^(N) is relatively powerful over only a small region at the center of the parameter space and is seen to be dominated by some other procedure for each value of k investigated. The statistics T^(F) and ΣX^(i) are good overall procedures, with T^(F) more sensitive to Type C alternatives and ΣX^(i) more sensitive to Type B alternatives. For ν_i ≥ 2, T^(F) is more sensitive to Type B alternatives than ΣX^(i) is to Type C alternatives, and T^(F) is therefore recommended; the opposite is true for ν_i = 1. These observations were supported for k > 2 through Monte Carlo simulations. Koziol and Perlman also consider the maximum shortcomings of the tests. In the context of no prior information, they show that T^(F) minimizes the maximum shortcoming for ν_i ≥ 2, while ΣX^(i) minimizes the maximum shortcoming for ν_i = 1. An additional statistic can be considered when ν_i = 1;
it is T^(X) = Σ(X^(i))^{1/2}, the sum of chis procedure. For k=2, T^(X) is powerful only for a small region in the center of the parameter space. For large k, the performance of T^(X) becomes progressively worse. It can be said that T^(X) performs similarly to T^(N).

1.5 Weighted Methods of Combination

Good [14] suggests a weighted version of Fisher's statistic, T^(G) = −Σ λ_i ln L_i. He showed that, if the λ_i are all different, significance probabilities can be found from the relationship

P{T^(G) > x} = Σ_{r=1}^{k} A_r exp(−x/λ_r), where A_r = λ_r^{k−1} / Π_{s≠r} (λ_r − λ_s).

Zelen [45] illustrates the use of T^(G) in the analysis of incomplete block designs. In such designs, it is often possible to perform two independent analyses of the data. The usual analysis (intrablock analysis) depends only on comparisons within blocks. The second analysis (interblock analysis) makes use of the block totals only. Zelen defines independent F-ratios corresponding to the two types of analysis. The attained significance level corresponding to the interblock analysis is weighted according to the interblock efficiency, which is a function of the estimated block and error variances.

A similar example is given by Pape [34]. Pape extends Zelen's method to the more general context of a multi-way completely random design.

Koziol and Perlman [20] also considered weighted methods for the problem of combining independent chi-squares. They conclude that when prior information about the non-centrality parameters is available, increased power can be achieved at the appropriate alternative by a weighted version of the sum test, Σ b_i X^(i), if ν_i ≥ 2 for all i, and by the weighted Fisher statistic, T^(G), when ν_i = 1 for all i.

1.6 The Combination of Dependent Tests

The combinations considered up to this point have been based on mutually independent L^(i) arising from mutually independent statistics, X^(i).
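Before turning to the dependent case, Good's weighted statistic from Section 1.5 admits a direct computational sketch. The coefficient formula used below, A_r = λ_r^{k−1} / Π_{s≠r}(λ_r − λ_s), is the standard partial-fraction form for a sum of independent exponential variables with distinct scales λ_i (under the null, each −ln L_i is standard exponential); the example weights are illustrative.

```python
import math

def good_sf(x, lam):
    """P{T(G) > x} for T(G) = -sum(lam_i * ln L_i), with distinct weights lam_i."""
    k = len(lam)
    total = 0.0
    for r, lr in enumerate(lam):
        # A_r = lam_r^(k-1) / prod_{s != r} (lam_r - lam_s)
        a = lr ** (k - 1)
        for s, ls in enumerate(lam):
            if s != r:
                a /= lr - ls
        total += a * math.exp(-x / lr)
    return total

# Example: weights (1, 2) give P{T(G) > x} = 2 exp(-x/2) - exp(-x).
p = good_sf(2.0, [1.0, 2.0])
```

With equal weights the λ_r − λ_s factors vanish and the expansion breaks down, which is why Good requires all λ_i distinct; the equal-weight case reduces to Fisher's chi-square.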
As previously indicated, in such cases the functions of the L^(i) which comprise the general methods have null distributions which are easily obtained. When the X^(i) (and thus the L^(i)) are not independent, the null distributions are not tractable in typical cases.

Brown [9] considers a particular example of the problem of combining dependent statistics. The statistics to be combined are assumed to have a joint multivariate normal distribution with known covariance matrix Σ and unknown mean vector (μ_1, μ_2, ..., μ_k)'. The hypothesis test of interest is H: μ_i = μ_{i0} versus H_A: μ_i ≥ μ_{i0} (strict inequality for at least one i). A likelihood ratio test can be derived [31], but obtaining significance values from this approach is difficult. Brown bases his solution on T^(F). The null distribution of T^(F) is not chi-square on 2k degrees of freedom in this case. The mean of T^(F) is 2k, as in the independent case. The variance has covariance terms which Brown approximates; the approximation is expressed as a function of the correlations between the normal variates. These first two moments are equated to the first two moments of a gamma distribution, and the resultant gamma distribution is used to obtain approximate significance levels.

1.7 The Combination of Tests Based on Discrete Data

As noted in previous sections, the literature tends to support T^(F) as a non-parametric combining method in the general, continuous-data framework. Those authors who have addressed the problem of combining discrete statistics have utilized T^(F), assuming that the optimality properties established in the continuous case are applicable. The problem then becomes one of determining significance probabilities, since T^(F) is no longer distributed as a chi-square on 2k degrees of freedom. We describe the problem as follows. Suppose L^(i)* derives from a discontinuous statistic, X^(i)*, and that a and b are possible values of L^(i)*, 0 ≤ a < b ≤ 1.
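The discreteness at issue can be made concrete with a small binomial example (the choice of n and p_0 is an illustrative assumption): under the null hypothesis the attained level of a one-sided binomial test is supported on only n + 1 points, so it is not uniform on (0,1), and −2 Σ ln L^(i) is not chi-square on 2k degrees of freedom.

```python
from scipy import stats

# Attainable significance levels of a one-sided Binomial(n, p0) test:
# the upper-tail level L = P(X >= x) takes only n + 1 distinct values.
n, p0 = 5, 0.5
levels = [stats.binom.sf(x - 1, n, p0) for x in range(n + 1)]
# For n = 5, p0 = 0.5 these are 32/32, 31/32, 26/32, 16/32, 6/32, 1/32.
```

Consecutive attainable levels play the roles of the values a and b above: no level strictly between them can occur.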
When k, the number of statistics to be combined, is small and the numbers of attainable levels (n_1, n_2, ..., n_k) of the discrete statistics are small, the exact distribution of T^(F) can be determined. Wallis [43] gives algorithms to generate null distributions when all of the X^(i) are discrete and when one X^(i) is discrete. Generating null distributions via Wallis' algorithms becomes intractable very quickly as k and the number of attainable levels of the X^(i) increase. The generation of complete null distributions is beyond the capability of usual computer storage limitations even in experiments of modest size. The significance level attained by a particular value of T^(F) can, however, be obtained by computer for virtually any situation.

A transformation of T^(F) which can be referred to standard tables is therefore indicated. A method suggested by Pearson [37] involves the addition, by a separate random experiment, of a continuous variable to the original discrete variable, thus yielding a continuous variable. Suppose X^(i) can take on values 0, 1, 2, ..., n_i with probabilities p_0, p_1, ..., p_{n_i}. Let P*_j = p_j + p_{j+1} + ... + p_{n_i}. Note that the P*_j, j = 0, 1, 2, ..., n_i, are the observable significance levels for the i-th test; i.e., the observable values of the random variable L^(i) = 1 − F(X^(i)) under the null hypothesis. Denote by U^(i), i = 1, 2, ..., k, mutually independent uniform random variables on (0,1). Pearson's statistic is defined as

such that

1. T_n/√n → b(θ) with probability one [θ];

2. there exists a function f(t), 0 <