ARS/H/6

COMPARISONS AMONG TREATMENT MEANS
IN AN ANALYSIS OF VARIANCE

AGRICULTURAL RESEARCH SERVICE
UNITED STATES DEPARTMENT OF AGRICULTURE
HEADQUARTERS

FOREWORD

That the analysis of variance is a powerful technique for testing hypotheses has been accepted for many years.

In analyzing a set of data, however, the scientist usually is interested in relationships between the means to which the

analysis of variance is insensitive.

As early as 1939, statisticians used techniques independent of the analysis of variance to compare means from a

given experiment. Since the middle 1950's, the interest and literature have increased almost exponentially.

In May 1957, Biometrical Services issued ARS 20-3, Mean Separation by the Functional Analysis of Variance

and Multiple Comparisons. This publication has been out of print for many years. Since the publication of ARS 20-

3 much work has been done on the subject, indicating the need for a major revision.

Since the job of coordinating the national aspects of statistical consulting in ARS was delegated to the Data

Systems Application Division (DSAD), we asked Victor Chew, mathematical statistician, to revise ARS 20-3. We

feel that he has done a very thorough job, which should put mean separation techniques in the appropriate frame of

reference with respect to other statistical techniques that may be used in drawing judgments from data.

Copies of this publication may be obtained from Victor Chew, University of Florida, Room 217, Rolfs Hall,

Gainesville, Florida 32611.

Judson U. McGuire, Jr.

Staff Specialist, DSAD-ARS

PREFACE

The equality of the true average responses of two treatments (varieties, insecticides, concentrations,

temperatures, etc.) usually is tested statistically by the Student's t-test. This is generalized for t (three or

more) treatments by the F-test or the analysis of variance. If the F-test rejects the hypothesis that the t

treatment means are equal, the only conclusion is that the t means are not all equal. It does not necessarily

follow that these t means are all unequal although this may well be true. The next stage in the data analysis is

to determine which treatment means are different. Repeated application of the Student's t-test to all possible

pairs of treatment means (using pooled error either from all t samples or only from the two samples involved

in the t-test) usually is discouraged since this procedure gives a large probability of getting one or more false

positives (that is, of declaring two treatment means to be different, when they are, in fact, equal). Special

techniques (called multiple comparison procedures) are available for this purpose.

Uses and abuses of multiple comparison procedures are discussed in this publication. One glaring abuse

is their use in comparing several levels of a quantitative factor (such as concentration, temperature, and pH).

Regression analysis is the appropriate technique here. Equivalently, the treatment sum of squares in the

analysis of variance table should be partitioned into linear, quadratic, etc., components. In comparing the

effects of, say, 10, 20, 30, and 40 p/m of a certain chemical, if the regression of the response on concentration or

if any component of the sum of squares for concentrations is significant, then no multiple comparison

procedure is necessary. ALL concentrations are significantly different in their effects. In fact, not only will 10

and 20 p/m be different, but so also will 10 and 10.1 p/m. The difference, of course, between the effects of 10

p/m and 10.1 p/m will be extremely small. However, the usual statistical test of significance is not concerned

with the magnitude of the difference, but only whether a true difference exists, no matter how small.

Washington, D.C. Issued October 1977


CONTENTS

                                                                             Page
Chapter 1. Introduction ------------------------------------------------------- 1
Chapter 2. Partitioning of Degrees of Freedom for Treatments ------------------ 2
    2.1 Orthogonal Contrasts -------------------------------------------------- 3
    2.2 Qualitative Factors --------------------------------------------------- 4
    2.3 Quantitative Factors -------------------------------------------------- 7
        2.3.1 One Factor ------------------------------------------------------ 7
        2.3.2 Two or More Factors -------------------------------------------- 11
    2.4 Mixed Factors -------------------------------------------------------- 13
Chapter 3. Multiple Comparison Procedures ------------------------------------ 15
    3.1 Error Rates ---------------------------------------------------------- 15
    3.2 Fisher's Protected and Unprotected LSD Methods ----------------------- 16
    3.3 Newman-Keuls' Multiple Range Test ------------------------------------ 17
    3.4 Tukey's HSD Method and Multiple Range Test --------------------------- 18
    3.5 Scheffé's Method ----------------------------------------------------- 19
    3.6 Duncan's Methods ----------------------------------------------------- 20
        3.6.1 Multiple Range Test --------------------------------------------- 20
        3.6.2 Bayesian k-ratio t (LSD) Rule ----------------------------------- 22
    3.7 Studentized Maximum Modulus Procedure --------------------------------- 24
    3.8 Comparisons Against a Control ----------------------------------------- 24
        3.8.1 Dunnett's Method ------------------------------------------------ 24
        3.8.2 Gupta and Sobel's Method ---------------------------------------- 25
        3.8.3 Williams' Method ------------------------------------------------ 26
        3.8.4 Sequential Methods ---------------------------------------------- 27
    3.9 Miscellaneous Methods ------------------------------------------------- 27
        3.9.1 Bonferroni Procedure for Preselected Contrasts ------------------ 27
        3.9.2 Gabriel's Simultaneous Test Procedure (STP) --------------------- 27
        3.9.3 Kurtz-Link-Tukey-Wallace Procedure ------------------------------ 28
        3.9.4 Covariance Adjusted Means --------------------------------------- 28
        3.9.5 Procedures for Two-Way Interactions ----------------------------- 28
        3.9.6 Nonparametric Methods ------------------------------------------- 28
        3.9.7 Gupta's Random Subset Selection Procedure ----------------------- 29
        3.9.8 Scott and Knott's Cluster Analysis Method ----------------------- 29
        3.9.9 Multivariate Populations ---------------------------------------- 30
        3.9.10 Subset Selection Approach to Multiple Comparisons -------------- 31
        3.9.11 Other Parameters and Populations ------------------------------- 31
Chapter 4. Conclusion --------------------------------------------------------- 32

Tables
A.  Two-Sided (100α)% Points of Student's t-Distribution With v Degrees of
    Freedom ------------------------------------------------------------------ 36
B.  Percentage Points of the Studentized Range q(α; p, v) --------------------- 37
C.  Critical Values for Duncan's Multiple Range Test -------------------------- 45
D1. Critical Values of the k-ratio t Test (k = 100) --------------------------- 49
D2. Critical Values of the k-ratio t Test (k = 500) --------------------------- 52
E.  100γ% Points of the Distribution of the Largest Absolute Value of k
    Uncorrelated Student t Variates With v Degrees of Freedom ----------------- 54
F1. Critical Values of t(α; q, v) for One-Sided Dunnett's Tests for Comparing
    a Control Against Each of q Other Treatments ------------------------------ 55
F2. Critical Values of t(α; q, v) for Two-Sided Dunnett's Tests for Comparing
    a Control Against Each of q Other Treatments ------------------------------ 56
G.  Critical Values of t(α; p, v) for Testing Zero Against Nonzero Dose
    Levels ------------------------------------------------------------------- 57

List of References ------------------------------------------------------------ 59

COMPARISONS AMONG TREATMENT MEANS

IN AN ANALYSIS OF VARIANCE

By Victor Chew¹

CHAPTER 1. INTRODUCTION

Before embarking on an experimental project, the research scientist should carefully consider various

issues. These issues include questions that the experiment hopefully will answer, the factors or variables to

be controlled or kept constant during the experiment, the levels of the factors to be varied in the study, the

number of observations to be taken, and the manner in which these observations will be grouped into blocks.

We shall need fewer observations or have wider applicability of the results, or both, if the experiment is

designed efficiently.

This publication is concerned with a particular facet of the analysis of the experimental data, assuming

that the experiment has been designed properly. It is applicable irrespective of the experimental design

(completely randomized, randomized blocks, Latin square, split plot, etc.). We also shall assume that the

reader is familiar with the computational aspects of the analysis of variance for these designs.

The basic terms and notions in statistical inference will be reviewed in this chapter. This is necessary to

understand the relative merits of multiple comparison procedures that are currently available.

In the simplest hypothesis testing situation, we compare two treatments (varieties of peanuts, fertilizers,

temperatures, pH, machine settings, etc.). If we denote the true means of the two treatments by μ₁ and μ₂, the

statistical hypothesis to be tested is usually that these two means are equal (μ₁ = μ₂). This hypothesis, called

the null hypothesis, often is denoted by H₀. We write it as H₀: (μ₁ - μ₂) = 0. (We can test a more general

hypothesis, viz., (μ₁ - μ₂) = d, where d is specified numerically.)

In classical hypothesis testing, we must decide whether to accept or to reject H₀. (In sequential testing,

we allow a third alternative of requiring more observations to be taken.) Because the true or population

means μ₁ and μ₂ are unknown and unknowable, our decision from the statistical test (whether to accept or

reject H₀) is subject to error. If ȳ₁ and ȳ₂ are the observed or sample means, estimating μ₁ and μ₂,

respectively, then because of nonhomogeneity of the experimental material (such as plants, animals, plots of

land, batches of peanuts), failure to reproduce identical experimental conditions, errors of measurement,

etc., ȳ₁ and ȳ₂ will be unequal even if μ₁ and μ₂ are equal. In fact, we may even have ȳ₁ larger than ȳ₂ when

actually μ₁ is smaller than μ₂, especially in a small experiment.

There are two kinds of error in hypothesis testing:

Type I: Reject H₀ when H₀ is, in fact, true (i.e., erroneously deciding that μ₁ and μ₂ are unequal).

Type II: Accept H₀ when H₀ is, in fact, false (i.e., incorrectly deciding that μ₁ and μ₂ are equal).

The probabilities of a test making these errors usually are denoted by α and β, respectively. The perfect test

is, of course, infallible (with α = β = 0), but this is impossible with a finite sample. A good experiment is one

in which both α and β are small. The value of α is called the significance level of the test, sometimes expressed

as a percentage. By suitably choosing the rejection region or critical values for the test statistic, we can make

α as small as we like, but only at the expense of increasing β. For example, we can make α = 0 by always

accepting H₀, regardless of the experimental data, but in this case β = 1. The only way to decrease both α and

β simultaneously is to increase the sample size (number of observations). Conventionally, α is taken to be

equal to .05 or .01. With β defined as the probability of accepting H₀ when H₀ is false, (1 - β) is the probability

¹Mathematical statistician, Biometrical and Statistical Services, Agricultural Research Service, U.S. Department of Agriculture,

217 Rolfs Hall, University of Florida, Gainesville, Fla. 32611.

of rejecting H₀ when H₀ is false. This quantity is called the power of the test: the probability that the test will

detect a difference when one exists. There are infinitely many tests with the same value of α; among these, we

choose the most powerful one (for which β is least), if one exists.
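The interplay of α and power can be illustrated by simulation. The following sketch is not from the original text; the sample sizes, effect size, and standard deviation are assumptions chosen only for illustration, and the test is the pooled two-sample Student's t-test discussed above.

```python
# Monte Carlo sketch (hypothetical settings) of the Type I error rate (alpha)
# and power (1 - beta) of a two-sided, two-sample Student's t-test.
import math
import random
import statistics

def t_test_reject(x, y):
    """Pooled-variance two-sample t-test; True if H0: mu1 = mu2 is rejected."""
    n1, n2 = len(x), len(y)
    s2 = ((n1 - 1) * statistics.variance(x) +
          (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(s2 * (1/n1 + 1/n2))
    return abs(t) > 2.101        # t(.025; 18 d.f.) for n1 = n2 = 10

def rejection_rate(true_diff, n=10, reps=2000, seed=1):
    random.seed(seed)
    count = 0
    for _ in range(reps):
        x = [random.gauss(0.0, 1.0) for _ in range(n)]
        y = [random.gauss(true_diff, 1.0) for _ in range(n)]
        count += t_test_reject(x, y)
    return count / reps

print("estimated alpha (true diff = 0):", rejection_rate(0.0))  # near .05
print("estimated power (true diff = 1):", rejection_rate(1.0))  # 1 - beta
```

Rerunning with a larger n shows the point made in the text: α stays near .05 by construction, while the power rises (β falls) only as the sample size grows.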

If H₀ is false, an alternative hypothesis (denoted by Hₐ) is true. Corresponding to H₀: (μ₁ - μ₂) = 0,

three possible alternative hypotheses are (μ₁ - μ₂) > 0, (μ₁ - μ₂) < 0, and (μ₁ - μ₂) ≠ 0, called the right-tail,

left-tail, and two-tail alternative hypotheses, respectively. If the first treatment is "control" (i.e., no

treatment at all), the second treatment is the application of some insecticide, and the response being

measured is the number of a particular insect per plant, we know a priori that the alternative to H₀: μ₁ = μ₂ is

Hₐ: μ₁ > μ₂, because the application of the insecticide cannot possibly increase the average count. By

capitalizing on the one-sidedness of Hₐ, we can construct a more powerful test of H₀ with the same α. If we

are comparing two new insecticides, the alternative hypothesis is two-sided.

It will be seen that α is associated with H₀ and β with Hₐ. This explains why we can control α but not β:

we need the actual difference between the two means to control β. For this reason, experimenters too often

ignore Type II errors. If they are only concerned with holding Type I errors down to 5%, they need not

conduct the experiment at all. They merely need to take 20 index cards, mark one with an X, shuffle them

thoroughly, and draw one card at random, rejecting H₀ if the marked card is drawn. At a saving of hundreds if

not thousands of dollars, this experimenter has only a 5% chance of making a Type I error. The reader should

think about the value of β in this case.

We cannot emphasize strongly enough the distinction between statistical and practical significance. Any

difference between the sample means ȳ₁ and ȳ₂, no matter how small, must be declared statistically

significant if the population or true means μ₁ and μ₂ are unequal, unless the test has committed a Type II

error (incorrectly declaring two means equal). The test will declare the difference significant if we have

enough replications. In calculating the number n of observations to be taken, we only should require n to be

large enough so that the test will detect a difference of at least d (of practical significance) between μ₁ and μ₂.

It is no big loss to declare incorrectly that μ₁ and μ₂ are equal if they differ by a practically insignificant amount.

The author thinks that the research worker has been oversold on hypothesis testing. Just as no two peas

in a pod are identical, no two treatment means will be exactly equal. They always will be different, even if only

in the thousandth decimal place. It seems ridiculous, therefore, to test a hypothesis that we know a priori is

almost certain to be false. If the test accepts the hypothesis of equal treatments, a Type II error probably has

occurred. A related but much more informative alternative approach is interval estimation of (μ₁ - μ₂). The

confidence limits, of the form (ȳ₁ - ȳ₂) ± c, will tell us whether the null hypothesis will be accepted (if the

limits have different signs) or rejected (if they have the same sign). They also will give the estimated

magnitude of the actual difference. The value of c depends, among other things, on the confidence level γ. If γ

= 0.95, we have 95% confidence that (μ₁ - μ₂) is between (ȳ₁ - ȳ₂ - c) and (ȳ₁ - ȳ₂ + c). The closer γ is to

unity, the wider the confidence interval. For a given γ, we can shorten the interval by increasing the sample

size.
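A minimal sketch of this interval estimate follows. The two samples are hypothetical, and c is computed in the usual pooled-variance form, c = t(.025; d.f.) · s · √(1/n₁ + 1/n₂), with the tabled t value supplied by hand.

```python
# Sketch (assumed data) of the confidence limits (y1bar - y2bar) +/- c
# for mu1 - mu2, using the pooled variance from both samples.
import math
import statistics

def diff_ci(x, y, t_crit):
    """Confidence limits for mu1 - mu2 with pooled variance and tabled t."""
    n1, n2 = len(x), len(y)
    s2 = ((n1 - 1) * statistics.variance(x) +
          (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    c = t_crit * math.sqrt(s2 * (1/n1 + 1/n2))
    d = statistics.mean(x) - statistics.mean(y)
    return d - c, d + c

# Hypothetical samples of 5 observations each; t(.025; 8 d.f.) = 2.306.
x = [12.1, 11.4, 13.0, 12.6, 11.9]
y = [10.2, 10.9, 9.8, 10.5, 10.6]
lo, hi = diff_ci(x, y, t_crit=2.306)
print(lo, hi)   # both limits have the same sign, so H0: mu1 = mu2 is rejected
```

As the text notes, the interval does more than the test: besides settling accept/reject by the signs of the limits, it estimates the magnitude of (μ₁ - μ₂).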

The practice of hypothesis testing when comparing several treatments is even more difficult to justify.

When comparing 10 new varieties of corn, for example, it is inconceivable that all the true average yields will

be exactly equal. Besides a simultaneous confidence interval approach for all pairs of varieties, a better

objective may be to select the smallest subgroup that has a preassigned probability (95%, say) of including the

highest yielding variety. This subgroup of varieties may be tested more intensively and compared in a later

experiment, as in the screening of new drugs.

CHAPTER 2. PARTITIONING OF DEGREES OF FREEDOM FOR TREATMENTS

This chapter deals with situations in which it is possible, before performing the experiment, to partition

the degrees of freedom (d.f.) for treatments, either completely into single d.f. or partially into groups of d.f.

Partitioning must not be suggested after examination of the experimental data. LeClerg (1957)² referred to

this partitioning as "functional analysis of variance." Use of a multiple comparison procedure in this chapter

(with a couple of exceptions, explicitly stated) constitutes an abuse of the technique. If the difference between

²The year in parentheses following the author's name refers to the List of References, p. 59.

the observed average responses of two treatments is statistically significant, we shall simply say that the two

treatments are different.

In this chapter, a significant F-test for treatments is not a prerequisite for the partitioning of the

treatments d.f. or s.s. (sum of squares). In fact, the F-test need not and should not be carried out at all. In

comparing t treatments, with (t - 1) d.f., the blanket or overall F-test for treatments is averaged over (t - 1)

orthogonal comparisons (defined later). If only one or two of these comparisons (or contrasts) are significant,

the overall F-test is diluted or weakened by the (t - 2) or (t - 3) nonsignificant contrasts and erroneously may

give a nonsignificant F value.

2.1 Orthogonal Contrasts

Let ȳ₁, ȳ₂, ..., ȳₜ and T₁, T₂, ..., Tₜ be the sample means and totals from Treatments 1, 2, ..., t,

respectively. (Unless otherwise stated, we shall assume that the treatments are equally replicated. If n is the

common number of replicates per treatment, we have ȳᵢ = Tᵢ/n.) The expression Σaȳ = (a₁ȳ₁ + ... + aₜȳₜ) is

called a linear combination of the treatment means. A linear combination is called a comparison or a contrast

if the coefficients (the a's) add up to zero. For example, if we have t = 4 treatments, ȳ₁ - (ȳ₂ + ȳ₃ + ȳ₄) is a

linear combination of the treatment means. It is not a contrast, however, since the sum of its coefficients is

nonzero (it is equal to -2). This linear combination compares the mean of the first treatment with the sum of

the means of the remaining three treatments, which is not a fair comparison according to the ordinary

meaning of "fair." A fair comparison is to compare ȳ₁ with the average of the means of the remaining three

treatments, given by ȳ₁ - (ȳ₂ + ȳ₃ + ȳ₄)/3, which is also a contrast since the coefficients add up to zero. To

avoid fractional coefficients, the preceding contrast usually is written 3ȳ₁ - (ȳ₂ + ȳ₃ + ȳ₄).

The sum of squares corresponding to a contrast C = Σaȳ is

s.s.(C) = n(Σaȳ)²/(Σa²) = (ΣaT)²/[n(Σa²)],   (2.1)

where Σa² is the sum of the squares of the coefficients in the contrast. (Notice that the s.s. is unchanged if we

multiply the coefficients by a constant.) Since a contrast has one d.f., the s.s. is also a mean square (m.s.)

because (m.s.) = (s.s.)/(d.f.). It may be tested for significance by dividing it by the error m.s. (with m d.f.,

say) that normally would be used to make the overall test for treatments in the analysis of variance. The

calculated ratio is compared with the critical value of the F-distribution with 1 and m d.f.

If we are comparing t = 4 treatments in a completely randomized experiment with n = 3 replicates per

treatment, the d.f. for the error m.s. is m = t(n - 1) = 8. In a 5% two-tail test, the critical value of the

F-distribution with 1 and 8 d.f. is 5.32. If a one-tail test is justifiable (as, for example, if in the contrast

3ȳ₁ - (ȳ₂ + ȳ₃ + ȳ₄) the first treatment is control and the other treatments are three types of insecticides), the 5%

critical value is only 3.46. Since a smaller critical value is easier to exceed, a significant difference is easier to

declare in a one-tail test. Consequently, the test is less likely to commit a Type II error (failure to declare a

difference when one exists).
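Equation (2.1) is simple to apply mechanically. The sketch below uses the same layout as this example (t = 4, n = 3, error m.s. with 8 d.f.), but the treatment totals and the error mean square are hypothetical values supplied only for illustration.

```python
# Sketch of Equation (2.1): s.s.(C) = (sum a*T)^2 / [n * (sum a^2)],
# applied to the contrast 3*ybar1 - (ybar2 + ybar3 + ybar4).
def contrast_ss(coeffs, totals, n):
    """Sum of squares of a contrast from treatment totals (Equation 2.1)."""
    numerator = sum(a * t for a, t in zip(coeffs, totals)) ** 2
    return numerator / (n * sum(a * a for a in coeffs))

totals = [30.0, 45.0, 48.0, 51.0]   # hypothetical treatment totals, n = 3 each
ss = contrast_ss([3, -1, -1, -1], totals, n=3)
error_ms = 4.0                      # assumed error m.s. with m = 8 d.f.
F = ss / error_ms                   # contrast has 1 d.f., so its s.s. is its m.s.
print(ss, F, F > 5.32)              # compare with F(.05; 1, 8) = 5.32
```

Note that rescaling the coefficients, e.g. using (1, -1/3, -1/3, -1/3) instead of (3, -1, -1, -1), leaves the sum of squares unchanged, as remarked after Equation (2.1).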

Two contrasts, C₁ = Σaȳ and C₂ = Σbȳ, are said to be orthogonal if Σab = 0 (i.e., if the sum of the

products of the corresponding coefficients in the two contrasts is zero). A set of contrasts is said to be

mutually orthogonal if all pairs of contrasts in the set are orthogonal. If, for brevity, we write (a₁ȳ₁ + a₂ȳ₂ +

... + aₜȳₜ) as (a₁, a₂, ..., aₜ), the three contrasts (1, 1, -1, -1), (1, -1, -1, 1), and (1, -1, 1, -1) are

mutually orthogonal. It can be proved that there are only (t - 1) mutually orthogonal contrasts among t

means; however, there are infinitely many such sets of mutually orthogonal contrasts.

The following are another two sets of mutually orthogonal contrasts: (1, 1, -1, -1), (1, -1, 0, 0), (0, 0, 1,

-1); and (3, -1, -1, -1), (0, 2, -1, -1), (0, 0, 1, -1). It also can be proved that if C₁, C₂, ..., Cₜ₋₁ are (t - 1)

mutually orthogonal contrasts, their individual sums of squares add up exactly to the treatments s.s. The

statistical distributions of these contrasts are independent. This is one reason why, whenever possible, we

should aim for an orthogonal decomposition of the treatments d.f. Of the possible sets of mutually orthogonal

contrasts, the experimenter should choose the set that is most interesting or most relevant to his study.

Mutual orthogonality is desirable but not absolutely essential. If several contrasts interest the scientist, he

should not let the lack of mutual orthogonality prevent him from performing the statistical tests, as long as

these contrasts have not been suggested by the data. Contrasts suggested after data snooping should be

tested by a multiple comparison procedure.
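Both defining conditions, coefficients summing to zero and zero sum of cross-products for every pair, are easy to verify mechanically. The sketch below checks two of the four-treatment contrast sets given in this section.

```python
# Sketch: verify that each row is a contrast (coefficients sum to zero) and
# that the set is mutually orthogonal (every pair has sum of products zero).
from itertools import combinations

def is_contrast(c):
    return sum(c) == 0

def mutually_orthogonal(contrasts):
    return all(sum(a * b for a, b in zip(c1, c2)) == 0
               for c1, c2 in combinations(contrasts, 2))

set1 = [(1, 1, -1, -1), (1, -1, -1, 1), (1, -1, 1, -1)]
set2 = [(3, -1, -1, -1), (0, 2, -1, -1), (0, 0, 1, -1)]
for s in (set1, set2):
    print(all(is_contrast(c) for c in s), mutually_orthogonal(s))  # True True
```

Each set contains (t - 1) = 3 contrasts for t = 4 means, the maximum number of mutually orthogonal contrasts stated above.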

2.2 Qualitative Factors

Experimental variables or factors may be divided into qualitative and quantitative factors. Examples of

qualitative factors are varieties (peanuts, corn, etc.), types (soils, fungicides, etc.), locations, and methods of

chemical analysis or of counting bacteria. Examples of quantitative factors are temperature, relative

humidity, pH, concentration, and several levels of a fertilizer. Although the various varieties or soil types in

an experiment also are referred to as the levels of the factors "varieties" and "soil types," no meaningful

numerical values can be assigned to the levels of a qualitative factor. Levels of a quantitative variable are, of

course, naturally numerical.

Factorial experiments are those in which the treatments are made up of all possible combinations of the

levels of two or more factors (qualitative or quantitative). (The term "factorial" thus merely describes the

nature of the treatments and not the design of the experiment, which may be completely randomized,

randomized block, Latin square, split-plot, etc.) The simplest factorial is the 2² or 2 × 2 experiment, with two

factors A and B, each at two levels. For the 2 × 2 factorial, the partitioning of the d.f. for treatments is the

same whether the two factors are both qualitative or quantitative, or one of each kind. The two levels are

designated generally as H (high) or L (low). The low level, in particular, may be zero. For a qualitative factor,

we may arbitrarily label one level H and the other L. The four treatments are denoted by (1), a, b, and ab,

where absence of a letter implies that the corresponding factor is at the low level; and (1) is a special symbol

for the treatment where both factors are at the low level. These four treatments could have been more

explicitly but awkwardly denoted by A(L)B(L), A(H)B(L), A(L)B(H), and A(H)B(H), respectively.

The three d.f. for treatments are partitioned into the main effect of A, the main effect of B, and the A × B

interaction. The coefficients for these contrasts are as follows:

                     Treatments
Contrasts    (1)     a     b    ab
C₁           -1      1    -1     1    Main effect of A
C₂           -1     -1     1     1    Main effect of B
C₃            1     -1    -1     1    Interaction of A and B

The coefficients for the main effect of A are +1 for treatments where A is at the high level and -1 where A is at the

low level; and similarly for B. The coefficients for interaction are obtained by multiplying corresponding

coefficients for main effects. To get the sums of squares for the preceding contrasts, we apply Equation (2.1)

to the four treatment means or totals, using the coefficients for each contrast in turn.

The difference [a - (1)] is called the simple effect of A at the low level of B; similarly, (ab - b) is the simple

effect of A at the high level of B. The main effect of A is the average of the simple effects of A. (To avoid

fractions, the coefficients for this average have been multiplied by two. The reader will recall that the s.s. of a

contrast is unchanged if the coefficients are multiplied by a common number.)

If the factors A and B act independently, the two simple effects of A should be about equal.

(Experimental or random errors will prevent them from being exactly equal.) Therefore, their difference

(ab - b) - [a - (1)] = ab + (1) - a - b = C₃

should be approximately zero if A and B are independent. If this quantity is large (significantly different from

zero), we say that there is interaction between A and B (i.e., the effect of A at the low level of B is different from

the effect of A at the high level of B). We also can write C₃ as (ab - a) - [b - (1)] = (effect of B at high level of A)

- (effect of B at low level of A), so that if the effect of A depends on the level of B, we know that the effect of B

depends on the level of A.

The following artificial two-way tables of means show some possible results of the tests for main effects

and interaction. In (d), for example, the simple effect of A is 10 units at low B and 20 units at high B, indicating

dependence of the effect of A on the level of B, or interaction between A and B.

(a)
               A
          Low   High   Average
B  Low     10     20      15
B  High    12     24      18
Average    11     22
A sig.; B not sig.; A × B not sig.

(b)
               A
          Low   High   Average
B  Low     10     20      15
B  High    22     34      28
Average    16     27
A sig.; B sig.; A × B not sig.

(c)
               A
          Low   High   Average
B  Low     10     20      15
B  High     6     26      16
Average     8     23
A sig.; B not sig.; A × B sig.

(d)
               A
          Low   High   Average
B  Low     10     20      15
B  High    18     38      28
Average    14     29
A sig.; B sig.; A × B sig.

In general, a two-factor experiment is a p × q factorial. The (pq - 1) d.f. for treatments will be

partitioned into main effects of A with (p - 1) d.f., main effects of B with (q - 1) d.f., and interaction with

(p - 1)(q - 1) d.f. The A × B interaction is more difficult to illustrate if p and q are greater than two, but the

interpretation is similar to that in the 2 × 2 factorial; viz., differences among levels of A depend on the levels of

B, and vice versa. If the p levels of A are such that orthogonal contrasts are possible, the (p - 1) d.f. for the

main effects of A should be partitioned further into single d.f. If it is impossible to partition the (p - 1) d.f. for

A, then it is legitimate to use a multiple comparison procedure to compare the p levels of A.

Testing the main effects of A presupposes that there is no A × B interaction. If interaction exists, the

differences among the levels of A depend on the level of B. It does not make much sense to compare the levels

of A averaged over all levels of B, which is what the main effect is. It is more instructive to compare the levels of A

for each level of B separately, and vice versa, using the pooled error mean square from the complete

experiment, if the assumption of homogeneous variances is valid.

With three factors, the simplest is a 2³ or 2 × 2 × 2 factorial. The eight treatments may be denoted by (1),

a, b, ab, c, ac, bc, abc, in an obvious extension of the previous notation, where, for example, ac stands for the

treatment with factors A and C at their high levels and B at the low level. The seven d.f. for treatments will be

partitioned into main effects (A, B, C), two-factor (or first-order) interactions (A × B, A × C, B × C), and the

three-factor (or second-order) interaction (A × B × C), each with a single d.f. Second and higher order

interactions are difficult to interpret. The A × B × C interaction is the interaction of (A × B) and C. If the A × B

× C interaction is significant, the A × B interaction at the high level of C is different from that at the low level

of C. The coefficients for the following contrasts are obtained as in the 2 × 2 factorial experiment.

                            Treatments
              (1)    a     b    ab     c    ac    bc   abc
A             -1     1    -1     1    -1     1    -1     1
B             -1    -1     1     1    -1    -1     1     1
A × B          1    -1    -1     1     1    -1    -1     1
C             -1    -1    -1    -1     1     1     1     1
A × C          1    -1     1    -1    -1     1    -1     1
B × C          1     1    -1    -1    -1    -1     1     1
A × B × C     -1     1     1    -1     1    -1    -1     1

The 2 × 2 × 2 factorial can be generalized to the p × q × r factorial (three factors A, B, and C, with p, q,

and r levels, respectively), to the 2ᵖ factorial (p factors, each at two levels), and to the p₁ × p₂ × ... × pᵣ factorial (r

factors with p₁, p₂, ..., pᵣ levels). The total number of treatment combinations increases rapidly with an

increasing number of factors. With six factors, even if each is at two levels, we require 2⁶ = 64 experimental

units per replicate. Besides the 6 main effects, there will be 15 two-factor, 20 three-factor, 15 four-factor, 6

five-factor, and 1 six-factor interactions. If we can assume that high-order interactions (four-factor or higher,

say) do not exist, as is usually true, we may pool these interactions for use as the error mean square, so that we do

not need to replicate. In fact, a single replicate already may be too large an experiment, and our resources

may allow us to carry out only a portion of the full factorial experiment. So-called fractional factorial

experiments are available for this purpose. They are discussed in Davies (1956), Cochran and Cox (1957),

Peng (1967), John (1971), and Anderson and McLean (1974).
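The interaction counts quoted above are just binomial coefficients: with six two-level factors there are C(6, k) interactions involving exactly k factors. A quick check:

```python
# Sketch: the number of k-factor interactions among 6 factors is C(6, k).
from math import comb

counts = [comb(6, k) for k in range(2, 7)]
print(counts)    # [15, 20, 15, 6, 1] two- through six-factor interactions
print(2 ** 6)    # 64 treatment combinations per replicate
```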

The following example, taken from Little and Hills (1972), shows the partitioning of treatments d.f. to

give meaningful single-d.f. contrasts. Six sources of nitrogen were compared for their effects on the yield of

sugar beets: Control (1), urea (2), ammonium sulfate (3), ammonium nitrate (4), calcium nitrate (5), and

sodium nitrate (6).

                       Treatments
Contrasts     1     2     3     4     5     6
C₁           -5     1     1     1     1     1    Nitrogen vs. no nitrogen
C₂            0    -4     1     1     1     1    Organic vs. inorganic nitrogen
C₃            0     0    -1    -1     1     1    Ammonium vs. nitrate nitrogen
C₄            0     0    -1     1     0     0    Ammonium nitrate vs. sulfate
C₅            0     0     0     0    -1     1    Calcium vs. sodium nitrate

The reader should check the mutual orthogonality of the contrasts. Note that the interpretation of Contrast

C₃ is not quite right, since Treatment 4 contains both ammonium and nitrate nitrogen.

An interesting factorial experiment was conducted by Dr. Ralph Segall at the U.S. Horticultural

Research Laboratory in Orlando, Fla. He studied the effects of 10 fertilizer treatments on the incidence of

postharvest bacterial soft-rot of tomato fruits. The 10 treatments (all of which had 18-0-25) initially may be

regarded as a 2 × 5 factorial (mulching at two levels and "additives" at five levels). The five additives are

made up of control and four chemicals. The four chemicals are in the form of a 2 × 2 factorial (2 anions and 2

cations). We have shown the coefficients for only five mutually orthogonal contrasts. The remaining four

contrasts are the interactions between C₁ and each of C₂, C₃, C₄, and C₅. The reader may interpret the

contrasts C₁, ..., C₅ and the interactions between C₁ and each of C₂, ..., C₅.

                                                   Contrasts
                  Treatments               C₁    C₂    C₃    C₄    C₅
                  Control (1)               1    -4     0     0     0
                  Calcium nitrate (2)       1     1     1     1     1
Mulched beds      Calcium chloride (3)      1     1     1    -1    -1
                  Potassium nitrate (4)     1     1    -1     1    -1
                  Potassium chloride (5)    1     1    -1    -1     1
                  Control (6)              -1    -4     0     0     0
                  Calcium nitrate (7)      -1     1     1     1     1
Nonmulched beds   Calcium chloride (8)     -1     1     1    -1    -1
                  Potassium nitrate (9)    -1     1    -1     1    -1
                  Potassium chloride (10)  -1     1    -1    -1     1

There may be situations in which it is justifiable to apply a multiple comparison procedure to compare

factorial treatments. For example, suppose a farmer is interested in growing one of three types of grasses and

using one of four types of fertilizers. The farmer is not interested in the scientific comparison of yields from

the three varieties of grasses or types of fertilizers. He is only interested in maximizing his profit. If the

commercial values of the three grasses and the costs of the four fertilizers are different, analyzing the profit

(in dollars and cents) per plot is more relevant than analyzing yields per plot. The 12 treatments (combinations

of grasses and fertilizers) may be compared for profitability, using a multiple comparison procedure and

ignoring their factorial nature.

At a panel discussion sponsored by the Data Systems Application Division, Agricultural Research

Service, during the joint meeting of the statistical societies in Atlanta in August 1975, two panel members

(Dr. David B. Duncan and Dr. John W. Tukey) said they might condone multiple comparisons of individual

factorial treatments (from qualitative factors) if the main effects were not significant (Duncan) or if their F

ratios were less than two (Tukey).

2.3. Quantitative Factors

2.3.1. One Factor

With a quantitative factor (e.g., temperature, pressure, humidity, pH, and concentration or levels of a

fertilizer), regression analysis or curve fitting is the most appropriate technique. The treatments d.f. and s.s.

should be partitioned into components due to linear (first degree) regression, quadratic (second degree)

regression, cubic (third degree) regression, and so forth. If enough theoretical knowledge exists to specify

the mathematical form of the relationship between the response y and the experimental variable x (e.g.,

logistic, Mitscherlich's law, Gompertz's law, von Bertalanffy's curve, etc.), this equation should be fitted to

the data. In most (if not all) agricultural experimentation, however, the mathematical relationship between

the response and the so-called independent variable is so complex that it defies specification. Therefore, we

must approximate the unknown mathematical relationship by means of a polynomial of the form y = b0 + b1x + b2x^2 + . . . + bdx^d. Within a limited range of the independent variable, a polynomial approximation is usually satisfactory if the response does not level off in the experimental range of x, in which case an asymptotic curve should be fitted.
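Both points above can be illustrated numerically (a hypothetical sketch, not data from the text): a quadratic fitted over a limited range tracks even an asymptotic curve closely, but degrades badly outside that range.

```python
import math

# Hypothetical Mitscherlich-type asymptotic response, approximated by a
# quadratic over the limited range x = 10..40.
def f(x):
    return 10.0 * (1.0 - math.exp(-0.05 * x))

# Quadratic through three points in the range, via Lagrange interpolation.
nodes = [10.0, 25.0, 40.0]

def quad(x):
    total = 0.0
    for i, xi in enumerate(nodes):
        term = f(xi)
        for j, xj in enumerate(nodes):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Within the experimental range the quadratic is a close approximation...
max_err = max(abs(quad(x) - f(x)) for x in (15.0, 20.0, 30.0, 35.0))
# ...but extrapolating well beyond the range fails badly.
extrap_err = abs(quad(80.0) - f(80.0))
```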

Table 1 shows the analysis of variance of a randomized block experiment with b replicates or blocks, t

treatments (levels of a quantitative factor), and m measurements per plot (experimental unit), with partition-

ing of the treatments d.f. and s.s. into linear and quadratic components. With the general availability of

computer programs, it is not difficult to fit a polynomial of a higher degree than quadratic. The ratio ms(dr)/ms(e) provides a test for the statistical significance of the combined contributions from the higher order polynomials, sometimes called a test of the lack of fit of the fitted model (in this case quadratic). If quadratic is sufficient, this ratio has the F-distribution with (t - 3) and (b - 1)(t - 1) d.f. (For testing, the author generally recommends the use of ms(e) rather than ms(s) as the error term since the latter does not represent true replications. If b = 1, we are forced to use ms(s) as the error term, but this is dangerous since ms(s) may seriously underestimate ms(e) and it will then be easy to get a spuriously significant result.)

If the quadratic term is statistically significant but its s.s. is only a small part of the treatments s.s., we

may prefer to fit a linear trend only since the curvature of the response curve is only slight. We may be able to

predict the response y better (i.e., with a smaller mean squared error of prediction) by using a straight line

rather than a quadratic, even if the true response curve is a quadratic function. The curvature, however,

must be slight. This comes about through having to estimate fewer parameters (constants of the response

function) in linear regression. A straight line is also easier to use than a parabolic curve.

In comparing the effects of, say, 10, 20, 30, and 40 p/m of a certain chemical, if the linear or quadratic

regression of response on concentration is significant, or both are significant, no multiple comparison

procedure is necessary. All concentrations are significantly different in their effects. In fact, even 10 and 10.1

p/m also will be different. Of course, the difference between the effects of 10 and 10.1 p/m will be extremely

small. The usual significance test is not concerned with the magnitude of the difference, however. It is only

concerned about whether a true difference exists, no matter how small.

We have the following possible results with one factor (sketches of the mean response plotted against x = 10, 20, 30, 40 p/m):

(a) LR (NS), QR (NS);  (b) LR (S), QR (NS);  (c) LR (S), QR (S);  (d) LR (NS), QR (S), with a maximum response y* at x = x*.

LR = linear regression; QR = quadratic regression; S = significant; NS = not significant

In (a), all treatments (infinitely many between 10 and 40 p/m) are the same, while in (b) and (c) all treatments are different. In (d), all treatments less than x* (the value of x that will maximize y) are different. We may want to estimate x* and construct confidence limits for it. If y* is the maximum response, we may be interested in finding the range of x that will give a response higher than (y* - A), where (y* - A) is an acceptably high yield. If it costs more to apply the factor x the higher its level is, we should take z as the

Table 1. Analysis of variance of a randomized block experiment to compare effects of several levels of a quantitative factor

Sources of variation            d.f.          s.s.      m.s.      F
Blocks (B)                      b-1           ss(b)     ms(b)     ms(b)/ms(e)
Treatments (T)                  t-1           ss(t)     ms(t)     ms(t)/ms(e)
  Linear regression             1             ss(lr)    ms(lr)    ms(lr)/ms(e)
  Quadratic reg. (additional)   1             ss(qr)    ms(qr)    ms(qr)/ms(e)
  Deviations from reg.          t-3           ss(dr)    ms(dr)    ms(dr)/ms(e)
Error (B x T)                   (b-1)(t-1)    ss(e)     ms(e)
Subsampling error               bt(m-1)       ss(s)     ms(s)
Total                           btm-1         ss(T)

response variable, where z is the yield per unit cost of application of x. These considerations are more meaningful than the question often asked by the naive experimenter: Among 10, 20, 30, and 40 p/m, which are different in their effects?

There are two options if the lowest level of x in the experiment is zero (control). We may fit a regression

curve to all levels (including zero), or we may isolate a single d.f. for the contrast between zero and nonzero

levels and fit a regression curve to the nonzero levels only. Quite often the regression is curvilinear in the first

option and linear in the second option. If this is so, the second method of analysis is preferable, especially if in

actual usage the factor x will not be applied at a level below the first nonzero level of the experiment.

For the linear regression model y = b0 + b1x, the estimated responses at x = x* and at x = x** are y* = b0 + b1x* and y** = b0 + b1x**, respectively. Therefore, the estimated difference in response at any two values x* and x** is equal to b1(x** - x*), and the variance of this estimated or predicted difference is (x** - x*)^2 (variance of b1). The formula for the variance of b1 is given in Equation (2.4). The 100(1 - a)% confidence interval for the true difference is b1(x** - x*) ± t(a;v) √[(x** - x*)^2 (estimated variance of b1)], where t(a;v) is the two-sided 100a% point of Student's t-distribution with v d.f.

For the quadratic regression model y = b0 + b1x + b2x^2, the estimated difference is b1(x** - x*) + b2(x**^2 - x*^2), with variance equal to [(x** - x*)^2 (variance of b1) + (x**^2 - x*^2)^2 (variance of b2) + 2(x** - x*)(x**^2 - x*^2) (covariance of b1 and b2)]. In a good regression computer program, the printout will include the estimated variances and covariances of the estimated regression coefficients.

Because linear relationships occur frequently, we will give the computational results for linear regression analysis. In general, let ȳi be the mean of the ni observations taken at xi, the ith level of the factor (i = 1, 2, . . ., t). (We are allowing unequal replications here. In Table 1, ni = bm, a constant.) The equation of the fitted line is y = b0 + b1x, where

b1 = [Σnixiȳi - (Σnixi)(Σniȳi)/N] / [Σnixi^2 - (Σnixi)^2/N],    (2.2)

b0 = [(Σniȳi) - b1(Σnixi)]/N,    (2.3)

and N = (n1 + n2 + . . . + nt), the total number of observations. (In the simplest linear regression problem, n1 = n2 = . . . = nt = 1, and the above formulas for the slope and intercept of the line will reduce to more familiar ones.) The s.s. for linear regression is (Num.)^2/Den., where "Num." and "Den." are the numerator and denominator, respectively, of the expression for b1 above. The s.s. for deviations from regression, now with (t - 2) d.f. if we are only fitting a straight line, is most conveniently obtained by subtracting ss(lr) from ss(t), the treatments s.s. Finally, the variance of b1 is

var(b1) = σ^2/[Σnixi^2 - (Σnixi)^2/N],    (2.4)

and σ^2 may be estimated by ms(e) in Table 1, or by ms(dr) if b = 1.

If the levels are replicated equally and spaced equally, the computations for obtaining the various s.s. for

regression will be simplified considerably by the use of orthogonal polynomials, shown in Table 2 for 3, 4, and

5 levels only. For more extensive tables and discussion of the method for getting the actual regression

equation, see Fisher and Yates (1963). If we look at t = 4 levels, say, in Table 2, we see that the three sets of


coefficients form a set of mutually orthogonal contrasts. (A polynomial curve of degree (t - 1) will pass through the t means exactly.) With these coefficients, we can obtain the s.s. for linear or quadratic regression, using Equation (2.1) in the previous section on orthogonal contrasts. An example follows.

Table 2. Orthogonal polynomials
(t = number of levels; d = degree of polynomial)

     t=3              t=4                   t=5
  d=1   d=2      d=1   d=2   d=3      d=1   d=2   d=3   d=4
  -1    +1       -3    +1    -1       -2    +2    -1    +1
   0    -2       -1    -1    +3       -1    -1    +2    -4
  +1    +1       +1    -1    -3        0    -2     0    +6
                 +3    +1    +1       +1    -1    -2    -4
                                      +2    +2    +1    +1
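The defining properties of these coefficients are easy to verify directly (an illustrative sketch for the t = 5 columns): each set of coefficients sums to zero, so it is a contrast, and distinct sets are mutually orthogonal.

```python
# Columns of Table 2 for t = 5, keyed by degree of polynomial.
table2_t5 = {
    1: [-2, -1, 0, 1, 2],      # linear
    2: [2, -1, -2, -1, 2],     # quadratic
    3: [-1, 2, 0, -2, 1],      # cubic
    4: [1, -4, 6, -4, 1],      # quartic
}
degrees = list(table2_t5)
for d in degrees:
    assert sum(table2_t5[d]) == 0          # each column is a contrast
for i, d1 in enumerate(degrees):
    for d2 in degrees[i + 1:]:
        dot = sum(a * b for a, b in zip(table2_t5[d1], table2_t5[d2]))
        assert dot == 0                    # columns mutually orthogonal
```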

Chew (1962) discussed published results of an experiment wherein the research worker erroneously

concluded that there were no treatment differences, through failure to partition the treatments d.f. and s.s.

Table 3 shows the analysis of variance and treatment means with b = 5 blocks, t = 4 treatments (0, 2, 4, and 6

degrees of angle), and m = 5 repeated measurements on each experimental unit. (The response was the force

in pounds required to separate a set of electrical connectors at various angles of pull.) The treatment means

show increasing response with increasing angles. Each treatment mean was an average of ni = bm = 25

observations. From the coefficients in Table 2, the means in Table 3 and Equation (2.1), we have the following

sums of squares for regression:

linear regression = 25[(-3)(41.94) + (-1)(42.36) + (1)(43.82) + (3)(46.30)]^2 / [(-3)^2 + (-1)^2 + (1)^2 + (3)^2] = 264.26

quadratic regression = 25[(1)(41.94) + (-1)(42.36) + (-1)(43.82) + (1)(46.30)]^2 / [(1)^2 + (-1)^2 + (-1)^2 + (1)^2] = 26.52

cubic regression = 25[(-1)(41.94) + (3)(42.36) + (-3)(43.82) + (1)(46.30)]^2 / [(-1)^2 + (3)^2 + (-3)^2 + (1)^2] = 0.01

In a two-tail test, the F-ratio for linear regression is significant at between the 2 1/2% and the 1% level. In a one-tail test it will be significant at between the 1 1/4% and the 1/2% level. (A one-tail test could be justified here.)

Table 3. Analysis of variance and means

Source of variation       d.f.    s.s.       m.s.      F
Blocks                    4       1234.83    308.71
Treatments:               3       290.79     96.93     2.56 (not sig.)
  Linear regression       1       264.26     264.26    6.97*
  Quadratic regression    1       26.52      26.52     <1
  Cubic regression        1       .01        .01       <1
Error                     12      455.03     37.92
Subsampling error         80      316.50     3.96
Total                     99      2297.15

x:            0       2       4       6
y:            41.94   42.36   43.82   46.30
Difference:       0.42    1.46    2.48


With ni = 25 and N = 100, the formulas for the slope and intercept give:

b1 = {(25)[0(41.94) + 2(42.36) + 4(43.82) + 6(46.30)] - (25)(12)(25)(174.42)/100} / {(25)(0 + 4 + 16 + 36) - [25(12)]^2/100} = 0.727;

b0 = [25(174.42) - 0.727(25)(12)]/100 = 41.424,

so that the equation is y = 41.424 + 0.727x.
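The worked example can be reproduced numerically (a sketch using the treatment means above with Equations (2.1), (2.2), and (2.3)):

```python
# Treatment means at x = 0, 2, 4, 6 degrees, each based on n = 25
# observations (N = 100), as in Table 3.
x = [0, 2, 4, 6]
ybar = [41.94, 42.36, 43.82, 46.30]
n, N = 25, 100

# Equation (2.1): ss = n * (sum c_i * ybar_i)^2 / sum c_i^2, using the
# orthogonal-polynomial coefficients of Table 2 for t = 4.
def contrast_ss(coeffs):
    num = sum(c * y for c, y in zip(coeffs, ybar))
    return n * num ** 2 / sum(c * c for c in coeffs)

ss_linear = contrast_ss([-3, -1, 1, 3])   # 264.26, as in Table 3
ss_quad = contrast_ss([1, -1, -1, 1])     # 26.52

# Equations (2.2)-(2.3) for the slope and intercept.
Sx = n * sum(x)                                  # sum of n_i x_i
Sy = n * sum(ybar)                               # sum of n_i ybar_i
Sxy = n * sum(a * b for a, b in zip(x, ybar))    # sum of n_i x_i ybar_i
Sxx = n * sum(a * a for a in x)                  # sum of n_i x_i^2
b1 = (Sxy - Sx * Sy / N) / (Sxx - Sx ** 2 / N)   # 0.727
b0 = (Sy - b1 * Sx) / N                          # 41.424
```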

Since regression is significant, no multiple comparisons are necessary. The treatments are ALL different (in their effects). For example, 0 and 2 degrees are different (without testing), as well as 0 and 1 degree or even 0 and 0.1 degree. This equation gives an estimate of y for any given x; and, clearly, for two different values of x, the equation gives different values of y. The difference in response at x = x* from that at x = x** is

y(at x**) - y(at x*) = 0.727(x** - x*),

and its estimated variance is (x** - x*)^2 (37.92)/{(25)(56) - [25(12)]^2/100} = 0.0758(x** - x*)^2, using Equation (2.4) for the variance of b1. The 95% confidence interval for the difference in the two responses corresponding to a unit difference in the x values is 0.727 ± 2.179 √0.0758 = 0.727 ± .600 = (.127, 1.327).
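The interval computation is short enough to sketch directly (values from the worked example; t(.05, 12) = 2.179 is the two-sided 5% point for the 12 error d.f.):

```python
# Estimated variance of b1 from Equation (2.4), with ms(e) = 37.92.
ms_error = 37.92
var_b1 = ms_error / (25 * 56 - (25 * 12) ** 2 / 100)   # about 0.0758

# 95% confidence interval for the response difference per unit change in x.
half_width = 2.179 * var_b1 ** 0.5
ci = (0.727 - half_width, 0.727 + half_width)          # about (0.127, 1.327)
```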

If the observed means of the t levels are in increasing (or decreasing) order and t is at least four, no

further statistical test is necessary to establish significance of treatment effects, if it is known a priori that the

effect of treatment, if any, is to increase (or decrease) the response, for the probability of the t means falling in that order under the null hypothesis is 1/(t!) ≤ 1/24 if t ≥ 4, which is significant at the conventional 5% level. If

there is no prior knowledge of the direction of the treatment effect, a two-sided test is necessary and t has to

be at least five for the ordering of the t means to be significant at the 5% level.
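The counting argument above is easy to check numerically (a small sketch): under the null hypothesis all t! orderings of the observed means are equally likely, so a monotone ordering in the predicted direction has probability 1/t!, or 2/t! when either direction would count.

```python
from math import factorial

# One-sided (direction known a priori) and two-sided ordering probabilities.
one_sided = {t: 1 / factorial(t) for t in (4, 5)}
two_sided = {t: 2 / factorial(t) for t in (4, 5)}

assert one_sided[4] < 0.05   # 1/24: significant one-sided with t = 4
assert two_sided[4] > 0.05   # 2/24: not significant two-sided with t = 4
assert two_sided[5] < 0.05   # 2/120: t = 5 needed for the two-sided case
```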

For a criticism of the widespread misuse of Duncan's multiple range test in agricultural research to

compare levels of a quantitative factor, see Mead and Pike (1975), particularly Section 2.2.

2.3.2. Two or More Factors

For one quantitative factor, we partition the treatments d.f. into linear, quadratic, cubic, etc., regression, which is equivalent to fitting a polynomial of the form y = b0 + b1x + b2x^2 + . . . + bdx^d, where y is the measured response and x is the level of the experimental factor. We similarly analyze two quantitative factors A and B. Denote the levels of A and B by x1 and x2, respectively. The following are the first and the second degree (or order) polynomials in two variables:

y = b0 + b1x1 + b2x2    (first order)

y = b0 + (b1x1 + b2x2) + (b11x1^2 + b12x1x2 + b22x2^2)    (second order)

In the second order polynomial, the coefficients b11, b12, b22 could have been replaced by b3, b4, b5. The double subscript, however, reminds us that these are the coefficients for the quadratic terms. Just as the second order model is obtained from the first order model by adding the second order (or quadratic) terms, we similarly obtain the third order model by adding the cubic terms (b111x1^3 + b112x1^2x2 + b122x1x2^2 + b222x2^3) to the second order model.

In partitioning the d.f. in a 2 x 2 factorial, we are in essence fitting the model y = b0 + b1x1 + b2x2 + b12x1x2, an incomplete second order model. (With only two levels, we cannot estimate squared terms.)

In a 3 x 3 factorial, the 2 d.f. for each of the two main effects may be further partitioned into linear and quadratic terms. The 4 d.f. for the A x B interaction may be partitioned into products of the linear and quadratic terms of the main effects. Therefore, we are fitting the model

y = b0 + (b1x1 + b11x1^2) + (b2x2 + b22x2^2) + (b12x1x2 + b122x1x2^2 + b112x1^2x2 + b1122x1^2x2^2),
        (main effects of A)   (main effects of B)   (interaction A x B)

which is a second order model plus two cubic and one quartic terms.

Table 4 gives the analysis of variance of a randomized block experiment with b blocks and t treatments,

with the t treatments forming a p x q factorial. This table should be compared with Table 1 for one

quantitative factor. (If m measurements were made on each experimental unit, we will assume that they have

been averaged; otherwise, there will be an extra line in the analysis of variance, as in Table 1.) The 2 d.f. for linear regression may be further partitioned to show the individual contributions from x1 and x2 separately. They are partitioned similarly for quadratic and cubic regressions. The sums of squares in the s.s. column

usually are called the sequential sums of squares. For example, ss(qr) is not the total quadratic regression s.s.; it is the additional s.s. after fitting a linear model. In other words, ss(qr) is the difference in regression sums of squares between fitting a linear model and a full quadratic model. If the true model (true state of nature) is linear, ms(qr), ms(cr), and ms(lof) will be almost the same as ms(e), the error m.s. The quadratic model has 5 coefficients (other than the intercept b0); therefore, it has 5 d.f. and its s.s. is obtained by adding ss(lr) and ss(qr). If p = q = 5 (i.e., a 5 x 5 factorial), t = pq = 25 and "lack of fit" has (t - 10) = 15 d.f. If we are certain that a cubic model is adequate, and this is usually so, we do not need any replication. We can use ms(lof) as the error m.s. in making tests of significance. With replication, however, we can test the cubic model. The extension of Table 4 to three or more quantitative factors should be obvious.

Table 4. Analysis of variance of a randomized block experiment with 2 quantitative factors

Sources of variation            d.f.          s.s.       m.s.
Blocks (B)                      b-1           ss(b)      ms(b)
Treatments (T)                  t-1           ss(t)
  Linear regression             2             ss(lr)     ms(lr)
  Quadratic reg. (additional)   3             ss(qr)     ms(qr)
  Cubic reg. (additional)       4             ss(cr)     ms(cr)
  Lack of fit                   t-10          ss(lof)    ms(lof)
Error (B x T)                   (b-1)(t-1)    ss(e)      ms(e)
Total                           bt-1          ss(T)

Since getting the various s.s. is extremely tedious on a desk calculator, a computer is necessary. If the

levels of A and B are equally replicated and equally spaced (e.g., 5, 10, and 15 units for A and 100, 200, and 300

p/m for B), we can use orthogonal polynomials, as in the one-factor case. We illustrate this with a 3 x 3

factorial. From Section 2.3.1, we know how to obtain the linear and quadratic regression s.s. for A and for B, using either the means or the sums for the levels of A and of B. Table 5 gives the coefficients for getting the s.s. corresponding to x1x2, x1^2x2, x1x2^2, and x1^2x2^2. The coefficients will operate on the treatment means as usual. For example, if we denote the treatment means by ȳ1, . . ., ȳ9 in the order shown in Table 5, the s.s. corresponding to x1x2 (or AL x BL) is, from Equation (2.1) in Section 2.1, equal to b(ȳ1 - ȳ3 - ȳ7 + ȳ9)^2/4, where b is the number of observations in each mean. We also can use the coefficients in Table 5 to get the s.s. for AL,

AQ, BL and BQ, but these can be obtained more easily from the three means for the three levels of A, and

similarly for B. The reader should verify that the coefficients for the components of the main effects are

similar to those given in Table 2. As before, the coefficients for interactions are the products of corresponding

coefficients for the main effects. With Table 5 as an example, the reader should have no difficulty in extending

this to a 3 x 4 or 4 x 5 factorial, or to more than two factors. As an exercise, the reader should write the

coefficients for a 2 x 3 x 3 factorial.

Table 5. Orthogonal polynomials for 3 x 3 factorial (equally spaced)

                                  Treatments
                         A=1          A=2          A=3
                     B:  1  2  3      1  2  3      1  2  3
x1       or AL:         -1 -1 -1      0  0  0      1  1  1
x1^2     or AQ:          1  1  1     -2 -2 -2      1  1  1
x2       or BL:         -1  0  1     -1  0  1     -1  0  1
x2^2     or BQ:          1 -2  1      1 -2  1      1 -2  1
x1x2     or AL x BL:     1  0 -1      0  0  0     -1  0  1
x1^2x2   or AQ x BL:    -1  0  1      2  0 -2     -1  0  1
x1x2^2   or AL x BQ:    -1  2 -1      0  0  0      1 -2  1
x1^2x2^2 or AQ x BQ:     1 -2  1     -2  4 -2      1 -2  1
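The product rule for interaction coefficients can be checked directly (a sketch; with B varying fastest within A, element-wise products of the main-effect rows are Kronecker products of the t = 3 coefficients in Table 2):

```python
# Kronecker product of two coefficient vectors (B varies fastest within A).
def kron(a, b):
    return [ai * bj for ai in a for bj in b]

# t = 3 orthogonal-polynomial coefficients from Table 2.
AL, AQ = [-1, 0, 1], [1, -2, 1]
BL, BQ = [-1, 0, 1], [1, -2, 1]

# These reproduce the interaction rows of Table 5.
assert kron(AL, BL) == [1, 0, -1, 0, 0, 0, -1, 0, 1]     # AL x BL
assert kron(AQ, BL) == [-1, 0, 1, 2, 0, -2, -1, 0, 1]    # AQ x BL
assert kron(AL, BQ) == [-1, 2, -1, 0, 0, 0, 1, -2, 1]    # AL x BQ
assert kron(AQ, BQ) == [1, -2, 1, -2, 4, -2, 1, -2, 1]   # AQ x BQ
```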

As in the one-factor case, if regression (whether linear or quadratic) is significant, then all treatments

are different and no multiple comparison procedure is necessary. Suppose a second order model is necessary

and sufficient. We can use this model for interpolation; i.e., to predict the response y at any point within the

range of the values of the two factors used in the experiment. Polynomials are notoriously bad for extrapola-

tion. We also can find the combination of values of x, and x2 that will optimize (maximize or minimize) y. To do

this, we differentiate y with respect to x1 and x2, set these two derivatives to zero, and solve the two resulting equations. The solution is:

x1* = (2b1b22 - b2b12)/(b12^2 - 4b11b22)

x2* = (2b2b11 - b1b12)/(b12^2 - 4b11b22).

These values of x1* and x2* (if the true values of the b's are known) will optimize y. The estimated optimum value of y is obtained by putting the estimated values of x1* and x2* (in terms of the estimated b's) into the second order model.
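The closed-form solution can be verified by checking that both partial derivatives vanish at the stationary point (a sketch with hypothetical coefficients, not values from the text):

```python
# Hypothetical fitted second order surface
#   y = b0 + b1*x1 + b2*x2 + b11*x1^2 + b12*x1*x2 + b22*x2^2.
b1, b2, b11, b12, b22 = 2.0, 3.0, -1.0, 0.5, -2.0

# Stationary point from the closed-form solution above.
den = b12 ** 2 - 4 * b11 * b22
x1_opt = (2 * b1 * b22 - b2 * b12) / den
x2_opt = (2 * b2 * b11 - b1 * b12) / den

# Both partial derivatives are zero at (x1_opt, x2_opt).
d1 = b1 + 2 * b11 * x1_opt + b12 * x2_opt
d2 = b2 + 2 * b22 * x2_opt + b12 * x1_opt
assert abs(d1) < 1e-12 and abs(d2) < 1e-12
```

Here b11 < 0 and b22 < 0, so the stationary point is a maximum; with other signs it could be a minimum or a saddle point.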

If the two factors are two kinds of fertilizers, say, the optimum y may require such a large amount of both

fertilizers that it will not be economically optimum. Instead of fitting a model to the yield y, perhaps we should

fit a model to z, the yield per dollar of fertilizers applied, and optimize z.

If the response surface (value of y as x, and x2 vary) is highly peaked at the optimum, we should not stray

far from the optimum combination of x, and x2 because y will drop sharply. On the other hand, if the response

surface is rather flat near the optimum, we can depart from the optimum condition without any appreciable

decrease in y and the other combinations may be more convenient. One way to study the response surface is to

draw contours. Suppose the estimated optimum value of y is 138, say. We can set y = 135, 130, 125, etc., in the second order model. These values will give us the sets of values of x1 and x2 that will give an estimated yield of 135, 130, etc.

We also can use the equation to estimate the difference in the response at two different points. For example, for the same value of x1 but different values of x2 (x2* and x2', say), the difference in the responses is y(x1,x2*) - y(x1,x2') = (x2* - x2')b2 + x1(x2* - x2')b12 + (x2*^2 - x2'^2)b22, and its variance is (x2* - x2')^2 V(b2) + x1^2(x2* - x2')^2 V(b12) + (x2*^2 - x2'^2)^2 V(b22) + 2x1(x2* - x2')^2 Cov(b2,b12) + 2(x2* - x2')(x2*^2 - x2'^2) Cov(b2,b22) + 2x1(x2* - x2')(x2*^2 - x2'^2) Cov(b12,b22). Similarly, we can estimate y(x1*,x2) - y(x1',x2) and y(x1*,x2*) - y(x1',x2'), and their standard errors. Variances and covariances of the regression coefficients will be included in the computer printout from a good regression analysis program.

We conclude by mentioning a question of experimental design. Box and Wilson (1951) pointed out that

the squared terms in the second order model are estimated with relatively low precision in a 3 x 3 factorial.

Box and his coworkers have developed so-called response surface designs. The texts mentioned previously

for fractional factorials also contain discussion on response surface methodology. Further references are Box

and Hunter (1958) and Myers (1971).

2.4. Mixed Factors

Consider two factors A and B, with p and q levels respectively, where A is qualitative and B is

quantitative. An example would be an experiment comparing several varieties of peanuts and several rates of

a fertilizer, or destruction rates of a certain bacterium at different temperatures, using several culture media.

Table 6 shows the analysis of variance of a randomized block experiment, showing the partitioning of the

d.f. for the pq treatments. We have partitioned the d.f. for the main effects of B into linear and quadratic

regression only, but a higher polynomial also may be fitted. If the levels of B are spaced equally, ss(BL) and

ss(BQ) will be easy to get, using orthogonal polynomials, and ss(BR) will be obtained by difference, using ss(B).

If the levels of A are such that meaningful orthogonal contrasts can be formed among them (before looking at

the data), we should partition its d.f. accordingly, and also the d.f. for A x BL, etc.

[Pages missing or unavailable]

If a = .05, this equation gives E = .05, .0975, .1426, .1855, .2263, .2649, .3017, .3366, .3698, .5124, and .6227

for t = 2, 3,. . 9, 10, 15, and 20, respectively. Thus, if we test each of the 9 orthogonal comparisons at the

5% level, in an experiment with t = 10 treatments (and the null hypothesis Ho is true), the probability of

rejecting (incorrectly) one or more comparisons is 36.98%. The overall protection against incorrectly reject-

ing any of the nine comparisons is 63.02% in this example.

If E = .05, the preceding equation gives a = .05, .0253, .0169, .0127, .0057, .0037, and .0028 for t = 2, 3, 4,

5, 10, 15, and 20, respectively. Thus, if we wish to hold the experimentwise error rate to 5% (i.e., 5%

probability of rejecting one or more orthogonal comparisons in an experiment where all treatments are equal

or, equivalently, 95% protection against incorrectly rejecting any comparison), we have to make each

comparison at a = .0057 (i.e., the 0.57% level) if there are 10 treatments in the experiment.
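The equation referred to above falls on the missing pages, but the listed values are consistent with the standard independence-based relation for t - 1 orthogonal comparisons, E = 1 - (1 - a)^(t-1), and its inverse; a sketch under that assumption:

```python
# Experimentwise rate E from a comparisonwise rate a, for t - 1
# independent (orthogonal) comparisons, and the inverse conversion.
def experimentwise(a, t):
    return 1 - (1 - a) ** (t - 1)

def comparisonwise(E, t):
    return 1 - (1 - E) ** (1 / (t - 1))

# Values quoted in the text for t = 10 treatments (9 comparisons).
assert round(experimentwise(0.05, 10), 4) == 0.3698   # test each at 5%
assert round(comparisonwise(0.05, 10), 4) == 0.0057   # hold E at 5%
```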

There is no rigid rule or criterion that enables us to decide whether a comparisonwise or an experimentwise error rate is more appropriate. It is mostly a subjective choice. An experimentwise rate is more

conservative in that fewer Type I errors (false significance) will be made; however, more Type II errors

(failure to detect true differences) will be made. A similar problem exists in choosing the significance level a

in the simple two-treatment case. Should a be taken to be .05 or .01? In situations where incorrectly rejecting

one comparison may vitiate the entire experiment or incorrectly rejecting one comparison is as serious as

incorrectly rejecting 10 comparisons, an experimentwise error rate is more pertinent. A comparisonwise

error rate should be used if one faulty inference does not affect the remaining inferences from the same

experiment. The author favors comparisonwise error rates in general. For further discussion of error rates,

see Tukey (1953b), Harter (1957), and Federer (1961).

We shall now describe the multiple comparison procedures in turn. Some textbooks that contain a

discussion of this topic are Federer (1955), Steel and Torrie (1960), Scheffé (1959), Seeger (1966), Kirk (1968),

Bancroft (1968), and Miller (1966). Some review papers on this topic are Hartley (1955), Cornell (1971), Gill

(1973), Games (1971), Ryan (1959), O'Neill and Wetherill (1971), Thomas (1973), Waldo (1976), etc. The

O'Neill and Wetherill paper has a bibliography of 234 references, classified into 15 categories (multiple range

tests, error rates, simultaneous confidence intervals, etc.). Thomas has an unpublished bibliography on

multiple comparison techniques (available from him) containing about 300 references up to 1970.

3.2. Fisher's Protected and Unprotected LSD Methods

Fisher's protected LSD (least significant difference) procedure is to be applied only if the overall F test

for treatments is significant. It consists of applying the ordinary Student's t test to any pair of means yi and yj.

Let s2 be the error mean square (with v degrees of freedom) from the analysis of variance table, and ni and nj

be the number of replications of treatments i and j, respectively. The two treatments will be declared different if the two observed means yi and yj differ (in absolute magnitude) by more than the LSD given by

LSD = t(a,v) √[s^2((1/ni) + (1/nj))],    (3.2)

where t(a,v) is the tabulated two-sided 100a% point of the t-distribution with v degrees of freedom; e.g., t(.05, 30) = 2.04.

Besides permitting unequally replicated treatments, the procedure is applicable for interval estimation. Thus, the 100(1 - a)% confidence interval for (μi - μj) is (yi - yj) ± LSD. (Note that if the difference between yi and yj is less than the LSD, the confidence limits will have different signs so that the hypothesis of equal means is accepted. Recall the connection between hypothesis testing and interval estimation mentioned in chapter 1.) A third desirable feature is its ease of application, especially if all treatments are replicated equally. The LSD for all pairs of treatments is t(a,v) √(2s^2/n), where n is the common number of replications. (It is possible for the overall F test to be significant but none of the t tests for the pairwise differences to be significant. See Miller (1966, page 91).)

To illustrate the method we will use the data in Duncan (1955) from a randomized block experiment with

six blocks and seven treatments (varieties of barley). The analysis of variance gave a treatments mean square

of 366.97 (with 6 d.f.), an error mean square (s2) of 79.64 (with v = 30 d.f.), with a highly significant F ratio of

4.61. The means (in bushels per acre) of the seven varieties, given below, have been relabeled A through G in

increasing order.

49.6   58.1   61.0   61.5   67.6   71.2   71.3
 A      B      C      D      E      F      G

With v = 30 and taking a to be 0.05, t(a,v) = 2.04 and the LSD = 2.04 x √[2(79.64)/6] = 10.51. Any two means differing by more than 10.51 will be significantly different at the 5% level. We systematically test G - A, G - B, G - C, G - D, G - E, G - F; F - A, F - B, . . ., F - E; E - A, . . ., E - D; . . .; B - A. In practice, of course, we may not need to test all possible pairs. For example, once we have found G - C = 10.3 to be less than the LSD, we need not test G - D, G - E, and G - F, for these cannot be significant. The results usually are presented by underscoring (means underscored by the same line are not significantly different) or by using superscripts (means having the same superscript are not significantly different). For the preceding example, the results are as follows:

49.6c   58.1bc   61.0ab   61.5ab   67.6ab   71.2a   71.3a
 A       B        C        D        E        F       G

Another way of presenting the results, which is typographically convenient, is to group the means as follows: (A,B), (B,C,D,E), and (C,D,E,F,G). Means in the same parentheses are not different. There were seven significant differences (G-A, G-B, F-A, F-B, E-A, D-A, and C-A). An unpleasant feature of many multiple comparison procedures is the lack of "transitivity." In the preceding example, (A and B) and (B and C) were the same, but A and C were different.
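The LSD comparisons above can be sketched in a few lines (an illustrative implementation using the barley means, s^2 = 79.64, n = 6, and t(.05, 30) = 2.04 as in the text):

```python
# Barley variety means (bushels per acre), labeled A..G in increasing order.
means = {"A": 49.6, "B": 58.1, "C": 61.0, "D": 61.5,
         "E": 67.6, "F": 71.2, "G": 71.3}

# Equation (3.2) with equal replication: LSD = t * sqrt(2 s^2 / n).
lsd = 2.04 * (2 * 79.64 / 6) ** 0.5        # 10.51

# All pairwise comparisons against the constant yardstick.
labels = sorted(means)
significant = [(i, j) for i in labels for j in labels
               if i < j and abs(means[j] - means[i]) > lsd]
# Seven pairs exceed the LSD: AC, AD, AE, AF, AG, BF, BG.
```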

This procedure is satisfactory if Ho is true. However, suppose Ho is false such that all means but one are

equal, and this single mean is much larger (or much smaller) than the other (t -1) means. The overall F-test

will be significant, and repeated t-tests applied to the (t-1) equal means will have a large probability of declaring some of these (t-1) means to be unequal. This objection is removed in the Newman-Keuls'

procedure, to be discussed in Section 3.3.

In the unprotected LSD method, a preliminary F test need not be carried out at all, but the error rate for each individual comparison is reduced to a/m, where m is the total number of comparisons (preferably specified in advance) that we wish to make among the t treatments. If we restrict ourselves to orthogonal contrasts, m = (t-1); if we make all possible pairwise comparisons, m = t(t-1)/2. More generally, we can budget m different error rates a1, a2, . . ., am for the m contrasts, where these add up to a. If it is more serious to incorrectly reject the i-th contrast than the j-th contrast, we would choose ai < aj. It can be shown (using the so-called Bonferroni inequality) that the experimentwise error rate E is at most a.

Percentage points of the t-distribution for carrying out Fisher's unprotected LSD procedure may be found in Table A in the appendix, reproduced from Dunn (1961). Alternatively, Scheffé (1959, page 80) gives the following approximation (due to A. M. Peiser) for the upper (one-sided) a point of the t distribution with v d.f.:

t(a,v) = z(a) + (4v)^(-1) [z(a) + z(a)^3],

where z(a) denotes the upper a point of the standard normal distribution; e.g., z(.05) = 1.645.
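Peiser's approximation can be checked against a tabled t value (a sketch using the standard-library normal quantile):

```python
from statistics import NormalDist

# Peiser's approximation for the upper one-sided a point of the
# t distribution with v d.f.: t(a, v) ~ z + (z + z^3)/(4v).
def peiser_t(a, v):
    z = NormalDist().inv_cdf(1 - a)
    return z + (z + z ** 3) / (4 * v)

# Upper 2.5% point with v = 30 d.f.; the tabulated two-sided 5% value
# quoted earlier in the text is about 2.04.
approx = peiser_t(0.025, 30)
```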

3.3. Newman-Keuls' Multiple Range Test

This method is applicable only in situations where all t treatments are equally replicated n times. As in

Section 3.2, s2 is the error mean square with v degrees of freedom. This method does not have a prior

significant F test as a prerequisite. To apply the method, we arrange the means in ascending order, but

instead of comparing the difference between any two means with a constant least significant difference (as in

Section 3.2), we test it against a variable yardstick

Wp = q(a; p, v) √(s^2/n),    (3.3)

where p (= 2, 3, . . ., t) is the number of means whose range (i.e., largest minus smallest) we are testing, and q(a; p, v) is the upper 100a% point of q(p, v), the distribution of the studentized range of p means and v degrees of freedom. Values of q(a; p, v) are tabulated in Pearson and Hartley (1966) and Harter (1960a). They are reproduced in condensed form in the Appendix (Table B), Beyer (1968), Miller (1966), Steel and Torrie (1960), etc.

For the numerical example in Section 3.2, t = 7, ν = 30, and √(s^2/n) = √(79.64/6) = 3.643. For α = .05, the values of q are:

p:               2      3      4      5      6      7
q(.05; p, 30):   2.89   3.49   3.85   4.10   4.30   4.46
W_p = 3.643q:    10.53  12.71  14.03  14.94  15.66  16.25

Fisher's LSD and W_2 are identical. We test G-A against W_7 = 16.25 since G-A is the range of 7 means. There are 2 ranges of 6 means (viz., G-B and F-A), and these are compared with W_6 = 15.66. Similarly, we test the three five-mean ranges G-C, F-B, E-A against W_5 = 14.94; G-D, F-C, E-B, D-A against W_4 = 14.03; G-E, F-D, E-C, D-B, C-A against W_3 = 12.71; and G-F, F-E, E-D, D-C, C-B, B-A against W_2 = 10.53. In practice, we need to perform far fewer tests than these, for once two means are judged to be not different, they are underscored by a line, and no further testing is made among means that lie between the two means so underscored. We need only test G-A = 21.7 > W_7, G-B = 13.2 < W_6 (underscore), F-A = 21.6 > W_6, E-A = 18.0 > W_5, and D-A = 11.9 < W_4 (underscore). No further testing is necessary. The results are as follows:

Aa Bab Cab Dab Eb Fb Gb, or (A, B, C, D) and (B, C, D, E, F, G).
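The stepwise range testing and underscoring just described can be sketched in Python (a minimal illustration: the means and W_p values are those of the worked example, but the data structures are ours, not part of the original text):

```python
# Newman-Keuls sketch: means sorted ascending; w[p] is the critical
# range W_p for a span of p means (values from the worked example).
means = {"A": 49.6, "B": 58.1, "C": 61.0, "D": 61.5,
         "E": 67.6, "F": 71.2, "G": 71.3}
w = {2: 10.53, 3: 12.71, 4: 14.03, 5: 14.94, 6: 15.66, 7: 16.25}

labels = sorted(means, key=means.get)
y = [means[k] for k in labels]
t = len(y)

significant = []     # pairs declared different
homogeneous = []     # index spans "underscored" as not different

def covered(i, j):
    """True if the span (i, j) lies inside an underscored span."""
    return any(lo <= i and j <= hi for lo, hi in homogeneous)

for p in range(t, 1, -1):          # widest ranges first
    for i in range(t - p + 1):
        j = i + p - 1
        if covered(i, j):          # no testing inside an underscore
            continue
        if y[j] - y[i] > w[p]:
            significant.append((labels[i], labels[j]))
        else:
            homogeneous.append((i, j))

# significant -> [('A', 'G'), ('A', 'F'), ('A', 'E')], i.e., the three
# pairs G-A, F-A, E-A found above.
```

The loop ends up performing only the five tests listed in the text; every other range is skipped because it lies inside an underscored span.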

This method gives only 3 significant pairs (G-A, F-A, and E-A), compared to 7 pairs from the LSD method. The Newman-Keuls procedure is intuitively more appealing than the LSD method: one feels that the difference between the extremes of 7 means should pass a more stringent test than the difference between the extremes of, say, 3 means. The method has the disadvantage of not being amenable to interval estimation. The error rate is confusing because it is neither experimentwise nor comparisonwise. At each stage of testing (range of t means, (t-1) means, etc.), the probability of rejecting the hypothesis of equal means, if true, is α.

3.4. Tukey's HSD Method and Multiple Range Test

Tukey's original HSD (honestly significant difference) procedure (1951, 1953) requires equal replications. It has the simplicity of Fisher's LSD method in having a constant yardstick with which to test all pairs of treatment means. The HSD is calculated as the W_p of the Newman-Keuls procedure, with p taken at its maximum value (i.e., with p = t, the total number of treatments). Thus, two treatments are declared to be different (in their effects) if the absolute magnitude of the difference between their means exceeds

HSD = W_t = q(α; t, ν)√(s^2/n),    (3.4a)

where the symbols are as in Equation (3.3).

In the previous example, with t = 7 treatments, error mean square s^2 = 79.64 with ν = 30 d.f., and n = 6 replications, the HSD = q(.05; 7, 30) × 3.643 = 4.46 × 3.643 = 16.25, if α = .05. Testing the difference between every pair of means against 16.25, we get results that are identical to those given by the Newman-Keuls procedure. In general, we shall get fewer significant differences from Tukey's method. Since the error rate of Tukey's HSD method is experimentwise, Hartley (1955) recommends that α be taken as 0.10 or higher.
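A short sketch of the HSD computation (Python, with the example's numbers; the variable names are ours):

```python
import math
import itertools

means = {"A": 49.6, "B": 58.1, "C": 61.0, "D": 61.5,
         "E": 67.6, "F": 71.2, "G": 71.3}
s2, n = 79.64, 6
q_crit = 4.46                         # tabled q(.05; 7, 30)

hsd = q_crit * math.sqrt(s2 / n)      # 4.46 * 3.643 = 16.25

# Every pair whose means differ by more than the single yardstick HSD:
different = [(a, b) for a, b in itertools.combinations(means, 2)
             if abs(means[a] - means[b]) > hsd]
# -> [('A', 'E'), ('A', 'F'), ('A', 'G')]
```

As the text notes, the flagged pairs coincide with the Newman-Keuls result for these data.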

Tukey's HSD procedure also can be used to construct simultaneous confidence intervals for all pairs of

treatment differences as follows:

Prob.{(μ_i - μ_j) lies within (ȳ_i - ȳ_j) ± HSD: i, j = 1, 2, …, t} = (1 - α).    (3.4b)

In words, Equation (3.4b) states that the probability is 0.95 that all of the following statements are true:

μ_G - μ_A = (71.3 - 49.6) ± 16.25;  μ_G - μ_B = (71.3 - 58.1) ± 16.25;  …;
μ_G - μ_F = (71.3 - 71.2) ± 16.25;  μ_F - μ_A = (71.2 - 49.6) ± 16.25;  …;
μ_F - μ_E = (71.2 - 67.6) ± 16.25;  …;  μ_B - μ_A = (58.1 - 49.6) ± 16.25.

Equation (3.4b) can be generalized to simultaneous confidence intervals for linear contrasts among the t treatment population means, as shown in Equation (3.4c):

Prob.{Σ(i=1 to t) c_i μ_i lies within Σ(i=1 to t) c_i ȳ_i ± ½(HSD) Σ(i=1 to t) |c_i|} = (1 - α),    (3.4c)

for all sets of coefficients (c_1, c_2, …, c_t) satisfying Σc_i = 0. (There is an uncountable infinity of such sets.) Equation (3.4c) immediately reduces to (3.4b) if the contrast is a pairwise difference, for then one coefficient is +1, another is -1, and the rest are zero. Equation (3.4c) also enables us to test a more general hypothesis H_0: Σc_iμ_i = d (specified). We reject H_0 if the confidence limits for the contrast exclude d. Gabriel (1964) shows that at least one contrast will be significant if, and only if, the overall F test is significant. This is not true if the contrasts are restricted to paired differences only.

To overcome the conservativeness of his HSD procedure, Tukey also has proposed a multiple range test, using the average of his HSD and the Newman-Keuls statistic as the test criterion. Thus, the range of p ranked means is tested against

½[q(α; p, ν) + q(α; t, ν)]√(s^2/n).    (3.4d)

Spjøtvoll and Stoline (1973) and Hochberg (1975, 1976) have extended Tukey's HSD procedure to allow unequal variances or unequal sample sizes. If sample sizes are unequal, two approximate procedures are to use the harmonic mean of the sample sizes (the reciprocal of the arithmetic mean of the reciprocals of the sample sizes) or to replace the estimated variance of a mean (s^2/n) in Equation (3.4a) by the average of the variances of the two means concerned, viz., s^2[(1/n_i) + (1/n_j)]/2, as in Kramer's (1956) modification of Duncan's multiple range test. Keselman, Toothaker, and Shooter (1975) found that these two methods "have the same sensitivity for detecting real mean differences."

3.5. Scheffe's Method

Like Tukey's HSD, Scheffé's (1953) procedure is applicable to general contrasts, and not just paired comparisons. Since it employs an experimentwise error rate, Scheffé (1959, page 71) suggests taking α = .10. Scheffé's procedure is more general than Tukey's in being able to handle unequal replications. Let n_i be the number of replications of the i-th treatment. The contrast C = Σ(i=1 to t) c_i μ_i will be estimated by Ĉ = Σc_i ȳ_i, with variance estimated by

V̂(Ĉ) = s^2 Σ(c_i^2/n_i),    (3.5a)

where s^2 is the error mean square (from the analysis of variance table) with ν degrees of freedom, say. The 100(1-α)% simultaneous confidence intervals for all contrasts C (an uncountable infinity of them, obtainable by varying the set of coefficients c_1, c_2, …, c_t) are

Ĉ ± √[(t-1)·F(α; t-1, ν)·V̂(Ĉ)],    (3.5b)

where F(α; t-1, ν) is the upper (100α)% point of the F-distribution with (t-1) and ν degrees of freedom (for numerator and denominator, respectively). As an example, F(.05; 6, 30) = 2.42. For pairwise differences (Σc_i^2 = 2) and equal replications (n_i = n), Equation (3.5a) reduces to

V̂(ȳ_i - ȳ_j) = 2s^2/n.    (3.5c)

From Equation (3.5b), the 100(1-α)% simultaneous confidence interval for all paired differences (μ_i - μ_j) (for all i and j) is

(ȳ_i - ȳ_j) ± √[(t-1)·F(α; t-1, ν)·(2s^2/n)].    (3.5d)

Equation (3.5d) can be used to test the significance of the difference between two means μ_i and μ_j. We declare these to be different if the sample means ȳ_i and ȳ_j differ in absolute magnitude by an amount exceeding

S = √[(t-1)·F(α; t-1, ν)·(2s^2/n)].    (3.5e)

For t = 2 treatments, S above is identical with the LSD, since √F(α; 1, ν) = t(α, ν). Using the relationship between hypothesis testing and interval estimation, we can test the general null hypothesis H_0: Σc_iμ_i = d (specified) by seeing whether d falls inside or outside the interval given in Equation (3.5b).

For the previous numerical example, taking α = .05, we have S = √[6 × 2.42 × 2(79.64)/6] = 19.63. Two treatment sample means will be declared significantly different at the 5% level if their difference exceeds 19.63 in magnitude. (Note that this least significant difference is even larger than Tukey's HSD = 16.25. This is a general result: Tukey's procedure is preferred over Scheffé's for pairwise comparisons, but for general contrasts Scheffé's method gives a shorter interval.) Application of Scheffé's procedure to the previous numerical example gives the following results: (A, B, C, D, E) and (B, C, D, E, F, G). There are only two significant differences (G-A and F-A), compared to three differences from the Newman-Keuls and the Tukey procedures.
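The computation of S can be sketched as (Python, with the example's numbers; the names are ours):

```python
import math

t, nu, n, s2 = 7, 30, 6, 79.64
F_crit = 2.42                                 # tabled F(.05; 6, 30)

# Scheffe least significant difference for a pairwise comparison:
S = math.sqrt((t - 1) * F_crit * 2 * s2 / n)  # ~19.63

# Any two sample means differing by more than S are declared different;
# here only G-A = 21.7 and F-A = 21.6 exceed it.
```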

Equations (3.5a) and (3.5b) are directly applicable to situations where the sample means have unequal variances because of unequal replications, assuming that single observations are uncorrelated and have equal variances. For situations where the unequal variances of the sample means also may be caused by observations from the different treatments having unequal variances, Brown and Forsythe (1974) replace Equation (3.5a) by Σ(c_i^2 s_i^2/n_i), where s_i^2 is the sample variance of the i-th treatment, and F(α; t-1, ν) in Equation (3.5b) is replaced by F(α; t-1, f), where f is obtained using Satterthwaite's result on the d.f. of a linear combination of sample variances, as follows:

1/f = Σ(i) f_i^2/(n_i - 1),   where   f_i = (c_i^2 s_i^2/n_i)/Σ(j)(c_j^2 s_j^2/n_j).

For another approximation, see Spjøtvoll (1972).
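The Satterthwaite degrees-of-freedom formula above can be sketched as (an illustrative Python function, not the authors' code; the equal-variance check below is our own sanity test):

```python
def satterthwaite_df(c, s2, n):
    """Approximate d.f. of sum(c_i^2 * s2_i / n_i), per the formula
    above: 1/f = sum(f_i^2 / (n_i - 1))."""
    terms = [ci ** 2 * s2i / ni for ci, s2i, ni in zip(c, s2, n)]
    total = sum(terms)
    f = [term / total for term in terms]
    return 1.0 / sum(fi ** 2 / (ni - 1) for fi, ni in zip(f, n))

# Two equally replicated groups with equal variances recover the usual
# pooled d.f., n1 + n2 - 2 = 18:
df = satterthwaite_df([1, -1], [4.0, 4.0], [10, 10])
```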

If the sample means are correlated, Equation (3.5b) will still hold, but Equation (3.5a) must be modified to include the covariances of the sample means, as in Equation (3.5f).

Scheffé's method can be directly generalized to linear model situations, expressible in matrix notation as y = Xβ + e. This covers both multiple regression and analysis of variance models higher than just the one-way classification. The contrast C = Σc_i β_i will be estimated by Ĉ = Σc_i b_i, where the b_i's are the least squares estimates of the β_i's. The estimated variance of Ĉ is

V̂(Ĉ) = ΣΣ c_i c_j (estimated covariance of b_i, b_j).    (3.5f)

Most regression computer programs (e.g., the SAS package put out by North Carolina State University) include the estimated covariances of the estimated regression coefficients as part of the output. Equations (3.5b) and (3.5f) may now be used to construct simultaneous confidence intervals for linear contrasts or to make multiple comparisons among the β's.

3.6. Duncan's Methods

Of the several procedures that D.B. Duncan proposed between 1941 and 1975, we shall discuss only two: his most popular (the multiple range test) and his most recent (the Bayesian k-ratio LSD rule), which he hopes will supplant the former.

3.6.1. Multiple Range Test

This method assumes homoscedastic (equal variances) and uncorrelated means. It is very similar to the Newman-Keuls procedure, except that the protection level at each testing stage varies with p, the number of means whose range is being tested for significance. Duncan's rationale for decreasing the protection level as p increases is as follows. In experiments (factorial or otherwise) where the (p-1) degrees of freedom for the p treatments are partitioned into single degrees of freedom to correspond to (p-1) mutually orthogonal contrasts, the experimenter has no qualms about testing each contrast at the α level. Assuming for simplicity that the number of degrees of freedom for the error mean square is infinite (or quite large), the (p-1) F-ratios are (almost) statistically independent. Therefore, the probability of rejecting one or more contrasts, if all p means are equal, is

α_p = 1 - (1-α)^(p-1).    (3.6a)

Duncan (1955) modifies Newman-Keuls' multiple range test by using the variable level α_p as the significance level when testing the range of p means. As an illustration, with p = 9 equal means and α = .05, the probability of incorrectly rejecting one or more of 8 orthogonal contrasts is 1 - (.95)^8 = 1 - .6634 = .3366. This large probability of Type I error makes Duncan's multiple range test very powerful (large probability of detecting differences when they exist). Experimenters are often more interested in finding than in not finding significant differences among the treatments being tested. For this reason, Duncan's procedure received widespread acceptance among research workers, particularly in the agricultural sciences. As originally proposed, no preliminary significant overall F test is required. To overcome, somewhat, the objection of a possibly large Type I error probability, we may conservatively require a significant overall F test as a necessary condition for the application of the multiple range test.

In the Newman-Keuls procedure, the yardstick for testing the significance of the range of p means is W_p = q(α; p, ν)√(s^2/n). In Duncan's procedure, the yardstick is similar, except that α is replaced by α_p, defined by Equation (3.6a), giving the following "shortest significant range" criterion:

R_p = q(α_p; p, ν)√(s^2/n).    (3.6b)

Thus, no special tables are required if we have extensive tables of q(p, ν), the distribution of the studentized range of p means and ν d.f. However, the percentiles α_p are "awkward," being equal, for example, to .05, .0975, .1426, .1855, .2262, and .2649 if α = .05 and p = 2, 3, 4, 5, 6, and 7, respectively. For this reason, Duncan (1955) tabulates q(α_p; p, ν) for α = .05 and .01; p = 2(1)10(2)20, 50, 100; and ν = 1(1)20(2)30, 40, 60, 100, and ∞. More accurate and more extensive tables are given in Harter (1960), reproduced in Harter (1970). A condensed table of q(α_p; p, ν) is given in the appendix as Table C, in Steel and Torrie (1960), etc.
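The "awkward" percentiles α_p follow directly from Equation (3.6a) and are easy to reproduce (a one-line Python check):

```python
alpha = 0.05
# Protection levels alpha_p = 1 - (1 - alpha)**(p - 1) for p = 2..7:
alpha_p = {p: 1 - (1 - alpha) ** (p - 1) for p in range(2, 8)}
# -> .05, .0975, .1426, .1855, .2262, .2649 (rounded), matching the text
```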

To apply the method, we arrange the means in ascending order and test each range against R_p, starting with the extremes. Once two means are declared to be not significantly different, we underline them, and no further testing is made between means underscored by this line. Applied to the previous example with t = 7 means, ν = 30 d.f., s^2 = 79.64, and each treatment equally replicated n = 6 times so that √(s^2/n) = 3.643, we have:

p:                2      3      4      5      6      7
q(α_p; p, 30):    2.89   3.04   3.12   3.20   3.25   3.29
R_p = 3.643q:     10.53  11.07  11.37  11.66  11.84  11.99

The results of the test are:

A     B     C     D     E     F     G
49.6  58.1  61.0  61.5  67.6  71.2  71.3

In these results, G-A = 21.7 > R_7, the shortest significant range for 7 means; G-B = 13.2 > R_6; G-C = 10.3 < R_5, so we underline C through G and make no comparisons among C, D, E, F, and G. F-A = 21.6 > R_6; F-B = 13.1 > R_5, and we need not test F-C, etc.; E-A = 18.0 > R_5; E-B = 9.5 < R_4, so underline B through E; D-A = 11.9 > R_4; C-A = 11.4 > R_3; and finally B-A = 8.5 < R_2, so underline A and B. Thus, the method gives seven significant differences (G-A, G-B, F-A, F-B, E-A, D-A, C-A), compared to three significant differences from Newman-Keuls' test.

One disadvantage of this procedure is that it is not amenable to simultaneous interval estimation. If we use (ȳ_i - ȳ_j) ± R_p as the confidence interval for (μ_i - μ_j), some pairs of means will have confidence intervals of different widths, even though all treatments are equally replicated.

In a sense, Fisher's LSD, Newman-Keuls' MRT, and Tukey's HSD are particular cases of Duncan's MRT. If, in Equation (3.6b), we put α_p = α and p = 2, we obtain Fisher's LSD. Tukey's HSD is obtained by putting α_p = α and p = t; and substitution of α for α_p gives the Newman-Keuls' MRT.

If the sample sizes are unequal, Bancroft (1968) suggests using the harmonic mean of the sample sizes (the reciprocal of the arithmetic mean of the reciprocals of the sample sizes):

n_h = [(n_1^(-1) + n_2^(-1) + … + n_t^(-1))/t]^(-1).


Kramer (1956) suggests replacing s^2/n (the common variance of the sample means) in Equations (3.3), (3.4a), and (3.6b) by the average of s^2/n_i and s^2/n_j, the variances of the two sample means being tested. Equation (3.6b) becomes

R_p = q(α_p; p, ν)√(s^2[(1/n_i) + (1/n_j)]/2).    (3.6c)
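Kramer's adjustment can be sketched as (Python; the unequal replications n_i = 6, n_j = 4 are hypothetical, chosen only to illustrate the formula, and the q value is the p = 2 entry from the worked example):

```python
import math

s2 = 79.64
q_crit = 2.89                 # q(alpha_p; 2, 30) from the worked example
n_i, n_j = 6, 4               # hypothetical unequal replications

# Kramer: average the variances of the two sample means being tested.
R_2 = q_crit * math.sqrt(s2 * (1 / n_i + 1 / n_j) / 2)   # ~11.77
```

With equal replications (n_i = n_j = n) this reduces to the ordinary R_p of Equation (3.6b).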

Kramer (1957) extends the procedure in an obvious manner to correlated as well as heteroscedastic means, where the variance of ȳ_i is c_ii σ^2, that of ȳ_j is c_jj σ^2, and their covariance is c_ij σ^2. The coefficients c_ii, c_jj, and c_ij are known, but σ^2 is unknown and is estimated as usual by the error mean square s^2 with, say, ν d.f. from the analysis of variance. (This does not handle the situation where the unequal variances of the means are due to observations from the different treatments having unequal variances. The correlation between the means may be due to an incomplete block design or a covariate being used in the analysis.) If ȳ_i and ȳ_j are the extremes of p ranked treatments, then we declare these treatments to be different if their difference exceeds

q(α_p; p, ν)√[½(c_ii - 2c_ij + c_jj)s^2]    (3.6d)

in Duncan's test, and similarly for the Newman-Keuls or the Tukey tests. Note that if the means are uncorrelated, c_ij = 0, c_ii = 1/n_i, and c_jj = 1/n_j, so that Equation (3.6d) reduces to (3.6c).

Kramer's extension of the test to correlated and heteroscedastic means is approximate; and it is also

conservative, in the sense that it tends to declare two means equal when they are not. Duncan (1957) proposes

a more powerful test, which imposes a further condition for a subset of means to be declared homogeneous.

3.6.2 Bayesian k-ratio t (LSD) Rule

In Fisher's protected LSD method, the result of the overall F test for treatment effects is used only in a go/no-go fashion. In Duncan's Bayesian k-ratio t or k-ratio LSD rule, the observed value of the F test statistic actually is used in calculating the LSD or the critical t value for comparing two means. If the F ratio is large (indicating heterogeneous treatments), the critical t value is reduced, thereby increasing the power of the test; and if the F ratio is small (indicating homogeneous or nearly homogeneous treatments), the critical t value is increased, making it more difficult to declare two treatments to be significantly different and thus decreasing the Type I error probability. Duncan (1975) summarizes his earlier work (1961 and 1965) and that of his former doctoral students (Ray A. Waller and Dennis O. Dixon) at The Johns Hopkins University in 1969 and 1974.

The k-ratio t test is based on an EBALEP (empirical Bayes, additive losses, exchangeable priors) approach. The sample mean ȳ_i is, of course, a random variable, usually assumed to be normally distributed with mean μ_i and variance σ^2/n. In Bayesian statistical inference, the population means μ_1, μ_2, …, μ_t also are regarded as random variables, with a prior distribution that usually is assumed to be normal with some mean μ_0 and variance σ_0^2. (This may well be true experimentally and not merely conceptually, if the t treatments correspond to t varieties, say, randomly selected for field testing from a larger collection of varieties.) The term "empirical Bayes" comes about through having to use the data to estimate the parameters of the conceptual superpopulation of populations. If L_i is the loss incurred when the i-th decision is erroneous, and similarly with L_j, the additive losses assumption states that the loss incurred is L_i + L_j if both the i-th and the j-th decisions are incorrect. Finally, the exchangeable prior distributions assumption states that a priori the comparisons are "equally plausible." This rules out, for example, the case where the t treatments form a p × q factorial (where a priori comparisons of main effects are more likely to be significant than interaction effects) or where the t treatments correspond to t levels of a quantitative factor, where we may a priori expect an ordering of the true treatment means μ_i. (Such situations were discussed under Ch. 2, and no multiple comparison technique is appropriate there.)

A novel feature of the test is the use of the ratio (denoted by k) of the relative seriousness of Type I to Type II errors. By considering the case of t = 2 treatments (where no multiple comparison problem exists), the critical value in the regular Student's t test at a given α level can be made approximately equal to that in the k-ratio t test for some value of k. In round figures, the approximate correspondence between α and k is:

α:  .10   .05   .01
k:  50    100   500

Therefore, Duncan recommends that k be taken to be equal to 100 or 500, where an experimenter previously was used to testing at the 5% or the 1% level, respectively.

Any difference d between two means or, more generally, any contrast ĉ among the means is significantly different from zero if the ratio d/s_d or ĉ/s_ĉ exceeds some critical value t(k, F, t, ν), where s_d = √(2s^2/n), s^2 is the error mean square with ν degrees of freedom, n is the constant number of replications of each treatment, and F is the observed F ratio for treatments from the analysis of variance table. (The estimated variance s_ĉ^2 of a contrast is given in Equation (3.5a).) As indicated above, the critical t value depends on the four arguments k, F, t, and ν. (Unfortunately, we have used the same letter t to denote two entirely different things: the total number of treatments in the experiment and the t test or distribution.) Its dependence on F is awkward for tabulation because of the uncountably infinite number of values that F can take, making interpolation almost inevitable in each application. There is also no easy or explicit formula for calculating the critical value. It is the solution of an extremely complicated integral equation, which appears as Equation (3.15) in Duncan (1975). Table D in the appendix gives the critical values for the k-ratio t test for k = 100 and 500, taken from Waller and Duncan (1972). For interpolating with respect to F, Waller and Duncan (1969) recommend linear interpolation using a = √F for F ≤ 2.4, except when q > 100 and ν > 60; otherwise, we use b = √(F/(F-1)) for F > 2.4, except when q ≤ 20 and ν ≤ 20, where q = t-1. When a cannot be used, b is used, and vice versa. Interpolation with respect to q and ν should hardly ever be necessary. If needed, the recommendation is to interpolate using q and 1/ν. Values of a and b are included in Table D.

For large experiments (a large number t of treatments and a large number ν of d.f. for error), the critical values may be approximated as follows, with b already defined above:

t(100, F, ∞, ∞) = 1.72b  (for k = 100),    (3.6e)
t(500, F, ∞, ∞) = 2.23b  (for k = 500).

Duncan (1965) considers Equation (3.6e) to give an adequate approximation if t ≥ 15 and ν ≥ 30. Equation (3.6e) shows that for large F (a sign of heterogeneous treatments), two means will be declared different if their studentized difference (d/s_d) exceeds only 1.72 (for k = 100, corresponding to α = .05), while for a small F = 1.5, say, the critical value is raised to 1.72√(1.5/.5) = 2.98, reducing the probability of Type I error.

In the numerical example we have been considering, t = 7 treatments, error mean square s^2 = 79.64 with ν = 30 degrees of freedom, F = 4.61, and standard error of a difference s_d = √(2s^2/n) = √(2(79.64)/6) = 5.15. For k = 100, q = t-1 = 6, and ν = 30, Table D gives t = 2.16 for F = 4.0 (and b = 1.155) and t = 2.02 for F = 6.0 (and b = 1.095). Interpolating for F = 4.61 (and b = √(4.61/3.61) = 1.130), we get the critical t value as t(100, 4.61, 7, 30) = 2.02 + (2.16 - 2.02)(1.130 - 1.095)/(1.155 - 1.095) = 2.02 + .08 = 2.10. (If we had interpolated directly with respect to F, instead of the recommended b = √(F/(F-1)), the calculated value of t would be 2.12. Although t = 7 is too small to be regarded as infinite, use of Equation (3.6e) gives a calculated t of 1.72√(4.61/3.61) = 1.72(1.13) = 1.94.) Instead of dividing each difference by its standard error s_d and comparing it with the k-ratio t value, it is more convenient computationally to multiply the t value by s_d to give the corresponding k-ratio LSD = 2.10(5.15) = 10.82 for the present problem. Any two means differing by more than 10.82 will be declared different. The results, identical to those obtained by using Fisher's LSD method, are as follows:

A     B     C     D     E     F     G
49.6  58.1  61.0  61.5  67.6  71.2  71.3
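The b-scale interpolation just carried out can be written out step by step (a Python sketch; the two bracketing Table D entries are those quoted in the text):

```python
import math

F, s2, n = 4.61, 79.64, 6
s_d = math.sqrt(2 * s2 / n)               # 5.15
b = math.sqrt(F / (F - 1))                # 1.130

# Bracketing Table D entries for k = 100, q = 6, v = 30 (from the text):
t_lo, b_lo = 2.02, 1.095                  # at F = 6.0
t_hi, b_hi = 2.16, 1.155                  # at F = 4.0

# Linear interpolation on b, as Waller and Duncan recommend for F > 2.4:
t_crit = t_lo + (t_hi - t_lo) * (b - b_lo) / (b_hi - b_lo)   # ~2.10
k_ratio_lsd = t_crit * s_d                                   # ~10.82
```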

The LSD's (in multiples of s_d) from the procedures for 7 treatments and 30 d.f. for error are:

                                                       LSD/s_d
Fisher's                                               2.04
Newman-Keuls' MRT (q(α; p, ν)/√2)                      2.04, 2.47, …, 3.15
Tukey's HSD                                            3.15
Tukey's MRT                                            2.60-3.15
Scheffé's                                              3.81
Duncan's MRT (q(α_p; p, ν)/√2)                         2.04, 2.15, …, 2.33
Duncan's k-ratio t test (for an observed F = 4.61)     2.10


This tabulation shows that Duncan's k-ratio LSD rule is almost as powerful as Fisher's LSD, without the latter's higher Type I error probability; for if H_0 were true, the observed F would have been smaller (equal to 2.4, say), and from Table D the critical value for t would have been 2.42. If the treatments are very heterogeneous, the k-ratio LSD rule can be more powerful than Fisher's LSD. If F = 10, for example, the critical k-ratio t value is 1.93, compared to 2.04 for Fisher's LSD rule.

The k-ratio t test is adaptable for simultaneous interval estimation. Following Fisher's, Scheffé's, and Tukey's methods, one would expect the k-ratio confidence interval for δ = (μ_i - μ_j) to be (d = ȳ_i - ȳ_j) ± k-ratio LSD, where the LSD = t(k, F, t, ν)s_d, but this is not so. Besides the four parameters k, F, t, ν, the LSD in the interval estimation problem also depends on the observed value of t = d/s_d. Unfortunately, tables are not available at present. We refer the reader to Duncan (1975) and Dixon and Duncan (1975) for details. A large sample solution for the limits is as follows:

[δ_L, δ_U] = [1 - (1/F)]d ± √(1 - (1/F)) s_d t(k, ∞, ∞, ∞),    (3.6f)

where t = 1.72 (for k = 100) and 2.23 (for k = 500). Note that the point estimate of δ = (μ_i - μ_j) is [1 - (1/F)](ȳ_i - ȳ_j). Dixon and Duncan (1975) think that the preceding large sample approximation is adequate if t ≥ 16, ν ≥ 60, and F > 6.

Another approximation that assumes only a large observed F value (with finite t and ν) is the following:

[δ_L, δ_U] = d ± s_d t(k, ∞, t, ν).    (3.6g)

The values of t(k, ∞, t, ν) are independent of t and are obtainable from the last row in Table D in the appendix for k = 100 and 500.

3.7. Studentized Maximum Modulus Procedure

All the procedures so far discussed for simultaneous interval estimation are for contrasts among the t means (or paired differences in particular). Sometimes, the experimenter may wish to construct simultaneous confidence intervals for the population means themselves. Assume that all the sample means ȳ_1, ȳ_2, …, ȳ_t are correlated equally with correlation coefficient ρ and with possibly unequal variances d_1σ^2, d_2σ^2, …, d_tσ^2, where the d's are known constants. If s^2 is the usual unbiased estimate of σ^2 with ν degrees of freedom, the probability is γ = (1-α) that μ_i lies within ȳ_i ± u(t, ν, ρ; γ)√(d_i)s, for all i = 1, 2, …, t simultaneously, where u(t, ν, ρ; γ) is the two-sided (100γ)% point of the maximum absolute value of the t-variate Student's t distribution with ν degrees of freedom and common correlation ρ. (Constructing a 100(γ^(1/t))% confidence interval for μ_i independently of the others, using data from the i-th sample only, is not efficient.)

This technique can be extended to linear combinations of the means (not necessarily contrasts). The probability is (1-α) that Σc_iμ_i lies within Σc_iȳ_i ± u(t, ν, ρ; γ) s Σ|c_i|√(d_i) for all (uncountably infinite) sets of constants (c_1, c_2, …, c_t). Values of u(t, ν, ρ; γ) are given in Hahn and Hendrickson (1971) for ρ = 0, .2, .4, .5; γ = .90, .95, .99; t = 1(1)6(2)12, 15, 20; ν = 3(1)12, 15(5)30, 40, 60. Table E in the appendix gives the values of u(t, ν, 0; γ). Use of Table E in cases where ρ ≠ 0 gives conservative results: the values for ρ ≠ 0 are smaller than corresponding ones with ρ = 0.

3.8 Comparisons Against a Control

3.8.1. Dunnett's Method

In experiments comparing t treatments, one of the treatments quite often is a control (check or untreated). In these experiments, we could partition the (t-1) d.f. for treatments into 1 d.f. for comparing the control against the average of the other treatments and (t-2) d.f. for comparisons among the (t-1) "real" treatments. If these (t-1) other treatments are significantly different, the 1 d.f. comparison between their average and the control may not be meaningful. The experimenter may wish to compare the control with each of the other (t-1) treatments (and not with their average). Duncan's k-ratio t test is not applicable here, since the exchangeable priors (or equally plausible comparisons) assumption is not satisfied. (The difference between a control and a treatment is a priori likely to be larger than that between two treatments.) Dunnett (1955) gives a procedure for the simultaneous interval estimation or multiple comparisons of the control with each of the others, with an experimentwise error rate. A treatment and the control are declared different if their means differ by more than t(α; q, ν)s_d, where s_d is the standard error of a difference and q = (t-1) is the number of treatments other than the control. Values of t(α; q, ν) are given in Dunnett (1964) and reproduced for both one-sided and two-sided tests in Table F of the appendix. If we are comparing insecticides, for example, and the control is a standard one, two-sided tests would be proper, since we do not know a priori if the new insecticides would be better or worse than the standard insecticide. More extensive tables of √2 t(α; q, ν) for one-sided tests are given in Gupta and Sobel (1957) for up to 50 treatments.

To illustrate the method, suppose that variety A in our numerical example is a standard variety, thus calling for two-sided tests of A against each of the others. From Table F, with 30 d.f. for error and q = 6 other treatments besides the control, the critical t value in a 5% two-sided test is t(.05; 6, 30) = 2.72. The standard error of a difference is s_d = √(2s^2/n) = 5.15. The LSD between the control and each of the others is LSD = 2.72(5.15) = 14.0. Since the mean of A is 49.6, any variety will be different from A if its mean is at least 49.6 + 14.0 = 63.6. The result is that B, C, and D are not different from A, but E, F, and G are better than A. The two-sided interval estimate of the difference between a standard variety and any other variety is their observed mean difference ± 14.0.
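The comparison of each variety against the control A can be sketched as follows (Python; the critical value 2.72 is the two-sided Table F entry quoted above, and the variable names are ours):

```python
import math

means = {"A": 49.6, "B": 58.1, "C": 61.0, "D": 61.5,
         "E": 67.6, "F": 71.2, "G": 71.3}
s2, n = 79.64, 6
t_crit = 2.72                        # two-sided t(.05; 6, 30), Table F

s_d = math.sqrt(2 * s2 / n)          # 5.15
lsd = t_crit * s_d                   # ~14.0

control = means["A"]
better = [k for k, m in means.items() if k != "A" and m - control > lsd]
# -> ['E', 'F', 'G']: only E, F, and G are declared different from A.
```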

The preceding discussion assumes equal replications. If the control is replicated n_0 times and the i-th treatment is replicated n_i times, we define s_d = √(s^2[(1/n_0) + (1/n_i)]), which reduces to the previous definition if all replications are equal. More generally, if the within-treatment variances are not homogeneous, we define s_d = √[(s_0^2/n_0) + (s_i^2/n_i)] and use Satterthwaite's result for getting the d.f. of a linear combination of mean squares. It may suffice to calculate only two error mean squares, one for within the control and the other for within the other treatments. For a refinement, see Dunnett (1964). Dunnett's paper also gives the following optimal allocation of experimental units: if n_1 = n_2 = … = n_(t-1) = n, say, we should take n_0 = n√(t-1). Bechhofer (1969) generalizes this result to the case where the variances are unequal but their ratios σ_i^2/σ_0^2 (i = 1, 2, …, t-1) are known.

Robson (1961) extends Dunnett's procedure to the case of a balanced incomplete block design, giving rise

to correlated treatment means.

3.8.2 Gupta and Sobel's Method

Using the statistic in Dunnett's method, Gupta and Sobel (1958) give the following procedure for selecting all treatments that are as good as or better than the control or standard treatment. The procedure guarantees a probability of at least (1-α) that the selected subset of treatments contains all treatments that are at least as good as the control. The rule is to include in the subset all treatments whose means ȳ_i satisfy

(ȳ_i - ȳ_0) ≥ -t(α; q, ν)s_d,    (3.8a)

where t(α; q, ν) is the one-sided critical value in Dunnett's test.

In using Equation (3.8a) as the criterion, we throw away treatments that are significantly worse than the control. Treatments whose sample means are slightly less than that of the control (so that ȳ_i - ȳ_0 will be slightly negative) will be included in the subset. If we use Dunnett's test as a screening procedure, we declare the i-th treatment to be as good as or better than the control if

(ȳ_i - ȳ_0) ≥ +t(α; q, ν)s_d.    (3.8b)

Comparing Equations (3.8a) and (3.8b), it is obvious that Gupta and Sobel's procedure will give a larger subset of treatments. Dunnett's method retains only those treatments that have proved themselves superior to the control, while Gupta and Sobel's method discards only those treatments that have proved inferior to the standard treatment.

Gupta and Sobel (1958) also discuss other related problems: comparing variances and binomial parameters.
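The contrast between the two subset rules can be sketched as follows (Python; the one-sided critical value 2.33 is a hypothetical stand-in for the Table F entry, which is not quoted in the text, so the subsets below illustrate the logic rather than the tabled test):

```python
import math

means = {"A": 49.6, "B": 58.1, "C": 61.0, "D": 61.5,
         "E": 67.6, "F": 71.2, "G": 71.3}
s_d = math.sqrt(2 * 79.64 / 6)       # 5.15
t_crit = 2.33                        # hypothetical one-sided t(.05; q, v)
y0 = means["A"]                      # take A as the control

# Gupta-Sobel: keep every treatment not significantly worse than control.
subset_gs = [k for k, m in means.items()
             if k != "A" and m - y0 >= -t_crit * s_d]
# Dunnett screening: keep only treatments significantly better.
subset_dunnett = [k for k, m in means.items()
                  if k != "A" and m - y0 >= t_crit * s_d]
# subset_gs keeps all of B..G; subset_dunnett keeps only E, F, G.
```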

Sobel and Tong (1971) consider the optimal allocation of observations for partitioning a set of normal

populations in comparison with a control.

3.8.3 Williams' Method

Williams (1971) considers the case where the t treatments are t levels or doses of some substance, with

the control corresponding to zero dose. This situation was discussed in Section 2.3.1, where the recommended

analysis was either to compare zero against the average of the nonzero doses and fit a regression to the q =

(t-1) nonzero levels or to fit a curve through all t doses (including zero). Williams claims there are

circumstances in which the experimenter may not wish to fit a curve to the t doses. He may wish, instead, to

compare zero dose against each of the other doses. As an example, he cites toxicity studies in which the aim of

the experiment may be to determine the lowest dose at which there is activity. (The assumption is that the

response is zero up to this "lowest dose" and increases thereafter, instead of continuously increasing from

zero, slowly at first and more rapidly afterwards.) Another reason for not wishing to fit a curve may be the

experimenter's unwillingness to assume a particular form (logistic, etc.) for the response function. The

number of levels is usually very small (3 to 5), making model fitting rather difficult.

Dunnett's procedure may be used to compare zero with the other doses, but some power is lost in not

making use of the structure in the treatments. Williams assumes a nondecreasing response function, so that μ0 ≤ μ1 ≤ … ≤ μq if the treatments T0, T1, T2, …, Tq are in increasing order of dosage. (If, say, the third dose (i.e., the second nonzero dose) is the level at which activity first becomes noticeable, we have μ0 = μ1 < μ2 ≤ ….) The first step in Williams' test is to estimate μi (i = 0, 1, …, q). Because of the constraints on the μ's, μi is not necessarily estimated by ȳi, the sample mean. Bartholomew (1961) gives the following maximum likelihood estimates of the μ's. If ȳ0 ≤ ȳ1 ≤ ȳ2 ≤ … ≤ ȳq, then μ̂i = ȳi (i.e., μi is estimated by ȳi). Otherwise, there is at least one i for which ȳi > ȳi+1. We replace both ȳi and ȳi+1 by their weighted average

ȳi,i+1 = (ni ȳi + ni+1 ȳi+1)/(ni + ni+1),

where ni is the number of replications of treatment or dose i. We now have only q means ȳ0, ȳ1, …, ȳi−1, ȳi,i+1, ȳi+2, …, ȳq. If these means are in nondecreasing order, we stop and estimate μj by ȳj (for j = 0, 1, …, i−1, i+2, …, q) and estimate both μi and μi+1 by ȳi,i+1. Otherwise, we repeat the averaging process, giving ȳi,i+1 a weight of (ni + ni+1). For instance, if ȳi,i+1 > ȳi+2, we average them to give

ȳi,i+1,i+2 = [(ni + ni+1)ȳi,i+1 + ni+2 ȳi+2]/(ni + ni+1 + ni+2)

as the common estimate of μi, μi+1, and μi+2, if the sample means are now in correct ascending order.

We now have the estimated population means μ̂0, μ̂1, …, μ̂q, where some of these may be equal from the averaging process. Assuming equal replications for all doses (including zero), we now test

t̄p = (μ̂p − ȳ0)/√(2s²/n), (3.8c)

taking p = q, q−1, …, 1 in this order, stopping as soon as we get a nonsignificant result. We declare the p-th nonzero dose to be different from the control if t̄p above exceeds the critical value t̄(α; p, ν), given in Table G in the appendix. (Note that for simplicity of the statistical distribution, we test μ̂p against the unadjusted sample mean ȳ0 and not against μ̂0, even if μ0 is not estimated by ȳ0.) Of course, we can apply the test in the following alternative way. Declare μp and μ0 different if

(μ̂p − ȳ0) > t̄(α; p, ν)s_d. (3.8d)

Williams (1971) gives an example of a randomized block experiment with 8 blocks and t = 7 doses (zero

and q = 6 nonzero doses), and an error mean square s² = 1.16 with ν = 42 d.f. The observed means are ȳ0 = 10.4, ȳ1 = 9.9, ȳ2 = 10.0, ȳ3 = 10.6, ȳ4 = 11.4, ȳ5 = 11.9, and ȳ6 = 11.7. The effect of the substance in the experiment, if anything, can only increase the mean of the response. Since ȳ0 > ȳ1, we average these to give ȳ0,1 = (10.4 + 9.9)/2 = 10.15, and because this average exceeds ȳ2, we form the weighted average ȳ0,1,2 = (2ȳ0,1 + ȳ2)/3 = 10.1. Since ȳ5 and ȳ6 are not in the correct ascending order, we average them to give ȳ5,6 = 11.8. We thus have the following estimates of the population means:

μ̂0 = μ̂1 = μ̂2 = ȳ0,1,2 = 10.1; μ̂3 = ȳ3 = 10.6; μ̂4 = ȳ4 = 11.4; μ̂5 = μ̂6 = ȳ5,6 = 11.8.

The standard error of a difference is s_d = √(2(1.16)/8) = .539. For a test at α = .05, Table G gives the following critical values for 40 d.f.:

p: 6, 5, 4, 3, 2, 1
t̄(.05; p, 40): 1.81, 1.80, 1.80, 1.79, 1.76, 1.68
t̄(.05; p, 40)s_d: .98, .97, .97, .96, .95, .91

Applying Equation (3.8d),

μ̂6 − ȳ0 = 11.8 − 10.4 = 1.4 > .98; conclude μ6 > μ0.
μ̂5 − ȳ0 = 11.8 − 10.4 = 1.4 > .97; conclude μ5 > μ0.
μ̂4 − ȳ0 = 11.4 − 10.4 = 1.0 > .97; conclude μ4 > μ0.
μ̂3 − ȳ0 = 10.6 − 10.4 = 0.2 < .96; conclude μ3 = μ2 = μ1 = μ0.

The conclusion is that the fourth nonzero dose was the lowest dose at which response was observed.
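Bartholomew's constrained estimation is what is now called isotonic regression, computed by pooling adjacent violators. A minimal Python sketch (ours, not from the report) reproduces the estimates and the step-down test of this example; the critical amounts are the Table G values quoted above:

```python
def pava(y, n):
    """Bartholomew's maximum likelihood estimates under the ordering
    mu0 <= mu1 <= ... <= muq: pool adjacent violators, each pooled
    value weighted by its total replication."""
    blocks = []                    # [weighted mean, total weight, count]
    for yi, ni in zip(y, n):
        blocks.append([yi, ni, 1])
        # merge backwards while the order constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(w1 * m1 + w2 * m2) / (w1 + w2), w1 + w2, c1 + c2])
    out = []
    for m, _, c in blocks:
        out.extend([m] * c)
    return out

# Williams (1971) example: zero dose plus q = 6 nonzero doses, n = 8 each
ybar = [10.4, 9.9, 10.0, 10.6, 11.4, 11.9, 11.7]
mu = pava(ybar, [8] * 7)
print([round(m, 2) for m in mu])  # [10.1, 10.1, 10.1, 10.6, 11.4, 11.8, 11.8]

# Step-down test (3.8d): stop at the first nonsignificant dose
crit = {6: .98, 5: .97, 4: .97, 3: .96}   # t-bar(.05; p, 40) * s_d, Table G
for p in (6, 5, 4, 3):
    if mu[p] - ybar[0] <= crit[p]:
        break
print(p)   # 3, so doses 4, 5, and 6 differ from the control
```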

Williams (1972) extends the procedure to handle the case where the zero dose has a different (larger)

number of replications than that of the nonzero levels, for both one-sided and two-sided tests.

In general, we would recommend the regression approach of Section 2.3.1. Suppose we have the

following results:

Dose: 0 1 2 3 4 5

Response: 5 7 10 15 25 40

Using the present procedure, we may conclude that treatment is first effective at dose 3. The author would

rather believe that the response is increasing continuously from dose 0, gradually at first and more rapidly at

higher doses. We might fit a curve and estimate the lowest dose at which the response will be at least some specified level. If

higher doses are more expensive and cost is a consideration, we could adjust the response to a per dollar basis

and estimate the dose that will produce the highest adjusted response.
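As a sketch of the regression alternative (assuming, for illustration only, an exponential-type trend so that log response is roughly linear in dose):

```python
import math

doses = [0, 1, 2, 3, 4, 5]
response = [5, 7, 10, 15, 25, 40]

# Fit log(response) = a + b*dose by ordinary least squares; a positive,
# stable slope b is consistent with a response rising smoothly from dose 0.
logy = [math.log(r) for r in response]
xbar = sum(doses) / len(doses)
ybar = sum(logy) / len(logy)
b = sum((x - xbar) * (y - ybar) for x, y in zip(doses, logy)) / \
    sum((x - xbar) ** 2 for x in doses)
a = ybar - b * xbar
print(round(b, 2))   # about 0.42: each unit of dose multiplies the
                     # response by roughly exp(0.42), about 1.5
```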

3.8.4 Sequential Methods

See Dudewicz, Ramberg, and Chen (1975) for a two-stage procedure when variances are unequal and

unknown, and Paulson (1962) for a sequential procedure, assuming equal variances. In the latter, inferior

treatments are dropped at each stage.

3.9. Miscellaneous Methods

In this section we shall discuss briefly various related techniques or merely cite their references.

3.9.1 Bonferroni Procedure for Preselected Contrasts

Tukey's and Scheffe's methods enable us to construct confidence intervals for an infinite number of linear

contrasts among the t means so that the probability is (1 − α) that they are all simultaneously true. Usually an experimenter is interested in only a rather small subset of m contrasts, say. If these m contrasts are preselected and not suggested by the data, Dunn (1961) recommends the usual method based on the Student's t distribution to construct an interval for each contrast independently, each with confidence coefficient 1 − (α/m), so that, from Bonferroni's inequality, the overall or simultaneous confidence level for all m contrasts is at least (1 − α), in contrast with Fisher's unprotected LSD. Two-sided (100α/m)% points of the t distribution are given in the paper and reproduced in Table A in the appendix. In the notation of Section 3.5, the confidence interval for each contrast is

Ĉ ± t(α/m; ν)√V̂(Ĉ), (3.9a)

where t(α/m; ν) is the two-sided (100α/m)% point of the t distribution with ν degrees of freedom. These intervals often will be narrower than those given by Tukey's or Scheffé's methods. See also Schafer and MacReady (1975).
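The arithmetic of Equation (3.9a) is easily sketched; the contrast estimate and standard error below are hypothetical, and the critical value 2.75 is the two-sided 1% point of t with 30 d.f.:

```python
def bonferroni_interval(c_hat, se, t_crit):
    """Interval (3.9a): c_hat +/- t(alpha/m; v) * se, where t(alpha/m; v)
    is read from Table A or computed by statistical software."""
    return c_hat - t_crit * se, c_hat + t_crit * se

# m = 5 preselected contrasts with overall level alpha = .05:
m, alpha = 5, 0.05
print(alpha / m)                              # each contrast tested at .01
lo, hi = bonferroni_interval(4.0, 1.5, 2.75)  # 2.75 = two-sided t(.01; 30)
print(lo, hi)                                 # -0.125 8.125
```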

3.9.2 Gabriel's Simultaneous Test Procedure (STP)

Gabriel (1964, 1969a) gives a procedure for testing the homogeneity of the (2^t − t − 1) subsets (with at least two means) from a set of t means. Let P be any subset containing at least two treatments and S²_P be the treatment sum of squares for those treatments in P. These treatments will be declared to be different if

S²_P > (t − 1)s²F(α; t − 1, ν), (3.9b)

where s² is the error mean square with ν d.f. from the analysis of variance of the complete data (with t treatments), and F(α; t − 1, ν) is the upper (100α)% point of the F distribution with (t − 1) and ν d.f. Note that the critical value of F in Equation (3.9b) is that for the complete data, so that the righthand side is identical for all subsets.

The error rate is experimentwise. If H0 is true (all t means are equal), the probability is only α that one or more of the (2^t − t − 1) subsets will be declared incorrectly to be heterogeneous. The procedure also has the following nice property. Any set containing a significant subset is itself significant. (However, the converse is not necessarily true, and it is possible for a significant set to contain no significant proper subsets.) Because of this property, it is not necessary to test all subsets. For example, if the set (A, B, C) is significant, the set (A, B, C, D) will be significant; and if (E, F, G) is not significant, the subsets (E, F), (E, G), and (F, G) also will be not significant.
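A brute-force sketch of the STP criterion (ours; applied, for illustration, to the seven means of this report's running numerical example with n = 6, s² = 79.64, ν = 30, and the tabled value F(.05; 6, 30) = 2.42):

```python
from itertools import combinations

def gabriel_stp(means, n, s2, f_crit, labels):
    """Declare subset P heterogeneous if its treatment sum of squares
    exceeds (t-1)*s2*F(alpha; t-1, v) (Eq. 3.9b); the same critical
    value serves every subset."""
    t = len(means)
    cutoff = (t - 1) * s2 * f_crit
    significant = []
    for size in range(2, t + 1):
        for idx in combinations(range(t), size):
            sub = [means[i] for i in idx]
            m = sum(sub) / size
            ss = n * sum((y - m) ** 2 for y in sub)   # treatment SS for P
            if ss > cutoff:
                significant.append("".join(labels[i] for i in idx))
    return significant

means = [49.6, 58.1, 61.0, 61.5, 67.6, 71.2, 71.3]
sig = gabriel_stp(means, 6, 79.64, 2.42, "ABCDEFG")
print("ABCDEFG" in sig, "AB" in sig)   # True False
```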

The 1964 paper has a numerical example. Tukey's HSD method, which is conservative compared with Newman-Keuls' or Duncan's multiple range tests, found two significant pairs. Gabriel's STP and Scheffé's test found all subsets of two means (i.e., all paired differences) to be not significant. Generally, a set P will be declared significant by Gabriel's STP if and only if some contrast involving only those means in P is judged significant by Scheffé's procedure.

3.9.3 Kurtz-Link-Tukey-Wallace Range Procedure

The analysis of variance is based on sums of squares. For computational convenience, analogous

procedures based on ranges are available. Kurtz, Link, Tukey, and Wallace (1965) give a similar shortcut

procedure for multiple comparisons. This paper also has an interesting general discussion on the philosophy of

multiple comparisons.

3.9.4 Covariance Adjusted Means

For multiple comparisons of adjusted treatment means in an analysis of covariance, see Kramer (1957); Halperin and Greenhouse (1958); Scheffé (1959, pp. 209-213); Bancroft (1968, Section 8.7); and Thigpen and

Paulson (1974).

3.9.5 Procedures for Two-Way Interactions

Suppose that the t treatments are in the form of a p x q factorial, both factors being qualitative. The

partitioning of the pq-1 degrees of freedom for the t = pq treatments is discussed in Section 2.2. Harter (1970)

gives a procedure for comparing interaction effects of the form

ĀiB̄u + ĀjB̄v − ĀiB̄v − ĀjB̄u = [(Āi − Āj)B̄u] − [(Āi − Āj)B̄v]
= [Āi(B̄u − B̄v)] − [Āj(B̄u − B̄v)],

where ĀiB̄u, for example, is the mean for the i-th level of factor A and the u-th level of factor B. The preceding interaction is the difference between two differences; viz., (the difference between the i-th and the j-th levels of factor A, both at the u-th level of B) minus (the difference between the i-th and the j-th levels of A, both at the v-th level of B). As the second form of the expression shows, the interaction also can be written as the difference between the u-th and the v-th levels of B at the i-th level of A minus the same difference at the j-th level of A. See also Dunn and Massey (1965), Sen (1969), Johnson (1976), and Bradu and Gabriel

(1974). The last paper describes three methods for testing and simultaneous interval estimation.
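The tetrad difference is easy to compute from a table of cell means; the 2 × 2 table below is hypothetical:

```python
def interaction(cell_mean, i, j, u, v):
    """Tetrad difference A_iB_u + A_jB_v - A_iB_v - A_jB_u: the (i vs j)
    difference at level u of B minus the same difference at level v."""
    d_at_u = cell_mean[i][u] - cell_mean[j][u]
    d_at_v = cell_mean[i][v] - cell_mean[j][v]
    return d_at_u - d_at_v

# Hypothetical 2 x 2 table of cell means (rows: levels of A, cols: of B)
cells = [[10.0, 12.0],
         [11.0, 16.0]]
print(interaction(cells, 0, 1, 0, 1))   # (10-11) - (12-16) = 3.0
```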

3.9.6 Nonparametric Methods

In all the methods considered so far, we have assumed that the data are distributed normally. If we

cannot or do not wish to make this assumption, we must resort to nonparametric methods for separating the

means. See Steel (1959, 1961); Dunn (1964); Miller (1966, Chapter 4); Rhyne and Steel (1965, 1967); McDonald and

Thompson (1967); Tobach et al. (1967); Rizvi, Sobel, and Woodworth (1968); Sen (1969); Puri and Puri (1969);

Slivka (1970); and Hollander and Wolfe (1973, Sections 6.3, 7.3, and 7.7).

3.9.7 Gupta's Random Subset Selection Procedure

In experiments where the scientist is looking for the best treatment (e.g., a plant breeder selecting a new

variety for highest yield or resistance to some disease), multiple comparison techniques are inappropriate.

We cited Gupta and Sobel (1958) in Section 3.8.2 for a method for selecting treatments that are as good as or

better than a control or standard treatment. Some selected references on problems of selecting the best out of

t treatments are Paulson (1964); Gupta (1965); Robbins, Sobel, and Starr (1968); Bechhofer, Kiefer, and Sobel

(1968); Sobel (1969); Tong (1970); Rizvi (1971); Chiu (1974a, 1974b); a review paper with 71 references by

Wetherill and Ofosu (1974); Wackerly (1975); Santner (1975); and Gupta and Panchapakesan (1971).

Selection problems may be posed in several ways, of which the following two are the most common.

(a) Given δ* > 0 and P* < 1, find a procedure that will, with probability of at least P*, choose the population with the largest mean if this mean exceeds the second largest mean by at least δ*.

(b) Given 1/t < P* < 1, find the smallest subset of the t treatments such that the probability is at least P* that the subset will contain the best population.

The preceding formulations are referred to as the "indifference zone" and the "random subset" approaches, respectively. In (a), we are indifferent to all differences that are less than δ*; and in (b), the number of treatments that are included in the subset is a random variable. Decision theoretic approaches (minimax, Bayesian, etc.) are also possible.

Gupta (1965) gives the following random subset solution. Include the i-th treatment in the subset if its

sample mean ȳi satisfies the condition

ȳi ≥ ȳmax − t(α; t, ν)s_d, (3.9c)

where t(α; t, ν) is the one-sided critical value of Dunnett's test statistic (Section 3.8.1). Values of t(α; t, ν) are given in Table F1 in the appendix, with t = (q + 1); e.g., if t = 7, we look under q = (t − 1) = 6.

In our numerical example, we have t = 7, ν = 30 d.f., s_d = √(2s²/n) = √(2(79.64)/6) = 5.15, and ȳmax = 71.3. Taking α = 1 − P* = .05, the value of t(.05; 7, 30) from Table F1 with t = 7 (or q = 6) is 2.40. From Equation (3.9c), we include in the subset all treatments whose means exceed 71.3 − (2.40)(5.15) = 71.3 − 12.36 = 58.94. Thus, we are 95% confident that the set (C, D, E, F, G) contains the best treatment (variety).
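Rule (3.9c) applied to the same example can be sketched as:

```python
def gupta_subset(means, crit_times_sd):
    """Random-subset rule (3.9c): keep treatment i iff
    ybar_i >= ybar_max - t(alpha; t, v) * s_d."""
    cutoff = max(means.values()) - crit_times_sd
    return [lab for lab, y in means.items() if y >= cutoff]

means = {"A": 49.6, "B": 58.1, "C": 61.0, "D": 61.5,
         "E": 67.6, "F": 71.2, "G": 71.3}
# t(.05; 7, 30) = 2.40 and s_d = 5.15, so the cutoff is 71.3 - 12.36 = 58.94
print(sorted(gupta_subset(means, 2.40 * 5.15)))   # ['C', 'D', 'E', 'F', 'G']
```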

3.9.8 Scott and Knott's Cluster Analysis Method

If a scientist has collected a mass of data (usually multivariate), he may wish to know if these came from

one or more populations. If the latter, he would like to know into how many groups or clusters the data should

be divided, and the best way of forming these groups. (For a recent paper and book on cluster analysis, see

Kuiper and Fisher (1975) and Hartigan (1975).) With univariate data, we can arrange the observations in

ascending order. If the data are 10, 11, 55, 56, 59, for example, they can be divided into two clusters in an

obvious manner, namely (10, 11) and (55, 56, 59). In less clearcut situations, an objective criterion for

grouping is required. If we know that the data came from two populations only, we can form the two groups

by maximizing the sum of squares between the two groups (or equivalently, such that the sum of the within

groups sums of squares is a minimum). With t observations (or means), we need only consider the (t -1)

possible partitions formed by dividing between two successive ordered means. The multiple range tests we

have considered do, in fact, group the means, but they allow a particular mean to be in more than one group.

Duncan's test, for example, groups the means in the example into (A, B), (B, C, D, E), and (C, D, E, F, G).

Tukey (1949) was the first to consider forming nonoverlapping clusters by looking at the gaps in the ordered

means and testing their statistical significance, but he retracted this procedure in his 1953 manuscript

(circulated privately) on the problem of multiple comparisons.

Scott and Knott (1974) propose the following sequential partitioning and testing procedure. Arrange the

t = 7 means in ascending order, denoted by A, B, C, D, E, F and G, respectively. Partition these into two

groups, using the above criterion. Suppose this results in (A, B, C, D) and (E, F, G) as the two groups. Now

test the null hypothesis H0: μ1 = μ2 = … = μ7 against the alternative hypothesis Ha: each μi equals m1 or m2. (Presumably, the overall F test with 6 and ν d.f. need not be performed. The usual F statistic tests H0 against the most general alternative that not all means are equal. The proposed procedure tests H0 against the much more specific alternative that all the means are either m1 or m2, with at least one mean in each group, and, therefore, should be more powerful than the usual F test.) If H0 is rejected, we partition (A, B, C, D) into two groups and test the equality of these groups. The procedure is similar for (E, F, G). It is repeated until H0 is

accepted.

The test is as follows. We assume that the t means ȳ1, ȳ2, …, ȳt are uncorrelated and homoscedastic, which implies equal replications n, say. As usual, let s² be the estimate (with ν d.f.) of the common variance σ² of single observations. (In the completely randomized design, ν = t(n − 1).) Suppose that the partitioning criterion forms two groups with t1 and t2 = (t − t1) means. The groups G1 and G2 will contain nt1 and nt2 original observations, respectively. Let T1 be the sum of the nt1 observations in G1, and similarly for T2. In the usual analysis of variance computations, the between groups sum of squares is

B0 = [T1²/(nt1)] + [T2²/(nt2)] − [T²/(nt)], (3.9d)

where T = (T1 + T2). Under the null hypothesis, the maximum likelihood estimate of σ² is

σ̂0² = [n Σ(ȳi − ȳ)² + νs²]/(t + ν), (3.9e)

where the sum runs over i = 1, …, t and ȳ = (ȳ1 + … + ȳt)/t.

The test statistic is

λ = [π/(2(π − 2))](B0/σ̂0²) = 1.376(B0/σ̂0²). (3.9f)

The 95% points for the distribution of λ were obtained by simulation and were found to be approximated adequately, for practical purposes, by the chi-square distribution with ν0 = t/(π − 2) = t/(1.1416) d.f.

(The simulation also included the case with ν = 0, for which the 95% points of λ were estimated to be 2.75, 6.60, 12.11, and 21.74 for t = 2, 5, 10, and 20, respectively. This shows that we can test the homogeneity of t means even when each mean is based on n = 1 replication. This is, of course, impossible with the usual F test and its general alternative hypothesis, since the error mean square has zero d.f. As mentioned earlier, the present λ test makes an extra assumption about the alternative hypothesis.)

In our numerical example, t = 7, n = 6, and s² = 79.64 with ν = 30 d.f. (the design being that of a randomized block experiment). The means in ascending order were 49.6(A), 58.1(B), 61.0(C), 61.5(D), 67.6(E), 71.2(F), and 71.3(G). To find the partition with the largest between groups sum of squares, we should try, theoretically, the t − 1 = 6 possible partitions: (A, BCDEFG), (AB, CDEFG), (ABC, DEFG), (ABCD, EFG), (ABCDE, FG), and (ABCDEF, G). In practice, we need try only two or three possibilities. (With a computer it is easy enough to try all (t − 1) partitions.) In this example, (A, BCDEFG) and (ABCD, EFG) are the two most serious candidates. It can be shown that (ABCD, EFG) is the optimum partition. Here, t1 = 4, t2 = 3, T1 = 6(49.6 + 58.1 + 61.0 + 61.5) = 1381.2, T2 = 1260.6, T = T1 + T2 = 2641.8, ȳ = (49.6 + … + 71.3)/7 = 62.9, and Σ(ȳi − ȳ)² = 370.04.

From Equations (3.9d) and (3.9e), B0 = (1381.2)²/24 + (1260.6)²/18 − (2641.8)²/42 = 1602.86 and σ̂0² = [6(370.04) + 30(79.64)]/(7 + 30) = 124.58. From Equation (3.9f), the test statistic is λ = 1.376(1602.86/124.58) = 17.70. Using the chi-square approximation with ν0 = t/1.1416 = 7/1.1416 = 6.1 d.f., the value 17.70 is significant. (The 95% point of the chi-square distribution is 12.6 for 6 d.f. and 14.1 for 7 d.f.)

We next have to partition (ABCD) and (EFG). In partitioning (EFG), t is now equal to three. For t = 3 means, the optimum partition is at the larger of the two gaps, giving (E, FG) with t1 = 1, t2 = 2, T1 = 405.6, T2 = 855.0, and Σ(ȳi − ȳ)² = 8.8866, giving σ̂0² = [6(8.8866) + 30(79.64)]/33 = 74.02, B0 = 405.6²/6 + 855.0²/12 − 1260.6²/18 = 53.29, and λ = (1.376)(53.29)/74.02 = 0.99, which is not significant. The significance of the partition of (ABCD) into (A, BCD) is borderline. If we accept this as being significant, the final groupings are A, BCD, and EFG, which is what inspection of the means would suggest.
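The one-step partition search and the λ statistic can be sketched as follows (ours, not from Scott and Knott); because the code computes Σ(ȳi − ȳ)² directly from the means, it gets λ ≈ 17.8, agreeing with the worked value 17.70 up to the rounding used above:

```python
import math

def scott_knott_split(means, n, s2, v):
    """One Scott-Knott step: among the t-1 partitions between successive
    ordered means, maximize the between-groups SS B0 (Eq. 3.9d), then form
    lambda = [pi / (2*(pi - 2))] * B0 / sigma0^2 (Eqs. 3.9e, 3.9f)."""
    t = len(means)
    ybar = sum(means) / t
    best_b0, best_k = -1.0, None
    for k in range(1, t):                       # split after position k
        t1, t2 = k, t - k
        T1 = n * sum(means[:k])
        T2 = n * sum(means[k:])
        b0 = T1**2 / (n * t1) + T2**2 / (n * t2) - (T1 + T2)**2 / (n * t)
        if b0 > best_b0:
            best_b0, best_k = b0, k
    sigma0_sq = (n * sum((y - ybar) ** 2 for y in means) + v * s2) / (t + v)
    lam = math.pi / (2 * (math.pi - 2)) * best_b0 / sigma0_sq
    return best_k, best_b0, lam

means = [49.6, 58.1, 61.0, 61.5, 67.6, 71.2, 71.3]   # A ... G, ordered
k, b0, lam = scott_knott_split(means, n=6, s2=79.64, v=30)
print(k, round(b0, 1), round(lam, 1))   # 4 1602.9 17.8
```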

For another cluster analysis approach to multiple comparisons, see Jolliffe (1975).

3.9.9 Multivariate Populations

We have so far considered univariate populations only. Quite often, we may collect several kinds of

measurements from each experimental unit. For example, in comparing t brands of chocolate cake mixes, we

may evaluate the resulting cakes with respect to each of p characteristics (flavor, aroma, texture, moistness,

etc.). As another example, we may compare t treatments (storage conditions) for degreening lemons and take

color measurements on each of p dates. We may (and sometimes do) carry out p separate univariate analyses

of variance, one for each of the p characteristics or dates, but we sacrifice some power in not making use of the

correlations among the p characteristics. There is also a problem with the overall significance level in making

p separate analyses. Preferably, we should perform one multivariate (p-dimensional) analysis of variance. If

the null hypothesis of equal mean vectors (each population mean is now a set of p numbers) is rejected, we now

have two different kinds of multiple comparison problems. With respect to which of the p characteristics do

the populations differ? (In the preceding cake example, do the cakes differ in flavor only, in flavor and texture

only, or in all p characteristics?) We do not, of course, have this problem in univariate (p = 1) situations. We

have been considering the other kind of multiple comparisons in this report (viz., which populations differ

from which). These comparisons are discussed in Kramer (1972, Section 5.11), Gabriel (1968, 1969b),

Krishnaiah (1969), Miller (1966, Chapter 5), and Morrison (1967, Section 5.4).

3.9.10 Subset Selection Approach to Multiple Comparisons

We mentioned in the last paragraph of Chapter 1 that hypothesis testing is usually almost totally

irrelevant. Any two treatments, however small the true difference between their means, will be declared significantly different if they are sufficiently replicated. If two

means are declared significantly different, many experimenters often are misled into thinking that the

difference is of practical importance. Reading (1975) applies the indifference zone formulation of subset

selection problems to multiple comparisons. The experimenter specifies three quantities: P* (the probability that all decisions concerning pairwise means are correct, an experimentwise probability), δ0 (the largest amount by which two population means can differ and still be considered practically the same), and δ* (the smallest amount by which two population means must differ to be considered definitely different). The interval (δ0, δ*) is the indifference zone. If two treatments differ by an amount in this zone, the experimenter does not care whether the

treatments are declared different or the same. Given these three quantities, Reading gives tables for the

necessary sample size and the critical value that must be exceeded for the difference between two means to be

declared significant. Unfortunately, at present, the tables go up to t = 4 treatments only and assume that σ²

is known.

3.9.11 Other Parameters and Populations

In this publication, we have been comparing, estimating, or selecting normal populations with respect to

their means. We conclude this chapter by citing selected references to similar work for other parameters and

other populations.

(a) Variances of normal populations. See David (1956), Ryan (1960), Bechhofer (1968), and Levy (1975a, 1975b) for multiple comparisons; Jensen and Jones (1969) for simultaneous interval estimation; and Gupta (1965), Ofosu (1975), and Arvesen and McCabe (1975) for subset selection.

(b) Various kinds of simultaneous prediction intervals. Hahn (1970, 1972).

(c) Regression coefficients. Duncan (1970) for multiple comparisons, and Hahn and Hendrickson (1971) for

simultaneous interval estimation.

(d) Subset selection for the normal population with the largest (or smallest) α quantile. Barlow and Gupta.

(e) Subset selection for the normal population with the largest exceedance probability. Kappenman (1972) gives a method for selecting the normal population with the highest pi = P(Xi > c), where Xi ~ N(μi, σi²) and c is a given constant.

(f) Comparison of several independent treatment mean squares against a common error mean square. See

Nair (1948); Hartley (1955); and David (1962, pages 155-156).

(g) Subset selection for gamma populations. Gupta (1963).

(h) Ranking and selection of binomial populations. Gupta and Sobel (1960), Ryan (1960), Taylor and David

(1962), Paulson (1967), Bland and Bratcher (1968), Hoel and Sobel (1972), and Leonard (1972).

(i) Multinomial populations. Goodman (1965) and Fienberg and Holland (1973) for simultaneous estima-

tion; Bechhofer, Elmaghraby, and Morse (1959) for selection; and Gabriel (1966) for multiple compari-

sons.

(j) Subset selection for Poisson, negative binomial, and Fisher's logarithmic distributions. Gupta and

Panchapakesan (1971).


(k) Multiple comparisons of regression functions. Spjøtvoll (1972).

(l) Multiple comparisons of logistic curves. Reiersøl (1961).

(m) Selection of best treatment in paired-comparison experiments. Trawinski and David (1963).

(n) Ranking of main effects in analysis of variance, variances of normal populations, and correlation

coefficients of bivariate normal distributions. Eaton (1967).

(o) Interval estimation of a ranked parameter. Alam and Saxena (1974).

(p) Simultaneous interval estimation of contrasts among means of a multivariate normal population.

Bhargava and Srivastava (1973).

(q) Applications to multiple regression problems. Miller (1966), Morrison (1967, Section 3.6), Wynn and

Bloomfield (1971), Hochberg and Quade (1975), and Tarone (1976).

CHAPTER 4. CONCLUSION

The findings from some Monte Carlo sampling studies that have been conducted to evaluate the relative

performances of the various multiple comparison procedures are summarized in this chapter. Here, we

assume that multiple comparisons are appropriate, ruling out situations covered in Chapter 2, where the

proper statistical technique is the partitioning of the degrees of freedom for treatments into orthogonal

contrasts. When it is not possible a priori to form meaningful orthogonal contrasts, it is assumed that the

problem is really one of multiple comparisons and not of ranking and subset selection. A plant breeder who is

interested in selecting a new variety should not be concerned with multiple comparisons of all possible pairs of

varieties.

Scheffé's method is the most versatile. It allows unequal replications, correlated means from covariance

adjustment, general contrasts (and not just paired comparisons), and simultaneous interval estimation. The

penalty for this generality is reduced power (failure to detect true differences in testing and wide confidence

intervals in interval estimation of differences between two means). Tukey's HSD method also can handle

general contrasts and interval estimation, but it requires equal replications and uncorrelated means.

Duncan's and Newman-Keuls' multiple range tests are exact only for paired comparisons of uncorrelated

means with equal replications and are not adaptable for interval estimation. The LSD easily can handle

unequal replications, can be used for interval estimation, and can be extended in a simple and obvious manner

to general contrasts. Duncan's Bayesian k-ratio rule is too new to have found widespread acceptance by

experimental scientists. Duncan is very enthusiastic about this procedure and, in a private communication,

expressed the hope that his Biometrics 1975 paper "will mark the beginning of the end of all of the earlier (pre-

1960) a-level multiple comparison procedures."

We refer the reader to Section 3.6.2, where we tabulate the LSD's for the various procedures (in

multiples of the standard error of the difference between two means). In ascending order, we have Fisher's

LSD, Duncan's k-ratio rule, Duncan's MRT, Newman-Keuls' MRT, Tukey's MRT, Tukey's HSD, and

Scheffé's method. (Duncan's k-ratio rule is data dependent. It may be more "reckless" than Fisher's LSD or

more conservative than Tukey's HSD, depending on the observed value of the F ratio for treatments.) The

above order is, therefore, in decreasing order of the number of paired comparisons that will be declared

significant. If the objective is to find as many significantly different pairs as possible, Fisher's LSD is best.

The problem, however, is not this simple.

There are two main difficulties in assessing the relative merits of the multiple comparison procedures.

"In testing a hypothesis involving a simple two-decision situation, such as that to which the Neyman-Pearson

theory is directly applicable, one compares two competing test criteria by fixing the Type I errors to be the

same for both and compare the two power curves. Unfortunately, multiple-comparison procedures do not

pertain to a single simple two-decision situation, but are special cases of multiple-decision procedures. At

present there is no generally acceptable analytical method of comparing, in a manner similar to that for the

two-decision situation, two competing multiple-decision test criteria." (Bancroft 1968, p. 105.)

Another difficulty is due to the different error rates used. Tukey's and Scheffé's methods use an

experimentwise error rate, while Fisher's LSD adopts a comparisonwise error rate. The multiple range tests

of Duncan and of Newman-Keuls use different error rates, both of which are neither experimentwise nor

comparisonwise. Duncan's k-ratio rule does not even use the concept of error rate; it uses the ratio of the

relative seriousness of the two types of errors.

Because of these difficulties, the procedures have been compared using Monte Carlo sampling methods

only. There is a difficulty with such empirical sampling studies. It is easy to study the probability of Type I

error (declaring two equal means to be unequal) because there is, of course, only one way in which t means can

be equal. It is much more difficult to compare the probability of Type II error (declaring two unequal means to

be equal), because t means can be unequal in many ways. They can be all unequal (equally spaced, clustered in

t\N o or more groups, etc.), all equal but one, etc. It is unlikely that one method will be best for all patterns of

inequality.

Balaam (1963) was the first to publish results of a sampling study. He considered only four means, each with five observations, in eighteen configurations: (0,0,0,0); (1,0,0,0), . . ., (6,0,0,0); (1,1,0,0), (2,1,0,0), . . ., (5,1,0,0); (2,2,0,0), (3,2,0,0), (4,2,0,0); (3,3,0,0), (4,2,1,0), and (4,4,1,0). Three procedures (LSD, Newman-Keuls', and Duncan's MRT) were compared, each with and without a significant preliminary F test. The

Newman-Keuls' procedure was found inferior. The LSD was superior to Duncan's MRT, in both protected

and unprotected cases, but the difference in performance was small in the protected case.

Boardman and Moffitt (1971) compared five procedures (LSD, Scheffe's, Tukey's HSD, Newman-Keuls'

MRT, and Duncan's MRT) for testing all possible pairs of means with respect to their Type I comparisonwise

and experimentwise error rates. They carried out 30 sets of 10,000 sampling experiments with t = 2, 3,. .,

11 normal populations; samples of equal sizes n = 5, 10, and 15; and a = .05.

For t = 10 treatments, and taking α = 5%, the Type I comparisonwise error rate is about 2.5% for Duncan's MRT, .21% for Tukey's, and .01% for Scheffé's procedure, showing the conservativeness of the latter two procedures.

On an experimentwise basis, the error rate in Tukey's HSD and Newman-Keuls' multiple range test

remains constant at 5% as t increases from 2 to 10, while for Duncan's MRT and Fisher's LSD, it increases to

38% and 63%, respectively. For Scheffé's procedure, it decreases from 5% to .23%, showing the conservativeness

of the Scheffé procedure for pairwise contrasts. Thus, with t = 10 populations with equal means (and (10 x 9)/2

= 45 possible pairwise comparisons), there is a 38% probability that one or more of the 45 comparisons will be

declared significantly different by Duncan's procedure.

In view of this rather high experimentwise probability, Gill (1973) recommends that Duncan's procedure

be discontinued. Of course, Gill has even stronger feelings against the LSD procedure. In defense of these

two procedures, the comparison, rather than the experiment, is the basic unit for the comparisonwise

adherents. One wrong conclusion will not affect the usefulness of the remaining 44 comparisons. On the other

hand, the rationale of the experimentwise error rate philosophy is that one wrong comparison vitiates all of

the remaining 44 comparisons. Thus, making one wrong conclusion is as serious as making 45 wrong

judgments in the same experiment (is this reasonable, in most cases?). We have to ensure that all 45

comparisons are correct, though not without paying a high premium, of course. For example, in a cubic lattice

design with t = 729 varieties (Cochran and Cox 1957, page 423), it will be virtually impossible to ensure that

all (729 x 728)/2 = 265,356 paired comparisons will be judged correctly.

Because of the independence of the validity of the individual comparisons (in the comparisonwise school),

we can "afford" one wrong comparison out of 45. After all, in a 5% test, there is a one in 20 chance of an

incorrect rejection, so that out of 45 comparisons we should expect and tolerate about two wrong conclusions.

In addition to the probability of one or more wrong rejections out of 45, it will be interesting to know also the

probability of two or more wrong rejections. If the probability of two or more incorrect conclusions is

considerably lower than that of one or more wrong conclusions, this should remove much of Gill's objection to

Duncan's MRT and Fisher's LSD procedures.
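The back-of-envelope arithmetic above (45 tests at 5%, "about two" expected wrong conclusions) can be sketched under the crude simplifying assumption that the 45 comparisons behave like independent 5% tests. They do not (they share means and the error mean square, and stepwise rules such as Duncan's behave quite differently), so the binomial tail probabilities below are only an illustrative upper bound, not the rates the procedures actually achieve:

```python
from math import comb

k, alpha = 45, 0.05   # 45 pairwise comparisons, each nominally a 5% test

expected = k * alpha  # expected number of wrong rejections = 2.25, "about two"

def tail(m):
    """P(m or more wrong rejections) under an (unrealistic) Binomial(k, alpha) model."""
    return sum(comb(k, i) * alpha**i * (1 - alpha)**(k - i) for i in range(m, k + 1))

print(expected)
print(tail(1), tail(2))  # P(>= 1) and P(>= 2) wrong rejections under independence
```

Under correlation and a protected (preliminary F) rule, both tail probabilities are far smaller than this independence model suggests, which is the point at issue between the two schools.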

In agricultural experiments, the treatment means are much more likely to be unequal so that Type II

error consideration should be at least as important as Type I error consideration. In the Boardman and

Moffitt study, the procedures were applied without a prior significant overall F test, which is, in fact, a

prerequisite of the Fisher's protected LSD method. Although not required for the Duncan procedure, it may

be desirable to apply the procedure only after a significant F test. As Dunnett (1970) points out, multiple

comparison procedures are techniques for ferreting out differences among the t means, and there is no reason

for doing so, unless there is an indication that differences exist, either a priori or as evidenced by a significant

F test. The experimentwise error rates for the protected Fisher's LSD and the "protected" Duncan's MRT

will, of course, be 5%. See Bernhardson (1975).
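Fisher's protected LSD just described (pairwise t-tests performed only when the overall F is significant) can be sketched as follows. The data, and the tabled critical values supplied as parameters, are hypothetical: 3.24 is (approximately) the 5% point of F with 3 and 16 d.f., and 2.120 the two-sided 5% point of t with 16 d.f.:

```python
import math
from itertools import combinations

def protected_lsd(means, n, mse, f_obs, f_crit, t_crit):
    """Fisher's protected LSD: declare a pair of means different only when
    the overall F test is significant and the difference exceeds the LSD."""
    if f_obs <= f_crit:   # F not significant: declare no differences at all
        return []
    lsd = t_crit * math.sqrt(2.0 * mse / n)
    return [(i, j) for i, j in combinations(range(len(means)), 2)
            if abs(means[i] - means[j]) > lsd]

# hypothetical experiment: t = 4 treatments, n = 5 replicates, error d.f. = 16
pairs = protected_lsd([10.0, 11.9, 14.2, 10.3], n=5, mse=4.0,
                      f_obs=4.1, f_crit=3.24, t_crit=2.120)
print(pairs)  # pairs of treatment indices declared significantly different
```

Here the LSD is 2.120 * sqrt(2 * 4.0 / 5) = 2.68, so only the pairs differing by more than that are declared different; with f_obs below 3.24 the function would report none.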


Based on the Boardman-Moffitt study (which considered only the null case of equal means), Gill recom-

mended Tukey's HSD and, to a lesser extent, the Newman-Keuls' procedure. In another simulation study,

Carmer and Swanson (1973) recommended just the opposite. Their conclusions were:

... that Scheffé's test, Tukey's test, and the Student-Newman-Keuls' test are less appropriate than either the least

significant difference with the restriction that the analysis of variance F value be significant at a = .05, two Bayesian

modifications of the least significant difference, or Duncan's multiple range test. Because of its ease of application, many

researchers may prefer the restricted least significant difference.

Carmer and Swanson conducted 88,000 simulations in all, with various numbers of treatments and

replicates, and different patterns of heterogeneity among the treatment means. The study "was prompted

mainly by the authors' own uncertainty as to the most appropriate procedure to recommend to students and

researchers in the agricultural sciences." In an earlier publication, Carmer and Swanson (1971) reported on 5

of the present 10 procedures.

The following multiple comparison procedures were studied:

1. LSD (unprotected)

2. TSD (Tukey's HSD)

3. SNK (Student-Newman-Keuls)

4. MRT (Duncan's multiple range test)

5. SSD (Scheffé's procedure)

6. FSD1 (Fisher's protected LSD, with the preliminary F test applied at the 1% level)

7. FSD2 (as in FSD1 but F test at 5% level)

8. FSD3 (as in FSD1 but F test at 10% level)

9. BSD (Duncan's approximate Bayesian k-ratio LSD rule for t ≥ 15 treatments and error d.f. v ≥ 30;

see Equation (3.6e) of present report)

10. BET (Waller-Duncan's exact Bayesian k-ratio LSD rule)

We quote from Section 7 ("Concluding Remarks") of Carmer and Swanson (1973):

... the SSD should never be employed for pairwise multiple comparisons ... the TSD and SNK are clearly inferior in

ability to detect real differences. Although the SSD, TSD, and SNK provide excellent protection against Type I errors, it is the

authors' feeling that, in evaluation of the various procedures, concern for ability to detect real differences should receive a high

priority ... the FSD1 procedure also appears to stress protection against Type I errors at the expense of sensitivity ... it

also seems reasonable not to recommend procedures which unduly deemphasize protection against Type I errors. From this

point of view, then, the ordinary LSD and perhaps the FSD3 can be eliminated from consideration; in addition, their

sensitivities to real differences are not appreciably greater than those of the FSD2, BSD, BET, and MRT. These latter four

procedures thus constitute a group from which the consulting statistician or experimenter might generally make a choice ...

while the MRT often produces a lower frequency of Type I errors, the other three are generally more sensitive in detecting real

differences ... dependence of the critical value on the observed analysis of variance F value is more appealing than

dependence on the number of treatments in the experiment. Since the BET is an improved and more exact version of the

BSD, it seems reasonable to prefer the former ... the procedure (BET) is easier to apply than the MRT ... many subject

matter researchers will find the FSD2 attractive because of its simplicity and the fact that they are already familiar with

Student's t table.

Carmer and Swanson's final choice is thus between FSD2 and BET. Waller and Duncan (1969) claim that

the similarity in performance between the FSD2 and BET says a lot for BET, but as Carmer and Swanson

point out, it is just as reasonable to claim that this similarity speaks a lot for the FSD2.

Thomas (1974) compared "seven methods of pairwise comparisons and four for constructing simultane-

ous sets of confidence limits. The general conclusions are that Duncan's multiple range test is the best method

of those considered for the former and the Bonferroni t-based limits for the latter."

We mentioned at the beginning of this chapter that one main difficulty in comparing the procedures is due

to the different kinds of Type I error rates used. Comparing one procedure using a 5% comparisonwise Type I

error rate with another procedure using a 5% experimentwise Type I error rate is almost like comparing

oranges with bananas. As Einot and Gabriel (1975) pointed out, any observed difference in the performance of

the two procedures is more likely to be due to the different Type I error probabilities than to the techniques

used. Therefore, one should force all procedures to have the same experimentwise (or comparisonwise) Type

I error rate and compare their powers, as in the Neyman-Pearson two-decision situations. With orthogonal

contrasts and large numbers of degrees of freedom for error mean square, we have seen in Section 3.1 that for

t = 10 treatments, say, a 5% experimentwise error rate corresponds to a .57% comparisonwise error rate, and

a 5% comparisonwise error rate is equivalent to a 36.98% experimentwise error rate.
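The correspondence just quoted follows from treating the t - 1 = 9 orthogonal contrasts as independent tests, so that the experimentwise rate is 1 - (1 - a)^9. A quick check of both directions:

```python
t = 10
m = t - 1  # 9 orthogonal contrasts treated as independent tests

# comparisonwise 5% -> experimentwise rate
exp_rate = 1 - (1 - 0.05) ** m
# experimentwise 5% -> comparisonwise rate
comp_rate = 1 - (1 - 0.05) ** (1 / m)

print(round(100 * exp_rate, 2))   # 36.98 (percent)
print(round(100 * comp_rate, 2))  # 0.57 (percent)
```

Both values agree with the figures in the text.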

Einot and Gabriel (1975) studied the powers of multiple comparison procedures for fixed maximal

experimentwise levels, and "... generally recommend the Tukey technique for its elegant simplicity and

existent confidence bounds-its power is little below that of any other method. Simulation was for 3, 4, and 5

treatments: the conclusions might need modification for more treatments."

No doubt the reader will think that the last word has not been written on the choice of a multiple

comparison procedure. (Some statisticians do not even believe in multiple comparisons. In his discussion of

the review paper by O'Neill and Wetherill (1971), R. L. Plackett expressed his "view that much of the subject

of multiple comparisons is essentially artificial," while J. A. Nelder went so far as stating that in his opinion

"multiple comparison methods have no place at all in the interpretation of data.") In the final analysis, the

choice will be subjective. To a very large extent, this choice will hinge on a choice between an experimentwise

error rate (for which Tukey's HSD is the recommended procedure) and a comparisonwise error rate (for

which Duncan's MRT is recommended). As mentioned earlier, the author's opinion is that in the majority of

cases, the comparisonwise basis is more appropriate since one wrong inference usually does not make the

other inferences in the same experiment meaningless. There is really not that much difference between the

methods. We can remove or reduce objections to Duncan's MRT by requiring an initial significant overall F

test or by taking Duncan's comparisonwise a to be 0.01 or 0.001. Similarly, we can remove or reduce

objections to Tukey's HSD by taking Tukey's experimentwise a to be 0.10 or 0.25, but, as Einot and Gabriel

wondered, it may be that "it does not seem scientifically respectable to work explicitly with a level of 0.25."

The choice of the kind of Type I error rates is bypassed altogether in the Waller-Duncan Bayesian k-ratio

LSD rule. It also has the extremely appealing feature that the observed F value is used in the calculation of

the LSD. With a large F (of 3.0 and above, indicating strong evidence of existence of differences), the test

behaves like the comparisonwise procedures (Duncan's MRT and Fisher's LSD) with good power properties,

while for a small F, it becomes conservative with good protection against Type I error, as in the Tukey HSD

procedure. It is as if the choice between a comparisonwise and an experimentwise error rate is taken out of

the experimenter's hands and is determined by the experiment itself (the experimental F value). "In this way

the decision theoretic rule enjoys the advantages of both comparisonwise and experimentwise a rules without

their disadvantages." (Dixon and Duncan 1975, p. 822). This procedure will become more popular in the

future, especially if more extensive tables become available.



TABLE A.-Two-sided (100a/m)% points of Student's t-distribution with v degrees of freedom*

a = .05

2 3 4 5 6 7 8 9 10 15 20 25 30 35 40 45 50

5 3.17 3.54 3.81 4.04 4.22 4.38 4.53 4.66 4.78 5.25 5.60 5.89 6.15 6.36 6.56 6.70 6.86

7 2.84 3.13 3.34 3.50 3.64 3.76 3.86 3.95 4.03 4.36 4.59 4.78 4.95 5.09 5.21 5.31 5.40

10 2.64 2.87 3.04 3.17 3.28 3.37 3.45 3.52 3.58 3.83 4.01 4.15 4.27 4.37 4.45 4.53 4.59

12 2.56 2.78 2.94 3.06 3.15 3.24 3.31 3.37 3.43 3.65 3.80 3.93 4.04 4.13 4.20 4.26 4.32

15 2.49 2.69 2.84 2.95 3.04 3.11 3.18 3.24 3.29 3.48 3.62 3.74 3.82 3.90 3.97 4.02 4.07

20 2.42 2.61 2.75 2.85 2.93 3.00 3.06 3.11 3.16 3.33 3.46 3.55 3.63 3.70 3.76 3.80 3.85

24 2.39 2.58 2.70 2.80 2.88 2.94 3.00 3.05 3.09 3.26 3.38 3.47 3.54 3.61 3.66 3.70 3.74

30 2.36 2.54 2.66 2.75 2.83 2.89 2.94 2.99 3.03 3.19 3.30 3.39 3.46 3.52 3.57 3.61 3.65

40 2.33 2.50 2.62 2.71 2.78 2.84 2.89 2.93 2.97 3.12 3.23 3.31 3.38 3.43 3.48 3.51 3.55

60 2.30 2.47 2.58 2.66 2.73 2.79 2.84 2.88 2.92 3.06 3.16 3.24 3.30 3.34 3.39 3.42 3.46

120 2.27 2.43 2.54 2.62 2.68 2.74 2.79 2.83 2.86 2.99 3.09 3.16 3.22 3.27 3.31 3.34 3.37

∞ 2.24 2.39 2.50 2.58 2.64 2.69 2.74 2.77 2.81 2.94 3.02 3.09 3.15 3.19 3.23 3.26 3.29

a = .01

[Entries for v = 5 through 60 are illegible in this copy; only the last two rows are legible.]

120 2.86 2.99 3.09 3.16 3.22 3.27 3.31 3.34 3.37 3.50 3.58 3.64 3.69 3.73 3.77 3.80 3.83

∞ 2.81 2.94 3.02 3.09 3.15 3.19 3.23 3.26 3.29 3.40 3.48 3.54 3.59 3.63 3.66 3.69 3.72

†Obtained by graphical interpolation.

Source: Reproduced from Olive Jean Dunn, Multiple Comparisons Among Means, Journal of the American Statistical Association, vol. 56 (1961), pp. 52-64, with the permission of the author and the editor.

TABLE B.--Percentage points of the studentized range q(a;p,v)*

a = .05

2 3 4 5 6 7 8 9 10

1 17.97 26.98 32.82 37.08 40.41 43.12 45.40 47.36 49.07

2 6.085 8.331 9.798 10.88 11.74 12.44 13.03 13.54 13.99

3 4.501 5.910 6.825 7.502 8.037 8.478 8.853 9.177 9.462

4 3.927 5.040 5.757 6.287 6.707 7.053 7.347 7.602 7.826

5 3.635 4.602 5.218 5.673 6.033 6.330 6.582 6.802 6.995

6 3.461 4.339 4.896 5.305 5.628 5.895 6.122 6.319 6.493

7 3.344 4.165 4.681 5.060 5.359 5.606 5.815 5.998 6.158

8 3.261 4.041 4.529 4.886 5.167 5.399 5.597 5.767 5.918

9 3.199 3.949 4.415 4.756 5.024 5.244 5.432 5.595 5.739

10 3.151 3.877 4.327 4.654 4.912 5.124 5.305 5.461 5.599

11 3.113 3.820 4.256 4.574 4.823 5.028 5.202 5.353 5.487

12 3.082 3.773 4.199 4.508 4.751 4.950 5.119 5.265 5.395

13 3.055 3.735 4.151 4.453 4.690 4.885 5.049 5.192 5.318

14 3.033 3.702 4.111 4.407 4.639 4.829 4.990 5.131 5.254

15 3.014 3.674 4.076 4.367 4.595 4.782 4.940 5.077 5.198

16 2.998 3.649 4.046 4.333 4.557 4.741 4.897 5.031 5.150

17 2.984 3.628 4.020 4.303 4.524 4.705 4.858 4.991 5.108

18 2.971 3.609 3.997 4.277 4.495 4.673 4.824 4.956 5.071

19 2.960 3.593 3.977 4.253 4.469 4.645 4.794 4.924 5.038

20 2.950 3.578 3.958 4.232 4.445 4.620 4.768 4.896 5.008

24 2.919 3.532 3.901 4.166 4.373 4.541 4.684 4.807 4.915

30 2.888 3.486 3.845 4.102 4.302 4.464 4.602 4.720 4.824

40 2.858 3.442 3.791 4.039 4.232 4.389 4.521 4.635 4.735

60 2.829 3.399 3.737 3.977 4.163 4.314 4.441 4.550 4.646

120 2.800 3.356 3.685 3.917 4.096 4.241 4.363 4.468 4.560

∞ 2.772 3.314 3.633 3.858 4.030 4.170 4.286 4.387 4.474

TABLE B. -Percentage points of the studentized range q(a;p, v)*-Continued

a = .05

P

11 12 13 14 15 16 17 18 19

1 50.59 51.96 53.20 54.33 55.36 56.32 57.22 58.04 58.83

2 14.39 14.75 15.08 15.38 15.65 15.91 16.14 16.37 16.57

3 9.717 9.946 10.15 10.35 10.53 10.69 10.84 10.98 11.11

4 8.027 8.208 8.373 8.525 8.664 8.794 8.914 9.028 9.134

5 7.168 7.324 7.466 7.596 7.717 7.828 7.932 8.030 8.122

6 6.649 6.789 6.917 7.034 7.143 7.244 7.338 7.426 7.508

7 6.302 6.431 6.550 6.658 6.759 6.852 6.939 7.020 7.097

8 6.054 6.175 6.287 6.389 6.483 6.571 6.653 6.729 6.802

9 5.867 5.983 6.089 6.186 6.276 6.359 6.437 6.510 6.579

10 5.722 5.833 5.935 6.028 6.114 6.194 6.269 6.339 6.405

11 5.605 5.713 5.811 5.901 5.984 6.062 6.134 6.202 6.265

12 5.511 5.615 5.710 5.798 5.878 5.953 6.023 6.089 6.151

13 5.431 5.533 5.625 5.711 5.789 5.862 5.931 5.995 6.055

14 5.364 5.463 5.554 5.637 5.714 5.786 5.852 5.915 5.974

15 5.306 5.404 5.493 5.574 5.649 5.720 5.785 5.846 5.904

16 5.256 5.352 5.439 5.520 5.593 5.662 5.727 5.786 5.843

17 5.212 5.307 5.392 5.471 5.544 5.612 5.675 5.734 5.790

18 5.174 5.267 5.352 5.429 5.501 5.568 5.630 5.688 5.743

19 5.140 5.231 5.315 5.391 5.462 5.528 5.589 5.647 5.701

20 5.108 5.199 5.282 5.357 5.427 5.493 5.553 5.610 5.663

24 5.012 5.099 5.179 5.251 5.319 5.381 5.439 5.494 5.545

30 4.917 5.001 5.077 5.147 5.211 5.271 5.327 5.379 5.429

40 4.824 4.904 4.977 5.044 5.106 5.163 5.216 5.266 5.313

60 4.732 4.808 4.878 4.942 5.001 5.056 5.107 5.154 5.199

120 4.641 4.714 4.781 4.842 4.898 4.950 4.998 5.044 5.086

∞ 4.552 4.622 4.685 4.743 4.796 4.845 4.891 4.934 4.974


TABLE B. -Percentage points of the studentized range q(a;p, v)*-Continued

a = .05

v 20 22 24 26 28 30 32 34 36

1 59.56 60.91 62.12 63.22 64.23 65.15 66.01 66.81 67.56

2 16.77 17.13 17.45 17.75 18.02 18.27 18.50 18.72 18.92

3 11.24 11.47 11.68 11.87 12.05 12.21 12.36 12.50 12.63

4 9.233 9.418 9.584 9.736 9.875 10.00 10.12 10.23 10.34

5 8.208 8.368 8.512 8.643 8.764 8.875 8.979 9.075 9.165

6 7.587 7.730 7.861 7.979 8.088 8.189 8.283 8.370 8.452

7 7.170 7.303 7.423 7.533 7.634 7.728 7.814 7.895 7.972

8 6.870 6.995 7.109 7.212 7.307 7.395 7.477 7.554 7.625

9 6.644 6.763 6.871 6.970 7.061 7.145 7.222 7.295 7.363

10 6.467 6.582 6.686 6.781 6.868 6.948 7.023 7.093 7.159

11 6.326 6.436 6.536 6.628 6.712 6.790 6.863 6.930 6.994

12 6.209 6.317 6.414 6.503 6.585 6.660 6.731 6.796 6.858

13 6.112 6.217 6.312 6.398 6.478 6.551 6.620 6.684 6.744

14 6.029 6.132 6.224 6.309 6.387 6.459 6.526 6.588 6.647

15 5.958 6.059 6.149 6.233 6.309 6.379 6.445 6.506 6.564

16 5.897 5.995 6.084 6.166 6.241 6.310 6.374 6.434 6.491

17 5.842 5.940 6.027 6.107 6.181 6.249 6.313 6.372 6.427

18 5.794 5.890 5.977 6.055 6.128 6.195 6.258 6.316 6.371

19 5.752 5.846 5.932 6.009 6.081 6.147 6.209 6.267 6.321

20 5.714 5.807 5.891 5.968 6.039 6.104 6.165 6.222 6.275

24 5.594 5.683 5.764 5.838 5.906 5.968 6.027 6.081 6.132

30 5.475 5.561 5.638 5.709 5.774 5.833 5.889 5.941 5.990

40 5.358 5.439 5.513 5.581 5.642 5.700 5.753 5.803 5.849

60 5.241 5.319 5.389 5.453 5.512 5.566 5.617 5.664 5.708

120 5.126 5.200 5.266 5.327 5.382 5.434 5.481 5.526 5.568

∞ 5.012 5.081 5.144 5.201 5.253 5.301 5.346 5.388 5.427

TABLE B.--Percentage points of the studentized range q(a;p,v)*-Continued

a = .05

38 40 50 60 70 80 90 100

1 68.26 68.92 71.73 73.97 75.82 77.40 78.77 79.98

2 19.11 19.28 20.05 20.66 21.16 21.59 21.96 22.29

3 12.75 12.87 13.36 13.76 14.08 14.36 14.61 14.82

4 10.44 10.53 10.93 11.24 11.51 11.73 11.92 12.09

5 9.250 9.330 9.674 9.949 10.18 10.38 10.54 10.69

6 8.529 8.601 8.913 9.163 9.370 9.548 9.702 9.839

7 8.043 8.110 8.400 8.632 8.824 8.989 9.133 9.261

8 7.693 7.756 8.029 8.248 8.430 8.586 8.722 8.843

9 7.428 7.488 7.749 7.958 8.132 8.281 8.410 8.526

10 7.220 7.279 7.529 7.730 7.897 8.041 8.166 8.276

11 7.053 7.110 7.352 7.546 7.708 7.847 7.968 8.075

12 6.916 6.970 7.205 7.394 7.552 7.687 7.804 7.909

13 6.800 6.854 7.083 7.267 7.421 7.552 7.667 7.769

14 6.702 6.754 6.979 7.159 7.309 7.438 7.550 7.650

15 6.618 6.669 6.888 7.065 7.212 7.339 7.449 7.546

16 6.544 6.594 6.810 6.984 7.128 7.252 7.360 7.457

17 6.479 6.529 6.741 6.912 7.054 7.176 7.283 7.377

18 6.422 6.471 6.680 6.848 6.989 7.109 7.213 7.307

19 6.371 6.419 6.626 6.792 6.930 7.048 7.152 7.244

20 6.325 6.373 6.576 6.740 6.877 6.994 7.097 7.187

24 6.181 6.226 6.421 6.579 6.710 6.822 6.920 7.008

30 6.037 6.080 6.267 6.417 6.543 6.650 6.744 6.827

40 5.893 5.934 6.112 6.255 6.375 6.477 6.566 6.645

60 5.750 5.789 5.958 6.093 6.206 6.303 6.387 6.462

120 5.607 5.644 5.802 5.929 6.035 6.126 6.205 6.275

∞ 5.463 5.498 5.646 5.764 5.863 5.947 6.020 6.085

TABLE B.-Percentage points of the studentized range q(a;p,v)*-Continued

a = .01

v 2 3 4 5 6 7 8 9 10

1 90.03 135.0 164.3 185.6 202.2 215.8 227.2 237.0 245.6

2 14.04 19.02 22.29 24.72 26.63 28.20 29.53 30.68 31.69

3 8.261 10.62 12.17 13.33 14.24 15.00 15.64 16.20 16.69

4 6.512 8.120 9.173 9.958 10.58 11.10 11.55 11.93 12.27

5 5.702 6.976 7.804 8.421 8.913 9.321 9.669 9.972 10.24

6 5.243 6.331 7.033 7.556 7.973 8.318 8.613 8.869 9.097

7 4.949 5.919 6.543 7.005 7.373 7.679 7.939 8.166 8.368

8 4.746 5.635 6.204 6.625 6.960 7.237 7.474 7.681 7.863

9 4.596 5.428 5.957 6.348 6.658 6.915 7.134 7.325 7.495

10 4.482 5.270 5.769 6.136 6.428 6.669 6.875 7.055 7.213

11 4.392 5.146 5.621 5.970 6.247 6.476 6.672 6.842 6.992

12 4.320 5.046 5.502 5.836 6.101 6.321 6.507 6.670 6.814

13 4.260 4.964 5.404 5.727 5.981 6.192 6.372 6.528 6.667

14 4.210 4.895 5.322 5.634 5.881 6.085 6.258 6.409 6.543

15 4.168 4.836 5.252 5.556 5.796 5.994 6.162 6.309 6.439

16 4.131 4.786 5.192 5.489 5.722 5.915 6.079 6.222 6.349

17 4.099 4.742 5.140 5.430 5.659 5.847 6.007 6.147 6.270

18 4.071 4.703 5.094 5.379 5.603 5.788 5.944 6.081 6.201

19 4.046 4.670 5.054 5.334 5.554 5.735 5.889 6.022 6.141

20 4.024 4.639 5.018 5.294 5.510 5.688 5.839 5.970 6.087

24 3.956 4.546 4.907 5.168 5.374 5.542 5.685 5.809 5.919

30 3.889 4.455 4.799 5.048 5.242 5.401 5.536 5.653 5.756

40 3.825 4.367 4.696 4.931 5.114 5.265 5.392 5.502 5.599

60 3.762 4.282 4.595 4.818 4.991 5.133 5.253 5.356 5.447

120 3.702 4.200 4.497 4.709 4.872 5.005 5.118 5.214 5.299

∞ 3.643 4.120 4.403 4.603 4.757 4.882 4.987 5.078 5.157


TABLE B. -Percentage points of the studentized range q(a;p, v)*-Continued

a = .01

11 12 13 14 15 16 17 18 19

1 253.2 260.0 266.2 271.8 277.0 281.8 286.3 290.4 294.3

2 32.59 33.40 34.13 34.81 35.43 36.00 36.53 37.03 37.50

3 17.13 17.53 17.89 18.22 18.52 18.81 19.07 19.32 19.55

4 12.57 12.84 13.09 13.32 13.53 13.73 13.91 14.08 14.24

5 10.48 10.70 10.89 11.08 11.24 11.40 11.55 11.68 11.81

6 9.301 9.485 9.653 9.808 9.951 10.08 10.21 10.32 10.43

7 8.548 8.711 8.860 8.997 9.124 9.242 9.353 9.456 9.554

8 8.027 8.176 8.312 8.436 8.552 8.659 8.760 8.854 8.943

9 7.647 7.784 7.910 8.025 8.132 8.232 8.325 8.412 8.495

10 7.356 7.485 7.603 7.712 7.812 7.906 7.993 8.076 8.153

11 7.128 7.250 7.362 7.465 7.560 7.649 7.732 7.809 7.883

12 6.943 7.060 7.167 7.265 7.356 7.441 7.520 7.594 7.665

13 6.791 6.903 7.006 7.101 7.188 7.269 7.345 7.417 7.485

14 6.664 6.772 6.871 6.962 7.047 7.126 7.199 7.268 7.333

15 6.555 6.660 6.757 6.845 6.927 7.003 7.074 7.142 7.204

16 6.462 6.564 6.658 6.744 6.823 6.898 6.967 7.032 7.093

17 6.381 6.480 6.572 6.656 6.734 6.806 6.873 6.937 6.997

18 6.310 6.407 6.497 6.579 6.655 6.725 6.792 6.854 6.912

19 6.247 6.342 6.430 6.510 6.585 6.654 6.719 6.780 6.837

20 6.191 6.285 6.371 6.450 6.523 6.591 6.654 6.714 6.771

24 6.017 6.106 6.186 6.261 6.330 6.394 6.453 6.510 6.563

30 5.849 5.932 6.008 6.078 6.143 6.203 6.259 6.311 6.361

40 5.686 5.764 5.835 5.900 5.961 6.017 6.069 6.119 6.165

60 5.528 5.601 5.667 5.728 5.785 5.837 5.886 5.931 5.974

120 5.375 5.443 5.505 5.562 5.614 5.662 5.708 5.750 5.790

∞ 5.227 5.290 5.348 5.400 5.448 5.493 5.535 5.574 5.611

TABLE B.--Percentage points of the studentized range q(a;p,v)*-Continued

a = .01

v 20 22 24 26 28 30 32 34 36

1 298.0 304.7 310.8 316.3 321.3 326.0 330.3 334.3 338.0

2 37.95 38.76 39.49 40.15 40.76 41.32 41.84 42.33 42.78

3 19.77 20.17 20.53 20.86 21.16 21.44 21.70 21.95 22.17

4 14.40 14.68 14.93 15.16 15.37 15.57 15.75 15.92 16.08

5 11.93 12.16 12.36 12.54 12.71 12.87 13.02 13.15 13.28

6 10.54 10.73 10.91 11.06 11.21 11.34 11.47 11.58 11.69

7 9.646 9.815 9.970 10.11 10.24 10.36 10.47 10.58 10.67

8 9.027 9.182 9.322 9.450 9.569 9.678 9.779 9.874 9.964

9 8.573 8.717 8.847 8.966 9.075 9.177 9.271 9.360 9.443

10 8.226 8.361 8.483 8.595 8.698 8.794 8.883 8.966 9.044

11 7.952 8.080 8.196 8.303 8.400 8.491 8.575 8.654 8.728

12 7.731 7.853 7.964 8.066 8.159 8.246 8.327 8.402 8.473

13 7.548 7.665 7.772 7.870 7.960 8.043 8.121 8.193 8.262

14 7.395 7.508 7.611 7.705 7.792 7.873 7.948 8.018 8.084

15 7.264 7.374 7.474 7.566 7.650 7.728 7.800 7.869 7.932

16 7.152 7.258 7.356 7.445 7.527 7.602 7.673 7.739 7.802

17 7.053 7.158 7.253 7.340 7.420 7.493 7.563 7.627 7.687

18 6.968 7.070 7.163 7.247 7.325 7.398 7.465 7.528 7.587

19 6.891 6.992 7.082 7.166 7.242 7.313 7.379 7.440 7.498

20 6.823 6.922 7.011 7.092 7.168 7.237 7.302 7.362 7.419

24 6.612 6.705 6.789 6.865 6.936 7.001 7.062 7.119 7.173

30 6.407 6.494 6.572 6.644 6.710 6.772 6.828 6.881 6.932

40 6.209 6.289 6.362 6.429 6.490 6.547 6.600 6.650 6.697

60 6.015 6.090 6.158 6.220 6.277 6.330 6.378 6.424 6.467

120 5.827 5.897 5.959 6.016 6.069 6.117 6.162 6.204 6.244

∞ 5.645 5.709 5.766 5.818 5.866 5.911 5.952 5.990 6.026
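As a usage note for Table B: Tukey's HSD for equal group sizes n is w = q(a;t,v) * sqrt(MSE/n), where q is read from the table for t treatments and v error degrees of freedom. A minimal sketch with hypothetical data, taking q(.05;4,16) = 4.046 from the table:

```python
import math

def tukey_hsd(q, mse, n):
    """Tukey's honestly significant difference for equal group sizes n,
    given the tabled studentized-range point q and the error mean square."""
    return q * math.sqrt(mse / n)

# hypothetical: t = 4 treatments, n = 5 replicates, error d.f. v = 16, MSE = 2.5
w = tukey_hsd(q=4.046, mse=2.5, n=5)
print(round(w, 3))  # any pair of means differing by more than w is declared different
```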

[Pages missing or unavailable.]

TABLE C.-Critical values for Duncan's Multiple Range Test-Continued

a = .05

[Entries for p = 20 through 100 are illegible in this copy.]

TABLE C.-Critical values for Duncan's Multiple Range Test-Continued

a = .01

P 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

1 90.03 90.03 90.03 90.03 90.03 90.03 90.03 90.03 90.03 90.03 90.03 90.03 90.03 90.03 90.03 90.03 90.03 90.03

2 14.04 14.04 14.04 14.04 14.04 14.04 14.04 14.04 14.04 14.04 14.04 14.04 14.04 14.04 14.04 14.04 14.04 14.04

3 8.261 8.321 8.321 8.321 8.321 8.321 8.321 8.321 8.321 8.321 8.321 8.321 8.321 8.321 8.321 8.321 8.321 8.321

4 6.512 6.677 6.740 6.756 6.756 6.756 6.756 6.756 6.756 6.756 6.756 6.756 6.756 6.756 6.756 6.756 6.756 6.756

5 5.702 5.893 5.989 6.040 6.065 6.074 6.074 6.074 6.074 6.074 6.074 6.074 6.074 6.074 6.074 6.074 6.074 6.074

6 5.243 5.439 5.549 5.614 5.655 5.680 5.694 5.701 5.703 5.703 5.703 5.703 5.703 5.703 5.703 5.703 5.703 5.703

7 4.949 5.145 5.260 5.334 5.383 5.416 5.439 5.454 5.464 5.470 5.472 5.472 5.472 5.472 5.472 5.472 5.472 5.472

8 4.746 4.939 5.057 5.135 5.189 5.227 5.256 5.276 5.291 5.302 5.309 5.314 5.316 5.317 5.317 5.317 5.317 5.317

9 4.596 4.787 4.906 4.986 5.043 5.086 5.118 5.142 5.160 5.174 5.185 5.193 5.199 5.203 5.205 5.206 5.206 5.206

10 4.482 4.671 4.790 4.871 4.931 4.975 5.010 5.037 5.058 5.074 5.088 5.098 5.106 5.112 5.117 5.120 5.122 5.124

11 4.392 4.579 4.697 4.780 4.841 4.887 4.924 4.952 4.975 4.994 5.009 5.021 5.031 5.039 5.045 5.050 5.054 5.057

12 4.320 4.504 4.622 4.706 4.767 4.815 4.852 4.883 4.907 4.927 4.944 4.958 4.969 4.978 4.986 4.993 4.998 5.002

13 4.260 4.442 4.560 4.644 4.706 4.755 4.793 4.824 4.850 4.872 4.889 4.904 4.917 4.928 4.937 4.944 4.950 4.956

14 4.210 4.391 4.508 4.591 4.654 4.704 4.743 4.775 4.802 4.824 4.843 4.859 4.872 4.884 4.894 4.902 4.910 4.916

15 4.168 4.347 4.463 4.547 4.610 4.660 4.700 4.733 4.760 4.783 4.803 4.820 4.834 4.846 4.857 4.866 4.874 4.881

16 4.131 4.309 4.425 4.509 4.572 4.622 4.663 4.696 4.724 4.748 4.768 4.786 4.800 4.813 4.825 4.835 4.844 4.851

17 4.099 4.275 4.391 4.475 4.539 4.589 4.630 4.664 4.693 4.717 4.738 4.756 4.771 4.785 4.797 4.807 4.816 4.824

18 4.071 4.246 4.362 4.445 4.509 4.560 4.601 4.635 4.664 4.689 4.711 4.729 4.745 4.759 4.772 4.783 4.792 4.801

19 4.046 4.220 4.335 4.419 4.483 4.534 4.575 4.610 4.639 4.665 4.686 4.705 4.722 4.736 4.749 4.761 4.771 4.780

20 4.024 4.197 4.312 4.395 4.459 4.510 4.552 4.587 4.617 4.642 4.664 4.684 4.701 4.716 4.729 4.741 4.751 4.761

24 3.956 4.126 4.239 4.322 4.386 4.437 4.480 4.516 4.546 4.573 4.596 4.616 4.634 4.651 4.665 4.678 4.690 4.700

30 3.889 4.056 4.168 4.250 4.314 4.366 4.409 4.445 4.477 4.504 4.528 4.550 4.569 4.586 4.601 4.615 4.628 4.640

40 3.825 3.988 4.098 4.180 4.244 4.296 4.339 4.376 4.408 4.436 4.461 4.483 4.503 4.521 4.537 4.553 4.566 4.579

60 3.762 3.922 4.031 4.111 4.174 4.226 4.270 4.307 4.340 4.368 4.394 4.417 4.438 4.456 4.474 4.490 4.504 4.518

120 3.702 3.858 3.965 4.044 4.107 4.158 4.202 4.239 4.272 4.301 4.327 4.351 4.372 4.392 4.410 4.426 4.442 4.456

∞ 3.643 3.796 3.900 3.978 4.040 4.091 4.135 4.172 4.205 4.235 4.261 4.285 4.307 4.327 4.345 4.363 4.379 4.394

TABLE C.-Critical values for Duncan's Multiple Range Test-Continued

a = .01

20 22 24 26 28 30 32 34 36 38 40 50 60 70 80 90 100

90.03

14.04

8.321

6.756

6.074

5.703

5.472

5.317

5.206

5.124

5.059

5.006

4.960

4.921

4.887

4.858

4.832

4.808

4.788

4.769

4.710

4.650

4.591

4.530

4.469

4.408

90.03

14.04

8.321

6.756

6.074

5.703

5.472

5.317

5.206

5.124

5.061

5.010

4.966

4.929

4.897

4.869

4.844

4.821

4.802

4.786

4.727

4.669

4.611

4.553

4.494

4.434

90.03

14.04

8.321

6.756

6.074

5.703

5.472

5.317

5.206

5.124

5.061

5.011

4.970

4.935

4.904

4.877

4.853

4.832

4.812

4.795

4.741

4.685

4.630

4.573

4.516

4.457

90.03

14.04

8.321

6.756

6.074

5.703

5.472

5.317

5.206

5.124

5.061

5.011

4.972

4.938

4.909

4.883

4.860

4.839

4.821

4.805

4.752

4.699

4.645

4.591

90.03

14.04

8.321

6.756

6.074

5.703

5.472

5.317

5.206

5.124

5.061

5.011

4.972

4.940

4.912

4.887

4.865

4.846

4.828

4.813

4.762

4.711

4.659

4.607

4.535 4.552 4.568

4.478 4.497 4.514

90.03

14.04

8.321

6.756

6.074

5.703

5.472

5.317

5.206

5.124

5.061

5.011

4.972

4.940

4.914

4.890

4.869

4.850

4.833

4.818

4.770

4.721

4.671

4.620

90.03

14.04

8.321

6.756

6.074

5.703

5.472

5.317

5.206

5.124

5.061

5.011

4.972

4.940

4.914

4.892

4.872

4.854

4.838

4.823

4.777

4.730

4.682

4.633

4.583

4.530

90.03

14.04

8.321

6.756

6.074

5.703

5.472

5.317

5.206

5.124

5.061

5.011

4.972

4.940

4.914

4.892

4.873

4.856

4.841

4.827

4.783

4.738

4.692

4.645

4.596

4.545

90.03

14.04

8.321

6.756

6.074

5.703

5.472

5.317

5.206

5.124

5.061

5.011

4.972

4.940

4.914

4.892

4.874

4.857

4.843

4.830

4.788

4.744

4.700

4.655

4.609

4.559

90.03

14.04

8.321

6.756

6.074

5.703

5.472

5.317

5.206

5.124

5.061

5.011

4.972

4.940

4.914

4.892

4.874

4.858

4.844

4.832

4.791

4.750

4.708

4.665

4.619

4.572

90.03

14.04

8.321

6.756

6.074

5.703

5.472

5.317

5.206

5.124

5.061

5.011

4.972

4.940

4.914

4.892

4.874

4.858

4.845

4.833

4.794

4.755

4.715

4.673

4.630

4.584

90.03

14.04

8.321

6.756

6.074

5.703

5.472

5.317

5.206

5.124

5.061

5.011

4.972

4.940

4.914

4.892

4.874

4.858

4.845

4.833

4.802

4.772

4.740

4.707

4.673

4.635

90.03

14.04

8.321

6.756

6.074

5.703

5.472

5.317

5.206

5.124

5.061

5.011

4.972

4.940

4.914

4.892

4.874

4.858

4.845

4.833

4.802

4.777

4.754

4.730

4.703

4.675

90.03

14.04

8.321

6.756

6.074

5.703

5.472

5.317

5.206

5.124

5.061

5.011

4.972

4.940

4.914

4.892

4.874

4.858

4.845

4.833

4.802

4.777

4.761

4.745

4.727

4.707

90.03

14.04

8.321

6.756

6.074

5.703

5.472

5.317

5.206

5.124

5.061

5.011

4.972

4.940

4.914

4.892

4.874

4.858

4.845

4.833

4.802

4.777

4.764

4.755

4.745

4.734

90.03

14.04

8.321

6.756

6.074

5.703

5.472

5.317

5.206

5.124

5.061

5.011

4.972

4.940

4.914

4.892

4.874

4.858

4.845

4.833

4.802

4.777

4.764

4.761

4.759

4.756

90.03

14.04

8.321

6.756

6.074

5.703

5.472

5.317

5.206

5.124

5.061

5.011

4.972

4.940

4.914

4.892

4.874

4.858

4.845

4.833

4.802

4.777

4.764

4.765

4.770

4.776

Source: Reproduced from H. Leon Harter, Critical Values for Duncan's New Multiple Range Test, Biometrics, vol. 16 (1960), with the permission of the author and the editor.

TABLE Dl.-Critical values of k-ratio t test (k = 100)

v(denominator d.f. for F)

q(num. d.f. for F) 6

8 10 12 14 16 18 20 24 30 40 60 120

F = 1.2 (a = .913, b = 2.449)

* *

* *

3.00 3.00

3.09 3.10

3.16 3.17

3.21 3.23

3.26 3.28

3.33 3.37

3.52 3.58

3.67 3.76

3.80 3.91

F = 1.4 (a = .845, b = 1.871)

*

2.82 2.82 2.81 2.80 2.80 2.79 2.78

2.90 2.90 2.89 2.89 2.89 2.88 2.88

2.95 2.95 2.96 2.96 2.96 2.95 2.95

2.99 3.00 3.00 3.01 3.01 3.01 3.01

3.02 3.03 3.04 3.04 3.05 3.05 3.06

3.04 3.06 3.07 3.08 3.08 3.09 3.09

3.08 3.10 3.11 3.12 3.13 3.14 3.15

3.16 3.19 3.22 3.24 3.25 3.28 3.30

3.22 3.26 3.29 3.32 3.34 3.38 3.41

3.26 3.31 3.35 3.39 3.42 3.46 3.50

F = 1.7 (a = .767, b = 1.558)

* * *

F = 2.0 (a = .707,

2 *

4 2.74 2.67 2.63 2.59 2.56 2.54

6 2.79 2.74 2.70 2.67 2.64 2.62

8 2.81 2.77 2.74 2.71 2.69 2.67

10 2.83 2.80 2.77 2.74 2.72 2.70

12 2.84 2.82 2.79 2.77 2.75 2.73

14 2.85 2.83 2.81 2.79 2.77 2.75

16 2.85 2.84 2.82 2.80 2.78 2.76

20 2.86 2.85 2.84 2.82 2.80 2.78

40 2.88 2.89 2.88 2.86 2.85 2.83

100 2.89 2.91 2.90 2.89 2.88 2.86

∞ 2.90 2.92 2.92 2.91 2.90 2.88

See footnotes at end of table.

2.59 2.58

2.70 2.69

2.76 2.75

2.81 2.80

2.84 2.84

2.87 2.86

2.89 2.89

2.92 2.92

3.00 2.99

3.05 3.05

3.09 3.08

b = 1.414)

* *

2.52 2.51

2.60 2.59

2.65 2.64

2.69 2.67

2.71 2.70

2.73 2.72

2.74 2.73

2.77 2.75

2.81 2.80

2.84 2.82

2.86 2.85

* *

2.49 2.46 2.44 2.41 2.3

2.57 2.54 2.52 2.49 2.4

2.62 2.59 2.56 2.53 2.4

2.65 2.62 2.59 2.56 2.5

2.67 2.64 2.61 2.57 2.5

2.69 2.66 2.63 2.59 2.5

2.70 2.67 2.64 2.59 2.5

2.72 2.69 2.65 2.61 2.5

2.77 2.73 2.68 2.62 2.5

2.79 2.75 2.69 2.62 2.5

2.81 2.76 2.69 2.61 2.5

TABLE D1.-Critical values of k-ratio t test (k = 100)-Continued

v(denominator d.f. for F)

q(num. d.f. for F) 6 8 10 12 14 16 18 20 24 30 40 60 120

F = 2.4 (a = .645, b = 1.309)

2 2.18

4 2.71 2.63 2.57 2.53 2.49 2.47 2.44 2.43 2.40 2.37 2.34 2.31 2.28

6 2.75 2.68 2.63 2.58 2.55 2.52 2.50 2.48 2.46 2.42 2.39 2.36 2.32

8 2.77 2.71 2.66 2.62 2.59 2.56 2.54 2.52 2.49 2.45 2.42 2.38 2.34

10 2.79 2.73 2.68 2.64 2.61 2.58 2.56 2.54 2.50 2.47 2.43 2.39 2.34

12 2.79 2.74 2.70 2.66 2.62 2.60 2.57 2.55 2.52 2.48 2.44 2.39 2.35

14 2.80 2.75 2.71 2.67 2.64 2.61 2.58 2.56 2.53 2.49 2.44 2.40 2.35

16 2.81 2.76 2.72 2.68 2.65 2.62 2.59 2.57 2.53 2.49 2.45 2.40 2.34

20 2.82 2.77 2.73 2.69 2.66 2.63 2.60 2.58 2.54 2.50 2.45 2.40 2.34

40 2.83 2.80 2.76 2.72 2.69 2.66 2.63 2.60 2.56 2.51 2.46 2.39 2.33

100 2.84 2.81 2.78 2.74 2.71 2.67 2.64 2.62 2.57 2.51 2.45 2.39 2.32

∞ 2.85 2.83 2.79 2.76 2.72 2.68 2.65 2.62 2.57 2.51 2.45 2.38 2.31

F = 3.0 (a = .577, b = 1.225)

2 2.41 2.36 2.32 2.29 2.27 2.25 2.22 2.20 2.17 2.14 2.11

4 2.68 2.57 2.50 2.45 2.41 2.38 2.35 2.33 2.30 2.27 2.24 2.20 2.17

6 2.71 2.61 2.54 2.49 2.44 2.41 2.39 2.36 2.33 2.29 2.26 2.22 2.18

8 2.72 2.63 2.56 2.51 2.47 2.43 2.40 2.38 2.34 2.31 2.27 2.22 2.18

10 2.74 2.65 2.58 2.52 2.48 2.44 2.41 2.39 2.35 2.31 2.27 2.22 2.18

12 2.74 2.66 2.59 2.53 2.49 2.45 2.42 2.40 2.36 2.31 2.27 2.22 2.18

14 2.75 2.66 2.60 2.54 2.49 2.46 2.43 2.40 2.36 2.32 2.27 2.22 2.17

16 2.75 2.67 2.60 2.55 2.50 2.46 2.43 2.40 2.36 2.32 2.27 2.22 2.17

20 2.76 2.68 2.61 2.55 2.51 2.47 2.43 2.41 2.36 2.32 2.27 2.22 2.17

40 2.77 2.70 2.63 2.57 2.52 2.48 2.44 2.41 2.37 2.32 2.26 2.21 2.16

100 2.78 2.71 2.64 2.58 2.53 2.49 2.45 2.42 2.37 2.31 2.26 2.21 2.16

∞ 2.79 2.71 2.65 2.59 2.53 2.49 2.45 2.42 2.37 2.31 2.26 2.20 2.15

F = 4.0 (a = .500, b = 1.155)

2 2.58 2.44 2.35 2.29 2.25 2.22 2.20 2.18 2.15 2.12 2.09 2.06 2.03

4 2.63 2.50 2.41 2.35 2.30 2.27 2.24 2.22 2.18 2.15 2.12 2.08 2.05

6 2.65 2.52 2.43 2.37 2.32 2.28 2.25 2.23 2.19 2.16 2.12 2.08 2.04

10 2.67 2.55 2.46 2.39 2.34 2.30 2.26 2.24 2.20 2.16 2.12 2.08 2.04

20 2.69 2.57 2.47 2.40 2.35 2.30 2.27 2.24 2.20 2.15 2.11 2.07 2.03

∞ 2.71 2.59 2.49 2.42 2.36 2.31 2.27 2.24 2.19 2.15 2.11 2.06 2.02

F = 6.0 (a= .408, b = 1.095)

2 2.53 2.37 2.27 2.21 2.16 2.13 2.10 2.08 2.05 2.02 1.99 1.96 1.93

4 2.56 2.40 2.30 2.23 2.18 2.14 2.12 2.09 2.06 2.02 1.99 1.96 1.93

6 2.58 2.42 2.31 2.24 2.19 2.15 2.12 2.09 2.06 2.02 1.99 1.95 1.92

10 2.59 2.43 2.32 2.24 2.19 2.15 2.12 2.09 2.06 2.02 1.99 1.95 1.92

20 2.60 2.44 2.32 2.25 2.19 2.15 2.12 2.09 2.05 2.02 1.98 1.95 1.92

∞ 2.61 2.44 2.33 2.25 2.19 2.15 2.12 2.09 2.05 2.02 1.98 1.95 1.92

See footnotes at end of table.

v(denominator d.f. for F)

q(num. d.f. for F) 6 8 10 12 14 16 18 20 24 30 40 60 120

F = 10.0 (a = .316, b = 1.054)

2 2.48 2.30 2.19 2.12 2.07 2.04 2.01 1.99 1.96 1.93 1.90 1.87 1.85

4 2.49 2.31 2.20 2.13 2.08 2.04 2.01 1.99 1.96 1.93 1.90 1.87 1.84

6 2.50 2.31 2.20 2.13 2.08 2.04 2.01 1.99 1.96 1.93 1.90 1.87 1.84

10-∞ 2.51 2.32 2.20 2.13 2.08 2.04 2.01 1.99 1.96 1.93 1.90 1.87 1.84

F = 25.0(a = .200, b = 1.021)

2-4 2.40 2.20 2.10 2.03 1.99 1.95 1.93 1.91 1.88 1.86 1.83 1.80 1.78

6-∞ 2.41 2.21 2.10 2.03 1.99 1.95 1.93 1.91 1.88 1.86 1.83 1.80 1.78

F = ∞ (a = 0, b = 1)

2-∞ 2.33 2.13 2.03 1.97 1.93 1.90 1.88 1.86 1.84 1.81 1.79 1.76 1.74

*All differences not significant. a = 1/√F, b = [F/(F-1)]^1/2.

If v=4, t=2.83 for all q and F satisfying F > 8.12/q.

Source: Reproduced from Waller, Ray A., and Duncan, David B., A Bayes Rule for the Symmetric Multiple Comparisons Problem, Corrigenda, Journal of the American Statistical Association, vol. 67 (1972), pp. 253-255, with the permission of the author and the publisher.
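The a and b values printed in each sub-table header follow directly from F. A minimal sketch, assuming the footnote's garbled exponents are square roots (the one reading that reproduces every printed (a, b) pair; the function name is illustrative, not from the source):

```python
import math

# Waller-Duncan header constants, reading both bracketed exponents in the
# footnote as square roots: a = 1/sqrt(F), b = sqrt(F/(F - 1)).
# This reproduces e.g. F = 1.2 -> a = .913, b = 2.449.
def waller_duncan_ab(F):
    a = 1.0 / math.sqrt(F)
    b = math.sqrt(F / (F - 1.0))
    return a, b

for F in (1.2, 1.4, 2.0, 4.0, 25.0):
    a, b = waller_duncan_ab(F)
    print(F, round(a, 3), round(b, 3))
```

As F grows, a tends to 0 and b to 1, which matches the table's final "F = ∞ (a = 0, b = 1)" block.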


TABLE D2.-Critical values of k-ratio t test (k = 500)

v (denominator d.f. for F)

q (num. d.f. for F) 6 8 10 12 14 16 18 20 24 30 40 60 120

F= 1.2 (a = .913, b = 2.449)

2-16 *

20 4.70 4.82 4.89

40 4.75 4.91 5.03 5.12 5.20 5.25 5.30 5.34 5.41 5.48 5.55 5.61 5.67

100 4.79 4.98 5.13 5.25 5.34 5.43 5.50 5.56 5.65 5.76 5.89 6.02 6.13

∞ 4.81 5.03 5.20 5.34 5.46 5.56 5.65 5.73 5.86 6.02 6.20 6.41 6.56

F = 1.4 (a= .845, b = 1.871)

2-14 *

16 4.61 4.66 4.68 4.69 4.69 4.69 4.69 4.68 4.67 4.65 4.62 4.58 4.53

20 4.64 4.70 4.73 4.75 4.76 4.77 4.77 4.76 4.76 4.74 4.72 4.68 4.62

40 4.68 4.78 4.85 4.89 4.92 4.94 4.96 4.96 4.97 4.97 4.95 4.90 4.81

∞ 4.74 4.88 4.99 5.06 5.12 5.17 5.20 5.23 5.26 5.28 5.26 5.16 4.82

F = 1.7 (a = .767, b = 1.558)

2-8 *

10 4.08 4.02 3.95 3.87

12 4.50 4.46 4.42 4.38 4.34 4.30 4.27 4.24 4.19 4.14 4.07 3.99 3.90

20 4.55 4.54 4.52 4.49 4.46 4.43 4.40 4.37 4.32 4.26 4.18 4.08 3.95

40 4.59 4.61 4.61 4.60 4.57 4.55 4.52 4.49 4.44 4.36 4.26 4.12 3.93

∞ 4.64 4.69 4.71 4.72 4.71 4.69 4.66 4.63 4.57 4.46 4.31 4.07 3.76

F = 2.0 (a = .707, b = 1.414)

2-6 *

8 3.98 3.93 3.89 3.83 3.76 3.69 3.60 3.51

10 4.41 4.31 4.22 4.15 4.08 4.03 3.98 3.94 3.88 3.80 3.72 3.63 3.53

20 4.48 4.41 4.34 4.27 4.21 4.16 4.10 4.06 3.98 3.89 3.78 3.65 3.51

40 4.51 4.47 4.41 4.35 4.29 4.23 4.17 4.12 4.03 3.92 3.78 3.62 3.44

∞ 4.55 4.53 4.49 4.43 4.37 4.31 4.25 4.19 4.07 3.93 3.75 3.54 3.33

F = 2.4 (a = .645, b = 1.309)

2-4 *

6 3.77 3.71 3.65 3.61 3.54 3.47 3.39 3.30 3.22

8 4.31 4.14 4.01 3.91 3.83 3.76 3.70 3.66 3.58 3.50 3.41 3.32 3.22

10 4.33 4.18 4.05 3.95 3.87 3.79 3.73 3.68 3.60 3.51 3.42 3.31 3.21

20 4.39 4.26 4.14 4.04 3.95 3.87 3.80 3.74 3.64 3.53 3.41 3.28 3.15

∞ 4.45 4.35 4.25 4.14 4.03 3.94 3.85 3.78 3.64 3.50 3.34 3.18 3.04

F = 3.0 (a = .577, b = 1.225)

2 *

4 3.43 3.38 3.33 3.26 3.19 3.12 3.04 2.97

6 4.19 3.95 3.79 3.66 3.56 3.49 3.43 3.37 3.30 3.21 3.13 3.04 2.95

10 4.24 4.02 3.85 3.72 3.62 3.53 3.46 3.40 3.31 3.21 3.12 3.02 2.92

20 4.28 4.08 3.91 3.77 3.65 3.56 3.48 3.41 3.31 3.20 3.09 2.98 2.87

∞ 4.33 4.15 3.97 3.82 3.69 3.57 3.48 3.40 3.28 3.15 3.03 2.92 2.82

TABLE D2.-Critical values of k-ratio t test (k = 500)-Continued

v (denominator d.f. for F)

q(num. d.f. for F) 6

8 10 12 14 16 18 20 24 30 40 60 120

F = 4.0 (a = .500, b = 1.155)

*

3.40 3.30 3.22 3.16 3.11 3.04 2.96

3.43 3.32 3.24 3.17 3.12 3.04 2.95

3.46 3.34 3.25 3.17 3.11 3.03 2.94

3.48 3.35 3.25 3.17 3.10 3.01 2.92

3.49 3.35 3.24 3.15 3.09 2.99 2.89

*

3.74

4.08 3.78

4.12 3.83

4.15 3.86

4.19 3.90

3.90 3.54

3.93 3.57

3.95 3.59

3.97 3.60

3.99 3.62

3.99 3.62

F = 10.0 (a = .316, b = 1.054)

2.96 2.86 2.79 2.74 2.70

2.96 2.86 2.79 2.73 2.69

2.96 2.85 2.78 2.72 2.68

2.96 2.85 2.78 2.72 2.68

2.95 2.85 2.77 2.72 2.67

F = 25.0 (a = .200, b = 1.021)

2.92 2.79 2.70 2.64 2.59 2.56

2.92 2.79 2.70 2.64 2.59 2.55

2.92 2.78 2.70 2.63 2.59 2.55

2.64

2.63

2.62

2.62

2.61

2.58

2.57

2.56

2.56

2.56

2.52

2.51

2.50

2.50

2.50

2.47

2.46

2.45

2.45

2.45

2.42

2.41

2.40

2.40

2.40

2.51 2.46 2.41 2.36 2.32

2.50 2.45 2.41 2.36 2.32

2.50 2.45 2.41 2.36 2.32

F = ∞ (a = 0, b = 1)

2.80 2.69 2.61

2.55 2.51 2.48 2.44 2.39 2.35 2.31 2.27

*All differences not significant. a = 1/√F, b = [F/(F-1)]^1/2.

If v=4, t = 4.52 for all q and F satisfying F > 20.43/q.

Source: Reproduced from Waller, Ray A., and Duncan, David B., A Bayes Rule for the Symmetric Multiple Comparisons Problem, Corrigenda, Journal of the American Statistical Association, vol. 67 (1972), pp. 253-255, with the permission of the author and the publisher.



F = 6.0 (a = .408, b = 1.095)

3.14 3.04 2.97 2.91 2.87

3.17 3.06 2.98 2.92 2.87

3.18 3.06 2.98 2.91 2.86

3.18 3.06 2.97 2.91 2.85

3.18 3.06 2.97 2.90 2.84

3.18 3.05 2.96 2.89 2.83

3.10

3.11

3.11

3.11

3.11

3.72 3.33

3.75 3.35

3.78 3.36

3.79 3.36

3.80 3.37

3.55 3.14

3.57 3.14

3.57 3.14

2-∞ 3.39 3.00

TABLE E.-100γ% points of the distribution of the largest absolute value of k uncorrelated Student t variates

with v degrees of freedom

v k 1 2 3 4 5 6 8 10 12 15 20

γ = 0.90

2.353

2.132

2.015

1.943

1.895

1.860

1.833

1.813

1.796

1.782

1.753

1.725

1.708

1.697

1.684

1.671

3.183

2.777

2.571

2.447

2.365

2.306

2.262

2.228

2.201

2.179

2.132

2.086

2.060

2.042

2.021

2.000

2.989

2.662

2.491

2.385

2.314

2.262

2.224

2.193

2.169

2.149

2.107

2.065

2.041

2.025

2.006

1.986

3.960

3.382

3.091

2.916

2.800

2.718

2.657

2.609

2.571

2.540

2.474

2.411

2.374

2.350

2.321

2.292

5.841 7.127

4.604 5.462

4.032 4.700

3.707 4.271

3.500 3.998

3.355 3.809

3.250 3.672

3.169 3.567

3.106 3.485

3.055 3.418

2.947 3.279

2.845 3.149

2.788 3.075

2.750 3.027

2.705 2.969

2.660 2.913

Source: Reproduced from Hahn and Hendrickson (1971), Biometrika 58, p. 323, with the permission of the author and publisher.

3.844

3.368

3.116

2.961

2.856

2.780

2.723

2.678

2.642

2.612

2.548

2.486

2.450

2.426

2.397

2.368

4.011

3.506

3.239

3.074

2.962

2.881

2.819

2.771

2.733

2.701

2.633

2.567

2.528

2.502

2.470

2.439

3.369

2.976

2.769

2.642

2.556

2.494

2.447

2.410

2.381

2.357

2.305

2.255

2.226

2.207

2.183

2.160

4.430

3.745

3.399

3.193

3.056

2.958

2.885

2.829

2.784

2.747

2.669

2.594

2.551

2.522

2.488

2.454

7.914

5.985

5.106

4.611

4.296

4.080

3.922

3.801

3.707

3.631

3.472

3.323

3.239

3.185

3.119

3.055

3.637

3.197

2.965

2.822

2.725

2.656

2.603

2.562

2.529

2.501

2.443

2.386

2.353

2.331

2.305

2.278

4.764

4.003

3.619

3.389

3.236

3.128

3.046

2.984

2.933

2.892

2.805

2.722

2.673

2.641

2.603

2.564

8.479

6.362

5.398

4.855

4.510

4.273

4.100

3.969

3.865

3.782

3.608

3.446

3.354

3.295

3.223

3.154

γ = 0.95

5.023

4.203

3.789

3.541

3.376

3.258

3.171

3.103

3.048

3.004

2.910

2.819

2.766

2.732

2.690

2.649

5.233

4.366

3.928

3.664

3.489

3.365

3.272

3.199

3.142

3.095

2.994

2.898

2.842

2.805

2.760

2.716

4.272

3.722

3.430

3.249

3.127

3.038

2.970

2.918

2.875

2.840

2.765

2.691

2.648

2.620

2.585

2.550

5.562

4.621

4.145

3.858

3.668

3.532

3.430

3.351

3.288

3.236

3.126

3.020

2.959

2.918

2.869

2.821

9.838

7.274

6.106

5.449

5.031

4.742

4.532

4.373

4.247

4.146

3.935

3.738

3.626

3.555

3.468

3.384

4.471

3.887

3.576

3.384

3.253

3.158

3.086

3.029

2.984

2.946

2.865

2.786

2.740

2.709

2.671

2.634

5.812

4.817

4.312

4.008

3.805

3.660

3.552

3.468

3.400

3.345

3.227

3.114

3.048

3.005

2.952

2.900

10.269

7.565

6.333

5.640

5.198

4.894

4.672

4.503

4.370

4.263

4.040

3.831

3.713

3.637

3.545

3.456

4.631

4.020

3.694

3.493

3.355

3.255

3.179

3.120

3.072

3.032

2.947

2.863

2.814

2.781

2.741

2.701

6.015

4.975

4.447

4.129

3.916

3.764

3.651

3.562

3.491

3.433

3.309

3.190

3.121

3.075

3.019

2.964

10.616

7.801

6.519

5.796

5.335

5.017

4.785

4.609

4.470

4.359

4.125

3.907

3.783

3.704

3.607

3.515

4.823

4.180

3.837

3.624

3.478

3.373

3.292

3.229

3.178

3.136

3.045

2.956

2.903

2.868

2.825

2.782

6.259

5.166

4.611

4.275

4.051

3.891

3.770

3.677

3.602

3.541

3.409

3.282

3.208

3.160

3.100

3.041

11.034

8.087

6.744

5.985

5.502

5.168

4.924

4.739

4.593

4.475

4.229

3.999

3.869

3.785

3.683

3.586

3..x66

4.383

4.018

3.790

3.522

3.436

3.:1;3

2. i70

3.0o7

3.016

2.978

2.931

2.,S.4

6.567

5.409

4.819

4.462

4.223

4.052

3.923

3.823

3.743

3.677

3.536

3.399

3.320

3.267

3.203

3.139

11.559

8.451

7.050

6.250

5.716

5.361

5.103

4.905

4.750

4.625

4.363

4.117

3.978

3.889

3.780

3.676

γ = 0.99

8.919

6.656

5.625

5.046

4.677

4.424

4.239

4.098

3.988

3.899

3.714

3.541

3.442

3.379

3.303

3.229

9.277

6.897

5.812

5.202

4.814

4.547

4.353

4.205

4.087

3.995

3.800

3.617

3.514

3.448

3.367

3.290
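The entries of Table E can be spot-checked by simulation. A minimal Monte Carlo sketch, assuming the k variates are formed from independent normal numerators over one shared chi-square denominator with v degrees of freedom (a construction that makes them uncorrelated but not independent; the function name is illustrative):

```python
import numpy as np

# Monte Carlo estimate of the 100*gamma% point of max|t_1|,...,|t_k|,
# where t_i = z_i / s, with z_i iid N(0,1) and a shared denominator
# s = sqrt(chi2_v / v).  (The common-denominator construction is an
# assumption about how the tabled variates arise.)
def max_abs_t_quantile(k, v, gamma, n=200_000, seed=1):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, k))              # numerators
    s = np.sqrt(rng.chisquare(v, n) / v)         # shared denominator
    m = np.abs(z / s[:, None]).max(axis=1)
    return np.quantile(m, gamma)

# For k = 1 this reduces to an ordinary two-sided t quantile; the table
# lists 2.015 at v = 5, gamma = 0.90.
print(round(max_abs_t_quantile(1, 5, 0.90), 2))
```

For k greater than 1 the tabled points exceed the single-variate quantile, which is why special tables are needed for simultaneous intervals.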

TABLE F1.-Critical values of t(a;q,v) for one-sided Dunnett's tests for comparing control against each of q other treatments

a = .05

q = 1

2.02

1.94

1.89

1.86

1.83

1.81

1.80

1.78

1.77

1.76

1.75

1.75

1.74

1.73

1.73

1.72

1.71

1.70

1.68

1.67

1.66

1.64

2.44

2.34

2.27

2.22

2.18

a = .05

3

2.68

2.56

2.48

2.42

2.37

2.34

2.31

2.29

2.27

2.25

2.24

2.23

2.22

2.21

2.20

2.19

2.17

2.15

2.13

2.10

2.08

2.06

4

2.85

2.71

2.62

2.55

2.50

2.47

2.44

2.41

2.39

2.37

2.36

2.34

2.33

2.32

2.31

2.30

2.28

2.25

2.23

2.21

2.18

2.16

5

2.98

2.83

2.73

2.66

2.60

2.56

2.53

2.50

2.48

2.46

2.44

2.43

2.42

2.41

2.40

2.39

2.36

2.33

2.31

2.28

2 3

3.08

2.92

2.82

2.74

2.68

2.64

2.60

2.58

2.55

2.53

2.51

2.50

2.49

2.48

2.47

2.46

2.43

2.40

2.37

2.35

4 5 6 7 8 9

3.16

3.00

2.89

2.81

2.75

2.70

2.67

2.64

2.61

2.59

2.57

2.56

2.54

2.53

2.52

2.51

2.48

2.45

2.42

2.39

2.37

2.34

3.24

3.07

2.95

2.87

2.81

2.76

2.72

2.69

2.65

2.64

2.62

2.61

2.59

2.58

2.57

2.56

2.53

2.50

2.47

2.44

2.41

2.38

9

3.30

3.12

3.01

2.92

2.86

2.81

2.77

2.74

2.71

2.69

2.67

2.65

2.64

2.62

2.61

2.60

2.57

2.54

2.51

2.48

2.45

2.42

a = .01

q = 1

3.37

3.14

3.00

2.90

2.82

2.76

2.72

2.68

2.65

2.62

2.60

2.58

2.57

2.55

2.54

2.53

2.49

2.46

2.42

2.39

2.36

2.33

3.90

3.61

3.42

3.29

3.19

3.11

3.06

3.01

2.97

2.94

2.91

2.88

2.86

2.84

2.83

2.81

2.77

2.72

2.68

2.64

4.21

3.88

3.66

3.51

3.40

3.31

3.25

3.19

3.15

3.11

3.08

3.05

3.03

3.01

2.99

2.97

2.92

2.87

2.82

2.78

4.43

4.07

3.83

3.67

3.55

3.45

3.38

3.32

3.27

3.23

3.20

3.17

3.14

3.12

3.10

3.08

3.03

2.97

2.92

2.87

2.82

2.77

2.15

2.13

2.11

2.09

2.08

2.07

2.06

2.05

2.04

2.03

2.03

2.01

1.99

1.97

1.95

1.93

1.92

Source: Reproduced from C.W. Dunnett, A multiple comparison procedure for comparing several treatments with a control, Journal of the American Statistical

Association, vol. 50 (1955).

4.60

4.21

3.96

3.79

3.66

3.56

3.48

3.42

3.37

3.32

3.29

3.26

3.23

3.21

3.18

3.17

3.11

3.05

2.99

2.94

2.89

2.84

4.73

4.33

4.07

3.88

3.75

3.64

3.56

3.50

3.44

3.40

3.36

3.33

3.30

3.27

3.25

3.23

3.17

3.11

3.05

3.00

2.94

2.89

4.85

4.43

4.15

3.96

3.82

3.71

3.63

3.56

3.51

3.46

3.42

3.39

3.36

3.33

3.31

3.29

3.22

3.16

3.10

3.04

2.99

2.93

4.94

4.51

4.23

4.03

3.89

3.78

3.69

3.62

3.56

3.51

3.47

3.44

3.41

3.38

3.36

3.34

3.27

3.21

3.14

3.08

3.03

2.97

2.26 2.32

2.23 2.29

2.60 2.73

2.56 2.68

5.03

4.59

4.30

4.09

3.94

3.83

3.74

3.67

3.61

3.56

3.52

3.48

3.45

3.42

3.40

3.38

3.31

3.24

3.18

3.12

3.06

3.00

1

TABLE F2.-Critical values of t(a;q,v) for two-sided Dunnett's tests for comparing control against each of q

other treatments

a = .05

v 1 2 3 4 5 6 7 8 9 10 11 12 15 20

5 2.57 3.03 3.29 3.48 3.62 3.73 3.82 3.90 3.97 4.03 4.09 4.14 4.26 4.42

6 2.45 2.86 3.10 3.26 3.39 3.49 3.57 3.64 3.71 3.76 3.81 3.86 3.97 4.11

7 2.36 2.75 2.97 3.12 3.24 3.33 3.41 3.47 3.53 3.58 3.63 3.67 3.78 3.91

8 2.31 2.67 2.88 3.02 3.13 3.22 3.29 3.35 3.41 3.46 3.50 3.54 3.64 3.76

9 2.26 2.61 2.81 2.95 3.05 3.14 3.20 3.26 3.32 3.36 3.40 3.44 3.53 3.65

10 2.23 2.57 2.76 2.89 2.99 3.07 3.14 3.19 3.24 3.29 3.33 3.36 3.45 3.57

11 2.20 2.53 2.72 2.84 2.94 3.02 3.08 3.14 3.19 3.23 3.27 3.30 3.39 3.50

12 2.18 2.50 2.68 2.81 2.90 2.98 3.04 3.09 3.14 3.18 3.22 3.25 3.34 3.45

13 2.16 2.48 2.65 2.78 2.87 2.94 3.00 3.06 3.10 3.14 3.18 3.21 3.29 3.40

14 2.14 2.46 2.63 2.75 2.84 2.91 2.97 3.02 3.07 3.11 3.14 3.18 3.26 3.36

15 2.13 2.44 2.61 2.73 2.82 2.89 2.95 3.00 3.04 3.08 3.12 3.15 3.23 3.33

16 2.12 2.42 2.59 2.71 2.80 2.87 2.92 2.97 3.02 3.06 3.09 3.12 3.20 3.30

17 2.11 2.41 2.58 2.69 2.78 2.85 2.90 2.95 3.00 3.03 3.07 3.10 3.18 3.27

18 2.10 2.40 2.56 2.68 2.76 2.83 2.89 2.94 2.98 3.01 3.05 3.08 3.16 3.25

19 2.09 2.39 2.55 2.66 2.75 2.81 2.87 2.92 2.96 3.00 3.03 3.06 3.14 3.23

20 2.09 2.38 2.54 2.65 2.73 2.80 2.86 2.90 2.95 2.98 3.02 3.05 3.12 3.22

24 2.06 2.35 2.51 2.61 2.70 2.76 2.81 2.86 2.90 2.94 2.97 3.00 3.07 3.16

30 2.04 2.32 2.47 2.58 2.66 2.72 2.77 2.82 2.86 2.89 2.92 2.95 3.02 3.11

40 2.02 2.29 2.44 2.54 2.62 2.68 2.73 2.77 2.81 2.85 2.87 2.90 2.97 3.06

60 2.00 2.27 2.41 2.51 2.58 2.64 2.69 2.73 2.77 2.80 2.83 2.86 2.92 3.00

120 1.98 2.24 2.38 2.47 2.55 2.60 2.65 2.69 2.73 2.76 2.79 2.81 2.87 2.95

∞ 1.96 2.21 2.35 2.44 2.51 2.57 2.61 2.65 2.69 2.72 2.74 2.77 2.83 2.91

a = .01

v 1 2 3 4 5 6 7 8 9 10 11 12 15 20

5 4.03 4.63 4.98 5.22 5.41 5.56 5.69 5.80 5.89 5.98 6.05 6.12 6.30 6.52

6 3.71 4.21 4.51 4.71 4.87 5.00 5.10 5.20 5.28 5.35 5.41 5.47 5.62 5.81

7 3.50 3.95 4.21 4.39 4.53 4.64 4.74 4.82 4.89 4.95 5.01 5.06 5.19 5.36

8 3.36 3.77 4.00 4.17 4.29 4.40 4.48 4.56 4.62 4.68 4.73 4.78 4.90 5.05

9 3.25 3.63 3.85 4.01 4.12 4.22 4.30 4.37 4.43 4.48 4.53 4.57 4.68 4.82

10 3.17 3.53 3.74 3.88 3.99 4.08 4.16 4.22 4.28 4.33 4.37 4.42 4.52 4.65

11 3.11 3.45 3.65 3.79 3.89 3.98 4.05 4.11 4.16 4.21 4.25 4.29 4.39 4.52

12 3.05 3.39 3.58 3.71 3.81 3.89 3.96 4.02 4.07 4.12 4.16 4.19 4.29 4.41

13 3.01 3.33 3.52 3.65 3.74 3.82 3.89 3.94 3.99 4.04 4.08 4.11 4.20 4.32

14 2.98 3.29 3.47 3.59 3.69 3.76 3.83 3.88 3.93 3.97 4.01 4.05 4.13 4.24

15 2.95 3.25 3.43 3.55 3.64 3.71 3.78 3.83 3.88 3.92 3.95 3.99 4.07 4.18

16 2.92 3.22 3.39 3.51 3.60 3.67 3.73 3.78 3.83 3.87 3.91 3.94 4.02 4.13

17 2.90 3.19 3.36 3.47 3.56 3.63 3.69 3.74 3.79 3.83 3.86 3.90 3.98 4.08

18 2.88 3.17 3.33 3.44 3.53 3.60 3.66 3.71 3.75 3.79 3.83 3.86 3.94 4.04

19 2.86 3.15 3.31 3.42 3.50 3.57 3.63 3.68 3.72 3.76 3.79 3.83 3.90 4.00

20 2.85 3.13 3.29 3.40 3.48 3.55 3.60 3.65 3.69 3.73 3.77 3.80 3.87 3.97

24 2.80 3.07 3.22 3.32 3.40 3.47 3.52 3.57 3.61 3.64 3.68 3.70 3.78 3.87

30 2.75 3.01 3.15 3.25 3.33 3.39 3.44 3.49 3.52 3.56 3.59 3.62 3.69 3.78

40 2.70 2.95 3.09 3.19 3.26 3.32 3.37 3.41 3.44 3.48 3.51 3.53 3.60 3.68

60 2.66 2.90 3.03 3.12 3.19 3.25 3.29 3.33 3.37 3.40 3.42 3.45 3.51 3.59

120 2.62 2.85 2.97 3.06 3.12 3.18 3.22 3.26 3.29 3.32 3.35 3.37 3.43 3.51

∞ 2.58 2.79 2.92 3.00 3.06 3.11 3.15 3.19 3.22 3.25 3.27 3.29 3.35 3.42

Source: Reproduced from C.W. Dunnett, New tables for multiple comparisons with a control, Biometrics 20 (1964), with the

permission of the author and the editor.
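The two-sided Dunnett statistic behind Table F2 can likewise be checked by simulation. A minimal sketch, assuming equal replication for control and treatments and a pooled variance estimate on v degrees of freedom (function name illustrative, not from the source):

```python
import numpy as np

# Monte Carlo estimate of the two-sided Dunnett critical value: the
# (1 - alpha) quantile of max_i |(z_i - z_0)| / sqrt(2 s^2/sigma^2),
# where z_0, z_i are standardized control and treatment means and
# s^2/sigma^2 = chi2_v / v is the pooled-variance factor.
def dunnett_two_sided(q, v, alpha=0.05, n=200_000, seed=2):
    rng = np.random.default_rng(seed)
    z0 = rng.standard_normal(n)                  # standardized control mean
    zi = rng.standard_normal((n, q))             # standardized treatment means
    s2 = rng.chisquare(v, n) / v                 # s^2 / sigma^2
    t = np.abs(zi - z0[:, None]) / np.sqrt(2.0 * s2)[:, None]
    return np.quantile(t.max(axis=1), 1.0 - alpha)

# Table F2 (a = .05) lists 2.57 at v = 10, q = 2.
print(round(dunnett_two_sided(2, 10), 2))
```

The subtraction of the common control mean induces correlation 0.5 among the q comparisons, which is why these values sit below Bonferroni bounds but above the ordinary t quantile in column 1.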

TABLE G.-Critical values of t(a;p,v) for testing zero against nonzero dose levels

(p = number of nonzero levels)

a = .05

1 2 3 4 5 6 7 8 9 10

5 2.02 2.14 2.19 2.21 2.22 2.23 2.24 2.24 2.25 2.25

6 1.94 2.06 2.10 2.12 2.13 2.14 2.14 2.15 2.15 2.15

7 1.89 2.00 2.04 2.06 2.07 2.08 2.08 2.09 2.09 2.09

8 1.86 1.96 2.00 2.01 2.02 2.03 2.04 2.04 2.04 2.04

9 1.83 1.93 1.96 1.98 1.99 2.00 2.00 2.01 2.01 2.01

10 1.81 1.91 1.94 1.96 1.97 1.97 1.98 1.98 1.98 1.98

11 1.80 1.89 1.92 1.94 1.94 1.95 1.95 1.96 1.96 1.96

12 1.78 1.87 1.90 1.92 1.93 1.93 1.94 1.94 1.94 1.94

13 1.77 1.86 1.89 1.90 1.91 1.92 1.92 1.93 1.93 1.93

14 1.76 1.85 1.88 1.89 1.90 1.91 1.91 1.91 1.92 1.92

15 1.75 1.84 1.87 1.88 1.89 1.90 1.90 1.90 1.90 1.91

16 1.75 1.83 1.86 1.87 1.88 1.89 1.89 1.89 1.90 1.90

17 1.74 1.82 1.85 1.87 1.87 1.88 1.88 1.89 1.89 1.89

18 1.73 1.82 1.85 1.86 1.87 1.87 1.88 1.88 1.88 1.88

19 1.73 1.81 1.84 1.85 1.86 1.87 1.87 1.87 1.87 1.88

20 1.72 1.81 1.83 1.85 1.86 1.86 1.86 1.87 1.87 1.87

22 1.72 1.80 1.83 1.84 1.85 1.85 1.85 1.86 1.86 1.86

24 1.71 1.79 1.82 1.83 1.84 1.84 1.85 1.85 1.85 1.85

26 1.71 1.79 1.81 1.82 1.83 1.84 1.84 1.84 1.84 1.85

28 1.70 1.78 1.81 1.82 1.83 1.83 1.83 1.84 1.84 1.84

30 1.70 1.78 1.80 1.81 1.82 1.83 1.83 1.83 1.83 1.83

35 1.69 1.77 1.79 1.80 1.81 1.82 1.82 1.82 1.82 1.83

40 1.68 1.76 1.79 1.80 1.80 1.81 1.81 1.81 1.82 1.82

60 1.67 1.75 1.77 1.78 1.79 1.79 1.80 1.80 1.80 1.80

120 1.66 1.73 1.75 1.77 1.77 1.78 1.78 1.78 1.78 1.78

∞ 1.645 1.716 1.739 1.750 1.756 1.760 1.763 1.765 1.767 1.768

TABLE G.-Critical values of t(a;p,v) for testing zero against nonzero dose levels

(p = number of nonzero levels)-Continued

a = .01

1 2 3 4 5 6 7 8 9 10

5 3.36 3.50 3.55 3.57 3.59 3.60 3.60 3.61 3.61 3.61

6 3.14 3.26 3.29 3.31 3.32 3.33 3.34 3.34 3.34 3.35

7 3.00 3.10 3.13 3.15 3.16 3.16 3.17 3.17 3.17 3.17

8 2.90 2.99 3.01 3.03 3.04 3.04 3.05 3.05 3.05 3.05

9 2.82 2.90 2.93 2.94 2.95 2.95 2.96 2.96 2.96 2.96

10 2.76 2.84 2.86 2.88 2.88 2.89 2.89 2.89 2.90 2.90

11 2.72 2.79 2.81 2.82 2.83 2.83 2.84 2.84 2.84 2.84

12 2.68 2.75 2.77 2.78 2.79 2.79 2.79 2.80 2.80 2.80

13 2.65 2.72 2.74 2.75 2.75 2.76 2.76 2.76 2.76 2.76

14 2.62 2.69 2.71 2.72 2.72 2.73 2.73 2.73 2.73 2.73

15 2.60 2.66 2.68 2.69 2.70 2.70 2.70 2.71 2.71 2.71

16 2.58 2.64 2.66 2.67 2.68 2.68 2.68 2.68 2.68 2.69

17 2.57 2.63 2.64 2.65 2.66 2.66 2.66 2.66 2.67 2.67

18 2.55 2.61 2.63 2.64 2.64 2.64 2.65 2.65 2.65 2.65

19 2.54 2.60 2.61 2.62 2.63 2.63 2.63 2.63 2.63 2.63

20 2.53 2.58 2.60 2.61 2.61 2.62 2.62 2.62 2.62 2.62

22 2.51 2.56 2.58 2.59 2.59 2.59 2.60 2.60 2.60 2.60

24 2.49 2.55 2.56 2.57 2.57 2.57 2.58 2.58 2.58 2.58

26 2.48 2.53 2.55 2.55 2.56 2.56 2.56 2.56 2.56 2.56

28 2.47 2.52 2.53 2.54 2.54 2.55 2.55 2.55 2.55 2.55

30 2.46 2.51 2.52 2.53 2.53 2.54 2.54 2.54 2.54 2.54

35 2.44 2.49 2.50 2.51 2.51 2.51 2.51 2.52 2.52 2.52

40 2.42 2.47 2.48 2.49 2.49 2.50 2.50 2.50 2.50 2.50

60 2.39 2.43 2.45 2.45 2.46 2.46 2.46 2.46 2.46 2.46

120 2.36 2.40 2.41 2.42 2.42 2.42 2.42 2.42 2.42 2.43

∞ 2.326 2.366 2.377 2.382 2.385 2.386 2.387 2.388 2.389 2.389

Source: Reproduced from D. A. Williams, A test for differences between treatment means when several dose levels are compared with a zero dose control, Biometrics 27 (1971), with the permission of the author and the editor.

LIST OF REFERENCES

Alam, K., and Saxena, K. M. L. 1974. On Interval Estimation of a Ranked Parameter. Jour. Roy. Statis. Soc.

B 36: 277-283.

Anderson, V. L., and McLean, R. A. 1974. Design of Experiments: A Realistic Approach. Marcel Dekker,

Inc., New York.

Arvesen, J. N., and McCabe, G. P., Jr. 1975. Subset Selection Problems for Variances With Applications to

Regression Analysis. Jour. Amer. Statis. Assoc. 70: 166-170.

Balaam, L. N. 1963. Multiple Comparisons: A Sampling Experiment. Austral. Jour. Statis. 5: 62-85.

Bancroft, T. A. 1968. Topics in Intermediate Statistical Methods. V. 1. Iowa State Univ. Press, Ames.

Barlow, R. E., and Gupta, S. S. 1969. Selection Procedures for Restricted Families of Probability Distribu-

tions. Ann. Math. Statis. 40: 905-934.

Bartholomew, D. J. 1961. Ordered Tests in the Analysis of Variance. Biometrika 48: 325-332.

Bechhofer, R. E. 1968. Single-stage Procedures for Ranking Multiply-Classified Variances of Normal

Populations. Technometrics 10: 693-714.

___ 1969. Optimal Allocation of Observations When Comparing Several Treatments With a Control. In

Multivariate Analysis-II, P. R. Krishnaiah, ed., pp. 463-473. Academic Press, New York.

___ Kiefer, J., and Sobel, M. 1968. Sequential Identification and Ranking Procedures. Univ. Chicago

Press, Chicago.

Elmaghraby, S., and Morse, N. 1959. A Single-Sample Multiple-Decision Procedure for Selecting the

Multinomial Event Which Has the Highest Probability. Ann. Math. Statis. 30: 102-119.

Bernhardson, C. A. 1975. Type I Error Rates When Multiple Comparison Procedures Follow a Significant F

Test of ANOVA. Biometrics 31: 229-232.

Beyer, W. H., ed. 1968. Handbook of Tables for Probability and Statistics. 2d ed. The Chemical Rubber Co.,

Cleveland.

Bhargava, R. P., and Srivastava, M. S. 1973. On Tukey's Confidence Intervals for the Contrasts of Means for

the Intraclass Correlation Model. Jour. Roy. Statis. Soc. B 35: 147-152.

Bland, R. P., and Bratcher, T. L. 1968. A Bayesian Approach to the Problem of Ranking Binomial

Probabilities. SIAM Jour. Appl. Math. 16: 843-850.

Boardman, T. J., and Moffitt, D. R. 1971. Graphical Monte Carlo Type I Error Rates for Multiple Comparison

Procedures. Biometrics 27: 738-744.

Bohrer, R. 1967. On Sharpening Scheffe's Bounds. Jour. Roy. Statis. Soc. B 29: 110-114.

Box, G. E. P., and Hunter, J. S. 1958. Experimental Designs for Exploring Response Surfaces. In

Experimental Designs in Industry. Victor Chew, ed., pp. 138-190. John Wiley and Sons, Inc., New York.

Bradu, D., and Gabriel, K. R. 1974. Simultaneous Statistical Inference on Interactions in Two-Way Analysis

of Variance. Jour. Amer. Statis. Assoc. 69: 428-436.

Brown, M. B., and Forsythe, A. B. 1974. The ANOVA and Multiple Comparisons for Data With Heterogene-

ous Variances. Biometrics 30: 719-724.

Carmer, S. G., and Swanson, M. R. 1971. Detection of Differences Between Means: A Monte Carlo Study of

Five Pairwise Multiple Comparisons Procedures. Agron. Jour. 63: 940-945.

Carmer, S. G., and Swanson, M. R. 1973. Evaluation of Ten Pairwise Multiple Comparison Procedures by

Monte Carlo Methods. Jour. Amer. Statis Assoc. 68: 66-74.

Chew, V. 1962. Regression Techniques in the Analysis of Variance. Industrial Quality Control. v. 18, No. 12,

pp. 1-2.

Chiu, W. K. 1974a. Selecting the m Populations With Largest Means From k Normal Populations With

Unknown Variances. Austral. Jour. Statis. 16: 144-147.

___ 1974b. The Ranking of Means of Normal Populations for a Generalized Selection Goal. Biometrika 61:

579-584.

Cochran, W. G., and Cox, G. M. 1957. Experimental Designs, 2d ed. John Wiley and Co., New York.

Cornell, J. A. 1971. A Review of Multiple Comparison Procedures for Comparing a Set of k Population

Means. Soil Crop Sci. Soc. Fla. Proc. 31: 92-97.

Cox, D. R. 1965. A Remark on Multiple Comparison Methods. Technometrics 6: 223-224.


David, H. A. 1956. The Ranking of Variances in Normal Populations. Jour. Amer. Statis. Assoc. 51: 621-626.

__ 1962. Multiple Decisions and Multiple Comparisons, Chapter 9. In Contributions to Order Statistics.

Sarhan, A. E., and Greenberg, G. B., ed., John Wiley and Sons, Inc., pp. 144-162, New York.

Davies, O. L., ed. 1956. The Design and Analysis of Industrial Experiments. Oliver and Boyd, Edinburgh.

Dixon, D. O., and Duncan, D. B. 1975. Minimum Bayes Risk t-Intervals for Multiple Comparisons. Jour.

Amer. Statis. Assoc. 70: 822-831.

Dudewicz, E. J. 1976. Introduction to Statistics and Probability (Ch. 11, Ranking and Selection Procedures).

Holt, Rinehart and Winston, New York.

__ Ramberg, J. S., and Chen, H. J. 1975. New Tables for Multiple Comparisons With a Control

(Unknown Variances). Biometrische Zeitschrift 17: 13-26.

Duncan, D. B. 1955. Multiple Range and Multiple F Tests. Biometrics 11: 1-42.

___ 1957. Multiple Range Tests for Correlated and Heteroscedastic Means. Biometrics 13: 164-176.

_ 1965. A Bayesian Approach to Multiple Comparisons. Technometrics 7: 171-222.

___ 1970. Answer to Query #273, Multiple Comparison Methods for Comparing Regression Coefficients.

Biometrics 26: 141-143.

___ 1975. t Tests and Intervals for Comparisons Suggested by the Data. Biometrics 31: 339-359.

Dunn, O. J. 1961. Multiple Comparisons Among Means. Jour. Amer. Statis. Assoc. 56: 52-64.

__ 1964. Multiple Comparisons Using Rank Sums. Technometrics 6: 241-252.

___ and Massey, F. J., Jr. 1965. Estimation of Multiple Contrasts Using t-Distributions. .Jour. Amer.

Statis. Assoc. 60: 573-583.

Dunnett, C. W. 1955. A Multiple Comparisons Procedure for Comparing Several Treatments With a Control.

Jour. Amer. Statis. Assoc. 50: 1096-1121.

___ 1964. New Tables for Multiple Comparisons With a Control. Biometrics 20: 482-491.

___ 1970. Multiple Comparison Tests (Query #272). Biometrics 26: 139-141.

Eaton, M. L. 1967. Some Optimum Properties of Ranking Procedures. Ann. Math. Statis. 38: 124-137.

Einot, I., and Gabriel, K. R. 1975. A Study of the Powers of Several Methods of Multiple Comparisons. Jour.

Amer. Statis. Assoc. 70: 574-583.

Federer, W. T. 1955. Experimental Design, Theory and Application. Macmillan & Co., New York.

___ 1961. Experimental Error Rates. Amer. Soc. Hort. Sci. Proc. 78: 605-615.

Fienberg, S. E., and Holland, P. W. 1973. Simultaneous Estimation of Multinomial Cell Probabilities. Jour.

Amer. Statis. Assoc. 68: 683-691.

Fisher, R. A. 1935. The Design of Experiments. 1st ed. Oliver and Boyd, London.

___ and Yates, F. 1963. Statistical Tables for Biological, Agricultural, and Medical Research. 6th ed.

Oliver and Boyd Ltd., Edinburgh.

Gabriel, K. R. 1964. A Procedure for Testing the Homogeneity of All Sets of Means in Analysis of Variance.

Biometrics 20: 459-477.

___ 1966. Simultaneous Test Procedures for Multiple Comparisons on Categorical Data. Jour. Amer.

Statis. Assoc. 61: 1081-1096.

___ 1968. Simultaneous Test Procedures in Multivariate Analysis of Variance. Biometrika 55: 489-504.

___ 1969a. Simultaneous Test Procedures: Some Theory of Multiple Comparisons. Ann. Math. Statis.

40: 224-250.

Gabriel, K. R. 1969b. A Comparison of Some Methods of Simultaneous Inference in MANOVA. In Mul-

tivariate Analysis-II. P. R. Krishnaiah, ed., pp. 67-88. Academic Press, New York.

Games, P. A. 1971. Multiple Comparisons of Means. Amer. Ed. Res. Jour. 8: 531-565.

Gill, J. L. 1973. Current Status of Multiple Comparisons of Means in Designed Experiments. Jour. Dairy Sci.

56: 973-977.

Goodman, L. A. 1965. On Simultaneous Confidence Intervals for Multinomial Proportions. Technometrics 7:

247-254.

Gupta, S. S. 1963. On a Selection and Ranking Procedure for Gamma Populations. Ann. Inst. Statis. Math.

14: 199-216.

___ 1965. On Some Multiple Decision (Selection and Ranking) Rules. Technometrics 6: 225-245.

____ and Sobel, M. 1957. On a Statistic Which Arises in Selection and Ranking Problems. Ann. Math.

Statis. 28: 957-967.

___ and Sobel, M. 1958. On Selecting a Subset Which Contains All Populations Better Than a Standard.

Ann. Math. Statis. 29: 235-244.

___ and Sobel, M. 1960. Selecting a Subset Containing the Best of Several Binomial Populations. In

Contribution to Probability and Statistics, ch. 20. Stanford University Press, Stanford.

___ and Panchapakesan, S. 1971. Contributions to Multiple Decision (Subset Selection) Rules, Mul-

tivariate Distribution Theory and Order Statistics. Report No. 71-0218. Aerospace Res. Lab., AFSC,

USAF, Wright-Patterson AFB, Ohio.

___ and Panchapakesan, S. 1972. On a Class of Subset Selection Procedures. Ann. Math. Statis. 43:

814-822.

Hahn, G. J. 1970. Prediction Intervals for a Normal Distribution. Gen. Elec. Co. TIS Rpt. No. 71-C-038. Gen.

Elec. Co., Schenectady.

___ 1972. Simultaneous Prediction Intervals for a Regression Model. Technometrics 14: 203-214.

___ and Hendrickson, R. W. 1971. A Table of Percentage Points of the Distribution of the Largest

Absolute Value of k Student t Variates and its Applications. Biometrika 58: 323-332.

Halperin, M., and Greenhouse, S. W. 1958. A Note on Multiple Comparisons for Adjusted Means in the

Analysis of Covariance. Biometrika 45: 256-259.

Harter, H. L. 1957. Error Rates and Sample Sizes for Range Tests in Multiple Comparisons. Biometrics 13:

511-536.

___ 1960a. Critical Values for Duncan's New Multiple Range Tests. Biometrics 16: 671-685.

___ 1960b. Tables of Range and Studentized Range. Ann. Math. Statis. 31: 1122-1147.

___ 1961. Corrected Error Rates for Duncan's New Multiple Range Test. Biometrics 17: 321-324.

___ 1970. Order Statistics and Their Use in Testing and Estimation. v. 1. Tests Based on Range and

Studentized Range of Samples from a Normal Population. (Contains updated versions of Harter's Biomet-

rics (1957, 1960, 1961), Technometrics (1961), and AMS (1960) papers.) U.S. Govt. Print. Off., Washington,

D.C.

___ 1970. Multiple Comparison Procedures for Interactions. Amer. Statis. 24: 30-32.

Hartigan, J. A. 1975. Clustering Algorithms. John Wiley and Sons, Inc., New York.

Hartley, H. O. 1955. Some Recent Developments in Analysis of Variance. Communications on Pure and

Applied Mathematics 8: 47-72.

Hochberg, Y. 1975. An Extension of the T-Method to General Unbalanced Models of Fixed Effects. Jour.

Roy. Statis. Soc. B 37: 426-433.

___ 1976. A Modification of the T-Method of Multiple Comparisons for a One-Way Layout with Unequal

Variances. Jour. Amer. Statis. Assoc. 71: 200-203.

___ and Quade, D. 1975. One-Sided Simultaneous Confidence Bounds on Regression Surfaces With

Intercepts. Jour. Amer. Statis. Assoc. 70: 889-891.

Hoel, D., and Sobel, M. 1972. Comparisons of Sequential Procedures for Selecting the Best Binomial

Population. In Sixth Berkeley Symposium Math. Statis. Probability Proc., v. 4, pp. 53-69.

Hollander, M., and Wolfe, D. A. 1973. Nonparametric Statistical Methods. John Wiley and Sons, New York.

Jensen, D. R. 1976. The Comparison of Several Response Functions With a Standard. Biometrics 32: 51-59.

___ and Jones, M. Q. 1969. Simultaneous Confidence Intervals for Variances. Jour. Amer. Statis. Assoc.

64: 324-332.

John, P. W. M. 1971. Statistical Design and Analysis of Experiments. The Macmillan Company, New York.

Johnson, D. E. 1976. Some New Multiple Comparison Procedures for the Two-Way AOV Model With

Interaction. Biometrics 32: 929-934.

Jolliffe, I. T. 1975. Cluster Analysis as a Multiple Comparison Method. In Applied Statistics. R. P. Gupta, ed.

North-Holland Pub. Co., New York.

Kappenman, R. F. 1972. A Note on Selection of the Greatest Exceedance Probability. Technometrics 14:

219-222.

Keselman, H. J., Toothaker, L. E., and Shooter, M. 1975. An Evaluation of Two Unequal nk Forms of the

Tukey Multiple Comparison Statistic. Jour. Amer. Statis. Assoc. 70: 584-587.

Keuls, M. 1952. The Use of the "Studentized Range" in Connection With an Analysis of Variance. Euphytica

1: 112-122.

Kirk, R. E. 1968. Experimental Design Procedures for the Behavioral Sciences. Brooks/Cole, Belmont.


Kramer, C. Y. 1956. Extension of Multiple Range Tests to Group Means With Unequal Numbers of

Replications. Biometrics 12: 309-310.

___ 1957. Extension of Multiple Range Tests to Group Correlated Adjusted Means. Biometrics 13: 13-18.

___ 1972. A First Course in Methods of Multivariate Analysis. Va. Polytech. Inst. State Univ.,

Blacksburg.

Krishnaiah, P. R. 1969. Simultaneous Test Procedures Under General MANOVA Models. In Multivariate

Analysis-II, P. R. Krishnaiah, ed., pp. 121-144. Academic Press, New York.

Kuiper, F. K., and Fisher, L. 1975. A Monte Carlo Comparison of Six Clustering Procedures. Biometrics 31:

777-784.

Kurtz, T. E., Link, R. F., Tukey, J. W., and Wallace, D. L. 1965. Short-Cut Multiple Comparisons for

Balanced Single and Double Classifications: Part 1, Results. Technometrics 7: 95-169.

LeClerg, E. L. 1957. Mean Separation by the Functional Analysis of Variance and Multiple Comparisons.

U.S. Dept. Agr., Agr. Res. Serv., ARS 20-3. (Reprinted July 1970.)

Leonard, T. 1972. Bayesian Methods for Binomial Data. Biometrika 59: 581-589.

Levy, K. J. 1975a. An Empirical Comparison of Several Multiple Range Tests for Variances. Jour. Amer.

Statis. Assoc. 70: 180-183.

___ 1975b. A Multiple Range Procedure for Correlated Variances in a Two-Way Classification. Biomet-

rics 31: 243-246.

Little, T. M., and Hills, F. J. 1972. Statistical Methods in Agricultural Research. Univ. Calif., Agr. Ext.

Serv., Davis.

Marriott, F. H. C. 1971. Practical Problems in a Method of Cluster Analysis. Biometrics 27: 501-514.

McCool, J. I. 1975. Multiple Comparisons for Weibull Parameters. IEEE Transactions on Reliability R-24:

186-192.

McDonald, B. J., and Thompson, W. A., Jr. 1967. Rank Sum Multiple Comparisons in One- and Two-Way

Classifications. Biometrika 54: 487-497.

Mead, R., and Pike, D. J. 1975. A Review of Response Surface Methodology From a Biometrics Viewpoint.

Biometrics 31: 803-852.

Miller, R. G., Jr. 1966. Simultaneous Statistical Inference. McGraw-Hill Book Co., New York.

Morrison, D. F. 1967. Multivariate Statistical Methods. McGraw-Hill Book Co., New York.

Myers, R. H. 1971. Response Surface Methodology. Allyn and Bacon. Inc., Boston.

Nair, K. R. 1948. The Studentized Form of the Extreme Mean Square Test in the Analysis of Variance.

Biometrika 35: 16-31.

Newman, D. 1939. The Distribution of the Range in Samples From a Normal Population, Expressed in Terms

of an Independent Estimate of Standard Deviation. Biometrika 31: 20-30.

Ofosu, J. B. 1975. A Two-Stage Minimax Procedure for Selecting the Normal Population With the Smallest

Variance. Jour. Amer. Statis. Assoc. 70: 171-174.

O'Neill, R., and Wetherill, G. B. 1971. The Present State of Multiple Comparison Methods. Jour. Roy. Statis.

Soc. B 33: 218-250.

Patel, J. K. 1976. Ranking and Selection of IFR Populations Based on Means. Jour. Amer. Statis. Assoc. 71:

143-146.

Paulson, E. 1962. A Sequential Procedure for Comparing Several Experimental Categories With a Standard

or Control. Ann. Math. Statis. 33: 438-443.

___ 1964. A Sequential Procedure for Selecting the Population With the Largest Mean From K Normal

Populations. Ann. Math. Statis. 35: 174-180.

___ 1967. Sequential Procedures for Selecting the Best One of Several Binomial Populations. Ann. Math.

Statis. 38: 117-123.

Pearson, E. S., and Hartley, H. O. 1966. Biometrika Tables for Statisticians. V. 1, 3d ed. Cambridge Univ.

Press, London.

Peng, K. C. 1967. The Design and Analysis of Scientific Experiments. Addison-Wesley Pub. Co., Inc.,

Reading.

Petrinovich, L. F., and Hardyck, C. D. 1969. Error Rates for Multiple Comparison Methods. Psychol. Bul.

71: 43-54.

Puri, M. L., and Puri, P. S. 1969. Multiple Decision Procedures Based on Ranks for Certain Problems in

Analysis of Variance. Ann. Math. Statis. 40: 619-632.

Ramachandran, K. V. 1956. Contributions to Simultaneous Confidence Interval Estimation. Biometrics 12:

51-56.

Reading, J. C. 1975. A Multiple Comparison Procedure for Classifying All Pairs out of K Means as Close or

Distant. Jour. Amer. Statis. Assoc. 70: 832-838.

Reiersol, O. 1961. Linear and Non-Linear Multiple Comparisons in Logit Analysis. Biometrika 48: 359-365.

Corrigenda, Biometrika 49: 284.

Rhyne, A. L., and Steel, R. G. D. 1965. Tables for a Treatments Versus Control Multiple Comparisons Sign

Test. Technometrics 7: 293-306.

___ and Steel, R. G. D. 1967. A Multiple Comparisons Sign Test: All Pairs of Treatments. Biometrics 23:

539-549.

Rizvi, M. H., Sobel, M., and Woodworth, G. C. 1968. Nonparametric Ranking Procedures for Comparisons

With a Control. Ann. Math. Statis. 39: 2075-2093.

___ 1971. Some Selection Problems Involving Folded Normal Distributions. Technometrics 13: 355-369.

Robbins, H., Sobel, M., and Starr, N. 1968. A Sequential Procedure for Selecting the Largest of K Means.

Ann. Math. Statis. 39: 88-92.

Robson, D. S. 1961. Multiple Comparisons With a Control in Balanced Incomplete Block Designs.

Technometrics 3: 103-105.

Ryan, T. A. 1959. Multiple Comparisons in Psychological Research. Psychol. Bul. 56: 26-47.

___ 1960. Significance Tests for Multiple Comparison of Proportions, Variances, and Other Statistics.

Psychol. Bul. 57: 318-328.

Ryan, T. A., Jr., and Antle, C. E. 1976. A Note on Gupta's Selection Procedure. Jour. Amer. Statis. Assoc.

71: 140-142,

Santner, T. J. 1975. A Restricted Subset Selection Approach to Ranking and Selection Problems. Ann. Stat.

3: 334-349.

Saxena, K. M. L. 1976. A Single-Sample Procedure for Estimation of the Largest Mean. Jour. Amer. Statis.

Assoc. 71: 147-148.

Schafer, W. D., and MacReady, G. B. 1975. A Modification of the Bonferroni Procedure on Contrasts Which

Are Grouped Into Internally Independent Sets. Biometrics 31: 227-228.

Scheffe, H. 1953. A Method for Judging All Contrasts in the Analysis of Variance. Biometrika 40: 87-104.

___ 1959. The Analysis of Variance. John Wiley and Sons, Inc., New York.

Scott, A. J., and Knott, M. 1974. A Cluster Analysis Method for Grouping Means in the Analysis of Variance.

Biometrics 30: 507-512.

Seeger, P. 1966. Variance Analysis of Complete Designs. Almqvist and Wiksell, Stockholm.

Sen, P. K. 1969. A Generalization of the T-Method of Multiple Comparisons. Jour. Amer. Statis. Assoc. 64:

290-295.

___ 1969. On Nonparametric T-Method of Multiple Comparisons for Randomized Blocks. Ann. Inst.

Statis. Math. 21: 329-333.

Sherman, E. 1965. A Note on Multiple Comparisons Using Rank Sums. Technometrics 6: 255-256.

Siotani, M. 1964. Interval Estimates for Linear Combinations of Means. Jour. Amer. Statis. Assoc. 59:

1141-1164.

Slivka, J. 1970. A One-Sided Nonparametric Multiple Comparison Control Percentile Test: Treatments

Versus Control. Biometrika 57: 431-438.

Sobel, M. 1969. Selecting a Subset Containing at Least One of the T Best Populations. In Multivariate

Analysis-II. P. R. Krishnaiah, ed. pp. 515-539. Academic Press, New York.

___ and Tong, Y. L. 1971. Optimal Allocation of Observations for Partitioning a Set of Normal Popula-

tions in Comparison With a Control. Biometrika 58: 177-181.

Spjøtvoll, E. 1972. Multiple Comparisons of Regression Functions. Ann. Math. Statis. 43: 1076-1088.

___ 1972. Joint Confidence Intervals for All Linear Functions of Means in the One-Way Layout With

Unknown Group Variances. Biometrika 59: 683-685.

___ and Stoline, M. R. 1973. An Extension of the T-Method of Multiple Comparison to Include the Cases

With Unequal Sample Sizes. Jour. Amer. Statis. Assoc. 68: 975-978.

Steel, R. G. D. 1959. A Multiple Comparison Rank Sum Test: Treatments Versus Control. Biometrics 15:

560-572.

___ 1961. Some Rank Sum Multiple Comparisons Tests. Biometrics 17: 539-552.

___ and Torrie, J. H. 1960. Principles and Procedures of Statistics. McGraw-Hill, New York.

Tarone, R. E. 1976. Simultaneous Confidence Ellipsoids in the General Linear Model. Technometrics 18:

85-87.

Taylor, R. J., and David, H. A. 1962. A Multi-Stage Procedure for the Selection of the Best of Several

Binomial Populations. Jour. Amer. Statis. Assoc. 57: 785-796.

Thigpen, C. C., and Paulson, A. S. 1974. A Multiple Range Test for Analysis of Covariance. Biometrika 61:

475-484.

Thomas, D. A. H. 1973. Multiple Comparisons Among Means: A Review. Statistician 22: 16-42.

___ 1974. Error Rates in Multiple Comparisons Among Means: Results of a Simulation Exercise. Jour.

Roy. Statis. Soc. C 23: 284-294.

Tobach, E., Smith, M., Rose, G., and Richter, D. 1967. A Table for Rank Sum Multiple Paired Comparisons.

Technometrics 9: 561-567.

Tong, Y. L. 1970. Multi-Stage Interval Estimation of the Largest Mean of K Normal Populations. Jour. Roy.

Statis. Soc. B 32: 272-277.

Trawinski, B. J., and David, H. A. 1963. Selection of the Best Treatment in a Paired-Comparison Experi-

ment. Ann. Math. Statis. 34: 75-94.

Tukey, J. W. 1949. Comparing Individual Means in the Analysis of Variance. Biometrics 5: 99-114.

___ 1951. Quick-and-Dirty Methods in Statistics, Part 2: Simple Analyses for Standard Designs. Amer.

Soc. Qual. Control, 5th Ann. Conv. Trans. pp. 189-197.

___ 1953a. Some Selected Quick and Easy Methods of Statistical Analysis. Trans. N.Y. Acad. Sci. (2) 16:

88-97.

___ 1953b. The Problem of Multiple Comparisons. Unpublished Dittoed Notes, Princeton Univ., 396 pp.

___ 1960. Conclusions vs. Decisions. Technometrics 2: 423-433.

Ury, H. K. 1976. A Comparison of Four Procedures for Multiple Comparisons Among Means (Pairwise

Contrasts) for Arbitrary Sample Sizes. Technometrics 18: 89-97.

Verhagen, A. M. W. 1963. The "Caution Level" in Multiple Tests of Significance. Austral. Jour. Statis. 5:

41-48.

Wackerly, D. D. 1975. An Alternative Approach to the Problem of Selecting the Best of K Populations.

Technical Report #91. Univ. Fla. Dept. Statis., Gainesville.

Waldo, D. R. 1976. An Evaluation of Multiple Comparison Procedures. Jour. Animal Sci. 42: 539-544.

Waller, R. A., and Duncan, D. B. 1969 and 1972. A Bayes Rule for the Symmetric Multiple Comparison

Problem. Jour. Amer. Statis. Assoc. 64: 1484-1503, and Corrigenda 67: 253-255.

Wetherill, G. B., and Ofosu, J. B. 1974. Selection of the Best of K Normal Populations. Jour. Roy. Statis.

Soc. C 23: 253-277.

Williams, D. A. 1971. A Test for Differences Between Treatment Means When Several Dose Levels Are

Compared With a Zero Dose Control. Biometrics 27: 103-117.

___ 1972. The Comparison of Several Dose Levels With a Zero Dose Control. Biometrics 28: 519-531.

Wynn, H. P., and Bloomfield, P. 1971. Simultaneous Confidence Bands in Regression Analysis. Jour. Roy.

Statis. Soc. B 33: 202-217.


* U.S. GOVERNMENT PRINTING OFFICE: 1978 0-280-931/SEA-5
