Asymptotic nonparametric confidence intervals for the ratio of scale parameters in balanced one-way random effects models


Material Information

Title:
Asymptotic nonparametric confidence intervals for the ratio of scale parameters in balanced one-way random effects models
Uncontrolled:
One-way random effects models
Random effects models
Physical Description:
v, 127 leaves : ; 28 cm.
Language:
English
Creator:
Groggel, David John, 1956-
Publication Date:

Subjects

Subjects / Keywords:
Confidence intervals   ( lcsh )
Statistical hypothesis testing -- Asymptotic theory   ( lcsh )
Genre:
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 1983.
Bibliography:
Includes bibliographical references (leaves 125-126).
Statement of Responsibility:
by David John Groggel.
General Note:
Typescript.
General Note:
Vita.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 000447022
notis - ACK8310
oclc - 11386637
System ID:
AA00003431:00001

Full Text
















ASYMPTOTIC NONPARAMETRIC CONFIDENCE INTERVALS
FOR THE RATIO OF SCALE PARAMETERS IN BALANCED
ONE-WAY RANDOM EFFECTS MODELS
















BY

DAVID JOHN GROGGEL


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN
PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA


1983
















ACKNOWLEDGMENTS


I would like to express my deepest appreciation to all those who

have assisted me during the preparation of this dissertation and

throughout my time in graduate school. In particular I wish to thank

Dr. Dennis Wackerly, chairman of my research committee, for his help

with this research and for his guidance and kindness as my advisor. I

would also like to thank Dr. P. V. Rao, for his frequent contributions

to this research, and Dr. Richard Scheaffer and the entire Department of

Statistics, for the encouragement, support, and friendships they have

provided.

Further, I would like to thank my parents, Mr. and Mrs. Richard

Groggel, for their ever-present love and support, and my in-laws, Mr.

and Mrs. Warren Rubin, for their love and support over the past five

years. Finally, special thanks go to my wife Kathy for her help with

the typing of this dissertation but especially for the love, patience,

and encouragement she constantly offers.



















TABLE OF CONTENTS


ACKNOWLEDGMENTS

ABSTRACT

CHAPTER

ONE    INTRODUCTION

TWO    CONFIDENCE INTERVALS USING U-STATISTICS

       2.1  General Theory of U-Statistics
       2.2  Confidence Intervals for the Intraclass Correlation Coefficient

THREE  CONFIDENCE INTERVALS USING MODIFIED ANSARI-BRADLEY STATISTICS

       3.1  Model and Formation of Pseudo-Samples
       3.2  Asymptotic Distribution of the Ansari-Bradley Statistic Using Pseudo-Samples
       3.3  Asymptotic Confidence Intervals Using the Modified Ansari-Bradley Statistic

FOUR   MONTE CARLO STUDY

FIVE   SUMMARY

APPENDICES

A  VARIANCES AND COVARIANCE OF U1 AND U2

B  THE RELATIONSHIP BETWEEN U1, U2, MST, AND MSE

C  A CONSISTENT ESTIMATE FOR $\sigma_T^2$

D  DERIVATION OF ENDPOINTS IN CHI-SQUARE PROCEDURE

E  C AND C TERMS

REFERENCES

BIOGRAPHICAL SKETCH

















Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy


ASYMPTOTIC NONPARAMETRIC CONFIDENCE INTERVALS
FOR THE RATIO OF SCALE PARAMETERS IN BALANCED
ONE-WAY RANDOM EFFECTS MODELS

By

David John Groggel

August, 1983

Chairman: Dennis D. Wackerly
Major Department: Statistics


This dissertation examines the problem of estimating the proportion

of variability in the responses from a balanced one-way random effects

model that can be attributed to the treatment effect. Methods are

derived that do not require the classical assumptions of normality of

both the treatment and error effects.

Asymptotic confidence intervals are derived from functions of

U-statistics that possess either an asymptotic normal distribution or

asymptotic chi-square distribution. These methods require the effects

to have continuous distributions with zero means and finite fourth

moments.

Asymptotic confidence intervals are also derived based on the

asymptotic normality of modified versions of the Ansari-Bradley

two-sample scale statistic. Pseudo-samples of observations that are

asymptotically equivalent to samples of the effects are formed using

either sample means or sample medians. The Ansari-Bradley statistic











calculated using these samples is shown to have an asymptotic normal

distribution and intervals are formed following a procedure of Sen

([1966] Annals of Mathematical Statistics 37, 1759-1770). In forming

these intervals a representation of the Ansari-Bradley statistic

developed by Bhattacharyya ([1977] Journal of the American Statistical

Association 72, 459-463) is used. The construction of these intervals

requires the effects to have continuous distributions that are symmetric

about zero and that differ only by a scale parameter. Other assumptions

on the distributions of the effects are needed depending on whether

sample means or sample medians are used to form the pseudo-samples.

A Monte Carlo study was performed to compare intervals formed by

these methods with the classical normal theory intervals and intervals

based on jackknifed U-statistics as derived by Arvesen ([1969] Annals of

Mathematical Statistics 40, 2076-2100). The study shows that the

intervals based on functions of U-statistics are poor while the

intervals based on the modified Ansari-Bradley statistics are nearly

always comparable to and, in some cases, superior to the normal theory

and Arvesen intervals.

















CHAPTER ONE
INTRODUCTION


The balanced one-way random effects model,


    $Z_{ij} = \mu + a_i + \epsilon_{ij}$,   i = 1,2,...,k,   j = 1,2,...,n,

has been studied and analyzed by many people. In this model, $\mu$ is an
unknown constant and the $\epsilon_{ij}$ and $a_i$ are independent samples of
independent observations from continuous populations. The majority of
the research concerning this model has been done under the classical
assumptions that the $\epsilon_{ij}$ (commonly called the error effects) and $a_i$
(commonly called the treatment effects) have normal distributions with
zero means and variances $\sigma_\epsilon^2$ and $\sigma_a^2$ respectively. In this dissertation,

the model is studied under more general assumptions concerning the

distributions of these effects.
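
As a rough illustration of the model under non-normal effects, the following Python sketch generates a balanced k-by-n layout. The function name, the Laplace treatment effects, and the uniform errors are assumptions chosen only for the example; they are not distributions taken from the dissertation.

```python
import random

def simulate_one_way(k, n, mu=0.0, draw_a=None, draw_e=None, seed=None):
    """Return a k-by-n list of lists with Z[i][j] = mu + a_i + e_ij."""
    rng = random.Random(seed)
    # Hypothetical defaults: Laplace (double-exponential) treatment effects
    # and uniform(-1, 1) errors, both symmetric about zero and non-normal.
    if draw_a is None:
        draw_a = lambda r: r.expovariate(1.0) * r.choice((-1.0, 1.0))
    if draw_e is None:
        draw_e = lambda r: r.uniform(-1.0, 1.0)
    data = []
    for _ in range(k):
        a_i = draw_a(rng)  # one treatment effect shared by the whole row
        data.append([mu + a_i + draw_e(rng) for _ in range(n)])
    return data

Z = simulate_one_way(k=5, n=4, mu=10.0, seed=1)
```

Each row of `Z` shares a single treatment effect, so the spread within a row reflects only the error effects.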

In the classical case, the test of hypothesis that is usually of
interest is a test concerning the magnitude of $\sigma_a^2$. The test is usually
of the form

(1.1)    $H_0: \sigma_a^2 = 0$        $H_a: \sigma_a^2 > 0$

or

(1.2)    $H_0: \sigma_a^2 \le c\sigma_\epsilon^2$        $H_a: \sigma_a^2 > c\sigma_\epsilon^2$,


where c is some specified constant.









In some instances, particularly applications in genetics and the

social sciences, an estimate of $\sigma_a^2/(\sigma_a^2 + \sigma_\epsilon^2)$ is desired. This quantity
is commonly known as the intraclass correlation coefficient and is
denoted by $\rho$. In most cases, an estimate of $\rho$ is more informative than
testing the hypotheses in either (1.1) or (1.2). An estimate provides
information about the actual relative magnitudes of the variance
components rather than just a conclusion that $H_0$ should or should not be

rejected.

As an example of where an estimate of $\rho$ is useful, consider the

problem described in Snedecor and Cochran (1967, Example 10.13.1). In a

study involving Poland China swine, two boars were taken from each of

four litters. All of the litters had the same sire and all eight boars

were fed a standard ration from weaning to a weight of about 225

pounds. The response of interest was the average daily weight gain.
The component $\sigma_a^2$ represents the variability in weight gain that was due
to the genetic differences in the litters while $\sigma_\epsilon^2$ represents the
variability in weight gain due to non-genetic factors. The ratio
$\sigma_a^2/(\sigma_a^2 + \sigma_\epsilon^2)$ is the proportion of the total variability that can be

attributed to the genetic differences in the litters.

Scheffé (1959), as well as many others, describes the procedures
for testing the hypotheses in (1.1) and (1.2) as well as the form of an
exact confidence interval for $\rho$. Both the test procedure and confidence
interval construction use the mean squares from the usual analysis of
variance table and percentiles of the F-distribution.

Scheffé shows that these procedures are not robust if the
assumptions of the normality of the effects are violated. It is
therefore desirable to have procedures for performing tests
and constructing confidence intervals for the parameters associated with
the balanced one-way random effects model that can be used when the
normality of the effects is in doubt.

The analysis of the random effects model without the normality

assumptions has not been researched nearly as much as the classical

case. Govindarajulu and Deshpandé (1972) studied the case in which the
$\epsilon_{ij}$ are independent and identically distributed with continuous
distribution function F(x) and the $a_i$ are independent with distribution
functions $G_i(x)$. In this case, it is not necessary that the
expectations of the $a_i$ all be equal. Assuming, without loss of
generality, that $\mu = 0$, the authors examined the hypotheses

(1.3)    $H_0$: $G_i(x) = 0$ if $x < 0$ and $G_i(x) = 1$ if $x \ge 0$, for every i
         $H_a$: $G_i(x)$ is nontrivial for at least one i

and derived the locally most powerful rank test by considering the
alternative hypothesis

         $H_\Delta: Z_{ij} = \Delta a_i + \epsilon_{ij}$ for small positive $\Delta$.

The hypotheses in (1.3) are analogous to the hypotheses in (1.1) since
in both cases the null hypothesis states that the $a_i$ do not contribute
to the response variable $Z_{ij}$ and the alternative hypothesis states that
at least one $a_i$ contributes to the response. Govindarajulu (1975)
looked at the same hypotheses under the more restrictive assumption that
$G_i(x) = G(x)$ for every i. Both of these papers considered the
unbalanced one-way random effects model, that is, $j = 1,2,\ldots,n_i$ for
i = 1,2,...,k.









For the balanced one-way model, Arvesen (1969) and Arvesen and

Schmitz (1970) used jackknifing techniques on appropriate U-statistics

to develop procedures for testing hypotheses and forming confidence

intervals for functions of $\sigma_a^2$ and $\sigma_\epsilon^2$. This work was later extended to
the unbalanced model by Arvesen and Layard (1975). The procedures
require the distributions of the $\epsilon_{ij}$ and $a_i$ to be continuous with zero

means. The procedures also assume finite fourth moments in the balanced

model and finite moments of at least order six in the unbalanced model.

Shoemaker (1981) examined some estimation and testing problems

using the concept of mid-variances in the balanced model where the

effects are assumed to have continuous, symmetric distributions.

In Chapter Two of this dissertation, two methods of constructing

asymptotic confidence intervals for $\rho$ based on the theory of
U-statistics are described. Section 2.1 gives a brief review of some of
the basic results concerning U-statistics. In Section 2.2 a method
using U-statistics similar to those used by Arvesen (1969) is
described. This method was developed before the work of Arvesen was
known to exist. However, the confidence coefficient used for the
intervals in Section 2.2 is derived in a different way from that
presented by Arvesen. As in Arvesen's work, the method in Section 2.2
requires the $\epsilon_{ij}$ and $a_i$ to be independent random samples of independent
observations from continuous distributions with zero means and finite
fourth moments. Also in Section 2.2, an asymptotic confidence interval
for $\rho$ is derived using a quadratic form (involving two U-statistics) to

construct a statistic with an asymptotic chi-square distribution with

two degrees of freedom.









In Chapter Three we work with scale parameters rather than

variances. The distribution of a random variable X is said to have a
scale parameter $\delta$ ($0 < \delta < \infty$) if X has a distribution function of the
form $F(x/\delta)$ where F(x) is the distribution function of a random variable
Y and the form of F(x) does not depend on $\delta$. In other words,
$X/\delta$ has the same distribution as Y. The advantage of working with scale
parameters is that they may exist for random variables for which
variances (and thus standard deviations) do not exist. For those random
variables where both a scale parameter and a standard deviation exist, a
scale parameter is always a constant multiple of the standard deviation.

In Chapter Three the $\epsilon_{ij}$ and $a_i$ are assumed to be independent
samples of independent observations from continuous distributions with
distribution functions $F(x) = D(x/\delta_1)$ and $G(x) = D(x/\delta_2)$ respectively.
That is, $\delta_1$ is a scale parameter for the $\epsilon_{ij}$ and $\delta_2$ is a scale parameter
for the $a_i$. It is also assumed that both distributions are symmetric
about zero with densities that are bounded and have a bounded first
derivative. In Section 3.1 the Ansari-Bradley two-sample scale
statistic (Ansari and Bradley 1960) is described. Modified versions of
the Ansari-Bradley statistic, one involving the use of sample means and
another involving sample medians, are shown to have asymptotic normal
distributions in Section 3.2. In Section 3.3 these statistics are used
to form asymptotic confidence intervals for $\delta_2^2/(\delta_1^2 + \delta_2^2)$. In those
situations where both scale parameters and variances of the effects
exist, this quantity is numerically equivalent to $\rho$.

In Chapter Four we present a summary of a Monte Carlo study that

compares the lengths and observed confidence coefficients of intervals

constructed using normal theory as in Scheffé (1959), Arvesen's (1969)
U-statistics, U-statistics as described in Chapter Two, and the modified

Ansari-Bradley statistics as described in Chapter Three. Chapter Five

contains a summary.

Throughout this dissertation we use the symbol $\equiv$ to denote equal by
definition. Also, unless otherwise specified, sums involving i are from
1 to k, sums involving j are from 1 to n, and integrals are over the
region $(-\infty,\infty)$.
















CHAPTER TWO
CONFIDENCE INTERVALS USING U-STATISTICS


2.1 General Theory of U-Statistics


The theory of U-statistics was first developed by Hoeffding (1948).

For the convenience of the reader, in this section we state without

proof some results and theorems due to Hoeffding which we will utilize

in the discussions which follow.

Let $Z_1,Z_2,\ldots,Z_m$ be independent, identically distributed random
vectors and let $h(Z_1,Z_2,\ldots,Z_s)$ be a function of $s$ ($\le m$)
vectors. A U-statistic has the form

    $U_m = U(Z_1,Z_2,\ldots,Z_m) = \binom{m}{s}^{-1} \sum_{v \in V} h(Z_{v_1},Z_{v_2},\ldots,Z_{v_s})$,

where V is the set of all distinct subsets of integers, $(v_1,v_2,\ldots,v_s)$,
taken without replacement and without regard to order from (1,2,...,m).
It is easily seen that $U_m$ is an unbiased estimate of the parameter
$A = E[h(Z_1,Z_2,\ldots,Z_s)]$. The function h is assumed to be symmetric in

its arguments (it can be made so if it is not) and is known as the

kernel. The value of s is the smallest possible sample size for which

an unbiased estimate of A exists and is referred to as the degree of the

kernel.
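
To make the definition concrete, here is a minimal Python sketch that averages a symmetric kernel over all size-s subsets of a sample. The function names are illustrative only. The kernel h(x, y) = (x - y)^2 / 2 of degree two is a standard example whose U-statistic is the usual unbiased sample variance.

```python
from itertools import combinations
from math import comb

def u_statistic(sample, kernel, s):
    """Average a symmetric kernel of degree s over all size-s subsets."""
    m = len(sample)
    if s > m:
        raise ValueError("need at least s observations")
    total = sum(kernel(*subset) for subset in combinations(sample, s))
    return total / comb(m, s)

def half_squared_difference(x, y):
    # Degree-2 kernel whose expectation is the population variance.
    return 0.5 * (x - y) ** 2

data = [2.0, 4.0, 4.0, 5.0, 7.0]
u_var = u_statistic(data, half_squared_difference, s=2)
```

Enumerating all subsets costs on the order of m-choose-s kernel evaluations, which is fine for small examples but not for large samples.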

Define


    $h_c(z_1,z_2,\ldots,z_c) = E[h(Z_1,Z_2,\ldots,Z_s) \mid Z_1 = z_1, Z_2 = z_2, \ldots, Z_c = z_c]$

and

(2.1.1)    $\zeta_c = E\{[h_c(Z_1,Z_2,\ldots,Z_c)]^2\} - A^2$

for c = 1,2,...,s. The quantity $\zeta_c$ can also be written as

(2.1.2)    $\zeta_c = \mathrm{Cov}[h(Z_{v_1},Z_{v_2},\ldots,Z_{v_s}),\, h(Z_{v'_1},Z_{v'_2},\ldots,Z_{v'_s})]$,

where $v = (v_1,v_2,\ldots,v_s)$ and $v' = (v'_1,v'_2,\ldots,v'_s)$ are subsets of the
integers (1,2,...,m) with exactly c integers in common.

The variance and asymptotic distribution of a U-statistic are given

in the following results and theorem.

Result 2.1.1 (Hoeffding 1948, Equation 5.13). If
$E[h^2(Z_1,Z_2,\ldots,Z_s)] < \infty$, then the variance of $U_m$ is

    $\mathrm{Var}(U_m) = \binom{m}{s}^{-1} \sum_{c=1}^{s} \binom{s}{c}\binom{m-s}{s-c}\,\zeta_c$.

Result 2.1.2 (Hoeffding 1948, Equation 5.23). If
$E[h^2(Z_1,Z_2,\ldots,Z_s)] < \infty$, then $\lim_{m\to\infty} m\,\mathrm{Var}(U_m) = s^2\zeta_1$.

Theorem 2.1.1 (Hoeffding 1948, Theorem 7.1). If
$E[h^2(Z_1,Z_2,\ldots,Z_s)] < \infty$ and $\zeta_1 > 0$, then
$m^{1/2}(U_m - A) \xrightarrow{d} N(0, s^2\zeta_1)$ as $m \to \infty$.

For w = 1,2,...,g, let $U_m^{(w)}$ be U-statistics all defined on the same
m vectors with degrees s(w), kernels $h_w$, and expectations $A^{(w)}$. For any
two of these U-statistics, say $U_m^{(1)}$ and $U_m^{(2)}$, define

(2.1.3)    $\zeta_c^{(1,2)} = E[h_1(Z_{v_{11}},Z_{v_{12}},\ldots,Z_{v_{1s(1)}})\, h_2(Z_{v_{21}},Z_{v_{22}},\ldots,Z_{v_{2s(2)}})] - A^{(1)}A^{(2)}$,

where $v_1 = (v_{11},v_{12},\ldots,v_{1s(1)})$ and $v_2 = (v_{21},v_{22},\ldots,v_{2s(2)})$ are
subsets of the integers (1,2,...,m) with exactly c integers in common.

The covariance of these U-statistics and their joint asymptotic

distribution are described in the following results and theorem.

Result 2.1.3 (Hoeffding 1948, Equation 6.5). If
$E(h_1^2) < \infty$, $E(h_2^2) < \infty$, and $s(2) \le s(1)$, then the covariance
of $U_m^{(1)}$ and $U_m^{(2)}$ is such that

    $\mathrm{Cov}[U_m^{(1)},U_m^{(2)}] = \binom{m}{s(2)}^{-1} \sum_{c=1}^{s(2)} \binom{s(1)}{c}\binom{m-s(1)}{s(2)-c}\,\zeta_c^{(1,2)}$.

Result 2.1.4 (Hoeffding 1948, Page 304). Under the same conditions
as in Result 2.1.3, $\lim_{m\to\infty} m\,\mathrm{Cov}[U_m^{(1)},U_m^{(2)}] = s(1)s(2)\zeta_1^{(1,2)}$.

Theorem 2.1.2 (Hoeffding 1948, Theorem 7.1). If $E(h_w^2) < \infty$ for
w = 1,2,...,g, then

    $m^{1/2}([U_m^{(1)} - A^{(1)}],[U_m^{(2)} - A^{(2)}],\ldots,[U_m^{(g)} - A^{(g)}])' \xrightarrow{d} N_g(0,\mathbf{A})$ as $m \to \infty$,

where $\mathbf{A}$ is a g by g matrix with elements $A_{ij} = s(i)s(j)\zeta_1^{(i,j)}$.

In a later paper Hoeffding proved the following theorem concerning
the almost sure convergence of a U-statistic.

Theorem 2.1.3 (Hoeffding 1961). If $E|h(Z_1,Z_2,\ldots,Z_s)| < \infty$,
then $U_m \xrightarrow{a.s.} A$ as $m \to \infty$.


2.2 Confidence Intervals for the Intraclass Correlation Coefficient


Consider the balanced one-way random effects model


    $Z_{ij} = \mu + a_i + \epsilon_{ij}$,   i = 1,2,...,k,   j = 1,2,...,n,

where the $\epsilon_{ij}$ are independent random variables with a continuous
distribution with mean zero and finite fourth moment and the $a_i$ are
independent random variables with a continuous distribution (not
necessarily the same family as the distribution of the $\epsilon_{ij}$) which also
has mean zero and finite fourth moment. The $\epsilon_{ij}$ and $a_i$ are assumed to
be independent of each other and the variances of the two distributions
are denoted by $\sigma_\epsilon^2$ and $\sigma_a^2$ respectively. The parameter $\mu$ is an unknown
constant.

In the work that follows in this section, the number of

observations per treatment, n, remains fixed as the number of

treatments, k, increases to infinity. Due to the structure of the model

this is sufficient to obtain, at least theoretically, unlimited

knowledge about both the $\epsilon_{ij}$ and $a_i$.

Let $Z_i = (Z_{i1},Z_{i2},\ldots,Z_{in})'$, for i = 1,2,...,k, be k independent
and identically distributed vectors. On these k vectors we define two
U-statistics,

(2.2.1)    $U_1 = k^{-1}\sum_i h_1(Z_i)$,
           where $h_1(Z_i) = \binom{n}{2}^{-1} \sum\sum_{j<j'} (Z_{ij} - Z_{ij'})^2$,

and

(2.2.2)    $U_2 = \binom{k}{2}^{-1} \sum\sum_{i<i'} h_2(Z_i,Z_{i'})$,
           where $h_2(Z_i,Z_{i'}) = n^{-2}\sum_j\sum_{j'} (Z_{ij} - Z_{i'j'})^2$.
JJ


These U-statistics are unbiased estimates for the expectations of

their respective kernels which are







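(2.2.3)    $E(U_1) = E[h_1(Z_1)] = E[(Z_{11} - Z_{12})^2] = E[(\epsilon_{11} - \epsilon_{12})^2] = 2\sigma_\epsilon^2$

and

(2.2.4)    $E(U_2) = E[h_2(Z_1,Z_2)] = E[(Z_{11} - Z_{21})^2] = E[(a_1 - a_2 + \epsilon_{11} - \epsilon_{21})^2] = 2\sigma_\epsilon^2 + 2\sigma_a^2$.

If $\epsilon_4 \equiv E(\epsilon_{11}^4)$ and $\eta_4 \equiv E(a_1^4)$ are finite, and thus $E(h_1^2) < \infty$ and
$E(h_2^2) < \infty$, Results 2.1.2 and 2.1.4 imply (see Appendix A) that

(2.2.5)    $\lim_{k\to\infty} k\,\mathrm{Var}(U_1) = 4n^{-1}[\epsilon_4 + (3-n)(n-1)^{-1}\sigma_\epsilon^4] \equiv \sigma_{11}$,

(2.2.6)    $\lim_{k\to\infty} k\,\mathrm{Var}(U_2) = 4(\eta_4 + \epsilon_4/n - \sigma_a^4 - \sigma_\epsilon^4/n + 4\sigma_a^2\sigma_\epsilon^2/n) \equiv \sigma_{22}$,

and

(2.2.7)    $\lim_{k\to\infty} k\,\mathrm{Cov}(U_1,U_2) = 4n^{-1}(\epsilon_4 - \sigma_\epsilon^4) \equiv \sigma_{12}$.

Since $E(\epsilon_{11}^4) = \epsilon_4 > [E(\epsilon_{11}^2)]^2 = \sigma_\epsilon^4$, it is clear that, for large k, $U_1$
and $U_2$ are positively correlated.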

Using Theorem 2.1.2 we can describe the asymptotic distributions of

U1 and U2 in the following theorem.

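Theorem 2.2.1. If $U_1$ and $U_2$ are U-statistics as defined in (2.2.1)
and (2.2.2) and if $\sigma_{11}$, $\sigma_{22}$, and $\sigma_{12}$ are as defined in (2.2.5) through
(2.2.7), then

    $(k^{1/2}[U_1 - 2\sigma_\epsilon^2],\ k^{1/2}[U_2 - (2\sigma_\epsilon^2 + 2\sigma_a^2)])' \xrightarrow{d} N_2(0,\mathbf{A})$ as $k \to \infty$,

where

(2.2.8)    $\mathbf{A} = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}$.

Using Theorem 2.1.3, (2.2.3), and (2.2.4), we know $U_1 \xrightarrow{a.s.} 2\sigma_\epsilon^2$ and
$U_2 \xrightarrow{a.s.} 2\sigma_\epsilon^2 + 2\sigma_a^2$ as $k \to \infty$. It then follows that
$U_1/U_2 \xrightarrow{a.s.} \sigma_\epsilon^2/(\sigma_\epsilon^2 + \sigma_a^2)$,
which is equal to $1 - \sigma_a^2/(\sigma_a^2 + \sigma_\epsilon^2)$. Thus, $1 - U_1/U_2$ is a strongly
consistent point estimate for the intraclass correlation coefficient,
$\rho = \sigma_a^2/(\sigma_a^2 + \sigma_\epsilon^2)$.
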
It is useful to note at this point that U1 and U2 are related to

quantities encountered in the classical normal theory one-way analysis

of variance. If MST and MSE denote the mean square for treatments and

the mean square for error, respectively, from the analysis of variance,

then we show in Appendix B that MSE = U1/2 and MST = [nU2 + (1-n)U1]/2.

The usual point estimate for $\rho$ in the normal theory case is
$n^{-1}(\mathrm{MST}-\mathrm{MSE})/[n^{-1}(\mathrm{MST}-\mathrm{MSE}) + \mathrm{MSE}]$ (Scheffé 1959, Page 229). Using the
above-mentioned relationships between MST, MSE, $U_1$, and $U_2$, it is easily
seen that the normal theory and U-statistic approaches both lead to the
same point estimate for $\rho$.
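
The relationships MSE = U1/2 and MST = [nU2 + (1-n)U1]/2 can be checked numerically. The Python sketch below computes U1 and U2 directly from the kernels in (2.2.1) and (2.2.2) and compares them with the usual analysis of variance mean squares. The function names and the small data set are made up for illustration.

```python
from itertools import combinations

def u1_u2(Z):
    """U1 and U2 of (2.2.1) and (2.2.2) for a balanced k-by-n layout Z."""
    k, n = len(Z), len(Z[0])

    def h1(row):
        # Average squared difference over the n-choose-2 pairs in one row.
        pairs = list(combinations(row, 2))
        return sum((x - y) ** 2 for x, y in pairs) / len(pairs)

    def h2(row_a, row_b):
        # Average squared difference over all n*n cross pairs of two rows.
        return sum((x - y) ** 2 for x in row_a for y in row_b) / n ** 2

    u1 = sum(h1(row) for row in Z) / k
    row_pairs = list(combinations(Z, 2))
    u2 = sum(h2(a, b) for a, b in row_pairs) / len(row_pairs)
    return u1, u2

def anova_mean_squares(Z):
    """Mean square for treatments and mean square for error."""
    k, n = len(Z), len(Z[0])
    means = [sum(row) / n for row in Z]
    grand = sum(means) / k
    mse = sum((z - m) ** 2 for row, m in zip(Z, means) for z in row) / (k * (n - 1))
    mst = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    return mst, mse

# Illustrative data: k = 4 treatments, n = 3 observations each.
Z = [[3.1, 2.7, 3.5], [4.0, 4.4, 3.9], [2.2, 2.9, 2.5], [5.1, 4.6, 4.8]]
n = len(Z[0])
u1, u2 = u1_u2(Z)
mst, mse = anova_mean_squares(Z)
rho_hat_u = 1 - u1 / u2
rho_hat_anova = ((mst - mse) / n) / ((mst - mse) / n + mse)
```

On any balanced layout the two point estimates `rho_hat_u` and `rho_hat_anova` agree up to rounding error.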

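Consider now the statistic $T_k \equiv U_1/U_2$, and define a vector $\mathbf{a}$ such
that

    $\mathbf{a}' = (\partial T_k/\partial U_1,\ \partial T_k/\partial U_2)$

evaluated at the points $E(U_1) = 2\sigma_\epsilon^2$ and $E(U_2) = 2\sigma_\epsilon^2 + 2\sigma_a^2$. Since

    $\partial T_k/\partial U_1 = 1/U_2$   and   $\partial T_k/\partial U_2 = -U_1/U_2^2$,

we have

(2.2.9)    $\mathbf{a}' = ((2\sigma_\epsilon^2 + 2\sigma_a^2)^{-1},\ (-2\sigma_\epsilon^2)/(2\sigma_\epsilon^2 + 2\sigma_a^2)^2)$.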
2o) (- 2o

Letting $\sigma_T^2 = \mathbf{a}'\mathbf{A}\mathbf{a}$, where $\mathbf{A}$ is as defined in (2.2.8), and using
Theorem 2.1.2 and Theorem 14.6-2 from Bishop, Fienberg, and
Holland (1975) concerning differentiable functions of vectors with a
joint asymptotic normal distribution, we can state the following
theorem.

Theorem 2.2.2. If $T_k = U_1/U_2$ and $\sigma_T^2 = \mathbf{a}'\mathbf{A}\mathbf{a}$, then

    $k^{1/2}[(1 - T_k) - \rho]/\sigma_T \xrightarrow{d} N(0,1)$ as $k \to \infty$.
The quantity $\sigma_T^2$ depends on unknown parameters, but Slutsky's Theorem
(Serfling 1980, Page 19) assures us that $k^{1/2}[(1 - T_k) - \rho]/\sigma_T$ will still have an
asymptotic standard normal distribution if we replace $\sigma_T$ by a consistent
estimate. Such an estimate is derived in Appendix C and is referred to
here as $\hat{\sigma}_T$. Using this estimate, we can construct an asymptotic
$100(1-\alpha)\%$ confidence interval for $\rho$ as

(2.2.10)    $(1 - U_1/U_2) \pm z_{\alpha/2}(k^{-1/2})\hat{\sigma}_T$,

where $z_{\alpha/2}$ denotes the $(1-\alpha/2)$th percentile of a standard normal
distribution.
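
A minimal Python sketch of the interval (2.2.10) follows. The estimate of sigma_T comes from Appendix C, which is not reproduced here, so the sketch simply takes it as an argument; the function name and the numbers in the example call are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def normal_interval_for_rho(u1, u2, sigma_t_hat, k, alpha=0.05):
    """Interval (1 - U1/U2) +/- z * sigma_t_hat / sqrt(k) as in (2.2.10).

    sigma_t_hat is assumed to be a consistent estimate of sigma_T supplied
    by the caller; it is not computed here.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    center = 1 - u1 / u2
    half_width = z * sigma_t_hat / sqrt(k)
    return center - half_width, center + half_width

# Hypothetical inputs chosen only to exercise the formula.
lower, upper = normal_interval_for_rho(u1=0.8, u2=2.0, sigma_t_hat=0.9, k=36)
```

Because the interval is symmetric about 1 - U1/U2, its endpoints can fall outside the range of values that the intraclass correlation coefficient can take.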

The above procedure was derived before it was known that

Arvesen (1969) had developed a similar procedure involving the

jackknifing of U-statistics. Also, Arvesen and Schmitz (1970)







considered the specific problem of constructing an asymptotic confidence

interval for the intraclass correlation coefficient.

In their procedure Arvesen and Schmitz estimated $\theta = \ln(n\sigma_a^2/\sigma_\epsilon^2 + 1)$
by jackknifing the statistic $\hat{\theta} = \ln(\mathrm{MST}/\mathrm{MSE})$ (note that MST/MSE can be
written as a function of U-statistics; see Appendix B). The log
transformation was used for variance stabilization, which Arvesen and
Schmitz showed, through simulation, was useful for moderate sample
sizes.

The Arvesen-Schmitz procedure involves leaving out, one at a time,
each of the vectors $Z_i$, and calculating $\hat{\theta}_{-i} = \ln(\mathrm{MST}/\mathrm{MSE})$ using the
remaining vectors as the data for a one-way design with k - 1
treatments. Using $\hat{\theta}$ as the estimate calculated using all k vectors,
pseudo-estimates are formed as $\hat{\theta}_i = k\hat{\theta} - (k-1)\hat{\theta}_{-i}$. A point estimate
is calculated as $\tilde{\theta} = k^{-1}\sum_i \hat{\theta}_i$ and the standard deviation of the point
estimate is estimated by $s_\theta = [(k-1)^{-1}\sum_i (\hat{\theta}_i - \tilde{\theta})^2]^{1/2}$. Then, as in
Tukey (1958), the distribution of the statistic,

    $t = k^{1/2}(\tilde{\theta} - \theta)s_\theta^{-1}$,

is approximated by a t distribution with k - 1 degrees of freedom.
If $t_{\alpha/2,k-1}$ is the $(1-\alpha/2)$th percentile of a t distribution with
k - 1 degrees of freedom, then an approximate $100(1-\alpha)\%$ confidence
interval for $\theta$ is

    $(\tilde{\theta} - t_{\alpha/2,k-1}s_\theta k^{-1/2},\ \tilde{\theta} + t_{\alpha/2,k-1}s_\theta k^{-1/2}) \equiv (L,U)$.

Therefore, an approximate $100(1-\alpha)\%$ confidence interval for $\rho$ is

(2.2.11)    $([\exp(L)-1]/[\exp(L)-1+n],\ [\exp(U)-1]/[\exp(U)-1+n])$.
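
The leave-one-out steps just described can be sketched in Python as follows. This is an illustration under stated assumptions, not Arvesen and Schmitz's own code: the function names are invented, the data are made up, and the t percentile is passed in by the caller (here 2.776, the tabled 97.5th percentile for 4 degrees of freedom) because the standard library has no t quantile function.

```python
from math import exp, log, sqrt

def log_f_ratio(Z):
    """ln(MST/MSE) for a balanced layout Z (k rows of n values)."""
    k, n = len(Z), len(Z[0])
    means = [sum(row) / n for row in Z]
    grand = sum(means) / k
    mse = sum((z - m) ** 2 for row, m in zip(Z, means) for z in row) / (k * (n - 1))
    mst = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    return log(mst / mse)

def theta_to_rho(theta, n):
    """Map theta = ln(n * sigma_a^2 / sigma_e^2 + 1) to rho, as in (2.2.11)."""
    return (exp(theta) - 1) / (exp(theta) - 1 + n)

def jackknife_interval_for_rho(Z, t_quantile):
    k, n = len(Z), len(Z[0])
    theta_all = log_f_ratio(Z)
    # Pseudo-estimates: k * theta_all - (k - 1) * (estimate without row i).
    pseudo = [k * theta_all - (k - 1) * log_f_ratio(Z[:i] + Z[i + 1:])
              for i in range(k)]
    theta_jack = sum(pseudo) / k
    s = sqrt(sum((p - theta_jack) ** 2 for p in pseudo) / (k - 1))
    half = t_quantile * s / sqrt(k)
    return theta_to_rho(theta_jack - half, n), theta_to_rho(theta_jack + half, n)

# Illustrative data: k = 5 treatments, n = 3 observations each.
Z = [[3.1, 2.7, 3.5], [4.0, 4.4, 3.9], [2.2, 2.9, 2.5],
     [5.1, 4.6, 4.8], [3.6, 3.3, 3.9]]
rho_lower, rho_upper = jackknife_interval_for_rho(Z, t_quantile=2.776)
```

The sketch needs at least three treatments, since each leave-one-out analysis must still have two or more treatments to form MST.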


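Another method of obtaining an asymptotic confidence interval for $\rho$
can be derived using the fact that, for a two-dimensional vector W,
$W \sim N_2(0,\mathbf{A})$ implies $W'\mathbf{A}^{-1}W \sim \chi_2^2$, where $\chi_2^2$ is a chi-square random variable
with two degrees of freedom (Serfling 1980, Page 128). Therefore,
Theorem 2.2.1 implies that

(2.2.12)    $k[U_1 - E(U_1),\ U_2 - E(U_2)]\,\mathbf{A}^{-1}\,[U_1 - E(U_1),\ U_2 - E(U_2)]' \xrightarrow{d} \chi_2^2$ as $k \to \infty$.

Letting $D \equiv \mathrm{Det}(\mathbf{A}) = \sigma_{11}\sigma_{22} - \sigma_{12}^2 > 0$ we obtain

    $\mathbf{A}^{-1} = D^{-1}\begin{pmatrix} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{pmatrix}$

and letting $X' = E(U_1) = 2\sigma_\epsilon^2$ and $Y' = E(U_2) = 2\sigma_\epsilon^2 + 2\sigma_a^2$ we can rewrite
(2.2.12) as

    $kD^{-1}[\sigma_{22}(X'-U_1)^2 + \sigma_{11}(Y'-U_2)^2 - 2\sigma_{12}(X'-U_1)(Y'-U_2)] \xrightarrow{d} \chi_2^2$ as $k \to \infty$.

Defining $\chi_{2,\alpha}^2$ as the $(1-\alpha)$th percentile of a $\chi_2^2$ distribution and
setting the above quadratic form equal to $\chi_{2,\alpha}^2$ we obtain

(2.2.13)    $\sigma_{22}(X'-U_1)^2 + \sigma_{11}(Y'-U_2)^2 - 2\sigma_{12}(X'-U_1)(Y'-U_2) - D\chi_{2,\alpha}^2 k^{-1} = 0$,

which is the equation of an ellipse such that the probability the point
$(U_1,U_2)$ is in the interior of the ellipse is approximately $1 - \alpha$.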








Using the observed point, $(U_1,U_2)$, as the center of the
ellipse, we can form an asymptotic $100(1-\alpha)\%$ confidence interval for $\rho$
in the following manner.

[Figure 2.2.1: the ellipse of (2.2.13) drawn against the axes $X' = E(U_1) = 2\sigma_\epsilon^2$ and $Y' = E(U_2) = 2\sigma_\epsilon^2 + 2\sigma_a^2$, together with the lines $Y' = d_1X'$ and $Y' = d_2X'$ through the origin.]

Let $d_1$ and $d_2$ ($d_1 < d_2$) be the slopes of the two lines that pass
through the origin and intersect the ellipse in exactly one point (see
Figure 2.2.1). Using $Y' = dX'$, or equivalently $d = (2\sigma_\epsilon^2 + 2\sigma_a^2)/(2\sigma_\epsilon^2)$, we can
form an asymptotic $100(1-\alpha)\%$ confidence interval for $\rho$ as

(2.2.14)    $(1 - d_1^{-1},\ 1 - d_2^{-1})$,

where the exact forms of $d_1$ and $d_2$ are given in Appendix D.

As we shall see in Chapter Four, this method of constructing a

confidence interval for $\rho$ is inferior to other available methods and

therefore would not be recommended for use in practice.
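
For readers who want to experiment with the tangent-line construction behind (2.2.14), the following Python sketch finds the two slopes numerically. It does not reproduce the closed forms of Appendix D. Instead it substitutes the line (X', Y') = t(1, d) into (2.2.13) and requires the resulting quadratic in t to have a double root, which gives a quadratic in d. The function names are invented, the caller is assumed to supply estimates of sigma_11, sigma_12, and sigma_22, and the ellipse is assumed to lie in the region X' > 0 so that both slopes are finite and positive. The chi-square percentile uses the fact that a chi-square variable with two degrees of freedom has upper tail probability exp(-x/2).

```python
from math import log, sqrt

def tangent_slopes(u1, u2, s11, s12, s22, r):
    """Slopes of the lines through the origin tangent to the ellipse
    s22*(x-u1)^2 + s11*(y-u2)^2 - 2*s12*(x-u1)*(y-u2) = r."""
    m1 = s11 * u2 - s12 * u1
    m0 = s22 * u1 - s12 * u2
    c0 = s22 * u1 ** 2 + s11 * u2 ** 2 - 2 * s12 * u1 * u2 - r
    if c0 <= 0:
        raise ValueError("origin lies inside or on the ellipse")
    # Tangency condition (m1*d + m0)^2 = c0*(s11*d^2 - 2*s12*d + s22),
    # rearranged as qa*d^2 + qb*d + qc = 0.
    qa = m1 ** 2 - c0 * s11
    qb = 2 * (m0 * m1 + c0 * s12)
    qc = m0 ** 2 - c0 * s22
    disc = qb ** 2 - 4 * qa * qc
    if qa == 0 or disc < 0:
        raise ValueError("no pair of finite tangent slopes")
    roots = sorted(((-qb - sqrt(disc)) / (2 * qa), (-qb + sqrt(disc)) / (2 * qa)))
    return roots[0], roots[1]

def chi_square_interval_for_rho(u1, u2, s11, s12, s22, k, alpha=0.05):
    chi2 = -2.0 * log(alpha)  # upper-alpha point of chi-square with 2 df
    det = s11 * s22 - s12 ** 2
    d1, d2 = tangent_slopes(u1, u2, s11, s12, s22, det * chi2 / k)
    return 1 - 1 / d1, 1 - 1 / d2

# Hypothetical inputs: a circular region centered at (2, 4).
d_low, d_high = tangent_slopes(2.0, 4.0, 1.0, 0.0, 1.0, 1.0)
rho_low, rho_high = chi_square_interval_for_rho(2.0, 4.0, 1.0, 0.0, 1.0, k=50)
```

In the circular test case the slopes can be checked by hand, since the tangency condition reduces to 3d^2 - 16d + 15 = 0.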















CHAPTER THREE
CONFIDENCE INTERVALS USING MODIFIED ANSARI-BRADLEY STATISTICS


3.1 Model and Formation of Pseudo-Samples


Ansari and Bradley (1960) introduced a two-sample rank statistic

that can be used to construct a confidence interval for the ratio of two

scale parameters. Let $X_1,X_2,\ldots,X_n$ and $Y_1,Y_2,\ldots,Y_k$ be two independent
samples of independent observations from populations with continuous
distribution functions, F(x) and G(x) respectively, such that
$F(x) = D(x/\delta_1)$ and $G(x) = D(x/\delta_2)$ for some distribution function D(x).
That is, $\delta_1$ and $\delta_2$ are scale parameters associated with the X's and Y's
respectively. Define $\theta = \delta_2/\delta_1$, the ratio of the two scale
parameters. Thus $\theta X$ and Y have the same distribution.

The Ansari-Bradley statistic can be formulated in different ways.

In the formulation we utilize, the combined sample of X's and Y's is

ordered and the observations are ranked from the inside out as


N/2,...,2,1,1,2,...,N/2


if $N \equiv n + k$ is even and as

    (N-1)/2,...,2,1,0,1,2,...,(N-1)/2

if N is odd. The Ansari-Bradley statistic is then defined as
$W = \sum_{i=1}^{n} \mathrm{Rank}(X_i)$.
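
The inside-out ranking can be sketched in a few lines of Python. The function names are illustrative, ties are assumed absent (the distributions are continuous), and the sample values are made up.

```python
def inside_out_ranks(N):
    """Ranks by sorted position: N/2,...,2,1,1,2,...,N/2 when N is even,
    (N-1)/2,...,1,0,1,...,(N-1)/2 when N is odd."""
    if N % 2 == 0:
        half = N // 2
        return [half - p if p < half else p - half + 1 for p in range(N)]
    mid = (N - 1) // 2
    return [abs(p - mid) for p in range(N)]

def ansari_bradley_w(xs, ys):
    """Sum of the inside-out ranks received by the X observations."""
    combined = sorted([(x, 0) for x in xs] + [(y, 1) for y in ys])
    ranks = inside_out_ranks(len(combined))
    return sum(r for (_, label), r in zip(combined, ranks) if label == 0)

# Illustrative samples: the X's sit in the middle of the combined ordering.
w = ansari_bradley_w([-0.2, 0.1, 0.3], [-2.0, 1.5, -0.9, 0.8])
```

With this ranking, observations near the center of the combined sample receive small ranks, so a small W indicates that the X's are less dispersed than the Y's.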








The statistic W can be used as a test statistic for testing

$H_0: \theta = 1$ versus one- or two-sided alternatives. The distribution of W
under the null hypothesis (F(x) = G(x)) is tabled for moderate values of
n and k (Ansari and Bradley 1960). Bauer (1972) describes a method of
inverting the test procedure to obtain a confidence interval for $\theta$.

Using Theorem 1 of Chernoff and Savage (1958), Ansari and Bradley
(1960) showed that $T_N = W/(nN)$ has an asymptotic normal distribution
which, under the null hypothesis, has mean 1/4 and variance $k(48nN)^{-1}$.

However, the Ansari-Bradley statistic does not satisfy all the

assumptions necessary for the application of Theorem 1 of Chernoff and

Savage. An alternate proof of the asymptotic normality of TN under the

null hypothesis is given in Section 3.2. The alternate proof modifies

the Chernoff and Savage proof so that it may be applied in the present

situation.
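
As a small numerical companion to the null mean 1/4 and null variance k/(48nN) quoted above, the Python sketch below standardizes an observed W. The function names are invented for the example, and the two-sided p-value is the usual large-sample normal approximation rather than anything specific to this dissertation.

```python
from math import sqrt
from statistics import NormalDist

def ansari_bradley_z(w, n, k):
    """Standardize T_N = W/(nN) with null mean 1/4 and variance k/(48nN)."""
    N = n + k
    t_n = w / (n * N)
    variance = k / (48.0 * n * N)
    return (t_n - 0.25) / sqrt(variance)

def two_sided_p_value(z):
    # Large-sample approximation based on the standard normal distribution.
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# A value of W equal to nN/4 sits exactly at the null mean.
z = ansari_bradley_z(w=250, n=20, k=30)
p = two_sided_p_value(z)
```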

Consider now the balanced one-way random effects model


    $Z_{ij} = \mu + a_i + \epsilon_{ij}$,   i = 1,2,...,k,   j = 1,2,...,n,

where $\mu$ is an unknown constant and the $\epsilon_{ij}$ and $a_i$ are independent
samples of independent observations from continuous distributions with
distribution functions F(x) and G(x) and density functions f(x) and g(x)
respectively. Also, assume there exist scale parameters, $\delta_1$ and $\delta_2$,
such that $F(x) = D(x/\delta_1)$ and $G(x) = D(x/\delta_2)$ where D(x) is a continuous
distribution function corresponding to a random variable with a
distribution symmetric about zero. Thus, the $\epsilon_{ij}$ and the $a_i$ are random
variables with distributions symmetric about zero and they satisfy the
assumptions needed for using the Ansari-Bradley statistic. In particular,
the $\epsilon_{ij}/\delta_1$ and the $a_i/\delta_2$ have the same distribution.








The objective is to estimate the parameter $\gamma = \delta_2^2/(\delta_1^2 + \delta_2^2)$ in

order to assess whether the variability contributed by the treatments is

large compared to the overall variability of the responses, i.e., to

estimate the proportion of the variability in the responses attributable

to the treatments.

Ordinarily in the two-sample scale problem, $\theta$ is the parameter that
would be of interest. However, in order to compare methods involving
scale parameters to methods involving variances we instead look at the
parameter $\gamma$ which is a function of $\theta$, namely $\gamma = \theta^2/(\theta^2+1)$. Thus, $\gamma$ and
$\rho = \sigma_a^2/(\sigma_a^2 + \sigma_\epsilon^2)$ are analogous parameters. In fact, $\gamma = \rho$ in those
cases where both scale parameters and variances exist since
$\delta_2/\delta_1 = \sigma_a/\sigma_\epsilon$. One advantage to a procedure that estimates $\gamma$ is that an
estimate of the desired quantity can be found in those cases where
variances, and thus $\rho$, do not exist.
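
The change of parameter is a one-line computation; the short Python sketch below (function names are illustrative) also shows how interval endpoints for the scale ratio carry over, since the map is increasing for positive values of the ratio.

```python
def gamma_from_theta(theta):
    """gamma = theta^2 / (theta^2 + 1), with theta = delta_2 / delta_1 > 0."""
    return theta ** 2 / (theta ** 2 + 1.0)

def gamma_interval(theta_lower, theta_upper):
    # The map is increasing on theta > 0, so endpoints map to endpoints.
    return gamma_from_theta(theta_lower), gamma_from_theta(theta_upper)

# Example: delta_1 = 4 and delta_2 = 3 give theta = 0.75 and gamma = 9/25.
gamma_example = gamma_from_theta(3.0 / 4.0)
```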

Ideally, we would like to have one sample consisting of the $\epsilon_{ij}$ and
another consisting of the $a_i$ and then use the Ansari-Bradley statistic
to give us information about $\theta$ which could be transformed into a
confidence interval for $\gamma$. However, since only the $Z_{ij}$ are observed, the
individual $\epsilon_{ij}$ and $a_i$ are not observable. What we can do is formulate a
sample of size n that, as $n \to \infty$, essentially behaves like the $\epsilon_{ij}$ from
treatment i and another sample of size k that, as n and $k \to \infty$, mimics
the $a_i$.

The derivation of these two pseudo-samples follows. In the next
section we will show that the asymptotic distribution of the Ansari-Bradley
statistic calculated using these pseudo-samples is the same,
when $\theta = 1$, as the asymptotic distribution of the Ansari-Bradley
statistic calculated using the actual $\epsilon_{ij}$ from treatment i and the
actual $a_i$.







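We begin formation of the pseudo-samples by defining
$\bar{Z}_{i\cdot} = n^{-1}\sum_j Z_{ij}$, $\bar{Z}_{\cdot\cdot} = (nk)^{-1}\sum_i\sum_j Z_{ij}$, $\bar{\epsilon}_{i\cdot} = n^{-1}\sum_j \epsilon_{ij}$, $\bar{\epsilon}_{\cdot\cdot} = (nk)^{-1}\sum_i\sum_j \epsilon_{ij}$,
and $\bar{a} = k^{-1}\sum_i a_i$. The two pseudo-samples we obtain are

(3.1.1)
    Sample 1:
        $X_1 = Z_{i1} - \bar{Z}_{i\cdot} = \epsilon_{i1} - \bar{\epsilon}_{i\cdot}$
        $X_2 = Z_{i2} - \bar{Z}_{i\cdot} = \epsilon_{i2} - \bar{\epsilon}_{i\cdot}$
        ...
        $X_n = Z_{in} - \bar{Z}_{i\cdot} = \epsilon_{in} - \bar{\epsilon}_{i\cdot}$

    Sample 2:
        $Y_1 = \bar{Z}_{1\cdot} - \bar{Z}_{\cdot\cdot} = a_1 + \bar{\epsilon}_{1\cdot} - \bar{a} - \bar{\epsilon}_{\cdot\cdot}$
        $Y_2 = \bar{Z}_{2\cdot} - \bar{Z}_{\cdot\cdot} = a_2 + \bar{\epsilon}_{2\cdot} - \bar{a} - \bar{\epsilon}_{\cdot\cdot}$
        ...
        $Y_k = \bar{Z}_{k\cdot} - \bar{Z}_{\cdot\cdot} = a_k + \bar{\epsilon}_{k\cdot} - \bar{a} - \bar{\epsilon}_{\cdot\cdot}$

Under the assumptions of Theorem 3.2.1, as n and $k \to \infty$, $\bar{\epsilon}_{\cdot\cdot}$, $\bar{a}$,
and $\bar{\epsilon}_{i\cdot}$, for $1 \le i \le k$, converge in probability to zero. Therefore,
for large N, we would expect the $X_j$ and the $Y_i$ to behave like random
samples from F(x) and G(x) respectively.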
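
The mean-based pseudo-samples of (3.1.1) are straightforward to compute from the observed responses alone. In the Python sketch below the function name and the zero-based treatment index are choices made for the example.

```python
def mean_pseudo_samples(Z, i=0):
    """Pseudo-samples of (3.1.1) from a balanced k-by-n layout Z.

    xs: row i centered at its own mean (stands in for the errors).
    ys: row means centered at the grand mean (stands in for the
        treatment effects).
    """
    k, n = len(Z), len(Z[0])
    row_means = [sum(row) / n for row in Z]
    grand_mean = sum(row_means) / k  # equals the overall mean when balanced
    xs = [z - row_means[i] for z in Z[i]]
    ys = [m - grand_mean for m in row_means]
    return xs, ys

xs, ys = mean_pseudo_samples([[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]], i=0)
```

Both pseudo-samples sum to zero by construction, which is one reason they are only asymptotically equivalent to independent samples of the effects.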

Throughout this chapter we assume that n and k both tend to
infinity in such a way that $\lambda_N = n/N$ always satisfies the condition that
$\lambda_0 < \lambda_N < 1 - \lambda_0$ for some $0 < \lambda_0 < 1/2$. Obviously, N = n + k will therefore
tend to infinity. To facilitate discussions, we will simply say that
$N \to \infty$.

Recall that $\delta_1$ is a scale parameter for the $\epsilon_{ij}$ and $\delta_2$ is a scale
parameter for the $a_i$. Let $F^*(x)$ and $G^*(x)$ be the distribution functions
for the $X_j$ and $Y_i$ respectively. In an asymptotic sense we can think of
$\delta_1$ as being a scale parameter for the $X_j$ and $\delta_2$ as being a scale
parameter for the $Y_i$ since

    $F^*(x) = P(X_j \le x) = P(\epsilon_{ij} - \bar{\epsilon}_{i\cdot} \le x) \to P(\epsilon_{ij} \le x) = F(x) = D(x/\delta_1)$ as $N \to \infty$

and

    $G^*(x) = P(Y_i \le x) = P(a_i + \bar{\epsilon}_{i\cdot} - \bar{a} - \bar{\epsilon}_{\cdot\cdot} \le x) \to P(a_i \le x) = G(x) = D(x/\delta_2)$ as $N \to \infty$.

These asymptotic equivalences can be justified by noting again that the
means converge in probability to zero and using Slutsky's Theorem
(Serfling 1980, Page 19).

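Samples with the same asymptotic properties as those in (3.1.1) can
be obtained in other ways. One approach is to use medians rather than
means. Define $\hat Z_i$ = median of $(Z_{i1}, Z_{i2}, \ldots, Z_{in})$,
$\hat Z$ = median of $(\hat Z_1, \hat Z_2, \ldots, \hat Z_k)$,
$\hat\varepsilon_i$ = median of $(\varepsilon_{i1}, \varepsilon_{i2}, \ldots, \varepsilon_{in})$,
and $\hat\alpha$ = median of $(\alpha_1 + \hat\varepsilon_1, \alpha_2 + \hat\varepsilon_2, \ldots, \alpha_k + \hat\varepsilon_k)$. We could then obtain the
pseudo-samples

\[
\begin{array}{ll}
\text{Sample 1} & \text{Sample 2} \\[4pt]
X'_1 = Z_{i1} - \hat Z_i = \varepsilon_{i1} - \hat\varepsilon_i &
Y'_1 = \hat Z_1 - \hat Z = \alpha_1 + \hat\varepsilon_1 - \hat\alpha \\
X'_2 = Z_{i2} - \hat Z_i = \varepsilon_{i2} - \hat\varepsilon_i &
Y'_2 = \hat Z_2 - \hat Z = \alpha_2 + \hat\varepsilon_2 - \hat\alpha \\
\quad\vdots & \quad\vdots \\
 & Y'_k = \hat Z_k - \hat Z = \alpha_k + \hat\varepsilon_k - \hat\alpha \\
X'_n = Z_{in} - \hat Z_i = \varepsilon_{in} - \hat\varepsilon_i &
\end{array}
\tag{3.1.2}
\]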

Using reasoning similar to that used with the samples in (3.1.1),
under the assumptions of Theorem 3.2.2, we conclude that these
pseudo-samples would be asymptotically equivalent to random samples from F(x)
and G(x) respectively.
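
The median-based construction in (3.1.2) can be sketched in the same way. As before, the function name `pseudo_samples_medians`, the argument `treatment`, and the k-by-n array layout are assumptions made for illustration only.

```python
import numpy as np

def pseudo_samples_medians(Z, treatment=0):
    """Form the pseudo-samples of (3.1.2) from responses Z[i, j].

    Z is assumed to be a k-by-n array (row i holds the n responses of
    treatment i). Medians replace the means used in (3.1.1).
    """
    Z = np.asarray(Z, dtype=float)
    row_medians = np.median(Z, axis=1)        # median of each treatment
    overall = np.median(row_medians)          # median of the k treatment medians
    X = Z[treatment] - row_medians[treatment]
    Y = row_medians - overall
    return X, Y
```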

Other methods of obtaining the two pseudo-samples could be used as

long as they provided samples with the correct asymptotic properties,

the estimates involved converge to zero, and the estimates satisfy other

criteria that will be examined in Section 3.2.

In the next section, we turn our attention to the asymptotic

distribution of the Ansari-Bradley statistic calculated using the

pseudo-samples. We show that, when $\theta = 1$, this distribution is
equivalent to the asymptotic distribution of the Ansari-Bradley
statistic calculated using the actual $\varepsilon_{ij}$ and $\alpha_i$.



3.2 Asymptotic Distribution of the Ansari-Bradley Statistic
Using Pseudo-Samples


Consider the pseudo-samples described in (3.1.1). For simplicity,

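but with no loss of generality, we will assume that we are using the $\varepsilon_{ij}$
from treatment one. Let $\varepsilon' = (\varepsilon_{11}, \varepsilon_{12}, \ldots, \varepsilon_{1n})$, $\alpha' = (\alpha_1, \alpha_2, \ldots, \alpha_k)$,
$X' = (X_1, X_2, \ldots, X_n)$, $Y' = (Y_1, Y_2, \ldots, Y_k)$, and let $W(\varepsilon,\alpha)$ denote the
Ansari-Bradley statistic calculated using the samples $\varepsilon$ and $\alpha$. We now
derive an expression for $W(\varepsilon,\alpha)$ that is similar to a Chernoff and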

Savage (1958) expansion of a statistic. We do not use a direct

application of the Chernoff and Savage procedure because the Ansari-

Bradley statistic does not satisfy all the assumptions necessary for

implementing the Chernoff and Savage expansion. However, using an

alternative expansion that produces a similar expression to that

obtained by Chernoff and Savage allows us to make use of some of their

results.







For any event A, let I(A) be 1 if event A occurs and 0 if event A
does not occur. We then define empirical distribution functions for the
$\varepsilon_{1j}$ and the $\alpha_i$ as $F_n(x) = n^{-1}\sum_j I(\varepsilon_{1j} \le x)$ and $G_k(x) = k^{-1}\sum_i I(\alpha_i \le x)$.
With $\lambda_N$ as described in Section 3.1, we then define the combined sample
empirical distribution function as $H_N(x) = \lambda_N F_n(x) + (1-\lambda_N) G_k(x)$ and let
the combined population distribution function be denoted

\[
H(x) = \lambda_N F(x) + (1-\lambda_N) G(x). \tag{3.2.1}
\]


We define a function $J_N[H_N(x)]$ to be

\[
J_N[H_N(x)] =
\begin{cases}
(2N)^{-1} + \left| 1/2 + (2N)^{-1} - H_N(x) \right| & \text{if N is even} \\
\left| 1/2 + (2N)^{-1} - H_N(x) \right| & \text{if N is odd}
\end{cases}
\tag{3.2.2}
\]

and a function J[H(x)] as

\[
J[H(x)] = \left| 1/2 - H(x) \right|. \tag{3.2.3}
\]

We also let

\[
J'[H(x)] =
\begin{cases}
-1 & \text{if } H(x) < 1/2 \\
\phantom{-}1 & \text{if } H(x) > 1/2
\end{cases}
\tag{3.2.4}
\]

and note that J'[H(x)] is the derivative of J[H(x)] with respect to H(x)
at all points except H(x) = 1/2 (where the derivative is not defined).
We make J'(1/2) = -1 by definition so J'[H(x)] will be defined
everywhere.

Let

\[
T_N(\varepsilon,\alpha) = (nN)^{-1} W(\varepsilon,\alpha) = \int J_N[H_N(x)]\,dF_n(x) \tag{3.2.5}
\]

be an alternative representation of the Ansari-Bradley statistic and let

\[
\begin{aligned}
A &= \int J[H(x)]\,dF(x), && (3.2.6)\\
B_{1N} &= \int J[H(x)]\,d[F_n(x)-F(x)], && (3.2.7)\\
B_{2N} &= \int [H_N(x)-H(x)]\,J'[H(x)]\,dF(x), && (3.2.8)\\
C_{1N} &= \lambda_N \int [F_n(x)-F(x)]\,J'[H(x)]\,d[F_n(x)-F(x)], && (3.2.9.a)\\
C_{2N} &= (1-\lambda_N) \int [G_k(x)-G(x)]\,J'[H(x)]\,d[F_n(x)-F(x)], && (3.2.9.b)\\
C_{3N} &= \int \left( J_N[H_N(x)] - J[H_N(x)] \right) dF_n(x). && (3.2.9.c)
\end{aligned}
\]


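Then, by adding and subtracting appropriate quantities,

\[
\begin{aligned}
T_N(\varepsilon,\alpha) &= \int J_N[H_N(x)]\,dF_n(x) \\
&= \int J[H(x)]\,dF(x) + \int J[H(x)]\,d[F_n(x)-F(x)] \\
&\quad + \int \left( J_N[H_N(x)] - J[H_N(x)] \right) dF_n(x)
 + \int \left( J[H_N(x)] - J[H(x)] \right) dF_n(x) \\
&= A + B_{1N} + C_{3N} + \int \left( J[H_N(x)] - J[H(x)] \right) dF_n(x).
\end{aligned}
\]

Now, using (3.2.3), $J[H_N(x)] - J[H(x)]$ is equal to

\[
\begin{cases}
H(x) - H_N(x) & \text{if } 0 \le H_N(x) < 1/2 \text{ and } x \le 0 \\
H_N(x) - H(x) & \text{if } 1/2 \le H_N(x) \le 1 \text{ and } x > 0 \\
1 - H_N(x) - H(x) & \text{if } 0 \le H_N(x) < 1/2 \text{ and } x > 0 \\
H_N(x) + H(x) - 1 & \text{if } 1/2 \le H_N(x) \le 1 \text{ and } x \le 0.
\end{cases}
\]

Since we assume that F(x) and G(x) are such that F(0) = 1/2 = G(0), it
follows that H(0) = 1/2. Thus, using (3.2.4), $J[H_N(x)] - J[H(x)]$ is
equal to

\[
\begin{cases}
[H_N(x)-H(x)]J'[H(x)] + 0 & \text{if } 0 \le H_N(x) < 1/2 \text{ and } x \le 0 \\
[H_N(x)-H(x)]J'[H(x)] + 0 & \text{if } 1/2 \le H_N(x) \le 1 \text{ and } x > 0 \\
[H_N(x)-H(x)]J'[H(x)] + [1-2H_N(x)]J'[H(x)] & \text{if } 0 \le H_N(x) < 1/2 \text{ and } x > 0 \\
[H_N(x)-H(x)]J'[H(x)] + [1-2H_N(x)]J'[H(x)] & \text{if } 1/2 \le H_N(x) \le 1 \text{ and } x \le 0.
\end{cases}
\]

We define

\[
K_N(x) =
\begin{cases}
0 & \text{if } 0 \le H_N(x) < 1/2 \text{ and } x \le 0 \\
0 & \text{if } 1/2 \le H_N(x) \le 1 \text{ and } x > 0 \\
[1-2H_N(x)]J'[H(x)] & \text{if } 0 \le H_N(x) < 1/2 \text{ and } x > 0 \\
[1-2H_N(x)]J'[H(x)] & \text{if } 1/2 \le H_N(x) \le 1 \text{ and } x \le 0
\end{cases}
\]

and

\[
C_{4N} = \int K_N(x)\,dF_n(x). \tag{3.2.9.d}
\]

Then $J[H_N(x)] - J[H(x)] = [H_N(x)-H(x)]J'[H(x)] + K_N(x)$ and therefore

\[
\begin{aligned}
T_N(\varepsilon,\alpha) &= A + B_{1N} + C_{3N} + \int [H_N(x)-H(x)]J'[H(x)]\,dF_n(x) + C_{4N} \\
&= A + B_{1N} + C_{3N} + \int [H_N(x)-H(x)]J'[H(x)]\,dF(x) \\
&\quad + \int [H_N(x)-H(x)]J'[H(x)]\,d[F_n(x)-F(x)] + C_{4N} \\
&= A + B_{1N} + B_{2N} + C_{3N} + \lambda_N \int [F_n(x)-F(x)]J'[H(x)]\,d[F_n(x)-F(x)] \\
&\quad + (1-\lambda_N) \int [G_k(x)-G(x)]J'[H(x)]\,d[F_n(x)-F(x)] + C_{4N} \\
&= A + B_{1N} + B_{2N} + C_{1N} + C_{2N} + C_{3N} + C_{4N}.
\end{aligned}
\tag{3.2.10}
\]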


The terms $C_{1N}$ through $C_{4N}$ are shown to be of order $o_p(N^{-1/2})$ in
Appendix E. Since A, $B_{1N}$, and $B_{2N}$ are equal to or analogous to
corresponding terms in a Chernoff and Savage (1958) expansion, we see
that $T_N(\varepsilon,\alpha)$ has an asymptotic normal distribution with mean

\[
\mu_N = \int J[H(x)]\,dF(x)
\]

and variance

\[
\begin{aligned}
N\sigma_N^2 = 2(1-\lambda_N) \Big( &\iint_{-\infty<x<y<\infty} G(x)[1-G(y)]\,J'[H(x)]\,J'[H(y)]\,dF(x)\,dF(y) \\
&+ (1-\lambda_N)\lambda_N^{-1} \iint_{-\infty<x<y<\infty} F(x)[1-F(y)]\,J'[H(x)]\,J'[H(y)]\,dG(x)\,dG(y) \Big).
\end{aligned}
\]

Under $H_0$: $\theta = 1$, implying $\delta_1 = \delta_2$ and therefore F(x) = G(x) = H(x), it
can be shown that $\mu_N = 1/4$ and $\sigma_N^2 = k(48nN)^{-1}$.
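
The values quoted under the null hypothesis can be checked directly. The following is a sketch of that calculation, assuming F = G = H is continuous so that the substitution u = F(x), v = F(y) is valid; the symbols u, v, s, and I are introduced here only for this check.

```latex
% Under H_0, with u = F(x), v = F(y) and s(u) = J'(u) = -1 for u <= 1/2, +1 for u > 1/2:
\mu_N = \int_0^1 \left| \tfrac12 - u \right| du = \tfrac14 .

% Common value of the two double integrals:
I = \iint_{0<u<v<1} u(1-v)\, s(u)\, s(v)\, du\, dv
  = \underbrace{\tfrac{5}{384}}_{u<v\le 1/2}
  + \underbrace{\tfrac{5}{384}}_{1/2<u<v}
  - \underbrace{\tfrac18 \cdot \tfrac18}_{u\le 1/2<v}
  = \tfrac{1}{96},
\quad\text{where}\quad
\int_0^{1/2}\!\!\int_0^{v} u(1-v)\, du\, dv = \tfrac{1}{48} - \tfrac{1}{128} = \tfrac{5}{384}.

% Substituting into the variance expression:
N\sigma_N^2 = 2(1-\lambda_N)\left[ 1 + \frac{1-\lambda_N}{\lambda_N} \right] I
            = \frac{1-\lambda_N}{48\,\lambda_N}
            = \frac{k}{48\,n},
\qquad\text{so}\qquad
\sigma_N^2 = \frac{k}{48\,n\,N}.
```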

To prove that W(X,Y), where X and Y are as given in (3.1.1), has
the same asymptotic distribution as $W(\varepsilon,\alpha)$, when $\theta = 1$, we look at

\[
T_N(X,Y) = (nN)^{-1} W(X,Y) \tag{3.2.11}
\]

and show that it has the same asymptotic distribution as $T_N(\varepsilon,\alpha)$.
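
To make the relation between W, $T_N$, and the score function (3.2.2) concrete, the following minimal sketch computes W for two samples by evaluating $J_N$ at the combined-sample ranks divided by N. The function names are hypothetical, and the code assumes there are no ties in the combined sample.

```python
import numpy as np

def ansari_bradley_W(x, y):
    """W = N * sum_j J_N(R_j / N), with R_j the combined-sample rank of x_j.

    Uses the score function J_N of (3.2.2); assumes no tied observations.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, N = x.size, x.size + y.size
    combined = np.concatenate([x, y])
    ranks = np.argsort(np.argsort(combined)) + 1   # ranks 1..N
    h = ranks[:n] / N                              # H_N evaluated at each x_j
    scores = np.abs(0.5 + 1.0 / (2 * N) - h)
    if N % 2 == 0:
        scores = scores + 1.0 / (2 * N)
    return N * scores.sum()

def T_N(x, y):
    """T_N = (nN)^{-1} W, as in (3.2.5) and (3.2.11)."""
    n, N = len(x), len(x) + len(y)
    return ansari_bradley_W(x, y) / (n * N)
```

With this scoring, observations at the extremes of the combined sample receive the largest scores, so W is smallest when the first sample occupies the middle ranks.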

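Remembering that F(x) and G(x) are the true distribution functions
for the $\varepsilon_{1j}$ and the $\alpha_i$ respectively and that $F_n(x)$ and $G_k(x)$ are the
corresponding empirical distribution functions, we define

\[
\begin{aligned}
F_n^*(x) &= n^{-1}\sum_j I(X_j \le x), && (3.2.12)\\
G_k^*(x) &= k^{-1}\sum_i I(Y_i \le x), && (3.2.13)\\
H_N^*(x) &= \lambda_N F_n^*(x) + (1-\lambda_N) G_k^*(x), && (3.2.14)\\
A^* &= \int J[H(x)]\,dF(x), && (3.2.15)\\
B_{1N}^* &= \int J[H(x)]\,d[F_n^*(x)-F(x)], && (3.2.16)\\
B_{2N}^* &= \int [H_N^*(x)-H(x)]\,J'[H(x)]\,dF(x), && (3.2.17)\\
C_{1N}^* &= \lambda_N \int [F_n^*(x)-F(x)]\,J'[H(x)]\,d[F_n^*(x)-F(x)], && (3.2.18.a)\\
C_{2N}^* &= (1-\lambda_N) \int [G_k^*(x)-G(x)]\,J'[H(x)]\,d[F_n^*(x)-F(x)], && (3.2.18.b)\\
C_{3N}^* &= \int \left( J_N[H_N^*(x)] - J[H_N^*(x)] \right) dF_n^*(x), && (3.2.18.c)\\
C_{4N}^* &= \int K_N^*(x)\,dF_n^*(x), && (3.2.18.d)
\end{aligned}
\]

where

\[
K_N^*(x) =
\begin{cases}
0 & \text{if } 0 \le H_N^*(x) < 1/2 \text{ and } x \le 0 \\
0 & \text{if } 1/2 \le H_N^*(x) \le 1 \text{ and } x > 0 \\
[1-2H_N^*(x)]J'[H(x)] & \text{if } 0 \le H_N^*(x) < 1/2 \text{ and } x > 0 \\
[1-2H_N^*(x)]J'[H(x)] & \text{if } 1/2 \le H_N^*(x) \le 1 \text{ and } x \le 0
\end{cases}
\]

and H(x), $J_N$, J, and J' are as defined in (3.2.1) through (3.2.4).
Expanding $T_N(X,Y) = \int J_N[H_N^*(x)]\,dF_n^*(x)$ in the same manner as we
expanded $T_N(\varepsilon,\alpha)$ produces

\[
T_N(X,Y) = A^* + B_{1N}^* + B_{2N}^* + C_{1N}^* + C_{2N}^* + C_{3N}^* + C_{4N}^*. \tag{3.2.19}
\]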


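Recall that F(x), G(x), and thus H(x), are assumed to represent
distributions symmetric about zero. This implies that J[H(0)] =
J(1/2) = 0. Therefore, defining

\[
\tilde B(x) = \int_0^x J'[H(y)]\,dF(y) \tag{3.2.20}
\]

and

\[
B(x) = \int_0^x J'[H(y)]\,dG(y), \tag{3.2.21}
\]

$B_{1N}^*$ can be written as

\[
\begin{aligned}
&\int \Big( \int_0^x J'[H(y)]\,dH(y) \Big)\,d[F_n^*(x)-F(x)] \\
&\quad= \int \Big( \int_0^x J'[H(y)]\,d[\lambda_N F(y) + (1-\lambda_N)G(y)] \Big)\,d[F_n^*(x)-F(x)] \\
&\quad= \lambda_N \int \tilde B(x)\,d[F_n^*(x)-F(x)] + (1-\lambda_N) \int B(x)\,d[F_n^*(x)-F(x)].
\end{aligned}
\]

Integrating $B_{2N}^*$ by parts produces

\[
\begin{aligned}
&[H_N^*(x)-H(x)]\,\tilde B(x) \Big|_{-\infty}^{\infty} - \int \tilde B(x)\,d[H_N^*(x)-H(x)] \\
&\quad= -\Big( \lambda_N \int \tilde B(x)\,d[F_n^*(x)-F(x)] + (1-\lambda_N) \int \tilde B(x)\,d[G_k^*(x)-G(x)] \Big).
\end{aligned}
\]

Thus, $B_{1N}^* + B_{2N}^*$ may be expressed as

\[
\begin{aligned}
&(1-\lambda_N)\Big( \int B(x)\,d[F_n^*(x)-F(x)] - \int \tilde B(x)\,d[G_k^*(x)-G(x)] \Big) \\
&\quad= (1-\lambda_N)\Big( n^{-1}\sum_j B(\varepsilon_{1j}-\bar\varepsilon_{1.}) - E[B(\varepsilon_{11})]
 - k^{-1}\sum_i \tilde B(\alpha_i + \bar\varepsilon_{i.} - \bar\alpha - \bar\varepsilon_{..}) + E[\tilde B(\alpha_1)] \Big).
\end{aligned}
\tag{3.2.22}
\]

In the same manner it can be shown that $B_{1N} + B_{2N}$ (given in (3.2.7)
and (3.2.8)) can be written as

\[
(1-\lambda_N)\Big( n^{-1}\sum_j B(\varepsilon_{1j}) - E[B(\varepsilon_{11})] - k^{-1}\sum_i \tilde B(\alpha_i) + E[\tilde B(\alpha_1)] \Big). \tag{3.2.23}
\]

Recalling the form of J'[H(x)] given in (3.2.4) and the definitions
of $\tilde B(x)$ and B(x) given in (3.2.20) and (3.2.21), we see that

\[
\tilde B(x) =
\begin{cases}
1/2 - F(x) & \text{if } x \le 0 \\
F(x) - 1/2 & \text{if } x > 0
\end{cases}
\tag{3.2.24}
\]

and

\[
B(x) =
\begin{cases}
1/2 - G(x) & \text{if } x \le 0 \\
G(x) - 1/2 & \text{if } x > 0.
\end{cases}
\tag{3.2.25}
\]

We then define

\[
\tilde B'(x) =
\begin{cases}
-f(x) & \text{if } x \le 0 \\
\phantom{-}f(x) & \text{if } x > 0
\end{cases}
\tag{3.2.26}
\]

and

\[
B'(x) =
\begin{cases}
-g(x) & \text{if } x \le 0 \\
\phantom{-}g(x) & \text{if } x > 0,
\end{cases}
\tag{3.2.27}
\]

noting that $\tilde B'(x)$ and B'(x) are the respective derivatives of $\tilde B(x)$ and
B(x), except at x = 0. We make $\tilde B'(0) = -f(0)$ and $B'(0) = -g(0)$ by
definition so the functions will be defined everywhere.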

The proof that the asymptotic distribution of $T_N(X,Y)$ is the same
as that of $T_N(\varepsilon,\alpha)$, when $\theta = 1$, is given in the following theorem.

Theorem 3.2.1. Using the model and assumptions described in
Section 3.1 and the pseudo-samples given in (3.1.1), if

(i) $\theta = \delta_2/\delta_1 = 1$ (WLOG assume $\delta_1 = \delta_2 = 1$),

(ii) F'(x) = f(x) and |F''(x)| = |f'(x)| are continuous and
bounded by constants $B_1$ and $B_2$ respectively,

(iii) f(0) > 0,

(iv) $\int x^2 f(x)\,dx < \infty$,

then

\[
N^{1/2}\,[T_N(X,Y) - 1/4]\,(48nk^{-1})^{1/2} \xrightarrow[N \to \infty]{d} N(0,1). \tag{3.2.28}
\]

Proof: First note (i) implies F(x) = G(x) = H(x) and (iv) implies
$N^{1/2}\bar\alpha = O_p(1)$. Assumption (iv) also implies that for every i,
$N^{1/2}\bar\varepsilon_{i.} = O_p(1)$ and $E[(N^{1/2}\bar\varepsilon_{i.})^2]$ is uniformly bounded. Also note that
the Glivenko-Cantelli Theorem (Serfling 1980, Page 61) states that
$\sup_x |F_n(x) - F(x)| = o_p(1)$. We begin the proof of the theorem by proving
two lemmas.







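Lemma 3.2.1. Under the assumptions of Theorem 3.2.1,

\[
n^{-1}N^{1/2}\sum_j [B(\varepsilon_{1j}-\bar\varepsilon_{1.}) - B(\varepsilon_{1j})] = o_p(1).
\]

Proof: Recalling the form of B(x) from (3.2.25) we see that
$B(\varepsilon_{1j}-\bar\varepsilon_{1.}) - B(\varepsilon_{1j})$ is equal to

\[
\begin{cases}
G(\varepsilon_{1j}) - G(\varepsilon_{1j}-\bar\varepsilon_{1.}) & \text{if } \varepsilon_{1j} < 0 \text{ and } \varepsilon_{1j}-\bar\varepsilon_{1.} < 0 \\
G(\varepsilon_{1j}-\bar\varepsilon_{1.}) - G(\varepsilon_{1j}) & \text{if } \varepsilon_{1j} > 0 \text{ and } \varepsilon_{1j}-\bar\varepsilon_{1.} > 0 \\
1 - G(\varepsilon_{1j}-\bar\varepsilon_{1.}) - G(\varepsilon_{1j}) & \text{if } \varepsilon_{1j} > 0 \text{ and } \varepsilon_{1j}-\bar\varepsilon_{1.} \le 0 \\
G(\varepsilon_{1j}-\bar\varepsilon_{1.}) + G(\varepsilon_{1j}) - 1 & \text{if } \varepsilon_{1j} \le 0 \text{ and } \varepsilon_{1j}-\bar\varepsilon_{1.} > 0,
\end{cases}
\]

which can also be written as

\[
\begin{cases}
-[G(\varepsilon_{1j}-\bar\varepsilon_{1.}) - G(\varepsilon_{1j})] + 0 & \text{if } \varepsilon_{1j} < 0 \text{ and } \varepsilon_{1j}-\bar\varepsilon_{1.} < 0 \\
\phantom{-}[G(\varepsilon_{1j}-\bar\varepsilon_{1.}) - G(\varepsilon_{1j})] + 0 & \text{if } \varepsilon_{1j} > 0 \text{ and } \varepsilon_{1j}-\bar\varepsilon_{1.} > 0 \\
\phantom{-}[G(\varepsilon_{1j}-\bar\varepsilon_{1.}) - G(\varepsilon_{1j})] + [1 - 2G(\varepsilon_{1j}-\bar\varepsilon_{1.})] & \text{if } \varepsilon_{1j} > 0 \text{ and } \varepsilon_{1j}-\bar\varepsilon_{1.} \le 0 \\
-[G(\varepsilon_{1j}-\bar\varepsilon_{1.}) - G(\varepsilon_{1j})] + [2G(\varepsilon_{1j}-\bar\varepsilon_{1.}) - 1] & \text{if } \varepsilon_{1j} \le 0 \text{ and } \varepsilon_{1j}-\bar\varepsilon_{1.} > 0.
\end{cases}
\]

Defining

\[
B_N(X_j) =
\begin{cases}
0 & \text{if } \varepsilon_{1j} < 0 \text{ and } \varepsilon_{1j}-\bar\varepsilon_{1.} < 0 \\
0 & \text{if } \varepsilon_{1j} > 0 \text{ and } \varepsilon_{1j}-\bar\varepsilon_{1.} > 0 \\
1 - 2G(\varepsilon_{1j}-\bar\varepsilon_{1.}) & \text{if } \varepsilon_{1j} > 0 \text{ and } \varepsilon_{1j}-\bar\varepsilon_{1.} \le 0 \\
2G(\varepsilon_{1j}-\bar\varepsilon_{1.}) - 1 & \text{if } \varepsilon_{1j} \le 0 \text{ and } \varepsilon_{1j}-\bar\varepsilon_{1.} > 0
\end{cases}
\tag{3.2.29}
\]

and noting that

\[
G(\varepsilon_{1j}-\bar\varepsilon_{1.}) - G(\varepsilon_{1j}) = (-\bar\varepsilon_{1.})\,g(\varepsilon_{1j}) + (1/2)\,\bar\varepsilon_{1.}^2\,g'(\varepsilon_{1j}+\tau_j),
\]

where $\tau_j$ is between 0 and $-\bar\varepsilon_{1.}$, we use the form of B'(x) (3.2.27) to
obtain

\[
\begin{aligned}
n^{-1}N^{1/2}\sum_j [B(\varepsilon_{1j}-\bar\varepsilon_{1.}) - B(\varepsilon_{1j})]
&= -(N^{1/2}\bar\varepsilon_{1.})\,n^{-1}\sum_j B'(\varepsilon_{1j})
 + (1/2)\,\bar\varepsilon_{1.}\,(N^{1/2}\bar\varepsilon_{1.})\,n^{-1}\sum_j J'[G(\varepsilon_{1j})]\,g'(\varepsilon_{1j}+\tau_j) \\
&\quad + n^{-1}N^{1/2}\sum_j B_N(X_j).
\end{aligned}
\tag{3.2.30}
\]

Now, $E[B'(\varepsilon_{1j})] = \int_{-\infty}^{0} -f(x)\,dF(x) + \int_{0}^{\infty} f(x)\,dF(x) = 0$ since f(x) is
symmetric about zero. Also, $E[B'^2(\varepsilon_{1j})] = \int f^2(x)\,dF(x)$ is bounded since
f(x) is bounded. Therefore, since the $\varepsilon_{1j}$ are independent, we apply the
Markov inequality (Chow and Teicher 1978, Page 88) to obtain
$n^{-1}\sum_j B'(\varepsilon_{1j}) = o_p(1)$. The assumptions of the theorem imply $\bar\varepsilon_{1.} = o_p(1)$,
$N^{1/2}\bar\varepsilon_{1.} = O_p(1)$, and $|J'[G(\varepsilon_{1j})]\,g'(\varepsilon_{1j}+\tau_j)| \le B_2$. Thus, the sum of the
first two terms of (3.2.30) is $O_p(1)o_p(1) + o_p(1)O_p(1)O_p(1) = o_p(1)$. To
complete the proof of the lemma we must show $n^{-1}N^{1/2}\sum_j B_N(X_j) = o_p(1)$.

Using (3.2.29) we see that $n^{-1}N^{1/2}\sum_j B_N(X_j)$ is equal to

\[
\begin{aligned}
& n^{-1}N^{1/2}\sum_j \big( [1-2G(\varepsilon_{1j}-\bar\varepsilon_{1.})]\,I(\varepsilon_{1j}>0)\,I(\varepsilon_{1j}-\bar\varepsilon_{1.}\le 0)
 + [2G(\varepsilon_{1j}-\bar\varepsilon_{1.})-1]\,I(\varepsilon_{1j}\le 0)\,I(\varepsilon_{1j}-\bar\varepsilon_{1.}>0) \big) \\
&= 2n^{-1}N^{1/2}\sum_j \big( [G(0)-G(\varepsilon_{1j}-\bar\varepsilon_{1.})]\,I(0<\varepsilon_{1j}\le\bar\varepsilon_{1.})
 + [G(\varepsilon_{1j}-\bar\varepsilon_{1.})-G(0)]\,I(\bar\varepsilon_{1.}<\varepsilon_{1j}\le 0) \big) \\
&\le 2n^{-1}N^{1/2}\sum_j \big( [G(\varepsilon_{1j})-G(\varepsilon_{1j}-\bar\varepsilon_{1.})]\,I(0<\varepsilon_{1j}\le\bar\varepsilon_{1.})
 + [G(\varepsilon_{1j}-\bar\varepsilon_{1.})-G(\varepsilon_{1j})]\,I(\bar\varepsilon_{1.}<\varepsilon_{1j}\le 0) \big) \\
&= 2n^{-1}N^{1/2}\sum_j \big| (-\bar\varepsilon_{1.})\,g(\varepsilon_{1j}+\tau'_j) \big|\,[\,I(0<\varepsilon_{1j}\le\bar\varepsilon_{1.}) + I(\bar\varepsilon_{1.}<\varepsilon_{1j}\le 0)\,] \\
&\le 2B_1 N^{1/2}|\bar\varepsilon_{1.}|\; n^{-1}\sum_j [\,I(0<\varepsilon_{1j}\le\bar\varepsilon_{1.}) + I(\bar\varepsilon_{1.}<\varepsilon_{1j}\le 0)\,] \\
&= 2B_1 N^{1/2}|\bar\varepsilon_{1.}|\;|F_n(0)-F_n(\bar\varepsilon_{1.})| \\
&\le [\,2B_1 N^{1/2}|\bar\varepsilon_{1.}|\,]\,[\,|F_n(0)-F(0)| + |F(0)-F(\bar\varepsilon_{1.})| + |F(\bar\varepsilon_{1.})-F_n(\bar\varepsilon_{1.})|\,] \\
&= O_p(1)\,[\,o_p(1) + o_p(1) + o_p(1)\,] \\
&= o_p(1),
\end{aligned}
\]

where $\tau'_j$ is between 0 and $-\bar\varepsilon_{1.}$,
using the Glivenko-Cantelli Theorem, the continuity of F(x), and the
fact that $\bar\varepsilon_{1.} = o_p(1)$. This completes the proof of the lemma.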
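Lemma 3.2.2. Under the assumptions of Theorem 3.2.1,

\[
k^{-1}N^{1/2}\sum_i [\tilde B(\alpha_i + \bar\varepsilon_{i.} - \bar\alpha - \bar\varepsilon_{..}) - \tilde B(\alpha_i)] = o_p(1).
\]

Proof: We begin by recalling the forms of $\tilde B(x)$ and $\tilde B'(x)$ given in
(3.2.24) and (3.2.26). Proceeding as in the proof of Lemma 3.2.1, we
define

\[
\tilde B_N(Y_i) =
\begin{cases}
0 & \text{if } \alpha_i < 0 \text{ and } \alpha_i + \bar\varepsilon_{i.} - \bar\alpha - \bar\varepsilon_{..} < 0 \\
0 & \text{if } \alpha_i > 0 \text{ and } \alpha_i + \bar\varepsilon_{i.} - \bar\alpha - \bar\varepsilon_{..} > 0 \\
1 - 2F(\alpha_i + \bar\varepsilon_{i.} - \bar\alpha - \bar\varepsilon_{..}) & \text{if } \alpha_i > 0 \text{ and } \alpha_i + \bar\varepsilon_{i.} - \bar\alpha - \bar\varepsilon_{..} \le 0 \\
2F(\alpha_i + \bar\varepsilon_{i.} - \bar\alpha - \bar\varepsilon_{..}) - 1 & \text{if } \alpha_i \le 0 \text{ and } \alpha_i + \bar\varepsilon_{i.} - \bar\alpha - \bar\varepsilon_{..} > 0
\end{cases}
\tag{3.2.31}
\]

and obtain

\[
\begin{aligned}
k^{-1}N^{1/2}\sum_i [\tilde B(\alpha_i + \bar\varepsilon_{i.} - \bar\alpha - \bar\varepsilon_{..}) - \tilde B(\alpha_i)]
&= -N^{1/2}(\bar\alpha + \bar\varepsilon_{..})\,k^{-1}\sum_i \tilde B'(\alpha_i)
 + k^{-1}\sum_i (N^{1/2}\bar\varepsilon_{i.})\,\tilde B'(\alpha_i) \\
&\quad + N^{1/2}(2k)^{-1}\sum_i (\bar\varepsilon_{i.} - \bar\alpha - \bar\varepsilon_{..})^2\,J'[F(\alpha_i)]\,f'(\alpha_i+\tau_i) \\
&\quad + k^{-1}N^{1/2}\sum_i \tilde B_N(Y_i),
\end{aligned}
\tag{3.2.32}
\]

where $\tau_i$ is between 0 and $\bar\varepsilon_{i.} - \bar\alpha - \bar\varepsilon_{..}$. To prove the lemma we show that
each of the four terms in (3.2.32) is $o_p(1)$.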

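Since $N^{1/2}\bar\alpha = O_p(1)$ and $N^{1/2}\bar\varepsilon_{..} = o_p(1)$, it follows that
$N^{1/2}(\bar\alpha + \bar\varepsilon_{..}) = O_p(1)$ and therefore the first term in (3.2.32) is seen to
be $o_p(1)$ following the same argument used to show that
$(N^{1/2}\bar\varepsilon_{1.})\,n^{-1}\sum_j B'(\varepsilon_{1j})$ was $o_p(1)$ in the proof of Lemma 3.2.1.

In the second term of (3.2.32) let $A_N = k^{-1}\sum_i (N^{1/2}\bar\varepsilon_{i.})\,\tilde B'(\alpha_i)$. Since
the $\bar\varepsilon_{i.}$ and the $\tilde B'(\alpha_i)$ are independent with zero means, we see that
$E(A_N) = 0$. We also note that $E(A_N^2) = Nk^{-1}E[\bar\varepsilon_{1.}^2\,\tilde B'^2(\alpha_1)]$. The
assumptions of the theorem guarantee that $|\tilde B'(\alpha_1)|$ is bounded and that
$E(\bar\varepsilon_{1.}^2) = o(1)$. It then follows that $E(A_N^2) = o(1)$ and thus, using the
Markov inequality, that $A_N = o_p(1)$.

We now turn attention to the third term of (3.2.32). From
assumption (ii) and the definition of J', we see that
$|J'[F(\alpha_i)]\,f'(\alpha_i+\tau_i)| \le B_2$. We can then write

\[
\Big| N^{1/2}(2k)^{-1}\sum_i (\bar\varepsilon_{i.} - \bar\alpha - \bar\varepsilon_{..})^2\,J'[F(\alpha_i)]\,f'(\alpha_i+\tau_i) \Big|
\le B_2 N^{1/2}(2k)^{-1}\sum_i (\bar\varepsilon_{i.} - \bar\alpha - \bar\varepsilon_{..})^2.
\]

Expanding the upper bound we obtain

\[
(B_2/2)\Big[ N^{1/2}(\bar\alpha + \bar\varepsilon_{..})(\bar\alpha + \bar\varepsilon_{..}) + k^{-1}\sum_i N^{1/2}\bar\varepsilon_{i.}^2
 - 2N^{1/2}(\bar\alpha + \bar\varepsilon_{..})\,k^{-1}\sum_i \bar\varepsilon_{i.} \Big]. \tag{3.2.33}
\]

As before, $N^{1/2}(\bar\alpha + \bar\varepsilon_{..}) = O_p(1)$. Also, $k^{-1}\sum_i \bar\varepsilon_{i.}$ is $o_p(1)$ using the Markov
inequality. In the middle term of (3.2.33) let $A'_N = k^{-1}\sum_i N^{1/2}\bar\varepsilon_{i.}^2$. Since
the $\bar\varepsilon_{i.}$ are independent and identically distributed, assumption (iv)
assures us that $E(A'_N) = E(N^{1/2}\bar\varepsilon_{1.}^2) = o(1)$. Therefore, we see that
$A'_N = o_p(1)$ by again applying the Markov inequality. Combining these
results we see that (3.2.33) is equal to $(B_2/2)[O_p(1)o_p(1) + o_p(1) -
O_p(1)o_p(1)] = o_p(1)$.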

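Recalling the definition of $\tilde B_N(Y_i)$ given in (3.2.31), and writing
$Y_i = \alpha_i + \bar\varepsilon_{i.} - \bar\alpha - \bar\varepsilon_{..}$ as in (3.1.1), we express
the fourth term of (3.2.32) as

\[
\begin{aligned}
& k^{-1}N^{1/2}\sum_i \big( [1-2F(Y_i)]\,I(\alpha_i>0)\,I(Y_i\le 0) + [2F(Y_i)-1]\,I(\alpha_i\le 0)\,I(Y_i>0) \big) \\
&\le 2k^{-1}N^{1/2}\sum_i \big( [F(\alpha_i)-F(Y_i)]\,I(0<\alpha_i\le\bar\alpha+\bar\varepsilon_{..}-\bar\varepsilon_{i.})
 + [F(Y_i)-F(\alpha_i)]\,I(\bar\alpha+\bar\varepsilon_{..}-\bar\varepsilon_{i.}<\alpha_i\le 0) \big) \\
&= 2k^{-1}N^{1/2}\sum_i \big| (\bar\varepsilon_{i.}-\bar\alpha-\bar\varepsilon_{..})\,f(\alpha_i+\tau'_i) \big|\,
 [\,I(0<\alpha_i\le\bar\alpha+\bar\varepsilon_{..}-\bar\varepsilon_{i.}) + I(\bar\alpha+\bar\varepsilon_{..}-\bar\varepsilon_{i.}<\alpha_i\le 0)\,],
\end{aligned}
\]

where $\tau'_i$ is between 0 and $\bar\varepsilon_{i.}-\bar\alpha-\bar\varepsilon_{..}$. This quantity is bounded above by

\[
\begin{aligned}
& 2B_1 N^{1/2}|\bar\alpha+\bar\varepsilon_{..}|\;k^{-1}\sum_i [\,I(0<\alpha_i\le\bar\alpha+\bar\varepsilon_{..}-\bar\varepsilon_{i.}) + I(\bar\alpha+\bar\varepsilon_{..}-\bar\varepsilon_{i.}<\alpha_i\le 0)\,] \\
&\quad + 2B_1 N^{1/2}k^{-1}\sum_i |\bar\varepsilon_{i.}|\,[\,I(0<\alpha_i\le\bar\alpha+\bar\varepsilon_{..}-\bar\varepsilon_{i.}) + I(\bar\alpha+\bar\varepsilon_{..}-\bar\varepsilon_{i.}<\alpha_i\le 0)\,].
\end{aligned}
\tag{3.2.34}
\]

To complete the proof of the lemma we must show both terms in (3.2.34)
are $o_p(1)$.

Beginning with the first term, we have seen previously that
$N^{1/2}|\bar\alpha+\bar\varepsilon_{..}| = O_p(1)$. Since the $(\alpha_i, \bar\varepsilon_{i.})$ are identically distributed for
i = 1,2,...,k, we see that

\[
\begin{aligned}
& E\Big( k^{-1}\sum_i [\,I(0<\alpha_i\le\bar\alpha+\bar\varepsilon_{..}-\bar\varepsilon_{i.}) + I(\bar\alpha+\bar\varepsilon_{..}-\bar\varepsilon_{i.}<\alpha_i\le 0)\,] \Big) \\
&\quad= P(0<\alpha_1\le\bar\alpha+\bar\varepsilon_{..}-\bar\varepsilon_{1.}) + P(\bar\alpha+\bar\varepsilon_{..}-\bar\varepsilon_{1.}<\alpha_1\le 0).
\end{aligned}
\]

Using the assumptions of the theorem we have also shown that
$N^{1/2}(\bar\alpha+\bar\varepsilon_{..}-\bar\varepsilon_{1.}) = O_p(1)$. This implies that for any $\Delta > 0$, there exists
a bounded D > 0, such that $P(-D \le N^{1/2}(\bar\alpha+\bar\varepsilon_{..}-\bar\varepsilon_{1.}) \le D) \ge 1 - \Delta/2$ for large
N. Therefore,

\[
P(0<\alpha_1\le\bar\alpha+\bar\varepsilon_{..}-\bar\varepsilon_{1.}) \le P(0<\alpha_1\le D/N^{1/2}) + P(N^{1/2}(\bar\alpha+\bar\varepsilon_{..}-\bar\varepsilon_{1.}) > D)
\]

and thus, for large N,

\[
P(0<\alpha_1\le\bar\alpha+\bar\varepsilon_{..}-\bar\varepsilon_{1.}) \le G(D/N^{1/2}) - G(0) + \Delta/2.
\]

Since $D/N^{1/2} = o(1)$ and G(x) is continuous, the above quantity can be
made arbitrarily small by choosing large enough N. Thus, we have shown
that $P(0<\alpha_1\le\bar\alpha+\bar\varepsilon_{..}-\bar\varepsilon_{1.}) = o(1)$. A similar argument shows
that $P(\bar\alpha+\bar\varepsilon_{..}-\bar\varepsilon_{1.}<\alpha_1\le 0) = o(1)$. Applying the Markov inequality, we see
that the first term of (3.2.34) is $2B_1 O_p(1)o_p(1) = o_p(1)$.

For the second term, write $I_i = I(0<\alpha_i\le\bar\alpha+\bar\varepsilon_{..}-\bar\varepsilon_{i.}) + I(\bar\alpha+\bar\varepsilon_{..}-\bar\varepsilon_{i.}<\alpha_i\le 0)$ and look at

\[
E\Big[ \Big( N^{1/2}k^{-1}\sum_i |\bar\varepsilon_{i.}|\,I_i \Big)^2 \Big]
\le Nk^{-1}E(\bar\varepsilon_{1.}^2) + N(k-1)k^{-1}E\big( |\bar\varepsilon_{1.}|\,I_1\,|\bar\varepsilon_{2.}|\,I_2 \big).
\]

Let $R = |\bar\varepsilon_{1.}|\,|\bar\varepsilon_{2.}|$ and $S = I_1 I_2$. Using the Cauchy-Schwarz
inequality (Chow and Teicher 1978, Page 104), the expectation of
interest is less than or equal to

\[
\begin{aligned}
& Nk^{-1}E(\bar\varepsilon_{1.}^2) + N(k-1)k^{-1}[E(R^2)]^{1/2}[E(S^2)]^{1/2} \\
&\quad= o(1) + N(k-1)k^{-1}[E(\bar\varepsilon_{1.}^2)]\,[E(S^2)]^{1/2} \\
&\quad= o(1) + O(1)o(1) = o(1)
\end{aligned}
\]

since $E(\bar\varepsilon_{1.}^2) = O(N^{-1})$ and $E(S^2) \le P(0<\alpha_1\le\bar\alpha+\bar\varepsilon_{..}-\bar\varepsilon_{1.})
+ P(\bar\alpha+\bar\varepsilon_{..}-\bar\varepsilon_{1.}<\alpha_1\le 0)$, which is o(1) as shown above. Thus, again applying the Markov
inequality, we see that the second term in (3.2.34) is $o_p(1)$, completing
the proof of the lemma.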

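To prove Theorem 3.2.1 we recall that under $H_0$: $\theta = 1$,

\[
N^{1/2}\,[T_N(\varepsilon,\alpha) - 1/4]\,(48nk^{-1})^{1/2} \xrightarrow[N \to \infty]{d} N(0,1).
\]

Theorem 3.2.1 is established by showing

\[
N^{1/2}\,[T_N(X,Y) - T_N(\varepsilon,\alpha)] = o_p(1) \tag{3.2.35}
\]

and applying Slutsky's Theorem.

Using the representations of $T_N(\varepsilon,\alpha)$ and $T_N(X,Y)$ given in (3.2.10)
and (3.2.19) respectively, we write the LHS of (3.2.35) as

\[
N^{1/2}\Big( A^* + B_{1N}^* + B_{2N}^* + \sum_{h=1}^{4} C_{hN}^* - A - B_{1N} - B_{2N} - \sum_{h=1}^{4} C_{hN} \Big).
\]

In Appendix E it is shown that the $C_{hN}$ and $C_{hN}^*$ terms are all $o_p(N^{-1/2})$.
Since $A^* = A$, for large N we can write the LHS of (3.2.35) as

\[
N^{1/2}\big( B_{1N}^* + B_{2N}^* - B_{1N} - B_{2N} \big) + o_p(1).
\]

Using (3.2.22) and (3.2.23) this quantity is equal to

\[
(1-\lambda_N)\,N^{1/2}\Big( n^{-1}\sum_j [B(\varepsilon_{1j}-\bar\varepsilon_{1.}) - B(\varepsilon_{1j})]
 - k^{-1}\sum_i [\tilde B(\alpha_i + \bar\varepsilon_{i.} - \bar\alpha - \bar\varepsilon_{..}) - \tilde B(\alpha_i)] \Big) + o_p(1).
\]

The theorem follows by applying Lemmas 3.2.1 and 3.2.2.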

Theorem 3.2.2. Using the model and assumptions described in
Section 3.1 and the pseudo-samples given in (3.1.2), if assumptions (i),
(ii), and (iii) of Theorem 3.2.1 are satisfied and if

(iva) $\displaystyle \liminf_{x \to \infty} \frac{-\ln[1-F(x)]}{2\ln(x)} > 0$

or

(ivb) $\int |x|^{\iota} f(x)\,dx < \infty$ for some $\iota > 0$,

then

\[
N^{1/2}\,[T_N(X',Y') - 1/4]\,(48nk^{-1})^{1/2} \xrightarrow[N \to \infty]{d} N(0,1).
\]

Indication of Proof: The assumptions of Theorem 3.2.1, where sample
means were used to form the pseudo-samples, implied that $\bar\alpha = o_p(1)$
and $N^{1/2}\bar\alpha = O_p(1)$. They also implied, for every i = 1,2,...,k, that
$N^{1/2}\bar\varepsilon_{i.} = O_p(1)$, $\bar\varepsilon_{i.} = o_p(1)$, and $E[(N^{1/2}\bar\varepsilon_{i.})^2]$ is uniformly bounded.
These facts were instrumental in the proof of the theorem. In the
present theorem, where sample medians are used to form the
pseudo-samples, the assumptions produce similar results for $\hat\alpha$ and $\hat\varepsilon_i$, i =
1,2,...,k, (defined in Section 3.1).

Assumption (iii) assures us that $N^{1/2}\hat\alpha = O_p(1)$ (see Proposition
E.10 in Appendix E) and $N^{1/2}\hat\varepsilon_i = O_p(1)$ for every i (Serfling 1980, Page
77). Anderson (1981, Propositions 1 and 2) showed that either of (iva)
and (ivb) is a necessary and sufficient condition for $E[(N^{1/2}\hat\varepsilon_i)^2]$ to be
uniformly bounded for every i (under the assumption that f(x) is
symmetric about zero, Anderson's conditions on $a_+$ and $a_-$ in his
Proposition 1 are equivalent to (iva)). For distributions with finite
first moment, (ivb) is obviously satisfied. For the Cauchy distribution
it can be shown that (iva) is true and (ivb) is true for $0 < \iota < 1$.
Since the sample median is a consistent estimate for the population
median and since F(x) represents a distribution symmetric about zero, we
see that $\hat\alpha = o_p(1)$ and $\hat\varepsilon_i = o_p(1)$ for i = 1,2,...,k.

Therefore, the proof of Theorem 3.2.2 is analogous to the proof of
Theorem 3.2.1 with $\bar\alpha + \bar\varepsilon_{..}$ replaced by $\hat\alpha$ and $\bar\varepsilon_{i.}$ replaced by $\hat\varepsilon_i$ for
every i.

The proofs that appear in Appendix E regarding the negligibility of

the C* terms are given utilizing the assumptions of Theorem 3.2.1 and

using sample means to form the pseudo-samples. Corresponding proofs

using the assumptions of Theorem 3.2.2 and sample medians to form the

pseudo-samples are analogous.

Whether to obtain samples like those in (3.1.1) or (3.1.2) will

depend for the most part on what is known or believed about the actual
distributions of the $\varepsilon_{ij}$ and $\alpha_i$. For some distributions, the Cauchy

distribution for example, second moments do not exist so samples

obtained using medians as in (3.1.2) would be used since the assumptions

for Theorem 3.2.1, which pertain to pseudo-samples constructed with

sample means, are not met. In those cases where either means or medians

could be used, the size of the tails of the distributions would be an

important factor in choosing how to obtain the samples. For

distributions with heavy tails, when extreme observations are more

likely, samples involving medians may be preferred since the median is

less affected by extreme observations than the mean. For distributions








with lighter tails, like the normal distribution, the mean may be

preferred over the median since in these cases the mean is more

efficient (Serfling 1980, Page 86).

Theorems similar to Theorems 3.2.1 and 3.2.2 could possibly be

proven if the pseudo-samples involved are obtained by using estimates

that have the same type of large sample properties as the sample mean

and sample median. Adjustments in the assumptions may have to be made

in these cases to ensure that the estimates meet the requirements for

proof of a theorem analogous to Theorem 3.2.1 or Theorem 3.2.2.




3.3 Asymptotic Confidence Intervals Using the Modified
Ansari-Bradley Statistic


If we could observe the actual values of $\varepsilon_{1j}$ and $\alpha_i$ as our two
samples we could use the procedure developed by Bauer (1972) to
construct an exact confidence interval for $\theta = \delta_2/\delta_1$ or any function of
$\theta$, such as $\gamma = \theta^2/(\theta^2 + 1) = \delta_2^2/(\delta_1^2 + \delta_2^2)$. Using the values of
$W(\theta\varepsilon,\alpha)$ for which $\theta = 1$ is not rejected, Bauer derives a confidence
interval for $\theta$ where the endpoints are particular order statistics of
the subset of ratios $\alpha_i/\varepsilon_{1j}$ which are greater than zero. The choices of
the order statistics, and hence the confidence coefficient of the
interval, are derived using the tabled distribution of the Ansari-Bradley
statistic.

Without the actual $\varepsilon_{1j}$ and $\alpha_i$ as our two samples we cannot obtain
an exact confidence interval for $\theta$. However, we can construct an

asymptotic confidence interval using a procedure of Sen (1966) that uses

the results of Chernoff and Savage (1958), Hodges and Lehmann (1963),

Lehmann (1963), and Sen (1963) among others.







Using Sen's procedure for constructing an asymptotic confidence
interval for $\theta$ requires a statistic, which we shall call $S_N(X,Y)$, such
that $S_N(\theta X,Y)$ is monotonically increasing in $\theta$ and, when properly
standardized, has an asymptotic normal distribution. These conditions
being satisfied, the endpoints of an asymptotic 100(1-C)% confidence
interval for $\theta$ are

\[
\hat\theta_{LN} = \operatorname{Inf}\{\theta: Z_N > -Z_{C/2}\}
\quad\text{and}\quad
\hat\theta_{UN} = \operatorname{Sup}\{\theta: Z_N < Z_{C/2}\}, \tag{3.3.1}
\]

where $Z_N$ is the standardized version of $S_N(\theta X,Y)$ and $Z_{C/2}$ is the
100(1-C/2)th percentile of the standard normal distribution.

We can apply Sen's procedure using the modified Ansari-Bradley
statistic W(X,Y) where X and Y are samples of observations as in (3.1.1)
or (3.1.2). Recalling the description of W in Section 3.1 it is easily
seen that $W(\theta X,Y)$ is monotonically increasing in $\theta$. Defining

\[
Z_N(\theta X,Y) = [W(\theta X,Y) - nN/4]\,(nkN/48)^{-1/2} \tag{3.3.2}
\]

it follows from Theorem 3.2.1 or Theorem 3.2.2 that $Z_N(\theta X,Y)$ has an
asymptotic standard normal distribution. Thus, the requirements for the
use of Sen's procedure are met and the endpoints of the desired interval
are as described in (3.3.1).

In order to derive computational formulas for $\hat\theta_{LN}$ and $\hat\theta_{UN}$ in our
case, we will use a representation of W(X,Y) introduced by Bhattacharyya
(1977). Bhattacharyya's representation uses ratios of observations from
the two samples in much the same way as in Bauer's (1972) exact
confidence interval procedure.







To begin, the $X_j$ and $Y_i$ are adjusted by subtracting the combined
sample median. This centers the combined sample around zero but does
not change the value of W(X,Y) since all observations are shifted
equally. If we let m be the combined sample median then we are really
dealing with the samples $X - m\mathbf{1}$ and $Y - m\mathbf{1}$. For ease of exposition we
suppress this fact by continuing to refer to the samples as the
vectors X and Y.

We now define some notation as in Bhattacharyya (1977):

    minW(X,Y) = minimum possible value of W(X,Y), attained
                when all the $X_j$ have the smallest ranks,

    relevant pair: a pair $(X_j, Y_i)$ where $X_j$ and $Y_i$ have
                the same sign,

    p(X,Y) = number of relevant pairs in the two observed
                samples,                                        (3.3.3)

    p'(X,Y) = number of relevant pairs where $X_j/Y_i > 1$,

    $p_{max}$(X,Y) = maximum possible number of relevant pairs.

Bhattacharyya proved that W(X,Y) can be written in terms of these
quantities through the expression

\[
p'(X,Y) + (1/2)[p_{max}(X,Y) - p(X,Y)] + \min W(X,Y) = W(X,Y). \tag{3.3.4}
\]

For ease of exposition we will refer to the quantities defined in
(3.3.3) as simply minW, p, p', and $p_{max}$. We also note that if
N = n + k is odd, one of the observations in the combined sample will be
zero after subtracting m. In this case (3.3.4) is obtained by
eliminating this observation from consideration in forming the relevant
pairs.







The quantities minW and $p_{max}$, defined in (3.3.3), are constants
that depend on n and k. If n and k are both even, minW = n(n+2)/4 and
$p_{max}$/2 = nk/4. If n and k are both odd, minW = (n+1)²/4 and $p_{max}$/2 =
(nk-1)/4. If n is odd and k is even, minW = (n-1)(n+1)/4 and $p_{max}$/2 =
k(n-1)/4 if m is an X while minW = (n+1)²/4 and $p_{max}$/2 = n(k-1)/4 if
m is a Y. If n is even and k is odd, minW = n²/4 and $p_{max}$/2 = k(n-1)/4
if m is an X while minW = n(n+2)/4 and $p_{max}$/2 = n(k-1)/4 if m is a Y.
In all of these cases, for large N (which is what we are interested in)
minW behaves like n²/4 and $p_{max}$/2 behaves like nk/4. Thus, for large N,
minW + $p_{max}$/2 = nN/4.

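We can now derive the computational formulas for the endpoints
(3.3.1) of the interval obtained using Sen's (1966) procedure. First we
look at $\hat\theta_{LN}$:

\[
\begin{aligned}
\hat\theta_{LN} &= \operatorname{Inf}\{\theta: Z_N > -Z_{C/2}\} \\
&= \operatorname{Inf}\{\theta: Z_N(\theta X,Y) > -Z_{C/2}\} \\
&= \operatorname{Inf}\{\theta: [W(\theta X,Y) - nN/4]\,(nkN/48)^{-1/2} > -Z_{C/2}\} \\
&= \operatorname{Inf}\{\theta: W(\theta X,Y) > nN/4 - (Z_{C/2})(nkN/48)^{1/2}\},
\end{aligned}
\]

which, using (3.3.4), can be written as

\[
\operatorname{Inf}\{\theta: p' + p_{max}/2 - p/2 + \min W > nN/4 - (Z_{C/2})(nkN/48)^{1/2}\}
\]

and for large N is equivalent to

\[
\operatorname{Inf}\{\theta: p' > p/2 - (Z_{C/2})(nkN/48)^{1/2}\}.
\]

Recalling the definitions of p' and p from (3.3.3) and their dependence
on the vectors of observations $\theta X$ and Y we obtain

\[
\begin{aligned}
\hat\theta_{LN} &= \operatorname{Inf}\{\theta: \#(\theta X_j/Y_i > 1) > p/2 - (Z_{C/2})(nkN/48)^{1/2}\} \\
&= \operatorname{Inf}\{\theta: \#(Y_i/X_j < \theta) > p/2 - (Z_{C/2})(nkN/48)^{1/2}\} \\
&= \operatorname{Inf}\{\theta: \text{more than } p/2 - (Z_{C/2})(nkN/48)^{1/2} \text{ of the positive } (Y_i/X_j) \text{ are less than } \theta\}.
\end{aligned}
\]

If we order the positive $(Y_i/X_j)$ we can apply the above expression to
see that

\[
\hat\theta_{LN} = \text{the } \{p/2 - (Z_{C/2})(nkN/48)^{1/2}\} + 1 \text{ order statistic of the positive } (Y_i/X_j), \tag{3.3.5}
\]

where {x} = the greatest integer less than or equal to x.

Starting with the definition of $\hat\theta_{UN}$ in (3.3.1) and following
similar steps we obtain

\[
\hat\theta_{UN} = \text{the } [p/2 + (Z_{C/2})(nkN/48)^{1/2}] + 1 \text{ order statistic of the positive } (Y_i/X_j), \tag{3.3.6}
\]

where [x] = the greatest integer less than x.

It is a simple matter to convert these endpoints of a confidence
interval for $\theta$ into endpoints of a confidence interval for
$\gamma = \theta^2/(\theta^2+1)$. The endpoints of an asymptotic 100(1-C)% confidence
interval for $\gamma$ are

\[
\hat\gamma_{LN} = \hat\theta_{LN}^2/(\hat\theta_{LN}^2 + 1)
\quad\text{and}\quad
\hat\gamma_{UN} = \hat\theta_{UN}^2/(\hat\theta_{UN}^2 + 1). \tag{3.3.7}
\]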
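
The computational formulas (3.3.5) through (3.3.7) can be sketched as follows. This is only an illustration under stated assumptions: the function name `ab_confidence_interval` and its `level` argument are hypothetical, the order-statistic positions are clamped to the range 1..p (a choice made here, not taken from the text), and the original n, k, and N are used in the standardizing constant even when a zero observation is dropped.

```python
import numpy as np
from math import ceil, floor, sqrt
from statistics import NormalDist

def ab_confidence_interval(x, y, level=0.90):
    """Asymptotic interval for theta and gamma from pseudo-samples x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = x.size, y.size
    N = n + k
    m = np.median(np.concatenate([x, y]))    # combined sample median
    x, y = x - m, y - m
    x, y = x[x != 0], y[y != 0]              # N odd: drop the zero observation
    # Ratios Y_i / X_j over relevant pairs (same sign), hence all positive.
    ratios = np.sort([yi / xj for xj in x for yi in y if xj * yi > 0])
    p = ratios.size
    if p == 0:
        raise ValueError("no relevant pairs")
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    d = z * sqrt(n * k * N / 48)
    lower_rank = floor(p / 2 - d) + 1        # (3.3.5)
    upper_rank = ceil(p / 2 + d)             # (3.3.6): (greatest integer < v) + 1
    lower_rank = min(max(lower_rank, 1), p)  # assumption: clamp to 1..p
    upper_rank = min(max(upper_rank, 1), p)
    theta = (ratios[lower_rank - 1], ratios[upper_rank - 1])
    gamma = tuple(t**2 / (t**2 + 1) for t in theta)   # (3.3.7)
    return theta, gamma
```

In practice `x` and `y` would be the pseudo-samples of (3.1.1) or (3.1.2); for small n and k the clamping step will often be active, which is a reminder that the interval is justified only asymptotically.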















CHAPTER FOUR
MONTE CARLO STUDY


A Monte Carlo study was undertaken to compare the various methods
of constructing confidence intervals for the intraclass correlation
coefficient, $\rho = \sigma_a^2/(\sigma_a^2+\sigma_e^2)$, discussed in this dissertation. Throughout
this chapter we will refer to $\rho$ as the parameter of interest even though
in some cases we are interested in scale parameters and the parameter of
interest is $\gamma = \delta_2^2/(\delta_1^2+\delta_2^2)$. We refer only to $\rho$ for ease of presentation
and because $\rho$ and $\gamma$ are numerically equivalent in those cases where both
exist.

Using IMSL (International Mathematical and Statistical Libraries)

subroutines, random numbers were generated from five distributions which

are symmetric about zero. The five distributions used were normal,

uniform, logistic, Laplace (double exponential), and Cauchy. In each

case the resulting random numbers were used to form responses in the
balanced one-way random effects model (without loss of generality we
assume $\mu = 0$)

\[
Z_{ij} = \alpha_i + \varepsilon_{ij}, \quad i = 1,2,\ldots,k, \quad j = 1,2,\ldots,n.
\]

The nk responses in each model were formed by generating nk + k random
numbers from one of the distributions, multiplying nk of these numbers
by a constant to obtain the simulated values of the $\varepsilon_{ij}$, multiplying the
remaining k numbers by a constant to obtain the simulated values of the
$\alpha_i$, and adding the $\varepsilon_{ij}$ and $\alpha_i$ to obtain the simulated responses.








Various multipliers were used to obtain effects with differing values of
$\rho$.
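
The response-generation step described above can be sketched as follows. This is not the IMSL-based program used in the study: the function name `simulate_responses`, the `draw` argument, and the particular choice of multipliers (1 for the errors and $\sqrt{\rho/(1-\rho)}$ for the effects, which gives $\delta_2^2/(\delta_1^2+\delta_2^2) = \rho$) are assumptions made for illustration.

```python
import numpy as np

def simulate_responses(k, n, rho, rng, draw=None):
    """Simulate a k-by-n array Z[i, j] = alpha_i + eps_ij with mu = 0.

    `draw(size)` returns `size` numbers from a distribution symmetric about
    zero; it defaults to the standard normal generator of `rng`.
    """
    if draw is None:
        draw = rng.standard_normal
    delta_1 = 1.0                           # multiplier for the errors (assumed)
    delta_2 = np.sqrt(rho / (1.0 - rho))    # multiplier for the effects (assumed)
    base = np.asarray(draw(n * k + k), dtype=float)   # nk + k random numbers
    eps = delta_1 * base[: n * k].reshape(k, n)
    alpha = delta_2 * base[n * k:]
    return alpha[:, None] + eps             # add alpha_i to every eps_ij in row i
```

For instance, `simulate_responses(6, 12, 0.26471, np.random.default_rng(1))` would produce one set of responses for the (6,12) model; with this choice the multiplier for the effects is close to 0.6, since 0.26471 is approximately 0.36/1.36.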

For each of the five distributions, four different size models were

generated. The size of the model is determined by k (the number of

treatments) and n (the number of observations per treatment). The

combinations of k and n used (with k listed first) were (6,12), (12,15),

(18,12), and (12,6). For models of size (6,12) and (12,15), numbers

were generated for each distribution and multipliers chosen to obtain

values for $\rho$ of .10000, .26471, .40000, .50000, .60000, .73529, and
.90000. For models of size (18,12) and (12,6), only $\rho$ values of .26471,
.50000, and .73529 were used. Since the results for these values of $\rho$
were consistent with the results for the same values of $\rho$ for models of
size (6,12) and (12,15), the remaining values of $\rho$ were not used for
models (18,12) and (12,6).

For every combination of distribution, model size, and $\rho$ value, 200
sets of responses were generated and confidence intervals for $\rho$ were

constructed using each of the methods described in this dissertation.

We will use the following conventions when referring to the individual

procedures. The Normal procedure refers to that based on normal theory

and the F-distribution as discussed in Scheffé (1959, Pages 221-230).

The Arvesen procedure is the procedure based on jackknifed U-statistics

presented by Arvesen and Schmitz (1970) which leads to intervals of the

form given by (2.2.11). In Section 2.2 we presented two procedures for

computing intervals based on U-statistics. The first, which involved a

function of U-statistics with an asymptotic normal distribution,

produced intervals of the form given by (2.2.10) and will hereafter be

referred to as the U-statistic procedure. The second, which we call the








Chi-Square procedure, produces intervals of the form given by (2.2.14)
and is based on a function of U-statistics which has an asymptotic $\chi^2$
distribution. The procedure presented in Chapter Three based on the

Ansari-Bradley statistic using pseudo-samples involving means (as given

in (3.1.1)) is called the ABMeans procedure. The corresponding

procedure based on pseudo-samples constructed using medians (as given in

(3.1.2)) is called the ABMedians procedure.

Intervals based on the Chi-Square procedure were not constructed

for the Cauchy distribution. These intervals could not be obtained

because of overflow errors encountered during the calculations. Since

the Chi-Square procedure is clearly inferior to the others (see

discussion below), the omission of these results is inconsequential.

Recall from Chapter Three that confidence intervals constructed
using the ABMeans and ABMedians procedures are formed using only the $\varepsilon_{ij}$
from one treatment. Thus, for each of these procedures there are k

possible intervals that could be constructed from the responses in one

model. Individually, these k possible intervals, unlike the intervals

constructed using the other procedures, do not make full use of all the

information contained in the responses. In an attempt to obtain a

single interval that does make use of all the information, confidence

intervals were also formed in each case using procedures we will call

ABMeansC and ABMediansC. These intervals were calculated by averaging

the endpoints of the k different intervals that could be formed using

the ABMeans and ABMedians procedures respectively. This method of

combining the k possible intervals is based on the premise that if only

one interval was constructed, using either the ABMeans or ABMedians

procedure, it would most likely be constructed using the $\varepsilon_{ij}$ from a








randomly selected treatment. Thus, any of the k possible treatments

would be equally likely to be selected and any of the k possible

intervals would be equally valid. Averaging the endpoints of all

possible intervals can be thought of as assigning an equal weight to

each of the equally likely endpoints.

For each set of 200 intervals and for each procedure, the empirical
confidence coefficient (the number of intervals containing the value of
$\rho$ divided by 200), the average length of the 200 intervals, and the
standard deviation of the 200 lengths were calculated. These

calculations were performed differently for the ABMeans and ABMedians

procedures since these procedures produce k possible intervals for each

set of responses. Therefore, the empirical confidence coefficient and

average lengths reported for these procedures were calculated using 200k

intervals rather than 200. Also, for each i = 1,2,...,k, we calculated

the standard deviation of the lengths of the 200 intervals constructed

if the $\varepsilon_{ij}$, j = 1,2,...,n, were used. The standard deviation reported

is the average of these k standard deviations.
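
The summary statistics described above, and the endpoint averaging used for ABMeansC and ABMediansC, can be sketched as follows. The function names are hypothetical, and the use of the sample standard deviation (divisor 199 rather than 200) is an assumption, since the text does not say which divisor was used.

```python
import numpy as np

def summarize_intervals(intervals, rho):
    """Empirical confidence coefficient, average length, and SD of lengths.

    `intervals` is an array with one row (lower, upper) per simulated interval.
    """
    intervals = np.asarray(intervals, dtype=float)
    lower, upper = intervals[:, 0], intervals[:, 1]
    coverage = np.mean((lower <= rho) & (rho <= upper))
    lengths = upper - lower
    return coverage, lengths.mean(), lengths.std(ddof=1)  # ddof=1 is assumed

def combine_endpoints(intervals_by_treatment):
    """Average the endpoints of the k intervals from one set of responses."""
    return np.asarray(intervals_by_treatment, dtype=float).mean(axis=0)
```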

A summary of the Monte Carlo study is presented in the following

tables. These tables are numbered in such a way that tables including

results for a particular distribution or model size can be easily

identified. The first position in the number of the table refers to the

distribution used to generate the responses according to the following

scheme: 1-normal, 2-uniform, 3-logistic, 4-Laplace, and 5-Cauchy. Thus,

the higher numbered tables are for distributions with heavier tails

(the exception is that the normal distribution has heavier tails than the

uniform distribution).








Table 1A1
Behavior of nominal 90% confidence intervals for models with
k = 6 treatments, n = 12 observations per treatment, and F normal.

                                         p
             .10000   .26471   .40000   .50000   .60000   .73529   .90000

             .91000   .87000   .90000   .88000   .87500   .90000   .94000
Arvesen      .5257    .5674    .5995    .6211    .5855    .5551    .3220
             .312     .226     .209     .194     .195     .217     .187

             .93000   .90000   .87500   .88500   .88000   .92500   .91000
Normal       .3845    .5060    .5224    .5149    .4868    .4215    .2178
             .122     .088     .072     .068     .080     .106     .101

             .82167   .85500   .85750   .83667   .84833   .85833   .89167
ABMeans      .4244    .5594    .5987    .6001    .5926    .5669    .3820
             .217     .174     .168     .158     .166     .193     .237

             .49417   .87167   .93000   .88083   .85250   .79917   .80667
ABMedians    .4422    .4992    .5202    .5228    .5228    .5095    .4177
             .134     .122     .118     .121     .125     .137     .190

             .87000   .92000   .90500   .92000   .88500   .91500   .95000
ABMeansC     .4244    .5594    .5987    .6001    .5926    .5669    .3820
             .153     .107     .114     .112     .133     .165     .216

             .44000   .91500   .98500   .94500   .93000   .87500   .87000
ABMediansC   .4422    .4992    .5202    .5228    .5228    .5095    .4177
             .083     .073     .070     .075     .080     .104     .165

             .72000   .68500   .63500   .62500   .63500   .65000   .69500
U-statistic  .2000    .3128    .3376    .3547    .3358    .3021    .1533
             .110     .111     .114     .110     .113     .112     .089

             .87500   .81000   .76500   .78000   .70000   .73000   .79000
Chi-Square   .2625    .4070    .5219    .5422    .3983    .3597    .1820
             .150     .146     .170     .170     .134     .134     (illegible)

Note: For each procedure the first row is the empirical confidence
coefficient, the second is the average length of the intervals, and the
third is the standard deviation of the lengths. Example: For 200 90%
confidence intervals constructed using Arvesen's procedure when
p = .10000, the proportion of intervals that contained .10000 was
.91000, the average length of the intervals was .5257, and the standard
deviation of the lengths was .312.








Table 1A2
Behavior of nominal 95% confidence intervals for models with
k = 6 treatments, n = 12 observations per treatment, and F normal.

                                         p
             .10000   .26471   .40000   .50000   .60000   .73529   .90000

             .93000   .91000   .92500   .94000   .94000   .95500   .96000
Arvesen      .5871    .6550    .6939    .7207    .6902    .6650    .4275
             .303     .226     .206     .187     .197     .228     .226

             .96500   .96500   .93500   .93000   .92500   .95500   .97000
Normal       .4728    .5980    .6100    .5971    .5636    .4883    .2572
             .130     .091     .075     .076     .091     .119     .116

             .90667   .92833   .94000   .92917   .92333   .94167   .95500
ABMeans      .5487    .6877    .7176    .7218    .7096    .6840    .4978
             .238     .176     .163     .153     .163     .196     .258

             .68250   .94500   .97250   .94833   .91667   .89833   .91083
ABMedians    .5441    .6047    .6247    .6302    .6305    .6165    .5234
             .143     .128     .119     .121     .125     .143     .209

             .94500   .96500   .97500   .96500   .97500   .98500   .97500
ABMeansC     .5487    .6877    .7176    .7218    .7096    .6840    .4978
             .161     .104     .107     .111     .131     .173     .239

             .68500   .98000   .99500   .97000   .98000   .97000   .95000
ABMediansC   .5441    .6047    .6247    .6302    .6305    .6165    .5234
             .089     .076     .074     .077     .080     .113     .184

             .81500   .73500   .68000   .71000   .74500   .74000   .78500
U-statistic  .2307    .3655    .3974    .4190    .5635    .6128    .5737
             .126     .138     .134     .129     .202     .239     .336

             .89500   .87500   .81000   .83500   .79500   .80000   .82500
Chi-Square   .2860    .4478    .5219    .5989    .6271    .6784    .6782
             .148     .155     .170     .165     .197     .223     .320

Note: Format of this table is identical to Table 1A1.








Table 1B1
Behavior of nominal 90% confidence intervals for models with
k = 12 treatments, n = 15 observations per treatment, and F normal.

                                         p
             .10000   .26471   .40000   .50000   .60000   .73529   .90000

             .91500   .93000   .96500   .91000   .89000   .92500   .87500
Arvesen      .2770    .3601    .4142    .3874    .3853    .3110    .1713
             .189     .109     .113     .106     .116     .094     .083

             .91500   .91500   .92500   .90000   .87000   .92500   .86500
Normal       .2190    .3273    .3599    .3564    .3364    .2717    .1394
             .061     .040     .023     .026     .035     .050     .053

             .84333   .90667   .91625   .89583   .89750   .92125   .90208
ABMeans      .3512    .4783    .5244    .5155    .5088    .4498    .2843
             .146     .124     .106     .106     .114     .135     .149

             .64458   .85792   .92250   .90417   .90125   .90667   .83375
ABMedians    .3773    .4479    .4741    .4705    .4683    .4355    .3222
             .117     .099     .085     .086     .088     .104     .136

             .87000   .95000   .98500   .96500   .97000   .98500   .96000
ABMeansC     .3512    .4783    .5244    .5155    .5088    .4498    .2843
             .081     .064     .058     .073     .086     .106     .125

             .61000   .91000   .98500   .96500   .98000   .97500   .89000
ABMediansC   .3773    .4479    .4741    .4705    .4683    .4355    .3222
             .065     .054     .044     .051     .059     .079     .114

             .77000   .78500   .82500   .79000   .74000   .82000   .77500
U-statistic  .1592    .2593    .3070    .2912    .2873    .2305    .1202
             .064     .062     .071     .066     .075     .062     .055

             .88500   .89500   .91000   .89500   .83000   .91000   .82500
Chi-Square   .1956    .3471    .4475    .4434    .4723    .4465    .3454
             .077     .088     .120     .118     .147     .180     .247

Note: Format of this table is identical to Table 1A1.








Table 1B2
Behavior of nominal 95% confidence intervals for models with
k = 12 treatments, n = 15 observations per treatment, and F normal.

                                         p
             .10000   .26471   .40000   .50000   .60000   .73529   .90000

             .94500   .95500   .98000   .96000   .95000   .97500   .92500
Arvesen      .3268    .4317    .4923    .4623    .4603    .3779    .2151
             .193     .123     .126     .117     .130     .111     .104

             .97500   .95000   .95000   .94000   .96000   .95500   .91000
Normal       .2679    .3917    .4255    .4196    .3954    .3194    .1653
             .073     .044     .025     .031     .042     .058     .062

             .90708   .94667   .96125   .94708   .95083   .96583   .95000
ABMeans      .4267    .5610    .6094    .5999    .5928    .5322    .3507
             .163     .129     .109     .111     .119     .146     .170

             .75375   .93167   .95875   .94621   .95250   .95750   .90458
ABMedians    .4473    .5210    .5515    .5451    .5439    .5095    .3892
             .128     .106     .088     .090     .094     .111     .151

             .93500   .97000   1.00000  .98000   1.00000  .99500   .98000
ABMeansC     .4267    .5610    .6094    .5999    .5928    .5322    .3507
             .089     .066     .061     .077     .090     .117     .144

             .78500   .96000   1.00000  .98000   .99000   .99000   .95000
ABMediansC   .4473    .5210    .5515    .5451    .5439    .5095    .3892
             .071     .057     .046     .055     .065     .086     .128

             .83500   .86000   .88500   .84500   .80500   .88000   .85500
U-statistic  .1836    .3076    .3656    .3467    .3423    .2745    .1431
             .074     .074     .084     .078     .089     .074     .066

             .91000   .94500   .92500   .91500   .86500   .93000   .85500
Chi-Square   .2140    .3867    .5061    .5185    .5578    .5552    .4926
             .083     .095     .125     .132     .161     .202     .306

Note: Format of this table is identical to Table 1A1.








Table 1C
Behavior of nominal 90% and 95% confidence intervals for models with
k = 18 treatments, n = 12 observations per treatment, and F normal.

                       90%                        95%
                        p                          p
             .26471   .50000   .73529   .26471   .50000   .73529

             .92000   .90500   .92500   .96000   .95000   .94500
Arvesen      .2936    .3334    .2474    .3512    .3972    .2987
             .095     .089     .058     .110     .102     .068

             .89000   .90000   .88500   .94500   .95000   .95500
Normal       .2695    .2968    .2336    .3207    .3488    .2619
             .031     .014     .036     .035     .016     .043

             .88361   .88471   .90555   .93583   .92805   .94805
ABMeans      .4675    .5085    .4355    .5353    .5781    .5027
             .128     .106     .130     .135     .109     .140

             .83444   .89917   .92389   .89472   .94250   .95778
ABMedians    .4598    .4787    .4156    .5247    .5450    .4792
             .101     .087     .104     .105     .090     .111

             .98000   .97000   .99500   .99000   .98500   1.00000
ABMeansC     .4675    .5085    .4355    .5353    .5781    .5027
             .057     .062     .097     .059     .065     .106

             .92000   .98000   1.00000  .95500   .99000   1.00000
ABMediansC   .4598    .4787    .4156    .5247    .5450    .4792
             .044     .050     .076     .046     .053     .082

Note: For each procedure the first row is the empirical confidence
coefficient, the second is the average length of the intervals, and the
third is the standard deviation of the lengths. Also for each
procedure, the first three columns apply to nominal 90% confidence
intervals and the second three columns apply to nominal 95% confidence
intervals. Example: For 200 90% confidence intervals constructed using
Arvesen's procedure when p = .26471, the proportion of intervals that
contained .26471 was .92000, the average length of the intervals was
.2936, and the standard deviation of the lengths was .095. For 200 95%
confidence intervals constructed under the same conditions, the
corresponding values were .96000, .3512, and .110.









Table 1D
Behavior of nominal 90% and 95% confidence intervals for models with
k = 12 treatments, n = 6 observations per treatment, and F normal.

                       90%                        95%
                        p                          p
             .26471   .50000   .73529   .26471   .50000   .73529

             .86500   .89000   .93500   .92000   .94000   .97000
Arvesen      .4310    .4522    .3692    .5054    .5388    .4480
             .164     .121     .125     .169     .135     .145

             .85500   .88000   .90500   .92000   .93500   .96000
Normal       .4043    .4139    .3185    .4786    .4890    .3787
             .065     .041     .068     .076     .047     .079

             .82208   .86708   .88125   .90792   .93333   .94792
ABMeans      .5725    .6026    .5470    .6898    .7212    .6720
             .167     .155     .185     .167     .149     .186

             .78000   .88500   .91708   .88958   .94917   .96417
ABMedians    .5894    .5922    .5387    .7013    .7045    .6555
             .148     .133     .152     .144     .126     .152

             .92000   .97500   1.00000  .97500   1.00000  1.00000
ABMeansC     .5725    .6026    .5470    .6898    .7212    .6720
             .077     .091     .127     .080     .089     .130

             .87000   .98500   .99500   .96000   1.00000  1.00000
ABMediansC   .5894    .5922    .5387    .7013    .7045    .6555
             .061     .073     .099     .062     .070     .102

Note: Format of this table is identical to Table 1C.








Table 2A
Behavior of nominal 90% and 95% confidence intervals for models with
k = 6 treatments, n = 12 observations per treatment, and F uniform.

                       90%                        95%
                        p                          p
             .26471   .50000   .73529   .26471   .50000   .73529

             .89500   .92500   .92500   .95000   .97500   .96500
Arvesen      .5292    .5576    .4680    .6211    .6579    .5719
             .206     .189     .182     .210     .190     .200

             .96000   .97000   .97000   .99000   .98000   .98500
Normal       .5195    .5312    .4197    .6124    .6133    .4862
             .076     .047     .084     .078     .052     .094

             .86583   .88083   .87000   .94833   .95167   .94417
ABMeans      .5018    .5646    .5194    .6301    .6836    .6411
             .173     .145     .178     .185     .147     .181

             .87583   .90333   .83250   .95083   .96750   .92417
ABMedians    .4691    .5075    .4937    .5756    .6148    .6028
             .122     .115     .127     .125     .117     .130

             .94000   .95000   .95000   .98000   .98500   .98000
ABMeansC     .5018    .5646    .5194    .6301    .6836    .6411
             .112     .103     .155     .115     .106     .159

             .93500   .97500   .89000   .97500   .99500   .97500
ABMediansC   .4691    .5075    .4937    .5756    .6148    .6028
             .071     .065     .092     .075     .064     .097

Note: Format of this table is identical to Table 1C.








Table 2B
Behavior of nominal 90% and 95% confidence intervals for models with
k = 12 treatments, n = 15 observations per treatment, and F uniform.

                       90%                        95%
                        p                          p
             .26471   .50000   .73529   .26471   .50000   .73529

             .92000   .94500   .91500   .95000   .96000   .95500
Arvesen      .3028    .3311    .2369    .3657    .3981    .2885
             .089     .085     .081     .104     .098     .097

             .95500   .95500   .98000   .97500   .98500   1.00000
Normal       .3311    .3633    .2691    .3960    .4274    .3164
             .039     .015     .039     .044     .018     .046

             .89792   .90542   .89833   .95208   .94914   .94583
ABMeans      .4032    .4610    .3715    .4822    .5431    .4458
             .121     .102     .126     .113     .107     .140

             .85708   .92708   .90167   .92292   .96833   .95208
ABMedians    .4194    .4472    .3988    .4924    .5223    .4677
             .096     .085     .113     .102     .087     .119

             .98500   .98000   .98000   .99500   .99000   .99500
ABMeansC     .4032    .4610    .3715    .4822    .5431    .4458
             .063     .071     .111     .068     .074     .124

             .94000   .98500   .96500   .97500   1.00000  .98500
ABMediansC   .4194    .4472    .3988    .4924    .5223    .4677
             .047     .045     .087     .051     .048     .095

Note: Format of this table is identical to Table 1C.








Table 2C
Behavior of nominal 90% and 95% confidence intervals for models with
k = 18 treatments, n = 12 observations per treatment, and F uniform.

                       90%                        95%
                        p                          p
             .26471   .50000   .73529   .26471   .50000   .73529

             .93000   .91500   .90000   .97000   .95500   .94500
Arvesen      .2487    .2543    .1835    .2992    .3054    .2219
             .058     .059     .052     .069     .069     .062

             .96500   .96500   .97000   .98500   .98500   .98500
Normal       .2746    .2994    .2229    .3266    .3516    .2610
             .024     .009     .028     .027     .010     .033

             .87922   .88611   .87361   .93139   .92917   .92444
ABMeans      .4012    .4427    .3527    .4644    .5095    .4133
             .122     .093     .112     .134     .099     .123

             .83556   .91639   .90833   .90194   .95417   .94694
ABMedians    .4311    .4461    .3689    .4938    .5101    .4262
             .097     .083     .102     .101     .086     .107

             .96500   .98000   .98500   .99000   .99500   .99500
ABMeansC     .4012    .4427    .3527    .4644    .5095    .4133
             .048     .058     .094     .052     .063     .105

             .90000   .99000   .98500   .97000   1.00000  .99000
ABMediansC   .4311    .4461    .3689    .4938    .5101    .4262
             .038     .049     .075     .041     .053     .082

Note: Format of this table is identical to Table 1C.








Table 2D
Behavior of nominal 90% and 95% confidence intervals for models with
k = 12 treatments, n = 6 observations per treatment, and F uniform.
90% 95%
p p

.26471 .50000 .73529 .26471 .50000 .73529

.91500 .92500 .93500 .93000 .95000 .96500
Arvesen .4055 .3816 .2925 .4809 .4580 .3555
.127 .107 .117 .139 .122 .136



.93000 .94500 .98000 .97500 .97500 .99500
Normal .4170 .4201 .3126 .4932 .4201 .3720
.051 .031 .054 .060 .031 .063



.86458 .84708 .84708 .93583 .92208 .92583
ABMeans .5470 .5505 .4681 .6641 .6686 .5916
.167 .153 .179 .171 .153 .198



.80167 .87000 .89875 .90833 .93917 .95167
ABMedians .5858 .5606 .4829 .6975 .6750 .5989
.149 .132 .159 .143 .128 .166



.95000 .94500 .97000 .97000 .99000 .99500
ABMeansC .5470 .5505 .4681 .6641 .6686 .5916
.076 .101 .140 .079 .102 .160



.88500 .96500 .99500 .95500 .99500 1.00000
ABMediansC .5858 .5606 .4829 .6975 .6750 .5989
.066 .075 .115 .065 .076 .120
Note: Format of this table is identical to Table 1C.








Table 3A
Behavior of nominal 90% and 95% confidence intervals for models with
k = 6 treatments, n = 12 observations per treatment, and F logistic.


90% 95%
p p

.26471 .50000 .73529 .26471 .50000 .73529

.93000 .88000 .91500 .94500 .94000 .95000
Arvesen .5940 .6133 .5258 .6805 .7110 .6396
.231 .214 .188 .229 .212 .202



.88500 .84000 .84500 .97000 .89500 .91500
Normal .5039 .5056 .4203 .5959 .5876 .4871
.088 .080 .113 .090 .088 .128



.91000 .84833 .88833 .96333 .92833 .93917
ABMeans .6038 .6192 .5601 .7273 .7375 .6762
.167 .167 .188 .160 .157 .194



.86167 .90917 .82750 .94083 .96167 .91417
ABMedians .5047 .5248 .5139 .6123 .6348 .6225
.120 .116 .140 .124 .121 .146



.95000 .92500 .94000 .98000 .98500 .98000
ABMeansC .6038 .6192 .5601 .7273 .7375 .6762
.102 .118 .157 .094 .111 .165



.91000 .95500 .91000 .99000 .99500 .96000
ABMediansC .5047 .5248 .5139 .6123 .6348 .6225
.069 .069 .103 .072 .072 .108
Note: Format of this table is identical to Table 1C.








Table 3B
Behavior of nominal 90% and 95% confidence intervals for models with
k = 12 treatments, n = 15 observations per treatment, and F logistic.

                       90%                        95%
                        p                          p
             .26471   .50000   .73529   .26471   .50000   .73529

             .92500   .86500   .89000   .96000   .93500   .92500
Arvesen      .4090    .4386    .3553    .4843    .5177    .4296
             .148     .142     .124     .160     .150     .142

             .86500   .85500   .86500   .91500   .90000   .94500
Normal       .3265    .3528    .2765    .3904    .4160    .3251
             .048     .027     .059     .053     .032     .069

             .88333   .90625   .90917   .93875   .95250   .95583
ABMeans      .5064    .5402    .4661    .5915    .6247    .5495
             .129     .111     .142     .132     .111     .151

             .85042   .91333   .89917   .91750   .95625   .95208
ABMedians    .4645    .4878    .4454    .5407    .5653    .5209
             .099     .086     .111     .103     .086     .117

             .96500   .98000   .97500   .97500   .99500   .98500
ABMeansC     .5064    .5402    .4661    .5915    .6247    .5495
             .072     .067     .110     .074     .069     .120

             .92500   .97000   .97000   .97500   .99500   .98500
ABMediansC   .4645    .4878    .4454    .5407    .5653    .5209
             .051     .048     .086     .053     .050     .092

Note: Format of this table is identical to Table 1C.








Table 3C
Behavior of nominal 90% and 95% confidence intervals for models with
k = 18 treatments, n = 12 observations per treatment, and F logistic.
90% 95%
p p

.26471 .50000 .73529 .26471 .50000 .73529

.94000 .88000 .85500 .96000 .93000 .94500
Arvesen .3398 .3579 .2846 .4035 .4251 .3437
.125 .102 .086 .140 .114 .102



.85000 .84500 .79500 .91500 .93500 .87000
Normal .2695 .2956 .2240 .3207 .3476 .2624
.033 .015 .046 .037 .018 .054



.88806 .87694 .89972 .93083 .92805 .94444
ABMeans .4874 .5234 .4444 .5559 .5933 .5123
.130 .108 .140 .135 .110 .148



.84333 .89833 .91305 .90028 .94444 .95472
ABMedians .4718 .4842 .4295 .5579 .5508 .4937
.102 .086 .108 .106 .089 .115



.98000 .97500 1.00000 1.00000 .98500 1.00000
ABMeansC .4874 .5234 .4444 .5559 .5933 .5123
.056 .062 .106 .058 .065 .113



.94500 .98000 1.00000 .98500 .98500 1.00000
ABMediansC .4718 .4842 .4295 .5579 .5508 .4937
.043 .049 .079 .044 .051 .084
Note: Format of this table is identical to Table 1C.








Table 3D
Behavior of nominal 90% and 95% confidence intervals for models with
k = 12 treatments, n = 6 observations per treatment, and F logistic.

                       90%                        95%
                        p                          p
             .26471   .50000   .73529   .26471   .50000   .73529

             .89500   .88500   .89500   .92000   .93000   .94000
Arvesen      .4585    .4588    .3947    .5354    .5444    .4789
             .180     .138     .135     .189     .147     .154

             .88500   .85000   .87500   .95500   .92000   .94000
Normal       .4084    .4149    .3190    .4824    .4904    .3791
             .061     .048     .077     .071     .052     .089

             .85208   .86458   .87458   .92750   .93208   .94667
ABMeans      .5913    .6098    .5440    .7084    .7299    .6679
             .169     .158     .189     .165     .149     .191

             .80250   .89125   .89875   .90542   .95292   .96125
ABMedians    .5962    .5966    .5258    .7099    .7117    .6420
             .150     .136     .161     .143     .128     .162

             .95500   .98000   .99500   .99000   .99000   1.00000
ABMeansC     .5913    .6098    .5440    .7084    .7299    .6679
             .073     .083     .121     .072     .081     .129

             .89000   .99000   .99500   .98500   1.00000  1.00000
ABMediansC   .5962    .5966    .5258    .7099    .7117    .6420
             .064     .063     .099     .063     .065     .101

Note: Format of this table is identical to Table 1C.








Table 4A
Behavior of nominal 90% and 95% confidence intervals for models with
k = 6 treatments, n = 12 observations per treatment, and F Laplace.

                       90%                        95%
                        p                          p
             .26471   .50000   .73529   .26471   .50000   .73529

             .90500   .87000   .85500   .93500   .91500   .89500
Arvesen      .6241    .6529    .5874    .7015    .7484    .6986
             .256     .218     .217     .246     .216     .215

             .85000   .75500   .77500   .96500   .88000   .87500
Normal       .4761    .4938    .4240    .5668    .5766    .4920
             .103     .095     .127     .107     .103     .144

             .86333   .85583   .85417   .94083   .93333   .94250
ABMeans      .6179    .6521    .5711    .7382    .7673    .6890
             .192     .180     .219     .176     .162     .209

             .88333   .90833   .79083   .94667   .96833   .89250
ABMedians    .5146    .5495    .5144    .6193    .6551    .6272
             .128     .122     .152     .134     .119     .153

             .93500   .93000   .93500   .98000   .97000   .98000
ABMeansC     .6179    .6521    .5711    .7382    .7673    .6890
             .123     .131     .177     .111     .118     .170

             .93500   .96500   .91000   .99000   .99000   .95500
ABMediansC   .5146    .5495    .5144    .6193    .6551    .6272
             .077     .074     .108     .081     .076     .109

Note: Format of this table is identical to Table 1C.








Table 4B
Behavior of nominal 90% and 95% confidence intervals for models with
k = 12 treatments, n = 15 observations per treatment, and F Laplace.

                       90%                        95%
                        p                          p
             .26471   .50000   .73529   .26471   .50000   .73529

             .88000   .84500   .86000   .93000   .88000   .92000
Arvesen      .4133    .4696    .4028    .4862    .5520    .4846
             .184     .154     .136     .197     .165     .155

             .78500   .74500   .77500   .86500   .85000   .86500
Normal       .3105    .3473    .2858    .3722    .4102    .3361
             .057     .032     .064     .064     .036     .075

             .88333   .89167   .89625   .93792   .94417   .94208
ABMeans      .5422    .5707    .5126    .6271    .6557    .5965
             .137     .125     .165     .138     .123     .171

             .86083   .90458   .87042   .92625   .95042   .92292
ABMedians    .4775    .5036    .4777    .5532    .5812    .5561
             .108     .094     .120     .112     .099     .126

             .96000   .96500   .98500   .98500   1.00000  .99500
ABMeansC     .5422    .5707    .5126    .6271    .6557    .5965
             .068     .076     .125     .068     .076     .132

             .94500   .98000   .97500   .98000   1.00000  .99000
ABMediansC   .4775    .5036    .4777    .5532    .5812    .5561
             .054     .053     .088     .059     .055     .093

Note: Format of this table is identical to Table 1C.








Table 4C
Behavior of nominal 90% and 95% confidence intervals for models with
k = 18 treatments, n = 12 observations per treatment, and F Laplace.
90% 95%
P p

.26471 .50000 .73529 .26471 .50000 .73529

.89500 .91000 .85500 .94500 .94500 .91500
Arvesen .3403 .4314 .3330 .4038 .5075 .4008
.131 .134 .111 .147 .146 .130



.78500 .76500 .70500 .86500 .82500 .83500
Normal .2609 .2912 .2245 .3108 .3424 .2630
.040 .021 .050 .046 .025 .059



.86444 .88583 .88278 .91222 .93305 .92694
ABMeans .5268 .5693 .4808 .5974 .6410 .5521
.135 .118 .155 .138 .118 .162



.83528 .90528 .90194 .89750 .95055 .94250
ABMedians .4880 .5164 .4562 .5542 .5837 .5208
.103 .095 .119 .106 .096 .123



.93000 .99000 .97500 .97000 1.00000 .99500
ABMeansC .5268 .5693 .4808 .5974 .6410 .5521
.055 .060 .107 .057 .061 .115



.93500 .99000 .99000 .97000 .99000 .99000
ABMediansC .4880 .5164 .4562 .5542 .5837 .5208
.041 .052 .078 .043 .054 .082
Note: Format of this table is identical to Table 1C.








Table 4D
Behavior of nominal 90% and 95% confidence intervals for models with
k = 12 treatments, n = 6 observations per treatment, and F Laplace.


90% 95%
p P

.26471 .50000 .73529 .26471 .50000 .73529

.88000 .86000 .82500 .92000 .91500 .90000
Arvesen .4856 .4955 .4095 .5614 .5847 .4973
.207 .155 .138 .210 .166 .161



.80500 .77500 .77000 .88000 .84500 .85000
Normal .4020 .4066 .3243 .4755 .4806 .3852
.066 .055 .087 .076 .063 .101



.79583 .82917 .83500 .89000 .91458 .91458
ABMeans .6126 .6128 .5404 .7257 .7284 .6617
.176 .178 .211 .169 .170 .213



.78042 .88042 .87000 .88333 .94833 .93583
ABMedians .6115 .6058 .5341 .7218 .7175 .6514
.142 .146 .178 .133 .137 .179



.87000 .95000 .97000 .97000 .99000 .99500
ABMeansC .6126 .6128 .5404 .7257 .7284 .6617
.084 .096 .137 .082 .097 .140



.86000 .98500 .99000 .97500 .99500 1.00000
ABMediansC .6115 .6058 .5341 .7218 .7175 .6514
.060 .074 .108 .056 .070 .111
Note: Format of this table is identical to Table 1C.







Table 5A
Behavior of nominal 90% and 95% confidence intervals for models with
k = 6 treatments, n = 12 observations per treatment, and F Cauchy.
90% 95%
p P

.26471 .50000 .73529 .26471 .50000 .73529

.66500 .56500 .64000 .72500 .62500 .68500
Arvesen .5969 .5642 .6561 .6375 .6148 .7219
.398 .372 .337 .392 .373 .333



.46500 .30500 .25000 .88500 .42000 .31500
Normal .3232 .3583 .3679 .4025 .4389 .4432
.133 .149 .167 .145 .165 .186



.51333 .62083 .61833 .63000 .72500 .72583
ABMeans .5537 .5520 .4795 .6663 .6659 .5949
.330 .326 .344 .327 .322 .351



.56583 .92500 .68167 .71167 .97147 .77333
ABMedians .4635 .4952 .5025 .5559 .5944 .6037
.204 .204 .202 .225 .217 .216



.47000 .74000 .77000 .58000 .80000 .85000
ABMeansC .5537 .5520 .4795 .6663 .6659 .5949
.237 .228 .254 .241 .230 .264



.57500 .98500 .81000 .76000 1.00000 .90500
ABMediansC .4635 .4952 .5025 .5559 .5944 .6037
.117 .109 .108 .132 .116 .116
Note: Format of this table is identical to Table 1C.








Table 5B
Behavior of nominal 90% and 95% confidence intervals for models with
k = 12 treatments, n = 15 observations per treatment, and F Cauchy.

                       90%                        95%
                        p                          p
             .26471   .50000   .73529   .26471   .50000   .73529

             .51000   .49000   .46000   .54500   .54500   .49500
Arvesen      .4659    .4853    .4809    .4993    .5404    .5432
             .415     .389     .362     .426     .405     .382

             .15000   .15000   .11000   .20000   .18500   .15000
Normal       .1617    .1844    .2042    .1986    .2246    .2464
             .089     .107     .112     .103     .124     .130

             .50708   .60625   .67542   .62667   .72792   .78792
ABMeans      .5692    .5391    .4902    .6738    .6581    .6021
             .303     .351     .312     .297     .302     .317

             .46375   .90542   .64708   .54667   .95375   .70125
ABMedians    .3680    .4149    .4213    .4272    .4801    .4895
             .185     .194     .176     .206     .213     .191

             .45000   .74000   .88500   .63000   .84500   .95000
ABMeansC     .5692    .5391    .4902    .6738    .6581    .6021
             .182     .187     .174     .187     .183     .194

             .52000   .97500   .75500   .62500   .99500   .84000
ABMediansC   .3680    .4149    .4213    .4272    .4801    .4895
             .117     .112     .094     .131     .126     .102

Note: Format of this table is identical to Table 1C.









Table 5C
Behavior of nominal 90% and 95% confidence intervals for models with
k = 18 treatments, n = 12 observations per treatment, and F Cauchy.

                       90%                        95%
                        p                          p
             .26471   .50000   .73529   .26471   .50000   .73529

             .49000   .45500   .44500   .52000   .50000   .49500
Arvesen      .4382    .4473    .4502    .4772    .5023    .5340
             .405     .358     .311     .420     .375     .345

             .13000   .11500   .11000   .16500   .13000   .13000
Normal       .1454    .1643    .1746    .1754    .1972    .2084
             .079     .083     .091     .091     .097     .106

             .60861   .68250   .69056   .68889   .76000   .75806
ABMeans      .6172    .6013    .5301    .6817    .6673    .5967
             .284     .293     .314     .277     .282     .310

             .47500   .90556   .63417   .55778   .95000   .68056
ABMedians    .3690    .4034    .3914    .4198    .4571    .4457
             .191     .196     .188     .210     .215     .205

             .66000   .93500   .98000   .79000   .96000   .98500
ABMeansC     .6172    .6013    .5301    .6817    .6673    .5967
             .146     .161     .183     .137     .156     .186

             .57000   .98500   .76500   .64500   1.00000  .79500
ABMediansC   .3690    .4034    .3914    .4198    .4571    .4457
             .148     .151     .141     .165     .167     .157

Note: Format of this table is identical to Table 1C.








Table 5D
Behavior of nominal 90% and 95% confidence intervals for models with
k = 12 treatments, n = 6 observations per treatment, and F Cauchy.

                       90%                        95%
                        p                          p
             .26471   .50000   .73529   .26471   .50000   .73529

             .67500   .53000   .57500   .69000   .56000   .65000
Arvesen      .5242    .4839    .5166    .5646    .5478    .6164
             .379     .356     .313     .380     .378     .334

             .48000   .23000   .21500   .64500   .30000   .29000
Normal       .3024    .2897    .2953    .3622    .3464    .3518
             .103     .115     .139     .116     .132     .162

             .56292   .62333   .69708   .69708   .75458   .80792
ABMeans      .5701    .5700    .5784    .6920    .6953    .7042
             .309     .326     .344     .304     .317     .310

             .58708   .86667   .73042   .69833   .94042   .81417
ABMedians    .5208    .5092    .5110    .6176    .6024    .6140
             .222     .236     .227     .232     .250     .207

             .62000   .88500   .97500   .80000   .94500   .99500
ABMeansC     .5701    .5700    .5784    .6920    .6953    .7042
             .162     .177     .208     .155     .169     .190

             .61500   .98500   .86500   .81500   1.00000  .92000
ABMediansC   .5208    .5092    .5110    .6176    .6024    .6140
             .163     .182     .153     .179     .202     .169

Note: Format of this table is identical to Table 1C.







The second position in the number of the table is a letter that

refers to the size of the model used according to the following scheme:

A-(6,12), B-(12,15), C-(18,12), and D-(12,6).
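The numbering scheme can be written down as a small lookup. The function name `decode_table_number` is hypothetical; the meaning of the optional third position (1 for nominal 90%, 2 for nominal 95%) is read off the captions of Tables 1A1 through 1B2 rather than stated in the scheme itself.

```python
# First position: distribution used to generate the responses.
DISTRIBUTIONS = {"1": "normal", "2": "uniform", "3": "logistic",
                 "4": "Laplace", "5": "Cauchy"}

# Second position: model size (k treatments, n observations per treatment).
MODEL_SIZES = {"A": (6, 12), "B": (12, 15), "C": (18, 12), "D": (12, 6)}

# Third position, present only for Tables 1A1, 1A2, 1B1, and 1B2
# (assumption taken from the table captions).
NOMINAL_LEVELS = {"1": (90,), "2": (95,)}


def decode_table_number(label):
    """Return (distribution, k, n, nominal levels in percent) for a table label."""
    distribution = DISTRIBUTIONS[label[0]]
    k, n = MODEL_SIZES[label[1]]
    # Tables without a third position report both nominal levels.
    levels = NOMINAL_LEVELS.get(label[2:], (90, 95))
    return distribution, k, n, levels


print(decode_table_number("3B"))
print(decode_table_number("1A2"))
```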

The first four tables (Tables 1A1, 1A2, 1B1, and 1B2) give complete

results for all procedures and all p values for nominal 90% and 95%

confidence intervals constructed using normally distributed responses

and model sizes (6,12) and (12,15). The performance of each procedure

over the range of p values given in these tables held consistently

throughout the rest of the study. For this reason, the remaining tables

give results only for the p values .26471, .50000, and .73529.

First we examine Table 1A1. The Arvesen procedure gives an

empirical confidence coefficient that is consistently close to the 90%

nominal level over the entire range of p values. However, compared to

the other procedures, the average length and the standard deviation of

the lengths are quite high. The Normal procedure performs well, as it

should in this case since the needed assumptions are met, in empirical

confidence coefficient, length, and standard deviation. The ABMeans

procedure produces results similar to the Arvesen procedure giving

slightly lower empirical confidence coefficient, essentially equal

length, and less variability of length. The ABMedians procedure gives

results similar to the Normal procedure near the center of the range of

p values except for higher variability of length. The empirical

confidence coefficient drops off as p gets larger or smaller, especially

for p = .10000.
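As a rough guide to how close "close to the nominal level" can be expected to be with 200 intervals, the binomial standard error of an empirical confidence coefficient can be computed. This calculation is not part of the original study; it assumes independent intervals whose true coverage equals the nominal level, and the function name is hypothetical. It does not apply directly to the ABMeans and ABMedians entries, where the k intervals from one set of responses are not independent.

```python
from math import sqrt


def coverage_standard_error(nominal_level, replications):
    """Binomial standard error of an empirical coverage proportion.

    Assumes independent replications, each covering the true value
    with probability nominal_level.
    """
    return sqrt(nominal_level * (1.0 - nominal_level) / replications)


se_90 = coverage_standard_error(0.90, 200)
se_95 = coverage_standard_error(0.95, 200)
print(round(se_90, 4), round(se_95, 4))
```

Under these assumptions, empirical coefficients within roughly two standard errors of the nominal level are compatible with simulation noise alone.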

Due to the method of construction, the ABMeansC and ABMediansC

procedures produce intervals with the same average length as the ABMeans

and ABMedians procedures. However, the variability of the lengths is






decreased by using the combined procedures. For the ABMeansC procedure

the empirical confidence coefficient increases over the ABMeans

procedure to the point where it is consistently above the 90% nominal

level. At p = .10000 it is lower than at the other p values, but even

there it falls only slightly below the nominal level. The ABMediansC

procedure performs very well near the center of the p values with

empirical confidence coefficient above the nominal level and variability

similar to the Normal procedure. However, like the ABMedians procedure,

the empirical confidence coefficient drops as p moves farther from

.50000 and again is especially poor at p = .10000.

The U-statistic procedure produces short intervals with moderate

variability but with empirical confidence coefficient well below the

nominal level for all values of p. The Chi-Square procedure also

produces intervals with consistently low empirical confidence

coefficient. The lengths are moderate while the variability of the

lengths is quite high.

Table 1A2 contains results for nominal 95% confidence intervals

under the same conditions as those used for Table 1A1. These results

are consistent with those found in Table 1A1.

Tables 1B1 and 1B2 show what happens if the model size is increased

to (12,15). For most procedures empirical confidence coefficients

generally get closer to the nominal level than they were for the (6,12)

model. For the ABMeansC and ABMediansC procedures, the empirical

confidence coefficient rises even higher above the nominal level except

for those same p values where they were low in the (6,12) model. For

all procedures the average lengths and standard deviations of lengths

decrease for the larger model though the decrease is not as pronounced







for the AB procedures as it is for the other procedures. Therefore, the

ABMeans and ABMedians procedures do not compare as favorably with the

Arvesen and Normal procedures as they did when the model was (6,12).

The ABMeansC and ABMediansC procedures also have longer lengths than the

Arvesen and Normal procedures but with generally higher empirical

confidence coefficients.

The U-statistic and Chi-Square procedures also improve as the model

size increases but the improvement is not sufficient to raise these

procedures to the level of the others. Another aspect to these

procedures is the high occurrence of intervals which have endpoints

that must be truncated at either 0 or 1 (since 0 ≤ p < 1). This happens

more frequently for the Chi-Square procedure than for the U-statistic

procedure, sometimes occurring as often as in 60% of the intervals even

when p = .50000. The poor performance of the U-statistic and Chi-Square

procedures was consistent over the whole study. For this reason these

procedures are not recommended for use and results are not given for

them beyond Table 1B2.
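The truncation of endpoints, and the proportion of intervals affected, can be sketched as below. The function names and the sample intervals are hypothetical; the only fact taken from the text is that endpoints falling outside the range from 0 to 1 are truncated at 0 or 1.

```python
def truncate_interval(lower, upper):
    """Clip interval endpoints to the range [0, 1].

    Returns the clipped endpoints and a flag telling whether
    either endpoint had to be moved.
    """
    clipped_lower = max(0.0, lower)
    clipped_upper = min(1.0, upper)
    was_truncated = (clipped_lower != lower) or (clipped_upper != upper)
    return clipped_lower, clipped_upper, was_truncated


def truncation_rate(intervals):
    """Proportion of intervals with at least one truncated endpoint."""
    flags = [truncate_interval(lower, upper)[2] for lower, upper in intervals]
    return sum(flags) / len(flags)


# Hypothetical raw intervals, not taken from the study.
raw = [(-0.1, 0.5), (0.2, 0.7), (0.4, 1.2), (0.3, 0.6), (0.1, 0.9)]
print(truncation_rate(raw))
```

A high truncation rate signals that the untruncated procedure often places endpoints outside the parameter space, which is one reason a short reported average length can be misleading.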

Beginning with Table 1C, results for both nominal 90% and 95%

confidence intervals are given in the same table for the reduced range

of p values. Table 1C shows that increasing the model size even

further, to (18,12), produces results consistent with the findings when

the model size was increased from (6,12) to (12,15). That is, lengths

and variability of lengths decrease for all procedures though at a

slower rate for the AB procedures. Also, empirical confidence

coefficients for the ABMeansC and ABMediansC procedures increase while

the empirical confidence coefficients of the other procedures stay near

the nominal levels.







The results in Table 1D are also consistent with the results in the

previous tables. Table 1D shows that, as would be expected, the

performance of the procedures is generally poorer than in Tables 1B1 and

1B2, since n is smaller, but better than in Tables 1A1 and 1A2 since the

number of treatments is higher.

Tables 2A through 2D are for effects with uniform distributions.

The patterns exhibited in Tables 1A1 through 1D are generally apparent

in these tables as well. The most notable exception is that the

empirical confidence coefficient for the Normal procedure is noticeably

higher than the nominal level. Also, the lengths of the intervals using

the Arvesen procedure are essentially the same as in the Normal

procedure. However, the Normal procedure is still superior due to the

increased empirical confidence coefficient and much less variable

lengths.

As the tails of the distributions of the effects get heavier, the

empirical confidence coefficient of the Normal procedure decreases.

This can be seen in Tables 3A through 3D which deal with effects that

have logistic distributions. The results for the other procedures are

similar to those seen previously.

For the smaller models, the ABMeansC procedure is apparently

superior to the Arvesen procedure. For example, in Table 3A for

p = .26471 and nominal 90% intervals, the ABMeansC procedure produced

intervals with higher empirical confidence coefficient, smaller

variability of length, and just slightly higher average length than the

Arvesen intervals. If the results for nominal 90% intervals for the

ABMeansC procedure are compared with nominal 95% intervals for the

Arvesen procedure (still p = .26471) the ABMeansC procedure is better in







all three areas. As the model size increases, the ABMeansC procedure

still produces intervals with higher empirical confidence coefficient

and less variable lengths than the Arvesen procedure but with clearly

higher average lengths (Table 3C).

Tables 4A through 4D give results for effects with Laplace

distributions. With even heavier tails the empirical confidence

coefficient of the Normal procedure decreases even more. Performance of

the other procedures generally follows the patterns discussed earlier.

If the effects have Cauchy distributions then they do not possess

finite second moments. Therefore, the only procedures whose assumptions

are satisfied are the ABMedians and ABMediansC. This is evident in the

results in Tables 5A through 5D where the ABMediansC procedure gives

consistently better results than the other procedures. The ABMeans and

ABMeansC procedures perform better overall than the Arvesen and Normal

procedures and in the larger models (Table 5C for example) are better

than the ABMedians and ABMediansC procedures for p values away from

.50000. For all procedures intervals have more variable lengths when

the Cauchy distribution is used.

As we have seen, an overall view of the tables shows that it is very

difficult to choose a uniformly "best" procedure. Each of the

procedures has situations where it performs well and other situations

where its performance is poor. With the exception of Cauchy distributed

effects (Tables 5A through 5D), the Arvesen procedure produces intervals

with confidence coefficient consistently close to the nominal level.

However, the lengths of the intervals constructed are quite variable,

especially for smaller models.








The Normal procedure produces intervals that are generally narrower

and less variable than those produced by the Arvesen procedure but with

an inconsistent confidence coefficient. The confidence coefficient

ranges from above the nominal level when the uniform distribution is

used (Tables 2A through 2D) to well below the nominal level when the

Cauchy distribution is used (Tables 5A through 5D).

Another aspect to both the Arvesen and Normal procedures is the

possibility of needing to truncate the endpoints of the interval at

either 0 or 1. This is necessary more often with the Arvesen procedure

than with the Normal procedure and in both cases less than with the

Chi-Square procedure. In those cases where truncation was necessary,

the length of the interval was calculated using the value of 0 and/or 1.
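
The truncation rule just described can be sketched in a few lines. This is only an illustration: the function name `truncate_interval` and the example endpoints are hypothetical and are not taken from the study.

```python
def truncate_interval(lower, upper, lo_bound=0.0, hi_bound=1.0):
    """Clip interval endpoints to [lo_bound, hi_bound].

    Returns the truncated endpoints, the length computed from the
    truncated endpoints, and a flag saying whether truncation occurred.
    """
    t_lower = max(lower, lo_bound)
    t_upper = min(upper, hi_bound)
    truncated = (t_lower != lower) or (t_upper != upper)
    # The length uses the truncated values (0 and/or 1), as described above.
    return t_lower, t_upper, t_upper - t_lower, truncated


# Hypothetical endpoints: the lower endpoint falls below 0 and is set to 0.
print(truncate_interval(-0.12, 0.58))
```

Counting how often the flag is set over the simulated intervals would give the truncation frequencies of the kind reported for the Chi-Square and U-statistic procedures.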

For the methods using the modified Ansari-Bradley statistics we saw

that, due to the method of combining the k possible intervals, the

length of the combined interval is the same as the average length of all

k possible intervals. Therefore, the average lengths reported in the

tables are identical for the ABMeans and ABMeansC procedures as well as

for the ABMedians and ABMediansC procedures. Since the tables also

showed that combining the intervals produces less variable lengths and

consistently higher confidence coefficient, it is recommended that the

ABMeansC and ABMediansC procedures be used rather than the ABMeans or

ABMedians procedures.

The ABMeansC procedure produces intervals with confidence

coefficient consistently higher than the nominal level, even for the

smaller models, except when p = .10000. Even when p = .10000 the

confidence coefficient is only slightly below the nominal level. As

with the Arvesen and Normal procedures, the performance of the ABMeansC







procedure is poorer when the effects have Cauchy distributions although

the drop-off in performance is less severe. The average lengths of the

intervals from the ABMeansC procedure are approximately the same as the

average lengths from the Arvesen procedure for small models but do not

decrease as quickly as the model size increases.

The ABMediansC procedure produces intervals with generally shorter

and less variable lengths than the ABMeansC procedure. The confidence

coefficient is consistently above the nominal level for p values near

.50000 but falls as p moves toward 0 or 1. The drop is quite severe as

p approaches .10000.

As with the ABMeansC procedure, the reduction in average length and

standard deviation of length as the model size increases is not as rapid

for the ABMediansC procedure as it is with the Arvesen and Normal

procedures. However, unlike the Arvesen and Normal procedures, the

ABMeansC or ABMediansC procedures will always produce intervals with

endpoints between 0 and 1 due to their methods of construction (see

Section 3.3).

The choice of which procedure to use to construct a confidence

interval for p will really depend on how much is known or is being

assumed about the model. If it is assumed that the effects have a

distribution similar to a uniform or normal distribution, then the

Normal procedure is clearly superior (Tables 1A1 through 2D) since it

produces narrow intervals with confidence coefficient close to or

greater than the nominal level. However, the Normal procedure is not

recommended if the effects might have distributions with heavy tails.

If it is believed that p is near .50000 and nothing is known about

the distribution of the effects, then the ABMediansC procedure is








recommended since, for every distribution including Cauchy, the method

performs very well for values of p near .50000. However, this method is

not recommended if p is thought to be near .10000.

If nothing is known about the distribution of the effects or the

value of p, but moments are assumed to exist, then the Arvesen procedure

or the ABMeansC procedure should be used. These procedures gave the

most consistent performance over the whole range of situations. For

smaller models the ABMeansC procedure would be recommended since it

provides less variable intervals with higher confidence coefficient than

the Arvesen procedure with little or no increase in length. For larger

models the disparity in length and confidence coefficient between the

two procedures increases. If a high confidence coefficient is desired,

then the ABMeansC procedure should be used. If a shorter length is

desired, then the Arvesen procedure will produce such an interval but

with more variation in the lengths and a smaller confidence coefficient.

If it is believed that the effects may have a very heavy tailed

distribution, such as Cauchy, then either the ABMeansC or ABMediansC

procedures should be used since their performance is superior to the

other procedures in this case.

The overall performance of the ABMeansC and ABMediansC procedures

is such that they merit serious consideration when a confidence interval

for p is desired. For distributions of all types and for all but

extreme values of p, these procedures produce intervals that compare

favorably with intervals produced by other procedures and, in many

cases, are superior. This is especially true when the model size is

small. This conclusion is apparently valid even when the assumptions








necessary to apply the other procedures are met. For example, compare

the performance of the ABMediansC and Normal procedures when the effects

have normal distributions and the model size is (6,12) (Tables 1A1 and

1A2). Yet the AB procedures can sometimes be validly implemented under

less restrictive assumptions than the competing techniques.

As we have seen, one of the points to consider when choosing a

procedure to use is the assumptions necessary for valid implementation

of the procedure. These assumptions are reviewed in the following

chapter.














CHAPTER FIVE
SUMMARY


In this dissertation we have derived and studied various methods of

measuring the proportion of the total variability in the responses from

a balanced one-way random effects model,


$$Z_{ij} = \mu + a_i + \varepsilon_{ij}, \qquad i = 1,2,\ldots,k, \quad j = 1,2,\ldots,n,$$


that is attributable to the treatments. These methods require different

assumptions and therefore, theoretically, can only be used if the

appropriate assumptions are met.

The ABMeans and ABMedians procedures (and thus also the ABMeansC

and ABMediansC procedures) derived in Chapter Three require the eij and

ai to possess continuous distributions that are symmetric about zero and

that differ only by a scale parameter. Both procedures also require the

distributions to have bounded, continuous densities that are positive at

zero with bounded, continuous first derivatives. The ABMeans procedure

requires the distributions to have finite second moments while the

ABMedians procedure requires either $\int |x|^{p} f(x)\,dx < \infty$ for some $p > 0$

or $\liminf_{x \to \infty} -\ln[1-F(x)]\,[2\ln(x)]^{-1} > 0$. Both procedures are asymptotic as

both k (number of treatments) and n (number of observations per

treatment) go to infinity. However, as we saw in Chapter Four, Monte

Carlo studies show that the procedures perform quite well for small

values of n and k.







These assumptions are a broadening of the assumptions used in the

classical, normal theory analysis of the balanced one-way random effects

model. In the classical analysis the effects are assumed to have normal

distributions with zero means and finite variances. The assumptions for

the ABMeans and ABMedians procedures allow the effects to have other

symmetric distributions and, in the ABMedians procedure, do not

require finite second moments.

The U-statistic and Chi-Square procedures derived in Chapter Two,

as well as the Arvesen procedure described in Chapter Two, also require

the eij and ai to have continuous distributions. These distributions

must have mean zero and finite fourth moments but need not be symmetric

nor of the same family. These procedures only require k to go to

infinity rather than both k and n. In some senses this is a more

reasonable approach since increasing k is sufficient to obtain more

information about both the treatment and error effects. Therefore, the

procedures involving U-statistics could be used in some situations where

the ABMeans and ABMedians procedures could not.

Of the procedures derived in this dissertation the ABMeansC and

ABMediansC procedures produced the most promising results. Future

research could include trying to find other ways of combining the

individual intervals that would produce a narrower interval, even if the

empirical confidence coefficient is decreased. The ABMeansC and

ABMediansC procedures produce intervals that, for the most part, have

empirical confidence coefficient far above the nominal level. It would

be desirable to obtain intervals, presumably shorter, with confidence

coefficients nearer to the nominal levels.







Other possible areas of future research would be to extend the

procedures to the unbalanced model and to two-way and more complex

models. Formation of pseudo-samples using quantities other than sample

means or sample medians could also be examined.













APPENDIX A
VARIANCES AND COVARIANCE OF U1 AND U2


Using the balanced one-way random effects model under the

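assumptions given in Section 2.2, we recall that U1, given in (2.2.1),

is a U-statistic based on a kernel of degree s = 1. Therefore, from

Result 2.1.2,

$$\lim_{k\to\infty} k\,\mathrm{Var}(U_1) = \zeta_1, \tag{A.1}$$

where, using (2.1.2),

$$\zeta_1 = E[h_1^2(\mathbf{Z}_1)] - (2\sigma_\varepsilon^2)^2. \tag{A.2}$$

Recalling from the assumptions in Section 2.2 that the $\varepsilon_{ij}$ and $a_i$ are

mutually independent with mean zero we obtain

$$\begin{aligned}
E[h_1^2(\mathbf{Z}_1)] &= \binom{n}{2}^{-2} E\Big[\sum_{j<j'} (Z_{1j}-Z_{1j'})^2\Big]^2 \\
&= \binom{n}{2}^{-2}\Big[\binom{n}{2}E(Z_{11}-Z_{12})^4 + n(n-1)(n-2)E[(Z_{11}-Z_{12})^2(Z_{11}-Z_{13})^2] \\
&\qquad + \binom{n}{2}\binom{n-2}{2}E[(Z_{11}-Z_{12})^2(Z_{13}-Z_{14})^2]\Big] \\
&= 2[n(n-1)]^{-1}\big[E(\varepsilon_{11}-\varepsilon_{12})^4 + 2(n-2)E[(\varepsilon_{11}-\varepsilon_{12})^2(\varepsilon_{11}-\varepsilon_{13})^2] \\
&\qquad + (1/2)(n-2)(n-3)E[(\varepsilon_{11}-\varepsilon_{12})^2(\varepsilon_{13}-\varepsilon_{14})^2]\big] \\
&= 2[n(n-1)]^{-1}[(2\mu_4+6\sigma_\varepsilon^4) + 2(n-2)(\mu_4+3\sigma_\varepsilon^4) + (1/2)(n-2)(n-3)(4\sigma_\varepsilon^4)] \\
&= 4\mu_4/n + [n(n-1)]^{-1}(4n^2-8n+12)\sigma_\varepsilon^4,
\end{aligned}$$

and therefore, referring to Result 2.1.2, (A.1), and (A.2),

$$\begin{aligned}
\lim_{k\to\infty} k\,\mathrm{Var}(U_1) &= 4\mu_4/n + [n(n-1)]^{-1}(4n^2-8n+12)\sigma_\varepsilon^4 - 4\sigma_\varepsilon^4 \\
&= 4n^{-1}[\mu_4 + \sigma_\varepsilon^4(3-n)(n-1)^{-1}],
\end{aligned}$$

establishing (2.2.5).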
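
As an informal check of the limit in (2.2.5), the variance of the kernel of U1 can be computed exactly for a small discrete error distribution and compared with the closed form. The sketch below is not part of the original derivation. It assumes the kernel is the average of the squared differences over all pairs j < j' within a treatment, which is the form consistent with (B.1), and it uses a hypothetical mean-zero error distribution that is uniform on a few integers. Because the kernel of U1 has degree one, k Var(U1) equals the variance of the kernel for every k, so exact enumeration is possible.

```python
from fractions import Fraction
from itertools import product


def h1(z):
    """Kernel of U1: average of (z_j - z_j')^2 over all pairs j < j'."""
    n = len(z)
    total = sum((z[j] - z[jp]) ** 2 for j in range(n) for jp in range(j + 1, n))
    return Fraction(2 * total, n * (n - 1))


def zeta1_exact(support, n):
    """Var[h1] by enumerating n i.i.d. errors, each uniform on `support`.

    The treatment effect and the overall mean cancel inside h1, so only
    the errors need to be enumerated.
    """
    values = [h1(errs) for errs in product(support, repeat=n)]
    mean = sum(values) / len(values)
    return sum(v * v for v in values) / len(values) - mean ** 2


def zeta1_formula(support, n):
    """Closed form 4/n * [mu4 + sigma^4 (3 - n)/(n - 1)] from (2.2.5)."""
    m = len(support)
    var = Fraction(sum(x ** 2 for x in support), m)   # error variance (mean is zero)
    mu4 = Fraction(sum(x ** 4 for x in support), m)   # fourth moment of the errors
    return Fraction(4, n) * (mu4 + var ** 2 * Fraction(3 - n, n - 1))


support = (-2, -1, 1, 2)   # hypothetical symmetric, mean-zero error distribution
for n in (2, 3, 4):
    print(n, zeta1_exact(support, n), zeta1_formula(support, n))
```

Exact rational arithmetic is used so that the two columns can be compared without rounding error.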

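Now recall that U2, given in (2.2.2), is a U-statistic of degree

s = 2. We again use Result 2.1.2 to get

$$\lim_{k\to\infty} k\,\mathrm{Var}(U_2) = 4\zeta_1, \tag{A.3}$$

where in this case, again using (2.1.2),

$$\zeta_1 = E[h_2(\mathbf{Z}_1,\mathbf{Z}_2)h_2(\mathbf{Z}_1,\mathbf{Z}_3)] - (2\sigma_a^2+2\sigma_\varepsilon^2)^2. \tag{A.4}$$

Calculating the expectation on the RHS of (A.4) gives

$$\begin{aligned}
E[h_2(\mathbf{Z}_1,\mathbf{Z}_2)h_2(\mathbf{Z}_1,\mathbf{Z}_3)]
&= n^{-4}E\Big(\Big[\sum_j\sum_{j'}(Z_{1j}-Z_{2j'})^2\Big]\Big[\sum_j\sum_{j'}(Z_{1j}-Z_{3j'})^2\Big]\Big) \\
&= n^{-4}\big(n^3E[(Z_{11}-Z_{21})^2(Z_{11}-Z_{31})^2] + n^3(n-1)E[(Z_{11}-Z_{21})^2(Z_{12}-Z_{31})^2]\big) \\
&= n^{-1}\big(E[(a_1-a_2+\varepsilon_{11}-\varepsilon_{21})^2(a_1-a_3+\varepsilon_{11}-\varepsilon_{31})^2] \\
&\qquad + (n-1)E[(a_1-a_2+\varepsilon_{11}-\varepsilon_{21})^2(a_1-a_3+\varepsilon_{12}-\varepsilon_{31})^2]\big) \\
&= n^{-1}[(\eta_4+\mu_4+3\sigma_a^4+3\sigma_\varepsilon^4+12\sigma_a^2\sigma_\varepsilon^2) \\
&\qquad + (n-1)(\eta_4+3\sigma_a^4+4\sigma_\varepsilon^4+8\sigma_a^2\sigma_\varepsilon^2)],
\end{aligned}$$

and therefore, referring to Result 2.1.2, (A.3), and (A.4),

$$\lim_{k\to\infty} k\,\mathrm{Var}(U_2) = 4(\eta_4 + \mu_4/n - \sigma_a^4 - \sigma_\varepsilon^4/n + 4\sigma_a^2\sigma_\varepsilon^2/n),$$

establishing (2.2.6).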

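Finally, observe from Result 2.1.4 that

$$\lim_{k\to\infty} k\,\mathrm{Cov}(U_1,U_2) = 2\zeta^{(1,2)}, \tag{A.5}$$

where, using (2.1.3),

$$\zeta^{(1,2)} = E[h_1(\mathbf{Z}_1)h_2(\mathbf{Z}_1,\mathbf{Z}_2)] - (2\sigma_\varepsilon^2)(2\sigma_a^2+2\sigma_\varepsilon^2). \tag{A.6}$$

Again, first looking at the expectation on the RHS of (A.6), we obtain

$$\begin{aligned}
E[h_1(\mathbf{Z}_1)h_2(\mathbf{Z}_1,\mathbf{Z}_2)]
&= 2n^{-3}(n-1)^{-1}E\Big(\Big[\sum_{j<j'}(Z_{1j}-Z_{1j'})^2\Big]\Big[\sum_j\sum_{j'}(Z_{1j}-Z_{2j'})^2\Big]\Big) \\
&= 2n^{-3}(n-1)^{-1}\Big[2n\binom{n}{2}E[(Z_{11}-Z_{12})^2(Z_{11}-Z_{21})^2] \\
&\qquad + n\binom{n}{2}(n-2)E[(Z_{11}-Z_{12})^2(Z_{13}-Z_{21})^2]\Big] \\
&= n^{-1}\big[2E[(\varepsilon_{11}-\varepsilon_{12})^2(a_1-a_2+\varepsilon_{11}-\varepsilon_{21})^2] \\
&\qquad + (n-2)E[(\varepsilon_{11}-\varepsilon_{12})^2(a_1-a_2+\varepsilon_{13}-\varepsilon_{21})^2]\big] \\
&= n^{-1}[2(\mu_4+3\sigma_\varepsilon^4+4\sigma_a^2\sigma_\varepsilon^2) + (n-2)(4\sigma_\varepsilon^4+4\sigma_a^2\sigma_\varepsilon^2)],
\end{aligned}$$

and therefore, referring to Result 2.1.4, (A.5), and (A.6),

$$\lim_{k\to\infty} k\,\mathrm{Cov}(U_1,U_2) = 4n^{-1}(\mu_4 - \sigma_\varepsilon^4),$$

establishing (2.2.7).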
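
The limits in (2.2.6) and (2.2.7) can be checked in the same informal way. The sketch below is an illustration rather than part of the original derivation. It assumes kernels of the form used in Appendix B (average squared difference within a treatment for U1, average squared difference between two treatments for U2) and uses hypothetical discrete, mean-zero distributions for the treatment effects and errors. It relies on the fact that, with independent treatments, E[h2(Z1,Z2)h2(Z1,Z3)] = E[g(Z1)^2] and E[h1(Z1)h2(Z1,Z2)] = E[h1(Z1)g(Z1)], where g(z) = E[h2(z,Z2)].

```python
from fractions import Fraction
from itertools import product


def h1(z):
    """Kernel of U1: average of (z_j - z_j')^2 over pairs j < j'."""
    n = len(z)
    total = sum((z[j] - z[jp]) ** 2 for j in range(n) for jp in range(j + 1, n))
    return Fraction(2 * total, n * (n - 1))


def h2(z1, z2):
    """Kernel of U2: average of (z_1j - z_2j')^2 over all j, j'."""
    n = len(z1)
    return Fraction(sum((x - y) ** 2 for x in z1 for y in z2), n * n)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def exact_zetas(a_support, e_support, n):
    """Exact centred moments of the kernels by enumeration (overall mean 0)."""
    # All equally likely response vectors (Z_i1, ..., Z_in) for one treatment.
    zs = [tuple(a + e for e in errs)
          for a in a_support for errs in product(e_support, repeat=n)]
    g = [mean(h2(z, w) for w in zs) for z in zs]      # g(z) = E[h2(z, Z2)]
    h = [h1(z) for z in zs]
    zeta_u2 = mean(x * x for x in g) - mean(g) ** 2
    zeta_12 = mean(x * y for x, y in zip(h, g)) - mean(h) * mean(g)
    return zeta_u2, zeta_12


def formula_zetas(a_support, e_support, n):
    """Closed forms implied by (2.2.6) and (2.2.7), before the factors 4 and 2."""
    def moments(s):
        return (Fraction(sum(x ** 2 for x in s), len(s)),
                Fraction(sum(x ** 4 for x in s), len(s)))
    va, eta4 = moments(a_support)     # variance and fourth moment of the a_i
    ve, mu4 = moments(e_support)      # variance and fourth moment of the errors
    zeta_u2 = eta4 + mu4 / n - va ** 2 - ve ** 2 / n + 4 * va * ve / n
    zeta_12 = 2 * (mu4 - ve ** 2) / n
    return zeta_u2, zeta_12


a_support, e_support = (-2, 0, 2), (-1, 0, 1)   # hypothetical distributions
for n in (2, 3):
    print(n, exact_zetas(a_support, e_support, n), formula_zetas(a_support, e_support, n))
```

Multiplying the first component by 4 and the second by 2 gives the limits of k Var(U2) and k Cov(U1,U2), respectively.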














APPENDIX B
THE RELATIONSHIP BETWEEN U1, U2, MST, AND MSE


This appendix establishes the relationship between U1 and U2, given

in (2.2.1) and (2.2.2), and MST and MSE, the mean squares from the usual

one-way analysis of variance table (Scheffé 1959, page 225).

First, we expand U1, U2, MST, and MSE so that each is written

completely in terms of the quantities ai and eij. While this is not

necessary in order to see the relationship between U1 and MSE, it

facilitates establishment of the overall relationship between the

statistics.

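From (2.2.1) we see that

$$\begin{aligned}
U_1 &= 2[nk(n-1)]^{-1}\sum_i\sum_{j<j'}(\varepsilon_{ij}-\varepsilon_{ij'})^2 \\
&= 2[nk(n-1)]^{-1}\sum_i\sum_{j<j'}[(\varepsilon_{ij}^2+\varepsilon_{ij'}^2) - 2\varepsilon_{ij}\varepsilon_{ij'}] \\
&= 2(nk)^{-1}\sum_i\sum_j\varepsilon_{ij}^2 - 2[nk(n-1)]^{-1}\sum_i\sum_{j\ne j'}\varepsilon_{ij}\varepsilon_{ij'}.
\end{aligned} \tag{B.1}$$

Letting $\bar{Z}_{i\cdot} = n^{-1}\sum_j Z_{ij}$ and $\bar{\varepsilon}_{i\cdot} = n^{-1}\sum_j \varepsilon_{ij}$, it follows that

$\mathrm{MSE} = [k(n-1)]^{-1}\sum_i\sum_j (Z_{ij}-\bar{Z}_{i\cdot})^2$. Expanding, we obtain

$$\begin{aligned}
\mathrm{MSE} &= [k(n-1)]^{-1}\sum_i\sum_j(\varepsilon_{ij}-\bar{\varepsilon}_{i\cdot})^2 \\
&= [k(n-1)]^{-1}\sum_i\sum_j(\varepsilon_{ij}^2 + \bar{\varepsilon}_{i\cdot}^2 - 2\varepsilon_{ij}\bar{\varepsilon}_{i\cdot}) \\
&= [k(n-1)]^{-1}\Big[\sum_i\sum_j\varepsilon_{ij}^2 + n^{-1}\sum_i\Big(\sum_j\varepsilon_{ij}\Big)^2 - 2n^{-1}\sum_i\Big(\sum_j\varepsilon_{ij}\Big)^2\Big] \\
&= [k(n-1)]^{-1}\Big[\sum_i\sum_j\varepsilon_{ij}^2 - n^{-1}\sum_i\Big(\sum_j\varepsilon_{ij}\Big)^2\Big] \\
&= [k(n-1)]^{-1}\Big[\sum_i\sum_j\varepsilon_{ij}^2 - n^{-1}\sum_i\sum_j\varepsilon_{ij}^2 - n^{-1}\sum_i\sum_{j\ne j'}\varepsilon_{ij}\varepsilon_{ij'}\Big] \\
&= (nk)^{-1}\sum_i\sum_j\varepsilon_{ij}^2 - [nk(n-1)]^{-1}\sum_i\sum_{j\ne j'}\varepsilon_{ij}\varepsilon_{ij'}.
\end{aligned} \tag{B.2}$$

Thus, from (B.1) and (B.2) it follows that

$$\mathrm{MSE} = U_1/2.$$

In the same manner, from (2.2.2) we see that

$$\begin{aligned}
U_2 &= 2[n^2k(k-1)]^{-1}\sum_{i<i'}\sum_j\sum_{j'}(a_i-a_{i'}+\varepsilon_{ij}-\varepsilon_{i'j'})^2 \\
&= 2[n^2k(k-1)]^{-1}\sum_{i<i'}\sum_j\sum_{j'}\big[(a_i^2+a_{i'}^2) + (\varepsilon_{ij}^2+\varepsilon_{i'j'}^2) - 2a_ia_{i'} + 2(a_i\varepsilon_{ij}+a_{i'}\varepsilon_{i'j'}) \\
&\qquad - 2(a_i\varepsilon_{i'j'}+a_{i'}\varepsilon_{ij}) - 2\varepsilon_{ij}\varepsilon_{i'j'}\big] \\
&= 2k^{-1}\sum_i a_i^2 + 2(nk)^{-1}\sum_i\sum_j\varepsilon_{ij}^2 - 4[k(k-1)]^{-1}\sum_{i<i'}a_ia_{i'} + 4(nk)^{-1}\sum_i\sum_j a_i\varepsilon_{ij} \\
&\qquad - 4[nk(k-1)]^{-1}\sum_{i\ne i'}\sum_j a_i\varepsilon_{i'j} - 4[n^2k(k-1)]^{-1}\sum_{i<i'}\sum_j\sum_{j'}\varepsilon_{ij}\varepsilon_{i'j'}.
\end{aligned} \tag{B.3}$$

Letting $\bar{Z}_{\cdot\cdot} = (nk)^{-1}\sum_i\sum_j Z_{ij}$, $\bar{\varepsilon}_{\cdot\cdot} = (nk)^{-1}\sum_i\sum_j \varepsilon_{ij}$, and $\bar{a} = k^{-1}\sum_i a_i$, it follows

that $\mathrm{MST} = n(k-1)^{-1}\sum_i(\bar{Z}_{i\cdot}-\bar{Z}_{\cdot\cdot})^2$. Expanding, we obtain

$$\begin{aligned}
\mathrm{MST} &= n(k-1)^{-1}\sum_i(a_i-\bar{a}+\bar{\varepsilon}_{i\cdot}-\bar{\varepsilon}_{\cdot\cdot})^2 \\
&= n(k-1)^{-1}\sum_i\big[(a_i^2+\bar{a}^2) + (\bar{\varepsilon}_{i\cdot}^2+\bar{\varepsilon}_{\cdot\cdot}^2) - 2a_i\bar{a} - 2\bar{\varepsilon}_{i\cdot}\bar{\varepsilon}_{\cdot\cdot} \\
&\qquad + 2a_i\bar{\varepsilon}_{i\cdot} - (2a_i\bar{\varepsilon}_{\cdot\cdot}+2\bar{a}\bar{\varepsilon}_{i\cdot}-2\bar{a}\bar{\varepsilon}_{\cdot\cdot})\big] \\
&= n(k-1)^{-1}\Big[\sum_i a_i^2 + k^{-1}\Big(\sum_i a_i\Big)^2 + n^{-2}\sum_i\Big(\sum_j\varepsilon_{ij}\Big)^2 - (n^2k)^{-1}\Big(\sum_i\sum_j\varepsilon_{ij}\Big)^2 - 2k^{-1}\Big(\sum_i a_i\Big)^2 \\
&\qquad + 2n^{-1}\sum_i\sum_j a_i\varepsilon_{ij} - 2(nk)^{-1}\Big(\sum_i a_i\Big)\Big(\sum_i\sum_j\varepsilon_{ij}\Big)\Big] \\
&= n(k-1)^{-1}\Big([1+k^{-1}-2k^{-1}]\sum_i a_i^2 + [n^{-2}-(n^2k)^{-1}]\sum_i\sum_j\varepsilon_{ij}^2 + [k^{-1}-2k^{-1}]\sum_{i\ne i'}a_ia_{i'} \\
&\qquad + [2n^{-1}-2(nk)^{-1}]\sum_i\sum_j a_i\varepsilon_{ij} + [-2(nk)^{-1}]\sum_{i\ne i'}\sum_j a_i\varepsilon_{i'j} \\
&\qquad + [-(n^2k)^{-1}]\sum_{i\ne i'}\sum_j\sum_{j'}\varepsilon_{ij}\varepsilon_{i'j'} + [n^{-2}-(n^2k)^{-1}]\sum_i\sum_{j\ne j'}\varepsilon_{ij}\varepsilon_{ij'}\Big) \\
&= nk^{-1}\sum_i a_i^2 + (nk)^{-1}\sum_i\sum_j\varepsilon_{ij}^2 - 2n[k(k-1)]^{-1}\sum_{i<i'}a_ia_{i'} \\
&\qquad + 2k^{-1}\sum_i\sum_j a_i\varepsilon_{ij} - 2[k(k-1)]^{-1}\sum_{i\ne i'}\sum_j a_i\varepsilon_{i'j} \\
&\qquad - 2[nk(k-1)]^{-1}\sum_{i<i'}\sum_j\sum_{j'}\varepsilon_{ij}\varepsilon_{i'j'} + (nk)^{-1}\sum_i\sum_{j\ne j'}\varepsilon_{ij}\varepsilon_{ij'}.
\end{aligned} \tag{B.4}$$

It now follows from (B.1), (B.3), and (B.4) that

$$\begin{aligned}
\mathrm{MST} - nU_2/2 &= (1-n)(nk)^{-1}\sum_i\sum_j\varepsilon_{ij}^2 + (nk)^{-1}\sum_i\sum_{j\ne j'}\varepsilon_{ij}\varepsilon_{ij'} \\
&= (1-n)U_1/2
\end{aligned}$$

and thus

$$\mathrm{MST} = [nU_2 + (1-n)U_1]/2.$$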
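
Both identities are purely algebraic, so they can be confirmed on any balanced data table. The following sketch is an illustration only: the function names and the small integer data set are hypothetical. U1 and U2 are written directly in terms of the responses, using the pairwise-difference forms that underlie (B.1) and (B.3); since μ and the individual effects cancel in the differences, no decomposition of the responses is needed.

```python
from fractions import Fraction


def u_statistics(z):
    """U1 and U2 for a balanced table z (k rows, n responses per row)."""
    k, n = len(z), len(z[0])
    # Squared differences within each treatment, over pairs j < j'.
    within = sum((row[j] - row[jp]) ** 2
                 for row in z for j in range(n) for jp in range(j + 1, n))
    # Squared differences between treatments i < i', over all j and j'.
    between = sum((x - y) ** 2
                  for i in range(k) for ip in range(i + 1, k)
                  for x in z[i] for y in z[ip])
    u1 = Fraction(2 * within, n * k * (n - 1))
    u2 = Fraction(2 * between, n * n * k * (k - 1))
    return u1, u2


def mean_squares(z):
    """MST and MSE from the usual one-way analysis of variance."""
    k, n = len(z), len(z[0])
    row_means = [Fraction(sum(row), n) for row in z]
    grand_mean = sum(row_means) / k
    mse = sum((x - m) ** 2 for row, m in zip(z, row_means) for x in row) / (k * (n - 1))
    mst = n * sum((m - grand_mean) ** 2 for m in row_means) / (k - 1)
    return mst, mse


# Hypothetical data: k = 4 treatments, n = 3 observations per treatment.
data = [[3, 7, 4], [10, 12, 8], [1, 0, 5], [6, 6, 9]]
n = len(data[0])
u1, u2 = u_statistics(data)
mst, mse = mean_squares(data)
print(mse == u1 / 2, mst == (n * u2 + (1 - n) * u1) / 2)
```

Exact fractions are used so that the comparisons are equalities rather than approximate matches.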















APPENDIX C
A CONSISTENT ESTIMATE FOR $\sigma_T^2$


The confidence interval for p given in (2.2.10) involves an

estimate for the asymptotic standard deviation of U1/U2. The form of

that estimate is derived in this Appendix using the model and

assumptions from Section 2.2.

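From (2.2.8) and (2.2.9) it follows that

$$\sigma_T^2 = \sigma_{11}(2\sigma_a^2+2\sigma_\varepsilon^2)^{-2} + 4\sigma_{22}\sigma_\varepsilon^4(2\sigma_a^2+2\sigma_\varepsilon^2)^{-4} - 4\sigma_{12}\sigma_\varepsilon^2(2\sigma_a^2+2\sigma_\varepsilon^2)^{-3}. \tag{C.1}$$

Theorem 2.1.3 gives conditions under which U-statistics converge almost

surely to their expectation. Using (2.2.3) and (2.2.4), it follows that

for any number c,

$$U_2^c \xrightarrow[k\to\infty]{\text{a.s.}} (2\sigma_a^2+2\sigma_\varepsilon^2)^c \tag{C.2}$$

and

$$U_1^c \xrightarrow[k\to\infty]{\text{a.s.}} (2\sigma_\varepsilon^2)^c. \tag{C.3}$$

If consistent estimates of $\sigma_{11}$, $\sigma_{22}$, and $\sigma_{12}$ can be found, they can be

combined with U1 and U2 to form a consistent estimate of $\sigma_T^2$ as given

in (C.1).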

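We now turn our attention to finding consistent estimates

for $\sigma_{11}$, $\sigma_{22}$, and $\sigma_{12}$. Note that from (2.2.5), (A.1), and (A.2),

$\sigma_{11} = E[h_1^2(\mathbf{Z}_1)] - (2\sigma_\varepsilon^2)^2$. Defining

$$\hat{\sigma}_{11} = k^{-1}\sum_i[h_1(\mathbf{Z}_i)-U_1]^2, \tag{C.4}$$

we obtain the following result.

Result C.1. Defining $\hat{\sigma}_{11}$ as in (C.4), $\hat{\sigma}_{11} \xrightarrow[k\to\infty]{\text{a.s.}} \sigma_{11}$.

Proof: Expanding the RHS of (C.4) we obtain

$$\begin{aligned}
\hat{\sigma}_{11} &= k^{-1}\sum_i h_1^2(\mathbf{Z}_i) - 2U_1k^{-1}\sum_i h_1(\mathbf{Z}_i) + U_1^2 \\
&= k^{-1}\sum_i h_1^2(\mathbf{Z}_i) - U_1^2 \\
&\xrightarrow[k\to\infty]{\text{a.s.}} E[h_1^2(\mathbf{Z}_1)] - (2\sigma_\varepsilon^2)^2.
\end{aligned}$$

The last step is justified by using (C.3) with c = 2 and noting that

$k^{-1}\sum_i h_1^2(\mathbf{Z}_i)$ is a U-statistic and applying Theorem 2.1.3.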
i
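Now note that from (2.2.7), (A.5), and (A.6),

$\sigma_{12} = 2(E[h_1(\mathbf{Z}_1)h_2(\mathbf{Z}_1,\mathbf{Z}_2)] - (2\sigma_\varepsilon^2)(2\sigma_a^2+2\sigma_\varepsilon^2))$. Defining

$$\hat{\sigma}_{12} = 2\Big([k(k-1)]^{-1}\sum_{i\ne i'}h_1(\mathbf{Z}_i)h_2(\mathbf{Z}_i,\mathbf{Z}_{i'}) - U_1U_2\Big), \tag{C.5}$$

we obtain the following result.

Result C.2. Defining $\hat{\sigma}_{12}$ as in (C.5), $\hat{\sigma}_{12} \xrightarrow[k\to\infty]{\text{a.s.}} \sigma_{12}$.

Proof: We rewrite $\hat{\sigma}_{12}$ as

$$\hat{\sigma}_{12}/2 = 2[k(k-1)]^{-1}\sum_{i<i'}(1/2)[h_1(\mathbf{Z}_i)h_2(\mathbf{Z}_i,\mathbf{Z}_{i'}) + h_1(\mathbf{Z}_{i'})h_2(\mathbf{Z}_{i'},\mathbf{Z}_i)] - U_1U_2.$$

The first term on the RHS is a U-statistic and therefore, by

Theorem 2.1.3, converges almost surely to the expectation of its kernel

which is

$$(1/2)E[h_1(\mathbf{Z}_1)h_2(\mathbf{Z}_1,\mathbf{Z}_2) + h_1(\mathbf{Z}_2)h_2(\mathbf{Z}_2,\mathbf{Z}_1)] = E[h_1(\mathbf{Z}_1)h_2(\mathbf{Z}_1,\mathbf{Z}_2)].$$

Application of (C.2) and (C.3) with c = 1 completes the proof.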

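Finally, note that (2.2.6), (A.3), and (A.4) imply that

$\sigma_{22} = 4(E[h_2(\mathbf{Z}_1,\mathbf{Z}_2)h_2(\mathbf{Z}_1,\mathbf{Z}_3)] - (2\sigma_a^2+2\sigma_\varepsilon^2)^2)$. Defining

$$g(\mathbf{Z}_i) = (k-1)^{-1}\sum_{i'\ne i}h_2(\mathbf{Z}_i,\mathbf{Z}_{i'}) \tag{C.6}$$

and

$$\hat{\sigma}_{22} = 4k^{-1}\sum_i[g(\mathbf{Z}_i)-U_2]^2, \tag{C.7}$$

we obtain the following result.

Result C.3. Defining $g(\mathbf{Z}_i)$ as in (C.6) and $\hat{\sigma}_{22}$ as in (C.7),

$\hat{\sigma}_{22} \xrightarrow[k\to\infty]{\text{a.s.}} \sigma_{22}$.

Proof: Expanding (C.7) we obtain

$$\begin{aligned}
\hat{\sigma}_{22}/4 &= k^{-1}\sum_i g^2(\mathbf{Z}_i) - 2U_2[k(k-1)]^{-1}\sum_{i\ne i'}h_2(\mathbf{Z}_i,\mathbf{Z}_{i'}) + U_2^2 \\
&= k^{-1}\sum_i\Big[(k-1)^{-1}\sum_{i'\ne i}h_2(\mathbf{Z}_i,\mathbf{Z}_{i'})\Big]^2 - U_2^2.
\end{aligned}$$

Rewriting the first term we obtain

$$\begin{aligned}
&[k(k-1)^2]^{-1}\Big(\sum_{i\ne i'}h_2^2(\mathbf{Z}_i,\mathbf{Z}_{i'}) + 2\sum_i\sum_{\substack{i'<i''\\ i',i''\ne i}}h_2(\mathbf{Z}_i,\mathbf{Z}_{i'})h_2(\mathbf{Z}_i,\mathbf{Z}_{i''})\Big) \\
&\quad= (k-1)^{-1}\,2[k(k-1)]^{-1}\sum_{i<i'}h_2^2(\mathbf{Z}_i,\mathbf{Z}_{i'}) \\
&\qquad + (k-2)(k-1)^{-1}\,6[k(k-1)(k-2)3]^{-1}\Big(\sum_i\sum_{i'=1}^{i-1}\sum_{i''=i'+1}^{i-1}h_2(\mathbf{Z}_i,\mathbf{Z}_{i'})h_2(\mathbf{Z}_i,\mathbf{Z}_{i''}) \\
&\qquad + \sum_i\sum_{i'=1}^{i-1}\sum_{i''=i+1}^{k}h_2(\mathbf{Z}_i,\mathbf{Z}_{i'})h_2(\mathbf{Z}_i,\mathbf{Z}_{i''}) + \sum_{i<i'<i''}h_2(\mathbf{Z}_i,\mathbf{Z}_{i'})h_2(\mathbf{Z}_i,\mathbf{Z}_{i''})\Big).
\end{aligned} \tag{C.8}$$

The first term of (C.8) is equal to $(k-1)^{-1}$ times a U-statistic and

therefore converges almost surely to 0. This follows from

Theorem 2.1.3 since $E[h_2^2(\mathbf{Z}_1,\mathbf{Z}_2)]$ is finite by the assumed finite fourth

moments, $\mu_4$ and $\eta_4$, of the $\varepsilon_{ij}$ and $a_i$. The second term of (C.8) can be

rewritten as

$$\begin{aligned}
&(k-2)(k-1)^{-1}\,6[k(k-1)(k-2)3]^{-1}\Big(\sum_i\sum_{i'=1}^{i-1}\sum_{i''=i'+1}^{i-1}h_2(\mathbf{Z}_i,\mathbf{Z}_{i'})h_2(\mathbf{Z}_i,\mathbf{Z}_{i''}) \\
&\qquad + \sum_i\sum_{i'=1}^{i-1}\sum_{i''=i+1}^{k}h_2(\mathbf{Z}_i,\mathbf{Z}_{i'})h_2(\mathbf{Z}_i,\mathbf{Z}_{i''}) + \sum_{i<i'<i''}h_2(\mathbf{Z}_i,\mathbf{Z}_{i'})h_2(\mathbf{Z}_i,\mathbf{Z}_{i''})\Big) \\
&\quad= (k-2)(k-1)^{-1}\binom{k}{3}^{-1}\sum_{i<i'<i''}(1/3)[h_2(\mathbf{Z}_i,\mathbf{Z}_{i'})h_2(\mathbf{Z}_i,\mathbf{Z}_{i''}) \\
&\qquad + h_2(\mathbf{Z}_{i'},\mathbf{Z}_i)h_2(\mathbf{Z}_{i'},\mathbf{Z}_{i''}) + h_2(\mathbf{Z}_{i''},\mathbf{Z}_i)h_2(\mathbf{Z}_{i''},\mathbf{Z}_{i'})],
\end{aligned}$$

which is $(k-2)(k-1)^{-1}$ times a U-statistic and thus, again using

Theorem 2.1.3, converges almost surely to $E[h_2(\mathbf{Z}_1,\mathbf{Z}_2)h_2(\mathbf{Z}_1,\mathbf{Z}_3)]$. Thus,

(C.8) converges almost surely to $E[h_2(\mathbf{Z}_1,\mathbf{Z}_2)h_2(\mathbf{Z}_1,\mathbf{Z}_3)]$. The result is

proven by using this fact and applying (C.2) with c = 2.

Using Results C.1, C.2, and C.3, and (C.2) and (C.3) with various

values of c, we obtain a consistent estimate of $\sigma_T^2$. We will denote the

estimate by $\hat{\sigma}_T^2$, where

$$\hat{\sigma}_T^2 = \hat{\sigma}_{11}U_2^{-2} + \hat{\sigma}_{22}U_1^2U_2^{-4} - 2\hat{\sigma}_{12}U_1U_2^{-3}.$$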
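
The estimate can be assembled directly from (C.4) through (C.7). The sketch below is one possible implementation, not the program used in the Monte Carlo study; the function names and the small data table are hypothetical, and the kernels are assumed to have the pairwise-difference forms used in Appendix B. It uses the fact that the double sum in (C.5) equals the average over i of the product of h1 and g at the same treatment, with g as in (C.6).

```python
from fractions import Fraction


def h1(z):
    """Kernel of U1: average of (z_j - z_j')^2 over pairs j < j'."""
    n = len(z)
    total = sum((z[j] - z[jp]) ** 2 for j in range(n) for jp in range(j + 1, n))
    return Fraction(2 * total, n * (n - 1))


def h2(z1, z2):
    """Kernel of U2: average of (z_1j - z_2j')^2 over all j, j'."""
    n = len(z1)
    return Fraction(sum((x - y) ** 2 for x in z1 for y in z2), n * n)


def sigma_T2_hat(z):
    """Return U1, U2, the three variance estimates, and the combined estimate."""
    k = len(z)
    h1_vals = [h1(row) for row in z]
    # (C.6): g(Z_i) is the average of h2(Z_i, Z_i') over i' != i.
    g_vals = [sum(h2(z[i], z[ip]) for ip in range(k) if ip != i) / (k - 1)
              for i in range(k)]
    u1 = sum(h1_vals) / k
    u2 = sum(g_vals) / k          # the average of the g values equals U2
    s11 = sum((h - u1) ** 2 for h in h1_vals) / k                           # (C.4)
    s12 = 2 * (sum(h * g for h, g in zip(h1_vals, g_vals)) / k - u1 * u2)   # (C.5)
    s22 = 4 * sum((g - u2) ** 2 for g in g_vals) / k                        # (C.7)
    # Requires U2 > 0, i.e. the responses are not all identical.
    var = s11 / u2 ** 2 + s22 * u1 ** 2 / u2 ** 4 - 2 * s12 * u1 / u2 ** 3
    return u1, u2, s11, s12, s22, var


# Hypothetical data: k = 4 treatments, n = 3 observations per treatment.
data = [[3, 7, 4], [10, 12, 8], [1, 0, 5], [6, 6, 9]]
u1, u2, s11, s12, s22, var = sigma_T2_hat(data)
print(float(u1), float(u2), float(var))
```

Written this way, the three estimates form the sample covariance matrix of the pairs (h1, 2g), so the combined estimate cannot be negative.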













APPENDIX D
DERIVATION OF ENDPOINTS IN CHI-SQUARE PROCEDURE


Using the model and assumptions in Section 2.2 a confidence

interval for p was derived using the $\chi^2$ distribution (2.2.14). The

formulas for the endpoints of this interval are derived in this

appendix.

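To find the slopes, d1 and d2, of the two lines in Figure 2.2.1 we

rewrite (2.2.13) as

$$\begin{aligned}
&\sigma_{22}X'^2 - 2\sigma_{22}U_1X' + \sigma_{22}U_1^2 + \sigma_{11}Y'^2 - 2\sigma_{11}U_2Y' + \sigma_{11}U_2^2 \\
&\quad - 2\sigma_{12}X'Y' + 2\sigma_{12}U_1Y' + 2\sigma_{12}U_2X' - 2\sigma_{12}U_1U_2 - D\chi^2k^{-1} = 0.
\end{aligned}$$

Substituting Y' = dX' and collecting coefficients yields

$$\begin{aligned}
&X'^2(\sigma_{22} + \sigma_{11}d^2 - 2\sigma_{12}d) \\
&\quad + X'(-2\sigma_{22}U_1 - 2\sigma_{11}U_2d + 2\sigma_{12}U_1d + 2\sigma_{12}U_2) \\
&\quad + (\sigma_{22}U_1^2 + \sigma_{11}U_2^2 - 2\sigma_{12}U_1U_2 - D\chi^2k^{-1}) = 0.
\end{aligned} \tag{D.1}$$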

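The values of d for which Y' = dX' are tangents to the ellipse

depicted in Figure 2.2.1 are the values that yield only one solution of

this quadratic equation in X'. If we write (D.1) as

$a_1X'^2 + b_1X' + c_1 = 0$, the values of d we are seeking must satisfy

$b_1^2 - 4a_1c_1 = 0$. Now,

$$\begin{aligned}
b_1^2 &= 4\sigma_{22}^2U_1^2 + 4\sigma_{11}^2U_2^2d^2 + 4\sigma_{12}^2U_1^2d^2 + 4\sigma_{12}^2U_2^2 \\
&\quad + 8\sigma_{11}\sigma_{22}U_1U_2d - 8\sigma_{12}\sigma_{22}U_1^2d - 8\sigma_{12}\sigma_{22}U_1U_2 \\
&\quad - 8\sigma_{11}\sigma_{12}U_1U_2d^2 - 8\sigma_{11}\sigma_{12}U_2^2d + 8\sigma_{12}^2U_1U_2d
\end{aligned}$$

and

$$\begin{aligned}
4a_1c_1 &= 4\sigma_{22}^2U_1^2 + 4\sigma_{11}\sigma_{22}U_2^2 - 8\sigma_{12}\sigma_{22}U_1U_2 - 4\sigma_{22}D\chi^2k^{-1} \\
&\quad + 4\sigma_{11}\sigma_{22}U_1^2d^2 + 4\sigma_{11}^2U_2^2d^2 - 8\sigma_{11}\sigma_{12}U_1U_2d^2 - 4\sigma_{11}d^2D\chi^2k^{-1} \\
&\quad - 8\sigma_{12}\sigma_{22}U_1^2d - 8\sigma_{11}\sigma_{12}U_2^2d + 16\sigma_{12}^2U_1U_2d + 8\sigma_{12}dD\chi^2k^{-1},
\end{aligned}$$

hence,

$$\begin{aligned}
b_1^2 - 4a_1c_1 &= 4\sigma_{12}^2U_1^2d^2 + 4\sigma_{12}^2U_2^2 + 8\sigma_{11}\sigma_{22}U_1U_2d - 8\sigma_{12}^2U_1U_2d - 4\sigma_{11}\sigma_{22}U_2^2 + 4\sigma_{22}D\chi^2k^{-1} \\
&\quad - 4\sigma_{11}\sigma_{22}U_1^2d^2 + 4\sigma_{11}d^2D\chi^2k^{-1} - 8\sigma_{12}dD\chi^2k^{-1} \\
&= d^2(4\sigma_{12}^2U_1^2 - 4\sigma_{11}\sigma_{22}U_1^2 + 4\sigma_{11}D\chi^2k^{-1}) \\
&\quad + d(8\sigma_{11}\sigma_{22}U_1U_2 - 8\sigma_{12}^2U_1U_2 - 8\sigma_{12}D\chi^2k^{-1}) \\
&\quad + (4\sigma_{12}^2U_2^2 - 4\sigma_{11}\sigma_{22}U_2^2 + 4\sigma_{22}D\chi^2k^{-1}).
\end{aligned} \tag{D.2}$$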


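Writing (D.2) as $a_2d^2 + b_2d + c_2$, we see the values of d that make

$b_1^2 - 4a_1c_1$ equal to 0 are the roots of $a_2d^2 + b_2d + c_2 = 0$. These two

roots are $(2a_2)^{-1}[-b_2 - (b_2^2-4a_2c_2)^{1/2}] = r^-$ and

$(2a_2)^{-1}[-b_2 + (b_2^2-4a_2c_2)^{1/2}] = r^+$.

The values needed to find $r^-$ and $r^+$ are

$$-b_2 = 8(-\sigma_{11}\sigma_{22}U_1U_2 + \sigma_{12}^2U_1U_2 + \sigma_{12}D\chi^2k^{-1}),$$

$$2a_2 = 8(\sigma_{12}^2U_1^2 - \sigma_{11}\sigma_{22}U_1^2 + \sigma_{11}D\chi^2k^{-1}),$$

$$\begin{aligned}
b_2^2 &= 64(\sigma_{11}^2\sigma_{22}^2U_1^2U_2^2 + \sigma_{12}^4U_1^2U_2^2 + \sigma_{12}^2D^2(\chi^2)^2k^{-2} \\
&\quad - 2\sigma_{11}\sigma_{22}\sigma_{12}^2U_1^2U_2^2 - 2\sigma_{11}\sigma_{12}\sigma_{22}U_1U_2D\chi^2k^{-1} + 2\sigma_{12}^3U_1U_2D\chi^2k^{-1}),
\end{aligned}$$

$$\begin{aligned}
4a_2c_2 &= 64(\sigma_{12}^4U_1^2U_2^2 - \sigma_{11}\sigma_{12}^2\sigma_{22}U_1^2U_2^2 + \sigma_{12}^2\sigma_{22}U_1^2D\chi^2k^{-1} \\
&\quad - \sigma_{11}\sigma_{12}^2\sigma_{22}U_1^2U_2^2 + \sigma_{11}^2\sigma_{22}^2U_1^2U_2^2 - \sigma_{11}\sigma_{22}^2U_1^2D\chi^2k^{-1} \\
&\quad + \sigma_{11}\sigma_{12}^2U_2^2D\chi^2k^{-1} - \sigma_{11}^2\sigma_{22}U_2^2D\chi^2k^{-1} + \sigma_{11}\sigma_{22}D^2(\chi^2)^2k^{-2}),
\end{aligned}$$

and

$$\begin{aligned}
b_2^2 - 4a_2c_2 &= 64D\chi^2k^{-1}(-2\sigma_{11}\sigma_{12}\sigma_{22}U_1U_2 + 2\sigma_{12}^3U_1U_2 - \sigma_{12}^2\sigma_{22}U_1^2 + \sigma_{11}\sigma_{22}^2U_1^2 \\
&\quad - \sigma_{11}\sigma_{12}^2U_2^2 + \sigma_{11}^2\sigma_{22}U_2^2 + \sigma_{12}^2D\chi^2k^{-1} - \sigma_{11}\sigma_{22}D\chi^2k^{-1}).
\end{aligned}$$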

Ideally, both $r^-$ and $r^+$ will be greater than one since that would

produce a confidence interval with endpoints between 0 and 1 (the range

of possible values for p). If this occurs, we define the values of d as

$$d_1 = \min(r^-, r^+) \quad\text{and}\quad d_2 = \max(r^-, r^+).$$

In practice however, it is possible that one or both of $r^-$ and $r^+$

are not greater than one. These situations are handled in the following

manner. If the ellipse intersects the Y' axis, d2 is set equal to ∞.

If the ellipse intersects the line X' = Y', d1 is set equal to 1. If

both of these events occur, the confidence interval will have endpoints

of 0 and 1. If only one occurs, the other value of d is set equal to

the value of $r^-$ or $r^+$, whichever is greater than one.
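
The tangent slopes can be computed numerically from the coefficients of (D.2). The sketch below is illustrative only: the function names and the numerical inputs are hypothetical, and the product of D, the chi-square value, and 1/k is passed as a single argument `q` so that no assumption is made here about how D is defined in (2.2.13). The second function evaluates the discriminant of (D.1), which should vanish at each tangent slope. The handling of the special cases described above is not implemented.

```python
import math


def tangent_slopes(s11, s22, s12, u1, u2, q):
    """Roots of a2*d^2 + b2*d + c2 = 0 from (D.2); q stands for D*chi2/k."""
    a2 = 4 * (s12 ** 2 * u1 ** 2 - s11 * s22 * u1 ** 2 + s11 * q)
    b2 = 8 * (s11 * s22 * u1 * u2 - s12 ** 2 * u1 * u2 - s12 * q)
    c2 = 4 * (s12 ** 2 * u2 ** 2 - s11 * s22 * u2 ** 2 + s22 * q)
    root = math.sqrt(b2 ** 2 - 4 * a2 * c2)   # assumes two real tangent slopes
    r_minus = (-b2 - root) / (2 * a2)
    r_plus = (-b2 + root) / (2 * a2)
    return r_minus, r_plus


def discriminant_d1(d, s11, s22, s12, u1, u2, q):
    """b1^2 - 4*a1*c1 for the quadratic (D.1) in X' along the line Y' = d*X'."""
    a1 = s22 + s11 * d ** 2 - 2 * s12 * d
    b1 = -2 * s22 * u1 - 2 * s11 * u2 * d + 2 * s12 * u1 * d + 2 * s12 * u2
    c1 = s22 * u1 ** 2 + s11 * u2 ** 2 - 2 * s12 * u1 * u2 - q
    return b1 ** 2 - 4 * a1 * c1


# Hypothetical inputs: a unit circle centred at (U1, U2) = (2, 6).
args = dict(s11=1.0, s22=1.0, s12=0.0, u1=2.0, u2=6.0, q=1.0)
r_minus, r_plus = tangent_slopes(**args)
d1, d2 = min(r_minus, r_plus), max(r_minus, r_plus)
print(d1, d2, discriminant_d1(d1, **args), discriminant_d1(d2, **args))
```

For these inputs the tangent lines through the origin can also be found from elementary geometry, which gives slopes 4 ± √156/6, so the output can be checked by hand.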