ASYMPTOTIC NONPARAMETRIC CONFIDENCE INTERVALS
FOR THE RATIO OF SCALE PARAMETERS IN BALANCED
ONE-WAY RANDOM EFFECTS MODELS
BY
DAVID JOHN GROGGEL
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN
PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1983
ACKNOWLEDGMENTS
I would like to express my deepest appreciation to all those who
have assisted me during the preparation of this dissertation and
throughout my time in graduate school. In particular I wish to thank
Dr. Dennis Wackerly, chairman of my research committee, for his help
with this research and for his guidance and kindness as my advisor. I
would also like to thank Dr. P. V. Rao, for his frequent contributions
to this research, and Dr. Richard Scheaffer and the entire Department of
Statistics, for the encouragement, support, and friendships they have
provided.
Further, I would like to thank my parents, Mr. and Mrs. Richard
Groggel, for their ever-present love and support, and my in-laws, Mr.
and Mrs. Warren Rubin, for their love and support over the past five
years. Finally, special thanks goes to my wife Kathy for her help with
the typing of this dissertation but especially for the love, patience,
and encouragement she constantly offers.
TABLE OF CONTENTS

ACKNOWLEDGMENTS

ABSTRACT

CHAPTER
ONE    INTRODUCTION
TWO    CONFIDENCE INTERVALS USING U-STATISTICS
       2.1 General Theory of U-Statistics
       2.2 Confidence Intervals for the Intraclass Correlation Coefficient
THREE  CONFIDENCE INTERVALS USING MODIFIED ANSARI-BRADLEY STATISTICS
       3.1 Model and Formation of Pseudo-Samples
       3.2 Asymptotic Distribution of the Ansari-Bradley Statistic Using Pseudo-Samples
       3.3 Asymptotic Confidence Intervals Using the Modified Ansari-Bradley Statistic
FOUR   MONTE CARLO STUDY
FIVE   SUMMARY

APPENDICES
A  VARIANCES AND COVARIANCE OF U1 AND U2
B  THE RELATIONSHIP BETWEEN U1, U2, MST, AND MSE
C  A CONSISTENT ESTIMATE FOR σ_T²
D  DERIVATION OF ENDPOINTS IN CHI-SQUARE PROCEDURE
E  C AND C* TERMS

REFERENCES

BIOGRAPHICAL SKETCH
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
ASYMPTOTIC NONPARAMETRIC CONFIDENCE INTERVALS
FOR THE RATIO OF SCALE PARAMETERS IN BALANCED
ONE-WAY RANDOM EFFECTS MODELS
By
David John Groggel
August, 1983
Chairman: Dennis D. Wackerly
Major Department: Statistics
This dissertation examines the problem of estimating the proportion
of variability in the responses from a balanced one-way random effects
model that can be attributed to the treatment effect. Methods are
derived that do not require the classical assumptions of normality of
both the treatment and error effects.
Asymptotic confidence intervals are derived from functions of
U-statistics that possess either an asymptotic normal distribution or
asymptotic chi-square distribution. These methods require the effects
to have continuous distributions with zero means and finite fourth
moments.
Asymptotic confidence intervals are also derived based on the
asymptotic normality of modified versions of the Ansari-Bradley
two-sample scale statistic. Pseudo-samples of observations that are
asymptotically equivalent to samples of the effects are formed using
either sample means or sample medians. The Ansari-Bradley statistic
calculated using these samples is shown to have an asymptotic normal
distribution and intervals are formed following a procedure of Sen
([1966] Annals of Mathematical Statistics 37, 1759-1770). In forming
these intervals a representation of the Ansari-Bradley statistic
developed by Bhattacharyya ([1977] Journal of the American Statistical
Association 72, 459-463) is used. The construction of these intervals
requires the effects to have continuous distributions that are symmetric
about zero and that differ only by a scale parameter. Other assumptions
on the distributions of the effects are needed depending on whether
sample means or sample medians are used to form the pseudo-samples.
A Monte Carlo study was performed to compare intervals formed by
these methods with the classical normal theory intervals and intervals
based on jackknifed U-statistics as derived by Arvesen ([1969] Annals of
Mathematical Statistics 40, 2076-2100). The study shows that the
intervals based on functions of U-statistics are poor while the
intervals based on the modified Ansari-Bradley statistics are nearly
always comparable to and, in some cases, superior to the normal theory
and Arvesen intervals.
CHAPTER ONE
INTRODUCTION
The balanced one-way random effects model,

Z_ij = μ + a_i + ε_ij,  i = 1,2,...,k, j = 1,2,...,n,

has been studied and analyzed by many people. In this model, μ is an
unknown constant and the ε_ij and a_i are independent samples of
independent observations from continuous populations. The majority of
the research concerning this model has been done under the classical
assumptions that the ε_ij (commonly called the error effects) and a_i
(commonly called the treatment effects) have normal distributions with
zero means and variances σ_ε² and σ_a², respectively. In this dissertation,
the model is studied under more general assumptions concerning the
distributions of these effects.
In the classical case, the test of hypothesis that is usually of
interest is a test concerning the magnitude of σ_a². The test is usually
of the form

(1.1)  H_o: σ_a² = 0  versus  H_a: σ_a² > 0

or

(1.2)  H_o: σ_a² ≤ c σ_ε²  versus  H_a: σ_a² > c σ_ε²,

where c is some specified constant.
In some instances, particularly applications in genetics and the
social sciences, an estimate of σ_a²/(σ_a² + σ_ε²) is desired. This quantity
is commonly known as the intraclass correlation coefficient and is
denoted by ρ. In most cases, an estimate of ρ is more informative than
testing the hypotheses in either (1.1) or (1.2). An estimate provides
information about the actual relative magnitudes of the variance
components rather than just a conclusion that H_o should or should not be
rejected.
As an example of where an estimate of ρ is useful, consider the
problem described in Snedecor and Cochran (1967, Example 10.13.1). In a
study involving Poland China swine, two boars were taken from each of
four litters. All of the litters had the same sire and all eight boars
were fed a standard ration from weaning to a weight of about 225
pounds. The response of interest was the average daily weight gain.
The component σ_a² represents the variability in weight gain that was due
to the genetic differences in the litters while σ_ε² represents the
variability in weight gain due to non-genetic factors. The ratio
σ_a²/(σ_a² + σ_ε²) is the proportion of the total variability that can be
attributed to the genetic differences in the litters.
Scheffe' (1959), as well as many others, describes the procedures
for testing the hypotheses in (1.1) and (1.2) as well as the form of an
exact confidence interval for ρ. Both the test procedure and confidence
interval construction use the mean squares from the usual analysis of
variance table and percentiles of the F-distribution.
Scheffe" shows that these procedures are not robust if the
assumptions of the normality of the effects are violated. It is
therefore desirable to have procedures that can be used to perform tests
and construct confidence intervals for the parameters associated with
the balanced one-way random effects model which can be used when the
assumptions of the normality of the effects is in doubt.
The analysis of the random effects model without the normality
assumptions has not been researched nearly as much as the classical
case. Govindarajulu and Deshpande' (1972) studied the case in which the
ε_ij are independent and identically distributed with continuous
distribution function F(x) and the a_i are independent with distribution
functions G_i(x). In this case, it is not necessary that the
expectations of the a_i all be equal. Assuming, without loss of
generality, that μ = 0, the authors examined the hypotheses

(1.3)  H_o: G_i(x) = 0 if x < 0 and G_i(x) = 1 if x > 0, for every i
       H_a: G_i(x) is nontrivial for at least one i

and derived the locally most powerful rank test by considering the
alternative hypothesis

H_a: Z_ij = Δa_i + ε_ij for small positive Δ.
The hypotheses in (1.3) are analogous to the hypotheses in (1.1) since
in both cases the null hypothesis states that the a_i do not contribute
to the response variable Z_ij and the alternative hypothesis states that
at least one a_i contributes to the response. Govindarajulu (1975)
looked at the same hypotheses under the more restrictive assumption that
G_i(x) = G(x) for every i. Both of these papers considered the
unbalanced one-way random effects model, that is, j = 1,2,...,n_i for
i = 1,2,...,k.
For the balanced one-way model, Arvesen (1969) and Arvesen and
Schmitz (1970) used jackknifing techniques on appropriate U-statistics
to develop procedures for testing hypotheses and forming confidence
intervals for functions of σ_ε² and σ_a². This work was later extended to
the unbalanced model by Arvesen and Layard (1975). The procedures
require the distributions of the ε_ij and a_i to be continuous with zero
means. The procedures also assume finite fourth moments in the balanced
model and finite moments of at least order six in the unbalanced model.
Shoemaker (1981) examined some estimation and testing problems
using the concept of mid-variances in the balanced model where the
effects are assumed to have continuous, symmetric distributions.
In Chapter Two of this dissertation, two methods of constructing
asymptotic confidence intervals for ρ based on the theory of
U-statistics are described. Section 2.1 gives a brief review of some of
the basic results concerning U-statistics. In Section 2.2 a method
using U-statistics similar to those used by Arvesen (1969) is
described. This method was developed before the work of Arvesen was
known to exist. However, the confidence coefficient used for the
intervals in Section 2.2 is derived in a way different from that
presented by Arvesen. As in Arvesen's work, the method in Section 2.2
requires the ε_ij and a_i to be independent random samples of independent
observations from continuous distributions with zero means and finite
fourth moments. Also in Section 2.2, an asymptotic confidence interval
for ρ is derived using a quadratic form (involving two U-statistics) to
construct a statistic with an asymptotic chi-square distribution with
two degrees of freedom.
In Chapter Three we work with scale parameters rather than
variances. The distribution of a random variable X is said to have a
scale parameter δ (0 < δ < ∞) if X has a distribution function of the
form F(x/δ) where F(x) is the distribution function of a random variable
Y and the form of F(x) does not depend on δ. In other words,
X/δ has the same distribution as Y. The advantage of working with scale
parameters is that they may exist for random variables for which
variances (and thus standard deviations) do not exist. For those random
variables where both a scale parameter and a standard deviation exist, a
scale parameter is always a constant multiple of the standard deviation.
In Chapter Three the ε_ij and a_i are assumed to be independent
samples of independent observations from continuous distributions with
distribution functions F(x) = D(x/δ_1) and G(x) = D(x/δ_2) respectively.
That is, δ_1 is a scale parameter for the ε_ij and δ_2 is a scale parameter
for the a_i. It is also assumed that both distributions are symmetric
about zero with densities that are bounded and have a bounded first
derivative. In Section 3.1 the Ansari-Bradley two-sample scale
statistic (Ansari and Bradley 1960) is described. Modified versions of
the Ansari-Bradley statistic, one involving the use of sample means and
another involving sample medians, are shown to have asymptotic normal
distributions in Section 3.2. In Section 3.3 these statistics are used
to form asymptotic confidence intervals for δ_2²/(δ_1² + δ_2²). In those
situations where both scale parameters and variances of the effects
exist, this quantity is numerically equivalent to ρ.
In Chapter Four we present a summary of a Monte Carlo study that
compares the lengths and observed confidence coefficients of intervals
constructed using normal theory as in Scheffe' (1959), Arvesen's (1969)
U-statistics, U-statistics as described in Chapter Two, and the modified
Ansari-Bradley statistics as described in Chapter Three. Chapter Five
contains a summary.
Throughout this dissertation we use the symbol ≡ to denote equality by
definition. Also, unless otherwise specified, sums involving i are from
1 to k, sums involving j are from 1 to n, and integrals are over the
region (-∞, ∞).
CHAPTER TWO
CONFIDENCE INTERVALS USING U-STATISTICS
2.1 General Theory of U-Statistics
The theory of U-statistics was first developed by Hoeffding (1948).
For the convenience of the reader, in this section we state without
proof some results and theorems due to Hoeffding which we will utilize
in the discussions which follow.
Let Z_1, Z_2, ..., Z_m be independent, identically distributed random
vectors and let h(Z_1, Z_2, ..., Z_s) be a function of s (≤ m) of these
vectors. A U-statistic has the form

U_m = U(Z_1, Z_2, ..., Z_m) = C(m,s)^{-1} Σ_{v∈V} h(Z_{v_1}, Z_{v_2}, ..., Z_{v_s}),

where C(m,s) denotes the binomial coefficient "m choose s" and V is the
set of all distinct subsets of s integers, v = (v_1, v_2, ..., v_s),
taken without replacement and without regard to order from (1,2,...,m).
It is easily seen that U_m is an unbiased estimate of the parameter
θ = E[h(Z_1, Z_2, ..., Z_s)]. The function h is assumed to be symmetric in
its arguments (it can be made so if it is not) and is known as the
kernel. The value of s is the smallest possible sample size for which
an unbiased estimate of θ exists and is referred to as the degree of the
kernel.
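To make the definition concrete, the following Python sketch (an
illustration only; the function and variable names are ours) averages a
symmetric kernel of degree s over all C(m,s) subsets of the observations.

    from itertools import combinations
    from math import comb

    def u_statistic(z, h, s):
        # Average the symmetric kernel h of degree s over all C(m, s)
        # subsets of the observations z[0], ..., z[m-1].
        m = len(z)
        total = sum(h(*(z[i] for i in idx)) for idx in combinations(range(m), s))
        return total / comb(m, s)

    # Example: the kernel h(z1, z2) = (z1 - z2)**2 / 2 has degree s = 2 and
    # expectation Var(Z), so u_statistic returns the usual sample variance.
    print(u_statistic([1.0, 4.0, 2.0, 8.0], lambda z1, z2: (z1 - z2) ** 2 / 2, 2))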
Define

h_c(z_1, z_2, ..., z_c) = E[h(Z_1, Z_2, ..., Z_s) | Z_1 = z_1, Z_2 = z_2, ..., Z_c = z_c]

and

(2.1.1)  ζ_c = E{[h_c(Z_1, Z_2, ..., Z_c)]²} − θ²

for c = 1,2,...,s. The quantity ζ_c can also be written as

(2.1.2)  ζ_c = Cov[h(Z_{v_11}, ..., Z_{v_1s}), h(Z_{v_21}, ..., Z_{v_2s})],

where v_1 = (v_11, ..., v_1s)′ and v_2 = (v_21, ..., v_2s)′ are subsets of the
integers (1,2,...,m) with exactly c integers in common.
The variance and asymptotic distribution of a U-statistic are given
in the following results and theorem.
Result 2.1.1 (Hoeffding 1948, Equation 5.13). If
E[h²(Z_1, Z_2, ..., Z_s)] < ∞, then the variance of U_m is

Var(U_m) = C(m,s)^{-1} Σ_{c=1}^{s} C(s,c) C(m−s, s−c) ζ_c.

Result 2.1.2 (Hoeffding 1948, Equation 5.23). If
E[h²(Z_1, Z_2, ..., Z_s)] < ∞, then lim_{m→∞} m Var(U_m) = s²ζ_1.

Theorem 2.1.1 (Hoeffding 1948, Theorem 7.1). If
E[h²(Z_1, Z_2, ..., Z_s)] < ∞ and ζ_1 > 0, then
m^{1/2}(U_m − θ) →d N(0, s²ζ_1) as m → ∞.
For w = 1,2,...,g, let U_m^(w) be U-statistics all defined on the same
m vectors with degrees s(w), kernels h_w, and expectations θ(w). For any
two of these U-statistics, say U_m^(1) and U_m^(2), define

(2.1.3)  ζ_c^(1,2) = E[h_1(Z_{v_11}, ..., Z_{v_1s(1)}) h_2(Z_{v_21}, ..., Z_{v_2s(2)})] − θ(1)θ(2),

where v_1 = (v_11, ..., v_1s(1))′ and v_2 = (v_21, ..., v_2s(2))′ are
subsets of the integers (1,2,...,m) with exactly c integers in common.
The covariance of these U-statistics and their joint asymptotic
distribution are described in the following results and theorem.
Result 2.1.3 (Hoeffding 1948, Equation 6.5). If
E(h_1²) < ∞, E(h_2²) < ∞, and s(2) ≤ s(1), then the covariance
of U_m^(1) and U_m^(2) is such that

Cov[U_m^(1), U_m^(2)] = C(m, s(2))^{-1} Σ_{c=1}^{s(2)} C(s(1), c) C(m−s(1), s(2)−c) ζ_c^(1,2).

Result 2.1.4 (Hoeffding 1948, Page 304). Under the same conditions
as in Result 2.1.3, lim_{m→∞} m Cov[U_m^(1), U_m^(2)] = s(1)s(2)ζ_1^(1,2).

Theorem 2.1.2 (Hoeffding 1948, Theorem 7.1). If E(h_w²) < ∞ for
w = 1,2,...,g, then

m^{1/2}([U_m^(1) − θ(1)], [U_m^(2) − θ(2)], ..., [U_m^(g) − θ(g)]) →d N_g(0, Λ) as m → ∞,

where Λ is a g by g matrix with elements Λ_ij = s(i)s(j)ζ_1^(i,j).
In a later paper Hoeffding proved the following theorem concerning
the asymptotic convergence of a U-statistic.
Theorem 2.1.3 (Hoeffding 1961). If E|h(Z_1, Z_2, ..., Z_s)| < ∞,
then U_m →a.s. θ as m → ∞.
2.2 Confidence Intervals for the Intraclass Correlation Coefficient
Consider the balanced one-way random effects model

Z_ij = μ + a_i + ε_ij,  i = 1,2,...,k, j = 1,2,...,n,

where the ε_ij are independent random variables with a continuous
distribution with mean zero and finite fourth moment and the a_i are
independent random variables with a continuous distribution (not
necessarily the same family as the distribution of the ε_ij) which also
has mean zero and finite fourth moment. The ε_ij and a_i are assumed to
be independent of each other and the variances of the two distributions
are denoted by σ_ε² and σ_a² respectively. The parameter μ is an unknown
constant.
In the work that follows in this section, the number of
observations per treatment, n, remains fixed as the number of
treatments, k, increases to infinity. Due to the structure of the model
this is sufficient to obtain, at least theoretically, unlimited
knowledge about both the ε_ij and a_i.
Let Z_i = (Z_i1, Z_i2, ..., Z_in)′, for i = 1,2,...,k, be k independent
and identically distributed vectors. On these k vectors we define two
U-statistics,

(2.2.1)  U_1 = k^{-1} Σ_i h_1(Z_i),

where h_1(Z_i) = C(n,2)^{-1} Σ_{j<j′} (Z_ij − Z_ij′)²,

and

(2.2.2)  U_2 = C(k,2)^{-1} Σ_{i<i′} h_2(Z_i, Z_i′),

where h_2(Z_i, Z_i′) = n^{-2} Σ_j Σ_{j′} (Z_ij − Z_i′j′)².
JJ
These U-statistics are unbiased estimates for the expectations of
their respective kernels which are
E(Ul) = E[h,(Z,)] = E[(Z11-Z12)2]
(2.2.3)
= E(el -E2 ) = 20
11 12 e
and
E(U2) = E[h2(Z1Z2) = E[(Zll-Z21) 2
(2.2.4)
2 2 2
= E(l-a2+11-21) = 20 + 2a.
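For the balanced model, U_1 and U_2 can be computed directly from the
k × n matrix of responses, as in the following illustrative Python sketch
(the function name is ours).

    import numpy as np

    def u1_u2(z):
        # z is a k x n matrix of responses Z_ij.
        k, n = z.shape
        # U1: average of h1(Z_i) = C(n,2)^{-1} sum over pairs j < j' of
        # (Z_ij - Z_ij')^2, which equals twice the within-row sample variance.
        u1 = float(np.mean([2.0 * np.var(row, ddof=1) for row in z]))
        # U2: average over pairs i < i' of h2 = n^{-2} times the sum over
        # all (j, j') of (Z_ij - Z_i'j')^2.
        pairs = []
        for i in range(k):
            for ip in range(i + 1, k):
                pairs.append(np.mean((z[i][:, None] - z[ip][None, :]) ** 2))
        u2 = float(np.mean(pairs))
        return u1, u2

By (2.2.3) and (2.2.4), the returned values estimate 2σ_ε² and
2σ_ε² + 2σ_a² respectively.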
If μ_ε4 ≡ E(ε_11⁴) and μ_a4 ≡ E(a_1⁴) are finite, and thus E(h_1²) < ∞ and
E(h_2²) < ∞, Results 2.1.2 and 2.1.4 imply (see Appendix A) that

(2.2.5)  lim_{k→∞} k Var(U_1) = 4n^{-1}[μ_ε4 + (3−n)(n−1)^{-1}σ_ε⁴] ≡ σ_11,

(2.2.6)  lim_{k→∞} k Var(U_2) = 4(μ_a4 + μ_ε4/n − σ_a⁴ − σ_ε⁴/n + 4σ_a²σ_ε²/n) ≡ σ_22,

and

(2.2.7)  lim_{k→∞} k Cov(U_1, U_2) = 4n^{-1}(μ_ε4 − σ_ε⁴) ≡ σ_12.

Since μ_ε4 = E(ε_11⁴) > [E(ε_11²)]² = σ_ε⁴, it is clear that, for large k, U_1
and U_2 are positively correlated.
Using Theorem 2.1.2 we can describe the asymptotic distributions of
U_1 and U_2 in the following theorem.
Theorem 2.2.1. If U_1 and U_2 are U-statistics as defined in (2.2.1)
and (2.2.2) and if σ_11, σ_22, and σ_12 are as defined in (2.2.5) through
(2.2.7), then

(k^{1/2}[U_1 − 2σ_ε²], k^{1/2}[U_2 − (2σ_ε² + 2σ_a²)]) →d N_2(0, Λ) as k → ∞,

where

(2.2.8)  Λ = [σ_11  σ_12]
             [σ_12  σ_22].

Using Theorem 2.1.3, (2.2.3), and (2.2.4), we know U_1 →a.s. 2σ_ε² and
U_2 →a.s. 2σ_ε² + 2σ_a² as k → ∞. It then follows that
U_1/U_2 →a.s. σ_ε²/(σ_ε² + σ_a²), which is equal to 1 − σ_a²/(σ_a² + σ_ε²).
Thus, 1 − U_1/U_2 is a strongly consistent point estimate for the
intraclass correlation coefficient, ρ = σ_a²/(σ_a² + σ_ε²).
It is useful to note at this point that U_1 and U_2 are related to
quantities encountered in the classical normal theory one-way analysis
of variance. If MST and MSE denote the mean square for treatments and
the mean square for error, respectively, from the analysis of variance,
then we show in Appendix B that MSE = U_1/2 and MST = [nU_2 + (1−n)U_1]/2.
The usual point estimate for ρ in the normal theory case is
n^{-1}(MST−MSE)/[n^{-1}(MST−MSE) + MSE] (Scheffe' 1959, Page 229). Using the
above-mentioned relationships between MST, MSE, U_1, and U_2, it is easily
seen that the normal theory and U-statistic approaches both lead to the
same point estimate for ρ.
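These relationships are easily verified numerically. The following sketch
reuses the illustrative u1_u2 function above on simulated normal data (the
seed and model size are arbitrary choices of ours).

    import numpy as np

    rng = np.random.default_rng(0)                           # arbitrary seed
    k, n = 12, 6
    z = rng.normal(size=(k, 1)) + rng.normal(size=(k, n))    # Z_ij = a_i + eps_ij
    u1, u2 = u1_u2(z)                                        # from the sketch above
    row_means, grand = z.mean(axis=1), z.mean()
    mst = n * np.sum((row_means - grand) ** 2) / (k - 1)
    mse = np.sum((z - row_means[:, None]) ** 2) / (k * (n - 1))
    assert np.isclose(mse, u1 / 2) and np.isclose(mst, (n * u2 + (1 - n) * u1) / 2)
    rho_hat = 1 - u1 / u2                                    # point estimate for rho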
Consider now the statistic T_k ≡ U_1/U_2, and define a vector a such
that

a′ = (∂T_k/∂U_1, ∂T_k/∂U_2)

evaluated at the points E(U_1) = 2σ_ε² and E(U_2) = 2σ_ε² + 2σ_a². Since

∂T_k/∂U_1 = 1/U_2  and  ∂T_k/∂U_2 = −U_1/U_2²,

we have

(2.2.9)  a′ = ((2σ_ε² + 2σ_a²)^{-1}, −2σ_ε²/(2σ_ε² + 2σ_a²)²).

Letting σ_T² = a′Λa, where Λ is as defined in (2.2.8), and using
Theorem 2.1.2 and Theorem 14.6-2 from Bishop, Fienberg, and
Holland (1975) concerning differentiable functions of vectors with a
joint asymptotic normal distribution, we can state the following
theorem.
Theorem 2.2.2. If T_k = U_1/U_2 and σ_T² = a′Λa, then

k^{1/2}[(1 − T_k) − ρ]/σ_T →d N(0,1) as k → ∞.
The quantity σ_T² depends on unknown parameters but Slutsky's Theorem
(Serfling 1980, Page 19) assures us that T_k will still have an
asymptotic normal distribution if we replace σ_T by a consistent
estimate. Such an estimate is derived in Appendix C and is referred to
here as σ̂_T. Using this estimate, we can construct an asymptotic
100(1−α)% confidence interval for ρ as

(2.2.10)  (1 − U_1/U_2) ± Z_{α/2} σ̂_T k^{-1/2},

where Z_{α/2} denotes the (1−α/2)th percentile of a standard normal
distribution.
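For completeness, a minimal sketch of the interval (2.2.10) follows;
since the Appendix C estimate is not reproduced here, the sketch takes a
consistent estimate sigma_t_hat as an input supplied by the user.

    from math import sqrt
    from scipy.stats import norm

    def u_stat_interval(u1, u2, sigma_t_hat, k, alpha=0.05):
        # Interval (2.2.10): (1 - U1/U2) +/- Z_{alpha/2} sigma_t_hat k^{-1/2};
        # sigma_t_hat is the Appendix C estimate, supplied by the caller.
        half = norm.ppf(1 - alpha / 2) * sigma_t_hat / sqrt(k)
        centre = 1 - u1 / u2
        return centre - half, centre + half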
The above procedure was derived before it was known that
Arvesen (1969) had developed a similar procedure involving the
jackknifing of U-statistics. Also, Arvesen and Schmitz (1970)
considered the specific problem of constructing an asymptotic confidence
interval for the intraclass correlation coefficient.
In their procedure Arvesen and Schmitz estimated β = ln(nσ_a²/σ_ε² + 1)
by jackknifing the statistic β̂ = ln(MST/MSE) (note that MST/MSE can be
written as a function of U-statistics; see Appendix B). The log
transformation was used for variance stabilization, which Arvesen and
Schmitz showed, through simulation, was useful for moderate sample
sizes.
The Arvesen-Schmitz procedure involves leaving out, one at a time,
each of the vectors Z_i, and calculating β̂_{-i} = ln(MST/MSE) using the
remaining vectors as the data for a one-way design with k − 1
treatments. Using β̂ as the estimate calculated using all k vectors,
pseudo-estimates are formed as β̃_i = kβ̂ − (k−1)β̂_{-i}. A point estimate
is calculated as β̃ = k^{-1} Σ_i β̃_i and the standard deviation of the point
estimate is estimated by s = [(k−1)^{-1} Σ_i (β̃_i − β̃)²]^{1/2}. Then, as in
Tukey (1958), the distribution of the statistic

t = k^{1/2}(β̃ − β)s^{-1}

is approximated by a t distribution with k − 1 degrees of freedom.
If t_{α/2,k−1} is the (1−α/2)th percentile of a t distribution with
k − 1 degrees of freedom, then an approximate 100(1−α)% confidence
interval for β is

(β̃ − t_{α/2,k−1} s k^{-1/2}, β̃ + t_{α/2,k−1} s k^{-1/2}) ≡ (L, U).

Therefore, an approximate 100(1−α)% confidence interval for ρ is

(2.2.11)  ([exp(L)−1]/[exp(L)−1+n], [exp(U)−1]/[exp(U)−1+n]).
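The procedure translates directly into code; the following sketch
(function names ours, with scipy's t quantile standing in for tabled
percentiles) implements (2.2.11).

    import numpy as np
    from math import exp, log, sqrt
    from scipy.stats import t as t_dist

    def arvesen_interval(z, alpha=0.05):
        # Jackknifed log(MST/MSE) interval (2.2.11) for rho.
        k, n = z.shape

        def beta_hat(zz):
            kk = zz.shape[0]
            rm, g = zz.mean(axis=1), zz.mean()
            mst = n * np.sum((rm - g) ** 2) / (kk - 1)
            mse = np.sum((zz - rm[:, None]) ** 2) / (kk * (n - 1))
            return log(mst / mse)

        b_all = beta_hat(z)
        # Pseudo-estimates: leave out each treatment vector in turn.
        pseudo = np.array([k * b_all - (k - 1) * beta_hat(np.delete(z, i, axis=0))
                           for i in range(k)])
        b_tilde = pseudo.mean()
        s = sqrt(np.sum((pseudo - b_tilde) ** 2) / (k - 1))
        half = t_dist.ppf(1 - alpha / 2, k - 1) * s / sqrt(k)
        lo, hi = b_tilde - half, b_tilde + half
        return ((exp(lo) - 1) / (exp(lo) - 1 + n),
                (exp(hi) - 1) / (exp(hi) - 1 + n))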
Another method of obtaining an asymptotic confidence interval for ρ
can be derived using the fact that, for a two-dimensional vector W,
W ~ N_2(0, Λ) implies W′Λ^{-1}W ~ χ_2², where χ_2² is a chi-square random
variable with two degrees of freedom (Serfling 1980, Page 128).
Therefore, Theorem 2.2.1 implies that

(2.2.12)  k[U_1−E(U_1), U_2−E(U_2)] Λ^{-1} [U_1−E(U_1), U_2−E(U_2)]′ →d χ_2² as k → ∞.

Letting D ≡ Det(Λ) = σ_11σ_22 − σ_12² > 0 we obtain

Λ^{-1} = D^{-1} [ σ_22  −σ_12]
               [−σ_12   σ_11]

and letting X′ = E(U_1) = 2σ_ε² and Y′ = E(U_2) = 2σ_ε² + 2σ_a², we can rewrite
(2.2.12) as

kD^{-1}[σ_22(X′−U_1)² + σ_11(Y′−U_2)² − 2σ_12(X′−U_1)(Y′−U_2)] →d χ_2² as k → ∞.
Defining χ_{2,α}² as the (1−α)th percentile of a χ_2² distribution and
setting the above quadratic form equal to χ_{2,α}², we obtain

(2.2.13)  σ_22(X′−U_1)² + σ_11(Y′−U_2)² − 2σ_12(X′−U_1)(Y′−U_2) − Dχ_{2,α}²k^{-1} = 0,

which is the equation of an ellipse such that the probability the point
(U_1, U_2) is in the interior of the ellipse is approximately 1 − α.
Using the observed point, (U_1, U_2) ≡ c, as the center of the
ellipse, we can form an asymptotic 100(1−α)% confidence interval for ρ
in the following manner.
[Figure 2.2.1: the ellipse (2.2.13) centered at the observed point
(U_1, U_2), drawn in the plane with horizontal axis X′ = E(U_1) = 2σ_ε² and
vertical axis Y′ = E(U_2) = 2σ_ε² + 2σ_a², together with the lines
Y′ = d_1X′ and Y′ = d_2X′ through the origin that touch the ellipse.]
Let d_1 and d_2 (d_1 < d_2) be the slopes of the two lines that pass
through the origin and intersect the ellipse in exactly one point (see
Figure 2.2.1). Using Y′ = dX′, or equivalently d = (2σ_ε² + 2σ_a²)/2σ_ε²,
we can form an asymptotic 100(1−α)% confidence interval for ρ as

(2.2.14)  (1 − d_1^{-1}, 1 − d_2^{-1}),

where the exact forms of d_1 and d_2 are given in Appendix D.
As we shall see in Chapter Four, this method of constructing a
confidence interval for ρ is inferior to other available methods and
therefore would not be recommended for use in practice.
CHAPTER THREE
CONFIDENCE INTERVALS USING MODIFIED ANSARI-BRADLEY STATISTICS
3.1 Model and Formation of Pseudo-Samples
Ansari and Bradley (1960) introduced a two-sample rank statistic
that can be used to construct a confidence interval for the ratio of two
scale parameters. Let X_1, X_2, ..., X_n and Y_1, Y_2, ..., Y_k be two independent
samples of independent observations from populations with continuous
distribution functions, F(x) and G(x) respectively, such that
F(x) = D(x/δ_1) and G(x) = D(x/δ_2) for some distribution function D(x).
That is, δ_1 and δ_2 are scale parameters associated with the X's and Y's
respectively. Define θ = δ_2/δ_1, the ratio of the two scale
parameters. Thus θX and Y have the same distribution.
The Ansari-Bradley statistic can be formulated in different ways.
In the formulation we utilize, the combined sample of X's and Y's is
ordered and the observations are ranked from the inside out as

N/2, ..., 2, 1, 1, 2, ..., N/2

if N ≡ n + k is even and as

(N−1)/2, ..., 2, 1, 0, 1, 2, ..., (N−1)/2

if N is odd. The Ansari-Bradley statistic is then defined as

W = Σ_i Rank(X_i).
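A minimal sketch of this formulation follows (ties are ignored, since
they occur with probability zero for continuous distributions; the
function name is ours).

    def ansari_bradley_w(x, y):
        # Sum, over the X-sample, of the inside-out scores assigned to the
        # combined ordered sample: the middle observation(s) score 1 (or 0
        # for odd N) and the extremes score N/2 (or (N-1)/2 for odd N).
        combined = sorted([(v, True) for v in x] + [(v, False) for v in y])
        big_n = len(combined)
        w = 0.0
        for r, (_, from_x) in enumerate(combined, start=1):
            if from_x:
                score = abs(r - (big_n + 1) / 2)
                if big_n % 2 == 0:
                    score += 0.5
                w += score
        return w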
The statistic W can be used as a test statistic for testing
H_o: θ = 1 versus one- or two-sided alternatives. The distribution of W
under the null hypothesis (F(x) = G(x)) is tabled for moderate values of
n and k (Ansari and Bradley 1960). Bauer (1972) describes a method of
inverting the test procedure to obtain a confidence interval for θ.
Using Theorem 1 of Chernoff and Savage (1958), Ansari and Bradley
(1960) showed that T_N = W/(nN) has an asymptotic normal distribution
which, under the null hypothesis, has mean 1/4 and variance k(48nN)^{-1}.
However, the Ansari-Bradley statistic does not satisfy all the
assumptions necessary for the application of Theorem 1 of Chernoff and
Savage. An alternate proof of the asymptotic normality of T_N under the
null hypothesis is given in Section 3.2. The alternate proof modifies
the Chernoff and Savage proof so that it may be applied in the present
situation.
Consider now the balanced one-way random effects model

Z_ij = μ + a_i + ε_ij,  i = 1,2,...,k, j = 1,2,...,n,

where μ is an unknown constant and the ε_ij and a_i are independent
samples of independent observations from continuous distributions with
distribution functions F(x) and G(x) and density functions f(x) and g(x)
respectively. Also, assume there exist scale parameters, δ_1 and δ_2,
such that F(x) = D(x/δ_1) and G(x) = D(x/δ_2) where D(x) is a continuous
distribution function corresponding to a random variable with a
distribution symmetric about zero. Thus, the ε_ij and the a_i are random
variables with distributions symmetric about zero and they satisfy the
assumptions needed for using the Ansari-Bradley statistic. Therefore,
the ε_ij/δ_1 and the a_i/δ_2 have the same distribution.
The objective is to estimate the parameter γ = δ_2²/(δ_1² + δ_2²) in
order to assess whether the variability contributed by the treatments is
large compared to the overall variability of the responses, i.e., to
estimate the proportion of the variability in the responses attributable
to the treatments.
Ordinarily in the two-sample scale problem, θ is the parameter that
would be of interest. However, in order to compare methods involving
scale parameters to methods involving variances we instead look at the
parameter γ, which is a function of θ, namely γ = θ²/(θ² + 1). Thus, γ and
ρ = σ_a²/(σ_a² + σ_ε²) are analogous parameters. In fact, γ = ρ in those
cases where both scale parameters and variances exist since
δ_2/δ_1 = σ_a/σ_ε. One advantage to a procedure that estimates γ is that an
estimate of the desired quantity can be found in those cases where
variances, and thus ρ, do not exist.
Ideally, we would like to have one sample consisting of the ε_ij and
another consisting of the a_i and then use the Ansari-Bradley statistic
to give us information about θ which could be transformed into a
confidence interval for γ. However, knowing only the Z_ij, the
individual ε_ij and a_i are not observable. What we can do is formulate a
sample of size n that, as n → ∞, essentially behaves like the ε_ij from
treatment i and another sample of size k that, as n and k → ∞, mimics
the a_i.
The derivation of these two pseudo-samples follows. In the next
section we will show that the asymptotic distribution of the Ansari-
Bradley statistic calculated using these pseudo-samples is the same,
when θ = 1, as the asymptotic distribution of the Ansari-Bradley
statistic calculated using the actual ε_ij from treatment i and the
actual a_i.
We begin formation of the pseudo-samples by defining

Z̄_i. = n^{-1} Σ_j Z_ij,  Z̄_.. = (nk)^{-1} Σ_i Σ_j Z_ij,
ε̄_i. = n^{-1} Σ_j ε_ij,  ε̄_.. = (nk)^{-1} Σ_i Σ_j ε_ij,

and ā = k^{-1} Σ_i a_i. The two pseudo-samples we obtain are

(3.1.1)  X_1 = Z_i1 − Z̄_i. = ε_i1 − ε̄_i.     Y_1 = Z̄_1. − Z̄_.. = a_1 + ε̄_1. − ā − ε̄_..
         X_2 = Z_i2 − Z̄_i. = ε_i2 − ε̄_i.     Y_2 = Z̄_2. − Z̄_.. = a_2 + ε̄_2. − ā − ε̄_..
         ...                                  ...
         X_n = Z_in − Z̄_i. = ε_in − ε̄_i.     Y_k = Z̄_k. − Z̄_.. = a_k + ε̄_k. − ā − ε̄_..

Under the assumptions of Theorem 3.2.1, as n and k → ∞, ā, ε̄_..,
and the ε̄_i. converge in probability to zero. Therefore,
for large N, we would expect the X_j and the Y_i to behave like random
samples from F(x) and G(x) respectively.
Throughout this chapter we assume that n and k both tend to
infinity in such a way that λ_N ≡ n/N always satisfies the condition
λ_0 ≤ λ_N ≤ 1 − λ_0 for some 0 < λ_0 < 1/2. Obviously, N = n + k will
therefore tend to infinity. To facilitate discussions, we will simply
say that N → ∞.
Recall that δ_1 is a scale parameter for the ε_ij and δ_2 is a scale
parameter for the a_i. Let F*(x) and G*(x) be the distribution functions
for the X_j and Y_i respectively. In an asymptotic sense we can think of
δ_1 as being a scale parameter for the X_j and δ_2 as being a scale
parameter for the Y_i since

F*(x) = P(X_j ≤ x) = P(ε_ij − ε̄_i. ≤ x) → P(ε_ij ≤ x) = F(x) = D(x/δ_1)

and

G*(x) = P(Y_i ≤ x) = P(a_i + ε̄_i. − ā − ε̄_.. ≤ x) → P(a_i ≤ x) = G(x) = D(x/δ_2)

as N → ∞. These asymptotic equivalences can be justified by noting again
that the means converge in probability to zero and using Slutsky's
Theorem (Serfling 1980, Page 19).
Samples with the same asymptotic properties as those in (3.1.1) can
be obtained in other ways. One approach is to use medians rather than
means. Define Ẑ_i. = median of (Z_i1, Z_i2, ..., Z_in),
Ẑ = median of (Ẑ_1., Ẑ_2., ..., Ẑ_k.), ε̂_i. = median of (ε_i1, ε_i2, ..., ε_in),
and â = median of (a_1 + ε̂_1., a_2 + ε̂_2., ..., a_k + ε̂_k.). We could then
obtain the pseudo-samples

(3.1.2)  X′_1 = Z_i1 − Ẑ_i. = ε_i1 − ε̂_i.     Y′_1 = Ẑ_1. − Ẑ = a_1 + ε̂_1. − â
         X′_2 = Z_i2 − Ẑ_i. = ε_i2 − ε̂_i.     Y′_2 = Ẑ_2. − Ẑ = a_2 + ε̂_2. − â
         ...                                   ...
         X′_n = Z_in − Ẑ_i. = ε_in − ε̂_i.     Y′_k = Ẑ_k. − Ẑ = a_k + ε̂_k. − â

Using reasoning similar to that used with the samples in (3.1.1),
under the assumptions of Theorem 3.2.2, we conclude that these pseudo-
samples would be asymptotically equivalent to random samples from F(x)
and G(x) respectively.
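Both constructions are easily programmed; in the following illustrative
sketch (function name ours), treatment i supplies the X sample and the
flag use_medians selects between (3.1.1) and (3.1.2).

    import numpy as np

    def pseudo_samples(z, i=0, use_medians=False):
        # Pseudo-samples (3.1.1) (means) or (3.1.2) (medians) from the
        # k x n response matrix z; treatment i supplies the X sample.
        if use_medians:
            centre_i = np.median(z[i])         # Z-hat_i.
            treat = np.median(z, axis=1)       # Z-hat_1., ..., Z-hat_k.
            centre = np.median(treat)          # Z-hat = median of the Z-hat_i.
        else:
            centre_i = z[i].mean()             # Z-bar_i.
            treat = z.mean(axis=1)             # Z-bar_1., ..., Z-bar_k.
            centre = z.mean()                  # Z-bar_..
        x = z[i] - centre_i                    # mimics a sample of the eps_ij
        y = treat - centre                     # mimics a sample of the a_i
        return x, y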
Other methods of obtaining the two pseudo-samples could be used as
long as they provide samples with the correct asymptotic properties,
the estimates involved converge to zero, and the estimates satisfy other
criteria that will be examined in Section 3.2.
In the next section, we turn our attention to the asymptotic
distribution of the Ansari-Bradley statistic calculated using the
pseudo-samples. We show that, when θ = 1, this distribution is
equivalent to the asymptotic distribution of the Ansari-Bradley
statistic calculated using the actual ε_ij and a_i.
3.2 Asymptotic Distribution of the Ansari-Bradley Statistic
Using Pseudo-Samples
Consider the pseudo-samples described in (3.1.1). For simplicity,
but with no loss of generality, we will assume that we are using the ε_ij
from treatment one. Let ε′ = (ε_11, ε_12, ..., ε_1n), a′ = (a_1, a_2, ..., a_k),
X′ = (X_1, X_2, ..., X_n), Y′ = (Y_1, Y_2, ..., Y_k), and let W(ε, a) denote the
Ansari-Bradley statistic calculated using the samples ε and a. We now
derive an expression for W(ε, a) that is similar to a Chernoff and
Savage (1958) expansion of a statistic. We do not use a direct
application of the Chernoff and Savage procedure because the Ansari-
Bradley statistic does not satisfy all the assumptions necessary for
implementing the Chernoff and Savage expansion. However, using an
alternative expansion that produces a similar expression to that
obtained by Chernoff and Savage allows us to make use of some of their
results.
For any event A, let I(A) be 1 if event A occurs and 0 if event A
does not occur. We then define empirical distribution functions for the
ε_1j and the a_i as F_n(x) = n^{-1} Σ_j I(ε_1j ≤ x) and G_k(x) = k^{-1} Σ_i I(a_i ≤ x).
With λ_N as described in Section 3.1, we then define the combined sample
empirical distribution function as H_N(x) = λ_N F_n(x) + (1−λ_N)G_k(x) and let
the combined population distribution function be denoted

(3.2.1)  H(x) = λ_N F(x) + (1−λ_N)G(x).
We define a function J_N[H_N(x)] to be

(3.2.2)  J_N[H_N(x)] = (2N)^{-1} + |1/2 + (2N)^{-1} − H_N(x)|  if N is even
         J_N[H_N(x)] = |1/2 + (2N)^{-1} − H_N(x)|              if N is odd,

a function J[H(x)] as

(3.2.3)  J[H(x)] = |1/2 − H(x)|,

and

(3.2.4)  J′[H(x)] = −1 if H(x) < 1/2
         J′[H(x)] =  1 if H(x) > 1/2,

noting that J′[H(x)] is the derivative of J[H(x)] with respect to H(x)
at all points except H(x) = 1/2 (where the derivative is not defined).
We make J′(1/2) = −1 by definition so J′[H(x)] will be defined
everywhere.
Let

(3.2.5)  T_N(ε, a) = (nN)^{-1}W(ε, a) = ∫ J_N[H_N(x)] dF_n(x)

be an alternative representation of the Ansari-Bradley statistic and let
A = ∫ J[H(x)] dF(x),

(3.2.7)  B_1N = ∫ J[H(x)] d[F_n(x) − F(x)],

(3.2.8)  B_2N = ∫ [H_N(x) − H(x)] J′[H(x)] dF(x),

(3.2.9.a)  C_1N = λ_N ∫ [F_n(x) − F(x)] J′[H(x)] d[F_n(x) − F(x)],

(3.2.9.b)  C_2N = (1−λ_N) ∫ [G_k(x) − G(x)] J′[H(x)] d[F_n(x) − F(x)],

(3.2.9.c)  C_3N = ∫ (J_N[H_N(x)] − J[H_N(x)]) dF_n(x).
Then, by adding and subtracting appropriate quantities,

T_N(ε, a) = ∫ J_N[H_N(x)] dF_n(x)
 = ∫ J[H(x)] dF(x) + ∫ J[H(x)] d[F_n(x) − F(x)]
 + ∫ (J_N[H_N(x)] − J[H_N(x)]) dF_n(x)
 + ∫ (J[H_N(x)] − J[H(x)]) dF_n(x)
 = A + B_1N + C_3N + ∫ (J[H_N(x)] − J[H(x)]) dF_n(x).
Now, using (3.2.3), J[H_N(x)] − J[H(x)] is equal to

H(x) − H_N(x)       if 0 ≤ H_N(x) < 1/2 and 0 ≤ H(x) < 1/2
H_N(x) − H(x)       if 1/2 ≤ H_N(x) ≤ 1 and 1/2 ≤ H(x) ≤ 1
1 − H_N(x) − H(x)   if 0 ≤ H_N(x) < 1/2 and 1/2 ≤ H(x) ≤ 1
H_N(x) + H(x) − 1   if 1/2 ≤ H_N(x) ≤ 1 and 0 ≤ H(x) < 1/2.

Since we assume that F(x) and G(x) are such that F(0) = 1/2 = G(0), it
follows that H(0) = 1/2. Thus, using (3.2.4), J[H_N(x)] − J[H(x)] is
equal to

(3.2.6)  [H_N(x)−H(x)]J′[H(x)] + 0                       if 0 ≤ H_N(x) < 1/2 and x < 0
         [H_N(x)−H(x)]J′[H(x)] + 0                       if 1/2 ≤ H_N(x) ≤ 1 and x > 0
         [H_N(x)−H(x)]J′[H(x)] + [1−2H_N(x)]J′[H(x)]     if 0 ≤ H_N(x) < 1/2 and x > 0
         [H_N(x)−H(x)]J′[H(x)] + [1−2H_N(x)]J′[H(x)]     if 1/2 ≤ H_N(x) ≤ 1 and x < 0.

We define

K_N(x) = 0                       if 0 ≤ H_N(x) < 1/2 and x < 0
       = 0                       if 1/2 ≤ H_N(x) ≤ 1 and x > 0
       = [1−2H_N(x)]J′[H(x)]     if 0 ≤ H_N(x) < 1/2 and x > 0
       = [1−2H_N(x)]J′[H(x)]     if 1/2 ≤ H_N(x) ≤ 1 and x < 0

and

(3.2.9.d)  C_4N = ∫ K_N(x) dF_n(x).

Then J[H_N(x)] − J[H(x)] = [H_N(x)−H(x)]J′[H(x)] + K_N(x) and therefore

T_N(ε, a) = A + B_1N + C_3N + ∫ [H_N(x)−H(x)]J′[H(x)] dF_n(x) + C_4N
 = A + B_1N + C_3N + ∫ [H_N(x)−H(x)]J′[H(x)] dF(x)
 + ∫ [H_N(x)−H(x)]J′[H(x)] d[F_n(x)−F(x)] + C_4N
 = A + B_1N + B_2N + C_3N + λ_N ∫ [F_n(x)−F(x)]J′[H(x)] d[F_n(x)−F(x)]
 + (1−λ_N) ∫ [G_k(x)−G(x)]J′[H(x)] d[F_n(x)−F(x)] + C_4N

(3.2.10)  = A + B_1N + B_2N + C_1N + C_2N + C_3N + C_4N.
The terms C_1N through C_4N are shown to be of order o_p(N^{-1/2}) in
Appendix E. Since A, B_1N, and B_2N are equal to or analogous to
corresponding terms in a Chernoff and Savage (1958) expansion, we see
that T_N(ε, a) has an asymptotic normal distribution with mean

μ_N = ∫ J[H(x)] dF(x)

and variance

Nσ_N² = 2(1−λ_N)(∫∫_{x<y} G(x)[1−G(y)]J′[H(x)]J′[H(y)] dF(x)dF(y)
 + (1−λ_N)λ_N^{-1} ∫∫_{x<y} F(x)[1−F(y)]J′[H(x)]J′[H(y)] dG(x)dG(y)).

Under H_o: θ = 1, implying δ_1 = δ_2 and therefore F(x) = G(x) = H(x), it
can be shown that μ_N = 1/4 and σ_N² = k(48nN)^{-1}.
To prove that W(X, Y), where X and Y are as given in (3.1.1), has
the same asymptotic distribution as W(ε, a) when θ = 1, we look at

(3.2.11)  T_N*(X, Y) = (nN)^{-1}W(X, Y)

and show that it has the same asymptotic distribution as T_N(ε, a).
Remembering that F(x) and G(x) are the true distribution functions
for the ε_1j and the a_i respectively and that F_n(x) and G_k(x) are the
corresponding empirical distribution functions, we define

(3.2.12)  F_n*(x) = n^{-1} Σ_j I(X_j ≤ x),

(3.2.13)  G_k*(x) = k^{-1} Σ_i I(Y_i ≤ x),

(3.2.14)  H_N*(x) = λ_N F_n*(x) + (1−λ_N)G_k*(x),

(3.2.15)  A* = ∫ J[H(x)] dF(x),

(3.2.16)  B_1N* = ∫ J[H(x)] d[F_n*(x) − F(x)],

(3.2.17)  B_2N* = ∫ [H_N*(x) − H(x)] J′[H(x)] dF(x),

(3.2.18.a)  C_1N* = λ_N ∫ [F_n*(x) − F(x)] J′[H(x)] d[F_n*(x) − F(x)],

(3.2.18.b)  C_2N* = (1−λ_N) ∫ [G_k*(x) − G(x)] J′[H(x)] d[F_n*(x) − F(x)],

(3.2.18.c)  C_3N* = ∫ (J_N[H_N*(x)] − J[H_N*(x)]) dF_n*(x),

(3.2.18.d)  C_4N* = ∫ K_N*(x) dF_n*(x),

where

K_N*(x) = 0                       if 0 ≤ H_N*(x) < 1/2 and x < 0
        = 0                       if 1/2 ≤ H_N*(x) ≤ 1 and x > 0
        = [1−2H_N*(x)]J′[H(x)]    if 0 ≤ H_N*(x) < 1/2 and x > 0
        = [1−2H_N*(x)]J′[H(x)]    if 1/2 ≤ H_N*(x) ≤ 1 and x < 0

and H(x), J_N, J, and J′ are as defined in (3.2.1) through (3.2.4).
Expanding T_N*(X, Y) = ∫ J_N[H_N*(x)] dF_n*(x) in the same manner as we
expanded T_N(ε, a) produces

(3.2.19)  T_N*(X, Y) = A* + B_1N* + B_2N* + C_1N* + C_2N* + C_3N* + C_4N*.
Recall that F(x), G(x), and thus H(x), are assumed to represent
distributions symmetric about zero. This implies that J[H(0)] =
J(1/2) = 0. Therefore, defining

(3.2.20)  B̄(x) = ∫_0^x J′[H(y)] dF(y)

and

(3.2.21)  B(x) = ∫_0^x J′[H(y)] dG(y),

B_1N* can be written as

∫ (∫_0^x J′[H(y)] dH(y)) d[F_n*(x) − F(x)]
 = ∫ (∫_0^x J′[H(y)] d[λ_N F(y) + (1−λ_N)G(y)]) d[F_n*(x) − F(x)]
 = λ_N ∫ B̄(x) d[F_n*(x) − F(x)] + (1−λ_N) ∫ B(x) d[F_n*(x) − F(x)].

Integrating B_2N* by parts produces

[H_N*(x) − H(x)]B̄(x) |_{−∞}^{∞} − ∫ B̄(x) d[H_N*(x) − H(x)]
 = −(λ_N ∫ B̄(x) d[F_n*(x) − F(x)] + (1−λ_N) ∫ B̄(x) d[G_k*(x) − G(x)]).

Thus, B_1N* + B_2N* may be expressed as

(1−λ_N)(∫ B(x) d[F_n*(x) − F(x)] − ∫ B̄(x) d[G_k*(x) − G(x)])

(3.2.22)  = (1−λ_N)(n^{-1} Σ_j B(ε_1j − ε̄_1.) − E[B(ε_11)]
 − k^{-1} Σ_i B̄(a_i + ε̄_i. − ā − ε̄_..) + E[B̄(a_1)]).

In the same manner it can be shown that B_1N + B_2N (given in (3.2.7)
and (3.2.8)) can be written as

(3.2.23)  (1−λ_N)(n^{-1} Σ_j B(ε_1j) − E[B(ε_11)] − k^{-1} Σ_i B̄(a_i) + E[B̄(a_1)]).
Recalling the form of J′[H(x)] given in (3.2.4) and the definitions
of B̄(x) and B(x) given in (3.2.20) and (3.2.21), we see that

(3.2.24)  B̄(x) = 1/2 − F(x) if x < 0
          B̄(x) = F(x) − 1/2 if x > 0

and

(3.2.25)  B(x) = 1/2 − G(x) if x < 0
          B(x) = G(x) − 1/2 if x > 0.

We then define

(3.2.26)  B̄′(x) = −f(x) if x < 0
          B̄′(x) =  f(x) if x > 0

and

(3.2.27)  B′(x) = −g(x) if x < 0
          B′(x) =  g(x) if x > 0,

noting that B̄′(x) and B′(x) are the respective derivatives of B̄(x) and
B(x), except at x = 0. We make B̄′(0) = −f(0) and B′(0) = −g(0) by
definition so the functions will be defined everywhere.
The proof that the asymptotic distribution of T_N*(X, Y) is the same
as that of T_N(ε, a), when θ = 1, is given in the following theorem.
Theorem 3.2.1. Using the model and assumptions described in
Section 3.1 and the pseudo-samples given in (3.1.1), if
(i) θ = δ_2/δ_1 = 1 (WLOG assume δ_1 = δ_2 = 1),
(ii) F′(x) = f(x) and |F″(x)| = |f′(x)| are continuous and
bounded by constants B_1 and B_2 respectively,
(iii) f(0) > 0,
(iv) ∫ x²f(x) dx < ∞,
then

(3.2.28)  N^{1/2}[T_N*(X, Y) − 1/4](48nk^{-1})^{1/2} →d N(0,1) as N → ∞.

Proof: First note (i) implies F(x) = G(x) = H(x) and (iv) implies
N^{1/2}ā = O_p(1). Assumption (iv) also implies that for every i,
N^{1/2}ε̄_i. = O_p(1) and E[(N^{1/2}ε̄_i.)²] is uniformly bounded. Also note that
the Glivenko-Cantelli Theorem (Serfling 1980, Page 61) states that
sup_x |F_n(x) − F(x)| = o_p(1). We begin the proof of the theorem by
proving two lemmas.
Lemma 3.2.1. Under the assumptions of Theorem 3.2.1,

n^{-1}N^{1/2} Σ_j [B(ε_1j − ε̄_1.) − B(ε_1j)] = o_p(1).

Proof: Recalling the form of B(x) from (3.2.25) we see that
B(ε_1j − ε̄_1.) − B(ε_1j) is equal to

G(ε_1j) − G(ε_1j − ε̄_1.)        if ε_1j < 0 and ε_1j − ε̄_1. ≤ 0
G(ε_1j − ε̄_1.) − G(ε_1j)        if ε_1j > 0 and ε_1j − ε̄_1. > 0
1 − G(ε_1j − ε̄_1.) − G(ε_1j)    if ε_1j > 0 and ε_1j − ε̄_1. ≤ 0
G(ε_1j − ε̄_1.) + G(ε_1j) − 1    if ε_1j ≤ 0 and ε_1j − ε̄_1. > 0,

which can also be written as

−[G(ε_1j − ε̄_1.) − G(ε_1j)] + 0                      if ε_1j < 0 and ε_1j − ε̄_1. ≤ 0
 [G(ε_1j − ε̄_1.) − G(ε_1j)] + 0                      if ε_1j > 0 and ε_1j − ε̄_1. > 0
 [G(ε_1j − ε̄_1.) − G(ε_1j)] + [1 − 2G(ε_1j − ε̄_1.)]  if ε_1j > 0 and ε_1j − ε̄_1. ≤ 0
−[G(ε_1j − ε̄_1.) − G(ε_1j)] + [2G(ε_1j − ε̄_1.) − 1]  if ε_1j ≤ 0 and ε_1j − ε̄_1. > 0.

Defining

(3.2.29)  B_N(X_j) = 0                      if ε_1j < 0 and ε_1j − ε̄_1. ≤ 0
                   = 0                      if ε_1j > 0 and ε_1j − ε̄_1. > 0
                   = 1 − 2G(ε_1j − ε̄_1.)    if ε_1j > 0 and ε_1j − ε̄_1. ≤ 0
                   = 2G(ε_1j − ε̄_1.) − 1    if ε_1j ≤ 0 and ε_1j − ε̄_1. > 0

and noting that

G(ε_1j − ε̄_1.) − G(ε_1j) = (−ε̄_1.)g(ε_1j) + (1/2)(ε̄_1.)²g′(ε_1j + T_j),

where T_j is between 0 and −ε̄_1., we use the form of B′(x) (3.2.27) to
obtain

n^{-1}N^{1/2} Σ_j [B(ε_1j − ε̄_1.) − B(ε_1j)]
(3.2.30)  = −(N^{1/2}ε̄_1.) n^{-1} Σ_j B′(ε_1j)
 + (1/2)(ε̄_1.)(N^{1/2}ε̄_1.) n^{-1} Σ_j J′[G(ε_1j)]g′(ε_1j + T_j)
 + n^{-1}N^{1/2} Σ_j B_N(X_j).

Now, E[B′(ε_11)] = ∫_{−∞}^{0} −f(x) dF(x) + ∫_{0}^{∞} f(x) dF(x) = 0 since f(x) is
symmetric about zero. Also, E[B′²(ε_11)] = ∫ f²(x) dF(x) is bounded since
f(x) is bounded. Therefore, since the ε_1j are independent, we apply the
Markov inequality (Chow and Teicher 1978, Page 88) to obtain
n^{-1} Σ_j B′(ε_1j) = o_p(1). The assumptions of the theorem imply ε̄_1. = o_p(1),
N^{1/2}ε̄_1. = O_p(1), and |J′[G(ε_1j)]g′(ε_1j + T_j)| ≤ B_2. Thus, the sum of the
first two terms of (3.2.30) is O_p(1)o_p(1) + o_p(1)O_p(1)O_p(1) = o_p(1). To
complete the proof of the lemma we must show n^{-1}N^{1/2} Σ_j B_N(X_j) = o_p(1).
Using (3.2.29) we see that |n^{-1}N^{1/2} Σ_j B_N(X_j)| is equal to

|n^{-1}N^{1/2} Σ_j ([1 − 2G(ε_1j − ε̄_1.)]I(ε_1j > 0)I(ε_1j − ε̄_1. ≤ 0)
 + [2G(ε_1j − ε̄_1.) − 1]I(ε_1j ≤ 0)I(ε_1j − ε̄_1. > 0))|

≤ 2n^{-1}N^{1/2} Σ_j ([G(0) − G(ε_1j − ε̄_1.)]I(0 < ε_1j ≤ ε̄_1.)
 + [G(ε_1j − ε̄_1.) − G(0)]I(ε̄_1. < ε_1j ≤ 0))

= 2n^{-1}N^{1/2} Σ_j |(ε_1j − ε̄_1.)g(ε_1j − ε̄_1. + T′_j)|[I(0 < ε_1j ≤ ε̄_1.) + I(ε̄_1. < ε_1j ≤ 0)],
where T′_j is between 0 and −(ε_1j − ε̄_1.),

≤ 2B_1 N^{1/2}|ε̄_1.| n^{-1} Σ_j [I(0 < ε_1j ≤ ε̄_1.) + I(ε̄_1. < ε_1j ≤ 0)]

= 2B_1 N^{1/2}|ε̄_1.| |F_n(0) − F_n(ε̄_1.)|

≤ [2B_1 N^{1/2}|ε̄_1.|][|F_n(0) − F(0)| + |F(0) − F(ε̄_1.)| + |F(ε̄_1.) − F_n(ε̄_1.)|]

= O_p(1)[o_p(1) + o_p(1) + o_p(1)]

= o_p(1),

using the Glivenko-Cantelli Theorem, the continuity of F(x), the fact
that ε̄_1. = o_p(1), and, on the indicated events, |ε_1j − ε̄_1.| ≤ |ε̄_1.|
together with g = f ≤ B_1 under (i). This completes the proof of the lemma.
Lemma 3.2.2. Under the assumptions of Theorem 3.2.1,

k^{-1}N^{1/2} Σ_i [B̄(a_i + ε̄_i. − ā − ε̄_..) − B̄(a_i)] = o_p(1).

Proof: We begin by recalling the forms of B̄(x) and B̄′(x) given in
(3.2.24) and (3.2.26). Proceeding as in the proof of Lemma 3.2.1, we
define

(3.2.31)  B̄_N(Y_i) = 0                                if a_i < 0 and a_i + ε̄_i. − ā − ε̄_.. ≤ 0
                   = 0                                if a_i > 0 and a_i + ε̄_i. − ā − ε̄_.. > 0
                   = 1 − 2F(a_i + ε̄_i. − ā − ε̄_..)    if a_i > 0 and a_i + ε̄_i. − ā − ε̄_.. ≤ 0
                   = 2F(a_i + ε̄_i. − ā − ε̄_..) − 1    if a_i ≤ 0 and a_i + ε̄_i. − ā − ε̄_.. > 0

and obtain

k^{-1}N^{1/2} Σ_i [B̄(a_i + ε̄_i. − ā − ε̄_..) − B̄(a_i)]
(3.2.32)  = −N^{1/2}(ā + ε̄_..) k^{-1} Σ_i B̄′(a_i) + k^{-1} Σ_i (N^{1/2}ε̄_i.)B̄′(a_i)
 + N^{1/2}(2k)^{-1} Σ_i (ε̄_i. − ā − ε̄_..)² J′[F(a_i)]f′(a_i + T_i)
 + k^{-1}N^{1/2} Σ_i B̄_N(Y_i),

where T_i is between 0 and ε̄_i. − ā − ε̄_... To prove the lemma we show that
each of the four terms in (3.2.32) is o_p(1).
Since N^{1/2}ā = O_p(1) and N^{1/2}ε̄_.. = o_p(1), it follows that
N^{1/2}(ā + ε̄_..) = O_p(1) and therefore the first term in (3.2.32) is seen to
be o_p(1) following the same argument used to show
(N^{1/2}ε̄_1.) n^{-1} Σ_j B′(ε_1j) was o_p(1) in the proof of Lemma 3.2.1.
In the second term of (3.2.32) let A_N = k^{-1} Σ_i (N^{1/2}ε̄_i.)B̄′(a_i). Since
the ε̄_i. and the B̄′(a_i) are independent with zero means, we see that
E(A_N) = 0. We also note that E(A_N²) = Nk^{-1}E[(ε̄_1.)²B̄′²(a_1)]. The
assumptions of the theorem guarantee that |B̄′(a_1)| is bounded and that
E(ε̄_1.²) = o(1). It then follows that E(A_N²) = o(1) and thus, using the
Markov inequality, that A_N = o_p(1).
We now turn attention to the third term of (3.2.32). From
assumption (ii) and the definition of J′, we see that
|J′[F(a_i)]f′(a_i + T_i)| ≤ B_2. We can then write

|N^{1/2}(2k)^{-1} Σ_i (ε̄_i. − ā − ε̄_..)² J′[F(a_i)]f′(a_i + T_i)|
 ≤ B_2 N^{1/2}(2k)^{-1} Σ_i (ε̄_i. − ā − ε̄_..)².

Expanding the upper bound we obtain

(3.2.33)  (B_2/2)[N^{1/2}(ā + ε̄_..)(ā + ε̄_..) + k^{-1} Σ_i N^{1/2}(ε̄_i.)²
 − 2N^{1/2}(ā + ε̄_..) k^{-1} Σ_i ε̄_i.].

As before, N^{1/2}(ā + ε̄_..) = O_p(1). Also, k^{-1} Σ_i ε̄_i. is o_p(1) using the
Markov inequality. In the middle term of (3.2.33) let
A_N′ = k^{-1} Σ_i N^{1/2}(ε̄_i.)². Since the ε̄_i. are independent and identically
distributed, assumption (iv) assures us that E(A_N′) = E(N^{1/2}(ε̄_1.)²) = o(1).
Therefore, we see that A_N′ = o_p(1) by again applying the Markov
inequality. Combining these results we see that (3.2.33) is equal to
(B_2/2)[O_p(1)o_p(1) + o_p(1) − O_p(1)o_p(1)] = o_p(1).
Recalling the definition of B̄_N(Y_i) given in (3.2.31), we express
the fourth term of (3.2.32) as

|k^{-1}N^{1/2} Σ_i ([1 − 2F(a_i + ε̄_i. − ā − ε̄_..)]I(a_i > 0)I(a_i + ε̄_i. − ā − ε̄_.. ≤ 0)
 + [2F(a_i + ε̄_i. − ā − ε̄_..) − 1]I(a_i ≤ 0)I(a_i + ε̄_i. − ā − ε̄_.. > 0))|

≤ 2k^{-1}N^{1/2} Σ_i ([F(a_i) − F(a_i + ε̄_i. − ā − ε̄_..)]I(0 < a_i ≤ ā + ε̄_.. − ε̄_i.)
 + [F(a_i + ε̄_i. − ā − ε̄_..) − F(a_i)]I(ā + ε̄_.. − ε̄_i. < a_i ≤ 0))

= 2k^{-1}N^{1/2} Σ_i |(ε̄_i. − ā − ε̄_..)f(a_i + T′_i)|[I(0 < a_i ≤ ā + ε̄_.. − ε̄_i.)
 + I(ā + ε̄_.. − ε̄_i. < a_i ≤ 0)],
where T′_i is between 0 and ε̄_i. − ā − ε̄_..,

(3.2.34)  ≤ 2B_1 N^{1/2}|ā + ε̄_..| k^{-1} Σ_i [I(0 < a_i ≤ ā + ε̄_.. − ε̄_i.) + I(ā + ε̄_.. − ε̄_i. < a_i ≤ 0)]
 + 2B_1 N^{1/2} k^{-1} Σ_i |ε̄_i.|[I(0 < a_i ≤ ā + ε̄_.. − ε̄_i.) + I(ā + ε̄_.. − ε̄_i. < a_i ≤ 0)].

To complete the proof of the lemma we must show both terms in (3.2.34)
are o_p(1).
Beginning with the first term, we have seen previously that
N^{1/2}|ā + ε̄_..| = O_p(1). Since the (a_i, ε̄_i.) are identically distributed for
i = 1,2,...,k, we see that

E(k^{-1} Σ_i [I(0 < a_i ≤ ā + ε̄_.. − ε̄_i.) + I(ā + ε̄_.. − ε̄_i. < a_i ≤ 0)])
 = P(0 < a_1 ≤ ā + ε̄_.. − ε̄_1.) + P(ā + ε̄_.. − ε̄_1. < a_1 ≤ 0).

Using the assumptions of the theorem we have also shown that
N^{1/2}(ā + ε̄_.. − ε̄_1.) = O_p(1). This implies that for any Δ > 0, there exists
a bounded D > 0, such that P(−D ≤ N^{1/2}(ā + ε̄_.. − ε̄_1.) ≤ D) ≥ 1 − Δ/2 for
large N. Therefore,

P(0 < a_1 ≤ ā + ε̄_.. − ε̄_1.) ≤ P(0 < a_1 ≤ D/N^{1/2}) + P(N^{1/2}(ā + ε̄_.. − ε̄_1.) > D)

and thus, for large N,

P(0 < a_1 ≤ ā + ε̄_.. − ε̄_1.) ≤ G(D/N^{1/2}) − G(0) + Δ/2.

Since D/N^{1/2} = o(1) and G(x) is continuous, the above quantity can be
made arbitrarily small by choosing large enough N. Thus, we have shown
that P(0 < a_1 ≤ ā + ε̄_.. − ε̄_1.) = o(1), and in the same manner it can be
shown that P(ā + ε̄_.. − ε̄_1. < a_1 ≤ 0) = o(1). Applying the Markov
inequality, we see that the first term of (3.2.34) is 2B_1 O_p(1)o_p(1) = o_p(1).
For the second term look at

E(N^{1/2} k^{-1} Σ_i |ε̄_i.|[I(0 < a_i ≤ ā + ε̄_.. − ε̄_i.) + I(ā + ε̄_.. − ε̄_i. < a_i ≤ 0)])
 ≤ N^{1/2} k^{-1} E(|ε̄_1.|) + N^{1/2}(k−1)k^{-1} E(|ε̄_1.|[I(0 < a_2 ≤ ā + ε̄_.. − ε̄_2.)
 + I(ā + ε̄_.. − ε̄_2. < a_2 ≤ 0)]).

Let S = I(0 < a_2 ≤ ā + ε̄_.. − ε̄_2.) + I(ā + ε̄_.. − ε̄_2. < a_2 ≤ 0). Applying
the Cauchy-Schwarz inequality (Chow and Teicher 1978, Page 104), the
expectation of interest is less than or equal to

N^{1/2} k^{-1} E(|ε̄_1.|) + N^{1/2}(k−1)k^{-1}[E(ε̄_1.²)]^{1/2}[E(S²)]^{1/2}
 = o(1) + O(1)o(1) = o(1)

since E(ε̄_1.²) = O(N^{-1}) and E(S²) ≤ 2E(S) ≤ 2[P(0 < a_2 ≤ ā + ε̄_.. − ε̄_2.)
+ P(ā + ε̄_.. − ε̄_2. < a_2 ≤ 0)], which is o(1) as shown above. Thus, again
applying the Markov inequality, we see that the second term in (3.2.34)
is o_p(1), completing the proof of the lemma.
To prove Theorem 3.2.1 we recall that under H_o: θ = 1,

N^{1/2}[T_N(ε, a) − 1/4](48nk^{-1})^{1/2} →d N(0,1) as N → ∞.

Theorem 3.2.1 is established by showing

(3.2.35)  N^{1/2}[T_N*(X, Y) − T_N(ε, a)] = o_p(1)

and applying Slutsky's Theorem.
Using the representations of T_N(ε, a) and T_N*(X, Y) given in (3.2.10)
and (3.2.19) respectively, we write the LHS of (3.2.35) as

N^{1/2}(A* + B_1N* + B_2N* + Σ_{h=1}^{4} C_hN* − A − B_1N − B_2N − Σ_{h=1}^{4} C_hN).

In Appendix E it is shown that the C_hN and C_hN* terms are all o_p(N^{-1/2}).
Since A* = A, for large N we can write the LHS of (3.2.35) as

N^{1/2}(B_1N* + B_2N* − B_1N − B_2N) + o_p(1).

Using (3.2.22) and (3.2.23) this quantity is equal to

(1−λ_N)N^{1/2}(n^{-1} Σ_j [B(ε_1j − ε̄_1.) − B(ε_1j)]
 − k^{-1} Σ_i [B̄(a_i + ε̄_i. − ā − ε̄_..) − B̄(a_i)]) + o_p(1).

The theorem follows by applying Lemmas 3.2.1 and 3.2.2.
Theorem 3.2.2. Using the model and assumptions described in
Section 3.1 and the pseudo-samples given in (3.1.2), if assumptions (i),
(ii), and (iii) of Theorem 3.2.1 are satisfied and if

(iva)  lim inf_{x→∞} −ln[1−F(x)]/(2 ln x) > 0

or

(ivb)  ∫ |x|^η f(x) dx < ∞ for some η > 0,

then

N^{1/2}[T_N*(X′, Y′) − 1/4](48nk^{-1})^{1/2} →d N(0,1) as N → ∞.

Indication of Proof: The assumptions of Theorem 3.2.1, where sample
means were used to form the pseudo-samples, implied that ā + ε̄_.. = o_p(1)
and N^{1/2}(ā + ε̄_..) = O_p(1). Also, for every i = 1,2,...,k, that
N^{1/2}ε̄_i. = O_p(1), ε̄_i. = o_p(1), and E[(N^{1/2}ε̄_i.)²] is uniformly bounded.
These facts were instrumental in the proof of the theorem. In the
present theorem, where sample medians are used to form the pseudo-
samples, the assumptions produce similar results for â and the ε̂_i., i =
1,2,...,k (defined in Section 3.1).
Assumption (iii) assures us that N^{1/2}â = O_p(1) (see Proposition
E.10 in Appendix E) and N^{1/2}ε̂_i. = O_p(1) for every i (Serfling 1980, Page
77). Anderson (1981, Propositions 1 and 2) showed that either of (iva)
and (ivb) is a necessary and sufficient condition for E[(N^{1/2}ε̂_i.)²] to be
uniformly bounded for every i (under the assumption that f(x) is
symmetric about zero, Anderson's conditions on α_+ and α_− in his
Proposition 1 are equivalent to (iva)). For distributions with finite
first moment, (ivb) is obviously satisfied. For the Cauchy distribution
it can be shown that (iva) is true and (ivb) is true for 0 < η < 1.
Since the sample median is a consistent estimate for the population
median and since F(x) represents a distribution symmetric about zero, we
see that â = o_p(1) and ε̂_i. = o_p(1) for i = 1,2,...,k.
Therefore, the proof of Theorem 3.2.2 is analogous to the proof of
Theorem 3.2.1 with ā + ε̄_.. replaced by â and ε̄_i. replaced by ε̂_i. for
every i.
The proofs that appear in Appendix E regarding the negligibility of
the C_hN* terms are given utilizing the assumptions of Theorem 3.2.1 and
using sample means to form the pseudo-samples. Corresponding proofs
using the assumptions of Theorem 3.2.2 and sample medians to form the
pseudo-samples are analogous.
Whether to obtain samples like those in (3.1.1) or (3.1.2) will
depend for the most part on what is known or believed about the actual
distributions of the ε_ij and a_i. For some distributions, the Cauchy
distribution for example, second moments do not exist, so samples
obtained using medians as in (3.1.2) would be used since the assumptions
for Theorem 3.2.1, which pertain to pseudo-samples constructed with
sample means, are not met. In those cases where either means or medians
could be used, the size of the tails of the distributions would be an
important factor in choosing how to obtain the samples. For
distributions with heavy tails, when extreme observations are more
likely, samples involving medians may be preferred since the median is
less affected by extreme observations than the mean. For distributions
with lighter tails, like the normal distribution, the mean may be
preferred over the median since in these cases the mean is more
efficient (Serfling 1980, Page 86).
Theorems similar to Theorems 3.2.1 and 3.2.2 could possibly be
proven if the pseudo-samples involved are obtained by using estimates
that have the same type of large sample properties as the sample mean
and sample median. Adjustments in the assumptions may have to be made
in these cases to ensure that the estimates meet the requirements for
proof of a theorem analogous to Theorem 3.2.1 or Theorem 3.2.2.
3.3 Asymptotic Confidence Intervals Using the Modified
Ansari-Bradley Statistic
If we could observe the actual values of the ε_1j and a_i as our two
samples we could use the procedure developed by Bauer (1972) to
construct an exact confidence interval for θ = δ_2/δ_1 or any function of
θ, such as γ = θ²/(θ² + 1) = δ_2²/(δ_1² + δ_2²). Using the values of
W(ε, a) for which θ = 1 is not rejected, Bauer derives a confidence
interval for θ where the endpoints are particular order statistics of
the subset of ratios a_i/ε_1j which are greater than zero. The choices of
the order statistics, and hence the confidence coefficient of the
interval, are derived using the tabled distribution of the Ansari-
Bradley statistic.
Without the actual ε_1j and a_i as our two samples we cannot obtain
an exact confidence interval for θ. However, we can construct an
asymptotic confidence interval using a procedure of Sen (1966) that uses
the results of Chernoff and Savage (1958), Hodges and Lehmann (1963),
Lehmann (1963), and Sen (1963) among others.
Using Sen's procedure for constructing an asymptotic confidence
interval for θ requires a statistic, which we shall call S_N(X, Y), such
that S_N(θX, Y) is monotonically increasing in θ and, when properly
standardized, has an asymptotic normal distribution. These conditions
being satisfied, the endpoints of an asymptotic 100(1−α)% confidence
interval for θ are

(3.3.1)  θ̂_LN = Inf{θ: Z_N(θX, Y) > −Z_{α/2}} and
         θ̂_UN = Sup{θ: Z_N(θX, Y) < Z_{α/2}},

where Z_N is the standardized version of S_N(θX, Y) and Z_{α/2} is the
(1−α/2)th percentile of the standard normal distribution.
We can apply Sen's procedure using the modified Ansari-Bradley
statistic W(X, Y) where X and Y are samples of observations as in (3.1.1)
or (3.1.2). Recalling the description of W in Section 3.1 it is easily
seen that W(θX, Y) is monotonically increasing in θ. Defining

(3.3.2)  Z_N(θX, Y) = [W(θX, Y) − nN/4](nkN/48)^{-1/2},

it follows from Theorem 3.2.1 or Theorem 3.2.2 that Z_N(θX, Y) has an
asymptotic standard normal distribution. Thus, the requirements for the
use of Sen's procedure are met and the endpoints of the desired interval
are as described in (3.3.1).
In order to derive computational formulas for θ̂_LN and θ̂_UN in our
case, we will use a representation of W(X, Y) introduced by Bhattacharyya
(1977). Bhattacharyya's representation uses ratios of observations from
the two samples in much the same way as in Bauer's (1972) exact
confidence interval procedure.
To begin, the X_j and Y_i are adjusted by subtracting the combined
sample median. This centers the combined sample around zero but does
not change the value of W(X, Y) since all observations are shifted
equally. If we let m̂ be the combined sample median then we are really
dealing with the samples X_j − m̂ and Y_i − m̂. For ease of exposition we
suppress this fact by continuing to refer to the samples as the
vectors X and Y.
We now define some notation as in Bhattacharyya (1977):

minW(X, Y) = minimum possible value of W(X, Y), attained
             when all the X_j have the smallest ranks,
relevant pair: a pair (X_j, Y_i) where X_j and Y_i have
             the same sign,
(3.3.3)  p(X, Y) = number of relevant pairs in the two observed
             samples,
p′(X, Y) = number of relevant pairs where X_j/Y_i > 1,
p_max(X, Y) = maximum possible number of relevant pairs.

Bhattacharyya proved that W(X, Y) can be written in terms of these
quantities through the expression

(3.3.4)  p′(X, Y) + (1/2)[p_max(X, Y) − p(X, Y)] + minW(X, Y) = W(X, Y).

For ease of exposition we will refer to the quantities defined in
(3.3.3) as simply minW, p, p′, and p_max. We also note that if
N = n + k is odd, one of the observations in the combined sample will be
zero after subtracting m̂. In this case (3.3.4) is obtained by
eliminating this observation from consideration in forming the relevant
pairs.
The quantities minW and p_max, defined in (3.3.3), are constants
that depend on n and k. If n and k are both even, minW = n(n+2)/4 and
p_max/2 = nk/4. If n and k are both odd, minW = (n+1)²/4 and p_max/2 =
(nk−1)/4. If n is odd and k is even, minW = (n−1)(n+1)/4 and p_max/2 =
k(n−1)/4 if m̂ is an X while minW = (n+1)²/4 and p_max/2 = n(k−1)/4 if
m̂ is a Y. If n is even and k is odd, minW = n²/4 and p_max/2 = k(n−1)/4
if m̂ is an X while minW = n(n+2)/4 and p_max/2 = n(k−1)/4 if m̂ is a Y.
In all of these cases, for large N (which is what we are interested in)
minW behaves like n²/4 and p_max/2 behaves like nk/4. Thus, for large N,
minW + p_max/2 = nN/4.
We can now derive the computational formulas for the endpoints
(3.3.1) of the interval obtained using Sen's (1966) procedure. First we
look at θ̂_LN:

θ̂_LN = Inf{θ: Z_N(θX, Y) > −Z_{α/2}}
 = Inf{θ: [W(θX, Y) − nN/4](nkN/48)^{-1/2} > −Z_{α/2}}
 = Inf{θ: W(θX, Y) > nN/4 − Z_{α/2}(nkN/48)^{1/2}},

which, using (3.3.4), can be written as

Inf{θ: p′ + p_max/2 − p/2 + minW > nN/4 − Z_{α/2}(nkN/48)^{1/2}}

and for large N is equivalent to

Inf{θ: p′ > p/2 − Z_{α/2}(nkN/48)^{1/2}}.

Recalling the definitions of p′ and p from (3.3.3) and their dependence
on the vectors of observations θX and Y we obtain

θ̂_LN = Inf{θ: #(θX_j/Y_i > 1) > p/2 − Z_{α/2}(nkN/48)^{1/2}}
 = Inf{θ: #(Y_i/X_j < θ) > p/2 − Z_{α/2}(nkN/48)^{1/2}}
 = Inf{θ: more than p/2 − Z_{α/2}(nkN/48)^{1/2} of the positive
   (Y_i/X_j) are less than θ}.

If we order the positive (Y_i/X_j) we can apply the above expression to
see that

(3.3.5)  θ̂_LN = the {p/2 − Z_{α/2}(nkN/48)^{1/2}} + 1 order statistic
         of the positive (Y_i/X_j),

where {x} = the greatest integer less than or equal to x.
Starting with the definition of θ̂_UN in (3.3.1) and following
similar steps we obtain

(3.3.6)  θ̂_UN = the [p/2 + Z_{α/2}(nkN/48)^{1/2}] + 1 order statistic
         of the positive (Y_i/X_j),

where [x] = the greatest integer less than x.
It is a simple matter to convert these endpoints of a confidence
interval for θ into endpoints of a confidence interval for
γ = θ²/(θ² + 1). The endpoints of an asymptotic 100(1−α)% confidence
interval for γ are

(3.3.7)  γ̂_LN = θ̂_LN²/(θ̂_LN² + 1) and γ̂_UN = θ̂_UN²/(θ̂_UN² + 1).
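The computation of the endpoints (3.3.5) through (3.3.7) can be sketched
as follows (function name ours; clamping the order-statistic indices to
the range 1,...,p is a practical safeguard and is not part of the
derivation).

    import numpy as np
    from math import floor, sqrt
    from scipy.stats import norm

    def ab_interval(x, y, alpha=0.05):
        # Endpoints (3.3.5)-(3.3.7): order statistics of the positive
        # ratios Y_i/X_j after centering at the combined sample median.
        x, y = np.asarray(x, float), np.asarray(y, float)
        m_hat = np.median(np.concatenate([x, y]))
        x, y = x - m_hat, y - m_hat
        n, k = len(x), len(y)
        big_n = n + k
        if big_n % 2 == 1:
            # the observation equal to m-hat is dropped from the pairing
            x, y = x[x != 0], y[y != 0]
        ratios = np.sort((y[:, None] / x[None, :]).ravel())
        ratios = ratios[ratios > 0]            # one ratio per relevant pair
        p = len(ratios)
        shift = norm.ppf(1 - alpha / 2) * sqrt(n * k * big_n / 48)
        lo = floor(p / 2 - shift) + 1          # {x} + 1 of (3.3.5)
        u = p / 2 + shift
        hi = int(u) if u == int(u) else floor(u) + 1   # [x] + 1 of (3.3.6)
        lo, hi = min(max(lo, 1), p), min(max(hi, 1), p)
        th_lo, th_hi = ratios[lo - 1], ratios[hi - 1]
        return th_lo ** 2 / (th_lo ** 2 + 1), th_hi ** 2 / (th_hi ** 2 + 1)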
CHAPTER FOUR
MONTE CARLO STUDY
A Monte Carlo study was undertaken to compare the various methods
of constructing confidence intervals for the intraclass correlation
coefficient, ρ = σ_a²/(σ_a² + σ_ε²), discussed in this dissertation.
Throughout this chapter we will refer to ρ as the parameter of interest
even though in some cases we are interested in scale parameters and the
parameter of interest is γ = δ_2²/(δ_1² + δ_2²). We refer only to ρ for
ease of presentation and because ρ and γ are numerically equivalent in
those cases where both exist.
Using IMSL (International Mathematical and Statistical Libraries)
subroutines, random numbers were generated from five distributions which
are symmetric about zero. The five distributions used were normal,
uniform, logistic, Laplace (double exponential), and Cauchy. In each
case the resulting random numbers were used to form responses in the
balanced one-way random effects model (without loss of generality we
assume μ = 0)

Z_ij = a_i + ε_ij,  i = 1,2,...,k, j = 1,2,...,n.

The nk responses in each model were formed by generating nk + k random
numbers from one of the distributions, multiplying nk of these numbers
by a constant to obtain the simulated values of the ε_ij, multiplying the
remaining k numbers by a constant to obtain the simulated values of the
a_i, and adding the ε_ij and a_i to obtain the simulated responses.
Various multipliers were used to obtain effects with differing values of
ρ.
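The generation scheme is easily reproduced with a modern random number
generator in place of the IMSL routines; in the sketch below the sampler
and the multipliers are illustrative stand-ins, since the exact
multipliers used in the study are not reproduced here.

    import numpy as np

    def simulate_responses(k, n, sampler, mult_eps, mult_a, rng):
        # nk + k draws from one symmetric distribution: nk of them scaled
        # by mult_eps give the eps_ij, the remaining k scaled by mult_a
        # give the a_i, and Z_ij = a_i + eps_ij.
        draws = sampler(rng, n * k + k)
        eps = mult_eps * draws[: n * k].reshape(k, n)
        a = mult_a * draws[n * k :]
        return a[:, None] + eps

    # Example: a Laplace model of size (6,12), i.e., k = 6 and n = 12.
    rng = np.random.default_rng(1983)   # illustrative seed
    z = simulate_responses(6, 12, lambda r, m: r.laplace(size=m), 1.0, 0.5, rng)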
For each of the five distributions, four different size models were
generated. The size of the model is determined by k (the number of
treatments) and n (the number of observations per treatment). The
combinations of k and n used (with k listed first) were (6,12), (12,15),
(18,12), and (12,6). For models of size (6,12) and (12,15), numbers
were generated for each distribution and multipliers chosen to obtain
values for ρ of .10000, .26471, .40000, .50000, .60000, .73529, and
.90000. For models of size (18,12) and (12,6), only ρ values of .26471,
.50000, and .73529 were used. Since the results for these values of ρ
were consistent with the results for the same values of ρ for models of
size (6,12) and (12,15), the remaining values of ρ were not used for
models (18,12) and (12,6).
For every combination of distribution, model size, and p value, 200
sets of responses were generated and confidence intervals for p were
constructed using each of the methods described in this dissertation.
We will use the following conventions when referring to the individual
procedures. The Normal procedure refers to that based on normal theory
and the F-distribution as discussed in Scheffe' (1959, Pages 221-230).
The Arvesen procedure is the procedure based on jackknifed U-statistics
presented by Arvesen and Schmitz (1970) which leads to intervals of the
form given by (2.2.11). In Section 2.2 we presented two procedures for
computing intervals based on U-statistics. The first, which involved a
function of U-statistics with an asymptotic normal distribution,
produced intervals of the form given by (2.2.10) and will hereafter be
referred to as the U-statistic procedure. The second, which we call the
Chi-Square procedure, produces intervals of the form given by (2.2.14)
and is based on a function of U-statistics which has an asymptotic χ²
distribution. The procedure presented in Chapter Three based on the
Ansari-Bradley statistic using pseudo-samples involving means (as given
in (3.1.1)) is called the ABMeans procedure. The corresponding
procedure based on pseudo-samples constructed using medians (as given in
(3.1.2)) is called the ABMedians procedure.
Intervals based on the Chi-Square procedure were not constructed
for the Cauchy distribution. These intervals could not be obtained
because of overflow errors encountered during the calculations. Since
the Chi-Square procedure is clearly inferior to the others (see
discussion below), the omission of these results is inconsequential.
Recall from Chapter Three that confidence intervals constructed
using the ABMeans and ABMedians procedures are formed using only the eij
from one treatment. Thus, for each of these procedures there are k
possible intervals that could be constructed from the responses in one
model. Individually, these k possible intervals, unlike the intervals
constructed using the other procedures, do not make full use of all the
information contained in the responses. In an attempt to obtain a
single interval that does make use of all the information, confidence
intervals were also formed in each case using procedures we will call
ABMeansC and ABMediansC. These intervals were calculated by averaging
the endpoints of the k different intervals that could be formed using
the ABMeans and ABMedians procedures respectively. This method of
combining the k possible intervals is based on the premise that if only
one interval was constructed, using either the ABMeans or ABMedians
procedure, it would most likely be constructed using the eij from a
randomly selected treatment. Thus, any of the k possible treatments
would be equally likely to be selected and any of the k possible
intervals would be equally valid. Averaging the endpoints of all
possible intervals can be thought of as assigning an equal weight to
each of the equally likely endpoints.
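In code, the combination step amounts to nothing more than averaging
endpoints; a minimal sketch (the function name is an assumption):

    def combine_intervals(intervals):
        # intervals: the k (lower, upper) pairs, one per treatment;
        # each endpoint receives equal weight 1/k.
        k = len(intervals)
        return (sum(lo for lo, hi in intervals) / k,
                sum(hi for lo, hi in intervals) / k)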
For each 200 intervals and for each procedure, the empirical
confidence coefficient (the number of intervals containing the value of
p divided by 200), the average length of the 200 intervals, and the
standard deviation of the 200 lengths were calculated. These
calculations were performed differently for the ABMeans and ABMedians
procedures since these procedures produce k possible intervals for each
set of responses. Therefore, the empirical confidence coefficient and
average lengths reported for these procedures were calculated using 200k
intervals rather than 200. Also, for each i = 1,2,...,k, we calculated
the standard deviation of the lengths of the 200 intervals constructed
using the e_ij, j = 1,2,...,n, from treatment i. The standard deviation reported
is the average of these k standard deviations.
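For concreteness, the three quantities reported in each cell of the
tables that follow can be computed as in this sketch (illustrative only;
for the ABMeans and ABMedians procedures the list would contain 200k
intervals, and the reported standard deviation would instead be the
average of the k per-treatment standard deviations just described):

    import statistics

    def cell_summary(intervals, p):
        # intervals: the simulated (lower, upper) pairs for one table cell
        coverage = sum(lo <= p <= hi for lo, hi in intervals) / len(intervals)
        lengths = [hi - lo for lo, hi in intervals]
        return coverage, statistics.mean(lengths), statistics.stdev(lengths)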
A summary of the Monte Carlo study is presented in the following
tables. These tables are numbered in such a way that tables including
results for a particular distribution or model size can be easily
identified. The first position in the number of the table refers to the
distribution used to generate the responses according to the following
scheme: 1-normal, 2-uniform, 3-logistic, 4-Laplace, and 5-Cauchy. Thus,
the higher numbered tables are for distributions with heavier tails
(the one exception being that the normal distribution has heavier tails
than the uniform distribution).
Table 1A1

Behavior of nominal 90% confidence intervals for models with
k = 6 treatments, n = 12 observations per treatment, and F normal.

                                       p
             .10000  .26471  .40000  .50000  .60000  .73529  .90000

Arvesen      .91000  .87000  .90000  .88000  .87500  .90000  .94000
             .5257   .5674   .5995   .6211   .5855   .5551   .3220
             .312    .226    .209    .194    .195    .217    .187

Normal       .93000  .90000  .87500  .88500  .88000  .92500  .91000
             .3845   .5060   .5224   .5149   .4868   .4215   .2178
             .122    .088    .072    .068    .080    .106    .101

ABMeans      .82167  .85500  .85750  .83667  .84833  .85833  .89167
             .4244   .5594   .5987   .6001   .5926   .5669   .3820
             .217    .174    .168    .158    .166    .193    .237

ABMedians    .49417  .87167  .93000  .88083  .85250  .79917  .80667
             .4422   .4992   .5202   .5228   .5228   .5095   .4177
             .134    .122    .118    .121    .125    .137    .190

ABMeansC     .87000  .92000  .90500  .92000  .88500  .91500  .95000
             .4244   .5594   .5987   .6001   .5926   .5669   .3820
             .153    .107    .114    .112    .133    .165    .216

ABMediansC   .44000  .91500  .98500  .94500  .93000  .87500  .87000
             .4422   .4992   .5202   .5228   .5228   .5095   .4177
             .083    .073    .070    .075    .080    .104    .165

U-statistic  .72000  .68500  .63500  .62500  .63500  .65000  .69500
             .2000   .3128   .3376   .3547   .3358   .3021   .1533
             .110    .111    .114    .110    .113    .112    .089

Chi-Square   .87500  .81000  .76500  .78000  .70000  .73000  .79000
             .2625   .4070   .5219   .5422   .3983   .3597   .1820
             .150    .146    .170    .170    .134    .134

Note: For each procedure the first row is the empirical confidence
coefficient, the second is the average length of the intervals, and the
third is the standard deviation of the lengths. Example: For 200 90%
confidence intervals constructed using Arvesen's procedure when
p = .10000, the proportion of intervals that contained .10000 was
.91000, the average length of the intervals was .5257, and the standard
deviation of the lengths was .312.
Table 1A2

Behavior of nominal 95% confidence intervals for models with
k = 6 treatments, n = 12 observations per treatment, and F normal.

                                       p
             .10000  .26471  .40000  .50000  .60000  .73529  .90000

Arvesen      .93000  .91000  .92500  .94000  .94000  .95500  .96000
             .5871   .6550   .6939   .7207   .6902   .6650   .4275
             .303    .226    .206    .187    .197    .228    .226

Normal       .96500  .96500  .93500  .93000  .92500  .95500  .97000
             .4728   .5980   .6100   .5971   .5636   .4883   .2572
             .130    .091    .075    .076    .091    .119    .116

ABMeans      .90667  .92833  .94000  .92917  .92333  .94167  .95500
             .5487   .6877   .7176   .7218   .7096   .6840   .4978
             .238    .176    .163    .153    .163    .196    .258

ABMedians    .68250  .94500  .97250  .94833  .91667  .89833  .91083
             .5441   .6047   .6247   .6302   .6305   .6165   .5234
             .143    .128    .119    .121    .125    .143    .209

ABMeansC     .94500  .96500  .97500  .96500  .97500  .98500  .97500
             .5487   .6877   .7176   .7218   .7096   .6840   .4978
             .161    .104    .107    .111    .131    .173    .239

ABMediansC   .68500  .98000  .99500  .97000  .98000  .97000  .95000
             .5441   .6047   .6247   .6302   .6305   .6165   .5234
             .089    .076    .074    .077    .080    .113    .184

U-statistic  .81500  .73500  .68000  .71000  .74500  .74000  .78500
             .2307   .3655   .3974   .4190   .5635   .6128   .5737
             .126    .138    .134    .129    .202    .239    .336

Chi-Square   .89500  .87500  .81000  .83500  .79500  .80000  .82500
             .2860   .4478   .5219   .5989   .6271   .6784   .6782
             .148    .155    .170    .165    .197    .223    .320

Note: Format of this table is identical to Table 1A1.
Table 1B1

Behavior of nominal 90% confidence intervals for models with
k = 12 treatments, n = 15 observations per treatment, and F normal.

                                       p
             .10000  .26471  .40000  .50000  .60000  .73529  .90000

Arvesen      .91500  .93000  .96500  .91000  .89000  .92500  .87500
             .2770   .3601   .4142   .3874   .3853   .3110   .1713
             .189    .109    .113    .106    .116    .094    .083

Normal       .91500  .91500  .92500  .90000  .87000  .92500  .86500
             .2190   .3273   .3599   .3564   .3364   .2717   .1394
             .061    .040    .023    .026    .035    .050    .053

ABMeans      .84333  .90667  .91625  .89583  .89750  .92125  .90208
             .3512   .4783   .5244   .5155   .5088   .4498   .2843
             .146    .124    .106    .106    .114    .135    .149

ABMedians    .64458  .85792  .92250  .90417  .90125  .90667  .83375
             .3773   .4479   .4741   .4705   .4683   .4355   .3222
             .117    .099    .085    .086    .088    .104    .136

ABMeansC     .87000  .95000  .98500  .96500  .97000  .98500  .96000
             .3512   .4783   .5244   .5155   .5088   .4498   .2843
             .081    .064    .058    .073    .086    .106    .125

ABMediansC   .61000  .91000  .98500  .96500  .98000  .97500  .89000
             .3773   .4479   .4741   .4705   .4683   .4355   .3222
             .065    .054    .044    .051    .059    .079    .114

U-statistic  .77000  .78500  .82500  .79000  .74000  .82000  .77500
             .1592   .2593   .3070   .2912   .2873   .2305   .1202
             .064    .062    .071    .066    .075    .062    .055

Chi-Square   .88500  .89500  .91000  .89500  .83000  .91000  .82500
             .1956   .3471   .4475   .4434   .4723   .4465   .3454
             .077    .088    .120    .118    .147    .180    .247

Note: Format of this table is identical to Table 1A1.
Table 1B2

Behavior of nominal 95% confidence intervals for models with
k = 12 treatments, n = 15 observations per treatment, and F normal.

                                       p
             .10000  .26471  .40000  .50000  .60000  .73529  .90000

Arvesen      .94500  .95500  .98000  .96000  .95000  .97500  .92500
             .3268   .4317   .4923   .4623   .4603   .3779   .2151
             .193    .123    .126    .117    .130    .111    .104

Normal       .97500  .95000  .95000  .94000  .96000  .95500  .91000
             .2679   .3917   .4255   .4196   .3954   .3194   .1653
             .073    .044    .025    .031    .042    .058    .062

ABMeans      .90708  .94667  .96125  .94708  .95083  .96583  .95000
             .4267   .5610   .6094   .5999   .5928   .5322   .3507
             .163    .129    .109    .111    .119    .146    .170

ABMedians    .75375  .93167  .95875  .94621  .95250  .95750  .90458
             .4473   .5210   .5515   .5451   .5439   .5095   .3892
             .128    .106    .088    .090    .094    .111    .151

ABMeansC     .93500  .97000  1.00000 .98000  1.00000 .99500  .98000
             .4267   .5610   .6094   .5999   .5928   .5322   .3507
             .089    .066    .061    .077    .090    .117    .144

ABMediansC   .78500  .96000  1.00000 .98000  .99000  .99000  .95000
             .4473   .5210   .5515   .5451   .5439   .5095   .3892
             .071    .057    .046    .055    .065    .086    .128

U-statistic  .83500  .86000  .88500  .84500  .80500  .88000  .85500
             .1836   .3076   .3656   .3467   .3423   .2745   .1431
             .074    .074    .084    .078    .089    .074    .066

Chi-Square   .91000  .94500  .92500  .91500  .86500  .93000  .85500
             .2140   .3867   .5061   .5185   .5578   .5552   .4926
             .083    .095    .125    .132    .161    .202    .306

Note: Format of this table is identical to Table 1A1.
Table 1C

Behavior of nominal 90% and 95% confidence intervals for models with
k = 18 treatments, n = 12 observations per treatment, and F normal.

                      90%                        95%
                       p                          p
             .26471  .50000  .73529     .26471  .50000  .73529

Arvesen      .92000  .90500  .92500     .96000  .95000  .94500
             .2936   .3334   .2474      .3512   .3972   .2987
             .095    .089    .058       .110    .102    .068

Normal       .89000  .90000  .88500     .94500  .95000  .95500
             .2695   .2968   .2336      .3207   .3488   .2619
             .031    .014    .036       .035    .016    .043

ABMeans      .88361  .88471  .90555     .93583  .92805  .94805
             .4675   .5085   .4355      .5353   .5781   .5027
             .128    .106    .130       .135    .109    .140

ABMedians    .83444  .89917  .92389     .89472  .94250  .95778
             .4598   .4787   .4156      .5247   .5450   .4792
             .101    .087    .104       .105    .090    .111

ABMeansC     .98000  .97000  .99500     .99000  .98500  1.00000
             .4675   .5085   .4355      .5353   .5781   .5027
             .057    .062    .097       .059    .065    .106

ABMediansC   .92000  .98000  1.00000    .95500  .99000  1.00000
             .4598   .4787   .4156      .5247   .5450   .4792
             .044    .050    .076       .046    .053    .082

Note: For each procedure the first row is the empirical confidence
coefficient, the second is the average length of the intervals, and the
third is the standard deviation of the lengths. Also for each
procedure, the first three columns apply to nominal 90% confidence
intervals and the second three columns apply to nominal 95% confidence
intervals. Example: For 200 90% confidence intervals constructed using
Arvesen's procedure when p = .26471, the proportion of intervals that
contained .26471 was .92000, the average length of the intervals was
.2936, and the standard deviation of the lengths was .095. For 200 95%
confidence intervals constructed under the same conditions, the
corresponding values were .96000, .3512, and .110.
Table 1D

Behavior of nominal 90% and 95% confidence intervals for models with
k = 12 treatments, n = 6 observations per treatment, and F normal.

                      90%                        95%
                       p                          p
             .26471  .50000  .73529     .26471  .50000  .73529

Arvesen      .86500  .89000  .93500     .92000  .94000  .97000
             .4310   .4522   .3692      .5054   .5388   .4480
             .164    .121    .125       .169    .135    .145

Normal       .85500  .88000  .90500     .92000  .93500  .96000
             .4043   .4139   .3185      .4786   .4890   .3787
             .065    .041    .068       .076    .047    .079

ABMeans      .82208  .86708  .88125     .90792  .93333  .94792
             .5725   .6026   .5470      .6898   .7212   .6720
             .167    .155    .185       .167    .149    .186

ABMedians    .78000  .88500  .91708     .88958  .94917  .96417
             .5894   .5922   .5387      .7013   .7045   .6555
             .148    .133    .152       .144    .126    .152

ABMeansC     .92000  .97500  1.00000    .97500  1.00000 1.00000
             .5725   .6026   .5470      .6898   .7212   .6720
             .077    .091    .127       .080    .089    .130

ABMediansC   .87000  .98500  .99500     .96000  1.00000 1.00000
             .5894   .5922   .5387      .7013   .7045   .6555
             .061    .073    .099       .062    .070    .102

Note: Format of this table is identical to Table 1C.
Table 2A

Behavior of nominal 90% and 95% confidence intervals for models with
k = 6 treatments, n = 12 observations per treatment, and F uniform.

                      90%                        95%
                       p                          p
             .26471  .50000  .73529     .26471  .50000  .73529

Arvesen      .89500  .92500  .92500     .95000  .97500  .96500
             .5292   .5576   .4680      .6211   .6579   .5719
             .206    .189    .182       .210    .190    .200

Normal       .96000  .97000  .97000     .99000  .98000  .98500
             .5195   .5312   .4197      .6124   .6133   .4862
             .076    .047    .084       .078    .052    .094

ABMeans      .86583  .88083  .87000     .94833  .95167  .94417
             .5018   .5646   .5194      .6301   .6836   .6411
             .173    .145    .178       .185    .147    .181

ABMedians    .87583  .90333  .83250     .95083  .96750  .92417
             .4691   .5075   .4937      .5756   .6148   .6028
             .122    .115    .127       .125    .117    .130

ABMeansC     .94000  .95000  .95000     .98000  .98500  .98000
             .5018   .5646   .5194      .6301   .6836   .6411
             .112    .103    .155       .115    .106    .159

ABMediansC   .93500  .97500  .89000     .97500  .99500  .97500
             .4691   .5075   .4937      .5756   .6148   .6028
             .071    .065    .092       .075    .064    .097

Note: Format of this table is identical to Table 1C.
Table 2B

Behavior of nominal 90% and 95% confidence intervals for models with
k = 12 treatments, n = 15 observations per treatment, and F uniform.

                      90%                        95%
                       p                          p
             .26471  .50000  .73529     .26471  .50000  .73529

Arvesen      .92000  .94500  .91500     .95000  .96000  .95500
             .3028   .3311   .2369      .3657   .3981   .2885
             .089    .085    .081       .104    .098    .097

Normal       .95500  .95500  .98000     .97500  .98500  1.00000
             .3311   .3633   .2691      .3960   .4274   .3164
             .039    .015    .039       .044    .018    .046

ABMeans      .89792  .90542  .89833     .95208  .94914  .94583
             .4032   .4610   .3715      .4822   .5431   .4458
             .121    .102    .126       .113    .107    .140

ABMedians    .85708  .92708  .90167     .92292  .96833  .95208
             .4194   .4472   .3988      .4924   .5223   .4677
             .096    .085    .113       .102    .087    .119

ABMeansC     .98500  .98000  .98000     .99500  .99000  .99500
             .4032   .4610   .3715      .4822   .5431   .4458
             .063    .071    .111       .068    .074    .124

ABMediansC   .94000  .98500  .96500     .97500  1.00000 .98500
             .4194   .4472   .3988      .4924   .5223   .4677
             .047    .045    .087       .051    .048    .095

Note: Format of this table is identical to Table 1C.
Table 2C

Behavior of nominal 90% and 95% confidence intervals for models with
k = 18 treatments, n = 12 observations per treatment, and F uniform.

                      90%                        95%
                       p                          p
             .26471  .50000  .73529     .26471  .50000  .73529

Arvesen      .93000  .91500  .90000     .97000  .95500  .94500
             .2487   .2543   .1835      .2992   .3054   .2219
             .058    .059    .052       .069    .069    .062

Normal       .96500  .96500  .97000     .98500  .98500  .98500
             .2746   .2994   .2229      .3266   .3516   .2610
             .024    .009    .028       .027    .010    .033

ABMeans      .87922  .88611  .87361     .93139  .92917  .92444
             .4012   .4427   .3527      .4644   .5095   .4133
             .122    .093    .112       .134    .099    .123

ABMedians    .83556  .91639  .90833     .90194  .95417  .94694
             .4311   .4461   .3689      .4938   .5101   .4262
             .097    .083    .102       .101    .086    .107

ABMeansC     .96500  .98000  .98500     .99000  .99500  .99500
             .4012   .4427   .3527      .4644   .5095   .4133
             .048    .058    .094       .052    .063    .105

ABMediansC   .90000  .99000  .98500     .97000  1.00000 .99000
             .4311   .4461   .3689      .4938   .5101   .4262
             .038    .049    .075       .041    .053    .082

Note: Format of this table is identical to Table 1C.
Table 2D

Behavior of nominal 90% and 95% confidence intervals for models with
k = 12 treatments, n = 6 observations per treatment, and F uniform.

90% 95%
p p
.26471 .50000 .73529 .26471 .50000 .73529
.91500 .92500 .93500 .93000 .95000 .96500
Arvesen .4055 .3816 .2925 .4809 .4580 .3555
.127 .107 .117 .139 .122 .136
.93000 .94500 .98000 .97500 .97500 .99500
Normal .4170 .4201 .3126 .4932 .4201 .3720
.051 .031 .054 .060 .031 .063
.86458 .84708 .84708 .93583 .92208 .92583
ABMeans .5470 .5505 .4681 .6641 .6686 .5916
.167 .153 .179 .171 .153 .198
.80167 .87000 .89875 .90833 .93917 .95167
ABMedians .5858 .5606 .4829 .6975 .6750 .5989
.149 .132 .159 .143 .128 .166
.95000 .94500 .97000 .97000 .99000 .99500
ABMeansC .5470 .5505 .4681 .6641 .6686 .5916
.076 .101 .140 .079 .102 .160
.88500 .96500 .99500 .95500 .99500 1.00000
ABMediansC .5858 .5606 .4829 .6975 .6750 .5989
.066 .075 .115 .065 .076 .120
Note: Format of this table is identical to Table 1C.
Table 3A

Behavior of nominal 90% and 95% confidence intervals for models with
k = 6 treatments, n = 12 observations per treatment, and F logistic.
90% 95%
p p
.26471 .50000 .73529 .26471 .50000 .73529
.93000 .88000 .91500 .94500 .94000 .95000
Arvesen .5940 .6133 .5258 .6805 .7110 .6396
.231 .214 .188 .229 .212 .202
.88500 .84000 .84500 .97000 .89500 .91500
Normal .5039 .5056 .4203 .5959 .5876 .4871
.088 .080 .113 .090 .088 .128
.91000 .84833 .88833 .96333 .92833 .93917
ABMeans .6038 .6192 .5601 .7273 .7375 .6762
.167 .167 .188 .160 .157 .194
.86167 .90917 .82750 .94083 .96167 .91417
ABMedians .5047 .5248 .5139 .6123 .6348 .6225
.120 .116 .140 .124 .121 .146
.95000 .92500 .94000 .98000 .98500 .98000
ABMeansC .6038 .6192 .5601 .7273 .7375 .6762
.102 .118 .157 .094 .111 .165
.91000 .95500 .91000 .99000 .99500 .96000
ABMediansC .5047 .5248 .5139 .6123 .6348 .6225
.069 .069 .103 .072 .072 .108
Note: Format of this table is identical to Table 1C.
Table 3B

Behavior of nominal 90% and 95% confidence intervals for models with
k = 12 treatments, n = 15 observations per treatment, and F logistic.

                      90%                        95%
                       p                          p
             .26471  .50000  .73529     .26471  .50000  .73529

Arvesen      .92500  .86500  .89000     .96000  .93500  .92500
             .4090   .4386   .3553      .4843   .5177   .4296
             .148    .142    .124       .160    .150    .142

Normal       .86500  .85500  .86500     .91500  .90000  .94500
             .3265   .3528   .2765      .3904   .4160   .3251
             .048    .027    .059       .053    .032    .069

ABMeans      .88333  .90625  .90917     .93875  .95250  .95583
             .5064   .5402   .4661      .5915   .6247   .5495
             .129    .111    .142       .132    .111    .151

ABMedians    .85042  .91333  .89917     .91750  .95625  .95208
             .4645   .4878   .4454      .5407   .5653   .5209
             .099    .086    .111       .103    .086    .117

ABMeansC     .96500  .98000  .97500     .97500  .99500  .98500
             .5064   .5402   .4661      .5915   .6247   .5495
             .072    .067    .110       .074    .069    .120

ABMediansC   .92500  .97000  .97000     .97500  .99500  .98500
             .4645   .4878   .4454      .5407   .5653   .5209
             .051    .048    .086       .053    .050    .092

Note: Format of this table is identical to Table 1C.
Table 3C
Behavior of nominal 90% and 95% confidence intervals for models with
k = 18 treatments, n = 12 observations per treatment, and F logistic.
90% 95%
p p
.26471 .50000 .73529 .26471 .50000 .73529
.94000 .88000 .85500 .96000 .93000 .94500
Arvesen .3398 .3579 .2846 .4035 .4251 .3437
.125 .102 .086 .140 .114 .102
.85000 .84500 .79500 .91500 .93500 .87000
Normal .2695 .2956 .2240 .3207 .3476 .2624
.033 .015 .046 .037 .018 .054
.88806 .87694 .89972 .93083 .92805 .94444
ABMeans .4874 .5234 .4444 .5559 .5933 .5123
.130 .108 .140 .135 .110 .148
.84333 .89833 .91305 .90028 .94444 .95472
ABMedians .4718 .4842 .4295 .5579 .5508 .4937
.102 .086 .108 .106 .089 .115
.98000 .97500 1.00000 1.00000 .98500 1.00000
ABMeansC .4874 .5234 .4444 .5559 .5933 .5123
.056 .062 .106 .058 .065 .113
.94500 .98000 1.00000 .98500 .98500 1.00000
ABMediansC .4718 .4842 .4295 .5579 .5508 .4937
.043 .049 .079 .044 .051 .084
Note: Format of this table is identical to Table 1C.
Table 3D

Behavior of nominal 90% and 95% confidence intervals for models with
k = 12 treatments, n = 6 observations per treatment, and F logistic.

                      90%                        95%
                       p                          p
             .26471  .50000  .73529     .26471  .50000  .73529

Arvesen      .89500  .88500  .89500     .92000  .93000  .94000
             .4585   .4588   .3947      .5354   .5444   .4789
             .180    .138    .135       .189    .147    .154

Normal       .88500  .85000  .87500     .95500  .92000  .94000
             .4084   .4149   .3190      .4824   .4904   .3791
             .061    .048    .077       .071    .052    .089

ABMeans      .85208  .86458  .87458     .92750  .93208  .94667
             .5913   .6098   .5440      .7084   .7299   .6679
             .169    .158    .189       .165    .149    .191

ABMedians    .80250  .89125  .89875     .90542  .95292  .96125
             .5962   .5966   .5258      .7099   .7117   .6420
             .150    .136    .161       .143    .128    .162

ABMeansC     .95500  .98000  .99500     .99000  .99000  1.00000
             .5913   .6098   .5440      .7084   .7299   .6679
             .073    .083    .121       .072    .081    .129

ABMediansC   .89000  .99000  .99500     .98500  1.00000 1.00000
             .5962   .5966   .5258      .7099   .7117   .6420
             .064    .063    .099       .063    .065    .101

Note: Format of this table is identical to Table 1C.
Table 4A

Behavior of nominal 90% and 95% confidence intervals for models with
k = 6 treatments, n = 12 observations per treatment, and F Laplace.

                      90%                        95%
                       p                          p
             .26471  .50000  .73529     .26471  .50000  .73529

Arvesen      .90500  .87000  .85500     .93500  .91500  .89500
             .6241   .6529   .5874      .7015   .7484   .6986
             .256    .218    .217       .246    .216    .215

Normal       .85000  .75500  .77500     .96500  .88000  .87500
             .4761   .4938   .4240      .5668   .5766   .4920
             .103    .095    .127       .107    .103    .144

ABMeans      .86333  .85583  .85417     .94083  .93333  .94250
             .6179   .6521   .5711      .7382   .7673   .6890
             .192    .180    .219       .176    .162    .209

ABMedians    .88333  .90833  .79083     .94667  .96833  .89250
             .5146   .5495   .5144      .6193   .6551   .6272
             .128    .122    .152       .134    .119    .153

ABMeansC     .93500  .93000  .93500     .98000  .97000  .98000
             .6179   .6521   .5711      .7382   .7673   .6890
             .123    .131    .177       .111    .118    .170

ABMediansC   .93500  .96500  .91000     .99000  .99000  .95500
             .5146   .5495   .5144      .6193   .6551   .6272
             .077    .074    .108       .081    .076    .109

Note: Format of this table is identical to Table 1C.
Table 4B

Behavior of nominal 90% and 95% confidence intervals for models with
k = 12 treatments, n = 15 observations per treatment, and F Laplace.

                      90%                        95%
                       p                          p
             .26471  .50000  .73529     .26471  .50000  .73529

Arvesen      .88000  .84500  .86000     .93000  .88000  .92000
             .4133   .4696   .4028      .4862   .5520   .4846
             .184    .154    .136       .197    .165    .155

Normal       .78500  .74500  .77500     .86500  .85000  .86500
             .3105   .3473   .2858      .3722   .4102   .3361
             .057    .032    .064       .064    .036    .075

ABMeans      .88333  .89167  .89625     .93792  .94417  .94208
             .5422   .5707   .5126      .6271   .6557   .5965
             .137    .125    .165       .138    .123    .171

ABMedians    .86083  .90458  .87042     .92625  .95042  .92292
             .4775   .5036   .4777      .5532   .5812   .5561
             .108    .094    .120       .112    .099    .126

ABMeansC     .96000  .96500  .98500     .98500  1.00000 .99500
             .5422   .5707   .5126      .6271   .6557   .5965
             .068    .076    .125       .068    .076    .132

ABMediansC   .94500  .98000  .97500     .98000  1.00000 .99000
             .4775   .5036   .4777      .5532   .5812   .5561
             .054    .053    .088       .059    .055    .093

Note: Format of this table is identical to Table 1C.
Table 4C
Behavior of nominal 90% and 95% confidence intervals for models with
k = 18 treatments, n = 12 observations per treatment, and F Laplace.
90% 95%
p p
.26471 .50000 .73529 .26471 .50000 .73529
.89500 .91000 .85500 .94500 .94500 .91500
Arvesen .3403 .4314 .3330 .4038 .5075 .4008
.131 .134 .111 .147 .146 .130
.78500 .76500 .70500 .86500 .82500 .83500
Normal .2609 .2912 .2245 .3108 .3424 .2630
.040 .021 .050 .046 .025 .059
.86444 .88583 .88278 .91222 .93305 .92694
ABMeans .5268 .5693 .4808 .5974 .6410 .5521
.135 .118 .155 .138 .118 .162
.83528 .90528 .90194 .89750 .95055 .94250
ABMedians .4880 .5164 .4562 .5542 .5837 .5208
.103 .095 .119 .106 .096 .123
.93000 .99000 .97500 .97000 1.00000 .99500
ABMeansC .5268 .5693 .4808 .5974 .6410 .5521
.055 .060 .107 .057 .061 .115
.93500 .99000 .99000 .97000 .99000 .99000
ABMediansC .4880 .5164 .4562 .5542 .5837 .5208
.041 .052 .078 .043 .054 .082
Note: Format of this table is identical to Table 1C.
Table 4D

Behavior of nominal 90% and 95% confidence intervals for models with
k = 12 treatments, n = 6 observations per treatment, and F Laplace.

90% 95%
p p
.26471 .50000 .73529 .26471 .50000 .73529
.88000 .86000 .82500 .92000 .91500 .90000
Arvesen .4856 .4955 .4095 .5614 .5847 .4973
.207 .155 .138 .210 .166 .161
.80500 .77500 .77000 .88000 .84500 .85000
Normal .4020 .4066 .3243 .4755 .4806 .3852
.066 .055 .087 .076 .063 .101
.79583 .82917 .83500 .89000 .91458 .91458
ABMeans .6126 .6128 .5404 .7257 .7284 .6617
.176 .178 .211 .169 .170 .213
.78042 .88042 .87000 .88333 .94833 .93583
ABMedians .6115 .6058 .5341 .7218 .7175 .6514
.142 .146 .178 .133 .137 .179
.87000 .95000 .97000 .97000 .99000 .99500
ABMeansC .6126 .6128 .5404 .7257 .7284 .6617
.084 .096 .137 .082 .097 .140
.86000 .98500 .99000 .97500 .99500 1.00000
ABMediansC .6115 .6058 .5341 .7218 .7175 .6514
.060 .074 .108 .056 .070 .111
Note: Format of this table is identical to Table 1C.
Table 5A
Behavior of nominal 90% and 95% confidence intervals for models with
k = 6 treatments, n = 12 observations per treatment, and F Cauchy.
90% 95%
p p
.26471 .50000 .73529 .26471 .50000 .73529
.66500 .56500 .64000 .72500 .62500 .68500
Arvesen .5969 .5642 .6561 .6375 .6148 .7219
.398 .372 .337 .392 .373 .333
.46500 .30500 .25000 .88500 .42000 .31500
Normal .3232 .3583 .3679 .4025 .4389 .4432
.133 .149 .167 .145 .165 .186
.51333 .62083 .61833 .63000 .72500 .72583
ABMeans .5537 .5520 .4795 .6663 .6659 .5949
.330 .326 .344 .327 .322 .351
.56583 .92500 .68167 .71167 .97147 .77333
ABMedians .4635 .4952 .5025 .5559 .5944 .6037
.204 .204 .202 .225 .217 .216
.47000 .74000 .77000 .58000 .80000 .85000
ABMeansC .5537 .5520 .4795 .6663 .6659 .5949
.237 .228 .254 .241 .230 .264
.57500 .98500 .81000 .76000 1.00000 .90500
ABMediansC .4635 .4952 .5025 .5559 .5944 .6037
.117 .109 .108 .132 .116 .116
Note: Format of this table is identical to Table 1C.
Table 5B

Behavior of nominal 90% and 95% confidence intervals for models with
k = 12 treatments, n = 15 observations per treatment, and F Cauchy.

                      90%                        95%
                       p                          p
             .26471  .50000  .73529     .26471  .50000  .73529

Arvesen      .51000  .49000  .46000     .54500  .54500  .49500
             .4659   .4853   .4809      .4993   .5404   .5432
             .415    .389    .362       .426    .405    .382

Normal       .15000  .15000  .11000     .20000  .18500  .15000
             .1617   .1844   .2042      .1986   .2246   .2464
             .089    .107    .112       .103    .124    .130

ABMeans      .50708  .60625  .67542     .62667  .72792  .78792
             .5692   .5391   .4902      .6738   .6581   .6021
             .303    .351    .312       .297    .302    .317

ABMedians    .46375  .90542  .64708     .54667  .95375  .70125
             .3680   .4149   .4213      .4272   .4801   .4895
             .185    .194    .176       .206    .213    .191

ABMeansC     .45000  .74000  .88500     .63000  .84500  .95000
             .5692   .5391   .4902      .6738   .6581   .6021
             .182    .187    .174       .187    .183    .194

ABMediansC   .52000  .97500  .75500     .62500  .99500  .84000
             .3680   .4149   .4213      .4272   .4801   .4895
             .117    .112    .094       .131    .126    .102

Note: Format of this table is identical to Table 1C.
Table 5C

Behavior of nominal 90% and 95% confidence intervals for models with
k = 18 treatments, n = 12 observations per treatment, and F Cauchy.

                      90%                        95%
                       p                          p
             .26471  .50000  .73529     .26471  .50000  .73529

Arvesen      .49000  .45500  .44500     .52000  .50000  .49500
             .4382   .4473   .4502      .4772   .5023   .5340
             .405    .358    .311       .420    .375    .345

Normal       .13000  .11500  .11000     .16500  .13000  .13000
             .1454   .1643   .1746      .1754   .1972   .2084
             .079    .083    .091       .091    .097    .106

ABMeans      .60861  .68250  .69056     .68889  .76000  .75806
             .6172   .6013   .5301      .6817   .6673   .5967
             .284    .293    .314       .277    .282    .310

ABMedians    .47500  .90556  .63417     .55778  .95000  .68056
             .3690   .4034   .3914      .4198   .4571   .4457
             .191    .196    .188       .210    .215    .205

ABMeansC     .66000  .93500  .98000     .79000  .96000  .98500
             .6172   .6013   .5301      .6817   .6673   .5967
             .146    .161    .183       .137    .156    .186

ABMediansC   .57000  .98500  .76500     .64500  1.00000 .79500
             .3690   .4034   .3914      .4198   .4571   .4457
             .148    .151    .141       .165    .167    .157

Note: Format of this table is identical to Table 1C.
Table 5D

Behavior of nominal 90% and 95% confidence intervals for models with
k = 12 treatments, n = 6 observations per treatment, and F Cauchy.

                      90%                        95%
                       p                          p
             .26471  .50000  .73529     .26471  .50000  .73529

Arvesen      .67500  .53000  .57500     .69000  .56000  .65000
             .5242   .4839   .5166      .5646   .5478   .6164
             .379    .356    .313       .380    .378    .334

Normal       .48000  .23000  .21500     .64500  .30000  .29000
             .3024   .2897   .2953      .3622   .3464   .3518
             .103    .115    .139       .116    .132    .162

ABMeans      .56292  .62333  .69708     .69708  .75458  .80792
             .5701   .5700   .5784      .6920   .6953   .7042
             .309    .326    .344       .304    .317    .310

ABMedians    .58708  .86667  .73042     .69833  .94042  .81417
             .5208   .5092   .5110      .6176   .6024   .6140
             .222    .236    .227       .232    .250    .207

ABMeansC     .62000  .88500  .97500     .80000  .94500  .99500
             .5701   .5700   .5784      .6920   .6953   .7042
             .162    .177    .208       .155    .169    .190

ABMediansC   .61500  .98500  .86500     .81500  1.00000 .92000
             .5208   .5092   .5110      .6176   .6024   .6140
             .163    .182    .153       .179    .202    .169

Note: Format of this table is identical to Table 1C.
The second position in the number of the table is a letter that
refers to the size of the model used according to the following scheme:
A-(6,12), B-(12,15), C-(18,12), and D-(12,6).
The first four tables (Tables 1A1, 1A2, 1B1, and 1B2) give complete
results for all procedures and all p values for nominal 90% and 95%
confidence intervals constructed using normally distributed responses
and model sizes (6,12) and (12,15). The performance of each procedure
over the range of p values given in these tables held consistently
throughout the rest of the study. For this reason, the remaining tables
give results only for the p values .26471, .50000, and .73529.
First we examine Table 1A1. The Arvesen procedure gives an
empirical confidence coefficient that is consistently close to the 90%
nominal level over the entire range of p values. However, compared to
the other procedures, the average length and the standard deviation of
the lengths are quite high. The Normal procedure performs well, as it
should in this case since the needed assumptions are met, in empirical
confidence coefficient, length, and standard deviation. The ABMeans
procedure produces results similar to the Arvesen procedure giving
slightly lower empirical confidence coefficient, essentially equal
length, and less variability of length. The ABMedians procedure gives
results similar to the Normal procedure near the center of the range of
p values except for higher variability of length. The empirical
confidence coefficient drops off as p gets larger or smaller, especially
for p = .10000.
Due to the method of construction, the ABMeansC and ABMediansC
procedures produce intervals with the same average length as the ABMeans
and ABMedians procedures. However, the variability of the lengths is
decreased by using the combined procedures. For the ABMeansC procedure
the empirical confidence coefficient increases over the ABMeans
procedure to the point where it is consistently above the 90% nominal
level. However, at p = .10000 it is slightly lower than at the other
p values, though even there it is only slightly below the nominal level.
procedure performs very well near the center of the p values with
empirical confidence coefficient above the nominal level and variability
similar to the Normal procedure. However, like the ABMedians procedure,
the empirical confidence coefficient drops as p moves farther from
.50000 and again is especially poor at p = .10000.
The U-statistic procedure produces short intervals with moderate
variability but with empirical confidence coefficient well below the
nominal level for all values of p. The Chi-Square procedure also
produces intervals with consistently low empirical confidence
coefficient. The lengths are moderate while the variability of the
lengths is quite high.
Table 1A2 contains results for nominal 95% confidence intervals
under the same conditions as those used for Table 1A1. These results
are consistent with those found in Table 1A1.
Tables 1B1 and 1B2 show what happens if the model size is increased
to (12,15). For most procedures empirical confidence coefficients
generally get closer to the nominal level than they were for the (6,12)
model. For the ABMeansC and ABMediansC procedures, the empirical
confidence coefficient rises even higher above the nominal level except
for those same p values where they were low in the (6,12) model. For
all procedures the average lengths and standard deviations of lengths
decrease for the larger model though the decrease is not as pronounced
for the AB procedures as it is for the other procedures. Therefore, the
ABMeans and ABMedians procedures do not compare as favorably with the
Arvesen and Normal procedures as they did when the model was (6,12).
The ABMeansC and ABMediansC procedures also have longer lengths than the
Arvesen and Normal procedures but with generally higher empirical
confidence coefficients.
The U-statistic and Chi-Square procedures also improve as the model
size increases but the improvement is not sufficient to raise these
procedures to the level of the others. Another aspect to these
procedures is the high occurrence of intervals which have endpoints
that must be truncated at either 0 or 1 (since 0 ≤ p < 1). This happens
more frequently for the Chi-Square procedure than for the U-statistic
procedure, sometimes occurring as often as in 60% of the intervals even
when p = .50000. The poor performance of the U-statistic and Chi-Square
procedures was consistent over the whole study. For this reason these
procedures are not recommended for use and results are not given for
them beyond Table 1B2.
Beginning with Table 1C, results for both nominal 90% and 95%
confidence intervals are given in the same table for the reduced range
of p values. Table 1C shows that increasing the model size even
further, to (18,12), produces results consistent with the findings when
the model size was increased from (6,12) to (12,15). That is, lengths
and variability of lengths decrease for all procedures though at a
slower rate for the AB procedures. Also, empirical confidence
coefficients for the ABMeansC and ABMediansC procedures increase while
the empirical confidence coefficients of the other procedures stay near
the nominal levels.
The results in Table 1D are also consistent with the results in the
previous tables. Table 1D shows that, as would be expected, the
performance of the procedures is generally poorer than in Tables 1B1 and
1B2, since n is smaller, but better than in Tables 1A1 and 1A2 since the
number of treatments is higher.
Tables 2A through 2D are for effects with uniform distributions.
The patterns exhibited in Tables 1A1 through 1D are generally apparent
in these tables as well. The most notable exception is that the
empirical confidence coefficient for the Normal procedure is noticeably
higher than the nominal level. Also, the lengths of the intervals using
the Arvesen procedure are essentially the same as in the Normal
procedure. However, the Normal procedure is still superior due to the
increased empirical confidence coefficient and much less variable
lengths.
As the tails of the distributions of the effects get heavier, the
empirical confidence coefficient of the Normal procedure decreases.
This can be seen in Tables 3A through 3D which deal with effects that
have logistic distributions. The results for the other procedures are
similar to those seen previously.
For the smaller models, the ABMeansC procedure is apparently
superior to the Arvesen procedure. For example, in Table 3A for
p = .26471 and nominal 90% intervals, the ABMeansC procedure produced
intervals with higher empirical confidence coefficient, smaller
variability of length, and just slightly higher average length than the
Arvesen intervals. If the results for nominal 90% intervals for the
ABMeansC procedure are compared with nominal 95% intervals for the
Arvesen procedure (still p = .26471) the ABMeansC procedure is better in
all three areas. As the model size increases, the ABMeansC procedure
still produces intervals with higher empirical confidence coefficient
and less variable lengths than the Arvesen procedure but with clearly
higher average lengths (Table 3C).
Tables 4A through 4D give results for effects with Laplace
distributions. With even heavier tails the empirical confidence
coefficient of the Normal procedure decreases even more. Performance of
the other procedures generally follows the patterns discussed earlier.
If the effects have Cauchy distributions then they do not possess
finite second moments. Therefore, the only procedures whose assumptions
are satisfied are the ABMedians and ABMediansC. This is evident in the
results in Tables 5A through 5D where the ABMediansC procedure gives
consistently better results than the other procedures. The ABMeans and
ABMeansC procedures perform better overall than the Arvesen and Normal
procedures and in the larger models (Table 5C for example) are better
than the ABMedians and ABMediansC procedures for p values away from
.50000. For all procedures intervals have more variable lengths when
the Cauchy distribution is used.
As we have seen, an overall view of the tables shows that it is very
difficult to choose a uniformly "best" procedure. Each of the
procedures has situations where it performs well and other situations
where its performance is poor. With the exception of Cauchy distributed
effects (Tables 5A through 5D), the Arvesen procedure produces intervals
with confidence coefficient consistently close to the nominal level.
However, the lengths of the intervals constructed are quite variable,
especially for smaller models.
The Normal procedure produces intervals that are generally narrower
and less variable than those produced by the Arvesen procedure but with
an inconsistent confidence coefficient. The confidence coefficient
ranges from above the nominal level when the uniform distribution is
used (Tables 2A through 2D) to well below the nominal level when the
Cauchy distribution is used (Tables 5A through 5D).
Another aspect to both the Arvesen and Normal procedures is the
possibility of needing to truncate the endpoints of the interval at
either 0 or 1. This is necessary more often with the Arvesen procedure
than with the Normal procedure and in both cases less than with the
Chi-Square procedure. In those cases where truncation was necessary,
the length of the interval was calculated using the value of 0 and/or 1.
For the methods using the modified Ansari-Bradley statistics we saw
that, due to the method of combining the k possible intervals, the
length of the combined interval is the same as the average length of all
k possible intervals. Therefore, the average lengths reported in the
tables are identical for the ABMeans and ABMeansC procedures as well as
for the ABMedians and ABMediansC procedures. Since the tables also
showed that combining the intervals produces less variable lengths and
consistently higher confidence coefficient, it is recommended that the
ABMeansC and ABMediansC procedures be used rather than the ABMeans or
ABMedians procedures.
The ABMeansC procedure produces intervals with confidence
coefficient consistently higher than the nominal level, even for the
smaller models, except when p = .10000. Even when p = .10000 the
confidence coefficient is only slightly below the nominal level. As
with the Arvesen and Normal procedures, the performance of the ABMeansC
procedure is poorer when the effects have Cauchy distributions although
the drop-off in performance is less severe. The average lengths of the
intervals from the ABMeansC procedure are approximately the same as the
average lengths from the Arvesen procedure for small models but do not
decrease as quickly as the model size increases.
The ABMediansC procedure produces intervals with generally shorter
and less variable lengths than the ABMeansC procedure. The confidence
coefficient is consistently above the nominal level for p values near
.50000 but falls as p moves toward 0 or 1. The drop is quite severe as
p approaches .10000.
As with the ABMeansC procedure, the reduction in average length and
standard deviation of length as the model size increases is not as rapid
for the ABMediansC procedure as it is with the Arvesen and Normal
procedures. However, unlike the Arvesen and Normal procedures, the
ABMeansC or ABMediansC procedures will always produce intervals with
endpoints between 0 and 1 due to their methods of construction (see
Section 3.3).
The choice of which procedure to use to construct a confidence
interval for p will really depend on how much is known or is being
assumed about the model. If it is assumed that the effects have a
distribution similar to a uniform or normal distribution, then the
Normal procedure is clearly superior (Tables 1Al through 2D) since it
produces narrow intervals with confidence coefficient close to or
greater than the nominal level. However, the Normal procedure is not
recommended if the effects might have distributions with heavy tails.
If it is believed that p is near .50000 and nothing is known about
the distribution of the effects, then the ABMediansC procedure is
recommended since, for every distribution including Cauchy, the method
performs very well for values of p near .50000. However, this method is
not recommended if p is thought to be near .10000.
If nothing is known about the distribution of the effects or the
value of p, but moments are assumed to exist, then the Arvesen procedure
or the ABMeansC procedure should be used. These procedures gave the
most consistent performance over the whole range of situations. For
smaller models the ABMeansC procedure would be recommended since it
provides less variable intervals with higher confidence coefficient than
the Arvesen procedure with little or no increase in length. For larger
models the disparity in length and confidence coefficient between the
two procedures increases. If a high confidence coefficient is desired,
then the ABMeansC procedure should be used. If a shorter length is
desired, then the Arvesen procedure will produce such an interval but
with more variation in the lengths and a smaller confidence coefficient.
If it is believed that the effects may have a very heavy tailed
distribution, such as Cauchy, then either the ABMeansC or ABMediansC
procedures should be used since their performance is superior to the
other procedures in this case.
The overall performance of the ABMeansC and ABMediansC procedures
is such that they merit serious consideration when a confidence interval
for p is desired. For distributions of all types and for all but
extreme values of p, these procedures produce intervals that compare
favorably with intervals produced by other procedures and, in many
cases, are superior. This is especially true when the model size is
small. This conclusion is apparently valid even when the assumptions
necessary to apply the other procedures are met. For example, compare
the performance of the ABMediansC and Normal procedures when the effects
have normal distributions and the model size is (6,12) (Tables 1A1 and
1A2). Yet the AB procedures can sometimes be validly implemented under
less restrictive assumptions than the competing techniques.
As we have seen, one of the points to consider when choosing a
procedure to use is the assumptions necessary for valid implementation
of the procedure. These assumptions are reviewed in the following
chapter.
CHAPTER FIVE
SUMMARY
In this dissertation we have derived and studied various methods of
measuring the proportion of the total variability in the responses from
a balanced one-way random effects model,
Z_ij = μ + a_i + e_ij,  i = 1,2,...,k, j = 1,2,...,n,
that is attributable to the treatments. These methods require different
assumptions and therefore, theoretically, can only be used if the
appropriate assumptions are met.
The ABMeans and ABMedians procedures (and thus also the ABMeansC
and ABMediansC procedures) derived in Chapter Three require the eij and
ai to possess continuous distributions that are symmetric about zero and
that differ only by a scale parameter. Both procedures also require the
distributions to have bounded, continuous densities that are positive at
zero with bounded, continuous first derivatives. The ABMeans procedure
requires the distributions to have finite second moments while the
ABMedians procedure requires either ∫|x|^p f(x)dx < ∞ for some p > 0
or lim inf_{x→∞} -ln[1-F(x)]·[2ln(x)]⁻¹ > 0. Both procedures are asymptotic as
both k (number of treatments) and n (number of observations per
treatment) go to infinity. However, as we saw in Chapter Four, Monte
Carlo studies show that the procedures perform quite well for small
values of n and k.
These assumptions are a broadening of the assumptions used in the
classical, normal theory analysis of the balanced one-way random effects
model. In the classical analysis the effects are assumed to have normal
distributions with zero means and finite variances. The assumptions for
the ABMeans and ABMedians procedures allow the effects to have other
symmetric distributions and, in the ABMedians procedure, do not
require finite second moments.
The U-statistic and Chi-Square procedures derived in Chapter Two,
as well as the Arvesen procedure described in Chapter Two, also require
the eij and ai to have continuous distributions. These distributions
must have mean zero and finite fourth moments but need not be symmetric
nor of the same family. These procedures only require k to go to
infinity rather than both k and n. In some senses this is a more
reasonable approach since increasing k is sufficient to obtain more
information about both the treatment and error effects. Therefore, the
procedures involving U-statistics could be used in some situations where
the ABMeans and ABMedians procedures could not.
Of the procedures derived in this dissertation the ABMeansC and
ABMediansC procedures produced the most promising results. Future
research could include trying to find other ways of combining the
individual intervals that would produce a narrower interval, even if the
empirical confidence coefficient is decreased. The ABMeansC and
ABMediansC procedures produce intervals that, for the most part, have
empirical confidence coefficient far above the nominal level. It would
be desirable to obtain intervals, presumably shorter, with confidence
coefficients nearer to the nominal levels.
Other possible areas of future research would be to extend the
procedures to the unbalanced model and to two-way and more complex
models. Formation of pseudo-samples using quantities other than sample
means or sample medians could also be examined.
APPENDIX A
VARIANCES AND COVARIANCE OF U1 AND U2

Using the balanced one-way random effects model under the
assumptions given in Section 2.2, we recall that U₁, given in (2.2.1),
is a U-statistic based on a kernel of degree s = 1. Therefore, from
Result 2.1.2,

(A.1)  lim_{k→∞} k Var(U₁) = ζ₁,

where, using (2.1.2),

(A.2)  ζ₁ = E[h₁²(Z₁)] - (2σ_e²)².

Recalling from the assumptions in Section 2.2 that the e_ij and a_i are
mutually independent with mean zero, and writing μ₄ = E(e₁₁⁴) and
η₄ = E(a₁⁴) for the fourth moments, we obtain

E[h₁²(Z₁)] = [n(n-1)]⁻² E[Σ_{j≠j'} (Z_{1j} - Z_{1j'})²]²
  = [n(n-1)]⁻² [2n(n-1) E(e₁₁-e₁₂)⁴
      + 4n(n-1)(n-2) E[(e₁₁-e₁₂)²(e₁₁-e₁₃)²]
      + n(n-1)(n-2)(n-3) E[(e₁₁-e₁₂)²(e₁₃-e₁₄)²]]
  = 2[n(n-1)]⁻¹ [E(e₁₁-e₁₂)⁴ + 2(n-2) E[(e₁₁-e₁₂)²(e₁₁-e₁₃)²]
      + (1/2)(n-2)(n-3) E[(e₁₁-e₁₂)²(e₁₃-e₁₄)²]]
  = 2[n(n-1)]⁻¹ [(2μ₄+6σ_e⁴) + 2(n-2)(μ₄+3σ_e⁴)
      + (1/2)(n-2)(n-3)(4σ_e⁴)]
  = 4μ₄/n + [n(n-1)]⁻¹(4n²-8n+12)σ_e⁴

and therefore, referring to Result 2.1.2, (A.1), and (A.2),

lim_{k→∞} k Var(U₁) = 4μ₄/n + [n(n-1)]⁻¹(4n²-8n+12)σ_e⁴ - 4σ_e⁴
  = 4n⁻¹[μ₄ + σ_e⁴(3-n)(n-1)⁻¹],

establishing (2.2.5).

Now recall that U₂, given in (2.2.2), is a U-statistic of degree
s = 2. We again use Result 2.1.2 to get

(A.3)  lim_{k→∞} k Var(U₂) = 4ζ₁,

where in this case, again using (2.1.2),

(A.4)  ζ₁ = E[h₂(Z₁,Z₂)h₂(Z₁,Z₃)] - (2σ_a²+2σ_e²)².

Calculating the expectation on the RHS of (A.4) gives

E[h₂(Z₁,Z₂)h₂(Z₁,Z₃)]
  = n⁻⁴ E([Σ_j Σ_{j'} (Z_{1j}-Z_{2j'})²][Σ_j Σ_{j'} (Z_{1j}-Z_{3j'})²])
  = n⁻⁴ (n³ E[(Z₁₁-Z₂₁)²(Z₁₁-Z₃₁)²] + n³(n-1) E[(Z₁₁-Z₂₁)²(Z₁₂-Z₃₁)²])
  = n⁻¹ (E[(a₁-a₂+e₁₁-e₂₁)²(a₁-a₃+e₁₁-e₃₁)²]
      + (n-1) E[(a₁-a₂+e₁₁-e₂₁)²(a₁-a₃+e₁₂-e₃₁)²])
  = n⁻¹ [(η₄ + μ₄ + 3σ_a⁴ + 3σ_e⁴ + 12σ_a²σ_e²)
      + (n-1)(η₄ + 3σ_a⁴ + 4σ_e⁴ + 8σ_a²σ_e²)]

and therefore, referring to Result 2.1.2, (A.3), and (A.4),

lim_{k→∞} k Var(U₂) = 4(η₄ - σ_a⁴ + μ₄/n - σ_e⁴/n + 4σ_a²σ_e²/n),

establishing (2.2.6).

Finally, observe from Result 2.1.4 that

(A.5)  lim_{k→∞} k Cov(U₁,U₂) = 2ζ(1,2),

where, using (2.1.3),

(A.6)  ζ(1,2) = E[h₁(Z₁)h₂(Z₁,Z₂)] - (2σ_e²)(2σ_a²+2σ_e²).

Again, first looking at the expectation on the RHS of (A.6), we obtain

E[h₁(Z₁)h₂(Z₁,Z₂)]
  = n⁻³(n-1)⁻¹ E([Σ_{j≠j'} (Z_{1j}-Z_{1j'})²][Σ_j Σ_{j'} (Z_{1j}-Z_{2j'})²])
  = n⁻³(n-1)⁻¹ [2n(n-1) E[(Z₁₁-Z₁₂)²(Z₁₁-Z₂₁)²]
      + n(n-1)(n-2) E[(Z₁₁-Z₁₂)²(Z₁₃-Z₂₁)²]]
  = n⁻¹ [2 E[(e₁₁-e₁₂)²(a₁-a₂+e₁₁-e₂₁)²]
      + (n-2) E[(e₁₁-e₁₂)²(a₁-a₂+e₁₃-e₂₁)²]]
  = n⁻¹ [2(μ₄+3σ_e⁴+4σ_a²σ_e²) + (n-2)(4σ_e⁴+4σ_a²σ_e²)]

and therefore, referring to Result 2.1.4, (A.5), and (A.6),

lim_{k→∞} k Cov(U₁,U₂) = 4n⁻¹(μ₄ - σ_e⁴),

establishing (2.2.7).
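Limits such as (2.2.5) are easy to sanity-check by simulation. The
sketch below (Python with numpy; the sample sizes, the seed, and the
choice of normal errors are mine, not the dissertation's) estimates
k·Var(U₁) empirically, using the identity U₁ = 2MSE established in
Appendix B, and compares it with 4n⁻¹[μ₄ + σ_e⁴(3-n)(n-1)⁻¹].

    import numpy as np

    rng = np.random.default_rng(0)
    k, n, sigma_e, reps = 40, 6, 1.0, 20000

    u1 = np.empty(reps)
    for r in range(reps):
        e = rng.normal(0.0, sigma_e, size=(k, n))
        mse = ((e - e.mean(axis=1, keepdims=True))**2).sum() / (k * (n - 1))
        u1[r] = 2.0 * mse                     # U1 = 2 MSE (Appendix B)

    mu4 = 3.0 * sigma_e**4                    # fourth moment of a normal error
    limit = (4.0 / n) * (mu4 + sigma_e**4 * (3.0 - n) / (n - 1.0))
    print(k * u1.var(), "versus the limit", limit)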
APPENDIX B
THE RELATIONSHIP BETWEEN U1, U2, MST, AND MSE

This appendix establishes the relationship between U₁ and U₂, given
in (2.2.1) and (2.2.2), and MST and MSE, the mean squares from the usual
one-way analysis of variance table (Scheffe' 1959, Page 225).

First, we expand U₁, U₂, MST, and MSE so that each is written
completely in terms of the quantities a_i and e_ij. While this is not
necessary in order to see the relationship between U₁ and MSE, it
facilitates establishment of the overall relationship between the
statistics.

From (2.2.1) we see that

U₁ = [nk(n-1)]⁻¹ Σ_i Σ_{j≠j'} (e_ij - e_ij')²

(B.1)  = [nk(n-1)]⁻¹ Σ_i Σ_{j≠j'} [e_ij² + e_ij'² - 2e_ij e_ij']
       = 2(nk)⁻¹ Σ_i Σ_j e_ij² - 2[nk(n-1)]⁻¹ Σ_i Σ_{j≠j'} e_ij e_ij'.

Letting Z̄_i. = n⁻¹ Σ_j Z_ij and ē_i. = n⁻¹ Σ_j e_ij, it follows that
MSE = [k(n-1)]⁻¹ Σ_i Σ_j (Z_ij - Z̄_i.)². Expanding, we obtain

MSE = [k(n-1)]⁻¹ Σ_i Σ_j (e_ij - ē_i.)²
    = [k(n-1)]⁻¹ [Σ_i Σ_j e_ij² - n⁻¹ Σ_i (Σ_j e_ij)²]
    = [k(n-1)]⁻¹ [Σ_i Σ_j e_ij² - n⁻¹ Σ_i Σ_j e_ij²
        - n⁻¹ Σ_i Σ_{j≠j'} e_ij e_ij']
(B.2)  = (nk)⁻¹ Σ_i Σ_j e_ij² - [nk(n-1)]⁻¹ Σ_i Σ_{j≠j'} e_ij e_ij'.

Thus, from (B.1) and (B.2) it follows that

MSE = U₁/2.

In the same manner, from (2.2.2) we see that

U₂ = 2[n²k(k-1)]⁻¹ Σ_{i<i'} Σ_j Σ_{j'} (Z_ij - Z_i'j')²
   = 2[n²k(k-1)]⁻¹ Σ_{i<i'} Σ_j Σ_{j'} (a_i - a_i' + e_ij - e_i'j')²

(B.3)  = 2k⁻¹ Σ_i a_i² + 2(nk)⁻¹ Σ_i Σ_j e_ij²
       - 4[k(k-1)]⁻¹ Σ_{i<i'} a_i a_i' + 4(nk)⁻¹ Σ_i Σ_j a_i e_ij
       - 4[nk(k-1)]⁻¹ Σ_{i≠i'} Σ_j a_i e_i'j
       - 4[n²k(k-1)]⁻¹ Σ_{i<i'} Σ_j Σ_{j'} e_ij e_i'j'.

Letting Z̄.. = (nk)⁻¹ Σ_i Σ_j Z_ij, ē.. = (nk)⁻¹ Σ_i Σ_j e_ij, and
ā = k⁻¹ Σ_i a_i, it follows that MST = n(k-1)⁻¹ Σ_i (Z̄_i. - Z̄..)².
Expanding in the same way and collecting the six types of products, we
obtain

MST = n(k-1)⁻¹ Σ_i (a_i - ā + ē_i. - ē..)²

(B.4)  = nk⁻¹ Σ_i a_i² + (nk)⁻¹ Σ_i Σ_j e_ij²
       + (nk)⁻¹ Σ_i Σ_{j≠j'} e_ij e_ij'
       - 2n[k(k-1)]⁻¹ Σ_{i<i'} a_i a_i' + 2k⁻¹ Σ_i Σ_j a_i e_ij
       - 2[k(k-1)]⁻¹ Σ_{i≠i'} Σ_j a_i e_i'j
       - 2[nk(k-1)]⁻¹ Σ_{i<i'} Σ_j Σ_{j'} e_ij e_i'j'.

It now follows from (B.1), (B.3), and (B.4) that

MST - nU₂/2 = (1-n)(nk)⁻¹ Σ_i Σ_j e_ij² + (nk)⁻¹ Σ_i Σ_{j≠j'} e_ij e_ij'
            = (1-n)U₁/2

and thus

MST = [nU₂ + (1-n)U₁]/2.
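Because these relationships are algebraic identities, they hold exactly
for any data set, and a small numerical check makes that concrete. The
sketch below (Python with numpy, my choice of tooling; the kernels
follow the definitions recalled in Appendix A) computes U₁ and U₂
directly and verifies both identities.

    import numpy as np

    rng = np.random.default_rng(1)
    k, n = 5, 4
    z = rng.normal(size=(k, n))

    # U1 = k^(-1) sum_i h1(Z_i),
    # h1(Z_i) = [n(n-1)]^(-1) sum_{j != j'} (Z_ij - Z_ij')^2
    u1 = ((z[:, :, None] - z[:, None, :])**2).sum() / (k * n * (n - 1))

    # U2 = [k(k-1)]^(-1) sum_{i != i'} h2(Z_i, Z_i'),
    # h2(Z_i, Z_i') = n^(-2) sum_j sum_j' (Z_ij - Z_i'j')^2
    h2 = ((z[:, None, :, None] - z[None, :, None, :])**2).sum(axis=(2, 3)) / n**2
    u2 = (h2.sum() - np.trace(h2)) / (k * (k - 1))

    zbar = z.mean(axis=1)
    mse = ((z - zbar[:, None])**2).sum() / (k * (n - 1))
    mst = n * ((zbar - zbar.mean())**2).sum() / (k - 1)
    print(np.isclose(mse, u1 / 2.0),
          np.isclose(mst, (n*u2 + (1 - n)*u1) / 2.0))   # both True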
APPENDIX C
A CONSISTENT ESTIMATE FOR σ_T

The confidence interval for p given in (2.2.10) involves an
estimate for the asymptotic standard deviation of U₁/U₂. The form of
that estimate is derived in this appendix using the model and
assumptions from Section 2.2.

From (2.2.8) and (2.2.9) it follows that

(C.1)  σ_T² = σ₁₁(2σ_a²+2σ_e²)⁻² + σ₂₂(2σ_e²)²(2σ_a²+2σ_e²)⁻⁴
             - 2σ₁₂(2σ_e²)(2σ_a²+2σ_e²)⁻³.

Theorem 2.1.3 gives conditions under which U-statistics converge almost
surely to their expectations. Using (2.2.3) and (2.2.4), it follows
that for any number c,

(C.2)  U₂^c → (2σ_a²+2σ_e²)^c  a.s. as k → ∞

and

(C.3)  U₁^c → (2σ_e²)^c  a.s. as k → ∞.

If consistent estimates of σ₁₁, σ₂₂, and σ₁₂ can be found, they can be
combined with U₁ and U₂ to form a consistent estimate of σ_T as given
in (C.1).

We now turn our attention to finding consistent estimates
for σ₁₁, σ₂₂, and σ₁₂. Note that from (2.2.5), (A.1), and (A.2),
σ₁₁ = E[h₁²(Z₁)] - (2σ_e²)². Defining

(C.4)  σ̂₁₁ = k⁻¹ Σ_i [h₁(Z_i) - U₁]²,

we obtain the following result.

Result C.1. Defining σ̂₁₁ as in (C.4), σ̂₁₁ → σ₁₁ a.s. as k → ∞.

Proof: Expanding the RHS of (C.4) we obtain

σ̂₁₁ = k⁻¹ Σ_i h₁²(Z_i) - 2U₁ k⁻¹ Σ_i h₁(Z_i) + U₁²
     = k⁻¹ Σ_i h₁²(Z_i) - U₁²
     → E[h₁²(Z₁)] - (2σ_e²)²  a.s. as k → ∞.

The last step is justified by using (C.3) with c = 2 and noting that
k⁻¹ Σ_i h₁²(Z_i) is a U-statistic and applying Theorem 2.1.3.

Now note that from (2.2.7), (A.5), and (A.6),
σ₁₂ = 2(E[h₁(Z₁)h₂(Z₁,Z₂)] - (2σ_e²)(2σ_a²+2σ_e²)). Defining

(C.5)  σ̂₁₂ = 2([k(k-1)]⁻¹ Σ_{i≠i'} h₁(Z_i)h₂(Z_i,Z_i') - U₁U₂),

we obtain the following result.

Result C.2. Defining σ̂₁₂ as in (C.5), σ̂₁₂ → σ₁₂ a.s. as k → ∞.

Proof: We rewrite σ̂₁₂ as

σ̂₁₂/2 = 2[k(k-1)]⁻¹ Σ_{i<i'} (1/2)[h₁(Z_i)h₂(Z_i,Z_i')
         + h₁(Z_i')h₂(Z_i',Z_i)] - U₁U₂.

The first term on the RHS is a U-statistic and therefore, by
Theorem 2.1.3, converges almost surely to the expectation of its
kernel, which is

(1/2) E[h₁(Z₁)h₂(Z₁,Z₂) + h₁(Z₂)h₂(Z₂,Z₁)] = E[h₁(Z₁)h₂(Z₁,Z₂)].

Application of (C.2) and (C.3) with c = 1 completes the proof.

Finally, note that (2.2.6), (A.3), and (A.4) imply that
σ₂₂ = 4(E[h₂(Z₁,Z₂)h₂(Z₁,Z₃)] - (2σ_a²+2σ_e²)²). Defining

(C.6)  g(Z_i) = (k-1)⁻¹ Σ_{i'≠i} h₂(Z_i,Z_i')

and

(C.7)  σ̂₂₂ = 4k⁻¹ Σ_i [g(Z_i) - U₂]²,

we obtain the following result.

Result C.3. Defining g(Z_i) as in (C.6) and σ̂₂₂ as in (C.7),
σ̂₂₂ → σ₂₂ a.s. as k → ∞.

Proof: Expanding (C.7) we obtain

σ̂₂₂/4 = k⁻¹ Σ_i g²(Z_i) - 2U₂[k(k-1)]⁻¹ Σ_i Σ_{i'≠i} h₂(Z_i,Z_i') + U₂²
       = k⁻¹ Σ_i [(k-1)⁻¹ Σ_{i'≠i} h₂(Z_i,Z_i')]² - U₂².

Rewriting the first term we obtain

[k(k-1)²]⁻¹ (Σ_i Σ_{i'≠i} h₂²(Z_i,Z_i')
  + Σ_i Σ_{i'≠i} Σ_{i''≠i,i'} h₂(Z_i,Z_i')h₂(Z_i,Z_i''))

(C.8)  = (k-1)⁻¹ {2[k(k-1)]⁻¹ Σ_{i<i'} h₂²(Z_i,Z_i')}
       + (k-2)(k-1)⁻¹ {6[k(k-1)(k-2)]⁻¹ Σ_{i<i'<i''}
           (1/3)[h₂(Z_i,Z_i')h₂(Z_i,Z_i'')
           + h₂(Z_i',Z_i)h₂(Z_i',Z_i'')
           + h₂(Z_i'',Z_i)h₂(Z_i'',Z_i')]}.

The first term of (C.8) is equal to (k-1)⁻¹ times a U-statistic and
therefore converges almost surely to zero. This follows from
Theorem 2.1.3 since E[h₂²(Z₁,Z₂)] is finite under the assumed finite
fourth moments, μ₄ and η₄, of the e_ij and a_i. The second term of
(C.8) is (k-2)(k-1)⁻¹ times a U-statistic whose kernel is symmetric
with expectation E[h₂(Z₁,Z₂)h₂(Z₁,Z₃)], so by Theorem 2.1.3 it
converges almost surely to E[h₂(Z₁,Z₂)h₂(Z₁,Z₃)]. Thus (C.8) converges
almost surely to E[h₂(Z₁,Z₂)h₂(Z₁,Z₃)]. The result is proven by using
this fact and applying (C.2) with c = 2.

Using Results C.1, C.2, and C.3, and (C.2) and (C.3) with various
values of c, we obtain a consistent estimate of σ_T². We will denote
the estimate by σ̂_T², where

σ̂_T² = σ̂₁₁U₂⁻² + σ̂₂₂U₁²U₂⁻⁴ - 2σ̂₁₂U₁U₂⁻³.
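Assembling the estimate is mechanical once h₁ has been evaluated for
every treatment and h₂ for every pair of treatments. The following
sketch (Python with numpy, offered as an illustration only; the
dissertation's own computations used FORTRAN and IMSL) follows (C.4)
through (C.7) directly.

    import numpy as np

    def sigma_t_sq_hat(z):
        # z: k x n matrix of responses Z_ij; kernels as in Appendix A.
        k, n = z.shape
        h1 = ((z[:, :, None] - z[:, None, :])**2).sum(axis=(1, 2)) / (n * (n - 1))
        h2 = ((z[:, None, :, None] - z[None, :, None, :])**2).sum(axis=(2, 3)) / n**2
        u1 = h1.mean()
        u2 = (h2.sum() - np.trace(h2)) / (k * (k - 1))

        s11 = ((h1 - u1)**2).mean()                                   # (C.4)
        off = ~np.eye(k, dtype=bool)                                  # i != i'
        s12 = 2.0 * ((h1[:, None] * h2)[off].sum() / (k * (k - 1))
                     - u1 * u2)                                       # (C.5)
        g = (h2.sum(axis=1) - np.diag(h2)) / (k - 1)                  # (C.6)
        s22 = 4.0 * ((g - u2)**2).mean()                              # (C.7)

        # the consistent estimate of sigma_T^2 given above
        return s11/u2**2 + s22*u1**2/u2**4 - 2.0*s12*u1/u2**3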
APPENDIX D
DERIVATION OF ENDPOINTS IN CHI-SQUARE PROCEDURE

Using the model and assumptions in Section 2.2, a confidence
interval for p was derived using the χ² distribution (2.2.14). The
formulas for the endpoints of this interval are derived in this
appendix.

To find the slopes, d₁ and d₂, of the two lines in Figure 2.2.1 we
rewrite (2.2.13) as

σ₂₂X'² - 2σ₂₂U₁X' + σ₂₂U₁² + σ₁₁Y'² - 2σ₁₁U₂Y' + σ₁₁U₂²
  - 2σ₁₂X'Y' + 2σ₁₂U₁Y' + 2σ₁₂U₂X' - 2σ₁₂U₁U₂ - Dχ²k⁻¹ = 0,

where D = σ₁₁σ₂₂ - σ₁₂² and χ² denotes the critical value used in
(2.2.14). Substituting Y' = dX' and collecting coefficients yields

(D.1)  X'²(σ₂₂ + σ₁₁d² - 2σ₁₂d)
       + X'(-2σ₂₂U₁ - 2σ₁₁U₂d + 2σ₁₂U₁d + 2σ₁₂U₂)
       + (σ₂₂U₁² + σ₁₁U₂² - 2σ₁₂U₁U₂ - Dχ²k⁻¹) = 0.

The values of d for which the lines Y' = dX' are tangent to the ellipse
depicted in Figure 2.2.1 are the values that yield only one solution of
this quadratic equation in X'. If we write (D.1) as
a₁X'² + b₁X' + c₁ = 0, the values of d we are seeking must satisfy
b₁² - 4a₁c₁ = 0. Now,

b₁² = 4σ₂₂²U₁² + 4σ₁₁²U₂²d² + 4σ₁₂²U₁²d² + 4σ₁₂²U₂²
  + 8σ₁₁σ₂₂U₁U₂d - 8σ₁₂σ₂₂U₁²d - 8σ₁₂σ₂₂U₁U₂
  - 8σ₁₁σ₁₂U₂²d - 8σ₁₁σ₁₂U₁U₂d² + 8σ₁₂²U₁U₂d

and

4a₁c₁ = 4σ₂₂²U₁² + 4σ₁₁σ₂₂U₂² - 8σ₁₂σ₂₂U₁U₂ - 4σ₂₂Dχ²k⁻¹
  + 4σ₁₁σ₂₂U₁²d² + 4σ₁₁²U₂²d² - 8σ₁₁σ₁₂U₁U₂d² - 4σ₁₁Dχ²k⁻¹d²
  - 8σ₁₂σ₂₂U₁²d - 8σ₁₁σ₁₂U₂²d + 16σ₁₂²U₁U₂d + 8σ₁₂Dχ²k⁻¹d,

hence

(D.2)  b₁² - 4a₁c₁ = d²(4σ₁₂²U₁² - 4σ₁₁σ₂₂U₁² + 4σ₁₁Dχ²k⁻¹)
       + d(8σ₁₁σ₂₂U₁U₂ - 8σ₁₂²U₁U₂ - 8σ₁₂Dχ²k⁻¹)
       + (4σ₁₂²U₂² - 4σ₁₁σ₂₂U₂² + 4σ₂₂Dχ²k⁻¹).

Writing (D.2) as a₂d² + b₂d + c₂, we see that the values of d that make
b₁² - 4a₁c₁ equal to 0 are the roots of a₂d² + b₂d + c₂ = 0. These two
roots are

r⁻ = (2a₂)⁻¹[-b₂ - (b₂² - 4a₂c₂)^{1/2}]  and
r⁺ = (2a₂)⁻¹[-b₂ + (b₂² - 4a₂c₂)^{1/2}].

The values needed to find r⁻ and r⁺ are

-b₂ = 8(-σ₁₁σ₂₂U₁U₂ + σ₁₂²U₁U₂ + σ₁₂Dχ²k⁻¹),

2a₂ = 8(σ₁₂²U₁² - σ₁₁σ₂₂U₁² + σ₁₁Dχ²k⁻¹),

b₂² = 64(σ₁₁²σ₂₂²U₁²U₂² + σ₁₂⁴U₁²U₂² + σ₁₂²D²(χ²)²k⁻²
  - 2σ₁₁σ₂₂σ₁₂²U₁²U₂² - 2σ₁₁σ₂₂σ₁₂U₁U₂Dχ²k⁻¹ + 2σ₁₂³U₁U₂Dχ²k⁻¹),

4a₂c₂ = 64(σ₁₂⁴U₁²U₂² - 2σ₁₁σ₂₂σ₁₂²U₁²U₂² + σ₁₁²σ₂₂²U₁²U₂²
  - σ₂₂U₁²D²χ²k⁻¹ - σ₁₁U₂²D²χ²k⁻¹ + σ₁₁σ₂₂D²(χ²)²k⁻²),

and

b₂² - 4a₂c₂ = 64D²χ²k⁻¹(σ₂₂U₁² - 2σ₁₂U₁U₂ + σ₁₁U₂² - Dχ²k⁻¹).

Ideally, both r⁻ and r⁺ will be greater than one, since that would
produce a confidence interval with endpoints between 0 and 1 (the range
of possible values for p). If this occurs, we define the values of d as
d₁ = min(r⁻, r⁺) and d₂ = max(r⁻, r⁺).

In practice, however, it is possible that one or both of r⁻ and r⁺
are not greater than one. These situations are handled in the following
manner. If the ellipse intersects the Y' axis, d₂ is set equal to ∞.
If the ellipse intersects the line X' = Y', d₁ is set equal to 1. If
both of these events occur, the confidence interval will have endpoints
of 0 and 1. If only one occurs, the other value of d is set equal to
the value of r⁻ or r⁺, whichever is greater than one.
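In practice both slopes come from a single quadratic solve. A minimal
sketch follows (Python; the function and argument names are
assumptions, degenerate cases such as a₂ = 0 or a negative discriminant
are ignored, and the truncation rules above must still be applied to
the output).

    import math

    def tangent_slopes(u1, u2, s11, s22, s12, chi2, k):
        # Coefficients of (D.2), written as a2*d^2 + b2*d + c2 = 0,
        # with D = s11*s22 - s12^2:
        D = s11 * s22 - s12**2
        a2 = 4.0 * D * (s11 * chi2 / k - u1**2)
        b2 = 8.0 * D * (u1 * u2 - s12 * chi2 / k)
        c2 = 4.0 * D * (s22 * chi2 / k - u2**2)
        root = math.sqrt(b2**2 - 4.0 * a2 * c2)
        r_minus = (-b2 - root) / (2.0 * a2)
        r_plus = (-b2 + root) / (2.0 * a2)
        return min(r_minus, r_plus), max(r_minus, r_plus)   # candidates for d1, d2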