Analysis of continuous proportions


Material Information

Title:
Analysis of continuous proportions
Physical Description:
ix, 139 leaves. : ill. ; 28 cm.
Language:
English
Creator:
Johnson, David Walter, 1948-
Publication Date:
1974
Subjects

Subjects / Keywords:
Distribution (Probability theory)   ( lcsh )
Statistical hypothesis testing   ( lcsh )
Estimation theory   ( lcsh )
Dirichlet series   ( lcsh )
Genre:
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Thesis:
Thesis--University of Florida.
Bibliography:
Bibliography: leaves 137-138.
Statement of Responsibility:
by David Johnson.
General Note:
Typescript.
General Note:
Vita.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 000582385
notis - ADB0760
oclc - 14102999
System ID:
AA00003537:00001

Full Text











ANALYSIS OF CONTINUOUS PROPORTIONS


By

DAVID WALTER JOHNSON












A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF
THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR
THE DEGREE OF DOCTOR OF PHILOSOPHY










UNIVERSITY OF FLORIDA


1974




















TO CAROLYN














ACKNOWLEDGMENTS


I would like to express my appreciation to the

chairman of my committee, Dr. Ramon C. Littell, for

his guidance in the preparation of this dissertation.

A word of thanks also goes to the other members of my

committee, Dr. Frank G. Martin, Dr. Zoran Pop-Stojanovic,

Dr. P.V. Rao, and Dr. John G. Saw, for their assistance

and to Dr. William Mendenhall for his encouragement

during my graduate career at the University of Florida.

Special thanks go to my parents for their assistance,

both financial and emotional, and to my wife, Carolyn,

without whose love and understanding this dissertation

would not have been possible.



















TABLE OF CONTENTS


ACKNOWLEDGMENTS

LIST OF TABLES

ABSTRACT


CHAPTER

1   INTRODUCTION

2   REVIEW OF THE LITERATURE

3   ESTIMATION FOR THE BETA DISTRIBUTION

        Introduction
        Sample Moment Estimators
        Maximum Likelihood Estimators
            Derivation of the Estimators
            Asymptotic Properties of the Estimators
            Small Sample Properties of the Estimators
        Geometric Mean Estimator
        Comparison of the Estimators




























TABLE OF CONTENTS (Continued)


CHAPTER

4   HYPOTHESIS TESTING FOR THE BETA DISTRIBUTION

        Introduction
        One-sided Tests for α When β Is Known
            Normal Approximation
            Beta Approximation
        Two-sided Tests for α When β Is Known
            Normal Approximation
            Beta Approximation
        One-sided Tests for α When β Is Unknown
        Two-sided Tests for α When β Is Unknown
        A Test for the Mean of the Beta Distribution

5   ESTIMATORS FOR THE DIRICHLET DISTRIBUTION

BIBLIOGRAPHY

BIOGRAPHICAL SKETCH


LIST OF TABLES


Table

1   Variances and Covariances of the Sample Moment Estimators

2   Biases and Expected Mean Squares and Products of the Maximum Likelihood Estimators

3   Variance, Bias, and Bias Squared for the Geometric Mean Estimator

4   Value of c for the Normal Approximation to the Distribution of $\prod_{i=1}^{n} X_i$ for a One-sided Test for α When β Is Known

5   Values of the Parameters and c for the Beta Approximation to the Distribution of $\prod_{i=1}^{n} X_i$ for a One-sided Test for α When β Is Known








Abstract of Dissertation Presented to the Graduate Council
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy



ANALYSIS OF CONTINUOUS PROPORTIONS

By

David Walter Johnson

December, 1974



Chairman: Dr. Ramon C. Littell
Major Department: Statistics



Many experiments are designed to take measurements

on the decomposition of one particular entity into several

parts. The effects of other variables in the experiment

on this decomposition are then noted and analyzed. This

dissertation is a first step toward the analysis of such

a multivariate response of continuous proportions. Two

possible distributions, the Dirichlet distribution and

a generalization of the Dirichlet distribution, are

proposed as models for this response vector. It is then

desired to investigate the general inference problems

of estimation and hypothesis testing for these distri-

butions.

However, before attempting to solve the multivariate

problem, it was thought best to investigate the univariate

case. A univariate response of this type would consist

of a single continuous proportion, and it is assumed










that such a response will follow a beta distribution.

The problem then is to estimate and to conduct tests

of hypothesis about the parameters and mean of such a

beta distribution.

Chapter 3 deals with methods of estimation for the

beta distribution. Two sets of estimators, the moment

estimators and the maximum likelihood estimators, are

given for the parameters of the beta distribution. For

the mean of the beta distribution, a geometric mean

estimator is given in addition to the moment and maximum

likelihood estimators. Comparisons among the estimators

are made in terms of their biases, expected mean squares,

and variances.

In Chapter 4 various tests of hypothesis are

constructed for the parameters and mean of the beta

distribution. Both one-sided and two-sided hypotheses

are considered and uniformly most powerful or uniformly

most powerful unbiased tests are given. To obtain the

critical values for such tests two methods of approx-

imation, a normal approximation and a beta approximation,

are given. Where it is possible, comparisons are made

between the two methods.

Finally, Chapter 5 deals with estimation for the

Dirichlet distribution. Methods for obtaining both

the moment and the maximum likelihood estimators for

the parameters of the Dirichlet distribution are given.










As was noted earlier, this is just a beginning on the

original problem. It remains to be determined if the

properties of the estimators for the parameters and

mean of the beta distribution also hold for the

parameters and mean of the Dirichlet distribution.

Also, the problem of developing tests of hypothesis for

the Dirichlet distribution is still to be investigated.

Finally, the question of the usefulness of the

generalization of the Dirichlet distribution needs to

be answered.















CHAPTER 1

INTRODUCTION


In many experiments in various disciplines, an

analysis is done on the constituent parts of one mea-

surement. Generally, these constituents are expressed

in terms of the percentage or the proportion which each

one makes of the entire measurement. For example, after

some treatments have been applied to a set of experi-

mental units, a chemical analysis might be performed

to determine if the treatments have had an effect on

the composition of the material. From such an experiment

one might measure the total protein content, and then

divide that total protein into three or more types

of protein.

Specifically, consider an experiment in which a

measurement is made of the chemical composition of

shrimp. Measurements are made of the variables

% solid, % water, total extractable nitrogen, total

extractable protein, and many others. The % solid is

further divided into % fat, % protein, % ash, and

% carbohydrates. The treatments to be applied to the

shrimp consist of two sets. Some of the shrimp will be

stored on ice for periods of zero, seven, and fourteen








days. The remaining shrimp will be divided into five

batches and each batch will be cooked by a different

cooking method. The experimenter wishes to know if the

two sets of treatments, the storing times and the cooking

methods, have any effects on the composition of the shrimp.

At present, the most frequently used method of

analyzing such data is to consider each component of

the chemical analysis separately. Techniques, such as

analysis of variance, are applied to determine if

differences exist among the treatments. In some experi-

ments, this component by component analysis may be what

the experimenter wants. However, it may also be that the

experimenter would like an overall measure of differences

between the treatments, rather than the analysis for

each component separately. For example, considering

the shrimp data, the experimenter may wish to know if

the breakdown of % solid into its four constituents

is the same for all the treatments. If the separate

analyses result in the conclusion that the treatments

are different for some constituents but not for others,

it may not be clear whether the treatments are different

overall.

One possible solution to this problem would be to

create a vector of the constituents, assume the vector

has a multivariate normal distribution, and use well-

known results of multivariate analysis to analyze the

data. However, the assumption of multivariate normality








may not be appropriate. In the shrimp data, the per-

centages of the four constituents, which make up the

total percentage of solids, must themselves add to the

total percentage of solids. If one forms a vector of

the constituents, using either the percentages themselves

or the proportion that each percentage takes of the

total percentage of solids, then the elements of that

vector are subject to a constraint. Thus, a better

approach may be to try to determine the distribution

of such a vector, and through that distribution, make

inferences about the population from which that vector

has come.

The problem, then, is one of inference in general.

Although the ultimate goal is to make inferences about

a vector of continuous proportions using the Dirichlet

or generalized Dirichlet distribution, this dissertation

deals for the most part with inferences about a single

continuous proportion using the beta distribution.

Some results dealing with the Dirichlet distribution

are given in Chapter 5.














CHAPTER 2

REVIEW OF THE LITERATURE


A distribution which naturally presents itself to

this problem is the Dirichlet distribution. The deri-

vation of the Dirichlet distribution is straightforward

and is discussed in Hogg and Craig (1965). Let

X1,...,Xk+1 be mutually stochastically independent

random variables each having a gamma distribution with

parameters $\alpha_i$ and $\beta = 1$.  The joint distribution of
$X_1,\ldots,X_{k+1}$ is then

$$ f(X_1,\ldots,X_{k+1}) = \prod_{i=1}^{k+1} \frac{X_i^{\alpha_i-1}\,e^{-X_i}}{\Gamma(\alpha_i)}, \qquad 0 < X_i < \infty. \qquad (2.1) $$

Let

$$ Y_i = \frac{X_i}{X_1+\cdots+X_{k+1}}, \quad i=1,2,\ldots,k, \qquad Y_{k+1} = X_1+\cdots+X_{k+1}. \qquad (2.2) $$

Then

$$ X_1 = Y_1 Y_{k+1}, \;\ldots,\; X_k = Y_k Y_{k+1}, \qquad X_{k+1} = Y_{k+1}(1-Y_1-Y_2-\cdots-Y_k). \qquad (2.3) $$









To obtain the density of $Y_1,\ldots,Y_{k+1}$, the Jacobian of the above transformation from the $X_i$'s into the $Y_i$'s is

$$ J = \begin{vmatrix} Y_{k+1} & 0 & \cdots & 0 & Y_1 \\ 0 & Y_{k+1} & \cdots & 0 & Y_2 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & Y_{k+1} & Y_k \\ -Y_{k+1} & -Y_{k+1} & \cdots & -Y_{k+1} & 1-Y_1-\cdots-Y_k \end{vmatrix}. $$

By expanding the determinant about the last column, and then expanding the determinant in the first term about the first column, the determinant in the second term about the second column, and so forth, one obtains, for k an even integer,

$$ J = Y_1\bigl[-(-Y_{k+1})(Y_{k+1})^{k-1}\bigr] - Y_2\bigl[(-Y_{k+1})(Y_{k+1})^{k-1}\bigr] + Y_3\bigl[-(-Y_{k+1})(Y_{k+1})^{k-1}\bigr] - \cdots + (1-Y_1-Y_2-\cdots-Y_k)\,Y_{k+1}^{k} = Y_{k+1}^{k} \qquad (2.4) $$

for k even.  Similarly, if k is odd, $J = Y_{k+1}^{k}$.  Then



$$ f(Y_1,\ldots,Y_{k+1}) = (Y_1 Y_{k+1})^{\alpha_1-1}\cdots(Y_k Y_{k+1})^{\alpha_k-1}\,\bigl[Y_{k+1}(1-Y_1-\cdots-Y_k)\bigr]^{\alpha_{k+1}-1}\,e^{-Y_{k+1}}\,Y_{k+1}^{\,k}\Big/\prod_{i=1}^{k+1}\Gamma(\alpha_i) $$

$$ = \prod_{i=1}^{k}Y_i^{\alpha_i-1}\;\Bigl(1-\sum_{i=1}^{k}Y_i\Bigr)^{\alpha_{k+1}-1}\;Y_{k+1}^{\,\alpha_1+\cdots+\alpha_{k+1}-1}\,e^{-Y_{k+1}}\Big/\prod_{i=1}^{k+1}\Gamma(\alpha_i). \qquad (2.5) $$

Thus, integrating out $Y_{k+1}$,

$$ f(Y_1,\ldots,Y_k) = \prod_{i=1}^{k}Y_i^{\alpha_i-1}\;\Bigl(1-\sum_{i=1}^{k}Y_i\Bigr)^{\alpha_{k+1}-1} \int_0^{\infty} Y_{k+1}^{\,\sum_{i=1}^{k+1}\alpha_i-1}\,e^{-Y_{k+1}}\,dY_{k+1}\Big/\prod_{i=1}^{k+1}\Gamma(\alpha_i) $$

$$ = \frac{\Gamma\!\left(\sum_{i=1}^{k+1}\alpha_i\right)}{\prod_{i=1}^{k+1}\Gamma(\alpha_i)}\;\prod_{i=1}^{k}Y_i^{\alpha_i-1}\;\Bigl(1-\sum_{i=1}^{k}Y_i\Bigr)^{\alpha_{k+1}-1}, \qquad 0 < Y_i, \quad \sum_{i=1}^{k}Y_i < 1. \qquad (2.6) $$


From an inspection of the Dirichlet distribution,

it is easily seen that this distribution has some of

the properties desired in a distribution for a vector

of continuous proportions. First, it is a distribution

on k continuous random variables. Second, each of these

random variables is defined on the interval (0,1), so that

the Y.'s are in the form of proportions. Third, if the

Dirichlet density is written in a slightly different

form, namely, letting $P_i = Y_i$ for $i=1,\ldots,k$ and

$P_{k+1} = 1-Y_1-\cdots-Y_k$, the density is

$$ f(P_1,P_2,\ldots,P_{k+1}) = \frac{\Gamma\!\left(\sum_{i=1}^{k+1}\alpha_i\right)}{\prod_{i=1}^{k+1}\Gamma(\alpha_i)}\;\prod_{i=1}^{k+1} P_i^{\,\alpha_i-1}. \qquad (2.7) $$








Now, the P.'s have the property that they sum to one.

Thus, the Dirichlet distribution seems to be a good

candidate for a distribution for a vector of continuous

proportions.
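The gamma-to-Dirichlet construction just derived is easy to check numerically.  The short sketch below is illustrative only (it is not part of the original dissertation and assumes NumPy is available): it draws independent gamma variates with β = 1, forms the proportions $Y_i$, and compares the simulated means with the Dirichlet means $\alpha_i/\sum_j\alpha_j$.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([2.0, 3.0, 5.0])          # alpha_1, ..., alpha_{k+1} (illustrative values)

# Construction used in the derivation: independent gammas with unit scale,
# normalized by their sum.
X = rng.gamma(shape=alpha, scale=1.0, size=(100000, alpha.size))
Y = X / X.sum(axis=1, keepdims=True)       # Dirichlet(alpha) proportions

# The Dirichlet means E(Y_i) = alpha_i / sum(alpha) give a quick check.
print(Y.mean(axis=0))                      # simulated means
print(alpha / alpha.sum())                 # theoretical means
```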

A second distribution for a vector of continuous

proportions is a generalization of the Dirichlet distri-

bution that was proposed by Connor and Mosimann (1969).

This generalization is based on a "neutrality" concept

defined by Connor and Mosimann. Consider a set

$(P_1,\ldots,P_k)$ of nonnegative continuous random variables

satisfying the constraint that the $P_i$'s sum to one.  Let

$$ S_j = \sum_{i=1}^{j} P_i, \quad j=1,\ldots,k, \quad \text{with } S_0 = 0, \qquad (2.8) $$

$$ Z_i = \frac{P_i}{1-S_{i-1}}, \quad i=2,\ldots,k-1, \quad \text{with } Z_1 = P_1 \text{ and } Z_k = 1, \qquad (2.9) $$

$$ P = (P_1,\ldots,P_k) \quad \text{with} \quad P_{j1} = (P_1,\ldots,P_j) \quad \text{and} \quad P_{j2} = (P_{j+1},\ldots,P_k), \qquad (2.10) $$

$$ W_j = \frac{1}{1-S_j}\,P_{j2}. \qquad (2.11) $$




The concept of a neutral proportion is then defined

as follows:

Definition 2.1:  Given a random vector of proportions,

$(P_1,\ldots,P_k)$, the proportion $P_1$ is said to be

neutral if $P_1$ is independent of the vector

$\bigl(P_2/(1-P_1),\,P_3/(1-P_1),\,\ldots,\,P_k/(1-P_1)\bigr)$.

Intuitively, this definition states that "P1 does not

influence (i.e. is neutral to) the manner in which

the remaining proportions $P_2,\ldots,P_k$ proportionally

divide the remainder of the unit interval; namely the

interval $(P_1,1)$."  The concept of neutrality is also

defined for vectors.

Definition 2.2:  Given P divided so that $P = (P_{j1},P_{j2})$,

$P_{j1}$ is a neutral vector if it is independent

of $W_j$.  If $P_{j1}$ is neutral for all j, then P

is said to be completely neutral.

An important point to note here is that the order of the

$P_i$'s in the vector P is critical.  While $P_1$ may be neutral

in the vector $(P_1,P_2,\ldots,P_k)$, $P_2$ need not be neutral in

$(P_2,P_1,P_3,\ldots,P_k)$.  A similar relationship holds for
neutral vectors.

Connor and Mosimann next state and outline the

proofs of three theorems relating these concepts of

neutrality to the random variables, Zi, i=l,...,k

defined previously.

Theorem 2.1:  If $P_{j1}$ is neutral for $j=1,2,\ldots,r$,

then the random variables $Z_1, Z_2, \ldots, Z_r$ are

mutually independent.

For the natural extension of this theorem to a completely

neutral vector a stronger result may be obtained.








Theorem 2.2: P is completely neutral if and only if

$Z_1, Z_2, \ldots, Z_k$ are mutually independent.
Finally, from this theorem a third result is obtained.

Theorem 2.3: P is completely neutral if and only if

$P_{j1}$ is independent of $Z_{j+1}$ for all j.
With the concept of neutrality thus defined and related

to the random variables Zi, it is possible to derive

a generalization of the Dirichlet distribution. Assume

that P is completely neutral. This implies that the

Zi's are mutually independent. Assume the density of

Zi is the univariate beta density,


$$ f(Z_i) = \bigl[B(a_i,b_i)\bigr]^{-1}\,Z_i^{\,a_i-1}(1-Z_i)^{\,b_i-1}, \qquad (2.12) $$

where $a_i>0$, $b_i>0$, and $B(a_i,b_i) = \Gamma(a_i)\Gamma(b_i)/\Gamma(a_i+b_i)$

is the beta function.  Then

$$ f(Z_1,\ldots,Z_{k-1}) = \prod_{i=1}^{k-1}\bigl[B(a_i,b_i)\bigr]^{-1}\,Z_i^{\,a_i-1}(1-Z_i)^{\,b_i-1}. \qquad (2.13) $$

Now, the $Z_i$'s can be transformed to the $P_i$'s, since

$Z_i = P_i/(1-P_1-\cdots-P_{i-1})$.  The Jacobian of the trans-

formation is lower triangular, with diagonal elements









$$ 1, \quad \frac{1}{1-P_1}, \quad \frac{1}{1-P_1-P_2}, \quad \frac{1}{1-P_1-P_2-P_3}, \quad \ldots, \quad \frac{1}{1-\sum_{j=1}^{k-2}P_j} = \frac{1}{1-S_{k-2}}, $$

so that

$$ J = \prod_{m=1}^{k-1}\frac{1}{1-S_{m-1}}. $$

Thus,

$$ f(P_1,\ldots,P_{k-1}) = \prod_{i=1}^{k-1}\bigl[B(a_i,b_i)\bigr]^{-1}\; P_1^{\,a_1-1}\Bigl(\frac{P_2}{1-P_1}\Bigr)^{a_2-1}\cdots\Bigl(\frac{P_{k-1}}{1-P_1-\cdots-P_{k-2}}\Bigr)^{a_{k-1}-1} (1-P_1)^{b_1-1}\Bigl(1-\frac{P_2}{1-P_1}\Bigr)^{b_2-1}\cdots\Bigl(1-\frac{P_{k-1}}{1-P_1-\cdots-P_{k-2}}\Bigr)^{b_{k-1}-1}\;\prod_{m=1}^{k-1}\frac{1}{1-S_{m-1}}\,. \qquad (2.14) $$

Collecting terms,

$$ f(P_1,\ldots,P_{k-1}) = \prod_{i=1}^{k-1}\bigl[B(a_i,b_i)\bigr]^{-1}\; P_k^{\,b_{k-1}-1}\;\prod_{i=1}^{k-1} P_i^{\,a_i-1}\Bigl(\sum_{j=i}^{k}P_j\Bigr)^{b_{i-1}-(a_i+b_i)}, \qquad (2.15) $$

where

$$ \sum_{i=1}^{k}P_i = 1, \qquad P_k = 1-\sum_{i=1}^{k-1}P_i, \qquad \text{and } b_0 \text{ is arbitrary.} $$



In the special case when $b_{i-1} = a_i+b_i$, $i=2,\ldots,k-1$, this

generalization of the Dirichlet distribution reduces

to the Dirichlet distribution itself. Thus, it is seen

that the distribution just derived is indeed a general-

ization of the Dirichlet distribution.
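The derivation above also gives a direct way to simulate from the generalized Dirichlet distribution: draw independent beta variates $Z_i$ with parameters $(a_i,b_i)$ and set $P_i = Z_i(1-S_{i-1})$.  The following sketch is illustrative only (it assumes NumPy; the parameter values are arbitrary and not taken from the dissertation):

```python
import numpy as np

def sample_generalized_dirichlet(a, b, n, rng):
    """Draw n vectors (P_1,...,P_k) from the Connor-Mosimann generalized
    Dirichlet distribution with parameters a_1..a_{k-1}, b_1..b_{k-1}."""
    a, b = np.asarray(a), np.asarray(b)
    k = a.size + 1
    Z = rng.beta(a, b, size=(n, a.size))      # independent beta Z_i's
    P = np.empty((n, k))
    remaining = np.ones(n)                    # 1 - S_{i-1}
    for i in range(k - 1):
        P[:, i] = Z[:, i] * remaining
        remaining = remaining - P[:, i]
    P[:, k - 1] = remaining                   # P_k = 1 - sum of the others
    return P

rng = np.random.default_rng(1)
P = sample_generalized_dirichlet([2.0, 1.5], [3.0, 2.0], 50000, rng)
print(P.mean(axis=0), P.sum(axis=1).max())    # proportions sum to one
```

When $b_{i-1} = a_i + b_i$ the same routine generates ordinary Dirichlet vectors, in line with the special case noted above.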








Some interesting properties of this distribution

in relation to the Dirichlet distribution are:

1. Since the Dirichlet distribution is a special

case, a vector of proportions following a

Dirichlet distribution is completely neutral.

2. From the symmetry of the Dirichlet distribution,

if P has a Dirichlet distribution then any

permutation of P is necessarily completely

neutral. But, for P to follow the generalized

Dirichlet distribution, only one permutation of

P need be completely neutral.

From these results, it is quite evident why the general-

ized Dirichlet distribution may more easily fit a vector

of continuous proportions than the Dirichlet distribution.

If there exists one completely neutral vector among

the permutations of P, and if it may be assumed that the

Z.'s have univariate beta distributions, then P has a

generalized Dirichlet distribution. Furthermore, to

rule out the Dirichlet distribution, it is necessary

to find only one permutation of P which is not

completely neutral.

The final section of Connor and Mosimann's paper

deals with two examples, one of which will be considered

briefly. The data for this example come from a horny

covering, called a scute, on the underside of a particular

variety of Mexican turtle. The undershell is divided

into two sections, an anterior and a posterior section.








The anterior section is covered by five scutes, one

gular scute and a pair of humeral and pectoral scutes.

Measurements are taken along the midline of the length

of the gular, humeral, and pectoral scutes. If these

lengths are denoted by Y1,Y2, and Y3 respectively,

while the total length is L, then the proportion of the

total length taken by each scute may be formed by

Pi = Yi/L for i=1,2,3. These types of data are used by

taxonomists to distinguish between different populations

of turtles.

Through a study of the correlations of $P_1$ with

$P_2/(1-P_1)$, $P_2$ with $P_1/(1-P_2)$, and $P_3$ with $P_1/(1-P_3)$, it was

found that the vector P = (P1,P2,P3) is completely

neutral. For other orderings of P, it was found that

the vector is not neutral, a larger pectoral or humeral

scute favoring the gular scute to occupy proportionately

more of the remaining interval. However, there is a

problem which Connor and Mosimann themselves point out.

"In these analyses the correlation coefficient is used

as a measure of dependence or nonneutrality. Significant

non-zero correlations, tested by Fisher's Z transformation,

are taken as evidence of nonneutrality. On the other

hand, even if the population correlations are zero,

neutrality does not necessarily follow with our non-

normal data." In other. words, a vector which is claimed

to be neutral, need not be neutral at all. Thus, some

better measure of neutrality would be very useful.
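As an illustration of the correlation check quoted above, the following sketch computes the correlation of $P_1$ with $P_2/(1-P_1)$ and applies Fisher's Z transformation.  It is illustrative only: it assumes SciPy and uses simulated Dirichlet proportions in place of the original scute data, and, as the quotation warns, a near-zero correlation does not by itself establish neutrality.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Simulated proportions standing in for the scute data (three parts summing to one).
P = rng.dirichlet([4.0, 3.0, 2.0], size=200)
p1, ratio = P[:, 0], P[:, 1] / (1.0 - P[:, 0])

r = np.corrcoef(p1, ratio)[0, 1]              # sample correlation
z = np.arctanh(r)                             # Fisher's Z transformation
se = 1.0 / np.sqrt(len(p1) - 3)
p_value = 2 * stats.norm.sf(abs(z) / se)      # two-sided test of zero correlation
print(r, p_value)
```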








Finally, since $P = (P_1,P_2,P_3)$ is a completely

neutral vector and no other ordering of P yields a

completely neutral vector, then the generalized

Dirichlet distribution may be considered for P, while

the Dirichlet distribution may not be considered.

If it is assumed that Z1 and Z2 have beta distributions,

then P has a generalized Dirichlet distribution.

Thus, at this point, it seemed reasonable to

consider only the generalized Dirichlet distribution

for a vector of continuous proportions, since the

Dirichlet distribution is a special case of the

generalized Dirichlet distribution. However, a recent

paper by James (1972) would seem to restrict the

usefulness of the generalized Dirichlet distribution.

The main thrust of James's paper is that if certain

ratios of the $P_i$'s, other than the $Z_i$'s, are assumed to have
beta distributions, then the generalized Dirichlet

distribution reduces to the Dirichlet distribution.

As an example, a theorem from James's paper states,

Theorem 2.4:  Let $(P_1,\ldots,P_k)$ have the generalized

Dirichlet density and suppose that each of

the variables $U_i = P_i\bigl/\bigl(1-\sum P_j\bigr)$, $i=1,\ldots,n-1$,

is marginally beta.  Then $b_i = a_{i+1}+b_{i+1}$

for all $i = 1,\ldots,n-1$, and consequently

$(P_1,\ldots,P_k)$ has the Dirichlet density.

Thus, the rather simple assumption that $U_i$ has a beta

distribution reduces the generalized Dirichlet distribution








to the Dirichlet distribution. There is apparently

then a dilemma. On the one hand, the practical example

of the turtle scutes leads to the conclusion that the

generalized Dirichlet distribution is of value, while

James's theoretical considerations imply that it has

little value.

It seems then, that as well as the general inference

problems of estimation and testing, there are the

problems of a measure of neutrality for a neutral

vector and of determining whether the Dirichlet

distribution or the generalized Dirichlet distribution

better fits a vector of continuous proportions. Although

these are important problems, their solution will be

left for future consideration.

This, then, is the problem and the work which has

been done on it, as set forth at the beginning of this

dissertation. It became evident as the research pro-

gressed, that the problem was quite broad. Thus, the

results which follow are a beginning toward utilization

of the Dirichlet distribution for analyzing a vector of

continuous proportions. Most results deal not with the

Dirichlet distribution, but rather with the beta distri-

bution. Since the Dirichlet and generalized Dirichlet

distributions are so closely related to the beta distri-

bution, and since work on estimation of and testing

hypotheses about the parameters of the beta distribution

has been rather slight, it was thought to be a good








starting place for work on the entire problem. As is

pointed out in Chapters 3 and 4, the chapters dealing

with inference for the beta distribution, some results

are directly applicable and others may generalize

rather straightforwardly to the Dirichlet and generalized

Dirichlet distributions. Chapter 5 gives the results

thus far obtained for the Dirichlet distribution.

However, some questions still remain unanswered and

will provide the basis for further research.















CHAPTER 3

ESTIMATION FOR THE BETA DISTRIBUTION


Introduction


If the Dirichlet or generalized Dirichlet distri-

bution is to be used as a model for a vector of contin-

uous proportions, it will be necessary to have a means

of estimating the parameters of these distributions.

For the generalized Dirichlet distribution, the parameters

may be estimated through the Z.'s which have univariate
1
beta distributions. This might also be done for the

Dirichlet distribution since it is a special case of the

generalized Dirichlet distribution. However, the parameters

of the Dirichlet distribution may also be estimated

directly. Since the Dirichlet distribution is a multi-

variate beta distribution, it is quite evident that

similar problems will be encountered for the Dirichlet

distribution as for the beta distribution in any method

of estimation. For example, in maximum likelihood

estimation, the likelihood equations for the beta

distribution are


$$ \Psi(\hat\alpha) - \Psi(\hat\alpha+\hat\beta) = \frac{\sum_{i=1}^{n}\ln X_i}{n} \qquad (3.1) $$

and

$$ \Psi(\hat\beta) - \Psi(\hat\alpha+\hat\beta) = \frac{\sum_{i=1}^{n}\ln(1-X_i)}{n}, \qquad (3.2) $$

where $\Psi$ is the digamma function.

For the Dirichlet distribution, the equations are similar:

$$ \Psi(\hat\alpha_j) - \Psi\Bigl(\sum_{j=1}^{k+1}\hat\alpha_j\Bigr) = \frac{\sum_{i=1}^{n}\ln P_{ij}}{n}, \qquad j=1,\ldots,k+1. \qquad (3.3) $$



Thus, the solution of these equations is an extension of

the solution of the equations for the beta distribution,

and will be given in Chapter 5. For this reason, a study

was made of possible estimators for the parameters of

the beta distribution, and the properties of those

estimators. Specifically, exact formulas or approximations,

where exact formulas were not obtainable, were found for

the biases, variances, and covariance of the estimators.

In addition to estimating the parameters, estimators for

the mean of the beta distribution have also been considered.

This was done because such an estimator would have more

practical significance to an experimenter than estimators

of the parameters themselves.


Sample Moment Estimators


The first estimators considered were the sample

moment estimators. Let M1 denote the first sample moment,







$\sum_{i=1}^{n}X_i/n$, and $M_2$ denote the second sample moment, $\sum_{i=1}^{n}X_i^2/n$.
Using the method of moments technique of estimation,

these sample moments are equated to the corresponding

population moments in terms of the parameters of the

distribution. The equations obtained are then solved

for the estimators of the parameters.

$$ M_1 = \frac{\hat\alpha}{\hat\alpha+\hat\beta} \qquad \text{and} \qquad M_2 = \frac{\hat\alpha(\hat\alpha+1)}{(\hat\alpha+\hat\beta)(\hat\alpha+\hat\beta+1)}. \qquad (3.4) $$

From the first equation,

$$ \hat\beta = \frac{\hat\alpha(1-M_1)}{M_1}. \qquad (3.5) $$

Substituting into the second equation,

$$ M_2 = \frac{M_1(\hat\alpha+1)}{\hat\alpha + \hat\alpha(1-M_1)/M_1 + 1}\,; \qquad (3.6) $$

$$ \hat\alpha M_2 - \hat\alpha M_1^2 = M_1^2 - M_1 M_2. \qquad (3.7) $$

Thus,

$$ \hat\alpha = \frac{M_1(M_1-M_2)}{M_2-M_1^2} \qquad \text{and} \qquad \hat\beta = \frac{(1-M_1)(M_1-M_2)}{M_2-M_1^2}. \qquad (3.8) $$








These, then, are the moment estimators of the parameters

of the beta distribution. The moment estimator of the

mean of the beta distribution is simply the first

sample moment, M1.
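A direct transcription of (3.8) into code (an illustrative sketch, assuming NumPy; the function name is mine, not the author's):

```python
import numpy as np

def beta_moment_estimators(x):
    """Sample moment estimators (3.8) for the beta parameters,
    together with M1 as the moment estimator of the mean."""
    x = np.asarray(x)
    m1 = np.mean(x)                # first sample moment
    m2 = np.mean(x ** 2)           # second sample moment
    denom = m2 - m1 ** 2
    alpha_hat = m1 * (m1 - m2) / denom
    beta_hat = (1.0 - m1) * (m1 - m2) / denom
    return alpha_hat, beta_hat, m1

rng = np.random.default_rng(3)
x = rng.beta(2.0, 5.0, size=200)
print(beta_moment_estimators(x))
```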

Since the estimators of α and β are ratios of

functions of the sample moments, their expected values,

variances, and covariance are not easily obtainable.

Therefore, first order approximations for the biases,

variances, and covariance of the estimators were found.

Let

$$ g(M_1,M_2) = \frac{M_1(M_1-M_2)}{M_2-M_1^2} = \hat\alpha \qquad (3.9) $$

and

$$ h(M_1,M_2) = \frac{(1-M_1)(M_1-M_2)}{M_2-M_1^2} = \hat\beta. \qquad (3.10) $$


Then, by expanding these functions in a Taylor series

and approximating them by the first order terms of the

series,

$$ \hat\alpha = g(M_1,M_2) \doteq g\bigl(E(M_1),E(M_2)\bigr) + \bigl(M_1-E(M_1)\bigr)\,\frac{\partial g(M_1,M_2)}{\partial M_1}\bigg|_{E(M_1),E(M_2)} + \bigl(M_2-E(M_2)\bigr)\,\frac{\partial g(M_1,M_2)}{\partial M_2}\bigg|_{E(M_1),E(M_2)} \qquad (3.11) $$

and

$$ \hat\beta = h(M_1,M_2) \doteq h\bigl(E(M_1),E(M_2)\bigr) + \bigl(M_1-E(M_1)\bigr)\,\frac{\partial h(M_1,M_2)}{\partial M_1}\bigg|_{E(M_1),E(M_2)} + \bigl(M_2-E(M_2)\bigr)\,\frac{\partial h(M_1,M_2)}{\partial M_2}\bigg|_{E(M_1),E(M_2)}. \qquad (3.12) $$

Thus, first order approximations to the biases, variances,

and covariance of the estimators of α and β may be found:

$$ E(\hat\alpha-\alpha) \doteq E\bigl(M_1-E(M_1)\bigr)\,\frac{\partial g(M_1,M_2)}{\partial M_1}\bigg|_{E(M_1),E(M_2)} + E\bigl(M_2-E(M_2)\bigr)\,\frac{\partial g(M_1,M_2)}{\partial M_2}\bigg|_{E(M_1),E(M_2)}, \qquad (3.13) $$

$$ E(\hat\beta-\beta) \doteq E\bigl(M_1-E(M_1)\bigr)\,\frac{\partial h(M_1,M_2)}{\partial M_1}\bigg|_{E(M_1),E(M_2)} + E\bigl(M_2-E(M_2)\bigr)\,\frac{\partial h(M_1,M_2)}{\partial M_2}\bigg|_{E(M_1),E(M_2)}. \qquad (3.14) $$


But the partial derivatives of $g(M_1,M_2)$ evaluated at $E(M_1)$

and $E(M_2)$ are constants when the expectation is taken, and,

since $E\bigl(M_1-E(M_1)\bigr) = E\bigl(M_2-E(M_2)\bigr) = 0$, these terms drop out of

$E(\hat\alpha-\alpha)$.  Therefore, to a first order approximation the estimators of the parameters

are unbiased.

First order approximations to the variances and

covariance of the estimators are found similarly:

$$ \mathrm{Var}(\hat\alpha) \doteq \mathrm{Var}(M_1)\left[\frac{\partial g(M_1,M_2)}{\partial M_1}\right]^2 + \mathrm{Var}(M_2)\left[\frac{\partial g(M_1,M_2)}{\partial M_2}\right]^2 + 2\,\mathrm{Cov}(M_1,M_2)\left[\frac{\partial g(M_1,M_2)}{\partial M_1}\right]\left[\frac{\partial g(M_1,M_2)}{\partial M_2}\right], \qquad (3.15) $$

$$ \mathrm{Var}(\hat\beta) \doteq \mathrm{Var}(M_1)\left[\frac{\partial h(M_1,M_2)}{\partial M_1}\right]^2 + \mathrm{Var}(M_2)\left[\frac{\partial h(M_1,M_2)}{\partial M_2}\right]^2 + 2\,\mathrm{Cov}(M_1,M_2)\left[\frac{\partial h(M_1,M_2)}{\partial M_1}\right]\left[\frac{\partial h(M_1,M_2)}{\partial M_2}\right], \qquad (3.16) $$

and

$$ \mathrm{Cov}(\hat\alpha,\hat\beta) \doteq \mathrm{Var}(M_1)\,\frac{\partial g(M_1,M_2)}{\partial M_1}\,\frac{\partial h(M_1,M_2)}{\partial M_1} + \mathrm{Var}(M_2)\,\frac{\partial g(M_1,M_2)}{\partial M_2}\,\frac{\partial h(M_1,M_2)}{\partial M_2} + \mathrm{Cov}(M_1,M_2)\left[\frac{\partial g(M_1,M_2)}{\partial M_1}\,\frac{\partial h(M_1,M_2)}{\partial M_2} + \frac{\partial g(M_1,M_2)}{\partial M_2}\,\frac{\partial h(M_1,M_2)}{\partial M_1}\right], \qquad (3.17) $$

where all the partial derivatives are evaluated at $E(M_1)$, $E(M_2)$.

Expressions for $\mathrm{Var}(M_1)$, $\mathrm{Var}(M_2)$, $\mathrm{Cov}(M_1,M_2)$, and the

partial derivatives evaluated at $E(M_1)$, $E(M_2)$ were then

found in terms of α and β:

$$ \mathrm{Var}(M_1) = \frac{1}{n}\left[\frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)} - \frac{\alpha^2}{(\alpha+\beta)^2}\right], \qquad (3.18) $$

$$ \mathrm{Var}(M_2) = \frac{1}{n}\left[\frac{\alpha(\alpha+1)(\alpha+2)(\alpha+3)}{(\alpha+\beta)(\alpha+\beta+1)(\alpha+\beta+2)(\alpha+\beta+3)} - \frac{\alpha^2(\alpha+1)^2}{(\alpha+\beta)^2(\alpha+\beta+1)^2}\right], \qquad (3.19) $$

$$ \mathrm{Cov}(M_1,M_2) = \frac{1}{n}\left[\frac{\alpha(\alpha+1)(\alpha+2)}{(\alpha+\beta)(\alpha+\beta+1)(\alpha+\beta+2)} - \frac{\alpha^2(\alpha+1)}{(\alpha+\beta)^2(\alpha+\beta+1)}\right], \qquad (3.20) $$

$$ \frac{\partial g(M_1,M_2)}{\partial M_1}\bigg|_{E(M_1),E(M_2)} = \frac{M_2(2M_1-M_2-M_1^2)}{(M_2-M_1^2)^2}\bigg|_{E(M_1),E(M_2)}, \qquad (3.21) $$

$$ \frac{\partial g(M_1,M_2)}{\partial M_2}\bigg|_{E(M_1),E(M_2)} = \frac{M_1^2(M_1-1)}{(M_2-M_1^2)^2}\bigg|_{E(M_1),E(M_2)}, \qquad (3.22) $$

$$ \frac{\partial h(M_1,M_2)}{\partial M_1}\bigg|_{E(M_1),E(M_2)} = \frac{M_2(1+M_2-4M_1)+M_1^2(1+M_2)}{(M_2-M_1^2)^2}\bigg|_{E(M_1),E(M_2)}, \qquad (3.23) $$

and

$$ \frac{\partial h(M_1,M_2)}{\partial M_2}\bigg|_{E(M_1),E(M_2)} = \frac{M_1(2M_1-M_1^2-1)}{(M_2-M_1^2)^2}\bigg|_{E(M_1),E(M_2)}. \qquad (3.24) $$

Since $E(M_1) = \alpha/(\alpha+\beta)$ and $E(M_2) = \alpha(\alpha+1)/[(\alpha+\beta)(\alpha+\beta+1)]$,

these partial derivatives are then expressed in terms

of α and β.  At this point a computer program was used

to evaluate $\mathrm{Var}(\hat\alpha)$, $\mathrm{Var}(\hat\beta)$, and $\mathrm{Cov}(\hat\alpha,\hat\beta)$ for various

values of α and β.  The results of those computations

are shown in Table 1.
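The first order approximations (3.15) through (3.24) are straightforward to evaluate numerically.  The sketch below is an illustrative reimplementation of the kind of computation behind Table 1 (the original program is not reproduced in the dissertation); it simply codes the expressions above.

```python
def moment_estimator_covariance(alpha, beta, n):
    """First order approximations (3.15)-(3.17) to Var(alpha-hat),
    Var(beta-hat), and Cov(alpha-hat, beta-hat) for the moment estimators."""
    s = alpha + beta
    e1 = alpha / s                                       # E(M1)
    e2 = alpha * (alpha + 1) / (s * (s + 1))             # E(M2)
    e3 = alpha * (alpha + 1) * (alpha + 2) / (s * (s + 1) * (s + 2))
    e4 = alpha * (alpha + 1) * (alpha + 2) * (alpha + 3) / (s * (s + 1) * (s + 2) * (s + 3))

    v1 = (e2 - e1 ** 2) / n                              # Var(M1), (3.18)
    v2 = (e4 - e2 ** 2) / n                              # Var(M2), (3.19)
    c12 = (e3 - e1 * e2) / n                             # Cov(M1,M2), (3.20)

    d = (e2 - e1 ** 2) ** 2
    g1 = e2 * (2 * e1 - e2 - e1 ** 2) / d                # dg/dM1, (3.21)
    g2 = e1 ** 2 * (e1 - 1) / d                          # dg/dM2, (3.22)
    h1 = (e2 * (1 + e2 - 4 * e1) + e1 ** 2 * (1 + e2)) / d   # dh/dM1, (3.23)
    h2 = e1 * (2 * e1 - e1 ** 2 - 1) / d                 # dh/dM2, (3.24)

    var_a = v1 * g1 ** 2 + v2 * g2 ** 2 + 2 * c12 * g1 * g2
    var_b = v1 * h1 ** 2 + v2 * h2 ** 2 + 2 * c12 * h1 * h2
    cov_ab = v1 * g1 * h1 + v2 * g2 * h2 + c12 * (g1 * h2 + g2 * h1)
    return var_a, var_b, cov_ab

print(moment_estimator_covariance(2.0, 5.0, 100))
```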








The estimator of the mean of the beta distribution,

$M_1$, is an unbiased estimator, and an exact formula for

its variance exists:

$$ \mathrm{Var}(M_1) = \frac{1}{n}\,\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}. \qquad (3.25) $$

The variance of the estimator of the mean for various

combinations of α and β is also contained in Table 1.


Maximum Likelihood Estimators


Derivation of the Estimators

A second natural set of estimators to consider are

the maximum likelihood estimators. Unfortunately, the

likelihood equations for the beta distribution do not

yield simple solutions for the estimators. The problem

of obtaining the maximum likelihood estimators for the

beta distribution has been investigated in a paper by

Gnanadesikan, Pinkham, and Hughes (1967). The main thrust

of Gnanadesikan, Pinkham, and Hughes's paper is to

investigate the effect of using all or some of the order

statistics from a beta distribution to obtain maximum

likelihood estimators for the parameters. Given a sample

X,...,Xn from a beta distribution with parameters


α and β, so that

$$ f(X_i) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,X_i^{\alpha-1}(1-X_i)^{\beta-1}, \qquad 0 < X_i < 1, \qquad (3.26) $$

$$ L(X_1,\ldots,X_n) = \left[\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\right]^{n}\prod_{i=1}^{n} X_i^{\alpha-1}(1-X_i)^{\beta-1}, \qquad (3.27) $$

and

$$ \ln L(X_1,\ldots,X_n) = n\ln\Gamma(\alpha+\beta) - n\ln\Gamma(\alpha) - n\ln\Gamma(\beta) + (\alpha-1)\sum_{i=1}^{n}\ln X_i + (\beta-1)\sum_{i=1}^{n}\ln(1-X_i). \qquad (3.28) $$

Note that the notation has been changed from Gnanadesikan,

Pinkham, and Hughes's paper to conform with the notation

used in this dissertation.  The likelihood equations

using this notation are then

$$ n\,\frac{\partial\ln\Gamma(\hat\alpha+\hat\beta)}{\partial\hat\alpha} - n\,\frac{\partial\ln\Gamma(\hat\alpha)}{\partial\hat\alpha} + \sum_{i=1}^{n}\ln X_i = 0 \qquad (3.29) $$

and

$$ n\,\frac{\partial\ln\Gamma(\hat\alpha+\hat\beta)}{\partial\hat\beta} - n\,\frac{\partial\ln\Gamma(\hat\beta)}{\partial\hat\beta} + \sum_{i=1}^{n}\ln(1-X_i) = 0. \qquad (3.30) $$









Equivalently,

$$ \Psi(\hat\alpha) - \Psi(\hat\alpha+\hat\beta) = \frac{\sum_{i=1}^{n}\ln X_i}{n} \qquad (3.31) $$

and

$$ \Psi(\hat\beta) - \Psi(\hat\alpha+\hat\beta) = \frac{\sum_{i=1}^{n}\ln(1-X_i)}{n}, \qquad (3.32) $$

where

$$ \Psi(\hat\alpha) = \frac{\partial\ln\Gamma(\hat\alpha)}{\partial\hat\alpha}, \qquad \Psi(\hat\beta) = \frac{\partial\ln\Gamma(\hat\beta)}{\partial\hat\beta}, \qquad \Psi(\hat\alpha+\hat\beta) = \frac{\partial\ln\Gamma(\hat\alpha+\hat\beta)}{\partial\hat\alpha} = \frac{\partial\ln\Gamma(\hat\alpha+\hat\beta)}{\partial\hat\beta}. \qquad (3.33) $$



By the nature of these functions of α and β, called "psi"

functions, involved in the likelihood equations, it was

not possible to solve directly for $\hat\alpha$ and $\hat\beta$.  Thus, the

Newton-Raphson iteration method was used to solve the

equations. The problem is further complicated for

Gnanadesikan, Pinkham, and Hughes since they wish to

find maximum likelihood estimators also for the case

when only the first p order statistics from a sample of

size n are used. The likelihood equations are then of








the form

$$ \frac{\sum_{i=1}^{p}\ln X_i}{n} = \Psi(\hat\alpha) - \Psi(\hat\alpha+\hat\beta) - \Bigl(1-\frac{p}{n}\Bigr)\frac{I_1(X_p;\hat\alpha,\hat\beta)}{I(X_p;\hat\alpha,\hat\beta)} \qquad (3.34) $$

and

$$ \frac{\sum_{i=1}^{p}\ln(1-X_i)}{n} = \Psi(\hat\beta) - \Psi(\hat\alpha+\hat\beta) - \Bigl(1-\frac{p}{n}\Bigr)\frac{I_2(X_p;\hat\alpha,\hat\beta)}{I(X_p;\hat\alpha,\hat\beta)}, \qquad (3.35) $$

where

$$ I(x;\hat\alpha,\hat\beta) = \int_{x}^{1} t^{\hat\alpha-1}(1-t)^{\hat\beta-1}\,dt, \qquad I_1(x;\hat\alpha,\hat\beta) = \int_{x}^{1} t^{\hat\alpha-1}(1-t)^{\hat\beta-1}\ln(t)\,dt, \qquad I_2(x;\hat\alpha,\hat\beta) = \int_{x}^{1} t^{\hat\alpha-1}(1-t)^{\hat\beta-1}\ln(1-t)\,dt. \qquad (3.36) $$



Gnanadesikan, Pinkham, and Hughes then go on to

compare the estimators obtained by using various fractions

of the data and present two examples of their results.

However, the section of their paper which is relevant

here is the section in which they actually compute the

estimators given the entire sample. The method proceeds








as follows. Let

$$ k_1 = \frac{\sum_{i=1}^{n}\ln X_i}{n} \qquad \text{and} \qquad k_2 = \frac{\sum_{i=1}^{n}\ln(1-X_i)}{n}. \qquad (3.37) $$

Then the likelihood equations are

$$ \Psi(\hat\alpha) - \Psi(\hat\alpha+\hat\beta) = k_1 \qquad \text{and} \qquad \Psi(\hat\beta) - \Psi(\hat\alpha+\hat\beta) = k_2, \qquad (3.38) $$

where $k_1$ and $k_2$ are constant in terms of $\hat\alpha$ and $\hat\beta$.  Let

the solutions of these equations, $\hat\alpha$ and $\hat\beta$, be equal to

some initial values plus a correction.  That is, $\hat\alpha = \alpha_0 + h$

and $\hat\beta = \beta_0 + l$.  Then

$$ \Psi(\alpha_0+h) - \Psi(\alpha_0+h+\beta_0+l) - k_1 = 0 \qquad \text{and} \qquad \Psi(\beta_0+l) - \Psi(\alpha_0+h+\beta_0+l) - k_2 = 0. \qquad (3.39) $$



Expanding these functions in Taylor series and using

only the constant and first derivative terms of those

series as an approximation, the equations are

$$ \Psi(\alpha_0+h) - \Psi(\alpha_0+h+\beta_0+l) - k_1 \doteq \Psi(\alpha_0) - \Psi(\alpha_0+\beta_0) - k_1 + h\,\frac{\partial\bigl[\Psi(\hat\alpha)-\Psi(\hat\alpha+\hat\beta)-k_1\bigr]}{\partial\hat\alpha}\bigg|_{\alpha_0,\beta_0} + l\,\frac{\partial\bigl[\Psi(\hat\alpha)-\Psi(\hat\alpha+\hat\beta)-k_1\bigr]}{\partial\hat\beta}\bigg|_{\alpha_0,\beta_0} \qquad (3.40) $$

and

$$ \Psi(\beta_0+l) - \Psi(\alpha_0+h+\beta_0+l) - k_2 \doteq \Psi(\beta_0) - \Psi(\alpha_0+\beta_0) - k_2 + h\,\frac{\partial\bigl[\Psi(\hat\beta)-\Psi(\hat\alpha+\hat\beta)-k_2\bigr]}{\partial\hat\alpha}\bigg|_{\alpha_0,\beta_0} + l\,\frac{\partial\bigl[\Psi(\hat\beta)-\Psi(\hat\alpha+\hat\beta)-k_2\bigr]}{\partial\hat\beta}\bigg|_{\alpha_0,\beta_0}. \qquad (3.41) $$

Let D denote the determinant of the matrix of partial derivatives appearing in (3.40) and (3.41), evaluated at $(\alpha_0,\beta_0)$:

$$ D = \Psi'(\alpha_0)\,\Psi'(\beta_0) - \Psi'(\alpha_0+\beta_0)\bigl[\Psi'(\alpha_0)+\Psi'(\beta_0)\bigr]. $$

Now, let the first correction to the initial estimate $\alpha_0$ be $h_1$, and the first correction to $\beta_0$ be $l_1$.  Setting the right-hand sides of (3.40) and (3.41) equal to zero and solving,

$$ h_1 = \Bigl\{\bigl[\Psi(\alpha_0+\beta_0)+k_1-\Psi(\alpha_0)\bigr]\bigl[\Psi'(\beta_0)-\Psi'(\alpha_0+\beta_0)\bigr] + \Psi'(\alpha_0+\beta_0)\bigl[\Psi(\alpha_0+\beta_0)+k_2-\Psi(\beta_0)\bigr]\Bigr\}\big/ D \qquad (3.42) $$

and

$$ l_1 = \Bigl\{\bigl[\Psi(\alpha_0+\beta_0)+k_2-\Psi(\beta_0)\bigr]\bigl[\Psi'(\alpha_0)-\Psi'(\alpha_0+\beta_0)\bigr] + \Psi'(\alpha_0+\beta_0)\bigl[\Psi(\alpha_0+\beta_0)+k_1-\Psi(\alpha_0)\bigr]\Bigr\}\big/ D. \qquad (3.43) $$



The entire process is now repeated using $\alpha_0+h_1$ and $\beta_0+l_1$

as new initial estimators.  The iteration continues until

it is clear that the process is converging to a solution.

As the initial estimators, $\alpha_0$ and $\beta_0$, Gnanadesikan,

Pinkham, and Hughes propose the sample moment estimators

discussed earlier. These seem to work satisfactorily,

in that the process does converge to a solution. That is

the Newton-Raphson method. The only difference from

Gnanadesikan, Pinkham, and Hughes's method of solving the

likelihood equations is that Bernoulli series approximations

for the derivatives of the psi functions have been used

in the corrections to the initial estimators, rather than

the approximation of the derivatives by differences.

The Bernoulli series approximations were given in a paper

by Choi and Wette (1969).  To find $\Psi'(X)$, the equation is

$$ \Psi'(X) = \Bigl\{1+\bigl\{1+\bigl[1-\bigl(\tfrac{1}{5}-\tfrac{1}{7X^{2}}\bigr)\big/X^{2}\bigr]\big/3X\bigr\}\big/2X\Bigr\}\Big/X, \qquad X>8. \qquad (3.44) $$

If $X < 8$, then use is made of the recurrence formula

$$ \Psi'(X) = \Psi'(X+1) + 1/X^{2}. \qquad (3.45) $$

For example,

$$ \Psi'(4.5) = \Psi'(8.5) + \frac{1}{(4.5)^2} + \frac{1}{(5.5)^2} + \frac{1}{(6.5)^2} + \frac{1}{(7.5)^2}. $$



By using this approximation to the derivatives of the psi

functions, the Newton-Raphson method converges in fewer

iterations than Gnanadesikan, Pinkham, and Hughes report

in their paper. In no case did it take more than four

iterations to arrive at a solution.
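Equations (3.37) through (3.45) amount to a short iterative algorithm.  The sketch below is illustrative only: it follows the Newton-Raphson scheme described above, starting from the sample moment estimators, but uses SciPy's digamma and polygamma functions in place of the Bernoulli-series approximations (3.44)-(3.45).

```python
import numpy as np
from scipy.special import digamma, polygamma

def beta_mle(x, tol=1e-10, max_iter=50):
    """Newton-Raphson solution of the likelihood equations (3.38),
    started from the sample moment estimators, as described in the text."""
    x = np.asarray(x)
    k1 = np.log(x).mean()                  # (3.37)
    k2 = np.log1p(-x).mean()

    m1, m2 = x.mean(), (x ** 2).mean()     # moment estimators as starting values
    a = m1 * (m1 - m2) / (m2 - m1 ** 2)
    b = (1.0 - m1) * (m1 - m2) / (m2 - m1 ** 2)

    for _ in range(max_iter):
        psi_ab = digamma(a + b)
        tri_a, tri_b, tri_ab = polygamma(1, a), polygamma(1, b), polygamma(1, a + b)
        D = tri_a * tri_b - tri_ab * (tri_a + tri_b)                  # (3.59)
        f1 = psi_ab + k1 - digamma(a)
        f2 = psi_ab + k2 - digamma(b)
        h = (f1 * (tri_b - tri_ab) + tri_ab * f2) / D                 # (3.42)
        l = (f2 * (tri_a - tri_ab) + tri_ab * f1) / D                 # (3.43)
        a, b = a + h, b + l
        if abs(h) < tol and abs(l) < tol:
            break
    return a, b

rng = np.random.default_rng(4)
print(beta_mle(rng.beta(2.0, 5.0, size=500)))
```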

Since the maximum likelihood estimator of a function

of a parameter or parameters is simply the function of

the estimators, the estimator of the mean, $\alpha/(\alpha+\beta)$, is

$\hat\alpha/(\hat\alpha+\hat\beta)$, where $\hat\alpha$ and $\hat\beta$ are the maximum likelihood estimators

of α and β.


Asymptotic Properties of the Estimators

Having obtained the maximum likelihood estimators,

the next logical question would be what sort of statis-

tical properties do these estimators have. More specif-








ically, since they are rather difficult estimators

to obtain, are they a significant improvement over the

sample moment estimators?  To answer these questions,

an attempt has been made to investigate the biases,

variances, and covariance of the estimators.

Because of the fact that the estimators were

obtained through an iteration process rather than being

given through an explicit formula, exact expressions for

the biases, variances, and covariance were unobtainable.

However, two methods of approximation were available.

First, since these are maximum likelihood estimators,

their asymptotic distribution is well known if certain

regularity conditions are met. From the asymptotic

distribution the biases, variances, and covariance for

large n are known. Second, in an attempt to determine

the biases, variances, and covariance for smaller values

of n, the likelihood equations were expanded in a Taylor

series. Since the first and second moments of the right

hand side of the equations are obtainable, then the

expanded equations may be solved simultaneously for the

biases, variances, and covariance of the estimators of

α and β.

First, consider the asymptotic distribution of the

estimators. This distribution, for a multidimensional

parameter vector, is given in Wilks (1962).








Theorem 3.1:  If $(X_1,\ldots,X_n)$ is a sample from the

c.d.f. $F(X;\theta^0)$, where $\theta^0$ is r-dimensional and

$F(X;\theta)$ is regular with respect to its first and

second θ-derivatives for θ in $R^0$, and if the

maximum likelihood estimator $(\hat\theta_1,\ldots,\hat\theta_r)$

satisfying (12.7.1) is unique for n > some $n_0$

and measurable with respect to $\prod_{i=1}^{n}F(X_i;\theta)$,

then it is asymptotically distributed, for large

n, according to the r-dimensional normal

distribution $N\bigl(\{\theta_p^0\},\,\|n\,B_{pq}\|^{-1}\bigr)$, where

$\|B_{pq}\| = \|B_{pq}(\theta^0,\theta^0)\|$.

Note that

$$ B_{pq}(\theta',\theta'') = \int_{-\infty}^{\infty} S_p(X;\theta')\,S_q(X;\theta'')\;dF(X;\theta^0). \qquad (3.46) $$

To apply this theorem, the conditions of the theorem must

be met for the beta distribution. Regularity with respect

to the first and second θ-derivatives is equivalent to

$$ E\bigl(S_p(X;\theta)\bigr) = 0 \qquad \text{and} \qquad E\bigl(S_p(X;\theta)S_q(X;\theta)\bigr) + E\bigl(S_{pq}(X;\theta)\bigr) = 0. \qquad (3.47) $$

Now in this case

$$ S_p(X;\theta) = \frac{\partial\ln f(X;\alpha,\beta)}{\partial\alpha} = \Psi(\alpha+\beta) - \Psi(\alpha) + \ln X, \qquad S_q(X;\theta) = \frac{\partial\ln f(X;\alpha,\beta)}{\partial\beta} = \Psi(\alpha+\beta) - \Psi(\beta) + \ln(1-X), \qquad (3.48) $$

and

$$ S_{pq}(X;\theta) = \frac{\partial^2\ln f(X;\alpha,\beta)}{\partial\alpha\,\partial\beta} = \Psi'(\alpha+\beta). \qquad (3.49) $$

Thus,

$$ E\bigl[S_p(X;\theta)\bigr] = \Psi(\alpha+\beta) - \Psi(\alpha) + E(\ln X). \qquad (3.50) $$

To obtain $E(\ln X)$, consider

$$ \int_0^1 X^{\alpha-1}(1-X)^{\beta-1}\,dX = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}. \qquad (3.51) $$

Differentiating both sides with respect to α,

$$ \int_0^1 (\ln X)\,X^{\alpha-1}(1-X)^{\beta-1}\,dX = \frac{\Gamma(\beta)}{\Gamma(\alpha+\beta)}\bigl[\Gamma'(\alpha) - \Gamma(\alpha)\Psi(\alpha+\beta)\bigr], \qquad (3.52) $$

so that

$$ E(\ln X) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\cdot\frac{\Gamma(\beta)}{\Gamma(\alpha+\beta)}\bigl[\Gamma'(\alpha) - \Gamma(\alpha)\Psi(\alpha+\beta)\bigr] = \Psi(\alpha) - \Psi(\alpha+\beta). \qquad (3.53) $$

Therefore,

$$ E\bigl[S_p(X;\theta)\bigr] = \Psi(\alpha+\beta) - \Psi(\alpha) + \bigl[\Psi(\alpha) - \Psi(\alpha+\beta)\bigr] = 0. \qquad (3.54) $$








Similarly, $E[S_q(X;\theta)] = 0$.  Also,

$$ E\bigl[S_{pq}(X;\theta)\bigr] = E\left[\frac{\partial^2\ln f(X;\alpha,\beta)}{\partial\alpha\,\partial\beta}\right] = -E\left[\frac{\partial\ln f(X;\alpha,\beta)}{\partial\alpha}\cdot\frac{\partial\ln f(X;\alpha,\beta)}{\partial\beta}\right] \qquad (3.55) $$

and

$$ E\bigl[S_p(X;\theta)S_q(X;\theta)\bigr] = E\left[\frac{\partial\ln f(X;\alpha,\beta)}{\partial\alpha}\cdot\frac{\partial\ln f(X;\alpha,\beta)}{\partial\beta}\right]. \qquad (3.56) $$

Thus,

$$ E\bigl[S_p(X;\theta)S_q(X;\theta)\bigr] + E\bigl[S_{pq}(X;\theta)\bigr] = 0. \qquad (3.57) $$


Therefore, $F(X;\theta^0)$ is regular with respect to its first

and second θ-derivatives.  Since the iteration process

converges to a single pair of estimators for α and β,

the solutions to the likelihood equations are unique.

Expressing $\hat\alpha$ as $\hat\alpha = \alpha_0 + \sum_{i=1}^{n} h_i$ and $\hat\beta$ as $\hat\beta = \beta_0 + \sum_{i=1}^{n} l_i$,

then $\hat\alpha$ and $\hat\beta$ are measurable functions if $\alpha_0$, $\beta_0$, $h_i$,

and $l_i$ are measurable functions for all i, since

$\lim_{n\to\infty}\sum_{i=1}^{n} h_i$ and $\lim_{n\to\infty}\sum_{i=1}^{n} l_i$ are measurable if all the $h_i$'s

and $l_i$'s are measurable.  Let $M_1$ and $M_2$ be the first

two sample moments. Then M1 and M2 are measurable

since the Xi's are random variables and hence measurable.








But $\alpha_0 = M_1(M_1-M_2)/(M_2-M_1^2)$ and $\beta_0 = (1-M_1)(M_1-M_2)/(M_2-M_1^2)$.

Since products and sums of measurable functions are

measurable, and since a continuous function of a measurable

function is measurable, $\alpha_0$ and $\beta_0$ are measurable

functions.  Consider $h_1$,

$$ h_1 = \Bigl\{\bigl[\Psi(\alpha_0+\beta_0)+k_1-\Psi(\alpha_0)\bigr]\bigl[\Psi'(\beta_0)-\Psi'(\alpha_0+\beta_0)\bigr] + \Psi'(\alpha_0+\beta_0)\bigl[\Psi(\alpha_0+\beta_0)+k_2-\Psi(\beta_0)\bigr]\Bigr\}\big/ D, \qquad (3.58) $$

where

$$ D = \Psi'(\alpha_0)\,\Psi'(\beta_0) - \Psi'(\alpha_0+\beta_0)\bigl[\Psi'(\alpha_0)+\Psi'(\beta_0)\bigr]. \qquad (3.59) $$

Since $\Psi(X)$ and $\Psi'(X)$ for $0 < X < \infty$ are continuous

functions and $k_1$ and $k_2$ are measurable functions, then

all components of $h_1$ are measurable.  Thus, $h_1$ is

measurable, and similarly $l_1$ is measurable.  Therefore,

after one iteration the estimators $\hat\alpha_1 = \alpha_0+h_1$ and

$\hat\beta_1 = \beta_0+l_1$ are measurable functions.  But $h_2$ and $l_2$

are computed by replacing $\alpha_0$ and $\beta_0$ in $h_1$ and $l_1$ by

$\hat\alpha_1$ and $\hat\beta_1$.  Thus, $h_2$ and $l_2$ are measurable.  Similarly,

$h_i$ and $l_i$ are measurable for all i.  Therefore $\hat\alpha$ and $\hat\beta$

are measurable functions.  Thus, the conditions for the

theorem giving the asymptotic distribution of the

maximum likelihood estimators are satisfied.







Once again, the theorem states that the asymptotic

distribution of the estimators is the r-dimensional

normal distribution $N\bigl(\{\theta_p^0\},\,\|n\,B_{pq}\|^{-1}\bigr)$.  Hence,

asymptotically the estimators $\hat\alpha$ and $\hat\beta$ are unbiased.

The notation which Wilks uses for the asymptotic variances

and covariance of the estimators, $\|n\,B_{pq}\|^{-1}$, is more

commonly given as the inverse of n times the information

matrix, $(nI)^{-1}$.  To find the information matrix, proceed

as follows:

$$ f(X;\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,X^{\alpha-1}(1-X)^{\beta-1}, \qquad (3.60) $$

$$ \ln f(X;\alpha,\beta) = \ln\Gamma(\alpha+\beta) - \ln\Gamma(\alpha) - \ln\Gamma(\beta) + (\alpha-1)\ln X + (\beta-1)\ln(1-X), \qquad (3.61) $$

$$ \frac{\partial\ln f(X;\alpha,\beta)}{\partial\alpha} = \Psi(\alpha+\beta) - \Psi(\alpha) + \ln X, \qquad (3.62) $$

and

$$ \frac{\partial^2\ln f(X;\alpha,\beta)}{\partial\alpha^2} = \Psi'(\alpha+\beta) - \Psi'(\alpha). \qquad (3.63) $$

Therefore,

$$ E\left[\frac{\partial\ln f(X;\alpha,\beta)}{\partial\alpha}\right]^2 = -E\left[\frac{\partial^2\ln f(X;\alpha,\beta)}{\partial\alpha^2}\right] = \Psi'(\alpha) - \Psi'(\alpha+\beta). \qquad (3.64) $$

Similarly,

$$ \frac{\partial^2\ln f(X;\alpha,\beta)}{\partial\beta^2} = \Psi'(\alpha+\beta) - \Psi'(\beta) \qquad (3.65) $$

and

$$ \frac{\partial^2\ln f(X;\alpha,\beta)}{\partial\alpha\,\partial\beta} = \Psi'(\alpha+\beta), \qquad (3.66) $$

so that

$$ E\left[\frac{\partial\ln f(X;\alpha,\beta)}{\partial\beta}\right]^2 = \Psi'(\beta) - \Psi'(\alpha+\beta) \qquad (3.67) $$

and

$$ E\left[\frac{\partial\ln f(X;\alpha,\beta)}{\partial\alpha}\cdot\frac{\partial\ln f(X;\alpha,\beta)}{\partial\beta}\right] = -\Psi'(\alpha+\beta). \qquad (3.68) $$

Hence,

$$ I = \begin{bmatrix} \Psi'(\alpha)-\Psi'(\alpha+\beta) & -\Psi'(\alpha+\beta) \\ -\Psi'(\alpha+\beta) & \Psi'(\beta)-\Psi'(\alpha+\beta) \end{bmatrix}, \qquad (3.69) $$

so that

$$ |I| = \bigl[\Psi'(\alpha)-\Psi'(\alpha+\beta)\bigr]\bigl[\Psi'(\beta)-\Psi'(\alpha+\beta)\bigr] - \bigl[\Psi'(\alpha+\beta)\bigr]^2 = \Psi'(\alpha)\,\Psi'(\beta) - \Psi'(\alpha+\beta)\bigl[\Psi'(\alpha)+\Psi'(\beta)\bigr]. \qquad (3.70) $$








Now

$$ I^{-1} = \frac{1}{|I|}\begin{bmatrix} \Psi'(\beta)-\Psi'(\alpha+\beta) & \Psi'(\alpha+\beta) \\ \Psi'(\alpha+\beta) & \Psi'(\alpha)-\Psi'(\alpha+\beta) \end{bmatrix}. \qquad (3.71) $$

Therefore, asymptotically,

$$ \mathrm{Var}(\hat\alpha) = \frac{\Psi'(\beta)-\Psi'(\alpha+\beta)}{n|I|}, \qquad \mathrm{Var}(\hat\beta) = \frac{\Psi'(\alpha)-\Psi'(\alpha+\beta)}{n|I|}, \qquad \text{and} \qquad \mathrm{Cov}(\hat\alpha,\hat\beta) = \frac{\Psi'(\alpha+\beta)}{n|I|}. \qquad (3.72) $$


The estimator of the mean, $\hat\alpha/(\hat\alpha+\hat\beta)$, is asymptotically unbiased,

since it is the maximum likelihood estimator of $\alpha/(\alpha+\beta)$.

To find the variance of the estimator, note that the

asymptotic variance of a function of the maximum likelihood

estimators is simply the variance of the Taylor series

expansion of the function terminated after the first

derivatives.  Thus, asymptotically,

$$ \mathrm{Var}\!\left(\frac{\hat\alpha}{\hat\alpha+\hat\beta}\right) = \left[\frac{\partial\,\hat\alpha/(\hat\alpha+\hat\beta)}{\partial\hat\alpha}\right]^2\mathrm{Var}(\hat\alpha) + \left[\frac{\partial\,\hat\alpha/(\hat\alpha+\hat\beta)}{\partial\hat\beta}\right]^2\mathrm{Var}(\hat\beta) + 2\left[\frac{\partial\,\hat\alpha/(\hat\alpha+\hat\beta)}{\partial\hat\alpha}\right]\!\left[\frac{\partial\,\hat\alpha/(\hat\alpha+\hat\beta)}{\partial\hat\beta}\right]\mathrm{Cov}(\hat\alpha,\hat\beta) = \frac{\beta^2\,\mathrm{Var}(\hat\alpha) + \alpha^2\,\mathrm{Var}(\hat\beta) - 2\alpha\beta\,\mathrm{Cov}(\hat\alpha,\hat\beta)}{(\alpha+\beta)^4}, $$

where $\mathrm{Var}(\hat\alpha)$, $\mathrm{Var}(\hat\beta)$, and $\mathrm{Cov}(\hat\alpha,\hat\beta)$ are the asymptotic

expressions obtained previously.  Substituting those

expressions,

$$ \mathrm{Var}\!\left(\frac{\hat\alpha}{\hat\alpha+\hat\beta}\right) = \Bigl\{\beta^2\bigl[\Psi'(\beta)-\Psi'(\alpha+\beta)\bigr] + \alpha^2\bigl[\Psi'(\alpha)-\Psi'(\alpha+\beta)\bigr] - 2\alpha\beta\,\Psi'(\alpha+\beta)\Bigr\}\Big/\,n|I|\,(\alpha+\beta)^4. \qquad (3.73) $$


Small Sample Properties of the Estimators
At this point, a comparison of these asymptotic
results with the results for sample moment estimation
could be made. However, some information, even if only









an approximation, on the small sample biases, variances,

and covariance of the maximum likelihood estimators

would be desirable. Fortunately this can be done.

Consider again the likelihood equations

$$ \Psi(\hat\alpha) - \Psi(\hat\alpha+\hat\beta) = \frac{\sum_{i=1}^{n}\ln X_i}{n} \qquad \text{and} \qquad \Psi(\hat\beta) - \Psi(\hat\alpha+\hat\beta) = \frac{\sum_{i=1}^{n}\ln(1-X_i)}{n}. \qquad (3.74) $$

Now

$$ E\left[\frac{\sum_{i=1}^{n}\ln X_i}{n}\right] = E(\ln X_i), \qquad (3.75) $$

since the $X_i$'s are a random sample from a beta distribution

with parameters α and β.  As has been shown before,

$E(\ln X_i) = \Psi(\alpha) - \Psi(\alpha+\beta)$.  Similarly,

$$ E\left[\frac{\sum_{i=1}^{n}\ln(1-X_i)}{n}\right] = E\bigl[\ln(1-X_i)\bigr] = \Psi(\beta) - \Psi(\alpha+\beta). \qquad (3.76) $$



Now let

$$ K(\hat\alpha,\hat\beta) = \Psi(\hat\alpha) - \Psi(\hat\alpha+\hat\beta) - \bigl[\Psi(\alpha) - \Psi(\alpha+\beta)\bigr], \qquad L(\hat\alpha,\hat\beta) = \Psi(\hat\beta) - \Psi(\hat\alpha+\hat\beta) - \bigl[\Psi(\beta) - \Psi(\alpha+\beta)\bigr], $$

$$ k = \frac{\sum_{i=1}^{n}\ln X_i}{n} - \bigl[\Psi(\alpha) - \Psi(\alpha+\beta)\bigr], \qquad \text{and} \qquad l = \frac{\sum_{i=1}^{n}\ln(1-X_i)}{n} - \bigl[\Psi(\beta) - \Psi(\alpha+\beta)\bigr], \qquad (3.77) $$

so that the likelihood equations may be transformed

into the following equations:

$$ K(\hat\alpha,\hat\beta) = k \qquad \text{and} \qquad L(\hat\alpha,\hat\beta) = l. \qquad (3.80) $$


Now let

$$ m_{ij} = E(k^{i}l^{j}), \qquad \mu_{ij} = E\bigl[(\hat\alpha-\alpha)^{i}(\hat\beta-\beta)^{j}\bigr], $$

$$ K_{ij} = \frac{1}{i!\,j!}\,\frac{\partial^{\,i+j}K(\hat\alpha,\hat\beta)}{\partial\hat\alpha^{\,i}\,\partial\hat\beta^{\,j}}\bigg|_{\alpha,\beta}, \qquad \text{and} \qquad L_{ij} = \frac{1}{i!\,j!}\,\frac{\partial^{\,i+j}L(\hat\alpha,\hat\beta)}{\partial\hat\alpha^{\,i}\,\partial\hat\beta^{\,j}}\bigg|_{\alpha,\beta}. \qquad (3.81) $$

By expanding $K(\hat\alpha,\hat\beta)$ in a Taylor series about the point

$(\alpha,\beta)$,

$$ K(\hat\alpha,\hat\beta) = K(\alpha,\beta) + (\hat\alpha-\alpha)K_{10} + (\hat\beta-\beta)K_{01} + (\hat\alpha-\alpha)^2K_{20} + (\hat\alpha-\alpha)(\hat\beta-\beta)K_{11} + (\hat\beta-\beta)^2K_{02} + \cdots, \qquad (3.82) $$

and, since $K(\alpha,\beta) = 0$,

$$ K(\hat\alpha,\hat\beta) = (\hat\alpha-\alpha)K_{10} + (\hat\beta-\beta)K_{01} + (\hat\alpha-\alpha)^2K_{20} + (\hat\alpha-\alpha)(\hat\beta-\beta)K_{11} + (\hat\beta-\beta)^2K_{02} + \cdots. \qquad (3.83) $$

Similarly,

$$ L(\hat\alpha,\hat\beta) = (\hat\alpha-\alpha)L_{10} + (\hat\beta-\beta)L_{01} + (\hat\alpha-\alpha)^2L_{20} + (\hat\alpha-\alpha)(\hat\beta-\beta)L_{11} + (\hat\beta-\beta)^2L_{02} + \cdots. \qquad (3.84) $$


Since $K(\hat\alpha,\hat\beta) = k$ and $L(\hat\alpha,\hat\beta) = l$, and since the moments

of k and l can be found, the moments, or more

specifically the $\mu_{ij}$'s, may be expressed in terms of

the moments of k and l.  Specifically, terminating the

Taylor series after second derivatives,

$$ 0 = E(k) = E\bigl(K(\hat\alpha,\hat\beta)\bigr) = E\bigl\{(\hat\alpha-\alpha)K_{10} + (\hat\beta-\beta)K_{01} + (\hat\alpha-\alpha)^2K_{20} + (\hat\alpha-\alpha)(\hat\beta-\beta)K_{11} + (\hat\beta-\beta)^2K_{02}\bigr\}, $$

$$ 0 = E(l) = E\bigl(L(\hat\alpha,\hat\beta)\bigr) = E\bigl\{(\hat\alpha-\alpha)L_{10} + (\hat\beta-\beta)L_{01} + (\hat\alpha-\alpha)^2L_{20} + (\hat\alpha-\alpha)(\hat\beta-\beta)L_{11} + (\hat\beta-\beta)^2L_{02}\bigr\}, $$

$$ m_{20} = E(k^2) = E\bigl[\bigl(K(\hat\alpha,\hat\beta)\bigr)^2\bigr] = E\bigl\{\bigl[(\hat\alpha-\alpha)K_{10} + (\hat\beta-\beta)K_{01} + (\hat\alpha-\alpha)^2K_{20} + (\hat\alpha-\alpha)(\hat\beta-\beta)K_{11} + (\hat\beta-\beta)^2K_{02}\bigr]^2\bigr\}, $$

$$ m_{11} = E(kl) = E\bigl\{\bigl[(\hat\alpha-\alpha)K_{10} + \cdots + (\hat\beta-\beta)^2K_{02}\bigr]\bigl[(\hat\alpha-\alpha)L_{10} + \cdots + (\hat\beta-\beta)^2L_{02}\bigr]\bigr\}, \qquad \text{and} $$

$$ m_{02} = E(l^2) = E\bigl[\bigl(L(\hat\alpha,\hat\beta)\bigr)^2\bigr] = E\bigl\{\bigl[(\hat\alpha-\alpha)L_{10} + (\hat\beta-\beta)L_{01} + (\hat\alpha-\alpha)^2L_{20} + (\hat\alpha-\alpha)(\hat\beta-\beta)L_{11} + (\hat\beta-\beta)^2L_{02}\bigr]^2\bigr\}. \qquad (3.85) $$

Expressing these equations in one matrix equation and

including only the terms of the expansions involving

$\mu_{10}$, $\mu_{01}$, $\mu_{20}$, $\mu_{11}$, and $\mu_{02}$, since those are the terms

of interest,

$$ \begin{bmatrix} 0\\ 0\\ m_{20}\\ m_{11}\\ m_{02} \end{bmatrix} = \begin{bmatrix} K_{10} & K_{01} & K_{20} & K_{11} & K_{02}\\ L_{10} & L_{01} & L_{20} & L_{11} & L_{02}\\ 0 & 0 & K_{10}^2 & 2K_{10}K_{01} & K_{01}^2\\ 0 & 0 & K_{10}L_{10} & K_{10}L_{01}+K_{01}L_{10} & K_{01}L_{01}\\ 0 & 0 & L_{10}^2 & 2L_{10}L_{01} & L_{01}^2 \end{bmatrix} \begin{bmatrix} \mu_{10}\\ \mu_{01}\\ \mu_{20}\\ \mu_{11}\\ \mu_{02} \end{bmatrix}. \qquad (3.86) $$


In matrix notation, $m = K\mu$.  Solving for the vector $\mu$,

$K^{-1}m = \mu$, so that an approximation for $\mu$ in terms of a

matrix of constants with respect to the distribution

$F(X_1,\ldots,X_n)$ and a vector $m$, for which moments can be

found, is obtained.  Notice that the $\mu$-vector is not

exactly $E(\hat\alpha)$, $E(\hat\beta)$, $\mathrm{Var}(\hat\alpha)$, $\mathrm{Cov}(\hat\alpha,\hat\beta)$, and $\mathrm{Var}(\hat\beta)$, but

rather $E(\hat\alpha-\alpha)$, $E(\hat\beta-\beta)$, $E(\hat\alpha-\alpha)^2$, $E(\hat\alpha-\alpha)(\hat\beta-\beta)$, and $E(\hat\beta-\beta)^2$.

However, from these expected values the biases, variances,

and covariance of the estimators may be found.

The next question is how good an approximation is

this. The order of the approximation in powers of n

depends on the order in n of the terms in m and K, and

the order of succeeding terms in the equation. In other

words, if the Taylor series expansion of $K(\hat\alpha,\hat\beta)$ were

to be carried out further, thus incorporating higher

moments of $(\hat\alpha-\alpha)$ and $(\hat\beta-\beta)$, higher moments of $(k,l)$

would need to be included in order to solve the system

of equations.  Higher powers of n would then be included

in the expansion, implying a better approximation.

Since K is constant in n, the order of the approximation

will depend only on m.  On the condition that $m_{ij}$ is

of order $[(i+j+1)/2]$ in 1/n, where [ ] in this case

represents the greatest integer function, the solution

of the matrix equation previously given yields $\mu$ to order

1/n.  Since this approximation worked fairly well, except

for very small values of n, a better approximation has

not been computed.

To prove the condition that $m_{ij}$ is of order

$[(i+j+1)/2]$, consider the two special cases of $m_{k0}$, for

k even and for k odd, in order to fix the ideas of the

proof for the more general case of $m_{ij}$.  Write

$$ m_{k0} = \frac{1}{n^{k}}\,E\left[\sum_{i=1}^{n}(Y_i-\mu)\right]^{k}, \qquad (3.87) $$

where $Y_i = \ln X_i$, $X_1,\ldots,X_n$ are a random sample from the

beta distribution with parameters α and β, and

$\mu = \Psi(\alpha) - \Psi(\alpha+\beta) = E(\ln X_i)$.  Let k be even.  In the

expansion of $\bigl[\sum_{i=1}^{n}(Y_i-\mu)\bigr]^{k}$ there will be a term

$$ \sum_{i_1\neq i_2\neq\cdots\neq i_{k/2}} (Y_{i_1}-\mu)^2 (Y_{i_2}-\mu)^2 \cdots (Y_{i_{k/2}}-\mu)^2. \qquad (3.88) $$

Since the $X_i$'s, and hence the $Y_i$'s, are independent and

identically distributed,

$$ E\left[\sum_{i_1\neq\cdots\neq i_{k/2}} (Y_{i_1}-\mu)^2\cdots(Y_{i_{k/2}}-\mu)^2\right] = n(n-1)(n-2)\cdots\Bigl(n-\frac{k}{2}+1\Bigr)\bigl[E(Y_i-\mu)^2\bigr]^{k/2}, \qquad (3.89) $$

which is of order $n^{k/2}$.  All terms preceding this one

in the expansion of $\bigl[\sum_{i=1}^{n}(Y_i-\mu)\bigr]^{k}$ will be of smaller

order than $n^{k/2}$, since they will include a summation

over fewer than k/2 subscripts.  The next term following

this one in the expansion would be

$$ \sum_{i_1\neq\cdots\neq i_{(k/2)+1}} (Y_{i_1}-\mu)^2\cdots(Y_{i_{(k/2)-1}}-\mu)^2\,(Y_{i_{k/2}}-\mu)(Y_{i_{(k/2)+1}}-\mu). \qquad (3.90) $$

The expected value of this term is zero, since the $Y_i$'s

are independent and

$$ E(Y_i-\mu) = E(\ln X_i) - E[\ln X_i] = 0. \qquad (3.91) $$

Thus, all terms with fewer components than (3.88) will

have a smaller order than $n^{k/2}$, and all terms with more

components than (3.88) will be zero in expectation.

Thus, the highest power of n present in $m_{k0}$, for k even, is

$$ \bigl(1/n^{k}\bigr)\bigl(n^{k/2}\bigr) = \frac{1}{n^{k/2}} = \frac{1}{n^{[(k+1)/2]}}, \qquad (3.92) $$

so that $m_{k0}$ is of order $[(k+1)/2]$ in 1/n.  Now let k be

odd.  By an argument similar to the previous one, the

term with the highest power of n, which shall be referred to

as the "worst" term, in the expansion of $E\bigl[\sum_{i=1}^{n}(Y_i-\mu)\bigr]^{k}$ is

$$ E\left[\sum_{i_1\neq\cdots\neq i_{(k-1)/2}} (Y_{i_1}-\mu)^3 (Y_{i_2}-\mu)^2\cdots(Y_{i_{(k-1)/2}}-\mu)^2\right] = n(n-1)(n-2)\cdots\Bigl(n-\frac{k-1}{2}+1\Bigr)\,E(Y_i-\mu)^3\,\bigl\{E\bigl[(Y_i-\mu)^2\bigr]\bigr\}^{(k-3)/2}, \qquad (3.93) $$

which has highest power of n equal to $n^{(k-1)/2}$.

Therefore, $m_{k0}$, for k odd, has highest power of n

$$ \bigl(1/n^{k}\bigr)\bigl(n^{(k-1)/2}\bigr) = \frac{1}{n^{(k+1)/2}} = \frac{1}{n^{[(k+1)/2]}}, \qquad (3.94) $$


so that $m_{k0}$ is of order $[(k+1)/2]$ in 1/n.  Now, consider

$m_{pq}$ in general:

$$ m_{pq} = \frac{1}{n^{p+q}}\,E\left\{\left[\sum_{i=1}^{n}(Y_i-\mu_Y)\right]^{p}\left[\sum_{i=1}^{n}(Z_i-\mu_Z)\right]^{q}\right\}, \qquad (3.95) $$

where

$$ Y_i = \ln X_i, \qquad Z_i = \ln(1-X_i), \qquad \mu_Y = \Psi(\alpha)-\Psi(\alpha+\beta) = E(\ln X_i), \qquad \text{and} \qquad \mu_Z = \Psi(\beta)-\Psi(\alpha+\beta) = E\bigl[\ln(1-X_i)\bigr]. \qquad (3.96) $$

Assume $p \le q$.  Let p and q both be even.  Then there are

two possible "worst" terms in the expansion,

$$ E\left[\sum (Y_{i_1}-\mu_Y)^2\cdots(Y_{i_{p/2}}-\mu_Y)^2\,(Z_{j_1}-\mu_Z)^2\cdots(Z_{j_{q/2}}-\mu_Z)^2\right], \qquad (3.97) $$

which has highest power of n equal to $\frac{p}{2}+\frac{q}{2}$, and

$$ E\left[\sum (Y_{i_1}-\mu_Y)(Z_{i_1}-\mu_Z)\cdots(Y_{i_p}-\mu_Y)(Z_{i_p}-\mu_Z)\,(Z_{j_1}-\mu_Z)^2\cdots(Z_{j_{(q-p)/2}}-\mu_Z)^2\right], \qquad (3.98) $$

which has highest power of n equal to $p+\frac{q-p}{2} = \frac{p+q}{2}$,

both sums being over distinct subscripts.  Thus, $m_{pq}$ has highest power of n

$$ \bigl(1/n^{p+q}\bigr)\bigl(n^{(p+q)/2}\bigr) = \frac{1}{n^{(p+q)/2}} = \frac{1}{n^{[(p+q+1)/2]}}, \qquad (3.99) $$

so that $m_{pq}$ is of order $[(p+q+1)/2]$ in 1/n for p and

q even.  For p even and q odd, the "worst" terms have

powers of n equal to $\frac{p}{2}+\frac{q-1}{2} = \frac{p+q-1}{2}$ and $p+\frac{q-p-1}{2} = \frac{p+q-1}{2}$,

so that $m_{pq}$ is of order $\frac{p+q+1}{2} = \bigl[\frac{p+q+1}{2}\bigr]$ in 1/n.  For p

odd and q even, the "worst" terms have powers of n equal

to $\frac{p-1}{2}+\frac{q}{2} = \frac{p+q-1}{2}$ and $p+\frac{q-p-1}{2} = \frac{p+q-1}{2}$, so that $m_{pq}$

is again of order $\frac{p+q+1}{2} = \bigl[\frac{p+q+1}{2}\bigr]$ in 1/n.  Finally, for

p and q both odd, the "worst" terms have powers of n

equal to $\frac{p-1}{2}+\frac{q-1}{2} = \frac{p+q-2}{2}$ and $p+\frac{q-p}{2} = \frac{p+q}{2}$, so that

$m_{pq}$ is of order $\frac{p+q}{2} = \bigl[\frac{p+q+1}{2}\bigr]$ in 1/n.  This completes the

proof.

Now, if a solution for the vector $\mu$ in the matrix

equation $m = K\mu$ can be found, then an approximation of

order 1/n for $E(\hat\alpha-\alpha)$, $E(\hat\beta-\beta)$, $E(\hat\alpha-\alpha)^2$, $E(\hat\alpha-\alpha)(\hat\beta-\beta)$, and

$E(\hat\beta-\beta)^2$ is obtained.  Solving the matrix equation involves

finding $m_{20}$, $m_{11}$, and $m_{02}$ and the K-matrix and its

inverse.  Expressions for m and K have been found, but

the computation of $K^{-1}$ and then of $\mu$ is done by a

computer program.  The results for various combinations

of α and β are given in Table 2.  The K-matrix is

found from partial derivatives of $K(\hat\alpha,\hat\beta)$ and $L(\hat\alpha,\hat\beta)$:











$$ K_{ij} = \frac{1}{i!\,j!}\,\frac{\partial^{\,i+j}K(\hat\alpha,\hat\beta)}{\partial\hat\alpha^{\,i}\,\partial\hat\beta^{\,j}}\bigg|_{\alpha,\beta} \qquad \text{and} \qquad L_{ij} = \frac{1}{i!\,j!}\,\frac{\partial^{\,i+j}L(\hat\alpha,\hat\beta)}{\partial\hat\alpha^{\,i}\,\partial\hat\beta^{\,j}}\bigg|_{\alpha,\beta}. \qquad (3.100) $$

Since $K(\hat\alpha,\hat\beta) = \Psi(\hat\alpha) - \Psi(\hat\alpha+\hat\beta) - \bigl[\Psi(\alpha) - \Psi(\alpha+\beta)\bigr]$ and

$L(\hat\alpha,\hat\beta) = \Psi(\hat\beta) - \Psi(\hat\alpha+\hat\beta) - \bigl[\Psi(\beta) - \Psi(\alpha+\beta)\bigr]$,

$$ \begin{aligned} K_{10} &= \Psi'(\alpha) - \Psi'(\alpha+\beta), & L_{10} &= -\Psi'(\alpha+\beta),\\ K_{01} &= -\Psi'(\alpha+\beta), & L_{01} &= \Psi'(\beta) - \Psi'(\alpha+\beta),\\ K_{20} &= \tfrac{1}{2}\bigl[\Psi''(\alpha) - \Psi''(\alpha+\beta)\bigr], & L_{20} &= -\tfrac{1}{2}\Psi''(\alpha+\beta),\\ K_{11} &= -\Psi''(\alpha+\beta), & L_{11} &= -\Psi''(\alpha+\beta),\\ K_{02} &= -\tfrac{1}{2}\Psi''(\alpha+\beta), & L_{02} &= \tfrac{1}{2}\bigl[\Psi''(\beta) - \Psi''(\alpha+\beta)\bigr]. \end{aligned} \qquad (3.101) $$


As before, the psi primes and double primes are calculated

using Bernoulli series expansions of those functions.

The vector, m, is easily found.  Since

$$ m_{20} = E(k^2) = \mathrm{Var}\!\left[\frac{1}{n}\sum_{i=1}^{n}\ln X_i\right] = \frac{1}{n}\,\mathrm{Var}(\ln X_i) = \frac{1}{n}\Bigl\{E\bigl[(\ln X_i)^2\bigr] - \bigl[\Psi(\alpha)-\Psi(\alpha+\beta)\bigr]^2\Bigr\} \qquad (3.102) $$

and

$$ E\bigl[(\ln X_i)^2\bigr] = \int_0^1 \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,X_i^{\alpha-1}(1-X_i)^{\beta-1}(\ln X_i)^2\,dX_i = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\frac{\partial^2}{\partial\alpha^2}\,\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} = \Psi'(\alpha) - \Psi'(\alpha+\beta) + \bigl[\Psi(\alpha)-\Psi(\alpha+\beta)\bigr]^2, \qquad (3.103) $$

it follows that

$$ m_{20} = \frac{1}{n}\bigl[\Psi'(\alpha) - \Psi'(\alpha+\beta)\bigr]. \qquad (3.104) $$

Similarly,

$$ m_{11} = E(kl) = \mathrm{Cov}\!\left(\frac{1}{n}\sum_{i=1}^{n}\ln X_i,\;\frac{1}{n}\sum_{i=1}^{n}\ln(1-X_i)\right) = \frac{1}{n}\Bigl\{E\bigl[\ln X_i\,\ln(1-X_i)\bigr] - E(\ln X_i)\,E\bigl[\ln(1-X_i)\bigr]\Bigr\} = -\frac{1}{n}\,\Psi'(\alpha+\beta) \qquad (3.105) $$

and

$$ m_{02} = \frac{1}{n}\Bigl\{E\bigl[\bigl(\ln(1-X_i)\bigr)^2\bigr] - \bigl[E\bigl(\ln(1-X_i)\bigr)\bigr]^2\Bigr\} = \frac{1}{n}\bigl[\Psi'(\beta) - \Psi'(\alpha+\beta)\bigr]. \qquad (3.106) $$
n



This gives an approximation of order 1/n for $E(\hat\alpha-\alpha)$,

$E(\hat\beta-\beta)$, $E(\hat\alpha-\alpha)^2$, $E(\hat\alpha-\alpha)(\hat\beta-\beta)$, and $E(\hat\beta-\beta)^2$.  The accuracy

of this approximation depends on the magnitude of the

coefficients of the $1/n^2$ terms, the $1/n^3$ terms, and so

forth in the expansion.  An idea of how well the approxi-

mation works can be gathered by looking at Table 2.
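The small sample approximation just described reduces to assembling the K-matrix of (3.86) from (3.101) and the m-vector from (3.104)-(3.106), and then solving $m = K\mu$.  The sketch below is an illustrative version of that computation (it assumes NumPy and SciPy and is not the original program behind Table 2):

```python
import numpy as np
from scipy.special import polygamma

def mle_small_sample_approx(alpha, beta, n):
    """Order-1/n approximations to E(a-alpha), E(b-beta), E(a-alpha)^2,
    E(a-alpha)(b-beta), and E(b-beta)^2 by solving m = K mu, per (3.86)."""
    t = lambda x: polygamma(1, x)       # psi'
    tt = lambda x: polygamma(2, x)      # psi''

    K10, K01 = t(alpha) - t(alpha + beta), -t(alpha + beta)
    L10, L01 = -t(alpha + beta), t(beta) - t(alpha + beta)
    K20, K11, K02 = (tt(alpha) - tt(alpha + beta)) / 2, -tt(alpha + beta), -tt(alpha + beta) / 2
    L20, L11, L02 = -tt(alpha + beta) / 2, -tt(alpha + beta), (tt(beta) - tt(alpha + beta)) / 2

    K = np.array([
        [K10, K01, K20, K11, K02],
        [L10, L01, L20, L11, L02],
        [0, 0, K10 ** 2, 2 * K10 * K01, K01 ** 2],
        [0, 0, K10 * L10, K10 * L01 + K01 * L10, K01 * L01],
        [0, 0, L10 ** 2, 2 * L10 * L01, L01 ** 2],
    ])
    m = np.array([0.0, 0.0,
                  (t(alpha) - t(alpha + beta)) / n,     # m20, (3.104)
                  -t(alpha + beta) / n,                 # m11, (3.105)
                  (t(beta) - t(alpha + beta)) / n])     # m02, (3.106)
    mu = np.linalg.solve(K, m)
    # mu = [E(a-alpha), E(b-beta), E(a-alpha)^2, E(a-alpha)(b-beta), E(b-beta)^2]
    return mu

print(mle_small_sample_approx(0.1, 0.1, 10))
```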








Notice that the problem is symmetric in α and β, so that

it is only necessary to find the biases, variances, and

covariance for combinations of α = a and β = b.  The

results for α = b and β = a will be found from the

symmetry.  Since $\mathrm{Var}(\hat\alpha) = E(\hat\alpha-\alpha)^2 - [E(\hat\alpha-\alpha)]^2$ and $\mathrm{Var}(\hat\alpha) > 0$,

this provides a check on the accuracy of the approximation

for a given sample size n.  For example, consider the

first entry in Table 2, for α = β = .1.  Here

$E(\hat\alpha-\alpha) = .20503/n$ and $E(\hat\alpha-\alpha)^2 = .01515/n$.  Evidently, for

n = 1, the approximation is not good, since the bias

squared is then larger than the expected mean square,

$E(\hat\alpha-\alpha)^2$.  However, for n = 10, $E(\hat\alpha-\alpha)^2 = .001515$ and

$E(\hat\alpha-\alpha) = .020503$, so that the bias squared is smaller

than the expected mean square, which makes the variance

of $\hat\alpha$ positive.  Consider now the entry for β = .1, α = 10.

Here $E(\hat\alpha-\alpha) = 111.5135/n$ and $E(\hat\alpha-\alpha)^2 = 1013.568/n$.  For n = 10,

the bias squared is still larger than the expected

mean square.  A sample size of at least 13 is needed

before the bias squared becomes smaller than the expected

mean square.  For a sample of size 20 the bias squared

would be 31.088 and the expected mean square would be

50.6784, so that $\mathrm{Var}(\hat\alpha)$ would be 19.5904.  From

Table 2, it can be seen that for larger values of β a

sample size of 5 or 6 is sufficient for the bias

squared to be smaller than the expected mean square.

For example, take the case where α = 5 and β = 1.  Here

$E(\hat\alpha-\alpha) = 16.1605/n$ and $E(\hat\alpha-\alpha)^2 = 57.0243/n$.  A sample size

of 5 is large enough so that the expected mean square is

larger than the bias squared, and for a sample of size

20, $[E(\hat\alpha-\alpha)]^2 = .6529$ and $E(\hat\alpha-\alpha)^2 = 2.8512$.  Thus,

$\mathrm{Var}(\hat\alpha) = 2.1983$.  For most parameter values, then, one

needs a sample of only about 20 observations to have

a reasonable approximation to the biases, variances,

and covariance of the parameters.

To obtain approximations of order 1/n for $E\bigl(\hat\alpha/(\hat\alpha+\hat\beta)\bigr)$

and $\mathrm{Var}\bigl(\hat\alpha/(\hat\alpha+\hat\beta)\bigr)$, an expansion of the function $\hat\alpha/(\hat\alpha+\hat\beta)$

in a Taylor series is needed:

$$ \frac{\hat\alpha}{\hat\alpha+\hat\beta} = \frac{\alpha}{\alpha+\beta} + (\hat\alpha-\alpha)\,\frac{\partial\,\hat\alpha/(\hat\alpha+\hat\beta)}{\partial\hat\alpha}\bigg|_{\alpha,\beta} + (\hat\beta-\beta)\,\frac{\partial\,\hat\alpha/(\hat\alpha+\hat\beta)}{\partial\hat\beta}\bigg|_{\alpha,\beta} + \frac{(\hat\alpha-\alpha)^2}{2}\,\frac{\partial^2\,\hat\alpha/(\hat\alpha+\hat\beta)}{\partial\hat\alpha^2}\bigg|_{\alpha,\beta} + (\hat\alpha-\alpha)(\hat\beta-\beta)\,\frac{\partial^2\,\hat\alpha/(\hat\alpha+\hat\beta)}{\partial\hat\alpha\,\partial\hat\beta}\bigg|_{\alpha,\beta} + \frac{(\hat\beta-\beta)^2}{2}\,\frac{\partial^2\,\hat\alpha/(\hat\alpha+\hat\beta)}{\partial\hat\beta^2}\bigg|_{\alpha,\beta} + \cdots. \qquad (3.107) $$

Since all terms of the expansion not included in the

approximation are of higher order than 1/n, an

approximation to $E\bigl(\hat\alpha/(\hat\alpha+\hat\beta)\bigr)$ need only include these

six terms.  Thus,

$$ E\!\left(\frac{\hat\alpha}{\hat\alpha+\hat\beta}\right) \doteq \frac{\alpha}{\alpha+\beta} + E(\hat\alpha-\alpha)\,\frac{\beta}{(\alpha+\beta)^2} - E(\hat\beta-\beta)\,\frac{\alpha}{(\alpha+\beta)^2} - E(\hat\alpha-\alpha)^2\,\frac{\beta}{(\alpha+\beta)^3} + E(\hat\alpha-\alpha)(\hat\beta-\beta)\,\frac{\alpha-\beta}{(\alpha+\beta)^3} + E(\hat\beta-\beta)^2\,\frac{\alpha}{(\alpha+\beta)^3}, $$

and

$$ \mathrm{Var}\!\left(\frac{\hat\alpha}{\hat\alpha+\hat\beta}\right) \doteq \left[\frac{\partial\,\hat\alpha/(\hat\alpha+\hat\beta)}{\partial\hat\alpha}\right]^2\mathrm{Var}(\hat\alpha) + \left[\frac{\partial\,\hat\alpha/(\hat\alpha+\hat\beta)}{\partial\hat\beta}\right]^2\mathrm{Var}(\hat\beta) + 2\left[\frac{\partial\,\hat\alpha/(\hat\alpha+\hat\beta)}{\partial\hat\alpha}\right]\!\left[\frac{\partial\,\hat\alpha/(\hat\alpha+\hat\beta)}{\partial\hat\beta}\right]\mathrm{Cov}(\hat\alpha,\hat\beta) = \frac{\beta^2\,\mathrm{Var}(\hat\alpha) + \alpha^2\,\mathrm{Var}(\hat\beta) - 2\alpha\beta\,\mathrm{Cov}(\hat\alpha,\hat\beta)}{(\alpha+\beta)^4}. \qquad (3.108) $$

Again, this approximation is of order 1/n, since all terms

of the expansion which are left out of the approximation

are of higher order than 1/n when the variance of the

estimator is found.  The bias and variance of the

estimator are included in Table 2 for n = 10 and n = 100.









Geometric Mean Estimator


Another estimator for the mean of the beta distri-

bution is the geometric mean of the observations. The
estimator itself is $\prod_{i=1}^{n} X_i^{1/n}$, which estimates $\alpha/(\alpha+\beta)$.

The bias of the estimator is easily computed directly

from the distribution of the sample:

$$ E\left[\prod_{i=1}^{n} X_i^{1/n}\right] = \int_0^1\!\!\cdots\!\int_0^1 \prod_{i=1}^{n} X_i^{1/n}\left[\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\right]^{n}\prod_{i=1}^{n} X_i^{\alpha-1}(1-X_i)^{\beta-1}\,dX_1\cdots dX_n $$

$$ = \left[\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\right]^{n}\int_0^1\!\!\cdots\!\int_0^1 \prod_{i=1}^{n} X_i^{\alpha+(1/n)-1}(1-X_i)^{\beta-1}\,dX_1\cdots dX_n $$

$$ = \left[\frac{\Gamma(\alpha+\beta)\,\Gamma(\alpha+(1/n))\,\Gamma(\beta)}{\Gamma(\alpha)\Gamma(\beta)\,\Gamma(\alpha+\beta+(1/n))}\right]^{n} = \left[\frac{\Gamma(\alpha+\beta)\,\Gamma(\alpha+(1/n))}{\Gamma(\alpha)\,\Gamma(\alpha+\beta+(1/n))}\right]^{n}. \qquad (3.109) $$

Thus, the bias in the estimator is

$$ E\left[\prod_{i=1}^{n} X_i^{1/n}\right] - \frac{\alpha}{\alpha+\beta} = \left[\frac{\Gamma(\alpha+\beta)\,\Gamma(\alpha+(1/n))}{\Gamma(\alpha)\,\Gamma(\alpha+\beta+(1/n))}\right]^{n} - \frac{\alpha}{\alpha+\beta}. \qquad (3.110) $$

The variance can also be computed directly from the

distribution:

$$ \mathrm{Var}\left[\prod_{i=1}^{n} X_i^{1/n}\right] = E\left[\Bigl(\prod_{i=1}^{n} X_i^{1/n}\Bigr)^{2}\right] - \left\{E\left[\prod_{i=1}^{n} X_i^{1/n}\right]\right\}^{2} = \left[\frac{\Gamma(\alpha+\beta)\,\Gamma(\alpha+(2/n))}{\Gamma(\alpha)\,\Gamma(\alpha+\beta+(2/n))}\right]^{n} - \left[\frac{\Gamma(\alpha+\beta)\,\Gamma(\alpha+(1/n))}{\Gamma(\alpha)\,\Gamma(\alpha+\beta+(1/n))}\right]^{2n}. \qquad (3.111) $$

Notice that, similar to the arithmetic mean estimator

for the mean of the distribution, the bias and variance

of the geometric mean are exact expressions.  The bias

and variance are given for various values of α and β

in Table 3.
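Because (3.110) and (3.111) are exact, they can be evaluated directly for any α, β, and n; working on the log scale with gammaln keeps the gamma ratios raised to the n-th power from overflowing.  An illustrative sketch, assuming SciPy:

```python
import numpy as np
from scipy.special import gammaln

def geometric_mean_moments(alpha, beta, n):
    """Exact bias (3.110) and variance (3.111) of the geometric mean
    estimator of the beta mean alpha/(alpha+beta)."""
    def moment(r):
        # E[(prod X_i^{1/n})^r] = [Gamma(a+b)Gamma(a+r/n) / (Gamma(a)Gamma(a+b+r/n))]^n
        return np.exp(n * (gammaln(alpha + beta) + gammaln(alpha + r / n)
                           - gammaln(alpha) - gammaln(alpha + beta + r / n)))
    m1, m2 = moment(1), moment(2)
    bias = m1 - alpha / (alpha + beta)
    variance = m2 - m1 ** 2
    return bias, variance

print(geometric_mean_moments(2.0, 5.0, 20))
```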


Comparison of the Estimators


The final section of this chapter will be devoted to

a comparison of the three estimators of the mean and the

two sets of estimators for the parameters, which have

been considered. For this, the reader is referred to

Tables 1, 2, and 3. To determine what parameter values

to consider when calculating biases, variances, and

covariance for the estimators, the various types of curves

for different choices of parameters in the beta distri-

bution were investigated. A beta distribution with

parameters α = β < 1 has a U-shaped distribution, symmetric about the point X = 1/2. If β < α < 1, then the distribution is still U-shaped, but skewed to the right. If α < β < 1, then the distribution is again U-shaped, but skewed to the left. At α = β = 1, the distribution is identical to the uniform distribution. If α > 1 and β < 1, the distribution increases with increasing X, increasing more rapidly the larger α is in comparison to β. If α < 1 and β > 1, the distribution is decreasing in X. If α = β > 1, the distribution is bell-shaped and symmetric about 1/2. If α > β, then the distribution is skewed to the right. Finally, if α < β, then the distribution is skewed to the left. Thus, to include all types of curves represented by the beta distribution, a range of parameter values from .1 to 10 was considered. More values around 1 were included since that appears to be a critical point at which the beta distribution changes shape.
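As a quick illustration of these regimes, the density can be evaluated at a few points for representative parameter pairs. The pairs below are arbitrary examples chosen for this sketch, not values used elsewhere in this work.

from scipy.stats import beta as beta_dist

# Evaluate the beta density at a few points to show the shape regimes
# described above (U-shaped, uniform, increasing, decreasing, bell-shaped).
for a, b in [(0.5, 0.5), (1, 1), (2, 0.5), (0.5, 2), (5, 5), (5, 2)]:
    d = beta_dist(a, b)
    print(f"alpha={a}, beta={b}:",
          [round(d.pdf(x), 2) for x in (0.05, 0.25, 0.5, 0.75, 0.95)])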

Two sets of estimators for the parameters of the

beta distribution, the sample moment estimators and

the maximum likelihood estimators, were obtained.

Through a first order approximation, the sample moment

estimators were found to be unbiased to order 1/n. For

the maximum likelihood estimators, for a particular value of one parameter, say β, the bias in β̂ is fairly constant as α varies, while the bias in α̂ increases with increasing α. For example, for β = .1 and α ranging from .5 to 10, E(β̂−β) ranges from .15955/n to .16834/n. For the same parameter values, E(α̂−α) ranges from 2.4301/n to 111.5135/n. For α = β = .1, E(α̂−α) = E(β̂−β) = .20503/n. The results for other values of β are similar, and since the problem is symmetric in α and β, similar results would be obtained for particular values of α. Of course, the maximum likelihood estimators are asymptotically unbiased.

Looking now at the variances and covariance of the first order approximation for the sample moment estimators, for a particular value of β and α ranging from .5 to 10, Var(β̂) increases gradually with increasing α. For example, at α = .5 and β = .1, Var(β̂) = .0497/n, and at α = 10 and β = .1, Var(β̂) = .15777/n. For the same parameter values, Var(α̂) increases much more rapidly. At α = .5 and β = .1, Var(α̂) = 1.4159/n, and at α = 10 and β = .1, Var(α̂) = 2329.66/n. The covariance of α̂ and β̂ falls in between, ranging from .1534/n at α = .5 and β = .1 to 14.942/n at α = 10 and β = .1.
The results are similar for other parameter values.

For the maximum likelihood estimators, the expected mean squares and products, E(α̂−α)², E(β̂−β)², and E(α̂−α)(β̂−β), measure

the expected values of the squares and products of the

deviations of the estimators from the true parameter

values. For the sample moment estimators, the variances

and covariance measure these deviations since the sample

moment estimators are unbiased to first order. Thus, it









would seem reasonable to compare the variances and

covariance of the sample moment estimators with the

expected mean squares and products of the maximum

likelihood estimators. For the parameter values consid-

ered, the expected mean squares and products behave

quite similarly to the variances and covariance of the

sample moment estimators. For example, for β = .1 and α ranging from .5 to 10, E(α̂−α)² ranges from .85955/n to 1013.568/n, E(β̂−β)² ranges from .01141/n to .01094/n, and E(α̂−α)(β̂−β) ranges from .03196/n to 1.041/n. Two differences
n n
from the sample moment estimators are apparent. First,
E(β̂−β)² decreases slightly as α increases. Second, and

more important, the expected mean squares and products

are somewhat smaller than the variances and covariance

for the sample moment estimators. The proportional

differences appear to be greatest for small parameter

values. For example, at α = .5 and β = .1,


Var(α̂) = 1.4159/n          E(α̂−α)² = .85955/n

Var(β̂) = .0497/n           E(β̂−β)² = .01141/n

Cov(α̂,β̂) = .1534/n         E(α̂−α)(β̂−β) = .03196/n          (3.112)









and at α = 5 and β = 2,



Var(α̂) = 56.5833/n         E(α̂−α)² = 50.507/n

Var(β̂) = 8.2367/n          E(β̂−β)² = 6.9664/n

Cov(α̂,β̂) = 18.55/n         E(α̂−α)(β̂−β) = 15.782/n.        (3.113)



Thus, the maximum likelihood estimators are somewhat more

exact than the sample moment estimators.

Asymptotically, Var(α̂), Var(β̂), and Cov(α̂,β̂) are identical to the first order approximations of E(α̂−α)², E(β̂−β)², and E(α̂−α)(β̂−β). Hence, the maximum likelihood

estimators are also more exact estimators asymptotically

than the sample moment estimators.

For the mean of the beta distribution, the sample

moment estimator is unbiased. If α = β, the first order approximation of the maximum likelihood estimator is also unbiased. For α > β and a particular value of β, the bias is sometimes a decreasing function of α and sometimes an increasing function of α. For example, for β = .1, the bias E[α̂/(α̂+β̂)] − α/(α+β) ranges from .2003/n to .01455/n as α increases from .5 to 10. For β = .5, the bias increases from .1107/n to .1418/n and then decreases to .0574/n as α ranges from 1 to 10. For α < β, the bias is simply the negative of the bias for the same combination with α > β. For example, for α = .5 and β = .1 the bias is .2003/n, and for α = .1 and β = .5 it is −.2003/n.

For estimating the mean, a third estimator, the geometric

mean, was considered. Since the expected value and

variance of the geometric mean are complicated functions

of n, they were evaluated for various values of n. The

biases are consistently negative and larger in absolute

value than the biases for the first order approximation

to the maximum likelihood estimators. For example, for

α = 5 and β = 2 and n = 10, the bias is −.01892 for the

geometric mean and .01424 for the first order approximation

to the maximum likelihood estimator.

The variance of the sample moment estimator of the

mean decreases as α increases for a particular value of β. For example, for β = .1 and α ranging from .5 to 10, the variance of the sample moment estimator of the mean ranges from .08681/n to .00088/n. A similar relation holds for the variance of the first order approximation of the maximum likelihood estimator. For β = .1 and α ranging from .5 to 10, Var(α̂/(α̂+β̂)) ranges

from .04312/n to .00079/n. As one can see, the maximum

likelihood estimator has smaller variance than the sample

moment estimator. These results are typical of other

parameter combinations. For almost all parameter values,

the geometric mean estimator has a larger variance than

either the maximum likelihood estimator or the sample

moment estimator.








As a check on the comparison between the sample moment

estimator of the mean and the maximum likelihood estimator

of the mean, some simulation work was done. From a beta

distribution with parameters α = 3 and β = 5, 100 samples

of size 5 and 100 samples of size 20 were generated

on a computer. The two estimators of the mean were then

calculated for each sample. Finally, to compare the

estimators, the mean square error for each estimator

for the samples of size 5 and the samples of size 20

was calculated. Let M1 be the sample moment estimator

of the mean. The mean square error for the sample

moment estimator is then


$$
\sum_{i=1}^{100}\left(M_{1i} - \frac{\alpha}{\alpha+\beta}\right)^2 \bigg/\, 100, \tag{3.114}
$$

where M_{1i} denotes the value of M₁ for the i-th sample.


If α̂/(α̂+β̂) is the maximum likelihood estimator of the mean, then its mean square error is

$$
\sum_{i=1}^{100}\left(\frac{\hat\alpha_i}{\hat\alpha_i+\hat\beta_i} - \frac{\alpha}{\alpha+\beta}\right)^2 \bigg/\, 100, \tag{3.115}
$$

where α̂ᵢ and β̂ᵢ are the maximum likelihood estimates from the i-th sample.


For samples of size 5 from a beta distribution with

parameters α = 3 and β = 5, the mean square error for the

sample moment estimator was .0070148, and the mean square

error for the maximum likelihood estimator was .0056575.

For samples of size 20, the mean square error for the moment

estimator was .0010910, and the mean square error for the











maximum likelihood estimator was .0010896. Thus, in

both cases the maximum likelihood estimator more closely

estimated the mean of the distribution, although the

difference was very small for samples of size 20.
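A simulation of this kind is easy to reproduce. The sketch below is not the original program: the random seed, the numerical solver for the likelihood equations, and the moment-based starting values are all assumptions made for illustration.

import numpy as np
from scipy.special import digamma
from scipy.optimize import fsolve

# Sketch of the simulation check described above: repeated samples from a
# Beta(3, 5) distribution, comparing the mean square errors (3.114) and
# (3.115) of the sample mean and of the maximum likelihood estimator
# of the mean.
rng = np.random.default_rng(1)
alpha0, beta0, true_mean = 3.0, 5.0, 3.0 / 8.0

def mle_mean(x):
    # Solve the beta likelihood equations numerically, starting from the
    # sample moment estimates of (alpha, beta).
    m, v = x.mean(), x.var()
    s = m * (1 - m) / v - 1
    def eqs(p):
        a, b = p
        return [digamma(a) - digamma(a + b) - np.log(x).mean(),
                digamma(b) - digamma(a + b) - np.log(1 - x).mean()]
    a_hat, b_hat = fsolve(eqs, [m * s, (1 - m) * s])
    return a_hat / (a_hat + b_hat)

for n in (5, 20):
    samples = rng.beta(alpha0, beta0, size=(100, n))
    mse_moment = np.mean([(s.mean() - true_mean) ** 2 for s in samples])
    mse_mle = np.mean([(mle_mean(s) - true_mean) ** 2 for s in samples])
    print(n, round(mse_moment, 6), round(mse_mle, 6))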

Thus, the choice of an estimator for either the

parameters or the mean of the beta distribution appears

to depend on the characteristics one desires in the

estimator. If one wishes to have unbiased estimators,

at least to order 1/n, then one should use the sample

moment estimators. If one desires smaller variance

or expected mean square in the estimators and can

tolerate some bias, which decreases with n, then one

should use the maximum likelihood estimators. However,

the savings in expected mean squares may not be enough

to offset the difficulty in obtaining the maximum

likelihood estimators. If that is the case, then the

sample moment estimators provide easy to calculate,

unbiased estimators for the parameters and the mean

of the beta distribution.





























[Tables of biases, variances, covariances, and expected mean squares for the estimators appear here in the original typescript; the tabulated values are not legible in this copy.]

CHAPTER 4

HYPOTHESIS TESTING FOR THE BETA DISTRIBUTION


Introduction


As in the estimation problem, since the Dirichlet

or generalized Dirichlet distribution is so closely

related to the beta distribution, the problem of developing

a test of hypothesis was considered for the beta distribution.

In this chapter, tests will be developed for two cases.

First, a test about one parameter assuming the other

is known, and second, a test about one parameter leaving

the other unspecified will be given. Since the beta

distribution is symmetric in its parameters, a test about one parameter, say α, could easily be adapted to test the other parameter, β. Thus, all tests will be constructed for the parameter α. Finally, it will be shown that the test for α leaving β arbitrary can be adapted to test a hypothesis about the mean of the beta distribution, α/(α+β).

The types of hypotheses to be considered are generally

known as one-sided and two-sided hypotheses. It is desired

to test either that α is less than some constant, α is greater than some constant, or α is equal to some constant, both fixing and not fixing β. It is further desired to find the best possible test for each of these








hypotheses. Fortunately, because the beta distribution

is a member of the exponential class of distributions,

such a best test can be found.


One-sided Tests for α When β Is Known


In this section, one-sided tests of hypotheses about α when β is specified will be developed. From Ferguson (1967), consider the following definitions.

Definition 4.1: A test φ of H₀: θ ∈ Θ₀ against H₁: θ ∈ Θ₁ is said to have size α if sup_{θ∈Θ₀} E_θ φ(X) = α.

Definition 4.2: A test φ₀ is said to be uniformly most powerful (UMP) of size α for testing H₀: θ ∈ Θ₀ against H₁: θ ∈ Θ₁ if φ₀ is of size α and if, for any other test φ of size at most α, E_θ φ₀(X) ≥ E_θ φ(X) for each θ ∈ Θ₁.

For the beta distribution, a typical hypothesis to be considered is H₀: α ≥ α₀ against H₁: α < α₀. To find a best test of this hypothesis, consider the following theorem from Ferguson.

Theorem 4.1: If the distribution of X has monotone likelihood ratio, any test of the form

$$
\phi(X) = \begin{cases} 1 & \text{if } X > x_0 \\ \gamma & \text{if } X = x_0 \\ 0 & \text{if } X < x_0 \end{cases} \tag{5.24}
$$

has nondecreasing power function. Any test of the form (5.24) is UMP of its size for testing H₀: θ ≤ θ₀ against H₁: θ > θ₀ for any θ₀ ∈ Θ, provided its size is not zero. For every α, 0 < α ≤ 1, and every θ₀ ∈ Θ, there exist numbers −∞ ≤ x₀ ≤ ∞ and 0 ≤ γ ≤ 1 such that the test (5.24) is UMP of size α for testing H₀: θ ≤ θ₀ against H₁: θ > θ₀.

A similar statement would hold for testing the hypothesis

H₀: θ ≥ θ₀ against H₁: θ < θ₀. For the proof of this theorem, the reader is referred to Ferguson (1967). Since the beta distribution is a continuous distribution, γ may be taken as 0. In the notation of Ferguson, the test is given as a function φ(X) which takes the values 0 and 1. If φ(X) is 1, then the null hypothesis is rejected, while if φ(X) is 0, the null hypothesis is not rejected. It

should be noted here that the X Ferguson refers to is a

sufficient statistic for the parameter in question.

The problem is always reduced from a sample of observations

to a sufficient statistic or statistics.

Consider now the following hypothesis, H₀: α ≥ α₀ against H₁: α < α₀. By Theorem 4.1, a UMP test for this hypothesis exists and is of the form

$$
\phi(t) = \begin{cases} 1 & \text{if } t < c \\ 0 & \text{if } t > c \end{cases} \tag{4.1}
$$


where t is the sufficient statistic for α. Note that for this hypothesis, since β is specified, the distribution contains only one parameter, α. Thus, there is one sufficient statistic for α. To determine the test completely, a sufficient statistic, t, must be found and the constant, c, determined so that the test is of a given size, say .05. Since the beta distribution with specified β parameter is a member of the exponential class of distributions, the sufficient statistic for α is easily found.


$$
f(X;\alpha) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, X^{\alpha-1}(1-X)^{\beta-1} \tag{4.2}
$$

and

$$
L(X_1,\ldots,X_n;\alpha)
= \left[\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\right]^n \prod_{i=1}^{n} X_i^{\alpha-1}(1-X_i)^{\beta-1}
= \left[\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\right]^n
\exp\!\left\{\alpha \ln\prod_{i=1}^{n} X_i\right\}
\exp\!\left\{-\ln\prod_{i=1}^{n} X_i + (\beta-1)\ln\prod_{i=1}^{n}(1-X_i)\right\}. \tag{4.3}
$$

Thus, ln ∏ᵢ₌₁ⁿ Xᵢ, or equivalently ∏ᵢ₌₁ⁿ Xᵢ, is a sufficient statistic for α, and the test is of the form

$$
\phi(t) = \begin{cases} 1 & \text{if } \prod_{i=1}^{n} X_i < c \\[4pt] 0 & \text{if } \prod_{i=1}^{n} X_i > c \end{cases} \tag{4.4}
$$
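As a sketch of how the test would be applied in practice, the sufficient statistic can be computed on the log scale and compared with ln c; the function name and interface below are illustrative, and c itself is obtained from the approximations developed next.

import numpy as np

# Sketch of the one-sided test (4.4): reject H0: alpha >= alpha0 when the
# sufficient statistic (equivalently, the sum of the ln X_i) is small.
# log_c is ln c, a placeholder determined by the normal or beta approximations.
def ump_test(x, log_c):
    t = np.sum(np.log(x))            # ln of prod X_i
    return 1 if t < log_c else 0     # 1 = reject H0, 0 = do not reject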


The problem remains to find the constant, c, so that the








test is of a given size. Assume a test of size .05 is desired. Then c can be determined from the probability statement Pr[∏ᵢ₌₁ⁿ Xᵢ < c] = .05. To evaluate the constant, some knowledge of the distribution of ∏ᵢ₌₁ⁿ Xᵢ, or an approximation to the distribution of ∏ᵢ₌₁ⁿ Xᵢ or of a function of ∏ᵢ₌₁ⁿ Xᵢ, is necessary. Since an exact distribution for ∏ᵢ₌₁ⁿ Xᵢ was not obtainable, several approximations were considered.


Normal Approximation

The first approximation considered was a normal approximation. The probability statement Pr[∏ᵢ₌₁ⁿ Xᵢ < c] = .05 is equivalent to Pr[(Σᵢ₌₁ⁿ ln Xᵢ)/n < (ln c)/n] = .05. Since (Σᵢ₌₁ⁿ ln Xᵢ)/n is the mean of the ln Xᵢ's, asymptotically the distribution of (Σᵢ₌₁ⁿ ln Xᵢ)/n will be normal with some mean, μ, and some variance, σ²/n. Thus,

$$
\Pr\!\left[\frac{1}{n}\sum_{i=1}^{n} \ln X_i < \frac{\ln c}{n}\right] = .05 \tag{4.5}
$$



is equivalent to

$$
\Pr\!\left[Z < \frac{(\ln c)/n - \mu}{\sigma/\sqrt{n}}\right] = .05, \tag{4.6}
$$

where Z is a standard normal random variable. The probability can then be evaluated from standard normal tables and the constant, c, can be found. From standard








normal tables, the above probability statement implies

$$
\frac{(\ln c)/n - \mu}{\sigma/\sqrt{n}} = -1.645 \tag{4.7}
$$

or

$$
\ln c = \left[-1.645\,\frac{\sigma}{\sqrt{n}} + \mu\right] n. \tag{4.8}
$$

Thus,

$$
c = \exp\!\left\{\left[-1.645\,\frac{\sigma}{\sqrt{n}} + \mu\right] n\right\}
= \exp\!\left[-1.645\,\sigma\sqrt{n} + n\mu\right]. \tag{4.9}
$$


All that is needed now is to evaluate μ and σ²:

$$
\mu = E\!\left[\frac{1}{n}\sum_{i=1}^{n} \ln X_i\right] = E(\ln X_i) = \psi(\alpha) - \psi(\alpha+\beta)
$$

and

$$
\sigma^2 = \mathrm{Var}(\ln X_i) = \psi'(\alpha) - \psi'(\alpha+\beta) \tag{4.10}
$$

from results of Chapter 3. Thus, (Σᵢ₌₁ⁿ ln Xᵢ)/n is asymptotically distributed as N{ψ(α) − ψ(α+β), [ψ'(α) − ψ'(α+β)]/n}.

The constant, c, is given for various combinations of α, β, and n in Table 4. Where c was too small to be evaluated, ln c is given. For those values, the test could be written as

$$
\phi(t) = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} \ln X_i < \ln c \\[4pt] 0 & \text{if } \sum_{i=1}^{n} \ln X_i > \ln c \end{cases} \tag{4.11}
$$


For the hypothesis H₀: α ≤ α₀ against H₁: α > α₀, the UMP test would be of the form

$$
\phi(t) = \begin{cases} 1 & \text{if } \prod_{i=1}^{n} X_i > c \\[4pt] 0 & \text{if } \prod_{i=1}^{n} X_i < c \end{cases} \tag{4.12}
$$

The probability statement Pr[∏ᵢ₌₁ⁿ Xᵢ > c] = .05 could again be evaluated by the normal approximation to find an approximation to c.
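The computation of c from (4.9) and (4.10) is straightforward; the following sketch is illustrative (the function name is mine, and −1.645 corresponds to the .05 size used above).

import numpy as np
from scipy.special import digamma, polygamma

# Sketch of the critical value from the normal approximation, eq. (4.9),
# for a size-.05 test of H0: alpha >= alpha0 when beta is known.
def normal_approx_critical_value(alpha0, beta, n, z=-1.645):
    mu = digamma(alpha0) - digamma(alpha0 + beta)                 # E(ln X_i)
    sigma2 = polygamma(1, alpha0) - polygamma(1, alpha0 + beta)   # Var(ln X_i)
    log_c = n * mu + z * np.sqrt(n * sigma2)
    return log_c, np.exp(log_c)     # reject H0 when sum of ln X_i < log_c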


Beta Approximation

To obtain an approximation to the distribution of ∏ᵢ₌₁ⁿ Xᵢ directly, three methods were attempted. First, the first two moments of ∏ᵢ₌₁ⁿ Xᵢ were equated to the first two moments of a beta random variable. This is intuitively appealing since the beta distribution takes on a wide variety of forms for various choices of its parameters. Since ∏ᵢ₌₁ⁿ Xᵢ lies on the interval (0,1), it will most probably be adequately approximated by some beta distribution.
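A sketch of this first, moment-matching approach is given below. The routine equates the first two moments of ∏ᵢ₌₁ⁿ Xᵢ to those of a beta variable and reads the .05 critical value from that beta; the function name and the use of scipy for the beta quantile are assumptions made for illustration.

import numpy as np
from scipy.special import gammaln
from scipy.stats import beta as beta_dist

# Sketch of the moment-matching beta approximation to the distribution of
# prod X_i: equate its first two moments to those of a Beta(p, q) variable
# and take the .05 quantile of that beta as the approximate critical value c.
def beta_approx_critical_value(alpha, beta, n, size=0.05):
    def moment(k):  # E[(prod X_i)^k] for a Beta(alpha, beta) sample of size n
        log_ratio = (gammaln(alpha + beta) + gammaln(alpha + k)
                     - gammaln(alpha) - gammaln(alpha + beta + k))
        return np.exp(n * log_ratio)
    m1, m2 = moment(1), moment(2)
    s = m1 * (1 - m1) / (m2 - m1**2) - 1     # p + q from the matched moments
    p, q = m1 * s, (1 - m1) * s
    return beta_dist(p, q).ppf(size)          # approximate c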