ANALYSIS OF CONTINUOUS PROPORTIONS

By

DAVID WALTER JOHNSON

A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
1974

TO CAROLYN

ACKNOWLEDGMENTS

I would like to express my appreciation to the chairman of my committee, Dr. Ramon C. Littell, for his guidance in the preparation of this dissertation. A word of thanks also goes to the other members of my committee, Dr. Frank G. Martin, Dr. Zoran Pop-Stojanovic, Dr. P. V. Rao, and Dr. John G. Saw, for their assistance, and to Dr. William Mendenhall for his encouragement during my graduate career at the University of Florida. Special thanks go to my parents for their assistance, both financial and emotional, and to my wife, Carolyn, without whose love and understanding this dissertation would not have been possible.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
ABSTRACT

CHAPTER
1  INTRODUCTION
2  REVIEW OF THE LITERATURE
3  ESTIMATION FOR THE BETA DISTRIBUTION
   Introduction
   Sample Moment Estimators
   Maximum Likelihood Estimators
      Derivation of the Estimators
      Asymptotic Properties of the Estimators
      Small Sample Properties of the Estimators
   Geometric Mean Estimator
   Comparison of the Estimators
4  HYPOTHESIS TESTING FOR THE BETA DISTRIBUTION
   Introduction
   One-sided Tests for α When β Is Known
      Normal Approximation
      Beta Approximation
   Two-sided Tests for α When β Is Known
      Normal Approximation
      Beta Approximation
   One-sided Tests for α When β Is Unknown
   Two-sided Tests for α When β Is Unknown
   A Test for the Mean of the Beta Distribution
5  ESTIMATORS FOR THE DIRICHLET DISTRIBUTION
BIBLIOGRAPHY
BIOGRAPHICAL SKETCH
LIST OF TABLES

Table
1  Variances and Covariances of the Sample Moment Estimators
2  Biases and Expected Mean Squares and Products of the Maximum Likelihood Estimators
3  Variance, Bias, and Bias Squared for the Geometric Mean Estimator
4  Values of c for the Normal Approximation to the Distribution of $\prod_{i=1}^{n} X_i$ for a One-sided Test for α When β Is Known
5  Values of the Parameters and c for the Beta Approximation to the Distribution of $\prod_{i=1}^{n} X_i$ for a One-sided Test for α When β Is Known

Abstract of Dissertation Presented to the Graduate Council of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

ANALYSIS OF CONTINUOUS PROPORTIONS

By

David Walter Johnson

December, 1974

Chairman: Dr. Ramon C. Littell
Major Department: Statistics

Many experiments are designed to take measurements on the decomposition of one particular entity into several parts. The effects of other variables in the experiment on this decomposition are then noted and analyzed. This dissertation is a first step toward the analysis of such a multivariate response of continuous proportions. Two possible distributions, the Dirichlet distribution and a generalization of the Dirichlet distribution, are proposed as models for this response vector. It is then desired to investigate the general inference problems of estimation and hypothesis testing for these distributions. However, before attempting to solve the multivariate problem, it was thought best to investigate the univariate case. A univariate response of this type would consist of a single continuous proportion, and it is assumed that such a response will follow a beta distribution. The problem then is to estimate and to conduct tests of hypothesis about the parameters and mean of such a beta distribution. Chapter 3 deals with methods of estimation for the beta distribution.
Two sets of estimators, the moment estimators and the maximum likelihood estimators, are given for the parameters of the beta distribution. For the mean of the beta distribution, a geometric mean estimator is given in addition to the moment and maximum likelihood estimators. Comparisons among the estimators are made in terms of their biases, expected mean squares, and variances. In Chapter 4 various tests of hypothesis are constructed for the parameters and mean of the beta distribution. Both one-sided and two-sided hypotheses are considered, and uniformly most powerful or uniformly most powerful unbiased tests are given. To obtain the critical values for such tests, two methods of approximation, a normal approximation and a beta approximation, are given. Where it is possible, comparisons are made between the two methods. Finally, Chapter 5 deals with estimation for the Dirichlet distribution. Methods for obtaining both the moment and the maximum likelihood estimators for the parameters of the Dirichlet distribution are given.

As was noted earlier, this is just a beginning on the original problem. It remains to be determined if the properties of the estimators for the parameters and mean of the beta distribution also hold for the parameters and mean of the Dirichlet distribution. Also, the problem of developing tests of hypothesis for the Dirichlet distribution is still to be investigated. Finally, the question of the usefulness of the generalization of the Dirichlet distribution needs to be answered.

CHAPTER 1
INTRODUCTION

In many experiments in various disciplines, an analysis is done on the constituent parts of one measurement. Generally, these constituents are expressed in terms of the percentage or the proportion which each one makes of the entire measurement.
For example, after some treatments have been applied to a set of experimental units, a chemical analysis might be performed to determine if the treatments have had an effect on the composition of the material. From such an experiment one might measure the total protein content, and then divide that total protein into three or more types of protein. Specifically, consider an experiment in which a measurement is made of the chemical composition of shrimp. Measurements are made of the variables % solid, % water, total extractable nitrogen, total extractable protein, and many others. The % solid is further divided into % fat, % protein, % ash, and % carbohydrates. The treatments to be applied to the shrimp consist of two sets. Some of the shrimp will be stored on ice for periods of zero, seven, and fourteen days. The remaining shrimp will be divided into five batches, and each batch will be cooked by a different cooking method. The experimenter wishes to know if the two sets of treatments, the storing times and the cooking methods, have any effects on the composition of the shrimp. At present, the most frequently used method of analyzing such data is to consider each component of the chemical analysis separately. Techniques, such as analysis of variance, are applied to determine if differences exist among the treatments. In some experiments, this component-by-component analysis may be what the experimenter wants. However, it may also be that the experimenter would like an overall measure of differences between the treatments, rather than the analysis for each component separately. For example, considering the shrimp data, the experimenter may wish to know if the breakdown of % solid into its four constituents is the same for all the treatments. If the separate analyses result in the conclusion that the treatments are different for some constituents but not for others, it may not be clear whether the treatments are different overall.
One possible solution to this problem would be to create a vector of the constituents, assume the vector has a multivariate normal distribution, and use well-known results of multivariate analysis to analyze the data. However, the assumption of multivariate normality may not be appropriate. In the shrimp data, the percentages of the four constituents, which make up the total percentage of solids, must themselves add to the total percentage of solids. If one forms a vector of the constituents, using either the percentages themselves or the proportion that each percentage takes of the total percentage of solids, then the elements of that vector are subject to a constraint. Thus, a better approach may be to try to determine the distribution of such a vector, and through that distribution, make inferences about the population from which that vector has come. The problem, then, is one of inference in general. Although the ultimate goal is to make inferences about a vector of continuous proportions using the Dirichlet or generalized Dirichlet distribution, this dissertation deals for the most part with inferences about a single continuous proportion using the beta distribution. Some results dealing with the Dirichlet distribution are given in Chapter 5.

CHAPTER 2
REVIEW OF THE LITERATURE

A distribution which naturally presents itself to this problem is the Dirichlet distribution. The derivation of the Dirichlet distribution is straightforward and is discussed in Hogg and Craig (1965). Let $X_1,\ldots,X_{k+1}$ be mutually stochastically independent random variables, each having a gamma distribution with parameters $\alpha_i$ and $\beta = 1$. The joint distribution of $X_1,\ldots,X_{k+1}$ is then

$$f(x_1,\ldots,x_{k+1}) = \prod_{i=1}^{k+1} \frac{x_i^{\alpha_i-1} e^{-x_i}}{\Gamma(\alpha_i)}, \qquad 0 < x_i < \infty.$$

Let

$$Y_i = \frac{X_i}{X_1+\cdots+X_{k+1}}, \quad i = 1,2,\ldots,k, \qquad Y_{k+1} = X_1+\cdots+X_{k+1}.$$

Then

$$X_1 = Y_1 Y_{k+1}, \;\ldots,\; X_k = Y_k Y_{k+1}, \qquad X_{k+1} = Y_{k+1}(1-Y_1-\cdots-Y_k). \tag{2.3}$$
To obtain the density of $Y_1,\ldots,Y_{k+1}$, the Jacobian of the transformation from the $X_i$'s into the $Y_i$'s is

$$J = \begin{vmatrix} y_{k+1} & 0 & \cdots & 0 & y_1 \\ 0 & y_{k+1} & \cdots & 0 & y_2 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & y_{k+1} & y_k \\ -y_{k+1} & -y_{k+1} & \cdots & -y_{k+1} & 1-y_1-\cdots-y_k \end{vmatrix}.$$

Expanding the determinant about the last column, and then expanding the determinant in the first resulting term about its first column, the determinant in the second term about its second column, and so forth, gives

$$J = y_{k+1}^{k} \tag{2.4}$$

for $k$ even. Similarly, if $k$ is odd, $J = -y_{k+1}^{k}$, so that in either case $|J| = y_{k+1}^{k}$. Then

$$f(y_1,\ldots,y_{k+1}) = \frac{(y_1 y_{k+1})^{\alpha_1-1}\cdots(y_k y_{k+1})^{\alpha_k-1}\,\bigl[y_{k+1}(1-y_1-\cdots-y_k)\bigr]^{\alpha_{k+1}-1} e^{-y_{k+1}}\, y_{k+1}^{k}}{\prod_{i=1}^{k+1}\Gamma(\alpha_i)} = \frac{\prod_{i=1}^{k} y_i^{\alpha_i-1}\,\bigl(1-y_1-\cdots-y_k\bigr)^{\alpha_{k+1}-1}\, y_{k+1}^{\alpha_1+\cdots+\alpha_{k+1}-1}\, e^{-y_{k+1}}}{\prod_{i=1}^{k+1}\Gamma(\alpha_i)}. \tag{2.5}$$

Thus, integrating out $y_{k+1}$,

$$f(y_1,\ldots,y_k) = \frac{\prod_{i=1}^{k} y_i^{\alpha_i-1}\bigl(1-\sum_{i=1}^{k} y_i\bigr)^{\alpha_{k+1}-1}}{\prod_{i=1}^{k+1}\Gamma(\alpha_i)} \int_0^{\infty} y_{k+1}^{\sum_{i=1}^{k+1}\alpha_i - 1} e^{-y_{k+1}}\, dy_{k+1} = \frac{\Gamma\bigl(\sum_{i=1}^{k+1}\alpha_i\bigr)}{\prod_{i=1}^{k+1}\Gamma(\alpha_i)} \prod_{i=1}^{k} y_i^{\alpha_i-1}\Bigl(1-\sum_{i=1}^{k} y_i\Bigr)^{\alpha_{k+1}-1}. \tag{2.6}$$

From an inspection of the Dirichlet distribution, it is easily seen that this distribution has some of the properties desired in a distribution for a vector of continuous proportions. First, it is a distribution on $k$ continuous random variables. Second, each of these random variables is defined on the interval $(0,1)$, so that the $Y_i$'s are in the form of proportions. Third, if the Dirichlet density is written in a slightly different form, namely, letting $P_i = Y_i$ for $i=1,\ldots,k$ and $P_{k+1} = 1-Y_1-\cdots-Y_k$, the density is

$$f(p_1,p_2,\ldots,p_{k+1}) = \frac{\Gamma\bigl(\sum_{i=1}^{k+1}\alpha_i\bigr)}{\prod_{i=1}^{k+1}\Gamma(\alpha_i)} \prod_{i=1}^{k+1} p_i^{\alpha_i-1}. \tag{2.7}$$

Now, the $P_i$'s have the property that they sum to one. Thus, the Dirichlet distribution seems to be a good candidate for a distribution for a vector of continuous proportions.
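The gamma-to-Dirichlet transformation above also gives a direct way to simulate a Dirichlet vector: normalize independent gamma variates. The following sketch is our own illustration (the function name is ours, not part of the dissertation) using only the Python standard library:

```python
import random

def dirichlet_sample(alphas, rng=random):
    """Draw one Dirichlet(alpha_1, ..., alpha_{k+1}) vector by normalizing
    independent Gamma(alpha_i, 1) variates, as in the derivation above."""
    xs = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(xs)
    return [x / total for x in xs]

p = dirichlet_sample([2.0, 3.0, 5.0])
```

By construction the components of `p` are proportions in $(0,1)$ that sum to one, which is exactly the property the text singles out.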
A second distribution for a vector of continuous proportions is a generalization of the Dirichlet distribution that was proposed by Connor and Mosimann (1969). This generalization is based on a "neutrality" concept defined by Connor and Mosimann. Consider a set $(P_1,\ldots,P_k)$ of nonnegative continuous random variables satisfying the constraint that the $P_i$'s sum to one. Let

$$S_j = \sum_{i=1}^{j} P_i, \quad j=1,\ldots,k, \text{ with } S_0 = 0, \tag{2.8}$$

$$Z_i = \frac{P_i}{1-S_{i-1}}, \quad i=2,\ldots,k-1, \text{ with } Z_1 = P_1 \text{ and } Z_k = 1, \tag{2.9}$$

$$P = (P_1,\ldots,P_k) \text{ with } P_{j1} = (P_1,\ldots,P_j) \text{ and } P_{j2} = (P_{j+1},\ldots,P_k), \tag{2.10}$$

and

$$W_j = \frac{1}{1-S_j}\, P_{j2}. \tag{2.11}$$

The concept of a neutral proportion is then defined as follows:

Definition 2.1: Given a random vector of proportions, $(P_1,\ldots,P_k)$, the proportion $P_1$ is said to be neutral if $P_1$ is independent of the vector $\bigl(P_2/(1-P_1),\, P_3/(1-P_1),\ldots,\, P_k/(1-P_1)\bigr)$.

Intuitively, this definition states that "$P_1$ does not influence (i.e. is neutral to) the manner in which the remaining proportions $P_2,\ldots,P_k$ proportionally divide the remainder of the unit interval; namely the interval $(P_1,1)$." The concept of neutrality is also defined for vectors.

Definition 2.2: Given $P$ divided so that $P = (P_{j1},P_{j2})$, $P_{j1}$ is a neutral vector if it is independent of $W_j$. If $P_{j1}$ is neutral for all $j$, then $P$ is said to be completely neutral.

An important point to note here is that the order of the $P_i$'s in the vector $P$ is critical. While $P_1$ may be neutral in the vector $(P_1,P_2,\ldots,P_k)$, $P_2$ need not be neutral in $(P_2,P_1,P_3,\ldots,P_k)$. A similar relationship holds for neutral vectors. Connor and Mosimann next state and outline the proofs of three theorems relating these concepts of neutrality to the random variables $Z_i$, $i=1,\ldots,k$, defined previously.

Theorem 2.1: If $P_{j1}$ is neutral for $j=1,2,\ldots,r$, then the random variables $Z_1,Z_2,\ldots,Z_r$ are mutually independent.

For the natural extension of this theorem to a completely neutral vector a stronger result may be obtained.

Theorem 2.2: $P$ is completely neutral if and only if $Z_1,Z_2,\ldots,Z_k$ are mutually independent.
Finally, from this theorem a third result is obtained.

Theorem 2.3: $P$ is completely neutral if and only if $P_{j1}$ is independent of $Z_{j+1}$ for all $j$.

With the concept of neutrality thus defined and related to the random variables $Z_i$, it is possible to derive a generalization of the Dirichlet distribution. Assume that $P$ is completely neutral. This implies that the $Z_i$'s are mutually independent. Assume the density of $Z_i$ is the univariate beta density,

$$f(z_i) = [B(a_i,b_i)]^{-1} z_i^{a_i-1}(1-z_i)^{b_i-1}, \tag{2.12}$$

where $a_i>0$, $b_i>0$, and $B(a_i,b_i) = \Gamma(a_i)\Gamma(b_i)/\Gamma(a_i+b_i)$ is the beta function. Then

$$f(z_1,\ldots,z_{k-1}) = \prod_{i=1}^{k-1} [B(a_i,b_i)]^{-1} z_i^{a_i-1}(1-z_i)^{b_i-1}. \tag{2.13}$$

Now, the $Z_i$'s can be transformed to the $P_i$'s, since $Z_i = P_i/(1-P_1-\cdots-P_{i-1}) = P_i/(1-S_{i-1})$. The Jacobian of the transformation is lower triangular, with diagonal elements $1,\, 1/(1-S_1),\, 1/(1-S_2),\ldots,\, 1/(1-S_{k-2})$, so that

$$J = \prod_{m=1}^{k-2} \frac{1}{1-S_m}. \tag{2.14}$$

Thus, substituting $z_i = p_i/(1-S_{i-1})$ into (2.13), multiplying by the Jacobian, and collecting the powers of the factors $1-S_i$,

$$f(p_1,\ldots,p_{k-1}) = \Biggl[\prod_{i=1}^{k-1} [B(a_i,b_i)]^{-1}\, p_i^{a_i-1} \Bigl(\sum_{j=i}^{k} p_j\Bigr)^{b_{i-1}-(a_i+b_i)}\Biggr] p_k^{\,b_{k-1}-1}, \tag{2.15}$$

where $\sum_{i=1}^{k} p_i = 1$, $P_k = 1-\sum_{i=1}^{k-1} P_i$, and $b_0$ is arbitrary (its factor is $1^{b_0-(a_1+b_1)} = 1$). In the special case when $b_{i-1} = a_i + b_i$, $i=2,\ldots,k-1$, this generalization of the Dirichlet distribution reduces to the Dirichlet distribution itself. Thus, it is seen that the distribution just derived is indeed a generalization of the Dirichlet distribution. Some interesting properties of this distribution in relation to the Dirichlet distribution are:

1. Since the Dirichlet distribution is a special case, a vector of proportions following a Dirichlet distribution is completely neutral.

2.
From the symmetry of the Dirichlet distribution, if $P$ has a Dirichlet distribution then any permutation of $P$ is necessarily completely neutral. But, for $P$ to follow the generalized Dirichlet distribution, only one permutation of $P$ need be completely neutral.

From these results, it is quite evident why the generalized Dirichlet distribution may more easily fit a vector of continuous proportions than the Dirichlet distribution. If there exists one completely neutral vector among the permutations of $P$, and if it may be assumed that the $Z_i$'s have univariate beta distributions, then $P$ has a generalized Dirichlet distribution. Furthermore, to rule out the Dirichlet distribution, it is necessary to find only one permutation of $P$ which is not completely neutral. The final section of Connor and Mosimann's paper deals with two examples, one of which will be considered briefly. The data for this example come from a horny covering, called a scute, on the underside of a particular variety of Mexican turtle. The undershell is divided into two sections, an anterior and a posterior section. The anterior section is covered by five scutes, one gular scute and a pair each of humeral and pectoral scutes. Measurements are taken along the midline of the length of the gular, humeral, and pectoral scutes. If these lengths are denoted by $Y_1$, $Y_2$, and $Y_3$ respectively, while the total length is $L$, then the proportion of the total length taken by each scute may be formed by $P_i = Y_i/L$ for $i=1,2,3$. These types of data are used by taxonomists to distinguish between different populations of turtles. Through a study of the correlations of $P_1$ with $P_2/(1-P_1)$, $P_2$ with $P_1/(1-P_2)$, and $P_3$ with $P_1/(1-P_3)$, it was found that the vector $P = (P_1,P_2,P_3)$ is completely neutral. For other orderings of $P$, it was found that the vector is not neutral, a larger pectoral or humeral scute favoring the gular scute to occupy proportionately more of the remaining interval.
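This kind of correlation check can be probed by simulation: build a completely neutral vector from independent beta $Z_i$'s and examine the correlation between $P_1$ and $P_2/(1-P_1)$. The sketch below is ours, with arbitrary illustrative parameters, and is not the analysis performed on the turtle data:

```python
import random

def completely_neutral_sample(ab_pairs, rng=random):
    """Build (P1,...,Pk) from independent Beta(a_i, b_i) variates Z_i via
    P_i = Z_i * (1 - S_{i-1}); the last component takes what remains of the
    unit interval (the Connor-Mosimann construction)."""
    ps, remaining = [], 1.0
    for a, b in ab_pairs:
        z = rng.betavariate(a, b)
        ps.append(z * remaining)
        remaining -= ps[-1]
    ps.append(remaining)  # P_k = 1 - sum of the others
    return ps

def corr(us, vs):
    """Sample correlation coefficient."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    suv = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    su = sum((u - mu) ** 2 for u in us) ** 0.5
    sv = sum((v - mv) ** 2 for v in vs) ** 0.5
    return suv / (su * sv)

random.seed(0)
draws = [completely_neutral_sample([(2.0, 5.0), (3.0, 4.0)]) for _ in range(20000)]
# Under complete neutrality P1 is independent of P2/(1-P1) = Z2, so the
# sample correlation should be near zero.
r = corr([d[0] for d in draws], [d[1] / (1.0 - d[0]) for d in draws])
```

Since $P_2/(1-P_1)$ is exactly $Z_2$ in this construction, independence (and hence zero correlation) holds by design; in the simulation `r` differs from zero only by sampling noise.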
However, there is a problem which Connor and Mosimann themselves point out: "In these analyses the correlation coefficient is used as a measure of dependence or nonneutrality. Significant nonzero correlations, tested by Fisher's Z transformation, are taken as evidence of nonneutrality. On the other hand, even if the population correlations are zero, neutrality does not necessarily follow with our non-normal data." In other words, a vector which is claimed to be neutral need not be neutral at all. Thus, some better measure of neutrality would be very useful. Finally, since $P = (P_1,P_2,P_3)$ is a completely neutral vector and no other ordering of $P$ yields a completely neutral vector, the generalized Dirichlet distribution may be considered for $P$, while the Dirichlet distribution may not be considered. If it is assumed that $Z_1$ and $Z_2$ have beta distributions, then $P$ has a generalized Dirichlet distribution. Thus, at this point, it seemed reasonable to consider only the generalized Dirichlet distribution for a vector of continuous proportions, since the Dirichlet distribution is a special case of the generalized Dirichlet distribution. However, a recent paper by James (1972) would seem to restrict the usefulness of the generalized Dirichlet distribution. The main thrust of James's paper is that if certain ratios of the $P_i$'s, other than the $Z_i$'s, are assumed to have beta distributions, then the generalized Dirichlet distribution reduces to the Dirichlet distribution. As an example, a theorem from James's paper states:

Theorem 2.4: Let $(P_1,\ldots,P_n)$ have the generalized Dirichlet density and suppose that each of the variables $U_i = P_i/(1-P_{i+1}-\cdots-P_n)$, $i=1,\ldots,n-1$, is marginally beta. Then $b_i = a_{i+1}+b_{i+1}$ for all $i = 1,\ldots,n-1$, and consequently $(P_1,\ldots,P_n)$ has the Dirichlet density.

Thus, the rather simple assumption that $U_i$ has a beta distribution reduces the generalized Dirichlet distribution to the Dirichlet distribution. There is apparently then a dilemma.
On the one hand, the practical example of the turtle scutes leads to the conclusion that the generalized Dirichlet distribution is of value, while James's theoretical considerations imply that it has little value. It seems, then, that as well as the general inference problems of estimation and testing, there are the problems of a measure of neutrality for a neutral vector and of determining whether the Dirichlet distribution or the generalized Dirichlet distribution better fits a vector of continuous proportions. Although these are important problems, their solution will be left for future consideration. This, then, is the problem and the work which has been done on it, as set forth at the beginning of this dissertation. It became evident as the research progressed that the problem was quite broad. Thus, the results which follow are a beginning toward utilization of the Dirichlet distribution for analyzing a vector of continuous proportions. Most results deal not with the Dirichlet distribution, but rather with the beta distribution. Since the Dirichlet and generalized Dirichlet distributions are so closely related to the beta distribution, and since work on estimation of and testing hypotheses about the parameters of the beta distribution has been rather slight, it was thought to be a good starting place for work on the entire problem. As is pointed out in Chapters 3 and 4, the chapters dealing with inference for the beta distribution, some results are directly applicable and others may generalize rather straightforwardly to the Dirichlet and generalized Dirichlet distributions. Chapter 5 gives the results thus far obtained for the Dirichlet distribution. However, some questions still remain unanswered and will provide the basis for further research.
CHAPTER 3
ESTIMATION FOR THE BETA DISTRIBUTION

Introduction

If the Dirichlet or generalized Dirichlet distribution is to be used as a model for a vector of continuous proportions, it will be necessary to have a means of estimating the parameters of these distributions. For the generalized Dirichlet distribution, the parameters may be estimated through the $Z_i$'s, which have univariate beta distributions. This might also be done for the Dirichlet distribution, since it is a special case of the generalized Dirichlet distribution. However, the parameters of the Dirichlet distribution may also be estimated directly. Since the Dirichlet distribution is a multivariate beta distribution, it is quite evident that similar problems will be encountered for the Dirichlet distribution as for the beta distribution in any method of estimation. For example, in maximum likelihood estimation, the likelihood equations for the beta distribution are

$$\Psi(\hat\alpha) - \Psi(\hat\alpha+\hat\beta) = \frac{1}{n}\sum_{i=1}^{n}\ln X_i \tag{3.1}$$

and

$$\Psi(\hat\beta) - \Psi(\hat\alpha+\hat\beta) = \frac{1}{n}\sum_{i=1}^{n}\ln(1-X_i), \tag{3.2}$$

where $\Psi$ is the digamma function. For the Dirichlet distribution, the equations are similar:

$$\Psi(\hat\alpha_j) - \Psi\Bigl(\sum_{j=1}^{k+1}\hat\alpha_j\Bigr) = \frac{1}{n}\sum_{i=1}^{n}\ln P_{ij}, \qquad j=1,\ldots,k+1. \tag{3.3}$$

Thus, the solution of these equations is an extension of the solution of the equations for the beta distribution, and will be given in Chapter 5. For this reason, a study was made of possible estimators for the parameters of the beta distribution, and the properties of those estimators. Specifically, exact formulas, or approximations where exact formulas were not obtainable, were found for the biases, variances, and covariance of the estimators. In addition to estimating the parameters, estimators for the mean of the beta distribution have also been considered. This was done because such an estimator would have more practical significance to an experimenter than estimators of the parameters themselves.

Sample Moment Estimators

The first estimators considered were the sample moment estimators.
Let $M_1$ denote the first sample moment, $\sum_{i=1}^{n}X_i/n$, and $M_2$ denote the second sample moment, $\sum_{i=1}^{n}X_i^2/n$. Using the method of moments technique of estimation, these sample moments are equated to the corresponding population moments in terms of the parameters of the distribution. The equations obtained are then solved for the estimators of the parameters:

$$M_1 = \frac{\hat\alpha}{\hat\alpha+\hat\beta} \quad\text{and}\quad M_2 = \frac{\hat\alpha(\hat\alpha+1)}{(\hat\alpha+\hat\beta)(\hat\alpha+\hat\beta+1)}. \tag{3.4}$$

From the first equation,

$$\hat\beta = \frac{\hat\alpha(1-M_1)}{M_1}. \tag{3.5}$$

Substituting into the second equation,

$$M_2 = \frac{M_1(\hat\alpha+1)}{\hat\alpha + \hat\alpha(1-M_1)/M_1 + 1}; \tag{3.6}$$

$$\hat\alpha(M_2 - M_1^2) = M_1^2 - M_1 M_2. \tag{3.7}$$

Thus,

$$\hat\alpha = \frac{M_1(M_1-M_2)}{M_2-M_1^2} \quad\text{and}\quad \hat\beta = \frac{(1-M_1)(M_1-M_2)}{M_2-M_1^2}. \tag{3.8}$$

These, then, are the moment estimators of the parameters of the beta distribution. The moment estimator of the mean of the beta distribution is simply the first sample moment, $M_1$. Since the estimators of $\alpha$ and $\beta$ are ratios of functions of the sample moments, their expected values, variances, and covariance are not easily obtainable. Therefore, first order approximations for the biases, variances, and covariance of the estimators were found. Let

$$g(M_1,M_2) = \frac{M_1(M_1-M_2)}{M_2-M_1^2} = \hat\alpha \tag{3.9}$$

and

$$h(M_1,M_2) = \frac{(1-M_1)(M_1-M_2)}{M_2-M_1^2} = \hat\beta. \tag{3.10}$$

Then, by expanding these functions in a Taylor series and approximating them by the first order terms of the series,

$$\hat\alpha = g(M_1,M_2) \approx g(E(M_1),E(M_2)) + (M_1-E(M_1))\,\frac{\partial g(M_1,M_2)}{\partial M_1}\bigg|_{E(M_1),E(M_2)} + (M_2-E(M_2))\,\frac{\partial g(M_1,M_2)}{\partial M_2}\bigg|_{E(M_1),E(M_2)} \tag{3.11}$$

and

$$\hat\beta = h(M_1,M_2) \approx h(E(M_1),E(M_2)) + (M_1-E(M_1))\,\frac{\partial h(M_1,M_2)}{\partial M_1}\bigg|_{E(M_1),E(M_2)} + (M_2-E(M_2))\,\frac{\partial h(M_1,M_2)}{\partial M_2}\bigg|_{E(M_1),E(M_2)}. \tag{3.12}$$

Thus, first order approximations to the biases, variances, and covariance of the estimators of $\alpha$ and $\beta$ may be found. Since $g(E(M_1),E(M_2)) = \alpha$ and $h(E(M_1),E(M_2)) = \beta$,

$$E(\hat\alpha-\alpha) \approx E\bigl(M_1-E(M_1)\bigr)\,\frac{\partial g(M_1,M_2)}{\partial M_1}\bigg|_{E(M_1),E(M_2)} + E\bigl(M_2-E(M_2)\bigr)\,\frac{\partial g(M_1,M_2)}{\partial M_2}\bigg|_{E(M_1),E(M_2)} \tag{3.13}$$

and

$$E(\hat\beta-\beta) \approx E\bigl(M_1-E(M_1)\bigr)\,\frac{\partial h(M_1,M_2)}{\partial M_1}\bigg|_{E(M_1),E(M_2)} + E\bigl(M_2-E(M_2)\bigr)\,\frac{\partial h(M_1,M_2)}{\partial M_2}\bigg|_{E(M_1),E(M_2)}. \tag{3.14}$$

But the partial derivatives of $g(M_1,M_2)$ evaluated at $E(M_1)$ and $E(M_2)$ are constant when the expected value is taken, and $E\bigl(M_i-E(M_i)\bigr) = 0$, so these terms drop out of $E(\hat\alpha-\alpha)$.
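Before turning to the sampling properties, the moment estimators of (3.8) themselves are simple to compute. The sketch below is our own illustration, not the dissertation's program:

```python
def beta_moment_estimates(xs):
    """Method-of-moments estimates, eq. (3.8), for Beta(alpha, beta):
    alpha-hat = M1(M1 - M2)/(M2 - M1^2),
    beta-hat  = (1 - M1)(M1 - M2)/(M2 - M1^2)."""
    n = len(xs)
    m1 = sum(xs) / n                      # first sample moment
    m2 = sum(x * x for x in xs) / n       # second sample moment
    denom = m2 - m1 * m1                  # M2 - M1^2 (biased sample variance)
    a_hat = m1 * (m1 - m2) / denom
    b_hat = (1.0 - m1) * (m1 - m2) / denom
    return a_hat, b_hat
```

Note that, by construction, $\hat\alpha/(\hat\alpha+\hat\beta) = M_1$, so the implied estimate of the mean is the first sample moment, exactly as the text states.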
Therefore, to a first order approximation the estimators of the parameters are unbiased. First order approximations to the variances and covariance of the estimators are found similarly:

$$\mathrm{Var}(\hat\alpha) \approx \mathrm{Var}(M_1)\Bigl[\frac{\partial g}{\partial M_1}\Bigr]^2 + \mathrm{Var}(M_2)\Bigl[\frac{\partial g}{\partial M_2}\Bigr]^2 + 2\,\mathrm{Cov}(M_1,M_2)\,\frac{\partial g}{\partial M_1}\,\frac{\partial g}{\partial M_2}, \tag{3.15}$$

$$\mathrm{Var}(\hat\beta) \approx \mathrm{Var}(M_1)\Bigl[\frac{\partial h}{\partial M_1}\Bigr]^2 + \mathrm{Var}(M_2)\Bigl[\frac{\partial h}{\partial M_2}\Bigr]^2 + 2\,\mathrm{Cov}(M_1,M_2)\,\frac{\partial h}{\partial M_1}\,\frac{\partial h}{\partial M_2}, \tag{3.16}$$

and

$$\mathrm{Cov}(\hat\alpha,\hat\beta) \approx \mathrm{Var}(M_1)\,\frac{\partial g}{\partial M_1}\,\frac{\partial h}{\partial M_1} + \mathrm{Var}(M_2)\,\frac{\partial g}{\partial M_2}\,\frac{\partial h}{\partial M_2} + \mathrm{Cov}(M_1,M_2)\Bigl[\frac{\partial g}{\partial M_1}\,\frac{\partial h}{\partial M_2} + \frac{\partial g}{\partial M_2}\,\frac{\partial h}{\partial M_1}\Bigr], \tag{3.17}$$

where all partial derivatives are evaluated at $E(M_1),E(M_2)$. Expressions for $\mathrm{Var}(M_1)$, $\mathrm{Var}(M_2)$, $\mathrm{Cov}(M_1,M_2)$, and the partial derivatives evaluated at $E(M_1)$, $E(M_2)$ were then found in terms of $\alpha$ and $\beta$:

$$\mathrm{Var}(M_1) = \frac{1}{n}\Bigl[E(X^2) - \{E(X)\}^2\Bigr] = \frac{1}{n}\biggl[\frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)} - \frac{\alpha^2}{(\alpha+\beta)^2}\biggr], \tag{3.18}$$

$$\mathrm{Var}(M_2) = \frac{1}{n}\Bigl[E(X^4) - \{E(X^2)\}^2\Bigr] = \frac{1}{n}\biggl[\frac{\alpha(\alpha+1)(\alpha+2)(\alpha+3)}{(\alpha+\beta)(\alpha+\beta+1)(\alpha+\beta+2)(\alpha+\beta+3)} - \biggl\{\frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)}\biggr\}^2\biggr], \tag{3.19}$$

$$\mathrm{Cov}(M_1,M_2) = \frac{1}{n}\Bigl[E(X^3) - E(X)E(X^2)\Bigr] = \frac{1}{n}\biggl[\frac{\alpha(\alpha+1)(\alpha+2)}{(\alpha+\beta)(\alpha+\beta+1)(\alpha+\beta+2)} - \frac{\alpha}{\alpha+\beta}\cdot\frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)}\biggr], \tag{3.20}$$

$$\frac{\partial g}{\partial M_1}\bigg|_{E(M_1),E(M_2)} = \frac{M_2(2M_1-M_2-M_1^2)}{(M_2-M_1^2)^2}\bigg|_{E(M_1),E(M_2)}, \tag{3.21}$$

$$\frac{\partial g}{\partial M_2}\bigg|_{E(M_1),E(M_2)} = \frac{M_1^2(M_1-1)}{(M_2-M_1^2)^2}\bigg|_{E(M_1),E(M_2)}, \tag{3.22}$$

$$\frac{\partial h}{\partial M_1}\bigg|_{E(M_1),E(M_2)} = \frac{M_2(1+M_2-4M_1)+M_1^2(1+M_2)}{(M_2-M_1^2)^2}\bigg|_{E(M_1),E(M_2)}, \tag{3.23}$$

and

$$\frac{\partial h}{\partial M_2}\bigg|_{E(M_1),E(M_2)} = \frac{M_1(2M_1-M_1^2-1)}{(M_2-M_1^2)^2}\bigg|_{E(M_1),E(M_2)}. \tag{3.24}$$

Since $E(M_1) = \alpha/(\alpha+\beta)$ and $E(M_2) = \alpha(\alpha+1)/[(\alpha+\beta)(\alpha+\beta+1)]$, these partial derivatives are then expressed in terms of $\alpha$ and $\beta$. At this point a computer program was used to evaluate $\mathrm{Var}(\hat\alpha)$, $\mathrm{Var}(\hat\beta)$, and $\mathrm{Cov}(\hat\alpha,\hat\beta)$ for various values of $\alpha$ and $\beta$. The results of those computations are shown in Table 1. The estimator of the mean of the beta distribution, $M_1$, is an unbiased estimator, and an exact formula for its variance exists.
$$\mathrm{Var}(M_1) = \frac{1}{n}\Bigl[E(X^2) - \{E(X)\}^2\Bigr] = \frac{1}{n}\cdot\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}. \tag{3.25}$$

The variance of the estimator of the mean for various combinations of $\alpha$ and $\beta$ is also contained in Table 1.

Maximum Likelihood Estimators

Derivation of the Estimators

A second natural set of estimators to consider are the maximum likelihood estimators. Unfortunately, the likelihood equations for the beta distribution do not yield simple solutions for the estimators. The problem of obtaining the maximum likelihood estimators for the beta distribution has been investigated in a paper by Gnanadesikan, Pinkham, and Hughes (1967). The main thrust of Gnanadesikan, Pinkham, and Hughes's paper is to investigate the effect of using all or some of the order statistics from a beta distribution to obtain maximum likelihood estimators for the parameters. Given a sample $X_1,\ldots,X_n$ from a beta distribution with parameters $\alpha$ and $\beta$, so that

$$f(X_i) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, X_i^{\alpha-1}(1-X_i)^{\beta-1}, \tag{3.26}$$

then

$$L(X_1,\ldots,X_n) = \biggl[\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\biggr]^n \prod_{i=1}^{n} X_i^{\alpha-1}(1-X_i)^{\beta-1} \tag{3.27}$$

and

$$\ln L(X_1,\ldots,X_n) = n\ln\Gamma(\alpha+\beta) - n\ln\Gamma(\alpha) - n\ln\Gamma(\beta) + (\alpha-1)\sum_{i=1}^{n}\ln X_i + (\beta-1)\sum_{i=1}^{n}\ln(1-X_i). \tag{3.28}$$

Note that the notation has been changed from Gnanadesikan, Pinkham, and Hughes's paper to conform with the notation used in this dissertation. The likelihood equations using this notation are then

$$n\,\frac{\partial\ln\Gamma(\hat\alpha+\hat\beta)}{\partial\hat\alpha} - n\,\frac{\partial\ln\Gamma(\hat\alpha)}{\partial\hat\alpha} + \sum_{i=1}^{n}\ln X_i = 0 \tag{3.29}$$

and

$$n\,\frac{\partial\ln\Gamma(\hat\alpha+\hat\beta)}{\partial\hat\beta} - n\,\frac{\partial\ln\Gamma(\hat\beta)}{\partial\hat\beta} + \sum_{i=1}^{n}\ln(1-X_i) = 0. \tag{3.30}$$

Equivalently,

$$\Psi(\hat\alpha) - \Psi(\hat\alpha+\hat\beta) = \frac{1}{n}\sum_{i=1}^{n}\ln X_i \tag{3.31}$$

and

$$\Psi(\hat\beta) - \Psi(\hat\alpha+\hat\beta) = \frac{1}{n}\sum_{i=1}^{n}\ln(1-X_i), \tag{3.32}$$

where

$$\Psi(\hat\alpha) = \frac{\partial\ln\Gamma(\hat\alpha)}{\partial\hat\alpha}, \qquad \Psi(\hat\beta) = \frac{\partial\ln\Gamma(\hat\beta)}{\partial\hat\beta}, \qquad \Psi(\hat\alpha+\hat\beta) = \frac{\partial\ln\Gamma(\hat\alpha+\hat\beta)}{\partial\hat\alpha} = \frac{\partial\ln\Gamma(\hat\alpha+\hat\beta)}{\partial\hat\beta}. \tag{3.33}$$

By the nature of these functions of $\hat\alpha$ and $\hat\beta$, called "psi" functions, involved in the likelihood equations, it was not possible to solve directly for $\hat\alpha$ and $\hat\beta$. Thus, the Newton-Raphson iteration method was used to solve the equations.
The problem is further complicated for Gnanadesikan, Pinkham, and Hughes, since they wish to find maximum likelihood estimators also for the case when only the first $p$ order statistics from a sample of size $n$ are used. The likelihood equations are then of the form

$$\frac{1}{n}\sum_{i=1}^{p}\ln X_i = \Psi(\hat\alpha) - \Psi(\hat\alpha+\hat\beta) - \Bigl(1-\frac{p}{n}\Bigr)\frac{I_1(X_p;\hat\alpha,\hat\beta)}{I(X_p;\hat\alpha,\hat\beta)} \tag{3.34}$$

and

$$\frac{1}{n}\sum_{i=1}^{p}\ln(1-X_i) = \Psi(\hat\beta) - \Psi(\hat\alpha+\hat\beta) - \Bigl(1-\frac{p}{n}\Bigr)\frac{I_2(X_p;\hat\alpha,\hat\beta)}{I(X_p;\hat\alpha,\hat\beta)}, \tag{3.35}$$

where

$$I(x;\hat\alpha,\hat\beta) = \int_x^1 t^{\hat\alpha-1}(1-t)^{\hat\beta-1}\,dt, \quad I_1(x;\hat\alpha,\hat\beta) = \int_x^1 t^{\hat\alpha-1}(1-t)^{\hat\beta-1}\ln t\,dt, \quad I_2(x;\hat\alpha,\hat\beta) = \int_x^1 t^{\hat\alpha-1}(1-t)^{\hat\beta-1}\ln(1-t)\,dt. \tag{3.36}$$

Gnanadesikan, Pinkham, and Hughes then go on to compare the estimators obtained by using various fractions of the data and present two examples of their results. However, the section of their paper which is relevant here is the section in which they actually compute the estimators given the entire sample. The method proceeds as follows. Let

$$k_1 = \frac{1}{n}\sum_{i=1}^{n}\ln X_i \quad\text{and}\quad k_2 = \frac{1}{n}\sum_{i=1}^{n}\ln(1-X_i). \tag{3.37}$$

Then the likelihood equations are

$$\Psi(\hat\alpha) - \Psi(\hat\alpha+\hat\beta) = k_1 \quad\text{and}\quad \Psi(\hat\beta) - \Psi(\hat\alpha+\hat\beta) = k_2, \tag{3.38}$$

where $k_1$ and $k_2$ are constant in terms of $\hat\alpha$ and $\hat\beta$. Let the solutions of these equations, $\hat\alpha$ and $\hat\beta$, be equal to some initial values plus a correction. That is, $\hat\alpha = \alpha_0 + h$ and $\hat\beta = \beta_0 + l$. Then

$$\Psi(\alpha_0+h) - \Psi(\alpha_0+h+\beta_0+l) - k_1 = 0 \quad\text{and}\quad \Psi(\beta_0+l) - \Psi(\alpha_0+h+\beta_0+l) - k_2 = 0. \tag{3.39}$$

Expanding these functions in Taylor series and using only the constant and first derivative terms of those series as an approximation, the equations are

$$\Psi(\alpha_0+h) - \Psi(\alpha_0+h+\beta_0+l) - k_1 \approx \Psi(\alpha_0) - \Psi(\alpha_0+\beta_0) - k_1 + h\,[\Psi'(\alpha_0) - \Psi'(\alpha_0+\beta_0)] - l\,\Psi'(\alpha_0+\beta_0) \tag{3.40}$$

and

$$\Psi(\beta_0+l) - \Psi(\alpha_0+h+\beta_0+l) - k_2 \approx \Psi(\beta_0) - \Psi(\alpha_0+\beta_0) - k_2 - h\,\Psi'(\alpha_0+\beta_0) + l\,[\Psi'(\beta_0) - \Psi'(\alpha_0+\beta_0)]. \tag{3.41}$$

Let

$$D = \Psi'(\alpha_0)\Psi'(\beta_0) - \Psi'(\alpha_0+\beta_0)\,[\Psi'(\alpha_0) + \Psi'(\beta_0)].$$

Now, let the first correction to the initial estimate $\alpha_0$ be $h_1$, and the first correction to $\beta_0$ be $l_1$. Setting the right-hand sides of (3.40) and (3.41) equal to zero and solving the resulting linear system gives

$$h_1 = \bigl\{[\Psi(\alpha_0+\beta_0) + k_1 - \Psi(\alpha_0)][\Psi'(\beta_0) - \Psi'(\alpha_0+\beta_0)] + \Psi'(\alpha_0+\beta_0)[\Psi(\alpha_0+\beta_0) + k_2 - \Psi(\beta_0)]\bigr\}\big/D \tag{3.42}$$

and

$$l_1 = \bigl\{[\Psi(\alpha_0+\beta_0) + k_2 - \Psi(\beta_0)][\Psi'(\alpha_0) - \Psi'(\alpha_0+\beta_0)] + \Psi'(\alpha_0+\beta_0)[\Psi(\alpha_0+\beta_0) + k_1 - \Psi(\alpha_0)]\bigr\}\big/D. \tag{3.43}$$

The entire process is now repeated using $\alpha_0+h_1$ and $\beta_0+l_1$ as new initial estimates. The iteration continues until it is clear that the process is converging to a solution. As the initial estimators $\alpha_0$ and $\beta_0$, Gnanadesikan, Pinkham, and Hughes propose the sample moment estimators discussed earlier. These seem to work satisfactorily, in that the process does converge to a solution.
[T (a) (a+e)kl] 1 6[IyT (a)T (a+)kl] I O',, O h = ['(a0+ 0) + kl (0a)] I[ (B ) '( +0)] +' (o0 +0)[(Q0 0+0) + k2 T(80)] / D (3.42) and (00) (C 0+0) (a00+ 0) + kl '(a0) 1 = 1 (a0+B0) 1(a0 +80) + k2 0(BO) D = [((o+0) + k2 i(0)] [V' (0) (a0+80) +T'(aO+B0) [(x0 +80) + k1 '(a0)] / D (3.43) The entire process is now repeated using a0+hl and 80+11 as new initial estimators. The iteration continues until it is clear that the process is converging to a solution. As the initial estimators, ~0 and B0, Gnanadesikan Pinkham, and Hughes propose the sample moment estimators discussed earlier. These seem to work satisfactorily, in that the process does converge to a solution. That is the NewtonRaphson method. The only difference from Gnanadesikan, Pinkham, and Hughes's method of solving the likelihood equations is that Bernoulli series approximations for the derivatives of the psi functions have been used in the corrections to the initial estimators, rather than the approximation of the derivatives by differences. The Bernoulli series approximations were given in a paper by Choi and Wette (1969). To find Y' (X), the equation is 1 2 T'(x) = {l+{l+[1(! 2)/X ]/3X}/2X}/X X>8. (3.44) 5 7X If X<8 then use is made of the recurrence formula Y'(X) = Y'(X+l) + 1/X2 (3.45) For example, 1 1 1 1 *' (4.5) = '' (8.5) + 2 + ( 2 + ( 2+ 2 (4.5) (5.5) (6.5) (7.5) By using this approximation to the derivatives of the psi functions, the NewtonRaphson method converges in fewer iterations than Gnanadesikan, Pinkham, and Hughes report in their paper. In no case did it take more than four iterations to arrive at a solution. Since the maximum likelihood estimator of a function of a parameter or parameters is simply the function of the estimators, the estimator of the mean, a/a+8, is a/a+B where a and 8 are the maximum likelihood estimators of a and 8. 
Asymptotic Properties of the Estimators

Having obtained the maximum likelihood estimators, the next logical question is what sort of statistical properties these estimators have. More specifically, since they are rather difficult estimators to obtain, are they a significant improvement over the sample moment estimators? To answer these questions, an attempt has been made to investigate the biases, variances, and covariance of the estimators. Because the estimators were obtained through an iteration process rather than being given by an explicit formula, exact expressions for the biases, variances, and covariance were unobtainable. However, two methods of approximation were available. First, since these are maximum likelihood estimators, their asymptotic distribution is well known if certain regularity conditions are met. From the asymptotic distribution the biases, variances, and covariance for large n are known. Second, in an attempt to determine the biases, variances, and covariance for smaller values of n, the likelihood equations were expanded in a Taylor series. Since the first and second moments of the right-hand side of the equations are obtainable, the expanded equations may be solved simultaneously for the biases, variances, and covariance of the estimators of α and β. First, consider the asymptotic distribution of the estimators. This distribution, for a multidimensional parameter vector, is given in Wilks (1962).
Theorem 3.1: If (X₁,...,Xₙ) is a sample from the c.d.f. F(X;θ⁰), where θ⁰ is r-dimensional and F(X;θ) is regular with respect to its first and second θ-derivatives for θ in a neighborhood of θ⁰, and if the maximum likelihood estimator (θ̂₁,...,θ̂ᵣ) satisfying (12.7.1) is unique for n greater than some n₀ and measurable with respect to Π_{i=1}^n F(X_i;θ), then it is asymptotically distributed, for large n, according to the r-dimensional normal distribution N({θ_p⁰}, ||n B_pq||⁻¹), where

B_pq(θ⁰) = ∫ S_p(X;θ⁰) S_q(X;θ⁰) dF(X;θ⁰).   (3.46)

To apply this theorem, the conditions of the theorem must be met for the beta distribution. Regularity with respect to the first and second θ-derivatives is equivalent to

E[S_p(X;θ)] = 0  and  E[S_p(X;θ)S_q(X;θ)] + E[S_pq(X;θ)] = 0.   (3.47)

Now in this case

S_p(X;θ) = ∂ ln f(X;α,β)/∂α = ψ(α+β) − ψ(α) + ln X,
S_q(X;θ) = ∂ ln f(X;α,β)/∂β = ψ(α+β) − ψ(β) + ln(1−X),   (3.48)

and

S_pq(X;θ) = ∂² ln f(X;α,β)/∂α∂β = ψ'(α+β).   (3.49)

Thus, E[S_p(X;θ)] = ψ(α+β) − ψ(α) + E(ln X). To obtain E(ln X), consider

∫₀¹ X^{α−1}(1−X)^{β−1} dX = Γ(α)Γ(β)/Γ(α+β),   (3.50)

so that

∫₀¹ (ln X) X^{α−1}(1−X)^{β−1} dX = (∂/∂α)[Γ(α)Γ(β)/Γ(α+β)]   (3.51)

= [Γ(β)/Γ(α+β)][Γ'(α) − Γ(α)ψ(α+β)],   (3.52)

and hence

E(ln X) = [Γ(α+β)/(Γ(α)Γ(β))] · [Γ(β)/Γ(α+β)][Γ'(α) − Γ(α)ψ(α+β)] = ψ(α) − ψ(α+β).   (3.53)

Therefore,

E[S_p(X;θ)] = ψ(α+β) − ψ(α) + [ψ(α) − ψ(α+β)] = 0.   (3.54)

Similarly, E[S_q(X;θ)] = 0. Moreover, differentiating E[S_p(X;θ)] = 0 with respect to β under the integral sign gives

E[S_pq(X;θ)] = E[∂² ln f(X;α,β)/∂α∂β]   (3.55)

= −E[(∂ ln f(X;α,β)/∂α)(∂ ln f(X;α,β)/∂β)].   (3.56)

Thus,

E[S_p(X;θ)S_q(X;θ)] + E[S_pq(X;θ)] = 0.   (3.57)

Therefore, F(X;θ⁰) is regular with respect to its first and second θ-derivatives. Since the iteration process converges to a single pair of estimators for α and β, the solutions to the likelihood equations are unique.
Expressing α̂ as α̂ = α₀ + Σ_{i=1}^n h_i and β̂ as β̂ = β₀ + Σ_{i=1}^n l_i, then α̂ and β̂ are measurable functions if α₀, β₀, h_i, and l_i are measurable functions for all i, since lim_{n→∞} Σ_{i=1}^n h_i and lim_{n→∞} Σ_{i=1}^n l_i are measurable if all the h_i's and l_i's are measurable. Let M₁ and M₂ be the first two sample moments. Then M₁ and M₂ are measurable since the X_i's are random variables and hence measurable. But

α₀ = M₁(M₁−M₂)/(M₂−M₁²)  and  β₀ = (1−M₁)(M₁−M₂)/(M₂−M₁²).

Since products and sums of measurable functions are measurable, and since a continuous function of a measurable function is measurable, α₀ and β₀ are measurable functions. Consider h₁,

h₁ = {[ψ(α₀+β₀) + k₁ − ψ(α₀)][ψ'(β₀) − ψ'(α₀+β₀)] + ψ'(α₀+β₀)[ψ(α₀+β₀) + k₂ − ψ(β₀)]} / D,   (3.58)

where

D = ψ'(α₀)ψ'(β₀) − ψ'(α₀+β₀)[ψ'(α₀) + ψ'(β₀)].   (3.59)

Since ψ(x) and ψ'(x) are continuous functions for 0 < x < ∞ and k₁ and k₂ are measurable functions, all components of h₁ are measurable. Thus, h₁ is measurable, and similarly l₁ is measurable. Therefore, after one iteration the estimators α̂₁ = α₀+h₁ and β̂₁ = β₀+l₁ are measurable functions. But h₂ and l₂ are computed by replacing α₀ and β₀ in h₁ and l₁ by α̂₁ and β̂₁. Thus, h₂ and l₂ are measurable. Similarly, h_i and l_i are measurable for all i. Therefore α̂ and β̂ are measurable functions. Thus, the conditions for the theorem giving the asymptotic distribution of the maximum likelihood estimators are satisfied. Once again, the theorem states that the asymptotic distribution of the estimators is the r-dimensional normal distribution N({θ_p⁰}, ||n B_pq||⁻¹). Hence, asymptotically the estimators α̂ and β̂ are unbiased. The notation which Wilks uses for the asymptotic variances and covariance of the estimators, ||n B_pq||⁻¹, is more commonly given as the inverse of n times the information matrix, (nI)⁻¹.
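The sample moment starting values above are simple to compute. A minimal sketch, with a hypothetical function name:

```python
def beta_moment_estimates(xs):
    """Sample moment estimators used as Newton-Raphson starting values:
    a0 = M1(M1 - M2)/(M2 - M1^2) and b0 = (1 - M1)(M1 - M2)/(M2 - M1^2),
    where M1 and M2 are the first two sample moments."""
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n
    s2 = m2 - m1 * m1            # the (biased) sample variance M2 - M1^2
    a0 = m1 * (m1 - m2) / s2
    b0 = (1.0 - m1) * (m1 - m2) / s2
    return a0, b0
```

As a quick check, a sample whose first two moments equal the exact moments of a beta(2, 3) distribution (M₁ = .4, M₂ = .2) returns α₀ = 2 and β₀ = 3.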
To find the information matrix, proceed as follows:

f(X;α,β) = [Γ(α+β)/(Γ(α)Γ(β))] X^{α−1}(1−X)^{β−1},   (3.60)

ln f(X;α,β) = ln Γ(α+β) − ln Γ(α) − ln Γ(β) + (α−1) ln X + (β−1) ln(1−X),   (3.61)

∂ ln f(X;α,β)/∂α = ψ(α+β) − ψ(α) + ln X, and   (3.62)

∂² ln f(X;α,β)/∂α² = ψ'(α+β) − ψ'(α).   (3.63)

Therefore,

E[(∂ ln f(X;α,β)/∂α)²] = −E[∂² ln f(X;α,β)/∂α²] = ψ'(α) − ψ'(α+β).   (3.64)

Similarly,

∂² ln f(X;α,β)/∂β² = ψ'(α+β) − ψ'(β),   (3.65)

∂² ln f(X;α,β)/∂α∂β = ψ'(α+β),   (3.66)

E[(∂ ln f(X;α,β)/∂β)²] = ψ'(β) − ψ'(α+β), and   (3.67)

E[(∂ ln f(X;α,β)/∂α)(∂ ln f(X;α,β)/∂β)] = −ψ'(α+β).   (3.68)

Hence,

I = [ ψ'(α) − ψ'(α+β)    −ψ'(α+β)
      −ψ'(α+β)            ψ'(β) − ψ'(α+β) ],   (3.69)

so that

|I| = [ψ'(α) − ψ'(α+β)][ψ'(β) − ψ'(α+β)] − ψ'(α+β)² = ψ'(α)ψ'(β) − ψ'(α+β)[ψ'(α) + ψ'(β)].   (3.70)

Now

I⁻¹ = (1/|I|) [ ψ'(β) − ψ'(α+β)    ψ'(α+β)
                ψ'(α+β)            ψ'(α) − ψ'(α+β) ].   (3.71)

Therefore, asymptotically,

Var(α̂) = [ψ'(β) − ψ'(α+β)]/(n|I|),
Var(β̂) = [ψ'(α) − ψ'(α+β)]/(n|I|), and
Cov(α̂,β̂) = ψ'(α+β)/(n|I|).   (3.72)

The estimator of the mean, α̂/(α̂+β̂), is asymptotically unbiased since it is the maximum likelihood estimator of α/(α+β). To find the variance of the estimator, note that the asymptotic variance of a function of the maximum likelihood estimators is simply the variance of the Taylor series expansion of the function terminated after the first derivatives. Thus, asymptotically,

Var[α̂/(α̂+β̂)] = [∂(α/(α+β))/∂α]² Var(α̂) + [∂(α/(α+β))/∂β]² Var(β̂) + 2[∂(α/(α+β))/∂α][∂(α/(α+β))/∂β] Cov(α̂,β̂)
             = [β² Var(α̂) + α² Var(β̂) − 2αβ Cov(α̂,β̂)]/(α+β)⁴,

where Var(α̂), Var(β̂), and Cov(α̂,β̂) are the asymptotic expressions obtained previously. Substituting those expressions,

Var[α̂/(α̂+β̂)] = {β²[ψ'(β) − ψ'(α+β)] + α²[ψ'(α) − ψ'(α+β)] − 2αβ ψ'(α+β)} / [n|I|(α+β)⁴].   (3.73)

Small Sample Properties of the Estimators

At this point, a comparison of these asymptotic results with the results for sample moment estimation could be made. However, some information, even if only an approximation, on the small-sample biases, variances, and covariance of the maximum likelihood estimators would be desirable. Fortunately this can be done.
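Before turning to small samples, the asymptotic expressions (3.70) through (3.73) above can be evaluated numerically. The following sketch is illustrative; the trigamma routine uses a standard Bernoulli-series tail with recurrence, and the function names are assumptions:

```python
def trigamma(x):
    # psi'(x): recurrence plus an asymptotic (Bernoulli) series tail
    s = 0.0
    while x < 8.0:
        s += 1.0 / (x * x)
        x += 1.0
    inv2 = 1.0 / (x * x)
    return s + (1.0 + 1.0/(2*x) + (1/6 - inv2*(1/30 - inv2/42)) / x) / x

def asymptotic_moments(a, b, n):
    """Asymptotic Var(a^), Var(b^), Cov(a^,b^) from (3.72) and the
    variance of the mean estimator a^/(a^+b^) from (3.73)."""
    ta, tb, tab = trigamma(a), trigamma(b), trigamma(a + b)
    detI = ta * tb - tab * (ta + tb)      # |I| as in (3.70)
    var_a = (tb - tab) / (n * detI)
    var_b = (ta - tab) / (n * detI)
    cov_ab = tab / (n * detI)
    var_mean = (b*b*(tb - tab) + a*a*(ta - tab) - 2*a*b*tab) / (n * detI * (a + b)**4)
    return var_a, var_b, cov_ab, var_mean
```

Because the formulas are symmetric in the two parameters, Var(α̂) at (α, β) equals Var(β̂) at (β, α), which provides a simple check on the implementation.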
Consider again the likelihood equations

ψ(α̂) − ψ(α̂+β̂) = (1/n) Σ_{i=1}^n ln X_i  and  ψ(β̂) − ψ(α̂+β̂) = (1/n) Σ_{i=1}^n ln(1−X_i).   (3.74)

Now

E[(1/n) Σ_{i=1}^n ln X_i] = E(ln X_i),   (3.75)

since the X_i's are a random sample from a beta distribution with parameters α and β. As has been shown before, E(ln X_i) = ψ(α) − ψ(α+β). Similarly,

E[(1/n) Σ_{i=1}^n ln(1−X_i)] = E[ln(1−X_i)] = ψ(β) − ψ(α+β).   (3.76)

Now let

K(α̂,β̂) = ψ(α̂) − ψ(α̂+β̂) − [ψ(α) − ψ(α+β)],
L(α̂,β̂) = ψ(β̂) − ψ(α̂+β̂) − [ψ(β) − ψ(α+β)],
k = (1/n) Σ_{i=1}^n ln X_i − [ψ(α) − ψ(α+β)], and
l = (1/n) Σ_{i=1}^n ln(1−X_i) − [ψ(β) − ψ(α+β)],   (3.77)

so that the likelihood equations may be transformed into the following equations,

K(α̂,β̂) = k  and  L(α̂,β̂) = l.   (3.80)

Now let

m_ij = E(kⁱ lʲ),  μ_ij = E[(α̂−α)ⁱ(β̂−β)ʲ],
K_ij = [1/(i! j!)] ∂^{i+j} K(α̂,β̂)/∂α̂ⁱ ∂β̂ʲ evaluated at (α,β), and
L_ij = [1/(i! j!)] ∂^{i+j} L(α̂,β̂)/∂α̂ⁱ ∂β̂ʲ evaluated at (α,β).   (3.81)

By expanding K(α̂,β̂) in a Taylor series about the point (α,β), and noting that K(α,β) = 0,

K(α̂,β̂) = (α̂−α) K₁₀ + (β̂−β) K₀₁ + (α̂−α)² K₂₀ + (α̂−α)(β̂−β) K₁₁ + (β̂−β)² K₀₂ + ... .   (3.82)-(3.83)

Similarly,

L(α̂,β̂) = (α̂−α) L₁₀ + (β̂−β) L₀₁ + (α̂−α)² L₂₀ + (α̂−α)(β̂−β) L₁₁ + (β̂−β)² L₀₂ + ... .   (3.84)

Since K(α̂,β̂) = k and L(α̂,β̂) = l, and since the moments of k and l can be found, the moments, or more specifically the μ_ij's, may be expressed in terms of the moments of k and l. Specifically, terminating the Taylor series after second derivatives,

0 = E(k) = E{(α̂−α) K₁₀ + (β̂−β) K₀₁ + (α̂−α)² K₂₀ + (α̂−α)(β̂−β) K₁₁ + (β̂−β)² K₀₂},
0 = E(l) = E{(α̂−α) L₁₀ + (β̂−β) L₀₁ + (α̂−α)² L₂₀ + (α̂−α)(β̂−β) L₁₁ + (β̂−β)² L₀₂},
m₂₀ = E(k²) = E{[(α̂−α) K₁₀ + (β̂−β) K₀₁ + ...]²},
m₁₁ = E(kl) = E{[(α̂−α) K₁₀ + (β̂−β) K₀₁ + ...][(α̂−α) L₁₀ + (β̂−β) L₀₁ + ...]}, and
m₀₂ = E(l²) = E{[(α̂−α) L₁₀ + (β̂−β) L₀₁ + ...]²}.
(3.85)

Expressing these equations in one matrix equation, and including only the terms of the expansions involving μ₁₀, μ₀₁, μ₂₀, μ₁₁, and μ₀₂, since those are the terms of interest,

[ 0   ]   [ K₁₀   K₀₁   K₂₀       K₁₁               K₀₂    ] [ μ₁₀ ]
[ 0   ]   [ L₁₀   L₀₁   L₂₀       L₁₁               L₀₂    ] [ μ₀₁ ]
[ m₂₀ ] = [ 0     0     K₁₀²      2K₁₀K₀₁           K₀₁²   ] [ μ₂₀ ]
[ m₁₁ ]   [ 0     0     K₁₀L₁₀    K₁₀L₀₁ + K₀₁L₁₀   K₀₁L₀₁ ] [ μ₁₁ ]
[ m₀₂ ]   [ 0     0     L₁₀²      2L₁₀L₀₁           L₀₁²   ] [ μ₀₂ ]   (3.86)

In matrix notation, m = Kμ. Solving for the vector μ, K⁻¹m = μ, so that an approximation for μ in terms of a matrix of constants with respect to the distribution F(X₁,...,Xₙ) and a vector m, for which moments can be found, is obtained. Notice that the μ-vector is not exactly E(α̂), E(β̂), Var(α̂), Cov(α̂,β̂), and Var(β̂), but rather E(α̂−α), E(β̂−β), E(α̂−α)², E[(α̂−α)(β̂−β)], and E(β̂−β)². However, from these expected values the biases, variances, and covariance of the estimators may be found. The next question is how good an approximation this is. The order of the approximation in powers of n depends on the order in n of the terms in m and K, and the order of succeeding terms in the equation. In other words, if the Taylor series expansion of K(α̂,β̂) were to be carried out further, thus incorporating higher moments of [(α̂−α),(β̂−β)], higher moments of (k,l) would need to be included in order to solve the system of equations. Higher powers of n would then be included in the expansion, implying a better approximation. Since K is constant in n, the order of the approximation will depend only on m. On the condition that m_ij is of order [(i+j+1)/2] in 1/n, where [ ] in this case represents the greatest integer function, the solution of the matrix equation previously given yields μ to order 1/n. Since this approximation worked fairly well, except for very small values of n, a better approximation has not been computed.
To prove the condition that m_ij is of order [(i+j+1)/2], consider the two special cases of m_k0 for k odd and for k even, in order to fix the ideas of the proof for the more general case of m_ij. Write

m_k0 = (1/n^k) E{[Σ_{i=1}^n (Y_i − μ)]^k},   (3.87)

where Y_i = ln X_i, X₁,...,Xₙ are a random sample from the beta distribution with parameters α and β, and μ = ψ(α) − ψ(α+β) = E(ln X_i). Let k be even. In the expansion of [Σ_{i=1}^n (Y_i − μ)]^k there will be a term

Σ_{i₁≠i₂≠...≠i_{k/2}} (Y_{i₁}−μ)² (Y_{i₂}−μ)² ... (Y_{i_{k/2}}−μ)².   (3.88)

Since the X_i's, and hence the Y_i's, are independent and identically distributed,

E[ Σ_{i₁≠...≠i_{k/2}} (Y_{i₁}−μ)² ... (Y_{i_{k/2}}−μ)² ] = n(n−1)(n−2)...(n−k/2+1) [E(Y_i−μ)²]^{k/2},   (3.89)

which is of order n^{k/2}. All terms preceding this one in the expansion of [Σ_{i=1}^n (Y_i−μ)]^k will be of smaller order than n^{k/2}, since they will include a summation over fewer than k/2 subscripts. The next term following this one in the expansion would be

Σ (Y_{i₁}−μ)² ... (Y_{i_{(k/2)−1}}−μ)² (Y_{i_k}−μ),   (3.90)

with all subscripts distinct. The expected value of this term is zero, since the Y_i's are independent and

E(Y_{i_k} − μ) = E(ln X_{i_k}) − E(ln X_{i_k}) = 0.   (3.91)

Thus, all terms with fewer components than (3.88) will have a smaller order than n^{k/2}, and all terms with more components than (3.88) will be zero in expectation. Thus, the highest power of n present in m_k0, for k even, is

(1/n^k)(n^{k/2}) = 1/n^{k/2} = 1/n^{[(k+1)/2]},   (3.92)

so that m_k0 is of order [(k+1)/2] in 1/n. Now let k be odd. By an argument similar to the previous one, the term with highest power of n, which shall be referred to as the "worst" term, in the expansion of E{[Σ_{i=1}^n (Y_i−μ)]^k} is

E[ Σ (Y_{i₁}−μ)³ (Y_{i₂}−μ)² ... (Y_{i_{(k−1)/2}}−μ)² ] = n(n−1)(n−2)...(n−(k−1)/2+1) E[(Y_i−μ)³] {E[(Y_i−μ)²]}^{(k−3)/2},   (3.93)

which has highest power of n equal to n^{(k−1)/2}. Therefore, m_k0, for k odd, has highest power of n

(1/n^k)(n^{(k−1)/2}) = 1/n^{(k+1)/2} = 1/n^{[(k+1)/2]},   (3.94)

so that m_k0 is of order [(k+1)/2] in 1/n. Now, consider m_pq in general.
m_pq = (1/n^{p+q}) E{[Σ_{i=1}^n (Y_i − μ_Y)]^p [Σ_{i=1}^n (Z_i − μ_Z)]^q},   (3.95)

where

Y_i = ln X_i,  Z_i = ln(1−X_i),
μ_Y = ψ(α) − ψ(α+β) = E(ln X_i), and
μ_Z = ψ(β) − ψ(α+β) = E[ln(1−X_i)].   (3.96)

Assume p ≤ q. Let p and q both be even. Then there are two possible "worst" terms in the expansion,

E[ Σ (Y_{i₁}−μ_Y)² ... (Y_{i_{p/2}}−μ_Y)² (Z_{j₁}−μ_Z)² ... (Z_{j_{q/2}}−μ_Z)² ],   (3.97)

which has highest power of n equal to p/2 + q/2, and

E[ Σ (Y_{i₁}−μ_Y)(Z_{i₁}−μ_Z) ... (Y_{i_p}−μ_Y)(Z_{i_p}−μ_Z) (Z_{j₁}−μ_Z)² ... (Z_{j_{(q−p)/2}}−μ_Z)² ],   (3.98)

which has highest power of n equal to p + (q−p)/2 = p/2 + q/2. Thus, m_pq has highest power of n

(1/n^{p+q})(n^{(p+q)/2}) = 1/n^{(p+q)/2} = 1/n^{[(p+q+1)/2]},   (3.99)

so that m_pq is of order [(p+q+1)/2] in 1/n for p and q even. For p even and q odd, the "worst" terms have powers of n equal to p/2 + (q−1)/2 = (p+q−1)/2, so that m_pq is of order (p+q+1)/2 = [(p+q+1)/2] in 1/n. For p odd and q even, the "worst" terms likewise have powers of n equal to (p−1)/2 + q/2 = (p+q−1)/2, so that m_pq is again of order (p+q+1)/2 = [(p+q+1)/2] in 1/n. Finally, for p and q both odd, the "worst" terms have powers of n equal to (p−1)/2 + (q−1)/2 = (p+q−2)/2 and p + (q−p)/2 = (p+q)/2, so that m_pq is of order (p+q)/2 = [(p+q+1)/2] in 1/n. This completes the proof. Now, if a solution for the vector μ in the matrix equation m = Kμ can be found, then an approximation of order 1/n for E(α̂−α), E(β̂−β), E(α̂−α)², E[(α̂−α)(β̂−β)], and E(β̂−β)² is obtained. Solving the matrix equation involves finding m₂₀, m₁₁, and m₀₂ and the K-matrix and its inverse. Expressions for m and K have been found, but the computation of K⁻¹ and then of μ is done by a computer program. The results for various combinations of α and β are given in Table 2. The K-matrix is found from partial derivatives of K(α̂,β̂) and L(α̂,β̂),

K_ij = [1/(i! j!)] ∂^{i+j} K(α̂,β̂)/∂α̂ⁱ ∂β̂ʲ |_{(α,β)}
and

L_ij = [1/(i! j!)] ∂^{i+j} L(α̂,β̂)/∂α̂ⁱ ∂β̂ʲ |_{(α,β)}.   (3.100)

Since

K(α̂,β̂) = ψ(α̂) − ψ(α̂+β̂) − [ψ(α) − ψ(α+β)]  and  L(α̂,β̂) = ψ(β̂) − ψ(α̂+β̂) − [ψ(β) − ψ(α+β)],

K₁₀ = ψ'(α) − ψ'(α+β)           L₁₀ = −ψ'(α+β)
K₀₁ = −ψ'(α+β)                  L₀₁ = ψ'(β) − ψ'(α+β)
K₂₀ = [ψ''(α) − ψ''(α+β)]/2     L₂₀ = −ψ''(α+β)/2
K₁₁ = −ψ''(α+β)                 L₁₁ = −ψ''(α+β)
K₀₂ = −ψ''(α+β)/2               L₀₂ = [ψ''(β) − ψ''(α+β)]/2   (3.101)

As before, the psi primes and double primes are calculated using Bernoulli series expansions of those functions. The vector m is easily found. First,

m₂₀ = Var[(1/n) Σ_{i=1}^n ln X_i] = (1/n) Var(ln X_i) = (1/n){E[(ln X_i)²] − [E(ln X_i)]²}
    = (1/n){E[(ln X_i)²] − [ψ(α) − ψ(α+β)]²},   (3.102)

and

E[(ln X_i)²] = ∫₀¹ [Γ(α+β)/(Γ(α)Γ(β))] X_i^{α−1}(1−X_i)^{β−1} (ln X_i)² dX_i
            = [Γ(α+β)/(Γ(α)Γ(β))] (∂²/∂α²)[Γ(α)Γ(β)/Γ(α+β)]
            = ψ'(α) − ψ'(α+β) + [ψ(α) − ψ(α+β)]².   (3.103)

Therefore,

m₂₀ = (1/n)[ψ'(α) − ψ'(α+β)].   (3.104)

Similarly,

m₁₁ = E{[(1/n) Σ ln X_i − E((1/n) Σ ln X_i)][(1/n) Σ ln(1−X_i) − E((1/n) Σ ln(1−X_i))]}
    = (1/n) Cov[ln X_i, ln(1−X_i)]
    = (1/n){E[ln X_i ln(1−X_i)] − E(ln X_i) E[ln(1−X_i)]}
    = −(1/n) ψ'(α+β),   (3.105)

and

m₀₂ = (1/n){E[(ln(1−X_i))²] − [E(ln(1−X_i))]²} = (1/n)[ψ'(β) − ψ'(α+β)].   (3.106)

This gives an approximation of order 1/n for E(α̂−α), E(β̂−β), E(α̂−α)², E[(α̂−α)(β̂−β)], and E(β̂−β)². The accuracy of this approximation depends on the magnitude of the coefficients of the 1/n² terms, the 1/n³ terms, and so forth in the expansion. An idea of how well the approximation works can be gathered by looking at Table 2. Notice that the problem is symmetric in α and β, so that it is only necessary to find the biases, variances, and covariance for combinations of α = a and β = b; the results for α = b and β = a follow from the symmetry. Since Var(α̂) = E(α̂−α)² − [E(α̂−α)]² and Var(α̂) > 0, this provides a check on the accuracy of the approximation for a given sample size n.
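The entries (3.104) through (3.106) of the vector m are immediate to compute once the trigamma function is available. A sketch, with illustrative names:

```python
def trigamma(x):
    # psi'(x): recurrence plus an asymptotic (Bernoulli) series tail
    s = 0.0
    while x < 8.0:
        s += 1.0 / (x * x)
        x += 1.0
    inv2 = 1.0 / (x * x)
    return s + (1.0 + 1.0/(2*x) + (1/6 - inv2*(1/30 - inv2/42)) / x) / x

def moment_vector(a, b, n):
    """m20, m11, m02 of (3.104)-(3.106): the order-1/n moments of k and l."""
    ta, tb, tab = trigamma(a), trigamma(b), trigamma(a + b)
    m20 = (ta - tab) / n    # Var of (1/n) sum ln X_i
    m11 = -tab / n          # Cov of the two log-averages, always negative
    m02 = (tb - tab) / n    # Var of (1/n) sum ln(1 - X_i)
    return m20, m11, m02
```

Since m₂₀ and m₀₂ are variances and m₁₁ a covariance, any implementation must satisfy m₂₀ > 0, m₀₂ > 0, m₁₁ < 0, and m₁₁² ≤ m₂₀ m₀₂, which gives a quick sanity check.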
For example, consider the first entry in Table 2, for α = β = .1. Here E(α̂−α) = .20503/n and E(α̂−α)² = .01515/n. Evidently, for n = 1 the approximation is not good, since the bias squared is then larger than the expected mean square. However, for n = 10, E(α̂−α)² = .001515 and E(α̂−α) = .020503, so that the bias squared is smaller than the expected mean square, which makes the variance of α̂ positive. Consider now the entry for β = .1, α = 10. Here E(α̂−α) = 111.5135/n and E(α̂−α)² = 1013.568/n. For n = 10, the bias squared is still larger than the expected mean square. A sample size of at least 13 is needed before the bias squared becomes smaller than the expected mean square. For a sample of size 20 the bias squared would be 31.088 and the expected mean square would be 50.6784, so that Var(α̂) would be 19.5904. From Table 2, it can be seen that for larger values of β a sample size of 5 or 6 is sufficient for the bias squared to be smaller than the expected mean square. For example, take the case where α = 5 and β = 1. Here E(α̂−α) = 16.1605/n and E(α̂−α)² = 57.0243/n. A sample size of 5 is large enough so that the expected mean square is larger than the bias squared, and for a sample of size 20, [E(α̂−α)]² = .6529 and E(α̂−α)² = 2.8512. Thus, Var(α̂) = 2.1983. For most parameter values, then, one needs a sample of only about 20 observations to have a reasonable approximation to the biases, variances, and covariance of the estimators. To obtain approximations of order 1/n for E[α̂/(α̂+β̂)] and Var[α̂/(α̂+β̂)], an expansion of the function α̂/(α̂+β̂) in a Taylor series is needed. Writing g(α,β) = α/(α+β),

α̂/(α̂+β̂) = α/(α+β) + (α̂−α) ∂g/∂α + (β̂−β) ∂g/∂β + [(α̂−α)²/2] ∂²g/∂α² + (α̂−α)(β̂−β) ∂²g/∂α∂β + [(β̂−β)²/2] ∂²g/∂β².   (3.107)

Since all terms of the expansion not included in the approximation are of higher order than 1/n, an approximation to E[α̂/(α̂+β̂)] need only include these six terms.
Thus,

E[α̂/(α̂+β̂)] ≅ α/(α+β) + E(α̂−α) β/(α+β)² − E(β̂−β) α/(α+β)² − E(α̂−α)² β/(α+β)³ + E[(α̂−α)(β̂−β)](α−β)/(α+β)³ + E(β̂−β)² α/(α+β)³,

and

Var[α̂/(α̂+β̂)] = [∂g/∂α]² Var(α̂) + [∂g/∂β]² Var(β̂) + 2[∂g/∂α][∂g/∂β] Cov(α̂,β̂)
             = [β² Var(α̂) + α² Var(β̂) − 2αβ Cov(α̂,β̂)]/(α+β)⁴.   (3.108)

Again, this approximation is of order 1/n, since all terms of the expansion which are left out of the approximation are of higher order than 1/n when the variance of the estimator is found. The bias and variance of the estimator are included in Table 2 for n = 10 and n = 100.

Geometric Mean Estimator

Another estimator for the mean of the beta distribution is the geometric mean of the observations. The estimator itself is Π_{i=1}^n X_i^{1/n}, which estimates α/(α+β). The bias of the estimator is easily computed directly from the distribution of the sample:

E[Π_{i=1}^n X_i^{1/n}] = ∫₀¹...∫₀¹ (Π_{i=1}^n X_i^{1/n}) Π_{i=1}^n [Γ(α+β)/(Γ(α)Γ(β))] X_i^{α−1}(1−X_i)^{β−1} dX₁...dXₙ
 = [Γ(α+β)/(Γ(α)Γ(β))]^n ∫₀¹...∫₀¹ Π_{i=1}^n X_i^{α+(1/n)−1}(1−X_i)^{β−1} dX₁...dXₙ
 = [Γ(α+β)/(Γ(α)Γ(β))]^n [Γ(α+(1/n))Γ(β)/Γ(α+β+(1/n))]^n
 = [Γ(α+β)Γ(α+(1/n)) / (Γ(α)Γ(α+β+(1/n)))]^n.   (3.109)

Thus, the bias in the estimator is

E[Π_{i=1}^n X_i^{1/n}] − α/(α+β) = [Γ(α+β)Γ(α+(1/n))/(Γ(α)Γ(α+β+(1/n)))]^n − α/(α+β).   (3.110)

The variance can also be computed directly from the distribution:

Var[Π_{i=1}^n X_i^{1/n}] = E[(Π X_i^{1/n})²] − {E[Π X_i^{1/n}]}²
 = [Γ(α+β)Γ(α+(2/n))/(Γ(α)Γ(α+β+(2/n)))]^n − [Γ(α+β)Γ(α+(1/n))/(Γ(α)Γ(α+β+(1/n)))]^{2n}.   (3.111)

Notice that, similar to the arithmetic mean estimator for the mean of the distribution, the bias and variance of the geometric mean are exact expressions. The bias and variance are given for various values of α and β in Table 3.

Comparison of the Estimators

The final section of this chapter will be devoted to a comparison of the three estimators of the mean and the two sets of estimators for the parameters which have been considered. For this, the reader is referred to Tables 1, 2, and 3.
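Since the exact expressions (3.110) and (3.111) above involve ratios of gamma functions that overflow easily, it is natural to evaluate them on the log scale. The following sketch does so via the log-gamma function; the function name is an assumption:

```python
import math

def geometric_mean_bias_var(a, b, n):
    """Exact bias and variance of the geometric mean estimator of the
    beta mean, from (3.110) and (3.111), computed on the log scale."""
    lg = math.lgamma
    # log of E[prod X_i^(1/n)] and of its second moment, cf. (3.109)
    log_EG = n * (lg(a + b) + lg(a + 1.0/n) - lg(a) - lg(a + b + 1.0/n))
    log_EG2 = n * (lg(a + b) + lg(a + 2.0/n) - lg(a) - lg(a + b + 2.0/n))
    EG = math.exp(log_EG)
    bias = EG - a / (a + b)
    var = math.exp(log_EG2) - EG * EG
    return bias, var
```

For α = 5, β = 2, and n = 10 this reproduces (to rounding) the negative bias near −.019 quoted later in the comparison section.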
To determine what parameter values to consider when calculating biases, variances, and covariance for the estimators, the various types of curves for different choices of parameters in the beta distribution were investigated. A beta distribution with parameters α = β < 1 has a U-shaped density, symmetric about the point X = 1/2. If β < α < 1, the distribution is still U-shaped, but skewed to the right. If α < β < 1, the distribution is again U-shaped, but skewed to the left. At α = β = 1, the distribution is identical to the uniform distribution. If α > 1 and β < 1, the density increases with increasing X, increasing more rapidly the larger α is in comparison to β. If α < 1 and β > 1, the density is decreasing in X. If α = β > 1, the distribution is bell-shaped and symmetric about 1/2. If α > β, the distribution is skewed to the right. Finally, if α < β, the distribution is skewed to the left. Thus, to include all types of curves represented by the beta distribution, a range of parameters from .1 to 10 was considered. More values around 1 were included, since that appears to be a critical point at which the beta distribution changes shape. Two sets of estimators for the parameters of the beta distribution, the sample moment estimators and the maximum likelihood estimators, were obtained. Through a first order approximation, the sample moment estimators were found to be unbiased to order 1/n. For the maximum likelihood estimators, for a particular value of one parameter, say β, the bias in β̂ is fairly constant as α varies, while the bias in α̂ increases with increasing α. For example, for β = .1 and α ranging from .5 to 10, E(β̂−β) ranges from .15955/n to .16834/n. For the same parameter values, E(α̂−α) ranges from 2.4301/n to 111.5135/n. For α = β = .1, E(α̂−α) = E(β̂−β) = .20503/n.
The results for other values of β are similar, and since the problem is symmetric in α and β, similar results would be obtained for particular values of α. Of course, the maximum likelihood estimators are asymptotically unbiased. Looking now at the variances and covariance of the first order approximation for the sample moment estimators, for a particular value of β and α ranging from .5 to 10, Var(β̂) increases gradually with increasing α. For example, at α = .5 and β = .1, Var(β̂) = .0497/n, and at α = 10 and β = .1, Var(β̂) = .15777/n. For the same parameter values Var(α̂) increases much more rapidly. At α = .5 and β = .1, Var(α̂) = 1.4159/n, and at α = 10 and β = .1, Var(α̂) = 2329.66/n. The covariance of α̂ and β̂ falls in between, ranging from .1534/n at α = .5 and β = .1 to 14.942/n at α = 10 and β = .1. The results are similar for other parameter values. For the maximum likelihood estimators, the expected mean squares and products, E(α̂−α)², E(β̂−β)², and E[(α̂−α)(β̂−β)], measure the expected values of the squares and products of the deviations of the estimators from the true parameter values. For the sample moment estimators, the variances and covariance measure these deviations, since the sample moment estimators are unbiased to first order. Thus, it would seem reasonable to compare the variances and covariance of the sample moment estimators with the expected mean squares and products of the maximum likelihood estimators. For the parameter values considered, the expected mean squares and products behave quite similarly to the variances and covariance of the sample moment estimators. For example, for β = .1 and α ranging from .5 to 10, E(α̂−α)² ranges from .85955/n to 1013.568/n, E(β̂−β)² ranges from .01141/n to .01094/n, and E[(α̂−α)(β̂−β)] ranges from .03196/n to 1.041/n. Two differences from the sample moment estimators are apparent. First, E(β̂−β)² decreases slightly as α increases.
Second, and more important, the expected mean squares and products are somewhat smaller than the variances and covariance for the sample moment estimators. The proportional differences appear to be greatest for small parameter values. For example, at α = .5 and β = .1,

Var(α̂) = 1.4159/n       E(α̂−α)² = .85955/n
Var(β̂) = .0497/n        E(β̂−β)² = .01141/n
Cov(α̂,β̂) = .1534/n     E[(α̂−α)(β̂−β)] = .03196/n   (3.112)

and at α = 5 and β = 2,

Var(α̂) = 56.5833/n      E(α̂−α)² = 50.507/n
Var(β̂) = 8.2367/n       E(β̂−β)² = 6.9664/n
Cov(α̂,β̂) = 18.55/n     E[(α̂−α)(β̂−β)] = 15.782/n   (3.113)

Thus, the maximum likelihood estimators are somewhat more exact than the sample moment estimators. Asymptotically, Var(α̂), Var(β̂), and Cov(α̂,β̂) are identical to the first order approximations of E(α̂−α)², E(β̂−β)², and E[(α̂−α)(β̂−β)]. Hence, the maximum likelihood estimators are also more exact estimators asymptotically than the sample moment estimators. For the mean of the beta distribution, the sample moment estimator is unbiased. If α = β, the first order approximation of the maximum likelihood estimator is also unbiased. For α > β and a particular value of β, the bias is sometimes a decreasing function of α and sometimes an increasing function of α. For example, for β = .1, the bias of α̂/(α̂+β̂) ranges from .2003/n to .01455/n as α increases from .5 to 10. For β = .5, the bias increases from .1107/n to .1418/n and then decreases to .0574/n as α ranges from 1 to 10. For α < β, the bias is simply the negative of the bias for the same combination with α > β. For example, for α = .5 and β = .1 the bias is .2003/n, and for α = .1 and β = .5 it is −.2003/n. For estimating the mean, a third estimator, the geometric mean, was considered. Since the expected value and variance of the geometric mean are complicated functions of n, they were evaluated for various values of n. The biases are consistently negative and larger in absolute value than the biases for the first order approximation to the maximum likelihood estimators.
For example, for α = 5, β = 2, and n = 10, the bias is −.01892 for the geometric mean and .01424 in absolute value for the first order approximation to the maximum likelihood estimator. The variance of the sample moment estimator of the mean decreases as α increases for a particular value of β. For example, for β = .1 and α ranging from .5 to 10, the variance of the sample moment estimator of the mean ranges from .08681/n to .00088/n. A similar relation holds for the variance of the first order approximation of the maximum likelihood estimator. For β = .1 and α ranging from .5 to 10, Var[α̂/(α̂+β̂)] ranges from .04312/n to .00079/n. As one can see, the maximum likelihood estimator has smaller variance than the sample moment estimator. These results are typical of other parameter combinations. For almost all parameter values, the geometric mean estimator has a larger variance than either the maximum likelihood estimator or the sample moment estimator. As a check on the comparison between the sample moment estimator of the mean and the maximum likelihood estimator of the mean, some simulation work was done. From a beta distribution with parameters α = 3 and β = 5, 100 samples of size 5 and 100 samples of size 20 were generated on a computer. The two estimators of the mean were then calculated for each sample. Finally, to compare the estimators, the mean square error of each estimator for the samples of size 5 and for the samples of size 20 was calculated. Let M₁ be the sample moment estimator of the mean. The mean square error for the sample moment estimator is then

Σ_{i=1}^{100} [M₁ᵢ − α/(α+β)]² / 100.   (3.114)

If α̂/(α̂+β̂) is the maximum likelihood estimator of the mean, then its mean square error is

Σ_{i=1}^{100} [α̂ᵢ/(α̂ᵢ+β̂ᵢ) − α/(α+β)]² / 100.   (3.115)

For samples of size 5 from a beta distribution with parameters α = 3 and β = 5, the mean square error for the sample moment estimator was .0070148, and the mean square error for the maximum likelihood estimator was .0056575.
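The dissertation's simulation used both estimators of the mean. As a reduced sketch, the code below estimates by Monte Carlo only the mean square error of the sample moment estimator M₁, mirroring the 100-sample design in (3.114). The seed, the function name, and the use of Python's built-in beta generator are assumptions, so the exact figures quoted in the text will not be reproduced.

```python
import random

def mse_sample_mean(a, b, n, reps=100, seed=1974):
    """Monte Carlo MSE of the sample mean as an estimator of a/(a+b),
    following the 100-sample design of (3.114)."""
    rng = random.Random(seed)       # fixed seed for reproducibility
    mean = a / (a + b)
    total = 0.0
    for _ in range(reps):
        xs = [rng.betavariate(a, b) for _ in range(n)]
        m1 = sum(xs) / n
        total += (m1 - mean) ** 2
    return total / reps

# Exact benchmark for comparison: Var(M1) = ab / ((a+b)^2 (a+b+1) n),
# about .00521 at n = 5 and .00130 at n = 20 for a = 3, b = 5.
```

With only 100 replications the Monte Carlo estimate scatters noticeably around the exact variance, which is consistent with the small discrepancies between the dissertation's simulated figures and the theoretical values.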
For samples of size 20, the mean square error for the moment estimator was .0010910, and the mean square error for the maximum likelihood estimator was .0010896. Thus, in both cases the maximum likelihood estimator more closely estimated the mean of the distribution, although the difference was very small for samples of size 20. Thus, the choice of an estimator for either the parameters or the mean of the beta distribution appears to depend on the characteristics one desires in the estimator. If one wishes to have unbiased estimators, at least to order 1/n, then one should use the sample moment estimators. If one desires smaller variance or expected mean square in the estimators and can tolerate some bias, which decreases with n, then one should use the maximum likelihood estimators. However, the savings in expected mean squares may not be enough to offset the difficulty in obtaining the maximum likelihood estimators. If that is the case, then the sample moment estimators provide easy-to-calculate, unbiased estimators for the parameters and the mean of the beta distribution.

[Tables 1, 2, and 3: biases, variances, covariances, and expected mean squares of the estimators for various combinations of α and β. The tabulated values are not recoverable from this scan.]
CHAPTER 4
HYPOTHESIS TESTING FOR THE BETA DISTRIBUTION

Introduction

As in the estimation problem, since the Dirichlet or generalized Dirichlet distribution is so closely related to the beta distribution, the problem of developing a test of hypothesis was considered for the beta distribution. In this chapter, tests will be developed for two cases. First, a test about one parameter assuming the other is known, and second, a test about one parameter leaving the other unspecified will be given. Since the beta distribution is symmetric in its parameters, a test about one parameter, say α, could easily be adapted to test the other parameter, β. Thus, all tests will be constructed on the parameter α. Finally, it will be shown that the test for α leaving β arbitrary can be adapted to test an hypothesis about the mean of the beta distribution, α/(α+β). The types of hypotheses to be considered are generally known as one-sided and two-sided hypotheses. It is desired to test either that α is less than some constant, α is greater than some constant, or α is equal to some constant, both fixing and not fixing β. It is further desired to find the best possible test for each of these hypotheses.
Fortunately, because the beta distribution is a member of the exponential class of distributions, such a best test can be found.

One-sided Tests for α When β Is Known

In this section, one-sided tests of hypotheses about α when β is specified will be developed. From Ferguson (1967), consider the following definitions.

Definition 4.1: A test φ of H₀: θ ∈ Θ₀ against H₁: θ ∈ Θ₁ is said to have size a if

    sup_{θ∈Θ₀} E_θ φ(X) = a.

Definition 4.2: A test φ₀ is said to be uniformly most powerful (UMP) of size a for testing H₀: θ ∈ Θ₀ against H₁: θ ∈ Θ₁ if φ₀ is of size a and if, for any other test φ of size at most a,

    E_θ φ₀(X) ≥ E_θ φ(X)  for each θ ∈ Θ₁.

For the beta distribution, a typical hypothesis to be considered is H₀: α ≥ α₀ against H₁: α < α₀. To develop a test of such a hypothesis, consider the following theorem from Ferguson.

Theorem 4.1: If the distribution of X has monotone likelihood ratio, any test of the form

    φ(X) = 1  if X > x₀
           γ  if X = x₀                                        (5.24)
           0  if X < x₀

has nondecreasing power function. Any test of the form (5.24) is UMP of its size for testing H₀: θ ≤ θ₀ against H₁: θ > θ₀ for any θ₀ ∈ Θ, provided its size is not zero. For every size a, there exist numbers x₀ and γ such that the test (5.24) is UMP of size a for testing H₀: θ ≤ θ₀ against H₁: θ > θ₀. A similar statement would hold for testing the hypothesis H₀: θ ≥ θ₀ against H₁: θ < θ₀.

For the proof of this theorem, the reader is referred to Ferguson (1967). Since the beta distribution is a continuous distribution, γ may be taken as 0. In the notation of Ferguson, the test is given as a function φ(X) which takes values either 0 or 1. If φ(X) is 1, then the null hypothesis is rejected, while if φ(X) is 0, the null hypothesis is not rejected. It should be noted here that the X Ferguson refers to is a sufficient statistic for the parameter in question. The problem is always reduced from a sample of observations to a sufficient statistic or statistics.
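Theorem 4.1 rests on the family having monotone likelihood ratio. For the beta family with β fixed this is easy to confirm numerically: for α₁ > α₀ the density ratio f(x; α₁)/f(x; α₀) is proportional to x^(α₁−α₀) and hence increasing in x. The sketch below is not part of the dissertation; it assumes Python with scipy, and the parameter values are merely illustrative.

```python
# Numerical check of the monotone likelihood ratio property for the beta
# family with beta fixed: the ratio f(x; alpha1) / f(x; alpha0) should be
# increasing in x whenever alpha1 > alpha0.
from scipy.stats import beta as beta_dist

alpha0, alpha1, b = 2.0, 3.5, 4.0   # illustrative values, alpha1 > alpha0
xs = [i / 100 for i in range(1, 100)]
ratios = [beta_dist.pdf(x, alpha1, b) / beta_dist.pdf(x, alpha0, b) for x in xs]

# The ratio is proportional to x ** (alpha1 - alpha0), hence increasing.
assert all(r2 > r1 for r1, r2 in zip(ratios, ratios[1:]))
```

Since the ratio depends on x only through x^(α₁−α₀), the same check succeeds for any β and any α₁ > α₀.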
Consider now the following hypothesis, H₀: α ≥ α₀ against H₁: α < α₀. Since β is known, a UMP test for this hypothesis exists and is of the form

    φ(t) = 1  if t < c
                                                               (4.1)
           0  if t > c

where t is the sufficient statistic for α. Note, for this hypothesis, since β is specified, the distribution contains only one parameter, α. Thus, there is one sufficient statistic for α. To determine the test completely, a sufficient statistic, t, must be found and the constant, c, determined so that the test is of a given size, say .05.

Since the beta distribution with specified β parameter is a member of the exponential class of distributions, the sufficient statistic for α is easily found. The density is

    f(X;α) = [Γ(α+β)/(Γ(α)Γ(β))] X^(α−1) (1−X)^(β−1)          (4.2)

and

    L(X₁,...,Xₙ;α) = [Γ(α+β)/(Γ(α)Γ(β))]ⁿ ∏_{i=1}^n X_i^(α−1) ∏_{i=1}^n (1−X_i)^(β−1)

                   = [Γ(α+β)/(Γ(α)Γ(β))]ⁿ exp{α ln ∏_{i=1}^n X_i}
                     × exp{−ln ∏_{i=1}^n X_i + (β−1) ln ∏_{i=1}^n (1−X_i)}.   (4.3)

Thus, ln ∏_{i=1}^n X_i, or equivalently ∏_{i=1}^n X_i, is a sufficient statistic for α. Thus, the test is of the form

    φ(t) = 1  if ∏_{i=1}^n X_i < c
                                                               (4.4)
           0  if ∏_{i=1}^n X_i > c.

The problem remains to find the constant, c, so that the test is of a given size. Assume a test of size .05 is desired. Then c can be determined from the probability statement Pr[∏_{i=1}^n X_i < c] = .05. To evaluate the constant, some knowledge of the distribution of ∏_{i=1}^n X_i, or an approximation to the distribution of ∏_{i=1}^n X_i or a function of ∏_{i=1}^n X_i, is necessary. Since an exact distribution for ∏_{i=1}^n X_i was not obtainable, several approximations were considered.

Normal Approximation

The first approximation considered was a normal approximation. The probability statement Pr[∏_{i=1}^n X_i < c] = .05 is equivalent to Pr[(Σ_{i=1}^n ln X_i)/n < (ln c)/n] = .05. Since (Σ_{i=1}^n ln X_i)/n is the mean of the ln X_i's, asymptotically the distribution of (Σ_{i=1}^n ln X_i)/n will be normal with some mean, μ, and some variance, σ²/n. Thus,
    Pr[ (Σ_{i=1}^n ln X_i)/n < (ln c)/n ] = .05                (4.5)

is equivalent to

    Pr[ Z < ((ln c)/n − μ)/(σ/√n) ] = .05                      (4.6)

where Z is a standard normal random variable. The probability can then be evaluated from standard normal tables and the constant, c, can be found. From standard normal tables, the above probability statement implies

    ((ln c)/n − μ)/(σ/√n) = −1.645                             (4.7)

or

    ln c = [−1.645 σ/√n + μ] n.                                (4.8)

Thus,

    c = exp{[−1.645 σ/√n + μ] n} = exp[−1.645 σ√n + nμ].       (4.9)

All that is needed now is to evaluate μ and σ². Now,

    μ = E[(Σ_{i=1}^n ln X_i)/n] = E(ln X_i) = ψ(α) − ψ(α+β)

and

    σ² = Var(ln X_i) = ψ′(α) − ψ′(α+β)                         (4.10)

from results of Chapter 3. Thus, (Σ_{i=1}^n ln X_i)/n is asymptotically distributed as N{ψ(α) − ψ(α+β), [ψ′(α) − ψ′(α+β)]/n}. The constant, c, is given for various combinations of α, β, and n in Table 4. Where c was too small to be evaluated, ln c is given. For those values, the test could be written as

    φ(t) = 1  if Σ_{i=1}^n ln X_i < ln c
                                                               (4.11)
           0  if Σ_{i=1}^n ln X_i > ln c.

For the hypothesis H₀: α ≤ α₀ against H₁: α > α₀, the UMP test would be of the form

    φ(t) = 1  if ∏_{i=1}^n X_i > c
                                                               (4.12)
           0  if ∏_{i=1}^n X_i < c.

The probability, Pr[∏_{i=1}^n X_i > c] = .05, could again be evaluated by the normal approximation to find an appropriate c.

Beta Approximation

To obtain an approximation to the distribution of ∏_{i=1}^n X_i directly, three methods were attempted. First, the first two moments of ∏_{i=1}^n X_i were equated to the first two moments of a beta random variable. This is intuitively appealing since the beta distribution takes on a wide variety of forms for various choices of its parameters. Since ∏_{i=1}^n X_i lies on the interval (0,1), it will most probably be adequately approximated by some beta distribution.
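The two approximations above can be sketched numerically. The following code is not from the dissertation; it is a sketch in Python with numpy and scipy, with illustrative parameter values and variable names of our own choosing. It computes the .05 critical value c by the normal approximation of equations (4.9) and (4.10), fits a beta distribution to the first two moments of T = ∏X_i (using E[T^k] = (E[X^k])^n under independence), and compares both with a Monte Carlo quantile.

```python
# Sketch of the two approximations to the .05 critical value c for the
# test based on T = prod(X_i), X_i ~ Beta(alpha, beta) i.i.d.
import numpy as np
from scipy.special import digamma, polygamma
from scipy.stats import beta as beta_dist, norm

alpha, b_param, n = 2.0, 3.0, 10   # illustrative values

# Normal approximation, equations (4.9) and (4.10):
# E(ln X) = psi(alpha) - psi(alpha+beta),
# Var(ln X) = psi'(alpha) - psi'(alpha+beta).
mu = digamma(alpha) - digamma(alpha + b_param)
sigma = np.sqrt(polygamma(1, alpha) - polygamma(1, alpha + b_param))
c_normal = np.exp(norm.ppf(0.05) * sigma * np.sqrt(n) + n * mu)

# Beta approximation by matching the first two moments of T.
# Independence gives E[T^k] = (E[X^k])^n with E[X] = alpha/(alpha+beta)
# and E[X^2] = alpha(alpha+1)/((alpha+beta)(alpha+beta+1)).
m1 = (alpha / (alpha + b_param)) ** n
m2 = (alpha * (alpha + 1) / ((alpha + b_param) * (alpha + b_param + 1))) ** n
var = m2 - m1 ** 2
common = m1 * (1 - m1) / var - 1          # solve the Beta(a, b) moment equations
a_hat, b_hat = m1 * common, (1 - m1) * common
c_betafit = beta_dist.ppf(0.05, a_hat, b_hat)

# Monte Carlo reference quantile as a check on both approximations.
rng = np.random.default_rng(1)
T = beta_dist.rvs(alpha, b_param, size=(200_000, n), random_state=rng).prod(axis=1)
c_mc = np.quantile(T, 0.05)
```

By construction the fitted Beta(a_hat, b_hat) reproduces the mean and variance of T exactly; how well its .05 quantile tracks the true one for various α, β, and n is the question the dissertation examines next.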