Citation
Design of paired comparison experiments with quantitative independent variables

Material Information

Title:
Design of paired comparison experiments with quantitative independent variables
Creator:
Offen, Walter William, 1953- ( Dissertant )
Littell, Ramon C. ( Thesis advisor )
Rao, Rejaver V. ( Reviewer )
Agresti, Alan G. ( Reviewer )
Matthews, Richard P. ( Reviewer )
Place of Publication:
Gainesville, Fla.
Publisher:
University of Florida
Publication Date:
Copyright Date:
1980
Language:
English
Physical Description:
x, 181 leaves : ill. ; 28 cm.

Subjects

Subjects / Keywords:
Design efficiency ( jstor )
Eigenvalues ( jstor )
Experiment design ( jstor )
Factorial design ( jstor )
Factorials ( jstor )
Matrices ( jstor )
Maximum likelihood estimations ( jstor )
Minimax ( jstor )
Null hypothesis ( jstor )
Parametric models ( jstor )
Dissertations, Academic -- Statistics -- UF
Paired comparisons (Statistics) ( lcsh )
Statistics thesis Ph. D
Genre:
bibliography ( marcgt )
non-fiction ( marcgt )

Notes

Abstract:
An experiment in which the treatments are compared and ranked pairwise is called a paired comparison experiment. The Bradley-Terry model for preference probabilities with ties not allowed is used in this dissertation. This model defines treatment parameters, tt , ... , 7T , such that the probability that treatment T. is preferred over treatment T. is equal to tt. (tt. +Tr . ) . l-Then the treatments are levels of continuous, quantitative variables, the logarithm of the Bradley- Terry parameters can be modeled as a regression-type model , InTT . = 'iS, X, . . There have been a large nvimber of papers on the topic 1 ^ k ki of paired comparisons , but only a few have considered experimental design. Past papers on design are reviewed in Chapter 1. In 1973 Springall presented the asymptotic distribution of the maximum likelihood estimators of B, , ... ,3 , where s is the number of regression- 1 s type parameters in the model. Chapter 2 of the present dissertation contains a derivation of this asymptotic distribution. The remainder of the dissertation is a consideration of designs for situations where the treatments are levels of a single quantitative variable. The asymptotic variance-covariance matrix is a function of the true parameter values. Hence, the optimal designs also depend on the parameters. In previous papers on design, the Bradley-Terry parameters were taken to be equal when evaluating the variances and covariances. The present dissertation considers the design problem for various parameter values. Notice that the general design problem is complex. It involves deciding how many levels and which particular levels should be chosen. Furthermore, it must be determined how often to compare each pair of treatments. In some cases this complexity was resolved. In other cases additional restrictions had to be imposed before optimal designs could be found. Chapter 3 considers designs which minimize the variance of a single parameter estimate, S.. The linear, quadratic, and cubic models are considered. The optimal design in the linear case turns out to be comparisons of only the smallest level in the experimental region with the largest, for most values of the linear parameter. The designs for the quadratic and cubic cases depend more heavily on the parameter values than for the linear case. Chapter 4 presents optimal designs for fitting a quadratic model. The optimality criteria considered are D-optimality , the minimization of the average variance of Inir , and the minimization of the maximum value of InTT.. These designs also depend heavily on the parameters, although some overall design recommendations are given in the last section. The remaining chapters contain a brief discussion of related topics. Chapter 5 is a consideration of designs which protect against the bias present when the true model is cubic. Oiapter 6 is a discussion of designs for preliminary test estimators. Chapter 7 is a consideration of two-stage sampling. The first stage is a design which results with a small variance of the quadratic parameter estimate. A test of hypothesis then determines whether the second stage design should be one which is optimal for fitting a linear or a quadratic model. A discussion of the best error rate to use for the test of hypothesis is included. The appendices contain computer programs that find maximum likelihood estimates of the Bradlev-Terry parameters and 6,/ ... ,B • These programs are written in the languages APL and Fortran.
Thesis:
Thesis (Ph. D.)--University of Florida, 1980.
Bibliography:
Includes bibliographic references (leaves 178-180).
General Note:
Typescript.
General Note:
Vita.
Statement of Responsibility:
by Walter William Offen.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright [name of dissertation author]. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Resource Identifier:
023428712 ( AlephBibNum )
07188318 ( OCLC )
AAL5316 ( NOTIS )

Downloads

This item has the following downloads:


Full Text


















DESIGN OF PAIRED COMPARISON EXPERIMENTS
WITH QUANTITATIVE INDEPENDENT VARIABLES












BY

WALTER WILLIAM OFFEN


A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF
THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY






UNIVERSITY OF FLORIDA


1980



















Digitized by the Internet Archive
in 2009 witiff~hiing from
University of Florida, George A. Smathers Libraries


http://www.archive.org/details/designofpairedco00offe
















ACKNOWLEDGMENTS


I would like to thank Drs. Ramon C. Littell and William Mendenhall

for encouraging me to work past the Master of Statistics degree to pur-

sue a Ph.D. I sincerely appreciate their concern and beneficial advice.

A special thanks goes to Dr. Ramon C. Littell, first of all for

the attention and help he gave to me in my early years at the University

of Florida while working for him at the Institute of Food and Agricul-

tural Sciences. I especially thank him for the immense amount of time

and effort he contributed to this project. His patience and sincere

interest will always be remembered.

Thanks go to my family for their understanding and support during

the many years of my education. I am especially grateful to my wife,

Susie, for her love and caring, which makes it all worthwhile.

Finally, I would like to thank my typists, Susan and Walter Offen,

who both did an outstanding job of typing this manuscript. Also,

thanks go to Drs. John Cornell, Ramon Littell, Frank Martin, and

Kenneth Portier for allowing my typists to use their office type-

writers.


1ii
















TABLE OF CONTENTS


Page
ACKNOWLEDGMENTS . . . . . .. . . . . . iii

LIST OF TABLES . . . . . . . . . . . . vi

LIST OF FIGURES . . . . . . . . ... . . . .vii

ABSTRACT . . . . . . . . ... . . . . . .viii

CHAPTER

1 DESIGN OF PAIRED COMPARISON EXPERIMENTS: A LITERATURE
REVIEW . . . . . . . . . ..... 1

1.1 Introduction .... . . . . . . . 1
1.2 Bradley-Terry Model . . . . . . . 3
1.3 Estimation of the Bradley-Terry Parameters and
Corresponding Asymptotic Distribution . . 6
1.4 Designs of Factorial Paired Comparison
Experiments . . . . . . . . . 11
1.5 Response Surface Designs . . . . ... .22

2 ASYMPTOTIC THEORY OF THE MAXIMUM LIKELIHOOD ESTIMATORS 26

2.1 Introduction . . . . . . . ... 26
2.2 Estimation .. . . . . . . . .27
2.3 Asymptotic Distribution . . . . . . 29

3 MINIMIZATION OF THE VARIANCE OF A SINGLE PARAMETER
ESTIMATE ........ . . . . . . .... 34

3.1 Introduction . . . . . . . . . 34
3.2 Linear Model .... . . . . . .34
3.3 Quadratic Model ... . . . ... 38

Minimization of var (82) . . . . ... 38

Equivalence of El-Helbawy and Bradley's
method and minimizing the variance . . 50
3.4 Cubic Model .. . . . . . . . 58

4 DESIGNS FOR FITTING A QUADRATIC MODEL . . . ... .65

4.1 Introduction .... . ..... ... .... 65









Page

. 68


4.2 D-optimal Designs . . . . . .
Condition 1 .....
Condition 2 . . . . .
Condition 3 .....
4.3 Average-variance Designs ....
Condition 1 .....
Condition 2 . . . . . .
Condition 3 ......
4.4 Minimax Designs ......
Condition 1 ......
Condition 2 . . . . .
Condition 3 . . . .


4.5 Conclusion and Design Recommendations . .. .128

5 DESIGNS THAT PROTECT AGAINST BIAS . . . . ... 135


Introduction . . . . . . .
All Bias Designs . . . . . .
Integrated Mean Square Error Designs .


. . 135
. . . 136
. . . 138


6 DESIGNS FOR PRELIMINARY TEST ESTIMATORS . . . .. .145

6.1 Introduction . . . .. . . . . .145
6.2 Average-variance Designs for Preliminary Test
Maximum Likelihood Estimators . . . ... .146

7 TWO-STAGE SAMPLING FOR MODEL FITTING . . . . .. .150


Introduction . . . . . . .
Optimal Error Rate . . . . . .
Concluding Remarks . . . . . .


. .. 150
. 151
. .. 157


APPENDIX


A COMPUTER PROGRAM THAT FINDS MAXIMUM LIKELIHOOD ESTIMATES
OF 7i ...." t'r . . . . . . . . .. .. 165

B COMPUTER PROGRAMS THAT FIND MAXIMUM LIKELIHOOD ESTIMATES
OF 8 ,..., s . . . . . . . . . . 170

BIBLIOGRAPHY . . . . . . . . . . . . . 178

BIOGRAPHICAL SKETCH . . . . . . . . .. ... .181















LIST OF TABLES


TABLE Page

1.1 Efficiencies of balanced 2 and 2 factorial paired
comparison designs . . . . . . . . ... 18

3.1 Designs which minimize var(B ) . . . . . . 48

3.2 Designs which minimize var( 3) . . . . . . 64

4.1 D-optimal designs with restriction x=-l,0,1 ...... 90

4.2 D-optimal (local) designs with 2=0 . . . . . 94

4.3 D-optimal designs .. . . . . . . . . .95

4.4 Average-variance designs with restriction x=-l,0,l .. 105

4.5 Average-variance (local) designs with B2=0 ...... 109

4.6 Average-variance designs . . . . . . . .. 110

4.7 Minimax designs with restriction x=-l,0,1 . . ... 119

4.8 Minimax (local) designs with 2=0 . . . . ... 123

4.9 Minimax designs ... . . . . . . ... .124

4.10 Efficiency of designs . . .... ... ...... 133

4.11 Bias of designs .... . . . . . . . 134

5.1 J-criterion designs . . . .. . .. . . 141

5.2 Values of J . . . . . . . . . . . .143


















LIST OF FIGURES


FIGURE


1.1 Depiction of Designs I and II . . . . .

3.1 Designs which minimize var(2) . . . .

4.1 Procedure summary . . . . . . .

4.2 Depiction of possible designs for Condition 1 .

4.3 Locally optimal designs when 2=0 . . . .

7.1 Minimization of the average mean square error -
prior . . . . . . . . . .

7.2 Minimization of the average mean square error -
prior . . . . . . . . . . .

7.3 D-optimality Uniform prior . . . .

7.4 D-optimality Normal prior . . ......


Page

. . 12

. . . 49

. . 69

. . . 75

129

Uniform
. . 161

Normal
. . . 162


F
















Abstract of Dissertation Presented to the Graduate Council
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy


DESIGN OF PAIRED COMPARISON EXPERIMENTS
WITH QUANTITATIVE INDEPENDENT VARIABLES


By

Walter William Offen

August 1980

Chairman: Ramon C. Littell
Major Department: Statistics


An experiment in which the treatments are compared and ranked

pairwise is called a paired comparison experiment. The Bradley-Terry

model for preference probabilities with ties not allowed is used in

this dissertation. This model defines treatment parameters, ...

t such that the probability that treatment T. is preferred over

-1
treatment T. is equal to i. (T7.+r.) When the treatments are levels

of continuous, quantitative variables, the logarithm of the Bradley-

Terry parameters can be modeled as a regression-type model,

In7T = kxki. There have been a large number of papers on the topic
i k kxi'

of paired comparisons, but only a few have considered experimental

design. Past papers on design are reviewed in Chapter 1. In 1973

Springall presented the asymptotic distribution of the maximum likeli-

hood estimators of S1 ... 3 where s is the number of regression-

type parameters in the model. Chapter 2 of the present dissertation

contains a derivation of this asymptotic distribution. The remainder

viii










of the dissertation is a consideration of designs for situations where

the treatments are levels of a single quantitative variable.

The asymptotic variance-covariance matrix is a function of the

true parameter values. Hence, the optimal designs also depend on the

parameters. In previous papers on design, the Bradley-Terry parameters

were taken to be equal when evaluating the variances and covariances.

The present dissertation considers the design problem for various

parameter values.

Notice that the general design problem is complex. It involves

deciding how many levels and which particular levels should be chosen.

Furthermore, it must be determined how often to compare each pair of

treatments. In some cases this complexity was resolved. In other

cases additional restrictions had to be imposed before optimal designs

could be found.

Chapter 3 considers designs which minimize the variance of a

single parameter estimate, S.. The linear, quadratic, and cubic models
1

are considered. The optimal design in the linear case turns out to be

comparisons of only the smallest level in the experimental region with

the largest, for most values of the linear parameter. The designs for

the quadratic and cubic cases depend more heavily on the parameter

values than for the linear case.

Chapter 4 presents optimal designs for fitting a quadratic model.

The optimality criteria considered are D-optimality, the minimization

of the average variance of Inr., and the minimization of the maximum

value of InT.. These designs also depend heavily on the parameters,
1

although some overall design recommendations are given in the last

section.









The remaining chapters contain a brief discussion of related top-

ics. Chapter 5 is a consideration of designs which protect against the

bias present when the true model is cubic. Chapter 6 is a discussion

of designs for preliminary test estimators. Chapter 7 is a considera-

tion of two-stage sampling. The first stage is a design which results

with a small variance of the quadratic parameter estimate. A test of

hypothesis then determines whether the second stage design should be

one which is optimal for fitting a linear or a quadratic model. A

discussion of the best error rate to use for the test of hypothesis is

included.

The appendices contain computer programs that find maximum likeli-

hood estimates of the Bradley-Terry parameters and 1, .... ,s. These

programs are written in the languages APL and Fortran.
















CHAPTER 1

DESIGN OF PAIRED COMPARISON EXPERIMENTS:
A LITERATURE REVIEW


1.1 Introduction


An experiment in which the treatments are compared and ranked

pairwise is called a paired comparison experiment. Such an experiment

requires a number of subjects which are usually individuals, but could

conceivably be a machine, for instance. Each subject is required to

compare two samples and decide which of the two is preferable for the

attribute under study. For example, each subject could be asked to

state which of two samples tastes better. Paired comparison experi-

ments are often conducted in areas such as food testing where it may be

difficult for an individual to quantify his like or dislike of a sample,

but he may be able to readily decide which of the two samples is pre-

ferred.

Suppose that an experimenter wants to determine which of four

brands of coffee most consumers prefer. There are a number of experi-

mental procedures available. One possibility is that each individual

be required to taste and rank all four brands from least favorable to

most favorable. However, sensory fatigue may be a problem because

after tasting the second or third sample, it may be difficult to per-

ceive any difference between the subsequent sample and the first sample

tasted.









Another possible procedure is to have each individual assign to

each brand a score on some numerical scale. Sensory fatigue may again

be a problem with this experimental procedure since all four brands are

tasted by each individual. Additionally, there may be a degree of ar-

bitrariness to choosing a numerical score for each treatment.

A third possibility is to conduct a paired comparison experiment.

The problems mentioned in the two preceding paragraphs are not present

for paired comparison experiments. That is to say, paired comparison

experiments have a minimal amount of individual guesswork involved.

As pointed out in the first paragraph, the complete experiment involves

a number of individuals, each one choosing out of two samples the one

which is preferable. For the particular example with four brands of

coffee, 24 subjects might be utilized, thereby resulting in 4 repli-

cates of each of the 6 distinct pairs of treatments.

The treatments can have a factorial arrangement. However, facto-

rial paired comparison experiments have often been analyzed ignoring

the factorial arrangement of the treatments. For example, Larmond,

Petrasovits, and Hill (1968) conducted a 2x3 factorial paired compari-

son experiment, but they did not test for interaction or main effects

as normally is done with factorial experiments. Although Abelson and

Bradley (1954) presented a theoretical development, only relatively

recently have tractable methods been available for testing interaction

and main effects in factorial paired comparison experiments.

In the case of quantitative treatments, an analogue of multiple

linear regression analysis is appropriate. For example, instead of

four distinctly different brands of coffee, there may only be one brand










of coffee with four different levels of an additive. The design of

paired comparison experiments for fitting response surfaces with one

independent variable is discussed in Chapters 3-7.


1.2 Bradley-Terry Model


Denote the t treatments as T ,T ... ,Tt. Let ni.. 0 be the

number of times T. and T. are compared, i 1 3
T. T. for "T. is selected over T.", where selection is based on the
1 ] 1 3
particular attribute under study. A general model defines ( ) function-

ally independent parameters, 0 s it.. 1, such that



P(T. T.) = T ., U.. + .. = 1, ij, i,j=l,...,t. (1.2.1)



Each distinct pair of treatment comparisons is a set of independent

Bernoulli trials. The entire experiment is a combination of the (2)

independent sets.

Bradley and Terry (1952) proposed a basic model for paired compar-

isons. A treatment parameter T. is associated with each treatment T.,
l 1
i=, ... ,t. The Bradley-Terry model is then defined to be



P(T. T.) = .. = i./(T. + T.) i7j, i,j=l,...,t. (1.2.2)
1 j 13 1 1 3



The right-hand side of (1.2.2) is invariant under scale changes, so a

single constraint is imposed to make the treatment parameters unique.

Two popular choices are Z rT = 1, or Z In 7. = 0. This constraint is
replaced with another constraint in Chapters 2-7.
replaced with another constraint in Chapters 2-7.










Another model for paired comparisons was presented by Thurstone

(1927). He used the concept of a subjective continuum on which only

order can be perceived. Let p. represent the location point on the

continuum for treatment T., i=l, ... ,t. An individual receives a
1

sensation X. in response to treatment T., with X. distributed
1 1 1

Normal(J.,l). The probability that T. is preferred over T. is then
1 1




P(T. T.) = P(X. > X.) =- e /2 dy i (1.2.3)
1 J3 1 3 i,j=l,...,t.
-(i.-UJ.)



Bradley (1953) noted that replacement of the normal density function

in (1.2.3) by the logistic density function yields




P(T. T) = 1sech2 (y/2) dy = /(+) j (1.2.4)
J sech(y/ i J i,j=l,...,t.
-(InT.-lnT.)
1 J


This is the Bradley-Terry model with U.=lnr., i=l, ... ,t.
1 1

Thompson and Singh (1967) gave a psychophysical consideration by

assuming a treatment T stimulates unobservable signals UI, ... ,U

which are then transmitted to the brain by m sense receptors. The U.

are independent and identically distributed, and m is taken to be fixed

and large. The two variables considered for the response X to treatment

T are


(i) X is the average of the signals U1, ... ,U

(ii) X is the maximum of the signals U1, ... ,U .
1 m









By the Central Limit Theorem, asymptotic normality results for X in

(i), which results with Thurstone's model. The random variable in (ii)

has one of three distributions depending on the distribution of the U..
1

If U., for example, is distributed exponentially, then the asymptotic
1

distribution of X is the extreme value distribution. In any case, how-

ever, the Bradley-Terry model in (1.2.4) results from (ii).

Rao and Kupper (1967) generalized the Bradley-Terry model to allow

for ties. A tie parameter 8 2 1 is introduced, and the model becomes




P(T. T) 1 sechy/2) dy = r./(7T.+7.) i, j
1 3 4 1 1 ] i j=l.
-(lniT.-lnTj)+n



(lnTr.-lnr.)+-f
T 1 2
(T. = T = 1 sech (y/2) dy (1.2.5)
S] 4
(InTT. -InTT.)-n

ij




(7T. + 6T.) (6O s. + 7T.)



where ni=ln6 is the threshold of perception for the judges. That is to

say, if it is hypothesized that the two samples generate a random vari-

able d which measures the difference between the samples, and further-

more that Idl < n, then the judge will not be able to discern a differ-

ence between the two samples, and will thus declare a tie. Note that

the Bradley-Terry model is a special case with 8=1. Another model to

handle ties was proposed by Davidson (1970).










The Bradley-Terry model has received much attention in statistical

literature since it was introduced in 1952. For the remainder of this

dissertation, the Bradley-Terry model with ties not allowed is used.


1.3 Estimation of the Bradley-Terry Parameters
and Corresponding Asymptotic Distribution


When the model was proposed by Bradley and Terry, it was assumed

that all of the n.. were equal. Dykstra (1960) extended the method to
13

permit unequal n...

The parameters are estimated by the maximum likelihood method and

likelihood ratio tests are developed. Assuming independence of the

paired comparisons, the complete likelihood function is



t a. t n..
L = H / H (T. + i .) (1.3.1)
i=l i


where a. is the total number of times T. is selected over any other
1 i

treatment, i=l, ... ,t. The likelihood equations are



t
a. n..
p .i = 0 i=l,...,t,
Pi (Pi + Pj)
J 3
j i

(1.3.2)
t

SPi= 1 ,
i=



where p.i denotes the maximum likelihood estimate of i'', i=l, ... ,t.
1 1










These equations must be solved iteratively (see Appendix A for an

APL computer program which finds maximum likelihood estimates of the

treatment parameters). A brief description of the iterative procedure

follows. Letting p(k) be the kth approximation to pi, then


t
(k) *(k) *(k)
Pi = i / Pi (1.3.3)
i=l

where

t
*(k) a (k-l) (k-l) -- 1.3.
.= / nij/(P + P ) k=1,2,.--. (1.3.4)
Pi

j
j#i

(0)
The iteration can be started with p. =l/t, i=l, ... ,t. Dykstra (1956)

(0)
suggested a method of obtaining good initial parameter estimates p ,

i=l, ... ,t.

Ford (1957) proposed the model independently and proved that the

iterative procedure converged to a unique maximum for L if the following

assumption is true.


Assumption 1.1. For every possible partition of the treatments into

the two sets S1 and S2, some treatment in S1 is preferred at least once

to some treatment in S2'


If Assumption 1.1 does not hold, then a. must be zero for at least one
1
i. If exactly one a. is zero, say a. there may still be a unique
0
solution with p. =0. However, Ford proceeded to explain that if the
o

set S2 which violates Assumption 1.1 contains two or more treatments,

then the corresponding estimates of the Bradley-Terry treatment









parameter must be zero, and consequently individual members of 52 could

not be compared. This fact is also evident by noticing that in (1.3.4),

*(k)
pi would be zero for at least two values of i, and so by (1.3.3),
(k)
Pi would also be zero for the same values of i. Then the next time

in the iterative procedure that (1.3.4) is executed, the denominator

in the summation would be zero for some i and j, and hence the itera-

tive procedure would not converge.

The asymptotic theory and tests of hypothesis require the follow-

ing additional assumption.


Assumption 1.2. For every partition of the treatments into two non-

empty subsets S1 and S2, there exists a treatment T ES1 and a treatment

T.S such that j. >0, where


n..

ij N
ij = N-lim N itj, i,j=l.....t,




N = ni-

i
Bradley (1955) investigated asymptotic properties of the treatment

parameters with each n.,=n. Davidson and Bradley (1970) considered a

multivariate model where each subject chooses one of two samples on more

than one attribute. As a special case of the multivariate model, David-

son and Bradley obtained a more general result, allowing the n.. to be

different. They showed that the asymptotic joint distribution of

vN(p 1-r), .....i./(p -r ) is the singular multivariate normal distribu-

tion of dimensionality (t-1) in a space of t dimensions, with zero mean









vector and variance-covariance matrix Z = ( ..), where
3


7.. = cofactor of A.. in
13 13


13 txt txl



1'xt 0



13 txt tx1



1' 0
-- xt


(1.3.5)


1' = (1,1, ... ,1), the t-dimensional unit row vector, and


t
_ii i= Z 'J i=l,..,t,
ji (7T + T" )

Jti


A = -U i/(i + T)2
13 13 1


It is apparent that the variances and covariances in (1.3.5) depend on

the true treatment parameters. Hence estimates of the variances and

covariances are usually found by substituting the consistent maximum

likelihood estimates for 71' "'. t into (1.3.5).

In the special case with i.=l/t, n..=n for all i,j,
1 13


0.. = 2(t-l) /t3 and a


= -2(t-l)/t3 ij.


(1.3.6)


Use of (1.3.6) is adequate if 7T, ... 7 t are approximately equal.


ij, i,j=l,...,t.








The major test proposed is a test of treatment equality,


H 0 i = = it = 1/t ,

versus

H : T. f j for some i,j, i/j, i,j=l,...,t.
a i 3

The likelihood ratio statistic is



-21nX = 2Nln2 2 [ nijln(p +p ) a In pI.

i
Under the null hypothesis, H -21nA is asymptotically central chi-
o
square with (t-l) degrees of freedom. Bradley and Terry (1952) and

Bradley (1954) provided tables for t=3,4,5, including exact signifi-

cance levels. Comparison of the chi-square significance levels with

the tabled exact significance levels suggests that the former may be

used even for relatively small values of n... This is perhaps compar-

able to using the normal approximation to the binomial.

The asymptotic distribution under the alternative hypothesis is

also available for -21nX. Let



6 t
1 iN
S= -+ = 0 lim 6. = 6.
i=l


Then -21nA is asymptotically noncentral chi-square with (t-l) degrees

of freedom and noncentrality parameter i where



92
1 4 Iij(i 6
i








1.4 Designs of Factorial Paired Comparison Experiments


Littell and Boyett (1977) compared two designs for RxC factorial

paired comparison experiments. The two designs considered are:


Design I: Consists of independent paired comparisons of each of the
RC
( ) pairs of treatments.


Design II: Consists of comparisons between pairs of treatment com-

binations differing only in the levels of one factor, i.e. comparing

two treatments in the same row or column.


Note that there are (R) = -RC(RC-1) distinct pairs of treatments
2 2

for Design I, but only R( ) + C( ) =RC(R+C-2) distinct pairs of treat-

ments for Design II. In the 4x4 case, for example, Design I requires

120 distinct paired comparisons whereas Design II requires only 48

distinct paired comparisons. The two designs could entail the same

total number of paired comparisons, but Design II requires preparation

of fewer distinct pairs of samples.

Design II is also preferable from a computational point of view

because its analysis can be carried out by using a computer program

capable of handling the one-way classification with unequal replication

of pairs. A specialized factorial program is required for Design I to

obtain maximum likelihood estimates under the main effects hypothesis,

H : T., = .B., where T., is the Bradley-Terry treatment parameter
0 13 1 3 1]
given by (1.4.1).

The two designs are illustrated for the 2x2 case in Figure 1.1.

The comparisons made with Design I are indicated by the solid arrows.

The comparisons made with Design II are indicated by the dotted arrows.

















Factor B


high


cE--- --z


Figure 1.1. Depiction of Designs I and II


low


Factor A


high









Designs I and II were compared by Littell and Boyett on the basis

of the asymptotic relative efficiencies, in the Bahadur sense, of their

respective likelihood ratio tests. That is, test statistics are com-

pared on the basis of the ratio of the exponential rates of convergence

to zero of their observed significance levels. In general, the Bahadur

efficiency of test 1 relative to test 2 is defined to be



lim(- -2 log L ())
n n
E = (2) = cl ()/c2 (e)
lim(- log L )
n n
rrto


(i)
where L is the observed significance level of test i. The function

c. () is called the exact slope of test i.

Usually Bahadur efficiency is used to compare two test statistics

that are based on the same sample space. However, its use is equally

appropriate when the sample spaces are different. The two designs

should be compared on the basis of an equivalent number of comparisons

in each. Therefore, a "single observation" with Design I will be taken

as n = R+C-2 comparisons of each pair in the design, and a "single

observation" with Design II will be taken as niI = RC-1 comparisons of

each pair in the design. This gives -RC(RC-1)(R+C-2) total comparisons

in each design.

Let T.. represent the treatment with the first factor at level i

and the second factor at level j. The Bradley-Terry model is then



P(T. Tkl) = T ij i + k ik=l" (1.4.1)
j,1=1,...,C.









The following tests of hypotheses were considered by Littell and

Boyett.


Test 1: H : = 1/RC H : 7. = .B .
o 1i a 13 x]
This is a test for detecting two main effects versus no treatment dif-

ferences. Efficiencies were tabulated for the 2x2 case (Littell and

Boyett (1977), Table 1). The table showed that Design I is more effi-

cient than Design II for most combinations of a and 3.


Test 2: H : T.. = 1/RC, H : .,. = 8./R.
o 13 a 13 3
This is a test for detecting a single main effect versus no treatment

differences. For this test, Design I was shown to be uniformly more

efficient than Design II.


Test 3: H : T.. = ./R H : i. = aB .
o j3 3 a 13 i j
This is a test for detecting two main effects versus a single main ef-

fect. Efficiencies were again tabulated for the 2x2 case. The table

showed that Design I is more efficient than Design II for most combina-

tions of a and B.


Test 4: H : T.. = a.6. H : T.. 0 .
o 13 i 3 a 1]
This is a test for detecting interaction versus two main effects. The

results showed that Design II is uniformly superior to Design I for

detecting interaction.


Summarizing, when testing for main effects (Tests 1-3), Design I

is usually more efficient than II. When testing for interaction,

Design II is uniformly more efficient than I.










Littell and Boyett suggested that it might be a good idea for the

experiment to be conducted in two stages. The first stage is a number

of replicates of Design II followed by a test for interaction. If it

is significant, the second stage is completed by again using Design II,

and simple effects are analyzed by applying one-way analyses within rows

and columns. If interaction is not significant, then the second stage

consists of replicates of pairs in Design I that are not in Design II,

and main effects are analyzed.

The total number of comparisons to be made in the experiment can

be divided into the two parts such that after the experiment is complet-

ed, the end result is either a complete Design I or Design II. For

example, suppose the treatments form a 2x2 factorial, and resources are

available for 24 comparisons. In the first stage, the experimenter runs

4 replicates of the four comparisons in Design II (see Figure 1.1).

If interaction is significant, two more replicates of the four compari-

sons in Design I that are not in Design II are run. The latter then

results in the same comparisons made as if a single stage experiment

using Design I had been conducted.

Quenouille and John (1971) also considered designs for 2n facto-

rials in paired comparisons. However, they assumed an individual is

asked to assign a number on a scale to signify the degree of difference

in the two samples instead of merely deciding which sample possesses

more of the particular attribute under study. This model was developed

by Scheffe (1952).

In particular, Quenouille and John discussed the following 23

factorial. The 28 possible paired comparisons were divided into 7 sets









of 4 blocks as follows:


Set (1) : (I,ABC) (A,BC) (B,AC) (C,AB)

Set (2): (I,AB) (A,B) (C,ABC) (AC,BC)

Set (3) : (I,AC) (A,C) (B,ABC) (AB,BC)

Set (4): (I,BC) (A,ABC) (B,C) (AB,AC) (1.4.2)

Set (5) : (I,A) (B,AB) (C,AC) (BC,ABC)

Set (6) : (I,B) (A,AB) (C,BC) (AC,ABC)

Set (7) : (I,C) (A,AC) (B,BC) (AB,ABC)


The letter "I" represents the treatment where all factors are at their

low levels. If a letter appears, then that factor is at its high level.

Note that each treatment appears once in each set. These sets are con-

structed by first pairing treatment I with each of the remaining treat-

ments. These are called the initial blocks. The remaining blocks in

the sets are constructed by multiplying the initial block by any treat-

ment that has not yet appeared in that particular set, with the usual

2 2 2
rule that A = B = C =1.

To illustrate this technique, we will construct set (1) in (1.4.2).

The initial block is (I,ABC). So far only I and ABC have appeared, so

we may choose any treatment other than these. Suppose we choose BC.

Then the next block in the set would be (BC,A). Note that order is

unimportant. Now we may choose any treatment other than I, ABC, BC, or

A. If we choose B, we get the third block in set (1). To construct the

fourth block, there are only two choices remaining, namely C or AB.

Either one results in the last block of set (1).

Each set lacks information on the effects that have an even number

of letters in common with the initial block. For example, the initial









block for set (1) is (I,ABC), so this set provides no information on

the effects AB, AC, and BC, i.e. on all first-order interactions. Sim-

ilarly, the initial block for set (5) in (1.4.2) is (I,A), so this set

provides no information on B, C, and BC.

This concept of no information on B, C, and BC from data in set

(5) may be viewed as follows. If the responses from comparisons in set

(5) are modeled as


Y(I,A) = (p+a+e (p+e a+E(I,A)


Y(B,AB) = (+a+b+(ab)+eAB) (+b+e = a+(ab)+(BAB)


Y(C,AC) = (+a+c+(ac)+eAC) (U+c+eC) = a+(ac)+E(C,AC)


Y(BC,ABC) = (p+a+b+c+(ab)+(ac)+(bc)+(abc)+e BC) -
(BCABC) ABC
(p+b+c+(bc)+eBC


= a+(ab)+(ac)+(abc)+E(BCABC)


where all of the e's are independent and have expected value zero, then

it is clear that b, c, and (bc), corresponding to the effects B, C, and

BC, are not estimable from the four comparisons in set (5).

Quenouille and John define efficiency as follows. Suppose the

design is made up of r sets, s (s : r) of which give information on some

particular effect. Then s/r is the fraction of the comparisons made

that give information on that particular effect. On the other hand,

suppose that each set is run exactly once, i.e. each pair of treatments

is compared. It can be shown that there are a total of (2n-1) sets,

n- of which give information on any particular effect. In this case,

2n-/(2 -1) is the fraction of the comparisons made that give










Efficiencies of


Table 1.1.
balanced 2 and 2 factorial paired comparison designs


n Design


{2}
2
f1}


{3}
{2}
{1}
3
{3,2}

{3,1}
{2,1}


Number of
blocks


2
4


4
12
12
16
16
24


Efficiency of interactions of order
0 1 2


1.50 0.00
0.75 1.50


1.75
1.17
0.58
1.31
0.88
0.88


0.00
1.17
1.17
0.88
0.88
1.17


1.75
0.00
1.75
0.44
1.75
0.88


Notation: {j}: A design constructed by combining all
block I and a j-factor combination.

{j,k): A design constructed by combining the
(k}.

Source: Quenouille and John (1971, Table 1).


sets with initial


designs {j} and


information on any particular effect. The efficiency defined by Que-

nouille and John is the ratio of these two fractions. In other words,

the efficiency is defined to be


S s/r
E
2n-l/(2n-l)


Table 1.1 gives the efficiencies of some designs in the 2 and 2

cases. These designs are balanced in the sense that all main effects

are estimated with the same efficiency, all first order interactions










are estimated with the same efficiency, and so on. The table indicates

that in the 22 case, Design {1} is efficient for estimating the inter-

action effect. This result is the same as the result Littell and

Boyett obtained using Bahadur efficiency.

El-Helbawy and Bradley (1978) considered treatment contrasts in

paired comparison experiments. The theory developed is completely gen-

eral, but they specifically showed how optimal designs can be found in

the case of a 23 factorial. Their optimality criterion is the maximi-

zation of the chi-square noncentrality parameter from the likelihood

ratio test, which thereby maximizes the asymptotic power of the test.

Let a=(al,a2,a3), where a. represents the level of factor i,

i=1,2,3, and each a. is 0 or 1. Then the treatment parameters are
1

defined by El-Helbawy and Bradley to be


1 = 2 3 12 13 23 123 (1.4.3)
= T T T 7 i T v (1.4.3)
a al a2 a3 ala2 aa3 a2a3 ala2a3


for all a. The factorial parameters are defined by taking logarithms

of the 8 treatment parameters given by (1.4.3). So from (1.4.3),



Ya = logfa


3

= log + log7 + logT123
a a.a alaa3



3

= Ya + Y + Yl23 (1.4.4)
i=l i


The last two expressions in (1.4.4) define the factorial parameters








1 2 3 12 13 23 123
Y' Y2 Y 3 2a, y Y These parameters are
as1 a2 a3 ala aal a a3 3 ala2a3

subject to the usual analysis of variance constraints, namely


= 0 ,


i=1,2,3,


1

Ya a
a.=0
3


= 0 ,


i

(1.4.5)


1
S 123
ala a3
a 2=0


1


12a3
a3=0


The null hypothesis of no two-factor interaction

two factors is now considered. This test may be made


12
Ho: YOO


between the first

by testing


= 0 .


12
The constraints then require that Y 2=0, al,a2=0,1. The null hypoth-


esis is therefore equivalent to


12 12 12 12
S 00 Y O 01+ = 01


12 12 12 12 12
since by (1.4.5), y0 y Y + Y1 = 4y00 Let B denote a

matrix whose rows are orthonormal contrasts. If we let Y' =

(Y000,Y 001,Y010 00 ,Y01 ,lll), then the null hypothesis is

also equivalent to



H : B y = 0,
o -n=l

where B = S (1,1,-1,-1,-1,-,1,1).
"n=1 8


a.=0
1




Ya a.
a. =0
I


1
7123

al=0


= 0.










Let


a if comparing T. and T. give no information (in the

sense discussed by Quenouille and John) on the two-
n .
1-3
N factor interaction between the first two factors,


b otherwise,


where n.. is defined in Section 1.2, and N = n

i Then 12a + 16b = 1, since 16 of the 28 distinct pairs give information

on the two-factor interaction. El-Helbawy and Bradley then proceeded

to show that b should be taken as large as possible in order to maximize

the noncentrality parameter. This gives the result b=1/16, a=0. So

the optimal design is to compare the treatments that give information

on the two-factor interaction between the first two factors. These

pairs of treatments would be the sets (3), (4), (5), and (6) in (1.4.2).

This result is the same as Quenouille and John would obtain if

they wanted to find an optimal design to estimate AB interaction, since

using these 4 sets maximizes the efficiency of the AB interaction using

their definition of efficiency. It is also analogous to the result

Littell and Boyett obtained because the treatments compared in either

case differ either in the level of A, or the level of B, but not both.

El-Helbawy and Bradley noted that the result discussed in the two

preceding paragraphs still holds if other specified factorial effects

are assumed to be null (e.g. three-factor interaction). While the sta-

tistics for the two tests may differ, their limit distributions are the

same for either H or H
o a










El-Helbawy and Bradley also considered simultaneously testing for

the presence of AB, AC, and ABC interactions. In this case the null

hypothesis is


H: B y= 0,
o -n=3-

where


1 1 -1 -1 -1 -1 1 1

B = = 1 -1 1 -1 -1 1 -1 1
-n=3 Y8
1 -1 -1 1 -1 1 1 -1


By the same method they concluded that the optimal design is to make

the following comparisons (using Quenouille and John's notation):

(I,A), (B,AB), (C,AC), and (BC,ABC). Note that this is set (5) in

(1.4.2). Note also that this is the only set that contains information

on all three of the effects AB, AC, and ABC. So this design is also

optimal using Quenouille and John's definition of efficiency.

Beaver (1977) considered the analysis of factorial paired compari-

son experiments using the weighted least squares approach of Grizzle,

Starmer, and Koch (1969). He considered a loglinear transformation on

the T. 's. This approach produces noniterative estimates of the i.'s.

Also, the usual tests for a factorial experiment are easily implemented.

However, Beaver gave no consideration to experimental design.


1.5 Response Surface Designs


Springall (1973) considered response surface fitting using the

Bradley-Terry model with ties allowed as generalized by Rao and Kupper

(1967). The model is given in (1.2.5).










The logarithm of the treatment parameters are defined to be func-

tions of continuous, independent variables, x ,x2, ... ,xs, measurable

without error, i.e.


s

In = k xi i=,...,t. (1.5.1)
k=l


Note that some of the variables can be functions of other variables.
2
For example, s=2 and x 2=xli results in the parameterization
2
In i = 1xli+ 2xli .. The parameters 6, ... are estimated by

the maximum likelihood method. Appendix B has an APL program and a

Fortran program which produce maximum likelihood estimates of ...

I but which assume there are no ties.

Springall presented the following definition, theorem, and example.


Definition 1.1. Analogue designs are defined to be designs in which

the elements of the paired comparison variance-covariance matrix are

proportional to the elements of any classical response surface variance-

covariance matrix with the same design points and an equal number of

replicates at each design point.


The advantage of such designs is that they enable certain desirable

properties found in classical response surface designs (e.g. rotatabi-

lity, orthogonality), which are dependent on the relative sizes of the

elements of the variance-covariance matrix, to be readily produced.
*
For example, consider the classical regression model y = 0 +
*2
81x + 8x Then an experiment can be designed in such a way that the
variance-covariance matrix of the least squares estimators 0' and
"* m*
variance-covariance matrix of the least squares estimators 3 B%, and









82 possesses an optimal form. For instance, the classical design with

an equal number of replicates at each of the four levels x=-2,-1,1,2

can be shown to be rotatable, and so the analogue paired comparison

designs discussed in Example 1.1 below have the same property, i.e.

var(ln ) = var(ln ).
x -x
2
The analagous paired comparison model is In x = x + 2 x An

analogue paired comparison design has a corresponding variance-covari-

ance matrix of the maximum likelihood estimators 81 2, such that each

element is a constant times the corresponding element of the least

squares variance-covariance matrix. For the particular example of a

quadratic model currently under discussion, var(6l) = c.var(B ),

var( 2) = c-var(B2), and cov(S ,62) = c-cov(18 2), for some c.

Springall described a method which uses linear programming tech-

niques that produces an analogue design with minimum elements of the

variance-covariance matrix. The following theorem shows how one can

produce an analogue design, but it does not necessarily have minimum

elements of the variance-covariance matrix.


Theorem 1.1. If a classical response surface design has certain prop-

erties dependent on the relative sizes of the elements of the variance-

covariance matrix, then the analogue paired comparison design will have

the same properties if



n. = N [i kl 11
k

where i.. = i.7./(7. + .) 2
iJ I J J









Example 1.1. Consider the quadratic model In Ti = B x + 2x2, with

design coordinates x=-2,-l,l,2. Assume e=1 and 1= 2=0, and set N=150.

Springall showed that an analogue design is given by n.. = 150/6 = 25,

i
An alternative design may be obtained by using linear programming.

Let Z denote the variance-covariance matrix of the maximum likelihood

estimates 1I and B2. By simultaneously maximizing the elements of ET

subject to the constraint N=150, a design which minimizes the elements

of Z is produced. Springall did this and arrived at the design

{n 2=0, n 1371, n14=9, n23=0, n24=70, n34=0}, after integerization of

the n...
13

The asymptotic distribution of the maximum likelihood estimates

of 8,81 .. ,s was presented by Springall, but he presented few de-

tails. Therefore, the maximum likelihood estimates of the parameters

81, ... ,s and their asymptotic joint distribution are developed in

Chapter 2. The tie parameter, 0, is not considered in the remainder

of this dissertation.
















CHAPTER 2

ASYMPTOTIC THEORY OF THE
MAXIMUM LIKELIHOOD ESTIMATORS


2.1 Introduction


This chapter contains the asymptotic theory for the maximum like-

lihood estimates of the parameters given by (1.5.1), where the logarithms

of the treatment parameters are expressed as a polynomial function of a

single quantitative variable, x. An example of an experiment which had

a single quantitative variable was given in Section 1.1. The treatments

in that example were different levels of an additive for coffee. For

an experiment with quantitative independent variables, it is appropriate

to do a paired comparison analogue to classical regression analysis.

The theory developed in Sections 2.2 and 2.3 assume the general

case of s independent variables xl, ... ,xs. The parameterization of

the Bradley-Terry parameters, given in (1.5.1), is reproduced here:


s
ln = Z kki' i1...
k=l


The Bradley-Terry model with ties not allowed is used, where the r. are

defined by (1.2.2).

In order for all of the parameters 81 ... ,s to be estimable, it

is required that s a (t-l). This is clearly necessary since the Bradley-

Terry parameters have dimension (t-l). Note also that an intercept










parameter, 0 is not estimable without imposing a constraint on the

Bradley-Terry parameters. This becomes apparent when examining the

preference probabilities. For example, suppose the model was


s

in = 0 + k xki
k=l


i=l,...,t.


Then


P(T. T.) = ./(T. + r.)
I 3 1 1 3

s

exp(80 + kki
k=l


s

exp (B0 + Bkxki)
k=l


s

+ exp(B0 + 7 kxkj
k=l


s
exp( kxki)
k=l
s s

exp( kxki) + exp( kxkj)
k=l k=l


isi


The parameter 60 falls out, implying that any choice of B0 results with

the same preference probabilities. Choosing B0=0 replaces the imposi-

tion of a constraint on the treatment parameters.


2.2 Estimation


As was previously mentioned, the parameters B1, ... ,s are esti-

mated by the maximum likelihood method. From (1.3.1), the log-likeli-

hood equation is







t
In L = a.lni. n .(nT(r. +
Si= il
i=l i

t s
= ia Skkik
i=l Lk=l


(2.2.1)


I n ijn exp( S kki) + exp( kxkj
i

The maximum likelihood estimators are found by taking partial

derivatives of the log-likelihood equation, setting them equal to zero,

and solving for the parameters. The likelihood equations are


t
9ln L r
-1n L = a.x n.)
r i=l i r







Xrj
= i m. m
isj ij


s s
Sriexp( kx)ki + x rjexp( kxkj)
k=l k=l
s s
exp(Z kkxki) + exp( 7 kXkj
k=l k=l


s s

iexp( SkXki) + xrj exp( kxkj)
k=l k=l
s s
exp( kxki) + exp( B Skxkj
k=l k=l


i= > 7m. ri r = 0 r=l...s.
13 ] + 7.
i/j *L 3


(2.2.2)


where m.. is the total number of times T. is preferred over T..
1] 3










These equations must be solved iteratively. Appendix B contains

an APL program and a Fortran program which find the estimates of 81,...,

0 The procedure described in the appendix converged for most situa-
s

tions. However, occasionally when an attempt was made to fit a full

model (s=t-l), the procedure did not converge, which remains unexplained.

In this situation one can resort to the program found in Appendix A

which finds estimates of the Bradley-Terry parameters, and then solve

(t-l) equations in (t-l) unknowns to find the estimates of 8 ,... s


2.3 Asymptotic Distribution


First of all, some additional notation and elementary theory is

presented. This will become useful when the asymptotic joint distribu-

tion of ... is derived.
1 s

Let Xik be a random variable which is equal to 1 if T. is selected
i;)k 1
th
over T. for the k comparison of T. and T., 0 otherwise. Then X.
3 1 3 13k
has the Bernoulli distribution with probability of "success" equal to

T./(T. + T.), i,j=l, ... ,t, k=l, ... n.j., and the Bernoulli random

variables are independent of each other. This implies that m. has the
1J

binomial distribution, with sample size n.. and "success" probability

equal to T./(IT + T.), where
1 i 3


13
m. = Xijk i
k=l


Furthermore, the m..'s are also independent of each other, and from the
ii
properties of the binomial distribution we have


Em.. = n. .T./( + T.) ,
1J 1J J 1 3


i

(2.3.1)









2
var(m.ij) = n.ijiT/(ir + .) i 1D 1313 1 3


cov(mij,mkl) = 0 ,


i i,j,k,l=l,...,t.


From (2.3.1), (2.3.2), and (2.3.3), it follows that



n. Tr. nk T
E(m..m. ) = (Em.j) (E ) = k
] k mk iT. + Tj. Tk + T '
1 ] k I





2 2
2 2 in. j.T + n.. i.
Em.. = var(m..) + (Em..)2 1 J l 1
13 13 13 (I + i)2
1 3


(2.3.2)



(2.3.3)


(2.3.4)


(2.3.5)


Under certain regularity conditions verified by Springall (1973),

it follows from a theorem found in Wilks (1962, page 380) that

NI( ) ...s, N( -B ) is asymptotically multivariate normal,
with mean vector ~- and variance-covariance matrix (ab), where
with mean vector 0 and variance-covariance matrix (A ,) where
ab


= 31n L ln L
a = E aJ '
a b b


a,b=l,...,s.


We now proceed to derive the asymptotic variance-covariance matrix

by deriving Xab for general a and b. From (2.2.2) we have



Ix .-x x .-x
Dln L = Xai x a + a a .
~ ~ = L L m^ij + L ^7 m+jT j
a i N 4-V j
3<1


(2.3.6)









- X -X X .
SX i m.. + a] ai (n..- .. .
zi< i z + 1j T2 3 + 1 ijj )
i

m. (x -x ) n. a=l,...,s. (2.3.7)
i< j ( a -

Then from (2.3.4)-(2.3.7), we find Xab to be



A = E m. i (x .-x ) n. i.
ab i








i (ijk or j24)
2 2
n. .T i T n. i T

i
- nij r nkl~Tk
2 n n (x \-x ) (x -x )
z j (zz : .) (T+T, 7+1) I ai aj bk bl





i

+(7C + lT ) kT + T1) (Xai-xaj) kxbk xbl)
i



i< 1 2l
ni ( Z 2+, Zj -r.)2 T aib=l,....s. (2.3.8)
T- < T.T









As previously mentioned, the remaining chapters deal with finding

optimal designs when the logarithm of the treatment parameters is ex-

pressed as a polynomial equation in a single variable. The various

optimality criteria considered include


(i) Minimization of the variance of the maximum likelihood esti-

mator of one parameter,

(ii) D-optimality,

(iii) Minimization of the average variance of the predicted

"response", In iT

(iv) Minimization of the maximum variance of In 7 (minimax),
x
(v) Minimization of the bias,

(vi) Minimization of the average mean square error of In Z when
x
bias is present (J-criterion),

(vii) D-optimality for a preliminary test estimator.


These are all defined in subsequent chapters.

For the case of a single quantitative independent variable, the

model given in (1.5.1) becomes


s

n t. = x i=l,...,t. (2.3.9)
i k ,i k ..
k=l


Similarly, the variance-covariance matrix given in (2.3.8) becomes
S-l
(ab)1, where



\b = nj 2 (x a-x a)(x.b-x.b), a,b,...s.
ab + ( 2 + 1 (2
i









Notice that the complete design problem is quite complex. It

involves deciding how many levels and additionally which particular

levels should be chosen. Furthermore, it must be determined how often

to compare each pair of treatments.

A further complication is apparent by noticing that the variance-

covariance matrix, whose elements are given in (2.3.10), depends on the

true parameter values. Hence the optimal designs also depend on the

parameters. The designs discussed by El-Helbawy and Bradley and Spring-

all were found under the assumption that all Bradley-Terry treatment

parameters are equal, or equivalently that = 2= =B =0. In no

paper of which the author is aware was the dependency of the optimal

design on the true parameter values examined. As will become apparent

in the following chapters, the optimal design for one set of parameter

values can be quite different from the optimal design for another set

of parameter values.















CHAPTER 3

MINIMIZATION OF THE VARIANCE
OF A SINGLE PARAMETER ESTIMATE


3.1 Introduction


The present chapter contains a discussion of the linear, quadra-

tic, and cubic models. The optimality criterion considered is the

minimization of var(S.), where i=1,2, and 3 for the linear, quadratic,

and cubic models, respectively. The linear model is discussed in Sec-

tion 3.2. Designs for the quadratic model are found in Section 3.3 for

32=0, 81 variable. The general expression for var(82), given by

(3.2.6), can be shown to be a continuous function of 2, and hence the

designs in Section 3.3 are locally optimal at the point B2=0. Likewise,

designs for the cubic model are found in Section 3.4 for 3=0, 81 and

2 variable, and are similarly locally optimal.


3.2 Linear Model


From (2.3.9), the linear model is


In f. = 8x. i=l,...,t. (3.2.1)
1 1


Since there is only one parameter, by (2.3.10) the variance-covariance

matrix becomes a simple variance, X-1, where

B(x. + x.)
S TS' e I ]2
i i








(x. x.)2

ni. (exp(8(x.-x.)/2) + exp(B(x.-x.)/2))2
i

a.

-= nij 8d../2 -Bd/2 (3.2.2)
i

where d..=x.-x.. For the remainder of this dissertation, it is assumed
13 i J
without loss of generality that the independent variable, x, is scaled

such that it is in the interval [-1,1].

Although it is not necessary to treat the two cases B=0 and SrO

separately, the special case 8=0 is considered first because the argu-

ment used to derive the optimal designs is easier to follow for this

situation. When 8=0, (3.2.2) reduces to



n = n. (x. x.)2/4 (3.2.3)

i

The expression in (3.2.3) is a maximum when all the "weight" of the n..
1)
2
is placed onto pairs which maximize (x. x.) Clearly this implies
1 3
that the design which minimizes var(B) is a comparison of only the pair

(-1,1).

The designs which maximize A when 85O are found using an argument

identical to the one used for the case $=0. From (3.2.2), all of the

"weight" of the n.. should be placed onto pairs which maximize
13

2 (d/2 -$d/2 2
y = d2(e + e- 2 (3.2.4)


where d is then the optimal difference between the levels in the pairs.

The value of d which maximizes y in (3.2.4) is the solution to









y 2d -^d2(ed/2 e-Bd/2
y = 2d Sd 2(e = 0 (3.2.5)
3d (ed/2 + e-d/2)2 (ed/2 + e-d/2 3



The equality in (3.2.5) implies


coth(Sd/2) = Sd/2 (3.2.6)


The solution to coth(u) = u is a transcendental number, and can be

shown to be approximately u=1.1997. Hence, from (3.2.6) and the fact

that Idl must be less than or equal to 2, the design which minimizes

var(B) for a given value of 8 is


dl = Ix. x.I = min(2,2.3994/ B ) (3.2.7)


It is now shown that the value of Idl given in (3.2.7) is in fact

a maximum for y, and consequently for A. By (3.2.5), the second par-

tial derivative of y with respect to d is given by


2 y Bd/2 + d/2 2
Y (e + e )
Dd

22 322 2
2 4dB-tanh(Bd/2) 2d2/2 + 6 d tanh ($d/2) (3.2.8)


The second partial derivative 3 y/d2 is negative if and only if the

expression in (3.2.8) is negative. From (3.2.6), the equality


2 = Bd-tanh(8d/2) (3.2.9)


holds at d=2.3994/181, which is then substituted into (3.2.8) to give


2
3- = 2 8 B2d2/2 + 6 = -2d2 /2 < 0 (3.2.10)

opt. d










Because there is only one solution to (3.2.6) where d>0, the result in

(3.2.10) implies that the solution of Idl given in (3.2.7) does in fact

minimize the variance of B.

By (3.2.7) the optimal design is a comparison of only the pair

(-1,1) when 8 <1.2. It can be shown that P(l -1)=.92 when 8=1.2.

Because of this relatively large preference probability of choosing one

endpoint over the other, in most practical situations B will be less

than 1.2. So in these cases the optimal design is a comparison of only

the endpoints of the experimental region.

The optimal designs given by (3.2.7) are also optimal for other

criteria. In particular, they are optimal for criteria (ii)-(iv) intro-

duced in Section 2.3. These criteria are formally defined in Chapter 4,

but a brief explanation for their equivalence in the case of a linear

model follows. The equivalence of the criterion being discussed in the

present section to D-optimality follows from the fact that the deter-

minant of a Ixl matrix is simply the only element in the matrix. The

equivalence to the other two criteria follows from the fact that


2
var(ln i ) = x var() .(3.2.11)
x

From (3.2.11), criterion (iii) is, in this case, the minimization of



2 2 "
I
x var(8) dx = -var(8)
-1


and criterion (iv) is the minimization of


2
max x var(B) = var(B)
xEL-l,1]









3.3 Quadratic Model

The quadratic model is

2.
In = xi + x i=, ,t.
1 81xi 2 1 '


From (2.3.10),

mum likelihood


the asymptotic variance-covariance matrix of the maxi-
S2 i -1
estimators a1 and 82 is (A ) where
1 2 abI


A11 =
i


12 -= I
i


2i i

2
ijij(X i x)




*..(X. x.)(x. 2 2
nij ij(xi 2 x



2 2 2
n ijij (xi x. )
13fl 1


(3.3.2)




(3.3.3)




(3.3.4)


and where


.ij = (xi'x) =
iJ 1 3


exp( 81(x. + x.) + 8(x.2 + .) )
S2exp( 2 1
2 2 )2
( exp(8 1xi + 8 2xi ) + exp(81xj + 2xj ) x


(3.3.5)


i


Minimization of var( 2)


Consider the test of the hypothesis


H : 8 = 0 S, unspecified,
o 2 1


versus


H : S2 / 0 1B unspecified.
a 2 2.


(3.3.1)









Minimizing the asymptotic variance of 52 would equivalently be maximiz-

ing the asymptotic power of the test. The variance of 2 is the lower

right-hand element of the 2x2 matrix (lab)- That is,



var(B2) = 1 (3.3.6)
11 22 12


Under the null hypothesis, (3.3.5) becomes


8 (Xi + Xj)
e
J = 1 -2 i (e13 +e )


As was pointed out in Section 3.1, in the present section (3.3.6)

is minimized for 82=0. This produces locally optimal designs. Assuming

52=0 greatly simplifies the design problem as will be seen in the

remainder of the section.

The end result of Lemmas 3.1-3.2 and Theorems 3.1-3.3 is a set of

optimal designs which minimize var(82) under the null hypothesis for

various choices of the linear parameter. Hereafter, "optimal design"

will refer to the design which is optimal for the criterion being dis-

cussed in the section.

The following two definitions are useful in the discussion that

follows.


Definition 3.1. The pair of treatments (-x ,-x2) is defined to be

the symmetric counterpart of the pair (x ,x2).


Definition 3.2. A design is said to be symmetric if each pair in the

design is compared as often as its respective symmetric counterpart.









Lemma 3.1. Let {(x.,x.) denote .ij as given by (3.3.7). Then

(xix .) = (-Xi-x.-x), for any x. and x..

Proof.
8 C(x. + x.) -2 1(x. + x.)
x e e
X i lxj 2 -281(xi + x.)
Se +e ) e 3


81(-xi-x)
e
e1(-x.) + e1 (-x ) 2
Ce^ j' +e


= 0(-Xi,-xj ) +


Theorem 3.1. The optimal design is a symmetric design.

Proof. Notice that both of the following hold:

2 2
(i) (x. x.) = ( (-x.) (-x ) )
3 1 3 1

(ii) (x.2 x 2) = ( (-x.) (-x.)


From (3.3.2), (3.3.4), Lemma 3.1, and the above, it is clear that ll

and X22 are invariant under changes of any number of comparisons of

pairs to comparisons of their symmetric counterparts.

Let D denote any particular design, and let varDC (2) be the var-
th
iance of 2 for Design D Also, let the i pair in D be denoted

(x.,xi). The sets of comparisons {(x.,x ), (-x.,-x)} are considered

in sequence, and without loss of generality, it is assumed that


n ,- n = d. 2 ,
x.,x. -X. ,-X.
1 1 1 1

for all i. Notice that it could be true that n = 0.
-X -X
I I










Design D2 is formed from Design D1 by changing d./2 of the compar-

isons of the pair (x.,xi) to comparisons of the pair (-x.,-x'). This
1 1 1 1
results in (n + n )/2 comparisons of each of the two pairs
x.,x. -x.,-x.
11 1 1
th
in the i set. This change is made for each set of comparisons, there-

by defining Design D2 to be a symmetric design. The total number of

comparisons, N, is left unchanged.

By the first paragraph of this proof, the values of All and A22

are the same for the two designs. By (3.3.3), 12=0 for Design D2,
12
and hence A2 is a minimum. Therefore, by (3.3.6),



varD (2) S varD (2) (3.3.8)



Note that the variances are equal if 12=0 for Design DI, and it

is possible that A 12=0 for a nonsymmetric design. However, (3.3.8)

shows that no design can "do better" than a symmetric design. Also

note that in practice, d. must be an even integer in order for the
1
n. 's to be integers.


So the search for optimal designs in the present section will hence-

forth be restricted to the class of all symmetric designs.

Because 12=0 for a symmetric design, (3.3.6) reduces to


var(B ) = 1/22 (3.3.9)



So it is now necessary to find designs which maximize A22. Notice that

122 depends on 1. As previously mentioned, the optimal design also

depends on the value of S1.









Theorem 3.2. The optimal design is a design in which the only compar-

isons made are comparisons of the pairs (x,l) and (-l,-x) an equal num-

ber of times, for some x. Furthermore, x, 0.


Proof. First of all, the notation used for A22' which is defined by

(3.3.4) and (3.3.7), is changed. The N total comparisons are arbitrar-
.th
ily ordered, and the i pair is then defined to be (x.,x!). The vari-

ables y. and z. are defined to be
1 1


y = x. x z. = x + x i=,...,N.
1 1 1 1 1


Then from (3.3.7),


1(x' + x) -8 l(X + x!)
1 i i 11 1
= e e
i li xx l -Sl(xi + x!)
(e +e ) e



8 y /2 -1+i/2 2 i=l .. N
(e + e


Notice that i. is a function of y. but not of z.. From (3.3.4), 12 is
1 1 12
rewritten as
N

22 = iii zi
i=l


Let D be any design. By holding y. fixed, we examine what changes
2
in z.2 will increase 22. Clearly if 1zil is as large as possible for

every i, then X22 is also as large as possible for the particular set

of y.'s found in D. Without loss of generality, it is assumed that
1
x. 1 1 1 1 1









all comparisons must involve either x=-l or x=l. Additionally, using

an argument similar to one used in Section 3.2, it is clear from the

expression for A22 given in (3.3.4) that x=l should be compared with

one and only one level of x, namely the value of x which maximizes

0(l,x)-(1-x ) This result in conjunction with Theorem 3.1 proves the

first part of the theorem.

We can now rewrite 122 simply as


22
122 = NO(l,x)-(l-x2)2


It remains to be proven that the optimal value of x is such that x L 0.

The variable I(x) is defined to be the positive square root of

X22/N, i.e.


81(1 + x)/2


1 B2x
~(x) ( x2
e1 1x
e + e


(3.3.10)


Since (3.3.10) and the fact that xE[-l,l] imply I(x)

A22 is equivalent to maximizing P(x).

It will now be shown that '(x) > i(-x), x a 0.

inequalities are equivalent:


f(x) a 4(-x)


1 (1 + x)/2
e
1 1x
e + e



13 (l+x)+1 1 (l+x)-0 x
e + e


> 0, maximizing



The following five




(3.3.11)


1e(1 x)/2
e (3.3.12)
e + e



~ (l-x) + 1 1( (1-x)+81x
r e + e (3.3.13)









81(1 + x)/2 -1 (1 + x)/2 81(1 x)/2 -8 (1 x)/2
e + e a e + e (3.3.14)


cosh(u + 81x) cosh(u) (3.3.15)

where

u-u
cosh(u) = (e + e )/2 ,

u = 1 (1 x)/2


Now cosh(u) is a monotonically increasing function of u for u. 2 0.

Furthermore, it can be assumed that 3 1. 0 because Theorem 3.3 proves

that the optimal designs for 1 = 10 are the same. Or, equivalently,

the remainder of this proof can be repeated for the case 1 <0. Then

x ; 0 and B1 > 0 implies B x > 0, and hence by the equivalence of

(3.3.11) and (3.3.15), 4(x) L(-x) for all x O 0. This completes

the proof of the second part. t


The next theorem shows that optimal designs only need to be found

for nonnegative values of 81.


Theorem 3.3. The optimal design for B1=-B10 is the same as the

optimal design for 81=B10.

Proof. Recall that the optimal design is found by finding the value of

x which maximizes P(x) from the proof of Theorem 3.2. Since i(x) is

actually a function of 1 and x, we write


B1(1 + x)/2

(81,x) = e1 x/ (1 2
e + e









The theorem is proved by observing that



81(1 + x)/2 -81(1 + x)
e 2 e
t(81,x) = 1ex (1 x2) e* (l + x)
e +e e


-81(1 + x)/2
=e (1 x2)
-81x -81
1
e + e


= (-1,x) +



All that remains to be done is to find the value of x that maxi-

mizes f(x) for various choices of B1 0. Unfortunately, a closed form

solution of x as a function of B1 does not exist. However when 31=0,

the optimal value of x can be analytically derived, and hence this

situation is considered first.

Suppose B =0. Then from (3.3.10), i(x) = (1 x2)/2. Clearly

this is maximized by x=0, and hence from Theorems 3.1 and 3.2, the

optimal design is an equal number of comparisons of only the pairs

(0,1) and (-1,0). For the case 81 0, the grid method described in the

next paragraph was used to obtain the results presented in Table 3.1

and Figure 3.1.

The grid method utilized first calculated 4(x) for x=0(.l)l,

and the best value of x, say xl, out of these 11 choices was found.

Then t(x) was calculated for the 11 values x=(x -.l),(x -.08), ...

(x +.l), and again the best value of x was found. This procedure was

repeated until the optimal value of x was found accurate to the fourth









decimal place. The entire procedure was done separately for each value

of 16 .

The following lemma proves that the values of x which are found

via the grid search are absolute maximums for i(x).


Lemma 3.2. Exactly one local maximum (and hence it is the absolute

maximum) exists for l (x) in the interval L0,1].

Proof. By Theorem 3.3, only the case 1 >0 needs to be considered.

From (3.3.10), the first partial derivative of l(x) with respect to x

is

1(1 + x)/2 e (81(1 x )/2 2x)
1 1(x) = 1 1x 2 x (3.3.16)
L 2 l 2
( e + e ) e (8B(1 x )/2 + 2x)


It can be seen from (3.3.16) that (0)>0 and 4i'(l)<0. Hence ,0'(x)=0

for at least one xeLo,l1, and it remains to be shown that i'(x)=0 for

exactly one xE[O,lj. (An attempt was made to show 4"(x) is negative

for all 61 and xELO,l2, but it turned out not to be true. In particular,

a graph of '(x) as a function of x for 81=5 indicated ("(x)>0 for values

of x near 0.)

From (3.3.16), W'(x)=0 if and only if

-- (1 x)

(81(1 x2)/2 2x) = e () 1(1 x )/2 + 2x) (3.3.17)


Notice that the functions f(x) = ( 1(1 x2)/2 2x) and

f2(x) = (81(1 x2)/2 + 2x) are reflections of each other about the

axis x=0, and they are quadratic in x. Hence they intersect exactly









once at x=0. Also, for fixed B1, fl and f2 have maximums at -2/81 and

2/81, respectively. Let gl(x) be the right-hand side of (3.3.17). It

was shown in the preceding paragraph that fl(x) and g (x) intersect at

least once in the interval Lo,l]. To determine that they intersect

exactly once, it is sufficient to show that f l(x)/3x < gl (x)/ax.

From (3.3.17),


f l(x)
3x -x 2,


gl(x) -81(1 x) 2 1(1 x)
x = e (1(1 x )/2 + 2x) + e (-8x + 2)


-3 (1 x)
= e 1 ( 2(1 x )/2 + B3x + 2)


For 31>0 and xELO,1], it is clear that fl (x)/2x < 0 and

agl(x)/3x > 0, which completes the proof. t


Table 3.1 presents the optimal value of x found from the grid

search described immediately prior to Lemma 3.2, for various choices

of 1 1. Figure 3.1 is a graph of the optimal value of x versus 1B11.

The table includes relatively large values of B11 to illustrate that

xtl as Is1 B m. Recall that for a given value of B l1, the optimal

design is a comparison of (-l,-x) and (x,l) an equal number of times,

where x is found in Table 3.1 or Figure 3.1.

Table 3.1 indicates that the optimal design when B1=0 involves

only 3 levels, x=-l,0,1. However, when 81O, the optimal design

requires 4 levels, x=-l,-x ,x ,1, for some x In certain experimental
So o
situations it may be significantly more laborious to make four "batches"
















Table 3.1. Designs which minimize var(B2)


Relative
81 Optimal x efficiency

0 0 1
.05 .0003 1
.10 .0012 1
.15 .0028 1
.20 .0050 1
.25 .0077 .9999
.30 .0110 .9998
.35 .0150 .9995
.40 .0194 .9992
.45 .0243 .9988
.50 .0297 .9982
.55 .0356 .9974
.60 .0419 .9963
.65 .0486 .9950
.70 .0556 .9935
.75 .0630 .9915
.80 .0708 .9893
.85 .0787 .9866
.90 .0869 .9836
.95 .0953 .9801
1.0 .1039 .9761
1.1 .1216 .9668
1.2 .1396 .9556
1.3 .1580 .9426
1.4 .1764 .9269
1.5 .1948 .9097
1.6 .2130 .8905
1.7 .2310 .8696
1.8 .2487 .8472
1.9 .2660 .8233
2.0 .2829 .7983
3.0 .4269 .5242
4.0 .5295 .2974
5.0 .6031 .1546









































CN
< ~












Ol
N












4
U
Lf

aD



a.






cC









than it is to make just three. For this reason, and because a priori

information about 1 may not be available, asymptotic relative efficien-

cies were calculated for the design ({n,0 = nl = N/2} to the optimal
-1,0 0,1
designs found in the table. The relative efficiency of the first design

to the second (optimal) design is defined to be


var 2(2)
R.E. = 2
var (B2)


1, 1 2
e -/(e 1 + 1)2
B1(1 + x) ,2 1 x 2
e (1 x )/( e + e )


The relative efficiencies were enumerated for various choices of 81,

and are presented next to the optimal designs in Table 3.1.


Equivalence of El-Helbawy and Bradley's method
and minimizing the variance


An application of El-Helbawy and Bradley's method for finding opti-

mal designs was briefly discussed in Section 1.4. It will now be prov-

en that their optimality criterion is equivalent to minimizing var(8k),

any k. First of all, notation and results of El-Helbawy and Bradley's

paper which are necessary for the proof of Theorem 3.5 are presented.

Recall that El-Helbawy and Bradley considered linear contrasts of

Yi = In i=l, ... ,t. The matrices B and B are defined as mxt and
1 1 -m -f
nxt, respectively, with zero-sum, orthonormal rows, such that

0 s m < m+n S t-l. The null hypothesis is


H : B Y(T) = 0
S -n-- -n










and the alternative hypothesis is


H : B y(r) 0 ,
a -n-- -n


th
where Tr and y(r) are column vectors with i elements 7T. and y., respec-

tively. The rows of B represent the linear contrasts assumed to be
-m

null. Note that B need not exist (i.e. m=0).
-m

The vector T is a value of T which satisfies the null hypothesis
-o -

and the constraints, where r is the true parameter vector.

The matrix B is defined to be the txt orthonormal matrix



[1,'//


B = B (3.3.18)

--m




where B* is any (t-m-l)xt matrix such that BB' = I and l'=(l, ... ,1),
-m t -

the t-dimensional unit vector.

The following theorem is extracted from El-Helbawy and Bradley

(1978, Theorem 2, page 833).



Theorem 3.4. Let p be the maximum likelihood estimator of T. Under

the assumption of connectedness (Assumption 1.2), and by the assumption

of strictly positive elements of T, which has already been inherently

assumed since ln(0) is undefined, v7((p) Y(7)) has a limiting distri-

bution function that is singular, t-variate normal in a space of (t-m-l)

dimensions, with zero mean vector and dispersion matrix


,
E(r) = B*'(C(-)) B* (3.3.19)
-m -- -m


where










C(T) = B*A(7r)B*' (3.3.20)
M- -m


and A(r) is the txt matrix with elements


t

Aii (i) = i ij i /(ni + .), i=l.. t,

j=l
ji (3.3.21)


A..(Tr) = -U. .ir. ./(r. + i .) i j, i,j=l,...,t,



and where U.. is defined in Assumption 1.2.
13


El-Helbawy and Bradley's optimality criterion is the maximization

of the noncentrality parameter, 2, for the asymptotic chi-square dis-

tribution of the likelihood ratio statistic. The noncentrality parame-

ter is


2 -1
2 = 6'E 6 (3.3.22)
-0 -


where 6 is a vector of constants defined in their paper, E is the
o
-1
leading principal matrix of order n of C B provides the first n
-o -n
rows of B*, and


C = B*A(T )B*' (3.3.23)
-o -m- -o -m


A theorem proving the equivalence of the two optimality criteria

is now presented.


Theorem 3.5. Consider the model In iT = 1 x+B 62x2+ +kxk

Suppose we are interested in testing, for some fixed i,


H : 3.=0 versus H : 30 .
o 1 a i









Then the design which minimizes var(Bi) is identical to the design

which maximizes the noncentrality parameter from the likelihood ratio

test. (Note that it does not matter whether or not it is assumed that

some of the other l.'s are 0.)
3

Proof. Orthogonal polynomials can be derived for any spacing of the

independent variable. Let these contrasts be the rows of B. The lxt

matrix B is the orthogonal polynomial corresponding to the parameter
-n=1

B. being tested.

From the method used to find orthogonal polynomials, it follows

that


1 c-,n=1 (TI


where c is a constant. So minimizing var(B.) is equivalent to minimiz-

ing var(B y(p)). But from Theorem 3.4,
-n=1-

-1
var(B n-(p)) = B (B*'(C(T)) B*)B'
-n=l -m - -n=






-1
0
= (1,0, ... ,0) (C(7) 1





S ((C(7)) ) l


2
From the definition of Z E is the leading principal matrix of

-order 1 of C 1
order 1 of C i.e.
-o


Z = ((C())1)-1
o - 11









In this case, 6 is a scalar constant, and so maximizing 12 is equiva-

lent to minimizing Z = var(B). t
o 1


Example 3.1. This example illustrates the equivalence of the two

methods presented in Theorem 3.5. It is assumed that the true model

is a quadratic model, In ix = l x + 5 x Also, the design is re-

stricted to have the three design points x=-l,0,1.

Table 3.1 indicates that the design which minimizes the variance

of O2 is {n-1,0 = nO,1 = N/2}.

The design problem is now approached using the method presented by

El-Helbawy and Bradley. The null hypothesis is


Ho: 2 = 0 unknown.
o 2 1


Using orthogonal polynomials, this hypothesis is equivalent to


H (-1 2 -1)
Ho. y(T) = 0 ,
o /6


where y_' (r) = (In -i1, In O In 1) .

Note that El-Helbawy and Bradley's constraint, that the In ir.'s
1
sum to zero, poses no problem. Instead of constraining T0 to be equal

to 1, an intercept parameter, B0, could be introduced to the model.

The model would then become


2
In = 0 Sx + B2x


with the constraint

SIn rT = 0
x x
x


It can be shown in this particular case that 80 3 .
0 32








For a definition of the following matrices and notation, see

(3.3.18)-(3.3.23). For the example under discussion,


B
-n=1


= (-1 2 -1)//6 ,


2(-1 2 -1)//6
B* =
-m (-1 0 1)//2


12 + 13
A_ ) = T -1

--13


where 7 '=(1,1,1), i.e. B =B =0. Note that T can be any
-o 1 2 -o
satisfies the null hypothesis and the constraint.

The matrix C is
-o


vector which


C = B*A(TT )B*'
-o -M- -- -im



1 2 12 23 Vi2 12 23

S112 -23 23 1223


implying


2 (112 + 4 +13 + 23
C

S12 H23)
712


12 -23)
/12 / -

3 12 + 23
2 12 23


-1
Then since Z is the upper left-hand element of C
o -o


S 12 + 4113 + "23
o 21 C


-W12

+12 + H23

-123


-13

-23

13 + 23_









implying



-1 21 Co
o0 12 + 4 +13 + 23



3 V12V13 + V12 23 + 113 23
S ---V(3.3.24)
8 012 + 4113 + 23 (3 24)



The design which maximizes the chi-square noncentrality parameter from
-l
the likelihood ratio test is the design which maximizes E
o
-1
By the definition of the U ., 3 = 1 12 13 So can be
ij 23 12 13 o
rewritten as



-1 12 13 + (12 13)( 12 13
o 12 + 4 +13 + (1 V2 13)


2
12(1 V12 U13) + (V13 132)
(3.3.25)
31 + 1
13


Suppose that P13 is temporarily held constant. It is then claimed that
131
choosing u12 = (1 13)/2 maximizes o- for the particular V13"

From (3.3.25), the expression 12((1 U 3) 1 2) must be maximized.

Because of the constraint 0 S 12 5 1- ,13' it is clear that

12((1 U13) 12) is maximized when U12 (1 13)/2, which

proves the claim. The constraint among the U..'s in addition to the

above claim implies that the optimal design must be a design such that


1 (13
12 2 V23 (3.3.26)









-I
The scalar ~ is now rewritten as a function of v112 alone. From
o
(3.3.24) and (3.3.26),



-1 3 212(1 212) + 12
o 8 2112 + 4(1 212)


2
3 212 3112
8 4 612
3 1 12


= 3112
16


Clearly o-1 is maximized by choosing 112 as large as possible. By

(3.3.26), this would make 12- 23=.5, and 13=0. So this optimal

design is in fact the same as the design which minimizes var(2).


An attempt was made to find the optimal design for the test of

hypothesis

H : B = = 0 versus H : not H
o 1 2 a o


using El-Helbawy and Bradley's methodology. When simultaneously test-

ing more than one parameter, or equivalently more than one treatment

contrast, a constant value, 6., must be chosen for each parameter. The

theory behind the optimal designs includes an infinite sequence of

values of T satisfying H which are converging to a vector m7 satisfy-
Sa -o

ing H When the treatment contrasts are for factorial treatments,
o

which is the case they discussed extensively, the relative sizes of the

6. are said to be equivalent to the relative importance placed on each
1

contrast being tested in H Hence, El-Helbawy and Bradley usually

chose 6 =6 = = 6 .
1 2 n









In the present situation, choosing 61 and 62 restricts the alter-

native parameter values in the infinite sequence mentioned in the pre-

ceding paragraph to be a proper subset of the alternative parameters

satisfying Ha. For example, 61=62 implies that 68= 2, and hence the
2
alternative models are In T1 = Bx + Sx with B- as N-*. Choosing
x
different 6. can result in a different design because the maximization
1
of the noncentrality parameter depends on the relative sizes of the

6.. Additionally, the optimal designs actually found after choosing

the relative sizes of 61 and 62 were found to be unreasonable (e.g. for

61=62, the optimal design is a comparison of only the pair (-1,0)).

Chapter 4 presents other optimality criteria for fitting a quadratic

model.


3.4 Cubic Model


The cubic model is


2 3
In I. = x. + Bx.2 + x 3 i=l,...,t. (3.4.1)



The asymptotic variance-covariance matrix of the maximum likelihood
-1
estimators, B1, ,' and 63 is ( ab) where lab is given by (2.3.10).

The inverse of the variance-covariance matrix is partitioned as



11 12 13


(A b) = 12 22 23



13 A23 33


Then by a result found in Graybill (1969, page 165), the variance of









83 is

r -1
var( = ab 33



= 33 (A13 23> ( 12 ( 1j
12 22 23



SF 13 132212 23 23 1123- 12 13

33 11 A22 122 (3.4.2)


To simplify (3.4.2), only the case S =0 is considered in the

search for designs which minimize var(3 ). As pointed out in Section

3.1, this gives locally optimal designs. The following theorem shows

that optimal designs only need to be found for linear and quadratic

parameters which are both positive.


Theorem 3.6. Suppose that 3=0, and further that the optimal design

for given 3 and 2 is comparisons of the pairs (x.,x.) a total of

n times, for all pairs. Then
x. ,x.
1

(i) The optimal design for (-81) and (-52) is the same as the

optimal design for 1 and 82'

(ii) The optimal design for (-81) and 2 is comparisons of the

pairs (-x.,-x.) a total of n times, for all pairs,
1 3 x.,x.
(iii) The optimal design for 81 and (-2 ) is the same as in (ii).

Proof. From (3.3.5),


. 2 (xij) =








2 2 2 2
exp(-8 (x.+x.) 2 (x+x. )) exp(2 (x.+x.) + 23 (x. i+x ))
1 1 3 2 l 1 j 2 i
(exp(-1x i-82x ) + exp(-81x -82x )) exp (2B1(x +x ) + 282 (x i+x 2))


2 2
exp(B (x. + x.) + 2(x + x ))

(exp(61xi + B2xi2) + exp( 1xj + 2x 2))2


= 1 (x ,x.) (3.4.3)


2
Notice that since B =0, Tr i./(r. + ir.) from the expression for a
3 13 1 3 ab
given in (2.3.10) is equal to 812 (x,x ). Hence by (2.3.10) and
83,63 i j

(3.4.3), it is clear that Xab is invariant under changing the sign of

both 81 and 2, for all a and b. Therefore the variance of 3 is also

invariant under changing the sign of both 81 and 2. This implies that

the design which minimizes var(3 ) for 81 and 82 is also the design

which minimizes var(3 ) for (- 1) and (-8 ), thereby proving (i).

To prove (ii), note that from (3.3.5),

2 2
exp(-8 (x.+x.) + 3 (x. +x. ))
) = (xi,x.)' =
5-1 1 j 2 2 2
S2 (exp(-1Xi +B2xi ) + exp(-81xj+62x ))


exp(B (-xi-x.) + 8 (x. +x. ))

(exp(B1(-xi)+B2xi ) + exp(B (-x )+B2x2))2



S (-x.,xi'-x) (3.4.4)


Again, by (2.3.10) and (3.4.4), changing 81 to (-81) and the pairs

(x.,x.) to (-x.,-x.), for all pairs in the design, will not change 1 ,

122' 33' nor 113. It changes only the sign of A12 and 123. But









notice that in (3.4.2) all terms involving A12 and A23 are of the form

2 2
112 23, 122, or 23 Hence the negative sign cancels, leaving

var(3 ) invariant under the change described above. This proves (ii).

To prove (iii), simply apply the results of (i) and (ii).


Theorem 3.5 proved the equivalence of minimizing var(83) and

El-Helbawy and Bradley's criterion. A Fortran computer program was

used to find the optimal design using El-Helbawy and Bradley's metho-

dology for 8 = 2=8 =0. The levels allowed were x=-l,-.5,0,.5,1. The

grid search included all ten pairs formed by these five levels, assum-

ing a symmetric design. The optimal design found was {n =

n = .045N, n.5 = .621N, n_, = .298N, all other n.i = 0}.
.5,1 -.5,.5 -1,1 13
For this design, it can be shown that var(3 )=16/N.

It was decided that for unequally spaced levels, employing El-Hel-

bawy and Bradley's methodology would be substantially more laborious

than minimizing var(83). For this reason, the designs subsequently

discussed were found by minimizing the expression found in (3.4.2).

Optimal designs were found by a grid search via a Fortran computer

program for 83=0, 81 2=0(.2)1. The comparisons allowed were (-l,x ),

(x,x2 ), (x 2,1), and (-1,1), where the optimal value of xl and x2 were

also found by the computer. The motivation for the choice of these

four pairs is the result discussed in the two preceding paragraphs for

the case of all three parameters equal to zero, which indicated that

only the pairs (-1,-.5), (-.5,.5), (.5,1),.and (-1,1) need to be com-

pared.

The grid procedure is now briefly described. The initial values

for xl and x2 were x = .7(.1)-.3 and x =.3(.1).7. For each of the 25









combinations of xl and x2, the optimal comparison proportions were

found by another grid. The latter grid began by considering different

comparison proportions, changing by increments of 0.1. This grid

became finer and finer until the proportions were accurate to .0004.

From this the best combination of xl and x2 out of the initial 25

choices was found, and the entire procedure was repeated for a finer

grid about the previous best choice of xl and x2. The entire grid

search stopped when xl and x2 were accurate to .0037.

Notice that it is possible that the designs found are not abso-

lutely optimal, but rather they may only be local minimums for (3.4.2).

This is because only four pairs were considered when it is possible

that a five or six pair design, for example, might offer a slight im-

provement. Also, the grid search restricted xl and x2 rather than

allowing them to take on values in the entire interval [-1,1. The

latter point is not believed to be a problem due to the resulting

design for 1 =2 =B3=0 originally found using El-Helbawy and Bradley's

methodology, and the fact that the optimal designs currently being dis-

cussed all have values of x1 and x2 within 0.02 of 0.5.

The optimal designs and the value of var( 3) for N=l are presented

in Table 3.2. The first two columns of the table give the value of

81 and 2. Columns 5 through 8 give the comparison proportions for the

pairs (-l,x ), (x ,x ), (x2,1), and (-1,1), respectively, where xl and

x2 are given in columns 3 and 4. The last column gives N-var(3 ). The

number of times a pair is compared is found by multiplying the appro-

priate comparison proportion by N. Recall that by Theorem 3.6, the

optimal design for both parameters negative is the same as the optimal

design for !31 and B2 1. If exactly one parameter is negative, then









the optimal design is found by first finding the optimal design for

j8 1 and 1821, and then taking -ix to be the proportion of the total

number of comparisons that the pair (l,-x ) is compared, xl,x2 to be

the proportion for (-xl,-x ), and so on. To calculate var(3 ) for N>1,

the tabled value of N-var(B3) is divided by N.

Notice from Table 3.2 that for 8: .6, the optimal design is very

close to the design {n 1,.5 = n.5,.5 = n.5,1 = .3333N}, regardless of
-i,-.5 -.5,.5 .5,1
the value of 2. This implies that if it is a priori believed that 8

is relatively large, then this design will result in a variance of 83

very close to the tabled minimum. As has been previously mentioned, it

is also of interest that the optimal values of xl and x2 are always

very close to -.5 and .5, respectively. Finally, notice that the

design for =B 2=0 found in Table 3.2 is not the same as the one found

using El-Helbawy and Bradley's methodology, although the value of

var( 3) appears to be the same for both, and hence they are equally

optimal. However, it was not established that the two designs are

mathematically equivalent. Since the two criteria are equivalent,

there may be some problem with local minimums of var(63). It appears

that the surface of the expression for var( 3) is relatively flat, and

so it is possible that a finer initial grid is necessary to guarantee

that the minimum found is in fact the absolute minimum of var(83).












Table 3.2. Designs which minimize var(B3)




1 62 xl x2 -1,xl Xl,X2 X2,1 1,1 N-var(83)


0 0 -.50 .50 .133 .533 .133 .200 16.0000
.2 0 -.51 .50 .338 .331 .331 0 16.0836
.4 0 -.50 .50 .332 .337 .332 0 16.3225
.6 0 -.50 .50 .330 .341 .330 0 16.7322
.8 0 -.50 .50 .327 .346 .327 0 17.3190
1 0 -.50 .50 .323 .353 .323 0 18.0959

0 .2 -.50 .50 0 .666 0 .333 16.0000
.2 .2 -.50 .51 .333 .329 .338 0 16.1447
.4 .2 -.50 .50 .330 .336 .335 0 16.3835
.6 .2 -.50 .50 .329 .338 .333 0 16.7943
.8 .2 -.50 .50 .326 .346 .328 0 17.3819
1 .2 -.50 .50 .321 .353 .327 0 18.1604

0 .4 -.50 .50 0 .666 0 .333 16.0003
.2 .4 -.50 .50 .105 .561 .103 .231 16.3209
.4 .4 -.50 .50 .330 .333 .337 0 16.5664
.6 .4 -.50 .50 .327 .338 .335 0 16.9799
.8 .4 -.51 .50 .324 .342 .334 0 17.5716
1 .4 -.51 .50 .319 .349 .333 0 18.3554

0 .6 -.50 .50 0 .666 0 .333 16.0005
.2 .6 -.50 .50 .004 .661 .001 .334 16.3207
.4 .6 -.51 .50 .333 .328 .340 0 16.8748
.6 .6 -.51 .50 .327 .333 .340 0 17.2927
.8 .6 -.51 .50 .323 .337 .340 0 17.8907
1 .6 -.51 .49 .317 .346 .337 0 18.6825

0 .8 -.50 .50 0 .666 0 .333 16.0010
.2 .8 -.50 .50 .004 .661 0 .334 16.3225
.4 .8 -.51 .47 .039 .633 .078 .320 17.2619
.6 .8 -.51 .50 .328 .327 .344 0 17.7368
.8 .8 -.51 .50 .323 .331 .346 0 18.3436
1 .8 -.52 .49 .319 .338 .343 0 19.1472

0 1 -.50 .50 0 .666 0 .333 16.0015
.2 1 -.50 .50 0 .664 0 .335 16.3233
.4 1 -.51 .47 .034 .633 0 .333 17.2685
.6 1 -.52 .50 .331 .318 .351 0 18.1804
.8 1 -.52 .50 .324 .325 .351 0 18.9372
1 1 -.52 .49 .318 .331 .351 0 19.7558
















CHAPTER 4

DESIGNS FOR FITTING A QUADRATIC MODEL


4.1 Introduction


The present chapter contains optimal designs for fitting a quad-

ratic model under three different conditions, each for three different

optimality criteria. The D-optimality criterion is considered in Sec-

tion 4.2. This criterion minimizes the determinant of the asymptotic

variance-covariance matrix. In the present chapter, this matrix is a

2x2 matrix since there are two maximum likelihood estimators, 81 and B2"

From Section 2.3, these estimators are asymptotically normal. There-

fore, by a result in Box and Lucas (1959), the determinant of the

variance-covariance matrix is proportional to the volume contained with-

in any particular ellipsoidal probability contour for $=(81, 2) about

S, where $ is the true parameter value. The D-optimal design will
-o -o

then minimize the volume of any such probability contour.

Section 4.3 contains designs which minimize the average-variance

of In T = B x + 2 x The average-variance is simply the variance of

ln 't integrated over the experimental region. In this case, the exper-

imental region is xEL-l,l]. The property that these designs have is

obvious. That is to say, a good estimate of all of the preference prob-

abilities is desirable, and the design which minimizes the integral over

x of var(ln ir ) will give estimates of 7 which are good "on the
x x
average".









The criterion discussed in Section 4.4 is similar to the average-

variance criterion. The designs found in this section minimize the

maximum of var(ln TT ), where the maximum is taken over the interval
x
xEi-l,l]. These designs are referred to as minimax designs. The dis-

advantage of this criterion is similar to the disadvantage often attrib-

utable to the minimax rules in the decision theory framework. That is,

minimax designs may be slightly "pessimistic" in the sense that they

minimize the largest value that the variance can be. This minimization

is good, but at the same time the variance of In T may be relatively
x
large for most values of x in the interval [-1,1], hence causing the

minimax design to be unfavorable.

Finally, Section 4.5 presents some design recommendations and

conclusions. A number of designs are compared to determine which

design does best for a wide range of parameter values, using primarily

the D-optimality criterion. Additionally, an examination is performed

on how well these designs protect against the bias present when the

true model is cubic.

Recall that in general, the problem of finding optimal designs is

complex. It involves the determination of the number of levels to be

included in the experimental design, and also which particular levels

are to be chosen. Additionally, the number of times each pair is to

be compared must be determined. In Chapter 3 it was proven that the

design which minimizes var( 2) must be of the form {n_-x = nx,1 = N/2,

thereby implying that the optimal number of levels is four, and only

two pairs are to be compared an equal number of times. All that re-

mained to be resolved was the determination of x for different values

of a1 In the present situation, a similar simplification of the









general problem does not appear to be possible, except by imposing some

restrictions, such as letting $2=0. For this reason, three conditions

are considered under which optimal designs are found. This essentially

results in designs which are optimal for a proper subclass of the class

of all designs. However, as will become apparent in the remainder of

the chapter, there is not much difference in the degree of optimality

among the three conditions presently considered. Hence it is conjec-

tured that the degree of optimality of the designs found in the final

and most general subsection of Sections 4.2-4.4 is very close to if not

equal to the best that can possibly be achieved. The three conditions

are now briefly introduced.

The first condition considered in each of Sections 4.2-4.4 is that

the levels are restricted to be x=-l,0,1. This greatly simplifies the

general problem to one with only two unknowns, namely the comparison

proportions, P-1,0 and p0,1. The value of J-1,1 is found by the rela-

tionship -i, + 0 + -_ = i.
-tionship 1,0 0,1 -1,1

The second condition discussed in each of the following three sec-

tions is to let $2=0 in the enumeration of the variances and covariance.

In this situation the optimal designs are proved to be symmetric, and

hence the design problem is simplified to a choice from the class of

symmetric designs. As was pointed out in Section 2.3, previous papers

on the design of paired comparison experiments have always relied upon

setting all parameters equal to zero. Presently, the linear parameter

is allowed to vary. The three criteria are a continuous function of

the parameters because the variance-covariance matrix is continuous in

the parameters. Therefore, the designs found for the second condition









are locally optimal in the sense that they are close to optimal for

values of 82 which are near zero.

The third condition in each of the following three sections is a

consideration of designs whose comparisons are of the form (-l,x ),

(x2,1), and (-1,1). The optimal choice of the levels xl and x2 are

found in addition to the three comparison proportions. This is a gen-

eralization of the first condition since setting x =x2=0 results in the

first condition.

Figure 4.1 presents a synopsis of the procedures used in Sections

4.2-4.4 to find the optimal designs for each of the three optimality

criteria and the three conditions introduced in the present section.

The first column presents a brief description of the three conditions,

and hence the three rows in the figure correspond to the three condi-

tions. Columns 2-4 contain a brief description of the methodology used

for the three criteria.


4.2 D-optimal Designs


The D-optimal design is defined to be the design which minimizes

the determinant of the asymptotic variance-covariance matrix. Letting

Z denote the variance-covariance matrix, this is equivalent to maximiz-

ing IZ-l. In the case of a quadratic model, the D-optimal design is

the design which maximizes D, where


D = = 11 22 -122 (4.2.1)



and where X 11 122' and X12 are given in (3.3.2)-(3.3.4). Because the

optimal design depends on the true parameter values, designs are found













0

> C0

>- 0
-.4l o j




-,-t
o -o > o
X4 X .O O
I *- m X C) .


















O*H
a )01 '0 a .
4J -,- UT 4 *

04 r 50 -
I ifU 0 0
-C 0-4


'.0 0.
0 Cl C


I 0

"' -ri 11

































CC 0
4- 4 C) 4- CC

-1U m > U


,, >.- X -H 4
Eo 0 i


- u 'o
; U *d 1U X -H -


. .-i ro c n i
S C) U C) CB N U CC





















SH 0 1II 0 > 0 C
SO 4
S0 00 0




Uo ,) a) > o 1- z
Q f q o 0 4J

Ci 0 CC 0
0 0 004
I -H *4 CC
rC U 0 U


I NC O >S 00
i '-Q *H 0 *'C C 'i U


I E X U -. U





14 O R ir U O CC 0 0 ]
0 CC 0 CC N U V 4 CO













I 0 CC -- C: 0- C
i ti *- *H *-l CO ( C0
4 4 34 40 U










I 0 O 10 + 4J


0 H rN N- n Q, I
R 4 ul g g






S 0 0 rc4 r3 H
0 > 0 U a-i 4J C 0




E I U c
0 0 ---- CC e CC H 0




*0 0 C U 0 0 U
--4 0 -4 C 0 -





o H a U .C N 0 X N

4J C C U o HC 0Q 0 U M
0 :a 04 Io C ll -H C o x 'N
0 CC EC 4C rI I -H f X n
'N__ r ,-______< '4m~ C C ~ H ~









for various choices of the parameters. The three conditions intro-

duced in Section 4.1 are now discussed, beginning with Condition 1.


Condition 1

The levels are restricted to be x=-1,0,1. Because of this restric-

tion, and by substituting i.. for n.. in (3.3.2)-(3.3.4), the expression

for All' A12' and 22 can be simplified to



11 = -1,0 -1,0 + 0,1O Ol + 4-1,1 -1,1


12 0,1 0,1 1,0 -1,0 (4.2.2)


22 = P-1,0 -1.0 + 0,10,1


The substitution of Pij for n.. is made throughout the present chapter.

This essentially sets N=l since U.. = lim(n../N).
13 N4- ij

From (4.2.1) and (4.2.2), the expression for D is


D = 4- 1,0 0,1-1,0 0,1 +


4(1-u1,- 0 0,l1 -1,0_-1,0 + 0,1 0,1-'1,1 (4.2.3)



where .. is defined in (3.3.5). The partial derivatives of D with

respect to i -,0 and 10, are, respectively,
-1,0 0,1


3 4 0 40 ( 0 + 0 ) +
U, 0,1 -1,00, -1,1 -1,0-1,0 + 0, 0,1 +
-1,0
(4.2.4)
4(1 U_1,0 ,- ,O -1,1 -1,0 '








aD
-- =, 4U-1,00- ',0o0,1 4 -1,1( -1',0_-1,0 + '0,O10'1) +

4(1 p-1,0 0, -1,1 '


Setting the derivatives in (4.2.4) equal to zero results in


-= 1,0 -1,1 '


(4.2.5)


-iO, (-1-,0 0,1 + -,0 -,1 0,1 -1,1

0+ O,1 20,l ,


In matrix notation, (4.2.5) is equivalent to


-1I,0 -1,0 -1,1
0,1 0,i -i,i


= 0,-1,1


(4.2.6)


implying


ul 0,l -1,1


(4.2.7)


where


S2-1,'0 -1,1
A =

-1,0 0,1 -1,0 -1,1 0, -1,1


-tl,0 O,l% 0,1 -1,1 -1,

2%,itid)


For the solution of (4.2.7) to be a maximum for the expression in

(4.2.3), it needs to be shown that the matrix of second order partial


(4.2.4)


1-1 ,0(2.-_1,0 _I,' +









derivatives is negative definite (see Kaplan, page 128). By (4.2.4),

this matrix is


-1,02 D/-1,O 0,1
= = -4 A (4.2.8)
2 D/-1,0 Ol 2D/O, 12


where, from (3.3.5),


1 1 2
Se= 1/( e + e

-i + B + 82 2
1 2 1 2 2
= e /( e + 1 (4.2.9)

6i -u S -i-S
S= e + 2/(1 + e1 2 )2
0,



It is known from elementary matrix algebra that the determinant of

a square matrix is equal to the product of its eigenvalues. So in this

case, IXI=ele2, where el and e2 are the eigenvalues of X. If jX[>0,

then either both eigenvalues are positive or both are negative, or

equivalently, X is either positive definite or negative definite. But

it is clear from (4.2.8) that



( 1 0 ) X = -8_-1,1 -10 < 0, (4.2.10)



where -1,1 and (,0 are both positive and are given in (4.2.9). The
-1,1 -1,0
inequality in (4.2.10) implies that X is negative definite if |xJ>0.

However, it will be shown by (4.2.12) that the determinant is not nec-

essarily positive for all values of 81 and $2.








To simplify the notation, we write


/ -2a

-(a + c b)


-(a + c b)

-2c


where


a = = e / (e + e )(e + 1)
26 -1 + 2 a3 + )] 2

b = 0-1,010,1 = e22/ (e 1 + 1)(1 + e 1 2 2

81 +82 -81 1 + 2] 2
c = i,0, = e / (e + e )(1 + e )


Then the determinant of X can be written as

IXI = 4ac (a + c b)2

2ac + 2ab2 2 c
= 2ac + 2ab + 2bc a -b -c


(4.2.11)


After resubstituting for a, b, and c, and after some algebra in

(4.2.11), an expression for IXI is found by


482 -B + 382 8 + 8 -8 + 8
e + 4e + 1 + 4e + 3e

1 3e + 382 -381 + 2 -21 + 282 381 + 382
S= 2 +3e + e + 3e + e

281 + 28 28 -381 + 382 38 + 82
+ 3e +3e + e + e


- -21 -8 281 + 482 -48 + 282
e + 2e + e + e

481 + 28 -281 + 482 281 81 + 282
+ e + e + e +2e


(4.2.12)



I,


I









where

282 [(-B, B1 -8 + 2 B1 + 82, ) 2
S= 4e / (e + e ) (e + 1)(1 + e 1 (4.2.13)


Since >>0 for all $1 and 82, IXI>0 if and only if the right-hand side

of (4.2.12) is positive. However, it appears that for large enough

values of B1 and B2, some of the terms with a negative sign will over-

whelm the sum of all of the terms with a positive sign, and hence the

determinant could be negative. For example, 1 and B2 can be made

large enough such that the term exp(481 + 282) in (4.2.12) is larger

than the sum of all the terms with positive signs, thereby making

the expression in (4.2.12) negative. It will be seen shortly that this

is in fact the case.

An APL computer program was utilized to solve equation (4.2.7) for

various values of B1 and B2. The determinant, Ix was also calculated

and found to be positive for all values of 81 and 82 when the solution

fell inside the region of allowable values for the comparison propor-

tions, p ,0' ,1 and -i 1,1 That is, J.. Z 0, for all i and j, and

-1,0 0,1 + -1 ,1 = 1. Figure 4.2 depicts the values that -1I,0 and
-Iu Ui -+,v -1,0
j0,1 can have. They must lie inside or on the solid triangle. The
0,1
dotted triangle in the figure depicts the region that the D-optimal

designs are exclusively located.

When the solution to (4.2.7) falls outside of the solid triangle

in Figure 4.2, the optimal design must lie somewhere on the solid tri-

angle. This can be proven by contradiction since D is a quadratic func-

tion of two unknowns, and hence it can only have one stationary point.

For a point to be on the triangle, one of the following must be true:

1-1,0 = 0, o0,1 or -1,0 0,1= 1.











10,1


1









.5 ---------









0
0.5 1




Figure 4.2. Depiction of possible designs for Condition 1



If i,0 = 0, then from (4.2.3),


D = 40,1l _,10, 1 1 10,12



and 3D/O,1 = 0 implies lOi = 0.5. This clearly maximizes D

because of the negative coefficient on the quadratic term. Therefore

if the optimal design lies on the line segment -i1,0 = 0, then it must

be {VI0, = 1- = 0.5}. If v0,i = 0, then

2
D = 4_1,10-1,0 -, -l,0



and it can similarly be shown that the optimal design is the midpoint









of the line segment, namely {0, = 0, -I,0 = -1, = 0.5}. Finally,
0 / i ~-1,0 -1= /

if 1-1,0 +0, = 1, then


0 = 1 0 -
D = 4~-1,010, -1_,0 0,l



= 4- 1,0 0,11-1,0(1 -1,0'


Again, it can similarly be shown that D is maximized when _,

p0, = 0.5. Therefore, if the D-optimal design is on the solid tri-

angle, it must be one of these three designs.

When the solution to (4.2.7) was outside of the solid triangle in

Figure 4.2, D was calculated for the three designs presented in the

preceding paragraph, and the one with the largest value of D was taken

as D-optimal. Otherwise the D-optimal design was found directly from

(4.2.7). The determinant of X was also calculated. It was occasion-

ally negative as conjectured earlier, but only when the solution was

outside of the region, at which time the sign of IXJ is irrelevant.

The "restricted" D-optimal designs are presented in Table 4.1 for

all combinations of the parameter values B =0(.1)1,2,3 and 2=0(.1)1,2,3.

Recall that -1,0' 0,1' and p-1,1 represent the proportion of the

total available comparisons to be allocated to the pairs (-1,0), (0,1),

and (-1,1), respectively. The number of times one of these three pairs

should be compared is then found by multiplying the corresponding Pij

by N. The values of D are also given for the purpose of comparing the

designs with the designs found in the next two subsections. Addition-

ally, the relative efficiencies are given for the "restricted" D-opti-

mal designs presented in Table 4.1 relative to the D-optimal designs









found for Condition 3 in Table 4.3. The relative efficiency is the

square root of the ratio of the corresponding values of D, since D is a
2
multiple of N

Recall from Section 3.1 that a value of 1 for the parameter 1 is

"large". This is basically true for any parameter since x is in the

interval [-1,1]. For this reason, it is assumed that rarely will S1 or

82 be larger than 1, and hence parameter values of less than or equal to

1 are of most interest. Parameter values of 2 and 3 are included pri-

marily to observe the change in the designs for large parameter values.

Table 4.1 only gives positive parameter values. The following

theorem shows that the table can also be used to find optimal designs

for negative values of 81 or 2. This theorem is similar to Theorem

3.6.


Theorem 4.1. Suppose that the D-optimal design for given 81 and 82

is a comparison of x. with x. a total of n times, for all pairs in
1 ] x.,x.
the design. Then


(i) The D-optimal design for (-81) and (-82) is the same as for

81 and 682

(ii) The D-optimal design for (-81) and 2 is a comparison of

(-x.) with (-x.) a total of n times, for all pairs in
1 ] x.,x.
1 3
the design,

(iii) The D-optimal design for 81 and (-82) is the same as in (ii).


Proof. From (3.4.3),


- 1,- (xix.) = i, 2(x.,x.) ,








where ( 12(x.,x.) is given in (3.3.5). From (3.3.2)-(3.3.4) and the

above result, it is clear that A ll, 22' and A12 are invariant under

changing the sign of both 81 and 2. Therefore, since D = A 1 22- 12

D is invariant under changing the sign of both B1 and B2, and so (i) is

proved.

To prove (ii), note that by (3.4.4),


1,l 82 (xi,xj) = l,0 (- ,-x )



Again, from (3.3.2)-(3.3.4) and the above, changing 81 to (-61) and the

pairs (x.,x.) to (-x.,-x.) will not change XA nor A22 It changes

only the sign of 12' and hence does not change A 12. Therefore (ii)

is proved.

To prove (iii), simply apply the results of (i) and (ii).


So, as in Section 3.3, the optimal design for both parameters nega-

tive is the same as the design for I811 and IB21. When exactly one

parameter is negative, the column headed 1,0 becomes 0,' and vice

versa.

Notice from Table 4.1 that for small parameter values, the optimal

design is inside the solid triangle in Figure 4.2, signifying all three

pairs are to be compared. The optimal design approaches the boundary

as the parameters increase. Also, in all cases the optimal design was

inside or on the dotted triangle in Figure 4.2, signifying that no pair

should be compared more than half of the time.

Notice also from Table 4.1 that most of the relative efficiencies

are approximately 0.95 when 81 and 82 are both less than or equal to 1.









This indicates that the increase in efficiency for the generalization

considered under Condition 3 is slight. If an experimenter warrants

that a minimum number of levels should be used, then the advantage of

the designs found presently under Condition 1 is that they have only

three levels as opposed to four levels for the other two conditions.

The fact that they are also about 95% efficient allows the present

designs to be useful in this situation. However, if there is no signi-

ficant advantage to only having three levels, then perhaps the designs

found under Conditions 2 and 3 are more useful since they are slightly

more efficient.


Condition 2


In this subsection, locally D-optimal designs are found analytical-

ly for 32=0. The designs are local as described in Section 4.1. That

is, by the continuity of the criterion, the optimal designs are close

to optimal for 32 close to zero. It turns out that they are also local

in the sense that the designs found under Condition 3 have slightly

larger values of D for some values of 1. Since the criterion is a

maximization of D, the designs under Condition 3 are therefore occa-

sionally slightly better.

By the same argument that was used in the proof of Theorem 3.1,

the D-optimal design must be a symmetric design when 2=0, which implies

12=0. So the search for a D-optimal design is reduced to a search

for a symmetric design which maximizes D = 1 22. Also, the proof of

Theorem 3.2 can be presently implemented to show that every pair in the

D-optimal design must include either x=-l or x=l.









From (3.3.7), .ij is written as


4i. = ( 4cosh2(1 (x. x.)/2) )
13 i J


where


1 u -u
cosh(u) = (e + e-)
2


1 u -u
sinh(u) = (e e) .
2

The maximization of D is equivalent to the maximization of the follow-

ing redefined D:


D = 4Xi 22 = f(x)g(x)

N/2 2



N/2 (1 x.)
f(x) =
i= cosh2(8(xi 1)/2)



N/2 2 + .)2
g (x ) = .
cosh (8 (x. 1)/2)
i=l 1


x' = (1,X2.x, ... XN/2)


Two useful, easily verified facts are


3cosh(6 (x. 1)/2) 1
x 2- B sinh(8l(x -
x. 2 1
.1


(4.2.14)






(4.2.15)





(4.2.16)


1)/2) ,


(4.2.17)


Dsinh(6(xi 1)/2) 1
= 1 -- cosh(3S x 1)/2)
ix. 2


where









The partial derivative of D with respect to x. is then
1

DD f(x) 9g(x)
x g(x) + f(x) (4.2.18)
dx. ax. x.
1 1 1


where


af(x) -2(1 x.) (1 x)2 sinh(8 (xi 1)/2)
-1 1 1 11
axi cosh2 (6(x 1)/2) cosh 3 (x 1)/2)
(4.2.19)


Dg(x) 4x.3 4x. (1 x.2) 2 sinh(3C(x. 1)/2)
1 1 1 1 1
xi cosh2(8 (x. 1)/2) cosh3(1 (x. 1)/2)

(4.2.20)
i=l,...,N/2.


Because of the complexity of handling these N/2 equations of N/2

unknowns, and because it appeared to be reasonable, the expression in

(4.2.18) was evaluated at the N/2-dimensional point x.=x, i=1,...,N/2,

set equal to zero and solved. This produced a design of two equally

weighted comparisons, namely (-l,-x) and (x,l), for some x. The mechan-

ics of the procedure follows.

The partial derivative of D with respect to x., evaluated at x.=x,
1 1
for all i, is


S -2(l -__x) l (1 x)2sinh (1 2)2

9D L cosh cosh3 cosh


-+ 2(l x2 (1 x) )
2 2 2




cosh cosh cosh2

(4.2.21)

where cosh and sinh are shorthand for cosh(B1(x 1)/2) and








sinh(1B(x 1)/2), respectively. By setting the expression in (4.2.21)

equal to zero, multiplying it by cosh5/(N(l x)3(1 + x)), and combin-

ing terms, we have


(1 + 3x)cosh + B1(1 x2)sinh = 0 (4.2.22)


Because of the transcendental nature of this equation, x can not be

expressed as a closed form function of 1. Consequently, an APL pro-

gram was written to find the solution to (4.2.22) for various choices

of 1. These designs are presented in Table 4.2 and are discussed at

the end of the present subsection.

In order for the solution to (4.2.22) to be a local maximum for D,

it must be shown that


cosa


(cosal, ... ,cos /2) axx. 0

Sx.=x c /
cosaN/2


for any angles ai, such that 0 < a. 2r, and


N/2
cos a = 1
i=l


(see Kaplan, page 128). This is equivalent to showing that the matrix

X is negative definite, where


( 2D
X = (4.2.23)
3x N/2xN/2
i = N/2 x N/2








From (4.2.18), the second order partial derivatives are


2 D 2f(x) 2f(x) Dg(x) 2 g(x)
= g(x) + 2 + f(x)- i=,...,N/2,
2 2 1 (4.2.24)x.
S1 1 (4.2.24)


a2D
2x. x.
1 3


F[2f(x) f(x) 2g(x)] f(x) 2g(x) 2gx)
= x.x. xg( + xx. Sx. + x.) x-
131- 3 iL j 1 3


;f(x) 3g(x) 9f(x) 3g(x)
= 2x x + -. x. x. ij, i,j=l,... N/2,
1 3 3 1


(4.2.25)


where, from (4.2.19) and (4.2.20),


2 2812
2 (-x )2
22 i 1 +
cosh


4(l-x )B sinh
cosh3
cosh


2 1 2 2 2
4(3x. -1) -(1-x )
1 2 c
cosh2


-(l-x ) 21 sinh2

cosh4
(4.2.26)


3
8(x. -xi)S sin]

cosh3


3 1-x 2 2 inh 2
-(1-x ) 8I sinh
cosh 4
cosh


I
i=l,...,N/2.


(4.2.27)


Evaluating the second order partial derivatives at the point x.=x,
for al i, (4.2.23) can be written as
for all i, (4.2.23) can be written as


a b
b a


x =


(4.2.28)


a b
b a


22f (x)
2





92g(x)
ax.2
1





2
2 gx)

ax.2
1









where


= 2D

Dx.
x .=x




= g (x) + 2 f (x) (2 (4.2.29)
2 xx -x.2
3x 3xi
xa.=
Xi=X

and


2
Do
b = Dxx (any ifj)

x.=x
1


Df(x) Dg(x)
= 2 Dx. x. (4.2.30)
1 1i



The complete expressions for a and b can be found from (4.2.29) and

(4.2.30), in addition to (4.2.15), (4.2.16), (4.2.19), and (4.2.20),

by substituting x for x..

The matrix X is of a special form as indicated by (4.2.28). It

can be written as


X = D + yz' ,


where D=(a b) I N/2, =b, and y'=z'=l'=(1, ... ,1). From Graybill

(page 187, Theorem 8.5.3), the characteristic equation is


N/2 N
-1
S d 2
(d + c7 y.z. ) (d ) = 0
i=l









N

N
d + a Zy.z. = (a b) + b. As discussed in the previous subsection,
1 1
1
the matrix X is negative definite if and only if all of the eigenvalues

are negative. An analytic proof that the two eigenvalues are negative

follows.

Consider first the eigenvalue (a-b), where from (4.2.29) and

(4.2.30),


C2x) 2 g(x)
a-b = g(x) + f(x)
3x.i x.

1


cosh2(2 (l1-x)2 ) + sinh-cosh-(4(l-x)1


+ sinh2 (- 2(1-x) 2 2J -22

cosh4 cosh2


2 2 1 2 2
cosh (4(3x -1) -(1-x )B
2 1
(1-x) 2 + sinh-cosh-(8x(l-x 2) ) + sinh ( -(1-x2) 2)

coshcosh 4

(4.2.31)


I.t needs to be shown that (a-b) is negative at the stationary point x,

which is the solution to (4.2.22). This relationship between x and B1

found in (4.2.22) is used in the above expression of (a-b) by substi-

tuting for sinh,


-(1 + 3x)cosh
sinh = (4.2.32)
1l(1 x2)








Upon substituting (4.2.32) into (4.2.31), the eigenvalue (a-b) at the

stationary point, x, is



(2 --x) 2 -4(1+3x) x)2] 1-x) 2 +
2 1 (2(1+x) 2 cosh4
a-b =

+ 2 1 2 2 3 21 4
[ 4(3x -1) 1-x )1) 8x(l+3x) + I1(l+3x) .2sh 3 3

(4.2.33)


After factoring out -(l-x) /cosh and combining terms,


a-b < 0

if and only if (4.2.34)


2 (x 1)(x2 2) ) + ( -22x2 20x 6 ) < 0
1 2

The first term in (4.2.34) has roots x=l and x=+v2, and is nonpositive

for XEL-l,l]. The roots of the second term are imaginary, and clearly

the function is negative at x=0. Hence it is negative for all x. The

result a-b < 0 then follows from (4.2.34).
N
The second eigenvalue is (a-b) + b. From (4.2.19), (4.2.20),

and (4.2.30),



b 2(1-x) (-2cosh (l-x)B sinh) (-4x(l+x)cosh (l-x)(l+x)2 sinh)
cosh6 1 1



2(1-x) 8x(l+x)cosh2 + 2(1-x)(l+x)2 1cosh-sinh
cosh6 + 4x(l-x2S cosh-sinh + (1-x2) 2 2sinh2


2(1-x) (-x2 + 2x 1) (4.2.35)
cosh









the last step produced by again using the relationship in (4.2.32) and

combining terms. Then


N (1 x)
Sb =(2x 2) (4.2.36)
2 4
cosh


By factoring N(1-x) /cosh4 out of the second eigenvalue, it is clear
2
from (4.2.34) and (4.2.36) that


(a b) + b < 0
2
if and only if


2 2(- -(x2-1)(x -2)) + (-22x2-20x-6) + (2x-2) < 0
1 2

if and only if


12(- -()x2-) (x -2)) + (-22x2-18x-8) < 0 (4.2.37)


The first term in (4.2.37) is unchanged from (4.2.34). The roots of

the second term are still imaginary, and hence the eigenvalue is nega-

tive. Therefore X is negative definite, and the solution to (4.2.22)

is in fact a local maximum for D.

The locally D-optimal designs for 81=0(.02)1,2(1)5 are presented

in Table 4.2. The value of the original D=llX 22 is also given for

comparison purposes. By comparing the present values of D to the cor-

responding values of D associated with the designs in Table 4.1, it can

be seen that the present designs offer a slight improvement for 2=0.

Theorem 3.3 is applicable here since B2=0, and so the optimal design

for any particular value of 81 is the same as the optimal design for

(- ). Recall that for a given 31, the optimal design is to make half









of the total comparisons between x and 1, where x is found in the table,

and the other half between (-x) and (-1). The optimal value of x is

graphed as a function of 1811 for 1811 1 in Figure 4.3.


Condition 3


A grid method was used to find D-optimal designs with no restric-

tions on the values of the two parameters. With 2S 0, the argument for

a symmetric design is no longer applicable. As a generalization of the

first subsection, the three comparisons (-l,x ), (x 2,1), and (-1,1)

were allowed. A description of the grid method follows.

In general, if there are k independent unknowns, and the grid is

to have n values for each variable at each stage, then the grid method

is simply an evaluation of the function at each of the n points, and

the determination of which of these points maximizes (or minimizes) the

function. It then repeats the procedure, only with a finer grid cen-

tered at the best point previously found. This procedure is repeated

until the optimal point has been found with sufficient accuracy.

The procedure used for the present condition was essentially a

grid within a grid. At each step, the combination of three values each

of x1 and x2 were taken. Initially these values were -0.5, 0, and 0.5.

For each of these 9 points, a grid method was used on the allocation of

the comparisons to the three pairs. This grid consisted of initial

increments of 0.1 in the allocation fractions. Notice that this is not

a full 113 grid since, for example, { -, = ,l = U-1 = 0.5} is
1X X2 L ,1
not a possible allocation for the three pairs because they must sum to

1. After the best of these possible allocations were found, a finer 52








grid was performed about the current best allocation. Note that it is

a 52 grid rather than a 53 grid, because when two of the allocation

proportions are known, the third is just the sum of the two proportions

subtracted from 1. This was then repeated until the best allocation

fractions were accurate to .0001. The best of the 9 combinations of

x and x2 was taken, and then a finer 32 grid about the current best

combination of xl and x2 was performed using the same procedure des-

cribed in the present paragraph.

The entire procedure described in the preceding paragraph was

repeated until the values of x1 and x2 were also accurate to .0001.

D-optimal designs were found for combinations of 81=0(.1)1,2,3 and

82=0(.1)1,2,3, and are presented in Table 4.3. Once again, the value

of D for N=l is also given in the table. Notice from the table that

when nx,1=0, the value of x2 is immaterial, and therefore an asterisk


appears under the column for x2.

Theorem 4.1 is applicable, so designs for negative 81 or 82 can

also be found from the table. If both S1 and 82 are negative, then the

optimal design is identical to the one for positive B1 and 82. If,

however, exactly one of the parameters is negative, then the optimal

design is the same as the one for 16 I and 1B I, except that nx
1 2 -P1t tx

changes to n and n x2 changes to nx2,. For example, if
l,-x x ,l 2

81=0.6, 82=-0.3, then the D-optimal design would be to run .462N compar-

isons of the pair (-1,.23), .468N of the pair (-.25,1), and .070N of

the pair (-1,1).

Table 4.3 indicates that once again no pair should be compared more

than half the time. Also, the optimal design is inside the region










Table 4.1. D-optimal designs with restriction x=-1,0,l




1 82 -1.,0 p0,l -1,1 D Efficiency

0 0 .333 .333 .333 .083333 .9613
.1 0 .334 .334 .332 .082506 .9613
.2 0 .337 .337 .327 .080098 .9613
.3 0 .341 .341 .318 .076314 .9612
.4 0 .348 .348 .305 .071460 .9608
.5 0 .357 .357 .287 .065895 .9599
.6 0 .369 .369 .263 .059986 .9583
.7 0 .384 .384 .232 .054068 .9555
.8 0 .405 .405 .190 .048426 .9536
.9 0 .433 .433 .135 .043283 .9542
1 0 .470 .470 .060 .038814 .9592
2 0 .500 .500 0 .011024 .9664
3 0 .500 .500 0 .002041 .7260

0 .1 .333 .333 .334 .083056 .9615
.1 .1 .336 .332 .332 .082233 .9617
.2 .1 .340 .333 .327 .079836 .9615
.3 .1 .345 .336 .319 .076070 .9613
.4 .1 .353 .342 .306 .071237 .9610
.5 .1 .362 .350 .288 .065694 .9602
.6 .1 .374 .362 .264 .059805 .9585
.7 .1 .389 .378 .233 .053905 .9558
.8 .1 .409 .399 .192 .048276 .9538
.9 .1 .435 .428 .137 .043141 .9543
1 .1 .470 .467 .063 .038676 .9591
2 .1 .500 .500 0 .011001 .9660
3 .1 .500 .500 0 .002039 .7257

0 .2 .332 .332 .336 .082233 .9619
.1 .2 .336 .330 .334 .081422 .9619
.2 .2 .342 .329 .329 .079060 .9619
.3 .2 .349 .331 .321 .075346 .9618
.4 .2 .357 .335 .308 .070575 .9615
.5 .2 .367 .343 .291 .065097 .9608
.6 .2 .378 .354 .268 .059269 .9594
.7 .2 .393 .370 .237 .053421 .9567
.8 .2 .411 .392 .197 .047833 .9544
.9 .2 .436 .421 .143 .042725 .9545
1 .2 .468 .461 .070 .038270 .9587
2 .2 .500 .500 0 .010931 .9647
3 .2 .500 .500 0 .002034 .7245

0 .3 .331 .331 .338 .080888 .9625
.1 .3 .337 .327 .337 .080098 .9625
.2 .3 .344 .324 .332 .077793 .9626




Full Text

PAGE 1

DESIGN OF PAIRED COtlPARISON EXPERIMENTS WITH QUANTITATIVE INDEPENDENT VARIABLES BY WALTER WILLIAM OFFEN A DISSERTATION PPJESENTED TO THE GRADUATE COUl-ICIL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REOUIPEMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY or FLORIDA 1980

PAGE 2

Digitized by the Internet Archive in 2009 witPf ftifi^ing from University of Florida, George A. Smathers Libraries http://www.archive.org/details/designofpairedcoOOoffe

PAGE 3

ACKNOWLEDGMENTS I would like to thank Drs. Ramon C. Littell and William Mendenhall for encouraging me to work past the Master of Statistics degree to pursue a Ph.D. I sincerely appreciate their concern and beneficial advice. A special thanks goes to Dr. Ramon C. Littell, first of all for the attention and help he gave to me in my early years at the University of Florida while working for him at the Institute of Food and Agricultural Sciences. I especially thank him for the immense amount of time and effort he contributed to this project. His patience and sincere interest will always be remembered. Thanks go to my family for their understanding and support during the many years of my education. I am especially grateful to my wife, Susie, for her love and caring, which makes it all worthwhile. Finally, I would like to thank my typists, Susan and Walter Of fen, who both did an outstanding job of typing this manuscript. Also, thanks go to Drs. John Cornell, Ramon Littell, Frank Martin, and Kenneth Portier for allowing my typists to use their office typewriters .

PAGE 4

TABLE OF CONTENTS Page ACKNOWLEDGMENTS iii LIST OF TABLES vi LIST OF FIGURES vii ABSTRACT viii CHAPTER 1 DESIGN OF PAIRED COMPARISON EXPERIMENTS : A LITERATURE REVIEW 1 1.1 Introduction 1 1.2 Bradley-Terry Model 3 1.3 Estimation of the Bradley-Terry Parameters and Corresponding Asymptotic Distribution 6 1.4 Designs of Factorial Paired Comparison Experiments 11 1.5 Response Surface Designs 22 2 ASYMPTOTIC THEORY OF THE MAXIMUM LIKELIHOOD ESTIMATORS . 26 2.1 Introduction 26 2.2 Estimation 27 2.3 Asymptotic Distribution 29 3 MINIMIZATION OF THE VARIANCE OF A SINGLE PARAMETER ESTIMATE 34 3.1 Introduction 34 3.2 Linear Model 34 3.3 Quadratic Model 38 Minimization of var (S^) 38 Equivalence of El-Helbawy and Bradley's method and minimizing the variance .... 50 3.4 Cubic Model 58 4 DESIGNS FOR FITTING A QUADRATIC tODEL 65 4.1 I.ntroduction 65

PAGE 5

Page 4.2 D-optimal Designs ^^ Condition 1 '^'^ Condition 2 ^^ Condition 3 4.3 Average-variance Designs 99 Condition 1 100 Condition 2 101 Condition 3 H^ 4.4 Minimax Designs H^ Condition 1 H^ Condition 2 H"^ Condition 3 H^ 4.5 Conclusion and Design Recommendations 128 5 DESIGNS THAT PROTECT AGAINST BIAS 135 5.1 Introduction 13^ 5.2 All Bias Designs 13^ 5.3 Integrated Mean Square Error Designs 138 6 DESIGNS FOR PRELIMINARY TEST ESTIMATORS 145 6.1 Introduction 1^^ 6.2 Average-variance Designs for Preliminary Test Maximum Likelihood Estimators 146 7 T\'70-STAGE SAMPLING FOR MODEL FITTING 150 7.1 Introduction 1^0 7.2 Optimal Error Rate 151 7.3 Concluding Remarks 157 APPENDIX A COMPUTER PROGRAM THAT FINDS MAXIMUM LIKELIHOOD ESTIMATES OF -n,...,TT ,e 1^5 B COMPUTER PROGRAMS THAT FIND MAXIMUM LIKELIHOOD ESTIMATES 0^3, 3^ BIBLIOGRAPHY BIOGRAPHICAL SKETCH 170 178 181

PAGE 6

LIST OF TABLES TABLE Page 2 3 1.1 Efficiencies of balanced 2 and 2 factorial paired comparison designs 18 3.1 Designs which minimize var(6 ) 48 3.2 Designs which minimize var(3 ) 54 4.1 D-optimal designs with restriction x=-l,0,l 90 4.2 D-optimal (local) designs with S_=0 94 4.3 D-optimal designs 95 4.4 Average-variance designs with restriction x=-l,0,l . . . 105 4.5 Average-variance (local) designs with B=0 109 4.6 Average-variance designs 110 4.7 Minimax designs with restriction x=-l,0,l 119 4.8 Minimax (local) designs with 3_=0 123 4.9 Minimax designs 124 4.10 Efficiency of designs 133 4.11 Bias of designs 134 5.1 J-criterion designs 141 5.2 Values of J 143

PAGE 7

LIST OF FIGURES FIGUI?E Page 1.1 Depiction of Designs I and II 12 3.1 Designs which minimize var(B-) 49 4.1 Procedure summary 69 4.2 Depiction of possible designs for Condition 1 75 4.3 Locally optimal designs when 6-=0 129 7.1 Minimization of the average mean square error Uniform prior 161 7.2 Minimization of the average mean square error Normal prior 152 7.3 D-optimality Uniform prior 163 7.4 D-optimality Normal prior 164

PAGE 8

Abstract of Dissertation Presented to the Graduate Council of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy DESIGN OF PAIRED COMPARISON EXPERIMENTS WITH QUANTITATIVE INDEPENDENT VARIABLES By Walter William Offen August 1980 Chairman: Ramon C. Littell Major Department: Statistics An experiment in which the treatments are compared and ranked pairwise is called a paired comparison experiment. The Bradley-Terry model for preference probabilities with ties not allowed is used in this dissertation. This model defines treatment parameters, tt , ... , 7T , such that the probability that treatment T. is preferred over treatment T. is equal to tt. (tt. +Tr . ) . l-Then the treatments are levels of continuous, quantitative variables, the logarithm of the BradleyTerry parameters can be modeled as a regression-type model , InTT . = 'iS, X, . . There have been a large nvimber of papers on the topic 1 ^ k ki of paired comparisons , but only a few have considered experimental design. Past papers on design are reviewed in Chapter 1. In 1973 Springall presented the asymptotic distribution of the maximum likelihood estimators of B, , ... ,3 , where s is the number of regression1 s type parameters in the model. Chapter 2 of the present dissertation contains a derivation of this asymptotic distribution. The remainder

PAGE 9

of the dissertation is a consideration of designs for situations where the treatments are levels of a single quantitative variable. The asymptotic variance-covariance matrix is a function of the true parameter values. Hence, the optimal designs also depend on the parameters. In previous papers on design, the Bradley-Terry parameters were taken to be equal when evaluating the variances and covariances. The present dissertation considers the design problem for various parameter values. Notice that the general design problem is complex. It involves deciding how many levels and which particular levels should be chosen. Furthermore, it must be determined how often to compare each pair of treatments. In some cases this complexity was resolved. In other cases additional restrictions had to be imposed before optimal designs could be found. Chapter 3 considers designs which minimize the variance of a single parameter estimate, S.. The linear, quadratic, and cubic models are considered. The optimal design in the linear case turns out to be comparisons of only the smallest level in the experimental region with the largest, for most values of the linear parameter. The designs for the quadratic and cubic cases depend more heavily on the parameter values than for the linear case. Chapter 4 presents optimal designs for fitting a quadratic model. The optimality criteria considered are D-optimality , the minimization of the average variance of Inir , and the minimization of the maximum value of InTT.. These designs also depend heavily on the parameters, although some overall design recommendations are given in the last section.

PAGE 10

The remaining chapters contain a brief discussion of related topics. Chapter 5 is a consideration of designs which protect against the bias present when the true model is cubic. Oiapter 6 is a discussion of designs for preliminary test estimators. Chapter 7 is a consideration of two-stage sampling. The first stage is a design which results with a small variance of the quadratic parameter estimate. A test of hypothesis then determines whether the second stage design should be one which is optimal for fitting a linear or a quadratic model. A discussion of the best error rate to use for the test of hypothesis is included. The appendices contain computer programs that find maximum likelihood estimates of the Bradlev-Terry parameters and 6,/ ... ,B • These 1 s programs are written in the languages APL and Fortran.

PAGE 11

CHAPTER 1 DESIGN OF PAIRED COMPARISON EXPERIMENTS A LITERATURE REVIEW 1.1 Introduction An experiment in which the treatments are compared and ranked pairwise is called a paired comparison experiment. Such an experiment requires a number of siob^jects which are usually individuals, but could conceivably be a machine, for instance. Each subject is required to compare two samples and decide which of the two is preferable for the attribute under study. For example, each subject could be asked to state which of two samples tastes better. Paired comparison experiments are often conducted in areas such as food testing where it may be difficult for an individual to quantify his like or dislike of a sample, but he may be able to readily decide which of the two samples is preferred. Suppose that an experimenter wants to determine which of four brands of coffee most consumers prefer. There are a number of experimental procedures available. One possibility is that each individual be required to taste and rank all four brands from least favorable to most favorable. However, sensory fatigue may be a problem because after tasting the second or third sample, it may be difficult to perceive any difference between the s\ibsequent sample and the first sample tasted.

PAGE 12

Another possible procedure is to have each individual assign to each brand a score on some niomerical scale. Sensory fatigue may again be a problem with this experimental procedure since all four brands are tasted by each individual. Additionally, there may be a degree of arbitrariness to choosing a numerical score for each treatment. A third possibility is to conduct a paired comparison experiment. The problems mentioned in the two preceding paragraphs are not present for paired comparison experiments. That is to say, paired comparison experiments have a minimal amount of individual guesswork involved. As pointed out in the first paragraph, the complete experiment involves a number of individuals , each one choosing out of two samples the one which is preferable. For the particular example with four brands of coffee, 24 s\±)jects might be utilized, thereby resulting in 4 replicates of each of the 6 distinct pairs of treatments. The treatments can have a factorial arrangement. However, factorial paired comparison experiments have often been analyzed ignoring the factorial arrangement of the treatments. For example, Larmond, Petrasovits, and Hill (1968) conducted a 2^3 factorial paired comparison experiment, but they did not test for interaction or main effects as normally is done with factorial experiments. Although Abelson and Bradley (1954) presented a theoretical development, only relatively recently have tractable methods been available for testing interaction and main effects in factorial paired comparison experiments. In the case of quantitative treatments, an analogue of multiple linear regression analysis is appropriate. For example, instead of four distinctly different brands of coffee, there may only be one brand

PAGE 13

of coffee with four different levels of an additive. The design of paired comparison experiments for fitting response surfaces with one independent variable is discussed in Chapters 3-7. 1.2 Bradley-Terry Model Denote the t treatments as T ,T , ... ,T . Let n ^ be the 12 t ij nvimber of times T. and T, are compared, i< j , i,j=l, ... ,t. We write T. -^ T. for "T. is selected over T.", where selection is based on the particular attribute under study. A general model defines ( ) functionally independent parameters , i> tt . £• 1 , such that P{T. ^ T.) = IT. . , IT. . + TT. . = 1, iT^j, i,j = l,. . .,t. (1.2.1) Each distinct pair of treatment comparisons is a set of independent Bernoulli trials. The entire experiment is a combination of the (") independent sets. Bradley and Terry (1952) proposed a basic model for paired comparisons. A treatment parameter tt . is associated with each treatment T , 1 1 i=l, ... ,t. The Bradley-Terry model is then defined to be P(T. ^T.) = TT. . = Tr./(7T. + TT.) , i^^ j , i , j = l , . . . , t . (1.2.2) The right-hand side of (1.2.2) is invariant under scale changes, so a single constraint is imposed to make the treatment parameters unique. Two popular choices are Ztt. =1, or z^ln-r. =0. This constraint is 1 1 replaced with another constraint in Chapters 2-7.

PAGE 14

Another model for paired comparisons was presented by Thurstone (1927) . He used the concept of a subjective continuum on which only order can be perceived. Let y . represent the location point on the continuum for treatment T. , i=l, ... ,t. An individual receives a 1 sensation X. in response to treatment T. , with X. distributed 1 IX Normal (u. ,1) . The probability that T. is preferred over T. is then P(T. -^ T.) = P(X. > X.) = -^ I e ^ /^ dy , ^^^ (1.2.3) 3 ^ ^ ^J i,j = l t. -(!i.-U.) 1 : Bradley (1953) noted that replacement of the normal density function in (1.2.3) by the logistic density function yields P(T. -^ T.) =1 -sech^(y/2) dy = tt ./("". +1T . ) , ^^^ (1.2.4) ' ^ J ^ ^ ^ 3 i,j = l t. (InTT.-lnTT . ) 1 : This is the Bradley-Terry model with u.=lnTT., i=l, ... ,t. Thompson and Singh (1967) gave a psychophysical consideration by assuming a treatment T stimulates unobservable signals U , ... ,U which are then transmitted to the brain by m sense receptors. The U. are independent and identically distributed, and m is taken to be fixed and large. The two variables considered for the response X to treatment T are (i) X is the average of the signals U , ... ,U , (ii) X is the maximum of the signals U . ... ,U . 1 m

PAGE 15

By the Central Limit Theorem, asymptotic normality results for X in (i) , which results with Thurstone ' s model. The random variable in (ii) has one of three distributions depending on the distribution of the U. . If U. , for example, is distributed exponentially, then the asymptotic distribution of X is the extreme value distribution. In any case, however, the Bradley-Terry model in (1.2.4) results from (ii) . Rao and Kupper (1967) generalized the Bradley-Terry model to allow for ties. A tie parameter 9 ^ 1 is introduced, and the model becomes P(T. ^ T.) = I i sech^(y/2) dy = IT ./ (TT. +011 . ) , ^^^ i,j=l,...,t, (InTT.-lnTT. )+ri 1 D -(IniT.-lmT J+n 1 D 1 2 p(T. = T.) = T sech (y/2) dy (1.2.5) 3 4 i, j=l, . . . ,t, -(InTT.-lnTT J-n 1 3 (8^ Dtt.tt. 1 D (TT. + 077.) (0Tr. + IT.: 1 : 1 : where ri=ln0 is the threshold of perception for the judges. That is to say, if it is hypothesized that the two samples generate a random variable d which measures the difference between the samples, and furthermore that |d| < Ti , then the judge will not be able to discern a difference between the two samples, and will thus declare a tie. Note that the Bradley-Terry model is a special case with 9=1. Another model to handle ties was proposed by Davidson (1970) ,

PAGE 16

The Bradley-Terry model has received much attention in statistical literature since it was introduced in 1952. For the remainder of this dissertation, the Bradley-Terry model with ties not allowed is used. 1.3 Estimation of the Bradley-Terry Parameters and Corresponding Asymptotic Distribution When the model was proposed by Bradley and Terry, it was assumed that all of the n. . were eqpaal. Dykstra (1960) extended the method to permit unequal n. .. The parameters are estimated by the maximum likelihood method and likelihood ratio tests are developed. Assuming independence of the paired comparisons, the complete likelihood function is t a. t n. . L = K TT. ^ / n (TT. + TT.) ^^ , (1.3.1) i=l " i
PAGE 17

These equations must be solved iteratively (see Appendix A for an APL computer program which finds maximum likelihood estimates of the treatment parameters) . A brief description of the iterative procedure follows . Letting p . be the k approximation to p . , then (k) Pi *(k) , V" *(k) ., . ,. = p^ ^ /, i ' (1.3.3) i=l where i. / > n. ./(p. 1 ZIj L i: 1 p. = a. / > n. ./(p. + p . , k=l,2,••. (1.3.4) : The iteration can be started with p. =l/t, i=l, ... ,t. Dykstra (1956) suggested a method of obtaining good initial parameter estimates p. , i=l, ... ,t. Ford (1957) proposed the model independently and proved that the iterative procedure converged to a unique maximxrai for L if the following assumption is true. Assumption 1.1 . For every possible partition of the treatments into the two sets S and S , some treatment in S is preferred at least once to some treatment in S . If Assiomption 1.1 does not hold, then a. must be zero for at least one i. If exactly one a. is zero, say a. , there may still be a unique o solution with p. =0. However, Ford proceeded to explain that if the o set S which violates Assumption 1.1 contains two or more treatments, then the corresponding estimates of the Bradley-Terry treatment

PAGE 18

parameter must be zero, and consequently individual members of S could not be compared. This fact is also evident by noticing that in (1,3.4) , * (k) p. would be zero for at least two values of i, and so by (1.3.3), (k) p. would also be zero for the same values of i. Then the next time in the iterative procedure that (1,3.4) is executed, the denominator in the summation would be zero for some i and j , and hence the iterative procedure would not converge , The asymptotic theory and tests of hypothesis require the following additional assumption. Assumption 1.2 . For every partition of the treatments into two nonempty subsets S, and S^, there exists a treatment T.eS, and a treatment 12 1 1 T.oS^ such that U. .>0, where : 2 13 y^ = lim — -^ , ij^j, i,j=l, 1, N = = Z Z ".. Bradley (1955) investigated asymptotic properties of the treatment parameters with each n. ,=n. Davidson and Bradley (1970) considered a multivariate model where each subject chooses one of two samples on more than one attribute. As a special case of the multivariate model, Davidson and Bradley obtained a more general result, allowing the n. . to be different. They showed that the asymptotic joint distribution of v''N(p -TT ), ... ,i'N(p -T7 ) is the singular multivariate normal distribution of dimensionality (t-1) in a space of t dimensions, with zero mean

PAGE 19

vector and variance-covariance matrix S = (a . ) , where O. . = cof actor of A . . in 13 13 'hj^ txt i txl i'lxt ^^ij^ txt i txl i' ixt (1.3.5) 1_' = (1»1, ... ,1), the t-dimensional unit row vector, and X. . 11 ^ ^ ^^ (71. + 1 3 (IT. + IT .) 1 D 2 ' i=l,...,t, ^j = -^ij'^^^i ^ ^j^ ' Ly^j, i,j=l,...,t. It is apparent that the variances and covariances in (1.3.5) depend on the true treatment parameters. Hence estimates of the variances and covariances are usually found by substituting the consistent maximum likelihood estimates for tt , ... ,7T into (1.3.5). In the special case with 7T.=l/t, n. ,=n for all i,i, 1 x: a. , = 2(t-l)^/t^ , and a. . 11 1: 2(t-l)/t , It^j. (1.3.6) Use of (1.3.6) is adequate if IT ... ,71 are approximately equal.

PAGE 20

10 The major test proposed is a test of treatment equality, "o= ^ = ^2 = •• = \ = ^/^ ' H : TT. 7^ 7T. , for some i,j, i?^ j , i,j=l,...,t. a 1 J The likelihood ratio statistic is -21nA = 2Nln2 2 E E "ij'^^VPj^ E 'i'" ^i ^ i 5. = , lim 6. =5. iN iN 1 1=1 Then -21nA is asymptotically noncentral chi-square with (t-l) degrees of freedom and noncentrality parameter X , where ^ • dl"^i"i-'i i
PAGE 21

11 1.4 Designs of Factorial Paired Comparison Experiments Littell and Boyett (1977) compared two designs for Rxc factorial paired comparison experiments. The two designs considered are: Design I: Consists of independent paired comparisons of each of the RC ( ) pairs of treatments. Design II : Consists of comparisons between pairs of treatment combinations differing only in the levels of one factor, i.e. comparing two treatments in the same row or column. RC 1 Note that there are ( ) = — RC(RC-l) distinct pairs of treatments for Design I, but only R(S + C( ) = ^C{R+C-2) distinct pairs of treatments for Design II. In the 4X4 case, for example, Design I requires 120 distinct paired comparisons whereas Design II requires only 48 distinct paired comparisons. The two designs could entail the same total number of paired comparisons, but Design II requires preparation of fewer distinct pairs of samples. Design II is also preferable from a computational point of view because its analysis can be carried out by using a computer program capable of handling the one-way classification with unequal replication of pairs. A specialized factorial program is required for Design I to obtain maximum likelihood estimates under the main effects hypothesis, H : Ti = a.3., where TT . . is the Bradley-Terry treatment parameter given by (1.4.1). The two designs are illustrated for the 2^2 case in Figure 1.1. The comparisons made with Design I are indicated by the solid arrows. The comparisons made with Design II are indicated by the dotted arrows.

PAGE 22

Factor B low high 12 low Factor A high

PAGE 23

13 Designs I and II were compared by Littell and Boyett on the basis of the asymptotic relative efficiencies, in the Bahadur sense, of their respective likelihood ratio tests. That is, test statistics are compared on the basis of the ratio of the exponential rates of convergence to zero of their observed significance levels. In general, the Bahadur efficiency of test 1 relative to test 2 is defined to be 2 (1) lim(log L ^ ') n->co c f9)/c,(e) , , . , 2 ^ (2), ^l^""^2 lim( log L ) where L is the observed significance level of test i. The fvinction n c. (9) is called the exact slope of test i. Usually Bahadur efficiency is used to compare two test statistics that are based on the same sample space. However, its use is equally appropriate when the sample spaces are different. The two designs should be compared on the basis of an equivalent number of comparisons in each. Therefore, a "single observation" with Design I will be taken as n = R+C-2 comparisons of each pair in the design, and a "single observation" with Design II will be taken as n = RC-1 comparisons of each pair in the design. This gives -RCiRC-l) {R+C-2) total comparisons in each design. Let T. . represent the treatment with the first factor at level i and the second factor at level j . The Bradley-Terry model is then ^^^ij^\l) = "i/^^ij^\l^' ''""^^ ""' (l-^-l) j ,1=1, . . . ,C.

PAGE 24

14 The following tests of hypotheses were considered by Littell and Boyett. Test 1 : H : TT . . = 1/RC , H: TT..=a.6.o 13 a 13 1 ] This is a test for detecting two main effects versus no treatment differences. Efficiencies were tabulated for the 2x2 case (Littell and Boyett (1977) , Table 1) . The table showed that Design I is more efficient than Design II for most combinations of a and 3Test 2: H : IT . . = 1/RC , H : 7T . . = B . /R • 01: a 1: : This is a test for detecting a single main effect versus no treatment differences. For this test, Design I was shown to be uniformly more efficient than Design II. Test 3: H: TT..=S./R, H: Tr..=a.3.. o 1: 3 a 1: 1 : This is a test for detecting two main effects versus a single main effect. Efficiencies were again tabulated for the 2x2 case. The table showed that Design I is more efficient than Design II for most combinations of a and STest 4: H: 7T..=a.3., H: IT..^O. o ij 1 j a ij This is a test for detecting interaction versus two main effects. The results showed that Design II is uniformly superior to Design I for detecting interaction. Summarizing, when testing for main effects (Tests 1-3), Design I is usually more efficient than II. When testing for interaction, Design II is uniformly more efficient than I.

PAGE 25

15 Littell and Boyett suggested that it might be a good idea for the experiment to be conducted in two stages. The first stage is a number of replicates of Design II followed by a test for interaction. If it is significant, the second stage is completed by again using Design II, and simple effects are analyzed by applying one-way analyses within rows and columns. If interaction is not significant, then the second stage consists of replicates of pairs in Design I that are not in Design II, and main effects are analyzed. The total n\mber of comparisons to be made in the experiment can be divided into the two parts such that after the experiment is completed, the end result is either a complete Design I or Design II. For example, suppose the treatments form a 2^2 factorial, and resources are available for 24 comparisons. In the first stage, the experimenter rims 4 replicates of the four comparisons in Design II (see Figure 1.1). If interaction is significant, two more replicates of the four comparisons in Design I that are not in Design II are run. The latter then results in the same comparisons made as if a single stage experiment using Design I had been conducted. n Ouenouille and John (1971) also considered designs for 2 factorials in paired comparisons. However, they assumed an individual is asked to assign a number on a scale to signify the degree of difference in the two samples instead of merely deciding which sample possesses more of the parricular attribute under study. This model was developed by Scheffe (1952) . In particular, Ouenouille and John discussed the following 2 factorial. The 28 possible paired comparisons were divided into 7 sets

PAGE 26

16 of 4 blocks as follows : Set

PAGE 27

17 block for set (1) is (I, ABC), so this set provides no information on the effects AB, AC, and BC, i.e. on all first-order interactions. Similarly, the initial block for set (5) in (1.4.2) is (I, A), so this set provides no information on B , C, and BC. This concept of no information on B , C, and BC from data in set (5) may be viewed as follows. If the responses from comparisons in set (5) are modeled as ^I,A) = ^^^^-^V (^^-^I^ = ^^^I,A) ' ^B,AB) = (y+a+b+(ab)-fe^) (y+b+e^) = a+ (ab) +£ ^^^^^ , ^(CAC) = (y+a+c+(ac)+e^^) (y+c+e^) = a+(ac)+£^^^^^^ , ^ (v>r anr-'i = (y+a+b+c+ (ab) + (ac) + (bc) + (abc) +e,^^) (BL,ABC; ABC (y+b+c+(bc)+e ) BC = a+(ab) + (ac) + (abc)+e,„^ ,„^^ , (BCABC) where all of the e ' s are independent and have expected value zero , then it is clear that b, c, and (be) , corresponding to the effects B, C, and BC, are not estimable from the four comparisons in set (5) . Quenouille and John define efficiency as follows. Suppose the design is made up of r sets, s (s ^ r) of which give information on some particular effect. Then s/r is the fraction of the comparisons made that give infoirmation on that particular effect. On the other hand, suppose that each set is run exactly once, i.e. each pair of treatments is compared. It can be shown that there are a total of (2^-1) sets, _,n-l 2 of which give information on any particular effect. In this case, 2 /(2 -1) is the fraction of the comparisons made that give

PAGE 28

18 Table 1.1. 2 3 Efficiencies of balanced 2 and 2 factorial paired comparison designs

PAGE 29

19 are estimated with the same efficiency, and so on. The table indicates that in the 2 case, Design 11) is efficient for estimating the interaction effect. This result is the same as the result Littell and Boyett obtained using Bahadur efficiency. El-Helbawy and Bradley (1978) considered treatment contrasts in paired comparison experiments. The theory developed is completely general , but they specifically showed how optimal designs can be found in the case of a 2 factorial. Their optimality criterion is the maximization of the chi-square noncentrality parameter from the likelihood ratio test, which thereby maximizes the asymptotic power of the test. Let a^=(a ,a ,a ), where a. represents the level of factor i, i=l,2,3, and each a. is or 1. Then the treatment parameters are defined by El-Helbawy and Bradley to be 1 2 3 12 13 23 123 ,^ ^ ^, 7T=1TTr7TTl7T 7T TT , (1.4.3) ^ ^ ^2 ^3 ^^2 ^^3 ^2^ ^^2S for all a_. The factorial parameters are defined by taking logarithms of the 8 treatment parameters given by (1.4.3). So from (1.4.3), Y = logiT = ^ logTT^^ . ^ ^ logTT^^^^ 123 + logiT , a a a . T .^ . i j 12 3 1=1 i<] -' E ^a'^ E S ^a^a 123 + Y . (1.4.4) a. a a a J 12 3 i=l i
PAGE 30

20 , 1 2 3 12 13 23 123 Y^ / Y^ , Y^ , Y^ , » Y^ , , Y, -, . Y, ^ ^ . These parameters are ^1 ^2 ^3 V2 ^iS ^2^3 ^1^2^3 siibject to the usual analysis of variance constraints, namely a.=0 1 111 Y Y ''' = V Y 123 y Y 123 Z. a^a^a^ /, a^a^a3 /, a^a^a^ a^-0 a2=0 a3=0 , i
PAGE 31

21 Let a if comparing T. and T. give no information (in the sense discussed by Quenouille and John) on the twofactor interaction between the first two factors, b otherwise. where n. . is defined in Section 1.2, and N -11^.Then 12a + 15b = 1, since 16 of the 28 distinct pairs give information on the two-factor interaction. El-Helbawy and Bradley then proceeded to show that b should be taken as large as possible in order to maximize the noncentrality parameter. This gives the result b=l/16, a=0. So the optimal design is to compare the treatments that give information on the two-factor interaction between the first two factors. These pairs of treatments would be the sets (3), (4), (5), and (6) in (1.4.2). This result is the same as Quenouille and John would obtain if they wanted to find an optimal design to estimate AB interaction, since using these 4 sets maximizes the efficiency of the AB interaction using their definition of efficiency. It is also analogous to the result Littell and Boyett obtained because the treatments compared in either case differ either in the level of A, or the level of B, but not both. El-Helbawy and Bradley noted that the result discussed in the two preceding paragraphs still holds if other specified factorial effects are assumed to be null (e.g. three-factor interaction). VThile the statistics for the two tests may differ, their limit distributions are the same for either H or H .

PAGE 32

22 El-Helbawy and Bradley also considered simultaneously testing for the presence of AB, AC, and ABC interactions. In this case the null hypothesis is where 2n=3 1^ H ^=3! = ° ' 1 1-1-1-1-1 1 1 1-1 1-1-1 1-1 1 1-1-1 1-1 1 1-1 By the same method they concluded that the optimal design is to make the following comparisons (using Quenouille and John's notation): (I, A), (B,AB) , (C,AC) , and (BC,ABC). Note that this is set (5) in (1.4.2). Note also that this is the only set that contains information on all three of the effects AB, AC, and ABC. So this design is also optimal using Quenouille and John's definition of efficiency. Beaver (1977) considered the analysis of factorial paired comparison experiments using the weighted least squares approach of Grizzle, Starmer, and Koch (1969) . He considered a loglinear transformation on the TT.'s. This approach produces noniterative estimates of the tt.'s. Also, the usual tests for a factorial experiment are easily implemented. However, Beaver gave no consideration to experimental design. 1.5 Response Surface Designs Springall (1973) considered response surface fitting using the Bradley-Terry model with ties allowed as generalized by Rao and Kupper (1967). The model is given in (1.2.5).

PAGE 33

23 The logarithm of the treatment parameters are defined to be functions of continuous, independent variables, x^,X2, ... ,x^, measurable without error, i.e. s k=l Note that some of the variables can be functions of other variables. 2 For example, s=2 and x =x results in the parameterization InTT =8x.+6x.^. The parameters 9,3 , ... ,8 are estimated by i 1 li 2 li -L s the maximum likelihood method. Appendix B has an APL program and a Fortran program which produce maximum likelihood estimates of g^, — , S , but which assume there are no ties, s Springall presented the following definition, theorem, and example. Definition 1.1. Analogue designs are defined to be designs in which the elements of the paired comparison variance-covariance matrix are proportional to the elements of any classical response surface variancecovariance matrix with the same design points and an equal number of replicates at each design point. The advantage of such designs is that they enable certain desirable properties found in classical response surface designs (e.g. rotatability, orthogonality) , which are dependent on the relative sizes of the elements of the variance-covariance matrix, to be readily produced. * For example, consider the classical regression model y = S^ + 6 X + B X . Then an experiment can be designed in such a way that the variance-covariance matrix of the least squares estimators 3q, B^, and

PAGE 34

24 B^ possesses an optimal form. For instance, the classical design with an equal number of replicates at each of the four levels x=-2,-l,l,2 can be shown to be rotatable , and so the analogue paired comparison designs discussed in Example 1.1 below have the same property, i.e. var ( In TT ) = var ( In tt ) . X -X 2 The analagous paired comparison model is InTT =Sx + 6x. An X 1 2 analogue paired comparison design has a corresponding variance-covariance matrix of the maximum likelihood estimators 6 ,Q , such that each element is a constant times the corresponding element of the least squares variance-covariance matrix. For the particular example of a quadratic model currently under discussion, var(6 ) = c-var(S ), var(B ) = c-var(6^) , and cov(B ,6.) = c-cov(B ,6 ), for some c. Springall described a method which uses linear programming techniques that produces an analogue design with minimum elements of the variance-covariance matrix. The following theorem shows how one can produce an analogue design, but it does not necessarily have minimum elements of the variance-covariance matrix. Theorem 1.1 . If a classical response surface design has certain properties dependent on the relative sizes of the elements of the variancecovariance matrix, then the analogue paired comparison design will have the same properties if ID *i: 1 1 *^i' k1>. , = 7T.tt./(tt. + it.) , 1] 1 : 1 :

PAGE 35

25 2 Example 1.1 . Consider the quadratic model In IT = Q x + ^ x , with design coordinates x=-2,-l,l,2. Assimie 9=1 and S =3_=0, and set N=150. Springall showed that an analogue design is given by n. . = 150/6 = 25, i
PAGE 36

CHAPTER 2 ASYMPTOTIC THEORY OF THE MAXIMUM LIKELIHOOD ESTIMATORS 2 . 1 Introduction This chapter contains the asymptotic theory for the maximum likelihood estimates of the parameters given by (1.5.1), where the logarithms of the treatment parameters are expressed as a polynomial function of a single quantitative variable, x. An example of an experiment which had a single quantitative variable was given in Section 1.1. The treatments in that example were different levels of an additive for coffee. For an experiment with quantitative independent variables , it is appropriate to do a paired comparison analogue to classical regression analysis. The theory developed in Sections 2.2 and 2.3 assume the general case of s independent variables x , ... ,x . The parameterization of the Bradley-Terry parameters, given in (1.5.1), is reproduced here: TT. = V 3^X^. , 1 /_, k ki In TT. = y 3,_x, , , i=l, . . . ,t. k=l The Bradlev-Terrv model with ties not allowed is used, where the tt. are 1 defined by (1.2.2) . In order for all of the parameters 8 , ... ,S to be estimable, it is required that s <. (t-1) . This is clearly necessary since the BradleyTerry parameters have dimension (t-1) . Note also that an intercept 26

PAGE 37

27 parameter, 6 / is not estimable without imposing a constraint on the Bradley-Terry parameters. This becomes apparent when examining the preference probabilities. For example, suppose the model was s 1"^ = ^ " E Vki ' '^' ^• k=l Then P(T. -> T.) = •iT./('n" . + 'T .) ID 1 1 : s exp(S k=l O Y '^\i' ^^p^^ ^ S ^Ai^ " '"^^^0 " E ^ k=l k=l s ^^p^ E Vki> k=l iT^j S S • • T <. 1,3=1, . . . ,t. ^^P( E Vki^ + exp( ^ 8^x^.: k=l k=l The parameter S falls out, implying that any choice of S results with the same preference probabilities. Choosing 3 =0 replaces the imposition of a constraint on the treatment parameters. 2.2 Estimation As was previously mentioned, the parameters 3,, .-• /6 are estimated by the maximum likelihood method. From (1.3.1), the log-likelihood equation is

PAGE 38

28 In L = ^ a.lnTT. ) } n. .ln(Tr. + tt.; i=l i 6, X ) + X .exp(^ 8, x, .) ri /_, k kx rj /_, k k^ k=I k=l i=l 1^3 ^^P^Z Vki^ + exp(^ S^x^.) k=l k=l Z_. Zj "'ij^ri Zj Zj "^i: X . exp ( ri S Vki^ ^ ^j^^p^Z Vkr k=l i?^j i?^j 'E Vki^ -exp(^ S^x^.: "^p^Z. Vki^ ^ ^^p^ k=l k=l Z_J Z_» ID ] TT. + T. £1 = , r=l, . . . ,s. (2.2.2) if^j 1 D where m. . is the total number of times T. is preferred over T.

PAGE 39

29 These equations must be solved iteratively. Appendix B contains an APL program and a Fortran program which find the estimates of Q , . . . , 6 . The procedure described in the appendix converged for most situas tions. However, occasionally when an attempt was made to fit a full model (s=t-l) , the procedure did not converge, which remains unexplained. In this situation one can resort to the program found in Appendix A which finds estimates of the Bradley-Terry parameters, and then solve (t-1) equations in (t-1) unknowns to find the estimates of 6,, ... ,S • Is 2.3 Asymptotic Distribution First of all, some additional notation and elementary theory is presented. This will become useful when the asymptotic joint distribution of 3, , ... ,6 is derived. 1 s Let X. ., be a random variable which is equal to 1 if T, is selected ijk 1 over T. for the k comparison of T. and T., otherwise. Then X. ., has the Bernoulli distribution with probability of "success" equal to TT./(''T. + IT . ) , i,j=l, ... ,t, k=l, ... ,n.., and the Bernoulli random variables are independent of each other. This implies that m. . has the binomial distribution, with sample size n. . and "success" probability equal to tt . / CfT . + ^ .) , where n . . ID 1. . = > X. ., , ID

PAGE 40

30 var(m. .) = n. .it .tt ./ (tt . + it . ) , i
PAGE 41

31 11 i
PAGE 42

32 As previously mentioned, the remaining chapters deal with finding optimal designs when the logarithm of the treatment parameters is expressed as a polynomial equation in a single variable. The various optimality criteria considered include (i) Minimization of the variance of the maximum likelihood estimator of one parameter, (ii) D-optimality, (iii) Minimization of the average variance of the predicted "response". In tt , (iv) Minimization of the maximum variance of In tt (minimax) , X (v) Minimization of the bias, (vi) Minimization of the average mean square error of In t when bias is present (J-criterion) , (vii) D-optimality for a preliminary test estimator. These are all defined in subsequent chapters. For the case of a single quantitative independent variable, the model given in (1.5.1) becomes 'i X ^^i' In 7T, = P" S,.x, , i=l,...,t. (2.3.9) k=l Similarly, the variance-covariance matrix given in (2.3.8) becomes (A , ) , where ab V V ''i^'j a a b = ) / n. . -^ — (x. -X. ) (x. -X X ^ = > > n. . " ^ ., (x."-x.") (x."-x.^) , a,b=l,...,s. ^"^ " 3 (2.3.10)

PAGE 43

33 Notice that the complete design problem is quite complex. It involves deciding how many levels and additionally which particular levels should be chosen. Furthermore, it must be determined how often to compare each pair of treatments. A further complication is apparent by noticing that the variancecovariance matrix, whose elements are given in (2.3.10), depends on the true parameter values. Hence the optimal designs also depend on the parameters. The designs discussed by El-Helbawy and Bradley and Springall were found under the ass\jmption that all Bradley-Terry treatment parameters are equal, or equivalently that 6 =6 = ••• =6 =0. In no 12 s paper of which the author is aware was the dependency of the optimal design on the true parameter values examined. As will become apparent in the following chapters, the optimal design for one set of parameter values can be quite different from the optimal design for another set of parameter values .

PAGE 44

CHAPTER 3 MINIMIZATION OF THE VARIANCE OF A SINGLE PARAMETER ESTIMATE 3 . 1 Introduction The present chapter contains a discussion of the linear, quadratic, and cubic models. The optimality criterion considered is the minimization of var(S.), where i=l,2, and 3 for the linear, quadratic, and cubic models, respectively. The linear model is discussed in Section 3.2. Designs for the quadratic model are found in Section 3.3 for S_=0, 3, variable. The general expression for var(Bp) , given by (3.2.5), can be shown to be a continuous function of S , and hence the designs in Section 3.3 are locally optimal at the point 6_=0. Likewise, de signs for the cubic model are found in Section 3.4 for S =0, 8 and S_ variable, and are similarly locally optimal. 3 . 2 Linear Model From (2.3.9), the linear model is In TT. = Sx. , i=l,... ,t. (3.2.1) 1 1 Since there is only one parameter, by (2.3.10) the variance-covariance matrix becomes a simple variance, X , where 6(x. + X.) \ = y y n.. -^ — ^ — (X. x.)^ i
PAGE 45

35 -11'. (X. x.)^ 1 1 . . (exp(S(x.-x.)/2) + exp(6(x.-x.)/2)) ^ 2 X Z ""ij 6d /2 ^' -6d ii]. Although it is not necessary to treat the two cases 6=0 and 87^0 separately, the special case 8=0 is considered first because the arg^ament used to derive the optimal designs is easier to follow for this situation. When 6=0, (3.2.2) reduces to ^ = / / n..(x. x.)^/4 . (3.2.3) i
PAGE 46

36 :^r OA a^2^ 6d/2 -6d/2^ h. = 2d _ Bdje ^_e I n 2 5) 3d , 6d/2 -6d/2 2 , 6d/2 -6d/2, 3 ' (^•^^) (e + e ; (e + e ) The equality in (3.2.5) implies coth(6d/2) = 3d/2 . (3.2.6) The solution to coth (u) = u is a transcendental number, and can be shown to be approximately u=1.1997. Hence, from (3.2.6) and the fact that |d| must be less than or equal to 2, the design which minimizes var(B) for a given value of 8 is Idl = Ix. x.l = min(2,2.3994/l si ) . (3.2.7) 2. J It is now shown that the value of |d| given in (3.2.7) is in fact a maximum for y, and consequently for A. By (3.2.5), the second partial derivative of y with respect to d is given by 3^y . 6d/2 ^ -Bd/2.2 — (e + e ) = dd" 2 4d6-tanh(Bd/2) S^d^/2 + |i3^d^tanh^ (Sd/2) . (3.2.8) 2 2 The second partial derivative 9 y/9d is negative if and only if the expression in (3.2.8) is negative. From (3.2.6), the equality 2 = 8d-tanh(Bd/2) (3.2.9) holds at d=2. 3994/1 Sl , which is then substituted into (3.2.8) to give 9d^ 8 Q^d^/2 + 6 = -Q^d^/2 < . (3.2.10) opt. d

PAGE 47

37 Because there is only one solution to (3.2.6) where d>0, the result in (3.2.10) implies that the solution of |d[ given in (3.2.7) does in fact minimize the variance of 6. By (3.2.7) the optimal design is a comparison of only the pair (-1,1) when S <1.2. It can be shown that P(l h-1)=.92 when 6=1.2. Because of this relatively large preference probability of choosing one endpoint over the other, in most practical situations 3 will be less than 1.2. So in these cases the optimal design is a comparison of only the endpoints of the experimental region. The optimal designs given by (3.2.7) are also optimal for other criteria. In particular, they are optimal for criteria (ii)-(iv) introduced in Section 2.3. These criteria are formally defined in Chapter 4, but a brief explanation for their equivalence in the case of a linear model follows. The equivalence of the criterion being discussed in the present section to D-optimality follows from the fact that the determinant of a 1x1 matrix is simply the only element in the matrix. The equivalence to the other two criteria follows from the fact that 2 ^ vardn tt^) = x var(6) . (3.2.11) From (3.2.11), criterion (iii) is, in this case, the minimization of / 2 2 '^ I X var(B) dx = var(6) , -1 and criterion (iv) is the minimization of 2 ^ max X var(B) = var(6) xeL-1,1]

PAGE 48

38 3 . 3 Quadratic Model The quadratic model is In TT. = 6^x. + Q^x^ , i=l,...,t. (3.3.1) From (2.3.10), the asymptotic variance-covariance matrix of the maximum likelihood estimators S, and 3 is (A , ) , where 1 2 ab X^^ = > > n,^<|)^^(x, X J , (3.3.2) ) ) n. . > n..cj)..(x. X.) (x.^ x.^) , (3.3.3) 12 Z_i ^ ID ID 1 D 1 D i
PAGE 49

39 Minimizing the asymptotic variance of 6_ would equivalently be maximizing the asymptotic power of the test. The variance of 6 is the lower right-hand element of the 2^2 matrix (X , ) . That is, ab '^11 varCB^) = J . (3.3.6) ^11^22 " '^2 Under the null hypothesis, (3.3.5) becomes B^(x^ + X.) ( e + e -' ) As was pointed out in Section 3.1, in the present section (3.3.6) is minimized for B =0. This produces locally optimal designs. Assioming 6 =0 greatly simplifies the design problem as will be seen in the remainder of the section. The end result of Lemmas 3.1-3.2 and Theorems 3.1-3.3 is a set of optimal designs which minimize var(6 ) under the null hypothesis for various choices of the linear parameter. Hereafter, "optimal design" will refer to the design which is optimal for the criterion being discussed in the section. The following two definitions are useful in the discussion that follows . Definition 3.1. The pair of treatments (-x ,-x ) is defined to be the symmetric counterpart of the pair (x ,x ) . Definition 3.2 . A design is said to be symmetric if each pair in the design is compared as often as its respective symmetric counterpart.

PAGE 50

40 Lemma 3,1. Let (l)(x.,x.) denote 4 . . as given by (3.3.7). Then (x.,x.) = (t (-X. ,-x . ) , for any x. and x.. 1 : 1 : ID Proof. •^(x. ,x.) 6t (x. + x.) -26, (X. + X.) 1 1 ] 111 e e ' i' j 8,x. 3,x. „ -26, (X. + X.) ,li li2 li -1 ( e + e -' ) e ' B, (-X.-X.; e ^ ^ ^ 6,(-x.) S,{-x.) _ ( e -' + e ) = (})(-x. ,-x .) . t 1 3 Theorem 3.1 . The optimal design is a symmetric design. Proof. Notice that both of the following hold: (i) (X. x.)^ = ( (-X.) (-X.) )^ , : 1 3 1 (ii) (x.^ x.^) = ( (-x.)^ (-X.)" ) . 3 1 3 1 From (3.3.2), (3.3.4), Lemma 3.1, and the above, it is clear that X and A are invariant under changes of any number of comparisons of pairs to comparisons of their symmetric counterparts. Let D denote any particular design, and let var (6_) be the var1 1 iance of B for Design D . Also, let the i pair in D be denoted (x.,xl). The sets of comparisons {(x.,xl), (-x.,-xl)} are considered 11 1111 in sequence, and without loss of generality, it is assumed that n , n , = d. ^ , X. ,x I -x . ,-x . 1 11 11 for all 1. Notice that it could be true that n , = 0. -x. ,-x: 1 1

PAGE 51

41 Design D is formed from Design D by changing d./2 of the comparisons of the pair {x.,xl) to comparisons of the pair (-x ,-x'). This 11 11 results in (n , + n ,)/2 comparisons of each of the two pairs X. ,xl -X. ,-x: 11 11 in the i set. This change is made for each set of comparisons, thereby defining Design D to be a symmetric design. The total number of comparisons, N, is left unchanged. By the first paragraph of this proof, the values of ,\ and A are the same for the two designs. By (3.3.3), 'V =0 for Design D , 2 and hence X is a minimum. Therefore, by (3.3.5), var (3 ) 2. var (6„) . (3.3.8) 2 "l ^ Note that the variances are equal if A =0 for Design D , and it 12 ^1 is possible that -^,^=0 for a nonsymmetric design. However, (3.3.8) shows that no design can "do better" than a symmetric design. Also note that in practice, d. must be an even integer in order for the n . . ' s to be integers . So the search for optimal designs in the present section will henceforth be restricted to the class of all symmetric designs. Because ^.2~^ ^°^ ^ symmetric design, (3.3.5) reduces to var(62) = I/A22 • (3.3.9) So it is now necessary to find designs which maximize A. . Notice that ^22 depends on 3 . As previously mentioned, the optimal design also depends on the value of 8 .

PAGE 52

42 Theorem 3.2 . The optimal design is a design in which the only comparisons made are comparisons of the pairs (x,l) and (-1,-x) an equal niimber of times, for some x. Furthermore, x^ 0. Proof. First of all, the notation used for X , v^ich is defined by (3.3.4) and (3.3.7), is changed. The N total comparisons are arbitrarily ordered, and the i pair is then defined to be (x. ,xl). The vari11 ables y. and z. are defined to be z . = x. + xl , i=l, . . . ,N. Ill Then from (3.3.7) , 8, (x. + x!) -6, (X. + x!) Ill 111 e e B,x. S,x: _ -6, (x. + x!) , li^ li\2 li 1 ( e + e ) e S,y,/2 -6^y./2 ( e + e ) i=l, . . . ,N, Notice that 6. is a function of y. but not of z . . From (3.3.4), A^is 1 11 22 rewritten as *22 V ^ 2 2 ) .y. z. / . 11 1 1=1 Let D be any design. By holding y. fixed, we examine what changes in z. will increase \^^. Clearly if z.l is as large as possible for 1 22 ^ ' i' every i, then \^ is also as large as possible for the particular set ^ 2 of y.'s found in D. Without loss of generality, it is assxamed that x.
PAGE 53

43 all comparisons must involve either x=-l or x=l. Additionally, using an argument similar to one used in Section 3.2, it is clear from the expression for A given in (3.3.4) that x=l should be compared with one and only one level of x, namely the value of x which maximizes 2 2 (}'(l,x) • (1-x ) . This result in conjunction with Theorem 3.1 proves the first part of the theorem. We can now rewrite \ simply as A22 = N(}){l,x) . d-x")^ . It remains to be proven that the optimal value of x is such that x ^0. The variable H^ (x) is defined to be the positive square root of A^/N, i.e. 6^(1 + x)/2 'P(x) = ^-g ^-^ (1 X ) . (3.3.10) e + e Since (3.3.10) and the fact that x£[-l,l] imply 'Kx) ^0, maximizing A is equivalent to maximizing ij^(x). It will now be shown that 'Mx) ^'|i(-x) , x ^ 0. The following five inequalities are equivalent: ^{x) ^ ^(-x) (3.3.11) 5^(1 + x)/2 S^d x)/2 (3.3.12) ^1 ^rh -\ e + e e + e |6^(l+x)+6^ |6^(l+x)-3^x i^^(i-x)+3^ |6^(l-x)+3^x ^ e + e (3.3.13)

PAGE 54

44 3^(1 + x)/2 -6^(1 + x)/2 B^d x)/2 -6 (1 x)/2 e + e ^ e + e (3.3.14) cosh(u + 6 x) ^ cosh(u) , (3.3.15) where cosh(u) = (e + e )/2 , u = e^(l x)/2 . Now cosh(u) is a monotonically increasing function of u for u. ^ 0. Furthermore, it can be assumed that B ^0 because Theorem 3.3 proves that the optimal designs for 8 = -S are the same. Or, equivalently , the remainder of this proof can be repeated for the case B <0. Then X ^ and B ^0 implies Q x ^0, and hence by the equivalence of (3.3.11) and (3.3.15), '4^ (x) ^'4'(-x) for all x ^ 0. This completes the proof of the second part. t The next theorem shows that optimal designs only need to be found for nonnegative values of S, • Theorem 3.3 . The optimal design for B =-B is the same as the optimal design for S =6 . Proof. Recall that the optimal design is found by finding the value of X which maximizes 'l^(x) from the proof of Theorem 3.2. Since '4) (x) is actually a function of 3 and x, we '/rrite B^d + x)/2 '^<2wx) = ^-^ -— (1 x^) . e + e

PAGE 55

45 The theorem is proved by observing that 6^(1 + x)/2 -6^(1 + X 'l'(3wX) = ^ ^-— (1 x^^ ^ 1' ' S^ B^x ^^ " ' -6^(1 + X) e + e e -e, (1 + x)/2 2 (1 X ) -6,x -B^ e + e ijj(-6^,x) All that remains to be done is to find the value of x that maximizes i|j(x) for various choices of B ^ 0. Unfortunately, a closed form solution of X as a function of 3 does not exist. However when 3-=0, the optimal value of x can be analytically derived, and hence this situation is considered first. 2 Suppose 3 =0. Then from (3.3.10), i|;(x) = (1 x )/2. Clearly this is maximized by x=0, and hence from Theorems 3.1 and 3.2, the optimal design is an equal number of comparisons of only the pairs (0,1) and (-1,0). For the case 3 t^O, the grid method described in the next paragraph was used to obtain the results presented in Table 3.1 and Figure 3.1. The grid method utilized first calculated "Mx) for x=0(.l)l, and the best value of x, say x , out of these 11 choices was found. Then ijj(x) was calculated for the 11 values x= (x -. 1) , (x -.08) , ... , X 1 (x +.1), and again the best value of x was foxind. This procedure was repeated until the optimal value of x was found accurate to the fourth

PAGE 56

46 decimal place. The entire procedure was done separately for each value of |6J. The following lemma proves that the values of x which are found via the grid search are absolute maximums for \Ij (x) . Lemma 3.2 . Exactly one local maximum (and hence it is the absolute maximum) exists for i|; (x) in the interval [o , 1 ]. Proof. By Theorem 3.3, only the case Q >0 needs to be considered. From (3.3.10), the first partial derivative of t]j(x) with respect to x is g, (1 + x)/2 '' ^1 2 e (6^(1 x )/2 2x) ti>'(x) = ^1 ^1^ ( e + e )" ^1^ 2 e (S^(l x )/2 + 2x) . (3.3.16) It can be seen from (3.3.16) that t|j'(0)>0 and i|;'(l)<0. Hence '|;'(x)=0 for at least one xeLo,lJ, and it remains to be shown that i];'(x)=0 for exactly one xeLo,lj. (An attempt was made to show i^" (x) is negative for all B, and xeLo,lj, but it turned out not to be true. In particular, a graph of i|^(x) as a function of x for 3, =5 indicated 4;"(x)>0 for values of x near . ) From (3.3.16), '^j' (x)=0 if and only if -S (1 X) (6^(1 x")/2 2x) = e (S^d X )/2 + 2x) (3.3.17) Notice that the functions f (x) = (3 (1 x )/2 2x) and f „ (x) = (3^(1 x')/2 + 2x) are reflections of each other about the axis x=0, and they are quadratic in x. Hence they intersect exactly

PAGE 57

47 once at x=0. Also, for fixed S, , f, and f have maximums at -2/S and 2/6 , respectively. Let g (x) be the right-hand side of (3.3.17). It was shown in the preceding paragraph that f (x) and g (x) intersect at least once in the interval Lo,lJ. To determine that they intersect exactly once, it is sufficient to show that 3f (x)/3x < 8g (x)/3x. From (3.3.17) , Sf^(x) = -6,x 2 , 3x "^1 3g (x) -3 (1 X) , -6 (1 X) — ^ — = B^e (3^(1 x"')/2 + 2x) + e ^ (-g^x + 2) -6(1 X) 2 = e ^ (6^ (1 X )/2 + 6^x + 2) . For 6 >0 and xeLo,l], it is clear that 3f (x)/3x < and 3g (x)/3x > 0, which completes the proof. t Table 3.1 presents the optimal value of x found from the grid search described immediately prior to Lemma 3.2, for various choices of I 6, I . Figure 3.1 is a graph of the optimal value of x versus I 6, | . The table includes relatively large values of |6 ! to illustrate that x+l as |6j'H». Recall that for a given value of |S,|, the optimal design is a comparison of (-1,-x) and (x,l) an equal niojniber of times, where x is found in Table 3.1 or Figure 3.1. Table 3.1 indicates that the optimal design when S =0 involves only 3 levels, x=-l,0,l. However, when 6 7^0, the optimal design requires 4 levels, x=-l,-x ,x ,1, for some x . In certain experimental o o o situations it may be significantly more laborious to make four "batches'

PAGE 58

48 Table 3.1.

PAGE 59

49 X O (J) — o 00 o 1^ o o in o Q ro OJ O O _ 00 I^ tI3 in ^ ro CvJ — O o

PAGE 60

50 than it is to make just three. For this reason, and because a priori information about S, niay not be available, asymptotic relative efficiencies were calculated for the design (n n ~ "^n i ~ ^/^^ ^° '^^^ optimal designs found in the table. The relative efficiency of the first design to the second (optimal) design is defined to be var (6 ) R.E. = . var^CB^) e -"-/(e -*• + 1)^ S,(l.x) 6, 6,x e (l-x)/(e +e ) The relative efficiencies were enumerated for various choices of 8 , and are presented next to the optimal designs in Table 3.1. Equivalence of El-Helbawy and Bradley's method and minimizing the variance An application of El-Helbawy and Bradley's method for finding optimal designs was briefly discussed in Section 1.4. It will now be proven that their optimality criterion is equivalent to minimizing var(3, )/ any k. First of all, notation and results of El-Helbawy and Bradley's paper which are necessary for the proof of Theorem 3.5 are presented. Recall that El-Helbawy and Bradley considered linear contrasts of Y = In TT , i=l, ... ,t. The matrices B and B are defined as mxt and 11 — m — n n^t, respectively, with zero-sum, orthonormal rows, such that s. m < m+n s t-1. The null hypothesis is H : B Y('^) = , o — n — n

PAGE 61

51 and the alternative hypothesis is H : B Y(TT) 7^ , a — n -ra where tt and Y(2L) 3xe column vectors with i elements tt. and y., respectively. The rows of B represent the linear contrasts ass\imed to be — m null. Note that B need not exist (i.e. m=0) . — m The vector it is a value of tt which satisfies the null hypothesis — o — and the constraints, where tt is the true parameter vector. The matrix B is defined to be the txt orthonormal matrix B = I'/i/t B -m B* -m (3.3.18) where B* is anv (t-m-l)xt matrix such that BB' = I , and l'=(l, ... ,1), -m ' — t — the t-dimensional unit vector. The following theorem is extracted from El-Helbawy and Bradley (1978, Theorem 2, page 833) . Theorem 3.4 . Let p be the maximum likelihood estimator of tt_. Under the ass\m:iption of connectedness (Assumption 1.2) , and by the assumption of strictly positive elements of ^, which has already been inherently assumed since ln(0) is undefined, v^(Y(p) y_{Jl)) has a limiting distribution function that is singular, t-variate normal in a space of (t-m-1) dimensions, with zero mean vector and dispersion matrix E(TT) = B*' (C(^))~^B* , — -in -m (3.3.19) where

PAGE 62

52 C(7r) = B*A(77)B*' , (3.3.20) — m m and A(TT) is the txt matrix with elements V. . (TT) = 77. ^ 1J..TT./(7T. + 7T . ) , i=l,...,t, 11 1 Z_j ID : 1 3 j=l (3.3.21) A (TT) = -y. .Tr.1T./(7T. + TT.) , ifij, i,j = l,...,t, •L-j_ 2.J X J 1 J ' J ' ' J r t t and where u. . is defined in Assumption 1.2. El-Helba;v)Y and Bradley's optimality criterion is the maximization 2 of the noncentrality parameter, A , for the asymptotic chi-square distribution of the likelihood ratio statistic. The noncentrality parameter is \^ = 5'Z "'•5 , (3.3.22) — o — where 6 is a vector of constants defined in their paper, E is the — o leading principal matrix of order n of C , B provides the first n — o — n rows of B*, and — m C = B*ACrT )B*' . (3.3.23) — o -TO o — m A theorem proving the equivalence of the two optimality criteria is now presented. 2 k Theorem 3.5. Consider the model In tt = S,x+6^x + ••• +6, x . x 1 2 k Suppose we are interested in testing, for some fixed i, H : 6.=0 versus H : 3 . ,5^0 o 1 a 1

PAGE 63

53 Then the design which minimizes var(6.) is identical to the design which maximizes the noncentrality parameter from the likelihood ratio test. (Note that it does not matter whether or not it is assumed that some of the other 6 ' s are . ) : Proof. Orthogonal polynomials can be derived for any spacing of the independent variable. Let these contrasts be the rows of B. The ixt matrix B^^^ is the orthogonal polynomial corresponding to the parameter 6. being tested. From the method used to find orthogonal polynomials, it follows that where c is a constant. So minimizing var(6.) is equivalent to minimiz1 ing var(B ,y(p) ) . But from Theorem 3.4, — n=l ' ^ar{B_y{p)) = B . (B*' (C(7T) )"-'-B*)B' — n-1 -n=l -m -m -n=l = (1,0, ... ,0) (C{^)) '" = ((£(£)) ^)^^ . 2 From the definition of X , S is the leading principal matrix of order 1 of C ~ , i.e. — o ^o = ((2(1))"') 11 .

PAGE 64

54 2 In this case, 6_ is a scalar constant, and so maximizing A is equivalent to minimizing S = var(S ) . t o 1 Example 3.1 . This example illustrates the equivalence of the two methods presented in Theorem 3.5. It is assumed that the true model 2 is a quadratic model. In it = 6.x + B„x . Also, the design is restricted to have the three design points x=-l,0,l. Table 3.1 indicates that the design which minimizes the variance The design problem is now approached using the method presented by El-Helbawy and Bradley. The null hypothesis is H : 6_ = , 3, unknown, o 2 1 Using orthogonal polynomials, this hypothesis is equivalent to .'6 where y_' (7T_) = (In t_ , In tt , In tt ) . Note that El-Helbawy and Bradley's constraint, that the In tt . ' s sum to zero, poses no problem. Instead of constraining tt to be equal to 1, an intercept parameter, S> , could be introduced to the model. The model would then become In TT = S„ + 3,x + 8^x^ , X 12 with the constraint \ In TT = . X 2 It can be shown in this particular case that S^ = ^j-

PAGE 65

55 For a definition of the following matrices and notation, see (3.3.18)-(3.3.23) . For the example under discussion. ln=l = (-1 2 -D/Ze , B* — m (-1 2 -D/v'e (-1 l)//2 A(7T ) o '^12 ^ ^13 -y 12 -^2 'l2 ^ ^23 -U -u -u 13 13 -y 23 23 ^13 ^ ^23 where 7T^' = (1,1,1) , i.e. 8^=62=0. Note that tt^ can be any vector which satisfies the null hypothesis and the constraint. The matrix C is — o C = B*A(7T )B*' "~o -m o -m 1 ^^2 " ^23) yn ^2 23 Jf ^^2 ^23^ i (^2 %3 ^ ^23) implying C — o ^^12 ^ ^^13 " ^ 13 ^23' ;= ^^12 ^23^ /12 = ^^12 ^23^ /12 2 ^^12 ^ 1^23^ — o' Then since Z^ is the upper left-hand element of C -1 "12 ^ ^^3 " ^23

PAGE 66

56 implying 2 C O U „ + 4u + U ^12 "^IS ^^23 3 ^12^13 ^ ^2^23 ^ ^3'^2_3 _ ^3 3 ^^^ 12 ^13 ^23 The design which maximizes the chi-square noncentrality parameter from the likelihood ratio test is the design which maximizes S o By the definition of the y. ., y^, = 1 y,^ y,,. So Z can be 13 23 12 13 o rewritten as J -1 ^ ^2^3 ^ ^^2 ^ ^3^ ^' ^2 ^3^ y +4y +(l-u -y ) ^12 ^13 ^12 ^13 ^12^^ ^12 " ^13^ ^ ^^13 " ^13^^ — 3y^3^ 1 -^' ''•'-''' Suppose that y is temporarily held constant. It is then claimed that choosing y,_ = (1 y,-)/2 maximizes T. for the particular y,^. Iz 13 o 13 From (3.3.25), the expression y ( (1 y ) y,^) must be maximized. Because of the constraint s> y, „ <. l-y , it is clear that 12 13 y ((1 u ) y ) is maximized when y = (1 ^,3)/^, which proves the claim. The constraint among the y. .'s in addition to the above claim implies that the optimal design must be a design such that ^ -^3 ^2 = — T^ = ^23 • ^'-'-^^

PAGE 67

57 The scalar E is now rewritten as a function of u,^ alone. From o 12 (3.3.24) and (3.3.26) , 2 -1 _ 3 ^12^^ ^^12^ ^^12 o 8 2y^2 + 4(1 2m^^) 3 ^12 ^^2' 8 4 6U^2 16 " Clearly Y. is maximized bv choosing y._ as large as possible. By o * 12 (3.3.26), this would make U =u^ =.5, and y =0. So this optimal design is in fact the same as the design which minimizes var(6 )• t An attempt was made to find the optimal design for the test of hypothesis H : 6, = 6^ = versus H : not H o 1 2 a o using El-Helbawy and Bradley's methodology. ^-Jhen simultaneously testing more than one parameter , or equivalently more than one treatment contrast, a constant value, 6., must be chosen for each parameter. The 1 theory behind the optimal designs includes an infinite sequence of values of tt satisfying H , which are converging to a vector tt satisfy— a — o ing H . When the treatment contrasts are for factorial treatments, o which is the case they discussed extensively, the relative sizes of the 5 . are said to be equivalent to the relative importance placed on each contrast being tested in H . Hence, El-Helbawv and Bradlev usuallv o chose 5 =6„= • • • = 5 . 12 n

PAGE 68

58 In the present situation, choosing 5 and 5 restricts the alternative parameter values in the infinite sequence mentioned in the preceding paragraph to be a proper subset of the alternative parameters satisfying H . For example, <5 =6 implies that B =& , and hence the 2 alternative models are In fr = 6x + 6x , with B^O as N-x». Choosing X different 6 . can result in a different design because the maximization of the noncentrality parameter depends on the relative sizes of the 5.. Additionally, the optimal designs actually found after choosing the relative sizes of 6 and o were found to be unreasonable (e.g. for 6 =6 , the optimal design is a comparison of only the pair (-1,0)). Chapter 4 presents other optimality criteria for fitting a quadratic mode 1 . 3.4 Cubic Model The cubic model is In TT. ?^x. + B^x.^ + S3X.^ , i=l t. (3.4.1) The asymptotic variance-covariance matrix of the maximum likelihood estimators, B, , S^, and S, is (X ^) , where X , is given by (2.3.10) 1 2 3 ab ab The inverse of the variance-covariance matrix is partitioned as
PAGE 69

59 var(e3) = ah 33 ^3 '^3 '^23' 11 12 12 22 -1 13 23 -1 33 \3^\3^22-\2-^23^ "--23^-^23-^2^3^ \l'^22 ^2 -1 (3.4.2) To simplify (3.4.2), only the case S =0 is considered in the search for designs which minimize var(S ). As pointed out in Section 3.1, this gives locally optimal designs. The following theorem shows that optimal designs only need to be found for linear and quadratic parameters which are both positive. Theorem 3.6 . Suppose that 6^=0, and further that the optimal design for given 6 and S is comparisons of the pairs (x. ,x ) a total of times, for all pairs. Then X. ,x . 1 J (i) The optimal design for (-6^) and (-8^) is the same as the optimal design for 3 and 3 , (ii) The optimal design for (-6^) and 3 is comparisons of the pairs (-X ,-x.) a total of n times, for all pairs, 1 J X. ,X . irf .... ^3 (ill) The optimal design for 3 and (-6 ) is the same as in (ii) Proof. From (3.3.5) , 4) q o (x. ,x.: -3, ,-3_ 1 :

PAGE 70

60 exp(-6^(x^+x.) 62 (x^ +x. )) exp(26^(x^+x.) + 26^ (x^^+x^^) ) (exp(-6,x.-6 x.^) + exp(-6,x.-6.x.^))^ exp (26, (x .+x . ) + 26_ (x . ^+x . ^) ) li2i lj23 I13 2ij 2 2 exp{6 (x + X.) + S-(x. + X )) -^ -^ J ^ X J 2 2 2 (exp(6,x. + 6_x. ) + exp(6,x. + 6-X. )) li 2i 1 J 2 J = (x ,x ) . (3.4.3) 12 -' 2 Notice that since 6 =0, it . it . / (tt . +11.) from the expression for X , 3 1 ] 1 J ab given in (2.3.10) is equal to 1^„ (x.,x.). Hence bv (2.3.10) and p^/P2 1 3 (3.4.3), it is clear that A , is invariant under chanaing the sign of ab both B and 6 , for all a and b. Therefore the variance of 3 is also invariant under changing the sign of both 3 and 6 . This implies that the design which minimizes var(B^) for B^ and 3^ is also the design which minimizes var(6 ) for (-6 ) and (-6^), thereby proving (i) . To prove (ii) , note that from (3.3.5), 2 2 exp(-6, (x.+x.) + S^(x. +x. )) , , , 1 1^ 213 * (x. ,x ^ = ^ ^ ^ (exp(-6,x.+S^x.^) + exp(-6,x.+8_x.^))^ ii2i i:)23 2 2 exp(S, (-X.-X.) + 6^(x. +x. )) ^ 1 1 J 2 1 : (exp(6 (-x.)+S^x.") + exp(6, (-x.)+6_x." Ii2i 1:2] Again, by (2.3.10) and (3.4.4), changing 3 to (-6 ) and the pairs (x.,x.) to (-x.,-x.), for all pairs in the design, will not change \ , 131: ll X , ^,,» nor \ . It changes only the sign of A and \^ . But

PAGE 71

61 notice that in (3.4.2) all terms involving A and X are of the form -. . .,2 2 12 23' 12 ' °^ 23 ^®"^® th^ negative sign cancels, leaving var(B2) ^"'^^^i^nt under the change described above. This proves (ii) . To prove (iii) , simply apply the results of (i) and (ii) . t Theorem 3.5 proved the equivalence of minimizing var(3 ) and El-Helbawy and Bradley's criterion. A Fortran computer program was used to find the optimal design using El-Helbawy and Bradley's methodology for 6^=62=6^=0. The levels allowed were x=-l ,-. 5 ,0, . 5, 1. The grid search included all ten pairs formed by these five levels, assuming a symmetric design. The optimal design found was ^n ' -l,-.5 ".5,1 " •°'*^^' ''-55 = -^^IN, n = .298N, all other n. . = O}. For this design, it can be shown that var(S )=16/N. It was decided that for unequally spaced levels, employing El-Helbawy and Bradley's methodology would be substantially more laborious than minimizing var(63). ^°'' ^^^^ reason, the designs subsequently discussed were found by minimizing the expression found in (3.4.2). Optimal designs were found by a grid search via a Fortran computer program for 83=0, &^,2^=0{ .2)1. The comparisons allowed were (-l,x ), '^l'^2^' ^^'^2'"'"^' ^'^ (-1/1). where the optimal value of x and x were also found by the computer. The motivation for the choice of these four pairs is the result discussed in the two preceding paragraphs for the case of all three parameters equal to zero, which indicated that only the pairs (-l,-.5), (-.5, .5), (.5,1), and (-1,1) need to be compared. The grid procedure is now briefly described. The initial values for x^ and x^ were x^=-. 7 ( .1) ". 3 and X2=.3(.l).7. For each of the 25

PAGE 72

62 combinations of x and x , the optimal comparison proportions were found by another grid. The latter grid began by considering different comparison proportions, changing by increments of 0.1. This grid became finer and finer until the proportions were accurate to ±.0004. From this the best combination of x and x out of the initial 25 choices was fo^md, and the entire procedure was repeated for a finer grid about the previous best choice of x and x . The entire grid search stopped when x and x were accurate to ±.0037. Notice that it is possible that the designs found are not absolutely optimal, but rather they may only be local minimums for (3.4.2). This is because only four pairs were considered when it is possible that a five or six pair design, for example, might offer a slight improvement. Also, the grid search restricted x and x^ rather than allowing them to take on values in the entire interval [-1,1 j. The latter point is not believed to be a problem due to the resulting design for Q =B =Q =0 originally found using El-Helbawy and Bradley's methodology, and the fact that the optimal designs currently being discussed all have values of x and x within 0.02 of 0.5. The optimal designs and the value of var(8^) for N=l are presented in Table 3.2. The first two columns of the table give the value of 6 and 8 . Columns 5 through 8 give the comparison proportions for the pairs (-l,x ), (x ,x ) , (x ,1), and (-1,1), respectively, where x and X are given in columns 3 and 4. The last column gives N-var(6 ). The number of times a pair is compared is found by multiplying the appropriate comparison proportion by N. Recall that by Theorem 3.6, the optimal design for both parameters negative is the same as the optimal design for |3 | and |3-|. If exactly one parameter is negative, then

PAGE 73

63 the optimal design is foiand by first finding the optimal design for I 6^ I and l&^l ' ^^^ ^^^" taking ]j_ to be the proportion of the total nvunber of comparisons that the pair (1 , -x ) is compared, u to be 1 x^,X2 the proportion for (-x ,-x ), and so on. To calculate var(S ) for N>1, the tabled value of N-var(3^) is divided by N. Notice from Table 3.2 that for Q. '^ .6, the optimal design is very close to the design {n_^ . ^ = ^_ c, s'^Sl" •3333n}, regardless of the value of 6 . This implies that if it is a priori believed that •^ 1 is relatively large, then this design will result in a variance of 6 very close to the tabled minimum. As has been previously mentioned, it is also of interest that the optimal values of x and x are always very close to -.5 and .5, respectively. Finally, notice that the design for 3^=6^=0 found in Table 3 . 2 is not the same as the one found using El-Helbawy and Bradley's methodology, although the value of var(6^) appears to be the same for both, and hence they are equally optimal. However, it was not established that the two designs are mathematically equivalent. Since the two criteria are equivalent, there may be some problem with local minimums of var(6 ). It appears that the surface of the expression for var(6 ) is relatively flat, and so it is possible that a finer initial grid is necessary to guarantee that the minimum found is in fact the absolute minimum of var(6 ).

PAGE 74

64 Table 3.2. Designs which minimize var(6 ) h

PAGE 75

CHAPTER 4 DESIGNS FOR FITTING A QUADRATIC MODEL 4.1 Introduction The present chapter contains optimal designs for fitting a quadratic model iinder three different conditions, each for three different optimality criteria. The D-optimality criterion is considered in Section 4.2. This criterion minimizes the determinant of the asymptotic variance-covariance matrix. In the present chapter, this matrix is a 2X2 matrix since there are two maximum likelihood estimators, 6 and 3 . From Section 2.3, these estimators are asymptotically normal. Therefore, by a result in Box and Lucas (1959), the determinant of the variance-covariance matrix is proportional to the volume contained within any particular ellipsoidal probability contour for &^{B ,3 ) about 3^, where 6 is the true parameter value. The D-optimal design will then minimize the volume of any such probability contour. Section 4.3 contains designs which minimize the average-variance ^ -V '^ 2 °f In 7T^ = 3 x + 3-X . The average-variance is simply the variance of 1" ^x ^"^^s^^ated over the experimental region. In this case, the experimental region is x£L-l,lJ. The property that these designs have is obvious. That is to say, a good estimate of all of the preference probabilities is desirable, and the design which minimizes the integral over x of vardn ^ ) will give estimates of tt which are good "on the X X average" . 55

PAGE 76

66 The criterion discussed in Section 4.4 is similar to the averagevariance criterion. The designs found in this section minimize the maximum of var(ln tt ), where the maximum is taken over the interval X xeL-lfl]. These designs are referred to as minimax designs. The disadvantage of this criterion is similar to the disadvantage often attributable to the minimax rules in the decision theory framework. That is, minimax designs may be slightly "pessimistic" in the sense that they minimize the largest value that the variance can be. This minimization is good, but at the same time the variance of In it may be relatively X large for most values of x in the interval [-1,1 J, hence causing the minimax design to be unfavorable. Finally, Section 4.5 presents some design recommendations and conclusions. A number of designs are compared to determine which design does best for a wide range of parameter values, using primarily the D-optimality criterion. Additionally, an examination is performed on how well these designs protect against the bias present when the true model is cubic. Recall that in general, the problem of finding optimal designs is complex. It involves the determination of the number of levels to be included in the experimental design, and also which particular levels are to be chosen. Additionally, the number of times each pair is to be compared must be determined. In Chapter 3 it was proven that the design which minimizes var(3 ) must be of the form (n = n = N/2} thereby implying that the optimal number of levels is four, and only two pairs are to be compared an equal number of times. All that remained to be resolved was the determination of x for different values of |SJ. In the present situation, a similar simplification of the

PAGE 77

67 general problem does not appear to be possible, except by imposing some restrictions, such as letting 6 =0. For this reason, three conditions are considered under which optimal designs are found. This essentially results in designs which are optimal for a proper subclass of the class of all designs. However, as will become apparent in the remainder of the chapter, there is not much difference in the degree of optimality among the three conditions presently considered. Hence it is conjectured that the degree of optimality of the designs found in the final and most general subsection of Sections 4.2-4.4 is very close to if not equal to the best that can possibly be achieved. The three conditions are now briefly introduced. The first condition considered in each of Sections 4.2-4.4 is that the levels are restricted to be x=-l,0,l. This greatly simplifies the general problem to one with only two unknowns, namely the comparison proportions, y_^ ^ and y^ ^. The value of y is found by che relationship l^.i "^ ^0 1 "^ '^-1 1 " ^' The second condition discussed in each of the following three sections is to let 6 =0 in the enumeration of the variances and covariance. In this situation the optimal designs are proved to be symmetric, and hence the design problem is simplified to a choice from the class of symmetric designs. As was pointed out in Section 2.3, previous papers on the design of paired comparison experiments have always relied upon setting all parameters equal to zero. Presently, the linear parameter is allowed to vary. The three criteria are a continuous function of the parameters because the variance-covariance matrix is continuous in the parameters. Therefore, the designs found for the second condition

PAGE 78

68 are locally optimal in the sense that they are close to optimal for values of 6_ which are near zero. The third condition in each of the following three sections is a consideration of designs whose comparisons are of the form (-l,x ), (x ,1) , and (-1,1) . The optimal choice of the levels x and x are found in addition to the three comparison proportions . This is a generalization of the first condition since setting x =x =0 results in the first condition. Figure 4.1 presents a synopsis of the procedures used in Sections 4.2-4.4 to find the optimal designs for each of the three optimality criteria and the three conditions introduced in the present section. The first coluinn presents a brief description of the three conditions, and hence the three rows in the figure correspond to the three conditions. Columns 2-4 contain a brief description of the methodology used for the three criteria. 4.2 D-optimal Designs The D-optimal design is defined to be the design which minimizes the determinant of the asymptotic variance-covariance matrix. Letting S denote the variance-covariance matrix, this is equivalent to maximizing |Z ~[. In the case of a quadratic model, the D-optimal design is the design which maximizes D, where and where '^,,, '\-_, and X are given in (3.3.2)-(3.3.4). Because the optimal design depends on the true parameter values, designs are found

PAGE 79

69

PAGE 80

70 for various choices of the parameters. The three conditions introduced in Section 4.1 are now discussed, beginning with Condition 1. Condition 1 The levels are restricted to be x=-l,0,l. Because of this restriction, and by substituting u . . for n. . in (3. 3. 2) (3 .3 .4) , the expression for -^-.w '^1-,/ and A can be simplified to A =U ({) +uct) +4U i 11 ^-1,0^-1,0 ^0,1^0,1 ^-1,1^-1,1 ' 12 '^o,ro,i ^-1,0^-1, ' (4.2.2) '^22 = ^-1,0^-1,0 " %, 1*0,1 The substitution of y . . for n. . is made throughout the present chapter. This essentially sets N=l since u. . = lim(n. ./N) . From (4.2.1) and (4.2.2), the expression for D is D = 4u u (t> (J) ^-1,0^0,1^-1,0^0,1 ^^^-^-1,0 ^0,1^ <^-i, 0^-1,0 ^ ^o,i*o,i)*-i,i ' (^•^•^) where . . is defined in (3.3.5). The oartial derivatives of D with respect to y and y are, respectively, 3U "^0, 1^-1, 0^^0,1 ^-1,1 "^-1, 0^-1,0 "^0, 1^0,1 — i , u 4(i-y -u )'i> ^ , ^ -1,0 ^0,1^-1,1^-1,0 ' (4.2.4)

PAGE 81

3D By 0,1 = ^^^-1,0*-!, 0*0,1 ^*-l,l^^-l,0*-l,0 ^ ^0,1*0,1^ ^ '*^^ ^-1,0 ^0,1^*-1, 1^^0,1 Setting the derivatives in (4.2,4) equal to zero results in 71 (4.2.4) u (2d) ) + ^-1,0^ '-1,0^-1,1^ ^0,1^-^-1,0^,1 ^ *0, 1^-1,1 ^ *-i,o*-i,i^ ^-l,0^-*-l, 0*0,1 ^ *-l,0*-l,l ^ *o,i*-i,i^ + u (2d) d) ] *^0,1 ^0,1-1,1 -1,0-1,1 ' '0,1-1,1 (4.2.5) In matrix notation, (4.2.5) is equivalent to -1,0 0,1 -1,0-1,1 \ i) d) '0,1-1,1 (4.2.5) implying / "-.,0 \ \ "0,1 / = A -1 -1,0-1,1 ^0,1*-1,1 / (4.2.7) where A = -1,0-1,1 i) d)+d) d) -i-d) d) ^-1,0 0,1 -1,0-1,1 0,1-1,1 i> d) +(j) i|) +d) ({> -1,0 0,1 0,1-1,1 -1,0-1,1 2d) d) 0,1-1,1 For the solution of (4.2.7) to be a maximum for the expression in (4.2.3), it needs to be shown that the matrix of second order partial

PAGE 82

72 derivatives is negative definite (see Kaplan, page 128). By (4.2.4), this matrix is X = ^'°/3^-i,o' ^ °/^^-i,o3%,i 5'°/3^-i,o3^o,i\ ''^^'%,l' = -4 A , (4.2.8) where, from (3.3.5), -1,1 w. 1 1x2 = l/( e + e ) , -1,0 1 "^2 ., 1 "^2 ^ . 2 /( e + 1 ) , (4.2,9) ^1 ^ ^2 ^ ^ ^2 2 It is known from elementary matrix algebra that the determinant of a square matrix is equal to the product of its eigenvalues. So in this case, |x|=e e^, where e and e are the eigenvalues of X. If |x|>0, then either both eigenvalues are positive or both are negative, or equivalently , X is either positive definite or negative definite. But it is clear from (4.2.8) that (1 ) X -l,l'^-l,0 < , (4.2.10) where and are both positive and are given in (4.2.9) . The inequality in (4.2.10) implies that X is negative definite if |x|>0. However, it will be shown by (4.2.12) that the determinant is not necessarily positive for all values of 6. and Q .

PAGE 83

73 To simplify the notation, we write -2a -(a + c b) X = -(a + c b) -2c where -1,1-1,0 -^1 ^ ^2, = e / ; -^1 ^1, , -h " ^2 (e + e ) (e +1) b = -1,0^0,1 = e / + 1) (1 + e ^ ^) -1,1^0,1 6, . 6 = e / -^^ y, »1 + «2 (e + e ) (1 + e ) Then the determinant of X can be written as X| = 4ac (a + c b) = 2ac + 2ab + 2bc a^ b^ c^ (4.2.11) After resubstituting for a, b, and c, and after some algebra in (4.2.11), an expression for |x| is found by \A = 2 e + 4e ^ + 1 + 4e ^ ^ + 3e ^ ^ B, + 36 -36+ 6„ -26+ 26, 36, + 3^ + 3e ^ + e ^ ^ + 3el +e^ 26 + 26 26 -36^ + 36, 36, + 3, + 3e'^ + 3e^ + e ^ ^ + ^ ^ 2 (4,2.12) -2f ^1 ^^1 ^ ^^2 "^^1 "" ^^2 + 2e + e 46^ + 26^ -26^ + 462 26 6+ 26 +e +e +e+2e

PAGE 84

74 where 26^ r -6, 6, -6, +6, 6, + 8. -I 2 2 1 , I. , 1 2 . ,, ,, ^1 ^2, ? = 4e / (e + e ) (e + 1) (1 + e ) . (4.2.13) Since C>0 for all Q and 6./ |x|>0 if and only if the right-hand side of (4.2.12) is positive. However, it appears that for large enough values of 6 and S , some of the terms with a negative sign will overwhelm the sum of all of the terms with a positive sign, and hence the deteinninant could be negative . For example , B ^rid Q can be made large enough such that the tejnn exp(46^ + 2Q ) in (4.2.12) is larger than the sum of all the terms with positive signs, thereby making the expression in (4.2.12) negative. It will be seen shortly that this is in fact the case . An APL computer program was utilized to solve equation (4.2.7) for various values of 3, and 3 . The determinant, |x|, was also calculated and found to be positive for all values of 6 and 3^ when the solution fell inside the region of allowable values for the comparison proportions, U , „, U. ,, and y , ,. That is, \i . . ^0, for all i and j, and -1,00,1 -1,1 1 j M n''"^ni ''"^1 "'" Figure 4.2 depicts the values that \i and U can have. They must lie inside or on the solid triangle. The dotted triangle in the figure depicts the region that the D-optimal designs are exclusively located. When the solution to (4.2.7) falls outside of the solid triangle in Figure 4.2, the optimal design must lie somewhere on the solid triangle. This can be proven by contradiction since D is a quadratic function of two unknowns, and hence it can only have one stationary point. For a point to be on the triangle, one of the following must be true: ^-1,0 = °' %,1 = °' °^ ^-1,0 ^ ^0,1 = ^-

PAGE 85

75 '0,1 .5 1,0 Figure 4.2. Depiction of possible designs for Condition 1 If y_ p. = 0, then from (4.2.3), 1 , u ° = ^*0,1*-1,1%,1 -^0,1 ^ ' and oD/Sy = implies y = 0,5. This clearly maximizes D because of the negative coefficient on the quadratic term. Therefore if the optimal design lies on the line segment y_ =0, then it must be {y^ ^ = y_^ ^ = 0.5}. If ^Q -,_ = 0, then ° = "'i'-i, 1^-1,0^^-1,0 ^^-1,0 ) ' and it can similarly be shown that the optimal design is the midpoint

PAGE 86

76 )f the line segment, namely {y ^ = 0, y = y = 0.5}. Finally, ^^ ^-1,0 ^ ^0,1 = ^' ^^^^ ° = ^^-1,0^0,1^-1,0*0,1 ^-1,0'^'0, 11^-1,0^^ "^-1,0^ Again, it can similarly be shown that D is maximized when y = y = 0.5. Therefore, if the D-optimal design is on the solid triangle, it must be one of these three designs. When the solution to (4.2.7) was outside of the solid triangle in Figure 4.2, D was calculated for the three designs presented in the preceding paragraph, and the one with the largest value of D was taken as D-optimal. Otherwise the D-optimal design was found directly from (4.2.7). The determinant of X was also calculated. It was occasionally negative as conjectured earlier, but only when the solution was outside of the region, at which time the sign of |x| is irrelevant. The "restricted" D-optimal designs are presented in Table 4.1 for all combinations of the parameter values 8 =0(.1)1,2,3 and B_=0 ( . 1) 1 , 2, 3. Recall that y , „, y , , and y , , represent the proportion of the -1,0 0,1 -1,1 j^ j^ total available comparisons to be allocated to the pairs (-1,0), (0,1), and (-1,1), respectively. The number of times one of these three pairs should be compared is then found by multiplying the corresponding y. . by N. The values of D are also given for the purpose of comparing the designs with the designs found in the next two subsections. Additionally, the relative efficiencies are given for the "restricted" D-optimal designs presented in Table 4.1 relative to the D-optimal designs

PAGE 87

77 found for Condition 3 in Table 4.3. The relative efficiency is the square root of the ratio of the corresponding values of D, since D is a 2 multiple of N . Recall from Section 3.1 that a value of 1 for the parameter S is "large". This is basically true for any parameter since x is in the interval [-1,1 J. For this reason, it is assumed that rarely will 6, or S be larger than 1, and hence parameter values of less than or equal to 1 are of most interest. Parameter values of 2 and 3 are included primarily to observe the change in the designs for large parameter values. Table 4.1 only gives positive parameter values. The following theorem shows that the table can also be used to find optimal designs for negative values of B, or B . This theorem is similar to Theorem 3.6. Theorem 4.1 . Suppose that the D-optimal design for given B and 6 is a comparison of x. with x. a total of n times, for all pairs in 1 : X. ,x . ^ the design. Then (i) The D-optimal design for (-B ) and (-B ) is the same as for B^ and B^, (ii) The D-optimal design for (-S ) and B is a comparison of (-X.) with (-X.) a total of n times, for all pairs in 1 : X. ,x . -^ 1 ] the design, (iii) The D-optimal design for S and (-B,) is the same as in (ii) Proof. From (3.4.3), , _n (x ,x ) = (^ (x. ,x.) ,

PAGE 88

78 where
PAGE 89

79 This indicates that the increase in efficiency for the generalization considered under Condition 3 is slight. If an experimenter warrants that a minim\im number of levels should be used, then the advantage of the designs found presently under Condition 1 is that they have only three levels as opposed to four levels for the other two conditions. The fact that they are also about 95% efficient allows the present designs to be useful in this situation. However, if there is no significant advantage to only having three levels, then perhaps the designs found under Conditions 2 and 3 are more useful since they are slightly more efficient. Condition 2 In this subsection, locally D-optimal designs are found analytically for 3 =0. The designs are local as described in Section 4.1. That is, by the continuity of the criterion, the optimal designs are close to optimal for S^ close to zero. It turns out that they are also local in the sense that the designs found under Condition 3 have slightly larger values of D for some values of S . since the criterion is a maximization of D, the designs under Condition 3 are therefore occasionally slightly better. 3y the same arginnent that was used in the proof of Theorem 3.1, the D-optimal design must be a symmetric design when 3 =0, ^ich implies , 2 ^-^2 ^° ^^^ search for a D-optimal design is reduced to a search for a symmetric design which maximizes D = a A . Also, the proof of Theorem 3.2 can be presently implemented to show that every pair in the D-optimal design must include either x=-l or x=l.

PAGE 90

80 From (3.3.7), ({) . . is written as where ). . = ( 4cosh^(6, (X. x.)/2) ) ^ , ID 1 1 : cosh(u) = :r (e + e ) , 1 u -u sinh(u) = — (e e ) The maximization of D is equivalent to the maximization of the following redefined D: where D = 4X^^X22 " f(x)g(x) , (4.2.14) (X) = Y ^/^ (l-x.)2 f(x) = > , (4.2.15) . ^ cosh (3^ (x. l)/2) 1=1 1 1 N/2 2 2 ^ (1 X.) (1 + X.) , (4.2.16) '2) i=l ^1 i ^' = ^^l'^2 V2^ Two useful, easily verified facts are 3cosh(B (X. l)/2) V^^ = 4 S,sinh(S, (x. l)/2) , dx. 2 1 1 1 9sinh(6 (x. l)/2) K-^ = ^ 3,cosh(S, (X. l)/2) . )X . 2 1 1 1 (4.2.17)

PAGE 91

81 The partial derivative of D with respect to x . is then 3d_ 3x. 9f(x) 3x. 3g(x) g(x) + f(x) (4.2.18) where 9f(x) -2(1 X.) 3g(x) cosh (6, (x^ l)/2) (1 X.) 6,sinh(6, (x. l)/2) 1 1 1 1 cosh^(6^(x^ l)/2) (4.2.19) 4x. 4x. 1 1 2. 2 (1 x. ) 6,sinh(6, (x. l)/2) 1 1 1 1 cosh (B, (X. l)/2) 1 1 cosh (B, (x. l)/2) 1 1 i=l, N/2. (4.2.20) Because of the complexity of handling these N/2 equations of N/2 unknowns, and because it appeared to be reasonable, the expression in (4.2.18) was evaluated at the N/2-dimensional point x.=x, i=l,,..,N/2, set equal to zero and solved. This produced a design of two equally weighted comparisons, namely (-1,-x) and (x,l), for some x. The mechanics of the procediore follows. The partial derivative of D with respect to x,, evaluated at x.=x, for all i, is 3d 3x. (4.2.21) '.>;here cosh and sinh are shorthand for cosh (3 (x l)/2) and

PAGE 92

82 sinh(B, (x l)/2) , respectively. By setting the expression in (4.2.21) 5 3 equal to zero, multiplying it by cosh /(N{1 x) (1 + x) ) , and combining terms , we have (1 + 3x)cosh + B (1 X )sinh (4.2.22) Because of the transcendental nature of this equation, x can not be expressed as a closed form function of 3, • Consequently, an APL program was written to find the solution to (4.2.22) for various choices of 6 . These designs are presented in Table 4.2 and are discussed at the end of the present siobsection. In order for the solution to (4.2.22) to be a local maximum for D, it must be shown that a^D ^=°"^ '=°"V2^ Un^ 1 D cosa. N/2' < for any angles a. , such that ^ a. ^ 2tt, and 1 1 N/2 > cos a. i=l (see Kaplan, page 128) . This is equivalent to showing that the matrix X is negative definite, where / X = ^2 3 D ix. ex . 1 ] (4.2.23) ^i ^ ' N/2 X N/2

PAGE 93

83 From (4.2.18), the second order partial derivatives are 3^0 ^ ^(^^ 3x. 3x. g(x) + 23f(x) 3g(x) 3 g(x) 3x. Sj + f(x), i=l,...,N/2, 3x. 1 (4.2.24) S^D 3x. 3x . 1 J 3^f (x) 3f(x) 3g(x) g(x) + 3x. 3x . — 3x. 3x . 1 : 1 : 3f(x) 3g{x) 3^g(x) + f(x)3x . 3x. — 3x . 3x . 3 1 ID 3f(x) 3g(x) 3f(x) 3g(x) 3x . 3x. D 1 . iT^J, i,j=l,...,N/2, (4.2.25) where, from (4.2.19) and (4.2.20), 3^f(x) -^(I-xJ^b/ 4(l-x.)6,sinh |-(l-x.)^S,^sinh^ 2il, il 2il cosh cosh" cosh (4.2,26) 9\(x) 2 1 2 2 2 4(3x. -1) ^(1-x. ) 3, 1 2 1 1 ,2 cosh (x. -X. ) S, sinh 111 cosh' 3., 2 2, 2 . ,2 — (l-x. ) 3, sinh 2 1 1 ,4 cosh i=l, . . . ,N/2. (4.2.27) Evaluating the second order partial derivatives at the point x.=x, for all i, (4.2.23) can be written as a b b a (4.2.28) a b

PAGE 94

where 3x/ 34 9^f(x) 8f(x) 9g(x) 9^g(x) g(x) + 2 ^^t: — — + f (x) 9x.' 3x. 9x. 3x.^ 1 J (4.2.29) and b = 3^D 3x. 3x . 1 1 (any ij^j) = 2 3f(x) 3g(x) 9x. 3x. X J (4.2.301 The complete expressions for a and b can be found from (4.2.29) and (4.2.30), in addition to (4.2.15), (4.2.16), (4.2.19), and (4.2.20), by substituting x for x.. The matrix X is of a special form as indicated by (4.2.28). It can be written as X = D + ayz ' , where D=(a b) I . , a=b, and y '=z_'=j^' = (1 , ... ,1). From Graybill (page 187, Theorem 8.5.3), the characteristic equation is N/2 (d + °'X ^i'i " ^1 A) (d \) = . i=l

PAGE 95

85 So (— 1) of the eigenvalues are d=a-b, and one eigenvalue is d + a Ey.z. = (a b) + — b. As discussed in the previous subsection, .11 2 1 the matrix X is negative definite if and only if all of the eigenvalues are negative. An analytic proof that the two eigenvalues are negative follows . Consider first the eigenvalue (a-b) , where from (4.2.29) and (4.2.30) , '"3^f(x) a-b = d^gix) g(x) + f(x) 3x.' x.=x 1 2 1 2 2 cosh (2 -jd-x) 6 ) + sinh-cosh(4(l-x)S ; 2 3 2 2 + sinh (j(l-x) 3^ ) _ cosh N„ 2 2 cosh ^(1-x) cosh 2 ^ 12 2 cosh (4(3x -1) |(l-x )B^ + sinh-cosh(8x(l-x^) 3 ) + sinh^ (jd-x"^) ^8^) cosh' (4.2.31) It needs to be shown that (a-b) is negative at the stationary point x, which is the solution to (4.2.22). This relationship between x and S found in (4.2.22) is used in the above expression of (a-b) by substituting for sinh. sinh = -(1 + 3x) cosh 3^(1 x^) (4.2.32)

PAGE 96

86 Upon substituting (4.2.32) into (4.2.31), the eigenvalue (a-b) at the stationary point, x, is a-b = /o 1m ^2Q 2 . -4(l+3x) 3(l+3x) N,n ,2,, ,2 -d-x) (1+x) cosh (4(3x^-1) j(l-x^)6^^) 8x(l+3x) + |(l+3x) N,-, x2 jd-x) cosh (4.2.33) N 2 4 After factoring out — (1-x) /cosh and combining terms. a-b < if and only if (4.2.34) 5^( j(x^ 1) (x^ 2) ) + ( -22x^ 20x 6 ) < The first term in (4.2.34) has roots x=±l and x=±/2, and is nonpositive for xe[-l,l]. The roots of the second term are imaginary, and clearly the function is negative at x=0. Hence it is negative for all x. The result a-b < then follows from (4.2.34). N The second eigenvalue is (a-b) + -b. From (4.2.19), (4.2.20), and (4.2.30) , ^ ^ 2d x) (_2^osh (l-x)S,sinh) (-4x(l+x) cosh (1-x) (1+x) "S, sinh) cosh 2(l-x) cosh 8x(l+x)cosh'' + 2(l-x)(l+x) 3^cosh-sinh 2 2 2 2"' + 4x(l-x ) 3^cosh-f!inh + (1-x )'"3^ sinh'^ 2(l-x) ,4 cosh (-x" + 2x 1) , (4.2.35)

PAGE 97

87 the last step produced by again using the relationship in (4.2.32) and combining termsThen N ij ^ i _ (2x 2) . (4.2.36) cosh By factoring — (1-x) /cosh out of the second eigenvalue, it is clear from (4.2.34) and (4.2.36) that (a b) + I b < if and only if B^(j(x^-l) (x^-2)) + (-22x^-20x-6) + (2x-2) < if and only if 6^(|(x^-l) (x^-2)) + (-22x^-18x-8) < . (4.2.37) The first term in (4.2.37) is unchanged from (4.2.34). The roots of the second term are still imaginary, and hence the eigenvalue is negative. Therefore X is negative definite, and the solution to (4.2.22) is in fact a local maximum for D. The locally D-optimal designs for 6 =0 ( .02) 1, 2 (1) 5 are presented in Table 4.2. The value of the original D=X A is also given for comparison purposes. By comparing the present values of D to the corresponding values of D associated with the designs in Table 4.1, it can be seen that the present designs offer a slight improvement for 6„=0. Theorem 3.3 is applicable here since 3_=0, and so the optimal design for any particular value of 8^ is the same as the optimal design for (-6,) . Recall that for a given 6,, the optimal design is to make half

PAGE 98

88 of the total comparisons between x and 1 , where x is found in the table , and the other half between (-x) and (-1) . The optimal value of x is graphed as a fiinction of |B | for |B,| ^ 1 in Figure 4.3. Condition 3 A grid method was used to find D-optimal designs with no restrictions on the values of the tMD parameters. With 3 y^O, the argument for a symmetric design is no longer applicable. As a generalization of the first subsection, the three comparisons (-l,x ), (x ,1), and (-1,1) were allowed. A description of the grid method follows. In general , if there are k independent unknowns , and the grid is to have n values for each variable at each stage, then the grid method is simply an evaluation of the function at each of the n points, and the determination of which of these points maximizes (or minimizes) the function. It then repeats the procedure, only with a finer grid centered at the best point previously found. This procedure is repeated until the optimal point has been found with sufficient accuracy. The procedure used for the present condition was essentially a grid within a grid. At each step, the combination of three values each of X and x were taken. Initially these values were -0.5, 0, and 0.5. For each of these 9 points, a grid method was used on the allocation of the comparisons to the three pairs. This grid consisted of initial increments of 0.1 in the allocation fractions. Notice that this is not a full 11 grid since, for example, {y = y = u = 0.5} is — -L,x X^,l —1,1 not a possible allocation for the three pairs because they must sum to 2 1. After the best of these possible allocations were found, a finer 5

PAGE 99

89 grid was performed about the current best allocation. Note that it is 2 3 a 5 grid rather than a 5 grid, because when two of the allocation proportions are known, the third is just the sum of the two proportions subtracted from 1. This was then repeated until the best allocation fractions were accurate to ±.0001. The best of the 9 combinations of 2 X and X was taken, and then a finer 3 grid about the current best combination of x and x^ was performed using the same procedure described in the present paragraph. The entire procedure described in the preceding paragraph was repeated until the values of x and x were also accurate to ±.0001. D-optimal designs were found for combinations of 3 =0(.1)1,2,3 and 62=0 ( .1) 1,2,3, and are presented in Table 4.3. Once again, the value of D for N=l is also given in the table. Notice from the table that when n =0, the value of x is immaterial, and therefore an asterisk appears under the column for x . 2 Theorem 4.1 is applicable, so designs for negative S or 3 can the also be found from the table. If both 3 and 3 are negative, then optimal design is identical to the one for positive B and 3 . If, however, exactly one of the parameters is negative, then the optimal design is the same as the one for \B \ and |B J, except that n changes to n , and n , changes to n . For example, if l/-x^ ^2' -x^,-l 3^^=0.6, 32=-0.3, then the D-optimal design would be to run .462N comparisons of the pair (-1,.23), ,468N of the pair (-.25,1), and .070N of the pair (-1,1) . Table 4.3 indicates that once again no pair should be compared more than half the time. Also, the optimal design is inside the region

PAGE 100

90 Tabl<

PAGE 101

Table 4.1continued 91 h

PAGE 102

Table 4.1 continued 92 Sl ^

PAGE 103

Table 4.1 continued 93 «!

PAGE 104

Table 4.2. D-optimal (local) designs with 6_=0 94 ^1

PAGE 105

95 Table 4.3. D-optimal designs "l

PAGE 106

96

PAGE 107

Table 4.3 continued 97 h

PAGE 108

98 Table 4.3 continued h

PAGE 109

99 depicted by Figure 4.2 for small parameter values, and the designs approach the boundary as the parameters become larger. Notice the peculiar designs for B=2,3 and . 5 5, 6 ^1. The optimal design in this case is a comparison of only (-l,x ) and (-1,1), each N/2 times. A comparison of the designs in Table 4.2 with the designs for B_=0 in Table 4.3 shows that at least some of the designs found in the previous subsection are only locally optimal. This follows from the fact that the value of D is larger in the present subsection than it is in the previous subsection for B ^ .6. Notice that the designs are identical for 3 ^ .8. Based on what has been presented up to now, the design D ={n_ = n = .4N, n_ = .2n} is a candidate for an overall efficient design. This is reasonably close to all of the optimal designs in Table 4.3 where the parameters are both less than or equal to 1. The efficiency of D relative to the optimal designs is discussed in detail in Section 4.5. Also considered in Section 4.5 is the amount of bias present for design D when the true model is cubic. 4.3 Average-variance Designs The optimality criterion discussed in this section minimizes V, the average variance of the predicted value, where ' r vardn rr ) dx . (4.3.1) --1 The integral is over the experimental region, which in this case is L-l»lj, and In TT = 3,x + S_x . The variance of the predicted value. In ir , is

PAGE 110

var (In IT ) / \l \2 -1 ( X x^ ) \ ^2 ^2 I ^11^22 '^12 100 (4.3.2) From (4.3.1) and (4.3.2) , 2 , 2 , 1^22 ^ ?\l XX X ^ 11 22 12 (4.3.3) Three methods for finding optimal designs analogous to the methods discussed in the previous section for D-optimal designs are now discussed. Condition 1 The levels are again restricted to be x=-l,0,l. Because of the complexity of V relative to D, it was decided to use a grid method for this condition rather than approach the problem in a manner similar to what was done for D-optimality . The grid consisted of initial increments of 0.1 in the allocation fractions. After finding the best of 2 the allocations above , finer and finer 5 grids were run until the fractions were accurate to at least ±.001. The procedure described in the preceding paragraph was performed for all combinations of S =0(.1)1,2,3 and 3^=0 ( . 1) 1, 2, 3 . The designs are presented in Table 4.4 along with the value of V for N=l. Again, efficiencies of the "restricted" designs relative to the designs found for Condition 3 are given. In this case, V is a multiple of 1/N, so the relative efficiency is defined to be the ratio of the corresponding

PAGE 111

101 values of V. The relative efficiencies are larger than .95 when 3, and 6 are both less than or equal to 1, indicating that the designs in Table 4.4 are only slightly less efficient than the optimal designs found in Table 4.6. The proof of Theorem 4.1 is again applicable, and so optimal designs for negative values of S or 6_ can be found as before. That is, optimal designs for negative 3^ and Q are the same as for | 3-, | and |3 I • When exactly one of the parameters is negative, the columns for y and \i effectively exchange. Condition 2 In this si±isection, local average-variance designs are found under the assumption that the true quadratic parameter is zero. The argument for symmetry is identical to the one used in Condition 2 for the D-opti2 mality criterion. That is, reducing X reduces V, and so the optimal design must be symmetric because '^,^=0 for a symmetric design. Then from (4.3.3), 2/3 2/5 ,^-,.s —+ -r^— . (4.3.4) \l ^22 Again, using a similar proof, it can be shown that every pair in the optimal design must include either x=-l or x=l. Then from (4.3.4), V can be written V -_ Ml^ . ill^ , ,4.3.5, f (x; g (x) where f(x) and g(x) are defined in (4.2.15) and (4.2.16).

PAGE 112

102 The partial derivative of V with respect to x. is 3f (x) 3g(x) 3v

PAGE 114

The remaining eigenvalue is 104 N N (a-b) + ^ = (a-b) + J

PAGE 115

105 Table 4,

PAGE 116

106

PAGE 117

Table 4.4 continued 107 61

PAGE 118

108

PAGE 119

Table 4.5. Average-variance (local) designs with 3 =0 109 h

PAGE 120

Table 4.6. Average-variance designs 110 sl

PAGE 121

Ill Table 4.6 continued 81

PAGE 122

Ta±ile 4.6 continued 112 h

PAGE 123

113

PAGE 124

114 Condition 3 A grid method identical to the one described in Condition 3 for D-optimal designs is used to minimize V in (4.3.3). The optimal designs are presented in Table 4.6. Theorem 4.1 is again applicable, and so optimal designs for negative values of B, or Q can be found in the manner described in Condition 3 of Section 4.2. Notice that the pattern of the designs in Table 4.6 is similar to the D-optimal designs in Table 4.3. However, it is no longer true that eveiry pair is compared no more than half the time. Also, for S_ ^ 1, the change in x . x^, y ^ , y , , or u , , found in Table 4.6 is 1 2 "-'•'^1 X ,1 -1,1 monotone as a function of S . For 3_=2 or 6-=3, however, this is not the case. For example, notice the large difference between the optimal design for 6 =.8, 3_=2 and the optimal design for 3 =.9, S_=2. Once again a comparison of the designs in Table 4.5 with the designs for S =0 in Table 4.6 shows the designs found in the preceding subsection result in only local minimums for V when |B | ^ .6. The corresponding designs found presently are superior because of the smaller value of V. 4.4 Minimax Designs This section is a consideration of designs which minimize the maximum value of vardn t ) over x£L-l,lJ. The designs are referred to as X minimax designs. From (4.3.2), the minimax design minimizes max vardn it ) = max xeL-l,lj ^ xeL-l,lJ x"(A,, 2xA^^ + x-A.^^) 'll 22 ~ 12 (4.4.1)

PAGE 125

115 The following theorem simplifies (4.4.1). Theorem 4.2. Given a quadratic model. Then var(ln ir ) is a maximum X over xe[-l,l] at x=±l. Proof. Note that 22 -X. 12 '2 I 12 11 \l'"^22 " ^2^ Case 1: A <. 0. var (In tt ) var(B^ + 62) ^22 ^ \l ^^2 /\ A A ^ 11 22 12 \l'^22 ^2 (-1 s. x s 1) Case 2: A > 0. 12 var (In tt var (In tt var(62 " ^1^ \l " -^22 " ^\2 \l'S2 ^2 x''(A, x" + A^^ + 2A_x) 11 22 12 A A A ^ 11 22 12 (-1 s X s 1)

PAGE 126

116 var (In tt ) X From the preceding theorem, the minimax design is the design which minimizes ^22 2^2 ^ \l \2 ^ 2^2 -^ \l L '^1^22 " '^2^ \l^22 ^12^ ^22-^1^2! '\l 2 ^1^2 \2 (4.4.2) Three conditions are again briefly discussed. The procedures for the three conditions are very similar to the procedures described for average-variance designs because of the similarity of V and M given by (4.3.3) aind (4.4.2), respectively. Condition 1 The levels are restricted to be x=-l,0,l. A grid method identical to the one described in Condition 1 of Section 4.3 is used. The proof of Theorem 4.1 is applicable, and so optimal designs for negative 3 or S are found as explained for Condition 1 of Section 4.3. These "restricted" minimax designs are presented in Table 4.7 along with the values of M, for N=l. The efficiencies of the "restricted" designs relative to the designs for Condition 3 are once again given. In this case, M is a multiple of 1/N, so the relative efficiency is defined to be the ratio of the corresponding values of M. The relative efficiencies are once again larger than 0.95 when 3 and 3 are both less than or equal to 1, indicating that these designs are only slightly less

PAGE 127

117 efficient than the optimal designs found at the end of the present section in Table 4.9. Condition 2 In this subsection local rainimax designs are found under the assumption that the true quadratic parameter is zero. The same argument for symmetry as used in Condition 2 of Section 4.2 can be applied 2 here. Decreasing \ reduces the numerator of (4.4.2) and increases the denominator, and hence decreases M. So the minimax design must be a symmetric design, implying a =0. Then from (4.4.2), M = Y^ + Y^ . (4.4.3) 11 22 Also, the same argument used in Condition 2 of Section 4.2 can again be applied to show that every pair in the optimal design must be of the foirm (-l,x.) or (x.,1), some x., and hence M can be '^rritten 11 1 :4.4.4) f(x) g(x) ' where f(x) and g(x) are defined in (4.2.15) and (4.2.16). >7otice that the only differences between (4.4.4) and (4.3.5) are the coefficients 2/3 and 2/5. The oartial derivative of M with respect to x . is 1 3f(x) 3g(x) 5m _ 3x. 3x. 3x. ~ ^ + — , (4.4.5) " f (x) g2(x) where 3f(x)/ox. and 3a(x)/3x. are given in (4.2.19) and (4.2.20). — 1 ' — 1

PAGE 128

118 Via the same procedure employed in Condition 2 of Section 4.3, the optimal design is a comparison of (-1,-x) and {x,l) an equal number of times , where x is the solution to (2(l+x)^+4x)cosh + (d-x) (l+x)^+(l-x^))6 sinh = (4.4.6) To show that the matrix of second order partial derivatives is positive definite, equations (4. 3 .8) (4 . 3 .14) can be repeated for the present situation. Everything is identical except that a multiple of 2/3 and 2/5 is removed from the places where they appear. The eigenvalue (a-b) can then be shown to be equal to (4.3.11) after the coefficients 2/3 and 2/5 are removed from (4.3.11). Likewise, the eigenN value (a-b) + ^ is equal to (4.3.13) after the coefficients 2/3 and 2/5 are removed from (4.3.13). An APL computer program solved (4.4.6) for S =0 ( .02) 1 , 2 (1) 5 , and simultaneously evaluated both eigenvalues. Both eigenvalues were positive in all cases, and so the designs presented are at least locally minimax. The designs are presented in Table 4.8 along with the value of M. Again, Theorem 3.3 is applicable, and so the optimal design for a negative value of S, is the same as the optimal design for |3 | . Condition 3 A grid method identical to the one described in Condition 3 of Section 4.2 was used. The minimax designs are presented in Table 4.9. Theorem 4.1 is again applicable, and so optimal designs for negative values of S^ or B^ can be found in the manner described for Condition 3 of Section 4.2.

PAGE 129

Table 4.7. Minimax designs with restriction x=-l,0,l 119

PAGE 130

120 Table 4.7 continued 61

PAGE 131

Table 4.7 continued 121 ^1

PAGE 132

122

PAGE 133

Table 4.8. Minimax (local) designs with 62=0 123 61

PAGE 134

124 Table 4.9. Minimax designs \

PAGE 135

125

PAGE 136

Table 4.9continued 126 Sl

PAGE 137

127

PAGE 138

128 The optimal values of x and x in Table 4.9 are once again similar to the previous two optimality criteria discussed in the present chapter. However, notice that the pair (-1,1) should seldom or never be compared for the minimax criterion. That is the major difference between the present designs and the designs in the two previous sections. The optimal designs for 3^=0 found in Table 4.9 are identical to the designs found in Table 4.8, suggesting that the minimax designs found in Table 4.8 may in fact give absolute minimums for M. 4.5 Conclusion and Design Recommendations Figure 4.3 is a graph of the optimal level to be compared with x=l for each of the three criteria discussed in the present chapter when 6 =0, as a function of | 8, | . This figure has been referred to in Condition 2 for each of the three previous sections. Recall that these are only locally optimal in some cases, and hence are possibly not as useful as the optimal designs found for Condition 3 in the three previous sections. However, notice that the optimal value of x is closer to for the minimax criterion than it is for the other two criteria. Also, the slope of the D-optimal curve is steeper than the other two which are essentially parallel, signifying that these D-optimal designs change faster than the others as |S^| changes. The remainder of this section contains a discussion and comparison of seven particular designs on how good they are overall. Estimates of their relative efficiencies are obtained for the three optimality criteria discussed in the present chapter, and it is determined which designs are best overall for protecting against possible bias.

PAGE 139

129

PAGE 140

130 Let Designs 1-3 be defined as three point, four point, and five point designs, respectively, where the levels are eqxaally spaced, and all pairs are compared an equal number of times. Also, define Designs 4-7 as: Design 4: {\^_^^^^^ = U__24,l = •^' ^-1,1 = "^^ ' Design 5: ^y_,^_24 = ^-.24,1 = ''^' ^^-1,1 = '^^ ' Design 6: {u_^ 2 " ^2 1 ^ '^-^ ' Design 7: {y_ 5 "^ ^_ 5 i "^ .4706, i^ , , = .0588} . The reason for including Designs 1-3 is that they are designs which might naively be run by an experimenter who has little knowledge of optimal designs. The motivation for including Designs 4 and 5 is that they are similar to many of the optimal designs found in Tables 4.3, 4.6, and 4.9, especially for relatively small parameter values. Design 6 is similar to the optimal designs for large parameter values. The source of Design 7 is Springall's paper. It is included here although he did not state any optimal properties for the design. The efficiencies of the seven designs defined above relative to the optimal designs found in Tables 4.3, 4.6, and 4.9 were calculated for combinations of B ,3 =0(.2)1. The integrated mean square error (J) was calculated for 3 ,S^=0(.25)1 and B =-l(.4)l, where C J = I E(ln TT g(x) ) dx , ^-1 N=100, and the true model, g(x), is cubic. Finally, the bias part of J was calculated. See the following chapter for an expression for J and the bias.

PAGE 141

131 Table 4.10 presents the minimum, maximum, and median of the relative efficiencies for each of the three criteria discussed in the present chapter. This table gives an overall estimate of what is lost by running a design different from the optimal design. The estimates are given for the 24 combinations of Q and Q where j 3, | ^ .6/ for the 12 where [ S^ | > .5, and for all 36 combined. Remember that these minimvims , maximiims , and medians are only estimates of what could be considered as corresponding tr^ae parameter values. The efficiencies for the D-optimal criterion will now be discussed. The average-variance and minimax criteria can be considered in a similar manner. For the condition j3 | ^1, Designs 4 and 5 both have a median over 99%, with Design 4 slightly larger. However Design 5 has a minimum efficiency of 96.5% as opposed to 93.5% for Design 4. Both of these designs are overall superior to the other designs considered. For the case |S j £ .6, Designs 4 and 5 are again superior to the others. However, Design 6 is the better design for the case [ S^ | > .6, although Design 5 is very close. The values of the bias and the integrated mean square error were ranked from smallest (1) to largest (7) for each of the 150 combinations of 8 , S , and B considered. Table 4.11 presents an arithmetic mean of these ranks. This gives a rough idea of how the bias and integrated mean square error compare among the seven designs. It is no surprise that Design 3 has a small bias since there are a total of ten pairs compared an equal number of times. Surprisingly, Design 7 had the smallest bias every time. However, the variance part of J is much larger for Designs 3 and 7 than for Designs 5 and 6, and

PAGE 142

132 hence the latter two designs have smaller values of J as indicated by the average rankings in Table 4.11. By taking in conjionction the information given by Tables 4.10 and 4.11, the following recommendations are given. If no a priori information about S, is available, or if it is believed that | 6, | ^ .6, then Design 5 is a good design to use. If it is believed that | 6, | > .5, then Design 6 is the better design. The relative efficiencies were also calculated for the design (y c"^ i;n~^n c ~ ^ ^ ^ ~ -25}. They were in the vicinity of 25% efficient. Hence this design and Designs 1-3, any of which an experimenter could naively elect to use, are in fact poor designs for fitting a quadratic model.

PAGE 143

Table 4.10. Efficiency of designs 133 Design Condition 16,1

PAGE 144

Table 4.11. Bias of designs 1. _-— 'V33% ^ ^-3l%VL~ — -I ^ ____., 40% ~^^^ ^-"1.^40% ""' 20? .^I 1 1^ 1 __,^ 50<^, -^^^ 50% ___, 5.88% -____ ,--'-'"'' ___47.06% -~"l^""~-~„ I \ '-H ^ Ilr 1 -1 ^^^^ -.5 ^^, .5 1 """47.06% ''''' 134 Mean ranking Desian Bias J 7.00 6.00 2. -1 -^/^ ^/^ ^ 5.94 5.59 Each of 5 pairs compared an equal number of times I 1 1 \ 1 -1 -.5 .5 1 3. 2.41 5.97 Each of 10 pairs compared an equal number of times ^-1 . -.24 .24 1 ^-^^ ^•'^^ 5-1 ,^ -.24 ^^ .24 1 3.85 2.18 """'10% '" 6. , 2.74 1.65 -1 -.2 .2 1 1.00 3.19

PAGE 145

CHAPTER 5 DESIGNS THAT PROTECT AGAINST BIAS 5 . 1 Introduction This chapter is concerned with finding designs which protect against the bias present when the true model is cubic, and a quadratic model is fit. Section 5.2 contains a discussion of designs which minimize the bias. Section 5.3 presents designs which minimize J, the integrated mean square error, where 2 E(ln 7T g(x) ) dx '^1 2 " 2 E( In TT E(ln TT ) ) dx + (E(ln tt ) g(x)) dx and where -1 ^-1 V + B , (5.1.1) g(x) = 3*x + 3*x^ + 3*x^ , (5.1.2) In TT X 3^x + 62^ , (5.1.3) 2 In TT^ = 3^x + S^x . (5.1.4) 135

PAGE 146

136 In the present chapter, Bw 6^/ and Q* represent the true parameter values. Notice that designs which minimize V in (5.1.1) when S*=0 were found in Section 4.3. Since 6*=0 implies B=0 , these designs also minimize J. An expression for V is given by (4.3.3). However the X. i: in the expression are now a f\inction of Q* , 6*, and 6* rather than 6, and 3_. A general expression for \.. is found in (2.3.10). 5.2 All Bias Designs From (5.1.2) and (5.1.3) , (E(ln 7T ) g(x) ) = ( (6^ B*)x + (62 8*)x^ 8*x-^ ) B*)^x^ + 2(3^ i*) (S^ S*)x" + ( (62 S*)^ 2Q*{B^ 8*) )x'* ^^3(^2 r,j.> 5 ,oj.2 5 B*)x + 3* X (5.2.1) Then from (5.1.1) and (5.2.1), the bias is 3 = r^ (E(ln TT ) g(x)) dx X -1 2 2 2,^2 4, = f(3, -3*) .3(32-82*) -73* 36*(3, -3*) . (5.2.2) Notice that any design which results with the values of 3^ and 82 that minimizes (5.2.2) is a minimum bias design. The optimal values of 3 and 3 as functions of 3*, 3*, and 3* are now found. Taking derivatives in (5.2.2),

PAGE 147

137 Iff<6, 6J. !«• , 1 1^ = |-(3. S*) (5.2.3) 5'''2 "^2' Setting the expressions in (5.2.3) equal to zero results in (5.2.4) h = ^2 • The three roots to the equation formed by equating (5.1.2) and (5.1.4), after substituting (5.2.4) into (5.1.4), are x=-. 7746 ,0, .7746. These three roots are the points at which the quadratic and cubic curves intersect. The expected number of times that x. is preferred over x. is defined as n. .P(x. ^ x.), where the probability that x. is preferred 13 1 3 ^ over x is calculated using the true cubic model. This is referred to J as "expected" data. If the only levels compared are -.7746, 0, and .7746, then the "expected" data under the true cubic model is the same as the "expected" data under the quadratic model given by (5.2.4). Hence the maximum likelihood procedure using "expected" data under the true model results in 3 and 3 when only the three levels -.7746, 0, and .7746 are pairwise compared. This procedure is equivalent to first finding maximum likelihood estimates, 3, and Q , from the data, and then taking expectations. This was verified for the present situation. Therefore, any paired comparisons among the levels x=-. 7746 , 0, . 7746 results in e3 =3 and e3 =3-/ where 3, and 3 are given in (5.2.4), and

PAGE 148

138 hence any such design minimizes the bias. Notice that the optimal design does not depend on the parameters. This choice of equally optimal designs does not appear to be reasonable, and so the designs are not recommended to be used in practice. Minimizing the bias could result in a large average variance, and hence a large integrated mean square error, J. The next section is a consideration of designs which minimize J. 5.3 Integrated Mean Square Error Designs Define the function f as f(SwB^,B,,x) = In TT = 8,x + 6^x^ + 6,x^ . (5.3.1) 12 3 X 12 3 Then it is clear that f(6^,S2/33/x) = -f(-6^,-32/-63,x) -f{8^,-62.33,-x) (5.3.2) f(-B^,B2'-63/-x) , and f(8^,B2/-63,x) = -f(-6^, -62/83, X) -f(B^,-S2/-S3,-x) (5.3.3) f(-B^,32'B3,-x) . Intuitively, this would imply that optimal designs only need to be found for 3^ and 6^ ^ since f is just transposed about the x-axis or y-axis , or both, for the case of negative values of 3 or 3^For

PAGE 149

139 example, by (5.3.2) the function f for (B^^-B^/B^) if f (B^,B2/33,x) transposed at)Out both the x-axis and y-axis. From (3.2.5), (|)^^ can be written exp(f(63_, 62^33, X.) + f(B^, 32^63, X. ID ( exp(f(3^, 62/63, X.)) + exp(f(B^, 62,63, X.)) ) . (5.3.4) The fact that only B^ and 62 ^ need to be considered can be verified by using (5.3.4) in a manner similar to the proof of Theorem 4.1. A Fortran program performed a grid search for the designs which minimize J, where J is given by (5.1.1). The total number of comparisons was taken to be N=100, and so V, as given by (4.3.3), was divided by 100 before substituting into (5.1.1). The grid search was performed for combinations of 8^,62=0 ,. 5 , 1 , and S3=-l(.5)l, plus a few additional parameter combinations. The levels were fixed at x=-l, -. 5 ,0 , . 5 ,1 . The grid allowed for all 10 possible paired comparisons. The search ended when all 10 comparison proportions were accurate to at least ±.01. These optimal designs are presented in Table 5.1. The value of J is also given. As was previously discussed, the designs for negative B or B are found as follows. First of all, the design for |B^|, IB I, and 6 is located in the table. If 6 and Bare both positive 2 3 L ^ or both negative, then the design is found directly from Table 5.1. If this is not the case, then n becomes n_ _ for all pairs m the X . / X . X . / X . design. Notice that the pairs (-l,-.5), (-.5,0), and (0,.5) are never compared, and (.5,1) is only compared when 3^=32=6^=1. The designs presented are not intended to be precise, but rather they give a rough

PAGE 150

140 description of designs which protect against bias. For example, the designs should be symmetric when 6-=0, but Table 5.1 shows them to be close to but not exactly symmetric. Because of the restriction of the five fixed levels, the minimvmi value of J found in the present section is not necessarily the minimum for the class of all possible designs. In fact, when 8 =0, Designs 4-6 in Section 4.5 have slightly smaller values of J than those given in Table 5.1. This can be seen in Table 5.2 which compares the values of J for the optimal designs presently being discussed with Designs 4-6. This table shows that if the cubic parameter is relatively large, then the present designs are significantly better for protection against bias. Otherwise Designs 4-6 adequately protect against bias, and they additionally are highly efficient for the optimal properties discussed in Chapter 4.

PAGE 151

141 Table 5.1. J-criterion desiqns 81

PAGE 152

142 Table 5.1 continued «1

PAGE 153

143 Table 5.2. Values of J Design 4 Design 5 Design 6 "Minimum" J

PAGE 154

144 Table 5 . 2 continued Design 4 Design 5 Design 6 "Minimum" J 5

PAGE 155

CHAPTER 6 DESIGNS FOR PRELIMINARY TEST ESTIMATORS 6.1 Introduction The motivation for this chapter is a paper by Sen (1979) on preliminary test maximijin likelihood estimators (PTMLE) . The procedure for the PTMLE is to collect the entire data and then make a test of hypothesis. If the null hypothesis is rejected, then the maximum likelihood estimates under an unrestricted parameter space are found. If the null hypothesis is accepted, the estimates are found under a restricted parameter space. Presently, the unrestricted parameter space is the two-dimensional (6 ,B ) , the restricted parameter space is (6 ,0) , and hence the null hypothesis is H : 3^=0. The objective of the present chapter is to o 2 find designs which minimize the integrated mean square error using the preliminary test estimator. This is similar to Section 4.3, except that in the present situation a linear model may be fit to the data instead of simply fitting a quadratic model regardless of the data. Define (S^/0) as the maximxim likelihood estimator under the restricted parameter space, {Q ,B ) as the estimator under the unrestricted parameter space, and (6*/ 3*) as the PTMLE. Sen shows that for any fixed alternative, i.e. for any $7^0, i^N(S 3^,6 8^) and /N(6* 3,/3* B^) have the same asymptotic multi-normal distribution, and therefore the optimal designs for the PTMLE are the same as 145

PAGE 156

146 the optimal designs found in Section 4.3. Likewise, the optimal designs for 6^=0 are identical to the ones presented in Section 3.2. The remaining situation is when (6 ,Q ) is "close" to the restricted parameter space, i.e. when 6_ is "close" to zero. This is r -1°° conceived by defining a sequence (K ; of local alternatives to be ^ N=l = Y//N , :6.1.1) where Y is some scalar constant. Notice that as N-><», K -* H . This situation is covered in the next section. 6.2 Average-variance Designs for Preliminary Test Maximum Likelihood Estimators For the remainder of the present chapter, the sequence of alternatives given in (6.1.1) is considered. The asymptotic dispersion matrix for the PTMLE in general is given by (5.8) on page 1029 of Sen's paper. After working through Sen's paper to find this matrix for the present situation, the asymptotic dispersion matrix for /N(3* 3^,S* 3^) is E = 11 12 22 1/X 11 11 12 -1 12 22 ,2 2 , ^ rr'( 2P(x3,^^xJ,^) -^^xl^^xi^) ) where ,\ , X , and \ are given by (3 . 3 . 2) (3 . 3 .4) , x" a is a noncentral chi-square random variable with noncentrality parameter A and 9 2 2 r degrees of freedom, xT is such that P(xr r /-> * XT , ) = <^» ^"<^ l,a 1,A=0 1,0L

PAGE 157

147 2 _ A,, A "• A _ 2 11 22 12 /c -, ON Y i > (6.2.2) \l -\2/\l [6.2.3) Notice that in (6.2.1) noncentral chi-square probabilities need to be found. A number of convenient approximations exist. One such approximation is to use a central chi-square probability to approximate a noncentral chi-square probability. This is found in the Handbook of Mathematical Functions, published by the National Bureau of Standards. Letting V denote the degrees of freedom and A denote the noncentrality 1 9 parameter, P(X . ^ c) can be approximated by P(X"'* „ ^ c/{l+b)), where \) fix V , b = A/(v + A) , and V* = (V + A)/(l + b) . Haynam, Govindarajulu, and Leone (1962) present a computational formula for noncentral chi-square probabilities. The formula is ^ j! P(X^_,>y) \' -"''-'-'V^)^ j=0 r((v/2)+j) J y (x/2)^^/2).j-l^-x/2^^^/2^ -Y ''"'''"^^'' P(X,^ >y) . (6.2.4, j! ^V j=0 2 ^ ^ The value of j! becomes large very rapidly, and hence the summation in (6.2.4) also converges rapidly. These two methods were compared to observe how close the approximation is to the actual probability, since the computing costs would be reduced if the approximation was adequate. This comparison was made

PAGE 158

148 for combinations of v=3,5, A=.05(.05)2, and a=. 05 ( , 05) , 30. These values were chosen because they are the type of probabilities that need to be calculated in (6.2.1). The summation in (6.2.4) was terminated when the j term was less than 10 . This turned out to be between 5 and 11 terms in all cases considered. For small a, A, and v=3, the approximation formula slightly overestimates the probability. However, for v=3, a-^ .20, and large values of A, the approximation underestimates the probability by as much as .01. For this reason, it was decided to use the computational formula (6.2.3) in the discussion which follows. Once again, a Fortran program was written to conduct a grid search for optimal designs. This was done for N=100 total comparisons and all combinations of 3 ,S_=0(.2)1. The entire procedure was repeated for error rates a=.01 , . 05, . 10, .15, . 20. The optimal design minimizes 2 '^ ^ (XX) -1 dx , (6.2.5) where E is given in (6.2.1). By (6.1.1), y=*'N6^ was substituted for y in (6.2.5). The procedure used for each value of a is identical to the one described in the last subsection of Section 4.2, except that the values of x , x , and the comparison proportions were found accurate to only ±.001. The result of the grid search was that for 6 ^ .2, the level a=.01 gave the smallest value for V. This implies that it may be best to take a=0. This further implies that a linear model should be fit regardless of the data. For S ^ .4, it appeared that it is best to

PAGE 159

149 use a=l, implying a quadratic model should be fit regardless of the data, and hence the optimal designs are found in Section 4.3. This is a very peculiar result. It seems to be implying that there is some value of 3 , say S> ^, in the interval (.2,. 4) such that a linear model should be fit for any |6 | less than S ^, and a quadratic model should be fit for any \S> \ greater than 6 . No definite explanation for this phenomenon is presently known.

PAGE 160

CHAPTER 7 TWO-STAGE SAMPLING FOR MODEL FITTING 7.1 Introduction Suppose the entire experiment is conducted in two stages. The total number of paired comparisons, N, is then split into two parts. The first sampling stage has N total comparisons, and the second has N total comparisons, such that N +N^=N. After the first stage, the hypothesis H : S_=0 is tested. If the hypothesis is rejected, then the design of the second stage would be one which is optimal for fitting a quadratic model. The quadratic model is then fit using only the data from the second stage. If the hypothesis is accepted, then the second stage would be a design optimal for fitting a linear model, again using only the second stage data for the estimation of the linear model. The first stage design, referred to as Design 1, is {n = n = . 5N } . This design was shown in Section 3.3 to minimize var(6^) for 3 =6 =0. The design was also shown to be highly efficient for |6 I ^1. This is a good choice for the first stage design because it maximizes the power of the test of the null hypothesis for a given value of N . The second stage design is one of two choices depending on whether or not it is decided to fit a quadratic model. If the null hypothesis is rejected, the design for the second stage, referred to as Design 2Q, is fn _, , = n . ^, .4rJ , n , , = .2N^}. This design was shown in -.24,1 -i,.24 2 -1,1 2 150

PAGE 161

151 Section 4.5 to be one of the overall better designs for fitting a quadratic model. Finally, if the null hypothesis is accepted, the design {n_ = N^} is to be run. This design was shown in Section 3.2 to be optimal for minimizing var(6,), and hence optimal for fitting a linear model. This design is referred to as Design 2L. The following section discusses optimal choices of the error rate for the test of hypothesis and the optimal values of N and N using a Bayesian approach. 7 . 2 Optimal Error Rate Two prior bivariate distributions for (6^ ,B_) and two optimality criteria are considered using a Bayesian approach. The prior distributions are the uniform and normal distributions, both centered about the point (0,0). The uniform prior is appropriate if an experimenter could state that 3, and g are in some intervals centered about zero, but with no reason to believe that some subset of values for 8 and B is more likely than any other. On the other hand, if the person could state that S and 6^ are in some intervals centered about zero, and furthermore that it is more likely that they are near zero than it is that they are far away from zero, then the normal prior is appropriate. The optimality criteria considered are the minimization of the integrated mean square error and D-optimality. These two criteria have been previously discussed in Chapter 4 for the case of no bias, and are also applicable for the present situation. The difference between the situation in Chapter 4 and the situation in the present chapter is that bias can appear in the present situation. That is, there is bias present if a linear model is fit when the true model is quadratic. For this reason, the mean square error of the maximum likelihood estimators

PAGE 162

152 is used in the present chapter instead of the variance-covariance matrix. The decision rule, d, is a choice of either the linear design (1) or the quadratic design (q) for the second stage of the experiment. Notice that the decision rule is equivalent to choosing the error rate for the test of H : 6^=0. For the present section, the Saves rule is o 2 found for fixed N, and N^, such that N,+N^=100. 12 12 Define p to be the probability of rejecting the null hypothesis. The value of p depends on 6 , a, and var(B^) from the first stage. In fact, it can be shown that 1 ^(z^) + ^(2^) , (7.2.1) where "l = -V2 V^^^^^^^))-' , (7.2.2) ^0 = 2^/0 B_/(var(6_))"^ , (7.2.3) 2 a/2 2 2 and where * is the standard normal cumulative distribution function, and z is such that (1 $(z )) = a. Because of the symmetry of the first stage design, by (3.3.9) the standard error of 8 is (var(32))'^ ^ '^^"\2 " ^ ^"i*-l o^^' ^ ^^1*^0 i/^)'""^ ' (7.2.4) where (|) and i1) are given in (4.2.9). Consider first the integrated mean square error criterion. The loss function for this criterion is L(8,l) = / MSEdn TT ) dx , (7.2.5) J

PAGE 163

153 L(3.,q) = MSEdn TT ) dx , (7.2.6) where 6 = iB^,Q^), In TT = S,x , (7.2.7) X 1 In 7T = B,x + B^x^ , (7.2.8) X 12 and where B is the maximum likelihood estimator from Design 2L, and (6, fS ) is the maximum likelihood estimator from Design 2Q. The risk function is then R(6,d) = (1 p)L(B,l) + pL(S,q) . (7.2.9) The mean square error of (S ,0) is MSE(6 ,0) = , (7.2.10) [ s,^i where (^ is given in (4.2.9) , and ^1* = ^V-1,1 ' ^'-'-^^^ From (7.2.5) and (7.2.10), the loss when Design 2L is chosen is L(B,1) = (X x ) l/-\,,* \ / X -1 \ B. / V > I a. 11

PAGE 164

154 The mean square error of (3,/3„) is ^/\l MSE(6^,B2) " ' (7.2.13) V 1/X22 / where from (3 . 3. 2) (3 . 3. 5) , ^22 = ^2^-^^^^*-l,.24 ^'^-.24,1^^ ' ^^"^-^^^ -.766 + 1.05766 1,.24 3^ 6^ .246^ + .05766^ 2 ( e + e ) (7,2.16) .766 + 1.05766. -.24,1 -.246^ + .057662 ^1+22 2 ( e + e ) (7.2.17) and (p_ is given in (4.3.9). The off-diagonal elements of (7.2.10) and (7.2.13) are zero because both second stage designs are symmetric. From (7.2.6) and (7.2.13), the loss when E)esign 2Q is chosen is L(6,q) = (XX) dx ^-l \ l/X22/\x^/ = 2Z1 ^ 2^ , (7.2.18) \l 22 Then for the integrated mean square error criterion, by (7.2.9), (7.2.12), and (7.2.13) the risk function becomes

PAGE 165

R(3,d) (1 P) /3 +l6 ' \l* ^^ + p 2/3 2/5 The Bayes risk is defined to be 155 (7.2.19) r{T,d) R(6,d) dT(6) (7.2.20) For the D-optimal criterion, the loss function is L(6,l) L(S,q) 1/A^^* 3, l/X 11 i/\ 22 ^ /\i* i''\r^22 (7.2.21) (7.2.22) Hence the risk function in this case is R(3,d) (1 -P)&,A^^^ ^ P/A^^X^^ (7.2.23) The two prior distributions are now defined. The first prior is the bivariate uniform distribution, whose density function is dT(3) 1/4U if 3^£[-l,l], &^eL-u,u} , otherwise , (7.2.24) where U is the upper limit to the prior distribution of 3 • The other prior considered is the bivariate normal distribution. Since the support of the normal distribution is infinite, the density was truncated at three standard deviations about the mean in order to perform the numerical integration. The marginal densities are then

PAGE 166

156 -4.5" 2 dTg_^(6^) dx iB) 2 ^ /.3334 if S^eL-1,1], otherwise , 2 2 e /2.5a if 6-£[-U,u], otherwise. (7.2.25) where a=U/3. These marginals are the usual normal density divided by .9974 so that the density integrates to 1. The joint prior distribution of iB ,Q ) is the product of the marginals. The integral in (7.2.20) can not be analytically found for any criterion and prior since p is a function of the standard normal cumulative distribution, which in turn is a function of 3_. So a SAS computer program was written to numerically integrate (7.2.20) for the two prior distributions described in the two preceding paragraphs and the two criteria discussed in the present section. The numerical integration is performed as follows. The rectangle formed by the area of positive density is divided into a 10" grid. This results with 100 rectangles of equal size. The expression R{8_,d)dT{6_) is then evaluated at each of the centers of the 100 rectangles. This gives the height of the function R(3_,d) dT (8_) at the center of each rectangle. The lengths of the sides of the rectangles are .02 and .02U, and so the approximate volume of each region formed by the function and the sides of each rectangle is found by multiplying the height of the function at the center of the rectangle by .04U. The approximation of the interval in (7.2.20) is then found by summing these 100 volumes.

PAGE 167

157 For each of the four combinations of the two prior distributions and tvro criteria, the Bayes risk in (7.2.20) was found for N =10(10)90, a=0(.05)l, and various values of U. The value of N is found by N =100-N . For each value of N, and U, the best a level was found from 2 1 1 the choices 0=0 (.05)1. These optimal a levels are graphed as a function of U, the upper limit to the prior distibution of S-, for each of N =10(10)90. Remember that these graphs are not precise since the optimal a level is only found to the nearest multiple of .05, but they do give a basic understanding for the behavior of the a level as a function of the prior distribution of 6 • The graphs for the integrated mean square error criterion with a uniform prior are found in Figure 7.1, and with a normal prior are found in Figure 7.2. The graphs for the D-optimality criterion with a uniform prior are found in Figure 7.3, and with a normal prior are found in Figure 7.4. The next section contains a discussion of these four figures. 7 . 3 Concluding Remarks In Figures 7.1-7.4, the axis labelled "U" corresponds to the upper limit of the prior distribution of 6 . The U-axis in the graphs for the normal prior. Figures 7.2 and 7.4, ranges up to 2.08. The normal prior with U=2 corresponds to 3 having a normal distribution with mean 0, and standard deviation 2/3. The U-axis of the graphs for the uniform prior. Figures 7.1 and 7.3, ranges up to 1.2. It is likewise true that for the uniform distribution, U=1.2 implies that the standard error of 3 is also approximately 2/3. So for a given amount of variability, the upper limit for the normal prior is approximately 1.7 3 times the upper limit for the uniform prior.

PAGE 168

158 For example, if an experimenter believes Q is somewhere in the interval (-.5,. 5), but has no inclination to where in that interval 6 2 is, then a uniform prior would be appropriate. Figure 7.1 indicates that for U=.5 and N =10, the test should be run at an error rate of approximately a=.20; for N =20, a=.08. A normal prior with the same amount of variability or uncertainty would be found by multiplying U=.5 by 1.73. This results in U=.865. Figure 7.2 indicates that for U=.865 and N =10, the test should be run at an error rate of approximately a=.35; for N =20, a=.20. So the error rates for the particular value of the standard error of 3 discussed in the preceding paragraph appear to be slightly larger for the normal prior than for the uniform prior. However, this is not apparent for the entire range of values for U and N . In fact, by looking at the maximum value of U in Figxires 7.1 and 7,2, a=l in both cases for N =10,20,30, but a is larger for the uniform prior than for the normal prior when N =40(10)80. For most values of the standard error of S_, and N , in particular when the standard error is relatively large, a is larger for the uniform prior than for the normal prior. Notice from all four figures that for a given value of U and N , the a level for the normal prior is smaller than for the uniform prior. This is partially explained by the previous paragraph. The normal prior with the same value of U as a uniform prior will have a smaller standard error, and most of the density is near 3^=0. Consequently, there is a higher probability that 3^ is near zero, and it is therefore reasonable that the hypothesis H : 3^=0 should be tested at a o 2 smaller a level.

PAGE 169

159 Comparing the two criteria for each of the two priors indicates that for given values of U and N , the a level is smaller for the D-optimal criterion than it is for the integrated mean square error criterion. It turns out that in all four situations, for a given value of U, the Bayes risk is smaller for N =10 than it is for any of N =20(10)90. This indicates that if an experimenter is interested in rionning a twostage experiment, a relatively small number of comparisons should be used in the first stage, saving the majority of the available comparisons for the second stage. This may be explained by the fact that in the present discussion, the first stage data are not used in the estimation phase. Only the second stage data are used for the final estimation. So the result found at the beginning of the present paragraph implies that it is better to use the large majority of the available comparisons for the sole purpose of testing the null hypothesis. Further research remains to be conducted on combining the first and second stage data in the estimation phase. This would involve deriving vardn tt ) , where In tt is expressed as ~ ~ 2 In ^ = 1(3, X) + (1 I) (3,x + &K ) , X 1 12 and where I is the decision rule for the first stage, i.e. r 1 if it is decic Lded to fit a linear model, I = { (_ if it is decided to fit a quadratic model, This is not a trivial problem because the random variables I, B , and

PAGE 170

160 (3,/8„) are not independent. Intuitively, if this could be resolved, it appears reasonable that some optimal ratio N /N and a level could be found .

PAGE 171

161

PAGE 172

162

PAGE 173

163 o rO 3 O OOO

PAGE 175

APPENDIX A COMPUTER PROGRAM THAT FINDS MAXIMUM LIKELIHOOD ESTIMATES OF 7T, , . . . .TT ,9 1' t This appendix contains a computer program written in the language APL. There are two matrices which need to be entered onto the computer. First of all, the matrix referred to as "N" is a t^t matrix, with (i,j) element equal to the number of times treatment T. is oref erred 1 over treatment T.. The other matrix is also txt, and is referred to as "TIE". The lower triangular part and the main diagonal of "TIE" are all zeroes. The (i,j) element, where i
PAGE 176

166 freedom. If there are no ties, only (1) and (5) above are printed, because in this case the calculation of the element of the inverse to the variancecovariance matrix corresponding to 9 results in division by zero. An iterative procedure is used in the estimation phase of the computer program. The initial estimates of 9,TT , ... ,7T are the ones presented by Dykstra (1956) . The method used to find the estimates and asymptotic variance-covariance matrix was presented by Rao and Kupper (1967) . In their presentation, it was assumed that each pair was compared an equal number of times. Therefore, as pointed out above, the variance-covariance matrices given by the program may not be applicable if the pairs are not compared an equal number of times. T^+. Q(k) 1 (k) " (k) , ^ . th Let y ,TT^ , ... ,Tr be the estimates after the k iteration. The iterative procedure is terminated when (k-1) _ ^(k) (k-1) 1 + t i=l ; (k) ^ ; (k-1) 1 i : (k-1) < 10 -4 If so desired, this can be changed simply by altering line [13 J. Example. Suppose that N = 40 20 40 14 TIE = 3 5 1 Then an execution of the program yields the following;

PAGE 177

167 Initial estimates Final estimates (19 iterations) 1.0789

PAGE 178

168 1 -r e z H i iil St «»• A j. N C + Ci <^ Jl M 2 ;Xi ili HI U< £i + X s -^ i13 -i+ X -^ ^ C
PAGE 179

169 «t

PAGE 180

APPENDIX B COMPUTER PROGRAMS THAT FIND MAXIMUM LIKELIHOOD ESTIMATES OF S, ,...,8 1 S This appendix contains two computer programs that find iterative estimates of S , ... ,S . The first one, found on pages 174 and 175 is written in the language APL. The second one, found on pages 176 and 177, is written in Fortran. The APL version is discussed first. The input is the following, (1) DATA = A t^t matrix whose (i,j) element is the number of times T. is preferred over T.. (2) MODEL = A vector of integers defining the model which is to be fit. This is now explained in detail. Suppose there are v independent variables in the experiment. Then the first v elements of MODEL give the order of each variable (e.g. 3 means to 2 3 include x., x. , and x. ). The remaining elements are considered 111 ^ in pairs, and correspond to two-way interaction terms which are to be included in the model (e.g. (1,3) occurring after the first V elements means that the term x x should be included). Notice that any 3-order or higher order terms can not be included without a modification of the program. (3) X = A t^v matrix of levels in the experimental design. 170

PAGE 181

171 The output is the following. (1) A listing of the vector MODEL. (2) The txs matrix X formed by concatenating onto X the columns corresponding to the higher order and interaction effects. (3) The sxtxt matrix where the (i,j,k) element is (X) .. (X), .. 31 ki (4) The parameter estimates at each iterative step until the convergence criterion is met. The initial estimates are 6,='''=6 =0. Is These estimates at each iteration are printed in the same order as the columns of X. The Newton-Raphson method with a slight modification is used to perform the iteration phase. See Scarborough (1962) for a thorough discussion of the Newton-Raphson method. Springall stated that the iterative estimates of 8,3,/ ... ,3 converged ver^/ slowly. This is Is also true when only estimating 3,/ ... ,3 using the Newton-Raphson method without any modifications. However, it is the author's experience that when the convergence is slow, it is because the estimates are "j\imping" from one side of what will be the final estimates to the other. That is, if 3,, ... ,3 represent the final estimates, then the 1 s estimates after the k iteration may be 3 +£, , ... ,3 +£ t and the 11 s s St ^ estimates after the (k+1) iteration may be approximately 8 -£w ... , 3 -£ , for some values of Z., i=l, ... ,s. Because of this phenomonen, s s 1 the modification to the Newton-Raphson method is to take the average of the k" and (k+1) iterative estimate before continuing with the next iteration whenever the direction of convergence changes. For example, St if the successive estimates of 3 are decreasing, and the (k+1) estimate of 3^ is larger than the k , then the average of the k and

PAGE 182

172 St st (k+1) estimate replaces the (k+1) estimate before proceeding to the (k+2) iteration. This modification substantially decreased the number of iterations until convergence. The iteration phase is terminated when the maximum difference between the estimates and the preceding estimates is less than 10 If it is desired, this can easily be changed by altering line [55]. As pointed out in Section 2.2, the procedure does not always converge. However, the only time this occurred was when it was attempted to fit a full model, i.e. when s=t-l. When s=t-l, the program in Appendix A can be used to find estimates of In tt . , and then the t equations, t-1 In TT 1 ,/_, k ki i=l, . . . ,t, k=l can be solved for ,B t-l Example . Suppose that DATA = 10 6 15 11 19 14 MODEL = [2 J Then an execution of the program yields the following: MODEL = 2 .X = Matrix of differences; -1 1 2 1 -2 -1 1 -1 1

PAGE 183

173 Iteration 6,,S ,... 1 -.468182 -.145455 2 -.485958 .319043 3 -.469350 -.144161 AVG -.477654 .087441 11 -.478947 .084719 As can be seen in the iterative phase in the example, an average is taken between the 3 and 4 iteration. This occurred because 3, decreased and 6increased from the 1 to the 2 iteration, while the reverse occurred for the 2 to the 3 iteration. The Fortran program as it currently stands can only be used for fitting a quadratic model. It can, however, be modified to be as general as the APL program. The matrices DATA and X need to be read in. This is done in line numbers 3-8. These statements in addition to the data itself at the end of the program will need to be changed according to the problem at hand. A peculiar occurrence was made evident by running the Fortran program for the same example as presented above. It turned out that the Fortran program converged to the same final estimates in only 4 iterations instead of 11 iterations. The procedure in each program is the same, but no explanation for the difference in the speed of convergence for the two programs is presently known.

PAGE 184

174 €21 r.Z2 LAI Z51 l61 Z71 C31 C?l i: 1 J cm 11123 C13a i: 1 4 3 C15: C161 C173 1131 Ci?l L201 i::2in l22] 1:23: i::24 3 C25: C263 :27j C28D C293 C30] C31J C32J C33J C34a C35D C36: C37: C33: C393 C40 3 1:413 L'42n C433 C443 C453 ,T, (T^f ;-;[:;i3) );^Q 7 MOr.EL. ESTIM >' J I J J ; T J K ; M ATR I K J XD I FF J 3 J I TEP: ; P ; P ^ ; P 2 ? F-3 5 COP: ; D ; DETERM ,* L J N ; BETftOLD ; DIFFOLI' J D IFF J FLAG J l_UL. BETA<-(l,pX[:m)fO 'f^F'(1 =*^'^J:'e:i-Ci:il)/i-i,L.3 '-3:«^i -»(1 =K (1 =1 cr.iFF^( (S^/-;;[:i;3 T2: i^i+i ^3:-'<-'-'+i ;cr.iFF[:;i; j:^KCi;:--C^f 3 xr.iFFr; jji-]^_v;i:,iFFc; i; j: 4(1 =-!< T)/T3, T4 T4M(1 =I< (T-i) )/T2,T5 D<' ' ;n<' ' ?;;diff MATRi;;^ ( ( 5 , T , T ) ^-DATA ) x>.'DIFF FLAG<-ITEP:t-0 n*' ' r n<' ' ;d<' ' ;a<' ' ; ' itef.ation COF.>.( 1 ,S)j.-0 To : P'ft&ETA+ , xi^X != 1 ^ C T f T ) f , Ff^^^-CjFl F-3«-'='l-p'l+p-2 EaNSf._ + / + /MATF:I>;x ( S , T , T ) ;.•> F-3 D<-(3,S)/.0 WAT*-(5,5,5)^0 T7;l<-l + i E'C ; L3^ + / + /MATF:i;;x ( 5 , T , t ),-«(( t , T ),>;;[:; lj ) xF 3( Fl x ( Sj (( r , t ) ,^KC ; "3 ) X F1 ) + ( ( T , T ) ;.-> ;.r c ; L 3 ) X F1 ) ( F1 +F-2 ) « 2 MAT[:;;L:3f.(S.S)fD[:;L3 MATCL ; : L]<-EQ?-!S OF DIFFERENCES &0' ^1 f B2

PAGE 185

175 1146: C472 C481 C49: C503 C5in C521 C533 C54a C55] C563 C57J C53: C593 12601 £612 C62J C63: i:64: l6oI] C66j 1:67 J CoSj C693 C70 3 l712 C723 C733 C74J 1:733 ->(1 =L.<3)/T7,TB Tg ; riETERM^-DET D T9:i<.i + i COR CI ; X]t-(DET MflTCI ; ; 3 )-^DETEF:M 4(1 =I<5)/T9.»T10 TIQ J ITER^-ITER + l 2<-BETA BETftf-BETA + COR ' I5f-5»30f^l0,6 ' D>="MT( ( 1 ,S+1 )p ITER, BETA) -*(1 =< r/ I BETA-BETAOUP) < lE-.5)/T12,Tll Tlir-^d =ITER<50)/T11A,0 "fll«:->(l =f=">-AG = 0)/TllB,TllC TUB : FLAG<-1 "'"ll'^WCl ='="1-'^G=1)/T11D,T11E TllDJ tiIFFOl_r><-BETA-BETAOl-D F-UAGf.2 TllE:DIFF*-BETA-&ETAOl_r' AAAJL.i_l_<-Ll.l_+j^ •^(1 =< ( xr.IFFCl ;i.I_1.3 )=xDIFFOI_riCl ;i-l_l_:] ) ) /BBB, TllF &&B;-*(1 =1-1-I_ = 5)/T1XG, AAA TllF;BETA«-(BETA+&ETAOI_r') -f-2 ' «'''G • ; • ;;5,30f^io,6 ' ni^wf c ( i , 5)f e
PAGE 186

176 DIMENSION DATA(5,5) r XDIF F ( 2 f 5 . 3 ) , XDIFFM ( 2 ,5.. 5 ) rP(5) rPRATI0(5r5) I lX(5f 2) > BETA (2) r D(2>2) f«AT( 100^3) READ(5f 100) ( (DATA(I» J) f J=l ,3) f 1=1 f 3) 100 FORMAT ( 12F2.0) WRITE C6f 101) ( (DATAdf J) rJ=l,3)»I = lf3) 101 F0RMAT(20Xf ' DATA ' ///3 ( 3F6 , 0// ) ) READ (5f 102) (X(I»1),I=1,3) 102 FORMAT (3F2.0) DO 1 1=1,3 1 Xdf 2)=X(I,1)**2 DO 2 1=1,2 DO 2 J=lf3 DO 2 K=lf3 2 XDIFFd, J,K)=X( J,I)-X(K, I) WRITE(6»103)((X(I,J),J=l,2)»I=lf3) 103 FORMAT ( '1' f20X, ' X ' ///3 ( 2F6 . 1// ) ) URITE(6f 104) ( ( (XDIFFC I , Jf K) f K=l,3) , J=lf 3) ,1=1. 2) 104 FORMAT ( '1' f20Xf ' XDIFF ' ///3 ( 3F6 . 0// ) ///3 ( 3F6 , 0// ) ) BETA (1) =0.0 BETA(2)=0.0 CALL ESTIM(DATAf3rl00dTERfX,XDIFF.BETAf D.DETfMATfM) WRITE(6,105)ITERf BETA( 1) r BETA (2) 105 FORMATC '1' rlOXr 'NUMBER OF ITERATIONS ' //15X d6//20X ,' BETA ' /// 12F15.6) UR I TE ( 6 . 1 06 ) DET f D ( 1 f 1 ) T D ( 1 f 2 ) f D ( 2 r 1 ) . D ( 2 » 2 ) r 1 ( (MATCIf J) r J=l »3) f 1=1 »M) 106 FORMAT ( ' 1' f lOX , F15 . 6//4F20 . 6////100 ( /3F20 . 6 ) ) STOP END SUBROUTINE ESTIM(DATAf IITfNITERrlTERf XfXDIFFf BETAf D^DET.MATfM) DIMENSION DAT A (5. 5) , XDIFF ( 2 f 5 , 5 ) , XDIFFM ( 2 f 5 f 5 ) , P ( 5 ) . D ( 2 r 2 ) , lPRATI0<5f 5) ,X(5,2) f BETA(2) .BOLD (2) .D0LD(2) f DIFF(2) rMAT( 100,3) DO 1 1 = 1 f 2 DO 1 J=lrIIT DO 1 K=lfIIT 1 XDIFFM( I , J,K)=DATA( JtK)*XDIFF( I , J.K) M = ITER=0 IFLAG=0 10 DO 2 1 = 1 T I IT 2 P( I)=EXP(BETA( 1 )*X( Ir 1 )+BETA(2)*XCI,2) ) M=M+1 MAT(M» 1 )=FLOAT( ITER) MAT(Mr2)=BETA( 1 ) MAT(M.3)=BETA(2) FUNC1=0.0 FUNC2=0.0 DO 3 1 = 1 f I IT DO 3 J=lfIIT PRAT 1 (I , J ) =P ( J ) / ( P (I ) +P ( J ) )

PAGE 187

177 4 44 11 14 16 //GO, 0151 -1 FUNC1=FUNC1+XDII-FM(1 r I f J ) *PRATIO (I ? J ) FUNC2=FUNC2+XDIFFM(2f Ir J)*PRATIO(rr J) DO 44 IR=lf2 DO 44 L=1j2 DD=0.0 DO 4 I=lfIIT DO 4 J=lfIIT IFd.EQ. J) GO TO 4 DD = DD + XD IFFM ( IR ^ I » J ) * ( X ( J » L) *PRAT 10 ( I f J ) l(X(IfL):KPRATIO( J.I)+XC JrD^PRATIOdf J)**2) ) CONTINUE D(IRfL)=DD DET=D(lf l)*D(2f2)-D(lf2)*D<2»l) ITER=ITER+1 B0LD(1)=&ETA(1> B0LD(2)=BETA(2) BETA ( 1 ) =BETA (1 ) + ( FUNC2*D ( 1 f 2 ) -FUNCl *D ( 2 f 2 ) ) /DET BETA ( 2 ) =BETA ( 2 ) + ( FUNC1*D (2^1) -FUNC2*D ( 1 r 1 ) ) /DET DO 5 1 = 1 f 2 DIF = ABS(BETA(I)-BOLD(r ) ) IF(DIF,GT. .000001) GO TO 11 CONTINUE RETURN IFCITER.GE. NITER) RETURN IF(IFLAG»EQ.2) GO TO 14 IF( IFLAG.EQ.l) GO TO 13 IFLAG=1 GO TO 10 DGLD ( 1 ) =&ETA ( 1 ) -BOLD ( 1 ) DOLD C 2 ) =BETA ( 2 ) -BOLD ( 2 ) IFLAG=2 GO TO 10 DIFF(1)=BETA(1)-B0LD(1) DIFF(2)=BETA<;2)-B0LD^2) IF(SIGN(DIFF(1) »D0LD(1) ) .NE.DIFF(l) ) GO TO 15 IF(SIGN(DIFF(2) f D0LD(2) ) .EQ.DIFF(2) ) GO TO 16 BETA( 1 )=
PAGE 188

BIBLIOGRAPHY Abelson, R. M. and Bradley, R. A. (1954) . A 2X2 factorial with paired comparisons. Biometrics 10 , 487-502. Beaver, R. J. (1977) . Weighted least squares response surface fitting in factorial paired comparisons. Commun. Statist. A6 , 1275-87. Box, G. E. P. and Lucas, H. L. (1959). Design of experiments in nonlinear situations. Biometrika 46 , 77-90. Bradley, R. A. (1953) . Some statistical methods in taste testing and quality evaluation. Biometrics 9 , 22-38. Bradley, R. A. (1955) . Rank analysis of incomplete block designs. III. Some large-sample results on estimation and power for a method of paired comparisons. Biometrika 42 , 450-70. Bradley, R. A. (1976). Science, statistics, and paired comparisons. Biometrics 32 , 213-32. Bradley, R. A. and Terry, M. E. (1952) . The rank analysis of incomplete block designs. I. The method of paired comparisons. Biometrika 39 , 324-45. David, H. A. (1963). The Method of Paired Comparisons . Griffin, London . Davidson, R. R. (1970) . On extending the Bradley-Terry model to accommodate ties in paired comparison experiments. J. Amer. Statist. Assoc. 65 , 317-28. Davidson, R. R. and Bradley, R. A. (1970) . Multivariate paired comparisons: Some large sample results on estimation and tests of equality of preference. In Nonparametric Techniques in Statistical Inference, (M. L. Puri, ed.) . Cambridge Univ. Press, 111-25. Dykstra, 0. (1956) . A note on the rank analysis of incomplete block designs applications beyond the scope of existing tables. Biometrics 12 , 301-6. Dykstra, 0. (1960). Rank analysis of incomplete block designs: a method of paired comparisons employing unequal repetitions on pairs. Biometrics 16 , 176-88. 178

PAGE 189

179 El-Helbawy, A. T. and Bradley, R. A. (1978) . Treatment contrasts in paired comparisons: Large-sample results, applications, and some optimal designs. J. Amer. Statist. Assoc. 73 , 831-9. Ford, L. R. , Jr. (1957). Solution of a ranking problem from binary comparisons. Amer. Math. Monthly 64 (8) , 28-3 3. Graybill, F. A. (1969). Introduction to Matrices with Applications in Statistics . Wadsworth, Belmont, Calif. ~~~~ Grizzle, J. E. , Starmer, C. F. and Koch, G. G. (1969). Analysis of categorical data by linear models. Biometrics 23 , 489-504. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables . (1964). U.S. Department of Commerce, National Bureau of Standards. Haynam, G. E., Govindarajulu, Z. and Leone, F. C. (1962). Tables of the Cumulative Non-Central Chi-Square Distribution . Case Institute of Technology Statistical Laboratory, Report 104, Cleveland, Ohio. Kaplan, W. (1952). Advanced Calculus . Addison-Wesley, Reading, Mass. Larmond, E. , Petrasovits, A. and Hill, P. (1968). Applications of multiple paired comparisons in studying the effect of aging and finish on beef tenderness. Canad. J. Anim. Sci. 49 , 51-8. Littell, R. C. and Boyett, J. M. (1977). Designs for RXC factorial paired comparison experiments. Biometrika 64 , 73-7. Quenouille, M. H. and John, J. A. (1971). Paired comparison designs for 2" factorials. Appl . Statist. 20 , 16-24. Rao, P. V. and Kupper , L. L. (1967). Ties in paired-comparison experiments: A generalization of the Bradley-Terry model. J. Amer. Statist. Assoc. 62 , 194-204, Corrigenda 63 , 1550. Scarborough, J. B. (1962) . Numerical Mathematical Analysis . The Johns Hopkins Press, Baltimore, Maryland. Scheffe, H. (1952). An analysis of variance for paired comparisons. J. Amer. Statist. Assoc. 47 , 381-400. Sen, P. K. (1979). Asymptotic properties of maximum likelihood estimators based on conditional specification. Ann. Statist. 7, 1019-33. Springall, A. (1973). Response surface fitting using a generalization of the Bradley-Terry paired comoarison model. Appl. Statist. 22, 59-68.

PAGE 190

180 Thompson, W. A. and Singh, J. (1967). The use of limit theorems in paired comparison model building. Psychometrika 32, 255-64. Thurstone, L. L. (1927). Psychophysical analysis. Amer. J. Psychol 38, 368-89. Wilks, S. S. (1962). Mathematical Statistics . John Wiley s Sons, New York .

PAGE 191

BIOGRAPHICAL SKETCH Walter William Offen was born in Harmony, Minnesota, on November 30, 1953, the only American-born child of German immigrants. Six months later his family moved to the Chicago area where he resided through high school. He attended Northwestern University and received the Bachelor of Arts degree in June, 197 5, with a major in mathematics and a minor in economics. He received the degree of Master of Statistics from the University of Florida in June, 1977. Since that time he has been pursuing the degree of Doctor of Philosophy. During his time in graduate school at the University of Florida, he worked for the Department of Statistics as an assistant in their Institute of Food and Agricultural Sciences consulting unit. He will begin working for Eli Lilly and Company in September, 1980, as a senior statistician. Walter Offen is married to the former Susan Elaine Hauser. They will reside in Noblesville, Indiana. 181

PAGE 192

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy. /,/^... — -^-. Ramon C. Littell, Chairman Associate Professor of Statistics I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy. Pejaver V. Rao Professor of Statistics I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy. X^^ -^ Alan G. Agresti Associate Professor of Statistics I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy. Richard F. Matthews Professor of Food Science and Human Nutrition

PAGE 193

This dissertation was submitted to the Graduate Faculty of the Department of Statistics in the College of Liberal Arts and Sciences and to the Graduate Council , and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy. August 1980 Dean, Graduate School