
Citation 
 Permanent Link:
 https://ufdc.ufl.edu/UF00098278/00001
Material Information
 Title:
 Design of paired comparison experiments with quantitative independent variables
 Creator:
 Offen, Walter William, 1953 ( Dissertant )
Littell, Ramon C. ( Thesis advisor )
Rao, Rejaver V. ( Reviewer )
Agresti, Alan G. ( Reviewer )
Matthews, Richard P. ( Reviewer )
 Place of Publication:
 Gainesville, Fla.
 Publisher:
 University of Florida
 Publication Date:
 1980
 Copyright Date:
 1980
 Language:
 English
 Physical Description:
 x, 181 leaves : ill. ; 28 cm.
Subjects
 Subjects / Keywords:
Design efficiency ( jstor )
Eigenvalues ( jstor )
Experiment design ( jstor )
Factorial design ( jstor )
Factorials ( jstor )
Matrices ( jstor )
Maximum likelihood estimations ( jstor )
Minimax ( jstor )
Null hypothesis ( jstor )
Parametric models ( jstor )
Dissertations, Academic -- Statistics -- UF
Paired comparisons (Statistics) ( lcsh )
Statistics thesis Ph. D
 Genre:
 bibliography ( marcgt )
nonfiction ( marcgt )
Notes
 Abstract:
 An experiment in which the treatments are compared and ranked
pairwise is called a paired comparison experiment. The Bradley-Terry
model for preference probabilities with ties not allowed is used in
this dissertation. This model defines treatment parameters, π_1, ..., π_t,
such that the probability that treatment T_i is preferred over
treatment T_j is equal to π_i/(π_i + π_j). When the treatments are levels
of continuous, quantitative variables, the logarithm of the Bradley-
Terry parameters can be modeled as a regression-type model,
ln π_i = Σ_k β_k x_ki. There have been a large number of papers on the topic
of paired comparisons, but only a few have considered experimental
design. Past papers on design are reviewed in Chapter 1. In 1973
Springall presented the asymptotic distribution of the maximum likelihood
estimators of β_1, ..., β_s, where s is the number of regression-type
parameters in the model. Chapter 2 of the present dissertation
contains a derivation of this asymptotic distribution. The remainder
of the dissertation is a consideration of designs for situations where
the treatments are levels of a single quantitative variable.
The asymptotic variance-covariance matrix is a function of the
true parameter values. Hence, the optimal designs also depend on the
parameters. In previous papers on design, the Bradley-Terry parameters
were taken to be equal when evaluating the variances and covariances.
The present dissertation considers the design problem for various
parameter values.
Notice that the general design problem is complex. It involves
deciding how many levels and which particular levels should be chosen.
Furthermore, it must be determined how often to compare each pair of
treatments. In some cases this complexity was resolved. In other
cases additional restrictions had to be imposed before optimal designs
could be found.
Chapter 3 considers designs which minimize the variance of a
single parameter estimate, β_i. The linear, quadratic, and cubic models
are considered. The optimal design in the linear case turns out to be
comparisons of only the smallest level in the experimental region with
the largest, for most values of the linear parameter. The designs for
the quadratic and cubic cases depend more heavily on the parameter
values than for the linear case.
Chapter 4 presents optimal designs for fitting a quadratic model.
The optimality criteria considered are D-optimality, the minimization
of the average variance of ln π_i, and the minimization of the maximum
variance of ln π_i. These designs also depend heavily on the parameters,
although some overall design recommendations are given in the last
section. The remaining chapters contain a brief discussion of related topics.
Chapter 5 is a consideration of designs which protect against the
bias present when the true model is cubic. Chapter 6 is a discussion
of designs for preliminary test estimators. Chapter 7 is a consideration
of two-stage sampling. The first stage is a design which results
in a small variance of the quadratic parameter estimate. A test of
hypothesis then determines whether the second-stage design should be
one which is optimal for fitting a linear or a quadratic model. A
discussion of the best error rate to use for the test of hypothesis is
included.
The appendices contain computer programs that find maximum likelihood
estimates of the Bradley-Terry parameters and β_1, ..., β_s. These
programs are written in the languages APL and Fortran.
 Thesis:
 Thesis (Ph. D.)--University of Florida, 1980.
 Bibliography:
 Includes bibliographic references (leaves 178-180).
 General Note:
 Typescript.
 General Note:
 Vita.
 Statement of Responsibility:
 by Walter William Offen.
Record Information
 Source Institution:
 University of Florida
 Holding Location:
 University of Florida
 Rights Management:
 Copyright Walter William Offen. Permission granted to the University of Florida to digitize, archive and distribute this item for nonprofit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
 Resource Identifier:
 023428712 ( AlephBibNum )
07188318 ( OCLC ) AAL5316 ( NOTIS )

Full Text 
DESIGN OF PAIRED COMPARISON EXPERIMENTS
WITH QUANTITATIVE INDEPENDENT VARIABLES
BY
WALTER WILLIAM OFFEN
A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF
THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1980
Digitized by the Internet Archive
in 2009 with funding from
University of Florida, George A. Smathers Libraries
http://www.archive.org/details/designofpairedco00offe
ACKNOWLEDGMENTS
I would like to thank Drs. Ramon C. Littell and William Mendenhall
for encouraging me to work past the Master of Statistics degree to pur
sue a Ph.D. I sincerely appreciate their concern and beneficial advice.
A special thanks goes to Dr. Ramon C. Littell, first of all for
the attention and help he gave to me in my early years at the University
of Florida while working for him at the Institute of Food and Agricul
tural Sciences. I especially thank him for the immense amount of time
and effort he contributed to this project. His patience and sincere
interest will always be remembered.
Thanks go to my family for their understanding and support during
the many years of my education. I am especially grateful to my wife,
Susie, for her love and caring, which makes it all worthwhile.
Finally, I would like to thank my typists, Susan and Walter Offen,
who both did an outstanding job of typing this manuscript. Also,
thanks go to Drs. John Cornell, Ramon Littell, Frank Martin, and
Kenneth Portier for allowing my typists to use their office type
writers.
TABLE OF CONTENTS

                                                              Page

ACKNOWLEDGMENTS                                                iii
LIST OF TABLES                                                  vi
LIST OF FIGURES                                                vii
ABSTRACT                                                      viii

CHAPTER

1  DESIGN OF PAIRED COMPARISON EXPERIMENTS: A LITERATURE
   REVIEW                                                        1
   1.1  Introduction                                             1
   1.2  Bradley-Terry Model                                      3
   1.3  Estimation of the Bradley-Terry Parameters and
        Corresponding Asymptotic Distribution                    6
   1.4  Designs of Factorial Paired Comparison Experiments      11
   1.5  Response Surface Designs                                22

2  ASYMPTOTIC THEORY OF THE MAXIMUM LIKELIHOOD ESTIMATORS       26
   2.1  Introduction                                            26
   2.2  Estimation                                              27
   2.3  Asymptotic Distribution                                 29

3  MINIMIZATION OF THE VARIANCE OF A SINGLE PARAMETER
   ESTIMATE                                                     34
   3.1  Introduction                                            34
   3.2  Linear Model                                            34
   3.3  Quadratic Model                                         38
        Minimization of var(β_2)                                38
        Equivalence of El-Helbawy and Bradley's method
        and minimizing the variance                             50
   3.4  Cubic Model                                             58

4  DESIGNS FOR FITTING A QUADRATIC MODEL                        65
   4.1  Introduction                                            65
   4.2  D-optimal Designs                                       68
        Condition 1
        Condition 2
        Condition 3
   4.3  Average-variance Designs
        Condition 1
        Condition 2
        Condition 3
   4.4  Minimax Designs
        Condition 1
        Condition 2
        Condition 3
   4.5  Conclusion and Design Recommendations                  128

5  DESIGNS THAT PROTECT AGAINST BIAS                           135
   Introduction                                                135
   All Bias Designs                                            136
   Integrated Mean Square Error Designs                        138

6  DESIGNS FOR PRELIMINARY TEST ESTIMATORS                     145
   6.1  Introduction                                           145
   6.2  Average-variance Designs for Preliminary Test
        Maximum Likelihood Estimators                          146

7  TWO-STAGE SAMPLING FOR MODEL FITTING                        150
   Introduction                                                150
   Optimal Error Rate                                          151
   Concluding Remarks                                          157

APPENDIX

A  COMPUTER PROGRAM THAT FINDS MAXIMUM LIKELIHOOD ESTIMATES
   OF π_1, ..., π_t                                            165
B  COMPUTER PROGRAMS THAT FIND MAXIMUM LIKELIHOOD ESTIMATES
   OF β_1, ..., β_s                                            170

BIBLIOGRAPHY                                                   178

BIOGRAPHICAL SKETCH                                            181
LIST OF TABLES

TABLE                                                         Page

1.1  Efficiencies of balanced 2 and 2 factorial paired
     comparison designs                                         18
3.1  Designs which minimize var(β_2)                            48
3.2  Designs which minimize var(β_3)                            64
4.1  D-optimal designs with restriction x = -1, 0, 1            90
4.2  D-optimal (local) designs with β_2 = 0                     94
4.3  D-optimal designs                                          95
4.4  Average-variance designs with restriction x = -1, 0, 1    105
4.5  Average-variance (local) designs with β_2 = 0             109
4.6  Average-variance designs                                  110
4.7  Minimax designs with restriction x = -1, 0, 1             119
4.8  Minimax (local) designs with β_2 = 0                      123
4.9  Minimax designs                                           124
4.10 Efficiency of designs                                     133
4.11 Bias of designs                                           134
5.1  J-criterion designs                                       141
5.2  Values of J                                               143
LIST OF FIGURES

FIGURE                                                        Page

1.1  Depiction of Designs I and II                              12
3.1  Designs which minimize var(β_2)                            49
4.1  Procedure summary                                          69
4.2  Depiction of possible designs for Condition 1              75
4.3  Locally optimal designs when β_2 = 0                      129
7.1  Minimization of the average mean square error --
     Uniform prior                                             161
7.2  Minimization of the average mean square error --
     Normal prior                                              162
7.3  D-optimality -- Uniform prior
7.4  D-optimality -- Normal prior
Abstract of Dissertation Presented to the Graduate Council
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
DESIGN OF PAIRED COMPARISON EXPERIMENTS
WITH QUANTITATIVE INDEPENDENT VARIABLES
By
Walter William Offen
August 1980
Chairman: Ramon C. Littell
Major Department: Statistics
An experiment in which the treatments are compared and ranked
pairwise is called a paired comparison experiment. The Bradley-Terry
model for preference probabilities with ties not allowed is used in
this dissertation. This model defines treatment parameters, π_1, ...,
π_t, such that the probability that treatment T_i is preferred over
treatment T_j is equal to π_i/(π_i + π_j). When the treatments are levels
of continuous, quantitative variables, the logarithm of the Bradley-
Terry parameters can be modeled as a regression-type model,
ln π_i = Σ_k β_k x_ki. There have been a large number of papers on the topic
of paired comparisons, but only a few have considered experimental
design. Past papers on design are reviewed in Chapter 1. In 1973
Springall presented the asymptotic distribution of the maximum likelihood
estimators of β_1, ..., β_s, where s is the number of regression-type
parameters in the model. Chapter 2 of the present dissertation
contains a derivation of this asymptotic distribution. The remainder
of the dissertation is a consideration of designs for situations where
the treatments are levels of a single quantitative variable.
The asymptotic variance-covariance matrix is a function of the
true parameter values. Hence, the optimal designs also depend on the
parameters. In previous papers on design, the Bradley-Terry parameters
were taken to be equal when evaluating the variances and covariances.
The present dissertation considers the design problem for various
parameter values.
Notice that the general design problem is complex. It involves
deciding how many levels and which particular levels should be chosen.
Furthermore, it must be determined how often to compare each pair of
treatments. In some cases this complexity was resolved. In other
cases additional restrictions had to be imposed before optimal designs
could be found.
Chapter 3 considers designs which minimize the variance of a
single parameter estimate, β_i. The linear, quadratic, and cubic models
are considered. The optimal design in the linear case turns out to be
comparisons of only the smallest level in the experimental region with
the largest, for most values of the linear parameter. The designs for
the quadratic and cubic cases depend more heavily on the parameter
values than for the linear case.
Chapter 4 presents optimal designs for fitting a quadratic model.
The optimality criteria considered are D-optimality, the minimization
of the average variance of ln π_i, and the minimization of the maximum
variance of ln π_i. These designs also depend heavily on the parameters,
although some overall design recommendations are given in the last
section.
The remaining chapters contain a brief discussion of related
topics. Chapter 5 is a consideration of designs which protect against the
bias present when the true model is cubic. Chapter 6 is a discussion
of designs for preliminary test estimators. Chapter 7 is a consideration
of two-stage sampling. The first stage is a design which results
in a small variance of the quadratic parameter estimate. A test of
hypothesis then determines whether the second stage design should be
one which is optimal for fitting a linear or a quadratic model. A
discussion of the best error rate to use for the test of hypothesis is
included.
The appendices contain computer programs that find maximum likelihood
estimates of the Bradley-Terry parameters and β_1, ..., β_s. These
programs are written in the languages APL and Fortran.
CHAPTER 1
DESIGN OF PAIRED COMPARISON EXPERIMENTS:
A LITERATURE REVIEW
1.1 Introduction
An experiment in which the treatments are compared and ranked
pairwise is called a paired comparison experiment. Such an experiment
requires a number of subjects which are usually individuals, but could
conceivably be a machine, for instance. Each subject is required to
compare two samples and decide which of the two is preferable for the
attribute under study. For example, each subject could be asked to
state which of two samples tastes better. Paired comparison experiments
are often conducted in areas such as food testing where it may be
difficult for an individual to quantify his like or dislike of a sample,
but he may be able to readily decide which of the two samples is
preferred.
Suppose that an experimenter wants to determine which of four
brands of coffee most consumers prefer. There are a number of experimental
procedures available. One possibility is that each individual
be required to taste and rank all four brands from least favorable to
most favorable. However, sensory fatigue may be a problem because
after tasting the second or third sample, it may be difficult to perceive
any difference between the subsequent sample and the first sample
tasted.
Another possible procedure is to have each individual assign to
each brand a score on some numerical scale. Sensory fatigue may again
be a problem with this experimental procedure since all four brands are
tasted by each individual. Additionally, there may be a degree of
arbitrariness to choosing a numerical score for each treatment.
A third possibility is to conduct a paired comparison experiment.
The problems mentioned in the two preceding paragraphs are not present
for paired comparison experiments. That is to say, paired comparison
experiments have a minimal amount of individual guesswork involved.
As pointed out in the first paragraph, the complete experiment involves
a number of individuals, each one choosing out of two samples the one
which is preferable. For the particular example with four brands of
coffee, 24 subjects might be utilized, thereby resulting in 4 replicates
of each of the 6 distinct pairs of treatments.
The treatments can have a factorial arrangement. However, factorial
paired comparison experiments have often been analyzed ignoring
the factorial arrangement of the treatments. For example, Larmond,
Petrasovits, and Hill (1968) conducted a 2x3 factorial paired comparison
experiment, but they did not test for interaction or main effects
as normally is done with factorial experiments. Although Abelson and
Bradley (1954) presented a theoretical development, only relatively
recently have tractable methods been available for testing interaction
and main effects in factorial paired comparison experiments.
In the case of quantitative treatments, an analogue of multiple
linear regression analysis is appropriate. For example, instead of
four distinctly different brands of coffee, there may only be one brand
of coffee with four different levels of an additive. The design of
paired comparison experiments for fitting response surfaces with one
independent variable is discussed in Chapters 3-7.
1.2 Bradley-Terry Model
Denote the t treatments as T_1, T_2, ..., T_t. Let n_ij ≥ 0 be the
number of times T_i and T_j are compared, i < j, i,j = 1, ..., t, and write
T_i -> T_j for "T_i is selected over T_j", where selection is based on the
particular attribute under study. A general model defines (t choose 2)
functionally independent parameters, 0 ≤ π_ij ≤ 1, such that
P(T_i -> T_j) = π_ij, π_ij + π_ji = 1, i≠j, i,j = 1, ..., t. (1.2.1)
Each distinct pair of treatment comparisons is a set of independent
Bernoulli trials. The entire experiment is a combination of the (t choose 2)
independent sets.
Bradley and Terry (1952) proposed a basic model for paired comparisons.
A treatment parameter π_i is associated with each treatment T_i,
i = 1, ..., t. The Bradley-Terry model is then defined to be
P(T_i -> T_j) = π_ij = π_i/(π_i + π_j), i≠j, i,j = 1, ..., t. (1.2.2)
The right-hand side of (1.2.2) is invariant under scale changes, so a
single constraint is imposed to make the treatment parameters unique.
Two popular choices are Σ_i π_i = 1, or Σ_i ln π_i = 0. This constraint is
replaced with another constraint in Chapters 2-7.
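The scale invariance of (1.2.2) can be checked directly. The following Python sketch (illustrative only; the dissertation's own programs are in APL and Fortran, and the parameter values here are hypothetical) computes the preference probabilities and confirms that rescaling the treatment parameters leaves every P(T_i -> T_j) unchanged, which is why a single constraint such as Σ_i π_i = 1 suffices to make them unique.

```python
# Bradley-Terry preference probabilities from (1.2.2), plus a check that
# rescaling the treatment parameters leaves every P(T_i -> T_j) unchanged.
# Parameter values are hypothetical illustrations.

def bt_preference(pi, i, j):
    """P(T_i -> T_j) = pi_i / (pi_i + pi_j) under the Bradley-Terry model."""
    return pi[i] / (pi[i] + pi[j])

pi = [4.0, 2.0, 1.0, 1.0]                 # hypothetical treatment parameters
pi_normed = [p / sum(pi) for p in pi]     # same parameters under the sum-to-1 constraint

for i in range(4):
    for j in range(4):
        if i != j:
            assert abs(bt_preference(pi, i, j) - bt_preference(pi_normed, i, j)) < 1e-12

print(bt_preference(pi, 0, 1))  # 4/(4+2) = 2/3
```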
Another model for paired comparisons was presented by Thurstone
(1927). He used the concept of a subjective continuum on which only
order can be perceived. Let μ_i represent the location point on the
continuum for treatment T_i, i = 1, ..., t. An individual receives a
sensation X_i in response to treatment T_i, with X_i distributed
Normal(μ_i, 1). The probability that T_i is preferred over T_j is then
P(T_i -> T_j) = P(X_i > X_j)
= integral from -(μ_i - μ_j) to ∞ of (2√π)^(-1) e^(-y²/4) dy,
i≠j, i,j = 1, ..., t. (1.2.3)
Bradley (1953) noted that replacement of the normal density function
in (1.2.3) by the logistic density function yields
P(T_i -> T_j) = integral from -(ln π_i - ln π_j) to ∞ of (1/4) sech²(y/2) dy
= π_i/(π_i + π_j), i≠j, i,j = 1, ..., t. (1.2.4)
This is the Bradley-Terry model with μ_i = ln π_i, i = 1, ..., t.
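The identity in (1.2.4) is easy to verify numerically, since the logistic density (1/4) sech²(y/2) integrates in closed form to the logistic distribution function 1/(1 + e^(-y)). A minimal Python check (illustrative, with hypothetical parameter values):

```python
import math

# Check of (1.2.4): integrating (1/4) sech^2(y/2) from -(ln pi_i - ln pi_j)
# to infinity equals pi_i/(pi_i + pi_j). The antiderivative of the logistic
# density is the logistic CDF F(y) = 1/(1 + exp(-y)).

def logistic_tail(lower):
    """Integral of (1/4) sech^2(y/2) dy from `lower` to infinity."""
    return 1.0 - 1.0 / (1.0 + math.exp(-lower))

pi_i, pi_j = 3.0, 1.5        # hypothetical Bradley-Terry parameters
lower = -(math.log(pi_i) - math.log(pi_j))
assert abs(logistic_tail(lower) - pi_i / (pi_i + pi_j)) < 1e-12
```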
Thompson and Singh (1967) gave a psychophysical consideration by
assuming a treatment T stimulates unobservable signals U_1, ..., U_m
which are then transmitted to the brain by m sense receptors. The U_i
are independent and identically distributed, and m is taken to be fixed
and large. The two variables considered for the response X to treatment
T are
(i) X is the average of the signals U_1, ..., U_m;
(ii) X is the maximum of the signals U_1, ..., U_m.
By the Central Limit Theorem, asymptotic normality results for X in
(i), which results in Thurstone's model. The random variable in (ii)
has one of three distributions depending on the distribution of the U_i.
If U_i, for example, is distributed exponentially, then the asymptotic
distribution of X is the extreme value distribution. In any case, however,
the Bradley-Terry model in (1.2.4) results from (ii).
Rao and Kupper (1967) generalized the Bradley-Terry model to allow
for ties. A tie parameter θ ≥ 1 is introduced, and the model becomes
P(T_i -> T_j) = integral from -(ln π_i - ln π_j) + η to ∞ of (1/4) sech²(y/2) dy
= π_i/(π_i + θπ_j),
P(T_i = T_j) = integral from -(ln π_i - ln π_j) - η to -(ln π_i - ln π_j) + η
of (1/4) sech²(y/2) dy (1.2.5)
= π_i π_j (θ² - 1) / [(π_i + θπ_j)(θπ_i + π_j)],
i≠j, i,j = 1, ..., t,
where η = ln θ is the threshold of perception for the judges. That is to
say, if it is hypothesized that the two samples generate a random variable
d which measures the difference between the samples, and furthermore
that |d| < η, then the judge will not be able to discern a difference
between the two samples, and will thus declare a tie. Note that
the Bradley-Terry model is a special case with θ = 1. Another model to
handle ties was proposed by Davidson (1970).
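The three probabilities in (1.2.5) should sum to one for any θ ≥ 1, and θ = 1 should recover the Bradley-Terry model. A short Python sketch checking this (parameter values are hypothetical illustrations):

```python
# Check of the Rao-Kupper model (1.2.5): the win, loss, and tie probabilities
# sum to 1, and theta = 1 (threshold eta = ln 1 = 0) gives no ties and
# reduces to the Bradley-Terry model. Parameter values are hypothetical.

def rao_kupper(pi_i, pi_j, theta):
    win_i = pi_i / (pi_i + theta * pi_j)
    win_j = pi_j / (pi_j + theta * pi_i)
    tie = pi_i * pi_j * (theta ** 2 - 1) / ((pi_i + theta * pi_j) * (theta * pi_i + pi_j))
    return win_i, win_j, tie

win_i, win_j, tie = rao_kupper(2.0, 1.0, theta=1.5)
assert abs(win_i + win_j + tie - 1.0) < 1e-12

w_i, w_j, t0 = rao_kupper(2.0, 1.0, theta=1.0)   # Bradley-Terry special case
assert t0 == 0.0 and abs(w_i - 2.0 / 3.0) < 1e-12
```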
The BradleyTerry model has received much attention in statistical
literature since it was introduced in 1952. For the remainder of this
dissertation, the BradleyTerry model with ties not allowed is used.
1.3 Estimation of the Bradley-Terry Parameters
and Corresponding Asymptotic Distribution
When the model was proposed by Bradley and Terry, it was assumed
that all of the n_ij were equal. Dykstra (1960) extended the method to
permit unequal n_ij.
The parameters are estimated by the maximum likelihood method and
likelihood ratio tests are developed. Assuming independence of the
paired comparisons, the complete likelihood function is
L = Π_{i=1}^t π_i^{a_i} / Π_{i<j} (π_i + π_j)^{n_ij}, (1.3.1)
where a_i is the total number of times T_i is selected over any other
treatment, i = 1, ..., t. The likelihood equations are
a_i/p_i - Σ_{j≠i} n_ij/(p_i + p_j) = 0, i = 1, ..., t,
(1.3.2)
Σ_{i=1}^t p_i = 1,
where p_i denotes the maximum likelihood estimate of π_i, i = 1, ..., t.
These equations must be solved iteratively (see Appendix A for an
APL computer program which finds maximum likelihood estimates of the
treatment parameters). A brief description of the iterative procedure
follows. Letting p_i^(k) be the kth approximation to p_i, then
p_i^(k) = p_i^*(k) / Σ_{i=1}^t p_i^*(k), (1.3.3)
where
p_i^*(k) = a_i / Σ_{j≠i} n_ij/(p_i^(k-1) + p_j^(k-1)), k = 1, 2, .... (1.3.4)
The iteration can be started with p_i^(0) = 1/t, i = 1, ..., t. Dykstra (1956)
suggested a method of obtaining good initial parameter estimates p_i^(0),
i = 1, ..., t.
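The iteration (1.3.3)-(1.3.4) can be sketched in a few lines of Python (Appendix A gives the dissertation's own APL program for the same computation; the win counts below are hypothetical):

```python
# Iterative maximum likelihood procedure (1.3.3)-(1.3.4) for the
# Bradley-Terry parameters. w[i][j] = number of times T_i is selected
# over T_j (hypothetical data).

def bt_mle(w, iterations=200):
    t = len(w)
    n = [[w[i][j] + w[j][i] for j in range(t)] for i in range(t)]  # n_ij comparisons
    a = [sum(w[i]) for i in range(t)]                              # a_i total wins
    p = [1.0 / t] * t                                              # start: p_i^(0) = 1/t
    for _ in range(iterations):
        star = [a[i] / sum(n[i][j] / (p[i] + p[j]) for j in range(t) if j != i)
                for i in range(t)]                                 # (1.3.4)
        total = sum(star)
        p = [s / total for s in star]                              # (1.3.3)
    return p

w = [[0, 6, 8], [4, 0, 7], [2, 3, 0]]   # 10 comparisons of each pair
p = bt_mle(w)
assert abs(sum(p) - 1.0) < 1e-10 and p[0] > p[1] > p[2]
```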
Ford (1957) proposed the model independently and proved that the
iterative procedure converges to a unique maximum of L if the following
assumption is true.
Assumption 1.1. For every possible partition of the treatments into
the two sets S_1 and S_2, some treatment in S_1 is preferred at least once
to some treatment in S_2.
If Assumption 1.1 does not hold, then a_i must be zero for at least one
i. If exactly one a_i is zero, say a_{i0}, there may still be a unique
solution with p_{i0} = 0. However, Ford proceeded to explain that if the
set S_2 which violates Assumption 1.1 contains two or more treatments,
then the corresponding estimates of the Bradley-Terry treatment
parameters must be zero, and consequently individual members of S_2 could
not be compared. This fact is also evident by noticing that in (1.3.4),
p_i^*(k) would be zero for at least two values of i, and so by (1.3.3),
p_i^(k) would also be zero for the same values of i. Then the next time
in the iterative procedure that (1.3.4) is executed, the denominator
in the summation would be zero for some i and j, and hence the iterative
procedure would not converge.
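Assumption 1.1 amounts to a connectivity condition on the data: draw a directed edge i -> j whenever T_i beats T_j at least once; the assumption holds exactly when this digraph is strongly connected. A minimal Python sketch of the check, on hypothetical counts (this reformulation as graph connectivity is a standard reading of Ford's condition, not a construction from the dissertation):

```python
# Checking Assumption 1.1 as strong connectivity of the preference digraph:
# an edge i -> j exists whenever w[i][j] > 0, i.e. T_i beat T_j at least once.

def satisfies_assumption_1_1(w):
    """True iff the digraph with edges i -> j (w[i][j] > 0) is strongly connected."""
    t = len(w)

    def reachable(start, forward):
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for j in range(t):
                wins = w[i][j] if forward else w[j][i]
                if wins > 0 and j not in seen:
                    seen.add(j)
                    stack.append(j)
        return seen

    # strongly connected iff node 0 reaches everyone and everyone reaches node 0
    return len(reachable(0, True)) == t and len(reachable(0, False)) == t

good = [[0, 6, 8], [4, 0, 7], [2, 3, 0]]   # every treatment wins and loses somewhere
bad = [[0, 5, 5], [0, 0, 5], [0, 0, 0]]    # T_3 never wins: a_3 = 0
assert satisfies_assumption_1_1(good)
assert not satisfies_assumption_1_1(bad)
```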
The asymptotic theory and tests of hypothesis require the following
additional assumption.
Assumption 1.2. For every partition of the treatments into two nonempty
subsets S_1 and S_2, there exists a treatment T_i in S_1 and a treatment
T_j in S_2 such that μ_ij > 0, where
μ_ij = lim_{N→∞} n_ij/N, i≠j, i,j = 1, ..., t,
N = Σ_{i<j} n_ij.
Bradley (1955) investigated asymptotic properties of the treatment
parameters with each n_ij = n. Davidson and Bradley (1970) considered a
multivariate model where each subject chooses one of two samples on more
than one attribute. As a special case of the multivariate model, Davidson
and Bradley obtained a more general result, allowing the n_ij to be
different. They showed that the asymptotic joint distribution of
√N(p_1 - π_1), ..., √N(p_t - π_t) is the singular multivariate normal distribution
of dimensionality (t-1) in a space of t dimensions, with zero mean
vector and variance-covariance matrix Σ = (σ_ij), where
σ_ij = (cofactor of λ_ij in B) / det(B),
B = | Λ    1 |
    | 1'   0 |, (1.3.5)
Λ = (λ_ij) is t×t, 1 is the t×1 vector of ones,
1' = (1, 1, ..., 1), the t-dimensional unit row vector, and
λ_ii = Σ_{j≠i} μ_ij π_j / [π_i (π_i + π_j)²], i = 1, ..., t,
λ_ij = -μ_ij / (π_i + π_j)², i≠j.
It is apparent that the variances and covariances in (1.3.5) depend on
the true treatment parameters. Hence estimates of the variances and
covariances are usually found by substituting the consistent maximum
likelihood estimates for π_1, ..., π_t into (1.3.5).
In the special case with π_i = 1/t, n_ij = n for all i,j,
σ_ii = 2(t-1)²/t³ and σ_ij = -2(t-1)/t³, i≠j, i,j = 1, ..., t. (1.3.6)
Use of (1.3.6) is adequate if π_1, ..., π_t are approximately equal.
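The special-case values in (1.3.6) can be recovered numerically from the cofactor formula (1.3.5). The following Python sketch does this for t = 4 with pure-Python determinants (illustrative only, no external libraries assumed):

```python
# Numerical check of (1.3.5)-(1.3.6) for t = 4, pi_i = 1/t, n_ij = n:
# the cofactor formula should give sigma_ii = 2(t-1)^2/t^3 and
# sigma_ij = -2(t-1)/t^3 for i != j.

def det(m):
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** k * m[0][k] * det([row[:k] + row[k + 1:] for row in m[1:]])
               for k in range(len(m)))

def cofactor(m, r, c):
    minor = [row[:c] + row[c + 1:] for k, row in enumerate(m) if k != r]
    return (-1) ** (r + c) * det(minor)

t = 4
pi = [1.0 / t] * t
mu = 2.0 / (t * (t - 1))       # mu_ij = n_ij/N with all n_ij equal

# Lambda matrix from (1.3.5)
lam = [[-mu / (pi[i] + pi[j]) ** 2 for j in range(t)] for i in range(t)]
for i in range(t):
    lam[i][i] = sum(mu * pi[j] / (pi[i] * (pi[i] + pi[j]) ** 2)
                    for j in range(t) if j != i)

# Bordered matrix B = [[Lambda, 1], [1', 0]]
B = [lam[i] + [1.0] for i in range(t)] + [[1.0] * t + [0.0]]

dB = det(B)
sigma_00 = cofactor(B, 0, 0) / dB
sigma_01 = cofactor(B, 0, 1) / dB
assert abs(sigma_00 - 2 * (t - 1) ** 2 / t ** 3) < 1e-10
assert abs(sigma_01 + 2 * (t - 1) / t ** 3) < 1e-10
```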
The major test proposed is a test of treatment equality,
H_0: π_1 = ... = π_t = 1/t,
versus
H_a: π_i ≠ π_j for some i,j, i≠j, i,j = 1, ..., t.
The likelihood ratio statistic is
-2 ln λ = 2N ln 2 - 2[Σ_{i<j} n_ij ln(p_i + p_j) - Σ_i a_i ln p_i].
Under the null hypothesis, H_0, -2 ln λ is asymptotically central chi-square
with (t-1) degrees of freedom. Bradley and Terry (1952) and
Bradley (1954) provided tables for t = 3, 4, 5, including exact significance
levels. Comparison of the chi-square significance levels with
the tabled exact significance levels suggests that the former may be
used even for relatively small values of n_ij. This is perhaps comparable
to using the normal approximation to the binomial.
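For t = 2 the maximum likelihood estimate has the closed form p_1 = a_1/N, and the statistic reduces to the familiar binomial G-test against one half, which gives a quick sanity check on the formula. A Python sketch (illustrative, hypothetical data):

```python
import math

# The likelihood ratio statistic for treatment equality,
#   -2 ln(lambda) = 2N ln 2 - 2[ sum_{i<j} n_ij ln(p_i + p_j) - sum_i a_i ln p_i ],
# evaluated at maximum likelihood estimates p_i.

def neg2_log_lambda(w, p):
    """w[i][j] = times T_i beat T_j; p = maximum likelihood estimates."""
    t = len(w)
    a = [sum(w[i]) for i in range(t)]
    N, stat = 0, 0.0
    for i in range(t):
        for j in range(i + 1, t):
            n_ij = w[i][j] + w[j][i]
            N += n_ij
            stat -= 2.0 * n_ij * math.log(p[i] + p[j])
    stat += 2.0 * N * math.log(2.0)
    stat += 2.0 * sum(a[i] * math.log(p[i]) for i in range(t))
    return stat  # compare to chi-square with t-1 degrees of freedom

# t = 2 check: closed-form MLE p_1 = a_1/N, statistic equals the binomial G-test
w = [[0, 15], [5, 0]]                     # T_1 preferred 15 of 20 times
stat = neg2_log_lambda(w, [0.75, 0.25])
g = 2 * (15 * math.log(15 / 10) + 5 * math.log(5 / 10))
assert abs(stat - g) < 1e-10
```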
The asymptotic distribution under the alternative hypothesis is
also available for -2 ln λ. Let
π_i = 1/t + δ_iN N^(-1/2), with Σ_{i=1}^t δ_iN = 0 and lim_{N→∞} δ_iN = δ_i.
Then -2 ln λ is asymptotically noncentral chi-square with (t-1) degrees
of freedom and noncentrality parameter
Δ = (t²/4) Σ_{i<j} μ_ij (δ_i - δ_j)².
1.4 Designs of Factorial Paired Comparison Experiments
Littell and Boyett (1977) compared two designs for RxC factorial
paired comparison experiments. The two designs considered are:
Design I: Consists of independent paired comparisons of each of the
(RC choose 2) pairs of treatments.
Design II: Consists of comparisons between pairs of treatment combinations
differing only in the levels of one factor, i.e. comparing
two treatments in the same row or column.
Note that there are (RC choose 2) = RC(RC-1)/2 distinct pairs of treatments
for Design I, but only R(C choose 2) + C(R choose 2) = RC(R+C-2)/2 distinct
pairs of treatments for Design II. In the 4x4 case, for example, Design I requires
120 distinct paired comparisons whereas Design II requires only 48
distinct paired comparisons. The two designs could entail the same
total number of paired comparisons, but Design II requires preparation
of fewer distinct pairs of samples.
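The pair counts above are easy to confirm by enumeration. A short Python sketch (illustrative only) lists the cells of an RxC factorial, takes all pairs for Design I, and keeps only same-row or same-column pairs for Design II:

```python
from itertools import combinations

# Distinct pairs for Designs I and II in an R x C factorial:
# Design I uses all C(RC, 2) = RC(RC-1)/2 pairs; Design II keeps only pairs
# sharing a row or a column: R*C(C,2) + C*C(R,2) = RC(R+C-2)/2 pairs.

def pair_counts(R, C):
    cells = [(i, j) for i in range(R) for j in range(C)]
    design_1 = list(combinations(cells, 2))
    design_2 = [p for p in design_1 if p[0][0] == p[1][0] or p[0][1] == p[1][1]]
    return len(design_1), len(design_2)

d1, d2 = pair_counts(4, 4)
assert (d1, d2) == (120, 48)                       # the 4x4 case from the text
assert d1 == 16 * 15 // 2 and d2 == 16 * (4 + 4 - 2) // 2
```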
Design II is also preferable from a computational point of view
because its analysis can be carried out by using a computer program
capable of handling the one-way classification with unequal replication
of pairs. A specialized factorial program is required for Design I to
obtain maximum likelihood estimates under the main effects hypothesis,
H_0: π_ij = α_i β_j, where π_ij is the Bradley-Terry treatment parameter
given by (1.4.1).
The two designs are illustrated for the 2x2 case in Figure 1.1.
The comparisons made with Design I are indicated by the solid arrows.
The comparisons made with Design II are indicated by the dotted arrows.
Figure 1.1. Depiction of Designs I and II (axes: Factor A, low to high;
Factor B, low to high)
Designs I and II were compared by Littell and Boyett on the basis
of the asymptotic relative efficiencies, in the Bahadur sense, of their
respective likelihood ratio tests. That is, test statistics are compared
on the basis of the ratio of the exponential rates of convergence
to zero of their observed significance levels. In general, the Bahadur
efficiency of test 1 relative to test 2 is defined to be
E = lim_{n→∞} [-(2/n) log L_n^(1)] / lim_{n→∞} [-(2/n) log L_n^(2)] = c_1(θ)/c_2(θ),
where L_n^(i) is the observed significance level of test i. The function
c_i(θ) is called the exact slope of test i.
Usually Bahadur efficiency is used to compare two test statistics
that are based on the same sample space. However, its use is equally
appropriate when the sample spaces are different. The two designs
should be compared on the basis of an equivalent number of comparisons
in each. Therefore, a "single observation" with Design I will be taken
as n_I = R+C-2 comparisons of each pair in the design, and a "single
observation" with Design II will be taken as n_II = RC-1 comparisons of
each pair in the design. This gives RC(RC-1)(R+C-2)/2 total comparisons
in each design.
Let T_ij represent the treatment with the first factor at level i
and the second factor at level j. The Bradley-Terry model is then
P(T_ij -> T_kl) = π_ij/(π_ij + π_kl), i,k = 1, ..., R, (1.4.1)
j,l = 1, ..., C.
The following tests of hypotheses were considered by Littell and
Boyett.
Test 1: H_0: π_ij = 1/RC, H_a: π_ij = α_i β_j.
This is a test for detecting two main effects versus no treatment differences.
Efficiencies were tabulated for the 2x2 case (Littell and
Boyett (1977), Table 1). The table showed that Design I is more efficient
than Design II for most combinations of α and β.
Test 2: H_0: π_ij = 1/RC, H_a: π_ij = β_j/R.
This is a test for detecting a single main effect versus no treatment
differences. For this test, Design I was shown to be uniformly more
efficient than Design II.
Test 3: H_0: π_ij = β_j/R, H_a: π_ij = α_i β_j.
This is a test for detecting two main effects versus a single main effect.
Efficiencies were again tabulated for the 2x2 case. The table
showed that Design I is more efficient than Design II for most combinations
of α and β.
Test 4: H_0: π_ij = α_i β_j, H_a: π_ij unrestricted.
This is a test for detecting interaction versus two main effects. The
results showed that Design II is uniformly superior to Design I for
detecting interaction.
Summarizing, when testing for main effects (Tests 1-3), Design I
is usually more efficient than Design II. When testing for interaction,
Design II is uniformly more efficient than Design I.
Littell and Boyett suggested that it might be a good idea for the
experiment to be conducted in two stages. The first stage is a number
of replicates of Design II followed by a test for interaction. If it
is significant, the second stage is completed by again using Design II,
and simple effects are analyzed by applying oneway analyses within rows
and columns. If interaction is not significant, then the second stage
consists of replicates of pairs in Design I that are not in Design II,
and main effects are analyzed.
The total number of comparisons to be made in the experiment can
be divided into the two parts such that after the experiment is completed,
the end result is either a complete Design I or Design II. For
example, suppose the treatments form a 2x2 factorial, and resources are
available for 24 comparisons. In the first stage, the experimenter runs
4 replicates of the four comparisons in Design II (see Figure 1.1).
If interaction is significant, two more replicates of the four comparisons
in Design II are run. If interaction is not significant, four replicates
of the two comparisons in Design I that are not in Design II are run.
The latter then results in the same comparisons made as if a single-stage
experiment using Design I had been conducted.
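The arithmetic of the 2x2 allocation can be sketched in a few lines of Python (illustrative only): stage 1 uses 4 replicates of the 4 within-row/within-column pairs, and the not-significant branch completes Design I with 4 replicates of the 2 remaining diagonal pairs.

```python
from itertools import combinations

# Two-stage allocation for the 2x2 example with 24 comparisons:
# stage 1 = 4 replicates of the 4 Design II pairs (16 comparisons);
# if interaction is not significant, stage 2 = 4 replicates of the 2
# diagonal pairs, completing 4 replicates of the full Design I.

cells = [(i, j) for i in range(2) for j in range(2)]
all_pairs = list(combinations(cells, 2))                        # Design I pairs
design_2 = [p for p in all_pairs if p[0][0] == p[1][0] or p[0][1] == p[1][1]]
diagonal = [p for p in all_pairs if p not in design_2]

stage_1 = 4 * len(design_2)      # 16 comparisons
stage_2 = 4 * len(diagonal)      # 8 comparisons
assert len(all_pairs) == 6 and len(design_2) == 4 and len(diagonal) == 2
assert stage_1 + stage_2 == 24
```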
Quenouille and John (1971) also considered designs for 2^n factorials
in paired comparisons. However, they assumed an individual is
asked to assign a number on a scale to signify the degree of difference
in the two samples instead of merely deciding which sample possesses
more of the particular attribute under study. This model was developed
by Scheffe (1952).
In particular, Quenouille and John discussed the following 2^3
factorial. The 28 possible paired comparisons were divided into 7 sets
of 4 blocks as follows:

    Set (1): (I,ABC) (A,BC)  (B,AC)   (C,AB)
    Set (2): (I,AB)  (A,B)   (C,ABC)  (AC,BC)
    Set (3): (I,AC)  (A,C)   (B,ABC)  (AB,BC)
    Set (4): (I,BC)  (A,ABC) (B,C)    (AB,AC)          (1.4.2)
    Set (5): (I,A)   (B,AB)  (C,AC)   (BC,ABC)
    Set (6): (I,B)   (A,AB)  (C,BC)   (AC,ABC)
    Set (7): (I,C)   (A,AC)  (B,BC)   (AB,ABC)
The letter "I" represents the treatment where all factors are at their
low levels. If a letter appears, then that factor is at its high level.
Note that each treatment appears once in each set. These sets are
constructed by first pairing treatment I with each of the remaining
treatments. These are called the initial blocks. The remaining blocks in
the sets are constructed by multiplying the initial block by any
treatment that has not yet appeared in that particular set, with the
usual rule that A^2 = B^2 = C^2 = I.
To illustrate this technique, we will construct set (1) in (1.4.2).
The initial block is (I,ABC). So far only I and ABC have appeared, so
we may choose any treatment other than these. Suppose we choose BC.
Then the next block in the set would be (BC,A). Note that order is
unimportant. Now we may choose any treatment other than I, ABC, BC, or
A. If we choose B, we get the third block in set (1). To construct the
fourth block, there are only two choices remaining, namely C or AB.
Either one results in the last block of set (1).
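The multiplication rule above (with A^2 = B^2 = C^2 = I) is easy to mechanize. A minimal Python sketch, with hypothetical helper names not taken from the dissertation, that reproduces the construction of set (1):

```python
def letters(t):
    """Treatment label -> set of high-level factors; 'I' is the identity."""
    return frozenset() if t == 'I' else frozenset(t)

def label(s):
    return ''.join(sorted(s)) or 'I'

def mult(t1, t2):
    # A^2 = B^2 = C^2 = I means repeated letters cancel: symmetric difference
    return label(letters(t1) ^ letters(t2))

def build_set(initial, treatments):
    """Grow a set from an initial block by multiplying it by treatments
    that have not yet appeared in the set."""
    blocks, used = [tuple(initial)], {initial[0], initial[1]}
    for t in treatments:
        if t not in used:
            blk = tuple(sorted((mult(t, initial[0]), mult(t, initial[1]))))
            blocks.append(blk)
            used.update(blk)
    return blocks

trts = ['I', 'A', 'B', 'C', 'AB', 'AC', 'BC', 'ABC']
set1 = build_set(('I', 'ABC'), trts)  # -> the four blocks of set (1) in (1.4.2)
```

Whichever unused treatment is chosen at each step, the same four blocks result, mirroring the observation above that either remaining choice completes the set.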
Each set lacks information on the effects that have an even number
of letters in common with the initial block. For example, the initial
block for set (1) is (I,ABC), so this set provides no information on
the effects AB, AC, and BC, i.e. on all first-order interactions.
Similarly, the initial block for set (5) in (1.4.2) is (I,A), so this
set provides no information on B, C, and BC.
This concept of no information on B, C, and BC from data in set
(5) may be viewed as follows. If the responses from comparisons in set
(5) are modeled as

    Y(I,A)    = (μ+a+e_A) − (μ+e_I) = a + ε(I,A),
    Y(B,AB)   = (μ+a+b+(ab)+e_AB) − (μ+b+e_B) = a + (ab) + ε(B,AB),
    Y(C,AC)   = (μ+a+c+(ac)+e_AC) − (μ+c+e_C) = a + (ac) + ε(C,AC),
    Y(BC,ABC) = (μ+a+b+c+(ab)+(ac)+(bc)+(abc)+e_ABC) − (μ+b+c+(bc)+e_BC)
              = a + (ab) + (ac) + (abc) + ε(BC,ABC),

where all of the e's are independent and have expected value zero, then
it is clear that b, c, and (bc), corresponding to the effects B, C, and
BC, are not estimable from the four comparisons in set (5).
Quenouille and John define efficiency as follows. Suppose the
design is made up of r sets, s (s ≤ r) of which give information on some
particular effect. Then s/r is the fraction of the comparisons made
that give information on that particular effect. On the other hand,
suppose that each set is run exactly once, i.e. each pair of treatments
is compared. It can be shown that there are a total of (2^n − 1) sets,
2^(n−1) of which give information on any particular effect. In this
case, 2^(n−1)/(2^n − 1) is the fraction of the comparisons made that
give information on any particular effect. The efficiency defined by
Quenouille and John is the ratio of these two fractions. In other words,
the efficiency is defined to be

    E = (s/r) / ( 2^(n−1)/(2^n − 1) ).

Table 1.1. Efficiencies of balanced 2^2 and 2^3 factorial paired
comparison designs

                Number of    Efficiency of interactions of order
  n   Design     blocks          0       1       2
  2   {2}           2          1.50    0.00      -
      {1}           4          0.75    1.50      -
  3   {3}           4          1.75    0.00    1.75
      {2}          12          1.17    1.17    0.00
      {1}          12          0.58    1.17    1.75
      {3,2}        16          1.31    0.88    0.44
      {3,1}        16          0.88    0.88    1.75
      {2,1}        24          0.88    1.17    0.88

Notation: {j}: A design constructed by combining all sets with initial
block (I, j-factor combination).
{j,k}: A design constructed by combining the designs {j} and {k}.
Source: Quenouille and John (1971, Table 1).
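The even-overlap rule and the efficiency ratio can be checked numerically. A small Python sketch (the function names are mine, not from the dissertation) that reproduces the 2^2 entries of Table 1.1:

```python
def gives_info(initial_high, effect):
    """A set with initial block (I, T) informs on an effect iff the
    effect shares an odd number of letters with T."""
    return len(set(initial_high) & set(effect)) % 2 == 1

def efficiency(initial_highs, effect, n):
    """Quenouille-John efficiency E = (s/r) / (2^(n-1)/(2^n - 1))."""
    r = len(initial_highs)
    s = sum(gives_info(t, effect) for t in initial_highs)
    return (s / r) / (2 ** (n - 1) / (2 ** n - 1))

# Design {1} in the 2^2 case: sets with initial blocks (I,A) and (I,B)
e_int = efficiency(['A', 'B'], 'AB', 2)   # interaction AB
e_main = efficiency(['A', 'B'], 'A', 2)   # main effect A
```

The computed values 1.50 and 0.75 agree with the Design {1} row of Table 1.1.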
Table 1.1 gives the efficiencies of some designs in the 2^2 and 2^3
cases. These designs are balanced in the sense that all main effects
are estimated with the same efficiency, all first-order interactions
are estimated with the same efficiency, and so on. The table indicates
that in the 2^2 case, Design {1} is efficient for estimating the
interaction effect. This result is the same as the result Littell and
Boyett obtained using Bahadur efficiency.
El-Helbawy and Bradley (1978) considered treatment contrasts in
paired comparison experiments. The theory developed is completely
general, but they specifically showed how optimal designs can be found
in the case of a 2^3 factorial. Their optimality criterion is the
maximization of the chi-square noncentrality parameter from the
likelihood ratio test, which thereby maximizes the asymptotic power of
the test.
Let a = (a_1,a_2,a_3), where a_i represents the level of factor i,
i = 1,2,3, and each a_i is 0 or 1. Then the treatment parameters are
defined by El-Helbawy and Bradley to be

    π_a = π^1_{a_1} π^2_{a_2} π^3_{a_3} π^{12}_{a_1 a_2} π^{13}_{a_1 a_3} π^{23}_{a_2 a_3} π^{123}_{a_1 a_2 a_3}   (1.4.3)
for all a. The factorial parameters are defined by taking logarithms
of the 8 treatment parameters given by (1.4.3). So from (1.4.3),

    γ_a = ln π_a
        = Σ_{i=1}^{3} ln π^i_{a_i} + Σ_{i<j} ln π^{ij}_{a_i a_j} + ln π^{123}_{a_1 a_2 a_3}
        = Σ_{i=1}^{3} γ^i_{a_i} + Σ_{i<j} γ^{ij}_{a_i a_j} + γ^{123}_{a_1 a_2 a_3}.   (1.4.4)

The last two expressions in (1.4.4) define the factorial parameters
γ^1_{a_1}, γ^2_{a_2}, γ^3_{a_3}, γ^{12}_{a_1 a_2}, γ^{13}_{a_1 a_3},
γ^{23}_{a_2 a_3}, γ^{123}_{a_1 a_2 a_3}. These parameters are
subject to the usual analysis of variance constraints, namely

    Σ_{a_i=0}^{1} γ^i_{a_i} = 0,  i = 1,2,3,

    Σ_{a_i=0}^{1} γ^{ij}_{a_i a_j} = Σ_{a_j=0}^{1} γ^{ij}_{a_i a_j} = 0,  i < j,   (1.4.5)

    Σ_{a_1=0}^{1} γ^{123}_{a_1 a_2 a_3} = Σ_{a_2=0}^{1} γ^{123}_{a_1 a_2 a_3} = Σ_{a_3=0}^{1} γ^{123}_{a_1 a_2 a_3} = 0.
The null hypothesis of no two-factor interaction between the first
two factors is now considered. This test may be made by testing

    H_o: γ^{12}_{00} = 0.

The constraints then require that γ^{12}_{a_1 a_2} = 0, a_1,a_2 = 0,1.
The null hypothesis is therefore equivalent to

    H_o: γ^{12}_{00} − γ^{12}_{01} − γ^{12}_{10} + γ^{12}_{11} = 0,

since by (1.4.5), γ^{12}_{00} − γ^{12}_{01} − γ^{12}_{10} + γ^{12}_{11} = 4γ^{12}_{00}.
Let B denote a matrix whose rows are orthonormal contrasts. If we let
γ' = (γ_{000}, γ_{001}, γ_{010}, γ_{011}, γ_{100}, γ_{101}, γ_{110}, γ_{111}),
then the null hypothesis is also equivalent to

    H_o: B_{n=1} γ = 0,

where B_{n=1} = (1/√8)(1, 1, −1, −1, −1, −1, 1, 1).
Let

    n_ij/N = a  if comparing T_i and T_j gives no information (in the
                sense discussed by Quenouille and John) on the two-
                factor interaction between the first two factors,
           = b  otherwise,

where n_ij is defined in Section 1.2, and N = Σ_{i<j} n_ij.
Then 12a + 16b = 1, since 16 of the 28 distinct pairs give information
on the two-factor interaction. El-Helbawy and Bradley then proceeded
to show that b should be taken as large as possible in order to maximize
the noncentrality parameter. This gives the result b = 1/16, a = 0. So
the optimal design is to compare the treatments that give information
on the two-factor interaction between the first two factors. These
pairs of treatments would be the sets (3), (4), (5), and (6) in (1.4.2).
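The count "16 of the 28 distinct pairs" can be verified with a per-pair form of Quenouille and John's rule (an assumption consistent with the block structure above: a pair informs on an effect when the product of its two treatments shares an odd number of letters with the effect). A short Python check:

```python
from itertools import combinations

TRTS = ['I', 'A', 'B', 'C', 'AB', 'AC', 'BC', 'ABC']

def letters(t):
    return frozenset() if t == 'I' else frozenset(t)

def informative(t1, t2, effect):
    # product t1*t2 under A^2 = B^2 = C^2 = I is the symmetric difference
    return len((letters(t1) ^ letters(t2)) & letters(effect)) % 2 == 1

pairs = list(combinations(TRTS, 2))
n_info = sum(informative(t1, t2, 'AB') for t1, t2 in pairs)
# n_info pairs inform on the AB interaction, the rest do not,
# giving the constraint 12a + 16b = 1 on the design weights
```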
This result is the same as Quenouille and John would obtain if
they wanted to find an optimal design to estimate AB interaction, since
using these 4 sets maximizes the efficiency of the AB interaction using
their definition of efficiency. It is also analogous to the result
Littell and Boyett obtained because the treatments compared in either
case differ either in the level of A, or the level of B, but not both.
El-Helbawy and Bradley noted that the result discussed in the two
preceding paragraphs still holds if other specified factorial effects
are assumed to be null (e.g. three-factor interaction). While the
statistics for the two tests may differ, their limiting distributions
are the same under either H_o or H_a.
El-Helbawy and Bradley also considered simultaneously testing for
the presence of AB, AC, and ABC interactions. In this case the null
hypothesis is

    H_o: B_{n=3} γ = 0,

where

    B_{n=3} = (1/√8) | 1  1 −1 −1 −1 −1  1  1 |
                     | 1 −1  1 −1 −1  1 −1  1 |
                     | 1 −1 −1  1 −1  1  1 −1 |.

By the same method they concluded that the optimal design is to make
the following comparisons (using Quenouille and John's notation):
(I,A), (B,AB), (C,AC), and (BC,ABC). Note that this is set (5) in
(1.4.2). Note also that this is the only set that contains information
on all three of the effects AB, AC, and ABC. So this design is also
optimal using Quenouille and John's definition of efficiency.
Beaver (1977) considered the analysis of factorial paired comparison
experiments using the weighted least squares approach of Grizzle,
Starmer, and Koch (1969). He considered a log-linear transformation on
the π_i's. This approach produces noniterative estimates of the π_i's.
Also, the usual tests for a factorial experiment are easily implemented.
However, Beaver gave no consideration to experimental design.
1.5 Response Surface Designs
Springall (1973) considered response surface fitting using the
Bradley-Terry model with ties allowed as generalized by Rao and Kupper
(1967). The model is given in (1.2.5).

The logarithm of the treatment parameters is defined to be a
function of continuous, independent variables, x_1, x_2, ..., x_s,
measurable without error, i.e.
    ln π_i = Σ_{k=1}^{s} β_k x_ki,  i = 1,...,t.   (1.5.1)

Note that some of the variables can be functions of other variables.
For example, s = 2 and x_2i = x_1i^2 results in the parameterization
ln π_i = β_1 x_1i + β_2 x_1i^2. The parameters β_1, ..., β_s are
estimated by the maximum likelihood method. Appendix B has an APL
program and a Fortran program which produce maximum likelihood estimates
of β_1, ..., β_s, but which assume there are no ties.
Springall presented the following definition, theorem, and example.

Definition 1.1. Analogue designs are defined to be designs in which
the elements of the paired comparison variance-covariance matrix are
proportional to the elements of any classical response surface
variance-covariance matrix with the same design points and an equal
number of replicates at each design point.

The advantage of such designs is that they enable certain desirable
properties found in classical response surface designs (e.g.
rotatability, orthogonality), which are dependent on the relative sizes
of the elements of the variance-covariance matrix, to be readily
produced.
For example, consider the classical regression model y = β*_0 +
β*_1 x + β*_2 x^2. Then an experiment can be designed in such a way
that the variance-covariance matrix of the least squares estimators
β̂*_1 and β̂*_2 possesses an optimal form. For instance, the classical
design with an equal number of replicates at each of the four levels
x = −2, −1, 1, 2 can be shown to be rotatable, and so the analogue
paired comparison designs discussed in Example 1.1 below have the same
property, i.e. var(ln π̂_x) = var(ln π̂_{−x}).

The analogous paired comparison model is ln π_x = β_1 x + β_2 x^2.
An analogue paired comparison design has a corresponding variance-
covariance matrix of the maximum likelihood estimators β̂_1, β̂_2, such
that each element is a constant times the corresponding element of the
least squares variance-covariance matrix. For the particular example of
a quadratic model currently under discussion, var(β̂_1) = c·var(β̂*_1),
var(β̂_2) = c·var(β̂*_2), and cov(β̂_1,β̂_2) = c·cov(β̂*_1,β̂*_2), for
some c.
Springall described a method which uses linear programming
techniques to produce an analogue design with minimum elements of the
variance-covariance matrix. The following theorem shows how one can
produce an analogue design, but it does not necessarily have minimum
elements of the variance-covariance matrix.

Theorem 1.1. If a classical response surface design has certain
properties dependent on the relative sizes of the elements of the
variance-covariance matrix, then the analogue paired comparison design
will have the same properties if

    n_ij = N ( π̃_ij Σ_{k<l} π̃_kl^{−1} )^{−1},

where π̃_ij = π_i π_j/(π_i + π_j)^2.
Example 1.1. Consider the quadratic model ln π_i = β_1 x_i + β_2 x_i^2,
with design coordinates x = −2, −1, 1, 2. Assume θ = 1 and β_1 = β_2 = 0,
and set N = 150. Springall showed that an analogue design is given by
n_ij = 150/6 = 25, i < j.

An alternative design may be obtained by using linear programming.
Let Σ denote the variance-covariance matrix of the maximum likelihood
estimates β̂_1 and β̂_2. By simultaneously maximizing the elements of
Σ^{−1} subject to the constraint N = 150, a design which minimizes the
elements of Σ is produced. Springall did this and arrived at the design
{n_12=0, n_13=71, n_14=9, n_23=0, n_24=70, n_34=0}, after integerization
of the n_ij.
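Theorem 1.1's allocation is easy to compute directly. A Python sketch (the function name is mine) reproducing the equal-parameter analogue design n_ij = 25 of Example 1.1:

```python
def analogue_design(pi, N):
    """n_ij = N * ( pitilde_ij * sum_{k<l} 1/pitilde_kl )^(-1), where
    pitilde_ij = pi_i*pi_j/(pi_i + pi_j)^2, as in Theorem 1.1."""
    t = len(pi)
    pairs = [(i, j) for i in range(t) for j in range(i + 1, t)]
    ptil = {(i, j): pi[i] * pi[j] / (pi[i] + pi[j]) ** 2 for i, j in pairs}
    total = sum(1.0 / v for v in ptil.values())
    return {p: N / (ptil[p] * total) for p in pairs}

# beta_1 = beta_2 = 0 makes all four treatment parameters equal
design = analogue_design([1.0, 1.0, 1.0, 1.0], 150)
```

With equal treatment parameters every π̃_ij is 1/4, so each of the six pairs receives 150/6 = 25 comparisons.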
The asymptotic distribution of the maximum likelihood estimates
of θ, β_1, ..., β_s was presented by Springall, but he presented few
details. Therefore, the maximum likelihood estimates of the parameters
β_1, ..., β_s and their asymptotic joint distribution are developed in
Chapter 2. The tie parameter, θ, is not considered in the remainder
of this dissertation.
CHAPTER 2
ASYMPTOTIC THEORY OF THE
MAXIMUM LIKELIHOOD ESTIMATORS
2.1 Introduction
This chapter contains the asymptotic theory for the maximum
likelihood estimates of the parameters given by (1.5.1), where the logarithms
of the treatment parameters are expressed as a polynomial function of a
single quantitative variable, x. An example of an experiment which had
a single quantitative variable was given in Section 1.1. The treatments
in that example were different levels of an additive for coffee. For
an experiment with quantitative independent variables, it is appropriate
to do a paired comparison analogue to classical regression analysis.
The theory developed in Sections 2.2 and 2.3 assumes the general
case of s independent variables x_1, ..., x_s. The parameterization of
the Bradley-Terry parameters, given in (1.5.1), is reproduced here:

    ln π_i = Σ_{k=1}^{s} β_k x_ki,  i = 1,...,t.

The Bradley-Terry model with ties not allowed is used, where the π_i are
defined by (1.2.2).
In order for all of the parameters β_1, ..., β_s to be estimable, it
is required that s ≤ (t−1). This is clearly necessary since the Bradley-
Terry parameters have dimension (t−1). Note also that an intercept
parameter, β_0, is not estimable without imposing a constraint on the
Bradley-Terry parameters. This becomes apparent when examining the
preference probabilities. For example, suppose the model was

    ln π_i = β_0 + Σ_{k=1}^{s} β_k x_ki,  i = 1,...,t.

Then

    P(T_i → T_j) = π_i/(π_i + π_j)

                 = exp(β_0 + Σ_{k=1}^{s} β_k x_ki) / [ exp(β_0 + Σ_{k=1}^{s} β_k x_ki) + exp(β_0 + Σ_{k=1}^{s} β_k x_kj) ]

                 = exp(Σ_{k=1}^{s} β_k x_ki) / [ exp(Σ_{k=1}^{s} β_k x_ki) + exp(Σ_{k=1}^{s} β_k x_kj) ],  i ≠ j.

The parameter β_0 falls out, implying that any choice of β_0 results in
the same preference probabilities. Choosing β_0 = 0 replaces the
imposition of a constraint on the treatment parameters.
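The cancellation of β_0 can also be seen numerically; a quick Python check with illustrative values:

```python
import math

def pref_prob(xi, xj, beta, beta0=0.0):
    """P(T_i preferred over T_j) under ln pi_i = beta0 + sum_k beta_k*x_ki."""
    li = beta0 + sum(b * x for b, x in zip(beta, xi))
    lj = beta0 + sum(b * x for b, x in zip(beta, xj))
    return math.exp(li) / (math.exp(li) + math.exp(lj))

p0 = pref_prob([1.0, 1.0], [0.5, 0.25], beta=[0.8, -0.3], beta0=0.0)
p5 = pref_prob([1.0, 1.0], [0.5, 0.25], beta=[0.8, -0.3], beta0=5.0)
# any choice of beta0 gives the same preference probability
```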
2.2 Estimation

As was previously mentioned, the parameters β_1, ..., β_s are
estimated by the maximum likelihood method. From (1.3.1), the log-
likelihood equation is

    ln L = Σ_{i=1}^{t} a_i ln π_i − Σ_{i<j} n_ij ln(π_i + π_j)

         = Σ_{i=1}^{t} a_i Σ_{k=1}^{s} β_k x_ki
           − Σ_{i<j} n_ij ln[ exp(Σ_{k=1}^{s} β_k x_ki) + exp(Σ_{k=1}^{s} β_k x_kj) ].   (2.2.1)
The maximum likelihood estimators are found by taking partial
derivatives of the log-likelihood equation, setting them equal to zero,
and solving for the parameters. The likelihood equations are

    ∂ln L/∂β_r = Σ_{i=1}^{t} a_i x_ri
                 − Σ_{i<j} n_ij [ x_ri exp(Σ_{k=1}^{s} β_k x_ki) + x_rj exp(Σ_{k=1}^{s} β_k x_kj) ]
                             / [ exp(Σ_{k=1}^{s} β_k x_ki) + exp(Σ_{k=1}^{s} β_k x_kj) ]

               = Σ_{i≠j} m_ij x_ri − Σ_{i<j} n_ij ( x_ri π_i + x_rj π_j )/( π_i + π_j ) = 0,
                 r = 1,...,s,   (2.2.2)

where m_ij is the total number of times T_i is preferred over T_j.
These equations must be solved iteratively. Appendix B contains
an APL program and a Fortran program which find the estimates of
β_1, ..., β_s. The procedure described in the appendix converged for
most situations. However, occasionally when an attempt was made to fit
a full model (s = t−1), the procedure did not converge, which remains
unexplained. In this situation one can resort to the program found in
Appendix A, which finds estimates of the Bradley-Terry parameters, and
then solve (t−1) equations in (t−1) unknowns to find the estimates of
β_1, ..., β_s.
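As a sanity check on (2.2.2), the analytic score can be compared with a numerical derivative of (2.2.1). A Python sketch with made-up toy data (the x, n, and m values below are illustrative, not from the dissertation); the score is written in the equivalent pairwise form derived later in (2.3.7):

```python
import math

def loglik(beta, x, n, m):
    """ln L from (2.2.1); x[k][i] = x_ki, while n and m map pairs (i,j),
    i<j, to n_ij and m_ij (wins of T_i over T_j); m_ji = n_ij - m_ij."""
    t = len(x[0])
    eta = [sum(beta[k] * x[k][i] for k in range(len(beta))) for i in range(t)]
    ll = 0.0
    for (i, j), nij in n.items():
        ll += m[(i, j)] * eta[i] + (nij - m[(i, j)]) * eta[j]
        ll -= nij * math.log(math.exp(eta[i]) + math.exp(eta[j]))
    return ll

def score(beta, x, n, m, r):
    """dln L/dbeta_r as sum_{i<j} (x_ri - x_rj)(m_ij - n_ij * p_ij)."""
    t = len(x[0])
    eta = [sum(beta[k] * x[k][i] for k in range(len(beta))) for i in range(t)]
    g = 0.0
    for (i, j), nij in n.items():
        pij = math.exp(eta[i]) / (math.exp(eta[i]) + math.exp(eta[j]))
        g += (x[r][i] - x[r][j]) * (m[(i, j)] - nij * pij)
    return g

x = [[-1.0, 0.0, 1.0], [1.0, 0.0, 1.0]]   # linear and quadratic terms
n = {(0, 1): 10, (0, 2): 10, (1, 2): 10}
m = {(0, 1): 4, (0, 2): 3, (1, 2): 6}
beta, h = [0.3, -0.2], 1e-6
num0 = (loglik([beta[0] + h, beta[1]], x, n, m)
        - loglik([beta[0] - h, beta[1]], x, n, m)) / (2 * h)
```

The central-difference approximation num0 agrees with score(beta, x, n, m, 0) to the accuracy of the step size.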
2.3 Asymptotic Distribution
First of all, some additional notation and elementary theory is
presented. This will become useful when the asymptotic joint
distribution of β̂_1, ..., β̂_s is derived.

Let X_ijk be a random variable which is equal to 1 if T_i is selected
over T_j for the kth comparison of T_i and T_j, and 0 otherwise. Then
X_ijk has the Bernoulli distribution with probability of "success" equal
to π_i/(π_i + π_j), i,j = 1,...,t, k = 1,...,n_ij, and the Bernoulli
random variables are independent of each other. This implies that m_ij
has the binomial distribution, with sample size n_ij and "success"
probability equal to π_i/(π_i + π_j), where

    m_ij = Σ_{k=1}^{n_ij} X_ijk,  i ≠ j.

Furthermore, the m_ij's are also independent of each other, and from the
properties of the binomial distribution we have

    E m_ij = n_ij π_i/(π_i + π_j),  i ≠ j,   (2.3.1)

    var(m_ij) = n_ij π_i π_j/(π_i + π_j)^2,  i ≠ j,   (2.3.2)

    cov(m_ij, m_kl) = 0,  (i,j) ≠ (k,l),  i,j,k,l = 1,...,t.   (2.3.3)

From (2.3.1), (2.3.2), and (2.3.3), it follows that

    E(m_ij m_kl) = (E m_ij)(E m_kl) = [n_ij π_i/(π_i + π_j)][n_kl π_k/(π_k + π_l)],
                   (i,j) ≠ (k,l),   (2.3.4)

    E m_ij^2 = var(m_ij) + (E m_ij)^2 = ( n_ij π_i π_j + n_ij^2 π_i^2 )/(π_i + π_j)^2.   (2.3.5)
Under certain regularity conditions verified by Springall (1973),
it follows from a theorem found in Wilks (1962, page 380) that
β̂' = (β̂_1, ..., β̂_s) is asymptotically multivariate normal,
with mean vector β and variance-covariance matrix (λ_ab)^{−1}, where

    λ_ab = E[ (∂ln L/∂β_a)(∂ln L/∂β_b) ],  a,b = 1,...,s.
We now proceed to derive the asymptotic variance-covariance matrix
by deriving λ_ab for general a and b. From (2.2.2) we have

    ∂ln L/∂β_a = Σ_{i≠j} m_ij x_ai − Σ_{i<j} n_ij ( x_ai π_i + x_aj π_j )/( π_i + π_j )   (2.3.6)

               = Σ_{i<j} ( x_ai − x_aj )( m_ij − n_ij π_i/(π_i + π_j) ),  a = 1,...,s.   (2.3.7)

Then from (2.3.4)-(2.3.7), we find λ_ab to be

    λ_ab = E[ Σ_{i<j} Σ_{k<l} ( x_ai − x_aj )( x_bk − x_bl )
              ( m_ij − n_ij π_i/(π_i + π_j) )( m_kl − n_kl π_k/(π_k + π_l) ) ]

         = Σ_{i<j} n_ij π_i π_j/(π_i + π_j)^2 ( x_ai − x_aj )( x_bi − x_bj ),
           a,b = 1,...,s,   (2.3.8)

since the cross-product terms with (i,j) ≠ (k,l) have expectation zero by
(2.3.1) and (2.3.3), and each squared term has expectation
var(m_ij) = n_ij π_i π_j/(π_i + π_j)^2.
As previously mentioned, the remaining chapters deal with finding
optimal designs when the logarithm of the treatment parameters is ex
pressed as a polynomial equation in a single variable. The various
optimality criteria considered include
(i) Minimization of the variance of the maximum likelihood estimator of one parameter,
(ii) D-optimality,
(iii) Minimization of the average variance of the predicted "response", ln π̂_x,
(iv) Minimization of the maximum variance of ln π̂_x (minimax),
(v) Minimization of the bias,
(vi) Minimization of the average mean square error of ln π̂_x when bias is present (J-criterion),
(vii) D-optimality for a preliminary test estimator.
These are all defined in subsequent chapters.
For the case of a single quantitative independent variable, the
model given in (1.5.1) becomes

    ln π_i = Σ_{k=1}^{s} β_k x_i^k,  i = 1,...,t.   (2.3.9)

Similarly, the variance-covariance matrix given in (2.3.8) becomes
(λ_ab)^{−1}, where

    λ_ab = Σ_{i<j} n_ij π_i π_j/(π_i + π_j)^2 ( x_i^a − x_j^a )( x_i^b − x_j^b ),
           a,b = 1,...,s.   (2.3.10)
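Formula (2.3.10) is straightforward to evaluate for a candidate design. A minimal Python sketch (the function name is mine):

```python
import math

def info_matrix(beta, levels, n):
    """lam_ab from (2.3.10) for the model ln pi_i = sum_k beta_k * x_i^k;
    n maps index pairs (i,j), i<j, to comparison counts n_ij."""
    s = len(beta)
    pi = [math.exp(sum(beta[k] * xv ** (k + 1) for k in range(s)))
          for xv in levels]
    lam = [[0.0] * s for _ in range(s)]
    for (i, j), nij in n.items():
        w = nij * pi[i] * pi[j] / (pi[i] + pi[j]) ** 2
        for a in range(s):
            for b in range(s):
                lam[a][b] += (w * (levels[i] ** (a + 1) - levels[j] ** (a + 1))
                                * (levels[i] ** (b + 1) - levels[j] ** (b + 1)))
    return lam

# Linear model with beta = 0: comparing the pair (-1, 1) 25 times gives
# lam = 25 * (x_i - x_j)^2 / 4 = 25
lam = info_matrix([0.0], [-1.0, 1.0], {(0, 1): 25})
```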
Notice that the complete design problem is quite complex. It
involves deciding how many levels and additionally which particular
levels should be chosen. Furthermore, it must be determined how often
to compare each pair of treatments.

A further complication is apparent by noticing that the variance-
covariance matrix, whose elements are given in (2.3.10), depends on the
true parameter values. Hence the optimal designs also depend on the
parameters. The designs discussed by El-Helbawy and Bradley and by
Springall were found under the assumption that all Bradley-Terry
treatment parameters are equal, or equivalently that β_1 = β_2 = ... =
β_s = 0. In no paper of which the author is aware was the dependency of
the optimal design on the true parameter values examined. As will become
apparent in the following chapters, the optimal design for one set of
parameter values can be quite different from the optimal design for
another set of parameter values.
CHAPTER 3
MINIMIZATION OF THE VARIANCE
OF A SINGLE PARAMETER ESTIMATE
3.1 Introduction
The present chapter contains a discussion of the linear, quadratic,
and cubic models. The optimality criterion considered is the
minimization of var(β̂_i), where i = 1, 2, and 3 for the linear,
quadratic, and cubic models, respectively. The linear model is discussed
in Section 3.2. Designs for the quadratic model are found in Section 3.3
for β_2 = 0, β_1 variable. The general expression for var(β̂_2), given
by (3.3.6), can be shown to be a continuous function of β_2, and hence
the designs in Section 3.3 are locally optimal at the point β_2 = 0.
Likewise, designs for the cubic model are found in Section 3.4 for
β_3 = 0, β_1 and β_2 variable, and are similarly locally optimal.
3.2 Linear Model
From (2.3.9), the linear model is

    ln π_i = βx_i,  i = 1,...,t.   (3.2.1)

Since there is only one parameter, by (2.3.10) the variance-covariance
matrix becomes a simple variance, λ^{−1}, where

    λ = Σ_{i<j} n_ij e^{β(x_i + x_j)} (x_i − x_j)^2 / ( e^{βx_i} + e^{βx_j} )^2

      = Σ_{i<j} n_ij (x_i − x_j)^2 / ( e^{β(x_i − x_j)/2} + e^{−β(x_i − x_j)/2} )^2

      = Σ_{i<j} n_ij d_ij^2 / ( e^{βd_ij/2} + e^{−βd_ij/2} )^2,   (3.2.2)

where d_ij = x_i − x_j. For the remainder of this dissertation, it is
assumed without loss of generality that the independent variable, x, is
scaled such that it is in the interval [−1,1].
Although it is not necessary to treat the two cases β = 0 and β ≠ 0
separately, the special case β = 0 is considered first because the
argument used to derive the optimal designs is easier to follow for this
situation. When β = 0, (3.2.2) reduces to

    λ = Σ_{i<j} n_ij (x_i − x_j)^2 / 4.   (3.2.3)

The expression in (3.2.3) is a maximum when all the "weight" of the n_ij
is placed onto pairs which maximize (x_i − x_j)^2. Clearly this implies
that the design which minimizes var(β̂) is a comparison of only the pair
(−1,1).
The designs which maximize λ when β ≠ 0 are found using an argument
identical to the one used for the case β = 0. From (3.2.2), all of the
"weight" of the n_ij should be placed onto pairs which maximize

    y = d^2 ( e^{βd/2} + e^{−βd/2} )^{−2},   (3.2.4)

where d is then the optimal difference between the levels in the pairs.
The value of d which maximizes y in (3.2.4) is the solution to

    ∂y/∂d = 2d( e^{βd/2} + e^{−βd/2} )^{−2}
            − βd^2 ( e^{βd/2} − e^{−βd/2} )( e^{βd/2} + e^{−βd/2} )^{−3} = 0.   (3.2.5)

The equality in (3.2.5) implies

    coth(βd/2) = βd/2.   (3.2.6)

The solution to coth(u) = u is a transcendental number, and can be
shown to be approximately u = 1.1997. Hence, from (3.2.6) and the fact
that |d| must be less than or equal to 2, the design which minimizes
var(β̂) for a given value of β is

    |d| = |x_i − x_j| = min(2, 2.3994/|β|).   (3.2.7)
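The transcendental root in (3.2.6) can be found by bisection, since coth(u) − u is strictly decreasing for u > 0. A Python sketch of (3.2.7):

```python
import math

def coth_root():
    """Solve coth(u) = u; coth(u) - u is decreasing with a sign change
    between u = 1 and u = 2, so bisection converges to ~1.1997."""
    lo, hi = 1.0, 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if 1.0 / math.tanh(mid) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def d_opt(beta):
    """Optimal spacing |d| = min(2, 2u/|beta|) from (3.2.7)."""
    return 2.0 if beta == 0 else min(2.0, 2.0 * coth_root() / abs(beta))
```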
It is now shown that the value of |d| given in (3.2.7) is in fact
a maximum for y, and consequently for λ. By (3.2.5), the second partial
derivative of y with respect to d satisfies

    (∂^2 y/∂d^2)( e^{βd/2} + e^{−βd/2} )^2
        = 2 − 4βd tanh(βd/2) − β^2 d^2/2 + (3/2)β^2 d^2 tanh^2(βd/2).   (3.2.8)

The second partial derivative ∂^2 y/∂d^2 is negative if and only if the
right-hand side of (3.2.8) is negative. From (3.2.6), the equality

    2 = βd tanh(βd/2)   (3.2.9)

holds at d = 2.3994/|β|, which is then substituted into (3.2.8) to give

    (∂^2 y/∂d^2)( e^{βd/2} + e^{−βd/2} )^2 |_{opt. d}
        = 2 − 8 − β^2 d^2/2 + 6 = −β^2 d^2/2 < 0.   (3.2.10)

Because there is only one solution to (3.2.6) where d > 0, the result in
(3.2.10) implies that the solution of |d| given in (3.2.7) does in fact
minimize the variance of β̂.
By (3.2.7) the optimal design is a comparison of only the pair
(−1,1) when |β| < 1.2. It can be shown that the probability that the
treatment at x = 1 is preferred over the treatment at x = −1 is .92
when β = 1.2. Because of this relatively large preference probability
of choosing one endpoint over the other, in most practical situations
|β| will be less than 1.2. So in these cases the optimal design is a
comparison of only the endpoints of the experimental region.
The optimal designs given by (3.2.7) are also optimal for other
criteria. In particular, they are optimal for criteria (ii)-(iv)
introduced in Section 2.3. These criteria are formally defined in
Chapter 4, but a brief explanation for their equivalence in the case of
a linear model follows. The equivalence of the criterion being discussed
in the present section to D-optimality follows from the fact that the
determinant of a 1x1 matrix is simply the only element in the matrix.
The equivalence to the other two criteria follows from the fact that

    var(ln π̂_x) = x^2 var(β̂).   (3.2.11)

From (3.2.11), criterion (iii) is, in this case, the minimization of

    ∫_{−1}^{1} x^2 var(β̂) dx = (2/3) var(β̂),

and criterion (iv) is the minimization of

    max_{x ∈ [−1,1]} x^2 var(β̂) = var(β̂).
3.3 Quadratic Model

The quadratic model is

    ln π_i = β_1 x_i + β_2 x_i^2,  i = 1,...,t.   (3.3.1)

From (2.3.10), the asymptotic variance-covariance matrix of the maximum
likelihood estimators β̂_1 and β̂_2 is (λ_ab)^{−1}, where

    λ_11 = Σ_{i<j} n_ij φ_ij (x_i − x_j)^2,   (3.3.2)

    λ_12 = Σ_{i<j} n_ij φ_ij (x_i − x_j)(x_i^2 − x_j^2),   (3.3.3)

    λ_22 = Σ_{i<j} n_ij φ_ij (x_i^2 − x_j^2)^2,   (3.3.4)

and where

    φ_ij = φ(x_i,x_j)
         = exp( β_1(x_i + x_j) + β_2(x_i^2 + x_j^2) )
           / ( exp(β_1 x_i + β_2 x_i^2) + exp(β_1 x_j + β_2 x_j^2) )^2,
           i < j.   (3.3.5)
Minimization of var(β̂_2)

Consider the test of the hypothesis

    H_o: β_2 = 0,  β_1 unspecified,

versus

    H_a: β_2 ≠ 0,  β_1 unspecified.

Minimizing the asymptotic variance of β̂_2 would equivalently be
maximizing the asymptotic power of the test. The variance of β̂_2 is the
lower right-hand element of the 2x2 matrix (λ_ab)^{−1}. That is,

    var(β̂_2) = λ_11/( λ_11 λ_22 − λ_12^2 ).   (3.3.6)

Under the null hypothesis, (3.3.5) becomes

    φ_ij = e^{β_1(x_i + x_j)} / ( e^{β_1 x_i} + e^{β_1 x_j} )^2.   (3.3.7)
As was pointed out in Section 3.1, in the present section (3.3.6)
is minimized for β_2 = 0. This produces locally optimal designs.
Assuming β_2 = 0 greatly simplifies the design problem, as will be seen
in the remainder of the section.

The end result of Lemmas 3.1-3.2 and Theorems 3.1-3.3 is a set of
optimal designs which minimize var(β̂_2) under the null hypothesis for
various choices of the linear parameter. Hereafter, "optimal design"
will refer to the design which is optimal for the criterion being
discussed in the section.
The following two definitions are useful in the discussion that
follows.

Definition 3.1. The pair of treatments (−x_1,−x_2) is defined to be
the symmetric counterpart of the pair (x_1,x_2).

Definition 3.2. A design is said to be symmetric if each pair in the
design is compared as often as its respective symmetric counterpart.
Lemma 3.1. Let φ(x_i,x_j) denote φ_ij as given by (3.3.7). Then
φ(x_i,x_j) = φ(−x_i,−x_j), for any x_i and x_j.

Proof.

    φ(x_i,x_j) = e^{β_1(x_i + x_j)} / ( e^{β_1 x_i} + e^{β_1 x_j} )^2

               = e^{−β_1(x_i + x_j)} / [ ( e^{β_1 x_i} + e^{β_1 x_j} )^2 e^{−2β_1(x_i + x_j)} ]

               = e^{β_1(−x_i − x_j)} / ( e^{β_1(−x_i)} + e^{β_1(−x_j)} )^2

               = φ(−x_i,−x_j).
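Lemma 3.1 is easy to confirm numerically; a quick Python check of (3.3.7) at arbitrary levels (the levels below are illustrative):

```python
import math

def phi(xi, xj, b1):
    """phi_ij under the null hypothesis beta_2 = 0, as in (3.3.7)."""
    return math.exp(b1 * (xi + xj)) / (math.exp(b1 * xi) + math.exp(b1 * xj)) ** 2

pairs = [(0.3, -0.8), (1.0, 0.0), (-0.25, 0.6)]
checks = [(phi(a, b, 1.5), phi(-a, -b, 1.5)) for a, b in pairs]
# each pair of values agrees: phi(x_i, x_j) = phi(-x_i, -x_j)
```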
Theorem 3.1. The optimal design is a symmetric design.

Proof. Notice that both of the following hold:

    (i)  (x_j − x_i)^2 = ( (−x_j) − (−x_i) )^2,

    (ii) (x_j^2 − x_i^2)^2 = ( (−x_j)^2 − (−x_i)^2 )^2.

From (3.3.2), (3.3.4), Lemma 3.1, and the above, it is clear that λ_11
and λ_22 are invariant under changes of any number of comparisons of
pairs to comparisons of their symmetric counterparts.
Let D_1 denote any particular design, and let var_{D_1}(β̂_2) be the
variance of β̂_2 for Design D_1. Also, let the ith pair in D_1 be
denoted (x_i,x_i'). The sets of comparisons {(x_i,x_i'), (−x_i,−x_i')}
are considered in sequence, and without loss of generality, it is
assumed that

    n_{x_i,x_i'} − n_{−x_i,−x_i'} = d_i ≥ 0,

for all i. Notice that it could be true that n_{−x_i,−x_i'} = 0.

Design D_2 is formed from Design D_1 by changing d_i/2 of the
comparisons of the pair (x_i,x_i') to comparisons of the pair
(−x_i,−x_i'). This results in (n_{x_i,x_i'} + n_{−x_i,−x_i'})/2
comparisons of each of the two pairs in the ith set. This change is made
for each set of comparisons, thereby defining Design D_2 to be a
symmetric design. The total number of comparisons, N, is left unchanged.

By the first paragraph of this proof, the values of λ_11 and λ_22
are the same for the two designs. By (3.3.3), λ_12 = 0 for Design D_2,
and hence λ_12^2 is a minimum. Therefore, by (3.3.6),

    var_{D_2}(β̂_2) ≤ var_{D_1}(β̂_2).   (3.3.8)

Note that the variances are equal if λ_12 = 0 for Design D_1, and it
is possible that λ_12 = 0 for a nonsymmetric design. However, (3.3.8)
shows that no design can "do better" than a symmetric design. Also
note that in practice, d_i must be an even integer in order for the
n_ij's to be integers.
So the search for optimal designs in the present section will
henceforth be restricted to the class of all symmetric designs.

Because λ_12 = 0 for a symmetric design, (3.3.6) reduces to

    var(β̂_2) = 1/λ_22.   (3.3.9)

So it is now necessary to find designs which maximize λ_22. Notice that
λ_22 depends on β_1. As previously mentioned, the optimal design also
depends on the value of β_1.
Theorem 3.2. The optimal design is a design in which the only
comparisons made are comparisons of the pairs (x,1) and (−x,−1) an
equal number of times, for some x. Furthermore, x ≥ 0.
Proof. First of all, the notation used for λ_22, which is defined by
(3.3.4) and (3.3.7), is changed. The N total comparisons are
arbitrarily ordered, and the ith pair is then defined to be (x_i,x_i').
The variables y_i and z_i are defined to be

    y_i = x_i − x_i',  z_i = x_i + x_i',  i = 1,...,N.

Then from (3.3.7),

    φ_i = e^{β_1(x_i + x_i')} / ( e^{β_1 x_i} + e^{β_1 x_i'} )^2
        = ( e^{β_1 y_i/2} + e^{−β_1 y_i/2} )^{−2},  i = 1,...,N.

Notice that φ_i is a function of y_i but not of z_i. From (3.3.4), λ_22
is rewritten as

    λ_22 = Σ_{i=1}^{N} φ_i y_i^2 z_i^2.

Let D be any design. By holding y_i fixed, we examine what changes
in z_i^2 will increase λ_22. Clearly if |z_i| is as large as possible
for every i, then λ_22 is also as large as possible for the particular
set of y_i's found in D. Without loss of generality, it is assumed that
x_i > x_i'; then, for fixed y_i, |z_i| is maximized by moving the pair
to an endpoint of the experimental region, so all comparisons must
involve either x = 1 or x = −1. Additionally, using an argument similar
to one used in Section 3.2, it is clear from the expression for λ_22
given in (3.3.4) that x = 1 should be compared with one and only one
level of x, namely the value of x which maximizes φ(1,x)(1 − x^2)^2.
This result in conjunction with Theorem 3.1 proves the first part of
the theorem.
We can now rewrite λ_22 simply as

    λ_22 = N φ(1,x)(1 − x^2)^2.

It remains to be proven that the optimal value of x is such that x ≥ 0.
The variable ψ(x) is defined to be the positive square root of λ_22/N,
i.e.

    ψ(x) = e^{β_1(1 + x)/2} (1 − x^2) / ( e^{β_1} + e^{β_1 x} ).   (3.3.10)
Since (3.3.10) and the fact that x ∈ [−1,1] imply ψ(x) ≥ 0, maximizing
λ_22 is equivalent to maximizing ψ(x).

It will now be shown that ψ(x) ≥ ψ(−x), x ≥ 0. The following five
inequalities are equivalent:

    ψ(x) ≥ ψ(−x),   (3.3.11)

    e^{β_1(1 + x)/2}(1 − x^2)/( e^{β_1} + e^{β_1 x} )
        ≥ e^{β_1(1 − x)/2}(1 − x^2)/( e^{β_1} + e^{−β_1 x} ),   (3.3.12)

    e^{β_1(1 + x)/2}( e^{β_1} + e^{−β_1 x} )
        ≥ e^{β_1(1 − x)/2}( e^{β_1} + e^{β_1 x} ),   (3.3.13)

    e^{β_1(1 + x)/2} + e^{−β_1(1 + x)/2}
        ≥ e^{β_1(1 − x)/2} + e^{−β_1(1 − x)/2},   (3.3.14)

    cosh(u + β_1 x) ≥ cosh(u),   (3.3.15)

where

    cosh(u) = ( e^u + e^{−u} )/2,  u = β_1(1 − x)/2.

Now cosh(u) is a monotonically increasing function of u for u ≥ 0.
Furthermore, it can be assumed that β_1 ≥ 0, because Theorem 3.3 proves
that the optimal designs for β_1 = β_10 and β_1 = −β_10 are the same;
equivalently, the remainder of this proof can be repeated for the case
β_1 < 0. Then x ≥ 0 and β_1 ≥ 0 imply β_1 x ≥ 0, and hence, by the
equivalence of (3.3.11) and (3.3.15), ψ(x) ≥ ψ(−x) for all x ≥ 0. This
completes the proof of the second part.
The next theorem shows that optimal designs only need to be found
for nonnegative values of β_1.

Theorem 3.3. The optimal design for β_1 = β_10 is the same as the
optimal design for β_1 = −β_10.

Proof. Recall that the optimal design is found by finding the value of
x which maximizes ψ(x) from the proof of Theorem 3.2. Since ψ(x) is
actually a function of β_1 and x, we write

    ψ(β_1,x) = e^{β_1(1 + x)/2}(1 − x^2)/( e^{β_1} + e^{β_1 x} ).

The theorem is proved by observing that

    ψ(−β_1,x) = e^{−β_1(1 + x)/2}(1 − x^2)/( e^{−β_1} + e^{−β_1 x} )

              = e^{−β_1(1 + x)/2}(1 − x^2) e^{β_1(1 + x)}/( e^{β_1 x} + e^{β_1} )

              = e^{β_1(1 + x)/2}(1 − x^2)/( e^{β_1} + e^{β_1 x} )

              = ψ(β_1,x).
All that remains to be done is to find the value of x that maximizes ψ(x) for various choices of β₁ ≥ 0. Unfortunately, a closed form solution of x as a function of β₁ does not exist. However, when β₁ = 0 the optimal value of x can be analytically derived, and hence this situation is considered first.

Suppose β₁ = 0. Then from (3.3.10), ψ(x) = (1 - x²)/2. Clearly this is maximized by x = 0, and hence from Theorems 3.1 and 3.2, the optimal design is an equal number of comparisons of only the pairs (-1,0) and (0,1). For the case β₁ ≠ 0, the grid method described in the next paragraph was used to obtain the results presented in Table 3.1 and Figure 3.1.

The grid method utilized first calculated ψ(x) for x = 0(.1)1, and the best value of x, say x₁, out of these 11 choices was found. Then ψ(x) was calculated for the 11 values x = (x₁ - .1), (x₁ - .08), ..., (x₁ + .1), and again the best value of x was found. This procedure was repeated until the optimal value of x was found accurate to the fourth decimal place. The entire procedure was done separately for each value of β₁.
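The grid method is easy to reproduce with a short present-day sketch (Python rather than the original Fortran; the refinement factor of 5 per pass is an assumption about the schedule, not the dissertation's exact one). It uses the identity (e^{β₁x} + e^{β₁}) = e^{β₁(1+x)/2}·2cosh(β₁(1-x)/2) to write ψ in cosh form:

```python
import math

def psi(b1, x):
    # psi(beta1, x) in the equivalent form (1 - x^2) / (2 cosh(b1 (1 - x) / 2)).
    return (1 - x * x) / (2 * math.cosh(b1 * (1 - x) / 2))

def optimal_x(b1, step=0.1, tol=1e-5):
    """Coarse-to-fine grid search over [0, 1]; each pass shrinks the
    step by a factor of 5 (an assumed schedule) and re-centres on the
    best point found so far."""
    grid = [i * step for i in range(11)]          # x = 0(.1)1
    best = max(grid, key=lambda x: psi(b1, x))
    while step > tol:
        step /= 5
        cand = [best + k * step for k in range(-5, 6)]
        cand = [x for x in cand if 0.0 <= x <= 1.0]
        best = max(cand, key=lambda x: psi(b1, x))
    return best
```

For β₁ = 1 this search settles near x = .1039, and for β₁ = 5 near x = .6031, in agreement with Table 3.1.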
The following lemma proves that the values of x which are found via the grid search are absolute maximums for ψ(x).

Lemma 3.2. Exactly one local maximum (and hence it is the absolute maximum) exists for ψ(x) in the interval [0,1].

Proof. By Theorem 3.3, only the case β₁ > 0 needs to be considered. From (3.3.10), the first partial derivative of ψ(x) with respect to x is

    ψ'(x) = e^{β₁(1+x)/2} [ e^{β₁}(β₁(1 - x²)/2 - 2x) - e^{β₁x}(β₁(1 - x²)/2 + 2x) ] / (e^{β₁x} + e^{β₁})² .   (3.3.16)

It can be seen from (3.3.16) that ψ'(0) > 0 and ψ'(1) < 0. Hence ψ'(x) = 0 for at least one x ∈ [0,1], and it remains to be shown that ψ'(x) = 0 for exactly one x ∈ [0,1]. (An attempt was made to show ψ''(x) is negative for all β₁ and x ∈ [0,1], but it turned out not to be true. In particular, a graph of ψ'(x) as a function of x for β₁ = 5 indicated ψ''(x) > 0 for values of x near 0.)
From (3.3.16), ψ'(x) = 0 if and only if

    (β₁(1 - x²)/2 - 2x) = e^{-β₁(1 - x)} (β₁(1 - x²)/2 + 2x) .   (3.3.17)

Notice that the functions f₁(x) = (β₁(1 - x²)/2 - 2x) and f₂(x) = (β₁(1 - x²)/2 + 2x) are reflections of each other about the axis x = 0, and they are quadratic in x. Hence they intersect exactly once, at x = 0. Also, for fixed β₁, f₁ and f₂ have maximums at -2/β₁ and 2/β₁, respectively. Let g₁(x) be the right-hand side of (3.3.17). It was shown in the preceding paragraph that f₁(x) and g₁(x) intersect at least once in the interval [0,1]. To determine that they intersect exactly once, it is sufficient to show that ∂f₁(x)/∂x < ∂g₁(x)/∂x. From (3.3.17),

    ∂f₁(x)/∂x = -β₁x - 2 ,

    ∂g₁(x)/∂x = β₁ e^{-β₁(1 - x)} (β₁(1 - x²)/2 + 2x) + e^{-β₁(1 - x)} (-β₁x + 2)

              = e^{-β₁(1 - x)} (β₁²(1 - x²)/2 + β₁x + 2) .

For β₁ > 0 and x ∈ [0,1], it is clear that ∂f₁(x)/∂x < 0 and ∂g₁(x)/∂x > 0, which completes the proof. †
Table 3.1 presents the optimal value of x found from the grid search described immediately prior to Lemma 3.2, for various choices of |β₁|. Figure 3.1 is a graph of the optimal value of x versus |β₁|. The table includes relatively large values of |β₁| to illustrate that x → 1 as |β₁| → ∞. Recall that for a given value of |β₁|, the optimal design is a comparison of (-1,x) and (-x,1) an equal number of times, where x is found in Table 3.1 or Figure 3.1.

Table 3.1 indicates that the optimal design when β₁ = 0 involves only 3 levels, x = -1, 0, 1. However, when β₁ ≠ 0, the optimal design requires 4 levels, x = -1, -x₀, x₀, 1, for some x₀. In certain experimental situations it may be significantly more laborious to make four "batches"
Table 3.1. Designs which minimize var(β̂₂)

  β₁      Optimal x     Relative efficiency
0 0 1
.05 .0003 1
.10 .0012 1
.15 .0028 1
.20 .0050 1
.25 .0077 .9999
.30 .0110 .9998
.35 .0150 .9995
.40 .0194 .9992
.45 .0243 .9988
.50 .0297 .9982
.55 .0356 .9974
.60 .0419 .9963
.65 .0486 .9950
.70 .0556 .9935
.75 .0630 .9915
.80 .0708 .9893
.85 .0787 .9866
.90 .0869 .9836
.95 .0953 .9801
1.0 .1039 .9761
1.1 .1216 .9668
1.2 .1396 .9556
1.3 .1580 .9426
1.4 .1764 .9269
1.5 .1948 .9097
1.6 .2130 .8905
1.7 .2310 .8696
1.8 .2487 .8472
1.9 .2660 .8233
2.0 .2829 .7983
3.0 .4269 .5242
4.0 .5295 .2974
5.0 .6031 .1546
[Figure 3.1: graph of the optimal value of x versus |β₁|; graphic not reproduced.]
than it is to make just three. For this reason, and because a priori information about β₁ may not be available, asymptotic relative efficiencies were calculated for the design {n₋₁,₀ = n₀,₁ = N/2} relative to the optimal designs found in the table. The relative efficiency of the first design to the second (optimal) design is defined to be

    R.E. = var₂(β̂₂) / var₁(β̂₂)

         = [ e^{β₁} / (e^{β₁} + 1)² ] / [ e^{β₁(1+x)} (1 - x²)² / (e^{β₁x} + e^{β₁})² ] .

The relative efficiencies were enumerated for various choices of β₁, and are presented next to the optimal designs in Table 3.1.
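The tabled efficiencies can be reproduced directly from this expression. A minimal sketch (Python; function name illustrative), using the equivalent cosh form of ψ and the fact that var(β̂₂) is proportional to 1/ψ²:

```python
import math

def rel_eff(b1, x_opt):
    # R.E. = psi^2(0) / psi^2(x_opt), with
    # psi(b1, x) = (1 - x^2) / (2 cosh(b1 (1 - x) / 2)).
    psi0 = 1.0 / (2.0 * math.cosh(b1 / 2.0))
    psix = (1.0 - x_opt * x_opt) / (2.0 * math.cosh(b1 * (1.0 - x_opt) / 2.0))
    return (psi0 / psix) ** 2
```

Plugging in the tabled optimal x reproduces, e.g., R.E. ≈ .9761 at β₁ = 1 and R.E. ≈ .1546 at β₁ = 5.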
Equivalence of El-Helbawy and Bradley's method and minimizing the variance

An application of El-Helbawy and Bradley's method for finding optimal designs was briefly discussed in Section 1.4. It will now be proven that their optimality criterion is equivalent to minimizing var(β̂_k), for any k. First of all, notation and results of El-Helbawy and Bradley's paper which are necessary for the proof of Theorem 3.5 are presented.
Recall that El-Helbawy and Bradley considered linear contrasts of γ_i = ln π_i, i = 1, ..., t. The matrices B_m and B_n are defined as m×t and n×t, respectively, with zero-sum, orthonormal rows, such that 0 ≤ m < m+n ≤ t-1. The null hypothesis is

    H₀: B_n γ(π) = 0_n ,

and the alternative hypothesis is

    H_a: B_n γ(π) ≠ 0_n ,

where π and γ(π) are column vectors with i-th elements π_i and γ_i, respectively. The rows of B_m represent the linear contrasts assumed to be null. Note that B_m need not exist (i.e. m = 0).

The vector π₀ is a value of π which satisfies the null hypothesis and the constraints, where π is the true parameter vector.

The matrix B is defined to be the t×t orthonormal matrix

    B = [ 1'_t/√t ]
        [ B_m     ]                                   (3.3.18)
        [ B*_m    ]

where B*_m is any (t-m-1)×t matrix such that BB' = I and 1'_t = (1, ..., 1), the t-dimensional unit vector.

The following theorem is extracted from El-Helbawy and Bradley (1978, Theorem 2, page 833).
Theorem 3.4. Let p be the maximum likelihood estimator of π. Under the assumption of connectedness (Assumption 1.2), and by the assumption of strictly positive elements of π (which has already been inherently assumed since ln(0) is undefined), √N(γ(p) - γ(π)) has a limiting distribution function that is singular, t-variate normal in a space of (t-m-1) dimensions, with zero mean vector and dispersion matrix

    Σ(π) = B*'_m (C(π))⁻¹ B*_m ,                      (3.3.19)

where

    C(π) = B*_m Λ(π) B*'_m ,                          (3.3.20)

and Λ(π) is the t×t matrix with elements

    Λ_ii(π) = Σ_{j≠i} μ_ij π_i π_j /(π_i + π_j)² ,  i = 1, ..., t,
                                                      (3.3.21)
    Λ_ij(π) = -μ_ij π_i π_j /(π_i + π_j)² ,  i ≠ j,  i,j = 1, ..., t,

and where μ_ij is defined in Assumption 1.2.
El-Helbawy and Bradley's optimality criterion is the maximization of the noncentrality parameter, λ², of the asymptotic chi-square distribution of the likelihood ratio statistic. The noncentrality parameter is

    λ² = δ' Σ₀⁻¹ δ ,                                  (3.3.22)

where δ is a vector of constants defined in their paper, Σ₀ is the leading principal matrix of order n of C₀⁻¹, B_n provides the first n rows of B*_m, and

    C₀ = B*_m Λ(π₀) B*'_m .                           (3.3.23)

A theorem proving the equivalence of the two optimality criteria is now presented.
Theorem 3.5. Consider the model ln π_x = β₁x + β₂x² + ... + β_k x^k. Suppose we are interested in testing, for some fixed i,

    H₀: β_i = 0   versus   H_a: β_i ≠ 0 .

Then the design which minimizes var(β̂_i) is identical to the design which maximizes the noncentrality parameter from the likelihood ratio test. (Note that it does not matter whether or not it is assumed that some of the other β_j's are 0.)

Proof. Orthogonal polynomials can be derived for any spacing of the independent variable. Let these contrasts be the rows of B. The 1×t matrix B_{n=1} is the orthogonal polynomial corresponding to the parameter β_i being tested.

From the method used to find orthogonal polynomials, it follows that

    B_{n=1} γ(π) = c β_i ,

where c is a constant. So minimizing var(β̂_i) is equivalent to minimizing var(B_{n=1} γ(p)). But from Theorem 3.4,

    var(B_{n=1} γ(p)) = B_{n=1} (B*'_m (C(π))⁻¹ B*_m) B'_{n=1}

                      = (1, 0, ..., 0) (C(π))⁻¹ (1, 0, ..., 0)'

                      = ((C(π))⁻¹)₁₁ .

From the definition of Σ₀, Σ₀ is the leading principal matrix of order 1 of C₀⁻¹, i.e.

    Σ₀ = ((C(π₀))⁻¹)₁₁ .

In this case, δ is a scalar constant, and so maximizing λ² is equivalent to minimizing Σ₀ = var(B_{n=1} γ(p)), i.e. to minimizing var(β̂_i). †
Example 3.1. This example illustrates the equivalence of the two methods presented in Theorem 3.5. It is assumed that the true model is a quadratic model, ln π_x = β₁x + β₂x². Also, the design is restricted to have the three design points x = -1, 0, 1.

Table 3.1 indicates that the design which minimizes the variance of β̂₂ is {n₋₁,₀ = n₀,₁ = N/2}.

The design problem is now approached using the method presented by El-Helbawy and Bradley. The null hypothesis is

    H₀: β₂ = 0 ,   β₁ unknown.

Using orthogonal polynomials, this hypothesis is equivalent to

    H₀: (1/√6)(1  -2  1) γ(π) = 0 ,

where γ'(π) = (ln π₋₁, ln π₀, ln π₁).

Note that El-Helbawy and Bradley's constraint, that the ln π_i's sum to zero, poses no problem. Instead of constraining π₀ to be equal to 1, an intercept parameter, β₀, could be introduced into the model. The model would then become

    ln π_x = β₀ + β₁x + β₂x² ,

with the constraint

    Σ_x ln π_x = 0 .

It can be shown in this particular case that β₀ = -(2/3)β₂.
For a definition of the following matrices and notation, see (3.3.18)-(3.3.23). For the example under discussion,

    B_{n=1} = (1  -2  1)/√6 ,

    B*_m = [ (1  -2  1)/√6 ]
           [ (-1  0  1)/√2 ] ,

    Λ(π₀) = (1/4) [ μ₁₂+μ₁₃    -μ₁₂       -μ₁₃     ]
                  [ -μ₁₂       μ₁₂+μ₂₃    -μ₂₃     ]
                  [ -μ₁₃       -μ₂₃       μ₁₃+μ₂₃  ] ,

where π₀' = (1,1,1), i.e. β₁ = β₂ = 0. Note that π₀ can be any vector which satisfies the null hypothesis and the constraint.

The matrix C₀ is

    C₀ = B*_m Λ(π₀) B*'_m

       = (1/8) [ 3(μ₁₂+μ₂₃)     √3(μ₂₃-μ₁₂)   ]
               [ √3(μ₂₃-μ₁₂)    μ₁₂+4μ₁₃+μ₂₃  ] .

Then, since Σ₀ is the upper left-hand element of C₀⁻¹,

    Σ₀ = (C₀⁻¹)₁₁ = (1/8)(μ₁₂ + 4μ₁₃ + μ₂₃)/|C₀| ,

implying

    Σ₀⁻¹ = 8|C₀|/(μ₁₂ + 4μ₁₃ + μ₂₃)

         = (3/2)(μ₁₂μ₁₃ + μ₁₂μ₂₃ + μ₁₃μ₂₃)/(μ₁₂ + 4μ₁₃ + μ₂₃) .   (3.3.24)
The design which maximizes the chi-square noncentrality parameter from the likelihood ratio test is the design which maximizes Σ₀⁻¹.

By the definition of the μ_ij, μ₂₃ = 1 - μ₁₂ - μ₁₃. So Σ₀⁻¹ can be rewritten as

    Σ₀⁻¹ = (3/2) [ μ₁₂μ₁₃ + (μ₁₂ + μ₁₃)(1 - μ₁₂ - μ₁₃) ] / [ μ₁₂ + 4μ₁₃ + (1 - μ₁₂ - μ₁₃) ]

         = (3/2) [ μ₁₂((1 - μ₁₃) - μ₁₂) + (μ₁₃ - μ₁₃²) ] / (3μ₁₃ + 1) .   (3.3.25)

Suppose that μ₁₃ is temporarily held constant. It is then claimed that choosing μ₁₂ = (1 - μ₁₃)/2 maximizes Σ₀⁻¹ for the particular μ₁₃. From (3.3.25), the expression μ₁₂((1 - μ₁₃) - μ₁₂) must be maximized. Because of the constraint 0 ≤ μ₁₂ ≤ 1 - μ₁₃, it is clear that μ₁₂((1 - μ₁₃) - μ₁₂) is maximized when μ₁₂ = (1 - μ₁₃)/2, which proves the claim. The constraint among the μ_ij's, in addition to the above claim, implies that the optimal design must be a design such that

    μ₁₂ = μ₂₃ = (1 - μ₁₃)/2 .                         (3.3.26)
The scalar Σ₀⁻¹ is now rewritten as a function of μ₁₂ alone. From (3.3.24) and (3.3.26), with μ₁₃ = 1 - 2μ₁₂,

    Σ₀⁻¹ = (3/2) [ 2μ₁₂(1 - 2μ₁₂) + μ₁₂² ] / [ 2μ₁₂ + 4(1 - 2μ₁₂) ]

         = (3/2) (2μ₁₂ - 3μ₁₂²) / (4 - 6μ₁₂)

         = (3/4) μ₁₂ .

Clearly Σ₀⁻¹ is maximized by choosing μ₁₂ as large as possible. By (3.3.26), this would make μ₁₂ = μ₂₃ = .5 and μ₁₃ = 0. So this optimal design is in fact the same as the design which minimizes var(β̂₂).
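The conclusion of Example 3.1 can be confirmed by brute force. A minimal sketch (Python, illustrative only) sweeps Σ₀⁻¹ over a grid of feasible proportions; the leading constant does not affect where the maximum occurs, so the location of the maximizer is the meaningful check:

```python
def sigma_o_inv(mu12, mu13):
    # Sigma_0^{-1} as in (3.3.24); mu23 is fixed by mu12 + mu13 + mu23 = 1.
    # The denominator 1 + 3*mu13 is never zero on the simplex.
    mu23 = 1 - mu12 - mu13
    num = mu12 * mu13 + mu12 * mu23 + mu13 * mu23
    return 1.5 * num / (mu12 + 4 * mu13 + mu23)

# Sweep the feasible simplex on a step-1/20 grid.
best = max(((i / 20, j / 20) for i in range(21) for j in range(21 - i)),
           key=lambda m: sigma_o_inv(*m))
```

The sweep returns (μ₁₂, μ₁₃) = (.5, 0), i.e. μ₁₂ = μ₂₃ = .5, μ₁₃ = 0, as derived above.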
An attempt was made to find the optimal design for the test of hypothesis

    H₀: β₁ = β₂ = 0   versus   H_a: not H₀

using El-Helbawy and Bradley's methodology. When simultaneously testing more than one parameter, or equivalently more than one treatment contrast, a constant value, δ_i, must be chosen for each parameter. The theory behind the optimal designs includes an infinite sequence of values of π satisfying H_a which are converging to a vector π₀ satisfying H₀. When the treatment contrasts are for factorial treatments, which is the case they discussed extensively, the relative sizes of the δ_i are said to be equivalent to the relative importance placed on each contrast being tested in H₀. Hence, El-Helbawy and Bradley usually chose δ₁ = δ₂ = ... = δ_n.

In the present situation, choosing δ₁ and δ₂ restricts the alternative parameter values in the infinite sequence mentioned in the preceding paragraph to be a proper subset of the alternative parameters satisfying H_a. For example, δ₁ = δ₂ implies that β₂ = √3 β₁, and hence the alternative models are ln π_x = βx + √3 βx², with β tending to 0 as N → ∞. Choosing different δ_i can result in a different design because the maximization of the noncentrality parameter depends on the relative sizes of the δ_i. Additionally, the optimal designs actually found after choosing the relative sizes of δ₁ and δ₂ were found to be unreasonable (e.g. for δ₁ = δ₂, the optimal design is a comparison of only the pair (-1,0)). Chapter 4 presents other optimality criteria for fitting a quadratic model.
3.4 Cubic Model

The cubic model is

    ln π_i = β₁x_i + β₂x_i² + β₃x_i³ ,   i = 1, ..., t.   (3.4.1)

The asymptotic variance-covariance matrix of the maximum likelihood estimators β̂₁, β̂₂, and β̂₃ is (λ_ab)⁻¹, where λ_ab is given by (2.3.10). The inverse of the variance-covariance matrix is partitioned as

    (λ_ab) = [ λ₁₁  λ₁₂  λ₁₃ ]
             [ λ₁₂  λ₂₂  λ₂₃ ]
             [ λ₁₃  λ₂₃  λ₃₃ ] .

Then by a result found in Graybill (1969, page 165), the variance of β̂₃ is

    var(β̂₃) = [ (λ_ab)⁻¹ ]₃₃

             = [ λ₃₃ - (λ₁₃  λ₂₃) [ λ₁₁  λ₁₂ ]⁻¹ [ λ₁₃ ] ]⁻¹
                                  [ λ₁₂  λ₂₂ ]   [ λ₂₃ ]

             = [ λ₃₃ - (λ₁₃²λ₂₂ - 2λ₁₂λ₁₃λ₂₃ + λ₂₃²λ₁₁)/(λ₁₁λ₂₂ - λ₁₂²) ]⁻¹ .   (3.4.2)
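The partitioned-inverse step can be sanity-checked against a direct 3×3 inversion. A minimal sketch (Python; any symmetric positive definite λ-matrix works, and the one below is an arbitrary illustration, not from the dissertation):

```python
def inv3(m):
    # Adjugate-based inverse of a 3x3 matrix given as a list of rows.
    a, b, c = m[0]; d, e, f = m[1]; g, h, i = m[2]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]

def var_b3(lam):
    # (3.4.2): [(lambda_ab)^{-1}]_{33} via the partitioned-inverse formula.
    l11, l12, l13 = lam[0]
    _, l22, l23 = lam[1]
    l33 = lam[2][2]
    num = l13**2 * l22 - 2 * l12 * l13 * l23 + l23**2 * l11
    return 1.0 / (l33 - num / (l11 * l22 - l12**2))

lam = [[2.0, 0.3, 0.4], [0.3, 1.5, 0.2], [0.4, 0.2, 1.0]]
```

Both routes give the same (3,3) element of the inverse.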
To simplify (3.4.2), only the case β₃ = 0 is considered in the search for designs which minimize var(β̂₃). As pointed out in Section 3.1, this gives locally optimal designs. The following theorem shows that optimal designs only need to be found for linear and quadratic parameters which are both positive.

Theorem 3.6. Suppose that β₃ = 0, and further that the optimal design for given β₁ and β₂ is comparisons of the pairs (x_i,x_j) a total of n_{x_i,x_j} times, for all pairs. Then

(i) the optimal design for (-β₁) and (-β₂) is the same as the optimal design for β₁ and β₂;

(ii) the optimal design for (-β₁) and β₂ is comparisons of the pairs (-x_i,-x_j) a total of n_{x_i,x_j} times, for all pairs;

(iii) the optimal design for β₁ and (-β₂) is the same as in (ii).
Proof. From (3.3.5),

    θ_{-β₁,-β₂}(x_i,x_j)

      = exp(-β₁(x_i+x_j) - β₂(x_i²+x_j²)) / (exp(-β₁x_i - β₂x_i²) + exp(-β₁x_j - β₂x_j²))²

      = exp(-β₁(x_i+x_j) - β₂(x_i²+x_j²)) exp(2β₁(x_i+x_j) + 2β₂(x_i²+x_j²))
        / [ (exp(-β₁x_i - β₂x_i²) + exp(-β₁x_j - β₂x_j²))² exp(2β₁(x_i+x_j) + 2β₂(x_i²+x_j²)) ]

      = exp(β₁(x_i+x_j) + β₂(x_i²+x_j²)) / (exp(β₁x_i + β₂x_i²) + exp(β₁x_j + β₂x_j²))²

      = θ_{β₁,β₂}(x_i,x_j) .                          (3.4.3)

Notice that since β₃ = 0, the factor π_iπ_j/(π_i + π_j)² in the expression for λ_ab given in (2.3.10) is equal to θ_{β₁,β₂}(x_i,x_j). Hence by (2.3.10) and (3.4.3), λ_ab is invariant under changing the sign of both β₁ and β₂, for all a and b. Therefore the variance of β̂₃ is also invariant under changing the sign of both β₁ and β₂. This implies that the design which minimizes var(β̂₃) for β₁ and β₂ is also the design which minimizes var(β̂₃) for (-β₁) and (-β₂), thereby proving (i).

To prove (ii), note that from (3.3.5),

    θ_{-β₁,β₂}(x_i,x_j)

      = exp(-β₁(x_i+x_j) + β₂(x_i²+x_j²)) / (exp(-β₁x_i + β₂x_i²) + exp(-β₁x_j + β₂x_j²))²

      = exp(β₁((-x_i)+(-x_j)) + β₂((-x_i)²+(-x_j)²)) / (exp(β₁(-x_i) + β₂(-x_i)²) + exp(β₁(-x_j) + β₂(-x_j)²))²

      = θ_{β₁,β₂}(-x_i,-x_j) .                        (3.4.4)

Again, by (2.3.10) and (3.4.4), changing β₁ to (-β₁) and the pairs (x_i,x_j) to (-x_i,-x_j), for all pairs in the design, will not change λ₁₁, λ₂₂, λ₃₃, nor λ₁₃. It changes only the signs of λ₁₂ and λ₂₃. But notice that in (3.4.2) all terms involving λ₁₂ and λ₂₃ are of the form λ₁₂λ₂₃, λ₁₂², or λ₂₃². Hence the negative signs cancel, leaving var(β̂₃) invariant under the change described above. This proves (ii).

To prove (iii), simply apply the results of (i) and (ii). †
Theorem 3.5 proved the equivalence of minimizing var(β̂₃) and El-Helbawy and Bradley's criterion. A Fortran computer program was used to find the optimal design using El-Helbawy and Bradley's methodology for β₁ = β₂ = β₃ = 0. The levels allowed were x = -1, -.5, 0, .5, 1. The grid search included all ten pairs formed by these five levels, assuming a symmetric design. The optimal design found was {n₋₁,₋.₅ = n.₅,₁ = .045N, n₋.₅,.₅ = .621N, n₋₁,₁ = .298N, all other n_ij = 0}. For this design, it can be shown that var(β̂₃) = 16/N.
It was decided that for unequally spaced levels, employing El-Helbawy and Bradley's methodology would be substantially more laborious than minimizing var(β̂₃). For this reason, the designs subsequently discussed were found by minimizing the expression found in (3.4.2).

Optimal designs were found by a grid search via a Fortran computer program for β₃ = 0 and β₁, β₂ = 0(.2)1. The comparisons allowed were (-1,x₁), (x₁,x₂), (x₂,1), and (-1,1), where the optimal values of x₁ and x₂ were also found by the computer. The motivation for the choice of these four pairs is the result discussed in the two preceding paragraphs for the case of all three parameters equal to zero, which indicated that only the pairs (-1,-.5), (-.5,.5), (.5,1), and (-1,1) need to be compared.

The grid procedure is now briefly described. The initial values for x₁ and x₂ were x₁ = -.7(.1)-.3 and x₂ = .3(.1).7. For each of the 25 combinations of x₁ and x₂, the optimal comparison proportions were found by another grid. The latter grid began by considering different comparison proportions, changing by increments of 0.1. This grid became finer and finer until the proportions were accurate to .0004. From this the best combination of x₁ and x₂ out of the initial 25 choices was found, and the entire procedure was repeated for a finer grid about the previous best choice of x₁ and x₂. The entire grid search stopped when x₁ and x₂ were accurate to .0037.
Notice that it is possible that the designs found are not absolutely optimal; rather, they may only be local minimums for (3.4.2). This is because only four pairs were considered, when it is possible that a five- or six-pair design, for example, might offer a slight improvement. Also, the grid search restricted x₁ and x₂ rather than allowing them to take on values in the entire interval [-1,1]. The latter point is not believed to be a problem, due to the design for β₁ = β₂ = β₃ = 0 originally found using El-Helbawy and Bradley's methodology, and the fact that the optimal designs currently being discussed all have values of x₁ and x₂ within 0.02 of -.5 and .5, respectively.
The optimal designs and the value of var(β̂₃) for N = 1 are presented in Table 3.2. The first two columns of the table give the values of β₁ and β₂. Columns 5 through 8 give the comparison proportions for the pairs (-1,x₁), (x₁,x₂), (x₂,1), and (-1,1), respectively, where x₁ and x₂ are given in columns 3 and 4. The last column gives N·var(β̂₃). The number of times a pair is compared is found by multiplying the appropriate comparison proportion by N. Recall that by Theorem 3.6, the optimal design for both parameters negative is the same as the optimal design for |β₁| and |β₂|. If exactly one parameter is negative, then the optimal design is found by first finding the optimal design for |β₁| and |β₂|, and then taking μ₋ₓ₁,₁ to be the proportion of the total number of comparisons that the pair (-1,x₁) is compared, μ₋ₓ₂,₋ₓ₁ to be the proportion for (x₁,x₂), and so on. To calculate var(β̂₃) for N > 1, the tabled value of N·var(β̂₃) is divided by N.

Notice from Table 3.2 that for β₁ ≥ .6, the optimal design is very close to the design {n₋₁,₋.₅ = n₋.₅,.₅ = n.₅,₁ = .3333N}, regardless of the value of β₂. This implies that if it is a priori believed that β₁ is relatively large, then this design will result in a variance of β̂₃ very close to the tabled minimum. As has been previously mentioned, it is also of interest that the optimal values of x₁ and x₂ are always very close to -.5 and .5, respectively. Finally, notice that the design for β₁ = β₂ = 0 found in Table 3.2 is not the same as the one found using El-Helbawy and Bradley's methodology, although the value of var(β̂₃) appears to be the same for both, and hence they are equally optimal. However, it was not established that the two designs are mathematically equivalent. Since the two criteria are equivalent, there may be some problem with local minimums of var(β̂₃). It appears that the surface of the expression for var(β̂₃) is relatively flat, and so it is possible that a finer initial grid is necessary to guarantee that the minimum found is in fact the absolute minimum of var(β̂₃).
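As a cross-check on the first row of Table 3.2, the information matrix can be assembled directly. A minimal sketch (Python, illustrative), assuming that each λ_ab is a sum over compared pairs of μ·θ·d_a·d_b with d = (x_i-x_j, x_i²-x_j², x_i³-x_j³), that θ = 1/4 for every pair at β₁ = β₂ = β₃ = 0, and that the rounded proportions (.133, .533, .133, .200) are exactly 2/15, 8/15, 2/15, 3/15:

```python
from fractions import Fraction as F

pairs = [(F(-1), F(-1, 2)), (F(-1, 2), F(1, 2)), (F(1, 2), F(1)), (F(-1), F(1))]
mus = [F(2, 15), F(8, 15), F(2, 15), F(3, 15)]

lam = [[F(0)] * 3 for _ in range(3)]
for (xi, xj), mu in zip(pairs, mus):
    d = [xi - xj, xi**2 - xj**2, xi**3 - xj**3]
    for a in range(3):
        for b in range(3):
            lam[a][b] += mu * F(1, 4) * d[a] * d[b]   # theta = 1/4 at beta = 0

# N var(beta3_hat) from (3.4.2), computed exactly in rational arithmetic.
num = (lam[0][2]**2 * lam[1][1] - 2 * lam[0][1] * lam[0][2] * lam[1][2]
       + lam[1][2]**2 * lam[0][0])
nvar = 1 / (lam[2][2] - num / (lam[0][0] * lam[1][1] - lam[0][1]**2))
```

Under these assumptions the computation returns N·var(β̂₃) = 16 exactly, matching both the table and the 16/N value quoted for the five-level design.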
Table 3.2. Designs which minimize var(β̂₃)

  β₁    β₂     x₁    x₂    (-1,x₁)  (x₁,x₂)  (x₂,1)  (-1,1)   N·var(β̂₃)
0 0 .50 .50 .133 .533 .133 .200 16.0000
.2 0 .51 .50 .338 .331 .331 0 16.0836
.4 0 .50 .50 .332 .337 .332 0 16.3225
.6 0 .50 .50 .330 .341 .330 0 16.7322
.8 0 .50 .50 .327 .346 .327 0 17.3190
1 0 .50 .50 .323 .353 .323 0 18.0959
0 .2 .50 .50 0 .666 0 .333 16.0000
.2 .2 .50 .51 .333 .329 .338 0 16.1447
.4 .2 .50 .50 .330 .336 .335 0 16.3835
.6 .2 .50 .50 .329 .338 .333 0 16.7943
.8 .2 .50 .50 .326 .346 .328 0 17.3819
1 .2 .50 .50 .321 .353 .327 0 18.1604
0 .4 .50 .50 0 .666 0 .333 16.0003
.2 .4 .50 .50 .105 .561 .103 .231 16.3209
.4 .4 .50 .50 .330 .333 .337 0 16.5664
.6 .4 .50 .50 .327 .338 .335 0 16.9799
.8 .4 .51 .50 .324 .342 .334 0 17.5716
1 .4 .51 .50 .319 .349 .333 0 18.3554
0 .6 .50 .50 0 .666 0 .333 16.0005
.2 .6 .50 .50 .004 .661 .001 .334 16.3207
.4 .6 .51 .50 .333 .328 .340 0 16.8748
.6 .6 .51 .50 .327 .333 .340 0 17.2927
.8 .6 .51 .50 .323 .337 .340 0 17.8907
1 .6 .51 .49 .317 .346 .337 0 18.6825
0 .8 .50 .50 0 .666 0 .333 16.0010
.2 .8 .50 .50 .004 .661 0 .334 16.3225
.4 .8 .51 .47 .039 .633 .078 .320 17.2619
.6 .8 .51 .50 .328 .327 .344 0 17.7368
.8 .8 .51 .50 .323 .331 .346 0 18.3436
1 .8 .52 .49 .319 .338 .343 0 19.1472
0 1 .50 .50 0 .666 0 .333 16.0015
.2 1 .50 .50 0 .664 0 .335 16.3233
.4 1 .51 .47 .034 .633 0 .333 17.2685
.6 1 .52 .50 .331 .318 .351 0 18.1804
.8 1 .52 .50 .324 .325 .351 0 18.9372
1 1 .52 .49 .318 .331 .351 0 19.7558
CHAPTER 4
DESIGNS FOR FITTING A QUADRATIC MODEL
4.1 Introduction
The present chapter contains optimal designs for fitting a quadratic model under three different conditions, each for three different optimality criteria. The D-optimality criterion is considered in Section 4.2. This criterion minimizes the determinant of the asymptotic variance-covariance matrix. In the present chapter, this matrix is a 2×2 matrix, since there are two maximum likelihood estimators, β̂₁ and β̂₂. From Section 2.3, these estimators are asymptotically normal. Therefore, by a result in Box and Lucas (1959), the determinant of the variance-covariance matrix is proportional to the volume contained within any particular ellipsoidal probability contour for β̂ = (β̂₁, β̂₂) about β₀, where β₀ is the true parameter value. The D-optimal design will then minimize the volume of any such probability contour.

Section 4.3 contains designs which minimize the average-variance of ln π̂_x = β̂₁x + β̂₂x². The average-variance is simply the variance of ln π̂_x integrated over the experimental region; in this case, the experimental region is x ∈ [-1,1]. The property that these designs have is obvious. That is to say, a good estimate of all of the preference probabilities is desirable, and the design which minimizes the integral over x of var(ln π̂_x) will give estimates of π_x which are good "on the average".

The criterion discussed in Section 4.4 is similar to the average-variance criterion. The designs found in this section minimize the maximum of var(ln π̂_x), where the maximum is taken over the interval x ∈ [-1,1]. These designs are referred to as minimax designs. The disadvantage of this criterion is similar to the disadvantage often attributable to minimax rules in the decision theory framework. That is, minimax designs may be slightly "pessimistic" in the sense that they minimize the largest value that the variance can attain. This minimization is good, but at the same time the variance of ln π̂_x may be relatively large for most values of x in the interval [-1,1], hence causing the minimax design to be unfavorable.

Finally, Section 4.5 presents some design recommendations and conclusions. A number of designs are compared to determine which design does best for a wide range of parameter values, using primarily the D-optimality criterion. Additionally, an examination is performed of how well these designs protect against the bias present when the true model is cubic.
Recall that, in general, the problem of finding optimal designs is complex. It involves the determination of the number of levels to be included in the experimental design, and also which particular levels are to be chosen. Additionally, the number of times each pair is to be compared must be determined. In Chapter 3 it was proven that the design which minimizes var(β̂₂) must be of the form {n₋₁,ₓ = n₋ₓ,₁ = N/2}, thereby implying that the optimal number of levels is four, and only two pairs are to be compared, an equal number of times. All that remained to be resolved was the determination of x for different values of β₁. In the present situation, a similar simplification of the general problem does not appear to be possible, except by imposing some restrictions, such as letting β₂ = 0. For this reason, three conditions are considered under which optimal designs are found. This essentially results in designs which are optimal for a proper subclass of the class of all designs. However, as will become apparent in the remainder of the chapter, there is not much difference in the degree of optimality among the three conditions presently considered. Hence it is conjectured that the degree of optimality of the designs found in the final and most general subsection of Sections 4.2-4.4 is very close to, if not equal to, the best that can possibly be achieved. The three conditions are now briefly introduced.
The first condition considered in each of Sections 4.2-4.4 is that the levels are restricted to be x = -1, 0, 1. This greatly simplifies the general problem to one with only two unknowns, namely the comparison proportions μ₋₁,₀ and μ₀,₁. The value of μ₋₁,₁ is found by the relationship μ₋₁,₀ + μ₀,₁ + μ₋₁,₁ = 1.

The second condition discussed in each of the following three sections is to let β₂ = 0 in the enumeration of the variances and covariance. In this situation the optimal designs are proved to be symmetric, and hence the design problem is simplified to a choice from the class of symmetric designs. As was pointed out in Section 2.3, previous papers on the design of paired comparison experiments have always relied upon setting all parameters equal to zero. Presently, the linear parameter is allowed to vary. The three criteria are continuous functions of the parameters because the variance-covariance matrix is continuous in the parameters. Therefore, the designs found for the second condition are locally optimal in the sense that they are close to optimal for values of β₂ which are near zero.

The third condition in each of the following three sections is a consideration of designs whose comparisons are of the form (-1,x₁), (x₂,1), and (-1,1). The optimal choice of the levels x₁ and x₂ is found in addition to the three comparison proportions. This is a generalization of the first condition, since setting x₁ = x₂ = 0 results in the first condition.

Figure 4.1 presents a synopsis of the procedures used in Sections 4.2-4.4 to find the optimal designs for each of the three optimality criteria and the three conditions introduced in the present section. The first column presents a brief description of the three conditions, and hence the three rows in the figure correspond to the three conditions. Columns 2-4 contain a brief description of the methodology used for the three criteria.
4.2 D-optimal Designs

The D-optimal design is defined to be the design which minimizes the determinant of the asymptotic variance-covariance matrix. Letting Σ denote the variance-covariance matrix, this is equivalent to maximizing |Σ⁻¹|. In the case of a quadratic model, the D-optimal design is the design which maximizes D, where

    D = |Σ⁻¹| = λ₁₁λ₂₂ - λ₁₂² ,                       (4.2.1)

and where λ₁₁, λ₂₂, and λ₁₂ are given in (3.3.2)-(3.3.4). Because the optimal design depends on the true parameter values, designs are found
[Figure 4.1: synopsis of the procedures used in Sections 4.2-4.4 for the three conditions and three optimality criteria; graphic not reproduced.]
for various choices of the parameters. The three conditions introduced in Section 4.1 are now discussed, beginning with Condition 1.

Condition 1

The levels are restricted to be x = -1, 0, 1. Because of this restriction, and by substituting μ_ij for n_ij in (3.3.2)-(3.3.4), the expressions for λ₁₁, λ₁₂, and λ₂₂ can be simplified to

    λ₁₁ = μ₋₁,₀θ₋₁,₀ + μ₀,₁θ₀,₁ + 4μ₋₁,₁θ₋₁,₁ ,

    λ₁₂ = μ₀,₁θ₀,₁ - μ₋₁,₀θ₋₁,₀ ,                     (4.2.2)

    λ₂₂ = μ₋₁,₀θ₋₁,₀ + μ₀,₁θ₀,₁ .

The substitution of μ_ij for n_ij is made throughout the present chapter. This essentially sets N = 1, since μ_ij = lim_{N→∞}(n_ij/N).

From (4.2.1) and (4.2.2), the expression for D is

    D = 4μ₋₁,₀μ₀,₁θ₋₁,₀θ₀,₁ + 4(1 - μ₋₁,₀ - μ₀,₁)θ₋₁,₁(μ₋₁,₀θ₋₁,₀ + μ₀,₁θ₀,₁) ,   (4.2.3)

where θ_ij is defined in (3.3.5). The partial derivatives of D with respect to μ₋₁,₀ and μ₀,₁ are, respectively,

    ∂D/∂μ₋₁,₀ = 4μ₀,₁θ₋₁,₀θ₀,₁ - 4θ₋₁,₁(μ₋₁,₀θ₋₁,₀ + μ₀,₁θ₀,₁) + 4(1 - μ₋₁,₀ - μ₀,₁)θ₋₁,₁θ₋₁,₀ ,
                                                      (4.2.4)
    ∂D/∂μ₀,₁ = 4μ₋₁,₀θ₋₁,₀θ₀,₁ - 4θ₋₁,₁(μ₋₁,₀θ₋₁,₀ + μ₀,₁θ₀,₁) + 4(1 - μ₋₁,₀ - μ₀,₁)θ₋₁,₁θ₀,₁ .

Setting the derivatives in (4.2.4) equal to zero results in

    2θ₋₁,₀θ₋₁,₁ μ₋₁,₀ + (θ₋₁,₀θ₋₁,₁ + θ₀,₁θ₋₁,₁ - θ₋₁,₀θ₀,₁) μ₀,₁ = θ₋₁,₀θ₋₁,₁ ,
                                                      (4.2.5)
    (θ₋₁,₀θ₋₁,₁ + θ₀,₁θ₋₁,₁ - θ₋₁,₀θ₀,₁) μ₋₁,₀ + 2θ₀,₁θ₋₁,₁ μ₀,₁ = θ₀,₁θ₋₁,₁ .

In matrix notation, (4.2.5) is equivalent to

    A [ μ₋₁,₀ ]  =  [ θ₋₁,₀θ₋₁,₁ ]
      [ μ₀,₁  ]     [ θ₀,₁θ₋₁,₁  ] ,                  (4.2.6)

implying

    [ μ₋₁,₀ ]  =  A⁻¹ [ θ₋₁,₀θ₋₁,₁ ]
    [ μ₀,₁  ]         [ θ₀,₁θ₋₁,₁  ] ,                (4.2.7)

where

    A = [ 2θ₋₁,₀θ₋₁,₁                          θ₋₁,₀θ₋₁,₁ + θ₀,₁θ₋₁,₁ - θ₋₁,₀θ₀,₁ ]
        [ θ₋₁,₀θ₋₁,₁ + θ₀,₁θ₋₁,₁ - θ₋₁,₀θ₀,₁   2θ₀,₁θ₋₁,₁                         ] .
For the solution of (4.2.7) to be a maximum of the expression in (4.2.3), it needs to be shown that the matrix of second order partial derivatives is negative definite (see Kaplan, page 128). By (4.2.4), this matrix is

    X = [ ∂²D/∂μ₋₁,₀²        ∂²D/∂μ₋₁,₀∂μ₀,₁ ]  =  -4A ,   (4.2.8)
        [ ∂²D/∂μ₋₁,₀∂μ₀,₁    ∂²D/∂μ₀,₁²      ]

where, from (3.3.5),

    θ₋₁,₀ = e^{β₁-β₂} / (e^{β₁-β₂} + 1)² ,

    θ₀,₁ = e^{β₁+β₂} / (e^{β₁+β₂} + 1)² ,             (4.2.9)

    θ₋₁,₁ = e^{2β₁} / (e^{2β₁} + 1)² .

It is known from elementary matrix algebra that the determinant of a square matrix is equal to the product of its eigenvalues. So in this case, |X| = e₁e₂, where e₁ and e₂ are the eigenvalues of X. If |X| > 0, then either both eigenvalues are positive or both are negative; equivalently, X is either positive definite or negative definite. But it is clear from (4.2.8) that

    (1  0) X (1  0)' = -8θ₋₁,₀θ₋₁,₁ < 0 ,             (4.2.10)

where θ₋₁,₀ and θ₋₁,₁ are both positive and are given in (4.2.9). The inequality in (4.2.10) implies that X is negative definite if |X| > 0. However, it will be shown by (4.2.12) that the determinant is not necessarily positive for all values of β₁ and β₂.
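The stationary point (4.2.7) is just a 2×2 linear solve. A minimal sketch (Python; function names illustrative): at β₁ = β₂ = 0 every θ equals 1/4, and the solution is the equal-allocation design μ₋₁,₀ = μ₀,₁ = μ₋₁,₁ = 1/3.

```python
import math

def thetas(b1, b2):
    # theta_ij = pi_i pi_j / (pi_i + pi_j)^2 for the level pairs
    # (-1,0), (0,1), (-1,1) under ln pi_x = b1*x + b2*x^2, as in (4.2.9).
    g = lambda u: math.exp(u) / (math.exp(u) + 1) ** 2
    return g(b1 - b2), g(b1 + b2), g(2 * b1)

def stationary_design(b1, b2):
    """Solve the 2x2 system (4.2.6)-(4.2.7) for (mu_{-1,0}, mu_{0,1})."""
    t10, t01, t11 = thetas(b1, b2)
    a2, off, c2 = 2 * t10 * t11, t10 * t11 + t01 * t11 - t10 * t01, 2 * t01 * t11
    det = a2 * c2 - off ** 2
    r1, r2 = t10 * t11, t01 * t11
    return (c2 * r1 - off * r2) / det, (-off * r1 + a2 * r2) / det
```

The solution must still be checked against the feasible triangle, exactly as discussed below for designs that fall outside it.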
To simplify the notation, we write

    A = [ 2a         a + c - b ]
        [ a + c - b      2c    ] ,

where

    a = θ₋₁,₀θ₋₁,₁ = e^{3β₁-β₂} / [ (e^{β₁-β₂} + 1)(e^{2β₁} + 1) ]² ,

    b = θ₋₁,₀θ₀,₁ = e^{2β₁} / [ (e^{β₁-β₂} + 1)(e^{β₁+β₂} + 1) ]² ,

    c = θ₀,₁θ₋₁,₁ = e^{3β₁+β₂} / [ (e^{β₁+β₂} + 1)(e^{2β₁} + 1) ]² .

Then, since |X| = 16|A|, the sign of |X| is that of

    |A| = 4ac - (a + c - b)²

        = 2ac + 2ab + 2bc - a² - b² - c² .            (4.2.11)
After resubstituting for a, b, and c, and after some algebra in (4.2.11), |A| is found to be of the form

    |A| = φ T ,                                       (4.2.12)

where T is a sum of terms of the form ±k e^{rβ₁+sβ₂} with integer constants k, r, and s, and where

    φ = 4e^{2β₂} / [ (e^{β₁-β₂} + 1)(e^{β₁+β₂} + 1)(e^{2β₁} + 1) ]² .   (4.2.13)

Since φ > 0 for all β₁ and β₂, |A| > 0 if and only if T is positive. However, it appears that for large enough values of β₁ and β₂, some of the terms with a negative sign will overwhelm the sum of all of the terms with a positive sign, and hence the determinant could be negative. For example, β₁ and β₂ can be made large enough such that the negative term -exp(4β₁ + 2β₂) in (4.2.12) is larger in magnitude than the sum of all the terms with positive signs, thereby making the expression in (4.2.12) negative. It will be seen shortly that this is in fact the case.
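The sign claim can be verified numerically. A minimal sketch (Python) evaluates |A| = 4ac - (a + c - b)², which carries the sign of |X|, at a small and at a large parameter value:

```python
import math

def det_A(b1, b2):
    # a = theta_{-1,0} theta_{-1,1}, b = theta_{-1,0} theta_{0,1},
    # c = theta_{0,1} theta_{-1,1}; |A| = 4ac - (a + c - b)^2.
    g = lambda u: math.exp(u) / (math.exp(u) + 1) ** 2
    t10, t01, t11 = g(b1 - b2), g(b1 + b2), g(2 * b1)
    a, b, c = t10 * t11, t10 * t01, t01 * t11
    return 4 * a * c - (a + c - b) ** 2
```

At β₁ = β₂ = 0 the determinant is 3/256 > 0, while at β₁ = 3, β₂ = 0 it is negative, confirming that |X| need not be positive everywhere.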
An APL computer program was utilized to solve equation (4.2.7) for various values of β₁ and β₂. The determinant |X| was also calculated, and was found to be positive for all values of β₁ and β₂ for which the solution fell inside the region of allowable values for the comparison proportions, μ₋₁,₀, μ₀,₁, and μ₋₁,₁. That is, μ_ij ≥ 0 for all i and j, and μ₋₁,₀ + μ₀,₁ + μ₋₁,₁ = 1. Figure 4.2 depicts the values that μ₋₁,₀ and μ₀,₁ can have: they must lie inside or on the solid triangle. The dotted triangle in the figure depicts the region in which the D-optimal designs are exclusively located.

When the solution to (4.2.7) falls outside of the solid triangle in Figure 4.2, the optimal design must lie somewhere on the solid triangle. This can be proven by contradiction, since D is a quadratic function of two unknowns, and hence it can only have one stationary point. For a point to be on the triangle, one of the following must be true: μ₋₁,₀ = 0, μ₀,₁ = 0, or μ₋₁,₀ + μ₀,₁ = 1.

Figure 4.2. Depiction of possible designs for Condition 1 [graphic not reproduced]
If μ1,0 = 0, then from (4.2.3),

    D = 4 λ0,-1 λ1,-1 μ0,-1 μ1,-1 = 4 λ0,-1 λ1,-1 μ0,-1 (1 - μ0,-1) ,

and ∂D/∂μ0,-1 = 0 implies μ0,-1 = 0.5. This clearly maximizes D
because of the negative coefficient on the quadratic term. Therefore,
if the optimal design lies on the line segment μ1,0 = 0, then it must
be {μ1,0 = 0, μ0,-1 = μ1,-1 = 0.5}. If μ0,-1 = 0, then

    D = 4 λ1,0 λ1,-1 μ1,0 μ1,-1 = 4 λ1,0 λ1,-1 μ1,0 (1 - μ1,0) ,

and it can similarly be shown that the optimal design is the midpoint
of the line segment, namely {μ0,-1 = 0, μ1,0 = μ1,-1 = 0.5}. Finally,
if μ1,0 + μ0,-1 = 1, then

    D = 4 λ1,0 λ0,-1 μ1,0 μ0,-1 = 4 λ1,0 λ0,-1 μ1,0 (1 - μ1,0) .

Again, it can similarly be shown that D is maximized when
μ1,0 = μ0,-1 = 0.5. Therefore, if the D-optimal design is on the solid tri-
angle, it must be one of these three designs.
When the solution to (4.2.7) was outside of the solid triangle in
Figure 4.2, D was calculated for the three designs presented in the
preceding paragraph, and the one with the largest value of D was taken
as D-optimal. Otherwise the D-optimal design was found directly from
(4.2.7). The determinant of Σ was also calculated. It was occasion-
ally negative, as conjectured earlier, but only when the solution was
outside of the region, at which time the sign of |Σ| is irrelevant.

The "restricted" D-optimal designs are presented in Table 4.1 for
all combinations of the parameter values β1 = 0(.1)1,2,3 and β2 = 0(.1)1,2,3.
Recall that μ1,0, μ0,-1, and μ1,-1 represent the proportions of the
total available comparisons to be allocated to the pairs (1,0), (0,-1),
and (1,-1), respectively. The number of times one of these three pairs
should be compared is then found by multiplying the corresponding μij
by N. The values of D are also given for the purpose of comparing the
designs with the designs found in the next two subsections. Addition-
ally, relative efficiencies are given for the "restricted" D-opti-
mal designs presented in Table 4.1 relative to the D-optimal designs
found for Condition 3 in Table 4.3. The relative efficiency is the
square root of the ratio of the corresponding values of D, since D is a
multiple of N^2.
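The table entries can be checked numerically. The sketch below is a Python reconstruction, not the original APL program; it assumes the information-matrix entries take the weighted forms Λ11 = Σ μij (xi - xj)^2 λij, Λ22 = Σ μij (xi^2 - xj^2)^2 λij, Λ12 = Σ μij (xi - xj)(xi^2 - xj^2) λij with λij = πiπj/(πi + πj)^2 and ln π(x) = β1 x + β2 x^2, which is consistent with the Condition 2 formulas later in this section.

```python
import math

PAIRS = [(1.0, 0.0), (0.0, -1.0), (1.0, -1.0)]  # the three Condition 1 pairs

def lam(xi, xj, b1, b2):
    # lambda_ij = pi_i pi_j / (pi_i + pi_j)^2 with ln pi(x) = b1*x + b2*x^2
    pi_i = math.exp(b1 * xi + b2 * xi * xi)
    pi_j = math.exp(b1 * xj + b2 * xj * xj)
    return pi_i * pi_j / (pi_i + pi_j) ** 2

def crit_D(mu, b1, b2):
    # D = L11*L22 - L12^2 for allocation mu over PAIRS (N = 1 scaling)
    L11 = L22 = L12 = 0.0
    for m, (xi, xj) in zip(mu, PAIRS):
        l = lam(xi, xj, b1, b2)
        d1, d2 = xi - xj, xi * xi - xj * xj
        L11 += m * d1 * d1 * l
        L22 += m * d2 * d2 * l
        L12 += m * d1 * d2 * l
    return L11 * L22 - L12 ** 2

def best_symmetric(b1, step=1e-4):
    # for beta2 = 0 the optimum is symmetric: mu = (m, m, 1 - 2m)
    return max((crit_D((m, m, 1 - 2 * m), b1, 0.0), m)
               for m in [i * step for i in range(int(0.5 / step) + 1)])
```

Under these assumptions, crit_D((1/3, 1/3, 1/3), 0, 0) returns 1/12 ≈ .083333 (the first row of Table 4.1), and best_symmetric(0.5) reproduces the β1 = .5 row: allocation ≈ (.357, .357, .286) with D ≈ .0659.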
Recall from Section 3.1 that a value of 1 for the parameter β1 is
"large". This is basically true for any parameter since x is in the
interval [-1,1]. For this reason, it is assumed that rarely will β1 or
β2 be larger than 1, and hence parameter values of less than or equal to
1 are of most interest. Parameter values of 2 and 3 are included pri-
marily to observe the change in the designs for large parameter values.

Table 4.1 gives only positive parameter values. The following
theorem shows that the table can also be used to find optimal designs
for negative values of β1 or β2. This theorem is similar to Theorem
3.6.
Theorem 4.1. Suppose that the D-optimal design for given β1 and β2
is a comparison of xi with xj a total of n_{xi,xj} times, for all pairs in
the design. Then

(i) the D-optimal design for (-β1) and (-β2) is the same as for
β1 and β2,

(ii) the D-optimal design for (-β1) and β2 is a comparison of
(-xi) with (-xj) a total of n_{xi,xj} times, for all pairs in
the design,

(iii) the D-optimal design for β1 and (-β2) is the same as in (ii).
Proof. From (3.4.3),

    λ_{-β1,-β2}(xi,xj) = λ_{β1,β2}(xi,xj) ,

where λ_{β1,β2}(xi,xj) is given in (3.3.5). From (3.3.2)-(3.3.4) and the
above result, it is clear that Λ11, Λ22, and Λ12 are invariant under
changing the sign of both β1 and β2. Therefore, since D = Λ11Λ22 - Λ12^2,
D is invariant under changing the sign of both β1 and β2, and so (i) is
proved.

To prove (ii), note that by (3.4.4),

    λ_{-β1,β2}(xi,xj) = λ_{β1,β2}(-xi,-xj) .

Again, from (3.3.2)-(3.3.4) and the above, changing β1 to (-β1) and the
pairs (xi,xj) to (-xi,-xj) will not change Λ11 nor Λ22. It changes
only the sign of Λ12, and hence does not change Λ12^2. Therefore (ii)
is proved.

To prove (iii), simply apply the results of (i) and (ii).
So, as in Section 3.3, the optimal design for both parameters nega-
tive is the same as the design for |β1| and |β2|. When exactly one
parameter is negative, the column headed μ1,0 becomes μ0,-1, and vice
versa.
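The bookkeeping in Theorem 4.1 can be written out directly. The fragment below is an illustrative Python sketch (not from the dissertation): negating both coordinates of every pair yields the design for the sign-flipped parameter, which is exactly how the two columns of the table trade places.

```python
def flip_design(design):
    # Theorem 4.1 (ii)/(iii): replace each comparison (x_i, x_j) by (-x_i, -x_j),
    # keeping the same number of comparisons n for each pair
    return [(-xi, -xj, n) for (xi, xj, n) in design]

def as_pair_set(design):
    # comparisons are unordered, so (1, 0) and (0, 1) label the same pair
    return {(frozenset((xi, xj)), n) for (xi, xj, n) in design}
```

For the Condition 1 pairs, flipping maps (1,0) to the pair containing -1 and 0, maps (0,-1) to the pair containing 0 and 1, and leaves the pair on {-1, 1} unchanged.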
Notice from Table 4.1 that for small parameter values, the optimal
design is inside the solid triangle in Figure 4.2, signifying that all three
pairs are to be compared. The optimal design approaches the boundary
as the parameters increase. Also, in all cases the optimal design was
inside or on the dotted triangle in Figure 4.2, signifying that no pair
should be compared more than half of the time.

Notice also from Table 4.1 that most of the relative efficiencies
are approximately 0.95 when β1 and β2 are both less than or equal to 1.
This indicates that the increase in efficiency for the generalization
considered under Condition 3 is slight. If an experimenter warrants
that a minimum number of levels should be used, then the advantage of
the designs found presently under Condition 1 is that they have only
three levels, as opposed to four levels for the other two conditions.
The fact that they are also about 95% efficient allows the present
designs to be useful in this situation. However, if there is no signi-
ficant advantage to having only three levels, then perhaps the designs
found under Conditions 2 and 3 are more useful, since they are slightly
more efficient.
Condition 2

In this subsection, locally D-optimal designs are found analytical-
ly for β2 = 0. The designs are local as described in Section 4.1. That
is, by the continuity of the criterion, the optimal designs are close
to optimal for β2 close to zero. It turns out that they are also local
in the sense that the designs found under Condition 3 have slightly
larger values of D for some values of β1. Since the criterion is a
maximization of D, the designs under Condition 3 are therefore occa-
sionally slightly better.

By the same argument that was used in the proof of Theorem 3.1,
the D-optimal design must be a symmetric design when β2 = 0, which implies
Λ12 = 0. So the search for a D-optimal design is reduced to a search
for a symmetric design which maximizes D = Λ11Λ22. Also, the proof of
Theorem 3.2 can be presently implemented to show that every pair in the
D-optimal design must include either x = 1 or x = -1.
From (3.3.7), λij is written as

    λij = ( 4 cosh^2( β1(xi - xj)/2 ) )^(-1) ,

where

    cosh(u) = (e^u + e^(-u))/2 ,    sinh(u) = (e^u - e^(-u))/2 .
The maximization of D is equivalent to the maximization of the follow-
ing redefined D:

    D = 4 Λ11 Λ22 = f(x) g(x) ,                                     (4.2.14)

where

    f(x) = Σ_{i=1}^{N/2} (1 - xi)^2 / cosh^2( β1(xi - 1)/2 ) ,      (4.2.15)

    g(x) = Σ_{i=1}^{N/2} (1 - xi^2)^2 / cosh^2( β1(xi - 1)/2 ) ,    (4.2.16)

    x' = (x1, x2, ... , x_{N/2}) .

Two useful, easily verified facts are

    ∂cosh( β1(xi - 1)/2 )/∂xi = (β1/2) sinh( β1(xi - 1)/2 ) ,
                                                                    (4.2.17)
    ∂sinh( β1(xi - 1)/2 )/∂xi = (β1/2) cosh( β1(xi - 1)/2 ) .
The partial derivative of D with respect to xi is then

    ∂D/∂xi = (∂f(x)/∂xi) g(x) + f(x) (∂g(x)/∂xi) ,                  (4.2.18)

where

    ∂f(x)/∂xi = -2(1 - xi)/cosh^2( β1(xi - 1)/2 )
                - β1(1 - xi)^2 sinh( β1(xi - 1)/2 )/cosh^3( β1(xi - 1)/2 ) ,   (4.2.19)

    ∂g(x)/∂xi = (4xi^3 - 4xi)/cosh^2( β1(xi - 1)/2 )
                - β1(1 - xi^2)^2 sinh( β1(xi - 1)/2 )/cosh^3( β1(xi - 1)/2 ) , (4.2.20)

    i = 1, ... , N/2.
Because of the complexity of handling these N/2 equations in N/2
unknowns, and because it appeared to be reasonable, the expression in
(4.2.18) was evaluated at the N/2-dimensional point xi = x, i = 1, ... , N/2,
set equal to zero, and solved. This produced a design of two equally
weighted comparisons, namely (x,1) and (-x,-1), for some x. The mechan-
ics of the procedure follow.
The partial derivative of D with respect to xi, evaluated at xi = x
for all i, is

    ∂D/∂xi = (N/2) [ ( -2(1 - x)/cosh^2 - β1(1 - x)^2 sinh/cosh^3 ) (1 - x^2)^2/cosh^2
             + ( (1 - x)^2/cosh^2 ) ( (4x^3 - 4x)/cosh^2 - β1(1 - x^2)^2 sinh/cosh^3 ) ] ,
                                                                    (4.2.21)

where cosh and sinh are shorthand for cosh( β1(x - 1)/2 ) and
sinh( β1(x - 1)/2 ), respectively. By setting the expression in (4.2.21)
equal to zero, multiplying it by cosh^5/( N(1 - x)^3(1 + x) ), and combin-
ing terms, we have

    (1 + 3x) cosh + β1(1 - x^2) sinh = 0 .                          (4.2.22)

Because of the transcendental nature of this equation, x cannot be
expressed as a closed-form function of β1. Consequently, an APL pro-
gram was written to find the solution to (4.2.22) for various choices
of β1. These designs are presented in Table 4.2 and are discussed at
the end of the present subsection.
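The root of (4.2.22) is easy to obtain numerically. The sketch below uses bisection in Python; the original APL program is not shown, so the method here is an assumption, not a transcription. Note that as β1 → 0 the equation reduces to (1 + 3x)cosh = 0, so the root approaches x = -1/3.

```python
import math

def stationary_x(b1, lo=-1/3 + 1e-12, hi=1.0 - 1e-9, tol=1e-12):
    # solve (1 + 3x) cosh(b1 (x-1)/2) + b1 (1 - x^2) sinh(b1 (x-1)/2) = 0,
    # i.e. equation (4.2.22), by bisection on (-1/3, 1)
    def g(x):
        u = b1 * (x - 1.0) / 2.0
        return (1.0 + 3.0 * x) * math.cosh(u) + b1 * (1.0 - x * x) * math.sinh(u)
    a, b = lo, hi
    fa = g(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        gm = g(m)
        if (gm > 0) == (fa > 0):
            a, fa = m, gm
        else:
            b = m
    return 0.5 * (a + b)
```

For β1 = 1 this gives x ≈ -0.16; for β1 near zero it returns values near -1/3, matching the limiting behavior noted above.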
In order for the solution to (4.2.22) to be a local maximum for D,
it must be shown that

    (cos a1, ... , cos a_{N/2}) [ ∂^2 D/∂xi∂xj ]|_{xi=x} (cos a1, ... , cos a_{N/2})' < 0

for any angles ai, such that 0 ≤ ai ≤ 2π, and

    Σ_{i=1}^{N/2} cos^2 ai = 1

(see Kaplan, page 128). This is equivalent to showing that the matrix
X is negative definite, where

    X = ( ∂^2 D/∂xi∂xj ) evaluated at x1 = ... = x_{N/2} = x ,       (4.2.23)

an N/2 x N/2 matrix.
From (4.2.18), the second-order partial derivatives are

    ∂^2 D/∂xi^2 = (∂^2 f/∂xi^2) g(x) + 2 (∂f/∂xi)(∂g/∂xi) + f(x)(∂^2 g/∂xi^2) ,
                                       i = 1, ... , N/2,             (4.2.24)

    ∂^2 D/∂xi∂xj = (∂^2 f/∂xi∂xj) g(x) + (∂f/∂xi)(∂g/∂xj) + (∂f/∂xj)(∂g/∂xi)
                   + f(x)(∂^2 g/∂xi∂xj)
                 = (∂f/∂xi)(∂g/∂xj) + (∂f/∂xj)(∂g/∂xi) ,
                                       i ≠ j, i,j = 1, ... , N/2,    (4.2.25)

since f and g are sums of functions of the individual xi, so their mixed
second partials vanish. From (4.2.19) and (4.2.20),

    ∂^2 f/∂xi^2 = ( 2 - β1^2(1 - xi)^2/2 )/cosh^2 + 4β1(1 - xi) sinh/cosh^3
                  + (3β1^2/2)(1 - xi)^2 sinh^2/cosh^4 ,              (4.2.26)

    ∂^2 g/∂xi^2 = ( 4(3xi^2 - 1) - β1^2(1 - xi^2)^2/2 )/cosh^2
                  - 8β1(xi^3 - xi) sinh/cosh^3
                  + (3β1^2/2)(1 - xi^2)^2 sinh^2/cosh^4 ,            (4.2.27)

    i = 1, ... , N/2,

where cosh and sinh are now evaluated at β1(xi - 1)/2.
Evaluating the second-order partial derivatives at the point xi = x,
for all i, (4.2.23) can be written as

    X = ( a  b  . . .  b
          b  a  . . .  b
          .  .   .     .
          b  b  . . .  a ) ,                                         (4.2.28)

where

    a = ∂^2 D/∂xi^2 |_{xi=x}
      = [ (∂^2 f/∂xi^2) g(x) + 2 (∂f/∂xi)(∂g/∂xi) + f(x)(∂^2 g/∂xi^2) ]|_{xi=x} ,   (4.2.29)

and

    b = ∂^2 D/∂xi∂xj |_{xi=x}   (any i ≠ j)
      = 2 [ (∂f/∂xi)(∂g/∂xi) ]|_{xi=x} .                             (4.2.30)

The complete expressions for a and b can be found from (4.2.29) and
(4.2.30), in addition to (4.2.15), (4.2.16), (4.2.19), and (4.2.20),
by substituting x for xi.
The matrix X is of a special form, as indicated by (4.2.28). It
can be written as

    X = Δ + γ y z' ,

where Δ = (a - b) I_{N/2}, γ = b, and y' = z' = 1' = (1, ... , 1). From Graybill
(page 187, Theorem 8.5.3), the characteristic equation is

    ( d + γ Σ_{i=1}^{N/2} yi zi - ω ) ( d - ω )^{N/2 - 1} = 0 ,

where d = a - b and ω denotes an eigenvalue. The eigenvalues of X are
therefore (a - b), of multiplicity N/2 - 1, and
d + γ Σ yi zi = (a - b) + (N/2) b. As discussed in the previous subsection,
the matrix X is negative definite if and only if all of the eigenvalues
are negative. An analytic proof that the two eigenvalues are negative
follows.
Consider first the eigenvalue (a - b), where from (4.2.29) and
(4.2.30),

    a - b = [ (∂^2 f/∂xi^2) g(x) + f(x)(∂^2 g/∂xi^2) ]|_{xi=x}

          = (N/2) { (1 - x^2)^2 [ (2 - β1^2(1 - x)^2/2) cosh^2
                      + 4β1(1 - x) sinh cosh + (3β1^2/2)(1 - x)^2 sinh^2 ]
                  + (1 - x)^2 [ (4(3x^2 - 1) - β1^2(1 - x^2)^2/2) cosh^2
                      - 8β1(x^3 - x) sinh cosh
                      + (3β1^2/2)(1 - x^2)^2 sinh^2 ] } / cosh^6 .    (4.2.31)
It needs to be shown that (a - b) is negative at the stationary point x,
which is the solution to (4.2.22). The relationship between x and β1
found in (4.2.22) is used in the above expression for (a - b) by substi-
tuting for sinh,

    sinh = -(1 + 3x) cosh / ( β1(1 - x^2) ) .                        (4.2.32)

Upon substituting (4.2.32) into (4.2.31), the eigenvalue (a - b) at the
stationary point x is

    a - b = (N/2) { [ 2 - β1^2(1 - x)^2/2 - 4(1 + 3x)/(1 + x)
                      + 3(1 + 3x)^2/(2(1 + x)^2) ] (1 - x^2)^2
                  + [ 4(3x^2 - 1) - β1^2(1 - x^2)^2/2 - 8x(1 + 3x)
                      + 3(1 + 3x)^2/2 ] (1 - x)^2 } / cosh^4 .        (4.2.33)
After factoring out (N/2)(1 - x)^3/cosh^4 and combining terms,

    a - b < 0
    if and only if                                                   (4.2.34)
    β1^2 ( (x - 1)(x + 1)^2 ) - (5x + 3) < 0 .

The first term in (4.2.34) has roots x = 1 and x = -1 and is nonpositive
for x in [-1,1]. The second term, -(5x + 3), is negative whenever
x > -3/5; this holds at the stationary point, since β1 sinh( β1(x - 1)/2 ) < 0
for x < 1 and β1 ≠ 0, so (4.2.22) forces 1 + 3x > 0, that is, x > -1/3.
(When β1 = 0, (4.2.22) gives x = -1/3 directly, and the second term equals
-4/3.) The result a - b < 0 then follows from (4.2.34).
The second eigenvalue is (a - b) + (N/2) b. From (4.2.19), (4.2.20),
and (4.2.30),

    b = [ 2(1 - x)^2(1 + x)/cosh^6 ] ( 2 cosh + β1(1 - x) sinh )
          ( 4x cosh + β1(1 - x^2) sinh )

      = -2(1 - x)^4/cosh^4 ,                                         (4.2.35)

the last step produced by again using the relationship in (4.2.32) and
combining terms. Then

    (N/2) b = -N(1 - x)^4/cosh^4 .                                   (4.2.36)
By factoring (N/2)(1 - x)^3/cosh^4 out of the second eigenvalue, it is
clear from (4.2.34) and (4.2.36) that

    (a - b) + (N/2) b < 0
    if and only if
    β1^2 ( (x - 1)(x + 1)^2 ) - (5x + 3) - 2(1 - x) < 0
    if and only if
    β1^2 ( (x - 1)(x + 1)^2 ) - (3x + 5) < 0 .                       (4.2.37)

The first term in (4.2.37) is unchanged from (4.2.34), and the second
term, -(3x + 5), is negative for every x in [-1,1]. Hence the second
eigenvalue is negative. Therefore X is negative definite, and the
solution to (4.2.22) is in fact a local maximum for D.
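The two eigenvalue signs can also be verified by brute force: form a and b from central finite differences of D = f(x)g(x) of (4.2.14)-(4.2.16) at the symmetric point, and evaluate (a - b) and (a - b) + (N/2)b directly. The Python sketch below is a numerical cross-check, not part of the original argument; it assumes N/2 = 3 and β1 = 1, with x ≈ -0.164 solving (4.2.22).

```python
import math

B1 = 1.0
HALF_N = 3  # N/2

def Dcrit(xs):
    # D = f(x) g(x), the redefined criterion of (4.2.14)-(4.2.16)
    f = sum((1 - x) ** 2 / math.cosh(B1 * (x - 1) / 2) ** 2 for x in xs)
    g = sum((1 - x * x) ** 2 / math.cosh(B1 * (x - 1) / 2) ** 2 for x in xs)
    return f * g

def eigenvalues_at(x, h=1e-5):
    # a = d^2 D/dx1^2 and b = d^2 D/dx1 dx2 by central differences;
    # the eigenvalues of X = (a-b)I + b*11' are (a-b) and (a-b) + (N/2)b
    xs = [x] * HALF_N
    def D_shift(steps):
        ys = list(xs)
        for i, s in steps:
            ys[i] += s
        return Dcrit(ys)
    d0 = Dcrit(xs)
    a = (D_shift([(0, h)]) - 2 * d0 + D_shift([(0, -h)])) / h ** 2
    b = (D_shift([(0, h), (1, h)]) - D_shift([(0, h), (1, -h)])
         - D_shift([(0, -h), (1, h)]) + D_shift([(0, -h), (1, -h)])) / (4 * h ** 2)
    return a - b, (a - b) + HALF_N * b
```

At the stationary point both returned eigenvalues are negative, in agreement with (4.2.34) and (4.2.37).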
The locally D-optimal designs for β1 = 0(.02)1,2(1)5 are presented
in Table 4.2. The value of the original D = Λ11Λ22 is also given for
comparison purposes. By comparing the present values of D to the cor-
responding values of D associated with the designs in Table 4.1, it can
be seen that the present designs offer a slight improvement for β2 = 0.
Theorem 3.3 is applicable here since β2 = 0, and so the optimal design
for any particular value of β1 is the same as the optimal design for
(-β1). Recall that for a given β1, the optimal design is to make half
of the total comparisons between x and 1, where x is found in the table,
and the other half between (-x) and (-1). The optimal value of x is
graphed as a function of |β1| for |β1| ≤ 1 in Figure 4.3.
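The claimed improvement over the restricted designs can be reproduced numerically. For the symmetric design {(x,1), (-x,-1)} with half weight each, D reduces to (1 - x)^2 (1 - x^2)^2 λ^2 with λ = ( 4 cosh^2( β1(x - 1)/2 ) )^(-1); maximizing this over x in Python (a sketch, not the original computation) recovers both the optimal level and a value of D slightly above the Table 4.1 entry .038814 for β1 = 1, β2 = 0.

```python
import math

def D_sym(x, b1):
    # D for the symmetric two-point design {(x, 1), (-x, -1)}, half weight each
    lam = 1.0 / (4.0 * math.cosh(b1 * (x - 1.0) / 2.0) ** 2)
    return (1.0 - x) ** 2 * (1.0 - x * x) ** 2 * lam ** 2

def best_D_sym(b1, step=1e-4):
    # brute-force maximization of D over x in [-1, 1)
    return max((D_sym(i * step - 1.0, b1), i * step - 1.0)
               for i in range(int(2.0 / step)))
```

For β1 = 1 this gives x ≈ -0.164 and D ≈ .0422, versus D = .038814 for the three-level restricted design of Table 4.1.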
Condition 3

A grid method was used to find D-optimal designs with no restric-
tions on the values of the two parameters. With β2 ≠ 0, the argument for
a symmetric design is no longer applicable. As a generalization of the
first subsection, the three comparisons (1,x1), (x2,-1), and (1,-1)
were allowed. A description of the grid method follows.
In general, if there are k independent unknowns, and the grid is
to have n values for each variable at each stage, then the grid method
is simply an evaluation of the function at each of the n^k points, and
the determination of which of these points maximizes (or minimizes) the
function. It then repeats the procedure, only with a finer grid cen-
tered at the best point previously found. This procedure is repeated
until the optimal point has been found to sufficient accuracy.
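The refinement loop just described can be sketched generically. The Python fragment below is illustrative (the dissertation's APL routine is not shown): each stage evaluates an n-per-axis grid around the current best point, recenters on the winner, and halves the grid span.

```python
def grid_maximize(fn, center, span, n=3, stages=40):
    # repeatedly evaluate an n^k grid centered at the current best point,
    # recenter on the winner, then shrink the grid span by half
    k = len(center)
    best = list(center)
    for _ in range(stages):
        offsets = [span * (i / (n - 1) - 0.5) for i in range(n)]
        grid = [[]]
        for _ in range(k):                      # build the n^k offset vectors
            grid = [pt + [o] for pt in grid for o in offsets]
        candidates = [list(best)]               # keep the incumbent
        candidates += [[b + o for b, o in zip(best, pt)] for pt in grid]
        best = max(candidates, key=fn)
        span /= 2.0
    return best
```

Keeping the incumbent point among the candidates makes the search monotone, so the criterion value never decreases from stage to stage.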
The procedure used for the present condition was essentially a
grid within a grid. At each step, the combinations of three values each
of x1 and x2 were taken. Initially these values were -0.5, 0, and 0.5.
For each of these 9 points, a grid method was used on the allocation of
the comparisons to the three pairs. This grid consisted of initial
increments of 0.1 in the allocation fractions. Notice that this is not
a full 11^3 grid since, for example, {μ_{1,x1} = μ_{x2,-1} = μ_{1,-1} = 0.5} is
not a possible allocation for the three pairs, because the allocations must
sum to 1. After the best of these possible allocations was found, a finer 5^2
grid was performed about the current best allocation. Note that it is
a 5^2 grid rather than a 5^3 grid because, when two of the allocation
proportions are known, the third is just 1 minus the sum of those two.
This was then repeated until the best allocation
fractions were accurate to .0001. The best of the 9 combinations of
x1 and x2 was taken, and then a finer 3^2 grid about the current best
combination of x1 and x2 was performed, using the same procedure des-
cribed in the present paragraph.
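The feasibility bookkeeping in the allocation grid is easy to make concrete: with increments of 0.1, only 66 of the 11^3 = 1331 candidate triples sum to 1. A minimal Python enumeration (illustrative, not the original code):

```python
def simplex_grid(steps=10):
    # all allocations (m1, m2, m3), each a multiple of 1/steps, summing to 1
    pts = []
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            pts.append((i / steps, j / steps, (steps - i - j) / steps))
    return pts
```

Only two coordinates are free, which is why the subsequent refinements are 5^2 grids rather than 5^3 grids.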
The entire procedure described in the preceding paragraph was
repeated until the values of x1 and x2 were also accurate to .0001.
D-optimal designs were found for combinations of β1 = 0(.1)1,2,3 and
β2 = 0(.1)1,2,3, and are presented in Table 4.3. Once again, the value
of D for N = 1 is also given in the table. Notice from the table that
when μ_{x2,-1} = 0, the value of x2 is immaterial, and therefore an asterisk
appears under the column for x2.
Theorem 4.1 is applicable, so designs for negative β1 or β2 can
also be found from the table. If both β1 and β2 are negative, then the
optimal design is identical to the one for positive β1 and β2. If,
however, exactly one of the parameters is negative, then the optimal
design is the same as the one for |β1| and |β2|, except that n_{1,x1}
changes to n_{-x1,-1} and n_{x2,-1} changes to n_{1,-x2}. For example, if
β1 = 0.6 and β2 = -0.3, then the D-optimal design would be to run .462N compar-
isons of the pair (-1,.23), .468N of the pair (-.25,1), and .070N of
the pair (1,-1).
Table 4.3 indicates that once again no pair should be compared more
than half the time. Also, the optimal design is inside the region
Table 4.1. D-optimal designs with restriction x = -1, 0, 1

 β1   β2   μ1,0   μ0,-1   μ1,-1     D       Efficiency
0 0 .333 .333 .333 .083333 .9613
.1 0 .334 .334 .332 .082506 .9613
.2 0 .337 .337 .327 .080098 .9613
.3 0 .341 .341 .318 .076314 .9612
.4 0 .348 .348 .305 .071460 .9608
.5 0 .357 .357 .287 .065895 .9599
.6 0 .369 .369 .263 .059986 .9583
.7 0 .384 .384 .232 .054068 .9555
.8 0 .405 .405 .190 .048426 .9536
.9 0 .433 .433 .135 .043283 .9542
1 0 .470 .470 .060 .038814 .9592
2 0 .500 .500 0 .011024 .9664
3 0 .500 .500 0 .002041 .7260
0 .1 .333 .333 .334 .083056 .9615
.1 .1 .336 .332 .332 .082233 .9617
.2 .1 .340 .333 .327 .079836 .9615
.3 .1 .345 .336 .319 .076070 .9613
.4 .1 .353 .342 .306 .071237 .9610
.5 .1 .362 .350 .288 .065694 .9602
.6 .1 .374 .362 .264 .059805 .9585
.7 .1 .389 .378 .233 .053905 .9558
.8 .1 .409 .399 .192 .048276 .9538
.9 .1 .435 .428 .137 .043141 .9543
1 .1 .470 .467 .063 .038676 .9591
2 .1 .500 .500 0 .011001 .9660
3 .1 .500 .500 0 .002039 .7257
0 .2 .332 .332 .336 .082233 .9619
.1 .2 .336 .330 .334 .081422 .9619
.2 .2 .342 .329 .329 .079060 .9619
.3 .2 .349 .331 .321 .075346 .9618
.4 .2 .357 .335 .308 .070575 .9615
.5 .2 .367 .343 .291 .065097 .9608
.6 .2 .378 .354 .268 .059269 .9594
.7 .2 .393 .370 .237 .053421 .9567
.8 .2 .411 .392 .197 .047833 .9544
.9 .2 .436 .421 .143 .042725 .9545
1 .2 .468 .461 .070 .038270 .9587
2 .2 .500 .500 0 .010931 .9647
3 .2 .500 .500 0 .002034 .7245
0 .3 .331 .331 .338 .080888 .9625
.1 .3 .337 .327 .337 .080098 .9625
.2 .3 .344 .324 .332 .077793 .9626

DESIGN OF PAIRED COMPARISON EXPERIMENTS WITH QUANTITATIVE INDEPENDENT VARIABLES BY WALTER WILLIAM OFFEN A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 1980
ACKNOWLEDGMENTS

I would like to thank Drs. Ramon C. Littell and William Mendenhall for encouraging me to work past the Master of Statistics degree to pursue a Ph.D. I sincerely appreciate their concern and beneficial advice. A special thanks goes to Dr. Ramon C. Littell, first of all for the attention and help he gave to me in my early years at the University of Florida while working for him at the Institute of Food and Agricultural Sciences. I especially thank him for the immense amount of time and effort he contributed to this project. His patience and sincere interest will always be remembered. Thanks go to my family for their understanding and support during the many years of my education. I am especially grateful to my wife, Susie, for her love and caring, which makes it all worthwhile. Finally, I would like to thank my typists, Susan and Walter Offen, who both did an outstanding job of typing this manuscript. Also, thanks go to Drs. John Cornell, Ramon Littell, Frank Martin, and Kenneth Portier for allowing my typists to use their office typewriters.
TABLE OF CONTENTS

                                                                   Page
ACKNOWLEDGMENTS                                                     iii
LIST OF TABLES                                                       vi
LIST OF FIGURES                                                     vii
ABSTRACT                                                           viii

CHAPTER
1  DESIGN OF PAIRED COMPARISON EXPERIMENTS: A LITERATURE REVIEW       1
   1.1  Introduction                                                  1
   1.2  Bradley-Terry Model                                           3
   1.3  Estimation of the Bradley-Terry Parameters and
        Corresponding Asymptotic Distribution                         6
   1.4  Designs of Factorial Paired Comparison Experiments           11
   1.5  Response Surface Designs                                     22
2  ASYMPTOTIC THEORY OF THE MAXIMUM LIKELIHOOD ESTIMATORS            26
   2.1  Introduction                                                 26
   2.2  Estimation                                                   27
   2.3  Asymptotic Distribution                                      29
3  MINIMIZATION OF THE VARIANCE OF A SINGLE PARAMETER ESTIMATE       34
   3.1  Introduction                                                 34
   3.2  Linear Model                                                 34
   3.3  Quadratic Model                                              38
        Minimization of var(β̂2)                                      38
        Equivalence of El-Helbawy and Bradley's method and
        minimizing the variance                                      50
   3.4  Cubic Model                                                  58
4  DESIGNS FOR FITTING A QUADRATIC MODEL                             65
   4.1  Introduction                                                 65
   4.2  D-optimal Designs
        Condition 1
        Condition 2
        Condition 3
   4.3  Average-variance Designs                                     99
        Condition 1                                                 100
        Condition 2                                                 101
        Condition 3
   4.4  Minimax Designs
        Condition 1
        Condition 2
        Condition 3
   4.5  Conclusion and Design Recommendations                       128
5  DESIGNS THAT PROTECT AGAINST BIAS                                135
   5.1  Introduction                                                135
   5.2  All Bias Designs
   5.3  Integrated Mean Square Error Designs                        138
6  DESIGNS FOR PRELIMINARY TEST ESTIMATORS                          145
   6.1  Introduction                                                145
   6.2  Average-variance Designs for Preliminary Test Maximum
        Likelihood Estimators                                       146
7  TWO-STAGE SAMPLING FOR MODEL FITTING                             150
   7.1  Introduction                                                150
   7.2  Optimal Error Rate                                          151
   7.3  Concluding Remarks                                          157

APPENDIX
A  COMPUTER PROGRAM THAT FINDS MAXIMUM LIKELIHOOD ESTIMATES
   OF π1, ... , πt, θ                                               165
B  COMPUTER PROGRAMS THAT FIND MAXIMUM LIKELIHOOD ESTIMATES
   OF β1, ... , βs                                                  170
BIBLIOGRAPHY                                                        178
BIOGRAPHICAL SKETCH                                                 181
LIST OF TABLES

TABLE                                                              Page
1.1   Efficiencies of balanced 2^2 and 2^3 factorial paired
      comparison designs                                             18
3.1   Designs which minimize var(β̂2)                                 48
3.2   Designs which minimize var(β̂3)                                 54
4.1   D-optimal designs with restriction x = -1, 0, 1                90
4.2   D-optimal (local) designs with β2 = 0                          94
4.3   D-optimal designs                                              95
4.4   Average-variance designs with restriction x = -1, 0, 1        105
4.5   Average-variance (local) designs with β2 = 0                  109
4.6   Average-variance designs                                      110
4.7   Minimax designs with restriction x = -1, 0, 1                 119
4.8   Minimax (local) designs with β2 = 0                           123
4.9   Minimax designs                                               124
4.10  Efficiency of designs                                         133
4.11  Bias of designs                                               134
5.1   J-criterion designs                                           141
5.2   Values of J                                                   143
LIST OF FIGURES

FIGURE                                                             Page
1.1  Depiction of Designs I and II                                   12
3.1  Designs which minimize var(β̂2)                                  49
4.1  Procedure summary                                               69
4.2  Depiction of possible designs for Condition 1                   75
4.3  Locally optimal designs when β2 = 0                            129
7.1  Minimization of the average mean square error - Uniform prior  161
7.2  Minimization of the average mean square error - Normal prior   162
7.3  D-optimality - Uniform prior                                   163
7.4  D-optimality - Normal prior                                    164
Abstract of Dissertation Presented to the Graduate Council of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

DESIGN OF PAIRED COMPARISON EXPERIMENTS WITH QUANTITATIVE INDEPENDENT VARIABLES

By Walter William Offen

August 1980

Chairman: Ramon C. Littell
Major Department: Statistics

An experiment in which the treatments are compared and ranked pairwise is called a paired comparison experiment. The Bradley-Terry model for preference probabilities with ties not allowed is used in this dissertation. This model defines treatment parameters, π1, ... , πt, such that the probability that treatment Ti is preferred over treatment Tj is equal to πi/(πi + πj). When the treatments are levels of continuous, quantitative variables, the logarithm of the Bradley-Terry parameters can be modeled as in a regression-type model, ln πi = Σk βk x_{ki}. There have been a large number of papers on the topic of paired comparisons, but only a few have considered experimental design. Past papers on design are reviewed in Chapter 1. In 1973 Springall presented the asymptotic distribution of the maximum likelihood estimators of β1, ... , βs, where s is the number of regression-type parameters in the model. Chapter 2 of the present dissertation contains a derivation of this asymptotic distribution. The remainder of the dissertation is a consideration of designs for situations where the treatments are levels of a single quantitative variable.

The asymptotic variance-covariance matrix is a function of the true parameter values. Hence, the optimal designs also depend on the parameters. In previous papers on design, the Bradley-Terry parameters were taken to be equal when evaluating the variances and covariances. The present dissertation considers the design problem for various parameter values. Notice that the general design problem is complex. It involves deciding how many levels and which particular levels should be chosen. Furthermore, it must be determined how often to compare each pair of treatments. In some cases this complexity was resolved. In other cases additional restrictions had to be imposed before optimal designs could be found.

Chapter 3 considers designs which minimize the variance of a single parameter estimate, β̂k. The linear, quadratic, and cubic models are considered. The optimal design in the linear case turns out to be comparisons of only the smallest level in the experimental region with the largest, for most values of the linear parameter. The designs for the quadratic and cubic cases depend more heavily on the parameter values than for the linear case.

Chapter 4 presents optimal designs for fitting a quadratic model. The optimality criteria considered are D-optimality, the minimization of the average variance of ln π̂x, and the minimization of the maximum variance of ln π̂x. These designs also depend heavily on the parameters, although some overall design recommendations are given in the last section.

The remaining chapters contain a brief discussion of related topics. Chapter 5 is a consideration of designs which protect against the bias present when the true model is cubic. Chapter 6 is a discussion of designs for preliminary test estimators. Chapter 7 is a consideration of two-stage sampling. The first stage is a design which results in a small variance of the quadratic parameter estimate. A test of hypothesis then determines whether the second-stage design should be one which is optimal for fitting a linear or a quadratic model. A discussion of the best error rate to use for the test of hypothesis is included.

The appendices contain computer programs that find maximum likelihood estimates of the Bradley-Terry parameters and β1, ... , βs. These programs are written in the languages APL and Fortran.
CHAPTER 1
DESIGN OF PAIRED COMPARISON EXPERIMENTS: A LITERATURE REVIEW

1.1 Introduction

An experiment in which the treatments are compared and ranked pairwise is called a paired comparison experiment. Such an experiment requires a number of subjects, which are usually individuals, but could conceivably be a machine, for instance. Each subject is required to compare two samples and decide which of the two is preferable for the attribute under study. For example, each subject could be asked to state which of two samples tastes better. Paired comparison experiments are often conducted in areas such as food testing, where it may be difficult for an individual to quantify his like or dislike of a sample, but he may be able to readily decide which of the two samples is preferred.

Suppose that an experimenter wants to determine which of four brands of coffee most consumers prefer. There are a number of experimental procedures available. One possibility is that each individual be required to taste and rank all four brands from least favorable to most favorable. However, sensory fatigue may be a problem because after tasting the second or third sample, it may be difficult to perceive any difference between the subsequent sample and the first sample tasted.
Another possible procedure is to have each individual assign to each brand a score on some numerical scale. Sensory fatigue may again be a problem with this experimental procedure since all four brands are tasted by each individual. Additionally, there may be a degree of arbitrariness to choosing a numerical score for each treatment.

A third possibility is to conduct a paired comparison experiment. The problems mentioned in the two preceding paragraphs are not present for paired comparison experiments. That is to say, paired comparison experiments have a minimal amount of individual guesswork involved. As pointed out in the first paragraph, the complete experiment involves a number of individuals, each one choosing out of two samples the one which is preferable. For the particular example with four brands of coffee, 24 subjects might be utilized, thereby resulting in 4 replicates of each of the 6 distinct pairs of treatments.

The treatments can have a factorial arrangement. However, factorial paired comparison experiments have often been analyzed ignoring the factorial arrangement of the treatments. For example, Larmond, Petrasovits, and Hill (1968) conducted a 2^3 factorial paired comparison experiment, but they did not test for interaction or main effects as normally is done with factorial experiments. Although Abelson and Bradley (1954) presented a theoretical development, only relatively recently have tractable methods been available for testing interaction and main effects in factorial paired comparison experiments.

In the case of quantitative treatments, an analogue of multiple linear regression analysis is appropriate. For example, instead of four distinctly different brands of coffee, there may only be one brand
of coffee with four different levels of an additive. The design of paired comparison experiments for fitting response surfaces with one independent variable is discussed in Chapters 3-7.

1.2 Bradley-Terry Model

Denote the t treatments as T1, T2, ... , Tt. Let nij be the number of times Ti and Tj are compared, i < j, i,j = 1, ... , t. We write Ti → Tj for "Ti is selected over Tj", where selection is based on the particular attribute under study. A general model defines t(t-1)/2 functionally independent parameters, 0 ≤ πij ≤ 1, such that

    P(Ti → Tj) = πij ,  πij + πji = 1,  i ≠ j, i,j = 1, ... , t.        (1.2.1)

Each distinct pair of treatment comparisons is a set of independent Bernoulli trials. The entire experiment is a combination of the t(t-1)/2 independent sets.

Bradley and Terry (1952) proposed a basic model for paired comparisons. A treatment parameter πi is associated with each treatment Ti, i = 1, ... , t. The Bradley-Terry model is then defined to be

    P(Ti → Tj) = πij = πi/(πi + πj) ,  i ≠ j, i,j = 1, ... , t.         (1.2.2)

The right-hand side of (1.2.2) is invariant under scale changes, so a single constraint is imposed to make the treatment parameters unique. Two popular choices are Σπi = 1, or Σ ln πi = 0. This constraint is replaced with another constraint in Chapters 2-7.
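For concreteness, the Bradley-Terry preference probability (1.2.2) can be written as a one-line function. The Python sketch below is illustrative only and is not part of the original text:

```python
def bt_prob(pi_i, pi_j):
    # P(T_i -> T_j) = pi_i / (pi_i + pi_j), the Bradley-Terry model (1.2.2)
    return pi_i / (pi_i + pi_j)
```

The scale invariance noted above is visible here: multiplying both parameters by the same positive constant leaves the probability unchanged, which is why a normalizing constraint such as Σπi = 1 is needed.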
Another model for paired comparisons was presented by Thurstone (1927). He used the concept of a subjective continuum on which only order can be perceived. Let μi represent the location point on the continuum for treatment Ti, i = 1, ... , t. An individual receives a sensation Xi in response to treatment Ti, with Xi distributed Normal(μi, 1). The probability that Tj is preferred over Ti is then

    P(Tj → Ti) = P(Xj > Xi)
               = (2π)^(-1/2) ∫ from -(μj - μi) to ∞ of e^(-y^2/2) dy ,
                                          i ≠ j, i,j = 1, ... , t.       (1.2.3)

Bradley (1953) noted that replacement of the normal density function in (1.2.3) by the logistic density function yields

    P(Tj → Ti) = (1/4) ∫ from -(ln πj - ln πi) to ∞ of sech^2(y/2) dy
               = πj/(πi + πj) ,           i ≠ j, i,j = 1, ... , t.       (1.2.4)

This is the Bradley-Terry model with μi = ln πi, i = 1, ... , t. Thompson and Singh (1967) gave a psychophysical consideration by assuming a treatment T stimulates unobservable signals U1, ... , Um which are then transmitted to the brain by m sense receptors. The Ui are independent and identically distributed, and m is taken to be fixed and large. The two variables considered for the response X to treatment T are

(i) X is the average of the signals U1, ... , Um,
(ii) X is the maximum of the signals U1, ... , Um.
By the Central Limit Theorem, asymptotic normality results for X in (i), which results in Thurstone's model. The random variable in (ii) has one of three limiting distributions, depending on the distribution of the Ui. If Ui, for example, is distributed exponentially, then the asymptotic distribution of X is the extreme value distribution. In any case, however, the Bradley-Terry model in (1.2.4) results from (ii).

Rao and Kupper (1967) generalized the Bradley-Terry model to allow for ties. A tie parameter θ ≥ 1 is introduced, and the model becomes

    P(Ti → Tj) = (1/4) ∫ from -(ln πi - ln πj) + η to ∞ of sech^2(y/2) dy
               = πi/(πi + θπj) ,

    P(Ti = Tj) = (1/4) ∫ from -(ln πi - ln πj) - η to -(ln πi - ln πj) + η
                       of sech^2(y/2) dy
               = (θ^2 - 1)πiπj / ( (πi + θπj)(θπi + πj) ) ,
                                          i ≠ j, i,j = 1, ... , t,       (1.2.5)

where η = ln θ is the threshold of perception for the judges. That is to say, if it is hypothesized that the two samples generate a random variable d which measures the difference between the samples, and furthermore that |d| < η, then the judge will not be able to discern a difference between the two samples, and will thus declare a tie. Note that the Bradley-Terry model is a special case with θ = 1. Another model to handle ties was proposed by Davidson (1970).
The Bradley–Terry model has received much attention in the statistical literature since it was introduced in 1952. For the remainder of this dissertation, the Bradley–Terry model with ties not allowed is used.

1.3 Estimation of the Bradley–Terry Parameters and Corresponding Asymptotic Distribution

When the model was proposed by Bradley and Terry, it was assumed that all of the n_ij were equal. Dykstra (1960) extended the method to permit unequal n_ij. The parameters are estimated by the maximum likelihood method, and likelihood ratio tests are developed. Assuming independence of the paired comparisons, the complete likelihood function is

    L = K ∏_{i=1}^{t} π_i^{a_i} / ∏_{i<j} (π_i + π_j)^{n_ij},    (1.3.1)

where a_i is the total number of times T_i is preferred and K is a constant not involving the parameters. Differentiating ln L and equating the derivatives to zero yields the likelihood equations

    a_i/p_i − Σ_{j≠i} n_ij/(p_i + p_j) = 0,  i=1,…,t,    (1.3.2)

where p_i denotes the maximum likelihood estimate of π_i, subject to the constraint Σ_i p_i = 1.
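The likelihood must be maximized iteratively; a minimal Python sketch of the iterative scheme described in (1.3.3)–(1.3.4) below, applied to a made-up two-treatment data set (function name and data are illustrative only):

```python
def bt_mle(n, m, iters=200):
    """Iterative MLE of Bradley-Terry parameters (scheme of (1.3.3)-(1.3.4)).

    n[i][j]: number of comparisons of T_i and T_j (n[i][j] == n[j][i]),
    m[i][j]: number of times T_i was preferred over T_j.
    Convergence requires Ford's condition (Assumption 1.1 below).
    """
    t = len(n)
    a = [sum(m[i]) for i in range(t)]          # a_i = total wins of T_i
    p = [1.0 / t] * t                          # starting values p_i = 1/t
    for _ in range(iters):
        # unnormalized update p*_i = a_i / sum_{j != i} n_ij/(p_i + p_j)
        p_star = [a[i] / sum(n[i][j] / (p[i] + p[j])
                             for j in range(t) if j != i)
                  for i in range(t)]
        s = sum(p_star)                        # renormalize so sum(p) = 1
        p = [v / s for v in p_star]
    return p

# illustrative data: T_1 beat T_2 in 7 of 10 comparisons
p = bt_mle(n=[[0, 10], [10, 0]], m=[[0, 7], [3, 0]])
print(p)  # approximately [0.7, 0.3]
```

With two treatments the fixed point is reached after a single step, since the preference probability estimate equals the observed proportion 7/10.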
These equations must be solved iteratively (see Appendix A for an APL computer program which finds maximum likelihood estimates of the treatment parameters). A brief description of the iterative procedure follows. Letting p_i^{(k)} be the k-th approximation to p_i, then

    p_i^{(k)} = p_i^{*(k)} / Σ_{j=1}^{t} p_j^{*(k)},    (1.3.3)

where

    p_i^{*(k)} = a_i / Σ_{j≠i} n_ij/(p_i^{(k−1)} + p_j^{(k−1)}),  k=1,2,….    (1.3.4)

The iteration can be started with p_i^{(0)} = 1/t, i=1,…,t. Dykstra (1956) suggested a method of obtaining good initial parameter estimates p_i^{(0)}, i=1,…,t. Ford (1957) proposed the model independently and proved that the iterative procedure converges to a unique maximum of L if the following assumption is true.

Assumption 1.1. For every possible partition of the treatments into two nonempty sets S₁ and S₂, some treatment in S₁ is preferred at least once to some treatment in S₂.

If Assumption 1.1 does not hold, then a_i must be zero for at least one i. If exactly one a_i is zero, say a_{i₀}, there may still be a unique solution with p_{i₀} = 0. However, Ford proceeded to explain that if the set S which violates Assumption 1.1 contains two or more treatments, then the corresponding estimates of the Bradley–Terry treatment
parameter must be zero, and consequently individual members of S could not be compared. This fact is also evident by noticing that in (1,3.4) , * (k) p. would be zero for at least two values of i, and so by (1.3.3), (k) p. would also be zero for the same values of i. Then the next time in the iterative procedure that (1,3.4) is executed, the denominator in the summation would be zero for some i and j , and hence the iterative procedure would not converge , The asymptotic theory and tests of hypothesis require the following additional assumption. Assumption 1.2 . For every partition of the treatments into two nonempty subsets S, and S^, there exists a treatment T.eS, and a treatment 12 1 1 T.oS^ such that U. .>0, where : 2 13 y^ = lim Â— ^ , ij^j, i,j=l, 1, N = = Z Z ".. Bradley (1955) investigated asymptotic properties of the treatment parameters with each n. ,=n. Davidson and Bradley (1970) considered a multivariate model where each subject chooses one of two samples on more than one attribute. As a special case of the multivariate model, Davidson and Bradley obtained a more general result, allowing the n. . to be different. They showed that the asymptotic joint distribution of v''N(p TT ), ... ,i'N(p T7 ) is the singular multivariate normal distribution of dimensionality (t1) in a space of t dimensions, with zero mean
vector and variance–covariance matrix Σ = (σ_ij), where σ_ij is the cofactor of λ_ij in the bordered matrix

    [ (λ_ij)_{t×t}   1_{t×1} ]
    [ 1′_{1×t}       0       ]    (1.3.5)

1′ = (1,1,…,1) is the t-dimensional unit row vector, and

    λ_ii = Σ_{j≠i} μ_ij π_j / [π_i (π_i + π_j)²],  i=1,…,t,
    λ_ij = −μ_ij/(π_i + π_j)²,  i≠j, i,j=1,…,t.

It is apparent that the variances and covariances in (1.3.5) depend on the true treatment parameters. Hence estimates of the variances and covariances are usually found by substituting the consistent maximum likelihood estimates for π₁, …, π_t into (1.3.5). In the special case with π_i = 1/t and n_ij = n for all i,j,

    σ_ii = 2(t−1)²/t³  and  σ_ij = −2(t−1)/t³,  i≠j.    (1.3.6)

Use of (1.3.6) is adequate if π₁, …, π_t are approximately equal.
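The equal-parameter matrix in (1.3.6) is singular of rank (t−1), consistent with the constraint Σ p_i = 1 (each row sums to zero). A quick numerical check in Python, assuming the denominators t³ as reconstructed above from the garbled scan:

```python
import numpy as np

t = 4
# Sigma from (1.3.6): diagonal 2(t-1)^2/t^3, off-diagonal -2(t-1)/t^3
sigma = np.full((t, t), -2.0 * (t - 1) / t**3)
np.fill_diagonal(sigma, 2.0 * (t - 1)**2 / t**3)

row_sums = sigma.sum(axis=1)
rank = np.linalg.matrix_rank(sigma)
print(row_sums)   # all zero
print(rank)       # t - 1 = 3
```

The rank deficiency of exactly one is the "singular multivariate normal of dimensionality (t−1)" referred to in the Davidson–Bradley result.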
The major test proposed is a test of treatment equality,

    H₀: π₁ = π₂ = ⋯ = π_t = 1/t,
    Hₐ: π_i ≠ π_j for some i,j, i≠j, i,j=1,…,t.

The likelihood ratio statistic is

    −2 ln Λ = 2N ln 2 + 2 Σ_i a_i ln p_i − 2 Σ Σ_{i<j} n_ij ln(p_i + p_j).

Under the sequence of local alternatives π_iN = 1/t + δ_iN/√N, with lim_{N→∞} δ_iN = δ_i, i=1,…,t, the statistic −2 ln Λ is asymptotically noncentral chi-square with (t−1) degrees of freedom and noncentrality parameter λ, a quadratic form in δ₁, …, δ_t.
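A short Python sketch of the likelihood ratio statistic for the equal-preference null, evaluated on a hypothetical two-treatment data set (all values illustrative; the reference distribution has t−1 degrees of freedom):

```python
import math

def neg2_log_lambda(n, m, p):
    # -2 ln(Lambda) = 2N ln2 + 2 sum_i a_i ln p_i - 2 sum_{i<j} n_ij ln(p_i + p_j)
    t = len(p)
    N = sum(n[i][j] for i in range(t) for j in range(i + 1, t))
    a = [sum(m[i]) for i in range(t)]
    stat = 2.0 * N * math.log(2.0)
    stat += 2.0 * sum(a[i] * math.log(p[i]) for i in range(t))
    stat -= 2.0 * sum(n[i][j] * math.log(p[i] + p[j])
                      for i in range(t) for j in range(i + 1, t))
    return stat

# T_1 preferred over T_2 in 7 of 10 comparisons; the MLEs are p = (0.7, 0.3)
stat = neg2_log_lambda([[0, 10], [10, 0]], [[0, 7], [3, 0]], [0.7, 0.3])
print(stat)  # about 1.646, to be compared with chi-square on t - 1 = 1 d.f.
```

For this small data set the statistic reduces to 2[7 ln(0.7/0.5) + 3 ln(0.3/0.5)], the familiar binomial likelihood ratio form.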
1.4 Designs of Factorial Paired Comparison Experiments

Littell and Boyett (1977) compared two designs for R×C factorial paired comparison experiments. The two designs considered are:

Design I: Consists of independent paired comparisons of each of the RC(RC−1)/2 pairs of treatments.

Design II: Consists of comparisons between pairs of treatment combinations differing only in the level of one factor, i.e., comparisons of two treatments in the same row or column.

Note that there are RC(RC−1)/2 distinct pairs of treatments for Design I, but only R·[C(C−1)/2] + C·[R(R−1)/2] = RC(R+C−2)/2 distinct pairs of treatments for Design II. In the 4×4 case, for example, Design I requires 120 distinct paired comparisons whereas Design II requires only 48. The two designs could entail the same total number of paired comparisons, but Design II requires preparation of fewer distinct pairs of samples. Design II is also preferable from a computational point of view, because its analysis can be carried out with a computer program capable of handling the one-way classification with unequal replication of pairs. A specialized factorial program is required for Design I to obtain maximum likelihood estimates under the main-effects hypothesis H₀: π_ij = α_i β_j, where π_ij is the Bradley–Terry treatment parameter given by (1.4.1). The two designs are illustrated for the 2×2 case in Figure 1.1. The comparisons made with Design I are indicated by the solid arrows. The comparisons made with Design II are indicated by the dotted arrows.
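The pair counts for the two designs are easy to tabulate; a small Python sketch:

```python
def design_I_pairs(R, C):
    # all distinct pairs of the RC treatment combinations: RC(RC-1)/2
    return R * C * (R * C - 1) // 2

def design_II_pairs(R, C):
    # pairs within each row plus pairs within each column: RC(R+C-2)/2
    return R * (C * (C - 1) // 2) + C * (R * (R - 1) // 2)

for R, C in [(2, 2), (4, 4)]:
    print(R, C, design_I_pairs(R, C), design_II_pairs(R, C))
# the 4x4 case gives 120 distinct pairs for Design I and 48 for Design II
```

The 2×2 case gives 6 and 4 distinct pairs respectively, matching Figure 1.1.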
Figure 1.1. Paired comparisons for Designs I and II in the 2×2 factorial (rows: Factor A at low and high; columns: Factor B at low and high).
Designs I and II were compared by Littell and Boyett on the basis of the asymptotic relative efficiencies, in the Bahadur sense, of their respective likelihood ratio tests. That is, test statistics are compared on the basis of the ratio of the exponential rates of convergence to zero of their observed significance levels. In general, the Bahadur efficiency of test 1 relative to test 2 is defined to be

    c₁(θ)/c₂(θ) = lim_{n→∞} (−2 log L_n^{(1)}) / lim_{n→∞} (−2 log L_n^{(2)}),

where L_n^{(i)} is the observed significance level of test i. The function c_i(θ) is called the exact slope of test i. Usually Bahadur efficiency is used to compare two test statistics that are based on the same sample space. However, its use is equally appropriate when the sample spaces are different.

The two designs should be compared on the basis of an equivalent number of comparisons in each. Therefore, a "single observation" with Design I will be taken as n = R+C−2 comparisons of each pair in the design, and a "single observation" with Design II will be taken as n = RC−1 comparisons of each pair in the design. This gives ½RC(RC−1)(R+C−2) total comparisons in each design. Let T_ij represent the treatment with the first factor at level i and the second factor at level j. The Bradley–Terry model is then

    P(T_ij → T_kl) = π_ij/(π_ij + π_kl),  i,k=1,…,R, j,l=1,…,C.    (1.4.1)
The following tests of hypotheses were considered by Littell and Boyett.

Test 1: H₀: π_ij = 1/RC,  Hₐ: π_ij = α_i β_j. This is a test for detecting two main effects versus no treatment differences. Efficiencies were tabulated for the 2×2 case (Littell and Boyett (1977), Table 1). The table showed that Design I is more efficient than Design II for most combinations of α and β.

Test 2: H₀: π_ij = 1/RC,  Hₐ: π_ij = β_j/R. This is a test for detecting a single main effect versus no treatment differences. For this test, Design I was shown to be uniformly more efficient than Design II.

Test 3: H₀: π_ij = β_j/R,  Hₐ: π_ij = α_i β_j. This is a test for detecting two main effects versus a single main effect. Efficiencies were again tabulated for the 2×2 case. The table showed that Design I is more efficient than Design II for most combinations of α and β.

Test 4: H₀: π_ij = α_i β_j,  Hₐ: π_ij arbitrary. This is a test for detecting interaction versus two main effects. The results showed that Design II is uniformly superior to Design I for detecting interaction.

Summarizing, when testing for main effects (Tests 1–3), Design I is usually more efficient than II. When testing for interaction, Design II is uniformly more efficient than I.
Littell and Boyett suggested that the experiment might be conducted in two stages. The first stage is a number of replicates of Design II followed by a test for interaction. If interaction is significant, the second stage is completed by again using Design II, and simple effects are analyzed by applying one-way analyses within rows and columns. If interaction is not significant, then the second stage consists of replicates of the pairs in Design I that are not in Design II, and main effects are analyzed. The total number of comparisons to be made in the experiment can be divided into the two parts such that, after the experiment is completed, the end result is either a complete Design I or a complete Design II. For example, suppose the treatments form a 2×2 factorial, and resources are available for 24 comparisons. In the first stage, the experimenter runs 4 replicates of the four comparisons in Design II (see Figure 1.1). If interaction is significant, two more replicates of the four comparisons in Design II are run. If interaction is not significant, four replicates of the two comparisons in Design I that are not in Design II are run; the latter results in the same comparisons as if a single-stage experiment using Design I had been conducted.

Quenouille and John (1971) also considered designs for 2ⁿ factorials in paired comparisons. However, they assumed an individual is asked to assign a number on a scale to signify the degree of difference between the two samples, instead of merely deciding which sample possesses more of the particular attribute under study. This model was developed by Scheffé (1952). In particular, Quenouille and John discussed the following 2³ factorial. The 28 possible paired comparisons were divided into 7 sets
of 4 blocks (pairs of treatment combinations) as follows:

    [Sets (1)–(7): each set consists of the four pairs of treatment combinations differing by a fixed factorial effect; e.g., set (1) is generated by the initial block (I, ABC) and set (5) by the initial block (I, A).]    (1.4.2)

The initial
block for set (1) is (I, ABC), so this set provides no information on the effects AB, AC, and BC, i.e., on any of the first-order interactions. Similarly, the initial block for set (5) in (1.4.2) is (I, A), so this set provides no information on B, C, and BC. This concept of no information on B, C, and BC from data in set (5) may be viewed as follows. If the responses from comparisons in set (5) are modeled as

    d_(I,A)     = (μ + a + ε_A) − (μ + ε_I) = a + ε_(I,A),
    d_(B,AB)    = (μ + a + b + (ab) + ε_AB) − (μ + b + ε_B) = a + (ab) + ε_(B,AB),
    d_(C,AC)    = (μ + a + c + (ac) + ε_AC) − (μ + c + ε_C) = a + (ac) + ε_(C,AC),
    d_(BC,ABC)  = (μ + a + b + c + (ab) + (ac) + (bc) + (abc) + ε_ABC) − (μ + b + c + (bc) + ε_BC)
                = a + (ab) + (ac) + (abc) + ε_(BC,ABC),

where all of the ε's are independent with expected value zero, then it is clear that b, c, and (bc), corresponding to the effects B, C, and BC, are not estimable from the four comparisons in set (5).

Quenouille and John define efficiency as follows. Suppose the design is made up of r sets, s (s ≤ r) of which give information on some particular effect. Then s/r is the fraction of the comparisons made that give information on that particular effect. On the other hand, suppose that each set is run exactly once, i.e., each pair of treatments is compared. It can be shown that there are a total of (2ⁿ − 1) sets, 2^{n−1} of which give information on any particular effect. In this case, 2^{n−1}/(2ⁿ − 1) is the fraction of the comparisons made that give information on that effect.
Table 1.1. Efficiencies of balanced 2² and 2³ factorial paired comparison designs.
are estimated with the same efficiency, and so on. The table indicates that in the 2² case, Design II is efficient for estimating the interaction effect. This result is the same as the result Littell and Boyett obtained using Bahadur efficiency.

El-Helbawy and Bradley (1978) considered treatment contrasts in paired comparison experiments. The theory developed is completely general, but they specifically showed how optimal designs can be found in the case of a 2³ factorial. Their optimality criterion is the maximization of the chi-square noncentrality parameter of the likelihood ratio test, which thereby maximizes the asymptotic power of the test. Let a = (a₁, a₂, a₃), where a_i represents the level of factor i, i=1,2,3, and each a_i is 0 or 1. Then the treatment parameters are defined by El-Helbawy and Bradley to be

    π_a = π · π^{1}_{a₁} π^{2}_{a₂} π^{3}_{a₃} π^{12}_{a₁a₂} π^{13}_{a₁a₃} π^{23}_{a₂a₃} π^{123}_{a₁a₂a₃},    (1.4.3)

for all a. The factorial parameters are defined by taking logarithms of the 8 treatment parameters given by (1.4.3). So, from (1.4.3),

    γ_a = ln π_a = γ + Σ_{i=1}^{3} γ^{i}_{a_i} + Σ_{i<j} γ^{ij}_{a_i a_j} + γ^{123}_{a₁a₂a₃}.    (1.4.4)
The factorial parameters are γ, γ^{1}_{a₁}, γ^{2}_{a₂}, γ^{3}_{a₃}, γ^{12}_{a₁a₂}, γ^{13}_{a₁a₃}, γ^{23}_{a₂a₃}, and γ^{123}_{a₁a₂a₃}. These parameters are subject to the usual analysis of variance constraints, namely

    Σ_{a_i=0}^{1} γ^{i}_{a_i} = 0,  i=1,2,3;
    Σ_{a_i=0}^{1} γ^{ij}_{a_i a_j} = Σ_{a_j=0}^{1} γ^{ij}_{a_i a_j} = 0,  i<j, i,j=1,2,3;
    Σ_{a₁=0}^{1} γ^{123}_{a₁a₂a₃} = Σ_{a₂=0}^{1} γ^{123}_{a₁a₂a₃} = Σ_{a₃=0}^{1} γ^{123}_{a₁a₂a₃} = 0.

In testing for the two-factor interaction between the first two factors, the null hypothesis is H₀: γ^{12}_{a₁a₂} = 0, a₁, a₂ = 0,1.
Let

    n_ij/N = a, if comparing T_i and T_j gives no information (in the sense discussed by Quenouille and John) on the two-factor interaction between the first two factors,
    n_ij/N = b, otherwise,

where n_ij is defined in Section 1.2 and N = Σ Σ n_ij. Then 12a + 16b = 1, since 16 of the 28 distinct pairs give information on the two-factor interaction. El-Helbawy and Bradley then proceeded to show that b should be taken as large as possible in order to maximize the noncentrality parameter. This gives the result b = 1/16, a = 0. So the optimal design is to compare only the treatments that give information on the two-factor interaction between the first two factors. These pairs of treatments are the sets (3), (4), (5), and (6) in (1.4.2). This result is the same as Quenouille and John would obtain if they wanted to find an optimal design to estimate the AB interaction, since using these 4 sets maximizes the efficiency of the AB interaction under their definition of efficiency. It is also analogous to the result Littell and Boyett obtained, because the treatments compared in either case differ either in the level of A or in the level of B, but not both.

El-Helbawy and Bradley noted that the result discussed in the two preceding paragraphs still holds if other specified factorial effects are assumed to be null (e.g., the three-factor interaction). While the statistics for the two tests may differ, their limit distributions are the same under either null hypothesis.
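The count "16 of the 28 distinct pairs" can be reproduced by enumeration. One way to formalize "gives information", consistent with the examples in the text (the set generated by (I, ABC) carries no information on AB, AC, or BC, and the set generated by (I, A) none on B, C, or BC), is that a pair whose treatments differ by the effect word w informs on effect u exactly when |u ∩ w| is odd. The Python sketch below uses that rule, which is my paraphrase rather than the authors' own statement:

```python
from itertools import combinations

factors = "ABC"
# treatment combinations of the 2^3 factorial = subsets of {A, B, C}
treatments = [frozenset(s) for r in range(4) for s in combinations(factors, r)]

def informs(pair, effect):
    # difference word of a pair = symmetric difference of its two combinations;
    # the pair informs on `effect` iff its overlap with the word is odd
    w = pair[0] ^ pair[1]
    return len(w & frozenset(effect)) % 2 == 1

pairs = list(combinations(treatments, 2))
ab_pairs = [p for p in pairs if informs(p, "AB")]
print(len(pairs), len(ab_pairs))   # 28 pairs in all, 16 informative on AB
# fraction 16/28 = 4/7, agreeing with 2**(n-1) / (2**n - 1) for n = 3
```

The 16 informative pairs are exactly those whose difference word is A, B, AC, or BC, i.e., the four sets identified as (3), (4), (5), and (6).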
El-Helbawy and Bradley also considered simultaneously testing for the presence of the AB, AC, and ABC interactions. In this case the null hypothesis is H₀: γ^{12}_{a₁a₂} = γ^{13}_{a₁a₃} = γ^{123}_{a₁a₂a₃} = 0 for all a₁, a₂, a₃, expressible as a set of orthogonal contrasts in the γ_a with coefficients ±1. By the same method they concluded that the optimal design is to make the following comparisons (using Quenouille and John's notation): (I, A), (B, AB), (C, AC), and (BC, ABC). Note that this is set (5) in (1.4.2). Note also that this is the only set that contains information on all three of the effects AB, AC, and ABC. So this design is also optimal using Quenouille and John's definition of efficiency.

Beaver (1977) considered the analysis of factorial paired comparison experiments using the weighted least squares approach of Grizzle, Starmer, and Koch (1969). He considered a log-linear transformation of the π_i's. This approach produces non-iterative estimates of the π_i's, and the usual tests for a factorial experiment are easily implemented. However, Beaver gave no consideration to experimental design.

1.5 Response Surface Designs

Springall (1973) considered response surface fitting using the Bradley–Terry model with ties allowed, as generalized by Rao and Kupper (1967). The model is given in (1.2.5).
The logarithms of the treatment parameters are defined to be functions of continuous, independent variables x₁, x₂, …, x_s, measurable without error, i.e.,

    ln π_i = Σ_{k=1}^{s} β_k x_{ki},  i=1,…,t.    (1.5.1)

Note that some of the variables can be functions of other variables. For example, s=2 and x₂ = x₁² results in the parameterization ln π_i = β₁x_{1i} + β₂x_{1i}². The parameters θ, β₁, …, β_s are estimated by the maximum likelihood method. Appendix B has an APL program and a Fortran program which produce maximum likelihood estimates of β₁, …, β_s, but which assume there are no ties.

Springall presented the following definition, theorem, and example.

Definition 1.1. Analogue designs are defined to be designs in which the elements of the paired comparison variance–covariance matrix are proportional to the elements of any classical response surface variance–covariance matrix with the same design points and an equal number of replicates at each design point.

The advantage of such designs is that they enable certain desirable properties found in classical response surface designs (e.g., rotatability, orthogonality), which depend on the relative sizes of the elements of the variance–covariance matrix, to be readily produced. For example, consider the classical regression model y = β₀ + β₁x + β₂x². Then an experiment can be designed in such a way that the variance–covariance matrix of the least squares estimators β̂₀, β̂₁, and
β̂₂ possesses an optimal form. For instance, the classical design with an equal number of replicates at each of the four levels x = −2, −1, 1, 2 can be shown to be rotatable, and so the analogue paired comparison designs discussed in Example 1.1 below have the same property, i.e., var(ln π̂_x) = var(ln π̂_{−x}).

The analogous paired comparison model is ln π_x = β₁x + β₂x². An analogue paired comparison design has a corresponding variance–covariance matrix of the maximum likelihood estimators β̂₁, β̂₂ such that each element is a constant times the corresponding element of the least squares variance–covariance matrix. For the particular quadratic model under discussion, writing β̂₁*, β̂₂* for the least squares estimators, var(β̂₁) = c·var(β̂₁*), var(β̂₂) = c·var(β̂₂*), and cov(β̂₁, β̂₂) = c·cov(β̂₁*, β̂₂*), for some c.

Springall described a method, based on linear programming techniques, that produces an analogue design with minimum elements of the variance–covariance matrix. The following theorem shows how one can produce an analogue design, though not necessarily one with minimum elements of the variance–covariance matrix.

Theorem 1.1. If a classical response surface design has certain properties dependent on the relative sizes of the elements of the variance–covariance matrix, then the analogue paired comparison design will have the same properties if n_ij ∝ 1/φ_ij, where

    φ_ij = π_i π_j/(π_i + π_j)²,  i<j.
Example 1.1. Consider the quadratic model ln π_x = β₁x + β₂x², with design coordinates x = −2, −1, 1, 2. Assume θ = 1 and β₁ = β₂ = 0, and set N = 150. Springall showed that an analogue design is given by n_ij = 150/6 = 25, i<j.
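The equal-allocation design of Example 1.1 can be checked numerically: with all treatment parameters equal, the paired comparison matrix of the form in (2.3.10) is elementwise proportional to the centered moment matrix of the classical design, which is the proportionality an analogue design requires. A Python sketch of the check (not Springall's construction itself):

```python
import numpy as np

x = np.array([-2.0, -1.0, 1.0, 2.0])     # design coordinates of Example 1.1
powers = np.vstack([x, x**2]).T           # regressor columns: x and x^2

# paired comparison matrix with beta1 = beta2 = 0 (phi_ij = 1/4), n_ij = 25
lam = np.zeros((2, 2))
for i in range(4):
    for j in range(i + 1, 4):
        d = powers[i] - powers[j]
        lam += 25 * 0.25 * np.outer(d, d)

# classical centered moment matrix S_ab = sum_i (x_i^a - mean)(x_i^b - mean)
centered = powers - powers.mean(axis=0)
S = centered.T @ centered

c = lam[0, 0] / S[0, 0]
print(c, np.allclose(lam, c * S))   # 25.0 True: elementwise proportional
```

The constant of proportionality here is 25, reflecting the common replication n_ij = 25 and the identity Σ_{i<j}(u_i − u_j)(v_i − v_j) = t·Σ_i(u_i − ū)(v_i − v̄).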
CHAPTER 2
ASYMPTOTIC THEORY OF THE MAXIMUM LIKELIHOOD ESTIMATORS

2.1 Introduction

This chapter contains the asymptotic theory for the maximum likelihood estimates of the parameters given by (1.5.1), where the logarithms of the treatment parameters are expressed as a polynomial function of a single quantitative variable, x. An example of an experiment with a single quantitative variable was given in Section 1.1; the treatments in that example were different levels of an additive for coffee. For an experiment with quantitative independent variables, it is appropriate to do a paired comparison analogue of classical regression analysis. The theory developed in Sections 2.2 and 2.3 assumes the general case of s independent variables x₁, …, x_s. The parameterization of the Bradley–Terry parameters, given in (1.5.1), is reproduced here:

    ln π_i = Σ_{k=1}^{s} β_k x_{ki},  i=1,…,t.

The Bradley–Terry model with ties not allowed is used, where the π_i are defined by (1.2.2). In order for all of the parameters β₁, …, β_s to be estimable, it is required that s ≤ (t−1). This is clearly necessary since the Bradley–Terry parameters have dimension (t−1). Note also that an intercept
parameter, β₀, is not estimable without imposing a constraint on the Bradley–Terry parameters. This becomes apparent upon examining the preference probabilities. For example, suppose the model were

    ln π_i = β₀ + Σ_{k=1}^{s} β_k x_{ki},  i=1,…,t.

Then

    P(T_i → T_j) = π_i/(π_i + π_j)
                 = exp(β₀ + Σ_k β_k x_{ki}) / [exp(β₀ + Σ_k β_k x_{ki}) + exp(β₀ + Σ_k β_k x_{kj})]
                 = exp(Σ_k β_k x_{ki}) / [exp(Σ_k β_k x_{ki}) + exp(Σ_k β_k x_{kj})],  i≠j, i,j=1,…,t.

The parameter β₀ falls out, implying that any choice of β₀ results in the same preference probabilities. Choosing β₀ = 0 replaces the imposition of a constraint on the treatment parameters.

2.2 Estimation

As was previously mentioned, the parameters β₁, …, β_s are estimated by the maximum likelihood method. From (1.3.1), the log-likelihood equation is
    ln L = Σ_{i=1}^{t} a_i ln π_i − Σ Σ_{i<j} n_ij ln(π_i + π_j) + ln K
         = Σ_i a_i Σ_{k=1}^{s} β_k x_{ki} − Σ Σ_{i<j} n_ij ln[exp(Σ_k β_k x_{ki}) + exp(Σ_k β_k x_{kj})] + ln K.    (2.2.1)

Differentiating with respect to β_r and equating the result to zero yields the likelihood equations

    ∂ln L/∂β_r = Σ_{i=1}^{t} a_i x_{ri} − Σ Σ_{i<j} n_ij [x_{ri} exp(Σ_k β_k x_{ki}) + x_{rj} exp(Σ_k β_k x_{kj})] / [exp(Σ_k β_k x_{ki}) + exp(Σ_k β_k x_{kj})]
               = Σ_{i=1}^{t} a_i x_{ri} − Σ Σ_{i<j} n_ij (x_{ri} π_i + x_{rj} π_j)/(π_i + π_j) = 0,  r=1,…,s,    (2.2.2)

where a_i = Σ_{j≠i} m_ij and m_ij is the total number of times T_i is preferred over T_j.
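For a single regressor the likelihood equation (2.2.2) can be checked directly. With two treatments at covariate values x₁ = 1, x₂ = 0 and T₁ preferred in 7 of 10 comparisons, the closed-form MLE is β̂ = ln(7/3), and the score vanishes there. A Python sketch with made-up data (all values illustrative):

```python
import math

x = [1.0, 0.0]                   # covariate values x_{1i}
n12, m12 = 10, 7                 # comparisons of the pair, wins of T_1
a = [m12, n12 - m12]             # a_i = total wins of T_i

def score(beta):
    # (2.2.2) with s = 1: sum_i a_i x_i - n12 (x_1 pi_1 + x_2 pi_2)/(pi_1 + pi_2)
    pi = [math.exp(beta * xi) for xi in x]
    return (a[0] * x[0] + a[1] * x[1]
            - n12 * (x[0] * pi[0] + x[1] * pi[1]) / (pi[0] + pi[1]))

beta_hat = math.log(7.0 / 3.0)   # closed-form MLE for this two-treatment case
print(score(beta_hat))           # essentially zero at the MLE
print(score(0.0))                # 7 - 10*(0.5) = 2.0 away from the MLE
```

The score is 7 − 10·P(T₁ → T₂), so it vanishes exactly when the fitted preference probability equals the observed proportion 0.7.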
These equations must be solved iteratively. Appendix B contains an APL program and a Fortran program which find the estimates of β₁, …, β_s. The procedure described in the appendix converged in most situations. However, occasionally when an attempt was made to fit a full model (s = t−1), the procedure did not converge, which remains unexplained. In this situation one can resort to the program found in Appendix A, which finds estimates of the Bradley–Terry parameters, and then solve (t−1) equations in (t−1) unknowns to find the estimates of β₁, …, β_s.

2.3 Asymptotic Distribution

First of all, some additional notation and elementary theory is presented. This will become useful when the asymptotic joint distribution of β̂₁, …, β̂_s is derived.

Let X_ijk be a random variable which is equal to 1 if T_i is selected over T_j on the k-th comparison of T_i and T_j, and 0 otherwise. Then X_ijk has the Bernoulli distribution with probability of "success" equal to π_i/(π_i + π_j), i,j=1,…,t, k=1,…,n_ij, and the Bernoulli random variables are independent of each other. This implies that m_ij has the binomial distribution, with sample size n_ij and "success" probability equal to π_i/(π_i + π_j), where

    m_ij = Σ_{k=1}^{n_ij} X_ijk ,
so that

    var(m_ij) = n_ij π_i π_j/(π_i + π_j)²,  i<j, i,j=1,…,t.
From these facts, the asymptotic joint distribution of the maximum likelihood estimators β̂₁, …, β̂_s is multivariate normal with mean vector (β₁, …, β_s)′ and variance–covariance matrix given by the inverse of the matrix (λ_ab), where

    λ_ab = Σ Σ_{i<j} n_ij [π_i π_j/(π_i + π_j)²] (x_{ai} − x_{aj})(x_{bi} − x_{bj}),  a,b=1,…,s.    (2.3.8)
As previously mentioned, the remaining chapters deal with finding optimal designs when the logarithm of the treatment parameters is expressed as a polynomial equation in a single variable. The various optimality criteria considered include

(i) minimization of the variance of the maximum likelihood estimator of one parameter,
(ii) D-optimality,
(iii) minimization of the average variance of the predicted "response" ln π̂_x,
(iv) minimization of the maximum variance of ln π̂_x (minimax),
(v) minimization of the bias,
(vi) minimization of the average mean square error of ln π̂_x when bias is present (J-criterion),
(vii) D-optimality for a preliminary test estimator.

These are all defined in subsequent chapters. For the case of a single quantitative independent variable, the model given in (1.5.1) becomes

    ln π_i = Σ_{k=1}^{s} β_k x_i^k,  i=1,…,t.    (2.3.9)

Similarly, the matrix (λ_ab) given in (2.3.8), whose inverse is the asymptotic variance–covariance matrix, becomes

    λ_ab = Σ Σ_{i<j} n_ij [π_i π_j/(π_i + π_j)²] (x_i^a − x_j^a)(x_i^b − x_j^b),  a,b=1,…,s.    (2.3.10)
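The matrix in (2.3.10) is straightforward to evaluate for any candidate design; a Python sketch (function name and example values are illustrative):

```python
import math

def lambda_matrix(levels, n, beta):
    """Matrix (2.3.10) for the polynomial model ln pi_i = sum_k beta_k x_i^k.

    levels: treatment levels x_1, ..., x_t,
    n[i][j]: number of comparisons of the pair (T_i, T_j),
    beta: (beta_1, ..., beta_s).
    """
    s, t = len(beta), len(levels)
    pi = [math.exp(sum(b * x**(k + 1) for k, b in enumerate(beta)))
          for x in levels]
    lam = [[0.0] * s for _ in range(s)]
    for i in range(t):
        for j in range(i + 1, t):
            phi = pi[i] * pi[j] / (pi[i] + pi[j]) ** 2
            for a in range(s):
                for b in range(s):
                    lam[a][b] += (n[i][j] * phi
                                  * (levels[i]**(a + 1) - levels[j]**(a + 1))
                                  * (levels[i]**(b + 1) - levels[j]**(b + 1)))
    return lam

# linear model (s = 1), beta = 0, 100 comparisons of the endpoint pair (-1, 1):
lam = lambda_matrix([-1.0, 1.0], [[0, 100], [100, 0]], [0.0])
print(lam)  # [[100.0]], since phi_ij = 1/4 and (x_i - x_j)^2 = 4
```

The dependence of λ on the unknown β, visible in the `pi` line, is exactly the complication discussed next.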
Notice that the complete design problem is quite complex. It involves deciding how many levels and, additionally, which particular levels should be chosen. Furthermore, it must be determined how often to compare each pair of treatments. A further complication is apparent upon noticing that the variance–covariance matrix, whose elements are determined by (2.3.10), depends on the true parameter values. Hence the optimal designs also depend on the parameters. The designs discussed by El-Helbawy and Bradley and by Springall were found under the assumption that all Bradley–Terry treatment parameters are equal, or equivalently that β₁ = β₂ = ⋯ = β_s = 0. In no paper of which the author is aware has the dependency of the optimal design on the true parameter values been examined. As will become apparent in the following chapters, the optimal design for one set of parameter values can be quite different from the optimal design for another set of parameter values.
CHAPTER 3
MINIMIZATION OF THE VARIANCE OF A SINGLE PARAMETER ESTIMATE

3.1 Introduction

The present chapter contains a discussion of the linear, quadratic, and cubic models. The optimality criterion considered is the minimization of var(β̂_i), where i = 1, 2, and 3 for the linear, quadratic, and cubic models, respectively. The linear model is discussed in Section 3.2. Designs for the quadratic model are found in Section 3.3 for β₂ = 0, β₁ variable. The general expression for var(β̂₂), given by (3.3.6), can be shown to be a continuous function of β₂, and hence the designs in Section 3.3 are locally optimal at the point β₂ = 0. Likewise, designs for the cubic model are found in Section 3.4 for β₃ = 0, β₁ and β₂ variable, and are similarly locally optimal.

3.2 Linear Model

From (2.3.9), the linear model is

    ln π_i = βx_i,  i=1,…,t.    (3.2.1)

Since there is only one parameter, by (2.3.10) the variance–covariance matrix reduces to the scalar var(β̂) = λ⁻¹, where

    λ = Σ Σ_{i<j} n_ij [e^{β(x_i + x_j)}/(e^{βx_i} + e^{βx_j})²] (x_i − x_j)².    (3.2.2)
Equivalently,

    λ = Σ Σ_{i<j} n_ij (x_i − x_j)² / [exp(β(x_i − x_j)/2) + exp(−β(x_i − x_j)/2)]²
      = Σ Σ_{i<j} n_ij d_ij² / (e^{βd_ij/2} + e^{−βd_ij/2})²,

where d_ij = x_i − x_j. Although it is not necessary to treat the two cases β = 0 and β ≠ 0 separately, the special case β = 0 is considered first because the argument used to derive the optimal designs is easier to follow for this situation. When β = 0, (3.2.2) reduces to

    λ = Σ Σ_{i<j} n_ij (x_i − x_j)²/4.    (3.2.3)

Taking the experimental region to be −1 ≤ x ≤ 1, λ in (3.2.3) is maximized by comparing only the pair (−1, 1), for which (x_i − x_j)² attains its maximum value of 4. For β ≠ 0, let γ denote the contribution of a single comparison at separation d,

    γ = d²/(e^{βd/2} + e^{−βd/2})²,    (3.2.4)

so that λ is maximized by placing all comparisons on a pair whose separation d maximizes γ.
Differentiating γ with respect to d gives

    ∂γ/∂d = 2d/(e^{βd/2} + e^{−βd/2})² − βd²(e^{βd/2} − e^{−βd/2})/(e^{βd/2} + e^{−βd/2})³ = 0.    (3.2.5)

The equality in (3.2.5) implies

    coth(βd/2) = βd/2.    (3.2.6)

The solution of coth(u) = u is a transcendental number, and can be shown to be approximately u = 1.1997. Hence, from (3.2.6) and the fact that |d| must be less than or equal to 2, the design which minimizes var(β̂) for a given value of β is

    |d| = |x_i − x_j| = min(2, 2.3994/|β|).    (3.2.7)

It is now shown that the value of d given in (3.2.7) is in fact a maximum for γ, and consequently for λ. By (3.2.5), the second partial derivative of γ with respect to d satisfies

    (e^{βd/2} + e^{−βd/2})² ∂²γ/∂d² = 2 − 4βd tanh(βd/2) − β²d²/2 + (3/2)β²d² tanh²(βd/2).    (3.2.8)

The second partial derivative ∂²γ/∂d² is negative if and only if the expression on the right of (3.2.8) is negative. From (3.2.6), the equality

    2 = βd tanh(βd/2)    (3.2.9)

holds at d = 2.3994/|β|, which is then substituted into (3.2.8) to give

    (e^{βd/2} + e^{−βd/2})² ∂²γ/∂d² |_{opt. d} = 2 − 8 − β²d²/2 + 6 = −β²d²/2 < 0.    (3.2.10)
Because there is only one solution to (3.2.6) with d > 0, the result in (3.2.10) implies that the value of d given in (3.2.7) does in fact minimize the variance of β̂.

By (3.2.7) the optimal design is a comparison of only the pair (−1, 1) when |β| < 1.2. It can be shown that P(1 → −1) = .92 when β = 1.2. Because of this relatively large preference probability of choosing one endpoint over the other, in most practical situations |β| will be less than 1.2. So in these cases the optimal design is a comparison of only the endpoints of the experimental region.

The optimal designs given by (3.2.7) are also optimal for other criteria. In particular, they are optimal for criteria (ii)–(iv) introduced in Section 2.3. These criteria are formally defined in Chapter 4, but a brief explanation of their equivalence in the case of a linear model follows. The equivalence of the criterion discussed in the present section to D-optimality follows from the fact that the determinant of a 1×1 matrix is simply the only element of the matrix. The equivalence to the other two criteria follows from the fact that

    var(ln π̂_x) = x² var(β̂).    (3.2.11)

From (3.2.11), criterion (iii) is, in this case, the minimization of

    ∫_{−1}^{1} x² var(β̂) dx = (2/3) var(β̂),

and criterion (iv) is the minimization of

    max_{x∈[−1,1]} x² var(β̂) = var(β̂).
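Both the constant 1.1997 and the preference probability quoted above can be recovered numerically; a Python sketch using simple bisection:

```python
import math

def solve_u(lo=1.0, hi=2.0, tol=1e-12):
    # root of u * tanh(u) = 1, i.e. coth(u) = u, by bisection on [1, 2]
    f = lambda u: u * math.tanh(u) - 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

u = solve_u()
print(u)        # about 1.19968, giving |d| = min(2, 2.3994/|beta|)

beta = 1.2      # the endpoint design (-1, 1) is optimal up to |beta| = 1.2
p = 1.0 / (1.0 + math.exp(-2.0 * beta))   # P(1 -> -1) = e^beta/(e^beta + e^-beta)
print(p)        # about 0.917
```

Bisection is adequate here because u·tanh(u) − 1 is increasing and changes sign on [1, 2].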
3.3 Quadratic Model

The quadratic model is

    ln π_i = β₁x_i + β₂x_i²,  i=1,…,t.    (3.3.1)

From (2.3.10), the asymptotic variance–covariance matrix of the maximum likelihood estimators β̂₁ and β̂₂ is the inverse of (λ_ab), where

    λ₁₁ = Σ Σ_{i<j} n_ij φ_ij (x_i − x_j)²,    (3.3.2)
    λ₁₂ = Σ Σ_{i<j} n_ij φ_ij (x_i − x_j)(x_i² − x_j²),    (3.3.3)
    λ₂₂ = Σ Σ_{i<j} n_ij φ_ij (x_i² − x_j²)²,    (3.3.4)

and

    φ_ij = exp[β₁(x_i + x_j) + β₂(x_i² + x_j²)] / [exp(β₁x_i + β₂x_i²) + exp(β₁x_j + β₂x_j²)]².    (3.3.5)

Consider testing the null hypothesis of no quadratic effect, H₀: β₂ = 0.
Minimizing the asymptotic variance of β̂₂ would equivalently maximize the asymptotic power of the test. The variance of β̂₂ is the lower right-hand element of the inverse of the 2×2 matrix (λ_ab). That is,

    var(β̂₂) = λ₁₁/(λ₁₁λ₂₂ − λ₁₂²).    (3.3.6)

Under the null hypothesis, (3.3.5) becomes

    φ_ij = e^{β₁(x_i + x_j)} / (e^{β₁x_i} + e^{β₁x_j})².    (3.3.7)

As was pointed out in Section 3.1, in the present section (3.3.6) is minimized for β₂ = 0. This produces locally optimal designs. Assuming β₂ = 0 greatly simplifies the design problem, as will be seen in the remainder of the section. The end result of Lemmas 3.1–3.2 and Theorems 3.1–3.3 is a set of optimal designs which minimize var(β̂₂) under the null hypothesis for various choices of the linear parameter. Hereafter, "optimal design" will refer to the design which is optimal for the criterion being discussed in the section.

The following two definitions are useful in the discussion that follows.

Definition 3.1. The pair of treatments (−x₁, −x₂) is defined to be the symmetric counterpart of the pair (x₁, x₂).

Definition 3.2. A design is said to be symmetric if each pair in the design is compared as often as its respective symmetric counterpart.
Lemma 3.1. Let φ(x_i, x_j) denote φ_ij as given by (3.3.7). Then φ(x_i, x_j) = φ(−x_i, −x_j), for any x_i and x_j.

Proof.

    φ(−x_i, −x_j) = e^{−β₁(x_i + x_j)} / (e^{−β₁x_i} + e^{−β₁x_j})²
                  = e^{−β₁(x_i + x_j)} e^{2β₁(x_i + x_j)} / (e^{β₁x_j} + e^{β₁x_i})²
                  = e^{β₁(x_i + x_j)} / (e^{β₁x_i} + e^{β₁x_j})²
                  = φ(x_i, x_j).

Theorem 3.1. The optimal design is a symmetric design.

Proof. Notice that both of the following hold:

    (i) (x_i − x_j)² = ((−x_i) − (−x_j))²,
    (ii) (x_j² − x_i²) = (−x_j)² − (−x_i)².

From (3.3.2), (3.3.4), Lemma 3.1, and the above, it is clear that λ₁₁ and λ₂₂ are invariant under changes of any number of comparisons of pairs to comparisons of their symmetric counterparts.

Let D₁ denote any particular design, and let var₁(β̂₂) be the variance of β̂₂ for Design D₁. Also, let the i-th pair in D₁ be denoted (x_i, x_i′). The sets of comparisons {(x_i, x_i′), (−x_i, −x_i′)} are considered in sequence, and without loss of generality it is assumed that

    n_{x_i,x_i′} − n_{−x_i,−x_i′} = d_i ≥ 0

for all i. Notice that it could be true that n_{−x_i,−x_i′} = 0.
Design D₂ is formed from Design D₁ by changing d_i/2 of the comparisons of the pair (x_i, x_i′) to comparisons of the pair (−x_i, −x_i′). This results in (n_{x_i,x_i′} + n_{−x_i,−x_i′})/2 comparisons of each of the two pairs in the i-th set. This change is made for each set of comparisons, thereby defining Design D₂ to be a symmetric design. The total number of comparisons, N, is left unchanged.

By the first paragraph of this proof, the values of λ₁₁ and λ₂₂ are the same for the two designs. By (3.3.3), λ₁₂ = 0 for Design D₂, and hence λ₁₂² is a minimum. Therefore, by (3.3.6),

    var₂(β̂₂) ≤ var₁(β̂₂).    (3.3.8)

Note that the variances are equal if λ₁₂ = 0 for Design D₁, and it is possible that λ₁₂ = 0 for a nonsymmetric design. However, (3.3.8) shows that no design can "do better" than a symmetric design. Also note that in practice, d_i must be an even integer in order for the n_ij's to be integers.

So the search for optimal designs in the present section will henceforth be restricted to the class of all symmetric designs. Because λ₁₂ = 0 for a symmetric design, (3.3.6) reduces to

    var(β̂₂) = 1/λ₂₂.    (3.3.9)

So it is now necessary to find designs which maximize λ₂₂. Notice that λ₂₂ depends on β₁; as previously mentioned, the optimal design also depends on the value of β₁.
PAGE 52
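For a symmetric design, (3.3.9) gives var(β̂₂) = 1/λ₂₂ directly. A minimal sketch, assuming λ₂₂ has the form Σ n_ij φ_ij (x_i² − x_j²)² implied by (3.3.4) and (3.3.7) (the function names are mine):

```python
import math

def phi(xi, xj, b1):
    # phi(x_i, x_j) from (3.3.7), with beta_2 = 0
    return math.exp(b1 * (xi + xj)) / (math.exp(b1 * xi) + math.exp(b1 * xj)) ** 2

def var_beta2(design, b1):
    # design: list of (x_i, x_j, n_ij); assumed symmetric, so lambda_12 = 0
    lam22 = sum(n * phi(xi, xj, b1) * (xi**2 - xj**2) ** 2 for xi, xj, n in design)
    return 1.0 / lam22            # (3.3.9)

# the symmetric design of Theorem 3.2 with x = 0 and N = 100 comparisons
N, b1 = 100, 0.0
d = [(1.0, 0.0, N / 2), (-1.0, 0.0, N / 2)]
v = var_beta2(d, b1)   # at b1 = 0, phi = 1/4, so lam22 = N/4 and v = 4/N
```

Doubling N halves the variance, which is why the tables later in the chapter report N·var(β̂₂).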
Theorem 3.2. The optimal design is a design in which the only comparisons made are comparisons of the pairs (1, x) and (−1, −x), an equal number of times, for some x. Furthermore, x ≥ 0.

Proof. First of all, the notation used for λ₂₂, which is defined by (3.3.4) and (3.3.7), is changed. The N total comparisons are arbitrarily ordered, and the i-th pair is then defined to be (x_i, x_i'). The variables y_i and z_i are defined to be

    y_i = x_i − x_i',  z_i = x_i + x_i',  i = 1, ..., N.

Then from (3.3.7),

    φ_i = e^{β₁(x_i + x_i')} / ( e^{β₁x_i} + e^{β₁x_i'} )² = 1 / ( e^{β₁y_i/2} + e^{−β₁y_i/2} )²,  i = 1, ..., N.

Notice that φ_i is a function of y_i but not of z_i. From (3.3.4), λ₂₂ is rewritten as

    λ₂₂ = Σ_{i=1}^{N} φ_i y_i² z_i².

Let D be any design. By holding y_i fixed, we examine what changes in z_i will increase λ₂₂. Clearly if |z_i| is as large as possible for every i, then λ₂₂ is also as large as possible for the particular set of y_i's found in D. Without loss of generality, it is assumed that x_i ≥ x_i'. Maximizing |z_i| for fixed y_i then implies that
all comparisons must involve either x = 1 or x = −1. Additionally, using an argument similar to one used in Section 3.2, it is clear from the expression for λ₂₂ given in (3.3.4) that x = 1 should be compared with one and only one level of x, namely the value of x which maximizes φ(1, x)·(1 − x²)². This result in conjunction with Theorem 3.1 proves the first part of the theorem.

We can now rewrite λ₂₂ simply as

    λ₂₂ = N φ(1, x) (1 − x²)².

It remains to be proven that the optimal value of x is such that x ≥ 0. The variable ψ(x) is defined to be the positive square root of λ₂₂/N, i.e.

    ψ(x) = e^{β₁(1 + x)/2} (1 − x²) / ( e^{β₁} + e^{β₁x} ).  (3.3.10)

Since (3.3.10) and the fact that x ∈ [−1, 1] imply ψ(x) ≥ 0, maximizing λ₂₂ is equivalent to maximizing ψ(x). It will now be shown that ψ(x) ≥ ψ(−x), x ≥ 0. The following five inequalities are equivalent:

    ψ(x) ≥ ψ(−x)  (3.3.11)

    e^{β₁(1 + x)/2} / ( e^{β₁} + e^{β₁x} ) ≥ e^{β₁(1 − x)/2} / ( e^{β₁} + e^{−β₁x} )  (3.3.12)

    e^{β₁(1 + x)/2 + β₁} + e^{β₁(1 + x)/2 − β₁x} ≥ e^{β₁(1 − x)/2 + β₁} + e^{β₁(1 − x)/2 + β₁x}  (3.3.13)
    e^{β₁(1 + x)/2} + e^{−β₁(1 + x)/2} ≥ e^{β₁(1 − x)/2} + e^{−β₁(1 − x)/2}  (3.3.14)

    cosh(u + β₁x) ≥ cosh(u),  (3.3.15)

where cosh(u) = (e^u + e^{−u})/2 and u = β₁(1 − x)/2.

Now cosh(u) is a monotonically increasing function of u for u ≥ 0. Furthermore, it can be assumed that β₁ ≥ 0, because Theorem 3.3 proves that the optimal designs for β₁ = −β and β₁ = β are the same. (Or, equivalently, the remainder of this proof can be repeated for the case β₁ < 0.) Then x ≥ 0 and β₁ ≥ 0 imply β₁x ≥ 0, and hence, by the equivalence of (3.3.11) and (3.3.15), ψ(x) ≥ ψ(−x) for all x ≥ 0. This completes the proof of the second part. †

The next theorem shows that optimal designs only need to be found for nonnegative values of β₁.

Theorem 3.3. The optimal design for β₁ = −β is the same as the optimal design for β₁ = β.

Proof. Recall that the optimal design is found by finding the value of x which maximizes ψ(x) from the proof of Theorem 3.2. Since ψ(x) is actually a function of β₁ and x, we write

    ψ(β₁, x) = e^{β₁(1 + x)/2} (1 − x²) / ( e^{β₁} + e^{β₁x} ).
The theorem is proved by observing that

    ψ(−β₁, x) = e^{−β₁(1 + x)/2} (1 − x²) / ( e^{−β₁} + e^{−β₁x} )

    = e^{−β₁(1 + x)/2} e^{β₁(1 + x)} (1 − x²) / [ ( e^{−β₁} + e^{−β₁x} ) e^{β₁(1 + x)} ]

    = e^{β₁(1 + x)/2} (1 − x²) / ( e^{β₁x} + e^{β₁} )

    = ψ(β₁, x). †

All that remains to be done is to find the value of x that maximizes ψ(x) for various choices of β₁ ≥ 0. Unfortunately, a closed form solution of x as a function of β₁ does not exist. However, when β₁ = 0, the optimal value of x can be analytically derived, and hence this situation is considered first.

Suppose β₁ = 0. Then from (3.3.10), ψ(x) = (1 − x²)/2. Clearly this is maximized by x = 0, and hence from Theorems 3.1 and 3.2, the optimal design is an equal number of comparisons of only the pairs (0, 1) and (−1, 0).

For the case β₁ ≠ 0, the grid method described in the next paragraph was used to obtain the results presented in Table 3.1 and Figure 3.1. The grid method first calculated ψ(x) for x = 0(.1)1, and the best value of x, say x₁, out of these 11 choices was found. Then ψ(x) was calculated for the 11 values x = (x₁ − .1), (x₁ − .08), ..., (x₁ + .1), and again the best value of x was found. This procedure was repeated until the optimal value of x was found accurate to the fourth
decimal place. The entire procedure was done separately for each value of β₁. The following lemma proves that the values of x which are found via the grid search are absolute maximums for ψ(x).

Lemma 3.2. Exactly one local maximum (and hence it is the absolute maximum) exists for ψ(x) in the interval [0, 1].

Proof. By Theorem 3.3, only the case β₁ > 0 needs to be considered. From (3.3.10), the first partial derivative of ψ(x) with respect to x is

    ψ'(x) = e^{β₁(1 + x)/2} [ e^{β₁}( β₁(1 − x²)/2 − 2x ) − e^{β₁x}( β₁(1 − x²)/2 + 2x ) ] / ( e^{β₁} + e^{β₁x} )².  (3.3.16)

It can be seen from (3.3.16) that ψ'(0) > 0 and ψ'(1) < 0. Hence ψ'(x) = 0 for at least one x ∈ [0, 1], and it remains to be shown that ψ'(x) = 0 for exactly one x ∈ [0, 1]. (An attempt was made to show ψ''(x) is negative for all β₁ and x ∈ [0, 1], but it turned out not to be true. In particular, a graph of ψ(x) as a function of x for β₁ = 5 indicated ψ''(x) > 0 for values of x near 0.) From (3.3.16), ψ'(x) = 0 if and only if

    β₁(1 − x²)/2 − 2x = e^{−β₁(1 − x)} ( β₁(1 − x²)/2 + 2x ).  (3.3.17)

Notice that the functions f₁(x) = β₁(1 − x²)/2 − 2x and f₂(x) = β₁(1 − x²)/2 + 2x are reflections of each other about the axis x = 0, and they are quadratic in x. Hence they intersect exactly
once at x = 0. Also, for fixed β₁, f₁ and f₂ have maximums at −2/β₁ and 2/β₁, respectively.

Let g₁(x) be the right-hand side of (3.3.17). It was shown in the preceding paragraph that f₁(x) and g₁(x) intersect at least once in the interval [0, 1]. To determine that they intersect exactly once, it is sufficient to show that ∂f₁(x)/∂x < ∂g₁(x)/∂x. From (3.3.17),

    ∂f₁(x)/∂x = −β₁x − 2,

    ∂g₁(x)/∂x = β₁ e^{−β₁(1 − x)} ( β₁(1 − x²)/2 + 2x ) + e^{−β₁(1 − x)} ( −β₁x + 2 )

    = e^{−β₁(1 − x)} ( β₁²(1 − x²)/2 + β₁x + 2 ).

For β₁ > 0 and x ∈ [0, 1], it is clear that ∂f₁(x)/∂x < 0 and ∂g₁(x)/∂x > 0, which completes the proof. †

Table 3.1 presents the optimal value of x found from the grid search described immediately prior to Lemma 3.2, for various choices of |β₁|. Figure 3.1 is a graph of the optimal value of x versus |β₁|. The table includes relatively large values of |β₁| to illustrate that x → 1 as |β₁| → ∞. Recall that for a given value of β₁, the optimal design is a comparison of (1, x) and (−1, −x) an equal number of times, where x is found in Table 3.1 or Figure 3.1.

Table 3.1 indicates that the optimal design when β₁ = 0 involves only 3 levels, x = −1, 0, 1. However, when β₁ ≠ 0, the optimal design requires 4 levels, x = −1, −x₀, x₀, 1, for some x₀. In certain experimental situations it may be significantly more laborious to make four "batches"
Table 3.1. Optimal value of x and relative efficiency of the design {n_{−1,0} = n_{0,1} = N/2}, for various values of |β₁|. (Table body not recoverable.)
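The iterative grid refinement described above is straightforward to sketch. The helper below is my own illustration (not the computer program used for the dissertation); it refines an 11-point grid around the current best value of x, and it reproduces the analytic result x = 0 at β₁ = 0:

```python
import math

def psi(x, b1):
    # psi(x) from (3.3.10)
    return math.exp(b1 * (1 + x) / 2) * (1 - x * x) / (math.exp(b1) + math.exp(b1 * x))

def grid_max(b1, lo=0.0, hi=1.0, tol=1e-4):
    # repeatedly refine an 11-point grid about the current best x
    step = (hi - lo) / 10
    best = max((lo + k * step for k in range(11)), key=lambda x: psi(x, b1))
    while step > tol:
        step /= 5
        cand = [best + (k - 5) * step for k in range(11)]
        cand = [x for x in cand if 0.0 <= x <= 1.0]
        best = max(cand, key=lambda x: psi(x, b1))
    return best

x0 = grid_max(0.0)   # beta_1 = 0: psi(x) = (1 - x^2)/2, maximized at x = 0
```

By Lemma 3.2 the maximum located this way is the absolute maximum on [0, 1].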
Figure 3.1. Optimal value of x versus |β₁|. (Figure not reproduced.)
than it is to make just three. For this reason, and because a priori information about β₁ may not be available, asymptotic relative efficiencies were calculated for the design {n_{−1,0} = n_{0,1} = N/2} relative to the optimal designs found in the table. The relative efficiency of the first design to the second (optimal) design is defined to be

    R.E. = var₂(β̂₂) / var₁(β̂₂)

    = [ e^{β₁} / ( e^{β₁} + 1 )² ] / [ e^{β₁(1 + x)} (1 − x²)² / ( e^{β₁} + e^{β₁x} )² ].

The relative efficiencies were enumerated for various choices of β₁, and are presented next to the optimal designs in Table 3.1.

Equivalence of El-Helbawy and Bradley's method and minimizing the variance

An application of El-Helbawy and Bradley's method for finding optimal designs was briefly discussed in Section 1.4. It will now be proven that their optimality criterion is equivalent to minimizing var(β̂_k), for any k. First of all, notation and results of El-Helbawy and Bradley's paper which are necessary for the proof of Theorem 3.5 are presented.

Recall that El-Helbawy and Bradley considered linear contrasts of γ_i = ln π_i, i = 1, ..., t. The matrices B_m and B_n are defined as m×t and n×t, respectively, with zero-sum, orthonormal rows, such that 0 ≤ m < m + n ≤ t − 1. The null hypothesis is

    H₀: B_n γ(π) = 0,
and the alternative hypothesis is

    H_a: B_n γ(π) ≠ 0,

where π and γ(π) are column vectors with i-th elements π_i and γ_i, respectively. The rows of B_m represent the linear contrasts assumed to be null. Note that B_m need not exist (i.e. m = 0).

The vector π₀ is a value of π which satisfies the null hypothesis and the constraints, where π is the true parameter vector. The matrix B is defined to be the t×t orthonormal matrix

    B = [ 1'/√t ; B_m ; B*_m ],  (3.3.18)

where B*_m is any (t − m − 1)×t matrix such that BB' = I_t, and 1' = (1, ..., 1), the t-dimensional unit vector. The following theorem is extracted from El-Helbawy and Bradley (1978, Theorem 2, page 833).

Theorem 3.4. Let p be the maximum likelihood estimator of π. Under the assumption of connectedness (Assumption 1.2), and by the assumption of strictly positive elements of π, which has already been inherently assumed since ln(0) is undefined, √N( γ(p) − γ(π) ) has a limiting distribution function that is singular, t-variate normal in a space of (t − m − 1) dimensions, with zero mean vector and dispersion matrix

    Σ(π) = B*_m' ( C(π) )⁻¹ B*_m,  (3.3.19)

where
    C(π) = B*_m Λ(π) B*_m',  (3.3.20)

and Λ(π) is the t×t matrix with elements

    Λ_ii(π) = Σ_{j≠i} μ_ij π_i π_j / (π_i + π_j)²,  i = 1, ..., t,

    Λ_ij(π) = −μ_ij π_i π_j / (π_i + π_j)²,  i ≠ j,  i, j = 1, ..., t,  (3.3.21)

and where μ_ij is defined in Assumption 1.2.

El-Helbawy and Bradley's optimality criterion is the maximization of the noncentrality parameter, λ², for the asymptotic chi-square distribution of the likelihood ratio statistic. The noncentrality parameter is

    λ² = δ' Σ₀⁻¹ δ,  (3.3.22)

where δ is a vector of constants defined in their paper, Σ₀ is the leading principal matrix of order n of C₀⁻¹, B_n provides the first n rows of B*_m, and

    C₀ = B*_m Λ(π₀) B*_m'.  (3.3.23)

A theorem proving the equivalence of the two optimality criteria is now presented.

Theorem 3.5. Consider the model ln π_x = β₁x + β₂x² + ··· + β_k x^k. Suppose we are interested in testing, for some fixed i,

    H₀: β_i = 0 versus H_a: β_i ≠ 0.
Then the design which minimizes var(β̂_i) is identical to the design which maximizes the noncentrality parameter from the likelihood ratio test. (Note that it does not matter whether or not it is assumed that some of the other β's are 0.)

Proof. Orthogonal polynomials can be derived for any spacing of the independent variable. Let these contrasts be the rows of B. The 1×t matrix B_{n=1} is the orthogonal polynomial corresponding to the parameter β_i being tested. From the method used to find orthogonal polynomials, it follows that

    β̂_i = c B_{n=1} γ(p),

where c is a constant. So minimizing var(β̂_i) is equivalent to minimizing var( B_{n=1} γ(p) ). But from Theorem 3.4,

    var( B_{n=1} γ(p) ) = B_{n=1} ( B*_m' ( C(π) )⁻¹ B*_m ) B_{n=1}'

    = (1, 0, ..., 0) ( C(π) )⁻¹ (1, 0, ..., 0)'

    = ( ( C(π) )⁻¹ )₁₁.

From the definition of λ², Σ₀ is the leading principal matrix of order 1 of C₀⁻¹, i.e.

    Σ₀ = ( ( C(π₀) )⁻¹ )₁₁.
In this case, δ is a scalar constant, and so maximizing λ² is equivalent to minimizing Σ₀ = var(β̂_i). †

Example 3.1. This example illustrates the equivalence of the two methods presented in Theorem 3.5. It is assumed that the true model is a quadratic model, ln π_x = β₁x + β₂x². Also, the design is restricted to have the three design points x = −1, 0, 1. Table 3.1 indicates that the design which minimizes the variance of β̂₂ is {n_{−1,0} = n_{0,1} = N/2}.

The design problem is now approached using the method presented by El-Helbawy and Bradley. The null hypothesis is

    H₀: β₂ = 0, β₁ unknown.

Using orthogonal polynomials, this hypothesis is equivalent to

    H₀: (1, −2, 1) γ(π) / √6 = 0,

where γ'(π) = (ln π₁, ln π₂, ln π₃). Note that El-Helbawy and Bradley's constraint, that the ln π_i's sum to zero, poses no problem. Instead of constraining Σ π_i to be equal to 1, an intercept parameter, β₀, could be introduced to the model. The model would then become

    ln π_x = β₀ + β₁x + β₂x²,

with the constraint

    Σ_x ln π_x = 0.

It can be shown in this particular case that the estimator β̂₂ is the same under either parameterization.
For a definition of the following matrices and notation, see (3.3.18)–(3.3.23). For the example under discussion,

    B_{n=1} = (1, −2, 1)/√6,

    B*_m = [ (1, −2, 1)/√6 ; (−1, 0, 1)/√2 ],

    Λ(π₀) = (1/4) [ μ₁₂ + μ₁₃ , −μ₁₂ , −μ₁₃ ;
                    −μ₁₂ , μ₁₂ + μ₂₃ , −μ₂₃ ;
                    −μ₁₃ , −μ₂₃ , μ₁₃ + μ₂₃ ],

where π₀' = (1, 1, 1), i.e. β₁ = β₂ = 0. Note that π₀ can be any vector which satisfies the null hypothesis and the constraint. The matrix C₀ is

    C₀ = B*_m Λ(π₀) B*_m'

    = [ (3/8)(μ₁₂ + μ₂₃) , (√3/8)(μ₂₃ − μ₁₂) ;
        (√3/8)(μ₂₃ − μ₁₂) , (1/8)(μ₁₂ + 4μ₁₃ + μ₂₃) ].

Then, since Σ₀ is the upper left-hand element of C₀⁻¹,

    Σ₀ = (2/3) ( μ₁₂ + 4μ₁₃ + μ₂₃ ) / ( μ₁₂μ₁₃ + μ₁₂μ₂₃ + μ₁₃μ₂₃ ),
implying

    Σ₀⁻¹ = (3/2) ( μ₁₂μ₁₃ + μ₁₂μ₂₃ + μ₁₃μ₂₃ ) / ( μ₁₂ + 4μ₁₃ + μ₂₃ ).  (3.3.24)

The design which maximizes the chi-square noncentrality parameter from the likelihood ratio test is the design which maximizes Σ₀⁻¹.

By the definition of the μ_ij, μ₂₃ = 1 − μ₁₂ − μ₁₃. So Σ₀⁻¹ can be rewritten as

    Σ₀⁻¹ = (3/2) [ μ₁₂μ₁₃ + (μ₁₂ + μ₁₃)(1 − μ₁₂ − μ₁₃) ] / [ μ₁₂ + 4μ₁₃ + (1 − μ₁₂ − μ₁₃) ]

    = (3/2) [ μ₁₂( (1 − μ₁₃) − μ₁₂ ) + μ₁₃(1 − μ₁₃) ] / ( 1 + 3μ₁₃ ).  (3.3.25)

Suppose that μ₁₃ is temporarily held constant. It is then claimed that choosing μ₁₂ = (1 − μ₁₃)/2 maximizes Σ₀⁻¹ for the particular μ₁₃. From (3.3.25), the expression μ₁₂( (1 − μ₁₃) − μ₁₂ ) must be maximized. Because of the constraint 0 ≤ μ₁₂ ≤ 1 − μ₁₃, it is clear that μ₁₂( (1 − μ₁₃) − μ₁₂ ) is maximized when μ₁₂ = (1 − μ₁₃)/2, which proves the claim. The constraint among the μ_ij's in addition to the above claim implies that the optimal design must be a design such that

    μ₁₂ = (1 − μ₁₃)/2 = μ₂₃.  (3.3.26)
The scalar Σ₀⁻¹ is now rewritten as a function of μ₁₂ alone. From (3.3.24) and (3.3.26),

    Σ₀⁻¹ = (3/2) [ 2μ₁₂(1 − 2μ₁₂) + μ₁₂² ] / [ 2μ₁₂ + 4(1 − 2μ₁₂) ]

    = (3/2) μ₁₂ (2 − 3μ₁₂) / [ 2(2 − 3μ₁₂) ]

    = (3/4) μ₁₂.

Clearly Σ₀⁻¹ is maximized by choosing μ₁₂ as large as possible. By (3.3.26), this would make μ₁₂ = μ₂₃ = .5 and μ₁₃ = 0. So this optimal design is in fact the same as the design which minimizes var(β̂₂). †

An attempt was made to find the optimal design for the test of hypothesis

    H₀: β₁ = β₂ = 0 versus H_a: not H₀

using El-Helbawy and Bradley's methodology. When simultaneously testing more than one parameter, or equivalently more than one treatment contrast, a constant value, δ_i, must be chosen for each parameter. The theory behind the optimal designs includes an infinite sequence of values of π satisfying H_a, which are converging to a vector π₀ satisfying H₀. When the treatment contrasts are for factorial treatments, which is the case they discussed extensively, the relative sizes of the δ_i are said to be equivalent to the relative importance placed on each contrast being tested in H₀. Hence, El-Helbawy and Bradley usually chose δ₁ = δ₂ = ··· = δ_n.
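The maximization in Example 3.1 can be confirmed by brute force. The sketch below (my own check; the function name and grid are not from the dissertation) evaluates Σ₀⁻¹ from (3.3.24) over a grid on the simplex of comparison proportions and recovers the maximizer μ₁₂ = μ₂₃ = .5, μ₁₃ = 0:

```python
def noncentrality_scale(m12, m13):
    # Sigma_0^{-1} from (3.3.24), with mu_23 = 1 - mu_12 - mu_13
    m23 = 1.0 - m12 - m13
    num = m12 * m13 + m12 * m23 + m13 * m23
    den = m12 + 4 * m13 + m23
    return 1.5 * num / den

# grid over the simplex of proportions, step .01
step = 0.01
best = max(
    ((noncentrality_scale(i * step, j * step), i * step, j * step)
     for i in range(101) for j in range(101 - i)),
    key=lambda t: t[0],
)
val, m12, m13 = best   # expected: val = 3/8, at m12 = .5, m13 = 0
```

The maximal value 3/8 agrees with the closed form (3/4)μ₁₂ evaluated at μ₁₂ = .5.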
In the present situation, choosing δ₁ and δ₂ restricts the alternative parameter values in the infinite sequence mentioned in the preceding paragraph to be a proper subset of the alternative parameters satisfying H_a. For example, δ₁ = δ₂ implies that β₁ = β₂, and hence the alternative models are ln π_x = βx + βx², with β → 0 as N → ∞. Choosing different δ_i can result in a different design, because the maximization of the noncentrality parameter depends on the relative sizes of the δ_i. Additionally, the optimal designs actually found after choosing the relative sizes of δ₁ and δ₂ were found to be unreasonable (e.g. for δ₁ = δ₂, the optimal design is a comparison of only the pair (−1, 0)). Chapter 4 presents other optimality criteria for fitting a quadratic model.

3.4 Cubic Model

The cubic model is

    ln π_i = β₁x_i + β₂x_i² + β₃x_i³,  i = 1, ..., t.  (3.4.1)

The asymptotic variance-covariance matrix of the maximum likelihood estimators β̂₁, β̂₂, and β̂₃ is (λ_ab)⁻¹, where λ_ab is given by (2.3.10). The inverse of the variance-covariance matrix is partitioned as
    var(β̂₃) = [ λ₃₃ − ( λ₁₃(λ₁₃λ₂₂ − λ₁₂λ₂₃) + λ₂₃(λ₂₃λ₁₁ − λ₁₂λ₁₃) ) / ( λ₁₁λ₂₂ − λ₁₂² ) ]⁻¹.  (3.4.2)

To simplify (3.4.2), only the case β₃ = 0 is considered in the search for designs which minimize var(β̂₃). As pointed out in Section 3.1, this gives locally optimal designs. The following theorem shows that optimal designs only need to be found for linear and quadratic parameters which are both positive.

Theorem 3.6. Suppose that β₃ = 0, and further that the optimal design for given β₁ and β₂ is comparisons of the pairs (x_i, x_j) a total of n_{x_i,x_j} times, for all pairs. Then

    (i) The optimal design for (−β₁) and (−β₂) is the same as the optimal design for β₁ and β₂,
    (ii) The optimal design for (−β₁) and β₂ is comparisons of the pairs (−x_i, −x_j) a total of n_{x_i,x_j} times, for all pairs,
    (iii) The optimal design for β₁ and (−β₂) is the same as in (ii).

Proof. From (3.3.5),

    φ_{−β₁,−β₂}(x_i, x_j)
    = exp( −β₁(x_i + x_j) − β₂(x_i² + x_j²) ) / ( exp(−β₁x_i − β₂x_i²) + exp(−β₁x_j − β₂x_j²) )²

    = exp( −β₁(x_i + x_j) − β₂(x_i² + x_j²) ) exp( 2β₁(x_i + x_j) + 2β₂(x_i² + x_j²) )
      / [ ( exp(−β₁x_i − β₂x_i²) + exp(−β₁x_j − β₂x_j²) )² exp( 2β₁(x_i + x_j) + 2β₂(x_i² + x_j²) ) ]

    = exp( β₁(x_i + x_j) + β₂(x_i² + x_j²) ) / ( exp(β₁x_i + β₂x_i²) + exp(β₁x_j + β₂x_j²) )²

    = φ_{β₁,β₂}(x_i, x_j).  (3.4.3)

Notice that since β₃ = 0, the factor π_iπ_j/(π_i + π_j)² in the expression for λ_ab given in (2.3.10) is equal to φ_{β₁,β₂}(x_i, x_j). Hence by (2.3.10) and (3.4.3), λ_ab is invariant under changing the sign of both β₁ and β₂, for all a and b. Therefore the variance of β̂₃ is also invariant under changing the sign of both β₁ and β₂. This implies that the design which minimizes var(β̂₃) for β₁ and β₂ is also the design which minimizes var(β̂₃) for (−β₁) and (−β₂), thereby proving (i).

To prove (ii), note that from (3.3.5),

    φ_{−β₁,β₂}(x_i, x_j) = exp( −β₁(x_i + x_j) + β₂(x_i² + x_j²) ) / ( exp(−β₁x_i + β₂x_i²) + exp(−β₁x_j + β₂x_j²) )²

    = φ_{β₁,β₂}(−x_i, −x_j).  (3.4.4)

Again, by (2.3.10) and (3.4.4), changing β₁ to (−β₁) and the pairs (x_i, x_j) to (−x_i, −x_j), for all pairs in the design, will not change λ₁₁, λ₂₂, λ₃₃, nor λ₁₃. It changes only the signs of λ₁₂ and λ₂₃. But
notice that in (3.4.2) all terms involving λ₁₂ and λ₂₃ are of the form λ₁₂λ₂₃, λ₁₂², or λ₂₃², and hence the negative signs cancel, leaving var(β̂₃) invariant under the change described above. This proves (ii). To prove (iii), simply apply the results of (i) and (ii). †

Theorem 3.5 proved the equivalence of minimizing var(β̂₃) and El-Helbawy and Bradley's criterion. A Fortran computer program was used to find the optimal design using El-Helbawy and Bradley's methodology for β₁ = β₂ = β₃ = 0. The levels allowed were x = −1, −.5, 0, .5, 1. The grid search included all ten pairs formed by these five levels, assuming a symmetric design. The optimal design found was

    { n_{−1,−.5} = n_{.5,1} = .0455N, n_{−.5,.5} = .611N, n_{−1,1} = .298N, all other n_ij = 0 }.

For this design, it can be shown that var(β̂₃) = 16/N. It was decided that for unequally spaced levels, employing El-Helbawy and Bradley's methodology would be substantially more laborious than minimizing var(β̂₃). For this reason, the designs subsequently discussed were found by minimizing the expression found in (3.4.2).

Optimal designs were found by a grid search via a Fortran computer program for β₃ = 0 and β₁, β₂ = 0(.2)1. The comparisons allowed were (−1, x₁), (x₁, x₂), (x₂, 1), and (−1, 1), where the optimal values of x₁ and x₂ were also found by the computer. The motivation for the choice of these four pairs is the result discussed in the two preceding paragraphs for the case of all three parameters equal to zero, which indicated that only the pairs (−1, −.5), (−.5, .5), (.5, 1), and (−1, 1) need to be compared.

The grid procedure is now briefly described. The initial values for x₁ and x₂ were x₁ = −.7(.1)−.3 and x₂ = .3(.1).7. For each of the 25
combinations of x₁ and x₂, the optimal comparison proportions were found by another grid. The latter grid began by considering different comparison proportions, changing by increments of 0.1. This grid became finer and finer until the proportions were accurate to ±.0004. From this the best combination of x₁ and x₂ out of the initial 25 choices was found, and the entire procedure was repeated for a finer grid about the previous best choice of x₁ and x₂. The entire grid search stopped when x₁ and x₂ were accurate to ±.0037.

Notice that it is possible that the designs found are not absolutely optimal, but rather they may only be local minimums for (3.4.2). This is because only four pairs were considered, when it is possible that a five or six pair design, for example, might offer a slight improvement. Also, the grid search restricted x₁ and x₂ rather than allowing them to take on values in the entire interval [−1, 1]. The latter point is not believed to be a problem, due to the resulting design for β₁ = β₂ = β₃ = 0 originally found using El-Helbawy and Bradley's methodology, and the fact that the optimal designs currently being discussed all have values of x₁ and x₂ within 0.02 of ∓0.5.

The optimal designs and the value of var(β̂₃) for N = 1 are presented in Table 3.2. The first two columns of the table give the values of β₁ and β₂. Columns 5 through 8 give the comparison proportions for the pairs (−1, x₁), (x₁, x₂), (x₂, 1), and (−1, 1), respectively, where x₁ and x₂ are given in columns 3 and 4. The last column gives N·var(β̂₃). The number of times a pair is compared is found by multiplying the appropriate comparison proportion by N. Recall that by Theorem 3.6, the optimal design for both parameters negative is the same as the optimal design for |β₁| and |β₂|. If exactly one parameter is negative, then
the optimal design is found by first finding the optimal design for |β₁| and |β₂|, and then reversing the signs of the levels as in Theorem 3.6(ii): the proportion for the pair (−1, x₁) becomes the proportion for the pair (−x₁, 1), the proportion for (x₁, x₂) becomes the proportion for (−x₂, −x₁), and so on. To calculate var(β̂₃) for N > 1, the tabled value of N·var(β̂₃) is divided by N.

Notice from Table 3.2 that for β₁ ≥ .6, the optimal design is very close to the design {n_{−1,−.5} = n_{−.5,.5} = n_{.5,1} = .3333N}, regardless of the value of β₂. This implies that if it is a priori believed that β₁ is relatively large, then this design will result in a variance of β̂₃ very close to the tabled minimum. As has been previously mentioned, it is also of interest that the optimal values of x₁ and x₂ are always very close to −.5 and .5, respectively.

Finally, notice that the design for β₁ = β₂ = 0 found in Table 3.2 is not the same as the one found using El-Helbawy and Bradley's methodology, although the value of var(β̂₃) appears to be the same for both, and hence they are equally optimal. However, it was not established that the two designs are mathematically equivalent. Since the two criteria are equivalent, there may be some problem with local minimums of var(β̂₃). It appears that the surface of the expression for var(β̂₃) is relatively flat, and so it is possible that a finer initial grid is necessary to guarantee that the minimum found is in fact the absolute minimum of var(β̂₃).
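The partitioned-inverse expression (3.4.2) can be checked against a direct matrix inversion. In the sketch below, the 3×3 information matrix is an arbitrary positive definite stand-in (not one computed from an actual design); the point is only that the formula reproduces the (3,3) element of the inverse:

```python
import numpy as np

def var_beta3(L):
    # the partitioned-inverse expression (3.4.2) for a 3x3 information matrix
    l11, l12, l13 = L[0, 0], L[0, 1], L[0, 2]
    l22, l23, l33 = L[1, 1], L[1, 2], L[2, 2]
    num = l13 * (l13 * l22 - l12 * l23) + l23 * (l23 * l11 - l12 * l13)
    return 1.0 / (l33 - num / (l11 * l22 - l12 ** 2))

# any positive definite matrix will do for the check
L = np.array([[2.0, 0.3, 0.1],
              [0.3, 1.5, 0.2],
              [0.1, 0.2, 1.0]])
direct = np.linalg.inv(L)[2, 2]
```

The agreement holds because (3.4.2) is exactly the Schur-complement form of the (3,3) element of the inverse.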
Table 3.2. Designs which minimize var(β̂₃). (Table body not recoverable.)
CHAPTER 4
DESIGNS FOR FITTING A QUADRATIC MODEL

4.1 Introduction

The present chapter contains optimal designs for fitting a quadratic model under three different conditions, each for three different optimality criteria.

The D-optimality criterion is considered in Section 4.2. This criterion minimizes the determinant of the asymptotic variance-covariance matrix. In the present chapter, this matrix is a 2×2 matrix, since there are two maximum likelihood estimators, β̂₁ and β̂₂. From Section 2.3, these estimators are asymptotically normal. Therefore, by a result in Box and Lucas (1959), the determinant of the variance-covariance matrix is proportional to the volume contained within any particular ellipsoidal probability contour for (β̂₁, β̂₂) about β, where β is the true parameter value. The D-optimal design will then minimize the volume of any such probability contour.

Section 4.3 contains designs which minimize the average-variance of ln π̂_x, where ln π_x = β₁x + β₂x². The average-variance is simply the variance of ln π̂_x integrated over the experimental region. In this case, the experimental region is x ∈ [−1, 1]. The property that these designs have is obvious. That is to say, a good estimate of all of the preference probabilities is desirable, and the design which minimizes the integral over x of var(ln π̂_x) will give estimates of π_x which are good "on the average".
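For the quadratic model, var(ln π̂_x) = (x, x²) Σ (x, x²)', so the average over x ∈ [−1, 1] has the closed form Σ₁₁/3 + Σ₂₂/5 (the odd cross term integrates to zero). A sketch with an arbitrary illustrative covariance matrix, together with a numerical cross-check (the helper name is mine):

```python
import numpy as np

def average_variance(cov):
    # var(ln pi_x) = (x, x^2) cov (x, x^2)'; averaging over x in [-1, 1]:
    # (1/2) * Integral of (x^2*c11 + 2x^3*c12 + x^4*c22) dx = c11/3 + c22/5
    return cov[0, 0] / 3.0 + cov[1, 1] / 5.0

cov = np.array([[4.0, 1.0],
                [1.0, 9.0]])
av = average_variance(cov)

# numerical cross-check: mean of the pointwise variance on a fine grid
xs = np.linspace(-1.0, 1.0, 200001)
vals = xs**2 * cov[0, 0] + 2 * xs**3 * cov[0, 1] + xs**4 * cov[1, 1]
approx = vals.mean()
```

The closed form makes clear that the average-variance criterion weights the two diagonal variances but ignores the covariance term.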
The criterion discussed in Section 4.4 is similar to the average-variance criterion. The designs found in this section minimize the maximum of var(ln π̂_x), where the maximum is taken over the interval x ∈ [−1, 1]. These designs are referred to as minimax designs. The disadvantage of this criterion is similar to the disadvantage often attributable to minimax rules in the decision theory framework. That is, minimax designs may be slightly "pessimistic" in the sense that they minimize the largest value that the variance can attain. This minimization is good, but at the same time the variance of ln π̂_x may be relatively large for most values of x in the interval [−1, 1], hence causing the minimax design to be unfavorable.

Finally, Section 4.5 presents some design recommendations and conclusions. A number of designs are compared to determine which design does best for a wide range of parameter values, using primarily the D-optimality criterion. Additionally, an examination is performed on how well these designs protect against the bias present when the true model is cubic.

Recall that in general, the problem of finding optimal designs is complex. It involves the determination of the number of levels to be included in the experimental design, and also which particular levels are to be chosen. Additionally, the number of times each pair is to be compared must be determined. In Chapter 3 it was proven that the design which minimizes var(β̂₂) must be of the form {n_{1,x} = n_{−1,−x} = N/2}, thereby implying that the optimal number of levels is four, and only two pairs are to be compared an equal number of times. All that remained to be resolved was the determination of x for different values of β₁. In the present situation, a similar simplification of the
general problem does not appear to be possible, except by imposing some restrictions, such as letting β₂ = 0. For this reason, three conditions are considered under which optimal designs are found. This essentially results in designs which are optimal for a proper subclass of the class of all designs. However, as will become apparent in the remainder of the chapter, there is not much difference in the degree of optimality among the three conditions presently considered. Hence it is conjectured that the degree of optimality of the designs found in the final and most general subsection of Sections 4.2–4.4 is very close to, if not equal to, the best that can possibly be achieved. The three conditions are now briefly introduced.

The first condition considered in each of Sections 4.2–4.4 is that the levels are restricted to be x = −1, 0, 1. This greatly simplifies the general problem to one with only two unknowns, namely the comparison proportions μ₋₁,₀ and μ₀,₁. The value of μ₋₁,₁ is found by the relationship μ₋₁,₀ + μ₀,₁ + μ₋₁,₁ = 1.

The second condition discussed in each of the following three sections is to let β₂ = 0 in the enumeration of the variances and covariance. In this situation the optimal designs are proved to be symmetric, and hence the design problem is simplified to a choice from the class of symmetric designs. As was pointed out in Section 2.3, previous papers on the design of paired comparison experiments have always relied upon setting all parameters equal to zero. Presently, the linear parameter is allowed to vary. The three criteria are continuous functions of the parameters, because the variance-covariance matrix is continuous in the parameters. Therefore, the designs found for the second condition
are locally optimal in the sense that they are close to optimal for values of β₂ which are near zero.

The third condition in each of the following three sections is a consideration of designs whose comparisons are of the form (−1, x₁), (x₂, 1), and (−1, 1). The optimal choices of the levels x₁ and x₂ are found in addition to the three comparison proportions. This is a generalization of the first condition, since setting x₁ = x₂ = 0 results in the first condition.

Figure 4.1 presents a synopsis of the procedures used in Sections 4.2–4.4 to find the optimal designs for each of the three optimality criteria and the three conditions introduced in the present section. The first column presents a brief description of the three conditions, and hence the three rows in the figure correspond to the three conditions. Columns 2–4 contain a brief description of the methodology used for the three criteria.

4.2 D-optimal Designs

The D-optimal design is defined to be the design which minimizes the determinant of the asymptotic variance-covariance matrix. Letting Σ denote the variance-covariance matrix, this is equivalent to maximizing |Σ⁻¹|. In the case of a quadratic model, the D-optimal design is the design which maximizes D, where

    D = λ₁₁λ₂₂ − λ₁₂²,  (4.2.1)

and where λ₁₁, λ₁₂, and λ₂₂ are given in (3.3.2)–(3.3.4). Because the optimal design depends on the true parameter values, designs are found
Figure 4.1. Synopsis of the procedures used to find the optimal designs in Sections 4.2–4.4. (Figure not reproduced.)
for various choices of the parameters. The three conditions introduced in Section 4.1 are now discussed, beginning with Condition 1.

Condition 1

The levels are restricted to be x = −1, 0, 1. Because of this restriction, and by substituting μ_ij for n_ij in (3.3.2)–(3.3.4), the expressions for λ₁₁, λ₁₂, and λ₂₂ can be simplified to

    λ₁₁ = μ₋₁,₀ φ₋₁,₀ + μ₀,₁ φ₀,₁ + 4μ₋₁,₁ φ₋₁,₁,

    λ₁₂ = μ₀,₁ φ₀,₁ − μ₋₁,₀ φ₋₁,₀,  (4.2.2)

    λ₂₂ = μ₋₁,₀ φ₋₁,₀ + μ₀,₁ φ₀,₁.

The substitution of μ_ij for n_ij is made throughout the present chapter. This essentially sets N = 1, since μ_ij = lim(n_ij/N). From (4.2.1) and (4.2.2), the expression for D is

    D = 4μ₋₁,₀ μ₀,₁ φ₋₁,₀ φ₀,₁ + 4μ₋₁,₁ ( μ₋₁,₀ φ₋₁,₀ + μ₀,₁ φ₀,₁ ) φ₋₁,₁,  (4.2.3)

where φ_ij is defined in (3.3.5). The partial derivatives of D with respect to μ₋₁,₀ and μ₀,₁ are, respectively,

    ∂D/∂μ₋₁,₀ = 4μ₀,₁ φ₋₁,₀ φ₀,₁ − 4φ₋₁,₁ ( μ₋₁,₀ φ₋₁,₀ + μ₀,₁ φ₀,₁ ) + 4( 1 − μ₋₁,₀ − μ₀,₁ ) φ₋₁,₁ φ₋₁,₀,  (4.2.4)
    ∂D/∂μ₀,₁ = 4μ₋₁,₀ φ₋₁,₀ φ₀,₁ − 4φ₋₁,₁ ( μ₋₁,₀ φ₋₁,₀ + μ₀,₁ φ₀,₁ ) + 4( 1 − μ₋₁,₀ − μ₀,₁ ) φ₋₁,₁ φ₀,₁.  (4.2.4)

Setting the derivatives in (4.2.4) equal to zero results in

    μ₋₁,₀ ( 2φ₋₁,₀ φ₋₁,₁ ) + μ₀,₁ ( φ₋₁,₀ φ₋₁,₁ + φ₀,₁ φ₋₁,₁ − φ₋₁,₀ φ₀,₁ ) = φ₋₁,₀ φ₋₁,₁,

    μ₋₁,₀ ( φ₋₁,₀ φ₋₁,₁ + φ₀,₁ φ₋₁,₁ − φ₋₁,₀ φ₀,₁ ) + μ₀,₁ ( 2φ₀,₁ φ₋₁,₁ ) = φ₀,₁ φ₋₁,₁.  (4.2.5)

In matrix notation, (4.2.5) is equivalent to

    A ( μ₋₁,₀ , μ₀,₁ )' = ( φ₋₁,₀ φ₋₁,₁ , φ₀,₁ φ₋₁,₁ )',  (4.2.6)

implying

    ( μ₋₁,₀ , μ₀,₁ )' = A⁻¹ ( φ₋₁,₀ φ₋₁,₁ , φ₀,₁ φ₋₁,₁ )',  (4.2.7)

where

    A = [ 2φ₋₁,₀ φ₋₁,₁ , φ₋₁,₀ φ₋₁,₁ + φ₀,₁ φ₋₁,₁ − φ₋₁,₀ φ₀,₁ ;
          φ₋₁,₀ φ₋₁,₁ + φ₀,₁ φ₋₁,₁ − φ₋₁,₀ φ₀,₁ , 2φ₀,₁ φ₋₁,₁ ].

For the solution of (4.2.7) to be a maximum for the expression in (4.2.3), it needs to be shown that the matrix of second order partial
72 derivatives is negative definite (see Kaplan, page 128). By (4.2.4), this matrix is X = ^'Â°/3^i,o' ^ Â°/^^i,o3%,i 5'Â°/3^i,o3^o,i\ ''^^'%,l' = 4 A , (4.2.8) where, from (3.3.5), 1,1 w. 1 1x2 = l/( e + e ) , 1,0 1 "^2 ., 1 "^2 ^ . 2 /( e + 1 ) , (4.2,9) ^1 ^ ^2 ^ ^ ^2 2 It is known from elementary matrix algebra that the determinant of a square matrix is equal to the product of its eigenvalues. So in this case, x=e e^, where e and e are the eigenvalues of X. If x>0, then either both eigenvalues are positive or both are negative, or equivalently , X is either positive definite or negative definite. But it is clear from (4.2.8) that (1 ) X l,l'^l,0 < , (4.2.10) where and are both positive and are given in (4.2.9) . The inequality in (4.2.10) implies that X is negative definite if x>0. However, it will be shown by (4.2.12) that the determinant is not necessarily positive for all values of 6. and Q .
To simplify the notation, we write

    X = -4 [ 2a         a + c - b ]
           [ a + c - b  2c        ],

where

    a = phi_{1,-1} phi_{1,0}  = e^{beta_1 + beta_2} / ((e^{beta_1} + e^{-beta_1})^2 (e^{beta_1 + beta_2} + 1)^2),
    b = phi_{1,0} phi_{0,-1}  = e^{2 beta_2} / ((e^{beta_1 + beta_2} + 1)^2 (1 + e^{-beta_1 + beta_2})^2),
    c = phi_{1,-1} phi_{0,-1} = e^{-beta_1 + beta_2} / ((e^{beta_1} + e^{-beta_1})^2 (1 + e^{-beta_1 + beta_2})^2).

Then, apart from the positive factor 16, the determinant of X can be written as

    chi = 4ac - (a + c - b)^2 = 2ac + 2ab + 2bc - a^2 - b^2 - c^2.   (4.2.11)

After resubstituting for a, b, and c, and after some algebra in (4.2.11), an expression for chi is found:

    chi / zeta = (phi_{1,0} + phi_{0,-1} + phi_{1,-1}) / 2
                 - (1/4) ( phi_{1,0} phi_{0,-1} / phi_{1,-1}
                           + phi_{0,-1} phi_{1,-1} / phi_{1,0}
                           + phi_{1,0} phi_{1,-1} / phi_{0,-1} ),   (4.2.12)
where

    zeta = 4 phi_{1,0} phi_{0,-1} phi_{1,-1}
         = 4 e^{2 beta_2} / ((e^{beta_1} + e^{-beta_1})^2 (e^{beta_1 + beta_2} + 1)^2 (1 + e^{-beta_1 + beta_2})^2).   (4.2.13)

Since zeta > 0 for all beta_1 and beta_2, chi > 0 if and only if the right-hand side of (4.2.12) is positive. However, for large enough values of beta_1 and beta_2, the terms with a negative sign overwhelm the sum of all of the terms with a positive sign, and hence the determinant can be negative. It will be seen shortly that this is in fact the case.

An APL computer program was utilized to solve equation (4.2.7) for various values of beta_1 and beta_2. The determinant chi was also calculated, and it was found to be positive for all values of beta_1 and beta_2 whenever the solution fell inside the region of allowable values for the comparison proportions mu_{1,0}, mu_{0,-1}, and mu_{1,-1}; that is, mu_{ij} >= 0 for all i and j, and mu_{1,0} + mu_{0,-1} + mu_{1,-1} = 1. Figure 4.2 depicts the values that mu_{1,0} and mu_{0,-1} can have: they must lie inside or on the solid triangle. The dotted triangle in the figure depicts the region in which the D-optimal designs are exclusively located.

When the solution to (4.2.7) falls outside of the solid triangle in Figure 4.2, the optimal design must lie somewhere on the triangle itself. This can be proven by contradiction, since D is a quadratic function of two unknowns, and hence it can have only one stationary point. For a point to be on the triangle, one of the following must be true: mu_{1,0} = 0, mu_{0,-1} = 0, or mu_{1,0} + mu_{0,-1} = 1.
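Once the phi's are computed, equation (4.2.7) is an ordinary 2 x 2 linear solve. The original computation used APL; the following is a sketch of the same calculation (function names are ours):

```python
import numpy as np

def phis(b1, b2):
    """phi values of (4.2.9) for the pairs (1,0), (0,-1), (1,-1), using
    phi_ij = pi_i*pi_j/(pi_i+pi_j)^2 with ln(pi_x) = b1*x + b2*x^2."""
    pi = {x: np.exp(b1 * x + b2 * x * x) for x in (1.0, 0.0, -1.0)}
    phi = lambda u, v: pi[u] * pi[v] / (pi[u] + pi[v]) ** 2
    return phi(1.0, 0.0), phi(0.0, -1.0), phi(1.0, -1.0)

def stationary_design(b1, b2):
    """Solve the linear system (4.2.6)-(4.2.7) for (mu_{1,0}, mu_{0,-1});
    the third proportion is mu_{1,-1} = 1 - mu_{1,0} - mu_{0,-1}."""
    p10, p0m1, p1m1 = phis(b1, b2)
    off = p10 * p1m1 + p0m1 * p1m1 - p10 * p0m1
    A = np.array([[2.0 * p10 * p1m1, off],
                  [off, 2.0 * p0m1 * p1m1]])
    rhs = np.array([p10 * p1m1, p0m1 * p1m1])
    mu10, mu0m1 = np.linalg.solve(A, rhs)
    return mu10, mu0m1, 1.0 - mu10 - mu0m1
```

At beta_1 = beta_2 = 0 every phi equals 1/4 and symmetry forces the equal allocation (1/3, 1/3, 1/3); whether the solution lies inside the solid triangle of Figure 4.2 still has to be checked, as discussed above.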
75 '0,1 .5 1,0 Figure 4.2. Depiction of possible designs for Condition 1 If y_ p. = 0, then from (4.2.3), 1 , u Â° = ^*0,1*1,1%,1 ^0,1 ^ ' and oD/Sy = implies y = 0,5. This clearly maximizes D because of the negative coefficient on the quadratic term. Therefore if the optimal design lies on the line segment y_ =0, then it must be {y^ ^ = y_^ ^ = 0.5}. If ^Q ,_ = 0, then Â° = "'i'i, 1^1,0^^1,0 ^^1,0 ) ' and it can similarly be shown that the optimal design is the midpoint
of the line segment, namely {mu_{0,-1} = 0, mu_{1,0} = mu_{1,-1} = 0.5}. Finally, if mu_{1,0} + mu_{0,-1} = 1, then

    D = 4 mu_{1,0} mu_{0,-1} phi_{1,0} phi_{0,-1} = 4 phi_{1,0} phi_{0,-1} mu_{1,0} (1 - mu_{1,0}).

Again, it can similarly be shown that D is maximized when mu_{1,0} = mu_{0,-1} = 0.5. Therefore, if the D-optimal design is on the solid triangle, it must be one of these three designs.

When the solution to (4.2.7) was outside of the solid triangle in Figure 4.2, D was calculated for the three designs presented in the preceding paragraph, and the one with the largest value of D was taken as D-optimal. Otherwise the D-optimal design was found directly from (4.2.7). The determinant of X was also calculated. It was occasionally negative, as conjectured earlier, but only when the solution was outside of the region, in which case the sign of chi is irrelevant.

The "restricted" D-optimal designs are presented in Table 4.1 for all combinations of the parameter values beta_1 = 0(.1)1, 2, 3 and beta_2 = 0(.1)1, 2, 3. Recall that mu_{1,0}, mu_{0,-1}, and mu_{1,-1} represent the proportions of the total available comparisons to be allocated to the pairs (1,0), (0,-1), and (1,-1), respectively. The number of times one of these three pairs should be compared is then found by multiplying the corresponding mu_{ij} by N. The values of D are also given for the purpose of comparing the designs with the designs found in the next two subsections. Additionally, relative efficiencies are given for the "restricted" D-optimal designs presented in Table 4.1 relative to the D-optimal designs
found for Condition 3 in Table 4.3. The relative efficiency is the square root of the ratio of the corresponding values of D, since D is a multiple of N^2.

Recall from Section 3.1 that a value of 1 for the parameter beta_1 is "large." This is basically true for any parameter, since x is in the interval [-1,1]. For this reason, it is assumed that rarely will beta_1 or beta_2 be larger than 1, and hence parameter values of less than or equal to 1 are of most interest. Parameter values of 2 and 3 are included primarily to observe the change in the designs for large parameter values.

Table 4.1 gives only positive parameter values. The following theorem shows that the table can also be used to find optimal designs for negative values of beta_1 or beta_2. This theorem is similar to Theorem 3.6.

Theorem 4.1. Suppose that the D-optimal design for given beta_1 and beta_2 is a comparison of x_i with x_j a total of n_{x_i,x_j} times, for all pairs in the design. Then

    (i) the D-optimal design for (-beta_1) and (-beta_2) is the same as for beta_1 and beta_2;
    (ii) the D-optimal design for (-beta_1) and beta_2 is a comparison of (-x_i) with (-x_j) a total of n_{x_i,x_j} times, for all pairs in the design;
    (iii) the D-optimal design for beta_1 and (-beta_2) is the same as in (ii).

Proof. From (3.4.3),

    phi_{-beta_1,-beta_2}(x_i, x_j) = phi_{beta_1,beta_2}(x_i, x_j),
where phi_{beta_1,beta_2}(x_i, x_j) denotes the value of phi_{ij} under the parameter pair (beta_1, beta_2). Since the lambda's of (4.2.2), and hence D, depend on the parameters only through the phi's, part (i) follows. For (ii), note that replacing beta_1 by (-beta_1) changes ln pi_{x_i} - ln pi_{x_j} into beta_1((-x_i) - (-x_j)) + beta_2((-x_i)^2 - (-x_j)^2), so that replacing each level by its negative yields the same phi's, and therefore the same value of D. Part (iii) follows by combining (i) and (ii), since the pair (beta_1, -beta_2) is the negative of the pair (-beta_1, beta_2).
This indicates that the increase in efficiency for the generalization considered under Condition 3 is slight. If an experimenter requires that a minimum number of levels be used, then the advantage of the designs found presently under Condition 1 is that they have only three levels, as opposed to four levels for the other two conditions. The fact that they are also about 95% efficient allows the present designs to be useful in this situation. However, if there is no significant advantage to having only three levels, then perhaps the designs found under Conditions 2 and 3 are more useful, since they are slightly more efficient.

Condition 2

In this subsection, locally D-optimal designs are found analytically for beta_2 = 0. The designs are local as described in Section 4.1: by the continuity of the criterion, the optimal designs are close to optimal for beta_2 close to zero. It turns out that they are also local in the sense that the designs found under Condition 3 have slightly larger values of D for some values of beta_1. Since the criterion is a maximization of D, the designs under Condition 3 are therefore occasionally slightly better.

By the same argument that was used in the proof of Theorem 3.1, the D-optimal design must be a symmetric design when beta_2 = 0, which implies lambda_12 = 0, so the search for a D-optimal design is reduced to a search for a symmetric design which maximizes D = lambda_11 lambda_22. Also, the proof of Theorem 3.2 can be implemented here to show that every pair in the D-optimal design must include either x = 1 or x = -1.
From (3.3.7), phi_{ij} is written as

    phi_{ij} = ( 4 cosh^2( beta_1 (x_i - x_j)/2 ) )^{-1},

where

    cosh(u) = (e^u + e^{-u})/2,    sinh(u) = (e^u - e^{-u})/2.

The maximization of D is equivalent to the maximization of the following redefined D:

    D = 4 lambda_11 lambda_22 = f(x) g(x),   (4.2.14)

where

    f(x) = sum_{i=1}^{N/2} (1 - x_i)^2 / cosh^2( beta_1 (x_i - 1)/2 ),   (4.2.15)

    g(x) = sum_{i=1}^{N/2} (1 - x_i)^2 (1 + x_i)^2 / cosh^2( beta_1 (x_i - 1)/2 ),   (4.2.16)

    x = (x_1, x_2, ..., x_{N/2}).

Two useful, easily verified facts are

    d cosh( beta_1 (x_i - 1)/2 ) / d x_i = (beta_1/2) sinh( beta_1 (x_i - 1)/2 ),
    d sinh( beta_1 (x_i - 1)/2 ) / d x_i = (beta_1/2) cosh( beta_1 (x_i - 1)/2 ).   (4.2.17)
The partial derivative of D with respect to x_i is then

    dD/dx_i = (df(x)/dx_i) g(x) + f(x) (dg(x)/dx_i),   (4.2.18)

where

    df(x)/dx_i = -2(1 - x_i)/cosh^2( beta_1 (x_i - 1)/2 )
                 - (1 - x_i)^2 beta_1 sinh( beta_1 (x_i - 1)/2 )/cosh^3( beta_1 (x_i - 1)/2 ),   (4.2.19)

    dg(x)/dx_i = -4 x_i (1 - x_i^2)/cosh^2( beta_1 (x_i - 1)/2 )
                 - (1 - x_i^2)^2 beta_1 sinh( beta_1 (x_i - 1)/2 )/cosh^3( beta_1 (x_i - 1)/2 ),
                 i = 1, ..., N/2.   (4.2.20)

Because of the complexity of handling these N/2 equations in N/2 unknowns, and because it appeared to be reasonable, the expression in (4.2.18) was evaluated at the (N/2)-dimensional point x_i = x, i = 1, ..., N/2, set equal to zero, and solved. This produced a design of two equally weighted comparisons, namely (1, x) and (-x, -1), for some x. The mechanics of the procedure follow. The partial derivative of D with respect to x_i, evaluated at x_i = x for all i, is

    dD/dx_i = -N (1 - x)^3 (1 + x) cosh^{-5} [ (1 + 3x) cosh + beta_1 (1 - x^2) sinh ],   (4.2.21)

where cosh and sinh are shorthand for cosh( beta_1 (x - 1)/2 ) and
sinh( beta_1 (x - 1)/2 ), respectively. By setting the expression in (4.2.21) equal to zero, multiplying it by -cosh^5 / (N (1 - x)^3 (1 + x)), and combining terms, we have

    (1 + 3x) cosh + beta_1 (1 - x^2) sinh = 0.   (4.2.22)

Because of the transcendental nature of this equation, x cannot be expressed as a closed-form function of beta_1. Consequently, an APL program was written to find the solution to (4.2.22) for various choices of beta_1. These designs are presented in Table 4.2 and are discussed at the end of the present subsection.

In order for the solution to (4.2.22) to be a local maximum of D, it must be shown that

    sum_i sum_j ( d2D/dx_i dx_j ) cos(a_i) cos(a_j) < 0

for any angles a_i such that 0 <= a_i <= 2 pi and sum_{i=1}^{N/2} cos^2(a_i) > 0 (see Kaplan, page 128). This is equivalent to showing that the matrix X is negative definite, where

    X = ( d2D/dx_i dx_j ),  an (N/2) x (N/2) matrix.   (4.2.23)
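Because (4.2.22) has no closed form, its root must be found numerically; the original work used APL. A bisection sketch (names are ours; the bracket relies on the left side vanishing only for x > -1/3 when beta_1 > 0, since sinh(beta_1(x-1)/2) is nonpositive there, and on the left side equaling 4 at x = 1):

```python
import math

def lhs_4222(x, b1):
    """Left-hand side of (4.2.22)."""
    u = b1 * (x - 1.0) / 2.0
    return (1.0 + 3.0 * x) * math.cosh(u) + b1 * (1.0 - x * x) * math.sinh(u)

def solve_4222(b1, tol=1e-12):
    """Find the root of (4.2.22) in (-1/3, 1] by bisection.  For b1 = 0
    the equation degenerates to 1 + 3x = 0, giving x = -1/3."""
    if b1 == 0.0:
        return -1.0 / 3.0
    lo, hi = -1.0 / 3.0, 1.0   # lhs(lo) < 0 < lhs(hi) for b1 > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lhs_4222(mid, b1) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The locally D-optimal design then compares (1, x) and (-x, -1) equally often, with x = solve_4222(beta_1).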
From (4.2.18), the second-order partial derivatives are

    d2D/dx_i^2 = (d2f/dx_i^2) g + 2 (df/dx_i)(dg/dx_i) + f (d2g/dx_i^2),  i = 1, ..., N/2,   (4.2.24)

    d2D/dx_i dx_j = (df/dx_i)(dg/dx_j) + (df/dx_j)(dg/dx_i),  i != j,  i, j = 1, ..., N/2,   (4.2.25)

the mixed second partials of f and g themselves vanishing because each is a sum of terms involving a single x_i. From (4.2.19) and (4.2.20),

    d2f/dx_i^2 = ( 2 - (1 - x_i)^2 beta_1^2 / 2 ) / cosh^2 + 4 (1 - x_i) beta_1 sinh / cosh^3
                 + (3/2) (1 - x_i)^2 beta_1^2 sinh^2 / cosh^4,   (4.2.26)

    d2g/dx_i^2 = ( 4 (3 x_i^2 - 1) - (1 - x_i^2)^2 beta_1^2 / 2 ) / cosh^2 + 8 x_i (1 - x_i^2) beta_1 sinh / cosh^3
                 + (3/2) (1 - x_i^2)^2 beta_1^2 sinh^2 / cosh^4,  i = 1, ..., N/2,   (4.2.27)

where cosh and sinh have arguments beta_1 (x_i - 1)/2. Evaluating the second-order partial derivatives at the point x_i = x for all i, (4.2.23) can be written as

    X = [ a  b  ...  b ]
        [ b  a  ...  b ]
        [ ...          ]
        [ b  b  ...  a ],   (4.2.28)
where

    a = d2D/dx_i^2 |_{x_i = x} = [ (d2f/dx_i^2) g + 2 (df/dx_i)(dg/dx_i) + f (d2g/dx_i^2) ]_{x_i = x}   (4.2.29)

and

    b = d2D/dx_i dx_j |_{x_i = x} (any i != j) = 2 (df/dx_i)(dg/dx_j) |_{x_i = x}.   (4.2.30)

The complete expressions for a and b can be found from (4.2.29) and (4.2.30), in addition to (4.2.15), (4.2.16), (4.2.19), and (4.2.20), by substituting x for x_i. The matrix X is of a special form as indicated by (4.2.28). It can be written as X = D + alpha y z', where D = (a - b) I_{N/2}, alpha = b, and y' = z' = j' = (1, ..., 1). From Graybill (page 187, Theorem 8.5.3), the characteristic equation is

    ( d + alpha sum_{i=1}^{N/2} y_i z_i - lambda ) ( d - lambda )^{N/2 - 1} = 0.
So (N/2 - 1) of the eigenvalues are d = a - b, and one eigenvalue is d + alpha sum y_i z_i = (a - b) + (N/2) b. As discussed in the previous subsection, the matrix X is negative definite if and only if all of the eigenvalues are negative. An analytic proof that the two eigenvalues are negative follows.

Consider first the eigenvalue (a - b), where from (4.2.29) and (4.2.30),

    a - b = [ (d2f/dx_i^2) g(x) + f(x) (d2g/dx_i^2) ]_{x_i = x}

          = (N/2) (1 - x)^2 (1 + x)^2 cosh^{-6} [ cosh^2 ( 2 - (1 - x)^2 beta_1^2 / 2 )
                + 4 (1 - x) beta_1 sinh cosh + (3/2) (1 - x)^2 beta_1^2 sinh^2 ]
            + (N/2) (1 - x)^2 cosh^{-6} [ cosh^2 ( 4 (3 x^2 - 1) - (1 - x^2)^2 beta_1^2 / 2 )
                + 8 x (1 - x^2) beta_1 sinh cosh + (3/2) (1 - x^2)^2 beta_1^2 sinh^2 ].   (4.2.31)

It needs to be shown that (a - b) is negative at the stationary point x, which is the solution to (4.2.22). This relationship between x and beta_1 found in (4.2.22) is used in the above expression for (a - b) by substituting for sinh:

    sinh = -(1 + 3x) cosh / ( beta_1 (1 - x^2) ).   (4.2.32)
Upon substituting (4.2.32) into (4.2.31), the eigenvalue (a - b) at the stationary point x is

    a - b = (N/2) (1 - x)^2 (1 + x)^2 cosh^{-4} [ 2 - (1 - x)^2 beta_1^2 / 2
                - 4 (1 + 3x)/(1 + x) + (3/2) (1 + 3x)^2/(1 + x)^2 ]
            + (N/2) (1 - x)^2 cosh^{-4} [ 4 (3 x^2 - 1) - (1 - x^2)^2 beta_1^2 / 2
                - 8 x (1 + 3x) + (3/2) (1 + 3x)^2 ].   (4.2.33)

After factoring out (N/2) (1 - x)^2 / cosh^4 and combining terms, a - b < 0 if and only if

    -beta_1^2 (x^2 - 1)^2 + (5 x^2 - 2x - 3) < 0.   (4.2.34)

The first term has double roots at x = +1 and x = -1 and is nonpositive for x in [-1,1]. The second term factors as (5x + 3)(x - 1), which is negative for x in (-3/5, 1); and any stationary point of (4.2.22) must satisfy x > -1/3, since sinh is nonpositive there, forcing 1 + 3x > 0. Hence the left-hand side of (4.2.34) is negative at the stationary point, and the result a - b < 0 follows.

The second eigenvalue is (a - b) + (N/2) b. From (4.2.19), (4.2.20), and (4.2.30),

    b = 2 (1 - x)^2 cosh^{-6} [ -2 cosh - (1 - x) beta_1 sinh ] [ -4x (1 + x) cosh - (1 - x)(1 + x)^2 beta_1 sinh ]

      = 2 (1 - x)^2 cosh^{-6} [ 8x (1 + x) cosh^2 + 2 (1 - x)(1 + x)^2 beta_1 cosh sinh
            + 4x (1 - x^2) beta_1 cosh sinh + (1 - x^2)^2 beta_1^2 sinh^2 ]

      = -2 (1 - x)^4 / cosh^4,   (4.2.35)
the last step produced by again using the relationship in (4.2.32) and combining terms. Then

    (N/2) b = (N/2) (1 - x)^2 cosh^{-4} ( -2 (1 - x)^2 ).   (4.2.36)

By factoring (N/2) (1 - x)^2 / cosh^4 out of the second eigenvalue, it is clear from (4.2.34) and (4.2.36) that (a - b) + (N/2) b < 0 if and only if

    -beta_1^2 (x^2 - 1)^2 + (5 x^2 - 2x - 3) - 2 (1 - x)^2 < 0,

if and only if

    -beta_1^2 (x^2 - 1)^2 + (3 x^2 + 2x - 5) < 0.   (4.2.37)

The first term in (4.2.37) is unchanged from (4.2.34). The second term factors as (3x + 5)(x - 1), which is negative for all x in [-1, 1), and hence the eigenvalue is negative. Therefore X is negative definite, and the solution to (4.2.22) is in fact a local maximum of D.

The locally D-optimal designs for beta_1 = 0(.02)1, 2(1)5 are presented in Table 4.2. The value of the original D = lambda_11 lambda_22 is also given for comparison purposes. By comparing the present values of D to the corresponding values of D associated with the designs in Table 4.1, it can be seen that the present designs offer a slight improvement for beta_2 = 0. Theorem 3.3 is applicable here since beta_2 = 0, and so the optimal design for any particular value of beta_1 is the same as the optimal design for (-beta_1). Recall that for a given beta_1, the optimal design is to make half
of the total comparisons between x and 1, where x is found in the table, and the other half between (-x) and (-1). The optimal value of x is graphed as a function of beta_1, for beta_1 <= 1, in Figure 4.3.

Condition 3

A grid method was used to find D-optimal designs with no restrictions on the values of the two parameters. With beta_2 != 0, the argument for a symmetric design is no longer applicable. As a generalization of the first subsection, the three comparisons (1, x_1), (x_2, -1), and (1, -1) were allowed. A description of the grid method follows.

In general, if there are k independent unknowns, and the grid is to have n values for each variable at each stage, then the grid method is simply an evaluation of the function at each of the n^k points, and the determination of which of these points maximizes (or minimizes) the function. The procedure is then repeated with a finer grid centered at the best point previously found, and this is continued until the optimal point has been found with sufficient accuracy.

The procedure used for the present condition was essentially a grid within a grid. At each step, the combinations of three values each of x_1 and x_2 were taken. Initially these values were -0.5, 0, and 0.5. For each of these 9 points, a grid method was used on the allocation of the comparisons to the three pairs. This grid consisted of initial increments of 0.1 in the allocation fractions. Notice that this is not a full 11^3 grid since, for example, {mu_{1,x_1} = mu_{x_2,-1} = mu_{1,-1} = 0.5} is not a possible allocation for the three pairs, because they must sum to
1. After the best of these possible allocations was found, a finer 5^2 grid was performed about the current best allocation. Note that it is a 5^2 grid rather than a 5^3 grid, because when two of the allocation proportions are known, the third is just the sum of the two subtracted from 1. This was then repeated until the best allocation fractions were accurate to +/-.0001. The best of the 9 combinations of x_1 and x_2 was taken, and then a finer 3^2 grid about the current best combination of x_1 and x_2 was performed using the same procedure described in the present paragraph.

The entire procedure described in the preceding paragraph was repeated until the values of x_1 and x_2 were also accurate to +/-.0001. D-optimal designs were found for combinations of beta_1 = 0(.1)1, 2, 3 and beta_2 = 0(.1)1, 2, 3, and are presented in Table 4.3. Once again, the value of D for N = 1 is also given in the table. Notice from the table that when n_{x_2,-1} = 0, the value of x_2 is immaterial, and therefore an asterisk appears under the column for x_2.

Theorem 4.1 is applicable, so designs for negative beta_1 or beta_2 can also be found from the table. If both beta_1 and beta_2 are negative, then the optimal design is identical to the one for positive beta_1 and beta_2. If, however, exactly one of the parameters is negative, then the optimal design is the same as the one for |beta_1| and |beta_2|, except that n_{1,x_1} changes to n_{-1,-x_1} and n_{x_2,-1} changes to n_{-x_2,1}. For example, if beta_1 = -0.6 and beta_2 = 0.3, then the D-optimal design would be to run .462N comparisons of the pair (-1, .23), .468N of the pair (-.25, 1), and .070N of the pair (-1, 1).

Table 4.3 indicates that once again no pair should be compared more than half the time. Also, the optimal design is inside the region
Table 4.1. D-optimal designs with restriction x = -1, 0, 1. (Table entries are not reproduced here.)
Table 4.2. D-optimal (local) designs with beta_2 = 0. (Table entries are not reproduced here.)
Table 4.3. D-optimal designs. (Table entries are not reproduced here.)
depicted by Figure 4.2 for small parameter values, and the designs approach the boundary as the parameters become larger. Notice the peculiar designs for beta_2 = 2, 3 and .5 <= beta_1 <= 1: the optimal design in this case is a comparison of only (1, x_1) and (1, -1), each N/2 times.

A comparison of the designs in Table 4.2 with the designs for beta_2 = 0 in Table 4.3 shows that at least some of the designs found in the previous subsection are only locally optimal. This follows from the fact that the value of D is larger in the present subsection than it is in the previous subsection for beta_1 >= .6; for smaller values of beta_1 the designs are identical.

Based on what has been presented up to now, the design D0 = {n_{1,x_1} = n_{x_2,-1} = .4N, n_{1,-1} = .2N} is a candidate for an overall efficient design. This is reasonably close to all of the optimal designs in Table 4.3 where the parameters are both less than or equal to 1. The efficiency of D0 relative to the optimal designs is discussed in detail in Section 4.5. Also considered in Section 4.5 is the amount of bias present for design D0 when the true model is cubic.

4.3 Average-variance Designs

The optimality criterion discussed in this section minimizes V, the average variance of the predicted value, where

    V = integral_{-1}^{1} var( ln pi-hat_x ) dx.   (4.3.1)

The integral is over the experimental region, which in this case is [-1,1], and ln pi-hat_x = beta-hat_1 x + beta-hat_2 x^2. The variance of the predicted value, ln pi-hat_x, is
    var( ln pi-hat_x ) = (x, x^2) Lambda^{-1} (x, x^2)'
                       = ( x^2 lambda_22 - 2 x^3 lambda_12 + x^4 lambda_11 ) / ( lambda_11 lambda_22 - lambda_12^2 ).   (4.3.2)

From (4.3.1) and (4.3.2),

    V = ( (2/3) lambda_22 + (2/5) lambda_11 ) / ( lambda_11 lambda_22 - lambda_12^2 ).   (4.3.3)

Three methods for finding optimal designs, analogous to the methods discussed in the previous section for D-optimal designs, are now discussed.

Condition 1

The levels are again restricted to be x = -1, 0, 1. Because of the complexity of V relative to D, it was decided to use a grid method for this condition rather than approach the problem in a manner similar to what was done for D-optimality. The grid consisted of initial increments of 0.1 in the allocation fractions. After the best of these allocations was found, finer and finer 5^2 grids were run until the fractions were accurate to at least +/-.001.

The procedure described in the preceding paragraph was performed for all combinations of beta_1 = 0(.1)1, 2, 3 and beta_2 = 0(.1)1, 2, 3. The designs are presented in Table 4.4 along with the value of V for N = 1. Again, efficiencies of the "restricted" designs relative to the designs found for Condition 3 are given. In this case, V is a multiple of 1/N, so the relative efficiency is defined to be the ratio of the corresponding
values of V. The relative efficiencies are larger than .95 when beta_1 and beta_2 are both less than or equal to 1, indicating that the designs in Table 4.4 are only slightly less efficient than the optimal designs found in Table 4.6.

The proof of Theorem 4.1 is again applicable, and so optimal designs for negative values of beta_1 or beta_2 can be found as before. That is, optimal designs for negative beta_1 and beta_2 are the same as for |beta_1| and |beta_2|. When exactly one of the parameters is negative, the columns for mu_{1,0} and mu_{0,-1} effectively exchange.

Condition 2

In this subsection, local average-variance designs are found under the assumption that the true quadratic parameter is zero. The argument for symmetry is identical to the one used in Condition 2 for the D-optimality criterion: reducing lambda_12^2 reduces V, and so the optimal design must be symmetric, because lambda_12 = 0 for a symmetric design. Then from (4.3.3),

    V = (2/3)/lambda_11 + (2/5)/lambda_22.   (4.3.4)

Again, using a similar proof, it can be shown that every pair in the optimal design must include either x = 1 or x = -1. Then from (4.3.4), V can be written

    V = (2/3)/f(x) + (2/5)/g(x),   (4.3.5)

where f(x) and g(x) are defined in (4.2.15) and (4.2.16).
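For any candidate design, V of (4.3.3) is straightforward to evaluate. The sketch below builds the lambda's following the pattern of (4.2.2) (function names are ours; only lambda_12^2 enters V, so its sign convention is immaterial):

```python
import math

def lambdas(design, b1, b2):
    """lambda_11, lambda_12, lambda_22 for a design {(x_i, x_j): mu_ij},
    using phi_ij = pi_i*pi_j/(pi_i+pi_j)^2 with ln(pi_x) = b1*x + b2*x^2,
    following the pattern of (4.2.2)."""
    l11 = l12 = l22 = 0.0
    for (xi, xj), mu in design.items():
        pii = math.exp(b1 * xi + b2 * xi * xi)
        pij = math.exp(b1 * xj + b2 * xj * xj)
        phi = pii * pij / (pii + pij) ** 2
        l11 += mu * phi * (xi - xj) ** 2
        l12 += mu * phi * (xi - xj) * (xi ** 2 - xj ** 2)
        l22 += mu * phi * (xi ** 2 - xj ** 2) ** 2
    return l11, l12, l22

def avg_variance(design, b1, b2):
    """V of (4.3.3)."""
    l11, l12, l22 = lambdas(design, b1, b2)
    return ((2.0 / 3.0) * l22 + (2.0 / 5.0) * l11) / (l11 * l22 - l12 ** 2)
```

For instance, at beta_1 = beta_2 = 0 the equal allocation over the pairs (1,0), (0,-1), (1,-1) gives lambda_11 = 1/2, lambda_12 = 0, lambda_22 = 1/6, and hence V = 56/15.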
The partial derivative of V with respect to x_i is

    dV/dx_i = -(2/3)(df(x)/dx_i)/f^2(x) - (2/5)(dg(x)/dx_i)/g^2(x),   (4.3.6)

where df(x)/dx_i and dg(x)/dx_i are given in (4.2.19) and (4.2.20). As in Condition 2 of Section 4.2, the derivative was evaluated at the symmetric point x_i = x, i = 1, ..., N/2, set equal to zero, and solved; the resulting stationarity condition is again transcendental in x and was solved numerically, producing a design of two equally weighted comparisons, (1, x) and (-x, -1). To verify a local minimum, the matrix of second-order partial derivatives, which again has the form (4.2.28), must be shown to be positive definite; (N/2 - 1) of its eigenvalues equal (a - b).
The remaining eigenvalue is (a - b) + (N/2) b. Both eigenvalues were evaluated numerically along with the solution of the stationarity condition, and both were positive in all cases considered, so the designs presented are at least locally optimal for the average-variance criterion.
Table 4.4. Average-variance designs with restriction x = -1, 0, 1. (Table entries are not reproduced here.)
Table 4.5. Average-variance (local) designs with beta_2 = 0. (Table entries are not reproduced here.)
Table 4.6. Average-variance designs. (Table entries are not reproduced here.)
Condition 3

A grid method identical to the one described in Condition 3 for D-optimal designs is used to minimize V in (4.3.3). The optimal designs are presented in Table 4.6. Theorem 4.1 is again applicable, and so optimal designs for negative values of beta_1 or beta_2 can be found in the manner described in Condition 3 of Section 4.2.

Notice that the pattern of the designs in Table 4.6 is similar to that of the D-optimal designs in Table 4.3. However, it is no longer true that every pair is compared no more than half the time. Also, for beta_2 <= 1, the change in x_1, x_2, mu_{1,x_1}, mu_{x_2,-1}, or mu_{1,-1} found in Table 4.6 is monotone as a function of beta_1. For beta_2 = 2 or beta_2 = 3, however, this is not the case; for example, notice the large difference between the optimal design for beta_1 = .8, beta_2 = 2 and the optimal design for beta_1 = .9, beta_2 = 2.

Once again, a comparison of the designs in Table 4.5 with the designs for beta_2 = 0 in Table 4.6 shows that the designs found in the preceding subsection result in only local minima of V when beta_1 >= .6. The corresponding designs found presently are superior because of the smaller value of V.

4.4 Minimax Designs

This section is a consideration of designs which minimize the maximum value of var( ln pi-hat_x ) over x in [-1,1]. These designs are referred to as minimax designs. From (4.3.2), the minimax design minimizes

    max_{x in [-1,1]} var( ln pi-hat_x )
        = max_{x in [-1,1]} x^2 ( lambda_22 - 2x lambda_12 + x^2 lambda_11 ) / ( lambda_11 lambda_22 - lambda_12^2 ).   (4.4.1)
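The maximization in (4.4.1) can be explored numerically before it is simplified. The sketch below evaluates (4.3.2) on a grid for arbitrary positive-definite information matrices (the specific lambda values are illustrative choices of ours):

```python
import numpy as np

def var_ln_pi(x, l11, l12, l22):
    """var(ln pi-hat_x) from (4.3.2)."""
    return (x**2 * l22 - 2.0 * x**3 * l12 + x**4 * l11) / (l11 * l22 - l12**2)

xs = np.linspace(-1.0, 1.0, 2001)
for l11, l12, l22 in [(0.5, 0.1, 0.2), (0.4, -0.15, 0.3)]:
    v = var_ln_pi(xs, l11, l12, l22)
    # the grid maximum falls on an endpoint, x = -1 or x = 1
    assert v.argmax() in (0, len(xs) - 1)
```

This endpoint behavior is exactly what Theorem 4.2 below establishes analytically.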
The following theorem simplifies (4.4.1).

Theorem 4.2. Given a quadratic model, var( ln pi-hat_x ) attains its maximum over x in [-1,1] at x = 1 or x = -1.

Proof. Note that lambda_11 lambda_22 - lambda_12^2 > 0, and bound each term of the numerator of (4.4.1) on [-1,1].

Case 1: lambda_12 <= 0. For -1 <= x <= 1, x^2 lambda_22 <= lambda_22, x^4 lambda_11 <= lambda_11, and -2 x^3 lambda_12 <= -2 lambda_12, so that

    var( ln pi-hat_x ) = ( x^2 lambda_22 - 2 x^3 lambda_12 + x^4 lambda_11 ) / ( lambda_11 lambda_22 - lambda_12^2 )
                       <= ( lambda_22 - 2 lambda_12 + lambda_11 ) / ( lambda_11 lambda_22 - lambda_12^2 )
                       = var( beta-hat_1 + beta-hat_2 ) = var( ln pi-hat_1 ).

Case 2: lambda_12 > 0. Similarly, now using -2 x^3 lambda_12 <= 2 lambda_12 for -1 <= x <= 1,

    var( ln pi-hat_x ) <= ( lambda_22 + 2 lambda_12 + lambda_11 ) / ( lambda_11 lambda_22 - lambda_12^2 )
                        = var( beta-hat_2 - beta-hat_1 ) = var( ln pi-hat_{-1} ).
In either case the maximum is attained at x = 1 or x = -1, which proves the theorem.

From the preceding theorem, the minimax design is the design which minimizes

    M = max{ ( lambda_22 - 2 lambda_12 + lambda_11 ) / ( lambda_11 lambda_22 - lambda_12^2 ),
             ( lambda_22 + 2 lambda_12 + lambda_11 ) / ( lambda_11 lambda_22 - lambda_12^2 ) }
      = ( lambda_11 + lambda_22 + 2|lambda_12| ) / ( lambda_11 lambda_22 - lambda_12^2 ).   (4.4.2)

Three conditions are again briefly discussed. The procedures for the three conditions are very similar to the procedures described for average-variance designs because of the similarity of V and M, given by (4.3.3) and (4.4.2), respectively.

Condition 1

The levels are restricted to be x = -1, 0, 1. A grid method identical to the one described in Condition 1 of Section 4.3 is used. The proof of Theorem 4.1 is applicable, and so optimal designs for negative beta_1 or beta_2 are found as explained for Condition 1 of Section 4.3. These "restricted" minimax designs are presented in Table 4.7 along with the values of M for N = 1. The efficiencies of the "restricted" designs relative to the designs for Condition 3 are once again given. In this case, M is a multiple of 1/N, so the relative efficiency is defined to be the ratio of the corresponding values of M. The relative efficiencies are once again larger than 0.95 when beta_1 and beta_2 are both less than or equal to 1, indicating that these designs are only slightly less
efficient than the optimal designs found at the end of the present section in Table 4.9.

Condition 2

In this subsection, local minimax designs are found under the assumption that the true quadratic parameter is zero. The same argument for symmetry as used in Condition 2 of Section 4.2 can be applied here: decreasing lambda_12^2 reduces the numerator of (4.4.2) and increases the denominator, and hence decreases M. So the minimax design must be a symmetric design, implying lambda_12 = 0. Then from (4.4.2),

    M = 1/lambda_11 + 1/lambda_22.   (4.4.3)

Also, the same argument used in Condition 2 of Section 4.2 can again be applied to show that every pair in the optimal design must be of the form (1, x_i) or (x_i, -1), for some x_i, and hence M can be written

    M = 1/f(x) + 1/g(x),   (4.4.4)

where f(x) and g(x) are defined in (4.2.15) and (4.2.16). Notice that the only differences between (4.4.4) and (4.3.5) are the coefficients 2/3 and 2/5. The partial derivative of M with respect to x_i is

    dM/dx_i = -(df(x)/dx_i)/f^2(x) - (dg(x)/dx_i)/g^2(x),   (4.4.5)

where df(x)/dx_i and dg(x)/dx_i are given in (4.2.19) and (4.2.20).
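Once the lambda's of a design are in hand, M of (4.4.2) is immediate to evaluate, and by Theorem 4.2 it agrees with a direct grid maximization of (4.3.2). A sketch (the lambda values are illustrative):

```python
import numpy as np

def minimax_criterion(l11, l12, l22):
    """M of (4.4.2): the worst-case variance of the predicted value
    over [-1,1], attained at x = -1 or x = 1 by Theorem 4.2."""
    det = l11 * l22 - l12 ** 2
    return (l11 + l22 + 2.0 * abs(l12)) / det

# Agreement with direct grid maximization of (4.3.2):
l11, l12, l22 = 0.5, 0.1, 0.2
xs = np.linspace(-1.0, 1.0, 4001)
v = (xs**2 * l22 - 2.0 * xs**3 * l12 + xs**4 * l11) / (l11 * l22 - l12**2)
assert abs(v.max() - minimax_criterion(l11, l12, l22)) < 1e-9
```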
Via the same procedure employed in Condition 2 of Section 4.3, the optimal design is a comparison of (1, x) and (-x, -1) an equal number of times, where x is the solution to

    ( 2(1 + x)^3 + 4x ) cosh + ( (1 - x)(1 + x)^3 + (1 - x^2) ) beta_1 sinh = 0.   (4.4.6)

To show that the matrix of second-order partial derivatives is positive definite, equations (4.3.8)-(4.3.14) can be repeated for the present situation. Everything is identical except that the multiples 2/3 and 2/5 are removed from the places where they appear. The eigenvalue (a - b) can then be shown to be equal to (4.3.11) after the coefficients 2/3 and 2/5 are removed from (4.3.11). Likewise, the eigenvalue (a - b) + (N/2) b is equal to (4.3.13) after the coefficients 2/3 and 2/5 are removed from (4.3.13).

An APL computer program solved (4.4.6) for beta_1 = 0(.02)1, 2(1)5, and simultaneously evaluated both eigenvalues. Both eigenvalues were positive in all cases, and so the designs presented are at least locally minimax. The designs are presented in Table 4.8 along with the value of M. Again, Theorem 3.3 is applicable, and so the optimal design for a negative value of beta_1 is the same as the optimal design for |beta_1|.

Condition 3

A grid method identical to the one described in Condition 3 of Section 4.2 was used. The minimax designs are presented in Table 4.9. Theorem 4.1 is again applicable, and so optimal designs for negative values of beta_1 or beta_2 can be found in the manner described for Condition 3 of Section 4.2.
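The iterative grid search used for Condition 3 under all three criteria can be sketched in one dimension as follows (the actual program searches over x_1, x_2, and the allocation fractions jointly; the function name and shrink factor here are ours):

```python
def grid_refine(f, lo, hi, n=11, tol=1e-4, shrink=0.2):
    """Maximize f on [lo, hi]: evaluate f on an n-point grid, recenter
    a finer grid on the best point, and repeat until the grid spacing
    falls below tol."""
    while (hi - lo) / (n - 1) > tol:
        step = (hi - lo) / (n - 1)
        best = max((lo + i * step for i in range(n)), key=f)
        half = shrink * (hi - lo) / 2.0
        lo, hi = best - half, best + half
    step = (hi - lo) / (n - 1)
    return max((lo + i * step for i in range(n)), key=f)
```

Because the window shrinks by a fixed factor around the current best grid point, each pass keeps the true optimum of a unimodal function inside the new window while multiplying the accuracy.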
Table 4.7. Minimax designs with restriction x = -1, 0, 1. (Table entries are not reproduced here.)
Table 4.8. Minimax (local) designs with beta_2 = 0. (Table entries are not reproduced here.)
Table 4.9. Minimax designs. (Table entries are not reproduced here.)
The optimal values of x_1 and x_2 in Table 4.9 are once again similar to those for the previous two optimality criteria discussed in the present chapter. However, notice that the pair (1, -1) should seldom or never be compared under the minimax criterion. That is the major difference between the present designs and the designs in the two previous sections. The optimal designs for beta_2 = 0 found in Table 4.9 are identical to the designs found in Table 4.8, suggesting that the minimax designs found in Table 4.8 may in fact give absolute minima of M.

4.5 Conclusion and Design Recommendations

Figure 4.3 is a graph, as a function of |beta_1|, of the optimal level to be compared with x = 1 for each of the three criteria discussed in the present chapter when beta_2 = 0. This figure has been referred to in Condition 2 of each of the three previous sections. Recall that these designs are only locally optimal in some cases, and hence are possibly not as useful as the optimal designs found for Condition 3 in the three previous sections. However, notice that the optimal value of x is closer to 0 for the minimax criterion than it is for the other two criteria. Also, the slope of the D-optimal curve is steeper than that of the other two, which are essentially parallel, signifying that the D-optimal designs change faster than the others as beta_1 changes.

The remainder of this section contains a discussion and comparison of seven particular designs with respect to how good they are overall. Estimates of their relative efficiencies are obtained for the three optimality criteria discussed in the present chapter, and it is determined which designs are best overall for protecting against possible bias.
Figure 4.3. Optimal value of x, as a function of |beta_1|, for the three criteria with beta_2 = 0. (Figure not reproduced here.)
Let Designs 1-3 be defined as three-point, four-point, and five-point designs, respectively, where the levels are equally spaced, and all pairs are compared an equal number of times. Also, define Designs 4-7 as:

    Design 4: {mu_{1,-.24} = mu_{.24,-1} = .45, mu_{1,-1} = .10};
    Design 5: {mu_{1,-.24} = mu_{.24,-1} = .40, mu_{1,-1} = .20};
    Design 6: {mu_{1,-.2} = mu_{.2,-1} = .50};
    Design 7: {mu_{1,-.5} = mu_{.5,-1} = .4706, mu_{1,-1} = .0588}.

The reason for including Designs 1-3 is that they are designs which might naively be run by an experimenter who has little knowledge of optimal designs. The motivation for including Designs 4 and 5 is that they are similar to many of the optimal designs found in Tables 4.3, 4.6, and 4.9, especially for relatively small parameter values. Design 6 is similar to the optimal designs for large parameter values. The source of Design 7 is Springall's paper; it is included here although he did not state any optimal properties for the design.

The efficiencies of the seven designs defined above relative to the optimal designs found in Tables 4.3, 4.6, and 4.9 were calculated for combinations of beta_1, beta_2 = 0(.2)1. The integrated mean square error (J) was calculated for beta_1, beta_2 = 0(.25)1 and beta_3 = -1(.4)1, where

    J = integral_{-1}^{1} E( ln pi-hat_x - g(x) )^2 dx,

N = 100, and the true model, g(x), is cubic. Finally, the bias part of J was calculated. See the following chapter for an expression for J and the bias.
Table 4.10 presents the minimum, maximum, and median of the relative efficiencies for each of the three criteria discussed in the present chapter. This table gives an overall estimate of what is lost by running a design different from the optimal design. The estimates are given for the 24 combinations of beta_1 and beta_2 where |beta_1| <= .6, for the 12 where |beta_1| > .6, and for all 36 combined. Remember that these minimums, maximums, and medians are only estimates of what could be obtained at the corresponding true parameter values.

The efficiencies for the D-optimal criterion will now be discussed; the average-variance and minimax criteria can be considered in a similar manner. For the condition |beta_1| <= 1, Designs 4 and 5 both have a median over 99%, with Design 4 slightly larger. However, Design 5 has a minimum efficiency of 96.5% as opposed to 93.5% for Design 4. Both of these designs are overall superior to the other designs considered. For the case |beta_1| <= .6, Designs 4 and 5 are again superior to the others. However, Design 6 is the better design for the case |beta_1| > .6, although Design 5 is very close.

The values of the bias and the integrated mean square error were ranked from smallest (1) to largest (7) for each of the 150 combinations of beta_1, beta_2, and beta_3 considered. Table 4.11 presents an arithmetic mean of these ranks. This gives a rough idea of how the bias and integrated mean square error compare among the seven designs. It is no surprise that Design 3 has a small bias, since there are a total of ten pairs compared an equal number of times. Surprisingly, Design 7 had the smallest bias every time. However, the variance part of J is much larger for Designs 3 and 7 than for Designs 5 and 6, and
PAGE 142
hence the latter two designs have smaller values of J, as indicated by the average rankings in Table 4.11.

By taking in conjunction the information given by Tables 4.10 and 4.11, the following recommendations are given. If no a priori information about β₂ is available, or if it is believed that |β₂| ≤ .6, then Design 5 is a good design to use. If it is believed that |β₂| > .6, then Design 6 is the better design.

The relative efficiencies were also calculated for the design in which each of four pairs is compared 25% of the time. They were in the vicinity of 25% efficient. Hence this design and Designs 1-3, any of which an experimenter could naively elect to use, are in fact poor designs for fitting a quadratic model.
Table 4.10. Efficiency of designs
Table 4.11. Bias of designs

                 Mean ranking
Design        Bias         J
1             7.00         6.00
2             5.94         5.59
3             2.41         5.97
4             ...          ...
5             3.85         2.18
6             2.74         1.65
7             1.00         3.19

(Design 2 compares each of 5 pairs an equal number of times; Design 3 compares each of 10 pairs an equal number of times.)
CHAPTER 5
DESIGNS THAT PROTECT AGAINST BIAS

5.1 Introduction

This chapter is concerned with finding designs which protect against the bias present when the true model is cubic and a quadratic model is fit. Section 5.2 contains a discussion of designs which minimize the bias. Section 5.3 presents designs which minimize J, the integrated mean square error, where

J = ∫_{-1}^{1} E( ln π̂_x - g(x) )² dx

  = ∫_{-1}^{1} E( ln π̂_x - E(ln π̂_x) )² dx + ∫_{-1}^{1} ( E(ln π̂_x) - g(x) )² dx

  = V + B ,                                                        (5.1.1)

and where

g(x) = β₁*x + β₂*x² + β₃*x³ ,                                      (5.1.2)

ln π̂_x = β̂₁x + β̂₂x² ,                                             (5.1.3)

ln π̄_x = β̄₁x + β̄₂x² ,                                             (5.1.4)

with β̄₁ = E(β̂₁) and β̄₂ = E(β̂₂).
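The bias term B of the decomposition (5.1.1) can be checked numerically. The sketch below integrates (E(ln π̂_x) - g(x))² by the midpoint rule and compares it with the closed form (5.2.2) derived in the next section, then checks that the minimizing values (5.2.4) do minimize it and that the intersection points fall at x = ±√.6 ≈ ±.7746. The parameter values are arbitrary.

```python
import math

def bias(b1bar, b2bar, b1s, b2s, b3s, m=20000):
    # B = integral over [-1,1] of ((b1bar-b1*)x + (b2bar-b2*)x^2 - b3*x^3)^2,
    # evaluated by the midpoint rule
    h, total = 2.0 / m, 0.0
    for k in range(m):
        x = -1 + (k + 0.5) * h
        diff = (b1bar - b1s)*x + (b2bar - b2s)*x**2 - b3s*x**3
        total += diff * diff * h
    return total

def bias_closed(b1bar, b2bar, b1s, b2s, b3s):
    # closed form (5.2.2)
    return ((2/3)*(b1bar - b1s)**2 + (2/5)*(b2bar - b2s)**2
            + (2/7)*b3s**2 - (4/5)*b3s*(b1bar - b1s))

b1s, b2s, b3s = 0.5, 0.3, 0.8
# minimizers (5.2.4): b1bar = b1* + (3/5)b3*, b2bar = b2*
opt1, opt2 = b1s + 0.6*b3s, b2s
```

Perturbing either coordinate away from (opt1, opt2) should increase the bias, and the quadratic ln π̄_x intersects the cubic g(x) exactly at 0 and ±√.6.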
In the present chapter, β₁*, β₂*, and β₃* represent the true parameter values. Notice that designs which minimize V in (5.1.1) when β₃*=0 were found in Section 4.3. Since β₃*=0 implies B=0, these designs also minimize J. An expression for V is given by (4.3.3). However, the λᵢⱼ in the expression are now a function of β₁*, β₂*, and β₃* rather than β₁ and β₂. A general expression for λᵢⱼ is found in (2.3.10).

5.2 All-Bias Designs

From (5.1.2) and (5.1.3),

( E(ln π̂_x) - g(x) )² = ( (β̄₁ - β₁*)x + (β̄₂ - β₂*)x² - β₃*x³ )²

  = (β̄₁ - β₁*)²x² + 2(β̄₁ - β₁*)(β̄₂ - β₂*)x³ + ( (β̄₂ - β₂*)² - 2β₃*(β̄₁ - β₁*) )x⁴

    - 2β₃*(β̄₂ - β₂*)x⁵ + β₃*²x⁶ .                                  (5.2.1)

Then from (5.1.1) and (5.2.1), the bias is

B = ∫_{-1}^{1} ( E(ln π̂_x) - g(x) )² dx

  = (2/3)(β̄₁ - β₁*)² + (2/5)(β̄₂ - β₂*)² + (2/7)β₃*² - (4/5)β₃*(β̄₁ - β₁*) .   (5.2.2)

Notice that any design which results in the values of β̄₁ and β̄₂ that minimize (5.2.2) is a minimum bias design. The optimal values of β̄₁ and β̄₂ as functions of β₁*, β₂*, and β₃* are now found. Taking derivatives in (5.2.2),
∂B/∂β̄₁ = (4/3)(β̄₁ - β₁*) - (4/5)β₃* ,

∂B/∂β̄₂ = (4/5)(β̄₂ - β₂*) .                                        (5.2.3)

Setting the expressions in (5.2.3) equal to zero results in

β̄₁ = β₁* + (3/5)β₃* ,

β̄₂ = β₂* .                                                        (5.2.4)

The three roots to the equation formed by equating (5.1.2) and (5.1.4), after substituting (5.2.4) into (5.1.4), are x = -.7746, 0, .7746. These three roots are the points at which the quadratic and cubic curves intersect.

The expected number of times that xᵢ is preferred over xⱼ is defined as nᵢⱼP(xᵢ → xⱼ), where the probability that xᵢ is preferred over xⱼ is calculated using the true cubic model. This is referred to as "expected" data. If the only levels compared are -.7746, 0, and .7746, then the "expected" data under the true cubic model is the same as the "expected" data under the quadratic model given by (5.2.4). Hence the maximum likelihood procedure using "expected" data under the true model results in β̄₁ and β̄₂ when only the three levels -.7746, 0, and .7746 are pairwise compared. This procedure is equivalent to first finding maximum likelihood estimates, β̂₁ and β̂₂, from the data, and then taking expectations. This was verified for the present situation. Therefore, any paired comparisons among the levels x = -.7746, 0, .7746 result in E(β̂₁) = β̄₁ and E(β̂₂) = β̄₂, where β̄₁ and β̄₂ are given in (5.2.4), and
hence any such design minimizes the bias. Notice that the optimal design does not depend on the parameters.

This choice of equally optimal designs does not appear to be reasonable, and so the designs are not recommended for use in practice. Minimizing the bias could result in a large average variance, and hence a large integrated mean square error, J. The next section is a consideration of designs which minimize J.

5.3 Integrated Mean Square Error Designs

Define the function f as

f(β₁,β₂,β₃,x) = ln π_x = β₁x + β₂x² + β₃x³ .                       (5.3.1)

Then it is clear that

f(-β₁,-β₂,-β₃,x) = -f(β₁,β₂,β₃,x) ,                                (5.3.2)

and

f(-β₁,β₂,-β₃,x) = f(β₁,β₂,β₃,-x) .                                 (5.3.3)

Intuitively, this would imply that optimal designs only need to be found for β₁ ≥ 0 and β₂ ≥ 0, since f is just transposed about the x-axis or y-axis, or both, for negative values of β₁ or β₂. For
example, by (5.3.2) the function f for (-β₁,-β₂,-β₃) is f(β₁,β₂,β₃,x) transposed about both the x-axis and y-axis. From (3.2.5), φᵢⱼ can be written

φᵢⱼ = exp( f(β₁,β₂,β₃,xᵢ) + f(β₁,β₂,β₃,xⱼ) ) / ( exp(f(β₁,β₂,β₃,xᵢ)) + exp(f(β₁,β₂,β₃,xⱼ)) )² .   (5.3.4)

The fact that only β₁ ≥ 0 and β₂ ≥ 0 need to be considered can be verified by using (5.3.4) in a manner similar to the proof of Theorem 4.1.

A Fortran program performed a grid search for the designs which minimize J, where J is given by (5.1.1). The total number of comparisons was taken to be N=100, and so V, as given by (4.3.3), was divided by 100 before substituting into (5.1.1). The grid search was performed for combinations of β₁, β₂ = 0, .5, 1 and β₃ = -1(.5)1, plus a few additional parameter combinations. The levels were fixed at x = -1, -.5, 0, .5, 1. The grid allowed for all 10 possible paired comparisons. The search ended when all 10 comparison proportions were accurate to at least ±.01.

These optimal designs are presented in Table 5.1. The value of J is also given. As was previously discussed, the designs for negative β₁ or β₂ are found as follows. First of all, the design for |β₁|, |β₂|, and |β₃| is located in the table. If β₂ and β₃ are both positive or both negative, then the design is found directly from Table 5.1. If this is not the case, then n_{xᵢ,xⱼ} becomes n_{-xᵢ,-xⱼ} for all pairs in the design. Notice that the pairs (-1,-.5), (-.5,0), and (0,.5) are never compared, and (.5,1) is only compared when β₁=β₂=β₃=1. The designs presented are not intended to be precise, but rather they give a rough
description of designs which protect against bias. For example, the designs should be symmetric when β₃=0, but Table 5.1 shows them to be close to, but not exactly, symmetric.

Because of the restriction of the five fixed levels, the minimum value of J found in the present section is not necessarily the minimum for the class of all possible designs. In fact, when β₃=0, Designs 4-6 in Section 4.5 have slightly smaller values of J than those given in Table 5.1. This can be seen in Table 5.2, which compares the values of J for the optimal designs presently being discussed with Designs 4-6. This table shows that if the cubic parameter is relatively large, then the present designs are significantly better for protection against bias. Otherwise, Designs 4-6 adequately protect against bias, and they additionally are highly efficient for the optimal properties discussed in Chapter 4.
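The symmetry argument of Section 5.3 can be verified directly. A small sketch checking (5.3.2), (5.3.3), and the invariance of (5.3.4) under negation of all three parameters, which is what justifies restricting the search to β₁ ≥ 0 and β₂ ≥ 0:

```python
import math

def f(b1, b2, b3, x):
    # (5.3.1)
    return b1*x + b2*x**2 + b3*x**3

def phi(b1, b2, b3, xi, xj):
    # (5.3.4)
    fi, fj = f(b1, b2, b3, xi), f(b1, b2, b3, xj)
    return math.exp(fi + fj) / (math.exp(fi) + math.exp(fj))**2

for b1, b2, b3, x in [(0.5, 1.0, -0.5, 0.3), (1.0, 0.5, 1.0, -0.8)]:
    # (5.3.2): negating all three parameters reflects the curve about the x-axis
    assert abs(f(-b1, -b2, -b3, x) + f(b1, b2, b3, x)) < 1e-12
    # (5.3.3): negating beta1 and beta3 reflects it about the y-axis
    assert abs(f(-b1, b2, -b3, x) - f(b1, b2, b3, -x)) < 1e-12
    # phi is unchanged when all three parameters are negated
    assert abs(phi(-b1, -b2, -b3, x, -x) - phi(b1, b2, b3, x, -x)) < 1e-12
```

The last identity holds for any pair (xᵢ, xⱼ), since multiplying numerator and denominator of (5.3.4) by exp(2(f(xᵢ)+f(xⱼ))) converts the negated-parameter expression back into the original one.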
Table 5.1. J-criterion designs
Table 5.1 (continued)
Table 5.2. Values of J

Design 4    Design 5    Design 6    "Minimum" J
Table 5.2 (continued)

Design 4    Design 5    Design 6    "Minimum" J
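The grid search of Section 5.3 can be sketched in miniature. The following is not the dissertation's Fortran program: it searches comparison proportions over just three candidate pairs, in steps of .1, for the design minimizing the integrated variance V of the fitted quadratic, with φ as in (5.3.4); the candidate pairs and parameter values are illustrative.

```python
import math
from itertools import product

PAIRS = [(-1, -0.24), (0.24, 1), (-1, 1)]   # illustrative candidate pairs

def phi(b, xi, xj):
    fi = b[0]*xi + b[1]*xi**2
    fj = b[0]*xj + b[1]*xj**2
    return math.exp(fi + fj) / (math.exp(fi) + math.exp(fj))**2

def integrated_variance(props, b, N=100, m=400):
    # V = integral over [-1,1] of (x, x^2) M^{-1} (x, x^2)', M the 2x2 information matrix
    M = [[0.0, 0.0], [0.0, 0.0]]
    for (xi, xj), u in zip(PAIRS, props):
        w = u * N * phi(b, xi, xj)
        z = (xi - xj, xi*xi - xj*xj)
        for a in range(2):
            for c in range(2):
                M[a][c] += w * z[a] * z[c]
    det = M[0][0]*M[1][1] - M[0][1]*M[1][0]
    if abs(det) < 1e-12:
        raise ZeroDivisionError("singular information matrix")
    inv = [[M[1][1]/det, -M[0][1]/det], [-M[1][0]/det, M[0][0]/det]]
    h, V = 2.0/m, 0.0
    for k in range(m):
        x = -1 + (k + 0.5)*h
        v = (x, x*x)
        V += h * sum(v[a]*inv[a][c]*v[c] for a in range(2) for c in range(2))
    return V

def grid_search(b, step=0.1):
    # exhaustive search over proportion triples on a simplex grid
    best, n = None, round(1/step)
    for i, j in product(range(n + 1), repeat=2):
        k = n - i - j
        if k < 0:
            continue
        props = (i*step, j*step, k*step)
        try:
            V = integrated_variance(props, b)
        except ZeroDivisionError:
            continue   # e.g. all mass on the rank-deficient pair (-1, 1)
        if best is None or V < best[0]:
            best = (V, props)
    return best
```

The dissertation's search additionally refined the proportions to ±.01 and handled all 10 pairs among five levels; the sketch keeps only the structure of that computation.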
CHAPTER 6
DESIGNS FOR PRELIMINARY TEST ESTIMATORS

6.1 Introduction

The motivation for this chapter is a paper by Sen (1979) on preliminary test maximum likelihood estimators (PTMLE). The procedure for the PTMLE is to collect the entire data and then make a test of hypothesis. If the null hypothesis is rejected, then the maximum likelihood estimates under an unrestricted parameter space are found. If the null hypothesis is accepted, the estimates are found under a restricted parameter space.

Presently, the unrestricted parameter space is the two-dimensional (β₁,β₂), the restricted parameter space is (β₁,0), and hence the null hypothesis is H₀: β₂=0. The objective of the present chapter is to find designs which minimize the integrated mean square error using the preliminary test estimator. This is similar to Section 4.3, except that in the present situation a linear model may be fit to the data instead of simply fitting a quadratic model regardless of the data.

Define (β̃₁,0) as the maximum likelihood estimator under the restricted parameter space, (β̂₁,β̂₂) as the estimator under the unrestricted parameter space, and (β₁*,β₂*) as the PTMLE. Sen shows that for any fixed alternative, i.e. for any β₂≠0, √N(β̂₁-β₁, β̂₂-β₂) and √N(β₁*-β₁, β₂*-β₂) have the same asymptotic multinormal distribution, and therefore the optimal designs for the PTMLE are the same as
the optimal designs found in Section 4.3. Likewise, the optimal designs for β₂=0 are identical to the ones presented in Section 3.2.

The remaining situation is when (β₁,β₂) is "close" to the restricted parameter space, i.e. when β₂ is "close" to zero. This is conceived by defining a sequence {K_N}_{N=1}^∞ of local alternatives to be

K_N: β₂ = γ/√N ,                                                   (6.1.1)

where γ is some scalar constant. Notice that as N→∞, K_N → H₀. This situation is covered in the next section.

6.2 Average-Variance Designs for Preliminary Test Maximum Likelihood Estimators

For the remainder of the present chapter, the sequence of alternatives given in (6.1.1) is considered. The asymptotic dispersion matrix for the PTMLE in general is given by (5.8) on page 1029 of Sen's paper. After working through Sen's paper to find this matrix for the present situation, the asymptotic dispersion matrix for √N(β₁*-β₁, β₂*-β₂) is

Σ = ( σ₁₁  σ₁₂ )
    ( σ₁₂  σ₂₂ ) ,                                                 (6.2.1)

whose elements are functions of 1/λ₁₁, λ₁₂/λ₁₁, γ, and the noncentral chi-square probabilities 2P(χ²_{3,Δ} ≤ χ²_{1,α}) - P(χ²_{5,Δ} ≤ χ²_{1,α}), where λ₁₁, λ₁₂, and λ₂₂ are given by (3.3.2)-(3.3.4), χ²_{r,Δ} is a noncentral chi-square random variable with noncentrality parameter Δ and r degrees of freedom, χ²_{1,α} is such that P(χ²_{1,Δ=0} > χ²_{1,α}) = α, and
Δ = γ²(λ₁₁λ₂₂ - λ₁₂²)/λ₁₁ .                                        (6.2.2)

Notice that in (6.2.1) noncentral chi-square probabilities need to be found. A number of convenient approximations exist. One such approximation is to use a central chi-square probability to approximate a noncentral chi-square probability. This is found in the Handbook of Mathematical Functions, published by the National Bureau of Standards. Letting ν denote the degrees of freedom and λ denote the noncentrality parameter, P(χ²_{ν,λ} ≤ c) can be approximated by P(χ²_{ν*} ≤ c/(1+b)), where

b = λ/(ν + λ) , and ν* = (ν + λ)/(1 + b) .

Haynam, Govindarajulu, and Leone (1962) present a computational formula for noncentral chi-square probabilities. The formula is

P(χ²_{ν,λ} > y) = Σ_{j=0}^∞ [ e^{-λ/2}(λ/2)^j / j! ] P(χ²_{ν+2j} > y) .      (6.2.4)

The value of j! becomes large very rapidly, and hence the summation in (6.2.4) also converges rapidly.

These two methods were compared to observe how close the approximation is to the actual probability, since the computing costs would be reduced if the approximation was adequate. This comparison was made
for combinations of ν = 3, 5, Δ = .05(.05)2, and α = .05(.05).30. These values were chosen because they are the type of probabilities that need to be calculated in (6.2.1). The summation in (6.2.4) was terminated when the jth term fell below a fixed small tolerance. This turned out to require between 5 and 11 terms in all cases considered. For small α, Δ, and ν=3, the approximation formula slightly overestimates the probability. However, for ν=3, α ≥ .20, and large values of Δ, the approximation underestimates the probability by as much as .01. For this reason, it was decided to use the computational formula (6.2.4) in the discussion which follows.

Once again, a Fortran program was written to conduct a grid search for optimal designs. This was done for N=100 total comparisons and all combinations of β₁, β₂ = 0(.2)1. The entire procedure was repeated for error rates α = .01, .05, .10, .15, .20. The optimal design minimizes

V = ∫_{-1}^{1} (x, x²) Σ (x, x²)' dx ,                              (6.2.5)

where Σ is given in (6.2.1). By (6.1.1), γ = √N β₂ was substituted for γ in (6.2.5). The procedure used for each value of α is identical to the one described in the last subsection of Section 4.2, except that the values of x₁, x₂, and the comparison proportions were found accurate to only ±.001.

The result of the grid search was that for β₂ ≤ .2, the level α=.01 gave the smallest value for V. This implies that it may be best to take α=0. This further implies that a linear model should be fit regardless of the data. For β₂ ≥ .4, it appeared that it is best to
use α=1, implying a quadratic model should be fit regardless of the data, and hence the optimal designs are found in Section 4.3.

This is a very peculiar result. It seems to imply that there is some value of β₂, say β₂⁰, in the interval (.2,.4) such that a linear model should be fit for any |β₂| less than β₂⁰, and a quadratic model should be fit for any |β₂| greater than β₂⁰. No definite explanation for this phenomenon is presently known.
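The two ways of computing noncentral chi-square probabilities discussed in Section 6.2 can be sketched and compared. The sketch below is not the dissertation's Fortran code: it evaluates the central chi-square survival function by direct numerical integration of the density (the dissertation would have used library routines), then implements the Haynam-Govindarajulu-Leone series (6.2.4) and the central approximation with b = λ/(ν+λ), ν* = (ν+λ)/(1+b).

```python
import math

def chi2_sf(nu, y, m=4000, upper=80.0):
    # P(X > y) for a central chi-square with nu d.f., midpoint integration
    if y >= upper:
        return 0.0
    norm = 2**(nu/2) * math.gamma(nu/2)
    h, s = (upper - y) / m, 0.0
    for k in range(m):
        x = y + (k + 0.5)*h
        s += x**(nu/2 - 1) * math.exp(-x/2) / norm * h
    return s

def ncx2_sf_series(nu, lam, y, tol=1e-8):
    # Poisson-mixture series (6.2.4)
    total, j, weight = 0.0, 0, math.exp(-lam/2)
    while True:
        total += weight * chi2_sf(nu + 2*j, y)
        j += 1
        weight *= (lam/2) / j
        if weight < tol and j > 5:
            return total

def ncx2_sf_patnaik(nu, lam, y):
    # central chi-square approximation from the Handbook of Mathematical Functions
    b = lam / (nu + lam)
    nustar = (nu + lam) / (1 + b)
    return chi2_sf(nustar, y / (1 + b))
```

At λ=0 the series collapses to its first term, and the survival probability increases with λ, both of which give quick checks; the approximation should agree with the series to roughly two decimal places, consistent with the error of about .01 reported above.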
CHAPTER 7
TWO-STAGE SAMPLING FOR MODEL FITTING

7.1 Introduction

Suppose the entire experiment is conducted in two stages. The total number of paired comparisons, N, is then split into two parts. The first sampling stage has N₁ total comparisons, and the second has N₂ total comparisons, such that N₁+N₂=N. After the first stage, the hypothesis H₀: β₂=0 is tested. If the hypothesis is rejected, then the design of the second stage would be one which is optimal for fitting a quadratic model. The quadratic model is then fit using only the data from the second stage. If the hypothesis is accepted, then the second stage would be a design optimal for fitting a linear model, again using only the second stage data for the estimation of the linear model.

The first stage design, referred to as Design 1, is {n_{-1,0} = n_{0,1} = .5N₁}. This design was shown in Section 3.3 to minimize var(β̂₂) for β₁=β₂=0. The design was also shown to be highly efficient for |β₂| ≤ 1. This is a good choice for the first stage design because it maximizes the power of the test of the null hypothesis for a given value of N₁.

The second stage design is one of two choices, depending on whether or not it is decided to fit a quadratic model. If the null hypothesis is rejected, the design for the second stage, referred to as Design 2Q, is {n_{-1,.24} = n_{-.24,1} = .4N₂, n_{-1,1} = .2N₂}. This design was shown in
Section 4.5 to be one of the overall better designs for fitting a quadratic model. Finally, if the null hypothesis is accepted, the design {n_{-1,1} = N₂} is to be run. This design was shown in Section 3.2 to be optimal for minimizing var(β̂₁), and hence optimal for fitting a linear model. This design is referred to as Design 2L.

The following section discusses optimal choices of the error rate for the test of hypothesis and the optimal values of N₁ and N₂ using a Bayesian approach.

7.2 Optimal Error Rate

Two prior bivariate distributions for (β₁,β₂) and two optimality criteria are considered using a Bayesian approach. The prior distributions are the uniform and normal distributions, both centered about the point (0,0). The uniform prior is appropriate if an experimenter could state that β₁ and β₂ are in some intervals centered about zero, but with no reason to believe that some subset of values for β₁ and β₂ is more likely than any other. On the other hand, if the person could state that β₁ and β₂ are in some intervals centered about zero, and furthermore that it is more likely that they are near zero than that they are far away from zero, then the normal prior is appropriate.

The optimality criteria considered are the minimization of the integrated mean square error and D-optimality. These two criteria have been previously discussed in Chapter 4 for the case of no bias, and are also applicable for the present situation. The difference between the situation in Chapter 4 and the situation in the present chapter is that bias can appear in the present situation. That is, there is bias present if a linear model is fit when the true model is quadratic. For this reason, the mean square error of the maximum likelihood estimators
is used in the present chapter instead of the variance-covariance matrix.

The decision rule, d, is a choice of either the linear design (l) or the quadratic design (q) for the second stage of the experiment. Notice that the decision rule is equivalent to choosing the error rate for the test of H₀: β₂=0. For the present section, the Bayes rule is found for fixed N₁ and N₂, such that N₁+N₂=100.

Define p to be the probability of rejecting the null hypothesis. The value of p depends on β₂, α, and var(β̂₂) from the first stage. In fact, it can be shown that

p = 1 - Φ(z₁) + Φ(z₀) ,                                            (7.2.1)

where

z₁ = z_{α/2} - β₂/(var(β̂₂))^½ ,                                    (7.2.2)

z₀ = -z_{α/2} - β₂/(var(β̂₂))^½ ,                                   (7.2.3)

and where Φ is the standard normal cumulative distribution function, and z_α is such that 1 - Φ(z_α) = α. Because of the symmetry of the first stage design, by (3.3.9) the standard error of β̂₂ is

(var(β̂₂))^½ = ( .5N₁φ_{-1,0} + .5N₁φ_{0,1} )^{-½} ,                (7.2.4)

where φ and ψ are given in (4.2.9).

Consider first the integrated mean square error criterion. The loss function for this criterion is

L(β,l) = ∫_{-1}^{1} MSE(ln π̃_x) dx ,                                (7.2.5)
L(β,q) = ∫_{-1}^{1} MSE(ln π̂_x) dx ,                                (7.2.6)

where β = (β₁,β₂),

ln π̃_x = β̃₁x ,                                                     (7.2.7)

ln π̂_x = β̂₁x + β̂₂x² ,                                              (7.2.8)

and where β̃₁ is the maximum likelihood estimator from Design 2L, and (β̂₁,β̂₂) is the maximum likelihood estimator from Design 2Q. The risk function is then

R(β,d) = (1 - p)L(β,l) + pL(β,q) .                                  (7.2.9)

The mean square error of (β̃₁,0) is

MSE(β̃₁,0) = ( 1/λ₁₁*    0  )
             (   0      β₂² ) ,                                     (7.2.10)

where φ is given in (4.2.9), and

λ₁₁* = N₂φ_{-1,1} .                                                 (7.2.11)

From (7.2.5) and (7.2.10), the loss when Design 2L is chosen is

L(β,l) = ∫_{-1}^{1} (x, x²) MSE(β̃₁,0) (x, x²)' dx

       = (2/3)/λ₁₁* + (2/5)β₂² .                                    (7.2.12)
The mean square error of (β̂₁,β̂₂) is

MSE(β̂₁,β̂₂) = ( 1/λ₁₁     0   )
              (   0     1/λ₂₂ ) ,                                   (7.2.13)

where, from (3.3.2)-(3.3.5), λ₁₁ and λ₂₂ are computed from the Design 2Q comparison numbers with

φ_{-1,.24} = e^{-.76β₁ + 1.0576β₂} / ( e^{-β₁+β₂} + e^{.24β₁+.0576β₂} )² ,     (7.2.16)

φ_{-.24,1} = e^{.76β₁ + 1.0576β₂} / ( e^{-.24β₁+.0576β₂} + e^{β₁+β₂} )² ,      (7.2.17)

and φ_{-1,1} is given in (4.3.9). The off-diagonal elements of (7.2.10) and (7.2.13) are zero because both second stage designs are symmetric. From (7.2.6) and (7.2.13), the loss when Design 2Q is chosen is

L(β,q) = ∫_{-1}^{1} (x, x²) MSE(β̂₁,β̂₂) (x, x²)' dx

       = (2/3)/λ₁₁ + (2/5)/λ₂₂ .                                    (7.2.18)

Then for the integrated mean square error criterion, by (7.2.9), (7.2.12), and (7.2.18), the risk function becomes
R(β,d) = (1 - p)( (2/3)/λ₁₁* + (2/5)β₂² ) + p( (2/3)/λ₁₁ + (2/5)/λ₂₂ ) .      (7.2.19)

The Bayes risk is defined to be

r(τ,d) = ∫ R(β,d) dτ(β) .                                           (7.2.20)

For the D-optimal criterion, the loss function is

L(β,l) = β₂²/λ₁₁* ,                                                 (7.2.21)

L(β,q) = 1/(λ₁₁λ₂₂) .                                               (7.2.22)

Hence the risk function in this case is

R(β,d) = (1 - p)β₂²/λ₁₁* + p/(λ₁₁λ₂₂) .                             (7.2.23)

The two prior distributions are now defined. The first prior is the bivariate uniform distribution, whose density function is

dτ(β) = 1/(4U)   if β₁ ∈ [-1,1], β₂ ∈ [-U,U] ,
      = 0        otherwise ,                                        (7.2.24)

where U is the upper limit to the prior distribution of β₂. The other prior considered is the bivariate normal distribution. Since the support of the normal distribution is infinite, the density was truncated at three standard deviations about the mean in order to perform the numerical integration. The marginal densities are then
dτ_{β₁}(β₁) = e^{-4.5β₁²}/.8334        if β₁ ∈ [-1,1] ,
            = 0                         otherwise ,

dτ_{β₂}(β₂) = e^{-β₂²/(2σ²)}/(2.5σ)    if β₂ ∈ [-U,U] ,
            = 0                         otherwise ,                 (7.2.25)

where σ = U/3. These marginals are the usual normal density divided by .9974 so that the density integrates to 1. The joint prior distribution of (β₁,β₂) is the product of the marginals.

The integral in (7.2.20) cannot be found analytically for any criterion and prior, since p is a function of the standard normal cumulative distribution, which in turn is a function of β₂. So a SAS computer program was written to numerically integrate (7.2.20) for the two prior distributions described in the two preceding paragraphs and the two criteria discussed in the present section.

The numerical integration is performed as follows. The rectangle formed by the area of positive density is divided into a 10 × 10 grid. This results in 100 rectangles of equal size. The expression R(β,d)dτ(β) is then evaluated at the center of each of the 100 rectangles. This gives the height of the function R(β,d)dτ(β) at the center of each rectangle. The lengths of the sides of the rectangles are .2 and .2U, and so the approximate volume of each region formed by the function and the sides of each rectangle is found by multiplying the height of the function at the center of the rectangle by .04U. The approximation of the integral in (7.2.20) is then found by summing these 100 volumes.
For each of the four combinations of the two prior distributions and two criteria, the Bayes risk in (7.2.20) was found for N₁ = 10(10)90, α = 0(.05)1, and various values of U. The value of N₂ is found by N₂ = 100-N₁. For each value of N₁ and U, the best α level was found from the choices α = 0(.05)1. These optimal α levels are graphed as a function of U, the upper limit to the prior distribution of β₂, for each of N₁ = 10(10)90. Remember that these graphs are not precise, since the optimal α level is only found to the nearest multiple of .05, but they do give a basic understanding of the behavior of the α level as a function of the prior distribution of β₂. The graphs for the integrated mean square error criterion with a uniform prior are found in Figure 7.1, and with a normal prior in Figure 7.2. The graphs for the D-optimality criterion with a uniform prior are found in Figure 7.3, and with a normal prior in Figure 7.4. The next section contains a discussion of these four figures.

7.3 Concluding Remarks

In Figures 7.1-7.4, the axis labelled "U" corresponds to the upper limit of the prior distribution of β₂. The U-axis in the graphs for the normal prior, Figures 7.2 and 7.4, ranges up to 2.08. The normal prior with U=2 corresponds to β₂ having a normal distribution with mean 0 and standard deviation 2/3. The U-axis of the graphs for the uniform prior, Figures 7.1 and 7.3, ranges up to 1.2. It is likewise true that for the uniform distribution, U=1.2 implies that the standard error of β₂ is also approximately 2/3. So for a given amount of variability, the upper limit for the normal prior is approximately 1.73 times the upper limit for the uniform prior.
For example, if an experimenter believes β₂ is somewhere in the interval (-.5,.5), but has no inclination as to where in that interval β₂ is, then a uniform prior would be appropriate. Figure 7.1 indicates that for U=.5 and N₁=10, the test should be run at an error rate of approximately α=.20; for N₁=20, α=.08. A normal prior with the same amount of variability or uncertainty would be found by multiplying U=.5 by 1.73. This results in U=.865. Figure 7.2 indicates that for U=.865 and N₁=10, the test should be run at an error rate of approximately α=.35; for N₁=20, α=.20.

So the error rates for the particular value of the standard error of β₂ discussed in the preceding paragraph appear to be slightly larger for the normal prior than for the uniform prior. However, this is not apparent for the entire range of values for U and N₁. In fact, by looking at the maximum value of U in Figures 7.1 and 7.2, α=1 in both cases for N₁ = 10, 20, 30, but α is larger for the uniform prior than for the normal prior when N₁ = 40(10)80. For most values of the standard error of β₂ and N₁, in particular when the standard error is relatively large, α is larger for the uniform prior than for the normal prior.

Notice from all four figures that for a given value of U and N₁, the α level for the normal prior is smaller than for the uniform prior. This is partially explained by the previous paragraph. The normal prior with the same value of U as a uniform prior will have a smaller standard error, and most of the density is near β₂=0. Consequently, there is a higher probability that β₂ is near zero, and it is therefore reasonable that the hypothesis H₀: β₂=0 should be tested at a smaller α level.
Comparing the two criteria for each of the two priors indicates that for given values of U and N₁, the α level is smaller for the D-optimal criterion than it is for the integrated mean square error criterion.

It turns out that in all four situations, for a given value of U, the Bayes risk is smaller for N₁=10 than it is for any of N₁ = 20(10)90. This indicates that if an experimenter is interested in running a two-stage experiment, a relatively small number of comparisons should be used in the first stage, saving the majority of the available comparisons for the second stage. This may be explained by the fact that in the present discussion, the first stage data are not used in the estimation phase. Only the second stage data are used for the final estimation. So the result found at the beginning of the present paragraph implies that it is better to use only a small portion of the available comparisons for the sole purpose of testing the null hypothesis.

Further research remains to be conducted on combining the first and second stage data in the estimation phase. This would involve deriving var(ln π̂_x), where ln π̂_x is expressed as

ln π̂_x = I(β̃₁x) + (1 - I)(β̂₁x + β̂₂x²) ,

and where I is the decision rule for the first stage, i.e.

I = 1 if it is decided to fit a linear model,
  = 0 if it is decided to fit a quadratic model.

This is not a trivial problem because the random variables I, β̃₁, and
(β̂₁,β̂₂) are not independent. Intuitively, if this could be resolved, it appears reasonable that some optimal ratio N₁/N₂ and α level could be found.
Figure 7.1. Optimal α level as a function of U: integrated mean square error criterion, uniform prior.
Figure 7.2. Optimal α level as a function of U: integrated mean square error criterion, normal prior.
Figure 7.3. Optimal α level as a function of U: D-optimality criterion, uniform prior.
APPENDIX A
COMPUTER PROGRAM THAT FINDS MAXIMUM LIKELIHOOD ESTIMATES OF π₁,...,π_t,θ

This appendix contains a computer program written in the language APL. There are two matrices which need to be entered onto the computer. First of all, the matrix referred to as "N" is a t×t matrix, with (i,j) element equal to the number of times treatment Tᵢ is preferred over treatment Tⱼ. The other matrix is also t×t, and is referred to as "TIE". The lower triangular part and the main diagonal of "TIE" are all zeroes. The (i,j) element, where i < j,
freedom. If there are no ties, only (1) and (5) above are printed, because in this case the calculation of the element of the inverse of the variance-covariance matrix corresponding to θ results in division by zero.

An iterative procedure is used in the estimation phase of the computer program. The initial estimates of θ,π₁,...,π_t are the ones presented by Dykstra (1956). The method used to find the estimates and asymptotic variance-covariance matrix was presented by Rao and Kupper (1967). In their presentation, it was assumed that each pair was compared an equal number of times. Therefore, as pointed out above, the variance-covariance matrices given by the program may not be applicable if the pairs are not compared an equal number of times.

Let θ^(k),π₁^(k),...,π_t^(k) be the estimates after the kth iteration. The iterative procedure is terminated when

|θ^(k-1) - θ^(k)| / θ^(k-1) + Σ_{i=1}^{t} |πᵢ^(k) - πᵢ^(k-1)| / πᵢ^(k-1) < 10⁻⁴ .

If so desired, this can be changed simply by altering line [13].

Example. Suppose that

N = 40 20
    40 14

TIE = 3 5
        1

Then an execution of the program yields the following:
Initial estimates        Final estimates (19 iterations)

1.0789 ...
[APL program listing]
[APL program listing, continued]
APPENDIX B
COMPUTER PROGRAMS THAT FIND MAXIMUM LIKELIHOOD ESTIMATES OF β₁,...,β_s

This appendix contains two computer programs that find iterative estimates of β₁,...,β_s. The first one, found on pages 174 and 175, is written in the language APL. The second one, found on pages 176 and 177, is written in Fortran. The APL version is discussed first. The input is the following.

(1) DATA = A t×t matrix whose (i,j) element is the number of times Tᵢ is preferred over Tⱼ.

(2) MODEL = A vector of integers defining the model which is to be fit. This is now explained in detail. Suppose there are v independent variables in the experiment. Then the first v elements of MODEL give the order of each variable (e.g. 3 means to include xᵢ, xᵢ², and xᵢ³). The remaining elements are considered in pairs, and correspond to two-way interaction terms which are to be included in the model (e.g. (1,3) occurring after the first v elements means that the term x₁x₃ should be included). Notice that three-way or higher order interaction terms cannot be included without a modification of the program.

(3) X = A t×v matrix of levels in the experimental design.
The output is the following.

(1) A listing of the vector MODEL.

(2) The t×s matrix X̄ formed by concatenating onto X the columns corresponding to the higher order and interaction effects.

(3) The s×t×t matrix whose (i,j,k) element is (X̄)ⱼᵢ - (X̄)ₖᵢ.

(4) The parameter estimates at each iterative step until the convergence criterion is met. The initial estimates are β₁ = ... = β_s = 0. These estimates at each iteration are printed in the same order as the columns of X̄.

The Newton-Raphson method with a slight modification is used to perform the iteration phase. See Scarborough (1962) for a thorough discussion of the Newton-Raphson method. Springall stated that the iterative estimates of θ,β₁,...,β_s converged very slowly. This is also true when only estimating β₁,...,β_s using the Newton-Raphson method without any modifications. However, it is the author's experience that when the convergence is slow, it is because the estimates are "jumping" from one side of what will be the final estimates to the other. That is, if β₁,...,β_s represent the final estimates, then the estimates after the kth iteration may be β₁+ε₁,...,β_s+ε_s, and the estimates after the (k+1)st iteration may be approximately β₁-ε₁,...,β_s-ε_s, for some values of εᵢ, i=1,...,s. Because of this phenomenon, the modification to the Newton-Raphson method is to take the average of the kth and (k+1)st iterative estimates before continuing with the next iteration whenever the direction of convergence changes. For example, if the successive estimates of β₁ are decreasing, and the (k+1)st estimate of β₁ is larger than the kth, then the average of the kth and
(k+1)st estimates replaces the (k+1)st estimate before proceeding to the (k+2)nd iteration. This modification substantially decreased the number of iterations until convergence.

The iteration phase is terminated when the maximum difference between the estimates and the preceding estimates is less than a fixed small tolerance. If it is desired, this can easily be changed by altering line [55].

As pointed out in Section 2.2, the procedure does not always converge. However, the only time this occurred was when it was attempted to fit a full model, i.e. when s=t-1. When s=t-1, the program in Appendix A can be used to find estimates of ln πₖ, and then the t equations,

ln πₖ = Σ_{i=1}^{t-1} βᵢxₖᵢ ,    k = 1,...,t,

can be solved for β₁,...,β_{t-1}.

Example. Suppose that

DATA = (  0 10  6 )
       ( 15  0 11 )
       ( 19 14  0 )

MODEL = [2] .

Then an execution of the program yields the following:

MODEL = 2

X̄ = ...

Matrix of differences: ...
173 Iteration 6,,S ,... 1 .468182 .145455 2 .485958 .319043 3 .469350 .144161 AVG .477654 .087441 11 .478947 .084719 As can be seen in the iterative phase in the example, an average is taken between the 3 and 4 iteration. This occurred because 3, decreased and 6increased from the 1 to the 2 iteration, while the reverse occurred for the 2 to the 3 iteration. The Fortran program as it currently stands can only be used for fitting a quadratic model. It can, however, be modified to be as general as the APL program. The matrices DATA and X need to be read in. This is done in line numbers 38. These statements in addition to the data itself at the end of the program will need to be changed according to the problem at hand. A peculiar occurrence was made evident by running the Fortran program for the same example as presented above. It turned out that the Fortran program converged to the same final estimates in only 4 iterations instead of 11 iterations. The procedure in each program is the same, but no explanation for the difference in the speed of convergence for the two programs is presently known.
[Pages 174-175: listing of the APL function ESTIM, which performs the iterative computation of the maximum likelihood estimates; the scanned listing is not legible in this transcription.]
      DIMENSION DATA(5,5),XDIFF(2,5,5),XDIFFM(2,5,5),P(5),PRATIO(5,5),
     1X(5,2),BETA(2),D(2,2),MAT(100,3)
      READ(5,100) ((DATA(I,J),J=1,3),I=1,3)
  100 FORMAT(12F2.0)
      WRITE(6,101) ((DATA(I,J),J=1,3),I=1,3)
  101 FORMAT(20X,'DATA'///3(3F6.0//))
      READ(5,102) (X(I,1),I=1,3)
  102 FORMAT(3F2.0)
      DO 1 I=1,3
    1 X(I,2)=X(I,1)**2
      DO 2 I=1,2
      DO 2 J=1,3
      DO 2 K=1,3
    2 XDIFF(I,J,K)=X(J,I)-X(K,I)
      WRITE(6,103) ((X(I,J),J=1,2),I=1,3)
  103 FORMAT('1',20X,'X'///3(2F6.1//))
      WRITE(6,104) (((XDIFF(I,J,K),K=1,3),J=1,3),I=1,2)
  104 FORMAT('1',20X,'XDIFF'///3(3F6.0//)///3(3F6.0//))
      BETA(1)=0.0
      BETA(2)=0.0
      CALL ESTIM(DATA,3,100,ITER,X,XDIFF,BETA,D,DET,MAT,M)
      WRITE(6,105) ITER,BETA(1),BETA(2)
  105 FORMAT('1',10X,'NUMBER OF ITERATIONS'//15X,I6//20X,'BETA'///
     12F15.6)
      WRITE(6,106) DET,D(1,1),D(1,2),D(2,1),D(2,2),
     1((MAT(I,J),J=1,3),I=1,M)
  106 FORMAT('1',10X,F15.6//4F20.6////100(/3F20.6))
      STOP
      END
      SUBROUTINE ESTIM(DATA,IIT,NITER,ITER,X,XDIFF,BETA,D,DET,MAT,M)
      DIMENSION DATA(5,5),XDIFF(2,5,5),XDIFFM(2,5,5),P(5),D(2,2),
     1PRATIO(5,5),X(5,2),BETA(2),BOLD(2),DOLD(2),DIFF(2),MAT(100,3)
      DO 1 I=1,2
      DO 1 J=1,IIT
      DO 1 K=1,IIT
    1 XDIFFM(I,J,K)=DATA(J,K)*XDIFF(I,J,K)
      M=0
      ITER=0
      IFLAG=0
   10 DO 2 I=1,IIT
    2 P(I)=EXP(BETA(1)*X(I,1)+BETA(2)*X(I,2))
      M=M+1
      MAT(M,1)=FLOAT(ITER)
      MAT(M,2)=BETA(1)
      MAT(M,3)=BETA(2)
      FUNC1=0.0
      FUNC2=0.0
      DO 3 I=1,IIT
      DO 3 J=1,IIT
      PRATIO(I,J)=P(J)/(P(I)+P(J))
      FUNC1=FUNC1+XDIFFM(1,I,J)*PRATIO(I,J)
    3 FUNC2=FUNC2+XDIFFM(2,I,J)*PRATIO(I,J)
      DO 44 IR=1,2
      DO 44 L=1,2
      DD=0.0
      DO 4 I=1,IIT
      DO 4 J=1,IIT
      IF(I.EQ.J) GO TO 4
      DD=DD+XDIFFM(IR,I,J)*(X(J,L)*PRATIO(I,J)-
     1(X(I,L)*PRATIO(J,I)+X(J,L)*PRATIO(I,J)**2))
    4 CONTINUE
   44 D(IR,L)=DD
      DET=D(1,1)*D(2,2)-D(1,2)*D(2,1)
      ITER=ITER+1
      BOLD(1)=BETA(1)
      BOLD(2)=BETA(2)
      BETA(1)=BETA(1)+(FUNC2*D(1,2)-FUNC1*D(2,2))/DET
      BETA(2)=BETA(2)+(FUNC1*D(2,1)-FUNC2*D(1,1))/DET
      DO 5 I=1,2
      DIF=ABS(BETA(I)-BOLD(I))
      IF(DIF.GT..000001) GO TO 11
    5 CONTINUE
      RETURN
   11 IF(ITER.GE.NITER) RETURN
      IF(IFLAG.EQ.2) GO TO 14
      IF(IFLAG.EQ.1) GO TO 13
      IFLAG=1
      GO TO 10
   13 DOLD(1)=BETA(1)-BOLD(1)
      DOLD(2)=BETA(2)-BOLD(2)
      IFLAG=2
      GO TO 10
   14 DIFF(1)=BETA(1)-BOLD(1)
      DIFF(2)=BETA(2)-BOLD(2)
      IF(SIGN(DIFF(1),DOLD(1)).NE.DIFF(1)) GO TO 15
      IF(SIGN(DIFF(2),DOLD(2)).EQ.DIFF(2)) GO TO 16
   15 BETA(1)=(BETA(1)+BOLD(1))/2.0
C     THE SCANNED LISTING ENDS MID-STATEMENT AT THE LINE ABOVE; THE
C     REMAINING LINES ARE A RECONSTRUCTION OF THE AVERAGING STEP
C     DESCRIBED IN THE TEXT.
      BETA(2)=(BETA(2)+BOLD(2))/2.0
   16 DOLD(1)=DIFF(1)
      DOLD(2)=DIFF(2)
      GO TO 10
      END
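The scoring iteration carried out by ESTIM, accumulating the score vector (FUNC1, FUNC2) and the 2x2 matrix D from the win counts and then updating beta by an explicit 2x2 solve, can be restated compactly. The sketch below is a Python paraphrase under hypothetical data, not a line-for-line translation of the Fortran; in particular it omits the oscillation-averaging refinement:

```python
from math import exp

def fit_quadratic_bt(wins, x, max_iter=50, tol=1e-10):
    """Newton-Raphson ML fit of ln pi_i = b1*x_i + b2*x_i**2 for the
    Bradley-Terry model; wins[i][j] = times T_i was preferred over T_j."""
    t = len(x)
    X = [(xi, xi * xi) for xi in x]
    b = [0.0, 0.0]
    for _ in range(max_iter):
        pi = [exp(b[0] * r[0] + b[1] * r[1]) for r in X]
        g = [0.0, 0.0]                       # score vector
        A = [[0.0, 0.0], [0.0, 0.0]]         # information matrix
        for i in range(t):
            for j in range(t):
                if i == j:
                    continue
                n = wins[i][j]
                p = pi[j] / (pi[i] + pi[j])  # 1 - P(T_i preferred over T_j)
                q = 1.0 - p
                for k in range(2):
                    dk = X[i][k] - X[j][k]   # covariate difference
                    g[k] += n * dk * p
                    for l in range(2):
                        A[k][l] += n * dk * (X[i][l] - X[j][l]) * p * q
        det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
        d1 = (g[0] * A[1][1] - g[1] * A[0][1]) / det   # Cramer's rule
        d2 = (g[1] * A[0][0] - g[0] * A[1][0]) / det
        b[0] += d1
        b[1] += d2
        if abs(d1) < tol and abs(d2) < tol:
            break
    return b

def pref_prob(b, xi, xj):
    """Fitted P(treatment at dose xi is preferred over dose xj)."""
    fi = exp(b[0] * xi + b[1] * xi * xi)
    fj = exp(b[0] * xj + b[1] * xj * xj)
    return fi / (fi + fj)

# Hypothetical data: 10 comparisons per pair among doses 1, 2, 3.
wins = [[0, 7, 9],
        [3, 0, 8],
        [1, 2, 0]]
beta = fit_quadratic_bt(wins, [1.0, 2.0, 3.0])
```

Since each comparison is a Bernoulli trial whose logit is linear in the covariate differences, this is ordinary logistic regression on differences, and the log-likelihood is concave, so the Newton iteration converges quickly for non-separable data such as these. The fitted preference probabilities track the observed proportions, e.g. dose 1 is preferred over dose 2 with fitted probability near 0.7.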
BIBLIOGRAPHY

Abelson, R. M. and Bradley, R. A. (1954). A 2x2 factorial with paired comparisons. Biometrics 10, 487-502.

Beaver, R. J. (1977). Weighted least squares response surface fitting in factorial paired comparisons. Commun. Statist. A6, 1275-87.

Box, G. E. P. and Lucas, H. L. (1959). Design of experiments in non-linear situations. Biometrika 46, 77-90.

Bradley, R. A. (1953). Some statistical methods in taste testing and quality evaluation. Biometrics 9, 22-38.

Bradley, R. A. (1955). Rank analysis of incomplete block designs. III. Some large-sample results on estimation and power for a method of paired comparisons. Biometrika 42, 450-70.

Bradley, R. A. (1976). Science, statistics, and paired comparisons. Biometrics 32, 213-32.

Bradley, R. A. and Terry, M. E. (1952). The rank analysis of incomplete block designs. I. The method of paired comparisons. Biometrika 39, 324-45.

David, H. A. (1963). The Method of Paired Comparisons. Griffin, London.

Davidson, R. R. (1970). On extending the Bradley-Terry model to accommodate ties in paired comparison experiments. J. Amer. Statist. Assoc. 65, 317-28.

Davidson, R. R. and Bradley, R. A. (1970). Multivariate paired comparisons: Some large sample results on estimation and tests of equality of preference. In Nonparametric Techniques in Statistical Inference (M. L. Puri, ed.). Cambridge Univ. Press, 111-25.

Dykstra, O. (1956). A note on the rank analysis of incomplete block designs: applications beyond the scope of existing tables. Biometrics 12, 301-6.

Dykstra, O. (1960). Rank analysis of incomplete block designs: a method of paired comparisons employing unequal repetitions on pairs. Biometrics 16, 176-88.
El-Helbawy, A. T. and Bradley, R. A. (1978). Treatment contrasts in paired comparisons: Large-sample results, applications, and some optimal designs. J. Amer. Statist. Assoc. 73, 831-9.

Ford, L. R., Jr. (1957). Solution of a ranking problem from binary comparisons. Amer. Math. Monthly 64(8), 28-33.

Graybill, F. A. (1969). Introduction to Matrices with Applications in Statistics. Wadsworth, Belmont, Calif.

Grizzle, J. E., Starmer, C. F. and Koch, G. G. (1969). Analysis of categorical data by linear models. Biometrics 25, 489-504.

Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables (1964). U.S. Department of Commerce, National Bureau of Standards.

Haynam, G. E., Govindarajulu, Z. and Leone, F. C. (1962). Tables of the Cumulative Non-Central Chi-Square Distribution. Case Institute of Technology Statistical Laboratory, Report 104, Cleveland, Ohio.

Kaplan, W. (1952). Advanced Calculus. Addison-Wesley, Reading, Mass.

Larmond, E., Petrasovits, A. and Hill, P. (1968). Applications of multiple paired comparisons in studying the effect of aging and finish on beef tenderness. Canad. J. Anim. Sci. 49, 51-8.

Littell, R. C. and Boyett, J. M. (1977). Designs for RxC factorial paired comparison experiments. Biometrika 64, 73-7.

Quenouille, M. H. and John, J. A. (1971). Paired comparison designs for 2^n factorials. Appl. Statist. 20, 16-24.

Rao, P. V. and Kupper, L. L. (1967). Ties in paired-comparison experiments: A generalization of the Bradley-Terry model. J. Amer. Statist. Assoc. 62, 194-204; Corrigenda 63, 1550.

Scarborough, J. B. (1962). Numerical Mathematical Analysis. The Johns Hopkins Press, Baltimore, Maryland.

Scheffe, H. (1952). An analysis of variance for paired comparisons. J. Amer. Statist. Assoc. 47, 381-400.

Sen, P. K. (1979). Asymptotic properties of maximum likelihood estimators based on conditional specification. Ann. Statist. 7, 1019-33.

Springall, A. (1973). Response surface fitting using a generalization of the Bradley-Terry paired comparison model. Appl. Statist. 22, 59-68.
Thompson, W. A. and Singh, J. (1967). The use of limit theorems in paired comparison model building. Psychometrika 32, 255-64.

Thurstone, L. L. (1927). Psychophysical analysis. Amer. J. Psychol. 38, 368-89.

Wilks, S. S. (1962). Mathematical Statistics. John Wiley & Sons, New York.
BIOGRAPHICAL SKETCH

Walter William Offen was born in Harmony, Minnesota, on November 30, 1953, the only American-born child of German immigrants. Six months later his family moved to the Chicago area, where he resided through high school. He attended Northwestern University and received the Bachelor of Arts degree in June, 1975, with a major in mathematics and a minor in economics. He received the degree of Master of Statistics from the University of Florida in June, 1977. Since that time he has been pursuing the degree of Doctor of Philosophy. During his time in graduate school at the University of Florida, he worked for the Department of Statistics as an assistant in their Institute of Food and Agricultural Sciences consulting unit. He will begin working for Eli Lilly and Company in September, 1980, as a senior statistician.

Walter Offen is married to the former Susan Elaine Hauser. They will reside in Noblesville, Indiana.
I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Ramon C. Littell, Chairman
Associate Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Pejaver V. Rao
Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Alan G. Agresti
Associate Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Richard F. Matthews
Professor of Food Science and Human Nutrition
This dissertation was submitted to the Graduate Faculty of the Department of Statistics in the College of Liberal Arts and Sciences and to the Graduate Council, and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

August 1980

Dean, Graduate School

