DESIGN OF PAIRED COMPARISON EXPERIMENTS
WITH QUANTITATIVE INDEPENDENT VARIABLES
BY
WALTER WILLIAM OFFEN
A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF
THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1980
Digitized by the Internet Archive
in 2009 witiff~hiing from
University of Florida, George A. Smathers Libraries
http://www.archive.org/details/designofpairedco00offe
ACKNOWLEDGMENTS
I would like to thank Drs. Ramon C. Littell and William Mendenhall
for encouraging me to work past the Master of Statistics degree to pur
sue a Ph.D. I sincerely appreciate their concern and beneficial advice.
A special thanks goes to Dr. Ramon C. Littell, first of all for
the attention and help he gave to me in my early years at the University
of Florida while working for him at the Institute of Food and Agricul
tural Sciences. I especially thank him for the immense amount of time
and effort he contributed to this project. His patience and sincere
interest will always be remembered.
Thanks go to my family for their understanding and support during
the many years of my education. I am especially grateful to my wife,
Susie, for her love and caring, which makes it all worthwhile.
Finally, I would like to thank my typists, Susan and Walter Offen,
who both did an outstanding job of typing this manuscript. Also,
thanks go to Drs. John Cornell, Ramon Littell, Frank Martin, and
Kenneth Portier for allowing my typists to use their office type
writers.
1ii
TABLE OF CONTENTS
Page
ACKNOWLEDGMENTS . . . . . .. . . . . . iii
LIST OF TABLES . . . . . . . . . . . . vi
LIST OF FIGURES . . . . . . . . ... . . . .vii
ABSTRACT . . . . . . . . ... . . . . . .viii
CHAPTER
1 DESIGN OF PAIRED COMPARISON EXPERIMENTS: A LITERATURE
REVIEW . . . . . . . . . ..... 1
1.1 Introduction .... . . . . . . . 1
1.2 BradleyTerry Model . . . . . . . 3
1.3 Estimation of the BradleyTerry Parameters and
Corresponding Asymptotic Distribution . . 6
1.4 Designs of Factorial Paired Comparison
Experiments . . . . . . . . . 11
1.5 Response Surface Designs . . . . ... .22
2 ASYMPTOTIC THEORY OF THE MAXIMUM LIKELIHOOD ESTIMATORS 26
2.1 Introduction . . . . . . . ... 26
2.2 Estimation .. . . . . . . . .27
2.3 Asymptotic Distribution . . . . . . 29
3 MINIMIZATION OF THE VARIANCE OF A SINGLE PARAMETER
ESTIMATE ........ . . . . . . .... 34
3.1 Introduction . . . . . . . . . 34
3.2 Linear Model .... . . . . . .34
3.3 Quadratic Model ... . . . ... 38
Minimization of var (82) . . . . ... 38
Equivalence of ElHelbawy and Bradley's
method and minimizing the variance . . 50
3.4 Cubic Model .. . . . . . . . 58
4 DESIGNS FOR FITTING A QUADRATIC MODEL . . . ... .65
4.1 Introduction .... . ..... ... .... 65
Page
. 68
4.2 Doptimal Designs . . . . . .
Condition 1 .....
Condition 2 . . . . .
Condition 3 .....
4.3 Averagevariance Designs ....
Condition 1 .....
Condition 2 . . . . . .
Condition 3 ......
4.4 Minimax Designs ......
Condition 1 ......
Condition 2 . . . . .
Condition 3 . . . .
4.5 Conclusion and Design Recommendations . .. .128
5 DESIGNS THAT PROTECT AGAINST BIAS . . . . ... 135
Introduction . . . . . . .
All Bias Designs . . . . . .
Integrated Mean Square Error Designs .
. . 135
. . . 136
. . . 138
6 DESIGNS FOR PRELIMINARY TEST ESTIMATORS . . . .. .145
6.1 Introduction . . . .. . . . . .145
6.2 Averagevariance Designs for Preliminary Test
Maximum Likelihood Estimators . . . ... .146
7 TWOSTAGE SAMPLING FOR MODEL FITTING . . . . .. .150
Introduction . . . . . . .
Optimal Error Rate . . . . . .
Concluding Remarks . . . . . .
. .. 150
. 151
. .. 157
APPENDIX
A COMPUTER PROGRAM THAT FINDS MAXIMUM LIKELIHOOD ESTIMATES
OF 7i ...." t'r . . . . . . . . .. .. 165
B COMPUTER PROGRAMS THAT FIND MAXIMUM LIKELIHOOD ESTIMATES
OF 8 ,..., s . . . . . . . . . . 170
BIBLIOGRAPHY . . . . . . . . . . . . . 178
BIOGRAPHICAL SKETCH . . . . . . . . .. ... .181
LIST OF TABLES
TABLE Page
1.1 Efficiencies of balanced 2 and 2 factorial paired
comparison designs . . . . . . . . ... 18
3.1 Designs which minimize var(B ) . . . . . . 48
3.2 Designs which minimize var( 3) . . . . . . 64
4.1 Doptimal designs with restriction x=l,0,1 ...... 90
4.2 Doptimal (local) designs with 2=0 . . . . . 94
4.3 Doptimal designs .. . . . . . . . . .95
4.4 Averagevariance designs with restriction x=l,0,l .. 105
4.5 Averagevariance (local) designs with B2=0 ...... 109
4.6 Averagevariance designs . . . . . . . .. 110
4.7 Minimax designs with restriction x=l,0,1 . . ... 119
4.8 Minimax (local) designs with 2=0 . . . . ... 123
4.9 Minimax designs ... . . . . . . ... .124
4.10 Efficiency of designs . . .... ... ...... 133
4.11 Bias of designs .... . . . . . . . 134
5.1 Jcriterion designs . . . .. . .. . . 141
5.2 Values of J . . . . . . . . . . . .143
LIST OF FIGURES
FIGURE
1.1 Depiction of Designs I and II . . . . .
3.1 Designs which minimize var(2) . . . .
4.1 Procedure summary . . . . . . .
4.2 Depiction of possible designs for Condition 1 .
4.3 Locally optimal designs when 2=0 . . . .
7.1 Minimization of the average mean square error 
prior . . . . . . . . . .
7.2 Minimization of the average mean square error 
prior . . . . . . . . . . .
7.3 Doptimality Uniform prior . . . .
7.4 Doptimality Normal prior . . ......
Page
. . 12
. . . 49
. . 69
. . . 75
129
Uniform
. . 161
Normal
. . . 162
F
Abstract of Dissertation Presented to the Graduate Council
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
DESIGN OF PAIRED COMPARISON EXPERIMENTS
WITH QUANTITATIVE INDEPENDENT VARIABLES
By
Walter William Offen
August 1980
Chairman: Ramon C. Littell
Major Department: Statistics
An experiment in which the treatments are compared and ranked
pairwise is called a paired comparison experiment. The BradleyTerry
model for preference probabilities with ties not allowed is used in
this dissertation. This model defines treatment parameters, ...
t such that the probability that treatment T. is preferred over
1
treatment T. is equal to i. (T7.+r.) When the treatments are levels
of continuous, quantitative variables, the logarithm of the Bradley
Terry parameters can be modeled as a regressiontype model,
In7T = kxki. There have been a large number of papers on the topic
i k kxi'
of paired comparisons, but only a few have considered experimental
design. Past papers on design are reviewed in Chapter 1. In 1973
Springall presented the asymptotic distribution of the maximum likeli
hood estimators of S1 ... 3 where s is the number of regression
type parameters in the model. Chapter 2 of the present dissertation
contains a derivation of this asymptotic distribution. The remainder
viii
of the dissertation is a consideration of designs for situations where
the treatments are levels of a single quantitative variable.
The asymptotic variancecovariance matrix is a function of the
true parameter values. Hence, the optimal designs also depend on the
parameters. In previous papers on design, the BradleyTerry parameters
were taken to be equal when evaluating the variances and covariances.
The present dissertation considers the design problem for various
parameter values.
Notice that the general design problem is complex. It involves
deciding how many levels and which particular levels should be chosen.
Furthermore, it must be determined how often to compare each pair of
treatments. In some cases this complexity was resolved. In other
cases additional restrictions had to be imposed before optimal designs
could be found.
Chapter 3 considers designs which minimize the variance of a
single parameter estimate, S.. The linear, quadratic, and cubic models
1
are considered. The optimal design in the linear case turns out to be
comparisons of only the smallest level in the experimental region with
the largest, for most values of the linear parameter. The designs for
the quadratic and cubic cases depend more heavily on the parameter
values than for the linear case.
Chapter 4 presents optimal designs for fitting a quadratic model.
The optimality criteria considered are Doptimality, the minimization
of the average variance of Inr., and the minimization of the maximum
value of InT.. These designs also depend heavily on the parameters,
1
although some overall design recommendations are given in the last
section.
The remaining chapters contain a brief discussion of related top
ics. Chapter 5 is a consideration of designs which protect against the
bias present when the true model is cubic. Chapter 6 is a discussion
of designs for preliminary test estimators. Chapter 7 is a considera
tion of twostage sampling. The first stage is a design which results
with a small variance of the quadratic parameter estimate. A test of
hypothesis then determines whether the second stage design should be
one which is optimal for fitting a linear or a quadratic model. A
discussion of the best error rate to use for the test of hypothesis is
included.
The appendices contain computer programs that find maximum likeli
hood estimates of the BradleyTerry parameters and 1, .... ,s. These
programs are written in the languages APL and Fortran.
CHAPTER 1
DESIGN OF PAIRED COMPARISON EXPERIMENTS:
A LITERATURE REVIEW
1.1 Introduction
An experiment in which the treatments are compared and ranked
pairwise is called a paired comparison experiment. Such an experiment
requires a number of subjects which are usually individuals, but could
conceivably be a machine, for instance. Each subject is required to
compare two samples and decide which of the two is preferable for the
attribute under study. For example, each subject could be asked to
state which of two samples tastes better. Paired comparison experi
ments are often conducted in areas such as food testing where it may be
difficult for an individual to quantify his like or dislike of a sample,
but he may be able to readily decide which of the two samples is pre
ferred.
Suppose that an experimenter wants to determine which of four
brands of coffee most consumers prefer. There are a number of experi
mental procedures available. One possibility is that each individual
be required to taste and rank all four brands from least favorable to
most favorable. However, sensory fatigue may be a problem because
after tasting the second or third sample, it may be difficult to per
ceive any difference between the subsequent sample and the first sample
tasted.
Another possible procedure is to have each individual assign to
each brand a score on some numerical scale. Sensory fatigue may again
be a problem with this experimental procedure since all four brands are
tasted by each individual. Additionally, there may be a degree of ar
bitrariness to choosing a numerical score for each treatment.
A third possibility is to conduct a paired comparison experiment.
The problems mentioned in the two preceding paragraphs are not present
for paired comparison experiments. That is to say, paired comparison
experiments have a minimal amount of individual guesswork involved.
As pointed out in the first paragraph, the complete experiment involves
a number of individuals, each one choosing out of two samples the one
which is preferable. For the particular example with four brands of
coffee, 24 subjects might be utilized, thereby resulting in 4 repli
cates of each of the 6 distinct pairs of treatments.
The treatments can have a factorial arrangement. However, facto
rial paired comparison experiments have often been analyzed ignoring
the factorial arrangement of the treatments. For example, Larmond,
Petrasovits, and Hill (1968) conducted a 2x3 factorial paired compari
son experiment, but they did not test for interaction or main effects
as normally is done with factorial experiments. Although Abelson and
Bradley (1954) presented a theoretical development, only relatively
recently have tractable methods been available for testing interaction
and main effects in factorial paired comparison experiments.
In the case of quantitative treatments, an analogue of multiple
linear regression analysis is appropriate. For example, instead of
four distinctly different brands of coffee, there may only be one brand
of coffee with four different levels of an additive. The design of
paired comparison experiments for fitting response surfaces with one
independent variable is discussed in Chapters 37.
1.2 BradleyTerry Model
Denote the t treatments as T ,T ... ,Tt. Let ni.. 0 be the
number of times T. and T. are compared, i
1 3
T. T. for "T. is selected over T.", where selection is based on the
1 ] 1 3
particular attribute under study. A general model defines ( ) function
ally independent parameters, 0 s it.. 1, such that
P(T. T.) = T ., U.. + .. = 1, ij, i,j=l,...,t. (1.2.1)
Each distinct pair of treatment comparisons is a set of independent
Bernoulli trials. The entire experiment is a combination of the (2)
independent sets.
Bradley and Terry (1952) proposed a basic model for paired compar
isons. A treatment parameter T. is associated with each treatment T.,
l 1
i=, ... ,t. The BradleyTerry model is then defined to be
P(T. T.) = .. = i./(T. + T.) i7j, i,j=l,...,t. (1.2.2)
1 j 13 1 1 3
The righthand side of (1.2.2) is invariant under scale changes, so a
single constraint is imposed to make the treatment parameters unique.
Two popular choices are Z rT = 1, or Z In 7. = 0. This constraint is
replaced with another constraint in Chapters 27.
replaced with another constraint in Chapters 27.
Another model for paired comparisons was presented by Thurstone
(1927). He used the concept of a subjective continuum on which only
order can be perceived. Let p. represent the location point on the
continuum for treatment T., i=l, ... ,t. An individual receives a
1
sensation X. in response to treatment T., with X. distributed
1 1 1
Normal(J.,l). The probability that T. is preferred over T. is then
1 1
P(T. T.) = P(X. > X.) = e /2 dy i (1.2.3)
1 J3 1 3 i,j=l,...,t.
(i.UJ.)
Bradley (1953) noted that replacement of the normal density function
in (1.2.3) by the logistic density function yields
P(T. T) = 1sech2 (y/2) dy = /(+) j (1.2.4)
J sech(y/ i J i,j=l,...,t.
(InT.lnT.)
1 J
This is the BradleyTerry model with U.=lnr., i=l, ... ,t.
1 1
Thompson and Singh (1967) gave a psychophysical consideration by
assuming a treatment T stimulates unobservable signals UI, ... ,U
which are then transmitted to the brain by m sense receptors. The U.
are independent and identically distributed, and m is taken to be fixed
and large. The two variables considered for the response X to treatment
T are
(i) X is the average of the signals U1, ... ,U
(ii) X is the maximum of the signals U1, ... ,U .
1 m
By the Central Limit Theorem, asymptotic normality results for X in
(i), which results with Thurstone's model. The random variable in (ii)
has one of three distributions depending on the distribution of the U..
1
If U., for example, is distributed exponentially, then the asymptotic
1
distribution of X is the extreme value distribution. In any case, how
ever, the BradleyTerry model in (1.2.4) results from (ii).
Rao and Kupper (1967) generalized the BradleyTerry model to allow
for ties. A tie parameter 8 2 1 is introduced, and the model becomes
P(T. T) 1 sechy/2) dy = r./(7T.+7.) i, j
1 3 4 1 1 ] i j=l.
(lniT.lnTj)+n
(lnTr.lnr.)+f
T 1 2
(T. = T = 1 sech (y/2) dy (1.2.5)
S] 4
(InTT. InTT.)n
ij
(7T. + 6T.) (6O s. + 7T.)
where ni=ln6 is the threshold of perception for the judges. That is to
say, if it is hypothesized that the two samples generate a random vari
able d which measures the difference between the samples, and further
more that Idl < n, then the judge will not be able to discern a differ
ence between the two samples, and will thus declare a tie. Note that
the BradleyTerry model is a special case with 8=1. Another model to
handle ties was proposed by Davidson (1970).
The BradleyTerry model has received much attention in statistical
literature since it was introduced in 1952. For the remainder of this
dissertation, the BradleyTerry model with ties not allowed is used.
1.3 Estimation of the BradleyTerry Parameters
and Corresponding Asymptotic Distribution
When the model was proposed by Bradley and Terry, it was assumed
that all of the n.. were equal. Dykstra (1960) extended the method to
13
permit unequal n...
The parameters are estimated by the maximum likelihood method and
likelihood ratio tests are developed. Assuming independence of the
paired comparisons, the complete likelihood function is
t a. t n..
L = H / H (T. + i .) (1.3.1)
i=l i
where a. is the total number of times T. is selected over any other
1 i
treatment, i=l, ... ,t. The likelihood equations are
t
a. n..
p .i = 0 i=l,...,t,
Pi (Pi + Pj)
J 3
j i
(1.3.2)
t
SPi= 1 ,
i=
where p.i denotes the maximum likelihood estimate of i'', i=l, ... ,t.
1 1
These equations must be solved iteratively (see Appendix A for an
APL computer program which finds maximum likelihood estimates of the
treatment parameters). A brief description of the iterative procedure
follows. Letting p(k) be the kth approximation to pi, then
t
(k) *(k) *(k)
Pi = i / Pi (1.3.3)
i=l
where
t
*(k) a (kl) (kl)  1.3.
.= / nij/(P + P ) k=1,2,.. (1.3.4)
Pi
j
j#i
(0)
The iteration can be started with p. =l/t, i=l, ... ,t. Dykstra (1956)
(0)
suggested a method of obtaining good initial parameter estimates p ,
i=l, ... ,t.
Ford (1957) proposed the model independently and proved that the
iterative procedure converged to a unique maximum for L if the following
assumption is true.
Assumption 1.1. For every possible partition of the treatments into
the two sets S1 and S2, some treatment in S1 is preferred at least once
to some treatment in S2'
If Assumption 1.1 does not hold, then a. must be zero for at least one
1
i. If exactly one a. is zero, say a. there may still be a unique
0
solution with p. =0. However, Ford proceeded to explain that if the
o
set S2 which violates Assumption 1.1 contains two or more treatments,
then the corresponding estimates of the BradleyTerry treatment
parameter must be zero, and consequently individual members of 52 could
not be compared. This fact is also evident by noticing that in (1.3.4),
*(k)
pi would be zero for at least two values of i, and so by (1.3.3),
(k)
Pi would also be zero for the same values of i. Then the next time
in the iterative procedure that (1.3.4) is executed, the denominator
in the summation would be zero for some i and j, and hence the itera
tive procedure would not converge.
The asymptotic theory and tests of hypothesis require the follow
ing additional assumption.
Assumption 1.2. For every partition of the treatments into two non
empty subsets S1 and S2, there exists a treatment T ES1 and a treatment
T.S such that j. >0, where
n..
ij N
ij = Nlim N itj, i,j=l.....t,
N = ni
i
Bradley (1955) investigated asymptotic properties of the treatment
parameters with each n.,=n. Davidson and Bradley (1970) considered a
multivariate model where each subject chooses one of two samples on more
than one attribute. As a special case of the multivariate model, David
son and Bradley obtained a more general result, allowing the n.. to be
different. They showed that the asymptotic joint distribution of
vN(p 1r), .....i./(p r ) is the singular multivariate normal distribu
tion of dimensionality (t1) in a space of t dimensions, with zero mean
vector and variancecovariance matrix Z = ( ..), where
3
7.. = cofactor of A.. in
13 13
13 txt txl
1'xt 0
13 txt tx1
1' 0
 xt
(1.3.5)
1' = (1,1, ... ,1), the tdimensional unit row vector, and
t
_ii i= Z 'J i=l,..,t,
ji (7T + T" )
Jti
A = U i/(i + T)2
13 13 1
It is apparent that the variances and covariances in (1.3.5) depend on
the true treatment parameters. Hence estimates of the variances and
covariances are usually found by substituting the consistent maximum
likelihood estimates for 71' "'. t into (1.3.5).
In the special case with i.=l/t, n..=n for all i,j,
1 13
0.. = 2(tl) /t3 and a
= 2(tl)/t3 ij.
(1.3.6)
Use of (1.3.6) is adequate if 7T, ... 7 t are approximately equal.
ij, i,j=l,...,t.
The major test proposed is a test of treatment equality,
H 0 i = = it = 1/t ,
versus
H : T. f j for some i,j, i/j, i,j=l,...,t.
a i 3
The likelihood ratio statistic is
21nX = 2Nln2 2 [ nijln(p +p ) a In pI.
i
Under the null hypothesis, H 21nA is asymptotically central chi
o
square with (tl) degrees of freedom. Bradley and Terry (1952) and
Bradley (1954) provided tables for t=3,4,5, including exact signifi
cance levels. Comparison of the chisquare significance levels with
the tabled exact significance levels suggests that the former may be
used even for relatively small values of n... This is perhaps compar
able to using the normal approximation to the binomial.
The asymptotic distribution under the alternative hypothesis is
also available for 21nX. Let
6 t
1 iN
S= + = 0 lim 6. = 6.
i=l
Then 21nA is asymptotically noncentral chisquare with (tl) degrees
of freedom and noncentrality parameter i where
92
1 4 Iij(i 6
i
1.4 Designs of Factorial Paired Comparison Experiments
Littell and Boyett (1977) compared two designs for RxC factorial
paired comparison experiments. The two designs considered are:
Design I: Consists of independent paired comparisons of each of the
RC
( ) pairs of treatments.
Design II: Consists of comparisons between pairs of treatment com
binations differing only in the levels of one factor, i.e. comparing
two treatments in the same row or column.
Note that there are (R) = RC(RC1) distinct pairs of treatments
2 2
for Design I, but only R( ) + C( ) =RC(R+C2) distinct pairs of treat
ments for Design II. In the 4x4 case, for example, Design I requires
120 distinct paired comparisons whereas Design II requires only 48
distinct paired comparisons. The two designs could entail the same
total number of paired comparisons, but Design II requires preparation
of fewer distinct pairs of samples.
Design II is also preferable from a computational point of view
because its analysis can be carried out by using a computer program
capable of handling the oneway classification with unequal replication
of pairs. A specialized factorial program is required for Design I to
obtain maximum likelihood estimates under the main effects hypothesis,
H : T., = .B., where T., is the BradleyTerry treatment parameter
0 13 1 3 1]
given by (1.4.1).
The two designs are illustrated for the 2x2 case in Figure 1.1.
The comparisons made with Design I are indicated by the solid arrows.
The comparisons made with Design II are indicated by the dotted arrows.
Factor B
high
cE z
Figure 1.1. Depiction of Designs I and II
low
Factor A
high
Designs I and II were compared by Littell and Boyett on the basis
of the asymptotic relative efficiencies, in the Bahadur sense, of their
respective likelihood ratio tests. That is, test statistics are com
pared on the basis of the ratio of the exponential rates of convergence
to zero of their observed significance levels. In general, the Bahadur
efficiency of test 1 relative to test 2 is defined to be
lim( 2 log L ())
n n
E = (2) = cl ()/c2 (e)
lim( log L )
n n
rrto
(i)
where L is the observed significance level of test i. The function
c. () is called the exact slope of test i.
Usually Bahadur efficiency is used to compare two test statistics
that are based on the same sample space. However, its use is equally
appropriate when the sample spaces are different. The two designs
should be compared on the basis of an equivalent number of comparisons
in each. Therefore, a "single observation" with Design I will be taken
as n = R+C2 comparisons of each pair in the design, and a "single
observation" with Design II will be taken as niI = RC1 comparisons of
each pair in the design. This gives RC(RC1)(R+C2) total comparisons
in each design.
Let T.. represent the treatment with the first factor at level i
and the second factor at level j. The BradleyTerry model is then
P(T. Tkl) = T ij i + k ik=l" (1.4.1)
j,1=1,...,C.
The following tests of hypotheses were considered by Littell and
Boyett.
Test 1: H : = 1/RC H : 7. = .B .
o 1i a 13 x]
This is a test for detecting two main effects versus no treatment dif
ferences. Efficiencies were tabulated for the 2x2 case (Littell and
Boyett (1977), Table 1). The table showed that Design I is more effi
cient than Design II for most combinations of a and 3.
Test 2: H : T.. = 1/RC, H : .,. = 8./R.
o 13 a 13 3
This is a test for detecting a single main effect versus no treatment
differences. For this test, Design I was shown to be uniformly more
efficient than Design II.
Test 3: H : T.. = ./R H : i. = aB .
o j3 3 a 13 i j
This is a test for detecting two main effects versus a single main ef
fect. Efficiencies were again tabulated for the 2x2 case. The table
showed that Design I is more efficient than Design II for most combina
tions of a and B.
Test 4: H : T.. = a.6. H : T.. 0 .
o 13 i 3 a 1]
This is a test for detecting interaction versus two main effects. The
results showed that Design II is uniformly superior to Design I for
detecting interaction.
Summarizing, when testing for main effects (Tests 13), Design I
is usually more efficient than II. When testing for interaction,
Design II is uniformly more efficient than I.
Littell and Boyett suggested that it might be a good idea for the
experiment to be conducted in two stages. The first stage is a number
of replicates of Design II followed by a test for interaction. If it
is significant, the second stage is completed by again using Design II,
and simple effects are analyzed by applying oneway analyses within rows
and columns. If interaction is not significant, then the second stage
consists of replicates of pairs in Design I that are not in Design II,
and main effects are analyzed.
The total number of comparisons to be made in the experiment can
be divided into the two parts such that after the experiment is complet
ed, the end result is either a complete Design I or Design II. For
example, suppose the treatments form a 2x2 factorial, and resources are
available for 24 comparisons. In the first stage, the experimenter runs
4 replicates of the four comparisons in Design II (see Figure 1.1).
If interaction is significant, two more replicates of the four compari
sons in Design I that are not in Design II are run. The latter then
results in the same comparisons made as if a single stage experiment
using Design I had been conducted.
Quenouille and John (1971) also considered designs for 2n facto
rials in paired comparisons. However, they assumed an individual is
asked to assign a number on a scale to signify the degree of difference
in the two samples instead of merely deciding which sample possesses
more of the particular attribute under study. This model was developed
by Scheffe (1952).
In particular, Quenouille and John discussed the following 23
factorial. The 28 possible paired comparisons were divided into 7 sets
of 4 blocks as follows:
Set (1) : (I,ABC) (A,BC) (B,AC) (C,AB)
Set (2): (I,AB) (A,B) (C,ABC) (AC,BC)
Set (3) : (I,AC) (A,C) (B,ABC) (AB,BC)
Set (4): (I,BC) (A,ABC) (B,C) (AB,AC) (1.4.2)
Set (5) : (I,A) (B,AB) (C,AC) (BC,ABC)
Set (6) : (I,B) (A,AB) (C,BC) (AC,ABC)
Set (7) : (I,C) (A,AC) (B,BC) (AB,ABC)
The letter "I" represents the treatment where all factors are at their
low levels. If a letter appears, then that factor is at its high level.
Note that each treatment appears once in each set. These sets are con
structed by first pairing treatment I with each of the remaining treat
ments. These are called the initial blocks. The remaining blocks in
the sets are constructed by multiplying the initial block by any treat
ment that has not yet appeared in that particular set, with the usual
2 2 2
rule that A = B = C =1.
To illustrate this technique, we will construct set (1) in (1.4.2).
The initial block is (I,ABC). So far only I and ABC have appeared, so
we may choose any treatment other than these. Suppose we choose BC.
Then the next block in the set would be (BC,A). Note that order is
unimportant. Now we may choose any treatment other than I, ABC, BC, or
A. If we choose B, we get the third block in set (1). To construct the
fourth block, there are only two choices remaining, namely C or AB.
Either one results in the last block of set (1).
Each set lacks information on the effects that have an even number
of letters in common with the initial block. For example, the initial
block for set (1) is (I,ABC), so this set provides no information on
the effects AB, AC, and BC, i.e. on all firstorder interactions. Sim
ilarly, the initial block for set (5) in (1.4.2) is (I,A), so this set
provides no information on B, C, and BC.
This concept of no information on B, C, and BC from data in set
(5) may be viewed as follows. If the responses from comparisons in set
(5) are modeled as
Y(I,A) = (p+a+e (p+e a+E(I,A)
Y(B,AB) = (+a+b+(ab)+eAB) (+b+e = a+(ab)+(BAB)
Y(C,AC) = (+a+c+(ac)+eAC) (U+c+eC) = a+(ac)+E(C,AC)
Y(BC,ABC) = (p+a+b+c+(ab)+(ac)+(bc)+(abc)+e BC) 
(BCABC) ABC
(p+b+c+(bc)+eBC
= a+(ab)+(ac)+(abc)+E(BCABC)
where all of the e's are independent and have expected value zero, then
it is clear that b, c, and (bc), corresponding to the effects B, C, and
BC, are not estimable from the four comparisons in set (5).
Quenouille and John define efficiency as follows. Suppose the
design is made up of r sets, s (s : r) of which give information on some
particular effect. Then s/r is the fraction of the comparisons made
that give information on that particular effect. On the other hand,
suppose that each set is run exactly once, i.e. each pair of treatments
is compared. It can be shown that there are a total of (2n1) sets,
n of which give information on any particular effect. In this case,
2n/(2 1) is the fraction of the comparisons made that give
Efficiencies of
Table 1.1.
balanced 2 and 2 factorial paired comparison designs
n Design
{2}
2
f1}
{3}
{2}
{1}
3
{3,2}
{3,1}
{2,1}
Number of
blocks
2
4
4
12
12
16
16
24
Efficiency of interactions of order
0 1 2
1.50 0.00
0.75 1.50
1.75
1.17
0.58
1.31
0.88
0.88
0.00
1.17
1.17
0.88
0.88
1.17
1.75
0.00
1.75
0.44
1.75
0.88
Notation: {j}: A design constructed by combining all
block I and a jfactor combination.
{j,k): A design constructed by combining the
(k}.
Source: Quenouille and John (1971, Table 1).
sets with initial
designs {j} and
information on any particular effect. The efficiency defined by Que
nouille and John is the ratio of these two fractions. In other words,
the efficiency is defined to be
S s/r
E
2nl/(2nl)
Table 1.1 gives the efficiencies of some designs in the 2 and 2
cases. These designs are balanced in the sense that all main effects
are estimated with the same efficiency, all first order interactions
are estimated with the same efficiency, and so on. The table indicates
that in the 22 case, Design {1} is efficient for estimating the inter
action effect. This result is the same as the result Littell and
Boyett obtained using Bahadur efficiency.
ElHelbawy and Bradley (1978) considered treatment contrasts in
paired comparison experiments. The theory developed is completely gen
eral, but they specifically showed how optimal designs can be found in
the case of a 23 factorial. Their optimality criterion is the maximi
zation of the chisquare noncentrality parameter from the likelihood
ratio test, which thereby maximizes the asymptotic power of the test.
Let a=(al,a2,a3), where a. represents the level of factor i,
i=1,2,3, and each a. is 0 or 1. Then the treatment parameters are
1
defined by ElHelbawy and Bradley to be
1 = 2 3 12 13 23 123 (1.4.3)
= T T T 7 i T v (1.4.3)
a al a2 a3 ala2 aa3 a2a3 ala2a3
for all a. The factorial parameters are defined by taking logarithms
of the 8 treatment parameters given by (1.4.3). So from (1.4.3),
Ya = logfa
3
= log + log7 + logT123
a a.a alaa3
3
= Ya + Y + Yl23 (1.4.4)
i=l i
The last two expressions in (1.4.4) define the factorial parameters
1 2 3 12 13 23 123
Y' Y2 Y 3 2a, y Y These parameters are
as1 a2 a3 ala aal a a3 3 ala2a3
subject to the usual analysis of variance constraints, namely
= 0 ,
i=1,2,3,
1
Ya a
a.=0
3
= 0 ,
i
(1.4.5)
1
S 123
ala a3
a 2=0
1
12a3
a3=0
The null hypothesis of no twofactor interaction
two factors is now considered. This test may be made
12
Ho: YOO
between the first
by testing
= 0 .
12
The constraints then require that Y 2=0, al,a2=0,1. The null hypoth
esis is therefore equivalent to
12 12 12 12
S 00 Y O 01+ = 01
12 12 12 12 12
since by (1.4.5), y0 y Y + Y1 = 4y00 Let B denote a
matrix whose rows are orthonormal contrasts. If we let Y' =
(Y000,Y 001,Y010 00 ,Y01 ,lll), then the null hypothesis is
also equivalent to
H : B y = 0,
o n=l
where B = S (1,1,1,1,1,,1,1).
"n=1 8
a.=0
1
Ya a.
a. =0
I
1
7123
al=0
= 0.
Let
a if comparing T. and T. give no information (in the
sense discussed by Quenouille and John) on the two
n .
13
N factor interaction between the first two factors,
b otherwise,
where n.. is defined in Section 1.2, and N = n
i
Then 12a + 16b = 1, since 16 of the 28 distinct pairs give information
on the twofactor interaction. ElHelbawy and Bradley then proceeded
to show that b should be taken as large as possible in order to maximize
the noncentrality parameter. This gives the result b=1/16, a=0. So
the optimal design is to compare the treatments that give information
on the twofactor interaction between the first two factors. These
pairs of treatments would be the sets (3), (4), (5), and (6) in (1.4.2).
This result is the same as Quenouille and John would obtain if
they wanted to find an optimal design to estimate AB interaction, since
using these 4 sets maximizes the efficiency of the AB interaction using
their definition of efficiency. It is also analogous to the result
Littell and Boyett obtained because the treatments compared in either
case differ either in the level of A, or the level of B, but not both.
ElHelbawy and Bradley noted that the result discussed in the two
preceding paragraphs still holds if other specified factorial effects
are assumed to be null (e.g. threefactor interaction). While the sta
tistics for the two tests may differ, their limit distributions are the
same for either H or H
o a
ElHelbawy and Bradley also considered simultaneously testing for
the presence of AB, AC, and ABC interactions. In this case the null
hypothesis is
H: B y= 0,
o n=3
where
1 1 1 1 1 1 1 1
B = = 1 1 1 1 1 1 1 1
n=3 Y8
1 1 1 1 1 1 1 1
By the same method they concluded that the optimal design is to make
the following comparisons (using Quenouille and John's notation):
(I,A), (B,AB), (C,AC), and (BC,ABC). Note that this is set (5) in
(1.4.2). Note also that this is the only set that contains information
on all three of the effects AB, AC, and ABC. So this design is also
optimal using Quenouille and John's definition of efficiency.
Beaver (1977) considered the analysis of factorial paired compari
son experiments using the weighted least squares approach of Grizzle,
Starmer, and Koch (1969). He considered a loglinear transformation on
the T. 's. This approach produces noniterative estimates of the i.'s.
Also, the usual tests for a factorial experiment are easily implemented.
However, Beaver gave no consideration to experimental design.
1.5 Response Surface Designs
Springall (1973) considered response surface fitting using the
BradleyTerry model with ties allowed as generalized by Rao and Kupper
(1967). The model is given in (1.2.5).
The logarithm of the treatment parameters are defined to be func
tions of continuous, independent variables, x ,x2, ... ,xs, measurable
without error, i.e.
s
In = k xi i=,...,t. (1.5.1)
k=l
Note that some of the variables can be functions of other variables.
2
For example, s=2 and x 2=xli results in the parameterization
2
In i = 1xli+ 2xli .. The parameters 6, ... are estimated by
the maximum likelihood method. Appendix B has an APL program and a
Fortran program which produce maximum likelihood estimates of ...
I but which assume there are no ties.
Springall presented the following definition, theorem, and example.
Definition 1.1. Analogue designs are defined to be designs in which
the elements of the paired comparison variancecovariance matrix are
proportional to the elements of any classical response surface variance
covariance matrix with the same design points and an equal number of
replicates at each design point.
The advantage of such designs is that they enable certain desirable
properties found in classical response surface designs (e.g. rotatabi
lity, orthogonality), which are dependent on the relative sizes of the
elements of the variancecovariance matrix, to be readily produced.
*
For example, consider the classical regression model y = 0 +
*2
81x + 8x Then an experiment can be designed in such a way that the
variancecovariance matrix of the least squares estimators 0' and
"* m*
variancecovariance matrix of the least squares estimators 3 B%, and
82 possesses an optimal form. For instance, the classical design with
an equal number of replicates at each of the four levels x=2,1,1,2
can be shown to be rotatable, and so the analogue paired comparison
designs discussed in Example 1.1 below have the same property, i.e.
var(ln ) = var(ln ).
x x
2
The analagous paired comparison model is In x = x + 2 x An
analogue paired comparison design has a corresponding variancecovari
ance matrix of the maximum likelihood estimators 81 2, such that each
element is a constant times the corresponding element of the least
squares variancecovariance matrix. For the particular example of a
quadratic model currently under discussion, var(6l) = c.var(B ),
var( 2) = cvar(B2), and cov(S ,62) = ccov(18 2), for some c.
Springall described a method which uses linear programming tech
niques that produces an analogue design with minimum elements of the
variancecovariance matrix. The following theorem shows how one can
produce an analogue design, but it does not necessarily have minimum
elements of the variancecovariance matrix.
Theorem 1.1. If a classical response surface design has certain prop
erties dependent on the relative sizes of the elements of the variance
covariance matrix, then the analogue paired comparison design will have
the same properties if
n. = N [i kl 11
k
where i.. = i.7./(7. + .) 2
iJ I J J
Example 1.1. Consider the quadratic model In Ti = B x + 2x2, with
design coordinates x=2,l,l,2. Assume e=1 and 1= 2=0, and set N=150.
Springall showed that an analogue design is given by n.. = 150/6 = 25,
i
An alternative design may be obtained by using linear programming.
Let Z denote the variancecovariance matrix of the maximum likelihood
estimates 1I and B2. By simultaneously maximizing the elements of ET
subject to the constraint N=150, a design which minimizes the elements
of Z is produced. Springall did this and arrived at the design
{n 2=0, n 1371, n14=9, n23=0, n24=70, n34=0}, after integerization of
the n...
13
The asymptotic distribution of the maximum likelihood estimates
of 8,81 .. ,s was presented by Springall, but he presented few de
tails. Therefore, the maximum likelihood estimates of the parameters
81, ... ,s and their asymptotic joint distribution are developed in
Chapter 2. The tie parameter, 0, is not considered in the remainder
of this dissertation.
CHAPTER 2
ASYMPTOTIC THEORY OF THE
MAXIMUM LIKELIHOOD ESTIMATORS
2.1 Introduction
This chapter contains the asymptotic theory for the maximum like
lihood estimates of the parameters given by (1.5.1), where the logarithms
of the treatment parameters are expressed as a polynomial function of a
single quantitative variable, x. An example of an experiment which had
a single quantitative variable was given in Section 1.1. The treatments
in that example were different levels of an additive for coffee. For
an experiment with quantitative independent variables, it is appropriate
to do a paired comparison analogue to classical regression analysis.
The theory developed in Sections 2.2 and 2.3 assume the general
case of s independent variables xl, ... ,xs. The parameterization of
the BradleyTerry parameters, given in (1.5.1), is reproduced here:
s
ln = Z kki' i1...
k=l
The BradleyTerry model with ties not allowed is used, where the r. are
defined by (1.2.2).
In order for all of the parameters 81 ... ,s to be estimable, it
is required that s a (tl). This is clearly necessary since the Bradley
Terry parameters have dimension (tl). Note also that an intercept
parameter, 0 is not estimable without imposing a constraint on the
BradleyTerry parameters. This becomes apparent when examining the
preference probabilities. For example, suppose the model was
s
in = 0 + k xki
k=l
i=l,...,t.
Then
P(T. T.) = ./(T. + r.)
I 3 1 1 3
s
exp(80 + kki
k=l
s
exp (B0 + Bkxki)
k=l
s
+ exp(B0 + 7 kxkj
k=l
s
exp( kxki)
k=l
s s
exp( kxki) + exp( kxkj)
k=l k=l
isi
The parameter 60 falls out, implying that any choice of B0 results with
the same preference probabilities. Choosing B0=0 replaces the imposi
tion of a constraint on the treatment parameters.
2.2 Estimation
As was previously mentioned, the parameters B1, ... ,s are esti
mated by the maximum likelihood method. From (1.3.1), the loglikeli
hood equation is
t
In L = a.lni. n .(nT(r. +
Si= il
i=l i
t s
= ia Skkik
i=l Lk=l
(2.2.1)
I n ijn exp( S kki) + exp( kxkj
i
The maximum likelihood estimators are found by taking partial
derivatives of the loglikelihood equation, setting them equal to zero,
and solving for the parameters. The likelihood equations are
t
9ln L r
1n L = a.x n.)
r i=l i
r
Xrj
= i m. m
isj ij
s s
Sriexp( kx)ki + x rjexp( kxkj)
k=l k=l
s s
exp(Z kkxki) + exp( 7 kXkj
k=l k=l
s s
iexp( SkXki) + xrj exp( kxkj)
k=l k=l
s s
exp( kxki) + exp( B Skxkj
k=l k=l
i= > 7m. ri r = 0 r=l...s.
13 ] + 7.
i/j *L 3
(2.2.2)
where m.. is the total number of times T. is preferred over T..
1] 3
These equations must be solved iteratively. Appendix B contains
an APL program and a Fortran program which find the estimates of 81,...,
0 The procedure described in the appendix converged for most situa
s
tions. However, occasionally when an attempt was made to fit a full
model (s=tl), the procedure did not converge, which remains unexplained.
In this situation one can resort to the program found in Appendix A
which finds estimates of the BradleyTerry parameters, and then solve
(tl) equations in (tl) unknowns to find the estimates of 8 ,... s
2.3 Asymptotic Distribution
First of all, some additional notation and elementary theory is
presented. This will become useful when the asymptotic joint distribu
tion of ... is derived.
1 s
Let Xik be a random variable which is equal to 1 if T. is selected
i;)k 1
th
over T. for the k comparison of T. and T., 0 otherwise. Then X.
3 1 3 13k
has the Bernoulli distribution with probability of "success" equal to
T./(T. + T.), i,j=l, ... ,t, k=l, ... n.j., and the Bernoulli random
variables are independent of each other. This implies that m. has the
1J
binomial distribution, with sample size n.. and "success" probability
equal to T./(IT + T.), where
1 i 3
13
m. = Xijk i
k=l
Furthermore, the m..'s are also independent of each other, and from the
ii
properties of the binomial distribution we have
Em.. = n. .T./( + T.) ,
1J 1J J 1 3
i
(2.3.1)
2
var(m.ij) = n.ijiT/(ir + .) i
1D 1313 1 3
cov(mij,mkl) = 0 ,
i
i,j,k,l=l,...,t.
From (2.3.1), (2.3.2), and (2.3.3), it follows that
n. Tr. nk T
E(m..m. ) = (Em.j) (E ) = k
] k mk iT. + Tj. Tk + T '
1 ] k I
2 2
2 2 in. j.T + n.. i.
Em.. = var(m..) + (Em..)2 1 J l 1
13 13 13 (I + i)2
1 3
(2.3.2)
(2.3.3)
(2.3.4)
(2.3.5)
Under certain regularity conditions verified by Springall (1973),
it follows from a theorem found in Wilks (1962, page 380) that
NI( ) ...s, N( B ) is asymptotically multivariate normal,
with mean vector ~ and variancecovariance matrix (ab), where
with mean vector 0 and variancecovariance matrix (A ,) where
ab
= 31n L ln L
a = E aJ '
a b b
a,b=l,...,s.
We now proceed to derive the asymptotic variancecovariance matrix
by deriving Xab for general a and b. From (2.2.2) we have
Ix .x x .x
Dln L = Xai x a + a a .
~ ~ = L L m^ij + L ^7 m+jT j
a i
N 4V j
3<1
(2.3.6)
 X X X .
SX i m.. + a] ai (n.. .. .
zi< i z + 1j T2 3 + 1 ijj )
i
m. (x x ) n. a=l,...,s. (2.3.7)
i< j ( a 
Then from (2.3.4)(2.3.7), we find Xab to be
A = E m. i (x .x ) n. i.
ab i
i
(ijk or j24)
2 2
n. .T i T n. i T
i
 nij r nkl~Tk
2 n n (x \x ) (x x )
z j (zz : .) (T+T, 7+1) I ai aj bk bl
i
+(7C + lT ) kT + T1) (Xaixaj) kxbk xbl)
i
i< 1 2l
ni ( Z 2+, Zj r.)2 T aib=l,....s. (2.3.8)
T < T.T
As previously mentioned, the remaining chapters deal with finding
optimal designs when the logarithm of the treatment parameters is ex
pressed as a polynomial equation in a single variable. The various
optimality criteria considered include
(i) Minimization of the variance of the maximum likelihood esti
mator of one parameter,
(ii) Doptimality,
(iii) Minimization of the average variance of the predicted
"response", In iT
(iv) Minimization of the maximum variance of In 7 (minimax),
x
(v) Minimization of the bias,
(vi) Minimization of the average mean square error of In Z when
x
bias is present (Jcriterion),
(vii) Doptimality for a preliminary test estimator.
These are all defined in subsequent chapters.
For the case of a single quantitative independent variable, the
model given in (1.5.1) becomes
s
n t. = x i=l,...,t. (2.3.9)
i k ,i k ..
k=l
Similarly, the variancecovariance matrix given in (2.3.8) becomes
Sl
(ab)1, where
\b = nj 2 (x ax a)(x.bx.b), a,b,...s.
ab + ( 2 + 1 (2
i
Notice that the complete design problem is quite complex. It
involves deciding how many levels and additionally which particular
levels should be chosen. Furthermore, it must be determined how often
to compare each pair of treatments.
A further complication is apparent by noticing that the variance
covariance matrix, whose elements are given in (2.3.10), depends on the
true parameter values. Hence the optimal designs also depend on the
parameters. The designs discussed by ElHelbawy and Bradley and Spring
all were found under the assumption that all BradleyTerry treatment
parameters are equal, or equivalently that = 2= =B =0. In no
paper of which the author is aware was the dependency of the optimal
design on the true parameter values examined. As will become apparent
in the following chapters, the optimal design for one set of parameter
values can be quite different from the optimal design for another set
of parameter values.
CHAPTER 3
MINIMIZATION OF THE VARIANCE
OF A SINGLE PARAMETER ESTIMATE
3.1 Introduction
The present chapter contains a discussion of the linear, quadra
tic, and cubic models. The optimality criterion considered is the
minimization of var(S.), where i=1,2, and 3 for the linear, quadratic,
and cubic models, respectively. The linear model is discussed in Sec
tion 3.2. Designs for the quadratic model are found in Section 3.3 for
32=0, 81 variable. The general expression for var(82), given by
(3.2.6), can be shown to be a continuous function of 2, and hence the
designs in Section 3.3 are locally optimal at the point B2=0. Likewise,
designs for the cubic model are found in Section 3.4 for 3=0, 81 and
2 variable, and are similarly locally optimal.
3.2 Linear Model
From (2.3.9), the linear model is
In f. = 8x. i=l,...,t. (3.2.1)
1 1
Since there is only one parameter, by (2.3.10) the variancecovariance
matrix becomes a simple variance, X1, where
B(x. + x.)
S TS' e I ]2
i
i
(x. x.)2
ni. (exp(8(x.x.)/2) + exp(B(x.x.)/2))2
i
a.
= nij 8d../2 Bd/2 (3.2.2)
i
where d..=x.x.. For the remainder of this dissertation, it is assumed
13 i J
without loss of generality that the independent variable, x, is scaled
such that it is in the interval [1,1].
Although it is not necessary to treat the two cases B=0 and SrO
separately, the special case 8=0 is considered first because the argu
ment used to derive the optimal designs is easier to follow for this
situation. When 8=0, (3.2.2) reduces to
n = n. (x. x.)2/4 (3.2.3)
i
The expression in (3.2.3) is a maximum when all the "weight" of the n..
1)
2
is placed onto pairs which maximize (x. x.) Clearly this implies
1 3
that the design which minimizes var(B) is a comparison of only the pair
(1,1).
The designs which maximize A when 85O are found using an argument
identical to the one used for the case $=0. From (3.2.2), all of the
"weight" of the n.. should be placed onto pairs which maximize
13
2 (d/2 $d/2 2
y = d2(e + e 2 (3.2.4)
where d is then the optimal difference between the levels in the pairs.
The value of d which maximizes y in (3.2.4) is the solution to
y 2d ^d2(ed/2 eBd/2
y = 2d Sd 2(e = 0 (3.2.5)
3d (ed/2 + ed/2)2 (ed/2 + ed/2 3
The equality in (3.2.5) implies
coth(Sd/2) = Sd/2 (3.2.6)
The solution to coth(u) = u is a transcendental number, and can be
shown to be approximately u=1.1997. Hence, from (3.2.6) and the fact
that Idl must be less than or equal to 2, the design which minimizes
var(B) for a given value of 8 is
dl = Ix. x.I = min(2,2.3994/ B ) (3.2.7)
It is now shown that the value of Idl given in (3.2.7) is in fact
a maximum for y, and consequently for A. By (3.2.5), the second par
tial derivative of y with respect to d is given by
2 y Bd/2 + d/2 2
Y (e + e )
Dd
22 322 2
2 4dBtanh(Bd/2) 2d2/2 + 6 d tanh ($d/2) (3.2.8)
The second partial derivative 3 y/d2 is negative if and only if the
expression in (3.2.8) is negative. From (3.2.6), the equality
2 = Bdtanh(8d/2) (3.2.9)
holds at d=2.3994/181, which is then substituted into (3.2.8) to give
2
3 = 2 8 B2d2/2 + 6 = 2d2 /2 < 0 (3.2.10)
opt. d
Because there is only one solution to (3.2.6) where d>0, the result in
(3.2.10) implies that the solution of Idl given in (3.2.7) does in fact
minimize the variance of B.
By (3.2.7) the optimal design is a comparison of only the pair
(1,1) when 8 <1.2. It can be shown that P(l 1)=.92 when 8=1.2.
Because of this relatively large preference probability of choosing one
endpoint over the other, in most practical situations B will be less
than 1.2. So in these cases the optimal design is a comparison of only
the endpoints of the experimental region.
The optimal designs given by (3.2.7) are also optimal for other
criteria. In particular, they are optimal for criteria (ii)(iv) intro
duced in Section 2.3. These criteria are formally defined in Chapter 4,
but a brief explanation for their equivalence in the case of a linear
model follows. The equivalence of the criterion being discussed in the
present section to Doptimality follows from the fact that the deter
minant of a Ixl matrix is simply the only element in the matrix. The
equivalence to the other two criteria follows from the fact that
2
var(ln i ) = x var() .(3.2.11)
x
From (3.2.11), criterion (iii) is, in this case, the minimization of
2 2 "
I
x var(8) dx = var(8)
1
and criterion (iv) is the minimization of
2
max x var(B) = var(B)
xELl,1]
3.3 Quadratic Model
The quadratic model is
2.
In = xi + x i=, ,t.
1 81xi 2 1 '
From (2.3.10),
mum likelihood
the asymptotic variancecovariance matrix of the maxi
S2 i 1
estimators a1 and 82 is (A ) where
1 2 abI
A11 =
i
12 = I
i
2i
i
2
ijij(X i x)
*..(X. x.)(x. 2 2
nij ij(xi 2 x
2 2 2
n ijij (xi x. )
13fl 1
(3.3.2)
(3.3.3)
(3.3.4)
and where
.ij = (xi'x) =
iJ 1 3
exp( 81(x. + x.) + 8(x.2 + .) )
S2exp( 2 1
2 2 )2
( exp(8 1xi + 8 2xi ) + exp(81xj + 2xj ) x
(3.3.5)
i
Minimization of var( 2)
Consider the test of the hypothesis
H : 8 = 0 S, unspecified,
o 2 1
versus
H : S2 / 0 1B unspecified.
a 2 2.
(3.3.1)
Minimizing the asymptotic variance of 52 would equivalently be maximiz
ing the asymptotic power of the test. The variance of 2 is the lower
righthand element of the 2x2 matrix (lab) That is,
var(B2) = 1 (3.3.6)
11 22 12
Under the null hypothesis, (3.3.5) becomes
8 (Xi + Xj)
e
J = 1 2 i
(e13 +e )
As was pointed out in Section 3.1, in the present section (3.3.6)
is minimized for 82=0. This produces locally optimal designs. Assuming
52=0 greatly simplifies the design problem as will be seen in the
remainder of the section.
The end result of Lemmas 3.13.2 and Theorems 3.13.3 is a set of
optimal designs which minimize var(82) under the null hypothesis for
various choices of the linear parameter. Hereafter, "optimal design"
will refer to the design which is optimal for the criterion being dis
cussed in the section.
The following two definitions are useful in the discussion that
follows.
Definition 3.1. The pair of treatments (x ,x2) is defined to be
the symmetric counterpart of the pair (x ,x2).
Definition 3.2. A design is said to be symmetric if each pair in the
design is compared as often as its respective symmetric counterpart.
Lemma 3.1. Let {(x.,x.) denote .ij as given by (3.3.7). Then
(xix .) = (Xix.x), for any x. and x..
Proof.
8 C(x. + x.) 2 1(x. + x.)
x e e
X i lxj 2 281(xi + x.)
Se +e ) e 3
81(xix)
e
e1(x.) + e1 (x ) 2
Ce^ j' +e
= 0(Xi,xj ) +
Theorem 3.1. The optimal design is a symmetric design.
Proof. Notice that both of the following hold:
2 2
(i) (x. x.) = ( (x.) (x ) )
3 1 3 1
(ii) (x.2 x 2) = ( (x.) (x.)
From (3.3.2), (3.3.4), Lemma 3.1, and the above, it is clear that ll
and X22 are invariant under changes of any number of comparisons of
pairs to comparisons of their symmetric counterparts.
Let D denote any particular design, and let varDC (2) be the var
th
iance of 2 for Design D Also, let the i pair in D be denoted
(x.,xi). The sets of comparisons {(x.,x ), (x.,x)} are considered
in sequence, and without loss of generality, it is assumed that
n , n = d. 2 ,
x.,x. X. ,X.
1 1 1 1
for all i. Notice that it could be true that n = 0.
X X
I I
Design D2 is formed from Design D1 by changing d./2 of the compar
isons of the pair (x.,xi) to comparisons of the pair (x.,x'). This
1 1 1 1
results in (n + n )/2 comparisons of each of the two pairs
x.,x. x.,x.
11 1 1
th
in the i set. This change is made for each set of comparisons, there
by defining Design D2 to be a symmetric design. The total number of
comparisons, N, is left unchanged.
By the first paragraph of this proof, the values of All and A22
are the same for the two designs. By (3.3.3), 12=0 for Design D2,
12
and hence A2 is a minimum. Therefore, by (3.3.6),
varD (2) S varD (2) (3.3.8)
Note that the variances are equal if 12=0 for Design DI, and it
is possible that A 12=0 for a nonsymmetric design. However, (3.3.8)
shows that no design can "do better" than a symmetric design. Also
note that in practice, d. must be an even integer in order for the
1
n. 's to be integers.
So the search for optimal designs in the present section will hence
forth be restricted to the class of all symmetric designs.
Because 12=0 for a symmetric design, (3.3.6) reduces to
var(B ) = 1/22 (3.3.9)
So it is now necessary to find designs which maximize A22. Notice that
122 depends on 1. As previously mentioned, the optimal design also
depends on the value of S1.
Theorem 3.2. The optimal design is a design in which the only compar
isons made are comparisons of the pairs (x,l) and (l,x) an equal num
ber of times, for some x. Furthermore, x, 0.
Proof. First of all, the notation used for A22' which is defined by
(3.3.4) and (3.3.7), is changed. The N total comparisons are arbitrar
.th
ily ordered, and the i pair is then defined to be (x.,x!). The vari
ables y. and z. are defined to be
1 1
y = x. x z. = x + x i=,...,N.
1 1 1 1 1
Then from (3.3.7),
1(x' + x) 8 l(X + x!)
1 i i 11 1
= e e
i li xx l Sl(xi + x!)
(e +e ) e
8 y /2 1+i/2 2 i=l .. N
(e + e
Notice that i. is a function of y. but not of z.. From (3.3.4), 12 is
1 1 12
rewritten as
N
22 = iii zi
i=l
Let D be any design. By holding y. fixed, we examine what changes
2
in z.2 will increase 22. Clearly if 1zil is as large as possible for
every i, then X22 is also as large as possible for the particular set
of y.'s found in D. Without loss of generality, it is assumed that
1
x.
1 1 1 1 1
all comparisons must involve either x=l or x=l. Additionally, using
an argument similar to one used in Section 3.2, it is clear from the
expression for A22 given in (3.3.4) that x=l should be compared with
one and only one level of x, namely the value of x which maximizes
0(l,x)(1x ) This result in conjunction with Theorem 3.1 proves the
first part of the theorem.
We can now rewrite 122 simply as
22
122 = NO(l,x)(lx2)2
It remains to be proven that the optimal value of x is such that x L 0.
The variable I(x) is defined to be the positive square root of
X22/N, i.e.
81(1 + x)/2
1 B2x
~(x) ( x2
e1 1x
e + e
(3.3.10)
Since (3.3.10) and the fact that xE[l,l] imply I(x)
A22 is equivalent to maximizing P(x).
It will now be shown that '(x) > i(x), x a 0.
inequalities are equivalent:
f(x) a 4(x)
1 (1 + x)/2
e
1 1x
e + e
13 (l+x)+1 1 (l+x)0 x
e + e
> 0, maximizing
The following five
(3.3.11)
1e(1 x)/2
e (3.3.12)
e + e
~ (lx) + 1 1( (1x)+81x
r e + e (3.3.13)
81(1 + x)/2 1 (1 + x)/2 81(1 x)/2 8 (1 x)/2
e + e a e + e (3.3.14)
cosh(u + 81x) cosh(u) (3.3.15)
where
uu
cosh(u) = (e + e )/2 ,
u = 1 (1 x)/2
Now cosh(u) is a monotonically increasing function of u for u. 2 0.
Furthermore, it can be assumed that 3 1. 0 because Theorem 3.3 proves
that the optimal designs for 1 = 10 are the same. Or, equivalently,
the remainder of this proof can be repeated for the case 1 <0. Then
x ; 0 and B1 > 0 implies B x > 0, and hence by the equivalence of
(3.3.11) and (3.3.15), 4(x) L(x) for all x O 0. This completes
the proof of the second part. t
The next theorem shows that optimal designs only need to be found
for nonnegative values of 81.
Theorem 3.3. The optimal design for B1=B10 is the same as the
optimal design for 81=B10.
Proof. Recall that the optimal design is found by finding the value of
x which maximizes P(x) from the proof of Theorem 3.2. Since i(x) is
actually a function of 1 and x, we write
B1(1 + x)/2
(81,x) = e1 x/ (1 2
e + e
The theorem is proved by observing that
81(1 + x)/2 81(1 + x)
e 2 e
t(81,x) = 1ex (1 x2) e* (l + x)
e +e e
81(1 + x)/2
=e (1 x2)
81x 81
1
e + e
= (1,x) +
All that remains to be done is to find the value of x that maxi
mizes f(x) for various choices of B1 0. Unfortunately, a closed form
solution of x as a function of B1 does not exist. However when 31=0,
the optimal value of x can be analytically derived, and hence this
situation is considered first.
Suppose B =0. Then from (3.3.10), i(x) = (1 x2)/2. Clearly
this is maximized by x=0, and hence from Theorems 3.1 and 3.2, the
optimal design is an equal number of comparisons of only the pairs
(0,1) and (1,0). For the case 81 0, the grid method described in the
next paragraph was used to obtain the results presented in Table 3.1
and Figure 3.1.
The grid method utilized first calculated 4(x) for x=0(.l)l,
and the best value of x, say xl, out of these 11 choices was found.
Then t(x) was calculated for the 11 values x=(x .l),(x .08), ...
(x +.l), and again the best value of x was found. This procedure was
repeated until the optimal value of x was found accurate to the fourth
decimal place. The entire procedure was done separately for each value
of 16 .
The following lemma proves that the values of x which are found
via the grid search are absolute maximums for i(x).
Lemma 3.2. Exactly one local maximum (and hence it is the absolute
maximum) exists for l (x) in the interval L0,1].
Proof. By Theorem 3.3, only the case 1 >0 needs to be considered.
From (3.3.10), the first partial derivative of l(x) with respect to x
is
1(1 + x)/2 e (81(1 x )/2 2x)
1 1(x) = 1 1x 2 x (3.3.16)
L 2 l 2
( e + e ) e (8B(1 x )/2 + 2x)
It can be seen from (3.3.16) that (0)>0 and 4i'(l)<0. Hence ,0'(x)=0
for at least one xeLo,l1, and it remains to be shown that i'(x)=0 for
exactly one xE[O,lj. (An attempt was made to show 4"(x) is negative
for all 61 and xELO,l2, but it turned out not to be true. In particular,
a graph of '(x) as a function of x for 81=5 indicated ("(x)>0 for values
of x near 0.)
From (3.3.16), W'(x)=0 if and only if
 (1 x)
(81(1 x2)/2 2x) = e () 1(1 x )/2 + 2x) (3.3.17)
Notice that the functions f(x) = ( 1(1 x2)/2 2x) and
f2(x) = (81(1 x2)/2 + 2x) are reflections of each other about the
axis x=0, and they are quadratic in x. Hence they intersect exactly
once at x=0. Also, for fixed B1, fl and f2 have maximums at 2/81 and
2/81, respectively. Let gl(x) be the righthand side of (3.3.17). It
was shown in the preceding paragraph that fl(x) and g (x) intersect at
least once in the interval Lo,l]. To determine that they intersect
exactly once, it is sufficient to show that f l(x)/3x < gl (x)/ax.
From (3.3.17),
f l(x)
3x x 2,
gl(x) 81(1 x) 2 1(1 x)
x = e (1(1 x )/2 + 2x) + e (8x + 2)
3 (1 x)
= e 1 ( 2(1 x )/2 + B3x + 2)
For 31>0 and xELO,1], it is clear that fl (x)/2x < 0 and
agl(x)/3x > 0, which completes the proof. t
Table 3.1 presents the optimal value of x found from the grid
search described immediately prior to Lemma 3.2, for various choices
of 1 1. Figure 3.1 is a graph of the optimal value of x versus 1B11.
The table includes relatively large values of B11 to illustrate that
xtl as Is1 B m. Recall that for a given value of B l1, the optimal
design is a comparison of (l,x) and (x,l) an equal number of times,
where x is found in Table 3.1 or Figure 3.1.
Table 3.1 indicates that the optimal design when B1=0 involves
only 3 levels, x=l,0,1. However, when 81O, the optimal design
requires 4 levels, x=l,x ,x ,1, for some x In certain experimental
So o
situations it may be significantly more laborious to make four "batches"
Table 3.1. Designs which minimize var(B2)
Relative
81 Optimal x efficiency
0 0 1
.05 .0003 1
.10 .0012 1
.15 .0028 1
.20 .0050 1
.25 .0077 .9999
.30 .0110 .9998
.35 .0150 .9995
.40 .0194 .9992
.45 .0243 .9988
.50 .0297 .9982
.55 .0356 .9974
.60 .0419 .9963
.65 .0486 .9950
.70 .0556 .9935
.75 .0630 .9915
.80 .0708 .9893
.85 .0787 .9866
.90 .0869 .9836
.95 .0953 .9801
1.0 .1039 .9761
1.1 .1216 .9668
1.2 .1396 .9556
1.3 .1580 .9426
1.4 .1764 .9269
1.5 .1948 .9097
1.6 .2130 .8905
1.7 .2310 .8696
1.8 .2487 .8472
1.9 .2660 .8233
2.0 .2829 .7983
3.0 .4269 .5242
4.0 .5295 .2974
5.0 .6031 .1546
CN
< ~
Ol
N
4
U
Lf
aD
a.
cC
than it is to make just three. For this reason, and because a priori
information about 1 may not be available, asymptotic relative efficien
cies were calculated for the design ({n,0 = nl = N/2} to the optimal
1,0 0,1
designs found in the table. The relative efficiency of the first design
to the second (optimal) design is defined to be
var 2(2)
R.E. = 2
var (B2)
1, 1 2
e /(e 1 + 1)2
B1(1 + x) ,2 1 x 2
e (1 x )/( e + e )
The relative efficiencies were enumerated for various choices of 81,
and are presented next to the optimal designs in Table 3.1.
Equivalence of ElHelbawy and Bradley's method
and minimizing the variance
An application of ElHelbawy and Bradley's method for finding opti
mal designs was briefly discussed in Section 1.4. It will now be prov
en that their optimality criterion is equivalent to minimizing var(8k),
any k. First of all, notation and results of ElHelbawy and Bradley's
paper which are necessary for the proof of Theorem 3.5 are presented.
Recall that ElHelbawy and Bradley considered linear contrasts of
Yi = In i=l, ... ,t. The matrices B and B are defined as mxt and
1 1 m f
nxt, respectively, with zerosum, orthonormal rows, such that
0 s m < m+n S tl. The null hypothesis is
H : B Y(T) = 0
S n n
and the alternative hypothesis is
H : B y(r) 0 ,
a n n
th
where Tr and y(r) are column vectors with i elements 7T. and y., respec
tively. The rows of B represent the linear contrasts assumed to be
m
null. Note that B need not exist (i.e. m=0).
m
The vector T is a value of T which satisfies the null hypothesis
o 
and the constraints, where r is the true parameter vector.
The matrix B is defined to be the txt orthonormal matrix
[1,'//
B = B (3.3.18)
m
where B* is any (tml)xt matrix such that BB' = I and l'=(l, ... ,1),
m t 
the tdimensional unit vector.
The following theorem is extracted from ElHelbawy and Bradley
(1978, Theorem 2, page 833).
Theorem 3.4. Let p be the maximum likelihood estimator of T. Under
the assumption of connectedness (Assumption 1.2), and by the assumption
of strictly positive elements of T, which has already been inherently
assumed since ln(0) is undefined, v7((p) Y(7)) has a limiting distri
bution function that is singular, tvariate normal in a space of (tml)
dimensions, with zero mean vector and dispersion matrix
,
E(r) = B*'(C()) B* (3.3.19)
m  m
where
C(T) = B*A(7r)B*' (3.3.20)
M m
and A(r) is the txt matrix with elements
t
Aii (i) = i ij i /(ni + .), i=l.. t,
j=l
ji (3.3.21)
A..(Tr) = U. .ir. ./(r. + i .) i j, i,j=l,...,t,
and where U.. is defined in Assumption 1.2.
13
ElHelbawy and Bradley's optimality criterion is the maximization
of the noncentrality parameter, 2, for the asymptotic chisquare dis
tribution of the likelihood ratio statistic. The noncentrality parame
ter is
2 1
2 = 6'E 6 (3.3.22)
0 
where 6 is a vector of constants defined in their paper, E is the
o
1
leading principal matrix of order n of C B provides the first n
o n
rows of B*, and
C = B*A(T )B*' (3.3.23)
o m o m
A theorem proving the equivalence of the two optimality criteria
is now presented.
Theorem 3.5. Consider the model In iT = 1 x+B 62x2+ +kxk
Suppose we are interested in testing, for some fixed i,
H : 3.=0 versus H : 30 .
o 1 a i
Then the design which minimizes var(Bi) is identical to the design
which maximizes the noncentrality parameter from the likelihood ratio
test. (Note that it does not matter whether or not it is assumed that
some of the other l.'s are 0.)
3
Proof. Orthogonal polynomials can be derived for any spacing of the
independent variable. Let these contrasts be the rows of B. The lxt
matrix B is the orthogonal polynomial corresponding to the parameter
n=1
B. being tested.
From the method used to find orthogonal polynomials, it follows
that
1 c,n=1 (TI
where c is a constant. So minimizing var(B.) is equivalent to minimiz
ing var(B y(p)). But from Theorem 3.4,
n=1
1
var(B n(p)) = B (B*'(C(T)) B*)B'
n=l m  n=
1
0
= (1,0, ... ,0) (C(7) 1
S ((C(7)) ) l
2
From the definition of Z E is the leading principal matrix of
order 1 of C 1
order 1 of C i.e.
o
Z = ((C())1)1
o  11
In this case, 6 is a scalar constant, and so maximizing 12 is equiva
lent to minimizing Z = var(B). t
o 1
Example 3.1. This example illustrates the equivalence of the two
methods presented in Theorem 3.5. It is assumed that the true model
is a quadratic model, In ix = l x + 5 x Also, the design is re
stricted to have the three design points x=l,0,1.
Table 3.1 indicates that the design which minimizes the variance
of O2 is {n1,0 = nO,1 = N/2}.
The design problem is now approached using the method presented by
ElHelbawy and Bradley. The null hypothesis is
Ho: 2 = 0 unknown.
o 2 1
Using orthogonal polynomials, this hypothesis is equivalent to
H (1 2 1)
Ho. y(T) = 0 ,
o /6
where y_' (r) = (In i1, In O In 1) .
Note that ElHelbawy and Bradley's constraint, that the In ir.'s
1
sum to zero, poses no problem. Instead of constraining T0 to be equal
to 1, an intercept parameter, B0, could be introduced to the model.
The model would then become
2
In = 0 Sx + B2x
with the constraint
SIn rT = 0
x x
x
It can be shown in this particular case that 80 3 .
0 32
For a definition of the following matrices and notation, see
(3.3.18)(3.3.23). For the example under discussion,
B
n=1
= (1 2 1)//6 ,
2(1 2 1)//6
B* =
m (1 0 1)//2
12 + 13
A_ ) = T 1
13
where 7 '=(1,1,1), i.e. B =B =0. Note that T can be any
o 1 2 o
satisfies the null hypothesis and the constraint.
The matrix C is
o
vector which
C = B*A(TT )B*'
o M  im
1 2 12 23 Vi2 12 23
S112 23 23 1223
implying
2 (112 + 4 +13 + 23
C
S12 H23)
712
12 23)
/12 / 
3 12 + 23
2 12 23
1
Then since Z is the upper lefthand element of C
o o
S 12 + 4113 + "23
o 21 C
W12
+12 + H23
123
13
23
13 + 23_
implying
1 21 Co
o0 12 + 4 +13 + 23
3 V12V13 + V12 23 + 113 23
S V(3.3.24)
8 012 + 4113 + 23 (3 24)
The design which maximizes the chisquare noncentrality parameter from
l
the likelihood ratio test is the design which maximizes E
o
1
By the definition of the U ., 3 = 1 12 13 So can be
ij 23 12 13 o
rewritten as
1 12 13 + (12 13)( 12 13
o 12 + 4 +13 + (1 V2 13)
2
12(1 V12 U13) + (V13 132)
(3.3.25)
31 + 1
13
Suppose that P13 is temporarily held constant. It is then claimed that
131
choosing u12 = (1 13)/2 maximizes o for the particular V13"
From (3.3.25), the expression 12((1 U 3) 1 2) must be maximized.
Because of the constraint 0 S 12 5 1 ,13' it is clear that
12((1 U13) 12) is maximized when U12 (1 13)/2, which
proves the claim. The constraint among the U..'s in addition to the
above claim implies that the optimal design must be a design such that
1 (13
12 2 V23 (3.3.26)
I
The scalar ~ is now rewritten as a function of v112 alone. From
o
(3.3.24) and (3.3.26),
1 3 212(1 212) + 12
o 8 2112 + 4(1 212)
2
3 212 3112
8 4 612
3 1 12
= 3112
16
Clearly o1 is maximized by choosing 112 as large as possible. By
(3.3.26), this would make 12 23=.5, and 13=0. So this optimal
design is in fact the same as the design which minimizes var(2).
An attempt was made to find the optimal design for the test of
hypothesis
H : B = = 0 versus H : not H
o 1 2 a o
using ElHelbawy and Bradley's methodology. When simultaneously test
ing more than one parameter, or equivalently more than one treatment
contrast, a constant value, 6., must be chosen for each parameter. The
theory behind the optimal designs includes an infinite sequence of
values of T satisfying H which are converging to a vector m7 satisfy
Sa o
ing H When the treatment contrasts are for factorial treatments,
o
which is the case they discussed extensively, the relative sizes of the
6. are said to be equivalent to the relative importance placed on each
1
contrast being tested in H Hence, ElHelbawy and Bradley usually
chose 6 =6 = = 6 .
1 2 n
In the present situation, choosing 61 and 62 restricts the alter
native parameter values in the infinite sequence mentioned in the pre
ceding paragraph to be a proper subset of the alternative parameters
satisfying Ha. For example, 61=62 implies that 68= 2, and hence the
2
alternative models are In T1 = Bx + Sx with B as N*. Choosing
x
different 6. can result in a different design because the maximization
1
of the noncentrality parameter depends on the relative sizes of the
6.. Additionally, the optimal designs actually found after choosing
the relative sizes of 61 and 62 were found to be unreasonable (e.g. for
61=62, the optimal design is a comparison of only the pair (1,0)).
Chapter 4 presents other optimality criteria for fitting a quadratic
model.
3.4 Cubic Model
The cubic model is
2 3
In I. = x. + Bx.2 + x 3 i=l,...,t. (3.4.1)
The asymptotic variancecovariance matrix of the maximum likelihood
1
estimators, B1, ,' and 63 is ( ab) where lab is given by (2.3.10).
The inverse of the variancecovariance matrix is partitioned as
11 12 13
(A b) = 12 22 23
13 A23 33
Then by a result found in Graybill (1969, page 165), the variance of
83 is
r 1
var( = ab 33
= 33 (A13 23> ( 12 ( 1j
12 22 23
SF 13 132212 23 23 1123 12 13
33 11 A22 122 (3.4.2)
To simplify (3.4.2), only the case S =0 is considered in the
search for designs which minimize var(3 ). As pointed out in Section
3.1, this gives locally optimal designs. The following theorem shows
that optimal designs only need to be found for linear and quadratic
parameters which are both positive.
Theorem 3.6. Suppose that 3=0, and further that the optimal design
for given 3 and 2 is comparisons of the pairs (x.,x.) a total of
n times, for all pairs. Then
x. ,x.
1
(i) The optimal design for (81) and (52) is the same as the
optimal design for 1 and 82'
(ii) The optimal design for (81) and 2 is comparisons of the
pairs (x.,x.) a total of n times, for all pairs,
1 3 x.,x.
(iii) The optimal design for 81 and (2 ) is the same as in (ii).
Proof. From (3.3.5),
. 2 (xij) =
2 2 2 2
exp(8 (x.+x.) 2 (x+x. )) exp(2 (x.+x.) + 23 (x. i+x ))
1 1 3 2 l 1 j 2 i
(exp(1x i82x ) + exp(81x 82x )) exp (2B1(x +x ) + 282 (x i+x 2))
2 2
exp(B (x. + x.) + 2(x + x ))
(exp(61xi + B2xi2) + exp( 1xj + 2x 2))2
= 1 (x ,x.) (3.4.3)
2
Notice that since B =0, Tr i./(r. + ir.) from the expression for a
3 13 1 3 ab
given in (2.3.10) is equal to 812 (x,x ). Hence by (2.3.10) and
83,63 i j
(3.4.3), it is clear that Xab is invariant under changing the sign of
both 81 and 2, for all a and b. Therefore the variance of 3 is also
invariant under changing the sign of both 81 and 2. This implies that
the design which minimizes var(3 ) for 81 and 82 is also the design
which minimizes var(3 ) for ( 1) and (8 ), thereby proving (i).
To prove (ii), note that from (3.3.5),
2 2
exp(8 (x.+x.) + 3 (x. +x. ))
) = (xi,x.)' =
51 1 j 2 2 2
S2 (exp(1Xi +B2xi ) + exp(81xj+62x ))
exp(B (xix.) + 8 (x. +x. ))
(exp(B1(xi)+B2xi ) + exp(B (x )+B2x2))2
S (x.,xi'x) (3.4.4)
Again, by (2.3.10) and (3.4.4), changing 81 to (81) and the pairs
(x.,x.) to (x.,x.), for all pairs in the design, will not change 1 ,
122' 33' nor 113. It changes only the sign of A12 and 123. But
notice that in (3.4.2) all terms involving A12 and A23 are of the form
2 2
112 23, 122, or 23 Hence the negative sign cancels, leaving
var(3 ) invariant under the change described above. This proves (ii).
To prove (iii), simply apply the results of (i) and (ii).
Theorem 3.5 proved the equivalence of minimizing var(83) and
ElHelbawy and Bradley's criterion. A Fortran computer program was
used to find the optimal design using ElHelbawy and Bradley's metho
dology for 8 = 2=8 =0. The levels allowed were x=l,.5,0,.5,1. The
grid search included all ten pairs formed by these five levels, assum
ing a symmetric design. The optimal design found was {n =
n = .045N, n.5 = .621N, n_, = .298N, all other n.i = 0}.
.5,1 .5,.5 1,1 13
For this design, it can be shown that var(3 )=16/N.
It was decided that for unequally spaced levels, employing ElHel
bawy and Bradley's methodology would be substantially more laborious
than minimizing var(83). For this reason, the designs subsequently
discussed were found by minimizing the expression found in (3.4.2).
Optimal designs were found by a grid search via a Fortran computer
program for 83=0, 81 2=0(.2)1. The comparisons allowed were (l,x ),
(x,x2 ), (x 2,1), and (1,1), where the optimal value of xl and x2 were
also found by the computer. The motivation for the choice of these
four pairs is the result discussed in the two preceding paragraphs for
the case of all three parameters equal to zero, which indicated that
only the pairs (1,.5), (.5,.5), (.5,1),.and (1,1) need to be com
pared.
The grid procedure is now briefly described. The initial values
for xl and x2 were x = .7(.1).3 and x =.3(.1).7. For each of the 25
combinations of xl and x2, the optimal comparison proportions were
found by another grid. The latter grid began by considering different
comparison proportions, changing by increments of 0.1. This grid
became finer and finer until the proportions were accurate to .0004.
From this the best combination of xl and x2 out of the initial 25
choices was found, and the entire procedure was repeated for a finer
grid about the previous best choice of xl and x2. The entire grid
search stopped when xl and x2 were accurate to .0037.
Notice that it is possible that the designs found are not abso
lutely optimal, but rather they may only be local minimums for (3.4.2).
This is because only four pairs were considered when it is possible
that a five or six pair design, for example, might offer a slight im
provement. Also, the grid search restricted xl and x2 rather than
allowing them to take on values in the entire interval [1,1. The
latter point is not believed to be a problem due to the resulting
design for 1 =2 =B3=0 originally found using ElHelbawy and Bradley's
methodology, and the fact that the optimal designs currently being dis
cussed all have values of x1 and x2 within 0.02 of 0.5.
The optimal designs and the value of var( 3) for N=l are presented
in Table 3.2. The first two columns of the table give the value of
81 and 2. Columns 5 through 8 give the comparison proportions for the
pairs (l,x ), (x ,x ), (x2,1), and (1,1), respectively, where xl and
x2 are given in columns 3 and 4. The last column gives Nvar(3 ). The
number of times a pair is compared is found by multiplying the appro
priate comparison proportion by N. Recall that by Theorem 3.6, the
optimal design for both parameters negative is the same as the optimal
design for !31 and B2 1. If exactly one parameter is negative, then
the optimal design is found by first finding the optimal design for
j8 1 and 1821, and then taking ix to be the proportion of the total
number of comparisons that the pair (l,x ) is compared, xl,x2 to be
the proportion for (xl,x ), and so on. To calculate var(3 ) for N>1,
the tabled value of Nvar(B3) is divided by N.
Notice from Table 3.2 that for 8: .6, the optimal design is very
close to the design {n 1,.5 = n.5,.5 = n.5,1 = .3333N}, regardless of
i,.5 .5,.5 .5,1
the value of 2. This implies that if it is a priori believed that 8
is relatively large, then this design will result in a variance of 83
very close to the tabled minimum. As has been previously mentioned, it
is also of interest that the optimal values of xl and x2 are always
very close to .5 and .5, respectively. Finally, notice that the
design for =B 2=0 found in Table 3.2 is not the same as the one found
using ElHelbawy and Bradley's methodology, although the value of
var( 3) appears to be the same for both, and hence they are equally
optimal. However, it was not established that the two designs are
mathematically equivalent. Since the two criteria are equivalent,
there may be some problem with local minimums of var(63). It appears
that the surface of the expression for var( 3) is relatively flat, and
so it is possible that a finer initial grid is necessary to guarantee
that the minimum found is in fact the absolute minimum of var(83).
Table 3.2. Designs which minimize var(B3)
1 62 xl x2 1,xl Xl,X2 X2,1 1,1 Nvar(83)
0 0 .50 .50 .133 .533 .133 .200 16.0000
.2 0 .51 .50 .338 .331 .331 0 16.0836
.4 0 .50 .50 .332 .337 .332 0 16.3225
.6 0 .50 .50 .330 .341 .330 0 16.7322
.8 0 .50 .50 .327 .346 .327 0 17.3190
1 0 .50 .50 .323 .353 .323 0 18.0959
0 .2 .50 .50 0 .666 0 .333 16.0000
.2 .2 .50 .51 .333 .329 .338 0 16.1447
.4 .2 .50 .50 .330 .336 .335 0 16.3835
.6 .2 .50 .50 .329 .338 .333 0 16.7943
.8 .2 .50 .50 .326 .346 .328 0 17.3819
1 .2 .50 .50 .321 .353 .327 0 18.1604
0 .4 .50 .50 0 .666 0 .333 16.0003
.2 .4 .50 .50 .105 .561 .103 .231 16.3209
.4 .4 .50 .50 .330 .333 .337 0 16.5664
.6 .4 .50 .50 .327 .338 .335 0 16.9799
.8 .4 .51 .50 .324 .342 .334 0 17.5716
1 .4 .51 .50 .319 .349 .333 0 18.3554
0 .6 .50 .50 0 .666 0 .333 16.0005
.2 .6 .50 .50 .004 .661 .001 .334 16.3207
.4 .6 .51 .50 .333 .328 .340 0 16.8748
.6 .6 .51 .50 .327 .333 .340 0 17.2927
.8 .6 .51 .50 .323 .337 .340 0 17.8907
1 .6 .51 .49 .317 .346 .337 0 18.6825
0 .8 .50 .50 0 .666 0 .333 16.0010
.2 .8 .50 .50 .004 .661 0 .334 16.3225
.4 .8 .51 .47 .039 .633 .078 .320 17.2619
.6 .8 .51 .50 .328 .327 .344 0 17.7368
.8 .8 .51 .50 .323 .331 .346 0 18.3436
1 .8 .52 .49 .319 .338 .343 0 19.1472
0 1 .50 .50 0 .666 0 .333 16.0015
.2 1 .50 .50 0 .664 0 .335 16.3233
.4 1 .51 .47 .034 .633 0 .333 17.2685
.6 1 .52 .50 .331 .318 .351 0 18.1804
.8 1 .52 .50 .324 .325 .351 0 18.9372
1 1 .52 .49 .318 .331 .351 0 19.7558
CHAPTER 4
DESIGNS FOR FITTING A QUADRATIC MODEL
4.1 Introduction
The present chapter contains optimal designs for fitting a quad
ratic model under three different conditions, each for three different
optimality criteria. The Doptimality criterion is considered in Sec
tion 4.2. This criterion minimizes the determinant of the asymptotic
variancecovariance matrix. In the present chapter, this matrix is a
2x2 matrix since there are two maximum likelihood estimators, 81 and B2"
From Section 2.3, these estimators are asymptotically normal. There
fore, by a result in Box and Lucas (1959), the determinant of the
variancecovariance matrix is proportional to the volume contained with
in any particular ellipsoidal probability contour for $=(81, 2) about
S, where $ is the true parameter value. The Doptimal design will
o o
then minimize the volume of any such probability contour.
Section 4.3 contains designs which minimize the averagevariance
of In T = B x + 2 x The averagevariance is simply the variance of
ln 't integrated over the experimental region. In this case, the exper
imental region is xELl,l]. The property that these designs have is
obvious. That is to say, a good estimate of all of the preference prob
abilities is desirable, and the design which minimizes the integral over
x of var(ln ir ) will give estimates of 7 which are good "on the
x x
average".
The criterion discussed in Section 4.4 is similar to the average
variance criterion. The designs found in this section minimize the
maximum of var(ln TT ), where the maximum is taken over the interval
x
xEil,l]. These designs are referred to as minimax designs. The dis
advantage of this criterion is similar to the disadvantage often attrib
utable to the minimax rules in the decision theory framework. That is,
minimax designs may be slightly "pessimistic" in the sense that they
minimize the largest value that the variance can be. This minimization
is good, but at the same time the variance of In T may be relatively
x
large for most values of x in the interval [1,1], hence causing the
minimax design to be unfavorable.
Finally, Section 4.5 presents some design recommendations and
conclusions. A number of designs are compared to determine which
design does best for a wide range of parameter values, using primarily
the Doptimality criterion. Additionally, an examination is performed
on how well these designs protect against the bias present when the
true model is cubic.
Recall that in general, the problem of finding optimal designs is
complex. It involves the determination of the number of levels to be
included in the experimental design, and also which particular levels
are to be chosen. Additionally, the number of times each pair is to
be compared must be determined. In Chapter 3 it was proven that the
design which minimizes var( 2) must be of the form {n_x = nx,1 = N/2,
thereby implying that the optimal number of levels is four, and only
two pairs are to be compared an equal number of times. All that re
mained to be resolved was the determination of x for different values
of a1 In the present situation, a similar simplification of the
general problem does not appear to be possible, except by imposing some
restrictions, such as letting $2=0. For this reason, three conditions
are considered under which optimal designs are found. This essentially
results in designs which are optimal for a proper subclass of the class
of all designs. However, as will become apparent in the remainder of
the chapter, there is not much difference in the degree of optimality
among the three conditions presently considered. Hence it is conjec
tured that the degree of optimality of the designs found in the final
and most general subsection of Sections 4.24.4 is very close to if not
equal to the best that can possibly be achieved. The three conditions
are now briefly introduced.
The first condition considered in each of Sections 4.24.4 is that
the levels are restricted to be x=l,0,1. This greatly simplifies the
general problem to one with only two unknowns, namely the comparison
proportions, P1,0 and p0,1. The value of J1,1 is found by the rela
tionship i, + 0 + _ = i.
tionship 1,0 0,1 1,1
The second condition discussed in each of the following three sec
tions is to let $2=0 in the enumeration of the variances and covariance.
In this situation the optimal designs are proved to be symmetric, and
hence the design problem is simplified to a choice from the class of
symmetric designs. As was pointed out in Section 2.3, previous papers
on the design of paired comparison experiments have always relied upon
setting all parameters equal to zero. Presently, the linear parameter
is allowed to vary. The three criteria are a continuous function of
the parameters because the variancecovariance matrix is continuous in
the parameters. Therefore, the designs found for the second condition
are locally optimal in the sense that they are close to optimal for
values of 82 which are near zero.
The third condition in each of the following three sections is a
consideration of designs whose comparisons are of the form (l,x ),
(x2,1), and (1,1). The optimal choice of the levels xl and x2 are
found in addition to the three comparison proportions. This is a gen
eralization of the first condition since setting x =x2=0 results in the
first condition.
Figure 4.1 presents a synopsis of the procedures used in Sections
4.24.4 to find the optimal designs for each of the three optimality
criteria and the three conditions introduced in the present section.
The first column presents a brief description of the three conditions,
and hence the three rows in the figure correspond to the three condi
tions. Columns 24 contain a brief description of the methodology used
for the three criteria.
4.2 Doptimal Designs
The Doptimal design is defined to be the design which minimizes
the determinant of the asymptotic variancecovariance matrix. Letting
Z denote the variancecovariance matrix, this is equivalent to maximiz
ing IZl. In the case of a quadratic model, the Doptimal design is
the design which maximizes D, where
D = = 11 22 122 (4.2.1)
and where X 11 122' and X12 are given in (3.3.2)(3.3.4). Because the
optimal design depends on the true parameter values, designs are found
0
> C0
> 0
.4l o j
,t
o o > o
X4 X .O O
I * m X C) .
O*H
a )01 '0 a .
4J , UT 4 *
04 r 50 
I ifU 0 0
C 04
'.0 0.
0 Cl C
I 0
"' ri 11
CC 0
4 4 C) 4 CC
1U m > U
,, >. X H 4
Eo 0 i
 u 'o
; U *d 1U X H 
. .i ro c n i
S C) U C) CB N U CC
SH 0 1II 0 > 0 C
SO 4
S0 00 0
Uo ,) a) > o 1 z
Q f q o 0 4J
Ci 0 CC 0
0 0 004
I H *4 CC
rC U 0 U
I NC O >S 00
i 'Q *H 0 *'C C 'i U
I E X U . U
14 O R ir U O CC 0 0 ]
0 CC 0 CC N U V 4 CO
I 0 CC  C: 0 C
i ti * *H *l CO ( C0
4 4 34 40 U
I 0 O 10 + 4J
0 H rN N n Q, I
R 4 ul g g
S 0 0 rc4 r3 H
0 > 0 U ai 4J C 0
E I U c
0 0  CC e CC H 0
*0 0 C U 0 0 U
4 0 4 C 0 
o H a U .C N 0 X N
4J C C U o HC 0Q 0 U M
0 :a 04 Io C ll H C o x 'N
0 CC EC 4C rI I H f X n
'N__ r ,______< '4m~ C C ~ H ~
for various choices of the parameters. The three conditions intro
duced in Section 4.1 are now discussed, beginning with Condition 1.
Condition 1
The levels are restricted to be x=1,0,1. Because of this restric
tion, and by substituting i.. for n.. in (3.3.2)(3.3.4), the expression
for All' A12' and 22 can be simplified to
11 = 1,0 1,0 + 0,1O Ol + 41,1 1,1
12 0,1 0,1 1,0 1,0 (4.2.2)
22 = P1,0 1.0 + 0,10,1
The substitution of Pij for n.. is made throughout the present chapter.
This essentially sets N=l since U.. = lim(n../N).
13 N4 ij
From (4.2.1) and (4.2.2), the expression for D is
D = 4 1,0 0,11,0 0,1 +
4(1u1, 0 0,l1 1,0_1,0 + 0,1 0,1'1,1 (4.2.3)
where .. is defined in (3.3.5). The partial derivatives of D with
respect to i ,0 and 10, are, respectively,
1,0 0,1
3 4 0 40 ( 0 + 0 ) +
U, 0,1 1,00, 1,1 1,01,0 + 0, 0,1 +
1,0
(4.2.4)
4(1 U_1,0 , ,O 1,1 1,0 '
aD
 =, 4U1,00 ',0o0,1 4 1,1( 1',0_1,0 + '0,O10'1) +
4(1 p1,0 0, 1,1 '
Setting the derivatives in (4.2.4) equal to zero results in
= 1,0 1,1 '
(4.2.5)
iO, (1,0 0,1 + ,0 ,1 0,1 1,1
0+ O,1 20,l ,
In matrix notation, (4.2.5) is equivalent to
1I,0 1,0 1,1
0,1 0,i i,i
= 0,1,1
(4.2.6)
implying
ul 0,l 1,1
(4.2.7)
where
S21,'0 1,1
A =
1,0 0,1 1,0 1,1 0, 1,1
tl,0 O,l% 0,1 1,1 1,
2%,itid)
For the solution of (4.2.7) to be a maximum for the expression in
(4.2.3), it needs to be shown that the matrix of second order partial
(4.2.4)
11 ,0(2._1,0 _I,' +
derivatives is negative definite (see Kaplan, page 128). By (4.2.4),
this matrix is
1,02 D/1,O 0,1
= = 4 A (4.2.8)
2 D/1,0 Ol 2D/O, 12
where, from (3.3.5),
1 1 2
Se= 1/( e + e
i + B + 82 2
1 2 1 2 2
= e /( e + 1 (4.2.9)
6i u S iS
S= e + 2/(1 + e1 2 )2
0,
It is known from elementary matrix algebra that the determinant of
a square matrix is equal to the product of its eigenvalues. So in this
case, IXI=ele2, where el and e2 are the eigenvalues of X. If jX[>0,
then either both eigenvalues are positive or both are negative, or
equivalently, X is either positive definite or negative definite. But
it is clear from (4.2.8) that
( 1 0 ) X = 8_1,1 10 < 0, (4.2.10)
where 1,1 and (,0 are both positive and are given in (4.2.9). The
1,1 1,0
inequality in (4.2.10) implies that X is negative definite if xJ>0.
However, it will be shown by (4.2.12) that the determinant is not nec
essarily positive for all values of 81 and $2.
To simplify the notation, we write
/ 2a
(a + c b)
(a + c b)
2c
where
a = = e / (e + e )(e + 1)
26 1 + 2 a3 + )] 2
b = 01,010,1 = e22/ (e 1 + 1)(1 + e 1 2 2
81 +82 81 1 + 2] 2
c = i,0, = e / (e + e )(1 + e )
Then the determinant of X can be written as
IXI = 4ac (a + c b)2
2ac + 2ab2 2 c
= 2ac + 2ab + 2bc a b c
(4.2.11)
After resubstituting for a, b, and c, and after some algebra in
(4.2.11), an expression for IXI is found by
482 B + 382 8 + 8 8 + 8
e + 4e + 1 + 4e + 3e
1 3e + 382 381 + 2 21 + 282 381 + 382
S= 2 +3e + e + 3e + e
281 + 28 28 381 + 382 38 + 82
+ 3e +3e + e + e
 21 8 281 + 482 48 + 282
e + 2e + e + e
481 + 28 281 + 482 281 81 + 282
+ e + e + e +2e
(4.2.12)
I,
I
where
282 [(B, B1 8 + 2 B1 + 82, ) 2
S= 4e / (e + e ) (e + 1)(1 + e 1 (4.2.13)
Since >>0 for all $1 and 82, IXI>0 if and only if the righthand side
of (4.2.12) is positive. However, it appears that for large enough
values of B1 and B2, some of the terms with a negative sign will over
whelm the sum of all of the terms with a positive sign, and hence the
determinant could be negative. For example, 1 and B2 can be made
large enough such that the term exp(481 + 282) in (4.2.12) is larger
than the sum of all the terms with positive signs, thereby making
the expression in (4.2.12) negative. It will be seen shortly that this
is in fact the case.
An APL computer program was utilized to solve equation (4.2.7) for
various values of B1 and B2. The determinant, Ix was also calculated
and found to be positive for all values of 81 and 82 when the solution
fell inside the region of allowable values for the comparison propor
tions, p ,0' ,1 and i 1,1 That is, J.. Z 0, for all i and j, and
1,0 0,1 + 1 ,1 = 1. Figure 4.2 depicts the values that 1I,0 and
Iu Ui +,v 1,0
j0,1 can have. They must lie inside or on the solid triangle. The
0,1
dotted triangle in the figure depicts the region that the Doptimal
designs are exclusively located.
When the solution to (4.2.7) falls outside of the solid triangle
in Figure 4.2, the optimal design must lie somewhere on the solid tri
angle. This can be proven by contradiction since D is a quadratic func
tion of two unknowns, and hence it can only have one stationary point.
For a point to be on the triangle, one of the following must be true:
11,0 = 0, o0,1 or 1,0 0,1= 1.
10,1
1
.5 
0
0.5 1
Figure 4.2. Depiction of possible designs for Condition 1
If i,0 = 0, then from (4.2.3),
D = 40,1l _,10, 1 1 10,12
and 3D/O,1 = 0 implies lOi = 0.5. This clearly maximizes D
because of the negative coefficient on the quadratic term. Therefore
if the optimal design lies on the line segment i1,0 = 0, then it must
be {VI0, = 1 = 0.5}. If v0,i = 0, then
2
D = 4_1,101,0 , l,0
and it can similarly be shown that the optimal design is the midpoint
of the line segment, namely {0, = 0, I,0 = 1, = 0.5}. Finally,
0 / i ~1,0 1= /
if 11,0 +0, = 1, then
0 = 1 0 
D = 4~1,010, 1_,0 0,l
= 4 1,0 0,111,0(1 1,0'
Again, it can similarly be shown that D is maximized when _,
p0, = 0.5. Therefore, if the Doptimal design is on the solid tri
angle, it must be one of these three designs.
When the solution to (4.2.7) was outside of the solid triangle in
Figure 4.2, D was calculated for the three designs presented in the
preceding paragraph, and the one with the largest value of D was taken
as Doptimal. Otherwise the Doptimal design was found directly from
(4.2.7). The determinant of X was also calculated. It was occasion
ally negative as conjectured earlier, but only when the solution was
outside of the region, at which time the sign of IXJ is irrelevant.
The "restricted" Doptimal designs are presented in Table 4.1 for
all combinations of the parameter values B =0(.1)1,2,3 and 2=0(.1)1,2,3.
Recall that 1,0' 0,1' and p1,1 represent the proportion of the
total available comparisons to be allocated to the pairs (1,0), (0,1),
and (1,1), respectively. The number of times one of these three pairs
should be compared is then found by multiplying the corresponding Pij
by N. The values of D are also given for the purpose of comparing the
designs with the designs found in the next two subsections. Addition
ally, the relative efficiencies are given for the "restricted" Dopti
mal designs presented in Table 4.1 relative to the Doptimal designs
found for Condition 3 in Table 4.3. The relative efficiency is the
square root of the ratio of the corresponding values of D, since D is a
2
multiple of N
Recall from Section 3.1 that a value of 1 for the parameter 1 is
"large". This is basically true for any parameter since x is in the
interval [1,1]. For this reason, it is assumed that rarely will S1 or
82 be larger than 1, and hence parameter values of less than or equal to
1 are of most interest. Parameter values of 2 and 3 are included pri
marily to observe the change in the designs for large parameter values.
Table 4.1 only gives positive parameter values. The following
theorem shows that the table can also be used to find optimal designs
for negative values of 81 or 2. This theorem is similar to Theorem
3.6.
Theorem 4.1. Suppose that the Doptimal design for given 81 and 82
is a comparison of x. with x. a total of n times, for all pairs in
1 ] x.,x.
the design. Then
(i) The Doptimal design for (81) and (82) is the same as for
81 and 682
(ii) The Doptimal design for (81) and 2 is a comparison of
(x.) with (x.) a total of n times, for all pairs in
1 ] x.,x.
1 3
the design,
(iii) The Doptimal design for 81 and (82) is the same as in (ii).
Proof. From (3.4.3),
 1, (xix.) = i, 2(x.,x.) ,
where ( 12(x.,x.) is given in (3.3.5). From (3.3.2)(3.3.4) and the
above result, it is clear that A ll, 22' and A12 are invariant under
changing the sign of both 81 and 2. Therefore, since D = A 1 22 12
D is invariant under changing the sign of both B1 and B2, and so (i) is
proved.
To prove (ii), note that by (3.4.4),
1,l 82 (xi,xj) = l,0 ( ,x )
Again, from (3.3.2)(3.3.4) and the above, changing 81 to (61) and the
pairs (x.,x.) to (x.,x.) will not change XA nor A22 It changes
only the sign of 12' and hence does not change A 12. Therefore (ii)
is proved.
To prove (iii), simply apply the results of (i) and (ii).
So, as in Section 3.3, the optimal design for both parameters nega
tive is the same as the design for I811 and IB21. When exactly one
parameter is negative, the column headed 1,0 becomes 0,' and vice
versa.
Notice from Table 4.1 that for small parameter values, the optimal
design is inside the solid triangle in Figure 4.2, signifying all three
pairs are to be compared. The optimal design approaches the boundary
as the parameters increase. Also, in all cases the optimal design was
inside or on the dotted triangle in Figure 4.2, signifying that no pair
should be compared more than half of the time.
Notice also from Table 4.1 that most of the relative efficiencies
are approximately 0.95 when 81 and 82 are both less than or equal to 1.
This indicates that the increase in efficiency for the generalization
considered under Condition 3 is slight. If an experimenter warrants
that a minimum number of levels should be used, then the advantage of
the designs found presently under Condition 1 is that they have only
three levels as opposed to four levels for the other two conditions.
The fact that they are also about 95% efficient allows the present
designs to be useful in this situation. However, if there is no signi
ficant advantage to only having three levels, then perhaps the designs
found under Conditions 2 and 3 are more useful since they are slightly
more efficient.
Condition 2
In this subsection, locally Doptimal designs are found analytical
ly for 32=0. The designs are local as described in Section 4.1. That
is, by the continuity of the criterion, the optimal designs are close
to optimal for 32 close to zero. It turns out that they are also local
in the sense that the designs found under Condition 3 have slightly
larger values of D for some values of 1. Since the criterion is a
maximization of D, the designs under Condition 3 are therefore occa
sionally slightly better.
By the same argument that was used in the proof of Theorem 3.1,
the Doptimal design must be a symmetric design when 2=0, which implies
12=0. So the search for a Doptimal design is reduced to a search
for a symmetric design which maximizes D = 1 22. Also, the proof of
Theorem 3.2 can be presently implemented to show that every pair in the
Doptimal design must include either x=l or x=l.
From (3.3.7), .ij is written as
4i. = ( 4cosh2(1 (x. x.)/2) )
13 i J
where
1 u u
cosh(u) = (e + e)
2
1 u u
sinh(u) = (e e) .
2
The maximization of D is equivalent to the maximization of the follow
ing redefined D:
D = 4Xi 22 = f(x)g(x)
N/2 2
N/2 (1 x.)
f(x) =
i= cosh2(8(xi 1)/2)
N/2 2 + .)2
g (x ) = .
cosh (8 (x. 1)/2)
i=l 1
x' = (1,X2.x, ... XN/2)
Two useful, easily verified facts are
3cosh(6 (x. 1)/2) 1
x 2 B sinh(8l(x 
x. 2 1
.1
(4.2.14)
(4.2.15)
(4.2.16)
1)/2) ,
(4.2.17)
Dsinh(6(xi 1)/2) 1
= 1  cosh(3S x 1)/2)
ix. 2
where
The partial derivative of D with respect to x. is then
1
DD f(x) 9g(x)
x g(x) + f(x) (4.2.18)
dx. ax. x.
1 1 1
where
af(x) 2(1 x.) (1 x)2 sinh(8 (xi 1)/2)
1 1 1 11
axi cosh2 (6(x 1)/2) cosh 3 (x 1)/2)
(4.2.19)
Dg(x) 4x.3 4x. (1 x.2) 2 sinh(3C(x. 1)/2)
1 1 1 1 1
xi cosh2(8 (x. 1)/2) cosh3(1 (x. 1)/2)
(4.2.20)
i=l,...,N/2.
Because of the complexity of handling these N/2 equations of N/2
unknowns, and because it appeared to be reasonable, the expression in
(4.2.18) was evaluated at the N/2dimensional point x.=x, i=1,...,N/2,
set equal to zero and solved. This produced a design of two equally
weighted comparisons, namely (l,x) and (x,l), for some x. The mechan
ics of the procedure follows.
The partial derivative of D with respect to x., evaluated at x.=x,
1 1
for all i, is
S 2(l __x) l (1 x)2sinh (1 2)2
9D L cosh cosh3 cosh
+ 2(l x2 (1 x) )
2 2 2
cosh cosh cosh2
(4.2.21)
where cosh and sinh are shorthand for cosh(B1(x 1)/2) and
sinh(1B(x 1)/2), respectively. By setting the expression in (4.2.21)
equal to zero, multiplying it by cosh5/(N(l x)3(1 + x)), and combin
ing terms, we have
(1 + 3x)cosh + B1(1 x2)sinh = 0 (4.2.22)
Because of the transcendental nature of this equation, x can not be
expressed as a closed form function of 1. Consequently, an APL pro
gram was written to find the solution to (4.2.22) for various choices
of 1. These designs are presented in Table 4.2 and are discussed at
the end of the present subsection.
In order for the solution to (4.2.22) to be a local maximum for D,
it must be shown that
cosa
(cosal, ... ,cos /2) axx. 0
Sx.=x c /
cosaN/2
for any angles ai, such that 0 < a. 2r, and
N/2
cos a = 1
i=l
(see Kaplan, page 128). This is equivalent to showing that the matrix
X is negative definite, where
( 2D
X = (4.2.23)
3x N/2xN/2
i = N/2 x N/2
From (4.2.18), the second order partial derivatives are
2 D 2f(x) 2f(x) Dg(x) 2 g(x)
= g(x) + 2 + f(x) i=,...,N/2,
2 2 1 (4.2.24)x.
S1 1 (4.2.24)
a2D
2x. x.
1 3
F[2f(x) f(x) 2g(x)] f(x) 2g(x) 2gx)
= x.x. xg( + xx. Sx. + x.) x
131 3 iL j 1 3
;f(x) 3g(x) 9f(x) 3g(x)
= 2x x + . x. x. ij, i,j=l,... N/2,
1 3 3 1
(4.2.25)
where, from (4.2.19) and (4.2.20),
2 2812
2 (x )2
22 i 1 +
cosh
4(lx )B sinh
cosh3
cosh
2 1 2 2 2
4(3x. 1) (1x )
1 2 c
cosh2
(lx ) 21 sinh2
cosh4
(4.2.26)
3
8(x. xi)S sin]
cosh3
3 1x 2 2 inh 2
(1x ) 8I sinh
cosh 4
cosh
I
i=l,...,N/2.
(4.2.27)
Evaluating the second order partial derivatives at the point x.=x,
for al i, (4.2.23) can be written as
for all i, (4.2.23) can be written as
a b
b a
x =
(4.2.28)
a b
b a
22f (x)
2
92g(x)
ax.2
1
2
2 gx)
ax.2
1
where
= 2D
Dx.
x .=x
= g (x) + 2 f (x) (2 (4.2.29)
2 xx x.2
3x 3xi
xa.=
Xi=X
and
2
Do
b = Dxx (any ifj)
x.=x
1
Df(x) Dg(x)
= 2 Dx. x. (4.2.30)
1 1i
The complete expressions for a and b can be found from (4.2.29) and
(4.2.30), in addition to (4.2.15), (4.2.16), (4.2.19), and (4.2.20),
by substituting x for x..
The matrix X is of a special form as indicated by (4.2.28). It
can be written as
X = D + yz' ,
where D=(a b) I N/2, =b, and y'=z'=l'=(1, ... ,1). From Graybill
(page 187, Theorem 8.5.3), the characteristic equation is
N/2 N
1
S d 2
(d + c7 y.z. ) (d ) = 0
i=l
N
N
d + a Zy.z. = (a b) + b. As discussed in the previous subsection,
1 1
1
the matrix X is negative definite if and only if all of the eigenvalues
are negative. An analytic proof that the two eigenvalues are negative
follows.
Consider first the eigenvalue (ab), where from (4.2.29) and
(4.2.30),
C2x) 2 g(x)
ab = g(x) + f(x)
3x.i x.
1
cosh2(2 (l1x)2 ) + sinhcosh(4(lx)1
+ sinh2 ( 2(1x) 2 2J 22
cosh4 cosh2
2 2 1 2 2
cosh (4(3x 1) (1x )B
2 1
(1x) 2 + sinhcosh(8x(lx 2) ) + sinh ( (1x2) 2)
coshcosh 4
(4.2.31)
I.t needs to be shown that (ab) is negative at the stationary point x,
which is the solution to (4.2.22). This relationship between x and B1
found in (4.2.22) is used in the above expression of (ab) by substi
tuting for sinh,
(1 + 3x)cosh
sinh = (4.2.32)
1l(1 x2)
Upon substituting (4.2.32) into (4.2.31), the eigenvalue (ab) at the
stationary point, x, is
(2 x) 2 4(1+3x) x)2] 1x) 2 +
2 1 (2(1+x) 2 cosh4
ab =
+ 2 1 2 2 3 21 4
[ 4(3x 1) 1x )1) 8x(l+3x) + I1(l+3x) .2sh 3 3
(4.2.33)
After factoring out (lx) /cosh and combining terms,
ab < 0
if and only if (4.2.34)
2 (x 1)(x2 2) ) + ( 22x2 20x 6 ) < 0
1 2
The first term in (4.2.34) has roots x=l and x=+v2, and is nonpositive
for XELl,l]. The roots of the second term are imaginary, and clearly
the function is negative at x=0. Hence it is negative for all x. The
result ab < 0 then follows from (4.2.34).
N
The second eigenvalue is (ab) + b. From (4.2.19), (4.2.20),
and (4.2.30),
b 2(1x) (2cosh (lx)B sinh) (4x(l+x)cosh (lx)(l+x)2 sinh)
cosh6 1 1
2(1x) 8x(l+x)cosh2 + 2(1x)(l+x)2 1coshsinh
cosh6 + 4x(lx2S coshsinh + (1x2) 2 2sinh2
2(1x) (x2 + 2x 1) (4.2.35)
cosh
the last step produced by again using the relationship in (4.2.32) and
combining terms. Then
N (1 x)
Sb =(2x 2) (4.2.36)
2 4
cosh
By factoring N(1x) /cosh4 out of the second eigenvalue, it is clear
2
from (4.2.34) and (4.2.36) that
(a b) + b < 0
2
if and only if
2 2( (x21)(x 2)) + (22x220x6) + (2x2) < 0
1 2
if and only if
12( ()x2) (x 2)) + (22x218x8) < 0 (4.2.37)
The first term in (4.2.37) is unchanged from (4.2.34). The roots of
the second term are still imaginary, and hence the eigenvalue is nega
tive. Therefore X is negative definite, and the solution to (4.2.22)
is in fact a local maximum for D.
The locally Doptimal designs for 81=0(.02)1,2(1)5 are presented
in Table 4.2. The value of the original D=llX 22 is also given for
comparison purposes. By comparing the present values of D to the cor
responding values of D associated with the designs in Table 4.1, it can
be seen that the present designs offer a slight improvement for 2=0.
Theorem 3.3 is applicable here since B2=0, and so the optimal design
for any particular value of 81 is the same as the optimal design for
( ). Recall that for a given 31, the optimal design is to make half
of the total comparisons between x and 1, where x is found in the table,
and the other half between (x) and (1). The optimal value of x is
graphed as a function of 1811 for 1811 1 in Figure 4.3.
Condition 3
A grid method was used to find Doptimal designs with no restric
tions on the values of the two parameters. With 2S 0, the argument for
a symmetric design is no longer applicable. As a generalization of the
first subsection, the three comparisons (l,x ), (x 2,1), and (1,1)
were allowed. A description of the grid method follows.
In general, if there are k independent unknowns, and the grid is
to have n values for each variable at each stage, then the grid method
is simply an evaluation of the function at each of the n points, and
the determination of which of these points maximizes (or minimizes) the
function. It then repeats the procedure, only with a finer grid cen
tered at the best point previously found. This procedure is repeated
until the optimal point has been found with sufficient accuracy.
The procedure used for the present condition was essentially a
grid within a grid. At each step, the combination of three values each
of x1 and x2 were taken. Initially these values were 0.5, 0, and 0.5.
For each of these 9 points, a grid method was used on the allocation of
the comparisons to the three pairs. This grid consisted of initial
increments of 0.1 in the allocation fractions. Notice that this is not
a full 113 grid since, for example, { , = ,l = U1 = 0.5} is
1X X2 L ,1
not a possible allocation for the three pairs because they must sum to
1. After the best of these possible allocations were found, a finer 52
grid was performed about the current best allocation. Note that it is
a 52 grid rather than a 53 grid, because when two of the allocation
proportions are known, the third is just the sum of the two proportions
subtracted from 1. This was then repeated until the best allocation
fractions were accurate to .0001. The best of the 9 combinations of
x and x2 was taken, and then a finer 32 grid about the current best
combination of xl and x2 was performed using the same procedure des
cribed in the present paragraph.
The entire procedure described in the preceding paragraph was
repeated until the values of x1 and x2 were also accurate to .0001.
Doptimal designs were found for combinations of 81=0(.1)1,2,3 and
82=0(.1)1,2,3, and are presented in Table 4.3. Once again, the value
of D for N=l is also given in the table. Notice from the table that
when nx,1=0, the value of x2 is immaterial, and therefore an asterisk
appears under the column for x2.
Theorem 4.1 is applicable, so designs for negative 81 or 82 can
also be found from the table. If both S1 and 82 are negative, then the
optimal design is identical to the one for positive B1 and 82. If,
however, exactly one of the parameters is negative, then the optimal
design is the same as the one for 16 I and 1B I, except that nx
1 2 P1t tx
changes to n and n x2 changes to nx2,. For example, if
l,x x ,l 2
81=0.6, 82=0.3, then the Doptimal design would be to run .462N compar
isons of the pair (1,.23), .468N of the pair (.25,1), and .070N of
the pair (1,1).
Table 4.3 indicates that once again no pair should be compared more
than half the time. Also, the optimal design is inside the region
Table 4.1. Doptimal designs with restriction x=1,0,l
1 82 1.,0 p0,l 1,1 D Efficiency
0 0 .333 .333 .333 .083333 .9613
.1 0 .334 .334 .332 .082506 .9613
.2 0 .337 .337 .327 .080098 .9613
.3 0 .341 .341 .318 .076314 .9612
.4 0 .348 .348 .305 .071460 .9608
.5 0 .357 .357 .287 .065895 .9599
.6 0 .369 .369 .263 .059986 .9583
.7 0 .384 .384 .232 .054068 .9555
.8 0 .405 .405 .190 .048426 .9536
.9 0 .433 .433 .135 .043283 .9542
1 0 .470 .470 .060 .038814 .9592
2 0 .500 .500 0 .011024 .9664
3 0 .500 .500 0 .002041 .7260
0 .1 .333 .333 .334 .083056 .9615
.1 .1 .336 .332 .332 .082233 .9617
.2 .1 .340 .333 .327 .079836 .9615
.3 .1 .345 .336 .319 .076070 .9613
.4 .1 .353 .342 .306 .071237 .9610
.5 .1 .362 .350 .288 .065694 .9602
.6 .1 .374 .362 .264 .059805 .9585
.7 .1 .389 .378 .233 .053905 .9558
.8 .1 .409 .399 .192 .048276 .9538
.9 .1 .435 .428 .137 .043141 .9543
1 .1 .470 .467 .063 .038676 .9591
2 .1 .500 .500 0 .011001 .9660
3 .1 .500 .500 0 .002039 .7257
0 .2 .332 .332 .336 .082233 .9619
.1 .2 .336 .330 .334 .081422 .9619
.2 .2 .342 .329 .329 .079060 .9619
.3 .2 .349 .331 .321 .075346 .9618
.4 .2 .357 .335 .308 .070575 .9615
.5 .2 .367 .343 .291 .065097 .9608
.6 .2 .378 .354 .268 .059269 .9594
.7 .2 .393 .370 .237 .053421 .9567
.8 .2 .411 .392 .197 .047833 .9544
.9 .2 .436 .421 .143 .042725 .9545
1 .2 .468 .461 .070 .038270 .9587
2 .2 .500 .500 0 .010931 .9647
3 .2 .500 .500 0 .002034 .7245
0 .3 .331 .331 .338 .080888 .9625
.1 .3 .337 .327 .337 .080098 .9625
.2 .3 .344 .324 .332 .077793 .9626
