Quantile dispersion graphs for design comparisons for logistic models and other modeling issues


QUANTILE DISPERSION GRAPHS FOR DESIGN COMPARISONS FOR
LOGISTIC MODELS AND OTHER MODELING ISSUES


By

KEVIN S. ROBINSON


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA


2000


Copyright 2000

by

Kevin S. Robinson


Dedicated to my loving wife, Becky


ACKNOWLEDGMENTS

I would first like to thank my chairman, Dr. Andre I. Khuri, for his enormous

amount of guidance, patience, and support in the writing of this dissertation. His

hard work ethic and enthusiasm for statistics, research, and teaching serve as an

excellent example for me as I pursue an academic career. I would also like to thank

the members of my committee: Dr. Alan G. Agresti, Dr. Malay Ghosh, Dr. Kenneth

M. Portier, and Dr. Raphael T. Haftka, for their patience and assistance. Special

thanks go to all of the faculty, staff, and students of the Department of Statistics of

the University of Florida for imparting knowledge and encouragement over

the years, especially Dr. Ronald H. Randles and Dr. G. Geoffrey Vining.

I would also like to acknowledge the Department of Mathematical Sciences

at Messiah College for influences academically, personally, and professionally. I

specifically thank Dr. L. Marlin Eby for his encouragement and guidance concerning

graduate school in statistics and his continuing friendship.

I wish to acknowledge my family. I am deeply grateful to my parents for their

love and encouragement. I wish to extend my love and gratitude to my three sisters

and their husbands. I also thank my in-laws for their support and prayers.

Most importantly, I thank my wife, Becky. I am eternally grateful for her

love, patience, encouragement, and confidence in me.

Finally, I praise God for His daily blessings and the gift of Jesus Christ.


TABLE OF CONTENTS

page

ACKNOWLEDGMENTS ............................. iv

ABSTRACT .................................... vii

CHAPTERS

1 INTRODUCTION ............................ 1

2 LITERATURE REVIEW ....................... 3

2.1 Introduction ......................... 3
2.2 Designs for Generalized Linear Models ... 4
2.3 Review of Multiresponse Modeling ..... 10
2.4 Functions of Multiple Responses ...... 13

3 A GRAPHICAL PROCEDURE FOR EVALUATING AND
COMPARING DESIGNS FOR LOGISTIC MODELS ... 19

3.1 Introduction ......................... 19
3.2 Prediction Capability of Design ...... 22
3.3 Application to Logistic Models ....... 26
3.4 Quantile Dispersion Graphs ........... 27
3.5 Numerical Examples ................... 29
3.6 Conclusions .......................... 46

4 EVALUATION AND COMPARISON OF STANDARD RSM
DESIGNS UNDER NON-STANDARD CONDITIONS ... 50

4.1 Introduction ......................... 50
4.2 Preliminary Results .................. 50
4.3 Scaled Mean-Squared Error of Prediction ... 60
4.4 SMSEP for the Logistic Regression Model ... 62
4.5 A Numerical Example .................. 63
4.6 Conclusions .......................... 72

5 MODELING FUNCTIONS OF MULTIPLE RESPONSES ... 75

5.1 Introduction ......................... 75
5.2 "Naive" Modeling Procedures .......... 75
5.3 Multiresponse Analysis and Proposed Modeling Procedure ... 77
5.4 Numerical Examples ................... 86
5.5 Summary .............................. 99

6 SUMMARY AND FUTURE RESEARCH ............. 100

APPENDICES

A DISTRIBUTION OF THE RATIO ............... 103

B S-PLUS CODE: MODELING PRODUCT,
CHAPTER 5 EXAMPLE 1 ................... 104

C S-PLUS CODE: MODELING RATIO,
CHAPTER 5 EXAMPLE 4 ................... 107

REFERENCES ............................... 112

BIOGRAPHICAL SKETCH ...................... 117


Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy




QUANTILE DISPERSION GRAPHS FOR DESIGN COMPARISONS FOR
LOGISTIC MODELS AND OTHER MODELING ISSUES



By

Kevin S. Robinson

August 2000

Chairman: Andre I. Khuri
Major Department: Statistics

Response surface methodology (RSM) involves techniques for experimental

design, statistical modeling, and optimization. This collection of techniques has

been used successfully for the development, improvement, and optimization of

processes. RSM was developed under the assumptions of a linear model and

normally distributed response. However, in many modern applications, such as

drug development and quality assurance, the response is of a discrete nature. The

response could also be multivariate in nature consisting of a number of observed

outcomes for an experimental unit. Traditional RSM techniques are inappropriate in

such situations since the basic assumptions are not quite valid. A more appropriate

analysis of such situations would involve the use of generalized linear models or

multiresponse modeling techniques.








Designs for fitting a generalized linear model depend on the unknown

parameters of the model. The use of any design optimality criterion would therefore

require some prior knowledge of the parameters. A graphical technique is proposed

for comparing and evaluating designs for a logistic regression model. Quantiles of

the mean-squared error of prediction are obtained on concentric surfaces inside a

region of interest, R. For a given design, these quantiles depend on the model's

parameters. Plots of the maxima and minima of the quantiles, over a subset of the

parameter space, produce the so-called quantile dispersion graphs (QDGs). The

plots provide a comprehensive assessment of the overall prediction capability of the

design within the region R. They also depict the dependence of the design on the

model's parameters. The QDGs can therefore be conveniently used to compare

several candidate designs. Sequential generation of optimal designs and an extension

of the QDGs to the scaled mean-squared error of prediction are also considered.

In a multiresponse experiment the researcher may be interested in modeling

a function of the responses. A method is presented for the modeling of a function of

multiple responses. The proposed procedure is based upon a multiresponse analysis

of the data and Taylor's series approximation of the function of interest. The

proposed technique compares favorably with a standard "naive" procedure.


CHAPTER 1
INTRODUCTION

In the twenty-first century, statistical data analysis will continue to grow in

application and importance. Response surface methodology (RSM) with its basic

goals of experimental design and modeling with the purpose of process improvement

will also continue to develop in the present century. Myers (1999) emphasizes

that two areas into which RSM is moving are generalized linear models (GLMs)

and multiple responses. The applications using GLMs include the modeling of

binary response data and Poisson count data. Multiple responses, meanwhile, are often

encountered in such situations as the yield and profit of a chemical process or the

assessment of the overall quality of a manufactured product.

In the area of GLMs, the main issue confronting RSM is experimental design.

There is software available in many of the statistical packages for the analysis of

GLMs. Unfortunately, however, none is available, as far as we know, for constructing

designs for GLMs. Myers (1999, p. 37) stresses that new and creative approaches are

needed to deal with the design issues encountered in GLMs. The standard response

surface designs and optimal designs were developed under traditional assumptions

concerning the model and response. Designs for GLMs encounter the problem of

dependence on the unknown parameters. The objective of an experiment is to

provide estimates of the parameters; therefore, the researcher is faced with a serious

dilemma. The researcher is required to provide guessed values for the parameters or

employ a sequential approach to the experiment. Methods are needed to assess the

sensitivity of GLMs designs to the unknown parameters.







Multiresponse data are now also very common with much of the research and

analysis focusing on optimization. The analysis of multiresponse data should give

consideration to the correlation structure among the responses. Khuri and Cornell

(1996, chapter 7) consider not only the optimization of multiresponse data, but the

modeling, choice of design, and lack of fit testing as well. In certain multiresponse

experiments, the responses may be further used in the computation of an outcome

of interest to the experimenter. Basically, the experimenter is interested in modeling

a function of correlated responses. Methods that recognize the multivariate nature

of the data are needed in this area.

In this dissertation, a graphical technique for the comparison of designs for

logistic regression models is developed and a modeling procedure for the function of

multiple responses is proposed. A review of the literature concerning GLMs design

and multiresponse modeling is contained in chapter 2. This chapter also gives more

details concerning the areas to be investigated in this dissertation. Chapters 3

and 4 concern the development of the so-called quantile dispersion graphs for the

comparison of designs for logistic regression models. In chapter 5, a new method

for the modeling of functions of responses is proposed which takes into account the

correlation structure among the responses. Finally, a summary and a list of future

research topics are included in chapter 6.


CHAPTER 2
LITERATURE REVIEW

2.1 Introduction

Extensive work in the area of single-response modeling has been carried out

in the statistical literature since the 1950's. Response surface methodology (RSM),

as developed in the early 1950's by George Box and several other researchers (Box

and Wilson 1951), mainly considers situations involving a single-response with

independent and normally distributed errors. Available textbooks on RSM include

Box and Draper (1987), Khuri and Cornell (1996), and Myers and Montgomery

(1995).

In many experimental situations, the traditional error assumptions are

inappropriate. The generalized linear models (GLMs) approach (McCullagh and

Nelder 1989) provides an extension of linear models in a manner that allows less

stringent error and model assumptions. Khuri (1993) provided a discussion of

the use of RSM within the framework of GLMs. He noted that optimal design

theory based on traditional RSM was no longer appropriate in the GLMs context.

Sequential design and Bayesian design are approaches employed in the design of

experiments for GLMs.

In other experimental situations, however, several responses are often of

interest. For example, in the semiconductor industry, the number of particles

remaining and the oxide removal rate for a standard wafer cleaning process are of

interest in the manufacturing process. Consider also, the assessment of pencil lead

which often uses the following: ash test, porosity test, strength test, and outside








diameter. Each of these examples has multiple responses that may be correlated.

Multiple responses are also encountered through the modeling of a response's mean

and variance, often analyzed as a dual response.

The modeling of several responses should take into consideration the

multivariate nature of the data. A multivariate analysis can be more informative and

more appropriate than a collection of univariate analyses. Unfortunately, however,

multiresponse modeling tends to be ignored because of its complexity. The use

of linear models and analysis of variance has been extended to modeling multiple

responses (Anderson 1984, chap. 8). The classical multivariate analysis of variance

(MANOVA) modeling can be restrictive in that each of the responses must have the

same linear relationship with the control variables and the same design. Multiple

design multivariate regression models are more general as they allow the responses

to have different relationships with the control variables as well as different designs

(McDonald 1975). Khuri (1996) presented a thorough review of multiresponse

surface methodology.


2.2 Designs for Generalized Linear Models

2.2.1 Review of Linear Model Design

Developments in optimal experimental design have largely been concerned

with linear models; for example, see the books by Fedorov (1972), Silvey (1980),

Pukelsheim (1993), and Atkinson and Donev (1992). From a RSM perspective,

the books by Box and Draper (1987), Khuri and Cornell (1996), and Myers and

Montgomery (1995) mainly discuss designs for linear response surface models with

the traditional error assumptions.

In the linear model situation, attention is primarily focused on the properties

of the variance-covariance matrix for the estimates of the model's unknown








parameters. Silvey (1980, p. 3) presents Fisher's information matrix as a motivation

for this approach. In this case, the variance-covariance matrix of the estimates is

σ²[X′X]⁻¹.

Traditional design criteria focus upon properties of the matrix X'X which

is independent of the model's parameters. For example, the D-optimality criterion

calls for the maximization of the determinant of X'X. In G-optimality, the

criterion is to minimize the maximum prediction variance, which depends upon

X'X, over a design region.
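Both criteria can be computed directly from the design matrix. The following Python sketch is illustrative only — the 2² factorial design and first-order model are assumptions chosen for this example, not drawn from the text:

```python
import numpy as np

def d_criterion(X):
    """D-criterion: determinant of the information matrix X'X."""
    return np.linalg.det(X.T @ X)

def max_prediction_variance(X, grid):
    """G-criterion: maximum of f'(x)(X'X)^{-1}f(x) over candidate points."""
    XtX_inv = np.linalg.inv(X.T @ X)
    return max(f @ XtX_inv @ f for f in grid)

# 2^2 factorial design for a first-order model, f(x) = (1, x1, x2)'
design = np.array([[1, -1, -1],
                   [1, -1,  1],
                   [1,  1, -1],
                   [1,  1,  1]], dtype=float)

# candidate points: a grid over the square [-1, 1]^2
grid = [np.array([1.0, a, b]) for a in np.linspace(-1, 1, 21)
                              for b in np.linspace(-1, 1, 21)]

print(d_criterion(design))                    # det(X'X) = 4^3 = 64
print(max_prediction_variance(design, grid))  # (1 + x1^2 + x2^2)/4, maximal at corners
```

For this design X'X = 4I, so the prediction variance is (1 + x1² + x2²)/4, attaining its maximum of 0.75 at the corners of the region.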

Kiefer (1985) extended the notion of a fixed n-point design to a probability

measure over the experimental region, termed a design measure. Silvey (1980,

pp. 13-14) noted that the use of design measures allowed the exploitation of

mathematical techniques to solve the optimal design problem resulting in some

elegant and useful theory. The most notable result is probably the Equivalence

Theorem of Kiefer and Wolfowitz (1960) which establishes the equivalence of

D-optimality and G-optimality for design measures. Further discussions of design

measures and the Equivalence Theorem are presented in Chapter 4.


2.2.2 Generalized Linear Model Design

The generalized linear models (GLMs) approach allows for less restrictive

error and model assumptions. For example, normally distributed errors and constant

variance properties are no longer required. The text by McCullagh and Nelder

(1989) presented a thorough account of the theory of GLMs. Khuri (1993) provided

a discussion of the use of the GLMs approach in RSM. Atkinson and Haines (1996)

provide a recent review of the design theory for nonlinear and generalized linear

models.








In the GLMs situation the response, y, is assumed to follow a distribution
from the exponential family. This includes the normal as well as the binomial,
Poisson, and gamma distributions. The mean of the response is modeled as a
function of the form E[y(x)] = μ(x) = h[f′(x)β], where f(x) is a vector of order
p × 1 whose elements consist of powers and products of powers of the elements of
x up to degree d (≥ 1), and β is a vector of p unknown parameters. The function
f′(x)β is called the linear predictor. It is assumed that h(·) is a strictly monotone
function. Using the inverse of the function h(·), we have g[μ(x)] = f′(x)β. The
function g(·) is called the link function. In traditional linear models, the link function
is the identity function. The method of maximum likelihood (ML) is utilized to
estimate β. It follows that an approximation for the variance-covariance matrix of
the ML estimates of β is the inverse of Fisher's information matrix, which is p × p
with (i, j)th element E[−∂²l/∂β_i∂β_j], where l is the log-likelihood function. Iteratively
reweighted least squares (IRWLS) can be used to fit a generalized linear model as
discussed in Khuri (1993) and McCullagh and Nelder (1989).

Let z(x) = g[y(x)]. Using the first-order terms in a Taylor series expansion,

z(x) ≈ g[μ(x)] + [y(x) − μ(x)]g′[μ(x)].

Thus E[z(x)] ≈ g[μ(x)] = f′(x)β and Var[z(x)] ≈ Var[y(x)](g′[μ(x)])² = w(x, β).
Using IRWLS, Var[β̂] ≈ [X′WX]⁻¹, where β̂ is the ML estimate of β, X is a matrix
whose uth row is f′(x_u), and W is a diagonal matrix whose uth diagonal element is
1/w(x_u, β), where x_u is the vector of design settings at the uth experimental run.
It follows that the information matrix,

X′WX = Σ_{u=1}^{n} [1/w(x_u, β)] f(x_u)f′(x_u),

depends on the unknown parameters, β, through w(x_u, β). Let ẑ(x) = f′(x)β̂ denote
the predicted value of z at a point x in the experimental region.
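For the logistic model with the logit link, Var[y] = π(1 − π) and g′(μ) = 1/[π(1 − π)], so the uth diagonal element of W reduces to π_u(1 − π_u). A minimal Python sketch of the information matrix and the resulting approximate prediction variance follows; the four dose levels and the parameter values are hypothetical choices for illustration:

```python
import numpy as np

def logistic_information(X, beta):
    """Fisher information X'WX for a logistic model with logit link.

    Here w(x, beta) = Var[y](g'(mu))^2 = 1/(pi(1 - pi)), so the u-th
    diagonal element of W is 1/w(x_u, beta) = pi_u(1 - pi_u).
    """
    eta = X @ beta                   # linear predictor f'(x)beta
    pi = 1.0 / (1.0 + np.exp(-eta))  # inverse logit link
    W = np.diag(pi * (1.0 - pi))
    return X.T @ W @ X

def prediction_variance(f, X, beta):
    """Approximate Var[z_hat(x)] = f'(x)(X'WX)^{-1}f(x) on the logit scale."""
    return f @ np.linalg.inv(logistic_information(X, beta)) @ f

# one-variable model f(x) = (1, x)' at four hypothetical dose levels
X = np.column_stack([np.ones(4), np.array([-2.0, -1.0, 1.0, 2.0])])
beta = np.array([0.0, 1.0])          # assumed intercept 0, slope 1
I = logistic_information(X, beta)
print(I)
print(prediction_variance(np.array([1.0, 0.0]), X, beta))
```

Because this design is symmetric about zero, the off-diagonal element of X'WX vanishes and the prediction variance at x = 0 is just the reciprocal of the first diagonal element.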

In the GLMs situation, as in traditional design theory, the focus has been
on the variance-covariance matrix, (X′WX)⁻¹, of the parameter estimates, where
the matrix W depends upon the unknown model parameters. A D-optimal design
maximizes the determinant of X′WX; however, there is no single D-optimal design
because of the dependency on the unknown parameters. G-optimal designs are
obtained by minimizing the maximum prediction variance over the design region,
where the prediction variance is Var[ẑ(x)] ≈ f′(x)[X′WX]⁻¹f(x).

The dependence upon the unknown parameters, f3, encountered in GLMs

also occurs in the consideration of designs for nonlinear regression models. Common

approaches for dealing with the dependency problem include using a "good" guess of

the unknown parameters, a sequential approach in which a design is determined in

at least two stages, and a Bayesian approach which introduces a prior distribution

on the parameters. White (1973) provided an extension of the Kiefer and Wolfowitz

(1960) Equivalence Theorem to nonlinear models. Chapter 4 contains a review of

this extension to the Equivalence Theorem.

Developments in designs for GLMs experiments have been mostly

concerned with binary responses commonly modeled using the logistic regression

model (Abdelbasit and Plackett 1983; Minkin 1987; Chaloner and Larntz 1989;

Myers, Myers, and Carter 1994). Abdelbasit and Plackett (1983) discuss designs

for experiments dealing with binary data, specifically biological assay or fatigue

experiments. An experiment consists of subjects receiving a dose level (stimulus),

x, to which the subject may respond. Interest is in the probability of a response,
π(x). D-optimal designs for the logistic model, logit π(x) = log[π(x)/(1 − π(x))] =
β(x − μ), are found assuming two-point designs symmetric about μ with equal
allocations. The optimal design is to use the dose levels corresponding to ED17.6
and ED82.4, where ED100p represents the value of x which produces a 100p%
response rate. Note that

these dose levels are unknown since the parameters are unknown. Abdelbasit and








Plackett suggest dividing the experiment into stages and using improved estimates

after each stage in determining the dose levels.
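Given guessed values for μ and β — required precisely because the optimal dose levels depend on the unknown parameters — the ED levels follow directly from the model. A short Python sketch, with purely hypothetical parameter guesses:

```python
import math

def ed(p, mu, beta):
    """Dose x giving a 100p% response rate under logit pi(x) = beta(x - mu)."""
    return mu + math.log(p / (1.0 - p)) / beta

mu, beta = 2.0, 0.5   # hypothetical guesses of the unknown parameters
low, high = ed(0.176, mu, beta), ed(0.824, mu, beta)
print(low, high)      # the two support points, symmetric about ED50 = mu
```

Since logit(0.176) = −logit(0.824), the two support points are symmetric about ED50; in a sequential scheme they would be recomputed from the improved estimates after each stage.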

Minkin (1987) considers the selection of a design for logistic regression such

that the likelihood-based confidence region is of minimum volume. Minkin notes that

maximization of the determinant of Fisher's information matrix is in effect equivalent

to minimizing the volume of a confidence region based on a second-order Taylor's

approximation of the log-likelihood function. This work generalizes the results of

Abdelbasit and Plackett (1983) that designs with an even number of observations

should be two point symmetric designs about ED50 with equal allocation at ED17.6

and ED82.4. For an odd number of observations, N, no fewer than (N − 1)/2 points

should be located at each of the two dose levels. Again, a sequential approach to

selecting the design is recommended. Further discussion of sequential design and

analysis is given by Wu (1985a,b) and Chaudhuri and Mykland (1993).

An alternative approach to using a "good" guess or a sequential design is

to utilize a Bayesian methodology by placing a prior distribution on the unknown

parameters. A thorough review of Bayesian experimental design is given by Chaloner

and Verdinelli (1995). Chaloner and Larntz (1989) apply the Bayesian design theory

to logistic regression experiments. Designs were found using a range of uniform and

independent prior distributions for μ and β. It was concluded that the number of

support points increased with the uncertainty in the prior distribution.

Myers, Myers, and Carter (1994) investigated optimal designs for the logistic

model by considering the prediction of a response. The prediction variance of the

estimated logit is considered. The authors considered general designs that minimize the average

prediction variance, the so-called Q-optimality, and designs centered around ED50

that minimize the maximum prediction variance, namely G-optimality. Designs were

found using the Nelder and Mead (1965) simplex search procedure. The authors








considered the robustness of the designs to poor initial guesses through efficiencies.

They concluded that Q-optimal designs with two/three points were fairly robust to

poor initial guesses. Furthermore, underestimating the slope, β, resulted in higher

Q-efficiencies, while improved estimates of ED50 (μ) reduced the effect of a poor

slope estimate.


2.2.3 Graphical Tools for Evaluating Experimental Designs

Until recently attempts to compare designs have been made using single-

valued criteria, such as D-optimality and G-optimality. However, the success of

a design for a given model is determined by its ability to predict the response

efficiently. The prediction variance function can be utilized to determine prediction

capability. G-optimality addresses the prediction variance but fails to provide

information about adequacy of prediction at various points in the region of interest.

Advances in computing and graphical methods have recently resulted in effective

graphical techniques to evaluate designs (Giovannitti-Jensen and Myers 1989;

Myers, Vining, Giovannitti-Jensen, and Myers 1992; Khuri, Kim, and Um 1996).

The variance dispersion graphs (VDGs) presented by Giovannitti-Jensen and Myers

(1989) are two dimensional plots of the maximum, minimum, and spherical average

prediction variances over concentric spheres inside a region of interest against their

radii. Khuri, Kim, and Um (1996) suggest that quantile plots of the prediction

variance on various spheres inside a region of interest allow for a more comprehensive

and accurate assessment of the prediction capability of a design.
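The idea behind such quantile plots can be sketched in a few lines of Python. The example below is an assumed illustrative case — a first-order model on a 2² factorial, for which the scaled prediction variance turns out to be constant on each circle, so all quantiles coincide:

```python
import numpy as np

def scaled_pred_variance(f, X):
    """N * f'(x)(X'X)^{-1}f(x): prediction variance on a per-observation basis."""
    N = X.shape[0]
    return N * f @ np.linalg.inv(X.T @ X) @ f

def quantiles_on_circle(X, radius, probs=(0.0, 0.25, 0.5, 0.75, 1.0), m=200):
    """Quantiles of the prediction variance over m points on a circle of given radius."""
    theta = np.linspace(0.0, 2 * np.pi, m, endpoint=False)
    pts = [np.array([1.0, radius * np.cos(t), radius * np.sin(t)]) for t in theta]
    v = np.array([scaled_pred_variance(f, X) for f in pts])
    return np.quantile(v, probs)

# 2^2 factorial, first-order model f(x) = (1, x1, x2)'
X = np.array([[1, -1, -1], [1, -1, 1], [1, 1, -1], [1, 1, 1]], dtype=float)
for r in (0.5, 1.0, 1.5):
    print(r, quantiles_on_circle(X, r))   # here constant: 1 + r^2 on each circle
```

For less symmetric designs or models the quantiles spread out on each circle, and plotting them against the radius is exactly what reveals the design's prediction behavior throughout the region.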

In the nonlinear and GLMs situation, the VDGs and quantile plots encounter

the difficulty of dependence upon the unknown parameters as well as biased

estimates. Khuri and Lee (1996) present a graphical technique extending the

quantile plots to nonlinear models. For evaluating the prediction capability








of a design, Khuri and Lee suggest plotting quantiles of the estimated scaled

mean-squared error of prediction, ESMSEP, on concentric surfaces within the region

of interest. The mean-squared error is utilized to address the bias in the estimates

and is put on a "per observation" basis. The mean-squared error is estimated using

the fitted model because of the dependence on the model's unknown parameters.

In order to compare designs, Khuri and Lee propose the so-called minimum

and maximum SMSEP plots, where SMSEP is the scaled mean-squared error of

prediction. Nonlinear models with only one control variable were considered. The

minimum and maximum SMSEP are found over a subset of the parameter space for

each design and are plotted against x, where x is the control variable. An example

was presented to demonstrate that designs based on a single-valued criterion may

perform poorly in terms of the mean-squared error of prediction.

The development of quantile dispersion graphs (QDGs) for logistic regression

models is presented in chapter 3. Chapter 4 contains details about using the QDGs

for the comparison of traditional RSM designs as well as issues concerning the

sequential generation of optimal designs.


2.3 Review of Multiresponse Modeling

2.3.1 General Multiresponse Model

Suppose that an experimenter is interested in modeling r responses using

q control variables. An experiment is performed where the responses and control

variables are measured on each of N experimental runs.

Multiresponse models allow each of the responses to have a different

relationship with the control variables. Each response is represented by a model

function involving the control variables and a set of unknown parameter values. The








multivariate nature of the data is accounted for by taking into consideration the

variance-covariance structure of the r responses.

At the uth run, let y_u = (y_u1, y_u2, ..., y_ur)′ be the vector of r responses,
and x_u = (x_u1, x_u2, ..., x_uq)′ be the vector of q control variables. The model for
the ith response is of the form:

y_ui = f_i(x_u, β) + ε_ui,   u = 1, 2, ..., N,   i = 1, 2, ..., r,   (2.1)

where β = (β_1, β_2, ..., β_p)′ is a vector of p unknown parameters, ε_ui is a random
error, and f_i, the model function for the ith response, is of known form.

The standard assumptions about the random errors are: E(ε_ui) = 0 for all u,

i, and E(ε_ui ε_u′i′) = σ_ii′ if u = u′, zero otherwise. Thus, random errors (among the

r responses) from the same experimental run are assumed to be correlated, but

random errors from different experimental runs are uncorrelated.

Assuming that the vector of random errors for the uth run, ε_u =
(ε_u1, ε_u2, ..., ε_ur)′, has a multivariate normal distribution, Box and Draper
(1965) utilized the Bayesian methodology to estimate β. The likelihood function
for the N × r response matrix Y, where the ith column of Y is the vector of N
observations on the ith response (i = 1, 2, ..., r), is given by

L(Y | β, Σ) = (2π)^{-Nr/2} |Σ|^{-N/2} exp{−(1/2) trace[Σ⁻¹Z′(β)Z(β)]}   (2.2)

where Σ is the variance-covariance matrix (σ_ij) for the r responses, Z(β) = Y − F(β),
and F(β) is an N × r matrix whose (u, i)th element is f_i(x_u, β). Assuming
noninformative prior distributions for β and Σ, the marginal posterior density for β
is

p(β | Y) ∝ |Z′(β)Z(β)|^{-N/2}.   (2.3)







Estimation of β is achieved by maximizing this posterior density, which is equivalent
to minimizing the determinant

Δ(β) = |Z′(β)Z(β)|   (2.4)

with respect to β. This estimation procedure is referred to as the Box-Draper
determinant criterion. Bates and Watts (1988, pp. 138-139) noted that this Bayesian
argument is in fact not needed and that the standard likelihood approach leads to
the same method of estimation. This estimation method is very general in that it
places no restrictions on the functional relationships used in the modeling.
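A minimal Python sketch of the determinant criterion (2.4), using a simulated two-response example with a single shared parameter; the model functions, the true parameter value, and the data are all hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 10)
beta_true = 1.5
# two correlated responses sharing the single parameter beta
Y = np.column_stack([beta_true * x, beta_true * x**2])
Y += rng.multivariate_normal([0, 0], [[0.01, 0.006], [0.006, 0.01]], size=x.size)

def box_draper(beta):
    """Box-Draper criterion: |Z'(beta)Z(beta)| with Z(beta) = Y - F(beta)."""
    F = np.column_stack([beta * x, beta * x**2])
    Z = Y - F
    return np.linalg.det(Z.T @ Z)

# minimize the determinant by a simple grid search over beta
grid = np.linspace(1.0, 2.0, 2001)
beta_hat = grid[np.argmin([box_draper(b) for b in grid])]
print(beta_hat)   # close to beta_true = 1.5
```

In practice a numerical optimizer would replace the grid search, but the criterion itself — a determinant of residual cross-products, not a sum of squares — is the essential point.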

2.3.2 Linear Multiresponse Model

If the multiresponse model (2.1) is linear in the parameters, then

f_i(x_u, β) = f′_i(x_u)β_i,   i = 1, 2, ..., r,

where f_i(x_u) is a vector of order p_i × 1 whose elements consist of powers and
products of powers of the elements of x_u up to degree d_i (≥ 1), and β_i is a vector of
p_i unknown constant coefficients. In this case, the vector β in model (2.1) is of the
form β = (β′_1 : β′_2 : ... : β′_r)′, assuming that the β_i's do not have common elements.

The generalized least-squares estimator of β is given by the value of β that
minimizes

Σ_{i=1}^{r} Σ_{j=1}^{r} σ^{ij} (y_i − X_iβ_i)′(y_j − X_jβ_j)   (2.5)

where y_i = (y_1i, y_2i, ..., y_Ni)′, i = 1, 2, ..., r, X_i is an N × p_i matrix whose uth
row is f′_i(x_u), and σ^{ij} is the (i, j)th element of Σ⁻¹. Since Σ is unknown, Zellner
(1962) proposed a procedure for estimating it. For this purpose, each response is
analyzed using ordinary least squares. Consider the least-squares residuals for the
ith response, e_i, i = 1, 2, ..., r. An estimate of Σ is given by Σ̂ = (σ̂_ij), where

σ̂_ij = e′_i e_j / N,   i, j = 1, 2, ..., r.   (2.6)


An estimate of β is obtained by using Σ̂ in place of Σ in criterion (2.5). The
generalized least-squares method is therefore a two-stage procedure which does not
require the normality assumption. Chapter 5 contains a more detailed presentation
of the analysis for the linear multiresponse model.
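The two-stage procedure of (2.5)-(2.6) can be sketched directly in Python. The two response models and the data below are simulated assumptions for illustration:

```python
import numpy as np

def zellner_sur(Ys, Xs):
    """Two-stage generalized least squares (Zellner 1962) for r responses.

    Ys: list of N-vectors; Xs: list of N x p_i design matrices.
    """
    r, N = len(Ys), Ys[0].size
    # stage 1: ordinary least squares per response, residual covariance (2.6)
    resid = []
    for y, X in zip(Ys, Xs):
        b = np.linalg.lstsq(X, y, rcond=None)[0]
        resid.append(y - X @ b)
    E = np.column_stack(resid)
    Sigma_hat = (E.T @ E) / N
    # stage 2: GLS (2.5) on the stacked system with block-diagonal design
    Xbig = np.zeros((N * r, sum(X.shape[1] for X in Xs)))
    col = 0
    for i, X in enumerate(Xs):
        Xbig[i * N:(i + 1) * N, col:col + X.shape[1]] = X
        col += X.shape[1]
    ybig = np.concatenate(Ys)
    Omega_inv = np.kron(np.linalg.inv(Sigma_hat), np.eye(N))
    A = Xbig.T @ Omega_inv @ Xbig
    return np.linalg.solve(A, Xbig.T @ Omega_inv @ ybig)

# simulated example: two responses with different (first- and second-order) models
rng = np.random.default_rng(1)
N = 40
x = np.linspace(-1.0, 1.0, N)
X1 = np.column_stack([np.ones(N), x])
X2 = np.column_stack([np.ones(N), x, x**2])
err = rng.multivariate_normal([0, 0], [[0.04, 0.03], [0.03, 0.04]], size=N)
y1 = 1.0 + 2.0 * x + err[:, 0]
y2 = 0.5 - 1.0 * x + 3.0 * x**2 + err[:, 1]
beta_hat = zellner_sur([y1, y2], [X1, X2])
print(beta_hat)   # approximately (1, 2, 0.5, -1, 3)
```

Note that the two responses are allowed different design matrices, which is the multiple-design generality discussed above; the correlation enters only through the estimated Σ.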

Problems such as dependencies among the responses and missing data are

associated with the analysis of multiresponse models. Box, Hunter, MacGregor, and

Erjavec (1973) considered three kinds of dependencies: dependence among the errors

which led to the Box-Draper criterion, linear dependencies among the expected values

of the responses, and linear dependencies in the data. An eigenvalue-eigenvector

analysis was proposed to detect and identify such dependencies. Khuri (1990) noted

that problems may occur in the eigenvalue analysis if the response variables have

different units of measurement. He suggested a scaling convention for the responses

to address this situation. The problem of missing data was addressed by Box,

Draper, and Hunter (1970) by treating the missing values as additional parameters.

Stewart and Sorensen (1981) considered an approach using the posterior density

function of β and Σ given the complete data.


2.4 Functions of Multiple Responses

In some multiresponse experiments, the modeling of a function (or functions)

of the response variables is of interest. The following two multiresponse experiments

originated from the semiconductor industry (T. Moore, personal communication,

1996).








Example 1

SEMATECH performed an experiment involving a silicon nitride deposition

process using three factors in an effort to identify a suitable process window. The

three control variables (factors) were pressure, total flow, and gas ratio. The

responses of interest were thickness (THK), wafer to wafer standard deviation

(WTWSD), and within wafer standard deviation (WIWSD). The main objective

was the determination of desired settings of the control variables that would achieve

a target value for THK while minimizing the variation in the process. The problem

at hand is the modeling of the total variation in the process (in general, modeling

functions of the responses measured). In the present example, the total variation is

defined by the function

TOTSD = √(WTWSD² + WIWSD²)   (2.7)


Two methods of modeling this variable can be considered. First, the new response

TOTSD is modeled along with the THK response. Second, the three responses

WTWSD, WIWSD, and THK are modeled and their predicted values are substituted

into TOTSD. It is of interest to know which of these two methods is better and

whether there are better alternative approaches to modeling such a function of

multiple responses.
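For reference, the second method amounts to fitting WTWSD and WIWSD separately and substituting their predicted values into (2.7). A trivial Python sketch; the predicted values shown are hypothetical stand-ins for fitted models:

```python
import math

def totsd(wtwsd, wiwsd):
    """Total process variation as defined in (2.7)."""
    return math.sqrt(wtwsd**2 + wiwsd**2)

# second method: plug predicted component responses into (2.7)
wtwsd_hat, wiwsd_hat = 3.0, 4.0   # hypothetical predictions at some setting x
print(totsd(wtwsd_hat, wiwsd_hat))   # 5.0
```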


Example 2

SEMATECH performed an experiment to investigate a tool used in the

process of polishing wafers. Three control variables (factors) were considered: table

speed, downforce, and slurry concentration. The measured responses were removal

rate of metal (RR), oxide removal rate (OR), and within wafer standard deviation

(WIWSD) for the removal rate of metal. Two functions of the multiple responses








were of primary interest:

y_1 = RR/OR,   y_2 = WIWSD/RR.

Settings of the factors that would maximize y_1 and minimize y_2 were desired.

Each of the above examples illustrates a situation in which the primary

interest is the modeling of a function (or functions) of several responses. Little work

has been done in this area.

Hunter and Jaworski (1986) describe a method for model building where

the response of interest, G(y1, y2, ..., yr), is a function of r responses. A "naive"

approach consists of calculating the value of the response function for each

experimental run and then using single-response techniques to model this response

as a polynomial function of the control variables. Hunter and Jaworski proposed

to fit each of the r responses y1, y2, ..., yr individually; the model for the response of interest is then obtained as G(μ̂1, μ̂2, ..., μ̂r), where μ̂i is the estimate of the mean from the individually fitted model for the ith response. Hunter and Jaworski present an example where the response of interest is the product of y1 and y2. Using a 2² factorial design with a center run, first-order models are fitted to y1 and y2. The models are then multiplied together to obtain a second-order model for y1y2. A

more detailed discussion of these two procedures is presented in Chapter 5.
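The product construction above can be illustrated with a small sketch. The 2² factorial with a center run mirrors the Hunter and Jaworski setup, but the response values below are invented; the dictionary of coefficients simply expands (a0 + a1 x1 + a2 x2)(b0 + b1 x1 + b2 x2) into a full second-order model.

```python
import numpy as np

# Sketch of the Hunter-Jaworski idea for G(y1, y2) = y1*y2 on a 2^2 factorial
# with a center run; the response values are invented for illustration.
X = np.array([[1, -1, -1], [1, 1, -1], [1, -1, 1], [1, 1, 1], [1, 0, 0]], float)
y1 = np.array([5.1, 7.0, 6.2, 8.1, 6.6])
y2 = np.array([2.0, 1.4, 2.9, 2.2, 2.1])

# First-order fits: y_i is approximated by a0 + a1*x1 + a2*x2.
a, *_ = np.linalg.lstsq(X, y1, rcond=None)
b, *_ = np.linalg.lstsq(X, y2, rcond=None)

# Multiplying the two first-order models yields a second-order model for y1*y2.
coef = {
    "1":    a[0] * b[0],
    "x1":   a[0] * b[1] + a[1] * b[0],
    "x2":   a[0] * b[2] + a[2] * b[0],
    "x1^2": a[1] * b[1],
    "x2^2": a[2] * b[2],
    "x1x2": a[1] * b[2] + a[2] * b[1],
}
```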

Hunter and Jaworski place no constraints on how the response of interest,

G(y1, y2, ..., yr), is related to y1, y2, ..., yr. For example, they do not discuss the problems that may occur if G(y1, y2, ..., yr) is an ill-behaved function such as a ratio, y1/y2. Also, correlations among the multiple responses y1, y2, ..., yr are not considered.

Guo and Sachs (1993) employ the naive approach and the alternative approach

of Hunter and Jaworski in modeling spatial uniformity in the semiconductor industry.







Spatial uniformity is defined to be the ratio of the observed standard deviation to the observed mean within a single wafer or a batch of wafers (the coefficient of variation). Guo and Sachs incorrectly call the method of Hunter and Jaworski the "multiple response surfaces approach." Guo and Sachs describe the methods as

follows:

In most cases, spatial uniformity is not measured directly but is calculated

from measurements of output characteristics taken at specified locations.

It might be beneficial to fit these primitive output characteristics first and

then manipulate these models to construct the model of the spatial uni-

formity. This is different from the traditional approach of first calculating

the value of the spatial uniformity for each experimental run and then

fitting an equation to these calculated values. (Guo and Sachs 1993, p.

42)

Neither approach recognizes the multivariate nature of the data.

Guo and Sachs (1993) provide a simple example to illustrate the two

approaches of creating models. An experiment in the semiconductor wafer

manufacturing consisted of two factors (x1, x2) and three output characteristics

(deposition rates at three locations within the wafer). Table 2.1 contains the data

for the uniformity example. Using the naive approach of working with the last

column resulted in the following second-order model:

Û(%) = 55.92 − 1.4528x1 − 0.9196x2 + 0.01094x1² + 0.00867x2² + 0.00658x1x2

The method proposed by Hunter and Jaworski suggests the fitting of first-order

models to each of the three output characteristics:

μ̂1 = 29.8449 + 0.2940x1 + 0.1175x2

μ̂2 = 34.0764 + 0.2048x1 + 0.1378x2








Table 2.1: Guo and Sachs Example: Data for Uniformity Models

       Equipment Settings     Output Characteristics         Spatial Uniformity
 Run     x1       x2         y1        y2        y3               (%)
  1      46       22       45.994    46.296    48.589            3.022
  2      56       22       48.843    48.731    49.681            1.058
  3      66       22       51.555    50.544    50.908            1.004
  4      46       32       47.647    47.846    48.519            0.952
  5      56       32       50.208    49.930    50.072            0.278
  6      66       32       52.931    52.387    51.505            1.377
  7      46       42       47.641    49.488    48.947            1.950
  8      56       42       51.365    51.365    50.642            0.817
  9      66       42       54.436    52.985    51.716            2.566
Source: Guo and Sachs (1993, p. 43)



μ̂3 = 41.3942 + 0.1346x1 + 0.0355x2

The model for spatial uniformity is now obtained by using the above first-order models in the following equation:

Û(%) = √{[(μ̂1 − ȳ)² + (μ̂2 − ȳ)² + (μ̂3 − ȳ)²]/2} / ȳ × (100),

where ȳ = (μ̂1 + μ̂2 + μ̂3)/3. This results in the following model for spatial uniformity:

Û(%) = √(34.140 − 0.911x1 − 0.537x2 + 0.0064x1² + 0.0029x2² + 0.0062x1x2) / (35.105 + 0.211x1 + 0.097x2) × (100)
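As a check on the naive calculation, the spatial-uniformity value for run 1 of Table 2.1 (x1 = 46, x2 = 22) can be reproduced directly as the coefficient of variation of the three deposition rates:

```python
import math

# Reproducing the spatial-uniformity value for run 1 of Table 2.1:
# coefficient of variation of the three measured deposition rates.
y = [45.994, 46.296, 48.589]
ybar = sum(y) / 3
s = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / 2)  # sample standard deviation
uniformity = 100 * s / ybar
# agrees with the 3.022 reported in the last column of Table 2.1
```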
In other situations, the function of interest can be a ratio, y1/y2, of two responses, where E[yi] = μi, Var[yi] = σi², i = 1, 2, Cov[y1, y2] = σ12, and the correlation coefficient is ρ. Fieller (1932) and Hinkley (1969) considered the distribution of the ratio y1/y2 when y1, y2 have a joint bivariate normal distribution. (See Appendix A for the distribution of y1/y2.) This distribution approaches normality as σ2/μ2 tends to zero. Shanmugalingam (1982) noted that this condition is seldom met in practical situations and performed simulation studies to investigate the distribution. Shanmugalingam also noted that heterogeneity and non-normality both







occur together. This violates assumptions utilized in the naive approach of modeling y1/y2 as a single response. The work of Fieller and Hinkley addresses the distribution of the ratio but fails to address any modeling of the relationship between the ratio and the control variables.

A method for the modeling of functions of multiple responses is proposed in Chapter 5. The proposed method incorporates the multivariate nature of the data

and uses a Taylor's series approximation of the function of interest.













CHAPTER 3
A GRAPHICAL PROCEDURE FOR EVALUATING AND
COMPARING DESIGNS FOR LOGISTIC MODELS

3.1 Introduction

The design of experiments has traditionally considered linear models with

random experimental errors assumed to be normally distributed with a zero mean

and a constant variance. The texts by Fedorov (1972), Silvey (1980), Pukelsheim

(1993), and Atkinson and Donev (1992) give detailed accounts of optimal design

theory; while Box and Draper (1987), Myers and Montgomery (1995), and

Khuri and Cornell (1996) detail design of experiments from a response surface

methodology (RSM) perspective. Traditional design theory has focused on design

criteria that pertain to the variance-covariance matrix of the vector of parameter

estimates for a given linear model. A number of such criteria have been grouped

together under the so-called alphabetic optimality, for example, D-optimality and

G-optimality. Alphabetic optimality criteria utilize single-valued functions that

measure the precision of estimation of the parameters or of the mean response, both

of which involve the variance-covariance matrix of the parameter estimates. These

single-valued criteria, however, may be insufficient to assess the overall quality

of prediction of a given experimental design throughout the experimental region

(Khuri, Kim, and Um 1996). For linear models, the variance-covariance matrix

of the parameter estimates depends upon the design settings, but is free of the

unknown parameters of the model under consideration.








In many experimental situations, the common assumptions concerning the

form of the fitted model (e.g. linear) and/or the distribution of the error term (e.g.

normal with a zero mean and a constant variance) are violated. For example, one

of the most common ways in which violations may occur is when the response, in a

given experiment, is binary. Binary data are frequently encountered in dose-response

models, quantal response models, and success-failure experiments. Piegorsch (1998)

discusses engineering situations where the data are observed proportions and the

binomial probability model is assumed. Carter et al. (1986) and Vidmar, McKean,

and Hettmansperger (1992) present examples from drug combination experiments

with response-nonresponse data.

Generalized linear models (GLMs) provide an extension of linear models by

relaxing the assumptions of normality and constant variance for the random errors

(McCullagh and Nelder 1989). The basic components of GLMs consist of a response

having a distribution that belongs to the exponential family, such as the normal,

binomial, Poisson, and gamma distributions. In addition, the mean of the response

is assumed to be a strictly monotone increasing function of a linear predictor

which depends on unknown parameters and several control variables. For GLMs,

however, the variance-covariance matrix of the parameter estimates depends upon

the unknown parameters of the linear predictor. This dependency problem causes great difficulty in the construction of designs for GLMs. A few strategies for dealing with this dependency problem include the use of locally optimal designs based on a "good" initial guess of the unknown parameters, and a sequential approach whereby the design is developed in stages combined with updated estimates of the parameters (Wu 1985a,b; Chaudhuri and Mykland 1993). Another strategy is the use of the Bayesian methodology, which imposes prior distributions on the parameters (Chaloner and Larntz 1989; Chaloner and Verdinelli 1995).








Advances in computing and graphical methods have, in recent years, made

it possible to develop effective graphical techniques for evaluating designs which

do not require using single-valued criteria as in traditional design theory. Variance

dispersion graphs (Giovannitti-Jensen and Myers, 1989) and quantile plots of the

prediction variance (Khuri, Kim and Um, 1996) are such techniques used in RSM

models to assess the prediction capability of a design. These techniques, however,

still encounter the same problem of dependence upon unknown parameters when

applied to nonlinear models and GLMs. For the latter models, the parameter

estimates are biased.

While investigating the bias and accuracy of parameter estimates for the

quantal response model, Sowden (1971, p. 602) commented that his results "have

obvious implications for designing quantal response experiments. A detailed

investigation may well suggest alternative design criteria." For nonlinear models,

both bias and variance functions should be considered in the evaluation of

experimental designs. Khuri and Lee (1996) suggested a graphical approach based

on the mean squared error of prediction for evaluating and comparing designs for

nonlinear models. The use of the mean-squared error of prediction incorporates the

prediction bias as well as the prediction variance.

In this chapter, a graphical technique, based on the mean-squared error of

prediction, is developed for the evaluation and comparison of experimental designs

for logistic regression models.








3.2 Prediction Capability of Design
The prediction capability of a given design will be evaluated using the mean-squared error of prediction, namely,

MSE[μ̂(x)] = Var[μ̂(x)] + {Bias[μ̂(x)]}²,   (3.1)

where μ̂(x) is the estimated value of the mean response.

3.2.1 Generalized Linear Models

Consider a sample, y1, ..., yN, of independent random response variables, each having a density in the exponential family (Cordeiro and McCullagh 1991), that is,

f(yu; θu, φ) = exp[φ{yuθu − b(θu) + c(yu)} + d(yu, φ)],  u = 1, 2, ..., N,   (3.2)

where b(·), c(·), and d(·,·) are known functions and φ is a dispersion parameter. It follows that E(yu) = μu = db(θu)/dθu and Var(yu) = Vu/φ, where Vu = dμu/dθu (Cordeiro and McCullagh 1991, p. 631). The specification of a link function, η = g(μ), is required, where g(·) is a strictly monotone increasing function and η is a linear predictor. Let xu be a vector of control variables associated with the uth response value, yu, u = 1, 2, ..., N. The mean response at a point x in a region of interest is modeled as a function of the control variables of the form μ(x) = h[η(x)], where h(·) is the inverse function of g(·), the elements of x consist of the control variables, and η(x) is assumed to be a linear function of the form η(x) = f′(x)β. Here, f(x) is a vector of order p × 1 whose elements consist of powers and products of powers of the elements of x up to degree d(≥ 1), and β is a vector of p unknown parameters.

The method of maximum likelihood is utilized to estimate β, implemented through an iteratively reweighted least squares procedure as discussed in Khuri (1993) and McCullagh and Nelder (1989, pp. 40-43). Using the work of Khuri (1993), which considered RSM within the framework of GLMs, an approximate prediction variance of μ̂(x), the estimated value of the mean response at x, is given by

Var[μ̂(x)] ≈ (1/φ)(dμ/dη)² f′(x)[X′WX]⁻¹f(x)   (3.3)

where X is a matrix whose uth row is f′(xu), u = 1, 2, ..., N, and W = Diag(w1, ..., wN), where wu = (dμu/dηu)²/Vu, with dμu/dηu denoting the derivative of μ with respect to η evaluated at xu. Note that in Khuri (1993), wu = (dμu/dηu)²/Var(yu) = φ(dμu/dηu)²/Vu. The dispersion parameter, φ, has been factored out in expression (3.3).


3.2.2 Bias Computation

Difficulty in the computation of bias of parameter estimates has hindered the

use of bias as a criterion for the selection of designs for nonlinear models as well

as generalized linear models. Cox and Snell (1968) and Cox and Hinkley (1974)

developed a bias approximation formula for the maximum likelihood estimate of a

single parameter using likelihood theory and Taylor's series. Using tensor methods,

McCullagh (1987) developed a multiparameter version of the bias computation.
Bias results have been further developed for nonlinear regression models with

normal errors (Cook, Tsai, and Wei 1986) and with exponential family errors (Paula

1992). Cordeiro and McCullagh (1991) used the tensor methodology to develop bias

computation for generalized linear models. Khuri and Lee (1996) used Cook et al.

(1986) bias formula in the development of their graphical procedure for comparing

designs for nonlinear models. Cadigan (1994) presented a
bias computation method which did not require the use of the tensor

methodology. His method resulted in a relatively simple computational algorithm

for bias approximation. This method is summarized here.






Let L denote the log-likelihood function, where L = L(β) is a function of β, a p × 1 vector of parameters to be estimated. Let u(β) be a p × 1 vector of first-order partial derivatives of L with respect to β, and let j(β) be a p × p matrix of second-order partial derivatives of L [the Hessian matrix of L]. Fisher's information matrix is then given by I = −E[j(β)] = E[u(β)u′(β)].

Using likelihood theory and Taylor's series, Cadigan (1994, p. 92, formula (5)) developed the following second-order approximation of the bias of the maximum likelihood estimate, β̂:

Bias(β̂) ≈ I⁻¹{E[j(β)I⁻¹u(β)] + (1/2)E[∂ vec j(β)′/∂β] vec(I⁻¹)}   (3.4)

where, if A is m × n and A = [a1, ..., an], then vec(A) is the mn × 1 vector formed by stacking the columns of A:

vec(A) = (a1′, a2′, ..., an′)′.


A Bias Formula for Generalized Linear Models

Cadigan (1994) applied the bias formula (3.4) to GLMs. The resulting bias approximation for GLMs is given by Cadigan (1994) and Cordeiro and McCullagh (1991, eq. 4.2):

Bias(β̂) = −(1/2φ)(X′WX)⁻¹X′ZdF1N   (3.5)

where X and W are defined as before (see formula (3.3)), Zd = Diag(z11, ..., zNN), where zuu is the uth diagonal element of Z = X(X′WX)⁻¹X′, F = Diag(f11, ..., fNN), where fuu = (1/Vu)(dμu/dηu)(d²μu/dηu²), and 1N is an N × 1 vector of ones. Here, dμu/dηu and d²μu/dηu² are the first and second derivatives of μ with respect to η evaluated at xu.

Cordeiro and McCullagh (1991) further developed the following approximate expressions for the bias in η̂ = Xβ̂ and μ̂ = h(η̂), where μ̂ and η̂ are the vectors whose uth elements are μ̂(xu) = h[η̂(xu)] and η̂(xu) = f′(xu)β̂, respectively, u = 1, 2, ..., N:

Bias(η̂) = −(1/2φ) Z Zd F 1N   (3.6)

Bias(μ̂) = (1/2φ)(G2 − G1 Z F) Zd 1N   (3.7)

where G1 = Diag(dμu/dηu) and G2 = Diag(d²μu/dηu²) (Cordeiro and McCullagh 1991, eqs. 4.3 and 4.4). Expression (3.7) follows from considering the bias of μ̂, which is a function of η̂. Cordeiro and McCullagh (1991) noted that, to order n⁻¹,

Bias[μ̂(xu)] = Bias[η̂(xu)](dμu/dηu) + (1/2)Var[η̂(xu)](d²μu/dηu²),  u = 1, 2, ..., N.   (3.8)

Note that Var[η̂(x)] = Var[f′(x)β̂] ≈ (1/φ)f′(x)(X′WX)⁻¹f(x) and Bias[η̂(x)] ≈ f′(x)Bias(β̂). The expressions in (3.3) and (3.8) can then be used to get the mean-squared error of prediction for the mean response.

Cordeiro and McCullagh (1991) also noted that the bias of the maximum likelihood estimate, β̂, in generalized linear models can be computed through a simple weighted linear regression. Letting ξ = −(1/2φ)W⁻¹ZdF1N, Bias(β̂) in (3.5) can be written as

Bias(β̂) = (X′WX)⁻¹X′Wξ.   (3.9)

The bias of β̂ is then the vector of regression coefficients based on the weighted linear regression of ξ on X with W being the weight matrix.
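As a concrete, hypothetical instance of this weighted-regression computation (not an example from the source), consider a Poisson log-linear model with φ = 1: there dμ/dη = d²μ/dη² = V = μ, so that wu = μu, fuu = μu, and ξu reduces to −(1/2)zuu. The design and parameter values below are invented.

```python
import numpy as np

# Hypothetical Poisson log-linear illustration of the weighted-regression
# bias computation (3.9): with phi = 1, w_u = mu_u and xi_u = -(1/2) z_uu.
X = np.column_stack([np.ones(6), np.linspace(-1, 1, 6)])
beta = np.array([1.0, 0.5])
mu = np.exp(X @ beta)                  # mean responses under the log link
W = np.diag(mu)                        # GLM weight matrix, w_u = mu_u
XtWX_inv = np.linalg.inv(X.T @ W @ X)
Z = X @ XtWX_inv @ X.T
z = np.diag(Z)                         # leverages z_uu
xi = -0.5 * z                          # xi_u = -(1/2) z_uu for the log link
bias = XtWX_inv @ X.T @ W @ xi         # Bias(beta-hat) via formula (3.9)
```

The same vector is obtained from formula (3.5) directly, since W ξ = −(1/2) Zd F 1N here.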








3.3 Application to Logistic Models

Consider independent random variables, r1, ..., rn, with ru having a binomial distribution with index mu and probability of success πu (u = 1, ..., n). Hence, E(ru) = muπu and Var(ru) = muπu(1 − πu). Let xu be a vector of design settings of a set of control variables associated with ru (u = 1, 2, ..., n). Consider the sample proportions yu = ru/mu (u = 1, ..., n). Then, E(yu) = πu and Var(yu) = πu(1 − πu)/mu. The density for a sample proportion yu has therefore the following form:

exp[yumu log{πu/(1 − πu)} + mu log(1 − πu) + log C(mu, muyu)]   (3.10)

where C(·,·) denotes the binomial coefficient. This density is in the exponential family (see equation (3.2)) with φ = 1, θu = mu log{πu/(1 − πu)}, b(θu) = mu log(1 + exp(θu/mu)), c(yu) = 0, and d(yu, φ) = log C(mu, muyu).

Using a logit link function with a linear predictor to model the probability of success, πu, as a function of the control variables, xu, results in g[π(xu)] = log{π(xu)/[1 − π(xu)]} = f′(xu)β = η(xu) = ηu. Hence, the mean response is E[yu] = μ(xu) = π(xu) = exp[η(xu)]/{1 + exp[η(xu)]}.

For the calculation of Bias(β̂) = (X′WX)⁻¹X′Wξ (see expression (3.9)), W and ξ are needed. For W = Diag(w1, ..., wn), where wu = (dπu/dηu)²/Vu, we have wu = muπu(1 − πu). The uth element of ξ = −(1/2)W⁻¹ZdF1n is

ξu = −(1/2)(zuu/wu)(1/Vu)(dπu/dηu)(d²πu/dηu²) = −(1/2)zuu(d²πu/dηu²)/(dπu/dηu)
   = −(1/2)zuu(1 − 2πu) = zuu(πu − 1/2),  u = 1, 2, ..., n,

since dπu/dηu = exp(ηu)/[1 + exp(ηu)]² = πu(1 − πu) and d²πu/dηu² = πu(1 − πu)(1 − 2πu).

Consider the mean-squared error of prediction for π̂(x) at the setting, x, of the control variables: MSE[π̂(x)] = Var[π̂(x)] + {Bias[π̂(x)]}². Here, π̂(x) = exp[η̂(x)]/{1 + exp[η̂(x)]}, where η̂(x) = f′(x)β̂. By formula (3.3), an approximate variance







of π̂(x) is given by

Var(π̂(x)) ≈ (dπ(x)/dη(x))² f′(x)(X′WX)⁻¹f(x)
          = π(x)²(1 − π(x))² f′(x)(X′WX)⁻¹f(x).   (3.11)

For the prediction bias of π̂(x), formula (3.8) leads to the following bias expression:

Bias(π̂(x)) ≈ Bias[η̂(x)] dπ(x)/dη(x) + (1/2)Var[η̂(x)] d²π(x)/dη(x)²   (3.12)
           = Bias[η̂(x)]π(x)(1 − π(x)) + (1/2)Var[η̂(x)]π(x)[1 − π(x)][1 − 2π(x)].

Noting that Var[η̂(x)] = Var[f′(x)β̂] ≈ f′(x)(X′WX)⁻¹f(x) and Bias[η̂(x)] ≈ f′(x)Bias(β̂), we finally get from formulas (3.11) and (3.12) the following expression for the mean-squared error of prediction at x:

MSE(π̂(x)) = π²(x)(1 − π(x))² f′(x)(X′WX)⁻¹f(x)   (3.13)
           + {f′(x)Bias(β̂)π(x)(1 − π(x))
           + (1/2)f′(x)(X′WX)⁻¹f(x)π(x)[1 − π(x)][1 − 2π(x)]}²

It should be noted that MSE[π̂(x)] depends upon the unknown β. An estimate can be obtained by replacing β with β̂.
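The MSEP expression (3.13) can be assembled in a few lines. The sketch below uses a hypothetical one-variable logistic design; the design points, cell sizes mu, and β are all invented, and the function simply strings together formulas (3.9), (3.11), and (3.12).

```python
import numpy as np

# Minimal sketch of the MSEP computation (3.13) for a logistic model with
# linear predictor eta = b0 + b1*x; design, cell sizes, and beta are invented.
def mse_pred(x, beta, X, m):
    """Approximate MSE of pi-hat at a scalar point x (intercept + slope model)."""
    eta = X @ beta
    pi = 1 / (1 + np.exp(-eta))
    W = np.diag(m * pi * (1 - pi))                 # w_u = m_u pi_u (1 - pi_u)
    V = np.linalg.inv(X.T @ W @ X)
    z = np.diag(X @ V @ X.T)
    xi = z * (pi - 0.5)                            # xi_u = z_uu (pi_u - 1/2)
    bias_beta = V @ X.T @ W @ xi                   # Bias(beta-hat), formula (3.9)
    f = np.array([1.0, x])
    px = 1 / (1 + np.exp(-f @ beta))
    var_eta = f @ V @ f
    var = px**2 * (1 - px)**2 * var_eta            # formula (3.11)
    bias = (f @ bias_beta) * px * (1 - px) \
        + 0.5 * var_eta * px * (1 - px) * (1 - 2 * px)  # formula (3.12)
    return var + bias**2                           # formula (3.13)

X = np.column_stack([np.ones(5), np.linspace(-2, 2, 5)])
m = np.full(5, 50.0)
mse = mse_pred(0.5, np.array([0.0, 1.0]), X, m)
```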

3.4 Quantile Dispersion Graphs
As mentioned earlier, traditional single-valued design criteria may be
insufficient to fully characterize the efficiency of an experimental design. In order
to assess the prediction capability of an experimental design for linear response
surface models, Giovannitti-Jensen and Myers (1989) suggested the use of variance
dispersion graphs (VDGs). The VDGs consist of plots of the maxima and minima
of the prediction variance on concentric spheres contained inside an experimental
region, R. Khuri, Kim and Um (1996) proposed using plots of quantiles of the








prediction variance on concentric spheres as a means of getting more information

about the design's prediction capability than can be obtained from the VDGs.

The quantile plots allow complete assessment of the distribution of the prediction

variance on concentric spheres throughout the experimental region, R.

In the linear model situation, the prediction variance depends upon the

design used and the location at which prediction is to be made. In GLMs and

nonlinear models the mean-squared error of prediction (MSEP) is considered

because of biased parameter estimates. However, the MSEP now depends upon the

prediction location, x, the design settings, and on the unknown model parameters.

More specifically, the MSEP depends upon the unknown elements of β in the linear predictor. Let τD(x, β) = MSE[μ̂(x)], where D is a given design. After a design

D has been used in the fitting of a model, the prediction capability of the design

can be investigated by using an estimate of τD(x, β) obtained by replacing β with β̂. However, if no estimate of β is available or comparisons of several designs are of interest, some subset C of the parameter space of β must be considered in order to

assess the dependency of the MSEP on the unknown parameters, as we shall now

explain.

Quantile dispersion graphs of the MSEP, which we shall now define, can be utilized to evaluate and compare the prediction capability of designs. Consider several concentric surfaces, denoted by Rλ, located within the experimental region, R. These surfaces are obtained by reducing the boundary of R using a shrinkage factor λ. The prediction capability can be assessed by looking at the distribution of the MSEP on each of several of these concentric surfaces. More specifically,

the distribution of MSEP on a concentric surface is fully described in terms of its

quantiles. For a given design D and a given β ∈ C, let QD(p, β, λ) denote the pth quantile of the distribution of values of τD(x, β) on Rλ, for a specified value of the








shrinkage factor λ. Now, to assess the dependency of the quantiles of the MSEP on β, QD(p, β, λ) is computed for several values of β forming a grid of points inside the region C. Let us then consider the following quantities:

Qmax(p, λ) = max_{β∈C} QD(p, β, λ)   (3.14)

Qmin(p, λ) = min_{β∈C} QD(p, β, λ)   (3.15)

Plots of Qmax(p, A) and Qmin(p, A) against p result in the so-called quantile

dispersion graphs (QDGs) of the MSEP. Such plots, which can be constructed for

each of several values of A, provide a comprehensive assessment of the prediction

capability of a design D throughout the region R. Several designs can therefore be

compared by examining their corresponding QDG profiles. Similar QDGs were used by Khuri and Lee (1998) for the comparison of designs for nonlinear models (Khuri 1997; Lee and Khuri 1999).
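The construction of Qmax(p, λ) and Qmin(p, λ) can be sketched as follows. Here tau() is only a stand-in for the MSEP function τD(x, β), and the parameter grid C, the region bounds, and the shrinkage rule are invented toy values for a one-variable region.

```python
import numpy as np

# Conceptual QDG sketch: for each beta in a grid C and a given shrinkage
# factor lambda, compute quantiles of the MSEP over R_lambda, then take the
# max and min over C. tau() is a placeholder for tau_D(x, beta).
def tau(x, beta):
    return (1 + beta * x**2) / 20.0            # placeholder MSEP surface

C = [0.5, 1.0, 1.5]                            # toy grid of parameter values
p_grid = np.linspace(0, 1, 21)                 # p = 0(0.05)1

def qdg(lam, a=-1.0, b=1.0, n=200):
    lo = a + (1 - lam) * (b - a)               # reduced bounds defining R_lambda
    hi = b - (1 - lam) * (b - a)
    xs = np.linspace(lo, hi, n)
    Q = np.array([np.quantile(tau(xs, beta), p_grid) for beta in C])
    return Q.max(axis=0), Q.min(axis=0)        # Q_max(p, lam), Q_min(p, lam)

qmax, qmin = qdg(0.8)
```

Plotting qmax and qmin against p_grid, for each of several values of lambda, gives the QDG profiles.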


3.5 Numerical Examples

Example 1

Carter et al. (1986) utilized logistic regression in a cytotoxicity study to

investigate the dose-response curve for a combination of two agents. Cell cytotoxicity

for the combination of the two agents, methylmethanesulfonate (MMS) and phorbol

12-myristate 13-acetate (PMA), was evaluated by counting the number of viable and

dead cells. Interest was in modeling 7r(x), the probability of death, as a function of

x1 = concentration of MMS/10 and x2 = concentration of PMA.

The results of the experiment are shown in Table 3.1. Each of the two

agents had 4 levels. The fitted model is given in equation (3.16) and its estimated

parameters and standard errors are shown in Table 3.2.








Table 3.1: Experimental design and response values (Example 1)

    x1        x2       Number of        Total Number     yu = ru/mu
  MMS/10     PMA      Dead Cells ru     of Cells mu
    0          0           19               98              0.19
    0          1           24               87              0.28
    0         10           54               91              0.59
    0        100           68               87              0.78
    1          0           16               91              0.18
   10          0           17               90              0.19
   25          0           19               92              0.21
    1          1           19               94              0.20
    1         10           41               90              0.46
    1        100           79               93              0.85
   10          1           10               83              0.12
   10         10           36               85              0.42
   10        100           62               83              0.75
   25          1           36               92              0.39
   25         10           56               93              0.60
   25        100           74               87              0.85
                                          1436

log{π(x)/[1 − π(x)]} = β0 + β1x1 + β2x2 + β11x1² + β22x2²   (3.16)

The experimental region, R, of the agent combinations is rectangular with 0 ≤ x1 ≤ 25 and 0 ≤ x2 ≤ 100. Let the design shown in Table 3.1 be denoted by D1. To illustrate the proposed graphical technique, QDGs can be utilized to investigate the prediction capability of D1 and to compare it against two other designs, D2 and D3. The latter two designs have the same settings for x1 and x2 as in D1, but differ in the numbers of replications at the combinations of x1 and x2. Design D2 is balanced in the sense that the numbers of replications are equal with magnitudes








Table 3.2: Parameter Estimates and Model Analysis (Example 1)

 Parameter      Estimate        Std Error     P-Value
   β0         -1.329935000       0.1179      < 0.0001
   β1         -0.083535200       0.0266        0.0017
   β2          0.159119900       0.0163      < 0.0001
   β11         0.003883342       0.0010        0.0002
   β22        -0.001307519       0.00016     < 0.0001

Scaled Deviance = 13.9195    DF = 11



comparable to those for D1, which is considered nearly balanced. Design D3 is unbalanced with widely different numbers of replications. The total number of replications is the same as in D2. These designs are given in Table 3.3.

For each of the three designs in Table 3.3, we consider the distribution of τD(x, β) on each of several concentric rectangles, Rλ, obtained by a reduction of the boundary of the experimental region, R, using a shrinkage factor λ, 0.5 ≤ λ ≤ 1 (see Figure 3.1). This is done as follows: let the bounds on variable xi be ai and bi such that ai ≤ xi ≤ bi, i = 1, 2. For each value of λ, the corresponding reduced bounds on xi are

ai + (1 − λ)(bi − ai) ≤ xi ≤ bi − (1 − λ)(bi − ai).
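The reduced bounds, and the evaluation points on the boundary of a concentric rectangle (2000 points, 500 per side, as used in this chapter's computations), can be generated as in the following sketch for the Example 1 region 0 ≤ x1 ≤ 25, 0 ≤ x2 ≤ 100:

```python
import numpy as np

# Generate 2000 points (500 per side) on the boundary of the concentric
# rectangle R_lambda inside the region 0 <= x1 <= 25, 0 <= x2 <= 100.
def boundary_points(lam, bounds=((0, 25), (0, 100)), n_side=500):
    (a1, b1), (a2, b2) = bounds
    lo1, hi1 = a1 + (1 - lam) * (b1 - a1), b1 - (1 - lam) * (b1 - a1)
    lo2, hi2 = a2 + (1 - lam) * (b2 - a2), b2 - (1 - lam) * (b2 - a2)
    t1 = np.linspace(lo1, hi1, n_side)
    t2 = np.linspace(lo2, hi2, n_side)
    sides = [np.column_stack([t1, np.full(n_side, lo2)]),   # bottom side
             np.column_stack([t1, np.full(n_side, hi2)]),   # top side
             np.column_stack([np.full(n_side, lo1), t2]),   # left side
             np.column_stack([np.full(n_side, hi1), t2])]   # right side
    return np.vstack(sides)

pts = boundary_points(0.8)   # e.g. lambda = 0.8 gives 5 <= x1 <= 20, 20 <= x2 <= 80
```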

To investigate the dependency of τD(x, β) on β, a subset, C, of the parameter space for the β vector is considered. For each parameter, a range consisting of the point estimate (given in Table 3.2) plus/minus two standard errors was considered. The





















Table 3.3: Designs D1, D2, and D3 (Example 1)

    x1        x2          D1                D2                D3
  MMS/10     PMA     Number of Cells  Number of Cells  Number of Cells
    0          0           98                90                15
    0          1           87                90                25
    0         10           91                90                35
    0        100           87                90                45
    1          0           91                90                55
   10          0           90                90                65
   25          0           92                90                75
    1          1           94                90                85
    1         10           90                90                95
    1        100           93                90               105
   10          1           83                90               115
   10         10           85                90               125
   10        100           83                90               135
   25          1           92                90               145
   25         10           93                90               155
   25        100           87                90               165

  Total                  1436              1440              1440













Figure 3.1 -- Concentric rectangles within the region R (lambda = 1, 0.9, 0.8, 0.7, 0.6) (Example 1)








the point estimate and the two end points:

      β0  ∈ {-1.565667, -1.329935, -1.094203}
      β1  ∈ {-0.13676336, -0.08353520, -0.03030704}
C :   β2  ∈ {0.1264244, 0.1591199, 0.1918154}
      β11 ∈ {0.001821515, 0.003883342, 0.005945169}
      β22 ∈ {-0.0016228061, -0.0013075190, -0.0009922319}

For each design and a selected value of β in C, quantiles of the distribution of τD(x, β) are obtained for x ∈ Rλ, where λ is one of several values chosen from the interval (0.5, 1]. The number of points chosen on each Rλ was 2000, consisting of 500 on each side. The quantiles are calculated for p = 0(0.05)1. The procedure is repeated for the other values of β in the subset, C. Then, Qmax(p, λ) and Qmin(p, λ)

are calculated using formulas (3.14) and (3.15). The S-PLUS language (Becker,

Chambers, and Wilks 1988) was used for conducting the numerical investigation and

obtaining the actual plots.

The QDGs for the comparison of designs D1 and D2 are presented in Figure 3.2. The plots for the two designs overlap very closely, indicating a very small difference in the prediction capabilities of D1 and D2. We recall that D1 differs from D2 by having slightly unequal numbers of cells for the agent combinations. The dependency on the parameter vector β can be seen in the separation between the values of Qmax and Qmin, especially near the center of the experimental region (λ = 0.7, 0.6). The prediction capabilities of the designs appear to be sensitive to the parameter values. A desirable property for a design is to have close and small values of Qmax and Qmin over the range of p. The closeness of Qmax and Qmin indicates robustness, or lack of sensitivity, of the MSEP to the parameter values.















Figure 3.2 -- QDGs for Designs D1 and D2 (Example 1) [panels for lambda = 1, 0.9, 0.7, 0.6; estimated MSEP quantiles plotted against p from 0.0 to 1.0]







In order to see the effect of extremely "unbalanced" cell counts at the agent

combinations, designs D2 and D3 are compared in Figure 3.3. Once again, the

designs appear sensitive to the parameter values especially near the center of the

experimental region. However, the balanced design D2 appears to be somewhat

more robust to the parameter values than D3.

Example 2

Recognizing the dependency of design criteria on model parameters, Sitter

(1992) proposed a minimax procedure to obtain designs for binary data that

are robust to poor initial parameter estimates. A number of design optimality

criteria, including D-optimality, were considered. This minimax criterion resulted

in designs with more design points and greater coverage of the design space than

traditional design criteria. Experimental situations with one control variable, x,

were investigated. The probability of interest, π(x), was modeled as a function of β(x − μ). The corresponding linear predictor is β0 + β1x, where the slope parameter β1 is β and the intercept parameter β0 is −βμ. Here, μ is the 50% response dose, which is termed the ED50. This results in the following logistic regression model:

log{π(xu)/[1 − π(xu)]} = β[xu − μ] = β0 + β1xu,  u = 1, 2, ..., n.

Let xu be the uth design point, π(xu) the corresponding probability, and mu the number of observations at xu. The determinant of the information matrix, |X′WX|, for this model is equal to (Scazzero and Ord 1993, p. 257)

ΣΣ_{i<j} mimjπ(xi)[1 − π(xi)]π(xj)[1 − π(xj)][xi − xj]².

Insight into the D-optimality criterion, which maximizes this determinant, can

be gained by a close examination of this result. The criterion depends upon the















Figure 3.3 -- QDGs for designs D2 and D3 (Example 1) [panels for lambda = 1, 0.9, 0.7, 0.6; estimated MSEP quantiles plotted against p from 0.0 to 1.0]







probabilities as well as the design points, specifically through their spacings [xi − xj]². For the D-optimal design consisting of two design points, the determinant reduces to

m1m2π(x1)[1 − π(x1)]π(x2)[1 − π(x2)][x1 − x2]².

Sitter's (1992) minimax approach amounts to finding a design that minimizes the maximum of the reciprocal of this determinant over some parameter space of μ and β. Note that the determinant is close to 0 if either π(x1) or π(x2) is near 0 or 1. In this situation, the D-optimality criterion becomes difficult to interpret, and calculating the inverse of the information matrix can become problematic. The

inverse of the information matrix is used in calculating the MSEP, (see equation

(3.13)). Even though the design space may be unbounded in an experimental

situation, Atkinson (1995) noted that designs for logistic regression models are

constrained such that the probabilities are not too close to zero or one. For the

comparison of designs, it will also be necessary to be cautious in the selection of the

parameter space since the probabilities also depend upon the model parameters.

Sitter (1992, p. 1151) presented an illustration of the minimax procedure.

A logistic regression model was utilized in a study to investigate the economic

value of sport fishing in British Columbia. The control variable was the amount

of increased fishing cost, x, and the binary response consisted of fishing/not

fishing. Probability of fishing was to be modeled as a function of increased

fishing cost. The D-optimal design for initial parameter estimates μ = 40 and β = 0.9 consisted of the two design points x = 38.28, 41.72. The minimax D-optimal design produced by Sitter's minimax procedure consisted of the 7 design points x = 31.72, 34.48, 37.24, 40.00, 42.76, 45.52, 48.28. The minimax design was to be robust over the parameter space given by 33 ≤ μ ≤ 47 and 0.5 ≤ β ≤ 1.25. Both designs assume equal numbers of observations at the design points.








Sitter noted that the minimax D-optimal design performed better than the D-optimal design for most of the parameter space. Sitter also noted that the performance of the D-optimal design was very sensitive to the initial estimate of μ. The D-optimal design was better than the minimax D-optimal design for initial estimates of μ that were nearly correct. As noted before, the D-optimality criterion for the two-point D-optimal design can be affected if the probabilities get close to zero or one. For this example, the probabilities for the D-optimal design equal π(38.28) = 1/(1 + exp(-β[38.28 - μ])) and π(41.72) = 1/(1 + exp(-β[41.72 - μ])). Note that as μ moves away from the initial estimate of 40, the probabilities can get very close to zero or one regardless of the value of β. Thus over much of the parameter space, the D-optimal design has a determinant close to zero. As a result, the parameter space for μ was restricted to 38 ≤ μ ≤ 42 in order to compare the designs in terms of prediction capability.
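This sensitivity can be checked numerically. The sketch below (helper names are ours; the 70 runs of the comparison are split as m_1 = m_2 = 35) evaluates the determinant m_1 m_2 π(x_1)[1 - π(x_1)] π(x_2)[1 - π(x_2)][x_1 - x_2]^2 at the D-optimal points x = 38.28, 41.72 as μ moves away from its initial estimate of 40:

```python
import math

def pi(x, mu, beta):
    """Logistic success probability pi(x) = 1/(1 + exp(-beta*(x - mu)))."""
    return 1.0 / (1.0 + math.exp(-beta * (x - mu)))

def d_criterion(x1, x2, mu, beta, m1=35, m2=35):
    """Determinant m1*m2*pi(x1)[1-pi(x1)]*pi(x2)[1-pi(x2)]*(x1-x2)^2
    of the information matrix for a two-point logistic design."""
    p1, p2 = pi(x1, mu, beta), pi(x2, mu, beta)
    return m1 * m2 * p1 * (1 - p1) * p2 * (1 - p2) * (x1 - x2) ** 2

# D-optimal points for the initial guesses mu0 = 40, beta0 = 0.9
x1, x2 = 38.28, 41.72
det_at_guess = d_criterion(x1, x2, mu=40.0, beta=0.9)
det_far = d_criterion(x1, x2, mu=46.0, beta=0.9)  # mu near the edge of Sitter's space
# The determinant collapses as mu moves away from the initial estimate of 40.
```

At μ = 46 the determinant is well under 1% of its value at μ = 40, which is the behavior motivating the restriction of μ to [38, 42] above.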

Note also the presence of π(x) and 1 - π(x) in the MSEP expression in equation (3.13). We should therefore be cautious about probability values near zero or one. An experimental region, R: 30 ≤ x ≤ 50, was then considered to control the probability values and to agree with the actual regions covered by the two designs.

The following is a graphical comparison of the prediction capabilities of the two designs for the model log[π(x_u)/(1 - π(x_u))] = β_0 + β_1 x_u, u = 1, 2, ..., n. The parameter space C for (β_0, β_1) is trapezoidal and consists of 0.5 ≤ β_1 ≤ 1.25 and -β_1(42) ≤ β_0 ≤ -β_1(38). This parameter space corresponds to 38 ≤ μ ≤ 42 and 0.5 ≤ β ≤ 1.25, where the slope parameter β_1 is β and the intercept parameter β_0 is -μβ_1. The experimental region investigated was 30 ≤ x ≤ 50, and the designs consisted of 70 runs with equal weight given to the design points. For selected parameter values (β_0, β_1) in C, the MSEP [see equation (3.13)] can be calculated throughout the experimental region. Since in this example there is only one control variable, no







shrinkage of the experimental region was necessary. Minimum and maximum MSEP

plots for points in the experimental region can be constructed where the minimum

and maximum are obtained over the parameter space C. The plots for the two

designs are given in Figure 3.4. We can see that the D-optimal design does well near

the center of the experimental region while the minimax D-optimal design performs

better away from the center. The quantile dispersion graphs for the two designs are

given in Figures 3.5 and 3.6. From these plots, it appears that the dispersion of the

quantiles for the minimax D-optimal design is less than the D-optimal design. From

this, it appears that the minimax design is somewhat more robust to the parameter

values. Figure 3.7 contains an overlaying display of the QDGs for the two designs.

It is evident that the minimax design performs slightly better than the D-optimal

design. Sitter (1992) claimed that the minimax design performed much better over

a large portion of the parameter space. His claim can be partly understood by

considering the sensitivity of the determinant of the two-point D-optimal design to the initial estimate of μ. For Sitter's parameter space, the two-point D-optimal design has a determinant close to zero for much of the parameter space of μ. However, our

graphical analysis used a restricted parameter space, as was seen earlier, to allow

for a fair comparison of the two designs. In this analysis, the minimax D-optimal

design still appears to perform better than the D-optimal design, but not to the

level claimed by Sitter.


Example 3

Piegorsch (1998) provided an introduction to modeling proportion data, which are often encountered in engineering/quality settings. He presented an example involving alloy fasteners used in aircraft construction (Montgomery and Peck 1982, sec. 6.3). Interest was in modeling the probability of fastener failure
















Figure 3.4 -- Minimax D-optimal and D-optimal Designs: MAX and MIN MSEP (Example 2)





















Figure 3.5 -- QDG for the Minimax D-optimal Design (Example 2)


Figure 3.6 -- QDGs for the D-optimal Design (Example 2)


Figure 3.7 -- Combined QDGs for the Minimax D-optimal & D-optimal Designs (Example 2)








Table 3.4: Experimental design and response values (Example 3)

X = Pressure Load   coded x = (2X - 6800)/1800   m = Total Fasteners   y = Number Failing
2500                -1                            50                   10
2700                -7/9                          70                   17
2900                -5/9                         100                   30
3100                -3/9                          60                   21
3300                -1/9                          40                   18
3500                 1/9                          85                   43
3700                 3/9                          90                   54
3900                 5/9                          50                   33
4100                 7/9                          80                   60
4300                 1                            65                   51

Table 3.5: Parameter Estimates and Model Analysis using coded x (Example 3)

Parameter   Estimate   Std Error   P-Value
β_0         -0.0750    0.0830      0.3657
β_1          1.3936    0.1418      0.0001

Scaled Deviance = 0.3719   DF = 8



as a function of pressure load. The results of the experiment are given in Table 3.4. Ten different pressure loads were considered, with the number of fasteners and the number of failures recorded at each. The fitted model is given in equation (3.17), and its estimated parameters and standard errors are shown in Table 3.5.




log[π(x)/(1 - π(x))] = β_0 + β_1 x                    (3.17)
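Table 3.5 can be reproduced (up to rounding) by fitting model (3.17) by maximum likelihood. The following is a minimal pure-Python Newton-Raphson (IRLS) sketch using the Table 3.4 data on the coded scale; the function name fit_logistic is ours, and the original analysis used standard GLM software rather than this code:

```python
import math

# Alloy-fastener data from Table 3.4: coded pressure load x, fasteners m, failures y
x = [-1, -7/9, -5/9, -3/9, -1/9, 1/9, 3/9, 5/9, 7/9, 1]
m = [50, 70, 100, 60, 40, 85, 90, 50, 80, 65]
y = [10, 17, 30, 21, 18, 43, 54, 33, 60, 51]

def fit_logistic(x, m, y, iters=25):
    """Newton-Raphson (IRLS) for log[pi/(1-pi)] = b0 + b1*x with binomial counts."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Score vector and 2x2 information matrix at the current estimates
        u0 = u1 = i00 = i01 = i11 = 0.0
        for xi, mi, yi in zip(x, m, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = mi * p * (1 - p)
            u0 += yi - mi * p
            u1 += (yi - mi * p) * xi
            i00 += w; i01 += w * xi; i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        b0 += ( i11 * u0 - i01 * u1) / det   # Newton step via 2x2 inverse
        b1 += (-i01 * u0 + i00 * u1) / det
    return b0, b1

b0, b1 = fit_logistic(x, m, y)   # close to the Table 3.5 values -0.0750 and 1.3936
```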




The D-optimal design for experimental situations using model (3.17) is given by Minkin (1987). This design is a two-point design with x-values x_1 = -(1.5434 + β_0)/β_1 and x_2 = (1.5434 - β_0)/β_1. This design illustrates the design dependency problem. Using the estimates from Table 3.5, in retrospect, results in x_1 = -1.053655 and x_2 = 1.161344, which are values outside the explored experimental region.
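A quick check of these values, using Minkin's expressions with the Table 3.5 estimates (variable names are ours):

```python
# Minkin's (1987) D-optimal points for the single-variable logistic model,
# evaluated at the Table 3.5 estimates on the coded scale:
#   x1 = -(1.5434 + b0)/b1,  x2 = (1.5434 - b0)/b1
b0, b1 = -0.0750, 1.3936
x1 = -(1.5434 + b0) / b1
x2 = (1.5434 - b0) / b1
# Both points fall outside the explored region [-1, 1]: the design dependency problem.
outside = abs(x1) > 1 and abs(x2) > 1
```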

To investigate the prediction capabilities of the design, it will be compared to a design with the same pressure loads but with equal replication (69 fasteners) at each of the loads. The design in Table 3.4 will be denoted as D_a and the balanced design as D_b. The parameter space investigated consisted of the point estimates, the point estimates plus/minus one standard error, and the point estimates plus/minus two standard errors. Figures 3.8 and 3.9 contain the max and min plots of SMSEP and the QDGs of the SMSEP for the two designs. It is evident that the two designs exhibit very similar prediction capabilities. There is considerable variation in the SMSEP throughout the experimental region, it being low near the center and larger near the boundaries. The designs do, however, appear to be robust to the parameter values.


3.6 Conclusions

The proposed graphical technique, based on using the QDGs for the mean

squared error of prediction, provides a procedure for evaluating and comparing

designs for GLMs. The QDGs provide information concerning the prediction

capability of a design and its sensitivity to the parameter values. This information

is useful in the evaluation and comparison of designs.

Myers (1999), in considering designs for GLMs, stated that standard response

surface designs may work well in such experimental situations. Because of the

dependency of design performance on the model parameters, it is difficult to judge

the performance of such designs. Myers reiterated the need for robust designs that

are efficient despite the dependency on model parameters. Further work with QDGs














Figure 3.8 -- D_a and D_b Designs: MAX and MIN MSEP (Example 3)














Figure 3.9 -- Combined QDGs of MSEP for the D_a & D_b Designs (Example 3)








and other graphical techniques may be useful in evaluating standard response

surface designs and in the search for robust designs.














CHAPTER 4
EVALUATION AND COMPARISON OF STANDARD RSM
DESIGNS UNDER NON-STANDARD CONDITIONS

4.1 Introduction

The development of a graphical technique for the evaluation and comparison

of experimental designs for logistic regression models was presented in chapter 3.

The Quantile Dispersion Graphs (QDGs) give a graphical representation of the

prediction capabilities of a design based on the distribution of the mean-squared

error of prediction. They also convey information about the effects of the design,

location of prediction, and parameter values on the prediction capability of a

particular experimental situation.

This chapter examines the implementation of the QDGs for the evaluation

and comparison of classical RSM designs when utilized in non-standard conditions.

A discussion is also presented concerning the General Equivalence Theorem and the

sequential generation of optimal designs.


4.2 Preliminary Results

Extensive research has been done in the area of experimental design for linear

models with the traditional error assumptions of normality, independence, and

constant variance. For the first-order model,
y = β_0 + Σ_{i=1}^k β_i x_i + ε,

where x1, x2,..., Xk is a set of control variables, the class of orthogonal designs has

many desirable properties. This class includes 2^k factorial designs, certain fractions










thereof, as well as simplex and Plackett-Burman designs (Khuri and Cornell 1996,

chap. 3). For the second-order model,
y = β_0 + Σ_{i=1}^k β_i x_i + Σ_{i=1}^k β_ii x_i^2 + Σ_{i=1}^{k-1} Σ_{j=i+1}^k β_ij x_i x_j + ε,

examples of common designs include the 3^k factorial designs, Box-Behnken designs,

and central composite designs (CCD) (Khuri and Cornell 1996, chap. 4).

In a non-standard situation, when the response is not measured on a

continuous scale, or when the traditional assumptions fail, a researcher may continue

to consider using these classical RSM designs. In such situations, how do these

designs measure up? Myers (1999, p. 34) states that standard RSM designs may

work well in these situations, but wonders, "How would one know?"

The QDGs, which were developed in chapter 3, can be used to evaluate and

compare the prediction capabilities of classical RSM designs. In the comparison of

these RSM designs it is not only of interest to compare designs against one another,

but also to compare the performance of the designs to an "optimal" design. Myers

(1999, p. 40) states that any comparison of designs must include a comparison with

an "optimal" design. Basically, the optimal design will serve as a base-line or a

reference point with which to compare the designs.


4.2.1 Design Theory

The concepts of design measure and moment matrix are important ideas

in the theory of optimal designs and will be discussed in this section. Khuri and

Cornell (1996, chap. 12, sec. 2) provide an introduction to optimal design theory.

An experimental design over an experimental region, R, can be defined as a probability measure and is referred to as a design measure, ξ. The design measure ξ satisfies the two properties ξ(x) ≥ 0 for all x ∈ R and ∫_R dξ(x) = 1.







A design measure is said to be discrete if it consists of n points in R where ξ(x_k) = m_k/N if x_k is a design point, and 0 otherwise (k = 1, 2, ..., n). Note that m_k denotes the number of observations collected at the kth distinct design point (k = 1, 2, ..., n) and N = Σ_{k=1}^n m_k. All other design measures are called continuous design measures.
For the definition of the moment matrix, consider a linear model of the form

y = f'(x)β + ε.

The moment matrix of a design measure ξ is

M(ξ) = [m_ij(ξ)] = [∫_R f_i(x) f_j(x) dξ(x)],

where f_i(x) is the ith element of f(x) evaluated at a point x ∈ R (i = 1, 2, ..., p). For a discrete design ξ, it follows that M(ξ) = Σ_{k=1}^n (m_k/N) f(x_k) f'(x_k) = (1/N) X'X, where X results from expressing the linear model in matrix notation, that is, y = Xβ + ε. Using the moment matrix, the standardized prediction variance function is defined to be d(x, ξ) = f'(x)[M(ξ)]^{-1} f(x).
A design measure is defined to be D-optimal if it maximizes |M(ξ)| over the class of all design measures. A design measure is G-optimal if it minimizes, over the class of all design measures, the maximum standardized prediction variance over the experimental region R.
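As a concrete check of these definitions, the sketch below (helper names are ours) builds M(ξ) and d(x, ξ) for the first-order model in two variables, f(x) = (1, x_1, x_2)'. For the 2^2 factorial design with equal weights 1/4, M(ξ) is the 3x3 identity, and d attains the value p = 3 at the design points:

```python
def moment_matrix(points, weights):
    """M(xi) = sum_k (m_k/N) f(x_k) f'(x_k) for f(x) = (1, x1, x2)'."""
    f = lambda x: (1.0, x[0], x[1])
    M = [[0.0] * 3 for _ in range(3)]
    for xk, w in zip(points, weights):
        fx = f(xk)
        for i in range(3):
            for j in range(3):
                M[i][j] += w * fx[i] * fx[j]
    return M

def d_value(x, Minv):
    """Standardized prediction variance d(x, xi) = f'(x) [M(xi)]^{-1} f(x)."""
    fx = (1.0, x[0], x[1])
    return sum(fx[i] * Minv[i][j] * fx[j] for i in range(3) for j in range(3))

# 2^2 factorial design with equal weights 1/4
pts = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
M = moment_matrix(pts, [0.25] * 4)
# M is the 3x3 identity here, so M^{-1} = M and can be passed directly
d_corner = d_value((1, 1), M)   # equals p = 3 at a design point
d_center = d_value((0, 0), M)   # equals 1 at the center
```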

4.2.2 Equivalence Theorem and Extensions for GLMs

The Equivalence Theorem of Kiefer and Wolfowitz (1960) established a relationship between D-optimality and G-optimality. This result has been utilized in the evaluation of designs as well as in the construction of designs (Wynn 1970). The Equivalence Theorem basically establishes three design properties that are equivalent and, hence, occur simultaneously. When applied to linear models, the Equivalence Theorem gives the result that a D-optimal design measure ξ* satisfies the following properties simultaneously:

ξ* maximizes |M(ξ)| over the class of all design measures;

ξ* minimizes max_x d(x, ξ) over the class of all design measures;

max_x d(x, ξ*) = p,

where for linear models, d(x, ξ) = f'(x)[M(ξ)]^{-1} f(x) is the standardized prediction variance, and p is the number of parameters in the linear model. This result gives the equivalence of D- and G-optimal designs for linear models. Note that for a discrete design measure, d(x, ξ) = N Var[ŷ(x)]/σ², where Var[ŷ(x)] is the prediction variance at x.
variance at x.

White (1973) provided an extension of the Equivalence Theorem to nonlinear models by giving a generalization of the variance function, d(x, ξ). Noting the dependence of designs upon the parameters of a given nonlinear model, White defines a design measure ξ* to be D(β)-optimal if it maximizes the determinant of M(ξ, β) for a given β over the class of all design measures. The generalization of d(x, ξ) is given by

d(x, ξ, β) = trace{I(x, β)[M(ξ, β)]^{-1}},                    (4.1)

where I(x, β) is the information matrix at the point x. The (i, j)th element of I(x, β) is

E[(∂L/∂β_i)(∂L/∂β_j)],

where L is the log of the probability density function or probability mass function associated with a random variable observed at the point x (White 1973, p. 345).








The information matrix at x for GLMs can be written as (Atkinson and Haines 1996, p. 456)

I(x, β) = [1 / (Var[y(x)]{g'[μ(x)]}²)] f(x)f'(x).                    (4.2)

The Equivalence Theorem can be extended by defining a design to be G(β)-optimal if it minimizes max_x d(x, ξ, β) for a given β over the class of all design measures. The Extended Equivalence Theorem given by White states that the following three properties of a design measure ξ* are equivalent:

ξ* is D(β)-optimal over the class of all design measures;

ξ* is G(β)-optimal over the class of all design measures;

max_x d(x, ξ*, β) = p.

Application of the Extended Equivalence Theorem to GLMs

For generalized linear models, the response y(x) has a distribution in the exponential family with mean E[y(x)] = μ(x). The linear predictor, η(x) = f'(x)β, is related to the mean through a strictly monotonic increasing link function, g(·). The relationship is η(x) = g[μ(x)] = f'(x)β, and using the inverse function h(·) = g^{-1}(·) gives the result that μ(x) = h(η(x)). The model for the response can be written as

y(x) = h[η(x)] + ε,

where the random error, E, has a distribution in the exponential family (see chapter

3, section 3.2, equation 3.2). The moment matrix for GLMs for a discrete design







measure ξ is

M(ξ, β) = (1/(Nφ)) X'WX = Σ_{k=1}^n (m_k/(Nφ)) w(x_k, β) f(x_k)f'(x_k),                    (4.3)

where φ is the dispersion parameter, the summation is over the n distinct design points with replications m_k, N = Σ_{k=1}^n m_k, and W is a diagonal matrix whose uth diagonal element is w(x_u, β) = 1/(Var[y(x_u)]{g'[μ(x_u)]}²), u = 1, 2, ..., N. The information matrix, I(x, β), at x is w(x, β) f(x)f'(x) (see equation (4.2)). Using this, the variance function of White (1973) in (4.1) can be written as follows:

d(x, ξ, β) = trace{I(x, β)[M(ξ, β)]^{-1}}
           = trace{w(x, β) f(x)f'(x)[M(ξ, β)]^{-1}}
           = w(x, β) f'(x)[M(ξ, β)]^{-1} f(x).

A motivation for the above result can be given by recalling that in linear models the standardized prediction variance is d(x, ξ) = N Var[ŷ(x)]/σ² for a discrete measure ξ. For GLMs there is a choice, however, concerning the use of either the prediction variance of the linear predictor, η̂(x), or that of the estimated mean response, μ̂(x). The following result establishes that the two approaches are equivalent and agree with the variance function of White (1973).

Under the GLM setting, consider η̂(x) = f'(x)β̂ and μ̂(x) = h[η̂(x)]. The prediction variance of η̂(x), Var[η̂(x)], is approximately equal to φ f'(x)[X'WX]^{-1} f(x), and the prediction variance of μ̂(x), using a first-order Taylor series, is Var[μ̂(x)] ≈ φ f'(x)[X'WX]^{-1} f(x){h'[η(x)]}². Applying the







analogue of the scaling from linear models results in

(N/Var[y(x)]) Var[μ̂(x)] = f'(x)[M(ξ, β)]^{-1} f(x) {h'[η(x)]}² / Var[y(x)]
                        = f'(x)[M(ξ, β)]^{-1} f(x) · 1/(Var[y(x)]{g'[μ(x)]}²)
                        = w(x, β) f'(x)[M(ξ, β)]^{-1} f(x)
                        = d(x, ξ, β).

Therefore, the two approaches of getting d(x, ξ, β) are equivalent.

Example: Logistic Regression Models

Binary response data are commonly modeled by using logistic regression models. The binary response, y(x), has the binomial distribution with index 1 and probability of success π(x). The mean of the response is μ(x) = π(x) and the variance is Var[y(x)] = π(x)[1 - π(x)]. Using the logistic link function gives log[π(x)/(1 - π(x))] = f'(x)β = η(x). The weight, w(x_k, β), equals π(x_k)[1 - π(x_k)], and the moment matrix for a discrete design measure ξ is M(ξ, β) = (1/N) X'WX = Σ_{k=1}^n (m_k/N) π(x_k)[1 - π(x_k)] f(x_k)f'(x_k), where m_k is the number of experimental units at the kth distinct experimental run (k = 1, 2, ..., n). The standardized prediction variance, d(x, ξ, β), follows as

π(x)[1 - π(x)] f'(x)[M(ξ, β)]^{-1} f(x) = N π(x)[1 - π(x)] f'(x)[X'WX]^{-1} f(x).                    (4.4)

This is the appropriate variance function for using the Extended Equivalence Theorem and establishing optimal design results. Note that since the variance of the response is no longer constant, scaling of the prediction variance does not remove the variance term completely. The presence of the binomial variance, π(x)[1 - π(x)],








is evident in the variance function. The variance function incorporates the variance

of the response, y(x), at the location, x.
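As a numerical check of (4.4) and the Extended Equivalence Theorem, the sketch below (helper names are ours) evaluates d(x, ξ, β) for the known two-point D-optimal design of the single-variable logistic model with β_0 = 0 and β_1 = 1: equal weights at ±1.5434, the points where π equals 0.824 and 0.176. The function attains its maximum p = 2 at the design points:

```python
import math

def pi(x):  # logistic mean with beta0 = 0, beta1 = 1
    return 1.0 / (1.0 + math.exp(-x))

def d_logistic(x, design):
    """d(x, xi, beta) = pi(x)[1-pi(x)] f'(x) M^{-1} f(x) with f(x) = (1, x)' and
    M = sum_k w_k pi(x_k)[1-pi(x_k)] f(x_k) f'(x_k) over (point, weight) pairs."""
    m00 = m01 = m11 = 0.0
    for xk, wk in design:
        v = wk * pi(xk) * (1 - pi(xk))
        m00 += v; m01 += v * xk; m11 += v * xk * xk
    det = m00 * m11 - m01 * m01
    quad = (m11 - 2 * m01 * x + m00 * x * x) / det   # f'(x) M^{-1} f(x) for 2x2 M
    return pi(x) * (1 - pi(x)) * quad

xd = 1.5434                       # approximately log(0.824/0.176)
design = [(-xd, 0.5), (xd, 0.5)]  # two-point D-optimal design, equal weights
d_at_point = d_logistic(xd, design)                       # = p = 2
d_max = max(d_logistic(i / 100, design) for i in range(-300, 301))
```

By the Extended Equivalence Theorem, the maximum over the region never exceeds p = 2 for this design.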


4.2.3 Sequential Generation of D-optimal Designs

Using the Equivalence Theorem of Kiefer and Wolfowitz (1960), Wynn (1970)

proposed a sequential procedure for the generation of D-optimal designs for a linear

model. Wynn's procedure generates a sequence of designs as follows:

1. Start with an initial discrete design, D, with no points, x1, X2.,.. Xno, which

support the fitting of the linear model.

2. Find a point, xa, which maximizes the variance function,


d(x,DO) = f'(x)[M(Do)]-l f(x).

Augment Do by adding the point x0 to get the design D1.

3. Repeat step (2) to obtain a sequence of designs D, D1,... Dn,... where D"

is obtained from Dn-1 by adding a point of maximum standardized prediction

variance based upon using D"1.

Wynn established that this sequence of designs approaches the D-optimal design. In

practice, the procedure is performed until "convergence" of the variance function to

p is attained.

A natural extension of the sequential procedure for the generation of designs

for GLMs and nonlinear models would involve the use of the Extended Equivalence

Theorem and the standardized prediction variance function. The procedure would

require a specification of initial parameter values and, therefore, the resulting design

would be locally optimal.
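A minimal sketch of such a sequential procedure for the single-variable logistic model, under the initial values β_0 = 0 and β_1 = 1 used in the illustration that follows (the grid search and helper names are ours):

```python
import math

def pi(x):
    return 1.0 / (1.0 + math.exp(-x))  # logistic mean for beta = (0, 1)'

def max_d(points):
    """Maximize d(x, D, beta) over a grid on R = [-1, 1] for the current design,
    where d = pi(x)[1-pi(x)] f'(x) M^{-1} f(x), f = (1, x)', equal weight per run."""
    n = len(points)
    m00 = m01 = m11 = 0.0
    for xk in points:
        v = pi(xk) * (1 - pi(xk)) / n
        m00 += v; m01 += v * xk; m11 += v * xk * xk
    det = m00 * m11 - m01 * m01
    best = (-1.0, None)
    for i in range(201):
        x = -1.0 + i / 100.0
        d = pi(x) * (1 - pi(x)) * (m11 - 2 * m01 * x + m00 * x * x) / det
        if d > best[0]:
            best = (d, x)
    return best

design = [-1.0, 0.0, 1.0]           # initial design D0
d0, x0 = max_d(design)              # about 2.417, attained at an endpoint
while len(design) < 25:             # Wynn-style augmentation at the max-d point
    design.append(max_d(design)[1])
d25, _ = max_d(design)              # about 2.031, converging toward p = 2
```

This reproduces the behavior described below: the initial maximum 2.417 occurs at a boundary point, and after the design reaches 25 points the maximum variance function drops to about 2.031.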








Sequential Generation of Design: Single-Variable Logistic Model

For the single-variable logistic model, log[π(x)/(1 - π(x))] = β_0 + β_1 x, the D-optimal design is known to be a two-point symmetric design with equal replication at the x values corresponding to effective dose (ED) probabilities of 0.824 and 0.176 (Abdelbasit and Plackett 1983; Minkin 1987).

To illustrate the sequential generation of designs, the procedure was applied to the single-variable logistic model. Initial parameter values of 0 for the intercept, β_0, and 1 for the slope, β_1, were used, resulting in the model log[π(x)/(1 - π(x))] = x. Let β_0 = (0, 1)'. The D-optimal design is then to have equal replications at log(0.176/0.824) = -1.5437 = x_1 and log(0.824/0.176) = 1.5437 = x_2. The design, D_0, consisting of the 3 points -1, 0, and 1 with equal weights, i.e., m_k/N = 1/3, was used as the initial design, and the experimental region, R, was selected to be [-1, 1]. The standardized
design and the experimental region, R, was selected to be [-1,1]. The standardized

prediction variance function in (4.4) was utilized. The maximum of d(x, D_0, β_0) over R is equal to 2.417 and is attained at the location x = -1. This value was added to

the initial design to obtain the design D1. Application of the sequential procedure

either added the value of x = -1 or 1 to the previous design. The design with 25

points resulted in a maximum variance function of 2.031 while the design with 75

points resulted in a value of 2.010. As can be seen, the D-optimal design consists

of points located on the boundary of the experimental region, as is the case with

D-optimal designs for linear models.

The above procedure was performed again using an experimental region

[-2, 2] and the same initial 3-point design, D_0. The initial design had a maximum

variance function value of 3.694 at the location x = 2. This point was added to the

initial design and the resulting design had a maximum variance function value of

3.460 at the point x = -1.94. The sequential generation of designs was continued

until a 77-point design had a maximum variance function value of 2.010. The








Table 4.1: Frequency Table For Sequential Design Points


X Frequency Percent
-1.94 1 1.3
-1.78 1 1.3
-1.68 1 1.3
-1.64 1 1.3
-1.6 1 1.3
-1.58 2 2.6
-1.56 7 9.1
-1.54 23 29.9
-1 1 1.3
0 1 1.3
1 1 1.3
1.54 22 28.6
1.56 7 9.1
1.58 2 2.6
1.6 1 1.3
1.64 1 1.3
1.68 1 1.3
1.76 1 1.3
1.94 1 1.3
2 1 1.3



structure of the design can be seen in Table 4.1. The design has converged towards

the D-optimal design of equal replication at x_1 = -1.5437 and x_2 = 1.5437. The

starting design points of -1, 0, and 1 are clearly seen to be very poor points. Recall

that this design is a locally D-optimal design for the initial parameter values of β_0 = 0 and β_1 = 1. Atkinson and Donev (1992, p. 119) note that at this point

there are several approaches for more precisely finding the D-optimal design. These

approaches include exchange algorithms in which the poor points from the initial

design are dropped and the sequential procedure is started again. Other possibilities

include analytical optimization, if possible, and numerical methods which use the

design from the sequential procedure as a starting point.








4.3 Scaled Mean-Squared Error of Prediction

In chapter 3, the development of the QDGs involved the use of the

mean-squared error of prediction (MSEP). The MSEP incorporates both the

prediction variance and prediction bias.

MSE[μ̂(x)] = Var[μ̂(x)] + {Bias[μ̂(x)]}².

The Equivalence Theorem establishing the relationship between D-optimality and

G-optimality uses a standardized prediction variance function. For linear models,

this function results in a quantity that is free of scale and is placed on a per

observation basis. Giovannitti-Jensen and Myers (1989), in their development of

variance dispersion graphs, utilized the variance function, d(x, ξ), for linear models.

Recently, Khuri and Lee (1998) proposed quantile plots of the estimated scaled

mean-squared error of prediction for the evaluation and comparison of designs for

nonlinear models. The scaled mean-squared error of prediction is given by
(N/Var[y(x)]) MSE[μ̂(x)] = (N/Var[y(x)]) Var[μ̂(x)] + (N/Var[y(x)]) {Bias[μ̂(x)]}².                    (4.5)

The first portion can be recognized as the standardized prediction variance function, d(x, ξ, β). This portion depends on the location of prediction, x, the parameters, β, and the experimental design. This quantity can be evaluated by using the design weights, m_k/N, which enter in the moment matrix, M(ξ, β). From equation (4.3), we note that the moment matrix depends only upon the design weights and the distinct design points of the experimental design. Working with the design weights gives flexibility in the consideration of designs and removes the dependency upon N, since the computation of d(x, ξ, β) would only require knowledge of the distinct design







points and the design weights. Rewriting the second portion in (4.5) as

(N/Var[y(x)]) {Bias[μ̂(x)]}² = (1/(N Var[y(x)])) {N Bias[μ̂(x)]}²

also allows the computation of N Bias[μ̂(x)] in terms of the design only, through the distinct design points and the design weights. In order to see this, consider that

N Bias[μ̂(x)] = N h'[η(x)] Bias[η̂(x)] + (N/2) Var[η̂(x)] h''[η(x)]                    (4.6)


(see chapter 3, section 3.2, equation 3.8). We consider the two portions of equation (4.6) separately. From the first portion, consider N Bias[η̂(x)] = N f'(x) Bias(β̂). It needs to be shown that N Bias(β̂) can be calculated using the distinct design points and the design weights. From section 3.2 of chapter 3, equation (3.5), it follows that

N Bias(β̂) = N(-1/2)(X'WX)^{-1} X'Z_d F 1_N
           = (-1/2)[(1/N) X'WX]^{-1} X'Z_d F 1_N
           = (-1/2)[M(ξ, β)]^{-1} X'Z_d F 1_N.
The moment matrix can again be computed using the distinct design points and the design weights. It needs to be established that X'Z_d F 1_N can be computed using the distinct design points and design weights as well. Z_d is a diagonal matrix which can be written as a block diagonal matrix with the matrices z_kk I_{m_k} along the diagonal, where z_kk is the element (see chapter 3, section 3.2, equation 3.5) associated with the kth distinct design point with replication m_k. F is also a block diagonal matrix with the matrices f_kk I_{m_k} along the diagonal, where f_kk is the element (see chapter 3, section 3.2, equation 3.5) associated with the kth distinct design point with replication m_k. X'Z_d F 1_N can now be expressed as Σ_{k=1}^n m_k z_kk f_kk f(x_k), where the summation is over the n distinct design points. In order to use the design weights, we need m_k/N in the expression. Consider using Σ_{k=1}^n (m_k/N) N z_kk f_kk f(x_k). Note that N z_kk is equal to f'(x_k)[(1/N) X'WX]^{-1} f(x_k) (k = 1, 2, ..., n). It finally follows that X'Z_d F 1_N can be calculated using the distinct design points and the design weights.
Considering now the second portion of equation (4.6), note that

N Var[η̂(x)] = N f'(x)[X'WX]^{-1} f(x)
            = f'(x)[M(ξ, β)]^{-1} f(x).

Therefore, the second portion of equation (4.6) depends upon the moment matrix,

M(ξ, β), and can be computed without knowledge of N. Note, however, that the
prediction bias portion of the scaled mean-squared error of prediction (SMSEP)

does depend upon the total sample size, N. The researcher must therefore have
some idea about the number of experimental runs available for the experiment. If
the researcher is only interested in using the standardized prediction variance the

comparisons can be performed without the specification of N.

4.4 SMSEP for the Logistic Regression Model

For binary response data modeled using the logistic link function, the mean-squared error of prediction is given by

MSE[π̂(x)] = Var[π̂(x)] + {Bias[π̂(x)]}².

The appropriate scaling for this situation is N/(π(x)[1 - π(x)]), which is the total sample size divided by the binomial variance for a single observation at the location, x. The scaled mean-squared error of prediction takes the form

(N/(π(x)[1 - π(x)])) MSE[π̂(x)] = (N Var[π̂(x)])/(π(x)[1 - π(x)]) + {N Bias[π̂(x)]}²/(N π(x)[1 - π(x)]).                    (4.7)








The SMSEP depends upon the unknown β, the design, and the control variables, i.e., the elements of x. Let T_D(x, β) = SMSEP[π̂(x)], where D is a given design. To evaluate T_D(x, β) for a given design, an estimate of T_D(x, β) can be obtained by replacing β with β̂. Otherwise, some parameter space concerning β, denoted by C, must be considered. The QDGs can be constructed for the standardized prediction variance function and the SMSEP, as illustrated in the following example.
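How the QDGs are assembled can be sketched as follows, here for the standardized prediction variance portion only; the bias portion of the SMSEP would be added in the same way. The design, the parameter subset C, and all helper names below are illustrative, not those of the example that follows:

```python
import math

def d_logistic(x, design, b0, b1):
    """Standardized prediction variance pi(x)[1-pi(x)] f'(x) M^{-1} f(x), f = (1, x)'."""
    p = lambda t: 1.0 / (1.0 + math.exp(-(b0 + b1 * t)))
    m00 = m01 = m11 = 0.0
    for xk, wk in design:
        v = wk * p(xk) * (1 - p(xk))
        m00 += v; m01 += v * xk; m11 += v * xk * xk
    det = m00 * m11 - m01 * m01
    return p(x) * (1 - p(x)) * (m11 - 2 * m01 * x + m00 * x * x) / det

def quantiles(values, levels):
    v = sorted(values)
    return [v[min(int(p * len(v)), len(v) - 1)] for p in levels]

design = [(-1.0, 0.5), (1.0, 0.5)]                     # a two-point design on [-1, 1]
C = [(b0, b1) for b0 in (-0.2, 0.0, 0.2) for b1 in (0.8, 1.0, 1.2)]
levels = [i / 20 for i in range(21)]                   # p = 0, 0.05, ..., 1
grid = [-1.0 + i / 50 for i in range(101)]             # experimental region R
rows = [quantiles([d_logistic(x, design, b0, b1) for x in grid], levels)
        for (b0, b1) in C]
# QDG ordinates: at each quantile level, the spread over the parameter subset C
qmin = [min(r[i] for r in rows) for i in range(21)]
qmax = [max(r[i] for r in rows) for i in range(21)]
```

Plotting qmin and qmax against the quantile levels gives the QDG; a narrow gap between the two curves indicates robustness to the parameter values.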

4.5 A Numerical Example

Consider again (see chapter 3, section 3.5) the cytotoxicity study described in Carter et al. (1986, sec. 2.5). The investigation involved two control variables, and interest was in modeling π(x), the probability of death of the cells. The experimental design, a 4² factorial design, and response values y_u are given in Table 4.2, and the parameter estimates for a complete second-order model for the linear predictor are given in Table 4.3. The generalized linear models procedure (PROC GENMOD) in SAS (1997) was used to fit the model.

The results of the experiment were utilized as preliminary information for the

evaluation and comparison of designs by using the estimates as initial guesses for the

parameter values. The subset, C, of the parameter space was obtained by selecting

3 points for each parameter, namely, the estimate and the estimate plus/minus two

standard errors.

It would be of interest to evaluate and compare the prediction capabilities

of two second-order designs, namely a 3² factorial design and a central composite

design. These designs can be compared against a D-optimal design which will need

to be obtained through sequential generation.











Table 4.2: Experimental design and response values

x_1 = MMS/10   x_2 = PMA   Number of Dead Cells r_u   Total Number of Cells m_u   y_u = r_u/m_u
0              0           19                          98                         0.19
0              1           24                          87                         0.28
0              10          54                          91                         0.59
0              100         68                          87                         0.78
1              0           16                          91                         0.18
10             0           17                          90                         0.19
25             0           19                          92                         0.21
1              1           19                          94                         0.20
1              10          41                          90                         0.46
1              100         79                          93                         0.85
10             1           10                          83                         0.12
10             10          36                          85                         0.42
10             100         62                          83                         0.75
25             1           36                          92                         0.39
25             10          56                          93                         0.60
25             100         74                          87                         0.85
                                                       (total: 1436)

Table 4.3: Parameter Estimates

Parameter   Estimate   Std Error
β_0         -1.3430    0.1232
β_1         -0.0818    0.0270
β_2          0.1598    0.0165
β_11         0.0039    0.0010
β_22        -0.0013    0.0002
β_12        -0.0001    0.0002








4.5.1 Sequential Generation of D-optimal Design

The experimental region is rectangular with x_1 ∈ [0, 25] and x_2 ∈ [0, 100]. The sequential generation of a D-optimal design was performed using the parameter estimates from the "initial" experiment as the initial parameter values. The initial design, D_0, consisted of the 16-point 4² factorial design used by Carter et al. (1986). The initial design can be seen in plot A of Figure 4.1. The sequential generation resulted in a design with N = 375 experimental runs and a maximum d(x, ξ, β) value equal to 6.043. This design, however, consists of only 36 distinct points (see plot B of Figure 4.1). Notice that the points tend to fall into clusters or groups, which suggests that a reduction of the points can be obtained. A cluster analysis was performed using the S-Plus language (Venables and Ripley 1994, sec. 12.2), which reduced the number of distinct design points to 9. The resulting design had N = 375 experimental runs with a maximum d(x, ξ, β) value of 6.085 (see plot C of Figure 4.1). In an attempt to find the D-optimal design more precisely, this design was used as an initial design and the sequential procedure started again. This resulted in an additional 33 runs, and after clustering the resulting design had 9 distinct design points with 408 experimental runs and a maximum d(x, ξ, β) value of 6.028. This approximate D-optimal design (see plot D of Figure 4.1) appears to be an adjusted 3² factorial design.
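The cluster-analysis step can be sketched as a simple consolidation rule: sequentially generated runs that fall close together are merged into one support point carrying their combined replication. This is a distance-threshold stand-in for the S-Plus cluster analysis actually used; the names, tolerance, and example points are ours:

```python
def consolidate(points, tol=1.0):
    """Merge design points within tol (per coordinate) of an existing cluster's
    centroid; each cluster becomes one support point with its total replication."""
    clusters = []  # each entry: [sum_x1, sum_x2, count]
    for x1, x2 in points:
        for c in clusters:
            cx1, cx2 = c[0] / c[2], c[1] / c[2]
            if abs(x1 - cx1) <= tol and abs(x2 - cx2) <= tol:
                c[0] += x1; c[1] += x2; c[2] += 1
                break
        else:
            clusters.append([x1, x2, 1])
    return [(c[0] / c[2], c[1] / c[2], c[2]) for c in clusters]

# e.g. sequential runs piling up near three support points of the region
runs = [(0.0, 0.1), (0.2, 0.0), (12.4, 50.3), (12.6, 49.8), (25.0, 99.7), (24.8, 100.0)]
support = consolidate(runs)   # three distinct points, each with replication 2
```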


4.5.2 Comparison of Designs

For the comparison of designs, the approximate D-optimal design (see plot

A of Figure 4.2) along with a 32 design and CCD designs were considered. The 32

design consisted of the 9 points associated with x, 0, 12.5,25 and x2 E 0, 50, 100

with equal replications and 405 experimental runs (see plot B of Figure 4.2). The

CCD designs used a coded axial value of α = 1/2 in order to remain within the














[Figure 4.1 -- Sequential Generation of D-optimal Design. Plot A: initial design (N = 16 with 16 distinct points, max d = 17.357); Plot B: after sequential design (N = 375 with 36 distinct points, max d = 6.043); Plot C: after cluster design (N = 375 with 9 distinct points, max d = 6.085); Plot D: approximate D-optimal design (N = 408 with 9 distinct points, max d = 6.028). Each plot shows the design points over x1 ∈ [0, 25].]







experimental region. The levels for the CCD designs (corresponding to the coded

levels -1, -1/2, 0, 1/2, 1) were x1 ∈ {0, 6.25, 12.5, 18.75, 25} and x2 ∈ {0, 25, 50, 75, 100}.

Two CCD designs were used with 405 experimental runs. The first CCD design

(CCD1) consisted of equal replication of the 9 design points. The second CCD

(CCD2) replicated the center point twice as often as the factorial and axial

points (see plot C of Figure 4.2). The experimental designs are given in Table 4.4.

The QDGs can be graphed for both the standardized prediction variance

and the SMSEP. First, the performances of the two CCD designs are compared

(see plots of Figure 4.3). It can be seen that on the boundary of the experimental

region (λ = 1) the performances of the two designs are very similar. Towards the

center (λ = 0.6) the performance of design CCD2 is slightly better, but overall

the performances of the two designs are comparable. Since the CCD2 design had

more replication at the center it is expected that it would have better prediction

capabilities near the center. For the overall comparison of designs, we consider the

approximate D-optimal design, the 3^2 design, and the CCD1 design. The QDGs for the

standardized prediction variance and SMSEP for λ = 1, 0.8, and 0.6 are given in

Figures 4.4, 4.5, and 4.6. On the boundary of the region, the CCD design has the

largest dispersion. The CCD design exhibits great variation in the quantiles and the

quantile values larger than the median appear sensitive to the parameter values. The

magnitudes of the scaled quantities are also noted to be large on the boundary. The

scaling involves multiplying by N and dividing by the variance function π(x)[1 - π(x)], which approaches zero for values of π(x)

near zero or one, inflating the scaled quantities. Moving towards the center of the region, λ = 0.8 (see Figure 4.5),

the CCD design still has variation in the quantile values. The D-optimal design and

the 3^2 design, however, have maximum quantiles for the standardized prediction variance

larger than the CCD design and have greater sensitivity to the parameter values.

The CCD design is affected by the bias portion of the SMSEP and again exhibits
















[Figure 4.2 -- Experimental Designs. Plot A: approximate D-optimal design; Plot B: 3^2 design; Plot C: CCD designs. Each plot shows the design points over x1 ∈ [0, 25].]










Table 4.4: Experimental Designs

D-optimal Design (N = 408)
    x1 (MMS/10)   x2 (PMA)   Design Weight
        0.0           1         55/408
        0.0          18         38/408
        0.0         100         61/408
       12.5           4         22/408
       12.5          20         44/408
       12.5         100         41/408
       25.0           0         54/408
       25.0          15         32/408
       25.0         100         61/408

3^2 Design (N = 405)
    x1 (MMS/10)   x2 (PMA)   Design Weight
        0.0           0          1/9
        0.0          50          1/9
        0.0         100          1/9
       12.5           0          1/9
       12.5          50          1/9
       12.5         100          1/9
       25.0           0          1/9
       25.0          50          1/9
       25.0         100          1/9

CCD Designs (N = 405)
    x1 (MMS/10)   x2 (PMA)   CCD1 Weight   CCD2 Weight
        0.00          0          1/9          1/10
        0.00        100          1/9          1/10
       25.00          0          1/9          1/10
       25.00        100          1/9          1/10
        6.25         50          1/9          1/10
       18.75         50          1/9          1/10
       12.50         25          1/9          1/10
       12.50         75          1/9          1/10
       12.50         50          1/9          2/10















[Figure 4.3 -- QDGs for the CCD designs. Quantiles of the scaled quantities are plotted against p for λ = 1 and λ = 0.6, with separate curves for CCD1 and CCD2.]




[Figure 4.4 -- QDGs of Standardized Prediction Variance and SMSEP -- Lambda = 1. Quantiles are plotted against p, with separate curves for the D-optimal (Dopt), 3^2 (3sq), and CCD1 designs.]







large dispersion in the QDG of SMSEP. Near the center, λ = 0.6 (see Figure 4.6), the

D-optimal design shows great variation and sensitivity to parameter values for both

quantiles of the standardized prediction variance and SMSEP. The 3^2 design and

CCD design have much better-behaved QDGs. The CCD design is slightly better

than the 3^2 design for the standardized prediction variance, whereas the 3^2 design is

slightly better for the SMSEP. Overall, the D-optimal design seems to perform well

on the boundary and poorly near the center. The CCD, however, does well near the

center of the region and poorly on the boundary and is affected by the prediction

bias. The 32 design appears to be a nice compromise in this situation, in fact, this is

a CCD design with coded axial runs at a = 1.
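The QDG construction used throughout this section can be outlined in code. The sketch below is an illustrative reconstruction under stated assumptions, not the chapter's actual program: it takes the scaled prediction variance of a logistic fit to be f'(x) M⁻¹ f(x) with M = Σ_u λ_u w(x_u) f(x_u) f'(x_u), evaluates it on the boundary of the region shrunk by a factor λ (here in coded units on [-1, 1]^2, whereas the chapter works on [0, 25] x [0, 100]), computes its quantiles, and records the minimum and maximum of each quantile over a set of parameter vectors — the two curves plotted in a QDG. The SMSEP version would add a squared-bias term; the designs and parameter vectors below are hypothetical.

```python
import numpy as np

def fquad(x1, x2):
    """Full second-order model vector in two factors (6 parameters)."""
    return np.array([1.0, x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

def pi_logistic(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def scaled_pred_var(x, design, weights, beta):
    """Scaled prediction variance f'(x) M^(-1) f(x), with M built from the
    normalized design weights and logistic weights w = pi(1 - pi)."""
    M = np.zeros((6, 6))
    for (x1, x2), lam in zip(design, weights):
        fu = fquad(x1, x2)
        p = pi_logistic(fu @ beta)
        M += lam * p * (1.0 - p) * np.outer(fu, fu)
    fx = fquad(*x)
    return fx @ np.linalg.solve(M, fx)

def shrunken_boundary(lmbda, n_per_side=10):
    """Points on the boundary of the coded region [-1, 1]^2 shrunk by lambda."""
    t = np.linspace(-1.0, 1.0, n_per_side)
    return ([(lmbda * s, -lmbda) for s in t] + [(lmbda * s, lmbda) for s in t]
            + [(-lmbda, lmbda * s) for s in t] + [(lmbda, lmbda * s) for s in t])

def qdg_quantiles(design, weights, betas, lmbda, probs):
    """For each beta, quantiles of the scaled prediction variance over the
    shrunken boundary; the QDG plots the min and max over the betas at each p."""
    rows = []
    for beta in betas:
        v = [scaled_pred_var(x, design, weights, beta) for x in shrunken_boundary(lmbda)]
        rows.append(np.quantile(v, probs))
    rows = np.array(rows)
    return rows.min(axis=0), rows.max(axis=0)

# Hypothetical illustration: 3^2 design in coded units, two assumed parameter vectors.
design = [(a, b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
weights = [1.0 / 9] * 9
betas = [np.array([0.0, 0.5, 0.5, 0.0, 0.0, 0.0]),
         np.array([-1.0, 1.0, 0.5, 0.2, 0.0, 0.0])]
qmin, qmax = qdg_quantiles(design, weights, betas, lmbda=0.8,
                           probs=np.linspace(0.05, 0.95, 10))
```

Parameter sensitivity shows up as a wide gap between the min and max curves, which is what distinguishes the designs in Figures 4.3 through 4.6.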


4.6 Conclusions

This chapter investigated the use of QDGs as a graphical tool for the analysis

of the prediction capabilities of classical RSM designs in non-standard situations.

In order to compare the designs to optimal designs a discussion of the sequential

generation of D-optimal designs was presented. The sequential generation of designs

involved an application of the Extended Equivalence Theorem. The QDGs were

extended to the standardized prediction variance and the scaled mean-squared error

of prediction. For evaluating and comparing classical RSM designs, the QDGs allow

for an assessment of parameter dependence caused by the non-standard conditions.
















[Figure 4.5 -- QDGs of Standardized Prediction Variance and SMSEP -- Lambda = 0.8. Quantiles are plotted against p, with separate curves for the D-optimal (Dopt), 3^2 (3sq), and CCD1 designs.]









































[Figure 4.6 -- QDGs of Standardized Prediction Variance and SMSEP -- Lambda = 0.6, comparing the D-optimal, 3^2, and CCD1 designs.]














CHAPTER 5
MODELING FUNCTIONS OF MULTIPLE RESPONSES

5.1 Introduction

As previously mentioned in chapter 2, response surface methodology was

developed around the traditional assumptions of a linear model and a normally

distributed response variable. However, in a multiresponse experiment, a number

of responses are observed and the variable of interest to an experimenter may be

calculated from the responses. For example, Hunter and Jaworski (1986) note that

in some chemical experiments, the yield is obtained by multiplying the observed

values of conversion and selectivity, while in other situations the profit is often

computed from several observed responses. Also, as mentioned in chapter 2, in the

semiconductor industry, the experimenter is often interested in modeling the ratio of

responses. These experimental situations present a challenging problem of modeling

a function of multiple responses.

In Section 5.2 we review "naive" approaches for modeling a function of

multiple responses. In Section 5.3 we present a modeling procedure that addresses

the multivariate nature of the data. Section 5.4 consists of numerical examples. This

is followed by a summary of recommendations and conclusions in Section 5.5.


5.2 "Naive" Modeling Procedures

For the modeling of a function of multiple responses there are two standard

methods that have been utilized in the model building process. In this section, we

explain and critique these two methods.







The first method reduces the problem to a single variable situation in which

standard modeling techniques can be applied. The procedure consists of modeling

the calculated response as a polynomial function of the control variables. For each

experimental run, the function of interest is applied to the observed responses, reducing

the data for the run into a single response variable. This method does not recognize

the multivariate nature of the data and, as a result, it does not take into account

the correlation structure among the responses. The distribution of the computed

response may also violate the standard assumptions for linear modeling. If, for

example, the computed function involves a sum of responses, the distributional

assumptions may be feasible. On the other hand, if the computed function consists

of a complex function such as a product or ratio of responses, there could be serious

violations of the distributional assumptions. Ignoring the multivariate nature of

the data and possible violations of distributional assumptions can result in a less

efficient approach for the modeling of the function of interest.

The second method was proposed by Hunter and Jaworski (1986) as an

alternative approach to modeling the computed response variable of interest. The

procedure consists of fitting individual models to the responses and using the

function of interest on the fitted models to construct a model for the response of

interest. Hunter and Jaworski (1986, p. 321) suggest that their method results

in flexible models and greater insight into the process under investigation. This

procedure, while utilizing the individual responses, fails to address the multivariate

nature of the data in the fitting of the models. Furthermore, this procedure does

not take into account any information from the structure of the function of interest.

The two procedures can be visualized in Table 5.1. The first procedure consists of

working with the rows first, by computing the response of interest for each run and

then modeling the values in the last column. The method proposed by Hunter and








Table 5.1: Modeling a Function of Multiple Responses

                 Individual Obs. on r Responses        Response of Interest
    Run     y1    y2    ...    yi    ...    yr         G(y1, y2, ..., yr)
     1      y11   y12   ...    y1i   ...    y1r        G(y11, y12, ..., y1r)
     2      y21   y22   ...    y2i   ...    y2r        G(y21, y22, ..., y2r)
     u      yu1   yu2   ...    yui   ...    yur        G(yu1, yu2, ..., yur)
     N      yN1   yN2   ...    yNi   ...    yNr        G(yN1, yN2, ..., yNr)
    model
    functions   ŷ1    ŷ2   ...    ŷi   ...    ŷr       model function Ĝ
Source: Hunter and Jaworski (1986, p. 325)



Jaworski, however, consists of working with the columns of the individual responses

and then combining the fitted models in the last row to obtain a fitted model for the

response of interest.

These two methods lack theoretical foundations that would motivate the procedures.

They are ad hoc, naive approaches to the problem. Both approaches

ignore the multivariate nature of the data in the experimental situations and are

therefore not taking into account any relationships that may exist among the

individual response variables.


5.3 Multiresponse Analysis and Proposed Modeling Procedure

For an experimental situation involving a response of interest that is

computed from individual responses, it is important to consider the multivariate

nature of the data. In this section, we review the analysis of the linear multiresponse

model and propose a new procedure for the modeling of a function of responses that

addresses the multivariate nature of the situation.








Suppose that there are r responses, namely, y1, y2, ..., yr, each of which is

believed to depend upon a set of k control variables denoted by x1, x2, ..., xk, and

the experimenter is interested in modeling a function of the responses, namely,

G(y1, y2, ..., yr). Khuri and Cornell (1996, chapter 7) present a discussion of

methods for the analysis of multiresponse experiments. We summarize here the

estimation of a linear multiresponse model.

Let us assume that the individual response variables can be modeled using

polynomial regression models in the xi's. The model for the ith response is then of

the form

yi = fi'(x)βi + εi,   i = 1, 2, ..., r   (5.1)

where fi(x) is a vector of order pi × 1 whose elements consist of powers and products

of powers of the elements of x = (x1, x2, ..., xk)', and βi is a vector of pi unknown

constant coefficients. From an experiment consisting of N runs, the following model

for the ith response can be obtained from (5.1)


yi = Xiβi + ei,   i = 1, 2, ..., r   (5.2)

where yi and ei are N × 1 vectors of observations and random errors for the ith

response and Xi is an N × pi matrix whose uth row is fi'(xu), u = 1, 2, ..., N. As

a final step in forming the linear multiresponse model, the models in (5.2) can be

combined to construct a multiresponse model of the form


y = Xβ + e   (5.3)

where y = [y1' : y2' : ... : yr']', β = [β1' : β2' : ... : βr']', e = [e1' : e2' : ... : er']', and

X is the block-diagonal matrix, diag(X1, X2,..., Xr). The assumptions about the

random errors are: E(ei) = 0, Var(ei) = σii IN, and Cov(ei, ej) = σij IN. Thus,








random errors from the same experimental run are assumed to be correlated, but

random errors from different experimental runs are uncorrelated. Let Σ denote

the r × r matrix whose (i, j)th element is σij (i, j = 1, 2, ..., r), which is the
variance-covariance matrix for the r responses. It follows that e in (5.3) has a
variance-covariance matrix of the form

Var(e) = Δ = Σ ⊗ IN.   (5.4)

It follows that the best linear unbiased estimator of β is given by

β̂ = (X'Δ⁻¹X)⁻¹X'Δ⁻¹y   (5.5)

with a variance-covariance matrix equal to (X'Δ⁻¹X)⁻¹. Since Σ is typically
unknown, an estimate of it needs to be used in (5.5). A consistent estimator of Σ
was proposed by Zellner (1962) which uses the residual vectors from an ordinary
least-squares fit to the single-response models in (5.2). This estimator is given by

Σ̂ = (σ̂ij), where

σ̂ij = yi'[IN - Xi(Xi'Xi)⁻¹Xi'][IN - Xj(Xj'Xj)⁻¹Xj']yj / N,   i, j = 1, 2, ..., r.   (5.6)

Replacing Σ with Σ̂ in (5.4) results in Δ* = Σ̂ ⊗ IN. Using Δ* in (5.5) we obtain
the estimator

β̂ = (X'(Δ*)⁻¹X)⁻¹X'(Δ*)⁻¹y.   (5.7)

This is called the seemingly unrelated regression estimator (SURE) of β. It is also
called the estimated generalized least-squares estimator of β. Khuri and Cornell

(1996, p. 254) note that if the single-response models in (5.2) have the same Xi

matrices, that is, Xi = X0, then the best linear unbiased estimator of β does not
depend on Σ. In this case, the best linear unbiased estimator of βi in (5.2) is the








same as the ordinary least-squares estimate obtained from individually fitting the

ith model (i = 1,2,... ,r).
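The estimation steps in (5.5) through (5.7), and the remark that the SURE reduces to ordinary least squares when the Xi matrices coincide, can be checked numerically. The following is an illustrative sketch with simulated data; the design, sample size, and coefficient values are assumptions made for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated two-response experiment with identical model matrices X1 = X2 = X0.
N = 20
x = rng.uniform(-1, 1, N)
X0 = np.column_stack([np.ones(N), x])  # first-order model in one factor
e1 = rng.normal(0, 0.3, N)
y1 = X0 @ np.array([1.0, 2.0]) + e1
y2 = X0 @ np.array([0.5, -1.0]) + 0.5 * e1 + rng.normal(0, 0.2, N)  # correlated errors

X_blocks = [X0, X0]
ys = [y1, y2]

# Zellner's consistent estimator of Sigma from OLS residuals, as in (5.6).
R = [np.eye(N) - Xi @ np.linalg.solve(Xi.T @ Xi, Xi.T) for Xi in X_blocks]
Sigma_hat = np.array([[ys[i] @ R[i] @ R[j] @ ys[j] / N for j in range(2)]
                      for i in range(2)])

# SURE (5.7): beta = (X' (Delta*)^(-1) X)^(-1) X' (Delta*)^(-1) y,
# with Delta* = Sigma_hat kron I_N and X block-diagonal.
X = np.block([[X0, np.zeros_like(X0)], [np.zeros_like(X0), X0]])
Delta_inv = np.kron(np.linalg.inv(Sigma_hat), np.eye(N))
y = np.concatenate(ys)
beta_sur = np.linalg.solve(X.T @ Delta_inv @ X, X.T @ Delta_inv @ y)

# Equation-by-equation OLS for comparison.
beta_ols = np.concatenate([np.linalg.solve(X0.T @ X0, X0.T @ yi) for yi in ys])
```

Because X1 = X2 here, the SURE from (5.7) matches the equation-by-equation ordinary least-squares estimates, as noted by Khuri and Cornell.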

Let us now consider a multivariate situation in which the experimenter has

interest in modeling a function of the responses, for example, G(y1, y2). Using linear

models for the means of the two responses results in:

μ1(x) = E[y1(x)] = f1'(x)β1

μ2(x) = E[y2(x)] = f2'(x)β2.

Observations within an experimental run are correlated with a variance-covariance
matrix denoted by Σ. The linear multiresponse model estimation described above

can be utilized for the estimation of β1, β2, and Σ. It follows that the fitted models

are:


μ̂1(x) = f1'(x)β̂1

μ̂2(x) = f2'(x)β̂2.

We now propose a procedure for modeling a function of multiple responses

that recognizes the correlation among the responses and takes into account the

structure of the function of interest. The proposed procedure involves approximating

the function of interest by using a Taylor's series approximation about the means of

the individual responses. We assume that G(y1, y2) is continuous and has continuous

partial derivatives of order ≤ 3 in a neighborhood of (μ1, μ2). Using a first-order

approximation of G(y1, y2) in a neighborhood of (μ1, μ2) results in

z(1) = G(μ1, μ2) + (y1 - μ1)[∂G(μ1, μ2)/∂y1] + (y2 - μ2)[∂G(μ1, μ2)/∂y2].   (5.8)

This approximation can be used to develop approximate expressions for the expected

value and variance of G(yi, Y2). The expected value expression resulting from the







first-order approximation is

E[G(y1, y2)] ≈ G(μ1, μ2).   (5.9)

The variance expression is approximately given by
Var[G(y1, y2)] ≈ σ11[∂G(μ1, μ2)/∂y1]² + σ22[∂G(μ1, μ2)/∂y2]²
    + 2σ12[∂G(μ1, μ2)/∂y1][∂G(μ1, μ2)/∂y2].   (5.10)

Let us denote the approximate standard deviation of G(y1, y2), given as the square
root of the right-hand side of (5.10), by τ(x, β1, β2, Σ). The expressions in (5.9) and
(5.10) depend upon the unknown means and variance-covariance matrix of the responses.
The first-order prediction of G(y1, y2) uses the results of the analysis of the linear
multiresponse model. That is, the predicted value of G(y1, y2) at a point x is equal
to

Ĝ(1)(x) = G(μ̂1(x), μ̂2(x)).

In order to assess the quality of prediction evaluated over all the design points, we
consider using scaled and unscaled sums of absolute deviations. The scaled sum of
absolute deviations is defined as

S(1) = Σ_{u=1}^{N} |G(yu1, yu2) - Ĝ(1)(xu)| / τ(xu, β̂1, β̂2, Σ̂),

where yui denotes the uth value of the ith response (i = 1, 2, ..., r; u = 1, 2, ...,
N). The unscaled sum of absolute deviations is

T(1) = Σ_{u=1}^{N} |G(yu1, yu2) - Ĝ(1)(xu)|.
Using a second-order approximation of G(y1, y2) in a neighborhood of (μ1, μ2)
results in









z(2) = G(μ1, μ2) + (y1 - μ1)[∂G(μ1, μ2)/∂y1] + (y2 - μ2)[∂G(μ1, μ2)/∂y2]
    + (1/2){(y1 - μ1)²[∂²G(μ1, μ2)/∂y1²] + 2(y1 - μ1)(y2 - μ2)[∂²G(μ1, μ2)/∂y1∂y2]
    + (y2 - μ2)²[∂²G(μ1, μ2)/∂y2²]}.   (5.11)

This second-order approximation leads to the approximate expressions for the
expected value and variance of G(y1, y2) given in (5.12) and (5.13).

E[G(y1, y2)] ≈ G(μ1, μ2)
    + (1/2){σ11[∂²G(μ1, μ2)/∂y1²] + 2σ12[∂²G(μ1, μ2)/∂y1∂y2] + σ22[∂²G(μ1, μ2)/∂y2²]}.   (5.12)



Using properties of the moments for the multivariate normal distribution (Anderson
1984, p. 49), the variance and covariance terms in the second-order variance
approximation can be simplified.
RESULT 1:

Var[(yi - μi)²] = E[(yi - μi)⁴] - {E[(yi - μi)²]}²
               = 3(σii)² - (σii)²
               = 2(σii)²,   i = 1, 2

RESULT 2:

Var[(y1 - μ1)(y2 - μ2)] = E[(y1 - μ1)²(y2 - μ2)²] - {E[(y1 - μ1)(y2 - μ2)]}²
                        = σ11σ22 + 2(σ12)² - (σ12)²
                        = σ11σ22 + (σ12)²











Var[G(y1, y2)] ≈ σ11[∂G(μ1, μ2)/∂y1]² + σ22[∂G(μ1, μ2)/∂y2]² + 2σ12[∂G(μ1, μ2)/∂y1][∂G(μ1, μ2)/∂y2]
    + Var[(y1 - μ1)²][(1/2)∂²G(μ1, μ2)/∂y1²]² + Var[(y2 - μ2)²][(1/2)∂²G(μ1, μ2)/∂y2²]²
    + Var[(y1 - μ1)(y2 - μ2)][∂²G(μ1, μ2)/∂y1∂y2]²
    + 2Cov[(y1 - μ1), (y1 - μ1)²][∂G(μ1, μ2)/∂y1][(1/2)∂²G(μ1, μ2)/∂y1²]
    + 2Cov[(y1 - μ1), (y1 - μ1)(y2 - μ2)][∂G(μ1, μ2)/∂y1][∂²G(μ1, μ2)/∂y1∂y2]
    + 2Cov[(y1 - μ1), (y2 - μ2)²][∂G(μ1, μ2)/∂y1][(1/2)∂²G(μ1, μ2)/∂y2²]
    + 2Cov[(y2 - μ2), (y1 - μ1)²][∂G(μ1, μ2)/∂y2][(1/2)∂²G(μ1, μ2)/∂y1²]
    + 2Cov[(y2 - μ2), (y1 - μ1)(y2 - μ2)][∂G(μ1, μ2)/∂y2][∂²G(μ1, μ2)/∂y1∂y2]
    + 2Cov[(y2 - μ2), (y2 - μ2)²][∂G(μ1, μ2)/∂y2][(1/2)∂²G(μ1, μ2)/∂y2²]
    + 2Cov[(y1 - μ1)², (y1 - μ1)(y2 - μ2)][(1/2)∂²G(μ1, μ2)/∂y1²][∂²G(μ1, μ2)/∂y1∂y2]
    + 2Cov[(y1 - μ1)², (y2 - μ2)²][(1/2)∂²G(μ1, μ2)/∂y1²][(1/2)∂²G(μ1, μ2)/∂y2²]
    + 2Cov[(y1 - μ1)(y2 - μ2), (y2 - μ2)²][∂²G(μ1, μ2)/∂y1∂y2][(1/2)∂²G(μ1, μ2)/∂y2²]   (5.13)







RESULT 3:

Cov[(yi - μi), (yi - μi)²] = E[(yi - μi)(yi - μi)²] - E[(yi - μi)] E[(yi - μi)²]
                          = E[(yi - μi)³]
                          = 0,   i = 1, 2

RESULT 4:


Cov[(yi - μi), (yi - μi)(yj - μj)] = E[(yi - μi)(yi - μi)(yj - μj)]
                                     - E[(yi - μi)] E[(yi - μi)(yj - μj)]
                                  = E[(yi - μi)²(yj - μj)]
                                  = 0


RESULT 5:


Cov[(yi - μi), (yj - μj)²] = E[(yi - μi)(yj - μj)²] - E[(yi - μi)] E[(yj - μj)²]
                          = E[(yi - μi)(yj - μj)²]
                          = 0


RESULT 6:

Cov[(yi - μi)², (yi - μi)(yj - μj)] = E[(yi - μi)²(yi - μi)(yj - μj)]
                                     - E[(yi - μi)²] E[(yi - μi)(yj - μj)]
                                  = E[(yi - μi)³(yj - μj)] - σii σij
                                  = σii σij + σii σij + σii σij - σii σij
                                  = 2σii σij







RESULT 7:


Cov[(yi - μi)², (yj - μj)²] = E[(yi - μi)²(yj - μj)²] - E[(yi - μi)²] E[(yj - μj)²]
                            = E[(yi - μi)²(yj - μj)²] - σii σjj
                            = σii σjj + (σij)² + (σij)² - σii σjj
                            = 2(σij)²


Using Results 1-7, the second-order variance approximation in (5.13) can be
rewritten as follows:


Var[G(y1, y2)] ≈ σ11[∂G(μ1, μ2)/∂y1]² + σ22[∂G(μ1, μ2)/∂y2]² + 2σ12[∂G(μ1, μ2)/∂y1][∂G(μ1, μ2)/∂y2]
    + 2(σ11)²[(1/2)∂²G(μ1, μ2)/∂y1²]² + 2(σ22)²[(1/2)∂²G(μ1, μ2)/∂y2²]²
    + (σ11σ22 + (σ12)²)[∂²G(μ1, μ2)/∂y1∂y2]²
    + 2(2σ11σ12)[(1/2)∂²G(μ1, μ2)/∂y1²][∂²G(μ1, μ2)/∂y1∂y2]
    + 2(2(σ12)²)[(1/2)∂²G(μ1, μ2)/∂y1²][(1/2)∂²G(μ1, μ2)/∂y2²]
    + 2(2σ22σ12)[∂²G(μ1, μ2)/∂y1∂y2][(1/2)∂²G(μ1, μ2)/∂y2²]   (5.14)
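A quick check of (5.14) is available for the product G(y1, y2) = y1y2, since a second-order expansion of a bilinear function is exact: only the first-derivative terms and the mixed-derivative term survive, giving Var(y1y2) = σ11μ2² + σ22μ1² + 2σ12μ1μ2 + σ11σ22 + (σ12)². The sketch below verifies this against a Monte Carlo estimate under bivariate normality; the parameter values are illustrative.

```python
import numpy as np

# Illustrative parameter values (assumed for the check).
mu = np.array([2.0, -1.5])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 2.0]])

# (5.14) for G(y1, y2) = y1*y2: dG/dy1 = mu2, dG/dy2 = mu1, the mixed second
# derivative is 1, and the pure second derivatives vanish, so it reduces to:
s11, s12, s22 = Sigma[0, 0], Sigma[0, 1], Sigma[1, 1]
var_514 = (s11 * mu[1] ** 2 + s22 * mu[0] ** 2 + 2 * s12 * mu[0] * mu[1]
           + s11 * s22 + s12 ** 2)

# Monte Carlo estimate of Var(y1*y2) under bivariate normality.
rng = np.random.default_rng(1)
y = rng.multivariate_normal(mu, Sigma, size=1_000_000)
var_mc = np.var(y[:, 0] * y[:, 1])
```

Because the expansion is exact for the product, the two values agree up to Monte Carlo error; for general G, (5.14) is only a second-order approximation.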

The square root of the right-hand side of (5.14) is denoted by θ(x, β1, β2, Σ).
The second-order prediction of G(y1, y2) is obtained by using the results from a
multiresponse analysis in (5.12) and (5.14). The second-order predicted value for
the function of interest is equal to

Ĝ(2)(x) = G(μ̂1(x), μ̂2(x))
    + (1/2){σ̂11[∂²G(μ̂1(x), μ̂2(x))/∂y1²] + 2σ̂12[∂²G(μ̂1(x), μ̂2(x))/∂y1∂y2] + σ̂22[∂²G(μ̂1(x), μ̂2(x))/∂y2²]}.







In order to assess the quality of prediction, we consider again using both scaled and

unscaled sums of absolute deviations. The scaled sum of absolute deviations is

S(2) = Σ_{u=1}^{N} |G(yu1, yu2) - Ĝ(2)(xu)| / θ(xu, β̂1, β̂2, Σ̂).

The unscaled sum of absolute deviations is

T(2) = Σ_{u=1}^{N} |G(yu1, yu2) - Ĝ(2)(xu)|.
The proposed method based upon the Taylor's series approximation to the

function of interest takes into account both the multivariate nature of the data and

the complexity of the function.


5.4 Numerical Examples

In this section, four examples illustrating the proposed procedure for

modeling a function of multiple responses are discussed. The first three examples

involve the product of two responses, G(y1, y2) = y1y2, while the last example

involves the modeling of the ratio of two responses, G(y1, y2) = y1/y2.


Example 1

Hunter and Jaworski (1986) investigate a chemical experiment where the

yield for each run is computed as the product of conversion y1 and selectivity y2.

The response of interest is therefore G(y1, y2) = y1y2. The experiment used a 2^2

factorial design with one center run. The two control variables were temperature x1

and time x2. The design settings and the observed values for the two responses, as

well as the product, are given in Table 5.2.

Hunter and Jaworski (1986) analyzed this experiment using both the standard

naive procedure and their proposed methodology. The standard naive procedure for

modeling a function of responses fits a linear model to the last column consisting of








Table 5.2: Chemical Data for Example 1


x1      x2      y1      y2      G(y1,y2) = y1y2
-1 -1 0.799 0.502 0.401098
1 -1 0.203 0.113 0.022939
-1 1 0.997 0.901 0.898297
1 1 0.401 0.499 0.200099
0 0 0.602 0.500 0.301000
Source: Hunter and Jaworski (1986, p. 324)


the observed products. This results in the following fitted model for the product


Ĝ(y1, y2) = 0.365 - 0.269x1 + 0.168x2.

This first-order model resulted in a sum of absolute residuals equal to 0.384. If the

model is allowed to include the interaction term, the fitted model obtained is

Ĝ(y1, y2) = 0.365 - 0.269x1 + 0.169x2 - 0.080x1x2.

This results in a sum of absolute residuals equal to 0.127.

Now, consider applying our proposed methodology based upon the linear

multiresponse model analysis and Taylor's series approximation to the function

G(y1, y2) = y1y2. Considering first-order models for both of the response means

results in the following models


μ1(x) = f1'(x)β1 = β01 + β11x1 + β21x2

μ2(x) = f2'(x)β2 = β02 + β12x1 + β22x2.

Since the model forms are the same for the two models, the multiresponse estimates

and the ordinary least squares estimates of β1 and β2 will be the same. Because of

this, the predictions from the Hunter and Jaworski method will agree with Ĝ(1)(x),

the first-order prediction, from the proposed procedure. The proposed procedure,








however, takes into account the variation of G(y1, y2) in assessing the quality of

prediction. Multiresponse estimation resulted in the following fitted models and

estimate of the variance-covariance matrix

μ̂1(x) = 0.6004 - 0.2980x1 + 0.0990x2

μ̂2(x) = 0.50300 - 0.19775x1 + 0.19625x2

Σ̂ = [  6.40e-07   -1.20e-06 ]
    [ -1.20e-06    1.07e-05 ]

The first-order approximation in (5.8) for the function G(y1, y2) = y1y2 is

z(1) = μ1μ2 + (y1 - μ1)μ2 + (y2 - μ2)μ1.

Using (5.9) and (5.10) results in the following expressions for the expected value and

variance of the product

E[G(y1, y2)] ≈ μ1μ2.


Var[G(y1, y2)] ≈ σ11μ2² + σ22μ1² + 2σ12μ1μ2.

The first-order prediction for the product is therefore μ̂1(x)μ̂2(x). Table 5.3

contains the results from the first-order prediction. The assessment of the quality

of prediction gives an unscaled sum of absolute deviations T(1) = 0.0088012 and a

scaled sum of absolute deviations S(1) = 4.891519. The procedure performs quite

well when compared to the naive procedure.








Table 5.3: Example 1: Results from First-Order Prediction


x1      x2      y1y2        predicted      deviation
-1 -1 0.401098 0.4032973 -0.0021993
1 -1 0.022939 0.0221706 0.0007684
-1 1 0.898297 0.8946678 0.0036292
1 1 0.200099 0.2013021 -0.0012031
0 0 0.301000 0.3020012 -0.0010012


Applying the second-order approximation from (5.11) to the function

G(y1, y2) results in the following

z(2) = μ1μ2 + (y1 - μ1)μ2 + (y2 - μ2)μ1 + (y1 - μ1)(y2 - μ2).

Using (5.12) and (5.14) gives the following for the expected value and variance of

the product

E[G(y1, y2)] ≈ μ1μ2 + σ12.


Var[G(y1, y2)] ≈ σ11μ2² + σ22μ1² + 2σ12μ1μ2 + σ11σ22 + (σ12)².

The second-order prediction is therefore Ĝ(2)(x) = μ̂1(x)μ̂2(x) + σ̂12. The

second-order prediction is a correction of the first-order prediction by adding on

the estimated covariance between the responses. Table 5.4 contains the results

from using the second-order prediction. The estimated covariance between the

responses is small. As a result, the second-order predictions are close to the

first-order predictions. The assessment statistics are as follows: T(2) = 0.0088 and
S(2) = 4.891642. The proposed procedure has performed quite well in comparison to

the standard naive approach.








Table 5.4: Example 1: Results from Second-Order Prediction


x1 x2 y1y2 predicted deviation
-1 -1 0.401098 0.4032961 -0.0021981
1 -1 0.022939 0.0221694 0.0007696
-1 1 0.898297 0.8946666 0.0036304
1 1 0.200099 0.2013009 -0.0012019
0 0 0.301000 0.3020000 -0.0010000
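The predicted columns of Tables 5.3 and 5.4 can be reproduced directly from the fitted first-order models of Example 1 and the estimated covariance: the first-order prediction is μ̂1(x)μ̂2(x), and the second-order prediction adds σ̂12 = -1.20e-06. A short sketch:

```python
# Reproducing the predicted columns of Tables 5.3 and 5.4 from the fitted
# first-order models of Example 1 and the estimated covariance sigma12_hat.

def mu1(x1, x2):
    return 0.6004 - 0.2980 * x1 + 0.0990 * x2

def mu2(x1, x2):
    return 0.50300 - 0.19775 * x1 + 0.19625 * x2

sigma12_hat = -1.20e-06
design = [(-1, -1), (1, -1), (-1, 1), (1, 1), (0, 0)]

first_order = [mu1(x1, x2) * mu2(x1, x2) for x1, x2 in design]   # Table 5.3
second_order = [g + sigma12_hat for g in first_order]            # Table 5.4
```

The second-order column differs from the first-order column only by the small estimated covariance, which is why the two tables nearly coincide.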


Table 5.5: Example 2: Design Settings and Generated Data


x1      x2      y1      y2      G(y1,y2) = y1y2
-1 -1 29.3 79.3 2323.49
1 -1 50.3 10.7 538.21
-1 1 8.9 79.2 704.88
1 1 29.9 70.3 2101.97
0 0 28.5 58.6 1670.10
0 0 29.4 59.1 1737.54



Example 2

The second example also involves the product of two responses; however,

the two models are slightly different. The fitted model for the first response is

a first-order model, while the fitted model for the second response includes the

interaction term XlX2 as well. The design settings and generated data are given in

Table 5.5.

The standard naive procedure resulted in the following first-order model for

the observed product


Ĝ(y1, y2) = 1512.6983 - 97.0475x1 - 13.7125x2.


The sum of absolute residuals for this model was 3564.613. Adding the interaction

term to the fitted model reduced the sum of absolute residuals to 764.4867 and








Table 5.6: Example 2: Results from First-Order Prediction


x1      x2      y1y2         predicted      deviation
-1 -1 2323.49 2296.3715 27.11847
1 -1 538.21 518.7799 19.43014
-1 1 704.88 684.7532 20.12681
1 1 2101.97 2076.5965 25.37347
0 0 1670.10 1749.2878 -79.18778
0 0 1737.54 1749.2878 -11.74778


T(1) = 182.9844
S(1) = 4.8225



resulted in the following model


Ĝ(y1, y2) = 1512.6983 - 97.0475x1 - 13.7125x2 + 795.5925x1x2.


The analysis of the linear multiresponse model gives the following fitted models and

estimate of the variance-covariance for the responses.


μ̂1(x) = 29.383 + 10.5x1 - 10.2x2

μ̂2(x) = 59.533 - 19.375x1 + 14.875x2 + 14.925x1x2

Σ̂ = [ 0.1613889   0.1855556 ]
    [ 0.1855556   0.2543056 ]

Tables 5.6 and 5.7 give the results of the first-order and second-order predictions

from using the analysis of the linear multiresponse model and the proposed

procedure for modeling the product of responses. The procedure again performs well

in predicting the observed products.


Example 3

The third example involves the product of two responses with different

models; however, the design settings do not form an orthogonal design. The fitted








Table 5.7: Example 2: Results from Second-Order Prediction


x1 x2 y1y2 predicted deviation
-1 -1 2323.49 2296.5571 26.93292
1 -1 538.21 518.9654 19.24458
-1 1 704.88 684.9387 19.94125
1 1 2101.97 2076.7821 25.18792
0 0 1670.10 1749.4733 -79.37333
0 0 1737.54 1749.4733 -11.93333


T(2) = 182.6133
S(2) = 4.8121



Table 5.8: Example 3: Design Settings and Generated Data


x1      x2      y1      y2      G(y1,y2) = y1y2
-1.7 -1.0 20.88 202.81 4234.673
1.5 -0.8 52.47 111.83 5867.720
-1.0 0.7 14.46 213.68 3089.813
0.9 0.9 29.08 162.46 4724.337
-1.0 -1.4 34.65 211.22 7318.773
1.0 -0.6 45.98 139.76 6426.165
-1.3 1.0 5.83 176.24 1027.479
1.0 0.8 32.18 181.58 5843.244
0.0 0.0 29.59 162.22 4800.090
0.0 0.1 27.87 168.42 4693.865
-0.2 0.0 29.17 194.56 5675.315



model for the first response is again a first-order model while the fitted model for

the second response includes the interaction term x1x2. The design settings and

generated data are given in Table 5.8.

The standard naive procedure resulted in the following first-order model for

the observed products


Ĝ(y1, y2) = 4908.4565 + 895.5674x1 - 1416.3569x2.
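This naive first-order fit can be reproduced by ordinary least squares on the observed products in Table 5.8. A brief sketch:

```python
import numpy as np

# Data from Table 5.8: (x1, x2, y1*y2).
data = [(-1.7, -1.0, 4234.673), (1.5, -0.8, 5867.720), (-1.0, 0.7, 3089.813),
        (0.9, 0.9, 4724.337), (-1.0, -1.4, 7318.773), (1.0, -0.6, 6426.165),
        (-1.3, 1.0, 1027.479), (1.0, 0.8, 5843.244), (0.0, 0.0, 4800.090),
        (0.0, 0.1, 4693.865), (-0.2, 0.0, 5675.315)]

X = np.array([[1.0, x1, x2] for x1, x2, _ in data])  # first-order model matrix
g = np.array([gv for _, _, gv in data])              # observed products

# Naive first-order least-squares fit to the observed products.
b, *_ = np.linalg.lstsq(X, g, rcond=None)
```

Unlike the orthogonal designs of Examples 1 and 2, the coefficients here must be obtained by solving the full normal equations rather than by simple contrasts.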