QUANTILE DISPERSION GRAPHS FOR DESIGN COMPARISONS FOR LOGISTIC MODELS AND OTHER MODELING ISSUES

By

KEVIN S. ROBINSON

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2000

Copyright 2000 by Kevin S. Robinson

Dedicated to my loving wife, Becky

ACKNOWLEDGMENTS

I would first like to thank my chairman, Dr. Andre I. Khuri, for his enormous amount of guidance, patience, and support in the writing of this dissertation. His hard-work ethic and enthusiasm for statistics, research, and teaching serve as an excellent example for me as I pursue an academic career. I would also like to thank the members of my committee: Dr. Alan G. Agresti, Dr. Malay Ghosh, Dr. Kenneth M. Portier, and Dr. Raphael T. Haftka, for their patience and assistance.

Special thanks go to all of the faculty, staff, and students of the Department of Statistics of the University of Florida for the imparting of knowledge and encouragement over the years, especially Dr. Ronald H. Randles and Dr. G. Geoffrey Vining. I would also like to acknowledge the Department of Mathematical Sciences at Messiah College for its academic, personal, and professional influence. I specifically thank Dr. L. Marlin Eby for his encouragement and guidance concerning graduate school in statistics and his continuing friendship.

I wish to acknowledge my family. I am deeply grateful to my parents for their love and encouragement. I wish to extend my love and gratitude to my three sisters and their husbands. I also thank my in-laws for their support and prayers. Most importantly, I thank my wife, Becky. I am eternally grateful for her love, patience, encouragement, and confidence in me. Finally, I praise God for His daily blessings and the gift of Jesus Christ.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT

CHAPTERS

1 INTRODUCTION

2 LITERATURE REVIEW
  2.1 Introduction
  2.2 Designs for Generalized Linear Models
  2.3 Review of Multiresponse Modeling
  2.4 Functions of Multiple Responses

3 A GRAPHICAL PROCEDURE FOR EVALUATING AND COMPARING DESIGNS FOR LOGISTIC MODELS
  3.1 Introduction
  3.2 Prediction Capability of Design
  3.3 Application to Logistic Models
  3.4 Quantile Dispersion Graphs
  3.5 Numerical Examples
  3.6 Conclusions

4 EVALUATION AND COMPARISON OF STANDARD RSM DESIGNS UNDER NONSTANDARD CONDITIONS
  4.1 Introduction
  4.2 Preliminary Results
  4.3 Scaled Mean-Squared Error of Prediction
  4.4 SMSEP for the Logistic Regression Model
  4.5 A Numerical Example
  4.6 Conclusions

5 MODELING FUNCTIONS OF MULTIPLE RESPONSES
  5.1 Introduction
  5.2 "Naive" Modeling Procedures
  5.3 Multiresponse Analysis and Proposed Modeling Procedure
  5.4 Numerical Examples
  5.5 Summary

6 SUMMARY AND FUTURE RESEARCH

APPENDICES

A DISTRIBUTION OF THE RATIO
B S-PLUS CODE: MODELING PRODUCT, CHAPTER 5 EXAMPLE 1
C S-PLUS CODE: MODELING RATIO, CHAPTER 5 EXAMPLE 4

REFERENCES
BIOGRAPHICAL SKETCH
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

QUANTILE DISPERSION GRAPHS FOR DESIGN COMPARISONS FOR LOGISTIC MODELS AND OTHER MODELING ISSUES

By Kevin S. Robinson

August 2000

Chairman: Andre I. Khuri
Major Department: Statistics

Response surface methodology (RSM) involves techniques for experimental design, statistical modeling, and optimization. This collection of techniques has been used successfully for the development, improvement, and optimization of processes. RSM was developed under the assumptions of a linear model and a normally distributed response. However, in many modern applications, such as drug development and quality assurance, the response is discrete. The response may also be multivariate, consisting of a number of observed outcomes for each experimental unit. Traditional RSM techniques are inappropriate in such situations since the basic assumptions are not valid. A more appropriate analysis would involve generalized linear models or multiresponse modeling techniques.

Designs for fitting a generalized linear model depend on the unknown parameters of the model. The use of any design optimality criterion would therefore require some prior knowledge of the parameters. A graphical technique is proposed for comparing and evaluating designs for a logistic regression model. Quantiles of the mean-squared error of prediction are obtained on concentric surfaces inside a region of interest, R. For a given design, these quantiles depend on the model's parameters. Plots of the maxima and minima of the quantiles, over a subset of the parameter space, produce the so-called quantile dispersion graphs (QDGs). The plots provide a comprehensive assessment of the overall prediction capability of the design within the region R. They also depict the dependence of the design on the model's parameters. The QDGs can therefore be conveniently used to compare several candidate designs. Sequential generation of optimal designs and an extension of the QDGs to the scaled mean-squared error of prediction are also considered.

In a multiresponse experiment the researcher may be interested in modeling a function of the responses. A method is presented for modeling a function of multiple responses. The proposed procedure is based upon a multiresponse analysis of the data and a Taylor's series approximation of the function of interest. The proposed technique compares favorably with a standard "naive" procedure.

CHAPTER 1
INTRODUCTION

In the twenty-first century, statistical data analysis will continue to grow in application and importance. Response surface methodology (RSM), with its basic goals of experimental design and modeling for the purpose of process improvement, will also continue to develop. Myers (1999) emphasizes that two areas into which RSM is moving are generalized linear models (GLMs) and multiple responses. Applications of GLMs include the modeling of binary response data and Poisson count data, while multiple responses are often encountered in situations such as the yield and profit of a chemical process or the assessment of the overall quality of a manufactured product.

In the area of GLMs, the main issue confronting RSM is experimental design. Software is available in many statistical packages for the analysis of GLMs; unfortunately, however, none is available, as far as we know, for constructing designs for GLMs. Myers (1999, p. 37) stresses that new and creative approaches are needed to deal with the design issues encountered in GLMs. The standard response surface designs and optimal designs were developed under traditional assumptions concerning the model and response.
Designs for GLMs encounter the problem of dependence on the unknown parameters. The objective of an experiment is to provide estimates of the parameters; the researcher is therefore faced with a serious dilemma: either provide guessed values for the parameters or employ a sequential approach to the experiment. Methods are needed to assess the sensitivity of GLM designs to the unknown parameters.

Multiresponse data are now also very common, with much of the research and analysis focusing on optimization. The analysis of multiresponse data should give consideration to the correlation structure among the responses. Khuri and Cornell (1996, chapter 7) consider not only the optimization of multiresponse data, but the modeling, choice of design, and lack-of-fit testing as well. In certain multiresponse experiments, the responses may be further used in the computation of an outcome of interest to the experimenter; that is, the experimenter is interested in modeling a function of correlated responses. Methods that recognize the multivariate nature of the data are needed in this area.

In this dissertation, a graphical technique for the comparison of designs for logistic regression models is developed, and a modeling procedure for functions of multiple responses is proposed. A review of the literature concerning GLM design and multiresponse modeling is contained in chapter 2, which also gives more details concerning the areas to be investigated in this dissertation. Chapters 3 and 4 concern the development of the so-called quantile dispersion graphs for the comparison of designs for logistic regression models. In chapter 5, a new method for the modeling of functions of responses is proposed which takes into account the correlation structure among the responses. Finally, a summary and a list of future research topics are included in chapter 6.
CHAPTER 2
LITERATURE REVIEW

2.1 Introduction

Extensive work in the area of single-response modeling has been carried out in the statistical literature since the 1950s. Response surface methodology (RSM), as developed in the early 1950s by George Box and several other researchers (Box and Wilson 1951), mainly considers situations involving a single response with independent and normally distributed errors. Available textbooks on RSM include Box and Draper (1987), Khuri and Cornell (1996), and Myers and Montgomery (1995).

In many experimental situations, the traditional error assumptions are inappropriate. The generalized linear models (GLMs) approach (McCullagh and Nelder 1989) extends linear models in a manner that allows less stringent error and model assumptions. Khuri (1993) discussed the use of RSM within the framework of GLMs, noting that optimal design theory based on traditional RSM is no longer appropriate in the GLM context. Sequential design and Bayesian design are approaches employed in the design of experiments for GLMs.

In other experimental situations, several responses are often of interest. For example, in the semiconductor industry, the number of particles remaining and the oxide removal rate for a standard wafer cleaning process are both of interest in the manufacturing process. Consider also the assessment of pencil lead, which often uses the following: ash test, porosity test, strength test, and outside diameter. Each of these examples has multiple responses that may be correlated. Multiple responses are also encountered through the modeling of a response's mean and variance, often analyzed as a dual response. The modeling of several responses should take into consideration the multivariate nature of the data: a multivariate analysis can be more informative and more appropriate than a collection of univariate analyses.
Unfortunately, however, multiresponse modeling tends to be ignored because of its complexity. The use of linear models and analysis of variance has been extended to the modeling of multiple responses (Anderson 1984, chap. 8). The classical multivariate analysis of variance (MANOVA) model can be restrictive in that each of the responses must have the same linear relationship with the control variables and the same design. Multiple-design multivariate regression models are more general, as they allow the responses to have different relationships with the control variables as well as different designs (McDonald 1975). Khuri (1996) presented a thorough review of multiresponse surface methodology.

2.2 Designs for Generalized Linear Models

2.2.1 Review of Linear Model Design

Developments in optimal experimental design have largely been concerned with linear models; for example, see the books by Fedorov (1972), Silvey (1980), Pukelsheim (1993), and Atkinson and Donev (1992). From an RSM perspective, the books by Box and Draper (1987), Khuri and Cornell (1996), and Myers and Montgomery (1995) mainly discuss designs for linear response surface models under the traditional error assumptions.

In the linear model situation, attention is primarily focused on the properties of the variance-covariance matrix of the estimates of the model's unknown parameters. Silvey (1980, p. 3) presents Fisher's information matrix as a motivation for this approach. In this case, the variance-covariance matrix of the estimates is σ²(X′X)⁻¹. Traditional design criteria focus upon properties of the matrix X′X, which does not depend on the model's parameters. For example, the D-optimality criterion calls for the maximization of the determinant of X′X, while in G-optimality the criterion is to minimize the maximum prediction variance, which depends upon X′X, over a design region.
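The D- and G-criteria just described can be computed directly for a small design. The following sketch (not from the dissertation; the 2² factorial and first-order model are hypothetical illustrations) evaluates det(X′X) and the maximum prediction variance f′(x)(X′X)⁻¹f(x) over a grid on the design region:

```python
import numpy as np

# Hypothetical first-order model f(x) = (1, x1, x2) on a 2^2 factorial design.
X = np.array([[1.0, -1, -1],
              [1.0, -1,  1],
              [1.0,  1, -1],
              [1.0,  1,  1]])

M = X.T @ X                      # information matrix X'X = 4*I here
d_criterion = np.linalg.det(M)   # D-optimality: maximize det(X'X)

# G-optimality: minimize the maximum prediction variance
# v(x) = f'(x) (X'X)^{-1} f(x) over the design region [-1,1]^2.
Minv = np.linalg.inv(M)
grid = [np.array([1.0, a, b]) for a in np.linspace(-1, 1, 21)
                              for b in np.linspace(-1, 1, 21)]
g_criterion = max(f @ Minv @ f for f in grid)

print(d_criterion)   # det(4*I_3) = 64 for this design
print(g_criterion)   # maximum variance 3/4, attained at the four corners
```

Note that N times the maximum prediction variance is 4 × 3/4 = 3 = p, the number of parameters, which is the value singled out by the Kiefer-Wolfowitz equivalence result discussed below.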
Kiefer (1985) extended the notion of a fixed n-point design to a probability measure over the experimental region, termed a design measure. Silvey (1980, pp. 13-14) noted that the use of design measures allowed the exploitation of mathematical techniques to solve the optimal design problem, resulting in some elegant and useful theory. The most notable result is probably the Equivalence Theorem of Kiefer and Wolfowitz (1960), which establishes the equivalence of D-optimality and G-optimality for design measures. Further discussion of design measures and the Equivalence Theorem is presented in Chapter 4.

2.2.2 Generalized Linear Model Design

The generalized linear models (GLMs) approach allows for less restrictive error and model assumptions. For example, normally distributed errors and constant variance are no longer required. The text by McCullagh and Nelder (1989) presents a thorough account of the theory of GLMs. Khuri (1993) discussed the use of the GLMs approach in RSM. Atkinson and Haines (1996) provide a recent review of design theory for nonlinear and generalized linear models.

In the GLM situation the response, y, is assumed to follow a distribution from the exponential family. This includes the normal as well as the binomial, Poisson, and gamma distributions. The mean of the response is modeled as a function of the form

E[y(x)] = μ(x) = h[f′(x)β],

where f(x) is a vector of order p × 1 whose elements consist of powers and products of powers of the elements of x up to degree d(≥ 1), and β is a vector of p unknown parameters. The function f′(x)β is called the linear predictor. It is assumed that h(·) is a strictly monotone function. Using the inverse of h(·) we have g[μ(x)] = f′(x)β. The function g(·) is called the link function. In traditional linear models, the link function is the identity function. The method of maximum likelihood (ML) is utilized to estimate β.
It follows that an approximation for the variance-covariance matrix of the ML estimates of β is the inverse of Fisher's information matrix, which is p × p with (i, j)th element −E[∂²l/∂β_i∂β_j], where l is the log-likelihood function. Iteratively reweighted least squares (IRWLS) can be used to fit a generalized linear model, as discussed in Khuri (1993) and McCullagh and Nelder (1989). Let z(x) = g[y(x)]. Using the first-order terms in a Taylor series expansion,

z(x) ≈ g[μ(x)] + [y(x) − μ(x)]g′[μ(x)].

Thus E[z(x)] ≈ g[μ(x)] = f′(x)β and Var[z(x)] ≈ Var[y(x)]{g′[μ(x)]}² = w(x, β). Using IRWLS, Var[β̂] ≈ (X′WX)⁻¹, where β̂ is the ML estimate of β, X is a matrix whose uth row is f′(x_u), and W is a diagonal matrix whose uth diagonal element is 1/w(x_u, β), x_u being the vector of design settings at the uth experimental run. It follows that the information matrix,

X′WX = Σ_{u=1}^{n} [1/w(x_u, β)] f(x_u)f′(x_u),

depends on the unknown parameters, β, through w(x_u, β). Let ẑ(x) = f′(x)β̂ denote the predicted value of z at a point x in the experimental region.

In the GLM situation, as in traditional design theory, the focus has been on the variance-covariance matrix, (X′WX)⁻¹, of the parameter estimates, where the matrix W depends upon the unknown model parameters. A D-optimal design maximizes the determinant of X′WX; however, there is no single D-optimal design because of the dependency on the unknown parameters. G-optimal designs are obtained by minimizing the maximum prediction variance over the design region, where the prediction variance is Var[ẑ(x)] ≈ f′(x)(X′WX)⁻¹f(x).

The dependence upon the unknown parameters, β, encountered in GLMs also occurs in the consideration of designs for nonlinear regression models. Common approaches for dealing with the dependency problem include using a "good" guess of the unknown parameters, a sequential approach in which a design is determined in at least two stages, and a Bayesian approach which introduces a prior distribution on the parameters.
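The dependence of X′WX on β can be made concrete with a small numeric sketch (not from the dissertation; the three-point design and both parameter guesses are hypothetical). For the logistic link, 1/w(x, β) reduces to π(x)[1 − π(x)], so the diagonal of W is just the binomial variance function:

```python
import numpy as np

def logistic_information(design, beta):
    """Fisher information X'WX for a logistic model with linear predictor
    f'(x)beta, f(x) = (1, x); for the logistic link the uth diagonal
    element of W is pi(x_u) * (1 - pi(x_u))."""
    X = np.column_stack([np.ones(len(design)), design])
    eta = X @ beta
    pi = 1.0 / (1.0 + np.exp(-eta))
    W = np.diag(pi * (1 - pi))
    return X.T @ W @ X

design = np.array([-1.0, 0.0, 1.0])   # hypothetical three-point design
I1 = logistic_information(design, np.array([0.0, 1.0]))   # one parameter guess
I2 = logistic_information(design, np.array([2.0, 3.0]))   # another guess
det1, det2 = np.linalg.det(I1), np.linalg.det(I2)

# det(X'WX) changes with the guessed beta, so the D-criterion ranks the
# same design differently under different guesses: no single D-optimal design.
print(det1, det2)
```

Under the first guess the design sits where the variance weights are large; under the second, most weight is lost at the high-probability points, so the determinant drops sharply.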
White (1973) provided an extension of the Kiefer and Wolfowitz (1960) Equivalence Theorem to nonlinear models. Chapter 4 contains a review of this extension.

Developments in designs for GLM experiments have mostly been concerned with binary responses, commonly modeled using the logistic regression model (Abdelbasit and Plackett 1983; Minkin 1987; Chaloner and Larntz 1989; Myers, Myers, and Carter 1994). Abdelbasit and Plackett (1983) discuss designs for experiments dealing with binary data, specifically biological assay or fatigue experiments. An experiment consists of subjects receiving a dose level (stimulus), x, to which each subject may respond. Interest is in the probability of a response, π(x). D-optimal designs for the logistic model,

logit π(x) = log[π(x)/(1 − π(x))] = β(x − μ),

are found assuming two-point designs symmetric about μ with equal allocations. The optimal design is to use the dose levels corresponding to ED17.6 and ED82.4, where ED100p represents the value of x which produces a 100p% response rate. Note that these dose levels are unknown since the parameters are unknown. Abdelbasit and Plackett suggest dividing the experiment into stages and using the improved estimates after each stage to determine the dose levels.

Minkin (1987) considers the selection of a design for logistic regression such that the likelihood-based confidence region is of minimum volume. Minkin notes that maximization of the determinant of Fisher's information matrix is in effect equivalent to minimizing the volume of a confidence region based on a second-order Taylor's approximation of the log-likelihood function. This work generalizes the results of Abdelbasit and Plackett (1983) that designs with an even number of observations should be two-point designs symmetric about ED50 with equal allocation at ED17.6 and ED82.4. For an odd number of observations, N, no fewer than (N − 1)/2 points should be located at each of the two dose levels.
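The ED17.6/ED82.4 result can be checked numerically. For a symmetric two-point design at μ ± d with equal allocation, the off-diagonal information terms cancel and det(X′WX) is proportional to [d·w(d)]², where w(d) = π(1 − π) evaluated at logit value βd; with β = 1 it suffices to maximize d·w(d). This sketch (a verification exercise, not code from the dissertation) does so by grid search:

```python
import numpy as np

# Symmetric two-point design for logit pi(x) = beta*(x - mu):
# maximize d * w(d) over the half-spacing d, with w(d) = pi*(1-pi)
# evaluated at logit value d (taking beta = 1 without loss of generality).
d = np.linspace(0.01, 5.0, 100000)
pi = 1.0 / (1.0 + np.exp(-d))
objective = d * pi * (1.0 - pi)

d_opt = d[np.argmax(objective)]
p_opt = 1.0 / (1.0 + np.exp(-d_opt))

print(round(d_opt, 3))        # ~1.543 on the logit scale
print(round(100 * p_opt, 1))  # ~82.4% response rate, i.e. ED82.4
```

The optimum lands at logit ±1.543, i.e. response rates 17.6% and 82.4%, reproducing the Abdelbasit-Plackett dose levels.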
Again, a sequential approach to selecting the design is recommended. Further discussion of sequential design and analysis is given by Wu (1985a,b) and Chaudhuri and Mykland (1993).

An alternative to using a "good" guess or a sequential design is a Bayesian methodology that places a prior distribution on the unknown parameters. A thorough review of Bayesian experimental design is given by Chaloner and Verdinelli (1995). Chaloner and Larntz (1989) apply Bayesian design theory to logistic regression experiments. Designs were found using a range of uniform and independent prior distributions for μ and β. It was concluded that the number of support points increases with the uncertainty in the prior distribution.

Myers, Myers, and Carter (1994) investigated optimal designs for the logistic model by considering the prediction of a response, focusing on the prediction variance of the estimated logit. The authors considered general designs that minimize the average prediction variance, the so-called Q-optimality, and designs centered around ED50 that minimize the maximum prediction variance, namely G-optimality. Designs were found using the Nelder and Mead (1965) simplex search procedure. The authors assessed the robustness of the designs to poor initial guesses through efficiencies. They concluded that Q-optimal designs with two or three points were fairly robust to poor initial guesses. Furthermore, underestimating the slope, β, resulted in higher Q-efficiencies, while improved estimates of ED50 (μ) reduced the effect of a poor slope estimate.

2.2.3 Graphical Tools for Evaluating Experimental Designs

Until recently, attempts to compare designs have been made using single-valued criteria, such as D-optimality and G-optimality. However, the success of a design for a given model is determined by its ability to predict the response efficiently. The prediction variance function can be utilized to determine prediction capability.
G-optimality addresses the prediction variance but fails to provide information about the adequacy of prediction at various points in the region of interest. Advances in computing and graphical methods have recently resulted in effective graphical techniques for evaluating designs (Giovannitti-Jensen and Myers 1989; Myers, Vining, Giovannitti-Jensen, and Myers 1992; Khuri, Kim, and Um 1996). The variance dispersion graphs (VDGs) presented by Giovannitti-Jensen and Myers (1989) are two-dimensional plots of the maximum, minimum, and spherical average prediction variances over concentric spheres inside a region of interest against their radii. Khuri, Kim, and Um (1996) suggest that quantile plots of the prediction variance on various spheres inside a region of interest allow a more comprehensive and accurate assessment of the prediction capability of a design.

In the nonlinear and GLM situations, the VDGs and quantile plots encounter the difficulty of dependence upon the unknown parameters, as well as biased estimates. Khuri and Lee (1996) present a graphical technique extending the quantile plots to nonlinear models. For evaluating the prediction capability of a design, Khuri and Lee suggest plotting quantiles of the estimated scaled mean-squared error of prediction (ESMSEP) on concentric surfaces within the region of interest. The mean-squared error is utilized to address the bias in the estimates and is put on a "per observation" basis. The mean-squared error is estimated using the fitted model because of its dependence on the model's unknown parameters. In order to compare designs, Khuri and Lee propose the so-called minimum and maximum SMSEP plots, where SMSEP is the scaled mean-squared error of prediction. Nonlinear models with only one control variable were considered. The minimum and maximum SMSEP are found over a subset of the parameter space for each design and are plotted against the control variable, x.
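The quantile idea underlying the VDGs and quantile plots can be sketched for the simplest linear-model case (this is an illustration, not code from any of the cited papers; both designs are hypothetical). Quantiles of the scaled prediction variance are computed over points sampled on a sphere of a given radius:

```python
import numpy as np

rng = np.random.default_rng(0)

def scaled_pv_quantiles(X, radius, probs=(0.0, 0.5, 1.0), n_pts=4000):
    """Quantiles of N * f'(x) (X'X)^{-1} f(x), the scaled prediction variance
    of a first-order model, over points sampled on a sphere of given radius."""
    N = len(X)
    Minv = np.linalg.inv(X.T @ X)
    u = rng.normal(size=(n_pts, X.shape[1] - 1))
    u = radius * u / np.linalg.norm(u, axis=1, keepdims=True)  # on the sphere
    f = np.column_stack([np.ones(n_pts), u])                   # f(x) = (1, x')
    v = N * np.einsum('ij,jk,ik->i', f, Minv, f)
    return np.quantile(v, probs)

# A rotatable 2^2 factorial: the prediction variance is constant on each
# circle, so the min/median/max quantiles coincide.
X1 = np.array([[1., -1, -1], [1., -1, 1], [1., 1, -1], [1., 1, 1]])
# Augmenting it with an off-center run destroys rotatability, and the
# quantiles on the same circle spread apart.
X2 = np.vstack([X1, [1., 2., 0.]])

q_rot = scaled_pv_quantiles(X1, 1.0)
q_aug = scaled_pv_quantiles(X2, 1.0)
print(q_rot)   # three (essentially) equal values
print(q_aug)   # min < median < max
```

Plotting such quantiles against the radius, for each candidate design, is the essence of the quantile plots of Khuri, Kim, and Um (1996); the QDGs of chapters 3 and 4 add the extra layer of minimizing and maximizing over the unknown parameters.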
An example was presented to demonstrate that designs based on a single-valued criterion may perform poorly in terms of the mean-squared error of prediction.

The development of quantile dispersion graphs (QDGs) for logistic regression models is presented in chapter 3. Chapter 4 contains details about using the QDGs for the comparison of traditional RSM designs, as well as issues concerning the sequential generation of optimal designs.

2.3 Review of Multiresponse Modeling

2.3.1 General Multiresponse Model

Suppose that an experimenter is interested in modeling r responses using q control variables. An experiment is performed in which the responses and control variables are measured on each of N experimental runs. Multiresponse models allow each of the responses to have a different relationship with the control variables. Each response is represented by a model function involving the control variables and a set of unknown parameters. The multivariate nature of the data is accounted for by taking into consideration the variance-covariance structure of the r responses.

At the uth run, let y_u = (y_u1, y_u2, ..., y_ur)′ be the vector of r responses, and x_u = (x_u1, x_u2, ..., x_uq)′ be the vector of q control variables. The model for the ith response is of the form

y_ui = f_i(x_u, β) + ε_ui,  u = 1, 2, ..., N,  i = 1, 2, ..., r,  (2.1)

where β = (β_1, β_2, ..., β_p)′ is a vector of p unknown parameters, ε_ui is a random error, and the model function f_i for the ith response is of known form. The standard assumptions about the random errors are E(ε_ui) = 0 for all u, i, and E(ε_ui ε_u′i′) = σ_ii′ if u = u′, zero otherwise. Thus, random errors (among the r responses) from the same experimental run are assumed to be correlated, but random errors from different experimental runs are uncorrelated. Assuming that the vector of random errors for the uth run, ε_u = (ε_u1, ε_u2, ..., ε_ur)′, has a multivariate normal distribution, Box and Draper (1965) utilized a Bayesian methodology to estimate β.
The likelihood function for the N × r response matrix Y, whose ith column is the vector of N observations on the ith response (i = 1, 2, ..., r), is given by

L(Y | β, Σ) = (2π)^(−Nr/2) |Σ|^(−N/2) exp{−(1/2) trace[Σ⁻¹ Z′(β)Z(β)]},  (2.2)

where Σ is the variance-covariance matrix (σ_ij) for the r responses, Z(β) = Y − F(β), and F(β) is an N × r matrix whose (u, i)th element is f_i(x_u, β). Assuming noninformative prior distributions for β and Σ, the marginal posterior density for β is

p(β | Y) ∝ |Z′(β)Z(β)|^(−N/2).  (2.3)

Estimation of β is achieved by maximizing this posterior density, which is equivalent to minimizing the determinant

Δ(β) = |Z′(β)Z(β)|  (2.4)

with respect to β. This estimation procedure is referred to as the Box-Draper determinant criterion. Bates and Watts (1988, pp. 138-139) noted that this Bayesian argument is in fact not needed and that the standard likelihood approach leads to the same method of estimation. The estimation method is very general in that it places no restrictions on the functional relationships used in the modeling.

2.3.2 Linear Multiresponse Model

If the multiresponse model (2.1) is linear in the parameters, then

f_i(x_u, β) = f_i′(x_u)β_i,  i = 1, 2, ..., r,

where f_i(x_u) is a vector of order p_i × 1 whose elements consist of powers and products of powers of the elements of x_u up to degree d_i(≥ 1), and β_i is a vector of p_i unknown constant coefficients. In this case, the vector β in model (2.1) is of the form β = (β_1′ : β_2′ : ... : β_r′)′, assuming that the β_i's have no common elements. The generalized least-squares estimator of β is given by the value of β that minimizes

Σ_{i=1}^{r} Σ_{j=1}^{r} σ^{ij} (y_i − X_iβ_i)′(y_j − X_jβ_j),  (2.5)

where y_i = (y_1i, y_2i, ..., y_Ni)′, i = 1, 2, ..., r, X_i is an N × p_i matrix whose uth row is f_i′(x_u), and σ^{ij} is the (i, j)th element of Σ⁻¹. Since Σ is unknown, Zellner (1962) proposed a procedure for estimating it. For this purpose, each response is first analyzed using ordinary least squares.
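Zellner's two-stage idea — ordinary least squares per response, a residual-based estimate of Σ, then generalized least squares on the stacked system — can be sketched on simulated data (everything here is hypothetical: the two-response models, coefficients, and error correlation are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated two-response linear multiresponse data: errors correlated
# across responses within a run, uncorrelated across runs.
N = 50
x = rng.uniform(-1, 1, N)
X1 = np.column_stack([np.ones(N), x])         # model matrix, response 1
X2 = np.column_stack([np.ones(N), x, x**2])   # response 2 uses a richer model
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
E = rng.multivariate_normal([0.0, 0.0], Sigma, size=N)
y1 = X1 @ [1.0, 2.0] + E[:, 0]
y2 = X2 @ [0.5, -1.0, 1.5] + E[:, 1]

# Stage 1: OLS per response; estimate Sigma from the residual cross products.
b1 = np.linalg.lstsq(X1, y1, rcond=None)[0]
b2 = np.linalg.lstsq(X2, y2, rcond=None)[0]
e = np.column_stack([y1 - X1 @ b1, y2 - X2 @ b2])
S = e.T @ e / N                                # sigma_hat_ij = e_i' e_j / N

# Stage 2: GLS on the stacked system; stacking by response gives error
# covariance Sigma (x) I_N, so its inverse is inv(S) (x) I_N.
Xb = np.block([[X1, np.zeros((N, 3))], [np.zeros((N, 2)), X2]])
y = np.concatenate([y1, y2])
Om_inv = np.kron(np.linalg.inv(S), np.eye(N))
beta = np.linalg.solve(Xb.T @ Om_inv @ Xb, Xb.T @ Om_inv @ y)
print(beta)   # estimates of (1, 2) and (0.5, -1, 1.5), up to sampling noise
```

Because the two responses use different model matrices, this is exactly the multiple-design situation where the GLS criterion (2.5) gains over response-by-response OLS.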
Let e_i denote the vector of least-squares residuals for the ith response, i = 1, 2, ..., r. An estimate of Σ is given by Σ̂ = (σ̂_ij), where

σ̂_ij = e_i′e_j / N,  i, j = 1, 2, ..., r.  (2.6)

An estimate of β is then obtained by using Σ̂ in place of Σ in criterion (2.5). The generalized least-squares method is therefore a two-stage procedure which does not require the normality assumption. Chapter 5 contains a more detailed presentation of the analysis of the linear multiresponse model.

Problems such as dependencies among the responses and missing data are associated with the analysis of multiresponse models. Box, Hunter, MacGregor, and Erjavec (1973) considered three kinds of dependencies: dependence among the errors, which led to the Box-Draper criterion; linear dependencies among the expected values of the responses; and linear dependencies in the data. An eigenvalue-eigenvector analysis was proposed to detect and identify such dependencies. Khuri (1990) noted that problems may occur in the eigenvalue analysis if the response variables have different units of measurement, and suggested a scaling convention for the responses to address this situation. The problem of missing data was addressed by Box, Draper, and Hunter (1970) by treating the missing values as additional parameters. Stewart and Sorensen (1981) considered an approach using the posterior density function of β and Σ given the complete data.

2.4 Functions of Multiple Responses

In some multiresponse experiments, the modeling of a function (or functions) of the response variables is of interest. The following two multiresponse experiments originated from the semiconductor industry (T. Moore, personal communication, 1996).

Example 1. SEMATECH performed an experiment involving a silicon nitride deposition process using three factors in an effort to identify a suitable process window. The three control variables (factors) were pressure, total flow, and gas ratio.
The responses of interest were thickness (THK), wafer-to-wafer standard deviation (WTWSD), and within-wafer standard deviation (WIWSD). The main objective was the determination of settings of the control variables that would achieve a target value for THK while minimizing the variation in the process. The problem at hand is the modeling of the total variation in the process (in general, the modeling of functions of the measured responses). In the present example, the total variation is defined by the function

TOTSD = √(WTWSD² + WIWSD²).  (2.7)

Two methods of modeling this variable can be considered. First, the new response TOTSD is modeled along with the THK response. Second, the three responses WTWSD, WIWSD, and THK are modeled and their predicted values are substituted into TOTSD. It is of interest to know which of these two methods is better and whether there are better alternative approaches to modeling such a function of multiple responses.

Example 2. SEMATECH performed an experiment to investigate a tool used in the process of polishing wafers. Three control variables (factors) were considered: table speed, downforce, and slurry concentration. The measured responses were the removal rate of metal (RR), the oxide removal rate (OR), and the within-wafer standard deviation (WIWSD) for the removal rate of metal. Two functions of the multiple responses were of primary interest:

Y1 = RR/OR,  Y2 = WIWSD/RR.

Settings of the factors that would maximize Y1 and minimize Y2 were desired.

Each of the above examples illustrates a situation in which the primary interest is the modeling of a function (or functions) of several responses. Little work has been done in this area. Hunter and Jaworski (1986) describe a method for model building where the response of interest, G(y_1, y_2, ..., y_r), is a function of r responses.
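The two modeling strategies described for TOTSD above — compute the function per run and model it directly, versus model each response and substitute the predictions — can be sketched side by side. The sketch below is purely illustrative (simulated data, invented coefficients, first-order models, one control variable), not the SEMATECH data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data mimicking Example 1: two variance responses measured
# over one control variable; interest is TOTSD = sqrt(WTWSD^2 + WIWSD^2).
N = 30
x = rng.uniform(-1, 1, N)
X = np.column_stack([np.ones(N), x])
wtwsd = 3.0 + 1.0 * x + rng.normal(0, 0.2, N)
wiwsd = 2.0 - 0.5 * x + rng.normal(0, 0.2, N)

# Method 1 ("naive"): compute TOTSD for each run, then fit it directly.
totsd = np.sqrt(wtwsd**2 + wiwsd**2)
b_naive = np.linalg.lstsq(X, totsd, rcond=None)[0]

# Method 2: fit each response separately, then substitute the predicted
# values into the function of interest.
b1 = np.linalg.lstsq(X, wtwsd, rcond=None)[0]
b2 = np.linalg.lstsq(X, wiwsd, rcond=None)[0]

x0 = np.array([1.0, 0.25])             # predict at a new setting x = 0.25
pred_naive = x0 @ b_naive
pred_subst = np.sqrt((x0 @ b1)**2 + (x0 @ b2)**2)
print(pred_naive, pred_subst)          # the two predictions generally differ
```

Neither method, as written, uses the correlation between WTWSD and WIWSD; exploiting that correlation is precisely the gap the procedure of chapter 5 is designed to fill.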
A "naive" approach consists of calculating the value of the response function for each experimental run and then using single-response techniques to model this response as a polynomial function of the control variables. Hunter and Jaworski proposed instead to fit each of the r responses y_1, y_2, ..., y_r individually; the model for the response of interest is then obtained as G(μ̂_1, μ̂_2, ..., μ̂_r), where μ̂_i is the estimate of the mean from the individually fitted model for the ith response. Hunter and Jaworski present an example where the response of interest is the product of y_1 and y_2. Using a 2² factorial design with a center run, first-order models are fitted to y_1 and y_2. The models are then multiplied together to obtain a second-order model for y_1y_2. A more detailed discussion of these two procedures is presented in Chapter 5.

Hunter and Jaworski place no constraints on how the response of interest, G(y_1, y_2, ..., y_r), is related to y_1, y_2, ..., y_r. For example, they do not discuss the problems that may occur if G(y_1, y_2, ..., y_r) is an ill-behaved function such as a ratio, y_1/y_2. Also, correlations among the multiple responses y_1, y_2, ..., y_r are not considered.

Guo and Sachs (1993) employ the naive approach and the alternative approach of Hunter and Jaworski in modeling spatial uniformity in the semiconductor industry. Spatial uniformity is defined to be the ratio of the observed standard deviation to the observed mean within a single wafer or a batch of wafers (the coefficient of variation). Guo and Sachs incorrectly call the method of Hunter and Jaworski the "multiple response surfaces approach." They describe the methods as follows:

    In most cases, spatial uniformity is not measured directly but is calculated from measurements of output characteristics taken at specified locations. It might be beneficial to fit these primitive output characteristics first and then manipulate these models to construct the model of the spatial uniformity.
This is different from the traditional approach of first calculating the value of the spatial uniformity for each experimental run and then fitting an equation to these calculated values. (Guo and Sachs 1993, p. 42)

Neither approach recognizes the multivariate nature of the data. Guo and Sachs (1993) provide a simple example to illustrate the two approaches of creating models. An experiment in semiconductor wafer manufacturing consisted of two factors (x1, x2) and three output characteristics (deposition rates at three locations within the wafer). Table 2.1 contains the data for the uniformity example. Using the naive approach of working with the last column resulted in the following second-order model:

Û(%) = 55.92 − 1.4528x1 − 0.9196x2 + 0.01094x1² + 0.00867x2² + 0.00658x1x2

The method proposed by Hunter and Jaworski suggests the fitting of first-order models to each of the three output characteristics:

μ̂1 = 29.8449 + 0.2940x1 + 0.1175x2
μ̂2 = 34.0764 + 0.2048x1 + 0.1378x2

Table 2.1: Guo and Sachs Example: Data for Uniformity Models

       Equipment Settings    Output Characteristics        Spatial Uniformity
  Run    x1      x2            y1       y2       y3          (%)
  1      46      22            45.994   46.296   48.589      3.022
  2      56      22            48.843   48.731   49.681      1.058
  3      66      22            51.555   50.544   50.908      1.004
  4      46      32            47.647   47.846   48.519      0.952
  5      56      32            50.208   49.930   50.072      0.278
  6      66      32            52.931   52.387   51.505      1.377
  7      46      42            47.641   49.488   48.947      1.950
  8      56      42            51.365   51.365   50.642      0.817
  9      66      42            54.436   52.985   51.716      2.566

Source: Guo and Sachs (1993, p.
43)

μ̂3 = 41.3942 + 0.1346x1 + 0.0355x2

The model for spatial uniformity is now obtained by using the above first-order models in the following equation:

Û(%) = sqrt{ [(μ̂1 − μ̄)² + (μ̂2 − μ̄)² + (μ̂3 − μ̄)²] / 2 } / μ̄ × 100,   where μ̄ = (μ̂1 + μ̂2 + μ̂3)/3.

This results in the following model for spatial uniformity:

Û(%) = sqrt(34.140 − 0.911x1 − 0.537x2 + 0.0064x1² + 0.0029x2² + 0.0062x1x2) / (35.105 + 0.211x1 + 0.097x2) × 100

In other situations, the function of interest can be a ratio, y1/y2, of two responses, where E[yi] = μi, Var[yi] = σi², i = 1, 2, Cov[y1, y2] = σ12, and the correlation coefficient is ρ. Fieller (1932) and Hinkley (1969) considered the distribution of the ratio y1/y2 when y1, y2 have a joint bivariate normal distribution. (See Appendix A for the distribution of y1/y2.) This distribution approaches normality as σ2/μ2 tends to zero. Shanmugalingam (1982) noted that this condition is seldom met in practical situations and performed simulation studies to investigate the distribution. Shanmugalingam also noted that variance heterogeneity and non-normality tend to occur together. This violates assumptions utilized in the naive approach of modeling y1/y2 as a single response. The work of Fieller and Hinkley addresses the distribution of the ratio but fails to address any modeling of the relationship between the ratio and control variables. A method for the modeling of functions of multiple responses is proposed in Chapter 5. The proposed method incorporates the multivariate nature of the data and uses a Taylor's series approximation of the function of interest.

CHAPTER 3
A GRAPHICAL PROCEDURE FOR EVALUATING AND COMPARING DESIGNS FOR LOGISTIC MODELS

3.1 Introduction

The design of experiments has traditionally considered linear models with random experimental errors assumed to be normally distributed with a zero mean and a constant variance.
The texts by Fedorov (1972), Silvey (1980), Pukelsheim (1993), and Atkinson and Donev (1992) give detailed accounts of optimal design theory, while Box and Draper (1987), Myers and Montgomery (1995), and Khuri and Cornell (1996) detail design of experiments from a response surface methodology (RSM) perspective. Traditional design theory has focused on design criteria that pertain to the variance-covariance matrix of the vector of parameter estimates for a given linear model. A number of such criteria have been grouped together under the so-called alphabetic optimality, for example, D-optimality and G-optimality. Alphabetic optimality criteria utilize single-valued functions that measure the precision of estimation of the parameters or of the mean response, both of which involve the variance-covariance matrix of the parameter estimates. These single-valued criteria, however, may be insufficient to assess the overall quality of prediction of a given experimental design throughout the experimental region (Khuri, Kim, and Um 1996). For linear models, the variance-covariance matrix of the parameter estimates depends upon the design settings, but is free of the unknown parameters of the model under consideration. In many experimental situations, the common assumptions concerning the form of the fitted model (e.g., linear) and/or the distribution of the error term (e.g., normal with a zero mean and a constant variance) are violated. For example, one of the most common ways in which violations may occur is when the response, in a given experiment, is binary. Binary data are frequently encountered in dose-response models, quantal response models, and success-failure experiments. Piegorsch (1998) discusses engineering situations where the data are observed proportions and the binomial probability model is assumed. Carter et al. (1986) and Vidmar, McKean, and Hettmansperger (1992) present examples from drug combination experiments with response-nonresponse data.
Generalized linear models (GLMs) provide an extension of linear models by relaxing the assumptions of normality and constant variance for the random errors (McCullagh and Nelder 1989). The basic components of GLMs consist of a response having a distribution that belongs to the exponential family, such as the normal, binomial, Poisson, and gamma distributions. In addition, the mean of the response is assumed to be a strictly monotone increasing function of a linear predictor, which depends on unknown parameters and several control variables. For GLMs, however, the variance-covariance matrix of the parameter estimates depends upon the unknown parameters of the linear predictor. This dependency problem causes a great difficulty in the construction of designs for GLMs. A few strategies for dealing with this dependency problem include the use of local optimal designs based on a "good" initial guess of the unknown parameters, and a sequential approach whereby the design is developed in stages combined with updated estimates of the parameters (Wu 1985a,b; Chaudhuri and Mykland 1993). Another strategy is the use of the Bayesian methodology, which imposes prior distributions on the parameters (Chaloner and Larntz 1989; Chaloner and Verdinelli 1995). Advances in computing and graphical methods have, in recent years, made it possible to develop effective graphical techniques for evaluating designs which do not require using single-valued criteria as in traditional design theory. Variance dispersion graphs (Giovannitti-Jensen and Myers 1989) and quantile plots of the prediction variance (Khuri, Kim, and Um 1996) are such techniques used in RSM models to assess the prediction capability of a design. These techniques, however, still encounter the same problem of dependence upon unknown parameters when applied to nonlinear models and GLMs. For the latter models, the parameter estimates are biased.
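To see this dependency concretely, consider a small numerical sketch (illustrative only; it assumes a one-variable logistic model with hypothetical design points, not an example from this chapter). The same design yields different information matrices, and hence different values of any determinant-based criterion, at different parameter values:

```python
import numpy as np

def info_matrix(x, m, beta):
    """Fisher information X'WX for a one-variable logistic model with
    linear predictor eta = beta0 + beta1*x and m_u trials at each point."""
    X = np.column_stack([np.ones_like(x), x])   # uth row is f'(x_u) = (1, x_u)
    eta = X @ beta
    pi = 1.0 / (1.0 + np.exp(-eta))             # success probabilities
    W = np.diag(m * pi * (1.0 - pi))            # w_u = m_u * pi_u * (1 - pi_u)
    return X.T @ W @ X

x = np.array([-1.0, 0.0, 1.0])                  # hypothetical design points
m = np.array([30.0, 30.0, 30.0])

# The same design produces different information at different beta vectors:
det_a = np.linalg.det(info_matrix(x, m, np.array([0.0, 1.0])))
det_b = np.linalg.det(info_matrix(x, m, np.array([0.0, 3.0])))
```

For a linear model, W would not involve the model parameters and the two determinants would coincide; here the steeper slope pushes the end-point probabilities toward 0 and 1, deflating the information.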
While investigating the bias and accuracy of parameter estimates for the quantal response model, Sowden (1971, p. 602) commented that his results "have obvious implications for designing quantal response experiments. A detailed investigation may well suggest alternative design criteria." For nonlinear models, both bias and variance functions should be considered in the evaluation of experimental designs. Khuri and Lee (1996) suggested a graphical approach based on the mean-squared error of prediction for evaluating and comparing designs for nonlinear models. The use of the mean-squared error of prediction incorporates the prediction bias as well as the prediction variance. In this chapter, a graphical technique, based on the mean-squared error of prediction, is developed for the evaluation and comparison of experimental designs for logistic regression models.

3.2 Prediction Capability of a Design

The prediction capability of a given design will be evaluated using the mean-squared error of prediction, namely,

MSE[μ̂(x)] = Var[μ̂(x)] + {Bias[μ̂(x)]}²,   (3.1)

where μ̂(x) is the estimated value of the mean response.

3.2.1 Generalized Linear Models

Consider a sample, y1, ..., yN, of independent random response variables each having a density in the exponential family (Cordeiro and McCullagh 1991), that is,

ℓ(yu; θu, φ) = exp{φ[yuθu − b(θu) + c(yu)] + d(yu, φ)},   u = 1, 2, ..., N,   (3.2)

where b(·), c(·), and d(·, ·) are known functions and φ is a dispersion parameter. It follows that E(yu) = μu = db(θu)/dθu and Var(yu) = Vu/φ, where Vu = dμu/dθu (Cordeiro and McCullagh 1991, p. 631). The specification of a link function, η = g(μ), is required, where g(·) is a strictly monotone increasing function and η is a linear predictor. Let xu be a vector of control variables associated with the uth response value, yu, u = 1, 2, ..., N. The mean response at a point x in a region of interest is modeled as a function of the control variables of the form μ(x) = h[η(x)], where h(·)
is the inverse function of g(·), the elements of x consist of the control variables, and η(x) is assumed to be a linear function of the form η(x) = f′(x)β. Here, f(x) is a vector of order p × 1 whose elements consist of powers and products of powers of the elements of x up to degree d (≥ 1), and β is a vector of p unknown parameters. The method of maximum likelihood is utilized to estimate β, implemented through an iteratively reweighted least squares procedure as discussed in Khuri (1993) and McCullagh and Nelder (1989, pp. 40-43). Using the work of Khuri (1993), which considered RSM within the framework of GLMs, an approximate prediction variance of μ̂(x), the estimated value of the mean response at x, is given by

Var[μ̂(x)] = (1/φ)[dμ(x)/dη(x)]² f′(x)(X′WX)⁻¹f(x),   (3.3)

where X is a matrix whose uth row is f′(xu), u = 1, 2, ..., N, and W = Diag(w1, ..., wN), where wu = (dμu/dηu)²/Vu, with dμu/dηu denoting the derivative of μu with respect to ηu evaluated at xu. Note that in Khuri (1993), wu = (dμu/dηu)²/[φ Var(yu)] = (1/Vu)(dμu/dηu)². The dispersion parameter, φ, has been factored out in expression (3.3).

3.2.2 Bias Computation

Difficulty in the computation of the bias of parameter estimates has hindered the use of bias as a criterion for the selection of designs for nonlinear models as well as generalized linear models. Cox and Snell (1968) and Cox and Hinkley (1974) developed a bias approximation formula for the maximum likelihood estimate of a single parameter using likelihood theory and Taylor's series. Using tensor methods, McCullagh (1987) developed a multiparameter version of the bias computation. Bias results have been further developed for nonlinear regression models with normal errors (Cook, Tsai, and Wei 1986) and with exponential family errors (Paula 1992). Cordeiro and McCullagh (1991) used the tensor methodology to develop bias computations for generalized linear models. Khuri and Lee (1996) used Cook et al.
(1986) bias formula in the development of their graphical procedure for comparing designs for nonlinear models. Cadigan (1994) presented a bias computation method which did not require the use of the tensor methodology. His method resulted in a relatively simple computational algorithm for bias approximation. This method is summarized here.

Let L denote the log-likelihood function, where L = L(β) is a function of β, a p × 1 vector of parameters to be estimated. Let u(β) be the p × 1 vector of first-order partial derivatives of L with respect to β, and let j(β) be the p × p matrix of second-order partial derivatives of L [the Hessian matrix of L]. Fisher's information matrix is then given by I = E[−j(β)] = E[u(β)u′(β)]. Using likelihood theory and Taylor's series, Cadigan (1994, p. 92, formula (5)) developed the following second-order approximation of the bias of the maximum likelihood estimate, β̂:

Bias(β̂) ≈ I⁻¹{E[j(β)I⁻¹u(β)] + (1/2)E[∂vec j(β)′/∂β] vec(I⁻¹)},   (3.4)

where, if A is m × n with columns A = [a1, ..., an], then vec(A) is the mn × 1 vector obtained by stacking the columns, vec(A) = [a1′, ..., an′]′.

Bias Formula for Generalized Linear Models

Cadigan (1994) applied the bias formula (3.4) to GLMs. The resulting bias approximation for GLMs is given by Cadigan (1994) and Cordeiro and McCullagh (1991, eq. 4.2):

Bias(β̂) = −(1/2φ)(X′WX)⁻¹X′ZdF1N,   (3.5)

where X and W are defined as before (see formula (3.3)), Zd = Diag(z11, ..., zNN), where zuu is the uth diagonal element of Z = X(X′WX)⁻¹X′, F = Diag(f11, ..., fNN), where fuu = (1/Vu)(dμu/dηu)(d²μu/dηu²), and 1N is an N × 1 vector of ones. Here dμu/dηu and d²μu/dηu² are the first and second derivatives of μ with respect to η evaluated at xu. Cordeiro and McCullagh (1991) further developed the following approximate expressions for the bias in η̂ = Xβ̂ and μ̂ = h(η̂), where μ̂ and η̂ are the vectors whose uth elements are μ̂(xu) = h[η̂(xu)] and η̂(xu) = f′(xu)β̂, respectively, u = 1, 2, ...
, N:

Bias(η̂) = −(1/2φ) Z Zd F 1N,   (3.6)

Bias(μ̂) = (1/2φ)(G2 − G1ZF) Zd 1N,   (3.7)

where G1 = Diag(dμu/dηu) and G2 = Diag(d²μu/dηu²) (Cordeiro and McCullagh 1991, eqs. 4.3 and 4.4). Expression (3.7) follows from considering the bias of μ̂, which is a function of η̂. Cordeiro and McCullagh (1991) noted that, to order n⁻¹,

Bias[μ̂(xu)] = Bias[η̂(xu)](dμu/dηu) + (1/2)Var[η̂(xu)](d²μu/dηu²),   u = 1, 2, ..., N.   (3.8)

Note that Var[η̂(x)] = Var[f′(x)β̂] ≈ (1/φ)f′(x)(X′WX)⁻¹f(x) and Bias[η̂(x)] ≈ f′(x)Bias(β̂). The expressions in (3.3) and (3.8) can then be used to get the mean-squared error of prediction for the mean response. Cordeiro and McCullagh (1991) also noted that the bias of the maximum likelihood estimate, β̂, in generalized linear models can be computed through a simple weighted linear regression. Letting ξ = −(1/2φ)W⁻¹ZdF1N, Bias(β̂) in (3.5) can be written as

Bias(β̂) = (X′WX)⁻¹X′Wξ.   (3.9)

The bias of β̂ is then the vector of regression coefficients based on the weighted linear regression of ξ on X with W being the weight matrix.

3.3 Application to Logistic Models

Consider independent random variables, r1, ..., rn, with ru having a binomial distribution with index mu and probability of success πu (u = 1, ..., n). Hence, E(ru) = muπu and Var(ru) = muπu(1 − πu). Let xu be a vector of design settings of a set of control variables associated with ru (u = 1, 2, ..., n). Consider the sample proportions yu = ru/mu (u = 1, ..., n). Then, E(yu) = πu and Var(yu) = πu(1 − πu)/mu. The density for a sample proportion yu therefore has the following form:

exp{ yumu log[πu/(1 − πu)] + mu log(1 − πu) + log C(mu, muyu) }.   (3.10)

This density is in the exponential family (see equation (3.2)) with φ = 1, θu = mu log[πu/(1 − πu)], b(θu) = mu log[1 + exp(θu/mu)], c(yu) = 0, and d(yu, φ) = log C(mu, muyu), where C(·,·) denotes the binomial coefficient. Using a logit link function with a linear predictor to model the probability of success, πu, as a function of the control variables, xu, results in g[π(xu)] = log[π(xu)/(1 − π(xu))] = f′(xu)β = η(xu) = ηu.
Hence, the mean response is E[y(xu)] = π(xu) = exp[η(xu)]/(1 + exp[η(xu)]).

For the calculation of Bias(β̂) = (X′WX)⁻¹X′Wξ, W and ξ are needed (see expression (3.9)). Here W = Diag(w1, ..., wn), where wu = (dπu/dηu)²/Vu. Since dπu/dηu = πu(1 − πu) and Vu = πu(1 − πu)/mu, we have wu = muπu(1 − πu). Since d²πu/dηu² = πu(1 − πu)(1 − 2πu), the uth element of ξ = −(1/2)W⁻¹ZdF1n is

ξu = −(1/2)zuu(d²πu/dηu²)/(dπu/dηu) = −(1/2)zuu(1 − 2πu) = zuu(πu − 1/2),   u = 1, 2, ..., n.

Consider the mean-squared error of prediction for π̂(x) at the setting, x, of the control variables, MSE[π̂(x)] = Var[π̂(x)] + {Bias[π̂(x)]}². Here, π̂(x) = exp[η̂(x)]/(1 + exp[η̂(x)]), where η̂(x) = f′(x)β̂. By formula (3.3), an approximate variance of π̂(x) is given by

Var[π̂(x)] ≈ [dπ(x)/dη]² f′(x)(X′WX)⁻¹f(x) = π(x)²[1 − π(x)]² f′(x)(X′WX)⁻¹f(x).   (3.11)

For the prediction bias of π̂(x), formula (3.8) leads to the following bias expression:

Bias[π̂(x)] ≈ Bias[η̂(x)][dπ(x)/dη] + (1/2)Var[η̂(x)][d²π(x)/dη²]   (3.12)
           = Bias[η̂(x)]π(x)[1 − π(x)] + (1/2)Var[η̂(x)]π(x)[1 − π(x)][1 − 2π(x)].

Noting that Var[η̂(x)] = Var[f′(x)β̂] ≈ f′(x)(X′WX)⁻¹f(x) and Bias[η̂(x)] ≈ f′(x)Bias(β̂), we finally get from formulas (3.11) and (3.12) the following expression for the mean-squared error of prediction at x:

MSE[π̂(x)] = π²(x)[1 − π(x)]²f′(x)(X′WX)⁻¹f(x)   (3.13)
  + {f′(x)Bias(β̂)π(x)[1 − π(x)] + (1/2)f′(x)(X′WX)⁻¹f(x)π(x)[1 − π(x)][1 − 2π(x)]}².

It should be noted that MSE[π̂(x)] depends upon the unknown β. An estimate can be obtained by replacing β with β̂.

3.4 Quantile Dispersion Graphs

As mentioned earlier, traditional single-valued design criteria may be insufficient to fully characterize the efficiency of an experimental design. In order to assess the prediction capability of an experimental design for linear response surface models, Giovannitti-Jensen and Myers (1989) suggested the use of variance dispersion graphs (VDGs). The VDGs consist of plots of the maxima and minima of the prediction variance on concentric spheres contained inside an experimental region, R.
Khuri, Kim, and Um (1996) proposed using plots of quantiles of the prediction variance on concentric spheres as a means of getting more information about the design's prediction capability than can be obtained from the VDGs. The quantile plots allow a complete assessment of the distribution of the prediction variance on concentric spheres throughout the experimental region, R. In the linear model situation, the prediction variance depends upon the design used and the location at which prediction is to be made. In GLMs and nonlinear models, the mean-squared error of prediction (MSEP) is considered because of biased parameter estimates. However, the MSEP now depends upon the prediction location, x, the design settings, and the unknown model parameters. More specifically, the MSEP depends upon the unknown elements of β in the linear predictor. Let T_D(x, β) = MSE[μ̂(x)], where D is a given design. After a design D has been used in the fitting of a model, the prediction capability of the design can be investigated by using an estimate of T_D(x, β) obtained by replacing β with β̂. However, if no estimate of β is available, or if comparisons of several designs are of interest, some subset C of the parameter space of β must be considered in order to assess the dependency of the MSEP on the unknown parameters, as we shall now explain. Quantile dispersion graphs of the MSEP, which we shall now define, can be utilized to evaluate and compare the prediction capability of designs. Consider several concentric surfaces, denoted by R_λ, located within the experimental region, R. These surfaces are obtained by reducing the boundary of R using a shrinkage factor λ. The prediction capability can be assessed by looking at the distribution of the MSEP on each of several of these concentric surfaces. More specifically, the distribution of the MSEP on a concentric surface is fully described in terms of its quantiles.
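As a concrete sketch of this computation, the following code (illustrative only; it assumes a hypothetical one-variable quadratic logistic model and design, not one of the examples below) evaluates the MSEP expression (3.13) on a shrunken interval and summarizes its distribution by quantiles:

```python
import numpy as np

def msep_logistic(x_grid, X, m, beta):
    """MSEP of pi-hat(x) from (3.13) for a quadratic one-variable
    logistic model; X is the design model matrix with rows (1, x, x^2)."""
    eta_d = X @ beta
    pi_d = 1.0 / (1.0 + np.exp(-eta_d))
    W = np.diag(m * pi_d * (1.0 - pi_d))          # w_u = m_u pi_u (1 - pi_u)
    XtWX_inv = np.linalg.inv(X.T @ W @ X)
    Z = X @ XtWX_inv @ X.T
    xi = np.diag(Z) * (pi_d - 0.5)                # xi_u = z_uu (pi_u - 1/2)
    bias_beta = XtWX_inv @ (X.T @ (W @ xi))       # equation (3.9)
    vals = []
    for x in x_grid:
        f = np.array([1.0, x, x * x])
        pi = 1.0 / (1.0 + np.exp(-(f @ beta)))
        v_eta = f @ XtWX_inv @ f                  # Var[eta-hat(x)]
        var = pi**2 * (1.0 - pi)**2 * v_eta       # (3.11)
        bias = (f @ bias_beta) * pi * (1.0 - pi) \
             + 0.5 * v_eta * pi * (1.0 - pi) * (1.0 - 2.0 * pi)  # (3.12)
        vals.append(var + bias * bias)            # (3.13)
    return np.array(vals)

# Hypothetical design and parameter vector:
xs = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])
X = np.column_stack([np.ones_like(xs), xs, xs**2])
m = np.full(6, 50.0)
beta = np.array([-1.0, 0.5, -0.05])

# Quantiles of the MSEP on the region shrunk by factor lam = 0.8:
a, b, lam = 0.0, 10.0, 0.8
lo, hi = a + (1 - lam) * (b - a), b - (1 - lam) * (b - a)
grid = np.linspace(lo, hi, 200)
q = np.quantile(msep_logistic(grid, X, m, beta), np.linspace(0, 1, 21))
```

Repeating the last step for a grid of β values in C, and taking the pointwise maximum and minimum of the resulting quantile curves, yields the quantile dispersion graphs defined next.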
For a given design D and a given β ∈ C, let Q_D(p, β, λ) denote the pth quantile of the distribution of values of T_D(x, β) on R_λ, for a specified value of the shrinkage factor λ. Now, to assess the dependency of the quantiles of the MSEP on β, Q_D(p, β, λ) is computed for several values of β forming a grid of points inside the region C. Let us then consider the following quantities:

Q_D^max(p, λ) = max_{β∈C} Q_D(p, β, λ),   (3.14)

Q_D^min(p, λ) = min_{β∈C} Q_D(p, β, λ).   (3.15)

Plots of Q_D^max(p, λ) and Q_D^min(p, λ) against p result in the so-called quantile dispersion graphs (QDGs) of the MSEP. Such plots, which can be constructed for each of several values of λ, provide a comprehensive assessment of the prediction capability of a design D throughout the region R. Several designs can therefore be compared by examining their corresponding QDG profiles. Similar QDGs were used by Khuri and Lee (1998) for the comparison of designs for nonlinear models (Khuri 1997; Lee and Khuri 1999).

3.5 Numerical Examples

Example 1

Carter et al. (1986) utilized logistic regression in a cytotoxicity study to investigate the dose-response curve for a combination of two agents. Cell cytotoxicity for the combination of the two agents, methyl methanesulfonate (MMS) and phorbol 12-myristate 13-acetate (PMA), was evaluated by counting the number of viable and dead cells. Interest was in modeling π(x), the probability of death, as a function of x1 = concentration of MMS/10 and x2 = concentration of PMA. The results of the experiment are shown in Table 3.1. Each of the two agents had 4 levels. The fitted model is given in equation (3.16) and its estimated parameters and standard errors are shown in Table 3.2.
Table 3.1: Experimental design and response values (Example 1)

  x1        x2      Number of        Total Number     yu = ru/mu
  MMS/10    PMA     Dead Cells ru    of Cells mu
  0         0       19               98               0.19
  0         1       24               87               0.28
  0         10      54               91               0.59
  0         100     68               87               0.78
  1         0       16               91               0.18
  10        0       17               90               0.19
  25        0       19               92               0.21
  1         1       19               94               0.20
  1         10      41               90               0.46
  1         100     79               93               0.85
  10        1       10               83               0.12
  10        10      36               85               0.42
  10        100     62               83               0.75
  25        1       36               92               0.39
  25        10      56               93               0.60
  25        100     74               87               0.85
  Total                              1436

log[π(x)/(1 − π(x))] = β0 + β1x1 + β2x2 + β11x1² + β22x2²   (3.16)

Table 3.2: Parameter Estimates and Model Analysis (Example 1)

  Parameter   Estimate        Std Error   P-Value
  β0          −1.329935000    0.1179      < 0.0001
  β1          0.083535200     0.0266      0.0017
  β2          0.159119900     0.0163      < 0.0001
  β11         −0.003883342    0.0010      0.0002
  β22         −0.001307519    0.00016     < 0.0001

  Scaled Deviance = 13.9195   DF = 11

The experimental region, R, of the agent combinations is rectangular with 0 ≤ x1 ≤ 25 and 0 ≤ x2 ≤ 100. Let the design shown in Table 3.1 be denoted by D1. To illustrate the proposed graphical technique, QDGs can be utilized to investigate the prediction capability of D1 and to compare it against two other designs, D2 and D3. The latter two designs have the same settings for x1 and x2 as in D1, but differ in the numbers of replications at the combinations of x1 and x2. Design D2 is balanced in the sense that the numbers of replications are equal, with magnitudes comparable to those for D1, which is considered nearly balanced. Design D3 is unbalanced with widely different numbers of replications. The total number of replications is the same as in D2. These designs are given in Table 3.3. For each of the three designs in Table 3.3, we consider the distribution of T_D(x, β) on each of several concentric rectangles, R_λ, obtained by a reduction of the boundary of the experimental region, R, using a shrinkage factor λ, 0.5 < λ ≤ 1 (see Figure 3.1). This is done as follows: let the bounds on variable xi be ai and bi such that ai ≤ xi ≤ bi, i = 1, 2.
For each value of λ, the corresponding reduced bounds on xi are ai + (1 − λ)(bi − ai) ≤ xi ≤ bi − (1 − λ)(bi − ai). To investigate the dependency of T_D(x, β) on β, a subset, C, of the parameter space for the β vector is considered. For each parameter, a range consisting of the point estimate (given in Table 3.2) plus/minus two standard errors was considered. The subset C was obtained by selecting 3 points within each parameter range, namely, the point estimate and the two end points:

Table 3.3: Designs D1, D2, and D3 (Example 1)

  x1        x2        D1             D2             D3
  MMS/10    PMA       Number of      Number of      Number of
                      Cells          Cells          Cells
  0         0         98             90             15
  0         1         87             90             25
  0         10        91             90             35
  0         100       87             90             45
  1         0         91             90             55
  10        0         90             90             65
  25        0         92             90             75
  1         1         94             90             85
  1         10        90             90             95
  1         100       93             90             105
  10        1         83             90             115
  10        10        85             90             125
  10        100       83             90             135
  25        1         92             90             145
  25        10        93             90             155
  25        100       87             90             165
  Total               1436           1440           1440

Figure 3.1: Concentric rectangles within the region R (Example 1)

       β0  ∈ {−1.565667, −1.329935, −1.094203}
       β1  ∈ {0.03030704, 0.08353520, 0.13676336}
  C =  β2  ∈ {0.1264244, 0.1591199, 0.1918154}
       β11 ∈ {−0.005945169, −0.003883342, −0.001821515}
       β22 ∈ {−0.0016228061, −0.0013075190, −0.0009922319}

For each design and a selected value of β in C, quantiles of the distribution of T_D(x, β) are obtained for x ∈ R_λ, where λ is one of several values chosen from the interval (0.5, 1]. The number of points chosen on each R_λ was 2000, consisting of 500 on each side. The quantiles are calculated for p = 0(0.05)1. The procedure is repeated for the other values of β in the subset, C. Then, Q_D^max(p, λ) and Q_D^min(p, λ) are calculated using formulas (3.14) and (3.15). The S-PLUS language (Becker, Chambers, and Wilks 1988) was used for conducting the numerical investigation and obtaining the plots. The QDGs for the comparison of designs D1 and D2 are presented in Figure 3.2.
The plots for the two designs overlap very closely, indicating a very small difference in the prediction capabilities of D1 and D2. We recall that D1 differs from D2 by having slightly unequal numbers of cells at the agent combinations. The dependency on the parameter vector β can be seen in the separation between the values of Q^max and Q^min, especially near the center of the experimental region (λ = 0.7, 0.6). The prediction capabilities of the designs appear to be sensitive to the parameter values. A desirable property for a design is to have close and small values of Q^max and Q^min over the range of p. The closeness of Q^max and Q^min indicates robustness, or lack of sensitivity, of the MSEP to the parameter values.

Figure 3.2: QDGs for Designs D1 and D2 (Example 1)

In order to see the effect of extremely "unbalanced" cell counts at the agent combinations, designs D2 and D3 are compared in Figure 3.3. Once again, the designs appear sensitive to the parameter values, especially near the center of the experimental region. However, the balanced design D2 appears to be somewhat more robust to the parameter values than D3.

Example 2

Recognizing the dependency of design criteria on model parameters, Sitter (1992) proposed a minimax procedure to obtain designs for binary data that are robust to poor initial parameter estimates. A number of design optimality criteria, including D-optimality, were considered. This minimax criterion resulted in designs with more design points and greater coverage of the design space than traditional design criteria. Experimental situations with one control variable, x, were investigated.
The probability of interest, π(x), was modeled as a function of β(x − μ). The corresponding linear predictor is β0 + β1x, where the slope parameter β1 is β and the intercept parameter β0 is −βμ. Here, μ is the 50% response dose, which is termed the ED50. This results in the following logistic regression model:

log[π(xu)/(1 − π(xu))] = β0 + β1xu,   u = 1, 2, ..., n.

Let xu be the uth design point, π(xu) the corresponding probability, and mu the number of observations at xu. The determinant of the information matrix, |X′WX|, for this model is equal to (Scazzero and Ord 1993, p. 257)

Σ_{i<j} mimj π(xi)[1 − π(xi)] π(xj)[1 − π(xj)] [xi − xj]².

Figure 3.3: QDGs for Designs D2 and D3 (Example 1)

Insight may be gained by a close examination of this result. The criterion depends upon the probabilities as well as the design points, specifically through their spacings [xi − xj]². For the D-optimal design consisting of two design points, the determinant reduces to

m1m2 π(x1)[1 − π(x1)] π(x2)[1 − π(x2)] [x1 − x2]².

Sitter's (1992) minimax approach amounts to finding a design that minimizes the maximum of the reciprocal of this determinant over some parameter space of μ and β. Note that the determinant is close to 0 if either π(x1) or π(x2) is near 0 or 1. In this situation, the D-optimality criterion becomes difficult to interpret and calculating the inverse of the information matrix can become problematic. The inverse of the information matrix is used in calculating the MSEP (see equation (3.13)).
Even though the design space may be unbounded in an experimental situation, Atkinson (1995) noted that designs for logistic regression models are constrained so that the probabilities are not too close to zero or one. For the comparison of designs, it will also be necessary to be cautious in the selection of the parameter space, since the probabilities also depend upon the model parameters. Sitter (1992, p. 1151) presented an illustration of the minimax procedure. A logistic regression model was utilized in a study to investigate the economic value of sport fishing in British Columbia. The control variable was the amount of increased fishing cost, x, and the binary response consisted of fishing/not fishing. The probability of fishing was to be modeled as a function of increased fishing cost. The D-optimal design for initial parameter estimates μ = 40 and β = 0.9 consisted of the two design points x = 38.28, 41.72. The minimax D-optimal design produced by Sitter's minimax procedure consisted of the 7 design points x = 31.72, 34.48, 37.24, 40.00, 42.76, 45.52, 48.28. The minimax design was to be robust over the parameter space given by 33 ≤ μ ≤ 47 and 0.5 ≤ β ≤ 1.25. Both designs assume an equal number of observations at the design points. Sitter noted that the minimax D-optimal design performed better than the D-optimal design over most of the parameter space. Sitter also noted that the performance of the D-optimal design was very sensitive to the initial estimate of μ. The D-optimal design was better than the minimax D-optimal design for initial estimates of μ that were nearly correct. As noted before, the D-optimality criterion for the two-point D-optimal design can be affected if the probabilities get close to zero or one. For this example, the probabilities for the D-optimal design equal π(38.28) = 1/(1 + exp(−β[38.28 − μ])) and π(41.72) = 1/(1 + exp(−β[41.72 − μ])). Note that as μ moves away from the initial estimate of 40, the probabilities can get very close to zero or one regardless of the value of β.
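This sensitivity can be checked numerically. The sketch below (illustrative; it assumes an equal allocation of 35 runs per point, which is not specified in the text) evaluates the two-point determinant given earlier as μ drifts away from the initial guess of 40:

```python
import math

def det_two_point(x1, x2, m1, m2, beta, mu):
    """Determinant m1*m2*pi1(1-pi1)*pi2(1-pi2)*(x1-x2)^2 of the
    information matrix for a two-point design under the logistic model
    with linear predictor beta*(x - mu)."""
    def p(x):
        return 1.0 / (1.0 + math.exp(-beta * (x - mu)))
    pi1, pi2 = p(x1), p(x2)
    return m1 * m2 * pi1 * (1 - pi1) * pi2 * (1 - pi2) * (x1 - x2) ** 2

# D-optimal points for the initial guess mu = 40, beta = 0.9:
d_at_40 = det_two_point(38.28, 41.72, 35, 35, 0.9, 40.0)   # mu as guessed
d_at_47 = det_two_point(38.28, 41.72, 35, 35, 0.9, 47.0)   # mu misspecified
```

With μ at the edge of Sitter's parameter space, both probabilities are driven toward zero and the determinant collapses by several orders of magnitude, even though β is unchanged.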
Thus, over much of the parameter space, the D-optimal design has a determinant close to zero. As a result, the parameter space for μ was restricted to 38 ≤ μ ≤ 42 in order to compare the designs in terms of prediction capability. Note also the presence of π(x) and 1 − π(x) in the MSEP expression in equation (3.13). We should therefore be cautious about probability values near zero or one. An experimental region, R, 30 ≤ x ≤ 50, was then considered to control the probability values and to agree with the actual regions covered by the two designs. The following is a graphical comparison of the prediction capabilities of the two designs for the model

log[π(xu)/(1 − π(xu))] = β0 + β1xu,   u = 1, 2, ..., n.

The parameter space C for (β0, β1) is trapezoidal and consists of 0.5 ≤ β1 ≤ 1.25 and −42β1 ≤ β0 ≤ −38β1. This parameter space corresponds to 38 ≤ μ ≤ 42 and 0.5 ≤ β ≤ 1.25, where the slope parameter β1 is β and the intercept parameter β0 is −β1μ. The experimental region investigated was 30 ≤ x ≤ 50, and the designs consisted of 70 runs with equal weight given to the design points. For selected parameter values (β0, β1) in C, the MSEP [see equation (3.13)] can be calculated throughout the experimental region. Since in this example there is only one control variable, no shrinkage of the experimental region was necessary. Minimum and maximum MSEP plots for points in the experimental region can be constructed, where the minimum and maximum are obtained over the parameter space C. The plots for the two designs are given in Figure 3.4. We can see that the D-optimal design does well near the center of the experimental region, while the minimax D-optimal design performs better away from the center. The quantile dispersion graphs for the two designs are given in Figures 3.5 and 3.6. From these plots, it appears that the dispersion of the quantiles for the minimax D-optimal design is less than that for the D-optimal design.
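The computation just described can be sketched as follows (an illustrative Python version rather than the original S-PLUS code; for brevity it uses only the leading variance term of the MSEP in (3.13), a coarse grid over C, and an assumed 10 runs at each of Sitter's minimax points):

```python
import numpy as np

def t_d(x_grid, xs, m, beta):
    """Approximate prediction variance of pi-hat (the leading term of
    the MSEP in (3.13)) for the first-order model eta = b0 + b1*x."""
    X = np.column_stack([np.ones_like(xs), xs])
    pi_d = 1.0 / (1.0 + np.exp(-(X @ beta)))
    W = np.diag(m * pi_d * (1.0 - pi_d))
    XtWX_inv = np.linalg.inv(X.T @ W @ X)
    F = np.column_stack([np.ones_like(x_grid), x_grid])
    pi = 1.0 / (1.0 + np.exp(-(F @ beta)))
    v_eta = np.sum((F @ XtWX_inv) * F, axis=1)    # f'(x)(X'WX)^{-1}f(x)
    return pi**2 * (1 - pi)**2 * v_eta

xs = np.array([31.72, 34.48, 37.24, 40.00, 42.76, 45.52, 48.28])  # minimax points
m = np.full(7, 10.0)
grid = np.linspace(30.0, 50.0, 400)               # region R: 30 <= x <= 50
p = np.linspace(0.0, 1.0, 21)                     # p = 0(0.05)1

# Grid over C: beta in [0.5, 1.25], mu in [38, 42]; beta0 = -beta*mu.
quants = []
for b in np.linspace(0.5, 1.25, 4):
    for mu in np.linspace(38.0, 42.0, 5):
        quants.append(np.quantile(t_d(grid, xs, m, np.array([-b * mu, b])), p))
quants = np.array(quants)
q_max = quants.max(axis=0)                        # (3.14)
q_min = quants.min(axis=0)                        # (3.15)
```

Plotting q_max and q_min against p (one curve pair per design) reproduces the kind of QDG comparison discussed here; the gap between the two curves measures sensitivity to the parameter values.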
From this, it appears that the minimax design is somewhat more robust to the parameter values. Figure 3.7 contains an overlaid display of the QDGs for the two designs. It is evident that the minimax design performs slightly better than the D-optimal design. Sitter (1992) claimed that the minimax design performed much better over a large portion of the parameter space. His claim can be partly understood by considering the sensitivity of the determinant of the two-level D-optimal design to the initial estimate of μ. For Sitter's parameter space, the two-level D-optimal design has a determinant close to zero over much of the parameter space of μ. However, our graphical analysis used a restricted parameter space, as was seen earlier, to allow for a fair comparison of the two designs. In this analysis, the minimax D-optimal design still appears to perform better than the D-optimal design, but not to the level claimed by Sitter.

Figure 3.4: Minimax D-optimal and D-optimal Designs MAX and MIN MSEP (Example 2)
Figure 3.5: QDGs for the Minimax D-optimal Design (Example 2)
Figure 3.6: QDGs for the D-optimal Design (Example 2)

Example 3

Piegorsch (1998) provided an introduction to modeling for data which are proportions, often encountered in engineering/quality settings. He presented an example involving alloy fasteners used in aircraft construction (Montgomery and Peck 1982, sec. 6.3). Interest was in modeling the probability of fastener failure
[Figure 3.7: Combined QDGs for the Minimax D-optimal and D-optimal Designs (Example 2)]

The results of the experiment are given in Table 3.4. Ten different pressure loads were considered, with the total number of fasteners and the number of failures recorded at each load. The fitted model is given in equation (3.17), and its estimated parameters and standard errors are shown in Table 3.5.

Table 3.4: Experimental design and response values (Example 3), with coded x = (2X - 6800)/1800

X = Pressure Load   Coded x   m = Total Fasteners   y = Number Failing
2500                -1         50                    10
2700                -7/9       70                    17
2900                -5/9      100                    30
3100                -3/9       60                    21
3300                -1/9       40                    18
3500                 1/9       85                    43
3700                 3/9       90                    54
3900                 5/9       50                    33
4100                 7/9       80                    60
4300                 1         65                    51

Table 3.5: Parameter estimates and model analysis using coded x (Example 3)

Parameter   Estimate   Std Error   P-Value
β0          0.0750     0.0830      0.3657
β1          1.3936     0.1418      0.0001

Scaled Deviance = 0.3719, DF = 8.

log[π(x)/(1 - π(x))] = β0 + β1 x    (3.17)

The D-optimal design for experimental situations using model (3.17) is given by Minkin (1987). This is a two-point design with x-values x1 = (1.5434 - β0)/β1 and x2 = (-1.5434 - β0)/β1. This design illustrates the design dependency problem. Using the estimates from Table 3.5, in retrospect, results in x1 = 1.053655 and x2 = -1.161344, values that lie outside the explored experimental region. To investigate the prediction capabilities of the design, it will be compared to a design with the same pressure loads but with equal replication (69 fasteners) at each of the loads. The design in Table 3.4 will be denoted by D_a and the balanced design by D_b. The parameter space investigated consisted of the point estimates, the point estimates ± one standard error, and the point estimates ± two standard errors. Figures 3.8 and 3.9 contain the max and min plots of the SMSEP and the QDGs of the SMSEP for the two designs. It is evident that the two designs exhibit very similar prediction capabilities.
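The two-point D-optimal design of Minkin (1987) is easy to check numerically. The following sketch (Python; the function name is ours, and the estimates are those of Table 3.5) computes the two support points and confirms that both fall outside the coded region [-1, 1]:

```python
def minkin_points(b0, b1):
    """Support points of Minkin's (1987) two-point D-optimal design for the
    single-variable logistic model: the x-values at which the linear
    predictor b0 + b1*x equals +/-1.5434."""
    return (1.5434 - b0) / b1, (-1.5434 - b0) / b1

# Estimates from Table 3.5 (coded pressure load)
x1, x2 = minkin_points(0.0750, 1.3936)
print(round(x1, 6), round(x2, 6))  # both outside the coded region [-1, 1]
```

This is the design dependency problem in miniature: the "optimal" points depend on parameter estimates that were only available after the experiment was run.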
There is considerable variation in the SMSEP throughout the experimental region, being low near the center and larger near the boundaries. The designs do, however, appear to have a robustness to the parameter values.

3.6 Conclusions

The proposed graphical technique, based on using the QDGs for the mean-squared error of prediction, provides a procedure for evaluating and comparing designs for GLMs. The QDGs provide information concerning the prediction capability of a design and its sensitivity to the parameter values. This information is useful in the evaluation and comparison of designs. Myers (1999), in considering designs for GLMs, stated that standard response surface designs may work well in such experimental situations. Because of the dependency of design performance on the model parameters, it is difficult to judge the performance of such designs. Myers reiterates the need for robust designs that are efficient despite the dependency on model parameters. Further work with QDGs and other graphical techniques may be useful in evaluating standard response surface designs and in the search for robust designs.

[Figure 3.8: D_a and D_b Designs, MAX and MIN MSEP (Example 3)]

[Figure 3.9: Combined QDGs of MSEP for the D_a and D_b Designs (Example 3)]

CHAPTER 4
EVALUATION AND COMPARISON OF STANDARD RSM DESIGNS UNDER NONSTANDARD CONDITIONS

4.1 Introduction

The development of a graphical technique for the evaluation and comparison of experimental designs for logistic regression models was presented in chapter 3.
The Quantile Dispersion Graphs (QDGs) give a graphical representation of the prediction capabilities of a design based on the distribution of the mean-squared error of prediction. They also convey information about the effects of the design, the location of prediction, and the parameter values on the prediction capability in a particular experimental situation. This chapter examines the implementation of the QDGs for the evaluation and comparison of classical RSM designs utilized under nonstandard conditions. A discussion is also presented concerning the General Equivalence Theorem and the sequential generation of optimal designs.

4.2 Preliminary Results

Extensive research has been done in the area of experimental design for linear models with the traditional error assumptions of normality, independence, and constant variance. For the first-order model,

y = β0 + Σ_{i=1}^k β_i x_i + ε,

where x1, x2, ..., xk is a set of control variables, the class of orthogonal designs has many desirable properties. This class includes the 2^k factorial designs, certain fractions thereof, as well as simplex and Plackett-Burman designs (Khuri and Cornell 1996, chap. 3). For the second-order model,

y = β0 + Σ_{i=1}^k β_i x_i + Σ_{i=1}^k β_ii x_i^2 + Σ_{i=1}^{k-1} Σ_{j=i+1}^k β_ij x_i x_j + ε,

examples of common designs include the 3^k factorial designs, Box-Behnken designs, and central composite designs (CCD) (Khuri and Cornell 1996, chap. 4). In a nonstandard situation, when the response is not measured on a continuous scale, or when the traditional assumptions fail, a researcher may still consider using these classical RSM designs. In such situations, how do these designs measure up? Myers (1999, p. 34) states that standard RSM designs may work well in these situations, but wonders, "How would one know?" The QDGs, which were developed in chapter 3, can be used to evaluate and compare the prediction capabilities of classical RSM designs.
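For concreteness, the model vectors f(x) underlying these first- and second-order models can be sketched as follows (an illustrative Python helper; the function names are ours):

```python
import numpy as np

def f_first_order(x):
    """Model vector f(x) for the first-order model y = b0 + sum_i b_i x_i + e."""
    return np.concatenate(([1.0], x))

def f_second_order(x):
    """Model vector for the complete second-order model: intercept, linear,
    pure quadratic, and cross-product terms."""
    x = np.asarray(x, dtype=float)
    cross = [x[i] * x[j] for i in range(len(x)) for j in range(i + 1, len(x))]
    return np.concatenate(([1.0], x, x**2, cross))

# For k = 2 control variables the second-order model has p = 6 parameters.
print(f_second_order([2.0, 3.0]))  # [1. 2. 3. 4. 9. 6.]
```

The rows of the design matrix X are simply f'(x_u) evaluated at the design points, which is all that the moment-matrix calculations of the next section require.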
In the comparison of these RSM designs it is of interest not only to compare the designs against one another, but also to compare their performance to that of an "optimal" design. Myers (1999, p. 40) states that any comparison of designs must include a comparison with an "optimal" design. Basically, the optimal design serves as a baseline, or reference point, against which to compare the designs.

4.2.1 Design Theory

The concepts of design measure and moment matrix are important ideas in the theory of optimal designs and will be discussed in this section. Khuri and Cornell (1996, chap. 12, sec. 2) provide an introduction to optimal design theory. An experimental design over an experimental region, R, can be defined as a probability measure and is referred to as a design measure, ξ. The design measure ξ satisfies the two properties ξ(x) ≥ 0 for all x ∈ R and ∫_R dξ(x) = 1. A design measure is said to be discrete if it consists of n points in R where ξ(x_k) = m_k/N if x_k is a design point, and 0 otherwise (k = 1, 2, ..., n). Here, m_k denotes the number of observations collected at the kth distinct design point (k = 1, 2, ..., n) and N = Σ_{k=1}^n m_k. All other design measures are called continuous design measures. For the definition of the moment matrix, consider a linear model of the form y = f'(x)β + ε. The moment matrix of a design measure ξ is

M(ξ) = [m_ij(ξ)] = [∫_R f_i(x) f_j(x) dξ(x)],

where f_i(x) is the ith element of f(x) evaluated at a point x ∈ R (i = 1, 2, ..., p). For a discrete design ξ, it follows that M(ξ) = Σ_{k=1}^n (m_k/N) f(x_k) f'(x_k) = (1/N) X'X, where X results from expressing the linear model in matrix notation, that is, y = Xβ + ε. Using the moment matrix, the standardized prediction variance function is defined to be

d(x, ξ) = f'(x)[M(ξ)]^{-1} f(x).

A design measure is defined to be D-optimal if it maximizes |M(ξ)| over the class of all design measures.
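A minimal numerical sketch of these definitions (Python with NumPy; the function names are ours) computes the moment matrix and the standardized prediction variance for a discrete design measure:

```python
import numpy as np

def moment_matrix(points, weights, f):
    """M(xi) = sum_k (m_k/N) f(x_k) f'(x_k) for a discrete design measure,
    where the weights are the design weights m_k/N."""
    p = len(f(points[0]))
    M = np.zeros((p, p))
    for x, w in zip(points, weights):
        fx = f(x)
        M += w * np.outer(fx, fx)
    return M

def d(x, M, f):
    """Standardized prediction variance d(x, xi) = f'(x) [M(xi)]^{-1} f(x)."""
    fx = f(x)
    return fx @ np.linalg.solve(M, fx)

# Simple linear model f(x) = (1, x)' with equal weight at x = -1 and x = 1.
f = lambda x: np.array([1.0, x])
M = moment_matrix([-1.0, 1.0], [0.5, 0.5], f)
print(d(0.5, M, f))  # 1 + 0.5^2 = 1.25
```

Note that only the distinct design points and the design weights enter these computations; the total sample size N is not needed, a point exploited in section 4.3.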
A design measure is G-optimal if it minimizes, over the class of all design measures, the maximum standardized prediction variance over the experimental region R.

4.2.2 Equivalence Theorem and Extensions for GLMs

The Equivalence Theorem of Kiefer and Wolfowitz (1960) established a relationship between D-optimality and G-optimality. This result has been utilized in the evaluation of designs as well as in their construction (Wynn 1970). The Equivalence Theorem basically establishes three design properties that are equivalent and, hence, occur simultaneously. When applied to linear models, the Equivalence Theorem gives the result that a D-optimal design measure ξ* satisfies the following properties simultaneously:

1. ξ* maximizes |M(ξ)| over the class of all design measures;
2. ξ* minimizes max_x d(x, ξ) over the class of all design measures;
3. max_x d(x, ξ*) = p,

where, for linear models, d(x, ξ) = f'(x)[M(ξ)]^{-1} f(x) is the standardized prediction variance and p is the number of parameters in the linear model. This result gives the equivalence of D- and G-optimal designs for linear models. Note that for a discrete design measure, d(x, ξ) = (N/σ²) Var[ŷ(x)], where Var[ŷ(x)] is the prediction variance at x. White (1973) provided an extension of the Equivalence Theorem to nonlinear models by giving a generalization of the variance function d(x, ξ). Noting the dependence of designs upon the parameters of a given nonlinear model, White defines a design measure ξ* to be D(β)-optimal if it maximizes the determinant of M(ξ, β) for a given β over the class of all design measures. The generalization of d(x, ξ) is given by

d(x, ξ, β) = trace{I(x, β)[M(ξ, β)]^{-1}},    (4.1)

where I(x, β) is the information matrix at the point x. The (i, j)th element of I(x, β) is E[(∂L/∂β_i)(∂L/∂β_j)], where L is the log of the probability density function or probability mass function associated with a random variable observed at the point x (White 1973, p. 345).
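For linear models, the equality max_x d(x, ξ*) = p can be checked numerically on a small case. The sketch below (assuming the simple linear model f(x) = (1, x)' on R = [-1, 1], whose D-optimal design puts equal weight at ±1) verifies that the D-optimal design attains the bound p = 2, while a poorly chosen design exceeds it:

```python
import numpy as np

f = lambda x: np.array([1.0, x])

def max_d(points, weights, grid):
    """Maximum of d(x, xi) = f'(x) [M(xi)]^{-1} f(x) over a grid of x values."""
    M = sum(w * np.outer(f(x), f(x)) for x, w in zip(points, weights))
    Minv = np.linalg.inv(M)
    return max(f(x) @ Minv @ f(x) for x in grid)

grid = np.linspace(-1, 1, 201)
# D-optimal design for the simple linear model on [-1, 1]
d_opt = max_d([-1.0, 1.0], [0.5, 0.5], grid)
# A non-optimal design concentrated near the center of the region
d_bad = max_d([-0.5, 0.0, 0.5], [1/3, 1/3, 1/3], grid)
print(d_opt, d_bad)  # d_opt equals p = 2; d_bad is much larger
```

Here d_bad works out to 1 + 6x² at its maximum, i.e. 7 at x = ±1, illustrating how far from G-optimality a center-heavy design can be.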
The information matrix at x for GLMs can be written as (Atkinson and Haines 1996, p. 456)

I(x, β) = f(x) f'(x) / (Var[y(x)]{g'[μ(x)]}²) = (1/φ) w(x, β) f(x) f'(x).    (4.2)

The Equivalence Theorem can be extended by defining a design to be G(β)-optimal if it minimizes max_x d(x, ξ, β) for a given β over the class of all design measures. The Extended Equivalence Theorem given by White states that the following three properties of a design measure ξ* are equivalent:

1. ξ* is D(β)-optimal over the class of all design measures;
2. ξ* is G(β)-optimal over the class of all design measures;
3. max_x d(x, ξ*, β) = p.

Application of the Extended Equivalence Theorem to GLMs

For generalized linear models, the response y(x) has a distribution in the exponential family with mean E[y(x)] = μ(x). The linear predictor, η(x) = f'(x)β, is related to the mean through a strictly monotonic increasing link function, g(·). The relationship is η(x) = g[μ(x)] = f'(x)β, and using the inverse function h(·) = g^{-1}(·) gives the result that μ(x) = h(η(x)). The model for the response can be written as y = h[η(x)] + ε, where the random error ε has a distribution in the exponential family (see chapter 3, section 3.2, equation 3.2). The moment matrix for GLMs for a discrete design measure ξ is

M(ξ, β) = (1/(Nφ)) X'WX = Σ_{k=1}^n (m_k/(Nφ)) w(x_k, β) f(x_k) f'(x_k),    (4.3)

where φ is the dispersion parameter, the summation is over the n distinct design points with replications m_k, N = Σ_{k=1}^n m_k, and W is a diagonal matrix whose uth diagonal element is

w(x_u, β) = φ / (Var[y(x_u)]{g'[μ(x_u)]}²),  u = 1, 2, ..., N.

The information matrix at x is then I(x, β) = (1/φ) w(x, β) f(x) f'(x) (see equation (4.2)). Using this, the variance function of White (1973) in (4.1) can be written as follows:

d(x, ξ, β) = trace{I(x, β)[M(ξ, β)]^{-1}} = (1/φ) w(x, β) f'(x)[M(ξ, β)]^{-1} f(x).

A motivation for the above result can be given by recalling that in linear models the standardized prediction variance is d(x, ξ) = (N/σ²) Var[ŷ(x)] for a discrete measure ξ.
For GLMs there is a choice, however, concerning the use of either the prediction variance of the linear predictor, η̂(x), or that of the estimated mean response, μ̂(x). The following result establishes that the two approaches are equivalent and agree with the variance function of White (1973). Under the GLM setting, consider η̂(x) = f'(x)β̂ and μ̂(x) = h[η̂(x)]. The prediction variance of η̂(x), Var[η̂(x)], is approximately equal to φ f'(x)[X'WX]^{-1} f(x), and the prediction variance of μ̂(x), using a first-order Taylor's series, is Var[μ̂(x)] ≈ φ f'(x)[X'WX]^{-1} f(x){h'[η(x)]}². Applying the analogue of the scaling from linear models results in

(N/Var[y(x)]) Var[μ̂(x)] = f'(x)[M(ξ, β)]^{-1} f(x){h'[η(x)]}² / Var[y(x)]
  = f'(x)[M(ξ, β)]^{-1} f(x) / (Var[y(x)]{g'[μ(x)]}²)
  = (1/φ) w(x, β) f'(x)[M(ξ, β)]^{-1} f(x)
  = d(x, ξ, β).

Therefore, the two approaches of obtaining d(x, ξ, β) are equivalent.

Example: Logistic Regression Models

Binary response data are commonly modeled by using logistic regression models. The binary response, y(x), has the binomial distribution with index 1 and probability of success π(x). The mean of the response is μ(x) = π(x) and the variance is Var[y(x)] = π(x)[1 - π(x)]. Using the logistic link function gives

log[π(x)/(1 - π(x))] = f'(x)β = η(x).

The weight w(x, β) equals π(x)[1 - π(x)], the dispersion parameter is φ = 1, and the moment matrix for a discrete design measure ξ is M(ξ, β) = (1/N) X'WX = Σ_{k=1}^n (m_k/N) π(x_k)[1 - π(x_k)] f(x_k) f'(x_k), where m_k is the number of experimental units at the kth distinct experimental run (k = 1, 2, ..., n). The standardized prediction variance, d(x, ξ, β), follows as

d(x, ξ, β) = π(x)[1 - π(x)] f'(x)[M(ξ, β)]^{-1} f(x) = N π(x)[1 - π(x)] f'(x)[X'WX]^{-1} f(x).    (4.4)

This is the appropriate variance function for using the Extended Equivalence Theorem and establishing optimal design results. Note that since the variance of the response is no longer constant, scaling of the prediction variance does not remove the variance term completely. The presence of the binomial variance, π(x)[1 - π(x)], is evident in the variance function.
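The variance function (4.4) can be evaluated directly. The sketch below (Python; the function names are ours) computes it for the single-variable logistic model with β = (0, 1)' and a three-point design at -1, 0, 1, reproducing the maximum value 2.417 that appears in the sequential-generation example of section 4.2.3:

```python
import numpy as np

def d_logistic(x, points, weights, beta, f):
    """Variance function (4.4): pi(x)[1 - pi(x)] f'(x) [M(xi, beta)]^{-1} f(x),
    with M(xi, beta) = sum_k (m_k/N) pi_k (1 - pi_k) f(x_k) f'(x_k)."""
    pi = lambda z: 1.0 / (1.0 + np.exp(-(f(z) @ beta)))
    p = len(f(points[0]))
    M = np.zeros((p, p))
    for xk, wk in zip(points, weights):
        fx, pk = f(xk), pi(xk)
        M += wk * pk * (1 - pk) * np.outer(fx, fx)
    fx = f(x)
    return pi(x) * (1 - pi(x)) * (fx @ np.linalg.solve(M, fx))

f = lambda x: np.array([1.0, x])
val = d_logistic(1.0, [-1.0, 0.0, 1.0], [1/3, 1/3, 1/3], np.array([0.0, 1.0]), f)
print(round(val, 3))  # 2.417
```

Because the design enters only through the distinct points and the weights m_k/N, the same routine serves for any discrete design measure.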
The variance function incorporates the variance of the response, y(x), at the location x.

4.2.3 Sequential Generation of D-optimal Designs

Using the Equivalence Theorem of Kiefer and Wolfowitz (1960), Wynn (1970) proposed a sequential procedure for the generation of D-optimal designs for a linear model. Wynn's procedure generates a sequence of designs as follows:

1. Start with an initial discrete design, D_0, with n_0 points, x_1, x_2, ..., x_{n_0}, which support the fitting of the linear model.
2. Find a point, x_0, which maximizes the variance function d(x, D_0) = f'(x)[M(D_0)]^{-1} f(x). Augment D_0 by adding the point x_0 to get the design D_1.
3. Repeat step (2) to obtain a sequence of designs D_0, D_1, ..., D_n, ..., where D_n is obtained from D_{n-1} by adding a point of maximum standardized prediction variance based upon D_{n-1}.

Wynn established that this sequence of designs approaches the D-optimal design. In practice, the procedure is performed until "convergence" of the variance function to p is attained. A natural extension of the sequential procedure for the generation of designs for GLMs and nonlinear models would involve the use of the Extended Equivalence Theorem and the standardized prediction variance function. The procedure would require a specification of initial parameter values and, therefore, the resulting design would be locally optimal.

Sequential Generation of a Design: Single-Variable Logistic Model

For the single-variable logistic model, log[π(x)/(1 - π(x))] = β0 + β1 x, the D-optimal design is known to be a two-point symmetric design with equal replication at the x-values corresponding to effective dose (ED) probabilities of 0.824 and 0.176 (Abdelbasit and Plackett 1983; Minkin 1987). To illustrate the sequential generation of designs, the procedure was applied to the single-variable logistic model. Initial parameter values of 0 for the intercept, β0, and 1 for the slope, β1, were used, resulting in the model log[π(x)/(1 - π(x))] = x. Let β⁰ = (0, 1)'.
The D-optimal design is then to have equal replication at x_1 = log(0.176/0.824) = -1.5437 and x_2 = log(0.824/0.176) = 1.5437. The design D_0, consisting of the three points -1, 0, and 1 with equal weights (1/3 each), was used as the initial design, and the experimental region, R, was selected to be [-1, 1]. The standardized prediction variance function in (4.4) was utilized. The maximum of d(x, D_0, β⁰) over R is equal to 2.417 and is attained at the location x = 1. This value was added to the initial design to obtain the design D_1. Each subsequent application of the sequential procedure added either x = -1 or x = 1 to the previous design. The design with 25 points resulted in a maximum variance function of 2.031, while the design with 75 points resulted in a value of 2.010. As can be seen, the D-optimal design consists of points located on the boundary of the experimental region, as is the case with D-optimal designs for linear models. The above procedure was performed again using the experimental region [-2, 2] and the same initial 3-point design, D_0. The initial design had a maximum variance function value of 3.694 at the location x = 2. This point was added to the initial design, and the resulting design had a maximum variance function value of 3.460 at the point x = -1.94. The sequential generation of designs was continued until a 77-point design had a maximum variance function value of 2.010. The structure of this design can be seen in Table 4.1.

Table 4.1: Frequency Table for Sequential Design Points

x        Frequency   Percent
-1.94     1           1.3
-1.78     1           1.3
-1.68     1           1.3
-1.64     1           1.3
-1.60     1           1.3
-1.58     2           2.6
-1.56     7           9.1
-1.54    23          29.9
-1        1           1.3
 0        1           1.3
 1        1           1.3
 1.54    22          28.6
 1.56     7           9.1
 1.58     2           2.6
 1.60     1           1.3
 1.64     1           1.3
 1.68     1           1.3
 1.76     1           1.3
 1.94     1           1.3
 2        1           1.3

The design has converged towards the D-optimal design of equal replication at x_1 = -1.5437 and x_2 = 1.5437. The starting design points of -1, 0, and 1 are clearly seen to be very poor points.
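Wynn's procedure, as applied above to the single-variable logistic model, can be sketched in a few lines (Python; a grid search over the region stands in for the exact maximization of the variance function, so the generated points land on a 0.01 grid):

```python
import numpy as np

def wynn_logistic(points, region, beta, n_add):
    """Sequential design generation (Wynn 1970) for the single-variable
    logistic model: at each step, add the point of the experimental region
    that maximizes the variance function (4.4), with equal design weights."""
    f = lambda x: np.array([1.0, x])
    pi = lambda x: 1.0 / (1.0 + np.exp(-(f(x) @ beta)))
    design = list(points)
    grid = np.linspace(region[0], region[1], 401)   # 0.01 grid search
    for _ in range(n_add):
        M = np.zeros((2, 2))
        for xk in design:                           # equal weight 1/len(design)
            fx, pk = f(xk), pi(xk)
            M += pk * (1 - pk) * np.outer(fx, fx) / len(design)
        Minv = np.linalg.inv(M)
        dvals = [pi(x) * (1 - pi(x)) * (f(x) @ Minv @ f(x)) for x in grid]
        design.append(grid[int(np.argmax(dvals))])
    return design

# Region [-2, 2], initial design {-1, 0, 1}, beta0 = (0, 1)', as in the text.
design = wynn_logistic([-1.0, 0.0, 1.0], (-2.0, 2.0), np.array([0.0, 1.0]), 74)
# The added points cluster near the D-optimal support +/-1.5437.
print(sorted(set(float(x) for x in np.round(design, 2))))
```

Running this reproduces the qualitative behavior of Table 4.1: after a few early boundary points, the added points pile up near ±1.54.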
Recall that this design is a locally D-optimal design for the initial parameter values of β0 = 0 and β1 = 1. Atkinson and Donev (1992, p. 119) note that at this point there are several approaches for finding the D-optimal design more precisely. These approaches include exchange algorithms, in which the poor points from the initial design are dropped and the sequential procedure is started again. Other possibilities include analytical optimization, if possible, and numerical methods which use the design from the sequential procedure as a starting point.

4.3 Scaled Mean-Squared Error of Prediction

In chapter 3, the development of the QDGs involved the use of the mean-squared error of prediction (MSEP). The MSEP incorporates both the prediction variance and the prediction bias:

MSE[μ̂(x)] = Var[μ̂(x)] + {Bias[μ̂(x)]}².

The Equivalence Theorem establishing the relationship between D-optimality and G-optimality uses a standardized prediction variance function. For linear models, this function results in a quantity that is free of scale and is placed on a per-observation basis. Giovannitti-Jensen and Myers (1989), in their development of variance dispersion graphs, utilized the variance function d(x, ξ) for linear models. Recently, Khuri and Lee (1998) proposed quantile plots of the estimated scaled mean-squared error of prediction for the evaluation and comparison of designs for nonlinear models. The scaled mean-squared error of prediction is given by

(N/Var[y(x)]) MSE[μ̂(x)] = (N/Var[y(x)]) Var[μ̂(x)] + (N/Var[y(x)]) {Bias[μ̂(x)]}².    (4.5)

The first portion can be recognized as the standardized prediction variance function, d(x, ξ, β). This portion depends on the location of prediction, x, the parameters, β, and the experimental design. This quantity can be evaluated by using the design weights, m_k/N, which enter into the moment matrix, M(ξ, β).
From equation (4.3), we note that the moment matrix depends only upon the design weights and the distinct design points of the experimental design. Working with the design weights gives flexibility in the consideration of designs and removes the dependency upon N, since the computation of d(x, ξ, β) requires only knowledge of the distinct design points and the design weights. Rewriting the second portion in (4.5) as

(N/Var[y(x)]) {Bias[μ̂(x)]}² = (1/(N Var[y(x)])) {N Bias[μ̂(x)]}²

also allows the computation of N Bias[μ̂(x)] in terms of the design only through the distinct design points and the design weights. In order to see this, consider that

N Bias[μ̂(x)] = N Bias[η̂(x)] (dμ/dη) + (1/2) N Var[η̂(x)] (d²μ/dη²)    (4.6)

(see chapter 3, section 3.2, equation 3.8). We consider the two portions of equation (4.6) separately. For the first portion, consider N Bias[η̂(x)] = N f'(x) Bias(β̂). It needs to be shown that N Bias(β̂) can be calculated using the distinct design points and the design weights. From equation (3.5) of chapter 3, section 3.2, it follows that

N Bias(β̂) = N(-1/2)(X'WX)^{-1} X'Z_d F 1_N = (-1/2)[(1/N) X'WX]^{-1} X'Z_d F 1_N = (-1/2)[M(ξ, β)]^{-1} X'Z_d F 1_N.

The moment matrix can again be computed using the distinct design points and the design weights. It needs to be established that X'Z_d F 1_N can be computed using the distinct design points and the design weights as well. Z_d is a diagonal matrix which can be written as a block-diagonal matrix with the matrices z_kk I_{m_k} along the diagonal, where z_kk is the element (see chapter 3, section 3.2, equation 3.5) associated with the kth distinct design point with replication m_k. F is also a block-diagonal matrix with the matrices f_kk I_{m_k} along the diagonal, where f_kk is the element (see chapter 3, section 3.2, equation 3.5) associated with the kth distinct design point with replication m_k. X'Z_d F 1_N can now be expressed as Σ_{k=1}^n m_k z_kk f_kk f(x_k), where the summation is over the n distinct design points. In order to use the design weights, m_k/N, we need N z_kk in the expression.
Consider using Σ_{k=1}^n (m_k/N)(N z_kk) f_kk f(x_k). Note that N z_kk is equal to f'(x_k)[(1/N) X'WX]^{-1} f(x_k) = f'(x_k)[M(ξ, β)]^{-1} f(x_k) (k = 1, 2, ..., n). It finally follows that X'Z_d F 1_N can be calculated using the distinct design points and the design weights. Considering now the second portion of equation (4.6), note that

N Var[η̂(x)] = N f'(x)[X'WX]^{-1} f(x) = f'(x)[M(ξ, β)]^{-1} f(x).

Therefore, the second portion of equation (4.6) depends upon the moment matrix, M(ξ, β), and can be computed without knowledge of N. Note, however, that the prediction bias portion of the scaled mean-squared error of prediction (SMSEP) does depend upon the total sample size, N. The researcher must therefore have some idea about the number of experimental runs available for the experiment. If the researcher is interested in using only the standardized prediction variance, the comparisons can be performed without the specification of N.

4.4 SMSEP for the Logistic Regression Model

For binary response data modeled using the logistic link function, the mean-squared error of prediction is given by

MSE[π̂(x)] = Var[π̂(x)] + {Bias[π̂(x)]}².

The appropriate scaling for this situation is N/(π(x)[1 - π(x)]), which is the total sample size divided by the binomial variance for a single observation at the location x. The scaled mean-squared error of prediction takes the form

(N/(π(x)[1 - π(x)])) MSE[π̂(x)] = N Var[π̂(x)]/(π(x)[1 - π(x)]) + {N Bias[π̂(x)]}²/(N π(x)[1 - π(x)]).    (4.7)

The SMSEP depends upon the unknown β, the design, and the control variables, i.e., the elements of x. Let T_D(x, β) = SMSEP[π̂(x)], where D is a given design. To evaluate T_D(x, β) for a given design, an estimate of T_D(x, β) can be obtained by replacing β with β̂. Otherwise, some parameter space for β, denoted by C, must be considered. The QDGs can be constructed for the standardized prediction variance function and for the SMSEP, as illustrated in the following example.

4.5 A Numerical Example

Consider again (see chapter 3, section 3.5) the cytotoxicity study described in Carter et al. (1986, sec.
2.5). The investigation involved two control variables, and interest was in modeling π(x), the probability of death of the cells. The experimental design, a 4² factorial design, and the response values y_u are given in Table 4.2, and the parameter estimates for a complete second-order model for the linear predictor are given in Table 4.3. The generalized linear models procedure (PROC GENMOD) in SAS (1997) was used to fit the model. The results of the experiment were utilized as preliminary information for the evaluation and comparison of designs by using the estimates as initial guesses for the parameter values. The subset, C, of the parameter space was obtained by selecting 3 points for each parameter, namely, the estimate and the estimate plus/minus two standard errors. It would be of interest to evaluate and compare the prediction capabilities of two second-order designs, namely a 3² factorial design and a central composite design. These designs can be compared against a D-optimal design, which will need to be obtained through sequential generation.

Table 4.2: Experimental design and response values

x1 = MMS/10   x2 = PMA   Dead Cells r_u   Total Cells m_u   y_u = r_u/m_u
 0              0          19                98               0.19
 0              1          24                87               0.28
 0             10          54                91               0.59
 0            100          68                87               0.78
 1              0          16                91               0.18
10              0          17                90               0.19
25              0          19                92               0.21
 1              1          19                94               0.20
 1             10          41                90               0.46
 1            100          79                93               0.85
10              1          10                83               0.12
10             10          36                85               0.42
10            100          62                83               0.75
25              1          36                92               0.39
25             10          56                93               0.60
25            100          74                87               0.85
Total                                      1436

Table 4.3: Parameter Estimates

Parameter   Estimate   Std Error
β0          -1.3430     0.1232
β1           0.0818     0.0270
β2           0.1598     0.0165
β11          0.0039     0.0010
β22          0.0013     0.0002
β12          0.0001     0.0002

4.5.1 Sequential Generation of the D-optimal Design

The experimental region is rectangular, with x1 ∈ [0, 25] and x2 ∈ [0, 100]. The sequential generation of a D-optimal design was performed using the parameter estimates from the "initial" experiment as the initial parameter values. The initial design, D_0, consisted of the 16-point 4² factorial design used by Carter et al.
(1986). The initial design can be seen in plot A of Figure 4.1. The sequential generation resulted in a design with N = 375 experimental runs and a maximum d(x, ξ, β) value equal to 6.043. This design, however, consists of only 36 distinct points (see plot B of Figure 4.1). Notice that the points tend to fall into clusters or groups, which suggests that a reduction of the points can be obtained. A cluster analysis was performed using the S-Plus language (Venables and Ripley 1994, sec. 12.2), which reduced the number of distinct design points to 9. The resulting design had N = 375 experimental runs with a maximum d(x, ξ, β) value of 6.085 (see plot C of Figure 4.1). In an attempt to find the D-optimal design more precisely, this design was used as an initial design and the sequential procedure was started again. This resulted in an additional 33 runs, and after clustering the resulting design had 9 distinct design points with 408 experimental runs and a maximum d(x, ξ, β) value of 6.028. This approximate D-optimal design (see plot D of Figure 4.1) appears to be an adjusted 3² factorial design.

[Figure 4.1: Sequential Generation of the D-optimal Design. A: initial design, N = 16 with 16 distinct points, max d = 17.357; B: after sequential generation, N = 375 with 36 distinct points, max d = 6.043; C: after clustering, N = 375 with 9 distinct points, max d = 6.085; D: approximate D-optimal design, N = 408 with 9 distinct points, max d = 6.028.]

4.5.2 Comparison of Designs

For the comparison of designs, the approximate D-optimal design (see plot A of Figure 4.2) along with a 3² design and two CCD designs were considered. The 3² design consisted of the 9 points associated with x1 ∈ {0, 12.5, 25} and x2 ∈ {0, 50, 100}, with equal replications and 405 experimental runs (see plot B of Figure 4.2). The CCD designs used a coded axial value of α = 1 in order to remain within the experimental region.
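The cluster-analysis step can be imitated with a simple greedy merge of nearby candidate points (a stand-in for the S-Plus cluster analysis actually used in the text; the tolerance and the data below are illustrative only):

```python
import numpy as np

def reduce_points(points, weights, tol):
    """Greedy single-pass clustering: merge design points closer than `tol`
    (Euclidean distance) into weight-averaged centroids, pooling their
    design weights.  A simple stand-in for a full cluster analysis."""
    centers, masses = [], []
    for x, w in zip(points, weights):
        x = np.asarray(x, dtype=float)
        for i, c in enumerate(centers):
            if np.linalg.norm(x - c) < tol:
                m = masses[i] + w                 # merge into the first close center
                centers[i] = (masses[i] * c + w * x) / m
                masses[i] = m
                break
        else:
            centers.append(x)
            masses.append(w)
    return centers, masses

# Illustrative candidate points on the (x1, x2) region [0, 25] x [0, 100]
pts = [(0, 0), (0.2, 0.1), (25, 100), (24.8, 99.5), (12.5, 50)]
w = [0.2] * 5
centers, masses = reduce_points(pts, w, tol=2.0)
print(len(centers))  # 3 merged clusters
```

In practice the tolerance should be chosen on coded (scaled) variables so that one control variable does not dominate the distance.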
The levels for the CCD designs (corresponding to the coded levels of -1, -1/2, 0, 1/2, 1) were x1 ∈ {0, 6.25, 12.5, 18.75, 25} and x2 ∈ {0, 25, 50, 75, 100}. Two CCD designs were used, each with 405 experimental runs. The first CCD design (CCD1) consisted of equal replication of the 9 design points. The second CCD design (CCD2) replicated the center point twice as much as the factorial and axial points (see plot C of Figure 4.2). The experimental designs are given in Table 4.4. The QDGs can be graphed for both the standardized prediction variance and the SMSEP. First, the performances of the two CCD designs are compared (see the plots of Figure 4.3). It can be seen that on the boundary of the experimental region (λ = 1) the performances of the two designs are very similar. Towards the center (λ = 0.6) the performance of design CCD2 is slightly better, but overall the performances of the two designs are comparable. Since the CCD2 design had more replication at the center, it is expected to have better prediction capabilities near the center. For the overall comparison of designs, we consider the approximate D-optimal design, the 3² design, and the CCD1 design. The QDGs for the standardized prediction variance and the SMSEP for λ = 1, 0.8, and 0.6 are given in Figures 4.4, 4.5, and 4.6. On the boundary of the region, the CCD design has the largest dispersion. The CCD design exhibits great variation in the quantiles, and the quantile values larger than the median appear sensitive to the parameter values. The magnitudes of the scaled quantities are also noted to be large on the boundary. The scaling involves multiplying by N/(π(x)[1 - π(x)]), which can get large for values of π(x) near zero or one. Moving towards the center of the region, λ = 0.8 (see Figure 4.5), the CCD design still has variation in the quantile values. The D-optimal design and the 3² design, however, have max quantiles for the standardized prediction variance larger than those of the CCD design and have greater sensitivity to the parameter values.
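The quantiles underlying such QDGs can be computed as follows: for each parameter vector β in the subset C, evaluate the variance function (4.4) over a set of prediction locations and record its quantiles (an illustrative Python sketch; the single-variable setup and the parameter subset below are hypothetical, not those of the cytotoxicity example):

```python
import numpy as np
from itertools import product

def qdg_quantiles(design_pts, design_wts, beta_set, eval_pts, f, probs):
    """For each beta in the parameter subset C, the quantiles over the
    evaluation points of the variance function d(x, xi, beta) in (4.4)."""
    def d(x, beta):
        pi = lambda z: 1.0 / (1.0 + np.exp(-(f(z) @ beta)))
        M = sum(w * pi(xk) * (1 - pi(xk)) * np.outer(f(xk), f(xk))
                for xk, w in zip(design_pts, design_wts))
        Minv = np.linalg.inv(M)
        return pi(x) * (1 - pi(x)) * (f(x) @ Minv @ f(x))
    return {tuple(b): np.quantile([d(x, np.asarray(b)) for x in eval_pts], probs)
            for b in beta_set}

# Hypothetical single-variable illustration: three-point design on [-1, 1],
# C built from two values of each parameter.
f = lambda x: np.array([1.0, float(x)])
C = list(product([-0.2, 0.2], [0.8, 1.2]))
q = qdg_quantiles([-1.0, 0.0, 1.0], [1/3] * 3, C,
                  np.linspace(-1.0, 1.0, 101), f, [0.0, 0.5, 1.0])
print({b: np.round(v, 3) for b, v in q.items()})
```

A QDG is then obtained by plotting, against the probability level p, the minimum and maximum of these quantiles over C; a narrow band between the two curves indicates robustness to the parameter values.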
The CCD design is affected by the bias portion of the SMSEP and again exhibits large dispersion in the QDG of the SMSEP. Near the center, λ = 0.6 (see Figure 4.6), the D-optimal design shows great variation and sensitivity to the parameter values for the quantiles of both the standardized prediction variance and the SMSEP. The 3² design and the CCD design have much better-looking QDGs. The CCD design is slightly better than the 3² design for the standardized prediction variance, whereas the 3² design is slightly better for the SMSEP. Overall, the D-optimal design seems to perform well on the boundary and poorly near the center.

[Figure 4.2: Experimental Designs. A: D-optimal design; B: 3² design; C: CCD designs.]

Table 4.4: Experimental Designs

D-optimal Design (N = 408)
x1 = MMS/10   x2 = PMA   Design Weight
 0.0            1         55/408
 0.0           18         38/408
 0.0          100         61/408
12.5            4         22/408
12.5           20         44/408
12.5          100         41/408
25.0            0         54/408
25.0           15         32/408
25.0          100         61/408

3² Design (N = 405)
x1 = MMS/10   x2 = PMA   Design Weight
 0.0            0         1/9
 0.0           50         1/9
 0.0          100         1/9
12.5            0         1/9
12.5           50         1/9
12.5          100         1/9
25.0            0         1/9
25.0           50         1/9
25.0          100         1/9

CCD Designs (N = 405)
x1 = MMS/10   x2 = PMA   CCD1 Weight   CCD2 Weight
 0.00           0         1/9           1/10
 0.00         100         1/9           1/10
25.00           0         1/9           1/10
25.00         100         1/9           1/10
 6.25          50         1/9           1/10
18.75          50         1/9           1/10
12.50          25         1/9           1/10
12.50          75         1/9           1/10
12.50          50         1/9           2/10

[Figure 4.3: QDGs for the CCD Designs (λ = 1 and λ = 0.6)]

[Figure 4.4: QDGs of the Standardized Prediction Variance and SMSEP, λ = 1]
The CCD design, however, does well near the center of the region and poorly on the boundary, and it is affected by the prediction bias. The 3² design appears to be a nice compromise in this situation; in fact, it is a CCD design with coded axial runs at α = 1.

4.6 Conclusions

This chapter investigated the use of QDGs as a graphical tool for the analysis of the prediction capabilities of classical RSM designs in nonstandard situations. In order to compare the designs to optimal designs, a discussion of the sequential generation of D-optimal designs was presented. The sequential generation of designs involved an application of the Extended Equivalence Theorem. The QDGs were extended to the standardized prediction variance and to the scaled mean-squared error of prediction. For evaluating and comparing classical RSM designs, the QDGs allow for an assessment of the parameter dependence caused by the nonstandard conditions.

[Figure 4.5: QDGs of the Standardized Prediction Variance and SMSEP, λ = 0.8]

[Figure 4.6: QDGs of the Standardized Prediction Variance and SMSEP, λ = 0.6]

CHAPTER 5
MODELING FUNCTIONS OF MULTIPLE RESPONSES

5.1 Introduction

As previously mentioned in chapter 2, response surface methodology was developed around the traditional assumptions of a linear model and a normally distributed response variable. However, in a multiresponse experiment, a number of responses are observed, and the variable of interest to an experimenter may be calculated from the responses. For example, Hunter and Jaworski (1986) note that in some chemical experiments the yield is obtained by multiplying the observed values of conversion and selectivity, while in other situations the profit is often computed from several observed responses.
Also, as mentioned in Chapter 2, in the semiconductor industry the experimenter is often interested in modeling the ratio of responses. These experimental situations present the challenging problem of modeling a function of multiple responses. In Section 5.2 we review "naive" approaches for modeling a function of multiple responses. In Section 5.3 we present a modeling procedure that addresses the multivariate nature of the data. Section 5.4 consists of numerical examples. This is followed by a summary of recommendations and conclusions in Section 5.5.

5.2 "Naive" Modeling Procedures

Two standard methods have been utilized in the model building process for a function of multiple responses. In this section, we explain and critique these two methods.

The first method reduces the problem to a single-variable situation in which standard modeling techniques can be applied. The procedure consists of modeling the calculated response as a polynomial function of the control variables: for each experimental run, the function of interest is applied to the observed responses, reducing the data for the run to a single response value. This method does not recognize the multivariate nature of the data and, as a result, does not take into account the correlation structure among the responses. The distribution of the computed response may also violate the standard assumptions for linear modeling. If, for example, the computed function involves a sum of responses, the distributional assumptions may remain tenable. On the other hand, if the computed function is more complex, such as a product or ratio of responses, there could be serious violations of the distributional assumptions. Ignoring the multivariate nature of the data and the possible violations of distributional assumptions can result in a less efficient approach to modeling the function of interest.
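The row-wise reduction just described can be sketched in a few lines. The design and data below are hypothetical, chosen only to illustrate the mechanics of the first naive method; this is a minimal sketch, not software from the dissertation.

```python
import numpy as np

# Hypothetical 2^2 factorial with one center run (coded levels).
X = np.array([[-1., -1.], [1., -1.], [-1., 1.], [1., 1.], [0., 0.]])
y1 = np.array([0.80, 0.20, 1.00, 0.40, 0.60])   # response 1 (e.g. conversion)
y2 = np.array([0.50, 0.11, 0.90, 0.50, 0.50])   # response 2 (e.g. selectivity)

# Row-wise reduction: collapse each run to the computed response G = y1*y2,
# then fit an ordinary first-order polynomial to that single column.
g = y1 * y2
A = np.column_stack([np.ones(len(g)), X])       # model matrix [1, x1, x2]
b_naive, *_ = np.linalg.lstsq(A, g, rcond=None)
```

The correlation between y1 and y2 never enters this fit, which is exactly the criticism raised above.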
The second method was proposed by Hunter and Jaworski (1986) as an alternative approach to modeling the computed response variable of interest. The procedure consists of fitting individual models to the responses and applying the function of interest to the fitted models to construct a model for the response of interest. Hunter and Jaworski (1986, p. 321) suggest that their method results in flexible models and greater insight into the process under investigation. This procedure, while utilizing the individual responses, fails to address the multivariate nature of the data in the fitting of the models. Furthermore, it does not take into account any information from the structure of the function of interest.

The two procedures can be visualized in Table 5.1. The first procedure works with the rows: the response of interest is computed for each run, and the values in the last column are then modeled. The method proposed by Hunter and Jaworski, however, works with the columns of the individual responses and then combines the fitted models in the last row to obtain a fitted model for the response of interest.

Table 5.1: Modeling a Function of Multiple Responses

       Individual Observations on r Responses       Response of Interest
Run    y1    y2    ...   yi    ...   yr             G(y1, y2, ..., yr)
 1     y11   y12   ...   y1i   ...   y1r            G(y11, y12, ..., y1r)
 2     y21   y22   ...   y2i   ...   y2r            G(y21, y22, ..., y2r)
 ...
 u     yu1   yu2   ...   yui   ...   yur            G(yu1, yu2, ..., yur)
 ...
 N     yN1   yN2   ...   yNi   ...   yNr            G(yN1, yN2, ..., yNr)
       model model ...   model ...   model
       ŷ1    ŷ2    ...   ŷi    ...   ŷr             G(ŷ1, ŷ2, ..., ŷr)
Source: Hunter and Jaworski (1986, p. 325)

These two methods lack foundations that would motivate the procedures; they are ad hoc, naive approaches to the problem. Both ignore the multivariate nature of the data in these experimental situations and therefore do not take into account any relationships that may exist among the individual response variables.
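The column-wise alternative can be sketched the same way, again with hypothetical data: fit each response separately by ordinary least squares, then push the fitted models through G. This is a sketch of the idea, not Hunter and Jaworski's own implementation.

```python
import numpy as np

# Same hypothetical 2^2 factorial with one center run as before.
X = np.array([[-1., -1.], [1., -1.], [-1., 1.], [1., 1.], [0., 0.]])
y1 = np.array([0.80, 0.20, 1.00, 0.40, 0.60])
y2 = np.array([0.50, 0.11, 0.90, 0.50, 0.50])

A = np.column_stack([np.ones(len(y1)), X])      # first-order model matrix

# Column-wise: one ordinary least-squares fit per response...
b1, *_ = np.linalg.lstsq(A, y1, rcond=None)
b2, *_ = np.linalg.lstsq(A, y2, rcond=None)

# ...then combine the fitted models through G(y1, y2) = y1*y2.
def g_hat(x):
    f = np.concatenate(([1.0], x))
    return (f @ b1) * (f @ b2)

pred_center = g_hat(np.array([0.0, 0.0]))
```

Note that each fit here ignores the other response entirely, so the correlation structure among the responses plays no role.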
5.3 Multiresponse Analysis and Proposed Modeling Procedure

For an experimental situation involving a response of interest that is computed from individual responses, it is important to consider the multivariate nature of the data. In this section, we review the analysis of the linear multiresponse model and propose a new procedure for modeling a function of responses that addresses the multivariate nature of the situation.

Suppose that there are r responses, namely, y_1, y_2, ..., y_r, each of which is believed to depend upon a set of k control variables denoted by x_1, x_2, ..., x_k, and that the experimenter is interested in modeling a function of the responses, namely, G(y_1, y_2, ..., y_r). Khuri and Cornell (1996, chapter 7) present a discussion of methods for the analysis of multiresponse experiments. We summarize here the estimation of a linear multiresponse model.

Let us assume that the individual response variables can be modeled using polynomial regression models in the x_i's. The model for the ith response is then of the form

y_i = f_i'(x)β_i + ε_i,   i = 1, 2, ..., r,   (5.1)

where f_i(x) is a vector of order p_i × 1 whose elements consist of powers and products of powers of the elements of x = (x_1, x_2, ..., x_k)', and β_i is a vector of p_i unknown constant coefficients. From an experiment consisting of N runs, the following model for the ith response can be obtained from (5.1):

y_i = X_i β_i + ε_i,   i = 1, 2, ..., r,   (5.2)

where y_i and ε_i are N × 1 vectors of observations and random errors for the ith response, and X_i is an N × p_i matrix whose uth row is f_i'(x_u), u = 1, 2, ..., N. As a final step in forming the linear multiresponse model, the models in (5.2) can be combined to construct a multiresponse model of the form

y = Xβ + ε,   (5.3)

where y = [y_1' : y_2' : ... : y_r']', β = [β_1' : β_2' : ... : β_r']', ε = [ε_1' : ε_2' : ... : ε_r']', and X is the block-diagonal matrix diag(X_1, X_2, ..., X_r). The assumptions about the random errors are E(ε_i) = 0, Var(ε_i) = σ_ii I_N, and Cov(ε_i, ε_j) = σ_ij I_N, i ≠ j.
Thus, random errors from the same experimental run are assumed to be correlated, but random errors from different experimental runs are uncorrelated. Let Σ denote the r × r matrix whose (i, j)th element is σ_ij (i, j = 1, 2, ..., r); this is the variance–covariance matrix for the r responses. It follows that ε in (5.3) has a variance–covariance matrix of the form

Var(ε) = Δ = Σ ⊗ I_N.   (5.4)

It follows that the best linear unbiased estimator of β is given by

β̂ = (X'Δ⁻¹X)⁻¹X'Δ⁻¹y,   (5.5)

with a variance–covariance matrix equal to (X'Δ⁻¹X)⁻¹. Since Σ is typically unknown, an estimate of it needs to be used in (5.5). A consistent estimator of Σ was proposed by Zellner (1962), which uses the residual vectors from an ordinary least-squares fit to the single-response models in (5.2). This estimator is given by Σ̂ = (σ̂_ij), where

σ̂_ij = y_i'[I_N − X_i(X_i'X_i)⁻¹X_i'][I_N − X_j(X_j'X_j)⁻¹X_j']y_j / N,   i, j = 1, 2, ..., r.   (5.6)

Replacing Σ with Σ̂ in (5.4) results in Δ* = Σ̂ ⊗ I_N. Using Δ* in (5.5), we obtain the estimator

β̂* = (X'Δ*⁻¹X)⁻¹X'Δ*⁻¹y.   (5.7)

This is called the seemingly unrelated regression estimator (SURE) of β; it is also called the estimated generalized least-squares estimator of β. Khuri and Cornell (1996, p. 254) note that if the single-response models in (5.2) have the same X_i matrices, that is, X_i = X_0 for all i, then the best linear unbiased estimator of β does not depend on Σ. In this case, the best linear unbiased estimator of β_i in (5.2) is the same as the ordinary least-squares estimate obtained from individually fitting the ith model (i = 1, 2, ..., r).

Let us now consider a multivariate situation in which the experimenter has interest in modeling a function of two responses, for example, G(y_1, y_2). Using linear models for the means of the two responses results in

μ_1(x) = E[y_1(x)] = f_1'(x)β_1,
μ_2(x) = E[y_2(x)] = f_2'(x)β_2.

Observations within an experimental run are correlated, with a variance–covariance matrix denoted by Σ. The linear multiresponse model estimation described above can be utilized for the estimation of β_1, β_2, and Σ.
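The estimation chain (5.5)–(5.7) can be sketched numerically. The design, sample size, and coefficient values below are made up for illustration; the steps — OLS residuals, Zellner's Σ̂ as in (5.6), then generalized least squares with Δ* = Σ̂ ⊗ I_N — follow the text.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 40
x = rng.uniform(-1, 1, size=(N, 2))

# Two responses sharing the same first-order model matrix X0 (so X1 = X2).
X0 = np.column_stack([np.ones(N), x])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])              # true per-run error covariance
E = rng.multivariate_normal([0.0, 0.0], Sigma, size=N)
y1 = X0 @ np.array([1.0, 2.0, -1.0]) + E[:, 0]
y2 = X0 @ np.array([0.5, -1.0, 3.0]) + E[:, 1]

def residuals(Xi, yi):
    """Residuals from an ordinary least-squares fit of one response."""
    return yi - Xi @ np.linalg.lstsq(Xi, yi, rcond=None)[0]

# Zellner's consistent estimator of Sigma, as in (5.6).
R = np.column_stack([residuals(X0, y1), residuals(X0, y2)])
Sigma_hat = (R.T @ R) / N

# SUR estimator (5.7): GLS on the stacked model with A* = Sigma_hat (x) I_N.
Xblk = np.block([[X0, np.zeros_like(X0)],
                 [np.zeros_like(X0), X0]])
A_inv = np.kron(np.linalg.inv(Sigma_hat), np.eye(N))
y = np.concatenate([y1, y2])
beta_sur = np.linalg.solve(Xblk.T @ A_inv @ Xblk, Xblk.T @ A_inv @ y)
```

Because both responses use the same X0 here, the SUR estimate coincides with the individual OLS fits, illustrating the Khuri and Cornell (1996, p. 254) remark quoted above.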
It follows that the fitted models are

μ̂_1(x) = f_1'(x)β̂_1,
μ̂_2(x) = f_2'(x)β̂_2.

We now propose a procedure for modeling a function of multiple responses that recognizes the correlation among the responses and takes into account the structure of the function of interest. The proposed procedure approximates the function of interest by a Taylor's series expansion about the means of the individual responses. We assume that G(y_1, y_2) is continuous and has continuous partial derivatives of order ≤ 3 in a neighborhood of (μ_1, μ_2). For brevity, let G_1 = ∂G(μ_1, μ_2)/∂y_1, G_2 = ∂G(μ_1, μ_2)/∂y_2, G_11 = ∂²G(μ_1, μ_2)/∂y_1², G_12 = ∂²G(μ_1, μ_2)/∂y_1∂y_2, and G_22 = ∂²G(μ_1, μ_2)/∂y_2². Using a first-order approximation of G(y_1, y_2) in a neighborhood of (μ_1, μ_2) results in

z^(1) = G(μ_1, μ_2) + (y_1 − μ_1)G_1 + (y_2 − μ_2)G_2.   (5.8)

This approximation can be used to develop approximate expressions for the expected value and variance of G(y_1, y_2). The expected value expression resulting from the first-order approximation is

E[G(y_1, y_2)] ≈ G(μ_1, μ_2).   (5.9)

The variance expression is approximately given by

Var[G(y_1, y_2)] ≈ σ_11 G_1² + σ_22 G_2² + 2σ_12 G_1 G_2.   (5.10)

Let us denote the approximate standard deviation of G(y_1, y_2), given as the square root of the right-hand side of (5.10), by τ(x, β_1, β_2, Σ). The expressions in (5.9) and (5.10) depend upon the unknown means and variance–covariance matrix of the responses.

The first-order prediction of G(y_1, y_2) uses the results of the analysis of the linear multiresponse model. That is, the predicted value of G(y_1, y_2) at a point x is equal to Ĝ^(1)(x) = G(μ̂_1(x), μ̂_2(x)). In order to assess the quality of prediction evaluated over all the design points, we consider using scaled and unscaled sums of absolute deviations. The scaled sum of absolute deviations is defined as

S^(1) = Σ_{u=1}^{N} |G(y_{u1}, y_{u2}) − Ĝ^(1)(x_u)| / τ(x_u, β̂_1, β̂_2, Σ̂),

where y_{ui} denotes the uth value of the ith response (i = 1, 2, ..., r; u = 1, 2, ..., N). The unscaled sum of absolute deviations is

T^(1) = Σ_{u=1}^{N} |G(y_{u1}, y_{u2}) − Ĝ^(1)(x_u)|.

Using a second-order approximation of G(y_1, y_2) in a neighborhood of (μ_1, μ_2) results in

z^(2) = G(μ_1, μ_2) + (y_1 − μ_1)G_1 + (y_2 − μ_2)G_2 + (1/2)[(y_1 − μ_1)²G_11 + 2(y_1 − μ_1)(y_2 − μ_2)G_12 + (y_2 − μ_2)²G_22].   (5.11)

This second-order approximation leads to the approximate expressions for the expected value and variance of G(y_1, y_2) given in (5.12) and (5.13):

E[G(y_1, y_2)] ≈ G(μ_1, μ_2) + (1/2)[σ_11 G_11 + 2σ_12 G_12 + σ_22 G_22],   (5.12)

Var[G(y_1, y_2)] ≈ σ_11 G_1² + σ_22 G_2² + 2σ_12 G_1 G_2
  + Var[(y_1 − μ_1)²](G_11/2)² + Var[(y_1 − μ_1)(y_2 − μ_2)]G_12² + Var[(y_2 − μ_2)²](G_22/2)²
  + 2Cov[(y_1 − μ_1), (y_1 − μ_1)²]G_1(G_11/2)
  + 2Cov[(y_1 − μ_1), (y_1 − μ_1)(y_2 − μ_2)]G_1 G_12
  + 2Cov[(y_1 − μ_1), (y_2 − μ_2)²]G_1(G_22/2)
  + 2Cov[(y_2 − μ_2), (y_1 − μ_1)²]G_2(G_11/2)
  + 2Cov[(y_2 − μ_2), (y_1 − μ_1)(y_2 − μ_2)]G_2 G_12
  + 2Cov[(y_2 − μ_2), (y_2 − μ_2)²]G_2(G_22/2)
  + 2Cov[(y_1 − μ_1)², (y_1 − μ_1)(y_2 − μ_2)](G_11/2)G_12
  + 2Cov[(y_1 − μ_1)², (y_2 − μ_2)²](G_11/2)(G_22/2)
  + 2Cov[(y_1 − μ_1)(y_2 − μ_2), (y_2 − μ_2)²]G_12(G_22/2).   (5.13)

Using properties of the moments of the multivariate normal distribution (Anderson 1984, p. 49), the variance and covariance terms in the second-order variance approximation can be simplified.

RESULT 1: Var[(y_i − μ_i)²] = E[(y_i − μ_i)⁴] − (E[(y_i − μ_i)²])² = 3σ_ii² − σ_ii² = 2σ_ii², i = 1, 2.

RESULT 2: Var[(y_1 − μ_1)(y_2 − μ_2)] = E[(y_1 − μ_1)²(y_2 − μ_2)²] − (E[(y_1 − μ_1)(y_2 − μ_2)])² = σ_11σ_22 + 2σ_12² − σ_12² = σ_11σ_22 + σ_12².

RESULT 3: Cov[(y_i − μ_i), (y_i − μ_i)²] = E[(y_i − μ_i)³] − E[(y_i − μ_i)]E[(y_i − μ_i)²] = E[(y_i − μ_i)³] = 0, i = 1, 2.

RESULT 4: Cov[(y_i − μ_i), (y_i − μ_i)(y_j − μ_j)] = E[(y_i − μ_i)²(y_j − μ_j)] − E[(y_i − μ_i)]E[(y_i − μ_i)(y_j − μ_j)] = E[(y_i − μ_i)²(y_j − μ_j)] = 0.

RESULT 5: Cov[(y_i − μ_i), (y_j − μ_j)²] = E[(y_i − μ_i)(y_j − μ_j)²] − E[(y_i − μ_i)]E[(y_j − μ_j)²] = E[(y_i − μ_i)(y_j − μ_j)²] = 0.

RESULT 6: Cov[(y_i − μ_i)², (y_i − μ_i)(y_j − μ_j)] = E[(y_i − μ_i)³(y_j − μ_j)] − E[(y_i − μ_i)²]E[(y_i − μ_i)(y_j − μ_j)] = 3σ_iiσ_ij − σ_iiσ_ij = 2σ_iiσ_ij.

RESULT 7: Cov[(y_i − μ_i)², (y_j − μ_j)²] = E[(y_i − μ_i)²(y_j − μ_j)²] − E[(y_i − μ_i)²]E[(y_j − μ_j)²] = σ_iiσ_jj + 2σ_ij² − σ_iiσ_jj = 2σ_ij².

Using Results 1–7, the second-order variance approximation in (5.13) can be rewritten as follows:

Var[G(y_1, y_2)] ≈ σ_11 G_1² + σ_22 G_2² + 2σ_12 G_1 G_2 + 2σ_11²(G_11/2)² + 2σ_22²(G_22/2)² + (σ_11σ_22 + σ_12²)G_12² + 2(2σ_11σ_12)(G_11/2)G_12 + 2(2σ_22σ_12)(G_22/2)G_12 + 2(2σ_12²)(G_11/2)(G_22/2).   (5.14)

The square root of the right-hand side of (5.14) is denoted by θ(x, β_1, β_2, Σ). The second-order prediction of G(y_1, y_2) is obtained by using the results from a multiresponse analysis in (5.12) and (5.14). The second-order predicted value for the function of interest is equal to

Ĝ^(2)(x) = G(μ̂_1(x), μ̂_2(x)) + (1/2)[σ̂_11 ∂²G(μ̂_1(x), μ̂_2(x))/∂y_1² + 2σ̂_12 ∂²G(μ̂_1(x), μ̂_2(x))/∂y_1∂y_2 + σ̂_22 ∂²G(μ̂_1(x), μ̂_2(x))/∂y_2²].

In order to assess the quality of prediction, we again consider both scaled and unscaled sums of absolute deviations. The scaled sum of absolute deviations is

S^(2) = Σ_{u=1}^{N} |G(y_{u1}, y_{u2}) − Ĝ^(2)(x_u)| / θ(x_u, β̂_1, β̂_2, Σ̂),

and the unscaled sum of absolute deviations is

T^(2) = Σ_{u=1}^{N} |G(y_{u1}, y_{u2}) − Ĝ^(2)(x_u)|.

The proposed method, based upon the Taylor's series approximation to the function of interest, takes into account both the multivariate nature of the data and the complexity of the function.

5.4 Numerical Examples

In this section, four examples illustrating the proposed procedure for modeling a function of multiple responses are discussed.
The first three examples involve the product of two responses, G(y_1, y_2) = y_1 y_2, while the last example involves the modeling of the ratio of two responses, G(y_1, y_2) = y_1/y_2.

Example 1

Hunter and Jaworski (1986) investigate a chemical experiment where the yield for each run is computed as the product of conversion y_1 and selectivity y_2. The response of interest is therefore G(y_1, y_2) = y_1 y_2. The experiment used a 2² factorial design with one center run. The two control variables were temperature x_1 and time x_2. The design settings and the observed values for the two responses, as well as their product, are given in Table 5.2. Hunter and Jaworski (1986) analyzed this experiment using both the standard naive procedure and their proposed methodology.

The standard naive procedure for modeling a function of responses fits a linear model to the last column, consisting of the observed products.

Table 5.2: Chemical Data for Example 1

x1   x2   y1      y2      G(y1, y2) = y1*y2
-1   -1   0.799   0.502   0.401098
 1   -1   0.203   0.113   0.022939
-1    1   0.997   0.901   0.898297
 1    1   0.401   0.499   0.200099
 0    0   0.602   0.500   0.301000
Source: Hunter and Jaworski (1986, p. 324)

This results in the following fitted model for the product:

Ĝ(y_1, y_2) = 0.365 − 0.269x_1 + 0.168x_2.

This first-order model resulted in a sum of absolute residuals equal to 0.384. If the model is allowed to include the interaction term, the fitted model obtained is

Ĝ(y_1, y_2) = 0.365 − 0.269x_1 + 0.169x_2 − 0.080x_1x_2,

which results in a sum of absolute residuals equal to 0.127.

Now consider applying our proposed methodology, based upon the linear multiresponse model analysis and the Taylor's series approximation, to the function G(y_1, y_2) = y_1 y_2. Considering first-order models for both of the response means results in the following models:

μ_1(x) = f_1'(x)β_1 = β_01 + β_11 x_1 + β_21 x_2,
μ_2(x) = f_2'(x)β_2 = β_02 + β_12 x_1 + β_22 x_2.

Since the model forms are the same for the two responses, the multiresponse estimates and the ordinary least-squares estimates of β_1 and β_2 will be the same.
Because of this, the predictions from the Hunter and Jaworski method will agree with Ĝ^(1)(x), the first-order prediction from the proposed procedure. The proposed procedure, however, takes into account the variation of G(y_1, y_2) in assessing the quality of prediction. Multiresponse estimation resulted in the following fitted models and estimate of the variance–covariance matrix:

μ̂_1(x) = 0.6004 − 0.2980x_1 + 0.0990x_2,
μ̂_2(x) = 0.50300 − 0.19775x_1 + 0.19625x_2,

Σ̂ = [  6.40e−07   −1.20e−06
      −1.20e−06    1.07e−05 ].

The first-order approximation in (5.8) for the function G(y_1, y_2) = y_1 y_2 is

z^(1) = μ_1μ_2 + (y_1 − μ_1)μ_2 + (y_2 − μ_2)μ_1.

Using (5.9) and (5.10) results in the following expressions for the expected value and variance of the product:

E[G(y_1, y_2)] ≈ μ_1μ_2,
Var[G(y_1, y_2)] ≈ σ_11μ_2² + σ_22μ_1² + 2σ_12μ_1μ_2.

The first-order prediction for the product is therefore μ̂_1(x)μ̂_2(x). Table 5.3 contains the results from the first-order prediction. The assessment of the quality of prediction gives an unscaled sum of absolute deviations T^(1) = 0.0088012 and a scaled sum of absolute deviations S^(1) = 4.891519. The procedure performs quite well when compared to the naive procedure.

Table 5.3: Example 1: Results from First-Order Prediction

x1   x2   y1*y2      predicted   deviation
-1   -1   0.401098   0.4032973   0.0021993
 1   -1   0.022939   0.0221706   0.0007684
-1    1   0.898297   0.8946678   0.0036292
 1    1   0.200099   0.2013021   0.0012031
 0    0   0.301000   0.3020012   0.0010012

Applying the second-order approximation from (5.11) to the function G(y_1, y_2) = y_1 y_2 results in

z^(2) = μ_1μ_2 + (y_1 − μ_1)μ_2 + (y_2 − μ_2)μ_1 + (y_1 − μ_1)(y_2 − μ_2).

Using (5.12) and (5.14) gives the following for the expected value and variance of the product:

E[G(y_1, y_2)] ≈ μ_1μ_2 + σ_12,
Var[G(y_1, y_2)] ≈ σ_11μ_2² + σ_22μ_1² + 2σ_12μ_1μ_2 + σ_11σ_22 + σ_12².

The second-order prediction is therefore Ĝ^(2)(x) = μ̂_1(x)μ̂_2(x) + σ̂_12. The second-order prediction is a correction of the first-order prediction obtained by adding on the estimated covariance between the responses. Table 5.4 contains the results from using the second-order prediction.
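The numbers quoted for Example 1 can be re-derived from the data in Table 5.2 and the fitted quantities reported above; the sketch below assumes nothing beyond those reported values (in particular σ̂_12 = −1.2 × 10⁻⁶ from Σ̂).

```python
import numpy as np

# Design settings and observed products from Table 5.2.
x1 = np.array([-1., 1., -1., 1., 0.])
x2 = np.array([-1., -1., 1., 1., 0.])
g = np.array([0.401098, 0.022939, 0.898297, 0.200099, 0.301000])

# Naive fits: regress the product column on 1, x1, x2 (and x1*x2).
A1 = np.column_stack([np.ones(5), x1, x2])
b_first, *_ = np.linalg.lstsq(A1, g, rcond=None)
sad1 = np.abs(g - A1 @ b_first).sum()       # approx. 0.384 as reported
A2 = np.column_stack([A1, x1 * x2])
b_int, *_ = np.linalg.lstsq(A2, g, rcond=None)
sad2 = np.abs(g - A2 @ b_int).sum()         # approx. 0.127 as reported

# Proposed procedure: fitted mean models and sigma12_hat from the text.
mu1 = 0.6004 - 0.2980 * x1 + 0.0990 * x2
mu2 = 0.50300 - 0.19775 * x1 + 0.19625 * x2
s12 = -1.20e-06

pred1 = mu1 * mu2                           # first-order prediction (Table 5.3)
T1 = np.abs(g - pred1).sum()                # unscaled T^(1)
pred2 = mu1 * mu2 + s12                     # second-order prediction (Table 5.4)
T2 = np.abs(g - pred2).sum()                # unscaled T^(2)
```

The second-order prediction differs from the first-order one only by the tiny negative estimated covariance, which is why Tables 5.3 and 5.4 are so close.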
The estimated covariance between the responses is small. As a result, the second-order predictions are close to the first-order predictions. The assessment statistics are T^(2) = 0.0088 and S^(2) = 4.891642. The proposed procedure has performed quite well in comparison to the standard naive approach.

Table 5.4: Example 1: Results from Second-Order Prediction

x1   x2   y1*y2      predicted   deviation
-1   -1   0.401098   0.4032961   0.0021981
 1   -1   0.022939   0.0221694   0.0007696
-1    1   0.898297   0.8946666   0.0036304
 1    1   0.200099   0.2013009   0.0012019
 0    0   0.301000   0.3020000   0.0010000

Example 2

The second example also involves the product of two responses; however, the two models are slightly different. The fitted model for the first response is a first-order model, while the fitted model for the second response includes the interaction term x1x2 as well. The design settings and generated data are given in Table 5.5.

Table 5.5: Example 2: Design Settings and Generated Data

x1   x2   y1     y2     G(y1, y2) = y1*y2
-1   -1   29.3   79.3   2323.49
 1   -1   50.3   10.7    538.21
-1    1    8.9   79.2    704.88
 1    1   29.9   70.3   2101.97
 0    0   28.5   58.6   1670.10
 0    0   29.4   59.1   1737.54

The standard naive procedure resulted in the following first-order model for the observed product:

Ĝ(y_1, y_2) = 1512.6983 − 97.0475x_1 − 13.7125x_2.

The sum of absolute residuals for this model was 3564.613. Adding the interaction term reduced the sum of absolute residuals to 764.4867 and resulted in the following model:

Ĝ(y_1, y_2) = 1512.6983 − 97.0475x_1 − 13.7125x_2 + 795.5925x_1x_2.

Table 5.6: Example 2: Results from First-Order Prediction

x1   x2   y1*y2     predicted    deviation
-1   -1   2323.49   2296.3715    27.11847
 1   -1    538.21    518.7799    19.43014
-1    1    704.88    684.7532    20.12681
 1    1   2101.97   2076.5965    25.37347
 0    0   1670.10   1749.2878    79.18778
 0    0   1737.54   1749.2878    11.74778
T^(1) = 182.9844,  S^(1) = 4.8225

The analysis of the linear multiresponse model gives the following fitted models and estimate of the variance–covariance matrix for the responses.
μ̂_1(x) = 29.383 + 10.5x_1 − 10.2x_2,
μ̂_2(x) = 59.533 − 19.375x_1 + 14.875x_2 + 14.925x_1x_2,

Σ̂ = [ 0.1613889   0.1855556
      0.1855556   0.2543056 ].

Tables 5.6 and 5.7 give the results of the first-order and second-order predictions obtained from the analysis of the linear multiresponse model and the proposed procedure for modeling the product of responses. The procedure again performs well in predicting the observed products.

Table 5.7: Example 2: Results from Second-Order Prediction

x1   x2   y1*y2     predicted    deviation
-1   -1   2323.49   2296.5571    26.93292
 1   -1    538.21    518.9654    19.24458
-1    1    704.88    684.9387    19.94125
 1    1   2101.97   2076.7821    25.18792
 0    0   1670.10   1749.4733    79.37333
 0    0   1737.54   1749.4733    11.93333
T^(2) = 182.6133,  S^(2) = 4.8121

Example 3

The third example also involves the product of two responses with the two different model forms; here, however, the design settings do not form an orthogonal design. The fitted model for the first response is again a first-order model, while the fitted model for the second response includes the interaction term x1x2. The design settings and generated data are given in Table 5.8.

Table 5.8: Example 3: Design Settings and Generated Data

x1    x2    y1      y2       G(y1, y2) = y1*y2
1.7   1.0   20.88   202.81   4234.673
1.5   0.8   52.47   111.83   5867.720
1.0   0.7   14.46   213.68   3089.813
0.9   0.9   29.08   162.46   4724.337
1.0   1.4   34.65   211.22   7318.773
1.0   0.6   45.98   139.76   6426.165
1.3   1.0    5.83   176.24   1027.479
1.0   0.8   32.18   181.58   5843.244
0.0   0.0   29.59   162.22   4800.090
0.0   0.1   27.87   168.42   4693.865
0.2   0.0   29.17   194.56   5675.315

The standard naive procedure resulted in the following first-order model for the observed products:

Ĝ(y_1, y_2) = 4908.4565 − 895.5674x_1 − 1416.3569x_2.