
Citation 
 Permanent Link:
 http://ufdc.ufl.edu/AA00038446/00001
Material Information
 Title:
 A unified approach to process optimization
 Creator:
 McGoff, Philip John, 1961
 Publication Date:
 2000
 Language:
 English
 Physical Description:
 vii, 136 leaves : ill. ; 29 cm.
Subjects
 Subjects / Keywords:
 Control variables ( jstor )
Crystals ( jstor ) Experiment design ( jstor ) Mathematical independent variables ( jstor ) Mathematical variables ( jstor ) Noise control ( jstor ) Standard deviation ( jstor ) Statistical discrepancies ( jstor ) Statistics ( jstor ) Subroutines ( jstor ) Dissertations, Academic  Statistics  UF ( lcsh ) Statistics thesis, Ph.D ( lcsh )
 Genre:
 bibliography ( marcgt )
theses ( marcgt ) nonfiction ( marcgt )
Notes
 Thesis:
 Thesis (Ph.D.)University of Florida, 2000.
 Bibliography:
 Includes bibliographical references (leaves 133135).
 General Note:
 Printout.
 General Note:
 Vita.
 Statement of Responsibility:
 by Philip John McGoff.
Record Information
 Source Institution:
 University of Florida
 Holding Location:
 University of Florida
 Rights Management:
 The University of Florida George A. Smathers Libraries respect the intellectual property rights of others and do not claim any copyright interest in this item. This item may be protected by copyright but is made available here under a claim of fair use (17 U.S.C. Â§107) for nonprofit research and educational purposes. Users of this work have responsibility for determining copyright status prior to reusing, publishing or reproducing this item for purposes other than what is allowed by fair use or other copyright exemptions. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder. The Smathers Libraries would like to learn more about this item and invite individuals or organizations to contact the RDS coordinator (ufdissertations@uflib.ufl.edu) with any additional information they can provide.
 Resource Identifier:
 022982363 ( ALEPH )
45077440 ( OCLC )

Downloads 
This item has the following downloads:

Full Text 
A UNIFIED APPROACH TO PROCESS OPTIMIZATION
By
PHILIP JOHN MCGOFF
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA
2000
This dissertation is dedicated to my family, especially my to father, who is not here
to see me become the first in the family with a doctorate.
ACKNOWLEDGMENTS
I would like to thank my family and friends for supporting me in my return to graduate school. I would also like to thank all of my teachersfrom elementary school, high school, undergraduate, and graduate schoolfor spending time with me and challenging me to do my best. I would especially like to thank Mr. Kristo and the rest of the math department at Owatonna High School for giving me an excellent foundation in basic mathematics. Finally I would like to thank the University of Florida Department of Statistics, the faculty, staff, and fellow students, for making my stay in Gainesville an excellent experience.
iii
TABLE OF CONTENTS
page
ACKNOWLEDGMENTS.. . . . . . .............................iii
ABSTRACT... . . . . . . ....................................vi
CHAPTERS
1 INTRODUCTION . . . . . .............................1
2 METHODS OF OPTIMIZATION........... .... .6
2.1 The Taguchi Method.. . . ........................6
2.2 Dual Response Surface Optimization.. . ................11
2.3 Alternative Methods of Dual Response Surface Optimization 16 2.4 Criticisms of the Various Optimization Schemes. ........ 22 2.5 Multiple Response Optimization... . .................25
2.6 Criticisms of Multiple Response Optimization Methods . . . 34 2.7 Robust Multiple Response Optimization. . ..............36
2.8 Optimizing a Function. . . . ......................38
3 ROBUST PARAMETER DESIGN.. . . ...................41
3.1 A Description of the Proposed Method.. . .............41
3.2 Printing Process Example.. . . .....................47
3.3 Example with Lifetime Data.. . . ...................54
4 MULTIPLE RESPONSE OPTIMIZATION.. . ...............59
4.1 A Description of the Proposed Method.. . .............59
4.2 Tire Tread.Compound Example... . .................63
4.3 Repeatability of a Multiresponse Optimization Experiment 68 4.4 Albumin Nanospheres Example.. . . . ................72
5 MULTIPLE RESPONSE ROBUST PARAMETER DESIGN .... 79
5.1 A Description of the Proposed Method.. . .............79
5.2 Oxygen Plasma Anodization Example. . .............. 85
5.3 Repeatability of Anodization Example. . ...............90
6 CONCLUSION AND FURTHER RESEARCH............... 93
iv
APPENDICES
A SOLVER COMMAND IN MICROSOFT EXCEL. . ...........95
B FORTRAN CODE FOR MULTIVARIATE NORMAL PROBABILITY 99 C BASIC OUTLINE OF AN OPTIMIZATION EXPERIMENT . . . 128 REFERENCES... . . . . ....................................133
BIOGRAPHICAL SKETCH.. . . .. ............................136
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
A UNIFIED APPROACH TO PROCESS OPTIMIZATION
By
Philip John McGoff
May 2000
Chairman: G. Geoffrey Vining
Major Department: Statistics
The goal of any optimization experiment is to find the settings of the factors that can be controlled which results in optimal levels of the responses of interest. In robust parameter design, the two responses of interest are the mean and variance of a quality characteristic. In multiple response optimization, the responses of interest are the quality characteristics of the product. In both of these cases, a quantity that is a function of the estimates of the responses of interest is either maximized or minimized. A variety of quantities have been proposed for robust parameter design and multiple response optimization, but all of the proposed quantities are lacking in some respectthey may lack intuitive appeal, depend too heavily on the definition of subjective parameters, or fail altogether in certain situations. In addition, most of the quantities proposed for robust parameter design cannot be adapted easily to multiple response optimization. The probability that all of the responses are simultaneously within their upper and lower specification limits is a quantity which
vi
can be used for robust parameter design and multiple response optimization. The probability method also has an intuitive appeal that will make it easy to explain to people in fields outside of statistics. This method does not depend on the definition of subjective parameters, and it works in all of the situations that have been addressed. It may also be extended to multiple response robust parameter design, which none of the other methods has attempted.
vii
CHAPTER 1
INTRODUCTION
It is the goal of any industry to produce high quality products as inexpensively as possible. The quality of a product is usually measured by physical characteristics, such as diameter, purity, taste, or hardness. When these characteristics are at specific target levels or values, the product is thought to be of high quality. As the characteristics deviate from the target levels, the quality of the product decreases. Therefore producing a high quality product translates into producing the product with quality characteristics at specific levels. There are three possible situations for an individual characteristic:
" Target is best: the quality characteristic has a target level that is most desirable.
For instance, the diameter of a roller bearing or viscosity of a fluid.
" Larger is better: the quality characteristic should be made as large as possible.
For instance, the purity of a chemical or a car's gas mileage.
" Smaller is better: the quality characteristic should be made as small as possible.
For instance, impurities in a chemical or number of bubbles in a paint job.
Optimizing a product or process is a situation found in many areas of applied statistics. A set of control factors affects a response of interest, which is to be maximized (or minimized). The goal of the experimenter is to find the combination of the control factors that does maximize (or minimize) the response of interest. Box and Wilson (1951) introduced the topic of response surface methods in an attempt to answer this problem. They recommended that a series of experiments be performed, first to find out which control factors actually affect the response of interest, and secondly to find the optimal settings of the control factors. The
optimal settings are found by running a response surface experiment capable of fitting the response to a second order polynomial in the control factors. Using some basic calculus, the maximum (or minimum) of the second order polynomial can be found. The situation can become much more complicated if the response of interest does not have constant variance in the region in question, or if there is more than one response of interest. Robust parameter design deals with the situation where there is a single response that does not have a constant variance in the region of interest. Multiple response optimization deals with the situation where there is more than one response of interest.
Robust Parameter Design
In the manufacturing and use of the product, there will exist factors that will affect the values of the quality characteristic, not all of which can be easily controlled. Those that can be easily controlled are referred to as control factors. These could be the amount of material going into a chemical process, the temperature of a chemical bath, or the pressure in a reactor. Those factors that cannot be easily controlled, or are too costly to control, are referred to as noise factors. These could be ambient humidity and temperature, the speed at which a customer drives his car, or the temperature at which a customer bakes his cake.
Optimizing the industrial process entails finding the levels of the control
factors that will produce a product that has the desired quality characteristics. Due to natural variability, it is impossible to find settings that will always give the same values of the quality characteristics, and so all we ask is that it give those values a high percentage of the time on average. This variability will lead to decreased quality, as the product will not always be at its target value. One approach that has been traditionally taken is a two step process. The first step gets the characteristic at its desired level, on average. The second step is to find the sources of variability
3
and either control them or eliminate them. Control or elimination of sources of variability can be difficult and/or costly.
Another approach is to find settings of the control factors that make the product insensitive to the sources of variability. This is the approach used in the area of parameter design or robust parameter design. This approach has the appeal that it is typically more cost effective than controlling or trying to totally eliminate the sources of variation. If levels of the control factors can be found for which the product is insensitive to the sources of variation, and the product's quality characteristics can be made consistently close to their desired levels, this will result in a consistently high quality product.
Experiments can be run in order to find the settings of the control factors that will result in a consistently high quality product. The experiment will have the quality characteristics as the responses of interest, and the control and noise factors as the variables of interest. Notice that some noise factors can be controlled during experimentation, even though they cannot be easily controlled otherwise. An example of this would be the speed a car is driven or the temperature at which a cake is baked. Even if no noise factors exist or are known, an experiment can still be run to find the settings of the control factors that produce the smallest variability while still achieving a desired level of the response.
Multiple Response Optimization
Most products and processes have more than one response of interest, and all of these responses may depend on a set of control factors. These responses may have either an ideal target value that is desired, or a range of values that will result in a product that is of satisfactory quality. These targets and ranges are typically given in the form of targets and upper and lower specification limits. Often the product cannot be shipped to a customer unless all of the individual responses are within their upper and lower specifications. Ideally, there would exist a combination of the
control factors that simultaneously put all of the responses at their individual target value. This does not often happen, though, so a set of control factors must be found that somehow balances all of the responses, so that each response is somewhat close to its target.
Multiple response optimization usually assumes that the variability of the individual responses does not depend on control factors. It can be easily imagined, however, that situations may exist where one or more of the responses depend on the control factors, as in robust parameter design. Experimental situations can therefore fall into three categories:
" Robust Parameter Design: There is only one quality characteristic of interest,
and its variability depends on the control factors in the experiment.
" Multiple Response Optimization: There is more than one quality characteristic
of interest, but the variability of these responses are not thought to depend on
the control factors in the experiment.
" Multiple Response Robust Parameter Design: There is more than one quality
characteristic of interest, and the variability of these responses depends on the
control factors in the experiment.
Multiple response robust parameter design has not been dealt with much in the literature, but robust parameter design and multiple response optimization have been researched extensively.
Robust parameter design and multiple response optimization have been
treated as two separate problems, each with a unique method for finding the best setting of the control factors. All of the proposed methods have much in common, though. They all start with a designed experiment which is capable of supporting the fitting of the responses using polynomial models in the control settings. Some or all of the fitted responses are transformed into a univariate function, which is maximized (or minimized) over the experimental region of interest. The univariate
5
functions proposed for robust parameter design cannot be easily adapted for use in multiple response optimization, and vice versa. Maximizing the probability that the responses are between their upper and lower specifications can be used for both robust parameter design and multiple response optimization. In the following chapters, I will propose that this function be used for all three of the experimental situations listed above.
CHAPTER 2
METHODS OF OPTIMIZATION
2.1 The Taguchi Method
Genichi Taguchi is credited with introducing robust parameter design based on his work with various industries in Japan. His method is explained in papers by Kackar (1985) and Byrne and Taguchi (1987), as well as discussed by a panel headed by Nair (1992). Explanations of this method are also included in books on response surface methodology, such as those by Myers and Montgomery (1995) or Khuri and Cornell (1996). The philosophy behind the use of robust parameter designs is that it is more cost effective to try and make a process robust to sources of variability than it is to try to control or eliminate those sources. This same philosophy can be used to improve product performance in the field. The product can be designed in such a way that its performance is not sensitive to variability that it will encounter during its lifetime.
Taguchi believes that as long as a product's characteristic is not at its target level, it is not of the highest quality. This is different from the belief commonly held in some industries that as long as a product's characteristic is between its lower and upper specification limits, it is of the highest quality (see Figure 2.1). Taguchi's belief that a product can have lower quality while still being within specification is the driving force behind reducing product variability. It is noted that customer satisfaction increases as product variability decreases.
Robust parameter design aims to find the optimal settings of the control
parameters. In order to do this, a criterion must be specified that is to be optimized. Taguchi recommends optimizing the expected loss, defined as the expected value of
Taguchi
Mical Classical Lower Limit Target Upper Limit
Figure 2.1: Loss Functions for Taguchi and Classical View.
monetary losses an arbitrary user of the product is likely to suffer at times during the product's life span due to performance variation. This makes the concept of optimization concrete in that it relates product performance and monetary losses.
Taguchi recommends using a quadratic loss function, 1(Y)  K(Y 7)21
where Y is the quality characteristic, T is its target value, and K is some constant. The constant K is determined by the cost to repair or replace a product if it performs unsatisfactorily. (There are slight modifications to this loss function for the cases of larger or smaller is better.) Then the criterion that is to be optimized is the expected loss,
L = E[l(Y)] = KE[(Y )2].
Note that the final solution will not depend on the choice of K.
The actual design of the experiment is a design in the control factors crossed with a design in the noise factors. If the control design consists of m combinations of the levels of the control factors and the noise design consists of n combinations of the levels of the noise factors, then the overall design will have a total of mn runs. Notice that each combination of control factors will be repeated n times, once for each combination of the noise factors. This methodology serves to induce noise at each combination of the control factors. These n runs are then used to calculate a performance statistic. There will be m performance statistic values, one for each combination of the control factors.
The performance statistic suggested by Taguchi is referred to as a signal to noise ratio, and a large value is desired. Although Kackar notes that over 60 different signal to noise ratios are used, there are three main ones that are used, depending on the type of quality characteristic. They are:
Smaller is Better:
In this case, the target value is T = 0 and the expected loss is proportional to E[(Y  0)2] = E[Y2].
The performance measure is then estimated by
10/og
where n is the number of runs among the noise factors at each combination of levels of the control factors.
Larger is Better:
In this case, the quality characteristic Y is transformed to 1/Y and treated as though it were the case of smaller is better. Therefore the performance statistic becomes
 10log (/ /
Target is Best:
Kackar lists two situations, both of which depend on the assumption that an adjustment parameter exists. An adjustment parameter affects the mean of the response, but does not affect the variance. Thus the variance can be reduced, and the mean moved to the appropriate level by means of the adjustment parameter. The situations of interest in the target is best case are when the variance is not linked to the mean and when it is.
In the case that the variance is not linked to the mean, the idea is to minimize the variance, since the mean may be moved to target by means of the adjustment parameter. For this reason, the performance statistic suggested is
10
where s2 is the unbiased estimate of the variance and 9 is the mean response at that experimental point.
When the variance is linked to the mean, and is most likely positively correlated, the coefficient of variation a/ should be made small. When the coefficient of variation remains small, any change in the mean results in a relatively small increase in the variance. Again the target is reached through adjustment parameters. For this reason, the performance statistic suggested is 101og(q / '
where y is the average of the response values. Note that this is the only performance statistic listed here that can be legitimately called a function of the signal to noise ratio.
The Taguchi method starts with the identification of control factors, noise factors, and their appropriate settings. The parameter design experiment is designed and run, and the appropriate performance statistic is calculated for each combination of the control factor settings. The combination of control factor settings with the best performance statistic value is then chosen as the new setting. A confirmatory experiment is run to see if the new settings do indeed lead to an improved value of the performance statistic.
In a published discussion of the Kackar paper, Box points out that the
important engineering ideas of Taguchi are accompanied by statistical procedures that are often unnecessarily complicated and inefficient, and sometimes naive. He specifically mentions not exploiting the sequential nature of experimentation, limiting the choice of a design, focusing on the universal use of a signal to noise ratio, and the lack of data transformations, among others. Incorporating some of these issues and others into the methodology one uses would lead to a statistical refinement of Taguchi's excellent engineering ideas, as will be covered later.
11
2.2 Dual Response Surface Optimization
As previously mentioned, serious concerns were expressed about the lack of use of statistical methods in the Taguchi method. Many of these were detailed by Box in the discussion of Kackar's paper. An excellent paper by Myers et al. (1992) encapsulates many statistical improvements to the Taguchi method.
Taguchi recommends a design that crosses the noise design with the control design. Typically this will result in a very large design, unless they are highly fractionated designs. When the control design is highly fractionated, there is typically little or no information about interactions, some of which may be very important. Taguchi's rationale is that maximum information must be gained about the control by noise interactions and not interactions among the control variables. His designs therefore sacrifice possibly important control by control interactions at the expense of measuring possibly unimportant control by noise interactions.
Myers et al. (1992) give an example of a Taguchi design with 3 control
factors and 2 noise factors. The design recommended by Taguchi is an L9 for the control variables crossed with a 22 factorial for the noise variables. The L9 design is a three level PlackettBurman design for three factors and is given in Table 2.1. The levels of the three variables are coded by 1 for the lowest level, +1 for the highest level, and 0 for the middle level. When the design for the control variables is crossed with the design for the noise variables, the resulting experiment will consist of 9x4=36 runs. This approach gives maximal information about control by noise variable interactions, because the Taguchi method seeks to find the level of the control settings that is least affected by the noise variables. However, often many degrees of freedom are used for estimating control by noise interactions that could be better utilized estimating potentially important control by control interactions. For instance, a popular response surface design in five factors that could be used is a hybrid design consisting of a central composite design in the control variables
12
Table 2.1: L9 Taguchi Design.
X1 X2 X3
1 1 1
1 0 0
1 +1 +1
0 1 0
0 0 +1
0 +1 1
+1 1 +1
+1 0 1
+1 +1 0
x1, x2, and x3 and the noise variables zjandz2, which is given in Table 2.2. The first 16 runs evolve from a 2'1 fractional factorial in all five factors, plus axial runs in the three control variables with the noise variables at their midlevel, and 3 center runs among the five variables. This design has only 25 runs, compared to the 36 suggested by Taguchi, and it has the flexibility of estimating control by control interactions.
The practice of putting the control and noise factors into a single experimental design was first referred to as forming a combined array by Welch et al. (1990) and Shoemaker et al. (1991). Combining all of the factors into one design allows for greater flexibility for estimating certain effects, such as important control by control interactions. Enough information will still be available for the control by noise interactions, which will lead to robustness. The combined array philosophy has been almost unanimously endorsed by the statistical community.
A paper by Vining and Myers (1990) speaks to many of the other criticisms of the Taguchi method, such as the universal use of a signal to noise ratio, and limiting the recommended optimal setting to one of the combinations of the factors that was run in the experiment. They recommend using the aforementioned combined array and treating the mean and variance as separate responses. The mean and variance are each modelled with a second order polynomial and each can be optimized
13
Table 2.2: Central Composite Design in 5 Variables.
Run x, x2 x3 z1 z2 1 1 1 1 1 +1 2 +1 1 1 1 1 3 1 +1 1 1 1 4 +1 +1 1 1 +1 5 1 1 +1 1 1 6 +1 1 +1 1 +1 7 1 +1 +1 1 +1 8 +1 +1 +1 1 1 9 1 1 1 +1 1 10 +1 1 1 +1 +1 11 1 +1 1 +1 +1 12 +1 +1 1 +1 1 13 1 1 +1 +1 +1 14 +1 1 +1 +1 1 15 1 +1 +1 +1 1 16 +1 +1 +1 +1 +1 17 1 0 0 0 0 18 +1 0 0 0 0 19 0 1 0 0 0 20 0 +1 0 0 0 21 0 0 1 0 0 22 0 0 +1 0 0 23 0 0 0 0 0 24 0 0 0 0 0 25 0 0 0 0 0
14
separately and simultaneously. This is referred to as the dual response surface optimization approach.
This approach follows the dual response approach of Myers and Carter
(1973), which is a response surface technique for dealing with two responses, where generally both pertain to quality characteristics whose variances are assumed constant over the experimental region. The responses are classified as a primary response and a secondary response, where the latter will act as a constraint. The fitted primary response is given by
=P  bo + x'b + x'Bx
and the fitted secondary response is given by YS  CO + x'c + x'Cx
where b, B, c, and C are vectors and symmetric matrices whose elements are the least squares estimators of the coefficients of the linear, quadratic, and cross terms of the second order model.
The object of the optimization is to find conditions on the elements of x
that optimize YP subject to the constraint that Ys = k, where k is some desirable or acceptable value of the secondary response. Using a Lagrangian multiplier p, they consider
L  bo + x'b +x'Bx  iu(o +x'c + x'Cx  k) and solve for the x that satisfies
6L0
15
The solution is
1
(B  C)x= ( c  b).
2
Much of what Myers and Carter present deals with ensuring that a local maximum (or minimum) is actually obtained, and closely follows the ridge analysis approach developed by Hoerl (1959) and Draper (1963). The mathematical details will not be presented here.
Vining and Myers recommend using the dual response approach of Myers and Carter, using the mean and variance as the two responses. The three basic situations are dealt with as follows:
" Target is best: minimize the variance U2, subject to the constraint that the
mean /t is held at a specified target value. Here the primary response is the
variance, and the secondary response , or constraint, is the mean.
" Larger is better: maximize the mean p, subject to the constraint that the
variance U2 is at some acceptable level.
" Smaller is better: minimize the mean p, subject to the constraint that the
variance U2 is at some acceptable level. In this and the previous case, the
primary response is the mean, and the constrained response is the variance.
Myers and Carter showed that a solution can always be found if the optimization is performed on the surface of a sphere of radius p, thus giving an additional constraint of x'x = p2. Del Castillo and Montgomery (1993) showed that this additional constraint can be dispensed with by using a standard nonlinear programming technique called the generalized reduced gradient (GRG) algorithm, which will be described later.
Vining and Myers recommend fitting the two responses
=p  bo + x'b + x'Bx
16
Y, = CO + x'c + x'Cx
as in the dual response situation of Myers and Carter. The experiment is set up using an npoint response surface design that is capable of supporting the fit of a second order model. Some designs that are presented are a central composite design or a 33 factorial. Details of these, and other designs can be found in books on response surface methodology, such as Myers and Montgomery (1995) or Khuri and Cornell (1996). This design is then replicated m times to obtain an estimate of the variance at each design point. Ideally this replication would be in the noise parameters, as in the Taguchi method, but any scheme with proper randomization would be adequate. The average response 9i and the sample standard deviation si are calculated for i = 1,... , m, and then fit to the primary and secondary response models above. The dual response system is then optimized using the GRG algorithm, which can be found in many software packages, such as Microsoft Excel's solver command.
This approach answers several of the statistical issues raised by critics of the Taguchi method. It allows for sequential experimentation, where an experiment or series of experiments is/are performed to locate a favorable region, followed by another experiment to locate the optimal settings in that area. The approach allows for a variety of designs which may include estimating interactions among the control variables. Both the mean and the variance are separately modeled, allowing for a better understanding of the process to be acquired while avoiding the use of a signal to noise ratio.
2.3 Alternative Methods of Dual Response Surface Optimization
Several alternatives to the optimization methods of Vining and Myers, as augmented by Del Castillo and Montgomery, have been suggested. These deal mainly with the perceived problem that the constraints on the secondary response
17
are unnecessarily restrictive, mainly in the target is best scenario. The main suggestion deals with the secondary constraint that the fitted mean be at a certain target. If some bias, defined as the distance that the fitted mean is away from the target, is allowed, new settings of the control factors may result in an even lower variance. This must be done with caution, however, as too much bias will not be good for the overall optimization of the product. Though it has not been mentioned in the literature, in the other two cases of larger or smaller is better, the constraint of setting the variance at an acceptable level may also lead to problems, which will be detailed later.
All of the following methods begin the same way as Vining and Myers. A response surface experiment is designed that will allow both the mean and variance (or a function of the variance) to be modeled by a second order model in the p control variables. In most cases, the standard deviation is used rather than the variance, but another common transformation of the variance is a log transform. The transformation is merely meant to allow a better fit to the model form. The fit of the mean of the response is given by
W=/ 0oï¿½3)/ixi + Z ZAixx
i1 i=1 ji
and the standard deviation of the response is given by
Wr= ^/o + jixi + >3 7'ij Xi Xj
i=1 i=1 j>i
where the / and 7 are the least squares estimates of the true parameters. Once the models are fit, one or both of the response measures is to be maximized or minimized in order to determine the optimal setting of the control factors. Vining and Myers recommended using a constrained maximization (or minimization) of the two responses, as described above. Other people have recommended that different
18
quantities be maximized (or minimized), or that a modified approach be taken to the constrained optimization.
Lin and Tu (1995) recommend minimizing a squared error loss expression defined as
MSE (@  )2 + ^2
MSE = (W~4 T)2F
where w^, and w. are the fitted mean and variance from the two responses surface models, respectively, and T is the target value for the mean. This allows some deviation from the target level, or bias, to enter into the proposed solution, but it should not be too large, since the overall MSE must be minimized. The maximization may also be performed using the GRG algorithm that is recommended for the method of Vining and Myers.
Lin and Tu claim that the advantages of their approach over that of Vining and Myers are that it is not restricted to fitting full second order models and it may lead to a better overall solution. By not being restricted to a full second order model, the experimenter can use common regression techniques to find the simplest model for the responses. In fact, the two responses do not even have to have the same model form. In actuality, the use of the GRG algorithm for solving the method of Vining and Myers is not restricted to a full second order model either. Lin and Tu also mention that more flexibility may be gained by minimizing the quantity MSE* = 1(?bp  T)2 + A2tO
where A. are prespecified constants such that 0 _ A K__ 1 and A1 + A2  1. This would allow more freedom for the experimenter, by enabling the mean or variance to take on different weights in the final solution.
Copeland and Nelson (1996) propose a slightly different formulation to the problem as part of a paper about computer algorithms for finding a maximum or
19
a minimum in a region. For the target is best case, they suggest minimizing the variance subject to the bias, or distance the mean w, is from target T, being less than some A, which is specified by the experimenter. This can be stated as Minimize w^2
subject to (*, T)2 < A2.
This allows for the mean to deviate from the target level, but places an upper limit on the amount of deviation. For the larger or smaller cases, they suggest optimizing the mean wA while keeping the variance w.^ less than, rather than equal to, some level. This is really a minor modification to that of Vining and Myers, since the final solution typically has the variance at the defined limit anyway.
Copeland and Nelson's method answers a concern with the method suggested by Lin and Tu, namely that the mean could deviate too far from the target level, which is contrary to Taguchi's original idea of keeping the response close to target as often as possible. By placing an upper limit on the amount that the mean can deviate from the target level, this method allows some deviation, but not too much. Their modification in the larger or smaller case is an improvement over the use of Lagrangian multipliers suggested by Vining and Myers, but is really no different than that suggested by Del Castillo and Montgomery. The linear programming solution does not explicitly use Lagrangian multipliers and is also not restricted to equality contraints. Copeland and Nelson also note that the solution usually has the variance equalling the upper limit, so little is gained by using an inequality rather than an equality.
Kim and Lin (1998) propose a solution based on fuzzy optimization
methodology. They map the mean and variance to membership functions, and then each of these two functions are mapped to a single value which is optimized. The membership function lies between 0 and 1, and values close to 1 are better. The
20
optimal solution is the set of control settings that maximizes the minimum of the two membership functions.
The general case of the membership function is given by d_ djzj if d 0
(z) = ed1
1Izi ifd=O
where d is a shape parameter (see Figure 2.2 for a graph of the membership function for different choices of d), and where z, and z, are defined as mx . min < W
Otherwise
I min
0 ba < "W0mi .^ .min
Za "wa'wa ,,, .min < Wa max
 WcTWm m1n Wa < Wa wrnax~wrmn ï¿½ a _
1 ,> wnax The shape parameter d is a measure of how quickly the value of m(z) changes along the 0 to 1 scale as z moves away from 0. As d increases (from large negative to large positive), m(z) changes slowly as z moves away from 0. Note that z moving from 0 to 1 is the same as the response moving from the target to its upper or lower limit. In other words, if it is critical that the mean (or variance) be close to target, a large negative value of d should be used. If it is not critical that the mean (or variance) be close to target, then a large positive value of d should be used (see Figure 2.2). This gives the experimenter some flexibility to add more weight to either the mean of the response or the variance of the response.
21
d0
C14
I I I I I I 0.0 0.2 0.4 0.6 0.8 1.0
Figure 2.2: Membership Function for Some Choices of Shape Parameters.
The optimal setting is the values of the control variables x that maximizes the minimum of the two membership functions, or it may be stated as: maximize A (2.1) subject to m(z,,) _ A
m(z,) > A
Since m(z) is close to 1 when z is close to 0, this implies that the best setting will be the one that gets both m(z,) and m(z,) as close to 1 as possible. When this happens, both the mean P, and the standard deviation W^ should be close to their targets, where the target for the standard deviation ^,@ is usually 0.
A brief explanation of fuzzy set theory may help motivate the method proposed by Kim and Lin. A good discussion of the subject from a statistical
22
Table 2.3: Membership Values
x 0 1 2 3 4 5 6 7 8 9 Set A 1 1 1 1 0 0 0 0 0 0 SetB 1 0.9 0.7 0.5 0.2 0 0 0 0 0
point of view can be found in Laviolette et al. (1995). Fuzzy set theory assumes that elements have some membership in a set. If the membership is 0, the element is never in the set. If the membership is 1, the element is always in the set. As membership increases from 0 to 1, the possibility that it is in the set increases. For instance, if the elements of our universe are the digits from 0 to 9, we can define two sets, A  (x: x < 3) and B = (x: x is small). The membership in each set for each element is given in Table 2.3. Set A is referred to as a crisp set, since all of the membership values are 0 or 1. Traditional set theory deals with crisp sets. Set B is a fuzzy set, since the membership values take values other than 0 or 1. Membership can be thought of as being a conditional probability, i.e. P(x is small given x = i) gives the membership values for i = 0,1, ..., 9. The membership values for the intersection of two sets is given by the minimum of the membership values of the two sets. Similarly, the union of two sets uses the maximum.
The method of Kim and Lin defines two fuzzy sets, A=(The mean is at a good level) and B=(The standard deviation is at a good level). It then finds the value of x that gives the greatest possibility of being in the intersection of A and B, or AfB=(The mean and standard deviation are at good levels). Since the intersection of two fuzzy sets uses the minimum of the two membership values, their method is maximizing the minimum of the two membership values that they define.
2.4 Criticisms of the Various Optimization Schemes
All of the optimization schemes have shortcomings that should be addressed. The method suggested by Vining and Myers has already been mentioned as being
23
too restrictive on its secondary response. The issue in the target is best case of not allowing any bias has been addressed. In the larger or smaller is better case, the restriction of the variance being equal to some acceptable level has not really been addressed. Copeland and Nelson change the restriction that the variance be less than or equal to some level, but admit that the final solution usually has it equalling that level anyway. By dictating what is an acceptable level of variance, one may find a solution that is not the best possible. Vining and Myers use a number of different acceptable levels and suggest that the experimenter compare the proposed solutions and choose the one they think is best. No methodology is given as to how to compare the proposed solutions, though. In the larger is better case, it is difficult to compare the following two solutions:
W1LA Woh
Solution A 557.9 60
Solution B 687.5 75
It is unclear whether the larger mean in solution B sufficiently offsets its increased standard deviation.
The method suggested by Lin and Tu does have the statistical appeal of
minimizing squared error loss. For the target is best case, this method is difficult to criticize. For the larger or smaller is better case, though, a difficulty arises in how to define bias, since there is no explicit target. For the smaller is better case, they suggest a target of 0, which is consistent with what Taguchi suggests. This may not lead to a good solution if the mean cannot legitimately attain 0. If the mean is approximately 100 and the standard deviation is approximately 10, then the bias term will dominate the variance term, meaning the solution will be driven by bias,
24
with variance largely ignored. For instance, take the following two solutions: WA WaMSE
SolutionA 100 10 10100
Solution B 90 40 9700
According to this method, solution B would be chosen by virtue of a smaller MSE, even though the standard deviation is four times that of solution A. This probably would not be an acceptable choice of solutions. No mention is made of the larger is better situation, as far as the definition of bias is concerned, thus they do not even address this case. An experimenter could conceivably choose some artificial target, but then the solution may depend too heavily on the choice of this target.
A criticism with the method of Copeland and Nelson is the issue of how much bias is acceptable. This is similar to the criticism of Vining and Myers with regard to how much variance is acceptable. As with Vining and Myers, one could suggest a number of possible solutions and pick the best one, but no method is given for comparing the possible solutions. It is not a simple task to compare a solution that results in a bias of I and W^= 45.20 with a solution that results in a bias of 5 and w = 44.73. It is unclear whether the lower standard deviation in the second case compensates for the larger bias.
The main criticism with the method of Kim and Lin is that too many
parameters have to be specified for the membership functions. They require targets, upper and lower limits, and shape parameters to be specified, all of which can greatly affect the final solution. It is this dependence on so many parameters that can lead to the belief that the parameters drive the solution more than the actual data do. As will be shown later, the final solution can be highly dependent on the choice of these parameters.
Another criticism of the method of Kim and Lin is that it is unnecessarily complicated, and its solution is in terms of a quantity A from equation 2.1 that has
25
no physical interpretation. If a statistician reports that the optimal settings result in A = 0.26, the engineer will most likely not know whether that is a good or a bad result. I believe a simpler approach having a more intuitive interpretation would be much preferred.
2.5 Multiple Response Optimization
For experimental situations where there is more than one quality response of interest, all of which depend on the settings of a group of control variables, a number of methods for determining the optimal set of control variables have been suggested. If the variance of the responses does not depend on the settings of the control variables, then a response surface approach can be used in the optimization. The optimization experiment will begin by running an experiment capable of fitting a second degree polynomial in the p control factors x = (xl,...I, xP), such as a central composite design, face centered cube, or a 3P factorial. These experiments are discussed in books on response surface methodology, such as those by Myers and Montgomery (1995) or Khuri and Cornell (1996). At each of the n design points, the r responses are observed as yij, with i= 1, ..., r and j = 1, ..., n. Each of the responses is then individually fit to a full second order response surface model, given by
4i= /+ j + : + j
j=1 j=l k>j
where the /3 are the usual least squares estimates of the true regression parameters. At this point, a univariate function, say O(x), is constructed, which is a function of the control variables x through the fitted responses yi. The optimal solution is the set of control factors x that maximizes (or minimizes, depending on the function) the function q(x). The difference between the proposed multiple response optimization methods lies in the definition of the function q(x) to be optimized.
26
Derringer and Suich (1980) proposed a desirability function for the case of optimizing several responses simultaneously. For the ith response, let the individual desirability di be a transformation of the fitted value ^)i, where 0 < di 5 1. The value of di increases as the desirability of the corresponding response increases. The overall desirability of the r responses is the geometric mean of the individual desirabilities,
D r )1/r
Note that D is also between 0 and 1, and that if any individual di is 0, then D will be 0. That is, if any individual response is unacceptable, the entire product is unacceptable. This is consistent with much of industry's needs and demands where a product must meet all of its specifications, or it will be rejected.
The individual desirability for the case of target is best, where the target for the ith response is Ti and the lower and upper specifications are y,!",f and yia1, is given by
( 'Y7in) mn Sli
T yumn Yi 
10ynrfifxr yta
diTi _Y ...
tmin y Max
0 Yi< YZ or ) > Y
where s and t are shape parameters. Figure 2.3 shows the desirability function for some choices of s and t. Typically a shape factor of 1 is used, but there are situations where a value other than 1 may be used. One situation would be if a product could be inexpensively reworked if a quality characteristic was too high, but would have to be scrapped if it was too low. In this case, if s was chosen to be large and t was chosen to be small, the desirability function would be larger when the response was above the target level, which would favor the response being at or above target.
co
(D c;
"a
Coj
0 100 200 300 400
500
Figure 2.3: Two Sided Desirability Function for Selected Shape Parameters.
27
28
Other choices for the shape parameters could be thought of as weighting factors for responses that are more or less important than other responses.
In the case of larger is better, where the lower specification limit is yj"' an upper limit yrna still must be specified. This upper limit may be thought of as the level above which there is no added value to the product. This upper level in essence becomes the target for the response. In the case of smaller is better, the negative of the response is used and treated as though it were a larger is better situation. The desirability function for larger is better is given by
0 lyi < yin
yi >ymX
where s is a shape parameter. Figure 2.4 shows this function for several choices of s. Note that the desirability function does not need all of the responses to have the same functional form.
The overall desirability D depends on the location x in the region of interest. The overall optimum is found by maximizing D with respect to x. This has an intuitive appeal that the optimal choice of the settings of the control variables x is the location that is most desirable, i.e. has the highest value of the desirability function D. There are some disadvantages to this method that must be addressed. The desirability function D is treated as being a function of only x, but it is, in fact, also a function of the choices of the upper and lower specification limits, the targets, and the shape parameters for each of the individual responses. In addition, the responses are treated as being independent, since any correlation among the responses is not taken into account.
Khuri and Conlon (1981) proposed a distance measure from some ideal
optimum, where the distance is relative to the estimated variance of the responses. The main advantage to this method is that it incorporates the possible correlations
Co CD
04
0 100 200 300 400
500
Figure 2.4: One Sided Desirability Function for Selected Shape Parameters.
29
30
among the responses, which the desirability method does not. The first step in their method is to define a vector of ideal targets for the responses f. The individual elements of the ideal target vector /i are defined by the actual target in the case of target is best, or by the predicted maximum or minimum that can be obtained in the region of interest in the case of larger or smaller is better, respectively. If the target vector is an estimate of the true target vector, then it should be denoted by /. Obviously, if there exists some x that simultaneously achieves all of the individual targets at the same time, it is the ideal optimum. Since this rarely occurs, the x that minimizes the vector distance from this ideal optimum is defined to be the optimal set of operating conditions.
Before defining the distance function, other quantities must be defined and estimated. The linear model for an individual response is written in the form: yi  Xjli + i, i  1, 2, ..., r,
where yi is the vector of observations of the ith response, f3i is a vector of q unknown regression parameters, Ei is a vector of random errors associated with the ith response, and X is an nxq full column rank matrix of constants obtained from the experimental design. The fitted ith response is of the form /i(x)  z(x)oi, where z(x) is a vector of dimension q, whose first element is one and the remaining elements consist of powers and crossproducts of control variables x1, x2,... , Xp. Note that this method assumes that all of the responses use the same functional form. The matrix of the observations from the r responses is denoted by Y, and the variancecovariance matrix is denoted by . An unbiased estimate of the variancecovariance matrix is given by
=Y[In  X(X'X)X']Y/(n q)7
31
where I, is an identity matrix of order nxn (Gnanadesikan 1977, for instance). The distance function is then defined as
 q5I~1p~x)~ I1/2
p[Yj(), ï¿½  '[(Yx()  ()(Y)(x) 0)] (2.2) ,Z' (X ) Z *l(
The optimal solution is given by the set of the control factors x which minimizes the distance p[Y/(x), 0].
The method given above is an attempt to measure the distance away from target in terms of the variability associated with the estimated responses. Since the variance associated with the fitted responses EY is given by Y Z'(X)(X'X)1Z(X),
equation 2.2 can be rewritten as
, 4  '(  /2(2.3)
where EY is Ey with E substituted. Equation 2.3 shows the relationship to a distance measure in terms of the variability, just generalized to a multivariate situation. An estimate of the distance of the fitted responses from their targets is given by Y (x)  4, and it is scaled by an estimate of the variance of the fitted responses Y
In the case where the variancecovariance matrix can be assumed to be
diagonal, i.e., no correlation exists among the responses, the above calculations can be simplified somewhat. Equation 2.2 can be simplified to F~Ijr() I 4I . ,)C% 11/2
P2[1J(X),4ï¿½]  Z ,,()(,J, ( , (2.4)
where 80jj is the ith diagonal element of E. Khuri and Conlon also talk about the fact that the target vector 4 may be a random variable, due to the fact that the targets may be estimated from the experiment. This occurs when a response is
32
maximized or minimized, which means that the target is the estimated maximum or minimum that may be achieved in the experimental region. Dealing with a random target vector greatly complicates the analysis, and will not be dealt with here.
Ames et al. (1997) propose that a quality loss function be used in order to optimize several responses simultaneously. Their method is somewhat similar to that of Khuri and Conlon, though it can be thought of as being both more general and more restrictive. A quality loss function is defined by
r
QLP(x) = 3 wi(j(x)  T)2, (2.5)
i1
where pi(x) is the value of an individual response evaluated at a given x, Ti is the target for the ith response, and wi is a prespecified weighting factor. It is assumed that the target vector is selected by the experimenter, and may be considered constant. The optimal point is the value of x that minimizes the quality loss function. Note that the loss function in equation 2.5 is of the same form as equation 2.4, with qi replacing Ti as the target and the weight wi being equal to [i/Uiiz'(x)(X'X)1z(x)]1/2. By not dealing with the entire covariance matrix, Ames' method is more restrictive than that of Khuri and Conlon. But by being more flexible in the possible choices of the weighting factors wi, it is less restrictive than that of Khuri and Conlon.
Vining (1998) presents a more general squared error loss function that does account for the possible correlation among the responses and reduces, in a special case, to the distance measure proposed by Khuri and Conlon. The loss function that he introduces is of the form
L = [y(x)  6]'C[jj(x)  0],
where C is an appropriate positive definite matrix of costs or weights and 6 is the vector of target values, assumed to be known. The overall optimum is the location
33
of x which minimizes the expected value of the loss function, which is estimated by
E(L)  (x)  9]'C[y(x)  0] + trace[CE(x)], (2.6)
where Ey(x) is the variancecovariance matrix of the fitted vector y. The first term of equation 2.6 represents the penalty imposed for any predicted value that does not simultaneously meet all of the target values, while the second term represents the penalty imposed by the quality of the prediction.
One serious concern with this method is that the choice of the cost matrix C can be somewhat subjective. Vining offers some different choices that the practitioner may use. One choice would be to use C [Ey(x)]', which would make the term
trace[CEy(x)]= trace[I] r,
which is a constant for all values of x. Thus, minimizing E(L) is equivalent to minimizing the distance measure suggested by Khuri and Conlon in equation 2.3, as long as the targets 0 are known. This shows that the distance measure of Khuri and Conlon can also be thought of as a loss function, though the cost matrix that is used in not practical in an engineering sense. A more general choice that is suggested is C = KE1, where K is a suitably chosen matrix that shapes the variancecovariance structure to reflect the economics of the process. With this choice, the loss function becomes
E(L)  [(x)  0]'KK[y(x)  0] + z'(x)(X'X)'z(x)trace[K].
Lai and Chang (1994) propose a fuzzy set approach to multiple response optimization that attempts to simultaneously optimize the individual responses while minimizing the variance of the predicted responses. This is in the same vein as minimizing the mean square error that Vining proposed, in that it has a penalty
34
for not being at the overall optimum and a penalty for the quality of the prediction. A description of this method would necessitate the introduction of a large amount of fuzzy set theory, fuzzy regression, and a number of other fuzzy concepts, and will not be discussed further here.
2.6 Criticisms of Multiple Response Optimization Methods
As with the single response optimization methods, all of the multiple response optimization methods have shortcomings that need to be addressed. The method of Derringer and Suich does have an intuitive appeal with its desirability function, but it is questionable whether the desirability function has any real correlation with overall product quality. For instance, one choice of shape factors could give a final desirability that is equal to 0.70, while another choice of shape factors could give a final desirability of 0.75. In this case, it is not clear which of the two solutions gives the better set of operating conditions. Even though 0.75 is a larger desirability than 0.70, both are a function of their shape parameters, so a comparison is very difficult to make and as a result the desirability has no real meaning in and of itself. Ignoring the possibility that the responses could be strongly correlated is a far more serious problem, as is the issue of multiple shape parameters and targets that must be specified by the experimenter. Treating the responses as if they were independent could lead to a suboptimal solution, especially when some of the responses are highly correlated. Similarly, poor choices of the values of the shape parameters and targets may lead to a suboptimal solution. In fact, the solution itself may depend heavily on the choice of these parameters. This could lead to a situation where the experimental data is repeatedly reoptimized with different parameters until the experimenter gets the solution that he desires.
Khuri and Conlon's distance function does consider the correlation among the responses, but it is not without its own shortcomings. Their treatment of the target, especially in larger (or smaller) is better settings, can lead to serious problems.
35
In these situations, they find the maximum that each response can achieve when treated by itself and use this as the target. This ignores the actual specifications that a product may have. For instance, a response may be able to achieve a value of 500, but the product may only need to be above 100, and anything over 200 may not add anything to the usefulness of the product. In this case, Khuri and Conlon's method would still try to achieve a response of 500, and this may make the solution entirely dependent on that one response, and in fact it may drive other responses far from their desired values. A target of 200 would probably be a more realistic and lead to a better overall solution. A situation where one response is driven far from its target because another response is trying to reach an unreasonable maximum will be illustrated in a later example.
The method proposed by Ames et al. suffers from the fact that it ignores possible correlations among the responses. In addition, a weight is required for each response, which may not be easy to define. Possible choices would be the reciprocal of the variance or a function of the cost or importance of each response. Using the reciprocal of the variance is a special case of the Khuri and Conlon method. In any case, the final solution may depend more on the choice of these weights than the actual data of the experiment.
The method that Vining proposes addresses some of the problems with Khuri and Conlon's distance function, most notably the issue with the appropriate targets, but has the same problem as Ames with respect to the definition of a cost matrix. Vining suggests defining the cost matrix in a manner similar to Ames, namely the inverse of the variancecovariance matrix or a function of the costs of each response. Though there may be times when the costs can be easily defined, it could become too subjective to make for a truly optimal solution. Vining recommends that the targets be chosen in a manner consistent with that of Derringer and Suich. In the
36
cases where a response is to be maximized or minimized, the target should set at a level above or below which there is no further benefit to the product.
2.7 Robust Multiple Response Optimization
All of the multiple response optimization methods described previously assume that the variancecovariance among the responses is constant in the experimental region. There may exist situations where this assumption is not valid, and a method for optimizing a multiple response system where the variance is not constant in the experimental region must be addressed. Pignatiello (1993) has addressed this issue by suggesting that the mean square error be minimized, similar to what Vining suggested for multiple response systems, though this appears to be the only attempt at optimization under the assumption of nonconstant variance.
The optimization experiment will begin in a manner similar to that of simple multiple response optimization. The optimization experiment will begin by running an experiment capable of fitting the assumed model for the response. Since an estimate of the variance will be needed at some of the points of the experiment, some of the experimental points will need to be replicated. The part of the design that is replicated must be able to support the model that is assumed for the variance. At each of the n design points, the r responses are observed as yij, with i = 1,..., r and j  1,..., n. At the nr points that have been replicated, an estimate of the mean Kij and the variance s. can be obtained, with i =1,..., rand j =1,..., n.
Pignatiello recommends that all of the points of the experiment be replicated and uses a loss function of the form
L = [f/(x)  O]'C[fi(x)  0],
where C is an appropriate positive definite matrix of costs or weights and 0 is the vector of target values. He only suggests using this method when all of the responses are of the 'target is best' situation. Though he does not deal with 'larger (or smaller)
1 37
is better', a target could be chosen in the manner recommended by Derringer and Suich, among others, namely choose as the target the level above (or below) which the product shows no overall increase in value. An estimate of the expected loss, or mean squared error (MSE), is then given by
E(L) [V(x)  6]'C[:U(x)  0] + trace[CE2(x)],
where E(x) is an estimate of the variancecovariance matrix, which does depend on the location in the experimental region.
The strategy that Pignatiello recommends to find the optimal point is to estimate E(L) at each point in the experimental design, and then use traditional regression techniques on the expected loss. He suggests using the expected loss as the response of interest in a sequential set of experiments to find the optimal setting of the control variables, as in response surface methodology. He suggests beginning with a small design, such as a fractional factorial, to find out which control variables need to increase or decrease in order to reduce the estimate of E(L). He then suggests a method of steepest descent, in an effort to find the direction that most quickly decreases the expected loss. Other approaches for finding the minimum expected loss are mentioned, but the overall idea is similar in all of the cases and the only difference between them is the set of assumptions used to find the optimum.
There are several problems with this method that must be addressed. As with other methods, a cost or weighting matrix is introduced and must be specified by the experimenter. Pignatiello does give some methods to specify the individual elements of the matrix, but these can often be quite subjective. In many cases, the overall choice of the optimum may depend too much on the choice of these subjective parameters. Though he does not deal with the case of 'larger (or smaller) is better,' the most likely way to deal with it is to introduce 'artificial' targets for these responses. Again these choices may be very subjective and greatly affect
38
the final optimal point. He does not go into much detail on how he estimates the variancecovariance matrix, but it appears that he estimates it at every point that is run in the experiment. Better strategies may or may not exist that give a better overall estimate of the variancecovariance matrix, and could lead to more efficient experimental designs.
2.8 Optimizing a Function
In all of the optimization schemes presented thus far, the experimental data are reduced to a single function which is to be either maximized or minimized. Some of the methods add a secondary constraint, and all have a constraint on the values that x may take. Typically the solution cannot be expressed in a closed form, so that some form of numerical method must be used. Several numerical methods are available, including the generalized reduced gradient method mentioned previously.
Del Castillo and Montgomery proposed using the generalized reduced
gradient (GRG) algorithm for solving the dual response surface problem presented by Vining and Myers. The GRG algorithm was chosen because it is known to work well when optimization is done on a small region and the constraints are not highly nonlinear, which is true of the dual response surface problem.
Like many functional maximization algorithms, the GRG uses a gradient to find the direction of steepest ascent, then moves in this direction in order to increase the value of the function. The first step of the algorithm is to convert the inequality constraints into equality constraints through the use of slack variables. For instance, a constraint of the form x1 + x2 > c is rewritten as x1 + x2 + x, = c, where the slack variable x3 must be nonnegative. Note that constraints involve more than one variable, thus the bounds on the individual variables of the form 0 x1 K 1.0 are not deemed to be constraints.
After the constraints are modified, there will be m constraints and n variables, some of which may be slack variables. The ni variables are grouped into two sets,
39
basic variables and nonbasic variables. The basic variables are those variables that are strictly between their bounds, while those that are at their bounds are nonbasic variables. The system of m equations in n variables typically cannot be solved explicitly, since m is usually less than n. However, the system can be solved for m of the variables (basic variables) in terms of the remaining n  m nonbasic variables. The gradient of the objective function with respect to the nonbasic variables is called the "reduced gradient." If the magnitude of the reduced gradient at the current point is small, then the algorithm stops. Otherwise, the gradient is used to give the direction of steepest ascent (or descent for minimization problems). A line search is performed in this direction, a new point found, and the process repeated.
Nelder and Mead (1965) proposed a direct search algorithm, utilizing a simplex. Box (1965) adapted the algorithm for constrained maximization. The method does not assume any smoothness, and thus does not utilize gradients. Instead it uses a simplex of points in the space of the variables. The vertex of the simplex with the lowest response value is reflected to the opposite side of the simplex, and by repeating this the simplex moves towards the maximum point. By use of expansion and contractions of the simplex, it will eventually surround the maximum point.
The simplex is initially defined by k points, one of which is specified by the user, and the rest are randomly generated. The number of vertices of the simplex is greater than the dimensionality of the surface, in order to reduce the possibility that the simplex collapses to a subspace of the variables. All of the initial points will be chosen to satisfy the constraints of the problem. The function is evaluated at all of the vertices of the simplex, and the vertex with the lowest response value is reflected through the centroid of the remaining points of the simplex. The new point will be a> 1 times as far from the centroid as the original vertex. If the new point still has the smallest value, it is moved half way towards the centroid of the remaining
40
points. If the new point is outside of the bounds of the variables, it is moved just inside of the boundary. If the new point does not satisfy some of the constraints, it is moved half way towards the centroid of the remaining points.
The simplex method tends to be much slower than the gradient method, but they both tend to give the same result if the surface is reasonably smooth. Since the simplex does not assume any smoothness, it will be less likely to fail to find the true optimum due to unusual functional circumstances, such as the solution being in a corner of the variable space. Though it is slower than most gradient methods, the simplex method does still give a solution in well under a minute on modern computers (Gateway computer with an Intel Pentium 266 MHz processor with 64 MB of RAM) for the types of problems that have been dealt with here.
As with any maximization algorithm, one must be aware that the algorithm may settle on a local optimum. In order to guard against this, one commonly uses a number of different starting points, and if they all find the same optimum, it can be safely assumed that it is the global optimum. For simple functions in a small number of variables, one may construct a grid in the variable space and evaluate the function at every point in the grid. Contour plots can then be generated to look at the surface of the function to verify if it is a local or global optimum.
CHAPTER 3
ROBUST PARAMETER DESIGN
3.1 A Description of the Proposed Method
As with the other methods that have been previously introduced, this
method begins with a response surface experiment that will allow both the mean and variance (or a function of the variance) to be fit to a second order model in the p control variables. The fit of the mean of the response is given by = 0% + E ii+ 3Z/ijxixj
i=1 i=1 j>i
and model for the standard deviation of the response is given by
Wr= I'o +F yxi+YiXij
i=1 i=1 j>i
where the/ and 7 are the least squares estimates of the true parameters. It is at this point that all of the proposed methods attempt something different. Vining and Myers proposed that a constrained optimization be performed. Lin and Tu proposed that a squared error loss function be minimized. Copeland and Nelson introduced a slightly different constrained optimization than the method proposed by Vining and Myers. Finally, Kim and Lin took a fuzzy set theory approach to the optimization.
Taguchi's idea is that the variance of a process should be kept as small as possible, while keeping the mean of the process at a specific target. If one allows for bias, or the mean not equaling the target, this problem can also be approached by keeping the response close to target with the highest probability, or maximizing the probability that the quality characteristic y is within its specifications, v,. and ymax. Thus the optimal solution is the location of the control parameters x that will:
41
42
" Target is best: maximize P(ymin
" Larger is better: maximize P(ymin < y)
" Smaller is better: maximize P(y < yma)
For simplicity, the probability will be calculated by assuming that the response of interest y is normal with mean P, and standard deviation w^ (which both depend on the control variables x). As will be seen later, the assumption of normality is not a necessary assumption, but does simplify calculations. One could choose another distribution and calculate the probability in order to find the optimal setting.
This approach is similar to the goal of a high Cpk, which is becoming popular in industry. A thorough treatment of capability indices can be found in Kotz and Johnson (1993), and a discussion of issues surrounding the use of capability indices can be found in Rodriguez (1992). The index Cpk is a measure of how far the mean
9 of a distribution is from its closest specification limit, either y,ij or ymx . The distance is measured in terms of the standard deviation of the distribution, given by the usual unbiased estimate s. The idea of Cpk is to have a measure of the percentage of product that is failing that is easy to calculate. The equation for Cpk is usually given by the minimum of the lower and upper capability indices, or CPk= min CP   Y m, CpU = Ymaxy.
3s 3s
If one assumes normality, then the lower and upper capability indices are simply multiples of a standard normal z statistic, and the probability of being out of specification is easily calculated by P(z > 3Cp1) + P(z > 3Cpu). In many cases, one of these two probabilities is very nearly 0, 0o the probability of being out of specification is very nearly P(z > 3Ck). Ck is not, however, a meaningful measure if the assumption of normality does not hold. The assumption of normality is what gives the 0pk the close relationship with the probability of being outside the specification limits.
43
The proposed probability method has intuitive appeal, because most people will easily understand the problem being solved, that of measuring the probability that the process is within specifications. This is in direct contrast to the Kim and Lin method, in which A, the value of a membership function, has no direct interpretation. The proposed method does simultaneously minimize variation while staying close to target, and uses the same basic methodology for all three cases listed, unlike any of the previously proposed methods. It also answers all of the issues presented by the other proposed methods, while introducing few new issues.
One issue that other methods have addressed is that of allowing some bias into the final solution, or allowing the final solution to have the estimated mean A not equal to its target. This method does allow for some bias in the optimal solution, but it does penalize for too much bias. As the bias gets larger, the mean is moving closer to the maximum or minimum specification limits, thus decreasing the probability that is to be maximized. Thus the bias will want to be kept small. If the amount of bias is a critical concern, the maximum and minimum specification limits can be made very close to each other. This would be akin to making A small in the method of Copeland and Nelson.
An issue that has not been addressed by other methods is that of the constraint that the variancew ^4 be kept at a specific level, which is done in the constrained optimization method of Vining and Myers, as well as that proposed by Copeland and Nelson. In the case of larger (or smaller) is better, these two methods recommend that the secondary constraint be that the variances )' be set at or below some acceptable value r. The choice of an acceptable amount of variance can be quite subjective. One recommendation is to use a number of different choices of a02 and then pick the one that gives the best balance between the mean and variance. No mechanism is given to compare these competing solutions, however. The proposed probability method is not forced to use this constraint, since the goal
44
is to balance the mean and the variance simultaneously, via a probability. Also in the case of larger (or smaller) is better, there is no need to introduce an artificial definition of bias. The squared error loss method of Lin and Tu minimizes a measure that has a variance term and a bias term. The bias term is defined as the distance that the mean is from its target, but there is no clear target when the response is to maximized (or minimized). In these situations, an artificial target is selected anyway, though its choice can be quite subjective and greatly influence the final solution.
Though the probability method must specify a minimum and maximum limit for the response, there are far fewer parameters introduced than the method of Kim and Lin. Kim and Lin need the experimenter to define shape parameters for both the mean and the variance. If one assumes normality, the choice of the minimum and maximum will frequently not affect the choice of the optimal setting, as long as the maximum and minimum are symmetric about the target. As can be seen in Figure 3.1, a solution that maximizes P(yl < y < yu) will be almost identical to a solution that maximizes P(yl* < y < yu*). The Kim and Lin method gives a solution that can be very dependent on the shape parameters that must be specified, as will be shown in a later example.
The use of the shape parameters in the Kim and Lin method seems to be motivated by a situation where being out of specification on one side is less costly than the other. For instance, it may be relatively inexpensive to repair a part that is too large, while a part that is too small may have to be scrapped altogether. The proposed methodology is easily modified for such an economic situation. Instead of maximizing the probability of being within specification limits, one could minimize the expected cost of failing material, given by c1P(y < Ymrn) + c2P(y > Ymax), where Cl and c2 are the costs of failing low and high, respectively. Though this would add
45
yl* yl yu yu*
Figure 3.1: Optimal Solution and Two Sets of Limits.
46
two extra parameters, the cost parameters could be much easier for an experimenter to specify than the shape parameters necessary for the method of Kim and Lin.
Thus the probability method seems to answer all of the issues presented in previous approaches to the problem. Issues that may arise from this method would be the assumption of normality, no direct mention of a target value in the solution, and the usual issue of how much bias may enter the solution. As for the assumption of normality, it is not critical and is only used for simplicity. One could use any distributional assumptions, as long as the probability is maximized accordingly, as will be shown in an example. The target does enter the solution as being half way between the maximum and minimum limits, though this is not explicit. This does not seem to be a major issue, since the minimum and maximum may be made very close, which will force the optimal setting near the target. This will also keep the bias within some bounds, if that is critical.
Two situations may cause this method to have troubles. The first is if the mean response is unable to fall within the upper and lower limits, which will cause this method to try to find an area with a very large variance. Figure 3.2 shows two possible distributions in a larger is better situation. It can be easily seen that P(y > yl) is smaller for distribution 1 than it is for distribution 2. The probability method would choose distribution 2 over distribution 1, despite the fact that distribution 2 has a lower mean and a larger variance, neither of which is consistent with a situation where the response is to be maximized. In this situation, a small variance would cause the process to be consistently outside of the specification window. A large variance, on the other hand, would get at least some of the product within specification some of the time. Either way, this would not be a desirable process for most people. The second situation that could cause problems would be if the specification window was very wide or narrow. In this situation, the probability of being within specification would be almost identical (and almost always either 0
47
Dist.
Figure 3.2: Two Possible Solutions for Larger is Better.
or 1) for the entire experimental region. This would cause the optimization routine to become very unstable, and would likely return different locations for the optimum for different starting values, though all locations would have essentially that same probability of being within specification. It would probably be best to modify the specification limits if this should arise, so that the probability being maximized is not so close to 0 or 1.
3.2 Printing Process Example
The example will be taken from Box and Draper (1987), which also appears in Vining and Myers, Lin and Tu, and Kim and Lin. The purpose of the experiment
s
48
was to determine the effect of speed (x1), pressure (x2), and distance (x3) on the quality of a printing process. The experiment was conducted in a 33 factorial design with three replicates at each design point. The data set with the estimated responses is in Table 3.1, and the summary statistics of the fit of the two models in Table 3.2. A second order model was fit for both the mean and standard deviation as follows (Vining and Myers, 1990):
wt =327.6 + 177.0xl + 109.4x2 + 131.5x3 + 32.0xi
 22.4x2  29.1x2 + 66.0x1x2 + 75.5xjx2 + 43.6x2x3 (3.1)
WO, =34.9 + 11.5x1 + 15.3x2 + 29.2x3 + 4.2x2
 1.3X + 16.8X2 + 7.7xlx2 + 5.1xlx3 + 14.1x2x3 (3.2)
3.2.1 Case 1: Target is Best
In order to compare all of the different methods, the bounds that Kim and Lin proposed will be used, namely wn = 490, wax 510, I0o 500, wmin V1500  38.73, and wmfx  /'2100 = 45.83. The shape parameters that will be used are =d 4.39,1.70,0,1.70,4.39, and d = 0. The region of interest will be a sphere of radius 1. Table 3.3 gives the optimal settings from all of the methods mentioned, where the traditional method is that of Vining and Myers combined with Del Castillo and Montgomery. The final column is the probability that the response will be between the upper and lower bounds given above, assuming normality with mean II and standard deviation r . This is calculated as
P(490 < y < 510)= (2it )1/2exp((y  bu)2/(2zi))dy.
It should be noted that these solutions were arrived at using the solver
command in Microsoft Excel, version 7.0. Solver is a generalized reduced gradient
49
Table 3.1: Printing Study Data.
U X1 X2 X3 Yul Yu2 Yu3 Y su W A W
1 1 1 1 34 10 2 0 1 1 115 116 3 1 1 1 192 186 4 1 0 1 82 88 5 0 0 1 44 178 6 1 0 1 322 350 7 1 1 1 141 110 8 0 1 1 259 251 9 1 1 1 290 280
10 1 1 11 0 1 12 1 1 13 1 0 14 0 0 15 1 0 16 1 1 17 0 1 18 1 1 19 1 1 20 0 1 21 1 1 22 1 0 23 0 0 24 1 0 25 1 1 26 0 1 27 1 1
0 81 81 0 90 122 0 319 376 0 180 180 0 372 372 0 541 568 0 288 192 0 432 336 0 713 725 1 364 99 1 232 221 1 408 415 1 182 233 1 507 515 1 846 535 1 236 126 1 660 440 1 878 991
28 130 263 88 188 350 86 259 245 81 93 376 154 372 396 312 513 754 199 266 443 182 434 640 168 403
1161 1010.0 142.5 911.2 137.5
Table 3.2: Summary of Fits of the Two Responses.
R2 0.927 0.454 Rad 0.888 0.165
24.0 120.3 213.7 86.0 136.7 340.7 112.3 256.3 271.7 81.0 101.7 357.0 171.3 372.0 501.7 264.0 427.0 730.7 220.7 239.7 422.0 199.0 485.3
12.5 75.4 8.4 78.9 42.8 146.4 3.7 97.6 80.4 167.1 16.2 300.6 27.6 75.0 4.6 210.6 23.6 410.1 0.0 116.8 17.7 195.8 32.9 338.9 15.0 182.6 0.0 327.6 92.5 536.6 63.5 203.6 88.6 414.7 21.1 689.7 133.8 100.2 23.5 254.6 18.5 473.1 29.4 209.6 44.6 430.1
25.4 19.9 22.8 20.3 22.5 33.1 12.5 22.4 40.7 18.6 18.2 26.3 27.6 34.9 50.6 33.8 48.9 72.3 45.4 50.1 63.3 68.4 80.9
673.7 158.2 714.5 101.7 176.7 55.5 274.2 88.8 501.0 138.9 560.7 108.9
50
Table 3.3: Target is Best, Limits of 490 and 510.
Method Optimal Setting W Prob Proposed (0.983, 0.003,0.182) 494.61 44.66 0.1759 Traditional (0.984, 0.025,0.175) 500.00 45.32 0.1746 LinTu (0.983, 0.002,0.182) 494.53 44.65 0.1759
CopelandNelson
A=5 (0.983, 0.004,0.182) 495.00 44.71 0.1759 A=1 (0.984, 0.021,0.177) 499.00 44.20 0.1751
KimLin
d,=4.39 (0.975,0.005,0.187) 490.57 44.23 0.1749 d,=1.70 (0.983,0.014,0.185) 491.19 44.24 0.1754 d0 (0.977, 0.003,0.184) 492.02 44.39 0.1754 dA=1.70 (0.983, 0.002, 0.183) 493.52 44.53 0.1759 d,=4.39 (0.979, 0.042,0.202) 495.74 44.81 0.1758
algorithm for finding a constrained maximum. The details on the use of the solver command can be found in the Appendix A. It should be noted that with any maximization or minimization algorithm, there is the chance that a local, rather than global, optimum will be found. A common way to guard against this is to use a number of different starting points. If all of the different starting points end up at the same optimum, it may be safely assumed that it is the true global optimum. It may also be possible to evaluate the function at every point of a rough grid of the experimental region. The value at the optimum that has been found can then be compared to the values over the rough grid to determine whether it is likely that it is a true global optimum.
As Table 3.3 shows, there is very little real difference among most of the
solutions. The proposed method is essentially identical to the method of Lin and Tu. When one is confronted with many similar solutions, the simplest and most robust is most desirable. In order to check for robustness, the above is rerun with new limits of.wmi = 450, wumx =550 (approximately ï¿½1 standard deviation), w =5,an all other limits the same. Note that Kim and Lin is the only other method that uses these limits, so the location of the other solutions will stay the same, although the
51
Table 3.4: Target is Best, Limits of 450 and 550.
Method Optimal Setting P a Prob Proposed (0.983, 0.003, 0.183) 494.58 44.66 0.7336 Traditional (0.984, 0.025,0.175) 500.00 45.32 0.7300 LinTu (0.983, 0.002,0.182) 494.53 44.65 0.7336
CopelandNelson
A=5 (0.983, 0.004,0.182) 495.00 44.71 0.7336 A=I (0.984, 0.021,0.177) 499.00 45.20 0.7312
KimLin
d,=4.39 (0.937,0.047,0.232) 465.57 41.50 0.6253 d,=1.70 (0.948,0.034,0.215) 473.66 42.38 0.6759 d1A=0 (0.979,0.057,0.195) 480.97 43.02 0.7099 d,=1.70 (0.968,0.013,0.191) 486.53 43.80 0.7243 d,=4.39 (0.978,0.003,0.184) 492.16 44.41 0.7324
calculation of the probability will change, due to the new limits. Table 3.4 gives the results for all of the methods.
As can be seen in Table 3.4, the method of Kim and Lin has some serious problems with bias. While the other methods have a bias in the range of 5 or less, Kim and Lin's method has biases ranging from 8 to 34, despite their claim that their method will limit the bias. Note that the bias of 34.43 is approaching one full standard deviation (41.50) from target. One of the concerns of the Kim and Lin method was that the final solution would depend too much on the choice of the shape parameters, as well as the choice of upper and lower limits. As Table
3.4 shows, there is a great deal of difference between the solutions with shape parameters of +4.39 and 4.39. The predicted mean can be as close as 8 from target or as far as 34.5 from target. In addition, the solutions are quite different when the upper and lower limits are changed, as can be seen when comparing Table 3.3 with Table 3.4. Fixing the shape parameter at 0, when the limits are 490 and 510, the estimated mean P , is 492, but it is 481 when the limits are changed to 450 and 550. The proposed probability method, as expected, remains essentially the same when the limits are changed. With the tighter limits, the estimated mean t, is 494.61,
52
Table 3.5: Larger is Better, Lower Limit of 550.
Method Optimal Setting ft Prob Proposed (0.818, 0.415, 0.399) 637.35 74.17 0.8806 Traditional (0.946, 0.312, 0.088) 594.02 60.00 0.7684
and it only changes to 494.58 when the limits are widened to 450 and 550. The standard deviation , does not change when the limits are changed. This shows that the proposed probability method is insensitive to the choices of the upper and lower limits.
3.2.2 Case 2: Larger is Better
Vining and Myers originally looked at the case of maximizing the mean while restricting the standard deviation to 60. Del Castillo and Montgomery also analyzed this situation, as did Copeland and Nelson, whose restriction was keeping the standard deviation less than or equal to 60. These methods will be compared to the proposed method of maximizing the probability that the response is greater than 550
P(550 < y) = j5(2rz)/2exp((y  z,)2/(2ih2))dy.
The maximization will be over a spherical region of radius 1. The results are in Table 3.5.
By not having a restriction on the standard deviation, the proposed method comes up with a much higher response value, with only a moderately higher value of the standard deviation, which results in a much higher probability of success. Of course one could always change the restriction in the traditional method to keeping the standard deviation less than 75, and one would expect a solution similar to the proposed method. One advantage that the proposed method has over the other methods is the ability to compare two potential solutions such as those above.
53
Table 3.6: Larger is Better, Comparing Different Lower Limits.
Lower Limit Optimal Setting W Wa Prob > 550
500 (0.847, 0.404, 0.346) 633.81 71.84 0.8783 550 (0.818, 0.415, 0.399) 637.35 74.17 0.8806 600 (0.789, 0.423, 0.445) 639.07 76.18 0.8788 650 (0.762, 0.429, 0.485) 639.40 77.88 0.8745
Indeed Vining and Myers solve the problem for standard deviations of 60, 75, and 90, but no methodology is given to compare the three competing solutions. Their only advise is that "the user must decide which of these settings offers the best compromise." Previous methods have had to assume an upper limit on the standard deviation, but no method has been given to choose an appropriate upper limit.
Note that the results of Copeland and Nelson were almost identical to those of the Vining and Myers method. The method of Lin and Tu does not address the case of larger is better. The method of Kim and Lin was not looked at due to the many extra parameters that would need to be specified, as well as its inferior performance in the case of target is best.
In order to assess how much the final solution depends on the lower limit that is used, I found the optimal point for lower limits of 500, 600, and 650, which are displayed in Table 3.6. In order to compare the different solutions, the probability that the response is greater than 550 is calculated for all of the cases
P(550 < y)  (27rt2 )/2exp((yW1,)2/(2Wv2))dy.
Although the x locations change by a small amount, the probability of being above 550 does not change too much, which shows that this method appears to be robust to the choice of the lower limit.
54
Table 3.7: Smaller is Better, Upper Limit of 150.
Method Optimal Setting A 30 Prob Proposed (0.399, 0.451, 0.798) 136.34 19.25 0.7611 LinTu (0.392, 0.421, 0.818) 136.26 19.44 0.7602 CopelandNelson (0.391, 0.414, 0.822) 136.25 19.48 0.7598
3.2.3 Case 3: Smaller is Better
Copeland and Nelson looked at the case of smaller is better, as did Lin and Tu. These two methods will come up with nearly identical results when optimization is over a sphere of radius 1. The proposed method maximizes the probability that the response is less than 150
P~y 150 0
P(y < 150) = 0j (27rW.I12exp((y  W (W1)y
All three come up with nearly identical solutions, as given in Table 3.7.
The traditional Vining and Myers method will be identical to the Copeland and Nelson solution. Again the Kim and Lin method was not considered for the same reasons as in the larger is better situation.
3.3 Example with Lifetime Data
This example is simulated for a hypothetical reliability experiment. The experiment has four equally spaced levels of a control variable, x, which are coded
1,  1 1and 1, and we are interested in the time until failure, t, measured in hundreds of hours. The failure time data is randomly generated from a Weibull distribution, whose density is given by f(t) = (tI16a)3
The true a and /3 depend on x according to the following equations:
a =4.6992  10.88x+8x2
55
Table 3.8: Time Until Failure Data (Time in Hundreds of Hours).
11 1
IX
True a 23.58 9.21 1.96 1.82 True / 0.576 0.649 0.540 0.248
Sample 0.1283, 0.4431 0.1082, 0.0239 0.1020, 0.3425 0.6462, 3.5592
0.0040, 0.2122 0.0068, 0.1066 0.2261, 3.5968 0.3835, 6.5698 0.0886, 0.2198 0.4795, 0.5322 0.2177, 0.0683 3.6837, 0.0962 2.6648, 0.0286 0.1203, 1.4320 2.5847, 1.8744 0.0007, 0.0050 0.0040, 0.0563 0.0203, 0.3836 9.2947, 0.2816 0.0847, 0.0003
& 4.6771 3.8210 0.7505 1.8988 0.5603 0.7259 0.6552 0.3736 4 0.3850 0.3213 1.8589 1.5029 01_ 0.8122 0.4368 2.8921 2.2947
/3 = 0.6172  0.164x  0.205x2.
At each level of x, a random sample of size 10 is generated from a Weibull distribution, with the appropriate a and/3. Table 3.8 lists the data for each level of x. The table lists the true values of a and / that were used to generate the sample, as well as the maximum likelihood estimates, & and/3, and the sample mean, f4, and standard deviation, a. Note that the maximum likelihood estimates of the Weibull parameters, & and /3 must be found using a numerical procedure, since no closed form exists for these. The details can be found in any book on survival analysis or reliability, such as Lawless (1982).
The purpose of this analysis is to show that the proposed method is flexible enough to handle a distribution that is not normal. Furthermore, it will show that one could get a solution that would be far from optimal, if normality is assumed. Four analyses are given in Table 3.9, they are:
* Truth: This is the location of x in [1, 1] that maximizes the probability that
the time until failure is greater than 0.1, P(t > 0.1). This is done by the maximization algorithm of Nelder and Mead that has been discussed previously.
56
Table 3.9: Optimization Information for Different Distributional Assumptions. Distribution Parameter Estimates R2 Optimal x P(t > 0.1) Truth 0.5510 0.6952 Weibull & 2.161  1.711x + 1.127x2 0.7839 0.2819 0.6446
/3 = 0.719  0.095x  0.252x2 0.9995
Normal f = 1.108 + 0.734x  0.164x2 0.6660 1.0000 0.5194
a 1.678 + 1.035x  0.125x2 0.5805
Lognormal 4 1.189 + 0.346x  1.178x2 0.5688 0.0278 0.5460
= 1.534 + 0.709x + 1.27391 0.9456
" Weibull: The estimated parameters, & and /3, are separately fit to second order
models in x. It is then possible to calculate an estimate of the probability that the time until failure is greater than 0.1, assuming that t is Weibull distributed
with parameters &(x) and /3(x)
P(t > 0.1)= &(x)/3(x)(&(x)t),(8le(a()t) ()dt.
We then find the location of x in [1, 1] that maximizes the estimate of P(t > 0.1). Table 3.9 gives the individual fitted equations, as well as the R2 of the fit, the location of the optimal x, and the true P(t > 0.1) for that x. Note that since we know that the data were generated from a Weibull distribution with the functional form of a and /3 given above, we are able to calculate the true
P(t > 0.1).
* Normal: The estimated parameters, " and a, are separately fit to second order
models in x. It is then possible to calculate an estimate of the probability that time until failure is greater than 0.1, assuming that t is normally distributed
with parameters 4(x) and &r(x)
P(t > 0.1)= (2ir82(x))l/2exp((t ft(x))2/(23r2(x)))dt.
57
We then find the location of x in [1, 1] that maximizes the estimate of P(t > 0.1). Table 3.9 again gives the pertinent information. Note that the probability
given is the true probability for the x given.
* Lognormal: In many situations when heavily skewed data is encountered, the
experimenter will try to transform the data to make it more normal and then analyze the transformed data as though it were normal. In this case, a logarithm transformation would be a good choice. For the lognormal line of Table 3.9, all of the original times to failure are transformed, then everything procedes as with the normal case above. The means and standard deviations of the transformed data are not given in Table 3.8, but they are used to fit second order equations in x, and thus find the optimal x as in the normal case above. Table 3.9 again gives the pertinent information, and the probability is the true probability for
that x.
As the table shows, it is possible to find an optimal setting for x and not have to rely on the assumption of normality. Furthermore, the solution assuming normality is far from optimal, even suggesting that the optimal setting is not within the experimental region. Taking the logarithm transformation does not seem to help very much in this situation, though it is an improvement over the assumption of normality. The Weibull solution is better than the normal or lognormal solution, but it is a fair distance from the true optimum solution. I believe that this is due more to a lack of repeatability with any optimization situation rather than something that is unique to this situation.
It should be noted that I had a very difficult time coming up with data that would show a difference in optima when assuming a normal distribution rather than a Weibull distribution. Initially I thought that I could use any randomly generated set of data, but I instead found that the data had to be very heavily skewed before any difference could be seen. Perhaps I could have more easily arrived at a suitable
58
set of data, but it seemed to me that this method is robust to the assumption of normality. In any case, the practitioner would still want to use the correct distributional assumption in order to get the proper estimate of the probability of being acceptable.
CHAPTER 4
MULTIPLE RESPONSE OPTIMIZATION 4.1 A Description of the Proposed Method
As with the other methods that have been proposed for multiple response optimization, this method begins with an experiment capable of fitting a second degree polynomial in the p control factors x = (x1,...I, xP)', such as a central composite design, face centered cube, or a 3P factorial. At each of the n design points, the r responses are observed as yij, with i = 1, ..., r and j = 1,..., n. Each of the responses is then individually fit to a full second order response surface model, given by
=/=3A0 +ï¿½E /Aj~xj+ z /ijkXjsck,
j=1 j=1 k>j
where the , are the least squares estimates of the true regression parameters. This can also be written as
fli(X)  X )3ji , .,r,
where X is the design matrix of control variables, their powers, and their crossproducts, and /i is a vector of the least squares estimate of the ith set of regression coefficients. An unbiased estimator (Gnanadesikan 1977) of the variancecovariance matrix is given by E= Y[In X(X'X)X']Y/(n q),
where I. is a nxn identity matrix, q is the number of terms in the full model, and Y is the rxm matrix formed by the vectors of the actual responses at the n experimental
59
60
settings. Note that E is assumed to be constant over the experimental region, so E does not depend on x. All of the proposed methods are essentially the same up until this point. Now a function of fitted responses needs to be maximized or minimized. The functions that have been proposed include the desirability function of Derringer and Suich, the distance measure of Khuri and Conlon, a weighted loss function of Ames, et al., and the mean squared error of Vining.
The goal of optimization in the case of a single response is to maximize the probability that the response of interest will be within its specification limits. This is the same goal when there is more than one response of interest, namely maximize the probability that all of the responses are simultaneously within their specification limits. Therefore the approach taken for multiple response optimization will be a multivariate generalization of the previous approach for a single response. The actual optimization is done by finding the combination of control settings x that maximizes the probability that all of the responses yi are within their specification limits, yT" and YT'x, under the assumption that the response vector y is multivariate normal with mean vector having elements equal to ypi(x) and variancecovariance matrix E. The quantity to be maximized is
S y1
max max
f ... (27r)r/21El1/2exp((y y(x))'El(y y(x)))dyl...dy,
where some of the maxima or minima may be ï¿½oo. If a response is to be maximized or minimized ('larger is better' or 'smaller is better') then the appropriate upper or lower limit would be +Fac or oc, respectively.
The proposed probability method has many advantages over the other
methods that have been proposed for multiple response optimization. One of the issues that was raised with many of the other methods had to do with targets that had to be specified in the cases where the response was to be maximized or
61
minimized. In these cases, an artificial target was specified that was based on either knowledge of the situation, the absolute maximum or minimum that could be attained in the region of interest, or some other unspecified method. The only parameters that need to be specified for the probability method are the upper and lower limits on each of the responses, which are often known. When the response is to be maximized or minimized, the upper or lower limit that is not clearly known is merely set to +oo or oc, respectively. The probability method is not trying to reach an artificial target in these situations, rather it is trying to get the distribution as far from the specification limits as is reasonable. The idea of trying to keep the response away from its specification limits, rather than get it close to a target, is the main philosophical difference between the probability method and the other methods that have been proposed. For instance, Figure 4.1 shows a situation of two responses which are to be maximized, and some of the contours of constant loss from not being at the targets. Note that points A and B have the same loss, though point A is out of specification for yi.
A concern with the desirability method of Derringer and Suich is that there may be a number of shape parameters that need to be specified in order to perform the optimization. In the absence of any other information, these shape parameters are typically set at 1. These shape parameters can, however, be used to give more or less weight to certain responses, or to influence a certain response to be higher or lower due to economic situations. Even with good knowledge of the engineering and economic situations, it may be difficult to specify the shape parameters that will reflect the true situation that is desired. A similar concern arises with the weight parameters that need to be specified with the weighted loss function approach of Ames, et al. Similarly, the need to specify a cost matrix is a concern with the mean squared error method of Vining. The probability method avoids the need to specify shape, weight, or cost parameters by relying on the upper and lower specifications
In Spec
Y2 TargetMin
T Y1 Target
Figure 4.1: Contours of Constant Loss.
62
Out of Spec
B
MI Min
i i
63
for the individual responses. Obviously, if the shape, weight, or cost parameters are easier to specify than the specification limits, then the probability method may not be the best method of optimization. In many industrial situations, however, the specifications are clearly defined, whereas the shape, weight, and cost parameters are not.
The proposed method does utilize the fact that the responses may be
correlated, which is not the case for the methods of Derringer and Suich or Ames, et al. As the number of quality characteristics that are monitored increases, the chances for some of them to be highly correlated increases. Ignoring possible correlations may lead to erroneous conclusions, where highly correlated characteristics unfairly weight one area of the experimental region over another.
There are areas where the probability method proposed here may not be the best approach to multiple response optimization. One situation, discussed earlier, deals with economic tradeoffs that may need to be dealt with. For instance, if a certain response can be inexpensively reworked if it falls above the upper specification limit, but must be scrapped if it falls below the lower specification limit, there is a definite economic advantage to keeping the response toward the higher end of the specification range. The probability method is not currently able to deal adequately with such a situation, though it may be possible in the future. Another situation that the probability method may not be suited for is encountered often in the food industry, where there are several responses that are measured, but only a few are critical to the success of the product. The probability method will treat all responses equally, and there is currently no methodology to adapt this to a situation where the responses have different weights.
4.2 Tire Tread Compound Example
Derringer and Suich present an example where a tire tread compound is to be optimized. The experiment is to determine the optimal setting of three ingredients,
64
Table 4.1: Experimental Design.
Compound No. x1 X2 X3 Yi Y2 Y3 Y4
1 1 1 +1 102 900 470 67.5
2 +1 1 1 120 860 410 65
3 1 +1 1 117 800 570 77.5
4 +1 +1 +1 198 2294 240 74.5 5 1 1 1 103 490 640 62.5
6 +1 1 +1 132 1289 270 67 7 1 +1 +1 132 1270 410 78 8 +1 +1 1 139 1090 380 70 9 1.633 0 0 102 770 590 76 10 +1.633 0 0 154 1690 260 70
11 0 1.633 0 96 700 520 63 12 0 1.633 0 163 1540 380 75 13 0 0 1.633 116 2184 520 65 14 0 0 +1.633 153 1784 290 71
15 0 0 0 133 1300 380 70
16 0 0 0 133 1300 380 68.5
17 0 0 0 140 1145 430 68 18 0 0 0 142 1090 430 68 19 0 0 0 145 1260 390 69
20 0 0 0 142 1344 390 70
hydrated silica x1, silane coupling agent x2, be optimized, and their specifications are:
PICO Abrasion Index, Yi 120 < 200% Modulus, Y2 1000 < Elongation at Break, y3 400 < Hardness, y4 60 <
and sulfur level x3. The properties to
Yi
Y2 Y3
Y4
A central composite design with 20 runs is used
< 600 < 75 for the experiment. The central
composite design consists of a full 23 factorial, with settings coded as ï¿½1, axial points coded as ï¿½1.633, and six center points, coded as 0. The settings and responses for the experiment are in Table 4.1. The data was fit to a full second order response surface polynomial, and the estimates of the coefficients are in Table 4.2.
65
Table 4.2: Regression Coefficients and Standard Errors for Responses.
Yl Y2 Y3 Y4
/3o 139.12 1261.11 400.38 68.91
/31 16.49 268.15 99.67 1.41
/2 17.88 246.50 31.40 4.32
03 10.91 139.48 73.92 1.63
Oil 4.01 83.55 7.93 1.56
1322 3.45 124.79 17.31 0.06
/333 1.57 199.17 0.43 0.32
/312 5.13 69.38 8.75 1.63
/13 7.13 94.13 6.25 0.13
/323 7.88 104.38 1.25 0.25
Std. Error 5.61 328.69 20.55 1.27
A comparison of all of the proposed methods will be done using the
regression equations given in Table 4.2, and limits given by Derringer and Suich. The optimization will be performed within a sphere of radius 1.633. Additional parameters need to be specified for some of the methods. The Derringer and Suich method uses artificial upper limits of 170 and 1300 for the first two responses, respectively. All of the shape parameters are set to 1. The Khuri and Conlon method would probably use 192 and 2271 as targets for the first two responses, these being the maximum values that can be attained by these responses within the experimental region. Ames et al. use 120 and 1000 as targets for the first two responses, though these targets are probably ill conceived, as they are the lower specification limits on the responses. A better choice would have been the upper limits specified by Derringer and Suich. The cost matrix used for Vining's method is C  E1 which is based on a multivariate control statistic. The results are in Table 4.3 for the proposed probability method, the desirability method of Derringer and Suich, the distance method of Khuri and Conlon, the weighted loss method of Ames et al., arid the mean square error (MSE) method of Vining. The probability
66
Table 4.3: Comparison of Optimization Methods.
Method Proposed Desirability Distance Wgtd Loss MSE Location x, 0.329 0.050 0.534 0.461 0.073
X2 0.863 0.145 1.536 0.283 0.408 X3 1.244 0.868 0.148 0.528 0.549
Response Y1 131.1 129.4 166.3 122.7 138.7
Y2 1464 1300 1474 1069 1318 Y3 445.4 465.7 359.4 500.3 423.6 Y4 69.6 68.0 73.8 67.5 69.6
Probability 0.886 0.781 0.020 0.403 0.719
listed is the probability that all of the responses are within their specification, or
P(120 < yi; 1000 < Y2; 400
J2= [0 f400 j(2r) 211/2exp((y y(x))'  (y y(x)))dy,
assuming that the response vector y is multivariate normal with mean vector y(x) and variancecovariance matrix E. The Fortran program that calculated this probability is contained in Appendix B. The estimate of the variancecovariance matrix is
31.49 34.78 3.14 1.13 34.78 108039.33 1489.08 30.36
3.14 1489.08 422.27 1.36 1.13 30.36 1.36 1.61
As the table shows, the proposed probability method outperforms the other methods when the probability of being within specification is compared. The probability of 0.886 is much larger than Derringer and Suich's 0.781, and Vining's
0.719. The other two methods perform rather poorly in comparison, with Khuri and Conlon predicting a probability of 0.020, and Ames et al. predicting 0.403. Derringer and Suich's method seems to suffer from the choice of the artificial upper limit on y2. They choose an upper limit of 1300, above which the product is not
67
thought to have any added value, and their optimal solution does indeed hit 1300. This level, however, is only 300 above the lower limit of 1000, which is less than the standard error of 327. The probability method, on the other hand, has this response at 1464, or roughly one and a half times the standard error. Vining's MSE method uses the same artificial upper limit of 1300 for y2 and suffers similarly for it. The weighted loss method of Ames et al. used a target of 1000 for Y2, which was not a good choice. The resulting optimum has y2  1069, which is barely inside the specification limit of 1000. The Khuri and Conlon method suffers the most by ignoring any of the specifications and relying solely on targets. Their target for yi is 192, and their resulting optimum has Y = 166.3, which is 46.3 above the lower limit of 120. In terms of the standard error, 46.3 translates to 8.25 times the standard error of 5.61, clearly much farther away from the specification limit than it needs to be. For comparison, the proposed probability method has y  131.1, or approximately 2 times the standard error above the specification limit of 120.
In the above comparison, Vining's MSE method suffers greatly due to the choice of 1300 as an upper limit for Y2. Since Vining did not use this example in his paper, it is not clear whether he would have selected 1300 as an appropriate upper limit, since much of the experimental data is at or above this level. In order to investigate the effect of this choice, the analysis is done for several choices of an upper limit, with the results being given in Table 4.4. The final upper limit used is slightly larger than the largest value of Y2 observed in the experiment. It is obvious that a higher upper limit for Y2 improves the solution in terms of the probability of being within specification, but it is this dependence on the choice of an upper limit that causes problems for this method. A lower specification limit is given for yi and y2, and each response should be as large as possible. Trying to define what an appropriate upper level should be can be quite difficult. It is assumed that Derringer and Suich used upper limits based on their knowledge of the process and
68
Table 4.4: MSE Method for Different Choices of Y2's Upper Limit.
Limit (Xl,x2,x3) (Yl,Y2,Y3,Y4) MSE Prob 1400 (0.106, 0.447,0.587) (138.9, 1331, 422.6, 69.6) 53.8 0.7180 1600 (0.110, 0.446,0.616) (138.5, 1333, 424.3, 69.6) 55.6 0.7351 1800 (0.114, 0.445,0.647) (138.0, 1335, 426.2, 69.5) 58.2 0.7520 2000 (0.118, 0.442,0.679) (137.4, 1338, 428.2, 69.4) 61.5 0.7684 2200 (0.122, 0.438,0.712) (136.9, 1341, 430.3, 69.3) 65.5 0.7842 2400 (0.125, 0.434,0.747) (136.3, 1345, 432.7, 69.2) 70.2 0.7990
product that they were experimenting with. If there is not enough information to set an appropriate upper limit, other methods may be attempted. The method proposed by Khuri and Conlon picked the largest fitted value that could be achieved in the experimental region. As the above example shows, such a choice may result in a solution that has one of the responses well outside of its specification limits. A similar situation can exist with the MSE approach. Setting the upper limit for Y2 to 2400 should imply that the upper limit for yi should be set at 200, since both values are slightly larger than the largest value observed in the experiment. These choices for the upper limit will give a result that is even worse than that of Khuri and Conlon. The maximum MSE is achieved at x = (0.611, 1.514,0.009), which gives a response vector y= (171.8, 1548, 342.1, 73.8) which results in a probability of meeting all specifications of 0.002. Problems with the definition of an appropriate upper limit for 'larger is better' situations illustrates the advantage of using the proposed probability method. The probability method is not trying to get the response as close to some chosen target, but is trying to get it as far as possible from the specification limit.
4.3 Repeatability of a Multiresponse Optimization Experiment
One issue that needs to be dealt with in optimization experiments is the question of how close the final optimal solution is to the true optimal solution. One way to look at this question is to define the true functional relationship of the
69
response variables to the control variables, as well as the true variability associated with the responses, and repeatedly rerun the experiment using randomly generated data. If the experiment is simulated a large number of times, then some information about the closeness of the estimated final solution to the true solution can be measured.
A simulation was carried out starting with the Derringer and Suich data set. The true functional relationship is defined as yj = X/, + i, i = 1, 2,3,4,
where the elements of 0 are the regression coefficients from the fit of the Derringer and Suich data, whose estimates are listed in Table 4.2, and qi is normal with mean 0 and standard deviation equal to the standard error from the fit of the Derringer and Suich data, also listed in Table 4.2. Given this information, the 20 run central composite design can be repeatedly simulated, the model fit, and the optimal solutions found for both the Derringer and Suich method and the currently proposed probability method. This was done 5000 times with the results shown in Figures 4.2 and 4.3. Figure 4.2 gives the distribution of each element of the deviation x  xOpt = (xl  0.329, x2  0.863, x3 + 1.244), where xopt is the optimal solution found by the probability method. Figure 4.3 gives the Euclidean distance between each solution x and the ideal maximum, as well as the distribution of the true probabilities found for each method. Table 4.5 gives the statistical information contained in the figures. It should be noted that the simulation was done on a Gateway personal computer with a 333 MHz Pentium processor in Microsoft Fortran Powerstation and took appoximately 16 hours to run. The Fortran program is listed in Appendix B.
The interpretation of the graphs is somewhat difficult, as the desirability graphs for the individual elements of the x vector from Derringer and Suich's
8
8
8
0.5 0.0 0.5 1.0
xl deviaion probability
0.5 0.0 0.5
x2 deviabon probability
1.0 1.5
S... .. a..a
2.5 2.0 1.5 1.0 0.5 0.0 0.5
x3 deviabon probability
0.5 0.0 0.5 1.0 xl deviation desirability
0.5 0.0 0.5 1.0 1.5 x2 deviation dasirabikly
8
8
ï¿½o ...I I n,
2.5 2.0 1.5 1.0 0.5 0.0 0.5
x3 deviation desirability
Figure 4.2: Distance from True Optimal x for Probability and Desirability Methods.
70
8.
OD
8
04
8
8
8
o
CM
I. I . .I I
0
0
0
0
0.0 1.0 2.0
Distance for Probability
0
0
0
CO
0.0 1.0 2.0
Distance for Desirability
0
0
LO
LO
0
o
0
0
0
0
0
Cl
! I i i i
0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 True Probability for Probability True Probability for Desirability
Figure 4.3: Euclidean Distance from True Optimum and Probability for Probability and Desirability Methods.
71
72
Table 4.5: Repeatability of Optimization Methods.
Probability Desirability
Mean Std. Dev. Mean Std. Dev. x1  X17opt 0.105 0.224 0.336 0.170 X2  X2,opt 0.177 0.354 0.586 0.270 X3  X3,opt 0.169 0.317 0.301 0.253 Distance from Optimum 0.497 0.330 0.357 0.250 Probability 0.810 0.079 0.768 0.076
method are going to deviate more from the optimal solution Xopt since they are being compared to a solution from the proposed probability method. For this reason, the central location of those graphs are further from 0.0 than the same graphs for the probability method. The variability of both methods are roughly equivalent, which points out that the precision of the two methods are approximately the same. The variability associated with either method is larger than what would be hoped for in an optimization experiment. The distance graphs give a similar interpretation, with variability being high for both methods, and the probability method being closer to
0.0 on the whole. The true probability graphs show that the probability method gives an optimal solution that has a much better probability of being within all specifications than the solution provided by the desirability method of Derringer and Suich.
4.4 Albumin Nanospheres Example
Muller et al. (1996) describe an experiment to optimize the manufacturing process of albumin nanospheres. Albumin particles have been used as drug carrier systems, and, if small enough, they may be utilized as a sitespecific drug carrier. This sitespecific manner of drug delivery could eliminate possible severe adverse drug reactions due to systemic application. The three responses of interest are the yield, yi, particle size, Y2, and polydispersity index, y3. The yield is calculated as the weight of the dried microspheres in relation to the sum of the starting material, and
73
Table 4.6: Control Factors and Levels.
Factor 1.664 1 0 +1 +1.664
Albumin (% w/v) 5 11 20 29 35
Aqueous phase volume (% v/v) 1.3 3.4 6.5 9.6 11.7
Duration of Emulsification (min) 5 9 15 21 25 Glutaraldehyde (mmol) 0.2 2.8 6.6 10.5 13 Drug (mg) 0 10 25 40 50
should be made as large as possible. The particle size is obtained from a scanning electron microscope, and is a measure of the mean diameter of the microspheres, and should be made as small as possible. The polydispersity index is a measure of the width of the size distribution, and should be made as small as possible.
A fivefactor orthogonal central composite design was run to find the optimal settings of the control factors that will produce the most desirable nanospheres. The five factors that are thought to affect the production are albumin concentration, X1, aqueous phase volume, x2, duration of emulsification, x3, glutaraldehyde concentration, x4, and amount of drug, x5. The central composite design consists of a 251 fractional factorial, axial points, and three center point replicates. The coded and actual levels of the factors are listed in Table 4.6.
The experimenters took a desirability function approach to finding the
optimal settings, in the manner of Derringer and Suich. The limits for the responses of interest are given in Table 4.7. The original analysis in the paper does not adhere to the exact desirability approach presented by Derringer and Suich in their original paper. The purpose of the experiment was to obtain information about the process in order to design a new process which would result in smaller nanospheres with a smaller polydispersity. For this reason, they do not actually recommend an optimal setting of the control factors, they merely interpret the response surface graphs that they generated, in order to assist in the design of the new process. The data from this experiment will be used to compare the method that has been proposed here
74
Table 4.7: Response Variables and Limits. Variable Lower Limit, yT Upper Limit, yax Goal
Yield (% w/w) 50 100 Maximize Particle Size (nm) 0 500 Minimize Polydispersity Index 0 0.2 Minimize
with the desirability method of Derringer and Suich. In order to compare the two methods, the data will be used to fit the individual responses to full second order polynomial models in the control settings. These models will be used to generate the optimal settings for the two methods. In addition, the probability approach will need to estimate the variancecovariance matrix of the responses.
The data from the experiment can be found in Table 4.8. After an initial attempt to analyze the data as it is, it was determined that the models for size Y2 and polydispersity Y3 would be better fit after a log transform. The naive approach of analyzing the data without a transformation led to estimates of the responses at the optimal settings that were negative. Clearly a negative size and a negative measure of dispersion were undesirable. The need to transform these data should not be surprising since both of the distributions of the responses are quite skewed. For simplicity, all of the responses were fit to full second order polynomials, given by
5 5 5
A30 + S /3 Xj +55 fiJkXjXk,
j=1 j=1 k>j
for i = 1, 2, 3, where the / are the least squares estimates of the true regression parameters. The fitted parameters for the responses are in Table 4.9.
The two methods used to find the optimal settings of the control factors are the probability method and the desirability method. The probability method seeks the location x that maximizes the probability P(Y1 > 50; y2 < 500; Y3 < 0.2), where the transformed response vector (yi, lu(y2), ln(y3))' is assumed to be multivariate normal with mean vector given by (di1, ln(y2), ln(y3))' and variancecovariance matrix
75
Table 4.8: Experimental Data.
x1 X2 X3 X4 X5 Yi Y2 Y3
1 1 1 1 +1 78 222 0.077 +1 1 1 1 1 78 187 0.083
1 +1 1 1 1 12 1186 0.586 +1 +1 1 1 +1 89 286 0.274
1 1 +1 1 1 83 323 0.232 +1 1 +1 1 +1 80 182 0.112
1 +1 +1 1 +1 8 1106 0.884 +1 +1 +1 1 1 93 321 0.313
1 1 1 ï¿½1 1 91 277 0.178 +1 1 1 ï¿½1 +1 69 263 0.273
1 +1 1 +1 +1 15 1781 0.878 +1 +1 1 +1 1 81 296 0.294
1 1 +1 +1 +1 89 324 0.267 +1 1 +1 +1 1 70 197 0.110
1 +1 +1 +1 1 75 228 0.089 +1 +1 +1 +1 +1 11 708 0.664
1.664 0 0 0 0 90 330 0.350 +1.664 0 0 0 0 93 372 0.394
0 1.664 0 0 0 76 162 0.107 0 +1.664 0 0 0 7 1486 0.718 0 0 1.664 0 0 83 2024 0.498 0 0 +1.664 0 0 99 381 0.394 0 0 0 1.664 0 68 234 0.093 0 0 0 +1.664 0 81 215 0.039 0 0 0 0 1.664 73 220 0.126 0 0 0 0 +1.664 84 220 0.093 0 0 0 0 0 97 218 0.101 0 0 0 0 0 82 221 0.073 0 0 0 0 0 74 217 0.083
76
Table 4.9: Analysis of Regression.
yi ln(y2) ln(y3)
0o 86.254 5.589 2.260 031 5.803 0.205 0.083 132 17.124 0.487 0.517 133 1.050 0.172 0.020 ,04 0.076 0.007 0.169 035 5.836 0.129 0.157 Oil 1.428 0.049 0.409 122 16.630 0.170 0.303 033 1.248 0.081 0.472 044 4.711 0.112 0.248 055 3.267 0.119 0.037 1312 13.000 0.127 0.014 1313 7.625 0.142 0.044 114 12.375 0.137 0.182 1323 1.000 0.085 0.114 1324 1.250 0.062 0.201 1325 8.250 0.169 0.179 034 1.125 0.107 0.245 35 7.625 0.113 0.157 45 7.625 0.248 0.309 R2 0.939 0.870 0.931 Adj R2 0.785 0.544 0.757 v/MSE 13.311 0.494 0.428
77
given by
177.17 3.67 3.13 E 3.67 0.24 0.131ï¿½
3.13 0.13 0.18 The desirability method seeks the location x which maximizes an overall desirability function D  (dld2d3)1/3, where dl, the desirability of the first response, is defined by:
150 .if 50<1 <100 100 50 di 0 if Yj < 50 I if YJ > 100 and d2 and d3, the desirabilities of the second and third responses, respectively, are defined by:
Y a if 0< < ymax di 0 if n >yax
1 if Yj < 0
where the yfax = 500 and ymax = 0.2.
The optimal settings given by each method are given in Table 4.10. The probability listed is the probability for the given location x, P(yl > 50; Y2 < 500; y3 < 0.2). The probability method results in a solution that predicts a smaller size and polydispersity than the desirability method, which is the result that was sought by the optimization experiment. The yield that is predicted is also smaller, though, which is not what is desired. The lower yield is still well above the lower limit of 50 that was given. Since the standard error for yield is 13.39, the predicted yield of 75.79% is approximately two standard deviations above the lower limit of 50.
78
Table 4.10: Optimal Settings.
Probability Desirability X1 0.542 0.059 X2 0.533 0.085 X3 0.257 0.095 X4 1.42 0.008 X5 0.326 0.083 75.79 83.79
ln(y2) 5.09 5.64 ln(y3) 2.83 2.19 Y2 161.60 282.14 YJ3 0.059 0.112 Probability 0.9303 0.7206
It should be noted that the assumption of normality is used to greatly ease the calculations that have been made. In the above example, two responses were transformed to make the assumptions of normality more realistic. If one were ambitious enough, a different distribution could be assumed for the responses, instead of using a transformation. For instance, the above approach used a log transform, assumed that the transformed response was normal, whose mean depended on the control variables, and the variance was constant over the experimental area. Instead of this, the experimenter could have assumed a Weibull distribution, with the scale parameter depending on the control variables, and the shape parameter being constant. The optimization would still be to maximize the probability of being within specifications, but the distributional assumptions would be different. The probability of being within specifications is a multiple integral that would have to be calculated, but it could be performed, if desired.
CHAPTER 5
MULTIPLE RESPONSE ROBUST PARAMETER DESIGN
5.1 A Description of the Proposed Method
In situations where there are multiple responses and the variance of one or more of these responses cannot be assumed to be constant over the experimental region, the same method that has been used in other scenarios may be used, i.e. maximize the probability that the responses are within their specified limits under the assumption of normality. This method is an extension of the multiple response optimization, as well as the robust parameter design. Since the variance cannot be assumed to be constant over the experimental region, both the response and the standard deviation will be modeled from the experimental data for each of the responses of interest. These models will then be used to estimate the probability that the responses are within their respective limits. The final optimal setting of the control factors x will be that combination that maximizes the probability of meeting all specifications.
Let xi be a pxl vector representing the ith setting of the control variables and Yij be the Jth response of interest at xi, for i= 1,..., n and j = 1,... , r. Suppose that an appropriate linear model is
Yij f f'(x)'3j + E , (5.1) where f(xi) is a qmxl vector representing the appropriate polynomial expansion of xi for the assumed model for the responses, and fj is a qmxl vector of unknown coefficients for the jth response, and the Eij are normally distributed with mean 0 and variance crj. Furthermore, let the errors of the jth response be independent, or let
79
80
cij be independent of ciij, i / i'. In other words, the errors for a given response are assumed to be independent, but correlation exists between the different responses. Consider an appropriate transformation of the variance h(Tij)  ', with a linear predictor defined by
= (5.2)
where g(xi) is a qxl vector representing the appropriate polynomial expansion of xi for the assumed model for the function of the variance, and y,j is a qxl vector of unknown coefficients for the jth function of the variance. Common transformations of the variance include the log transformation (Tij= = log(aj)) or the square root transformation (7ij  Oij).
The first step of the optimization process will be selecting an appropriate experimental design that will allow for the estimation of the models for both the average response and the variance of the response (or an appropriate function of the variance). The model for the mean response is usually assumed to be a second order polynomial, so a response surface design is typically used. Since the variance will also need to be modeled, some replication of this design will be necessary in order to estimate the variance at different points in the experimental design. In addition to the average response and the variance, some estimate of the correlation of the responses will be necessary. With all of this estimation needed, the experimental design may need to be very large, which could make the experiment expensive and difficult to carry out.
A central composite design with some replication will allow estimation of the functions of the response of interest and the variance, but the correlation structure will still need to be estimated. If the correlation structure is assumed to be constant over the experimental region, the addition of replicated center points could be used
81
to estimate the correlation of the responses of interest, given by
nc
Sjj (Y J )(Ykj'
k1
where n, is the number of times the center point has been replicated, and 90j and soj are the mean and the standard deviation of the jth response at the center point. Though the assumption of a constant correlation structure simplifies the optimization procedure, it may be a valid assumption in many cases, especially where physical characteristics are the responses of interest and their correlation is inherent to the product. Another possible way to estimate the correlation structure is to use the replicated points to form a pooled estimate of the correlation. In this case the correlation can be based on Y/8[I X(X'X)X']Y, divided by the appropriate number of degrees of freedom, where Y8c is the matrix of responses, scaled by the estimated standard deviation, or yjjsij. This estimate depends on the assumed model, thus any model misspecification could lead to a poor estimate of the correlation coefficients.
Vining and Schaub (1996) give some recommendations for experimental
designs that will be able to estimate the mean and the variance of a response and will still be small enough to be practical to run. One situation of interest is when the response of interest is assumed to be a second order response surface model of the control variables, and the variance (or appropriate function of the variance) is assumed to be a linear function of the control variables. In this case they suggest that a central composite design be used as the basis for the experiment. The full design will be used to estimate the second order response surface of the response of interest. In order to estimate the linear function of the variance, the factorial points of the central composite design will be replicated, which will allow for estimates of the variance at these points. The factorial portion of the design will be used to estimate the linear function of the variance.
82
Suppose that an appropriate experiment has been run. Let yj be the nxl vector of the jth response, and let rj be the n,xl vector of the jth linear predictor. Here n, represents the number of distinct design points that have been replicated. From equation (5.1), we obtain
yjm= Xfj + Ej
where X' [f(x1),... , f(x, )]. Similarly, from equation (5.2), we obtain irj =Z"yi3
where Z'= [g(x),..., g(x,)]. At this point, the least squares estimates could be used to obtain estimates of the unknown parameters 3ij and yj for j = 1,... , r.
It is now possible to estimate, for a given setting of the control factors x, the mean of each of the responses and a function of the variance for each of the responses. These are given by jj(x) = f(x)3 and ^j(x) = g(x)^j for j = 1,..., r, where j (X'X)X'yj and = (ZIZ)Z'rj. In addition, the variancecovariance matrix of the responses can be estimated by E(x), whose ill diagonal element is hQ?), and the ijth offdiagonal element is p ij\Ih( ^)h(T^). The optimal setting of the control factors is the combination x that maximizes the probability that all of the responses y are within their specified limits, or
m zn max m i max . m in < r < max)
P(y1
The probability is calculated by assuming that the response vector y is multivariate normal with mean vector y,(x) and variancecovariance matrix ](x). For responses that are to maximized, the upper limit is set equal to +oo, and for responses that are to be minimized, the lower limit is set to oo.
This method of maximizing the probability of being within specification has the appeal that it is an extension of the methods proposed in previous chapters.
83
This will allow an experimenter to proceed in a similar fashion for any optimization problem. Each experimental situation will be slightly different with regards to the choice of experimental design, methods for estimating the parameters of interest, and the models that are assumed to be correct, but the optimal setting of the control factors x will be found in the same way, i.e. maximizing a probability.
Pignatiello is the only other person to propose a method of optimization for the situation of multiple responses whose variance cannot be assumed constant. He proposes replicating every point in an experimental design and minimizing an estimate of the mean squared error, given by
R(x) = trace(CE(x)) + (y(x)  ytarget)C(y(x)  Ytarget),
where C is a cost matrix that must be specified. This method requires that the quantityR(x) be found for each point in the experimental design, and this quantity (or a function of this quantity) is to be fit to a predictive model in the control variables x. This approach needs a separate estimate of the mean and variancecovariance matrix at each point in the experimental design. The obvious estimate of the mean is the average of the replicated runs at that point P(x). The estimate of the variancecovariance matrix at a given experimental point is given by E (x), whose ijth element is given by
nr
Z(Yik T)(Yjk  Y)/r
k=1
where nr is the number of times that the design point has been replicated.
The mean squared error method proposed by Pignatiello does have some problems that need to be addressed. The requirement that the quantity R(x) be calculated at every design point means that the entire experiment must be replicated enough times to allow for proper estimation of all of the quantities in the equation. This total replication may lead to a very large experiment. A smaller experiment
84
may be obtained by the use of partial replication, as described above. One could easily use the same estimations of the response and the variancecovariance matrix as are used for the proposed probability method, and then find the location of the control variables x that minimized the quantity R(x). This would, in essence, replace the probability function with the mean squared error function, with everything else remaining essentially the same.
A more serious problem, perhaps, is that of selecting a target in the cases of larger or smaller is better, and of selecting the values for the cost matrix. Pignatiello admits that the targets are only applicable in the case of 'target is best,' and he does not recommend using this method for the other situations. The experimenter could conceivably use artificial targets, as Derringer and Suich recommend for multiple response optimization, but the final solution could depend greatly on the choice of these targets. Though he does give recommendations for selecting the values for the cost matrix, it still may be based on subjective choices which may greatly influence the final solution. These cost quantities may be less well known than the specifications required for the product. As was seen in the case of multiple response optimization, the final solution can depend greatly on the choices of targets and cost matrix values.
The proposed probability method does have some shortcomings, also, such as ease of calculation, size of the experiment, and the ability to estimate all of the parameters. Since a computer program needs to be written, this method is not as straight forward as a method that could be done in a spreadsheet, such as Microsoft Excel. The program that was used for the example that follows is similar to that listed in Appendix B, and it should be fairly easy to modify for other applications. The potential large size of the experiment is not unique to the probability method. Any time both the mean response and the variance need to be estimated, replication of all or part of a traditional design will be needed.
85
5.2 Oxygen Plasma Anodization Example
A simulated example will be presented that deals with the frequency
adjustment of quartzaluminum oscillator crystals. The following is a hypothetical process, and the description should be treated only as motivation for the simulated experiment that follows. The crystals have aluminum plating on top of a quartz substrate. The frequency of the crystals depends on the thickness of the quartz and the total mass of the aluminum plating. The crystal is adjusted to its final frequency by means of an oxygen plasma. The crystal is placed in an oxygen rich chamber near a high voltage anode. When voltage is applied to the anode, the surrounding oxygen becomes a plasma. When the crystal is placed near the plasma, the oxygen molecules combine with the aluminum molecules to create aluminum oxide. The extra oxygen molecules increase the mass of the aluminum plating, thus reducing the frequency of the crystal.
The responses of interest in the frequency adjustment of the crystals are the total amount of frequency change that can be achieved, yi, and the final resistance of the crystal, Y2. The total amount of frequency adjustment should be as large as possible, but must be greater than 60 kHz in order for the process to run efficiently. Past experience has shown that the range of frequencies after the initial aluminum plating is approximately 60 kHz, due mostly to differences in the thickness of the quartz (see Figure 5.1). The goal of initial plating is to have the frequency of the crystal 30 kHz higher than the final target frequency, on average. If this is achieved, then the frequency of the crystals range from target to 60 kHz above target. All of the crystals are then put into the oxygen plasma chamber in order to adjust all of them to the same final frequency. Due to process variability, not all lots of crystals are targeted properly at 30 kHz above final frequency, thus some crystals are below the final frequency and some are more than 60 kHz above the final target. Those crystals that are too low in frequency must be reworked. If the process cannot
86
Too Low
Too High
Target
Frequency
Figure 5.1: Frequency Distribution of Quartz Crystals.
adjust the frequency more than 60 kHz, the crystals that are too high in frequency will be unable to be adjusted to final frequency, and must be reworked. If the range of frequency adjustment can be made greater than 60 KHz, then the frequency distribution can be shifted slightly, and the amount of crystals that need to be reworked could be reduced. It is for this reason that settings for the control variables x are sought that maximize the amount that the frequency can be adjusted. The resistance of the crystal is effected most by contaminants in earlier processing steps, and by specification must be held below 30 ohms.
The control variables that affect the responses y are the distance that the crystal is from the anode, x1, and the pressure of the oxygen rich chamber, x2. When the pressure is too low, there is not enough oxygen available to bind to the aluminum plating on the crystal, causing the total possible frequency shift to be
87
quite low. As the pressure increases, the amount of oxygen available increases, which in turn increases the amount that the frequency can be shifted. At some point, the increased pressure causes the plasma to be contained too close to the anode, which begins to reduce the amount that the frequency can be shifted. When the crystal is too far from the anode, none of the plasma is available to the aluminum, so the amount that the frequency can be adjusted is fairly small. As the crystal moves closer to the anode, it comes in contact with the plasma cloud that surrounds the anode, and thus the amount that the frequency can be adjusted increases. At some point the crystal can become too close and actually inhibit the flow of oxygen around the anode, thus reducing the plasma cloud and decreasing the amount that the frequency can be adjusted. The variability of the amount of frequency shift tends to increase as distance between the anode and the crystal decreases and the pressure increases. The resistance of the crystal can increase due to contaminants coming off of the anode and attaching to the crystal. This typically happens more at low pressures and when the distance between the anode and the crystal is small. The variability of the resistance follows the same pattern as the resistance itself, increasing as the distance and pressure decrease.
A simulated experiment is performed whose goal is to find the settings of the control factors x that maximizes the probability that the resistance is less than 30 ohms, and the amount that the frequency can be adjusted is greater than 60 kHz. The true functional relationship between the responses, their variability, and the control factors is as follows:
yi 80.0 ï¿½ 1.0x1 + O.5x2  5.Ox  3.0x ï¿½ 2.0xlx2 y2 = 20.0 + 0.5x1  0.5x2 + 3.0x ï¿½ 2.Ox + 1.0xlx2 =l 10.0  3.0x1 + 1.0x2
=2 7.0  1.0x1  0.3x2
88
Table 5.1: Experimental Data.
X1 X2 Yl Y2 81 82
1 1 69.51801 40.09109
1 1 94.16965 27.49232 19.79 13.23
1 1 55.02909 13.63547 1 1 75.88098 12.99614
1 1 69.30468 24.13184 5.48 7.43
1 1 65.00851 27.08687
1 1 84.19299 27.77813
1 1 63.19381 15.60778 12.41 6.09
1 1 62.22724 21.59617 1 1 67.22293 29.85625
1 1 73.92506 20.72596 3.76 4.84
1 1 73.53838 22.49168
1.414 0 63.58725 24.38051 1.414 0 75.03255 33.32704
0 1.414 62.53672 21.47707 0 1.414 68.58016 29.29213 0 0 83.01640 5.34768 0 0 81.90584 8.20455
0 0 94.42947 14.14651 7.67 8.28
0 0 76.35438 27.28385 0 0 93.47378 22.04167 0 0 93.47162 17.18751
The covariance between the two responses is P12 = 0.2. The setting of the control variables that maximizes the probability that both responses are acceptable is x = (0.302,0.073), which results in a probability of 0.9098, and response vector y = (79.9,20.4). The standard deviation of the responses at this setting is a, = 9.2 and c2 = 6.7.
The experiment will be a central composite design with the factorial points replicated three times and the center point replicated six times. The axial points are not replicated. Data is simulated for the experiment and is in Table 5.1. The data from the experiment is used to fit a full second order model for the responses, a first order model for the standard deviation, and the correlation between the responses. All of the data is used to separately fit a full second order model for the responses.
89
The fitted equations are:
Yl(x)= 87.1 + 0.8x, + 0.2x2 7.5x2  9.4x + 1.1x1x2 =()  15.7 + 0.2x, + 0.2X2 + 5.34 + 3.5x + 2.1x1x2
Just the five replicated points are used to fit a first order model for the standard deviation of the responses. The fitted equations are: a, (x) = 9.8  5.7x,  2.3x2
U2 (X)= 8.0  1.8x,  2.4x2
The six center points are used to estimate the correlation between the two responses. The estimated correlation is 312 = 0.02, which is assumed to be constant over the experimental region.
The optimal setting of the control variables x is the setting which maximizes P(yi > 60; Y2 < 30) in the experimental region of radius Vi2, under the assumption that y is multivariate normal with mean vector Y and variancecovariance structure a 1a(X l(X)2(X)
P12 Ol(X) 2 (X) 2(k )
The probability is calculated using the algorithm from Schervish, which has been described previously, and the maximum of the probability function is obtained from the IMSL algorithm NCONF, which finds the minimum of a function with nonlinear constraints. Based on this simulated experiment, the setting of the control variables that maximizes the probability of both of the responses being within their acceptable limits is x = (0.235, 0.555). At this location in the experimental region, the true response vector is y = (79.6, 20.8), and the probability is 0.8967. The standard deviation of the responses at this setting is o = 9.9 and u2 = 6.6. A comparison of the true solution to the experimental solution is contained in Table 5.2. Though the
90
Table 5.2: Comparison of Solutions.
True Optimum Experimental Optimum Control Setting (0.302, 0.073) (0.235, 0.555) Mean Response (79.9, 20.4) (79.6, 20.8) Standard Deviation (9.2, 6.7) (9.9, 6.6) Probability 0.9098 0.8967
location of the control settings appear to be fairly far apart, the response, standard deviation, and probability are not that different. This is because the probability of being within acceptable limits is fairly flat near the true optimal setting.
5.3 Repeatability of Anodization Example
The above analysis is a single experiment simulated from a hypothetical
process and is only meant to be a step by step guide to finding the optimal setting of control factors that results in the maximum probability of being within acceptable limits. Since this is an analysis based on a single experiment, it would be beneficial to assess the repeatability of running such an experiment. Since the experiment is simulated, it may be easily repeated. The experimental data was regenerated 1000 times, each set of data being used to estimate the mean responses, standard deviations, correlation, and optimal setting of the control factors. The results are given in Table 5.3, which lists the true probability at the setting of the control factors and the distance the setting is from its true optimal setting Xi,opt. As was the case with simple multiple response optimization, the repeatability of this simulated experiment is not particularly good. A scatter plot of the the locations of the experimental optima is in Figure 5.2, which has 50 and 90% confidence ellipses. This figure is perhaps a better indication that a single experiment may not do a good job in finding a point that is close to the true optimum. As was seen with the case of multiple response optimization, the lack of repeatability does not seem to be a problem that is isolated to one method of optimization, but seems to be a problem
91
Table 5.3: Repeatability of Experiment.
True Probability x,  xj,opt X2  X2,opt Mean 0.821 0.099 0.022 Std. Dev. 0.081 0.564 0.735 Median 0.847 0.108 0.011 75th %ile 0.891 0.533 0.467 25th %ile 0.750 0.277 0.524
that will be encountered in any optimization experiment where a relatively small amount of data is generated.
5o * l soon*.
I16
ses a... s a. a. \ , ï¿½ ,
I J l l s ,r 0, OF. ï¿½ : "
go a s IG .Ra
ï¿½ ï¿½3 ' .5m i 5. ï¿½ .
g %* % SG
go a so.. / .': . ' . . I , ., . .. ...::,.;. .. ../  _ _, :"_. ':' '4, ,, : , so
0 a a " .be% a. " .
" a. " l"" 'e ."  .a ""a .a' .
Or," son,. ' 0. .' . . . . 1 .
a"a su ,a
i m ,
xl
Figure 5.2: Location of Optimal Settings.
92
' /
X2
0
CHAPTER 6
CONCLUSION AND FURTHER RESEARCH
The probability method has been shown to work well for the situations of robust parameter design, multiple response optimization, and multiple response robust parameter design. In each case, it performs as well as or better than the methods that have been previously proposed. The analyses have also shown some serious potential problems with some of the other proposed methods. The proposed probability method is a unified and intuitive approach to the optimization problems that have been presented here.
The analysis that has been done does point to some areas in the field of optimization that still need to be addressed. Perhaps the most important area that needs to be looked at has to do with the repeatability of any optimization experiment. As has been shown, a single optimization experiment may give an optimal setting that is far from the true optimum. This is a fact that experimenters must be aware of before they begin an such an experiment. Much cost and effort could go into an experiment that returns a suggested combination of the control settings that is far from optimal. Such a situation could lead some to become skeptical of not only optimization experiments, but other statistical techniques, as well. For this reason, there needs to be more research on the repeatability of optimization experiments, and possible ways to make them more repeatable. It is unclear if there are experimental designs that would lead to more repeatable experiments, or if there are conditions on the responses that must be met in order for the repeatability to be more acceptable. There may also be other analysis techniques that could make the repeatability better, such as using prior production
93

Full Text 
PAGE 1
A UNIFIED APPROACH TO PROCESS OPTIMIZATION By PHILIP JOHN MCGOFF A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 2000
PAGE 2
This dissertation is dedicated to my family, especially my to father, who is not here to see me become the first in the family with a doctorate.
PAGE 3
ACKNOWLEDGMENTS I would like to thank my family and friends for supporting me in my return to graduate school. I would also like to thank all of my teachersfrom elementary school, high school, undergraduate, and graduate schoolfor spending time with me and challenging me to do my best. I would especially like to thank Mr. Kristo and the rest of the math department at Owatonna High School for giving me an excellent foundation in basic mathematics. Finally I would like to thank the University of Florida Department of Statistics, the faculty, staff, and fellow students, for making my stay in Gainesville an excellent experience. m
PAGE 4
TABLE OF CONTENTS gage ACKNOWLEDGMENTS iii ABSTRACT vi CHAPTERS 1 INTRODUCTION 1 2 METHODS OF OPTIMIZATION 6 2.1 The Taguchi Method 6 2.2 Dual Response Surface Optimization 11 2.3 Alternative Methods of Dual Response Surface Optimization 16 2.4 Criticisms of the Various Optimization Schemes 22 2.5 Multiple Response Optimization 25 2.6 Criticisms of Multiple Response Optimization Methods ... 34 2.7 Robust Multiple Response Optimization 36 2.8 Optimizing a Function 38 3 ROBUST PARAMETER DESIGN 41 3.1 A Description of the Proposed Method 41 3.2 Printing Process Example 47 3.3 Example with Lifetime Data 54 4 MULTIPLE RESPONSE OPTIMIZATION 59 4.1 A Description of the Proposed Method 59 4.2 Tire Tread Compound Example 63 4.3 Repeatability of a Multiresponse Optimization Experiment . 68 4.4 Albumin Nanospheres Example 72 5 MULTIPLE RESPONSE ROBUST PARAMETER DESIGN .... 79 5.1 A Description of the Proposed Method 79 5.2 Oxygen Plasma Anodization Example 85 5.3 Repeatability of Anodization Example 90 6 CONCLUSION AND FURTHER RESEARCH 93 IV
PAGE 5
APPENDICES A SOLVER COMMAND IN MICROSOFT EXCEL 95 B FORTRAN CODE FOR MULTIVARIATE NORMAL PROBABILITY 99 C BASIC OUTLINE OF AN OPTIMIZATION EXPERIMENT ... 128 REFERENCES 133 BIOGRAPHICAL SKETCH 136 V
PAGE 6
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy A UNIFIED APPROACH TO PROCESS OPTIMIZATION By Philip John McGoff May 2000 Chairman: G. Geoffrey Vining Major Department: Statistics The goal of any optimization experiment is to find the settings of the factors that can be controlled which results in optimal levels of the responses of interest. In robust parameter design, the two responses of interest are the mean and variance of a quality characteristic. In multiple response optimization, the responses of interest are the quality characteristics of the product. In both of these cases, a quantity that is a function of the estimates of the responses of interest is either maximized or minimized. A variety of quantities have been proposed for robust parameter design and multiple response optimization, but all of the proposed quantities are lacking in some respectthey may lack intuitive appeal, depend too heavily on the definition of subjective parameters, or fail altogether in certain situations. In addition, most of the quantities proposed for robust parameter design cannot be adapted easily to multiple response optimization. The probability that all of the responses are simultaneously within their upper and lower specification limits is a quantity which vi
PAGE 7
can be used for robust parameter design and multiple response optimization. The probability method also has an intuitive appeal that will make it easy to explain to people in fields outside of statistics. This method does not depend on the definition of subjective parameters, and it works in all of the situations that have been addressed. It may also be extended to multiple response robust parameter design, which none of the other methods has attempted. Vll
PAGE 8
CHAPTER 1 INTRODUCTION It is the goal of any industry to produce high quality products as inexpensively as possible. The quality of a product is usually measured by physical characteristics, such as diameter, purity, taste, or hardness. When these characteristics are at specific target levels or values, the product is thought to be of high quality. As the characteristics deviate from the target levels, the quality of the product decreases. Therefore producing a high quality product translates into producing the product with quality characteristics at specific levels. There are three possible situations for an individual characteristic: Â• Target is best: the quality characteristic has a target level that is most desirable. For instance, the diameter of a roller bearing or viscosity of a fluid. Â• Larger is better: the quality characteristic should be made as large as possible. For instance, the purity of a chemical or a carÂ’s gas mileage. Â• Smaller is better: the quality characteristic should be made as small as possible. For instance, impurities in a chemical or number of bubbles in a paint job. Optimizing a product or process is a situation found in many areas of applied statistics. A set of control factors affects a response of interest, which is to be maximized (or minimized). The goal of the experimenter is to find the combination of the control factors that does maximize (or minimize) the response of interest. Box and Wilson (1951) introduced the topic of response surface methods in an attempt to answer this problem. They recommended that a series of experiments be performed, first to find out which control factors actually affect the response of interest, and secondly to find the optimal settings of the control factors. The 1
PAGE 9
2 optimal settings are found by running a response surface experiment capable of fitting the response to a second order polynomial in the control factors. Using some basic calculus, the maximum (or minimum) of the second order polynomial can be found. The situation can become much more complicated if the response of interest does not have constant variance in the region in question, or if there is more than one response of interest. Robust parameter design deals with the situation where there is a single response that does not have a constant variance in the region of interest. Multiple response optimization deals with the situation where there is more than one response of interest. Robust Parameter Design In the manufacturing and use of the product, there will exist factors that will affect the values of the quality characteristic, not all of which can be easily controlled. Those that can be easily controlled are referred to as control factors. These could be the amount of material going into a chemical process, the temperature of a chemical bath, or the pressure in a reactor. Those factors that cannot be easily controlled, or are too costly to control, are referred to as noise factors. These could be ambient humidity and temperature, the speed at which a customer drives his car, or the temperature at which a customer bakes his cake. Optimizing the industrial process entails finding the levels of the control factors that will produce a product that has the desired quality characteristics. Due to natural variability, it is impossible to find settings that will always give the same values of the quality characteristics, and so all we ask is that it give those values a high percentage of the time on average. This variability will lead to decreased quality, as the product will not always be at its target value. One approach that has been traditionally taken is a two step process. The first step gets the characteristic at its desired level, on average. The second step is to find the sources of variability
PAGE 10
3 and either control them or eliminate them. Control or elimination of sources of variability can be difficult and/or costly. Another approach is to find settings of the control factors that make the product insensitive to the sources of variability. This is the approach used in the area of parameter design or robust parameter design. This approach has the appeal that it is typically more cost effective than controlling or trying to totally eliminate the sources of variation. If levels of the control factors can be found for which the product is insensitive to the sources of variation, and the productÂ’s quality characteristics can be made consistently close to their desired levels, this will result in a consistently high quality product. Experiments can be run in order to find the settings of the control factors that will result in a consistently high quality product. The experiment will have the quality characteristics as the responses of interest, and the control and noise factors as the variables of interest. Notice that some noise factors can be controlled during experimentation, even though they cannot be easily controlled otherwise. An example of this would be the speed a car is driven or the temperature at which a cake is baked. Even if no noise factors exist or are known, an experiment can still be run to find the settings of the control factors that produce the smallest variability while still achieving a desired level of the response. Multiple Response Optimization Most products and processes have more than one response of interest, and all of these responses may depend on a set of control factors. These responses may have either an ideal target value that is desired, or a range of values that will result in a product that is of satisfactory quality. These targets and ranges are typically given in the form of targets and upper and lower specification limits. Often the product cannot be shipped to a customer unless all of the individual responses are within their upper and lower specifications. Ideally, there would exist a combination of the
PAGE 11
4 control factors that simultaneously put all of the responses at their individual target value. This does not often happen, though, so a set of control factors must be found that somehow balances all of the responses, so that each response is somewhat close to its target. Multiple response optimization usually assumes that the variability of the individual responses does not depend on control factors. It can be easily imagined, however, that situations may exist where one or more of the responses depend on the control factors, as in robust parameter design. Experimental situations can therefore fall into three categories: Â• Robust Parameter Design: There is only one quality characteristic of interest, and its variability depends on the control factors in the experiment. Â• Multiple Response Optimization: There is more than one quality characteristic of interest, but the variability of these responses are not thought to depend on the control factors in the experiment. Â• Multiple Response Robust Parameter Design: There is more than one quality characteristic of interest, and the variability of these responses depends on the control factors in the experiment. Multiple response robust parameter design has not been dealt with much in the literature, but robust parameter design and multiple response optimization have been researched extensively. Robust parameter design and multiple response optimization have been treated as two separate problems, each with a unique method for finding the best setting of the control factors. All of the proposed methods have much in common, though. They all start with a designed experiment which is capable of supporting the fitting of the responses using polynomial models in the control settings. Some or all of the fitted responses are transformed into a univariate function, which is maximized (or minimized) over the experimental region of interest. The univariate
PAGE 12
5 functions proposed for robust parameter design cannot be easily adapted for use in multiple response optimization, and vice versa. Maximizing the probability that the responses are between their upper and lower specifications can be used for both robust parameter design and multiple response optimization. In the following chapters, I will propose that this function be used for all three of the experimental situations listed above.
PAGE 13
CHAPTER 2 METHODS OF OPTIMIZATION 2.1 The Taguchi Method Genichi Taguchi is credited with introducing robust parameter design based on his work with various industries in Japan. His method is explained in papers by Kackar (1985) and Byrne and Taguchi (1987), as well as discussed by a panel headed by Nair (1992). Explanations of this method are also included in books on response surface methodology, such as those by Myers and Montgomery (1995) or Khuri and Cornell (1996). The philosophy behind the use of robust parameter designs is that it is more cost effective to try and make a process robust to sources of variability than it is to try to control or eliminate those sources. This same philosophy can be used to improve product performance in the field. The product can be designed in such a way that its performance is not sensitive to variability that it will encounter during its lifetime. Taguchi believes that as long as a productÂ’s characteristic is not at its target level, it is not of the highest quality. This is different from the belief commonly held in some industries that as long as a productÂ’s characteristic is between its lower and upper specification limits, it is of the highest quality (see Figure 2.1). TaguchiÂ’s belief that a product can have lower quality while still being within specification is the driving force behind reducing product variability. It is noted that customer satisfaction increases as product variability decreases. Robust parameter design aims to find the optimal settings of the control parameters. In order to do this, a criterion must be specified that is to be optimized. Taguchi recommends optimizing the expected loss, defined as the expected value of 6
PAGE 14
7 Figure 2.1: Loss Functions for Taguchi and Classical View.
PAGE 15
8 monetary losses an arbitrary user of the product is likely to suffer at times during the productÂ’s life span due to performance variation. This makes the concept of optimization concrete in that it relates product performance and monetary losses. Taguchi recommends using a quadratic loss function, 1(Y) = K(Yt) 2 , where Y is the quality characteristic, r is its target value, and K is some constant. The constant K is determined by the cost to repair or replace a product if it performs unsatisfactorily. (There are slight modifications to this loss function for the cases of larger or smaller is better.) Then the criterion that is to be optimized is the expected loss, L = E[((F)] = KE\(Y t) 2 ]. Note that the final solution will not depend on the choice of K . The actual design of the experiment is a design in the control factors crossed with a design in the noise factors. If the control design consists of m combinations of the levels of the control factors and the noise design consists of n combinations of the levels of the noise factors, then the overall design will have a total of ran runs. Notice that each combination of control factors will be repeated n times, once for each combination of the noise factors. This methodology serves to induce noise at each combination of the control factors. These n runs are then used to calculate a performance statistic. There will be ra performance statistic values, one for each combination of the control factors. The performance statistic suggested by Taguchi is referred to as a signal to noise ratio, and a large value is desired. Although Kackar notes that over 60 different signal to noise ratios are used, there are three main ones that are used, depending on the type of quality characteristic. They are:
PAGE 16
9 Smaller is Better: In this case, the target value is r = 0 and the expected loss is proportional to E[(Y0) 2 ] = E[Y 2 ]. The performance measure is then estimated by n 10 log ^Vi/n , i=l where n is the number of runs among the noise factors at each combination of levels of the control factors. Larger is Better: In this case, the quality characteristic Y is transformed to 1 /Y and treated as though it were the case of smaller is better. Therefore the performance statistic becomes n 10 log 5^(1 /Vi)/n 2=1 Target is Best: Kackar lists two situations, both of which depend on the assumption that an adjustment parameter exists. An adjustment parameter affects the mean of the response, but does not affect the variance. Thus the variance can be reduced, and the mean moved to the appropriate level by means of the adjustment parameter. The situations of interest in the target is best case are when the variance is not linked to the mean and when it is. In the case that the variance is not linked to the mean, the idea is to minimize the variance, since the mean may be moved to target by means of the adjustment parameter. For this reason, the performance statistic suggested is n log(s 2 ) = log ^(Vi y) 2 /(n 1) 2=1
PAGE 17
10 where s 2 is the unbiased estimate of the variance and y is the mean response at that experimental point. When the variance is linked to the mean, and is most likely positively correlated, the coefficient of variation o / ji should be made small. When the coefficient of variation remains small, any change in the mean results in a relatively small increase in the variance. Again the target is reached through adjustment parameters. For this reason, the performance statistic suggested is 10log(y 2 /s 2 ) where y is the average of the response values. Note that this is the only performance statistic listed here that can be legitimately called a function of the signal to noise ratio. The Taguchi method starts with the identification of control factors, noise factors, and their appropriate settings. The parameter design experiment is designed and run, and the appropriate performance statistic is calculated for each combination of the control factor settings. The combination of control factor settings with the best performance statistic value is then chosen as the new setting. A confirmatory experiment is run to see if the new settings do indeed lead to an improved value of the performance statistic. In a published discussion of the Kackar paper, Box points out that the important engineering ideas of Taguchi are accompanied by statistical procedures that are often unnecessarily complicated and inefficient, and sometimes naive. He specifically mentions not exploiting the sequential nature of experimentation, limiting the choice of a design, focusing on the universal use of a signal to noise ratio, and the lack of data transformations, among others. Incorporating some of these issues and others into the methodology one uses would lead to a statistical refinement of TaguchiÂ’s excellent engineering ideas, as will be covered later.
PAGE 18
11 2.2 Dual Response Surface Optimization As previously mentioned, serious concerns were expressed about the lack of use of statistical methods in the Taguchi method. Many of these were detailed by Box in the discussion of KackarÂ’s paper. An excellent paper by Myers et al. (1992) encapsulates many statistical improvements to the Taguchi method. Taguchi recommends a design that crosses the noise design with the control design. Typically this will result in a very large design, unless they are highly fractionated designs. When the control design is highly fractionated, there is typically little or no information about interactions, some of which may be very important. TaguchiÂ’s rationale is that maximum information must be gained about the control by noise interactions and not interactions among the control variables. His designs therefore sacrifice possibly important control by control interactions at the expense of measuring possibly unimportant control by noise interactions. Myers et al. (1992) give an example of a Taguchi design with 3 control factors and 2 noise factors. The design recommended by Taguchi is an L$ for the control variables crossed with a 2 2 factorial for the noise variables. The L9 design is a three level PlackettBurman design for three factors and is given in Table 2.1. The levels of the three variables are coded by 1 for the lowest level, +1 for the highest level, and 0 for the middle level. When the design for the control variables is crossed with the design for the noise variables, the resulting experiment will consist of 9x4=36 runs. This approach gives maximal information about control by noise variable interactions, because the Taguchi method seeks to find the level of the control settings that is least affected by the noise variables. However, often many degrees of freedom are used for estimating control by noise interactions that could be better utilized estimating potentially important control by control interactions. For instance, a popular response surface design in five factors that could be used is a hybrid design consisting of a central composite design in the control variables
PAGE 19
12 Table 2.1: Lg Taguchi Design. Xi x 2 ^3 1 1 1 1 0 0 1 +1 +1 0 1 0 0 0 +1 0 +1 1 +1 1 +1 +1 0 1 +1 +1 0 xi,x 2 , and x 3 and the noise variables ^and^, which is given in Table 2.2. The first 16 runs evolve from a 2 51 fractional factorial in all five factors, plus axial runs in the three control variables with the noise variables at their midlevel, and 3 center runs among the five variables. This design has only 25 runs, compared to the 36 suggested by Taguchi, and it has the flexibility of estimating control by control interactions. The practice of putting the control and noise factors into a single experimental design was first referred to as forming a combined array by Welch et al. (1990) and Shoemaker et al. (1991). Combining all of the factors into one design allows for greater flexibility for estimating certain effects, such as important control by control interactions. Enough information will still be available for the control by noise interactions, which will lead to robustness. The combined array philosophy has been almost unanimously endorsed by the statistical community. A paper by Vining and Myers (1990) speaks to many of the other criticisms of the Taguchi method, such as the universal use of a signal to noise ratio, and limiting the recommended optimal setting to one of the combinations of the factors that was run in the experiment. They recommend using the aforementioned combined array and treating the mean and variance as separate responses. The mean and variance are each modelled with a second order polynomial and each can be optimized
PAGE 20
13 Table 2 . 2 : Central Composite Design in 5 Variables. Run X\ X2 Xz 2 1 22 1 1 1 1 1 +1 2 +1 1 1 1 1 3 1 +1 1 1 1 4 +1 +1 1 1 +1 5 1 1 +1 1 1 6 +1 1 +1 1 +1 7 1 +1 +1 1 +1 8 +1 +1 +1 1 1 9 1 1 1 +1 1 10 +1 1 1 +1 +1 11 1 +1 1 +1 +1 12 +1 +1 1 +1 1 13 1 1 +1 +1 +1 14 +1 1 +1 +1 1 15 1 +1 +1 +1 1 16 +1 +1 +1 +1 +1 17 1 0 0 0 0 18 +1 0 0 0 0 19 0 1 0 0 0 20 0 +1 0 0 0 21 0 0 1 0 0 22 0 0 +1 0 0 23 0 0 0 0 0 24 0 0 0 0 0 25 0 0 0 0 0
PAGE 21
14 separately and simultaneously. This is referred to as the dual response surface optimization approach. This approach follows the dual response approach of Myers and Carter (1973), which is a response surface technique for dealing with two responses, where generally both pertain to quality characteristics whose variances are assumed constant over the experimental region. The responses are classified as a primary response and a secondary response, where the latter will act as a constraint. The fitted primary response is given by y p = b 0 + x'b + x'Bx X and the fitted secondary response is given by y s = c o + x'c + x'Cx where b, B,c, and C are vectors and symmetric matrices whose elements are the least squares estimators of the coefficients of the linear, quadratic, and cross terms of the second order model. The object of the optimization is to find conditions on the elements of x that optimize y p subject to the constraint that y s = k , where k is some desirable or acceptable value of the secondary response. Using a Lagrangian multiplier /i, they consider L = b 0 + x'b + x'Bx Â— y(co + x'c + x'Cx Â— k ) and solve for the x that satisfies
PAGE 22
15 The solution is (B pC)x = i(/xc Â— b). Much of what Myers and Carter present deals with ensuring that a local maximum (or minimum) is actually obtained, and closely follows the ridge analysis approach developed by Hoerl (1959) and Draper (1963). The mathematical details will not be presented here. Vining and Myers recommend using the dual response approach of Myers and Carter, using the mean and variance as the two responses. The three basic situations are dealt with as follows: Â• Target is best: minimize the variance a 2 , subject to the constraint that the mean p is held at a specified target value. Here the primary response is the variance, and the secondary response , or constraint, is the mean. Â• Larger is better: maximize the mean p, subject to the constraint that the variance cr 2 is at some acceptable level. Â• Smaller is better: minimize the mean p, subject to the constraint that the variance a 2 is at some acceptable level. In this and the previous case, the primary response is the mean, and the constrained response is the variance. Myers and Carter showed that a solution can always be found if the optimization is performed on the surface of a sphere of radius p, thus giving an additional constraint of x'x = p 2 . Del Castillo and Montgomery (1993) showed that this additional constraint can be dispensed with by using a standard nonlinear programming technique called the generalized reduced gradient (GRG) algorithm, which will be described later. Vining and Myers recommend fitting the two responses y p = b 0 + x'b + x' Bx
PAGE 23
16 y s = c 0 + x'c + x'Cx as in the dual response situation of Myers and Carter. The experiment is set up using an npoint response surface design that is capable of supporting the fit of a second order model. Some designs that are presented are a central composite design or a 3 3 factorial. Details of these, and other designs can be found in books on response surface methodology, such as Myers and Montgomery (1995) or Khuri and Cornell (1996). This design is then replicated m times to obtain an estimate of the variance at each design point. Ideally this replication would be in the noise parameters, as in the Taguchi method, but any scheme with proper randomization would be adequate. The average response yi and the sample standard deviation Si are calculated for i = 1, . . . , m, and then fit to the primary and secondary response models above. The dual response system is then optimized using the GRG algorithm, which can be found in many software packages, such as Microsoft ExcelÂ’s solver command. This approach answers several of the statistical issues raised by critics of the Taguchi method. It allows for sequential experimentation, where an experiment or series of experiments is/are performed to locate a favorable region, followed by another experiment to locate the optimal settings in that area. The approach allows for a variety of designs which may include estimating interactions among the control variables. Both the mean and the variance are separately modeled, allowing for a better understanding of the process to be acquired while avoiding the use of a signal to noise ratio. 2.3 Alternative Methods of Dual Response Surface Optimization Several alternatives to the optimization methods of Vining and Myers, as augmented by Del Castillo and Montgomery, have been suggested. These deal mainly with the perceived problem that the constraints on the secondary response
PAGE 24
17 are unnecessarily restrictive, mainly in the target is best scenario. The main suggestion deals with the secondary constraint that the fitted mean be at a certain target. If some bias, defined as the distance that the fitted mean is away from the target, is allowed, new settings of the control factors may result in an even lower variance. This must be done with caution, however, as too much bias will not be good for the overall optimization of the product. Though it has not been mentioned in the literature, in the other two cases of larger or smaller is better, the constraint of setting the variance at an acceptable level may also lead to problems, which will be detailed later. All of the following methods begin the same way as Vining and Myers. A response surface experiment is designed that will allow both the mean and variance (or a function of the variance) to be modeled by a second order model in the p control variables. In most cases, the standard deviation is used rather than the variance, but another common transformation of the variance is a log transform. The transformation is merely meant to allow a better fit to the model form. The fit of the mean of the response is given by p p p Wfj,= flo + PiXi + i = 1 i = 1 j>i and the standard deviation of the response is given by p p p w a Â— 7o + 7 i x i + 7 ijX i%j i i = 1 i = 1 j>i A where the /3 and 7 are the least squares estimates of the true parameters. Once the models are fit, one or both of the response measures is to be maximized or minimized in order to determine the optimal setting of the control factors. Vining and Myers recommended using a constrained maximization (or minimization) of the two responses, as described above. Other people have recommended that different
PAGE 25
18 quantities be maximized (or minimized), or that a modified approach be taken to the constrained optimization. Lin and Tu (1995) recommend minimizing a squared error loss expression defined as MSE = Â« T ) 2 + u>l where w A and w 2 are the fitted mean and variance from the two responses surface models, respectively, and T is the target value for the mean. This allows some deviation from the target level, or bias, to enter into the proposed solution, but it should not be too large, since the overall MSE must be minimized. The maximization may also be performed using the GRG algorithm that is recommended for the method of Vining and Myers. Lin and Tu claim that the advantages of their approach over that of Vining and Myers are that it is not restricted to fitting full second order models and it may lead to a better overall solution. By not being restricted to a full second order model, the experimenter can use common regression techniques to find the simplest model for the responses. In fact, the two responses do not even have to have the same model form. In actuality, the use of the GRG algorithm for solving the method of Vining and Myers is not restricted to a full second order model either. Lin and Tu also mention that more flexibility may be gained by minimizing the quantity MSE* = X 1 (w^T) 2 + X 2 wl where A* are prespecified constants such that 0 < Aj < 1 and Ai + A 2 = 1. This would allow more freedom for the experimenter, by enabling the mean or variance to take on different weights in the final solution. Copeland and Nelson (1996) propose a slightly different formulation to the problem as part of a paper about computer algorithms for finding a maximum or
PAGE 26
19 a minimum in a region. For the target is best case, they suggest minimizing the variance subject to the bias, or distance the mean is from target T, being less than some A, which is specified by the experimenter. This can be stated as Minimize w 2 subject to (Wfj, Â— T) 2 < A 2 . This allows for the mean to deviate from the target level, but places an upper limit on the amount of deviation. For the larger or smaller cases, they suggest optimizing the mean w ^ while keeping the variance w 2 less than, rather than equal to, some level. This is really a minor modification to that of Vining and Myers, since the final solution typically has the variance at the defined limit anyway. Copeland and NelsonÂ’s method answers a concern with the method suggested by Lin and Tu, namely that the mean could deviate too far from the target level, which is contrary to TaguchiÂ’s original idea of keeping the response close to target as often as possible. By placing an upper limit on the amount that the mean can deviate from the target level, this method allows some deviation, but not too much. Their modification in the larger or smaller case is an improvement over the use of Lagrangian multipliers suggested by Vining and Myers, but is really no different than that suggested by Del Castillo and Montgomery. The linear programming solution does not explicitly use Lagrangian multipliers and is also not restricted to equality contraints. Copeland and Nelson also note that the solution usually has the variance equalling the upper limit, so little is gained by using an inequality rather than an equality. Kim and Lin (1998) propose a solution based on fuzzy optimization methodology. They map the mean and variance to membership functions, and then each of these two functions are mapped to a single value which is optimized. The membership function lies between 0 and 1, and values close to 1 are better. The
PAGE 27
20 optimal solution is the set of control settings that maximizes the minimum of the two membership functions. The general case of the membership function is given by / < V e d_ e d \z\ e d Â—l if d 7^ 0 if d = 0 where d is a shape parameter (see Figure 2.2 for a graph of the membership function for different choices of d), and where and z a are defined as / Mo w fj, Mo^ in w^/d 0 ?/) max Â—Mo V 1 wf n < < Ho Vo < < W%** Otherwise 0 w a < w min a w m^x_ w min wf n w max a The shape parameter d is a measure of how quickly the value of m(z) changes along the 0 to 1 scale as 2 moves away from 0. As d increases (from large negative to large positive), 771 ( 2 ) changes slowly as 2 moves away from 0. Note that 2 moving from 0 to 1 is the same as the response moving from the target to its upper or lower limit. In other words, if it is critical that the mean (or variance) be close to target, a large negative value of d should be used. If it is not critical that the mean (or variance) be close to target, then a large positive value of d should be used (see Figure 2.2). This gives the experimenter some flexibility to add more weight to either the mean of the response or the variance of the response.
PAGE 28
21 Figure 2.2: Membership Function for Some Choices of Shape Parameters. The optimal setting is the values of the control variables x that maximizes the minimum of the two membership functions, or it may be stated as: maximize A (2.1) subject to m(Zfj,) > A m(z a ) > A Since m(z ) is close to 1 when 2 is close to 0, this implies that the best setting will be the one that gets both m(z IJ/ ) and m(z a ) as close to 1 as possible. When this happens, both the mean w ^ and the standard deviation w a should be close to their targets, where the target for the standard deviation w a is usually 0. A brief explanation of fuzzy set theory may help motivate the method proposed by Kim and Lin. A good discussion of the subject from a statistical
PAGE 29
22 Table 2.3: Membership Values X 0 1 2 3 4 5 6 7 8 9 Set A 1 1 1 1 0 0 0 0 0 0 Set B 1 0.9 0.7 0.5 0.2 0 0 0 0 0 point of view can be found in Laviolette et al. (1995). Fuzzy set theory assumes that elements have some membership in a set. If the membership is 0, the element is never in the set. If the membership is 1, the element is always in the set. As membership increases from 0 to 1, the possibility that it is in the set increases. For instance, if the elements of our universe are the digits from 0 to 9, we can define two sets, A = (x : x < 3) and B Â— (x : x is small). The membership in each set for each element is given in Table 2.3. Set A is referred to as a crisp set, since all of the membership values are 0 or 1. Traditional set theory deals with crisp sets. Set B is a fuzzy set, since the membership values take values other than 0 or 1. Membership can be thought of as being a conditional probability, i.e. P(x is small given x = i) gives the membership values for i = 0, 1, ..., 9. The membership values for the intersection of two sets is given by the minimum of the membership values of the two sets. Similarly, the union of two sets uses the maximum. The method of Kim and Lin defines two fuzzy sets, A=(The mean is at a good level) and B=(The standard deviation is at a good level). It then finds the value of x that gives the greatest possibility of being in the intersection of A and B, or APB=(The mean and standard deviation are at good levels). Since the intersection of two fuzzy sets uses the minimum of the two membership values, their method is maximizing the minimum of the two membership values that they define. 2.4 Criticisms of the Various Optimization Schemes All of the optimization schemes have shortcomings that should be addressed. The method suggested by Vining and Myers has already been mentioned as being
PAGE 30
23 too restrictive on its secondary response. The issue in the target is best case of not allowing any bias has been addressed. In the larger or smaller is better case, the restriction of the variance being equal to some acceptable level has not really been addressed. Copeland and Nelson change the restriction that the variance be less than or equal to some level, but admit that the final solution usually has it equalling that level anyway. By dictating what is an acceptable level of variance, one may find a solution that is not the best possible. Vining and Myers use a number of different acceptable levels and suggest that the experimenter compare the proposed solutions and choose the one they think is best. No methodology is given as to how to compare the proposed solutions, though. In the larger is better case, it is difficult to compare the following two solutions: Wp w a Solution A 557.9 60 Solution B 687.5 75 It is unclear whether the larger mean in solution B sufficiently offsets its increased standard deviation. The method suggested by Lin and Tu does have the statistical appeal of minimizing squared error loss. For the target is best case, this method is difficult to criticize. For the larger or smaller is better case, though, a difficulty arises in how to define bias, since there is no explicit target. For the smaller is better case, they suggest a target of 0, which is consistent with what Taguchi suggests. This may not lead to a good solution if the mean cannot legitimately attain 0. If the mean is approximately 100 and the standard deviation is approximately 10, then the bias term will dominate the variance term, meaning the solution will be driven by bias,
PAGE 31
24 with variance largely ignored. For instance, take the following two solutions: w a MSE Solution A 100 10 10100 Solution B 90 40 9700 According to this method, solution B would be chosen by virtue of a smaller MSE, even though the standard deviation is four times that of solution A. This probably would not be an acceptable choice of solutions. No mention is made of the larger is better situation, as far as the definition of bias is concerned, thus they do not even address this case. An experimenter could conceivably choose some artificial target, but then the solution may depend too heavily on the choice of this target. A criticism with the method of Copeland and Nelson is the issue of how much bias is acceptable. This is similar to the criticism of Vining and Myers with regard to how much variance is acceptable. As with Vining and Myers, one could suggest a number of possible solutions and pick the best one, but no method is given for comparing the possible solutions. It is not a simple task to compare a solution that results in a bias of 1 and w a = 45.20 with a solution that results in a bias of 5 and w a = 44.73. It is unclear whether the lower standard deviation in the second case compensates for the larger bias. The main criticism with the method of Kim and Lin is that too many parameters have to be specified for the membership functions. They require targets, upper and lower limits, and shape parameters to be specified, all of which can greatly affect the final solution. It is this dependence on so many parameters that can lead to the belief that the parameters drive the solution more than the actual data do. As will be shown later, the final solution can be highly dependent on the choice of these parameters. Another criticism of the method of Kim and Lin is that it is unnecessarily complicated, and its solution is in terms of a quantity A from equation 2.1 that has
PAGE 32
25 no physical interpretation. If a statistician reports that the optimal settings result in A = 0.26, the engineer will most likely not know whether that is a good or a bad result. I believe a simpler approach having a more intuitive interpretation would be much preferred. 2.5 Multiple Response Optimization For experimental situations where there is more than one quality response of interest, all of which depend on the settings of a group of control variables, a number of methods for determining the optimal set of control variables have been suggested. If the variance of the responses does not depend on the settings of the control variables, then a response surface approach can be used in the optimization. The optimization experiment will begin by running an experiment capable of fitting a second degree polynomial in the p control factors x = (aq, . . . ,x p )', such as a central composite design, face centered cube, or a 3 P factorial. These experiments are discussed in books on response surface methodology, such as those by Myers and Montgomery (1995) or Khuri and Cornell (1996). At each of the n design points, the r responses are observed as with i = 1, . . . , r and j = 1, . . . , n. Each of the responses is then individually fit to a full second order response surface model, given by p p p in Â— Ao + Pij x j + j = 1 j = 1 k>j A where the f3 are the usual least squares estimates of the true regression parameters. At this point, a univariate function, say (j>(x), is constructed, which is a function of the control variables x through the fitted responses i/j. The optimal solution is the set of control factors x that maximizes (or minimizes, depending on the function) the function (x) to be optimized. / , / y Pijk x j x k
PAGE 33
26 Derringer and Suich (1980) proposed a desirability function for the case of optimizing several responses simultaneously. For the i th response, let the individual desirability di be a transformation of the fitted value y where 0 < di < 1. The value of di increases as the desirability of the corresponding response increases. The overall desirability of the r responses is the geometric mean of the individual desirabilities, \i=l / Note that D is also between 0 and 1, and that if any individual di is 0, then D will be 0. That is, if any individual response is unacceptable, the entire product is unacceptable. This is consistent with much of industryÂ’s needs and demands where a product must meet all of its specifications, or it will be rejected. The individual desirability for the case of target is best, where the target for the I th response is TÂ» and the lower and upper specifications are and y Â™ ax , is given by di = < mm y. max where s and t are shape parameters. Figure 2.3 shows the desirability function for some choices of s and t. Typically a shape factor of 1 is used, but there are situations where a value other than 1 may be used. One situation would be if a product could be inexpensively reworked if a quality characteristic was too high, but would have to be scrapped if it was too low. In this case, if s was chosen to be large and t was chosen to be small, the desirability function would be larger when the response was above the target level, which would favor the response being at or above target.
PAGE 34
27 y Figure 2.3: Two Sided Desirability Function for Selected Shape Parameters.
PAGE 35
28 Other choices for the shape parameters could be thought of as weighting factors for responses that are more or less important than other responses. In the case of larger is better, where the lower specification limit is yÂ™ m , an upper limit yf 10 * still must be specified. This upper limit may be thought of as the level above which there is no added value to the product. This upper level in essence becomes the target for the response. In the case of smaller is better, the negative of the response is used and treated as though it were a larger is better situation. The desirability function for larger is better is given by / 0 Vi < y min Vi~y min y max y min lt min ^ n. < n.max Vi Â— Ui Â— Ui \ 1 Ui > y max where s is a shape parameter. Figure 2.4 shows this function for several choices of s. Note that the desirability function does not need all of the responses to have the same functional form. The overall desirability D depends on the location x in the region of interest. The overall optimum is found by maximizing D with respect to x. This has an intuitive appeal that the optimal choice of the settings of the control variables x is the location that is most desirable, i.e. has the highest value of the desirability function D. There are some disadvantages to this method that must be addressed. The desirability function D is treated as being a function of only x , but it is, in fact, also a function of the choices of the upper and lower specification limits, the targets, and the shape parameters for each of the individual responses. In addition, the responses are treated as being independent, since any correlation among the responses is not taken into account. Khuri and Conlon (1981) proposed a distance measure from some ideal optimum, where the distance is relative to the estimated variance of the responses. The main advantage to this method is that it incorporates the possible correlations
PAGE 36
29 y Figure 2.4: One Sided Desirability Function for Selected Shape Parameters.
PAGE 37
30 among the responses, which the desirability method does not. The first step in their method is to define a vector of ideal targets for the responses 0. The individual elements of the ideal target vector 0; are defined by the actual target in the case of target is best, or by the predicted maximum or minimum that can be obtained in the region of interest in the case of larger or smaller is better, respectively. If the target vector is an estimate of the true target vector, then it should be denoted by 0. Obviously, if there exists some x that simultaneously achieves all of the individual targets at the same time, it is the ideal optimum. Since this rarely occurs, the x that minimizes the vector distance from this ideal optimum is defined to be the optimal set of operating conditions. Before defining the distance function, other quantities must be defined and estimated. The linear model for an individual response is written in the form: y% X Â® Â•Â•Â•> where y i is the vector of observations of the ith response, /3 i is a vector of q unknown regression parameters, e* is a vector of random errors associated with the ith response, and X is an nxq full column rank matrix of constants obtained from A the experimental design. The fitted i th response is of the form yi(x) = z'(x)f3 where z(x) is a vector of dimension q, whose first element is one and the remaining elements consist of powers and crossproducts of control variables Xi,x 2 ,... ,x p . Note that this method assumes that all of the responses use the same functional form. The matrix of the observations from the r responses is denoted by Y, and the variancecovariance matrix is denoted by S. An unbiased estimate of the variancecovariance matrix is given by S = Y'[I n X (X 1 X)1 X']Y / (n q),
PAGE 38
31 where I n is an identity matrix of order nxn (Gnanadesikan 1977, for instance). The distance function is then defined as p[y(*),0] = [(y( x ) 1 (y(x) ) z'(x)(X'X)1 z{x) The optimal solution is given by the set of the control factors x which minimizes the distance p[y(x),4>]. The method given above is an attempt to measure the distance away from target in terms of the variability associated with the estimated responses. Since the variance associated with the fitted responses Ey is given by Ey = z\x){X'X)1 z{x)E , equation 2.2 can be rewritten as p[y(*), ] = (y(*) (y(*) 4>) 1 1/2 (2.3) A A where Ey is Ey with E substituted. Equation 2.3 shows the relationship to a distance measure in terms of the variability, just generalized to a multivariate situation. An estimate of the distance of the fitted responses from their targets is given by y(x) Â— ] = i = 1 (&(*) i y _a ii z'(x)(x'x)1 z(x)_ 1/2 (2.4) A where a a is the ith diagonal element of E. Khuri and Conlon also talk about the fact that the target vector
PAGE 39
32 maximized or minimized, which means that the target is the estimated maximum or minimum that may be achieved in the experimental region. Dealing with a random target vector greatly complicates the analysis, and will not be dealt with here. Ames et al. (1997) propose that a quality loss function be used in order to optimize several responses simultaneously. Their method is somewhat similar to that of Khuri and Conlon, though it can be thought of as being both more general and more restrictive. A quality loss function is defined by r QLP(*) = ^2 v>i(yi(x) Ti ) 2 , (2.5) i= 1 where jji{x) is the value of an individual response evaluated at a given x , T* is the target for the i th response, and Wi is a prespecified weighting factor. It is assumed that the target vector is selected by the experimenter, and may be considered constant. The optimal point is the value of x that minimizes the quality loss function. Note that the loss function in equation 2.5 is of the same form as equation 2.4, with , replacing T) as the target and the weight Wi being equal to [1 / 'a u z' {x)(X' X ) 1 z(x)] 1 ^ . By not dealing with the entire covariance matrix, AmesÂ’ method is more restrictive than that of Khuri and Conlon. But by being more flexible in the possible choices of the weighting factors Wi, it is less restrictive than that of Khuri and Conlon. Vining (1998) presents a more general squared error loss function that does account for the possible correlation among the responses and reduces, in a special case, to the distance measure proposed by Khuri and Conlon. The loss function that he introduces is of the form L = [Â£(*) 0}'C[y(x) 0], where C is an appropriate positive definite matrix of costs or weights and 0 is the vector of target values, assumed to be known. The overall optimum is the location
PAGE 40
33 of x which minimizes the expected value of the loss function, which is estimated by E (L) = [y(x) 9}'C[y(x) 9} + trace[CS^(aj)], (2.6) where 'Ey(x) is the variancecovariance matrix of the fitted vector y. The first term of equation 2.6 represents the penalty imposed for any predicted value that does not simultaneously meet all of the target values, while the second term represents the penalty imposed by the quality of the prediction. One serious concern with this method is that the choice of the cost matrix C can be somewhat subjective. Vining offers some different choices that the practitioner may use. One choice would be to use C = [ Sy (*)] x , which would make the term trace[CSy(x)] = trace[I r ] which is a constant for all values of x. Thus, minimizing E(L) is equivalent to minimizing the distance measure suggested by Khuri and Conlon in equation 2.3, as long as the targets 9 are known. This shows that the distance measure of Khuri and Conlon can also be thought of as a loss function, though the cost matrix that is used in not practical in an engineering sense. A more general choice that is suggested is C = KT,~ l , where K is a suitably chosen matrix that shapes the variancecovariance structure to reflect the economics of the process. With this choice, the loss function becomes E (L) = [y(x) 9^KY~ l [y{x) 9] + zÂ’(x)(X'X ) 1 z(x)tvace[K]. Lai and Chang (1994) propose a fuzzy set approach to multiple response optimization that attempts to simultaneously optimize the individual responses while minimizing the variance of the predicted responses. This is in the same vein as minimizing the mean square error that Vining proposed, in that it has a penalty
PAGE 41
34 for not being at the overall optimum and a penalty for the quality of the prediction. A description of this method would necessitate the introduction of a large amount of fuzzy set theory, fuzzy regression, and a number of other fuzzy concepts, and will not be discussed further here. 2.6 Criticisms of Multiple Response Optimization Methods As with the single response optimization methods, all of the multiple response optimization methods have shortcomings that need to be addressed. The method of Derringer and Suich does have an intuitive appeal with its desirability function, but it is questionable whether the desirability function has any real correlation with overall product quality. For instance, one choice of shape factors could give a final desirability that is equal to 0.70, while another choice of shape factors could give a final desirability of 0.75. In this case, it is not clear which of the two solutions gives the better set of operating conditions. Even though 0.75 is a larger desirability than 0.70, both are a function of their shape parameters, so a comparison is very difficult to make and as a result the desirability has no real meaning in and of itself. Ignoring the possibility that the responses could be strongly correlated is a far more serious problem, as is the issue of multiple shape parameters and targets that must be specified by the experimenter. Treating the responses as if they were independent could lead to a suboptimal solution, especially when some of the responses are highly correlated. Similarly, poor choices of the values of the shape parameters and targets may lead to a suboptimal solution. In fact, the solution itself may depend heavily on the choice of these parameters. This could lead to a situation where the experimental data is repeatedly reoptimized with different parameters until the experimenter gets the solution that he desires. Khuri and ConlonÂ’s distance function does consider the correlation among the responses, but it is not without its own shortcomings. Their treatment of the target, especially in larger (or smaller) is better settings, can lead to serious problems.
PAGE 42
35 In these situations, they find the maximum that each response can achieve when treated by itself and use this as the target. This ignores the actual specifications that a product may have. For instance, a response may be able to achieve a value of 500, but the product may only need to be above 100, and anything over 200 may not add anything to the usefulness of the product. In this case, Khuri and ConlonÂ’s method would still try to achieve a response of 500, and this may make the solution entirely dependent on that one response, and in fact it may drive other responses far from their desired values. A target of 200 would probably be a more realistic and lead to a better overall solution. A situation where one response is driven far from its target because another response is trying to reach an unreasonable maximum will be illustrated in a later example. The method proposed by Ames et al. suffers from the fact that it ignores possible correlations among the responses. In addition, a weight is required for each response, which may not be easy to define. Possible choices would be the reciprocal of the variance or a function of the cost or importance of each response. Using the reciprocal of the variance is a special case of the Khuri and Conlon method. In any case, the final solution may depend more on the choice of these weights than the actual data of the experiment. The method that Vining proposes addresses some of the problems with Khuri and ConlonÂ’s distance function, most notably the issue with the appropriate targets, but has the same problem as Ames with respect to the definition of a cost matrix. Vining suggests defining the cost matrix in a manner similar to Ames, namely the inverse of the variancecovariance matrix or a function of the costs of each response. Though there may be times when the costs can be easily defined, it could become too subjective to make for a truly optimal solution. Vining recommends that the targets be chosen in a manner consistent with that of Derringer and Suich. In the
PAGE 43
36 cases where a response is to be maximized or minimized, the target should set at a level above or below which there is no further benefit to the product. 2.7 Robust Multiple Response Optimization All of the multiple response optimization methods described previously assume that the variancecovariance among the responses is constant in the experimental region. There may exist situations where this assumption is not valid, and a method for optimizing a multiple response system where the variance is not constant in the experimental region must be addressed. Pignatiello (1993) has addressed this issue by suggesting that the mean square error be minimized, similar to what Vining suggested for multiple response systems, though this appears to be the only attempt at optimization under the assumption of nonconstant variance. The optimization experiment will begin in a manner similar to that of simple multiple response optimization. The optimization experiment will begin by running an experiment capable of fitting the assumed model for the response. Since an estimate of the variance will be needed at some of the points of the experiment, some of the experimental points will need to be replicated. The part of the design that is replicated must be able to support the model that is assumed for the variance. At each of the n design points, the r responses are observed as yij, with i = 1, . . . , r and j = 1, . . . ,n. At the n r points that have been replicated, an estimate of the mean y^ and the variance sf can be obtained, with i = 1, . . . , r and j Â— 1, . . . , n r . Pignatiello recommends that all of the points of the experiment be replicated and uses a loss function of the form L = [y{x) 0]'C[y(x) 0], where C is an appropriate positive definite matrix of costs or weights and 0 is the vector of target values. He only suggests using this method when all of the responses are of the Â’target is bestÂ’ situation. Though he does not deal with Â’larger (or smaller)
PAGE 44
I 37 is betterÂ’, a target could be chosen in the manner recommended by Derringer and Suich, among others, namely choose as the target the level above (or below) which the product shows no overall increase in value. An estimate of the expected loss, or mean squared error (MSE), is then given by E(L) = [y(x) 0}'C[y(x) Â— 6} + trace[CE(*)], where S(cc) is an estimate of the variancecovariance matrix, which does depend on the location in the experimental region. The strategy that Pignatiello recommends to find the optimal point is to estimate E(L) at each point in the experimental design, and then use traditional regression techniques on the expected loss. He suggests using the expected loss as the response of interest in a sequential set of experiments to find the optimal setting of the control variables, as in response surface methodology. He suggests beginning with a small design, such as a fractional factorial, to find out which control variables need to increase or decrease in order to reduce the estimate of E(L). He then suggests a method of steepest descent, in an effort to find the direction that most quickly decreases the expected loss. Other approaches for finding the minimum expected loss are mentioned, but the overall idea is similar in all of the cases and the only difference between them is the set of assumptions used to find the optimum. There are several problems with this method that must be addressed. As with other methods, a cost or weighting matrix is introduced and must be specified by the experimenter. Pignatiello does give some methods to specify the individual elements of the matrix, but these can often be quite subjective. In many cases, the overall choice of the optimum may depend too much on the choice of these subjective parameters. Though he does not deal with the case of Â’larger (or smaller) is better,Â’ the most likely way to deal with it is to introduce Â’artificialÂ’ targets for these responses. Again these choices may be very subjective and greatly affect
PAGE 45
38 the final optimal point. He does not go into much detail on how he estimates the variancecovariance matrix, but it appears that he estimates it at every point that is run in the experiment. Better strategies may or may not exist that give a better overall estimate of the variancecovariance matrix, and could lead to more efficient experimental designs. 2.8 Optimizing a Function In all of the optimization schemes presented thus far, the experimental data are reduced to a single function which is to be either maximized or minimized. Some of the methods add a secondary constraint, and all have a constraint on the values that x may take. Typically the solution cannot be expressed in a closed form, so that some form of numerical method must be used. Several numerical methods are available, including the generalized reduced gradient method mentioned previously. Del Castillo and Montgomery proposed using the generalized reduced gradient (GRG) algorithm for solving the dual response surface problem presented by Vining and Myers. The GRG algorithm was chosen because it is known to work well when optimization is done on a small region and the constraints are not highly nonlinear, which is true of the dual response surface problem. Like many functional maximization algorithms, the GRG uses a gradient to find the direction of steepest ascent, then moves in this direction in order to increase the value of the function. The first step of the algorithm is to convert the inequality constraints into equality constraints through the use of slack variables. For instance, a constraint of the form aq + X 2 > c is rewritten as aq + X 2 + x s = c, where the slack variable x s must be nonnegative. Note that constraints involve more than one variable, thus the bounds on the individual variables of the form 0 < aq < 1.0 are not deemed to be constraints. After the constraints are modified, there will be m constraints and n variables, some of which may be slack variables. The n variables are grouped into two sets,
PAGE 46
39 basic variables and nonbasic variables. The basic variables are those variables that are strictly between their bounds, while those that are at their bounds are nonbasic variables. The system of m equations in n variables typically cannot be solved explicitly, since m is usually less than n. However, the system can be solved for m of the variables (basic variables) in terms of the remaining n Â— m nonbasic variables. The gradient of the objective function with respect to the nonbasic variables is called the Â’Â’reduced gradient.Â” If the magnitude of the reduced gradient at the current point is small, then the algorithm stops. Otherwise, the gradient is used to give the direction of steepest ascent (or descent for minimization problems). A line search is performed in this direction, a new point found, and the process repeated. Nelder and Mead (1965) proposed a direct search algorithm, utilizing a simplex. Box (1965) adapted the algorithm for constrained maximization. The method does not assume any smoothness, and thus does not utilize gradients. Instead it uses a simplex of points in the space of the variables. The vertex of the simplex with the lowest response value is reflected to the opposite side of the simplex, and by repeating this the simplex moves towards the maximum point. By use of expansion and contractions of the simplex, it will eventually surround the maximum point. The simplex is initially defined by k points, one of which is specified by the user, and the rest are randomly generated. The number of vertices of the simplex is greater than the dimensionality of the surface, in order to reduce the possibility that the simplex collapses to a subspace of the variables. All of the initial points will be chosen to satisfy the constraints of the problem. The function is evaluated at all of the vertices of the simplex, and the vertex with the lowest response value is reflected through the centroid of the remaining points of the simplex. The new point will be a > 1 times as far from the centroid as the original vertex. If the new point still has the smallest value, it is moved half way towards the centroid of the remaining
PAGE 47
40 points. If the new point is outside of the bounds of the variables, it is moved just inside of the boundary. If the new point does not satisfy some of the constraints, it is moved half way towards the centroid of the remaining points. The simplex method tends to be much slower than the gradient method, but they both tend to give the same result if the surface is reasonably smooth. Since the simplex does not assume any smoothness, it will be less likely to fail to find the true optimum due to unusual functional circumstances, such as the solution being in a corner of the variable space. Though it is slower than most gradient methods, the simplex method does still give a solution in well under a minute on modern computers (Gateway computer with an Intel Pentium 266 MHz processor with 64 MB of RAM) for the types of problems that have been dealt with here. As with any maximization algorithm, one must be aware that the algorithm may settle on a local optimum. In order to guard against this, one commonly uses a number of different starting points, and if they all find the same optimum, it can be safely assumed that it is the global optimum. For simple functions in a small number of variables, one may construct a grid in the variable space and evaluate the function at every point in the grid. Contour plots can then be generated to look at the surface of the function to verify if it is a local or global optimum.
PAGE 48
CHAPTER 3 ROBUST PARAMETER DESIGN 3.1 A Description of the Proposed Method As with the other methods that have been previously introduced, this method begins with a response surface experiment that will allow both the mean and variance (or a function of the variance) to be fit to a second order model in the p control variables. The fit of the mean of the response is given by p p p Wfj, Â— Po + fii%i + i = 1 i=l j>i and model for the standard deviation of the response is given by p p p w a 7o + 7i X i + <v. 'T T Â• i= 1 iÂ—1 j>i A where the /3 and 7 are the least squares estimates of the true parameters. It is at this point that all of the proposed methods attempt something different. Vining and Myers proposed that a constrained optimization be performed. Lin and Tu proposed that a squared error loss function be minimized. Copeland and Nelson introduced a slightly different constrained optimization than the method proposed by Vining and Myers. Finally, Kim and Lin took a fuzzy set theory approach to the optimization. TaguchiÂ’s idea is that the variance of a process should be kept as small as possible, while keeping the mean of the process at a specific target. If one allows for bias, or the mean not equaling the target, this problem can also be approached by keeping the response close to target with the highest probability, or maximizing the probability that the quality characteristic y is within its specifications, y m i n and y m axThus the optimal solution is the location of the control parameters x that will: 41
PAGE 49
42 Â• Target is best: maximize P (y m i n < y < y ma x ) Â• Larger is better: maximize P (ymin < y) Â• Smaller is better: maximize P (y < y max ) For simplicity, the probability will be calculated by assuming that the response of interest y is normal with mean w p and standard deviation w a (which both depend on the control variables x). As will be seen later, the assumption of normality is not a necessary assumption, but does simplify calculations. One could choose another distribution and calculate the probability in order to find the optimal setting. This approach is similar to the goal of a high C pk , which is becoming popular in industry. A thorough treatment of capability indices can be found in Kotz and Johnson (1993), and a discussion of issues surrounding the use of capability indices can be found in Rodriguez (1992). The index C pk is a measure of how far the mean y of a distribution is from its closest specification limit, either y m i n or y ma x The distance is measured in terms of the standard deviation of the distribution, given by the usual unbiased estimate s. The idea of C pk is to have a measure of the percentage of product that is failing that is easy to calculate. The equation for C pk is usually given by the minimum of the lower and upper capability indices, or ^ . I y ymin ymax 2 / L pk min I C^ p i 0 5 L P u 3s 3s If one assumes normality, then the lower and upper capability indices are simply multiples of a standard normal 2 statistic, and the probability of being out of specification is easily calculated by P (z > 3 C p i) + P(z > 3C pu ). In many cases, one of these two probabilities is very nearly 0, so the probability of being out of specification is very nearly P(z > 3 C pk )C pk is not, however, a meaningful measure if the assumption of normality does not hold. The assumption of normality is what gives the C pk the close relationship with the probability of being outside the specification limits.
PAGE 50
43 The proposed probability method has intuitive appeal, because most people will easily understand the problem being solved, that of measuring the probability that the process is within specifications. This is in direct contrast to the Kim and Lin method, in which A, the value of a membership function, has no direct interpretation. The proposed method does simultaneously minimize variation while staying close to target, and uses the same basic methodology for all three cases listed, unlike any of the previously proposed methods. It also answers all of the issues presented by the other proposed methods, while introducing few new issues. One issue that other methods have addressed is that of allowing some bias into the final solution, or allowing the final solution to have the estimated mean Wfj, not equal to its target. This method does allow for some bias in the optimal solution, but it does penalize for too much bias. As the bias gets larger, the mean is moving closer to the maximum or minimum specification limits, thus decreasing the probability that is to be maximized. Thus the bias will want to be kept small. If the amount of bias is a critical concern, the maximum and minimum specification limits can be made very close to each other. This would be akin to making A small in the method of Copeland and Nelson. An issue that has not been addressed by other methods is that of the constraint that the variance be kept at a specific level, which is done in the constrained optimization method of Vining and Myers, as well as that proposed by Copeland and Nelson. In the case of larger (or smaller) is better, these two methods recommend that the secondary constraint be that the variance be set at or below some acceptable value a$. The choice of an acceptable amount of variance can be quite subjective. One recommendation is to use a number of different choices of
PAGE 51
44 is to balance the mean and the variance simultaneously, via a probability. Also in the case of larger (or smaller) is better, there is no need to introduce an artificial definition of bias. The squared error loss method of Lin and Tu minimizes a measure that has a variance term and a bias term. The bias term is defined as the distance that the mean is from its target, but there is no clear target when the response is to maximized (or minimized). In these situations, an artificial target is selected anyway, though its choice can be quite subjective and greatly influence the final solution. Though the probability method must specify a minimum and maximum limit for the response, there are far fewer parameters introduced than the method of Kim and Lin. Kim and Lin need the experimenter to define shape parameters for both the mean and the variance. If one assumes normality, the choice of the minimum and maximum will frequently not affect the choice of the optimal setting, as long as the maximum and minimum are symmetric about the target. As can be seen in Figure 3.1, a solution that maximizes P (yl < y < yu) will be almost identical to a solution that maximizes P (yl* < y < yu*). The Kim and Lin method gives a solution that can be very dependent on the shape parameters that must be specified, as will be shown in a later example. The use of the shape parameters in the Kim and Lin method seems to be motivated by a situation where being out of specification on one side is less costly than the other. For instance, it may be relatively inexpensive to repair a part that is too large, while a part that is too small may have to be scrapped altogether. The proposed methodology is easily modified for such an economic situation. Instead of maximizing the probability of being within specification limits, one could minimize the expected cost of failing material, given by C\P(y < y m in ) + C 2 P(y > ymax), where ci and C 2 are the costs of failing low and high, respectively. Though this would add
PAGE 52
45 Figure 3.1: Optimal Solution and Two Sets of Limits.
PAGE 53
46 two extra parameters, the cost parameters could be much easier for an experimenter to specify than the shape parameters necessary for the method of Kim and Lin. Thus the probability method seems to answer all of the issues presented in previous approaches to the problem. Issues that may arise from this method would be the assumption of normality, no direct mention of a target value in the solution, and the usual issue of how much bias may enter the solution. As for the assumption of normality, it is not critical and is only used for simplicity. One could use any distributional assumptions, as long as the probability is maximized accordingly, as will be shown in an example. The target does enter the solution as being half way between the maximum and minimum limits, though this is not explicit. This does not seem to be a major issue, since the minimum and maximum may be made very close, which will force the optimal setting near the target. This will also keep the bias within some bounds, if that is critical. Two situations may cause this method to have troubles. The first is if the mean response is unable to fall within the upper and lower limits, which will cause this method to try to find an area with a very large variance. Figure 3.2 shows two possible distributions in a larger is better situation. It can be easily seen that P(V > yl) is smaller for distribution 1 than it is for distribution 2. The probability method would choose distribution 2 over distribution 1, despite the fact that distribution 2 has a lower mean and a larger variance, neither of which is consistent with a situation where the response is to be maximized. In this situation, a small variance would cause the process to be consistently outside of the specification window. A large variance, on the other hand, would get at least some of the product within specification some of the time. Either way, this would not be a desirable process for most people. The second situation that could cause problems would be if the specification window was very wide or narrow. In this situation, the probability of being within specification would be almost identical (and almost always either 0
PAGE 54
47 Figure 3.2: Two Possible Solutions for Larger is Better. or 1) for the entire experimental region. This would cause the optimization routine to become very unstable, and would likely return different locations for the optimum for different starting values, though all locations would have essentially that same probability of being within specification. It would probably be best to modify the specification limits if this should arise, so that the probability being maximized is not so close to 0 or 1. 3.2 Printing Process Example The example will be taken from Box and Draper (1987), which also appears in Vining and Myers, Lin and Tu, and Kim and Lin. The purpose of the experiment
PAGE 55
48 was to determine the effect of speed (xi), pressure (X 2 ), and distance (x 3 ) on the quality of a printing process. The experiment was conducted in a 3 3 factorial design with three replicates at each design point. The data set with the estimated responses is in Table 3.1, and the summary statistics of the fit of the two models in Table 3.2. A second order model was fit for both the mean and standard deviation as follows (Vining and Myers, 1990): w Â„ =327.6 + 177Oxi + 109.4x 2 + 131.5x 3 + 32.0x? Â— 22.4x2 ~ 29.IX3 + 66.0xix 2 + 75.5xix 2 + 43.6x 2 x 3 (3.1) w a =34.9 + 11.5xi + 15.3 x 2 + 29.2x 3 + 4.2x 2 Â— 1.3x2 + I6.8X3 + 7.7xix 2 + 5.1xiX 3 + 14.1x 2 x 3 (3.2) 3.2.1 Case 1: Target is Best In order to compare all of the different methods, the bounds that Kim and Lin proposed will be used, namely ieÂ“ in = 490, ieÂ™ ax = 510, no Â— 500, u;Â“ in = x/1500 = 38.73, and w Â“Â“ = \/2100 = 45.83. The shape parameters that will be used are d M = 4.39, 1.70,0, 1.70,4.39, and d a = 0. The region of interest will be a sphere of radius 1. Table 3.3 gives the optimal settings from all of the methods mentioned, where the traditional method is that of Vining and Myers combined with Del Castillo and Montgomery. The final column is the probability that the response will be between the upper and lower bounds given above, assuming normality with mean and standard deviation w a . This is calculated as />510 P(490 < y < 510) = / (27rw^)Â” 1/2 exp((y w^) 2 /(2wl))dy. J 490 It should be noted that these solutions were arrived at using the solver command in Microsoft Excel, version 7.0. Solver is a generalized reduced gradient
PAGE 56
u 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 49 Table 3.1: Printing Study Data. Xi Xz Uul Uu 2 Uu3 1 l 1 34 10 28 0 l 1 115 116 130 1 l 1 192 186 263 1 0 1 82 88 88 0 0 1 44 178 188 1 0 1 322 350 350 1 1 1 141 110 86 0 1 1 259 251 259 1 1 1 290 280 245 1 1 0 81 81 81 0 1 0 90 122 93 1 1 0 319 376 376 1 0 0 180 180 154 0 0 0 372 372 372 1 0 0 541 568 396 1 1 0 288 192 312 0 1 0 432 336 513 1 1 0 713 725 754 1 1 1 364 99 199 0 1 1 232 221 266 1 1 1 408 415 443 1 0 1 182 233 182 0 0 1 507 515 434 1 0 1 846 535 640 1 1 1 236 126 168 0 1 1 660 440 403 1 1 1 878 991 1161 Vu w ( J 24.0 12.5 75.4 25.4 120.3 8.4 78.9 19.9 213.7 42.8 146.4 22.8 86.0 3.7 97.6 20.3 136.7 80.4 167.1 22.5 340.7 16.2 300.6 33.1 112.3 27.6 75.0 12.5 256.3 4.6 210.6 22.4 271.7 23.6 410.1 40.7 81.0 0.0 116.8 18.6 101.7 17.7 195.8 18.2 357.0 32.9 338.9 26.3 171.3 15.0 182.6 27.6 372.0 0.0 327.6 34.9 501.7 92.5 536.6 50.6 264.0 63.5 203.6 33.8 427.0 88.6 414.7 48.9 730.7 21.1 689.7 72.3 220.7 133.8 100.2 45.4 239.7 23.5 254.6 50.1 422.0 18.5 473.1 63.3 199.0 29.4 209.6 68.4 485.3 44.6 430.1 80.9 673.7 158.2 714.5 101.7 176.7 55.5 274.2 88.8 501.0 138.9 560.7 108.9 1010.0 142.5 911.2 137.5 Table 3.2: Summary of Fits of the Two Responses. I r R 2 Rl M A 0.927 0.888 w a 0.454 0.165
PAGE 57
50 Table 3.3: Target is Best, Limits of 490 and 510. Method Optimal Setting W a Prob Proposed (0.983, 0.003, 0.182) 494.61 44.66 0.1759 Traditional (0.984, 0.025, 0.175) 500.00 45.32 0.1746 LinTu (0.983, 0.002, 0.182) 494.53 44.65 0.1759 CopelandNelson A=5 (0.983, 0.004, 0.182) 495.00 44.71 0.1759 A=1 (0.984, 0.021, 0.177) 499.00 44.20 0.1751 KimLin dfj,= 4.39 (0.975, 0.005, 0.187) 490.57 44.23 0.1749 d M = 1.70 (0.983, 0.014, 0.185) 491.19 44.24 0.1754 d^= 0 (0.977, 0.003, 0.184) 492.02 44.39 0.1754 dp=1.70 (0.983, 0.002, 0.183) 493.52 44.53 0.1759 4.39 (0.979, 0.042, 0.202) 495.74 44.81 0.1758 algorithm for finding a constrained maximum. The details on the use of the solver command can be found in the Appendix A. It should be noted that with any maximization or minimization algorithm, there is the chance that a local, rather than global, optimum will be found. A common way to guard against this is to use a number of different starting points. If all of the different starting points end up at the same optimum, it may be safely assumed that it is the true global optimum. It may also be possible to evaluate the function at every point of a rough grid of the experimental region. The value at the optimum that has been found can then be compared to the values over the rough grid to determine whether it is likely that it is a true global optimum. As Table 3.3 shows, there is very little real difference among most of the solutions. The proposed method is essentially identical to the method of Lin and Tu. When one is confronted with many similar solutions, the simplest and most robust is most desirable. In order to check for robustness, the above is rerun with new limits of wÂ™ in = 450, rcÂ“ ax = 550 (approximately Â±1 standard deviation), w Â“Â“ = 50, and all other limits the same. Note that Kim and Lin is the only other method that uses these limits, so the location of the other solutions will stay the same, although the
PAGE 58
51 Table 3.4: Target is Best, Limits of 450 and 550. Method Optimal Setting W a Prob Proposed (0.983, 0.003, 0.183) 494.58 44.66 0.7336 Traditional (0.984, 0.025, 0.175) 500.00 45.32 0.7300 LinTu (0.983, 0.002, 0.182) 494.53 44.65 0.7336 CopelandNelson A=5 (0.983, 0.004, 0.182) 495.00 44.71 0.7336 A=1 (0.984, 0.021, 0.177) 499.00 45.20 0.7312 KimLin gL=4.39 (0.937, 0.047, 0.232) 465.57 41.50 0.6253 aII H1 Â• O (0.948, 0.034, 0.215) 473.66 42.38 0.6759 dÂ»= 0 (0.979, 0.057, 0.195) 480.97 43.02 0.7099 ^=1.70 (0.968, 0.013, 0.191) 486.53 43.80 0.7243 dfj,=4.39 (0.978, 0.003, 0.184) 492.16 44.41 0.7324 calculation of the probability will change, due to the new limits. Table 3.4 gives the results for all of the methods. As can be seen in Table 3.4, the method of Kim and Lin has some serious problems with bias. While the other methods have a bias in the range of 5 or less, Kim and LinÂ’s method has biases ranging from 8 to 34, despite their claim that their method will limit the bias. Note that the bias of 34.43 is approaching one full standard deviation (41.50) from target. One of the concerns of the Kim and Lin method was that the final solution would depend too much on the choice of the shape parameters, as well as the choice of upper and lower limits. As Table 3.4 shows, there is a great deal of difference between the solutions with shape parameters of +4.39 and 4.39. The predicted mean can be as close as 8 from target or as far as 34.5 from target. In addition, the solutions are quite different when the upper and lower limits are changed, as can be seen when comparing Table 3.3 with Table 3.4. Fixing the shape parameter at 0, when the limits are 490 and 510, the estimated mean w ^ is 492, but it is 481 when the limits are changed to 450 and 550. The proposed probability method, as expected, remains essentially the same when the limits are changed. With the tighter limits, the estimated mean w M is 494.61,
PAGE 59
52 Table 3.5: Larger is Better, Lower Limit of 550. Method Optimal Setting A A a Prob Proposed Traditional (0.818, 0.415, 0.399) (0.946, 0.312, 0.088) 637.35 594.02 74.17 60.00 0.8806 0.7684 and it only changes to 494.58 when the limits are widened to 450 and 550. The standard deviation w a does not change when the limits are changed. This shows that the proposed probability method is insensitive to the choices of the upper and lower limits. 3.2.2 Case 2: Larger is Better Vining and Myers originally looked at the case of maximizing the mean while restricting the standard deviation to 60. Del Castillo and Montgomery also analyzed this situation, as did Copeland and Nelson, whose restriction was keeping the standard deviation less than or equal to 60. These methods will be compared to the proposed method of maximizing the probability that the response is greater than 550 poo P(550
PAGE 60
53 Table 3.6: Larger is Better, Comparing Different Lower Limits. Lower Limit Optimal Setting Wfj, w a Prob > 550 500 (0.847, 0.404, 0.346) 633.81 71.84 0.8783 550 (0.818, 0.415, 0.399) 637.35 74.17 0.8806 600 (0.789, 0.423, 0.445) 639.07 76.18 0.8788 650 (0.762, 0.429, 0.485) 639.40 77.88 0.8745 Indeed Vining and Myers solve the problem for standard deviations of 60, 75, and 90, but no methodology is given to compare the three competing solutions. Their only advise is that Â“the user must decide which of these settings offers the best compromise.Â” Previous methods have had to assume an upper limit on the standard deviation, but no method has been given to choose an appropriate upper limit. Note that the results of Copeland and Nelson were almost identical to those of the Vining and Myers method. The method of Lin and Tu does not address the case of larger is better. The method of Kim and Lin was not looked at due to the many extra parameters that would need to be specified, as well as its inferior performance in the case of target is best. In order to assess how much the final solution depends on the lower limit that is used, I found the optimal point for lower limits of 500, 600, and 650, which are displayed in Table 3.6. In order to compare the different solutions, the probability that the response is greater than 550 is calculated for all of the cases roc P(550
PAGE 61
54 Table 3.7: Smaller is Better, Upper Limit of 150. Method Optimal Setting A A a Prob Proposed (0.399, 0.451, 0.798) 136.34 19.25 0.7611 LinTu (0.392, 0.421, 0.818) 136.26 19.44 0.7602 CopelandNelson (0.391, 0.414, 0.822) 136.25 19.48 0.7598 3.2.3 Case 3: Smaller is Better Copeland and Nelson looked at the case of smaller is better, as did Lin and Tu. These two methods will come up with nearly identical results when optimization is over a sphere of radius 1. The proposed method maximizes the probability that the response is less than 150 / 150 (27 Twl)1/2 exp((y wtf/{2wl))dy. OO All three come up with nearly identical solutions, as given in Table 3.7. The traditional Vining and Myers method will be identical to the Copeland and Nelson solution. Again the Kim and Lin method was not considered for the same reasons as in the larger is better situation. 3.3 Example with Lifetime Data This example is simulated for a hypothetical reliability experiment. The experiment has four equally spaced levels of a control variable, x, which are coded Â— 1, Â— and 1, and we are interested in the time until failure, t, measured in o o hundreds of hours. The failure time data is randomly generated from a Weibull distribution, whose density is given by f(t)=a(3{at) l3 1 e{at)0 . The true a and /3 depend on x according to the following equations: a = 4.6992 10.88a: + 8a: 2
PAGE 62
55 Table 3.8: Time Until Failure Data (Time in Hundreds of Hours). X i j 3 1 3 1 True a 23.58 9.21 1.96 1.82 True P 0.576 0.649 0.540 0.248 Sample 0.1283, 0.4431 0.1082, 0.0239 0.1020, 0.3425 0.6462, 3.5592 0.0040, 0.2122 0.0068, 0.1066 0.2261, 3.5968 0.3835, 6.5698 0.0886, 0.2198 0.4795, 0.5322 0.2177, 0.0683 3.6837, 0.0962 2.6648, 0.0286 0.1203, 1.4320 2.5847, 1.8744 0.0007, 0.0050 0.0040, 0.0563 0.0203, 0.3836 9.2947, 0.2816 0.0847, 0.0003 /s a 4.6771 3.8210 0.7505 1.8988 Â— Â— p 0.5603 0.7259 0.6552 0.3736 A V 0.3850 0.3213 1.8589 1.5029 A a 0.8122 0.4368 2.8921 2.2947 P = 0.6172 0.164a; 0.205a; 2 . At each level of x, a random sample of size 10 is generated from a Weibull distribution, with the appropriate a and ft. Table 3.8 lists the data for each level of x. The table lists the true values of a and ft that were used to generate the sample, A as well as the maximum likelihood estimates, a and P, and the sample mean, ft, and standard deviation, a. Note that the maximum likelihood estimates of the Weibull parameters, ot and P must be found using a numerical procedure, since no closed form exists for these. The details can be found in any book on survival analysis or reliability, such as Lawless (1982). The purpose of this analysis is to show that the proposed method is flexible enough to handle a distribution that is not normal. Furthermore, it will show that one could get a solution that would be far from optimal, if normality is assumed. Four analyses are given in Table 3.9, they are: Â• Truth: This is the location of x in [1, 1] that maximizes the probability that the time until failure is greater than 0.1, P (t > 0.1). This is done by the maximization algorithm of Nelder and Mead that has been discussed previously.
PAGE 63
56 Table 3.9: Optimization Information for Different Distributional Assumptions. Distribution Parameter Estimates R 2 Optimal x P(t > 0.1) Truth 0.5510 0.6952 Weibull d = 2.161 1.711a; + 1.127a: 2 $ = 0.719 0.095a; 0.252a; 2 0.7839 0.9995 0.2819 0.6446 Normal p= 1.108 + 0.734a; 0.164a; 2 a = 1.678 + 1.035a; 0.125a; 2 0.6660 0.5805 1.0000 0.5194 Lognormal p = Â—1.189 + 0.346a; 1.178a; 2 a = 1.534 + 0.709a; + 1.273a; 2 0.5688 0.9456 0.0278 0.5460 A Â• Weibull: The estimated parameters, a and /?, are separately fit to second order models in x. It is then possible to calculate an estimate of the probability that the time until failure is greater than 0.1, assuming that t is Weibull distributed A with parameters a(x) and f3(x) /Â»00 P (t > 0.1) = / d(a;)^(x)(d(a:)t)^ :r) 1 e^ (a:) ^ 0.1). Table 3.9 gives the individual fitted equations, as well as the R 2 of the fit, the location of the optimal x, and the true P (t > 0.1) for that x. Note that since we know that the data were generated from a Weibull distribution with the functional form of a and /? given above, we are able to calculate the true P (t > 0.1). Â• Normal: The estimated parameters, p and d, are separately fit to second order models in x. It is then possible to calculate an estimate of the probability that time until failure is greater than 0.1, assuming that t is normally distributed with parameters p(x) and a(x) poo P (t > 0.1) = / ( 27 T d 2 (x))~ l l 2 exp((t p(x)) 2 /(2 a 2 (x)))dt. J o.i
PAGE 64
57 We then find the location of Â£ in [1, 1] that maximizes the estimate of P(t > 0.1). Table 3.9 again gives the pertinent information. Note that the probability given is the true probability for the x given. Â• Lognormal: In many situations when heavily skewed data is encountered, the experimenter will try to transform the data to make it more normal and then analyze the transformed data as though it were normal. In this case, a logarithm transformation would be a good choice. For the lognormal line of Table 3.9, all of the original times to failure are transformed, then everything procedes as with the normal case above. The means and standard deviations of the transformed data are not given in Table 3.8, but they are used to fit second order equations in x, and thus find the optimal x as in the normal case above. Table 3.9 again gives the pertinent information, and the probability is the true probability for that x. As the table shows, it is possible to find an optimal setting for x and not have to rely on the assumption of normality. Furthermore, the solution assuming normality is far from optimal, even suggesting that the optimal setting is not within the experimental region. Taking the logarithm transformation does not seem to help very much in this situation, though it is an improvement over the assumption of normality. The Weibull solution is better than the normal or lognormal solution, but it is a fair distance from the true optimum solution. I believe that this is due more to a lack of repeatability with any optimization situation rather than something that is unique to this situation. It should be noted that I had a very difficult time coming up with data that would show a difference in optima when assuming a normal distribution rather than a Weibull distribution. Initially I thought that I could use any randomly generated set of data, but I instead found that the data had to be very heavily skewed before any difference could be seen. Perhaps I could have more easily arrived at a suitable
PAGE 65
58 set of data, but it seemed to me that this method is robust to the assumption of normality. In any case, the practitioner would still want to use the correct distributional assumption in order to get the proper estimate of the probability of being acceptable.
PAGE 66
CHAPTER 4 MULTIPLE RESPONSE OPTIMIZATION 4.1 A Description of the Proposed Method As with the other methods that have been proposed for multiple response optimization, this method begins with an experiment capable of fitting a second degree polynomial in the p control factors x = (aq, . . . ,x p )', such as a central composite design, face centered cube, or a 3 P factorial. At each of the n design points, the r responses are observed as t/jj, with i Â— 1, . . . , r and j = 1, . . . , n. Each of the responses is then individually fit to a full second order response surface model, given by p p p ill Â— PiO T ^ ] PijXj + ^ ^ 'y ^ Pijk%j%ki j = 1 j 1 k>j where the ft are the least squares estimates of the true regression parameters. This can also be written as Vi{x) X Pi, i 1, ..., r, where X is the design matrix of control variables, their powers, and their A i 1 crossproducts, and is a vector of the least squares estimate of the r set of regression coefficients. An unbiased estimator (Gnanadesikan 1977) of the variancecovariance matrix is given by t = Y'[I n X{X'X)1 X'}Yl{n q), where I n is a nxn identity matrix, q is the number of terms in the full model, and Y is the rxn matrix formed by the vectors of the actual responses at the n experimental 59
PAGE 67
60 settings. Note that E is assumed to be constant over the experimental region, so E does not depend on x. All of the proposed methods are essentially the same up until this point. Now a function of fitted responses needs to be maximized or minimized. The functions that have been proposed include the desirability function of Derringer and Suich, the distance measure of Khuri and Conlon, a weighted loss function of Ames, et ah, and the mean squared error of Vining. The goal of optimization in the case of a single response is to maximize the probability that the response of interest will be within its specification limits. This is the same goal when there is more than one response of interest, namely maximize the probability that all of the responses are simultaneously within their specification limits. Therefore the approach taken for multiple response optimization will be a multivariate generalization of the previous approach for a single response. The actual optimization is done by finding the combination of control settings x that maximizes the probability that all of the responses ?/* are within their specification limits, and y r P ax , under the assumption that the response vector y is multivariate normal with mean vector having elements equal to yi(x) and variancecovariance matrix E. The quantity to be maximized is where some of the maxima or minima may be Â±oo. If a response is to be maximized or minimized (Â’larger is betterÂ’ or Â’smaller is better ) then the appropriate upper or lower limit would be +oo or Â— oo, respectively. The proposed probability method has many advantages over the other methods that have been proposed for multiple response optimization. One of the issues that was raised with many of the other methods had to do with targets that had to be specified in the cases where the response was to be maximized or
PAGE 68
61 minimized. In these cases, an artificial target was specified that was based on either knowledge of the situation, the absolute maximum or minimum that could be attained in the region of interest, or some other unspecified method. The only parameters that need to be specified for the probability method are the upper and lower limits on each of the responses, which are often known. When the response is to be maximized or minimized, the upper or lower limit that is not clearly known is merely set to +oo or Â— oo, respectively. The probability method is not trying to reach an artificial target in these situations, rather it is trying to get the distribution as far from the specification limits as is reasonable. The idea of trying to keep the response away from its specification limits, rather than get it close to a target, is the main philosophical difference between the probability method and the other methods that have been proposed. For instance, Figure 4.1 shows a situation of two responses which are to be maximized, and some of the contours of constant loss from not being at the targets. Note that points A and B have the same loss, though point A is out of specification for y\. A concern with the desirability method of Derringer and Suich is that there may be a number of shape parameters that need to be specified in order to perform the optimization. In the absence of any other information, these shape parameters are typically set at 1. These shape parameters can, however, be used to give more or less weight to certain responses, or to influence a certain response to be higher or lower due to economic situations. Even with good knowledge of the engineering and economic situations, it may be difficult to specify the shape parameters that will reflect the true situation that is desired. A similar concern arises with the weight parameters that need to be specified with the weighted loss function approach of Ames, et al. Similarly, the need to specify a cost matrix is a concern with the mean squared error method of Vining. The probability method avoids the need to specify shape, weight, or cost parameters by relying on the upper and lower specifications
PAGE 69
62 Min Target Y1 Figure 4.1: Contours of Constant Loss.
PAGE 70
63 for the individual responses. Obviously, if the shape, weight, or cost parameters are easier to specify than the specification limits, then the probability method may not be the best method of optimization. In many industrial situations, however, the specifications are clearly defined, whereas the shape, weight, and cost parameters are not. The proposed method does utilize the fact that the responses may be correlated, which is not the case for the methods of Derringer and Suich or Ames, et al. As the number of quality characteristics that are monitored increases, the chances for some of them to be highly correlated increases. Ignoring possible correlations may lead to erroneous conclusions, where highly correlated characteristics unfairly weight one area of the experimental region over another. There are areas where the probability method proposed here may not be the best approach to multiple response optimization. One situation, discussed earlier, deals with economic tradeoffs that may need to be dealt with. For instance, if a certain response can be inexpensively reworked if it falls above the upper specification limit, but must be scrapped if it falls below the lower specification limit, there is a definite economic advantage to keeping the response toward the higher end of the specification range. The probability method is not currently able to deal adequately with such a situation, though it may be possible in the future. Another situation that the probability method may not be suited for is encountered often in the food industry, where there are several responses that are measured, but only a few are critical to the success of the product. The probability method will treat all responses equally, and there is currently no methodology to adapt this to a situation where the responses have different weights. 4.2 Tire Tread Compound Example Derringer and Suich present an example where a tire tread compound is to be optimized. The experiment is to determine the optimal setting of three ingredients,
PAGE 71
64 Table 4.1: Experimental Design. Compound No. *i %2 y i 2/2 2/3 2/4 1 1 1 Â±1 102 900 470 67.5 2 Â±1 1 1 120 860 410 65 3 1 Â±1 1 117 800 570 77.5 4 Â±1 Â±1 Â±1 198 2294 240 74.5 5 1 1 1 103 490 640 62.5 6 Â±1 1 Â±1 132 1289 270 67 7 1 Â±1 Â±1 132 1270 410 78 8 Â±1 Â±1 1 139 1090 380 70 9 1.633 0 0 102 770 590 76 10 Â±1.633 0 0 154 1690 260 70 11 0 1.633 0 96 700 520 63 12 0 1.633 0 163 1540 380 75 13 0 0 1.633 116 2184 520 65 14 0 0 Â±1.633 153 1784 290 71 15 0 0 0 133 1300 380 70 16 0 0 0 133 1300 380 68.5 17 0 0 0 140 1145 430 68 18 0 0 0 142 1090 430 68 19 0 0 0 145 1260 390 69 20 0 0 0 142 1344 390 70 hydrated silica x x , silane coupling agent Â£ 2 , and sulfur level x 3 . The properties to be optimized, and their specifications are: PICO Abrasion Index, y x 120 < 2/i 200% Modulus, i /2 1000 < 2/2 Elongation at Break, yz 400 < 2/3 < 600 Hardness, y$ 60 < 2/4 < 75 A central composite design with 20 runs is used for the experiment. The central composite design consists of a full 2 3 factorial, with settings coded as Â±1, axial points coded as Â±1.633, and six center points, coded as 0. The settings and responses for Â« the experiment are in Table 4.1. The data was fit to a full second order response surface polynomial, and the estimates of the coefficients are in Table 4.2.
PAGE 72
65 Table 4.2: Regression Coefficients and Standard Errors for Responses. A 2/i A 2/2 A 2/3 A 2/4 A 139.12 1261.11 400.38 68.91 A 16.49 268.15 99.67 1.41 A @2 17.88 246.50 31.40 4.32 As 10.91 139.48 73.92 1.63 Ai 4.01 83.55 7.93 1.56 A 2 3.45 124.79 17.31 0.06 &3 1.57 199.17 0.43 0.32 &2 5.13 69.38 8.75 1.63 As 7.13 94.13 6.25 0.13 As 7.88 104.38 1.25 0.25 Std. Error 5.61 328.69 20.55 1.27 A comparison of all of the proposed methods will be done using the regression equations given in Table 4.2, and limits given by Derringer and Suich. The optimization will be performed within a sphere of radius 1.633. Additional parameters need to be specified for some of the methods. The Derringer and Suich method uses artificial upper limits of 170 and 1300 for the first two responses, respectively. All of the shape parameters are set to 1. The Khuri and Conlon method would probably use 192 and 2271 as targets for the first two responses, these being the maximum values that can be attained by these responses within the experimental region. Ames et al. use 120 and 1000 as targets for the first two responses, though these targets are probably ill conceived, as they are the lower specification limits on the responses. A better choice would have been the upper limits specified by Derringer and Suich. The cost matrix used for ViningÂ’s method is C Â— S 1 , which is based on a multivariate control statistic. The results are in Table 4.3 for the proposed probability method, the desirability method of Derringer and Suich, the distance method of Khuri and Conlon, the weighted loss method of Ames et ah, and the mean square error (MSE) method of Vining. The probability
PAGE 73
66 Table 4.3: Comparison of Optimization Methods. Method Proposed Desirability Distance Wgtd Loss MSE Location x 1 0.329 0.050 0.534 0.461 0.073 x 2 0.863 0.145 1.536 0.283 0.408 xz 1.244 0.868 0.148 0.528 0.549 Response /S 2/i 131.1 129.4 166.3 122.7 138.7 A 2/2 1464 1300 1474 1069 1318 A 2/3 445.4 465.7 359.4 500.3 423.6 A 2/4 69.6 68.0 73.8 67.5 69.6 Probability 0.886 0.781 0.020 0.403 0.719 listed is the probability that all of the responses are within their specification, or P(120 < yi, 1000 < y 2 ; 400 < y 3 < 600; 60 < y 4 < 75) 600 /*75 Â‘OO POO 120 j 1000 J 400 r 0 ~ i / (27r)Â“ 2 i) ~ 1/2 exp((y y(x))"Z (yy(x)))dy, J 60 assuming that the response vector y is multivariate normal with mean vector y(x ) and variancecovariance matrix S. The Fortran program that calculated this probability is contained in Appendix B. The estimate of the variancecovariance matrix is 31.49 34.78 3.14 1.13 ' 34.78 108039.33 1489.08 30.36 3.14 1489.08 422.27 1.36 1.13 30.36 1.36 1.61 j As the table shows, the proposed probability method outperforms the other methods when the probability of being within specification is compared. The probability of 0.886 is much larger than Derringer and SuichÂ’s 0.781, and ViningÂ’s 0.719. The other two methods perform rather poorly in comparison, with Khuri and Conlon predicting a probability of 0.020, and Ames et al. predicting 0.403. Derringer and SuichÂ’s method seems to suffer from the choice of the artificial upper limit on y 2 . They choose an upper limit of 1300, above which the product is not
PAGE 74
67 thought to have any added value, and their optimal solution does indeed hit 1300. This level, however, is only 300 above the lower limit of 1000, which is less than the standard error of 327. The probability method, on the other hand, has this response at 1464, or roughly one and a half times the standard error. ViningÂ’s MSE method uses the same artificial upper limit of 1300 for y 2 and suffers similarly for it. The weighted loss method of Ames et al. used a target of 1000 for y 2 , which was not a good choice. The resulting optimum has y 2 = 1069, which is barely inside the specification limit of 1000. The Khuri and Conlon method suffers the most by ignoring any of the specifications and relying solely on targets. Their target for y\ is 192, and their resulting optimum has y\ Â— 166.3, which is 46.3 above the lower limit of 120. In terms of the standard error, 46.3 translates to 8.25 times the standard error of 5.61, clearly much farther away from the specification limit than it needs to be. For comparison, the proposed probability method has yi = 131.1, or approximately 2 times the standard error above the specification limit of 120. In the above comparison, ViningÂ’s MSE method suffers greatly due to the choice of 1300 as an upper limit for y 2 . Since Vining did not use this example in his paper, it is not clear whether he would have selected 1300 as an appropriate upper limit, since much of the experimental data is at or above this level. In order to investigate the effect of this choice, the analysis is done for several choices of an upper limit, with the results being given in Table 4.4. The final upper limit used is slightly larger than the largest value of 2/2 observed in the experiment. It is obvious that a higher upper limit for ?/2 improves the solution in terms of the probability of being within specification, but it is this dependence on the choice of an upper limit that causes problems for this method. A lower specification limit is given for y\ and y ^ , and each response should be as large as possible. Trying to define what an appropriate upper level should be can be quite difficult. It is assumed that Derringer and Suich used upper limits based on their knowledge of the process and
PAGE 75
68 Table 4.4: MSE Method for Different Choices of y 2 s Upper Limit. Limit *s) / A A \y 1 , 2 / 2 , 2 / 3 , 2 / 4 ) MSE Prob 1400 (0.106, 0.447, 0.587) (138.9, 1331, 422.6, 69.6) 53.8 0.7180 1600 (0.110, 0.446, 0.616) (138.5, 1333, 424.3, 69.6) 55.6 0.7351 1800 (0.114, 0.445, 0.647) (138.0, 1335, 426.2, 69.5) 58.2 0.7520 2000 (0.118, 0.442, 0.679) (137.4, 1338, 428.2, 69.4) 61.5 0.7684 2200 (0.122, 0.438, 0.712) (136.9, 1341, 430.3, 69.3) 65.5 0.7842 2400 (0.125, 0.434, 0.747) (136.3, 1345, 432.7, 69.2) 70.2 0.7990 product that they were experimenting with. If there is not enough information to set an appropriate upper limit, other methods may be attempted. The method proposed by Khuri and Conlon picked the largest fitted value that could be achieved in the experimental region. As the above example shows, such a choice may result in a solution that has one of the responses well outside of its specification limits. A similar situation can exist with the MSE approach. Setting the upper limit for y 2 to 2400 should imply that the upper limit for yi should be set at 200, since both values are slightly larger than the largest value observed in the experiment. These choices for the upper limit will give a result that is even worse than that of Khuri and Conlon. The maximum MSE is achieved at x = (0.611,1.514,0.009), which gives a response vector y = (171.8, 1548,342.1,73.8) which results in a probability of meeting all specifications of 0.002. Problems with the definition of an appropriate upper limit for Â’larger is betterÂ’ situations illustrates the advantage of using the proposed probability method. The probability method is not trying to get the response as close to some chosen target, but is trying to get it as far as possible from the specification limit. 4.3 Repeatability of a Multiresponse Optimization Experiment One issue that needs to be dealt with in optimization experiments is the question of how close the final optimal solution is to the true optimal solution. One way to look at this question is to define the true functional relationship of the
PAGE 76
69 response variables to the control variables, as well as the true variability associated with the responses, and repeatedly rerun the experiment using randomly generated data. If the experiment is simulated a large number of times, then some information about the closeness of the estimated final solution to the true solution can be measured. A simulation was carried out starting with the Derringer and Suich data set. The true functional relationship is defined as Ui Â— Xf3i + Â£Â»,*= 1, 2, 3, 4, where the elements of /? are the regression coefficients from the fit of the Derringer and Suich data, whose estimates are listed in Table 4.2, and e; is normal with mean 0 and standard deviation equal to the standard error from the fit of the Derringer and Suich data, also listed in Table 4.2. Given this information, the 20 run central composite design can be repeatedly simulated, the model fit, and the optimal solutions found for both the Derringer and Suich method and the currently proposed probability method. This was done 5000 times with the results shown in Figures 4.2 and 4.3. Figure 4.2 gives the distribution of each element of the deviation x x opt = (aq 0.329, x 2 0.863, x 3 + 1.244), where x opt is the optimal solution found by the probability method. Figure 4.3 gives the Euclidean distance between each solution x and the ideal maximum, as well as the distribution of the true probabilities found for each method. Table 4.5 gives the statistical information contained in the figures. It should be noted that the simulation was done on a Gateway personal computer with a 333 MHz Pentium processor in Microsoft Fortran Powerstation and took appoximately 16 hours to run. The Fortran program is listed in Appendix B. The interpretation of the graphs is somewhat difficult, as the desirability graphs for the individual elements of the x vector from Derringer and SuichÂ’s
PAGE 77
600 800 0 100 200 300 400 500 0 200 400 600 800 70 o O CM O * xl deviation probability xl deviation desirability x3 deviation probability x3 deviation desirability Figure 4.2: Distance from True Optimal x for Probability and Desirability Methods
PAGE 78
500 1000 1500 0 500 1000 1500 71 Distance for Probability Distance for Desirability 0.2 0.4 0.6 0.8 o o o o o o o CD O o C\J o J I 1 1 1 0.2 0.4 0.6 0.8 True Probability for Probability True Probability for Desirability Figure 4.3: Euclidean Distance from True Optimum and Probability for Probability and Desirability Methods.
PAGE 79
72 Table 4.5: Repeatability of Optimization Methods. Probability Mean Std. Dev. Desirability Mean Std. Dev. %l,opt %2 %2,opt ^3 *^ 3 , opt Distance from Optimum Probability 0.105 0.224 0.177 0.354 0.169 0.317 0.497 0.330 0.810 0.079 0.336 0.170 0.586 0.270 0.301 0.253 0.357 0.250 0.768 0.076 method are going to deviate more from the optimal solution x op t since they are being compared to a solution from the proposed probability method. For this reason, the central location of those graphs are further from 0.0 than the same graphs for the probability method. The variability of both methods are roughly equivalent, which points out that the precision of the two methods are approximately the same. The variability associated with either method is larger than what would be hoped for in an optimization experiment. The distance graphs give a similar interpretation, with variability being high for both methods, and the probability method being closer to 0.0 on the whole. The true probability graphs show that the probability method gives an optimal solution that has a much better probability of being within all specifications than the solution provided by the desirability method of Derringer and Suich. 4.4 Albumin Nanospheres Example Muller et al. (1996) describe an experiment to optimize the manufacturing process of albumin nanospheres. Albumin particles have been used as drug carrier systems, and, if small enough, they may be utilized as a sitespecific drug carrier. This sitespecific manner of drug delivery could eliminate possible severe adverse drug reactions due to systemic application. The three responses of interest are the yield, ?/i, particle size, y 2 , and polydispersity index, y 3 . The yield is calculated as the weight of the dried microspheres in relation to the sum of the starting material, and
PAGE 80
73 Table 4.6: Control Factors and Levels. Factor 1.664 1 0 +1 +1.664 Albumin (% w/v) 5 11 20 29 35 Aqueous phase volume (% v/v) 1.3 3.4 6.5 9.6 11.7 Duration of Emulsification (min) 5 9 15 21 25 Glutaraldehyde (mmol) 0.2 2.8 6.6 10.5 13 Drug (mg) 0 10 25 40 50 should be made as large as possible. The particle size is obtained from a scanning electron microscope, and is a measure of the mean diameter of the microspheres, and should be made as small as possible. The polydispersity index is a measure of the width of the size distribution, and should be made as small as possible. A fivefactor orthogonal central composite design was run to find the optimal settings of the control factors that will produce the most desirable nanospheres. The five factors that are thought to affect the production are albumin concentration, Xi, aqueous phase volume, x%, duration of emulsification, Â£3, glutaraldehyde concentration, X 4 , and amount of drug, x 5. The central composite design consists of a 2 5_1 fractional factorial, axial points, and three center point replicates. The coded and actual levels of the factors are listed in Table 4.6. The experimenters took a desirability function approach to finding the optimal settings, in the manner of Derringer and Suich. The limits for the responses of interest are given in Table 4.7. The original analysis in the paper does not adhere to the exact desirability approach presented by Derringer and Suich in their original paper. The purpose of the experiment was to obtain information about the process in order to design a new process which would result in smaller nanospheres with a smaller polydispersity. For this reason, they do not actually recommend an optimal setting of the control factors, they merely interpret the response surface graphs that they generated, in order to assist in the design of the new process. The data from this experiment will be used to compare the method that has been proposed here
PAGE 81
74 Table 4.7: Response Variables and Limits. Variable Lower Limit, y" an Upper Limit, yÂ™ ax Goal Yield (% w/w) 50 100 Maximize Particle Size (nm) 0 500 Minimize Polydispersity Index 0 0.2 Minimize with the desirability method of Derringer and Suich. In order to compare the two methods, the data will be used to fit the individual responses to full second order polynomial models in the control settings. These models will be used to generate the optimal settings for the two methods. In addition, the probability approach will need to estimate the variancecovariance matrix of the responses. The data from the experiment can be found in Table 4.8. After an initial attempt to analyze the data as it is, it was determined that the models for size y 2 and polydispersity y 3 would be better fit after a log transform. The naive approach of analyzing the data without a transformation led to estimates of the responses at the optimal settings that were negative. Clearly a negative size and a negative measure of dispersion were undesirable. The need to transform these data should not be surprising since both of the distributions of the responses are quite skewed. For simplicity, all of the responses were fit to full second order polynomials, given by 5 55 PiO + Pij x j + j = 1 j=l k>j for i = 1,2,3, where the ft are the least squares estimates of the true regression parameters. The fitted parameters for the responses are in Table 4.9. The two methods used to find the optimal settings of the control factors are the probability method and the desirability method. The probability method seeks the location x that maximizes the probability P(?/i > 50; 2/2 < 500; 2/3 < 0.2), where the transformed response vector (yi,ln(y 2 ),ln(yz)y is assumed to be multivariate normal with mean vector given by (y\,ln(y 2 ),ln{yz))' and variancecovariance matrix A
PAGE 82
75 Table 4 . 8 : Experimental Data . Xl %2 ^3 Â£4 Â£5 Vi 2/2 2/3 1 1 1 1 +1 78 222 0.077 +1 1 1 1 1 78 187 0.083 1 + 1 1 1 1 12 1186 0.586 +1 + 1 1 1 +1 89 286 0.274 1 1 +1 1 1 83 323 0.232 +1 1 +1 1 +1 80 182 0.112 1 + 1 +1 1 +1 8 1106 0.884 +1 + 1 +1 1 1 93 321 0.313 1 1 1 +1 1 91 277 0.178 +1 1 1 +1 +1 69 263 0.273 1 + 1 1 +1 +1 15 1781 0.878 +1 + 1 1 +1 1 81 296 0.294 1 1 +1 +1 +1 89 324 0.267 +1 1 +1 +1 1 70 197 0.110 1 + 1 +1 +1 1 75 228 0.089 +1 + 1 +1 +1 +1 11 708 0.664 1.664 0 0 0 0 90 330 0.350 + 1.664 0 0 0 0 93 372 0.394 0 1.664 0 0 0 76 162 0.107 0 + 1.664 0 0 0 7 1486 0.718 0 0 1.664 0 0 83 2024 0.498 0 0 + 1.664 0 0 99 381 0.394 0 0 0 1.664 0 68 234 0.093 0 0 0 + 1.664 0 81 215 0.039 0 0 0 0 1.664 73 220 0.126 0 0 0 0 + 1.664 84 220 0.093 0 0 0 0 0 97 218 0.101 0 0 0 0 0 82 221 0.073 0 0 0 0 0 74 217 0.083
PAGE 83
76 Table 4.9: Analysis of Regression. Vi ln(y 2 ) ln{y 3 ) A 86.254 5.589 2.260 A A 5.803 0.205 0.083 A A 17.124 0.487 0.517 A A 1.050 0.172 0.020 A 0.076 0.007 0.169 A 5.836 0.129 0.157 Ax 1.428 0.049 0.409 A2 16.630 0.170 0.303 As 1.248 0.081 0.472 A 4 4.711 0.112 0.248 As 3.267 0.119 0.037 4x2 13.000 0.127 0.014 4x3 7.625 0.142 0.044 4x4 12.375 0.137 0.182 As 1.000 0.085 0.114 A4 1.250 0.062 0.201 As 8.250 0.169 0.179 a 4 1.125 0.107 0.245 As 7.625 0.113 0.157 As 7.625 0.248 0.309 R 2 0.939 0.870 0.931 Adj R 2 0.785 0.544 0.757 4Mse 13.311 0.494 0.428
PAGE 84
77 given by A 177.17 3.67 3.13 3.67 0.24 0.13 3.13 0.13 0.18 \ / The desirability method seeks the location x which maximizes an overall desirability function D = (did 2 d 3 ) 1/3 , where di, the desirability of the first response, is defined by: j/i~ 50 10050 d x = < 0 \ 1 if 50 100 and d 2 and d 3 , the desirabilities of the second and third responses, respectively, are defined by: r max IJi yi y max ifO y max i if ili < o where the y? ax = 500 and y^ ax = 0.2. The optimal settings given by each method are given in Table 4.10. The probability listed is the probability for the given location x, P(?/i > 50; 2/2 < 500; y 3 < 0.2). The probability method results in a solution that predicts a smaller size and polydispersity than the desirability method, which is the result that was sought by the optimization experiment. The yield that is predicted is also smaller, though, which is not what is desired. The lower yield is still well above the lower limit of 50 that was given. Since the standard error for yield is 13.39, the predicted yield of 75.79% is approximately two standard deviations above the lower limit of 50.
PAGE 85
78 Table 4.10: Optimal Settings. Probability Desirability Xi 0.542 0.059 0.533 0.085 Â£3 0.257 0.095 Â£4 1.42 0.008 Â£5 0.326 0.083 A 2/1 75.79 83.79 ln(y 2 ) 5.09 5.64 ln(y 3 ) 2.83 2.19 A 2/2 161.60 282.14 /S 2/3 0.059 0.112 Probability 0.9303 0.7206 It should be noted that the assumption of normality is used to greatly ease the calculations that have been made. In the above example, two responses were transformed to make the assumptions of normality more realistic. If one were ambitious enough, a different distribution could be assumed for the responses, instead of using a transformation. For instance, the above approach used a log transform, i assumed that the transformed response was normal, whose mean depended on the control variables, and the variance was constant over the experimental area. Instead of this, the experimenter could have assumed a Weibull distribution, with the scale parameter depending on the control variables, and the shape parameter being constant. The optimization would still be to maximize the probability of being within specifications, but the distributional assumptions would be different. The probability of being within specifications is a multiple integral that would have to be calculated, but it could be performed, if desired.
PAGE 86
CHAPTER 5 MULTIPLE RESPONSE ROBUST PARAMETER DESIGN 5.1 A Description of the Proposed Method In situations where there are multiple responses and the variance of one or more of these responses cannot be assumed to be constant over the experimental region, the same method that has been used in other scenarios may be used, i.e. maximize the probability that the responses are within their specified limits under the assumption of normality. This method is an extension of the multiple response optimization, as well as the robust parameter design. Since the variance cannot be assumed to be constant over the experimental region, both the response and the standard deviation will be modeled from the experimental data for each of the responses of interest. These models will then be used to estimate the probability that the responses are within their respective limits. The final optimal setting of the control factors x will be that combination that maximizes the probability of meeting all specifications. Let X{ be a pxl vector representing the i th setting of the control variables and Dij be the j th response of interest at Xi, for i = 1, . . . , n and j = 1, . . . , r. Suppose that an appropriate linear model is Vij Â— f'( x i)Pj + Uj> (51) where f(xi) is a q m x 1 vector representing the appropriate polynomial expansion of Xi for the assumed model for the responses, and f3j is a q m x 1 vector of unknown coefficients for the j th response, and the e*.,are normally distributed with mean 0 and variance of. Furthermore, let the errors of the j th response be independent, or let 79
PAGE 87
80 Â€ij be independent of e^j, i ^ i ! . In other words, the errors for a given response are assumed to be independent, but correlation exists between the different responses. Consider an appropriate transformation of the variance h(Tij) = ajj, with a linear predictor defined by Tij = g{xi) 'jj, ( 5 . 2 ) where g(xi ) is a q v x 1 vector representing the appropriate polynomial expansion of Xi for the assumed model for the function of the variance, and 'Yj is a q v xl vector of unknown coefficients for the j th function of the variance. Common transformations of the variance include the log transformation (r^ = log{afj)) or the square root transformation (r^ = <7y). The first step of the optimization process will be selecting an appropriate experimental design that will allow for the estimation of the models for both the average response and the variance of the response (or an appropriate function of the variance). The model for the mean response is usually assumed to be a second order polynomial, so a response surface design is typically used. Since the variance will also need to be modeled, some replication of this design will be necessary in order to estimate the variance at different points in the experimental design. In addition to the average response and the variance, some estimate of the correlation of the responses will be necessary. With all of this estimation needed, the experimental design may need to be very large, which could make the experiment expensive and difficult to carry out. A central composite design with some replication will allow estimation of the functions of the response of interest and the variance, but the correlation structure will still need to be estimated. If the correlation structure is assumed to be constant over the experimental region, the addition of replicated center points could be used
PAGE 88
81 to estimate the correlation of the responses of interest, given by n c Pfj = T(y kJ yoj)(ykj' yoj')/( s oj s oj'), k Â— 1 where n c is the number of times the center point has been replicated, and y 0 j and s 0 j are the mean and the standard deviation of the j th response at the center point. Though the assumption of a constant correlation structure simplifies the optimization procedure, it may be a valid assumption in many cases, especially where physical characteristics are the responses of interest and their correlation is inherent to the product. Another possible way to estimate the correlation structure is to use the replicated points to form a pooled estimate of the correlation. In this case the correlation can be based on Y SC [I Â— X(X' X)~ l X']Y sc divided by the appropriate number of degrees of freedom, where Y sc is the matrix of responses, scaled by the estimated standard deviation, or y^/s ij. This estimate depends on the assumed model, thus any model misspecification could lead to a poor estimate of the correlation coefficients. Vining and Schaub (1996) give some recommendations for experimental designs that will be able to estimate the mean and the variance of a response and will still be small enough to be practical to run. One situation of interest is when the response of interest is assumed to be a second order response surface model of the control variables, and the variance (or appropriate function of the variance) is assumed to be a linear function of the control variables. In this case they suggest that a central composite design be used as the basis for the experiment. The full design will be used to estimate the second order response surface of the response of interest. In order to estimate the linear function of the variance, the factorial points of the central composite design will be replicated, which will allow for estimates of the variance at these points. The factorial portion of the design will be used to estimate the linear function of the variance.
PAGE 89
82 Suppose that an appropriate experiment has been run. Let be the nxl vector of the j th response, and let Tj be the nÂ„xl vector of the j th linear predictor. Here n v represents the number of distinct design points that have been replicated. From equation (5.1), we obtain Vj ~ X Pj + e i where X' = [/(aq), . . . , f(x n )}. Similarly, from equation (5.2), we obtain Tj = Zjj where Z' = [y(aq), . . . ,g(x n )]. At this point, the least squares estimates could be used to obtain estimates of the unknown parameters / 3 j and for j = 1, . . . , r. It is now possible to estimate, for a given setting of the control factors x, the mean of each of the responses and a function of the variance for each of the responses. These are given by Vj{x) = f(x)/ 3 j and fj(x) = g(x)'jj for j = 1 , . . . ,r, where = (X'X)~ 1 X'y j and 7 j = (Z'Z)~ 1 Z l Tj. In addition, the variancecovariance matrix of the responses can be estimated by S(Â»), whose i th diagonal element is h(fj), and the ij th offdiagonal element is p*j\/h(fj)/i(fj). The optimal setting of the control factors is the combination x that maximizes the probability that all of the responses y are within their specified limits, or The probability is calculated by assuming that the response vector y is multivariate A normal with mean vector y(x) and variancecovariance matrix E(a:). For responses that are to maximized, the upper limit is set equal to +00, and for responses that are to be minimized, the lower limit is set to Â—00. This method of maximizing the probability of being within specification has the appeal that it is an extension of the methods proposed in previous chapters.
PAGE 90
83 This will allow an experimenter to proceed in a similar fashion for any optimization problem. Each experimental situation will be slightly different with regards to the choice of experimental design, methods for estimating the parameters of interest, and the models that are assumed to be correct, but the optimal setting of the control factors x will be found in the same way, i.e. maximizing a probability. Pignatiello is the only other person to propose a method of optimization for the situation of multiple responses whose variance cannot be assumed constant. He proposes replicating every point in an experimental design and minimizing an estimate of the mean squared error, given by R(flj) = trace (CS (a;)) T (y(aj) Vtarget ) ^(sKÂ®) Vtarget) i where C is a cost matrix that must be specified. This method requires that the quantity R(x) be found for each point in the experimental design, and this quantity (or a function of this quantity) is to be fit to a predictive model in the control variables x. This approach needs a separate estimate of the mean and variancecovariance matrix at each point in the experimental design. The obvious estimate of the mean is the average of the replicated runs at that point y(x). The estimate of the variancecovariance matrix at a given experimental point is given by S(s), whose ij th element is given by n r ^2(yik Vi)(yjk yj)/ n r, k = 1 where n r is the number of times that the design point has been replicated. The mean squared error method proposed by Pignatiello does have some problems that need to be addressed. The requirement that the quantity R(x) be calculated at every design point means that the entire experiment must be replicated enough times to allow for proper estimation of all of the quantities in the equation. This total replication may lead to a very large experiment. A smaller experiment
PAGE 91
84 may be obtained by the use of partial replication, as described above. One could easily use the same estimations of the response and the variancecovariance matrix as are used for the proposed probability method, and then find the location of the control variables x that minimized the quantity R(x). This would, in essence, replace the probability function with the mean squared error function, with everything else remaining essentially the same. A more serious problem, perhaps, is that of selecting a target in the cases of larger or smaller is better, and of selecting the values for the cost matrix. Pignatiello admits that the targets are only applicable in the case of Â’target is best,Â’ and he does not recommend using this method for the other situations. The experimenter could conceivably use artificial targets, as Derringer and Suich recommend for multiple response optimization, but the final solution could depend greatly on the choice of these targets. Though he does give recommendations for selecting the values for the cost matrix, it still may be based on subjective choices which may greatly influence the final solution. These cost quantities may be less well known than the specifications required for the product. As was seen in the case of multiple response optimization, the final solution can depend greatly on the choices of targets and cost matrix values. The proposed probability method does have some shortcomings, also, such as ease of calculation, size of the experiment, and the ability to estimate all of the parameters. Since a computer program needs to be written, this method is not as straight forward as a method that could be done in a spreadsheet, such as Microsoft Excel. The program that was used for the example that follows is similar to that listed in Appendix B, and it should be fairly easy to modify for other applications. The potential large size of the experiment is not unique to the probability method. Any time both the mean response and the variance need to be estimated, replication of all or part of a traditional design will be needed.
PAGE 92
85 5.2 Oxygen Plasma Anodization Example A simulated example will be presented that deals with the frequency adjustment of quartzaluminum oscillator crystals. The following is a hypothetical process, and the description should be treated only as motivation for the simulated experiment that follows. The crystals have aluminum plating on top of a quartz substrate. The frequency of the crystals depends on the thickness of the quartz and the total mass of the aluminum plating. The crystal is adjusted to its final frequency by means of an oxygen plasma. The crystal is placed in an oxygen rich chamber near a high voltage anode. When voltage is applied to the anode, the surrounding oxygen becomes a plasma. When the crystal is placed near the plasma, the oxygen molecules combine with the aluminum molecules to create aluminum oxide. The extra oxygen molecules increase the mass of the aluminum plating, thus reducing the frequency of the crystal. The responses of interest in the frequency adjustment of the crystals are the total amount of frequency change that can be achieved, yi, and the final resistance of the crystal, y 2 . The total amount of frequency adjustment should be as large as possible, but must be greater than 60 kHz in order for the process to run efficiently. Past experience has shown that the range of frequencies after the initial aluminum plating is approximately 60 kHz, due mostly to differences in the thickness of the quartz (see Figure 5.1). The goal of initial plating is to have the frequency of the crystal 30 kHz higher than the final target frequency, on average. If this is achieved, then the frequency of the crystals range from target to 60 kHz above target. All of the crystals are then put into the oxygen plasma chamber in order to adjust all of them to the same final frequency. Due to process variability, not all lots of crystals are targeted properly at 30 kHz above final frequency, thus some crystals are below the final frequency and some are more than 60 kHz above the final target. Those crystals that are too low in frequency must be reworked. If the process cannot
PAGE 93
86 Figure 5.1: Frequency Distribution of Quartz Crystals. adjust the frequency more than 60 kHz, the crystals that are too high in frequency will be unable to be adjusted to final frequency, and must be reworked. If the range of frequency adjustment can be made greater than 60 KHz, then the frequency distribution can be shifted slightly, and the amount of crystals that need to be reworked could be reduced. It is for this reason that settings for the control variables x are sought that maximize the amount that the frequency can be adjusted. The resistance of the crystal is effected most by contaminants in earlier processing steps, and by specification must be held below 30 ohms. The control variables that affect the responses y are the distance that the crystal is from the anode, aq, and the pressure of the oxygen rich chamber, x When the pressure is too low, there is not enough oxygen available to bind to the aluminum plating on the crystal, causing the total possible frequency shift to be
PAGE 94
87 quite low. As the pressure increases, the amount of oxygen available increases, which in turn increases the amount that the frequency can be shifted. At some point, the increased pressure causes the plasma to be contained too close to the anode, which begins to reduce the amount that the frequency can be shifted. When the crystal is too far from the anode, none of the plasma is available to the aluminum, so the amount that the frequency can be adjusted is fairly small. As the crystal moves closer to the anode, it comes in contact with the plasma cloud that surrounds the anode, and thus the amount that the frequency can be adjusted increases. At some point the crystal can become too close and actually inhibit the flow of oxygen around the anode, thus reducing the plasma cloud and decreasing the amount that the frequency can be adjusted. The variability of the amount of frequency shift tends to increase as distance between the anode and the crystal decreases and the pressure increases. The resistance of the crystal can increase due to contaminants coming off of the anode and attaching to the crystal. This typically happens more at low pressures and when the distance between the anode and the crystal is small. The variability of the resistance follows the same pattern as the resistance itself, increasing as the distance and pressure decrease. A simulated experiment is performed whose goal is to find the settings of the control factors x that maximizes the probability that the resistance is less than 30 ohms, and the amount that the frequency can be adjusted is greater than 60 kHz. The true functional relationship between the responses, their variability, and the control factors is as follows: yi = 80.0 T l.Oxi T 0.5x2 Â— S.Ox^ 3.0x2 T 2. 0x^X2 y 2 = 20.0 + 0.5xi 0.5x 2 + 3.0xi + 2.0x2 + l0xix 2 o\ = 10.0 Â— 3.0xi + I.O.X 2
PAGE 95
88 Table 5.1: Experimental Data. X\ X2 2/i 2/2 Sl S 2 1 1 69.51801 40.09109 1 1 94.16965 27.49232 19.79 13.23 1 1 55.02909 13.63547 1 1 75.88098 12.99614 1 1 69.30468 24.13184 5.48 7.43 1 1 65.00851 27.08687 1 1 84.19299 27.77813 1 1 63.19381 15.60778 12.41 6.09 1 1 62.22724 21.59617 1 1 67.22293 29.85625 1 1 73.92506 20.72596 3.76 4.84 1 1 73.53838 22.49168 1.414 0 63.58725 24.38051 1.414 0 75.03255 33.32704 0 1.414 62.53672 21.47707 0 1.414 68.58016 29.29213 0 0 83.01640 5.34768 0 0 81.90584 8.20455 0 0 94.42947 14.14651 7.67 8.28 0 0 76.35438 27.28385 0 0 93.47378 22.04167 0 0 93.47162 17.18751 The covariance between the two responses is p \2 = 0.2. The setting of the control variables that maximizes the probability that both responses are acceptable is a; = (0.302,0.073), which results in a probability of 0.9098, and response vector y = (79.9,20.4). The standard deviation of the responses at this setting is o\ Â— 9.2 and 02 = 6.7. The experiment will be a central composite design with the factorial points replicated three times and the center point replicated six times. The axial points are not replicated. Data is simulated for the experiment and is in Table 5.1. The data from the experiment is used to fit a full second order model for the responses, a first order model for the standard deviation, and the correlation between the responses. All of the data is used to separately fit a full second order model for the responses.
PAGE 96
89 The fitted equations are: yi(x) Â— 87.1 + 0.8xi + 0.2x 2 7.5xJ 9.4x2 + l.lxix 2 y 2 (x) = 15.7 + 0.2X! + 0.2x 2 + 5.3x? + 3.5x2 + 2.1XiX 2 Just the five replicated points are used to fit a first order model for the standard deviation of the responses. The fitted equations are: &i(x) = 9.8 5.7x! 2.3x 2 cr 2 (x) = 8.0 Â— 1.8xi Â— 2.4 x 2 The six center points are used to estimate the correlation between the two responses. The estimated correlation is p\ 2 Â— Â—0.02, which is assumed to be constant over the experimental region. The optimal setting of the control variables x is the setting which maximizes P(t/i > 60; y 2 < 30) in the experimental region of radius \/2, under the assumption that y is multivariate normal with mean vector y and variancecovariance structure ^ a\{x) Pi2&i(x)v 2 (x) ' ^ Pi2^i{x)a 2 (x) &l(x) J The probability is calculated using the algorithm from Schervish, which has been described previously, and the maximum of the probability function is obtained from the IMSL algorithm NCONF, which finds the minimum of a function with nonlinear constraints. Based on this simulated experiment, the setting of the control variables that maximizes the probability of both of the responses being within their acceptable limits is x Â— (0.235,0.555). At this location in the experimental region, the true response vector is y = (79.6,20.8), and the probability is 0.8967. The standard deviation of the responses at this setting is o\ = 9.9 and a 2 = 6.6. A comparison of the true solution to the experimental solution is contained in Table 5.2. Though the
PAGE 97
90 Table 5.2: Comparison of Solutions. True Optimum Experimental Optimum Control Setting (0.302, 0.073) (0.235, 0.555) Mean Response (79.9, 20.4) (79.6, 20.8) Standard Deviation (9.2, 6.7) (9.9, 6.6) Probability 0.9098 0.8967 location of the control settings appear to be fairly far apart, the response, standard deviation, and probability are not that different. This is because the probability of being within acceptable limits is fairly flat near the true optimal setting. 5.3 Repeatability of Anodization Example The above analysis is a single experiment simulated from a hypothetical process and is only meant to be a step by step guide to finding the optimal setting of control factors that results in the maximum probability of being within acceptable limits. Since this is an analysis based on a single experiment, it would be beneficial to assess the repeatability of running such an experiment. Since the experiment is simulated, it may be easily repeated. The experimental data was regenerated 1000 times, each set of data being used to estimate the mean responses, standard deviations, correlation, and optimal setting of the control factors. The results are given in Table 5.3, which lists the true probability at the setting of the control factors and the distance the setting is from its true optimal setting Xi t0pt . As was the case with simple multiple response optimization, the repeatability of this simulated experiment is not particularly good. A scatter plot of the the locations of the experimental optima is in Figure 5.2, which has 50 and 90% confidence ellipses. This figure is perhaps a better indication that a single experiment may not do a good job in finding a point that is close to the true optimum. As was seen with the case of multiple response optimization, the lack of repeatability does not seem to be a problem that is isolated to one method of optimization, but seems to be a problem
PAGE 98
91 Table 5.3: Repeatability of Experiment. True Probability %l %l,opt ^2, opt Mean 0.821 0.099 0.022 Std. Dev. 0.081 0.564 0.735 Median 0.847 0.108 0.011 75th %ile 0.891 0.533 0.467 25th %ile 0.750 0.277 0.524 that will be encountered in any optimization experiment where a relatively small amount of data is generated.
PAGE 99
92 Figure 5.2: Location of Optimal Settings.
PAGE 100
CHAPTER 6 CONCLUSION AND FURTHER RESEARCH The probability method has been shown to work well for the situations of robust parameter design, multiple response optimization, and multiple response robust parameter design. In each case, it performs as well as or better than the methods that have been previously proposed. The analyses have also shown some serious potential problems with some of the other proposed methods. The proposed probability method is a unified and intuitive approach to the optimization problems that have been presented here. The analysis that has been done does point to some areas in the field of optimization that still need to be addressed. Perhaps the most important area that needs to be looked at has to do with the repeatability of any optimization experiment. As has been shown, a single optimization experiment may give an optimal setting that is far from the true optimum. This is a fact that experimenters must be aware of before they begin an such an experiment. Much cost and effort * could go into an experiment that returns a suggested combination of the control settings that is far from optimal. Such a situation could lead some to become skeptical of not only optimization experiments, but other statistical techniques, as well. For this reason, there needs to be more research on the repeatability of optimization experiments, and possible ways to make them more repeatable. It is unclear if there are experimental designs that would lead to more repeatable experiments, or if there are conditions on the responses that must be met in order for the repeatability to be more acceptable. There may also be other analysis techniques that could make the repeatability better, such as using prior production 93
PAGE 101
94 data in a Bayesian manner to improve prediction, or using confirmation experiments to either verify the results of the first experiment or to improve the prediction in the first experiment. A single optimization experiment will probably not result in the true optimal settings of the control variables. The experiment should, however, result in settings of the control variables that are close to the optimal settings, and methods should be explored that will most easily find these optimal settings. This should be part of a method of continuous improvement, where the process is constantly trying to be improved. Continuous improvement will also help to avoid problems that may arise from natural changes to the process, such as equipment wear, changes in raw materials, etc. Though most of the analyses that were performed here relied on normality, this assumption may be able to be relaxed. Since the normality was only used to simplify the calculation of the probability, it is conceivable that other distributional assumptions could be used. In fact, a Weibull distribution was used in an example for robust parameter design. If other distributions can be used in the univariate case, they may also be able to be used in the multivariate cases.
PAGE 102
APPENDIX A SOLVER COMMAND IN MICROSOFT EXCEL The following is a description of how to use Microsoft Excel to find the levels of the control variables that will maximize the probability that the response of interest is within its specification limits. This description is for robust parameter design situations where there is only on response of interest whose variance depends on the levels of the control variables. The situations where there are multiple responses is covered in appendix B. The numbers used below are from the printing press example given in chapter 3. The following description also contains information on the other methods that have been used to analyze this data set. Figures A.l and A. 2 show the spreadsheet and the solver command window. The spreadsheet has the optimal solutions for the proposed probability method, the constrained optimization method of Vining and Myers, the squared error loss method of Lin and Tu, and the constrained optimization method of Copeland and Nelson for two choices of A. In each of the methods, the cell below the cell labeled Â’Â’MeanÂ” is calculated from equation 3.1, with the xi,X 2 ,and Xs replace by the appropriate cell. Similarly the cell below Â”Std DevÂ” is calculated from equation 3.2. Similarly, all of the methods calculate the distance from the origin to the point x , which is in the cell under Â’Â’Radius.Â” For example, in the area for the probability method, the mean in cell DIO is given by =327.62963 + 177 * A10 + 109.42593 * BIO + 131.46296 * CIO (A.l) + 32 * A10 * A10 22.38889 * BIO * BIO 29.05556 * CIO * CIO + 66.027778 * A10 * BIO + 75.472222 * A10 * CIO + 43.583333 * BIO * CIO 95
PAGE 103
96 X Microsoft Excel vm dissert IB Be Edit jgew Insert Fgrra* loots Oat Disa III Arlai it. * b I u \ m 'A ."T B~ ; C ""p ~1 F 1 iTarqet is Best Example for Printing Study Data 2 C ms isis am ; ,,M ,u b r iti''Â’Â’Â’ TWWTi 5* 4 Probability Method 5 Lower Limit Target Upper Limit Probability 81 490 500 510 0.1759 S ^Optimal Information 9 1x1 x2 x3 Mean Std Dev Radius 10 Â“8S6 0.003 0.182 494.61 44.66 1 CopelandNelson Target Mean Delta Max Delta Probability 500 f 5.00 5 0.1759 Optimal Information xl x2 x3 Mean Std Dev Radius 0.983 0.004 0.182 495.00 44.71 1 Â— 12 iVining Myers CopelandNelson 13 Target Mean Probability Target Mean Delta Max Delta Probability 500 ! 0.1746 500 1.00 1 0.1751 4il >,4.>*Â«*>Â»Â» .,WÂ«vwmsvw.. ... ..V. .wy. .V.SW .Â«* . ~~ .> 16 (Optimal Information 17 1x1 x2 x3 Mean Std Dev Radius Optimal Information xl x2 x3 Mean Std Dev Radius m 0.984 0.025 0.175 500.00 45.32 1 0.984 0.021 0.177 499.00 45.20 1 m 1 a 4 23 :UnTu 21 Target Mean Loss Probability 500 2023.446 0.1759 j 1 1 i i M 241 Optimal Information pjxl x2 x3 Mean Std Dev Radius III 0.933 0.002 0.182 494.53 44.65 1 v i . tfT~ * r 'i i Figure A.l: Microsoft Excel Spreadsheet. and standard deviation in cell Dll is given by =34.883248 + 11.526786 * 410 + 15.323036 * BIO + 29.190296 * CIO (A.2) + 4.2037439 * 410 * 410 1.31585 * BIO * B10 + 16.777879 * CIO * CIO + 7.7194614 * 410 * B10 + 5.1092608 * 410 * CIO + 14.081718 * B10 * CIO. The radius given in cell E10 is calculated by = SQRT(A10 * 410 + B10 * B10 + CIO * CIO). Finding the optimal point for the probability method begins by calculating the probability of being within specifications. The probability of being between the lower limit (cell A6) and the upper limit (cell C6) is calculated in cell D6 by = NORM DIST(C0, DIO, E10, TRUE) NORM DIST(A0, D10, E10,TRUE).
PAGE 104
97 Figure A. 2: Window for Solver Command. The function NORMDIST gives the cumulative distribution of a given value for a normal function with a given mean and standard deviation. In order to maximize the probability calculated in cell D6, the user should first enter starting values of a;i,x 2 ,and x 3 into cells A10, BIO, and CIO, respectively. Then the user selects the Â” SolverÂ” option from the Â’Â’ToolsÂ” menu. Selecting Â’Â’SolverÂ” will open up the window given in Figure A. 2. The target cell is the probability calculated in D6, which is what is to be maximized. The maximization is to be done by changing cells A10, BIO, and CIO, which can be entered as Â”$A$10:$C$10.Â” A constraint can be added which keeps the search within a radius of 1, which is entered as Â”$F$10<=1.Â” When all of this is entered, click on the button labeled Â’Â’Solve,Â” and the algorithm will find the location of the control variables x that maximizes the probability of being within specification. The other methods are solved in a similar manner. The Vining and Myers method uses the solver command in a manner similar to that of the probability method. In this case, however, the target cell is E18, the standard deviation of the response, which should be minimized. An addition constraint is added that keeps the mean in cell D18 at 500. The method of Lin and Tu needs to calculate a loss in
PAGE 105
98 cell B22, which is given by = (Â£>26 A22) * (Â£>26 A22) + Â£26 * Â£26, or the bias squared plus the variance. With this method, the target cell is the loss in cell B22, and it is minimized. For the method of Copeland and Nelson, a delta is calculated in cell 16 (or cell 114), which is the absolute value of the difference of the mean in cell K10 (or cell K18) from the target mean of 500 in cell H6 (or cell H14). In this method, the target cell is the standard deviation in cell L10 (or cell L18), which is to be minimized. An additional constraint is that the delta calculated in cell 16 (or cell 114) be less than the given maximum delta given in cell J6 (or cell J14). For each of the methods, the probability of being within specifications is calculated, and this is done in the same fashion as for the probability method.
PAGE 106
APPENDIX B FORTRAN CODE FOR MULTIVARIATE NORMAL PROBABILITY The following code was used to find the maximum probability of being within specification for the multiple response optimization problem. It also calculates the probability of being within specifications for the solutions that the other methods have suggested. The subroutine mulnor makes up a majority of the program, and it can be found on the internet web page for StatLib at http://lib.stat.cmu.edu. The subroutine is algorithm 195 from the Applied Statistics section. The numbers in the example below are from the tire tread compound experiment described in chapter 4. program exl use msimsl USE PORTLIB integer N, ny ,xl,x2,x3 parameter (ny=4) parameter (N=3) dimension Xds(M) ,Xkc(N) ,Xamesl(N) ,Xames2(N) ,y(ny) ,Xmax(N) ,vining(N) data Xds/0 . 050,0 . 145,0 .868/ data Xkc/0. 534, 1.536,0. 148/ data Xamesl/0 . 461 , 0.283, 0.528/ data Xames2/0.379, 0.548, 0.640/ data vining/0.073, 0.408, 0.549/ INTEGER IPARAMC7), ITP, L, N0UT, MAXFCN REAL F, FSCALE, RPARAM(7) , X(N) , XGUESS(N), XLB(N) , XSCALE(N) REAL XUB(N), FT0L ,maxF 99
PAGE 107
4 100 integer tempi, temp2 REAL (8) elapsed_time EXTERNAL mprob ! XGUESS is the starting point of the search algorithm DATA XGUESS/ 0.5, 0.5, 0.5/ data XSCALE/1 .0,1.0, 1.0/, FSCALE/1.0/ ! XLB and XUB are the lower and upper bounds of the search DATA XLB/1. 633, 1.633, 1.633/, XUB/1 . 633, 1 . 633, 1 . 633/ I TP = 0 ! Default parameters are used IPARAM(l) = 0 MAXFCN=5000 FT0L=1 . OE5 elapsed_time = TIMEFO i ! BCP0L is a NelderMead type search algorithm ! Starting at XGUESS, it returns the value X that minimizes the ! function mprob (a subroutine contained in this program) . The ! value F is the value of the function at the minimum. i ! Note that since the probability it to be maximized, the ! subroutine mprob returns the negative of the true probability. i CALL BCP0L (mprob, N, XGUESS, ITP, XLB, XUB, FT0L, MAXFCN , X, F) elapsed_time = TIMEFO print*, X ( 1 ) ,X(2) ,X(3) , MAXFCN, F call evaluate (X,y)
PAGE 108
101 print*,y(l), y(2) , y(3), y(4) print* , elapsed_time ! The probability is calulated for the DerringerSuich method, print* , "DerringerSuich" call mprob(N, Xds, F) print*, F call evaluate (Xds ,y) print*, y(l), y(2), y(3) , y(4) ! The probability is calulated for the KhuriConlon method, print*, Â’KhuriConlonÂ’ call mprob(N, Xkc, F) print*, F call evaluate (Xkc, y) print*, y(l), y(2), y(3) , y(4) ! The probability is calulated for the Ames, et al. method, print* , * Ames Â’ call mprob(N, Xamesl, F) print* ,F call evaluate (Xamesl ,y) print*, y(l), y(2), y(3) , y(4) ! The probability is calulated for the Vining method, print* , Â’ViningÂ’ call mprob(N, vining, F) print* ,F call evaluate (vining, y) print*, y(l), y(2), y(3) , y(4) i
PAGE 109
102 The following will find the value of the probability for a grid of points in the experimental region. This data file can be exported and used to generate contour plots of the probability in the region of interest. It may be eliminated if it is not needed. open (3 , FILE= ; probnl . dat ; ) elapsed_time = TIMEFO temp2=163**2 maxF=0 . 0 do xl=16,16 do x2=16 , 16 templ=xl**2+x2**2 if (templmaxF) then maxF=F Xmax(l)=X(l) Xmax(2)=X(2) Xmax(3)=X(3) end if
PAGE 110
end if write (3, ; (F10.5,A,F10.5,A,F10.5,A,F10.5) ; ) X ( 1 ) , & ,X(2),> >,X(3),' >,F end do end if end do end do elapsed_time = TIMEF () elapsed_time print* ,Xmax(l) ,Xmax(2) ,Xmax(3) ,maxF call evaluate (Xmax,y) print* ,y(l) , y(2), y(3) , y(4) print* , elapsed_time close (3) end ! ! This is the start of the subroutines SUBROUTINE mprob (N, X, F) ! This returns the probability F at a point X. ! ! The key is the subroutine mulnor, which calculates the ! probability of being within a rectangular region, assuming ! a multinormal distribution. i INTEGER N REAL X(N) , F parameter (ny=4) integer i
PAGE 111
104 dimension a(ny) ,b(ny) ,sig(ny*(nyl)/2) ,y(ny) ,yu(ny) ,yl(ny) dimension stdev(ny) integer inf (ny) , if ault real prob, bound, eps data eps/0.0001/ ! sig gives the upper triangular portion of the correlation matrix data sig/0. 01885893, 0.02720418, 0.22046146, 0.15876509, & 0.072886, 0.05203539/ ! inf gives the types of bounds on the responses: ! 0 is used if the upper bound is infinity ! 1 is used if the lower bound is infinity ! 2 is used if both bounds are specified data inf /0, 0,2, 2/ ! yl and yu are the lower and upper bounds of the ! rectangular region data yl /120 . 0, 1000 . 0,400. 0 ,60 . 0/ , yu /500.0, 1500.0,600.0,75.0/ ! Since the individual responses are assumed to have a variance of ! 1, they must be scaled by the standard deviations, data stdev /5.6112, 328.69, 20.549, 1.2674/ call evaluate (X,y) do i=l,ny a(i)=(yu(i)y(i) ) /stdev (i) b ( i) = (yl ( i) Â— y ( i) ) /stdev (i) end do call mulnor (a, b, sig, eps ,ny, inf , prob, bound, if ault) F=prob * (1) end
PAGE 112
105 subroutine evaluate (x,y) ! This evaluates the DerringerSuich fit for a given x. dimension y(*),x(*) y Cl) = 139 . 1192387+16 . 4936447 *x ( 1) + 17 . 8807 651*x (2) + & 10 . 9065385*x(3) 4 . 0096009*x ( 1) **23 . 4471056*x (2) **2 y Cl) = y ( 1 ) Â— 1 5721213*x(3) **2+5 . 125*x(l)*x(2)+7 . 125*x(l) *x(3) & +7 . 875*x(2) *x(3) y (2) = 1261. 133138+268. 151102*x(l) +246. 503174*x(2) +139. 484533*x(3) & 83 . 565885*x ( 1) **2124 . 815539*x (2) **2 y (2) = y(2)+199.181747*x(3)**2+69.375*x(l)*x(2)+94.125*x(l)*x(3) & +104 . 375*x(2) *x(3) y (3) = 400 . 384575499 . 6664161*x (1) 31 . 3963948*x (2) 73 . 9190024*x (3) & +7 . 9326889*x (1) **2+17 . 3076104*x (2) **2 y (3) = y(3)+0.4327517*x(3)**2+8.75*x(l)*x(2)+6.25*x(l)*x(3) & +1 . 25*x(2) *x(3) y (4) = 68 . 909615211 . 40984528*x (1) +4 . 31968553*x (2) +1 . 63484452*x (3) & +1 . 55768153*x (1) **2+0 . 05769409*x (2) **2 y (4) = y (4)0 . 31730277*x(3) **21 . 625*x(l) *x(2)+0 . 125*x(l) *x(3) & 0.25*x(2)*x(3) end The following is the Fortran program that was used to generate the simulation results for the repeatability of multiple response optimization. The subroutines of the program that are identical to the previous program are not included, program DSsim use msimsl USE P0RTLIB
PAGE 113
106 integer N, NOBS , NIND ,LDX,NC0EF, INTCEP integer M, ME, IPRINT, IERSVR, IPACT, ISACT, ICODE parameter (INTCEP=1, NIND=9, N0BS=20, LDX=N0BS, NCOEF=INTCEP+NIND) parameter (N=4, M=2, ME=0, IPRINT=0, IERSVR=0, IPACT=1, ISACT=0) real X(3),y(N), yvec(NOBS), R(N) real stdev(N) ,xdesign(20,3) ,B(NC0EF), SST, SSE real XGUESS(3), XLB(3), XUB(3), F ,F1 ,F2,F3,FT0L real XG1(3), XG2(3) real XSCALE(3) , FSCALE, RPARAM(7) ,minF common /fitted/ bmatrix(10,4) common /sighat/ sig(6) , sstmat (4) , ssemat(4) common /mstuff/ siginv(4,4), xpxinv(10 , 10) common /actual/ ymat(20,4), xmatrix(20,9) integer i,j, k, ITP, IPARAM(7) ,MAXFCN, loopiter EXTERNAL ds, mprob, getmaxyl, getmaxy2, kc, dssphere, mprobsp data stdev /5 .61 ,328 . 69 ,20 . 55, 1 . 27/ DATA XSCALE/ 1.0, 1.0, 1.0/, FSCALE/ 1 . 0E0/ DATA XLB/1. 633, 1.633, 1.633/, XUB/1 . 633, 1 . 633, 1 . 633/ This gets the design points and generates the design matrix call f illxdes (xdesign) call fillxmat (xdesign) call RNOPT (7) call RNSET(O) Open a file to put all of the solutions into
PAGE 114
107 open (3 , FILE= Â’ solution . dat Â’ ) Select how many iterations of the simulation to perform do ii=l,1000 print* ,Â’ Iteration = Â’,ii This fills the y matrix with randomly generated data do i=l ,20 do j=l ,3 x(j)=xdesign(i, j) end do call fittedy(x.y) CALL RNNOR (N, R) do j=l ,4 ymat (i , j ) =y ( j ) +R( j ) *stdev ( j ) end do end do Now we are going to fit all of the data to a full second order polynomial, do j=l ,4 do i=l,20 yvec(i)=ymat(i, j) end do CALL RLSE (NOBS, yvec, NIND, xmatrix, LDX, INTCEP, B, SST, SSE) do i=l, NCOEF bmatrix(i, j)=B(i)
PAGE 115
108 end do sstmat (j)=SST ssemat (j)=SSE end do call getsig Now to see if we can maximize the whole thing First using Derringer and Suich's method minF=l . 0 ytar(l)=0.0 ytar(2)=0.0 This finds a good starting point, which is used for all methods, do i=10, 10 x(l)=real(i)/10.0 do j=10 , 10 x(2)=real(j)/10.0 do k=10,10 x(3) =real (k) / 10 . 0 call ds(N, x, F) call gety(x,y) if (F . LT. minF) then minF=F XGUESS(l)=x(l) XGUESS(2)=x(2) XGUESS(3)=x(3)
PAGE 116
109 end if if (y ( 1 ) .GT. ytar(l)) then ytar(l)=y(l) XGl(l)=x(l) XG1 (2)=x(2) XG1 (3)=x(3) end if if (y (2) .GT. ytar(2)) then ytar(2)=y(2) XG2(l)=x(l) XG2(2)=x(2) XG2(3)=x(3) end if end do end do end do Occasionally the starting point is too close to the maximum, so the maximization algorithm fails. If this is the case, the starting point is perturbed a bit and the maximization is tried again. loopiter=0 do while (loopiter .LT. 3) CALL ERSET (IERSVR, IPACT, ISACT) MAXFCN=5000 This algorithm returns the maximum within the spherical
PAGE 117
no ! region of interest. The optimal point is x, with the ! desirability given as F. The desirability is calculated in ! the subroutine dssphere. CALL NCONF (dssphere, M, ME, N, XGUESS, ITP, XLB, XUB, & XSCALE, IPRINT, MAXFCN , x, F) ICODE = IERCD ( ) if ( (x ( 1 ) .EQ. XGUESS (1) ) .or. (ICODE .EQ. 2)) then do i=l,3 XGUESS ( i ) =XGUESS ( i ) +0 . 1 end do else loopiter=3 end if loopiter=loopiter+l F= (1 . 0) *F print*,x,F, ICODE end do ! The following gives additional information for the point x. call mprob(N, x, FI) Fl= (1 . 0) *F1 call trueprob(N, x, F2) F2= (1 . 0) *F2 call trueds(N, x, F3) F3= (1 . 0) *F3 call gety(x,y) ! And it is all written to a file. write(3,99) x,y,F, F3, F2, ICODE
PAGE 118
Ill ! Now try the probability method IPARAM(l) = 0 loopiter=0 ! A few tries are allowed for the same reason as above, do while (loopiter .LT. 3) CALL ERSET (IERSVR, IPACT, ISACT) MAXFCN=5000 ! Same as above. The subroutine mprobsp returns the ! probability, which is maximized. CALL NCONF (mprobsp, M, ME, N, XGUESS, ITP, XLB, XUB, & XSCALE, IPRINT, MAXFCN, x, F) ICODE = IERCD ( ) if ( (x(l) .EQ. XGUESS (1) ) .or. (ICODE .EQ. 2)) then do i=l,3 XGUESS ( i ) =XGUESS ( i ) +0 . 1 end do else loopiter=3 end if loopiter=loopiter+l F=(1.0)*F print*,x,F, ICODE end do ! Again, more details about the point x. call trueprob(N, x, F3) F3= (1 . 0) *F3
PAGE 119
112 call gety(x,y) ! And write to file. write(3,98) x,y,F, F3, FI, ICODE end do close (3) 99 FORMAT ( Â’ DS Â’ ,3F10.5,2X,4F15.4,2X,3F10.5,I6) 98 FORMAT ( Â’MEÂ’ , 3F10 . 5 , 2X.4F15 . 4, 2X, 3F10 . 5 , 16) end ! subroutine fittedy(x.y) ! This is the assumed true functional relationship ! between the responses and the control factors, real x(3) ,y (4) y(l)=139 . 1192387+16.4936447*x(l)+17.8807651*x(2)+10.9065385*x(3) y(l)=y(l)4.0096009*x(l)**23.4471056*x(2)**21.5721213*x(3)**2 y(l)=y(l)+5.125*x(l)*x(2)+7.125*x(l)*x(3)+7.875*x(2)*x(3) y (2) =1261. 133138+268. 151102*x(l) +246. 503174*x(2)+139.484533*x (3) y (2)=y (2)83 . 565885*x(l) **2124.815539*x(2) **2+199 . 181747*x(3) **2 y (2)=y (2)+69 . 375*x(l) *x(2)+94 . 125*x(l) *x(3)+104.375*x(2)*x(3) y (3) =400 . 384575499 . 6664161*x (1) 31 . 3963948*x (2) 73 . 9190024*x (3) y (3)=y (3)+7 . 9326889*x(l) **2+17. 3076104*x(2) **2+0 .4327517*x(3) **2 y(3)=y(3)+8.75*x(l)*x(2)+6.25*x(l)*x(3)+1.25*x(2)*x(3) y (4) =68 . 909615211 . 40984528*x (1) +4 . 31968553*x (2) +1 . 63484452*x (3) y(4)= y (4)+1.55768153*x(l)**2+0.05769409*x(2)**20.31730277*x(3)**2 y(4)=y(4)1.625*x(l)*x(2)+0.125*x(l)*x(3)0.25*x(2)*x(3) end i
PAGE 120
subroutine gety(X, Y) ! This returns the values of the responses, based ! on the fitted equations that were generated from ! the simulated data, real X(3), Y(4) integer i common /fitted/ bmatrix(10,4) do i=l, 4 Y ( i) =bmatrix ( 1 , i) +bmatrix (2 , i ) *X ( 1) +bmatrix (3 , i) *X (2) + & bmatrix(4,i)*X(3) Y(i)=Y (i)+bmatrix(5 , i)*X(l) **2+bmatrix(6 , i)*X(2) **2+ & bmatrix(7 , i) *X(3) **2 Y(i)=Y(i)+bmatrix(8,i)*X(l)*X(2)+bmatrix(9,i)*X(l)*X(3) & bmatrix(10,i)*X(2)*X(3) end do end ! subroutine getsig ! This estimates the sigma hat matrix, which must be ! done for each simulated set of data. It is ! complicated, but it does work correctly, common /actual/ ymat(20,4), xmatrix(20,9) common /mstuff/ siginv(4,4), xpxinv(10, 10) common /sighat/ sig(6) ,sstmat(4), ssemat(4) real xmat (20 , 10) .tempi (20 ,20) , temp2(20, 10) , temp3(4,20) real xpx(10,10), sigma(4,4) ,sigmal(4,4) ,sse(4,4) integer i, j, k, NRA, NCA, LDA, NB, LDB, NCY
PAGE 121
114 data NRA /20/, NCA /10/, LDA /20/, NB /10/, LDB /10/, NCY /4/ do i=l,20 xmat (i , 1)=1 .0 do j=l ,9 k=j+l xmat ( i , k) =xmatr ix ( i , j ) end do end do CALL MXTXF (NRA, NCA, xmat, LDA, NB, xpx, LDB) CALL LINRG (NB, xpx, LDB, xpxinv, LDB) CALL MRRRR (NRA , NCA , xmat , LDA , NB , NB , xpxinv , LDB , NRA , NB , temp2 , LDA) CALL MXYTF ( NRA , NCA , t emp2 , LD A , NRA , NC A , xmat , LDA , NRA , NRA , t emp 1 , LD A ) do i=l ,20 do j=l , 20 if (i .EQ. j) then tempi (i, j)=1.0templ(i, j) else tempi (i, j)=templ(i, j)*(1.0) end if end do end do CALL MXTYF (NRA , NCY , ymat , LDA , NRA , NRA .tempi , LDA , NCY , NRA , temp3 , NCY) CALL MRRRR (NCY , NRA , temp3 , NCY , NRA , NCY , ymat , LDA , NCY , NCY , s igma , NCY ) do i=l ,4 do j=l ,4 sigma (i , j)=sigma(i , j ) / 10 .0 if (i .EQ. j) then
PAGE 122
115 sse(i, j)=1.0/sqrt(sigma(i, j)) else sse(i, j)=0.0 end if end do end do CALL LI NRG (NCY, sigma, NCY, siginv, NCY) CALL MRRRR (NCY , NCY , sse , NCY , NCY , NCY , sigma , NCY , NCY , NCY , sigmal , NCY) CALL MRRRR (NCY , NCY , sigmal , NCY , NCY , NCY , sse , NCY , NCY , NCY , sigma , NCY) ! This gives the upper triangular portion of the ! correlation matrix, which is needed to calculate ! the probability of interest. sig(l)=sigma(2, 1) sig(2)=sigma(3, 1) sig(3)=sigma(3,2) sig(4)=sigma(4, 1) sig(5)=sigma(4,2) sig(6)=sigma(4,3) end i SUBROUTINE mprob (N, X, F) ! This returns the probability based on the ! simulated data. INTEGER N REAL X(N) , F parameter (ny=4) integer i
PAGE 123
116 dimension a(ny) ,b(ny) ,y(ny) ,yu(ny) ,yl(ny) common /sighat/ sig(6) ,sstmat(4), ssemat(4) dimension stdev(ny) integer inf (ny) , if ault real prob, bound, eps data eps/0.0001/ data inf /0, 0,2, 2/ data yl /120.0, 1000. 0,400. 0,60.0/, yu /500.0, 1500.0,600.0,75.0/ do i1,4 stdev (i) =sqrt (ssemat (i) / 10.0) end do call gety(X,y) do i=l,ny a(i)=(yu(i)y(i))/stdev(i) b (i) = (yl ( i) y (i) ) /stdev (i) end do call mulnor (a, b,sig, eps ,ny, inf , prob, bound, if ault) F=prob * (1) end i SUBROUTINE mprobsp (M, ME, N, X, ACTIVE, F, G) ! This is used in the maximization algorithm. It is ! basically the same as mprob, but adds the nonlinear ! constraints that the solution must be within a ! radius of 1.633. INTEGER N, M, ME X(*) , F, G(*) , dist REAL
PAGE 124
117 logical ACTIVE (*) parameter (ny=4) integer i dimension a(ny) ,b(ny) ,y(ny) ,yu(ny) ,yl(ny) common /sighat/ sig(6) ,sstmat(4), ssemat(4) dimension stdev(ny) integer inf (ny) , if ault real prob, bound, eps data eps/0.0001/ data inf /0, 0,2, 2/ data yl /120 . 0 , 1000 . 0,400 . 0,60 . 0/ ,yu /500. 0,1500. 0,600. 0,75.0/ do i=l,4 stdev(i) =sqrt (ssemat (i) / 10.0) end do call gety(X,y) do i=l,ny a(i)=(yu(i)y(i))/stdev(i) b ( i) = (yl ( i) Â— y ( i) )/stdev(i) end do call mulnor (a, b,sig, eps ,ny, inf , prob, bound, if ault) F=prob * (1) dist=sqrt (X ( 1 ) **2+X(2) **2+X(3) **2 ) if (ACTIVE (1) ) G(l)=dist if (ACTIVEC2)) G (2) =1 . 633dist end i SUBROUTINE trueprob (N, X, F)
PAGE 125
118 ! This returns the probability based on the assumed ! fit obtained from the original DerringerSuich data. INTEGER N REAL X (N) , F parameter (ny=4) integer i dimension a(ny) ,b(ny) ,y(ny) ,yu(ny) ,yl(ny) common /sighat/ sig(6) ,sstmat(4), ssemat(4) dimension stdev(ny) integer inf (ny) , if ault real prob, bound, eps data eps/0.0001/ data inf/0,0,2,2/ data yl /120.0, 1000. 0,400. 0,60.0/, yu /500. 0,1500. 0,600. 0,75.0/ data stdev /5.6112, 328.69, 20.549, 1.2674/ call fittedy(X,y) do i=l,ny a(i)=(yu(i)y (i) ) / stdev (i) b ( i)=(yl(i)y(i)) /stdev (i) end do call mulnor (a , b , sig , eps , n , inf , prob , bound , if ault) F=prob * (1) end i subroutine ds(N, X, F) ! This returns the desirability based on the ! simulated data.
PAGE 126
119 integer N real X(N) , F, Y(4) , d call gety(X,Y) if (Y (1) .LT. 120.0) then d=0 . 0 else if (Y(l) .GT. 170.0) then d=l . 0 else d=(Y(l) 120.0)/ (170. 0120.0) end if end if if (Y (2) .LT. 1000.0) then d=0 . 0 else if (Y (2) .GT. 1300.0) then d=d*l . 0 else d=d* (Y(2) 1000 . 0) / (1300 . 01000 . 0) end if end if if ( (Y (3) .LT. 400.0) .OR. (Y(3) .GT. 600.0)) then d=0 . 0 else if (Y (3) .LT. 500.0) then d=d* (Y (3) 400 . 0) / (500 . 0400 . 0) else
PAGE 127
d=d* (600 . 0Y (3) ) / (600 . 0500 . 0) end if end if if ( (Y (4) .LT. 60.0) .OR. (Y(4) .GT. 75.0)) then d=0 . 0 else if (Y (4) .LT. 67.5) then d=d* (Y (4) 60 . 0) / (67 . 560 . 0) else d=d* (75 . 0Y (4) ) / (75 . 067 . 5) end if end if d=sqrt (sqrt (d) ) F=d* (1.0) end i subroutine dssphere(M, ME, N, X, ACTIVE, F, G) ! This is used by the maximization algorithm. ! It returns the desirability based on the ! simulated data, with the added nonlinear ! constraint that the solution must be within ! a radius of 1.633. integer N, M, ME real X(*) , F, G(*), Y(4), d, dist logical ACTIVE (*) call gety(X,Y) if (Y (1) .LT. 120.0) then
PAGE 128
121 d=0 . 0 else if (Y(l) .GT. 170.0) then d=l . 0 else d=(Y(l) 120.0)/ (170. 0120.0) end if end if if (Y (2) .LT. 1000.0) then d=0 . 0 else if (Y (2) .GT. 1300.0) then d=d*l . 0 else d=d* (Y (2) 1000 .0)/ (1300 . 01000 . 0) end if end if if ( (Y (3) .LT. 400.0) .OR. (Y(3) .GT. 600.0)) then d=0 . 0 else if (Y (3) .LT. 500.0) then d=d* (Y (3) 400 . 0) / (500 . 0400 . 0) else d=d* (600 . 0Y (3) ) / (600 . 0500 . 0) end if end if if ( (Y (4) .LT. 60.0) .OR. (Y(4) .GT. 75.0)) then
PAGE 129
122 d=0 . 0 else if (Y (4) .LT. 67.5) then d=d* (Y (4) 60 . 0) / (67 . 560 . 0) else d=d* (75 . 0Y (4) ) / (75 . 067 . 5) end if end if d=sqrt (sqrt (d) ) F=d* (1 . 0) dist=sqrt (X(l) **2+X(2) **2+X(3) **2) if (ACTIVE (1) ) G(l)=dist if (ACTIVE (2) ) G(2)=l .633dist end i subroutine trueds(N, X, F) ! This returns the true desirability obtained ! from the fit of the original Derringer and ! Suich data, integer N real X(N), F, Y(4), d call fittedy(X,Y) if (Y (1) .LT. 120.0) then d=0 . 0 else if (Y (1) .GT. 170.0) then d=l .0
PAGE 130
123 else d=(Y(l) 120.0)/ (170. 0120.0) end if end if if (Y (2) .LT. 1000.0) then d=0 . 0 else if (Y (2) .GT. 1300.0) then d=d*l . 0 else d=d* (Y (2) 1000 . 0) / ( 1300 . 01000 . 0) end if end if if ( (Y (3) .LT. 400.0) .OR. (Y(3) .GT. d=0 . 0 else if (Y (3) .LT. 500.0) then d=d* (Y (3) 400 . 0) / (500 . 0400 . 0) else d=d* (600 . 0Y (3) ) / (600 . 0500 . 0) end if end if if ( (Y (4) .LT. 60.0) .OR. (Y(4) .GT. d=0 . 0 else if (Y (4) .LT. 67.5) then 600.0)) then 75.0)) then d=d* (Y (4) 60 . 0) / (67 . 560 . 0)
PAGE 131
124 else d=d* (75 . 0Y (4) ) / (75 . 067 . 5) end if end if d=sqrt (sqrt (d) ) F=d* (1 . 0) end i subroutine f illxmat (xdesign) ! Some of the fits and estimations need the ! full design matrix, which this generates, common /actual/ ymat(20,4), xmatrix(20,9) real xdesign (20 ,3) integer i do i=l,20 xmatrix(i , l)=xdesign(i , 1) xmatrix(i,2)=xdesign(i,2) xmatr ix ( i , 3) =xdesign ( i , 3) xmatrix(i ,4)=xdesign(i , 1) **2 xmatrix(i ,5)=xdesign(i ,2)**2 xmatr ix (i , 6) =xdesign (i , 3) **2 xmatrix(i ,7)=xdesign(i , 1) *xdesign(i , 2) xmatrix(i ,8)=xdesign(i , 1) *xdesign(i ,3) xmatrix(i ,9)=xdesign(i ,2)*xdesign(i ,3) end do end i
PAGE 132
125 subroutine f illxdes (xdesign) ! This puts the design points into a matrix. ! It should be in a file, but this was easier ! given my time constraints, real xdesign(20,3) xdesign(l , 1)=1 . 0 xdesign (1 ,2)=l .0 xdesign(l ,3)=1 . 0 xdesign(2, 1) = 1 . 0 xdesign(2,2)=l .0 xdesign(2,3)=l .0 xdesign(3, 1)=1 .0 xdesign (3 , 2) =1 . 0 xdesign(3,3)=l . 0 xdesign (4, 1) = 1 . 0 xdesign(4,2) = l . 0 xdesign (4 , 3) =1 . 0 xdesign (5, 1)=1 . 0 xdesign(5,2)=l .0 xdesign(5,3)=l .0 xdesign(6,l)=1.0 xdesign (6, 2)=l . 0 xdesign (6 ,3)=1 . 0 xdesign (7, 1)=1 . 0 xdesign (7 , 2)=1 . 0 xdesign(7,3)=l .0 xdesign (8, 1)=1 . 0
PAGE 133
126 xdesign (8 , 2) =1 . 0 xdesign(8,3)=l .0 xdesign(9, 1)=1.633 xdesign (9 ,2) =0 . 0 xdesign(9 , 3) =0 . 0 xdesign (10, 1)=1 . 633 xdesign(10,2)=0.0 xdesign(10,3)=0.0 xdesign (11 , 1)=0 . 0 xdesign (11 ,2) =l . 633 xdesign(ll,3)=0.0 xde sign(12,l)=0.0 xdesign(12,2)=l .633 xdesign ( 12 , 3) =0 . 0 xdesign (13, 1)=0 . 0 xdesign(13,2)=0 . 0 xdesign(13,3)=l .633 xdesign( 14 , 1) =0 . 0 xdesign ( 14 , 2) =0 . 0 xdesign (14, 3) =1 .633 xdesign (15, 1)=0.0 xdesign (15, 2) =0.0 xdesign ( 15 , 3) =0 . 0 xde sign(16,l)=0.0 xdesign (16, 2) =0.0 xdesign (16, 3) =0.0 xdesign (17, 1)=0.0
PAGE 134
xdesign(17,2)=0. xdesign(17,3)=0 . xdesign(18, 1)=0 . xdesign(18,2)=0. xdesign(18 ,3)=0 . xdesign(19 , 1)=0 . xdesign(19,2)=0. xdesign(19,3)=0. xdesign(20,l)=0. xdesign(20 ,2)=0 . xdesign (20 , 3) =0 . 0 0 0 0 0 0 0 0 0 0 0 end
PAGE 135
APPENDIX C BASIC OUTLINE OF AN OPTIMIZATION EXPERIMENT The following is a general description of all of the steps necessary to carry out an optimization experiment, whether it has one response of interest or multiple responses of interest. Select the Responses of Interest and the Control Factors An optimization experiment begins by identifying the responses of interest and what control factors may affect them. The responses of interest are typically the important quality characteristics of the product being investigated. Information gathered from prior experimentation or knowledge of the process being investigated will help to identify control factors that affect the responses of interest. For each of the responses of interest, upper and lower specification limits will also need to be chosen. Select an Appropriate Experimental Design The appropriate experimental design depends on the functional form that is assumed to exist between the responses of interest, the yi, and levels of the control factors, the Xj. Most optimization experiments assume that the responses of interest can be estimated by second order polynomial models of the control factors. This model can be written as p p p Vi = A) T ^ ^ T ^ ^ ^ ' fiijXiXj + Cj, 2Â—1 2Â—1 j>i where the f3 are unknown regression parameters, e is an error term, and p is the number of control factors. If a second order polynomial model is assumed to be 128
PAGE 136
129 true, the experiment must have at least three levels of each of the control variables. Popular response surface experimental designs that can support the fit of a second order polynomial model are the central composite design and the 3 3 factorial, which are detailed in books on response surface methods, such as those by Myers and Montgomery (1995) or Khuri and Cornell (1996). If the variance of one or more of the responses also depends on the levels of the control factors, some of the experimental points will have to be replicated in order to obtain an estimate of the variance at those points. The number of experimental points that need to be replicated will depend on the functional form that is assumed exist between the variance and the levels of the control factors. A good treatment of the issues involved with this can be found in Vining and Schaub (1996). Once the form of the experimental design is chosen, appropriate levels of the response variables must be chosen. The choice of levels may be made based on prior experimentation or knowledge of the process being investigated. The highest and lowest level of the control variable should be made as far apart as is feasible, with the other levels determined by the choice of the experimental design. Estimation of the Regression Parameters Once the experimental design and appropriate levels of the control factors are chosen, the experimental points can be properly randomized and the experiment run. The data from the experiment are then used to estimate the unknown regression parameters /3. The estimates are generally obtained from computer packages, such as Minitab, SAS, Splus, Microsoft Excel, etc. Any other unknown quantities, such as the variancecovariance matrix, will also be estimated from the experimental data using similar software packages.
PAGE 137
130 Maximizing the Probability of Being Within Specification Appendix A contains details for robust parameter design situations, where there is only one response of interest whose variance depends on the levels of the control variables. Only minor changes to the description given in appendix A would be necessary for a new optimization experiment. The equations in appendix A that define the mean and the standard deviation (equations A.l and A. 2, respectively) would have to match the regression equations obtained from the new experiment. Appendix B contains a Fortran program that can be modified to solve the problem for situations where there is more than one response of interest. For the multiple response optimization situations, where there is more than one response whose variances do not depend on the levels of the control variables, a number of changes would have to be made to the program given in appendix B. The numbers given in appendix B match the numbers obtained from the tire tread example given in chapter 4. Specifically, the changes necessary would be: Â• The parameters ny and N would have to be specified correctly in the main program. The parameter ny should equal the number of response variables, while the parameter N should equal the number of control variables. The ny parameter also needs to be set in the subroutine mprob. The lines in question are parameter (ny=4) parameter (N=3) since the example has 4 responses of interest and 3 control parameters. Â• The upper and lower bounds for the control parameters are set in a data line for XLB and XIJB in the main program. The line in question is DATA XLB/1. 633, 1.633, 1.633/, XUB/1 . 633, 1 . 633, 1 . 633/ since the limits of the example are Â±1.633.
PAGE 138
131 Â• The starting point for the algorithm is set in a data line for XGUESS in the main program. The line in question is DATA XGUESS/0 . 5 , 0.5, 0.5/ since the starting point used in the example was x' = (0.5, 0.5, Â—0.5). The experimenter should use a number of different starting points from different areas of the experimental region. This will help to eliminate the possibility that the algorithm settles on a local maximum, rather than the global maximum. Â• The upper triangular portion of the correlation matrix must be given in a data line for sig in the mprob subroutine. Note that this is the correlation matrix, not the variancecovariance matrix. The correlation matrix must have diagonal elements equal to 1. The lines in question are data sig/0. 01885893, 0.02720418, 0.22046146, 0.15876509, & 0.072886, 0.05203539/ since this is the correlation matrix from the example. The correlation matrix can be obtained from statistical packages, such as Minitab, Splus, or SAS. Â• The bounds for each response variable, as well as the type of response variable must be set in the mprob subroutine. The upper and lower bounds are set in a data line for yu and yl, respectively. The type of response is set in a data line for inf. If the response is to be maximized, inf should be set to 0. If the response is to be minimized, inf should be set to 1. If the response has a specific target, inf should be set to 2. The lines in question are data inf /0, 0,2, 2/ data yl /120. 0,1000. 0,400. 0,60.0/, yu /500. 0,1500. 0,600. 0,75.0/ since these are the limits of the four responses from the example given. In the example, the first two responses were to be maximized, thus the first two values of inf are 0, and the last two responses had targets, thus the next two values of inf are 2.
PAGE 139
132 Â• The standard deviations of each of the responses must be entered in a data line for stdev in the mprob subroutine. The standard deviation is typically equal to the mean squared error from the regression analysis. Also note that if S is a diagonal matrix of the standard deviations of the responses, then the variancecovariance matrix can be obtained by preand postmultiplying the correlation matrix by S. The line in question is data stdev /5.6112, 328.69, 20.549, 1.2674/ since these were the standard deviations from the example for the four responses. Â• The regression equations for each of the responses y(i ) must be specified in the subroutine evaluate. The first equation from this subroutine begins y (1) = 139. 1192387+16. 4936447*x(l)+17 . 8807651*x(2) + which matches the coefficients in table 4.2. The multiple response robust parameter design situation, where there is more than one response of interest, and the variancecovariance matrix depends on the levels of the control factors, can be solved using a Fortran program similar to that in appendix B. An additional change to those listed above would be: Â• The standard deviations would not be entered into a data line in the mprob subroutine, rather they would have their own separate regression equations. These equations could be put into the evaluate subroutine or into a separate subroutine.
PAGE 140
REFERENCES Ames, A. E., Mattucci, N., MacDonald, S., Szonyi, G., and Hawkins, D. M. (1997), Â“Quality Loss Functions for Optimization Across Multiple Response Surfaces,Â” Journal of Quality Technology , 29, 339346. Box, G. E. P. and Draper, N. R. (1987), Empirical Model Building and Response Surfaces, New York: Wiley. Box, G. E. P. and Wilson, K. B. (1951), Â“On the Experimental Attainment of Optimum Conditions,Â” Journal of the Royal Statistical Society, Series B, 13, 145. Box, M. J. (1965), Â“A New Method of Constrained Optimization and a Comparison with Other Methods,Â” The Computer Journal, 8, 4252. Byrne, D. M. and Taguchi, S. (1987), Â“The Taguchi Approach to Parameter Design,Â” Quality Progress, 20(12), 1926. Copeland, K. A. F. and Nelson, P. R. (1996), Â“Dual Response Optimization via Direct Function Minimization,Â” Journal of Quality Technology , 28, 331336. Del Castillo, E. and Montgomery, D. C. (1993), Â“A Nonlinear Programming Solution to the Dual Response Problem,Â” Journal of Quality Technology, 25, 199204. Derringer, G. and Suich, R. (1980), Â“Simultaneous Optimization of Several Response Variables,Â” Journal of Quality Technology, 12, 214219. Draper, N. R. (1963), Â“Ridge Analysis of Response Surfaces,Â” Technometrics, 5, 469479. Gnanadesikan, R. (1977), Methods For Statistical Data Analysis of Multivariate Observations, New York: Wiley. Hoerl, A. E. (1959), Â“Optimal Solution of Many Variables Equations,Â” Chemical Engineering Progress, 55, 6978. Kackar, R. N. (1985), Â“OffLine Quality Control, Parameter Design, and the Taguchi Method (with discussion),Â” Journal of Quality Technology, 17, 176209. Khuri, A. I. and Cordon, M. (1981), Â“Simultaneous Optimization of Multiple Responses Represented by Polynomial Regression Functions,Â” Technometrics, 23, 363375. 133
PAGE 141
134 Khuri, A. I. and Cornell, J. A. (1996), Response Surfaces: Designs and Analyses , New York: Marcel Dekker. Kim, K. J. and Lin, D. K. J. (1998), Â“Dual Response Surface Optimization: A Fuzzy Modeling Approach,Â” Journal of Quality Technology, 30, 110. Kotz, S. and Johnson, N. L. (1993), Process Capability Indices, London: Chapman & Hall. Lai, Y. J. and Chang, S. I. (1994), Â“A Fuzzy Approach for Multiresponse Optimization: An OffLine Quality Engineering Problem,Â” Fuzzy Sets and Systems, 63, 117Â— 129. Laviolette, M., Seaman, John W., J., Barrett, J. D., and Woodall, W. H. (1995), Â“A Probabilistic and Statistical View of Fuzzy Methods (Disc: P262292),Â” Technometrics, 37, 249261. Lawless, J. F. (1982), Statistical Models and Methods for Lifetime Data, New York: Wiley. Lin, D. K. J. and Tu, W. (1995), Â“Dual Response Surface Optimization,Â” Journal of Quality Technology, 27, 3439. Muller, B. G., Leuenberger, H., and Kissel, T. (1996), Â“Albumin Nanospheres as Carriers for Passive Drug Targeting: An Optimized Manufacturing Technique,Â” Pharmaceutical Research, 13, 3237. Myers, R. H. and Carter, W. H. (1973), Â“Response Surface Techniques for Dual Response Systems,Â” Technometrics, 15, 301317. Myers, R. H., Khuri, A. I., and Vining, G. (1992), Â“Response Surface Alternatives to the Taguchi Robust Parameter Design Approach,Â” American Statistician, 46, 131139. Myers, R. H. and Montgomery, D. C. (1995), Response Surface Methodology: Process and Product Optimization Using Designed Experiments, New York: Wiley. Nair, V. N. (1992), Â“TaguchiÂ’s Parameter Design: A Panel Discussion,Â” Technometrics, 34, 127161. Nelder, J. A. and Mead, R. (1965), Â“A Simplex Method for Function Minimization,Â” The Computer Journal , 7, 308313. Pignatiello, Joseph J., J. (1993), Â“Strategies for Robust Multiresponse Quality Engineering,Â” HE Transactions , 25, 515. Rodriguez, R. N. (1992), Â“Recent Developments in Process Capability Analysis,Â” Journal of Quality Technology, 24, 176187.
PAGE 142
135 Shoemaker, A. C., Tsui, K. L., and Wu, C. F. J. (1991), Â“Economical Experimentation Methods for Robust Design,Â” Technometrics , 33, 415427. Vining, G. G. (1998), Â“A Compromise Approach to Multiresponse Optimization,Â” Journal of Quality Technology , 40, 309313. Vining, G. G. and Myers, R. H. (1990), Â“Combining Taguchi and Response Surface Philosophies: A Dual Response Approach,Â” Journal of Quality Technology , 22, 3845. Vining, G. G. and Schaub, D. (1996), Â“Experimental Designs for Estimating Both Mean and Variance Functions,Â” Journal of Quality Technology, 28, 135147. Welch, W. J., Yu, T. K., Kang, S. M., and Sacks, J. (1990), Â“Computer Experiments for Quality Control by Parameter Design,Â” Journal of Quality Technology , 22, 1522. \
PAGE 143
BIOGRAPHICAL SKETCH Philip McGoff was born in 1961 in Owatonna, Minnesota, the youngest of seven children. He graduated from the University of Chicago in 1983 with a bachelor of arts degree in mathematics and physics. After a number of years working as an engineer, computer programmer, and high school teacher, he decided in 1995 to continue his education in statistics at the University of Florida. After graduation, he plans to use his statistics degree in an industrial setting, working with the design of experiments and process optimization. 136
PAGE 144
I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy. G. Geoffrey Vining, Chairman Associate Professor of Statistics I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy. John A. Cornell Professor of Statistics I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doatoii of Philosophy. IB w * * J/nfes P. Hobert Assistant Professor of Statistics I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy. Andrew Rosalsky Professor of Statistics I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philoso] Dale W. Kirmse Associate Professor of Chemical Engineering
PAGE 145
This dissertation was submitted to the Graduate Faculty of the Department of Statistics in the College of Liberal Arts and Sciences and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy. May 2000 Dean, Graduate School

