MULTIFIDELITY GLOBAL DESIGN OPTIMIZATION INCLUDING PARALLELIZATION POTENTIAL

By

STEVEN E. COX

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2002

For my wife, Amanda, and my parents, who have always supported me.

ACKNOWLEDGMENTS

I would like to thank my advisor, Dr. Raphael Haftka, who provided me with the opportunity and funding to pursue a PhD. He has been very helpful in exploring new avenues of research and in securing opportunities and contacts to broaden my exposure to the state of the art in the optimization field. He has been a patient instructor and a wealth of knowledge and experience to draw from.

My thanks go especially to my parents, who provided me with the education and encouragement that made this dissertation possible. They sacrificed to send me to good schools and took an active interest in my education. They are responsible for my strong work ethic and instilled a deep desire to never stop learning new things. Together with the rest of my family, they provided the support which has allowed me to get where I am now.

I would like to thank my wife, Amanda, who encouraged me to complete my research and never complained about the long hours and travel required. She has been supportive and enthusiastic about my work, which has made my life much easier and all of the effort worthwhile.

I would also like to thank the past and present members of the Structural and Multidisciplinary Optimization Group for their friendship and support over the last five years. Satchi Venkataraman, Melih Papila and Raluca Rosca were wonderful sources of information on the various requirements for graduate students, including publications, classes, examinations and this dissertation. Laurent Grosset, Jaco Schutte, Amit Kale and Xueyong Qu were very helpful for discussing ideas and provided additional viewpoints on research as well as different cultures.
I would like to thank my committee members for their guidance and support of my research. Their comments and advice made this dissertation a much more useful document. I would also like to thank Dr. B. J. Fregly for standing in at my defense when Dr. George was unable to attend.

I am grateful for the financial support provided by NASA grant NAG21179, which covered most of my graduate career, and from William Hart through Sandia National Laboratories, which provided the funding to allow me to finish my graduate work. I am also grateful for the use of the Ross parallel cluster of processors at Sandia, where some of the parallel experiments were run. William Hart was also very helpful in the formulation of the DIRECTBP algorithm and the conversion of my code into more robust C++ libraries.

TABLE OF CONTENTS

page

ACKNOWLEDGMENTS ..... iii
LIST OF TABLES ..... viii
LIST OF FIGURES ..... ix
ABSTRACT ..... xi

1 INTRODUCTION AND LITERATURE SURVEY ..... 1
    Global Optimization ..... 5
    Genetic Algorithms ..... 6
    Simulated Annealing ..... 8
    Particle Swarm ..... 10
    Multistart Methods ..... 11
    Pattern Search Methods ..... 14
    Lipschitzian Optimization ..... 16
    Approximation Methods for Optimization and Reduced Cost ..... 18
    Multifidelity Optimization ..... 21
    Multidisciplinary Optimization ..... 24
    Parallel Computing ..... 26
    Overview of Research ..... 28

2 PERFORMANCE OF GLOBAL OPTIMIZATION ALGORITHMS: NOISE VERSUS WIDELY SEPARATED OPTIMA ..... 30
    Optimizers ..... 31
    Sequential Quadratic Programming ..... 31
    Dynamic Search ..... 31
    The DIRECT Algorithm ..... 33
    Constraints with DIRECT ..... 38
    Test Functions ..... 40
    Griewank Function ..... 40
    Quartic Function ..... 44
    HSCT Design Problem ..... 47
    Comparison Concluding Remarks ..... 53

3 DIRECTBP ..... 55
    Original DIRECT Algorithm ..... 56
    Convergence Behavior ..... 56
    Demonstration of Local Convergence Difficulty ..... 58
    The DIRECTBP Algorithm ..... 61
    Implementation ..... 66
    Identifying a Positive Spanning Set ..... 67
    Maintaining a Positive Spanning Set ..... 69
    Limiting A ..... 70
    Local Search ..... 71
    Experimental Comparisons ..... 72
    Test Problems ..... 72
    Experimental Methods ..... 74
    Test Results ..... 75

4 MULTIFIDELITY DIRECT ..... 80
    Correction Methods ..... 80
    Constant Correction Ratio ..... 81
    Linear Correction Response Surface ..... 83
    Test Problems ..... 88
    Test Results ..... 90
    Sixth Order Test Problem ..... 90
    Griewank Function ..... 94
    Algorithm Status ..... 98

5 PARALLEL DIRECT ..... 100
    Previous Parallel DIRECT ..... 100
    DIRECT Modifications ..... 102
    Accelerated Start ..... 102
    Continuous DIRECT ..... 104
    Message Traffic ..... 108
    Performance ..... 115
    Alternate Convex Hull Calculation ..... 118
    Algorithm Status ..... 121

6 CONCLUSIONS ..... 123

A SEQUENTIAL DIRECT C++ LIBRARY ..... 126
B DIRECT AND DIRECTBP ..... 130
C PARALLEL DIRECT CODE ..... 169
D MULTIFIDELITY DIRECT CODE ..... 205
    The CCF Multifidelity DIRECT Code ..... 205
    The LRS Multifidelity DIRECT Code ..... 230

LIST OF REFERENCES ..... 257
BIOGRAPHICAL SKETCH ..... 263

LIST OF TABLES

Table  page

1 Comparisons for 5, 10, and 20 DV Griewank functions ..... 42
2 Effect of variation of d in 10-dimensional Griewank function ..... 43
3 Comparisons for 5, 10, and 20 DV quartic functions ..... 45
4 Comparison of 5, 10, 15, and 20 DV HSCT problem ..... 50
5 Fifteen DV optima for DOT and DIRECT ..... 51
6 Comparison of DIRECT and DIRECTBP on f1 of Eq. 15 ..... 76
7 Comparison of the average performance (30 runs) of DIRECT and DIRECTBP on the ten-dimensional Griewank function ..... 77
8 Comparison of the performance of DIRECT and DIRECTBP on the third test problem, Eq. 19 ..... 78
9 Comparison of the performance of DIRECT and DIRECTBP on the design of a CSTR, Eq. 21 ..... 78
10 Multifidelity versions of DIRECT vs. single fidelity DIRECT for the sixth order test problem, Eq. 25 ..... 91
11 Multifidelity DIRECT comparison for modified sixth order problem, Eq. 28, 29 ..... 92
12 Multifidelity constant correction factor DIRECT with modified potentially optimal box selection ..... 93
13 Multifidelity DIRECT comparison for Griewank function ..... 94
14 Multifidelity DIRECT comparison for modified Griewank function ..... 96
15 Performance of the parallel DIRECT algorithms on a 20-DV Griewank function ..... 116
16 Parameters in object-oriented DIRECT library code ..... 127

LIST OF FIGURES

Figure  page

1 False local optima due to noise ..... 2
2 Function with true local optima ..... 2
3 Illustration of single-point crossover in genetic algorithm ..... 7
4 Example of a single-point mutation ..... 7
5 Tunneling to locate a different objective function valley ..... 13
6 Nine steps of a pattern search with 4 possible step directions ..... 15
7 Lipschitzian optimization ..... 35
8 One-dimensional example of DIRECT ..... 35
9 Point selection in DIRECT ..... 36
10 The DIRECT box division in 2D ..... 36
11 Griewank function in one dimension ..... 40
12 Quartic function in one dimension ..... 44
13 Definitions of design variables ..... 48
14 Distribution of design variables for the best 25 designs found by DOT for the 10 DV HSCT case ..... 53
15 Instance where the current best box does not contain the local optimum ..... 58
16 Possible distribution of points for potentially optimal box selection ..... 58
17 A one-dimensional plot of f1(x) (Eq. 15) in the range x = [0,3] ..... 59
18 Distance between c*(t) and the global optimum for Eq. 15 in ten dimensions ..... 60
19 Value of f(c*(t)) for f1 (Eq. 15) in ten dimensions as the optimizations progress ..... 60
20 An illustration of the sets of vectors Tt used for X*(t) at various positions in the feasible domain ..... 63
21 Two-dimensional example of vectors failing to form a positive spanning set ..... 65
22 One-dimensional Griewank function with d = 100 ..... 73
23 Constant correction scheme for multifidelity DIRECT ..... 81
24 High-fidelity function in one dimension ..... 88
25 Low-fidelity function in one dimension ..... 89
26 Modified Griewank function and low-fidelity approximation ..... 96
27 Initial box division for parallel DIRECT ..... 103
28 Alternate initial box division for parallel DIRECT ..... 104
29 Continuous parallel DIRECT organization ..... 105
30 Test problem in one dimension for parallel DIRECT (e = 0.2) ..... 108
31 Initial message pattern for 4 processors using Prob. B ..... 110
32 Final message pattern for 4 processors using Prob. B ..... 111
33 Final message pattern for 4 processors using Prob. A ..... 111
34 Initial message pattern for 16 processors using Prob. B ..... 113
35 Final message pattern for 16 processors using Prob. B ..... 113
36 Initial message pattern for 25 processors using Prob. B ..... 114
37 Final message pattern for 25 processors using Prob. B ..... 114
38 Comparison of computational efficiency of the four parallel DIRECT algorithms ..... 116
39 Outline of existing parallel DIRECT code arrangement ..... 119
40 Modification to the DIRECT code to reduce idle time ..... 120

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

MULTIFIDELITY GLOBAL DESIGN OPTIMIZATION INCLUDING PARALLELIZATION POTENTIAL

By

Steven E. Cox

December 2002

Chair: Raphael Haftka
Department: Mechanical and Aerospace Engineering

The DIRECT global optimization algorithm is a relatively new space partitioning algorithm designed to determine the globally optimal design within a designated design space. This dissertation examines the applicability of the DIRECT algorithm to two classes of design problems: unimodal functions where small amplitude, high frequency fluctuations in the objective function make optimization difficult; and multimodal functions where multiple local optima are formed by the underlying physics of the problem (as opposed to minor fluctuations in the analysis code). DIRECT is compared with two other multistart local optimization techniques on two polynomial test problems and one engineering conceptual design problem.

Three modifications to the DIRECT algorithm are proposed to increase the effectiveness of the algorithm. The DIRECTBP algorithm is presented, which alters the way DIRECT searches the neighborhood of the current best point as optimization progresses. The algorithm reprioritizes which points to analyze at each iteration, in order to encourage analysis of points that surround the best point but that are farther away than the points selected by the DIRECT algorithm. This increases the robustness of the DIRECT search and provides more information on the characteristics of the neighborhood of the point selected as the global optimum.

A multifidelity version of the DIRECT algorithm is proposed to reduce the cost of optimization using DIRECT. By augmenting expensive high-fidelity analysis with cheap low-fidelity analysis, the optimization can be performed with fewer high-fidelity analyses.
Two correction schemes are examined, using high- and low-fidelity results at one point to correct the low-fidelity result at a nearby point. This corrected value is then used in place of a high-fidelity analysis by the DIRECT algorithm. In this way the number of high-fidelity analyses required is reduced and the optimization becomes less expensive.

Finally, the DIRECT algorithm is parallelized to allow its implementation on multiple processors. Two master-slave implementations are proposed, using an arbitrary number of processors to speed the analysis of points for the optimization. The two methods are compared on a heterogeneous collection of processors, with special attention to computational and algorithmic efficiency.

CHAPTER 1
INTRODUCTION AND LITERATURE SURVEY

Most of the weight and performance characteristics of a vehicle are set at the conceptual level of design. Designers would like to use the most accurate models available and a large number of design variables at the conceptual level to produce a globally optimal design for the finished vehicle. Consequently, as computational power has increased and more accurate physical models have become available, the design of complex machines has become more involved at the earliest stages of development.

As the number of variables increases, the volume of the design space increases exponentially and gradient calculations become more expensive. This increases the cost of optimizing the design over even a small range for each variable. Often the feasible design space or the objective functions are nonconvex, and the resulting multiple local optima can trap local optimizers and prevent them from locating the best design. To solve this problem, either multiple starting points or a global optimization algorithm may be used. Both of these methods will increase the cost of the optimization.
Discretization errors, round-off errors, and less than fully converged iterative calculations within analysis codes can result in noisy constraints and objective functions. This can create additional spurious optima. A distinction is made between these noise-generated numerical optima and genuine optima due to physical nonconvexity. Designers are primarily concerned with locating the physical local optima modeled by the analysis functions and with finding a way to bypass the numerical noise. Our research has used approximations, dynamic search techniques, and space partitioning methods to try to overcome numerical noise.

As the design space becomes larger, complex constraint boundaries can lead to many local optima and disjoint feasible regions for even simple objective functions. This requires an optimization method which can jump from one feasible region to another and that can search for other local optima away from explored regions. Numerical noise can also hinder local optimizers by corrupting gradient calculations and creating false local optima. Different optimizers handle one of these challenges and not the other. One objective of this dissertation is to select an approach that is able to deal with both numerical noise (Figure 1) and widely separated local optima (Figure 2).

Figure 1. False local optima due to noise
Figure 2. Function with true local optima

In the past, designers relied on simple algebraic estimates for such things as weight and load calculations for structural components, lift and drag for wings and fuselages, or magnetic and electrical effects for electronic components (Raymer 1992). This worked well for designing conventional aircraft or spacecraft, but did not necessarily produce the best designs for the mission for which they were needed. Experts typically narrow the potential designs to a manageable range based on existing designs.
This reduces the possibility of developing new designs that could be useful for new mission requirements. Local optimization methods are applied to subsets of the design variables as each part of the craft is optimized in turn. Multiple iterations through the design variables lead to a locally optimal design according to the analysis equations used. Many of the algebraic estimates are based on empirical data from a limited number of designs and cannot predict the behavior of a design with a substantially different configuration. These methods can lead to designs whose physical behavior is not truly optimal.

Lamberti et al. (2001) demonstrated the use of approximate analysis equations in the design of a reusable launch vehicle. By using simple shell and plate theories to optimize the design of an integrated fuel tank, they were able to examine 14 different design concepts, with about 10 variables each, in less time than it took to perform a single nonlinear finite element analysis. To use these simple models, however, they were forced to constrain the design to regions where the models were accurate. This precluded consideration of potentially optimum designs.

As computer power increased, new analysis codes were developed that can accurately predict the behavior of unusual and complicated structures. However, these codes can take anywhere from hours to days for a single high-fidelity analysis. For this reason they have been used extensively in the final stages of the design cycle, but they are generally considered too expensive for the conceptual stage of the design (where as many as tens of thousands of analyses must be performed). To search more of the design space in the conceptual stage, designers are forced to use simpler versions of these codes.
A prominent area of research today seeks ways to combine the simple and complex analysis codes so that the designer gains the accuracy of the high-fidelity codes while retaining the lower computational cost of the low-fidelity codes. This dissertation examines ways to combine the high- and low-fidelity models with the DIRECT algorithm for efficient global optimization. Multifidelity local optimization has been used for over a decade on various engineering problems with good results. This dissertation attempts to convert these methods to operate efficiently for global optimization. It examines two correction methods to improve the results from approximate models: constant correction factors (over a limited region of the design space) and linear correction response surface models (to improve the correction away from the high-fidelity solutions).

As analysis programs become more expensive and optimizers require more evaluations to search ever larger design domains, the computing power of a single processor has become a major limiting factor in design optimization. Parallel systems have become available with thousands of processors, which can perform trillions of operations per second. With the profusion of powerful PCs in the average company and the improvement of building-wide networks, simple parallel arrangements are now possible for very little additional investment (Groenwold et al. 2000). New optimization methods that distribute the work evenly over multiple processors or computers would allow engineers to incorporate more expensive analysis programs and examine more design concepts with more variables. With this in mind, the DIRECT algorithm used here was examined for its potential for use on parallel computers. An optimization algorithm that generates function evaluations in large batches is inherently suited to running on a parallel machine. Baker et al. (2001) showed this with the DIRECT optimizer, but room still exists for improvement.
This research examines ways of reducing the idle time of the parallel code by continuously generating work for the processors and by working with the strengths and weaknesses of different parallel systems.

Global Optimization

One reason so many types of global optimizers exist is that no one optimizer performs well on every type of problem. Much research has been done to find computationally efficient methods to perform global optimization for high dimensional, nonconvex design spaces. Many global optimization codes have been developed and tested for use with different classes of problems (Floudas and Pardalos 1996). However, most of these global optimization algorithms are specialized to a narrow class of problems. The optimization methods examined here are those commonly used in engineering design applications.

Groenwold et al. (2000) applied eight global optimization algorithms to twelve test problems and found that no optimizer was the fastest on more than four of the problems. Different optimizers are designed to take advantage of different characteristics of various classes of problems. Because of this, Groenwold et al. proposed running multiple optimizers in parallel on each problem and stopping the process once the first optimizer finds the global optimum. They found that this increased the speed of the optimization, provided there were idle computers available.

Unfortunately, for most classes of nonconvex problems there is no way to guarantee that the global optimum has been reached within a feasible number of analyses, except for very small design spaces (Haftka and Gürdal 1992). Some popular methods, such as evolution strategies and the multistart method, can provide statistical estimates of the probability that the global optimum has been reached. Evolution strategies include methods such as genetic algorithms, simulated annealing, and other stochastic methods. These methods rely on random steps to slowly move toward a better value.
Statistical measures are used to determine whether a global optimum has been reached. Deterministic methods do not rely on random variables, and their behavior is predictable based on the characteristics of the problem they are optimizing. These methods include optimizers such as Lipschitzian optimization (including DIRECT) and some pattern search methods.

Genetic Algorithms

Genetic algorithm strategies model the natural process of reproduction and survival of the fittest. They start with a random population of initial designs. Each design is coded into a chromosome, a list of values representing all of the design variables together. Each variable is limited to discrete values and is usually coded in binary form or as integer numbers. For binary coding, the range of each continuous variable can be divided into 2^n intervals and the value coded as a binary string n digits long. For other divisions, or for discrete variables, not all of the binary values will necessarily correspond to a possible variable value, in which case those designs must be removed from the population.

Each design is analyzed and the designs are ranked from best to worst. The parent designs are then randomly crossed, depending on their relative performance, to produce child designs. The best parent is the most likely to be selected and the worst parent is least likely. The two parent designs are combined in various ways, ranging from simple single- or multiple-point crossover to more complicated selection of the order of points based on some of the points from each design. This is useful for problems such as the traveling salesman problem, where all of the points must be used once. Crossover involves taking two designs and selecting some number of points at random in the chromosome. The chromosomes are split at those points and two new chromosomes are formed using segments from both of the parent chromosomes. This continues until the population size of the new points equals the number of parent designs.
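As a concrete illustration, the binary coding, single-point crossover, and mutation operations described here can be sketched in a few lines of C++. This is a minimal sketch with hypothetical helper names, not code from the dissertation's libraries:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Decode an n-bit chromosome segment into a continuous variable whose
// range [lo, hi] has been divided into 2^n discrete levels.
double decode(const std::string& bits, double lo, double hi) {
    unsigned long value  = std::bitset<32>(bits).to_ulong();
    unsigned long levels = (1UL << bits.size()) - 1;  // largest codable value
    return lo + (hi - lo) * static_cast<double>(value) / levels;
}

// Single-point crossover: split both parent chromosomes at `cut` and
// exchange the tails to form two offspring.
std::pair<std::string, std::string>
crossover(const std::string& p1, const std::string& p2, std::size_t cut) {
    return { p1.substr(0, cut) + p2.substr(cut),
             p2.substr(0, cut) + p1.substr(cut) };
}

// Mutation: flip a single bit of the chromosome (in a real optimizer the
// index is chosen at random, with a small probability per bit).
std::string mutate(std::string gene, std::size_t index) {
    gene[index] = (gene[index] == '0') ? '1' : '0';
    return gene;
}
```

For example, crossing the 12-bit parents 001101101100 and 011011100011 after bit 8 yields the offspring 001101100011 and 011011101100.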
Mutation is also used to increase the randomness of the search and prevent the designs from converging to a small range too quickly. Each value in the chromosome has a small chance of randomly changing to a different value. After the population is set, the designs are checked to verify that they are all possible designs, and any illegal designs are regenerated (Haftka and Gürdal 1992).

Figure 3. Illustration of single-point crossover:
    Parent 1:    001101101100    Parent 2:    011011100011
    Offspring 1: 001101100011    Offspring 2: 011011101100

Figure 4. Example of a single-point mutation:
    Original gene: 011011001101
    Mutated gene:  011010001101

Ramirez-Rosado and Bernal-Agustin (1998) used genetic algorithms in the design of power distribution networks with over 300 variables. They used an integer coding for simplicity and showed that it was possible to optimize this type of system using genetic algorithms. Similarly, Savic and Walters (1997) used genetic algorithms to design water distribution networks. Designs having up to 34 discrete variables with 16 possible values each were coded using gray coding, a variation of binary coding in which the binary representations are rearranged so that adjacent integer values differ by only one bit. Their results were comparable to previous studies using other optimization methods.

Soh and Yang (1996) developed a fuzzy-logic version of genetic algorithms as a way to soften the constraint boundaries and handle imprecise performance data. Their method allows the designer to include fuzzy rules as a way of incorporating expert knowledge and experience. This was shown to speed up the convergence to a lowest-weight design on a variety of structural problems. Vasconcelos et al. (1997) combined the global search capabilities of the genetic algorithm with the faster convergence of a local optimizer to optimize electromagnets.
They tried several different criteria for switching from the genetic algorithm to an augmented Lagrangian method for the final convergence. They found that this generally sped up the convergence rate but hurt the optimizer's ability to locate the global optimum.

Simulated Annealing

Simulated annealing optimization is based on the physical process of slowly cooling a piece of metal so that it settles into a minimum energy state. While the metal is very hot, the atoms can move freely into different energy configurations, but as it cools, the atoms tend towards the lowest energy state, provided the cooling schedule gives the atoms time to reach a stable configuration. In simulated annealing, the objective function value is treated as the energy state of the material and changes in the design variables are analogous to the movement of the atoms.

The optimizer starts at a random point in the design space and takes a random step in any allowed direction. If the objective function decreases in value (for minimization), the new point is accepted and another random step is taken from the new point. If the new point has a higher function value, it can be rejected or accepted based on a random probabilistic decision. As the optimization progresses, the chance of an inferior point being accepted decreases slowly from almost 100% to 0%. This is analogous to the metal being cooled. It allows the optimizer to wander freely at first while still biasing it to move towards a better design. As the optimization progresses, the probability that the optimizer can move to a higher function value decreases and it is forced to a local minimum (Haftka and Gürdal 1992).

Lombardi et al. (1992) examined the use of simulated annealing (SA) in the optimization of composite plates. They compared cooling strategies and the effect that an increase in the number of plies had on the cost of the optimization.
They found that as the size of the design space increased, a smaller fraction of the designs needed to be explored to reach an optimum. Their results indicated that SA performed well on buckling and contiguous-ply constrained problems but had difficulties with strain constraints. Swift and Batill (1992) used SA for discrete optimization of several structural designs. They used finite element analyses of truss and wing structures and used the data to train a neural network to allow rapid calculation of additional designs. The design spaces contained from 4^10 to 4^113 discrete designs and would have been too expensive to search completely. They produced improvements of 6% to 13% for all three test cases using less than 5000 iterations per optimization. Tzan and Pantelides (1996) proposed a modified SA algorithm for optimizing structural designs. As better feasible designs were located, the search space was reduced. They also incorporated sensitivity analyses to determine which design variables needed to be changed to return a constraint-violating design to the feasible region. These changes were found to increase the convergence rate of the optimizer but violated the symmetry found in conventional simulated annealing: as the optimization progresses, it becomes impossible to get back to some of the points in the design space. They demonstrated this method on the design of a ten-bar truss and a ten-story building and found it worked well for problems with dynamic constraints.

Particle Swarm

A relatively new stochastic optimization technique proposed by Kennedy and Eberhart (1995) is based on the swarming behavior of creatures such as insects, birds and fish. It is composed of multiple search 'bots' run in parallel through the design space. Each bot adjusts its path to attempt to move back towards the best point it has located while also attempting to move towards the best point located by any bot in the swarm.
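This behavior can be sketched in a few lines of code. The fragment below is illustrative rather than taken from the dissertation: it follows the standard particle swarm update rules, adds the inertia weight of Shi and Eberhart (1998) for stable convergence, and uses parameter values and a test function chosen only for demonstration.

```python
import random

def particle_swarm(f, bounds, n_bots=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm sketch: each bot is pulled toward its own best
    point and the swarm's best point (illustrative parameter values)."""
    rng = random.Random(seed)
    dim = len(bounds)
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_bots)]
    v = [[0.0] * dim for _ in range(n_bots)]
    pbest = [xi[:] for xi in x]                  # best point found by each bot
    pbest_f = [f(xi) for xi in x]
    g = min(range(n_bots), key=lambda j: pbest_f[j])
    gbest, gbest_f = pbest[g][:], pbest_f[g]     # best point found by the swarm
    for _ in range(iters):
        for j in range(n_bots):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[j][d] = (w * v[j][d]
                           + c1 * r1 * (pbest[j][d] - x[j][d])
                           + c2 * r2 * (gbest[d] - x[j][d]))
                # step to the new position, clamped to the design box
                x[j][d] = min(max(x[j][d] + v[j][d], bounds[d][0]), bounds[d][1])
            fj = f(x[j])
            if fj < pbest_f[j]:
                pbest[j], pbest_f[j] = x[j][:], fj
                if fj < gbest_f:
                    gbest, gbest_f = x[j][:], fj
    return gbest, gbest_f
```

On a simple quadratic bowl the swarm converges quickly; for the constrained structural problems discussed in this section, a penalty term (as in Schutte and Groenwold 2002) would be added to f.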
This is accomplished by assigning each bot a position and a step length, or velocity, at each iteration. The position at subsequent iterations is simply

x_(j,i+1) = x_(j,i) + v_(j,i)   (1)

while the velocity at each iteration is augmented to redirect the bot back towards its individual best point and the swarm best point located:

v_(j,i+1) = v_(j,i) + c1 r1 (x_(j,b) - x_(j,i)) + c2 r2 (x_b - x_(j,i))   (2)

where r1 and r2 are continuously updated random numbers, x_(j,b) is the best point located by bot j and x_b is the best point located by the swarm. When the paths of all of the bots are examined at once, they mimic the flight of a swarm of insects about a light or food source. The swarm of bots moves about the best points located to converge towards an optimal solution. There have been a number of variations to the particle swarm algorithm that attempt to maximize its performance over different classes of problems. Clerc (1999) suggested adding a constriction factor to the velocity term to limit its growth, Shi and Eberhart (1998) proposed adding an inertia term to reweight the velocity from the previous iteration, and Fourie and Groenwold (2000) allowed for dynamic reduction of the weighting term based on the performance of the optimization. Schutte and Groenwold (2002) compared these different weighting methods on two simple truss problems and proposed a dynamic penalty function method, which increases the demand for feasibility as the optimization progresses, to deal with constraints. Carlisle and Dozier (2001) attempted to establish standard parameter settings for an off-the-shelf particle swarm optimizer. Ranges of values for the number of bots, values for c1 and c2, limits on the magnitude of v, and other parameters were explored to determine general guidelines for using particle swarm optimization.

Multistart Methods

Multistart methods use local optimizers, which have been used in engineering for many years, to increase the chances of locating the global optimum.
Standard gradient-based optimizers have been developed which are robust and efficient at locating a local optimum. By starting the local optimizer at multiple starting points scattered throughout the design space, it is hoped that at least one of the optimizations will result in the global optimum. Statistical methods exist to determine the probability that the optimum located is the global optimum (Boender and Kan 1983; Snyman and Fatti 1987). One statistical stopping criterion for multistart optimization is based on the assumption that the region of convergence to the global optimum is at least as large as the region of convergence of any other local optimum. After n local optimizations from random starting points, let r be the number of starting points that led the optimizer to the best function value located by any of the optimizations, f~. This criterion states that the probability that f~ is the global optimum, f*, is given by

P(f~ = f*) >= 1 - (n + 1)!(2n - r)! / [(2n + 1)!(n - r)!]   (3)

The optimization stops once P exceeds a desired confidence level provided by the user. This is known as a Bayesian global stopping criterion (Groenwold et al. 1995, 2000). Baker et al. (1998) optimized the design of an HSCT using multistart local optimization to examine substantially different types of designs. Relatively few starting points and no statistical measure of the probability of finding the global optimum were used. The different starting points were able to illustrate the separate local optima in their analysis models. Kam et al. (1996) used multistart local optimization to minimize the weight of a composite plate. The layers were represented as continuous variables in the local optimizations in order to make use of gradient-based optimizers. Instead of rounding the answers to whole numbers of layers, a branch-and-bound method was used to find the global optimum design in the neighborhood of the continuous solution. Groenwold et al.
(1995) used a gradient-based optimization method known as the dynamic search trajectory (Snyman 1983) to perform multistart optimization. This gradient method has some ability to move through weak local minima in order to reach a lower one nearby. It does this by modeling the optimization as a ball rolling downhill: as the optimizer moves down the gradient it builds momentum, which can carry it over small bumps in the objective function. It still functions primarily as a local optimizer, however, and requires multiple starting points to indicate that it has located the global optimum; it is also more expensive than other gradient methods due to the need to continuously calculate gradients. Chan et al. (1999) presented a global optimization strategy using multistart optimization that uses tunneling to locate a starting point for the next local optimization. After the first local optimum is found, random directions are searched to see if a new point can be found with the same objective function value as the local optimum. If such a point can be found, then it must lie in a different valley than the previous optimum. If the search direction remains feasible, then the end point is extrapolated towards the origin to find a design with the same weight as the local optimum, and the line connecting the local optimum to this point is searched to try to find a feasible point of similar weight. Additionally, if the points along the search direction change from infeasible to feasible as you move away from the local optimum, then the feasible points most likely lie in a different valley. The new point is accepted as a starting point only if its function value is less than the second-best local optimum found, to prevent the search from tunneling back into a previous local optimum.

Figure 5. Tunneling to locate a different objective function valley. (The figure illustrates three possible random search results: 1. the search does not leave the current valley; 2. the search direction is feasible, and extrapolation towards the origin leads to a point of the same function value in a different valley; 3. the search passes from infeasible to feasible, and the end of the search vector becomes a new starting point.)

Pattern Search Methods

Generalized pattern search (GPS) theory defines a semiglobal method for stepping through the design space in search of a good local optimum. It is a well-established class of optimizers that has been in use for over 50 years on many classes of problems. At each iteration, a set of search directions and step lengths is used to select a finite set of points surrounding the current point to analyze. This set of search directions must positively span R^n and is selected from a finite pool of possible search directions for each algorithm. A set of directions, V, is defined as positively spanning R^n if it is possible to move from any point in R^n to any other point in R^n through a finite number of steps in the directions of V. If a better point is located in this way, the next iteration starts at the new best point and selects another set of surrounding points to analyze using the same (or a different) set of search directions and the same or a longer step length. If no improvement is made in an iteration, then the step lengths of the search are reduced and a new set of points is analyzed (Torczon 1997, Dennis and Torczon 1991, Audet and Dennis 2000A,B). In this way, the optimizer takes a series of finite steps through the design space along a finite set of search directions until it reaches a local optimum (Figure 6). Torczon (1997) provided a local convergence proof for a general pattern search algorithm: for any continuously differentiable problem, lim inf_(k->infinity) ||grad f(x_k)|| = 0.
However, while initial steps may step from one local basin to another, there is no guarantee of convergence to a global optimum. The algorithm is capable of stepping over noise and narrow local optima and is a popular non-gradient-based method. This class of algorithm is also well suited to mixed-integer and other discrete problems, because all of the points selected lie on a mesh determined by the choice of the search pattern and the minimum step size.

Figure 6. Nine steps of a pattern search with 4 possible step directions. The algorithm samples the function at the end of each search vector and moves to a new point with a better function value. If no improvement is found with the current step lengths (points 6 and 8), the step length is contracted and the pattern (or possibly a different pattern) is sampled about the current point.

Pattern search theory has been combined with filtering to handle constraints (Audet and Dennis 2000B), and with stochastic searches such as genetic algorithms to improve the convergence properties (Hart 2001). The algorithms may move to the first better point they locate at each iteration or can wait until all of the points have been analyzed before moving to the best point. In the first case, adaptive selection of the order in which the search directions are sampled can be applied to speed the movement of the algorithm through the design space, reducing the number of points sampled at some iterations. Additional points can also be added to the ones selected by the search steps to take advantage of knowledge of the local behavior of the design space before performing a rigorous search of the pattern about the best point.

Lipschitzian Optimization

Lipschitzian optimization can be guaranteed to converge to the global optimum of the design space for an unconstrained problem with a continuous objective function that has a known Lipschitz constant.
The Lipschitz constant is an upper bound on the absolute value of the slope of the function at any point in the design space. Since the function cannot change any faster than this, it is possible to eliminate portions of the design space that are near analyzed designs with poor performance. In practice it is rare that the Lipschitz constant is known for a physical optimization problem, so the method is used by estimating a value for the constant based on expert knowledge. If the constant is chosen too large, the optimizer will be slow and spend more time searching near regions of poor performance. If the constant is chosen smaller than it should be, the optimizer will concentrate its search near the best points it finds and ignore larger unexplored regions, which can allow it to settle on a local optimum instead of the global optimum. Commonly used Lipschitzian optimization also requires that all of the vertices of the design box be analyzed at the start. This requires 2^n evaluations, where n is the number of dimensions, making the method impractical for more than a few dimensions (Jones et al. 1993). Jones et al. (1993) developed a variation of Lipschitzian optimization that addresses the problems of higher dimensions and an unknown Lipschitz constant. Jones's DIRECT algorithm (for DIviding RECTangles) progressively divides the design space into smaller boxes and analyzes the center of each box instead of the vertices. This allows the optimizer to work on problems of higher dimension with fewer points at the beginning. The algorithm also selects points for sampling based on the assumption that the Lipschitz constant could be any number from zero to infinity. This allows a simultaneous global and local search of the design space, where large unexplored regions are examined at the same time as the regions with good function values are explored in more detail. A complete description of this algorithm is found in Chapter 2.
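The elimination argument above can be made concrete in one dimension. The sketch below is an illustrative fragment, not code from the dissertation: it builds the cone-shaped lower bound implied by a Lipschitz constant K and uses it to discard a region that cannot contain a better point. A real Lipschitzian optimizer (e.g., Shubert's algorithm) would minimize the piecewise-linear bound exactly rather than on a grid.

```python
def lipschitz_lower_bound(x, samples, K):
    # Each analyzed point (xi, fi) implies f(x) >= fi - K*|x - xi|;
    # the tightest valid lower bound is the maximum over all samples.
    return max(fi - K * abs(x - xi) for xi, fi in samples)

def region_looks_prunable(lo, hi, samples, K, f_best, grid=200):
    # If the lower bound everywhere in [lo, hi] exceeds the best value found
    # so far, no point in the region can improve on f_best.  (Grid evaluation
    # keeps the sketch short; an exact implementation would minimize the
    # piecewise-linear bound analytically.)
    xs = [lo + (hi - lo) * i / grid for i in range(grid + 1)]
    return min(lipschitz_lower_bound(x, samples, K) for x in xs) > f_best
```

For example, for f(x) = x^2 on [-5, 5] (Lipschitz constant K = 10) with samples at x = -4, 0 and 4, the bound at x = 2 is max(16 - 60, 0 - 20, 16 - 20) = -4, below the true value of 4 as required, and the region [3.5, 4.5] around the poor sample at x = 4 can be discarded once the value f_best = 0 at x = 0 is known.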
Nelson and Papalambros (1998) modified this method by including a local optimizer when the design space is divided. When DIRECT selects rectangles to divide, it first performs a local optimization starting at the best function value found. It then divides the design space to place the local optimum found in its own box. This can speed up the convergence of the optimizer, but it results in boxes with irregular aspect ratios and evaluation points that are not at the centers of their respective boxes in the design space. Gablonsky and Kelley (2001) proposed a locally biased version of the DIRECT algorithm. They modified the box size calculation to reduce the number of different box sizes at each iteration. This reduces the number of boxes which are divided at each iteration while still dividing the box which contains the best point located. Their formulation is intended for problems with only a few local optima. Baker et al. (2000, 2001) examined several parallel versions of the DIRECT algorithm. They demonstrated that the algorithm can maintain a reasonable efficiency on up to 128 processors with a problem that takes about 1.7 seconds to analyze. More information on their work is found on page 28 and in Chapter 5. He et al. (2002) examined data management for DIRECT codes to reduce the excess memory used by some implementations of DIRECT. They developed dynamic data structures for a Fortran implementation of DIRECT to allow the code to adapt to arbitrary problem sizes while maintaining code efficiency. Bartholomew-Biggs et al. (2002) combined DIRECT with a restart to modify the design space after DIRECT has been run for a number of iterations. In their version, after DIRECT has been run for a set number of iterations (usually 50-100), the optimizer is stopped and then restarted with a (possibly) smaller design space centered at the best point located. They found that this can reduce the cost of DIRECT for an aircraft routing problem.
However, this removes a large portion of the design space from consideration and can potentially cause DIRECT to converge to a poorer local optimum.

Approximation Methods for Optimization and Reduced Cost

Response surface techniques were originally used to fit an approximation to physical experiments to reduce the noise in the observations and provide an inexpensive method for calculating the performance at other points. They are often used in optimization to replace expensive numerical analyses, both for efficiency and to remove local numerical optima that are caused by finite resolution and incomplete convergence. Ishikawa et al. (1999) proposed using a radial basis method combined with sequential quadratic programming (SQP) to optimize engineering problems. The radial basis method interpolates the function based on a scattering of sample points. The data are fit to a basis function to allow efficient prediction of the function at other points and provide an approximate solution; from there, SQP is used for the final convergence. Jones et al. (1998) used a variation of response surface modeling known as a DACE model to perform global optimization. They combined high-fidelity analyses to create a simple response surface and added a correction factor based on how close the approximation was to a high-fidelity point used to construct the surface. When a response surface is generated, it does not pass through the data points exactly; there is a known bias error in the response surface at each of these points. Since the error in the response surface near these points is correlated to the known error, a correction factor is calculated to find a closer approximation to the true function, similar to kriging modeling. This allows the model to capture multimodal functions with even linear or quadratic response surfaces. They use this model for their efficient global optimization (EGO) algorithm.
This algorithm uses statistical measures based on the potential error of the DACE model, combined with the predicted values, to find points where the expected improvement is maximized. It continues evaluating points and refining the model until convergence is satisfied. This optimizer compared favorably to other global optimizers, with the added benefit that it provided a simple visualization tool for multiobjective optimization after the DACE models are generated. This can be very useful in dealing with problems where design requirements must be rewritten due to constraint violations. Giunta and Watson (1998) compared the performance of a polynomial response surface to the kriging model for a simple test function of 1, 5 or 10 dimensions. The test problem was a combination of sinusoidal terms that could be shifted from a noisy quasi-quadratic form to a noisy quasi-sinusoidal form. For the one-dimensional case, kriging was more accurate for the sinusoidal form while the quadratic response surface modeled the quadratic form better, as expected. For the higher-dimensional cases, however, the response surface approach was better for both forms. Simpson et al. (1998) compared these two models for use in multidisciplinary design of an aerospike propulsion system. Response surface and kriging models were fit to the structural weight and the aerodynamic performance of the engine and the results were compared for their accuracy. They were then used to optimize the engine for four different criteria. Their results indicated that the errors were similar for both methods, demonstrating that either model could be used for approximating the high-fidelity results for optimization. Knill et al. (1999) turned to response surfaces to reduce noise and eliminate spurious local optima in the design of the HSCT.
By identifying the major sources of noise in their analysis and replacing them with response surfaces, they were able to transform a five-design-variable space with many local optima caused by noise into a convex region with one optimum. For the higher-dimensional design spaces they were able to smooth the objective function and constraints and reduce the number of local optima found by local optimizers. Liu et al. (1999) used response surfaces to combine panel-level optimizations with wing-level optimizations for a composite wing box structure. This method works well when the design variables can be divided into sets of local variables with a small number of connecting global variables, in this case three in-plane loads and the numbers of 0°, 45° and 90° plies. Genetic optimizations were used to optimize ply arrangements for composite plates based on given numbers of plies and loads to maximize the buckling load multiplier. These optimum values were fitted to a response surface to provide a cheap approximation to the buckling load multiplier for any combination of plies and loads within the design space. This allowed them to perform a global optimization of the ply arrangements followed by a local optimization of the wing thicknesses. Performing a global optimization of the entire wing at once would have been much too expensive.

Multifidelity Optimization

As computer power has increased, new analysis codes have been developed which can accurately predict the behavior of unusual and complicated structures. However, these codes can take anywhere from hours to days to generate high-fidelity results. For this reason they have been used extensively in the final stages of the design cycle, but they are generally considered too expensive for the conceptual stage of the design, where as many as tens of thousands of analyses must be performed.
In order to search more of the design space in the conceptual stage, designers are forced to use less expensive, lower-fidelity versions of these codes. A prominent area of research today seeks ways to combine the low- and high-fidelity analysis codes so that the designer gains the accuracy of the high-fidelity codes while retaining the lower computational cost of the low-fidelity codes. One disadvantage of using response surfaces to approximate a high-fidelity function is that they are not based on the physical properties of the behavior they are trying to represent. Because they are based on a small number of points and are low-order polynomials, they are best at modeling small regions of design space. Over larger regions, they can fail to capture local fluctuations in the performance of the system and generally have large errors wherever extrapolation is required. For most engineering applications, low-fidelity models based on the physical behavior of the system components are available. These models match the trends in the design space better than response surfaces over the entire design space. In order to obtain the accuracy of the high-fidelity models, however, there must be some way of including the high-fidelity analyses in the optimization. Haftka (1991) proposed a method for scaling simple approximations to improve their accuracy over large regions of design space, known as the Global-Local Approximation (GLA). He proposed using a linear scaling factor to preserve the accuracy of the corrected approximate function over a larger region than a constant scaling factor or a Taylor approximation to the high-fidelity function. At an initial starting point x_0, the function values and first derivatives of the high- and low-fidelity analyses are calculated and used to create a correction function β(x) for the low-fidelity values such that

f_h(x_0) = β(x_0) f_l(x_0)   (4)

and

∇f_h(x_0) = ∇[β(x) f_l(x)] evaluated at x_0   (5)

where f_h is the high-fidelity value and f_l is the low-fidelity value.
β(x) is a linear function, which means that as the design moves further away from x_0, the correction function will no longer correct the low-fidelity value exactly to the high-fidelity value. He demonstrated the accuracy of this model as the design moves away from the starting point for a simple beam analysis. Chang et al. (1993) continued the comparison with other approximation models for the design of an HSCT wing box. They showed that GLA was able to increase the accuracy over that of several other approximation models as the design variables moved further from the starting point. McQuade et al. (1995) worked on combining high- and low-fidelity analyses by using GLA in the design of a 2D scramjet engine. They compared this method with the older method of using Taylor series approximations as the low-fidelity model. A trust region approach similar to traditional nonlinear programming is used to keep the optimizer from moving too far from the starting point. Once the move limit is reached, a new set of high-fidelity analyses is used to update the correction function and the optimization is restarted with new move limits. This method is guaranteed to converge to the same design as an optimization on the high-fidelity function for appropriate move limits. GLA worked well for the optimization of the nozzle shape but failed to capture the forebody performance as well as the Taylor series did. This was due to the movement of shock impingement points and showed the sensitivity of GLA to discontinuities in the design space. Alexandrov et al. (1998, 2000) used GLA to construct an approximation management framework for the optimization of 3D wings. This method was shown to be effective for several local optimizers and is guaranteed to converge to the same design as an optimization on the high-fidelity function for appropriate move limits. Giunta et al. (1995) and Kaufman et al.
(1996) used simple models to help determine which variables had the largest effect on wing structural weight and to locate the regions of the design space where reasonable designs could occur. This reduced the volume of the design space that the optimizers had to examine with the more expensive evaluations. Response surfaces were fit to detailed analyses scattered throughout this reasonable design space and local optimizers were applied to these surfaces to find the optimum designs. Balabanov et al. (1998) used thousands of structural optimizations of designs using a coarse finite-element model to make a quadratic response surface for calculating wing bending material weights. They then used about a hundred refined finite-element optimizations to calculate a correction factor for the coarser model and generated a linear response surface model for the correction factor over the entire design space. This was found to reduce the errors in the low-fidelity response surface by more than half. Vitali et al. (1998) used correction response surfaces to calculate stress intensity factors for a cracked composite plate. They fit a quadratic response surface to the results from a high-fidelity analysis of the cracked plate and compared the results to using a quadratic response surface of the ratio of the high-fidelity model to a simpler model to correct the simple model. They found that it was more accurate to use a correction factor on a simple model than to use traditional response surfaces directly on the high-fidelity model. Vitali and Sankar (1999) then used this method to optimize the weight of a stiffened composite panel constrained by crack propagation limits. The optimum weight found using the correction response surface satisfied the design constraints when checked by the high-fidelity analysis.

Multidisciplinary Optimization

The design of complex structures usually involves many design variables from several different disciplines.
The requirements for the structure may entail structural, mechanical, aerodynamic, electrical or magnetic considerations, each of which can require complex formulas and programs to evaluate. Traditional design methods consider each of these disciplines separately. Teams of specialists are used to optimize separate portions of the design, and then the groups come together to compromise on design variables affecting multiple aspects of the design. This can lead to suboptimal designs with no clear understanding of the tradeoffs between different disciplines (Korngold and Gabriele 1997). Integrating the analysis of each discipline into a single optimization can provide an explicit way to deal with their interactions and handle constraints. MacMillin et al. (1997) investigated the interaction of conflicting requirements of different disciplines on the gross takeoff weight for an HSCT design. They showed that the addition of conflicting constraints from different disciplines resulted in a design that was as much as 35% heavier than a design that was optimized without these interactions. This shows the large changes that can occur when interdisciplinary effects are included in the optimization. Kroo et al. (1994) presented a way of decomposing the multidisciplinary optimization of an aircraft into separate disciplines and suboptimizations, using constraints to handle their interactions. The design is divided into separate disciplines with the constraints delegated to their respective disciplines, and auxiliary variables are inserted to represent the cross-dependencies of the different disciplines. The system-wide optimization attempts to minimize the objective function by providing the suboptimizers with design variables and target values for the auxiliary variables. Each suboptimization attempts to minimize the difference between the target values for the auxiliary variables and the values used to satisfy the local constraints.
The system optimization is constrained so that the target values equal the values used in each suboptimization. This method worked well for problems with few auxiliary variables and allows each suboptimizer to use local expertise in optimizing its discipline. Another popular way of combining multiple disciplines into a single optimization is the use of response surfaces. Korngold and Gabriele (1997) used localized response surfaces to create an approximation for each discipline model involved in a design and then performed a global sensitivity analysis to approximate the global design space in a small region. This global approximation is then used to perform a discrete optimization for a design and manufacturing example problem. After each optimization, the local response surfaces are regenerated and a new global approximation is calculated until convergence is satisfied. Kaufman et al. (1996) used response surfaces to integrate the results from commercial codes from different disciplines. An empirical weight equation was used to locate the design variables with the largest effect on structural weight in order to reduce the number of dimensions that the response surface has to model. Structural optimizations were performed over the resulting design space and the results were used to fit a response surface for the wing structural weight of an HSCT design. This also served to reduce the noise found when trying to calculate gradients directly from the structural optimization results.

Parallel Computing

As analysis programs become more expensive and optimizers require more evaluations to search ever-larger design domains, the computing power of a single processor has become a major limiting factor in design optimization. Parallel systems have become available with thousands of processors that can perform trillions of operations per second.
New optimization methods that distribute the work evenly over multiple processors hold much promise for allowing engineers to incorporate more expensive analysis programs and examine more design concepts with more variables. Optimization offers many avenues for parallelization, from coarse-grained division of large jobs among processors to fine-grained parallel codes that divide each job into small tasks for multiple processors. The optimizer can generate multiple evaluation requests at a time, each of which can run on a different machine. This allows the designer to use commercially available analysis codes that were written for serial machines without having to modify them. This is the simplest form of parallelization, but it is limited in the number of processors that it can efficiently utilize. Krasteva et al. (1999) explored different methods of distributing these jobs to multiple processors. Distributed dynamic load balancing techniques and termination criteria were examined to help determine efficient ways of utilizing multiple processors with a minimum amount of processor idle time. Vendors have also started to modify some of their codes to run on parallel machines. By dividing each analysis into smaller parts and spreading them over a large number of processors, the optimization can utilize more processors and reduce the latency where processors sit idle waiting for other large jobs to finish. This creates other problems with communication and synchronization between processors and requires considerable work to efficiently divide the analysis into equal parts that do not rely extensively on one another (Culler et al. 1999). Burgee et al. (1996) used parallel computing to evaluate multiple designs at once to create response surface models for each discipline. Simple models were used to locate good regions in the design space, where more complex models were then used to generate a local response surface.
Response surfaces allowed the optimizer to use black-box programs for separate disciplines and combine them for inexpensive genetic optimization. In this way the analysis within each discipline was done in parallel, as opposed to parallelism across disciplines. Weston et al. (1994) described a Framework for Interdisciplinary Design Optimization (FIDO), which is designed to break up each analysis into separate disciplines and evaluate them in parallel. They optimized a simplified HSCT design by running structural, aerodynamic, performance, and propulsion codes on separate workstations and using FIDO to coordinate the interactions between them. The FIDO program allows the user to utilize any standalone analysis code, which can be parallel or sequential. It also allows the user to replace specific codes if higher fidelity is needed. Bhardwaj et al. (2000) described a parallel structural dynamics code for the static and transient response of structures. The code was tested on up to 1000 processors on problems with up to 8 million degrees of freedom. This program demonstrates the benefits that can be obtained from a parallel implementation, reducing the solution time for one problem from 5 days to 30 minutes. Baker et al. (2000) examined the use of the DIRECT algorithm on a parallel architecture. They demonstrated the variation in the number of evaluations needed per iteration for the DIRECT optimizer on the optimization of an HSCT using 28 variables. They examined different load balancing strategies, which provided up to 86% efficiency for 64 processors on a problem where each evaluation took about 1.75 seconds. However, as written for sequential operation, DIRECT requested at most 400 evaluations per iteration, and the large variation in the number of evaluations per iteration could make the use of more processors inefficient. Modifications to the DIRECT algorithm to generate more evaluation requests per iteration would be needed to take advantage of more processors.
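The coarse-grained scheme discussed in this section, in which the optimizer issues several independent evaluation requests per iteration and farms each one out to a different processor, can be sketched with Python's standard library. The `analysis` function here is a hypothetical stand-in for an expensive black-box code:

```python
from concurrent.futures import ThreadPoolExecutor

def analysis(design):
    # Stand-in for an expensive black-box analysis code;
    # here just a cheap sum-of-squares objective.
    return sum(x * x for x in design)

def evaluate_batch(designs, workers=4):
    # Coarse-grained parallelism: each design is an independent job,
    # so unmodified serial analysis codes can be reused as-is.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analysis, designs))
```

On a real cluster the thread pool would be replaced by processes or MPI ranks, but the structure of distributing unmodified black-box evaluations and collecting the results is the same.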
Overview of Research

The main thrust of this dissertation is efficient ways to deal with the global optimization of complex structures. An efficient global optimization method is selected as a starting point for combining with alternate concepts for reducing the time required to perform an optimization. The dissertation then examines ways of combining multifidelity information and multiple processors with the chosen algorithm to improve the utility of an already efficient algorithm. The next section of this dissertation presents a comparison of three single-fidelity global optimization algorithms. They are compared on how they deal with noise and multimodal functions for high-dimensional problems. The results of this work have been published in the Journal of Global Optimization. Chapter 3 describes the DIRECT-BP algorithm, a modification of DIRECT intended to provide a more robust search about the best local optimum located by DIRECT as the optimization progresses. Chapter 4 describes the multifidelity versions of the DIRECT optimizer used in this research and the results obtained on several test problems. Chapter 5 presents a parallel version of the DIRECT optimizer designed for use on large clusters of heterogeneous processors.

CHAPTER 2
PERFORMANCE OF GLOBAL OPTIMIZATION ALGORITHMS: NOISE VERSUS WIDELY SEPARATED OPTIMA

The need for global optimization is driven by a combination of multiple locally optimal designs for complex structures and noise in analysis functions due to the use of discrete and iterative analysis methods. The purpose of this chapter is to examine some general algorithms to see how they perform with respect to each of these two conditions. Three global optimization methods were compared on how well they optimized these two types of problems. The first method uses multiple starting points to optimize the design using sequential quadratic programming (SQP) as implemented in DOT (Vanderplaats 1995).
The second method, Snyman's dynamic search algorithm (Snyman 1983), is capable of passing through shallow local minima to locate a better optimum but still requires multiple starting points. The third method is DIRECT, which is run only once. Two test functions, the so-called Quartic and Griewank functions, were examined to explore the performance of the optimizers on these two types of problems. The comparisons help to show the strengths of each optimizer and suggest the types of problems for which each is best suited. The optimizers were then run on a more complicated problem with both numerical noise and widely separated local optima to see how they handled a more realistic engineering problem. The configuration design of the High Speed Civil Transport (HSCT) is an example of a high-dimensional design problem with a nonconvex feasible design space due to nonlinear constraints and numerical noise (Knill et al. 1999). This is typical of many problems in industry that require global optimization. The analysis code uses a combination of simple functions from various disciplines to evaluate general planforms for the HSCT. The combination of iterative range calculations, structural formulas pieced together from multiple programs, and minimum gauge and spacing requirements leads to irregular design spaces and numerical noise. The relative impact of noise and multiple local optima can also be adjusted by including more variables.

Optimizers

Sequential Quadratic Programming

The commercial program Design Optimization Tools (DOT) (Vanderplaats 1995) was used to optimize the HSCT design using SQP. SQP forms quadratic approximations of the objective function and linear approximations of the constraints and moves towards the optimum within given move limits. It then forms a new approximation and repeats the process until it reaches a local optimum.
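DOT itself is a commercial code, but a multistart SQP search of the kind used here can be sketched with SciPy's SLSQP implementation. The toy objective and bounds below are illustrative assumptions, not the HSCT model:

```python
import numpy as np
from scipy.optimize import minimize

def objective(x):
    # Toy multimodal objective (an illustrative assumption): a
    # quadratic bowl with superimposed cosine ripples.
    return float(np.sum(x**2 + 2.0 * np.cos(3.0 * x)))

def multistart_sqp(n_starts=50, n_dims=2, seed=0):
    # Restart a local SQP search from random points and keep the
    # best result found, as in the multistart DOT runs.
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        x0 = rng.uniform(-5.0, 5.0, n_dims)
        res = minimize(objective, x0, method="SLSQP",
                       bounds=[(-5.0, 5.0)] * n_dims)
        if best is None or res.fun < best.fun:
            best = res
    return best
```

Keeping only the lowest objective value over all restarts is exactly the "lowest value taken as the global minimum" strategy described below for DOT.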
Due to the use of approximations, DOT is relatively quick to perform a single optimization and will handle a limited amount of noise without becoming stuck in spurious local minima. DOT has been successfully used in the past with the HSCT problem and compared favorably to other local optimizers (Haim et al. 1999). In order to perform a global search of the design space, DOT was started from 100 random initial designs for the HSCT problem and from up to 20,000 starting points for the test problems. The lowest objective function value found for each case was taken as the global minimum.

Dynamic Search

The second optimizer tested was Snyman's dynamic search method, the Leap Frog Optimization Procedure with Constraints, Version 3 (LFOPCV3) (Snyman 1983). LFOPCV3 is a semiglobal optimization method that is capable of moving through shallow local minima. The method is based on the physical analogy of a particle rolling downhill. As the particle moves down, it builds momentum, which carries it out of small dips in its path. At each step, LFOPCV3 subtracts the gradient of the objective function from the velocity of the particle from the previous step. The new velocity determines the step size and direction for the current step. When the velocity vector makes an angle of more than 90° with the gradient vector, i.e., when the particle is descending, the velocity increases and the particle builds up momentum to carry it through areas where it is ascending. When ascent occurs, a damping strategy is used to extract energy from the particle to prevent endless oscillation about a minimum. The damping strategy used in the version of LFOPCV3 used here has been modified to increase its ability to search further away from a local optimum.
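The basic particle dynamics can be sketched as follows. This is a minimal illustration of the idea, not Snyman's actual implementation, and the damping rule here is a simplification of the one described in the text:

```python
import numpy as np

def leapfrog_min(grad, x0, dt=0.05, steps=2000, damping=0.5):
    # Sketch of a dynamic ("leap-frog") search: the iterate is a
    # particle whose velocity is updated by the negative gradient,
    # so momentum can carry it through shallow local minima.
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        g = grad(x)
        v = v - dt * g
        # If the particle has turned uphill (velocity aligned with
        # the gradient), bleed off energy so it does not oscillate
        # endlessly about the minimum.
        if np.dot(v, g) > 0.0:
            v *= damping
        x = x + dt * v
    return x
```

On a smooth bowl such as f(x) = ||x||²/2 (gradient simply x), the particle spirals into the minimum as the damping removes energy on each uphill leg.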
Whenever the optimizer is at a point where the function value is higher than the best function value located so far, it creates a shallow quadratic function centered at the best location found and uses the slope of the quadratic function plus a small percentage of the objective function slope to calculate the next step. This reduces the damping when the particle is moving uphill while preventing it from becoming trapped in a local optimum that is worse than the best point found so far. LFOPCV3 handles constraints with a standard quadratic penalty function approach. The gradient of the penalty function increases as the constraint violation increases, which gives a smooth gradient at the constraint boundaries. In our initial experiments, this algorithm demonstrated the ability to move further than DOT when searching for the global minimum, and it has been successfully used on functions with hundreds of local minima. However, it still requires multiple starts to sample the entire design box. For this comparison, we used the same 100 initial designs used with DOT for the HSCT problem and up to 20,000 starting points for the test problems.

The DIRECT Algorithm

The DIRECT algorithm (Jones et al. 1993) is a variation of Lipschitzian optimization that uses all values for the Lipschitz constant. The basic problem that DIRECT solves is defined as

min f(x) subject to x_L ≤ x ≤ x_U, (6)

where x_L, x_U ∈ R^n are simple bounds on the vector x ∈ R^n. Implementations of DIRECT commonly renormalize these bounds so that x_L = 0 and x_U = e = (1, ..., 1). Classical Lipschitzian optimization requires the user to specify the Lipschitz constant K, which is used as a prediction of the maximum possible slope of the objective function over the global domain. Lipschitzian optimization uses the value of the objective function at the corners of each box and K to find the box with potentially the lowest objective function value.
The design with the predicted minimum possible function value is evaluated and the process is repeated for a set number of iterations. This method is guaranteed to asymptotically approach the global optimum for a Lipschitz-continuous function provided a large enough value of K is used. DIRECT does not require an estimate of the Lipschitz constant. Instead of evaluating every vertex of each box, DIRECT uses the function value at the center of each box, together with the box size, to find the boxes that potentially contain the optimum. This reduces the number of points that must be analyzed at the first iteration and allows the use of DIRECT on higher-dimensional problems than are usually feasible for Lipschitzian optimization. DIRECT selects a box j to divide if, for some Lipschitz constant K_j, that box could contain a lower function value than any other box. DIRECT is guaranteed to come within γ of the global optimum once there are no boxes left larger than γ. The DIRECT algorithm is given below. The parameter γ ≥ 0 defines the minimum box diameter that is divided by DIRECT. If γ is nonzero, it implicitly defines a stopping condition for DIRECT. The boxes generated by DIRECT can be uniquely labeled with an integer index j. We denote the center of box j as c_j, and let X_j(t) denote box j in iteration t. This reflects the fact that the size of box j can vary over time, while its center remains the same. The set B_t is the set of all boxes considered in iteration t. Note that boxes are never removed from B_t; dividing a box X_j(t) simply shrinks it and adds new boxes, so X_j(t+1) ∈ B_{t+1} after X_j(t) is divided. We denote the best box in iteration t as X*(t), with center c*(t). The function diam(X_j(t)) computes the diameter of X_j(t), twice the distance from one vertex of X_j(t) to the center point of that box. Finally, recall that e_i is the unit vector in the ith dimension.

I. Normalize the search space to the unit box. Evaluate f(c_1) and set B_1 = {X_1}.
II. For t = 1, 2, ...:
A.
Identify the set S_t ⊆ B_t of potentially optimal boxes.
B. For each box X_j(t) ∈ S_t where diam(X_j(t)) > γ:
1. Identify I, the set of dimensions of X_j(t) with the longest side length. Let δ equal one-third of this length.
2. Sample the function at the points c_j ± δe_i for all i ∈ I, where c_j is the center of box X_j(t) and e_i is the ith unit vector.
3. While I is not empty:
a. Divide X_j(t) into thirds along the dimension i ∈ I with the lowest value of f(c_j ± δe_i) to create two new boxes and reduce the size of box X_j(t).
b. Remove i from I and add the new boxes to B_t.
C. Terminate if an appropriate stopping rule is satisfied.

Since DIRECT divides boxes of all scales simultaneously, it can perform both a localized search and a global search (corresponding to both low and high values of the Lipschitz constant K). However, to ensure sufficient local progress, K is required to be large enough that there is a minimum possible improvement, ε, over the current best point based on a slope of K. This prevents a small box from being divided when there is little predicted possible improvement within that box. This lower bound on K is a function of box size; smaller boxes have a higher lower bound than larger boxes. For our work we have used ε = 10⁻⁴.

Figure 7. Lipschitzian optimization.
Figure 8. One-dimensional example of DIRECT.

Figures 7 and 8 contrast the behavior of DIRECT and Lipschitzian optimization on a simple one-dimensional problem. In Step 2 of Figure 8, box 1 is selected for dividing because, for any value of the Lipschitz constant K, that box can potentially contain a lower function value than any other box.
In Step 3, both boxes 2 and 4 are selected for division because there are some values of K, K₁ and K₂, for which those boxes potentially contain a lower value than any other box. It can be shown (Jones et al. 1993) that this requires these two boxes to lie on the lower right-hand part of the convex hull of the set of boxes in a graph such as Figure 9.

Figure 9. Point selection in DIRECT.
Figure 10. DIRECT box division in 2D.

In Figure 9, each of the boxes is one of five sizes. The best designs from the second-smallest box size and from the largest and next-to-largest box sizes are potentially optimal because, for each of these, there is some value of K for which that box could contain a better design than any other box. The box with the lowest function value is not potentially optimal because it does not offer a large improvement over the function value of the next larger box on the convex hull. Any value of K that would make this box potentially optimal would not provide sufficient improvement over the current best value. Graham's scan routine is used to identify the potentially optimal boxes (Jones et al. 1993). Boxes that were not previously potentially optimal can become potentially optimal in later iterations as other boxes are divided (Figure 10). As the boxes on the convex hull of Figure 9 are divided, new boxes are added and each divided box becomes smaller. This may put a new set of boxes on the convex hull, some of which were interior points in previous iterations. DIRECT will usually select more than one box to divide at each iteration. This allows for a wider search than Lipschitzian optimization and makes DIRECT more suitable for parallelization (Baker et al. 2000).
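The selection of potentially optimal boxes can be sketched directly from its definition: box j is selected if some K ≥ 0 makes its lower bound f_j − K·d_j the smallest over all boxes. The O(n²) feasibility check below is a simple equivalent of finding the lower right-hand part of the convex hull (production codes use Graham's scan instead, and the ε sufficient-improvement test is omitted here for brevity):

```python
def potentially_optimal(boxes):
    # boxes: list of (diameter, f_center) pairs.  Using d instead of
    # the center-to-vertex distance d/2 only rescales K and does not
    # change which boxes are selected.
    selected = []
    for j, (dj, fj) in enumerate(boxes):
        k_lo, k_hi = 0.0, float("inf")   # feasible range for K
        ok = True
        for i, (di, fi) in enumerate(boxes):
            if i == j:
                continue
            if di < dj:                  # smaller box forces K upward
                k_lo = max(k_lo, (fj - fi) / (dj - di))
            elif di > dj:                # larger box caps K
                k_hi = min(k_hi, (fi - fj) / (di - dj))
            elif fi < fj:                # same size, better value
                ok = False
                break
        if ok and k_lo <= k_hi:
            selected.append(j)
    return selected
```

For example, with boxes [(1, 5), (1, 3), (3, 4), (9, 6)] the routine selects the best small box and the largest box while masking the intermediate box, mirroring the behavior described for Figure 9.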
Potentially optimal boxes are divided into thirds on their long sides (Figure 10). Each dimension is examined separately, with two points added in each coordinate direction corresponding to a long side of the potentially optimal box. Thus the number of points from each box division increases linearly with the number of dimensions. These new points are ranked, and the box containing the center point is divided into thirds along the dimension that will put the best new point into a box of its own. This continues with the rest of the long dimensions, dividing off the best remaining point each time until the original box has been divided on all of its long sides. More recent work by Jones has suggested dividing each potentially optimal box in only one dimension per iteration, with the choice of which long dimension to divide based on which dimensions have been divided the most so far over the entire design space. He found that this can reduce the cost of the optimization while only slightly reducing the robustness of the algorithm. The comparison given in this chapter was completed before the modified algorithm was proposed; in any case, the original algorithm was more useful for this work for several reasons. The extra robustness of the original algorithm is helpful for some of the difficult test problems used here, where there are a large number of local optima and the algorithm needs to explore more of the design space before concentrating on the best optimum it has found. There is also more published work using the original algorithm to compare against. Finally, the conversion of DIRECT to a parallel implementation needs to increase the number of points analyzed at each iteration to allow the use of more processors and improve the load balancing across them. One of the shortcomings of the DIRECT algorithm is the lack of an obvious stopping criterion.
Different authors have advocated stopping after a set number of function evaluations or a limit on the number of iterations (Jones et al. 1993, Nelson II 1998, Baker et al. 2000). However, this does not take into account the behavior of DIRECT and can lead to continuing the search long after the minimum is located or stopping while improvements are still being made. This implementation of DIRECT employs a stopping criterion based on the smallest current box. The search is stopped once the size of the smallest box has reached a set limit, and a local optimizer is then used to refine the solution in the immediate vicinity. This is motivated by the recognized characteristic of DIRECT that it gets close to the optimum relatively quickly but its final convergence is slow compared to gradient-based methods. DOT was used to perform SQP optimizations from up to 15 different starting points after DIRECT was stopped. These points were the 15 best points separated by at least 5% of the width of the design space. Additional stopping conditions are contained in recent work by He et al. (2002).

Constraints with DIRECT

The implementation of DIRECT used here handles constraints, g ≤ 0, with a linear penalty function. Initial work with this code was based on a quadratic penalty function, which tended to over-penalize design points with moderate constraint violations. Due to the sparse sampling of the design space in the first few iterations, this penalty function could eliminate large regions from consideration. A linear penalty function was better able to sample more of the design space while still concentrating on the most promising regions. The penalty function is given as

F = f + Σ_{i=1}^{m} g_i P_i, (7)

where f is the objective function, g is the vector of the m constraints, and P_i is a penalty parameter, which is zero for satisfied constraints and a small positive number for violated constraints. The performance of DIRECT depends on the magnitude of the penalty parameter, P, used.
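The linear penalty of Eq. (7) is straightforward to implement; a minimal sketch, with P_i applied only to violated constraints (g_i > 0):

```python
def penalized_objective(f, g, p):
    # Eq. (7): add a linear penalty for each violated constraint;
    # satisfied constraints (g_i <= 0) contribute nothing.
    #   f : objective value
    #   g : constraint values (violated when positive)
    #   p : penalty multipliers, one per constraint
    return f + sum(pi * gi for gi, pi in zip(g, p) if gi > 0.0)
```

Because the penalty grows only linearly with the violation, a box whose center is mildly infeasible is not penalized as heavily as under a quadratic penalty, which is the behavior the text describes.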
The parameter should be as small as possible while still preventing the optimizer from selecting an infeasible design. Otherwise, boxes that cover the constraint boundaries can be over-penalized by an infeasible center. This can make it unlikely that DIRECT will analyze points along the constraint boundaries. Choosing a value approximately twice the magnitude of the largest Lagrange multiplier will push the optimizer out of the infeasible region without over-penalizing boxes with an infeasible center. For the HSCT problem, the range constraint had the largest effect on the objective function. On average, the HSCT requires approximately 90 lbs. of fuel per mile of range deficiency. Therefore, the penalty constant was chosen to increase the objective function by twice this much per mile of violation. The range constraint is given as

g = 2(1 − R/5500), (8)

where R is the range. The objective function, f, is the gross weight normalized by 700,000 lbs. This gives a value of 0.071 for the penalty multiplier. For our study we rounded this up to 0.1.

Test Functions

The optimizers were compared on two algebraic test functions in addition to the HSCT problems. The Griewank function was used as an example of a problem with one true optimum superimposed with noise. A simple quartic function was optimized in a hypercube to examine the optimizers' ability to locate the global optimum from among a large number of widely separated local optima. Each optimizer was run multiple times for the 5-, 10-, and 20-design-variable (DV) cases and the results are given below.

Griewank Function

The Griewank function is a quadratic function with noise added through a cosine term. Without the noise, the function is convex with one optimum. This function was used to test how well the optimizers could move through the noise to locate the actual optimum. The one-dimensional Griewank function is shown in Figure 11.

Figure 11. Griewank function in one dimension.
The n-dimensional Griewank function is defined as

F(x) = 1 + Σ_{i=1}^{n} x_i²/d − Π_{i=1}^{n} cos(x_i/√i). (9)

The relative strength of the noise can be controlled by adjusting d. When d is small, the Griewank function is primarily a quadratic function with a noise component from the cosine terms. As d is increased, the objective function becomes flatter and the cosine portion becomes dominant. The function changes from a noisy function with one global optimum to an almost flat surface with hundreds of nearly equal local optima. The constant d is taken to be 200, 1000, and 20,000 for the 5, 10, and 20 DV cases, respectively. The design domain was [−400, 600]^n for the DOT and LFOPCV3 optimizers with random starting points. For DIRECT, the box had edges of length 1000 with the upper limit randomly set between 100 and 900 to randomize the results for the different runs. This value of d and the size of the design space are much larger than those used by He et al. (2002), which makes the problem more difficult to solve. Each optimizer was run enough times to compute the average number of evaluations per optimum found. The global optimum is at x = 0. For this comparison, an optimizer was considered to have reached the global optimum if all of the design variables were in the range ±0.1. The optimizers were compared on the number of function evaluations required and the number of times each optimizer located the global optimum. The results are given in Table 1. We use two measures of efficiency of the optimizers. The first is the average number of function evaluations per optimum. The second measure is the number of function evaluations needed to achieve 90% confidence that the global optimum was found. The first measure favors the random search procedures because it is based on average performance and does not take into account that random search can occasionally give poor results purely by chance. The second measure includes some cost of "insurance" against chance.

Table 1. Comparisons for 5, 10, and 20 DV Griewank functions (d = 200, 1000, and 20,000, respectively).

5 DV      # of optima located / # of runs   Function evaluations per optimum   90% confidence
DIRECT    87/100                            3400                               3335
DOT       5/300                             10260                              23416
LFOPCV3   60/400                            9050                               19235
10 DV
DIRECT    56/100                            11810                              18550
DOT       96/500                            1260                               2604
LFOPCV3   136/500                           32240                              63598
20 DV
DIRECT    8/200                             102650                             231593
DOT       16/2000                           19740                              45265
LFOPCV3   128/2000                          7220                               16084

To calculate the number of function evaluations for 90% confidence, we start by observing that the probability of not locating the optimum with one optimization run is

p₁ = (n − 1)/n, (10)

where n is the total number of optimization runs per optimum located. The probability of not locating the optimum in r optimizations is

p_r = ((n − 1)/n)^r. (11)

We set p_r = 0.1 and solve for r, which gives the number of runs needed for 90% confidence that the global optimum was found. Multiplying this number by the average number of function evaluations per run, a, gives the desired number of function evaluations for 90% confidence, f₉₀:

f₉₀ = a · ln(0.1)/ln((n − 1)/n). (12)

When p₁ is close to one, it is easy to check that the number of function evaluations for 90% confidence is approximately 2.3 (ln 10) times the average number of function evaluations per optimum. The results in Table 1 do not give a clear advantage to any one optimizer, but they do indicate that DIRECT is not the most efficient optimizer in higher-dimensional spaces. The Griewank function is smooth with an underlying quadratic shape. This is suitable for gradient-based optimization if the algorithm is capable of moving past the weak local optima. LFOPCV3 does this by design, while DOT is able to do so by using approximations based on widely separated points. The lack of a clear trend in Table 1 may be due to the effect of d, which was different for each case.
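Eqs. (10)-(12) can be checked numerically. Applied to the DOT 5 DV entry of Table 1 (5 optima in 300 runs at 10,260 evaluations per optimum), this sketch reproduces the tabulated 90%-confidence figure to within rounding:

```python
import math

def evals_for_90pct(total_runs, optima_found, total_evals):
    # Eqs. (10)-(12): with n runs per optimum located, p1 = (n-1)/n,
    # and the evaluations per run times ln(0.1)/ln(p1) gives the cost
    # of 90% confidence that the global optimum has been found.
    n = total_runs / optima_found      # runs per optimum located
    a = total_evals / total_runs       # average evaluations per run
    r = math.log(0.1) / math.log((n - 1.0) / n)
    return a * r
```

For the DOT 5 DV case, `evals_for_90pct(300, 5, 5 * 10260)` gives roughly 23,400 evaluations, matching the table entry of 23,416 to within rounding of the averages.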
The relative strength of the noise changed as the number of design variables increased. To investigate the effect of the noise on the different optimizers, the 10-variable case was repeated for different values of d. The results for this comparison are given in Table 2.

Table 2. Effect of variation of d in the 10-dimensional Griewank function.

                    # of times optimum located   Function evaluations per optimum   90% confidence
DIRECT (100 runs)
d = 4000            19/100                       39480                              81958
d = 1000            56/100                       11810                              18550
d = 200             83/100                       6990                               7539
DOT (500 runs)
d = 4000            37/500                       3970                               8801
d = 1000            96/500                       1260                               2604
d = 200             160/500                      620                                1188
LFOPCV3 (500 runs)
d = 4000            25/500                       193410                             434117
d = 1000            136/500                      32240                              63598
d = 200             390/500                      830                                989

As expected, the optimization is easier for small values of d, where the global optimum is more distinct from the other local optima. DOT appears to be the least sensitive to this parameter, while LFOPCV3 was the most sensitive to d. The approximations used by DOT appear to allow it to move past the noise and capture the underlying quadratic function. LFOPCV3, however, is slowed down as it passes through the noise and is not able to locate the small improvements in the local optima for the large value of d.

Quartic Function

The one-dimensional quartic function is shown in Figure 12.

Figure 12. Quartic function in one dimension (e = 0.2).

The n-dimensional quartic function is defined as

F(x) = Σ_{i=1}^{n} [2.2(x_i + e_i)² − (x_i + e_i)⁴], (13)

where n is the number of dimensions and e_i is a random number uniformly distributed in the range [0.2, 0.4]. At the start of each optimization, the n values of e are selected and frozen until the optimization is complete. The values are then randomly selected again for the next run. The design space was the hypercube [−2, 2]^n. The global optimum is at x = (2, ..., 2). This problem contains 3^n local optima located at the constraint boundaries and close to the center of each variable.
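Both test functions are simple to implement; a sketch of Eqs. (9) and (13):

```python
import math

def griewank(x, d):
    # Eq. (9): a quadratic bowl plus a cosine "noise" term whose
    # relative strength is controlled by d.
    quad = sum(xi * xi for xi in x) / d
    noise = math.prod(math.cos(xi / math.sqrt(i + 1))
                      for i, xi in enumerate(x))
    return 1.0 + quad - noise

def quartic(x, e):
    # Eq. (13): each coordinate contributes a double-well term, giving
    # 3**n widely separated local optima on the hypercube [-2, 2]**n.
    return sum(2.2 * (xi + ei) ** 2 - (xi + ei) ** 4
               for xi, ei in zip(x, e))
```

At x = 0 the Griewank function attains its global minimum of 0, and for the quartic function the corner x = (2, ..., 2) gives a far lower value than the opposite corner or the shallow minima near the center, consistent with the descriptions above.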
This problem was used to examine the optimizers' ability to locate the global optimum for a nonconvex objective function with a large number of widely separated local optima. An optimizer was considered to have located the optimum if all of the design variables were greater than 1.9. DIRECT was run for 200 different random combinations of e for each case. LFOPCV3 and DOT used 20,000 starting points in order to obtain a statistically meaningful average number of function evaluations per optimum located. The results are given in Table 3.

Table 3. Comparisons for 5, 10, and 20 DV quartic functions.

5 DV      # of optima located / # of runs   Function evaluations per optimum   90% confidence
DIRECT    200/200                           1025                               1025
DOT       410/20000                         3397                               7741
LFOPCV3   8/20000                           2581418                            5942746
10 DV
DIRECT    200/200                           2192                               2192
DOT       16/20000                          174189                             400925
LFOPCV3   0/20000                           -                                  -
20 DV
DIRECT    200/200                           11266                              11266
DOT       0/20000                           -                                  -
LFOPCV3   0/20000                           -                                  -

For the Griewank function, none of the optimizers was 100% effective in locating the optimum. For the quartic function, DIRECT was 100% effective at locating the optimum, which means that a single run of DIRECT more than satisfies the 90% requirement. For this reason the 90% confidence column equals the function-evaluations-per-optimum column for DIRECT. DIRECT is clearly better suited to this problem than either DOT or LFOPCV3. Examination of the order of the points analyzed by DIRECT indicates that it is able to detect the slight global trend towards the global optimum from the first iteration. At each iteration, DIRECT divided the box containing the global optimum. Due to the extreme change in function value near each local optimum, once DIRECT has sampled a point in the basin of a local optimum, it quickly refines the solution in that region. The large difference in function values also has the effect of masking the intermediate-sized boxes when the convex hull is calculated for the potentially optimal box selection.
DIRECT divided the best box and some of the largest boxes at each iteration while ignoring the intermediate-sized boxes with average function values. The fact that DIRECT found the global optimum at the beginning of the search allowed it to quickly move to the best optimum but limited the amount of the design space that was searched. If DIRECT initially locates a different deep local optimum, the optimization will converge to that local optimum, as our experiments with the parallel DIRECT found (Chapter 5). While DIRECT does locate the basins of other local optima, its ability to detect the global trend in this problem allowed it to immediately move to the basin of the global optimum. DOT is unable to adequately capture the shape of the entire design space with its quadratic approximation. The quadratic approximation is only good for describing small portions of the design space, which prevents it from accurately extrapolating to the regions of other local optima. LFOPCV3 is strictly a local optimizer for this problem. It is designed to move through noise and weak local minima, not the deep basins found in this problem. In order for this optimizer to find the global optimum, it must start in the basin that contains the global optimum. For e = 0.3, the odds of starting in this region for the 5, 10, and 20 DV cases are 1 in 333, 1 in 111,000, and 1 in 1.23 × 10¹⁰, respectively. This makes multistart local optimization methods a poor choice for this type of problem. LFOPCV3's performance in the 5 DV case indicates that it is not even this successful at locating the optimum: in many cases LFOPCV3 stopped at saddle points and at some points where the slope was not zero.

HSCT Design Problem

The design problem used to compare the three optimization methods is the configuration design of an HSCT. The HSCT design code uses up to 29 design variables and up to 68 constraints (MacMillin et al. 1997).
The design goal is to minimize the gross takeoff weight (GTOW) while satisfying 68 range, performance, and geometric constraints. The HSCT code uses simple structural and aerodynamic models to analyze a conceptual design of the aircraft. In more recent versions of the code, the aerodynamic analysis for the drag is replaced by a response surface constructed from a large number of simulations. The version of the HSCT code employed in this study uses an aerodynamic drag response surface based on solutions of the Euler equations (McGrory et al. 1993). Once the design variables were selected using design-of-experiments theory, the wing camber for each design was found using Carlson's modified linear theory optimization method WINGDES (Carlson and Miller 1974, Carlson and Walkley 1984). This design was then analyzed using the Euler equations. The Euler analysis was made at two angles of attack and a response surface was constructed for the drag polar in terms of the intervening variables C_D0 and K:

C_D = C_D0 + K C_L². (14)

The viscous contribution to the drag, C_D0, is obtained from standard algebraic estimates of the skin friction assuming turbulent flow (Hopkins 1972). Previous work with the HSCT code has demonstrated the nonconvexity of the feasible domain and the existence of multiple local minima (Knill et al. 1999). In addition, several analyses and suboptimizations, including range calculations and structural weights, result in noisy performance functions. This noise adds additional local minima and makes accurate gradient calculations difficult. Optimization methods must be able to deal with this noise and move from one region of the design space to another without becoming trapped in local minima.

Figure 13. Definitions of design variables.

* x1 = c_root /100 ft                    * x11 = t/c root
  x2 = c_tip /10 ft                        x12 = t/c break
** x3 = b/2 /10 ft                         x13 = t/c tip
*** x4 = LE length, inboard /100 ft      *** x14 = fuselage radius /10 ft
*x5 LE sweep m /100 *** x15= Fuse radius /10ft. **x6 = LE sweep out /100 *** x 16 Fuse radius /10ft. x7 = TE length /100ft. ***x 1 Fuse radius /10ft. x 8 TE sweep m /10 ** x 18 Inboard nacelle /10ft. ** x9 = max t/c loc. /10% x 19 Nacelle spacing /10ft. ** x LEradius /ft. x20 W0iel /105 lb Variables for 5 DV case x 10 ** =Variables added for 10 DV case *** = Variables added for 15 DV case The HSCT code optimization can work with subsets of 5, 10, 15, or 20 of the design variables as defined in Figure 13. The variables designated with one asterisk are used in the 5 DV case. The 10, 15 and 20 DV cases add the variables designated with two, three and four asterisks respectively. These design cases were used to compare the performance of the optimizers. The five DV cases have only one optimum, while the higher dimensional designs contain many local optima and nonconvex feasible regions. This allowed us to compare the optimizers on problems of different complexities. For each design case, the HSCT code was run for 100 starting points for DOT and LFOPCV3, and once for DIRECT. Table 4 shows the results from these runs. The 90% confidence column here refers to the number of function evaluations needed to find a design within 1000 lb of the best optimum design out of all 3 optimizers. In order to ensure that DIRECT will locate the same optimum with slightly different starting conditions, it was run a second time for each case with the design box perturbed by 1%. In each case, DIRECT located an optimum that was within 300 lb of the original optimum. For a small number of design variables DOT has a large advantage over DIRECT, but this advantage diminishes with dimension, and for the 20 design variable case, DIRECT is slightly better. LFOPCV3 did very poorly, and was not able to find the global optimum except for the five variable case. 
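The two-point construction behind Eq. 14 amounts to a 2x2 linear solve: drag computed at two angles of attack gives two (CL, CD) pairs, which determine CD0 and K exactly. A minimal sketch (the numerical inputs below are illustrative placeholders, not values from the HSCT code):

```python
import numpy as np

def fit_drag_polar(cl, cd):
    """Fit CD = CD0 + K*CL**2 through two analysis points.

    cl, cd: length-2 sequences of lift/drag coefficients from the two
    angle-of-attack analyses. Returns (CD0, K)."""
    cl = np.asarray(cl, dtype=float)
    cd = np.asarray(cd, dtype=float)
    # The model is linear in the unknowns: cd_i = CD0 + K * cl_i**2
    A = np.column_stack([np.ones_like(cl), cl**2])
    cd0, k = np.linalg.solve(A, cd)
    return cd0, k

# Illustrative numbers only:
cd0, k = fit_drag_polar([0.08, 0.16], [0.0090, 0.0120])
cd_pred = cd0 + k * 0.12**2   # drag polar evaluated at an intermediate CL
```

With more than two analyses, the same system would be solved in a least squares sense, which is how a response surface over the whole design space is built.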
LFOPCV3 was slowed down by the need to continuously calculate gradients by finite differences, while DOT was able to construct an approximation to the problem which reduced the number of gradient calculations it required.

Table 4. Comparison of the 5, 10, 15, and 20 DV HSCT problems.

           Best GTOW   Function      # of     90%
           located     evaluations   Opt*     Confidence
5 DV
 DIRECT    638238        2345         1         2345
 DOT       638231       10870        94           89
 LFOPCV3   638813       33234         5        14919
10 DV
 DIRECT    624751        9067         1         9067
 DOT       624731       29791        12         5366
 LFOPCV3   631300      137929         0           --
15 DV
 DIRECT    603127       40662         1        40662
 DOT       602685       47440        12         8545
 LFOPCV3   620538     1437312         0           --
20 DV
 DIRECT    588404       51147         1        51147
 DOT       588586       57428         2        65453
 LFOPCV3   620092      418315         0           --
* 1 run for DIRECT, 100 runs each for DOT and LFOPCV3.

For the 5 DV case, each optimizer found approximately the same global optimum. This was expected, because earlier experiments showed that the 5 DV constraint set was convex with only one local optimum. The difference in the design variables is less than 1%, which can be attributed to noise in the objective function causing premature convergence for LFOPCV3. DIRECT uses DOT for the final convergence, which should cause both to reach the same design when optimizing in the same local region. Here DOT was the most efficient optimizer by far, since it did not require more than a few starting points to locate the optimum. For the 10 DV case, DIRECT and DOT located designs of the same weight, with DOT being more efficient by a factor of two. The 10 DV case has distinct local optima (Knill et al. 1999), and LFOPCV3 could not locate the global optimum. For the 15 DV case, DOT found a slightly lower weight than DIRECT and was also much more efficient. However, this difference is less than 0.08% of the total weight of the HSCT. This case illustrates the unpredictable performance of a multistart method: out of the initial 100 random starting points, DOT was only able to locate the global optimum once, though it found seven designs with weights as good as DIRECT's result.
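The likelihood figures for the multistart runs follow from a simple independence model: if one random start succeeds with probability p, the chance that at least one of n starts succeeds is 1 - (1 - p)^n. A quick sketch of the resulting run count:

```python
import math

def starts_for_confidence(p_success, confidence=0.90):
    """Number of independent restarts needed so the probability of at
    least one success reaches `confidence`, given per-run success
    probability p_success."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_success))

# With 12 of 100 DOT runs reaching the optimum (the 10 DV entry in Table 4):
n_starts = starts_for_confidence(12 / 100)
```

Multiplying this run count by the average cost per run gives an evaluation budget in the spirit of the table's 90% confidence column.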
To evaluate the likelihood of finding the optimum with DOT, another 100 random starting points were chosen for DOT. DOT was unable to locate the best optimum from this set of starting points. Examination of the design space near the best optimum indicates that it lies in a small feasible region, which explains the difficulty in locating it. The designs found by DOT and DIRECT are shown in Table 5. They differ only slightly in most of the design variables. This indicates that the difference between the two is primarily due to noise preventing the optimizer from moving through a narrow region of similar weights to find the best optimum.

Table 5. Fifteen DV optima for DOT and DIRECT.1

Design variable    DOT      DIRECT   % difference
Root chord        1.7306    1.7116       4.2
Tip chord         0.8425    0.8563       2.3
Semispan          6.9164    6.9336       1.1
LE length in.     1.2052    1.2074       1.0
LE sweep in.      0.7128    0.7121       1.2
LE sweep out.     0.1205    0.1300       4.8
Max t/c loc.      4.8400    4.8502       0.6
LE radius         3.0660    2.4353      33.2
t/c ratio         1.9872    2.0304       5.4
Fuse. radius 1    0.5176    0.5144       2.1
Fuse. radius 2    0.5669    0.5699       2.0
Fuse. radius 3    0.5659    0.5699       2.7
Fuse. radius 4    0.4966    0.4928       2.5
Inboard nacelle   2.8293    2.8519       0.9
Fuel wt.          3.0938    3.1005       1.1
1 The design variables in Table 5 are defined in Figure 13.

For the 20 DV case, DIRECT performed slightly better than DOT. The weights they found differed by only 200 lb, but the design variables differed by as much as 35% of their respective ranges. This case showed that DIRECT is still able to perform well in a 20 dimensional design space. The number of function evaluations increased rapidly beyond 10 dimensions, but DIRECT was still less expensive than the two multistart methods for the 20 DV case. Without prior knowledge of the design space, it is difficult to decide how many starting points to choose for the multistart methods.
In an effort to determine the number of local optima scattered throughout the design space, the optimum designs from the 10 DV DOT optimizations were compared. Figure 14 shows the distribution of the design variables for the 25 best optimizations found by DOT over the range of each design variable. The range of the GTOW was only 16,000 lb, but some of the variables differ by their entire range. This plot indicates that some variables, such as the root chord, t/c ratio, inboard nacelle location, fuel weight, and outboard sweep angle, tend to a small range of values for the optimized design. The other variables fluctuate widely without affecting the GTOW much. This indicates that the design space contains narrow channels with slowly varying weights. DOT easily locates these regions of similar weights but cannot always move through them without becoming trapped by minor fluctuations in the objective function. The HSCT problem has both physical local optima and numerical noise. The good performance of DIRECT and DOT for this problem is consistent with the fact that DOT handled low amplitude noise well for the Griewank test function, and DIRECT handled widely separated local optima well for the quartic test function. As the dimensionality of the problem increases, the number of substantially separated optima can increase exponentially, and the effectiveness of multistart methods appears to deteriorate faster than that of DIRECT. The poor performance of LFOPCV3 compared to the Griewank problem may have to do with the fact that the noise in the HSCT problem is not differentiable.

Figure 14. Distribution of design variables for the best 25 designs found by DOT for the 10 DV HSCT case. (Variation is normalized to each variable's range; variables: 1. Root chord, 2. Tip chord, 3. Semispan, 4. Max t/c location, 5. LE radius, 6. t/c ratio, 7. Inboard nacelle, 8. Fuel weight, 9. LE sweep in, 10. LE sweep out.)

Earlier work on optimizing the HSCT found that the aerodynamic and structural evaluations were noisy, which hindered local optimization algorithms (Baker et al. 1998, Balabanov et al. 1998). Response surfaces were used to reduce the noise and improve gradient calculations for the local optimizers. However, some sources of noise remain in the constraints and objective function, which could account for the poor performance of LFOPCV3 on the HSCT problem. The use of additional response surfaces to smooth the analysis might improve the performance of LFOPCV3 for this case.

Comparison Concluding Remarks

Three optimization procedures were tested for the global optimization of an HSCT. DOT, a local optimizer using sequential quadratic programming, and LFOPCV3, a semiglobal optimizer, were applied with random multistarts. DIRECT, a global Lipschitzian optimizer, was applied in a single run. The three optimization procedures were first tested on two simple algebraic problems. The Griewank function is a convex function superimposed with low amplitude noise from a trigonometric function. The quartic test function does not have any noise but displays a large number of widely separated local optima. These two functions revealed that the two multistart procedures, DOT and LFOPCV3, dealt well with trigonometric noise, while DIRECT was much more effective in finding the global minimum among widely separated local optima. The HSCT problem presents a combination of numerical noise and multiple physical optima, and thus has elements of both test functions. For a small number of design variables DOT had a large advantage over DIRECT, but the advantage decreased with increasing dimensionality, with the two having similar performance for the 20 variable case.
LFOPCV3 did poorly on the HSCT problem, possibly because of the effect of the noise on the derivatives, which its search procedure must calculate more frequently than DOT does.

CHAPTER 3
DIRECT-BP

Based on the results of the optimization comparison, DIRECT was selected as a promising global optimization method for high dimensional designs. However, while DIRECT does have a guarantee of convergence to the global optimum, this result does not necessarily imply practical utility in most applications. DIRECT searches globally for promising regions of the design space while locally refining the solution about the best point located so far. Because of this, the behavior of DIRECT about the best points it finds quickly is of more practical importance than its performance in the limit as t approaches infinity. Apart from the global guarantee of convergence, there is no guarantee that DIRECT will converge towards a locally optimal point at any intermediate time. Because it divides the design space into separate regions, or boxes, DIRECT may have difficulty extending its local search past the bounds of the current best box: DIRECT only moves its local search between boxes when the global search identifies a better box. This can lead to DIRECT dividing boxes arbitrarily small on the border of a much larger box and indicating that no improvement can be made about that point when the local slope is clearly not zero. The incorporation of a local optimizer into DIRECT, either at the end of the optimization as in the previous chapter or periodically during the DIRECT optimization (Cox et al. 2001, Nelson II 1998), can partially solve this problem. This has worked well for smooth functions with widely separated local optima, where a local optimizer can quickly move to the only optimum in the immediate vicinity.
However, for noisy functions with large numbers of weak local minima in the region of a true minimum, a local optimizer might easily become trapped and fail to locate the best point. It would be preferable in this case to modify the DIRECT search to allow it to refine boxes in the neighborhood of the best point without being confined by the bounds of the current best box. The modified version of DIRECT, the DIRECT-BP algorithm (BP for Box Penetration), was designed to address this issue.

Original DIRECT Algorithm Convergence Behavior

The DIRECT algorithm is a space partitioning heuristic for selecting the order in which boxes are divided, so as to bias the divisions towards boxes with known good function values while not ignoring sparsely sampled areas. Dividing the boxes in any order guarantees that a point within ε of the global optimum is located by the time the design space has been completely sampled on a mesh of spacing ε. The important distinguishing feature of different heuristics is how likely the method is to locate such a point when far fewer points have been analyzed. By this measure, DIRECT has proven to be a very efficient method. The global search using a high value of the rate-of-change constant K is what gives DIRECT an asymptotic guarantee of convergence to the global optimum. However, this search is too expensive as a practical matter to depend on for locating a good value. In practice, DIRECT is run long enough to give the global portion of the search a fair chance of locating a basin which contains a very good optimum, which the local portion of DIRECT searches while the global part continues to look for a better basin. In this way, DIRECT only has to sample on a fine mesh near each good local optimum and can ignore most of the design space, where a sparse sampling indicates poor function values. One problem with this method is that the local portion of the search is confined to the potentially optimal boxes which contain the best points located.
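The potentially optimal boxes DIRECT divides each iteration are conventionally found as the lower-right convex hull of the boxes plotted as (diameter, center value) points, as in Figure 16. A sketch of that selection, with boxes reduced to bare (diameter, f) tuples (a real implementation would also apply a minimum-improvement threshold):

```python
def potentially_optimal(boxes):
    """Indices of boxes on the lower-right convex hull of (diameter, f).

    boxes: list of (diameter, f_center) pairs. A box is kept if some
    rate-of-change constant K > 0 makes f - K*diameter minimal there."""
    # For each diameter keep only the box with the lowest function value.
    best = {}
    for i, (d, f) in enumerate(boxes):
        if d not in best or f < boxes[best[d]][1]:
            best[d] = i
    cand = sorted(best.values(), key=lambda i: boxes[i][0])
    # Build the lower convex hull, scanning from small to large diameter.
    hull = []
    for i in cand:
        while len(hull) >= 2:
            (d1, f1), (d2, f2) = boxes[hull[-2]], boxes[hull[-1]]
            d3, f3 = boxes[i]
            if (f2 - f1) * (d3 - d1) >= (f3 - f1) * (d2 - d1):
                hull.pop()   # hull[-1] lies on or above the chord
            else:
                break
        hull.append(i)
    # Only the part of the hull from the best f to the largest box counts.
    m = min(range(len(hull)), key=lambda k: boxes[hull[k]][1])
    return hull[m:]

# The box with the best value and the largest box always survive:
chosen = potentially_optimal([(1, 5), (2, 3), (3, 4), (4, 2), (5, 6)])
```

This makes the tradeoff concrete: small boxes survive only with very good values, while the largest boxes survive with any value, which is the global portion of the search.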
As these boxes get small, the local search is unable to make appreciable progress within that box towards the optimum. It may be useful to enable the local search to move past the bounds of the best box if there is a large unexplored region nearby. Let X*(t) be the box containing the best function value at iteration t, and c*(t) be the center point of box X*(t). Consider a situation such as Figures 15 and 16, where X*(t) is next to a much larger box with a poor function value at its center. When the local optimum near c*(t) is outside of X*(t), DIRECT is unable to move to the local optimum using the local portion of its search. The local portion of the DIRECT search will continue to divide X*(t) and sample points which asymptotically approach the box boundary near the optimum. That is, the progress of the local portion of DIRECT is limited by the boundary of X*(t). The local portion of the DIRECT search will stall, even if larger improvements are possible in the neighborhood of X*(t). Asymptotically, DIRECT deals with this situation through its global search. By continuing to divide the large boxes in the design space, the algorithm will eventually divide box 2 in Figure 15. This would allow DIRECT to shift the local search to the box that contains the optimum and eventually reach the correct solution. However, Figure 16 suggests that almost all of the boxes that are larger than box 2 will have to be divided before box 2 is located on the convex hull and divided. Thus DIRECT may terminate before box 2 is identified and its subboxes are divided enough to locate a point better than any other located.

Figure 15. Instance where the current best box does not contain the local optimum.
Figure 16. Possible distribution of points for potentially optimal box selection.

This problem is aggravated in higher dimensional problems.
The number of boxes increases rapidly with the number of dimensions, and each box generates more new boxes when it is divided. This results in more large boxes which may need to be divided before the one containing the global optimum is identified. Previous work has attempted to deal with problems like this by stopping the optimization early and performing additional DIRECT optimizations with a smaller design space centered at the current best point (Bartholomew-Biggs et al. 2003). However, this runs the risk of excluding regions of the design space which contain unidentified good local optima.

Demonstration of Local Convergence Difficulty

The following example shows how DIRECT can stall in its progress towards a global minimum. Equation 15 defines f1, a combined quadratic and sinusoidal function:

f1(x) = Σ_{i=1..n} [ sin(1.5π(x_i − 0.1))/1.7 + (x_i − 1.4)²/1.2 + 0.05 cos(77 x_i) + i·x_i/99 ]    (15)

The last two terms add "noise" and make the contribution from each dimension different. A one dimensional graph of this function is given in Figure 17.

Figure 17. A one dimensional plot of f1(x) (Eq. 15) in the range x = [0, 3].

If the design space is 0 ≤ x_i ≤ 2.5, for i = 1, ..., n, then DIRECT has no difficulty locating the global optimum at x_i = 1.1. For a ten dimensional case, DIRECT locates the optimum in about 100 iterations and 1500 function evaluations. However, setting the upper bound to 3 causes the search to move initially to the wrong portion of the design space. For the same ten dimensional case, DIRECT can be run for almost 19,000 iterations and over 60,000 function evaluations without locating the global optimum. Figures 18 and 19 show the progress of DIRECT towards the global optimum for the ten dimensional case of Equation 15 and show how the progress of the best design stalls as it approaches x_i = 1. The convergence of DIRECT to a local optimum depends on locating the box that contains the optimum and then refining the solution within that box.
For this problem, the function value at the center of the box that contains the global optimum is higher than the function value at any other box of that size or larger. This means that every other box with side lengths of 1/3 or larger will be divided before the one containing the global optimum. For the 10 dimensional case there are 59,049 separate boxes of this size that would have to be divided. For most practical implementations of DIRECT, the optimizer would be stopped before the middle box is divided for problems of as few as 5 or 6 dimensions. This prevents DIRECT from identifying the box that contains the global optimum as potentially optimal before it is stopped.

Figure 18. Distance between c*(t) and the global optimum for Eq. 15 in ten dimensions, for DIRECT-BP and DIRECT.
Figure 19. Value of f(c*(t)) for f1 (Eq. 15) in ten dimensions as the optimizations progress, for DIRECT-BP and DIRECT.

The DIRECT-BP Algorithm

The DIRECT-BP algorithm was proposed as a way to address this shortcoming of the DIRECT algorithm. DIRECT-BP forces the neighbors of the best box to be divided at a rate similar to that of the best box. This allows DIRECT-BP to extend its search for a local optimum beyond the bounds of the best box. The local search is inspired by generalized pattern search theory (Dennis and Torczon 1991, Torczon 1997). By forcing the neighboring boxes to be divided as the best box is shrunk, DIRECT-BP can detect better regions near the best box that were masked by large neighbors with poor function values at their centers. This provides a more thorough search of the design space surrounding the best point located, and adds a measure of confidence that DIRECT-BP has found the best point in the immediate vicinity, in exchange for a moderate increase in the cost of the optimization. The DIRECT-BP algorithm differs from the original DIRECT algorithm by reprioritizing which boxes are divided.
Specifically, DIRECT-BP may (a) prevent the best potentially optimal box from being divided and (b) add additional boxes in each iteration to the list of boxes that are divided. This reprioritization of the boxes ensures that the neighbors of the box containing the best point are refined and divided. This prevents the boxes surrounding the best point from becoming too large relative to the size of the best box being divided. It also allows DIRECT-BP to balance its global search with an effective local search about the best point, without having the local search confined to a single box. Let δj(t) be the smallest box edge length of the jth box Xj(t) at iteration t, and let Δ*t = δj*(t), where X*(t) = Xj*(t). Further, let vj(t) be the vector from the center of the best box, c*(t), to the center of box Xj(t), ρj(t) = |vj(t)|, and λj(t) = ρj(t)/Δ*t. We say that Xj(t) is a neighboring box of X*(t) if it touches at least one point of X*(t). A face neighbor is any neighboring box that shares an (n−1) dimensional face with the best box. We say that there is a balanced neighborhood about the best box if (some) neighboring boxes have centers at similar distances |vj(t)| on all sides of the best box. The balance in distances is controlled by a parameter Λ, which defines the largest ratio λj(t) of distance to the size of the smallest box that still allows vj(t) to be considered a "similar" distance to the best box. The formal definition of "on all sides" comes through the concept of a positive spanning set, defined below. A set of vectors V is a positive spanning set for A ⊆ ℝⁿ if each point in A is a nonnegative linear combination of finitely many elements of V. Consider

Vt = { vj(t) : Xj(t) is a neighbor of X*(t) }.    (16)

Let Tt ⊆ {±e1, ..., ±en} be the set of vectors indicating the faces of X*(t) not on the boundary of the design space. Figure 20 illustrates the values of Tt for boxes at different positions in the feasible domain.
We say that Vt is a balanced neighborhood of X*(t) if some V′t ⊆ Vt is a positive spanning set for Tt and

max_{v ∈ V′t} |v| ≤ Λ Δ*t    (17)

where Λ ≥ 1 is a user specified constant which defines the maximum allowed ratio λj(t) = ρj(t)/Δ*t for the boxes used to form a positive spanning set in the neighborhood of X*(t).

Figure 20. An illustration of the sets of vectors Tt used for X*(t) at various positions in the feasible domain.

The DIRECT-BP algorithm is described below, with modifications to the original DIRECT algorithm on page 33 in boldface.

I. Normalize the search space to the unit box. Evaluate f(c1), and set B1 = {X1}.
II. For t = 1, 2, ...
  A. Identify the set St ⊆ Bt of potentially optimal boxes.
  B. If t > 1, examine the neighborhood of X*(t) and modify St.
    1. Form Vt = {vj(t)} from the neighbors of X*(t), and let V′t = { vj(t) ∈ Vt : λj(t) < Λ }.
    2. If V′t is not a positive spanning set for Tt, then
      a. remove X*(t) from St if X*(t) is a cube;
      b. add to St the face neighbors Xj(t) of X*(t) with the property that δj(t) > Δ*t;
      c. optionally, add additional neighbors of X*(t) to St.
  C. For each box Xj(t) ∈ St where diam(Xj(t)) > γ:
    1. Identify I, the set of dimensions of Xj(t) with the longest side length. Let δ equal 1/3 of this length.
    2. Sample the function at the points cj ± δei for all i ∈ I.
    3. While I is not empty:
      a. Divide Xj(t) into thirds along the dimension i ∈ I with the lowest value of f(cj ± δei), creating two new boxes and reducing the size of box Xj(t).
      b. Remove i from I and add the new boxes to Bt+1.
  D. Terminate if an appropriate stopping rule is satisfied.

In Step B.2, DIRECT-BP ensures that the best box is only divided if (a) it is not a cube or (b) it is a cube and the neighborhood is balanced. This allows DIRECT's local search to progress beyond the bounds of the current best box without relying on the global portion of the DIRECT search.
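Step C of the listing, which samples c ± δei along each longest dimension and then divides into thirds starting from the best-valued dimension, can be sketched as follows (bookkeeping simplified; a real implementation would also track box diameters and the master box list Bt):

```python
import numpy as np

def divide_box(lo, hi, f, f_center):
    """One DIRECT-style trisection of the box [lo, hi].

    Samples f at c +/- delta*e_i for each longest dimension i, then splits
    along those dimensions in order of their best sampled value. Returns
    the list of (lo, hi, f_center) triples replacing the original box."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    n = len(lo)
    c = (lo + hi) / 2.0
    side = hi - lo
    I = [i for i in range(n) if np.isclose(side[i], side.max())]
    delta = side.max() / 3.0
    samples = {i: (f(c + delta * np.eye(n)[i]), f(c - delta * np.eye(n)[i]))
               for i in I}
    boxes = []
    # Best (lowest) sampled dimension first, so good points keep big boxes.
    for i in sorted(I, key=lambda i: min(samples[i])):
        f_plus, f_minus = samples[i]
        third = side[i] / 3.0
        lo_minus, hi_minus = lo.copy(), hi.copy()
        hi_minus[i] = lo[i] + third        # lower third holds c - delta*e_i
        lo_plus, hi_plus = lo.copy(), hi.copy()
        lo_plus[i] = hi[i] - third         # upper third holds c + delta*e_i
        boxes += [(lo_minus, hi_minus, f_minus), (lo_plus, hi_plus, f_plus)]
        lo, hi = lo.copy(), hi.copy()
        lo[i] += third                     # shrink the remaining middle box
        hi[i] -= third
    boxes.append((lo, hi, f_center))       # middle box keeps the old center
    return boxes

# Unit square with f = x1 + x2: two longest dimensions, five child boxes.
boxes = divide_box([0, 0], [1, 1], lambda x: float(x.sum()), 1.0)
```

Note the design choice the listing encodes: the dimension with the best sampled value is divided first, so its good sample point lands in the largest child box and stays attractive to future divisions.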
The description above provides the basic requirements of this step, but we defer the details of our implementation to the next section. For example, the method for selecting boxes to add to St in Step B.2 can significantly affect the efficiency of the search in DIRECT-BP. The most important consideration for efficiency is to form a balanced neighborhood rapidly with the fewest extra function evaluations. Thus, an effective implementation of DIRECT-BP will probably not include in St all of the neighbors that define Vt \ V′t. However, including only one box per iteration is probably not sufficient to ensure rapid local search about X*(t). Figure 21 illustrates that it is possible for Vt not to form a positive spanning set for Tt. In this case, we need to select a neighboring box to divide in order to improve the span of V′t. Regardless of how the neighboring boxes are selected, V′t will form a positive spanning set for Tt within a finite number of iterations. When X*(t) is on a boundary of the domain, the requirement that V′t positively span Tt to form a balanced neighborhood has the following effect. Consider some d ∈ Tt that is parallel to a domain boundary. V′t will positively span the direction d if there exists some face neighbor Xj(t) for which cj is located along d and λj(t) < Λ. Further, if V′t does not positively span d, the face neighbor indicated by d will be divided. This requires only finitely many iterations of DIRECT-BP, since it suffices to divide this face neighbor until it is the same size as the best box. This puts the center of the face neighbor at c*(t) + ei Δ*t, where ei has the same direction as d.

Figure 21. Two dimensional example of vectors failing to form a positive spanning set. The vectors from the center of the best box to the centers of the neighboring boxes do not form a positive spanning set for Tt = {±e1, ±e2}. vm points towards the part of the design space not positively spanned by V′t.

Finally, note that these changes to DIRECT will not affect the search far from the best point. Instead, the more balanced search about the best point adds an element of robustness to DIRECT by widening the search in the area likely to contain the global optimum. Consequently, DIRECT-BP effectively converges both locally and globally.

Implementation

The performance of DIRECT-BP may depend strongly on the manner in which it is implemented. In particular, the number of points to be analyzed in each iteration is significantly influenced by the method DIRECT-BP uses to select which neighbors of X*(t) to divide. Further, Vt can become quite large, so the method of identifying a V′t that forms a positive spanning set for Tt may affect the overhead of the program. The code used here attempts to remain as close as possible to the original version of DIRECT, to avoid hurting performance on problems where DIRECT already performs well. DIRECT-BP requires that V′t form a positive spanning set for Tt before a cubic X*(t) can be divided. Algorithmically, we break this into two subproblems. First, we identify a positive spanning subset of Vt for Tt. If such a subset does not exist, we select neighboring boxes to divide to ensure that such a subset exists in a subsequent iteration. Otherwise, we confirm that |v|/Δ*t < Λ for all vectors v in the subset. If this is not true, then we select neighboring boxes to divide so as to reduce the lengths of the long vectors in this subset. The details of this implementation are outlined below, with a more detailed discussion following.

I. Examine the neighborhood of X*(t) and modify St.
  A. Form Vt = {vj(t)} from the neighbors of X*(t).
  B. Find the set Q of neighboring boxes with the shortest vectors vj(t) that form a positive spanning set for Tt, if it exists. Include all of the face neighbors of X*(t) in Q if V′t is not a positive spanning set.
  C.
If Q does not exist,
    1. Add to the set St one or more of the neighboring boxes corresponding to vectors in Vt that are considered likely to increase the balance of Vt when divided, if they are not already there;
    2. Remove the best box X*(t) from St if X*(t) is cubic.
  D. Else, if λj(t) ≥ Λ for some Xj(t) ∈ Q, or diam(X*(t)) is smaller than γ, then
    1. Add to St the boxes corresponding to all of the vj(t) where λj(t) ≥ Λ;
    2. If diam(X*(t)) < γ, then add to St the boxes Xj(t) ∈ Q where ρj(t) > cγ, where Λ > c > 1;
    3. Remove the best box from St if X*(t) is cubic.

Identifying a Positive Spanning Set

An important subroutine of DIRECT-BP tests whether a set of vectors forms a positive spanning set for Tt. The following theorem provides a general mechanism for performing this test efficiently.

Theorem 1. Let V = {v1, ..., vk} span ℝⁿ. If there exist αi ≥ 0 such that −Σ_{j=1}^{k} vj = Σ_{i=1}^{k} αi vi, then V forms a positive spanning set for ℝⁿ.

Proof. Let w = −Σ_{j=1}^{k} vj. Since V spans ℝⁿ, it follows that V ∪ −V forms a positive spanning set for ℝⁿ. Also, −vi = w + Σ_{j=1, j≠i}^{k} vj for each i. Therefore V ∪ {w} positively spans V ∪ −V. Now suppose that there exist αi ≥ 0 such that w = Σ αi vi. Then V positively spans V ∪ {w}, and in turn forms a positive spanning set for ℝⁿ.

Based on previous experiments, most practical implementations of DIRECT-BP will use Λ ≥ √n. With such a value of Λ, the vectors vj(t) shorter than ΛΔ*t are guaranteed to span ℝⁿ because of the way that DIRECT generates boxes. In this case, the check for a positive spanning set for an interior X*(t) only requires solving a single set of constrained linear equations to calculate the αi of Theorem 1. Thus, as each box's vj(t) is added to V′t to help form a positive spanning set, only a single set of constrained linear equations must be solved to determine whether a positive spanning set for Tt has been generated.
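Theorem 1 reduces the spanning test to one nonnegatively constrained linear solve. A sketch using nonnegative least squares (SciPy's nnls, as one readily available solver; this covers the interior-box case where the spanned set is all of ℝⁿ):

```python
import numpy as np
from scipy.optimize import nnls

def positively_spans(V, tol=1e-9):
    """Test whether the columns of V positively span R^n via Theorem 1:
    V spans R^n and -sum_j v_j is a nonnegative combination of the v_j."""
    V = np.asarray(V, dtype=float)      # shape (n, k): one vector per column
    n, _ = V.shape
    if np.linalg.matrix_rank(V) < n:    # V must span R^n first
        return False
    w = -V.sum(axis=1)
    _, residual = nnls(V, w)            # min ||V a - w||  subject to  a >= 0
    return bool(residual <= tol)

# {e1, e2, -e1-e2} is a minimal positive spanning set of the plane:
spans = positively_spans([[1, 0, -1], [0, 1, -1]])
```

If V already positively spans ℝⁿ, then every vector, including −Σ vj, is a nonnegative combination of the vj, so with the rank condition satisfied the residual test is conclusive in both directions.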
If X*(t) is on the boundary, either a linear solve is required for each vector in Tt, or an intricate scheme is used to ensure that all directions d parallel to the boundary are positively spanned, in which case Theorem 1 can be applied by artificially adding to Vt all of the outer normals to the faces of X*(t) on the boundary. In this case, the requirement that the face neighbors of X*(t) on the boundary of the design space be divided to the same size as X*(t) is explicitly included, to ensure that the added vectors do not artificially allow Vt to span the directions d parallel to the boundary. In the implementation of DIRECT-BP used in this research, instead of solving a set of linear equations to determine whether a positive spanning set has been formed, an iterative heuristic solves for vm by maximizing the minimum angle between vm and all of the vi(t) (Figure 21). Starting with vm equal to the negative of the average of the vi(t), vm is moved away from the closest vi(t) until the smallest angle to the closest vi(t) can no longer be increased. If Vt does not form a positive spanning set, then vm is guaranteed to lie in the space not covered by Vt, and the angles between vm and all of the vi(t) will be greater than 90°. If Vt does form a positive spanning set, then the maximized smallest angle will be less than 90°. This method was used because no suitable code for solving a constrained set of linear equations was available when the code was first written, and because it generates an approximation to vm, which is used in the next section.

Maintaining a Positive Spanning Set

Whenever the set Vt does not form a positive spanning set for Tt, some of the neighboring boxes, and X*(t) itself (only if it is not a cube), must be divided until a balanced neighborhood is formed. There is a set N of neighbors that intersect the space not covered by the positive cone of the vectors in Vt.
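The push-away heuristic for vm, and the subsequent use of vm to pick which neighbors to divide, might look as follows (a simplified rendering of the idea rather than the dissertation's exact routine; the step size, iteration count, and example vectors are arbitrary choices):

```python
import numpy as np

def uncovered_direction(V, iters=200, step=0.1):
    """Heuristic for v_m: start from the negative mean of the (normalized)
    neighbor vectors and repeatedly push away from whichever vector
    currently makes the smallest angle with v_m."""
    U = np.asarray(V, float)
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    vm = -U.mean(axis=0)
    if np.linalg.norm(vm) < 1e-12:
        vm = np.ones(U.shape[1])
    vm /= np.linalg.norm(vm)
    for _ in range(iters):
        closest = U[np.argmax(U @ vm)]   # largest cosine = smallest angle
        vm = vm - step * closest         # move away from that vector
        vm /= np.linalg.norm(vm)
    return vm

def neighbors_to_divide(V, vm, count=2):
    """Indices of the `count` neighbor vectors with the smallest angles to
    v_m; dividing these boxes is most likely to improve the span."""
    U = np.asarray(V, float)
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return [int(i) for i in np.argsort(-(U @ vm))[:count]]

# All neighbor centers lie on the +x side of the best box, so the
# uncovered direction points towards -x:
V = [[1.0, 0.3], [1.0, -0.3], [0.5, 1.0], [0.5, -1.0]]
vm = uncovered_direction(V)
```

If the final vm still makes an angle greater than 90° with every vi(t), the positive cone of Vt has a gap in that direction, mirroring the test described above.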
Dividing some of the boxes associated with N, and perhaps X*(t) itself, will eventually generate a positive spanning set for Tt about the best box. If all of the face neighbors in N are divided until they are the same size as or smaller than the best box, then all of the coordinate directions in Tt not positively spanned by Vt will be added to Vt, and it will be guaranteed to form a positive spanning set (see below). If some neighbors which are in N but are not face neighbors are also divided, the number of iterations needed to form a positive spanning set may decrease (though at the cost of additional boxes being divided at each iteration). In this implementation of the DIRECT-BP algorithm, when no positive spanning set for Tt can be formed from Vt, we select other neighboring boxes to divide by locating the vector vm which maximizes the minimum angle min_i cos⁻¹( v·vi(t) / (|v| |vi(t)|) ) over all vi(t) (Figure 21). In addition to the face neighbors of X*(t), the boxes Xi(t) corresponding to the vi(t) which form the smallest angles with vm are added to the set of boxes to be divided. These boxes make up a subset of N and potentially have a large portion of their space not spanned by Vt. The use of all of the face neighbors guarantees that a positive spanning set will eventually be formed, and experiments have suggested that dividing the other boxes which were added can help generate a positive spanning set in fewer iterations.

Limiting Λ

DIRECT-BP requires that some subset of the neighboring points define a positive spanning set, and that the ratio λmax(t) = ρmax(t)/Δ*t be bounded above as Δ*t → 0, where ρmax(t) = max_j ρj(t). If the neighboring boxes do form a positive spanning set but some λi(t) ≥ Λ, then the best box is not divided if it is cubic; instead, the neighbors needed for a positive spanning set whose ρi(t) ≥ ΛΔ*t are divided.
In order to eliminate the need to divide every neighboring box where ρi(t) ≥ ΛΔ*t, it is necessary to select the smallest set of vectors vi(t) that are longer than ΛΔ*t and that are needed to form a positive spanning set. In order to select a near-minimal set without adding excessive overhead, DIRECT-BP starts with all of the boxes where ρi(t) is less than a designated size: if the best box is larger than γ, DIRECT-BP starts with all of the boxes where ρi(t) < ΛΔ*t. If the initial set of vectors does not form a positive spanning set, then DIRECT-BP adds the box corresponding to the vector vi(t) closest to the direction vm to this set of boxes (Figure 21). It continues to add boxes in this way until a positive spanning set is formed or all of the neighbors have been added. If vectors were added to form the positive spanning set, then the best box is not divided at that iteration if it is a cube, and the neighboring boxes used to form the positive spanning set where ρi(t) is larger than ΛΔ*t are divided. If the best box is too small to be divided any further, the optimizer reduces the maximum allowable length of the vectors used to form a positive spanning set: it replaces Λ by λ and proceeds as in the previous paragraph. This parameter, λ, is used to force the optimizer to further refine the neighborhood of the best point at the end of the optimization. For this research the value of λ was set to 10 while Λ ranged from 15 to 35. The value of λ is linked to one possible stopping criterion for DIRECT-BP based on the longest vector needed to form a positive spanning set. The fourth stopping rule on page 73 stops the optimizer when the longest vector needed to form a positive spanning set is shorter than 10 times the minimum divisible box size. Without changing Λ to λ, the neighborhood would not be refined sufficiently for this stopping rule. The choice of Λ can greatly affect the performance of DIRECT-BP.
Choosing a value that is too small can increase the number of neighbors that are divided and degrade the efficiency of the search. Choosing a value that is too large can cause the optimizer to fail to search the surrounding boxes until after the best point has been refined to a very small box. This can lead to excessive sampling around a suboptimal box before its neighbors are examined, or DIRECT-BP could reach its limit on function evaluations or iterations before dividing any of the neighbors.

Local Search

Figures 18 and 19 demonstrate the difference between the DIRECT-BP search and the DIRECT search for the 10-dimensional case of the example problem given in equation 15. DIRECT progresses rapidly toward the optimum design in the first 40 iterations, but then the local portion of the optimization becomes stuck at x = 1. DIRECT-BP makes the same progress as DIRECT in the first 40 iterations but, for about the next 10 iterations, it stops dividing the best box while it examines the neighboring boxes for a better point. At this point DIRECT continues to divide the best box progressively smaller as it asymptotically approaches the corner of the best box. However, at about the 50th iteration DIRECT-BP locates a better point in one of the neighboring boxes. This creates the jump in the function values and position seen in Figures 18 and 19. A new box next to the previous best box contains the new best point, which allows the local search to step past the box boundary into the neighboring box and progress to the global optimum. The progress of the local portion of DIRECT-BP may be slower than that of DIRECT while it examines the surrounding boxes, but DIRECT-BP has the advantage of performing a more robust local search around X*(t). By forcing the neighbors of the best box to divide at the same rate as the best box, DIRECT-BP can move from box to box to follow trends in the objective function that may not have been apparent to DIRECT due to the placement of the analyzed points.
The local search does not stall at the edge of the box bordering x = 1. DIRECT-BP is able to step past this point and identify the actual optimum in the neighboring box.

Experimental Comparisons

Test problems

DIRECT-BP was compared with DIRECT on five test problems: the simple sinusoidal function, f1, and Griewank's function given earlier, and two engineering problems from Floudas et al. (1999). The n-dimensional Griewank function is defined again as

f2(x) = Σ_{i=1..n} x_i^2/d − Π_{i=1..n} cos(x_i/√i) + 1    (18)

where d is a constant selected to control the relative magnitude of the quadratic and the trigonometric portions. For the 10-dimensional problem, d = 1000, each element of xU was randomly set in the range [100, 400], and the elements of xL were set to give a span of 500 in each dimension. Each instance of xU and xL was run on both versions of DIRECT in order to compare their performance on the same problem. The one-dimensional Griewank function is depicted again in Figure 22. The Griewank function is a quadratic function with noise in the form of a cosine function. Without the noise, the function is convex with one optimum. With the noise, the function has thousands of local optima in the region where the derivative of the sum term has a magnitude less than 1. This function was used to test how well the optimizers could move through the noise to locate the actual optimum.

Figure 22. One-dimensional Griewank function with d = 100.

Two test problems from Floudas et al. (1999) were also used as problems more representative of engineering applications. The first problem is given in equations 19.

min 5.3578 x3^2 + 0.8357 x1 x5 + 37.2392 x1    (19)
s.t.
0.00002584 x3 x5 − 0.00006663 x2 x5 − 0.0000734 x1 x4 ≤ 1
0.000853007 x2 x5 + 0.00009395 x1 x4 − 0.00033085 x3 x5 ≤ 1
1330.3294 x2^-1 x5^-1 − 0.42 x1 x5^-1 − 0.30586 x2^-1 x3^2 x5^-1 ≤ 1
0.00024186 x2 x5 + 0.00010159 x1 x2 + 0.00007379 x3^2 ≤ 1
2275.1327 x3^-1 x5^-1 − 0.2668 x1 x5^-1 − 0.40584 x4 x5^-1 ≤ 1
0.00029955 x3 x5 + 0.00007992 x1 x3 + 0.00012157 x3 x4 ≤ 1

The bounds of the design space are given in equations 20.

78 ≤ x1 ≤ 102
33 ≤ x2 ≤ 45
27 ≤ x3 ≤ 45    (20)
27 ≤ x4 ≤ 45
27 ≤ x5 ≤ 45

The second engineering problem from Floudas et al. (1999) is the optimal design of a Continuous Stirred Tank Reactor (CSTR) used in chemical manufacturing. This problem has eight variables with four nonlinear inequality constraints and is given in equations 21.

min 0.4 x1^0.67 x7^-0.67 + 0.4 x2^0.67 x8^-0.67 + 10.0 − x1 − x2    (21)
s.t.
0.0588 x5 x7 + 0.1 x1 ≤ 1
0.0588 x6 x8 + 0.1 x1 + 0.1 x2 ≤ 1
4 x3 x5^-1 + 2 x3^-0.71 x5^-1 + 0.0588 x3^-1.3 x7 ≤ 1
4 x4 x6^-1 + 2 x4^-0.71 x6^-1 + 0.0588 x4^-1.3 x8 ≤ 1

The bounds of the design space are given in equation 22.

0.1 ≤ xi ≤ 10, i = 1, ..., 8    (22)

For both of these problems from Floudas et al. (1999), a penalty function was used to handle the constraints. A penalty multiplier of 100 was found to perform adequately.

Experimental methods

For this comparison, we assume that the local optimization is available for free, and each optimizer is considered to have located the global optimum if the best point it locates is within the basin containing the global optimum. Four different stopping conditions were used:
1. Stop when the diameter of the smallest box is < 0.0001.
2. Stop after a set number of iterations.
3. Stop after a set number of function evaluations.
4. (DIRECT-BP only) Stop when a balanced neighborhood is formed with λΔ*t < 0.001.

The Griewank function has a random component: the bounds of the design space. Therefore, it was run 30 times using the same sets of values of xU and xL for each stopping condition with each optimizer, and the results were averaged. The DIRECT-BP algorithm has one additional parameter not used in DIRECT, namely Λ. Λ designates how long a vector from c*(t) to the center of a neighboring box can be, as a multiple of Δ*t, before the neighbor must be divided and Δ*t is held constant.
This parameter can have a large effect on the performance of DIRECT-BP, so three values were used for each problem: 15, 25 and 35.

Test results

The first problem examined was the bound-constrained problem in Equation 15. Table 6 gives the results of the two optimizers on this problem. The original version of DIRECT was unable to locate the global optimum and instead stopped at the point x = 1.0 for all three stopping conditions, while DIRECT-BP was able to locate the global optimum at x = 1.1 in all but two cases. In the two instances where DIRECT-BP was unable to locate the global optimum, it still performed better than the original DIRECT. However, in these cases the value of Λ was too large for DIRECT-BP to successfully refine its search before the stopping rule was applied. This problem shows how exploring the neighborhood of the best point makes the search more robust and leads the optimizer to a good local minimum when the global search is insufficient.

Table 6. Comparison of DIRECT and DIRECT-BP on f1 of Eq. 15, n = 10.
Method            Λ    Stop Cond   Func Eval   Iterations   Opt Found
Original DIRECT   NA   1           2613        92           No
                  NA   2           15967       2000         No
                  NA   3           59981       18913        No
DIRECT-BP         15   1           9121        136          Yes
                  25   1           9019        133          Yes
                  35   1           28919       258          Yes
                  15   2           5525        100          Yes
                  25   2           5479        100          Yes
                  35   2           5799        100          No
                  15   3           8173        132          Yes
                  25   3           8711        132          Yes
                  35   3           8515        133          No
                  15   4           11789       148          Yes
                  25   4           12069       146          Yes
                  35   4           29777       264          Yes

Table 7 summarizes the performance of DIRECT and DIRECT-BP on the Griewank function. Both versions of DIRECT are able to get near the global optimum, but locating the actual global minimum is more difficult. For both optimizers, and for any value of Λ, the global optimum was located in nine out of the thirty optimizations. However, the best local optimum located in each optimization was different. The average value of the result returned by DIRECT was about 30% higher than that returned by DIRECT-BP.
This was primarily due to one run where DIRECT selected a point 27.5 units away from the global optimum with a function value of 1.17, while DIRECT-BP found a better point 12.1 units away from the global optimum with a function value of 0.147. The Griewank function has all of its local optima clustered in one area of the design space around the global optimum. In this case, DIRECT naturally examines many of the boxes surrounding the best box, so the modifications in DIRECT-BP do not have as large an effect. However, this case illustrates how the extra robustness of the search around the best box can make an impact on multimodal problems in general.

Table 7. Comparison of the average performance (30 runs) of DIRECT and DIRECT-BP on the ten-dimensional Griewank function.²
Method            Λ    Stop Cond   Func Eval   Iterations   Ave Func Val
Original DIRECT   NA   1           5678        69           .142
                  NA   2           5859        70           .143
                  NA   3           10211       104          .142
DIRECT-BP         15   1           8017        71           .106
                  25   1           7987        71           .106
                  35   1           7431        69           .107
                  15   2           7948        70           .107
                  25   2           7941        70           .107
                  35   2           7636        70           .107
                  15   3           10235       87           .105
                  25   3           10252       86           .106
                  35   3           10232       88           .105
                  15   4           9625        81           .105
                  25   4           9263        78           .106
                  35   4           8950        78           .105

The comparison on the first test problem from Floudas et al. (1999), Eq. 19, is given in Table 8. For this problem both optimizers located the global optimum of 10123 no matter which stopping condition was used. This is an example of a problem where DIRECT already searches the neighborhood of the best point without requiring any additional boxes to be divided. Here, DIRECT identifies the region containing the global optimum without any difficulty and divides the sequence of boxes containing the global optimum. The neighborhood of the best box is always balanced, and there is no need to divide any of the neighboring boxes during the optimization. In a problem such as this, where DIRECT is particularly efficient, the DIRECT-BP algorithm only increases the overhead of the optimization but not the number of analyses.
² For both optimizers and any value of Λ, the global optimum was located in nine out of thirty runs.

Table 8. Comparison of the performance of DIRECT and DIRECT-BP on the third test problem, Eq. 19.
Method            Λ    Stop Cond   Func Eval   Iterations   Func Val
Original DIRECT   NA   1           1033        47           10123
                  NA   2           2859        100          10123
                  NA   3           993         47           10123
DIRECT-BP         15   1           1033        47           10123
                  25   1           1033        47           10123
                  35   1           1033        47           10123
                  15   2           2859        100          10123
                  25   2           2859        100          10123
                  35   2           2859        100          10123
                  15   3           993         47           10123
                  25   3           993         47           10123
                  35   3           993         47           10123
                  15   4           1035        48           10123
                  25   4           1035        48           10123
                  35   4           1035        48           10123

The comparison of the results of the optimization of the CSTR, Eq. 21, is given in Table 9. The total number of function evaluations was limited to 80000 for all cases.

Table 9. Comparison of the performance of DIRECT and DIRECT-BP on the design of a CSTR, Eq. 21.
Method            Λ    Stop Cond   Func Eval   Iterations   Func Val
Original DIRECT   NA   1           47357       97           3.9580
                  NA   2           7087        100          3.9511
                  NA   3           5999        88           3.9516
DIRECT-BP         15   1           80000       653          3.9525
                  25   1           79999       705          3.9576
                  35   1           61745       274          3.9526
                  15   2           7263        100          3.9511
                  25   2           7147        100          3.9511
                  35   2           7139        100          3.9511
                  15   3           5985        87           3.9524
                  25   3           5999        88           3.9524
                  35   3           5999        88           3.9519
                  15   4           79999       705          3.9579
                  25   4           79999       705          3.9579
                  35   4           79997       731          3.9547

For this problem the biggest penalty for DIRECT-BP came when using the first or fourth stopping criterion. Here, DIRECT-BP was very slow at dividing the smallest boxes, and in four cases it reached the alternate limit of 80000 function evaluations before shrinking the best box sufficiently. The performance of DIRECT and DIRECT-BP under the other stopping criteria, however, showed that DIRECT-BP was as quick at getting close to the global optimum as DIRECT and did not divide many more boxes per iteration than DIRECT, as illustrated by the second stopping rule. Both versions of DIRECT came close to the global optimum at 3.9511 and were efficient at locating the global optimum for the second and third stopping rules.
These problems illustrate the improved robustness of the DIRECT-BP algorithm over the standard DIRECT algorithm. By examining the neighborhood of the best box at each iteration, the DIRECT-BP search can overcome one shortcoming of the DIRECT algorithm which can occasionally affect its performance on strongly multimodal problems. As the last three problems illustrate, this shortcoming of DIRECT does not affect the search on most problems; it is instead a robustness issue. The modification is intended to improve the local search in the case where the global search is unable to move it to a neighboring box where a better function value can be found. The DIRECT-BP algorithm is intended to increase the confidence that the result is at least locally optimal, while the global portion of the DIRECT search suggests that the optimum found is a relatively good local optimum within the design space.

CHAPTER 4
MULTIFIDELITY DIRECT

Based on the results from the optimization comparison, DIRECT was selected as a promising global optimization method for high-dimensional designs. In order to increase the applicability of DIRECT to problems requiring expensive analyses, a multifidelity version is desired. Although DIRECT is relatively efficient and robust, it still requires tens of thousands of evaluations to locate the global optimum of a nonconvex function with more than about 10 design variables. This would make it too expensive to perform an optimization with high-fidelity analyses using the standard form of DIRECT. For this reason we have begun to explore different ways to incorporate multifidelity data into the DIRECT optimizer.

Correction Methods

In order to incorporate multifidelity data in the DIRECT algorithm, a correction ratio is used to improve the accuracy of a low-fidelity analysis. The goal of the correction scheme is to improve the accuracy over the entire design space while putting the most emphasis on points near the optima found by DIRECT.
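The correction-ratio idea can be illustrated with a toy pair of models; both model functions below are invented for the example and merely stand in for an expensive and a cheap analysis:

```python
def f_high(x):
    """Stand-in for an expensive, accurate analysis (invented)."""
    return x**2 + 0.1*x

def f_low(x):
    """Stand-in for a cheap, biased analysis (invented)."""
    return 0.8*x**2

def corrected_low(x, x_center):
    """Scale the low-fidelity model by the correction ratio
    alpha = f_high / f_low evaluated at a reference point."""
    alpha = f_high(x_center) / f_low(x_center)
    return alpha * f_low(x)
```

By construction, the corrected model reproduces the high-fidelity value exactly at the reference point and inherits the low-fidelity trends elsewhere, which is the behavior the correction schemes below exploit.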
The DIRECT algorithm provides a ready-made method for selecting points at which to perform high-fidelity analyses without substantially adding to the complexity of the algorithm. DIRECT selects boxes for division that are either large, which puts their centers far away from other analyzed points, or that have low function values, which implies that they are near an optimum. This is also the desired distribution of high-fidelity points. For the multifidelity implementation of the DIRECT algorithm, whenever DIRECT selects a box for division, a high-fidelity analysis is performed at the center of the box before it is divided. The correction factor at this point is then used to calculate a correction for the low-fidelity analyses generated in that box in one of two ways.

Constant Correction Ratio

The first multifidelity version of DIRECT uses a constant correction factor (CCF) for each box. Figure 23 shows how this correction is applied. The correction factor is based on the center point of the last box that the low-fidelity point was in before the box was divided. The correction factor for the box to be divided is calculated immediately before the new points are analyzed, and the correction is applied before the box is split into separate boxes. This avoids the need for response surfaces or other surrogates. If one of the corrected boxes is divided at a later iteration, a new high-fidelity value is calculated at the center of that box to correct the center point and the new low-fidelity points that are generated.

[Figure 23 diagrams the procedure for a potentially optimal box: 1. use the previously known low-fidelity value at the center, calculating the high-fidelity value if unknown; 2. calculate the ratio of high- to low-fidelity values at the center; 3. select new points and calculate the low-fidelity value at these points; 4. apply the correction factor to the low-fidelity values; 5. divide the box based on the corrected values.]

Figure 23.
Constant correction scheme for multifidelity DIRECT

Equation 23 shows how the low-fidelity values are corrected in the CCF multifidelity DIRECT. α is the ratio of f_high,center to f_low,center, the high- and low-fidelity function values at the center of the box being divided, while f_low and f_corr are the low-fidelity and corrected function values at the new points that are generated in the box.

f_corr = α f_low    (23)

The CCF multifidelity DIRECT algorithm is outlined as follows. The components added for the multifidelity version are in boldface.

I. Normalize the search space to the unit hypercube. Evaluate f(c1) and set B1 = {X1(1)}.
II. For t = 1, 2, ...:
   A. Identify the set St ⊆ Bt of potentially optimal boxes.
   B. For each box Xj(t) ∈ St where diam(Xj(t)) > γ:
      1. Perform a high-fidelity analysis at the center point of the box and calculate f_high.
      2. Calculate the ratio α = f_high/f_low at the center.
      3. Identify I, the dimensions of Xj(t) with the longest side length. Let δ equal one-third of this length.
      4. Sample the function at the points cj ± δei for all i ∈ I, where cj is the center of box Xj(t) and ei is the ith unit vector.
      5. Apply the correction ratio to the low-fidelity values within the box: f_corr = α f_low.
      6. While I is not empty:
         a. Divide Xj(t) into thirds along the dimension i ∈ I with the lowest value of f(cj ± δei) to create two new boxes and reduce the size of box Xj(t).
         b. Remove i from I and add the new boxes to Bt.
   C. Terminate if an appropriate stopping rule is satisfied.

Linear Correction Response Surface

The second correction method that we have studied uses a linear response surface (LRS) to predict the correction factor for the low-fidelity points. A linear fit to the correction factor at two points per dimension is used to predict the slope of the correction factor in part of the design space, as described next.
For each potentially optimal box, a high-fidelity analysis is performed at the center, the correction factor at that point is calculated, and this correction factor is then compared with the correction factor originally used at that point. If the correction factor was accurate enough, then the slopes used in that box are not updated. For this work the allowable error in the correction factor was set to 20%. The correction factor at that point is combined with the slopes of the correction-factor response surface that had been used to calculate the original correction factor at that point, and the combination is used to calculate the correction factor for the low-fidelity points generated in that box. If the correction factor was not accurate enough, then a high- and a low-fidelity analysis are performed at each new point that is generated to divide the potentially optimal box. The corrected value of each of the new points is then set to the high-fidelity value. The correction factors at these points generate new slopes to be used in the boxes that result from dividing the potentially optimal box. For any dimensions that were not divided at this time, the slope of the response surface used to calculate the corrected value at the center of the potentially optimal box is copied over to the new response surface. This correction method refines the correction factor used within each box by incorporating a first-order approximation to the correction factor. By checking the accuracy of the correction when each box is divided, the algorithm can identify where the correction response surface is still accurate and where it is in need of refinement. As each correction response surface is generated, it is numbered and its slopes are saved. For each box in the design space, a variable indicates which response surface should be used to calculate the correction factor for the points generated when that box is divided.
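A minimal sketch of this linear correction, with illustrative names (cf_center for the correction factor at the box center, slopes for the stored slope coefficients of the response surface being used):

```python
import numpy as np

def lrs_corrected(f_low_x, x, center, cf_center, slopes):
    """Sketch of the linear-response-surface correction described above:
    the predicted correction factor at x is the factor at the box center
    plus a linear term in each coordinate, and the low-fidelity value is
    scaled by that predicted factor."""
    x = np.asarray(x, dtype=float)
    center = np.asarray(center, dtype=float)
    # CF(x) = CF(center) + sum_i slope_i * (x_i - center_i)
    cf = cf_center + np.dot(slopes, x - center)
    return cf * f_low_x
```

At the box center the linear term vanishes and the correction reduces to the constant-ratio scheme; away from the center the stored slopes adjust the factor in each coordinate direction.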
This version of the multifidelity DIRECT took much longer to converge than the original form of DIRECT. The convergence criterion that was used is based on the size of the smallest box in the design space, just as in the original version of DIRECT. For this version, the size of the smallest box shrank rapidly at the start of the optimization but then took a large number of iterations to get any smaller. DIRECT considers the points on the lower, right side of the convex hull of points in Figure 9 to be potentially optimal. Points on the left side of the convex hull are not potentially optimal because one can find a competing point that is both lower and in a larger box. For the LRS version of DIRECT, after a few iterations the smallest box generally had a corrected function value that was slightly higher than the corrected function value of a slightly larger box. This continued for a large number of iterations before a box of the smallest size would get divided. The difference between the function values of the smallest box and the smallest box selected for division was generally very small, comparable to the size of the error in the corrected value, but it resulted in an order-of-magnitude increase in the number of function evaluations used by DIRECT. For this reason, points on the lower, left side of the convex hull were also included, provided their value was less than 0.1% higher than that of the next largest box on the convex hull. This was an arbitrary value based on limited experimentation, but it worked well for these test problems. Equation 24 shows how the low-fidelity values are corrected in the LRS multifidelity DIRECT for a box using the jth response surface. f_high,center and f_low,center are the high- and low-fidelity function values at the center of the box being divided, while f_low and f_corr are the low-fidelity and corrected function values at the new points that are generated.
RSj(i) is the coefficient of the slope in the ei direction of response surface j, and (xi − xi,center) is the distance of the new point from the center of the box being divided.

f_corr = f_low ( f_high,center / f_low,center + Σ_i RSj(i)(xi − xi,center) )    (24)

The LRS version of the multifidelity DIRECT algorithm is described as follows. The components added for this version are in boldface.

I. Normalize the search space to the unit hypercube.
   A. Divide Box 1:
      1. Perform high- and low-fidelity analyses at the points c1 and c1 ± δei for all i ∈ I, where c1 is the center of the design space and ei is the ith unit vector.
      2. Set f_corr = f_high for each point analyzed.
      3. Create the correction response surface (RS) from the initial points.
      4. Store the coefficients in the RS array and set 'RS to use' to 1 for these boxes.
      5. While I is not empty:
         a. Divide X1(1) into thirds along the dimension i ∈ I with the lowest value of f_corr(c1 ± δei) to create two new boxes and reduce the size of box X1(1).
         b. Remove i from I and add the new boxes to B1.
II. For t = 1, 2, ...:
   A. Identify the set St ⊆ Bt of potentially optimal boxes.
   B. For each box Xj(t) ∈ St where diam(Xj(t)) > γ:
      1. If there is no high-fidelity analysis at the center:
         a. Perform a high-fidelity analysis at the center.
         b. Compare the actual correction factor (CF) to the CF used previously in that box.
         c. If the correction is inaccurate:
            i. Identify I, the dimensions of Xj(t) with the longest side length. Let δ equal one-third of this length.
            ii. Perform high- and low-fidelity analyses at the points cj ± δei for all i ∈ I, where cj is the center of the potentially optimal box and ei is the ith unit vector.
            iii. Set f_corr = f_high for each point analyzed.
            iv. Create a correction RS from the new points, using terms from the old RS for undivided dimensions.
            v. Store the RS in the next row of the RS array.
            vi. Update 'RS to use' in the new boxes and the divided box to indicate the new RS.
         d. If the correction is accurate:
            i. Get 'RS to use' from the potentially optimal box.
            ii. Get the RS from the RS array.
            iii. Identify I, the dimensions of Xj(t) with the longest side length.
Let δ equal one-third of this length.
            iv. Perform a low-fidelity analysis at the points cj ± δei for all i ∈ I, where cj is the center of the box and ei is the ith unit vector.
            v. Calculate the correction factor (CF) for the center of the box.
            vi. Apply the RS and the CF at the center of the box to the low-fidelity values.
      2. If a previous high-fidelity analysis exists at the center:
         a. Identify I, the dimensions of Xj(t) with the longest side length. Let δ equal one-third of this length.
         b. Perform a low-fidelity analysis at the points cj ± δei for all i ∈ I, where cj is the center of the box and ei is the ith unit vector.
         c. Get 'RS to use' from the potentially optimal box and the appropriate RS from the RS array.
         d. Calculate the CF for the center of the box.
         e. Use the RS and the CF at the center of the box to calculate the corrected values for the low-fidelity points.
         f. While I is not empty:
            a. Divide Xj(t) into thirds along the dimension i ∈ I with the lowest value of f_corr(cj ± δei) to create two new boxes and reduce the size of box Xj(t).
            b. Remove i from I and add the new boxes to Bt.
   C. Terminate if an appropriate stopping rule is satisfied.

Test Problems

The first test problem is a 6th-order polynomial function confined to the box [−2, 2]^n. The high-fidelity function is defined as:

y = Σ_{i=1..n} [ 2(xi + ei)^2 − 1.5(xi + ei)^4 + 0.2(xi + ei)^6 ]    (25)

The one-dimensional graph of this function is given in Figure 24.

Figure 24. High-fidelity function in one dimension
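The sixth-order high-fidelity function of Eq. 25 (as read here: a sum over dimensions of 2(xi + ei)^2 − 1.5(xi + ei)^4 + 0.2(xi + ei)^6) can be coded directly; the offset vector e is taken as zero by default purely for illustration:

```python
import numpy as np

def high_fidelity(x, e=None):
    """Sum over dimensions of a sixth-order polynomial in (x_i + e_i),
    following the reading of Eq. 25 given above.  The offsets e_i shift
    the function in each dimension and default to zero here."""
    x = np.asarray(x, dtype=float)
    e = np.zeros_like(x) if e is None else np.asarray(e, dtype=float)
    z = x + e
    return float(np.sum(2.0*z**2 - 1.5*z**4 + 0.2*z**6))
```

With e = 0 the function is zero at the origin and, in one dimension, reaches about −3.2 at the bounds x = ±2, giving the multiple local optima visible in Figure 24.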