## Citation
- Permanent Link: https://ufdc.ufl.edu/UFE0012932/00001
## Material Information
- Title: Applications of Parallel Global Optimization to Mechanics Problems
- Creator: SCHUTTE, JACO FRANCOIS (*Author, Primary*)
- Copyright Date: 2008
## Subjects
- Subjects / Keywords: Algorithms; Approximation; Cost functions; Design analysis; Design optimization; Local minimum; Mathematical independent variables; Mathematical variables; Perceptron convergence procedure; Structural deflection (jstor)
## Record Information
- Source Institution: University of Florida
- Holding Location: University of Florida
- Rights Management: Copyright Jaco Francois Schutte. Permission granted to University of Florida to digitize and display this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
- Embargo Date: 4/17/2006
- Resource Identifier: 77078670 (OCLC)
## Full Text

APPLICATIONS OF PARALLEL GLOBAL OPTIMIZATION TO MECHANICS PROBLEMS

By

JACO FRANCOIS SCHUTTE

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2005

Copyright 2005 by JACO FRANCOIS SCHUTTE

This work is dedicated to my parents and my wife Lisa.

ACKNOWLEDGMENTS

First and foremost, I would like to thank Dr. Raphael T. Haftka, chairman of my advisory committee, for the opportunity he provided me to complete my doctoral studies under his exceptional guidance. Without his unending patience, constant encouragement, guidance and expertise, this work would not have been possible. Dr. Haftka's mentoring has made a lasting impression on both my academic and personal life.

I would also like to thank the members of my advisory committee, Dr. Benjamin Fregly, Dr. Alan D. George, Dr. Panos M. Pardalos, and Dr. Nam Ho Kim. I am grateful for their willingness to serve on my committee, for the help they provided, for their involvement with my oral examination, and for reviewing this dissertation. Special thanks go to Dr. Benjamin Fregly, who provided a major part of the financial support for my studies. Special thanks also go to Dr. Alan George, whose parallel processing graduate course provided much of the inspiration for the research presented in this manuscript, and who reviewed some of my publications. Thanks go also to Dr. Nielen Stander, who provided me with the wonderful opportunity to do an internship at the Livermore Software Technology Corporation.

My colleagues in the Structural and Multidisciplinary Optimization Research Group at the University of Florida also deserve many thanks for their support and the many fruitful discussions. Special thanks go to Tushar Goel, Erdem Acar, and also Dr. Satchi Venkataraman and his wife Beth, who took me in on my arrival in the USA and provided me a foothold, for which I will forever be grateful.
The financial support provided by AFOSR grant F49620-09-1-0070 to R.T.H. and the NASA Cooperative Agreement NCC3-994, the "Institute for Future Space Transport" University Research, Engineering and Technology Institute, is gratefully acknowledged.

I would also like to express my deepest appreciation to my parents. Their limitless love, support and understanding are the mainstay of my achievements in life. Lastly, I would like to thank my wife, Lisa. Without her love, patience and sacrifice I would never have been able to finish this dissertation.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
   Statement of Problem
   Purpose of Research
   Significance of Research
      Parallelism by Exploitation of Optimization Algorithm Structure
      Parallelism through Multiple Independent Concurrent Optimizations
      Parallelism through Concurrent Optimization of Decomposed Problems
   Roadmap

2 BACKGROUND
   Population-based Global Optimization
   Parallel Processing in Optimization
   Decomposition in Large Scale Optimization
   Literature Review: Problem Decomposition Strategies
      Collaborative Optimization (CO)
      Concurrent SubSpace Optimization (CSSO)
      Analytical Target Cascading (ATC)
      Quasiseparable Decomposition and Optimization

3 GLOBAL OPTIMIZATION THROUGH THE PARTICLE SWARM ALGORITHM
   Overview
   Introduction
   Theory
      Particle Swarm Algorithm
      Analysis of Scale Sensitivity
   Methodology
      Optimization Algorithms
      Analytical Test Problems
      Biomechanical Test Problem
   Results
   Discussion
   Conclusions

4 PARALLELISM BY EXPLOITING POPULATION-BASED ALGORITHM STRUCTURES
   Overview
   Introduction
   Serial Particle Swarm Algorithm
   Parallel Particle Swarm Algorithm
      Concurrent Operation and Scalability
      Asynchronous vs. Synchronous Implementation
      Coherence
      Network Communication
      Synchronization and Implementation
   Sample Optimization Problems
      Analytical Test Problems
      Biomechanical System Identification Problems
   Speedup and Parallel Efficiency
   Numerical Results
   Discussion
   Conclusions

5 IMPROVED GLOBAL CONVERGENCE USING MULTIPLE INDEPENDENT OPTIMIZATIONS
   Overview
   Introduction
   Methodology
      Analytical Test Set
      Multiple-run Methodology
      Exploratory Run and Budgeting Scheme
      Bayesian Convergence Probability Estimation
   Numerical Results
      Multi-run Approach for Predetermined Number of Optimizations
      Multi-run Efficiency
      Bayesian Convergence Probability Estimation
      Monte Carlo Convergence Probability Estimation
   Conclusions

6 PARALLELISM BY DECOMPOSITION METHODOLOGIES
   Overview
   Introduction
   Quasiseparable Decomposition Theory
   Stepped Hollow Cantilever Beam Example
      Stepped Hollow Beam Optimization
      Quasiseparable Optimization Approach
   Results
      All-at-once Approach
      Hybrid All-at-once Approach
      Quasiseparable Approach
      Approximation of Constraint Margins
   Discussion
   Conclusions

7 CONCLUSIONS
   Parallelism by Exploitation of Optimization Algorithm Structure
   Parallelism through Multiple Independent Optimizations
   Parallelism through Concurrent Optimization of Decomposed Problems
   Future Directions
   Summary

APPENDIX

A ANALYTICAL TEST PROBLEM SET
   Griewank
   Hartman 6
   Shekel 10

B MONTE CARLO VERIFICATION OF GLOBAL CONVERGENCE PROBABILITY

LIST OF REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

1 Standard PSO algorithm parameters used in the study
2 Fraction of successful optimizer runs for the analytical test problems
3 Final cost function values and associated marker distance and joint parameter root-mean-square (RMS) errors after 10,000 function evaluations performed by multiple unscaled and scaled PSO, GA, SQP, and BFGS runs
4 Parallel PSO results for the biomechanical system identification problem using synthetic marker trajectories without and with numerical noise
5 Parallel PSO results for the biomechanical system identification problem using synthetic marker trajectories without and with numerical noise
6 Particle swarm algorithm parameters
7 Problem convergence tolerances
8 Theoretical convergence probability results for Hartman problem
9 Minimum, maximum and median fitness evaluations when applying ratio of change stopping criteria on pool of 1,000 optimizations for Griewank, Hartman and Shekel problems
10 Beam material properties and end load configuration
11 Stepped hollow beam global optimum
12 All-at-once approach median solution
13 Hybrid, all-at-once median solution
14 Quasiseparable optimization result
15 Surrogate lower level approximation optimization results
16 Hartman problem constants
17 Shekel problem constants

LIST OF FIGURES

1 Collaborative optimization flow diagram
2 Collaborative optimization subspace constraint satisfaction procedure (taken from [6])
3 Concurrent subspace optimization methodology flow diagram
4 Example hierarchical problem structure
5 Sub-problem information flow
6 Joint locations and orientations in the parametric ankle kinematic model
7 Comparison of convergence history results for the analytical test problems
8 Final cost function values for ten unscaled (dark bars) and scaled (gray bars) parallel PSO, GA, SQP, and BFGS runs for the biomechanical test problem
9 Convergence history for unscaled (dark lines) and scaled (gray lines) parallel PSO, GA, SQP, and BFGS runs for the biomechanical test problem
10 Sensitivity of gradient calculations to selected finite difference step size for one design variable
11 Serial implementation of the PSO algorithm
12 Parallel implementation of the PSO algorithm
13 Surface plots of the (a) Griewank and (b) Corana analytical test problems showing the presence of multiple local minima
14 Average fitness convergence histories for the (a) Griewank and (b) Corana analytical test problems for swarm sizes of 16, 32, 64, and 128 particles and 10,000 swarm iterations
15 Fitness convergence and parameter error plots for the biomechanical system identification problem using synthetic data with noise
16 (a) Speedup and (b) parallel efficiency for the analytical and biomechanical optimization problems
17 Multiple local minima for Griewank analytical problem surface plot in two dimensions
18 Cumulative convergence probability Pc as a function of the number of optimization runs with assumed equal Ps values
19 Fitness history and convergence probability Pc plots for Griewank, Hartman and Shekel problems
20 Typical Shekel fitness history plots of 20 optimizations (sampled out of 1,000)
21 Shekel convergence probability for an individual optimization as a function of fitness evaluations and population size
22 Theoretical cumulative convergence probability Pc as a function of the number of optimization runs with constant Ps for the Hartman problem
23 Theoretical convergence probability Pc with sets of multiple runs for the Griewank problem
24 Theoretical convergence probability Pc using information from exploratory optimizations which are stopped using a rate of change stopping condition for the Griewank, Hartman and Shekel problems
25 Bayesian Pc estimation compared to extrapolated and randomly sampled optimizations out of a pool of 1,000 runs for the Griewank problem
26 Bayesian Pc estimation compared to extrapolated and randomly sampled optimizations out of a pool of 1,000 runs for the Hartman problem
27 Bayesian Pc estimation compared to extrapolated and randomly sampled optimizations out of a pool of 1,000 runs for the Shekel problem
28 Stepped hollow cantilever beam
29 Dimensional parameters of each cross section
30 Projected displacement in direction 0
31 Tip deflection contour plot as a function of beam section 5 height h and width w, with yield stress and aspect ratio constraints indicated by dashed and dash-dotted lines respectively
32 Quasiseparable optimization flow chart
33 Results for 1,000 all-at-once optimizations
34 Hybrid PSO-fmincon strategy for 100 optimizations
35 Repeated optimizations of section 1 subproblem using fmincon function
36 Summed budget value and constraint margins for individual sections
37 Global and local optimum in section 1 sub-optimization. Scale is 0.1:1
38 Decomposed cross section solution. Scale is 0.1:1
39 Target tip deflection value histories as a function of upper-level fitness evaluations
40 Constraint margin value histories as a function of upper-level function evaluations
41 Predicted and Monte Carlo sampled convergence probability Pc for 5 independent optimization runs for the Griewank problem
42 Predicted and Monte Carlo sampled convergence probability Pc for 12 independent optimization runs for the Griewank problem
43 Monte Carlo sampled convergence probability Pc with sets of multiple runs for the Griewank problem
44 Monte Carlo sampled convergence probability Pc using information from exploratory optimizations stopped using a rate of change stopping condition for the Griewank, Hartman and Shekel problems
45 Bayesian Pc comparison for Griewank, Hartman and Shekel problems

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

APPLICATIONS OF PARALLEL GLOBAL OPTIMIZATION TO MECHANICS PROBLEMS

By Jaco Francois Schutte

December 2005

Chair: Raphael T. Haftka
Cochair: Benjamin J. Fregly
Major Department: Mechanical and Aerospace Engineering

Global optimization of complex engineering problems, with a high number of variables and local minima, requires sophisticated algorithms with global search capabilities and high computational efficiency. With the growing availability of parallel processing, it makes sense to address these requirements by increasing the parallelism in optimization strategies.
This study proposes three methods of concurrent processing. The first method entails exploiting the structure of population-based global algorithms such as the stochastic Particle Swarm Optimization (PSO) algorithm and the Genetic Algorithm (GA). As a demonstration of how such an algorithm may be adapted for concurrent processing, we modify and apply the PSO to several mechanical optimization problems on a parallel processing machine. Desirable PSO algorithm features such as insensitivity to design variable scaling and modest sensitivity to algorithm parameters are demonstrated.

A second approach to parallelism and improved algorithm efficiency is the use of multiple optimizations. With this method a budget of fitness evaluations is distributed among several independent sub-optimizations in place of a single extended optimization. Under certain conditions this strategy obtains a higher combined probability of converging to the global optimum than a single optimization which utilizes the full budget of fitness evaluations.

The third and final method of parallelism addressed in this study is the use of quasiseparable decomposition, which is applied to decompose loosely coupled problems. This yields several sub-problems of lesser dimensionality which may be concurrently optimized with reduced effort.

CHAPTER 1
INTRODUCTION

Statement of Problem

Modern large-scale problems often require high-fidelity analyses for every fitness evaluation. In addition, these optimization problems are of a global nature in the sense that many local minima exist. These two factors combine to form exceptionally demanding optimization problems which require many hours of computation on high-end single-processor computers. In order to solve such challenging problems efficiently, parallelism may be employed for improved optimizer throughput on computational clusters or multi-core processors.
Purpose of Research

The research presented in this manuscript investigates methods of implementing parallelism in global optimization. These methods are (i) parallel processing through the optimization algorithm, (ii) multiple independent concurrent optimizations, and (iii) parallel processing by decomposition. Related methods in the literature are reported and compared to the approaches formulated in this study.

Significance of Research

Parallel processing is a rapidly growing resource in the engineering community. Large "processor farms" or Beowulf clusters are becoming increasingly common at research and commercial engineering facilities. In addition, processor manufacturers are encountering physical limitations such as heat dissipation and, because of the upper limit on signal speeds, constraints on processor dimensions at current clock frequencies. These factors place an upper limit on attainable clock frequencies and have forced manufacturers to look at other alternatives for improving processing capability. Both Intel and AMD are currently developing methods of putting multiple processors on a single die and will be releasing multiprocessor cores in the consumer market in the near future. This multi-core technology will enable even users of desktop computers to utilize concurrent processing and will make it an increasingly cheap commodity.

The engineering community is facing more complex and computationally demanding problems as the fidelity of simulation software improves every day. New methods that can take advantage of the increasing availability of parallel processing will give the engineer powerful tools to solve previously intractable problems. In this manuscript the specific problem of the optimization of large-scale global engineering problems is addressed by utilizing three different avenues of parallelism.
Any one of these methods, and even combinations of them, may use concurrent processing to advantage.

Parallelism by Exploitation of Optimization Algorithm Structure

Population-based global optimizers such as the Particle Swarm Optimizer (PSO) or Genetic Algorithms (GAs) coordinate their search effort in the design space by evaluating a population of individuals in an iterative fashion. These iterations take the form of discrete time steps for the PSO and generations in the case of the GA. Both the PSO and GA algorithm structures allow the fitness calculations of individuals in the population to be performed independently and concurrently. This opens up the possibility of assigning a computational node or processor in a networked group of machines to each individual in the population, and calculating the fitness of each individual concurrently for every iteration of the optimization algorithm.

Parallelism through Multiple Independent Concurrent Optimizations

A single optimization of a large-scale problem has a significant probability of becoming entrapped in a local minimum. This risk is alleviated by utilizing population-based algorithms such as the PSO and GAs, which have the means of escaping from such local minima if enough iterations are allowed. Alternatively, a larger population may be used, allowing for higher sampling densities of the design space, which also reduces the risk of entrapment. Both these options require significant additional computational effort, with no guarantee of improvement in global convergence probability. A more effective strategy can be followed while utilizing the same amount of resources: by running several independent but limited optimizations, it will be shown that in most cases the combined probability of finding the global optimum is greatly improved.
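The multiple-run argument can be made concrete. If a single limited run converges to the global optimum with probability p_s, then n independent runs all fail only with probability (1 - p_s)^n, so the combined convergence probability is P_c = 1 - (1 - p_s)^n. A minimal sketch of this relation follows; the numeric p_s values are purely illustrative assumptions, not results from this study:

```python
def combined_convergence_probability(p_single: float, n_runs: int) -> float:
    """Probability that at least one of n independent optimization runs,
    each converging to the global optimum with probability p_single,
    succeeds: P_c = 1 - (1 - p_single)**n_runs."""
    return 1.0 - (1.0 - p_single) ** n_runs

# Illustrative (assumed) numbers: one long run with p_s = 0.5 versus five
# shorter runs, each with p_s = 0.3, under the same total evaluation budget.
single_long = combined_convergence_probability(0.5, 1)
five_short = combined_convergence_probability(0.3, 5)
print(f"single long run: {single_long:.3f}, five short runs: {five_short:.3f}")
```

Under these assumed per-run probabilities the five short runs give a combined probability above 0.8, illustrating why splitting a fixed budget can pay off even though each individual run is weaker.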
The limited optimization runs are rendered independent by applying a population-based optimizer with different sets of initial population distributions in the design space.

Parallelism through Concurrent Optimization of Decomposed Problems

Some classes of large-scale problems may be subdivided into several more tractable sub-problems by applying decomposition strategies. This process of decomposition generally involves identifying groups of variables and constraints with minimal influence on one another. The choice of which decomposition strategy to apply depends largely on the original problem structure and the interaction among variables. The objective is to find an efficient decomposition strategy that separates such large-scale global optimization problems into smaller sub-problems without introducing spurious local minima, and to apply an efficient optimizer to solve the resulting sub-problems.

Roadmap

A background on the global optimization of large-scale optimization problems, appropriate optimization algorithms and techniques, and parallelism is presented in Chapter 2. Chapter 3 presents an evaluation of a global, stochastic, population-based algorithm, the Particle Swarm Optimizer, on several analytical and biomechanical system identification problems. In Chapter 4 the parallelization of this population-based algorithm is demonstrated and applied. Chapter 5 details the use of multiple independent concurrent optimizations for significant improvements in combined convergence probability. Chapter 6 shows how complex structural problems with a large number of variables may be decomposed into multiple independent sub-problems which can be optimized concurrently using a two-level optimization scheme. In Chapter 7 conclusions are drawn and avenues for future research are proposed.

CHAPTER 2
BACKGROUND

Population-based Global Optimization

Global optimization often requires specialized, robust approaches.
These include stochastic and/or population-based optimizers such as GAs and the PSO. The focus of this research is on exploring avenues of parallelism in population-based optimization algorithms. We demonstrate these methods using the Particle Swarm Optimizer, a stochastic search algorithm suited to continuous problems. Other merits of the PSO include low sensitivity to algorithm parameters and insensitivity to the scaling of design variables; these qualities will be investigated in Chapter 3. The algorithm does not require gradients, which is an important consideration when solving problems of high dimensionality, often the case in large-scale optimization. The PSO has performance comparable to GAs, which are also candidates for any of the methods of parallelism proposed in this manuscript, and may be more suitable for discrete or mixed-variable problems. In the research presented in this manuscript the PSO is applied to a biomechanical problem with a large number of continuous variables. This problem has several local minima and, when attempts were made to solve it with gradient-based optimizers, demonstrated high sensitivity to the scaling of design variables. This made it an ideal candidate for demonstrating the desirable qualities of the algorithm. Other application problems include structural sizing problems and composite laminate angle optimization.

Parallel Processing in Optimization

There are five approaches which may be utilized to decompose a single computational task into smaller problems which may then be solved concurrently: geometric decomposition, iterative decomposition, recursive decomposition, speculative decomposition, and functional decomposition, or a combination of these [1,2]. Among these, functional decomposition is most commonly applied and will also be the method of implementation presented in Chapter 4. The steps followed in parallelizing a sequential program, from [1], are:

1. Decompose the sequential program or data into smaller tasks.
2. Assign the tasks to processes. A process is an abstract entity that performs tasks.
3. Orchestrate the necessary data access, communication, and synchronization.
4. Map or assign the processes to computational nodes.

The functional decomposition method is based on the premise that applications such as an optimization algorithm may be broken into many distinct phases, each of which interacts with some or all of the others. These phases can be implemented as co-routines, each of which executes for as long as it is able, then invokes another and remains suspended until it is again needed. Functional decomposition is the simplest route to parallelization if it can be implemented by turning a program's high-level description into a set of cooperating processes [2]. When using this method, balancing the throughput of the different computational stages is highly problematic when there are dependencies between stages, for example, when data requires sequential processing by several stages. This limits the parallelism that may be achieved using functional decomposition; any further parallelism must be achieved through geometric, iterative, or speculative decomposition within a functional unit [2]. When a task is decomposed into concurrent processes, some additional communication among these routines is required for coordination and the interchange of data. Among the methods of communication for parallel programming, the Parallel Virtual Machine (PVM) and the Message-Passing Interface (MPI) are the most widely used. For the research undertaken in Chapter 4, a portable implementation of the MPI library [3,4] containing a set of parallel communication functions [5] is used.
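As a minimal illustration of the resulting master-worker pattern for a population-based optimizer, the sketch below farms independent fitness evaluations out to a pool of worker processes, using Python's multiprocessing module as a stand-in for PVM/MPI; the fitness function is a hypothetical placeholder for an expensive simulation.

```python
from multiprocessing import Pool

def fitness(x):
    # Placeholder for an expensive simulation: the sphere function,
    # whose global minimum is 0 at the origin.
    return sum(xi * xi for xi in x)

def evaluate_population(population, workers=4):
    # One fitness evaluation per individual, distributed over worker
    # processes -- the one-node-per-individual scheme described earlier.
    with Pool(processes=workers) as pool:
        return pool.map(fitness, population)

if __name__ == "__main__":
    swarm = [[1.0, 2.0], [0.0, 0.0], [-3.0, 4.0]]
    print(evaluate_population(swarm))  # [5.0, 0.0, 25.0]
```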
Decomposition in Large Scale Optimization

The optimization community has developed several formalized decomposition methods, such as Collaborative Optimization (CO) [6], Concurrent SubSpace Optimization (CSSO) [7], and quasiseparable decomposition, to deal with the challenges presented by large-scale engineering problems. The problems addressed using these schemes include multidisciplinary optimization in aerospace design and large-scale structural and biomechanical problems. Decomposition methodologies in large-scale optimization are currently intensely studied because increasingly advanced, higher-fidelity simulation methods result in large-scale problems becoming intractable. Problem decomposition allows for:

1. Simplified decomposed subsystems. In most cases the decomposed sub-problems are of reduced dimensionality, and therefore less demanding on optimization algorithms. An example of this is the number of gradient calculations required per optimization iteration, which, in the case of gradient-based algorithms, scales directly with problem dimensionality.

2. A broader work front to be attacked simultaneously, which results in a problem being solved in less time if the processing resources are available. Usually computational throughput is limited in a sequential fashion, i.e., by the FLOPS limit of a single computer. However, if multiple processing units are available this limit can be circumvented by using an array of networked computers, for example, a Beowulf cluster.

Several such decomposition strategies have been proposed (see the next section for a short review), all differing in the manner in which they address some or all of the following:

1. Decomposition boundaries, which may be disciplinary boundaries or component interfaces in a large structure.
2. Constraint handling.
3. Coordination among decomposed sub-problems.
Literature Review: Problem Decomposition Strategies

Here follows a summary of methodologies used for the decomposition and optimization of large-scale global problems. This review forms the background for the study proposed in Chapter 6 of this manuscript.

Collaborative Optimization (CO)

Overview. The Collaborative Optimization (CO) strategy was first introduced by Kroo et al. [8]. Shortly after its introduction this bi-level optimization scheme was extended by Tappeta and Renaud [9,10] to three distinct formulations to address multi-objective optimization of large-scale systems. The CO paradigm is based on the concept that, in a MDO problem, the interaction among the several disciplinary experts optimizing a design is minimal for local changes within each discipline. This allows a large-scale system to be decomposed into sub-systems along domain-specific boundaries. These subsystems are optimized through local design variables specific to each subsystem, subject to the domain-specific constraints. The objective of each subsystem optimization is to maintain agreement on the interdisciplinary design variables. A system-level optimizer enforces this interdisciplinary compatibility while minimizing the overall objective function. This is achieved by combining the system-level fitness with the cumulative sum of all discrepancies between interdisciplinary design variables. This strategy is well suited for parallel computation because of the minimal interaction between the different design disciplines, which results in reduced communication overhead during the course of the optimization.

Methodology. This decomposition strategy is described with the flow diagram in Figure 1. As mentioned previously, the CO strategy is a bi-level method, in which the system-level optimization sets and adjusts the interdisciplinary design variables during the optimization.
The subspace optimizers attempt both to satisfy local constraints by adjusting local parameters and to meet the interdisciplinary design variable targets set by the system-level optimizer. Departures from the target interdisciplinary design parameters, which may occur because of insufficient local degrees of freedom, are allowed but are to be minimized. The system-level optimizer attempts to adjust the interdisciplinary parameters such that the objective function is minimized while the agreement between subsystems is maximized. This process of adjusting the system-level target design, with the subsystems attempting to match it whilst satisfying local constraints, is repeated until convergence. The procedure is illustrated graphically in Figure 2 (taken from Kroo and Manning [6]). The system-level optimizer sets a design target P, and each subspace optimization attempts to satisfy local constraints while matching the target design P as closely as possible by moving in directions 1 and 2. During the next system-level optimization cycle the target design P is moved in direction 3 in order to maximize the agreement between the target design and the subspace designs that satisfy local constraints.

Figure 1 Collaborative optimization flow diagram

Figure 2 Collaborative optimization subspace constraint satisfaction procedure (taken from [6])

Refinements on method. Several enhancements to this method have been proposed, among which are the integration of this architecture into a decision-based design framework as proposed by Hazelrigg [11,12], the use of response surfaces [13] to model disciplinary analyses, and genetic algorithms with scheduling for increased efficiency [14].
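The system-level objective described above, overall fitness plus the cumulative interdisciplinary discrepancy, can be sketched as follows. This is only an illustrative penalty form (CO is often posed with the discrepancy terms as equality constraints instead), and the fitness function and values are hypothetical:

```python
def discrepancy(target, achieved):
    # Sum of squared differences between the system-level target values
    # and what a subspace optimization was actually able to achieve.
    return sum((t - a) ** 2 for t, a in zip(target, achieved))

def system_objective(f, target, subspace_results, weight=1.0):
    # Penalized system-level objective: overall fitness plus the cumulative
    # interdisciplinary discrepancy across all subspaces.
    return f(target) + weight * sum(discrepancy(target, a)
                                    for a in subspace_results)

# Hypothetical example: two subspaces nearly agree on a two-variable target.
f = lambda z: z[0] ** 2 + z[1] ** 2
print(round(system_objective(f, [1.0, 2.0], [[1.0, 2.1], [0.9, 2.0]]), 3))  # 5.02
```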
Solution quality and computational efficiency. Sobieski and Kroo [15] report very robust performance for their CO scheme, with identical solutions being found by both collaborative and single-level optimizations of problems with 45 design variables and 26 constraints. Braun and Kroo [16] showed CO to be unsuited for small problems with strong coupling, but for large-scale problems with weak coupling the CO methodology becomes more computationally efficient. They also found that the number of system-level iterations is dependent on the level of coupling between sub-systems, and that the required number of sub-optimizations scales in proportion to the overall problem size. Similar findings are reported by Alexandrov [17]. Braun et al. [18] evaluated the performance of CO on the set of quadratic problems presented by Shankar et al. [19] to evaluate the CSSO method. Unlike CSSO, the CO method did not require an increased number of iterations for QP problems with strong coupling, and converged successfully in all cases.

Applications. This decomposition architecture has been extensively demonstrated on analytical test problems [18,20] and aerospace optimization problems such as trajectory optimization [16,18], vehicle design [13,20-22], and satellite constellation configuration [23].

Concurrent SubSpace Optimization (CSSO)

Overview. This method was proposed by Sobieszczanski-Sobieski [24] and, like CO, divides the MDO problem along disciplinary boundaries. The main difference, however, is the manner in which the CSSO framework coordinates the subsystem optimizations. A bi-level optimization scheme is used in which the upper optimization problem consists of a linear [7] or second-order [25] system approximation created with the use of Global Sensitivity Equations (GSEs). This system approximation reflects changes in the constraints and the objective function as a function of the design variables.
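A linear system approximation of this kind is simply a first-order Taylor expansion about the current design, with the gradients supplied by the sensitivity analysis. A minimal sketch (the quadratic system and the expansion point are hypothetical; the GSE machinery that produces the gradients is not shown):

```python
def linear_approximation(f0, grad, x0):
    """First-order system approximation about the current design x0.
    It is accurate only near x0, which is why it must be rebuilt after
    every system-level iteration."""
    def f_approx(x):
        return f0 + sum(g * (xi - x0i) for g, xi, x0i in zip(grad, x, x0))
    return f_approx

# Hypothetical system f(x) = x1^2 + x2^2 linearized at (1, 1), where
# f = 2 and the gradient is (2, 2):
f_lin = linear_approximation(2.0, [2.0, 2.0], [1.0, 1.0])
print(f_lin([1.1, 1.0]))  # 2.2, close to the true value 2.21 near the expansion point
```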
Because of the nonlinearities, this approximation is only accurate in the immediate neighborhood of the current design state, and needs to be updated after every upper-level optimization iteration. After establishing a system-level approximation, the subsystems are independently optimized using only design variables local to each subspace. The system-level approximation is then updated by a sensitivity analysis to reflect changes in the subspace designs. The last two steps of subspace optimization and system approximation are repeated through the upper-level optimizer until convergence is achieved.

Methodology. The basic steps taken to optimize a MDO problem with the CSSO framework are as follows:

1. Select an initial set of designs for each subsystem.
2. Construct the system approximation using GSEs.
3. a) Optimize each subspace through its local variables and objective. b) Update the system approximation by performing a sensitivity analysis.
4. Optimize the design variables according to the system approximation.
5. Stop if converged; otherwise go to step 3.

Here step 3 contains the lower-level optimization in this bi-level framework. This process is illustrated in Figure 3.

Figure 3 Concurrent subspace optimization methodology flow diagram

Refinements on method. As previously mentioned, early CSSO strategies used linear system-level approximations obtained with the Global Sensitivity Equations (GSEs). Coordination of the subspace or disciplinary optimizations is achieved through system-level sensitivity information. This imposes limits on the allowable deviation from the current design, and requires a new approximation to be constructed at every system iteration to maintain reasonable accuracy. Recent research has focused on alternative methods for acquiring system-level approximations for the coordination effort. Several authors [25-28] modified the system approximation to utilize a second-order response surface approximation.
This is combined with a database of previous fitness evaluation points, which is used to create and update the response surface. The response surface then serves to couple the subsystem optimizations and coordinate the system-level design.

Solution quality and computational efficiency. Shankar et al. [19] investigated the robustness of CSSO on a set of analytical problems. Several quadratic programming problems with weak and strong coupling between subsystems were evaluated with a modification of Sobieszczanski-Sobieski's nonhierarchical subspace optimization scheme [7]. Results indicated reasonable performance for problems with weak coupling between subsystems. For large problems with strong interactions between subsystems, this decomposition scheme proved unreliable in terms of finding global sensitivities, leading to poor solutions. Tappeta et al. [29-31], using the iSIGHT software, analyzed two analytical and two structural problems, a welding design and a stepped-beam weight minimization. In this work it is reported that the Karush-Kuhn-Tucker conditions were met in some of the cases, and that most problems converged closely to the original problem solution. Lin and Renaud compared the commercial software package LANCELOT [32], based on the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method, to the CSSO strategy, the latter incorporating response surfaces. In this study the authors show similar computational efficiencies for small uncoupled analytical problems. For large-scale MDO problems, however, the CSSO method consistently outperformed the LANCELOT optimizer. Sellar et al. [26] compared a CSSO with neural-network-based response surface enhancements to a full (all-at-once) system optimization. The CSSO-NN algorithm showed a distinct advantage in computational efficiency over the all-at-once approach, while maintaining a high level of robustness.

Applications.
This decomposition methodology has been applied to large-scale aerospace problems such as high-temperature, high-pressure aircraft engine components [29,33], aircraft brake component optimization [34], and aerospace vehicle design [26,35].

Analytical Target Cascading (ATC)

Overview. Analytical Target Cascading was introduced by Michelena et al. [36] in 1999, and developed further by Kim [37] as a product development tool. This method is typically used to solve object-based decomposed system optimization problems. Tosserams et al. [38] introduced an augmented Lagrangian relaxation strategy which in some cases improves the computational efficiency of this method by several orders of magnitude.

Methodology. ATC is a strategy which coordinates hierarchically decomposed systems or elements of a problem (see Figure 4) through the introduction of target and response coupling variables. Targets set by parent elements are met by responses from the child elements in the hierarchy (see Figure 5, obtained from [38]). At each element an optimization problem is formulated to find the local variables, parent responses, and child targets which minimize a penalized discrepancy function while meeting local constraints. The responses are rebalanced up through the higher levels by iteratively changing targets in a nested loop in order to obtain consistency. Several coordination strategies are available to determine the sequence in which the sub-problems are solved and the order in which targets and responses are exchanged [39]. Proofs of convergence for some of these classes of approaches are also presented in [39].

Refinements on method. Tosserams et al. [38] introduced the use of augmented Lagrangian relaxation in order to reduce the computational cost associated with obtaining very accurate agreement between sub-problems, as well as the coordination effort in the inner loop of the method. Allison et al. exploited the complementary nature of ATC and CO to obtain an optimization formulation called nested ATC-MDO [40].
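The penalized discrepancy function each element minimizes can be sketched as follows. This is a minimal illustration with a plain quadratic penalty and hypothetical values; it is precisely this form that Tosserams et al. [38] replace with an augmented Lagrangian to avoid the very large weights otherwise needed for tight target/response consistency:

```python
def atc_element_objective(local_obj, targets, responses, weight=10.0):
    # One element's objective: the local objective plus a weighted
    # quadratic penalty on mismatches between the targets set by the
    # parent and the responses this element can actually deliver.
    penalty = sum((t - r) ** 2 for t, r in zip(targets, responses))
    return local_obj + weight * penalty

# Hypothetical element: local objective 3.0, one coupling variable
# missing its target of 1.0 by 0.5.
print(atc_element_objective(3.0, [1.0, 2.0], [1.5, 2.0]))  # 5.5
```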
Kokkolaras et al. extended the formulation of ATC to include the design of product families [41].

Figure 4 Example hierarchical problem structure

Figure 5 Sub-problem information flow

Solution quality and computational efficiency. Michelena et al. [39] prove, under certain convexity assumptions, that the ATC process will yield the optimal solution of the original design target problem. The original ATC formulation had the twofold problem of requiring large penalty weights to obtain accurate solutions, and of excessive repetition of the inner loop, which must solve the sub-problems before the outer loop can proceed. Both these problems are addressed by the augmented Lagrangian relaxation of Tosserams et al. [38], who reported decreases in computational effort of between one and four orders of magnitude.

Applications. The ATC strategy has been applied to the design of structural members and an electric water pump in [40], and to automotive design [42,43]. The performance of the augmented Lagrangian relaxation enhancement of ATC was tested using several geometric programming problems [38].

Quasiseparable Decomposition and Optimization

The quasiseparable decomposition and optimization strategy will be the focus of the research proposed in Chapter 6. This methodology addresses a class of problems common in the field of engineering and can be applied to a wide class of structural, biomechanical, and other disciplinary problems. The strategy, which will be explained in detail in Chapter 6, is based on a two-level optimization approach which allows the global search effort to be concentrated in the lower-level sub-problem optimizations. The system to be optimized is decomposed into several lower-level subsystem optimizations which are coordinated by an upper-level optimization.
A Sequential Quadratic Programming (SQP) based optimizer is applied within this optimization infrastructure to solve an example structural sizing problem. This example problem entails the maximization of the tip displacement of a hollow stepped cantilever beam with 5 sections. The quasiseparable decomposition methodology is applied to decompose the structure into several sub-problems of reduced dimensionality. Parallelism with this strategy is achieved by optimizing the independent sub-problems (sections of the stepped beam) concurrently, allowing for the utilization of parallel processing resources.

CHAPTER 3
GLOBAL OPTIMIZATION THROUGH THE PARTICLE SWARM ALGORITHM

Overview

This chapter introduces the population-based algorithm which will be the target for investigating parallelism throughout the manuscript. This stochastic algorithm mimics the swarming or flocking behavior found in animal groups such as bees, fish, and birds. The swarm of particles essentially performs several parallel individual searches that are influenced by individual and swarm memory of regions in the design space with high fitness. The positions of these regions are constantly updated and reported to the individuals in the swarm through a simple communication model. This model allows the algorithm to be easily decomposed into concurrent processes, each representing an individual particle in the swarm. This approach to parallelism will be detailed in Chapter 4. For the purpose of illustrating the performance of the PSO, it is compared to several other algorithms commonly used when solving problems in biomechanics. This comparison is made through the optimization of several analytical problems and a biomechanical system identification problem. The focus of the research presented in this chapter is to demonstrate that the PSO has good properties such as insensitivity to the scaling of design variables and very few algorithm parameters to fine-tune.
These properties make the PSO a valuable addition to the arsenal of optimization methods in biomechanical optimization, to which it had not previously been applied. The work presented in this chapter was done in collaboration with Jeff Reinbolt, who supplied the biomechanical test problem [58], and Byung Il Koh, who developed the parallel SQP and BFGS algorithms [77] used to establish a computational efficiency comparison. The research in this chapter was also published in [58,76,129]. Thanks are due to Soest and Casius for their willingness to share the numerical results published in [48].

Introduction

Optimization methods are used extensively in biomechanics research to predict movement-related quantities that cannot be measured experimentally. Forward dynamic, inverse dynamic, and inverse static optimizations have been used to predict muscle, ligament, and joint contact forces during experimental or predicted movements (e.g., see references [44-55]). System identification optimizations have been employed to tune a variety of musculoskeletal model parameters to experimental movement data (e.g., see references [56-60]). Image matching optimizations have been performed to align implant and bone models to in vivo fluoroscopic images collected during loaded functional activities (e.g., see references [61-63]). Since biomechanical optimization problems are typically nonlinear in the design variables, gradient-based nonlinear programming has been the most widely used optimization method. The increasing size and complexity of biomechanical models has also led to parallelization of gradient-based algorithms, since gradient calculations can easily be distributed across multiple processors [44-46]. However, gradient-based optimizers can suffer from several important limitations. They are local rather than global by nature and so can be sensitive to the initial guess. Experimental or numerical noise can exacerbate this problem by introducing multiple local minima into the problem.
For some problems, multiple local minima may exist due to the nature of the problem itself. In most situations the necessary gradient values cannot be obtained analytically, and finite difference gradient calculations can be sensitive to the selected finite difference step size. Furthermore, the use of design variables with different length scales or units can produce poorly scaled problems that converge slowly or not at all [64,65], necessitating design variable scaling to improve performance. Motivated by these limitations and by improvements in computer speed, recent studies have begun investigating the use of non-gradient global optimizers for biomechanical applications. Neptune [47] compared the performance of a simulated annealing (SA) algorithm with that of downhill simplex (DS) and sequential quadratic programming (SQP) algorithms on a forward dynamic optimization of bicycle pedaling utilizing 27 design variables. Simulated annealing found a better optimum than the other two methods, and in a reasonable amount of CPU time. More recently, Soest and Casius [48] evaluated a parallel implementation of a genetic algorithm (GA) using a suite of analytical test problems with up to 32 design variables and forward dynamic optimizations of jumping and isokinetic cycling with up to 34 design variables. The genetic algorithm generally outperformed all other algorithms tested, including SA, on both the analytical test suite and the movement optimizations. This study evaluates a recent addition to the arsenal of global optimization methods, particle swarm optimization (PSO), for use on biomechanical problems. A recently developed variant of the PSO algorithm is used for the investigation. The algorithm's global search capabilities are evaluated using a previously published suite of difficult analytical test problems with multiple local minima [48], while its insensitivity to design variable scaling is proven mathematically and verified using a biomechanical test problem.
For both categories of problems, PSO robustness, performance, and scale-independence are compared to those of three off-the-shelf optimization algorithms: a genetic algorithm (GA), a sequential quadratic programming algorithm (SQP), and the BFGS quasi-Newton algorithm. In addition, previously published results [48] for the analytical test problems permit comparison with a more complex GA algorithm (GA*), a simulated annealing algorithm (SA), a different SQP algorithm (SQP*), and a downhill simplex (DS) algorithm.

Theory

Particle Swarm Algorithm. Particle swarm optimization is a stochastic global optimization approach introduced by Kennedy and Eberhart [66]. The method's strength lies in its simplicity, being easy to code and requiring few algorithm parameters to define convergence behavior. The following is a brief introduction to the operation of the particle swarm algorithm, based on a recent implementation by Groenwold and Fourie [67] incorporating dynamic inertia and velocity reduction. Consider a swarm of p particles, where each particle's position x^i represents a possible solution point in the problem design space D. For each particle i, Kennedy and Eberhart [66] proposed that the position x^i_{k+1} be updated as

x^i_{k+1} = x^i_k + v^i_{k+1}    (3.1)

with the pseudo-velocity v^i_{k+1} calculated as

v^i_{k+1} = w_k v^i_k + c_1 r_1 (p^i_k - x^i_k) + c_2 r_2 (g_k - x^i_k)    (3.2)

Here, subscript k indicates a (unit) pseudo-time increment. The point p^i_k is the best-found cost location by particle i up to time step k, and represents the cognitive contribution to the search vector v^i_{k+1}. Each component of v^i_{k+1} is constrained to be less than or equal to a maximum value defined in v^max. The point g_k is the global best-found position among all particles in the swarm up to time k, and forms the social contribution to the velocity vector. The cost function values associated with p^i_k and g_k are denoted by f^i_best and f^g_best, respectively. The random numbers r_1 and r_2 are uniformly distributed in the interval [0,1].
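Eqs. (3.1) and (3.2) translate directly into code. The sketch below is an illustration of the update for one particle, not the author's C implementation; a scalar r_1 and r_2 are drawn per update, as in Eq. (3.2):

```python
import random

def pso_update(x, v, p_best, g_best, w, c1=2.0, c2=2.0, v_max=None):
    """One particle's velocity and position update, Eqs. (3.1)-(3.2):
    an inertia term plus cognitive (p_best) and social (g_best) attraction,
    with optional component-wise velocity clamping."""
    r1, r2 = random.random(), random.random()
    v_new = [w * vi + c1 * r1 * (pb - xi) + c2 * r2 * (gb - xi)
             for vi, xi, pb, gb in zip(v, x, p_best, g_best)]
    if v_max is not None:
        v_new = [max(-vm, min(vm, vi)) for vi, vm in zip(v_new, v_max)]
    x_new = [xi + vi for xi, vi in zip(x, v_new)]
    return x_new, v_new

random.seed(0)
x1, v1 = pso_update([0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [2.0, 2.0], w=1.0)
print(x1)
```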
Shi and Eberhart [68] proposed that the cognitive and social scaling parameters c_1 and c_2 be selected such that c_1 = c_2 = 2, so that the products c_1 r_1 and c_2 r_2 each have a mean of 1. The result of using these values is that the particles overshoot the attraction points p^i_k and g_k half the time, thereby maintaining separation in the group and allowing a greater area to be searched than if the particles did not overshoot. The inertia variable w_k, set to 1 at initialization, is a modification to the original PSO algorithm [66]. By reducing its value dynamically based on the cost function improvement rate, the search area is gradually reduced [69]. This dynamic reduction behavior is defined by w_d, the fraction by which the inertia w_k is reduced, v_d, the fraction by which the maximum velocity v^max is reduced, and d, the number of iterations with no improvement in g_k before these reductions take place [67] (see the algorithm flow description below). Initialization of the algorithm involves several important steps. Particles are randomly distributed throughout the design space, and particle velocities v^i_0 are initialized to random values within the limits 0 <= v^i_0 <= v^max. The particle velocity upper limit v^max is calculated as a fraction of the distance between the upper and lower bounds on the variables in the design space, v^max = kappa (x_UB - x_LB), with kappa = 0.5 as suggested in [69]. Iteration counters k and t are set to 0. Counter k monitors the total number of swarm iterations, while counter t monitors the number of swarm iterations since the last improvement in g_k. Thus, t is periodically reset to zero during the optimization while k is not. The algorithm flow can be represented as follows:

1. Initialize
   a. Set constants kappa, c_1, c_2, k_max, v^max_0, w_0, w_d, v_d, and d.
   b. Set counters k = 0, t = 0. Set the random number seed.
   c. Randomly initialize particle positions x^i_0 in D for i = 1,...,p.
   d. Randomly initialize particle velocities 0 <= v^i_0 <= v^max for i = 1,...,p.
   e. Evaluate cost function values f^i_0 using design space coordinates x^i_0 for i = 1,...,p.
   f. Set f^i_best = f^i_0 and p^i_0 = x^i_0 for i = 1,...,p.
   g. Set f^g_best to the best f^i_best and g_0 to the corresponding x^i_0.

2. Optimize
   a. Update particle velocity vectors v^i_{k+1} using Eq. (3.2).
   b. If any component of v^i_{k+1} exceeds its maximum allowable value, set it to that maximum allowable value.
   c. Update particle position vectors x^i_{k+1} using Eq. (3.1).
   d. Evaluate cost function values f^i_{k+1} using design space coordinates x^i_{k+1} for i = 1,...,p.
   e. If f^i_{k+1} < f^i_best, then set f^i_best = f^i_{k+1} and p^i_{k+1} = x^i_{k+1}, for i = 1,...,p.
   f. If f^i_{k+1} < f^g_best, then set f^g_best = f^i_{k+1} and g_{k+1} = x^i_{k+1}, for i = 1,...,p.
   g. If f^g_best was improved in step (f), then reset t = 0; else increment t.
   h. If the maximum number of function evaluations is exceeded, then go to 3.
   i. If t = d, then multiply w_{k+1} by (1 - w_d) and v^max by (1 - v_d).
   j. Increment k.
   k. Go to 2(a).

3. Report results.

4. Terminate.

This algorithm was coded in the C programming language by the author [70] and used for all PSO analyses performed in the study. A standard population size of 20 particles was used for all runs, and the other algorithm parameters were also selected based on standard recommendations (Table 1) [70-72]. The C source code for our PSO algorithm is freely available at http://www.mae.ufl.edu/~fregly/downloads/pso.zip (last accessed 12/2005).

Table 1 Standard PSO algorithm parameters used in the study

Parameter   Description                                                     Value
p           Population size (number of particles)                           20
c_1         Cognitive trust parameter                                       2.0
c_2         Social trust parameter                                          2.0
w_0         Initial inertia                                                 1
w_d         Inertia reduction parameter                                     0.01
kappa       Bound on velocity fraction                                      0.5
v_d         Velocity reduction parameter                                    0.01
d           Dynamic inertia/velocity reduction delay (function evaluations) 200

Analysis of Scale Sensitivity. One of the benefits of the PSO algorithm is its insensitivity to design variable scaling. To prove this characteristic, we will use a proof by induction to show that all particles follow an identical path through the design space regardless of how the design variables are scaled.
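Before turning to the scaling analysis, the algorithm flow listed above can be condensed into a minimal runnable sketch. This is an illustration only, not the author's C code; for simplicity the stagnation delay d here counts swarm iterations rather than the 200 function evaluations of Table 1, which correspond to roughly 10 iterations of a 20-particle swarm:

```python
import random

def pso_minimize(f, lb, ub, n_particles=20, iters=300, w=1.0, c1=2.0,
                 c2=2.0, kappa=0.5, w_d=0.01, v_d=0.01, d=10, seed=1):
    rng = random.Random(seed)
    v_max = [kappa * (u - l) for l, u in zip(lb, ub)]
    X = [[rng.uniform(l, u) for l, u in zip(lb, ub)] for _ in range(n_particles)]
    V = [[rng.uniform(0.0, vm) for vm in v_max] for _ in range(n_particles)]
    P = [x[:] for x in X]                  # personal best positions
    Pf = [f(x) for x in X]                 # personal best costs
    gi = min(range(n_particles), key=Pf.__getitem__)
    G, Gf = P[gi][:], Pf[gi]               # global best
    t = 0                                  # iterations since last improvement
    for _ in range(iters):
        improved = False
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            V[i] = [max(-vm, min(vm, w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)))
                    for v, x, p, g, vm in zip(V[i], X[i], P[i], G, v_max)]
            X[i] = [x + v for x, v in zip(X[i], V[i])]
            fx = f(X[i])
            if fx < Pf[i]:
                Pf[i], P[i] = fx, X[i][:]
                if fx < Gf:
                    Gf, G = fx, X[i][:]
                    improved = True
        t = 0 if improved else t + 1
        if t >= d:                         # dynamic inertia/velocity reduction
            w *= (1.0 - w_d)
            v_max = [vm * (1.0 - v_d) for vm in v_max]
            t = 0
    return G, Gf

# Sphere function in 2-D: the swarm should approach the origin.
best_x, best_f = pso_minimize(lambda z: sum(zi * zi for zi in z),
                              [-5.0, -5.0], [5.0, 5.0])
print(best_f)
```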
In actual PSO runs intended to investigate this property, use of the same random seed in the scaled and unscaled cases ensures that an identical sequence of random r_1 and r_2 values is produced by the computer throughout the course of the optimization. Consider an optimization problem with n design variables. An n-dimensional constant scaling vector zeta can be used to scale any or all dimensions of the problem design space:

zeta = [zeta_1, zeta_2, ..., zeta_n]^T    (3.3)

We wish to show that for any time step k > 0,

v'_k = zeta v_k    (3.4)
x'_k = zeta x_k    (3.5)

where x_k and v_k (dropping the superscript i) are the unscaled position and velocity, respectively, of an individual particle, and x'_k = zeta x_k and v'_k = zeta v_k are the corresponding scaled versions. First, we must show that our proposition is true for the base case, which involves initialization (k = 0) and the first time step (k = 1). Applying the scaling vector zeta to an individual particle position x_0 during initialization produces a scaled particle position x'_0:

x'_0 = zeta x_0    (3.6)

where the right-hand side is a component-by-component product of the vectors zeta and x_0. This implies that

p'_0 = zeta p_0,  g'_0 = zeta g_0    (3.7)

In the unscaled case, the pseudo-velocity is initialized as

v_0 = kappa (x_UB - x_LB)    (3.8)

In the scaled case, this becomes

v'_0 = kappa (x'_UB - x'_LB) = kappa (zeta x_UB - zeta x_LB) = zeta [kappa (x_UB - x_LB)] = zeta v_0    (3.9)

From Eqs. (3.1) and (3.2) and these initial conditions, the particle pseudo-velocity and position for the first time step can be written as

v_1 = w_0 v_0 + c_1 r_1 (p_0 - x_0) + c_2 r_2 (g_0 - x_0)    (3.10)
x_1 = x_0 + v_1    (3.11)

in the unscaled case, and

v'_1 = w_0 v'_0 + c_1 r_1 (p'_0 - x'_0) + c_2 r_2 (g'_0 - x'_0)
     = w_0 zeta v_0 + c_1 r_1 (zeta p_0 - zeta x_0) + c_2 r_2 (zeta g_0 - zeta x_0) = zeta v_1    (3.12)

x'_1 = x'_0 + v'_1 = zeta x_0 + zeta v_1 = zeta [x_0 + v_1] = zeta x_1    (3.13)

in the scaled case. Thus, our proposition is true for the base case. Next, we must show that our proposition is true for the inductive step: if we assume our proposition holds for time step k = j, we must prove that it also holds for time step k = j + 1. We begin by replacing subscript k with subscript j in Eqs. (3.4) and (3.5).
If we then replace subscript 0 with subscript j and subscript 1 with subscript j + 1 in Eqs. (3.12) and (3.13), we arrive at Eqs. (3.4) and (3.5) with subscript k replaced by subscript j + 1. Thus, our proposition is true for any time step j + 1. Consequently, since the base case is true and the inductive step is true, Eqs. (3.4) and (3.5) are true for all k > 0. From Eqs. (3.4) and (3.5), we can conclude that any linear scaling of the design variables (or a subset thereof) will have no effect on the final or any intermediate result of the optimization, since all velocities and positions are scaled accordingly. This leads to identical step intervals being taken in the design space for scaled and unscaled versions of the same problem, assuming infinite precision in all calculations.

In contrast, gradient-based optimization methods are often sensitive to design variable scaling due to algorithmic issues and numerical approximations. First derivative methods are sensitive because of algorithmic issues, as illustrated by a simple example. Consider the following minimization problem with two design variables (x, y) where the cost function is

f(x, y) = x^2 + y^2/100    (3.14)

with initial guess (1, 10). A scaled version of the same problem can be created by letting x' = x, y' = y/10 so that the cost function becomes

f(x', y') = x'^2 + y'^2    (3.15)

with initial guess (1, 1). Taking first derivatives of each cost function with respect to the corresponding design variables and evaluating at the initial guesses, the search direction for the unscaled problem is along a line rotated 5.7° from the positive x axis and for the scaled problem along a line rotated 45°. To reach the optimum in a single step, the unscaled problem requires a search direction rotated 84.3° and the scaled problem 45°. Thus, the scaled problem can theoretically reach the optimum in a single step while the unscaled problem cannot, due to the effect of scaling on the calculated search direction.
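The angles quoted in this example are easy to check numerically. The sketch below assumes the unscaled initial guess is (1, 10), which corresponds to (1, 1) in scaled coordinates, and compares steepest-descent directions with the direction actually needed to reach the optimum at the origin.

```python
import math

def angle_deg(dx, dy):
    """Orientation of the line through direction (dx, dy), measured from the +x axis."""
    return math.degrees(math.atan2(abs(dy), abs(dx)))

# Steepest-descent directions are the (negative) gradients.
grad_unscaled = (2 * 1, 2 * 10 / 100)   # gradient of x^2 + y^2/100 at (1, 10)
grad_scaled = (2 * 1, 2 * 1)            # gradient of x'^2 + y'^2 at (1, 1)

print(round(angle_deg(*grad_unscaled), 1))  # search direction: 5.7
print(round(angle_deg(*grad_scaled), 1))    # search direction: 45.0

# Direction needed to reach the optimum (the origin) in a single step:
print(round(angle_deg(-1, -10), 1))         # unscaled: 84.3
print(round(angle_deg(-1, -1), 1))          # scaled: 45.0
```

The scaled problem's descent direction coincides with the direction to the optimum, while the unscaled problem's does not.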
Second derivative methods are sensitive to design variable scaling because of numerical issues related to approximation of the Hessian (second derivative) matrix. According to Gill et al. [64], Newton methods utilizing an exact Hessian matrix will be insensitive to design variable scaling as long as the Hessian matrix remains positive definite. However, in practice, exact Hessian calculations are almost never available, necessitating numerical approximations via finite differencing. Errors in these approximations result in different search directions for scaled versus unscaled versions of the same problem. Even a small amount of design variable scaling can significantly affect the Hessian matrix, so that design variable changes of similar magnitude will not produce cost function changes of comparable magnitude [64]. Common gradient-based algorithms that employ an approximate Hessian include Newton and quasi-Newton nonlinear programming methods such as BFGS, SQP methods, and nonlinear least-squares methods such as Levenberg-Marquardt [64]. A detailed discussion of the influence of design variable scaling on optimization algorithm performance can be found in Gill et al. [64].

Methodology

Optimization Algorithms

In addition to our PSO algorithm, three off-the-shelf optimization algorithms were applied to all test problems (analytical and biomechanical; see below) for comparison purposes. One was a global GA algorithm developed by Deb [73-75]. This basic GA implementation utilizes one mutation operator and one crossover operator along with real encoding to handle continuous variables. The other two algorithms were commercial implementations of gradient-based SQP and BFGS algorithms (VisualDOC, Vanderplaats R&D, Colorado Springs, CO). All four algorithms (PSO, GA, SQP, and BFGS) were parallelized to accommodate the computational demands of the biomechanical test problem.
For the PSO algorithm, parallelization was performed by distributing individual particle function evaluations to different processors as detailed by the author in [76]. For the GA algorithm, individual chromosome function evaluations were parallelized as described in [48]. Finally, for the SQP and BFGS algorithms, finite difference gradient calculations were performed on different processors as outlined by Koh et al. in [77]. A master-slave paradigm using the Message Passing Interface (MPI) [3,4] was employed for all parallel implementations. Parallel optimizations for the biomechanical test problem were run on a cluster of Linux-based PCs in the University of Florida High-performance Computing and Simulation Research Laboratory (1.33 GHz Athlon CPUs with 256 MB memory on a 100 Mbps switched Fast Ethernet network). While the PSO algorithm used standard algorithm parameters for all optimization runs, minor algorithm tuning was performed on the GA, SQP, and BFGS algorithms for the biomechanical test problem. The goal was to give these algorithms the best possible chance for success against the PSO algorithm. For the GA algorithm, preliminary optimizations were performed using population sizes ranging from 40 to 100. It was found that for the specified maximum number of function evaluations, a population size of 60 produced the best results. Consequently, this population size was used for all subsequent optimization runs (analytical and biomechanical). For the SQP and BFGS algorithms, automatic tuning of the finite difference step size (FDSS) was performed separately for each design variable. At the start of each gradient-based run, forward and central difference gradients were calculated for each design variable beginning with a relative FDSS of 10^-1. The step size was then incrementally decreased by factors of ten until the absolute difference between forward and central gradient results was a minimum.
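The step-size sweep just described can be sketched as follows. This is an illustrative reconstruction of the tuning idea, with a noisy quadratic standing in for the expensive biomechanical cost function; the function names and noise level are assumptions.

```python
import random

def tune_fdss(f, x, i, start=1e-1, factor=10.0, levels=6):
    """Pick a forward-difference step for design variable i by sweeping the
    step size downward by factors of ten and keeping the step where forward
    and central gradient estimates agree best."""
    best_h, best_gap = start, float("inf")
    h = start
    for _ in range(levels):
        xp = list(x); xm = list(x)
        xp[i] += h; xm[i] -= h
        fp, fm, f0 = f(xp), f(xm), f(x)
        forward = (fp - f0) / h
        central = (fp - fm) / (2 * h)
        gap = abs(forward - central)
        if gap < best_gap:
            best_gap, best_h = gap, h
        h /= factor
    return best_h

# A cost function with small additive noise, mimicking the noise produced by
# loosely converged sub-optimizations (an assumed model, not the real one).
rng = random.Random(0)
noisy = lambda x: x[0] ** 2 + x[1] ** 2 + 1e-4 * rng.uniform(-1, 1)

h = tune_fdss(noisy, [1.0, 2.0], i=0)
print(h)  # a mid-range step: too large -> truncation error, too small -> noise
```

The tension the sweep resolves is that truncation error shrinks with h while noise amplification grows as 1/h, so the best agreement occurs at an intermediate step.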
This approach was taken since the amount of noise in the biomechanical test problem prevented a single stable gradient value from being calculated over a wide range of FDSS values (see Discussion). The forward difference step size automatically selected for each design variable was used for the remainder of the run.

Analytical Test Problems

The global search capabilities of our PSO implementation were evaluated using a suite of difficult analytical test problems previously published by Soest and Casius [48]. In that study, each problem in the suite was evaluated using four different optimizers: SA, GA*, SQP*, and DS, where an asterisk indicates a different implementation of an algorithm also used in our study. One thousand optimization runs were performed with each optimizer starting from random initial guesses and using standard optimization algorithm parameters. Each run was terminated based on a pre-defined number of function evaluations for the particular problem being solved. We followed an identical procedure with our four algorithms to permit comparison between our results and those published by Soest and Casius in [48]. Since two of the algorithms used in our study (GA and SQP) were of the same general category as algorithms used by Soest and Casius in [48] (GA* and SQP*), comparisons could be made between different implementations of the same general algorithm. Failed PSO and GA runs were allowed to use up the full number of function evaluations, whereas failed SQP and BFGS runs were re-started from new random initial guesses until the full number of function evaluations was completed. Only 100 rather than 1000 runs were performed with the SQP and BFGS algorithms due to a database size problem in the VisualDOC software. A detailed description of the six analytical test problems can be found in Soest and Casius [48].
Since the design variables for each problem possessed the same absolute upper and lower bounds and appeared in the cost function in a similar form, design variable scaling was not an issue in these problems. The six analytical test problems are described briefly below.

H1: This simple 2-dimensional function [48] has several local maxima and a global maximum of 2 at the coordinates (8.6998, 6.7665):

H_1(x_1, x_2) = [\sin^2(x_1 - x_2/8) + \sin^2(x_2 + x_1/8)] / (d + 1),  x_1, x_2 \in [-100, 100]    (3.16)

where d = \sqrt{(x_1 - 8.6998)^2 + (x_2 - 6.7665)^2}. Ten thousand function evaluations were used for this problem.

H2: This inverted version of the F6 function used by Schaffer et al. [78] has 2 dimensions with several local maxima around the global maximum of 1.0 at (0, 0):

H_2(x_1, x_2) = 0.5 - [\sin^2(\sqrt{x_1^2 + x_2^2}) - 0.5] / [1 + 0.001(x_1^2 + x_2^2)]^2,  x_1, x_2 \in [-100, 100]    (3.17)

This problem was solved using 20,000 function evaluations per optimization run.

H3: This test function from Corana et al. [79] was used with dimensionality n = 4, 8, 16, and 32. The function contains a large number of local minima (on the order of 10^{4n}) with a global minimum of 0 for |x_i| < 0.05:

H_3(x) = \sum_{i=1}^{n} { c (z_i - t \, sgn(z_i))^2 d_i  if |x_i - z_i| < t;  d_i x_i^2  otherwise },  x_i \in [-1000, 1000]    (3.18)

where z_i = \lfloor |x_i/s| + 0.49999 \rfloor \, sgn(x_i) \, s, c = 0.15, s = 0.2, t = 0.05, and

d_i = 1 for i = 1, 5, 9, ...; 1000 for i = 2, 6, 10, ...; 10 for i = 3, 7, 11, ...; 100 for i = 4, 8, 12, ...

The use of the floor function in Eq. (3.18) makes the search space for this problem the most discrete of all problems tested. The number of function evaluations used for this problem was 50,000 (n = 4), 100,000 (n = 8), 200,000 (n = 16), and 400,000 (n = 32).

For all of the analytical test problems, an algorithm was considered to have succeeded if it converged to within 10^-3 of the known optimum cost function value within the specified number of function evaluations [48].

Biomechanical Test Problem

In addition to these analytical test problems, a biomechanical test problem was used to evaluate the scale-independent nature of the PSO algorithm.
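For concreteness, the three test functions can be transcribed directly. This is an illustrative transcription of Eqs. (3.16)-(3.18) as reconstructed here (the definitive definitions are in [48,78,79]), with the known optima used as sanity checks.

```python
import math

def sgn(a):
    """Sign function with sgn(0) = 0, as required by the Corana function."""
    return (a > 0) - (a < 0)

def h1(x1, x2):
    """Eq. (3.16): global maximum of 2 at (8.6998, 6.7665)."""
    d = math.hypot(x1 - 8.6998, x2 - 6.7665)
    return (math.sin(x1 - x2 / 8) ** 2 + math.sin(x2 + x1 / 8) ** 2) / (d + 1)

def h2(x1, x2):
    """Eq. (3.17): inverted Schaffer F6, global maximum of 1.0 at (0, 0)."""
    r2 = x1 * x1 + x2 * x2
    return 0.5 - (math.sin(math.sqrt(r2)) ** 2 - 0.5) / (1 + 0.001 * r2) ** 2

def h3(x, c=0.15, s=0.2, t=0.05):
    """Eq. (3.18): Corana function, global minimum of 0 for |x_i| < 0.05."""
    d_cycle = [1.0, 1000.0, 10.0, 100.0]
    total = 0.0
    for i, xi in enumerate(x):
        di = d_cycle[i % 4]
        zi = math.floor(abs(xi / s) + 0.49999) * sgn(xi) * s
        if abs(xi - zi) < t:
            total += c * (zi - t * sgn(zi)) ** 2 * di   # flat local pocket
        else:
            total += di * xi * xi                       # paraboloid elsewhere
    return total

print(round(h1(8.6998, 6.7665), 4))  # 2.0
print(h2(0.0, 0.0))                  # 1.0
print(h3([0.0, 0.0, 0.0, 0.0]))      # 0.0
```

Evaluating `h3` at a pocket center away from the origin (e.g., `h3([0.2, 0.0, 0.0, 0.0])`) returns a small positive value, illustrating the dense grid of local minima.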
Though our PSO algorithm is theoretically insensitive to design variable scaling, numerical round-off errors and implementation details could potentially produce a scaling effect. Running the other three algorithms on scaled and unscaled versions of this test problem also permitted investigation of the extent to which other algorithms are influenced by design variable scaling. The biomechanical test problem involved determination of an ankle joint kinematic model that best matched noisy synthetic (i.e., computer generated) movement data. Similar to that used by van den Bogert et al. [56], the ankle was modeled as a three-dimensional linkage with two non-intersecting pin joints defined by 12 subject-specific parameters (Figure 6).

Figure 6. Joint locations and orientations in the parametric ankle kinematic model. Each p_i (i = 1, ..., 12) represents a different position or orientation parameter in the model.

These parameters represent the positions and orientations of the talocrural and subtalar joint axes in the tibia, talus, and calcaneus. Position parameters were in units of centimeters and orientation parameters in units of radians, resulting in parameter values of varying magnitude. This model was part of a larger 27 degree-of-freedom (DOF) full-body kinematic model used to optimize other joints as well [58]. Given this model structure, noisy synthetic movement data were generated from a nominal model for which the "true" model parameters were known. Joint parameters for the nominal model along with a nominal motion were derived from in vivo experimental movement data using the optimization methodology described below. Next, three markers were attached to the tibia and calcaneus segments in the model at locations consistent with the experiment, and the 27 model DOFs were moved through their nominal motions.
This process created synthetic marker trajectories consistent with the nominal model parameters and motion and also representative of the original experimental data. Finally, numerical noise was added to the synthetic marker trajectories to emulate skin and soft tissue movement artifacts. For each marker coordinate, a sinusoidal noise function was used with uniformly distributed random period, phase, and amplitude (limited to a maximum of ±1 cm). The values of the sinusoidal parameters were based on previous studies reported in the literature [80,53]. An unconstrained optimization problem with bounds on the design variables was formulated to attempt to recover the known joint parameters from the noisy synthetic marker trajectories. The cost function was

min_p f(p)    (3.19)

with

f(p) = \sum_{k=1}^{50} \min_q \sum_{j=1}^{6} \sum_{i=1}^{3} (c_{ijk} - y_{ijk}(p, q))^2    (3.20)

where p is a vector of 12 design variables containing the joint parameters, q is a vector of 27 generalized coordinates for the kinematic model, c_{ijk} is the ith coordinate of synthetic marker j at time frame k, and y_{ijk}(p, q) is the corresponding marker coordinate from the kinematic model. At each time frame, y_{ijk}(p, q) was computed from the current model parameters p and an optimized model configuration q. A separate Levenberg-Marquardt nonlinear least-squares optimization was performed for each time frame in Eq. (3.20) to determine this optimal configuration. A relative convergence tolerance of 10^-3 was chosen to achieve good accuracy with minimum computational cost. A nested optimization formulation (i.e., minimization occurs in both Eqs. (3.19) and (3.20)) was used to decrease the dimensionality of the design space in Eq. (3.19). Equation (3.20) was coded in Matlab and exported as stand-alone C code using the Matlab Compiler (The Mathworks, Natick, MA). The executable read in a file containing the 12 design variables and output a file containing the resulting cost function value.
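The nested structure of Eqs. (3.19)-(3.20) can be sketched with a toy model. Everything here is a placeholder: a one-marker, one-coordinate model stands in for y_ijk(p, q), integer data stand in for the synthetic marker coordinates, and a simple grid search replaces the Levenberg-Marquardt sub-optimization.

```python
def model(p, q):
    # Hypothetical marker model: position depends on one joint parameter
    # p[0] and one generalized coordinate q.
    return p[0] + q

def inner_min(p, target, q_grid):
    """Stand-in for the per-frame sub-optimization: find the configuration
    q minimizing the squared marker error for fixed joint parameters p."""
    return min((model(p, q) - target) ** 2 for q in q_grid)

def cost(p, data, q_grid):
    """Eq. (3.20) shape: sum over time frames of the per-frame inner minimum."""
    return sum(inner_min(p, c_k, q_grid) for c_k in data)

q_grid = range(-5, 6)      # admissible configurations (toy integer grid)
data = [2, 5, 8]           # synthetic marker coordinates, one per frame
print(cost([3], data, q_grid))  # 0: the model can fit every frame exactly
```

The outer optimizer only ever sees `cost(p, ...)`; the 27 generalized coordinates are eliminated frame by frame inside it, which is what keeps the outer design space at 12 variables.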
This approach facilitated the use of different optimizers to solve Eq. (3.19). To investigate the influence of design variable scaling on optimization algorithm performance, two versions of Eq. (3.20) were generated. The first used the original units of centimeters and radians for the position and orientation design variables, respectively. Bounds on the design variables were chosen to enclose a physically realistic region around the solution point in design space. Each position design variable was constrained to remain within a cube centered at the midpoint of the medial and lateral malleoli, where the length of each side was equal to the distance between the malleoli (i.e., 11.32 cm). Each orientation design variable was constrained to remain within a circular cone defined by varying its "true" value by ±30°. The second version normalized all 12 design variables to be within [-1, 1] using

x_norm = [2x - (x_UB + x_LB)] / (x_UB - x_LB)    (3.21)

where x_UB and x_LB denote the upper and lower bounds, respectively, on the design variable vector [81]. Two approaches were used to compare PSO scale sensitivity to that of the other three algorithms. For the first approach, a fixed number (10) of scaled and unscaled runs were performed with each optimization algorithm starting from different random initial seeds, and the sensitivity of the final cost function value to algorithm choice and design variable scaling was evaluated. The stopping condition for PSO and GA runs was 10,000 function evaluations, while SQP and BFGS runs were terminated when a relative convergence tolerance of 10^-5 or an absolute convergence tolerance of 10^-6 was met. For the second approach, a fixed number of function evaluations (10,000) was performed with each algorithm to investigate unscaled versus scaled convergence history. A single random initial guess was used for the PSO and GA algorithms, and each algorithm was terminated once it reached 10,000 function evaluations.
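The normalization of Eq. (3.21) and its inverse can be written in a few lines. The bounds below are hypothetical stand-ins (a position variable in cm and an orientation variable in rad), not the study's actual bounds.

```python
def normalize(x, lb, ub):
    """Map design variables from [lb, ub] to [-1, 1] per Eq. (3.21)."""
    return [(2 * xi - (u + l)) / (u - l) for xi, l, u in zip(x, lb, ub)]

def denormalize(xn, lb, ub):
    """Inverse mapping, back to engineering units."""
    return [(xi * (u - l) + (u + l)) / 2 for xi, l, u in zip(xn, lb, ub)]

# Hypothetical bounds: one position variable (cm), one orientation (rad).
lb, ub = [-5.66, -0.5], [5.66, 0.5]
x = [2.83, 0.25]
xn = normalize(x, lb, ub)
print(xn)                       # [0.5, 0.5]
print(denormalize(xn, lb, ub))  # recovers [2.83, 0.25]
```

Presenting every optimizer with variables in [-1, 1] removes the centimeters-versus-radians magnitude disparity described above, while `denormalize` recovers engineering units for reporting.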
Since individual SQP and BFGS runs require far fewer than 10,000 function evaluations, repeated runs were performed with different random initial guesses until the total number of function evaluations exceeded 10,000 at the termination of a run. This approach essentially uses SQP and BFGS as global optimizers, where the separate runs are like individual particles that cannot communicate with each other but have access to local gradient information. Finite difference step size tuning at the start of each run was included in the computation of the number of function evaluations. Once the total number of runs required to reach 10,000 function evaluations was known, the lowest cost function value from all runs at each iteration was used to represent the cost over a range of function evaluations equal to the number of runs.

Results

For the analytical test problems, our PSO algorithm was more robust than our GA, SQP, and BFGS algorithms (Table 2, top half). PSO converged to the correct global solution nearly 100% of the time on four of the six test problems (H1 and H3 with n = 4, 8, and 16). It converged 67% of the time for problem H2 and only 1.5% of the time for problem H3 with n = 32. In contrast, none of the other algorithms converged more than 32% of the time on any of the analytical test problems. Though our GA algorithm typically exhibited faster initial convergence than did our PSO algorithm (Figure 7, left column), it leveled off and rarely reached the correct final point in design space within the specified number of function evaluations.

Table 2. Fraction of successful optimizer runs for the analytical test problems. Top half: Results from the PSO, GA, SQP, and BFGS algorithms used in the present study. Bottom half: Results from the SA, GA, SQP, and DS algorithms used in Soest and Casius [48]. The GA and SQP algorithms used in that study were different from the ones used in our study.
Successful runs were identified by a final cost function value within 10^-3 of the known optimum value, consistent with [48].

                                               H3
Study      Algorithm  H1     H2     (n = 4)  (n = 8)  (n = 16)  (n = 32)
Present    PSO        0.972  0.688  1.000    1.000    1.000     0.015
           GA         0.000  0.034  0.000    0.000    0.000     0.002
           SQP        0.09   0.11   0.00     0.00     0.00      0.00
           BFGS       0.00   0.32   0.00     0.00     0.00      0.00
Soest and  SA         1.000  0.027  0.000    0.001    0.000     0.000
Casius     GA         0.990  0.999  1.000    1.000    1.000     1.000
(2003)     SQP        0.279  0.810  0.385    0.000    0.000     0.000
           DS         1.000  0.636  0.000    0.000    0.000     0.000

Figure 7. Comparison of convergence history results for the analytical test problems. Left column: Results from the PSO, GA, SQP, and BFGS algorithms used in the present study. Right column: Results from the SA, GA, SQP, and DS algorithms used in Soest and Casius [48]. The GA and SQP algorithms used in that study were different from the ones used in our study. (a) Problem H1. The SA results have been updated using corrected data provided by Soest and Casius, since the results in [48] accidentally used a temperature reduction rate of 0.5 rather than the standard value of 0.85 as reported. (b) Problem H2. (c) Problem H3 with n = 4. (d) Problem H3 with n = 32. Error was computed using the known cost at the global optimum and represents the average of 1000 runs (100 multi-start SQP and BFGS runs in our study) with each algorithm.

Figure 8. Final cost function values for ten unscaled (dark bars) and scaled (gray bars) parallel PSO, GA, SQP, and BFGS runs for the biomechanical test problem.
Each pair of unscaled and scaled runs was started from the same initial point(s) in design space, and each run was terminated when the specified stopping criterion was met (see text).

In contrast, the SQP and BFGS algorithms were highly sensitive to design variable scaling in the biomechanical test problem. For the ten trials, unscaled and scaled SQP or BFGS runs rarely converged to similar points in design space (note y axis scale in Figure 8) and produced large differences in final cost function value from one trial to the next (Figure 8c and d). Scaling improved the final result in seven out of ten SQP trials and in five out of ten BFGS trials. The best unscaled and scaled SQP final cost function values were 255 and 121, respectively, while those of BFGS were 355 and 102 (Table 3). Thus, scaling yielded the best result found with both algorithms. The best SQP and BFGS trials generally produced larger RMS marker distance errors (up to two times worse), orientation parameter errors (up to five times worse), and position parameter errors (up to six times worse) than those found by PSO or GA.

Table 3. Final cost function values and associated marker distance and joint parameter root-mean-square (RMS) errors after 10,000 function evaluations performed by multiple unscaled and scaled PSO, GA, SQP, and BFGS runs. See Figure 9 for corresponding convergence histories.

                                   RMS Error
                         Cost     Marker      Orientation  Position
Optimizer  Formulation   Function Distances   Parameters   Parameters
                                  (mm)        (deg)        (mm)
PSO        Unscaled      69.5     5.44        2.63         4.47
           Scaled        69.5     5.44        2.63         4.47
GA         Unscaled      77.9     5.78        2.65         6.97
           Scaled        74.0     5.64        3.76         4.01
SQP        Unscaled      255      10.4        3.76         14.3
           Scaled        121      7.21        3.02         9.43
BFGS       Unscaled      69.5     5.44        2.63         4.47
           Scaled        69.5     5.44        2.63         4.47

Figure 9. Convergence history for unscaled (dark lines) and scaled (gray lines) parallel PSO, GA, SQP, and BFGS runs for the biomechanical test problem.
Each algorithm was terminated after 10,000 function evaluations. Only one unscaled and one scaled PSO and GA run were required to reach 10,000 function evaluations, while repeated SQP and BFGS runs were required to reach that number. Separate SQP and BFGS runs were treated like individual particles in a single PSO run for calculating convergence history (see text).

Discussion

This chapter evaluated a recent variation of the PSO algorithm with dynamic inertia and velocity updating as a possible addition to the arsenal of methods that can be applied to difficult biomechanical optimization problems. For all problems investigated, our PSO algorithm with standard algorithm parameters performed better than did three off-the-shelf optimizers (GA, SQP, and BFGS). For the analytical test problems, PSO robustness was found to be better than that of two other global algorithms but worse than that of a third. For the biomechanical test problem with added numerical noise, PSO was found to be insensitive to design variable scaling, while GA was only mildly sensitive and SQP and BFGS were highly sensitive. Overall, the results suggest that our PSO algorithm is worth consideration for difficult biomechanical optimization problems, especially those for which design variable scaling may be an issue. Though our biomechanical optimization involved a system identification problem, PSO may be equally applicable to problems involving forward dynamic, inverse dynamic, inverse static, or image matching analyses. Other global methods such as SA and GA have already been applied successfully to such problems [47,48,62], and there is no reason to believe that PSO would not perform equally well. As with any global optimizer, PSO utilization would be limited by the computational cost of function evaluations given the large number required for a global search.
Our particle swarm implementation may also be applicable to some large-scale biomechanical optimization problems. Outside the biomechanics arena [71,72,82-91], PSO has been used to solve problems on the order of 120 design variables [89-91]. In the present study, our PSO algorithm was unsuccessful on the largest test problem, H3 with n = 32 design variables. However, in a recent study, our PSO algorithm successfully solved the Griewank global test problem with 128 design variables using population sizes ranging from 16 to 128 [76]. When the Corana test problem (H3) was attempted with 128 design variables, the algorithm exhibited worse convergence. Since the Griewank problem possesses a bumpy but continuous search space and the Corana problem a highly discrete search space, our PSO algorithm may work best on global problems with a continuous search space. It is not known how our PSO algorithm would perform on biomechanical problems with several hundred design variables, such as the forward dynamic optimizations of jumping and walking performed with parallel SQP in [44-46]. One advantage of global algorithms such as PSO, GA, and SA is that they often do not require significant algorithm parameter tuning to perform well on difficult problems. The GA used by Soest and Casius in [48] (which is not freely available) required no tuning to perform well on all of these particular analytical test problems. The SA algorithm used by Soest and Casius in [48] required tuning of two parameters to improve algorithm robustness significantly on those problems. Our PSO algorithm (which is freely available) required tuning of one parameter (w0, which was increased from 1.0 to 1.5) to produce 100% success on the two problems where it had significant failures. For the biomechanical test problem, our PSO algorithm required no tuning, and only the population size of our GA algorithm required tuning to improve convergence speed.
Neither algorithm was sensitive to the two sources of noise present in the problem: noise added to the synthetic marker trajectories, and noise due to a somewhat loose convergence tolerance in the Levenberg-Marquardt sub-optimizations. Thus, for many global algorithm implementations, poor performance on a particular problem can be rectified by minor tuning of a small number of algorithm parameters.

Figure 10. Sensitivity of gradient calculations to selected finite difference step size for one design variable. Forward and central differencing were evaluated using relative convergence tolerances of 10^-3 and 10^-6 for the nonlinear least squares sub-optimizations performed during cost function evaluation (see Eq. (3.20)).

In contrast, gradient-based algorithms such as SQP and BFGS can require a significant amount of tuning even to begin to approach global optimizer results on some problems. For the biomechanical test problem, our SQP and BFGS algorithms were highly tuned by scaling the design variables and determining the optimal FDSS for each design variable separately. FDSS tuning was especially critical due to the two sources of noise noted above. When forward and central difference gradient results were compared for one of the design variables using two different Levenberg-Marquardt relative convergence tolerances (10^-3 and 10^-6), a "sweet spot" was found near a step size of 10^-2 (Figure 10). Outside of that "sweet spot," which was automatically identified and used in generating our SQP and BFGS results, forward and central difference gradient results diverged quickly when the looser tolerance was used.
Since most users of gradient-based optimization algorithms do not scale the design variables or tune the FDSS for each design variable separately, and many do not perform multiple runs, our SQP and BFGS results for the biomechanical test problem represent best-case rather than typical results. For this particular problem, an off-the-shelf global algorithm such as PSO or GA is preferable due to the significant reduction in effort required to obtain repeatable and reliable solutions. Another advantage of PSO and GA algorithms is the ease with which they can be parallelized [48,76] and their resulting high parallel efficiency. For our PSO algorithm, Schutte et al. [76] recently reported near ideal parallel efficiency for up to 32 processors. Soest and Casius [48] reported near ideal parallel efficiency for their GA algorithm with up to 40 processors. Though SA has historically been considered more difficult to parallelize [92], Higginson et al. [93] recently developed a new parallel SA implementation and demonstrated near ideal parallel efficiency for up to 32 processors. In contrast, Koh et al. [77] reported poor SQP parallel efficiency for up to 12 processors due to the sequential nature of the line search portion of the algorithm. The caveat for these parallel efficiency results is that the time required per function evaluation was approximately constant and the computational nodes were homogeneous. As shown in [76], when function evaluations take different amounts of time, parallel efficiency of our PSO algorithm (and any other synchronous parallel algorithm, including GA, SA, SQP, and BFGS) will degrade with increasing number of processors. Synchronization between individuals in the population or between individual gradient calculations requires slave computational nodes that have completed their function evaluations to sit idle until all nodes have returned their results to the master node. 
Consequently, the slowest computational node (whether loaded by other users, performing the slowest function evaluation, or possessing the slowest processor in a heterogeneous environment) will dictate the overall time for each parallel iteration. An asynchronous PSO implementation with load balancing, where the global best-found position is updated continuously as each particle completes a function evaluation, could address this limitation. However, the extent to which convergence characteristics and scale independence would be affected is not yet known. To put the results of our study into proper perspective, one must remember that optimization algorithm robustness can be influenced heavily by algorithm implementation details, and no single optimization algorithm will work for all problems. For two of the analytical test problems (H2 and H3 with n = 4), other studies have reported PSO results using formulations that did not include dynamic inertia and velocity updating. Comparisons are difficult given differences in the maximum number of function evaluations and number of particles, but in general, algorithm modifications were (not surprisingly) found to influence algorithm convergence characteristics [94-96]. For our GA and SQP algorithms, results for the analytical test problems were very different from those obtained by Soest and Casius in [48] using different GA and SQP implementations. With seven mutation and four crossover operators, the GA algorithm used by Soest and Casius in [48] was obviously much more complex than the one used here. In contrast, both SQP algorithms were highly developed commercial implementations. Even so, poor performance by a gradient-based algorithm can be difficult to correct, even with design variable scaling and careful tuning of the FDSS.
These findings indicate that specific algorithm implementations, rather than general classes of algorithms, must be evaluated to reach any conclusions about algorithm robustness and performance on a particular problem.

Conclusions

In summary, the PSO algorithm with dynamic inertia and velocity updating provides another option for difficult biomechanical optimization problems, with the added benefit of being scale independent. There are few algorithm-specific parameters to adjust, and standard recommended settings work well for most problems [70,94]. In biomechanical optimization problems, noise, multiple local minima, and design variables of different scale can limit the reliability of gradient-based algorithms. The PSO algorithm presented here provides a simple-to-use off-the-shelf alternative for consideration in such cases. The algorithm's main drawback is its high cost in terms of function evaluations, a consequence of slow convergence in the final stages of the optimization and a common trait among global search algorithms. The time requirements associated with this high computational cost may be circumvented by exploiting the parallelism inherent in the swarm algorithm. The development of such a parallel PSO algorithm is detailed in the next chapter.

CHAPTER 4
PARALLELISM BY EXPLOITING POPULATION-BASED ALGORITHM STRUCTURES

Overview

The structures of population-based optimizers such as genetic algorithms and the particle swarm may be exploited to enable these algorithms to utilize concurrent processing. These algorithms require a set of fitness values for the population or swarm of individuals at each iteration during the search. The fitness of each individual is evaluated independently and may therefore be assigned to a separate computational node.
The development of such a parallel computational infrastructure is detailed in this chapter and applied to a set of large-scale analytical problems and a biomechanical system identification problem for the purpose of quantifying its efficiency. The parallelization of the PSO is achieved with a master-slave, coarse-grained implementation in which slave computational nodes are associated with individual particle search trajectories and assigned their fitness evaluations. Greatly enhanced computational throughput is demonstrated using this infrastructure, with efficiencies of 95% observed under load-balanced conditions. Numerical example problems with large load imbalances yield poorer performance, with parallel efficiency decreasing in a nearly linear fashion as additional nodes are added. This infrastructure is based on a two-level approach with flexibility in terms of where the search effort can be concentrated. For the two problems presented, the global search effort is applied at the upper level. This work was done in collaboration with Jeff Reinbolt, who created the biomechanical kinematic analysis software [58] and evaluated the quality of the solutions found by the parallel PSO. The work presented in this chapter was also published in [58,76,129].

Introduction

Present-day engineering optimization problems often impose large computational demands, resulting in long solution times even on a modern high-end processor. To obtain enhanced computational throughput and global search capability, we detail the coarse-grained parallelization of an increasingly popular global search method, the particle swarm optimization (PSO) algorithm. Parallel PSO performance was evaluated using two categories of optimization problems possessing multiple local minima: large-scale analytical test problems with computationally cheap function evaluations and medium-scale biomechanical system identification problems with computationally expensive function evaluations.
For load-balanced analytical test problems formulated using 128 design variables, speedup was close to ideal and parallel efficiency remained above 95% for up to 32 nodes on a Beowulf cluster. In contrast, for load-imbalanced biomechanical system identification problems with 12 design variables, speedup plateaued and parallel efficiency decreased almost linearly with increasing number of nodes. The primary factor affecting parallel performance was the synchronization requirement of the parallel algorithm, which dictated that each iteration must wait for completion of the slowest fitness evaluation. When the analytical problems were solved using a fixed number of swarm iterations, a single population of 128 particles produced a better convergence rate than did multiple independent runs performed using sub-populations (8 runs with 16 particles, 4 runs with 32 particles, or 2 runs with 64 particles). These results suggest that (1) parallel PSO exhibits excellent parallel performance under load-balanced conditions, (2) an asynchronous implementation would be valuable for real-life problems subject to load imbalance, and (3) larger population sizes should be considered when multiple processors are available.

Numerical optimization has been widely used in engineering to solve a variety of NP-complete problems in areas such as structural optimization, neural network training, control system analysis and design, and layout and scheduling problems. In these and other engineering disciplines, two major obstacles limiting solution efficiency are frequently encountered. First, even medium-scale problems can be computationally demanding due to costly fitness evaluations. Second, engineering optimization problems are often plagued by multiple local optima, requiring the use of global search methods such as population-based algorithms to deliver reliable results.
Fortunately, recent advances in microprocessor and network technology have led to increased availability of low-cost computational power through clusters of low to medium performance computers. To take advantage of these advances, communication layers such as MPI [3,5] and PVM [97] have been used to develop parallel optimization algorithms, the most popular being gradient-based, genetic (GA), and simulated annealing (SA) algorithms [48,98,99]. In biomechanical optimizations of human movement, for example, parallelization has allowed problems requiring days or weeks of computation on a single-processor computer to be solved in a matter of hours on a multi-processor machine [98]. The particle swarm optimization (PSO) algorithm is a recent addition to the list of global search methods [100]. This derivative-free method is particularly suited to continuous variable problems and has received increasing attention in the optimization community. It has been successfully applied to large-scale problems [69,100,101] in several engineering disciplines and, being a population-based approach, is readily parallelizable. It has few algorithm parameters, and generic settings for these parameters work well on most problems [70,94]. In this study, we present a parallel PSO algorithm for application to computationally demanding optimization problems. The algorithm's enhanced throughput due to parallelization and improved convergence due to increased population size are evaluated using large-scale analytical test problems and medium-scale biomechanical system identification problems. Both types of problems possess multiple local minima. The analytical test problems utilize 128 design variables to create a tortuous design space but with computationally cheap fitness evaluations. In contrast, the biomechanical system identification problems utilize only 12 design variables, but each fitness evaluation is much more costly computationally.
These two categories of problems provide a range of load balance conditions for evaluating the parallel formulation.

Serial Particle Swarm Algorithm

Particle swarm optimization was introduced in 1995 by Kennedy and Eberhart [66]. Although several modifications to the original swarm algorithm have been made to improve performance [68,102-105] and adapt it to specific types of problems [69,106,107], a parallel version has not been previously implemented. The following is a brief introduction to the operation of the PSO algorithm. Consider a swarm of p particles, with each particle's position representing a possible solution point in the design problem space D. For each particle i, Kennedy and Eberhart proposed that its position x^i be updated in the following manner:

x^i_{k+1} = x^i_k + v^i_{k+1}    (4.1)

with a pseudo-velocity v^i_{k+1} calculated as follows:

v^i_{k+1} = w_k v^i_k + c_1 r_1 (p^i_k - x^i_k) + c_2 r_2 (p^g_k - x^i_k)    (4.2)

Here, subscript k indicates a (unit) pseudo-time increment, p^i_k represents the best ever position of particle i at time k (the cognitive contribution to the pseudo-velocity vector v^i_{k+1}), and p^g_k represents the global best position in the swarm at time k (the social contribution). r_1 and r_2 represent uniform random numbers between 0 and 1. To allow the product c_1 r_1 or c_2 r_2 to have a mean of 1, Kennedy and Eberhart proposed that the cognitive and social scaling parameters c_1 and c_2 be selected such that c_1 = c_2 = 2. The result of using these proposed values is that the particles overshoot the target half the time, thereby maintaining separation within the group and allowing for a greater area to be searched than if no overshoot occurred. A modification by Fourie and Groenwold [69] on the original PSO algorithm [66] allows transition to a more refined search as the optimization progresses. This operator reduces the maximum allowable velocity v^max_k and particle inertia w_k in a dynamic manner, as dictated by the dynamic reduction parameters v_d and w_d.
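The update rule of Eqs. (4.1) and (4.2) can be sketched in a few lines. The following is an illustrative Python fragment, not the thesis's ANSI C implementation, using the standard values c_1 = c_2 = 2 and treating r_1 and r_2 as scalars per update, as the equations are written:

```python
import random

def pso_update(x, v, p_best, g_best, w, c1=2.0, c2=2.0, v_max=None):
    """One velocity/position update (Eqs. 4.1-4.2) for a single particle.

    x, v   : current position and pseudo-velocity (lists of floats)
    p_best : best-ever position of this particle (cognitive term)
    g_best : best-ever position of the swarm (social term)
    w      : inertia weight w_k
    v_max  : optional per-component velocity cap (step 2(b) of the algorithm)
    """
    r1, r2 = random.random(), random.random()  # uniform on [0, 1)
    new_v = []
    for j in range(len(x)):
        vj = w * v[j] + c1 * r1 * (p_best[j] - x[j]) + c2 * r2 * (g_best[j] - x[j])
        if v_max is not None:  # clamp each component to its maximum allowable value
            vj = max(-v_max[j], min(v_max[j], vj))
        new_v.append(vj)
    new_x = [x[j] + new_v[j] for j in range(len(x))]  # Eq. (4.1)
    return new_x, new_v
```

The dynamic reduction operator of Fourie and Groenwold would, in this sketch, periodically shrink `w` and `v_max` by the factors (1 - w_d) and (1 - v_d).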
For the sake of brevity, further details of this operator are omitted, but a detailed description can be found in References [69,70]. The serial PSO algorithm as it would typically be implemented on a single-CPU computer is described below, where p is the total number of particles in the swarm. The best ever fitness value of a particle at design coordinates p^i_k is denoted by f^i_best, and the best ever fitness value of the overall swarm, at coordinates p^g_k, by f^g_best. At time step k = 0, the particle velocities v^i_0 are initialized to values within the limits 0 <= v^i_0 <= v^max_0. The vector v^max_0 is calculated as a fraction of the distance between the upper and lower bounds, v^max_0 = ξ(x^UB - x^LB) [69], with ξ = 0.5. With this background, the PSO algorithm flow can be described as follows:

1) Initialize
   a) Set constants c_1, c_2, k_max, v^max_0, w_0, v_d, w_d, and d
   b) Initialize dynamic maximum velocity v^max_k and inertia w_k
   c) Set counters k = 0, t = 0, i = 1. Set random number seed.
   d) Randomly initialize particle positions x^i_0 in D for i = 1,...,p
   e) Randomly initialize particle velocities 0 <= v^i_0 <= v^max_0 for i = 1,...,p
   f) Evaluate cost function values f^i_0 using design space coordinates x^i_0 for i = 1,...,p
   g) Set f^i_best = f^i_0 and p^i_0 = x^i_0 for i = 1,...,p
   h) Set f^g_best to the best f^i_best and p^g_0 to the corresponding x^i_0
2) Optimize
   a) Update particle velocity vector v^i_{k+1} using Eq. (4.2)
   b) If v^i_{k+1} > v^max_k for any component, then set that component to its maximum allowable value
   c) Update particle position vector x^i_{k+1} using Eq. (4.1)
   d) Evaluate cost function value f^i_{k+1} using design space coordinates x^i_{k+1}
   e) If f^i_{k+1} <= f^i_best, then f^i_best = f^i_{k+1} and p^i_{k+1} = x^i_{k+1}
   f) If f^i_{k+1} <= f^g_best, then f^g_best = f^i_{k+1} and p^g_{k+1} = x^i_{k+1}
   g) If f^g_best was improved in (e), then reset t = 0; else increment t. If k > k_max, go to 3
   h) If t = d, then multiply w_{k+1} by (1 - w_d) and v^max_k by (1 - v_d)
   i) If the maximum number of function evaluations is exceeded, then go to 3
   j) Increment i. If i > p, then increment k and set i = 1
   k) Go to 2(a)
3) Report results
4) Terminate

The above logic is illustrated as a flow diagram in Figure 11, without detailing the working of the dynamic reduction parameters. Problem-independent stopping conditions based on convergence tests are difficult to define for global optimizers. Consequently, we typically use a fixed number of fitness evaluations or swarm iterations as a stopping criterion.

Parallel Particle Swarm Algorithm

The following issues had to be addressed in order to create a parallel PSO algorithm.

Concurrent Operation and Scalability

The algorithm should operate in such a fashion that it can be easily decomposed for parallel operation on a multi-processor machine. Furthermore, it is highly desirable that it be scalable. Scalability implies that the nature of the algorithm should not place a limit on the number of computational nodes that can be utilized, thereby permitting full use of available computational resources. An example of an algorithm with limited scalability is a parallel implementation of a gradient-based optimizer. This algorithm is decomposed by distributing the workload of the derivative calculations for a single point in design space among multiple processors. The upper limit on concurrent operations using this approach is therefore set by the number of design variables in the problem. On the other hand, population-based methods such as the GA and PSO are better suited to parallel computing. Here the population of individuals representing designs can be increased or decreased according to the availability and speed of processors. Any additional agents in the population will allow for a higher-fidelity search in the design space, lowering susceptibility to entrapment in local minima. However, this comes at the expense of additional fitness evaluations.
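The scalability limit of the gradient-based decomposition can be made concrete with a small sketch (illustrative Python, not from the thesis): a forward-difference gradient splits into exactly n independent evaluations, one per design variable, so at most n nodes can ever be kept busy per design point.

```python
def finite_difference_gradient(f, x, h=1e-6):
    """Forward-difference gradient of f at x, phrased as independent tasks.

    Each perturbed evaluation f(x + h*e_j) is an independent unit of work,
    so a parallel gradient-based optimizer can use at most n = len(x)
    nodes per design point -- the scalability limit discussed above.
    """
    base = f(x)
    tasks = []
    for j in range(len(x)):
        xp = list(x)
        xp[j] += h
        tasks.append(xp)  # one independent task per design variable
    # Here the tasks are evaluated serially; a parallel optimizer would
    # farm them out, but never to more than len(x) nodes at once.
    return [(f(xp) - base) / h for xp in tasks]
```

A population-based method, by contrast, exposes one independent fitness evaluation per individual, so the concurrency scales with population size rather than problem dimension.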
Figure 11 Serial implementation of the PSO algorithm. To avoid complicating the diagram, we have omitted velocity/inertia reduction operations.

Asynchronous vs. Synchronous Implementation

The original PSO algorithm was implemented with a synchronized scheme for updating the best 'remembered' individual and group fitness values f^i_best and f^g_best, respectively, and their associated positions p^i_k and p^g_k. This approach entails performing the fitness evaluations for the entire swarm before updating the best fitness values. Subsequent experimentation revealed that improved convergence rates can be obtained by updating the f^i_best and f^g_best values and their positions after each individual fitness evaluation (i.e., in an asynchronous fashion) [70,94]. It is speculated that because the updating occurs immediately after each fitness evaluation, the swarm reacts more quickly to an improvement in the best-found fitness value. With the parallel implementation, however, this asynchronous improvement on the swarm is lost, since fitness evaluations are performed concurrently. The parallel algorithm requires updating f^i_best and f^g_best for the entire swarm after all fitness evaluations have been performed, as in the original particle swarm formulation. Consequently, the swarm will react more slowly to changes in the position of the best fitness value in the design space. This behavior produces an unavoidable performance loss in terms of convergence rate compared to the asynchronous implementation and can be considered part of the overhead associated with parallelization.
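The distinction between the two update schemes can be shown with a toy sketch (hypothetical helper names, not code from the thesis): the synchronous variant updates the swarm best once per iteration, after all p evaluations, while the asynchronous variant updates it after each evaluation, so later particles in the same sweep already see the improved value.

```python
def synchronous_sweep(fitnesses, f_gbest):
    """Evaluate all particles, then update the swarm best once
    (the scheme the parallel implementation must use)."""
    return min(f_gbest, min(fitnesses))

def asynchronous_sweep(fitnesses, f_gbest, on_improve):
    """Update the swarm best immediately after each evaluation
    (the scheme available to a serial implementation).

    on_improve is called whenever f_gbest improves, standing in for
    the swarm 'reacting' to the new best position mid-iteration.
    """
    for f in fitnesses:
        if f < f_gbest:
            f_gbest = f
            on_improve(f_gbest)
    return f_gbest
```

Both sweeps end with the same f_gbest for a fixed set of fitness values; the difference is that the asynchronous version exposes every intermediate improvement to the particles evaluated afterwards, which is exactly the reactivity lost when the evaluations run concurrently.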
Coherence

Parallelization should have no adverse effect on algorithm operation. Calculations sensitive to program order should appear to have occurred in exactly the same order as in the serial synchronous formulation, leading to the exact same final answer. In the serial PSO algorithm, the fitness evaluations form the bulk of the computational effort for the optimization and can be performed independently. For our parallel implementation, we therefore chose a coarse decomposition scheme in which the algorithm performs only the fitness evaluations concurrently on a parallel machine. Step 2 of the particle swarm optimization algorithm was modified accordingly to operate in a parallel manner:

2) Optimize
   a) Update particle velocity vectors v^i_{k+1} using Eq. (4.2)
   b) If v^i_{k+1} > v^max_k for any component, then set that component to its maximum allowable value
   c) Update particle position vectors x^i_{k+1} using Eq. (4.1)
   d) Concurrently evaluate fitness values f^i_{k+1} using design space coordinates x^i_{k+1} for i = 1,...,p
   e) If f^i_{k+1} <= f^i_best, then f^i_best = f^i_{k+1}, p^i_{k+1} = x^i_{k+1} for i = 1,...,p
   f) If f^i_{k+1} <= f^g_best, then f^g_best = f^i_{k+1}, p^g_{k+1} = x^i_{k+1} for i = 1,...,p
   g) If f^g_best was improved in (e), then reset t = 0; else increment t. If k > k_max, go to 3
   h) If t = d, then multiply w_{k+1} by (1 - w_d) and v^max_k by (1 - v_d)
   i) If the maximum number of function evaluations is exceeded, then go to 3
   j) Increment k
   k) Go to 2(a)

The parallel PSO algorithm is represented by the flow diagram in Figure 12.

Network Communication

In a parallel computational environment, the main performance bottleneck is often the communication latency between processors. This issue is especially relevant to large clusters of computers, where the use of high performance network interfaces is limited due to their high cost. To keep communication between different computational nodes to a minimum, we use fitness evaluation tasks as the level of granularity for the parallel software.
As previously mentioned, each of these evaluations can be performed independently and requires no communication aside from receiving the design space coordinates to be evaluated and reporting the fitness value at the end of the analysis.

Figure 12 Parallel implementation of the PSO algorithm. We have again omitted velocity/inertia reduction operations to avoid complicating the diagram.

The optimization infrastructure is organized into a coordinating node and several computational nodes. PSO algorithm functions and task orchestration are performed by the coordinating node, which assigns the design coordinates to be evaluated, in parallel, to the computational nodes. With this approach, no communication is required between computational nodes, as individual fitness evaluations are independent of each other. The only necessary communication is between the coordinating node and the computational nodes and encompasses passing the following information:

1) Several distinct design variable configuration vectors assigned by the coordinating node to slave nodes for fitness evaluation.
2) Fitness values reported from slave nodes to the coordinating node.
3) Synchronization signals to maintain program coherence.
4) Termination signals from the coordinating node to slave nodes on completion of the analysis to stop the program cleanly.

The parallel PSO scheme and required communication layer were implemented in ANSI C on a Linux operating system using the message passing interface (MPI) libraries.
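The coordinating-node pattern can be mimicked in miniature with a thread pool standing in for the slave nodes. The actual implementation used ANSI C and MPI; this Python sketch only illustrates the communication pattern (positions out, fitness values back), and the stand-in `fitness` function is an assumption of the example.

```python
from concurrent.futures import ThreadPoolExecutor

def fitness(x):
    """Stand-in fitness evaluation (sum of squares); in the thesis this
    is an expensive analysis performed on a slave node."""
    return sum(xi * xi for xi in x)

def evaluate_swarm(positions, n_nodes):
    """Coordinating node: farm out one fitness evaluation per particle.

    pool.map gathers all results before returning, which plays the role
    of the synchronization barrier -- the next swarm iteration cannot
    start until the slowest evaluation has finished.
    """
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        return list(pool.map(fitness, positions))
```

Because the results are gathered in particle order, the velocity and position updates that follow see exactly the data they would see in the serial synchronous formulation, preserving coherence.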
Synchronization and Implementation

From the parallel PSO algorithm, it is clear that some means of synchronization is required to ensure that all of the particle fitness evaluations have been completed and results reported before the velocity and position calculations can be executed (steps 2a and 2b). Synchronization is done using a barrier function in the MPI communication library, which temporarily stops the coordinating node from proceeding with the next swarm iteration until all of the computational nodes have responded with a fitness value. Because of this approach, the time required to perform a single parallel swarm fitness evaluation will be dictated by the slowest fitness evaluation in the swarm. Two networked clusters of computers were used to obtain the numerical results. The first cluster was used to solve the analytical test problems and comprised 40 1.33 GHz Athlon PCs located in the High-performance Computing and Simulation (HCS) Research Laboratory at the University of Florida. The second group was used to solve the biomechanical system identification problems and consisted of 32 2.40 GHz Intel PCs located in the HCS Research Laboratory at Florida State University. In both locations, 100 Mbps switched networks were utilized for connecting nodes.

Sample Optimization Problems

Analytical Test Problems

Two well-known analytical test problems were used to evaluate parallel PSO algorithm performance on large-scale problems with multiple local minima (see Appendix A for a mathematical description of both problems). The first was a test function (Figure 13(a)) introduced by Griewank [108] which superimposes a high-frequency sine wave on a multi-dimensional parabola. In contrast, the second problem used the Corana test function [109], which exhibits discrete jumps throughout the design space (Figure 13(b)). For both problems, the number of local minima increases exponentially with the number of design variables.
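The Griewank function has a well-known standard form; as a concrete illustration (a sketch of the standard formulation — the exact version and bounds used in the thesis are given in its Appendix A):

```python
import math

def griewank(x):
    """Standard Griewank test function [108]: a high-frequency cosine
    'ripple' superimposed on a multi-dimensional parabola. The global
    minimum is f = 0 at x = (0, ..., 0), and the number of local minima
    grows exponentially with the dimension of x."""
    quadratic = sum(xi * xi for xi in x) / 4000.0
    ripple = math.prod(math.cos(xi / math.sqrt(i + 1)) for i, xi in enumerate(x))
    return 1.0 + quadratic - ripple
```

The same call works for any dimension, so the 128-variable formulation used below is simply `griewank` applied to a 128-element vector.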
To investigate large-scale optimization issues, we formulated both problems using 128 design variables. Since fitness evaluations are extremely fast for these test problems, a delay of approximately half a second was built into each fitness evaluation so that total computation time would not be swamped by communication time. Since parallelization opens up the possibility of utilizing large numbers of processors, we used the analytical test problems to investigate how convergence rate and final solution are affected by the number of particles employed in a parallel PSO run. To ensure that all swarms were given equally 'fair' starting positions, we generated a pool of 128 initial positions using a Latin hypercube sampler (LHS). Particle positions selected with this scheme will be distributed uniformly throughout the design space [110]. This initial pool of 128 particles was divided into the following sub-swarms: one swarm of 128 particles, two swarms of 64 particles, four swarms of 32 particles, and eight swarms of 16 particles. Each sub-swarm was used independently to solve the two analytical test problems. This approach allowed us to investigate whether it is more efficient to perform multiple parallel optimizations with smaller population sizes or one parallel optimization with a larger population size, given a sufficient number of processors. To obtain comparisons for convergence speed, we allowed all PSO runs to complete 10,000 iterations before the search was terminated. This number of iterations corresponded to between 160,000 and 1,280,000 fitness evaluations, depending on the number of particles employed in the swarm.

Biomechanical System Identification Problems

In addition to the analytical test problems, medium-scale biomechanical system identification problems were used to evaluate parallel PSO performance under more realistic conditions. These problems were variations of a general problem that attempts to find joint parameters (i.e.
positions and orientations of joint axes) that match a kinematic ankle model to experimental surface marker data [56]. The data are collected with an optoelectronic system that uses multiple cameras to record the positions of external markers placed on the body segments. To permit measurement of three-dimensional motion, we attach three non-colinear markers to the foot and lower leg. The recordings are processed to obtain marker trajectories in a laboratory-fixed coordinate system [111,112]. The general problem possesses 12 design variables and requires approximately 1 minute for each fitness evaluation. Thus, while the problem is only medium-scale in terms of number of design variables, it is still computationally costly due to the time required for each fitness evaluation.

Figure 13 Surface plots of the (a) Griewank and (b) Corana analytical test problems showing the presence of multiple local minima. For both plots, 126 design variables were fixed at their optimal values and the remaining 2 design variables varied in a small region about the global minimum.

The first step in the system identification procedure is to formulate a parametric ankle joint model that can emulate a patient's movement by possessing sufficient degrees of freedom. For the purpose of this paper, we approximate the talocrural and subtalar joints as simple 1 degree of freedom revolute joints. The resulting ankle joint model (Figure 6) contains 12 adjustable parameters that define its kinematic structure [56]. The model also has a set of virtual markers fixed to the limb segments in positions corresponding to the locations of real markers on the subject. The linkage parameters are then adjusted via optimization until markers on the model follow the measured marker trajectories as closely as possible.
To quantify how closely the kinematic model with specified parameter values can follow measured marker trajectories, we define a cumulative marker error e as follows:

e = Σ_{j=1}^{n} Σ_{i=1}^{m} Δ²_{ij}    (4.3)

where

Δ²_{ij} = Δx²_{ij} + Δy²_{ij} + Δz²_{ij}    (4.4)

and Δx_{ij}, Δy_{ij}, and Δz_{ij} are the spatial displacement errors for marker i at time frame j in the x, y, and z directions as measured in the laboratory-fixed coordinate system, n = 50 is the number of time frames, and m = 6 (3 on the lower leg and 3 on the foot) is the number of markers. These errors are calculated between the experimental marker locations on the human subject and the virtual marker locations on the kinematic model. For each time frame, a non-linear least squares sub-optimization is performed to determine the joint angles that minimize Σ_i Δ²_{ij} given the current set of model parameters. The first sub-optimization is started from an initial guess of zero for all joint angles. The sub-optimization for each subsequent time frame is started with the solution from the previous time frame to speed convergence. By performing a separate sub-optimization for each time frame and then calculating the sum of the squares of the marker coordinate errors, we obtain an estimate of how well the model fits the data for all time frames included in the analysis. By varying the model parameters and repeating the sub-optimization process, the parallel PSO algorithm finds the best set of model parameters that minimize e over all time frames. For numerical testing, three variations of this general problem were analyzed as described below. In all cases the number of particles used by the parallel PSO algorithm was set to a recommended value of 20 [94].

1) Synthetic data without numerical noise: Synthetic (i.e., computer-generated) data without numerical noise were generated by simulating marker movements using a lower body kinematic model with virtual markers.
The synthetic motion was based on an experimentally measured ankle motion (see 3 below). The kinematic model used anatomically realistic joint positions and orientations. Since the joint parameters associated with the synthetic data were known, this optimization was used to verify that the parallel PSO algorithm could accurately recover the original model.

2) Synthetic data with numerical noise: Numerical noise was superimposed on each synthetic marker coordinate trajectory to emulate the effect of marker displacements caused by skin movement artifacts [53]. A previously published noise model requiring three random parameters was used to generate a perturbation N in each marker coordinate [80]:

N = A sin(ωt + φ)    (4.5)

where A is the amplitude, ω the frequency, and φ the phase angle of the noise. These noise parameters were treated as uniform random variables within the bounds 0 <= A <= 1 cm, 0 <= ω <= 25 rad/s, and 0 <= φ <= 2π (obtained from [80]).

3) Experimental data: Experimental marker trajectory data were obtained by processing three-dimensional recordings from a subject performing movements with reflective markers attached to the foot and lower leg as previously described. Institutional review board approval was obtained for the experiments and data analysis, and the subject gave informed consent prior to participation. Marker positions were reported in a laboratory-fixed coordinate system.

Speedup and Parallel Efficiency

Parallel performance for both classes of problems was quantified by calculating speedup and parallel efficiency for different numbers of processors. Speedup is the ratio of sequential execution time to parallel execution time and ideally should equal the number of processors. Parallel efficiency is the ratio of speedup to number of processors and ideally should equal 100%. For the analytical test problems, only the Corana problem was run, since the half-second delay added to both problems makes their parallel performance identical.
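These two measures are straightforward to compute from wall-clock times; as a small sketch:

```python
def speedup(t_serial, t_parallel):
    """Ratio of sequential to parallel execution time.
    Ideally equals the number of processors."""
    return t_serial / t_parallel

def parallel_efficiency(t_serial, t_parallel, n_processors):
    """Speedup divided by processor count, as a percentage.
    Ideally equals 100%."""
    return 100.0 * speedup(t_serial, t_parallel) / n_processors
```

For example, a run taking 64 hours serially and 4 hours on 32 nodes has a speedup of 16 and a parallel efficiency of 50%.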
For the biomechanical system identification problems, only the synthetic data with numerical noise case was reported, since experimentation with the other two cases produced similar parallel performance.

Figure 14 Average fitness convergence histories for the (a) Griewank and (b) Corana analytical test problems for swarm sizes of 16, 32, 64, and 128 particles and 10,000 swarm iterations. Triangles indicate the location on each curve where 160,000 fitness evaluations were completed.

The number of particles and nodes used for each parallel evaluation was selected based on the requirements of the problem. The Corana problem with 128 design variables was solved using 32 particles and 1, 2, 4, 8, 16, and 32 nodes. The biomechanical problem with 12 design variables was solved using 20 particles and 1, 2, 5, 10, and 20 nodes. Both problems were allowed to run until 1000 fitness evaluations were completed.

Numerical Results

Convergence rates for the two analytical test problems differed significantly with changes in swarm size. For the Griewank problem (Figure 14(a)), individual PSO runs converged to within 1e-6 of the global minimum after 10,000 optimizer iterations, regardless of the swarm size. Run-to-run variations in final fitness value (not shown) for a fixed swarm size were small compared to variations between swarm sizes. For example, no runs with 16 particles produced a better final fitness value than any of the runs with 32 particles, and similarly for the 32-64 and 64-128 combinations. When the number of fitness evaluations was considered instead of the number of swarm iterations, runs with a smaller swarm size tended to converge more quickly than did runs with a larger swarm size (see triangles in Figure 14). However, two of the eight runs with the smallest number of particles failed to show continued improvement near the maximum number of iterations, indicating possible entrapment in a local minimum.
Similar results were found for the Corana problem (Figure 14(b)), with two exceptions. First, the optimizer was unable to obtain the global minimum for any swarm size within the specified number of iterations (Figure 14(b)), and second, overlap in results between different swarm sizes was observed. For example, some 16-particle results were better than some 32-particle results, and similarly for the other neighboring combinations. On average, however, a larger swarm size tended to produce better results for both problems.

Table 4 Parallel PSO results for the biomechanical system identification problem using synthetic marker trajectories without and with numerical noise. Optimizations on synthetic data with and without noise used 20 particles and were terminated after 40,000 fitness evaluations.

Model       Upper    Lower    Synthetic   Without   With
parameter   bound    bound    solution    noise     noise
p1 (deg)    48.67    -11.63   18.37       18.36     15.13
p2 (deg)    30.00    -30.00   0.00        -0.01     8.01
p3 (deg)    70.23    10.23    40.23       40.26     32.97
p4 (deg)    53.00    -7.00    23.00       23.03     23.12
p5 (deg)    72.00    12.00    42.00       42.00     42.04
p6 (cm)     6.27     -6.27    0.00        0.00      -0.39
p7 (cm)     -33.70   -46.24   -39.97      -39.97    -39.61
p8 (cm)     6.27     -6.27    0.00        -0.00     0.76
p9 (cm)     0.00     -6.27    -1.00       -1.00     -2.82
p10 (cm)    15.27    2.72     9.00        9.00      10.21
p11 (cm)    10.42    -2.12    4.15        4.15      3.03
p12 (cm)    6.89     -5.65    0.62        0.62      -0.19

The parallel PSO algorithm found ankle joint parameters consistent with the known solution or results in the literature [61-63]. The algorithm had no difficulty recovering the original parameters from the synthetic data set without noise (Table 4), producing a final cumulative error e on the order of 10^-13. The original model was recovered with mean orientation errors less than 0.05 deg and mean position errors less than 0.008 cm. Furthermore, the parallel implementation produced identical fitness and parameter histories as did a synchronous serial implementation.
For the synthetic data set with superimposed noise, an RMS marker distance error of 0.568 cm was found, which is on the order of the imposed numerical noise with maximum amplitude of 1 cm. For the experimental data set, the RMS marker distance error was 0.394 cm (Table 5), comparable to the error for the synthetic data with noise. Convergence characteristics were similar for the three data sets considered in this study. The initial convergence rate was quite high (Figure 15(a)), after which it slowed once the approximate location of the global minimum was found.

Table 5 Parallel PSO results for the biomechanical system identification problem using synthetic marker trajectories without and with numerical noise.

RMS errors                      Without noise   With noise   Experimental data
Marker distances (cm)           3.58E-04        0.568        0.394
Orientation parameters (deg)    1.85E-02        5.010        N/A
Position parameters (cm)        4.95E-04        1.000        N/A

As the solution process proceeded, the optimizer traded off increases in RMS joint orientation error (Figure 15(b)) for decreases in RMS joint position error (Figure 15(c)) to achieve further minor reductions in the fitness value. The analytical and biomechanical problems exhibited different parallel performance characteristics. The analytical problem demonstrated almost perfectly linear speedup (Figure 16(a), squares), resulting in parallel efficiencies above 95% for up to 32 nodes (Figure 16(b), squares). In contrast, the biomechanical problem exhibited speedup results that plateaued as the number of nodes was increased (Figure 16(a), circles), producing parallel efficiencies that decreased almost linearly with increasing number of nodes (Figure 16(b), circles). Each additional node produced roughly a 3% reduction in parallel efficiency.

Discussion

This study presented a parallel implementation of the particle swarm global optimizer. The implementation was evaluated using analytical test problems and biomechanical system identification problems.
Speedup and parallel efficiency results were excellent when each fitness evaluation took the same amount of time.

Figure 15 Fitness convergence and parameter error plots for the biomechanical system identification problem using synthetic data with noise

Figure 16 (a) Speedup and (b) parallel efficiency for the analytical and biomechanical optimization problems.

For problems with large numbers of design variables and multiple local minima, maximizing the number of particles produced better results than repeated runs with fewer particles. Overall, parallel PSO makes efficient use of computational resources and provides a new option for computationally demanding engineering optimization problems. The agreement between optimized and known orientation parameters p1-p4 for the biomechanical problem using synthetic data with noise was poorer than initially expected. This finding was the direct result of the sensitivity of orientation calculations to errors in marker positions caused by the injected numerical noise. Because of the close proximity of the markers to each other, even relatively small-amplitude numerical noise in marker positions can result in large fluctuations in the best-fit joint orientations. While more time frames could be used to offset the effects of noise, this approach would increase the cost of each fitness evaluation due to an increased number of sub-optimizations. Nonetheless, the fitness value for the optimized parameters was lower than that for the parameters used to generate the original noiseless synthetic data.
Though the biomechanical optimization problems only involved 12 design variables, multiple local minima existed when numerical or experimental noise was present. When the noisy synthetic data set was analyzed with a gradient-based optimizer using 20 random starting points, the optimizer consistently found distinct solutions, indicating a large number of local minima. Similar observations were made for a smaller number of gradient-based runs performed on the experimental data set. To evaluate the parallel PSO's ability to avoid entrapment in these local minima, we performed 10 additional runs with the algorithm. All 10 runs converged to the same solution, which was better than any of the solutions found by the gradient-based runs. Differences in parallel PSO performance between the analytical test problem and the biomechanical system identification problem can be explained by load balancing issues. The half-second delay added to the analytical test problem made all fitness evaluations take approximately the same amount of time and substantially more time than communication tasks. Consequently, load imbalances were avoided and little degradation in parallel performance was observed with increasing number of processors. In contrast, for the biomechanical system identification problem, the time required to complete the 50 sub-optimizations was sensitive to the selected point in design space, thereby producing load imbalances. As the number of processors increased, so did the likelihood that at least one fitness evaluation would take much longer than the others. Due to the synchronization requirement of the current parallel implementation, the resulting load imbalance caused by even one slow fitness evaluation was sufficient to degrade parallel performance rapidly with increasing number of nodes. An asynchronous parallel implementation could be developed to address this problem, with the added benefit of permitting high parallel efficiency on inhomogeneous clusters.
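The load-imbalance argument can be illustrated with a small simulation (an illustrative sketch of my own, not code from this dissertation): under a synchronous barrier, every iteration costs the maximum of the per-node evaluation times, so any jitter in evaluation time erodes efficiency as nodes are added. The uniform jitter model and all names below are assumptions made for illustration.

```python
import random

def expected_efficiency(num_nodes, eval_time=1.0, jitter=0.5, iters=1000, seed=0):
    """Estimate parallel efficiency of a synchronous step in which each of
    num_nodes performs one fitness evaluation and all wait at a barrier for
    the slowest node, given uniformly jittered evaluation times."""
    rng = random.Random(seed)
    total_work = 0.0   # sum of individual evaluation times (useful work)
    total_wall = 0.0   # wall-clock time: the max over nodes, per iteration
    for _ in range(iters):
        times = [eval_time + rng.uniform(0.0, jitter) for _ in range(num_nodes)]
        total_work += sum(times)
        total_wall += max(times)
    # efficiency = useful work / (nodes * wall-clock time)
    return total_work / (num_nodes * total_wall)

for n in (1, 8, 32):
    print(n, round(expected_efficiency(n), 3))
```

With a single node the maximum equals the sum, so efficiency is exactly 1; as nodes are added the expected maximum grows while the mean stays fixed, reproducing the qualitative efficiency decline seen for the biomechanical problem.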
Our results for the analytical and biomechanical optimization problems suggest that PSO performs best on problems with continuous rather than discrete noise. The algorithm consistently found the global minimum for the Griewank problem, even when the number of particles was low. Though the global minimum is unknown for the biomechanical problem using synthetic data with noise, multiple PSO runs consistently converged to the same solution. Both of these problems utilized continuous, sinusoidal noise functions. In contrast, PSO did not converge to the global minimum for the Corana problem with its discrete noise function. Thus, for large-scale problems with multiple local minima and discrete noise, other optimization algorithms such as GA may provide better results [48]. Use of a LHS rather than uniform random sampling to generate initial points in design space may be a worthwhile PSO algorithm modification. Experimentation with our random number generator indicated that initial particle positions can at times be grouped together. This motivated our use of LHS to avoid re-sampling the same region of design space when providing initial guesses to sub-swarms. To investigate the influence of sampling method on PSO convergence rate, we performed multiple runs with the Griewank problem using uniform random sampling and a LHS with the default design variable bounds (-600 to +600) and with the bounds shifted by 200 (-400 to +800). We found that when the bounds were shifted, the convergence rate with uniform random sampling changed while it did not with a LHS. Thus, swarm behavior appears to be influenced by sampling method, and a LHS may be helpful for minimizing this sensitivity. A secondary motivation for running the analytical test problems with different numbers of particles was to determine whether the use of sub-swarms would improve convergence.
The question is whether a larger swarm where all particles communicate with each other is more efficient than multiple smaller swarms where particles communicate within each sub-swarm but not between sub-swarms. It is possible that the global best position found by a large swarm may unduly influence the motion of all particles in the swarm. Creating sub-swarms that do not communicate eliminates this possibility. In our approach, we performed the same number of fitness evaluations for each population size. Our results for both analytical test problems suggest that when a large number of processors is available, increasing the swarm size will increase the probability of finding a better solution. Analysis of PSO convergence rate for different numbers of particles also suggests an interesting avenue for future investigation. Passing an imaginary curve through the triangles in Figure 14 reveals that for a fixed number of fitness evaluations, convergence rate increases asymptotically with decreasing number of particles. While the solution found by a smaller number of particles may be a local minimum, the final particle positions may still identify the general region in design space where the global minimum is located. Consequently, an adaptive PSO algorithm that periodically adjusts the number of particles upward during the course of an optimization may improve convergence speed. For example, an initial run with 16 particles could be performed for a fixed number of fitness evaluations. At the end of that phase, the final positions of those 16 particles would be kept, but 16 new particles would be added to bring the total up to 32 particles. The algorithm would continue using 32 particles until the same number of fitness evaluations was completed. The process of gradually increasing the number of particles would continue until the maximum specified swarm size (e.g., 128 particles) was analyzed.
To ensure systematic sampling of the design space, a LHS would be used to generate a pool of sample points equal to the maximum number of particles, from which sub-samples would be drawn progressively at each phase of the optimization. In the scenario above with a maximum of 128 particles, the first phase with 16 particles would remove 16 sampled points from the LHS pool, the second phase another 16 points, the third phase 32 points, and the final phase the remaining 64 points.

Conclusions

In summary, the parallel Particle Swarm Optimization algorithm presented in this chapter exhibits excellent parallel performance as long as individual fitness evaluations require the same amount of time. For optimization problems where the time required for each fitness evaluation varies substantially, an asynchronous implementation may be needed to reduce wasted CPU cycles and maintain high parallel efficiency. When large numbers of processors are available, use of larger population sizes may result in improved convergence rates to the global solution. An adaptive PSO algorithm that increases population size incrementally may also improve algorithm convergence characteristics.

CHAPTER 5
IMPROVED GLOBAL CONVERGENCE USING MULTIPLE INDEPENDENT OPTIMIZATIONS

Overview

This chapter presents a methodology for improving the global convergence probability in large-scale global optimization problems in cases where several local minima are present. The optimizer applied in this methodology is the PSO, but the strategy outlined here is applicable to any type of algorithm. The controlling idea behind this optimization approach is to utilize several independent optimizations, each using a fraction of a budget of computational resources.
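The progressive-draw scheme described above can be sketched in a few lines (my own illustrative code, not from the dissertation; the basic LHS sampler, the function names, and the use of the Griewank bounds of -600 to +600 are assumptions made for concreteness):

```python
import random

def lhs(n_points, n_dims, lo, hi, rng):
    """Basic Latin hypercube sample: each dimension is divided into
    n_points strata, one point per stratum, independently permuted."""
    cols = []
    for _ in range(n_dims):
        strata = [(i + rng.random()) / n_points for i in range(n_points)]
        rng.shuffle(strata)
        cols.append([lo + s * (hi - lo) for s in strata])
    return [list(p) for p in zip(*cols)]

def growth_phases(max_particles=128, start=16):
    """Number of NEW particles added at each phase of an adaptive swarm
    that doubles after each phase: 16, then 16 (total 32), then 32
    (total 64), then 64 (total 128)."""
    sizes, total = [start], start
    while total < max_particles:
        add = min(total, max_particles - total)
        sizes.append(add)
        total += add
    return sizes

rng = random.Random(1)
pool = lhs(128, 2, -600.0, 600.0, rng)    # one pool, drawn down in phases
phases = growth_phases()                   # [16, 16, 32, 64]
drawn = [[pool.pop() for _ in range(k)] for k in phases]
print(phases, sum(len(d) for d in drawn), len(pool))  # [16, 16, 32, 64] 128 0
```

Drawing each phase from the same pre-built pool preserves the stratification of the design space across phases, which is the point of generating the pool up front rather than sampling anew at each phase.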
Although the optimizations may individually have a limited probability of convergence compared to a single optimization utilizing the full budget, it is shown that when they are combined they have a cumulative convergence probability far in excess of that of the single optimization. Since the individual limited optimizations are independent, they may be executed concurrently on separate computation nodes with no interaction. This exploitation of parallelism allows us to vastly increase the probability of convergence to the global minimum while simultaneously reducing the required wall-clock time if a parallel machine is used.

Introduction

If we consider the general unconstrained global optimization problem for the real-valued function f(x) defined on the set x ∈ D, one cannot state that a global solution has been found unless an exhaustive search of the set D is performed [115]. With a finite number of function evaluations, at best we can only estimate the probability of arriving at or near the global optimum. To solve global optimization problems reliably, the optimizer needs to achieve an efficient balance between sampling the entire design space and directing progressively more densely spaced sampling points towards promising regions for a more refined search [116]. Many algorithms achieve this balance, such as the deterministic DIRECT optimizer [117] or stochastic algorithms such as genetic algorithms [118], simulated annealing [119,120], clustering [121], and the particle swarm optimizer [122]. Although these population-based global optimization algorithms are fairly robust, they can be attracted, at least temporarily, towards local optima which are not global (see, for example, the Griewank problem in Figure 17).

Figure 17 Multiple local minima for the Griewank analytical problem, surface plot in two dimensions

This difficulty can be addressed by allowing longer optimization runs or an increased population size.
Both these options often result in a decrease in algorithm efficiency, with no guarantee that the optimizer will escape from the local optimum. It is possible that restarting the algorithm when it gets stuck in a local minimum and allowing multiple optimization runs may be a more efficient approach. This follows from the hypothesis that several limited independent optimization runs, each with a small likelihood of finding the global optimum, may be combined in a synergistic effort which yields a vastly improved global convergence probability. This approach is routinely used for global search using multi-start local optimizers [123]. Le Riche and Haftka have also suggested the use of this approach with genetic algorithms for solving complex composite laminate optimization problems [124]. The main difficulty in the application of such a multi-run strategy is deciding when the optimizer should be stopped. The objective of this manuscript is to solve this problem by developing an efficient and robust scheme by which to allocate computational resources to individual optimizations in a set of multiple optimizations. The organization of this manuscript is as follows. First, a brief description is given of the optimization algorithm applied in this study, the PSO algorithm. Next, a set of analytical problems is described, along with details on calculating global convergence probabilities. After that, the multiple-run methodology is outlined and a general budget strategy is presented for dividing a fixed number of fitness evaluations among multiple searches on a single processor. The use of this method on a parallel processing machine is also discussed. Then, numerical results based on the multiple-run strategy for both single and multi-processor machines are reported and discussed. Finally, general conclusions about the multi-run methodology are presented.
Methodology

Analytical Test Set

The convergence behavior of the PSO algorithm was analyzed with the Griewank [108], Shekel [114] and Hartman [114] analytical problems (see Appendix A for problem definitions), each of which possesses multiple local minima. Analytical test problems were used because global solutions are known a priori. The known solution value allows us to ascertain whether an optimization has converged to the global minimum. To estimate the probability of converging to the global optimum, we performed 1000 optimization runs for each problem, with each run limited to 500,000 fitness evaluations. These optimization runs were performed with identical parameter settings, with the exception of a different random number seed for each optimization run in order to start the population at different initial points in the design space. To evaluate the global convergence probability of the PSO algorithm as a function of population size, we solved each problem using swarms of 10, 20, 50 and 100 particles. A standard set of values was used for the other algorithm parameters (Table 6).

Table 6 Particle swarm algorithm parameters

Parameter   Description                                                      Value
c1          Cognitive trust parameter                                        2.0
c2          Social trust parameter                                           2.0
w0          Initial inertia                                                  1
wd          Inertia reduction parameter                                      0.01
κ           Bound on velocity fraction                                       0.5
vd          Velocity reduction parameter                                     0.01
d           Dynamic inertia/velocity reduction delay (function evaluations)  200

We assumed that convergence to the global optimum was achieved when the fitness value f was within a predetermined fixed tolerance εt (see Table 7) of the known global optimum f*:

    f ≤ f* + εt    (5.1)

For the Shekel and Hartman problems, the tolerance ensures that the minimum corresponding to the global optimum for the problem has been found. That is, because εt ≠ 0 the exact optimum is not obtained, but if a local optimizer is started from the PSO solution found with the given tolerance, it will converge to the global optimum.
For the Griewank problem, however, starting a local optimizer at the PSO solution will not guarantee convergence to the global optimum, since this noisy, shallowly convex problem has several local minima grouped around the global optimum that will defeat a local optimizer.

Table 7 Problem convergence tolerances

Problem     Convergence tolerance εt
Griewank    0.1
Shekel      0.001
Hartman     0.001

Multiple-run Methodology

The use of multiple optimizations using a global optimizer such as a GA was first proposed by Le Riche and Haftka [124]. However, no criterion was given on the division of computational resources between the multiple optimizations, and the efficiency of the approach was not investigated. The method entails running multiple optimizations with a reduced number of fitness evaluations, either by limiting the number of algorithm iterations or by reducing the population while keeping the number of iterations constant. Individually, the convergence probability of such a limited optimization may be only a fraction of that of a single traditional optimization run. However, the cumulative convergence probability obtained by combining the limited runs can be significantly higher than that of the single run. Similar studies have previously been undertaken to investigate the efficiency of repeated optimizations using simple search algorithms such as pure random search, grid search, and random walk [130,131]. The use of multiple local optimizations or clustering [132] is a common practice, but for some algorithms the efficiency of this approach decreases rapidly when problems with a high number of local minima are encountered [130]. For estimating the efficiency of the proposed strategy, and for comparison with optimizations with increased populations/allowed iterations, we are required to calculate the probability of convergence to the global optimum for an individual optimization run, Pi.
This convergence probability cannot be easily calculated for practical engineering problems with unknown solutions. For the set of analytical problems, however, the solutions are known and a large number of optimizations of these problems can be performed at little computational cost. With some reasonable assumptions, these two facts allow us to estimate the probability of convergence to the global optimum for individual optimization runs. The efficiency and exploratory-run considerations derived from the theoretical analytical results are equally applicable to practical engineering problems where solutions are not known a priori. The first step in calculating Pi is computing the convergence ratio, Cr, as follows:

    Cr = Nc / N    (5.2)

where Nc is the number of globally converged optimizations and N is the number of optimizations, in this case 1000. For a very large number of optimizations, the probability Pi that any individual run converges to the global optimum approaches Cr. For a finite number of runs, however, the standard error se in Pi can be quantified using

    se = sqrt(Cr (1 - Cr) / N)    (5.3)

which is an estimate of the standard deviation of Cr. For example, if we obtain a convergence probability of Pi = 0.5 with N = 1000 optimizations, the standard error would be se = 0.016. To obtain the combined cumulative probability of finding the global optimum by multiple independent optimizations, we apply the statistical law for calculating the probability of success with repeated independent events. We denote the combined or cumulative probability of N multiple independent optimization runs converging to the solution as Pc; using the fact that the convergence events are uncorrelated,

    Pc = 1 - ∏(1 - Pi),  i = 1, ..., N    (5.4)

where Pi is the probability of the ith individual optimization run converging to the global optimum.
If we assume that individual optimization runs with similar parameter settings, as in the case of the following study, have equal probability of convergence, we can simplify Eq. (5.4) to

    Pc = 1 - (1 - Pi)^N    (5.5)

The increase in cumulative probability Pc with fixed values of Pi for an increasing number of optimization runs N is illustrated in Figure 18. It must be stressed that the above relations are only valid for uncorrelated optimizations, which may not be the case when a poor-quality random number generator is used to generate initial positions in the design space. Certain generators can exhibit a tendency to favor regions in the design space, biasing the search and the probability of convergence to a minimum in these regions.

Figure 18 Cumulative convergence probability Pc as a function of the number of optimization runs with assumed equal Pi values

To verify the cumulative probability values predicted in theory with Eq. (5.5), the Monte Carlo method is applied, sampling random pairs, triplets, quintuplets, etc. of optimizations from the pool of 1000 runs. For example, to estimate the experimental global convergence probability of two runs, we selected a large number of random pairs of optimizations among the 1000 runs. Applying Eq. (5.2), the number of cases Nc in which either or both members of a pair converged, divided by N (the total number of pairs selected), yields the experimental global convergence probability.

Exploratory run and budgeting scheme

Using the multiple-run methodology requires a budget strategy by which to divide a fixed budget of fitness evaluations among the independent optimization runs. The budget of fitness evaluations nb is usually dictated by how much time the user is willing to allocate on a machine to solve a problem, divided by how long a single fitness evaluation takes to execute.
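Equations (5.2)-(5.5) and the Monte Carlo check are straightforward to reproduce. The following sketch (illustrative code of my own, not from the dissertation) uses a simulated pool of 1000 runs with Pi = 0.5:

```python
import math
import random

def convergence_ratio(converged_flags):
    """Cr = Nc / N (Eq. 5.2): fraction of runs that reached the optimum."""
    return sum(converged_flags) / len(converged_flags)

def standard_error(cr, n):
    """se = sqrt(Cr (1 - Cr) / N) (Eq. 5.3)."""
    return math.sqrt(cr * (1.0 - cr) / n)

def cumulative_probability(p_i, n_runs):
    """Pc = 1 - (1 - Pi)^N (Eq. 5.5), assuming equal, uncorrelated runs."""
    return 1.0 - (1.0 - p_i) ** n_runs

# Pi = 0.5 over N = 1000 runs gives se ~ 0.016, as in the text.
print(round(standard_error(0.5, 1000), 3))   # 0.016

# Monte Carlo check of Eq. (5.5): draw random pairs from a simulated pool
# of 1000 runs and count pairs where at least one member converged.
rng = random.Random(0)
pool = [rng.random() < 0.5 for _ in range(1000)]
pairs = 20000
hits = sum(any(rng.sample(pool, 2)) for _ in range(pairs))
print(round(hits / pairs, 3), round(cumulative_probability(convergence_ratio(pool), 2), 3))
```

The empirical pair estimate and the theoretical value from Eq. (5.5) agree to within sampling error, which is the verification strategy described in the text.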
An exploratory optimization utilizing a fraction n1 of this budget is required to determine the interaction between the optimizer and the problem. The fitness history of this optimization is used to obtain an estimate of the number of fitness evaluations to be allocated to each run, nr. This strategy is based on the assumption that a single fitness history will be sufficient to quantify the optimizer's behavior on a problem. For the set of problems it is observed that a correlation exists between the point where the fitness history levels off and the point where the convergence history levels off (Figure 19).

Figure 19 Fitness history and convergence probability Pc plots for the Griewank, Hartman and Shekel problems (20 particles each)

We hypothesize that the algorithm will either converge quickly to the optimum or stall at a similar number of fitness evaluations (Figure 20). The exploratory run is stopped using a stopping criterion which monitors the rate of change of the objective fitness value as a function of the number of fitness evaluations. As soon as this rate of improvement drops below a predetermined value (i.e., the fitness value plot levels off), the exploratory optimization is stopped and the number of fitness evaluations is noted as n1. The stopping criterion used for obtaining the numerical results is a change of less than 0.01 in fitness value over at least 500 function evaluations.
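A minimal version of this stopping rule might look as follows (an illustrative sketch; the history representation and function name are my assumptions, while the 0.01 improvement threshold over 500 evaluations comes from the text):

```python
def exploratory_stop(fitness_history, min_improvement=0.01, window=500):
    """Return the evaluation count n1 at which the exploratory run stops:
    the first point where the best fitness improved by less than
    min_improvement over the previous `window` evaluations.
    fitness_history[k] is the best fitness after k+1 evaluations.
    Returns None if the criterion is never met."""
    for k in range(window, len(fitness_history)):
        if fitness_history[k - window] - fitness_history[k] < min_improvement:
            return k + 1   # number of evaluations used so far
    return None

# Toy history: rapid improvement, then a plateau.
hist = [100.0 / (1 + k) for k in range(1000)] + [0.1] * 1000
print(exploratory_stop(hist))   # 1410
```

On the toy history the rule fires shortly after the improvement rate falls below 0.01 per 500 evaluations, mimicking the "levels off" behavior monitored in the exploratory run.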
Figure 20 Typical Shekel fitness history plots of 20 optimizations (sampled out of 1000)

The remainder of the budgeted fitness evaluations is distributed among N independent optimizations, where

    N = floor((nb - n1) / n1)    (5.6)

with an allowed number of fitness evaluations per run calculated by

    nr = (nb - n1) / N ≥ n1    (5.7)

If a multi-processor machine is available, very high Pc values may be reached using a multiple-run strategy. If we take the simple case where each independent optimization run is assigned to a separate node, the multiple-run approach will be constrained somewhat differently than in the previous single-processor case. Rather than the number of multiple optimizations being limited by a fixed budget of fitness evaluations (which is divided equally among the set of multiple independent optimizations using Eq. (5.6)), the number of optimization runs will be defined by the number of computational nodes and the wall-clock time available to the user. A method similar to that followed for a single-processor machine for determining algorithm/problem behavior must still be followed to determine the optimal number of fitness evaluations for a single independent optimization run. This exploratory run can, however, be done using a parallel implementation of the population-based algorithm under consideration, in which concurrent processing is achieved through functional decomposition [125].

Bayesian convergence probability estimation

If the amount of time and the global probability of convergence are competing considerations, a Bayesian convergence probability estimation method may be used, as proposed by Groenwold et al. [134,135]. This criterion states that the optimization is stopped once a certain confidence level is reached that the best solution found among all optimizations is the global solution f*.
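Under one reading of Eqs. (5.6)-(5.7), the exploratory run consumes n1 evaluations, the number of runs is N = floor((nb - n1)/n1), and each run then receives nr = (nb - n1)/N ≥ n1 evaluations; the equations are garbled in the source, so the exact rounding here is my reconstruction, and the code below is only a sketch of that reading:

```python
def divide_budget(n_b, n_1):
    """Split a total budget of n_b fitness evaluations: the exploratory
    run used n_1, and the remainder is divided among N independent runs
    of n_r >= n_1 evaluations each (one reading of Eqs. 5.6-5.7)."""
    remaining = n_b - n_1
    N = remaining // n_1      # number of independent runs (Eq. 5.6)
    n_r = remaining // N      # evaluations per run, n_r >= n_1 (Eq. 5.7)
    return N, n_r

# Hypothetical numbers: a 500,000-evaluation budget with an exploratory
# run that leveled off after 12,000 evaluations.
N, n_r = divide_budget(n_b=500_000, n_1=12_000)
print(N, n_r)   # 40 12200
```

Flooring N guarantees each run gets at least n1 evaluations, which matches the intent that no independent run is shorter than the exploratory run that calibrated it.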
This probability or confidence measure is given in [135] as

    Pr[f_best = f*] ≥ q = 1 - (N + a')!(2N + b')! / [(2N + a')!(N + b')!]    (5.8)

where q is the predetermined confidence level set by the user, usually 0.95, N is the total number of optimizations performed up to the time of evaluating the stopping criterion, and a' = a + b - 1, b' = b - Nc, with a, b suitable parameters of a Beta distribution β(a, b). The number of optimizations among the total N which yield the best final value f_best is defined as Nc. Values of the parameters a and b were chosen as 1 and 5 respectively, as recommended by Groenwold et al. [135].

Numerical Results

Multi-run Approach for Predetermined Number of Optimizations

For the three problems under consideration, only a limited improvement in global convergence probability is achieved by applying the traditional approaches of increasing the number of fitness evaluations or the population size (Figure 21).

Figure 21 Shekel convergence probability for an individual optimization as a function of fitness evaluations and population size

For the Shekel problem, using larger swarm sizes and/or allowing an increased number of fitness evaluations yielded higher convergence probabilities only up to a point. Similar results were obtained for the Griewank and Hartman problems. On the other hand, optimizations with a small number of particles reached moderate global convergence probabilities at significantly fewer fitness evaluations than did optimizations with large swarms. This behavior was observed for all the problems in the test set (Figure 19). To exploit this behavior we replace a single optimization with several PSO runs, each with a limited population and number of iterations.
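Equation (5.8) is garbled in the source; the sketch below implements one standard reading of the Groenwold et al. criterion, Pr = 1 - (N+a')!(2N+b')!/[(2N+a')!(N+b')!] with a' = a + b - 1 and b' = b - Nc, which is my reconstruction rather than verbatim from the dissertation. Log-gamma functions are used to avoid overflowing the factorials:

```python
import math

def confidence(N, Nc, a=1.0, b=5.0):
    """Confidence that the best value found over N runs is the global
    optimum, per one reading of Eq. (5.8); Nc runs reached the current
    best value. ln(x!) is computed as lgamma(x + 1) to avoid overflow."""
    a_t = a + b - 1.0          # a' in Eq. (5.8)
    b_t = b - Nc               # b' in Eq. (5.8)
    lf = lambda x: math.lgamma(x + 1.0)   # ln(x!)
    ratio = math.exp(lf(N + a_t) + lf(2 * N + b_t)
                     - lf(2 * N + a_t) - lf(N + b_t))
    return 1.0 - ratio

# Run until the 0.95 confidence level is reached, assuming every run so
# far found the same best value (Nc == N).
N = 1
while confidence(N, N) < 0.95:
    N += 1
print(N)
```

Under this reading, confidence grows monotonically as additional runs keep returning the same best value, so the criterion eventually halts a multi-run search without any fixed run count being chosen in advance.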
These individual optimizations utilize the same amount of resources allocated to the original single optimization (in this case, the number of fitness evaluations). To illustrate the merit of such an approach, we optimize the Hartman analytical problem with and without multiple limited optimizations. We observe that for a single optimization the probability of convergence is not significantly improved by allowing more fitness evaluations or by increasing the population size (Figure 22).

Figure 22 Theoretical cumulative convergence probability Pc as a function of the number of optimization runs with constant Pi for the Hartman problem. Multiple independent runs with 10 particles.

We also observe that an optimization with 10 particles quickly attains a probability of convergence of Pi = 0.344 after only 10,000 fitness evaluations. Using a multiple-run strategy with 10 independent optimizations of 10,000 fitness evaluations each yields the theoretical Pc values reported in Table 8 (calculated using Eq. (5.5) with Pi = 0.344 and n = 1, ..., 10). These values are indicated as circled data points at an equivalent cumulative number of fitness evaluations in Figure 22 for comparison with a single extended optimization.
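The theoretical Pc values follow directly from Eq. (5.5) with Pi = 0.344; this quick check (my own, not the dissertation's code) prints the cumulative probability for n = 1 through 10 runs:

```python
# Pc = 1 - (1 - Pi)^n for Pi = 0.344 (Eq. 5.5)
PI = 0.344
for n in range(1, 11):
    pc = 1.0 - (1.0 - PI) ** n
    print(n, round(pc, 3))
```

Ten limited runs of 10,000 evaluations each reach a cumulative probability near 0.985, far above what the single extended optimization achieves with the same total budget of 100,000 evaluations.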


PAGE 1

APPLICATIONS OF PARALLEL GLOBAL OPTIMIZATION TO MECHANICS PROBLEMS

By

JACO FRANCOIS SCHUTTE

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA 2005

PAGE 2

Copyright 2005 by JACO FRANCOIS SCHUTTE

PAGE 3

This work is dedicated to my parents and my wife Lisa.

PAGE 4

ACKNOWLEDGMENTS

First and foremost, I would like to thank Dr. Raphael T. Haftka, chairman of my advisory committee, for the opportunity he provided me to complete my doctoral studies under his exceptional guidance. Without his unending patience, constant encouragement, guidance and expertise, this work would not have been possible. Dr. Haftka's mentoring has made a lasting impression on both my academic and personal life. I would also like to thank the members of my advisory committee, Dr. Benjamin Fregly, Dr. Alan D. George, Dr. Panos M. Pardalos, and Dr. Nam Ho Kim. I am grateful for their willingness to serve on my committee, for the help they provided, for their involvement with my oral examination, and for reviewing this dissertation. Special thanks go to Dr. Benjamin Fregly, who provided a major part of the financial support for my studies. Special thanks also go to Dr. Alan George, whose parallel processing graduate course provided much of the inspiration for the research presented in this manuscript, and for reviewing some of my publications. Thanks go also to Dr. Nielen Stander, who provided me with the wonderful opportunity to do an internship at the Livermore Software Technology Corporation. My colleagues in the Structural and Multidisciplinary Optimization Research Group at the University of Florida also deserve many thanks for their support and the many fruitful discussions. Special thanks go to Tushar Goel, Erdem Acar, and also Dr.
Satchi Venkataraman and his wife Beth, who took me in on my arrival in the USA and provided me a foothold, for which I will forever be grateful.

PAGE 5

The financial support provided by AFOSR grant F49620-09-1-0070 to R.T.H. and the NASA Cooperative Agreement NCC3-994, the Institute for Future Space Transport University Research, Engineering and Technology Institute, is gratefully acknowledged. I would also like to express my deepest appreciation to my parents. Their limitless love, support and understanding are the mainstay of my achievements in life. Lastly, I would like to thank my wife, Lisa. Without her love, patience and sacrifice I would never have been able to finish this dissertation.

PAGE 6

TABLE OF CONTENTS

ACKNOWLEDGMENTS iv
LIST OF TABLES ix
LIST OF FIGURES x
ABSTRACT xiii

CHAPTER
1 INTRODUCTION 1
  Statement of Problem 1
  Purpose of Research 1
  Significance of Research 1
    Parallelism by Exploitation of Optimization Algorithm Structure 2
    Parallelism through Multiple Independent Concurrent Optimizations 3
    Parallelism through Concurrent Optimization of Decomposed Problems 3
Roadmap.......................................................................................................................4 2 BACKGROUND..........................................................................................................5 Population-based Global Optimization.........................................................................5 Parallel Processing in Optimization..............................................................................6 Decomposition in Large Scale Optimization................................................................7 Literature Review: Proble m Decomposition Strategies...............................................8 Collaborative Optimization (CO)..........................................................................8 Concurrent SubSpace Optimization (CSSO).......................................................11 Analytical Target Cascading (ATC)....................................................................15 Quasiseparable Decomposition and Optimization..............................................17 3 GLOBAL OPTIMIZATION THRO UGH THE PARTICLE SWARM ALGORITHM............................................................................................................18 Overview.....................................................................................................................18 Introduction.................................................................................................................19 Theory.........................................................................................................................21 Particle Swarm Algorithm...................................................................................21 PAGE 7 vii Analysis of Scale Sensitivity...............................................................................24 Methodology...............................................................................................................28 
    Optimization Algorithms ..... 28
    Analytical Test Problems ..... 30
    Biomechanical Test Problem ..... 32
    Results ..... 37
  Discussion ..... 41
  Conclusions ..... 46

4 PARALLELISM BY EXPLOITING POPULATION-BASED ALGORITHM STRUCTURES ..... 47

  Overview ..... 47
  Introduction ..... 48
  Serial Particle Swarm Algorithm ..... 50
  Parallel Particle Swarm Algorithm ..... 53
    Concurrent Operation and Scalability ..... 53
    Asynchronous vs. Synchronous Implementation ..... 54
    Coherence ..... 55
    Network Communication ..... 56
    Synchronization and Implementation ..... 58
  Sample Optimization Problems ..... 59
    Analytical Test Problems ..... 59
    Biomechanical System Identification Problems ..... 60
    Speedup and Parallel Efficiency ..... 63
  Numerical Results ..... 65
  Discussion ..... 67
  Conclusions ..... 73

5 IMPROVED GLOBAL CONVERGENCE USING MULTIPLE INDEPENDENT OPTIMIZATIONS ..... 74

  Overview ..... 74
  Introduction ..... 74
  Methodology ..... 77
    Analytical Test Set ..... 77
    Multiple-run Methodology ..... 78
      Exploratory run and budgeting scheme ..... 81
      Bayesian convergence probability estimation ..... 84
  Numerical Results ..... 85
    Multi-run Approach for Predetermined Number of Optimizations ..... 85
    Multi-run Efficiency ..... 87
    Bayesian Convergence Probability Estimation ..... 89
    Monte Carlo Convergence Probability Estimation ..... 92
  Conclusions ..... 92

6 PARALLELISM BY DECOMPOSITION METHODOLOGIES ..... 94

  Overview ..... 94
  Introduction ..... 94
  Quasiseparable Decomposition Theory ..... 96
  Stepped Hollow Cantilever Beam Example ..... 98
    Stepped hollow beam optimization ..... 102
    Quasiseparable Optimization Approach ..... 104
  Results ..... 106
    All-at-once Approach ..... 106
    Hybrid all-at-once Approach ..... 107
    Quasiseparable Approach ..... 108
    Approximation of Constraint Margins ..... 109
  Discussion ..... 112
  Conclusions ..... 115

7 CONCLUSIONS ..... 116

  Parallelism by Exploitation of Optimization Algorithm Structure ..... 116
  Parallelism through Multiple Independent Optimizations ..... 116
  Parallelism through Concurrent Optimization of Decomposed Problems ..... 117
  Future Directions ..... 117
  Summary ..... 118

APPENDIX

A ANALYTICAL TEST PROBLEM SET ..... 119

  Griewank ..... 119
  Hartman 6 ..... 119
  Shekel 10 ..... 120

B MONTE CARLO VERIFICATION OF GLOBAL CONVERGENCE PROBABILITY ..... 122

LIST OF REFERENCES ..... 126

BIOGRAPHICAL SKETCH ..... 138

LIST OF TABLES

Table
1  Standard PSO algorithm parameters used in the study ..... 24
2  Fraction of successful optimizer runs for the analytical test problems ..... 37
3  Final cost function values and associated marker distance and joint parameter root-mean-square (RMS) errors after 10,000 function evaluations performed by multiple unscaled and scaled PSO, GA, SQP, and BFGS runs ..... 40
4  Parallel PSO results for the biomechanical system identification problem using synthetic marker trajectories without and with numerical noise ..... 66
5  Parallel PSO results for the biomechanical system identification problem using synthetic marker trajectories without and with numerical noise ..... 67
6  Particle swarm algorithm parameters ..... 77
7  Problem convergence tolerances ..... 78
8  Theoretical convergence probability results for Hartman problem ..... 87
9  Minimum, maximum and median fitness evaluations when applying ratio of change stopping criteria on pool of 1,000 optimizations for Griewank, Hartman and Shekel problems ..... 89
10 Beam material properties and end load configuration ..... 101
11 Stepped hollow beam global optimum ..... 104
12 All-at-once approach median solution ..... 107
13 Hybrid, all-at-once median solution ..... 108
14 Quasiseparable optimization result ..... 110
15 Surrogate lower level approximation optimization results ..... 112
16 Hartman problem constants ..... 120
17 Shekel problem constants ..... 121

LIST OF FIGURES

Figure
1  Collaborative optimization flow diagram ..... 10
2  Collaborative optimization subspace constraint satisfaction procedure (taken from [6]) ..... 10
3  Concurrent subspace optimization methodology flow diagram ..... 13
4  Example hierarchical problem structure ..... 16
5  Sub-problem information flow ..... 16
6  Joint locations and orientations in the parametric ankle kinematic model ..... 33
7  Comparison of convergence history results for the analytical test problems ..... 38
8  Final cost function values for ten unscaled (dark bars) and scaled (gray bars) parallel PSO, GA, SQP, and BFGS runs for the biomechanical test problem ..... 39
9  Convergence history for unscaled (dark lines) and scaled (gray lines) parallel PSO, GA, SQP, and BFGS runs for the biomechanical test problem ..... 40
10 Sensitivity of gradient calculations to selected finite difference step size for one design variable ..... 43
11 Serial implementation of PSO algorithm ..... 54
12 Parallel implementation of the PSO algorithm ..... 57
13 Surface plots of the (a) Griewank and (b) Corana analytical test problems showing the presence of multiple local minima ..... 61
14 Average fitness convergence histories for the (a) Griewank and (b) Corana analytical test problems for swarm sizes of 16, 32, 64, and 128 particles and 10000 swarm iterations ..... 64
15 Fitness convergence and parameter error plots for the biomechanical system identification problem using synthetic data with noise ..... 68
16 (a) Speedup and (b) parallel efficiency for the analytical and biomechanical optimization problems ..... 69
17 Multiple local minima for Griewank analytical problem surface plot in two dimensions ..... 75
18 Cumulative convergence probability Pc as a function of the number of optimization runs with assumed equal Pi values ..... 81
19 Fitness history and convergence probability Pc plots for Griewank, Hartman and Shekel problems ..... 82
20 Typical Shekel fitness history plots of 20 optimizations (sampled out of 1000) ..... 83
21 Shekel convergence probability for an individual optimization as a function of fitness evaluations and population size ..... 85
22 Theoretical cumulative convergence probability Pc as a function of the number of optimization runs with constant Pi for the Hartman problem ..... 86
23 Theoretical convergence probability Pc with sets of multiple runs for the Griewank problem ..... 88
24 Theoretical convergence probability Pc using information from exploratory optimizations which are stopped using a rate of change stopping condition for the Griewank, Hartman and Shekel problems ..... 90
25 Bayesian Pc estimation comparison to using extrapolated and randomly sampled optimizations out of pool of 1000 runs for Griewank problem ..... 91
26 Bayesian Pc estimation comparison to using extrapolated and randomly sampled optimizations out of pool of 1000 runs for Hartman problem ..... 91
27 Bayesian Pc estimation comparison to using extrapolated and randomly sampled optimizations out of pool of 1000 runs for Shekel problem ..... 92
28 Stepped hollow cantilever beam ..... 99
29 Dimensional parameters of each cross section ..... 99
30 Projected displacement in direction ..... 100
31 Tip deflection contour plot as a function of beam section 5 with height h and width w, with yield stress and aspect ratio constraints indicated by dashed and dash-dotted lines respectively ..... 103
32 Quasiseparable optimization flow chart ..... 105
33 Results for 1000 all-at-once optimizations ..... 106
34 Hybrid PSO-fmincon strategy for 100 optimizations ..... 108
35 Repeated optimizations of section 1 subproblem using fmincon function ..... 110
36 Summed budget value and constraint margins for individual sections ..... 111
37 Global and local optimum in section 1 sub-optimization. Scale is 0.1:1 ..... 111
38 Decomposed cross section solution. Scale is 0.1:1 ..... 112
39 Target tip deflection value histories as a function of upper-level fitness evaluations ..... 113
40 Constraint margin value histories as a function of upper-level function evaluations ..... 114
41 Predicted and Monte Carlo sampled convergence probability Pc for 5 independent optimization runs for the Griewank problem ..... 122
42 Predicted and Monte Carlo sampled convergence probability Pc for 12 independent optimization runs for the Griewank problem ..... 123
43 Monte Carlo sampled convergence probability Pc with sets of multiple runs for the Griewank problem ..... 123
44 Monte Carlo sampled convergence probability Pc using information from exploratory optimizations stopped using a rate of change stopping condition for the Griewank, Hartman and Shekel problems ..... 124
45 Bayesian Pc comparison for Griewank, Hartman and Shekel problem ..... 125

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

APPLICATIONS OF PARALLEL GLOBAL OPTIMIZATION TO MECHANICS PROBLEMS

By

Jaco Francois Schutte

December 2005

Chair: Raphael T. Haftka
Cochair: Benjamin J. Fregly
Major Department: Mechanical and Aerospace Engineering

Global optimization of complex engineering problems, with a high number of variables and local minima, requires sophisticated algorithms with global search capabilities and high computational efficiency. With the growing availability of parallel processing, it makes sense to address these requirements by increasing the parallelism in optimization strategies. This study proposes three methods of concurrent processing. The first method entails exploiting the structure of population-based global algorithms such as the stochastic Particle Swarm Optimization (PSO) algorithm and the Genetic Algorithm (GA). As a demonstration of how such an algorithm may be adapted for concurrent processing, we modify and apply the PSO to several mechanical optimization problems on a parallel processing machine. Desirable PSO algorithm features such as insensitivity to design variable scaling and modest sensitivity to algorithm parameters are demonstrated. A second approach to parallelism and improved algorithm efficiency is to utilize multiple optimizations. With this method a budget of fitness evaluations is distributed among several independent sub-optimizations in place of a single extended optimization. Under certain conditions this strategy obtains a higher combined probability of converging to the global optimum than a single optimization which utilizes the full budget of fitness evaluations. The third and final method of parallelism addressed in this study is the use of quasiseparable decomposition, which is applied to decompose loosely coupled problems. This yields several subproblems of lesser dimensionality which may be concurrently optimized with reduced effort.

CHAPTER 1
INTRODUCTION

Statement of Problem

Modern large scale problems often require high-fidelity analyses for every fitness evaluation.
In addition, these optimization problems are of a global nature in the sense that many local minima exist. These two factors combine to form exceptionally demanding optimization problems which require many hours of computation on high-end single-processor computers. In order to solve such challenging problems efficiently, parallelism may be employed for improved optimizer throughput on computational clusters or multi-core processors.

Purpose of Research

The research presented in this manuscript is targeted at the investigation of methods of implementing parallelism in global optimization. These methods are (i) parallel processing through the optimization algorithm, (ii) multiple independent concurrent optimizations, and (iii) parallel processing by decomposition. Related methods in the literature are reported and will be compared to the approaches formulated in this study.

Significance of Research

Parallel processing is a rapidly growing resource in the engineering community. Large processor farms and Beowulf clusters are becoming increasingly common at research and commercial engineering facilities. In addition, processor manufacturers are encountering physical limitations at current clock frequencies, such as heat dissipation and constraints on processor dimensions imposed by the upper limit on signal speeds. These place an upper limit on the clock frequencies that can be attained and have forced manufacturers to look at other alternatives to improve processing capability. Both Intel and AMD are currently developing methods of putting multiple processors on a single die and will be releasing multiprocessor cores in the consumer market in the near future. This multi-core technology will enable even users of desktop computers to utilize concurrent processing, and will make it an increasingly cheap commodity in the future. At the same time, the engineering community is facing more complex and computationally demanding problems as the fidelity of simulation software improves every day. New methods that can take advantage of the increasing availability of parallel processing will give the engineer powerful tools to solve previously intractable problems.

In this manuscript the specific problem of the optimization of large-scale global engineering problems is addressed by utilizing three different avenues of parallelism. Any one of these methods, and even combinations of them, may utilize concurrent processing to its advantage.

Parallelism by Exploitation of Optimization Algorithm Structure

Population-based global optimizers such as the Particle Swarm Optimizer (PSO) or Genetic Algorithms (GAs) coordinate their search effort in the design space by evaluating a population of individuals in an iterative fashion. These iterations take the form of discrete time steps for the PSO and generations in the case of the GA. Both the PSO and the GA algorithm structures allow fitness calculations of individuals in the population to be evaluated independently and concurrently. This opens up the possibility of assigning a computational node or processor in a networked group of machines to each individual in the population, and calculating the fitness of each individual concurrently for every iteration of the optimization algorithm.

Parallelism through Multiple Independent Concurrent Optimizations

A single optimization of a large-scale problem will have a significant probability of becoming entrapped in a local minimum. This risk is alleviated by utilizing population-based algorithms such as the PSO and GAs. These global optimizers have the means of escaping from such a local minimum if enough iterations are allowed. Alternatively, a larger population may be used, allowing for higher sampling densities of the design space, which also reduces the risk of entrapment. Both these options require significant additional computational effort, with no guarantee of improvement in global convergence probability. A more effective strategy can be followed while utilizing the same amount of resources. By running several independent but limited optimizations, it will be shown that in most cases the combined probability of finding the global optimum is greatly improved. The limited optimization runs are rendered independent by applying a population-based optimizer with different sets of initial population distributions in the design space.

Parallelism through Concurrent Optimization of Decomposed Problems

Some classes of such problems may be subdivided into several more tractable subproblems by applying decomposition strategies. This process of decomposition generally involves identifying groups of variables and constraints with minimal influence on one another. The choice of which decomposition strategy to apply depends largely on the original problem structure and the interaction among variables. The objective is to find an efficient decomposition strategy to separate such large scale global optimization problems into smaller sub-problems without introducing spurious local minima, and to apply an efficient optimizer to solve the resulting sub-problems.

Roadmap

A background on the global optimization of large scale optimization problems, appropriate optimization algorithms and techniques, and parallelism will be presented in Chapter 2. Chapter 3 presents an evaluation of a global, stochastic, population-based algorithm, the Particle Swarm Optimizer, through several analytical and biomechanical system identification problems. In Chapter 4 the parallelization of this population-based algorithm is demonstrated and applied. Chapter 5 details the use of multiple independent concurrent optimizations for significant improvements in combined convergence probability.
PAGE 20 6 Parallel Processing in Optimization There are five approaches which may be utilized to decompose a single computational task into smaller problems whic h may then be solved concurrently. These are geometric decomposition, iterative decomposition, recursive decomposition, speculative decomposition and functional deco mposition, or a combination of these [1,2]. Among these, functional decomposition is most commonly applied and will also be the method of implementation presented in Chapte r 4. The steps followed in parallelizing a sequential program consist of the following from [1]. 1. Decompose the sequential program or data into smaller tasks. 2. Assign the tasks to processes. A process is an abstract entity that performs tasks. 3. Orchestrate the necessary data access, communicati on, and synchronization. 4. Map or assign the processes to computational nodes. The functional decomposition method is ba sed on the premise that applications such as an optimization algorithm may be br oken into many distinct phases, each of which interacts with some or all of the ot hers. These phases can be implemented as coroutines, each of which will execute for as long as it is able and then invoke another and remain suspended until it is again needed. Functional decomposition is the simplest way of parallelization if it can be implemented by turning its hi gh-level descrip tion into a set of cooperating processes [2]. When using this method, ba lancing the throughput of the different computational stages will be highly problematic when there are dependencies between stages, for example, when data requi res sequential processing by several stages. This limits the parallelism that may be achieved using functional decomposition. Any further parallelism must be achieved through using geometric, iterative or speculative decomposition within a functional unit [2]. 
PAGE 21 7 When decomposing a task into concurrent processes some additional communication among these routines is required for coordination and the interchange of data. Among the methods of communication fo r parallel programming th e parallel virtual machine (PVM) and the message-passing interf ace (MPI) are the most widely used. For the research undertaken in Chapter 4 a portable implementation of the MPI library [3,4] containing a set of parallel communication functions [5] is used. Decomposition in Large Scale Optimization The optimization community developed se veral formalized decomposition methods such as Collaborative Optimization (CO) [6], Concurrent SubSpace Optimization (CSSO) [7], and Quasiseparable decomposition to d eal with the challenges presented by large scale engineering problems. The problems a ddressed using these schemes include multidisciplinary optimization in aerospace design, or large scale structural and biomechanical problems. Decomposition methodologies in large scal e optimization are currently intensely studied because increasingly a dvanced higher fidelity simula tion methods result in large scale problems becoming intractable. Problem decomposition allows for: 1. Simplified decomposed subsystems. In most cases the decomposed sub-problems are of reduced dimensionality, and th erefore less demanding on optimization algorithms. An example of this is the numb er of gradient calc ulations required per optimization iteration, which in the case of gradient based algorithms, scales directly with problem dimensionality. 2. A broader work front to be attacked si multaneously, which results in a problem being solved in less time if the proces sing resources are available. Usually computational throughput is limited in a sequential fashion, i.e., the FLOPS limit for computers. However, if multiple processing units are available this limit can be circumvented by using an array of networ ked computers, for example, a Beowulf cluster. 
Several such decomposition strategies have been proposed (see the next section for a short review), all differing in the manner in which they address some or all of the following:
1. Decomposition boundaries, which may be disciplinary, or component interfaces in a large structure.
2. Constraint handling.
3. Coordination among decomposed sub-problems.

Literature Review: Problem Decomposition Strategies

Here follows a summary of methodologies used for the decomposition and optimization of large scale global problems. This review forms the background for the study proposed in Chapter 6 of this manuscript.

Collaborative Optimization (CO)

Overview. The Collaborative Optimization (CO) strategy was first introduced by Kroo et al. [8]. Shortly after its introduction, this bi-level optimization scheme was extended by Tappeta and Renaud [9,10] to three distinct formulations to address multiobjective optimization of large-scale systems. The CO paradigm is based on the concept that, in an MDO problem, the interaction among the several disciplinary experts optimizing a design is minimal for local changes in each discipline. This allows a large scale system to be decomposed into subsystems along domain specific boundaries. These subsystems are optimized through local design variables specific to each subsystem, subject to the domain specific constraints. The objective of each subsystem optimization is to maintain agreement on the interdisciplinary design variables. A system level optimizer enforces this interdisciplinary compatibility while minimizing the overall objective function. This is achieved by combining the system level fitness with the cumulative sum of all discrepancies between interdisciplinary design variables. This strategy is extremely well suited for parallel computation because of the minimal interaction between the different design disciplines, which results in reduced communication overhead during the course of the optimization.

Methodology.
This decomposition strategy is described with the flow diagram in Figure 1. As mentioned previously, the CO strategy is a bi-level method, in which the system level optimizer sets and adjusts the interdisciplinary design variables during the optimization. The subspace optimizer attempts both to satisfy local constraints by adjusting local parameters, and to meet the interdisciplinary design variable targets set by the system level optimizer. Departures from the target interdisciplinary design parameters are allowed, and may occur because of insufficient local degrees of freedom, but are to be minimized. The system level optimizer attempts to adjust the interdisciplinary parameters such that the objective function is minimized, while maximizing the agreement between subsystems. This process of adjusting the system level target design, and the subsystems attempting to match it whilst satisfying local constraints, is repeated until convergence.

This procedure is graphically illustrated in Figure 2 (taken from Kroo and Manning [6]). The system level optimizer sets a design target P, and each subspace optimization attempts to satisfy local constraints while matching the target design P as closely as possible by moving in directions 1 and 2. During the next system level optimization cycle, the target design P will be moved in direction 3 in order to maximize the agreement between the target design and the subspace designs that satisfy local constraints.

Figure 1 Collaborative optimization flow diagram

Figure 2 Collaborative optimization subspace constraint satisfaction procedure (taken from [6])

Refinements on method.
Several enhancements to this method have been proposed, among which are the integration of this architecture into a decision based design framework as proposed by Hazelrigg [11,12], the use of response surfaces [13] to model disciplinary analyses, and genetic algorithms with scheduling for increased efficiency [14].

Solution quality and computational efficiency. Sobieski and Kroo [15] report very robust performance of their CO scheme, with identical solutions being found in both collaborative and single level optimizations of problems with 45 design variables and 26 constraints. Braun and Kroo [16] showed CO to be unsuited for small problems with strong coupling, but for large scale problems with weak coupling the CO methodology becomes more computationally efficient. They also found that the number of system level iterations depends on the level of coupling of the sub-systems, and that the required number of sub-optimizations scales in proportion to the overall problem size. Similar findings are reported by Alexandrov [17]. Braun et al. [18] evaluated the performance of CO on a set of quadratic problems presented by Shankar et al. [19] to evaluate the CSSO method. Unlike CSSO, the CO method did not require an increased number of iterations for QP problems with strong coupling, and converged successfully in all cases.

Applications.
This decomposition architecture has been extensively demonstrated using analytical test problems [18,20] and aerospace optimization problems such as trajectory optimization [16,18], vehicle design [13,20-22], and satellite constellation configurations [23].

Concurrent SubSpace Optimization (CSSO)

Overview. This method was proposed by Sobieszczanski-Sobieski [24] and, like CO, divides the MDO problem along disciplinary boundaries. The main difference, however, is the manner in which the CSSO framework coordinates the subsystem optimizations. A bi-level optimization scheme is used in which the upper optimization problem consists of a linear [7] or second order [25] system approximation created with the use of Global Sensitivity Equations (GSEs). This system approximation reflects changes in the constraints and the objective function as a function of the design variables. Because of the nonlinearities, this approximation is only accurate in the immediate neighborhood of the current design state, and needs to be updated after every upper-level optimization iteration. After establishing a system level approximation, the subsystems are independently optimized using only design variables local to the subspace. The system level approximation is then updated by a sensitivity analysis to reflect changes in the subspace designs. The last two steps of subspace optimization and system approximation are repeated through the upper level optimizer until convergence is achieved.

Methodology. The basic steps taken to optimize an MDO problem with the CSSO framework are as follows:
1. Select an initial set of designs for each subsystem.
2. Construct a system approximation using GSEs.
3. a) Subspace optimization through local variables and objective.
   b) Update the system approximation by performing a sensitivity analysis.
4. Optimize design variables according to the system approximation.
5.
Stop if converged; otherwise go to 3).
Here step 3 contains the lower level optimization in this bi-level framework. This process is illustrated in Figure 3.

Refinements on method. As previously mentioned, early CSSO strategies used linear system level approximations obtained with the Global Sensitivity Equations (GSEs). Coordination of the subspace or disciplinary optimizations is achieved through system level sensitivity information.

Figure 3 Concurrent subspace optimization methodology flow diagram

This imposes limits on the allowable deviation from the current design, and requires a new approximation to be constructed at every system iteration to maintain reasonable accuracy. Recent research has focused on alternate methods for acquiring system level approximations for the coordination effort. Several authors [25-28] modified the system approximation to utilize a second order response surface approximation. This is combined with a database of previous fitness evaluation points, which can be used to create and update the response surface. This response surface then serves to couple the subsystem optimizations and coordinate the system level design.

Solution quality and computational efficiency. Shankar et al. [19] investigated the robustness of CSSO on a set of analytical problems. Several quadratic programming problems with weak and strong coupling between subsystems were evaluated with a modification of Sobieszczanski-Sobieski's nonhierarchical subspace optimization scheme [7]. Results indicated reasonable performance for problems with weak coupling between subsystems.
For large problems with strong interactions between subsystems, this decomposition scheme proved unreliable in terms of finding global sensitivities, leading to poor solutions.

Tappeta et al., using the iSIGHT software [29-31], analyzed two analytical and two structural problems, a welding design and a stepped beam weight minimization. In this work it is reported that the Karush-Kuhn-Tucker conditions were met in some of the cases, and that most problems converged closely to the original problem solution.

Lin and Renaud compared the commercial software package LANCELOT [32], based on the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method, to the CSSO strategy, the latter incorporating response surfaces. In this study the authors show similar computational efficiencies for small uncoupled analytical problems. For large scale MDO problems, however, the CSSO method consistently outperformed the LANCELOT optimizer in this area.

Sellar et al. [26] compared CSSO with neural network based response surface enhancements against a full (all-at-once) system optimization. The CSSO-NN algorithm showed a distinct advantage in computational efficiency over the all-at-once approach, while maintaining a high level of robustness.

Applications. This decomposition methodology has been applied to large scale aerospace problems such as high temperature and pressure aircraft engine components [29,33], aircraft brake component optimization [34], and aerospace vehicle design [26,35].

Analytical Target Cascading (ATC)

Overview. Analytical Target Cascading was introduced by Michelena et al. [36] in 1999, and developed further by Kim [37] as a product development tool. This method is typically used to solve object based decomposed system optimization problems. Tosserams et al. [38] introduced a Lagrangian relaxation strategy which in some cases improves the computational efficiency of this method by several orders of magnitude.

Methodology.
ATC is a strategy which coordinates hierarchically decomposed systems or elements of a problem (see Figure 4) by the introduction of target and response coupling variables. Targets are set by parent elements and are met by responses from the child elements in the hierarchy (see Figure 5, obtained from [38]). At each element, an optimization problem is formulated to find local variables, parent responses, and child targets which minimize a penalized discrepancy function, while meeting local constraints. The responses are rebalanced up to higher levels by iteratively changing targets in a nested loop in order to obtain consistency. Several coordination strategies are available to determine the sequence of solving the sub-problems and the order of exchange of targets and responses [39]. Proof of convergence is also presented for some of these classes of approaches in [39].

Refinements on method. Tosserams et al. [38] introduced the use of augmented Lagrangian relaxation in order to reduce the computational cost associated with obtaining very accurate agreement between sub-problems, and the coordination effort in the inner loop of the method. Allison et al. exploited the complementary nature of ATC and CO to obtain an optimization formulation called nested ATC-MDO [40]. Kokkolaras et al. extended the formulation of ATC to include the design of product families [41].

Figure 4 Example hierarchical problem structure

Figure 5 Sub-problem information flow

Solution quality and computational efficiency. Michelena et al. [39] prove, under certain convexity assumptions, that the ATC process will yield the optimal solution of the original design target problem. The original ATC formulation had the twofold problem of requiring large penalty weights to obtain accurate solutions, and of excessive repetition in the inner loop, which must solve the sub-problems before the outer loop can proceed.
Both of these problems are addressed by the augmented Lagrangian relaxation of Tosserams et al. [38], who reported a decrease in computational effort by factors ranging between 10 and 10,000.

Applications. The ATC strategy is applied to the design of structural members and an electric water pump in [40], and to automotive design [42,43]. The performance of the augmented Lagrangian relaxation ATC enhancement was tested using several geometric programming problems [38].

Quasiseparable Decomposition and Optimization

The quasiseparable decomposition and optimization strategy will be the focus of the research proposed in Chapter 6. This methodology addresses a class of problems common in the field of engineering and can be applied to a wide class of structural, biomechanical, and other disciplinary problems. The strategy, which will be explained in detail in Chapter 6, is based on a two-level optimization approach which allows the global search effort to be concentrated on the lower level sub-problem optimizations. The system to be optimized is decomposed into several lower level subsystem optimizations which are coordinated by an upper level optimization. A Sequential Quadratic Programming (SQP) based optimizer is applied in this optimization infrastructure to solve an example structural sizing problem. This example problem entails the maximization of the tip displacement of a hollow stepped cantilever beam with 5 sections. The quasiseparable decomposition methodology is applied to decompose the structure into several sub-problems of reduced dimensionality.
The parallelism in this strategy is achieved by optimizing the independent sub-problems (sections of the stepped beam) concurrently, allowing for the utilization of parallel processing resources.

CHAPTER 3
GLOBAL OPTIMIZATION THROUGH THE PARTICLE SWARM ALGORITHM

Overview

This chapter introduces the population based algorithm which will be the target for investigating parallelism throughout the manuscript. This stochastic algorithm mimics the swarming or flocking behavior found in animal groups such as bees, fish, and birds. The swarm of particles essentially performs several parallel individual searches that are influenced by individual and swarm memory of regions in the design space with high fitness. The positions of these regions are constantly updated and reported to the individuals in the swarm through a simple communication model. This model allows the algorithm to be easily decomposed into concurrent processes, each representing an individual particle in the swarm. This approach to parallelism will be detailed in Chapter 4.

For the purpose of illustrating the performance of the PSO, it is compared to several other algorithms commonly used when solving problems in biomechanics. This comparison is made through the optimization of several analytical problems and a biomechanical system identification problem. The focus of the research presented in this chapter is to demonstrate that the PSO has good properties, such as insensitivity to the scaling of design variables and very few algorithm parameters to fine tune. These make the PSO a valuable addition to the arsenal of optimization methods in biomechanical optimization, in which it has never been applied before.

The work presented in this chapter was done in collaboration with Jeff Reinbold, who supplied the biomechanical test problem [58], and Byung Il Koh, who developed the parallel SQP and BFGS algorithms [77] used to establish a computational efficiency comparison.
The research in this chapter was also published in [58,76,129]. Thanks go to Soest and Casius for their willingness to share the numerical results published in [48].

Introduction

Optimization methods are used extensively in biomechanics research to predict movement-related quantities that cannot be measured experimentally. Forward dynamic, inverse dynamic, and inverse static optimizations have been used to predict muscle, ligament, and joint contact forces during experimental or predicted movements (e.g., see references [44-55]). System identification optimizations have been employed to tune a variety of musculoskeletal model parameters to experimental movement data (e.g., see references [56-60]). Image matching optimizations have been performed to align implant and bone models to in vivo fluoroscopic images collected during loaded functional activities (e.g., see references [61-63]).

Since biomechanical optimization problems are typically nonlinear in the design variables, gradient-based nonlinear programming has been the most widely used optimization method. The increasing size and complexity of biomechanical models has also led to parallelization of gradient-based algorithms, since gradient calculations can be easily distributed to multiple processors [44-46]. However, gradient-based optimizers can suffer from several important limitations. They are local rather than global by nature and so can be sensitive to the initial guess. Experimental or numerical noise can exacerbate this problem by introducing multiple local minima into the problem. For some problems, multiple local minima may exist due to the nature of the problem itself. In most situations, the necessary gradient values cannot be obtained analytically, and finite difference gradient calculations can be sensitive to the selected finite difference step size.
Furthermore, the use of design variables with different length scales or units can produce poorly scaled problems that converge slowly or not at all [64,65], necessitating design variable scaling to improve performance.

Motivated by these limitations and by improvements in computer speed, recent studies have begun investigating the use of non-gradient global optimizers for biomechanical applications. Neptune [47] compared the performance of a simulated annealing (SA) algorithm with that of downhill simplex (DS) and sequential quadratic programming (SQP) algorithms on a forward dynamic optimization of bicycle pedaling utilizing 27 design variables. Simulated annealing found a better optimum than the other two methods, and in a reasonable amount of CPU time. More recently, Soest and Casius [48] evaluated a parallel implementation of a genetic algorithm (GA) using a suite of analytical test problems with up to 32 design variables and forward dynamic optimizations of jumping and isokinetic cycling with up to 34 design variables. The genetic algorithm generally outperformed all other algorithms tested, including SA, on both the analytical test suite and the movement optimizations.

This study evaluates a recent addition to the arsenal of global optimization methods, particle swarm optimization (PSO), for use on biomechanical problems. A recently-developed variant of the PSO algorithm is used for the investigation. The algorithm's global search capabilities are evaluated using a previously published suite of difficult analytical test problems with multiple local minima [48], while its insensitivity to design variable scaling is proven mathematically and verified using a biomechanical test problem.
For both categories of problems, PSO robustness, performance, and scale-independence are compared to those of three off-the-shelf optimization algorithms: a genetic algorithm (GA), a sequential quadratic programming algorithm (SQP), and the BFGS quasi-Newton algorithm. In addition, previously published results [48] for the analytical test problems permit comparison with a more complex GA algorithm (GA*), a simulated annealing algorithm (SA), a different SQP algorithm (SQP*), and a downhill simplex (DS) algorithm.

Theory

Particle Swarm Algorithm. Particle swarm optimization is a stochastic global optimization approach introduced by Kennedy and Eberhart [66]. The method's strength lies in its simplicity, being easy to code and requiring few algorithm parameters to define convergence behavior. The following is a brief introduction to the operation of the particle swarm algorithm based on a recent implementation by Groenwold and Fourie [67] incorporating dynamic inertia and velocity reduction.

Consider a swarm of p particles, where each particle's position x_k^i represents a possible solution point in the problem design space D. For each particle i, Kennedy and Eberhart [66] proposed that the position x_{k+1}^i be updated in the following manner:

    x_{k+1}^i = x_k^i + v_{k+1}^i    (3.1)

with a pseudo-velocity v_{k+1}^i calculated as follows:

    v_{k+1}^i = w_k v_k^i + c_1 r_1 (p_k^i - x_k^i) + c_2 r_2 (g_k - x_k^i)    (3.2)

Here, subscript k indicates a (unit) pseudo-time increment. The point p_k^i is the best-found cost location by particle i up to time step k, which represents the cognitive contribution to the search vector v_{k+1}^i. Each component of v_{k+1}^i is constrained to be less than or equal to a maximum value defined by v_{k+1}^{max}. The point g_k is the global best-found position among all particles in the swarm up to time k and forms the social contribution to the velocity vector. Cost function values associated with p_k^i and g_k are denoted by f_{best}^i and f_{best}^g, respectively.
Random numbers r_1 and r_2 are uniformly distributed in the interval [0,1]. Shi and Eberhart [68] proposed that the cognitive and social scaling parameters c_1 and c_2 be selected such that c_1 = c_2 = 2, to allow the products c_1 r_1 and c_2 r_2 to have a mean of 1. The result of using these proposed values is that the particles overshoot the attraction points p_k^i and g_k half the time, thereby maintaining separation in the group and allowing a greater area to be searched than if the particles did not overshoot. The variable w_k, set to 1 at initialization, is a modification to the original PSO algorithm [66]. By reducing its value dynamically based on the cost function improvement rate, the search area is gradually reduced [69]. This dynamic reduction behavior is defined by w_d, the amount by which the inertia w_k is reduced; v_d, the amount by which the maximum velocity v_{k+1}^{max} is reduced; and d, the number of iterations with no improvement in g_k before these reductions take place [67] (see the algorithm flow description below).

Initialization of the algorithm involves several important steps. Particles are randomly distributed throughout the design space, and particle velocities v_0^i are initialized to random values within the limits 0 <= v_0^i <= v_0^{max}. The particle velocity upper limit v_0^{max} is calculated as a fraction gamma of the distance between the upper and lower bounds on the variables in the design space, v_0^{max} = gamma (x_{UB} - x_{LB}), with gamma = 0.5 as suggested in [69]. Iteration counters k and t are set to 0. Iteration counter k is used to monitor the total number of swarm iterations, while iteration counter t is used to monitor the number of swarm iterations since the last improvement in g_k. Thus, t is periodically reset to zero during the optimization while k is not. The algorithm flow can be represented as follows:

1. Initialize
   a. Set constants c_1, c_2, k_{max}, v_0^{max}, w_0, v_d, w_d, and d.
   b. Set counters k = 0, t = 0. Set random number seed.
   c.
Randomly initialize particle positions x_0^i in D for i = 1, ..., p.
   d. Randomly initialize particle velocities 0 <= v_0^i <= v_0^{max} for i = 1, ..., p.
   e. Evaluate cost function values f_0^i using design space coordinates x_0^i for i = 1, ..., p.
   f. Set f_{best}^i = f_0^i and p_0^i = x_0^i for i = 1, ..., p.
   g. Set f_{best}^g to the best f_{best}^i and g_0 to the corresponding x_0^i.
2. Optimize
   h. Update particle velocity vectors v_{k+1}^i using Eq. (3.2).
   i. If v_{k+1}^i exceeds v_{k+1}^{max} in any component, set that component to its maximum allowable value.
   j. Update particle position vectors x_{k+1}^i using Eq. (3.1).
   k. Evaluate cost function values f_{k+1}^i using design space coordinates x_{k+1}^i for i = 1, ..., p.
   l. If f_{k+1}^i <= f_{best}^i, then set f_{best}^i = f_{k+1}^i and p_{k+1}^i = x_{k+1}^i for i = 1, ..., p.
   m. If f_{k+1}^i <= f_{best}^g, then set f_{best}^g = f_{k+1}^i and g_{k+1} = x_{k+1}^i for i = 1, ..., p.
   n. If f_{best}^g was improved in (m), then reset t = 0. Else increment t.
   o. If the maximum number of function evaluations is exceeded, then go to 3.
   p. If t = d, then multiply w_{k+1} by (1 - w_d) and v_{k+1}^{max} by (1 - v_d).
   q. Increment k.
   r. Go to 2(h).
3. Report results
4. Terminate

This algorithm was coded in the C programming language by the author [70] and used for all PSO analyses performed in the study. A standard population size of 20 particles was used for all runs, and other algorithm parameters were also selected based on standard recommendations (Table 1) [70-72]. The C source code for our PSO algorithm is freely available at http://www.mae.ufl.edu/~fregly/downloads/pso.zip (last accessed 12/2005).

Table 1 Standard PSO algorithm parameters used in the study

Analysis of Scale Sensitivity. One of the benefits of the PSO algorithm is its insensitivity to design variable scaling.
To prove this characteristic, we use a proof by induction to show that all particles follow an identical path through the design space regardless of how the design variables are scaled. In actual PSO runs intended to investigate this property, use of the same random seed in the scaled and unscaled cases ensures that an identical sequence of random r_1 and r_2 values is produced by the computer throughout the course of the optimization.

    Parameter   Description                                                       Value
    p           Population size (number of particles)                             20
    c_1         Cognitive trust parameter                                         2.0
    c_2         Social trust parameter                                            2.0
    w_0         Initial inertia                                                   1
    w_d         Inertia reduction parameter                                       0.01
    gamma       Bound on velocity fraction                                        0.5
    v_d         Velocity reduction parameter                                      0.01
    d           Dynamic inertia/velocity reduction delay (function evaluations)   200

Consider an optimization problem with n design variables. An n-dimensional constant scaling vector lambda can be used to scale any or all dimensions of the problem design space:

    \lambda = [\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_n]^T    (3.3)

We wish to show that for any time step k >= 0,

    \tilde{v}_k = \lambda \circ v_k    (3.4)

    \tilde{x}_k = \lambda \circ x_k    (3.5)

where x_k and v_k (dropping superscript i) are the unscaled position and velocity, respectively, of an individual particle, \tilde{x}_k and \tilde{v}_k are the corresponding scaled versions, and \circ denotes the component-by-component product.

First, we must show that our proposition is true for the base case, which involves initialization (k = 0) and the first time step (k = 1). Applying the scaling vector lambda to an individual particle position x_0 during initialization produces a scaled particle position \tilde{x}_0:

    \tilde{x}_0 = \lambda \circ x_0    (3.6)

where the right hand side is a component-by-component product of the vectors lambda and x_0. This implies that

    \tilde{p}_0 = \lambda \circ p_0, \quad \tilde{g}_0 = \lambda \circ g_0    (3.7)

In the unscaled case, the pseudo-velocity is calculated as

    v_0 = \gamma (x_{UB} - x_{LB})    (3.8)

In the scaled case, this becomes

    \tilde{v}_0 = \gamma (\tilde{x}_{UB} - \tilde{x}_{LB}) = \gamma (\lambda \circ x_{UB} - \lambda \circ x_{LB}) = \lambda \circ v_0    (3.9)

From Eqs.
(3.1) and (3.2) and these initial conditions, the particle pseudo-velocity and position for the first time step can be written as

    v_1 = w_0 v_0 + c_1 r_1 (p_0 - x_0) + c_2 r_2 (g_0 - x_0)    (3.10)

    x_1 = x_0 + v_1    (3.11)

in the unscaled case, and

    \tilde{v}_1 = w_0 \tilde{v}_0 + c_1 r_1 (\tilde{p}_0 - \tilde{x}_0) + c_2 r_2 (\tilde{g}_0 - \tilde{x}_0)
                = \lambda \circ [w_0 v_0 + c_1 r_1 (p_0 - x_0) + c_2 r_2 (g_0 - x_0)]
                = \lambda \circ v_1    (3.12)

    \tilde{x}_1 = \tilde{x}_0 + \tilde{v}_1 = \lambda \circ x_0 + \lambda \circ v_1 = \lambda \circ (x_0 + v_1) = \lambda \circ x_1    (3.13)

in the scaled case. Thus, our proposition is true for the base case.

Next, we must show that our proposition is true for the inductive step. If we assume our proposition holds for any time step k = j, we must prove that it also holds for time step k = j + 1. We begin by replacing subscript k with subscript j in Eqs. (3.4) and (3.5). If we then replace subscript 0 with subscript j and subscript 1 with subscript j + 1 in Eqs. (3.12) and (3.13), we arrive at Eqs. (3.4) and (3.5) with subscript k replaced by subscript j + 1. Thus, our proposition is true for any time step j + 1.

Consequently, since the base case is true and the inductive step is true, Eqs. (3.4) and (3.5) are true for all k >= 0. From Eqs. (3.4) and (3.5), we can conclude that any linear scaling of the design variables (or a subset thereof) will have no effect on the final or any intermediate result of the optimization, since all velocities and positions are scaled accordingly. This fact leads to identical step intervals being taken in the design space for scaled and unscaled versions of the same problem, assuming infinite precision in all calculations.

In contrast, gradient-based optimization methods are often sensitive to design variable scaling due to algorithmic issues and numerical approximations. First derivative methods are sensitive because of algorithmic issues, as illustrated by a simple example. Consider the following minimization problem with two design variables (x, y), where the cost function is

    f(x, y) = 100x^2 + y^2    (3.14)

with initial guess (1, 10).
A scaled version of the same problem can be created by letting x = \tilde{x}/10, y = \tilde{y}, so that the cost function becomes

    f(\tilde{x}, \tilde{y}) = \tilde{x}^2 + \tilde{y}^2    (3.15)

with initial guess (10, 10). Taking first derivatives of each cost function with respect to the corresponding design variables and evaluating at the initial guesses, the search direction for the unscaled problem is along a line rotated 5.7 deg from the positive x axis, and for the scaled problem along a line rotated 45 deg. To reach the optimum in a single step, the unscaled problem requires a search direction rotated 84.3 deg and the scaled problem 45 deg. Thus, the scaled problem can theoretically reach the optimum in a single step while the unscaled problem cannot, due to the effect of scaling on the calculated search direction.

Second derivative methods are sensitive to design variable scaling because of numerical issues related to approximation of the Hessian (second derivative) matrix. According to Gill et al. [64], Newton methods utilizing an exact Hessian matrix will be insensitive to design variable scaling as long as the Hessian matrix remains positive definite. However, in practice, exact Hessian calculations are almost never available, necessitating numerical approximations via finite differencing. Errors in these approximations result in different search directions for scaled versus unscaled versions of the same problem. Even a small amount of design variable scaling can significantly affect the Hessian matrix, so that design variable changes of similar magnitude will not produce comparable magnitude cost function changes [64]. Common gradient-based algorithms that employ an approximate Hessian include Newton and quasi-Newton nonlinear programming methods such as BFGS, SQP methods, and nonlinear least-squares methods such as Levenberg-Marquardt [64]. A detailed discussion of the influence of design variable scaling on optimization algorithm performance can be found in Gill et al. [64].
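The induction above can also be checked numerically. The sketch below implements only Eqs. (3.1)-(3.2) with the Table 1 constants (our Python toy, not the author's C implementation; the dynamic inertia and velocity reduction of steps n-p is omitted for brevity), runs the same random seed on a quadratic problem and a scaled copy, and confirms that every visited point in the scaled run is the component-wise scaled image of its unscaled counterpart. Power-of-two scale factors are used so that the comparison holds exactly even in floating point:

```python
import random

def pso_trace(f, lb, ub, iters=25, particles=5, seed=7):
    # Bare-bones PSO following Eqs. (3.1)-(3.2) and steps h-m above;
    # returns every particle position visited so paths can be compared.
    rng = random.Random(seed)
    n = len(lb)
    gamma, w, c1, c2 = 0.5, 1.0, 2.0, 2.0            # Table 1 values
    vmax = [gamma * (ub[d] - lb[d]) for d in range(n)]
    x = [[lb[d] + (ub[d] - lb[d]) * rng.random() for d in range(n)]
         for _ in range(particles)]
    v = [[vmax[d] * rng.random() for d in range(n)] for _ in range(particles)]
    p = [xi[:] for xi in x]                           # personal bests
    pf = [f(xi) for xi in x]
    g = min(range(particles), key=lambda i: pf[i])
    gx, gf = x[g][:], pf[g]                           # global best
    trace = []
    for _ in range(iters):
        for i in range(particles):
            r1, r2 = rng.random(), rng.random()
            for d in range(n):
                v[i][d] = (w * v[i][d] + c1 * r1 * (p[i][d] - x[i][d])
                           + c2 * r2 * (gx[d] - x[i][d]))       # Eq. (3.2)
                v[i][d] = max(-vmax[d], min(vmax[d], v[i][d]))  # step i.
                x[i][d] += v[i][d]                              # Eq. (3.1)
            fi = f(x[i])
            if fi < pf[i]:
                pf[i], p[i] = fi, x[i][:]
                if fi < gf:
                    gf, gx = fi, x[i][:]
            trace.append(x[i][:])
    return trace

# Quadratic test problem and a copy scaled by lam (powers of two keep
# the floating-point comparison exact).
lam = [8.0, 2.0]
f = lambda z: 100.0 * z[0] ** 2 + z[1] ** 2
f_scaled = lambda z: f([z[d] / lam[d] for d in range(2)])
lb, ub = [-10.0, -10.0], [10.0, 10.0]
t1 = pso_trace(f, lb, ub)
t2 = pso_trace(f_scaled, [lam[d] * lb[d] for d in range(2)],
               [lam[d] * ub[d] for d in range(2)])
identical = all(t2[k][d] == lam[d] * t1[k][d]
                for k in range(len(t1)) for d in range(2))
```

With the same seed, both runs draw the same r_1 and r_2 sequence, so `identical` comes out true: the swarm traverses the scaled design space along the lambda-scaled image of the unscaled path, exactly as Eqs. (3.4)-(3.5) assert.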
Methodology

Optimization Algorithms

In addition to our PSO algorithm, three off-the-shelf optimization algorithms were applied to all test problems (analytical and biomechanical; see below) for comparison purposes. One was a global GA algorithm developed by Deb [73-75]. This basic GA implementation utilizes one mutation operator and one crossover operator along with real encoding to handle continuous variables. The other two algorithms were commercial implementations of gradient-based SQP and BFGS algorithms (VisualDOC, Vanderplaats R & D, Colorado Springs, CO).

All four algorithms (PSO, GA, SQP, and BFGS) were parallelized to accommodate the computational demands of the biomechanical test problem. For the PSO algorithm, parallelization was performed by distributing individual particle function evaluations to different processors, as detailed by the author in [76]. For the GA algorithm, individual chromosome function evaluations were parallelized as described in [48]. Finally, for the SQP and BFGS algorithms, finite difference gradient calculations were performed on different processors, as outlined by Koh et al. in [77]. A master-slave paradigm using the Message Passing Interface (MPI) [3,4] was employed for all parallel implementations. Parallel optimizations for the biomechanical test problem were run on a cluster of Linux-based PCs in the University of Florida High-performance Computing and Simulation Research Laboratory (1.33 GHz Athlon CPUs with 256 MB memory on a 100 Mbps switched Fast Ethernet network).

While the PSO algorithm used standard algorithm parameters for all optimization runs, minor algorithm tuning was performed on the GA, SQP, and BFGS algorithms for the biomechanical test problem. The goal was to give these algorithms the best possible chance for success against the PSO algorithm. For the GA algorithm, preliminary optimizations were performed using population sizes ranging from 40 to 100.
It was found that for the specified maximum number of function evaluations, a population size of 60 produced the best results. Consequently, this population size was used for all subsequent optimization runs (analytical and biomechanical). For the SQP and BFGS algorithms, automatic tuning of the finite difference step size (FDSS) was performed separately for each design variable. At the start of each gradient-based run, forward and central difference gradients were calculated for each design variable, beginning with a relative FDSS of $10^{-1}$. The step size was then incrementally decreased by factors of ten until the absolute difference between forward and central gradient results reached a minimum. This approach was taken because the amount of noise in the biomechanical test problem prevented a single stable gradient value from being calculated over a wide range of FDSS values (see Discussion). The forward difference step size automatically selected for each design variable was used for the remainder of the run.

Analytical Test Problems

The global search capabilities of our PSO implementation were evaluated using a suite of difficult analytical test problems previously published by Soest and Casius [48]. In that study, each problem in the suite was evaluated using four different optimizers: SA, GA*, SQP*, and DS, where a star indicates a different version of an algorithm used in our study. One thousand optimization runs were performed with each optimizer, starting from random initial guesses and using standard optimization algorithm parameters. Each run was terminated based on a pre-defined number of function evaluations for the particular problem being solved. We followed an identical procedure with our four algorithms to permit comparison between our results and those published by Soest and Casius in [48].
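The automatic FDSS tuning described above can be sketched as follows. The quadratic test function, the use of an absolute rather than relative step, and the number of reduction levels are illustrative assumptions; the selection rule (start large, shrink by factors of ten, keep the step minimizing the forward/central discrepancy) follows the text:

```python
def tune_fdss(f, x, i, start=1e-1, levels=6):
    """Pick a forward-difference step for design variable i by minimizing the
    discrepancy between forward and central difference gradient estimates,
    shrinking the step by factors of ten from the starting value."""
    best_h, best_err = None, float("inf")
    h = start
    for _ in range(levels):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        forward = (f(xp) - f(x)) / h
        central = (f(xp) - f(xm)) / (2.0 * h)
        err = abs(forward - central)
        if err < best_err:
            best_h, best_err = h, err
        h /= 10.0   # decrease the step size by a factor of ten
    return best_h

# Illustrative smooth cost; on a noise-free function smaller steps keep winning,
# whereas a noisy cost (as in the biomechanical problem) produces a "sweet spot".
cost = lambda x: x[0] ** 2 + 3.0 * x[1] ** 2
step = tune_fdss(cost, [1.0, 2.0], 0)
print(step)
```

On a noisy function, the forward/central discrepancy stops shrinking once the step is small enough for noise to dominate, and the tuner returns the step just before that point.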
Since two of the algorithms used in our study (GA and SQP) were of the same general category as algorithms used by Soest and Casius in [48] (GA* and SQP*), comparisons could be made between different implementations of the same general algorithm. Failed PSO and GA runs were allowed to use up the full number of function evaluations, whereas failed SQP and BFGS runs were re-started from new random initial guesses until the full number of function evaluations was completed. Only 100 rather than 1000 runs were performed with the SQP and BFGS algorithms due to a database size problem in the VisualDOC software.

A detailed description of the six analytical test problems can be found in Soest and Casius [48]. Since the design variables for each problem possessed the same absolute upper and lower bound and appeared in the cost function in a similar form, design variable scaling was not an issue in these problems. The six analytical test problems are described briefly below.

H1: This simple 2-dimensional function [48] has several local maxima and a global maximum of 2 at the coordinates (8.6998, 6.7665):

$H_1(x_1,x_2) = \dfrac{\sin^2\!\left(x_1 - \frac{x_2}{8}\right) + \sin^2\!\left(x_2 + \frac{x_1}{8}\right)}{d + 1}$   (3.16)

$x_1, x_2 \in [-100, 100]$

where $d = \sqrt{(x_1 - 8.6998)^2 + (x_2 - 6.7665)^2}$. Ten thousand function evaluations were used for this problem.

H2: This inverted version of the F6 function used by Schaffer et al. [78] has 2 dimensions with several local maxima around the global maximum of 1.0 at (0,0):

$H_2(x_1,x_2) = 0.5 - \dfrac{\sin^2\!\sqrt{x_1^2 + x_2^2} - 0.5}{\left(1 + 0.001\left(x_1^2 + x_2^2\right)\right)^2}$   (3.17)

$x_1, x_2 \in [-100, 100]$

This problem was solved using 20,000 function evaluations per optimization run.

H3: This test function from Corana et al. [79] was used with dimensionality n = 4, 8, 16, and 32. The function contains a large number of local minima (on the order of $10^{4n}$) with a global minimum of 0 at $|x_i| < 0.05$.
$H_3(x_1,\ldots,x_n) = \sum_{i=1}^{n} \begin{cases} c\, d_i \left(z_i - t\,\operatorname{sgn}(z_i)\right)^2 & \text{if } |x_i - z_i| < t \\ d_i\, x_i^2 & \text{otherwise} \end{cases}$   (3.18)

$x_i \in [-1000, 1000]$

where $z_i = \operatorname{sgn}(x_i)\, s \left\lfloor \left|\frac{x_i}{s}\right| + 0.49999 \right\rfloor$, $c = 0.15$, $s = 0.2$, $t = 0.05$, and

$d_i = \begin{cases} 1 & i = 1, 5, 9, \ldots \\ 1000 & i = 2, 6, 10, \ldots \\ 10 & i = 3, 7, 11, \ldots \\ 100 & i = 4, 8, 12, \ldots \end{cases}$

The use of the floor function in Eq. (3.18) makes the search space for this problem the most discrete of all problems tested. The number of function evaluations used for this problem was 50,000 (n = 4), 100,000 (n = 8), 200,000 (n = 16), and 400,000 (n = 32).

For all of the analytical test problems, an algorithm was considered to have succeeded if it converged to within $10^{-3}$ of the known optimum cost function value within the specified number of function evaluations [48].

Biomechanical Test Problem

In addition to these analytical test problems, a biomechanical test problem was used to evaluate the scale-independent nature of the PSO algorithm. Though our PSO algorithm is theoretically insensitive to design variable scaling, numerical round-off errors and implementation details could potentially produce a scaling effect. Running the other three algorithms on scaled and unscaled versions of this test problem also permitted investigation of the extent to which other algorithms are influenced by design variable scaling.

The biomechanical test problem involved determination of an ankle joint kinematic model that best matched noisy synthetic (i.e., computer generated) movement data. Similar to that used by van den Bogert et al. [56], the ankle was modeled as a three-dimensional linkage with two non-intersecting pin joints defined by 12 subject-specific parameters (Figure 6).

Figure 6. Joint locations and orientations in the parametric ankle kinematic model. Each $p_i$ (i = 1,...,12) represents a different position or orientation parameter in the model.

These parameters represent the positions and orientations of the talocrural and subtalar joint axes in the tibia, talus, and calcaneus.
Position parameters were in units of centimeters and orientation parameters in units of radians, resulting in parameter values of varying magnitude. This model was part of a larger 27 degree-of-freedom (DOF) full-body kinematic model used to optimize other joints as well [58].

Given this model structure, noisy synthetic movement data were generated from a nominal model for which the "true" model parameters were known. Joint parameters for the nominal model, along with a nominal motion, were derived from in vivo experimental movement data using the optimization methodology described below. Next, three markers were attached to the tibia and calcaneus segments in the model at locations consistent with the experiment, and the 27 model DOFs were moved through their nominal motions. This process created synthetic marker trajectories consistent with the nominal model parameters and motion and also representative of the original experimental data. Finally, numerical noise was added to the synthetic marker trajectories to emulate skin and soft tissue movement artifacts. For each marker coordinate, a sinusoidal noise function was used with uniformly distributed random period, phase, and amplitude (limited to a maximum of 1 cm). The values of the sinusoidal parameters were based on previous studies reported in the literature [80,53].

An unconstrained optimization problem with bounds on the design variables was formulated to attempt to recover the known joint parameters from the noisy synthetic marker trajectories. The cost function was

$\min_{\mathbf{p}} f(\mathbf{p})$   (3.19)

with

$f(\mathbf{p}) = \sum_{k=1}^{50} \min_{\mathbf{q}} \sum_{j=1}^{6} \sum_{i=1}^{3} \left( c_{ijk} - \bar{c}_{ijk}(\mathbf{p},\mathbf{q}) \right)^2$   (3.20)

where $\mathbf{p}$ is a vector of 12 design variables containing the joint parameters, $\mathbf{q}$ is a vector of 27 generalized coordinates for the kinematic model, $c_{ijk}$ is the i-th coordinate of synthetic marker j at time frame k, and $\bar{c}_{ijk}(\mathbf{p},\mathbf{q})$ is the corresponding marker coordinate from the kinematic model.
At each time frame, $\bar{c}_{ijk}(\mathbf{p},\mathbf{q})$ was computed from the current model parameters $\mathbf{p}$ and an optimized model configuration $\mathbf{q}$. A separate Levenberg-Marquardt nonlinear least-squares optimization was performed for each time frame in Eq. (3.20) to determine this optimal configuration. A relative convergence tolerance of $10^{-3}$ was chosen to achieve good accuracy with minimum computational cost. A nested optimization formulation (i.e., minimization occurs in both Eqs. (3.19) and (3.20)) was used to decrease the dimensionality of the design space in Eq. (3.19). Equation (3.20) was coded in Matlab and exported as stand-alone C code using the Matlab Compiler (The Mathworks, Natick, MA). The executable read in a file containing the 12 design variables and output a file containing the resulting cost function value. This approach facilitated the use of different optimizers to solve Eq. (3.19).

To investigate the influence of design variable scaling on optimization algorithm performance, two versions of Eq. (3.20) were generated. The first used the original units of centimeters and radians for the position and orientation design variables, respectively. Bounds on the design variables were chosen to enclose a physically realistic region around the solution point in design space. Each position design variable was constrained to remain within a cube centered at the midpoint of the medial and lateral malleoli, where the length of each side was equal to the distance between the malleoli (i.e., 11.32 cm). Each orientation design variable was constrained to remain within a circular cone defined by varying its true value by 30°. The second version normalized all 12 design variables to be within [-1,1] using

$\mathbf{x}_{norm} = \dfrac{2\mathbf{x} - \left(\mathbf{x}_{UB} + \mathbf{x}_{LB}\right)}{\mathbf{x}_{UB} - \mathbf{x}_{LB}}$   (3.21)

where $\mathbf{x}_{UB}$ and $\mathbf{x}_{LB}$ denote the upper and lower bounds, respectively, on the design variable vector [81].

Two approaches were used to compare PSO scale sensitivity to that of the other three algorithms.
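The normalization of Eq. (3.21), which maps each design variable from its physical range into [-1, 1], can be sketched together with its inverse; the bounds and values below are illustrative:

```python
def normalize(x, lb, ub):
    """Map design variables from [lb, ub] to [-1, 1], per-component (Eq. 3.21)."""
    return [(2.0 * xi - (u + l)) / (u - l) for xi, l, u in zip(x, lb, ub)]

def denormalize(xn, lb, ub):
    """Inverse map from [-1, 1] back to the original physical units."""
    return [0.5 * (xi * (u - l) + (u + l)) for xi, l, u in zip(xn, lb, ub)]

# Illustrative bounds: a variable ranging over [0, 10] in its original units
print(normalize([0.0, 5.0, 10.0], [0.0] * 3, [10.0] * 3))  # [-1.0, 0.0, 1.0]
```

The optimizer then works entirely in normalized coordinates, and the cost function denormalizes before evaluating the model, which removes the mixed centimeter/radian magnitudes from the search space.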
For the first approach, a fixed number of scaled and unscaled runs (10) was performed with each optimization algorithm starting from different random initial seeds, and the sensitivity of the final cost function value to algorithm choice and design variable scaling was evaluated. The stopping condition for PSO and GA runs was 10,000 function evaluations, while SQP and BFGS runs were terminated when a relative convergence tolerance of $10^{-5}$ or an absolute convergence tolerance of $10^{-6}$ was met. For the second approach, a fixed number of function evaluations (10,000) was performed with each algorithm to investigate unscaled versus scaled convergence history. A single random initial guess was used for the PSO and GA algorithms, and each algorithm was terminated once it reached 10,000 function evaluations. Since individual SQP and BFGS runs require far fewer than 10,000 function evaluations, repeated runs were performed with different random initial guesses until the total number of function evaluations exceeded 10,000 at the termination of a run. This approach essentially uses SQP and BFGS as global optimizers, where the separate runs are like individual particles that cannot communicate with one another but have access to local gradient information. Finite difference step size tuning at the start of each run was included in the computation of the number of function evaluations. Once the total number of runs required to reach 10,000 function evaluations was known, the lowest cost function value from all runs at each iteration was used to represent the cost over a range of function evaluations equal to the number of runs.

Results

For the analytical test problems, our PSO algorithm was more robust than our GA, SQP, and BFGS algorithms (Table 2, top half). PSO converged to the correct global solution nearly 100% of the time on four of the six test problems (H1 and H3 with n = 4, 8, and 16).
It converged 67% of the time for problem H2 and only 1.5% of the time for problem H3 with n = 32. In contrast, none of the other algorithms converged more than 32% of the time on any of the analytical test problems. Though our GA algorithm typically exhibited faster initial convergence than did our PSO algorithm (Figure 7, left column), it leveled off and rarely reached the correct final point in design space within the specified number of function evaluations.

Table 2. Fraction of successful optimizer runs for the analytical test problems. Top half: Results from the PSO, GA, SQP, and BFGS algorithms used in the present study. Bottom half: Results from the SA, GA, SQP, and DS algorithms used in Soest and Casius [48]. The GA and SQP algorithms used in that study were different from the ones used in our study. Successful runs were identified by a final cost function value within $10^{-3}$ of the known optimum value, consistent with [48].

Study                    Algorithm  H1     H2     H3(n=4)  H3(n=8)  H3(n=16)  H3(n=32)
Present                  PSO        0.972  0.688  1.000    1.000    1.000     0.015
                         GA         0.000  0.034  0.000    0.000    0.000     0.002
                         SQP        0.09   0.11   0.00     0.00     0.00      0.00
                         BFGS       0.00   0.32   0.00     0.00     0.00      0.00
Soest and Casius (2003)  SA         1.000  0.027  0.000    0.001    0.000     0.000
                         GA         0.990  0.999  1.000    1.000    1.000     1.000
                         SQP        0.279  0.810  0.385    0.000    0.000     0.000
                         DS         1.000  0.636  0.000    0.000    0.000     0.000

Figure 7. Comparison of convergence history results for the analytical test problems. Left column: Results from the PSO, GA, SQP, and BFGS algorithms used in the present study. Right column: Results from the SA, GA, SQP, and DS algorithms used in Soest and Casius [48]. The GA and SQP algorithms used in that study were different from the ones used in our study. (a) Problem H1. The SA results have been updated using corrected data provided by Soest and Casius, since the results in [48] accidentally used a temperature reduction rate of 0.5 rather than the standard value of 0.85 as reported. (b) Problem H2. (c) Problem H3 with n = 4.
(d) Problem H3 with n = 32. Error was computed using the known cost at the global optimum and represents the average of 1000 runs (100 multi-start SQP and BFGS runs in our study) with each algorithm.

Figure 8. Final cost function values for ten unscaled (dark bars) and scaled (gray bars) parallel PSO, GA, SQP, and BFGS runs for the biomechanical test problem. Each pair of unscaled and scaled runs was started from the same initial point(s) in design space, and each run was terminated when the specified stopping criterion was met (see text).

In contrast, the SQP and BFGS algorithms were highly sensitive to design variable scaling in the biomechanical test problem. For the ten trials, unscaled and scaled SQP or BFGS runs rarely converged to similar points in design space (note the y axis scale in Figure 8) and produced large differences in final cost function value from one trial to the next (Figures 8c and 8d). Scaling improved the final result in seven out of ten SQP trials and in five of ten BFGS trials. The best unscaled and scaled SQP final cost function values were 255 and 121, respectively, while those of BFGS were 355 and 102 (Table 3). Thus, scaling yielded the best result found with both algorithms. The best SQP and BFGS trials generally produced larger RMS marker distance errors (up to two times worse), orientation parameter errors (up to five times worse), and position parameter errors (up to six times worse) than those found by PSO or GA.

Table 3. Final cost function values and associated marker distance and joint parameter root-mean-square (RMS) errors after 10,000 function evaluations performed by multiple unscaled and scaled PSO, GA, SQP, and BFGS runs. See Figure 9 for corresponding convergence histories.

Figure 9. Convergence history for unscaled (dark lines) and scaled (gray lines) parallel PSO, GA, SQP, and BFGS runs for the biomechanical test problem. Each algorithm run was terminated after 10,000 function evaluations.
Only one unscaled and one scaled PSO and GA run were required to reach 10,000 function evaluations, while repeated SQP and BFGS runs were required to reach that number. Separate SQP and BFGS runs were treated like individual particles in a single PSO run for calculating convergence history (see text).

                                          RMS Error
Optimizer  Formulation  Cost Function  Marker Distances (mm)  Orientation Parameters (deg)  Position Parameters (mm)
PSO        Unscaled     69.5           5.44                   2.63                          4.47
           Scaled       69.5           5.44                   2.63                          4.47
GA         Unscaled     77.9           5.78                   2.65                          6.97
           Scaled       74.0           5.64                   3.76                          4.01
SQP        Unscaled     255            10.4                   3.76                          14.3
           Scaled       121            7.21                   3.02                          9.43
BFGS       Unscaled     69.5           5.44                   2.63                          4.47
           Scaled       69.5           5.44                   2.63                          4.47

Discussion

This chapter evaluated a recent variation of the PSO algorithm with dynamic inertia and velocity updating as a possible addition to the arsenal of methods that can be applied to difficult biomechanical optimization problems. For all problems investigated, our PSO algorithm with standard algorithm parameters performed better than did three off-the-shelf optimizers (GA, SQP, and BFGS). For the analytical test problems, PSO robustness was found to be better than that of two other global algorithms but worse than that of a third. For the biomechanical test problem with added numerical noise, PSO was found to be insensitive to design variable scaling, while GA was only mildly sensitive and SQP and BFGS were highly sensitive. Overall, the results suggest that our PSO algorithm is worth consideration for difficult biomechanical optimization problems, especially those for which design variable scaling may be an issue.

Though our biomechanical optimization involved a system identification problem, PSO may be equally applicable to problems involving forward dynamic, inverse dynamic, inverse static, or image matching analyses. Other global methods such as SA and GA have already been applied successfully to such problems [47,48,62], and there is no reason to believe that PSO would not perform equally well.
As with any global optimizer, PSO utilization would be limited by the computational cost of function evaluations, given the large number required for a global search. Our particle swarm implementation may also be applicable to some large-scale biomechanical optimization problems. Outside the biomechanics arena [71,72,82-91], PSO has been used to solve problems on the order of 120 design variables [89-91]. In the present study, our PSO algorithm was unsuccessful on the largest test problem, H3 with n = 32 design variables. However, in a recent study, our PSO algorithm successfully solved the Griewank global test problem with 128 design variables using population sizes ranging from 16 to 128 [76]. When the Corana test problem (H3) was attempted with 128 design variables, the algorithm exhibited worse convergence. Since the Griewank problem possesses a bumpy but continuous search space and the Corana problem a highly discrete search space, our PSO algorithm may work best on global problems with a continuous search space. It is not known how our PSO algorithm would perform on biomechanical problems with several hundred design variables, such as the forward dynamic optimizations of jumping and walking performed with parallel SQP in [44-46].

One advantage of global algorithms such as PSO, GA, and SA is that they often do not require significant algorithm parameter tuning to perform well on difficult problems. The GA used by Soest and Casius in [48] (which is not freely available) required no tuning to perform well on all of these particular analytical test problems. The SA algorithm used by Soest and Casius in [48] required tuning of two parameters to improve algorithm robustness significantly on those problems. Our PSO algorithm (which is freely available) required tuning of one parameter ($w_d$, which was increased from 1.0 to 1.5) to produce 100% success on the two problems where it had significant failures.
For the biomechanical test problem, our PSO algorithm required no tuning, and only the population size of our GA algorithm required tuning to improve convergence speed. Neither algorithm was sensitive to the two sources of noise present in the problem: noise added to the synthetic marker trajectories, and noise due to a somewhat loose convergence tolerance in the Levenberg-Marquardt sub-optimizations. Thus, for many global algorithm implementations, poor performance on a particular problem can be rectified by minor tuning of a small number of algorithm parameters.

Figure 10. Sensitivity of gradient calculations to selected finite difference step size for one design variable. Forward and central differencing were evaluated using relative convergence tolerances of $10^{-3}$ and $10^{-6}$ for the nonlinear least-squares sub-optimizations performed during cost function evaluation (see Eq. (3.20)).

In contrast, gradient-based algorithms such as SQP and BFGS can require a significant amount of tuning even to begin to approach global optimizer results on some problems. For the biomechanical test problem, our SQP and BFGS algorithms were highly tuned by scaling the design variables and determining the optimal FDSS for each design variable separately. FDSS tuning was especially critical due to the two sources of noise noted above. When forward and central difference gradient results were compared for one of the design variables using two different Levenberg-Marquardt relative convergence tolerances ($10^{-3}$ and $10^{-6}$), a "sweet spot" was found near a step size of $10^{-2}$ (Figure 10). Outside of that "sweet spot," which was automatically identified and used in generating our SQP and BFGS results, forward and central difference gradient results diverged quickly when the looser tolerance was used.
Since most users of gradient-based optimization algorithms do not scale the design variables or tune the FDSS for each design variable separately, and many do not perform multiple runs, our SQP and BFGS results for the biomechanical test problem represent best-case rather than typical results. For this particular problem, an off-the-shelf global algorithm such as PSO or GA is preferable due to the significant reduction in effort required to obtain repeatable and reliable solutions.

Another advantage of PSO and GA algorithms is the ease with which they can be parallelized [48,76] and their resulting high parallel efficiency. For our PSO algorithm, Schutte et al. [76] recently reported near-ideal parallel efficiency for up to 32 processors. Soest and Casius [48] reported near-ideal parallel efficiency for their GA algorithm with up to 40 processors. Though SA has historically been considered more difficult to parallelize [92], Higginson et al. [93] recently developed a new parallel SA implementation and demonstrated near-ideal parallel efficiency for up to 32 processors. In contrast, Koh et al. [77] reported poor SQP parallel efficiency for up to 12 processors due to the sequential nature of the line search portion of the algorithm.

The caveat for these parallel efficiency results is that the time required per function evaluation was approximately constant and the computational nodes were homogeneous. As shown in [76], when function evaluations take different amounts of time, the parallel efficiency of our PSO algorithm (and any other synchronous parallel algorithm, including GA, SA, SQP, and BFGS) will degrade with an increasing number of processors. Synchronization between individuals in the population, or between individual gradient calculations, requires slave computational nodes that have completed their function evaluations to sit idle until all nodes have returned their results to the master node.
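The idle-time penalty of synchronization can be made concrete: the wall time of a synchronous iteration equals the slowest node's evaluation time, so per-iteration parallel efficiency is the total useful compute time divided by (number of nodes × wall time). A small sketch with hypothetical per-node timings:

```python
def synchronous_iteration_time(node_times):
    """A synchronous parallel iteration finishes only when the slowest
    fitness evaluation returns; faster nodes sit idle until then."""
    return max(node_times)

def per_iteration_efficiency(node_times):
    # Useful work performed / (nodes * wall time for the iteration)
    return sum(node_times) / (len(node_times) * max(node_times))

# Hypothetical per-node evaluation times in seconds (one slow node)
times = [1.0, 1.0, 1.0, 2.0]
print(synchronous_iteration_time(times))           # 2.0
print(round(per_iteration_efficiency(times), 3))   # 0.625
```

With perfectly balanced loads the ratio is 1.0; a single node taking twice as long drags this four-node example down to 62.5%, which is the mechanism behind the degraded efficiency reported for the load-imbalanced problems.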
Consequently, the slowest computational node (whether loaded by other users, performing the slowest function evaluation, or possessing the slowest processor in a heterogeneous environment) will dictate the overall time for each parallel iteration. An asynchronous PSO implementation with load balancing, where the global best-found position is updated continuously as each particle completes a function evaluation, could address this limitation. However, the extent to which convergence characteristics and scale independence would be affected is not yet known.

To put the results of our study into proper perspective, one must remember that optimization algorithm robustness can be influenced heavily by algorithm implementation details, and no single optimization algorithm will work for all problems. For two of the analytical test problems (H2 and H3 with n = 4), other studies have reported PSO results using formulations that did not include dynamic inertia and velocity updating. Comparisons are difficult given differences in the maximum number of function evaluations and number of particles, but in general, algorithm modifications were (not surprisingly) found to influence algorithm convergence characteristics [94-96]. For our GA and SQP algorithms, results for the analytical test problems were very different from those obtained by Soest and Casius in [48] using different GA and SQP implementations. With seven mutation and four crossover operators, the GA algorithm used by Soest and Casius in [48] was obviously much more complex than the one used here. Both SQP algorithms, by contrast, were highly developed commercial implementations, yet poor performance by a gradient-based algorithm can be difficult to correct even with design variable scaling and careful tuning of the FDSS.
These findings indicate that specific algorithm implementations, rather than general classes of algorithms, must be evaluated to reach any conclusions about algorithm robustness and performance on a particular problem.

Conclusions

In summary, the PSO algorithm with dynamic inertia and velocity updating provides another option for difficult biomechanical optimization problems, with the added benefit of being scale independent. There are few algorithm-specific parameters to adjust, and standard recommended settings work well for most problems [70,94]. In biomechanical optimization problems, noise, multiple local minima, and design variables of different scale can limit the reliability of gradient-based algorithms. The PSO algorithm presented here provides a simple-to-use, off-the-shelf alternative for consideration in such cases. The algorithm's main drawback is its high cost in terms of function evaluations, because of slow convergence in the final stages of the optimization, a common trait among global search algorithms. The time requirements associated with this high computational cost may be circumvented by utilizing the parallelism inherent in the swarm algorithm. The development of such a parallel PSO algorithm will be detailed in the next chapter.

CHAPTER 4
PARALLELISM BY EXPLOITING POPULATION-BASED ALGORITHM STRUCTURES

Overview

The structures of population-based optimizers such as Genetic Algorithms and the Particle Swarm may be exploited to enable these algorithms to utilize concurrent processing. These algorithms require a set of fitness values for the population or swarm of individuals at each iteration during the search. The fitness of each individual is independently evaluated, and may therefore be assigned to separate computational nodes.
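Because each individual's fitness is evaluated independently, the dispatch of one generation's evaluations can be sketched with a worker pool. This is a minimal stand-in (Python threads rather than the MPI master-slave layer described in this chapter), and the cost function is a placeholder for the expensive simulation:

```python
from concurrent.futures import ThreadPoolExecutor

def fitness(x):
    # Placeholder cost; in the chapter this is an expensive model evaluation.
    return sum(xi * xi for xi in x)

def evaluate_swarm(positions, n_workers=4):
    """Master-slave style step: every particle's fitness is independent, so the
    evaluations are handed to separate workers. The step is synchronous; it
    returns only once the slowest evaluation has finished."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(fitness, positions))

swarm = [[1.0, 2.0], [0.0, 0.0], [3.0, -1.0]]
print(evaluate_swarm(swarm))  # [5.0, 0.0, 10.0]
```

`pool.map` preserves particle ordering, so the returned list lines up with the swarm's position list exactly as the master node requires before performing the velocity and position updates.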
The development of such a parallel computational infrastructure is detailed in this chapter and applied to a set of large-scale analytical problems and a biomechanical system identification problem for the purpose of quantifying its efficiency. The parallelization of the PSO is achieved with a master-slave, coarse-grained implementation in which slave computational nodes are associated with individual particle search trajectories and assigned their fitness evaluations. Greatly enhanced computational throughput is demonstrated using this infrastructure, with efficiencies of 95% observed under load-balanced conditions. Numerical example problems with large load imbalances yield poor performance, which decreases in a linear fashion as additional nodes are added. This infrastructure is based on a two-level approach with flexibility in terms of where the search effort can be concentrated. For the two problems presented, the global search effort is applied in the upper level.

This work was done in collaboration with Jeff Reinbolt, who created the biomechanical kinematic analysis software [58] and evaluated the quality of the solutions found by the parallel PSO. The work presented in this chapter was also published in [58,76,129].

Introduction

Present day engineering optimization problems often impose large computational demands, resulting in long solution times even on a modern high-end processor. To obtain enhanced computational throughput and global search capability, we detail the coarse-grained parallelization of an increasingly popular global search method, the particle swarm optimization (PSO) algorithm. Parallel PSO performance was evaluated using two categories of optimization problems possessing multiple local minima: large-scale analytical test problems with computationally cheap function evaluations, and medium-scale biomechanical system identification problems with computationally expensive function evaluations.
For load-balanced analytical test problems formulated using 128 design variables, speedup was close to ideal and parallel efficiency above 95% for up to 32 nodes on a Beowulf cluster. In contrast, for load-imbalanced biomechanical system identification problems with 12 design variables, speedup plateaued and parallel efficiency decreased almost linearly with increasing number of nodes. The primary factor affecting parallel performance was the synchronization requirement of the parallel algorithm, which dictated that each iteration must wait for completion of the slowest fitness evaluation. When the analytical problems were solved using a fixed number of swarm iterations, a single population of 128 particles produced a better convergence rate than did multiple independent runs performed using sub-populations (8 runs with 16 particles, 4 runs with 32 particles, or 2 runs with 64 particles). These results suggest that (1) parallel PSO exhibits excellent parallel performance under load-balanced conditions, (2) an asynchronous implementation would be valuable for real-life problems subject to load imbalance, and (3) larger population sizes should be considered when multiple processors are available.

Numerical optimization has been widely used in engineering to solve a variety of NP-complete problems in areas such as structural optimization, neural network training, control system analysis and design, and layout and scheduling problems. In these and other engineering disciplines, two major obstacles limiting solution efficiency are frequently encountered. First, even medium-scale problems can be computationally demanding due to costly fitness evaluations. Second, engineering optimization problems are often plagued by multiple local optima, requiring the use of global search methods such as population-based algorithms to deliver reliable results.
Fortunately, recent advances in microprocessor and network technology have led to increased availability of low-cost computational power through clusters of low to medium performance computers. To take advantage of these advances, communication layers such as MPI [3,5] and PVM [97] have been used to develop parallel optimization algorithms, the most popular being gradient-based, genetic (GA), and simulated annealing (SA) algorithms [48,98,99]. In biomechanical optimizations of human movement, for example, parallelization has allowed problems requiring days or weeks of computation on a single-processor computer to be solved in a matter of hours on a multi-processor machine [98]. The particle swarm optimization (PSO) algorithm is a recent addition to the list of global search methods [100]. This derivative-free method is particularly suited to continuous variable problems and has received increasing attention in the optimization community. It has been successfully applied to large-scale problems [69,100,101] in several engineering disciplines and, being a population-based approach, is readily parallelizable. It has few algorithm parameters, and generic settings for these parameters work well on most problems [70,94]. In this study, we present a parallel PSO algorithm for application to computationally demanding optimization problems. The algorithm's enhanced throughput due to parallelization and improved convergence due to increased population size are evaluated using large-scale analytical test problems and medium-scale biomechanical system identification problems. Both types of problems possess multiple local minima. The analytical test problems utilize 128 design variables to create a tortuous design space but with computationally cheap fitness evaluations. In contrast, the biomechanical system identification problems utilize only 12 design variables, but each fitness evaluation is much more costly computationally.
These two categories of problems provide a range of load balance conditions for evaluating the parallel formulation.

Serial Particle Swarm Algorithm

Particle swarm optimization was introduced in 1995 by Kennedy and Eberhart [66]. Although several modifications to the original swarm algorithm have been made to improve performance [68, 102-105] and adapt it to specific types of problems [69, 106, 107], a parallel version has not been previously implemented. The following is a brief introduction to the operation of the PSO algorithm. Consider a swarm of p particles, with each particle's position representing a possible solution point in the design problem space D. For each particle i, Kennedy and Eberhart proposed that its position x^i be updated in the following manner:

x^i_{k+1} = x^i_k + v^i_{k+1}    (4.1)

with a pseudo-velocity v^i calculated as follows:

v^i_{k+1} = w_k v^i_k + c_1 r_1 (p^i_k - x^i_k) + c_2 r_2 (p^g_k - x^i_k)    (4.2)

Here, subscript k indicates a (unit) pseudo-time increment, p^i_k represents the best ever position of particle i at time k (the cognitive contribution to the pseudo-velocity vector v^i_{k+1}), and p^g_k represents the global best position in the swarm at time k (the social contribution). r_1 and r_2 represent uniform random numbers between 0 and 1. To allow the product c_1 r_1 or c_2 r_2 to have a mean of 1, Kennedy and Eberhart proposed that the cognitive and social scaling parameters c_1 and c_2 be selected such that c_1 = c_2 = 2. The result of using these proposed values is that the particles overshoot the target half the time, thereby maintaining separation within the group and allowing a greater area to be searched than if no overshoot occurred. A modification by Fourie and Groenwold [69] of the original PSO algorithm [66] allows transition to a more refined search as the optimization progresses. This operator reduces the maximum allowable velocity v^max_k and particle inertia w_k in a dynamic manner, as dictated by the dynamic reduction parameters v_d and w_d.
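As a concrete illustration, the update rules of Eqs. (4.1) and (4.2) can be sketched in a few lines of Python. This is an illustrative sketch only (the study's implementation was written in ANSI C); the function name and the default inertia value w = 0.7 are our own choices, not from the dissertation:

```python
import random

def pso_update(x, v, p_best, g_best, w=0.7, c1=2.0, c2=2.0, rng=random):
    """One velocity/position update for a single particle, per Eqs. (4.2) and (4.1)."""
    r1, r2 = rng.random(), rng.random()  # uniform random numbers in [0, 1)
    # Eq. (4.2): inertia term + cognitive term + social term
    v_new = [w * vi + c1 * r1 * (pb - xi) + c2 * r2 * (gb - xi)
             for xi, vi, pb, gb in zip(x, v, p_best, g_best)]
    # Eq. (4.1): advance the position by the new pseudo-velocity
    x_new = [xi + vn for xi, vn in zip(x, v_new)]
    return x_new, v_new
```

Note that if a particle sits at both its personal best and the global best with zero velocity, Eq. (4.2) leaves it stationary, since both attraction terms vanish.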
For the sake of brevity, further details of this operator are omitted; a detailed description can be found in References [69, 70]. The serial PSO algorithm as it would typically be implemented on a single-CPU computer is described below, where p is the total number of particles in the swarm. The best ever fitness value of a particle at design coordinates p^i_k is denoted by f^i_best, and the best ever fitness value of the overall swarm at coordinates p^g_k by f^g_best. At time step k = 0, the particle velocities v^i_0 are initialized to values within the limits 0 <= v^i_0 <= v^max_0. The vector v^max_0 is calculated as a fraction of the distance between the upper and lower bounds, v^max_0 = ξ(x_UB - x_LB) [69], with ξ = 0.5. With this background, the PSO algorithm flow can be described as follows:

1. Initialize
   a. Set constants c_1, c_2, k_max, v^max_0, w_0, v_d, w_d, and d
   b. Initialize dynamic maximum velocity v^max_k and inertia w_k
   c. Set counters k = 0, t = 0, i = 1. Set random number seed.
   d. Randomly initialize particle positions x^i_0 in D for i = 1, ..., p
   e. Randomly initialize particle velocities 0 <= v^i_0 <= v^max_0 for i = 1, ..., p
   f. Evaluate cost function values f^i_0 using design space coordinates x^i_0 for i = 1, ..., p
   g. Set f^i_best = f^i_0 and p^i_0 = x^i_0 for i = 1, ..., p
   h. Set f^g_best to the best f^i_best and p^g_0 to the corresponding x^i_0
2. Optimize
   a. Update particle velocity vector v^i_{k+1} using Eq. (4.2)
   b. If v^i_{k+1} > v^max_{k+1} for any component, set that component to its maximum allowable value
   c. Update particle position vector x^i_{k+1} using Eq. (4.1)
   d. Evaluate cost function value f^i_{k+1} using design space coordinates x^i_{k+1}
   e. If f^i_{k+1} <= f^i_best, then set f^i_best = f^i_{k+1} and p^i_{k+1} = x^i_{k+1}
   f. If f^i_{k+1} <= f^g_best, then set f^g_best = f^i_{k+1} and p^g_{k+1} = x^i_{k+1}
   g. If f^g_best was improved in (f), reset t = 0; else increment t. If k > k_max, go to 3
   h. If t = d, multiply w_{k+1} by w_d and v^max_{k+1} by v_d
   i. If the maximum number of function evaluations is exceeded, go to 3
   j. Increment i. If i > p, then increment k and set i = 1
   k. Go to 2(a)
3. Report results
4. Terminate

The above logic is illustrated as a flow diagram in Figure 11, without detailing the working of the dynamic reduction parameters. Problem-independent stopping conditions based on convergence tests are difficult to define for global optimizers. Consequently, we typically use a fixed number of fitness evaluations or swarm iterations as a stopping criterion.

Parallel Particle Swarm Algorithm

The following issues had to be addressed in order to create a parallel PSO algorithm.

Concurrent Operation and Scalability

The algorithm should operate in such a fashion that it can be easily decomposed for parallel operation on a multi-processor machine. Furthermore, it is highly desirable that it be scalable. Scalability implies that the nature of the algorithm should not place a limit on the number of computational nodes that can be utilized, thereby permitting full use of available computational resources. An example of an algorithm with limited scalability is a parallel implementation of a gradient-based optimizer. This algorithm is decomposed by distributing the workload of the derivative calculations for a single point in design space among multiple processors. The upper limit on concurrent operations using this approach is therefore set by the number of design variables in the problem. On the other hand, population-based methods such as the GA and PSO are better suited to parallel computing. Here the population of individuals representing designs can be increased or decreased according to the availability and speed of processors. Any additional agents in the population allow for a higher-fidelity search of the design space, lowering susceptibility to entrapment in local minima. However, this comes at the expense of additional fitness evaluations.

Figure 11. Serial implementation of the PSO algorithm.
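Steps 1-4 above can be condensed into a short program. The following Python sketch follows the serial algorithm with generic parameter settings; it omits the dynamic velocity/inertia reduction operator, uses a symmetric velocity clamp, and all names are our own rather than taken from the dissertation's ANSI C code:

```python
import random

def serial_pso(f, lb, ub, p=20, k_max=200, w=0.7, c1=2.0, c2=2.0, seed=0):
    """Minimal serial PSO (sketch): minimize f over box bounds lb <= x <= ub."""
    rng = random.Random(seed)
    n = len(lb)
    v_max = [0.5 * (u - l) for l, u in zip(lb, ub)]      # v^max_0 = 0.5 (x_UB - x_LB)
    x = [[rng.uniform(l, u) for l, u in zip(lb, ub)] for _ in range(p)]
    v = [[rng.uniform(0.0, vm) for vm in v_max] for _ in range(p)]
    f_best = [f(xi) for xi in x]                         # step 1f-1g
    p_best = [list(xi) for xi in x]
    g = min(range(p), key=lambda i: f_best[i])           # step 1h
    g_best, fg_best = list(p_best[g]), f_best[g]
    for _ in range(k_max):                               # step 2, iterated k_max times
        for i in range(p):
            r1, r2 = rng.random(), rng.random()
            for d in range(n):
                v[i][d] = (w * v[i][d] + c1 * r1 * (p_best[i][d] - x[i][d])
                           + c2 * r2 * (g_best[d] - x[i][d]))        # Eq. (4.2)
                v[i][d] = max(-v_max[d], min(v_max[d], v[i][d]))     # clamp (step 2b)
                x[i][d] += v[i][d]                                   # Eq. (4.1)
            fi = f(x[i])
            if fi < f_best[i]:                           # steps 2e-2f
                f_best[i], p_best[i] = fi, list(x[i])
                if fi < fg_best:
                    fg_best, g_best = fi, list(x[i])
    return g_best, fg_best
```

For example, minimizing a two-dimensional sphere function over [-5, 5]^2 with 20 particles reliably drives the best fitness close to zero.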
To avoid complicating the diagram, we have omitted velocity/inertia reduction operations.

Asynchronous vs. Synchronous Implementation

The original PSO algorithm was implemented with a synchronized scheme for updating the best remembered individual and group fitness values f^i_k and f^g_k, respectively, and their associated positions p^i_k and p^g_k. This approach entails performing the fitness evaluations for the entire swarm before updating the best fitness values. Subsequent experimentation revealed that improved convergence rates can be obtained by updating the f^i_k and f^g_k values and their positions after each individual fitness evaluation (i.e., in an asynchronous fashion) [70, 94]. It is speculated that because the updating occurs immediately after each fitness evaluation, the swarm reacts more quickly to an improvement in the best-found fitness value. With the parallel implementation, however, this asynchronous improvement on the swarm is lost, since fitness evaluations are performed concurrently. The parallel algorithm requires updating f^i_k and f^g_k for the entire swarm after all fitness evaluations have been performed, as in the original particle swarm formulation. Consequently, the swarm will react more slowly to changes of the best fitness value position in the design space. This behavior produces an unavoidable performance loss in terms of convergence rate compared to the asynchronous implementation and can be considered part of the overhead associated with parallelization.

Coherence

Parallelization should have no adverse effect on algorithm operation. Calculations sensitive to program order should appear to have occurred in exactly the same order as in the serial synchronous formulation, leading to the exact same final answer. In the serial PSO algorithm the fitness evaluations form the bulk of the computational effort for the optimization and can be performed independently.
For our parallel implementation, we therefore chose a coarse decomposition scheme where the algorithm performs only the fitness evaluations concurrently on a parallel machine. Step 2 of the particle swarm optimization algorithm was modified accordingly to operate in a parallel manner:

2. Optimize
   a) Update particle velocity vectors v^i_{k+1} using Eq. (4.2)
   b) If v^i_{k+1} > v^max_{k+1} for any component, set that component to its maximum allowable value
   c) Update particle position vectors x^i_{k+1} using Eq. (4.1)
   d) Concurrently evaluate fitness values f^i_{k+1} using design space coordinates x^i_{k+1} for i = 1, ..., p
   e) If f^i_{k+1} <= f^i_best, then set f^i_best = f^i_{k+1} and p^i_{k+1} = x^i_{k+1} for i = 1, ..., p
   f) If f^i_{k+1} <= f^g_best, then set f^g_best = f^i_{k+1} and p^g_{k+1} = x^i_{k+1} for i = 1, ..., p
   g) If f^g_best was improved in (f), reset t = 0; else increment t. If k > k_max, go to 3
   h) If t = d, multiply w_{k+1} by w_d and v^max_{k+1} by v_d
   i) If the maximum number of function evaluations is exceeded, go to 3
   j) Increment k
   k) Go to 2(a)

The parallel PSO algorithm is represented by the flow diagram in Figure 12.

Network Communication

In a parallel computational environment, the main performance bottleneck is often the communication latency between processors. This issue is especially relevant to large clusters of computers, where the use of high-performance network interfaces is limited due to their high cost. To keep communication between different computational nodes to a minimum, we use fitness evaluation tasks as the level of granularity for the parallel software. As previously mentioned, each of these evaluations can be performed independently and requires no communication aside from receiving the design space coordinates to be evaluated and reporting the fitness value at the end of the analysis.

Figure 12. Parallel implementation of the PSO algorithm. We have again omitted velocity/inertia reduction operations to avoid complicating the diagram.
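The concurrent evaluation in step 2(d) can be mimicked compactly; in this sketch a Python thread pool stands in for the MPI worker nodes of the study (this is not the study's C/MPI code, and the function names are our own). `map` preserves particle order and returns only when every evaluation has finished, mirroring the synchronization barrier of the parallel formulation:

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_swarm(fitness, positions, n_workers=4):
    """Step 2(d): concurrently evaluate the fitness of every particle position."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Blocks until the slowest evaluation finishes, like an MPI barrier,
        # and returns fitness values in the same order as the positions.
        return list(pool.map(fitness, positions))
```

Because results come back in particle order, the subsequent best-value updates (steps 2(e)-2(f)) proceed exactly as in the serial synchronous algorithm, preserving coherence.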
The optimization infrastructure is organized into a coordinating node and several computational nodes. PSO algorithm functions and task orchestration are performed by the coordinating node, which assigns the design coordinates to be evaluated, in parallel, to the computational nodes. With this approach, no communication is required between computational nodes, as individual fitness evaluations are independent of each other. The only necessary communication is between the coordinating node and the computational nodes and encompasses passing the following information:

1) Several distinct design variable configuration vectors assigned by the coordinating node to slave nodes for fitness evaluation.
2) Fitness values reported from slave nodes to the coordinating node.
3) Synchronization signals to maintain program coherence.
4) Termination signals from the coordinating node to slave nodes on completion of the analysis to stop the program cleanly.

The parallel PSO scheme and required communication layer were implemented in ANSI C on a Linux operating system using the message passing interface (MPI) libraries.

Synchronization and Implementation

From the parallel PSO algorithm, it is clear that some means of synchronization is required to ensure that all of the particle fitness evaluations have been completed and results reported before the velocity and position calculations can be executed (steps 2a and 2b). Synchronization is done using a barrier function in the MPI communication library, which temporarily stops the coordinating node from proceeding with the next swarm iteration until all of the computational nodes have responded with a fitness value. Because of this approach, the time required to perform a single parallel swarm fitness evaluation will be dictated by the slowest fitness evaluation in the swarm. Two networked clusters of computers were used to obtain the numerical results.
The first cluster was used to solve the analytical test problems and comprised 40 1.33-GHz Athlon PCs located in the High-performance Computing and Simulation (HCS) Research Laboratory at the University of Florida. The second group was used to solve the biomechanical system identification problems and consisted of 32 2.40-GHz Intel PCs located in the HCS Research Laboratory at Florida State University. In both locations, 100 Mbps switched networks were utilized for connecting nodes.

Sample Optimization Problems

Analytical Test Problems

Two well-known analytical test problems were used to evaluate parallel PSO algorithm performance on large-scale problems with multiple local minima (see Appendix A for a mathematical description of both problems). The first was a test function (Figure 13(a)) introduced by Griewank [108] which superimposes a high-frequency sine wave on a multi-dimensional parabola. In contrast, the second problem used the Corana test function [109], which exhibits discrete jumps throughout the design space (Figure 13(b)). For both problems, the number of local minima increases exponentially with the number of design variables. To investigate large-scale optimization issues, we formulated both problems using 128 design variables. Since fitness evaluations are extremely fast for these test problems, a delay of approximately half a second was built into each fitness evaluation so that total computation time would not be swamped by communication time. Since parallelization opens up the possibility of utilizing large numbers of processors, we used the analytical test problems to investigate how convergence rate and final solution are affected by the number of particles employed in a parallel PSO run. To ensure that all swarms were given equally fair starting positions, we generated a pool of 128 initial
Particle positions selected with this scheme will be distributed uniformly throughout the design space [110]. This initial pool of 128 pa rticles was divided into th e following sub-swarms: one swarm of 128 particles, two sw arms of 64 particles, four swarms of 32 particles, and eight swarms of 16 particles. Each sub-swar m was used independen tly to solve the two analytical test problems. This approach allowed us to investigate whether is it more efficient to perform multiple parallel optimi zations with smaller population sizes or one parallel optimization with a larger po pulation size given a sufficient number of processors. To obtain comparisons for conve rgence speed, we allowe d all PSO runs to complete 10,000 iterations before the search was terminated. This number of iterations corresponded to between 160,000 and 1,280,000 fitness evaluations depending on the number of particles em ployed in the swarm. Biomechanical System Iden tification problems In addition to the analytical test prob lems, medium-scale biomechanical system identification problems were used to eval uate parallel PSO performance under more realistic conditions. These problems were variati ons of a general problem that attempts to find joint parameters (i.e. positions and orientations of joint axes) that match a kinematic ankle model to experimental surface marker data [56]. The data are collected with an optoelectronic system that uses multiple cam eras to record the positions of external markers placed on the body segments. To permit measurement of three-dimensional motion, we attach three non-colinear markers to the foot and lower leg. The recordings are processed to obtain marker trajectories in a laboratory-fixed co-ordinate system [55111, 112] The general problem possesses 12 design variables and requires approximately 1 minute for each fitness evaluation. 
Thus, while the problem is only medium-scale in terms of number of design variables, it is still computationally costly due to the time required for each fitness evaluation.

Figure 13. Surface plots of the (a) Griewank and (b) Corana analytical test problems showing the presence of multiple local minima. For both plots, 126 design variables were fixed at their optimal values and the remaining 2 design variables were varied in a small region about the global minimum.

The first step in the system identification procedure is to formulate a parametric ankle joint model that can emulate a patient's movement by possessing sufficient degrees of freedom. For the purpose of this paper, we approximate the talocrural and subtalar joints as simple 1-degree-of-freedom revolute joints. The resulting ankle joint model (Figure 6) contains 12 adjustable parameters that define its kinematic structure [56]. The model also has a set of virtual markers fixed to the limb segments in positions corresponding to the locations of real markers on the subject. The linkage parameters are then adjusted via optimization until markers on the model follow the measured marker trajectories as closely as possible. To quantify how closely the kinematic model with specified parameter values can follow measured marker trajectories, we define a cumulative marker error e as follows:

e = Σ_{j=1}^{n} Σ_{i=1}^{m} δ²_{i,j}    (4.3)

where

δ²_{i,j} = Δx²_{i,j} + Δy²_{i,j} + Δz²_{i,j}    (4.4)

and Δx_{i,j}, Δy_{i,j}, and Δz_{i,j} are the spatial displacement errors for marker i at time frame j in the x, y, and z directions as measured in the laboratory-fixed coordinate system, n = 50 is the number of time frames, and m = 6 (3 on the lower leg and 3 on the foot) is the number of markers. These errors are calculated between the experimental marker locations on the human subject and the virtual marker locations on the kinematic model.
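Equations (4.3) and (4.4) amount to a double sum of squared coordinate errors over markers and time frames; an illustrative Python version (function name and data layout are our own):

```python
def cumulative_marker_error(model_xyz, measured_xyz):
    """Eq. (4.3)-(4.4): sum of squared marker coordinate errors over all time
    frames j and markers i. Inputs are nested as [frame][marker] = (x, y, z)."""
    e = 0.0
    for frame_model, frame_meas in zip(model_xyz, measured_xyz):
        for (xm, ym, zm), (xd, yd, zd) in zip(frame_model, frame_meas):
            # delta^2 for this marker at this frame, Eq. (4.4)
            e += (xm - xd) ** 2 + (ym - yd) ** 2 + (zm - zd) ** 2
    return e
```

A perfect model fit gives e = 0; any displacement contributes its squared Euclidean distance.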
For each time frame, a non-linear least squares sub-optimization is performed to determine the joint angles that minimize δ²_{i,j} given the current set of model parameters. The first sub-optimization is started from an initial guess of zero for all joint angles. The sub-optimization for each subsequent time frame is started with the solution from the previous time frame to speed convergence. By performing a separate sub-optimization for each time frame and then calculating the sum of the squares of the marker coordinate errors, we obtain an estimate of how well the model fits the data for all time frames included in the analysis. By varying the model parameters and repeating the sub-optimization process, the parallel PSO algorithm finds the best set of model parameters that minimize e over all time frames. For numerical testing, three variations of this general problem were analyzed as described below. In all cases the number of particles used by the parallel PSO algorithm was set to a recommended value of 20 [94].

1) Synthetic data without numerical noise: Synthetic (i.e., computer-generated) data without numerical noise were generated by simulating marker movements using a lower body kinematic model with virtual markers. The synthetic motion was based on an experimentally measured ankle motion (see 3 below). The kinematic model used anatomically realistic joint positions and orientations. Since the joint parameters associated with the synthetic data were known, this optimization was used to verify that the parallel PSO algorithm could accurately recover the original model.

2) Synthetic data with numerical noise: Numerical noise was superimposed on each synthetic marker coordinate trajectory to emulate the effect of marker displacements caused by skin movement artifacts [53].
A previously published noise model requiring three random parameters was used to generate a perturbation N in each marker coordinate [80]:

N = A sin(ωt + φ)    (4.5)

where A is the amplitude, ω the frequency, and φ the phase angle of the noise. These noise parameters were treated as uniform random variables within the bounds 0 ≤ A ≤ 1 cm, 0 ≤ ω ≤ 25 rad/s, and 0 ≤ φ ≤ 2π (obtained from [80]).

3) Experimental data: Experimental marker trajectory data were obtained by processing three-dimensional recordings from a subject performing movements with reflective markers attached to the foot and lower leg as previously described. Institutional review board approval was obtained for the experiments and data analysis, and the subject gave informed consent prior to participation. Marker positions were reported in a laboratory-fixed coordinate system.

Speedup and Parallel Efficiency

Parallel performance for both classes of problems was quantified by calculating speedup and parallel efficiency for different numbers of processors. Speedup is the ratio of sequential execution time to parallel execution time and ideally should equal the number of processors. Parallel efficiency is the ratio of speedup to number of processors and ideally should equal 100%. For the analytical test problems, only the Corana problem was run, since the half-second delay added to both problems makes their parallel performance identical. For the biomechanical system identification problems, only the synthetic data with numerical noise case was reported, since experimentation with the other two cases produced similar parallel performance.

Figure 14. Average fitness convergence histories for the (a) Griewank and (b) Corana analytical test problems for swarm sizes of 16, 32, 64, and 128 particles and 10,000 swarm iterations. Triangles indicate the location on each curve where 160,000 fitness evaluations were completed.
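The skin-movement noise model of Eq. (4.5) can be sketched as follows. One (A, ω, φ) triple is drawn per marker coordinate trajectory and then applied across its time frames; the exact sampling details of [80] are assumed, and the function name is our own:

```python
import math
import random

def make_skin_noise(rng):
    """Draw one (A, omega, phi) triple and return the perturbation
    N(t) = A * sin(omega * t + phi) of Eq. (4.5)."""
    A = rng.uniform(0.0, 1.0)              # amplitude, cm (0 <= A <= 1)
    omega = rng.uniform(0.0, 25.0)         # frequency, rad/s (0 <= omega <= 25)
    phi = rng.uniform(0.0, 2.0 * math.pi)  # phase angle (0 <= phi <= 2*pi)
    return lambda t: A * math.sin(omega * t + phi)
```

Since A never exceeds 1 cm, the perturbation added to any coordinate is bounded by 1 cm in magnitude, consistent with the "maximum amplitude of 1 cm" referred to in the results.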
The number of particles and nodes used for each parallel evaluation was selected based on the requirements of the problem. The Corana problem with 128 design variables was solved using 32 particles and 1, 2, 4, 8, 16, and 32 nodes. The biomechanical problem with 12 design variables was solved using 20 particles and 1, 2, 5, 10, and 20 nodes. Both problems were allowed to run until 1000 fitness evaluations were completed.

Numerical Results

Convergence rates for the two analytical test problems differed significantly with changes in swarm size. For the Griewank problem (Figure 14(a)), individual PSO runs converged to within 1e-6 of the global minimum after 10,000 optimizer iterations, regardless of the swarm size. Run-to-run variations in final fitness value (not shown) for a fixed swarm size were small compared to variations between swarm sizes. For example, no runs with 16 particles produced a better final fitness value than any of the runs with 32 particles, and similarly for the 16, 32, and 64 combinations. When number of fitness evaluations was considered instead of number of swarm iterations, runs with a smaller swarm size tended to converge more quickly than did runs with a larger swarm size (see triangles in Figure 14). However, two of the eight runs with the smallest number of particles failed to show continued improvement near the maximum number of iterations, indicating possible entrapment in a local minimum. Similar results were found for the Corana problem (Figure 14(b)), with two exceptions. First, the optimizer was unable to obtain the global minimum for any swarm size within the specified number of iterations (Figure 14(b)), and second, overlapping in results between different swarm sizes was observed. For example, some 16-particle results were better than some 32-particle results, and similarly for the other neighboring combinations. On average, however, a larger swarm size tended to produce better results for both problems.
Table 4. Parallel PSO results for the biomechanical system identification problem using synthetic marker trajectories without and with numerical noise. Optimizations on synthetic data with and without noise used 20 particles and were terminated after 40,000 fitness evaluations.

Model      Upper   Lower   Synthetic   Synthetic data
parameter  bound   bound   solution    Without noise   With noise
p1 (deg)   48.67   11.63   18.37       18.36           15.13
p2 (deg)   30.00   30.00    0.00        0.01            8.01
p3 (deg)   70.23   10.23   40.23       40.26           32.97
p4 (deg)   53.00    7.00   23.00       23.03           23.12
p5 (deg)   72.00   12.00   42.00       42.00           42.04
p6 (cm)     6.27    6.27    0.00        0.00            0.39
p7 (cm)    33.70   46.24   39.97       39.97           39.61
p8 (cm)     6.27    6.27    0.00        0.00            0.76
p9 (cm)     0.00    6.27    1.00        1.00            2.82
p10 (cm)   15.27    2.72    9.00        9.00           10.21
p11 (cm)   10.42    2.12    4.15        4.15            3.03
p12 (cm)    6.89    5.65    0.62        0.62            0.19

The parallel PSO algorithm found ankle joint parameters consistent with the known solution or results in the literature [61-63]. The algorithm had no difficulty recovering the original parameters from the synthetic data set without noise (Table 4), producing a final cumulative error e on the order of 10^-13. The original model was recovered with mean orientation errors less than 0.05° and mean position errors less than 0.008 cm. Furthermore, the parallel implementation produced identical fitness and parameter histories as did a synchronous serial implementation. For the synthetic data set with superimposed noise, an RMS marker distance error of 0.568 cm was found, which is on the order of the imposed numerical noise with maximum amplitude of 1 cm. For the experimental data set, the RMS marker distance error was 0.394 cm (Table 5), comparable to the error for the synthetic data with noise. Convergence characteristics were similar for the three data sets considered in this study. The initial convergence rate was quite high (Figure 15(a)), after which it slowed when the approximate location of the global minimum was found.
Table 5. RMS errors for the biomechanical system identification problem using synthetic marker trajectories (without and with numerical noise) and experimental data.

RMS errors                     Synthetic data                Experimental
                               Without noise   With noise    data
Marker distances (cm)          3.58E-04        0.568         0.394
Orientation parameters (deg)   1.85E-02        5.010         N/A
Position parameters (cm)       4.95E-04        1.000         N/A

As the solution process proceeded, the optimizer traded off increases in RMS joint orientation error (Figure 15(b)) for decreases in RMS joint position error (Figure 15(c)) to achieve further minor reductions in the fitness value. The analytical and biomechanical problems exhibited different parallel performance characteristics. The analytical problem demonstrated almost perfectly linear speedup (Figure 16(a), squares), resulting in parallel efficiencies above 95% for up to 32 nodes (Figure 16(b), squares). In contrast, the biomechanical problem exhibited speedup results that plateaued as the number of nodes was increased (Figure 16(a), circles), producing parallel efficiencies that decreased almost linearly with increasing number of nodes (Figure 16(b), circles). Each additional node produced roughly a 3% reduction in parallel efficiency.

Discussion

This study presented a parallel implementation of the particle swarm global optimizer. The implementation was evaluated using analytical test problems and biomechanical system identification problems. Speedup and parallel efficiency results were excellent when each fitness evaluation took the same amount of time.

Figure 15. Fitness convergence and parameter error plots for the biomechanical system identification problem using synthetic data with noise.

Figure 16. (a) Speedup and (b) parallel efficiency for the analytical and biomechanical optimization problems.

For problems with large numbers of design variables and multiple local minima, maximizing the number of particles produced better results than repeated runs with fewer particles.
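The speedup and parallel efficiency metrics plotted in Figure 16 are simple ratios of measured wall-clock times; for reference:

```python
def speedup_and_efficiency(t_serial, t_parallel, n_procs):
    """Speedup S = t_serial / t_parallel (ideally equal to n_procs);
    parallel efficiency E = S / n_procs (ideally 1.0, i.e., 100%)."""
    s = t_serial / t_parallel
    return s, s / n_procs
```

For example, a run taking 100 s serially and 25 s on 4 nodes has a speedup of 4 and 100% efficiency, while the same run taking 50 s on 4 nodes has a speedup of 2 and 50% efficiency, the kind of degradation the load-imbalanced biomechanical problem exhibited.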
Overall, parallel PSO makes efficient use of computational resources and provides a new option for computationally demanding engineering optimization problems. The agreement between optimized and known orientation parameters p1-p4 for the biomechanical problem using synthetic data with noise was poorer than initially expected. This finding was the direct result of the sensitivity of orientation calculations to errors in marker positions caused by the injected numerical noise. Because of the close proximity of the markers to each other, even relatively small amplitude numerical noise in marker positions can result in large fluctuations in the best-fit joint orientations. While more time frames could be used to offset the effects of noise, this approach would increase the cost of each fitness evaluation due to an increased number of sub-optimizations. Nonetheless, the fitness value for the optimized parameters was lower than that for the parameters used to generate the original noiseless synthetic data. Though the biomechanical optimization problems only involved 12 design variables, multiple local minima existed when numerical or experimental noise was present. When the noisy synthetic data set was analyzed with a gradient-based optimizer using 20 random starting points, the optimizer consistently found distinct solutions, indicating a large number of local minima. Similar observations were made for a smaller number of gradient-based runs performed on the experimental data set. To evaluate the parallel PSO's ability to avoid entrapment in these local minima, we performed 10 additional runs with the algorithm. All 10 runs converged to the same solution, which was better than any of the solutions found by gradient-based runs. Differences in parallel PSO performance between the analytical test problem and the biomechanical system identification problem can be explained by load balancing issues.
The half-second delay added to the analytical test problem made all fitness evaluations take approximately the same amount of time, with communication tasks taking substantially less time than the fitness evaluations. Consequently, load imbalances were avoided and little degradation in parallel performance was observed with increasing number of processors. In contrast, for the biomechanical system identification problem, the time required to complete the 50 sub-optimizations was sensitive to the selected point in design space, thereby producing load imbalances. As the number of processors increased, so did the likelihood that at least one fitness evaluation would take much longer than the others. Due to the synchronization requirement of the current parallel implementation, the resulting load imbalance caused by even one slow fitness evaluation was sufficient to degrade parallel performance rapidly with increasing number of nodes. An asynchronous parallel implementation could be developed to address this problem, with the added benefit of permitting high parallel efficiency on inhomogeneous clusters. Our results for the analytical and biomechanical optimization problems suggest that PSO performs best on problems with continuous rather than discrete noise. The algorithm consistently found the global minimum for the Griewank problem, even when the number of particles was low. Though the global minimum is unknown for the biomechanical problem using synthetic data with noise, multiple PSO runs consistently converged to the same solution. Both of these problems utilized continuous, sinusoidal noise functions. In contrast, PSO did not converge to the global minimum for the Corana problem with its discrete noise function. Thus, for large-scale problems with multiple local minima and discrete noise, other optimization algorithms such as GA may provide better results [48].
Use of an LHS rather than uniform random sampling to generate initial points in design space may be a worthwhile PSO algorithm modification. Experimentation with our random number generator indicated that initial particle positions can at times be grouped together. This motivated our use of an LHS to avoid re-sampling the same region of design space when providing initial guesses to sub-swarms. To investigate the influence of sampling method on PSO convergence rate, we performed multiple runs with the Griewank problem using uniform random sampling and an LHS with the default design variable bounds (-600 to +600) and with the bounds shifted by 200 (-400 to +800). We found that when the bounds were shifted, the convergence rate with uniform random sampling changed while it did not with an LHS. Thus, swarm behavior appears to be influenced by sampling method, and an LHS may be helpful for minimizing this sensitivity. A secondary motivation for running the analytical test problems with different numbers of particles was to determine whether the use of sub-swarms would improve convergence. The question is whether a larger swarm where all particles communicate with each other is more efficient than multiple smaller swarms where particles communicate within each sub-swarm but not between sub-swarms. It is possible that the global best position found by a large swarm may unduly influence the motion of all particles in the swarm. Creating sub-swarms that do not communicate eliminates this possibility. In our approach, we performed the same number of fitness evaluations for each population size. Our results for both analytical test problems suggest that when large numbers of processors are available, increasing the swarm size will increase the probability of finding a better solution. Analysis of PSO convergence rate for different numbers of particles also suggests an interesting avenue for future investigation.
Passing an imaginary curve through the triangles in Figure 14 reveals that, for a fixed number of fitness evaluations, convergence rate increases asymptotically with decreasing number of particles. While the solution found by a smaller number of particles may be a local minimum, the final particle positions may still identify the general region in design space where the global minimum is located. Consequently, an adaptive PSO algorithm that periodically adjusts the number of particles upward during the course of an optimization may improve convergence speed. For example, an initial run with 16 particles could be performed for a fixed number of fitness evaluations. At the end of that phase, the final positions of those 16 particles would be kept, but 16 new particles would be added to bring the total up to 32 particles. The algorithm would continue using 32 particles until the same number of fitness evaluations was completed. The process of gradually increasing the number of particles would continue until the maximum specified swarm size (e.g., 128 particles) was analyzed. To ensure systematic sampling of the design space, a LHS would be used to generate a pool of sample points equal to the maximum number of particles, from which sub-samples would be drawn progressively at each phase of the optimization. In the scenario above with a maximum of 128 particles, the first phase with 16 particles would remove 16 sampled points from the LHS pool, the second phase another 16 points, the third phase 32 points, and the final phase the remaining 64 points.

Conclusions

In summary, the parallel Particle Swarm Optimization algorithm presented in this chapter exhibits excellent parallel performance as long as individual fitness evaluations require the same amount of time.
For optimization problems where the time required for each fitness evaluation varies substantially, an asynchronous implementation may be needed to reduce wasted CPU cycles and maintain high parallel efficiency. When large numbers of processors are available, use of larger population sizes may result in improved convergence rates to the global solution. An adaptive PSO algorithm that increases population size incrementally may also improve algorithm convergence characteristics.

CHAPTER 5
IMPROVED GLOBAL CONVERGENCE USING MULTIPLE INDEPENDENT OPTIMIZATIONS

Overview

This chapter presents a methodology for improving the global convergence probability in large-scale global optimization problems in cases where several local minima are present. The optimizer applied in this methodology is the PSO, but the strategy outlined here is applicable to any type of algorithm. The controlling idea behind this optimization approach is to utilize several independent optimizations, each using a fraction of a budget of computational resources. Although such optimizations may individually have a limited probability of convergence compared to a single optimization utilizing the full budget, it is shown that when they are combined they have a cumulative convergence probability far in excess of that of the single optimization. Since the individual limited optimizations are independent, they may be executed concurrently on separate computation nodes with no interaction. This exploitation of parallelism allows us to vastly increase the probability of convergence to the global minimum while simultaneously reducing the required wall clock time if a parallel machine is used.

Introduction

If we consider the general unconstrained global optimization problem for the real-valued function f(x) defined on the set x ∈ D ⊂ R^n, one cannot state that a global solution has been found unless an exhaustive search of the set D is performed [115].
With a finite number of function evaluations, at best we can only estimate the probability of arriving at or near the global optimum. To solve global optimization problems reliably, the optimizer needs to achieve an efficient balance between sampling the entire design space and directing progressively more densely spaced sampling points towards promising regions for a more refined search [116]. Many algorithms achieve this balance, such as the deterministic DIRECT optimizer [117] or stochastic algorithms such as genetic algorithms [118], simulated annealing [119,120], clustering [121], and the particle swarm optimizer [122].

Although these population-based global optimization algorithms are fairly robust, they can be attracted, at least temporarily, towards local optima which are not global (see, for example, the Griewank problem in Figure 17).

Figure 17 Multiple local minima for the Griewank analytical problem, surface plot in two dimensions

This difficulty can be addressed by allowing longer optimization runs or an increased population size. Both of these options often result in a decrease in algorithm efficiency, with no guarantee that the optimizer will escape from the local optimum. It is possible that restarting the algorithm when it gets stuck in a local minimum and allowing multiple optimization runs may be a more efficient approach. This follows from the hypothesis that several limited independent optimization runs, each with a small likelihood of finding the global optimum, may be combined in a synergistic effort which yields a vastly improved global convergence probability. This approach is routinely used for global search with multi-start local optimizers [123]. Le Riche and Haftka have also suggested the use of this approach with genetic algorithms for solving complex composite laminate optimization problems [124]. The main difficulty in the application of such a multi-run strategy is deciding when the optimizer should be stopped.
The objective of this manuscript is to solve this problem by developing an efficient and robust scheme by which to allocate computational resources to individual optimizations in a set of multiple optimizations. The organization of this manuscript is as follows: First, a brief description of the optimization algorithm applied in this study, the PSO algorithm, is given. Next, a set of analytical problems is described, along with details on calculating global convergence probabilities. After that, the multiple-run methodology is outlined and a general budget strategy is presented for dividing a fixed number of fitness evaluations among multiple searches on a single processor. The use of this method on a parallel processing machine is also discussed. Then, numerical results based on the multiple-run strategy for both single- and multi-processor machines are reported and discussed. Finally, general conclusions about the multi-run methodology are presented.

Methodology

Analytical Test Set

The convergence behavior of the PSO algorithm was analyzed with the Griewank [108], Shekel [114] and Hartman [114] analytical problems (see Appendix A for problem definitions), each of which possesses multiple local minima. Analytical test problems were used because their global solutions f* are known a priori. The known solution value allows us to ascertain whether an optimization has converged to the global minimum. To estimate the probability of converging to the global optimum, we performed 1000 optimization runs for each problem, with each run limited to 500,000 fitness evaluations. These optimization runs were performed with identical parameter settings, with the exception of a different random number seed for each optimization run in order to start the population at different initial points in the design space.
To evaluate the global convergence probability of the PSO algorithm as a function of population size, we solved each problem using a swarm of 10, 20, 50 and 100 particles. A standard set of values was used for the other algorithm parameters (Table 6).

Table 6 Particle swarm algorithm parameters

Parameter   Description                                                      Value
c1          Cognitive trust parameter                                        2.0
c2          Social trust parameter                                           2.0
w0          Initial inertia                                                  1
wd          Inertia reduction parameter                                      0.01
            Bound on velocity fraction                                       0.5
vd          Velocity reduction parameter                                     0.01
d           Dynamic inertia/velocity reduction delay (function evaluations)  200

We assumed that convergence to the global optimum was achieved when the fitness value was within a predetermined fixed tolerance t (see Table 7) of the known global optimum f*:

f − f* ≤ t (5.1)

For the Shekel and Hartman problems, the tolerance ensures that the minimum corresponding to the global optimum for the problem has been found. That is, because t > 0 the exact optimum is not obtained, but if a local optimizer is started from the PSO solution found with the given tolerance it will converge to the global optimum. For the Griewank problem, however, starting a local optimizer at the PSO solution will not guarantee convergence to the global optimum, since this noisy, shallow convex problem has several local minima grouped around the global optimum that will defeat a local optimizer.

Table 7 Problem convergence tolerances

Problem    Convergence tolerance t
Griewank   0.1
Shekel     0.001
Hartman    0.001

Multiple-run Methodology

The use of multiple optimizations with a global optimizer such as a GA was first proposed by Le Riche and Haftka [124]. However, no criterion was given on the division of computational resources between the multiple optimizations, and the efficiency of the approach was not investigated. The method entails running multiple optimizations with a reduced number of fitness evaluations, either by limiting the number of algorithm iterations or by reducing the population while keeping the number of iterations constant. Individually, the convergence probability of such a limited optimization may be only a fraction of that of a single traditional optimization run. However, the cumulative convergence probability obtained by combining the limited runs can be significantly higher than that of the single run. Similar studies have previously been undertaken to investigate the efficiency of repeated optimizations using simple search algorithms such as pure random search, grid search, and random walk [130,131]. The use of multiple local optimizations or clustering [132] is a common practice, but for some algorithms the efficiency of this approach decreases rapidly when problems with a high number of local minima are encountered [130].

For estimating the efficiency of the proposed strategy, and for comparison with optimizations with increased populations/allowed iterations, we are required to calculate the probability of convergence to the global optimum for an individual optimization run, Pi. This convergence probability cannot be easily calculated for practical engineering problems with unknown solutions. For the set of analytical problems, however, the solutions are known and a large number of optimizations of these problems can be performed at little computational cost. With some reasonable assumptions, these two facts allow us to estimate the probability of convergence to the global optimum for individual optimization runs. The efficiency and exploration-run considerations derived from the theoretical analytical results are equally applicable to practical engineering problems where solutions are not known a priori. The first step in calculating Pi is computing the convergence ratio, Cr:

Cr = Nc / N (5.2)

where Nc is the number of globally converged optimizations and N is the number of optimizations, in this case 1000.
For a very large number of optimizations, the probability Pi that any individual run converges to the global optimum approaches Cr. For a finite number of runs, however, the standard error se in Pi can be quantified using

se = sqrt( Pi (1 − Pi) / N ) (5.3)

which is an estimate of the standard deviation of Cr. For example, if we obtain a convergence probability of Pi = 0.5 with N = 1000 optimizations, the standard error would be se = 0.016.

To obtain the combined cumulative probability of finding the global optimum by multiple independent optimizations, we apply the statistical law for the probability of success with repeated independent events. We denote the combined or cumulative probability of N multiple independent optimization runs converging to the solution as Pc. Using the fact that the convergence events are uncorrelated,

Pc = 1 − (1 − P1)(1 − P2) ··· (1 − PN) (5.4)

where Pi is the probability of the i-th individual optimization run converging to the global optimum. If we assume that individual optimization runs with similar parameter settings, as in the case of the following study, have equal probability of convergence, we can simplify Eq. (5.4) to

Pc = 1 − (1 − Pi)^N (5.5)

The increase in cumulative probability Pc with fixed values of Pi for increasing number of optimization runs N is illustrated in Figure 18. It must be stressed that the above relations are only valid for uncorrelated optimizations, which may not be the case when a poor-quality random number generator is used to generate initial positions in the design space. Certain generators can exhibit a tendency to favor regions in the design space, biasing the search and the probability of convergence towards minima in these regions.
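The estimators in Eqs. (5.2)-(5.5) amount to a few lines of code; a minimal sketch follows, with a synthetic pool of converged/failed flags and a Monte Carlo style cross-check of Eq. (5.5) along the lines used later in this chapter.

```python
import math
import random

def convergence_ratio(n_converged, n_runs):
    # Eq. (5.2): Cr = Nc / N, the estimate of Pi.
    return n_converged / n_runs

def standard_error(p_i, n_runs):
    # Eq. (5.3): se = sqrt(Pi (1 - Pi) / N).
    return math.sqrt(p_i * (1.0 - p_i) / n_runs)

def cumulative_probability(p_i, n_runs):
    # Eq. (5.5): Pc = 1 - (1 - Pi)^N for uncorrelated runs with equal Pi.
    return 1.0 - (1.0 - p_i) ** n_runs

def monte_carlo_pc(flags, group_size, n_samples=20000, rng=None):
    """Empirical Pc: draw random groups of runs from a pool of converged /
    not-converged flags and count groups with at least one success."""
    rng = rng or random.Random(1)
    hits = sum(any(rng.sample(flags, group_size)) for _ in range(n_samples))
    return hits / n_samples

# Worked check: Pi = 0.5 estimated from N = 1000 runs gives se of about
# 0.016, and pairs of runs should succeed with probability near 1 - 0.5^2.
pool = [True] * 500 + [False] * 500
se = standard_error(convergence_ratio(500, 1000), 1000)
pc_pairs = monte_carlo_pc(pool, 2)
```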
Figure 18 Cumulative convergence probability Pc as a function of the number of optimization runs with assumed equal Pi values

To verify the cumulative probability values predicted in theory with Eq. (5.5), the Monte Carlo method is applied, sampling random pairs, triplets, quintuplets etc. of optimizations from the pool of 1000 runs. For example, to estimate the experimental global convergence probability of two runs, we selected a large number of random pairs of optimizations among the 1000 runs. Applying Eq. (5.2), the number of cases Nc in which either or both runs of a pair converged, divided by N (the total number of pairs selected), yields the experimental global convergence probability.

Exploratory run and budgeting scheme

Using the multiple-run methodology requires a budget strategy by which to divide a fixed budget of fitness evaluations among the independent optimization runs. The budget of fitness evaluations nb is usually dictated by how much time the user is willing to allocate on a machine in order to solve a problem, divided by how long a single fitness evaluation takes to execute. An exploratory optimization utilizing a fraction nl of this budget is required to determine the interaction between the optimizer and the problem. The fitness history of this optimization is used to obtain an estimate of the number of fitness evaluations to be allocated to each run, ni. This strategy is based on the assumption that a single fitness history will be sufficient to quantify the optimizer behavior on a problem. For the set of problems it is observed that a correlation exists between the point where the fitness history levels off and the point where the convergence history levels off (Figure 19).
Figure 19 Fitness history and convergence probability Pc plots for the Griewank, Hartman and Shekel problems (20 particles)

We hypothesize that the algorithm will either converge quickly to the optimum or stall at a similar number of fitness evaluations (Figure 20). The exploratory run is stopped using a stopping criterion which monitors the rate of change of the objective fitness value as a function of the number of fitness evaluations. As soon as this rate of improvement drops below a predetermined value (i.e., the fitness value plot levels off), the exploratory optimization is stopped and the number of fitness evaluations is noted as nl. The stopping criterion parameters used for obtaining the numerical results are a change of less than 0.01 in fitness value over at least 500 function evaluations.

Figure 20 Typical Shekel fitness history plots of 20 optimizations (sampled out of 1000)

The remainder of the budgeted fitness evaluations is distributed among a number N of independent optimizations, which may be calculated as follows:

N = (nb − nl) / nl (5.6)

with an allowed number of fitness evaluations per run calculated by

ni = (nb − nl) / N, ni ≥ nl (5.7)

If a multi-processor machine is available, very high Pc values may be reached using a multiple-run strategy.
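The exploratory-run bookkeeping of Eqs. (5.6)-(5.7) and the rate-of-change stopping rule (improvement below 0.01 over at least 500 evaluations) can be sketched as follows; the synthetic fitness history and the 200,000-evaluation budget are illustrative only.

```python
def stalled_at(history, window=500, tol=0.01):
    """Return the evaluation count nl at which the best-fitness history has
    improved by less than tol over the last `window` evaluations."""
    for n in range(window, len(history)):
        if history[n - window] - history[n] < tol:
            return n
    return len(history)

def budget_split(n_budget, n_l):
    # Eq. (5.6): N = (nb - nl) / nl runs from the remaining budget;
    # Eq. (5.7): each run is allowed ni = (nb - nl) / N >= nl evaluations.
    n_runs = (n_budget - n_l) // n_l
    n_i = (n_budget - n_l) // n_runs
    return n_runs, n_i

# Synthetic history: steady improvement for 1000 evaluations, then a stall.
history = [100.0 - 0.1 * min(i, 1000) for i in range(3000)]
n_l = stalled_at(history)                      # detects the stall
n_runs, n_i = budget_split(200_000, 20_000)    # hypothetical 200,000-evaluation budget
```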
If we take the simple case where each independent optimization run is assigned to a separate node, the multiple-run approach is constrained somewhat differently than in the previous single-processor case. Rather than the number of multiple optimizations being limited by a fixed budget of fitness evaluations (which is divided equally among the set of multiple independent optimizations using Eq. (5.6)), the number of optimization runs is defined by the number of computational nodes and the wall clock time available to the user. A method similar to that followed for a single-processor machine for determining algorithm/problem behavior must still be followed to determine the optimal number of fitness evaluations for a single independent optimization run. This exploratory run can, however, be done using a parallel implementation of the population-based algorithm under consideration, in which concurrent processing is achieved through functional decomposition [125].

Bayesian convergence probability estimation

If the amount of time and the global probability of convergence are competing considerations, a Bayesian convergence probability estimation method may be used, as proposed by Groenwold et al. [134,135]. This criterion states that the optimization is stopped once a certain confidence level is reached that the best solution found among all optimizations, f̃, is the global solution f*. This probability or confidence measure is given in [135] as

Pr[ f̃ = f* ] ≥ q = 1 − [ (N + ā)! (2N + b̄)! ] / [ (2N + ā)! (N + b̄)! ] (5.8)

where q is the predetermined confidence level set by the user, usually 0.95, N is the total number of optimizations performed up to the time of evaluating the stopping criterion, ā = a + b − 1 and b̄ = b − Nc − 1, with a, b suitable parameters of a Beta distribution β(a, b). The number of optimizations among the total N which yield a final value of f̃ is defined as Nc. Values of the parameters a and b were chosen as 1 and 5 respectively, as recommended by Groenwold et al. [135].
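A numerical sketch of the stopping rule in Eq. (5.8), using log-gamma to avoid factorial overflow. The ā, b̄ update is a reconstruction in the Snyman-Fatti form of the criterion with the a = 1, b = 5 values quoted above, and should be checked against [135] before use; the run-loop helper is hypothetical.

```python
import math

def bayes_confidence(n_runs, n_best, a=1.0, b=5.0):
    """Confidence that the incumbent best value is the global optimum after
    n_runs optimizations, n_best of which ended at that best value; Beta(a, b)
    prior on the per-run convergence probability."""
    a_bar = a + b - 1.0
    b_bar = b - n_best - 1.0
    lfact = lambda x: math.lgamma(x + 1.0)      # log(x!) for non-integer x
    log_ratio = (lfact(n_runs + a_bar) + lfact(2 * n_runs + b_bar)
                 - lfact(2 * n_runs + a_bar) - lfact(n_runs + b_bar))
    return 1.0 - math.exp(log_ratio)

def run_until_confident(outcomes, q=0.95):
    """Stop after the first run at which the confidence level q is reached.
    outcomes[i] is True when run i ends at the incumbent best value."""
    n_best = 0
    for n, matched_best in enumerate(outcomes, start=1):
        n_best += matched_best
        if bayes_confidence(n, n_best) >= q:
            return n
    return None
```

Confidence grows both with runs that repeat the best value and with the total number of runs performed, which is consistent with the criterion's conservative behavior reported below.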
Numerical Results

Multi-run Approach for Predetermined Number of Optimizations

For the three problems under consideration, only a limited improvement in global convergence probability is achieved by applying the traditional approaches of increasing the number of fitness evaluations or the population size (Figure 21).

Figure 21 Shekel convergence probability Pi for an individual optimization as a function of fitness evaluations and population size (10, 20, 50 and 100 particles)

For the Shekel problem, using larger swarm sizes and/or allowing an increased number of fitness evaluations yielded higher convergence probabilities only up to a point. Similar results were obtained for the Griewank and Hartman problem cases. On the other hand, optimizations with a small number of particles reached moderate global convergence probabilities at significantly fewer fitness evaluations than did optimizations with large swarms. This behavior was observed for all the problems in the test set (Figure 19). To exploit this behavior we replace a single optimization with several PSO runs, each with a limited population and number of iterations. These individual optimizations utilize the same amount of resources allocated to the original single optimization (in this case the number of fitness evaluations).

To illustrate the merit of such an approach, we optimize the Hartman analytical problem with and without multiple limited optimizations. We observe that for a single optimization the probability of convergence is not significantly improved by allowing more fitness evaluations, or by increasing the population size (Figure 22).
Figure 22 Theoretical cumulative convergence probability Pc as a function of the number of optimization runs with constant Pi for the Hartman problem, using multiple independent runs with 10 particles

We also observe that an optimization with 10 particles quickly attains a probability of convergence of Pi = 0.344 after only 10,000 fitness evaluations. Using a multiple-run strategy with 10 independent optimizations of 10,000 fitness evaluations yields the theoretical Pc values reported in Table 8 (calculated using Eq. (5.5) with Pi = 0.344 and n = 1, ..., 10). These values are indicated as circled data points at a sum-equivalent number of fitness evaluations in Figure 22 for comparison with a single extended optimization. The cumulative convergence probability Pc using multiple optimizations is far superior to that of a single optimization run with up to 100 particles.

Table 8 Theoretical convergence probability results for the Hartman problem

Number of runs n   Cumulative convergence probability Pc   Cumulative fitness evaluations
1                  0.344                                   10,000
2                  0.570                                   20,000
3                  0.718                                   30,000
4                  0.815                                   40,000
5                  0.879                                   50,000
6                  0.920                                   60,000
7                  0.948                                   70,000
8                  0.966                                   80,000
9                  0.978                                   90,000
10                 0.985                                   100,000

Multi-run Efficiency

To investigate the most efficient manner in which to divide a budget of fitness evaluations among multiple independent optimizations, we compare the efficiency of 2, 5, 10 and 12 independent optimizations of the Griewank problem (Figure 23). A budget of 200,000 fitness evaluations is allowed for optimizing this problem. This results in each optimization in the set being stopped at ni = 100,000, 40,000, 20,000 and 16,500 fitness evaluations, respectively. It can be seen that using a combination of independent runs with high Pi values (and the associated high number of fitness evaluations) or multiple runs with low Pi values will not yield the same efficiency (as defined by Pc per fitness evaluation).

When the predicted Pc values are plotted on the same graph (Figure 23), it is observed that the two combinations of 5 and 10 optimizations yield the highest Pc values for a given number of fitness evaluations. In both cases the independent runs are stopped at a number of fitness evaluations close to the point where Pi levels off. The dependence of efficiency on the choice of nl can be explained as follows. If the independent optimization is stopped prematurely, it will result in very low values of Pi.

Figure 23 Theoretical convergence probability Pc with sets of multiple runs for the Griewank problem

Although a greater number of independent optimizations may then be performed within the single-processor budget (Eq. (5.6)), this may still result in very poor efficiency, such as in the 12-independent-run case in Figure 23. If, on the other hand, an excessive number of fitness evaluations is allowed for each independent run, the strategy also suffers, since only a reduced number of optimizations can then be performed (see the 2-independent-run case in Figure 23). To maximize the efficiency of the multi-run strategy, the individual runs should therefore be terminated after reaching a number of fitness evaluations corresponding to the point where Pi starts leveling off.
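The Table 8 values above follow directly from Eq. (5.5) with Pi = 0.344, and reproducing them takes one line per row; a quick check:

```python
def cumulative_pc(p_i, n):
    # Eq. (5.5): Pc = 1 - (1 - Pi)^n
    return 1.0 - (1.0 - p_i) ** n

# Rebuild Table 8 for the Hartman problem: Pi = 0.344 per 10,000-evaluation run.
table8 = [(n, round(cumulative_pc(0.344, n), 3), 10_000 * n) for n in range(1, 11)]
```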
The exact Pi convergence probability data, however, is not available unless a great number of optimizations are performed, and must therefore be estimated. This estimation is done by observing where the fitness history of a single optimization starts leveling off. The impact and robustness of using a single exploratory optimization fitness history to determine ni for each optimization was investigated for all three analytical problems. The exploratory optimization was stopped using the rate-of-change stopping criterion, and N and ni were calculated using Eqs. (5.6)-(5.7). To verify that a single optimization is sufficient to give robust results, the rate-of-change stopping criterion was applied to the pool of 1000 optimizations for each of the three analytical test problems, yielding the minimum, maximum and median values of nl reported in Table 9.

Table 9 Minimum, maximum and median fitness evaluations nl when applying the rate-of-change stopping criterion to the pool of 1000 optimizations for the Griewank, Hartman and Shekel problems

Problem    Minimum nl   Median nl   Maximum nl
Griewank   23,480       31,810      45,780
Hartman    14,180       16,440      23,360
Shekel     15,460       18,660      35,860

The fitness evaluation history plots for the minimum and maximum nl, and the corresponding convergence probability plots, are given in Figure 24. It can be seen that applying the rate-of-change stopping criterion to a single run to estimate nl gives fairly robust results, with only a small variation in Pc efficiency for all problems.

Bayesian Convergence Probability Estimation

The Bayesian budgeting scheme (Eq. (5.8)) yields a consistently conservative estimation of the cumulative convergence probability for the Griewank (Figure 25), Hartman (Figure 26), and Shekel (Figure 27) problems. In order to show the accuracy of this method as the Pc values approach 1, the probability of failure (1 − Pc) is plotted on a logarithmic ordinate axis.
The Bayesian estimation method is sensitive to the problem and the type of optimization algorithm under consideration, and the values of the a and b parameters in the Beta distribution require fine tuning to obtain a more accurate estimation.

Figure 24 Theoretical convergence probability Pc using information from exploratory optimizations which are stopped using the rate-of-change stopping condition, for the Griewank, Hartman and Shekel problems. A solid line denotes the longest exploratory run in the pool of 1000 optimizations, and a dashed line the shortest.

The parameters recommended by Groenwold et al. [135], however, yield a consistently conservative estimation of the confidence level. The results suggest the Bayesian prediction of Pc may be useful when confidence in the solution is traded off against optimization time, on either a single- or parallel-processor machine.

Figure 25 Bayesian Pc estimation compared to extrapolated and randomly sampled optimizations out of the pool of 1000 runs for the Griewank problem

Figure 26 Bayesian Pc estimation compared to extrapolated and randomly sampled optimizations out of the pool of 1000 runs for the Hartman problem
Figure 27 Bayesian Pc estimation compared to extrapolated and randomly sampled optimizations out of the pool of 1000 runs for the Shekel problem

Monte Carlo Convergence Probability Estimation

The cumulative probability values predicted in theory with Eq. (5.5) were verified with the Monte Carlo procedure described earlier, sampling random pairs, triplets, quintuplets etc. of optimizations from the pool of 1000 runs and counting the cases in which at least one member of each group converged. This was done for Figures 22 to 24, and the results are presented in Appendix B.

Conclusions

For the set of large-scale optimization problems evaluated with the PSO, the multi-run strategy with small PSO populations delivers higher global convergence probability than a single run with a large population and an equal number of fitness evaluations. On both serial and parallel machines, a fraction of the allocated budget of fitness evaluations or computer time is required to evaluate the optimizer/problem behavior. This exploratory optimization is terminated using a rate-of-change stopping criterion. The number of fitness evaluations required by the exploratory run is used to calculate the total number of runs, and the remainder of the budget of evaluations is divided among them. This approach allows the strategy to utilize the computational resources efficiently, preventing premature termination or wasted fitness evaluations on already converged optimizations.
Close correlation between the theoretically predicted cumulative convergence probability and the experimentally sampled probability is obtained for the strategy on a single-processor machine. A Bayesian convergence probability estimation method may be used to stop a serial or parallel optimization when optimization reliability is traded off against optimization time. This Bayesian prediction of the cumulative convergence probability is consistently conservative for all the problems tested when using the parameters recommended in the literature. Very high global convergence probabilities can be achieved in a limited time span on a massively parallel machine using the multi-run strategy, making this method a useful tool for solving difficult and computationally intensive large-scale engineering problems.

CHAPTER 6
PARALLELISM BY DECOMPOSITION METHODOLOGIES

Overview

Some classes of engineering problems may be subdivided into several more tractable sub-problems by applying decomposition strategies. This process of decomposition generally involves identifying groups of variables and constraints with minimal influence on each other. The choice of which decomposition strategy to apply depends largely on the original problem structure and the interaction among variables. This chapter details the application of one such methodology, the quasiseparable decomposition strategy, to a structural sizing problem. A structural optimization problem is used to illustrate the methodology of this strategy, and the resulting two-level optimization required to solve it. The research detailed in this chapter was also published as [138] in collaboration with R.T. Haftka and L.T. Watson.

Introduction

Several decomposition strategies have been proposed in the past in order to allow the solution of problems that would otherwise be too demanding computationally.
One such method, the quasiseparable decomposition strategy, addresses a wide class of problems found in structural engineering. These problems can be subdivided into lower-dimensional subsystems containing system-level and component-level variables, where interactions among the subsystems occur only through the system-level variables. Optimizing several lower-dimensional problems is preferable to the all-at-once approach because the number of local minima may increase exponentially with dimensionality for certain classes of problems. An additional benefit is that parallel processing may also be utilized.

Throughout this chapter the terms system level and component level refer to the upper and lower levels of the bi-level decomposed problem. The terms global and local optimization are used exclusively to refer to the optimization technique used, e.g., global PSO vs. local SQP.

The decomposition and optimization approach followed for solving the above types of problems, the quasiseparable decomposition strategy, was recently proposed by Haftka and Watson [136]. This approach introduces the concept of a "budget" of objective function leeway assigned to each of the decomposed component subsystems. This leeway allows a global search to be performed for each subsystem. A two-level optimization scheme is applied to the decomposed problem, with subsystem optimizations at the lower level and coordination achieved through an upper-level optimization. Each subsystem is independently optimized by adjusting its subsystem variables, constrained within its allowed budget, by maximizing the constraint margins. The upper-level optimizer coordinates the subsystem optimizations by adjusting the system variables and the assigned budgets. It is formally proven in [136] that a decomposition of the system using this approach will not induce spurious local minima.
Previously this strategy was applied to a portal beam example with both continuous [137] and discrete [138] variables.

The numerical example in this work consists of maximizing the deflection of an end-loaded stepped cantilever beam (Figure 28). This problem has its roots in structural applications where large deformations without plastic strain are desirable, for example MEMS micro-actuation devices. In this example the heights, widths, and wall thicknesses of 5 sections of a stepped cantilever beam are optimized subject to aspect ratio and stress constraints in order to attain maximum tip deflection. Due to the nature of the problem, several local minima are present in the 20-dimensional design space. This problem serves to illustrate the methodology followed to decompose a structure for solution by the quasiseparable optimization strategy. It also demonstrates the reduced search effort required to obtain the solution by the quasiseparable method as compared to the traditional all-at-once approach, where all variables are optimized simultaneously.

Quasiseparable Decomposition Theory

The method considers the special case where a system problem may be decomposed with the objective and constraints separable into a term containing only the global variables s and terms containing the global variables together with the local variables l_i, in the form

\min_{s,l} \; f(s,l) = f_0(s) + \sum_{i=1}^{N} f_i(s, l_i) \qquad (6.1)

subject to

g_0(s) \le 0, \qquad g_i(s, l_i) \le 0, \quad i = 1,\ldots,N \qquad (6.2)

In terms of global optimization, the objective functions f_i(s, l_i) should have as much leeway as possible. This can be done by introducing a budget b_i for each objective function f_i(s, l_i) and attacking the subsystem constraints g_i(s, l_i) \le 0 by maximizing the constraint margin \gamma_i for each subsystem.
This decomposition strategy can then be formulated as follows:

\min_{s,b} \; f_0(s) + \sum_{i=1}^{N} b_i \qquad (6.3)

subject to

g_0(s) \le 0, \qquad \gamma_i^*(s, b_i) \le 0, \quad i = 1,\ldots,N \qquad (6.4)

where \gamma_i^*(s, b_i) is the (global) solution to the i-th lower-level problem given by

\gamma_i^*(s, b_i) = \min_{l_i} \gamma_i \qquad (6.5)

subject to

\max_{1 \le j \le m_i} g_{ij}(s, l_i) \le \gamma_i, \qquad f_i(s, l_i) - b_i \le \gamma_i, \quad i = 1,\ldots,N \qquad (6.6)

The above strategy is appropriate only for minimization, however. Since the optimization in the following example is targeted toward maximization, with the standard form

\max_{s,l} \; f(s,l) = f_0(s) + \sum_{i=1}^{N} f_i(s, l_i) \qquad (6.7)

subject to

g_0(s) \le 0, \qquad g_i(s, l_i) \le 0, \quad i = 1,\ldots,N \qquad (6.8)

we need to recast the decomposition and optimization method into a strategy in which a set of performance targets d_i is maximized, in place of the minimization of the budgets b_i. Again, in terms of global optimization the objective functions f_i(s, l_i) should have as much leeway as possible. In place of minimizing budgets we establish a performance target d_i for each objective function f_i(s, l_i). As in the minimization strategy, we attack the subsystem constraints g_i(s, l_i) \le 0 by maximizing the constraint margin \gamma_i for each subsystem. This yields

\max_{s,d} \; f_0(s) + \sum_{i=1}^{N} d_i \qquad (6.9)

subject to

g_0(s) \le 0, \qquad \gamma_i^*(s, d_i) \ge 0, \quad i = 1,\ldots,N \qquad (6.10)

where \gamma_i^*(s, d_i) is the (global) solution to the i-th lower-level problem given by

\gamma_i^*(s, d_i) = \max_{l_i} \gamma_i \qquad (6.11)

which may be computed by maximizing the smallest of the constraint margins and the target surplus,

\gamma_i = \min\Big( \min_{1 \le j \le m_i} \big({-g_{ij}(s, l_i)}\big), \; f_i(s, l_i) - d_i \Big) \qquad (6.12)

alternatively formulated as

\gamma_i = \min\big( -g_{i1}(s, l_i), \ldots, -g_{im_i}(s, l_i), \; f_i(s, l_i) - d_i \big), \quad i = 1,\ldots,N \qquad (6.13)

Stepped Hollow Cantilever Beam Example

To illustrate the application of the quasiseparable decomposition method to a structural problem we consider a hollow stepped cantilever beam (see Figure 28) consisting of 5 sections, each defined by four design variables: width w, height h, top and bottom wall thickness t_h, and left and right wall thickness t_w (see Figure 29). A load P with y component P_y and z component P_z is applied to the tip of the beam.
The tip displacement of this prismatic cantilever beam is maximized subject to a material stress constraint in addition to aspect ratio constraints. Equations describing the tip displacement contributions \delta_{yi} and \delta_{zi} of each section i can be obtained using Castigliano's method. With x measured from the tip of the beam,

\delta_y = \int_0^L \frac{M_z}{E I_z} \frac{\partial M_z}{\partial P_y} \, dx
= \frac{P_y}{E} \left[ \int_0^{l_5} \frac{x^2}{I_{z5}} dx + \int_0^{l_4} \frac{(x + l_5)^2}{I_{z4}} dx + \int_0^{l_3} \frac{(x + l_4 + l_5)^2}{I_{z3}} dx + \int_0^{l_2} \frac{(x + l_3 + l_4 + l_5)^2}{I_{z2}} dx + \int_0^{l_1} \frac{(x + l_2 + l_3 + l_4 + l_5)^2}{I_{z1}} dx \right] \qquad (6.14)

where E is the Young's modulus and I_{z1} through I_{z5} are the moments of inertia about the neutral z-axis of the respective beam sections.

Figure 28 Stepped hollow cantilever beam

Figure 29 Dimensional parameters of each cross section

If we choose l_1 = l_2 = l_3 = l_4 = l_5 = 1 m, the above becomes

\delta_y = \frac{61 P_y}{3 E I_{z1}} + \frac{37 P_y}{3 E I_{z2}} + \frac{19 P_y}{3 E I_{z3}} + \frac{7 P_y}{3 E I_{z4}} + \frac{P_y}{3 E I_{z5}} \qquad (6.15)

Similarly, the displacement in the z-direction is

\delta_z = \frac{61 P_z}{3 E I_{y1}} + \frac{37 P_z}{3 E I_{y2}} + \frac{19 P_z}{3 E I_{y3}} + \frac{7 P_z}{3 E I_{y4}} + \frac{P_z}{3 E I_{y5}} \qquad (6.16)

Figure 30 Projected displacement in direction \theta

The tip deflection maximization is subject to stress constraints for each beam section i (with the allowable stress reduced by the safety factor SF). The stresses are calculated at the root of each beam segment, at the maximum distance from the bending axis (the four corners of each root cross section), using the Euler-Bernoulli beam equations.
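The coefficients in Eq. (6.15) can be checked directly: with unit section lengths and x measured from the tip, the section spanning [a, a+1] contributes ((a+1)^3 - a^3)/3 to the integral of x^2, giving 1/3, 7/3, 19/3, 37/3 and 61/3 from the tip section to the root section. A minimal sketch (the function names below are illustrative, not from the dissertation):

```python
# Coefficients of Eq. (6.15): the section spanning [a, a+1], measured from the
# tip, contributes ((a+1)**3 - a**3)/3 to the integral of x**2 along the beam.
def section_coefficients(n_sections=5):
    coeffs = [((a + 1) ** 3 - a ** 3) / 3.0 for a in range(n_sections)]
    return coeffs[::-1]  # index 0 = root section (section 1), as in Eq. (6.15)

def tip_deflection_y(P_y, E, I_z):
    """Tip deflection from Eq. (6.15); I_z[0] is the root section's inertia."""
    return sum(c * P_y / (E * I) for c, I in zip(section_coefficients(), I_z))

# Uniform beam sanity check: the coefficients sum to L**3/3 = 125/3 for L = 5 m,
# recovering the classical cantilever tip deflection P*L**3/(3*E*I).
assert abs(sum(section_coefficients()) - 125.0 / 3.0) < 1e-12
```

For a beam with all five section inertias equal, tip_deflection_y reduces to P_y L^3 / (3 E I_z), the textbook cantilever result.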
\sigma_i = \frac{M_Q h_i}{2 I_{zi}} + \frac{M_R w_i}{2 I_{yi}} \le \frac{\sigma_{allow}}{SF} \qquad (6.17)

where M_Q and M_R are the bending moments about the z- and y-axes at the root of section i, with

I_{zi} = \frac{w_i h_i^3}{12} - \frac{(w_i - 2 t_{wi})(h_i - 2 t_{hi})^3}{12}, \qquad
I_{yi} = \frac{h_i w_i^3}{12} - \frac{(h_i - 2 t_{hi})(w_i - 2 t_{wi})^3}{12} \qquad (6.18)

In addition to the stress constraints, the aspect ratio of each beam segment is limited to

0.2 \le \frac{h_i}{w_i} \le 5 \qquad (6.19)

The hollow cavity along the axial direction and the lower bounds on the wall thicknesses are accommodated through the following geometric constraints:

0.01 \le t_{hi} \le \frac{h_i}{4}, \qquad 0.01 \le t_{wi} \le \frac{w_i}{4} \qquad (6.20)

The above constraints are then normalized:

g_1 = \sigma_i \, SF / \sigma_{allow} - 1
g_2 = 4 t_{hi} / h_i - 1
g_3 = 4 t_{wi} / w_i - 1
g_4 = 1 - (w_i/2 - t_{wi}) / 0.01
g_5 = 1 - (h_i/2 - t_{hi}) / 0.01
g_6 = w_i / (5 h_i) - 1
g_7 = h_i / (5 w_i) - 1 \qquad (6.21)

This yields seven constraints per section, for a total of 35 constraints. The beam material properties and the applied load are given in Table 10.

Table 10 Beam material properties and end load configuration.
Material property            | Value
Young's modulus E            | 72e9 Pa
Safety factor SF             | 3
Allowable stress sigma_allow | 505e6 Pa
l1, l2, l3, l4, l5           | 1 m
Py                           | 3535.5 N
Pz                           | 3535.5 N

Stepped Hollow Beam Optimization

For comparative purposes, the above optimization problem is first solved in the traditional all-at-once fashion (single-level optimization) with the entire set of 20 design parameters as input variables. The system is then decomposed using the quasiseparable strategy and optimized as 5 four-dimensional sub-problems. For both approaches the tip displacement is maximized subject to the stated stress and geometric constraints. Using the principle of superposition and considering the contributions of the individual sections to the tip displacements in the y and z directions separately, we can formulate the tip displacement projected in a specified direction \theta (see Figure 30) as

\delta = \sum_{i=1}^{5} \big( \delta_{yi} \cos\theta + \delta_{zi} \sin\theta \big) \qquad (6.22)

where \delta_{yi} and \delta_{zi} are the section tip displacement contributions in the y and z directions respectively. The maximum tip displacement may be calculated using Eq.
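As a numerical check, the section 1 design of the optimum reported later in Table 11 can be substituted into Eqs. (6.17), (6.18) and (6.21). This sketch assumes the root of section 1 lies 5 m from the tip, so that the root bending moments are M_Q = 5 P_y and M_R = 5 P_z; those moment arms are an assumption of this reconstruction rather than a formula stated in the text:

```python
# Evaluate the normalized constraints of Eq. (6.21) for section 1 of the
# optimum reported in Table 11. Assumes the root of section 1 is 5 m from the
# tip, so M_Q = 5*P_y and M_R = 5*P_z (assumed moment arms).
SF, sigma_allow = 3.0, 505e6
P_y = P_z = 3535.5
w, h, tw, th = 0.2722, 0.0544, 0.0680, 0.0136  # Table 11, section 1

Iz = w * h**3 / 12 - (w - 2*tw) * (h - 2*th)**3 / 12   # Eq. (6.18)
Iy = h * w**3 / 12 - (h - 2*th) * (w - 2*tw)**3 / 12

M_Q, M_R = 5 * P_y, 5 * P_z
sigma = M_Q * h / (2 * Iz) + M_R * w / (2 * Iy)        # Eq. (6.17)

g = [sigma * SF / sigma_allow - 1,   # g1: stress
     4 * th / h - 1,                 # g2: top/bottom wall thickness bound
     4 * tw / w - 1,                 # g3: side wall thickness bound
     1 - (w / 2 - tw) / 0.01,        # g4: cavity width
     1 - (h / 2 - th) / 0.01,        # g5: cavity height
     w / (5 * h) - 1,                # g6: aspect ratio upper bound
     h / (5 * w) - 1]                # g7: aspect ratio lower bound

# g1, g2, g3 and g6 are active at this design, matching Table 11.
for gi in (g[0], g[1], g[2], g[5]):
    assert abs(gi) < 5e-3
```

The inactive margins also reproduce the tabulated values closely (g4 near -5.80, g5 near -0.36, g7 near -0.96).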
(6.22):

\delta_{max} = \max_{w_1, h_1, t_{w1}, t_{h1}, \ldots, w_5, h_5, t_{w5}, t_{h5}} \sum_{i=1}^{5} \big( \delta_{yi} \cos\theta + \delta_{zi} \sin\theta \big) \qquad (6.23)

The maximization of the tip deflection (Eq. (6.22)) can then be reformulated as

\max_{w_1, h_1, t_{w1}, t_{h1}, \ldots, w_5, h_5, t_{w5}, t_{h5}} \sum_{i=1}^{5} \delta_i \qquad (6.24)

with

\delta_i = \delta_{yi} \cos\theta + \delta_{zi} \sin\theta \qquad (6.25)

There are 2 local optima for each beam section when considering a solid beam cross section. This is explained by the fact that either the beam width or the beam height will be the dominantly thin optimized dimension, which is primarily constrained by the yield stress and aspect ratio constraints (Eq. (6.19)). The maximum tip deflection may therefore be achieved in two equally favorable directions if \theta is chosen as \pi/4. As an example, these two possible local minima are shown in Figure 31 for a solid cross-section beam.

Figure 31 Tip deflection contour plot as a function of beam section 5 height h and width w, with the yield stress and aspect ratio constraints indicated by dashed and dash-dotted lines respectively. Crosses indicate the two solutions (one dominated by bending about the y-axis, the other about the z-axis) for maximum tip displacement.

This yields a total of 2^5 = 32 possible local minima for a solid beam, corresponding to the possible combinations of tall thin sections (dominant bending in the z-direction) and flat thin sections (dominant bending in the y-direction). For the purpose of clearly identifying the global minimum in this work, we choose \theta = \pi/3, which biases the tip deflection in the y direction. With the two additional design variables t_{wi} and t_{hi} for each section, used to define the hollow beam, several additional local minima are induced in the system, but the global minimum remains biased in the y direction. The global solution to the hollow beam and the associated design vector and normalized constraint values are given in Table 11. The total tip deflection of this optimal design is 0.755 m.
It can be seen in Table 11 that the constraints g4 and g5, which require the hole to be at least 10 mm square, are inactive for all but the last section, and there only g5 is active.

Table 11 Stepped hollow beam global optimum
Section i                                | 1       | 2       | 3       | 4       | 5
delta_i (tip deflection contribution, m) | 0.2579  | 0.2107  | 0.1587  | 0.1004  | 0.0267
w (width, m)                             | 0.2722  | 0.2527  | 0.2296  | 0.2005  | 0.1136
h (height, m)                            | 0.0544  | 0.0505  | 0.0459  | 0.0401  | 0.0400
t_w (thickness, m)                       | 0.0680  | 0.0632  | 0.0574  | 0.0501  | 0.0284
t_h (thickness, m)                       | 0.0136  | 0.0126  | 0.0115  | 0.0100  | 0.0100
constraint g1                            | 0.0000  | 0.0000  | 0.0000  | 0.0000  | 0.0000
constraint g2                            | 0.0000  | 0.0000  | 0.0000  | 0.0000  | 0.0000
constraint g3                            | 0.0000  | 0.0000  | 0.0000  | 0.0000  | 0.0000
constraint g4                            | -5.8044 | -5.3167 | -4.7391 | -4.0136 | -1.8399
constraint g5                            | -0.3609 | -0.2633 | -0.1478 | -0.0027 | 0.0000
constraint g6                            | 0.0000  | 0.0000  | 0.0000  | 0.0000  | -0.4320
constraint g7                            | -0.9600 | -0.9600 | -0.9600 | -0.9600 | -0.9296

Quasiseparable Optimization Approach

By reformulating the equations governing the tip displacement (Eq. (6.22)) into the summation of individual contributions (Eq. (6.24)) we observe that they are of the quasiseparable form (Eq. (6.9)). This allows us to optimize the problem using the decomposed formulation of Eqs. (6.9)-(6.13),

\max_{w_1, h_1, t_{w1}, t_{h1}, \ldots, w_5, h_5, t_{w5}, t_{h5}} \sum_{i=1}^{5} \delta_i = f_0 + \sum_{i=1}^{5} d_i \qquad (6.26)

where the d_i are the target displacement contributions of each beam section to the total tip displacement. There is no global-variable contribution f_0 in this example, so Eq. (6.26) can be written simply as

\max_{d} \sum_{i=1}^{5} d_i \qquad (6.27)

subject to

\gamma_i^*(d_i) \ge 0, \quad i = 1,\ldots,5 \qquad (6.28)

where \gamma_i^* is the (global) solution of the i-th beam section optimization, with the section constraints as defined in Eq.
(6.21).

The i-th lower-level problem is

\gamma_i^*(d_i) = \max_{w_i, h_i, t_{wi}, t_{hi}} \gamma_i \qquad (6.29)

which is obtained by maximizing the smallest margin over the seven constraints of Eq. (6.21) together with the target surplus,

\gamma_i = \min\Big( \min_{1 \le j \le 7} \big({-g_{ij}(w_i, h_i, t_{wi}, t_{hi})}\big), \; f_i(w_i, h_i, t_{wi}, t_{hi}) - d_i \Big) \qquad (6.30)

Using the quasiseparable formulation of the optimization problem allows us to independently and concurrently optimize the sub-problems (sections) at the lower level. The solutions obtained in the lower-level optimizations are then coordinated through the upper-level optimization (Eq. (6.27)) of the targets in order to obtain the maximum tip deflection. The system-level and component-level optimization interactions are illustrated in Figure 32.

Figure 32 Quasiseparable optimization flow chart. The system-level optimizer maximizes the sum of the targets d_i and passes each d_i, together with the system variables, to the corresponding component-level optimizer (sections 1 through 5); each component optimizer returns its margin \gamma_i computed with Eq. (6.30) from f(s, l_i) and g_j(s, l_i).

Results

All-at-once Approach

For the all-at-once approach an optimization was performed with the Matlab fmincon function, an implementation of the sequential quadratic programming (SQP) algorithm, using the 20 design variables and 30 constraints. The optimization converged to the total tip deflection and sectional tip deflection contributions reported in Table 12. Although all stress and geometric constraints were satisfied, the tip deflection did not reach the optimal value of 0.7545. Repeated optimization from random starting points with the Matlab fmincon optimizer revealed that several such local optima exist. This is illustrated in Figure 33, which was generated by sorting the solutions of the 1000 optimizations in ascending order and then plotting them.

Figure 33 Results for 1000 all-at-once optimizations.
Tip deflection values are sorted in ascending order and plotted as a function of optimization number. Flat areas in the graph represent optimizations which converged to the same local minimum.

It can be seen that the SQP algorithm has severe difficulty obtaining the global minimum due to the presence of several local minima in the design space. The tip deflection contributions of the median optimum indicated in Figure 33 are reported in Table 12. Only a single optimization out of the 1000 converged to the global optimum.

Hybrid All-at-once Approach

To attempt to improve on the results obtained in the previous section (fmincon applied in the all-at-once optimization), a hybrid approach is applied to the problem. This approach uses a population-based search method, the PSO, coupled with the fmincon function: the solution vector obtained by the PSO is used as a starting point for the fmincon function. Due to the increased computational effort required to solve the problem, only 100 optimizations are performed. The median optimization result is given in Table 13, and the ascending fitness value plot is given in Figure 34. A marked increase in the number of optimizations converging to the global optimum is observed with the hybrid approach, as compared to fmincon alone. The number of optimizations which converged to the global optimum, however, is still less than 7% of the total, and applying this strategy comes at significant additional computational cost. Despite the improvement in the average tip deflection, this method did not obtain the global optimum of 0.7545; the maximum tip deflection attained was only 0.7278 (upper plateau in Figure 34).
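The sorted-solution plots of Figures 33 and 34 can be mimicked on a hypothetical multimodal function: run many local searches from random starting points and sort the final objective values, so that each local optimum appears as a plateau. The objective and the crude fixed-step descent below are illustrative stand-ins for the beam problem and fmincon, not the dissertation's actual code:

```python
import random

def f(x):
    # Hypothetical objective with two local minima, near x = -1 and x = +1.
    return (x**2 - 1)**2 + 0.3 * x

def local_min(x, step=0.01, iters=2000):
    # Crude fixed-step descent standing in for a gradient-based local solver.
    for _ in range(iters):
        if f(x + step) < f(x):
            x += step
        elif f(x - step) < f(x):
            x -= step
    return x

random.seed(0)
finals = sorted(f(local_min(random.uniform(-2.0, 2.0))) for _ in range(100))
plateaus = {round(v, 1) for v in finals}  # each plateau marks a distinct local minimum
assert len(plateaus) == 2
```

Sorting the final values makes the basin sizes visible: the width of each plateau is proportional to the probability of a random start converging to that local optimum.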
Table 12 All-at-once approach median solution
                     | Optimal tip deflection contributions | All-at-once tip deflection contributions
Section 1            | 0.2579 | 0.0001
Section 2            | 0.2107 | 0.2107
Section 3            | 0.1588 | 0.0000
Section 4            | 0.1004 | 0.0599
Section 5            | 0.0267 | 0.0000
Total tip deflection | 0.7545 | 0.2707

Figure 34 Hybrid PSO-fmincon strategy for 100 optimizations.

Table 13 Hybrid, all-at-once median solution
                     | Optimal tip deflection contributions | Hybrid all-at-once tip deflection contributions
Section 1            | 0.2579 | 0.2579
Section 2            | 0.2107 | 0.1271
Section 3            | 0.1588 | 0.1587
Section 4            | 0.1004 | 0.0001
Section 5            | 0.0267 | 0.0000
Total tip deflection | 0.7545 | 0.5439

Quasiseparable Approach

The quasiseparable optimization strategy is applied to the tip deflection problem by applying the Matlab fmincon optimizer at both the upper and the lower level. When optimizing a sub-problem using the fmincon function, we observed that a substantial fraction of the optimizations converged to the global optimum for that section (Figure 35). The other plateau indicates convergence to the less favorable optimum (as previously indicated in Figure 31), which has the exact same cross-section design configuration but with the section rotated 90 degrees, favoring a displacement in the z-direction. A total of 140 runs out of the 1000 converged to the global optimum in this component optimization. This fraction of converged optimizations corresponds to a 0.14 convergence probability, and led us to apply the multiple-run strategy proposed in Chapter 5. The local SQP optimizer in the fmincon function is applied repeatedly to the sub-optimization, for a total of 50 optimizations. This repeated use of fmincon proved to be far more computationally efficient at the lower level than using a hybrid method such as PSO-fmincon.
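The two-level scheme used here can be sketched on a hypothetical toy with two decoupled subsystems and no system variables (as in the beam example, where f_0 is absent). Exhaustive grid searches stand in for the fmincon and PSO optimizers, and all functions below are illustrative:

```python
# Two-level quasiseparable sketch in the spirit of Eqs. (6.27)-(6.30): the
# lower level maximizes the margin gamma_i = min(-g_i(l), f_i(l) - d_i) for a
# given target d_i; the upper level pushes each target as high as the
# subsystem margin allows and sums the targets.
def margin(f_i, g_i, d):
    grid = [j / 1000.0 for j in range(1001)]  # stand-in for a global solver
    return max(min(-g_i(l), f_i(l) - d) for l in grid)

subsystems = [
    (lambda l: l, lambda l: l - 0.40),  # contribution l, feasible for l <= 0.40
    (lambda l: l, lambda l: l - 0.70),  # contribution l, feasible for l <= 0.70
]

best_targets = []
for f_i, g_i in subsystems:
    feasible = [d / 100.0 for d in range(101) if margin(f_i, g_i, d / 100.0) >= 0]
    best_targets.append(max(feasible))

print(best_targets)  # targets reach the per-subsystem limits: [0.4, 0.7]
```

Because the margin of each subsystem depends only on its own target, the coordination problem decouples here; in the general formulation the upper level must also adjust shared system variables, which couples the subsystem margins.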
Applying Eq. (5.5) under the assumption that all runs are independent yields a failure probability of approximately 5.31e-4 for each multiple run, and a combined 0.27% over all sections during the system-level optimization. This results in a 7.7% probability of failure for a problem optimization with 30 upper-level iterations (the typical number of iterations required for this problem). The hybrid method was also applied to the lower-level optimization sub-problem and yielded the true optimum reliably, but took considerable time because of the high number of fitness evaluations required by the PSO, the effect of which is exacerbated by the multiple iterations of the upper-level optimizer. From Figure 36 it can be seen that the constraint margins are reduced below 10^-10 after only 5 iterations of the upper-level optimizer.

Approximation of Constraint Margins

Because of the expense of the repeated lower-level optimizations it may be advantageous to create a surrogate approximation for calculating the constraint margin values for each beam section, similar to an approach followed by Liu et al. [139]. This surrogate serves as a simple lookup of the maximized constraint margin as a function of the section's target tip displacement at the component level. For each beam section a surrogate function is created by sampling 100 target tip deflection points and fitting a one-dimensional spline to the set of constraint margin response values.

Figure 35 Repeated optimizations of the section 1 sub-problem using the fmincon function.

This reduces the cost of obtaining margin values to a simple interpolation calculation, in place of a global optimization.
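The surrogate lookup can be sketched as follows; a linear interpolant stands in for the one-dimensional spline, and the margin function here is a hypothetical placeholder for the lower-level solve, not the beam's actual response:

```python
# Sketch of the surrogate idea: pre-sample the maximized margin gamma_i as a
# function of the target d_i, then answer later margin queries by simple
# interpolation instead of re-running the lower-level optimization.
def true_margin(d):
    # Placeholder for a lower-level global solve (illustrative only).
    return (0.4 - d) / 2.0

samples = [(k / 99.0, true_margin(k / 99.0)) for k in range(100)]  # 100 points

def surrogate_margin(d):
    for (d0, m0), (d1, m1) in zip(samples, samples[1:]):
        if d0 <= d <= d1:                # locate the bracketing interval
            t = (d - d0) / (d1 - d0)
            return m0 + t * (m1 - m0)    # linear interpolation
    raise ValueError("target outside sampled range")

print(abs(surrogate_margin(0.123) - true_margin(0.123)) < 1e-9)  # True
```

Each upper-level margin query then costs an interpolation instead of an optimization, which is what makes expensive upper-level optimizers such as the PSO affordable.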
The reduced computational cost at the lower level allows the use of robust but expensive optimizers, such as the PSO algorithm, at the upper level.

Table 14 Quasiseparable optimization result
                     | Optimal tip deflection contributions | Quasiseparable tip deflection contributions
Section 1 target     | 0.2579 | 0.2579
Section 2 target     | 0.2107 | 0.2107
Section 3 target     | 0.1588 | 0.1588
Section 4 target     | 0.1004 | 0.1004
Section 5 target     | 0.0267 | 0.0267
Total tip deflection | 0.7545 | 0.7545

Figure 36 Summed budget value and constraint margins for the individual sections.

The exact optimum design variables are all found to within 10^-6 m tolerances. The solution is also illustrated as a sequence of cross sections in Figure 38.

Figure 37 Global and local optimum in the section 1 sub-optimization (global optimum delta_1 = 0.258, local optimum delta_1 = 0.156). Scale is 0.1:1.

Numerical results obtained by using the PSO with the surrogate approach on this problem are presented in Figure 39 and Figure 40. From these figures it can be observed that the target tip displacements for all sections reach steady values after approximately 4000 function evaluations at the upper level.

Figure 38 Decomposed cross-section solution. Scale is 0.1:1.

The search then enters a refinement stage in which the constraint margins continue to be adjusted until about 8000 fitness evaluations. The final tip displacement contributions at 10000 fitness evaluations are reported in Table 15. The values in Table 15 indicate that the tip deflection contributions are all slightly overestimated, due to the use of a penalty method in the upper-level optimization, required for the accommodation of constraints.

Discussion

Repeated application of the all-at-once approach obtained only a single solution out of the 1000 optimizations which converged to the global optimum, giving a probability of convergence of 0.1%.
The quasiseparable optimization strategy combined with a multi-run approach at the component level yields a probability of convergence of 92.3%. The quasiseparable strategy requires (for a typical optimization of this problem) 7750 component-level optimizations, but these are of reduced dimensionality (and computational effort) compared to the original problem formulation.

Table 15 Surrogate lower-level approximation optimization results
                     | Optimal tip deflection contributions | Quasiseparable tip deflection contributions
Section 1 target     | 0.2579 | 0.2580
Section 2 target     | 0.2107 | 0.2107
Section 3 target     | 0.1588 | 0.1588
Section 4 target     | 0.1004 | 0.1005
Section 5 target     | 0.0267 | 0.0268
Total tip deflection | 0.7545 | 0.7547

Figure 39 Target tip deflection value histories as a function of upper-level fitness evaluations.

Although the component-level optimizations require reduced effort, they are performed repeatedly across the iterations of the upper level. It may therefore be advantageous for some types of problems to approximate the margin calculated by the lower-level optimization using a surrogate model. For this particular structural problem a very simple surrogate model was constructed: a one-dimensional spline for each section of the beam.

Figure 40 Constraint margin value histories as a function of upper-level function evaluations.

These models, used in place of the component-level optimizations, maintain a high level of reliability and deliver a reasonably accurate solution.
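The reliability figures quoted in this chapter follow from independent-run arithmetic on the numbers reported earlier (a 0.14 per-run convergence probability, 50 local runs per section, 5 sections, about 30 upper-level iterations):

```python
# Multi-run reliability arithmetic, assuming independent runs (Eq. (5.5)).
p, runs, sections, iterations = 0.14, 50, 5, 30

fail_section = (1 - p) ** runs                       # one 50-run set misses
fail_iteration = 1 - (1 - fail_section) ** sections  # any of 5 sections misses
fail_total = 1 - (1 - fail_iteration) ** iterations  # any miss in 30 iterations

print(fail_section)    # about 5.31e-4
print(fail_iteration)  # about 0.27%
print(fail_total)      # about 7.7%
print(1 - fail_total)  # overall convergence probability, about 92.3%
```

The independence assumption is the same one made in the text; any correlation between restarts of the local optimizer would make these estimates optimistic.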
The deviation of the solution obtained using the surrogate approach (Table 15) was caused by the penalty multiplier in the constraint handling scheme being set too low, and not by the surrogate model itself. This was verified by using a set of increasingly higher penalty multipliers, which improved the solution to within the same accuracy as the surrogate model (which was calculated to within 10^-6). Higher accuracies, however, required additional computational effort.

Conclusions

This method shows promise in that it can decompose weakly coupled structural problems and attack the sub-problems on a wider work front, using parallel processing and a two-level optimization scheme. The results obtained for this approach show that the strategy can recover the same solution as the original problem formulation. It is also shown that the strategy transfers the global search effort to the lower-level component optimization, which is of lower dimensionality, thereby making the problem more tractable to the optimizer. A vastly improved probability of obtaining the global optimum is made possible by utilizing the multi-run approach presented in Chapter 5 for the component-level global optimization. To further reduce computational effort, it is demonstrated that for some problem types the component-level optimizations may be replaced with a surrogate model. This approach reliably found the global optimum of the example problem with high accuracy. The quasiseparable decomposition and optimization approach presented in this chapter is shown to be a robust and efficient means of optimizing structural problems.

CHAPTER 7
CONCLUSIONS

The approaches investigated in this dissertation yield significantly increased throughput in global optimization by exploiting parallelism at different levels. This allows the user to solve more complex and computationally demanding problems than previously possible by fully utilizing parallel resources.
These strategies are increasingly relevant because of the rapid development of parallel processing infrastructure and its ever more common availability.

Parallelism by Exploitation of Optimization Algorithm Structure

The PSO algorithm applied in the concurrent processing strategies outlined in Chapters 3-5 is a useful candidate global optimization method for continuous variable problems. The PSO also has the advantage of insensitivity to the scaling of the design variables. The technique of exploiting parallelism in this algorithm presented in Chapter 4 is also applicable to other population-based methods such as genetic algorithms. The parallel particle swarm optimization algorithm exhibits excellent parallel performance when the individual fitness evaluations require the same amount of time. For optimization problems where the time required for each fitness evaluation varies substantially, an asynchronous implementation may be more computationally efficient.

Parallelism through Multiple Independent Optimizations

The decomposition and multiple independent optimization methods presented in Chapter 5 may be applied using any optimization method appropriate to the type of problem. For the set of large scale optimization problems evaluated with the PSO, the multi-run strategy with small PSO populations delivers a higher global convergence probability than a single run with a large population and an equal number of fitness evaluations. On both serial and parallel machines, a fraction of the allocated budget of fitness evaluations or computer time is required to evaluate the optimizer/problem behavior. This fraction of the budget is used to calculate the total number of runs which will yield the most efficient distribution of the remainder of the budget.
A Bayesian convergence estimate, updated as the optimizations complete, may be used to stop a serial or parallel optimization when the available time is traded off against a confidence measure. Very high global convergence probabilities can be achieved in a limited time span on a massively parallel machine using the multi-run strategy, making this method a useful tool for solving difficult and computationally intensive large scale engineering problems.

Parallelism through Concurrent Optimization of Decomposed Problems

The application of the quasiseparable decomposition method yields an efficient manner of distributing the global search effort required by a problem into several reduced and manageable portions. These decomposed elements of the problem may be optimized concurrently, allowing faster optimization times compared to an all-at-once solution of the problem. The use of multiple independent optimizations at the component level also improved the reliability and efficiency of the strategy, and demonstrates that two of the methods of parallelism may be effectively combined.

Future Directions

The multiple independent optimization run methodology should be further refined by studying multiple stopping criteria that may be used to terminate the exploratory optimization run. The Bayesian global convergence estimation may also be improved by selecting distributions other than the Beta distribution used in this work, or by performing a parameter sensitivity study on a and b, which may yield a more robust or accurate estimation.

Summary

All strategies presented in this dissertation yield improved parallelism in various aspects of structural optimization. These strategies may be effectively combined, as shown in the example presented in Chapter 6, obtaining increased levels of efficiency, robustness, and utilization of parallel resources in the optimization of large scale problems.
This will allow the use of large numbers of processors to solve optimization problems, further increasing throughput.

APPENDIX A
ANALYTICAL TEST PROBLEM SET

Griewank

Objective function:

f(\mathbf{x}) = \sum_{i=1}^{n} \frac{x_i^2}{d} - \prod_{i=1}^{n} \cos\left(\frac{x_i}{\sqrt{i}}\right) + 1 \qquad (1)

with n = 10 and d = 4000.

Search domain: D = \{ (x_1, \ldots, x_{10}) \in R^{10} : -600 \le x_i \le 600, \; i = 1, 2, \ldots, 10 \}

Solution: x* = (0.0, \ldots, 0.0), f(x*) = 0

Hartman 6

Objective function:

f(\mathbf{x}) = -\sum_{i=1}^{m} c_i \exp\left( -\sum_{j=1}^{6} a_{ij} (x_j - p_{ij})^2 \right) \qquad (2)

Search domain: D = \{ (x_1, \ldots, x_6) \in R^6 : 0 \le x_i \le 1, \; i = 1, \ldots, 6 \}

Solution (with m = 4): x* = (0.2017, 0.1500, 0.4769, 0.2753, 0.3117, 0.6573), f(x*) = -3.322368

See Table 16 for the values of a_ij, c_i, and p_ij.

Shekel 10

Objective function:

f(\mathbf{x}) = -\sum_{i=1}^{m} \frac{1}{(\mathbf{x} - \mathbf{a}_i)^T (\mathbf{x} - \mathbf{a}_i) + c_i} \qquad (3)

Search domain: D = \{ \mathbf{x} \in R^4 : 0 \le x_i \le 10, \; i = 1, \ldots, 4 \}

Solution (with m = 10): x* = (4.00074671, 4.00059326, 3.99966290, 3.99950981), f(x*) = -10.536410

See Table 17 for the values of a_ij and c_i.

Table 16 Hartman problem constants
a_ij (rows i = 1..4) | c_i
10.0  3.0   17.0  3.5   1.7   8.0  | 1.0
0.05  10.0  17.0  0.1   8.0   14.0 | 1.2
3.0   3.5   1.7   10.0  17.0  8.0  | 3.0
17.0  8.0   0.05  10.0  0.1   14.0 | 3.2
p_ij (rows i = 1..4)
0.1312  0.1696  0.5569  0.0124  0.8283  0.5886
0.2329  0.4135  0.8307  0.3736  0.1004  0.9991
0.2348  0.1451  0.3522  0.2883  0.3047  0.6650
0.4047  0.8828  0.8732  0.5743  0.1091  0.0381

Table 17 Shekel problem constants
i  | a_i               | c_i
1  | 4.0 4.0 4.0 4.0   | 0.1
2  | 1.0 1.0 1.0 1.0   | 0.2
3  | 8.0 8.0 8.0 8.0   | 0.2
4  | 6.0 6.0 6.0 6.0   | 0.4
5  | 3.0 7.0 3.0 7.0   | 0.4
6  | 2.0 9.0 2.0 9.0   | 0.6
7  | 5.0 5.0 3.0 3.0   | 0.3
8  | 8.0 1.0 8.0 1.0   | 0.7
9  | 6.0 2.0 6.0 2.0   | 0.5
10 | 7.0 3.6 7.0 3.6   | 0.5

APPENDIX B
MONTE CARLO VERIFICATION OF GLOBAL CONVERGENCE PROBABILITY

Figure 41 Predicted and Monte Carlo sampled convergence probability Pc for 5 independent optimization runs for the Griewank problem. Each optimization run is limited to 40,000 fitness evaluations with 20 particles.
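The Appendix B verification can be sketched in miniature: build a pool of simulated runs that each converge with probability p, sample tuples from the pool, and compare the sampled multi-run convergence ratio with the prediction 1 - (1 - p)^n. The pool below is synthetic; the dissertation's pools come from actual PSO runs:

```python
import random

# Miniature Monte Carlo check of the multi-run convergence probability:
# sample many quintuplets from a pool of 1000 simulated runs and count the
# fraction with at least one converged run.
random.seed(1)
p, pool_size, n, trials = 0.3, 1000, 5, 20000

pool = [random.random() < p for _ in range(pool_size)]  # True = run converged

hits = sum(any(random.sample(pool, n)) for _ in range(trials))
sampled_Pc = hits / trials
predicted_Pc = 1 - (1 - p) ** n

print(round(sampled_Pc, 3), round(predicted_Pc, 3))  # close, within sampling noise
```

The small gap between the two values comes from the finite pool (its realized convergence ratio differs slightly from p) and from the finite number of sampled tuples.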
Figure 42 Predicted and Monte Carlo sampled convergence probability Pc for 12 independent optimization runs for the Griewank problem. Each optimization run is limited to 16,000 fitness evaluations with 20 particles.

Figure 43 Monte Carlo sampled convergence probability Pc for sets of multiple runs (1, 2, 5, 10, and 12 optimization runs) for the Griewank problem.

Figure 44 Monte Carlo sampled convergence probability Pc using information from exploratory optimizations stopped using a rate-of-change stopping condition for the Griewank, Hartman, and Shekel problems. A solid line denotes the longest exploratory run in the pool of 1000 optimizations, and a dashed line denotes the shortest.

Figure 45 Bayesian Pc comparison for the Griewank, Hartman, and Shekel problems (a = 1, b = 5).

LIST OF REFERENCES

1. Culler, D. E. and Singh, J.
BIOGRAPHICAL SKETCH

Jaco Francois Schutte was born in Pretoria, South Africa, on 12 November 1975. He attended the University of Pretoria, where he received both bachelor's (1999) and master's (2001) degrees in mechanical engineering. He also received the Sasol award for best master's student from the Faculty of Engineering at the University of Pretoria. Mr. Schutte continued his graduate studies at the University of Florida in Gainesville, Florida, in pursuit of a doctoral degree.