
Citation 
 Permanent Link:
 https://ufdc.ufl.edu/UF00097388/00001
Material Information
 Title:
 Toward an extrapolation of the simulated annealing convergence theory onto the simple genetic algorithm
 Creator:
 Davis, Thomas E. ( Dissertant )
Principe, Jose C. ( Thesis advisor )
Childers, Donald G. ( Reviewer )
Arroyo, Antonio A. ( Reviewer )
Rao, Murali ( Reviewer )
Chenette, Eugene R. ( Reviewer )
Phillips, Winfred M. ( Degree grantor )
 Place of Publication:
 Gainesville, Fla.
 Publisher:
 University of Florida
 Publication Date:
 1991
 Copyright Date:
 1991
 Language:
 English
 Physical Description:
 viii, 166 leaves : ill. ; 29 cm.
Subjects
 Subjects / Keywords:
 Crossovers ( jstor )
Determinants ( jstor )
Ergodic theory ( jstor )
Genetic algorithms ( jstor )
Genetic mutation ( jstor )
Integers ( jstor )
Markov chains ( jstor )
Matrices ( jstor )
Polynomials ( jstor )
Simulated annealing ( jstor )
Algorithms ( lcsh )
Combinatorial optimization ( lcsh )
Dissertations, Academic -- Electrical Engineering -- UF
Electrical Engineering thesis Ph. D
Simulated annealing (Mathematics) ( lcsh )
 Genre:
 bibliography ( marcgt )
theses ( marcgt )
nonfiction ( marcgt )
Notes
 Abstract:
 Simulated annealing and the genetic algorithm are stochastic relaxation search techniques
suitable for application to a wide variety of combinatorial complexity nonconvex
optimization problems. Each produces a sequence of candidate solutions (or populations
of candidate solutions) to the underlying optimization problem, and the purpose of both
algorithms is to generate sequences biased toward solutions which optimize the objective
function.
The appeal of simulated annealing is that it provides asymptotic convergence to a
globally optimal solution. A substantial body of knowledge exists concerning the algorithm
convergence behavior. It is based upon a nonstationary Markov chain algorithm
model. No genetic algorithm model comparable in scope exists in the literature. This
work constitutes an attempt to provide such a model and accompanying convergence
theory by extrapolating the simulated annealing results onto the genetic algorithm. A prerequisite,
developed herein, is a nonstationary Markov chain genetic algorithm model.
The essence of the simulated annealing theory is demonstration of (1) existence of a
unique asymptotic probability distribution (stationary distribution) for the stationary Markov
chain corresponding to every strictly positive constant value of an algorithm control
parameter (absolute temperature), (2) existence of a stationary distribution limit as the
control parameter approaches zero, (3) the desired behavior of the stationary distribution
limit (i.e. optimal solution with probability one) and (4) sufficient conditions on the algorithm
control parameter to ensure that the nonstationary algorithm achieves (asymptotically)
the limiting distribution. With the exception of (3), this work adapts that
methodology to the genetic algorithm Markov chain model employing a genetic operator
parameter (mutation probability) as the algorithm control parameter. The results include a
mutation probability control parameter bound analogous to (and asymptotically superior
to) the conventional simulated annealing parameter bounds, and a framework for representing
the genetic algorithm stationary distribution components at all consistent fixed
control parameter values, including zero.
The genetic algorithm stationary distribution limit has nonzero components corresponding
to all solutions. Thus, the simulated annealing global optimality convergence
result does not extrapolate. However, both empirical and theoretical evidence is provided
which suggests that the desired limiting behavior can be approached by suitably adjusting
the algorithm parameters.
 Thesis:
 Thesis (Ph. D.)--University of Florida, 1991.
 Bibliography:
 Includes bibliographical references (leaves 163-165).
 Additional Physical Form:
 Also available on World Wide Web
 General Note:
 Typescript.
 General Note:
 Vita.
 Statement of Responsibility:
 by Thomas E. Davis.
Record Information
 Source Institution:
 University of Florida
 Holding Location:
 University of Florida
 Rights Management:
 Copyright [name of dissertation author]. Permission granted to the University of Florida to digitize, archive and distribute this item for nonprofit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
 Resource Identifier:
 026242136 ( AlephBibNum )
25046248 ( OCLC ) AHZ5752 ( NOTIS )


Full Text 
TOWARD AN EXTRAPOLATION OF THE SIMULATED ANNEALING
CONVERGENCE THEORY ONTO THE SIMPLE GENETIC ALGORITHM
By
THOMAS E. DAVIS
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1991
ACKNOWLEDGEMENTS
The author is extremely fortunate in having the assistance of several talented academicians
during the conduct of the research program reported in this dissertation. Notably,
Professor Jose Principe, who supervised this work and contributed several key ideas
developed herein, proved a very valuable source of encouragement and support. Also,
Professor Murali Rao assisted very substantially in enforcing mathematical rigor, especially
in the formulation of the Markov chain appendices. Professors Antonio Arroyo and
Donald Childers, who served on the committee overseeing this work, are remembered
fondly for the constructive comments they provided during the visits the author made to
Gainesville while conducting this research, as well as for productive associations while
the author was in residence. Also, the assistance provided by Professor Eugene Chenette
from the Eglin Graduate Center in dealing with a variety of administrative complications,
as well as his service on the committee overseeing this work, is graciously acknowledged.
Additionally, the generous support of the US Air Force Armament Laboratory to
this work is sincerely appreciated. The author's management chain, notably Mr. Lynn
Deibler, Lt. Col. Rex Franklin, Dr. Eugene Youngblood and Lt. Col. Tom Callen, provided
continual encouragement and working condition flexibility. Without their support,
this activity would likely not have been possible.
The Computer Science Directorate at Eglin provided some exceptionally valuable
computer support, on the Eglin AFB Cray Y-MP, under very flexible conditions. Some of
the insights gained during the conduct of this research would be very difficult, and perhaps
impossible, to attain through any method other than simulation. Members of the
staff at the Computer Science Directorate who contributed significantly, especially in
helping the VMS-inclined author through the UNIX maze, include Mr. Eddie Blackwell,
Mr. Ben McKinnon and Mr. Danny Majors. Mr. Bill Clements, who made the computing
resources available, and Mr. Calvin George, who helped the author arrange the support,
are also gratefully acknowledged.
Finally, the author wishes to thank Sumiko, who entered his life during the conduct
of this research program, for the support and understanding whose need she never fails to
anticipate.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
ABSTRACT
SECTIONS
1 INTRODUCTION
1.1 Non-Convex Combinatorial Optimization and Stochastic Search Algorithms
1.2 Organization
2 SIMULATED ANNEALING
2.1 Overview
2.2 Statistical Mechanics and Annealing of Solids
2.3 Combinatorial Optimization by Simulated Annealing
2.4 Theoretical Foundations of Simulated Annealing
3 THE GENETIC ALGORITHM
3.1 Overview
3.2 The Simple Genetic Algorithm Operators
3.3 Building Blocks, Schemata and the Fundamental Theorem
3.4 An Assessment of the Genetic Algorithm Theoretical Foundation
4 A MARKOV CHAIN MODEL OF THE SIMPLE GENETIC ALGORITHM
4.1 Overview
4.2 The Markov Chain Model
4.3 The State Behavior of the Simple Genetic Algorithm
5 SOME EMPIRICAL RESULTS
5.1 Overview
5.2 State Space Enumeration
5.3 Reward Function Data
5.4 Conditional Probabilities vs
5.5 Converged Limiting Stationary Distributions
6 THE CRAMER'S RULE FORMULATION OF THE STATIONARY DISTRIBUTION
6.1 Overview
6.2 The Stationary Distribution Description
6.3 Positivity of the Stationary Distribution Components
6.4 The Indeterminate Form at a = 0
7 THE ZERO MUTATION PROBABILITY STATIONARY DISTRIBUTION LIMIT
7.1 Overview
7.2 Functional Form of the Stationary Distribution
7.3 The Absorbing State Rows of PI and IP1,I
7.4 Reformulation of Propositions 6.7 and 6.8
7.5 The Stationary Distribution Limit
8 A MONOTONIC MUTATION PROBABILITY ERGODICITY BOUND
8.1 Overview
8.2 A Weak Ergodicity Bound
8.3 Strong Ergodicity
8.4 Comparison With the Simulated Annealing Parameter Bound
9 REPRESENTATION OF THE STATIONARY DISTRIBUTION SOLUTION
9.1 Overview
9.2 The Limiting Case a = 1
9.3 The General Case 0 < a < 1
9.4 The Limiting Case 0
9.5 Extending the Stationary Distribution Representation
10 CONCLUSIONS AND FUTURE DIRECTION
10.1 Summary
10.2 Contributions of the Research
10.3 Future Direction
APPENDICES
A DISCRETE TIME FINITE STATE MARKOV CHAINS
A.1 Introduction
A.2 Elementary Definitions
A.3 Time-Homogeneous Markov Chains
A.4 Inhomogeneous Markov Chains
B THE PERRON-FROBENIUS THEOREM AND STOCHASTIC MATRICES
B.1 Introduction
B.2 The Perron-Frobenius Theorem and Ancillary Results for Primitive Matrices
B.3 The Perron-Frobenius Theorem for Stochastic Matrices
C VANDERMONDE DETERMINANTS, SYMMETRIC AND ALTERNATING POLYNOMIALS
C.1 Introduction
C.2 Evaluation of Vandermonde Determinants
C.3 Symmetric (and Alternating) Polynomials
C.4 Quasi-Symmetric (and Quasi-Alternating) Polynomials
D COMPUTER LISTINGS
D.1 Introduction
D.2 Main Program Listings
D.3 Library Listings
REFERENCES
BIOGRAPHICAL SKETCH
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
TOWARD AN EXTRAPOLATION OF THE SIMULATED ANNEALING
CONVERGENCE THEORY ONTO THE SIMPLE GENETIC ALGORITHM
By
THOMAS E. DAVIS
May 1991
Chairman: Professor Jose C. Principe
Major Department: Electrical Engineering
Simulated annealing and the genetic algorithm are stochastic relaxation search techniques
suitable for application to a wide variety of combinatorial complexity nonconvex
optimization problems. Each produces a sequence of candidate solutions (or populations
of candidate solutions) to the underlying optimization problem, and the purpose of both
algorithms is to generate sequences biased toward solutions which optimize the objective
function.
The appeal of simulated annealing is that it provides asymptotic convergence to a
globally optimal solution. A substantial body of knowledge exists concerning the algorithm
convergence behavior. It is based upon a nonstationary Markov chain algorithm
model. No genetic algorithm model comparable in scope exists in the literature. This
work constitutes an attempt to provide such a model and accompanying convergence
theory by extrapolating the simulated annealing results onto the genetic algorithm. A
prerequisite, developed herein, is a nonstationary Markov chain genetic algorithm model.
The essence of the simulated annealing theory is demonstration of (1) existence of a
unique asymptotic probability distribution (stationary distribution) for the stationary Markov
chain corresponding to every strictly positive constant value of an algorithm control
parameter (absolute temperature), (2) existence of a stationary distribution limit as the
control parameter approaches zero, (3) the desired behavior of the stationary distribution
limit (i.e. optimal solution with probability one) and (4) sufficient conditions on the algorithm
control parameter to ensure that the nonstationary algorithm achieves (asymptotically)
the limiting distribution. With the exception of (3), this work adapts that
methodology to the genetic algorithm Markov chain model employing a genetic operator
parameter (mutation probability) as the algorithm control parameter. The results include a
mutation probability control parameter bound analogous to (and asymptotically superior
to) the conventional simulated annealing parameter bounds, and a framework for representing
the genetic algorithm stationary distribution components at all consistent fixed
control parameter values, including zero.
The genetic algorithm stationary distribution limit has nonzero components corresponding
to all solutions. Thus, the simulated annealing global optimality convergence
result does not extrapolate. However, both empirical and theoretical evidence is provided
which suggests that the desired limiting behavior can be approached by suitably adjusting
the algorithm parameters.
SECTION 1
INTRODUCTION
1.1 Non-Convex Combinatorial Optimization and Stochastic Search Algorithms
A wide variety of engineering applications lend themselves to formulations which
require the solution of combinatorial optimization problems. Typically, the optimization
problem is nonconvex and is defined over a very high dimensionality search space (e.g.
inverse vision problems, in which an image array of 512 x 512 pixels at 8 bits/pixel might
be encountered, resulting in a search space dimensionality of 2M). Consequently, direct
solution is usually intractable.
An alternative to direct solution is to select one of a variety of iterative improvement
solution techniques, usually some variant of gradient search. But by definition,
deterministic iterative improvement techniques terminate in local extrema, and they
ordinarily provide no means of assessing the amount by which the selected local extremum
deviates from the global extremum. A typical means of avoiding local extrema
entrapment is to implement the iterative improvement solution method stochastically.
The most commonly employed stochastic algorithm approach to combinatorial optimization
is simulated annealing [KiGe83, LaAa87], which is also sometimes referred to
as probabilistic hill climbing [RoSa85]. It exploits the analogy of combinatorial
optimization to the annealing of crystalline solids, in which a solid is cooled very gradually
from some elevated temperature and thereby allowed to relax toward its low energy
states. The appeal of the algorithm class derives from the fact that provided certain
constraints on an algorithm control parameter (analogous to absolute temperature) are
observed, asymptotic convergence to a global extremum is guaranteed.
The key limitation of simulated annealing is that the convergence behavior is
asymptotic. Thus global optimality is obtained only after an infinite number of algorithm
iterations. The rate of convergence to optimality is determined by a nonnegative algorithm
control parameter whose ideal value is zero and which must observe a lower bound
in order to assure coherent algorithm behavior. The best available known bound for the
parameter, the annealing schedule bound, is of the form K/log(k) where k is the iteration
index and K is a parameter independent of k [GeGe84, MiRo85].
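To give a feel for how slowly a bound of the form K/log(k) decays, the following minimal sketch tabulates it for a few iteration counts. The function name `log_schedule`, the choice K = 1 and the use of the natural logarithm are illustrative assumptions, not values taken from the cited references.

```python
import math

def log_schedule(k, K=1.0):
    """Value of a K/log(k) schedule at iteration k (assumed natural log, k >= 2)."""
    if k < 2:
        raise ValueError("k must be at least 2 so that log(k) > 0")
    return K / math.log(k)

# Tabulate the bound over six orders of magnitude of the iteration index.
for k in (2, 10, 100, 10_000, 1_000_000):
    print(k, round(log_schedule(k), 4))
```

Under these assumptions the bound after a million iterations is still roughly 7% of K, which illustrates why the asymptotic nature of the convergence is regarded as the key limitation.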
Another combinatorial optimization stochastic search technique reported in the literature
is the genetic algorithm [Davi87, Gold83, Gold89a, Gref85, Gref87]. It emulates
the evolution of biological systems by employing a set of stochastic operators (e.g.
reproduction, crossover and mutation) to transform a population of candidate solutions to
the underlying optimization problem into a new (descendant) population. It has some features
which suggest that it may provide significantly improved convergence behavior
over simulated annealing on certain types of optimization problems. However, the nature
of the genetic operators and their influence on algorithm behavior is only understood in
general terms. No complete theoretical model of the algorithm exists in the literature. The
fundamental goal of the work reported here is to provide a theoretical framework for analyzing
the algorithm based upon the asymptotic probability distribution of the solution
sequences which it produces. The work reported herein includes significant progress on
the key intermediate steps to achieving that goal.
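The transformation of one population into its descendant population can be sketched as below. This is only an illustration under stated assumptions: individuals are fixed-length bit tuples, reproduction is fitness-proportionate selection, crossover is one-point, and the function name `next_generation`, the parameter names and their default values are all hypothetical rather than taken from this work.

```python
import random

def next_generation(population, fitness, p_crossover=0.6, p_mutation=0.01, rng=random):
    """Return a descendant population of the same size (individuals are bit tuples)."""
    weights = [fitness(ind) for ind in population]  # assumed strictly positive
    length = len(population[0])
    children = []
    while len(children) < len(population):
        # Reproduction: draw two parents with probability proportional to fitness.
        mom, dad = rng.choices(population, weights=weights, k=2)
        # Crossover: with probability p_crossover, swap tails at a random cut point.
        if length > 1 and rng.random() < p_crossover:
            cut = rng.randrange(1, length)
            mom, dad = mom[:cut] + dad[cut:], dad[:cut] + mom[cut:]
        for child in (mom, dad):
            # Mutation: flip each bit independently with probability p_mutation.
            children.append(tuple(b ^ 1 if rng.random() < p_mutation else b for b in child))
    return children[:len(population)]

# Usage: evolve six 8-bit individuals toward strings with many ones.
rng = random.Random(0)
pop = [tuple(rng.randint(0, 1) for _ in range(8)) for _ in range(6)]
for _ in range(20):
    pop = next_generation(pop, lambda ind: 1 + sum(ind), rng=rng)
print(pop)
```

Setting `p_mutation` to zero in this sketch means no new bit values can enter the population, which is one informal way to see why the mutation probability is a natural candidate for a control parameter.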
1.2 Organization
The remaining sections of this paper are organized as follows. Sections 2 and 3 are
background reviews of the simulated annealing and genetic algorithm literature respectively.
Section 2 places considerable emphasis on the methodology employed to yield the
asymptotic convergence results which are the theoretical foundation of the simulated
annealing algorithm. That methodology appeals heavily to the theory of inhomogeneous
(nonstationary) Markov chains and their asymptotic state probability distributions. The
essence of the simulated annealing convergence theory is a set of sufficient conditions to
ensure that the asymptotic probability distribution of the Markov chain which represents
the algorithm is independent of its starting state and has probability zero for all states
corresponding to suboptimal solutions.
Section 3 begins with a verbal description of the three fundamental stochastic
operators employed in genetic algorithms (i.e. reproduction, crossover and mutation), and
proceeds to review the existing theoretical foundation of the algorithm class. A conclusion
of that section is that while certain important theoretical results exist, notably the
so-called schema theorem and some work on a problem construct referred to as the minimal
deceptive problem, the genetic algorithm lacks the theoretical foundation necessary to
either compare it with simulated annealing or to answer key questions concerning the
design of a genetic algorithm for a given application.
The author's contribution to this work begins with Section 4. The major result of
that section is a very general, nonstationary Markov chain model of the variants of the
genetic algorithm which employ combinations of the three fundamental genetic algorithm
operators. The model is tailored to resemble that employed in developing the simulated
annealing methodology, and in that regard, the genetic algorithm mutation operator is
shown to provide a function very similar to that of the simulated annealing absolute
temperature analog. Specifically, the stationary algorithm corresponding to every strictly
positive constant value of the mutation probability parameter possesses a unique
asymptotic probability distribution (stationary distribution).
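The idea of a unique stationary distribution that does not depend on the starting state can be illustrated on a toy chain. The sketch below is not the genetic algorithm model of Section 4; the 3-state transition matrix `P` and the function names are hypothetical, chosen only because the limit can be checked by hand.

```python
def step(dist, P):
    """One step of the chain: row vector dist multiplied by transition matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def stationary(P, start, tol=1e-12, max_iter=100_000):
    """Iterate dist <- dist * P until successive distributions agree to within tol."""
    dist = list(start)
    for _ in range(max_iter):
        nxt = step(dist, P)
        if max(abs(a - b) for a, b in zip(nxt, dist)) < tol:
            return nxt
        dist = nxt
    raise RuntimeError("no convergence; the chain may not be primitive")

# Hypothetical irreducible, aperiodic chain (each row sums to one).
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]

print([round(p, 6) for p in stationary(P, [1.0, 0.0, 0.0])])
print([round(p, 6) for p in stationary(P, [0.0, 0.0, 1.0])])
```

Both starting states lead to approximately (0.25, 0.5, 0.25), which can be verified directly by checking that this vector is left unchanged by multiplication with `P`.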
The key results of Section 4 are state transition matrices, formulated in terms of the
algorithm parameters and objective function of the underlying optimization problem, for
one-, two- and three-operator variants of the algorithm. Due to the nature of the genetic
operators, the state transition matrices exhibit an extremely high degree of symmetry.
These matrices, and some key related results, are used extensively in later sections.
Section 5 digresses briefly from the theoretical development to produce and
examine some empirical work based upon the algorithm model. The presentation is not,
nor is it intended to be, a thorough empirical study. It is provided to help fix some of the
algorithm model state space and asymptotic probability distribution ideas which are central
to this work, and it anticipates some of the theoretical results which follow.
Section 6 resumes the theoretical development. Its result is an expression for the
components of the unique asymptotic probability distribution produced by the stationary
algorithm variants which implement the mutation operator with nonzero mutation probability
(i.e. the stationary two- and three-operator algorithm variants). The result is
expressed in terms of Cramer's Rule and thus its solution requires evaluation of
determinants. The determinants are the characteristic polynomials, evaluated at λ = 1, of
matrices derived from the state transition matrix produced in Section 4 by zeroing one
row. A later section attacks the problem of explicitly solving the system, based upon the
highly symmetrical nature of the state transition matrix, but some very significant results
are obtainable from the product of Section 6 without explicit solution.
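One way to read this description, which may differ in detail from the Section 6 development, is that component i of the stationary distribution is proportional to det(I - P_i), where P_i is the transition matrix with row i set to zero. The sketch below checks that reading on a hypothetical 3-state chain; the matrix `P`, the helper `det` and the function name `stationary_by_determinants` are illustrative assumptions.

```python
def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n = len(A)
    result = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[pivot][c]) < 1e-15:
            return 0.0
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            result = -result  # a row swap flips the sign of the determinant
        result *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return result

def stationary_by_determinants(P):
    """Normalize det(I - P_i) over i, where P_i is P with row i zeroed."""
    n = len(P)
    dets = []
    for i in range(n):
        Pi = [[0.0] * n if r == i else list(P[r]) for r in range(n)]
        # Characteristic matrix (lambda * I - P_i) evaluated at lambda = 1.
        M = [[(1.0 if r == c else 0.0) - Pi[r][c] for c in range(n)] for r in range(n)]
        dets.append(det(M))
    total = sum(dets)
    return [d / total for d in dets]

# Hypothetical irreducible, aperiodic chain (each row sums to one).
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]
print(stationary_by_determinants(P))
```

For this toy chain the three determinants are 0.125, 0.25 and 0.125, so the normalized result is (0.25, 0.5, 0.25), which satisfies the stationarity condition when multiplied by `P`.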
An essential step in establishing a connection between simulated annealing and the
genetic algorithm is demonstrating the existence of a stationary distribution limit for the
algorithm as the mutation probability approaches zero. Section 7 accomplishes that task
and also provides a foundation for deducing, in Section 8, a mutation probability bound
analogous to the annealing schedule bounds of the simulated annealing algorithm. The
results developed in Sections 7 and 8 apply to both the two- and three-operator algorithm
variants.
A somewhat surprising result produced in Section 7 and anticipated by the empirical
study reported in Section 5 is that the stationary distribution zero mutation probability
limit does not necessarily isolate globally optimal solutions. In fact, it provides nonzero
probability for all solutions of the underlying optimization problem and consequently the
extrapolation of the simulated annealing methodology is less than exact. However, both
the empirical results presented in Section 5 and some results developed later in Section 9
suggest that the required limiting behavior can be approached as closely as desired by
adjusting the algorithm parameters appropriately.
Section 9 attacks the problem of explicitly solving the system which results from
the Cramer's Rule formulation of the stationary distribution of the time-homogeneous
two- and three-operator algorithms. It is a very extensive development which yields an
expression for the coefficient of the general term in the Taylor's series expansion of the
required determinants. It is based upon the highly symmetrical nature of the state transition
matrix, as alluded to earlier.
The results of Section 9 are not reduced to a directly useable explicit solution.
Nevertheless, they do provide significant insight into the functional form of the stationary
distribution components. Furthermore, Section 9.5 points out some very significant identities
which exist among the coefficients of the Taylor's series and suggests a method for
continuing the Section 9 development based upon the algebra of symmetric and
alternating polynomials. Explicit solution of the stationary distribution equations is the
major incomplete task required for extrapolation of the simulated annealing convergence
theory onto the genetic algorithm.
Section 10 summarizes this work and recapitulates the significant results. It also
proposes continuation of two parts of this research: (1) pursuit of the stationary distribution
solution and (2) refinement of the mutation probability control parameter bound.
An appropriate mathematical framework for examining both the simulated
annealing and genetic algorithms is the theory of Markov chains. Appendix A is included
to summarize some essential definitions and theorems. Appendix B is devoted to the
Perron-Frobenius theorem, which is fundamental to the study of nonnegative matrices in
general and Markov chains in particular. Several important Markov chain theorems are
specializations of it and the key developments in Sections 6 and 7 require its application.
All of the Appendix A and Appendix B results are provided without proof or elaboration,
but their foundation is obtainable from various references (e.g. [Cinl75] for the more
elementary results in Appendix A, [Sene81] for the Appendix B material on the
Perron-Frobenius Theorem and [Sene81, IsMa76] for the Appendix A ergodicity related
definitions and theorems). These results are invoked freely in the following sections,
either by specific reference to definition/theorem number, or if the context makes it
appropriate, they are simply assumed.
Appendix C is provided as background for the Section 9.5 discussion on coefficient
identities and extending the stationary distribution representation development. With the
exception of Section C.4, the material presented in Appendix C is obtainable from
advanced algebra texts (e.g. [MoSt64]). The symmetric/alternating polynomial
generalization in Section C.4 is original.
Appendix D collects the computer program listings for the programs employed in
generating the results reported in Section 5. The programs presented there were developed
and executed on the Cray Y-MP operated by the Computer Science Directorate at
Eglin AFB, Fl.
SECTION 2
SIMULATED ANNEALING
2.1 Overview
As noted in the introduction, a very commonly employed approach to the solution
of nonconvex combinatorial optimization problems is a stochastic relaxation technique
introduced by Kirkpatrick et al. and referred to as simulated annealing [KiGe83]. The
technique is so named by virtue of its analogy to the annealing of solids, in which a
crystalline solid is heated to its melting point and then allowed to cool very gradually until it
is again in the solid phase at some nominal temperature. In the limiting case of
infinitesimal cooling rate and absolute zero final temperature, the resulting solid achieves
its most regular possible crystal lattice configuration (i.e. minimum lattice energy state),
and hence is free of crystal defects. Simulated annealing establishes the connection
between this sort of thermodynamic behavior and the search for the global minimum of
an objective function in a combinatorial optimization problem, and further, it provides an
algorithmic means of exploiting the connection. This section is a review of the technique
with special emphasis on known results which bound the convergence behavior of computer
algorithms belonging to the class.
2.2 Statistical Mechanics and Annealing of Solids
The fundamental assumption of statistical physics is that the thermodynamic behavior
of a many particle system can be represented by a statistical ensemble, and that if the
system is in thermal equilibrium, the time averages of macroscopic thermodynamic
properties of the system are equal to the corresponding ensemble averages (ergodicity
hypothesis). The random variable represented by the ensemble is the system thermal
energy, and at thermal equilibrium the probability distribution is completely determined
by the system temperature. The distribution is known as the Boltzmann distribution, or
alternatively as the Gibbs distribution, and its form is
$$\Pr\{E = E(i)\} = \frac{\exp\{-E(i)/kT\}}{Z(T)} \qquad \text{(Eq. 2.1)}$$
where E = the system thermal energy (a random variable)
E(i) = the energy corresponding to state i
k = Boltzmann's constant
T = the system temperature
Z(T) = the partition function.
The factor
$$\exp\{-E(i)/kT\}$$
is called the Boltzmann factor. The partition function provides the necessary normalization
to make Eq. 2.1 a state occupancy probability. It can be expressed as
$$Z(T) = \sum_{i} \exp\{-E(i)/kT\} \qquad \text{(Eq. 2.2)}$$
At elevated temperatures, the system represented by the probability distribution in
Eqs. 2.1-2.2 occupies all states in its state space with nearly uniform probability, while at
low temperatures, states having low energy are favored. When the temperature
approaches absolute zero, only states corresponding to the minimum value of energy
have nonzero probability. Thus, the thermodynamic system's energy function can be
effectively searched for its minimum value by starting the system at an elevated temperature
and allowing it to cool gradually to absolute zero, at which point one of its minimum
energy states is occupied with probability one. This is the mechanism which guides the
annealing of solids.
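The temperature dependence described above can be checked numerically with a small sketch of the Gibbs distribution, in which each state i has probability exp(-E(i)/kT)/Z(T). The four state energies, the temperatures and the choice k = 1 are hypothetical values picked only for illustration.

```python
import math

def gibbs(energies, T, k=1.0):
    """Occupancy probabilities exp(-E(i)/kT) / Z(T) for a finite list of state energies."""
    e_min = min(energies)
    # Shifting every energy by e_min leaves the probabilities unchanged
    # and keeps the exponentials well scaled when T is small.
    factors = [math.exp(-(e - e_min) / (k * T)) for e in energies]
    Z = sum(factors)  # partition function of the shifted energies
    return [f / Z for f in factors]

energies = [0.0, 1.0, 1.0, 3.0]  # hypothetical state energies
for T in (100.0, 1.0, 0.05):
    print(T, [round(p, 4) for p in gibbs(energies, T)])
```

At the highest temperature the four probabilities are close to 1/4 each, while at the lowest temperature essentially all of the probability sits on the single minimum energy state.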
The cooling schedule employed in annealing solids is constrained by the requirement
that the system be allowed to achieve thermal equilibrium at each temperature. The
Gibbs distribution only represents the system's energy distribution in the stationary case
(i.e. equilibrium). If this requirement is not satisfied, defects can be frozen into the crystal
lattice preventing the system from achieving the minimum possible energy state. This
behavior is analogous to local minima entrapment in combinatorial optimization search.
The restriction on the annealing schedule necessary to avoid it is the fundamental limitation
on the annealing technique.
2.3 Combinatorial Optimization by Simulated Annealing
Simulated annealing approaches combinatorial optimization problems in a closely
analogous fashion. In simulated annealing, the optimization problem's solution space corresponds
to the state space of the analogous thermodynamic system and its cost function
is analogous to the thermodynamic system's energy surface. The analog of the
thermodynamic system's temperature is a nonnegative algorithm control parameter, T.
Two other algorithm components are also required. They are the stochastic next
state generation and acceptance mechanisms, and they incorporate the dependence of the
algorithm on the control parameter, T. The next state generation mechanism is employed
by the algorithm to transform a current solution into a new candidate solution, and the
acceptance mechanism is employed to decide whether to retain or discard the proposed
new solution. Together, these stochastic operators are responsible for making the search
algorithm simulate the thermodynamic system's statistical behavior. Consequently, they
must satisfy certain requirements to assure coherent algorithm behavior. These requirements
are explored in some depth later in the context of algorithm convergence behavior.
Conceptually, the operation of the simulated annealing algorithm can be described
as follows. The algorithm starts at some initial value of the control parameter and with
some initial solution. Then, the state generation mechanism is employed to synthesize a
new candidate solution. The new solution is examined by the acceptance mechanism and
either accepted or rejected. If it is accepted, the new solution becomes the current solution.
Otherwise, the old current solution is retained. This process is repeated, generating a
sequence of temporary solutions, until an approximate equilibrium is achieved in which
the solution space occupancy is described by the Gibbs distribution (Eq. 2.1). Once this
approximate equilibrium is achieved, the control parameter value is reduced and the solution
sequence is extended until equilibrium is achieved at the new control parameter
value. This process is repeated until some termination condition (e.g. minimum control
parameter value) is attained. The current solution at termination is then accepted as the
solution to the optimization problem.
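The procedure just described can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation analyzed in this work: "approximate equilibrium" is replaced by a fixed number of transitions per control parameter value, the acceptance rule is the Metropolis criterion with Boltzmann's constant absorbed into T, and the toy cost function, neighborhood and schedule are hypothetical.

```python
import math
import random

def simulated_annealing(cost, neighbors, i0, schedule, steps_per_value, rng):
    """Return the current solution after running through the whole schedule."""
    current = i0
    for T in schedule:                     # monotone nonincreasing control parameter values
        for _ in range(steps_per_value):   # fixed-length stand-in for approximate equilibrium
            candidate = rng.choice(neighbors(current))   # next state generation
            delta = cost(candidate) - cost(current)
            # Acceptance: improvements always, uphill moves with probability exp(-delta/T).
            if delta <= 0 or rng.random() < math.exp(-delta / T):
                current = candidate
    return current

# Hypothetical toy problem: minimize (i - 7)^2 over the integers 0..15.
cost = lambda i: (i - 7) ** 2
neighbors = lambda i: [j for j in (i - 1, i + 1) if 0 <= j <= 15]
result = simulated_annealing(cost, neighbors, 0, [10.0, 1.0, 0.1, 0.01], 200, random.Random(0))
```

Because this toy cost has a single minimum and the final control parameter value is very small, the returned solution is expected to be the minimizer; on a cost surface with local minima the outcome depends strongly on the schedule.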
It is noted in passing that simulated annealing always involves minimizing a cost
functional, never maximizing a reward. However, this causes no loss of generality
because any combinatorial optimization problem can be translated into an equivalent
minimization problem.
2.4 Theoretical Foundations of Simulated Annealing
The evolution of the search sequence of a simulated annealing algorithm as outlined,
in which each succeeding solution in the sequence is determined stochastically
based upon the current solution, suggests that the algorithm behavior can be described as
a Markov chain. Indeed it can, and all of the known convergence results for simulated
annealing algorithms are derived from analysis of Markov chain models [LaAa87,
GeGe84, LuMe86, MiRo85, Rior58]. This subsection establishes a Markov chain model
to represent the simulated annealing algorithm and then employs it in reviewing the
development of the published convergence bounds. This development essentially follows
[LaAa87].
2.4.1 A Markov Chain Model of Simulated Annealing
Let a combinatorial optimization problem be represented by the pair (S, C) where S
is the problem's solution space and C is its cost function, and assume without loss of generality
that the optimization problem requires minimization of C. Also, assume that S is
finite. Then, a simulated annealing algorithm for solving this problem can be
characterized by the quadruple (S, i_0, P_T, \mathcal{T}) where S is as defined above and where i_0 ∈ S
is an initial candidate solution, P_T is a stochastic matrix which describes a stochastic state
transition mechanism (the composition of the next state generation and acceptance mechanisms
discussed in Section 2.3) and \mathcal{T} = {T_k} is a finite length monotone nonincreasing
sequence of positive control parameter values. The first parameter value in \mathcal{T} is T_0 and the
final value is T_f. P_T incorporates the algorithm dependence on both C and T.
The algorithm generates a sequence of candidate solutions, {i_k : 0 ≤ k ≤ f}, by
employing the state transformation mechanism (described by P_T) to transform solution i_k
into i_{k+1}. At the k-th transition, T is completely determined by T_k. The solution sequence is
extended until T = T_f, at which point the current solution, i_f, is accepted as the solution to
the combinatorial optimization problem. Thus T_f signals algorithm termination. \mathcal{T} can be
allowed to depend on {i_k} provided due regard is paid to the requirement for termination.
Since the solution state transition mechanism is stochastic, and since the conditional
dependence of the solution sequence only extends to one transition, the solution sequence
is a Markov chain by Definition A1. Its state transition matrix is P_T (Definition A4).
The state transition matrix is decomposed into two parts for convenience in the following.
It consists of the next state generation mechanism, G_{ij}(T), which describes the
probability of generating state j given that the current state is i, and the state acceptance
mechanism, A_{ij}(T), which describes the probability of accepting the generated state. Thus,
P_T(i,j) is written as
P_T(i,j) = \begin{cases} G_{ij}(T) A_{ij}(T) & j \neq i \\ 1 - \sum_{l=1,\, l \neq i}^{N} G_{il}(T) A_{il}(T) & j = i \end{cases}    Eq. 2.3
In this result, N = card(S) represents the cardinality of the solution space.
It is noted in passing that the usual form of the state acceptance mechanism is the so-called
Metropolis criterion [Metr53], given by
A_{ij}(T) = \min\left\{ 1, \exp\left( -\frac{C(j) - C(i)}{kT} \right) \right\}    Eq. 2.4
This is the form employed by Kirkpatrick et al. in the original work [KiGe83] and most
others published since are variations of it. Also, the usual form of the next state generation
mechanism is
G_{ij}(T) = G_{ij} = G_{ji} = \begin{cases} 1/N_i & j \in S_i \\ 0 & \text{otherwise} \end{cases}    Eq. 2.5
where S_i ⊂ S is the set of states accessible from state i in one transition (by definition,
i ∉ S_i), and where N_i = card(S_i). Note that G_{ij} defined by Eq. 2.5 is symmetric and independent
of T.
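The decomposition of Eq. 2.3 into generation and acceptance parts can be made concrete with a small example. The sketch below assumes a hypothetical four-state problem whose neighborhoods form a ring (so every N_i is 2 and the generation mechanism is symmetric) and sets Boltzmann's constant to 1 in the Metropolis criterion; the name `transition_matrix` is illustrative.

```python
import math

def transition_matrix(costs, neighbors, T):
    """Build P_T from uniform neighbor generation and Metropolis acceptance (k = 1)."""
    n = len(costs)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            G = 1.0 / len(neighbors[i])                   # generation probability 1/N_i
            delta = costs[j] - costs[i]
            A = 1.0 if delta <= 0 else math.exp(-delta / T)   # acceptance probability
            P[i][j] = G * A                               # off-diagonal case, j != i
        # Diagonal case, j == i: probability of staying put.
        P[i][i] = 1.0 - sum(P[i][l] for l in range(n) if l != i)
    return P

costs = [3.0, 1.0, 2.0, 0.0]                   # hypothetical cost function values
neighbors = [[1, 3], [0, 2], [1, 3], [2, 0]]   # ring neighborhoods, i not in S_i
P = transition_matrix(costs, neighbors, T=1.0)
```

Each row of the resulting matrix sums to one, and the diagonal entry absorbs the probability of every rejected proposal.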
2.4.2 Asymptotic Convergence Behavior
The subject of interest in the remainder of this section is a set of sufficient conditions
on P_T and \mathcal{T} to ensure that an optimal solution is achieved. These conditions will
prove to guarantee asymptotic convergence only (i.e. \mathcal{T} must be an infinite sequence,
which of course violates the termination requirement of the algorithm). Two cases will be
examined. The first only involves time-homogeneous (stationary) Markov chains (Definition
A5) and is presented due to its relative ease of analysis. Its purpose is to provide a
foundation for the essential ideas involved in the second case, which requires an appeal to
ergodicity theorems for inhomogeneous (nonstationary) Markov chains. The usable convergence
behavior results which are the goal of this effort derive from analysis of the second
case.
The first (simple) algorithm is represented as a sequence of solutions evolving as a
sequence of distinct Markov chains. Each Markov chain in the sequence executes at a
fixed control parameter value (and hence is time-homogeneous) and each succeeding
Markov chain executes at a lower (but strictly positive) parameter value. Thus, in the
sequence \mathcal{T}, each distinct parameter value, T_l, is associated with a distinct time-homogeneous
Markov chain and T_l occurs at some large number of consecutive locations,
K_l, in \mathcal{T}. This case is hereafter referred to as the homogeneous (or stationary) algorithm.
The analysis of the convergence behavior of the homogeneous algorithm includes the
hypothesis that each Markov chain in the sequence achieves its stationary distribution.
This hypothesis is equivalent to K_l → ∞ for all l (Definition A10, Theorem A3, Theorem
A4).
In the second case, the algorithm is represented as a sequence of solutions evolving
as a single inhomogeneous (nonstationary) Markov chain. This formulation is hereafter
referred to as the inhomogeneous (or nonstationary) algorithm. In the inhomogeneous
algorithm, the control parameter value is allowed to decrease (though not necessarily
required to) after each state transition. The dependence of G_{ij}(T) and A_{ij}(T) on T results in
the inhomogeneous behavior.
2.4.2.1 The Homogeneous Algorithm
In the homogeneous algorithm, the means of establishing the requirements for
asymptotically optimal convergence is to first establish sufficient conditions for existence
of the stationary distribution of each Markov chain and then to establish sufficient conditions
to ensure that the stationary distribution converges to a uniform distribution over the
set of optimal solutions as the control parameter value approaches zero. That is
\lim_{T \to 0} q_T(i) = \begin{cases} 1/N_{opt} & i \in S_{opt} \\ 0 & \text{otherwise} \end{cases}    Eq. 2.6
where q_T is the stationary distribution of the Markov chain executing at control parameter
value T, S_{opt} ⊂ S is the set of solutions {i ∈ S : C(i) = C_{opt}} and N_{opt} = card(S_{opt}).
Theorems A1-A3 can be employed to deduce sufficient conditions on P_T(i,j) (or
alternatively on G_{ij}(T) and A_{ij}(T)) to ensure the existence of the stationary distribution of
each Markov chain in the sequence representing the homogeneous algorithm. Since only
combinatorial (finite solution space) optimization problems are under consideration and
since by definition the homogeneous algorithm only employs time-homogeneous Markov
chains, the finite state space and time-homogeneity requirements of Theorem A3 are
satisfied. Beyond these requirements, existence of the stationary distribution of each Markov
chain in the homogeneous algorithm only requires that the chain produced by P_T be
irreducible and aperiodic (Definitions A7 and A9).
If A_{ij}(T) is selected as the Metropolis criterion, Eq. 2.4, then
\forall i,j \in S, \ \forall T > 0 : A_{ij}(T) > 0.
Thus, from Eq. 2.3, the irreducibility requirement is transferred to the next state generation
mechanism, G_{ij}(T). Note that from Theorem A1, irreducibility can readily be
achieved within the definition supplied by Eq. 2.5. Also, in [MiRo85], Theorem A2 is
used to show that a sufficient condition for aperiodicity is
\forall T > 0, \ \exists\, i,j \in S : A_{ij}(T) < 1.
This condition is satisfied by the Metropolis criterion provided the trivial case indicated
by
\forall i,j \in S : C(i) = C(j) = C_{opt}
is excluded, because then k,l always exist such that
C(l) = C_{opt} < C(k).    Eq. 2.7
The sufficient condition on A_{ij}(T) can then be met by selecting i = l and j = k. (Use Eq.
2.7 in Eq. 2.4).
Although existence of the stationary distribution (or at least sufficient conditions on
G_{ij}(T) and A_{ij}(T) to ensure its existence) is now established, and examples of G_{ij} and A_{ij}
which meet these conditions are provided, actually achieving the stationary distribution is
only guaranteed after an infinite number of state transitions. This is equivalent to the thermal
equilibrium constraint on the temperature schedule for annealing solids discussed in
Section 2.2. Each Markov chain in the sequence representing the homogeneous algorithm
is subject to this requirement, and consequently must be of infinite length.
Next, sufficient conditions to assure convergence of the stationary distribution of
the final Markov chain in the homogeneous algorithm to the desired optimal distribution
(Eq. 2.6) are established. First, note that if the stationary distribution of a Markov chain
in the sequence exists, then a function g(C(i),T) corresponding to that Markov chain
exists such that
\forall i \in S : q_T(i) = \frac{g(C(i), T)}{\sum_j g(C(j), T)}    Eq. 2.8
where g satisfies
(1) \forall i \in S, \ \forall T > 0 : g(C(i), T) > 0
(2) \forall j \in S : \sum_{i \neq j} g(C(i), T) G_{ij}(T) A_{ij}(T) = g(C(j), T) \sum_{i \neq j} G_{ji}(T) A_{ji}(T)    Eq. 2.9
This can be deduced by noting that the uniquely determining conditions on q_T expressed in
Theorem A3 are met by g satisfying Eq. 2.8 and 2.9. Eq. 2.9 is called the global balance
equation. Close examination reveals that it is exactly the necessary condition for equilibrium
state occupancy. A more restrictive condition, in which the balance holds for every
pair of states on a pairwise basis, is called the detailed balance equation.
It can be shown that the following additional constraints on g guarantee convergence
of the stationary distribution to the optimal (i.e. to Eq. 2.6) [MiRo85]. Note that
Eq. 2.10(2) requires an exponential form.
(1) \lim_{T \to 0} g(\Delta, T) = \begin{cases} 0 & \Delta > 0 \\ \infty & \Delta < 0 \end{cases}
(2) \frac{g(\Delta_1, T)}{g(\Delta_2, T)} = g(\Delta_1 - \Delta_2, T)    Eq. 2.10
(3) \forall T > 0 : g(0, T) = 1
Collectively, Eq. 2.8-2.10 provide a set of sufficient conditions on G_{ij}(T) and A_{ij}(T)
to assure convergence of the stationary distribution to Eq. 2.6. The key condition, the
global balance equation, is implicit however, and thus is very difficult to apply. Nevertheless,
it can be shown [LaAa87] that if G_{ij}(T) and A_{ij}(T) defined by Eq. 2.4 and Eq. 2.5 are
employed, the conditions are satisfied, and that the corresponding stationary distribution
is provided by
\forall i \in S : q_T(i) = \frac{\exp\{-(C(i) - C_{opt})/kT\}}{\sum_j \exp\{-(C(j) - C_{opt})/kT\}}    Eq. 2.11
The key to that development is that the G_{ij}(T) and A_{ij}(T) of Eq. 2.4 and 2.5 satisfy the
detailed balance equation, the symmetry of G_{ij} being a critical consideration.
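The claim that the Gibbs form of Eq. 2.11 is stationary under Metropolis acceptance with a symmetric generation mechanism can be verified numerically on a small case. The following sketch assumes a hypothetical four-state ring with uniform neighbor generation and Boltzmann's constant set to 1; it checks both detailed balance and stationarity for that one example and is not a proof.

```python
import math

costs = [3.0, 1.0, 2.0, 0.0]                   # hypothetical cost function values
neighbors = [[1, 3], [0, 2], [1, 3], [2, 0]]   # symmetric ring neighborhoods, N_i = 2
T = 0.7
n = len(costs)

# Transition matrix: uniform generation times Metropolis acceptance (k = 1).
P = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in neighbors[i]:
        delta = costs[j] - costs[i]
        P[i][j] = (1.0 / len(neighbors[i])) * (1.0 if delta <= 0 else math.exp(-delta / T))
    P[i][i] = 1.0 - sum(P[i][l] for l in range(n) if l != i)

# Candidate stationary distribution in the Gibbs form of Eq. 2.11.
c_opt = min(costs)
w = [math.exp(-(c - c_opt) / T) for c in costs]
q = [x / sum(w) for x in w]

# Detailed balance: q(i) P(i,j) == q(j) P(j,i) for every pair of states.
balance_gap = max(abs(q[i] * P[i][j] - q[j] * P[j][i]) for i in range(n) for j in range(n))
# Stationarity: q P == q.
qP = [sum(q[i] * P[i][j] for i in range(n)) for j in range(n)]
stationary_gap = max(abs(qP[j] - q[j]) for j in range(n))
```

Both gaps come out at the level of floating point rounding. If the neighborhoods are made asymmetric (for example, unequal N_i), the same check generally fails, which illustrates why the symmetry of the generation mechanism matters.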
The behavior required by Eq. 2.10(1) is limiting behavior as T → 0. Thus, these
conditions assure convergence to the global minimum with probability one (i.e. convergence
of the stationary distribution to Eq. 2.6), only if the sequence of Markov chains is
infinite and \lim_{l \to \infty} T_l = 0. Recalling that a guarantee of achieving the stationary distribution
requires that each Markov chain be of infinite length, the homogeneous algorithm is seen
to require a doubly infinite sequence of solutions composed of an infinite sequence of
infinitely long Markov chains.
2.4.2.2 The Inhomogeneous Algorithm
The behavior of the homogeneous algorithm, which requires that an infinite number
of transitions be executed at each control parameter value, clearly is not very useful. The
following reviews two published convergence results which extend the ideas developed
for the homogeneous algorithm to the inhomogeneous counterpart [GeGe84, MiRo85].
These results adopt the sufficient conditions on G_{ij}(T) and A_{ij}(T) developed for the homogeneous
algorithm as a starting point (i.e. irreducibility, aperiodicity and Eq. 2.8-2.10)
and extend them to the case in which each time-homogeneous Markov chain is finite
length (i.e. to the inhomogeneous algorithm). The key products of this effort are lower
bounds on the algorithm control parameter's approach to zero. In both cases discussed
here, the bound is of the form K/log(k) where k is the index of the Markov chain representing
the inhomogeneous algorithm and K is independent of k. The following is a brief
sketch of the approach taken to arrive at these results. It is common to both.
Given that G_{ij}(T) and A_{ij}(T) are selected as in Eq. 2.4 and 2.5, each state transition
matrix in the inhomogeneous Markov chain of the inhomogeneous algorithm satisfies all
of the sufficient conditions for stationary distribution existence and asymptotic convergence
to optimality developed for the homogeneous algorithm (i.e. irreducibility, aperiodicity
and Eq. 2.8-2.10). Further, the explicit form of the resulting stationary
distribution is given by Eq. 2.11. Thus, for each transition matrix, P_{T_k}, there exists an
eigenvector, q_{T_k}, having eigenvalue 1 and satisfying the probability vector conditions.
Further, q_{T_k} converges to the limiting distribution of Eq. 2.6 as T_k → 0. Consequently,
Theorem A7 can be used to establish strong ergodicity (and hence the desired convergence
behavior for T_k → 0) provided (1) that weak ergodicity can be established and (2)
that the inequality appearing in Theorem A7 obtains.
Under the hypothesis that G_{ij}(T) and A_{ij}(T) are defined in accordance with Eq. 2.4
and 2.5, in which case the required eigenvector is explicitly provided by Eq. 2.11, and
that condition (1) (weak ergodicity) is satisfied, both [GeGe84] and [MiRo85] prove condition
(2) of the above. The development is straightforward but tedious. Of more interest
here is the means of establishing condition (1), because it leads to the annealing schedule
bound.
Both developments employ Theorem A6 to establish weak ergodicity. The general
approach is to use the definitions of G_{ij}(T) and A_{ij}(T), along with bounds on the extrema
of either the cost function [GeGe84] or the slope of the cost function [MiRo85] to define
bounds on the one step transition probabilities. The transition probability bound is then
employed to arrive at an upper bound on the t coefficient of ergodicity of Theorem A5,
which is used in turn in Theorem A6 to deduce a sufficient condition to guarantee weak
ergodicity. The condition is in the form of a lower bound on the annealing schedule.
The first such result to be published is in [GeGe84]. The resulting bound is
T_k \geq \frac{N \times (C_{max} - C_{min})}{\log(k)}, \quad k \geq 2    Eq. 2.12
where C_{max} and C_{min} are the maximum and minimum values respectively of C(i) for i ∈ S
and N = card(S). Thus, C_{min} is the desired C_{opt}.
The annealing schedule bound established in [MiRo85] is more refined than that of
Eq. 2.12. It is given by
T_k \geq \frac{rL}{\log(k)}, \quad k \geq 2    Eq. 2.13
where r is the radius of the graph defining the accessible state neighborhoods of the next
state generation mechanism (i.e. the {S_i} where S_i ⊂ S is defined in Eq. 2.5), and L is a
constant which bounds the local slope of the cost function. Specifically, r and L are given
by
r = \min_{i \in S \setminus S_{max}} \max_{j \in S} d(i,j)    Eq. 2.14
where d(i,j) is the distance of j from i, measured by the minimum number of state transitions
required to arrive at j starting at i, where S_{max} ⊂ S is the set of local maxima of C
and
L = \max_{i \in S} \max_{j \in S_i} | C(j) - C(i) |.    Eq. 2.15
Note that in the special case S_i = S for all i ∈ S, Eq. 2.14 and Eq. 2.15 reduce to r = 1
and L = C_{max} - C_{min} respectively, and substitution into Eq. 2.13 yields
T_k \geq \frac{C_{max} - C_{min}}{\log(k)}    Eq. 2.16
The Eq. 2.16 result is smaller than that of Eq. 2.12 by the factor 1/N.
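The practical weight of a K/log(k) schedule is easy to underestimate. The sketch below evaluates the two bounds for hypothetical problem constants and computes how many transitions pass before a schedule of the form K/log(k) first drops to a given control parameter value; all function names and numbers are illustrative assumptions.

```python
import math

def geman_bound(k, n_states, c_max, c_min):
    """Lower bound on T_k of the form N (C_max - C_min) / log(k), for k >= 2."""
    return n_states * (c_max - c_min) / math.log(k)

def mitra_bound(k, r, slope_bound):
    """Lower bound on T_k of the form r L / log(k), for k >= 2."""
    return r * slope_bound / math.log(k)

def transitions_needed(target_T, K):
    """Smallest integer k with K / log(k) <= target_T, i.e. k >= exp(K / target_T)."""
    return math.ceil(math.exp(K / target_T))

# Hypothetical constants: N = 4 states, cost range 3, fully connected neighborhoods (r = 1).
ratio = geman_bound(10, 4, 3.0, 0.0) / mitra_bound(10, 1, 3.0)
k_needed = transitions_needed(0.1, 1.0)   # schedule 1/log(k) reaching T = 0.1
```

With these constants the two bounds differ by the factor N = 4, and even with K = 1 more than twenty thousand transitions elapse before the control parameter reaches 0.1, since k grows as exp(K/T).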
Both of these published convergence results, as well as several others which are
minor variations of them, are of the general form K/log(k). This behavior is the key limitation
of the algorithm class, and is believed to be a fundamental limitation imposed by
the neighborhood system inherent in the conventional simulated annealing state
generation mechanism [GeGe84] (i.e. the fact that at low control parameter values, the
likelihood of making the large state transition necessary to escape a local extremum is
radically diminished). The simulated annealing literature includes some amount of speculation
concerning state generation mechanisms which permit occasional large transitions
even at low control parameter values.
SECTION 3
THE GENETIC ALGORITHM
3.1 Overview
The genetic algorithm is an iterative improvement stochastic search method appropriate
for application to combinatorial optimization problems and based on the evolution
of biological systems. It implements the fundamental idea of survival fitness on a
population of string structures which are coded representations of solution candidates
selected from the solution space of the optimization problem. The population of candidate
solutions (which collectively represent the current estimate of the optimum solution)
is subjected to a set of stochastic genetic operators which transform a current population
into a new (descendent) population. A variety of distinct genetic operators (based on biological
analogs) are available and are reported in the literature [Davi87, Gold89a, Gref85,
Gref87]. The most important of them are (1) proportional reproduction, (2) crossover and
(3) mutation. A one-, two- or three-operator genetic algorithm employing combinations of
these operators with fixed population size is referred to herein as a simple genetic algorithm.
The genetic operators are all implemented stochastically, but they do not result in a
simple random walk through the search space. They represent a highly structured search
which exploits the historical record of performance reflected at each stage of the search
by the current population. It is the novel use of this historical record which is central to
the appeal of the genetic algorithm.
Genetic algorithms usually operate on populations of bitstrings (i.e. the optimization
problem is usually coded such that its search space is defined over a binary string
alphabet), and they always attempt to maximize some strictly nonnegative objective
function. The evolution of the fixed size population of candidate solutions toward domination
by optimal solutions is the algorithm goal.
The three genetic operators of a simple genetic algorithm are discussed in the next
subsection. An analysis of their behavior requires introduction to the concept of schemata,
or similarity templates, and that task is undertaken in a subsequent subsection. This
section concludes with an assessment of the theoretical foundation available for the
analysis of genetic algorithms.
3.2 The Simple Genetic Algorithm Operators
As noted above, the simple genetic algorithm employs three biologically inspired
operators to transform each population of candidate solutions into a new (descendent)
population. The following subsections examine each of these operators and how they
influence the search evolution.
3.2.1 Reproduction
The genetic algorithm reproduction operator is the algorithmic analog of asexual
reproduction. It is the means by which the objective function influences the evolution of
the genetic algorithm search. It is implemented by evaluating each member of the current
generation against the objective function and using the results to measure relative reproductive
fitness (i.e. to provide a selection probability measure). Then, members of the
current population are selected in accordance with this fitness measure to be members of
the succeeding generation. This process is repeated (with statistically independent selection
trials) until the entire new generation is populated.
In the absence of the other genetic operators, the reproduction operator tends to
force the population to converge to the higher performing members of the current population.
It eventually produces a uniform population. At any stage of the search (generation),
only solutions which are represented by members of the current population can appear in
any succeeding generation. In particular, no solution absent from the initial population is
ever attainable. The reproduction operator exerts a strictly converging influence on the
search evolution. The other operators of the simple genetic algorithm circumvent this
limitation in a controlled manner.
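A minimal sketch of proportional reproduction follows, assuming bitstrings are stored as Python strings and using a hypothetical objective function that counts the 1 bits. Selection probability is taken to be proportional to the objective value, with independent trials and replacement; the function name `reproduce` is illustrative.

```python
import random

def reproduce(population, fitness, rng):
    """Select a new generation of the same size with probability proportional to fitness."""
    weights = [fitness(s) for s in population]   # nonnegative objective values, not all zero
    # Independent selection trials with replacement, one per slot in the new generation.
    return rng.choices(population, weights=weights, k=len(population))

population = ["1100", "0011", "1111", "0000"]   # hypothetical current generation
count_ones = lambda s: s.count("1")             # hypothetical objective function
new_generation = reproduce(population, count_ones, random.Random(0))
```

Every member of the new generation is a copy of some member of the current one, so reproduction alone cannot introduce a string that is not already present, and a string with zero objective value is never selected.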
3.2.2 Crossover
The crossover operator in a genetic algorithm is the algorithmic analog of sexual
reproduction. It produces the succeeding generation not by simply replicating the fittest
members of the current generation but by mating the fittest members of the current generation
to produce progeny with some of the "genetic" character of each parent. It is
implemented by randomly exchanging parts of the strings representing the parents to
produce descendent strings.
The crossover operator is implemented (with some given probability, p_c) after the
reproduction operator has been invoked to select two reproducing parents. A string location
is randomly selected (usually with uniform selection probability) and the parent bitstring
segments on each side of the randomly selected location are exchanged to produce two
progeny, which are then inserted into the succeeding population. This operation is
repeated until the new generation is completely populated.
The crossover operator permits strings not represented in the current population to
be generated in the succeeding population. That is, certain points in the solution space
which are not represented in the current generation can be present in the successor generation.
But the crossover operator is applied preferentially to high performance members
of the current population, so it constitutes a judicious, informed tendency toward
population divergence. This is the novel feature contributed by the crossover operator.
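One-point crossover as described above can be sketched as follows. The sketch assumes equal-length bitstrings stored as Python strings and a cut location drawn uniformly from the interior positions; the function name `crossover` and the symbol `p_c` for the crossover probability are illustrative.

```python
import random

def crossover(parent_a, parent_b, p_c, rng):
    """With probability p_c, exchange the parents' tails beyond a random cut location."""
    if rng.random() >= p_c:
        return parent_a, parent_b                 # no crossover: parents are copied unchanged
    point = rng.randint(1, len(parent_a) - 1)     # cut location, uniform over interior positions
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

child_a, child_b = crossover("00000", "11111", 1.0, random.Random(1))
```

Here neither child equals either parent, so the progeny are strings absent from the parent pair. Crossing two identical parents, by contrast, returns the same string twice.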
Even with the addition of crossover, the genetic algorithm search will eventually
converge to a uniform population. In general the crossover operator causes a greater portion
of the search space to be explored prior to convergence to uniformity, but for a given
initial population, there are still unreachable points in the solution space. Further, even if
a high performance solution is accessible from the initial population, some portion of the
"gene pool" necessary to reach it can be irrevocably lost during the search evolution.
3.2.3 Mutation
The mutation operator is applied to each member of the successor generation
created by the reproduction and crossover operators. It simply consists of randomly perturbing
each descendent string with some (usually very small) perturbation probability,
p_m. The operator exerts a diverging influence on the search algorithm, and it provides a
means by which the search can, with some nonzero probability, always arrive at any point
in the solution space. That is, no part of the "gene pool" is ever permanently extinguished
if the mutation operator is implemented. Clearly, it is analogous to mutation in biological
reproduction. Note also that if p_m > 0, the mutation operator precludes the algorithm from
ever producing a permanently uniform population (i.e. it precludes algorithm convergence).
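A minimal sketch of the mutation operator follows. It assumes the common convention that each bit of a string is flipped independently with probability `p_m`; the text above does not fix that detail, so the per-bit rule and the function name `mutate` should be read as assumptions.

```python
import random

def mutate(bits, p_m, rng):
    """Flip each bit of the string independently with probability p_m (assumed per-bit rule)."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[b] if rng.random() < p_m else b for b in bits)

unchanged = mutate("10100", 0.0, random.Random(0))   # p_m = 0: the operator does nothing
inverted = mutate("10100", 1.0, random.Random(0))    # p_m = 1: every bit is flipped
sample = mutate("10100", 0.05, random.Random(0))     # small p_m: occasional perturbation
```

For any p_m strictly between 0 and 1, every string of the same length can be produced from every other string with nonzero probability, which is the property that keeps the "gene pool" from being permanently depleted.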
3.3 Building Blocks, Schemata and the Fundamental Theorem
The underlying premise of the genetic algorithm operators is that good solutions to
an optimization problem over a bitstring solution space are composed of locally good
substrings, and that assembling combinations of such locally good substrings is an effective
way to search the space for globally good solutions. In the genetic algorithm literature,
this is referred to as the building block hypothesis. For a problem to be amenable to
genetic algorithm solution, this hypothesis should apply. In the genetics parlance, this
hypothesis is stated as a requirement that the problem exhibit "...some but not too much
epistasis" [Davi87]. The next subsection introduces an idea which helps to place this
hypothesis on a more analytical basis, but the results are still incomplete.
3.3.1 Schema Defined
Let the solution space under consideration be the set of binary strings of length L,
(i.e. S = {0,1}^L). Then, a schema (plural schemata), designated H, is a subset of S having
the property that every member of H matches at some specified set of defining bit locations.
Thus, if L = 5, then the schema H might be the set of length 5 bitstrings which
match the string (1,0,1,0,0) at the bit locations indicated by H = {s : s = (1,*,*,0,*)}, in
which the asterisks indicate "don't care" bits. The bit locations at which the schema is
specified are the defining locations of the schema. The order of the schema, designated
by o(H), is the number of its defining locations and can range from 0 to L. In this example,
o(H) = 2. The defining length of the schema, designated δ(H), is the number of bit
positions subtended by its outermost defining bit locations minus 1. In this example,
δ(H) = 5 - 2 = 3.
For a bitstring space of length L, there are exactly 3^L distinct schemata. This can be
readily determined by noting that the distinct schemata are selected from {0,1,*}^L. A
given string selected from the space represents exactly 2^L distinct schemata. This results
from the fact that the string is defined at all L bit positions, and hence is selected from
{0,1}^L. The schemata of an optimization problem's search space are the building blocks
from which good solutions are to be constructed.
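The schema quantities defined above can be computed directly. The sketch below assumes schemata are written as Python strings over the characters 0, 1 and *, uses the illustrative schema 1**0* with L = 5, and assigns defining length 0 to the schema with no defining locations; all function names are hypothetical.

```python
from itertools import product

def order(schema):
    """Number of defining (non-asterisk) locations."""
    return sum(c != "*" for c in schema)

def defining_length(schema):
    """Distance between the outermost defining locations (0 if there are none)."""
    fixed = [i for i, c in enumerate(schema) if c != "*"]
    return fixed[-1] - fixed[0] if fixed else 0

def matches(schema, string):
    """True if the string agrees with the schema at every defining location."""
    return all(c == "*" or c == b for c, b in zip(schema, string))

def schemata_of(string):
    """All schemata that the given string is a realization of."""
    return {"".join(t) for t in product(*[(b, "*") for b in string])}

H = "1**0*"
all_schemata = ["".join(t) for t in product("01*", repeat=5)]
represented = schemata_of("10100")
```

For L = 5 the enumeration yields 3^5 = 243 schemata in total, of which the single string 10100 is a realization of 2^5 = 32, in agreement with the counts stated above.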
3.3.2 Schema Processing and the Fundamental Theorem
Let the constant population size of a simple genetic algorithm be designated M.
Then, each generation produced by the algorithm represents some number, N, of distinct
schemata that is bounded as follows
2^L \leq N \leq M \times 2^L    Eq. 3.1
The lower bound obtains when all M members are identical, and the upper bound represents
a limit on schema diversity supported by the specified population size.
Now, briefly recalling the mechanisms implemented by the three simple genetic
operators, it is possible to begin understanding their influence on the search evolution. In
particular, the reproduction operator tends to reduce, never increase, the number of distinct
schemata present in succeeding generations by selectively reproducing strings which
are realizations of above average fitness schemata to the exclusion of below average
competitors at the same set of defining locations.
The crossover operator, on the other hand, tends to produce new schemata by
assembling high performance low order schemata in new combinations at the expense of
disrupting high order high performance schemata. The extent of population divergence
introduced by the crossover operator is determined in part by the degree of schema diversity
present in the current population. In particular, when the population becomes uniform,
the crossover operator is nullified, because assembling substrings extracted from
identical strings produces identical progeny.
The mutation operator also provides a disruptive mechanism which resists the converging
influence of the reproduction operator. Since any schema can be produced by
mutation with nonzero probability, the permanent extinction of any of the 3^L possible
distinct schemata is precluded.
These ideas are captured in the following inequality, which is referred to in the literature
as the Fundamental Theorem of Genetic Algorithms. It relates the number of copies
of a particular schema in the current generation to the expected number of copies of the
same schema in the succeeding generation. This inequality is derived in [Gold89a] from
relatively simple probability notions. The development is not repeated here.
E\{ m(H,k+1) \} \geq m(H,k) \times \frac{R(H)}{\bar{R}} \times \left[ 1 - p_c \times \frac{\delta(H)}{(L - 1)} - p_m \times o(H) \right]    Eq. 3.2
where m(H,k) = number of occurrences of schema H in the population at generation k,
E{} = expected value operator,
R(H) = average objective function value (> 0) of all strings in the current population which are realizations of H,
\bar{R} = the average objective function value of the current population.
Equation 3.2 is an inequality because it does not consider the accretion of the
schema H contributed by crossover and mutation. It only accounts for the disruptive
effects of these operators. A more thorough treatment can be found on pp. 9-13 of
[Gref87], but the result is too cumbersome to be of much analytical value.
Qualitatively, Eq. 3.2 suggests that low order schemata occurring in the current
population contribute to succeeding generations in direct proportion to the product of
their number in the current generation and their average performance relative to the other
schemata competing for dominance of the same set of defining locations. Crossover and
mutation tend to disrupt this converging influence, and the disruptive effect of crossover
is directly proportional to the defining length of the schema in question.
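The qualitative reading of Eq. 3.2 can be illustrated with a short calculation. The sketch below evaluates the right-hand side of the inequality for hypothetical values of the schema count, fitness ratio and operator probabilities, and compares a long defining length schema with a short one of the same order; the function name and all numbers are assumptions chosen for illustration.

```python
def schema_bound(m, r_h, r_avg, delta, order, L, p_c, p_m):
    """Lower bound on the expected number of copies of a schema in the next generation."""
    # Fraction of copies assumed to survive disruption by crossover and mutation.
    survival = 1.0 - p_c * delta / (L - 1) - p_m * order
    return m * (r_h / r_avg) * survival

# Hypothetical values: 10 copies, schema fitness 20% above average, L = 5.
long_schema = schema_bound(m=10, r_h=1.2, r_avg=1.0, delta=3, order=2, L=5, p_c=0.6, p_m=0.01)
short_schema = schema_bound(m=10, r_h=1.2, r_avg=1.0, delta=1, order=2, L=5, p_c=0.6, p_m=0.01)
```

With these numbers the bound for the schema of defining length 3 falls below its current count of 10, while the bound for the schema of defining length 1 stays close to 10, which reflects the proportionality of crossover disruption to defining length.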
In view of Eq. 3.2, the building block hypothesis might be restated as a characteristic
of genetic algorithm amenable optimization problems. A GA amenable problem is one
for which a near optimum solution can be achieved, with a relatively small expenditure of
search effort, by assembling high performance, low order schemata into novel combinations.
If the objective function is such that (nonlinear) contributions from combinations of
bits spanning widely separate bit locations are appreciable (i.e. if the objective function
depends heavily on large defining length schemata), then the problem is not likely to be
suitable for solution by genetic algorithm. On the other hand, if the objective function
depends predominantly on short defining length schemata, then sorting through promising
combinations of realizations of those schemata is likely to isolate good (though not
necessarily optimal) solutions. Accomplishing the required sorting efficiently is the task
for which genetic algorithms are well suited.
3.4 An Assessment of the Genetic Algorithm Theoretical Foundation
The existing theoretical foundation for analysis of genetic algorithms includes the fundamental theorem of genetic algorithms (Eq. 3.2), originally enunciated by Holland [Holl75] and extended by Bridges and Goldberg [BrGo87]; the Walsh function approach to computing schema fitness averages contributed by Bethke [Beth80] and generalizations of it [Gold88, Gold89b, BrGo89]; a result concerning selection of the optimal population size for the algorithm [Gold85] in terms of the solution space dimension; and the examination of the properties which make a problem difficult for genetic algorithms (the so-called minimal deceptive problem) [Gold87, Gold89b]. Also, both De Jong [Dejo75] and Goldberg/Segrest [GoSe87] employ Markov chain methodology accompanied by approximate numerical analysis to examine certain specific problems concerning finite-length chain behavior (e.g. genetic drift in a binary-allele genetic algorithm).
No complete theoretical model exists for describing the operation of the simple genetic algorithm executing on a specified optimization problem. The central theme of the work underlying this paper is an attempt to develop such a model based upon the asymptotic behavior of a Markov chain which represents the algorithm.
SECTION 4
A MARKOV CHAIN MODEL OF THE SIMPLE GENETIC ALGORITHM
4.1 Overview
From the discussion of the simple genetic algorithm operators in Section 3.2, it is clear that the sequence of populations generated by the algorithm when executing on a specified combinatorial optimization problem is a stochastic process (with finite state space), and further that the conditional dependence of each population in the sequence on its predecessors is completely described by its dependence upon the immediate predecessor population. Thus, the sequence is a Markov chain (Definition A1). In this section, a nonstationary Markov chain model of the simple genetic algorithm is developed for one-, two- and three-operator variants of the algorithm. The model is tailored to resemble that offered in Section 2.4.1 for simulated annealing. The one-operator genetic algorithm model implements proportional reproduction only, while the two-operator variant employs reproduction in combination with mutation. The three-operator algorithm implements reproduction, mutation and crossover. This model hierarchy is employed because it provides some degree of insight into the effect that each operator has on the nature of the state space of the resulting Markov chain.
Describing and analyzing the operation of the simple genetic algorithm is facilitated by assuming that the underlying optimization problem is defined over a bitstring solution space. This assumption is not essential and sacrifices very little generality. It is adopted throughout the following sections.
4.2 The Markov Chain Model
Let a combinatorial optimization problem be characterized by the pair $(S, R)$ where $S = \{0,1\}^L$ and $R$ is a strictly positive real-valued reward function, and assume, with no loss of generality, that the problem requires maximization of $R$. Also, let a simple genetic algorithm designed to execute on this problem have fixed population size $M$, let $i \in S$ be interpreted as an unsigned integer ($0 \le i \le 2^L - 1$), and let a generation be represented by $m = (m(0), m(1), \ldots, m(2^L - 1))$ where $m(i)$ is the number of occurrences of solution $i \in S$ in the population. Thus, in the parlance of combinatorial mathematics, $m$ is a distribution of $M$ nondistinct objects over $N = \mathrm{card}(S) = 2^L$ bins [Hall67, Rior58], and the set of all such distributions, $S' = \{m\}$, is a suitable representation of the simple genetic algorithm search space. The cardinality of $S'$ is given by
$$N' = \mathrm{card}(S') = \binom{M + 2^L - 1}{M} = \binom{M + N - 1}{M}. \tag{4.1}$$
Since both $N$ and $M$ are finite, so is $N'$.
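As a quick numerical check of Eq. 4.1 (an illustrative worked case, not part of the original derivation), take $M = 2$ and $L = 2$, so that $N = 2^2 = 4$:
$$N' = \binom{2 + 4 - 1}{2} = \binom{5}{2} = 10.$$
That is, there are ten distinct populations of two 2-bit strings, in agreement with the $M = 2$, $L = 2$ entry of the state space cardinality table in Section 5.2.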
Then, if $m_0 \in S'$ is selected as an initial population, the simple genetic algorithm can be represented by the quadruple $(S', m_0, P_Q, F)$ where $P_Q$ is a state transition matrix (analogous to $P_T$ of the simulated annealing model) and $F = \{Q_k\}$ is a finite-length sequence of parameter vectors $Q_k = (p_m(k), p_c(k))$. The algorithm parameters $p_m(k)$ and $p_c(k)$ are respectively the mutation and crossover probabilities. In the following sections, the mutation probability sequence is employed in a role analogous to absolute temperature in simulated annealing, and consideration is limited hereafter to monotone nonincreasing sequences. In general, the only limitation on the crossover probability sequence is that its values are probabilities. However, in all of the following, consideration is limited to constant crossover probability sequences.
The first parameter vector in $F$ is $Q_0$ and the final parameter vector is $Q_f$. The solution evolves as a sequence $\{m_k\}$ of states $m_k \in S'$ in which the conditional dependence of $m_{k+1}$ on the sequence history is equivalent to its conditional dependence on $m_k$, and thus the solution sequence is a Markov chain. In general, the chain is inhomogeneous (Definition A5). In Section 4.3 it is shown to be time-homogeneous if the parameter vectors are constant. As with the simulated annealing algorithm model, exhausting the sequence of control parameter values, $F$, signals algorithm termination, and $f$ can be allowed to depend on $\{m_k\}$ provided the algorithm termination requirement is satisfied.
4.3 State Behavior of the Simple Genetic Algorithm
In each of the next three subsections, the state transition mechanism (and its effect on the nature of the solution sequence) which results from applying a specified combination of the genetic algorithm operators to the Markov chain model is examined. The first case consists of a one-operator algorithm which employs only the reproduction operator. The second is a two-operator algorithm which employs reproduction and mutation. Finally, a three-operator algorithm which includes crossover with reproduction and mutation is examined.
Although it is most natural to describe the genetic operators in the order reproduction/crossover/mutation, the course adopted in Section 3.2, the following development proceeds most instructively if mutation is included with reproduction in the two-operator algorithm and crossover is deferred to the three-operator case. This is because the mutation operator provides the essential state space modification required to make the Markov chains of the time-homogeneous two- and three-operator algorithms irreducible (Definitions A7 and A8, Theorem A1), and consequently causes them to have unique stationary distributions (Theorem A3). The one-operator algorithm (proportional reproduction only) does not satisfy the irreducibility requirement for existence of a unique stationary distribution. (Neither does the algorithm variant which employs reproduction and crossover without mutation.) A unique stationary distribution means that the asymptotic state occupancy probability of the time-homogeneous two- and three-operator algorithms is completely determined by the algorithm parameters and objective function. It is independent of the starting state (initial population). Asymptotic independence of the starting state is a necessary (but not sufficient) condition on the zero mutation probability limit of the stationary distribution of the time-homogeneous algorithm for the inhomogeneous algorithm counterpart to avoid (asymptotically) local minima entrapment.
4.3.1 A One-Operator Algorithm (Reproduction)
In this subsection, the nature of the state transition matrix is examined for the case of no crossover or mutation (i.e. $Q_k = (0,0)$ for $0 \le k \le f$). In this case, the conditional probability of selecting a solution $i \in S$ from a population described by the state vector $n \in S'$ is (i.e. proportional reproduction)
$$\forall i \in S, \forall n \in S' : P_1(i \mid n) = \frac{n(i) \times R(i)}{\sum_{j \in S} n(j) \times R(j)} \tag{4.2}$$
where the subscript 1 indicates that the one-operator case is under consideration. Thus, the conditional probability of the successor generation described by $m$ given that the present generation is described by $n$ is a multinomial distribution, i.e.
$$\begin{aligned} \forall m, n \in S' : P_1(m \mid n) &= \frac{M!}{\prod_{i \in S} m(i)!} \times \prod_{i \in S} P_1(i \mid n)^{m(i)} \\ &= \binom{M}{m} \times \prod_{i \in S} P_1(i \mid n)^{m(i)} \\ &= \binom{M}{m} \times \prod_{i \in S} \left( \frac{n(i) \times R(i)}{\sum_{j \in S} n(j) \times R(j)} \right)^{m(i)} \end{aligned} \tag{4.3}$$
where again the subscript 1 distinguishes the one-operator case, where the symbol
$$\binom{M}{m} = \frac{M!}{\prod_{i \in S} m(i)!} \tag{4.4}$$
designates the indicated multinomial coefficient, and where by definition
$$n(i) = 0 \;\Rightarrow\; \left( \frac{n(i) \times R(i)}{\sum_{j \in S} n(j) \times R(j)} \right)^{m(i)} = \begin{cases} 1 & m(i) = 0 \\ 0 & m(i) > 0. \end{cases} \tag{4.5}$$
The transition probability matrix of the Markov chain representing the one-operator algorithm is composed of the array of conditional probabilities defined by Eq. 4.3, i.e.
$$P = [P_1(m \mid n)]. \tag{4.6}$$
Since it is independent of the sequence index (i.e. the parameter vectors are constant), the one-operator Markov chain is time-homogeneous (Definition A5).
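The one-operator transition probabilities of Eq. 4.2-4.4 can be sketched numerically. The following is a minimal illustration, not the author's program; the function names and the tiny example problem (L = 1, M = 2, rewards 1 and 3) are assumptions chosen only to keep the matrix small enough to read.

```python
from math import factorial, prod

def p1_select(n, R):
    """Eq. 4.2: probability of selecting each solution i from population n."""
    total = sum(n_j * R_j for n_j, R_j in zip(n, R))
    return [n_i * R_i / total for n_i, R_i in zip(n, R)]

def multinomial_coeff(m):
    """Eq. 4.4: M! / prod_i m(i)!, with M = sum(m)."""
    return factorial(sum(m)) // prod(factorial(m_i) for m_i in m)

def p1_transition(m, n, R):
    """Eq. 4.3: probability that population n is followed by population m."""
    p = p1_select(n, R)
    # Python evaluates 0.0 ** 0 as 1.0 and 0.0 ** k as 0.0 for k > 0,
    # which matches the convention stated in Eq. 4.5.
    return multinomial_coeff(m) * prod(p_i ** m_i for p_i, m_i in zip(p, m))

# Hypothetical example: L = 1 (two solutions), M = 2, rewards R = (1, 3).
R = (1.0, 3.0)
states = [(2, 0), (1, 1), (0, 2)]
P = [[p1_transition(m, n, R) for m in states] for n in states]
for n, row in zip(states, P):
    print(n, [round(x, 4) for x in row])
```

In this small case the rows for the uniform populations (2, 0) and (0, 2) place all probability on the diagonal, while the mixed population (1, 1) moves to (2, 0), (1, 1), (0, 2) with probabilities 1/16, 6/16 and 9/16.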
The set of states which represent uniform populations (i.e. the states $m_A \in S_A' \subset S'$ in which one component is $M$ and all others are zero) are absorbing states of the Markov chain, because for any such state, $P_1(m_A \mid m_A) = 1$ and Definition A6 applies. Since it follows from Eq. 4.2-4.3 that $\forall n \in S' - S_A' : P_1(n \mid n) < 1$, there are exactly $N = 2^L$ absorbing states. The corresponding rows of $P$ are given by
$$\forall n_A \in S_A' : P_1(m \mid n_A) = \begin{cases} 1 & m = n_A \\ 0 & m \in S' - \{n_A\}. \end{cases} \tag{4.7}$$
Thus, for each state $n_A \in S_A'$, the associated row of the state transition matrix (Eq. 4.6) contains 1 in the principal diagonal location and 0 elsewhere. It follows that the $N' \times 1$ probability vector $q_{n_A}$ (Definition A2) whose $n_A \in S_A'$ component is 1 is a stationary distribution (Definition A10) of the one-operator Markov chain. It is not unique because any of the $N = 2^L$ such vectors satisfies the requirement, as does any vector of the form
$$q = \sum_{n_A \in S_A'} w_{n_A} q_{n_A} \quad \text{where } w_{n_A} \ge 0 \text{ and } \sum_{n_A \in S_A'} w_{n_A} = 1.$$
The absorbing states preclude irreducibility (Theorem A1), so the Markov chain does not satisfy the requirements of Theorem A3. The chain is aperiodic (Definition A9), however, because $\forall m \in S' : P_1(m \mid m) > 0$, so the period of all states is 1. Thus, all of the conditions of Theorem A3 except irreducibility are met by the one-operator Markov chain.
The expected number of transitions required to arrive in an absorbing state, $E\{k_A\}$, is finite. An upper bound on $E\{k_A\}$ is given by
$$E\{k_A\} \le \left( \frac{M \times R_{max}}{R_{min}} \right)^{2M} < \infty \tag{4.8}$$
where $R_{min}$ and $R_{max}$ are the extreme values of $R$. (Recall that $R$ is assumed strictly positive, so $R_{max} \ge R_{min} > 0$.) Eq. 4.8 can be derived by defining $p_A(k)$ as the conditional probability of arriving in the set of absorbing states, $S_A'$, on the $k$th transition given that the $k$th state is not absorbing, letting $p_{min}$ be a lower bound on $p_A(k)$, and bounding the series for $E\{k_A\}$ as follows:
$$\begin{aligned} E\{k_A\} &= \sum_{k} k \times p_A(k) \prod_{l=1}^{k-1} (1 - p_A(l)) \\ &\le \sum_{k} k \times \prod_{l=1}^{k-1} (1 - p_A(l)) \\ &\le \sum_{k} k \times (1 - p_{min})^{k-1} \\ &= 1/(p_{min})^2. \end{aligned} \tag{4.9}$$
Next, note that a suitable bounding value on $p_{min}$ is
$$p_{min} = \left[ \frac{R_{min}}{M \times R_{max}} \right]^{M} > 0. \tag{4.10}$$
The desired bound on $E\{k_A\}$ (Eq. 4.8) is then obtained by using Eq. 4.10 in Eq. 4.9.
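A small worked case may help gauge how loose the bound is. It is an illustration only, and it reads Eq. 4.10 as $p_{min} = [R_{min}/(M \times R_{max})]^M$. Take $L = 1$, $M = 2$ and a constant reward, so that $R_{min} = R_{max}$. The only non-absorbing state is $n = (1,1)$, for which $P_1(0 \mid n) = P_1(1 \mid n) = 1/2$. The probability of moving to one of the two uniform populations in a single transition is therefore $2 \times (1/2)^2 = 1/2$, and the number of transitions to absorption is geometrically distributed:
$$E\{k_A\} = \sum_{k \ge 1} k \left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)^{k-1} = 2.$$
Under the stated reading, the bound evaluates to
$$p_{min} = \left(\tfrac{1}{2}\right)^2 = \tfrac{1}{4}, \qquad \frac{1}{(p_{min})^2} = \left( \frac{M \times R_{max}}{R_{min}} \right)^{2M} = 2^4 = 16,$$
so the bound holds ($2 \le 16$) but is conservative, as expected from the crude estimates used in Eq. 4.9.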
It is noteworthy that the above absorbing state convergence result does not require any assumption on the range $R_{max} - R_{min}$. Even when the objective function exerts zero selective pressure (i.e. $\forall i \in S : R(i) = R_{min} = R_{max}$), the finite population size still results in convergence to an absorbing state. In the parlance of genetics, this tendency is referred to as genetic drift. It is responsible for the inevitable convergence of the one-operator simple genetic algorithm, as discussed in Section 3.2.1.
4.3.2 A Two-Operator Algorithm (Reproduction and Mutation)
In this subsection, the nature of the state transition matrix is examined when the mutation operator is applied with some probability in the range $0 < p_m(k) < 1$ (i.e. $Q_k = (p_m(k), 0)$). Let $P_2(i \mid n)$ and $P_2(m \mid n)$ be the conditional distributions of the two-operator algorithm corresponding to the one-operator distributions defined by Eq. 4.2-4.5. Then, $P_2(i \mid n)$ and $P_2(m \mid n)$ must account for the effect of nonzero $p_m$. This can be accomplished by expressing $P_2(i \mid n)$ as a sum over all $j$ of the corresponding $P_1(j \mid n)$ times a factor which accounts for the probability of the collection of mutation events required to transform $j$ into $i$. This probability can be expressed as $p_m^{H(i,j)} (1 - p_m)^{L - H(i,j)}$ where $H(i,j) = H(j,i)$ is the Hamming distance of the pair $i,j$. That is, $H$ is a function defined on $S \times S$ with values in $\{0, 1, 2, \ldots, L\}$. $H(i,j)$ is the number of bits which must be altered by mutation to transform $i$ into $j$ and $L - H(i,j)$ is the number of bits which must remain unaltered. Thus, $P_2(i \mid n)$ can be written as
$$\begin{aligned} \forall i \in S, \forall n \in S' : P_2(i \mid n) &= \sum_{j \in S} p_m^{H(i,j)} (1 - p_m)^{L - H(i,j)} \times P_1(j \mid n) \\ &= (1 - p_m)^L \sum_{j \in S} \left( \frac{p_m}{1 - p_m} \right)^{H(i,j)} \times P_1(j \mid n) \\ &= \frac{1}{(1 + \alpha)^L} \sum_{j \in S} \alpha^{H(i,j)} \times P_1(j \mid n) \end{aligned} \tag{4.11}$$
where
$$\alpha = \frac{p_m}{1 - p_m} \tag{4.12}$$
and
$$p_m = \frac{\alpha}{1 + \alpha}. \tag{4.13}$$
For $p_m = 0$ or $p_m = 1$, Eq. 4.11 includes the indeterminate form $0^0$ in some terms. Thus, the admissible range of $p_m$ is restricted to $0 < p_m < 1$, and consequently that of $\alpha$ is $0 < \alpha < \infty$. However, cases corresponding to $p_m > 1/2 \Leftrightarrow \alpha > 1$ are of no practical interest (they are less random than the case $p_m = 1/2 \Leftrightarrow \alpha = 1$), and some of the following developments restrict consideration to the range $0 < p_m \le 1/2 \Leftrightarrow 0 < \alpha \le 1$.
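Substituting Eq. 4.2 into Eq. 4.11 yields
$$P_2(i \mid n) = \frac{1}{(1 + \alpha)^L} \sum_{j \in S} \alpha^{H(i,j)} \times \frac{n(j) \times R(j)}{\sum_{k \in S} n(k) \times R(k)}.$$
It is also straightforward to show that
$$\sum_{k \in S} n(k) \times R(k) = \frac{1}{(1 + \alpha)^L} \sum_{j \in S} \sum_{k \in S} n(k) \times R(k) \times \alpha^{H(j,k)}.$$
Thus, $P_2(i \mid n)$ can be expressed as
$$P_2(i \mid n) = \frac{\sum_{j \in S} n(j) \times R(j) \times \alpha^{H(i,j)}}{\sum_{j \in S} \sum_{k \in S} n(k) \times R(k) \times \alpha^{H(j,k)}} = \frac{\sum_{j \in S} n(j) \times R(j) \times \alpha^{H(i,j)}}{(1 + \alpha)^L \sum_{k \in S} n(k) \times R(k)}, \tag{4.14}$$
and $P_2(m \mid n)$ is multinomially distributed as follows:
$$\begin{aligned} \forall m, n \in S' : P_2(m \mid n) &= \frac{M!}{\prod_{i \in S} m(i)!} \times \prod_{i \in S} P_2(i \mid n)^{m(i)} \\ &= \binom{M}{m} \times \prod_{i \in S} P_2(i \mid n)^{m(i)} \\ &= \binom{M}{m} \frac{1}{(1 + \alpha)^{ML}} \prod_{i \in S} \left( \frac{\sum_{j \in S} n(j) \times R(j) \times \alpha^{H(i,j)}}{\sum_{k \in S} n(k) \times R(k)} \right)^{m(i)}. \end{aligned} \tag{4.15}$$
The transition probability matrix of the Markov chain representing the two-operator algorithm is composed of the array of conditional probabilities defined by Eq. 4.15, i.e.
$$P = [P_2(m \mid n)]. \tag{4.16}$$
Since the elements of $P$ depend on $\alpha$ (and hence by Eq. 4.12 on $p_m(k)$), the two-operator Markov chain is generally not time-homogeneous. It is time-homogeneous if the mutation probability is fixed.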
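Eq. 4.14-4.16 for the two-operator simple genetic algorithm are analogous to Eq. 4.2-4.6 for the one-operator variant except that $P_2(i \mid n)$ is strictly greater than zero for all $n \in S'$. Thus, the two-operator analog of Eq. 4.5 is not required. Also
$$\lim_{\alpha \to 0^+} P_2(i \mid n) = P_1(i \mid n) \quad \text{and} \quad \lim_{\alpha \to 0^+} P_2(m \mid n) = P_1(m \mid n). \tag{4.17}$$
The rows of the state transition matrix corresponding to the one-operator absorbing states have an especially simple form. Let $i_A \in S$ be the solution represented in the absorbing state $n_A \in S_A'$. Then, from Eq. 4.14,
$$P_2(i \mid n_A) = \frac{M \times R(i_A) \times \alpha^{H(i, i_A)}}{(1 + \alpha)^L \times M \times R(i_A)} = \frac{\alpha^{H(i, i_A)}}{(1 + \alpha)^L}. \tag{4.18}$$
Thus, from Eq. 4.15,
$$P_2(m \mid n_A) = \binom{M}{m} \frac{\alpha^{\sum_{i \in S} m(i) \times H(i, i_A)}}{(1 + \alpha)^{ML}}. \tag{4.19}$$
Since the reward function, $R$, is strictly positive by hypothesis, and since $\forall i,j \in S : 0 \le H(i,j) \le L$, it follows that for $\alpha$ in the range $0 < \alpha \le 1$,
$$\alpha^L \sum_{j \in S} n(j) \times R(j) \;\le\; \sum_{j \in S} n(j) \times R(j) \times \alpha^{H(i,j)} \;\le\; \sum_{j \in S} n(j) \times R(j),$$
and consequently from Eq. 4.14 that
$$\forall i \in S, \forall n \in S' : \left( \frac{\alpha}{1 + \alpha} \right)^L \le P_2(i \mid n) \le \left( \frac{1}{1 + \alpha} \right)^L. \tag{4.20}$$
Using Eq. 4.20 in Eq. 4.15 yields
$$\forall m, n \in S' : \binom{M}{m} \left( \frac{\alpha}{1 + \alpha} \right)^{ML} \le P_2(m \mid n) \le \binom{M}{m} \left( \frac{1}{1 + \alpha} \right)^{ML}. \tag{4.21}$$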
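The closed form of Eq. 4.14 and the bounds of Eq. 4.20 are easy to check numerically. The sketch below is illustrative only: the function names, the reward values and the population are assumptions, and solutions are indexed by their unsigned integer values as in Section 4.2.

```python
def hamming(i, j):
    """Number of bit positions in which the integers i and j differ."""
    return bin(i ^ j).count("1")

def p2_select(n, R, alpha, L):
    """Eq. 4.14: probability of producing solution i by selection then mutation."""
    N = 2 ** L
    denom = (1 + alpha) ** L * sum(n[k] * R[k] for k in range(N))
    return [sum(n[j] * R[j] * alpha ** hamming(i, j) for j in range(N)) / denom
            for i in range(N)]

# Hypothetical example: L = 2, M = 3, alpha = 0.25 (i.e. p_m = 0.2 by Eq. 4.13).
L, alpha = 2, 0.25
R = [1.0, 2.0, 3.0, 4.0]
n = [1, 0, 2, 0]

p2 = p2_select(n, R, alpha, L)
lower = (alpha / (1 + alpha)) ** L      # lower bound of Eq. 4.20
upper = (1 / (1 + alpha)) ** L          # upper bound of Eq. 4.20
print([round(x, 4) for x in p2], lower, upper)

# As alpha approaches zero the result approaches the one-operator
# distribution of Eq. 4.2, here (1/7, 0, 6/7, 0), as stated in Eq. 4.17.
near_zero = p2_select(n, R, 1e-9, L)
```

Every entry of `p2` is strictly positive, including those for solutions absent from the population, which is the property that later yields irreducibility.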
From the lower bound in Eq. 4.21, the final requirement of Theorem A3 (irreducibility) is fulfilled and the Markov chain for the time-homogeneous two-operator simple genetic algorithm possesses a unique stationary distribution, $q$, given by
$$\forall m \in S' : \quad (1)\; q(m) > 0, \qquad (2)\; q^T \mathbf{1} = 1, \qquad (3)\; q^T P = q^T.$$
Since the stationary distribution is by definition a left eigenvector of the state transition matrix (Definition A10), it follows from Eq. 4.15 and 4.16 that the asymptotic state probability distribution of the time-homogeneous two-operator algorithm is completely determined by the objective function and the algorithm parameters. It is independent of the starting state, $m_0$.
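Independence of the starting state can be observed directly on a problem small enough to enumerate. The following sketch builds the full two-operator transition matrix (Eq. 4.14-4.16) for an assumed toy problem (L = 2, M = 2, rewards 1 to 4, α = 0.5) and propagates two different point-mass starting distributions. All names and parameter values are illustrative assumptions, and plain repeated multiplication stands in for whatever eigenvector method one might prefer.

```python
from math import factorial, prod

def states(M, N):
    """All distributions of M strings over N solutions (the set S')."""
    if N == 1:
        return [(M,)]
    return [(first,) + rest
            for first in range(M, -1, -1)
            for rest in states(M - first, N - 1)]

def hamming(i, j):
    return bin(i ^ j).count("1")

def p2_select(n, R, alpha, L):
    """Eq. 4.14."""
    N = 2 ** L
    denom = (1 + alpha) ** L * sum(n[k] * R[k] for k in range(N))
    return [sum(n[j] * R[j] * alpha ** hamming(i, j) for j in range(N)) / denom
            for i in range(N)]

def transition_matrix(M, L, R, alpha):
    """Eq. 4.15-4.16: P[a][b] is the probability of moving from state a to b."""
    S = states(M, 2 ** L)
    P = []
    for n in S:
        p = p2_select(n, R, alpha, L)
        P.append([factorial(M) // prod(factorial(x) for x in m)
                  * prod(pi ** mi for pi, mi in zip(p, m)) for m in S])
    return S, P

def iterate(q, P, steps):
    """Propagate the row vector q through `steps` transitions."""
    for _ in range(steps):
        q = [sum(q[a] * P[a][b] for a in range(len(q))) for b in range(len(q))]
    return q

M, L, alpha = 2, 2, 0.5
R = [1.0, 2.0, 3.0, 4.0]
S, P = transition_matrix(M, L, R, alpha)
start_a = [1.0] + [0.0] * (len(S) - 1)   # all mass on the population (2,0,0,0)
start_b = [0.0] * (len(S) - 1) + [1.0]   # all mass on the population (0,0,0,2)
qa = iterate(start_a, P, 200)
qb = iterate(start_b, P, 200)
gap = max(abs(x - y) for x, y in zip(qa, qb))
print(gap)
```

The two runs agree to within floating-point error, and every state retains strictly positive probability, consistent with conditions (1) to (3) above.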
4.3.3 A Three-Operator Algorithm (Reproduction, Mutation and Crossover)
The three-operator simple genetic algorithm corresponds to the case $\forall k : Q_k = (p_m(k), p_c(k))$ with both $p_m(k)$ and $p_c(k)$ nonzero. Results analogous to Eq. 4.14-4.21 for the two-operator case are obtainable by defining a new function which is similar in character to the Hamming distance function employed in Section 4.3.2 for the two-operator case. This subsection completes that generalization. The result reflects the crossover operation only implicitly; however, it permits some very significant conclusions concerning bounding values of the three-operator conditional probabilities.
The new function, $I(i,j,k,s)$, is defined over an ordered quadruple $(i,j,k,s)$ where $i,j,k \in S$ and where $s \in \{0, 1, \ldots, L\}$ is a bitstring location. The states $i,j \in S$ represent respectively the first and second parent strings selected at a particular crossover opportunity and $k \in S$ represents a possible descendant string. The bitstring location $s$ is the location randomly selected by the crossover operator, and normally it is uniformly distributed over its range. Thus, $I$ is defined on $S \times S \times S \times \{0, 1, 2, \ldots, L\}$ and it takes on values selected from $\{0, 1\}$ depending upon whether the indicated crossover operation is or is not consistent. That is, $I$ assumes the value one if the bitstring $k$ is produced by crossing the bitstrings $i$ and $j$ at the site $s$, and zero otherwise.
In terms of this crossover operator function, the conditional probability of producing, via reproduction and crossover, a solution $k \in S$ given a current population described by $n \in S'$ is
$$\begin{aligned} P_2'(k \mid n) &= p_c \times \sum_{i \in S} \sum_{j \in S} \sum_{s} P_1(i \mid n) \times P_1(j \mid n) \times \frac{1}{L} I(i,j,k,s) + (1 - p_c) \times P_1(k \mid n) \\ &= \frac{p_c}{L} \times \sum_{i \in S} \sum_{j \in S} \sum_{s=1}^{L} P_1(i \mid n) \times P_1(j \mid n) \times I(i,j,k,s) + (1 - p_c) \times P_1(k \mid n) \end{aligned} \tag{4.22}$$
where $P_1(i \mid n)$ is as defined in Eq. 4.2 and where $P_2'(i \mid n)$ refers to the two-operator algorithm consisting of reproduction and crossover without mutation. This result assumes uniformly distributed crossover site selection.
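The reproduction-plus-crossover distribution of Eq. 4.22 can be sketched as follows. The text does not say which of the two offspring of a crossing is taken as the descendant $k$, so this sketch assumes, purely for illustration, that $k$ carries the first $s$ bits of the first parent $i$ and the remaining $L - s$ bits of the second parent $j$ (so that $s = L$ returns $i$ unchanged). The function names and the example values are likewise assumptions.

```python
def crossover_indicator(i, j, k, s, L):
    """I(i, j, k, s) under the assumed convention: 1 if k has the first s bits
    of i followed by the last L - s bits of j, and 0 otherwise."""
    low_mask = (1 << (L - s)) - 1           # selects the last L - s bits
    child = (i & ~low_mask) | (j & low_mask)
    return 1 if child == k else 0

def p1_select(n, R):
    """Eq. 4.2: proportional reproduction."""
    total = sum(a * b for a, b in zip(n, R))
    return [a * b / total for a, b in zip(n, R)]

def p2prime_select(n, R, pc, L):
    """Eq. 4.22 with crossover sites s = 1..L chosen uniformly."""
    N = 2 ** L
    p1 = p1_select(n, R)
    out = []
    for k in range(N):
        crossed = sum(p1[i] * p1[j] * crossover_indicator(i, j, k, s, L)
                      for i in range(N) for j in range(N)
                      for s in range(1, L + 1))
        out.append(pc * crossed / L + (1 - pc) * p1[k])
    return out

# Hypothetical example: L = 2, crossover probability 0.6.
L, pc = 2, 0.6
R = [1.0, 2.0, 3.0, 4.0]
mixed = p2prime_select([1, 0, 0, 1], R, pc, L)     # population {00, 11}
uniform = p2prime_select([0, 0, 2, 0], R, pc, L)   # population {10, 10}
print([round(x, 4) for x in mixed])
print(uniform)
```

For the mixed population, the strings 01 and 10 receive nonzero probability even though neither is present, because crossing 00 with 11 can produce them. For the uniform population, crossover has no effect and the result coincides with $P_1$.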
The array of conditional probabilities $[P_2'(i \mid n)]$ plays a role in the three-operator simple genetic algorithm closely analogous to the role played by the array $[P_1(i \mid n)]$ in the two-operator variant. In fact, the elements of the $[P_2'(i \mid n)]$ array can be used as counterparts of Eq. 4.2 to develop results exactly analogous to Eq. 4.3 and Eq. 4.6. Further, for $n_A \in S_A'$, Eq. 4.22 reduces to
$$P_2'(k \mid n_A) = P_1(k \mid n_A), \tag{4.23}$$
and consequently this (fictitious) two-operator algorithm (reproduction and crossover) demonstrates the same sort of absorbing state behavior as the one-operator algorithm.
From Eq. 4.22, the three-operator conditional probabilities and state transition matrix are expressible as
$$P_3(i \mid n) = \frac{1}{(1 + \alpha)^L} \times \sum_{j \in S} \alpha^{H(i,j)} \times P_2'(j \mid n), \tag{4.24}$$
$$P_3(m \mid n) = \frac{M!}{\prod_{i \in S} m(i)!} \times \prod_{i \in S} P_3(i \mid n)^{m(i)} = \binom{M}{m} \times \prod_{i \in S} P_3(i \mid n)^{m(i)} \tag{4.25}$$
and
$$P = [P_3(m \mid n)]. \tag{4.26}$$
These results are developed in a fashion analogous to Eq. 4.14-4.16. From them, it follows that the three-operator Markov chain is time-homogeneous if both the mutation and crossover probabilities are fixed. In general it is not time-homogeneous.
From Eq. 4.22, 4.24 and 4.25, it follows that
$$\lim_{\alpha \to 0^+} P_3(i \mid n) = P_2'(i \mid n) \quad \text{and} \quad \lim_{\alpha \to 0^+} P_3(m \mid n) = P_2'(m \mid n). \tag{4.27}$$
Also, from Eq. 4.23-4.25, the three-operator analogs of Eq. 4.18-4.19 apply:
$$P_3(i \mid n_A) = \frac{\alpha^{H(i, i_A)}}{(1 + \alpha)^L}, \tag{4.28}$$
$$P_3(m \mid n_A) = \binom{M}{m} \frac{\alpha^{\sum_{i \in S} m(i) \times H(i, i_A)}}{(1 + \alpha)^{ML}}. \tag{4.29}$$
Additionally, since
$$\alpha^L \sum_{j \in S} P_2'(j \mid n) \;\le\; \sum_{j \in S} P_2'(j \mid n) \times \alpha^{H(i,j)} \;\le\; \sum_{j \in S} P_2'(j \mid n),$$
the three-operator analogs of Eq. 4.20-4.21 follow from Eq. 4.24-4.25, i.e.
$$\forall i \in S, \forall n \in S' : \left( \frac{\alpha}{1 + \alpha} \right)^L \le P_3(i \mid n) \le \left( \frac{1}{1 + \alpha} \right)^L \tag{4.30}$$
and
$$\forall m, n \in S' : \binom{M}{m} \left( \frac{\alpha}{1 + \alpha} \right)^{ML} \le P_3(m \mid n) \le \binom{M}{m} \left( \frac{1}{1 + \alpha} \right)^{ML}. \tag{4.31}$$
All of the state space characteristics described in Section 4.3.2 for the two-operator algorithm follow. In particular, the Markov chain of the three-operator algorithm is irreducible. Thus, a unique stationary distribution exists for the time-homogeneous three-operator simple genetic algorithm, and as in the two-operator case it is completely determined by the objective function and the algorithm parameter values.
4.3.4 Summary
The asymptotic behavior of the one-operator simple genetic algorithm is dominated by the states which correspond to uniform populations, the one-operator absorbing states. The expected number of algorithm iterations required to arrive at some member of the absorbing state set is finite (Eq. 4.8). The asymptotic probability distribution depends upon the algorithm initial population, $m_0$. This observation is equivalent to the fact, established in Section 4.3.1, that the stationary distribution of the one-operator algorithm is not unique.
A unique stationary distribution exists for the time-homogeneous two- and three-operator algorithm variants (with $\alpha > 0$), or equivalently, their asymptotic probability distributions are independent of $m_0$. However, in the $\alpha \to 0^+$ limit, both the two- and three-operator algorithms degenerate into the absorbing state behavior which typifies the one-operator case (Eq. 4.17 and Eq. 4.23, 4.27). A very important question is whether the unique stationary distributions of the two- and three-operator algorithms approach limits as $\alpha \to 0^+$. Section 7 answers that question affirmatively, and in Section 8, the lower bounds reflected in Eq. 4.21 and Eq. 4.31 are employed to arrive at a monotone decreasing sequence bound on $p_m(k)$ sufficient to guarantee that the limiting distribution is achieved (asymptotically) by the inhomogeneous two- and three-operator Markov chains.
The analogous conditional probability arrays $[P_1(i \mid n)]$ and $[P_2'(i \mid n)]$, whose elements are defined by Eq. 4.2 and Eq. 4.22 respectively, play an essential role in the following sections, especially in Section 9. Most of the results developed hereafter apply equally to the two- and three-operator algorithm variants by substituting from these conditional probability arrays appropriately. Thus, in much of the following, the notation modifiers are suppressed, so that the elements of either of these arrays are denoted by $P(i \mid n)$, with the specific array reference being determined by context.
SECTION 5
SOME EMPIRICAL RESULTS
5.1 Overview
This section reports the results of some computer simulations based upon the genetic algorithm Markov chain model developed in Section 4. Their purpose is to help make concrete some of the state space and asymptotic probability distribution ideas which are central features of this work.
The results reported here are separated into four subsections. Section 5.2 concerns enumeration of the state space, $S'$. Section 5.3 is devoted to generation of reward function data, which are subsequently used in the two remaining subsections. Section 5.4 illustrates the behavior of some selected conditional probabilities as a function of the algorithm control parameter, $\alpha$. The results of the primary simulation task are reported in Section 5.5. They concern computation of the three-operator stationary distribution at extremely low (approaching zero) values of the mutation probability control parameter.
One of the significant theoretical results developed in subsequent sections is suggested by the data presented in Section 5.5. It is that the zero mutation probability limiting stationary distribution provides nonzero probability for all states corresponding to uniform populations (i.e. one-operator absorbing states), including those which represent suboptimal solutions. This result poses a complication for the attempt to extrapolate the simulated annealing convergence theory onto the genetic algorithm, as discussed further in Section 5.5.
All simulation results included here were generated on the Cray Y-MP computer at the Eglin AFB, Fl. Computer Science Directorate. The data presented in Section 5.5 concerning the primary simulation task (the converged limiting stationary distribution results) include some CPU utilization statistics which reflect the approximately 180 hours of CPU time expended in generating those data. The source program listings for the programs employed in generating the results of this section are included in Appendix D.
5.2 State Space Enumeration
The results appearing in this section are of two primary types. The first is a table of computed state space cardinality values, $N'$, at a variety of combinations of bitstring length, $L$, and population size, $M$. These results are products of the program GET_NPS.F appearing in Appendix D. It implements Eq. 4.1. The results are collected in Table 5-1.
In addition to the $N'$ column, Table 5-1 includes a similar column labeled $N''$. It denotes the cardinality of a space designated $S''$ which is related to $S'$ and whose significance is established in Section 9. Its cardinality is given by
$$N'' = \binom{M + N}{M}. \tag{5.1}$$
The data recorded in column $N''$ of Table 5-1 are computed from this equation by the program GET_NPS.F.
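The two cardinalities are simple to recompute. The following Python sketch is not the GET_NPS.F program; it is a stand-in that evaluates Eq. 4.1 and, reading Eq. 5.1 as the binomial coefficient of $M + N$ over $M$ (a reading that reproduces the tabulated values), the second cardinality as well. The function name is an assumption.

```python
from math import comb

def state_space_sizes(M, L):
    """Return (N, N', N'') for population size M and bitstring length L."""
    N = 2 ** L
    n_prime = comb(M + N - 1, M)     # Eq. 4.1
    n_double_prime = comb(M + N, M)  # Eq. 5.1, as read above
    return N, n_prime, n_double_prime

for M, L in [(1, 1), (2, 3), (5, 4), (6, 4)]:
    print(M, L, *state_space_sizes(M, L))
```

Python integers have arbitrary precision, so the largest entries of the table (for M = 8, L = 8) can be evaluated exactly in the same way.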
Table 5-1
State Space Cardinality
M   L   N     N'                N"
1   1   2     2                 3
1   2   4     4                 5
1   3   8     8                 9
1   4   16    16                17
1   5   32    32                33
1   6   64    64                65
1   7   128   128               129
1   8   256   256               257
2   1   2     3                 6
2   2   4     10                15
2   3   8     36                45
2   4   16    136               153
2   5   32    528               561
2   6   64    2080              2145
2   7   128   8256              8385
2   8   256   32896             33153
3   1   2     4                 10
3   2   4     20                35
3   3   8     120               165
3   4   16    816               969
3   5   32    5984              6545
3   6   64    45760             47905
3   7   128   357760            366145
3   8   256   2829056           2862209
4   1   2     5                 15
4   2   4     35                70
4   3   8     330               495
4   4   16    3876              4845
4   5   32    52360             58905
4   6   64    766480            814385
4   7   128   11716640          12082785
4   8   256   183181376         186043585
5   1   2     6                 21
5   2   4     56                126
5   3   8     792               1287
5   4   16    15504             20349
5   5   32    376992            435897
5   6   64    10424128          11238513
5   7   128   309319296         321402081
5   8   256   9525431552        9711475137
6   1   2     7                 28
6   2   4     84                210
6   3   8     1716              3003
6   4   16    54264             74613
6   5   32    2324784           2760681
6   6   64    119877472         131115985
6   7   128   6856577728        7177979809
6   8   256   414356272512      424067747649
7   1   2     8                 36
7   2   4     120               330
7   3   8     3432              6435
7   4   16    170544            245157
7   5   32    12620256          15380937
7   6   64    1198774720        1329890705
7   7   128   131254487936      138432467745
7   8   256   15508763342592    15932831090241
8   1   2     9                 45
8   2   4     165               495
8   3   8     6435              12870
8   4   16    490314            735471
8   5   32    61523748          76904685
8   6   64    10639125640       11969016345
8   7   128   2214919483920     2353351951665
8   8   256   509850594887712   525783425977953
The second set of state space enumeration results reported here is a listing of the elements of $S'$ for a variety of values of $M$ and $L$. The principal purpose of this particular simulation task is to verify the computer algorithm (and the implementing subprograms) employed to generate the state space vectors for use in the primary simulation task reported in Section 5.5. The data in Tables 5-2, 5-3 and 5-4 below are products of the program GET_SPS.F appearing in Appendix D. The tabulated results are representative of data produced for $M$ and $L$ ranging up to 5 and 4 respectively, for which $N' = 15504$. In each case generated, the cardinality of the result agreed with that predicted by Eq. 4.1 and recorded in Table 5-1.
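One way to generate the elements of $S'$ is a short recursion over the first component. The sketch below is an illustrative Python stand-in, not the GET_SPS.F program; the function name is an assumption. It emits the state vectors in descending lexicographic order, which is the order in which the tabulated states appear.

```python
def enumerate_states(M, N):
    """Yield every tuple of N nonnegative integers summing to M,
    largest first component first."""
    if N == 1:
        yield (M,)
        return
    for first in range(M, -1, -1):
        for rest in enumerate_states(M - first, N - 1):
            yield (first,) + rest

# The ten states for M = 2, L = 2 (N = 4 solutions).
for state in enumerate_states(2, 4):
    print(*state)
```

Counting the generated states for each (M, L) pair and comparing with Eq. 4.1 repeats the consistency check described above.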
Table 5-2
S' at M=2, L=2
2 0 0 0
1 1 0 0
1 0 1 0
1 0 0 1
0 2 0 0
0 1 1 0
0 1 0 1
0 0 2 0
0 0 1 1
0 0 0 2
Table 5-3
S' at M=3, L=2
3 0 0 0
2 1 0 0
2 0 1 0
2 0 0 1
1 2 0 0
1 1 1 0
1 1 0 1
1 0 2 0
1 0 1 1
1 0 0 2
0 3 0 0
0 2 1 0
0 2 0 1
0 1 2 0
0 1 1 1
0 1 0 2
0 0 3 0
0 0 2 1
0 0 1 2
0 0 0 3
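Table 5-4
S' at M=2, L=3
2 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0
1 0 1 0 0 0 0 0
1 0 0 1 0 0 0 0
1 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0
1 0 0 0 0 0 1 0
1 0 0 0 0 0 0 1
0 2 0 0 0 0 0 0
0 1 1 0 0 0 0 0
0 1 0 1 0 0 0 0
0 1 0 0 1 0 0 0
0 1 0 0 0 1 0 0
0 1 0 0 0 0 1 0
0 1 0 0 0 0 0 1
0 0 2 0 0 0 0 0
0 0 1 1 0 0 0 0
0 0 1 0 1 0 0 0
0 0 1 0 0 1 0 0
0 0 1 0 0 0 1 0
0 0 1 0 0 0 0 1
0 0 0 2 0 0 0 0
0 0 0 1 1 0 0 0
0 0 0 1 0 1 0 0
0 0 0 1 0 0 1 0
0 0 0 1 0 0 0 1
0 0 0 0 2 0 0 0
0 0 0 0 1 1 0 0
0 0 0 0 1 0 1 0
0 0 0 0 1 0 0 1
0 0 0 0 0 2 0 0
0 0 0 0 0 1 1 0
0 0 0 0 0 1 0 1
0 0 0 0 0 0 2 0
0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 2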
5.3 Reward Function Data
This section presents two sets of reward function data, one for a four-bit optimization problem and another for a five-bit version. Both data sets are products of the program GET_R.F provided in Appendix D. These data sets are employed in the simulations presented in Sections 5.4 and 5.5. Figure 5-1 presents the four-bit function and Figure 5-2 the five-bit version.
In both data sets, the solution state which maximizes the reward value is the $i \in S$ represented by the decimal integer value 12. That is, for the four-bit function, $i_{opt} = 1100$, and its five-bit counterpart is $i_{opt} = 01100$. The reward function value for the arbitrary $i \in S$ is then computed by assigning the value 1 for each length 0, 1 or 2 schema (Section 3.3.1) in agreement with the optimum bit pattern and summing the contributions. Thus, for example, for the four-bit reward function, the bitstring 0000 has function value 4, generated by summing the contributions from the single matching length 0 schema, two matching length 1 schemata and the one matching length 2 schema. A strictly positive reward function is guaranteed since every string matches the single length 0 schema.
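The construction can be sketched in a few lines. The sketch below is not the GET_R.F program, and it rests on an interpretation that the text does not spell out: the single "length 0" schema is taken to be the one with no fixed positions, a "length 1" schema to fix a single bit, and a "length 2" schema to fix two adjacent bits. Under that assumed reading the sketch reproduces the quoted value of 4 for the bitstring 0000, but the remaining values are only as reliable as the assumption.

```python
def reward(i, L, optimum=12):
    """Assumed reading of the reward: one point for the empty schema, one per
    bit agreeing with the optimum, and one per adjacent pair of agreeing bits."""
    match = [((i >> b) & 1) == ((optimum >> b) & 1) for b in range(L)]
    singles = sum(match)
    pairs = sum(match[b] and match[b + 1] for b in range(L - 1))
    return 1 + singles + pairs

four_bit = [reward(i, 4) for i in range(16)]
five_bit = [reward(i, 5) for i in range(32)]
print(four_bit)
print(five_bit)
```

With this reading the maximum occurs at the decimal value 12 for both bitstring lengths, and every value is at least 1, consistent with the strictly positive reward assumed in Section 4.2.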
Figure 5-1. Four-Bit Reward Function [R(i) versus i]
Figure 5-2. Five-Bit Reward Function [R(i) versus i]
5.4 Conditional Probabilities Versus α
The following four figures present plots of two- and three-operator conditional probabilities at two selected current states, $n$. These results are computed from Eq. 4.14 and Eq. 4.24. The plots are generated for the four-bit problem with reward function given in Figure 5-1 and with $M = 6$. From Table 5-1, the cardinality of $S'$ for these examples is $N' = 54264$. The conditional probabilities are provided at two selected $n$ vectors, one representing the uniform population $n = (6000000000000000)$ and one the mixed population state $n = (2000010001002000)$, and at three values of the mutation probability parameter. The two- and three-operator results are respectively products of the computer programs GET_P2INS and GET_P3INS provided in Appendix D.
The purpose of the tests from which these data are produced is verification of the computer algorithms (and the implementing subprograms) employed to generate the conditional probability calculations required by the primary simulation task reported in Section 5.5. Thus, for example, all conditional probability distributions are uniform at $\alpha = 1$ as is required by Eq. 4.14 and Eq. 4.24, and for $\alpha \to 0^+$ all conditional probability distributions approach the one-operator counterparts as is required by Eq. 4.17 and Eq. 4.27. Also, the two- and three-operator conditional probabilities are identical for the uniform population case (Figures 5-3 and 5-5) as required by Eq. 4.18 and 4.28, and the three-operator mixed population state case allows generation of solutions not present in the current population even in the zero mutation probability limit.
Figure 5-3. P2(i | n) at n = (6000000000000000) [P2(i | n) versus i; curves labeled 1.0, .5 and .02]
Figure 5-4. P2(i | n) at n = (2000010001002000) [P2(i | n) versus i; curves labeled 1.0, .5 and .02]
Figure 5-5. P3(i | n) at n = (6000000000000000) [P3(i | n) versus i; curves labeled 1.0, .5 and .02]
Figure 5-6. P3(i | n) at n = (2000010001002000) [P3(i | n) versus i; curves labeled 1.0, .5 and .02]
5.5 Converged Limiting Stationary Distributions
The following data represent converged threeoperator stationary distribution
results for both four and fivebit problems at a variety of population sizes. The results
recorded in Figures 57 through 516 are products of the computer program
GET_3STAT.F included in Appendix D. They are obtained by repeatedly multiplying a
current state probability vector by the threeoperator state transition matrix until a termi
nation criterion representing approximate convergence is attained. The starting probabil
ity vector is the multinomial distribution corresponding to a uniformly distributed P3(i I n)
array, and the termination criterion is that the sum of the probabilities for all nonuniform
population states is less than 0.004.
All of the results reported here are for extremely small a (approaching zero) and
thus, as predicted by the model, only the states corresponding to uniform populations
(oneoperator absorbing states) have nonzero probability. Consequently, only the final
probabilities for the uniform population states are displayed in Figures 57 through 516,
with each such state indexed by the decimal integer value corresponding to the solution
represented.
Table 5-5 summarizes the Cray Y-MP computer resources expended in generating
these data. Tabulated there are the number of vector multiplications (of dimension $N'$)
required to attain the termination condition and the CPU time utilized. The CPU time is
in seconds, rounded to the nearest integer. The tabulated data are collected from the log
files generated in the computer runs which produced the stationary distribution data for
Figures 5-7 through 5-16.
The limiting distribution entropy results in Figures 5-17 and 5-18 are computed
from the converged stationary distributions. The results are recorded in bits and are
plotted as a function of population size.
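For reference, entropy in bits can be computed as in the short sketch below. The function name and the example distributions are illustrative assumptions; the text does not give the exact routine used to produce the figures.

```python
import math

def entropy_bits(q):
    """Shannon entropy -sum q log2 q in bits; zero-probability states contribute 0."""
    return -sum(p * math.log2(p) for p in q if p > 0)

# A distribution spread evenly over the 16 uniform-population states of a
# four-bit problem has the maximum possible entropy for that support.
uniform_16 = [1 / 16] * 16
print(entropy_bits(uniform_16))

# A distribution concentrated on a single state has zero entropy.
print(entropy_bits([1.0]))
```

Lower entropy therefore indicates a limiting distribution concentrated on fewer uniform-population states.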
A very significant result suggested by the limiting stationary distribution data is that
the $\alpha \to 0^+$ value of the stationary distribution is nonzero for all possible uniform states.
This behavior, which is confirmed by theoretical results developed in Section 7, precludes
extrapolation of the simulated annealing global optimality convergence result onto
the genetic algorithm. However, as suggested by the data plotted in Figures 5-17 and
5-18, it may be possible to approach the desired limiting behavior as closely as required
by adjusting the population size parameter. Those figures indicate that for sufficiently
large values of the population size parameter, the limiting distribution is dominated by
optimal solutions, and that the limiting distribution entropy decreases monotonically with
increasing population size. Results developed in Section 9 reinforce this premise.
[Figure 5-7: Limiting Stationary Distribution at M=2, L=4; q(i) versus i.]
[Figure 5-8: Limiting Stationary Distribution at M=3, L=4; q(i) versus i.]
[Figure 5-9: Limiting Stationary Distribution at M=4, L=4; q(i) versus i.]
[Figure 5-10: Limiting Stationary Distribution at M=5, L=4; q(i) versus i.]
[Figure 5-11: Limiting Stationary Distribution at M=6, L=4; q(i) versus i.]
[Figure 5-12: Limiting Stationary Distribution at M=7, L=4; q(i) versus i.]
[Figure 5-13: Limiting Stationary Distribution at M=2, L=5; q(i) versus i.]
[Figure 5-14: Limiting Stationary Distribution at M=3, L=5; q(i) versus i.]
[Figure 5-15: Limiting Stationary Distribution at M=4, L=5; q(i) versus i.]
[Figure 5-16: Limiting Stationary Distribution at M=5, L=5; q(i) versus i.]
Table 5-5
CPU Utilization Statistics

  M    L        N'    Iterations    CPU time (s)
  2    4       136         8             <1
  3    4       816        14              8
  4    4      3876        19             86
  5    4     15504        23           1219
  6    4     54264        27          19048
  7    4    170544        30         219930
  2    5       528         9             13
  3    5      5984        15            301
  4    5     52360        21          11676
  5    5    376992        26             **

** Not obtained due to unrecoverable log file error
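The $N'$ column can be cross-checked with a short sketch. It assumes that $N'$ counts the distinct populations of $M$ bitstrings drawn from the $2^L$ possible strings, i.e. $N' = \binom{M + 2^L - 1}{M}$; that definition lies outside this section, so it is an assumption here, and the function name is hypothetical.

```python
from math import comb

def num_states(M, L):
    """Number of multisets of size M over 2**L bitstrings (assumed definition of N')."""
    return comb(M + 2**L - 1, M)

for M, L in [(2, 4), (3, 4), (4, 4), (5, 4), (6, 4), (2, 5), (3, 5), (4, 5), (5, 5)]:
    print(M, L, num_states(M, L))
```

The rapid growth of $N'$ with $M$ and $L$ accounts for the steep rise in CPU time, since each iteration is a vector-matrix product of dimension $N'$.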
[Figure 5-17: Limiting Distribution Entropy vs Population Size (Four-Bit Problem); H versus M.]
[Figure 5-18: Limiting Distribution Entropy vs Population Size (Five-Bit Problem); H versus M.]
SECTION 6
THE CRAMER'S RULE FORMULATION OF THE STATIONARY DISTRIBUTION
6.1 Overview
In Sections 4.3.2 and 4.3.3, the time-homogeneous two- and three-operator simple
genetic algorithm Markov chains are shown to possess unique stationary distributions.
Those conclusions are established by invoking Theorem A3, which asserts that in each
case the stationary distribution is a left eigenvector of the state transition matrix and that
the additional constraint that it be a probability vector (Definition A2) makes the solution
unique. In this section, the existence and uniqueness arguments are refined into a Cramer's
rule formulation of the solution. This development concerns the time-homogeneous
algorithms only, with $\alpha$ constrained to $\alpha > 0$, and it appeals heavily to the foundation
provided in Appendix B.
The product of this development is an expression for the components of the stationary
distribution vector as rational functions generated from the characteristic polynomials
of matrices derived from the state transition matrix. The derived matrices are generated
by setting selected rows of $P$ to zero. The utility of the approach is that the form of $P$
suggests a mechanism for expressing the values of the characteristic polynomials. Some
key intermediate parts of the required methodology are developed in Section 9, but the
effort stops short of explicit solution. However, some very significant conclusions concerning
the asymptotic behavior of the algorithm are obtainable (Sections 7 and 8) from
the results developed here without explicitly solving the system.
6.2 The Stationary Distribution Description
As established in Section 4, implementation of the mutation operator with nonzero
mutation probability (i.e. $\alpha > 0$) implies that for both the two- and three-operator
algorithms, $\forall m, n \in S' : P(m \mid n) > 0$. Thus, by Definition B1, $P^k$ is primitive for any
integer $k \geq 1$. Hence, from Section B.3, the stationary distribution of the two- and three-operator
simple genetic algorithm exists, is unique and is a left eigenvector of the state transition
matrix corresponding to eigenvalue 1, i.e.
$$q_\alpha^T P = q_\alpha^T$$
or equivalently
$$q_\alpha^T (P - I) = \mathbf{0}^T \qquad \text{Eq. 6.1}$$
The following proposition establishes a significant fact concerning the rank of the
matrix $(P - I)$ in Eq. 6.1.
Proposition 6.1: The rank of the matrix $(P - I)$ in Eq. 6.1 is exactly $N' - 1$ where
$N' = \mathrm{card}(S')$ is the dimension of $P$.
This result follows from Theorem B4(f). Its significance is that exactly one column
of the system of equations in Eq. 6.1 can be replaced without sacrificing any of the
constraints which Eq. 6.1 imposes on $q_\alpha$. Proposition 6.2 below concerns such a modification
of the system. The modification consists of replacing any column (e.g. the column
indexed by $n \in S'$) of Eq. 6.1 by a column corresponding to the constraint
$$\sum_{m \in S'} q_\alpha(m) = q_\alpha^T \mathbf{1} = 1, \qquad \text{Eq. 6.2}$$
thus producing a system of the form
$$q_\alpha^T (P - I)_n = e_n^T \qquad \text{Eq. 6.3}$$
where $(P - I)_n$ is generated by replacing the column of $(P - I)$ indexed by $n \in S'$ with the
vector $\mathbf{1}$ whose components all have the value 1, and where $e_n^T$ is the row vector
containing 1 in column $n$ and 0's elsewhere.
Proposition 6.2: If the constraint described in Eq. 6.2 is used to replace any column (e.g.
column $n$) of the system in Eq. 6.1, the resulting system (Eq. 6.3) is full rank, or
equivalently, $|(P - I)_n| \neq 0$.
Since $P$ is a stochastic matrix (Definition A3), the system of equations in Eq. 6.1 can
be transformed into an equivalent system in which the column indexed by the arbitrary
column index $n \in S'$ is represented by the equation $0 = 0$. The required transformation
is obtainable by replacing column $n$ by the sum of all columns $m \in S'$, and thus any
$n \in S'$ is a candidate for replacement. Proposition 6.2 is then a restatement of Proposition
B2 in terms of the determinant of the matrix of the modified system. It is the essential
condition for justification of the following proposition.
Proposition 6.3: The components of the stationary distribution can be expressed in the
form
$$q_\alpha(m) = \frac{|(P - I)_n^{(m)}|}{|(P - I)_n|}$$
where $(P - I)_n^{(m)}$ is derived from $(P - I)_n$ by replacing the row of $(P - I)_n$ indexed by
$m \in S'$ with the row vector $e_n^T$.
This result is simply an application of Cramer's Rule to the solution of the system
in Eq. 6.3. It applies because $|(P - I)_n| \neq 0$ is assured by Proposition 6.2.
The equality defined in Proposition 6.3 can be evaluated without computing
$|(P - I)_n|$ directly, as suggested by the following proposition.
Proposition 6.4: The denominator determinant in Proposition 6.3 can be written as
$$|(P - I)_n| = \sum_{m \in S'} |(P - I)_n^{(m)}|.$$
This result follows from elementary determinant expansion along column $n$
of $|(P - I)_n|$ and employing the definition of $|(P - I)_n^{(m)}|$. The essential step is noting that
the cofactor of each of the (unit) elements in column $n$ is equal to the corresponding
numerator determinant $|(P - I)_n^{(m)}|$ of Proposition 6.3.
Since the numerator determinant defined in Proposition 6.3 is generated from
$(P - I)_n$ by replacement of row $m$ by the row vector $e_n^T$, its value is the cofactor of the
(unit) element in row $m$ and column $n$. As indicated in the following proposition, it is
equal to the determinant which results from the corresponding row replacement in $(P - I)$.
Proposition 6.5: The numerator determinant defined in Proposition 6.3 can be written as
$$|(P - I)_n^{(m)}| = |(P - I)^{(m,n)}|$$
where $(P - I)^{(m,n)}$ is defined as the matrix which results from $(P - I)$ by replacing the row
indexed by $m$ with the row vector $e_n^T$.
Next, note that if $m = n$, then $|(P - I)^{(m,n)}|$ can be written as
$$|(P - I)^{(m,m)}| = -|P_m^{(0)} - I|$$
where $P_m^{(0)}$ is defined as the matrix which results by replacing row $m$ of $P$ by the row
vector $\mathbf{0}^T$. If $m \neq n$, then by writing the replacement row in $|(P - I)^{(m,n)}|$ as
$$e_n^T = [e_n^T - e_m^T] - [\mathbf{0}^T - e_m^T],$$
$|(P - I)^{(m,n)}|$ can be written as the difference of two determinants derived from $(P - I)$, one
with the $m$th row replaced by $[e_n^T - e_m^T]$ and the second with the $m$th row replaced by
$[\mathbf{0}^T - e_m^T]$. The $-e_m^T$ term in each row replacement provides the necessary principal
diagonal contribution to permit expression of $|(P - I)^{(m,n)}|$ as
$$|(P - I)^{(m,n)}| = |P_m^{(n)} - I| - |P_m^{(0)} - I|$$
where $P_m^{(0)}$ is defined as before and where $P_m^{(n)}$ is defined as the matrix which results by
replacing row $m$ of $P$ by the row vector $e_n^T$.
This result can be further reduced by noting that the row replacement by which $P_m^{(n)}$
is generated from $P$ preserves the row sum constraint (i.e. $P_m^{(n)}$ is a stochastic matrix).
Thus, 1 is an eigenvalue of $P_m^{(n)}$ (Definition A3), from which it follows that $|P_m^{(n)} - I| = 0$.
Consequently, the $m = n$ and $m \neq n$ cases can be assembled as indicated by the following
proposition.
Proposition 6.6: The determinant $|(P - I)^{(m,n)}|$ defined in Proposition 6.5 can be written
as
$$|(P - I)^{(m,n)}| = -|P_m^{(0)} - I|.$$
By collecting the results of Propositions 6.3-6.6 and noting that the superscript in $P_m^{(0)}$
is now superfluous, the components of the stationary distribution can be written as
indicated in the following proposition.
Proposition 6.7: The components of the stationary distribution can be expressed in the
form
$$q_\alpha(m) = \frac{|P_m - I|}{\sum_{n \in S'} |P_n - I|}$$
where $P_m$ and $P_n$ are derived from $P$ by replacing the rows indexed by $m$ and $n$
respectively with the row vector $\mathbf{0}^T$.
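The determinant form of Proposition 6.7 can be checked numerically on a small example. The sketch below is illustrative only: the 3-state positive stochastic matrix `P` is a hypothetical stand-in for a genetic algorithm transition matrix, the helper names are assumptions, and exact rational arithmetic is used so that the check $q^T P = q^T$ holds without rounding.

```python
from fractions import Fraction

def det(a):
    """Determinant by exact Gaussian elimination (entries must be Fractions)."""
    a = [row[:] for row in a]
    n, sign, result = len(a), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            sign = -sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return sign * result

def stationary_by_determinants(P):
    """q(m) = |P_m - I| / sum_n |P_n - I|, with P_m equal to P with row m zeroed."""
    n = len(P)
    dets = []
    for m in range(n):
        Pm_minus_I = [[(Fraction(0) if r == m else P[r][c]) - (1 if r == c else 0)
                       for c in range(n)] for r in range(n)]
        dets.append(det(Pm_minus_I))
    total = sum(dets)
    return [d / total for d in dets], dets

# Hypothetical strictly positive row-stochastic matrix.
P = [[Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)],
     [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)],
     [Fraction(1, 5), Fraction(2, 5), Fraction(2, 5)]]
q, dets = stationary_by_determinants(P)
qP = [sum(q[r] * P[r][c] for r in range(3)) for c in range(3)]
print(q, qP == q)
```

For realistic state space sizes this is far more expensive than iterating the chain; its value is analytical, since it exposes the stationary distribution as a ratio of determinants.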
Thus, computing the stationary distribution components reduces to evaluating the
characteristic polynomials of the $P_n$'s at $\lambda = 1$ (i.e. $\Phi_n(\lambda) = |P_n - \lambda I| \Rightarrow |P_n - I| = \Phi_n(1)$).
Also, since 1 is an eigenvalue of $P$ it follows that $\Phi(1) = |P - I| = 0$, which suggests the
following alternative to Proposition 6.7. Its usefulness is established in Sections 9.3-9.4.
Proposition 6.8: The components of the stationary distribution can be expressed in the
alternative form
$$q_\alpha(m) = \frac{|P - I| - |P_m - I|}{\sum_{n \in S'} \left( |P - I| - |P_n - I| \right)}$$
where as before $P_m$ and $P_n$ are derived from $P$ by replacing the rows indexed by $m$ and $n$
respectively with the row vector $\mathbf{0}^T$.
6.3 Positivity of the Stationary Distribution Components
Strict positivity of the stationary distribution components can be deduced from
Theorem B4 and the form of $P_m$. Every element of $P_m$ in every row other than row $m$ is
identical to the corresponding element of $P$, while those in row $m$ are zero. This is
expressed in the nonnegative matrix notation of Appendix B as $0 \leq P_m \leq P$. Since
$P_m$ and $P$ differ in row $m$, $P_m \neq P$, and consequently by Theorem B4(e), every eigenvalue
$\lambda_i$ of $P_m$ satisfies $|\lambda_i| < 1$. It follows that for $\lambda \geq 1$, (1) $|P_m - \lambda I| = \prod_i (\lambda_i - \lambda) \neq 0$ and
(2) the algebraic sign of $|P_m - \lambda I|$ is $(-1)^{N'}$ for all $m \in S'$. Specializing these arguments
to the case $\lambda = 1$ yields the following proposition.
Proposition 6.9: For all $\alpha > 0$, the value of the determinant $|P_m - I|$ satisfies
$\forall m \in S'$: (1) $|P_m - I| \neq 0$ and
(2) the algebraic sign of $|P_m - I|$ is $(-1)^{N'}$.
An immediate consequence of Proposition 6.9 is that both numerator and denominator
of the expression for $q_\alpha(m)$ in Proposition 6.7 are nonzero and have identical algebraic
sign. Strict positivity of the stationary distribution components follows from these
observations. That is, $\forall m \in S' : q_\alpha(m) > 0$.
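A two-state worked example may help fix the sign convention. It is an illustration only; the chain and the symbols $u$ and $v$ are hypothetical and do not appear elsewhere in this development. With $N' = 2$ and $0 < u, v < 1$:

```latex
P = \begin{pmatrix} 1-u & u \\ v & 1-v \end{pmatrix}, \qquad
|P_1 - I| = \begin{vmatrix} -1 & 0 \\ v & -v \end{vmatrix} = v, \qquad
|P_2 - I| = \begin{vmatrix} -u & u \\ 0 & -1 \end{vmatrix} = u,
\qquad\Longrightarrow\qquad
q(1) = \frac{v}{u+v}, \quad q(2) = \frac{u}{u+v}.
```

Both determinants are nonzero with sign $(-1)^{2} = +1$, and the resulting ratios agree with the familiar stationary distribution of a two-state chain.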
6.4 The Indeterminate Form at $\alpha = 0$
All of the results established in this section assume that the mutation probability
parameter is strictly positive ($\alpha > 0$), and thus are not applicable at $\alpha = 0$. The reason is
apparent when Eq. 4.7 and the two-operator result in Eq. 4.17 (or the three-operator
counterparts of Eq. 4.17 given by Eq. 4.23 and 4.27) are applied to $|P_m - I|$. It follows that the
row of the $\alpha \to 0^+$ limit of $|P_m - I|$ corresponding to the one-operator absorbing state
$n_A \in S_A'$, $n_A \neq m$ is zero. That is, the only nonzero entry in row $n_A$ of the $\alpha \to 0^+$ limit of
$P$ is the principal diagonal element
$$\lim_{\alpha \to 0^+} P_2(n_A \mid n_A) = \lim_{\alpha \to 0^+} P_3(n_A \mid n_A) = P_1(n_A \mid n_A) = 1,$$
which is cancelled by the corresponding principal diagonal element in $I$. Thus,
$$\forall m \in S' : \lim_{\alpha \to 0^+} |P_m - I| = 0,$$
and consequently Propositions 6.7 and 6.8 yield indeterminate forms. However, as
demonstrated in the following section, it is possible to verify that a limiting stationary
distribution vector exists for the time-homogeneous two- and three-operator algorithms.
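The indeterminate form can be seen on a toy example. The sketch below is an assumption-laden illustration, not the genetic algorithm model: `toy_chain` is a hypothetical 3-state chain in which states 0 and 2 become absorbing as `alpha` tends to zero. Every determinant $|P_m - I|$ shrinks toward zero, yet the ratios of Proposition 6.7 settle to definite values.

```python
from fractions import Fraction

def det(a):
    """Determinant by exact Gaussian elimination (entries must be Fractions)."""
    a = [row[:] for row in a]
    n, sign, result = len(a), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            sign = -sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return sign * result

def toy_chain(alpha):
    """Hypothetical chain; alpha must be a Fraction. States 0 and 2 absorb at alpha = 0."""
    a, b = alpha, alpha * alpha
    return [[1 - a - b, a, b],
            [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)],
            [b, a, 1 - a - b]]

def row_zeroed_dets(P):
    """List of |P_m - I| for every m, with P_m equal to P with row m zeroed."""
    n = len(P)
    return [det([[(Fraction(0) if r == m else P[r][c]) - (1 if r == c else 0)
                  for c in range(n)] for r in range(n)]) for m in range(n)]

for alpha in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 10**6)):
    d = row_zeroed_dets(toy_chain(alpha))
    q = [x / sum(d) for x in d]
    print(float(alpha), [float(x) for x in d], [float(x) for x in q])
```

In this toy chain the determinants for the two nearly absorbing states vanish to first order in `alpha` while the one for the transient state vanishes to second order, which is why the limiting ratio places all mass on the nearly absorbing states.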
SECTION 7
THE ZERO MUTATION PROBABILITY STATIONARY DISTRIBUTION LIMIT
7.1 Overview
In Section 4.3.1, it is established that the time-homogeneous one-operator genetic
algorithm Markov chain possesses a stationary distribution but that it is not unique. In
Sections 4.3.2 and 4.3.3, it is established that the time-homogeneous two- and three-operator
counterparts possess unique stationary distributions provided $\alpha > 0$, and Section
6 formulates the existence and uniqueness argument into a rational function expression
for the unique solution. Since the two-operator state transition matrix approaches its
one-operator counterpart as $\alpha \to 0^+$ (Eq. 4.17) and since the three-operator algorithm exhibits
the corresponding behavior with respect to the $P_2'(i \mid n)$'s (Eq. 4.23), a question which
naturally arises from these observations is whether an $\alpha \to 0^+$ limiting distribution exists
for the two- and three-operator algorithms. (If such a limit exists, then it is necessarily
unique.) This section answers that question affirmatively and also confirms the observation
made in Section 5.5 that the limiting distribution is nonzero for all states corresponding
to uniform populations (absorbing states).
The approach taken here is to transform the expressions for $q_\alpha(m)$ in Propositions
6.7 and 6.8 into equivalent expressions which yield determinate forms at $\alpha = 0$. The result
requires transforming $P$ and $P_m$ into related matrices but with the states corresponding to
uniform populations (one-operator absorbing states) coalesced into adjacent nonuniform
population states. The development is tedious and involves some additional notation.
7.2 Functional Form of the Stationary Distribution
Before proceeding with the limiting case development which is the primary purpose
of this section, it is convenient to establish some intermediate results concerning the
behavior of $q_\alpha$ as a function of $\alpha$. These results follow from the results developed in
Section 6 and some simple observations about the form of the elements of $P$.
From Eq. 4.14-4.16 and Eq. 4.22-26, all elements of the state transition matrix are
rational functions of $\alpha$ with denominator polynomial $(1 + \alpha)^{ML}$. Thus, for $\alpha > 0$
$$(1 + \alpha)^{MLN'} |P_m - I| = |(1 + \alpha)^{ML} P_m - (1 + \alpha)^{ML} I| = |Q_m - (1 + \alpha)^{ML} I|$$
where every element of $Q_m$ (and hence the value of $|Q_m - (1 + \alpha)^{ML} I|$) is a polynomial in
$\alpha$. Further, since row $m$ in $Q_m$ is zero, the polynomial value of the determinant includes
the factor $(1 + \alpha)^{ML}$. Consequently
$$(1 + \alpha)^{ML(N' - 1)} |P_m - I| = \Theta_m(\alpha) \qquad \text{Eq. 7.1}$$
for $\Theta_m(\alpha)$ some polynomial function of $\alpha$. Proposition 7.1 below follows.
Proposition 7.1: For all $\alpha > 0$, the value of the determinant $|P_m - I|$ is a rational function
of $\alpha$ with nonzero denominator polynomial $(1 + \alpha)^{ML(N' - 1)}$.
By applying Eq. 7.1 to Proposition 6.7, the components of $q_\alpha$ can be written as
$$q_\alpha(m) = \frac{\Theta_m(\alpha)}{\sum_{n \in S'} \Theta_n(\alpha)} = \frac{\Theta_m(\alpha)}{\Theta(\alpha)}. \qquad \text{Eq. 7.2}$$
Hence, the $q_\alpha(m)$ are rational functions of $\alpha$, and since a rational function is continuous
everywhere its denominator polynomial is nonzero, application of Proposition 6.9 and
Eq. 7.1 (which together establish that $\Theta(\alpha) = \sum_n \Theta_n(\alpha) \neq 0$) to Eq. 7.2 yields the following.
Proposition 7.2: For all $\alpha > 0$, the components of $q_\alpha$ are continuous rational functions of
the independent variable $\alpha$.
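Further, differentiation of Eq. 7.2 with respect to $\alpha$ yields a rational function of $\alpha$
$$\frac{dq_\alpha(m)}{d\alpha} = \frac{1}{\Theta(\alpha)^2} \left[ \Theta(\alpha) \frac{d\Theta_m(\alpha)}{d\alpha} - \Theta_m(\alpha) \frac{d\Theta(\alpha)}{d\alpha} \right] \qquad \text{Eq. 7.3}$$
with nonzero denominator polynomial $\Theta(\alpha)^2$. The following proposition is a
consequence.
Proposition 7.3: For all $\alpha > 0$, the components of the first derivative of $q_\alpha$ with respect to
$\alpha$ are continuous rational functions of $\alpha$.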
7.3 The Absorbing State Rows of $|P - I|$ and $|P_m - I|$
The rows corresponding to one-operator absorbing states in the determinant
$|P - I|$ have a particularly simple form. The nondiagonal elements of row $n_A \in S_A'$, which
represents a uniform population of solutions $i_A \in S$, are given by Eq. 4.19 and 4.29
respectively for the two- and three-operator cases. The principal diagonal element is
obtained by evaluating Eq. 4.19 or 4.29 at $m = n_A$ and subtracting 1. Thus,
$$P(n_A \mid n_A) - 1 = \frac{1}{(1 + \alpha)^{ML}} - 1
= \frac{1 - 1 - ML\alpha - ML(ML - 1)\alpha^2/2 - \cdots - \alpha^{ML}}{(1 + \alpha)^{ML}}
= \frac{-ML\alpha + O(\alpha^2)}{(1 + \alpha)^{ML}}$$
and if the general element in $|P - I|$ is denoted by $T(m \mid n)$, then the elements of row $n_A$
can be written as
$$\forall n_A \in S_A' : T(n \mid n_A) = \begin{cases}
\dfrac{-ML\alpha + O(\alpha^2)}{(1 + \alpha)^{ML}} & n = n_A \\[2ex]
\dfrac{O(\alpha)}{(1 + \alpha)^{ML}} & n \in S' - \{n_A\}
\end{cases} \qquad \text{Eq. 7.4}$$
Additional insight into the form of the absorbing state rows can be obtained with
the aid of the following notation. Let $m_A, n_A \in S_A'$ be distinct but otherwise arbitrary
absorbing states of the one-operator Markov chain, let $i_A \in S$ be the bitstring represented
in $n_A$ and let $S(i_A) \subset S$ be the set of bitstrings accessible from $i_A$ via exactly one bit
mutation event (i.e. $S(i_A) = \{i_1 : i_1 \in S, H(i_1, i_A) = 1\}$). It follows from this definition that
$\mathrm{card}(S(i_A)) = L$. Then, for $M > 1$ let $S(n_A)'$ be defined as
$$S(n_A)' = \{n : n \in S', n(i_A) = M - 1, n(i_1) = 1, i_1 \in S(i_A)\} \subset S',$$
the set of nonabsorbing states adjacent to the absorbing state $n_A$. The restriction on $M$ is
required to ensure that no absorbing state $m_A$ is contained in the adjacency set of any
absorbing state $n_A$. $S(n_A)'$ includes exactly one distinct element for each $i_1 \in S(i_A)$, and
consequently $\mathrm{card}(S(n_A)') = \mathrm{card}(S(i_A)) = L$. Also, from the form of $S(n_A)'$, it follows that
for $M \geq 3$, $S(m_A)'$ and $S(n_A)'$ are disjoint if $m_A$ and $n_A$ are distinct one-operator absorbing
states. Thus, if $S_A''$ is defined as
$$S_A'' = \bigcup_{n_A \in S_A'} S(n_A)'$$
and $M \geq 3$, then $\mathrm{card}(S_A'') = \mathrm{card}(S_A') \times L = NL$. This restriction on $M$ is assumed in all of
the following.
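The adjacency sets can be enumerated directly for small cases. In the sketch below, populations are represented as sorted tuples of integer-coded bitstrings; this representation and the function names are illustrative assumptions rather than the notation of the model.

```python
def one_bit_neighbors(i_a, L):
    """S(i_A): the L bitstrings at Hamming distance 1 from i_A."""
    return [i_a ^ (1 << b) for b in range(L)]

def adjacency_set(i_a, M, L):
    """S(n_A)': populations with M-1 copies of i_A and one copy of a 1-bit neighbor."""
    return {tuple(sorted([i_a] * (M - 1) + [i1])) for i1 in one_bit_neighbors(i_a, L)}

def adjacency_sets_disjoint(M, L):
    """True if the sets S(n_A)' for distinct absorbing states never overlap."""
    seen = set()
    for i_a in range(2 ** L):
        s = adjacency_set(i_a, M, L)
        if seen & s:
            return False
        seen |= s
    return True

M, L = 3, 4
union = set().union(*(adjacency_set(i, M, L) for i in range(2 ** L)))
print(len(adjacency_set(0, M, L)), len(union), adjacency_sets_disjoint(M, L))
print(adjacency_sets_disjoint(2, L))
```

At $M = 2$ a population holding one copy each of two neighboring bitstrings is adjacent to both corresponding uniform populations, which is why disjointness needs at least three population members.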
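With the aid of the new notation, the element in column $n \in S(n_A)'$ of row $n_A$ in
$|P - I|$ can be written as
$$\forall n \in S(n_A)' : T(n \mid n_A) = P(n \mid n_A) = \binom{M}{M - 1} \frac{\alpha}{(1 + \alpha)^{ML}} = \frac{M\alpha}{(1 + \alpha)^{ML}}.$$
Thus, Eq. 7.4 can be revised as follows
$$\forall n_A \in S_A' : T(n \mid n_A) = \begin{cases}
\dfrac{-ML\alpha + O(\alpha^2)}{(1 + \alpha)^{ML}} & n = n_A \\[2ex]
\dfrac{M\alpha}{(1 + \alpha)^{ML}} & n \in S(n_A)' \\[2ex]
\dfrac{O(\alpha^s)}{(1 + \alpha)^{ML}} & n \in S' - S(n_A)' - \{n_A\}
\end{cases} \qquad \text{Eq. 7.5}$$
where the exponent $s$ of $\alpha$ in the order expression for the general term is an integer
satisfying $s \geq 2$. The elements in columns $n_A$ and $n \in S(n_A)'$ are first order in $\alpha$ while the
elements in all other columns are at least second order.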
Eq. 7.5 applies to every absorbing state row of $|P_m - I|$ as well if $m \notin S_A'$. If
$|P_{m_A} - I|$ is being considered where $m_A \in S_A'$, then row $m_A$ contains $-1$ at its principal
diagonal and zeros elsewhere. In that case Eq. 7.5 only applies to the absorbing state rows
$n_A \in S_A' - \{m_A\}$. Exactly $N - 1$ such rows exist in $|P_{m_A} - I|$.
By applying Eq. 7.5 and these observations to Proposition 7.1, it follows that the
lowest order term with nonzero coefficient which can conceivably exist in the numerator
polynomial of $|P_{m_A} - I|$ is the order $\alpha^{N-1}$ term. Similar reasoning reveals that the
corresponding lowest order term with nonzero coefficient for $|P_m - I|$ with $m \notin S_A'$ is the order
$\alpha^N$ term. If the coefficient of the order $\alpha^{N-1}$ term in the numerator polynomial of $|P_{m_A} - I|$
is indeed nonzero, and if the corresponding coefficients for all such $m_A$ have the same
algebraic sign, then the required limiting value of $q_\alpha$ can be expressed in terms of these
nonzero coefficients via substitution into Proposition 6.7. These conditions are in fact
satisfied as demonstrated below.
7.4 Reformulation of Propositions 6.7 and 6.8
The next step in this development is the definition of some auxiliary matrices
related to $P$ and $P_{m_A}$ and the reformulation of Propositions 6.7 and 6.8 in terms of them.
The new matrices, designated $P(m_A)'$ and $P_{m_A}'$ respectively, are derived by coalescing
each of the $N - 1$ absorbing state columns $n_A \in S_A' - \{m_A\}$ of $|P - I|$ and $|P_{m_A} - I|$ with its
neighboring nonabsorbing state columns, $n \in S(n_A)'$. Specifically, let $Q_{m_A}$ be derived
from $P_{m_A} - I$ by adding $1/L$ times the column $n_A \in S_A' - \{m_A\}$ to each of the $L$ adjacent
nonabsorbing columns $n \in S(n_A)'$ and repeating the process for each remaining
$n_A \in S_A' - \{m_A\}$. This operation is applied once each for the exactly $N - 1$ absorbing state
columns $n_A \in S_A' - \{m_A\}$ and it preserves the value of the determinant, $|Q_{m_A}| = |P_{m_A} - I|$.
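That the coalescing operation leaves the determinant unchanged is a standard property of elementary column operations, and can be confirmed on any small matrix. The sketch below uses a hypothetical 4x4 matrix and hypothetical helper names; it is not derived from the transition matrix of the model.

```python
from fractions import Fraction

def det(a):
    """Determinant by exact Gaussian elimination (entries must be Fractions)."""
    a = [row[:] for row in a]
    n, sign, result = len(a), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            sign = -sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return sign * result

def coalesce(A, source_col, target_cols):
    """Add 1/len(target_cols) times column source_col to each target column."""
    L = len(target_cols)
    B = [row[:] for row in A]
    for row in B:
        for c in target_cols:
            row[c] += row[source_col] / L  # source column itself is left untouched
    return B

# Hypothetical matrix; column 0 plays the role of an absorbing state column.
A = [[Fraction(x) for x in row] for row in
     [[-3, 1, 1, 1], [2, -4, 1, 1], [0, 1, -2, 1], [1, 1, 1, -3]]]
B = coalesce(A, source_col=0, target_cols=[1, 2])
print(det(A), det(B))
```

Because each step adds a multiple of one column to a different column, the determinant is preserved no matter how many absorbing state columns are coalesced in turn.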
If now $Q_{m_A}(m \mid n)$ denotes the general element of $Q_{m_A}$, then by applying the recipe
used in its construction and Eq. 7.5, the elements in the absorbing state rows
$n_A \in S_A' - \{m_A\}$ of $Q_{m_A}$ can be written as
$$\forall m_A \in S_A', \forall n_A \in S_A' - \{m_A\} : Q_{m_A}(m \mid n_A) = \begin{cases}
\dfrac{-ML\alpha + O(\alpha^2)}{(1 + \alpha)^{ML}} & m = n_A \\[2ex]
\dfrac{O(\alpha^2)}{(1 + \alpha)^{ML}} & m \in S(n_A)' \\[2ex]
\dfrac{O(\alpha^s)}{(1 + \alpha)^{ML}} & m \in S' - S(n_A)' - \{n_A\}
\end{cases}$$
where as before $s$ is an integer satisfying $s \geq 2$. Thus, each of the $N - 1$ absorbing state
rows $n_A \in S_A' - \{m_A\}$ of $|Q_{m_A}|$ can be written as a sum of two rows, one row containing
$-ML\alpha/(1 + \alpha)^{ML}$ at its principal diagonal location and zeros elsewhere and the second row
being a multiple of $\alpha^2/(1 + \alpha)^{ML}$. It follows from elementary determinant row expansion
operations that $|P_{m_A} - I| = |Q_{m_A}|$ can be written as
$$|P_{m_A} - I| = |Q_{m_A}| = \frac{(-ML\alpha)^{N-1} |Q_{m_A}'|}{(1 + \alpha)^{ML(N-1)}} + \frac{O(\alpha^s)}{(1 + \alpha)^{MLN'}} \qquad \text{Eq. 7.6}$$
where $|Q_{m_A}'|$ is the order $N' - N + 1$ principal minor of $|Q_{m_A}|$ generated by deleting the
$N - 1$ row/column pairs which intersect on the $n_A \in S_A' - \{m_A\}$ principal diagonals and
where the exponent $s$ of $\alpha$ in the Eq. 7.6 order expression is an integer satisfying $s \geq N$.
The elements in all rows of $|Q_{m_A}'|$ except row $m_A$ are composed of contributions
from the elements in nonabsorbing state rows of $P$ and the $-1$ principal diagonal term
contributed by $I$ in $|P_{m_A} - I|$. Row $m_A$ of $|Q_{m_A}'|$ contains $-1$ at its principal diagonal
location and zeros elsewhere. Thus, if $|Q_{m_A}'|$ is written as $|Q_{m_A}'| = |P_{m_A}' - I|'$, then from the
recipe employed in its construction, it follows that the square matrix $P_{m_A}'$ thus defined has
dimension $N' - N + 1$ and that its elements are given by
$$\forall m, n \in S' - S_A' + \{m_A\} : P_{m_A}(m \mid n)' = \begin{cases}
0 & n = m_A \\[1ex]
P(m \mid n) & n \neq m_A,\ m \in S' - S_A' + \{m_A\} + S(m_A)' - S_A'' \\[1ex]
P(m \mid n) + \dfrac{1}{L} P(n_A \mid n) & n \neq m_A,\ m \in S(n_A)',\ n_A \in S_A' - \{m_A\}
\end{cases} \qquad \text{Eq. 7.7}$$
Careful examination of Eq. 7.7 reveals that the transformation by which $P_{m_A}'$ is
generated from $P_{m_A}$ preserves all row sums. Thus, $P_{m_A}'$ is very similar in form to $P_{m_A}$. It is
derived from a (fictitious) row stochastic matrix by setting a specified row (row $m_A$) to
zero.
If the preceding steps are repeated for $|P_m - I|$ where $m \notin S_A'$, except that all $N$
absorbing state columns $n_A \in S_A'$ are coalesced rather than just the $N - 1$ columns
$n_A \in S_A' - \{m_A\}$, a result very similar in form to Eq. 7.6 obtains. That is,
$$|P_m - I| = |Q_m| = \frac{(-ML\alpha)^N |Q_m'|}{(1 + \alpha)^{MLN}} + \frac{O(\alpha^s)}{(1 + \alpha)^{MLN'}} \qquad \text{Eq. 7.8}$$
where $|Q_m'|$ is the order $N' - N$ principal minor of $|Q_m|$ generated by deleting the $N$
absorbing state row/column pairs and $s$ is an integer satisfying $s \geq N + 1$. The nonabsorbing
state row $m$ contains $-1$ at its principal diagonal location and zeros elsewhere.
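Substitution of Eq. 7.6 and 7.8 into Proposition 6.7 yields a form more amenable to
examination of the $\alpha \to 0^+$ limiting stationary distribution. The two cases $m \in S_A'$ and
$m \in S' - S_A'$ must be distinguished. Then, after some straightforward algebra,
$$q_\alpha(m) = \begin{cases}
\dfrac{|Q_{m_A}'| + O(\alpha)/(1 + \alpha)^{N'-N+1}}{\sum_{n_A \in S_A'} |Q_{n_A}'| + O(\alpha)/(1 + \alpha)^{N'-N+1}} & m = m_A \in S_A' \\[3ex]
\dfrac{O(\alpha)/(1 + \alpha)^{N'-N+1}}{\sum_{n_A \in S_A'} |Q_{n_A}'| + O(\alpha)/(1 + \alpha)^{N'-N+1}} & m \in S' - S_A'
\end{cases}$$
An equivalent result expressed in terms of the auxiliary matrices $P_{m_A}'$ is
$$q_\alpha(m) = \begin{cases}
\dfrac{|P_{m_A}' - I|' + O(\alpha)/(1 + \alpha)^{N'-N+1}}{\sum_{n_A \in S_A'} |P_{n_A}' - I|' + O(\alpha)/(1 + \alpha)^{N'-N+1}} & m = m_A \in S_A' \\[3ex]
\dfrac{O(\alpha)/(1 + \alpha)^{N'-N+1}}{\sum_{n_A \in S_A'} |P_{n_A}' - I|' + O(\alpha)/(1 + \alpha)^{N'-N+1}} & m \in S' - S_A'
\end{cases} \qquad \text{Eq. 7.9}$$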
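By retracing the preceding steps by which $P_{m_A}$ was transformed into $P_{m_A}'$, companion
results to Eq. 7.7 and Eq. 7.9 can be developed for $P(m_A)'$. The companion to Eq. 7.7
differs only in the elements of row $n = m_A$. Thus, if $P(m \mid n)'$ denotes the general element
in $P(m_A)'$, then
$$\forall m, n \in S' - S_A' + \{m_A\} : P(m \mid n)' = \begin{cases}
P(m \mid n) & m \in S' - S_A' + \{m_A\} + S(m_A)' - S_A'' \\[1ex]
P(m \mid n) + \dfrac{1}{L} P(n_A \mid n) & m \in S(n_A)',\ n_A \in S_A' - \{m_A\}
\end{cases} \qquad \text{Eq. 7.10}$$
Further, examination of Eq. 7.10 reveals that the row sum constraint on $P$ is preserved in
the transformation by which $P(m_A)'$ is generated (i.e. $P(m_A)'$ is a stochastic matrix). Thus,
$|P(m_A)' - I|' = 0$. A consequence is the Proposition 6.8 counterpart of Eq. 7.9,
$$q_\alpha(m) = \begin{cases}
\dfrac{|P(m_A)' - I|' - |P_{m_A}' - I|' + O(\alpha)/(1 + \alpha)^{N'-N+1}}{\sum_{n_A \in S_A'} \left\{ |P(n_A)' - I|' - |P_{n_A}' - I|' \right\} + O(\alpha)/(1 + \alpha)^{N'-N+1}} & m = m_A \in S_A' \\[3ex]
\dfrac{O(\alpha)/(1 + \alpha)^{N'-N+1}}{\sum_{n_A \in S_A'} \left\{ |P(n_A)' - I|' - |P_{n_A}' - I|' \right\} + O(\alpha)/(1 + \alpha)^{N'-N+1}} & m \in S' - S_A'
\end{cases} \qquad \text{Eq. 7.11}$$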
7.5 The Stationary Distribution Limit
The zero mutation probability limits of Eq. 7.9 and Eq. 7.11 exist if the determinant
sums in the denominators are nonzero. In fact they are nonzero, as demonstrated in the
following. This argument is very similar in form to the development in Section 6.3
concerning positivity of the stationary distribution. The essential step is demonstration of the
existence of a primitive stochastic matrix $Q'$ which satisfies both
$$0 \leq \lim_{\alpha \to 0^+} P_{m_A}' \leq Q' \quad \text{and} \quad Q' \neq \lim_{\alpha \to 0^+} P_{m_A}'.$$
If the two-operator algorithm is under consideration, then the elements of the
$\alpha \to 0^+$ limit of $P_{m_A}'$ are obtained by substituting the one-operator results in Eq. 4.25 into
Eq. 7.7. If the three-operator case is under consideration, then Eq. 4.22 and Eq. 4.24,25
are employed. In the following, the two-operator notation is employed.
Let $Q'$ be generated from the $\alpha \to 0^+$ limit of $P_{m_A}'$ by replacing row $m_A$ with the row
whose elements are given by
$$\forall m \in S' - S_A' + \{m_A\} : Q'(m \mid m_A) = \frac{1}{N' - N + 1} > 0. \qquad \text{Eq. 7.12}$$
Thus, the row sum of row $m_A$ in $Q'$ is 1. Since all remaining rows of $Q'$ are identical to
those of the $\alpha \to 0^+$ limit of $P_{m_A}'$, and consequently have row sum 1 by Eq. 7.7, $Q'$ is a
stochastic matrix. Additionally, it satisfies both
$$0 \leq \lim_{\alpha \to 0^+} P_{m_A}' \leq Q' \quad \text{and} \quad Q' \neq \lim_{\alpha \to 0^+} P_{m_A}'.$$
$Q'$ can be regarded as the state transition matrix of a fictitious Markov chain
defined on the state space $S' - S_A' + \{m_A\}$. Since $Q'(m_A \mid m_A) > 0$, the fictitious Markov
chain is both aperiodic (Definition A9, Theorem A2) and primitive (Definition B1, Theorem
B1) provided that it is irreducible (Definitions A7 and A8, Theorem A1). Thus,
primitivity is established by demonstrating that every state $m \in S' - S_A' + \{m_A\}$ is
accessible in some finite number of transitions from every state $n \in S' - S_A' + \{m_A\}$.
Since all states in $S' - S_A' + \{m_A\}$ are accessible in one transition from $m_A$ (Eq. 7.12), it is
sufficient to demonstrate that $m_A$ is accessible in some finite number of transitions from
every state $n \in S' - S_A'$.
Let $i_A \in S$ be the bitstring represented in $m_A$, let $n \in S' - S_A'$ and let $i_1 \in S$ be
selected such that $n(i_1) > 0$ and $H(i_1, i_A) \leq H(i, i_A)$ for all $i$ represented in $n$. Then, two
cases must be examined. In case (1), $i_1 = i_A$ while $i_1 \neq i_A$ for case (2).
If $i_1 = i_A$, it follows from Eq. 7.7 and the construction of $Q'$ that
$$Q'(m_A \mid n) = \lim_{\alpha \to 0^+} P_{m_A}(m_A \mid n)' = \lim_{\alpha \to 0^+} P_2(m_A \mid n)
= \left[ \lim_{\alpha \to 0^+} P_2(i_A \mid n) \right]^M = P_1(i_A \mid n)^M = P_1(i_1 \mid n)^M > 0$$
and consequently $m_A$ is accessible from $n$ in 1 transition. Otherwise
$$\exists\, i_2 \in S(i_1) \ni H(i_2, i_A) = H(i_1, i_A) - 1$$
and further if $n_1 \in S_A'$ is the one-operator absorbing state defined by the condition
$n_1(i_1) = M$ while $n_{12} \in S(n_1)'$ is the adjacent nonabsorbing state defined by
$n_{12}(i_1) = M - 1$, $n_{12}(i_2) = 1$, then from Eq. 7.7 and the construction of $Q'$
$$Q'(n_{12} \mid n) = \lim_{\alpha \to 0^+} P_2(n_{12} \mid n) + \frac{1}{L} \lim_{\alpha \to 0^+} P_2(n_1 \mid n)
= P_1(n_{12} \mid n) + \frac{1}{L} P_1(n_1 \mid n)
\geq \frac{1}{L} P_1(n_1 \mid n)
= \frac{1}{L} \left[ P_1(i_1 \mid n) \right]^M > 0.$$
Thus, $n_{12}$ is accessible from $n$ in one transition. If $i_2 = i_A$, then by the case (1) argument $m_A$
is accessible in one additional transition. Otherwise, the case (2) argument is repeated for
some
$$i_3 \in S(i_2) \ni H(i_3, i_A) = H(i_2, i_A) - 1 = H(i_1, i_A) - 2.$$
This procedure necessarily terminates with $H(i_1, i_A) + 1$ applications and the corresponding
state space trajectory is executed with nonzero probability.
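The chain of bitstrings $i_1, i_2, i_3, \ldots$ used in this argument can be generated explicitly. The sketch below is illustrative: bitstrings are coded as integers, the function names are hypothetical, and the choice to always flip the lowest differing bit is one arbitrary way of selecting the next string. It constructs only the bitstring sequence, not the associated population states.

```python
def hamming(a, b):
    """Hamming distance H(a, b) between two integer-coded bitstrings."""
    return bin(a ^ b).count("1")

def descent_path(i1, i_a):
    """Sequence i1, i2, ... ending at i_a, each step one bit closer to i_a."""
    path, cur = [i1], i1
    while cur != i_a:
        diff = cur ^ i_a
        cur ^= diff & -diff  # flip the lowest-order bit in which cur and i_a differ
        path.append(cur)
    return path

path = descent_path(0b1011, 0b0000)
print([format(x, "04b") for x in path])
print([hamming(x, 0b0000) for x in path])
```

The path contains exactly $H(i_1, i_A)$ single-bit steps, each of which lowers the Hamming distance to $i_A$ by one.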
From the foregoing argument, it follows that state $m_A$ is accessible in some finite
number of transitions from every state $n \in S' - S_A'$, and thus that $Q'$ is primitive. Then,
since both
$$0 \leq \lim_{\alpha \to 0^+} P_{m_A}' \leq Q' \quad \text{and} \quad Q' \neq \lim_{\alpha \to 0^+} P_{m_A}',$$
it follows from Theorem B4(e) that every eigenvalue of the $\alpha \to 0^+$ limit of $P_{m_A}'$ satisfies
$|\lambda_i| < 1$. Proposition 7.4 below, which is the $\alpha \to 0^+$ counterpart of Proposition 6.9, is a
consequence.
Proposition 7.4: The value of the determinant $\lim_{\alpha \to 0^+} |P_{m_A}' - I|'$ satisfies
$\forall m_A \in S_A'$: (1) $\lim_{\alpha \to 0^+} |P_{m_A}' - I|' \neq 0$ and
(2) the algebraic sign of $\lim_{\alpha \to 0^+} |P_{m_A}' - I|'$ is $(-1)^{N'-N+1}$.
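The conditions asserted in Proposition 7.4 ensure that substitution into Eq. 7.9 and
7.11 yields a determinate form in the $\alpha \to 0^+$ limit. Propositions 7.5 and 7.6 below represent
the limiting forms, and consequently are respectively the limiting counterparts of
Propositions 6.7 and 6.8.
Proposition 7.5: The components of $\lim_{\alpha \to 0^+} q_\alpha$ exist and can be expressed in the form
$$\lim_{\alpha \to 0^+} q_\alpha(m) = q_0(m) = \begin{cases}
\dfrac{\lim_{\alpha \to 0^+} |P_{m_A}' - I|'}{\sum_{n_A \in S_A'} \lim_{\alpha \to 0^+} |P_{n_A}' - I|'} & m = m_A \in S_A' \\[3ex]
0 & m \in S' - S_A'
\end{cases}$$
Proposition 7.6: The components of $\lim_{\alpha \to 0^+} q_\alpha$ exist and can be expressed alternatively as
$$\lim_{\alpha \to 0^+} q_\alpha(m) = q_0(m) = \begin{cases}
\dfrac{\lim_{\alpha \to 0^+} \left\{ |P(m_A)' - I|' - |P_{m_A}' - I|' \right\}}{\sum_{n_A \in S_A'} \lim_{\alpha \to 0^+} \left\{ |P(n_A)' - I|' - |P_{n_A}' - I|' \right\}} & m = m_A \in S_A' \\[3ex]
0 & m \in S' - S_A'
\end{cases}$$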
An immediate consequence of Propositions 7.4 and 7.5 is strict positivity of the
zero mutation probability limiting stationary distribution components for all absorbing
state rows. That is, $\forall m_A \in S_A' : q_0(m_A) > 0$. The argument is analogous to that at the
conclusion of Section 6.3 concerning strict positivity of all stationary distribution
components when $\alpha > 0$. This result is anticipated by the simulation results in Section 5.5. A
consequence is that the required limiting behavior for direct application of the simulated
annealing convergence theory to the genetic algorithm model does not follow. However,
the results displayed in Section 5.5 and developments produced in Section 9.3 suggest
that the limiting distribution can be made arbitrarily close to the desired limiting behavior.
Since the $\alpha \to 0^+$ limit of the stationary distribution exists, the definition of $q_\alpha$ can
be extended to include the point $\alpha = 0$. That is
$$q_\alpha \big|_{\alpha = 0} = q_0 = \lim_{\alpha \to 0^+} q_\alpha$$
where the values of the required limits are provided by Proposition 7.5. Proposition 7.7
below follows from this extended definition of $q_\alpha$ and Proposition 7.2.
Proposition 7.7: For all $\alpha \geq 0$, the components of $q_\alpha$ are continuous rational functions of
the independent variable $\alpha$.
Proposition 7.3, which concerns the first derivative of $q_\alpha$, can also be extended to
include the limiting case. The extension requires easily obtainable counterparts of Eq.
7.1-3 developed for $|P_{m_A}' - I|'$ and Eq. 7.9. The Eq. 7.1 counterpart is
$$(1 + \alpha)^{ML(N' - N)} |P_{m_A}' - I|' = \Theta_{m_A}(\alpha)', \qquad \text{Eq. 7.13}$$
and that for Eq. 7.2 is
$$q_\alpha(m) = \begin{cases}
\dfrac{\Theta_{m_A}(\alpha)' + O(\alpha)}{\Theta(\alpha)' + O(\alpha)} & m = m_A \in S_A' \\[3ex]
\dfrac{O(\alpha)}{\Theta(\alpha)' + O(\alpha)} & m \in S' - S_A'
\end{cases} \qquad \text{Eq. 7.14}$$
where $\Theta(\alpha)'$ is the polynomial counterpart (summed over $n_A \in S_A'$) of $\Theta(\alpha)$ in Eq. 7.2.
Differentiating Eq. 7.14 with respect to $\alpha$ yields a rational function with denominator
polynomial $[\Theta(\alpha)' + O(\alpha)]^2$, whose $\alpha \to 0^+$ limit is nonzero by Proposition 7.4, Eq. 7.13
and the definition of $\Theta(\alpha)'$. Proposition 7.8 below follows from Proposition 7.3 and these
observations.
Proposition 7.8: The components of the first derivative of $q_\alpha$ with respect to $\alpha$ possess
limits as $\alpha \to 0^+$.
Thus, a zero mutation probability limit exists for the time-homogeneous two- and
three-operator algorithm variants. The limit is represented by Propositions 7.5 and 7.6.
Further, Propositions 7.7 and 7.8 establish some useful ancillary results concerning the
stationary distribution behavior at the point $\alpha = 0$. These latter results are employed in the
following section in establishing strong ergodicity of the inhomogeneous genetic algorithm
Markov chain. Propositions 7.5 and 7.6 are used in Section 9 to develop a methodology
for representing the stationary distribution limit.
SECTION 8
A MONOTONIC MUTATION PROBABILITY ERGODICITY BOUND
8.1 Overview
The annealing schedule bounds for the simulated annealing algorithm, which are
reviewed in Section 2.4.2, are derived by requiring that the nonstationary Markov chain
which represents the algorithm be strongly ergodic (Definition A13) and then deducing a
monotonic lower bound on the algorithm control parameter. The methodology consists of
demonstrating that the time-homogeneous Markov chain corresponding to every positive
algorithm control parameter value possesses a stationary distribution, that the sequence of
stationary distributions corresponding to any sequence of positive control parameter values
converges to a limiting distribution if the control parameter sequence converges to
zero, and then employing Definitions A11-A13 and Theorems A5-A7 to deduce a
sufficient condition (the annealing schedule lower bound) to guarantee that the nonstationary
algorithm achieves the limiting distribution (i.e. strong ergodicity).
The model development in Section 4 demonstrates that for all mutation probability
values in the range $0 < p_m < 1$, the Markov chain representing either the two- or
three-operator time-homogeneous simple genetic algorithm possesses a stationary distribution.
Section 7 demonstrates that the stationary distribution approaches a limit as the mutation
probability parameter approaches zero. This section proposes and then verifies a monotone
decreasing lower bound on the mutation probability sequence of the nonstationary
genetic algorithm Markov chain which is sufficient to ensure strong ergodicity.
8.2 A Weak Ergodicity Bound
The following paragraphs propose and then verify a mutation probability parameter
bound sufficient to ensure that the Markov chain of the corresponding nonstationary simple
genetic algorithm is weakly ergodic (Definition A11). The bound applies to both the
two- and three-operator algorithms, and it appears in Proposition 8.1 below.
Proposition 8.1: The mutation probability bound given by
$$p_m(k) \ge \frac{1}{2}\, k^{-1/(ML)}$$
is sufficient to ensure weak ergodicity of the corresponding nonstationary (two- or
three-operator) simple genetic algorithm Markov chain.
This result is established by using the lower bounds on the two- and three-operator
conditional probabilities in Eq. 4.21 and 4.31 with Definitions A11 and A12 and Theorems
A5 and A6. Applying the lower bound in Eq. 4.21 and 4.31 to $\tau_1(\cdot)$ of Definition
A12 and Theorem A5 yields
$$\tau_1(P) = 1 - \min_{i,j} \sum_{m} \min\bigl(P(m \mid n_i),\, P(m \mid n_j)\bigr) \le 1 - \left(\frac{2\alpha}{1+\alpha}\right)^{ML}.$$
Thus,
$$1 - \tau_1(P) \ge \left(\frac{2\alpha}{1+\alpha}\right)^{ML},$$
and consequently from Theorem A6, the chain is weakly ergodic if the sequence of control
parameter values $\{\alpha(k)\}$ satisfies
$$\sum_{k=1}^{\infty} \left(\frac{2\alpha(k)}{1+\alpha(k)}\right)^{ML} = \infty.$$
Comparing this result to the known divergent series $\sum_k k^{-1}$, it follows that the Markov
chain is weakly ergodic if the sequence $\{\alpha(k)\}$ satisfies
$$\left(\frac{2\alpha(k)}{1+\alpha(k)}\right)^{ML} \ge \frac{1}{k},$$
from which
$$\frac{\alpha(k)}{1+\alpha(k)} \ge \frac{1}{2}\, k^{-1/(ML)}.$$
Using Eq. 4.13 to translate this result into an equivalent expression in $p_m(k)$ establishes
Proposition 8.1.
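The quantities used in this argument can be sketched numerically. The following is a minimal illustration, not the dissertation's own code: the function names are hypothetical, and the conversion between $\alpha$ and $p_m$ assumes that Eq. 4.13 has the form $\alpha = p_m/(1-p_m)$, which is the relation implied by comparing the last inequality above with Proposition 8.1.

```python
def ergodic_coefficient(P):
    """tau_1(P) = 1 - min over row pairs (i, j) of sum_m min(P[i][m], P[j][m]).

    P is a row-stochastic matrix given as a list of rows.
    """
    n = len(P)
    overlap = min(
        sum(min(a, b) for a, b in zip(P[i], P[j]))
        for i in range(n)
        for j in range(n)
    )
    return 1.0 - overlap


def mutation_bound(k, M, L):
    """Lower bound of Proposition 8.1: p_m(k) >= 0.5 * k**(-1/(M*L))."""
    return 0.5 * k ** (-1.0 / (M * L))


def alpha_from_pm(pm):
    """ASSUMED form of Eq. 4.13: alpha = p_m / (1 - p_m)."""
    return pm / (1.0 - pm)


def weak_ergodicity_term(k, M, L):
    """Term (2*alpha/(1+alpha))**(M*L) of the series, evaluated on the bound."""
    a = alpha_from_pm(mutation_bound(k, M, L))
    return (2.0 * a / (1.0 + a)) ** (M * L)
```

On the bound itself each series term equals $1/k$, so the series reduces to the harmonic series, which is the comparison made in the text.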
8.3 Strong Ergodicity
The mutation probability schedule bound advanced in Proposition 8.1 is also sufficient
to achieve strong ergodicity if it satisfies the condition on the sequence of vector
differences in Theorem A7. The required sequence of vectors can be selected as the
sequence of stationary distributions of the time-homogeneous Markov chains associated
with the parameter sequence $\{p_m(k)\}$ (or equivalently with the corresponding sequence
$\{\alpha(k)\}$).
Section 4 establishes that a stationary distribution exists for the time-homogeneous
two- and three-operator algorithms corresponding to every value of $\alpha$ satisfying $\alpha > 0$.
Thus, associated with the sequence of control parameter values $\{\alpha(k)\}$ is a sequence of
vectors $\{q_k\}$ where $q_k = q_\alpha$ evaluated at $\alpha = \alpha(k)$. Further, based upon results established
in Section 6, Section 7 demonstrates that an $\alpha \to 0^+$ limiting stationary distribution exists
(Propositions 7.5 and 7.6), that the stationary distribution vector varies continuously for
all $\alpha$ satisfying $\alpha \ge 0$ (Proposition 7.7) and that its first derivative exists and is continuous
for all $\alpha$ satisfying $\alpha > 0$ (Proposition 7.3). In particular, $q_\alpha$ is continuous on the closed
interval $0 \le \alpha \le 1$ and its first derivative exists at every interior point of that interval.
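Therefore, if consideration is limited to monotone decreasing control parameter
sequences, then by the mean value theorem the difference between the $m$-th components of
any two consecutive vectors in the sequence can be written as
$$q_{k+1}(m) - q_k(m) = \left.\frac{dq_\alpha(m)}{d\alpha}\right|_{\alpha=\alpha'(k)} \times \bigl(\alpha(k+1) - \alpha(k)\bigr)$$
where the value $\alpha'(k)$ satisfies $\alpha(k+1) < \alpha'(k) < \alpha(k)$. Consequently,
$$|q_{k+1}(m) - q_k(m)| = \left|\frac{dq_\alpha(m)}{d\alpha}\right|_{\alpha=\alpha'(k)} \times |\alpha(k+1) - \alpha(k)|$$
and
$$\sum_{k=1}^{\infty} |q_{k+1}(m) - q_k(m)| = \sum_{k=1}^{\infty} \left|\frac{dq_\alpha(m)}{d\alpha}\right|_{\alpha=\alpha'(k)} \times |\alpha(k+1) - \alpha(k)|. \qquad \text{Eq. 8.1}$$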
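From Propositions 7.3 and 7.8, it is possible to define a function $g_\alpha(m)$ which is
continuous in $\alpha$ on the closed interval $0 \le \alpha \le 1$ as follows
$$g_\alpha(m) = \begin{cases} \dfrac{dq_\alpha(m)}{d\alpha}, & 0 < \alpha \le 1 \\[2ex] \lim\limits_{\alpha \to 0^+} \dfrac{dq_\alpha(m)}{d\alpha}, & \alpha = 0 \end{cases} \qquad \text{Eq. 8.2}$$
Then, from a fundamental theorem in the calculus of functions (a function continuous on a
closed bounded interval is bounded there), it follows that $g_\alpha(m)$ (and consequently
$|g_\alpha(m)|$) is bounded on the closed interval $0 \le \alpha \le 1$. Thus, if
$$B = \sup_{m \in S',\ \alpha \in [0,1]} |g_\alpha(m)|, \qquad \text{Eq. 8.3}$$
then it follows from Eq. 8.2 that at every interior point of the interval $0 \le \alpha \le 1$
$$\left|\frac{dq_\alpha(m)}{d\alpha}\right| \le B$$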
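and application of this result to Eq. 8.1 yields
$$\begin{aligned} \sum_{k=1}^{\infty} |q_{k+1}(m) - q_k(m)| &= \sum_{k=1}^{\infty} \left|\frac{dq_\alpha(m)}{d\alpha}\right|_{\alpha=\alpha'(k)} \times |\alpha(k+1) - \alpha(k)| \\ &\le \sum_{k=1}^{\infty} B \times |\alpha(k+1) - \alpha(k)| \\ &= B \sum_{k=1}^{\infty} |\alpha(k+1) - \alpha(k)|. \end{aligned} \qquad \text{Eq. 8.4}$$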
Since only monotonic control parameter sequences are under consideration, the sum
in the last line of Eq. 8.4 can be written as the difference of the initial and final parameter
values of the sequence. Thus,
$$\sum_{k=1}^{\infty} |q_{k+1}(m) - q_k(m)| \le B\bigl(\alpha(1) - \alpha(\infty)\bigr) = B(1 - 0) = B. \qquad \text{Eq. 8.4}$$
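The telescoping step can be checked numerically for a finite prefix of a monotone decreasing sequence. The sketch below is illustrative only: the population size `M` and string length `L` are hypothetical values, and the conversion $\alpha = p_m/(1-p_m)$ is an assumed form of Eq. 4.13 under which the schedule of Proposition 8.1 starts at $\alpha(1) = 1$.

```python
def mutation_bound(k, M, L):
    """Schedule on the bound of Proposition 8.1: 0.5 * k**(-1/(M*L))."""
    return 0.5 * k ** (-1.0 / (M * L))


def total_variation(values):
    """Sum of |values[k+1] - values[k]| over consecutive entries."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))


M, L = 4, 8  # hypothetical population size and string length
pms = [mutation_bound(k, M, L) for k in range(1, 1001)]
alphas = [p / (1.0 - p) for p in pms]  # ASSUMED form of Eq. 4.13

# For a monotone decreasing sequence the sum of absolute differences
# telescopes to (first value - last value).
telescoped = alphas[0] - alphas[-1]
variation = total_variation(alphas)
```

Because the sequence is monotone, every absolute difference equals $\alpha(k) - \alpha(k+1)$, so the partial sums never exceed $\alpha(1)$, whatever the truncation point.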
The series of vector differences required for application to Theorem A7 can then be
written as
$$\sum_{k=1}^{\infty} \|q_{k+1} - q_k\| = \sum_{k=1}^{\infty} \sum_{m \in S'} |q_{k+1}(m) - q_k(m)| \le \sum_{m \in S'} B = N'B < \infty. \qquad \text{Eq. 8.5}$$
Applying the combined results of Proposition 8.1 and Eq. 8.5 to Theorem A7 produces
the goal of this section.
Proposition 8.2: The mutation probability bound given by
$$p_m(k) \ge \frac{1}{2}\, k^{-1/(ML)}$$
is sufficient to ensure strong ergodicity of the corresponding Markov chain. Further, the
Markov chain representing any nonstationary two- or three-operator simple genetic
algorithm for which the mutation probability sequence both observes this bound and
converges to zero achieves (asymptotically) the limiting probability distribution defined in
Propositions 7.5 and 7.6.
8.4 Comparison With the Simulated Annealing Parameter Bound
It is instructive to compare the mutation probability sequence bound developed here
with the annealing schedule bounds reviewed in Section 2.4.2.2, both of which are of the
form $K/\log(k)$. Let $\rho(k)$ be defined as the ratio
$$\rho(k) = p_m(k)/T(k)$$
where $p_m(k)$ is selected as the bound developed herein and $T(k)$ is selected as the bound
provided by either Eq. 2.12 or Eq. 2.13. That is
$$\rho(k) = \frac{1}{2}\, k^{-1/(ML)} \Big/ \bigl[K/\log(k)\bigr] = \frac{1}{2K}\, \frac{\log(k)}{k^{1/(ML)}}. \qquad \text{Eq. 8.5}$$
Thus, decreasing values of $\rho(k)$ imply that the genetic algorithm convergence rate is
superior (asymptotically) to that of the simulated annealing algorithm.
Now, let $k = \exp(x)$, or equivalently $x = \log(k)$. Substituting into Eq. 8.5 yields
$$\rho(k) = \frac{1}{2K}\, x \exp\left(-\frac{x}{ML}\right).$$
Then, since for all positive constants $\gamma$, the limit of $x \exp(-\gamma x)$ as $x \to \infty$ is zero, it follows
that
$$\lim_{k \to \infty} \rho(k) = \lim_{k \to \infty} \frac{\log(k)}{2K\, k^{1/(ML)}} = 0. \qquad \text{Eq. 8.6}$$
Thus, the nonstationary simple genetic algorithm provides an asymptotically superior
convergence rate.
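A short numerical sketch can make the behavior of the ratio concrete. The function name `rho` and the default constant `K = 1.0` are assumptions made for illustration; only the functional form is taken from the comparison above.

```python
import math


def rho(k, M, L, K=1.0):
    """Ratio of the mutation bound 0.5*k**(-1/(M*L)) to the annealing bound K/log(k).

    Valid for k > 1 so that log(k) is positive.
    """
    return 0.5 * k ** (-1.0 / (M * L)) * math.log(k) / K


# Hypothetical tiny problem with M*L = 2: the ratio peaks near k = exp(M*L)
# and decreases toward zero afterwards.
samples = [rho(k, 1, 2) for k in (2, 8, 10**3, 10**6)]
```

Writing $x = \log(k)$, the function $x \exp(-x/ML)$ attains its maximum at $x = ML$, so the ratio rises until roughly $k = \exp(ML)$ before it decays. For realistic population sizes and string lengths that turning point is extremely large, which is consistent with the strictly asymptotic character of the comparison.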

Full Text 
TOWARD AN EXTRAPOLATION OF THE SIMULATED ANNEALING CONVERGENCE THEORY ONTO THE SIMPLE GENETIC ALGORITHM By THOMAS E. DAVIS A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 1991
ACKNOWLEDGEMENTS
The author is extremely fortunate in having the assistance of several talented academicians during the conduct of the research program reported in this dissertation. Notably, Professor Jose Principe, who supervised this work and contributed several key ideas developed herein, proved a very valuable source of encouragement and support. Also, Professor Murali Rao assisted very substantially in enforcing mathematical rigor, especially in the formulation of the Markov chain appendices. Professors Antonio Arroyo and Donald Childers, who served on the committee overseeing this work, are remembered fondly for the constructive comments they provided during the visits the author made to Gainesville while conducting this research, as well as for productive associations while the author was in residence. Also, the assistance provided by Professor Eugene Chenette from the Eglin Graduate Center in dealing with a variety of administrative complications, as well as his service on the committee overseeing this work, is graciously acknowledged.
Additionally, the generous support of the US Air Force Armament Laboratory to this work is sincerely appreciated. The author's management chain, notably Mr. Lynn Deibler, Lt. Col. Rex Franklin, Dr. Eugene Youngblood and Lt. Col. Tom Callen, provided continual encouragement and working condition flexibility. Without their support, this activity would likely not have been possible. The Computer Science Directorate at Eglin provided some exceptionally valuable computer support, on the Eglin AFB Cray Y-MP, under very flexible conditions. Some of the insights gained during the conduct of this research would be very difficult, and perhaps impossible, to attain through any method other than simulation. Members of the staff at the Computer Science Directorate who contributed significantly, especially in helping the VMS-inclined author through the UNIX maze, include Mr. Eddie Blackwell, Mr. Ben McKinnon and Mr. Danny Majors. Mr. Bill Clements, who made the computing resources available, and Mr. Calvin George, who helped the author arrange the support, are also gratefully acknowledged.
Finally, the author wishes to thank Sumiko, who entered his life during the conduct of this research program, for the support and understanding whose need she never fails to anticipate.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
ABSTRACT
SECTIONS
1 INTRODUCTION
    1.1 Non-Convex Combinatorial Optimization and Stochastic Search Algorithms
    1.2 Organization
2 SIMULATED ANNEALING
    2.1 Overview
    2.2 Statistical Mechanics and Annealing of Solids
    2.3 Combinatorial Optimization by Simulated Annealing
    2.4 Theoretical Foundations of Simulated Annealing
3 THE GENETIC ALGORITHM
    3.1 Overview
    3.2 The Simple Genetic Algorithm Operators
    3.3 Building Blocks, Schemata and the Fundamental Theorem
    3.4 An Assessment of the Genetic Algorithm Theoretical Foundation
4 A MARKOV CHAIN MODEL OF THE SIMPLE GENETIC ALGORITHM
    4.1 Overview
    4.2 The Markov Chain Model
    4.3 The State Behavior of the Simple Genetic Algorithm
5 SOME EMPIRICAL RESULTS
    5.1 Overview
    5.2 State Space Enumeration
    5.3 Reward Function Data
    5.4 Conditional Probabilities vs $\alpha$
    5.5 Converged Limiting Stationary Distributions
6 THE CRAMER'S RULE FORMULATION OF THE STATIONARY DISTRIBUTION
    6.1 Overview
    6.2 The Stationary Distribution Description
    6.3 Positivity of the Stationary Distribution Components
    6.4 The Indeterminate Form at $\alpha = 0$
7 THE ZERO MUTATION PROBABILITY STATIONARY DISTRIBUTION LIMIT
    7.1 Overview
    7.2 Functional Form of the Stationary Distribution
    7.3 The Absorbing State Rows of PI and IPsTl
    7.4 Reformulation of Propositions 6.7 and 6.8
    7.5 The Stationary Distribution Limit
8 A MONOTONIC MUTATION PROBABILITY ERGODICITY BOUND
    8.1 Overview
    8.2 A Weak Ergodicity Bound
    8.3 Strong Ergodicity
    8.4 Comparison With the Simulated Annealing Parameter Bound
9 REPRESENTATION OF THE STATIONARY DISTRIBUTION SOLUTION
    9.1 Overview
    9.2 The Limiting Case $\alpha = 1$
    9.3 The General Case $0 < \alpha < 1$
    9.4 The Limiting Case $\alpha \to 0^+$
    9.5 Extending the Stationary Distribution Representation
10 CONCLUSIONS AND FUTURE DIRECTION
    10.1 Summary
    10.2 Contributions of the Research
    10.3 Future Direction
APPENDICES
A DISCRETE TIME FINITE STATE MARKOV CHAINS
    A.1 Introduction
    A.2 Elementary Definitions
    A.3 Time-Homogeneous Markov Chains
    A.4 Inhomogeneous Markov Chains
B THE PERRON-FROBENIUS THEOREM AND STOCHASTIC MATRICES
    B.1 Introduction
    B.2 The Perron-Frobenius Theorem and Ancillary Results for Primitive Matrices
    B.3 The Perron-Frobenius Theorem for Stochastic Matrices
C VANDERMONDE DETERMINANTS, SYMMETRIC AND ALTERNATING POLYNOMIALS
    C.1 Introduction
    C.2 Evaluation of Vandermonde Determinants
    C.3 Symmetric (and Alternating) Polynomials
    C.4 Quasi-Symmetric (and Quasi-Alternating) Polynomials
D COMPUTER LISTINGS
    D.1 Introduction
    D.2 Main Program Listings
    D.3 Library Listings
REFERENCES
BIOGRAPHICAL SKETCH
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
TOWARD AN EXTRAPOLATION OF THE SIMULATED ANNEALING CONVERGENCE THEORY ONTO THE SIMPLE GENETIC ALGORITHM
By
THOMAS E. DAVIS
May 1991
Chairman: Professor Jose C. Principe
Major Department: Electrical Engineering
Simulated annealing and the genetic algorithm are stochastic relaxation search techniques suitable for application to a wide variety of combinatorial complexity nonconvex optimization problems. Each produces a sequence of candidate solutions (or populations of candidate solutions) to the underlying optimization problem, and the purpose of both algorithms is to generate sequences biased toward solutions which optimize the objective function.
The appeal of simulated annealing is that it provides asymptotic convergence to a globally optimal solution. A substantial body of knowledge exists concerning the algorithm convergence behavior. It is based upon a nonstationary Markov chain algorithm model. No genetic algorithm model comparable in scope exists in the literature. This work constitutes an attempt to provide such a model and accompanying convergence theory by extrapolating the simulated annealing results onto the genetic algorithm. A prerequisite, developed herein, is a nonstationary Markov chain genetic algorithm model.
The essence of the simulated annealing theory is demonstration of (1) existence of a unique asymptotic probability distribution (stationary distribution) for the stationary Markov chain corresponding to every strictly positive constant value of an algorithm control parameter (absolute temperature), (2) existence of a stationary distribution limit as the control parameter approaches zero, (3) the desired behavior of the stationary distribution limit (i.e. optimal solution with probability one) and (4) sufficient conditions on the algorithm control parameter to ensure that the nonstationary algorithm achieves (asymptotically) the limiting distribution. With the exception of (3), this work adapts that methodology to the genetic algorithm Markov chain model employing a genetic operator parameter (mutation probability) as the algorithm control parameter.
The results include a mutation probability control parameter bound analogous to (and asymptotically superior to) the conventional simulated annealing parameter bounds, and a framework for representing the genetic algorithm stationary distribution components at all consistent fixed control parameter values, including zero. The genetic algorithm stationary distribution limit has nonzero components corresponding to all solutions. Thus, the simulated annealing global optimality convergence result does not extrapolate. However, both empirical and theoretical evidence is provided which suggests that the desired limiting behavior can be approached by suitably adjusting the algorithm parameters.
SECTION 1
INTRODUCTION
1.1 Non-Convex Combinatorial Optimization and Stochastic Search Algorithms
A wide variety of engineering applications lend themselves to formulations which require the solution of combinatorial optimization problems. Typically, the optimization problem is nonconvex and is defined over a very high dimensionality search space (e.g. inverse vision problems, in which an image array of 512x512 pixels at 8 bits/pixel might be encountered, resulting in a search space dimensionality of ~2M). Consequently, direct solution is usually intractable.
An alternative to direct solution is to select one of a variety of iterative improvement solution techniques, usually some variant of gradient search. But by definition, deterministic iterative improvement techniques terminate in local extrema, and they ordinarily provide no means of assessing the amount by which the selected local extremum deviates from the global extremum. A typical means of avoiding local extrema entrapment is to implement the iterative improvement solution method stochastically.
The most commonly employed stochastic algorithm approach to combinatorial optimization is simulated annealing [KiGe83, LaAa87], which is also sometimes referred to as probabilistic hill climbing [RoSa85]. It exploits the analogy of combinatorial optimization to the annealing of crystalline solids, in which a solid is cooled very gradually from some elevated temperature and thereby allowed to relax toward its low energy states. The appeal of the algorithm class derives from the fact that provided certain constraints on an algorithm control parameter (analogous to absolute temperature) are observed, asymptotic convergence to a global extremum is guaranteed.
The key limitation of simulated annealing is that the convergence behavior is asymptotic. Thus global optimality is obtained only after an infinite number of algorithm iterations. The rate of convergence to optimality is determined by a nonnegative algorithm control parameter whose ideal value is zero and which must observe a lower bound in order to assure coherent algorithm behavior. The best available known bound for the parameter, the annealing schedule bound, is of the form K/log(k) where k is the iteration index and K is a parameter independent of k [GeGe84, MiRo85].
Another combinatorial optimization stochastic search technique reported in the literature is the genetic algorithm [Davi87, Gold83, Gold89a, Gref85, Gref87]. It emulates the evolution of biological systems by employing a set of stochastic operators (e.g. reproduction, crossover and mutation) to transform a population of candidate solutions to the underlying optimization problem into a new (descendent) population. It has some features which suggest that it may provide significantly improved convergence behavior over simulated annealing on certain types of optimization problems. However, the nature of the genetic operators and their influence on algorithm behavior is only understood in general terms. No complete theoretical model of the algorithm exists in the literature. The fundamental goal of the work reported here is to provide a theoretical framework for analyzing the algorithm based upon the asymptotic probability distribution of the solution sequences which it produces. The work reported herein includes significant progress on the key intermediate steps to achieving that goal.
1.2 Organization
The remaining sections of this paper are organized as follows. Sections 2 and 3 are background reviews of the simulated annealing and genetic algorithm literature respectively.
Section 2 places considerable emphasis on the methodology employed to yield the asymptotic convergence results which are the theoretical foundation of the simulated annealing algorithm. That methodology appeals heavily to the theory of inhomogeneous (nonstationary) Markov chains and their asymptotic state probability distributions. The essence of the simulated annealing convergence theory is a set of sufficient conditions to ensure that the asymptotic probability distribution of the Markov chain which represents the algorithm is independent of its starting state and has probability zero for all states corresponding to suboptimal solutions.
Section 3 begins with a verbal description of the three fundamental stochastic operators employed in genetic algorithms (i.e. reproduction, crossover and mutation), and proceeds to review the existing theoretical foundation of the algorithm class. A conclusion of that section is that while certain important theoretical results exist, notably the so-called schema theorem and some work on a problem construct referred to as the minimal deceptive problem, the genetic algorithm lacks the theoretical foundation necessary to either compare it with simulated annealing or to answer key questions concerning the design of a genetic algorithm for a given application.
The author's contribution to this work begins with Section 4. The major result of that section is a very general, nonstationary Markov chain model of the variants of the genetic algorithm which employ combinations of the three fundamental genetic algorithm operators. The model is tailored to resemble that employed in developing the simulated annealing methodology, and in that regard, the genetic algorithm mutation operator is shown to provide a function very similar to that of the simulated annealing absolute temperature analog. Specifically, the stationary algorithm corresponding to every constant value of the mutation probability parameter satisfying 0
Section 5 digresses briefly from the theoretical development to produce and examine some empirical work based upon the algorithm model. The presentation is not, nor is it intended to be, a thorough empirical study. It is provided to help fix some of the algorithm model state space and asymptotic probability distribution ideas which are central to this work, and it anticipates some of the theoretical results which follow.
Section 6 resumes the theoretical development. Its result is an expression for the components of the unique asymptotic probability distribution produced by the stationary algorithm variants which implement the mutation operator with nonzero mutation probability (i.e. the stationary two- and three-operator algorithm variants). The result is expressed in terms of Cramer's Rule and thus its solution requires evaluation of determinants. The determinants are the characteristic polynomials, evaluated at $\lambda = 1$, of matrices derived from the state transition matrix produced in Section 4 by zeroing one row. A later section attacks the problem of explicitly solving the system, based upon the highly symmetrical nature of the state transition matrix, but some very significant results are obtainable from the product of Section 6 without explicit solution.
An essential step in establishing a connection between simulated annealing and the genetic algorithm is demonstrating the existence of a stationary distribution limit for the algorithm as the mutation probability approaches zero. Section 7 accomplishes that task and also provides a foundation for deducing, in Section 8, a mutation probability bound analogous to the annealing schedule bounds of the simulated annealing algorithm. The results developed in Sections 7 and 8 apply to both the two- and three-operator algorithm variants.
A somewhat surprising result produced in Section 7 and anticipated by the empirical study reported in Section 5 is that the stationary distribution zero mutation probability limit does not necessarily isolate globally optimal solutions. In fact, it provides nonzero probability for all solutions of the underlying optimization problem and consequently the extrapolation of the simulated annealing methodology is less than exact. However, both the empirical results presented in Section 5 and some results developed later in Section 9 suggest that the required limiting behavior can be approached as closely as desired by adjusting the algorithm parameters appropriately.
Section 9 attacks the problem of explicitly solving the system which results from the Cramer's Rule formulation of the stationary distribution of the time-homogeneous two- and three-operator algorithms. It is a very extensive development which yields an expression for the coefficient of the general term in the Taylor's series expansion of the required determinants. It is based upon the highly symmetrical nature of the state transition matrix, as alluded to earlier. The results of Section 9 are not reduced to a directly useable explicit solution. Nevertheless, they do provide significant insight into the functional form of the stationary distribution components. Furthermore, Section 9.5 points out some very significant identities which exist among the coefficients of the Taylor's series and suggests a method for continuing the Section 9 development based upon the algebra of symmetric and alternating polynomials. Explicit solution of the stationary distribution equations is the major incomplete task required for extrapolation of the simulated annealing convergence theory onto the genetic algorithm.
Section 10 summarizes this work and recapitulates the significant results. It also proposes continuation of two parts of this research: (1) pursuit of the stationary distribution solution and (2) refinement of the mutation probability control parameter bound.
An appropriate mathematical framework for examining both the simulated annealing and genetic algorithms is the theory of Markov chains. Appendix A is included to summarize some essential definitions and theorems. Appendix B is devoted to the Perron-Frobenius theorem, which is fundamental to the study of nonnegative matrices in general and Markov chains in particular. Several important Markov chain theorems are specializations of it and the key developments in Sections 6 and 7 require its application. All of the Appendix A and Appendix B results are provided without proof or elaboration, but their foundation is obtainable from various references (e.g. [Cinl75] for the more elementary results in Appendix A, [Sene81] for the Appendix B material on the Perron-Frobenius Theorem and [Sene81, IsMa76] for the Appendix A ergodicity related definitions and theorems). These results are invoked freely in the following sections, either by specific reference to definition/theorem number, or if the context makes it appropriate, they are simply assumed.
Appendix C is provided as background for the Section 9.5 discussion on coefficient identities and extending the stationary distribution representation development. With the exception of Section C.4, the material presented in Appendix C is obtainable from advanced algebra texts (e.g. [MoSt64]). The symmetric/alternating polynomial generalization in Section C.4 is original.
Appendix D collects the computer program listings for the programs employed in generating the results reported in Section 5. The programs presented there were developed and executed on the Cray Y-MP operated by the Computer Science Directorate at Eglin AFB, FL.
SECTION 2
SIMULATED ANNEALING
2.1 Overview
As noted in the introduction, a very commonly employed approach to the solution of nonconvex combinatorial optimization problems is a stochastic relaxation technique introduced by Kirkpatrick et al. and referred to as simulated annealing [KiGe83]. The technique is so named by virtue of its analogy to the annealing of solids, in which a crystalline solid is heated to its melting point and then allowed to cool very gradually until it is again in the solid phase at some nominal temperature. In the limiting case of infinitesimal cooling rate and absolute zero final temperature, the resulting solid achieves its most regular possible crystal lattice configuration (i.e. minimum lattice energy state), and hence is free of crystal defects. Simulated annealing establishes the connection between this sort of thermodynamic behavior and the search for the global minimum of an objective function in a combinatorial optimization problem, and further, it provides an algorithmic means of exploiting the connection. This section is a review of the technique with special emphasis on known results which bound the convergence behavior of computer algorithms belonging to the class.
2.2 Statistical Mechanics and Annealing of Solids
The fundamental assumption of statistical physics is that the thermodynamic behavior of a many particle system can be represented by a statistical ensemble, and that if the system is in thermal equilibrium, the time averages of macroscopic thermodynamic properties of the system are equal to the corresponding ensemble averages (ergodicity hypothesis). The random variable represented by the ensemble is the system thermal energy, and at thermal equilibrium the probability distribution is completely determined by the system temperature. The distribution is known as the Boltzmann distribution, or alternatively as the Gibbs distribution, and its form is
$$\Pr\{E = E(i)\} = \frac{1}{Z(T)} \exp\left\{-\frac{E(i)}{kT}\right\} \qquad \text{Eq. 2.1}$$
where
$E$ = the system thermal energy (a random variable)
$E(i)$ = the energy corresponding to state $i$
$k$ = Boltzmann's constant
$T$ = the system temperature
$Z(T)$ = the partition function.
The factor $\exp\{-E(i)/kT\}$ is called the Boltzmann factor. The partition function provides the necessary normalization to make Eq. 2.1 a state occupancy probability. It can be expressed as
$$Z(T) = \sum_{i} \exp\left\{-\frac{E(i)}{kT}\right\}. \qquad \text{Eq. 2.2}$$
At elevated temperatures, the system represented by the probability distribution in Eqs. 2.1 and 2.2 occupies all states in its state space with nearly uniform probability, while at low temperatures, states having low energy are favored. When the temperature approaches absolute zero, only states corresponding to the minimum value of energy have nonzero probability. Thus, the thermodynamic system's energy function can be effectively searched for its minimum value by starting the system at an elevated temperature and allowing it to cool gradually to absolute zero, at which point one of its minimum energy states is occupied with probability one. This is the mechanism which guides the annealing of solids.
The cooling schedule employed in annealing solids is constrained by the requirement that the system be allowed to achieve thermal equilibrium at each temperature. The Gibbs distribution only represents the system's energy distribution in the stationary case (i.e. equilibrium). If this requirement is not satisfied, defects can be frozen into the crystal lattice preventing the system from achieving the minimum possible energy state. This behavior is analogous to local minima entrapment in combinatorial optimization search. The restriction on the annealing schedule necessary to avoid it is the fundamental limitation on the annealing technique.
2.3 Combinatorial Optimization by Simulated Annealing
Simulated annealing approaches combinatorial optimization problems in a closely analogous fashion. In simulated annealing, the optimization problem's solution space corresponds to the state space of the analogous thermodynamic system and its cost function is analogous to the thermodynamic system's energy surface. The analog of the thermodynamic system's temperature is a nonnegative algorithm control parameter, T. Two other algorithm components are also required. They are the stochastic next state generation and acceptance mechanisms, and they incorporate the dependence of the algorithm on the control parameter, T. The next state generation mechanism is employed by the algorithm to transform a current solution into a new candidate solution, and the acceptance mechanism is employed to decide whether to retain or discard the proposed new solution. Together, these stochastic operators are responsible for making the search algorithm simulate the thermodynamic system's statistical behavior. Consequently, they must satisfy certain requirements to assure coherent algorithm behavior. These requirements are explored in some depth later in the context of algorithm convergence behavior.
Conceptually, the operation of the simulated annealing algorithm can be described as follows. The algorithm starts at some initial value of the control parameter and with some initial solution.
Then, the state generation mechanism is employed to synthesize a new candidate solution. The new solution is examined by the acceptance mechanism and either accepted or rejected. If it is accepted, the new solution becomes the current solution. Otherwise, the old current solution is retained. This process is repeated, generating a sequence of temporary solutions, until an approximate equilibrium is achieved in which the solution space occupancy is described by the Gibbs distribution (Eqs. 2.1 and 2.2). Once this approximate equilibrium is achieved, the control parameter value is reduced and the solution sequence is extended until equilibrium is achieved at the new control parameter value. This process is repeated until some termination condition (e.g. minimum control parameter value) is attained. The current solution at termination is then accepted as the solution to the optimization problem.
It is noted in passing that simulated annealing always involves minimizing a cost functional, never maximizing a reward. However, this causes no loss of generality because any combinatorial optimization problem can be translated into an equivalent minimization problem.
2.4 Theoretical Foundations of Simulated Annealing
The evolution of the search sequence of a simulated annealing algorithm as outlined, in which each succeeding solution in the sequence is determined stochastically based upon the current solution, suggests that the algorithm behavior can be described as a Markov chain. Indeed it can, and all of the known convergence results for simulated annealing algorithms are derived from analysis of Markov chain models [LaAa87, GeGe84, LuMe86, MiRo85, Rior58]. This subsection establishes a Markov chain model to represent the simulated annealing algorithm and then employs it in reviewing the development of the published convergence bounds. This development essentially follows [LaAa87].
2.4.1 A Markov Chain Model of Simulated Annealing
Let a combinatorial optimization problem be represented by the pair (S,C) where S is the problem's solution space and C is its cost function, and assume without loss of generality that the optimization problem requires minimization of C. Also, assume that S is finite.
Then, a simulated annealing algorithm for solving this problem can be characterized by the quadruple $(S, i_0, P_T, \tau)$ where $S$ is as defined above and where $i_0 \in S$
is an initial candidate solution, $P_T$ is a stochastic matrix which describes a stochastic state transition mechanism (the composition of the next state generation and acceptance mechanisms discussed in Section 2.3) and $\tau = \{T_k\}$ is a finite length monotone nonincreasing sequence of positive control parameter values. The first parameter value in $\tau$ is $T_0$ and the final value is $T_f$. $P_T$ incorporates the algorithm dependence on both $C$ and $T$.

The algorithm generates a sequence of candidate solutions, $\{i_k : 0 \le k \le f\}$, by employing the state transformation mechanism (described by $P_T$) to transform solution $i_k$ into $i_{k+1}$. At the $k$th transition, $P_T$ is completely determined by $T_k$. The solution sequence is extended until $T = T_f$, at which point the current solution, $i_f$, is accepted as the solution to the combinatorial optimization problem. Thus $T_f$ signals algorithm termination. $T_f$ can be allowed to depend on $\{i_k\}$ provided due regard is paid to the requirement for termination.

Since the solution state transition mechanism is stochastic, and since the conditional dependence of the solution sequence only extends to one transition, the solution sequence is a Markov chain by Definition A1. Its state transition matrix is $P_T$ (Definition A4).

The state transition matrix is decomposed into two parts for convenience in the following. It consists of the next state generation mechanism, $G_{ij}(T)$, which describes the probability of generating state $j$ given that the current state is $i$, and the state acceptance mechanism, $A_{ij}(T)$, which describes the probability of accepting the generated state. Thus, $P_T(i,j)$ is written as

$$P_T(i,j) = \begin{cases} G_{ij}(T)\, A_{ij}(T) & j \ne i \\ 1 - \displaystyle\sum_{l=1,\; l \ne i}^{N} G_{il}(T)\, A_{il}(T) & j = i \end{cases} \tag{2.3}$$

In this result, $N = \mathrm{card}(S)$ represents the cardinality of the solution space. It is noted in passing that the usual form of the state acceptance mechanism is the so called Metropolis criterion [Metr53], given by

$$A_{ij}(T) = \min\left\{ 1,\; \exp\left( -\frac{C(j) - C(i)}{kT} \right) \right\} \tag{2.4}$$
This is the form employed by Kirkpatrick et al. in the original work [KiGe83] and most others published since are variations of it. Also, the usual form of the next state generation mechanism is

$$G_{ij}(T) = G_{ij} = G_{ji} = \begin{cases} \dfrac{1}{N_i} & j \in S_i \\ 0 & \text{otherwise} \end{cases} \tag{2.5}$$

where $S_i \subset S$ is the set of states accessible from state $i$ in one transition (by definition, $i \notin S_i$), and where $N_i = \mathrm{card}(S_i)$. Note that $G_{ij}$ defined by Eq. 2.5 is symmetric and independent of $T$.

2.4.2 Asymptotic Convergence Behavior

The subject of interest in the remainder of this section is a set of sufficient conditions on $P_T$ and $\tau$ to ensure that an optimal solution is achieved. These conditions will prove to guarantee asymptotic convergence only (i.e. $\tau$ must be an infinite sequence, which of course violates the termination requirement of the algorithm).

Two cases will be examined. The first only involves time-homogeneous (stationary) Markov chains (Definition A5) and is presented due to its relative ease of analysis. Its purpose is to provide a foundation for the essential ideas involved in the second case, which requires an appeal to ergodicity theorems for inhomogeneous (nonstationary) Markov chains. The usable convergence behavior results which are the goal of this effort derive from analysis of the second case.

The first (simple) algorithm is represented as a sequence of solutions evolving as a sequence of distinct Markov chains. Each Markov chain in the sequence executes at a fixed control parameter value (and hence is time-homogeneous) and each succeeding Markov chain executes at a lower (but strictly positive) parameter value. Thus, in the sequence $\tau$, each distinct parameter value, $T_l$, is associated with a distinct time-homogeneous Markov chain and $T_l$ occurs at some large number of consecutive locations, $K_l$, in $\tau$. This case is hereafter referred to as the homogeneous (or stationary) algorithm.
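The transition mechanism of Eq. 2.3 through Eq. 2.5 can be illustrated with a minimal sketch. It assumes a small hypothetical problem (four states on a ring, made-up costs) and takes the constant `k` in the Metropolis exponent as a parameter defaulting to 1; the function names are illustrative and do not come from the source.

```python
import math


def metropolis_acceptance(c_i, c_j, T, k=1.0):
    """Eq. 2.4: probability of accepting a move from cost c_i to cost c_j."""
    if c_j <= c_i:
        return 1.0  # downhill and level moves are always accepted
    return math.exp(-(c_j - c_i) / (k * T))


def transition_matrix(costs, neighbors, T, k=1.0):
    """Eq. 2.3 with the uniform generation mechanism of Eq. 2.5."""
    n = len(costs)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        g = 1.0 / len(neighbors[i])  # G_ij = 1/N_i for j in S_i
        for j in neighbors[i]:
            P[i][j] = g * metropolis_acceptance(costs[i], costs[j], T, k)
        # The diagonal entry absorbs the probability of rejected moves.
        P[i][i] = 1.0 - sum(P[i][j] for j in range(n) if j != i)
    return P


# Hypothetical example: four states on a ring, each with two neighbors.
costs = [3.0, 1.0, 2.0, 0.0]
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
P = transition_matrix(costs, neighbors, T=0.5)
for row in P:
    print([round(p, 4) for p in row])
```

Each row of the resulting matrix sums to one, and lowering `T` shifts probability mass from uphill moves onto the diagonal, which is the mechanism behind local minimum entrapment at low control parameter values.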
The analysis of the convergence behavior of the homogeneous algorithm includes the hypothesis that each Markov chain in the sequence achieves its stationary distribution. This hypothesis is equivalent to $K_l \to \infty$ for all $l$ (Definition A10, Theorem A3, Theorem A4).

In the second case, the algorithm is represented as a sequence of solutions evolving as a single inhomogeneous (nonstationary) Markov chain. This formulation is hereafter referred to as the inhomogeneous (or nonstationary) algorithm. In the inhomogeneous algorithm, the control parameter value is allowed to decrease (though not necessarily required to) after each state transition. The dependence of $G_{ij}(T)$ and $A_{ij}(T)$ on $T$ results in the inhomogeneous behavior.

2.4.2.1 The Homogeneous Algorithm

In the homogeneous algorithm, the means of establishing the requirements for asymptotically optimal convergence is to first establish sufficient conditions for existence of the stationary distribution of each Markov chain and then to establish sufficient conditions to ensure that the stationary distribution converges to a uniform distribution over the set of optimal solutions as the control parameter value approaches zero. That is

$$\lim_{T \to 0,\; T > 0} q_T(i) = \begin{cases} \dfrac{1}{N_{opt}} & i \in S_{opt} \\ 0 & \text{otherwise} \end{cases} \tag{2.6}$$

where $q_T$ is the stationary distribution of the Markov chain executing at control parameter value $T$, $S_{opt} \subset S$ is the set of solutions $\{i \in S : C(i) = C_{opt}\}$, and $N_{opt} = \mathrm{card}(S_{opt})$.

Theorems A1-A3 can be employed to deduce sufficient conditions on $P_T(i,j)$ (or alternatively on $G_{ij}(T)$ and $A_{ij}(T)$) to ensure the existence of the stationary distribution of each Markov chain in the sequence representing the homogeneous algorithm. Since only combinatorial (finite solution space) optimization problems are under consideration and since by definition the homogeneous algorithm only employs time-homogeneous Markov chains, the finite state space and time-homogeneity requirements of Theorem A3 are
satisfied. Beyond these requirements, existence of the stationary distribution of each Markov chain in the homogeneous algorithm only requires that the chain produced by $P_T$ be irreducible and aperiodic (Definitions A7 and A9). If $A_{ij}(T)$ is selected as the Metropolis criterion, Eq. 2.4, then $\forall i, j \in S,\ \forall T > 0 : A_{ij}(T) > 0$. Thus, from Eq. 2.3, the irreducibility requirement is transferred to the next state generation mechanism, $G_{ij}(T)$. Note that from Theorem A1, irreducibility can readily be achieved within the definition supplied by Eq. 2.5. Also, in [MiRo85], Theorem A2 is used to show that a sufficient condition for aperiodicity is $\forall T > 0\ \exists\, i, j \in S \ni A_{ij}(T)$
(Eq. 2.6) are established. First, note that if the stationary distribution of a Markov chain in the sequence exists, then a function $g(C(i),T)$ corresponding to that Markov chain exists such that

$$q_T(i) = \frac{g(C(i),T)}{\sum_{j} g(C(j),T)} \tag{2.8}$$

where $g$ satisfies

$$\begin{aligned} &(1)\quad \forall i \in S,\ \forall T > 0 : g(C(i),T) > 0 \\ &(2)\quad \forall j \in S : \sum_{i \in S,\, i \ne j} g(C(i),T)\, G_{ij}(T)\, A_{ij}(T) = g(C(j),T) \sum_{i \in S,\, i \ne j} G_{ji}(T)\, A_{ji}(T). \end{aligned} \tag{2.9}$$

This can be deduced by noting that the uniquely determining conditions on $q$ expressed in Theorem A3 are met by $g$ satisfying Eq. 2.8 and 2.9. Eq. 2.9 is called the global balance equation. Close examination reveals that it is exactly the necessary condition for equilibrium state occupancy. A more restrictive condition, in which the balance holds for every pair of states on a pairwise basis, is called the detailed balance equation.

It can be shown that the following additional constraints on $g$ guarantee convergence of the stationary distribution to the optimal (i.e. to Eq. 2.6) [MiRo85]. Note that Eq. 2.10(2) requires an exponential form.

$$\begin{aligned} &(1)\quad \lim_{T \to 0} g(\lambda, T) = \begin{cases} 0 & \lambda > 0 \\ \infty & \lambda < 0 \end{cases} \\ &(2)\quad \frac{g(\lambda_1, T)}{g(\lambda_2, T)} = g(\lambda_1 - \lambda_2, T) \\ &(3)\quad \forall T > 0 : g(0, T) = 1 \end{aligned} \tag{2.10}$$

Collectively, Eq. 2.8-2.10 provide a set of sufficient conditions on $G_{ij}(T)$ and $A_{ij}(T)$ to assure convergence of the stationary distribution to Eq. 2.6. The key condition, the global balance equation, is implicit however, and thus is very difficult to apply. Nevertheless, it can be shown [LaAa87] that if $G_{ij}(T)$ and $A_{ij}(T)$ defined by Eq. 2.4 and Eq. 2.5 are
employed, the conditions are satisfied, and that the corresponding stationary distribution is provided by

$$\forall i \in S : q_T(i) = \frac{\exp\{-(C(i) - C_{opt})/kT\}}{\sum_{j} \exp\{-(C(j) - C_{opt})/kT\}}. \tag{2.11}$$

The key to that development is that the $G_{ij}(T)$ and $A_{ij}(T)$ of Eq. 2.4 and 2.5 satisfy the detailed balance equation, the symmetry of $G_{ij}$ being a critical consideration.

The behavior required by Eq. 2.10(1) is limiting behavior as $T \to 0$. Thus, these conditions assure convergence to the global minimum with probability one (i.e. convergence of the stationary distribution to Eq. 2.6) only if the sequence of Markov chains is infinite and $\lim_{l \to \infty} T_l = 0$. Recalling that a guarantee of achieving the stationary distribution requires that each Markov chain be of infinite length, the homogeneous algorithm is seen to require a doubly infinite sequence of solutions composed of an infinite sequence of infinitely long Markov chains.

2.4.2.2 The Inhomogeneous Algorithm

The behavior of the homogeneous algorithm, which requires that an infinite number of transitions be executed at each control parameter value, clearly is not very useful. The following reviews two published convergence results which extend the ideas developed for the homogeneous algorithm to the inhomogeneous counterpart [GeGe84, MiRo85]. These results adopt the sufficient conditions on $G_{ij}(T)$ and $A_{ij}(T)$ developed for the homogeneous algorithm as a starting point (i.e. irreducibility, aperiodicity and Eq. 2.8-2.10) and extend them to the case in which each time-homogeneous Markov chain is finite length (i.e. to the inhomogeneous algorithm). The key products of this effort are lower bounds on the algorithm control parameter's approach to zero. In both cases discussed here, the bound is of the form $K/\log(k)$ where $k$ is the index of the Markov chain representing the inhomogeneous algorithm and $K$ is independent of $k$. The following is a brief sketch of the approach taken to arrive at these results. It is common to both.
Given that $G_{ij}(T)$ and $A_{ij}(T)$ are selected as in Eq. 2.4 and 2.5, each state transition matrix in the inhomogeneous Markov chain of the inhomogeneous algorithm satisfies all of the sufficient conditions for stationary distribution existence and asymptotic convergence to optimality developed for the homogeneous algorithm (i.e. irreducibility, aperiodicity and Eq. 2.8-2.10). Further, the explicit form of the resulting stationary distribution is given by Eq. 2.11. Thus, for each transition matrix, $P_{T_k}$, there exists an eigenvector, $q_{T_k}$, having eigenvalue 1 and satisfying the probability vector conditions. Further, $q_{T_k}$ converges to the limiting distribution of Eq. 2.6 as $T_k \to 0$. Consequently, Theorem A7 can be used to establish strong ergodicity (and hence the desired convergence behavior for $T_k \to 0$) provided (1) that weak ergodicity can be established and (2) that the inequality appearing in Theorem A7 obtains.

Under the hypothesis that $G_{ij}(T)$ and $A_{ij}(T)$ are defined in accordance with Eq. 2.4 and 2.5, in which case the required eigenvector is explicitly provided by Eq. 2.11, and that condition (1) (weak ergodicity) is satisfied, both [GeGe84] and [MiRo85] prove condition (2) of the above. The development is straightforward but tedious. Of more interest here is the means of establishing condition (1), because it leads to the annealing schedule bound.

Both developments employ Theorem A6 to establish weak ergodicity. The general approach is to use the definitions of $G_{ij}(T)$ and $A_{ij}(T)$, along with bounds on the extrema of either the cost function [GeGe84] or the slope of the cost function [MiRo85], to define bounds on the one step transition probabilities. The transition probability bound is then employed to arrive at an upper bound on the $\tau_1$ coefficient of ergodicity of Theorem A5, which is used in turn in Theorem A6 to deduce a sufficient condition to guarantee weak ergodicity. The condition is in the form of a lower bound on the annealing schedule.
The first such result to be published is in [GeGe84]. The resulting bound is

$$T_k \ge \frac{N \times (C_{max} - C_{min})}{\log(k)}, \quad k \ge 2 \tag{2.12}$$

where $C_{max}$ and $C_{min}$ are the maximum and minimum values respectively of $C(i)$ for $i \in S$ and $N = \mathrm{card}(S)$. Thus, $C_{min}$ is the desired $C_{opt}$.

The annealing schedule bound established in [MiRo85] is more refined than that of Eq. 2.12. It is given by

$$T_k \ge \frac{rL}{\log(k)}, \quad k \ge 2 \tag{2.13}$$

where $r$ is the radius of the graph defining the accessible state neighborhoods of the next state generation mechanism (i.e. the $\{S_i\}$ where $S_i \subset S$ is defined in Eq. 2.5), and $L$ is a constant which bounds the local slope of the cost function. Specifically, $r$ and $L$ are given by

$$r = \min_{i \in S - S_{max}} \max_{j \in S} d(i,j) \tag{2.14}$$

where $d(i,j)$ is the distance of $j$ from $i$, measured by the minimum number of state transitions required to arrive at $j$ starting at $i$, where $S_{max} \subset S$ is the set of local maxima of $C$, and

$$L = \max_{i \in S} \max_{j \in S_i} |C(j) - C(i)|. \tag{2.15}$$

Note that in the special case $S_i = S$ for all $i \in S$, Eq. 2.14 and Eq. 2.15 reduce to $r = 1$ and $L = C_{max} - C_{min}$ respectively, and substitution into Eq. 2.13 yields

$$T_k \ge \frac{C_{max} - C_{min}}{\log(k)}. \tag{2.16}$$

The Eq. 2.16 result is smaller than that of Eq. 2.12 by the factor $1/N$.
Both of these published convergence results, as well as several others which are minor variations of them, are of the general form $K/\log(k)$. This behavior is the key limitation of the algorithm class, and is believed to be a fundamental limitation imposed by the neighborhood system inherent in the conventional simulated annealing state generation mechanism [GeGe84] (i.e. the fact that at low control parameter values, the likelihood of making the large state transition necessary to escape a local extremum is radically diminished). The simulated annealing literature includes some amount of speculation concerning state generation mechanisms which permit occasional large transitions even at low control parameter values.
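To give a feel for how restrictive a schedule of the form $K/\log(k)$ is, the following minimal sketch evaluates the two lower bounds of Eq. 2.12 and Eq. 2.13 and inverts the general form to estimate how many transitions elapse before the bound permits a given control parameter value. The function names and the numerical values are illustrative assumptions, not taken from the cited papers.

```python
import math


def geman_geman_bound(k, n_states, c_max, c_min):
    """Eq. 2.12: lower bound on T_k, stated for k >= 2."""
    if k < 2:
        raise ValueError("the bounds are stated for k >= 2")
    return n_states * (c_max - c_min) / math.log(k)


def mitra_bound(k, r, slope_bound):
    """Eq. 2.13: lower bound on T_k with graph radius r and slope bound L."""
    if k < 2:
        raise ValueError("the bounds are stated for k >= 2")
    return r * slope_bound / math.log(k)


def transitions_to_reach(target_T, K):
    """Value of k at which a schedule K / log(k) has fallen to target_T."""
    return math.exp(K / target_T)


# Hypothetical problem: 4 states, costs ranging from 0 to 3, S_i = S (r = 1).
n_states, c_max, c_min = 4, 3.0, 0.0
for k in (2, 10, 1000, 10**6):
    print(k,
          geman_geman_bound(k, n_states, c_max, c_min),
          mitra_bound(k, 1, c_max - c_min))

# Transitions needed before the Eq. 2.16 bound allows T = 0.1.
print(transitions_to_reach(0.1, K=c_max - c_min))
```

Because the required number of transitions grows exponentially in $K/T$, even this four-state toy case needs an astronomically long chain before the bound admits a small control parameter value.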
SECTION 3 THE GENETIC ALGORITHM

3.1 Overview

The genetic algorithm is an iterative improvement stochastic search method appropriate for application to combinatorial optimization problems and based on the evolution of biological systems. It implements the fundamental idea of survival fitness on a population of string structures which are coded representations of solution candidates selected from the solution space of the optimization problem. The population of candidate solutions (which collectively represent the current estimate of the optimum solution) is subjected to a set of stochastic genetic operators which transform a current population into a new (descendent) population.

A variety of distinct genetic operators (based on biological analogs) are available and are reported in the literature [Davi87, Gold89a, Gref85, Gref87]. The most important of them are (1) proportional reproduction, (2) crossover and (3) mutation. A one, two or three operator genetic algorithm employing combinations of these operators with fixed population size is referred to herein as a simple genetic algorithm.

The genetic operators are all implemented stochastically, but they do not result in a simple random walk through the search space. They represent a highly structured search which exploits the historical record of performance reflected at each stage of the search by the current population. It is the novel use of this historical record which is central to the appeal of the genetic algorithm.

Genetic algorithms usually operate on populations of bitstrings (i.e. the optimization problem is usually coded such that its search space is defined over a binary string alphabet), and they always attempt to maximize some strictly nonnegative objective
function. The evolution of the fixed size population of candidate solutions toward domination by optimal solutions is the algorithm goal.

The three genetic operators of a simple genetic algorithm are discussed in the next subsection. An analysis of their behavior requires introduction to the concept of schemata, or similarity templates, and that task is undertaken in a subsequent subsection. This section concludes with an assessment of the theoretical foundation available for the analysis of genetic algorithms.

3.2 The Simple Genetic Algorithm Operators

As noted above, the simple genetic algorithm employs three biologically inspired operators to transform each population of candidate solutions into a new (descendent) population. The following subsections examine each of these operators and how they influence the search evolution.

3.2.1 Reproduction

The genetic algorithm reproduction operator is the algorithmic analog of asexual reproduction. It is the means by which the objective function influences the evolution of the genetic algorithm search. It is implemented by evaluating each member of the current generation against the objective function and using the results to measure relative reproductive fitness (i.e. to provide a selection probability measure). Then, members of the current population are selected in accordance with this fitness measure to be members of the succeeding generation. This process is repeated (with statistically independent selection trials) until the entire new generation is populated.

In the absence of the other genetic operators, the reproduction operator tends to force the population to converge to the higher performing members of the current population. It eventually produces a uniform population. At any stage of the search (generation), only solutions which are represented by members of the current population can appear in any succeeding generation. In particular, no solution absent from the initial population is ever attainable.
The reproduction operator exerts a strictly converging influence on the
search evolution. The other operators of the simple genetic algorithm circumvent this limitation in a controlled manner.

3.2.2 Crossover

The crossover operator in a genetic algorithm is the algorithmic analog of sexual reproduction. It produces the succeeding generation not by simply replicating the fittest members of the current generation but by mating the fittest members of the current generation to produce progeny with some of the "genetic" character of each parent. It is implemented by randomly exchanging parts of the strings representing the parents to produce descendent strings.

The crossover operator is implemented (with some given probability, $p_c$) after the reproduction operator has been invoked to select two reproducing parents. A string location is randomly selected (usually with uniform selection probability) and the parent bitstring segments on each side of the randomly selected location are exchanged to produce two progeny, which are then inserted into the succeeding population. This operation is repeated until the new generation is completely populated.

The crossover operator permits strings not represented in the current population to be generated in the succeeding population. That is, certain points in the solution space which are not represented in the current generation can be present in the successor generation. But the crossover operator is applied preferentially to high performance members of the current population, so it constitutes a judicious, informed tendency toward population divergence. This is the novel feature contributed by the crossover operator.

Even with the addition of crossover, the genetic algorithm search will eventually converge to a uniform population. In general the crossover operator causes a greater portion of the search space to be explored prior to convergence to uniformity, but for a given initial population, there are still unreachable points in the solution space.
Further, even if a high performance solution is accessible from the initial population, some portion of the "gene pool" necessary to reach it can be irrevocably lost during the search evolution.
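The reproduction and crossover operators described above can be sketched as follows. This is a minimal illustration under stated assumptions: bitstrings are represented as Python strings of `0` and `1`, the reward function is a made-up placeholder, and the helper names (`select_parent`, `one_point_crossover`, `next_generation`) are hypothetical rather than drawn from the genetic algorithm literature cited here.

```python
import random


def select_parent(population, reward, rng):
    """Proportional reproduction: pick one member with probability
    proportional to its (strictly positive) reward."""
    weights = [reward(s) for s in population]
    return rng.choices(population, weights=weights, k=1)[0]


def one_point_crossover(a, b, rng):
    """Exchange the segments of two equal-length strings that lie on
    either side of a uniformly selected interior location."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def next_generation(population, reward, p_c, rng):
    """Two-operator step (reproduction and crossover, no mutation)."""
    new_population = []
    while len(new_population) < len(population):
        a = select_parent(population, reward, rng)
        b = select_parent(population, reward, rng)
        if rng.random() < p_c:  # crossover is applied with probability p_c
            a, b = one_point_crossover(a, b, rng)
        new_population.extend([a, b])
    return new_population[:len(population)]  # keep the population size fixed


# Hypothetical example: reward is one plus the number of 1 bits.
rng = random.Random(0)
reward = lambda s: 1 + s.count("1")
population = ["10100", "11000", "00011", "01111"]
print(next_generation(population, reward, p_c=0.6, rng=rng))
```

Note that crossover only rearranges bits already present at each string position, so a bit value missing from every member at some position can never be recovered; this is the sense in which part of the "gene pool" can be irrevocably lost without mutation.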
3.2.3 Mutation

The mutation operator is applied to each member of the successor generation created by the reproduction and crossover operators. It simply consists of randomly perturbing each descendent string with some (usually very small) perturbation probability, $p_m$. The operator exerts a diverging influence on the search algorithm, and it provides a means by which the search can, with some nonzero probability, always arrive at any point in the solution space. That is, no part of the "gene pool" is ever permanently extinguished if the mutation operator is implemented. Clearly, it is analogous to mutation in biological reproduction. Note also that if $p_m > 0$, the mutation operator precludes the algorithm from ever producing a permanently uniform population (i.e. it precludes algorithm convergence).

3.3 Building Blocks, Schemata and the Fundamental Theorem

The underlying premise of the genetic algorithm operators is that good solutions to an optimization problem over a bitstring solution space are composed of locally good substrings, and that assembling combinations of such locally good substrings is an effective way to search the space for globally good solutions. In the genetic algorithm literature, this is referred to as the building block hypothesis. For a problem to be amenable to genetic algorithm solution, this hypothesis should apply. In the genetics parlance, this hypothesis is stated as a requirement that the problem exhibit "...some but not too much epistasis" [Davi87]. The next subsection introduces an idea which helps to place this hypothesis on a more analytical basis, but the results are still incomplete.

3.3.1 Schema Defined

Let the solution space under consideration be the set of binary strings of length $L$ (i.e. $S = \{0, 1\}^L$). Then, a schema (plural schemata), designated $H$, is a subset of $S$ having the property that every member of $H$ matches at some specified set of defining bit locations.
Thus, if $L = 5$, then the schema $H$ might be the set of length 5 bitstrings which match the string $(1,0,1,0,0)$ at the bit locations indicated by $H = \{s : s = (1, *, *, 0, *)\}$, in
which the asterisks indicate "don't care" bits. The bit locations at which the schema is specified are the defining locations of the schema. The order of the schema, designated by $o(H)$, is the number of its defining locations and can range from 0 to $L$. In this example, $o(H) = 2$. The defining length of the schema, designated $\delta(H)$, is the number of bit positions subtended by its outermost defining bit locations minus 1. In this example, the outermost defining locations (the first and fourth) subtend 4 positions, so $\delta(H) = 4 - 1 = 3$.

For a bitstring space of length $L$, there are exactly $3^L$ distinct schemata. This can be readily determined by noting that the distinct schemata are selected from $\{0, 1, *\}^L$. A given string selected from the space represents exactly $2^L$ distinct schemata. This results from the fact that the string is defined at all $L$ bit positions, and hence is selected from $\{0, 1\}^L$. The schemata of an optimization problem's search space are the building blocks from which good solutions are to be constructed.

3.3.2 Schema Processing and the Fundamental Theorem

Let the constant population size of a simple genetic algorithm be designated $M$. Then, each generation produced by the algorithm represents some number, $N$, of distinct schemata that is bounded as follows $2^L$
disrupting high order high performance schemata. The extent of population divergence introduced by the crossover operator is determined in part by the degree of schema diversity present in the current population. In particular, when the population becomes uniform, the crossover operator is nullified, because assembling substrings extracted from identical strings produces identical progeny.

The mutation operator also provides a disruptive mechanism which resists the converging influence of the reproduction operator. Since any schema can be produced by mutation with nonzero probability, the permanent extinction of any of the $3^L$ possible distinct schemata is precluded.

These ideas are captured in the following inequality, which is referred to in the literature as the Fundamental Theorem of Genetic Algorithms. It relates the number of copies of a particular schema in the current generation to the expected number of copies of the same schema in the succeeding generation. This inequality is derived in [Gold89a] from relatively simple probability notions. The development is not repeated here.

$$E\{m(H, k+1)\} \ge m(H, k) \times \frac{R(H)}{\bar{R}} \times \left[ 1 - p_c \times \frac{\delta(H)}{L - 1} - p_m \times o(H) \right] \tag{3.2}$$

where
$m(H, k)$ = number of occurrences of schema $H$ in the population at generation $k$,
$E\{\cdot\}$ = expected value operator,
$R(H)$ = average objective function value ($> 0$) of all strings in the current population which are realizations of $H$,
$\bar{R}$ = the average objective function value of the current population.
Equation 3.2 is an inequality because it does not consider the accretion of the schema $H$ contributed by crossover and mutation. It only accounts for the disruptive effects of these operators. A more thorough treatment can be found on pp. 9-13 of [Gref87], but the result is too cumbersome to be of much analytical value.

Qualitatively, Eq. 3.2 suggests that low order schemata occurring in the current population contribute to succeeding generations in direct proportion to the product of their number in the current generation and their average performance relative to the other schemata competing for dominance of the same set of defining locations. Crossover and mutation tend to disrupt this converging influence, and the disruptive effect of crossover is directly proportional to the defining length of the schema in question.

In view of Eq. 3.2, the building block hypothesis might be restated as a characteristic of genetic algorithm amenable optimization problems. A GA amenable problem is one for which a near optimum solution can be achieved, with a relatively small expenditure of search effort, by assembling high performance, low order schemata into novel combinations. If the objective function is such that (nonlinear) contributions from combinations of bits spanning widely separate bit locations are appreciable (i.e. if the objective function depends heavily on large defining length schemata), then the problem is not likely to be suitable for solution by genetic algorithm. On the other hand, if the objective function depends predominantly on short defining length schemata, then sorting through promising combinations of realizations of those schemata is likely to isolate good (though not necessarily optimal) solutions. Accomplishing the required sorting efficiently is the task for which genetic algorithms are well suited.
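The schema quantities of Section 3.3.1 and the right-hand side of Eq. 3.2 can be computed directly for a small population. The sketch below assumes schemata are written as strings over `0`, `1` and `*`, and uses a made-up population and reward function; the helper names are hypothetical.

```python
def order(schema):
    """o(H): number of defining (non-asterisk) locations."""
    return sum(ch != "*" for ch in schema)


def defining_length(schema):
    """delta(H): positions subtended by the outermost defining
    locations, minus 1."""
    fixed = [i for i, ch in enumerate(schema) if ch != "*"]
    return fixed[-1] - fixed[0] if fixed else 0


def matches(schema, string):
    """True if the string is a realization of the schema."""
    return all(h in ("*", b) for h, b in zip(schema, string))


def schema_bound(schema, population, reward, p_c, p_m):
    """Right-hand side of Eq. 3.2 for the given population."""
    members = [s for s in population if matches(schema, s)]
    if not members:
        return 0.0  # m(H, k) = 0, so the bound is trivially zero
    r_h = sum(reward(s) for s in members) / len(members)
    r_bar = sum(reward(s) for s in population) / len(population)
    length = len(schema)
    survival = (1.0 - p_c * defining_length(schema) / (length - 1)
                - p_m * order(schema))
    return len(members) * (r_h / r_bar) * survival


# The schema used as the example in Section 3.3.1.
H = "1**0*"
print(order(H), defining_length(H))  # 2 3

# Hypothetical population and reward (one plus the number of 1 bits).
population = ["10100", "11000", "00011", "01111"]
reward = lambda s: 1 + s.count("1")
print(schema_bound(H, population, reward, p_c=0.6, p_m=0.01))
```

Increasing the defining length of the schema lowers the survival factor in direct proportion, which matches the qualitative reading of Eq. 3.2 given above.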
3.4 An Assessment of the Genetic Algorithm Theoretical Foundation

The existing theoretical foundation for analysis of genetic algorithms includes the fundamental theorem of genetic algorithms (Eq. 3.2) originally enunciated by Holland [Holl75] and extended by Bridges and Goldberg [BrGo87], the Walsh function approach to computing schema fitness averages contributed by Bethke [Beth80] and
generalizations of it [Gold88, Gold89b, Br
SECTION 4 A MARKOV CHAIN MODEL OF THE SIMPLE GENETIC ALGORITHM

4.1 Overview

From the discussion of the simple genetic algorithm operators in Section 3.2, it is clear that the sequence of populations generated by the algorithm when executing on a specified combinatorial optimization problem is a stochastic process (with finite state space), and further that the conditional dependence of each population in the sequence on its predecessors is completely described by its dependence upon the immediate predecessor population. Thus, the sequence is a Markov chain (Definition A1).

In this section, a nonstationary Markov chain model of the simple genetic algorithm is developed for one-, two- and three-operator variants of the algorithm. The model is tailored to resemble that offered in Section 2.4.1 for simulated annealing. The one-operator genetic algorithm model implements proportional reproduction only, while the two-operator variant employs reproduction in combination with mutation. The three-operator algorithm implements reproduction, mutation and crossover. This model hierarchy is employed because it provides some degree of insight into the effect that each operator has on the nature of the state space of the resulting Markov chain.

Describing and analyzing the operation of the simple genetic algorithm is facilitated by assuming that the underlying optimization problem is defined over a bitstring solution space. This assumption is not essential and sacrifices very little generality. It is implemented throughout the following sections.

4.2 The Markov Chain Model

Let a combinatorial optimization problem be characterized by the pair $(S, R)$ where $S = \{0, 1\}^L$ and $R$ is a strictly positive real valued reward function, and assume, with no
loss of generality, that the problem requires maximization of $R$. Also, let a simple genetic algorithm designed to execute on this problem have fixed population size $M$, let $i \in S$ be interpreted as an unsigned integer ($0 \le i \le 2^L - 1$), and let a generation be represented by $m = (m(0), m(1), \ldots, m(2^L - 1))$ where $m(i)$ = the number of occurrences of solution $i \in S$ in the population. Thus, in the parlance of combinatorial mathematics, $m$ is a distribution of $M$ nondistinct objects over $N = \mathrm{card}(S) = 2^L$ bins [Hall67, Rior58], and the set of all such distributions, $S' = \{m\}$, is a suitable representation of the simple genetic algorithm search space. The cardinality of $S'$ is given by

$$N' = \mathrm{card}(S') = \binom{M + 2^L - 1}{M} = \binom{M + N - 1}{M}. \tag{4.1}$$

Since both $N$ and $M$ are finite, so is $N'$.

Then, if $m_0 \in S'$ is selected as an initial population, the simple genetic algorithm can be represented by the quadruple $(S', m_0, P_Q, \Gamma)$ where $P_Q$ is a state transition matrix (analogous to $P_T$ of the simulated annealing model) and $\Gamma = \{Q_k\}$ is a finite length sequence of parameter vectors $Q_k = (p_m(k), p_c(k))$. The algorithm parameters $p_m(k)$ and $p_c(k)$ are respectively the mutation and crossover probabilities. In the following sections, the mutation probability sequence is employed in a role analogous to absolute temperature in simulated annealing, and consideration is limited hereafter to monotone nonincreasing sequences. In general, the only limitation on the crossover probability sequence is that its values are probabilities. However, in all of the following, consideration is limited to constant crossover probability sequences. The first parameter vector in $\Gamma$ is $Q_0$ and the final parameter vector is $Q_f$.

The solution evolves as a sequence $\{m_k\}$ of states $m_k \in S'$ in which the conditional dependence of $m_{k+1}$ on the sequence history is equivalent to its conditional dependence on $m_k$, and thus the solution sequence is a Markov chain. In general, the chain is inhomogeneous (Definition A5).
In Section 4.3 it is shown to be time-homogeneous if the parameter vectors are constant. As with the simulated annealing algorithm model, exhausting the sequence of
control parameter values, $\Gamma$, signals algorithm termination, and $Q_f$ can be allowed to depend on $\{m_k\}$ provided the algorithm termination requirement is satisfied.

4.3 State Behavior of the Simple Genetic Algorithm

Each of the next three subsections examines the state transition mechanism (and its effect on the nature of the solution sequence) which results from employing a specified combination of the genetic algorithm operators in the Markov chain model. The first case consists of a one-operator algorithm which employs only the reproduction operator. The second is a two-operator algorithm which employs reproduction and mutation. Finally a three-operator algorithm which includes crossover with reproduction and mutation is examined.

Although it is most natural to describe the genetic operators in the order reproduction/crossover/mutation, the course adopted in Section 3.2, the following development proceeds most instructively if mutation is included with reproduction in the two-operator algorithm and crossover is deferred to the three-operator case. This is due to the fact that the mutation operator provides the essential state space modification required to make the Markov chains of the time-homogeneous two- and three-operator algorithms irreducible (Definitions A7 and A8, Theorem A1), and consequently causes them to have unique stationary distributions (Theorem A3). The one-operator algorithm (proportional reproduction only) does not satisfy the irreducibility requirement for existence of a unique stationary distribution. (Neither does the algorithm variant which employs reproduction and crossover without mutation.)

A unique stationary distribution means that the asymptotic state occupancy probability of the time-homogeneous two- and three-operator algorithms is completely determined by the algorithm parameters and objective function. It is independent of the starting state (initial population).
Asymptotic independence of the starting state is a necessary (but not sufficient) condition on the zero mutation probability limit of the stationary distribution of the timehomogeneous algorithm for the inhomogeneous algorithm counterpart to avoid (asymptotically) local minima entrapment.
4.3.1 A One-Operator Algorithm (Reproduction)

In this subsection, the nature of the state transition matrix is examined for the case of no crossover or mutation (i.e. $Q_k = (0,0)$ for all $k$). In this case, the conditional probability of selecting a solution $i \in S$ from a population described by the state vector $n \in S'$ is (i.e. proportional reproduction)

$$\forall i \in S, \forall n \in S' : P_1(i \mid n) = \frac{n(i) \times R(i)}{\sum_{j \in S} n(j) \times R(j)} \qquad \text{Eq. 4.2}$$

where the subscript 1 indicates that the one-operator case is under consideration. Thus, the conditional probability of the successor generation described by $m$ given that the present generation is described by $n$ is a multinomial distribution, i.e.

$$\forall m, n \in S' : P_1(m \mid n) = \frac{M!}{\prod_{i \in S} m(i)!} \times \prod_{i \in S} P_1(i \mid n)^{m(i)} = \binom{M}{m} \times \prod_{i \in S} P_1(i \mid n)^{m(i)} = \binom{M}{m} \times \prod_{i \in S} \left[ \frac{n(i) \times R(i)}{\sum_{j \in S} n(j) \times R(j)} \right]^{m(i)} \qquad \text{Eq. 4.3}$$

where again the subscript 1 distinguishes the one-operator case, where the symbol

$$\binom{M}{m} = \frac{M!}{\prod_{i \in S} m(i)!} \qquad \text{Eq. 4.4}$$

designates the indicated multinomial coefficient, and where by definition

$$n(i) = 0 \Rightarrow \left[ \frac{n(i) \times R(i)}{\sum_{j \in S} n(j) \times R(j)} \right]^{m(i)} = \begin{cases} 1 & m(i) = 0 \\ 0 & m(i) > 0 \end{cases} \qquad \text{Eq. 4.5}$$

The transition probability matrix of the Markov chain representing the one-operator algorithm is composed of the array of conditional probabilities defined by Eq. 4.3, i.e.

$$P = [P_1(m \mid n)]. \qquad \text{Eq. 4.6}$$
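Eq. 4.2 through Eq. 4.5 can be sketched in a few lines of exact arithmetic. The following is an illustrative sketch only, not the author's program: the function names are hypothetical, populations are represented as tuples of occupancy counts indexed by the integer value of each bitstring, and the reward values are assumed to be positive integers so that `Fraction` can be used.

```python
from fractions import Fraction
from math import factorial

def p1_select(n, R):
    """Eq. 4.2: probability of selecting each solution i from population n."""
    total = sum(n_i * r_i for n_i, r_i in zip(n, R))
    return [Fraction(n_i * r_i, total) for n_i, r_i in zip(n, R)]

def multinomial_coeff(m):
    """Eq. 4.4: M! divided by the product of the m(i)!."""
    coeff = factorial(sum(m))
    for m_i in m:
        coeff //= factorial(m_i)
    return coeff

def p1_transition(m, n, R):
    """Eq. 4.3: probability that population n is followed by population m."""
    prob = Fraction(multinomial_coeff(m))
    for p_i, m_i in zip(p1_select(n, R), m):
        # Fraction(0) ** 0 evaluates to 1, which matches the convention of Eq. 4.5
        prob *= p_i ** m_i
    return prob

# Hypothetical two-bit example (N = 4 solutions, M = 3 population members)
R = [1, 2, 3, 4]
print(p1_transition((1, 0, 2, 0), (2, 0, 1, 0), R))
```

A uniform population such as `(3, 0, 0, 0)` maps to itself with probability 1 under this sketch, which is the absorbing behavior examined next.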
Since it is independent of the sequence index (i.e. the parameter vectors are constant), the one-operator Markov chain is time-homogeneous (Definition A5). The states which represent uniform populations (i.e. the states $n_A \in S_A' \subset S'$ in which one component is $M$ and all others are zero) are absorbing states of the Markov chain, because for any such state $P_1(n_A \mid n_A) = 1$ and Definition A6 applies. Since it follows from Eq. 4.2-4.3 that $\forall n \in S' - S_A' : P_1(n \mid n) < 1$, there are exactly $N = 2^L$ absorbing states. The corresponding rows of $P$ are given by

$$\forall n_A \in S_A' : P_1(m \mid n_A) = \begin{cases} 1 & m = n_A \\ 0 & m \in S' - \{n_A\} \end{cases} \qquad \text{Eq. 4.7}$$

Thus, for each state $n_A \in S_A'$, the associated row of the state transition matrix (Eq. 4.6) contains 1 in the principal diagonal location and 0 elsewhere. It follows that the $N' \times 1$ probability vector $q_{n_A}$ (Definition A2) whose $n_A \in S_A'$ component is 1 is a stationary distribution (Definition A10) of the one-operator Markov chain. It is not unique because any of the $N = 2^L$ such vectors satisfies the requirement, as does any vector of the form $q = \sum_{n_A \in S_A'} \xi_{n_A} q_{n_A}$ where $\xi_{n_A} \ge 0$ and $\sum_{n_A \in S_A'} \xi_{n_A} = 1$.

The absorbing states preclude irreducibility (Theorem A1), so the Markov chain does not satisfy the requirements of Theorem A3. The chain is aperiodic (Definition A9), however, because $\forall m \in S' : P_1(m \mid m) > 0$, so the period of all states is 1. Thus, all of the conditions of Theorem A3 except irreducibility are met by the one-operator Markov chain.

The expected number of transitions required to arrive in an absorbing state, $E\{k_A\}$, is finite. An upper bound on $E\{k_A\}$ is given by

$$E\{k_A\} \le [\ldots] < \infty \qquad \text{Eq. 4.8}$$
where $R_{\max}$ and $R_{\min}$ are the extreme values of $R$. (Recall that $R$ is assumed strictly positive, so $R_{\max} \ge R_{\min} > 0$.) Eq. 4.8 can be derived by defining $p_A(k)$ as the conditional probability of arriving in the set of absorbing states, $S_A'$, on the $k$th transition given that the state occupied before that transition is not absorbing, letting $p_{\min}$ be a lower bound on $p_A(k)$, and bounding the series for $E\{k_A\}$ as follows

$$E\{k_A\} = \sum_{k} k \times p_A(k) \prod_{l=1}^{k-1} \left( 1 - p_A(l) \right)$$

[...]
accomplished by expressing $P_2(i \mid n)$ as a sum over all $j$ of the corresponding $P_1(j \mid n)$ times a factor which accounts for the probability of the collection of mutation events required to transform $j$ into $i$. This probability can be expressed as $p_m^{H(i,j)} (1 - p_m)^{L - H(i,j)}$ where $H(i,j) = H(j,i)$ is the Hamming distance of the pair $i, j$. That is, $H$ is a function defined on $S \times S$ with values in $\{0, 1, 2, \ldots, L\}$. $H(i,j)$ is the number of bits which must be altered by mutation to transform $i$ into $j$, and $L - H(i,j)$ is the number of bits which must remain unaltered. Thus, $P_2(i \mid n)$ can be written as

$$\forall i \in S, \forall n \in S' : P_2(i \mid n) = \sum_{j \in S} p_m^{H(i,j)} (1 - p_m)^{L - H(i,j)} \times P_1(j \mid n) = (1 - p_m)^L \times \sum_{j \in S} \left( \frac{p_m}{1 - p_m} \right)^{H(i,j)} \times P_1(j \mid n) = \frac{1}{(1 + \alpha)^L} \sum_{j \in S} \alpha^{H(i,j)} \times P_1(j \mid n) \qquad \text{Eq. 4.11}$$

where

$$\alpha = \frac{p_m}{1 - p_m} \qquad \text{Eq. 4.12}$$

and

$$p_m = \frac{\alpha}{1 + \alpha}. \qquad \text{Eq. 4.13}$$

For $p_m = 0$ or $p_m = 1$, Eq. 4.11 includes the indeterminate form $0^0$ in some terms. Thus, the admissible range of $p_m$ is restricted to $0 < p_m < 1$, and consequently that of $\alpha$ is $0 < \alpha < \infty$. However, cases corresponding to $p_m > 1/2 \Leftrightarrow \alpha > 1$ are of no practical interest (they are less random than the case $p_m = 1/2 \Leftrightarrow \alpha = 1$), and some of the following developments restrict consideration to the range $0 < \alpha \le 1$. [...]

$$P_2(i \mid n) = \frac{1}{(1 + \alpha)^L} \times \frac{\sum_{j \in S} n(j) \times R(j) \times \alpha^{H(i,j)}}{\sum_{k \in S} n(k) \times R(k)}$$

It is also straightforward to show that
$$\sum_{k \in S} n(k) \times R(k) = \frac{1}{(1 + \alpha)^L} \sum_{j \in S} \sum_{k \in S} n(k) \times R(k) \times \alpha^{H(j,k)}$$

Thus, $P_2(i \mid n)$ can be expressed as

$$P_2(i \mid n) = \frac{\sum_{j \in S} n(j) \times R(j) \times \alpha^{H(i,j)}}{\sum_{j \in S} \sum_{k \in S} n(k) \times R(k) \times \alpha^{H(j,k)}} = \frac{\sum_{j \in S} n(j) \times R(j) \times \alpha^{H(i,j)}}{(1 + \alpha)^L \sum_{k \in S} n(k) \times R(k)} \qquad \text{Eq. 4.14}$$

and $P_2(m \mid n)$ is multinomially distributed as follows

$$\forall m, n \in S' : P_2(m \mid n) = \frac{M!}{\prod_{i \in S} m(i)!} \times \prod_{i \in S} P_2(i \mid n)^{m(i)} = \binom{M}{m} \times \prod_{i \in S} P_2(i \mid n)^{m(i)} = \binom{M}{m} \frac{1}{(1 + \alpha)^{ML}} \prod_{i \in S} \left[ \frac{\sum_{j \in S} n(j) \times R(j) \times \alpha^{H(i,j)}}{\sum_{k \in S} n(k) \times R(k)} \right]^{m(i)} \qquad \text{Eq. 4.15}$$

The transition probability matrix of the Markov chain representing the two-operator algorithm is composed of the array of conditional probabilities defined by Eq. 4.15, i.e.

$$P = [P_2(m \mid n)]. \qquad \text{Eq. 4.16}$$

Since the elements of $P$ depend on $\alpha$ (and hence by Eq. 4.12 on $p_m(k)$), the two-operator Markov chain is generally not time-homogeneous. It is time-homogeneous if the mutation probability is fixed. Eq. 4.14-4.16 for the two-operator simple genetic algorithm are analogous to Eq. 4.2-4.6 for the one-operator variant except that $P_2(i \mid n)$ is strictly greater than zero for all $n \in S'$. Thus, the two-operator analog of Eq. 4.5 is not required. Also
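$$\lim_{\alpha \to 0^+} P_2(i \mid n) = P_1(i \mid n)$$

and

$$\lim_{\alpha \to 0^+} P_2(m \mid n) = P_1(m \mid n). \qquad \text{Eq. 4.17}$$

The rows of the state transition matrix corresponding to the one-operator absorbing states have an especially simple form. Let $i_A \in S$ be the solution represented in the absorbing state $n_A \in S_A'$. Then, from Eq. 4.14,

$$P_2(i \mid n_A) = \frac{M \times R(i_A) \times \alpha^{H(i, i_A)}}{(1 + \alpha)^L \times M \times R(i_A)} = \frac{\alpha^{H(i, i_A)}}{(1 + \alpha)^L} \qquad \text{Eq. 4.18}$$

Thus, from Eq. 4.15,

$$P_2(m \mid n_A) = \binom{M}{m} \frac{\alpha^{\sum_{i \in S} m(i) \times H(i, i_A)}}{(1 + \alpha)^{ML}} \qquad \text{Eq. 4.19}$$

Since the reward function, $R$, is strictly positive by hypothesis, and since $\forall i, j \in S : 0 \le H(i,j) \le L$, it follows that for $\alpha$ in the range $0 < \alpha \le 1$,

$$\alpha^L \sum_{j \in S} n(j) \times R(j) \le \sum_{j \in S} n(j) \times R(j) \times \alpha^{H(i,j)} \le \sum_{j \in S} n(j) \times R(j),$$

and consequently from Eq. 4.14 that

$$\forall i \in S, \forall n \in S' : \left( \frac{\alpha}{1 + \alpha} \right)^L \le P_2(i \mid n) \le \left( \frac{1}{1 + \alpha} \right)^L. \qquad \text{Eq. 4.20}$$

Using Eq. 4.20 in Eq. 4.15 yields

$$\forall m, n \in S' : \binom{M}{m} \left( \frac{\alpha}{1 + \alpha} \right)^{ML} \le P_2(m \mid n) \le \binom{M}{m} \left( \frac{1}{1 + \alpha} \right)^{ML}. \qquad \text{Eq. 4.21}$$

[...]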
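The two-operator selection probability of Eq. 4.14 and several of the properties stated above can be checked numerically. The sketch below is illustrative only; the function names and the small two-bit example are assumptions, solutions are indexed by the integer value of their bitstrings, and exact fractions are used so that the checks hold without rounding.

```python
from fractions import Fraction

def hamming(i, j):
    """Number of bit positions in which the integers i and j differ."""
    return bin(i ^ j).count("1")

def p2_select(n, R, alpha, L):
    """Eq. 4.14: P2(i | n) for every i in S = {0, ..., 2**L - 1}."""
    N = 2 ** L
    denom = (1 + alpha) ** L * sum(n[k] * R[k] for k in range(N))
    return [
        sum(n[j] * R[j] * alpha ** hamming(i, j) for j in range(N)) / denom
        for i in range(N)
    ]

# Hypothetical example: L = 2, M = 3, mixed population of strings 00 and 10
R = [1, 2, 3, 4]
n = (2, 0, 1, 0)
alpha = Fraction(1, 10)
p = p2_select(n, R, alpha, L=2)

# Every solution has nonzero probability, and the distribution sums to 1
print(all(x > 0 for x in p), sum(p) == 1)
# The bounds of Eq. 4.20 hold for 0 < alpha <= 1
print(all((alpha / (1 + alpha)) ** 2 <= x <= (1 / (1 + alpha)) ** 2 for x in p))
```

Setting `alpha` to `Fraction(1)` in this sketch yields the uniform distribution over S, and a uniform population reproduces the closed form of Eq. 4.18.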
$\forall m \in S'$: (1) $q_\alpha(m) > 0$, (2) $q_\alpha^T \mathbf{1} = 1$, (3) $q_\alpha^T = q_\alpha^T P$.

Since the stationary distribution is by definition a left eigenvector of the state transition matrix (Definition A10), it follows from Eq. 4.15 and 4.16 that the asymptotic state probability distribution of the time-homogeneous two-operator algorithm is completely determined by the objective function and the algorithm parameters. It is independent of the starting state, $m_0$.

4.3.3 A Three-Operator Algorithm (Reproduction, Mutation and Crossover)

The three-operator simple genetic algorithm corresponds to the case $\forall k : Q_k = (p_m(k), p_c(k))$ with both $p_m(k)$ and $p_c(k)$ nonzero. Results analogous to Eq. 4.14-4.21 for the two-operator case are obtainable by defining a new function which is similar in character to the Hamming distance function employed in Section 4.3.2 for the two-operator case. This subsection completes that generalization. The result reflects the crossover operation only implicitly; however, it permits some very significant conclusions concerning bounding values of the three-operator conditional probabilities.

The new function, $I(i,j,k,s)$, is defined over an ordered quadruple $(i,j,k,s)$ where $i,j,k \in S$ and where $s \in \{0, 1, \ldots, L\}$ is a bitstring location. The states $i, j \in S$ represent respectively the first and second parent strings selected at a particular crossover opportunity, and $k \in S$ represents a possible descendant string. The bitstring location $s$ is the location randomly selected by the crossover operator, and normally it is uniformly distributed over its range. Thus, $I$ is defined on $S \times S \times S \times \{0, 1, 2, \ldots, L\}$ and it takes on values selected from $\{0, 1\}$ depending upon whether or not the indicated crossover operation is consistent. That is, $I$ assumes the value one if the bitstring $k$ is produced by crossing the bitstrings $i$ and $j$ at the site $s$, and zero otherwise.
In terms of this crossover operator function, the conditional probability of producing, via reproduction and crossover, a solution $k \in S$ given a current population described by $n \in S'$ is

$$P_2'(k \mid n) = p_c \times \sum_{i \in S} \sum_{j \in S} P_1(i \mid n) \times P_1(j \mid n) \times \frac{1}{L} \sum_{s} I(i,j,k,s) + (1 - p_c) \times P_1(k \mid n)$$

$$= p_c \times \frac{1}{L} \times \sum_{i \in S} \sum_{j \in S} \sum_{s=1}^{L} P_1(i \mid n) \times P_1(j \mid n) \times I(i,j,k,s) + (1 - p_c) \times P_1(k \mid n) \qquad \text{Eq. 4.22}$$

where $P_1(i \mid n)$ is as defined in Eq. 4.2 and where $P_2'(\cdot \mid n)$ refers to the two-operator algorithm consisting of reproduction and crossover without mutation. This result assumes uniformly distributed crossover site selection.

The array of conditional probabilities $[P_2'(i \mid n)]$ plays a role in the three-operator simple genetic algorithm closely analogous to the role played by the array $[P_1(i \mid n)]$ in the two-operator variant. In fact, the $[P_2'(i \mid n)]$ array can be used as a counterpart of Eq. 4.2 to develop results exactly analogous to Eq. 4.3 and Eq. 4.6. Further, for $n_A \in S_A'$, Eq. 4.22 reduces to

$$P_2'(k \mid n_A) = P_1(k \mid n_A), \qquad \text{Eq. 4.23}$$

and consequently this (fictitious) two-operator algorithm (reproduction and crossover) demonstrates the same sort of absorbing state behavior as the one-operator algorithm.

From Eq. 4.22, the three-operator conditional probabilities and state transition matrix are expressible as

$$P_3(i \mid n) = \frac{1}{(1 + \alpha)^L} \times \sum_{j \in S} \alpha^{H(i,j)} \times P_2'(j \mid n), \qquad \text{Eq. 4.24}$$
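$$P_3(m \mid n) = \frac{M!}{\prod_{i \in S} m(i)!} \times \prod_{i \in S} P_3(i \mid n)^{m(i)} = \binom{M}{m} \times \prod_{i \in S} P_3(i \mid n)^{m(i)} \qquad \text{Eq. 4.25}$$

and

$$P = [P_3(m \mid n)]. \qquad \text{Eq. 4.26}$$

These results are developed in a fashion analogous to Eq. 4.14-4.16. From them, it follows that the three-operator Markov chain is time-homogeneous if both the mutation and crossover probabilities are fixed. In general it is not time-homogeneous. From Eq. 4.22, 4.24 and 4.25, it follows that

$$\lim_{\alpha \to 0^+} P_3(i \mid n) = P_2'(i \mid n) \quad \text{and} \quad \lim_{\alpha \to 0^+} P_3(m \mid n) = P_2'(m \mid n). \qquad \text{Eq. 4.27}$$

Also, from Eq. 4.23-4.25, the three-operator analogs of Eq. 4.18-4.19 apply:

$$P_3(i \mid n_A) = \frac{\alpha^{H(i, i_A)}}{(1 + \alpha)^L} \qquad \text{Eq. 4.28}$$

$$P_3(m \mid n_A) = \binom{M}{m} \frac{\alpha^{\sum_{i \in S} m(i) \times H(i, i_A)}}{(1 + \alpha)^{ML}} \qquad \text{Eq. 4.29}$$

Additionally, since for $0 < \alpha \le 1$

$$\alpha^L \sum_{j \in S} P_2'(j \mid n) \le \sum_{j \in S} P_2'(j \mid n) \times \alpha^{H(i,j)} \le \sum_{j \in S} P_2'(j \mid n),$$

the three-operator analogs of Eq. 4.20-4.21 follow from Eq. 4.24-4.25, i.e.

$$\forall i \in S, \forall n \in S' : \left( \frac{\alpha}{1 + \alpha} \right)^L \le P_3(i \mid n) \le \left( \frac{1}{1 + \alpha} \right)^L \qquad \text{Eq. 4.30}$$

$$\forall m, n \in S' : \binom{M}{m} \left( \frac{\alpha}{1 + \alpha} \right)^{ML} \le P_3(m \mid n) \le \binom{M}{m} \left( \frac{1}{1 + \alpha} \right)^{ML} \qquad \text{Eq. 4.31}$$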
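A small exact-arithmetic sketch may help make Eq. 4.22 and Eq. 4.24 concrete. It is not the author's implementation. In particular, the crossover convention is an assumption: the text states only that $I(i,j,k,s) = 1$ when $k$ is produced by crossing $i$ and $j$ at site $s$, so the sketch takes the descendant to consist of the $s$ leading bits of the first parent followed by the $L - s$ trailing bits of the second. All function names are hypothetical.

```python
from fractions import Fraction

def hamming(i, j):
    """Number of bit positions in which the integers i and j differ."""
    return bin(i ^ j).count("1")

def cross(i, j, s, L):
    """Assumed one-point crossover: s leading bits of i, then L - s trailing bits of j."""
    low_mask = (1 << (L - s)) - 1
    return (i & ~low_mask) | (j & low_mask)

def p1_select(n, R):
    """Eq. 4.2: proportional reproduction."""
    total = sum(a * b for a, b in zip(n, R))
    return [Fraction(a * b, total) for a, b in zip(n, R)]

def p2prime_select(n, R, p_c, L):
    """Eq. 4.22: reproduction followed by crossover, without mutation."""
    N = 2 ** L
    p1 = p1_select(n, R)
    out = [(1 - p_c) * p1[k] for k in range(N)]
    for i in range(N):
        for j in range(N):
            for s in range(1, L + 1):
                # I(i, j, k, s) = 1 exactly for k = cross(i, j, s, L)
                out[cross(i, j, s, L)] += p_c * p1[i] * p1[j] / L
    return out

def p3_select(n, R, p_c, alpha, L):
    """Eq. 4.24: mutation applied to the reproduction/crossover probabilities."""
    N = 2 ** L
    p2p = p2prime_select(n, R, p_c, L)
    scale = (1 + alpha) ** L
    return [sum(alpha ** hamming(i, j) * p2p[j] for j in range(N)) / scale
            for i in range(N)]

# Hypothetical example: strings 00 and 11 present, 01 and 10 absent
R = [1, 2, 3, 4]
p2p = p2prime_select((1, 0, 0, 2), R, Fraction(3, 5), L=2)
print(p2p[1] > 0, sum(p2p) == 1)
```

In this example crossover alone assigns nonzero probability to the string 01, which is absent from the population, whereas a uniform population such as `(3, 0, 0, 0)` is returned unchanged, in agreement with Eq. 4.23.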
All of the state space characteristics described in 4.3.2 for the two-operator algorithm follow. In particular, the Markov chain of the three-operator algorithm is irreducible. Thus, a unique stationary distribution exists for the time-homogeneous three-operator simple genetic algorithm, and as in the two-operator case it is completely determined by the objective function and the algorithm parameter values.

4.3.4 Summary

The asymptotic behavior of the one-operator simple genetic algorithm is dominated by the states which correspond to uniform populations, the one-operator absorbing states. The algorithm necessarily arrives at some member of the absorbing state set within a finite number of algorithm iterations (Eq. 4.8). The asymptotic probability distribution depends upon the algorithm initial population, $m_0$. This observation is equivalent to the fact, established in Section 4.3.1, that the stationary distribution of the one-operator algorithm is not unique.

A unique stationary distribution exists for the time-homogeneous two- and three-operator algorithm variants (with $\alpha > 0$), or equivalently, their asymptotic probability distributions are independent of $m_0$. However, in the $\alpha \to 0^+$ limit, both the two- and three-operator algorithms degenerate into the absorbing state behavior which typifies the one-operator case (Eq. 4.17 and Eq. 4.23, 4.27). A very important question is whether the unique stationary distributions of the two- and three-operator algorithms approach limits as $\alpha \to 0^+$. Section 7 answers that question affirmatively, and in Section 8, the lower bounds reflected in Eq. 4.21 and Eq. 4.31 are employed to arrive at a monotone decreasing sequence bound on $p_m(k)$ sufficient to guarantee that the limiting distribution is achieved (asymptotically) by the inhomogeneous two- and three-operator Markov chains.

The analogous conditional probability arrays $[P_1(i \mid n)]$ and $[P_2'(i \mid n)]$, whose elements are defined by Eq. 4.2 and Eq. 4.22 respectively, play an essential role in the following sections, especially in Section 9. Most of the results developed hereafter apply equally to the two- and three-operator algorithm variants by substituting from these conditional probability arrays appropriately. Thus, in much of the following, the notation modifiers are suppressed, so that the elements of either of these arrays are denoted by $P(i \mid n)$, with the specific array reference being determined by context.
SECTION 5
SOME EMPIRICAL RESULTS

5.1 Overview

This section reports the results of some computer simulations based upon the genetic algorithm Markov chain model developed in Section 4. Their purpose is to help fix some of the state space and asymptotic probability distribution ideas which are central features of this work.

The results reported here are separated into four subsections. Section 5.2 concerns enumeration of the state space, $S'$. Section 5.3 is devoted to generation of reward function data, which are subsequently used in the two remaining subsections. Section 5.4 illustrates the behavior of some selected conditional probabilities as a function of the algorithm control parameter, $\alpha$. The results of the primary simulation task are reported in Section 5.5. They concern computation of the three-operator stationary distribution at extremely low (approaching zero) values of the mutation probability control parameter.

One of the significant theoretical results developed in subsequent sections is suggested by the data presented in Section 5.5. It is that the zero mutation probability limiting stationary distribution provides nonzero probability for all states corresponding to uniform populations (i.e. one-operator absorbing states), including those which represent suboptimal solutions. This result poses a complication for the attempt to extrapolate the simulated annealing convergence theory onto the genetic algorithm, as discussed further in Section 5.5.

All simulation results included here were generated on the Cray Y-MP computer at the Eglin AFB, Fl. Computer Science Directorate. The data presented in Section 5.5 concerning the primary simulation task (the converged limiting stationary distribution results) include some CPU utilization statistics which reflect the approximately 180 hours of CPU time expended in generating that data. The source program listings for the programs employed in generating the results of this section are included in Appendix D.

5.2 State Space Enumeration

The results appearing in this section are of two primary types. The first is a table of computed state space cardinality values, $N'$, at a variety of combinations of bitstring length, $L$, and population size, $M$. These results are products of the program GET_NPS.F appearing in Appendix D. It implements Eq. 4.1. The results are collected in Table 5-1. In addition to the $N'$ column, Table 5-1 includes a similar column labeled $N''$. It denotes the cardinality of a space designated $S''$ which is related to $S'$ and whose significance is established in Section 9. Its cardinality is given by

$$N'' = [\ldots] \qquad \text{Eq. 5.1}$$

The data recorded in column $N''$ of Table 5-1 are computed from this equation by the program GET_NPS.F.

[Table 5-1: State Space Cardinality, listing $N'$ and $N''$ for combinations of $M$ and $L$]
[Table 5-4: $S'$ at $M = 2$, $L = 3$]
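The state space enumeration described in Section 5.2 can be illustrated as follows. This sketch is not a transcription of GET_NPS.F. It assumes that Eq. 4.1 (which is not reproduced in this section) is the standard count of multisets of size $M$ drawn from $N = 2^L$ solutions; that assumption reproduces the value $N' = 54264$ quoted in Section 5.4 for $M = 6$, $L = 4$. The function names are hypothetical.

```python
from math import comb

def state_space_size(M, L):
    """Number of distinct populations of M strings over 2**L solutions (assumed form of Eq. 4.1)."""
    N = 2 ** L
    return comb(N + M - 1, M)

def enumerate_states(M, N):
    """Yield every occupancy vector with N non-negative components summing to M."""
    if N == 1:
        yield (M,)
        return
    for first in range(M, -1, -1):
        for rest in enumerate_states(M - first, N - 1):
            yield (first,) + rest

# Explicit enumeration agrees with the closed-form count for M = 2, L = 3
states = list(enumerate_states(2, 2 ** 3))
print(len(states) == state_space_size(2, 3))
```

Explicit enumeration is practical only for very small $M$ and $L$, since the count grows combinatorially with both parameters.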
In both data sets, the solution state which maximizes the reward value is the $i \in S$ represented by the decimal integer value 12. That is, for the four-bit function, $i_{opt} = 1100$, and its five-bit counterpart is $i_{opt} = 01100$. The reward function value for an arbitrary $i \in S$ is then computed by assigning the value 1 for each length 0, 1 or 2 schema (Section 3.3.1) in agreement with the optimum bit pattern and summing the contributions. Thus, for example, for the four-bit reward function, the bitstring 0000 has function value 4, generated by summing the contributions from the single matching length 0 schema, the two matching length 1 schemata and the one matching length 2 schema. A strictly positive reward function is guaranteed since every string matches the single length 0 schema.

[Figure 5-1: Four-Bit Reward Function, R(i) versus i]
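The reward construction described above can be sketched as follows. The interpretation is an assumption consistent with the worked example for the bitstring 0000: a length 0 schema is taken to be the all-wildcard schema, a length 1 schema fixes a single bit, and a length 2 schema fixes two adjacent bits. The function name is hypothetical, and the values plotted in Figure 5-1 are not available here for comparison.

```python
def reward(i, i_opt, L):
    """1 for the all-wildcard schema, plus 1 per matching bit, plus 1 per matching adjacent bit pair."""
    agree = [((i >> b) & 1) == ((i_opt >> b) & 1) for b in range(L)]
    singles = sum(agree)
    pairs = sum(agree[b] and agree[b + 1] for b in range(L - 1))
    return 1 + singles + pairs

# Four-bit problem with optimum 1100 (decimal 12)
four_bit = [reward(i, 0b1100, 4) for i in range(16)]
print(four_bit[0b0000], max(range(16), key=four_bit.__getitem__))
```

Under this interpretation the function is strictly positive, and its unique maximizer is the string represented by the decimal integer 12.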
[Figure 5-2: Five-Bit Reward Function, R(i) versus i]

5.4 Conditional Probabilities Versus $\alpha$

The following four figures present plots of two- and three-operator conditional probabilities at two selected current states, $n$. These results are computed from Eq. 4.14 and Eq. 4.24. The plots are generated for the four-bit problem with reward function given in Figure 5-1 and with $M = 6$. From Table 5-1, the cardinality of $S'$ for these examples is $N' = 54264$. The conditional probabilities are provided at two selected $n$ vectors, one representing the uniform population $n = (6000000000000000)$ and one the mixed population state $n = (2000010001002000)$, and at three values of the mutation probability parameter. The two- and three-operator results are respectively products of the computer programs GET_P2INS and GET_P3INS provided in Appendix D.
The purpose of the tests from which these data are produced is verification of the computer algorithms (and the implementing subprograms) employed to generate the conditional probability calculations required by the primary simulation task reported in Section 5.5. Thus, for example, all conditional probability distributions are uniform at $\alpha = 1$, as is required by Eq. 4.14 and Eq. 4.24, and for $\alpha \to 0^+$ all conditional probability distributions approach the one-operator counterparts, as is required by Eq. 4.17 and Eq. 4.27. Also, the two- and three-operator conditional probabilities are identical for the uniform population case (Figures 5-3 and 5-5), as required by Eq. 4.18 and 4.28, and the three-operator mixed population state case allows generation of solutions not present in the current population even in the zero mutation probability limit.

[Figure 5-3: $P_2(i \mid n)$ at $n = (6000000000000000)$]
[Figure 5-4: $P_2(i \mid n)$ at $n = (2000010001002000)$]

[Figure 5-5: $P_3(i \mid n)$ at $n = (6000000000000000)$]
[Figure 5-6: $P_3(i \mid n)$ at $n = (2000010001002000)$]

5.5 Converged Limiting Stationary Distributions

The following data represent converged three-operator stationary distribution results for both four- and five-bit problems at a variety of population sizes. The results recorded in Figures 5-7 through 5-16 are products of the computer program GET_3STAT.F included in Appendix D. They are obtained by repeatedly multiplying a current state probability vector by the three-operator state transition matrix until a termination criterion representing approximate convergence is attained. The starting probability vector is the multinomial distribution corresponding to a uniformly distributed $P_3(i \mid n)$ array, and the termination criterion is that the sum of the probabilities for all nonuniform population states is less than 0.004.

All of the results reported here are for extremely small $\alpha$ (approaching zero) and thus, as predicted by the model, only the states corresponding to uniform populations (one-operator absorbing states) have nonzero probability. Consequently, only the final probabilities for the uniform population states are displayed in Figures 5-7 through 5-16, with each such state indexed by the decimal integer value corresponding to the solution represented.

Table 5-5 summarizes the Cray Y-MP computer resources expended in generating these data. Tabulated there are the number of vector multiplications (of dimension $N'$) required to attain the termination condition and the CPU time utilized. The CPU time is in seconds, rounded to the nearest integer. The tabulated data are collected from the log files generated in the computer runs which produced the stationary distribution data for Figures 5-7 through 5-16.

The limiting distribution entropy results in Figures 5-17 and 5-18 are computed from the converged stationary distributions. The results are recorded in bits and are plotted as a function of population size.

A very significant result suggested by the limiting stationary distribution data is that the $\alpha \to 0^+$ value of the stationary distribution is nonzero for all possible uniform states. This behavior, which is confirmed by theoretical results developed in Section 7, precludes extrapolation of the simulated annealing global optimality convergence result onto the genetic algorithm. However, as suggested by the data plotted in Figures 5-17 and 5-18, it may be possible to approach the desired limiting behavior as closely as required by adjusting the population size parameter. Those figures indicate that for sufficiently large values of the population size parameter, the limiting distribution is dominated by optimal solutions, and that the limiting distribution entropy decreases monotonically with increasing population size. Results developed in Section 9 reinforce this premise.
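The iteration and stopping rule described in this subsection can be sketched on a deliberately tiny problem. This is not GET_3STAT.F: to keep the example short it uses the two-operator chain for one-bit strings ($L = 1$), for which a population is described by the number of copies of the string 1 and Eq. 4.15 reduces to a binomial distribution. The function names, reward values and parameter values are all assumptions. The sketch reproduces only the stated stopping rule and makes no claim about how close the uniform-state probabilities are to their limits when it is met.

```python
from math import comb

def two_operator_matrix(R0, R1, alpha, M):
    """Two-operator transition matrix for L = 1; state index = number of copies of string 1."""
    P = []
    for ones in range(M + 1):
        zeros = M - ones
        # Eq. 4.14 with L = 1: probability that a selected, mutated string equals 1
        p_one = (ones * R1 + zeros * R0 * alpha) / ((1 + alpha) * (zeros * R0 + ones * R1))
        # With two solutions, the multinomial of Eq. 4.15 is a binomial
        P.append([comb(M, k) * p_one ** k * (1 - p_one) ** (M - k) for k in range(M + 1)])
    return P

def iterate_until_uniform(P, q0, nonuniform, tol=0.004, max_steps=100000):
    """Multiply the row vector q by P until the nonuniform-state mass drops below tol."""
    q = list(q0)
    size = len(P)
    for step in range(1, max_steps + 1):
        q = [sum(q[n] * P[n][m] for n in range(size)) for m in range(size)]
        if sum(q[m] for m in nonuniform) < tol:
            return q, step
    raise RuntimeError("termination criterion not reached")

M = 3
P = two_operator_matrix(R0=1.0, R1=2.0, alpha=1e-5, M=M)
# Starting vector: multinomial (here binomial) with uniform selection probabilities
q0 = [comb(M, k) / 2 ** M for k in range(M + 1)]
q, steps = iterate_until_uniform(P, q0, nonuniform=range(1, M))
print(steps, q[0], q[M])
```

For a mutation parameter that is not small relative to the tolerance, the nonuniform mass may never fall below 0.004, which is why the sketch guards the loop with `max_steps`.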
[Figure 5-7: Limiting Stationary Distribution at M=2, L=4; q(i) versus i]

[Figure 5-8: q(i) versus i]

[Figure 5-9: q(i) versus i]

[Figure 5-10: Limiting Stationary Distribution at M=5, L=4; q(i) versus i]

[Figure 5-11: q(i) versus i]

[Figure 5-12: Limiting Stationary Distribution at M=7, L=4; q(i) versus i]

[Figure 5-13: Limiting Stationary Distribution at M=2, L=5; q(i) versus i]

[Figure 5-14: Limiting Stationary Distribution at M=3, L=5; q(i) versus i]

[Figure 5-15: Limiting Stationary Distribution at M=4, L=5; q(i) versus i]

[Figure 5-16: Limiting Stationary Distribution at M=5, L=5; q(i) versus i]

[Table 5-5: CPU Utilization Statistics]

[Figure 5-17: Limiting Distribution Entropy vs Population Size (Four-Bit Problem); H versus M]

[Figure 5-18: Limiting Distribution Entropy vs Population Size (Five-Bit Problem); H versus M]
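The entropy values plotted in Figures 5-17 and 5-18 are stated to be in bits. A minimal sketch of such a computation follows, assuming the usual Shannon entropy $H = -\sum_i q(i) \log_2 q(i)$ with zero-probability states contributing nothing; the function name is hypothetical.

```python
from math import log2

def entropy_bits(q):
    """Shannon entropy, in bits, of a probability vector q."""
    return -sum(p * log2(p) for p in q if p > 0)

# A distribution spread evenly over the 16 uniform states of the four-bit problem
print(entropy_bits([1 / 16] * 16))
```

Under this definition the entropy of a distribution over the $2^L$ uniform population states lies between 0 (all probability on one solution) and $L$ bits (probability spread evenly), so a decreasing entropy indicates concentration on fewer solutions.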
SECTION 6
THE CRAMER'S RULE FORMULATION OF THE STATIONARY DISTRIBUTION

6.1 Overview

In Sections 4.3.2 and 4.3.3, the time-homogeneous two- and three-operator simple genetic algorithm Markov chains are shown to possess unique stationary distributions. Those conclusions are established by invoking Theorem A3, which asserts that in each case the stationary distribution is a left eigenvector of the state transition matrix and that the additional constraint that it be a probability vector (Definition A2) makes the solution unique.

In this section, the existence and uniqueness arguments are refined into a Cramer's rule formulation of the solution. This development concerns the time-homogeneous algorithms only, with $\alpha$ constrained to $\alpha > 0$, and it appeals heavily to the foundation provided in Appendix B. The product of this development is an expression for the components of the stationary distribution vector as rational functions generated from the characteristic polynomials of matrices derived from the state transition matrix. The derived matrices are generated by setting selected rows of $P$ to zero. The utility of the approach is that the form of $P$ suggests a mechanism for expressing the values of the characteristic polynomials. Some key intermediate parts of the required methodology are developed in Section 9, but the effort stops short of explicit solution. However, some very significant conclusions concerning the asymptotic behavior of the algorithm are obtainable (Sections 7 and 8) from the results developed here without explicitly solving the system.

6.2 The Stationary Distribution Description

As established in Section 4, implementation of the mutation operator with nonzero mutation probability (i.e. $\alpha > 0$) implies that for both the two- and three-operator algorithms, $\forall m, n \in S' : P(m \mid n) > 0$. Thus, by Definition B1, $P^k$ is primitive for any integer $k \ge 1$. Hence, from Section B.3, the stationary distribution of the two- and three-operator simple genetic algorithm exists, is unique and is a left eigenvector of the state transition matrix corresponding to eigenvalue 1, i.e.

$$q_\alpha^T = q_\alpha^T P$$

or equivalently

$$q_\alpha^T (P - I) = 0^T. \qquad \text{Eq. 6.1}$$

The following proposition establishes a significant fact concerning the rank of the matrix $(P - I)$ in Eq. 6.1.

Proposition 6.1: The rank of the matrix $(P - I)$ in Eq. 6.1 is exactly $N' - 1$, where $N' = \mathrm{card}(S')$ is the dimension of $P$.

This result follows from Theorem B4(f). Its significance is that exactly one column of the system of equations in Eq. 6.1 can be replaced without sacrificing any of the constraints which Eq. 6.1 imposes on $q_\alpha$. Proposition 6.2 below concerns such a modification of the system. The modification consists of replacing any column (e.g. the column indexed by $n \in S'$) of Eq. 6.1 by a column corresponding to the constraint

$$\sum_{m \in S'} q_\alpha(m) = q_\alpha^T \mathbf{1} = 1, \qquad \text{Eq. 6.2}$$

thus producing a system of the form

$$q_\alpha^T (P - I)_n = e_n^T \qquad \text{Eq. 6.3}$$

where $(P - I)_n$ is generated by replacing the column of $(P - I)$ indexed by $n \in S'$ with the vector $\mathbf{1}$ whose components all have the value 1, and where $e_n^T$ is the row vector containing 1 in column $n$ and 0's elsewhere.
Proposition 6.2: If the constraint described in Eq. 6.2 is used to replace any column (e.g. column $n$) of the system in Eq. 6.1, the resulting system (Eq. 6.3) is full rank, or equivalently, $|(P - I)_n| \ne 0$.

Since $P$ is a stochastic matrix (Definition A3), the system of equations in Eq. 6.1 can be transformed into an equivalent system in which the column indexed by the arbitrary column index $n \in S'$ is represented by the equation $q_\alpha^T \mathbf{0} = 0$. The required transformation is obtainable by replacing column $n$ by the sum of all columns $m \in S'$, and thus any $n \in S'$ is a candidate for replacement. Proposition 6.2 is then a restatement of Proposition B2 in terms of the determinant of the matrix of the modified system. It is the essential condition for justification of the following proposition.

Proposition 6.3: The components of the stationary distribution can be expressed in the form

$$q_\alpha(m) = \frac{|(P - I)_n^m|}{|(P - I)_n|}$$

where $(P - I)_n^m$ is derived from $(P - I)_n$ by replacing the row of $(P - I)_n$ indexed by $m \in S'$ with the row vector $e_n^T$.

This result is simply an application of Cramer's Rule to the solution of the system in Eq. 6.3. It applies because $|(P - I)_n| \ne 0$ is assured by Proposition 6.2. The equality defined in Proposition 6.3 can be evaluated without computing $|(P - I)_n|$ directly, as suggested by the following proposition.

Proposition 6.4: The denominator determinant in Proposition 6.3 can be written as

$$|(P - I)_n| = \sum_{m \in S'} |(P - I)_n^m|.$$
This result follows from application of elementary column operations on column $n$ of $|(P - I)_n|$ and employing the definition of $|(P - I)_n^m|$. The essential step is noting that the cofactor of each of the (unit) elements in column $n$ is equal to the corresponding $|(P - I)_n^m|$.

Since the numerator determinant defined in Proposition 6.3 is generated from $(P - I)_n$ by replacement of row $m$ by the row vector $e_n^T$, its value is the cofactor of the (unit) element in row $m$ and column $n$. As indicated in the following proposition, it is equal to the determinant which results from the corresponding row replacement in $(P - I)$.

Proposition 6.5: The numerator determinant defined in Proposition 6.3 can be written as

$$|(P - I)_n^m| = |(P - I)^{(m,n)}|$$

where $(P - I)^{(m,n)}$ is defined as the matrix which results from $(P - I)$ by replacing the row indexed by $m$ with the row vector $e_n^T$.

Next, note that if $m = n$, then $|(P - I)^{(n,n)}|$ can be written as [...]
where $P_m^{(0)}$ is defined as before and where $P_m^{(n)}$ is defined as the matrix which results by replacing row $m$ of $P$ by the row vector $e_n^T$. This result can be further reduced by noting that the row replacement by which $P_m^{(n)}$ is generated from $P$ preserves the row sum constraint (i.e. $P_m^{(n)}$ is a stochastic matrix). Thus, 1 is an eigenvalue of $P_m^{(n)}$ (Definition A3), from which it follows that $|P_m^{(n)} - I| = 0$. Consequently, the $m = n$ and $m \ne n$ cases can be assembled as indicated by the following proposition.

Proposition 6.6: The determinant $|(P - I)^{(m,n)}|$ defined in Proposition 6.5 can be written as

$$|(P - I)^{(m,n)}| = -|P_m^{(0)} - I|.$$

By collecting the results of Propositions 6.3-6.6 and noting that the superscript in $P_m^{(0)}$ is now superfluous, the components of the stationary distribution can be written as indicated in the following proposition.

Proposition 6.7: The components of the stationary distribution can be expressed in the form

$$q_\alpha(m) = \frac{|P_m - I|}{\sum_{n \in S'} |P_n - I|}$$

where $P_m$ and $P_n$ are derived from $P$ by replacing the rows indexed by $m$ and $n$ respectively with the row vector $0^T$.

Thus, computing the stationary distribution components reduces to evaluating the characteristic polynomials of the $P_n$'s at $\lambda = 1$ (i.e. $\phi_n(\lambda) = |P_n - \lambda I| \Rightarrow |P_n - I| = \phi_n(1)$). Also, since 1 is an eigenvalue of $P$ it follows that $\phi(1) = |P - I| = 0$, which suggests the following alternative to Proposition 6.7. Its usefulness is established in Sections 9.3-9.4.
Proposition 6.8: The components of the stationary distribution can be expressed in the alternative form

$$q_\alpha(m) = \frac{|P - I| - |P_m - I|}{\sum_{n \in S'} \left( |P - I| - |P_n - I| \right)}$$

where as before $P_m$ and $P_n$ are derived from $P$ by replacing the rows indexed by $m$ and $n$ respectively with the row vector $0^T$.

6.3 Positivity of the Stationary Distribution Components

Strict positivity of the stationary distribution components can be deduced from Theorem B4 and the form of $P_m$. Every element of $P_m$ in every row other than row $m$ is identical to the corresponding element of $P$, while those in row $m$ are zero. This is expressed in the nonnegative matrix notation of Appendix B as $0 \le P_m \le P$. Further, since $P_m$ and $P$ differ in row $m$, $P_m \ne P$, and consequently by Theorem B4(e), every eigenvalue of $P_m$ satisfies $|\lambda_i| < 1$. It follows that for $\lambda \ge 1$, (1) $|P_m - \lambda I| = \prod_i (\lambda_i - \lambda) \ne 0$ and (2) the algebraic sign of $|P_m - \lambda I|$ is $(-1)^{N'}$ for all $m \in S'$. Specializing these arguments to the case $\lambda = 1$ yields the following proposition.

Proposition 6.9: For all $\alpha > 0$, the value of the determinant $|P_m - I|$ satisfies $\forall m \in S'$: (1) $|P_m - I| \ne 0$ and (2) the algebraic sign of $|P_m - I|$ is $(-1)^{N'}$.

An immediate consequence of Proposition 6.9 is that both numerator and denominator of the expression for $q_\alpha(m)$ in Proposition 6.7 are nonzero and have identical algebraic sign. Strict positivity of the stationary distribution components follows from these observations. That is, $\forall m \in S' : q_\alpha(m) > 0$.
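The determinant-ratio form of Proposition 6.7 and the sign property of Proposition 6.9 can be checked on a small strictly positive stochastic matrix. The sketch below is illustrative only: the 3 x 3 matrix is an arbitrary example rather than a genetic algorithm transition matrix, the function names are hypothetical, and exact fractions are used so that the comparison with the left eigenvector condition is exact.

```python
from fractions import Fraction

def det(A):
    """Determinant by Gaussian elimination over exact fractions."""
    A = [row[:] for row in A]
    size, sign, result = len(A), 1, Fraction(1)
    for c in range(size):
        pivot = next((r for r in range(c, size) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            sign = -sign
        result *= A[c][c]
        for r in range(c + 1, size):
            factor = A[r][c] / A[c][c]
            for k in range(c, size):
                A[r][k] -= factor * A[c][k]
    return sign * result

def det_Pm_minus_I(P, m):
    """|P_m - I|, where P_m is P with row m replaced by zeros."""
    size = len(P)
    return det([
        [(Fraction(0) if r == m else P[r][c]) - (1 if r == c else 0) for c in range(size)]
        for r in range(size)
    ])

def stationary(P):
    """Proposition 6.7: q(m) = |P_m - I| divided by the sum over n of |P_n - I|."""
    dets = [det_Pm_minus_I(P, m) for m in range(len(P))]
    total = sum(dets)
    return [d / total for d in dets]

F = Fraction
P = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 3), F(1, 3), F(1, 3)],
     [F(1, 4), F(1, 4), F(1, 2)]]
q = stationary(P)
# q is a probability vector and satisfies q^T = q^T P exactly
print(sum(q) == 1, q == [sum(q[n] * P[n][m] for n in range(3)) for m in range(3)])
```

For this example each determinant $|P_m - I|$ is negative, in agreement with the sign $(-1)^{N'}$ for $N' = 3$. Evaluating $N'$ determinants in this way is far more expensive than a single linear solve, so the sketch is useful only as a check of the formula on small matrices.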
6.4 The Indeterminate Form at $\alpha = 0$

All of the results established in this section assume that the mutation probability parameter is strictly positive ($\alpha > 0$), and thus are not applicable at $\alpha = 0$. The reason is apparent when Eq. 4.7 and the two-operator result in Eq. 4.17 (or the three-operator counterparts of Eq. 4.17 given by Eq. 4.23 and 4.27) are applied to $|P_m - I|$. It follows that the row of the $\alpha \to 0^+$ limit of $|P_m - I|$ corresponding to the one-operator absorbing state $n_A \in S_A'$, $n_A \ne$ [...]
SECTION 7 THE ZERO MUTATION PROBABILITY STATIONARY DISTRIBUTION LIMIT 7.1 Overview In Section 4.3.1, it is established that the timehomogeneous oneoperator genetic algorithm Markov chain possesses a stationary distribution but that it is not unique. In Sections 4.3.2 and 4.3.3, it is established that the timehomogeneous two and threeoperator counterparts possess unique stationary distributions provided a > 0, and Section 6 formulates the existence and uniqueness argument into a rational function expression for the unique solution. Since the twooperator state transition matrix approaches its oneoperator counterpart as a ^ 0* (Eq. 4.17) and since the threeoperator algorithm exhibits the corresponding behavior with respect to the P2'(i I n)s (Eq. 4.23), a question which naturally arises from these observations is whether an a Â— > 0^ limiting distribution exists for the two and threeoperator algorithms. (If such a limit exists, then it is necessarily unique). This section answers that question affirmatively and also confirms the observation made in Section 5.5 that the limiting distribution is nonzero for all states corresponding to uniform populations (absorbing states). The approach taken here is to transform the expressions for qÂ„(m) in Propositions 6.7 and 6.8 into equivalent expressions which yield determinate forms at a = 0. The result requires transforming P and P^ into related matrices but with the states corresponding to uniform populations (oneoperator absorbing states) coalesced into adjacent nonunifonn population states. The development is tedious and involves some additional notation. 7.2 Functional Form of the Stationary Distribution Before proceeding with the limiting case development which is the primary purpose of this section, it is convenient to establish some intermediate results concerning the 73
behavior of $q_\alpha$ as a function of $\alpha$. These results follow from the results developed in Section 6 and some simple observations about the form of the elements of $P$. From Eq. 4.14-4.16 and Eq. 4.22-4.26, all elements of the state transition matrix are rational functions of $\alpha$ with denominator polynomial $(1+\alpha)^{ML}$. Thus, for $\alpha > 0$,
\[
(1+\alpha)^{MLN'}\,|P_{\bar m} - I| = |(1+\alpha)^{ML}P_{\bar m} - (1+\alpha)^{ML}I| = |Q_{\bar m} - (1+\alpha)^{ML}I|
\]
where every element of $Q_{\bar m}$ (and hence the value of $|Q_{\bar m} - (1+\alpha)^{ML}I|$) is a polynomial in $\alpha$. Further, since row $m$ in $Q_{\bar m}$ is zero, the polynomial value of the determinant includes the factor $(1+\alpha)^{ML}$. Consequently
\[
(1+\alpha)^{ML(N'-1)}\,|P_{\bar m} - I| = \Theta_m(\alpha) \qquad \text{(Eq. 7.1)}
\]
for $\Theta_m(\alpha)$ some polynomial function of $\alpha$. Proposition 7.1 below follows.

Proposition 7.1: For all $\alpha > 0$, the value of the determinant $|P_{\bar m} - I|$ is a rational function of $\alpha$ with nonzero denominator polynomial $(1+\alpha)^{ML(N'-1)}$.

By applying Eq. 7.1 to Proposition 6.7, the components of $q_\alpha$ can be written as
\[
q_\alpha(m) = \frac{\Theta_m(\alpha)}{\sum_{n \in S'} \Theta_n(\alpha)} = \frac{\Theta_m(\alpha)}{\Theta(\alpha)} \qquad \text{(Eq. 7.2)}
\]
Hence, the $q_\alpha(m)$ are rational functions of $\alpha$, and since a rational function is continuous everywhere its denominator polynomial is nonzero, application of Proposition 6.9 and Eq. 7.1 (which together establish that $\Theta(\alpha) = \sum_{n} \Theta_n(\alpha) \neq 0$) to Eq. 7.2 yields the following.

Proposition 7.2: For all $\alpha > 0$, the components of $q_\alpha$ are continuous rational functions of the independent variable $\alpha$.

Further, differentiation of Eq. 7.2 with respect to $\alpha$ yields a rational function of $\alpha$
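\[
\frac{dq_\alpha(m)}{d\alpha} = \frac{1}{\Theta(\alpha)^2}\left[\Theta(\alpha)\frac{d\Theta_m(\alpha)}{d\alpha} - \Theta_m(\alpha)\frac{d\Theta(\alpha)}{d\alpha}\right] \qquad \text{(Eq. 7.3)}
\]
with nonzero denominator polynomial $\Theta(\alpha)^2$. The following proposition is a consequence.

Proposition 7.3: For all $\alpha > 0$, the components of the first derivative of $q_\alpha$ with respect to $\alpha$ are continuous rational functions of $\alpha$.

7.3 The Absorbing State Rows of $|P - I|$ and $|P_{\bar m} - I|$

The rows corresponding to one-operator absorbing states in the determinant $|P - I|$ have a particularly simple form. The nondiagonal elements of row $n_A \in S_A'$, which represents a uniform population of solutions $i_A \in S$, are given by Eq. 4.19 and 4.29 respectively for the two- and three-operator cases. The principal diagonal element is obtained by evaluating Eq. 4.19 or 4.29 at $m = n_A$ and subtracting 1. Thus,
\[
P(n_A \mid n_A) - 1 = \binom{M}{M}\left[\frac{1}{(1+\alpha)^{L}}\right]^{M} - 1
= \frac{-ML\alpha - ML(ML-1)\alpha^2/2 - \cdots - \alpha^{ML}}{(1+\alpha)^{ML}}
= \frac{-ML\alpha + O(\alpha^2)}{(1+\alpha)^{ML}}
\]
and if the general element in $|P - I|$ is denoted by $T(m \mid n)$, then the elements of row $n_A$ can be written as
\[
\forall n_A \in S_A':\quad T(n \mid n_A) =
\begin{cases}
\dfrac{-ML\alpha + O(\alpha^2)}{(1+\alpha)^{ML}} & n = n_A \\[2ex]
P(n \mid n_A) & n \in S' - \{n_A\}
\end{cases}
\qquad \text{(Eq. 7.4)}
\]
where the off-diagonal entries $P(n \mid n_A)$ are those of Eq. 4.19 or 4.29 and share the denominator $(1+\alpha)^{ML}$.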
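The expansion leading to Eq. 7.4 can be checked with exact rational arithmetic. The following is a minimal sketch, not part of the original development: the function names are hypothetical, and it assumes the uniform-population probabilities implied by the display above, namely $P(n_A \mid n_A) = (1+\alpha)^{-ML}$ and, for a state adjacent to $n_A$, $M\alpha/(1+\alpha)^{ML}$.

```python
from fractions import Fraction
from math import comb


def diagonal_numerator_coeffs(M, L):
    """Coefficients c_0..c_ML of the numerator 1 - (1 + a)**(M*L) in powers of a."""
    ml = M * L
    return [0] + [-comb(ml, k) for k in range(1, ml + 1)]


def diagonal_entry(alpha, M, L):
    """T(n_A | n_A) = P(n_A | n_A) - 1, assuming P(n_A | n_A) = (1 + alpha)**(-M*L)."""
    return 1 / (1 + Fraction(alpha)) ** (M * L) - 1


def adjacent_entry(alpha, M, L):
    """T(n | n_A) for an adjacent state: one bit mutated in one of the M members."""
    alpha = Fraction(alpha)
    return M * alpha / (1 + alpha) ** (M * L)
```

The first-order coefficient returned by `diagonal_numerator_coeffs` is $-ML$, which is the leading term used throughout the remainder of this section.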
Additional insight into the form of the absorbing state rows can be obtained with the aid of the following notation. Let $m_A, n_A \in S_A'$ be distinct but otherwise arbitrary absorbing states of the one-operator Markov chain, let $i_A \in S$ be the bit string represented in $n_A$, and let $S(i_A) \subset S$ be the set of bit strings accessible from $i_A$ via exactly one bit mutation event (i.e. $S(i_A) = \{i_1 : i_1 \in S,\ H(i_1, i_A) = 1\}$). It follows from this definition that $\mathrm{card}(S(i_A)) = L$. Then, for $M > 1$ let $S(n_A)'$ be defined as
\[
S(n_A)' = \{n : n \in S',\ n(i_A) = M - 1,\ n(i_1) = 1,\ i_1 \in S(i_A)\} \subset S',
\]
the set of nonabsorbing states adjacent to the absorbing state $n_A$. The restriction on $M$ is required to ensure that no absorbing state $m_A$ is contained in the adjacency set of any absorbing state $n_A$. $S(n_A)'$ includes exactly one distinct element for each $i_1 \in S(i_A)$, and consequently $\mathrm{card}(S(n_A)') = \mathrm{card}(S(i_A)) = L$. Also, from the form of $S(n_A)'$, it follows that for $M \ge 3$, $S(m_A)'$ and $S(n_A)'$ are disjoint if $m_A$ and $n_A$ are distinct one-operator absorbing states. Thus, if $S_A''$ is defined as
\[
S_A'' = \bigcup_{n_A \in S_A'} S(n_A)'
\]
and $M \ge 3$, then $\mathrm{card}(S_A'') = \mathrm{card}(S_A') \times L = NL$. This restriction on $M$ is assumed in all of the following.

With the aid of the new notation, the element in column $n \in S(n_A)'$ of row $n_A$ in $|P - I|$ can be written as
\[
\forall n \in S(n_A)':\quad T(n \mid n_A) = P(n \mid n_A) = \binom{M}{M-1}\left[\frac{1}{(1+\alpha)^{L}}\right]^{M-1}\frac{\alpha}{(1+\alpha)^{L}} = \frac{M\alpha}{(1+\alpha)^{ML}}
\]
Thus, Eq. 7.4 can be revised as follows
\[
\forall n_A \in S_A':\quad T(n \mid n_A) =
\begin{cases}
\dfrac{-ML\alpha + O(\alpha^2)}{(1+\alpha)^{ML}} & n = n_A \\[2ex]
\dfrac{M\alpha}{(1+\alpha)^{ML}} & n \in S(n_A)' \\[2ex]
\dfrac{O(\alpha^s)}{(1+\alpha)^{ML}} & n \in S' - S(n_A)' - \{n_A\}
\end{cases}
\qquad \text{(Eq. 7.5)}
\]
where the exponent $s$ of $\alpha$ in the order expression for the general term is an integer satisfying $s \ge 2$. The elements in columns $n_A$ and $n \in S(n_A)'$ are first order in $\alpha$ while the elements in all other columns are at least second order.

Eq. 7.5 applies to every absorbing state row of $|P_{\bar m} - I|$ as well if $m \notin S_A'$. If $|P_{\bar m_A} - I|$ is being considered where $m_A \in S_A'$, then row $m_A$ contains $-1$ at its principal diagonal and zeros elsewhere. In that case Eq. 7.5 only applies to the absorbing state rows $n_A \in S_A' - \{m_A\}$. Exactly $N - 1$ such rows exist in $|P_{\bar m_A} - I|$. By applying Eq. 7.5 and these observations to Proposition 7.1, it follows that the lowest order term with nonzero coefficient which can conceivably exist in the numerator polynomial of $|P_{\bar m_A} - I|$ is the order $\alpha^{N-1}$ term. Similar reasoning reveals that the corresponding lowest order term with nonzero coefficient for $|P_{\bar m} - I|$ with $m \notin S_A'$ is the order $\alpha^{N}$ term. If the coefficient of the order $\alpha^{N-1}$ term in the numerator polynomial of $|P_{\bar m_A} - I|$ is indeed nonzero, and if the corresponding coefficients for all such $m_A$ have the same algebraic sign, then the required limiting value of $q_\alpha$ can be expressed in terms of these nonzero coefficients via substitution into Proposition 6.7. These conditions are in fact satisfied, as demonstrated below.

7.4 Reformulation of Propositions 6.7 and 6.8

The next step in this development is the definition of some auxiliary matrices related to $P$ and $P_{\bar m_A}$, and the reformulation of Propositions 6.7 and 6.8 in terms of them. The new matrices, designated $P(m_A)'$ and $P_{\bar m_A}'$ respectively, are derived by coalescing each of the $N - 1$ absorbing state columns $n_A \in S_A' - \{m_A\}$ of $|P - I|$ and $|P_{\bar m_A} - I|$ with its neighboring nonabsorbing state columns, $n \in S(n_A)'$. Specifically, let $Q_{\bar m_A}$ be derived
from $|P_{\bar m_A} - I|$ by adding $1/L$ times the column $n_A \in S_A' - \{m_A\}$ to each of the $L$ adjacent nonabsorbing columns $n \in S(n_A)'$ and repeating the process for each remaining $n_A \in S_A' - \{m_A\}$. This operation is applied once each for the exactly $N - 1$ absorbing state columns $n_A \in S_A' - \{m_A\}$, and it preserves the value of the determinant: $|Q_{\bar m_A}| = |P_{\bar m_A} - I|$. If now $Q_{\bar m_A}(m \mid n)$ denotes the general element of $|Q_{\bar m_A}|$, then by applying the recipe used in its construction and Eq. 7.5, the elements in the absorbing state rows $n_A \in S_A' - \{m_A\}$ of $Q_{\bar m_A}$ can be written as
\[
\forall m_A \in S_A',\ \forall n_A \in S_A' - \{m_A\}:\quad Q_{\bar m_A}(m \mid n_A) =
\begin{cases}
\dfrac{-ML\alpha + O(\alpha^2)}{(1+\alpha)^{ML}} & m = n_A \\[2ex]
\dfrac{O(\alpha^2)}{(1+\alpha)^{ML}} & m \in S(n_A)' \\[2ex]
\dfrac{O(\alpha^s)}{(1+\alpha)^{ML}} & m \in S' - S(n_A)' - \{n_A\}
\end{cases}
\]
where as before $s$ is an integer satisfying $s \ge 2$. Thus, each of the $N - 1$ absorbing state rows $n_A \in S_A' - \{m_A\}$ of $|Q_{\bar m_A}|$ can be written as a sum of two rows, one row containing $-ML\alpha/(1+\alpha)^{ML}$ at its principal diagonal location and zeros elsewhere and the second row being a multiple of $\alpha^2/(1+\alpha)^{ML}$. It follows from elementary determinant row expansion operations that $|P_{\bar m_A} - I| = |Q_{\bar m_A}|$ can be written as
\[
|P_{\bar m_A} - I| = \frac{(-ML\alpha)^{N-1}\,|Q_{\bar m_A}'|}{(1+\alpha)^{ML(N-1)}} + \frac{O(\alpha^{s})}{(1+\alpha)^{ML(N'-1)}} \qquad \text{(Eq. 7.6)}
\]
where $|Q_{\bar m_A}'|$ is the order $N' - N + 1$ principal minor of $|Q_{\bar m_A}|$ generated by deleting the $N - 1$ row/column pairs which intersect on the $n_A \in S_A' - \{m_A\}$ principal diagonals and where the exponent $s$ of $\alpha$ in the order expression is an integer satisfying $s \ge N$. The elements in all rows of $|Q_{\bar m_A}'|$ except row $m_A$ are composed of contributions from the elements in nonabsorbing state rows of $P$ and the $-1$ principal diagonal term contributed by $I$ in $|P_{\bar m_A} - I|$. Row $m_A$ of $|Q_{\bar m_A}'|$ contains $-1$ at its principal diagonal location and zeros elsewhere. Thus, if $Q_{\bar m_A}'$ is written as $Q_{\bar m_A}' = P_{\bar m_A}' - I'$, then from the
recipe employed in its construction, it follows that the square matrix $P_{\bar m_A}'$ thus defined has dimension $N' - N + 1$ and that its elements are given by
\[
\forall m, n \in S' - S_A' + \{m_A\}:\quad P_{\bar m_A}(m \mid n)' =
\begin{cases}
0 & n = m_A \\[1ex]
P(m \mid n) & n \neq m_A,\ m \in S' - S_A' + \{m_A\} + S(m_A)' - S_A'' \\[1ex]
P(m \mid n) + \dfrac{1}{L}P(n_A \mid n) & n \neq m_A,\ m \in S(n_A)'
\end{cases}
\qquad \text{(Eq. 7.7)}
\]
Careful examination of Eq. 7.7 reveals that the transformation by which $P_{\bar m_A}'$ is generated from $P_{\bar m_A}$ preserves all row sums. Thus, $P_{\bar m_A}'$ is very similar in form to $P_{\bar m_A}$. It is derived from a (fictitious) row stochastic matrix by setting a specified row (row $m_A$) to zero.

If the preceding steps are repeated for $|P_{\bar m} - I|$ where $m \notin S_A'$, except that all $N$ absorbing state columns $n_A \in S_A'$ are coalesced rather than just the $N - 1$ columns $n_A \in S_A' - \{m_A\}$, a result very similar in form to Eq. 7.6 obtains. That is,
\[
|P_{\bar m} - I| = \frac{(-ML\alpha)^{N}\,|Q_{\bar m}'|}{(1+\alpha)^{MLN}} + \frac{O(\alpha^{s})}{(1+\alpha)^{ML(N'-1)}} \qquad \text{(Eq. 7.8)}
\]
where $|Q_{\bar m}'|$ is the order $N' - N$ principal minor of $|Q_{\bar m}|$ generated by deleting the $N$ absorbing state row/column pairs and $s$ is an integer satisfying $s \ge N + 1$. The nonabsorbing state row $m$ contains $-1$ at its principal diagonal location and zeros elsewhere.

Substitution of Eq. 7.6 and 7.8 into Proposition 6.7 yields a form more amenable to examination of the $\alpha \to 0^+$ limiting stationary distribution. The two cases $m \in S_A'$ and $m \in S' - S_A'$ must be distinguished. Then, after some straightforward algebra,
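\[
q_\alpha(m) =
\begin{cases}
\dfrac{|Q_{\bar m_A}'| + O(\alpha)/(1+\alpha)^{e}}{\sum_{n_A \in S_A'} |Q_{\bar n_A}'| + O(\alpha)/(1+\alpha)^{e}} & m = m_A \in S_A' \\[3ex]
\dfrac{O(\alpha)/(1+\alpha)^{e}}{\sum_{n_A \in S_A'} |Q_{\bar n_A}'| + O(\alpha)/(1+\alpha)^{e}} & m \in S' - S_A'
\end{cases}
\]
Here and below, $e$ stands for the exponent of $(1+\alpha)$ in the remainder terms; its value is not legible in the source, but it is a fixed integer and does not affect the $\alpha \to 0^+$ behavior. An equivalent result expressed in terms of the auxiliary matrices $P_{\bar m_A}'$ is
\[
q_\alpha(m) =
\begin{cases}
\dfrac{|P_{\bar m_A}' - I'| + O(\alpha)/(1+\alpha)^{e}}{\sum_{n_A \in S_A'} |P_{\bar n_A}' - I'| + O(\alpha)/(1+\alpha)^{e}} & m = m_A \in S_A' \\[3ex]
\dfrac{O(\alpha)/(1+\alpha)^{e}}{\sum_{n_A \in S_A'} |P_{\bar n_A}' - I'| + O(\alpha)/(1+\alpha)^{e}} & m \in S' - S_A'
\end{cases}
\qquad \text{(Eq. 7.9)}
\]
By retracing the preceding steps by which $P_{\bar m_A}$ was transformed into $P_{\bar m_A}'$, companion results to Eq. 7.7 and Eq. 7.9 can be developed for $P(m_A)'$. The companion to Eq. 7.7 differs only in the elements of row $n = m_A$. Thus, if $P(m \mid n)'$ denotes the general element in $P(m_A)'$, then
\[
\forall m, n \in S' - S_A' + \{m_A\}:\quad P(m \mid n)' =
\begin{cases}
P(m \mid n) & m \in S' - S_A' + \{m_A\} + S(m_A)' - S_A'' \\[1ex]
P(m \mid n) + \dfrac{1}{L}P(n_A \mid n) & m \in S(n_A)'
\end{cases}
\qquad \text{(Eq. 7.10)}
\]
Further, examination of Eq. 7.10 reveals that the row sum constraint on $P$ is preserved in the transformation by which $P(m_A)'$ is generated (i.e. $P(m_A)'$ is a stochastic matrix). Thus, $|P(m_A)' - I'| = 0$. A consequence is the Proposition 6.8 counterpart of Eq. 7.9,
\[
q_\alpha(m) =
\begin{cases}
\dfrac{|P(m_A)' - I'| - |P_{\bar m_A}' - I'| + O(\alpha)/(1+\alpha)^{e}}{\sum_{n_A \in S_A'} \left[\,|P(n_A)' - I'| - |P_{\bar n_A}' - I'|\,\right] + O(\alpha)/(1+\alpha)^{e}} & m = m_A \in S_A' \\[3ex]
\dfrac{O(\alpha)/(1+\alpha)^{e}}{\sum_{n_A \in S_A'} \left[\,|P(n_A)' - I'| - |P_{\bar n_A}' - I'|\,\right] + O(\alpha)/(1+\alpha)^{e}} & m \in S' - S_A'
\end{cases}
\qquad \text{(Eq. 7.11)}
\]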
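The column coalescing step of Section 7.4 rests on two elementary facts: adding a multiple of one column to another leaves a determinant unchanged, and adding $1/L$ times the absorbing state column to each of its $L$ adjacent columns cancels the first-order entries $M\alpha$ against $-ML\alpha$. The sketch below illustrates both on a small matrix. It is a toy example under stated assumptions: the helper names are hypothetical, the absorbing row keeps only its first-order numerators, and the remaining rows are arbitrary.

```python
from fractions import Fraction


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    sign, result = 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return sign * result


def coalesce(matrix, source, targets, scale):
    """Copy of matrix with scale * column `source` added to each column in `targets`."""
    out = [[Fraction(x) for x in row] for row in matrix]
    for row in out:
        for t in targets:
            row[t] += scale * row[source]
    return out
```

For example, with $M = 3$, $L = 2$ and an absorbing row whose leading entries are $(-ML\alpha,\ M\alpha,\ M\alpha)$, calling `coalesce(matrix, 0, [1, 2], Fraction(1, 2))` zeroes the two adjacent entries of that row while `det` returns the same value before and after.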
7.5 The Stationary Distribution Limit

The zero mutation probability limits of Eq. 7.9 and Eq. 7.11 exist if the determinant sums in the denominators are nonzero. In fact they are nonzero, as demonstrated in the following. This argument is very similar in form to the development in Section 6.3 concerning positivity of the stationary distribution. The essential step is demonstration of the existence of a primitive stochastic matrix $Q'$ which satisfies both
\[
0 \le \lim_{\alpha \to 0^+} P_{\bar m_A}' \le Q' \quad\text{and}\quad \lim_{\alpha \to 0^+} P_{\bar m_A}' \neq Q'.
\]
The elements of the $\alpha \to 0^+$ limit of $P_{\bar m_A}'$ are obtained by substituting the one-operator results in Eq. 4.25 into Eq. 7.7. If the three-operator case is under consideration, then Eq. 4.22 and Eq. 4.24,25 are employed. In the following, the two-operator notation is employed.

Let $Q'$ be generated from the $\alpha \to 0^+$ limit of $P_{\bar m_A}'$ by replacing row $m_A$ with the row whose elements are given by
\[
\forall m \in S' - S_A' + \{m_A\}:\quad Q'(m \mid m_A) = \frac{1}{N' - N + 1} > 0. \qquad \text{(Eq. 7.12)}
\]
Thus, the row sum of row $m_A$ in $Q'$ is 1. Since all remaining rows of $Q'$ are identical to those of the $\alpha \to 0^+$ limit of $P_{\bar m_A}'$, and consequently have row sum 1 by Eq. 7.7, $Q'$ is a stochastic matrix. Additionally, it satisfies both $0 \le \lim_{\alpha \to 0^+} P_{\bar m_A}' \le Q'$ and $\lim_{\alpha \to 0^+} P_{\bar m_A}' \neq Q'$. Since $Q'(m_A \mid m_A) > 0$, the fictitious Markov chain is both aperiodic (Definition A9, Theorem A2) and primitive (Definition B1, Theorem B1) provided that it is irreducible (Definitions A7 and A8, Theorem A1). Thus, primitivity is established by demonstrating that every state $m \in S' - S_A' + \{m_A\}$ is accessible in some finite number of transitions from every state $n \in S' - S_A' + \{m_A\}$. Since all states in $S' - S_A' + \{m_A\}$ are accessible in one transition from $m_A$ (Eq. 7.12), it is
sufficient to demonstrate that $m_A$ is accessible in some finite number of transitions from every state $n \in S' - S_A'$. Let $i_A \in S$ be the bit string represented in $m_A$, let $n \in S' - S_A'$ and let $i_1 \in S$ be selected such that $n(i_1) > 0$ and $H(i_1, i_A) \le H(i, i_A)$ for all $i$ represented in $n$. Then, two cases must be examined. In case (1), $i_1 = i_A$, while $i_1 \neq i_A$ for case (2).

If $i_1 = i_A$, it follows from Eq. 7.7 and the construction of $Q'$ that
\[
Q'(m_A \mid n) = \lim_{\alpha \to 0^+} P_{\bar m_A}(m_A \mid n)' = \lim_{\alpha \to 0^+} P_2(m_A \mid n) = \left[\lim_{\alpha \to 0^+} P_2(i_A \mid n)\right]^{M} = P_1(i_A \mid n)^{M} = P_1(i_1 \mid n)^{M} > 0
\]
and consequently $m_A$ is accessible from $n$ in one transition. Otherwise
\[
\exists\, i_2 \in S(i_1) \ \text{such that}\ H(i_2, i_A) = H(i_1, i_A) - 1
\]
and further, if $n_1 \in S_A'$ is the one-operator absorbing state defined by the condition $n_1(i_1) = M$ while $n_{12} \in S(n_1)'$ is the adjacent nonabsorbing state defined by $n_{12}(i_1) = M - 1$, $n_{12}(i_2) = 1$, then from Eq. 7.7 and the construction of $Q'$
\[
Q'(n_{12} \mid n) = \lim_{\alpha \to 0^+} P_2(n_{12} \mid n) + \frac{1}{L}\lim_{\alpha \to 0^+} P_2(n_1 \mid n) = P_1(n_{12} \mid n) + \frac{1}{L}P_1(n_1 \mid n) \ge \frac{1}{L}P_1(n_1 \mid n) = \frac{1}{L}\left[P_1(i_1 \mid n)\right]^{M} > 0.
\]
Thus, $n_{12}$ is accessible from $n$ in one transition. If $i_2 = i_A$, then by the case (1) argument $m_A$ is accessible in one additional transition. Otherwise, the case (2) argument is repeated for some $i_3 \in S(i_2)$ such that $H(i_3, i_A) = H(i_2, i_A) - 1 = H(i_1, i_A) - 2$.
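The case (2) step relies only on the fact that a bit string at Hamming distance $d > 0$ from $i_A$ always has a single-bit neighbor at distance $d - 1$. A minimal sketch of the resulting chain $i_1, i_2, i_3, \ldots$ follows. It is illustrative only: bit strings are encoded as nonnegative integers and the function names are hypothetical, not notation from this work.

```python
def hamming(i, j):
    """Hamming distance between two bit strings encoded as nonnegative integers."""
    return bin(i ^ j).count("1")


def neighbors(i, L):
    """S(i): the L bit strings reachable from i by exactly one bit mutation."""
    return [i ^ (1 << b) for b in range(L)]


def mutation_path(i_start, i_target, L):
    """Chain of single-bit mutations in which every step lowers the distance by one."""
    path = [i_start]
    while path[-1] != i_target:
        current = path[-1]
        step = next(j for j in neighbors(current, L)
                    if hamming(j, i_target) == hamming(current, i_target) - 1)
        path.append(step)
    return path
```

The path contains exactly $H(i_1, i_A)$ mutations; the closing case (1) transition into $m_A$ is the one additional step in the count.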
This procedure necessarily terminates with $H(i_1, i_A) + 1$ applications, and the corresponding state space trajectory is executed with nonzero probability. From the foregoing argument, it follows that state $m_A$ is accessible in some finite number of transitions from every state $n \in S' - S_A'$, and thus that $Q'$ is primitive. Then, since both $0 \le \lim_{\alpha \to 0^+} P_{\bar m_A}' \le Q'$ and $\lim_{\alpha \to 0^+} P_{\bar m_A}' \neq Q'$, the spectral radius of $\lim_{\alpha \to 0^+} P_{\bar m_A}'$ is strictly less than that of $Q'$, which is 1. Proposition 7.4, the $\alpha \to 0^+$ counterpart of Proposition 6.9, is a consequence.

Proposition 7.4: The value of the determinant $\lim_{\alpha \to 0^+} |P_{\bar m_A}' - I'|$ satisfies, for all $m_A \in S_A'$:
(1) $\lim_{\alpha \to 0^+} |P_{\bar m_A}' - I'| \neq 0$;
(2) the algebraic sign of $\lim_{\alpha \to 0^+} |P_{\bar m_A}' - I'|$ is $(-1)^{N' - N + 1}$.

The conditions asserted in Proposition 7.4 ensure that substitution into Eq. 7.9 and 7.11 yields a determinate form in the $\alpha \to 0^+$ limit. Propositions 7.5 and 7.6 below represent the limiting forms, and consequently are respectively the limiting counterparts of Propositions 6.7 and 6.8.

Proposition 7.5: The components of $\lim_{\alpha \to 0^+} q_\alpha$ exist and can be expressed in the form
\[
\lim_{\alpha \to 0^+} q_\alpha(m) =
\begin{cases}
\dfrac{\lim_{\alpha \to 0^+} |P_{\bar m_A}' - I'|}{\sum_{n_A \in S_A'} \lim_{\alpha \to 0^+} |P_{\bar n_A}' - I'|} & m = m_A \in S_A' \\[3ex]
0 & m \in S' - S_A'
\end{cases}
\]
Proposition 7.6: The components of $\lim_{\alpha \to 0^+} q_\alpha$ exist and can be expressed alternatively as
\[
\lim_{\alpha \to 0^+} q_\alpha(m) = q_0(m) =
\begin{cases}
\dfrac{\lim_{\alpha \to 0^+} \left\{ |P(m_A)' - I'| - |P_{\bar m_A}' - I'| \right\}}{\sum_{n_A \in S_A'} \lim_{\alpha \to 0^+} \left\{ |P(n_A)' - I'| - |P_{\bar n_A}' - I'| \right\}} & m = m_A \in S_A' \\[3ex]
0 & m \in S' - S_A'
\end{cases}
\]
An immediate consequence of Propositions 7.4 and 7.5 is strict positivity of the zero mutation probability limiting stationary distribution components for all absorbing state rows. That is,
\[
\forall m_A \in S_A':\quad q_0(m_A) > 0.
\]
The argument is analogous to that at the conclusion of Section 6.3 concerning strict positivity of all stationary distribution components when $\alpha > 0$. This result is anticipated by the simulation results in Section 5.5. A consequence is that the required limiting behavior for direct application of the simulated annealing convergence theory to the genetic algorithm model does not follow. However, the results displayed in Section 5.5 and developments produced in Section 9.3 suggest that the limiting distribution can be made arbitrarily close to the desired limiting behavior.

Since the $\alpha \to 0^+$ limit of the stationary distribution exists, the definition of $q_\alpha$ can be extended to include the point $\alpha = 0$. That is,
\[
q_\alpha\big|_{\alpha = 0} = q_0 = \lim_{\alpha \to 0^+} q_\alpha
\]
where the values of the required limits are provided by Proposition 7.5. Proposition 7.7 below follows from this extended definition of $q_\alpha$ and Proposition 7.2.

Proposition 7.7: For all $\alpha \ge 0$, the components of $q_\alpha$ are continuous rational functions of the independent variable $\alpha$.
Proposition 7.3, which concerns the first derivative of $q_\alpha$, can also be extended to include the limiting case. The extension requires easily obtainable counterparts of Eq. 7.1-7.3 developed for $|P_{\bar m_A}' - I'|$ and Eq. 7.9. The Eq. 7.1 counterpart is
\[
(1+\alpha)^{ML(N'-N)}\,|P_{\bar m_A}' - I'| = \Theta_{m_A}(\alpha)', \qquad \text{(Eq. 7.13)}
\]
and that for Eq. 7.2 is
\[
q_\alpha(m) =
\begin{cases}
\dfrac{\Theta_{m_A}(\alpha)' + O(\alpha)}{\Theta(\alpha)' + O(\alpha)} & m = m_A \in S_A' \\[3ex]
\dfrac{O(\alpha)}{\Theta(\alpha)' + O(\alpha)} & m \in S' - S_A'
\end{cases}
\qquad \text{(Eq. 7.14)}
\]
where $\Theta(\alpha)'$ is the polynomial counterpart (summed over $n_A \in S_A'$) of $\Theta(\alpha)$ in Eq. 7.2. Differentiating Eq. 7.14 with respect to $\alpha$ yields a rational function with denominator polynomial $[\Theta(\alpha)' + O(\alpha)]^2$ whose $\alpha \to 0^+$ limit is nonzero by Proposition 7.4, Eq. 7.13 and the definition of $\Theta(\alpha)'$. Proposition 7.8 below follows from Proposition 7.3 and these observations.

Proposition 7.8: The components of the first derivative of $q_\alpha$ with respect to $\alpha$ possess limits as $\alpha \to 0^+$.

Thus, a zero mutation probability limit exists for the time-homogeneous two- and three-operator algorithm variants. The limit is represented by Propositions 7.5 and 7.6. Further, Propositions 7.7 and 7.8 establish some useful ancillary results concerning the stationary distribution behavior at the point $\alpha = 0$. These latter results are employed in the following section in establishing strong ergodicity of the inhomogeneous genetic algorithm Markov chain. Propositions 7.5 and 7.6 are used in Section 9 to develop a methodology for representing the stationary distribution limit.
SECTION 8
A MONOTONIC MUTATION PROBABILITY ERGODICITY BOUND

8.1 Overview

The annealing schedule bounds for the simulated annealing algorithm, which are reviewed in Section 2.4.2, are derived by requiring that the nonstationary Markov chain which represents the algorithm be strongly ergodic (Definition A13) and then deducing a monotonic lower bound on the algorithm control parameter. The methodology consists of three steps: demonstrating that the time-homogeneous Markov chain corresponding to every positive algorithm control parameter value possesses a stationary distribution; demonstrating that the sequence of stationary distributions corresponding to any sequence of positive control parameter values converges to a limiting distribution if the control parameter sequence converges to zero; and then employing Definitions A11-A13 and Theorems A5-A7 to deduce a sufficient condition (the annealing schedule lower bound) to guarantee that the nonstationary algorithm achieves the limiting distribution (i.e. strong ergodicity).

The model development in Section 4 demonstrates that for all mutation probability values in the range $0 < p_m < 1$, the Markov chain representing either the two- or three-operator time-homogeneous simple genetic algorithm possesses a stationary distribution. Section 7 demonstrates that the stationary distribution approaches a limit as the mutation probability parameter approaches zero. This section proposes and then verifies a monotone decreasing lower bound on the mutation probability sequence of the nonstationary genetic algorithm Markov chain which is sufficient to ensure strong ergodicity.
8.2 A Weak Ergodicity Bound

The following paragraphs propose and then verify a mutation probability parameter bound sufficient to ensure that the Markov chain of the corresponding nonstationary simple genetic algorithm is weakly ergodic (Definition A11). The bound applies to both the two- and three-operator algorithms, and it appears in Proposition 8.1 below.

Proposition 8.1: The mutation probability bound given by
\[
p_m(k) \ge \tfrac{1}{2}\,k^{-1/ML}
\]
is sufficient to ensure weak ergodicity of the corresponding nonstationary (two- or three-operator) simple genetic algorithm Markov chain.

This result is established by using the lower bounds on the two- and three-operator conditional probabilities in Eq. 4.21 and 4.31 with Definitions A11 and A12 and Theorems A5 and A6. Applying the lower bound in Eq. 4.21 and 4.31 to $\tau_1(\cdot)$ of Definition A12 and Theorem A5 yields
\[
\tau_1(P) = 1 - \min_{n_1, n_2} \sum_{m} \min\big(P(m \mid n_1),\, P(m \mid n_2)\big) \le 1 - \left(\frac{2\alpha}{1+\alpha}\right)^{ML}.
\]
Thus,
\[
1 - \tau_1(P) \ge \left(\frac{2\alpha}{1+\alpha}\right)^{ML}
\]
and consequently from Theorem A6, the chain is weakly ergodic if the sequence of control parameter values $\{\alpha(k)\}$ satisfies
\[
\sum_{k=1}^{\infty} \left(\frac{2\alpha(k)}{1+\alpha(k)}\right)^{ML} = \infty.
\]
Comparing this result to the known divergent series $\sum_k k^{-1}$, it follows that the Markov chain is weakly ergodic if the sequence $\{\alpha(k)\}$ satisfies
\[
\left(\frac{2\alpha(k)}{1+\alpha(k)}\right)^{ML} \ge k^{-1}
\]
from which
\[
\frac{2\alpha(k)}{1+\alpha(k)} \ge k^{-1/ML}, \qquad \frac{\alpha(k)}{1+\alpha(k)} \ge \tfrac{1}{2}\,k^{-1/ML}.
\]
Using Eq. 4.13 to translate this result into an equivalent expression in $p_m(k)$ establishes Proposition 8.1.

8.3 Strong Ergodicity

The mutation probability schedule bound advanced in Proposition 8.1 is also sufficient to achieve strong ergodicity if it satisfies the condition on the sequence of vector differences in Theorem A7. The required sequence of vectors can be selected as the sequence of stationary distributions of the time-homogeneous Markov chains associated with the parameter sequence $\{p_m(k)\}$ (or equivalently with the corresponding sequence $\{\alpha(k)\}$). Section 4 establishes that a stationary distribution exists for the time-homogeneous two- and three-operator algorithms corresponding to every value of $\alpha$ satisfying $\alpha > 0$. Thus, associated with the sequence of control parameter values $\{\alpha(k)\}$ is a sequence of vectors $\{q_k\}$ where $q_k = q_\alpha$ evaluated at $\alpha = \alpha(k)$. Further, based upon results established in Section 6, Section 7 demonstrates that an $\alpha \to 0^+$ limiting stationary distribution exists (Propositions 7.5 and 7.6), that the stationary distribution vector varies continuously for all $\alpha$ satisfying $\alpha \ge 0$ (Proposition 7.7) and that its first derivative exists and is continuous
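for all $\alpha$ satisfying $\alpha > 0$ (Proposition 7.3). In particular, $q_\alpha$ is continuous on the closed interval $0 \le \alpha \le 1$ and its first derivative exists at every interior point of that interval. Therefore, if consideration is limited to monotone decreasing control parameter sequences, then by the mean value theorem the difference between the $m$ components of any two consecutive vectors in the sequence can be written as
\[
q_{k+1}(m) - q_k(m) = \left.\frac{dq_\alpha(m)}{d\alpha}\right|_{\alpha = \alpha^*(k)} \big(\alpha(k+1) - \alpha(k)\big)
\]
where the value $\alpha^*(k)$ satisfies $\alpha(k+1) < \alpha^*(k) < \alpha(k)$. Consequently,
\[
|q_{k+1}(m) - q_k(m)| = \left|\frac{dq_\alpha(m)}{d\alpha}\right|_{\alpha = \alpha^*(k)} \times |\alpha(k+1) - \alpha(k)|
\]
and
\[
\sum_{k=1}^{\infty} |q_{k+1}(m) - q_k(m)| = \sum_{k=1}^{\infty} \left|\frac{dq_\alpha(m)}{d\alpha}\right|_{\alpha = \alpha^*(k)} \times |\alpha(k+1) - \alpha(k)|. \qquad \text{(Eq. 8.1)}
\]
From Propositions 7.3 and 7.8, it is possible to define a function $g_\alpha(m)$ which is continuous in $\alpha$ on the closed interval $0 \le \alpha \le 1$ as follows [the definition of $g_\alpha(m)$ in terms of $dq_\alpha(m)/d\alpha$, and the text leading up to Eq. 8.4 that introduces the bound $B$, are missing from the source]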
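\[
\begin{aligned}
\sum_{k=1}^{\infty} |q_{k+1}(m) - q_k(m)| &= \sum_{k=1}^{\infty} \left|\frac{dq_\alpha(m)}{d\alpha}\right|_{\alpha = \alpha^*(k)} \times |\alpha(k+1) - \alpha(k)| \\
&\le \sum_{k=1}^{\infty} B \times |\alpha(k+1) - \alpha(k)| \\
&= B \sum_{k=1}^{\infty} |\alpha(k+1) - \alpha(k)|.
\end{aligned}
\qquad \text{(Eq. 8.4)}
\]
Since only monotonic control parameter sequences are under consideration, the sum in the last line of Eq. 8.4 can be written as the difference of the initial and final parameter values of the sequence. Thus, $\sum_{k=1}^{\infty} |q_{k+1}(m) - q_k(m)|$ [the remainder of this bound and the opening of the statement that follows are missing from the source] the mutation probability bound $p_m(k) \ge \tfrac{1}{2}\,k^{-1/ML}$ is sufficient to ensure strong ergodicity of the corresponding Markov chain. Further, the Markov chain representing any nonstationary two- or three-operator simple genetic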
algorithm for which the mutation probability sequence both observes this bound and converges to zero achieves (asymptotically) the limiting probability distribution defined in Propositions 7.5 and 7.6.

8.4 Comparison With the Simulated Annealing Parameter Bound

It is instructive to compare the mutation probability sequence bound developed here with the annealing schedule bounds reviewed in Section 2.4.2.2, both of which are of the form $K/\log(k)$. Let $\rho(k)$ be defined as the ratio $\rho(k) = p_m(k)/T(k)$ where $p_m(k)$ is selected as the bound developed herein and $T(k)$ is selected as the bound provided by either Eq. 2.12 or Eq. 2.13. That is,
\[
\rho(k) = \tfrac{1}{2}\,k^{-1/ML} \big/ \left[K/\log(k)\right] = \frac{1}{2K}\,\frac{\log(k)}{k^{1/ML}}. \qquad \text{(Eq. 8.5)}
\]
Thus, decreasing values of $\rho(k)$ imply that the genetic algorithm convergence rate is superior (asymptotically) to that of the simulated annealing algorithm. Now, let $k = \exp(x)$, or equivalently $x = \log(k)$. Substituting into Eq. 8.5 yields
\[
\rho(k) = \frac{1}{2K}\,x \exp\!\left(-\frac{x}{ML}\right).
\]
Then, since for all positive constants $\gamma$ the limit of $x\exp(-\gamma x)$ as $x \to \infty$ is zero, it follows that
\[
\lim_{k \to \infty} \rho(k) = \frac{1}{2K}\lim_{k \to \infty}\left[\frac{\log(k)}{k^{1/ML}}\right] = 0. \qquad \text{(Eq. 8.6)}
\]
Thus, the nonstationary simple genetic algorithm provides an asymptotically superior convergence rate.
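The schedule of Proposition 8.1 and the ratio of Eq. 8.5 are simple enough to evaluate directly. The sketch below is illustrative only: the function names are hypothetical, the mutation schedule is taken at its lower bound $p_m(k) = \tfrac{1}{2}k^{-1/ML}$, and the annealing schedule is taken as exactly $K/\log(k)$ with an arbitrary constant $K$.

```python
import math


def mutation_bound(k, M, L):
    """Lower-bound schedule p_m(k) = (1/2) * k**(-1/(M*L)) from Proposition 8.1."""
    return 0.5 * k ** (-1.0 / (M * L))


def annealing_bound(k, K):
    """Annealing schedule of the form K / log(k); requires k > 1."""
    return K / math.log(k)


def ratio(k, M, L, K):
    """rho(k) = p_m(k) / T(k) = log(k) / (2 * K * k**(1/(M*L))), as in Eq. 8.5."""
    return mutation_bound(k, M, L) / annealing_bound(k, K)
```

At the bound, $(2p_m(k))^{ML} = 1/k$, which is the term of the divergent harmonic series used in Section 8.2. Note also that $x\exp(-x/ML)$ attains its maximum at $x = ML$, so $\rho(k)$ increases until $k = \exp(ML)$ and decreases only beyond that point; the comparison in Eq. 8.6 is therefore a statement about very large $k$ when $ML$ is large.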
SECTION 9
REPRESENTATION OF THE STATIONARY DISTRIBUTION SOLUTION

9.1 Overview

Previous sections of this work establish some key results required for extrapolation of the simulated annealing convergence theory onto the nonstationary Markov chain model of the simple genetic algorithm. Specifically, existence of a unique stationary distribution for the time-homogeneous two- and three-operator algorithms is established in Sections 4.3.2 and 4.3.3, and in Section 6 the existence argument is formulated into a Cramer's rule expression for the stationary distribution components. Sections 7 and 8 continue that development by establishing the existence of a limiting distribution as the mutation probability parameter approaches zero and a mutation probability sequence bound sufficient to achieve it.

However, the empirical results in Section 5.5 suggest a complication, confirmed in Section 7, associated with the form of the limiting distribution. The limiting distribution behavior necessary for extending the simulated annealing global optimality result does not obtain because the limiting distribution is nonzero for all states with uniform population (one-operator absorbing states), including those for suboptimal solutions. The limiting distribution entropy results reported at the conclusion of Section 5.5 support the intuitive notion that increasing the population size parameter $M$ should bias the limiting distribution toward the desired behavior. However, to pursue that notion further requires closer examination of the stationary distribution equations and the requirements for their solution. This section begins that task. It is a very extensive development and it stops short of explicit solution. However, it provides some insight into the nature of the solution and, additionally, it defines a promising approach to continuing the work started here.
The essential task of representing the stationary distribution solution consists of evaluating the determinants required to express the results of Propositions 6.7-8 and their limiting counterparts in Propositions 7.5-6. The development proceeds by examining the three distinct cases which arise from applying three different sets of constraints on the value of the mutation probability parameter. The special case $p_m = 1/2 \Leftrightarrow \alpha = 1$ is examined in Section 9.2. It leads to a very simple (trivial) result that is of no particular interest in its own right, but is fundamental to the mechanism employed in Section 9.3 in developing the more general case $0 < \alpha < 1$. The approach pursued in Section 9.3 involves expanding $-|P_{\bar m} - I| = |P - I| - |P_{\bar m} - I|$ as a multivariate Taylor's series in the $N' \times N$ array of conditional probabilities $[P(i \mid n)]$ for $i \in S$ and $n \in S'$ (defined by Eq. 4.14 for the two-operator algorithm and by Eq. 4.22 for its three-operator counterpart) about the point corresponding to $\alpha = 1$. The product of that effort is an expression for the coefficient of the general term of the series as a determinant with combinatorial elements. The case $p_m \to 0^+ \Leftrightarrow \alpha \to 0^+$ is examined in Section 9.4. The methodology developed in Section 9.3 extends with very little modification to represent the $\alpha \to 0^+$ limiting behavior of $|P_{\bar m_A} - I|$. Section 9.5 concludes by pointing out some significant identities which exist among the Taylor's series coefficients and the connection of those identities to the algebra of symmetric and alternating polynomials. Its purpose is to provide a foundation for extending the stationary distribution representation work begun here.

9.2 The Limiting Case $\alpha = 1$

As pointed out in Section 6, the determinants required for expressing the value of the stationary distribution components by Propositions 6.7-8 are the characteristic polynomials of the $P_{\bar m}$ matrices evaluated at $\lambda = 1$.
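The determinant form of the stationary distribution referred to here can be exercised on a small example. The sketch below is a hedged illustration rather than the method of this work: it assumes the convention used throughout (row $n$ of $P$ holds the probabilities $P(m \mid n)$, and $P_{\bar m}$ is $P$ with row $m$ set to zero) and takes the Proposition 6.7 form to be $q(m) = |P_{\bar m} - I| / \sum_n |P_{\bar n} - I|$. The function names and the 3-state matrix used in the usage note are hypothetical.

```python
from fractions import Fraction


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    sign, result = 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return sign * result


def zero_row_minus_identity(P, m):
    """The matrix P_m-bar - I: row m of P is zeroed, then the identity is subtracted."""
    n = len(P)
    return [[(Fraction(0) if r == m else Fraction(P[r][c])) - (1 if r == c else 0)
             for c in range(n)] for r in range(n)]


def stationary_by_determinants(P):
    """Return (q, dets) with q(m) = det(P_m-bar - I) / sum over n of det(P_n-bar - I)."""
    dets = [det(zero_row_minus_identity(P, m)) for m in range(len(P))]
    total = sum(dets)
    return [d / total for d in dets], dets
```

For the row-stochastic matrix with rows $(1/2, 1/2, 0)$, $(1/4, 1/2, 1/4)$, $(0, 1/2, 1/2)$, the three determinants are $-1/8$, $-1/4$, $-1/8$. They share the same sign, and their normalized values $(1/4, 1/2, 1/4)$ satisfy $qP = q$.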
The coefficients of the characteristic polynomial of any square matrix $X$ with finite dimensions can be expressed in terms of the principal minors of $|X|$ (i.e. minors generated from $|X|$ by deleting combinations of rows and columns with the same indices). For example, the characteristic polynomial of $P$ can be expressed as
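\[
\begin{aligned}
\phi(\lambda) = |P - \lambda I| &= A_{N'} - A_{N'-1}\lambda + A_{N'-2}\lambda^2 - A_{N'-3}\lambda^3 + \cdots + (-1)^{N'-1}A_1\lambda^{N'-1} + (-1)^{N'}A_0\lambda^{N'} \\
&= \sum_{u} (-1)^{N'-u} A_u \lambda^{N'-u}
\end{aligned}
\qquad \text{(Eq. 9.1)}
\]
where $N' = \mathrm{card}(S')$ is the dimension of $P$ and $A_u$ is the sum of its order $u$ principal minors. This result is fundamental to the theory of square matrices and follows from application of elementary determinant expansion operations to $|P - \lambda I|$ [Aitk54, MoSt64, Muir60]. Exactly
\[
\binom{N'}{u} = \frac{N'!}{u!\,(N'-u)!}
\]
order $u$ principal minors are summed to produce $A_u$. The values of some of the $A_u$'s are
\[
A_u =
\begin{cases}
|P| & u = N' \\
\mathrm{trace}(P) & u = 1 \\
1 & u = 0
\end{cases}
\qquad \text{(Eq. 9.2)}
\]
where the $A_0$ result follows from the convention that the single order zero principal minor of $|X|$ has value 1. In a fashion exactly analogous to Eq. 9.1, the characteristic polynomial of $P_{\bar m}$ can be written as
\[
\begin{aligned}
\phi_{\bar m}(\lambda) = |P_{\bar m} - \lambda I| &= A^{m}_{N'} - A^{m}_{N'-1}\lambda + A^{m}_{N'-2}\lambda^2 - A^{m}_{N'-3}\lambda^3 + \cdots + (-1)^{N'-1}A^{m}_1\lambda^{N'-1} + (-1)^{N'}A^{m}_0\lambda^{N'} \\
&= \sum_{u} (-1)^{N'-u} A^{m}_u \lambda^{N'-u}
\end{aligned}
\qquad \text{(Eq. 9.3)}
\]
where $A^{m}_u$ is the sum of all order $u$ principal minors of $|P_{\bar m}|$. Thus, the value of each determinant required for expressing $q$ via Propositions 6.7-8 can be written respectively in the form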
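\[
|P_{\bar m} - I| = \phi_{\bar m}(1) = A^{m}_{N'} - A^{m}_{N'-1} + A^{m}_{N'-2} - \cdots + (-1)^{N'-1}A^{m}_1 + (-1)^{N'}A^{m}_0 = \sum_{u} (-1)^{N'-u} A^{m}_u
\]
and
\[
\begin{aligned}
|P - I| - |P_{\bar m} - I| = \phi(1) - \phi_{\bar m}(1) &= (A_{N'} - A^{m}_{N'}) - (A_{N'-1} - A^{m}_{N'-1}) + (A_{N'-2} - A^{m}_{N'-2}) - (A_{N'-3} - A^{m}_{N'-3}) + \cdots \\
&= \sum_{u} (-1)^{N'-u} (A_u - A^{m}_u).
\end{aligned}
\qquad \text{(Eq. 9.4)}
\]
Further, the $A^{m}_u$'s can be expressed in terms of the principal minors of $|P|$ because those principal minors of $|P_{\bar m}|$ which include row $m$ (the zero row) have value zero while those which exclude row $m$ are identical to the corresponding principal minors of $|P|$. In particular, the $A^{m}_u$'s corresponding to Eq. 9.2 are
\[
A^{m}_u =
\begin{cases}
|P_{\bar m}| = 0 & u = N' \\
\mathrm{trace}(P_{\bar m}) = \mathrm{trace}(P) - P(m \mid m) & u = 1 \\
1 & u = 0
\end{cases}
\qquad \text{(Eq. 9.5)}
\]
[Text is missing from the source at this point, including Eq. 9.6, which Section 9.3 cites for the result $P(i \mid n) = 1/2^{L}$ at $\alpha = 1$.] From Eq. 4.11 and Eq. 4.24, it follows that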
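\[
\forall m, n \in S':\quad P(m \mid n)\big|_{\alpha = 1} = \binom{M}{m}\prod_{i \in S}\left(\frac{1}{2^{L}}\right)^{m(i)} = \binom{M}{m}\left(\frac{1}{2}\right)^{ML} \qquad \text{(Eq. 9.7)}
\]
where $\binom{M}{m}$ denotes the multinomial coefficient $M!/\prod_{i \in S} m(i)!$. Thus, $P\big|_{\alpha = 1} = [P(m \mid n)]_{\alpha = 1}$ is a rank one matrix, and therefore all minors of $|P|_{\alpha = 1}$ of order $u \ge 2$ are identically zero. Eq. 9.4 then reduces to
\[
\{|P - I| - |P_{\bar m} - I|\}_{\alpha = 1} = \{\phi(1) - \phi_{\bar m}(1)\}_{\alpha = 1} = (-1)^{N'-1}\{A_1 - A^{m}_1\}_{\alpha = 1} + (-1)^{N'}\{A_0 - A^{m}_0\}_{\alpha = 1},
\]
and substituting from Eq. 9.2, 9.5 and 9.7 into this result produces
\[
\begin{aligned}
\{|P - I| - |P_{\bar m} - I|\}_{\alpha = 1} &= (-1)^{N'-1}\{\mathrm{trace}(P) - [\mathrm{trace}(P) - P(m \mid m)]\}_{\alpha = 1} \\
&= (-1)^{N'-1}P(m \mid m)\big|_{\alpha = 1} \\
&= (-1)^{N'-1}\binom{M}{m}\left(\frac{1}{2}\right)^{ML}.
\end{aligned}
\qquad \text{(Eq. 9.8)}
\]
Employing Eq. 9.8 with Proposition 6.8 yields an explicit result for the $\alpha = 1$ limiting value of $q(m)$, i.e.
\[
q(m)\big|_{\alpha = 1} = \left.\frac{|P - I| - |P_{\bar m} - I|}{\sum_{n \in S'}\big(|P - I| - |P_{\bar n} - I|\big)}\right|_{\alpha = 1}
= \frac{(-1)^{N'-1}\binom{M}{m}\left(\frac{1}{2}\right)^{ML}}{\sum_{n \in S'}(-1)^{N'-1}\binom{M}{n}\left(\frac{1}{2}\right)^{ML}}
= \binom{M}{m}\left(\frac{1}{2}\right)^{ML}.
\]
It is independent of the objective function of the underlying optimization problem because at $p_m = 1/2 \Leftrightarrow \alpha = 1$ mutation completely nullifies the reproduction operator. Although this trivial case is not of any particular interest on its own, it serves as the basis for developing the general case $0 < \alpha < 1$ in Section 9.3. The essential idea is that $P$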
is rank 1 at $\alpha = 1$, which makes the low order derivatives of $P$ with respect to the conditional probabilities $P(i \mid n)$ have comparatively low rank, and this suggests expanding $-|P_{\bar m} - I| = |P - I| - |P_{\bar m} - I|$ in a multivariate Taylor's series about the point corresponding to $\alpha = 1$. The result reflected in Eq. 9.8 is the constant term of the series.

9.3 The General Case $0 < \alpha < 1$

The state transition matrix of the two- and three-operator algorithms is completely determined by the fixed algorithm parameter $M$ and the $N' \times N$ array of conditional probabilities $[P(i \mid n)]$ for $i \in S$ and $n \in S'$. Each element of row $n$ in $P$ consists of a multinomial coefficient and a distinct order $M$ product composed of integral powers of the $P(i \mid n)$'s corresponding to row $n$ (Eq. 4.15 or 4.25). Thus, the order $k$ principal minor of $|P|$ generated by inclusion of rows $K = \{n_1, n_2, \ldots, n_k\} \subset S'$ can be written as an order $k \times M$ polynomial (composed of order $k \times M$ monomial terms) in the $k \times N$ array of variables $[P(i \mid n)]$ for $i \in S$, $n \in K$. The corresponding order $k$ principal minor of $|P_{\bar m}|$ has identical value provided $m \notin K$ and is zero if $m \in K$. These facts, along with the succinct representation of the $P(i \mid n)$'s as rational functions of the objective function and algorithm parameters (Eq. 4.2, 4.11, 4.24) and the degeneration of $P$ to rank 1 at $\alpha = 1$ (Eq. 9.7), suggest an attempt to expand $\phi_{\bar m}(1) = |P_{\bar m} - I|$ as a multivariate Taylor's series in the $P(i \mid n)$'s about the point $\alpha = 1$ where, according to Eq. 9.6, $\forall i \in S, \forall n \in S' : P(i \mid n) = 1/2^{L}$. (Actually, expanding the alternative form $\phi(1) - \phi_{\bar m}(1) = |P - I| - |P_{\bar m} - I|$ in a new array of $N' \times N$ variables which uniquely determines $[P(i \mid n)]$ proves more productive.) The constant term in the series is provided by Eq. 9.8, and the highest order terms in the series are the order $(N' - 1) \times M$ monomials contributed by the single nonzero order $N' - 1$ principal minor of $|P_{\bar m}|$.
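The $\alpha = 1$ case of Section 9.2 is small enough to verify by brute force for tiny $L$ and $M$. The following sketch is an illustration under stated assumptions, not the construction of Section 4: it assumes only that every $P(i \mid n)$ equals $1/2^{L}$ at $\alpha = 1$, so that each row of $P$ is the same multinomial distribution, and the function names are hypothetical. It enumerates the populations, builds $P$, and computes the sums $A_u$ of order $u$ principal minors.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def populations(num_strings, M):
    """All occupancy vectors (n(0), ..., n(N-1)) of nonnegative integers summing to M."""
    if num_strings == 1:
        return [(M,)]
    return [(k,) + rest
            for k in range(M + 1)
            for rest in populations(num_strings - 1, M - k)]


def multinomial(counts):
    """Multinomial coefficient M! / (n(0)! n(1)! ...), with M = sum(counts)."""
    value = factorial(sum(counts))
    for c in counts:
        value //= factorial(c)
    return value


def transition_matrix_at_alpha_one(L, M):
    """P(m | n) at alpha = 1, assuming every P(i | n) equals 1 / 2**L."""
    states = populations(2 ** L, M)
    row = [Fraction(multinomial(m), 2 ** (M * L)) for m in states]
    return states, [list(row) for _ in states]


def det(matrix):
    """Exact determinant by Gaussian elimination (the empty matrix has determinant 1)."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    sign, result = 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return sign * result


def principal_minor_sum(matrix, u):
    """A_u: the sum of all order-u principal minors of the matrix."""
    indices = range(len(matrix))
    return sum((det([[matrix[r][c] for c in idx] for r in idx])
                for idx in combinations(indices, u)), Fraction(0))
```

With $L = 1$ and $M = 2$ there are three populations, every row of $P$ is $(1/4, 1/2, 1/4)$, and the minor sums are $A_0 = A_1 = 1$ with $A_2 = A_3 = 0$, as the rank one argument predicts. Any row of $P$ is then itself the stationary distribution.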
Let $r = [r(i, n)]$ be an $N' \times N$ nonnegative integer array having rows of the form $r_n = (r(0, n), r(1, n), \ldots, r(2^{L} - 1, n))$. The nonnegative integer $r(i, n)$ represents the exponent of the factor $(P(i \mid n) - 1/2^{L})$ appearing in a monomial term of the Taylor's series.
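Also, let $\|r_n\| = \sum_{i \in S} r(i, n)$ and $\|r\| = \sum_{n \in S'} \|r_n\| = \sum_{n \in S'}\sum_{i \in S} r(i, n)$. Then, the Taylor's series expansion of $-|P_{\bar m} - I| = |P - I| - |P_{\bar m} - I|$ can be expressed as
\[
-|P_{\bar m} - I| = |P - I| - |P_{\bar m} - I| = \sum_{r} C(m, r) \times \prod_{n \in S'}\prod_{i \in S}\left(P(i \mid n) - 1/2^{L}\right)^{r(i, n)} \qquad \text{(Eq. 9.9)}
\]
where
\[
\begin{aligned}
C(m, r) &= \left.\frac{\partial^{\|r\|}\big({-|P_{\bar m} - I|}\big)}{\prod_{n \in S'}\prod_{i \in S} r(i, n)!\;\partial^{r(i, n)}P(i \mid n)}\right|_{\alpha = 1} = \frac{\big({-|P_{\bar m} - I|}\big)^{(r)}\big|_{\alpha = 1}}{r!} \\
&= \left.\frac{\partial^{\|r\|}\big(|P - I| - |P_{\bar m} - I|\big)}{\prod_{n \in S'}\prod_{i \in S} r(i, n)!\;\partial^{r(i, n)}P(i \mid n)}\right|_{\alpha = 1} = \frac{\big(|P - I| - |P_{\bar m} - I|\big)^{(r)}\big|_{\alpha = 1}}{r!}
\end{aligned}
\qquad \text{(Eq. 9.10)}
\]
is the coefficient of the order $\|r\|$ monomial term uniquely identified by the nonnegative integer array $r$. In these expressions, the symbol $r!$ denotes the operation
\[
r! = \prod_{n \in S'}\prod_{i \in S}\,[r(i, n)!].
\]
Expressing the value of $C(m, r)$ thus reduces to evaluating the indicated mixed partial derivative of $-|P_{\bar m} - I| = |P - I| - |P_{\bar m} - I|$ at $\alpha = 1$ divided by $r!$. The coefficient of the order $\|r\| = \|0\| = 0$ term is
\[
C(m, 0) = \big(|P - I| - |P_{\bar m} - I|\big)^{(0)}\big|_{\alpha = 1} = \{|P - I| - |P_{\bar m} - I|\}_{\alpha = 1}
\]
and its value is the constant term of the series, provided by Eq. 9.8. The coefficient of the first order monomial term which results from setting
\[
r^{(1)} = [r(i, n)^{(1)}] \quad\text{where}\quad r(i, n)^{(1)} =
\begin{cases}
1 & i = i_1,\ n = n_1 \\
0 & \text{otherwise}
\end{cases}
\]
is given by
\[
C(m, r^{(1)}) = \left.\frac{\partial\big(|P - I| - |P_{\bar m} - I|\big)}{\partial P(i_1 \mid n_1)}\right|_{\alpha = 1} = \big(|P - I| - |P_{\bar m} - I|\big)^{(r^{(1)})}\big|_{\alpha = 1}.
\]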
The associated monomial term in the Taylor's series expansion of $-|\bar P_{\bar m} - \bar I| = |\bar P - \bar I| - |\bar P_{\bar m} - \bar I|$ is given by $C(\bar m,\bar r^{(1)}) \times (P(i_1 \mid \bar n_1) - 1/2^L)$.

In subsequent paragraphs, $|\bar r_{\bar m}| = 0$ and $|\bar r_{\bar n}| \le M$ for $\bar n \ne \bar m$, which together imply $|\bar r| \le (N' - 1) \times M$, are shown to be suitable upper bounds on the order of differentiation with respect to the $P(i \mid \bar n)$'s when computing $C(\bar m,\bar r)$. Thus, the Taylor's series terminates at finite order, as indeed it must since $|\bar P_{\bar m} - \bar I|$ is a polynomial function of the $P(i \mid \bar n)$'s, and as noted earlier, the highest order monomial terms are order $(N' - 1) \times M$.

These upper bounds on the order of differentiation (i.e. upper bounds on $|\bar r_{\bar n}|$), along with the lower bound of 0 on every component of $\bar r$ imposed by the requirement that $\bar r$ be a nonnegative integer array, can be represented to advantage in terms of a set related to $S'$. Let $S'$, which is completely determined by the parameters $L$ and $M$, be momentarily represented by $S'(M)$ (that is, let its dependence on $M$ be explicitly indicated) and let the set $S''$ be defined as the set union of all $S'(k)$ for $0 \le k \le M$. That is

$$S'' = S'(0) \cup S'(1) \cup \cdots \cup S'(M-1) \cup S'(M) = \bigcup_{k=0}^{M} S'(k).$$

The above constraints on the rows of $\bar r$ are then equivalent to requiring that every row of $\bar r$ be drawn from $S''$, with the additional requirement that row $\bar m$ be the specific element $\bar r_{\bar m} = \bar 0 \in S''$. Since the $S'(k)$ for distinct $k$ are disjoint, it follows that $N'' = \mathrm{card}(S'')$ is the sum of the $N'(k) = \mathrm{card}[S'(k)]$, and consequently from Eq. 4.1 and an elementary recursion on the binomial coefficient that

$$N'' = \sum_{k=0}^{M} \binom{N + k - 1}{k} = \binom{N + M}{M}.$$

This result is precisely that supplied by Eq. 5.1, accompanying the state space enumeration empirical results tabulated in Section 5.2.
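The counting argument for $S''$ can be checked by brute force. The following is a minimal sketch, assuming that a state in $S'(k)$ is an occupancy vector of $N = 2^L$ nonnegative integers summing to $k$, so that $N'(k) = \binom{N+k-1}{k}$; the function names are hypothetical and chosen only for this illustration.

```python
from itertools import combinations_with_replacement
from math import comb


def populations(num_strings, size):
    """Yield every occupancy vector with `num_strings` entries summing to `size`."""
    for members in combinations_with_replacement(range(num_strings), size):
        counts = [0] * num_strings
        for s in members:
            counts[s] += 1
        yield tuple(counts)


def card_s_double_prime(L, M):
    """Count S'' = S'(0) u S'(1) u ... u S'(M) by direct enumeration."""
    N = 2 ** L
    union = set()
    for k in range(M + 1):
        union.update(populations(N, k))
    return len(union)


# Illustrative parameters (not taken from the text): L = 2, M = 3.
L, M = 2, 3
N = 2 ** L
enumerated = card_s_double_prime(L, M)
closed_form = comb(N + M, M)  # assumed closed form for N''
print(enumerated, closed_form)
```

Because vectors with different sums cannot coincide, the union is automatically disjoint, which is why the enumerated count equals the sum of the individual $N'(k)$.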
Since $\bar P_{\bar m}$ is independent of the $P(i \mid \bar m)$'s associated with row $\bar m$ (row $\bar m$ of $\bar P_{\bar m}$ is set to $\bar 0$), it follows that no monomial term containing any of the factors $(P(i \mid \bar m) - 1/2^L)$ appears in the expansion of $|\bar P_{\bar m} - \bar I|$. Further, since $|\bar P - \bar I| - |\bar P_{\bar m} - \bar I| = -|\bar P_{\bar m} - \bar I|$, no such monomial terms appear with nonzero coefficient in the expansion of $|\bar P - \bar I| - |\bar P_{\bar m} - \bar I|$ either. (Due to the constraint $\forall \bar n \in S' : \sum_{i \in S} P(i \mid \bar n) = 1$, the aggregate of such terms appearing in $|\bar P - \bar I|$ is identically zero.) This observation establishes the $|\bar r_{\bar m}| = 0$ bound and permits the following revision of the Eq. 9.10 definition of $C(\bar m,\bar r)$:

$$C(\bar m,\bar r) = \begin{cases} \dfrac{1}{\bar r!} \left. \left(|\bar P - \bar I| - |\bar P_{\bar m} - \bar I|\right)^{(\bar r)} \right|_{\alpha=1} & |\bar r_{\bar m}| = 0 \\ 0 & \text{otherwise} \end{cases} \tag{9.11}$$

The derivative of an order $n$ determinant with respect to a variable $x$ can be written as the sum of the $n$ determinants generated by differentiating each row (or column) in turn with respect to $x$ [Aitk54, MoSt64, Muir60]. For example, if

$$A = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}$$

then

$$\frac{dA}{dx} = \begin{vmatrix} \dfrac{da_{11}}{dx} & \dfrac{da_{12}}{dx} \\ a_{21} & a_{22} \end{vmatrix} + \begin{vmatrix} a_{11} & a_{12} \\ \dfrac{da_{21}}{dx} & \dfrac{da_{22}}{dx} \end{vmatrix}.$$

If the elements of any row in the given determinant are independent of $x$, then differentiation of that row introduces an all zero row and the value of the corresponding determinant is zero. In particular, if only one row of the given determinant depends upon $x$, then only one nonzero determinant appears in the row-derivative expansion. Higher order and mixed partial derivatives of an order $n$ determinant can be expressed similarly, e.g.
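$$\begin{aligned} \frac{\partial^3 A}{\partial x_1^2 \partial x_2} ={}& \begin{vmatrix} \dfrac{\partial^3 a_{11}}{\partial x_1^2 \partial x_2} & \dfrac{\partial^3 a_{12}}{\partial x_1^2 \partial x_2} \\ a_{21} & a_{22} \end{vmatrix} + \begin{vmatrix} \dfrac{\partial^2 a_{11}}{\partial x_1^2} & \dfrac{\partial^2 a_{12}}{\partial x_1^2} \\ \dfrac{\partial a_{21}}{\partial x_2} & \dfrac{\partial a_{22}}{\partial x_2} \end{vmatrix} + 2 \begin{vmatrix} \dfrac{\partial^2 a_{11}}{\partial x_1 \partial x_2} & \dfrac{\partial^2 a_{12}}{\partial x_1 \partial x_2} \\ \dfrac{\partial a_{21}}{\partial x_1} & \dfrac{\partial a_{22}}{\partial x_1} \end{vmatrix} \\ &+ 2 \begin{vmatrix} \dfrac{\partial a_{11}}{\partial x_1} & \dfrac{\partial a_{12}}{\partial x_1} \\ \dfrac{\partial^2 a_{21}}{\partial x_1 \partial x_2} & \dfrac{\partial^2 a_{22}}{\partial x_1 \partial x_2} \end{vmatrix} + \begin{vmatrix} \dfrac{\partial a_{11}}{\partial x_2} & \dfrac{\partial a_{12}}{\partial x_2} \\ \dfrac{\partial^2 a_{21}}{\partial x_1^2} & \dfrac{\partial^2 a_{22}}{\partial x_1^2} \end{vmatrix} + \begin{vmatrix} a_{11} & a_{12} \\ \dfrac{\partial^3 a_{21}}{\partial x_1^2 \partial x_2} & \dfrac{\partial^3 a_{22}}{\partial x_1^2 \partial x_2} \end{vmatrix} \end{aligned}$$

and again, differentiation of any row with respect to a variable upon which it does not depend introduces an all zero row. Thus, if in the preceding result the first row of $A$ is independent of $x_2$ and the second of $x_1$, then only one of the determinants in the expansion survives:

$$\frac{\partial^3 A}{\partial x_1^2 \partial x_2} = \begin{vmatrix} \dfrac{\partial^2 a_{11}}{\partial x_1^2} & \dfrac{\partial^2 a_{12}}{\partial x_1^2} \\ \dfrac{\partial a_{21}}{\partial x_2} & \dfrac{\partial a_{22}}{\partial x_2} \end{vmatrix}.$$

Since each $P(i \mid \bar n)$ appears in only one row of $-|\bar P_{\bar m} - \bar I| = |\bar P - \bar I| - |\bar P_{\bar m} - \bar I|$, it follows from application of the preceding determinant differentiation rules that the mixed partial derivative $|\bar P - \bar I|^{(\bar r)}$ can be written as the single determinant (indicated hereafter by $|(\bar P - \bar I)^{(\bar r)}| = |\bar P^{(\bar r)} - \bar I^{(\bar r)}|$) generated by differentiating the rows of the matrix $(\bar P - \bar I)$ in accordance with $\bar r$ and then computing the determinant of the matrix derivative. That is, due to the single-row dependence of $(\bar P - \bar I)$ on each $P(i \mid \bar n)$, the two operations involved (differentiating $(\bar P - \bar I)$ and evaluating its determinant) commute. The same conclusion applies to any mixed partial derivative of $|\bar P_{\bar m} - \bar I|$ with respect to the $P(i \mid \bar n)$'s, and hence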
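$$\left(|\bar P - \bar I| - |\bar P_{\bar m} - \bar I|\right)^{(\bar r)} = |\bar P - \bar I|^{(\bar r)} - |\bar P_{\bar m} - \bar I|^{(\bar r)} = |(\bar P - \bar I)^{(\bar r)}| - |(\bar P_{\bar m} - \bar I)^{(\bar r)}| = |\bar P^{(\bar r)} - \bar I^{(\bar r)}| - |\bar P_{\bar m}^{(\bar r)} - \bar I^{(\bar r)}|. \tag{9.12}$$

Eq. 9.4 can be generalized to express the value of $(|\bar P - \bar I| - |\bar P_{\bar m} - \bar I|)^{(\bar r)}$ as indicated in the following. Let $\bar r$ have $k \le N' - 1$ rows which specify nonzero order differentiation, let $K = \{\bar n_1, \bar n_2, \ldots, \bar n_k\} \subset S'$ be the set of differentiated-row indices and further let $\bar m \notin K$. Also, for $N' \ge u \ge k$ let $\Delta_u(\bar r)$ be the sum of all order $u$ principal minors of $|\bar P^{(\bar r)}|$ formed by including the $k$ differentiated rows indicated by $\bar n \in K$ and $u - k$ of the $N' - k$ undifferentiated rows in $|\bar P^{(\bar r)}|$. Exactly

$$\binom{N' - k}{u - k} = \frac{(N' - k)!}{(u - k)!\,(N' - u)!}$$

order $u$ principal minors are summed to produce $\Delta_u(\bar r)$. Finally, let $\Delta_u^{\bar m}(\bar r)$ be defined similarly for $|\bar P_{\bar m}^{(\bar r)}|$. Then, applying the same elementary determinant expansion rules that lead to Eq. 9.1 and Eq. 9.3 to $|\bar P^{(\bar r)} - \lambda \bar I^{(\bar r)}|$ and $|\bar P_{\bar m}^{(\bar r)} - \lambda \bar I^{(\bar r)}|$ yields

$$|\bar P^{(\bar r)} - \lambda \bar I^{(\bar r)}| = \sum_{u=k}^{N'} (-\lambda)^{N'-u} \Delta_u(\bar r) \quad \text{and} \quad |\bar P_{\bar m}^{(\bar r)} - \lambda \bar I^{(\bar r)}| = \sum_{u=k}^{N'} (-\lambda)^{N'-u} \Delta_u^{\bar m}(\bar r),$$

and substituting these results into Eq. 9.12 with $\lambda = 1$ yields the differentiated analog of Eq. 9.4

$$\left(|\bar P - \bar I| - |\bar P_{\bar m} - \bar I|\right)^{(\bar r)} = |\bar P^{(\bar r)} - \bar I^{(\bar r)}| - |\bar P_{\bar m}^{(\bar r)} - \bar I^{(\bar r)}| = \sum_{u=k}^{N'} (-1)^{N'-u} \left(\Delta_u(\bar r) - \Delta_u^{\bar m}(\bar r)\right). \tag{9.13}$$

If $|\bar P^{(\bar r)}|_K$ is the order $k$ principal minor of $|\bar P^{(\bar r)}|$ uniquely defined by the set of row/column indices $K = \{\bar n_1, \bar n_2, \ldots, \bar n_k\} \subset S'$ where $\bar m \notin K$ and if $|\bar P^{(\bar r)}|_{K_{\bar n}}$ is the order $k+1$ principal minor generated by including the undifferentiated row $\bar n \in S' - K$ with $K$ (i.e.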
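$K_{\bar n} = K \cup \bar n = \{\bar n_1, \bar n_2, \ldots, \bar n_k, \bar n\} \subset S'$), then

$$\Delta_u(\bar r) = \begin{cases} |\bar P^{(\bar r)}| & u = N' \\ \sum_{\bar n \in S' - K} |\bar P^{(\bar r)}|_{K_{\bar n}} & u = k + 1 \\ |\bar P^{(\bar r)}|_K & u = k \end{cases} \tag{9.14}$$

Also, since every principal minor of $|\bar P_{\bar m}^{(\bar r)}|$ is either identical to the corresponding principal minor of $|\bar P^{(\bar r)}|$ or 0 depending upon whether or not it includes row $\bar m$, and since $\bar m \notin K$ by hypothesis, it follows that the $\Delta_u^{\bar m}(\bar r)$'s corresponding to Eq. 9.14 are

$$\Delta_u^{\bar m}(\bar r) = \begin{cases} |\bar P_{\bar m}^{(\bar r)}| = 0 & u = N' \\ \sum_{\bar n \in S' - K - \{\bar m\}} |\bar P^{(\bar r)}|_{K_{\bar n}} = \Delta_{k+1}(\bar r) - |\bar P^{(\bar r)}|_{K_{\bar m}} & u = k + 1 \\ |\bar P^{(\bar r)}|_K & u = k \end{cases} \tag{9.15}$$

All of the $N' - k$ undifferentiated rows in $\bar P^{(\bar r)}|_{\alpha=1}$ are identical (Eq. 9.7) and consequently all minors of $\bar P^{(\bar r)}|_{\alpha=1}$ of order $u \ge k + 2$ have value 0. Thus, in a fashion exactly analogous to the derivation of Eq. 9.8 from Eq. 9.4, Eq. 9.13 yields

$$\left\{\left(|\bar P - \bar I| - |\bar P_{\bar m} - \bar I|\right)^{(\bar r)}\right\}_{\alpha=1} = (-1)^{N'-k-1} \left\{\Delta_{k+1}(\bar r) - \Delta_{k+1}^{\bar m}(\bar r)\right\}_{\alpha=1} + (-1)^{N'-k} \left\{\Delta_k(\bar r) - \Delta_k^{\bar m}(\bar r)\right\}_{\alpha=1},$$

and substituting from Eq. 9.14 and 9.15 into this result produces

$$\left\{\left(|\bar P - \bar I| - |\bar P_{\bar m} - \bar I|\right)^{(\bar r)}\right\}_{\alpha=1} = (-1)^{N'-k-1} \left\{\Delta_{k+1}(\bar r) - \Delta_{k+1}^{\bar m}(\bar r)\right\}_{\alpha=1} = (-1)^{N'-k-1} \left. |\bar P^{(\bar r)}|_{K_{\bar m}} \right|_{\alpha=1} \tag{9.16}$$

from which, by substitution into Eq. 9.11, it follows that

$$C(\bar m,\bar r) = \begin{cases} (-1)^{N'-k-1} \dfrac{\left. |\bar P^{(\bar r)}|_{K_{\bar m}} \right|_{\alpha=1}}{\bar r!} & |\bar r_{\bar m}| = 0 \\ 0 & \text{otherwise} \end{cases} \tag{9.17}$$

Evaluating $C(\bar m,\bar r)$ thus requires evaluating the quotient of the order $k + 1$ principal minor $|\bar P^{(\bar r)}|_{K_{\bar m}}|_{\alpha=1}$ and $\bar r!$. The order $k + 1$ principal minor $|\bar P^{(\bar r)}|_{K_{\bar m}}$ is completely determined by the $(k + 1) \times (k + 1)$ subarray of $\bar P^{(\bar r)}$ given by $P^{(\bar r)}(\bar w \mid \bar v)$ for $\bar w, \bar v \in K_{\bar m}$ where $P^{(\bar r)}(\bar w \mid \bar v)$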
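denotes the indicated mixed partial derivative of $P(\bar w \mid \bar v)$. Further, in computing the determinant of this subarray, the order in which the row/column indices are drawn from $K_{\bar m}$ in the subarray construction is immaterial because any transposition of the order introduces exactly one row transposition and one column transposition into the subarray, so both the magnitude and the algebraic sign of its determinant are preserved. Thus, the most general form of $|\bar P^{(\bar r)}|_{K_{\bar m}}$ can be expressed as

$$|\bar P^{(\bar r)}|_{K_{\bar m}} = \begin{vmatrix} P^{(\bar r)}(\bar m \mid \bar m) & P^{(\bar r)}(\bar n_1 \mid \bar m) & \cdots & P^{(\bar r)}(\bar n_k \mid \bar m) \\ P^{(\bar r)}(\bar m \mid \bar n_1) & P^{(\bar r)}(\bar n_1 \mid \bar n_1) & \cdots & P^{(\bar r)}(\bar n_k \mid \bar n_1) \\ \vdots & \vdots & \ddots & \vdots \\ P^{(\bar r)}(\bar m \mid \bar n_k) & P^{(\bar r)}(\bar n_1 \mid \bar n_k) & \cdots & P^{(\bar r)}(\bar n_k \mid \bar n_k) \end{vmatrix} \tag{9.18}$$

From Eq. 4.15 and 4.25, it follows that each nonzero element in row $\bar v \in K_{\bar m}$ of $\bar P^{(\bar r)}$ is composed of a combinatorial coefficient and an order $M - |\bar r_{\bar v}|$ product of the $P(i \mid \bar v)$'s. The general form of the element in column $\bar w \in K_{\bar m}$ of row $\bar v$ is given by

$$P^{(\bar r)}(\bar w \mid \bar v) = \begin{cases} \dfrac{M!}{\prod_{i \in S} w(i)!} \prod_{i \in S} \dfrac{w(i)!}{(w(i) - r(i,\bar v))!} P(i \mid \bar v)^{w(i) - r(i,\bar v)} & \forall i \in S : w(i) \ge r(i,\bar v) \\ 0 & \text{otherwise} \end{cases}$$

which can be rewritten as

$$P^{(\bar r)}(\bar w \mid \bar v) = \begin{cases} \dfrac{M!}{\prod_{i \in S} r(i,\bar v)!\,(w(i) - r(i,\bar v))!} \times \prod_{i \in S} [r(i,\bar v)!]\, P(i \mid \bar v)^{w(i) - r(i,\bar v)} & \forall i \in S : w(i) \ge r(i,\bar v) \\ 0 & \text{otherwise} \end{cases} \tag{9.19}$$

Further, by noting that the factor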
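$$\frac{M!}{\prod_{i \in S} r(i,\bar v)!\,(w(i) - r(i,\bar v))!}$$

is a multinomial coefficient and designating it (via straightforward generalization of the convention introduced in Eq. 4.4) by

$$\binom{M}{\bar w, \bar r_{\bar v}} = \begin{cases} \dfrac{M!}{\prod_{i \in S} (w(i) - r(i,\bar v))!\, r(i,\bar v)!} & \forall i \in S : w(i) \ge r(i,\bar v) \\ 0 & \text{otherwise} \end{cases} \tag{9.20}$$

Eq. 9.19 simplifies to

$$P^{(\bar r)}(\bar w \mid \bar v) = \binom{M}{\bar w, \bar r_{\bar v}} \prod_{i \in S} \left[ r(i,\bar v)!\, P(i \mid \bar v)^{w(i) - r(i,\bar v)} \right]. \tag{9.21}$$

If row $\bar v$ is undifferentiated (i.e. $|\bar r_{\bar v}| = 0$), then Eq. 9.21 becomes

$$P^{(\bar r)}(\bar w \mid \bar v) = \binom{M}{\bar w} \prod_{i \in S} P(i \mid \bar v)^{w(i)} = P(\bar w \mid \bar v).$$

It is noted in passing that if $M \le N = 2^L$ then it follows from Eq. 9.20 that either

$$\binom{M}{\bar w, \bar r_{\bar v}} = 0 \quad \text{or} \quad \exists\, \bar w' \in S' \ni \binom{M}{\bar w, \bar r_{\bar v}} = \binom{M}{\bar w'}. \tag{9.22}$$

In the latter case, it is also true that

$$\binom{M}{\bar w'} = \binom{M}{\bar w, \bar r_{\bar v}} \ge \binom{M}{\bar w}.$$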
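The generalized multinomial coefficient of Eq. 9.20 is easy to tabulate. The sketch below is one possible reading of that definition, assuming states and differentiation rows are given as tuples of nonnegative integers; the function name `multinomial` is a hypothetical helper, not notation from the text.

```python
from math import factorial


def multinomial(M, w, r=None):
    """Generalized multinomial coefficient (M; w, r) in the sense of Eq. 9.20.

    Returns M! / prod_i [(w(i) - r(i))! r(i)!], or 0 if some w(i) < r(i).
    With r omitted (an undifferentiated row) it reduces to M! / prod_i w(i)!.
    """
    if r is None:
        r = (0,) * len(w)
    if any(wi < ri for wi, ri in zip(w, r)):
        return 0
    denom = 1
    for wi, ri in zip(w, r):
        denom *= factorial(wi - ri) * factorial(ri)
    return factorial(M) // denom


# One-bit, M = 2 illustration: column state w = (2, 0), row pattern r = (1, 0).
plain = multinomial(2, (2, 0))                  # ordinary coefficient (M; w)
differentiated = multinomial(2, (2, 0), (1, 0))  # generalized coefficient (M; w, r)
blocked = multinomial(2, (0, 2), (1, 0))         # w(0) < r(0), so the entry vanishes
print(plain, differentiated, blocked)
```

In this instance the nonzero generalized coefficient equals the ordinary coefficient of the state $(1,1)$ and is no smaller than that of $(2,0)$, which is the pattern described around Eq. 9.22.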
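The enabling condition imposes no practical limitation because any algorithm with $M > N$ could be effectively supplanted by exhaustive search over $S$.

Since $|\bar P^{(\bar r)}|_{K_{\bar m}}$ includes every row of $\bar P^{(\bar r)}$ which is differentiated to nonzero order (i.e. $\forall \bar n \in S' \ni |\bar r_{\bar n}| > 0$), it follows from Eq. 9.18, Eq. 9.20 and Eq. 9.21 that any row $\bar n$ for which $|\bar r_{\bar n}| > M$ introduces an all zero row into $|\bar P^{(\bar r)}|_{K_{\bar m}}$, making both $|\bar P^{(\bar r)}|_{K_{\bar m}} = 0$ and $C(\bar m,\bar r) = 0$. Therefore, $|\bar r_{\bar n}| \le M$ represents a suitable upper bound on $|\bar r_{\bar n}|$ for $\bar n \ne \bar m$. This bound, along with the previously established condition $|\bar r_{\bar m}| = 0$, implies $|\bar r| \le (N' - 1) \times M$ and permits the following revision of the Eq. 9.17 definition of $C(\bar m,\bar r)$:

$$C(\bar m,\bar r) = \begin{cases} (-1)^{N'-k-1} \dfrac{\left. |\bar P^{(\bar r)}|_{K_{\bar m}} \right|_{\alpha=1}}{\bar r!} & |\bar r_{\bar m}| = 0,\ |\bar r_{\bar n}| \le M \text{ for } \bar n \ne \bar m \\ 0 & \text{otherwise} \end{cases}$$

Further, the conditions in this result can be expressed in terms of $S''$, yielding

$$C(\bar m,\bar r) = \begin{cases} (-1)^{N'-k-1} \dfrac{\left. |\bar P^{(\bar r)}|_{K_{\bar m}} \right|_{\alpha=1}}{\bar r!} & \bar r_{\bar m} = \bar 0,\ \bar r_{\bar n} \in S'' \text{ for } \bar n \ne \bar m \\ 0 & \text{otherwise} \end{cases} \tag{9.23}$$

At $\alpha = 1$, using Eq. 9.6 in Eq. 9.21 yields

$$\left. P^{(\bar r)}(\bar w \mid \bar v) \right|_{\alpha=1} = \binom{M}{\bar w, \bar r_{\bar v}} \prod_{i \in S} \left[ r(i,\bar v)!\, (1/2^L)^{w(i) - r(i,\bar v)} \right] = \binom{M}{\bar w, \bar r_{\bar v}} (1/2^L)^{M - |\bar r_{\bar v}|} \prod_{i \in S} [r(i,\bar v)!]. \tag{9.24}$$

Thus, every element in row $\bar v \in K_{\bar m}$ of $|\bar P^{(\bar r)}|_{K_{\bar m}}$ includes the constant factor

$$(1/2^L)^{M - |\bar r_{\bar v}|} \prod_{i \in S} [r(i,\bar v)!].$$

Substituting Eq. 9.24 into Eq. 9.18 and collecting these common row factors outside the determinant yields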
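$$\left. |\bar P^{(\bar r)}|_{K_{\bar m}} \right|_{\alpha=1} = (1/2^L)^{(k+1)M - |\bar r|} \times \prod_{\bar v \in K_{\bar m}} \prod_{i \in S} r(i,\bar v)! \times \begin{vmatrix} \binom{M}{\bar m, \bar r_{\bar m}} & \binom{M}{\bar n_1, \bar r_{\bar m}} & \cdots & \binom{M}{\bar n_k, \bar r_{\bar m}} \\ \binom{M}{\bar m, \bar r_{\bar n_1}} & \binom{M}{\bar n_1, \bar r_{\bar n_1}} & \cdots & \binom{M}{\bar n_k, \bar r_{\bar n_1}} \\ \vdots & \vdots & \ddots & \vdots \\ \binom{M}{\bar m, \bar r_{\bar n_k}} & \binom{M}{\bar n_1, \bar r_{\bar n_k}} & \cdots & \binom{M}{\bar n_k, \bar r_{\bar n_k}} \end{vmatrix}.$$

Also, since

$$\bar r_{\bar m} = \bar 0 \Rightarrow \binom{M}{\bar w, \bar r_{\bar m}} = \binom{M}{\bar w, \bar 0} = \binom{M}{\bar w}$$

and since

$$\prod_{\bar v \in K_{\bar m}} \prod_{i \in S} r(i,\bar v)! = \bar r!,$$

$$\left. |\bar P^{(\bar r)}|_{K_{\bar m}} \right|_{\alpha=1} = (1/2^L)^{(k+1)M - |\bar r|} \times \bar r! \times \begin{vmatrix} \binom{M}{\bar m} & \binom{M}{\bar n_1} & \cdots & \binom{M}{\bar n_k} \\ \binom{M}{\bar m, \bar r_{\bar n_1}} & \binom{M}{\bar n_1, \bar r_{\bar n_1}} & \cdots & \binom{M}{\bar n_k, \bar r_{\bar n_1}} \\ \vdots & \vdots & \ddots & \vdots \\ \binom{M}{\bar m, \bar r_{\bar n_k}} & \binom{M}{\bar n_1, \bar r_{\bar n_k}} & \cdots & \binom{M}{\bar n_k, \bar r_{\bar n_k}} \end{vmatrix}$$

and substitution into Eq. 9.23 yields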
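$$C(\bar m,\bar r) = (-1)^{N'-k-1} \times (1/2^L)^{(k+1)M - |\bar r|} \times \begin{vmatrix} \binom{M}{\bar m} & \binom{M}{\bar n_1} & \cdots & \binom{M}{\bar n_k} \\ \binom{M}{\bar m, \bar r_{\bar n_1}} & \binom{M}{\bar n_1, \bar r_{\bar n_1}} & \cdots & \binom{M}{\bar n_k, \bar r_{\bar n_1}} \\ \vdots & \vdots & \ddots & \vdots \\ \binom{M}{\bar m, \bar r_{\bar n_k}} & \binom{M}{\bar n_1, \bar r_{\bar n_k}} & \cdots & \binom{M}{\bar n_k, \bar r_{\bar n_k}} \end{vmatrix} \tag{9.25}$$

Note that the condition $\bar r_{\bar m} = \bar 0$ is implicitly asserted in this result by the form of the first row of the combinatorial determinant, and that the condition $\bar r_{\bar n} \in S''$ for $\bar n \ne \bar m$ is enforced by the definition in Eq. 9.20.

When Eq. 9.25 is employed with Eq. 9.9, an additional simplification becomes available. The simplification obtains by incorporating the factor $(1/2^L)^{-|\bar r|} = (2^L)^{|\bar r|}$ present in the Eq. 9.25 definition of $C(\bar m,\bar r)$ with the product factor in Eq. 9.9. That is

$$\begin{aligned} -|\bar P_{\bar m} - \bar I| = |\bar P - \bar I| - |\bar P_{\bar m} - \bar I| &= \sum_{\bar r} C(\bar m,\bar r) \times \prod_{\bar n \in S'} \prod_{i \in S} \left(P(i \mid \bar n) - 1/2^L\right)^{r(i,\bar n)} \\ &= \sum_{\bar r} C(\bar m,\bar r) \times (1/2^L)^{|\bar r|} \times (2^L)^{|\bar r|} \prod_{\bar n \in S'} \prod_{i \in S} \left(P(i \mid \bar n) - 1/2^L\right)^{r(i,\bar n)} \\ &= \sum_{\bar r} C'(\bar m,\bar r) \times \prod_{\bar n \in S'} \prod_{i \in S} \left(2^L P(i \mid \bar n) - 1\right)^{r(i,\bar n)} \end{aligned} \tag{9.26}$$

where

$$C'(\bar m,\bar r) = C(\bar m,\bar r) \times (1/2^L)^{|\bar r|}$$

is the coefficient of the indicated monomial in the new variables. Substitution of Eq. 9.25 into this expression for $C'(\bar m,\bar r)$ yields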
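$$C'(\bar m,\bar r) = (-1)^{N'-k-1} \times (1/2^L)^{(k+1)M} \times \begin{vmatrix} \binom{M}{\bar m} & \binom{M}{\bar n_1} & \cdots & \binom{M}{\bar n_k} \\ \binom{M}{\bar m, \bar r_{\bar n_1}} & \binom{M}{\bar n_1, \bar r_{\bar n_1}} & \cdots & \binom{M}{\bar n_k, \bar r_{\bar n_1}} \\ \vdots & \vdots & \ddots & \vdots \\ \binom{M}{\bar m, \bar r_{\bar n_k}} & \binom{M}{\bar n_1, \bar r_{\bar n_k}} & \cdots & \binom{M}{\bar n_k, \bar r_{\bar n_k}} \end{vmatrix} \tag{9.27}$$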
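The series of Eq. 9.26 with the coefficients of Eq. 9.27 can be compared against a direct evaluation of the determinant for a very small problem. The following is a minimal sketch under stated assumptions: the sign factor is read as $(-1)^{N'-k-1}$, the scale factor as $(1/2^L)^{(k+1)M}$, the transition matrix entries are taken to be multinomial in the conditional probabilities, and the conditional probability values in `cond` are arbitrary illustrative numbers rather than values from the text. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def compositions(parts, total):
    """All tuples of `parts` nonnegative integers summing to `total`."""
    if parts == 1:
        return [(total,)]
    return [(head,) + tail
            for head in range(total + 1)
            for tail in compositions(parts - 1, total - head)]


def multinomial(M, w, r):
    """(M; w, r) as read from Eq. 9.20; zero unless w(i) >= r(i) for all i."""
    if any(wi < ri for wi, ri in zip(w, r)):
        return 0
    denom = 1
    for wi, ri in zip(w, r):
        denom *= factorial(wi - ri) * factorial(ri)
    return factorial(M) // denom


def det(a):
    """Laplace expansion along the first row (adequate for tiny matrices)."""
    if not a:
        return 1
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))


def series_value(L, M, m, cond):
    """Sum of C'(m, r) times the monomial in (2^L P(i|n) - 1), over admissible r."""
    N = 2 ** L
    states = compositions(N, M)                                          # S'
    zero = (0,) * N
    row_choices = [r for k in range(M + 1) for r in compositions(N, k)]  # S''
    others = [n for n in states if n != m]
    total = Fraction(0)
    for rows in product(row_choices, repeat=len(others)):
        r = dict(zip(others, rows))
        r[m] = zero                                  # row m is never differentiated
        K = [n for n in others if r[n] != zero]      # differentiated rows
        Km = [m] + K
        k = len(K)
        comb_det = det([[multinomial(M, w, r[v]) for w in Km] for v in Km])
        coeff = Fraction((-1) ** (len(states) - k - 1), 2 ** (L * M * (k + 1))) * comb_det
        monomial = Fraction(1)
        for n in K:
            for i in range(N):
                monomial *= (2 ** L * cond[n][i] - 1) ** r[n][i]
        total += coeff * monomial
    return total


def direct_value(L, M, m, cond):
    """-|P_m - I| computed straight from the (assumed multinomial) transition matrix."""
    N = 2 ** L
    states = compositions(N, M)
    zero = (0,) * N
    matrix = []
    for v in states:
        row = []
        for w in states:
            entry = Fraction(0)                      # row m of P_m is all zero
            if v != m:
                entry = Fraction(multinomial(M, w, zero))
                for i in range(N):
                    entry *= cond[v][i] ** w[i]
            row.append(entry - (1 if v == w else 0))
        matrix.append(row)
    return -det(matrix)


# Illustrative one-bit, M = 2 instance; each row of `cond` sums to 1.
L, M, m = 1, 2, (2, 0)
cond = {
    (2, 0): (Fraction(9, 10), Fraction(1, 10)),
    (1, 1): (Fraction(3, 5), Fraction(2, 5)),
    (0, 2): (Fraction(1, 5), Fraction(4, 5)),
}
print(series_value(L, M, m, cond), direct_value(L, M, m, cond))
```

Exact rational arithmetic is used so that the two values can be compared without rounding. The enumeration grows very quickly with $L$ and $M$, so this is only practical as a check on toy cases.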
setting row $\bar m_A$ to $\bar 0^T$. Further, by virtue of their construction (Eq. 7.7 and 7.10), the single row dependence of the matrix elements on the conditional probability array $[P(i \mid \bar n)]$ employed in developing the results in Section 9.3 for $\bar P$ and $\bar P_{\bar m}$ applies to $\bar P(\bar m_A)'$ and $\bar P_{\bar m_A}'$ as well. Thus, Eq. 9.26-27 should extend with very little modification to the determinants $-|\bar P_{\bar m_A}' - \bar I'| = |\bar P(\bar m_A)' - \bar I'| - |\bar P_{\bar m_A}' - \bar I'|$, whose zero mutation probability limits are required by Propositions 7.5-6. The following paragraphs highlight the required modifications and employ the result to examine two simple examples.

In the $\alpha \to 0^+$ counterpart of Eq. 9.26-27, $\bar m$ is limited to membership in the set of one-operator absorbing states (i.e. $\bar m = \bar m_A \in S_A'$), a consequence of which is that $\binom{M}{\bar m_A} = 1$. Also, all rows of the determinant other than $\bar m_A$ correspond to nonabsorbing states (i.e. $\bar n \in K \subset S' - S_A'$). Thus, the determinant order is $N' - N + 1$ and the differentiation index array is order $(N' - N + 1) \times N$ with rows corresponding to row indices $\bar n \in S' - S_A' + \{\bar m_A\}$. The rows of $\bar r$ are limited to $\bar r_{\bar m_A} = \bar 0$ and $\bar r_{\bar n} \in S''$ for $\bar n \in S' - S_A'$. Further, if $\bar r$ indicates nonzero order differentiation of any rows which are adjacent to one-operator absorbing states, then the associated columns of the combinatorial determinant must reflect the coefficient contribution from the adjacent absorbing state. Thus, if $C_0'(\bar m_A,\bar r)$ denotes the limiting counterpart of $C'(\bar m,\bar r)$ in Eq. 9.27 and if $K = \{\bar n_1, \bar n_2, \ldots, \bar n_k\} \subset S' - S_A'$ where (in the state adjacency notation introduced in Section 7.3) $\bar n_1 \in S(\bar n_{A_1})' \ne S(\bar m_A)'$ and where $\bar n_2, \ldots, \bar n_k$ all satisfy $\bar n_j \notin S_A''$, then the coefficient of the order $k$ monomial term uniquely identified by $\bar r$ is given by
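$$C_0'(\bar m_A,\bar r) = (-1)^{N'-N-k} \times (1/2^L)^{(k+1)M} \times \begin{vmatrix} 1 & \binom{M}{\bar n_1} + \binom{M}{\bar n_{A_1}} & \binom{M}{\bar n_2} & \cdots & \binom{M}{\bar n_k} \\ \binom{M}{\bar m_A, \bar r_{\bar n_1}} & \binom{M}{\bar n_1, \bar r_{\bar n_1}} + \binom{M}{\bar n_{A_1}, \bar r_{\bar n_1}} & \binom{M}{\bar n_2, \bar r_{\bar n_1}} & \cdots & \binom{M}{\bar n_k, \bar r_{\bar n_1}} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \binom{M}{\bar m_A, \bar r_{\bar n_k}} & \binom{M}{\bar n_1, \bar r_{\bar n_k}} + \binom{M}{\bar n_{A_1}, \bar r_{\bar n_k}} & \binom{M}{\bar n_2, \bar r_{\bar n_k}} & \cdots & \binom{M}{\bar n_k, \bar r_{\bar n_k}} \end{vmatrix} \tag{9.28}$$

It is noted that Eq. 9.28 is only an example, not a definition. It must be adjusted based upon $\bar r$ to reflect the number and location of the nonzero adjacent state contributions.

The values of the determinants $-|\bar P_{\bar m_A}' - \bar I'| = |\bar P(\bar m_A)' - \bar I'| - |\bar P_{\bar m_A}' - \bar I'|$ are given by employing Eq. 9.28 in Eq. 9.26 (with $\bar r$ restricted as noted above). Further, the $\alpha \to 0^+$ limits of the $-|\bar P_{\bar m_A}' - \bar I'| = |\bar P(\bar m_A)' - \bar I'| - |\bar P_{\bar m_A}' - \bar I'|$ are provided by using the $\alpha \to 0^+$ limits of the factors $(2^L P(i \mid \bar n) - 1)$ in Eq. 9.26. Those limits are provided by using either $P_1(i \mid \bar n)$ or $P_2'(i \mid \bar n)$ depending upon whether the two- or three-operator case is under consideration.

It is instructive to apply these results to a simple example. The following paragraphs do so for the one-bit problem with population size 2. These parameters ($L = 1$, $M = 2$) imply that $S = \{0, 1\}$, $N = 2$, $S' = \{(20), (11), (02)\}$, $N' = 3$, $S_A' = \{(20), (02)\}$ and $S' - S_A' = \{(11)\}$. Thus $\bar r$ is limited to

$$\bar r \in \left\{ \begin{pmatrix} 00 \\ 00 \\ 00 \end{pmatrix}, \begin{pmatrix} 00 \\ 10 \\ 00 \end{pmatrix}, \begin{pmatrix} 00 \\ 01 \\ 00 \end{pmatrix}, \begin{pmatrix} 00 \\ 11 \\ 00 \end{pmatrix}, \begin{pmatrix} 00 \\ 20 \\ 00 \end{pmatrix}, \begin{pmatrix} 00 \\ 02 \\ 00 \end{pmatrix} \right\},$$

and the combinatorial determinant required for evaluation of the nonzero order $C_0'(\bar m_A,\bar r)$'s for $\bar m_A = (20)$ by Eq. 9.28 has the general form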
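$$\begin{vmatrix} \binom{2}{(20)} & \binom{2}{(11)} + \binom{2}{(02)} \\ \binom{2}{(20), \bar r_{11}} & \binom{2}{(11), \bar r_{11}} + \binom{2}{(02), \bar r_{11}} \end{vmatrix} = \begin{vmatrix} 1 & 2 + 1 \\ \binom{2}{(20), \bar r_{11}} & \binom{2}{(11), \bar r_{11}} + \binom{2}{(02), \bar r_{11}} \end{vmatrix}.$$

Evaluation of the zero order coefficient proceeds as follows:

$$C_0'\!\left((20), \begin{pmatrix} 00 \\ 00 \\ 00 \end{pmatrix}\right) = (-1)^{(3-2-0)} \left(\frac{1}{2^1}\right)^{2} \binom{2}{(20)} = (-1) \times \frac{1}{4} \times 1 = -\frac{1}{4}.$$

The coefficient corresponding to $\bar r_{11} = (10)$ is given by

$$C_0'\!\left((20), \begin{pmatrix} 00 \\ 10 \\ 00 \end{pmatrix}\right) = (-1)^{(3-2-1)} \left(\frac{1}{2^1}\right)^{(1+1)2} \begin{vmatrix} 1 & 2 + 1 \\ \binom{2}{(20),(10)} & \binom{2}{(11),(10)} + \binom{2}{(02),(10)} \end{vmatrix} = \frac{1}{16} \begin{vmatrix} 1 & 3 \\ 2 & 2 + 0 \end{vmatrix} = -\frac{1}{4}.$$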
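$$C_0'\!\left((20), \begin{pmatrix} 00 \\ 11 \\ 00 \end{pmatrix}\right) = \frac{1}{8}, \quad C_0'\!\left((20), \begin{pmatrix} 00 \\ 20 \\ 00 \end{pmatrix}\right) = -\frac{3}{16} \quad \text{and} \quad C_0'\!\left((20), \begin{pmatrix} 00 \\ 02 \\ 00 \end{pmatrix}\right) = \frac{1}{16}.$$

With the required coefficients provided above the value of $-|\bar P_{(20)}' - \bar I'|$ for $\bar m_A = (20)$ can be expressed (by Eq. 9.26) as

$$\begin{aligned} -|\bar P_{(20)}' - \bar I'| ={}& -\frac{1}{4} - \frac{1}{4}\left(2P(0 \mid (11)) - 1\right) + \frac{1}{4}\left(2P(1 \mid (11)) - 1\right) \\ &+ \frac{1}{8}\left(2P(0 \mid (11)) - 1\right)\left(2P(1 \mid (11)) - 1\right) - \frac{3}{16}\left(2P(0 \mid (11)) - 1\right)^2 + \frac{1}{16}\left(2P(1 \mid (11)) - 1\right)^2. \end{aligned} \tag{9.29}$$

Then, since $P(0 \mid (11)) + P(1 \mid (11)) = 1 \Rightarrow (2P(1 \mid (11)) - 1) = -(2P(0 \mid (11)) - 1)$, Eq. 9.29 simplifies to

$$-|\bar P_{(20)}' - \bar I'| = -\frac{1}{4} - \frac{1}{2}\left(2P(0 \mid (11)) - 1\right) - \frac{1}{4}\left(2P(0 \mid (11)) - 1\right)^2 = -\frac{1}{4}\left[1 + \left(2P(0 \mid (11)) - 1\right)\right]^2 = -P(0 \mid (11))^2. \tag{9.30}$$

From the symmetry inherent in the problem, it follows that the $\bar m_A = (02)$ counterpart of Eq. 9.30 is

$$-|\bar P_{(02)}' - \bar I'| = -P(1 \mid (11))^2, \tag{9.31}$$

and employing Eq. 9.30-31 with Proposition 7.5 yields (for the two-operator case)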
$$q_0((20)) = \frac{P_1(0 \mid (11))^2}{P_1(0 \mid (11))^2 + P_1(1 \mid (11))^2} \quad \text{and} \quad q_0((02)) = \frac{P_1(1 \mid (11))^2}{P_1(0 \mid (11))^2 + P_1(1 \mid (11))^2}. \tag{9.32}$$

Then, substituting Eq. 4.2 in Eq. 9.32 yields

$$q_0((20)) = \frac{R(0)^2}{R(0)^2 + R(1)^2} \quad \text{and} \quad q_0((02)) = \frac{R(1)^2}{R(0)^2 + R(1)^2}. \tag{9.33}$$

The limit for the nonabsorbing state $\bar m = (11)$ is known to be zero by Proposition 7.5. An identical result to Eq. 9.33 obtains for the three-operator case because for the one-bit problem, crossover is nullified and $P_2'(i \mid \bar n) = P_1(i \mid \bar n)$ (see Eq. 4.22).

Additional insight into the behavior of the limiting stationary distribution is obtainable by examining the one-bit problem with population size 3. These parameters ($L = 1$, $M = 3$) leave $S$ and $N$ unaltered but change the other state space related sets and parameters to $S' = \{(30), (21), (12), (03)\}$, $N' = 4$, $S_A' = \{(30), (03)\}$ and $S' - S_A' = \{(21), (12)\}$. By retracing the previous development (the $M = 2$ case) with $\bar r$ limited as indicated by these state space sets, results analogous to Eq. 9.32 and 9.33 are obtained. Thus, the $M = 3$ counterpart of Eq. 9.32 is

$$q_0((30)) = \frac{P_1(0 \mid 21)^3 \left[P_1(0 \mid 12)^3 + 3P_1(0 \mid 12)^2 P_1(1 \mid 12)\right] + P_1(0 \mid 12)^3 \left[P_1(1 \mid 21)^3 + 3P_1(0 \mid 21) P_1(1 \mid 21)^2\right]}{D}$$

and

$$q_0((03)) = \frac{P_1(1 \mid 12)^3 \left[P_1(1 \mid 21)^3 + 3P_1(1 \mid 21)^2 P_1(0 \mid 21)\right] + P_1(1 \mid 21)^3 \left[P_1(0 \mid 12)^3 + 3P_1(1 \mid 12) P_1(0 \mid 12)^2\right]}{D} \tag{9.34}$$
where

$$\begin{aligned} D ={}& P_1(0 \mid 21)^3 \left[P_1(0 \mid 12)^3 + 3P_1(0 \mid 12)^2 P_1(1 \mid 12)\right] + P_1(0 \mid 12)^3 \left[P_1(1 \mid 21)^3 + 3P_1(0 \mid 21) P_1(1 \mid 21)^2\right] \\ &+ P_1(1 \mid 12)^3 \left[P_1(1 \mid 21)^3 + 3P_1(1 \mid 21)^2 P_1(0 \mid 21)\right] + P_1(1 \mid 21)^3 \left[P_1(0 \mid 12)^3 + 3P_1(1 \mid 12) P_1(0 \mid 12)^2\right]. \end{aligned}$$

The Eq. 9.33 counterpart is

$$q_0((30)) = \frac{[2R(0)]^3 \left[R(0)^3 + 6R(0)^2 R(1)\right] + R(0)^3 \left[R(1)^3 + 6R(0) R(1)^2\right]}{D'}$$

and

$$q_0((03)) = \frac{[2R(1)]^3 \left[R(1)^3 + 6R(1)^2 R(0)\right] + R(1)^3 \left[R(0)^3 + 6R(1) R(0)^2\right]}{D'} \tag{9.35}$$

where

$$D' = [2R(0)]^3 \left[R(0)^3 + 6R(0)^2 R(1)\right] + R(0)^3 \left[R(1)^3 + 6R(0) R(1)^2\right] + [2R(1)]^3 \left[R(1)^3 + 6R(1)^2 R(0)\right] + R(1)^3 \left[R(0)^3 + 6R(1) R(0)^2\right].$$

Again, the three-operator case yields an identical result.

These examples suggest two very significant conjectural features of the limiting stationary distribution behavior. First, only order 2 monomial terms survive in the determinant expansions of the $M = 2$ case and only order 6 terms survive for $M = 3$. These facts lead to the supposition that in general, only order $M \times (N' - N)$ terms survive. In the $M = 2$ case, $M \times (N' - N) = 2 \times (3 - 2) = 2$ while for $M = 3$, $M \times (N' - N) = 3 \times (4 - 2) = 6$. If this supposition is correct, then the polynomial forms required for evaluating the stationary distribution zero mutation probability limit by Propositions 7.5 and 7.6 are homogeneous order $M \times (N' - N)$ polynomials in the $P(i \mid \bar n)$'s. Presumably, the corresponding property (i.e. homogeneous order $M \times (N' - 1)$ polynomial forms) applies to the general case represented by Propositions 6.7 and 6.8.
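The closed forms of Eq. 9.33 and Eq. 9.35 can be compared numerically with the stationary distribution of a one-bit chain at a very small mutation probability. The sketch below is illustrative only. It assumes fitness-proportional selection, $P_1(0 \mid \bar n) = n(0)R(0) / (n(0)R(0) + n(1)R(1))$, followed by an independent bit flip with probability `pm`; it also uses the standard fact that the stationary probabilities of an irreducible chain are proportional to the order $n-1$ principal minors of $I - P$. The reward values and all function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def det(a):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    if not a:
        return Fraction(1)
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))


def transition_matrix(M, R0, R1, pm):
    """One-bit chain; state j is the population holding j copies of bit 0."""
    rows = []
    for j in range(M + 1):
        sel0 = Fraction(j * R0, j * R0 + (M - j) * R1)   # proportional selection
        p0 = (1 - pm) * sel0 + pm * (1 - sel0)           # assumed bit-flip mutation
        rows.append([comb(M, w) * p0 ** w * (1 - p0) ** (M - w) for w in range(M + 1)])
    return rows


def stationary(P):
    """Stationary distribution from the order n-1 principal minors of (I - P)."""
    n = len(P)
    A = [[(1 if i == j else 0) - P[i][j] for j in range(n)] for i in range(n)]
    minors = [det([[A[i][j] for j in range(n) if j != m] for i in range(n) if i != m])
              for m in range(n)]
    total = sum(minors)
    return [x / total for x in minors]


def limit_m2(R0, R1):
    """Eq. 9.33: limiting probability of the uniform population (20)."""
    return Fraction(R0 ** 2, R0 ** 2 + R1 ** 2)


def limit_m3(R0, R1):
    """Eq. 9.35: limiting probability of the uniform population (30)."""
    num30 = (2 * R0) ** 3 * (R0 ** 3 + 6 * R0 ** 2 * R1) + R0 ** 3 * (R1 ** 3 + 6 * R0 * R1 ** 2)
    num03 = (2 * R1) ** 3 * (R1 ** 3 + 6 * R1 ** 2 * R0) + R1 ** 3 * (R0 ** 3 + 6 * R1 * R0 ** 2)
    return Fraction(num30, num30 + num03)


# Illustrative reward values and a small mutation probability.
R0, R1, pm = 2, 1, Fraction(1, 10 ** 6)
q2 = stationary(transition_matrix(2, R0, R1, pm))
q3 = stationary(transition_matrix(3, R0, R1, pm))
print(float(q2[2]), float(limit_m2(R0, R1)))
print(float(q3[3]), float(limit_m3(R0, R1)))
```

The last entry of each stationary vector corresponds to the uniform population of all zeros, so it is the value to set against $q_0((20))$ and $q_0((30))$ respectively. The agreement is only approximate because `pm` is small but not zero.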
A second conjecture concerns the limiting distribution behavior as a function of the parameter $M$. The computed limiting distribution entropy results displayed in Section 5.5 suggest that the limiting distribution is dominated by optimal solutions for $M$ sufficiently large. That supposition is supported by the results in Eq. 9.33 and 9.35. In the $M = 2$ case, it follows immediately from Eq. 9.33 that $q_0((02))/q_0((20)) = [R(1)/R(0)]^2$. For $M = 3$ and $R(1) < R(0)$ it is straightforward to show that a corresponding bounding relationship exists, i.e. $q_0((03))/q_0((30)) < [R(1)/R(0)]^3$. This suggests that the ratio of the probabilities of the uniform population states corresponding to $i$ and $j$ with $R(i) < R(j)$ behaves at or better than

$$[R(i)/R(j)]^M \tag{9.36}$$

for $M$ sufficiently large. If this supposition is indeed correct, then the desired limiting distribution behavior for the two-operator simple genetic algorithm (i.e. probability zero for suboptimal solutions) can be approached as closely as required by selecting $M$ sufficiently large.

The corresponding general case (i.e. $L > 1$) three-operator counterparts of Eq. 9.32 and 9.34 are expressed in terms of the $P_2(i \mid \bar n)'$ array (Eq. 4.22). Thus, the numerator polynomial counterparts of Eq. 9.33 and 9.35 are expressed in terms of complex polynomial functions of the reward function values, and consequently it may be that no general case three-operator counterpart of Eq. 9.36 exists. (It is noted that the design of the reward functions employed in Section 5, in which only length 0-2 schema dependence is incorporated, tends to minimize crossover disruption, which may account for the progression toward optimality indicated by the three-operator results recorded in Figures 5-7 through 5-18.) The simulated annealing global optimality may thus extrapolate onto the simple genetic algorithm only in the $p_c \to 0$ and $M \to \infty$ limiting sense.

9.5 Extending the Stationary Distribution Representation

Eq.
9.26 and 9.27 represent an exact expression of the value of the determinant $-|\bar P_{\bar m} - \bar I| = |\bar P - \bar I| - |\bar P_{\bar m} - \bar I|$, and with Propositions 6.7 and 6.8 constitute an exact representation of the components of the stationary distribution of the two- and three-operator
algorithms. Section 9.4 extends those results to the determinants $-|\bar P_{\bar m_A}' - \bar I'| = |\bar P(\bar m_A)' - \bar I'| - |\bar P_{\bar m_A}' - \bar I'|$ whose $\alpha \to 0^+$ values are required for use in Propositions 7.5-6. The utility of these representations depends upon the ability to extract useful relationships between the $C'(\bar m,\bar r)$'s from the general form represented by Eq. 9.27 and Eq. 9.28. The following paragraphs examine the combinatorial determinants in the general forms provided by Eq. 9.27 and Eq. 9.28 and deduce some of the key relationships. The purpose of this effort is to provide a foundation for extending the stationary distribution representation methodology developed in Sections 9.2-9.4.

First, if the enabling condition for Eq. 9.22 is satisfied (i.e. $M \le N$), then every element in the combinatorial determinant of Eq. 9.27 is either zero or it is the combinatorial determinant corresponding to the order zero coefficient for some state in $S'$. Thus, every coefficient of the form represented by Eq. 9.27 can be written as sums and products of order zero coefficients. An analogous conclusion applies to Eq. 9.28.

Second, it is clear from Eq. 9.27 that nonzero order differentiation of any two or more rows of $-|\bar P_{\bar m} - \bar I| = |\bar P - \bar I| - |\bar P_{\bar m} - \bar I|$ in an identical pattern (e.g. $\bar r' : \bar 0 \ne \bar r'_{\bar n_1} = \bar r'_{\bar n_2}$ for $\bar n_1 \ne \bar n_2$) introduces identical rows into the combinatorial determinant, and thus makes $C'(\bar m,\bar r') = 0$. Consequently, no monomial terms corresponding to any $\bar r'$ with identical nonzero rows survive in the expansion of $-|\bar P_{\bar m} - \bar I| = |\bar P - \bar I| - |\bar P_{\bar m} - \bar I|$. An identical conclusion applies to the coefficients of $-|\bar P_{\bar m_A}' - \bar I'| = |\bar P(\bar m_A)' - \bar I'| - |\bar P_{\bar m_A}' - \bar I'|$, of which Eq. 9.28 is an exemplar.

A very important class of coefficient identities derives from transpositions of nonzero rows and columns of the differentiation order array. The resulting identities are very closely connected to the algebra of symmetric and alternating polynomials, and to an associated determinant concept called alternants, of which Vandermonde determinants are a special case.
Appendix C is provided to support the following paragraphs. From the form of the combinatorial determinants in Eq. 9.27 and Eq. 9.28, it is clear that exchanging any two of the $k$ rows indexed by row indices $\bar n \in K$ is equivalent to
exchanging the corresponding nonzero rows of $\bar r$. If $\bar r'$ is derived from $\bar r$ by such a row transposition, then it follows that

$$C'(\bar m,\bar r') = -C'(\bar m,\bar r).$$

Thus, $C'(\bar m,\bar r)$ establishes the value (to within a sign alternation) of the coefficients of $k!$ distinct monomial terms in the expansion of $-|\bar P_{\bar m} - \bar I| = |\bar P - \bar I| - |\bar P_{\bar m} - \bar I|$. An identical result applies to $C_0'(\bar m_A,\bar r)$ and the expansion of $-|\bar P_{\bar m_A}' - \bar I'| = |\bar P(\bar m_A)' - \bar I'| - |\bar P_{\bar m_A}' - \bar I'|$. The collection of monomial terms corresponding to this coefficient identity can be written as the product of $C'(\bar m,\bar r)$ (or of $C_0'(\bar m_A,\bar r)$) and a polynomial function of the form defined in Eq. C.12 of Appendix C. That is, the collection of terms is a quasi-alternating polynomial function in the array of variables $(2^L P(i \mid \bar n) - 1)$.

In addition to the preceding result, the following identity applies to $C'(\bar m,\bar r)$ and the expansion of $-|\bar P_{\bar m} - \bar I| = |\bar P - \bar I| - |\bar P_{\bar m} - \bar I|$. For any $\bar n \in K$, transposition of columns $\bar m$ and $\bar n$ in the combinatorial determinant of Eq. 9.27 is equivalent to representing the value of $C'(\bar n,\bar r')$ where $\bar r'$ is derived from $\bar r$ by exchanging $\bar r_{\bar n}$ with $\bar r_{\bar m} = \bar 0$. That is

$$\bar n \in K \Rightarrow C'(\bar n,\bar r') = -C'(\bar m,\bar r).$$

Thus, the identical quasi-alternating function, evaluated in the new set of variables generated by replacing each $P(i \mid \bar n)$ with the corresponding $P(i \mid \bar m)$, is included in the expansion of $-|\bar P_{\bar n} - \bar I| = |\bar P - \bar I| - |\bar P_{\bar n} - \bar I|$. Collectively, these results account for $(k+1)!$ of the coefficients required for representation of the stationary distribution.

Another class of coefficient identities derives from transpositions of the columns of $\bar r$ (i.e. transpositions of $i, j \in S$). Let $\bar m'$ be derived from $\bar m$ by setting $m(j)' = m(i)$, $m(i)' = m(j)$, $\bar n_1'$ from $\bar n_1$ by setting $n_1(j)' = n_1(i)$, $n_1(i)' = n_1(j)$, etc. Then, if $\bar r'$ is derived from $\bar r$ by transposition of rows $\bar m$ with $\bar m'$, $\bar n_1$ with $\bar n_1'$, etc. followed by transposition of columns $i$ and $j$, it follows from Eq. 9.27 that

$$C'(\bar m',\bar r') = C'(\bar m,\bar r).$$

An identical result applies to $C_0'(\bar m_A',\bar r')$.
The number of distinct coefficients whose values are generated in this fashion from $C'(\bar m,\bar r)$ depends upon both the number and form of the nonzero columns in $\bar r$. If the number of nonzero columns is $p$, then exchanging any of the $p$ nonzero columns with any of the $N - p$ zero columns generates the coefficient of a distinct monomial term. Exchanging a nonzero column with another nonzero column having a different column sum also generates a distinct coefficient. However, exchanging a nonzero column with another nonzero column having identical column sum may or may not generate a distinct coefficient, depending upon the distribution of the nonzero entries in the two columns, because it is possible for the transformation described above to translate one column into the other. A lower bound on the number of distinct coefficients thus generated is

The collection of monomial terms corresponding to this coefficient identity can be written as the product of $C'(\bar m,\bar r)$ (or of $C_0'(\bar m_A,\bar r)$) and a polynomial function of the form defined in Eq. C.10 of Appendix C. That is, the collection of terms is a quasi-symmetric polynomial function in the array of variables $(2^L P(i \mid \bar n) - 1)$.

These coefficient identities and their connection to the quasi-symmetric and quasi-alternating polynomials of Appendix C offer a promising mechanism for extending the stationary distribution representation work begun here. Examination of the general form $(2^L P(i \mid \bar n) - 1)$ reveals that it is zero mean in the sense that

$$\sum_{i \in S} \left(2^L P(i \mid \bar n) - 1\right) = 0.$$

This property, along with the common form of the elements in the conditional probability array $[P(i \mid \bar n)]$, suggests that the symmetric and alternating polynomial forms required for evaluation of Propositions 6.7-6.8 or 6.5-6.6 may admit to large scale simplifications, and ultimately yield a tractable, explicit closed form expression for the stationary distribution components.
SECTION 10
CONCLUSIONS AND FUTURE DIRECTION

10.1 Summary

This dissertation reports an effort to establish an analytical framework for the simple genetic algorithm, based upon the asymptotic probability distribution of the generated solution sequences. The mechanism employed herein is extrapolation of the extensive existing theoretical foundation of the simulated annealing algorithm onto the genetic algorithm. That foundation is based upon the asymptotic behavior of a nonstationary Markov chain simulated annealing algorithm model.

The simulated annealing literature is reviewed in Section 2, with particular emphasis on the methodology employed to develop the key theoretical results. Those results include a demonstration that, provided a lower bound of the form $K/\log(k)$ on the algorithm parameter corresponding to absolute temperature is observed, the asymptotic probability distribution over the algorithm state space is zero for all states corresponding to suboptimal solutions. Thus, the simulated annealing algorithm obtains (asymptotically) a globally optimal solution.

The genetic algorithm literature is reviewed in Section 3. The significant conclusion of that section is that while certain important theoretical results exist, notably the so-called schema theorem and some work on a problem construct referred to as the minimal deceptive problem, no genetic algorithm model or accompanying convergence theory comparable in scope to that of simulated annealing exists in the literature.

The fundamental purpose of the work described herein is to provide such an analytical framework by extrapolating the known simulated annealing theory onto the genetic algorithm. An essential first step toward that goal is development of a nonstationary Markov chain algorithm model for the genetic algorithm. That task is accomplished in Section 4.
The product of that effort is a very general nonstationary Markov chain model for variants of the algorithm incorporating combinations of the three fundamental genetic algorithm operators. The model is tailored to resemble the model employed in the analysis of the simulated annealing algorithm convergence behavior, with the mutation probability algorithm parameter playing a role analogous to absolute temperature in simulated annealing.

Additionally, some salient features of the model state behavior are pointed out in Section 4. In particular, the one-operator (reproduction only) simple genetic algorithm is shown to possess exactly $2^L$ absorbing states, one for each possible uniform population, while the two-operator (reproduction/mutation) and three-operator (reproduction/mutation/crossover) variants possess a unique stationary distribution. The expected value of the absorption time for the one-operator algorithm is finite and an upper bound is provided by Eq. 4.8. The probability distribution of the final solution state produced by the one-operator simple genetic algorithm depends upon the initial state, $\bar m_0$.

The inclusion of the mutation operator is shown in Section 4 to provide a significant additional dimension to the state behavior of the time-homogeneous (stationary) two- and three-operator variants of the algorithm: the existence of a unique stationary distribution. The significance of the unique stationary distribution is that the asymptotic state behavior is independent of the starting state. It is completely determined by the objective function and the algorithm parameters.

In Section 5, the genetic algorithm model is employed to generate some computer simulation results. Specifically, a combinatorial interpretation of the model state space is explored numerically in Section 5.2 and the limiting stationary distribution of the three-operator algorithm is approximated for a variety of algorithm parameter sets in Section 5.5.
A very significant feature of the limiting stationary distribution is suggested by the Section 5.5 results and later verified theoretically (in Section 7). It is that the limiting two- and three-operator algorithm stationary distribution behavior necessary for extrapolating
the simulated annealing asymptotic global optimality result does not follow. The limiting distribution is nonzero for all states corresponding to uniform populations (one-operator absorbing states), including those representing suboptimal solutions. This complication precludes an exact extrapolation of the simulated annealing convergence theory onto the simple genetic algorithm. The Section 5 results do, however, reinforce the intuitive notion that increasing the algorithm population size parameter biases the limiting distribution towards the desired limiting behavior.

Section 6 employs the Perron-Frobenius Theorem (which is summarized in Appendix B) to formulate the time-homogeneous two- and three-operator algorithm unique stationary distribution existence argument into a system of equations whose solution is the stationary distribution components. The solution is formulated in terms of Cramer's Rule and is not explicitly solved. However, the Section 6 results provide a remarkable degree of insight into the form of the solution and its behavior with respect to the algorithm parameters. Those results provide the foundation for the remaining sections.

The unique stationary distribution existence argument for the stationary two- and three-operator algorithm variants only applies when the mutation probability parameter is strictly greater than zero. A one-operator (zero mutation probability) stationary distribution exists but, as demonstrated in Section 4.3.1, it is not unique. A very important requirement for extrapolation of the simulated annealing convergence theory onto the simple genetic algorithm is existence of a zero mutation probability limit for the stationary distribution. Section 7 is devoted to resolving that question affirmatively. It is based upon the results developed in Section 6 and it also verifies the Section 5.5 observation concerning the nonzero limit for all states corresponding to uniform populations.
A very significant theoretical contribution of this work is developed in Section 8. It is a monotonic mutation probability bound sufficient to guarantee strong ergodicity of the nonstationary two- and three-operator simple genetic algorithm Markov chains. The parameter bound is analogous to the simulated annealing temperature schedule bound.
The bound is asserted in Proposition 8.1, and its form (i.e. j^r) is asymptotically superior to the $\kappa/\log(k)$ bound associated with the simulated annealing algorithm. It is noteworthy that the same bound applies to both the two- and three-operator algorithm variants. At least in terms of the Section 8 bound, the crossover operator does not expedite convergence.

All of the results developed in Sections 7 and 8 are obtained without explicitly solving the stationary distribution system. Section 9 attacks the problem of explicit solution. It is a very extensive and somewhat tedious development. The product of that work is an expression for the general term in a multivariate Taylor series expansion of the determinant form required for explicit solution of the stationary distribution equations. The results are expressed in Eq. 9.26 and 9.27 for the general nonzero mutation probability case, augmented by Eq. 9.28 for the zero mutation probability limit. These results stop short of a usable answer, but they do provide some insight into the nature of the solution. Further, Section 9.5 provides some intriguing ideas for extending the work started in Section 9.

The attempt to extrapolate the simulated annealing convergence theory onto the genetic algorithm fails in the sense that the zero mutation probability stationary distribution limits of the two- and three-operator simple genetic algorithm variants do not satisfy the required form for extrapolation of the simulated annealing global optimality result. However, evidence is provided which suggests that, for the two-operator algorithm variant, the required behavior can be approached by increasing the population size parameter (Eq. 9.36). The question is more complicated for the three-operator case and, as pointed out in Section 9.4, implementation of crossover with nonzero $p_c$ may indeed preclude convergence to global optimality even in the infinite population size limiting sense of Eq. 9.36.
The latter observation concerning crossover, along with the equivalence of the mutation probability sequence bounds for the two- and three-operator cases noted
previously, poses some significant questions concerning the role of the crossover operator. Indeed, from the results developed herein, it is not clear that any desirable effect on the asymptotic algorithm behavior obtains from application of the crossover operator, though it may have a desirable effect in expediting convergence in real (finite time) applications. The resolution of these questions, along with a host of other applications questions such as optimum population size, mutation and crossover probability parameter selection, number of iterations required to achieve acceptable results, etc., requires further progress on the stationary distribution representation task begun in Section 9.

10.2 Contributions of the Research

The research reported herein establishes a framework for modeling the genetic algorithm in terms of the asymptotic probability distribution of the solution sequences which it produces. Specific significant accomplishments include the following:

(1) A very general nonstationary Markov chain model of one-, two- and three-operator variants of the genetic algorithm, and a framework for analysis of the operators based upon their impact on the state space of the Markov chain
(2) Demonstration of the existence of a unique stationary distribution for the time-homogeneous (stationary) two- and three-operator algorithm variants
(3) A stationary distribution solution in terms of the characteristic polynomials of matrices derived from the state transition matrix
(4) Demonstration of the existence of a zero mutation probability stationary distribution limit for the time-homogeneous two- and three-operator algorithms
(5) A mutation probability schedule bound (analogous to the annealing schedule bound of simulated annealing) sufficient for the nonstationary two- and three-operator genetic algorithm variants to achieve the limiting distribution
(6) A methodology for representing the two- and three-operator stationary distribution components at all consistent values of mutation probability (including the zero mutation probability limit), and a proposed approach for extending that methodology to produce an explicit result.

10.3 Future Direction

In order to achieve the stated goal of this work, a complete analytical framework for the simple genetic algorithm, additional progress must be made on the stationary distribution solution effort begun in Section 9. The coefficient relationships noted in Section 9.5, especially the coefficient identities which attend transpositions of rows and columns in the differentiation order array and their connection with the quasi-symmetric and quasi-alternating polynomial notions presented in Appendix C, provide a foundation for proceeding with this effort. An explicit representation of the functional form of the stationary distribution, reduced to a rational function expression in the algorithm parameters and objective function, would provide a very valuable theoretical tool for use in the analysis of genetic algorithm performance, and is the ultimate goal. However, even if explicit solution is not attainable, it may prove possible to deduce very useful bounds on the stationary distribution components from continuation of the Section 9 development.

A second promising area for continuation of this work concerns the mutation probability parameter sequence bound provided in Section 8. It is based on very simple lower bounds (Eq. 4.21 and 4.31) which exist for the conditional probabilities which compose the state transition matrix, and it employs only the one-step transition matrix in the $(1 - \tau_1(P))$ sequence used to establish weak ergodicity (Section 8.2). Some preliminary work not reported in Section 8 suggests that employing two-step transition matrices in summing the $(1 - \tau_1(P))$ sequence may allow a refinement of the bound to something of the form k~'.
It also appears from that preliminary work that the same bound applies for both the two- and three-operator algorithm variants.
APPENDIX A
DISCRETE TIME FINITE STATE MARKOV CHAINS

A.1 Introduction

The following paragraphs establish some definitions and theorems on discrete time finite state Markov chains and related stochastic matrix concepts. These results fall into three main categories: (1) elementary definitions, (2) definitions and theorems concerning the state space and asymptotic behavior of time-homogeneous (stationary) Markov chains and (3) some more advanced ergodicity definitions and theorems necessary for the analysis of the asymptotic behavior of inhomogeneous Markov chains. These results are presented without proof or elaboration, but the foundation required for the more elementary of them can be obtained from [Cinl75, IsMa76] or many other references on Markov chains. The ergodicity related results can be found in [IsMa76, Sene81].

Although some of the results discussed here apply to continuous time and/or denumerably infinite state space Markov chains as well, the intention is to restrict consideration to the discrete time finite state case. All references herein to Markov chains are understood to mean discrete time finite state Markov chains. In the following, let $K = \{0, 1, 2, \ldots\}$ be the set of nonnegative integers, let $X = \{X_k : k \in K\}$ be a discrete time (i.e. discrete sequence index) stochastic process with finite cardinality state space $E$, and let $i, j \in E$.

A.2 Elementary Definitions

Definition A1: If $\forall i, j \in E$ and every $k \in K$, it follows that

$$\Pr\{X_{k+1} = j \mid X_0 = i_0, X_1 = i_1, \ldots, X_k = i\} = \Pr\{X_{k+1} = j \mid X_k = i\},$$

then $X$ is a Markov chain.
Definition A2: Any row vector $q^T = [q(i)]$, $i \in E$, satisfying the conditions

(1) $\forall i \in E: q(i) \ge 0$
(2) $q^T \mathbf{1} = \sum_{i \in E} q(i) = 1$

is called a probability vector.

Definition A3: Any square matrix $P$ whose rows are all composed of probability vectors is called a stochastic matrix, or sometimes more explicitly a row stochastic matrix. The row sum constraint on a row stochastic matrix can be written as $P\mathbf{1} = \mathbf{1}$, so 1 is an eigenvalue of every stochastic matrix.

Definition A4: The stochastic matrix

$$P_k = [p_k(i,j)] = [\Pr\{X_{k+1} = j \mid X_k = i\}]$$

is the one-step transition probability matrix or state transition matrix of the Markov chain $X$. If the probability vectors $q_k^T$ and $q_{k+1}^T$ are respectively the probability distributions of $X_k$ and $X_{k+1}$, then $q_{k+1}^T = q_k^T P_k$. Similarly, the stochastic matrix

$$P_{m,k} = [p_{m,k}(i,j)] = [\Pr\{X_k = j \mid X_m = i\}] = P_m P_{m+1} \cdots P_{k-1} = \prod_{l=m}^{k-1} P_l,$$

where $k = m + n$, $m, n \in K$, $n > 0$, is the n-step transition probability matrix of $X$, and

$$q_k^T = q_m^T P_{m,k} = q_m^T \prod_{l=m}^{k-1} P_l.$$

Definition A5: Let $P_k$ be the state transition matrix of the Markov chain $X$ at time (sequence index) $k$. Then $X$ is time-homogeneous if and only if $\forall k \in K$ it follows that $P_k = P$, where $P$ is a constant state transition matrix.
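Definitions A2 through A4 can be exercised numerically. The following is a minimal Python sketch, not part of the original Fortran listings; the function names `is_probability_vector`, `is_stochastic`, `step` and `multi_step`, and the two example matrices, are illustrative assumptions. Exact rational arithmetic is used so that the row sum test is not disturbed by rounding.

```python
from fractions import Fraction as F


def is_probability_vector(q):
    """Definition A2: nonnegative components that sum to one."""
    return all(x >= 0 for x in q) and sum(q) == 1


def is_stochastic(P):
    """Definition A3: a square matrix whose rows are probability vectors."""
    n = len(P)
    return all(len(row) == n and is_probability_vector(row) for row in P)


def step(q, P):
    """One application of q_{k+1}^T = q_k^T P_k (Definition A4)."""
    n = len(P)
    return [sum(q[i] * P[i][j] for i in range(n)) for j in range(n)]


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def multi_step(mats):
    """P_{m,k} = P_m P_{m+1} ... P_{k-1} for a list of one-step matrices."""
    result = mats[0]
    for P in mats[1:]:
        result = mat_mul(result, P)
    return result


# Two hypothetical one-step matrices of an inhomogeneous two-state chain.
P0 = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]
P1 = [[F(0), F(1)], [F(1, 2), F(1, 2)]]
q0 = [F(1), F(0)]

q2_stepwise = step(step(q0, P0), P1)
q2_direct = step(q0, multi_step([P0, P1]))
print(q2_stepwise, q2_direct)
```

Propagating $q_0$ one step at a time and applying the two-step matrix $P_{0,2} = P_0 P_1$ give the same distribution, as Definition A4 requires, and the product of stochastic matrices is again stochastic.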
A.3 Time-Homogeneous Markov Chains

The time-homogeneous Markov chain $X$ is completely specified by its initial probability distribution, $q_0$, and state transition matrix, $P$. The probability distribution of $X_k$, $k \ge 1$, is given by

$$q_k^T = q_0^T P^k.$$

The following definitions and theorems concern the asymptotic behavior of the chain and some conditions on the state space which make the asymptotic behavior independent of $q_0$. In the following, let the $i, j \in E$ element of $P^k$ be denoted by $p^{(k)}(i,j)$.

Definition A6: A subset $E_0$ of the state space $E$ of the Markov chain $X$ is called closed if $\forall i \in E_0$, $\forall j \in E - E_0$ it follows that $p(i,j) = 0$. If the closed set $E_0$ contains the single state $i$, so that $p(i,i) = 1$, then the state $i$ is called an absorbing state.

Definition A7: A Markov chain is called irreducible if there exists no nonempty closed subset of its state space $E$ other than $E$ itself.

Definition A8: The states $i$ and $j$ are said to intercommunicate if $\exists k_i, k_j \in K$ such that $p^{(k_i)}(i,j) > 0$ and $p^{(k_j)}(j,i) > 0$.

Theorem A1: A Markov chain is irreducible if and only if all pairs of states intercommunicate.

Definition A9: State $i \in E$ of the Markov chain $X$ has period $d$ if the following two conditions hold:

(1) $p^{(k)}(i,i) = 0$ unless $k = md$ for some positive integer $m$ and
(2) $d$ is the largest integer with property (1).

If $d = 1$, state $i$ is called aperiodic. The Markov chain $X$ is aperiodic if and only if all states $i \in E$ are aperiodic.

Theorem A2: If $X$ is irreducible and if $\exists i \in E$ such that $p(i,i) > 0$, then $X$ is aperiodic.

Definition A10: Any probability vector $q$ over the state space of the time-homogeneous Markov chain $X$ and satisfying

$$q^T P = q^T$$

is called a stationary distribution of $X$. It is not necessarily unique.

Theorem A3: If the Markov chain $X$ is time-homogeneous, irreducible, aperiodic and has a finite state space, then a stationary distribution exists for $X$. Further, the stationary distribution is unique and is determined by

$$q^T P = q^T \quad \text{and} \quad q^T \mathbf{1} = 1.$$

Theorem A4: If the time-homogeneous Markov chain $X$ possesses a unique stationary distribution, $q$, then for every probability vector $x$ with compatible dimensions, it follows that

$$\lim_{k \to \infty} x^T P^k = q^T.$$
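The state space notions in Definitions A6 through A9 can be checked mechanically on small chains. The following Python sketch is illustrative only; the helper names (`is_closed`, `reachable`, `is_irreducible`, `period`) and the three example matrices are assumptions introduced here. The period is computed as the greatest common divisor of the return times found within the first $3n$ steps, which suffices for an $n$-state chain because any cycle can be spliced into a return path of length at most $2n - 2$.

```python
from math import gcd


def is_closed(P, subset):
    """Definition A6: no one-step transition leaves the subset."""
    n = len(P)
    return all(P[i][j] == 0 for i in subset for j in range(n) if j not in subset)


def reachable(P, i):
    """States j with p^(k)(i, j) > 0 for some k >= 1."""
    n = len(P)
    seen, stack = set(), [i]
    while stack:
        s = stack.pop()
        for j in range(n):
            if P[s][j] > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


def is_irreducible(P):
    """Theorem A1: irreducible iff all pairs of states intercommunicate."""
    n = len(P)
    return all(len(reachable(P, i)) == n for i in range(n))


def period(P, i):
    """Definition A9: gcd of the step counts k at which p^(k)(i, i) > 0.

    Returns 0 if state i is never revisited.
    """
    n = len(P)
    current = {i}          # states occupied with positive probability after k steps
    d = 0
    for k in range(1, 3 * n + 1):
        current = {j for s in current for j in range(n) if P[s][j] > 0}
        if i in current:
            d = gcd(d, k)
    return d


cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]          # deterministic 3-cycle
lazy = [[0.5, 0.5, 0], [0, 0, 1], [1, 0, 0]]       # same cycle with p(0,0) > 0
absorbing = [[1, 0], [0.5, 0.5]]                   # state 0 is absorbing

print(is_irreducible(cycle), [period(cycle, i) for i in range(3)])
print(is_irreducible(lazy), [period(lazy, i) for i in range(3)])
print(is_irreducible(absorbing), is_closed(absorbing, {0}))
```

The second chain illustrates Theorem A2: it is irreducible and has one state with a positive self-transition probability, and every state turns out to be aperiodic.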
A.4 Inhomogeneous Markov Chains

Complete specification of the inhomogeneous Markov chain $X$ requires its initial probability distribution, $q_0$, and the infinite sequence of state transition matrices, $\{P_k\}$, $k \ge 0$. The probability distribution of $X_k$, $k \ge 1$, is given by

$$q_k^T = q_0^T \prod_{n=0}^{k-1} P_n.$$

If the chain is asymptotically independent of $q_0$, then it is said to be ergodic. Two classes of ergodicity must be distinguished. The following definitions and theorems elaborate.

Definition A11: The inhomogeneous Markov chain $X$ is weakly ergodic if $\forall i, j, l \in E$, $\forall m \in K$

$$\lim_{k \to \infty} \left( p_{m,k}(i,l) - p_{m,k}(j,l) \right) = 0.$$

Weak ergodicity does not require that either $\lim_{k \to \infty} p_{m,k}(i,l)$ or $\lim_{k \to \infty} p_{m,k}(j,l)$ exist.

Definition A12: Any scalar function $\tau(\cdot)$, continuous on the set of $n \times n$ stochastic matrices $P$ and satisfying $0 \le \tau(P) \le 1$, is called a coefficient of ergodicity. If in addition

$$\tau(P) = 0 \iff P = \mathbf{1} q^T,$$

where $q$ is any probability vector with compatible dimensions (i.e. when all rows of $P$ are identical probability vectors), then $\tau$ is said to be proper. Weak ergodicity is equivalent to

$$\tau(P_{m,k}) \to 0, \quad k \to \infty, \quad m \ge 0,$$

where $\tau$ is a proper coefficient of ergodicity.

Theorem A5: Let $P$ be an $n \times n$ stochastic matrix and let

$$\tau_1(P) = 1 - \min_{i,j} \sum_{k=1}^{n} \min(p(i,k), p(j,k)).$$

Then, $\tau_1(P)$ is a proper coefficient of ergodicity.
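The coefficient $\tau_1$ of Theorem A5 is simple to evaluate directly. The sketch below is a hedged Python illustration; the function name `tau1` and the example matrices are assumptions made here, and exact fractions are used so that the boundary values 0 and 1 are reproduced exactly.

```python
from fractions import Fraction as F


def tau1(P):
    """tau_1(P) = 1 - min over row pairs (i, j) of sum_k min(p(i,k), p(j,k))."""
    n = len(P)
    overlap = min(
        sum(min(P[i][k], P[j][k]) for k in range(n))
        for i in range(n)
        for j in range(n)
    )
    return 1 - overlap


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


identity = [[F(1), F(0)], [F(0), F(1)]]               # rows share no mass
rank_one = [[F(1, 4), F(3, 4)], [F(1, 4), F(3, 4)]]   # identical rows, P = 1 q^T
P = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]

print(tau1(identity), tau1(rank_one), tau1(P), tau1(mat_mul(P, P)))
```

For this example $\tau_1(P) = 1/4$ and $\tau_1(P^2) = 1/16$, so the coefficient of the multi-step matrix shrinks toward zero as the rows of $P^k$ approach a common probability vector, which is the behavior Definition A12 associates with weak ergodicity.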
Theorem A6: The inhomogeneous Markov chain $X$ is weakly ergodic if and only if there exists a strictly increasing sequence of positive numbers $\{k_l\}$, $l \in K$, such that

$$\sum_{l=0}^{\infty} \left( 1 - \tau_1(P_{k_l, k_{l+1}}) \right) = \infty.$$
APPENDIX B
THE PERRON-FROBENIUS THEOREM AND STOCHASTIC MATRICES

B.1 Introduction

A matrix possessing the property that all of its components are nonnegative is referred to as a nonnegative matrix. For the matrix $T$, this condition is indicated by $T \ge 0$. The case in which all components of $T$ are strictly positive is indicated by $T > 0$. This notation extends in the obvious manner to expressions such as $T \ge B \iff T - B \ge 0$ relating nonnegative matrices with compatible dimensions.

The definitions, theorems and corollary in Section B.2 below concern nonnegative matrices. They are the foundation for the Markov chain stationary distribution existence and representation theorem and related results summarized in Appendix A and employed in Sections 2, 4, 7 and 8. They are extracted from [Sene81], and are specialized in Section B.3 from the case of finite nonnegative matrices to the case of finite stochastic matrices. They are employed extensively in Sections 6 and 7.

B.2 The Perron-Frobenius Theorem and Ancillary Results for Primitive Matrices

Theorem B2 below is called the strong version of the Perron-Frobenius theorem. It applies to a class of nonnegative matrices referred to as primitive. A version of the theorem which applies to the wider class of irreducible nonnegative matrices is usually invoked for applications involving stochastic matrices, but the flexibility of the more general version is not required for the purposes herein. The connection of these results to those of Appendix A is provided by Theorem B1. It asserts that primitivity (Definition B1) is equivalent to the combination of irreducibility and aperiodicity, as defined in Appendix A.
Definition B1: A square nonnegative matrix, $T$, is primitive if there exists a positive integer $k$ such that $T^k > 0$.

Theorem B1: If the $n \times n$ nonnegative matrix $T$ is irreducible (Definition A7) and aperiodic (Definition A9), then $T$ is primitive, and conversely.

Theorem B2: Let $T$ be an $n \times n$ nonnegative primitive matrix. Then there exists an eigenvalue $r$ of $T$ such that

(a) $r$ is real, $r > 0$
(b) $r$ has corresponding left and right eigenvectors with strictly positive components
(c) $r > |\lambda|$ for any eigenvalue $\lambda \ne r$
(d) the eigenvectors associated with $r$ are unique to constant multiples
(e) If $0 \le B \le T$ and $\beta$ is an eigenvalue of $B$, then $|\beta| \le r$. Moreover, $|\beta| = r$ implies $B = T$.
(f) $r$ is a simple root of the characteristic polynomial of $T$.

Definition B2: The eigenvalue $r$ asserted in Theorem B2 is called the Perron-Frobenius eigenvalue of the nonnegative primitive matrix $T$.

Corollary B1: Let $T_{ij}$ be the components of a nonnegative primitive matrix $T$ having Perron-Frobenius eigenvalue $r$. Then

$$\min_i \sum_j T_{ij} \le r \le \max_i \sum_j T_{ij}$$

with equality on either side implying equality throughout (i.e. $r$ can only be equal to the maximal or minimal row sum if all row sums are equal). A similar proposition holds for column sums.
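Definition B1 and Corollary B1 lend themselves to a small numerical check. The Python sketch below is an illustration under stated assumptions: the names `is_primitive` and `perron_eigenvalue` are hypothetical, the loop bound $n^2 - 2n + 2$ is Wielandt's classical bound on the exponent of a primitive matrix (a standard result that this appendix does not state), and the eigenvalue is estimated by plain power iteration rather than by any method used in the dissertation.

```python
def is_primitive(T):
    """Definition B1: some power T^k is strictly positive.

    Only the zero/nonzero pattern matters, so boolean powers are used.
    Powers 1 .. n^2 - 2n + 2 are examined (Wielandt's bound).
    """
    n = len(T)
    A = [[T[i][j] > 0 for j in range(n)] for i in range(n)]
    power = A
    for _ in range(n * n - 2 * n + 2):
        if all(all(row) for row in power):
            return True
        power = [[any(power[i][l] and A[l][j] for l in range(n))
                  for j in range(n)] for i in range(n)]
    return False


def perron_eigenvalue(T, iters=200):
    """Power iteration estimate of the Perron-Frobenius eigenvalue r."""
    n = len(T)
    w = [1.0] * n
    r = 0.0
    for _ in range(iters):
        Tw = [sum(T[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(Tw)
        r = s / sum(w)
        w = [v / s for v in Tw]      # renormalize so the components sum to one
    return r


T = [[1, 2], [3, 4]]
row_sums = [sum(row) for row in T]
r = perron_eigenvalue(T)
print(is_primitive(T), min(row_sums), r, max(row_sums))
print(is_primitive([[0, 1], [1, 0]]), is_primitive([[0, 1], [1, 1]]))
```

For the strictly positive example the estimate lies strictly between the minimal and maximal row sums, as Corollary B1 requires when the row sums are unequal. The permutation matrix is irreducible but has period 2, so it is not primitive, in agreement with Theorem B1.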
Theorem B3: Let $T$ be an $n \times n$ nonnegative primitive matrix with Perron-Frobenius eigenvalue $r$, let $v$ and $w$ be strictly positive left and right eigenvectors respectively of $T$ corresponding to $r$ with $v$ and $w$ normed so that $v^T w = 1$, and let the $t \le n$ distinct eigenvalues of $T$ be ordered such that

$$r > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_t|$$

with the additional condition that $\lambda_2$ has multiplicity $m_2$ equal to or greater than the multiplicity of any other eigenvalue $\lambda_i$ for which $|\lambda_i| = |\lambda_2|$. It follows that

(a) if $\lambda_2 \ne 0$, then as $k \to \infty$

$$T^k = r^k w v^T + O(k^s |\lambda_2|^k)$$

elementwise, where $s = m_2 - 1$;

(b) if $\lambda_2 = 0$, then for $k \ge n - 1$

$$T^k = r^k w v^T.$$

B.3 The Perron-Frobenius Theory for Stochastic Matrices

A stochastic matrix (e.g. the state transition matrix of a Markov chain) is a special case of a square nonnegative matrix in which all row sums are equal to the constant 1. The following results specialize those of Section B.2 to the case of $T$ an $n \times n$ stochastic primitive matrix, $P$.

Theorem B4: Let $P$ be an $n \times n$ stochastic primitive matrix. Then

(a) $r = 1$ is an eigenvalue of $P$
(b) $r = 1$ has corresponding left and right eigenvectors with strictly positive components
(c) $r = 1 > |\lambda|$ for any eigenvalue $\lambda \ne r$
(d) the eigenvectors associated with $r = 1$ are unique to constant multiples
(e) If $0 \le B \le P$ and $\beta$ is an eigenvalue of $B$, then $|\beta| \le r = 1$. Moreover, $|\beta| = r = 1$ implies $B = P$.
(f) $r = 1$ is a simple root of the characteristic polynomial of $P$.

This theorem follows immediately from Theorem B2 by application of Corollary B1 with $T$ a stochastic primitive matrix, $P$. Among its consequences are the following.

Proposition B1: The right eigenvector asserted in Theorem B4(b) and (d) can be selected as the vector $\mathbf{1}$.

This result follows from the row sum constraint on $n \times n$ stochastic matrices, which can be expressed as $P\mathbf{1} = \mathbf{1}$. Thus, $\mathbf{1}$ is a right eigenvector, corresponding to eigenvalue 1, of every $n \times n$ stochastic matrix. Theorem B4 asserts that for finite primitive stochastic matrices, it is unique to within a nonzero scalar multiple.

Proposition B2: Let the vector $q$ be the left eigenvector asserted in Theorem B4(b) and (d). Then, the additional constraint $q^T \mathbf{1} = 1$ is consistent and makes $q$ unique.

Since the left eigenvector asserted in Theorem B4(b) and (d) has strictly positive components, its inner product with the vector $\mathbf{1}$ is a strictly positive (nonzero) number. Consequently, that inner product can be used to normalize the eigenvector to produce a $q$ which satisfies both requirements, and Proposition B2 follows.

Proposition B3: If $P$ is an $n \times n$ stochastic primitive matrix, then

$$\lim_{k \to \infty} P^k = \mathbf{1} q^T$$

where $q$ is the unique vector asserted in Proposition B2.
This result follows from Theorem B3 by specializing it to $n \times n$ stochastic primitive matrices via Theorem B4, Proposition B1 and Proposition B2. A very significant consequence is the following.

Proposition B4: If $P$ is an $n \times n$ stochastic primitive matrix and $x$ an arbitrary n-dimensional probability vector, then

$$\lim_{k \to \infty} x^T P^k = x^T \mathbf{1} q^T = q^T$$

where $q$ is the unique vector asserted in Proposition B2.

Definition B3: If the $n \times n$ stochastic primitive matrix $P$ in Theorem B4 and Propositions B1 through B4 is the state transition matrix of a time-homogeneous Markov chain, then the unique vector $q$ asserted in Proposition B2 is the stationary distribution of the Markov chain.
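Proposition B4 can be illustrated with a two-state chain whose stationary distribution is known in closed form. The Python sketch below is an assumption-laden illustration: the function name `limit_distribution`, the parameter values and the iteration count are all chosen here for convenience. For $P = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix}$ with $0 < a, b < 1$, the vector $q = (b, a)/(a+b)$ satisfies $q^T P = q^T$ and $q^T \mathbf{1} = 1$. The identity matrix, in which every state is absorbing, is included as a contrast in which the limit depends on the starting vector.

```python
def limit_distribution(x, P, iters=2000):
    """Approximate lim x^T P^k by repeated application of x^T <- x^T P."""
    n = len(P)
    for _ in range(iters):
        x = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]
    return x


a, b = 0.1, 0.3
P = [[1 - a, a], [b, 1 - b]]           # strictly positive, hence primitive
q = [b / (a + b), a / (a + b)]         # closed-form stationary distribution

from_state_0 = limit_distribution([1.0, 0.0], P)
from_state_1 = limit_distribution([0.0, 1.0], P)
print(q, from_state_0, from_state_1)

# Every state of the identity chain is absorbing, so the matrix is not
# primitive and the limit is whatever probability vector one starts with.
I2 = [[1.0, 0.0], [0.0, 1.0]]
print(limit_distribution([1.0, 0.0], I2), limit_distribution([0.0, 1.0], I2))
```

Both starting vectors converge to the same $q$ for the primitive matrix, while the absorbing chain retains its initial distribution. The latter is a toy analog of the non-unique stationary distribution noted in the text for the zero mutation probability (one-operator) case.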
APPENDIX C
VANDERMONDE DETERMINANTS, SYMMETRIC AND ALTERNATING POLYNOMIALS

C.1 Introduction

An order n determinant whose i, j element is given by $\phi_j(x_i)$ for some set of n scalar functions $\phi_j$ and a companion set of n scalar variables $x_i$, i.e.

$$\Delta_n(\mathbf{x}) = \begin{vmatrix} \phi_1(x_1) & \phi_2(x_1) & \cdots & \phi_n(x_1) \\ \phi_1(x_2) & \phi_2(x_2) & \cdots & \phi_n(x_2) \\ \vdots & \vdots & & \vdots \\ \phi_1(x_n) & \phi_2(x_n) & \cdots & \phi_n(x_n) \end{vmatrix} \qquad \text{Eq. C.1}$$

is called an alternant. The name derives from the fact that exchanging any pair of the variables in its argument list (e.g. $x_p$ and $x_q$) affects the value of $\Delta_n(\mathbf{x}) = \Delta_n(x_1, x_2, \ldots, x_n)$ only by reversing its algebraic sign. This property is clear from Eq. C.1, because transposing the variables $x_p$ and $x_q$ in $\Delta_n(\mathbf{x})$ amounts to exchanging the corresponding rows of the determinant, and from an elementary property of determinants, any such row exchange leaves the determinant value unchanged in magnitude but reversed in sign.

The state transition matrix of the Markov chain representing the simple genetic algorithm, as introduced in Section 4 of this paper, is a multivariate generalization of the matrix form underlying the alternant. The coefficient symmetries noted in Section 9.5 in connection with the stationary distribution representation development are a consequence. Section 10 proposes exploiting this connection in continuing the stationary distribution representation work begun in Section 9. This appendix provides some of the related background.
If the $\phi_j$ in Eq. C.1 are consecutive integer powers of their arguments, indexed from 0 through n-1, i.e.

$$\Delta_n(\mathbf{x}) = |V_n(\mathbf{x})| = \begin{vmatrix} 1 & x_1 & \cdots & x_1^{n-1} \\ 1 & x_2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_n & \cdots & x_n^{n-1} \end{vmatrix} \qquad \text{Eq. C.2}$$

then the resulting special case alternant is known as a Vandermonde determinant. The values of a Vandermonde determinant and its minors are closely related to a class of polynomials in n variables referred to as the symmetric polynomials (and to a companion class of polynomials referred to as alternating polynomials). The distinguishing feature of the symmetric polynomials is invariance with respect to permutations of the argument list (e.g. $\psi(x_1, x_2, x_3) = x_1 + x_2 + x_3 = \psi(x_2, x_1, x_3)$). Alternating polynomials reverse sign with each transposition of variables.

Section C.2 below develops an expression for the value of the order n Vandermonde determinant in Eq. C.2. The evaluation method employs the determinant form and a polynomial remainder theorem due to Bezout. Section C.3 introduces formal definitions of symmetric and alternating polynomials and a fundamental theorem which associates them with Vandermonde determinants. Section C.4 generalizes the symmetric and alternating polynomial notions to the form required by the discussion in Section 9.5.

C.2 Evaluation of Vandermonde Determinants

The value of the order n Vandermonde determinant can be deduced from its form (Eq. C.2) and a polynomial remainder theorem due to Bezout. Let $\psi(\mathbf{x})$ be an arbitrary polynomial function in n variables (i.e. $\mathbf{x} = (x_1, x_2, \ldots, x_n)$) and let $\mathbf{x}_i^{(a)}$ be generated from $\mathbf{x}$ by replacing $x_i$ with the value $a$. Then, the theorem states that if $\psi(\mathbf{x})$ is divided by the binomial $(x_i - a)$ the remainder is $\psi(\mathbf{x}_i^{(a)})$ [MoSt64]. That is

$$\psi(\mathbf{x}) = (x_i - a)\,\phi(\mathbf{x}) + \psi(\mathbf{x}_i^{(a)}).$$
If $a$ is selected as $a = x_j$ for some $j \ne i$, then $\mathbf{x}_i^{(a)}$ contains the value $x_j$ at two distinct index locations in its list (i.e. at i and j). Consequently, if $\psi(\mathbf{x})$ represents the value of the Vandermonde determinant in Eq. C.2 (i.e. $\psi(\mathbf{x}) = \Delta_n(\mathbf{x}) = |V_n(\mathbf{x})|$), then the Vandermonde determinant represented by the polynomial function $\psi(\mathbf{x}_i^{(a)}) = \Delta_n(\mathbf{x}_i^{(a)}) = |V_n(\mathbf{x}_i^{(a)})|$ contains two identical rows and hence is zero. In that case, the Bezout theorem reduces to

$$\psi(\mathbf{x}) = (x_i - x_j)\,\phi(\mathbf{x}).$$

Thus $(x_i - x_j)$ is a factor of $\Delta_n(\mathbf{x}) = |V_n(\mathbf{x})|$. This argument applies to each of the $n(n-1)/2$ distinct difference factors $(x_p - x_q)$. It follows that $\Delta_n(\mathbf{x}) = |V_n(\mathbf{x})|$ can be written as $\Delta_n(\mathbf{x}) = |V_n(\mathbf{x})| =$
where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ is a permutation of the integers $(1, 2, \ldots, n)$, is said to be symmetric with respect to the given permutation. If this property applies for all n! permutations of the $x_i$'s, then $\psi$ is a symmetric polynomial [MoSt64]. Some examples of symmetric polynomials are

$$\psi(x_1, x_2, x_3) = x_1 + x_2 + x_3$$

and

$$\psi(x_1, x_2, \ldots, x_n) = x_1^n + x_2^n + \cdots + x_n^n.$$

It is straightforward to show that any sum, difference or product of symmetric polynomials is a symmetric polynomial. In fact, the symmetric polynomials form a ring.

If the symmetric polynomial $\psi(\mathbf{x}) = \psi(x_1, x_2, \ldots, x_n)$ includes the monomial

$$a\, x_1^{p_1} x_2^{p_2} \cdots x_n^{p_n}$$

among its terms, then it includes also the monomial

$$a\, x_{\alpha_1}^{p_1} x_{\alpha_2}^{p_2} \cdots x_{\alpha_n}^{p_n}$$

where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ is an arbitrary permutation of the integers $(1, 2, \ldots, n)$. If for a given $p = (p_1, p_2, \ldots, p_n)$ the sum of all distinct monomials of this form is designated by

$$\phi_p(\mathbf{x}) = \sum_{\alpha} x_{\alpha_1}^{p_1} x_{\alpha_2}^{p_2} \cdots x_{\alpha_n}^{p_n}, \qquad \text{Eq. C.6}$$

then $\phi_p(\mathbf{x})$ is symmetric, and further, the arbitrary symmetric polynomial $\psi(\mathbf{x})$ can be written as a linear combination of a finite number of such polynomials. That is

$$\psi(\mathbf{x}) = \sum_p a_p \phi_p(\mathbf{x}).$$

A transposition of the ordered list of n variables $x_k$, $1 \le k \le n$, is a permutation which exchanges the positions of any two of the $x_k$'s. Every permutation of the ordered list can be written as a composition of transpositions applied to $(1, 2, \ldots, n)$, and for any specified permutation, if any such composition includes an odd number of transpositions, then every such composition includes an odd number of transpositions. Similarly, if any such
composition includes an even number of transpositions, then every such composition includes an even number of transpositions. A permutation is designated odd or even depending upon whether its decomposition into transpositions yields an odd or even number of factors respectively. If n > 1, then exactly n!/2 odd and n!/2 even permutations exist [MoSt64].

A polynomial $\gamma$ in n variables possessing the property that

$$\gamma(x_1, x_2, \ldots, x_n) = -\gamma(x_{\alpha_1}, x_{\alpha_2}, \ldots, x_{\alpha_n}) \qquad \text{Eq. C.7}$$

for every odd permutation $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ of its argument list is an alternating polynomial. It follows that any sum or difference of alternating polynomials is an alternating polynomial, that the product of a symmetric polynomial and an alternating polynomial is an alternating polynomial and that the product of any odd number of alternating polynomials is an alternating polynomial. The product of any even number of alternating polynomials is symmetric.

If the alternating polynomial $\gamma(\mathbf{x}) = \gamma(x_1, x_2, \ldots, x_n)$ includes the monomial

$$a\, x_1^{p_1} x_2^{p_2} \cdots x_n^{p_n}$$

among its terms, then it includes also the monomial

$$(-1)^{s(\alpha)}\, a\, x_{\alpha_1}^{p_1} x_{\alpha_2}^{p_2} \cdots x_{\alpha_n}^{p_n}$$

where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ is an arbitrary permutation of the integers $(1, 2, \ldots, n)$ and where $s(\alpha)$ is the number of transpositions in the permutation $\alpha$. If for a given $p = (p_1, p_2, \ldots, p_n)$ the sum of all distinct monomials of this form is designated by

$$\rho_p(\mathbf{x}) = \sum_{\alpha} (-1)^{s(\alpha)} x_{\alpha_1}^{p_1} x_{\alpha_2}^{p_2} \cdots x_{\alpha_n}^{p_n}, \qquad \text{Eq. C.8}$$

then $\rho_p(\mathbf{x})$ is alternating, and further, the arbitrary alternating polynomial $\gamma(\mathbf{x})$ can be written as a linear combination of a finite number of such polynomials. That is

$$\gamma(\mathbf{x}) = \sum_p a_p \rho_p(\mathbf{x}).$$
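The sign behavior described above can be checked on the Vandermonde determinant of Eq. C.2. The following Python sketch is illustrative only; the function names `sign`, `vandermonde_det` and `difference_product` are assumptions, and permutation parity is computed by counting inversions, which has the same parity as any decomposition into transpositions. The determinant is expanded as a signed sum over permutations and compared with the product of differences $\prod_{j<k}(x_k - x_j)$ that appears in Theorem C.1.

```python
from itertools import permutations
from math import prod


def sign(perm):
    """+1 for an even permutation, -1 for an odd one (parity of inversions)."""
    n = len(perm)
    inversions = sum(1 for a in range(n) for b in range(a + 1, n)
                     if perm[a] > perm[b])
    return -1 if inversions % 2 else 1


def vandermonde_det(x):
    """Signed permutation expansion of the determinant with rows (1, x_i, ..., x_i^(n-1))."""
    n = len(x)
    return sum(sign(p) * prod(x[i] ** p[i] for i in range(n))
               for p in permutations(range(n)))


def difference_product(x):
    """Product of (x_k - x_j) over all pairs j < k."""
    n = len(x)
    return prod(x[k] - x[j] for k in range(1, n) for j in range(k))


x = [2, 3, 5]
x_swapped = [3, 2, 5]                      # one transposition of the variables
print(vandermonde_det(x), difference_product(x))
print(vandermonde_det(x_swapped), difference_product(x_swapped))

odd_count = sum(1 for p in permutations(range(4)) if sign(p) == -1)
print(odd_count)                           # half of 4! permutations are odd
```

A single transposition of the argument list reverses the sign of both expressions, which is the alternating property of Eq. C.7, and exactly half of the permutations of four symbols are odd.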
As pointed out in the concluding paragraph of Section C.2, the polynomial function (defined by the product in Eq. C.4) which represents the value of an order n Vandermonde determinant alternates sign with each exchange of two variables in its argument list. Thus, it is an alternating polynomial. In fact, the polynomial function which represents a Vandermonde determinant is an elementary alternating polynomial in the sense defined by the following theorem, proof of which is provided in [Aitk54] and [Muir60].

Theorem C.1: If $\gamma$ is an alternating polynomial in the ordered list of n variables $x_k$, $1 \le k \le n$, then $\gamma$ can be written as

$$\gamma(x_1, x_2, \ldots, x_n) = \psi(x_1, x_2, \ldots, x_n) \times \prod_{k=2,\; j<k}^{n} (x_k - x_j) = \psi(\mathbf{x}) \times |V_n(\mathbf{x})|$$

where $\psi$ is a symmetric polynomial.

C.4 Quasi-Symmetric (and Quasi-Alternating) Polynomials

The definitions of symmetric and alternating polynomials supplied in Section C.3 require that the relevant properties (Eq. C.5 and Eq. C.7) apply for all n! permutations of the integers $(1, 2, \ldots, n)$. This section generalizes those notions to multivariate analogs, suitable for the discussion presented in Section 9.5. The generalization amounts to restricting the applicability of Eq. C.5 and C.7 to a subset of the n! permutations of $(1, 2, \ldots, n)$. The resulting polynomial classes are referred to here as quasi-symmetric and quasi-alternating polynomials respectively.

Let $\psi$ be a polynomial function of n = mk scalar variables $x_{ij}$, $1 \le i \le m$, $1 \le j \le k$, and let $\psi$ be denoted $\psi(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m)$ where $\mathbf{x}_i$ is a k-component vector composed from the scalars $x_{ij}$, $1 \le j \le k$. Then, $\psi$ is quasi-symmetric if

$$\psi(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m) = \psi(\mathbf{x}_{\alpha_1}, \mathbf{x}_{\alpha_2}, \ldots, \mathbf{x}_{\alpha_m}) \qquad \text{Eq. C.9}$$
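for all permutations $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_m)$ of the integers $(1, 2, \ldots, m)$. The set of all m! such permutations can be placed in one-to-one correspondence with a subset of the set of all n! = (mk)! permutations of $(1, 2, \ldots, n)$.

If the quasi-symmetric polynomial $\psi(\mathbf{x}) = \psi(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m)$ includes the monomial

$$a\, (x_{11}^{p_{11}} x_{12}^{p_{12}} \cdots x_{1k}^{p_{1k}})(x_{21}^{p_{21}} x_{22}^{p_{22}} \cdots x_{2k}^{p_{2k}}) \cdots (x_{m1}^{p_{m1}} x_{m2}^{p_{m2}} \cdots x_{mk}^{p_{mk}})$$

among its terms, then it includes also the monomial

$$a\, (x_{\alpha_1 1}^{p_{11}} x_{\alpha_1 2}^{p_{12}} \cdots x_{\alpha_1 k}^{p_{1k}})(x_{\alpha_2 1}^{p_{21}} x_{\alpha_2 2}^{p_{22}} \cdots x_{\alpha_2 k}^{p_{2k}}) \cdots (x_{\alpha_m 1}^{p_{m1}} x_{\alpha_m 2}^{p_{m2}} \cdots x_{\alpha_m k}^{p_{mk}})$$

where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_m)$ is an arbitrary permutation of the integers $(1, 2, \ldots, m)$. If for a given

$$p = (p_1, p_2, \ldots, p_m) = (p_{11}, p_{12}, \ldots, p_{1k}, p_{21}, p_{22}, \ldots, p_{2k}, \ldots, p_{m1}, p_{m2}, \ldots, p_{mk})$$

the sum of all distinct monomials of this form is designated by

$$\phi_p(\mathbf{x}) = \sum_{\alpha} (x_{\alpha_1 1}^{p_{11}} \cdots x_{\alpha_1 k}^{p_{1k}})(x_{\alpha_2 1}^{p_{21}} \cdots x_{\alpha_2 k}^{p_{2k}}) \cdots (x_{\alpha_m 1}^{p_{m1}} \cdots x_{\alpha_m k}^{p_{mk}}), \qquad \text{Eq. C.10}$$

then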
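If the quasi-alternating polynomial $\gamma(\mathbf{x}) = \gamma(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m)$ includes the monomial

$$a\, (x_{11}^{p_{11}} x_{12}^{p_{12}} \cdots x_{1k}^{p_{1k}})(x_{21}^{p_{21}} x_{22}^{p_{22}} \cdots x_{2k}^{p_{2k}}) \cdots (x_{m1}^{p_{m1}} x_{m2}^{p_{m2}} \cdots x_{mk}^{p_{mk}})$$

among its terms, then it includes also the monomial

$$(-1)^{s(\alpha)}\, a\, (x_{\alpha_1 1}^{p_{11}} \cdots x_{\alpha_1 k}^{p_{1k}})(x_{\alpha_2 1}^{p_{21}} \cdots x_{\alpha_2 k}^{p_{2k}}) \cdots (x_{\alpha_m 1}^{p_{m1}} \cdots x_{\alpha_m k}^{p_{mk}})$$

where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_m)$ is an arbitrary permutation of the integers $(1, 2, \ldots, m)$ and where $s(\alpha)$ is the number of transpositions in the permutation $\alpha$. If for a given

$$p = (p_1, p_2, \ldots, p_m) = (p_{11}, p_{12}, \ldots, p_{1k}, p_{21}, p_{22}, \ldots, p_{2k}, \ldots, p_{m1}, p_{m2}, \ldots, p_{mk})$$

the sum of all distinct monomials of this form is designated by

$$\rho_p(\mathbf{x}) = \sum_{\alpha} (-1)^{s(\alpha)} (x_{\alpha_1 1}^{p_{11}} \cdots x_{\alpha_1 k}^{p_{1k}})(x_{\alpha_2 1}^{p_{21}} \cdots x_{\alpha_2 k}^{p_{2k}}) \cdots (x_{\alpha_m 1}^{p_{m1}} \cdots x_{\alpha_m k}^{p_{mk}}),$$

then $\rho_p(\mathbf{x})$ is quasi-alternating, and further, the arbitrary quasi-alternating polynomial $\gamma(\mathbf{x})$ can be written as a linear combination of a finite number of such polynomials. That is

$$\gamma(\mathbf{x}) = \sum_p a_p \rho_p(\mathbf{x}).$$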
APPENDIX D
COMPUTER LISTINGS

D.1 Introduction

This appendix includes listings of the computer programs used to generate the simulation data presented in Section 5. These programs were developed on the Cray Y-MP at Eglin AFB, Fla. The programs are all written in Fortran, and in some cases they employ Cray extensions to the Fortran standard. The listings are separated into two subsections, one including main program listings and a second including the contents of a subprogram library accessed by the main programs. The library procedures section also includes a library table of contents.

D.2 Main Program Listings

      PROGRAM GET_NPS
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Name: GET_NPS
C
C Purpose: Compute (via binomial coefficient) and output
C          the cardinality of the indicated S' and S'' sets
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Declare local variables
C
      INTEGER M, L, NN, NP, NPP
      DOUBLE PRECISION FNP, FNPP
C
C Loop over M = 1 to M = 8
C
      DO 2 M = 1, 8
      WRITE( 6, * ) 'M = ', M
C
C Get the answers and write them to stdout
C
      DO 1 L = 1, 8
      NN = 2**L + M
      FNP = 1.0
      FNPP = 1.0
      DO 4 J = 0, M - 1
      FNP = FNP*FLOAT( NN - 1 - J )/FLOAT( M - J )
    4 FNPP = FNPP*FLOAT( NN - J )/FLOAT( M - J )
      NP = 0.5 + FNP
      NPP = 0.5 + FNPP
    1 WRITE( 6, 3 ) L, 2**L, NP, NPP
    2 WRITE( 6, * )
    3 FORMAT( 5H L = , I4, 5H N = , I8,
     &        6H NP = , I22, 7H NPP = , I22 )
C
C Finished
C
      END

      PROGRAM GET_SPS
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C Name: GET_SPS
C
C Purpose: Compute and output the indicated S's
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C Declare local variables
C
      INTEGER M, L, NP, LM, LL, LN, J
      INTEGER SP( 4*5984 ), NBAR( 16 )
C
C Prompt for and read ranges
C
      WRITE( 6, * ) ' Maximum M? '
      READ( 5, '(I8)' ) M
      WRITE( 6, * ) ' Maximum L? '
      READ( 5, '(I8)' ) L
C
C Get the answers and write them to stdout
C
      DO 3 LM = 1, M
      DO 2 LL = 1, L
      CALL GET_SP( LM, LL, SP, NP )
      WRITE( 6, * )
      WRITE( 6, 4 ) LM, LL, 2**LL, NP
      DO 1 LN = 1, NP
      CALL UNPACK_NBARP( LM, LL, SP( ( LN - 1 )*LM + 1 ), NBAR )
    1 WRITE( 6, 5 ) ( NBAR( J ), J = 1, 2**LL )
    2 CONTINUE
    3 CONTINUE
 5 f8rMAt! 'eU) = Â• ^Â™ ^ Â• "Â• 5" ^ = Â• '' 6H NP = . Â„2 )
C Finished
C
      END
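The counting loop in GET_NPS above accumulates a binomial coefficient as a running floating-point product and rounds it to an integer. The following Python sketch mirrors that computation for illustration; it is not part of the original listings, and the function names `n_prime`, `n_double_prime` and `n_prime_loop` are assumptions introduced here. Read off from the loop, FNP accumulates $\binom{2^L + M - 1}{M}$ and FNPP accumulates $\binom{2^L + M}{M}$.

```python
from math import comb


def n_prime(M, L):
    """Exact value of the product accumulated in FNP: C(2**L + M - 1, M)."""
    return comb(2 ** L + M - 1, M)


def n_double_prime(M, L):
    """Exact value of the product accumulated in FNPP: C(2**L + M, M)."""
    return comb(2 ** L + M, M)


def n_prime_loop(M, L):
    """Floating-point running product in the style of the Fortran loop."""
    nn = 2 ** L + M
    fnp = 1.0
    for j in range(M):
        fnp *= (nn - 1 - j) / (M - j)
    return int(0.5 + fnp)          # same rounding as NP = 0.5 + FNP


print(n_prime(6, 4), n_prime_loop(6, 4), n_double_prime(6, 4))
```

For M = 6 and L = 4 the first count is 54264, which matches the value of NP declared in the PARAMETER statements of the GET_P2INS and GET_P3INS listings.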
      PROGRAM GET_R
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Name: GET_R
C
C Purpose: Generate the indicated reward function and write
C          it to disk
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Declare local variables
C
      INTEGER L, SCHEMA, SCHEMA_MASK, DELTA
      REAL R( 0:2**8 - 1 ), WEIGHT
      CHARACTER*8 SCHEMAC
C
C Prompt for and read the bitstring length
C
      WRITE( 6, * ) 'Bitstring length? '
      READ( 5, * ) L
C
C Loop over all schemata and retrieve the associated weight
C
      DO 5 K = 1, 3**L
         DO 1 N = 1, L
            INDICATOR = MOD( ( K - 1 )/3**( N - 1 ), 3 )
            IF( INDICATOR .EQ. 0 ) THEN
               SCHEMAC( N:N ) = '*'
            ELSE IF( INDICATOR .EQ. 1 ) THEN
               SCHEMAC( N:N ) = '0'
            ELSE
               SCHEMAC( N:N ) = '1'
            END IF
    1    CONTINUE
         WRITE( 6, * ) SCHEMAC( 1:N )
         READ( 5, * ) WEIGHT
C
C Build the schema and schema mask
C
         SCHEMA = 0
         SCHEMA_MASK = 0
         DO 3 N = 1, L
            DELTA = 2**( N - 1 )
            IF( SCHEMAC( N:N ) .EQ. '1' ) THEN
               SCHEMA = SCHEMA + DELTA
               SCHEMA_MASK = SCHEMA_MASK + DELTA
            ELSE IF( SCHEMAC( N:N ) .EQ. '0' ) THEN
               SCHEMA_MASK = SCHEMA_MASK + DELTA
            END IF
    3    CONTINUE
C
C Now add required contributions to R
C
         DO 4 I = 0, 2**L - 1
            IF( AND( SCHEMA_MASK, XOR( I, SCHEMA ) ) .EQ. 0 )
     &         R( I ) = R( I ) + WEIGHT
    4 CONTINUE
    5 CONTINUE
C
C     Open the output data file and write R
C
      OPEN( 1, FILE='RDATA', STATUS='NEW' )
      WRITE( 1, '(8F10.6)' ) ( R( J ), J = 0, 2**L - 1 )
C
C     Finished
C
      END
      PROGRAM GET_P2INS
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Name: GET_P2INS
C
C     Purpose: Compute and return the indicated conditional
C              probability arrays
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Declare problem defining parameters
C
      INTEGER M, L, NP, NALPHA
      PARAMETER ( M = 6, L = 4, NP = 54264, NALPHA = 10 )
      INTEGER SP( M, NP )
      REAL R( 0:2**L - 1 ), P2IN( 0:2**L - 1 )
      INTEGER NCOUNTS( 4 )
      DATA NCOUNTS/1,3096,3100,54181/
      CHARACTER*8 P2INDATA
      DATA P2INDATA/'P2INDATA'/
C
C     Declare local variables
C
      INTEGER NBAR( 0:15 )
      REAL ALPHA
C
C     Get the objective function values
C
      OPEN( 1, FILE='RDATA', STATUS='OLD' )
      READ( 1, '(8F10.6)' ) ( R( I ), I = 0, 2**L - 1 )
      CLOSE( 1 )
C
C     Open the output data file and write the summary data
C
      OPEN( 1, FILE=P2INDATA, STATUS='NEW' )
      WRITE( 1, '(4I8)' ) M, L, NP, NALPHA
      WRITE( 1, '(8F10.6)' ) R
      WRITE( 1, * )
C
C     Build the state space array
C
      CALL GET_SP( M, L, SP )
C
C     Generate the required P2IN data
C
      DO 2 NCOUNT = 1, 4
         CALL UNPACK_NBARP( M, L, SP( 1, NCOUNTS( NCOUNT ) ), NBAR )
         WRITE( 1, '(16I2)' ) NBAR
         DO 1 I = 0, NALPHA - 1
            ALPHA = 1.0/2**I
            WRITE( 1, '(F10.6)' ) ALPHA
            CALL GET_P2IN( M, L, SP( 1, NCOUNTS( NCOUNT ) ),
     &                     ALPHA, R, P2IN )
            WRITE( 1, '(8F10.6)' ) P2IN
    1    WRITE( 1, * )
    2 CONTINUE
C
C     Finished
C
      END
      PROGRAM GET_P3INS
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Name: GET_P3INS
C
C     Purpose: Compute and return the indicated conditional
C              probability arrays
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Declare problem defining parameters
C
      INTEGER M, L, NP, NALPHA
      PARAMETER ( M = 6, L = 4, NP = 54264, NALPHA = 10 )
      INTEGER SP( M, NP )
      REAL R( 0:2**L - 1 ), P3IN( 0:2**L - 1 )
      INTEGER NCOUNTS( 4 )
      DATA NCOUNTS/1,3096,3100,54181/
      CHARACTER*8 P3INDATA
      DATA P3INDATA/'P3INDATA'/
C
C     Declare local variables
C
      INTEGER NBAR( 0:15 )
      REAL ALPHA
C
C     Get the objective function values
C
      OPEN( 1, FILE='RDATA', STATUS='OLD' )
      READ( 1, '(8F10.6)' ) ( R( I ), I = 0, 2**L - 1 )
      CLOSE( 1 )
C
C     Open the output data file and write the summary data
C
      OPEN( 1, FILE=P3INDATA, STATUS='NEW' )
      WRITE( 1, '(4I8)' ) M, L, NP, NALPHA
      WRITE( 1, '(8F10.6)' ) R
      WRITE( 1, * )
C
C     Build the state space array
C
      CALL GET_SP( M, L, SP )
C
C     Generate the required P3IN data
C
      DO 2 NCOUNT = 1, 4
         CALL UNPACK_NBARP( M, L, SP( 1, NCOUNTS( NCOUNT ) ), NBAR )
         WRITE( 1, '(16I2)' ) NBAR
         DO 1 I = 0, NALPHA - 1
            ALPHA = 1.0/2**I
            WRITE( 1, '(F10.6)' ) ALPHA
            CALL GET_P3IN( M, L, SP( 1, NCOUNTS( NCOUNT ) ),
     &                     ALPHA, R, P3IN )
            WRITE( 1, '(8F10.6)' ) P3IN
    1    WRITE( 1, * )
    2 CONTINUE
C
C     Finished
C
      END
      PROGRAM GET_3STAT
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C     Name: GET_3STAT
C
C     Purpose: Compute the indicated three-operator stationary
C              distribution
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C     Declare defining parameters and output file name
      INTEGER M, L, NP
      PARAMETER ( M = 4, L = 5, NP = 52360 )
      REAL ALPHA, ALPHA0, R( 0:2**L - 1 ), TDELTA
      INTEGER NALPHA
      PARAMETER ( ALPHA0 = 1.0/2**20, NALPHA = 1, TDELTA = 004 )
      CHARACTER*10 TESTFILE
      DATA TESTFILE/'3TEST45'/
C     Declare state space associated arrays
      INTEGER SP( M, NP ), SPA( 0:2**L - 1 ), MULTI( NP )
      REAL QBAR( NP, 0:1 )
C     Declare some local variables
C
      REAL LDELTA
      INTEGER TOGGLE, LOOP_COUNT
C     Get the objective function data
C
      OPEN( 1, FILE='RDATA4', STATUS='OLD' )
      READ( 1, '(8F10.6)' ) ( R( J ), J = 0, 2**L - 1 )
      CLOSE( 1 )
C
C     Open the output data file and write the summary output data
C
      OPEN( 1, FILE=TESTFILE, STATUS='NEW', FORM='UNFORMATTED' )
      WRITE( 1 ) M, L, NP
      WRITE( 1 ) ALPHA0, NALPHA, TDELTA
      WRITE( 1 ) R
C
C     Generate and store the state space set, S'
C
      CALL GET_SP( M, L, SP )
C
C     Next, get the indices of the absorbing states in S'
C
      CALL GET_SPA( M, L, NP, SP, SPA )
C
C     Now get the associated multinomial coefficient array
C
      CALL GET_MULTI( M, L, NP, SP, MULTI )
C
C     Compute for each required ALPHA
C
      DO 3 K = 0, NALPHA - 1
C
C     Initialize ALPHA and QBAR
C
         ALPHA = ALPHA0/2**K
         CALL INIT_QBAR( M, L, NP, MULTI, QBAR )
C
C     Loop until the tolerance parameter is met
C
         TOGGLE = 0
         LOOP_COUNT = 0
         LDELTA = 1.0
         DO 1 WHILE ( LDELTA .GT. TDELTA )
            CALL GET_3QBAR( M, L, NP, ALPHA, R, MULTI,
     &                      SP, SPA, QBAR( 1, TOGGLE ),
     &                      QBAR( 1, MOD( TOGGLE + 1, 2 ) ),
     &                      LDELTA )
            TOGGLE = MOD( TOGGLE + 1, 2 )
            LOOP_COUNT = LOOP_COUNT + 1
            WRITE( 6, * ) LOOP_COUNT, TDELTA, LDELTA
    1    CONTINUE
C
C     Output the termination information and the final vector
C
         WRITE( 1 ) ALPHA, LOOP_COUNT - 1, LDELTA
         DO 2 I = 1, NP/40
    2    WRITE( 1 ) ( QBAR( 40*( I - 1 ) + J, TOGGLE ), J = 1, 40 )
         WRITE( 1 ) ( QBAR( J, TOGGLE ), J = 40*( NP/40 ) + 1, NP )
    3 CONTINUE
C
C     Last write the absorbing state vector values
C
      WRITE( 1 ) ( QBAR( SPA( I ), TOGGLE ), I = 0, 2**L - 1 )
C
C     Finished
C
      END

D.3 Library Listings

stat.o:GET_SP
stat.o:GET_SPA
stat.o:GET_MULTI
stat.o:INIT_QBAR
stat.o:GET_3QBAR
stat.o:INIT_NBARP
stat.o:GET_NBARP
stat.o:GET_NFAC
stat.o:GET_P3MN
stat.o:GET_P3IN
stat.o:GET_P1IN
stat.o:UNPACK_NBARP

      SUBROUTINE GET_SP( M, L, SP, NP )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Name: GET_SP
C
C     Purpose: Generate S'
C
C     Note: The fourth argument (NP) is optional
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Declare calling arguments
C
      INTEGER M, L, SP( M, 0:* ), NP
C
C     Declare local variables
C
      INTEGER NCOUNT, NBARP( M ), I, NARG
      LOGICAL LSTAT
C
C     Initialize NBARP
C
      CALL INIT_NBARP( M, NBARP )
C
C     Loop until S' is complete
C
      NCOUNT = 0
      LSTAT = .TRUE.
      DO 2 WHILE( LSTAT )
C     Set this element in S'
C
         DO 1 I = 1, M
    1    SP( I, NCOUNT ) = NBARP( I )
C
C     Get the next one and increment the counter
C
         CALL GET_NBARP( M, L, NBARP, LSTAT )
    2 NCOUNT = NCOUNT + 1
C
C     Test argument count to determine whether to set NP
C
      NARG = NUMARG()
      IF( NARG .EQ. 4 ) NP = NCOUNT
C
C     Return to caller
C
      RETURN
      END
      SUBROUTINE GET_SPA( M, L, NP, SP, SPA )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Name: GET_SPA
C
C     Purpose: Generate a table of absorbing state indices in S'
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Declare calling arguments
C
      INTEGER M, L, NP, SP( M, NP ), SPA( 0:2**L - 1 )
C
C     Declare some local variables
C
      INTEGER I, J, K, JSTART
C
C     Initialize
C
      JSTART = 1
C
C     Loop over all I in S
C
      DO 3 I = 1, 2**L
C
C     Loop over all J in S'
C
         DO 2 J = JSTART, NP
C
C     Test SP until an exit condition is satisfied
C
            DO 1 K = 1, M
               IF( SP( K, J ) .NE. I ) GO TO 2
    1       CONTINUE
C
C     Exhausting M signals an absorbing state
C     Assign it and go after the next one
C
            SPA( I - 1 ) = J
            JSTART = J + 1
            GO TO 3
C
C     Exit to label 2 means that this J is not an
C     absorbing state
C
    2    CONTINUE
    3 CONTINUE
C
C     Return to caller
C
      RETURN
      END
      SUBROUTINE GET_MULTI( M, L, NP, SP, MULTI )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Name: GET_MULTI
C
C     Purpose: Get the multinomial coefficient table for the
C              supplied S'
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Declare calling arguments
C
      INTEGER M, L, NP, SP( M, NP ), MULTI( NP )
C
C     Declare local variables and function reference
C
      INTEGER MFAC, NBAR( 0:2**L - 1 ), NCOUNT
      INTEGER I, GET_NFAC
C
C     Compute and save M!
C
      MFAC = GET_NFAC( M )
C
C     Loop over all vectors in S'
C
      DO 2 NCOUNT = 1, NP
C
C     Set MULTI( NCOUNT ) = M!
C
         MULTI( NCOUNT ) = MFAC
C
C     Now unpack this NBARP
C
         CALL UNPACK_NBARP( M, L, SP( 1, NCOUNT ), NBAR )
C
C     Loop over the denominator factorials
C
         DO 1 I = 0, 2**L - 1
    1    MULTI( NCOUNT ) = MULTI( NCOUNT )/GET_NFAC( NBAR( I ) )
C
C     Close the loop
C
    2 CONTINUE
C
C     Return to caller
C
      RETURN
      END
      SUBROUTINE INIT_QBAR( M, L, NP, MULTI, QBAR )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Name: INIT_QBAR
C
C     Purpose: Initialize the QBAR probability vectors
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Declare calling arguments
C
      INTEGER M, L, NP, MULTI( NP )
      REAL QBAR( NP, 0:1 )
C
C     Declare a local variable
C
      REAL FRACTION
C
C     Set QBAR1 to its ALPHA = 1 value and zero QBAR2
C
      FRACTION = 1.0/( 2**( M*L ) )
      DO 1 I = 1, NP
         QBAR( I, 0 ) = MULTI( I )*FRACTION
    1 QBAR( I, 1 ) = 0.0
C
C     Return to caller
C
      RETURN
      END
      SUBROUTINE GET_3QBAR( M, L, NP, ALPHA, R, MULTI,
     &                      SP, SPA, QBAR0, QBAR1, LDELTA )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Name: GET_3QBAR
C
C     Purpose: Compute and return the indicated three-operator
C              stationary distribution vector transformation
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Declare calling arguments
C
      INTEGER M, L, NP, MULTI( NP ), SP( M, NP ), SPA( 0:2**L - 1 )
      REAL ALPHA, R( 0:2**L - 1 )
      REAL QBAR0( NP ), QBAR1( NP ), LDELTA
C
C     Declare local variables
C
      INTEGER MCOUNT, NCOUNT
      REAL SIGMA, P3MN( NP )
C
C     Clear QBAR1
C
      DO 1 MCOUNT = 1, NP
    1 QBAR1( MCOUNT ) = 0.0
C
C     Loop over all states in S'
C
      DO 3 NCOUNT = 1, NP
C
C     Get P3MN for this NBAR
C
         CALL GET_P3MN( M, L, NP, ALPHA, R, MULTI, NCOUNT, SP, P3MN )
C
C     Accumulate them
C
         DO 2 MCOUNT = 1, NP
    2    QBAR1( MCOUNT ) = QBAR1( MCOUNT ) +
     &                     QBAR0( NCOUNT ) * P3MN( MCOUNT )
C
C     Close the NBAR loop
C
    3 CONTINUE
C
C     Normalize
C
      SIGMA = 0.0
      DO 4 MCOUNT = 1, NP
    4 SIGMA = SIGMA + QBAR1( MCOUNT )
      DO 5 MCOUNT = 1, NP
    5 QBAR1( MCOUNT ) = QBAR1( MCOUNT )/SIGMA
C
C     Reset LDELTA
C
      SIGMA = 0.0
      DO 6 I = 0, 2**L - 1
    6 SIGMA = SIGMA + QBAR1( SPA( I ) )
      LDELTA = 1.0 - SIGMA
C
C     Return to caller
C
      RETURN
      END
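
The iteration that GET_3STAT drives through GET_3QBAR can be paraphrased in a few lines of Python. This is a minimal sketch under stated assumptions, not the listing itself: the names `step` and `iterate_to_tolerance`, the dense row-stochastic matrix `P`, the `max_loops` guard, and the three-state toy chain are all introduced here for illustration. The Fortran never stores the full transition matrix; it regenerates one row per state through GET_P3MN.

```python
def step(q, P):
    """One application q1[m] = sum_n q[n] * P[n][m], followed by renormalization."""
    n_states = len(q)
    q1 = [0.0] * n_states
    for n in range(n_states):
        for m in range(n_states):
            q1[m] += q[n] * P[n][m]
    total = sum(q1)
    return [x / total for x in q1]


def iterate_to_tolerance(q, P, absorbing, tol, max_loops=100000):
    """Repeat step() until the mass outside `absorbing` is no larger than tol.

    `absorbing` plays the role of the SPA index table; `max_loops` is a
    safety guard that the Fortran loop does not have (an assumption here).
    """
    delta = 1.0
    loops = 0
    while delta > tol and loops < max_loops:
        q = step(q, P)
        delta = 1.0 - sum(q[i] for i in absorbing)
        loops += 1
    return q, loops, delta


# Hypothetical toy chain: states 0 and 2 are nearly absorbing, state 1 is not.
P = [
    [0.98, 0.01, 0.01],
    [0.45, 0.10, 0.45],
    [0.01, 0.01, 0.98],
]
q, loops, delta = iterate_to_tolerance([1 / 3, 1 / 3, 1 / 3], P, [0, 2], 0.02)
```

As in the listing, the stopping quantity is the probability mass that lies outside the designated states, so the loop ends once nearly all mass sits on them.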
      SUBROUTINE INIT_NBARP( M, NBARP )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Name: INIT_NBARP
C
C     Purpose: Initialize NBARP to the starting element of S'
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Declare calling arguments
C
      INTEGER M, NBARP( M )
C
C     Set all pointers in the NBARP array to cell 1
C
      DO 1 I = 1, M
    1 NBARP( I ) = 1
C
C     Return to caller
C
      RETURN
      END
      SUBROUTINE GET_NBARP( M, L, NBARP, NSTAT )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Name: GET_NBARP
C
C     Purpose: Generate the successor element in S'
C
C     Note: (1) This procedure assumes that M and L are in range
C               and that the caller supplied NBARP is valid
C
C           (2) The fourth argument (NSTAT) is optional
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Declare calling arguments
C
      INTEGER M, L, NBARP( M )
      LOGICAL NSTAT
C
C     Declare local variables
C
      INTEGER IMAX, I, J, NARG
      LOGICAL LSTAT
C
C     Set maximum index
C
      IMAX = 2**L
C
C     Process most frequent transition
C
      IF( NBARP( M ) .LT. IMAX ) THEN
         NBARP( M ) = NBARP( M ) + 1
         LSTAT = .TRUE.
C
C     Next most frequent
C
      ELSE IF( NBARP( 1 ) .LT. IMAX ) THEN
         DO 1 I = M - 1, 1, -1
            IF( NBARP( I ) .LT. IMAX ) GO TO 2
    1    CONTINUE
    2    NBARP( I ) = NBARP( I ) + 1
         DO 3 J = I + 1, M
    3    NBARP( J ) = NBARP( I )
         LSTAT = .TRUE.
C
C     Anything else is terminal
C
      ELSE
         LSTAT = .FALSE.
      END IF
C
C     Test argument count to determine whether to set NSTAT
C
      NARG = NUMARG()
      IF( NARG .EQ. 4 ) NSTAT = LSTAT
C
C     Return to caller
C
      RETURN
      END
      INTEGER FUNCTION GET_NFAC( N )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Name: GET_NFAC
C
C     Purpose: Compute and return N!
C
C     Note: This procedure assumes that N is a nonnegative integer
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Declare calling argument
C
      INTEGER N
C
C     If N = 0 or N = 1, then N! = 1
C
      IF( N .LE. 1 ) THEN
         GET_NFAC = 1
C
C     Otherwise, recurse
C
      ELSE
         GET_NFAC = N * GET_NFAC( N - 1 )
      END IF
C     Return to caller
C
      RETURN
      END
      SUBROUTINE GET_P3MN( M, L, NP, ALPHA, R, MULTI,
     &                     NCOUNT, SP, P3MN )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Name: GET_P3MN
C
C     Purpose: Compute and return the indicated conditional
C              probability array
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Declare calling arguments
C
      INTEGER M, L, NP, MULTI( NP ), NCOUNT, SP( M, NP )
      REAL ALPHA, R( 0:2**L - 1 ), P3IN( 0:2**L - 1 ), P3MN( NP )
C
C     Declare local variables
C
      INTEGER MCOUNT, K
C
C     Get P3IN for this NBARP
C
      CALL GET_P3IN( M, L, SP( 1, NCOUNT ), ALPHA, R, P3IN )
C
C     Initialize the P3MN vector
C
      DO 1 MCOUNT = 1, NP
    1 P3MN( MCOUNT ) = MULTI( MCOUNT )
C
C     Loop over the solutions represented in this NBARP
C
      DO 3 J = 1, M
         DO 2 MCOUNT = 1, NP
            K = SP( J, MCOUNT ) - 1
    2    P3MN( MCOUNT ) = P3MN( MCOUNT )*P3IN( K )
    3 CONTINUE
C
C     Return to caller
C
      RETURN
      END
      SUBROUTINE GET_P3IN( M, L, NBARP, ALPHA, R, P3IN )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Name: GET_P3IN
C
C     Purpose: Compute and return the indicated conditional
C              probability array
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Declare calling arguments
C
      INTEGER M, L, NBARP( M )
      REAL ALPHA, R( 0:2**L - 1 ), P3IN( 0:2**L - 1 )
C
C     Declare local variables
C
      INTEGER I, J, K, IMAX, INDX
      REAL P1IN( 0:2**L - 1 ), P3INP( 0:2**L - 1 )
      REAL P2, SUM
C
C     Set maximum loop index
C
      IMAX = 2**L - 1
C
C     Clear the P3IN and P3INP arrays
C
      DO 1 I = 0, IMAX
         P3IN( I ) = 0.0
    1 P3INP( I ) = 0.0
C
C     Get the P1IN array
C
      CALL GET_P1IN( M, L, NBARP, R, P1IN )
C
C     Build the P3INP array
C
      DO 4 I = 0, IMAX
         DO 3 J = 0, IMAX
            P2 = P1IN( I )*P1IN( J )
            DO 2 K = 0, L
               INDX = I
               CALL MVBITS( J, 0, K, INDX, 0 )
    2       P3INP( INDX ) = P3INP( INDX ) + P2
    3    CONTINUE
    4 CONTINUE
C
C     Perturb with mutation
C
      DO 6 J = 0, IMAX
         IF( P3INP( J ) .NE. 0.0 ) THEN
            DO 5 I = 0, IMAX
    5       P3IN( I ) = P3IN( I ) + P3INP( J )*
     &                  ALPHA**POPCNT( XOR( I, J ) )
         END IF
    6 CONTINUE
C
C     Now normalize them
C
      SUM = 0.0
      DO 7 I = 0, IMAX
    7 SUM = SUM + P3IN( I )
      DO 8 I = 0, IMAX
    8 P3IN( I ) = P3IN( I )/SUM
C
C     Return to caller
C
      RETURN
      END
      SUBROUTINE GET_P1IN( M, L, NBARP, R, P1IN )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Name: GET_P1IN
C
C     Purpose: Compute and return the indicated conditional
C              probability array
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Declare calling arguments
C
      INTEGER M, L, NBARP( M )
      REAL R( 0:2**L - 1 ), P1IN( 0:2**L - 1 )
C
C     Declare local variables
C
      INTEGER I, J, K, IMAX
      REAL SUM
C
C     Set maximum loop index
C
      IMAX = 2**L - 1
C
C     Clear the P1IN vector
C
      DO 1 I = 0, IMAX
    1 P1IN( I ) = 0.0
C
C     Compute the numerators and accumulate the denominator
C
      SUM = 0.0
      DO 2 J = 1, M
         K = NBARP( J ) - 1
         P1IN( K ) = P1IN( K ) + R( K )
    2 SUM = SUM + R( K )
C
C     Now normalize them
C
      DO 3 I = 0, IMAX
    3 P1IN( I ) = P1IN( I )/SUM
C
C     Return to caller
C
      RETURN
      END
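
The arithmetic of GET_P1IN and GET_P3IN may be easier to follow in a compact paraphrase. The Python below is a sketch written for this purpose, not a translation certified against the original: the function names `p1in` and `p3in`, the list-based arrays, and the example population are assumptions. Here `alpha` is used exactly as the listing uses ALPHA, as a per-bit weight raised to the Hamming distance between two bitstrings; how that weight relates to a mutation probability is defined in the body of the dissertation and is not restated here.

```python
def p1in(pop, r):
    """Proportional selection probabilities for one population.

    pop holds 1-based solution indices (as NBARP does); r[k] is the reward
    of solution k, indexed from 0.
    """
    p = [0.0] * len(r)
    total = 0.0
    for s in pop:
        k = s - 1
        p[k] += r[k]
        total += r[k]
    return [x / total for x in p]


def p3in(pop, r, alpha, length):
    """Selection, then one-point crossover, then mutation weighting."""
    n = 1 << length
    p1 = p1in(pop, r)
    crossed = [0.0] * n
    for i in range(n):
        for j in range(n):
            p2 = p1[i] * p1[j]
            for k in range(length + 1):
                low = (1 << k) - 1
                # Copy the low k bits of j into i (the MVBITS call).
                child = (i & ~low) | (j & low)
                crossed[child] += p2
    out = [0.0] * n
    for j in range(n):
        if crossed[j] != 0.0:
            for i in range(n):
                # Weight by alpha to the power of the Hamming distance.
                out[i] += crossed[j] * alpha ** bin(i ^ j).count("1")
    total = sum(out)
    return [x / total for x in out]


# Hypothetical example: L = 2, four solutions, population {1, 1, 2, 4}.
rewards = [1.0, 2.0, 3.0, 4.0]
selection = p1in([1, 1, 2, 4], rewards)
```

Two limiting cases give a quick sanity check. With `alpha = 1.0` every bitstring receives the same weight, so the result is uniform; with `alpha = 0.0` a population made of copies of a single solution maps back onto that solution with probability one.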
      SUBROUTINE UNPACK_NBARP( M, L, NBARP, NBAR )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Name: UNPACK_NBARP
C
C     Purpose: Generate displayable version of packed NBAR
C
C     Note: This procedure assumes that M and L are in range
C           and that the caller supplied NBARP is valid
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Declare calling arguments
C
      INTEGER M, L, NBARP( M ), NBAR( 0:2**L - 1 )
C
C     Declare local variables
C
      INTEGER I
C
C     Clear the NBAR vector
C
      DO 1 I = 0, 2**L - 1
    1 NBAR( I ) = 0
C
C     Now set the nonzero components
C
      DO 2 I = 1, M
    2 NBAR( NBARP( I ) - 1 ) = NBAR( NBARP( I ) - 1 ) + 1
C
C     Return to caller
C
      RETURN
      END
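
The state-space bookkeeping spread across GET_SP, GET_NBARP, UNPACK_NBARP and GET_MULTI can be summarized in one short Python sketch. It is offered as an illustration only: the function names are hypothetical, tuples and lists stand in for the packed integer arrays, and `math.comb` is used solely to cross-check the state counts that appear as NP in the PARAMETER statements of the listings.

```python
from math import comb, factorial


def next_nbarp(nbarp, imax):
    """Advance a nondecreasing index vector in place; False when exhausted."""
    m = len(nbarp)
    if nbarp[m - 1] < imax:          # most frequent case: bump the last slot
        nbarp[m - 1] += 1
        return True
    if nbarp[0] < imax:              # carry: find rightmost slot below imax
        i = m - 2
        while nbarp[i] >= imax:
            i -= 1
        nbarp[i] += 1
        for j in range(i + 1, m):
            nbarp[j] = nbarp[i]
        return True
    return False                     # every slot is at imax: terminal


def enumerate_sp(m, length):
    """List every population of m solutions drawn from 2**length bitstrings."""
    imax = 1 << length
    nbarp = [1] * m
    states = []
    more = True
    while more:
        states.append(tuple(nbarp))
        more = next_nbarp(nbarp, imax)
    return states


def unpack(nbarp, length):
    """Occupancy counts: how many copies of each solution the population has."""
    nbar = [0] * (1 << length)
    for s in nbarp:
        nbar[s - 1] += 1
    return nbar


def multinomial(nbarp, length):
    """m! divided by the factorial of every occupancy count."""
    coeff = factorial(len(nbarp))
    for count in unpack(nbarp, length):
        coeff //= factorial(count)
    return coeff


small = enumerate_sp(2, 1)
```

Summing the multinomial coefficients over all states gives (2**L)**M, the number of ordered populations, which is why INIT_QBAR scales MULTI by 1.0/2**(M*L) to obtain a probability vector.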
BIOGRAPHICAL SKETCH

Tom Davis enrolled in the Purdue University School of Electrical Engineering in September, 1967 and was awarded the Bachelor of Science in Electrical Engineering degree in June, 1971. Upon graduation, he was commissioned in the United States Navy and entered flight training at Pensacola, Florida. He was designated a naval aviator in April, 1973. He subsequently completed several aviation related tours of duty, including two years as a primary flight instructor at Saufley Field, Florida, and three years in an air antisubmarine warfare squadron homeported in Brunswick, Maine. During the latter tour, he completed three extended overseas deployments to European and North Atlantic island operational sites.

Tom resigned from the Navy in August, 1978 and entered the Purdue University School of Electrical Engineering as a graduate student. He was awarded the Master of Science in Electrical Engineering degree in December, 1979. In January, 1980, he reported to work at the Guided Weapons Division of the Air Force Armament Test Laboratory at Eglin AFB, Florida. During the following eight years, he was assigned to a variety of millimeter wave radar and infrared seeker development programs for autonomous targeting tactical weapons systems.

In May, 1988, Tom was admitted to the Graduate School of the University of Florida, and he began attending classes in Gainesville in August of that year. During the calendar year ending in August, 1989, he satisfied the residency, course work and entrance examination requirements for admission to the PhD program. At that time, he returned to Eglin AFB and resumed his duties in the Armament Laboratory. He was admitted to candidacy following a qualifying exam conducted at Gainesville in October, 1989, and since then has been engaged in dissertation research in stochastic relaxation search algorithms.
I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Jose C. Principe, Chairman
Associate Professor of Electrical Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Donald G. Childers
Professor of Electrical Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Antonio Arroyo
Associate Professor of Electrical Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Murali Rao
Professor of Mathematics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Eugene R. Chenette
Professor of Electrical Engineering
This dissertation was submitted to the Graduate Faculty of the College of Engineering and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

May 1991

Winfred M. Phillips
Dean, College of Engineering

Madelyn M. Lockhart
Dean, Graduate School

