Citation

## Material Information

Title:
Toward an extrapolation of the simulated annealing convergence theory onto the simple genetic algorithm
Creator:
Davis, Thomas E. ( Dissertant )
Principe, Jose C. ( Thesis advisor )
Childers, Donald G. ( Reviewer )
Arroyo, Antonio A. ( Reviewer )
Rao, Murali ( Reviewer )
Chenette, Eugene R. ( Reviewer )
Phillips, Winfred M. ( Degree grantor )
Place of Publication:
Gainesville, Fla.
Publisher:
University of Florida
Publication Date:
1991
Language:
English
Physical Description:
viii, 166 leaves : ill. ; 29 cm.

## Subjects

Subjects / Keywords:
Crossovers ( jstor )
Determinants ( jstor )
Ergodic theory ( jstor )
Genetic algorithms ( jstor )
Genetic mutation ( jstor )
Integers ( jstor )
Markov chains ( jstor )
Matrices ( jstor )
Polynomials ( jstor )
Simulated annealing ( jstor )
Algorithms ( lcsh )
Combinatorial optimization ( lcsh )
Dissertations, Academic -- Electrical Engineering -- UF
Electrical Engineering thesis Ph. D
Simulated annealing (Mathematics) ( lcsh )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

## Notes

Abstract:
Simulated annealing and the genetic algorithm are stochastic relaxation search techniques suitable for application to a wide variety of combinatorial complexity nonconvex optimization problems. Each produces a sequence of candidate solutions (or populations of candidate solutions) to the underlying optimization problem, and the purpose of both algorithms is to generate sequences biased toward solutions which optimize the objective function. The appeal of simulated annealing is that it provides asymptotic convergence to a globally optimal solution. A substantial body of knowledge exists concerning the algorithm convergence behavior. It is based upon a nonstationary Markov chain algorithm model. No genetic algorithm model comparable in scope exists in the literature. This work constitutes an attempt to provide such a model and accompanying convergence theory by extrapolating the simulated annealing results onto the genetic algorithm. A prerequisite, developed herein, is a nonstationary Markov chain genetic algorithm model. The essence of the simulated annealing theory is demonstration of (1) existence of a unique asymptotic probability distribution (stationary distribution) for the stationary Markov chain corresponding to every strictly positive constant value of an algorithm control parameter (absolute temperature), (2) existence of a stationary distribution limit as the control parameter approaches zero, (3) the desired behavior of the stationary distribution limit (i.e. optimal solution with probability one) and (4) sufficient conditions on the algorithm control parameter to ensure that the nonstationary algorithm achieves (asymptotically) the limiting distribution. With the exception of (3), this work adapts that methodology to the genetic algorithm Markov chain model employing a genetic operator parameter (mutation probability) as the algorithm control parameter. 
The results include a mutation probability control parameter bound analogous to (and asymptotically superior to) the conventional simulated annealing parameter bounds, and a framework for representing the genetic algorithm stationary distribution components at all consistent fixed control parameter values, including zero. The genetic algorithm stationary distribution limit has nonzero components corresponding to all solutions. Thus, the simulated annealing global optimality convergence result does not extrapolate. However, both empirical and theoretical evidence is provided which suggests that the desired limiting behavior can be approached by suitably adjusting the algorithm parameters.
Thesis:
Thesis (Ph. D.)--University of Florida, 1991.
Bibliography:
Includes bibliographical references (leaves 163-165).
Also available on World Wide Web
General Note:
Typescript.
General Note:
Vita.
Statement of Responsibility:
by Thomas E. Davis.

## Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright [name of dissertation author]. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Resource Identifier:
026242136 ( AlephBibNum )
25046248 ( OCLC )
AHZ5752 ( NOTIS )

Full Text

TOWARD AN EXTRAPOLATION OF THE SIMULATED ANNEALING
CONVERGENCE THEORY ONTO THE SIMPLE GENETIC ALGORITHM

By

THOMAS E. DAVIS

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

1991

ACKNOWLEDGEMENTS

The author is extremely fortunate in having the assistance of several talented acade-

micians during the conduct of the research program reported in this dissertation. Notably,

Professor Jose Principe, who supervised this work and contributed several key ideas

developed herein, proved a very valuable source of encouragement and support. Also,

Professor Murali Rao assisted very substantially in enforcing mathematical rigor, espe-

cially in the formulation of the Markov chain appendices. Professors Antonio Arroyo and

Donald Childers, who served on the committee overseeing this work, are remembered

fondly for the constructive comments they provided during the visits the author made to

Gainesville while conducting this research, as well as for productive associations while

the author was in residence. Also, the assistance provided by Professor Eugene Chenette

from the Eglin Graduate Center in dealing with a variety of administrative complications,

as well as his service on the committee overseeing this work, is graciously acknowl-

edged.

Additionally, the generous support of the US Air Force Armament Laboratory to

this work is sincerely appreciated. The author's management chain, notably Mr. Lynn

Deibler, Lt. Col. Rex Franklin, Dr. Eugene Youngblood and Lt. Col. Tom Callen, pro-

vided continual encouragement and working condition flexibility. Without their support,

this activity would likely not have been possible.

The Computer Science Directorate at Eglin provided some exceptionally valuable

computer support, on the Eglin AFB Cray Y-MP, under very flexible conditions. Some of

the insights gained during the conduct of this research would be very difficult, and per-

haps impossible, to attain through any method other than simulation. Members of the

staff at the Computer Science Directorate who contributed significantly, especially in

helping the VMS-inclined author through the UNIX maze, include Mr. Eddie Blackwell,

Mr. Ben McKinnon and Mr. Danny Majors. Mr. Bill Clements, who made the computing

resources available, and Mr. Calvin George, who helped the author arrange the support,

are also gratefully acknowledged.

Finally, the author wishes to thank Sumiko, who entered his life during the conduct

of this research program, for the support and understanding whose need she never fails to

anticipate.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS

ABSTRACT

SECTIONS

1 INTRODUCTION

1.1 Non-Convex Combinatorial Optimization and Stochastic Search Algorithms
1.2 Organization

2 SIMULATED ANNEALING

2.1 Overview
2.2 Statistical Mechanics and Annealing of Solids
2.3 Combinatorial Optimization by Simulated Annealing
2.4 Theoretical Foundations of Simulated Annealing

3 THE GENETIC ALGORITHM

3.1 Overview
3.2 The Simple Genetic Algorithm Operators
3.3 Building Blocks, Schemata and the Fundamental Theorem
3.4 An Assessment of the Genetic Algorithm Theoretical Foundation

4 A MARKOV CHAIN MODEL OF THE SIMPLE GENETIC ALGORITHM

4.1 Overview
4.2 The Markov Chain Model
4.3 The State Behavior of the Simple Genetic Algorithm

5 SOME EMPIRICAL RESULTS

5.1 Overview
5.2 State Space Enumeration
5.3 Reward Function Data
5.4 Conditional Probabilities vs
5.5 Converged Limiting Stationary Distributions

6 THE CRAMER'S RULE FORMULATION OF THE STATIONARY DISTRIBUTION

6.1 Overview
6.2 The Stationary Distribution Description
6.3 Positivity of the Stationary Distribution Components
6.4 The Indeterminate Form at a = 0

7 THE ZERO MUTATION PROBABILITY STATIONARY DISTRIBUTION LIMIT

7.1 Overview
7.2 Functional Form of the Stationary Distribution
7.3 The Absorbing State Rows of P-I and IP-1,-I
7.4 Reformulation of Propositions 6.7 and 6.8
7.5 The Stationary Distribution Limit

8 A MONOTONIC MUTATION PROBABILITY ERGODICITY BOUND

8.1 Overview
8.2 A Weak Ergodicity Bound
8.3 Strong Ergodicity
8.4 Comparison With the Simulated Annealing Parameter Bound

9 REPRESENTATION OF THE STATIONARY DISTRIBUTION SOLUTION

9.1 Overview
9.2 The Limiting Case a = 1
9.3 The General Case 0 < a < 1
9.4 The Limiting Case 0
9.5 Extending the Stationary Distribution Representation

10 CONCLUSIONS AND FUTURE DIRECTION

10.1 Summary
10.2 Contributions of the Research
10.3 Future Direction

APPENDICES

A DISCRETE TIME FINITE STATE MARKOV CHAINS

A.1 Introduction
A.2 Elementary Definitions
A.3 Time-Homogeneous Markov Chains
A.4 Inhomogeneous Markov Chains

B THE PERRON-FROBENIUS THEOREM AND STOCHASTIC MATRICES

B.1 Introduction
B.2 The Perron-Frobenius Theorem and Ancillary Results for Primitive Matrices
B.3 The Perron-Frobenius Theorem for Stochastic Matrices

C VANDERMONDE DETERMINANTS, SYMMETRIC AND ALTERNATING POLYNOMIALS

C.1 Introduction
C.2 Evaluation of Vandermonde Determinants
C.3 Symmetric (and Alternating) Polynomials
C.4 Quasi-Symmetric (and Quasi-Alternating) Polynomials

D COMPUTER LISTINGS

D.1 Introduction
D.2 Main Program Listings
D.3 Library Listings

REFERENCES

BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

TOWARD AN EXTRAPOLATION OF THE SIMULATED ANNEALING
CONVERGENCE THEORY ONTO THE SIMPLE GENETIC ALGORITHM

By

THOMAS E. DAVIS

May 1991

Chairman: Professor Jose C. Principe
Major Department: Electrical Engineering

Simulated annealing and the genetic algorithm are stochastic relaxation search tech-

niques suitable for application to a wide variety of combinatorial complexity nonconvex

optimization problems. Each produces a sequence of candidate solutions (or populations

of candidate solutions) to the underlying optimization problem, and the purpose of both

algorithms is to generate sequences biased toward solutions which optimize the objective

function.

The appeal of simulated annealing is that it provides asymptotic convergence to a

globally optimal solution. A substantial body of knowledge exists concerning the algo-

rithm convergence behavior. It is based upon a nonstationary Markov chain algorithm

model. No genetic algorithm model comparable in scope exists in the literature. This

work constitutes an attempt to provide such a model and accompanying convergence

theory by extrapolating the simulated annealing results onto the genetic algorithm. A pre-

requisite, developed herein, is a nonstationary Markov chain genetic algorithm model.

The essence of the simulated annealing theory is demonstration of (1) existence of a

unique asymptotic probability distribution (stationary distribution) for the stationary Mar-

kov chain corresponding to every strictly positive constant value of an algorithm control

parameter (absolute temperature), (2) existence of a stationary distribution limit as the

control parameter approaches zero, (3) the desired behavior of the stationary distribution

limit (i.e. optimal solution with probability one) and (4) sufficient conditions on the algo-

rithm control parameter to ensure that the nonstationary algorithm achieves (asymptoti-

cally) the limiting distribution. With the exception of (3), this work adapts that

methodology to the genetic algorithm Markov chain model employing a genetic operator

parameter (mutation probability) as the algorithm control parameter. The results include a

mutation probability control parameter bound analogous to (and asymptotically superior

to) the conventional simulated annealing parameter bounds, and a framework for repre-

senting the genetic algorithm stationary distribution components at all consistent fixed

control parameter values, including zero.

The genetic algorithm stationary distribution limit has nonzero components corre-

sponding to all solutions. Thus, the simulated annealing global optimality convergence

result does not extrapolate. However, both empirical and theoretical evidence is provided

which suggests that the desired limiting behavior can be approached by suitably adjusting

the algorithm parameters.

SECTION 1

INTRODUCTION

1.1 Non-Convex Combinatorial Optimization and Stochastic Search Algorithms

A wide variety of engineering applications lend themselves to formulations which

require the solution of combinatorial optimization problems. Typically, the optimization

problem is nonconvex and is defined over a very high dimensionality search space (e.g.

inverse vision problems, in which an image array of 512×512 pixels at 8 bits/pixel might

be encountered, resulting in a search space dimensionality of ~2M). Consequently, direct

solution is usually intractable.
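
The figure of roughly 2M quoted above can be checked with a one-line count. The reading that each pixel bit acts as one binary search variable is an assumption made here for illustration; the text does not define dimensionality explicitly.

$$512 \times 512 \times 8 = 2^{9} \cdot 2^{9} \cdot 2^{3} = 2^{21} = 2{,}097{,}152 \approx 2\mathrm{M}$$

Under that reading, the number of distinct candidate images is $2^{2^{21}}$, which makes exhaustive enumeration impractical.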

An alternative to direct solution is to select one of a variety of iterative improve-

ment solution techniques, usually some variant of gradient search. But by definition,

deterministic iterative improvement techniques terminate in local extrema, and they

ordinarily provide no means of assessing the amount by which the selected local extremum

deviates from the global extremum. A typical means of avoiding local extrema

entrapment is to implement the iterative improvement solution method stochastically.

The most commonly employed stochastic algorithm approach to combinatorial optimization

is simulated annealing [KiGe83, LaAa87], which is also sometimes referred to

as probabilistic hill climbing [RoSa85]. It exploits the analogy of combinatorial

optimization to the annealing of crystalline solids, in which a solid is cooled very gradu-

ally from some elevated temperature and thereby allowed to relax toward its low energy

states. The appeal of the algorithm class derives from the fact that provided certain

constraints on an algorithm control parameter (analogous to absolute temperature) are

observed, asymptotic convergence to a global extremum is guaranteed.

The key limitation of simulated annealing is that the convergence behavior is

asymptotic. Thus global optimality is obtained only after an infinite number of algorithm

iterations. The rate of convergence to optimality is determined by a nonnegative algo-

rithm control parameter whose ideal value is zero and which must observe a lower bound

in order to assure coherent algorithm behavior. The best available known bound for the

parameter, the annealing schedule bound, is of the form K/log(k) where k is the iteration

index and K is a parameter independent of k [GeGe84, MiRo85].
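
As a hedged illustration of the K/log(k) form, the following sketch tabulates a logarithmic schedule. The function name `log_schedule` and the value `K = 1.0` are assumptions chosen for illustration; the text states only that K is independent of k.

```python
import math

def log_schedule(k: int, K: float = 1.0) -> float:
    """Control parameter value K / log(k) at iteration index k (k >= 2)."""
    if k < 2:
        raise ValueError("k must be at least 2 so that log(k) > 0")
    return K / math.log(k)

# The schedule decays very slowly: squaring the iteration count
# only halves the control parameter, since log(k**2) = 2 * log(k).
for k in (10, 100, 10_000, 100_000_000):
    print(k, round(log_schedule(k), 4))
```

The slow decay is what makes the asymptotic guarantee expensive in practice: each halving of the parameter requires squaring the number of iterations.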

Another combinatorial optimization stochastic search technique reported in the lit-

erature is the genetic algorithm [Davi87, Gold83, Gold89a, Gref85, Gref87]. It emulates

the evolution of biological systems by employing a set of stochastic operators (e.g.

reproduction, crossover and mutation) to transform a population of candidate solutions to

the underlying optimization problem into a new (descendent) population. It has some fea-

tures which suggest that it may provide significantly improved convergence behavior

over simulated annealing on certain types of optimization problems. However, the nature

of the genetic operators and their influence on algorithm behavior is only understood in

general terms. No complete theoretical model of the algorithm exists in the literature. The

fundamental goal of the work reported here is to provide a theoretical framework for ana-

lyzing the algorithm based upon the asymptotic probability distribution of the solution

sequences which it produces. The work reported herein includes significant progress on

the key intermediate steps to achieving that goal.
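
The three operators named above can be sketched as one generation of a simple genetic algorithm on bit strings. This is a minimal sketch, not the dissertation's implementation: fitness-proportionate reproduction, one-point crossover, independent bit-flip mutation, and the names `next_generation`, `p_crossover` and `p_mutation` with their default values are all assumptions made for illustration.

```python
import random

def next_generation(population, fitness, p_crossover=0.7, p_mutation=0.01, rng=random):
    """Transform a population of bit lists into a descendant population.

    Assumes fitness(ind) > 0 for every individual.
    """
    # Reproduction: sample parents with probability proportional to fitness.
    weights = [fitness(ind) for ind in population]
    parents = rng.choices(population, weights=weights, k=len(population))

    children = []
    for a, b in zip(parents[0::2], parents[1::2]):
        a, b = a[:], b[:]  # copy so the parents are not modified
        # Crossover: with probability p_crossover, swap tails at a random cut.
        if len(a) > 1 and rng.random() < p_crossover:
            cut = rng.randrange(1, len(a))
            a[cut:], b[cut:] = b[cut:], a[cut:]
        children.extend([a, b])
    if len(parents) % 2:  # odd population size: carry the last parent over
        children.append(parents[-1][:])

    # Mutation: flip each bit independently with probability p_mutation.
    return [[bit ^ 1 if rng.random() < p_mutation else bit for bit in ind]
            for ind in children]

# Usage: one generation on a random population, rewarding strings with more ones.
rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in range(8)] for _ in range(6)]
population = next_generation(population, fitness=lambda ind: 1 + sum(ind), rng=rng)
```

Because the population size is fixed and each operator is random, the sequence of populations forms a Markov chain on the finite set of possible populations.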

1.2 Organization

The remaining sections of this paper are organized as follows. Sections 2 and 3 are

background reviews of the simulated annealing and genetic algorithm literature respec-

tively. Section 2 places considerable emphasis on the methodology employed to yield the

asymptotic convergence results which are the theoretical foundation of the simulated

annealing algorithm. That methodology appeals heavily to the theory of inhomogeneous

(nonstationary) Markov chains and their asymptotic state probability distributions. The

essence of the simulated annealing convergence theory is a set of sufficient conditions to

ensure that the asymptotic probability distribution of the Markov chain which represents

the algorithm is independent of its starting state and has probability zero for all states cor-

responding to sub-optimal solutions.

Section 3 begins with a verbal description of the three fundamental stochastic

operators employed in genetic algorithms (i.e. reproduction, crossover and mutation), and

proceeds to review the existing theoretical foundation of the algorithm class. A conclu-

sion of that section is that while certain important theoretical results exist, notably the so-

called schema theorem and some work on a problem construct referred to as the minimal

deceptive problem, the genetic algorithm lacks the theoretical foundation necessary to

either compare it with simulated annealing or to answer key questions concerning the

design of a genetic algorithm for a given application.

The author's contribution to this work begins with Section 4. The major result of

that section is a very general, nonstationary Markov chain model of the variants of the

genetic algorithm which employ combinations of the three fundamental genetic algorithm

operators. The model is tailored to resemble that employed in developing the simulated

annealing methodology, and in that regard, the genetic algorithm mutation operator is

shown to provide a function very similar to that of the simulated annealing absolute temperature

analog. Specifically, the stationary algorithm corresponding to every admissible strictly

positive constant value of the mutation probability parameter possesses a unique

asymptotic probability distribution (stationary distribution).

The key results of Section 4 are state transition matrices, formulated in terms of the

algorithm parameters and objective function of the underlying optimization problem, for

one, two and three-operator variants of the algorithm. Due to the nature of the genetic

operators, the state transition matrices exhibit an extremely high degree of symmetry.

These matrices, and some key related results, are used extensively in later sections.

Section 5 digresses briefly from the theoretical development to produce and

examine some empirical work based upon the algorithm model. The presentation is not,

nor is it intended to be, a thorough empirical study. It is provided to help fix some of the

algorithm model state space and asymptotic probability distribution ideas which are cen-

tral to this work, and it anticipates some of the theoretical results which follow.

Section 6 resumes the theoretical development. Its result is an expression for the

components of the unique asymptotic probability distribution produced by the stationary

algorithm variants which implement the mutation operator with nonzero mutation proba-

bility (i.e. the stationary two and three-operator algorithm variants). The result is

expressed in terms of Cramer's Rule and thus its solution requires evaluation of

determinants. The determinants are the characteristic polynomials, evaluated at λ = 1, of

matrices derived from the state transition matrix produced in Section 4 by zeroing one

row. A later section attacks the problem of explicitly solving the system, based upon the

highly symmetrical nature of the state transition matrix, but some very significant results

are obtainable from the product of Section 6 without explicit solution.
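
The determinant description above can be tried on a small example. The sketch below assumes one reading of the text: for a chain with a unique stationary distribution, component i is taken proportional to det(I - P_i), where P_i is the transition matrix P with row i zeroed. The 3-state matrix is an arbitrary toy chain, not a genetic algorithm transition matrix, and the names `det` and `stationary_by_determinants` are hypothetical.

```python
from fractions import Fraction

def det(m):
    """Determinant by Gaussian elimination over exact fractions."""
    a = [row[:] for row in m]
    n, sign, result = len(a), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            sign = -sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return sign * result

def stationary_by_determinants(P):
    """Normalize det(I - P_i), where P_i is P with row i set to zero."""
    n = len(P)
    dets = []
    for i in range(n):
        m = [[(1 if r == c else 0) - (Fraction(0) if r == i else P[r][c])
              for c in range(n)] for r in range(n)]
        dets.append(det(m))
    total = sum(dets)
    return [d / total for d in dets]

half, quarter, zero = Fraction(1, 2), Fraction(1, 4), Fraction(0)
P = [[half, half, zero],
     [quarter, half, quarter],
     [zero, half, half]]
pi = stationary_by_determinants(P)
# Check stationarity exactly: pi P must equal pi.
assert all(sum(pi[i] * P[i][j] for i in range(3)) == pi[j] for j in range(3))
print(pi)
```

Exact rational arithmetic is used so that the stationarity check is an equality rather than a tolerance test; for large state spaces this direct evaluation would be far too expensive.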

An essential step in establishing a connection between simulated annealing and the

genetic algorithm is demonstrating the existence of a stationary distribution limit for the

algorithm as the mutation probability approaches zero. Section 7 accomplishes that task

and also provides a foundation for deducing, in Section 8, a mutation probability bound

analogous to the annealing schedule bounds of the simulated annealing algorithm. The

results developed in Sections 7 and 8 apply to both the two and three-operator algorithm

variants.

A somewhat surprising result produced in Section 7 and anticipated by the empiri-

cal study reported in Section 5 is that the stationary distribution zero mutation probability

limit does not necessarily isolate globally optimal solutions. In fact, it provides nonzero

probability for all solutions of the underlying optimization problem and consequently the

extrapolation of the simulated annealing methodology is less than exact. However, both

the empirical results presented in Section 5 and some results developed later in Section 9

suggest that the required limiting behavior can be approached as closely as desired by

suitably adjusting the algorithm parameters.

Section 9 attacks the problem of explicitly solving the system which results from

the Cramer's Rule formulation of the stationary distribution of the time-homogeneous

two and three-operator algorithms. It is a very extensive development which yields an

expression for the coefficient of the general term in the Taylor's series expansion of the

required determinants. It is based upon the highly symmetrical nature of the state

transition matrix, as alluded to earlier.

The results of Section 9 are not reduced to a directly useable explicit solution.

Nevertheless, they do provide significant insight into the functional form of the stationary

distribution components. Furthermore, Section 9.5 points out some very significant iden-

tities which exist among the coefficients of the Taylor's series and suggests a method for

continuing the Section 9 development based upon the algebra of symmetric and

alternating polynomials. Explicit solution of the stationary distribution equations is the

major incomplete task required for extrapolation of the simulated annealing convergence

theory onto the genetic algorithm.

Section 10 summarizes this work and recapitulates the significant results. It also

proposes continuation of two parts of this research: (1) pursuit of the stationary distribu-

tion solution and (2) refinement of the mutation probability control parameter bound.

An appropriate mathematical framework for examining both the simulated

annealing and genetic algorithms is the theory of Markov chains. Appendix A is included

to summarize some essential definitions and theorems. Appendix B is devoted to the

Perron-Frobenius theorem, which is fundamental to the study of nonnegative matrices in

general and Markov chains in particular. Several important Markov chain theorems are

specializations of it and the key developments in Sections 6 and 7 require its application.

All of the Appendix A and Appendix B results are provided without proof or elaboration,

but their foundation is obtainable from various references (e.g. [Cinl75] for the more ele-

mentary results in Appendix A, [Sene81] for the Appendix B material on the Perron-

Frobenius Theorem and [Sene81, IsMa76] for the Appendix A ergodicity related

definitions and theorems). These results are invoked freely in the following sections,

either by specific reference to definition/theorem number, or if the context makes it

appropriate, they are simply assumed.

Appendix C is provided as background for the Section 9.5 discussion on coefficient

identities and extending the stationary distribution representation development. With the

exception of Section C.4, the material presented in Appendix C is obtainable from

advanced algebra texts (e.g. [MoSt64]). The symmetric/alternating polynomial general-

ization in Section C.4 is original.

Appendix D collects the computer program listings for the programs employed in

generating the results reported in Section 5. The programs presented there were devel-

oped and executed on the Cray Y-MP operated by the Computer Science Directorate at

Eglin AFB, Fl.

SECTION 2

SIMULATED ANNEALING

2.1 Overview

As noted in the introduction, a very commonly employed approach to the solution

of nonconvex combinatorial optimization problems is a stochastic relaxation technique

introduced by Kirkpatrick et al. and referred to as simulated annealing [KiGe83]. The

technique is so named by virtue of its analogy to the annealing of solids, in which a crys-

talline solid is heated to its melting point and then allowed to cool very gradually until it

is again in the solid phase at some nominal temperature. In the limiting case of

infinitesimal cooling rate and absolute zero final temperature, the resulting solid achieves

its most regular possible crystal lattice configuration (i.e. minimum lattice energy state),

and hence is free of crystal defects. Simulated annealing establishes the connection

between this sort of thermodynamic behavior and the search for the global minimum of

an objective function in a combinatorial optimization problem, and further, it provides an

algorithmic means of exploiting the connection. This section is a review of the technique

with special emphasis on known results which bound the convergence behavior of com-

puter algorithms belonging to the class.

2.2 Statistical Mechanics and Annealing of Solids

The fundamental assumption of statistical physics is that the thermodynamic behav-

ior of a many particle system can be represented by a statistical ensemble, and that if the

system is in thermal equilibrium, the time averages of macroscopic thermodynamic

properties of the system are equal to the corresponding ensemble averages (ergodicity

hypothesis). The random variable represented by the ensemble is the system thermal

energy, and at thermal equilibrium the probability distribution is completely determined

by the system temperature. The distribution is known as the Boltzmann distribution, or

alternatively as the Gibbs distribution, and its form is

$$\Pr\{E = E(i)\} = \frac{\exp\{-E(i)/kT\}}{Z(T)} \qquad \text{Eq. 2.1}$$

where E = the system thermal energy (a random variable)

E(i) = the energy corresponding to state i

k = Boltzmann's constant

T = the system temperature

Z(T) = the partition function.

The factor

$$\exp\{-E(i)/kT\}$$

is called the Boltzmann factor. The partition function provides the necessary normalization

to make Eq. 2.1 a state occupancy probability. It can be expressed as

$$Z(T) = \sum_{j} \exp\{-E(j)/kT\} \qquad \text{Eq. 2.2}$$

At elevated temperatures, the system represented by the probability distribution in

Eq 2.1-2 occupies all states in its state space with nearly uniform probability, while at

low temperatures, states having low energy are favored. When the temperature

approaches absolute zero, only states corresponding to the minimum value of energy

have nonzero probability. Thus, the thermodynamic system's energy function can be

effectively searched for its minimum value by starting the system at an elevated tempera-

ture and allowing it to cool gradually to absolute zero, at which point one of its minimum

energy states is occupied with probability one. This is the mechanism which guides the

annealing of solids.
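
The temperature dependence described above can be seen numerically. The following is a minimal sketch that evaluates the Gibbs distribution for a small, made-up set of state energies; the energies, the values of kT and the function name `gibbs` are assumptions chosen only for illustration.

```python
import math

def gibbs(energies, kT):
    """State probabilities exp(-E(i)/kT) / Z(T) over a finite state space."""
    e_min = min(energies)
    # Subtracting e_min rescales numerator and Z(T) by the same factor,
    # so the probabilities are unchanged while overflow is avoided.
    factors = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(factors)
    return [f / z for f in factors]

energies = [0.0, 1.0, 1.0, 3.0]
for kT in (100.0, 1.0, 0.01):
    print(kT, [round(p, 3) for p in gibbs(energies, kT)])
```

At the largest kT the four probabilities are nearly equal, and at the smallest kT essentially all of the probability sits on the single minimum energy state.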

The cooling schedule employed in annealing solids is constrained by the require-

ment that the system be allowed to achieve thermal equilibrium at each temperature. The

Gibbs distribution only represents the system's energy distribution in the stationary case

(i.e. equilibrium). If this requirement is not satisfied, defects can be frozen into the crystal

lattice preventing the system from achieving the minimum possible energy state. This

behavior is analogous to local minima entrapment in combinatorial optimization search.

The restriction on the annealing schedule necessary to avoid it is the fundamental limita-

tion on the annealing technique.

2.3 Combinatorial Optimization by Simulated Annealing

Simulated annealing approaches combinatorial optimization problems in a closely

analogous fashion. In simulated annealing, the optimization problem's solution space cor-

responds to the state space of the analogous thermodynamic system and its cost function

is analogous to the thermodynamic system's energy surface. The analog of the

thermodynamic system's temperature is a nonnegative algorithm control parameter, T.

Two other algorithm components are also required. They are the stochastic next

state generation and acceptance mechanisms, and they incorporate the dependence of the

algorithm on the control parameter, T. The next state generation mechanism is employed

by the algorithm to transform a current solution into a new candidate solution, and the

acceptance mechanism is employed to decide whether to retain or discard the proposed

new solution. Together, these stochastic operators are responsible for making the search

algorithm simulate the thermodynamic system's statistical behavior. Consequently, they

must satisfy certain requirements to assure coherent algorithm behavior. These require-

ments are explored in some depth later in the context of algorithm convergence behavior.

Conceptually, the operation of the simulated annealing algorithm can be described

as follows. The algorithm starts at some initial value of the control parameter and with

some initial solution. Then, the state generation mechanism is employed to synthesize a

new candidate solution. The new solution is examined by the acceptance mechanism and

either accepted or rejected. If it is accepted, the new solution becomes the current solu-

tion. Otherwise, the old current solution is retained. This process is repeated, generating a

sequence of temporary solutions, until an approximate equilibrium is achieved in which

the solution space occupancy is described by the Gibbs distribution (Eq. 2.1-2). Once this

approximate equilibrium is achieved, the control parameter value is reduced and the solu-

tion sequence is extended until equilibrium is achieved at the new control parameter

value. This process is repeated until some termination condition (e.g. minimum control

parameter value) is attained. The current solution at termination is then accepted as the

solution to the optimization problem.
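
The conceptual description above can be sketched in code. This is an illustrative sketch only, not the implementation analyzed in this work: the toy cost function, the ring neighborhood, the geometric schedule and the fixed number of transitions per control parameter value (a stand-in for "until approximate equilibrium is achieved") are all assumptions, and the acceptance rule is the Metropolis form commonly used with simulated annealing.

```python
import math
import random

def simulated_annealing(cost, neighbors, initial, temperatures, steps_per_temp, rng):
    """Minimize `cost` by the generate/accept/cool loop described in the text."""
    current = initial
    for T in temperatures:                  # monotone nonincreasing control parameter
        for _ in range(steps_per_temp):     # fixed count stands in for equilibrium
            candidate = rng.choice(neighbors(current))   # next state generation
            delta = cost(candidate) - cost(current)
            # Acceptance: always keep improvements, keep a worse candidate
            # with probability exp(-delta / T).
            if delta <= 0 or rng.random() < math.exp(-delta / T):
                current = candidate
    return current

# Hypothetical toy problem: integers 0..31 on a ring, cost with several local minima.
def cost(x):
    return (x - 20) ** 2 / 10.0 + 3.0 * math.cos(x)

def neighbors(x):
    return [(x - 1) % 32, (x + 1) % 32]

rng = random.Random(0)
schedule = [10.0 * 0.9 ** n for n in range(60)]   # assumed geometric cooling
best = simulated_annealing(cost, neighbors, 0, schedule, 50, rng)
print(best, cost(best))
```

A geometric schedule is chosen here only for brevity; it is not one of the schedules for which the convergence results reviewed later apply.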

It is noted in passing that simulated annealing always involves minimizing a cost

functional, never maximizing a reward. However, this causes no loss of generality

because any combinatorial optimization problem can be translated into an equivalent

minimization problem.

2.4 Theoretical Foundations of Simulated Annealing

The evolution of the search sequence of a simulated annealing algorithm as out-

lined, in which each succeeding solution in the sequence is determined stochastically

based upon the current solution, suggests that the algorithm behavior can be described as

a Markov chain. Indeed it can, and all of the known convergence results for simulated

annealing algorithms are derived from analysis of Markov chain models [LaAa87,

GeGe84, LuMe86, MiRo85, Rior58]. This subsection establishes a Markov chain model

to represent the simulated annealing algorithm and then employs it in reviewing the

development of the published convergence bounds. This development essentially follows

[LaAa87].

2.4.1 A Markov Chain Model of Simulated Annealing

Let a combinatorial optimization problem be represented by the pair (S, C) where S

is the problem's solution space and C is its cost function, and assume without loss of gen-

erality that the optimization problem requires minimization of C. Also, assume that S is

finite. Then, a simulated annealing algorithm for solving this problem can be

characterized by the quadruple $(S, i_0, P_T, \mathcal{T})$ where S is as defined above and where $i_0 \in S$

is an initial candidate solution, $P_T$ is a stochastic matrix which describes a stochastic state

transition mechanism (the composition of the next state generation and acceptance

mechanisms discussed in Section 2.3) and $\mathcal{T} = \{T_k\}$ is a finite length monotone nonincreasing

sequence of positive control parameter values. The first parameter value in $\mathcal{T}$ is $T_0$ and the

final value is $T_f$. $P_T$ incorporates the algorithm dependence on both C and T.

The algorithm generates a sequence of candidate solutions, $\{i_k : 0 \le k \le f\}$, by

employing the state transformation mechanism (described by $P_T$) to transform solution $i_k$

into $i_{k+1}$. At the k-th transition, T is completely determined by $T_k$. The solution sequence is

extended until $T = T_f$, at which point the current solution, $i_f$, is accepted as the solution to

the combinatorial optimization problem. Thus $T_f$ signals algorithm termination. $\mathcal{T}$ can be

allowed to depend on $\{i_k\}$ provided due regard is paid to the requirement for termination.

Since the solution state transition mechanism is stochastic, and since the conditional

dependence of the solution sequence only extends to one transition, the solution sequence

is a Markov chain by Definition A1. Its state transition matrix is $P_T$ (Definition A4).

The state transition matrix is decomposed into two parts for convenience in the following.

It consists of the next state generation mechanism, $G_{ij}(T)$, which describes the

probability of generating state j given that the current state is i, and the state acceptance

mechanism, $A_{ij}(T)$, which describes the probability of accepting the generated state. Thus,

$P_T(i,j)$ is written as

$$P_T(i,j) = \begin{cases} G_{ij}(T)\,A_{ij}(T) & j \ne i \\ 1 - \sum_{l=1,\, l \ne i}^{N} G_{il}(T)\,A_{il}(T) & j = i \end{cases} \qquad \text{(Eq. 2.3)}$$

In this result, N = card(S) represents the cardinality of the solution space.

It is noted in passing that the usual form of the state acceptance mechanism is the so

called Metropolis criterion [Metr53], given by

$$A_{ij}(T) = \min\left\{1, \exp\left(-\frac{C(j) - C(i)}{kT}\right)\right\} \qquad \text{(Eq. 2.4)}$$

This is the form employed by Kirkpatrick et al. in the original work [KiGe83] and most

others published since are variations of it. Also, the usual form of the next state genera-

tion mechanism is

$$G_{ij}(T) = G_{ij} = G_{ji} = \begin{cases} 1/N_i & j \in S_i \\ 0 & \text{otherwise} \end{cases} \qquad \text{(Eq. 2.5)}$$

where $S_i \subset S$ is the set of states accessible from state i in one transition (by definition,

$i \notin S_i$), and where $N_i = \mathrm{card}(S_i)$. Note that $G_{ij}$ defined by Eq. 2.5 is symmetric and

independent of T.
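
As an illustration of how the generation and acceptance mechanisms compose into the state transition matrix, the following sketch builds the matrix of Eq. 2.3 for a small example. The four-state ring, its cost values and the function names are hypothetical, and Boltzmann's constant is absorbed into the control parameter `T`.

```python
import math

def metropolis_acceptance(cost, i, j, T):
    """Acceptance probability of the Metropolis form (Eq. 2.4), with k absorbed into T."""
    return min(1.0, math.exp(-(cost[j] - cost[i]) / T))

def transition_matrix(cost, neighborhoods, T):
    """Transition matrix of Eq. 2.3 with uniform generation over each neighborhood (Eq. 2.5)."""
    n = len(cost)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        g = 1.0 / len(neighborhoods[i])          # generation probability 1/N_i
        for j in neighborhoods[i]:
            P[i][j] = g * metropolis_acceptance(cost, i, j, T)
        # Diagonal entry: probability that the proposed move is rejected.
        P[i][i] = 1.0 - sum(P[i][j] for j in range(n) if j != i)
    return P

# Hypothetical 4-state ring: every state has two neighbors, so generation is symmetric.
cost = [3.0, 1.0, 2.0, 0.0]
neighborhoods = [[1, 3], [0, 2], [1, 3], [2, 0]]
P = transition_matrix(cost, neighborhoods, T=1.0)
for row in P:
    print([round(p, 3) for p in row])
```

Each row sums to one, as required of a stochastic matrix, and the diagonal entries collect the probability of remaining in the current state.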

2.4.2 Asymptotic Convergence Behavior

The subject of interest in the remainder of this section is a set of sufficient conditions

on $P_T$ and $\mathcal{T}$ to ensure that an optimal solution is achieved. These conditions will

prove to guarantee asymptotic convergence only (i.e. $\mathcal{T}$ must be an infinite sequence,

which of course violates the termination requirement of the algorithm). Two cases will be

examined. The first only involves time-homogeneous (stationary) Markov chains (Defini-

tion A5) and is presented due to its relative ease of analysis. Its purpose is to provide a

foundation for the essential ideas involved in the second case, which requires an appeal to

ergodicity theorems for inhomogeneous (nonstationary) Markov chains. The useable con-

vergence behavior results which are the goal of this effort derive from analysis of the sec-

ond case.

The first (simple) algorithm is represented as a sequence of solutions evolving as a

sequence of distinct Markov chains. Each Markov chain in the sequence executes at a

fixed control parameter value (and hence is time-homogeneous) and each succeeding

Markov chain executes at a lower (but strictly positive) parameter value. Thus, in the

sequence $\mathcal{T}$, each distinct parameter value, $T_l$, is associated with a distinct time-homogeneous

Markov chain and $T_l$ occurs at some large number of consecutive locations,

$K_l$, in $\mathcal{T}$. This case is hereafter referred to as the homogeneous (or stationary) algorithm.

The analysis of the convergence behavior of the homogeneous algorithm includes the

hypothesis that each Markov chain in the sequence achieves its stationary distribution.

This hypothesis is equivalent to $K_l \to \infty$ for all l (Definition A10, Theorem A3, Theorem

A4).

In the second case, the algorithm is represented as a sequence of solutions evolving

as a single inhomogeneous (nonstationary) Markov chain. This formulation is hereafter

referred to as the inhomogeneous (or nonstationary) algorithm. In the inhomogeneous

algorithm, the control parameter value is allowed to decrease (though not necessarily

required to) after each state transition. The dependence of $G_{ij}(T)$ and $A_{ij}(T)$ on T results in

the inhomogeneous behavior.

2.4.2.1 The Homogeneous Algorithm

In the homogeneous algorithm, the means of establishing the requirements for

asymptotically optimal convergence is to first establish sufficient conditions for existence

of the stationary distribution of each Markov chain and then to establish sufficient condi-

tions to ensure that the stationary distribution converges to a uniform distribution over the

set of optimal solutions as the control parameter value approaches zero. That is

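$$\lim_{T \to 0} q_T(i) = \begin{cases} 1/N_{opt} & i \in S_{opt} \\ 0 & \text{otherwise} \end{cases} \qquad \text{(Eq. 2.6)}$$

where $q_T$ is the stationary distribution of the Markov chain executing at control parameter

value T, $S_{opt} \subset S$ is the set of solutions $\{i \in S : C(i) = C_{opt}\}$ and $N_{opt} = \mathrm{card}(S_{opt})$.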

Theorems A1-A3 can be employed to deduce sufficient conditions on $P_T(i,j)$ (or

alternatively on $G_{ij}(T)$ and $A_{ij}(T)$) to ensure the existence of the stationary distribution of

each Markov chain in the sequence representing the homogeneous algorithm. Since only

combinatorial (finite solution space) optimization problems are under consideration and

since by definition the homogeneous algorithm only employs time-homogeneous Markov

chains, the finite state space and time-homogeneity requirements of Theorem A3 are

satisfied. Beyond these requirements, existence of the stationary distribution of each Markov

chain in the homogeneous algorithm only requires that the chain produced by $P_T$ be

irreducible and aperiodic (Definitions A7 and A9).

If $A_{ij}(T)$ is selected as the Metropolis criterion, Eq. 2.4, then

$$\forall i,j \in S,\ \forall T > 0 : A_{ij}(T) > 0.$$

Thus, from Eq. 2.3, the irreducibility requirement is transferred to the next state generation

mechanism, $G_{ij}(T)$. Note that from Theorem A1, irreducibility can readily be

achieved within the definition supplied by Eq. 2.5. Also, in [MiRo85], Theorem A2 is

used to show that a sufficient condition for aperiodicity is

$$\forall T > 0\ \ \exists\, i,j \in S \ \text{such that}\ A_{ij}(T) < 1.$$

This condition is satisfied by the Metropolis criterion provided the trivial case indicated

by

$$\forall i,j \in S : C(i) = C(j) = C_{opt}$$

is excluded, because then k,l always exist such that

$$C(l) = C_{opt} < C(k). \qquad \text{(Eq. 2.7)}$$

The sufficient condition on $A_{ij}(T)$ can then be met by selecting i = l and j = k. (Use Eq.

2.7 in Eq. 2.4).

Although existence of the stationary distribution (or at least sufficient conditions on

$G_{ij}(T)$ and $A_{ij}(T)$ to ensure its existence) are now established, and examples of $G_{ij}$ and $A_{ij}$

which meet these conditions provided, actually achieving the stationary distribution is

only guaranteed after an infinite number of state transitions. This is equivalent to the ther-

mal equilibrium constraint on the temperature schedule for annealing solids discussed in

Section 2.2. Each Markov chain in the sequence representing the homogeneous algorithm

is subject to this requirement, and consequently must be of infinite length.

Next, sufficient conditions to assure convergence of the stationary distribution of

the final Markov chain in the homogeneous algorithm to the desired optimal distribution

(Eq. 2.6) are established. First, note that if the stationary distribution of a Markov chain

in the sequence exists, then a function g(C(i),T) corresponding to that Markov chain

exists such that

$$\forall i \in S : q_T(i) = \frac{g(C(i), T)}{\sum_{j} g(C(j), T)} \qquad \text{(Eq. 2.8)}$$

where g satisfies

(1) $\forall i \in S,\ \forall T > 0 : g(C(i), T) > 0$

(2) $\forall j \in S : \sum_{i \ne j} g(C(i), T)\, G_{ij}(T)\, A_{ij}(T) = g(C(j), T) \sum_{i \ne j} G_{ji}(T)\, A_{ji}(T) \qquad \text{(Eq. 2.9)}$

This can be deduced by noting that the uniquely determining conditions on q expressed in

Theorem A3 are met by g satisfying Eq. 2.8 and 2.9. Eq. 2.9 is called the global balance

equation. Close examination reveals that it is exactly the necessary condition for equilib-

rium state occupancy. A more restrictive condition, in which the balance holds for every

pair of states on a pair-wise basis is called the detailed balance equation.

It can be shown that the following additional constraints on g guarantee conver-

gence of the stationary distribution to the optimal (i.e. to Eq. 2.6) [MiRo85]. Note that

Eq. 2.10(2) requires an exponential form.

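(1) $\lim_{T \to 0} g(\Delta, T) = \begin{cases} 0 & \Delta > 0 \\ \infty & \Delta < 0 \end{cases}$

(2) $\frac{g(\Delta_1, T)}{g(\Delta_2, T)} = g(\Delta_1 - \Delta_2, T) \qquad \text{(Eq. 2.10)}$

(3) $\forall T > 0 : g(0, T) = 1$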

Collectively, Eq. 2.8-2.10 provide a set of sufficient conditions on $G_{ij}(T)$ and $A_{ij}(T)$

to assure convergence of the stationary distribution to Eq. 2.6. The key condition, the

global balance equation, is implicit however, and thus is very difficult to apply. Nevertheless,

it can be shown [LaAa87] that if $G_{ij}(T)$ and $A_{ij}(T)$ defined by Eq. 2.4 and Eq. 2.5 are

employed, the conditions are satisfied, and that the corresponding stationary distribution

is provided by

$$\forall i \in S : q_T(i) = \frac{\exp\{-(C(i) - C_{opt})/kT\}}{\sum_{j} \exp\{-(C(j) - C_{opt})/kT\}} \qquad \text{(Eq. 2.11)}$$

The key to that development is that the $G_{ij}(T)$ and $A_{ij}(T)$ of Eq. 2.4 and 2.5 satisfy the

detailed balance equation, the symmetry of $G_{ij}$ being a critical consideration.
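
The stationarity and detailed balance claims can be verified numerically for a small case. The sketch below assumes a hypothetical four-state ring with equal neighborhood sizes (so that generation is symmetric), Metropolis acceptance, and Boltzmann's constant absorbed into `T`; all values and variable names are illustrative.

```python
import math

cost = [3.0, 1.0, 2.0, 0.0]                       # hypothetical costs C(i)
neighborhoods = [[1, 3], [0, 2], [1, 3], [2, 0]]  # ring: two neighbors for every state
T = 0.7                                           # control parameter (k absorbed)
n = len(cost)

# Transition matrix in the form of Eq. 2.3: uniform generation, Metropolis acceptance.
P = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in neighborhoods[i]:
        accept = min(1.0, math.exp(-(cost[j] - cost[i]) / T))
        P[i][j] = accept / len(neighborhoods[i])
    P[i][i] = 1.0 - sum(P[i][j] for j in range(n) if j != i)

# Candidate stationary distribution in the form of Eq. 2.11.
c_opt = min(cost)
w = [math.exp(-(c - c_opt) / T) for c in cost]
q = [x / sum(w) for x in w]

# Stationarity: q P = q.  Detailed balance: q_i P_ij = q_j P_ji for every pair.
qP = [sum(q[i] * P[i][j] for i in range(n)) for j in range(n)]
stationary_error = max(abs(qP[j] - q[j]) for j in range(n))
balance_error = max(abs(q[i] * P[i][j] - q[j] * P[j][i])
                    for i in range(n) for j in range(n))
print(stationary_error, balance_error)
```

Both residuals are zero to within floating point rounding. If the neighborhoods had unequal sizes, the generation probabilities would no longer be symmetric and the pairwise balance would in general fail for this choice of q.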

The behavior required by Eq. 2.10(1) is limiting behavior as $T \to 0$. Thus, these

conditions assure convergence to the global minimum with probability one (i.e. conver-

gence of the stationary distribution to Eq. 2.6), only if the sequence of Markov chains is

infinite and $\lim_{l \to \infty} T_l = 0$. Recalling that a guarantee of achieving the stationary distribution

requires that each Markov chain be of infinite length, the homogeneous algorithm is seen

to require a doubly infinite sequence of solutions composed of an infinite sequence of

infinitely long Markov chains.

2.4.2.2 The Inhomogeneous Algorithm

The behavior of the homogeneous algorithm, which requires that an infinite number

of transitions be executed at each control parameter value, clearly is not very useful. The

following reviews two published convergence results which extend the ideas developed

for the homogeneous algorithm to the inhomogeneous counterpart [GeGe84, MiRo85].

These results adopt the sufficient conditions on $G_{ij}(T)$ and $A_{ij}(T)$ developed for the homogeneous

algorithm as a starting point (i.e. irreducibility, aperiodicity and Eq. 2.8-2.10)

and extend them to the case in which each time-homogeneous Markov chain is finite

length (i.e. to the inhomogeneous algorithm). The key products of this effort are lower

bounds on the algorithm control parameter's approach to zero. In both cases discussed

here, the bound is of the form K/log(k) where k is the index of the Markov chain repre-

senting the inhomogeneous algorithm and K is independent of k. The following is a brief

sketch of the approach taken to arrive at these results. It is common to both.

Given that $G_{ij}(T)$ and $A_{ij}(T)$ are selected as in Eq. 2.4 and 2.5, each state transition

matrix in the inhomogeneous Markov chain of the inhomogeneous algorithm satisfies all

of the sufficient conditions for stationary distribution existence and asymptotic conver-

gence to optimality developed for the homogeneous algorithm (i.e. irreducibility, aper-

iodicity and Eq. 2.8-2.10). Further, the explicit form of the resulting stationary

distribution is given by Eq. 2.11. Thus, for each transition matrix, $P_{T_k}$, there exists an

eigenvector, $q_{T_k}$, having eigenvalue 1 and satisfying the probability vector conditions.

Further, $q_{T_k}$ converges to the limiting distribution of Eq. 2.6 as $T_k \to 0$. Consequently,

Theorem A7 can be used to establish strong ergodicity (and hence the desired convergence

behavior for $T_k \to 0$) provided (1) that weak ergodicity can be established and (2)

that the inequality appearing in Theorem A7 obtains.

Under the hypothesis that $G_{ij}(T)$ and $A_{ij}(T)$ are defined in accordance with Eq. 2.4

and 2.5, in which case the required eigenvector is explicitly provided by Eq. 2.11, and

that condition (1) (weak ergodicity) is satisfied, both [GeGe84] and [MiRo85] prove con-

dition (2) of the above. The development is straightforward but tedious. Of more interest

here is the means of establishing condition (1), because it leads to the annealing schedule

bound.

Both developments employ Theorem A6 to establish weak ergodicity. The general

approach is to use the definitions of $G_{ij}(T)$ and $A_{ij}(T)$, along with bounds on the extrema

of either the cost function [GeGe84] or the slope of the cost function [MiRo85] to define

bounds on the one step transition probabilities. The transition probability bound is then

employed to arrive at an upper bound on the $\tau$ coefficient of ergodicity of Theorem A5,

which is used in turn in Theorem A6 to deduce a sufficient condition to guarantee weak

ergodicity. The condition is in the form of a lower bound on the annealing schedule.

The first such result to be published is in [GeGe84]. The resulting bound is

$$T_k \ge \frac{N \times (C_{max} - C_{min})}{\log(k)}, \qquad k \ge 2 \qquad \text{(Eq. 2.12)}$$

where $C_{max}$ and $C_{min}$ are the maximum and minimum values respectively of C(i) for $i \in S$

and N = card(S). Thus, $C_{min}$ is the desired $C_{opt}$.

The annealing schedule bound established in [MiRo85] is more refined than that of

Eq. 2.12. It is given by

$$T_k \ge \frac{rL}{\log(k)}, \qquad k \ge 2 \qquad \text{(Eq. 2.13)}$$

where r is the radius of the graph defining the accessible state neighborhoods of the next

state generation mechanism (i.e. the $\{S_i\}$ where $S_i \subset S$ is defined in Eq. 2.5), and L is a

constant which bounds the local slope of the cost function. Specifically, r and L are given

by

$$r = \min_{i \in S - S_{max}} \max_{j \in S} d(i,j) \qquad \text{(Eq. 2.14)}$$

where d(i,j) is the distance of j from i, measured by the minimum number of state transitions

required to arrive at j starting at i, where $S_{max} \subset S$ is the set of local maxima of C

and

$$L = \max_{i \in S} \max_{j \in S_i} |C(j) - C(i)|. \qquad \text{(Eq. 2.15)}$$

Note that in the special case $S_i = S$ for all $i \in S$, then Eq. 2.14 and Eq. 2.15 reduce to r = 1

and $L = C_{max} - C_{min}$ respectively, and substitution into Eq. 2.13 yields

$$T_k \ge \frac{C_{max} - C_{min}}{\log(k)} \qquad \text{(Eq. 2.16)}$$

The Eq. 2.16 result is smaller than that of Eq. 2.12 by the factor 1/N.
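
To give a feel for how restrictive a K/log(k) schedule is, the following sketch evaluates the two bounds and computes how many transitions must elapse before the control parameter is permitted to fall to a given value. The problem size, the cost range and the function names are hypothetical; the formulas are the right-hand sides of Eq. 2.12 and Eq. 2.13.

```python
import math

def geman_bound(k, n_states, c_max, c_min):
    """Right-hand side of Eq. 2.12 at transition k >= 2."""
    return n_states * (c_max - c_min) / math.log(k)

def mitra_bound(k, r, L):
    """Right-hand side of Eq. 2.13 at transition k >= 2."""
    return r * L / math.log(k)

def iterations_to_reach(T_target, K):
    """Smallest k with K / log(k) <= T_target, i.e. k >= exp(K / T_target)."""
    return math.ceil(math.exp(K / T_target))

N, c_max, c_min = 16, 5.0, 0.0             # hypothetical state count and cost range
k = 1000
print(geman_bound(k, N, c_max, c_min))     # Eq. 2.12
print(mitra_bound(k, 1, c_max - c_min))    # Eq. 2.16: r = 1, L = C_max - C_min
print(iterations_to_reach(1.0, c_max - c_min))
print(iterations_to_reach(0.5, c_max - c_min))
```

Because k enters through a logarithm, halving the permitted control parameter value squares the required number of transitions (up to rounding), which is the practical meaning of the limitation discussed here.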

Both of these published convergence results, as well as several others which are

minor variations of them, are of the general form K/log(k). This behavior is the key lim-

itation of the algorithm class, and is believed to be a fundamental limitation imposed by

the neighborhood system inherent in the conventional simulated annealing state

generation mechanism [GeGe84] (i.e. the fact that at low control parameter values, the

likelihood of making the large state transition necessary to escape a local extremum is

radically diminished). The simulated annealing literature includes some amount of specu-

lation concerning state generation mechanisms which permit occasional large transitions

even at low control parameter values.

SECTION 3

THE GENETIC ALGORITHM

3.1 Overview

The genetic algorithm is an iterative improvement stochastic search method appro-

priate for application to combinatorial optimization problems and based on the evolution

of biological systems. It implements the fundamental idea of survival of the fittest on a

population of string structures which are coded representations of solution candidates

selected from the solution space of the optimization problem. The population of candi-

date solutions (which collectively represent the current estimate of the optimum solution)

is subjected to a set of stochastic genetic operators which transform a current population

into a new (descendent) population. A variety of distinct genetic operators (based on bio-

logical analogs) are available and are reported in the literature [Davi87, Gold89a, Gref85,

Gref87]. The most important of them are (1) proportional reproduction, (2) crossover and

(3) mutation. A one, two or three operator genetic algorithm employing combinations of

these operators with fixed population size is referred to herein as a simple genetic algo-

rithm.

The genetic operators are all implemented stochastically, but they do not result in a

simple random walk through the search space. They represent a highly structured search

which exploits the historical record of performance reflected at each stage of the search

by the current population. It is the novel use of this historical record which is central to

the appeal of the genetic algorithm.

Genetic algorithms usually operate on populations of bit-strings (i.e. the optimiza-

tion problem is usually coded such that its search space is defined over a binary string

alphabet), and they always attempt to maximize some strictly nonnegative objective

function. The evolution of the fixed size population of candidate solutions toward domi-

nation by optimal solutions is the algorithm goal.

The three genetic operators of a simple genetic algorithm are discussed in the next

subsection. An analysis of their behavior requires introduction to the concept of sche-

mata, or similarity templates, and that task is undertaken in a subsequent subsection. This

section concludes with an assessment of the theoretical foundation available for the

analysis of genetic algorithms.

3.2 The Simple Genetic Algorithm Operators

As noted above, the simple genetic algorithm employs three biologically inspired

operators to transform each population of candidate solutions into a new (descendent)

population. The following subsections examine each of these operators and how they

influence the search evolution.

3.2.1 Reproduction

The genetic algorithm reproduction operator is the algorithmic analog of asexual

reproduction. It is the means by which the objective function influences the evolution of

the genetic algorithm search. It is implemented by evaluating each member of the current

generation against the objective function and using the results to measure relative repro-

ductive fitness (i.e. to provide a selection probability measure). Then, members of the

current population are selected in accordance with this fitness measure to be members of

the succeeding generation. This process is repeated (with statistically independent selec-

tion trials) until the entire new generation is populated.
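
A minimal sketch of proportional reproduction follows. It assumes bit-strings stored as Python strings and a hypothetical objective (`ones_count`, the number of 1 bits); the function names are illustrative, and the selection relies on the standard library's weighted sampling with replacement.

```python
import random

def proportional_reproduction(population, fitness, rng):
    """Fill a new generation by independent trials, each selecting a member
    with probability proportional to its objective function value."""
    scores = [fitness(s) for s in population]   # assumed nonnegative, not all zero
    return rng.choices(population, weights=scores, k=len(population))

def ones_count(bits):
    """Hypothetical objective: number of 1 bits in the string."""
    return bits.count("1")

rng = random.Random(1)
population = ["11110", "00001", "10101", "00000"]
next_generation = proportional_reproduction(population, ones_count, rng)
print(next_generation)
```

Every member of the new generation is a copy of some member of the current one, and a member with zero objective value (here "00000") is never selected.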

In the absence of the other genetic operators, the reproduction operator tends to

force the population to converge to the higher performing members of the current popula-

tion. It eventually produces a uniform population. At any stage of the search (generation),

only solutions which are represented by members of the current population can appear in

any succeeding generation. In particular, no solution absent from the initial population is

ever attainable. The reproduction operator exerts a strictly converging influence on the

search evolution. The other operators of the simple genetic algorithm circumvent this

limitation in a controlled manner.

3.2.2 Crossover

The crossover operator in a genetic algorithm is the algorithmic analog of sexual

reproduction. It produces the succeeding generation not by simply replicating the fittest

members of the current generation but by mating the fittest members of the current gener-

ation to produce progeny with some of the "genetic" character of each parent. It is

implemented by randomly exchanging parts of the strings representing the parents to

produce descendent strings.

The crossover operator is implemented (with some given probability, $p_c$) after the

reproduction operator has been invoked to select two reproducing parents. A string location

is randomly selected (usually with uniform selection probability) and the parent bit-strings

on each side of the randomly selected location are exchanged to produce two

progeny, which are then inserted into the succeeding population. This operation is

repeated until the new generation is completely populated.
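
The exchange just described can be sketched as single-point crossover on two equal-length bit-strings. The function name and the choice to return the parents unchanged when crossover is not invoked are assumptions made for illustration.

```python
import random

def single_point_crossover(parent_a, parent_b, p_c, rng):
    """With probability p_c, exchange the parts of two bit-strings that lie
    beyond a uniformly selected crossover location."""
    if rng.random() >= p_c:
        return parent_a, parent_b                 # crossover not invoked
    site = rng.randint(1, len(parent_a) - 1)      # cut falls between two bits
    child_a = parent_a[:site] + parent_b[site:]
    child_b = parent_b[:site] + parent_a[site:]
    return child_a, child_b

rng = random.Random(2)
child_a, child_b = single_point_crossover("11111", "00000", p_c=1.0, rng=rng)
print(child_a, child_b)
```

Applied to two identical parents the function returns two copies of that parent, which is why crossover alone cannot restore diversity to a uniform population.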

The crossover operator permits strings not represented in the current population to

be generated in the succeeding population. That is, certain points in the solution space

which are not represented in the current generation can be present in the successor gener-

ation. But the crossover operator is applied preferentially to high performance members

of the current population, so it constitutes a judicious, informed tendency toward

population divergence. This is the novel feature contributed by the crossover operator.

Even with the addition of crossover, the genetic algorithm search will eventually

converge to a uniform population. In general the crossover operator causes a greater por-

tion of the search space to be explored prior to convergence to uniformity, but for a given

initial population, there are still unreachable points in the solution space. Further, even if

a high performance solution is accessible from the initial population, some portion of the

"gene pool" necessary to reach it can be irrevocably lost during the search evolution.

3.2.3 Mutation

The mutation operator is applied to each member of the successor generation

created by the reproduction and crossover operators. It simply consists of randomly per-

turbing each descendent string with some (usually very small) perturbation probability,

$p_m$. The operator exerts a diverging influence on the search algorithm, and it provides a

means by which the search can, with some nonzero probability, always arrive at any point

in the solution space. That is, no part of the "gene pool" is ever permanently extinguished

if the mutation operator is implemented. Clearly, it is analogous to mutation in biological

reproduction. Note also that if $p_m > 0$, the mutation operator precludes the algorithm from

ever producing a permanently uniform population (i.e. it precludes algorithm conver-

gence).
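
A minimal sketch of the mutation operator follows. It assumes the common bitwise form, in which each bit of a descendent string is flipped independently with the mutation probability; the text above does not spell out the perturbation at this level of detail, so that reading and the function name are assumptions.

```python
import random

def mutate(bits, p_m, rng):
    """Flip each bit of the string independently with probability p_m."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[b] if rng.random() < p_m else b for b in bits)

rng = random.Random(3)
print(mutate("0000000000", 0.0, rng))   # p_m = 0: the string is never altered
print(mutate("0000000000", 1.0, rng))   # p_m = 1: every bit is flipped
print(mutate("0000000000", 0.1, rng))   # a small probability alters few bits on average
```

For any mutation probability strictly between zero and one, every string of the same length has nonzero probability of being produced from any other, which is the reachability property noted above.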

3.3 Building Blocks, Schemata and the Fundamental Theorem

The underlying premise of the genetic algorithm operators is that good solutions to

an optimization problem over a bit-string solution space are composed of locally good

substrings, and that assembling combinations of such locally good substrings is an effec-

tive way to search the space for globally good solutions. In the genetic algorithm litera-

ture, this is referred to as the building block hypothesis. For a problem to be amenable to

genetic algorithm solution, this hypothesis should apply. In the genetics parlance, this

hypothesis is stated as a requirement that the problem exhibit "...some but not too much

epistasis" [Davi87]. The next subsection introduces an idea which helps to place this

hypothesis on a more analytical basis, but the results are still incomplete.

3.3.1 Schema Defined

Let the solution space under consideration be the set of binary strings of length L,

(i.e. $S = \{0, 1\}^L$). Then, a schema (plural schemata), designated H, is a subset of S having

the property that every member of H matches at some specified set of defining bit loca-

tions. Thus, if L= 5, then the schema H might be the set of length 5 bit-strings which

match the string (1,0,1,0,0) at the bit locations indicated by H = {s : s = (1,*,*,0,*)}, in

which the asterisks indicate "don't care" bits. The bit locations at which the schema is

specified are the defining locations of the schema. The order of the schema, designated

by o(H), is the number of its defining locations and can range from 0 to L. In this example,

o(H) = 2. The defining length of the schema, designated $\delta(H)$, is the number of bit

positions subtended by its outermost defining bit locations minus 1. In this example,

$\delta(H) = 5 - 2 = 3$.
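
The schema quantities defined above are straightforward to compute. The sketch below represents a schema as a string over the characters 0, 1 and *; that representation and the function names are illustrative assumptions.

```python
def order(schema):
    """o(H): the number of defining (non-*) locations."""
    return sum(1 for c in schema if c != "*")

def defining_length(schema):
    """delta(H): positions subtended by the outermost defining locations, minus 1."""
    fixed = [i for i, c in enumerate(schema) if c != "*"]
    return fixed[-1] - fixed[0] if fixed else 0

def matches(schema, bits):
    """True if the bit-string is a realization of the schema."""
    return all(c == "*" or c == b for c, b in zip(schema, bits))

H = "1**0*"
print(order(H), defining_length(H), matches(H, "10100"))   # prints: 2 3 True
```

For the example schema the order is 2 and the defining length is 3, and the string (1,0,1,0,0) is one of its realizations.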

For a bit-string space of length L, there are exactly $3^L$ distinct schemata. This can be

readily determined by noting that the distinct schemata are selected from $\{0, 1, *\}^L$. A

given string selected from the space represents exactly $2^L$ distinct schemata. This results

from the fact that the string is defined at all L bit positions, and hence is selected from

$\{0, 1\}^L$; each schema it represents either retains the string's bit or substitutes * at each

of the L positions. The schemata of an optimization problem's search space are the building blocks

from which good solutions are to be constructed.
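
Both counts can be confirmed by brute-force enumeration for a small string length. The sketch below is illustrative only; the function names are hypothetical and the enumeration is practical only for short strings.

```python
from itertools import product

def all_schemata(L):
    """Every schema of length L over the alphabet {0, 1, *}."""
    return ["".join(t) for t in product("01*", repeat=L)]

def schemata_of(bits):
    """Every schema the string represents: keep each bit or replace it with *."""
    return ["".join(t) for t in product(*[(b, "*") for b in bits])]

L = 4
print(len(all_schemata(L)), 3 ** L)        # both counts equal 81
print(len(schemata_of("1011")), 2 ** L)    # both counts equal 16
```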

3.3.2 Schema Processing and the Fundamental Theorem

Let the constant population size of a simple genetic algorithm be designated M.

Then, each generation produced by the algorithm represents some number, N, of distinct

schemata that is bounded as follows

$$2^L \le N \le M \times 2^L$$
The lower bound obtains when all M members are identical, and the upper bound repre-

sents a limit on schema diversity supported by the specified population size.
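
The number of distinct schemata represented by a population can be counted directly for small cases. In the sketch below the populations and function names are hypothetical; since each member represents $2^L$ schemata, a population of M members can represent at most $M \times 2^L$ of them, and exactly $2^L$ when all members are identical.

```python
from itertools import product

def schemata_of(bits):
    """Set of schemata represented by one bit-string."""
    return {"".join(t) for t in product(*[(b, "*") for b in bits])}

def distinct_schemata(population):
    """N: the number of distinct schemata represented by at least one member."""
    found = set()
    for member in population:
        found |= schemata_of(member)
    return len(found)

L = 3
uniform = ["101", "101", "101", "101"]     # all M = 4 members identical
diverse = ["000", "111", "010", "101"]     # four distinct members
print(distinct_schemata(uniform), 2 ** L)  # lower bound attained: 8 and 8
print(distinct_schemata(diverse), len(diverse) * 2 ** L)
```

The diverse population represents more schemata than the uniform one but fewer than the product of M and $2^L$, because different members share schemata such as the all-asterisk schema.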

Now, briefly recalling the mechanisms implemented by the three simple genetic

operators, it is possible to begin understanding their influence on the search evolution. In

particular, the reproduction operator tends to reduce, never increase, the number of dis-

tinct schemata present in succeeding generations by selectively reproducing strings which

are realizations of above average fitness schemata to the exclusion of below average

competitors at the same set of defining locations.

The crossover operator, on the other hand, tends to produce new schemata by

assembling high performance low order schemata in new combinations at the expense of

disrupting high order high performance schemata. The extent of population divergence

introduced by the crossover operator is determined in part by the degree of schema diver-

sity present in the current population. In particular, when the population becomes uni-

form, the crossover operator is nullified, because assembling substrings extracted from

identical strings produces identical progeny.

The mutation operator also provides a disruptive mechanism which resists the con-

verging influence of the reproduction operator. Since any schema can be produced by

mutation with nonzero probability, the permanent extinction of any of the $3^L$ possible

distinct schemata is precluded.

These ideas are captured in the following inequality, which is referred to in the liter-

ature as the Fundamental Theorem of Genetic Algorithms. It relates the number of copies

of a particular schema in the current generation to the expected number of copies of the

same schema in the succeeding generation. This inequality is derived in [Gold89a] from

relatively simple probability notions. The development is not repeated here.

$$E\{m(H,k+1)\} \ge m(H,k) \times \frac{R(H)}{\bar{R}} \times \left[1 - p_c \times \frac{\delta(H)}{(L-1)} - p_m \times o(H)\right] \qquad \text{(Eq. 3.2)}$$

where m(H, k) = number of occurrences of schema H in the population at

generation k,

E{} = expected value operator,

R(H) = average objective function value (> 0) of all strings in

the current population which are realizations of H,

$\bar{R}$ = the average objective function value of the current population.

Equation 3.2 is an inequality because it does not consider the accretion of the

schema H contributed by crossover and mutation. It only accounts for the disruptive

effects of these operators. A more thorough treatment can be found on pp 9-13 of

[Gref87], but the result is too cumbersome to be of much analytical value.
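
A short numerical illustration of the right-hand side of Eq. 3.2 may help fix ideas. All numbers below are hypothetical, chosen only to show how the reproduction ratio and the two disruption terms combine; the function and parameter names are likewise illustrative.

```python
def schema_growth_bound(m, r_h, r_avg, p_c, p_m, delta, order, L):
    """Right-hand side of Eq. 3.2: a lower bound on the expected number of
    copies of schema H in the succeeding generation."""
    survival = 1.0 - p_c * delta / (L - 1) - p_m * order   # disruption by crossover and mutation
    return m * (r_h / r_avg) * survival

# Hypothetical case: schema of order 2 and defining length 3 in strings of length 5,
# with 10 copies present and schema fitness 20 percent above the population average.
bound = schema_growth_bound(m=10, r_h=1.2, r_avg=1.0, p_c=0.6, p_m=0.01,
                            delta=3, order=2, L=5)
print(bound)
```

Here the survival factor is 1 - 0.45 - 0.02 = 0.53, so the bound is about 6.36 copies: despite above average fitness, the expected count is not guaranteed to grow because the defining length is large relative to the string length.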

Qualitatively, Eq. 3.2 suggests that low order schemata occurring in the current

population contribute to succeeding generations in direct proportion to the product of

their number in the current generation and their average performance relative to the other

schemata competing for dominance of the same set of defining locations. Crossover and

mutation tend to disrupt this converging influence, and the disruptive effect of crossover

is directly proportional to the defining length of the schema in question.

In view of Eq. 3.2, the building block hypothesis might be restated as a characteris-

tic of genetic algorithm amenable optimization problems. A GA amenable problem is one

for which a near optimum solution can be achieved, with a relatively small expenditure of

search effort, by assembling high performance, low order schemata into novel combina-

tions. If the objective function is such that (nonlinear) contributions from combinations of

bits spanning widely separate bit locations are appreciable (i.e. if the objective function

depends heavily on large defining length schemata), then the problem is not likely to be

suitable for solution by genetic algorithm. On the other hand, if the objective function

depends predominantly on short defining length schemata, then sorting through promis-

ing combinations of realizations of those schemata is likely to isolate good (though not

necessarily optimal) solutions. Accomplishing the required sorting efficiently is the task

for which genetic algorithms are well suited.

3.4 An Assessment of the Genetic Algorithm Theoretical Foundation

The existing theoretical foundation for analysis of genetic algorithms includes the

fundamental theorem of genetic algorithms (Eq. 3.2) originally enunciated by Holland

[Holl75] and extended by Bridges and Goldberg [BrGo87], the Walsh function approach

to computing schema fitness averages contributed by Bethke [Beth80] and

generalizations of it [Gold88, Gold89b, BrGo89], a result concerning selection of the

optimal population size for the algorithm [Gold85] in terms of the solution space dimen-

sion and the examination of the properties which make a problem difficult for genetic

algorithms (the so called minimal deceptive problem) [Gold87, Gold89b]. Also, both De

Jong [Dejo75] and Goldberg/Segrest [GoSe87] employ Markov chain methodology

accompanied by approximate numerical analysis to examine certain specific problems

concerning finite length chain behavior (e.g. genetic drift in a binary allele genetic algo-

rithm).

No complete theoretical model exists for describing the operation of the simple

genetic algorithm executing on a specified optimization problem. The central theme of

the work underlying this paper is an attempt to develop such a model based upon the

asymptotic behavior of a Markov chain which represents the algorithm.

SECTION 4

A MARKOV CHAIN MODEL OF THE SIMPLE GENETIC ALGORITHM

4.1 Overview

From the discussion of the simple genetic algorithm operators in Section 3.2, it is

clear that the sequence of populations generated by the algorithm when executing on a

specified combinatorial optimization problem is a stochastic process (with finite state

space), and further that the conditional dependence of each population in the sequence on

its predecessors is completely described by its dependence upon the immediate predecessor population. Thus, the sequence is a Markov chain (Definition A1). In this section, a

nonstationary Markov chain model of the simple genetic algorithm is developed for one,

two and three-operator variants of the algorithm. The model is tailored to resemble that

offered in Section 2.4.1 for simulated annealing. The one-operator genetic algorithm

model implements proportional reproduction only, while the two-operator variant

employs reproduction in combination with mutation. The three-operator algorithm imple-

ments reproduction, mutation and crossover. This model hierarchy is employed because it

provides some degree of insight into the effect that each operator has on the nature of the

state space of the resulting Markov chain.

Describing and analyzing the operation of the simple genetic algorithm is facilitated

by assuming that the underlying optimization problem is defined over a bit-string solution

space. This assumption is not essential and sacrifices very little generality. It is implem-

ented throughout the following sections.

4.2 The Markov Chain Model

Let a combinatorial optimization problem be characterized by the pair $(S,R)$ where $S = \{0,1\}^L$ and $R$ is a strictly positive real valued reward function, and assume, with no loss of generality, that the problem requires maximization of $R$. Also, let a simple genetic algorithm designed to execute on this problem have fixed population size $M$, let $i \in S$ be interpreted as an unsigned integer ($0 \le i \le 2^L - 1$), and let a generation be represented by $m = (m(0), m(1), \ldots, m(2^L - 1))$ where $m(i)$ = the number of occurrences of solution $i \in S$ in the population. Thus, in the parlance of combinatorial mathematics, $m$ is a distribution of $M$ nondistinct objects over $N = \mathrm{card}(S) = 2^L$ bins [Hall67, Rior58], and the set of all such distributions, $S' = \{m\}$, is a suitable representation of the simple genetic algorithm search space. The cardinality of $S'$ is given by

$$N' = \mathrm{card}(S') = \binom{M + 2^L - 1}{M} = \binom{M + N - 1}{M}. \qquad \text{Eq. 4.1}$$

Since both N and M are finite, so is N'.
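As a quick check, reading Eq. 4.1 as the binomial coefficient $\binom{M+N-1}{M}$, consider two-bit strings with a population of two ($L = 2$, $M = 2$, so $N = 4$):

$$N' = \binom{M + N - 1}{M} = \binom{5}{2} = 10,$$

which agrees with the ten states listed in Table 5-2. For $M = 3$, $L = 2$ the same reading gives $\binom{6}{3} = 20$, the number of states in Table 5-3.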

Then, if $m_0 \in S'$ is selected as an initial population, the simple genetic algorithm can be represented by the quadruple $(S', m_0, P_Q, F)$ where $P_Q$ is a state transition matrix (analogous to $P_T$ of the simulated annealing model) and $F = \{Q_k\}$ is a finite length sequence of parameter vectors $Q_k = (p_m(k), p_c(k))$. The algorithm parameters $p_m(k)$ and $p_c(k)$ are respectively the mutation and crossover probabilities. In the following sections, the mutation probability sequence is employed in a role analogous to absolute temperature in simulated annealing, and consideration is limited hereafter to monotone nonincreasing sequences. In general, the only limitation on the crossover probability sequence is that its values are probabilities. However, in all of the following, consideration is limited to constant crossover probability sequences.

The first parameter vector in $F$ is $Q_0$ and the final parameter vector is $Q_n$. The solution evolves as a sequence $\{m_k\}$ of states $m_k \in S'$ in which the conditional dependence of $m_{k+1}$ on the sequence history is equivalent to its conditional dependence on $m_k$, and thus the solution sequence is a Markov chain. In general, the chain is inhomogeneous (Definition A5). In Section 4.3 it is shown to be time-homogeneous if the parameter vectors are constant. As with the simulated annealing algorithm model, exhausting the sequence of control parameter values, $F$, signals algorithm termination, and $n$ can be allowed to depend on $\{m_k\}$ provided the algorithm termination requirement is satisfied.

4.3 State Behavior of the Simple Genetic Algorithm

In each of the next three subsections, the state transition mechanism (and its effect

on the nature of the solution sequence) which results from employing a specified combi-

nation of the genetic algorithm operators to the Markov chain model is examined. The

first case consists of a one-operator algorithm which employs only the reproduction

operator. The second is a two-operator algorithm which employs reproduction and muta-

tion. Finally a three-operator algorithm which includes crossover with reproduction and

mutation is examined.

Although it is most natural to describe the genetic operators in the order reproduc-

tion/crossover/mutation, the course adopted in Section 3.2, the following development

proceeds most instructively if mutation is included with reproduction in the two-operator

algorithm and crossover is deferred to the three-operator case. This is due to the fact that

the mutation operator provides the essential state space modification required to make the

Markov chains of the time-homogeneous two and three-operator algorithms irreducible

(Definitions A7 and A8, Theorem A1), and consequently causes them to have unique stationary distributions (Theorem A3). The one-operator algorithm (proportional reproduction only) does not satisfy the irreducibility requirement for existence of a unique

stationary distribution. (Neither does the algorithm variant which employs reproduction

and crossover without mutation). A unique stationary distribution means that the asymp-

totic state occupancy probability of the time-homogeneous two and three-operator algo-

rithms is completely determined by the algorithm parameters and objective function. It is

independent of the starting state (initial population). Asymptotic independence of the

starting state is a necessary (but not sufficient) condition on the zero mutation probability

limit of the stationary distribution of the time-homogeneous algorithm for the inhomogeneous algorithm counterpart to avoid (asymptotically) local minima entrapment.

4.3.1 A One-Operator Algorithm (Reproduction)

In this subsection, the nature of the state transition matrix is examined for the case of no crossover or mutation (i.e. $Q_k = (0,0)$ for $0 \le k \le n$). In this case, the conditional probability of selecting a solution $i \in S$ from a population described by the state vector $n \in S'$ is (i.e. proportional reproduction)

$$\forall i \in S, \forall n \in S' : P_1(i \mid n) = \frac{n(i) \times R(i)}{\sum_{j \in S} n(j) \times R(j)} \qquad \text{Eq. 4.2}$$

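where the subscript 1 indicates that the one-operator case is under consideration. Thus, the conditional probability of the successor generation described by $m$ given that the present generation is described by $n$ is a multinomial distribution, i.e.

$$\begin{aligned} \forall m, n \in S' : P_1(m \mid n) &= \frac{M!}{\prod_{i \in S} m(i)!} \times \prod_{i \in S} P_1(i \mid n)^{m(i)} \\ &= \binom{M}{m} \times \prod_{i \in S} P_1(i \mid n)^{m(i)} \\ &= \binom{M}{m} \times \prod_{i \in S} \left( \frac{n(i) \times R(i)}{\sum_{j \in S} n(j) \times R(j)} \right)^{m(i)} \end{aligned} \qquad \text{Eq. 4.3}$$

where again the subscript 1 distinguishes the one-operator case, where the symbol

$$\binom{M}{m} = \frac{M!}{\prod_{i \in S} m(i)!} \qquad \text{Eq. 4.4}$$

designates the indicated multinomial coefficient and where by definition

$$n(i) = 0 \implies \left( \frac{n(i) \times R(i)}{\sum_{j \in S} n(j) \times R(j)} \right)^{m(i)} = \begin{cases} 1 & m(i) = 0 \\ 0 & m(i) > 0. \end{cases} \qquad \text{Eq. 4.5}$$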

The transition probability matrix of the Markov chain representing the one-operator algorithm is composed of the array of conditional probabilities defined by Eq. 4.3, i.e.

$$P = [P_1(m \mid n)]. \qquad \text{Eq. 4.6}$$

Since it is independent of the sequence index (i.e. the parameter vectors are constant), the

one-operator Markov chain is time-homogeneous (Definition A5).
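The one-operator probabilities of Eq. 4.2 through 4.4 can be evaluated directly for small problems. The sketch below is illustrative only; the function names are hypothetical, populations are passed as occupancy tuples, and the reward values are passed as a list indexed by solution.

```python
from math import factorial, prod

def p1_select(n, R):
    """Eq. 4.2: probability of selecting each solution i from population n."""
    total = sum(n_j * r_j for n_j, r_j in zip(n, R))
    return [n_i * r_i / total for n_i, r_i in zip(n, R)]

def multinomial_coeff(m):
    """Eq. 4.4: M! divided by the product of the m(i)!."""
    return factorial(sum(m)) // prod(factorial(m_i) for m_i in m)

def p1_transition(m, n, R):
    """Eq. 4.3: probability that population n is followed by population m."""
    p = p1_select(n, R)
    # Python evaluates 0.0 ** 0 as 1.0, which matches the convention of Eq. 4.5.
    return multinomial_coeff(m) * prod(p_i ** m_i for p_i, m_i in zip(p, m))

# One-bit strings (N = 2), population size M = 2, hypothetical rewards.
R = [1.0, 3.0]
mixed_to_mixed = p1_transition((1, 1), (1, 1), R)
uniform_stays = p1_transition((2, 0), (2, 0), R)
```

A uniform population reproduces itself with probability one in this sketch, which is the absorbing behavior examined next.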

The set of states which represent uniform populations (i.e. the states $m_A \in S_A' \subset S'$ in which one component is $M$ and all others are zero) are absorbing states of the Markov chain, because for any such state, $P_1(m_A \mid m_A) = 1$ and Definition A6 applies. Since it follows from Eq. 4.2-3 that $\forall n \in S' - S_A' : P_1(n \mid n) < 1$, there are exactly $N = 2^L$ absorbing states. The corresponding rows of $P$ are given by

$$\forall n_A \in S_A' : P_1(m \mid n_A) = \begin{cases} 1 & m = n_A \\ 0 & m \in S' - \{n_A\}. \end{cases} \qquad \text{Eq. 4.7}$$

Thus, for each state $n_A \in S_A'$, the associated row of the state transition matrix (Eq. 4.6) contains 1 in the principal diagonal location and 0 elsewhere. It follows that the $N' \times 1$ probability vector $q_{n_A}$ (Definition A2) whose $n_A \in S_A'$ component is 1 is a stationary distribution (Definition A10) of the one-operator Markov chain. It is not unique because any of the $N = 2^L$ such vectors satisfies the requirement, as does any vector of the form

$$q = \sum_{n_A \in S_A'} w_{n_A} q_{n_A} \quad \text{where } w_{n_A} \ge 0 \text{ and } \sum_{n_A \in S_A'} w_{n_A} = 1.$$

The absorbing states preclude irreducibility (Theorem A1), so the Markov chain does not satisfy the requirements of Theorem A3. The chain is aperiodic (Definition A9) however, because $\forall m \in S' : P_1(m \mid m) > 0$ so the period of all states is 1. Thus, all of the conditions of Theorem A3 except irreducibility are met by the one-operator Markov chain.

The expected number of transitions required to arrive in an absorbing state, $E\{k_A\}$, is finite. An upper bound on $E\{k_A\}$ is given by

$$E\{k_A\} \le \left[ \frac{M \times R_{max}}{R_{min}} \right]^{2M} < \infty \qquad \text{Eq. 4.8}$$

where $R_{min}$ and $R_{max}$ are the extreme values of $R$. (Recall that $R$ is assumed strictly positive, so $R_{max} \ge R_{min} > 0$). Eq. 4.8 can be derived by defining $p_A(k)$ as the conditional probability of arriving in the set of absorbing states, $S_A'$, on the $k$th transition given that the $k$th state is not absorbing, letting $p_{min}$ be a lower bound on $p_A(k)$, and bounding the series for $E\{k_A\}$ as follows

$$\begin{aligned} E\{k_A\} &= \sum_{k} k \times p_A(k) \prod_{l=1}^{k-1} (1 - p_A(l)) \\ &\le \sum_{k} k \times \prod_{l=1}^{k-1} (1 - p_A(l)) \\ &\le \sum_{k} k \times (1 - p_{min})^{k-1} \\ &= 1/(p_{min})^2. \end{aligned} \qquad \text{Eq. 4.9}$$

Next, note that a suitable bounding value on $p_{min}$ is

$$p_{min} = \left[ \frac{R_{min}}{M \times R_{max}} \right]^{M} > 0. \qquad \text{Eq. 4.10}$$

The desired bound on $E\{k_A\}$ (Eq. 4.8) is then obtained by using Eq. 4.10 in Eq. 4.9.

It is noteworthy that the above absorbing state convergence result does not require

any assumption on the range $R_{max} - R_{min}$. Even when the objective function exerts zero selective pressure (i.e. $\forall i \in S : R(i) = R_{min} = R_{max}$), the finite population size still results in

convergence to an absorbing state. In the genetics parlance, this tendency is referred to as

genetic drift. It is responsible for the inevitable convergence of the one-operator simple

genetic algorithm, as discussed in Section 3.2.1.
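Genetic drift under the one-operator algorithm can be observed with a short simulation. This is only a sketch: the population is held as a list of solution indices rather than as an occupancy vector, the function name and the starting population are hypothetical, and a flat reward is used so that selection pressure plays no part.

```python
import random

def steps_to_absorption(population, reward, rng, max_steps=100_000):
    """Apply proportional reproduction until the population is uniform."""
    pop = list(population)
    for step in range(max_steps):
        if len(set(pop)) == 1:  # uniform population, i.e. an absorbing state
            return step, pop[0]
        # Sampling members with weight R(i) selects solution i with
        # probability n(i) R(i) / sum_j n(j) R(j), as in Eq. 4.2.
        weights = [reward(i) for i in pop]
        pop = rng.choices(pop, weights=weights, k=len(pop))
    raise RuntimeError("no absorption within max_steps")

rng = random.Random(0)
# Flat reward: zero selective pressure, so absorption is due to drift alone.
steps, survivor = steps_to_absorption([0, 1, 2, 3, 0, 1], lambda i: 1.0, rng)
```

Which solution survives depends on the random draws and the starting population, not on the reward, which is the sense in which the one-operator stationary distribution is not unique.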

4.3.2 A Two-Operator Algorithm (Reproduction and Mutation)

In this subsection, the nature of the state transition matrix is examined when the mutation operator is applied with some probability in the range $0 < p_m(k) < 1$ (i.e. $Q_k = (p_m(k), 0)$). Let $P_2(i \mid n)$ and $P_2(m \mid n)$ be the conditional distributions of the two-operator algorithm corresponding to the one-operator distributions defined by Eq. 4.2-4.5. Then, $P_2(i \mid n)$ and $P_2(m \mid n)$ must account for the effect of nonzero $p_m$. This can be accomplished by expressing $P_2(i \mid n)$ as a sum over all $j$ of the corresponding $P_1(j \mid n)$ times a factor which accounts for the probability of the collection of mutation events required to transform $j$ into $i$. This probability can be expressed as $p_m^{H(i,j)} (1 - p_m)^{L - H(i,j)}$ where $H(i,j) = H(j,i)$ is the Hamming distance of the pair $i,j$. That is, $H$ is a function defined on $S \times S$ with values in $\{0, 1, 2, \ldots, L\}$. $H(i,j)$ is the number of bits which must be altered by mutation to transform $i$ into $j$ and $L - H(i,j)$ is the number of bits which must remain unaltered. Thus, $P_2(i \mid n)$ can be written as

$$\begin{aligned} \forall i \in S, \forall n \in S' : P_2(i \mid n) &= \sum_{j \in S} p_m^{H(i,j)} (1 - p_m)^{L - H(i,j)} P_1(j \mid n) \\ &= (1 - p_m)^L \sum_{j \in S} \left( \frac{p_m}{1 - p_m} \right)^{H(i,j)} \times P_1(j \mid n) \\ &= \frac{1}{(1 + \alpha)^L} \sum_{j \in S} \alpha^{H(i,j)} \times P_1(j \mid n) \end{aligned} \qquad \text{Eq. 4.11}$$

where

$$\alpha = \frac{p_m}{1 - p_m} \qquad \text{Eq. 4.12}$$

and

$$p_m = \frac{\alpha}{1 + \alpha}. \qquad \text{Eq. 4.13}$$

For $p_m = 0$ or $p_m = 1$, Eq. 4.11 includes the indeterminate form $0^0$ in some terms. Thus, the admissible range of $p_m$ is restricted to $0 < p_m < 1$, and consequently that of $\alpha$ is $0 < \alpha < \infty$. However, cases corresponding to $p_m > 1/2 \Leftrightarrow \alpha > 1$ are of no practical interest (they are less random than the case $p_m = 1/2 \Leftrightarrow \alpha = 1$), and some of the following developments restrict consideration to the range $0 < p_m \le 1/2 \Leftrightarrow 0 < \alpha \le 1$.
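The change of variable between the mutation probability and the control parameter, and the per-string mutation factor, can be sketched as follows. The function names are hypothetical; strings are represented as unsigned integers, as in Section 4.2, and the control parameter is written `alpha`.

```python
def alpha_from_pm(p_m):
    """Eq. 4.12: control parameter from mutation probability."""
    return p_m / (1.0 - p_m)

def pm_from_alpha(alpha):
    """Eq. 4.13: mutation probability from control parameter."""
    return alpha / (1.0 + alpha)

def hamming(i, j):
    """Number of bit positions in which the integers i and j differ."""
    return bin(i ^ j).count("1")

def mutation_prob(i, j, alpha, L):
    """Probability that mutation turns string j into string i,
    written as alpha^H(i,j) / (1 + alpha)^L as in Eq. 4.11."""
    return alpha ** hamming(i, j) / (1.0 + alpha) ** L
```

Summing `mutation_prob(i, j, alpha, L)` over all `i` for a fixed `j` gives one, since the sum of `alpha ** hamming(i, j)` over all `i` is `(1 + alpha) ** L` by the binomial theorem.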

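Substituting Eq. 4.2 into Eq. 4.11 yields

$$P_2(i \mid n) = \frac{1}{(1 + \alpha)^L} \sum_{j \in S} \alpha^{H(i,j)} \times \frac{n(j) \times R(j)}{\sum_{k \in S} n(k) \times R(k)}.$$

It is also straightforward to show that

$$\sum_{k \in S} n(k) \times R(k) = \frac{1}{(1 + \alpha)^L} \sum_{j \in S} \sum_{k \in S} n(k) \times R(k) \times \alpha^{H(j,k)}.$$

Thus, $P_2(i \mid n)$ can be expressed as

$$\begin{aligned} P_2(i \mid n) &= \frac{\sum_{j \in S} n(j) \times R(j) \times \alpha^{H(i,j)}}{\sum_{j \in S} \sum_{k \in S} n(k) \times R(k) \times \alpha^{H(j,k)}} \\ &= \frac{\sum_{j \in S} n(j) \times R(j) \times \alpha^{H(i,j)}}{(1 + \alpha)^L \sum_{k \in S} n(k) \times R(k)}, \end{aligned} \qquad \text{Eq. 4.14}$$

and $P_2(m \mid n)$ is multinomially distributed as follows

$$\begin{aligned} \forall m, n \in S' : P_2(m \mid n) &= \frac{M!}{\prod_{i \in S} m(i)!} \times \prod_{i \in S} P_2(i \mid n)^{m(i)} \\ &= \binom{M}{m} \times \prod_{i \in S} P_2(i \mid n)^{m(i)} \\ &= \binom{M}{m} \times \frac{1}{(1 + \alpha)^{ML}} \prod_{i \in S} \left( \frac{\sum_{j \in S} n(j) \times R(j) \times \alpha^{H(i,j)}}{\sum_{k \in S} n(k) \times R(k)} \right)^{m(i)}. \end{aligned} \qquad \text{Eq. 4.15}$$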

The transition probability matrix of the Markov chain representing the two-operator algorithm is composed of the array of conditional probabilities defined by Eq. 4.15, i.e.

$$P = [P_2(m \mid n)]. \qquad \text{Eq. 4.16}$$

Since the elements of $P$ depend on $\alpha$ (and hence by Eq. 4.12 on $p_m(k)$), the two-operator Markov chain is generally not time-homogeneous. It is time-homogeneous if the mutation probability is fixed.

Eq. 4.14-4.16 for the two-operator simple genetic algorithm are analogous to Eq. 4.2-4.6 for the one-operator variant except that $P_2(i \mid n)$ is strictly greater than zero for all $n \in S'$. Thus, the two-operator analog of Eq. 4.5 is not required. Also

$$\lim_{\alpha \to 0^+} P_2(i \mid n) = P_1(i \mid n) \quad \text{and} \quad \lim_{\alpha \to 0^+} P_2(m \mid n) = P_1(m \mid n). \qquad \text{Eq. 4.17}$$

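The rows of the state transition matrix corresponding to the one-operator absorbing states have an especially simple form. Let $i_A \in S$ be the solution represented in the absorbing state $n_A \in S_A'$. Then, from Eq. 4.14,

$$P_2(i \mid n_A) = \frac{M \times R(i_A) \times \alpha^{H(i, i_A)}}{(1 + \alpha)^L \times M \times R(i_A)} = \frac{\alpha^{H(i, i_A)}}{(1 + \alpha)^L}. \qquad \text{Eq. 4.18}$$

Thus, from Eq. 4.15,

$$P_2(m \mid n_A) = \binom{M}{m} \times \frac{\alpha^{\sum_{i \in S} m(i) \times H(i, i_A)}}{(1 + \alpha)^{ML}}. \qquad \text{Eq. 4.19}$$

Since the reward function, $R$, is strictly positive by hypothesis, and since $\forall i,j \in S : 0 \le H(i,j) \le L$, it follows that for $\alpha$ in the range $0 < \alpha \le 1$, then

$$\alpha^L \sum_{j \in S} n(j) \times R(j) \le \sum_{j \in S} n(j) \times R(j) \times \alpha^{H(i,j)} \le \sum_{j \in S} n(j) \times R(j),$$

and consequently from Eq. 4.14 that

$$\forall i \in S, \forall n \in S' : \left( \frac{\alpha}{1 + \alpha} \right)^L \le P_2(i \mid n) \le \left( \frac{1}{1 + \alpha} \right)^L. \qquad \text{Eq. 4.20}$$

Using Eq. 4.20 in Eq. 4.15 yields

$$\forall m, n \in S' : \binom{M}{m} \left( \frac{\alpha}{1 + \alpha} \right)^{ML} \le P_2(m \mid n) \le \binom{M}{m} \left( \frac{1}{1 + \alpha} \right)^{ML}. \qquad \text{Eq. 4.21}$$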
From the lower bound in Eq. 4.21, the final requirement of Theorem A3 (irreducibility) is fulfilled and the Markov chain for the time-homogeneous two-operator simple genetic algorithm possesses a unique stationary distribution, $q$, given by

$$\forall m \in S' : \quad (1)\ q(m) > 0, \qquad (2)\ \sum_{m \in S'} q(m) = 1, \qquad (3)\ q^T P = q^T.$$

Since the stationary distribution is by definition a left eigenvector of the state transition matrix (Definition A10), it follows from Eq. 4.15 and 4.16 that the asymptotic state probability distribution of the time-homogeneous two-operator algorithm is completely determined by the objective function and the algorithm parameters. It is independent of the starting state, $m_0$.
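Independence of the starting state can be checked numerically on a very small instance. The sketch below builds the two-operator transition matrix for two-bit strings and a population of two (ten states) and applies repeated left multiplication from two different starting states. It is a minimal illustration, not the dissertation's program; the function names, reward values and parameter value are hypothetical.

```python
from itertools import product
from math import factorial, prod

def states(M, N):
    """All occupancy vectors with N nonnegative entries summing to M."""
    return [m for m in product(range(M + 1), repeat=N) if sum(m) == M]

def p2_select(n, R, alpha, L):
    """Eq. 4.14 (second form): P2(i | n) for every solution i."""
    total = (1 + alpha) ** L * sum(nj * rj for nj, rj in zip(n, R))
    return [sum(nj * rj * alpha ** bin(i ^ j).count("1")
                for j, (nj, rj) in enumerate(zip(n, R))) / total
            for i in range(len(R))]

def p2_matrix(S, R, alpha, L):
    """Eq. 4.15-4.16: row n, column m holds P2(m | n)."""
    def row(n):
        p = p2_select(n, R, alpha, L)
        M = sum(n)
        return [factorial(M) // prod(factorial(x) for x in m)
                * prod(pi ** mi for pi, mi in zip(p, m)) for m in S]
    return [row(n) for n in S]

def stationary(P, q, iterations=2000):
    """Repeatedly apply q <- q P (left multiplication by the matrix)."""
    size = len(P)
    for _ in range(iterations):
        q = [sum(q[a] * P[a][b] for a in range(size)) for b in range(size)]
    return q

L, M, R, alpha = 2, 2, [1.0, 2.0, 3.0, 4.0], 0.1
S = states(M, 2 ** L)
P = p2_matrix(S, R, alpha, L)
q_a = stationary(P, [1.0] + [0.0] * (len(S) - 1))  # start in the first state
q_b = stationary(P, [0.0] * (len(S) - 1) + [1.0])  # start in the last state
```

Under these assumptions both runs settle on the same strictly positive vector, in line with the uniqueness claim. The fixed iteration count is a simplification; a convergence test on successive iterates would be the more careful choice.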

4.3.3 A Three-Operator Algorithm (Reproduction, Mutation and Crossover)

The three-operator simple genetic algorithm corresponds to the case

$\forall k : Q_k = (p_m(k), p_c(k))$ with both $p_m(k)$ and $p_c(k)$ nonzero. Results analogous to Eq.

4.14-4.21 for the two-operator case are obtainable by defining a new function which is

similar in character to the Hamming distance function employed in Section 4.3.2 for the

two-operator case. This subsection completes that generalization. The result only reflects

the crossover operation implicitly, however it permits some very significant conclusions

concerning bounding values of the three operator conditional probabilities.

The new function, $I(i,j,k,s)$, is defined over an ordered quadruple $(i,j,k,s)$ where $i,j,k \in S$ and where $s \in \{0, 1, \ldots, L\}$ is a bit-string location. The states $i,j \in S$ represent respectively the first and second parent strings selected at a particular crossover opportunity and $k \in S$ represents a possible descendent string. The bit-string location $s$ is the location randomly selected by the crossover operator, and normally it is uniformly distributed over its range. Thus, $I$ is defined on $S \times S \times S \times \{0, 1, 2, \ldots, L\}$ and it takes on values selected from $\{0, 1\}$ depending upon whether the indicated crossover operation is or is not consistent. That is, $I$ assumes the value one if the bit-string $k$ is produced by crossing the bit-strings $i$ and $j$ at the site $s$, and zero otherwise.

In terms of this crossover operator function, the conditional probability of producing, via reproduction and crossover, a solution $k \in S$ given a current population described by $n \in S'$ is

$$\begin{aligned} P_2'(k \mid n) &= p_c \times \sum_{i \in S} \sum_{j \in S} \sum_{s} P_1(i \mid n) \times P_1(j \mid n) \times \frac{1}{L} \times I(i,j,k,s) + (1 - p_c) \times P_1(k \mid n) \\ &= p_c \times \frac{1}{L} \times \sum_{i \in S} \sum_{j \in S} \sum_{s=1}^{L} P_1(i \mid n) \times P_1(j \mid n) \times I(i,j,k,s) + (1 - p_c) \times P_1(k \mid n) \end{aligned} \qquad \text{Eq. 4.22}$$

where $P_1(i \mid n)$ is as defined in Eq. 4.2 and where $P_2'(i \mid n)$ refers to the two-operator algorithm consisting of reproduction and crossover without mutation. This result assumes uniformly distributed crossover site selection.
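A direct evaluation of the reproduction-with-crossover probabilities might look like the following sketch. The text does not spell out the bit-level crossover convention, so one is assumed here: the child keeps the `s` leftmost bits of the first parent and the `L - s` rightmost bits of the second, with `s` uniform on 1 through `L`. All function names are hypothetical.

```python
def p1_select(n, R):
    """Eq. 4.2: proportional reproduction probabilities."""
    total = sum(nj * rj for nj, rj in zip(n, R))
    return [nj * rj / total for nj, rj in zip(n, R)]

def cross(i, j, s, L):
    """Assumed convention: s leftmost bits from i, L - s rightmost bits from j."""
    low_mask = (1 << (L - s)) - 1
    return (i & ~low_mask) | (j & low_mask)

def indicator(i, j, k, s, L):
    """I(i, j, k, s): 1 if crossing i and j at site s yields k, else 0."""
    return 1 if cross(i, j, s, L) == k else 0

def p2prime_select(n, R, p_c, L):
    """Eq. 4.22: P2'(k | n) for every k, with the site s uniform on 1..L."""
    p1 = p1_select(n, R)
    N = len(R)
    out = []
    for k in range(N):
        crossed = sum(p1[i] * p1[j] * indicator(i, j, k, s, L)
                      for i in range(N) for j in range(N)
                      for s in range(1, L + 1)) / L
        out.append(p_c * crossed + (1 - p_c) * p1[k])
    return out

# Two-bit strings; the population holds one copy each of 00 and 11.
R = [1.0, 2.0, 3.0, 4.0]
mixed = p2prime_select((1, 0, 0, 1), R, p_c=0.6, L=2)
uniform = p2prime_select((2, 0, 0, 0), R, p_c=0.6, L=2)
```

In the mixed case the strings 01 and 10 receive nonzero probability although neither is in the population, while for the uniform population the result coincides with proportional reproduction alone.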

The array of conditional probabilities $[P_2'(i \mid n)]$ plays a role in the three-operator simple genetic algorithm very analogous to the role played by the array $[P_1(i \mid n)]$ in the two-operator variant. In fact, the $[P_2'(i \mid n)]$ array can be used as counterparts of Eq. 4.2 to develop results exactly analogous to Eq. 4.3 and Eq. 4.6. Further, for $n_A \in S_A'$, Eq. 4.22 reduces to

$$P_2'(k \mid n_A) = P_1(k \mid n_A), \qquad \text{Eq. 4.23}$$

and consequently this (fictitious) two-operator algorithm (reproduction and crossover) demonstrates the same sort of absorbing state behavior as the one-operator algorithm.

From Eq. 4.22, the three-operator conditional probabilities and state transition matrix are expressible as

$$P_3(i \mid n) = \frac{1}{(1 + \alpha)^L} \times \sum_{j \in S} \alpha^{H(i,j)} \times P_2'(j \mid n), \qquad \text{Eq. 4.24}$$

$$\begin{aligned} P_3(m \mid n) &= \frac{M!}{\prod_{i \in S} m(i)!} \times \prod_{i \in S} P_3(i \mid n)^{m(i)} \\ &= \binom{M}{m} \times \prod_{i \in S} P_3(i \mid n)^{m(i)} \end{aligned} \qquad \text{Eq. 4.25}$$

and

$$P = [P_3(m \mid n)]. \qquad \text{Eq. 4.26}$$

These results are developed in a fashion analogous to Eq. 4.14-4.16. From them, it follows that the three-operator Markov chain is time-homogeneous if both the mutation and crossover probabilities are fixed. In general it is not time-homogeneous.

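From Eq. 4.22, 4.24 and 4.25, it follows that

$$\lim_{\alpha \to 0^+} P_3(i \mid n) = P_2'(i \mid n) \quad \text{and} \quad \lim_{\alpha \to 0^+} P_3(m \mid n) = P_2'(m \mid n). \qquad \text{Eq. 4.27}$$

Also, from Eq. 4.23-4.25, the three-operator analogs of Eq. 4.18-19 apply

$$P_3(i \mid n_A) = \frac{\alpha^{H(i, i_A)}}{(1 + \alpha)^L} \qquad \text{Eq. 4.28}$$

$$P_3(m \mid n_A) = \binom{M}{m} \times \frac{\alpha^{\sum_{i \in S} m(i) \times H(i, i_A)}}{(1 + \alpha)^{ML}}. \qquad \text{Eq. 4.29}$$

Since, for $0 < \alpha \le 1$,

$$\alpha^L \sum_{j \in S} P_2'(j \mid n) \le \sum_{j \in S} P_2'(j \mid n) \times \alpha^{H(i,j)} \le \sum_{j \in S} P_2'(j \mid n),$$

the three-operator analogs of Eq. 4.20-21 follow from Eq. 4.24-25, i.e.

$$\forall i \in S, \forall n \in S' : \left( \frac{\alpha}{1 + \alpha} \right)^L \le P_3(i \mid n) \le \left( \frac{1}{1 + \alpha} \right)^L \qquad \text{Eq. 4.30}$$

and

$$\forall m, n \in S' : \binom{M}{m} \left( \frac{\alpha}{1 + \alpha} \right)^{ML} \le P_3(m \mid n) \le \binom{M}{m} \left( \frac{1}{1 + \alpha} \right)^{ML}. \qquad \text{Eq. 4.31}$$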

All of the state space characteristics described in 4.3.2 for the two-operator algo-

rithm follow. In particular, the Markov chain of the three-operator algorithm is irreduc-

ible. Thus, a unique stationary distribution exists for the time-homogeneous

three-operator simple genetic algorithm, and as in the two-operator case it is completely

determined by the objective function and the algorithm parameter values.

4.3.4 Summary

The asymptotic behavior of the one-operator simple genetic algorithm is dominated

by the states which correspond to uniform populations, the one-operator absorbing states.

The algorithm necessarily arrives at some member of the absorbing state set within a

finite number of algorithm iterations (Eq. 4.8). The asymptotic probability distribution

depends upon the algorithm initial population, $m_0$. This observation is equivalent to the

fact, established in Section 4.3.1, that the stationary distribution of the one-operator algo-

rithm is not unique.

A unique stationary distribution exists for the time-homogeneous two and three-operator algorithm variants (with $\alpha > 0$), or equivalently, their asymptotic probability distributions are independent of $m_0$. However, in the $\alpha \to 0^+$ limit, both the two and three-operator algorithms degenerate into the absorbing state behavior which typifies the one-operator case (Eq. 4.17 and Eq. 4.23, 4.27). A very important question is whether the unique stationary distributions of the two and three-operator algorithms approach limits as $\alpha \to 0^+$. Section 7 answers that question affirmatively, and in Section 8, the lower bounds reflected in Eq. 4.21 and Eq. 4.31 are employed to arrive at a monotone decreasing sequence bound on $p_m(k)$ sufficient to guarantee that the limiting distribution is achieved (asymptotically) by the inhomogeneous two and three-operator Markov chains.

The analogous conditional probability arrays $[P_1(i \mid n)]$ and $[P_2'(i \mid n)]$, whose elements are defined by Eq. 4.2 and Eq. 4.22 respectively, play a very essential role in the following sections, especially in Section 9. Most of the results developed hereafter apply equally to the two and three-operator algorithm variants by substituting from these conditional probability arrays appropriately. Thus, in much of the following, the notation modifiers are suppressed, so that the elements of either of these arrays are denoted by $P(i \mid n)$, with the specific array reference being determined by context.

SECTION 5

SOME EMPIRICAL RESULTS

5.1 Overview

This section reports the results of some computer simulations based upon the

genetic algorithm Markov chain model developed in Section 4. Their purpose is to help

fix some of the state space and asymptotic probability distribution ideas which are central

features of this work.

The results reported here are separated into four subsections. Section 5.2 concerns

enumeration of the state space, S'. Section 5.3 is devoted to generation of reward function

data, which are subsequently used in the two remaining subsections. Section 5.4 illustrates the behavior of some selected conditional probabilities as a function of the algorithm control parameter, $\alpha$. The results of the primary simulation task are reported in

Section 5.5. They concern computation of the three-operator stationary distribution at

extremely low (approaching zero) values of the mutation probability control parameter.

One of the significant theoretical results developed in subsequent sections is sug-

gested by the data presented in Section 5.5. It is that the zero mutation probability limit-

ing stationary distribution provides nonzero probability for all states corresponding to

uniform populations (i.e. one-operator absorbing states), including those which represent

suboptimal solutions. This result poses a complication for the attempt to extrapolate the

simulated annealing convergence theory onto the genetic algorithm, as discussed further

in section 5.5.

All simulation results included here were generated on the Cray Y-MP computer at

the Eglin AFB, Fl. Computer Science Directorate. The data presented in Section 5.5 con-

cerning the primary simulation task (the converged limiting stationary distribution

results) includes some CPU utilization statistics which reflect the approximately 180

hours of CPU time expended in generating that data. The source program listings for the

programs employed in generating the results of this section are included in Appendix D.

5.2 State Space Enumeration

The results appearing in this section are of two primary types. The first is a table of

computed state space cardinality values, N', at a variety of combinations of bit-string

length, L, and population size, M. These results are products of the program GET_NPS.F

appearing in Appendix D. It implements Eq. 4.1. The results are collected in Table 5-1.

In addition to the N' column, Table 5-1 includes a similar column labeled N". It

denotes the cardinality of a space designated S" which is related to S' and whose signifi-

cance is established in Section 9. Its cardinality is given by

$$N'' = \binom{M + N}{M}. \qquad \text{Eq. 5.1}$$

The data recorded in column N" of Table 5-1 are computed from this equation by the

program GET_NPS.F.
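The tabulated cardinalities can be reproduced with a few lines of code. This sketch is not the GET_NPS.F program; the function names are hypothetical, and Eq. 5.1 is read here as the binomial coefficient C(M + N, M), an interpretation that matches the N" entries of Table 5-1 (for example 33153 at M = 2, L = 8).

```python
from math import comb

def n_prime(M, L):
    """Eq. 4.1: number of populations of size M over N = 2**L solutions."""
    N = 2 ** L
    return comb(M + N - 1, M)

def n_double_prime(M, L):
    """Eq. 5.1, read as the binomial coefficient C(M + N, M)."""
    N = 2 ** L
    return comb(M + N, M)

# Rows of the table for population size M = 2.
rows = [(2, L, 2 ** L, n_prime(2, L), n_double_prime(2, L)) for L in range(1, 9)]
```

Python integers have arbitrary precision, so the largest entries of the table pose no overflow concern in this sketch.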

Table 5-1

State Space Cardinality

M  L  N    N'               N"
1  1  2    2                3
1  2  4    4                5
1  3  8    8                9
1  4  16   16               17
1  5  32   32               33
1  6  64   64               65
1  7  128  128              129
1  8  256  256              257

2  1  2    3                6
2  2  4    10               15
2  3  8    36               45
2  4  16   136              153
2  5  32   528              561
2  6  64   2080             2145
2  7  128  8256             8385
2  8  256  32896            33153

3  1  2    4                10
3  2  4    20               35
3  3  8    120              165
3  4  16   816              969
3  5  32   5984             6545
3  6  64   45760            47905
3  7  128  357760           366145
3  8  256  2829056          2862209

4  1  2    5                15
4  2  4    35               70
4  3  8    330              495
4  4  16   3876             4845
4  5  32   52360            58905
4  6  64   766480           814385
4  7  128  11716640         12082785
4  8  256  183181376        186043585

5  1  2    6                21
5  2  4    56               126
5  3  8    792              1287
5  4  16   15504            20349
5  5  32   376992           435897
5  6  64   10424128         11238513
5  7  128  309319296        321402081
5  8  256  9525431552       9711475137

6  1  2    7                28
6  2  4    84               210
6  3  8    1716             3003
6  4  16   54264            74613
6  5  32   2324784          2760681
6  6  64   119877472        131115985
6  7  128  6856577728       7177979809
6  8  256  414356272512     424067747649

7  1  2    8                36
7  2  4    120              330
7  3  8    3432             6435
7  4  16   170544           245157
7  5  32   12620256         15380937
7  6  64   1198774720       1329890705
7  7  128  131254487936     138432467745
7  8  256  15508763342592   15932831090241

8  1  2    9                45
8  2  4    165              495
8  3  8    6435             12870
8  4  16   490314           735471
8  5  32   61523748         76904685
8  6  64   10639125640      11969016345
8  7  128  2214919483920    2353351951665
8  8  256  509850594887712  525783425977953

The second set of state space enumeration results reported here is a listing of the

elements of S' for a variety of values of M and L. The principal purpose of this particular

simulation task is to verify the computer algorithm (and the implementing subprograms)

employed to generate the state space vectors for use in the primary simulation task

reported in Section 5.5. The data in Tables 5-2, 5-3 and 5-4 below are products of the

program GET_SPS.F appearing in Appendix D. The tabulated results are representative

of data produced for M and L ranging up to 5 and 4 respectively, for which N' = 15504.

In each case generated, the cardinality of the result agreed with that predicted by Eq. 4.1

and recorded in Table 5-1.
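One way to generate the state vectors is a short recursive generator. This is a sketch of the general technique only, not the algorithm used in GET_SPS.F, and the function name is hypothetical. It emits the vectors in descending lexicographic order, the order in which the M = 2, L = 2 states appear in Table 5-2.

```python
def enumerate_states(M, N):
    """Yield occupancy vectors of length N summing to M,
    in descending lexicographic order."""
    if N == 1:
        yield (M,)
        return
    # Choose the first component, then distribute the remainder recursively.
    for first in range(M, -1, -1):
        for rest in enumerate_states(M - first, N - 1):
            yield (first,) + rest

table_5_2 = list(enumerate_states(2, 4))   # M = 2, L = 2
table_5_3 = list(enumerate_states(3, 4))   # M = 3, L = 2
table_5_4 = list(enumerate_states(2, 8))   # M = 2, L = 3
```

The lengths of the three lists can be compared with Eq. 4.1 in the same way the text describes for the original program.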

Table 5-2

S' at M=2, L=2

2 0 0 0
1 1 0 0
1 0 1 0
1 0 0 1
0 2 0 0
0 1 1 0
0 1 0 1
0 0 2 0
0 0 1 1
0 0 0 2

Table 5-3

S' at M=3, L=2

3 0 0 0
2 1 0 0
2 0 1 0
2 0 0 1
1 2 0 0
1 1 1 0
1 1 0 1
1 0 2 0
1 0 1 1
1 0 0 2
0 3 0 0
0 2 1 0
0 2 0 1
0 1 2 0
0 1 1 1
0 1 0 2
0 0 3 0
0 0 2 1
0 0 1 2
0 0 0 3

Table 5-4

S' at M=2, L=3

20000000
11000000
10100000
10010000
10001000
10000100
10000010
10000001
02000000
01100000
01010000
01001000
01000100
01000010
01000001
00200000
00110000
00101000
00100100
00100010
00100001
00020000
00011000
00010100
00010010
00010001
00002000
00001100
00001010
00001001
00000200
00000110
00000101
00000020
00000011
00000002

5.3 Reward Function Data

This section presents two sets of reward function data, one for a four-bit optimiza-

tion problem and another for a five-bit version. Both data sets are products of the pro-

gram GET_R.F provided in Appendix D. These data sets are employed in the simulations

presented in Sections 5.4 and 5.5. Figure 5-1 presents the four-bit function and Figure 5-2

the five-bit version.

In both data sets, the solution state which maximizes the reward value is the $i \in S$ represented by the decimal integer value 12. That is, for the four-bit function, $i_{opt} = 1100$, and its five-bit counterpart is $i_{opt} = 01100$. The reward function value for the arbitrary $i \in S$ is then computed by assigning the value 1 for each length 0, 1 or 2 schema (Section

3.3.1) in agreement with the optimum bit pattern and summing the contributions. Thus,

for example, for the four-bit reward function, the bit-string 0000 has function value 4,

generated by summing the contributions from the single matching length 0 schema, two

matching length 1 schemata and the one matching length 2 schema. A strictly positive

reward function is guaranteed since every string matches the single length 0 schema.
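The reward computation described above can be sketched as follows. This is not the GET_R.F program. It assumes that a length 1 schema fixes a single bit and that a length 2 schema fixes two adjacent bits; that reading reproduces the worked value of 4 for the string 0000, but the adjacency assumption is an interpretation rather than something the text states. The function name is hypothetical.

```python
def reward(i, i_opt, L):
    """Count the length 0, 1 and 2 schemata on which i agrees with i_opt."""
    # match[b] is True when bit b of i equals bit b of the optimum.
    match = [((i >> b) & 1) == ((i_opt >> b) & 1) for b in range(L)]
    singles = sum(match)
    # Assumed: length 2 schemata are pairs of adjacent bit positions.
    pairs = sum(match[b] and match[b + 1] for b in range(L - 1))
    return 1 + singles + pairs  # the leading 1 is the single length 0 schema

four_bit = [reward(i, 0b1100, 4) for i in range(16)]
five_bit = [reward(i, 0b01100, 5) for i in range(32)]
```

Under these assumptions the reward is at least 1 for every string and attains its maximum only at the decimal value 12 in both data sets.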

[Figure 5-1: Four-Bit Reward Function. Plot of $R(i)$ against $i = 0, \ldots, 15$.]

[Figure 5-2: Five-Bit Reward Function. Plot of $R(i)$ against $i = 0, \ldots, 31$.]

5.4 Conditional Probabilities Versus $\alpha$

The following four figures present plots of two and three-operator conditional prob-

abilities at two selected current states, n. These results are computed from Eq. 4.14 and

Eq. 4.24. The plots are generated for the four-bit problem with reward function given in

Figure 5-1 and with M= 6. From Table 5-1, the cardinality of S' for these examples is

N' = 54264. The conditional probabilities are provided at two selected n vectors, one rep-

resenting the uniform population n = (6000000000000000) and one the mixed population

state n = (2000010001002000), and at three values of the mutation probability parameter.

The two and three-operator results are respectively products of the computer programs

GET_P2INS and GET_P3INS provided in Appendix D.

The purpose of the tests from which these data are produced is verification of the

computer algorithms (and the implementing subprograms) employed to generate the conditional probability calculations required by the primary simulation task reported in Section 5.5. Thus, for example, all conditional probability distributions are uniform at $\alpha = 1$ as is required by Eq. 4.14 and Eq. 4.24, and for $\alpha \to 0^+$ all conditional probability

distributions approach the one-operator counterparts as is required by Eq. 4.17 and Eq.

4.27. Also, the two and three-operator conditional probabilities are identical for the uni-

form population case (Figures 5-3 and 5-5) as required by Eq. 4.18 and 4.28, and the

three-operator mixed population state case allows generation of solutions not present in

the current population even in the zero mutation probability limit.

[Figure 5-3: $P_2(i \mid n)$ at $n = (6000000000000000)$. Plot of $P_2(i \mid n)$ against $i = 0, \ldots, 15$, with curves labeled 1.0, .5 and .02.]

[Figure 5-4: $P_2(i \mid n)$ at $n = (2000010001002000)$. Plot of $P_2(i \mid n)$ against $i = 0, \ldots, 15$, with curves labeled 1.0, .5 and .02.]

[Figure 5-5: $P_3(i \mid n)$ at $n = (6000000000000000)$. Plot of $P_3(i \mid n)$ against $i = 0, \ldots, 15$, with curves labeled 1.0, .5 and .02.]

[Figure 5-6: $P_3(i \mid n)$ at $n = (2000010001002000)$. Plot of $P_3(i \mid n)$ against $i = 0, \ldots, 15$, with curves labeled 1.0, .5 and .02.]

5.5 Converged Limiting Stationary Distributions

The following data represent converged three-operator stationary distribution

results for both four and five-bit problems at a variety of population sizes. The results

recorded in Figures 5-7 through 5-16 are products of the computer program

GET_3STAT.F included in Appendix D. They are obtained by repeatedly multiplying a

current state probability vector by the three-operator state transition matrix until a termination criterion representing approximate convergence is attained. The starting probability vector is the multinomial distribution corresponding to a uniformly distributed $P_3(i \mid n)$ array, and the termination criterion is that the sum of the probabilities for all nonuniform population states is less than 0.004.
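The iteration just described can be sketched in a few lines. This is an illustrative sketch only, not the GET_3STAT.F program of Appendix D: the function name, the `max_iter` guard and the three-state matrix are hypothetical stand-ins, with states 0 and 2 playing the role of uniform population states and state 1 the role of a nonuniform state.

```python
def converge_state_distribution(P, p0, uniform_states, tol=0.004, max_iter=10_000):
    """Repeat p <- p P (row vector times matrix) until the probability
    mass on nonuniform states drops below tol."""
    p = list(p0)
    n = len(p)
    uniform = set(uniform_states)
    for iteration in range(1, max_iter + 1):
        # One vector-matrix multiplication: p_new[j] = sum_i p[i] * P[i][j]
        p = [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]
        nonuniform_mass = sum(p[j] for j in range(n) if j not in uniform)
        if nonuniform_mass < tol:
            return p, iteration
    raise RuntimeError("termination criterion not reached")


# Hypothetical toy chain: states 0 and 2 are nearly absorbing.
P_toy = [
    [0.999, 0.001, 0.000],
    [0.300, 0.400, 0.300],
    [0.000, 0.001, 0.999],
]
p_final, iterations = converge_state_distribution(
    P_toy, [0.25, 0.5, 0.25], uniform_states=[0, 2]
)
```

The returned vector is only approximately stationary; the 0.004 threshold is the termination criterion quoted above, while everything else in the sketch is assumed for illustration.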

All of the results reported here are for extremely small $\alpha$ (approaching zero) and thus, as predicted by the model, only the states corresponding to uniform populations (one-operator absorbing states) have nonzero probability. Consequently, only the final probabilities for the uniform population states are displayed in Figures 5-7 through 5-16, with each such state indexed by the decimal integer value corresponding to the solution represented.

Table 5-5 summarizes the Cray Y-MP computer resources expended in generating

these data. Tabulated there are the number of vector multiplications (of dimension N')

required to attain the termination condition and the CPU time utilized. The CPU time is

in seconds, rounded to the nearest integer. The tabulated data are collected from the log

files generated in the computer runs which produced the stationary distribution data for

Figures 5-7 through 5-16.
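The growth of the vector dimension N' with M and L can be illustrated with a short calculation. The sketch below assumes that each state is an unordered population of M strings drawn from the 2^L possible L-bit strings, as the occupancy-vector notation such as n = (2000010001002000) suggests; the function name is hypothetical.

```python
from math import comb


def num_population_states(M, L):
    """Number of unordered populations of size M over 2**L bit strings
    (multisets of size M drawn from 2**L items)."""
    return comb(M + 2**L - 1, M)


# Dimensions for the smallest four-bit and five-bit cases.
n_2_4 = num_population_states(2, 4)
n_3_4 = num_population_states(3, 4)
n_2_5 = num_population_states(2, 5)
```

Under this assumption the count for M = 2, L = 4 is 136 and for M = 3, L = 4 is 816, which is why the CPU time rises so sharply with population size: each iteration multiplies a vector of dimension N' by an N' by N' matrix.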

The limiting distribution entropy results in Figures 5-17 and 5-18 are computed

from the converged stationary distributions. The results are recorded in bits and are

plotted as a function of population size.
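A minimal sketch of the entropy calculation follows. It assumes the usual Shannon entropy with base-2 logarithms, and it renormalizes the uniform-state probabilities before summing, since the small residual mass on nonuniform states is discarded; both the renormalization and the function name are assumptions rather than details taken from the source programs.

```python
from math import log2


def entropy_bits(q):
    """Shannon entropy in bits of a (possibly unnormalized) distribution."""
    total = sum(q)
    return -sum((x / total) * log2(x / total) for x in q if x > 0)


# A uniform distribution over the 16 four-bit solutions has the maximum entropy.
h_uniform = entropy_bits([1.0 / 16] * 16)
# A distribution concentrated on a single solution has zero entropy.
h_point = entropy_bits([0.0] * 15 + [1.0])
```

Lower entropy therefore corresponds to a limiting distribution concentrated on fewer solutions.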

A very significant result suggested by the limiting stationary distribution data is that the $\alpha \to 0^+$ value of the stationary distribution is nonzero for all possible uniform states. This behavior, which is confirmed by theoretical results developed in Section 7, precludes extrapolation of the simulated annealing global optimality convergence result onto the genetic algorithm. However, as suggested by the data plotted in Figures 5-17 and 5-18, it may be possible to approach the desired limiting behavior as closely as required by adjusting the population size parameter. Those figures indicate that for sufficiently large values of the population size parameter, the limiting distribution is dominated by optimal solutions, and that the limiting distribution entropy decreases monotonically with increasing population size. Results developed in Section 9 reinforce this premise.

[Figure 5-7: Limiting Stationary Distribution at M=2, L=4; q(i) versus i]

[Figure 5-8: Limiting Stationary Distribution at M=3, L=4; q(i) versus i]

[Figure 5-9: Limiting Stationary Distribution at M=4, L=4; q(i) versus i]

[Figure 5-10: Limiting Stationary Distribution at M=5, L=4; q(i) versus i]

[Figure 5-11: Limiting Stationary Distribution at M=6, L=4; q(i) versus i]

[Figure 5-12: Limiting Stationary Distribution at M=7, L=4; q(i) versus i]

[Figure 5-13: Limiting Stationary Distribution at M=2, L=5; q(i) versus i]

[Figure 5-14: Limiting Stationary Distribution at M=3, L=5; q(i) versus i]

[Figure 5-15: Limiting Stationary Distribution at M=4, L=5; q(i) versus i]

[Figure 5-16: Limiting Stationary Distribution at M=5, L=5; q(i) versus i]

Table 5-5

CPU Utilization Statistics

M    L    N'        Iterations    CPU time (s)
2    4    136       8             <1
3    4    816       14            8
4    4    3876      19            86
5    4    15504     23            1219
6    4    54264     27            19048
7    4    170544    30            219930
2    5    528       9             13
3    5    5984      15            301
4    5    52360     21            11676
5    5    376992    26            **

** Not obtained due to unrecoverable log file error

[Figure 5-17: Limiting Distribution Entropy vs Population Size (Four-Bit Problem); entropy H in bits versus M = 2 to 7]

[Figure 5-18: Limiting Distribution Entropy vs Population Size (Five-Bit Problem); entropy H in bits versus M = 2 to 5]

SECTION 6

THE CRAMER'S RULE FORMULATION OF THE STATIONARY DISTRIBUTION

6.1 Overview

In Sections 4.3.2 and 4.3.3, the time-homogeneous two and three-operator simple

genetic algorithm Markov chains are shown to possess unique stationary distributions.

Those conclusions are established by invoking Theorem A3, which asserts that in each

case the stationary distribution is a left eigenvector of the state transition matrix and that

the additional constraint that it be a probability vector (Definition A2) makes the solution

unique. In this section, the existence and uniqueness arguments are refined into a Cramer's rule formulation of the solution. This development concerns the time-homogeneous algorithms only, with $\alpha$ constrained to $\alpha > 0$, and it appeals heavily to the foundation provided in Appendix B.

The product of this development is an expression for the components of the station-

ary distribution vector as rational functions generated from the characteristic polynomials

of matrices derived from the state transition matrix. The derived matrices are generated

by setting selected rows of P to zero. The utility of the approach is that the form of P

suggests a mechanism for expressing the values of the characteristic polynomials. Some

key intermediate parts of the required methodology are developed in Section 9, but the

effort stops short of explicit solution. However, some very significant conclusions con-

cerning the asymptotic behavior of the algorithm are obtainable (Sections 7 and 8) from

the results developed here without explicitly solving the system.

6.2 The Stationary Distribution Description

As established in Section 4, implementation of the mutation operator with nonzero mutation probability (i.e. $\alpha > 0$) implies that for both the two and three-operator algorithms, $\forall m, n \in S' : P(m \mid n) > 0$. Thus, by Definition B1, $P$ is primitive for any integer $k \ge 1$. Hence, from Section B.3, the stationary distribution of the two and three-operator simple genetic algorithm exists, is unique and is a left eigenvector of the state transition matrix corresponding to eigenvalue 1, i.e.

$$q_\alpha^T P = q_\alpha^T$$

or equivalently

$$q_\alpha^T (P - I) = 0^T \qquad \text{Eq. 6.1}$$

The following proposition establishes a significant fact concerning the rank of the

matrix (P-I) in Eq. 6.1.

Proposition 6.1: The rank of the matrix $(P - I)$ in Eq. 6.1 is exactly $N' - 1$ where $N' = \mathrm{card}(S')$ is the dimension of $P$.

This result follows from Theorem B4(f). Its significance is that exactly one column of the system of equations in Eq. 6.1 can be replaced without sacrificing any of the constraints which Eq. 6.1 imposes on $q_\alpha$. Proposition 6.2 below concerns such a modification of the system. The modification consists of replacing any column (e.g. the column indexed by $n \in S'$) of Eq. 6.1 by a column corresponding to the constraint

$$\sum_{m \in S'} q_\alpha(m) = 1, \qquad \text{Eq. 6.2}$$

thus producing a system of the form

$$q_\alpha^T (P - I)_n = e_n^T \qquad \text{Eq. 6.3}$$

where $(P - I)_n$ is generated by replacing the column of $(P - I)$ indexed by $n \in S'$ with the vector $\mathbf{1}$ whose components all have the value 1, and where $e_n^T$ is the row vector containing 1 in column $n$ and 0's elsewhere.

Proposition 6.2: If the constraint described in Eq. 6.2 is used to replace any column (e.g. column $n$) of the system in Eq. 6.1, the resulting system (Eq. 6.3) is full rank, or equivalently, $|(P - I)_n| \ne 0$.

Since $P$ is a stochastic matrix (Definition A3), the system of equations in Eq. 6.1 can be transformed into an equivalent system in which the column indexed by the arbitrary column index $n \in S'$ is represented by the equation $0 = 0$. The required transformation is obtainable by replacing column $n$ by the sum of all columns $m \in S'$, and thus any $n \in S'$ is a candidate for replacement. Proposition 6.2 is then a restatement of Proposition B2 in terms of the determinant of the matrix of the modified system. It is the essential condition for justification of the following proposition.

Proposition 6.3: The components of the stationary distribution can be expressed in the form

$$q_\alpha(m) = \frac{|(P - I)_n^{(m)}|}{|(P - I)_n|}$$

where $(P - I)_n^{(m)}$ is derived from $(P - I)_n$ by replacing the row of $(P - I)_n$ indexed by $m \in S'$ with the row vector $e_n^T$.

This result is simply an application of Cramer's Rule to the solution of the system in Eq. 6.3. It applies because $|(P - I)_n| \ne 0$ is assured by Proposition 6.2.

The equality defined in Proposition 6.3 can be evaluated without computing $|(P - I)_n|$ directly, as suggested by the following proposition.

Proposition 6.4: The denominator determinant in Proposition 6.3 can be written as

$$|(P - I)_n| = \sum_{m \in S'} |(P - I)_n^{(m)}|$$

This result follows from application of elementary column operations on column $n$ of $|(P - I)_n|$ and employing the definition of $|(P - I)_n^{(m)}|$. The essential step is noting that the cofactor of each of the (unit) elements in column $n$ is equal to the corresponding determinant $|(P - I)_n^{(m)}|$.

Since the numerator determinant defined in Proposition 6.3 is generated from $(P - I)_n$ by replacement of row $m$ by the row vector $e_n^T$, its value is the cofactor of the (unit) element in row $m$ and column $n$. As indicated in the following proposition, it is equal to the determinant which results from the corresponding row replacement in $(P - I)$.

Proposition 6.5: The numerator determinant defined in Proposition 6.3 can be written as

$$|(P - I)_n^{(m)}| = |(P - I)^{(m,n)}|$$

where $(P - I)^{(m,n)}$ is defined as the matrix which results from $(P - I)$ by replacing the row indexed by $m$ with the row vector $e_n^T$.

Next, note that if $m = n$, then $|(P - I)^{(m,n)}|$ can be written as

$$|(P - I)^{(m,m)}| = -|P_m^{(0)} - I|$$

where $P_m^{(0)}$ is defined as the matrix which results by replacing row $m$ of $P$ by the row vector $0^T$. If $m \ne n$, then by writing the replacement row in $|(P - I)^{(m,n)}|$ as

$$e_n^T = e_n^T - e_m^T + e_m^T = [e_n^T - e_m^T] - [-e_m^T],$$

$|(P - I)^{(m,n)}|$ can be written as the difference of two determinants derived from $(P - I)$, one with the $m$th row replaced by $[e_n^T - e_m^T]$ and the second with the $m$th row replaced by $[-e_m^T]$. The $-e_m^T$ term in each row replacement provides the necessary principal diagonal contribution to permit expression of $|(P - I)^{(m,n)}|$ as

$$|(P - I)^{(m,n)}| = |P_m^{(n)} - I| - |P_m^{(0)} - I|$$

where $P_m^{(0)}$ is defined as before and where $P_m^{(n)}$ is defined as the matrix which results by replacing row $m$ of $P$ by the row vector $e_n^T$.

This result can be further reduced by noting that the row replacement by which $P_m^{(n)}$ is generated from $P$ preserves the row sum constraint (i.e. $P_m^{(n)}$ is a stochastic matrix). Thus, 1 is an eigenvalue of $P_m^{(n)}$ (Definition A3), from which it follows that $|P_m^{(n)} - I| = 0$. Consequently, the $m = n$ and $m \ne n$ cases can be assembled as indicated by the following proposition.

Proposition 6.6: The determinant $|(P - I)^{(m,n)}|$ defined in Proposition 6.5 can be written as

$$|(P - I)^{(m,n)}| = -|P_m^{(0)} - I|$$

By collecting the results of Propositions 6.3-6 and noting that the superscript in $P_m^{(0)}$ is now superfluous, the components of the stationary distribution can be written as indicated in the following proposition.

Proposition 6.7: The components of the stationary distribution can be expressed in the form

$$q_\alpha(m) = \frac{|P_m - I|}{\sum_{n \in S'} |P_n - I|}$$

where $P_m$ and $P_n$ are derived from $P$ by replacing the rows indexed by $m$ and $n$ respectively with the row vector $0^T$.

Thus, computing the stationary distribution components reduces to evaluating the characteristic polynomials of the $P_n$'s at $\lambda = 1$ (i.e. $\phi_n(\lambda) = |P_n - \lambda I| \Rightarrow |P_n - I| = \phi_n(1)$). Also, since 1 is an eigenvalue of $P$ it follows that $\phi(1) = |P - I| = 0$, which suggests the following alternative to Proposition 6.7. Its usefulness is established in Sections 9.3-9.4.

Proposition 6.8: The components of the stationary distribution can be expressed in the alternative form

$$q_\alpha(m) = \frac{|P - I| - |P_m - I|}{\sum_{n \in S'} \left( |P - I| - |P_n - I| \right)}$$

where as before $P_m$ and $P_n$ are derived from $P$ by replacing the rows indexed by $m$ and $n$ respectively with the row vector $0^T$.
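The determinant ratio of Proposition 6.7 can be checked numerically on a small example. The sketch below is illustrative only: the helper names and the 3-state strictly positive stochastic matrix are hypothetical, and exact rational arithmetic is used so that the stationarity condition can be tested without rounding error.

```python
from fractions import Fraction


def det(A):
    """Determinant by exact Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in A]
    n = len(A)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            result = -result          # a row swap flips the sign
        result *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= factor * A[c][k]
    return result


def stationary_by_determinants(P):
    """q(m) = |P_m - I| / sum_n |P_n - I|, with P_m equal to P except
    that row m is replaced by zeros."""
    n = len(P)
    dets = []
    for m in range(n):
        Pm_minus_I = [
            [(0 if r == m else Fraction(P[r][c])) - (1 if r == c else 0)
             for c in range(n)]
            for r in range(n)
        ]
        dets.append(det(Pm_minus_I))
    total = sum(dets)
    return [d / total for d in dets]


# Hypothetical strictly positive stochastic matrix (rows sum to 1).
F = Fraction
P_example = [
    [F(1, 2), F(1, 4), F(1, 4)],
    [F(1, 3), F(1, 3), F(1, 3)],
    [F(1, 6), F(1, 3), F(1, 2)],
]
q = stationary_by_determinants(P_example)
# Left-multiplication q P, which should reproduce q.
qP = [sum(q[i] * P_example[i][j] for i in range(3)) for j in range(3)]
```

For state spaces of realistic size this route is far more expensive than repeated vector-matrix multiplication; its value in the text is analytical rather than computational.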

6.3 Positivity of the Stationary Distribution Components

Strict positivity of the stationary distribution components can be deduced from Theorem B4 and the form of $P_m$. Every element of $P_m$ in every row other than row $m$ is identical to the corresponding element of $P$, while those in row $m$ are zero. This is expressed in the nonnegative matrix notation of Appendix B as $0 \le P_m \le P$. Since $P_m$ and $P$ differ in row $m$, $P_m \ne P$, and consequently by Theorem B4(e), every eigenvalue $\lambda_i$ of $P_m$ satisfies $|\lambda_i| < 1$. It follows that for $\lambda \ge 1$, (1) $|P_m - \lambda I| = \prod_i (\lambda_i - \lambda) \ne 0$ and (2) the algebraic sign of $|P_m - \lambda I|$ is $(-1)^{N'}$ for all $m \in S'$. Specializing these arguments to the case $\lambda = 1$ yields the following proposition.

Proposition 6.9: For all $\alpha > 0$, the value of the determinant $|P_m - I|$ satisfies

$\forall m \in S'$ : (1) $|P_m - I| \ne 0$ and

(2) the algebraic sign of $|P_m - I|$ is $(-1)^{N'}$.

An immediate consequence of Proposition 6.9 is that both numerator and denominator of the expression for $q_\alpha(m)$ in Proposition 6.7 are nonzero and have identical algebraic sign. Strict positivity of the stationary distribution components follows from these observations. That is, $\forall m \in S' : q_\alpha(m) > 0$.
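As a small worked check of Propositions 6.7 and 6.9, consider a hypothetical two-state chain (not one of the genetic algorithm chains) with transition probabilities $a, b \in (0, 1)$, so that $N' = 2$. Both determinants below are positive, in agreement with the sign $(-1)^{2}$, and the ratio reproduces the familiar two-state stationary distribution.

```latex
P = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix}, \qquad
|P_1 - I| = \begin{vmatrix} -1 & 0 \\ b & -b \end{vmatrix} = b, \qquad
|P_2 - I| = \begin{vmatrix} -a & a \\ 0 & -1 \end{vmatrix} = a,

q(1) = \frac{|P_1 - I|}{|P_1 - I| + |P_2 - I|} = \frac{b}{a+b}, \qquad
q(2) = \frac{|P_2 - I|}{|P_1 - I| + |P_2 - I|} = \frac{a}{a+b}.
```

One can confirm directly that $q(1)(1-a) + q(2)\,b = q(1)$, so the vector is indeed stationary.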

6.4 The Indeterminate Form at $\alpha = 0$

All of the results established in this section assume that the mutation probability parameter is strictly positive ($\alpha > 0$), and thus are not applicable at $\alpha = 0$. The reason is apparent when Eq. 4.7 and the two-operator result in Eq. 4.17 (or the three-operator counterparts of Eq. 4.17 given by Eq. 4.23 and 4.27) are applied to $|P_m - I|$. It follows that the row of the $\alpha \to 0^+$ limit of $|P_m - I|$ corresponding to the one-operator absorbing state $n_A \in S_A'$, $n_A \ne m$ is zero. That is, the only nonzero entry in row $n_A$ of the $\alpha \to 0^+$ limit of $P_m$ is the principal diagonal element

$$\lim_{\alpha \to 0^+} P_2(n_A \mid n_A) = \lim_{\alpha \to 0^+} P_3(n_A \mid n_A) = P_1(n_A \mid n_A) = 1,$$

which is cancelled by the corresponding principal diagonal element in $-I$. Thus,

$$\forall m \in S' : \lim_{\alpha \to 0^+} |P_m - I| = 0,$$

and consequently Propositions 6.7 and 6.8 yield indeterminate forms. However, as demonstrated in the following section, it is possible to verify that a limiting stationary distribution vector exists for the time-homogeneous two and three-operator algorithms.

SECTION 7

THE ZERO MUTATION PROBABILITY STATIONARY DISTRIBUTION LIMIT

7.1 Overview

In Section 4.3.1, it is established that the time-homogeneous one-operator genetic algorithm Markov chain possesses a stationary distribution but that it is not unique. In Sections 4.3.2 and 4.3.3, it is established that the time-homogeneous two and three-operator counterparts possess unique stationary distributions provided $\alpha > 0$, and Section 6 formulates the existence and uniqueness argument into a rational function expression for the unique solution. Since the two-operator state transition matrix approaches its one-operator counterpart as $\alpha \to 0^+$ (Eq. 4.17) and since the three-operator algorithm exhibits the corresponding behavior with respect to the $P_2(i \mid n)$'s (Eq. 4.23), a question which naturally arises from these observations is whether an $\alpha \to 0^+$ limiting distribution exists for the two and three-operator algorithms. (If such a limit exists, then it is necessarily unique.) This section answers that question affirmatively and also confirms the observation made in Section 5.5 that the limiting distribution is nonzero for all states corresponding to uniform populations (absorbing states).

The approach taken here is to transform the expressions for $q_\alpha(m)$ in Propositions 6.7 and 6.8 into equivalent expressions which yield determinate forms at $\alpha = 0$. The result requires transforming $P$ and $P_m$ into related matrices but with the states corresponding to uniform populations (one-operator absorbing states) coalesced into adjacent nonuniform population states. The development is tedious and involves some additional notation.

7.2 Functional Form of the Stationary Distribution

Before proceeding with the limiting case development which is the primary purpose of this section, it is convenient to establish some intermediate results concerning the behavior of $q_\alpha$ as a function of $\alpha$. These results follow from the results developed in Section 6 and some simple observations about the form of the elements of $P$.

From Eq. 4.14-4.16 and Eq. 4.22-26, all elements of the state transition matrix are rational functions of $\alpha$ with denominator polynomial $(1 + \alpha)^{ML}$. Thus, for $\alpha > 0$

$$(1 + \alpha)^{MLN'} |P_m - I| = |(1 + \alpha)^{ML} P_m - (1 + \alpha)^{ML} I| = |Q_m - (1 + \alpha)^{ML} I|$$

where every element of $Q_m$ (and hence the value of $|Q_m - (1 + \alpha)^{ML} I|$) is a polynomial in $\alpha$. Further, since row $m$ in $Q_m$ is zero, the polynomial value of the determinant includes the factor $(1 + \alpha)^{ML}$. Consequently

$$(1 + \alpha)^{ML(N'-1)} |P_m - I| = \Theta_m(\alpha) \qquad \text{Eq. 7.1}$$

for $\Theta_m(\alpha)$ some polynomial function of $\alpha$. Proposition 7.1 below follows.

Proposition 7.1: For all $\alpha > 0$, the value of the determinant $|P_m - I|$ is a rational function of $\alpha$ with nonzero denominator polynomial $(1 + \alpha)^{ML(N'-1)}$.

By applying Eq. 7.1 to Proposition 6.7, the components of $q_\alpha$ can be written as

$$q_\alpha(m) = \frac{\Theta_m(\alpha)}{\sum_{n \in S'} \Theta_n(\alpha)} = \frac{\Theta_m(\alpha)}{\Theta(\alpha)} \qquad \text{Eq. 7.2}$$

Hence, the $q_\alpha(m)$ are rational functions of $\alpha$, and since a rational function is continuous everywhere its denominator polynomial is nonzero, application of Proposition 6.9 and Eq. 7.1 (which together establish that $\Theta(\alpha) = \sum_{n \in S'} \Theta_n(\alpha) \ne 0$) to Eq. 7.2 yields the following.

Proposition 7.2: For all $\alpha > 0$, the components of $q_\alpha$ are continuous rational functions of the independent variable $\alpha$.

Further, differentiation of Eq. 7.2 with respect to $\alpha$ yields a rational function of $\alpha$

$$\frac{dq_\alpha(m)}{d\alpha} = \frac{1}{\Theta(\alpha)^2} \left[ \Theta(\alpha) \frac{d\Theta_m(\alpha)}{d\alpha} - \Theta_m(\alpha) \frac{d\Theta(\alpha)}{d\alpha} \right] \qquad \text{Eq. 7.3}$$

with nonzero denominator polynomial $\Theta(\alpha)^2$. The following proposition is a consequence.

Proposition 7.3: For all $\alpha > 0$, the components of the first derivative of $q_\alpha$ with respect to $\alpha$ are continuous rational functions of $\alpha$.

7.3 The Absorbing State Rows of $|P - I|$ and $|P_m - I|$

The rows corresponding to one-operator absorbing states in the determinant $|P - I|$ have a particularly simple form. The nondiagonal elements of row $n_A \in S_A'$, which represents a uniform population of solutions $i_A \in S$, are given by Eq. 4.19 and 4.29 respectively for the two and three-operator cases. The principal diagonal element is obtained by evaluating Eq. 4.19 or 4.29 at $m = n_A$ and subtracting 1. Thus,

$$P(n_A \mid n_A) - 1 = \binom{M}{M} \frac{\alpha^{M H(i_A, i_A)}}{(1 + \alpha)^{ML}} - 1 = \frac{1}{(1 + \alpha)^{ML}} - 1 = \frac{-ML\alpha - ML(ML - 1)\alpha^2/2 - \cdots - \alpha^{ML}}{(1 + \alpha)^{ML}} = \frac{-ML\alpha + O(\alpha^2)}{(1 + \alpha)^{ML}}$$

and if the general element in $|P - I|$ is denoted by $T(m \mid n)$, then the elements of row $n_A$ can be written as

$$\forall n_A \in S_A' : \quad T(n \mid n_A) = \begin{cases} \dfrac{-ML\alpha + O(\alpha^2)}{(1 + \alpha)^{ML}} & n = n_A \\[2ex] \dfrac{O(\alpha)}{(1 + \alpha)^{ML}} & n \in S' - \{n_A\} \end{cases} \qquad \text{Eq. 7.4}$$

Additional insight into the form of the absorbing state rows can be obtained with the aid of the following notation. Let $m_A, n_A \in S_A'$ be distinct but otherwise arbitrary absorbing states of the one-operator Markov chain, let $i_A \in S$ be the bit-string represented in $n_A$ and let $S(i_A) \subset S$ be the set of bit-strings accessible from $i_A$ via exactly one bit mutation event (i.e. $S(i_A) = \{ i_1 : i_1 \in S, H(i_1, i_A) = 1 \}$). It follows from this definition that $\mathrm{card}(S(i_A)) = L$. Then, for $M > 1$ let $S(n_A)'$ be defined as

$$S(n_A)' = \{ n : n \in S', \ n(i_A) = M - 1, \ n(i_1) = 1, \ i_1 \in S(i_A) \} \subset S',$$

the set of nonabsorbing states adjacent to the absorbing state $n_A$. The restriction on $M$ is required to ensure that no absorbing state $m_A$ is contained in the adjacency set of any absorbing state $n_A$. $S(n_A)'$ includes exactly one distinct element for each $i_1 \in S(i_A)$, and consequently $\mathrm{card}(S(n_A)') = \mathrm{card}(S(i_A)) = L$. Also, from the form of $S(n_A)'$, it follows that for $M \ge 3$, $S(m_A)'$ and $S(n_A)'$ are disjoint if $m_A$ and $n_A$ are distinct one-operator absorbing states. Thus, if $S_A''$ is defined as

$$S_A'' = \bigcup_{n_A \in S_A'} S(n_A)'$$

and $M \ge 3$, then $\mathrm{card}(S_A'') = \mathrm{card}(S_A') \times L = NL$. This restriction on $M$ is assumed in all of the following.
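The adjacency set construction can be made concrete with a short sketch. It assumes that bit strings are encoded as integers 0 to 2^L - 1 and that a population state is an occupancy vector of length 2^L; the function name is hypothetical.

```python
def adjacent_states(i_A, M, L):
    """States holding M-1 copies of i_A and one copy of a string that
    differs from i_A in exactly one bit (one state per bit position)."""
    states = []
    for bit in range(L):
        i_1 = i_A ^ (1 << bit)        # neighbor at Hamming distance 1
        n = [0] * (2 ** L)
        n[i_A] = M - 1
        n[i_1] = 1
        states.append(tuple(n))
    return states


# Adjacency sets of two distinct absorbing states for a four-bit problem.
adj_0_M3 = set(adjacent_states(0b0000, M=3, L=4))
adj_1_M3 = set(adjacent_states(0b0001, M=3, L=4))
adj_0_M2 = set(adjacent_states(0b0000, M=2, L=4))
adj_1_M2 = set(adjacent_states(0b0001, M=2, L=4))
```

With M = 3 the two adjacency sets share no state, whereas with M = 2 the population holding one copy each of 0000 and 0001 belongs to both, which illustrates why the disjointness claim needs the restriction on M.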

With the aid of the new notation, the element in column $n \in S(n_A)'$ of row $n_A$ in $|P - I|$ can be written as

$$\forall n \in S(n_A)' : \quad T(n \mid n_A) = P(n \mid n_A) = \binom{M}{M-1} \frac{\alpha}{(1 + \alpha)^{ML}} = \frac{M\alpha}{(1 + \alpha)^{ML}}$$

Thus, Eq. 7.4 can be revised as follows

$$\forall n_A \in S_A' : \quad T(n \mid n_A) = \begin{cases} \dfrac{-ML\alpha + O(\alpha^2)}{(1 + \alpha)^{ML}} & n = n_A \\[2ex] \dfrac{M\alpha}{(1 + \alpha)^{ML}} & n \in S(n_A)' \\[2ex] \dfrac{O(\alpha^s)}{(1 + \alpha)^{ML}} & n \in S' - S(n_A)' - \{n_A\} \end{cases} \qquad \text{Eq. 7.5}$$

where the exponent $s$ of $\alpha$ in the order expression for the general term is an integer satisfying $s \ge 2$. The elements in columns $n_A$ and $n \in S(n_A)'$ are first order in $\alpha$ while the elements in all other columns are at least second order.

Eq. 7.5 applies to every absorbing state row of $|P_m - I|$ as well if $m \notin S_A'$. If $|P_{m_A} - I|$ is being considered where $m_A \in S_A'$, then row $m_A$ contains $-1$ at its principal diagonal and zeros elsewhere. In that case Eq. 7.5 only applies to the absorbing state rows $n_A \in S_A' - \{m_A\}$. Exactly $N - 1$ such rows exist in $|P_{m_A} - I|$.

By applying Eq. 7.5 and these observations to Proposition 7.1, it follows that the lowest order term with nonzero coefficient which can conceivably exist in the numerator polynomial of $|P_{m_A} - I|$ is the order $\alpha^{N-1}$ term. Similar reasoning reveals that the corresponding lowest order term with nonzero coefficient for $|P_m - I|$ with $m \notin S_A'$ is the order $\alpha^N$ term. If the coefficient of the order $\alpha^{N-1}$ term in the numerator polynomial of $|P_{m_A} - I|$ is indeed nonzero, and if the corresponding coefficients for all such $m_A$ have the same algebraic sign, then the required limiting value of $q_\alpha$ can be expressed in terms of these nonzero coefficients via substitution into Proposition 6.7. These conditions are in fact satisfied as demonstrated below.

7.4 Reformulation of Propositions 6.7 and 6.8

The next step in this development is the definition of some auxiliary matrices related to $P$ and $P_{m_A}$ and the reformulation of Propositions 6.7 and 6.8 in terms of them. The new matrices, designated $P(m_A)'$ and $P_{m_A}'$ respectively, are derived by coalescing each of the $N - 1$ absorbing state columns $n_A \in S_A' - \{m_A\}$ of $|P - I|$ and $|P_{m_A} - I|$ with its neighboring nonabsorbing state columns, $n \in S(n_A)'$. Specifically, let $Q_{m_A}$ be derived from $P_{m_A} - I$ by adding $1/L$ times the column $n_A \in S_A' - \{m_A\}$ to each of the $L$ adjacent nonabsorbing columns $n \in S(n_A)'$ and repeating the process for each remaining $n_A \in S_A' - \{m_A\}$. This operation is applied once each for the exactly $N - 1$ absorbing state columns $n_A \in S_A' - \{m_A\}$ and it preserves the value of the determinant $|Q_{m_A}| = |P_{m_A} - I|$.
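The determinant-preserving property of the coalescing step is the standard fact that adding a multiple of one column to another column leaves a determinant unchanged. The sketch below illustrates this on a hypothetical 4 by 4 integer matrix, with column 0 standing in for an absorbing state column and columns 1 and 2 for its adjacent columns (so L = 2 here); all names and values are assumptions chosen for illustration.

```python
from fractions import Fraction


def det(A):
    """Determinant by exact Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in A]
    n = len(A)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            result = -result
        result *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= factor * A[c][k]
    return result


def coalesce(A, absorbing_col, adjacent_cols):
    """Add 1/L times the absorbing column to each of its L adjacent columns."""
    L = len(adjacent_cols)
    B = [list(row) for row in A]
    for row in B:
        for c in adjacent_cols:
            row[c] += Fraction(1, L) * row[absorbing_col]
    return B


A_example = [
    [2, 1, 0, 3],
    [1, 4, 1, 0],
    [0, 2, 5, 1],
    [3, 0, 1, 6],
]
before = det(A_example)
after = det(coalesce(A_example, absorbing_col=0, adjacent_cols=[1, 2]))
```

The two determinants agree exactly, while the entries in the adjacent columns change; in the text this change is what cancels the first order terms in the absorbing state rows.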
If now $Q_{m_A}(m \mid n)$ denotes the general element of $Q_{m_A}$ then by applying the recipe used in its construction and Eq. 7.5, the elements in the absorbing state rows $n_A \in S_A' - \{m_A\}$ of $Q_{m_A}$ can be written as

$$\forall m_A \in S_A', \ \forall n_A \in S_A' - \{m_A\} : \quad Q_{m_A}(m \mid n_A) = \begin{cases} \dfrac{-ML\alpha + O(\alpha^2)}{(1 + \alpha)^{ML}} & m = n_A \\[2ex] \dfrac{O(\alpha^2)}{(1 + \alpha)^{ML}} & m \in S(n_A)' \\[2ex] \dfrac{O(\alpha^s)}{(1 + \alpha)^{ML}} & m \in S' - S(n_A)' - \{n_A\} \end{cases}$$

where as before $s$ is an integer satisfying $s \ge 2$. Thus, each of the $N - 1$ absorbing state rows $n_A \in S_A' - \{m_A\}$ of $|Q_{m_A}|$ can be written as a sum of two rows, one row containing $-ML\alpha/(1 + \alpha)^{ML}$ at its principal diagonal location and zeros elsewhere and the second row being a multiple of $\alpha^2/(1 + \alpha)^{ML}$. It follows from elementary determinant row expansion operations that $|P_{m_A} - I| = |Q_{m_A}|$ can be written as

$$|P_{m_A} - I| = |Q_{m_A}| = \frac{(-ML\alpha)^{N-1} \, |Q_{m_A}'|}{(1 + \alpha)^{ML(N-1)}} + \frac{O(\alpha^s)}{(1 + \alpha)^{MLN'}} \qquad \text{Eq. 7.6}$$

where $|Q_{m_A}'|$ is the order $N' - N + 1$ principal minor of $|Q_{m_A}|$ generated by deleting the $N - 1$ row/column pairs which intersect on the $n_A \in S_A' - \{m_A\}$ principal diagonals and where the exponent $s$ of $\alpha$ in the Eq. 7.6 order expression is an integer satisfying $s \ge N$.
The elements in all rows of $|Q_{m_A}'|$ except row $m_A$ are composed of contributions from the elements in nonabsorbing state rows of $P$ and the $-1$ principal diagonal term contributed by $I$ in $|P_{m_A} - I|$. Row $m_A$ of $|Q_{m_A}'|$ contains $-1$ at its principal diagonal location and zeros elsewhere. Thus, if $|Q_{m_A}'|$ is written as $|Q_{m_A}'| = |P_{m_A}' - I'|$ then from the recipe employed in its construction, it follows that the square matrix $P_{m_A}'$ thus defined has dimension $N' - N + 1$ and that its elements are given by

$$\forall m, n \in S' - S_A' + \{m_A\} : \quad P_{m_A}(m \mid n)' = \begin{cases} 0 & n = m_A \\[1ex] P(m \mid n) & n \ne m_A, \ m \in S' - S_A' - S_A'' + S(m_A)' + \{m_A\} \\[1ex] P(m \mid n) + \dfrac{1}{L} P(n_A \mid n) & n \ne m_A, \ m \in S(n_A)', \ n_A \in S_A' - \{m_A\} \end{cases} \qquad \text{Eq. 7.7}$$

Careful examination of Eq. 7.7 reveals that the transformation by which $P_{m_A}'$ is generated from $P_{m_A}$ preserves all row sums. Thus, $P_{m_A}'$ is very similar in form to $P_{m_A}$. It is derived from a (fictitious) row stochastic matrix by setting a specified row (row $m_A$) to zero.

If the preceding steps are repeated for $|P_m - I|$ where $m \notin S_A'$, except that all $N$ absorbing state columns $n_A \in S_A'$ are coalesced rather than just the $N - 1$ columns $n_A \in S_A' - \{m_A\}$, a result very similar in form to Eq. 7.6 obtains. That is,

$$|P_m - I| = |Q_m| = \frac{(-ML\alpha)^{N} \, |Q_m'|}{(1 + \alpha)^{MLN}} + \frac{O(\alpha^s)}{(1 + \alpha)^{MLN'}} \qquad \text{Eq. 7.8}$$

where $|Q_m'|$ is the order $N' - N$ principal minor of $|Q_m|$ generated by deleting the $N$ absorbing state row/column pairs and $s$ is an integer satisfying $s \ge N + 1$. The nonabsorbing state row $m$ contains $-1$ at its principal diagonal location and zeros elsewhere.

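Substitution of Eq. 7.6 and 7.8 into Proposition 6.7 yields a form more amenable to examination of the $\alpha \to 0^+$ limiting stationary distribution. The two cases $m \in S_A'$ and $m \in S' - S_A'$ must be distinguished. Then, after some straightforward algebra,

$$q_\alpha(m) = \begin{cases} \dfrac{|Q_{m_A}'| + O(\alpha)/(1 + \alpha)^{N'-N+1}}{\sum_{n_A \in S_A'} |Q_{n_A}'| + O(\alpha)/(1 + \alpha)^{N'-N+1}} & m = m_A \in S_A' \\[3ex] \dfrac{O(\alpha)/(1 + \alpha)^{N'-N+1}}{\sum_{n_A \in S_A'} |Q_{n_A}'| + O(\alpha)/(1 + \alpha)^{N'-N+1}} & m \in S' - S_A' \end{cases}$$

An equivalent result expressed in terms of the auxiliary matrices $P_{m_A}'$ is

$$q_\alpha(m) = \begin{cases} \dfrac{|P_{m_A}' - I'| + O(\alpha)/(1 + \alpha)^{N'-N+1}}{\sum_{n_A \in S_A'} |P_{n_A}' - I'| + O(\alpha)/(1 + \alpha)^{N'-N+1}} & m = m_A \in S_A' \\[3ex] \dfrac{O(\alpha)/(1 + \alpha)^{N'-N+1}}{\sum_{n_A \in S_A'} |P_{n_A}' - I'| + O(\alpha)/(1 + \alpha)^{N'-N+1}} & m \in S' - S_A' \end{cases} \qquad \text{Eq. 7.9}$$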

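By retracing the preceding steps by which $P_{m_A}$ was transformed into $P_{m_A}'$, companion results to Eq. 7.7 and Eq. 7.9 can be developed for $P(m_A)'$. The companion to Eq. 7.7 differs only in the elements of row $n = m_A$. Thus, if $P(m \mid n)'$ denotes the general element in $P(m_A)'$, then

$$\forall m, n \in S' - S_A' + \{m_A\} : \quad P(m \mid n)' = \begin{cases} P(m \mid n) & m \in S' - S_A' - S_A'' + S(m_A)' + \{m_A\} \\[1ex] P(m \mid n) + \dfrac{1}{L} P(n_A \mid n) & m \in S(n_A)', \ n_A \in S_A' - \{m_A\} \end{cases} \qquad \text{Eq. 7.10}$$

Further, examination of Eq. 7.10 reveals that the row sum constraint on $P$ is preserved in the transformation by which $P(m_A)'$ is generated (i.e. $P(m_A)'$ is a stochastic matrix). Thus, $|P(m_A)' - I'| = 0$. A consequence is the Proposition 6.8 counterpart of Eq. 7.9,

$$q_\alpha(m) = \begin{cases} \dfrac{|P(m_A)' - I'| - |P_{m_A}' - I'| + O(\alpha)/(1 + \alpha)^{N'-N+1}}{\sum_{n_A \in S_A'} \left\{ |P(n_A)' - I'| - |P_{n_A}' - I'| \right\} + O(\alpha)/(1 + \alpha)^{N'-N+1}} & m = m_A \in S_A' \\[3ex] \dfrac{O(\alpha)/(1 + \alpha)^{N'-N+1}}{\sum_{n_A \in S_A'} \left\{ |P(n_A)' - I'| - |P_{n_A}' - I'| \right\} + O(\alpha)/(1 + \alpha)^{N'-N+1}} & m \in S' - S_A' \end{cases} \qquad \text{Eq. 7.11}$$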

7.5 The Stationary Distribution Limit

The zero mutation probability limits of Eq. 7.9 and Eq. 7.11 exist if the determinant sums in the denominators are nonzero. In fact they are nonzero, as demonstrated in the following. This argument is very similar in form to the development in Section 6.3 concerning positivity of the stationary distribution. The essential step is demonstration of the existence of a primitive stochastic matrix $Q'$ which satisfies both

$$0 \le \lim_{\alpha \to 0^+} P_{m_A}' \le Q' \quad \text{and} \quad Q' \ne \lim_{\alpha \to 0^+} P_{m_A}'.$$

If the two-operator algorithm is under consideration, then the elements of the $\alpha \to 0^+$ limit of $P_{m_A}'$ are obtained by substituting the one-operator results in Eq. 4.2-5 into Eq. 7.7. If the three-operator case is under consideration, then Eq. 4.22 and Eq. 4.24,25 are employed. In the following, the two-operator notation is employed.

Let $Q'$ be generated from the $\alpha \to 0^+$ limit of $P_{m_A}'$ by replacing row $m_A$ with the row whose elements are given by

$$\forall m \in S' - S_A' + \{m_A\} : \quad Q'(m \mid m_A) = \frac{1}{N' - N + 1} > 0. \qquad \text{Eq. 7.12}$$

Thus, the row sum of row $m_A$ in $Q'$ is 1. Since all remaining rows of $Q'$ are identical to those of the $\alpha \to 0^+$ limit of $P_{m_A}'$, and consequently have row sum 1 by Eq. 7.7, $Q'$ is a stochastic matrix. Additionally, it satisfies both

$$0 \le \lim_{\alpha \to 0^+} P_{m_A}' \le Q' \quad \text{and} \quad Q' \ne \lim_{\alpha \to 0^+} P_{m_A}'.$$

$Q'$ can be regarded as the state transition matrix of a fictitious Markov chain defined on the state space $S' - S_A' + \{m_A\}$. Since $Q'(m_A \mid m_A) > 0$, the fictitious Markov chain is both aperiodic (Definition A9, Theorem A2) and primitive (Definition B1, Theorem B1) provided that it is irreducible (Definitions A7 and A8, Theorem A1). Thus, primitivity is established by demonstrating that every state $m \in S' - S_A' + \{m_A\}$ is accessible in some finite number of transitions from every state $n \in S' - S_A' + \{m_A\}$. Since all states in $S' - S_A' + \{m_A\}$ are accessible in one transition from $m_A$ (Eq. 7.12), it is sufficient to demonstrate that $m_A$ is accessible in some finite number of transitions from every state $n \in S' - S_A'$.

Let $i_A \in S$ be the bit-string represented in $m_A$, let $n \in S' - S_A'$ and let $i_1 \in S$ be selected such that $n(i_1) > 0$ and $H(i_1, i_A) \le H(i, i_A)$ for all $i$ represented in $n$. Then, two cases must be examined. In case (1), $i_1 = i_A$ while $i_1 \ne i_A$ for case (2).

If $i_1 = i_A$, it follows from Eq. 7.7 and the construction of $Q'$ that

$$Q'(m_A \mid n) = \lim_{\alpha \to 0^+} P_{m_A}(m_A \mid n)' = \lim_{\alpha \to 0^+} P_2(m_A \mid n) = \left[ \lim_{\alpha \to 0^+} P_2(i_A \mid n) \right]^M = P_1(i_A \mid n)^M = P_1(i_1 \mid n)^M > 0$$

and consequently $m_A$ is accessible from $n$ in 1 transition. Otherwise

$$\exists \, i_2 \in S(i_1) \ni H(i_2, i_A) = H(i_1, i_A) - 1$$

and further if $n_1 \in S_A'$ is the one-operator absorbing state defined by the condition $n_1(i_1) = M$ while $n_{12} \in S(n_1)'$ is the adjacent nonabsorbing state defined by $n_{12}(i_1) = M - 1$, $n_{12}(i_2) = 1$, then from Eq. 7.7 and the construction of $Q'$

$$Q'(n_{12} \mid n) = \lim_{\alpha \to 0^+} P_2(n_{12} \mid n) + \frac{1}{L} \lim_{\alpha \to 0^+} P_2(n_1 \mid n) = P_1(n_{12} \mid n) + \frac{1}{L} P_1(n_1 \mid n) \ge \frac{1}{L} P_1(n_1 \mid n) = \frac{1}{L} \left[ P_1(i_1 \mid n) \right]^M > 0.$$

Thus, $n_{12}$ is accessible from $n$ in one transition. If $i_2 = i_A$, then by the case (1) argument $m_A$ is accessible in one additional transition. Otherwise, the case (2) argument is repeated for some

$$i_3 \in S(i_2) \ni H(i_3, i_A) = H(i_2, i_A) - 1 = H(i_1, i_A) - 2.$$

This procedure necessarily terminates with $H(i_1, i_A) + 1$ applications and the corresponding state space trajectory is executed with nonzero probability.
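The sequence of bit strings used in this accessibility argument can be sketched as follows. The argument only requires that some one-bit neighbor closer to the target exists at each step; the rule used here (flip the lowest-order differing bit) is an arbitrary choice made for illustration, and the function names and integer encoding of bit strings are assumptions.

```python
def hamming(i, j):
    """Hamming distance between two bit strings encoded as integers."""
    return bin(i ^ j).count("1")


def descent_path(i_1, i_A):
    """Bit strings i_1, i_2, ..., ending at i_A, each one bit closer to i_A."""
    path = [i_1]
    current = i_1
    while current != i_A:
        diff = current ^ i_A
        lowest = diff & -diff         # lowest-order bit where they differ
        current ^= lowest             # flip it: distance drops by exactly 1
        path.append(current)
    return path


# Example with four-bit strings at Hamming distance 4.
path = descent_path(0b1011, 0b0100)
distances = [hamming(i, 0b0100) for i in path]
```

The path contains one bit string for each Hamming distance from $H(i_1, i_A)$ down to zero, so its length is always finite and bounded by L + 1.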

From the foregoing argument, it follows that state $m_A$ is accessible in some finite number of transitions from every state $n \in S' - S_A'$, and thus that $Q'$ is primitive. Then, since both

$$0 \le \lim_{\alpha \to 0^+} P_{m_A}' \le Q' \quad \text{and} \quad Q' \ne \lim_{\alpha \to 0^+} P_{m_A}',$$

it follows from Theorem B4(e) that every eigenvalue of the $\alpha \to 0^+$ limit of $P_{m_A}'$ satisfies $|\lambda_i| < 1$. Proposition 7.4 below, which is the $\alpha \to 0^+$ counterpart of Proposition 6.9, is a consequence.

Proposition 7.4: The value of the determinant $\lim_{\alpha \to 0^+} |P_{m_A}' - I'|$ satisfies

$\forall m_A \in S_A'$ : (1) $\lim_{\alpha \to 0^+} |P_{m_A}' - I'| \ne 0$ and

(2) the algebraic sign of $\lim_{\alpha \to 0^+} |P_{m_A}' - I'|$ is $(-1)^{N'-N+1}$.

The conditions asserted in Proposition 7.4 ensure that substitution into Eq. 7.9 and 7.11 yields a determinate form in the $\alpha \to 0^+$ limit. Propositions 7.5 and 7.6 below represent the limiting forms, and consequently are respectively the limiting counterparts of Propositions 6.7 and 6.8.

Proposition 7.5: The components of $\lim_{\alpha \to 0^+} q_\alpha$ exist and can be expressed in the form

$$\lim_{\alpha \to 0^+} q_\alpha(m) = q_0(m) = \begin{cases} \dfrac{\lim_{\alpha \to 0^+} |P_{m_A}' - I'|}{\sum_{n_A \in S_A'} \lim_{\alpha \to 0^+} |P_{n_A}' - I'|} & m = m_A \in S_A' \\[3ex] 0 & m \in S' - S_A' \end{cases}$$

Proposition 7.6: The components of $\lim_{\alpha \to 0^+} q_\alpha$ exist and can be expressed alternatively as

$$\lim_{\alpha \to 0^+} q_\alpha(m) = q_0(m) = \begin{cases} \dfrac{\lim_{\alpha \to 0^+} \left\{ |P(m_A)' - I'| - |P_{m_A}' - I'| \right\}}{\sum_{n_A \in S_A'} \lim_{\alpha \to 0^+} \left\{ |P(n_A)' - I'| - |P_{n_A}' - I'| \right\}} & m = m_A \in S_A' \\[3ex] 0 & m \in S' - S_A' \end{cases}$$

An immediate consequence of Propositions 7.4 and 7.5 is strict positivity of the zero mutation probability limiting stationary distribution components for all absorbing state rows. That is, $\forall m_A \in S_A' : q_0(m_A) > 0$. The argument is analogous to that at the conclusion of Section 6.3 concerning strict positivity of all stationary distribution components when $\alpha > 0$. This result is anticipated by the simulation results in Section 5.5. A consequence is that the required limiting behavior for direct application of the simulated annealing convergence theory to the genetic algorithm model does not follow. However, the results displayed in Section 5.5 and developments produced in Section 9.3 suggest that the limiting distribution can be made arbitrarily close to the desired limiting behavior.

Since the $\alpha \to 0^+$ limit of the stationary distribution exists, the definition of $q_\alpha$ can be extended to include the point $\alpha = 0$. That is

$$q_\alpha \big|_{\alpha = 0} = q_0 = \lim_{\alpha \to 0^+} q_\alpha$$

where the values of the required limits are provided by Proposition 7.5. Proposition 7.7 below follows from this extended definition of $q_\alpha$ and Proposition 7.2.

Proposition 7.7: For all $\alpha \ge 0$, the components of $q_\alpha$ are continuous rational functions of the independent variable $\alpha$.

Proposition 7.3, which concerns the first derivative of $q_\alpha$, can also be extended to include the limiting case. The extension requires easily obtainable counterparts of Eq. 7.1-3 developed for $|P' - I'|$ and Eq. 7.9. The Eq. 7.1 counterpart is

\[
(1+\alpha)^{LM(N'-1)} \, |P_{m_A}' - I'| = \Theta_{m_A}(\alpha)', \qquad \text{Eq. 7.13}
\]

and that for Eq. 7.2 is

\[
q_\alpha(m) =
\begin{cases}
\dfrac{\Theta_{m_A}(\alpha)' + O(\alpha)}{\Theta(\alpha)' + O(\alpha)} & m = m_A \in S_A' \\[2ex]
\dfrac{O(\alpha)}{\Theta(\alpha)' + O(\alpha)} & m \in S' - S_A'
\end{cases}
\qquad \text{Eq. 7.14}
\]

where $\Theta(\alpha)'$ is the polynomial counterpart (summed over $n_A \in S_A'$) of $\Theta(\alpha)$ in Eq. 7.2.

Differentiating Eq. 7.14 with respect to $\alpha$ yields a rational function with denominator polynomial $[\Theta(\alpha)' + O(\alpha)]^2$, whose $\alpha \to 0$ limit is nonzero by Proposition 7.4, Eq. 7.13 and the definition of $\Theta(\alpha)'$. Proposition 7.8 below follows from Proposition 7.3 and these observations.

Proposition 7.8: The components of the first derivative of $q_\alpha$ with respect to $\alpha$ possess limits as $\alpha \to 0^+$.

Thus, a zero mutation probability limit exists for the time-homogeneous two and three-operator algorithm variants. The limit is represented by Propositions 7.5 and 7.6. Further, Propositions 7.7 and 7.8 establish some useful ancillary results concerning the stationary distribution behavior at the point $\alpha = 0$. These latter results are employed in the following section in establishing strong ergodicity of the inhomogeneous genetic algorithm Markov chain. Propositions 7.5 and 7.6 are used in Section 9 to develop a methodology for representing the stationary distribution limit.

SECTION 8

A MONOTONIC MUTATION PROBABILITY ERGODICITY BOUND

8.1 Overview

The annealing schedule bounds for the simulated annealing algorithm, which are reviewed in Section 2.4.2, are derived by requiring that the nonstationary Markov chain which represents the algorithm be strongly ergodic (Definition A13) and then deducing a monotonic lower bound on the algorithm control parameter. The methodology consists of demonstrating that the time-homogeneous Markov chain corresponding to every positive algorithm control parameter value possesses a stationary distribution, that the sequence of stationary distributions corresponding to any sequence of positive control parameter values converges to a limiting distribution if the control parameter sequence converges to zero, and then employing Definitions A11-A13 and Theorems A5-A7 to deduce a sufficient condition (the annealing schedule lower bound) to guarantee that the nonstationary algorithm achieves the limiting distribution (i.e. strong ergodicity).

The model development in Section 4 demonstrates that for all mutation probability values in the range $0 < p_m < 1$, the Markov chain representing either the two or three-operator time-homogeneous simple genetic algorithm possesses a stationary distribution. Section 7 demonstrates that the stationary distribution approaches a limit as the mutation probability parameter approaches zero. This section proposes and then verifies a monotone decreasing lower bound on the mutation probability sequence of the nonstationary genetic algorithm Markov chain which is sufficient to ensure strong ergodicity.

8.2 A Weak Ergodicity Bound

The following paragraphs propose and then verify a mutation probability parameter bound sufficient to ensure that the Markov chain of the corresponding nonstationary simple genetic algorithm is weakly ergodic (Definition A11). The bound applies to both the two and three-operator algorithms, and it appears in Proposition 8.1 below.

Proposition 8.1: The mutation probability bound given by

\[
p_m(k) \ge \frac{1}{2} k^{-1/(ML)}
\]

is sufficient to ensure weak ergodicity of the corresponding nonstationary (two or three-operator) simple genetic algorithm Markov chain.
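
The bound is easy to tabulate. The following is a minimal sketch, assuming that $M$ denotes the population size and $L$ the string length (they are not defined in this section); the function name `mutation_bound` and the sample values of `M` and `L` are hypothetical.

```python
def mutation_bound(k, pop_size, string_len):
    """Lower bound (1/2) * k**(-1/(M*L)) on the mutation probability at iteration k."""
    if k < 1:
        raise ValueError("iteration index k must be >= 1")
    return 0.5 * k ** (-1.0 / (pop_size * string_len))


# Hypothetical parameter values, chosen only for illustration.
M, L = 4, 3
for k in (1, 10, 1000, 10**6):
    print(k, mutation_bound(k, M, L))
```

The schedule starts at 1/2 for k = 1 and decreases monotonically toward zero. The decay is slow when the product ML is large, since the exponent is then close to zero.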

This result is established by using the lower bounds on the two and three-operator conditional probabilities in Eq. 4.21 and 4.31 with Definitions A11 and A12 and Theorems A5 and A6. Applying the lower bound in Eq. 4.21 and 4.31 to $\tau_1(\cdot)$ of Definition A12 and Theorem A5 yields

\[
\tau_1(P) = 1 - \min_{n_1, n_2} \sum_{m} \min\bigl(P(m \mid n_1), P(m \mid n_2)\bigr)
\le 1 - \left(\frac{2\alpha}{1+\alpha}\right)^{ML}.
\]

Thus,

\[
1 - \tau_1(P) \ge \left(\frac{2\alpha}{1+\alpha}\right)^{ML}
\]

and consequently from Theorem A6, the chain is weakly ergodic if the sequence of control parameter values $\{\alpha(k)\}$ satisfies

\[
\sum_{k=1}^{\infty} \left(\frac{2\alpha(k)}{1+\alpha(k)}\right)^{ML} = \infty.
\]

Comparing this result to the known divergent series $\sum_k k^{-1}$, it follows that the Markov chain is weakly ergodic if the sequence $\{\alpha(k)\}$ satisfies

\[
\left(\frac{2\alpha(k)}{1+\alpha(k)}\right)^{ML} \ge \frac{1}{k},
\]

from which

\[
\frac{\alpha(k)}{1+\alpha(k)} \ge \frac{1}{2} k^{-1/(ML)}.
\]

Using Eq. 4.13 to translate this result into an equivalent expression in $p_m(k)$ establishes Proposition 8.1.
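
The two ingredients of this argument can be checked numerically. The sketch below computes the coefficient of ergodicity $\tau_1$ of a small stochastic matrix, and then confirms that a schedule sitting exactly on the bound makes each term of the series equal to $1/k$. It assumes the relation $p_m = \alpha/(1+\alpha)$, which is what the last step of the derivation implies for Eq. 4.13 (not quoted here); the example matrix, the function names, and the values of `M` and `L` are hypothetical.

```python
def tau1(p):
    """Coefficient of ergodicity: 1 - min over row pairs of sum_m min(p[i][m], p[j][m])."""
    n = len(p)
    overlap = min(
        sum(min(a, b) for a, b in zip(p[i], p[j]))
        for i in range(n)
        for j in range(n)
    )
    return 1.0 - overlap


def alpha_from_pm(pm):
    """Assumed inverse of p_m = alpha / (1 + alpha)."""
    return pm / (1.0 - pm)


# A hypothetical 2-state stochastic matrix.
print(tau1([[0.9, 0.1], [0.2, 0.8]]))

# On the bound, (2*alpha/(1+alpha))**(M*L) should equal 1/k.
M, L = 4, 3
for k in (1, 2, 10, 100):
    pm = 0.5 * k ** (-1.0 / (M * L))
    a = alpha_from_pm(pm)
    term = (2.0 * a / (1.0 + a)) ** (M * L)
    print(k, term, 1.0 / k)
```

Because the terms match the harmonic series on the bound, any schedule at or above the bound gives a divergent sum, which is the condition used with Theorem A6.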

8.3 Strong Ergodicity

The mutation probability schedule bound advanced in Proposition 8.1 is also sufficient to achieve strong ergodicity if it satisfies the condition on the sequence of vector differences in Theorem A7. The required sequence of vectors can be selected as the sequence of stationary distributions of the time-homogeneous Markov chains associated with the parameter sequence $\{p_m(k)\}$ (or equivalently with the corresponding sequence $\{\alpha(k)\}$).

Section 4 establishes that a stationary distribution exists for the time-homogeneous two and three-operator algorithms corresponding to every value of $\alpha$ satisfying $\alpha > 0$. Thus, associated with the sequence of control parameter values $\{\alpha(k)\}$ is a sequence of vectors $\{q_k\}$ where $q_k = q_\alpha$ evaluated at $\alpha = \alpha(k)$. Further, based upon results established in Section 6, Section 7 demonstrates that an $\alpha \to 0^+$ limiting stationary distribution exists (Propositions 7.5 and 7.6), that the stationary distribution vector varies continuously for all $\alpha$ satisfying $\alpha \ge 0$ (Proposition 7.7) and that its first derivative exists and is continuous for all $\alpha$ satisfying $\alpha > 0$ (Proposition 7.3). In particular, $q_\alpha$ is continuous on the closed interval $0 \le \alpha \le 1$ and its first derivative exists at every interior point of that interval.

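Therefore, if consideration is limited to monotone decreasing control parameter sequences, then by the mean value theorem the difference between the $m$ components of any two consecutive vectors in the sequence can be written as

\[
q_{k+1}(m) - q_k(m) = \left.\frac{dq_\alpha(m)}{d\alpha}\right|_{\alpha = \alpha'(k)} \times \bigl(\alpha(k+1) - \alpha(k)\bigr)
\]

where the value $\alpha'(k)$ satisfies $\alpha(k+1) < \alpha'(k) < \alpha(k)$. Consequently,

\[
|q_{k+1}(m) - q_k(m)| = \left|\frac{dq_\alpha(m)}{d\alpha}\right|_{\alpha = \alpha'(k)} \times |\alpha(k+1) - \alpha(k)|
\]

and

\[
\sum_{k=1}^{\infty} |q_{k+1}(m) - q_k(m)| = \sum_{k=1}^{\infty} \left|\frac{dq_\alpha(m)}{d\alpha}\right|_{\alpha = \alpha'(k)} \times |\alpha(k+1) - \alpha(k)| \qquad \text{Eq. 8.1}
\]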

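From Propositions 7.3 and 7.8, it is possible to define a function $g_\alpha(m)$ which is continuous in $\alpha$ on the closed interval $0 \le \alpha \le 1$ as follows

\[
g_\alpha(m) =
\begin{cases}
\dfrac{dq_\alpha(m)}{d\alpha} & 0 < \alpha \le 1 \\[2ex]
\lim_{\alpha \to 0^+} \dfrac{dq_\alpha(m)}{d\alpha} & \alpha = 0
\end{cases}
\qquad \text{Eq. 8.2}
\]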

Then, from a fundamental theorem in the calculus of functions, it follows that $g_\alpha(m)$ (and consequently $|g_\alpha(m)|$) is bounded on the closed interval $0 \le \alpha \le 1$. Thus, if

\[
B = \sup_{m \in S',\, \alpha \in [0,1]} |g_\alpha(m)|, \qquad \text{Eq. 8.3}
\]

then it follows from Eq. 8.2 that at every interior point of the interval $0 \le \alpha \le 1$

\[
\left|\frac{dq_\alpha(m)}{d\alpha}\right| \le B
\]

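and application of this result to Eq. 8.1 yields

\[
\begin{aligned}
\sum_{k=1}^{\infty} |q_{k+1}(m) - q_k(m)| &= \sum_{k=1}^{\infty} \left|\frac{dq_\alpha(m)}{d\alpha}\right|_{\alpha = \alpha'(k)} \times |\alpha(k+1) - \alpha(k)| \\
&\le \sum_{k=1}^{\infty} B \times |\alpha(k+1) - \alpha(k)| \qquad \text{Eq. 8.4} \\
&= B \sum_{k=1}^{\infty} |\alpha(k+1) - \alpha(k)|.
\end{aligned}
\]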

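Since only monotonic control parameter sequences are under consideration, the sum in the last line of Eq. 8.4 can be written as the difference of the initial and final parameter values of the sequence. Thus,

\[
\begin{aligned}
\sum_{k=1}^{\infty} |q_{k+1}(m) - q_k(m)| &\le B\bigl(\alpha(1) - \alpha(\infty)\bigr) \\
&= B(1 - 0) \qquad \text{Eq. 8.4} \\
&= B.
\end{aligned}
\]

The series of vector differences required for application to Theorem A7 can then be written as

\[
\begin{aligned}
\sum_{k=1}^{\infty} \|q_{k+1} - q_k\| &= \sum_{k=1}^{\infty} \sum_{m \in S'} |q_{k+1}(m) - q_k(m)| \\
&\le \sum_{m \in S'} B \qquad \text{Eq. 8.5} \\
&= N'B < \infty.
\end{aligned}
\]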

Applying the combined results of Proposition 8.1 and Eq. 8.5 to Theorem A7 produces the goal of this section.

Proposition 8.2: The mutation probability bound given by

\[
p_m(k) \ge \frac{1}{2} k^{-1/(ML)}
\]

is sufficient to ensure strong ergodicity of the corresponding Markov chain. Further, the Markov chain representing any nonstationary two or three-operator simple genetic algorithm for which the mutation probability sequence both observes this bound and converges to zero achieves (asymptotically) the limiting probability distribution defined in Propositions 7.5 and 7.6.
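
The step that makes the series of differences finite is the telescoping of a monotone schedule. A minimal numerical sketch of that step follows, assuming the relation $p_m = \alpha/(1+\alpha)$ and a schedule placed exactly on the bound; the function names and the values of `M` and `L` are hypothetical.

```python
def alpha_on_bound(k, pop_size, string_len):
    """alpha(k) for p_m(k) = (1/2) * k**(-1/(M*L)), assuming p_m = alpha/(1+alpha)."""
    pm = 0.5 * k ** (-1.0 / (pop_size * string_len))
    return pm / (1.0 - pm)


def total_variation(values):
    """Sum of |values[k+1] - values[k]| over consecutive pairs."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))


M, L = 4, 3
alphas = [alpha_on_bound(k, M, L) for k in range(1, 10001)]

# For a monotone decreasing sequence the sum telescopes to first minus last.
print(total_variation(alphas), alphas[0] - alphas[-1])
```

Since the schedule starts at $\alpha(1) = 1$ and decreases toward zero, the partial sums never exceed 1, which is the factor multiplying $B$ in the derivation above.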

8.4 Comparison With the Simulated Annealing Parameter Bound

It is instructive to compare the mutation probability sequence bound developed here with the annealing schedule bounds reviewed in Section 2.4.2.2, both of which are of the form $K/\log(k)$. Let $\rho(k)$ be defined as the ratio

\[
\rho(k) = p_m(k)/T(k)
\]

where $p_m(k)$ is selected as the bound developed herein and $T(k)$ is selected as the bound provided by either Eq. 2.12 or Eq. 2.13. That is

\[
\rho(k) = \frac{\tfrac{1}{2} k^{-1/(ML)}}{K/\log(k)} = \frac{1}{2K} \, \frac{\log(k)}{k^{1/(ML)}}. \qquad \text{Eq. 8.5}
\]

Thus, decreasing values of $\rho(k)$ imply that the genetic algorithm convergence rate is superior (asymptotically) to that of the simulated annealing algorithm.

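Now, let $k = \exp(x)$, or equivalently $x = \log(k)$. Substituting into Eq. 8.5 yields

\[
\rho(k) = \frac{1}{2K} \, x \exp\!\left(-\frac{x}{ML}\right).
\]

Then, since for all positive constants $\gamma$, the limit of $x \exp(-\gamma x)$ as $x \to \infty$ is zero, it follows that

\[
\lim_{k \to \infty} \rho(k) = \lim_{k \to \infty} \left[\frac{p_m(k)}{T(k)}\right] = 0. \qquad \text{Eq. 8.6}
\]

Thus, the nonstationary simple genetic algorithm provides an asymptotically superior convergence rate.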
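
The behavior of the ratio can be tabulated directly. The sketch below evaluates $\rho(k)$ for a deliberately small product $ML$ so that the decay is visible at moderate $k$; the function name `rho`, the constant `K = 1.0`, and the values of `M` and `L` are assumptions made for illustration.

```python
import math


def rho(k, pop_size, string_len, K=1.0):
    """Ratio of the mutation bound (1/2) k**(-1/(M*L)) to the annealing bound K/log(k)."""
    if k <= 1:
        raise ValueError("k must exceed 1 so that log(k) is positive")
    pm = 0.5 * k ** (-1.0 / (pop_size * string_len))
    temperature = K / math.log(k)
    return pm / temperature


# Hypothetical small problem: M * L = 2.
M, L = 2, 1
for k in (10, 100, 10**4, 10**8):
    print(k, rho(k, M, L))
```

Because $x \exp(-x/(ML))$ has its maximum at $x = ML$, the ratio only begins to decrease once $k$ exceeds $\exp(ML)$. The limit in Eq. 8.6 is therefore a statement about very large $k$ when $ML$ is large.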

Full Text


TOWARD AN EXTRAPOLATION OF THE SIMULATED ANNEALING CONVERGENCE THEORY ONTO THE SIMPLE GENETIC ALGORITHM By THOMAS E. DAVIS A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 1991


ACKNOWLEDGEMENTS

The author is extremely fortunate in having the assistance of several talented academicians during the conduct of the research program reported in this dissertation. Notably, Professor Jose Principe, who supervised this work and contributed several key ideas developed herein, proved a very valuable source of encouragement and support. Also, Professor Murali Rao assisted very substantially in enforcing mathematical rigor, especially in the formulation of the Markov chain appendices. Professors Antonio Arroyo and Donald Childers, who served on the committee overseeing this work, are remembered fondly for the constructive comments they provided during the visits the author made to Gainesville while conducting this research, as well as for productive associations while the author was in residence. Also, the assistance provided by Professor Eugene Chenette from the Eglin Graduate Center in dealing with a variety of administrative complications, as well as his service on the committee overseeing this work, is graciously acknowledged.

Additionally, the generous support of the US Air Force Armament Laboratory to this work is sincerely appreciated. The author's management chain, notably Mr. Lynn Deibler, Lt. Col. Rex Franklin, Dr. Eugene Youngblood and Lt. Col. Tom Callen, provided continual encouragement and working condition flexibility. Without their support, this activity would likely not have been possible. The Computer Science Directorate at Eglin provided some exceptionally valuable computer support, on the Eglin AFB Cray Y-MP, under very flexible conditions. Some of the insights gained during the conduct of this research would be very difficult, and perhaps impossible, to attain through any method other than simulation. Members of the staff at the Computer Science Directorate who contributed significantly, especially in helping the VMS-inclined author through the UNIX maze, include Mr. Eddie Blackwell, Mr. Ben McKinnon and Mr. Danny Majors. Mr. Bill Clements, who made the computing resources available, and Mr. Calvin George, who helped the author arrange the support, are also gratefully acknowledged.

Finally, the author wishes to thank Sumiko, who entered his life during the conduct of this research program, for the support and understanding whose need she never fails to anticipate.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT

SECTIONS

1 INTRODUCTION
1.1 Non-Convex Combinatorial Optimization and Stochastic Search Algorithms
1.2 Organization
2 SIMULATED ANNEALING
2.1 Overview
2.2 Statistical Mechanics and Annealing of Solids
2.3 Combinatorial Optimization by Simulated Annealing
2.4 Theoretical Foundations of Simulated Annealing
3 THE GENETIC ALGORITHM
3.1 Overview
3.2 The Simple Genetic Algorithm Operators
3.3 Building Blocks, Schemata and the Fundamental Theorem
3.4 An Assessment of the Genetic Algorithm Theoretical Foundation
4 A MARKOV CHAIN MODEL OF THE SIMPLE GENETIC ALGORITHM
4.1 Overview
4.2 The Markov Chain Model
4.3 The State Behavior of the Simple Genetic Algorithm
5 SOME EMPIRICAL RESULTS
5.1 Overview
5.2 State Space Enumeration
5.3 Reward Function Data
5.4 Conditional Probabilities vs $\alpha$
5.5 Converged Limiting Stationary Distributions
6 THE CRAMER'S RULE FORMULATION OF THE STATIONARY DISTRIBUTION
6.1 Overview
6.2 The Stationary Distribution Description
6.3 Positivity of the Stationary Distribution Components
6.4 The Indeterminate Form at $\alpha = 0$
7 THE ZERO MUTATION PROBABILITY STATIONARY DISTRIBUTION LIMIT
7.1 Overview
7.2 Functional Form of the Stationary Distribution
7.3 The Absorbing State Rows of |P-I| and |Ps-I|
7.4 Reformulation of Propositions 6.7 and 6.8
7.5 The Stationary Distribution Limit
8 A MONOTONIC MUTATION PROBABILITY ERGODICITY BOUND
8.1 Overview
8.2 A Weak Ergodicity Bound
8.3 Strong Ergodicity
8.4 Comparison With the Simulated Annealing Parameter Bound
9 REPRESENTATION OF THE STATIONARY DISTRIBUTION SOLUTION
9.1 Overview
9.2 The Limiting Case $\alpha = 1$
9.3 The General Case $0 < \alpha < 1$
9.4 The Limiting Case $\alpha \to 0^+$
9.5 Extending the Stationary Distribution Representation
10 CONCLUSIONS AND FUTURE DIRECTION
10.1 Summary
10.2 Contributions of the Research
10.3 Future Direction

APPENDICES

A DISCRETE TIME FINITE STATE MARKOV CHAINS
A.1 Introduction
A.2 Elementary Definitions
A.3 Time-Homogeneous Markov Chains
A.4 Inhomogeneous Markov Chains
B THE PERRON-FROBENIUS THEOREM AND STOCHASTIC MATRICES
B.1 Introduction
B.2 The Perron-Frobenius Theorem and Ancillary Results for Primitive Matrices
B.3 The Perron-Frobenius Theorem for Stochastic Matrices
C VANDERMONDE DETERMINANTS, SYMMETRIC AND ALTERNATING POLYNOMIALS
C.1 Introduction
C.2 Evaluation of Vandermonde Determinants
C.3 Symmetric (and Alternating) Polynomials
C.4 Quasi-Symmetric (and Quasi-Alternating) Polynomials
D COMPUTER LISTINGS
D.1 Introduction
D.2 Main Program Listings
D.3 Library Listings

REFERENCES
BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

TOWARD AN EXTRAPOLATION OF THE SIMULATED ANNEALING CONVERGENCE THEORY ONTO THE SIMPLE GENETIC ALGORITHM

By THOMAS E. DAVIS

May 1991

Chairman: Professor Jose C. Principe
Major Department: Electrical Engineering

Simulated annealing and the genetic algorithm are stochastic relaxation search techniques suitable for application to a wide variety of combinatorial complexity nonconvex optimization problems. Each produces a sequence of candidate solutions (or populations of candidate solutions) to the underlying optimization problem, and the purpose of both algorithms is to generate sequences biased toward solutions which optimize the objective function. The appeal of simulated annealing is that it provides asymptotic convergence to a globally optimal solution. A substantial body of knowledge exists concerning the algorithm convergence behavior. It is based upon a nonstationary Markov chain algorithm model. No genetic algorithm model comparable in scope exists in the literature. This work constitutes an attempt to provide such a model and accompanying convergence theory by extrapolating the simulated annealing results onto the genetic algorithm. A prerequisite, developed herein, is a nonstationary Markov chain genetic algorithm model.

The essence of the simulated annealing theory is demonstration of (1) existence of a unique asymptotic probability distribution (stationary distribution) for the stationary Markov chain corresponding to every strictly positive constant value of an algorithm control parameter (absolute temperature), (2) existence of a stationary distribution limit as the control parameter approaches zero, (3) the desired behavior of the stationary distribution limit (i.e. optimal solution with probability one) and (4) sufficient conditions on the algorithm control parameter to ensure that the nonstationary algorithm achieves (asymptotically) the limiting distribution. With the exception of (3), this work adapts that methodology to the genetic algorithm Markov chain model employing a genetic operator parameter (mutation probability) as the algorithm control parameter. The results include a mutation probability control parameter bound analogous to (and asymptotically superior to) the conventional simulated annealing parameter bounds, and a framework for representing the genetic algorithm stationary distribution components at all consistent fixed control parameter values, including zero. The genetic algorithm stationary distribution limit has nonzero components corresponding to all solutions. Thus, the simulated annealing global optimality convergence result does not extrapolate. However, both empirical and theoretical evidence is provided which suggests that the desired limiting behavior can be approached by suitably adjusting the algorithm parameters.

SECTION 1

INTRODUCTION

1.1 Non-Convex Combinatorial Optimization and Stochastic Search Algorithms

A wide variety of engineering applications lend themselves to formulations which require the solution of combinatorial optimization problems. Typically, the optimization problem is nonconvex and is defined over a very high dimensionality search space (e.g. inverse vision problems, in which an image array of 512×512 pixels at 8 bits/pixel might be encountered, resulting in a search space dimensionality of ~2M). Consequently, direct solution is usually intractable.

An alternative to direct solution is to select one of a variety of iterative improvement solution techniques, usually some variant of gradient search. But by definition, deterministic iterative improvement techniques terminate in local extrema, and they ordinarily provide no means of assessing the amount by which the selected local extremum deviates from the global extremum. A typical means of avoiding local extrema entrapment is to implement the iterative improvement solution method stochastically.

The most commonly employed stochastic algorithm approach to combinatorial optimization is simulated annealing [KiGe83, LaAa87], which is also sometimes referred to as probabilistic hill climbing [RoSa85]. It exploits the analogy of combinatorial optimization to the annealing of crystalline solids, in which a solid is cooled very gradually from some elevated temperature and thereby allowed to relax toward its low energy states. The appeal of the algorithm class derives from the fact that provided certain constraints on an algorithm control parameter (analogous to absolute temperature) are observed, asymptotic convergence to a global extremum is guaranteed.

The key limitation of simulated annealing is that the convergence behavior is asymptotic. Thus global optimality is obtained only after an infinite number of algorithm iterations. The rate of convergence to optimality is determined by a nonnegative algorithm control parameter whose ideal value is zero and which must observe a lower bound in order to assure coherent algorithm behavior. The best available known bound for the parameter, the annealing schedule bound, is of the form K/log(k) where k is the iteration index and K is a parameter independent of k [GeGe84, MiRo85].

Another combinatorial optimization stochastic search technique reported in the literature is the genetic algorithm [Davi87, Gold83, Gold89a, Gref85, Gref87]. It emulates the evolution of biological systems by employing a set of stochastic operators (e.g. reproduction, crossover and mutation) to transform a population of candidate solutions to the underlying optimization problem into a new (descendent) population. It has some features which suggest that it may provide significantly improved convergence behavior over simulated annealing on certain types of optimization problems. However, the nature of the genetic operators and their influence on algorithm behavior is only understood in general terms. No complete theoretical model of the algorithm exists in the literature. The fundamental goal of the work reported here is to provide a theoretical framework for analyzing the algorithm based upon the asymptotic probability distribution of the solution sequences which it produces. The work reported herein includes significant progress on the key intermediate steps to achieving that goal.

1.2 Organization

The remaining sections of this paper are organized as follows. Sections 2 and 3 are background reviews of the simulated annealing and genetic algorithm literature respectively.
Section 2 places considerable emphasis on the methodology employed to yield the asymptotic convergence results which are the theoretical foundation of the simulated annealing algorithm. That methodology appeals heavily to the theory of inhomogeneous (nonstationary) Markov chains and their asymptotic state probability distributions. The essence of the simulated annealing convergence theory is a set of sufficient conditions to ensure that the asymptotic probability distribution of the Markov chain which represents the algorithm is independent of its starting state and has probability zero for all states corresponding to sub-optimal solutions.

Section 3 begins with a verbal description of the three fundamental stochastic operators employed in genetic algorithms (i.e. reproduction, crossover and mutation), and proceeds to review the existing theoretical foundation of the algorithm class. A conclusion of that section is that while certain important theoretical results exist, notably the so-called schema theorem and some work on a problem construct referred to as the minimal deceptive problem, the genetic algorithm lacks the theoretical foundation necessary to either compare it with simulated annealing or to answer key questions concerning the design of a genetic algorithm for a given application.

The author's contribution to this work begins with Section 4. The major result of that section is a very general, nonstationary Markov chain model of the variants of the genetic algorithm which employ combinations of the three fundamental genetic algorithm operators. The model is tailored to resemble that employed in developing the simulated annealing methodology, and in that regard, the genetic algorithm mutation operator is shown to provide a function very similar to that of the simulated annealing absolute temperature analog. Specifically, the stationary algorithm corresponding to every constant value of the mutation probability parameter satisfying 0

Section 5 digresses briefly from the theoretical development to produce and examine some empirical work based upon the algorithm model. The presentation is not, nor is it intended to be, a thorough empirical study. It is provided to help fix some of the algorithm model state space and asymptotic probability distribution ideas which are central to this work, and it anticipates some of the theoretical results which follow. Section 6 resumes the theoretical development. Its result is an expression for the components of the unique asymptotic probability distribution produced by the stationary algorithm variants which implement the mutation operator with nonzero mutation probability (i.e. the stationary two and three-operator algorithm variants). The result is expressed in terms of Cramer's Rule and thus its solution requires evaluation of determinants. The determinants are the characteristic polynomials, evaluated at $\lambda = 1$, of matrices derived from the state transition matrix produced in Section 4 by zeroing one row. A later section attacks the problem of explicitly solving the system, based upon the highly symmetrical nature of the state transition matrix, but some very significant results are obtainable from the product of Section 6 without explicit solution. An essential step in establishing a connection between simulated annealing and the genetic algorithm is demonstrating the existence of a stationary distribution limit for the algorithm as the mutation probability approaches zero. Section 7 accomplishes that task and also provides a foundation for deducing, in Section 8, a mutation probability bound analogous to the annealing schedule bounds of the simulated annealing algorithm. The results developed in Sections 7 and 8 apply to both the two and three-operator algorithm variants. 
A somewhat surprising result produced in Section 7 and anticipated by the empirical study reported in Section 5 is that the stationary distribution zero mutation probability limit does not necessarily isolate globally optimal solutions. In fact, it provides nonzero probability for all solutions of the underlying optimization problem and consequently the extrapolation of the simulated annealing methodology is less than exact. However, both the empirical results presented in Section 5 and some results developed later in Section 9 suggest that the required limiting behavior can be approached as closely as desired by adjusting the algorithm parameters appropriately.

Section 9 attacks the problem of explicitly solving the system which results from the Cramer's Rule formulation of the stationary distribution of the time-homogeneous two and three-operator algorithms. It is a very extensive development which yields an expression for the coefficient of the general term in the Taylor's series expansion of the required determinants. It is based upon the highly symmetrical nature of the state transition matrix, as alluded to earlier. The results of Section 9 are not reduced to a directly useable explicit solution. Nevertheless, they do provide significant insight into the functional form of the stationary distribution components. Furthermore, Section 9.5 points out some very significant identities which exist among the coefficients of the Taylor's series and suggests a method for continuing the Section 9 development based upon the algebra of symmetric and alternating polynomials. Explicit solution of the stationary distribution equations is the major incomplete task required for extrapolation of the simulated annealing convergence theory onto the genetic algorithm.

Section 10 summarizes this work and recapitulates the significant results. It also proposes continuation of two parts of this research: (1) pursuit of the stationary distribution solution and (2) refinement of the mutation probability control parameter bound.

An appropriate mathematical framework for examining both the simulated annealing and genetic algorithms is the theory of Markov chains. Appendix A is included to summarize some essential definitions and theorems. Appendix B is devoted to the Perron-Frobenius theorem, which is fundamental to the study of nonnegative matrices in general and Markov chains in particular. 
Several important Markov chain theorems are specializations of it and the key developments in Sections 6 and 7 require its application. All of the Appendix A and Appendix B results are provided without proof or elaboration, but their foundation is obtainable from various references (e.g. [Cinl75] for the more elementary results in Appendix A, [Sene81] for the Appendix B material on the Perron-Frobenius Theorem and [Sene81, IsMa76] for the Appendix A ergodicity related definitions and theorems). These results are invoked freely in the following sections, either by specific reference to definition/theorem number, or if the context makes it appropriate, they are simply assumed.

Appendix C is provided as background for the Section 9.5 discussion on coefficient identities and extending the stationary distribution representation development. With the exception of Section C.4, the material presented in Appendix C is obtainable from advanced algebra texts (e.g. [MoSt64]). The symmetric/alternating polynomial generalization in Section C.4 is original.

Appendix D collects the computer program listings for the programs employed in generating the results reported in Section 5. The programs presented there were developed and executed on the Cray Y-MP operated by the Computer Science Directorate at Eglin AFB, FL.

PAGE 16

SECTION 2 SIMULATED ANNEALING 2.1 Overview As noted in the introduction, a very commonly employed approach to the solution of nonconvex combinatorial optimization problems is a stochastic relaxation technique introduced by Kirkpatrick et al. and referred to as simulated annealing [KiGe83]. The technique is so named by virtue of its analogy to the annealing of solids, in which a crystalline solid is heated to its melting point and then allowed to cool very gradually until it is again in the solid phase at some nominal temperature. In the limiting case of infinitesimal cooling rate and absolute zero final temperature, the resulting solid achieves its most regular possible crystal lattice configuration (i.e. minimum lattice energy state), and hence is free of crystal defects. Simulated annealing establishes the connection between this sort of thermodynamic behavior and the search for the global minimum of an objective function in a combinatorial optimization problem, and further, it provides an algorithmic means of exploiting the connection. This section is a review of the technique with special emphasis on known results which bound the convergence behavior of computer algorithms belonging to the class. 2.2 Statistical Mechanics and Annealing of Solids The fundamental assumption of statistical physics is that the thermodynamic behavior of a many particle system can be represented by a statistical ensemble, and that if the system is in thermal equilibrium, the time averages of macroscopic thcmiodynaniic properties of the system are equal to the corresponding ensemble averages (ergodicity hypothesis). The random variable represented by the ensemble is the system thermal energy, and at thermal equilibrium the probability distribution is completely determined

PAGE 17

8 by the system temperature. The distribution is known as the Boltzman distribution, or alternatively as the Gibbs distribution, and its form is exp{-E(i)/kT} Pr{E = E(i)}=Z(T) Eq. 2.1 where E(i) k T Z(T) = the system thermal energy (a random variable) = the energy corresponding to state i = Boltzman's constant = the system temperature = the partition function. The factor exp -E(i) kT is called the Boltzman factor. The partition function provides the necessary normalization to make Eq. 2. 1 a state occupancy probability. It can be expressed as Z(T)=Iexp{:|P}. Eq.2.2 At elevated temperatures, the system represented by the probability distribution in Eq 2.1-2 occupies all states in its state space with nearly uniform probability, while at low temperatures, states having low energy are favored. When the temperature approaches absolute zero, only states corresponding to the minimum value of energy have nonzero probability. Thus, the thermodynamic system's energy function can be effectively searched for its minimum value by starting the system at an elevated temperature and allowing it to cool gradually to absolute zero, at which point one of its minimum energy states is occupied with probability one. This is the mechanism which guides the annealing of solids. The cooling schedule employed in annealing solids is constrained by the requirement that the system be allowed to achieve thermal equilibrium at each temperature. The

Gibbs distribution only represents the system's energy distribution in the stationary case (i.e. equilibrium). If this requirement is not satisfied, defects can be frozen into the crystal lattice, preventing the system from achieving the minimum possible energy state. This behavior is analogous to local minima entrapment in combinatorial optimization search. The restriction on the annealing schedule necessary to avoid it is the fundamental limitation on the annealing technique.

2.3 Combinatorial Optimization by Simulated Annealing

Simulated annealing approaches combinatorial optimization problems in a closely analogous fashion. In simulated annealing, the optimization problem's solution space corresponds to the state space of the analogous thermodynamic system and its cost function is analogous to the thermodynamic system's energy surface. The analog of the thermodynamic system's temperature is a nonnegative algorithm control parameter, T. Two other algorithm components are also required. They are the stochastic next state generation and acceptance mechanisms, and they incorporate the dependence of the algorithm on the control parameter, T. The next state generation mechanism is employed by the algorithm to transform a current solution into a new candidate solution, and the acceptance mechanism is employed to decide whether to retain or discard the proposed new solution. Together, these stochastic operators are responsible for making the search algorithm simulate the thermodynamic system's statistical behavior. Consequently, they must satisfy certain requirements to assure coherent algorithm behavior. These requirements are explored in some depth later in the context of algorithm convergence behavior.

Conceptually, the operation of the simulated annealing algorithm can be described as follows. The algorithm starts at some initial value of the control parameter and with some initial solution.
Then, the state generation mechanism is employed to synthesize a new candidate solution. The new solution is examined by the acceptance mechanism and either accepted or rejected. If it is accepted, the new solution becomes the current solution. Otherwise, the old current solution is retained. This process is repeated, generating a

sequence of temporary solutions, until an approximate equilibrium is achieved in which the solution space occupancy is described by the Gibbs distribution (Eq. 2.1-2). Once this approximate equilibrium is achieved, the control parameter value is reduced and the solution sequence is extended until equilibrium is achieved at the new control parameter value. This process is repeated until some termination condition (e.g. minimum control parameter value) is attained. The current solution at termination is then accepted as the solution to the optimization problem.

It is noted in passing that simulated annealing always involves minimizing a cost functional, never maximizing a reward. However, this causes no loss of generality because any combinatorial optimization problem can be translated into an equivalent minimization problem.

2.4 Theoretical Foundations of Simulated Annealing

The evolution of the search sequence of a simulated annealing algorithm as outlined, in which each succeeding solution in the sequence is determined stochastically based upon the current solution, suggests that the algorithm behavior can be described as a Markov chain. Indeed it can, and all of the known convergence results for simulated annealing algorithms are derived from analysis of Markov chain models [LaAa87, GeGe84, LuMe86, MiRo85, Rior58]. This subsection establishes a Markov chain model to represent the simulated annealing algorithm and then employs it in reviewing the development of the published convergence bounds. This development essentially follows [LaAa87].

2.4.1 A Markov Chain Model of Simulated Annealing

Let a combinatorial optimization problem be represented by the pair (S,C) where S is the problem's solution space and C is its cost function, and assume without loss of generality that the optimization problem requires minimization of C. Also, assume that S is finite.
Then, a simulated annealing algorithm for solving this problem can be characterized by the quadruple $(S, i_0, P_T, \tau)$ where S is as defined above and where $i_0 \in S$

is an initial candidate solution, $P_T$ is a stochastic matrix which describes a stochastic state transition mechanism (the composition of the next state generation and acceptance mechanisms discussed in Section 2.3) and $\tau = \{T_k\}$ is a finite length monotone nonincreasing sequence of positive control parameter values. The first parameter value in $\tau$ is $T_0$ and the final value is $T_f$. $P_T$ incorporates the algorithm dependence on both C and T.

The algorithm generates a sequence of candidate solutions, $\{i_k : 0 \le k \le f\}$, by employing the state transformation mechanism (described by $P_T$) to transform solution $i_k$ into $i_{k+1}$. At the k-th transition, $P_T$ is completely determined by $T_k$. The solution sequence is extended until $T = T_f$, at which point the current solution, $i_f$, is accepted as the solution to the combinatorial optimization problem. Thus $T_f$ signals algorithm termination. $T_f$ can be allowed to depend on $\{i_k\}$ provided due regard is paid to the requirement for termination.

Since the solution state transition mechanism is stochastic, and since the conditional dependence of the solution sequence only extends to one transition, the solution sequence is a Markov chain by Definition A1. Its state transition matrix is $P_T$ (Definition A4). The state transition matrix is decomposed into two parts for convenience in the following. It consists of the next state generation mechanism, $G_{ij}(T)$, which describes the probability of generating state j given that the current state is i, and the state acceptance mechanism, $A_{ij}(T)$, which describes the probability of accepting the generated state. Thus, $P_T(i,j)$ is written as

$$P_T(i,j) = \begin{cases} G_{ij}(T)\,A_{ij}(T) & j \ne i \\ 1 - \sum_{l=1,\, l \ne i}^{N} G_{il}(T)\,A_{il}(T) & j = i \end{cases} \qquad \text{Eq. 2.3}$$

In this result, N = card(S) represents the cardinality of the solution space. It is noted in passing that the usual form of the state acceptance mechanism is the so called Metropolis criterion [Metr53], given by

$$A_{ij}(T) = \min\left\{1, \exp\left(\frac{-(C(j) - C(i))}{kT}\right)\right\}. \qquad \text{Eq. 2.4}$$

This is the form employed by Kirkpatrick et al. in the original work [KiGe83] and most others published since are variations of it. Also, the usual form of the next state generation mechanism is

$$G_{ij}(T) = G_{ij} = G_{ji} = \begin{cases} 1/N_i & j \in S_i \\ 0 & \text{otherwise} \end{cases} \qquad \text{Eq. 2.5}$$

where $S_i \subset S$ is the set of states accessible from state i in one transition (by definition, $i \notin S_i$), and where $N_i = \text{card}(S_i)$. Note that $G_{ij}$ defined by Eq. 2.5 is symmetric and independent of T.

2.4.2 Asymptotic Convergence Behavior

The subject of interest in the remainder of this section is a set of sufficient conditions on $P_T$ and $\tau$ to ensure that an optimal solution is achieved. These conditions will prove to guarantee asymptotic convergence only (i.e. $\tau$ must be an infinite sequence, which of course violates the termination requirement of the algorithm). Two cases will be examined. The first only involves time-homogeneous (stationary) Markov chains (Definition A5) and is presented due to its relative ease of analysis. Its purpose is to provide a foundation for the essential ideas involved in the second case, which requires an appeal to ergodicity theorems for inhomogeneous (nonstationary) Markov chains. The usable convergence behavior results which are the goal of this effort derive from analysis of the second case.

The first (simple) algorithm is represented as a sequence of solutions evolving as a sequence of distinct Markov chains. Each Markov chain in the sequence executes at a fixed control parameter value (and hence is time-homogeneous) and each succeeding Markov chain executes at a lower (but strictly positive) parameter value. Thus, in the sequence $\tau$, each distinct parameter value, $T_l$, is associated with a distinct time-homogeneous Markov chain and $T_l$ occurs at some large number of consecutive locations, $K_l$, in $\tau$. This case is hereafter referred to as the homogeneous (or stationary) algorithm.
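
As a rough illustration of Eq. 2.3 through 2.5, the following Python sketch assembles the one-step transition matrix $P_T$ for a small problem. The five-state cost vector, the ring neighborhood system and all function names are hypothetical choices made only for this example, and Boltzmann's constant is assumed to be absorbed into T.

```python
import math

def metropolis_acceptance(cost_i, cost_j, T):
    """A_ij(T) of Eq. 2.4, with k absorbed into T (an assumption)."""
    delta = cost_j - cost_i
    # Downhill and level moves are always accepted; uphill moves are
    # accepted with probability exp(-delta / T).
    return 1.0 if delta <= 0 else math.exp(-delta / T)

def ring_neighbors(n):
    """Hypothetical neighborhood system: S_i = {i-1, i+1} on a ring."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]

def transition_matrix(costs, neighbors, T):
    """P_T(i, j) of Eq. 2.3 built from G_ij (Eq. 2.5) and A_ij(T)."""
    n = len(costs)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        g = 1.0 / len(neighbors[i])          # G_ij = 1/N_i for j in S_i
        for j in neighbors[i]:
            P[i][j] = g * metropolis_acceptance(costs[i], costs[j], T)
        # The diagonal entry collects the probability of staying put.
        P[i][i] = 1.0 - sum(P[i][j] for j in neighbors[i])
    return P

costs = [3.0, 1.0, 4.0, 0.0, 2.0]            # illustrative cost function C
P = transition_matrix(costs, ring_neighbors(len(costs)), T=1.0)
for row in P:
    print(["%.3f" % p for p in row])
```

Each row of the resulting matrix sums to one, and because every $N_i$ equals 2 in the ring example, $G_{ij}$ is symmetric as Eq. 2.5 requires.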

The analysis of the convergence behavior of the homogeneous algorithm includes the hypothesis that each Markov chain in the sequence achieves its stationary distribution. This hypothesis is equivalent to $K_l \to \infty$ for all $l$ (Definition A10, Theorem A3, Theorem A4).

In the second case, the algorithm is represented as a sequence of solutions evolving as a single inhomogeneous (nonstationary) Markov chain. This formulation is hereafter referred to as the inhomogeneous (or nonstationary) algorithm. In the inhomogeneous algorithm, the control parameter value is allowed to decrease (though not necessarily required to) after each state transition. The dependence of $G_{ij}(T)$ and $A_{ij}(T)$ on T results in the inhomogeneous behavior.

2.4.2.1 The Homogeneous Algorithm

In the homogeneous algorithm, the means of establishing the requirements for asymptotically optimal convergence is to first establish sufficient conditions for existence of the stationary distribution of each Markov chain and then to establish sufficient conditions to ensure that the stationary distribution converges to a uniform distribution over the set of optimal solutions as the control parameter value approaches zero. That is

$$\lim_{T \to 0} q_T(i) = \begin{cases} 1/N_{opt} & i \in S_{opt} \\ 0 & \text{otherwise} \end{cases} \qquad \text{Eq. 2.6}$$

where $q_T$ is the stationary distribution of the Markov chain executing at control parameter value T, $S_{opt} \subset S$ is the set of solutions $\{i \in S : C(i) = C_{opt}\}$, and $N_{opt} = \text{card}(S_{opt})$.

Theorems A1-A3 can be employed to deduce sufficient conditions on $P_T(i,j)$ (or alternatively on $G_{ij}(T)$ and $A_{ij}(T)$) to ensure the existence of the stationary distribution of each Markov chain in the sequence representing the homogeneous algorithm. Since only combinatorial (finite solution space) optimization problems are under consideration and since by definition the homogeneous algorithm only employs time-homogeneous Markov chains, the finite state space and time-homogeneity requirements of Theorem A3 are

satisfied. Beyond these requirements, existence of the stationary distribution of each Markov chain in the homogeneous algorithm only requires that the chain produced by $P_T$ be irreducible and aperiodic (Definitions A7 and A9). If $A_{ij}(T)$ is selected as the Metropolis criterion, Eq. 2.4, then $\forall i,j \in S, \forall T > 0 : A_{ij}(T) > 0$. Thus, from Eq. 2.3, the irreducibility requirement is transferred to the next state generation mechanism, $G_{ij}(T)$. Note that from Theorem A1, irreducibility can readily be achieved within the definition supplied by Eq. 2.5. Also, in [MiRo85], Theorem A2 is used to show that a sufficient condition for aperiodicity is $\forall T > 0 \;\, \exists\, i,j \in S \ni A_{ij}(T)$

(Eq. 2.6) are established. First, note that if the stationary distribution of a Markov chain in the sequence exists, then a function $g(C(i),T)$ corresponding to that Markov chain exists such that

$$q_T(i) = \frac{g(C(i),T)}{\sum_j g(C(j),T)} \qquad \text{Eq. 2.8}$$

where g satisfies

(1) $\forall i \in S, \forall T > 0 : g(C(i),T) > 0$

(2) $\forall j \in S : \sum_i g(C(i),T)\,G_{ij}(T)\,A_{ij}(T) = g(C(j),T) \sum_i G_{ji}(T)\,A_{ji}(T)$. $\qquad$ Eq. 2.9

This can be deduced by noting that the uniquely determining conditions on $q_T$ expressed in Theorem A3 are met by g satisfying Eq. 2.8 and 2.9. Eq. 2.9 is called the global balance equation. Close examination reveals that it is exactly the necessary condition for equilibrium state occupancy. A more restrictive condition, in which the balance holds for every pair of states on a pairwise basis, is called the detailed balance equation.

It can be shown that the following additional constraints on g guarantee convergence of the stationary distribution to the optimal (i.e. to Eq. 2.6) [MiRo85]. Note that Eq. 2.10(2) requires an exponential form.

(1) $\lim_{T \to 0} g(\Delta,T) = \begin{cases} 0 & \Delta > 0 \\ \infty & \Delta < 0 \end{cases}$

(2) $\dfrac{g(\Delta_1,T)}{g(\Delta_2,T)} = g(\Delta_1 - \Delta_2, T)$ $\qquad$ Eq. 2.10

(3) $\forall T > 0 : g(0,T) = 1$

Collectively, Eq. 2.8-2.10 provide a set of sufficient conditions on $G_{ij}(T)$ and $A_{ij}(T)$ to assure convergence of the stationary distribution to Eq. 2.6. The key condition, the global balance equation, is implicit however, and thus is very difficult to apply. Nevertheless, it can be shown [LaAa87] that if $G_{ij}(T)$ and $A_{ij}(T)$ defined by Eq. 2.4 and Eq. 2.5 are

employed, the conditions are satisfied, and that the corresponding stationary distribution is provided by

$$\forall i \in S : q_T(i) = \frac{\exp\{-(C(i) - C_{opt})/kT\}}{\sum_j \exp\{-(C(j) - C_{opt})/kT\}}. \qquad \text{Eq. 2.11}$$

The key to that development is that the $G_{ij}(T)$ and $A_{ij}(T)$ of Eq. 2.4 and 2.5 satisfy the detailed balance equation, the symmetry of $G_{ij}$ being a critical consideration.

The behavior required by Eq. 2.10(1) is limiting behavior as $T \to 0$. Thus, these conditions assure convergence to the global minimum with probability one (i.e. convergence of the stationary distribution to Eq. 2.6), only if the sequence of Markov chains is infinite and $\lim_{l \to \infty} T_l = 0$. Recalling that a guarantee of achieving the stationary distribution requires that each Markov chain be of infinite length, the homogeneous algorithm is seen to require a doubly infinite sequence of solutions composed of an infinite sequence of infinitely long Markov chains.

2.4.2.2 The Inhomogeneous Algorithm

The behavior of the homogeneous algorithm, which requires that an infinite number of transitions be executed at each control parameter value, clearly is not very useful. The following reviews two published convergence results which extend the ideas developed for the homogeneous algorithm to the inhomogeneous counterpart [GeGe84, MiRo85]. These results adopt the sufficient conditions on $G_{ij}(T)$ and $A_{ij}(T)$ developed for the homogeneous algorithm as a starting point (i.e. irreducibility, aperiodicity and Eq. 2.8-2.10) and extend them to the case in which each time-homogeneous Markov chain is finite length (i.e. to the inhomogeneous algorithm). The key products of this effort are lower bounds on the algorithm control parameter's approach to zero. In both cases discussed here, the bound is of the form K/log(k) where k is the index of the Markov chain representing the inhomogeneous algorithm and K is independent of k. The following is a brief sketch of the approach taken to arrive at these results. It is common to both.

Given that $G_{ij}(T)$ and $A_{ij}(T)$ are selected as in Eq. 2.4 and 2.5, each state transition matrix in the inhomogeneous Markov chain of the inhomogeneous algorithm satisfies all of the sufficient conditions for stationary distribution existence and asymptotic convergence to optimality developed for the homogeneous algorithm (i.e. irreducibility, aperiodicity and Eq. 2.8-2.10). Further, the explicit form of the resulting stationary distribution is given by Eq. 2.11. Thus, for each transition matrix, $P_{T_k}$, there exists an eigenvector, $q_{T_k}$, having eigenvalue 1 and satisfying the probability vector conditions. Further, $q_{T_k}$ converges to the limiting distribution of Eq. 2.6 as $T_k \to 0$. Consequently, Theorem A7 can be used to establish strong ergodicity (and hence the desired convergence behavior for $T_k \to 0$) provided (1) that weak ergodicity can be established and (2) that the inequality appearing in Theorem A7 obtains.

Under the hypothesis that $G_{ij}(T)$ and $A_{ij}(T)$ are defined in accordance with Eq. 2.4 and 2.5, in which case the required eigenvector is explicitly provided by Eq. 2.11, and that condition (1) (weak ergodicity) is satisfied, both [GeGe84] and [MiRo85] prove condition (2) of the above. The development is straightforward but tedious. Of more interest here is the means of establishing condition (1), because it leads to the annealing schedule bound.

Both developments employ Theorem A6 to establish weak ergodicity. The general approach is to use the definitions of $G_{ij}(T)$ and $A_{ij}(T)$, along with bounds on the extrema of either the cost function [GeGe84] or the slope of the cost function [MiRo85] to define bounds on the one step transition probabilities. The transition probability bound is then employed to arrive at an upper bound on the $\tau_1$ coefficient of ergodicity of Theorem A5, which is used in turn in Theorem A6 to deduce a sufficient condition to guarantee weak ergodicity. The condition is in the form of a lower bound on the annealing schedule.
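
The eigenvector property and the limiting behavior described above can be checked numerically on a toy problem. The sketch below is illustrative only: the five-state cost vector with two optimal states, the ring neighborhoods and the function names are assumptions, and Boltzmann's constant is taken to be absorbed into T. It confirms that the distribution of Eq. 2.11 is left unchanged by one transition (eigenvalue 1) and that it approaches the uniform distribution over the optimal states (Eq. 2.6) as T becomes small.

```python
import math

def acceptance(delta, T):
    # Metropolis criterion of Eq. 2.4 with k absorbed into T (assumption).
    return 1.0 if delta <= 0 else math.exp(-delta / T)

def transition_matrix(costs, T):
    # Hypothetical ring neighborhoods S_i = {i-1, i+1}, so G_ij = 1/2.
    n = len(costs)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in ((i - 1) % n, (i + 1) % n):
            P[i][j] = 0.5 * acceptance(costs[j] - costs[i], T)
        P[i][i] = 1.0 - sum(P[i][j] for j in range(n) if j != i)
    return P

def gibbs_distribution(costs, T):
    # Candidate stationary distribution of Eq. 2.11.
    c_opt = min(costs)
    weights = [math.exp(-(c - c_opt) / T) for c in costs]
    z = sum(weights)
    return [w / z for w in weights]

def left_multiply(q, P):
    # Row vector times matrix: the distribution after one transition.
    n = len(q)
    return [sum(q[i] * P[i][j] for i in range(n)) for j in range(n)]

costs = [3.0, 0.0, 4.0, 0.0, 2.0]   # states 1 and 3 are both optimal
for T in (5.0, 1.0, 0.05):
    q = gibbs_distribution(costs, T)
    qP = left_multiply(q, transition_matrix(costs, T))
    residual = max(abs(a - b) for a, b in zip(q, qP))
    print(T, ["%.4f" % x for x in q], "residual %.1e" % residual)
```

The residual stays at rounding-error level for every T, while the probability mass migrates toward the two optimal states as T decreases.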

The first such result to be published is in [GeGe84]. The resulting bound is

$$T_k \ge \frac{N \times (C_{max} - C_{min})}{\log(k)}, \quad k \ge 2 \qquad \text{Eq. 2.12}$$

where $C_{max}$ and $C_{min}$ are the maximum and minimum values respectively of C(i) for $i \in S$ and N = card(S). Thus, $C_{min}$ is the desired $C_{opt}$.

The annealing schedule bound established in [MiRo85] is more refined than that of Eq. 2.12. It is given by

$$T_k \ge \frac{rL}{\log(k)}, \quad k \ge 2 \qquad \text{Eq. 2.13}$$

where r is the radius of the graph defining the accessible state neighborhoods of the next state generation mechanism (i.e. the $\{S_i\}$ where $S_i \subset S$ is defined in Eq. 2.5), and L is a constant which bounds the local slope of the cost function. Specifically, r and L are given by

$$r = \min_{i \in S - S_{max}} \; \max_{j \in S} \; d(i,j) \qquad \text{Eq. 2.14}$$

where d(i,j) is the distance of j from i, measured by the minimum number of state transitions required to arrive at j starting at i, where $S_{max} \subset S$ is the set of local maxima of C, and

$$L = \max_{i \in S} \; \max_{j \in S_i} \; |C(j) - C(i)|. \qquad \text{Eq. 2.15}$$

Note that in the special case $S_i = S$ for all $i \in S$, Eq. 2.14 and Eq. 2.15 reduce to r = 1 and $L = C_{max} - C_{min}$ respectively, and substitution into Eq. 2.13 yields

$$T_k \ge \frac{C_{max} - C_{min}}{\log(k)}. \qquad \text{Eq. 2.16}$$

The Eq. 2.16 result is smaller than that of Eq. 2.12 by the factor 1/N.
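
To make the two schedule bounds concrete, the following Python sketch evaluates Eq. 2.12 through 2.15 on a small example. The cost vector, the two neighborhood systems and the function names are hypothetical. The sketch also assumes that the neighborhood graph is connected, that at least one state is not a local maximum, and that a local maximum is a state none of whose neighbors has a higher cost.

```python
import math
from collections import deque

def slope_bound(costs, neighbors):
    """L of Eq. 2.15: largest cost change over any single transition."""
    return max(abs(costs[j] - costs[i])
               for i in range(len(costs)) for j in neighbors[i])

def eccentricity(i, neighbors):
    """max_j d(i, j), found by breadth-first search from state i."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def radius(costs, neighbors):
    """r of Eq. 2.14, minimized over states that are not local maxima."""
    n = len(costs)
    local_maxima = {i for i in range(n)
                    if all(costs[j] <= costs[i] for j in neighbors[i])}
    return min(eccentricity(i, neighbors)
               for i in range(n) if i not in local_maxima)

def bound_eq_2_12(costs, k):
    return len(costs) * (max(costs) - min(costs)) / math.log(k)

def bound_eq_2_13(costs, neighbors, k):
    return radius(costs, neighbors) * slope_bound(costs, neighbors) / math.log(k)

costs = [3.0, 1.0, 4.0, 0.0, 2.0]
ring = [[(i - 1) % 5, (i + 1) % 5] for i in range(5)]
full = [[j for j in range(5) if j != i] for i in range(5)]
for k in (2, 10, 1000):
    print(k, bound_eq_2_12(costs, k),
          bound_eq_2_13(costs, ring, k), bound_eq_2_13(costs, full, k))
```

With the fully connected neighborhood system the second bound is smaller than the first by exactly the factor 1/N, in agreement with the comparison of Eq. 2.16 and Eq. 2.12.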

Both of these published convergence results, as well as several others which are minor variations of them, are of the general form K/log(k). This behavior is the key limitation of the algorithm class, and is believed to be a fundamental limitation imposed by the neighborhood system inherent in the conventional simulated annealing state generation mechanism [GeGe84] (i.e. the fact that at low control parameter values, the likelihood of making the large state transition necessary to escape a local extremum is radically diminished). The simulated annealing literature includes some amount of speculation concerning state generation mechanisms which permit occasional large transitions even at low control parameter values.
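
A minimal sketch of the inhomogeneous algorithm under a schedule of the form K/log(k) is given below. It is not taken from the cited works: the cost vector, ring neighborhoods, the value of K, the step count and the bookkeeping of the best state visited are all illustrative assumptions, and Boltzmann's constant is absorbed into T.

```python
import math
import random

def simulated_annealing(costs, neighbors, K, steps, seed=0):
    """Inhomogeneous annealing with T_k = K / log(k), k >= 2.

    Returns (final_state, best_state_visited). Tracking the best state
    is a convenience added for this sketch, not part of the model.
    """
    rng = random.Random(seed)
    current = rng.randrange(len(costs))
    best = current
    for k in range(2, steps + 2):
        T = K / math.log(k)                        # schedule value T_k
        candidate = rng.choice(neighbors[current]) # generation, Eq. 2.5
        delta = costs[candidate] - costs[current]
        # Metropolis acceptance, Eq. 2.4
        if delta <= 0 or rng.random() < math.exp(-delta / T):
            current = candidate
        if costs[current] < costs[best]:
            best = current
    return current, best

costs = [3.0, 1.0, 4.0, 0.0, 2.0]
ring = [[(i - 1) % 5, (i + 1) % 5] for i in range(5)]
final, best = simulated_annealing(costs, ring, K=8.0, steps=2000, seed=1)
print("final state", final, "best state visited", best)
```

Because log(k) grows so slowly, T is still above 1 after 2000 transitions with K = 8, which illustrates why the logarithmic schedule is regarded as the key practical limitation.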

SECTION 3
THE GENETIC ALGORITHM

3.1 Overview

The genetic algorithm is an iterative improvement stochastic search method appropriate for application to combinatorial optimization problems and based on the evolution of biological systems. It implements the fundamental idea of survival fitness on a population of string structures which are coded representations of solution candidates selected from the solution space of the optimization problem. The population of candidate solutions (which collectively represent the current estimate of the optimum solution) is subjected to a set of stochastic genetic operators which transform a current population into a new (descendent) population. A variety of distinct genetic operators (based on biological analogs) are available and are reported in the literature [Davi87, Gold89a, Gref85, Gref87]. The most important of them are (1) proportional reproduction, (2) crossover and (3) mutation. A one, two or three operator genetic algorithm employing combinations of these operators with fixed population size is referred to herein as a simple genetic algorithm.

The genetic operators are all implemented stochastically, but they do not result in a simple random walk through the search space. They represent a highly structured search which exploits the historical record of performance reflected at each stage of the search by the current population. It is the novel use of this historical record which is central to the appeal of the genetic algorithm.

Genetic algorithms usually operate on populations of bit-strings (i.e. the optimization problem is usually coded such that its search space is defined over a binary string alphabet), and they always attempt to maximize some strictly nonnegative objective

function. The evolution of the fixed size population of candidate solutions toward domination by optimal solutions is the algorithm goal.

The three genetic operators of a simple genetic algorithm are discussed in the next subsection. An analysis of their behavior requires introduction to the concept of schemata, or similarity templates, and that task is undertaken in a subsequent subsection. This section concludes with an assessment of the theoretical foundation available for the analysis of genetic algorithms.

3.2 The Simple Genetic Algorithm Operators

As noted above, the simple genetic algorithm employs three biologically inspired operators to transform each population of candidate solutions into a new (descendent) population. The following subsections examine each of these operators and how they influence the search evolution.

3.2.1 Reproduction

The genetic algorithm reproduction operator is the algorithmic analog of asexual reproduction. It is the means by which the objective function influences the evolution of the genetic algorithm search. It is implemented by evaluating each member of the current generation against the objective function and using the results to measure relative reproductive fitness (i.e. to provide a selection probability measure). Then, members of the current population are selected in accordance with this fitness measure to be members of the succeeding generation. This process is repeated (with statistically independent selection trials) until the entire new generation is populated.

In the absence of the other genetic operators, the reproduction operator tends to force the population to converge to the higher performing members of the current population. It eventually produces a uniform population. At any stage of the search (generation), only solutions which are represented by members of the current population can appear in any succeeding generation. In particular, no solution absent from the initial population is ever attainable.
The reproduction operator exerts a strictly converging influence on the

search evolution. The other operators of the simple genetic algorithm circumvent this limitation in a controlled manner.

3.2.2 Crossover

The crossover operator in a genetic algorithm is the algorithmic analog of sexual reproduction. It produces the succeeding generation not by simply replicating the fittest members of the current generation but by mating the fittest members of the current generation to produce progeny with some of the "genetic" character of each parent. It is implemented by randomly exchanging parts of the strings representing the parents to produce descendent strings.

The crossover operator is implemented (with some given probability, $p_c$) after the reproduction operator has been invoked to select two reproducing parents. A string location is randomly selected (usually with uniform selection probability) and the portions of the parent bit-strings on one side of the randomly selected location are exchanged to produce two progeny, which are then inserted into the succeeding population. This operation is repeated until the new generation is completely populated.

The crossover operator permits strings not represented in the current population to be generated in the succeeding population. That is, certain points in the solution space which are not represented in the current generation can be present in the successor generation. But the crossover operator is applied preferentially to high performance members of the current population, so it constitutes a judicious, informed tendency toward population divergence. This is the novel feature contributed by the crossover operator.

Even with the addition of crossover, the genetic algorithm search will eventually converge to a uniform population. In general the crossover operator causes a greater portion of the search space to be explored prior to convergence to uniformity, but for a given initial population, there are still unreachable points in the solution space.
Further, even if a high performance solution is accessible from the initial population, some portion of the "gene pool" necessary to reach it can be irrevocably lost during the search evolution.
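
The reproduction and crossover operators described so far can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: bit-strings are represented as Python strings of '0' and '1', the fitness function (one plus the number of ones) is a made-up strictly positive objective, and the function names are hypothetical.

```python
import random

def select(population, fitness, rng):
    """Proportional reproduction: one independent selection trial."""
    weights = [fitness(s) for s in population]
    return rng.choices(population, weights=weights, k=1)[0]

def crossover(a, b, p_c, rng):
    """One-point crossover applied with probability p_c."""
    if len(a) > 1 and rng.random() < p_c:
        cut = rng.randint(1, len(a) - 1)   # uniformly chosen string location
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a, b

def next_generation(population, fitness, p_c, rng):
    """Fill a new population of the same (fixed) size."""
    offspring = []
    while len(offspring) < len(population):
        a = select(population, fitness, rng)
        b = select(population, fitness, rng)
        offspring.extend(crossover(a, b, p_c, rng))
    return offspring[:len(population)]

fitness = lambda s: 1 + s.count("1")       # illustrative objective, > 0
rng = random.Random(0)
population = ["1000", "1011", "1110", "1001"]
for generation in range(5):
    population = next_generation(population, fitness, 0.6, rng)
    print(generation, population)
```

Every string in the example population has a 1 in its first position, so no descendant can ever carry a 0 there. That is a small instance of the unreachable points and lost "gene pool" material discussed above.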

3.2.3 Mutation

The mutation operator is applied to each member of the successor generation created by the reproduction and crossover operators. It simply consists of randomly perturbing each descendent string with some (usually very small) perturbation probability, $p_m$. The operator exerts a diverging influence on the search algorithm, and it provides a means by which the search can, with some nonzero probability, always arrive at any point in the solution space. That is, no part of the "gene pool" is ever permanently extinguished if the mutation operator is implemented. Clearly, it is analogous to mutation in biological reproduction. Note also that if $p_m > 0$, the mutation operator precludes the algorithm from ever producing a permanently uniform population (i.e. it precludes algorithm convergence).

3.3 Building Blocks, Schemata and the Fundamental Theorem

The underlying premise of the genetic algorithm operators is that good solutions to an optimization problem over a bit-string solution space are composed of locally good substrings, and that assembling combinations of such locally good substrings is an effective way to search the space for globally good solutions. In the genetic algorithm literature, this is referred to as the building block hypothesis. For a problem to be amenable to genetic algorithm solution, this hypothesis should apply. In the genetics parlance, this hypothesis is stated as a requirement that the problem exhibit "...some but not too much epistasis" [Davi87]. The next subsection introduces an idea which helps to place this hypothesis on a more analytical basis, but the results are still incomplete.

3.3.1 Schema Defined

Let the solution space under consideration be the set of binary strings of length L (i.e. $S = \{0,1\}^L$). Then, a schema (plural schemata), designated H, is a subset of S having the property that every member of H matches at some specified set of defining bit locations.
Thus, if L = 5, then the schema H might be the set of length 5 bit-strings which match the string (1,0,1,0,0) at the bit locations indicated by $H = \{s : s = (1,*,*,0,*)\}$, in

which the asterisks indicate "don't care" bits. The bit locations at which the schema is specified are the defining locations of the schema. The order of the schema, designated by $o(H)$, is the number of its defining locations and can range from 0 to L. In this example, $o(H) = 2$. The defining length of the schema, designated $\delta(H)$, is the number of bit positions subtended by its outermost defining bit locations minus 1. In this example, the outermost defining locations are positions 1 and 4, so $\delta(H) = 4 - 1 = 3$.

For a bit-string space of length L, there are exactly $3^L$ distinct schemata. This can be readily determined by noting that the distinct schemata are selected from $\{0, 1, *\}^L$. A given string selected from the space represents exactly $2^L$ distinct schemata. This results from the fact that the string is defined at all L bit positions, and hence is selected from $\{0, 1\}^L$. The schemata of an optimization problem's search space are the building blocks from which good solutions are to be constructed.

3.3.2 Schema Processing and the Fundamental Theorem

Let the constant population size of a simple genetic algorithm be designated M. Then, each generation produced by the algorithm represents some number, N, of distinct schemata that is bounded as follows 2^

disrupting high order high performance schemata. The extent of population divergence introduced by the crossover operator is determined in part by the degree of schema diversity present in the current population. In particular, when the population becomes uniform, the crossover operator is nullified, because assembling substrings extracted from identical strings produces identical progeny.

The mutation operator also provides a disruptive mechanism which resists the converging influence of the reproduction operator. Since any schema can be produced by mutation with nonzero probability, the permanent extinction of any of the $3^L$ possible distinct schemata is precluded.

These ideas are captured in the following inequality, which is referred to in the literature as the Fundamental Theorem of Genetic Algorithms. It relates the number of copies of a particular schema in the current generation to the expected number of copies of the same schema in the succeeding generation. This inequality is derived in [Gold89a] from relatively simple probability notions. The development is not repeated here.

$$E\{m(H,k+1)\} \ge m(H,k) \times \frac{R(H)}{\bar{R}} \left[1 - p_c \times \frac{\delta(H)}{L-1} - p_m \times o(H)\right] \qquad \text{Eq. 3.2}$$

where
$m(H,k)$ = number of occurrences of schema H in the population at generation k,
$E\{\,\}$ = expected value operator,
$R(H)$ = average objective function value (> 0) of all strings in the current population which are realizations of H,
$\bar{R}$ = the average objective function value of the current population.
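
The quantities appearing in Eq. 3.2 are straightforward to compute. The following Python sketch evaluates the order, the defining length and the right-hand side of Eq. 3.2 for the example schema (1,*,*,0,*). The four-member population, the fitness function (one plus the number of ones) and the parameter values are invented for illustration, and the function names are hypothetical.

```python
def order(schema):
    """o(H): number of defining (non-'*') locations."""
    return sum(ch != "*" for ch in schema)

def defining_length(schema):
    """delta(H): span between the outermost defining locations."""
    fixed = [i for i, ch in enumerate(schema) if ch != "*"]
    return fixed[-1] - fixed[0] if fixed else 0

def matches(schema, string):
    """True if the string is a realization of the schema."""
    return all(h == "*" or h == s for h, s in zip(schema, string))

def schema_lower_bound(schema, population, fitness, p_c, p_m):
    """Right-hand side of Eq. 3.2 for the current population."""
    members = [s for s in population if matches(schema, s)]
    if not members:
        return 0.0
    r_h = sum(fitness(s) for s in members) / len(members)
    r_bar = sum(fitness(s) for s in population) / len(population)
    survival = (1.0 - p_c * defining_length(schema) / (len(schema) - 1)
                - p_m * order(schema))
    return len(members) * (r_h / r_bar) * survival

fitness = lambda s: 1 + s.count("1")          # illustrative objective, > 0
population = ["10100", "11001", "00010", "10000"]
H = "1**0*"
print(order(H), defining_length(H))
print(schema_lower_bound(H, population, fitness, p_c=0.6, p_m=0.01))
```

In this example three of the four strings realize H, their average fitness is slightly above the population average, and the disruption term reduces the bound to roughly 1.73 expected copies in the next generation.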

Equation 3.2 is an inequality because it does not consider the accretion of the schema H contributed by crossover and mutation. It only accounts for the disruptive effects of these operators. A more thorough treatment can be found on pp. 9-13 of [Gref87], but the result is too cumbersome to be of much analytical value.

Qualitatively, Eq. 3.2 suggests that low order schemata occurring in the current population contribute to succeeding generations in direct proportion to the product of their number in the current generation and their average performance relative to the other schemata competing for dominance of the same set of defining locations. Crossover and mutation tend to disrupt this converging influence, and the disruptive effect of crossover is directly proportional to the defining length of the schema in question.

In view of Eq. 3.2, the building block hypothesis might be restated as a characteristic of genetic algorithm amenable optimization problems. A GA amenable problem is one for which a near optimum solution can be achieved, with a relatively small expenditure of search effort, by assembling high performance, low order schemata into novel combinations. If the objective function is such that (nonlinear) contributions from combinations of bits spanning widely separate bit locations are appreciable (i.e. if the objective function depends heavily on large defining length schemata), then the problem is not likely to be suitable for solution by genetic algorithm. On the other hand, if the objective function depends predominantly on short defining length schemata, then sorting through promising combinations of realizations of those schemata is likely to isolate good (though not necessarily optimal) solutions. Accomplishing the required sorting efficiently is the task for which genetic algorithms are well suited.
3.4 An Assessment of the Genetic Algorithm Theoretical Foundation

The existing theoretical foundation for analysis of genetic algorithms includes the fundamental theorem of genetic algorithms (Eq. 3.2) originally enunciated by Holland [Holl75] and extended by Bridges and Goldberg [BrGo87], the Walsh function approach to computing schema fitness averages contributed by Bethke [Beth80] and

generalizations of it [Gold88, Gold89b, Br

SECTION 4
A MARKOV CHAIN MODEL OF THE SIMPLE GENETIC ALGORITHM

4.1 Overview

From the discussion of the simple genetic algorithm operators in Section 3.2, it is clear that the sequence of populations generated by the algorithm when executing on a specified combinatorial optimization problem is a stochastic process (with finite state space), and further that the conditional dependence of each population in the sequence on its predecessors is completely described by its dependence upon the immediate predecessor population. Thus, the sequence is a Markov chain (Definition A1).

In this section, a nonstationary Markov chain model of the simple genetic algorithm is developed for one, two and three-operator variants of the algorithm. The model is tailored to resemble that offered in Section 2.4.1 for simulated annealing. The one-operator genetic algorithm model implements proportional reproduction only, while the two-operator variant employs reproduction in combination with mutation. The three-operator algorithm implements reproduction, mutation and crossover. This model hierarchy is employed because it provides some degree of insight into the effect that each operator has on the nature of the state space of the resulting Markov chain.

Describing and analyzing the operation of the simple genetic algorithm is facilitated by assuming that the underlying optimization problem is defined over a bit-string solution space. This assumption is not essential and sacrifices very little generality. It is implemented throughout the following sections.

4.2 The Markov Chain Model

Let a combinatorial optimization problem be characterized by the pair (S,R) where $S = \{0,1\}^L$ and R is a strictly positive real valued reward function, and assume, with no

PAGE 38

29 loss of generality, that the problem requires maximization of R. Also, let a simple genetic algorithm designed to execute on this problem have fixed population size M, let i e S be interpreted as an unsigned integer (0 < i < 2^"1), and let a generation be represented by m = (m(0), m( 1 ),Â•Â•, m(2 1 )) where m(i) = the number of occurrences of solution i e S in the population. Thus, in the parlance of combinatorial mathematics, m is a distribution of M nondistinct objects over N = card(S) = 2^ bins [Hall67, Rior58], and the set of all such distributions, S' = {m}, is a suitable representation of the simple genetic algorithm search space. The cardinality of S' is given by N' = card(S') = ^M + 2^-1^ /"m + n-T M J '^'Â•^' M ^ ^ Since both N and M are finite, so is N'. Then, if mg e S' is selected as an initial population, the simple genetic algorithm can be represented by the quadruple (S',mo,PQ,r) where Pq is a state transition matrix (analogous to P^ of the simulated annealing model) and F = {Q^} is a finite length sequence of parameter vectors Q^ = (Pm(k), Pc(k))The algorithm parameters Pn,(k) and Pc(k) are respectively the mutation and crossover probabilities. In the following sections, the mutation probability sequence is employed in a role analogous to absolute temperature in simulated annealing, and consideration is limited hereafter to monotone nonincreasing sequences. In general, the only limitation on the crossover probability sequence is that its values are probabilities. However, in all of the following, consideration is limited to constant crossover probability sequences. The first parameter vector in V is Q,, and the final piU'ameter vector is Qf. The solution evolves as a sequence {m,j} of states m,^ e S' in which the conditional dependence of m|( + , on the sequence history is equivalent to its conditional dependence on m^, and thus the solution sequence is a Markov chain. In general, the chain is inhomogeneous (Definition A5). 
In Section 4.3 it is shown to be time-homogeneous if the parameter vectors are constant. As with the simulated annealing algorithm model, exhausting the sequence of control parameter values, $\Gamma$, signals algorithm termination, and the parameter vectors can be allowed to depend on $\{m_k\}$ provided the algorithm termination requirement is satisfied.

4.3 State Behavior of the Simple Genetic Algorithm

In each of the next three subsections, the state transition mechanism (and its effect on the nature of the solution sequence) which results from applying a specified combination of the genetic algorithm operators in the Markov chain model is examined. The first case consists of a one-operator algorithm which employs only the reproduction operator. The second is a two-operator algorithm which employs reproduction and mutation. Finally, a three-operator algorithm which includes crossover with reproduction and mutation is examined.

Although it is most natural to describe the genetic operators in the order reproduction/crossover/mutation, the course adopted in Section 3.2, the following development proceeds most instructively if mutation is included with reproduction in the two-operator algorithm and crossover is deferred to the three-operator case. This is due to the fact that the mutation operator provides the essential state space modification required to make the Markov chains of the time-homogeneous two and three-operator algorithms irreducible (Definitions A7 and A8, Theorem A1), and consequently causes them to have unique stationary distributions (Theorem A3). The one-operator algorithm (proportional reproduction only) does not satisfy the irreducibility requirement for existence of a unique stationary distribution. (Neither does the algorithm variant which employs reproduction and crossover without mutation.)

A unique stationary distribution means that the asymptotic state occupancy probability of the time-homogeneous two and three-operator algorithms is completely determined by the algorithm parameters and objective function. It is independent of the starting state (initial population).
Asymptotic independence of the starting state is a necessary (but not sufficient) condition on the zero mutation probability limit of the stationary distribution of the time-homogeneous algorithm for the inhomogeneous algorithm counterpart to avoid (asymptotically) local minima entrapment.


4.3.1 A One-Operator Algorithm (Reproduction)

In this subsection, the nature of the state transition matrix is examined for the case of no crossover or mutation (i.e. $Q_k = (0,0)$ for $0 < k \le f$). In this case, the conditional probability of selecting a solution $i \in S$ from a population described by the state vector $n \in S'$ is (i.e. proportional reproduction)

$$\forall i \in S, \forall n \in S' : P_1(i|n) = \frac{n(i) R(i)}{\sum_{j \in S} n(j) R(j)} \qquad \text{Eq. 4.2}$$

where the subscript 1 indicates that the one-operator case is under consideration. Thus, the conditional probability of the successor generation described by $m$ given that the present generation is described by $n$ is a multinomial distribution, i.e.

$$\forall m, n \in S' : P_1(m|n) = \frac{M!}{\prod_{i \in S} m(i)!} \prod_{i \in S} P_1(i|n)^{m(i)} = \binom{M}{m} \prod_{i \in S} P_1(i|n)^{m(i)} = \binom{M}{m} \prod_{i \in S} \left[ \frac{n(i) R(i)}{\sum_{j \in S} n(j) R(j)} \right]^{m(i)} \qquad \text{Eq. 4.3}$$

where again the subscript 1 distinguishes the one-operator case, where the symbol

$$\binom{M}{m} = \frac{M!}{\prod_{i \in S} m(i)!} \qquad \text{Eq. 4.4}$$

designates the indicated multinomial coefficient and where by definition

$$\left. \left[ \frac{n(i) R(i)}{\sum_{j \in S} n(j) R(j)} \right]^{m(i)} \right|_{n(i) = 0} = \begin{cases} 1 & m(i) = 0 \\ 0 & m(i) > 0. \end{cases} \qquad \text{Eq. 4.5}$$

The transition probability matrix of the Markov chain representing the one-operator algorithm is composed of the array of conditional probabilities defined by Eq. 4.3, i.e.

$$P = [P_1(m|n)]. \qquad \text{Eq. 4.6}$$
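The one-operator quantities of Eq. 4.2 through Eq. 4.5 can be illustrated with a minimal Python sketch. It is not one of the programs of Appendix D; the function names and the small two-bit reward vector used in the demonstration are hypothetical choices made only for illustration.

```python
from math import factorial


def p1_select(n, R):
    """Eq. 4.2: proportional reproduction, P1(i|n) = n(i)R(i) / sum_j n(j)R(j)."""
    total = sum(nj * Rj for nj, Rj in zip(n, R))
    return [ni * Ri / total for ni, Ri in zip(n, R)]


def multinomial_coefficient(m):
    """Eq. 4.4: M! / prod_i m(i)!, where M = sum(m)."""
    coeff = factorial(sum(m))
    for mi in m:
        coeff //= factorial(mi)  # each partial quotient is an integer
    return coeff


def transition_probability(m, p):
    """Eq. 4.3: multinomial probability of successor population m given the
    per-solution selection probabilities p.  Python evaluates 0.0 ** 0 as 1.0,
    which agrees with the convention stated in Eq. 4.5."""
    prob = float(multinomial_coefficient(m))
    for mi, pi in zip(m, p):
        prob *= pi ** mi
    return prob


if __name__ == "__main__":
    R = [1.0, 2.0, 3.0, 4.0]          # hypothetical reward values for L = 2
    mixed = (1, 0, 1, 0)              # M = 2, one copy each of solutions 0 and 2
    uniform = (0, 0, 2, 0)            # uniform population of solution 2
    print(p1_select(mixed, R))
    print(transition_probability(uniform, p1_select(mixed, R)))
    print(transition_probability(uniform, p1_select(uniform, R)))
```

The last line of the demonstration evaluates the self-transition probability of a uniform population, which is the quantity that identifies such states as absorbing in the discussion that follows.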


Since $P$ is independent of the sequence index (i.e. the parameter vectors are constant), the one-operator Markov chain is time-homogeneous (Definition A5). The states which represent uniform populations (i.e. the states $m_A \in S_A' \subset S'$ in which one component is $M$ and all others are zero) are absorbing states of the Markov chain, because for any such state, $P_1(m_A | m_A) = 1$ and Definition A6 applies. Since it follows from Eq. 4.2-4.3 that $\forall n \in S' - S_A' : P_1(n|n) < 1$, there are exactly $N = 2^L$ absorbing states. The corresponding rows of $P$ are given by

$$\forall n_A \in S_A' : P_1(m|n_A) = \begin{cases} 1 & m = n_A \\ 0 & m \in S' - \{n_A\}. \end{cases} \qquad \text{Eq. 4.7}$$

Thus, for each state $n_A \in S_A'$, the associated row of the state transition matrix (Eq. 4.6) contains 1 in the principal diagonal location and 0 elsewhere. It follows that the $N' \times 1$ probability vector $q_{n_A}$ (Definition A2) whose $n_A \in S_A'$ component is 1 is a stationary distribution (Definition A10) of the one-operator Markov chain. It is not unique because any of the $N = 2^L$ such vectors satisfies the requirement, as does any vector of the form

$$q = \sum_{n_A \in S_A'} \lambda_{n_A} q_{n_A} \quad \text{where } \lambda_{n_A} \ge 0 \text{ and } \sum_{n_A \in S_A'} \lambda_{n_A} = 1.$$

The absorbing states preclude irreducibility (Theorem A1), so the Markov chain does not satisfy the requirements of Theorem A3. The chain is aperiodic (Definition A9) however, because $\forall m \in S' : P_1(m|m) > 0$, so the period of all states is 1. Thus, all of the conditions of Theorem A3 except irreducibility are met by the one-operator Markov chain.

The expected number of transitions required to arrive in an absorbing state, $E\{k_A\}$, is finite. An upper bound on $E\{k_A\}$ is given by

$$E\{k_A\} \le \text{[expression illegible]} < \infty \qquad \text{Eq. 4.8}$$

where $R_{\min}$ and $R_{\max}$ are the extreme values of $R$. (Recall that $R$ is assumed strictly positive, so $R_{\max} \ge R_{\min} > 0$.) Eq. 4.8 can be derived by defining $p_A(k)$ as the conditional probability of arriving in the set of absorbing states, $S_A'$, on the $k$th transition given that the $k$th state is not absorbing, letting $p_{\min}$ be a lower bound on $p_A(k)$, and bounding the series for $E\{k_A\}$ as follows

$$E\{k_A\} = \sum_{k} k \, p_A(k) \prod_{l=1}^{k-1} \bigl(1 - p_A(l)\bigr) \; [\ldots]$$

[...] 0. Eq. 4.10


accomplished by expressing $P_2(i|n)$ as a sum over all $j$ of the corresponding $P_1(j|n)$ times a factor which accounts for the probability of the collection of mutation events required to transform $j$ into $i$. This probability can be expressed as $p_m^{H(i,j)} (1 - p_m)^{L - H(i,j)}$ where $H(i,j) = H(j,i)$ is the Hamming distance of the pair $i, j$. That is, $H$ is a function defined on $S \times S$ with values in $\{0, 1, 2, \ldots, L\}$. $H(i,j)$ is the number of bits which must be altered by mutation to transform $i$ into $j$ and $L - H(i,j)$ is the number of bits which must remain unaltered. Thus, $P_2(i|n)$ can be written as

$$\forall i \in S, \forall n \in S' : P_2(i|n) = \sum_{j \in S} p_m^{H(i,j)} (1 - p_m)^{L - H(i,j)} P_1(j|n) = (1 - p_m)^L \sum_{j \in S} \left( \frac{p_m}{1 - p_m} \right)^{H(i,j)} P_1(j|n) = \frac{1}{(1+\alpha)^L} \sum_{j \in S} \alpha^{H(i,j)} P_1(j|n) \qquad \text{Eq. 4.11}$$

where

$$\alpha = \frac{p_m}{1 - p_m} \qquad \text{Eq. 4.12}$$

and

$$p_m = \frac{\alpha}{1 + \alpha}. \qquad \text{Eq. 4.13}$$

For $p_m = 0$ or $p_m = 1$, Eq. 4.11 includes the indeterminate form $0^0$ in some terms. Thus, the admissible range of $p_m$ is restricted to $0 < p_m < 1$, and consequently that of $\alpha$ is $0 < \alpha < \infty$. However, cases corresponding to $p_m > 1/2 \Leftrightarrow \alpha > 1$ are of no practical interest (they are less random than the case $p_m = 1/2 \Leftrightarrow \alpha = 1$), and some of the following developments restrict consideration to the range $0 < \alpha \le 1$. Substituting Eq. 4.2 into Eq. 4.11 gives

$$P_2(i|n) = \frac{1}{(1+\alpha)^L} \times \frac{\sum_{j \in S} n(j) R(j) \alpha^{H(i,j)}}{\sum_{k \in S} n(k) R(k)}.$$

It is also straightforward to show that

$$\sum_{k \in S} n(k) R(k) = \frac{1}{(1+\alpha)^L} \sum_{j \in S} \sum_{k \in S} n(k) R(k) \alpha^{H(j,k)}.$$

Thus, $P_2(i|n)$ can be expressed as

$$P_2(i|n) = \frac{\sum_{j \in S} n(j) R(j) \alpha^{H(i,j)}}{\sum_{j \in S} \sum_{k \in S} n(k) R(k) \alpha^{H(j,k)}} = \frac{\sum_{j \in S} n(j) R(j) \alpha^{H(i,j)}}{(1+\alpha)^L \sum_{k \in S} n(k) R(k)} \qquad \text{Eq. 4.14}$$

and $P_2(m|n)$ is multinomially distributed as follows

$$\forall m, n \in S' : P_2(m|n) = \frac{M!}{\prod_{i \in S} m(i)!} \prod_{i \in S} P_2(i|n)^{m(i)} = \binom{M}{m} \prod_{i \in S} P_2(i|n)^{m(i)} = \binom{M}{m} \frac{1}{(1+\alpha)^{ML}} \prod_{i \in S} \left[ \frac{\sum_{j \in S} n(j) R(j) \alpha^{H(i,j)}}{\sum_{k \in S} n(k) R(k)} \right]^{m(i)}. \qquad \text{Eq. 4.15}$$

The transition probability matrix of the Markov chain representing the two-operator algorithm is composed of the array of conditional probabilities defined by Eq. 4.15, i.e.

$$P = [P_2(m|n)]. \qquad \text{Eq. 4.16}$$

Since the elements of $P$ depend on $\alpha$ (and hence by Eq. 4.12 on $p_m(k)$), the two-operator Markov chain is generally not time-homogeneous. It is time-homogeneous if the mutation probability is fixed.

Eq. 4.14-4.16 for the two-operator simple genetic algorithm are analogous to Eq. 4.2-4.6 for the one-operator variant except that $P_2(i|n)$ is strictly greater than zero for all $n \in S'$. Thus, the two-operator analog of Eq. 4.5 is not required. Also

$$\lim_{\alpha \to 0^+} P_2(i|n) = P_1(i|n) \quad \text{and} \quad \lim_{\alpha \to 0^+} P_2(m|n) = P_1(m|n). \qquad \text{Eq. 4.17}$$

The rows of the state transition matrix corresponding to the one-operator absorbing states have an especially simple form. Let $i_A \in S$ be the solution represented in the absorbing state $n_A \in S_A'$. Then, from Eq. 4.14,

$$P_2(i|n_A) = \frac{M R(i_A) \alpha^{H(i,i_A)}}{(1+\alpha)^L M R(i_A)} = \frac{\alpha^{H(i,i_A)}}{(1+\alpha)^L}. \qquad \text{Eq. 4.18}$$

Thus, from Eq. 4.15,

$$P_2(m|n_A) = \binom{M}{m} \frac{\alpha^{\sum_{i \in S} m(i) H(i,i_A)}}{(1+\alpha)^{ML}}. \qquad \text{Eq. 4.19}$$

Since the reward function, $R$, is strictly positive by hypothesis, and since $\forall i,j \in S : 0 \le H(i,j) \le L$, it follows that for $\alpha$ in the range $0 < \alpha \le 1$,

$$\alpha^L \sum_{j \in S} n(j) R(j) \le \sum_{j \in S} n(j) R(j) \alpha^{H(i,j)} \le \sum_{j \in S} n(j) R(j),$$

and consequently from Eq. 4.14 that

$$\forall i \in S, \forall n \in S' : \left( \frac{\alpha}{1+\alpha} \right)^L \le P_2(i|n) \le \left( \frac{1}{1+\alpha} \right)^L. \qquad \text{Eq. 4.20}$$

Using Eq. 4.20 in Eq. 4.15 yields

$$\forall m, n \in S' : \binom{M}{m} \left( \frac{\alpha}{1+\alpha} \right)^{ML} \le P_2(m|n) \le \binom{M}{m} \left( \frac{1}{1+\alpha} \right)^{ML}. \qquad \text{Eq. 4.21}$$
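The two-operator selection probability of Eq. 4.14 and the bounds of Eq. 4.20 can be checked numerically with a short Python sketch. The function names and the two-bit reward vector are hypothetical and serve only as an illustration under the restriction $0 < \alpha \le 1$; this is not the GET_P2INS program of Appendix D.

```python
def hamming(i: int, j: int) -> int:
    """Hamming distance H(i, j) between two bit-strings held as integers."""
    return bin(i ^ j).count("1")


def p2_select(n, R, alpha, L):
    """Eq. 4.14 (second form):
    P2(i|n) = sum_j n(j)R(j)alpha^H(i,j) / ((1+alpha)^L sum_k n(k)R(k))."""
    N = 2 ** L
    total = sum(n[k] * R[k] for k in range(N))
    norm = (1.0 + alpha) ** L * total
    return [
        sum(n[j] * R[j] * alpha ** hamming(i, j) for j in range(N)) / norm
        for i in range(N)
    ]


if __name__ == "__main__":
    L, R = 2, [1.0, 2.0, 3.0, 4.0]    # hypothetical reward values
    n = (1, 0, 1, 0)                  # mixed population, M = 2
    for alpha in (1.0, 0.5, 0.02):
        p = p2_select(n, R, alpha, L)
        lower = (alpha / (1.0 + alpha)) ** L   # lower bound of Eq. 4.20
        upper = (1.0 / (1.0 + alpha)) ** L     # upper bound of Eq. 4.20
        print(alpha, p, lower <= min(p), max(p) <= upper)
```

At $\alpha = 1$ every entry equals $2^{-L}$, and as $\alpha$ decreases the values approach the one-operator probabilities of Eq. 4.2, in agreement with Eq. 4.17.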

$\forall m \in S'$: (1) $q_\alpha(m) > 0$, (2) $q_\alpha^T \mathbf{1} = 1$, (3) $q_\alpha^T P = q_\alpha^T$.

Since the stationary distribution is by definition a left eigenvector of the state transition matrix (Definition A10), it follows from Eq. 4.15 and 4.16 that the asymptotic state probability distribution of the time-homogeneous two-operator algorithm is completely determined by the objective function and the algorithm parameters. It is independent of the starting state, $m_0$.

4.3.3 A Three-Operator Algorithm (Reproduction, Mutation and Crossover)

The three-operator simple genetic algorithm corresponds to the case $\forall k : Q_k = (p_m(k), p_c(k))$ with both $p_m(k)$ and $p_c(k)$ nonzero. Results analogous to Eq. 4.14-4.21 for the two-operator case are obtainable by defining a new function which is similar in character to the Hamming distance function employed in Section 4.3.2 for the two-operator case. This subsection completes that generalization. The result only reflects the crossover operation implicitly; however, it permits some very significant conclusions concerning bounding values of the three-operator conditional probabilities.

The new function, $I(i,j,k,s)$, is defined over an ordered quadruple $(i,j,k,s)$ where $i,j,k \in S$ and where $s \in \{0, 1, \ldots, L\}$ is a bit-string location. The states $i, j \in S$ represent respectively the first and second parent strings selected at a particular crossover opportunity and $k \in S$ represents a possible descendant string. The bit-string location $s$ is the location randomly selected by the crossover operator, and normally it is uniformly distributed over its range. Thus, $I$ is defined on $S \times S \times S \times \{0, 1, 2, \ldots, L\}$ and it takes on values selected from $\{0, 1\}$ depending upon whether the indicated crossover operation is or is not consistent. That is, $I$ assumes the value one if the bit-string $k$ is produced by crossing the bit-strings $i$ and $j$ at the site $s$, and zero otherwise.


In terms of this crossover operator function, the conditional probability of producing, via reproduction and crossover, a solution $k \in S$ given a current population described by $n \in S'$ is

$$P_2'(k|n) = p_c \sum_{i \in S} \sum_{j \in S} P_1(i|n) P_1(j|n) \frac{1}{L} \sum_{s} I(i,j,k,s) + (1 - p_c) P_1(k|n) = p_c \frac{1}{L} \sum_{i \in S} \sum_{j \in S} \sum_{s=1}^{L} P_1(i|n) P_1(j|n) I(i,j,k,s) + (1 - p_c) P_1(k|n) \qquad \text{Eq. 4.22}$$

where $P_1(i|n)$ is as defined in Eq. 4.2 and where $P_2'(k|n)$ refers to the two-operator algorithm consisting of reproduction and crossover without mutation. This result assumes uniformly distributed crossover site selection.

The array of conditional probabilities $[P_2'(i|n)]$ plays a role in the three-operator simple genetic algorithm closely analogous to the role played by the array $[P_1(i|n)]$ in the two-operator variant. In fact, the $[P_2'(i|n)]$ array can be used as the counterpart of Eq. 4.2 to develop results exactly analogous to Eq. 4.3 and Eq. 4.6. Further, for $n_A \in S_A'$, Eq. 4.22 reduces to

$$P_2'(k|n_A) = P_1(k|n_A), \qquad \text{Eq. 4.23}$$

and consequently this (fictitious) two-operator algorithm (reproduction and crossover) demonstrates the same sort of absorbing state behavior as the one-operator algorithm.

From Eq. 4.22, the three-operator conditional probabilities and state transition matrix are expressible as

$$P_3(i|n) = \frac{1}{(1+\alpha)^L} \sum_{j \in S} \alpha^{H(i,j)} P_2'(j|n), \qquad \text{Eq. 4.24}$$

$$P_3(m|n) = \frac{M!}{\prod_{i \in S} m(i)!} \prod_{i \in S} P_3(i|n)^{m(i)} = \binom{M}{m} \prod_{i \in S} P_3(i|n)^{m(i)} \qquad \text{Eq. 4.25}$$

and

$$P = [P_3(m|n)]. \qquad \text{Eq. 4.26}$$

These results are developed in a fashion analogous to Eq. 4.14-4.16. From them, it follows that the three-operator Markov chain is time-homogeneous if both the mutation and crossover probabilities are fixed. In general it is not time-homogeneous. From Eq. 4.22, 4.24 and 4.25, it follows that

$$\lim_{\alpha \to 0^+} P_3(i|n) = P_2'(i|n) \quad \text{and} \quad \lim_{\alpha \to 0^+} P_3(m|n) = P_2'(m|n). \qquad \text{Eq. 4.27}$$

Also, from Eq. 4.23-4.25, the three-operator analogs of Eq. 4.18-4.19 apply

$$P_3(i|n_A) = \frac{\alpha^{H(i,i_A)}}{(1+\alpha)^L}, \qquad \text{Eq. 4.28}$$

$$P_3(m|n_A) = \binom{M}{m} \frac{\alpha^{\sum_{i \in S} m(i) H(i,i_A)}}{(1+\alpha)^{ML}}. \qquad \text{Eq. 4.29}$$

Additionally, since

$$\alpha^L \sum_{j \in S} P_2'(j|n) \le \sum_{j \in S} P_2'(j|n) \alpha^{H(i,j)} \le \sum_{j \in S} P_2'(j|n),$$

and since $\sum_{j \in S} P_2'(j|n) = 1$, the three-operator analogs of Eq. 4.20-4.21 follow from Eq. 4.24-4.25, i.e.

$$\forall i \in S, \forall n \in S' : \left( \frac{\alpha}{1+\alpha} \right)^L \le P_3(i|n) \le \left( \frac{1}{1+\alpha} \right)^L, \qquad \text{Eq. 4.30}$$

$$\forall m, n \in S' : \binom{M}{m} \left( \frac{\alpha}{1+\alpha} \right)^{ML} \le P_3(m|n) \le \binom{M}{m} \left( \frac{1}{1+\alpha} \right)^{ML}. \qquad \text{Eq. 4.31}$$
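Eq. 4.22 and Eq. 4.24 can be illustrated with the following Python sketch. It rests on an assumption that is not stated in this section: the indicator $I(i,j,k,s)$ is taken to be one exactly when $k$ consists of the leading $s$ bits of $i$ followed by the trailing $L - s$ bits of $j$, with $s$ ranging over $1, \ldots, L$. The actual crossover convention is fixed by Section 3.2 and may differ. All function names and the sample population are hypothetical.

```python
def hamming(i: int, j: int) -> int:
    """Hamming distance H(i, j) between two bit-strings held as integers."""
    return bin(i ^ j).count("1")


def p1_select(n, R):
    """Eq. 4.2: proportional reproduction."""
    total = sum(nj * Rj for nj, Rj in zip(n, R))
    return [ni * Ri / total for ni, Ri in zip(n, R)]


def crossover_child(i: int, j: int, s: int, L: int) -> int:
    """ASSUMED convention: leading s bits from parent i, trailing L - s bits
    from parent j, so that I(i, j, k, s) = 1 iff k == crossover_child(i, j, s, L)."""
    low_mask = (1 << (L - s)) - 1
    return (i & ~low_mask) | (j & low_mask)


def p2prime_select(p1, pc, L):
    """Eq. 4.22: reproduction followed by crossover with probability pc,
    crossover site uniformly distributed over s = 1, ..., L."""
    N = 2 ** L
    out = [(1.0 - pc) * p1[k] for k in range(N)]
    for i in range(N):
        for j in range(N):
            w = pc * p1[i] * p1[j] / L
            if w == 0.0:
                continue
            for s in range(1, L + 1):
                out[crossover_child(i, j, s, L)] += w
    return out


def p3_select(p2prime, alpha, L):
    """Eq. 4.24: mutation applied to the reproduction/crossover distribution."""
    N = 2 ** L
    norm = (1.0 + alpha) ** L
    return [
        sum(alpha ** hamming(i, j) * p2prime[j] for j in range(N)) / norm
        for i in range(N)
    ]


if __name__ == "__main__":
    L, R = 2, [1.0, 1.0, 1.0, 1.0]            # hypothetical flat reward
    p1 = p1_select((1, 0, 0, 1), R)           # population {00, 11}, M = 2
    p2p = p2prime_select(p1, 1.0, L)
    print(p2p)                                # 01 and 10 appear without mutation
    print(p3_select(p2p, 0.02, L))
```

Under the assumed convention, the mixed population $\{00, 11\}$ assigns nonzero probability to the strings 01 and 10 even with no mutation, whereas a uniform population reproduces Eq. 4.23.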

All of the state space characteristics described in 4.3.2 for the two-operator algorithm follow. In particular, the Markov chain of the three-operator algorithm is irreducible. Thus, a unique stationary distribution exists for the time-homogeneous three-operator simple genetic algorithm, and as in the two-operator case it is completely determined by the objective function and the algorithm parameter values.

4.3.4 Summary

The asymptotic behavior of the one-operator simple genetic algorithm is dominated by the states which correspond to uniform populations, the one-operator absorbing states. The algorithm arrives at some member of the absorbing state set in a finite expected number of algorithm iterations (Eq. 4.8). The asymptotic probability distribution depends upon the algorithm initial population, $m_0$. This observation is equivalent to the fact, established in Section 4.3.1, that the stationary distribution of the one-operator algorithm is not unique.

A unique stationary distribution exists for the time-homogeneous two and three-operator algorithm variants (with $\alpha > 0$), or equivalently, their asymptotic probability distributions are independent of $m_0$. However, in the $\alpha \to 0^+$ limit, both the two and three-operator algorithms degenerate into the absorbing state behavior which typifies the one-operator case (Eq. 4.17 and Eq. 4.23, 4.27). A very important question is whether the unique stationary distributions of the two and three-operator algorithms approach limits as $\alpha \to 0^+$. Section 7 answers that question affirmatively, and in Section 8, the lower bounds reflected in Eq. 4.21 and Eq. 4.31 are employed to arrive at a monotone decreasing sequence bound on $p_m(k)$ sufficient to guarantee that the limiting distribution is achieved (asymptotically) by the inhomogeneous two and three-operator Markov chains.

The analogous conditional probability arrays $[P_1(i|n)]$ and $[P_2'(i|n)]$, whose elements are defined by Eq. 4.2 and Eq. 4.22 respectively, play an essential role in the following sections, especially in Section 9. Most of the results developed hereafter apply equally to the two and three-operator algorithm variants by substituting from these conditional probability arrays appropriately. Thus, in much of the following, the notation modifiers are suppressed, so that the elements of either of these arrays are denoted by $P(i|n)$, with the specific array reference being determined by context.


SECTION 5
SOME EMPIRICAL RESULTS

5.1 Overview

This section reports the results of some computer simulations based upon the genetic algorithm Markov chain model developed in Section 4. Their purpose is to help fix some of the state space and asymptotic probability distribution ideas which are central features of this work. The results reported here are separated into four subsections. Section 5.2 concerns enumeration of the state space, $S'$. Section 5.3 is devoted to generation of reward function data, which are subsequently used in the two remaining subsections. Section 5.4 illustrates the behavior of some selected conditional probabilities as a function of the algorithm control parameter, $\alpha$. The results of the primary simulation task are reported in Section 5.5. They concern computation of the three-operator stationary distribution at extremely low (approaching zero) values of the mutation probability control parameter.

One of the significant theoretical results developed in subsequent sections is suggested by the data presented in Section 5.5. It is that the zero mutation probability limiting stationary distribution provides nonzero probability for all states corresponding to uniform populations (i.e. one-operator absorbing states), including those which represent suboptimal solutions. This result poses a complication for the attempt to extrapolate the simulated annealing convergence theory onto the genetic algorithm, as discussed further in Section 5.5.

All simulation results included here were generated on the Cray Y-MP computer at the Eglin AFB, FL, Computer Science Directorate. The data presented in Section 5.5 concerning the primary simulation task (the converged limiting stationary distribution results) include some CPU utilization statistics which reflect the approximately 180 hours of CPU time expended in generating those data. The source program listings for the programs employed in generating the results of this section are included in Appendix D.

5.2 State Space Enumeration

The results appearing in this section are of two primary types. The first is a table of computed state space cardinality values, $N'$, at a variety of combinations of bit-string length, $L$, and population size, $M$. These results are products of the program GET_NPS.F appearing in Appendix D. It implements Eq. 4.1. The results are collected in Table 5-1. In addition to the $N'$ column, Table 5-1 includes a similar column labeled $N''$. It denotes the cardinality of a space designated $S''$ which is related to $S'$ and whose significance is established in Section 9. Its cardinality is given by

$$N'' = \text{[expression illegible]} \qquad \text{Eq. 5.1}$$

The data recorded in column $N''$ of Table 5-1 are computed from this equation by the program GET_NPS.F.

Table 5-1 State Space Cardinality [table body not recovered]

Table 5-4 $S'$ at $M = 2$, $L = 3$ [table body not recovered]
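The state space cardinality of Eq. 4.1 and an explicit enumeration of $S'$ can be sketched in Python as follows. This is an illustrative stand-in, not the GET_NPS.F program of Appendix D, and the function names are hypothetical. The value $N' = 54264$ at $L = 4$, $M = 6$ is the one quoted in Section 5.4.

```python
from math import comb


def state_space_cardinality(L: int, M: int) -> int:
    """Eq. 4.1: N' = C(M + 2^L - 1, M), the number of ways to distribute
    M nondistinct objects over N = 2^L bins."""
    N = 2 ** L
    return comb(M + N - 1, M)


def enumerate_states(N: int, M: int):
    """Yield every occupancy vector m of length N whose components sum to M."""
    if N == 1:
        yield (M,)
        return
    for first in range(M, -1, -1):
        for rest in enumerate_states(N - 1, M - first):
            yield (first,) + rest


if __name__ == "__main__":
    print(state_space_cardinality(4, 6))            # four-bit problem, M = 6
    states = list(enumerate_states(2 ** 3, 2))      # S' at M = 2, L = 3
    print(len(states), state_space_cardinality(3, 2))
    print(states[:3])
```

Explicit enumeration is practical only for very small $L$ and $M$, since $N'$ grows combinatorially in both parameters.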


In both data sets, the solution state which maximizes the reward value is the $i \in S$ represented by the decimal integer value 12. That is, for the four-bit function, $i_{opt} = 1100$, and its five-bit counterpart is $i_{opt} = 01100$. The reward function value for the arbitrary $i \in S$ is then computed by assigning the value 1 for each length 0, 1 or 2 schema (Section 3.3.1) in agreement with the optimum bit pattern and summing the contributions. Thus, for example, for the four-bit reward function, the bit-string 0000 has function value 4, generated by summing the contributions from the single matching length 0 schema, two matching length 1 schemata and the one matching length 2 schema. A strictly positive reward function is guaranteed since every string matches the single length 0 schema.

Figure 5-1 Four-Bit Reward Function ($R(i)$ versus $i$)

Figure 5-2 Five-Bit Reward Function ($R(i)$ versus $i$)

5.4 Conditional Probabilities Versus $\alpha$

The following four figures present plots of two and three-operator conditional probabilities at two selected current states, $n$. These results are computed from Eq. 4.14 and Eq. 4.24. The plots are generated for the four-bit problem with reward function given in Figure 5-1 and with $M = 6$. From Table 5-1, the cardinality of $S'$ for these examples is $N' = 54264$. The conditional probabilities are provided at two selected $n$ vectors, one representing the uniform population $n = (6000000000000000)$ and one the mixed population state $n = (2000010001002000)$, and at three values of the mutation probability parameter. The two and three-operator results are respectively products of the computer programs GET_P2INS and GET_P3INS provided in Appendix D.


The purpose of the tests from which these data are produced is verification of the computer algorithms (and the implementing subprograms) employed to generate the conditional probability calculations required by the primary simulation task reported in Section 5.5. Thus, for example, all conditional probability distributions are uniform at $\alpha = 1$ as is required by Eq. 4.14 and Eq. 4.24, and for $\alpha \to 0^+$ all conditional probability distributions approach the one-operator counterparts as is required by Eq. 4.17 and Eq. 4.27. Also, the two and three-operator conditional probabilities are identical for the uniform population case (Figures 5-3 and 5-5) as required by Eq. 4.18 and 4.28, and the three-operator mixed population state case allows generation of solutions not present in the current population even in the zero mutation probability limit.

Figure 5-3 $P_2(i|n)$ at $n = (6000000000000000)$

Figure 5-4 $P_2(i|n)$ at $n = (2000010001002000)$

Figure 5-5 $P_3(i|n)$ at $n = (6000000000000000)$

Figure 5-6 $P_3(i|n)$ at $n = (2000010001002000)$

5.5 Converged Limiting Stationary Distributions

The following data represent converged three-operator stationary distribution results for both four and five-bit problems at a variety of population sizes. The results recorded in Figures 5-7 through 5-16 are products of the computer program GET_3STAT.F included in Appendix D. They are obtained by repeatedly multiplying a current state probability vector by the three-operator state transition matrix until a termination criterion representing approximate convergence is attained. The starting probability vector is the multinomial distribution corresponding to a uniformly distributed $P_3(i|n)$ array, and the termination criterion is that the sum of the probabilities for all nonuniform population states is less than 0.004.

All of the results reported here are for extremely small $\alpha$ (approaching zero) and thus, as predicted by the model, only the states corresponding to uniform populations (one-operator absorbing states) have nonzero probability. Consequently, only the final probabilities for the uniform population states are displayed in Figures 5-7 through 5-16, with each such state indexed by the decimal integer value corresponding to the solution represented.

Table 5-5 summarizes the Cray Y-MP computer resources expended in generating these data. Tabulated there are the number of vector multiplications (of dimension $N'$) required to attain the termination condition and the CPU time utilized. The CPU time is in seconds, rounded to the nearest integer. The tabulated data are collected from the log files generated in the computer runs which produced the stationary distribution data for Figures 5-7 through 5-16.

The limiting distribution entropy results in Figures 5-17 and 5-18 are computed from the converged stationary distributions. The results are recorded in bits and are plotted as a function of population size.

A very significant result suggested by the limiting stationary distribution data is that the $\alpha \to 0^+$ value of the stationary distribution is nonzero for all possible uniform states. This behavior, which is confirmed by theoretical results developed in Section 7, precludes extrapolation of the simulated annealing global optimality convergence result onto the genetic algorithm. However, as suggested by the data plotted in Figures 5-17 and 5-18, it may be possible to approach the desired limiting behavior as closely as required by adjusting the population size parameter. Those figures indicate that for sufficiently large values of the population size parameter, the limiting distribution is dominated by optimal solutions, and that the limiting distribution entropy decreases monotonically with increasing population size. Results developed in Section 9 reinforce this premise.
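The repeated vector-matrix multiplication described above can be sketched in Python for a problem small enough to enumerate. The sketch is not GET_3STAT.F: it uses a hypothetical two-bit reward vector, a moderate value of $\alpha$ rather than one approaching zero, a uniform starting vector, and a termination test on successive iterates instead of the criterion used for Figures 5-7 through 5-16. The crossover convention (leading $s$ bits from the first parent, remainder from the second) is likewise an assumption.

```python
from math import factorial


def hamming(i, j):
    return bin(i ^ j).count("1")


def enumerate_states(N, M):
    """Yield every occupancy vector of length N whose components sum to M."""
    if N == 1:
        yield (M,)
        return
    for first in range(M, -1, -1):
        for rest in enumerate_states(N - 1, M - first):
            yield (first,) + rest


def multinomial_probability(m, p):
    """Eq. 4.25 form: multinomial probability of population m given p."""
    coeff = factorial(sum(m))
    for mi in m:
        coeff //= factorial(mi)
    prob = float(coeff)
    for mi, pi in zip(m, p):
        prob *= pi ** mi
    return prob


def p3_select(n, R, alpha, pc, L):
    """Eq. 4.2, 4.22 and 4.24 chained; crossover convention is ASSUMED."""
    N = 2 ** L
    total = sum(n[i] * R[i] for i in range(N))
    p1 = [n[i] * R[i] / total for i in range(N)]
    p2p = [(1.0 - pc) * p1[k] for k in range(N)]
    for i in range(N):
        for j in range(N):
            for s in range(1, L + 1):
                low = (1 << (L - s)) - 1
                p2p[(i & ~low) | (j & low)] += pc * p1[i] * p1[j] / L
    norm = (1.0 + alpha) ** L
    return [sum(alpha ** hamming(i, j) * p2p[j] for j in range(N)) / norm
            for i in range(N)]


def transition_matrix(R, alpha, pc, L, M):
    """Rows are indexed by the current state n, columns by the successor m."""
    states = list(enumerate_states(2 ** L, M))
    P = []
    for n in states:
        p3 = p3_select(n, R, alpha, pc, L)
        P.append([multinomial_probability(m, p3) for m in states])
    return states, P


def stationary_distribution(P, tol=1e-12, max_iter=100000):
    """Iterate q <- q P from a uniform start until successive iterates agree."""
    size = len(P)
    q = [1.0 / size] * size
    for _ in range(max_iter):
        nxt = [sum(q[r] * P[r][c] for r in range(size)) for c in range(size)]
        if max(abs(a - b) for a, b in zip(nxt, q)) < tol:
            return nxt
        q = nxt
    return q


if __name__ == "__main__":
    states, P = transition_matrix([1.0, 2.0, 3.0, 4.0], 0.05, 0.6, 2, 2)
    q = stationary_distribution(P)
    for state, prob in zip(states, q):
        if max(state) == sum(state):      # uniform populations only
            print(state, round(prob, 4))
```

For small $\alpha$ most of the probability mass is expected to collect on the uniform population states, which is why only those states are plotted in the figures of this section.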

Figure 5-7 Limiting Stationary Distribution at M=2, L=4 ($q(i)$ versus $i$)

Figure 5-8 Limiting Stationary Distribution at M=3, L=4 ($q(i)$ versus $i$)

Figure 5-9 Limiting Stationary Distribution at M=4, L=4 ($q(i)$ versus $i$)

Figure 5-10 Limiting Stationary Distribution at M=5, L=4 ($q(i)$ versus $i$)

Figure 5-11 Limiting Stationary Distribution at M=6, L=4 ($q(i)$ versus $i$)

Figure 5-12 Limiting Stationary Distribution at M=7, L=4 ($q(i)$ versus $i$)

Figure 5-13 Limiting Stationary Distribution at M=2, L=5 ($q(i)$ versus $i$)

Figure 5-14 Limiting Stationary Distribution at M=3, L=5 ($q(i)$ versus $i$)

Figure 5-15 Limiting Stationary Distribution at M=4, L=5 ($q(i)$ versus $i$)

Figure 5-16 Limiting Stationary Distribution at M=5, L=5 ($q(i)$ versus $i$)

Table 5-5 CPU Utilization Statistics [table body not recovered]

Figure 5-17 Limiting Distribution Entropy vs Population Size (Four-Bit Problem) ($H$ versus $M$)

Figure 5-18 Limiting Distribution Entropy vs Population Size (Five-Bit Problem) ($H$ versus $M$)

SECTION 6
THE CRAMER'S RULE FORMULATION OF THE STATIONARY DISTRIBUTION

6.1 Overview

In Sections 4.3.2 and 4.3.3, the time-homogeneous two and three-operator simple genetic algorithm Markov chains are shown to possess unique stationary distributions. Those conclusions are established by invoking Theorem A3, which asserts that in each case the stationary distribution is a left eigenvector of the state transition matrix and that the additional constraint that it be a probability vector (Definition A2) makes the solution unique. In this section, the existence and uniqueness arguments are refined into a Cramer's rule formulation of the solution. This development concerns the time-homogeneous algorithms only, with $\alpha$ constrained to $\alpha > 0$, and it appeals heavily to the foundation provided in Appendix B.

The product of this development is an expression for the components of the stationary distribution vector as rational functions generated from the characteristic polynomials of matrices derived from the state transition matrix. The derived matrices are generated by setting selected rows of $P$ to zero. The utility of the approach is that the form of $P$ suggests a mechanism for expressing the values of the characteristic polynomials. Some key intermediate parts of the required methodology are developed in Section 9, but the effort stops short of explicit solution. However, some very significant conclusions concerning the asymptotic behavior of the algorithm are obtainable (Sections 7 and 8) from the results developed here without explicitly solving the system.

6.2 The Stationary Distribution Description

As established in Section 4, implementation of the mutation operator with nonzero mutation probability (i.e. $\alpha > 0$) implies that for both the two and three-operator algorithms, $\forall m, n \in S' : P(m|n) > 0$. Thus, by Definition B1, $P$ is primitive ($P^k > 0$ for any integer $k \ge 1$). Hence, from Section B.3, the stationary distribution of the two and three-operator simple genetic algorithm exists, is unique and is a left eigenvector of the state transition matrix corresponding to eigenvalue 1, i.e.

$$q_\alpha^T P = q_\alpha^T$$

or equivalently

$$q_\alpha^T (P - I) = 0^T. \qquad \text{Eq. 6.1}$$

The following proposition establishes a significant fact concerning the rank of the matrix $(P - I)$ in Eq. 6.1.

Proposition 6.1: The rank of the matrix $(P - I)$ in Eq. 6.1 is exactly $N' - 1$ where $N' = \mathrm{card}(S')$ is the dimension of $P$.

This result follows from Theorem B4(f). Its significance is that exactly one column of the system of equations in Eq. 6.1 can be replaced without sacrificing any of the constraints which Eq. 6.1 imposes on $q_\alpha$. Proposition 6.2 below concerns such a modification of the system. The modification consists of replacing any column (e.g. the column indexed by $n \in S'$) of Eq. 6.1 by a column corresponding to the constraint

$$\sum_{m \in S'} q_\alpha(m) = q_\alpha^T \mathbf{1} = 1, \qquad \text{Eq. 6.2}$$

thus producing a system of the form

$$q_\alpha^T (P - I)_n = e_n^T \qquad \text{Eq. 6.3}$$

where $(P - I)_n$ is generated by replacing the column of $(P - I)$ indexed by $n \in S'$ with the vector $\mathbf{1}$ whose components all have the value 1, and where $e_n^T$ is the row vector containing 1 in column $n$ and 0's elsewhere.


Proposition 6.2: If the constraint described in Eq. 6.2 is used to replace any column (e.g. column $n$) of the system in Eq. 6.1, the resulting system (Eq. 6.3) is full rank, or equivalently, $|(P - I)_n| \ne 0$.

Since $P$ is a stochastic matrix (Definition A3), the system of equations in Eq. 6.1 can be transformed into an equivalent system in which the column indexed by the arbitrary column index $n \in S'$ is represented by the equation $q_\alpha^T 0 = 0$. The required transformation is obtainable by replacing column $n$ by the sum of all columns $m \in S'$, and thus any $n \in S'$ is a candidate for replacement. Proposition 6.2 is then a restatement of Proposition B2 in terms of the determinant of the matrix of the modified system. It is the essential condition for justification of the following proposition.

Proposition 6.3: The components of the stationary distribution can be expressed in the form

$$q_\alpha(m) = \frac{|(P - I)_n^m|}{|(P - I)_n|}$$

where $(P - I)_n^m$ is derived from $(P - I)_n$ by replacing the row of $(P - I)_n$ indexed by $m \in S'$ with the row vector $e_n^T$.

This result is simply an application of Cramer's Rule to the solution of the system in Eq. 6.3. It applies because $|(P - I)_n| \ne 0$ is assured by Proposition 6.2. The equality defined in Proposition 6.3 can be evaluated without computing $|(P - I)_n|$ directly, as suggested by the following proposition.

Proposition 6.4: The denominator determinant in Proposition 6.3 can be written as

$$|(P - I)_n| = \sum_{m \in S'} |(P - I)_n^m|.$$


This result follows from application of elementary column operations on column $n$ of $|(P - I)_n|$ and employing the definition of $|(P - I)_n^m|$. The essential step is noting that the cofactor of each of the (unit) elements in column $n$ is equal to the corresponding $|(P - I)_n^m|$.

Since the numerator determinant defined in Proposition 6.3 is generated from $(P - I)_n$ by replacement of row $m$ by the row vector $e_n^T$, its value is the cofactor of the (unit) element in row $m$ and column $n$. As indicated in the following proposition, it is equal to the determinant which results from the corresponding row replacement in $(P - I)$.

Proposition 6.5: The numerator determinant defined in Proposition 6.3 can be written as

$$|(P - I)_n^m| = |(P - I)^{(m,n)}|$$

where $(P - I)^{(m,n)}$ is defined as the matrix which results from $(P - I)$ by replacing the row indexed by $m$ with the row vector $e_n^T$.

Next, note that if $m = n$, then $|(P - I)^{(m,n)}|$ can be written as

$$|(P - I)^{(m,m)}| = -|P_m^{(0)} - I|$$

where $P_m^{(0)}$ is defined as the matrix which results by replacing row $m$ of $P$ by the zero row vector. If $m \ne n$, linearity of the determinant in row $m$ gives

$$|(P - I)^{(m,n)}| = |P_m^{(n)} - I| - |P_m^{(0)} - I|$$

where $P_m^{(0)}$ is defined as before and where $P_m^{(n)}$ is defined as the matrix which results by replacing row $m$ of $P$ by the row vector $e_n^T$. This result can be further reduced by noting that the row replacement by which $P_m^{(n)}$ is generated from $P$ preserves the row sum constraint (i.e. $P_m^{(n)}$ is a stochastic matrix). Thus, 1 is an eigenvalue of $P_m^{(n)}$ (Definition A3), from which it follows that $|P_m^{(n)} - I| = 0$. Consequently, the $m = n$ and $m \ne n$ cases can be assembled as indicated by the following proposition.

Proposition 6.6: The determinant $|(P - I)^{(m,n)}|$ defined in Proposition 6.5 can be written as

$$|(P - I)^{(m,n)}| = -|P_m^{(0)} - I|.$$

By collecting the results of Propositions 6.3-6.6 and noting that the superscript in $P_m^{(0)}$ is now superfluous, the components of the stationary distribution can be written as indicated in the following proposition.

Proposition 6.7: The components of the stationary distribution can be expressed in the form

$$q_\alpha(m) = \frac{-|P_m - I|}{\sum_{n \in S'} -|P_n - I|} = \frac{|P_m - I|}{\sum_{n \in S'} |P_n - I|}$$

where $P_m$ and $P_n$ are derived from $P$ by replacing the rows indexed by $m$ and $n$ respectively with the zero row vector $0^T$.

Thus, computing the stationary distribution components reduces to evaluating the characteristic polynomials of the $P_m$'s at $\lambda = 1$ (i.e. $\phi_m(\lambda) = |P_m - \lambda I| \Rightarrow |P_m - I| = \phi_m(1)$). Also, since 1 is an eigenvalue of $P$ it follows that $\phi(1) = |P - I| = 0$, which suggests the following alternative to Proposition 6.7. Its usefulness is established in Sections 9.3-9.4.

PAGE 80

71 Proposition 6.8: The components of the stationary distribution can be expressed in the alternative form _ IP-II-IP5-II qu(m) = I(|P-I|-|P5-I|) nÂ€ S' where as before P^ and P^ are derived from P by replacing the rows indexed by m and n respectively with the row vector . 6.3 Positivity of the Stationary Distribution Components Strict positivity of the stationary distribution components can be deduced from Theorem B4 and the form of P^. Every element of P^; in every row other than row m is identical to the corresponding element of P, while those in row m are zero. This is expressed in the nonnegative matrix notation of Appendix B as < P^ ^ P. Further, since P5; and P differ in row m, P^ ^ P, and consequently by Theorem B4(e), every eigenvalue of Ps satisfies | ^.J < 1 . It follows that for >. > 1 , ( 1 ) | P^ >.I| = U{\ A.) ^t and (2) the algebraic sign of | Ps^I| is (-l)"^ for all m e S'. Specializing these arguments to the case A, = 1 yields the following proposition. Proposition 6.9: For all a > 0, the value of the determinant | P^ 1| satisfies ViHe S': (1) IP^-Tl ^^Oand (2) the algebraic sign of I Phj-I| is (-if. An immediate consequence of Proposition 6.9 is that both numerator and denominator of the expression for qÂ„(m) in Proposition 6.7 are nonzero and have identical algebraic sign. Strict positivity of the stationary distribution components follows from these observations. That is, Vm e S' : q,,(m) > 0.
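The determinant expressions of Propositions 6.7 and 6.9 can be checked numerically on a small example. The following is a minimal sketch, not part of the original development: it assumes a randomly generated, strictly positive 5-state stochastic matrix (a hypothetical stand-in for the genetic algorithm transition matrix), and the function names are illustrative only. It compares the determinant ratio against a direct solution of the stationary equations with one column replaced by the normalization constraint.

```python
import numpy as np


def stationary_by_determinants(P):
    """Return (q, dets), where dets[m] = |P_m - I| with row m of P zeroed
    and q(m) = dets[m] / sum(dets), the form stated in Proposition 6.7."""
    size = P.shape[0]
    identity = np.eye(size)
    dets = np.empty(size)
    for m in range(size):
        P_m = P.copy()
        P_m[m, :] = 0.0                      # replace row m by the zero row
        dets[m] = np.linalg.det(P_m - identity)
    return dets / dets.sum(), dets


def stationary_by_linear_solve(P):
    """Solve q^T (P - I) = 0 with one column replaced by the constraint sum(q) = 1."""
    size = P.shape[0]
    A = P - np.eye(size)
    A[:, -1] = 1.0                           # constraint column (any column may be used)
    rhs = np.zeros(size)
    rhs[-1] = 1.0
    return np.linalg.solve(A.T, rhs)         # q^T A = e_n^T  <=>  A^T q = e_n


# Hypothetical test matrix: strictly positive rows normalized to sum to 1.
rng = np.random.default_rng(0)
P = rng.random((5, 5)) + 0.05
P /= P.sum(axis=1, keepdims=True)

q_det, dets = stationary_by_determinants(P)
q_ref = stationary_by_linear_solve(P)
print("determinant form matches linear solve:", np.allclose(q_det, q_ref))
print("all components positive:", bool(np.all(q_det > 0)))
print("common sign of the determinants:", np.sign(dets))
```

Because the example matrix has odd dimension, the determinants should all carry the sign $(-1)^{N'} = -1$, which is the behavior asserted in Proposition 6.9; the ratio is nevertheless positive.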


6.4 The Indeterminate Form at $\alpha = 0$

All of the results established in this section assume that the mutation probability parameter is strictly positive ($\alpha > 0$), and thus are not applicable at $\alpha = 0$. The reason is apparent when Eq. 4.7 and the two-operator result in Eq. 4.17 (or the three-operator counterparts of Eq. 4.17 given by Eq. 4.23 and 4.27) are applied to $|P_{\bar m} - I|$. It follows that the row of the $\alpha \to 0^+$ limit of $|P_{\bar m} - I|$ corresponding to the one-operator absorbing state $n_A \in S_A'$, $n_A \neq m$, is identically zero, so that the determinants in Propositions 6.7 and 6.8 vanish and the expressions for $q_\alpha(m)$ take the indeterminate form $0/0$.

SECTION 7 THE ZERO MUTATION PROBABILITY STATIONARY DISTRIBUTION LIMIT

7.1 Overview

In Section 4.3.1, it is established that the time-homogeneous one-operator genetic algorithm Markov chain possesses a stationary distribution but that it is not unique. In Sections 4.3.2 and 4.3.3, it is established that the time-homogeneous two and three-operator counterparts possess unique stationary distributions provided $\alpha > 0$, and Section 6 formulates the existence and uniqueness argument into a rational function expression for the unique solution. Since the two-operator state transition matrix approaches its one-operator counterpart as $\alpha \to 0^+$ (Eq. 4.17) and since the three-operator algorithm exhibits the corresponding behavior with respect to the $P_2'(i \mid n)$'s (Eq. 4.23), a question which naturally arises from these observations is whether an $\alpha \to 0^+$ limiting distribution exists for the two and three-operator algorithms. (If such a limit exists, then it is necessarily unique.) This section answers that question affirmatively and also confirms the observation made in Section 5.5 that the limiting distribution is nonzero for all states corresponding to uniform populations (absorbing states).

The approach taken here is to transform the expressions for $q_\alpha(m)$ in Propositions 6.7 and 6.8 into equivalent expressions which yield determinate forms at $\alpha = 0$. The result requires transforming $P$ and $P_{\bar m}$ into related matrices but with the states corresponding to uniform populations (one-operator absorbing states) coalesced into adjacent nonuniform population states. The development is tedious and involves some additional notation.

7.2 Functional Form of the Stationary Distribution

Before proceeding with the limiting case development which is the primary purpose of this section, it is convenient to establish some intermediate results concerning the behavior of $q_\alpha$ as a function of $\alpha$. These results follow from the results developed in Section 6 and some simple observations about the form of the elements of $P$. From Eq. 4.14-4.16 and Eq. 4.22-26, all elements of the state transition matrix are rational functions of $\alpha$ with denominator polynomial $(1+\alpha)^{ML}$. Thus, for $\alpha > 0$

$$(1+\alpha)^{MLN'} |P_{\bar m} - I| = |(1+\alpha)^{ML} P_{\bar m} - (1+\alpha)^{ML} I| = |Q_{\bar m} - (1+\alpha)^{ML} I|$$

where every element of $Q_{\bar m}$ (and hence the value of $|Q_{\bar m} - (1+\alpha)^{ML} I|$) is a polynomial in $\alpha$. Further, since row $m$ in $Q_{\bar m}$ is zero, the polynomial value of the determinant includes the factor $(1+\alpha)^{ML}$. Consequently

$$(1+\alpha)^{ML(N'-1)} |P_{\bar m} - I| = \theta_{\bar m}(\alpha) \tag{7.1}$$

for $\theta_{\bar m}(\alpha)$ some polynomial function of $\alpha$. Proposition 7.1 below follows.

Proposition 7.1: For all $\alpha > 0$, the value of the determinant $|P_{\bar m} - I|$ is a rational function of $\alpha$ with nonzero denominator polynomial $(1+\alpha)^{ML(N'-1)}$.

By applying Eq. 7.1 to Proposition 6.7, the components of $q_\alpha$ can be written as

$$q_\alpha(m) = \frac{\theta_{\bar m}(\alpha)}{\sum_{n \in S'} \theta_{\bar n}(\alpha)} = \frac{\theta_{\bar m}(\alpha)}{\theta(\alpha)} . \tag{7.2}$$

Hence, the $q_\alpha(m)$ are rational functions of $\alpha$, and since a rational function is continuous wherever its denominator polynomial is nonzero, application of Proposition 6.9 and Eq. 7.1 (which together establish that $\theta(\alpha) = \sum_{n} \theta_{\bar n}(\alpha) \neq 0$) to Eq. 7.2 yields the following.

Proposition 7.2: For all $\alpha > 0$, the components of $q_\alpha$ are continuous rational functions of the independent variable $\alpha$.

Further, differentiation of Eq. 7.2 with respect to $\alpha$ yields a rational function of $\alpha$

$$\frac{dq_\alpha(m)}{d\alpha} = \frac{1}{\theta(\alpha)^2} \left[ \theta(\alpha) \frac{d\theta_{\bar m}(\alpha)}{d\alpha} - \theta_{\bar m}(\alpha) \frac{d\theta(\alpha)}{d\alpha} \right] \tag{7.3}$$

with nonzero denominator polynomial $\theta(\alpha)^2$. The following proposition is a consequence.

Proposition 7.3: For all $\alpha > 0$, the components of the first derivative of $q_\alpha$ with respect to $\alpha$ are continuous rational functions of $\alpha$.

7.3 The Absorbing State Rows of $|P - I|$ and $|P_{\bar m} - I|$

The rows corresponding to one-operator absorbing states in the determinant $|P - I|$ have a particularly simple form. The nondiagonal elements of row $n_A \in S_A'$, which represents a uniform population of solutions $i_A \in S$, are given by Eq. 4.19 and 4.29 respectively for the two and three-operator cases. The principal diagonal element is obtained by evaluating Eq. 4.19 or 4.29 at $m = n_A$ and subtracting 1. Thus,

$$P(n_A \mid n_A) - 1 = \frac{1}{(1+\alpha)^{ML}} - 1 = \frac{1 - \left[ 1 + ML\alpha + ML(ML-1)\alpha^2/2 + \cdots + \alpha^{ML} \right]}{(1+\alpha)^{ML}} = \frac{-ML\alpha + O(\alpha^2)}{(1+\alpha)^{ML}} ,$$

and if the general element in $|P - I|$ is denoted by $T(m \mid n)$, then the elements of row $n_A$ can be written as

$$\forall n_A \in S_A' : \quad T(n \mid n_A) = \begin{cases} \dfrac{-ML\alpha + O(\alpha^2)}{(1+\alpha)^{ML}} & n = n_A \\[2ex] \dbinom{M}{n} \dfrac{\alpha^{s}}{(1+\alpha)^{ML}} & n \in S' - \{n_A\} \end{cases} \tag{7.4}$$

where $\binom{M}{n}$ denotes the multinomial coefficient of the population $n$ and $s = \sum_{i \in S} n(i) H(i, i_A)$ is the total number of bit mutations required to produce $n$ from the uniform population $n_A$.


Additional insight into the form of the absorbing state rows can be obtained with the aid of the following notation. Let $m_A, n_A \in S_A'$ be distinct but otherwise arbitrary absorbing states of the one-operator Markov chain, let $i_A \in S$ be the bit-string represented in $n_A$ and let $S(i_A) \subset S$ be the set of bit-strings accessible from $i_A$ via exactly one bit mutation event (i.e. $S(i_A) = \{ i_1 : i_1 \in S, H(i_1, i_A) = 1 \}$). It follows from this definition that $\mathrm{card}(S(i_A)) = L$. Then, for $M > 1$ let $S(n_A)'$ be defined as

$$S(n_A)' = \{ n : n \in S', \; n(i_A) = M-1, \; n(i_1) = 1, \; i_1 \in S(i_A) \} \subset S' ,$$

the set of nonabsorbing states adjacent to the absorbing state $n_A$. The restriction on $M$ is required to ensure that no absorbing state $m_A$ is contained in the adjacency set of any absorbing state $n_A$. $S(n_A)'$ includes exactly one distinct element for each $i_1 \in S(i_A)$, and consequently $\mathrm{card}(S(n_A)') = \mathrm{card}(S(i_A)) = L$. Also, from the form of $S(n_A)'$, it follows that for $M \ge 3$, $S(m_A)'$ and $S(n_A)'$ are disjoint if $m_A$ and $n_A$ are distinct one-operator absorbing states. Thus, if $S_A''$ is defined as

$$S_A'' = \bigcup_{n_A \in S_A'} S(n_A)'$$

and $M \ge 3$, then $\mathrm{card}(S_A'') = \mathrm{card}(S_A') \times L = NL$. This restriction on $M$ is assumed in all of the following.

With the aid of the new notation, the element in column $n \in S(n_A)'$ of row $n_A$ in $|P - I|$ can be written as

$$\forall n \in S(n_A)' : \quad T(n \mid n_A) = P(n \mid n_A) = \binom{M}{M-1} \left[ \frac{1}{(1+\alpha)^{L}} \right]^{M-1} \left[ \frac{\alpha}{(1+\alpha)^{L}} \right] = \frac{M\alpha}{(1+\alpha)^{ML}} .$$

Thus, Eq. 7.4 can be revised as follows

$$\forall n_A \in S_A' : \quad T(n \mid n_A) = \begin{cases} \dfrac{-ML\alpha + O(\alpha^2)}{(1+\alpha)^{ML}} & n = n_A \\[2ex] \dfrac{M\alpha}{(1+\alpha)^{ML}} & n \in S(n_A)' \\[2ex] \dfrac{O(\alpha^s)}{(1+\alpha)^{ML}} & n \in S' - S(n_A)' - \{n_A\} \end{cases} \tag{7.5}$$

where the exponent $s$ of $\alpha$ in the order expression for the general term is an integer satisfying $s \ge 2$. The elements in columns $n_A$ and $n \in S(n_A)'$ are first order in $\alpha$ while the elements in all other columns are at least second order.

Eq. 7.5 applies to every absorbing state row of $|P_{\bar m} - I|$ as well if $m \notin S_A'$. If $|P_{\bar m_A} - I|$ is being considered where $m_A \in S_A'$, then row $m_A$ contains $-1$ at its principal diagonal and zeros elsewhere. In that case Eq. 7.5 only applies to the absorbing state rows $n_A \in S_A' - \{m_A\}$. Exactly $N - 1$ such rows exist in $|P_{\bar m_A} - I|$. By applying Eq. 7.5 and these observations to Proposition 7.1, it follows that the lowest order term with nonzero coefficient which can conceivably exist in the numerator polynomial of $|P_{\bar m_A} - I|$ is the order $\alpha^{N-1}$ term. Similar reasoning reveals that the corresponding lowest order term with nonzero coefficient for $|P_{\bar m} - I|$ with $m \notin S_A'$ is the order $\alpha^{N}$ term. If the coefficient of the order $\alpha^{N-1}$ term in the numerator polynomial of $|P_{\bar m_A} - I|$ is indeed nonzero, and if the corresponding coefficients for all such $m_A$ have the same algebraic sign, then the required limiting value of $q_\alpha$ can be expressed in terms of these nonzero coefficients via substitution into Proposition 6.7. These conditions are in fact satisfied as demonstrated below.

7.4 Reformulation of Propositions 6.7 and 6.8

The next step in this development is the definition of some auxiliary matrices related to $P$ and $P_{\bar m_A}$ and the reformulation of Propositions 6.7 and 6.8 in terms of them. The new matrices, designated $P(m_A)'$ and $P_{\bar m_A}'$ respectively, are derived by coalescing each of the $N - 1$ absorbing state columns $n_A \in S_A' - \{m_A\}$ of $|P - I|$ and $|P_{\bar m_A} - I|$ with its neighboring nonabsorbing state columns, $n \in S(n_A)'$. Specifically, let $|Q_{\bar m_A}|$ be derived

from $|P_{\bar m_A} - I|$ by adding $1/L$ times the column $n_A \in S_A' - \{m_A\}$ to each of the $L$ adjacent nonabsorbing columns $n \in S(n_A)'$ and repeating the process for each remaining $n_A \in S_A' - \{m_A\}$. This operation is applied once each for the exactly $N - 1$ absorbing state columns $n_A \in S_A' - \{m_A\}$ and it preserves the value of the determinant, $|Q_{\bar m_A}| = |P_{\bar m_A} - I|$. If now $Q_{\bar m_A}(m \mid n)$ denotes the general element of $|Q_{\bar m_A}|$, then by applying the recipe used in its construction and Eq. 7.5, the elements in the absorbing state rows $n_A \in S_A' - \{m_A\}$ of $Q_{\bar m_A}$ can be written as

$$\forall m_A \in S_A', \; \forall n_A \in S_A' - \{m_A\} : \quad Q_{\bar m_A}(m \mid n_A) = \begin{cases} \dfrac{-ML\alpha + O(\alpha^2)}{(1+\alpha)^{ML}} & m = n_A \\[2ex] \dfrac{O(\alpha^2)}{(1+\alpha)^{ML}} & m \in S(n_A)' \\[2ex] \dfrac{O(\alpha^s)}{(1+\alpha)^{ML}} & m \in S' - S(n_A)' - \{n_A\} \end{cases}$$

where as before $s$ is an integer satisfying $s \ge 2$. Thus, each of the $N - 1$ absorbing state rows $n_A \in S_A' - \{m_A\}$ of $|Q_{\bar m_A}|$ can be written as a sum of two rows, one row containing $-ML\alpha/(1+\alpha)^{ML}$ at its principal diagonal location and zeros elsewhere and the second row being a multiple of $\alpha^2/(1+\alpha)^{ML}$. It follows from elementary determinant row expansion operations that $|P_{\bar m_A} - I| = |Q_{\bar m_A}|$ can be written as

$$|P_{\bar m_A} - I| = \frac{(-ML\alpha)^{N-1} |Q_{\bar m_A}'|}{(1+\alpha)^{ML(N-1)}} + \frac{O(\alpha^s)}{(1+\alpha)^{ML(N'-1)}} \tag{7.6}$$

where $|Q_{\bar m_A}'|$ is the order $N' - N + 1$ principal minor of $|Q_{\bar m_A}|$ generated by deleting the $N - 1$ row/column pairs which intersect on the $n_A \in S_A' - \{m_A\}$ principal diagonals and where the exponent $s$ of $\alpha$ in the order expression is an integer satisfying $s \ge N$. The elements in all rows of $|Q_{\bar m_A}'|$ except row $m_A$ are composed of contributions from the elements in nonabsorbing state rows of $P$ and the $-1$ principal diagonal term contributed by $I$ in $|P_{\bar m_A} - I|$. Row $m_A$ of $|Q_{\bar m_A}'|$ contains $-1$ at its principal diagonal location and zeros elsewhere. Thus, if $Q_{\bar m_A}'$ is written as $Q_{\bar m_A}' = P_{\bar m_A}' - I'$, then from the recipe employed in its construction, it follows that the square matrix $P_{\bar m_A}'$ thus defined has dimension $N' - N + 1$ and that its elements are given by

$$\forall m, n \in S' - S_A' + \{m_A\} : \quad P_{\bar m_A}(m \mid n)' = \begin{cases} 0 & n = m_A \\[1ex] P(m \mid n) & n \neq m_A, \; m \in S' - S_A' + \{m_A\} + S(m_A)' - S_A'' \\[1ex] P(m \mid n) + \dfrac{1}{L} P(n_A \mid n) & n \neq m_A, \; m \in S(n_A)' \end{cases} \tag{7.7}$$

Careful examination of Eq. 7.7 reveals that the transformation by which $P_{\bar m_A}'$ is generated from $P_{\bar m_A}$ preserves all row sums. Thus, $P_{\bar m_A}'$ is very similar in form to $P_{\bar m_A}$. It is derived from a (fictitious) row stochastic matrix by setting a specified row (row $m_A$) to zero.

If the preceding steps are repeated for $|P_{\bar m} - I|$ where $m \notin S_A'$, except that all $N$ absorbing state columns $n_A \in S_A'$ are coalesced rather than just the $N - 1$ columns $n_A \in S_A' - \{m_A\}$, a result very similar in form to Eq. 7.6 obtains. That is,

$$|P_{\bar m} - I| = \frac{(-ML\alpha)^{N} |Q_{\bar m}'|}{(1+\alpha)^{MLN}} + \frac{O(\alpha^s)}{(1+\alpha)^{ML(N'-1)}} \tag{7.8}$$

where $|Q_{\bar m}'|$ is the order $N' - N$ principal minor of $|Q_{\bar m}|$ generated by deleting the $N$ absorbing state row/column pairs and $s$ is an integer satisfying $s \ge N + 1$. The nonabsorbing state row $m$ contains $-1$ at its principal diagonal location and zeros elsewhere.

Substitution of Eq. 7.6 and 7.8 into Proposition 6.7 yields a form more amenable to examination of the $\alpha \to 0^+$ limiting stationary distribution. The two cases $m \in S_A'$ and $m \in S' - S_A'$ must be distinguished. Then, after some straightforward algebra,

$$q_\alpha(m) = \begin{cases} \dfrac{|Q_{\bar m_A}'| + O(\alpha)/(1+\alpha)^{ML(N'-N)}}{\sum_{n_A \in S_A'} |Q_{\bar n_A}'| + O(\alpha)/(1+\alpha)^{ML(N'-N)}} & m = m_A \in S_A' \\[3ex] \dfrac{O(\alpha)/(1+\alpha)^{ML(N'-N)}}{\sum_{n_A \in S_A'} |Q_{\bar n_A}'| + O(\alpha)/(1+\alpha)^{ML(N'-N)}} & m \in S' - S_A' \end{cases}$$

An equivalent result expressed in terms of the auxiliary matrices $P_{\bar m_A}'$ is

$$q_\alpha(m) = \begin{cases} \dfrac{|P_{\bar m_A}' - I'| + O(\alpha)/(1+\alpha)^{ML(N'-N)}}{\sum_{n_A \in S_A'} |P_{\bar n_A}' - I'| + O(\alpha)/(1+\alpha)^{ML(N'-N)}} & m = m_A \in S_A' \\[3ex] \dfrac{O(\alpha)/(1+\alpha)^{ML(N'-N)}}{\sum_{n_A \in S_A'} |P_{\bar n_A}' - I'| + O(\alpha)/(1+\alpha)^{ML(N'-N)}} & m \in S' - S_A' \end{cases} \tag{7.9}$$

By retracing the preceding steps by which $P_{\bar m_A}$ was transformed into $P_{\bar m_A}'$, companion results to Eq. 7.7 and Eq. 7.9 can be developed for $P(m_A)'$. The companion to Eq. 7.7 differs only in the elements of row $n = m_A$. Thus, if $P(m \mid n)'$ denotes the general element in $P(m_A)'$, then

$$\forall m, n \in S' - S_A' + \{m_A\} : \quad P(m \mid n)' = \begin{cases} P(m \mid n) & m \in S' - S_A' + \{m_A\} + S(m_A)' - S_A'' \\[1ex] P(m \mid n) + \dfrac{1}{L} P(n_A \mid n) & m \in S(n_A)' \end{cases} \tag{7.10}$$

Further, examination of Eq. 7.10 reveals that the row sum constraint on $P$ is preserved in the transformation by which $P(m_A)'$ is generated (i.e. $P(m_A)'$ is a stochastic matrix). Thus, $|P(m_A)' - I'| = 0$. A consequence is the Proposition 6.8 counterpart of Eq. 7.9,

$$q_\alpha(m) = \begin{cases} \dfrac{|P(m_A)' - I'| - |P_{\bar m_A}' - I'| + O(\alpha)/(1+\alpha)^{ML(N'-N)}}{\sum_{n_A \in S_A'} \left\{ |P(n_A)' - I'| - |P_{\bar n_A}' - I'| \right\} + O(\alpha)/(1+\alpha)^{ML(N'-N)}} & m = m_A \in S_A' \\[3ex] \dfrac{O(\alpha)/(1+\alpha)^{ML(N'-N)}}{\sum_{n_A \in S_A'} \left\{ |P(n_A)' - I'| - |P_{\bar n_A}' - I'| \right\} + O(\alpha)/(1+\alpha)^{ML(N'-N)}} & m \in S' - S_A' \end{cases} \tag{7.11}$$


7.5 The Stationary Distribution Limit

The zero mutation probability limits of Eq. 7.9 and Eq. 7.11 exist if the determinant sums in the denominators are nonzero. In fact they are nonzero, as demonstrated in the following. This argument is very similar in form to the development in Section 6.3 concerning positivity of the stationary distribution. The essential step is demonstration of the existence of a primitive stochastic matrix $Q'$ which satisfies both $0 \le \lim_{\alpha \to 0^+} P_{\bar m_A}' \le Q'$ and $\lim_{\alpha \to 0^+} P_{\bar m_A}' \neq Q'$.

The elements of the $\alpha \to 0^+$ limit of $P_{\bar m_A}'$ are obtained by substituting the one-operator results in Eq. 4.2-5 into Eq. 7.7. If the three-operator case is under consideration, then Eq. 4.22 and Eq. 4.24,25 are employed. In the following, the two-operator notation is employed. Let $Q'$ be generated from the $\alpha \to 0^+$ limit of $P_{\bar m_A}'$ by replacing row $m_A$ with the row whose elements are given by

$$\forall m \in S' - S_A' + \{m_A\} : \quad Q'(m \mid m_A) = \frac{1}{N' - N + 1} > 0 . \tag{7.12}$$

Thus, the row sum of row $m_A$ in $Q'$ is 1. Since all remaining rows of $Q'$ are identical to those of the $\alpha \to 0^+$ limit of $P_{\bar m_A}'$, and consequently have row sum 1 by Eq. 7.7, $Q'$ is a stochastic matrix. Additionally, it satisfies both $0 \le \lim_{\alpha \to 0^+} P_{\bar m_A}' \le Q'$ and $\lim_{\alpha \to 0^+} P_{\bar m_A}' \neq Q'$. Since $Q'(m_A \mid m_A) > 0$, the fictitious Markov chain is both aperiodic (Definition A9, Theorem A2) and primitive (Definition B1, Theorem B1) provided that it is irreducible (Definitions A7 and A8, Theorem A1). Thus, primitivity is established by demonstrating that every state $m \in S' - S_A' + \{m_A\}$ is accessible in some finite number of transitions from every state $n \in S' - S_A' + \{m_A\}$. Since all states in $S' - S_A' + \{m_A\}$ are accessible in one transition from $m_A$ (Eq. 7.12), it is sufficient to demonstrate that $m_A$ is accessible in some finite number of transitions from every state $n \in S' - S_A'$.

Let $i_A \in S$ be the bit-string represented in $m_A$, let $n \in S' - S_A'$ and let $i_1 \in S$ be selected such that $n(i_1) > 0$ and $H(i_1, i_A) \le H(i, i_A)$ for all $i$ represented in $n$. Then, two cases must be examined. In case (1), $i_1 = i_A$ while $i_1 \neq i_A$ for case (2). If $i_1 = i_A$, it follows from Eq. 7.7 and the construction of $Q'$ that

$$Q'(m_A \mid n) = \lim_{\alpha \to 0^+} P_{\bar m_A}(m_A \mid n)' = \lim_{\alpha \to 0^+} P_2(m_A \mid n) = \left[ \lim_{\alpha \to 0^+} P_2(i_A \mid n) \right]^M = P_1(i_A \mid n)^M = P_1(i_1 \mid n)^M > 0$$

and consequently $m_A$ is accessible from $n$ in one transition. Otherwise

$$\exists \, i_2 \in S(i_1) \ni H(i_2, i_A) = H(i_1, i_A) - 1$$

and further if $n_1 \in S_A'$ is the one-operator absorbing state defined by the condition $n_1(i_1) = M$ while $n_{12} \in S(n_1)'$ is the adjacent nonabsorbing state defined by $n_{12}(i_1) = M - 1$, $n_{12}(i_2) = 1$, then from Eq. 7.7 and the construction of $Q'$

$$Q'(n_{12} \mid n) = \lim_{\alpha \to 0^+} P_2(n_{12} \mid n) + \frac{1}{L} \lim_{\alpha \to 0^+} P_2(n_1 \mid n) = P_1(n_{12} \mid n) + \frac{1}{L} P_1(n_1 \mid n) \ge \frac{1}{L} P_1(n_1 \mid n) = \frac{1}{L} \left[ P_1(i_1 \mid n) \right]^M > 0 .$$

Thus, $n_{12}$ is accessible from $n$ in one transition. If $i_2 = i_A$, then by the case (1) argument $m_A$ is accessible in one additional transition. Otherwise, the case (2) argument is repeated for some

$$i_3 \in S(i_2) \ni H(i_3, i_A) = H(i_2, i_A) - 1 = H(i_1, i_A) - 2 .$$

This procedure necessarily terminates with $H(i_1, i_A) + 1$ applications and the corresponding state space trajectory is executed with nonzero probability.

From the foregoing argument, it follows that state $m_A$ is accessible in some finite number of transitions from every state $n \in S' - S_A'$, and thus that $Q'$ is primitive. Then, since both $0 \le \lim_{\alpha \to 0^+} P_{\bar m_A}' \le Q'$ and $\lim_{\alpha \to 0^+} P_{\bar m_A}' \neq Q'$, the argument of Section 6.3 applies, and Proposition 7.4, the $\alpha \to 0^+$ counterpart of Proposition 6.9, is a consequence.

Proposition 7.4: The value of the determinant $\lim_{\alpha \to 0^+} |P_{\bar m_A}' - I'|$ satisfies $\forall m_A \in S_A'$: (1) $\lim_{\alpha \to 0^+} |P_{\bar m_A}' - I'| \neq 0$ and (2) the algebraic sign of $\lim_{\alpha \to 0^+} |P_{\bar m_A}' - I'|$ is $(-1)^{N'-N+1}$.

The conditions asserted in Proposition 7.4 ensure that substitution into Eq. 7.9 and 7.11 yields a determinate form in the $\alpha \to 0^+$ limit. Propositions 7.5 and 7.6 below represent the limiting forms, and consequently are respectively the limiting counterparts of Propositions 6.7 and 6.8.

Proposition 7.5: The components of $\lim_{\alpha \to 0^+} q_\alpha$ exist and can be expressed in the form

$$\lim_{\alpha \to 0^+} q_\alpha(m) = \begin{cases} \dfrac{\lim_{\alpha \to 0^+} |P_{\bar m_A}' - I'|}{\sum_{n_A \in S_A'} \lim_{\alpha \to 0^+} |P_{\bar n_A}' - I'|} & m = m_A \in S_A' \\[3ex] 0 & m \in S' - S_A' \end{cases}$$


Proposition 7.6: The components of $\lim_{\alpha \to 0^+} q_\alpha$ exist and can be expressed alternatively as

$$\lim_{\alpha \to 0^+} q_\alpha(m) = q_0(m) = \begin{cases} \dfrac{\lim_{\alpha \to 0^+} \left\{ |P(m_A)' - I'| - |P_{\bar m_A}' - I'| \right\}}{\sum_{n_A \in S_A'} \lim_{\alpha \to 0^+} \left\{ |P(n_A)' - I'| - |P_{\bar n_A}' - I'| \right\}} & m = m_A \in S_A' \\[3ex] 0 & m \in S' - S_A' \end{cases}$$

An immediate consequence of Propositions 7.4 and 7.5 is strict positivity of the zero mutation probability limiting stationary distribution components for all absorbing state rows. That is, $\forall m_A \in S_A' : q_0(m_A) > 0$. The argument is analogous to that at the conclusion of Section 6.3 concerning strict positivity of all stationary distribution components when $\alpha > 0$. This result is anticipated by the simulation results in Section 5.5. A consequence is that the required limiting behavior for direct application of the simulated annealing convergence theory to the genetic algorithm model does not follow. However, the results displayed in Section 5.5 and developments produced in Section 9.3 suggest that the limiting distribution can be made arbitrarily close to the desired limiting behavior.

Since the $\alpha \to 0^+$ limit of the stationary distribution exists, the definition of $q_\alpha$ can be extended to include the point $\alpha = 0$. That is

$$q_\alpha \big|_{\alpha = 0} = q_0 = \lim_{\alpha \to 0^+} q_\alpha$$

where the values of the required limits are provided by Proposition 7.5. Proposition 7.7 below follows from this extended definition of $q_\alpha$ and Proposition 7.2.

Proposition 7.7: For all $\alpha \ge 0$, the components of $q_\alpha$ are continuous rational functions of the independent variable $\alpha$.


Proposition 7.3, which concerns the first derivative of $q_\alpha$, can also be extended to include the limiting case. The extension requires easily obtainable counterparts of Eq. 7.1-3 developed for $|P_{\bar m_A}' - I'|$ and Eq. 7.9. The Eq. 7.1 counterpart is

$$(1+\alpha)^{ML(N'-N)} |P_{\bar m_A}' - I'| = \theta_{\bar m_A}(\alpha)' , \tag{7.13}$$

and that for Eq. 7.2 is

$$q_\alpha(m) = \begin{cases} \dfrac{\theta_{\bar m_A}(\alpha)' + O(\alpha)}{\theta(\alpha)' + O(\alpha)} & m = m_A \in S_A' \\[3ex] \dfrac{O(\alpha)}{\theta(\alpha)' + O(\alpha)} & m \in S' - S_A' \end{cases} \tag{7.14}$$

where $\theta(\alpha)'$ is the polynomial counterpart (summed over $n_A \in S_A'$) of $\theta(\alpha)$ in Eq. 7.2. Differentiating Eq. 7.14 with respect to $\alpha$ yields a rational function with denominator polynomial $[\theta(\alpha)' + O(\alpha)]^2$ whose $\alpha \to 0^+$ limit is nonzero by Proposition 7.4, Eq. 7.13 and the definition of $\theta(\alpha)'$. Proposition 7.8 below follows from Proposition 7.3 and these observations.

Proposition 7.8: The components of the first derivative of $q_\alpha$ with respect to $\alpha$ possess limits as $\alpha \to 0^+$.

Thus, a zero mutation probability limit exists for the time-homogeneous two and three-operator algorithm variants. The limit is represented by Propositions 7.5 and 7.6. Further, Propositions 7.7 and 7.8 establish some useful ancillary results concerning the stationary distribution behavior at the point $\alpha = 0$. These latter results are employed in the following section in establishing strong ergodicity of the inhomogeneous genetic algorithm Markov chain. Propositions 7.5 and 7.6 are used in Section 9 to develop a methodology for representing the stationary distribution limit.
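The limiting behavior described in this section can be illustrated on a toy chain. The sketch below is an assumption-laden illustration rather than the model of Section 4: it assumes a two-operator algorithm consisting of fitness-proportionate reproduction followed by independent bitwise mutation with probability $p_m = \alpha/(1+\alpha)$, uses toy values $L = 2$ and $M = 2$ (smaller than the $M \ge 3$ assumed in Section 7.3), and uses a hypothetical fitness vector. All function and variable names are illustrative.

```python
import itertools
import math

import numpy as np

L, M = 2, 2                                   # toy string length and population size
STRINGS = range(2 ** L)                       # bit-strings encoded as integers
FITNESS = np.array([1.0, 2.0, 3.0, 4.0])      # hypothetical positive objective values

# A state is an occupancy vector n, with n[i] copies of string i and sum(n) = M.
STATES = [s for s in itertools.product(range(M + 1), repeat=2 ** L) if sum(s) == M]
UNIFORM = np.array([max(s) == M for s in STATES])   # one-operator absorbing states


def transition_matrix(alpha):
    """Row-stochastic matrix P[n, m] of the assumed selection-plus-mutation chain."""
    pm = alpha / (1.0 + alpha)
    # mutate[j, i]: probability that string j is mutated into string i
    mutate = np.array([[pm ** bin(i ^ j).count("1")
                        * (1.0 - pm) ** (L - bin(i ^ j).count("1"))
                        for i in STRINGS] for j in STRINGS])
    P = np.zeros((len(STATES), len(STATES)))
    for row, n in enumerate(STATES):
        select = np.array(n) * FITNESS
        child = (select / select.sum()) @ mutate      # distribution of one new string
        for col, m in enumerate(STATES):
            coef = math.factorial(M) / math.prod(math.factorial(c) for c in m)
            P[row, col] = coef * np.prod(child ** np.array(m))   # multinomial sampling
    return P


def stationary(P):
    """Solve q^T (P - I) = 0 with one column replaced by the constraint sum(q) = 1."""
    A = P - np.eye(len(P))
    A[:, -1] = 1.0
    rhs = np.zeros(len(P))
    rhs[-1] = 1.0
    return np.linalg.solve(A.T, rhs)


for alpha in (1e-2, 1e-3, 1e-4):
    q = stationary(transition_matrix(alpha))
    print(alpha, "uniform states:", q[UNIFORM].round(4),
          "nonuniform mass:", q[~UNIFORM].sum())
```

Under these assumptions the mass on nonuniform populations should shrink with $\alpha$ while every uniform population, including the suboptimal ones, retains a strictly positive share, which is the qualitative content of Propositions 7.5 and 7.6.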


SECTION 8 A MONOTONIC MUTATION PROBABILITY ERGODICITY BOUND

8.1 Overview

The annealing schedule bounds for the simulated annealing algorithm, which are reviewed in Section 2.4.2, are derived by requiring that the nonstationary Markov chain which represents the algorithm be strongly ergodic (Definition A13) and then deducing a monotonic lower bound on the algorithm control parameter. The methodology consists of demonstrating that the time-homogeneous Markov chain corresponding to every positive algorithm control parameter value possesses a stationary distribution, that the sequence of stationary distributions corresponding to any sequence of positive control parameter values converges to a limiting distribution if the control parameter sequence converges to zero, and then employing Definitions A11-A13 and Theorems A5-A7 to deduce a sufficient condition (the annealing schedule lower bound) to guarantee that the nonstationary algorithm achieves the limiting distribution (i.e. strong ergodicity).

The model development in Section 4 demonstrates that for all mutation probability values in the range $0 < p_m < 1$, the Markov chain representing either the two or three-operator time-homogeneous simple genetic algorithm possesses a stationary distribution. Section 7 demonstrates that the stationary distribution approaches a limit as the mutation probability parameter approaches zero. This section proposes and then verifies a monotone decreasing lower bound on the mutation probability sequence of the nonstationary genetic algorithm Markov chain which is sufficient to ensure strong ergodicity.


8.2 A Weak Ergodicity Bound

The following paragraphs propose and then verify a mutation probability parameter bound sufficient to ensure that the Markov chain of the corresponding nonstationary simple genetic algorithm is weakly ergodic (Definition A11). The bound applies to both the two and three-operator algorithms, and it appears in Proposition 8.1 below.

Proposition 8.1: The mutation probability bound given by

$$p_m(k) \ge \frac{1}{2} k^{-1/ML}$$

is sufficient to ensure weak ergodicity of the corresponding nonstationary (two or three-operator) simple genetic algorithm Markov chain.

This result is established by using the lower bounds on the two and three-operator conditional probabilities in Eq. 4.21 and 4.31 with Definitions A11 and A12 and Theorems A5 and A6. Applying the lower bound in Eq. 4.21 and 4.31 to $\tau_1(\cdot)$ of Definition A12 and Theorem A5 yields

$$\tau_1(P) = 1 - \min_{n_1, n_2} \sum_{m \in S'} \min\left( P(m \mid n_1), P(m \mid n_2) \right) \le 1 - \left( \frac{2\alpha}{1+\alpha} \right)^{ML} .$$

Thus,

$$1 - \tau_1(P) \ge \left( \frac{2\alpha}{1+\alpha} \right)^{ML}$$

and consequently from Theorem A6, the chain is weakly ergodic if the sequence of control parameter values $\{\alpha(k)\}$ satisfies

$$\sum_{k=1}^{\infty} \left( \frac{2\alpha(k)}{1+\alpha(k)} \right)^{ML} = \infty .$$

Comparing this result to the known divergent series $\sum k^{-1}$, it follows that the Markov chain is weakly ergodic if the sequence $\{\alpha(k)\}$ satisfies

$$\left( \frac{2\alpha(k)}{1+\alpha(k)} \right)^{ML} \ge k^{-1}$$

from which

$$\frac{\alpha(k)}{1+\alpha(k)} \ge \frac{1}{2} k^{-1/ML} .$$

Using Eq. 4.13 to translate this result into an equivalent expression in $p_m(k)$ establishes Proposition 8.1.

8.3 Strong Ergodicity

The mutation probability schedule bound advanced in Proposition 8.1 is also sufficient to achieve strong ergodicity if it satisfies the condition on the sequence of vector differences in Theorem A7. The required sequence of vectors can be selected as the sequence of stationary distributions of the time-homogeneous Markov chains associated with the parameter sequence $\{p_m(k)\}$ (or equivalently with the corresponding sequence $\{\alpha(k)\}$). Section 4 establishes that a stationary distribution exists for the time-homogeneous two and three-operator algorithms corresponding to every value of $\alpha$ satisfying $\alpha > 0$. Thus, associated with the sequence of control parameter values $\{\alpha(k)\}$ is a sequence of vectors $\{q_k\}$ where $q_k = q_\alpha$ evaluated at $\alpha = \alpha(k)$. Further, based upon results established in Section 6, Section 7 demonstrates that an $\alpha \to 0^+$ limiting stationary distribution exists (Propositions 7.5 and 7.6), that the stationary distribution vector varies continuously for all $\alpha$ satisfying $\alpha \ge 0$ (Proposition 7.7) and that its first derivative exists and is continuous for all $\alpha$ satisfying $\alpha > 0$ (Proposition 7.3). In particular, $q_\alpha$ is continuous on the closed interval $0 \le \alpha \le 1$ and its first derivative exists at every interior point of that interval. Therefore, if consideration is limited to monotone decreasing control parameter sequences, then by the mean value theorem the difference between the $m$ components of any two consecutive vectors in the sequence can be written as

$$q_{k+1}(m) - q_k(m) = \left. \frac{dq_\alpha(m)}{d\alpha} \right|_{\alpha = \alpha^*(k)} \left( \alpha(k+1) - \alpha(k) \right)$$

where the value $\alpha^*(k)$ satisfies $\alpha(k+1) < \alpha^*(k) < \alpha(k)$. Consequently,

$$|q_{k+1}(m) - q_k(m)| = \left| \frac{dq_\alpha(m)}{d\alpha} \right|_{\alpha = \alpha^*(k)} \times |\alpha(k+1) - \alpha(k)|$$

and

$$\sum_{k=1}^{\infty} |q_{k+1}(m) - q_k(m)| = \sum_{k=1}^{\infty} \left| \frac{dq_\alpha(m)}{d\alpha} \right|_{\alpha = \alpha^*(k)} \times |\alpha(k+1) - \alpha(k)| . \tag{8.1}$$

From Propositions 7.3 and 7.8, it is possible to define a function $g_\alpha(m)$ which is continuous in $\alpha$ on the closed interval $0 \le \alpha \le 1$ by setting $g_\alpha(m)$ equal to $dq_\alpha(m)/d\alpha$ for $0 < \alpha \le 1$ and equal to its $\alpha \to 0^+$ limit at $\alpha = 0$. A function which is continuous on a closed interval is bounded there, so a finite constant $B$ exists which bounds $|g_\alpha(m)|$ on $0 \le \alpha \le 1$. Then

$$\sum_{k=1}^{\infty} |q_{k+1}(m) - q_k(m)| = \sum_{k=1}^{\infty} \left| \frac{dq_\alpha(m)}{d\alpha} \right|_{\alpha = \alpha^*(k)} \times |\alpha(k+1) - \alpha(k)| \le \sum_{k=1}^{\infty} B \times |\alpha(k+1) - \alpha(k)| = B \sum_{k=1}^{\infty} |\alpha(k+1) - \alpha(k)| . \tag{8.4}$$

Since only monotonic control parameter sequences are under consideration, the sum in the last line of Eq. 8.4 can be written as the difference of the initial and final parameter values of the sequence. Thus, $\sum_{k=1}^{\infty} |q_{k+1}(m) - q_k(m)|$ is finite, the condition of Theorem A7 is satisfied, and the mutation probability bound $p_m(k) \ge \frac{1}{2} k^{-1/ML}$ is sufficient to ensure strong ergodicity of the corresponding Markov chain. Further, the Markov chain representing any nonstationary two or three-operator simple genetic algorithm for which the mutation probability sequence both observes this bound and converges to zero achieves (asymptotically) the limiting probability distribution defined in Propositions 7.5 and 7.6.

8.4 Comparison With the Simulated Annealing Parameter Bound

It is instructive to compare the mutation probability sequence bound developed here with the annealing schedule bounds reviewed in Section 2.4.2.2, both of which are of the form $K/\log(k)$. Let $\rho(k)$ be defined as the ratio $\rho(k) = p_m(k)/T(k)$ where $p_m(k)$ is selected as the bound developed herein and $T(k)$ is selected as the bound provided by either Eq. 2.12 or Eq. 2.13. That is

$$\rho(k) = \frac{1}{2} k^{-1/ML} \Big/ \left[ K/\log(k) \right] = \frac{1}{2K} \log(k) / k^{1/ML} . \tag{8.5}$$

Thus, decreasing values of $\rho(k)$ imply that the genetic algorithm convergence rate is superior (asymptotically) to that of the simulated annealing algorithm. Now, let $k = \exp(x)$, or equivalently $x = \log(k)$. Substituting into Eq. 8.5 yields

$$\rho(k) = \frac{1}{2K} \, x \exp\left[ -\frac{x}{ML} \right] .$$

Then, since for all positive constants $\gamma$, the limit of $x \exp(-\gamma x)$ as $x \to \infty$ is zero, it follows that

$$\lim_{k \to \infty} \rho(k) = \frac{1}{2K} \lim_{k \to \infty} \left[ \log(k) / k^{1/ML} \right] = 0 . \tag{8.6}$$

Thus, the nonstationary simple genetic algorithm provides an asymptotically superior convergence rate.
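The schedule bound of Proposition 8.1 and the ratio of Eq. 8.5 are simple enough to evaluate directly. The following minimal sketch assumes the relation $p_m = \alpha/(1+\alpha)$ (consistent with the correspondence $p_m = 1/2 \Leftrightarrow \alpha = 1$ used in this work), takes the bound with equality, and uses hypothetical values $M = 3$, $L = 2$ and $K = 1$. Function names are illustrative only.

```python
import math


def mutation_lower_bound(k, M, L):
    """Schedule p_m(k) = (1/2) k^(-1/(M L)), the Proposition 8.1 bound taken with equality."""
    return 0.5 * k ** (-1.0 / (M * L))


def alpha_from_pm(pm):
    """Assumed parameter relation p_m = alpha / (1 + alpha), solved for alpha."""
    return pm / (1.0 - pm)


def ergodicity_term(k, M, L):
    """Term (2 alpha(k) / (1 + alpha(k)))^(M L) of the weak ergodicity series."""
    a = alpha_from_pm(mutation_lower_bound(k, M, L))
    return (2.0 * a / (1.0 + a)) ** (M * L)


def ratio_to_annealing(k, M, L, K=1.0):
    """rho(k) = p_m(k) / T(k) with an annealing schedule T(k) = K / log(k), as in Eq. 8.5."""
    return mutation_lower_bound(k, M, L) / (K / math.log(k))


M, L = 3, 2                                   # hypothetical population size and string length
for k in (10, 1000, 10 ** 10, 10 ** 20, 10 ** 30):
    print(k, mutation_lower_bound(k, M, L),
          k * ergodicity_term(k, M, L),       # stays at 1 on the boundary of the bound
          ratio_to_annealing(k, M, L))
```

On the boundary of the bound each series term equals $1/k$, so the series diverges like the harmonic series. The ratio $\rho(k)$ is not monotone for small $k$ (it peaks near $\log k = ML$) and decays only beyond that point, which is why the comparison in Eq. 8.6 is asymptotic.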


SECTION 9 REPRESENTATION OF THE STATIONARY DISTRIBUTION SOLUTION

9.1 Overview

Previous sections of this work establish some key results required for extrapolation of the simulated annealing convergence theory onto the nonstationary Markov chain model of the simple genetic algorithm. Specifically, existence of a unique stationary distribution for the time-homogeneous two and three-operator algorithms is established in Sections 4.3.2 and 4.3.3, and in Section 6, the existence argument is formulated into a Cramer's rule expression for the stationary distribution components. Sections 7 and 8 continue that development by establishing the existence of a limiting distribution as the mutation probability parameter approaches zero and a mutation probability sequence bound sufficient to achieve it.

However, the empirical results in Section 5.5 suggest a complication, confirmed in Section 7, associated with the form of the limiting distribution. The limiting distribution behavior necessary for extending the simulated annealing global optimality result does not obtain because the limiting distribution is nonzero for all states with uniform population (one-operator absorbing states), including those for suboptimal solutions. The limiting distribution entropy results reported at the conclusion of Section 5.5 support the intuitive notion that increasing the population size parameter $M$ should bias the limiting distribution toward the desired behavior. However, to pursue that notion further requires closer examination of the stationary distribution equations and the requirements for their solution. This section begins that task. It is a very extensive development and it stops short of explicit solution. However, it provides some insight into the nature of the solution and additionally, it defines a promising approach to continuing the work started here.


93 The essential task of representing the stationary distribution solution consists of evaluating the determinants required to express the results of Propositions 6.7-8 and their limiting counterparts in Propositions 7.5-6. The development proceeds by examining the three distinct cases which arise from applying three different sets of constraints on the value of the mutation probability parameter. The special case pÂ„ = 1/2 <=> a = 1 is examined in Section 9.2. It leads to a very simple (trivial) result that is of no particular interest in its own right, but is fundamental to the mechanism employed in Section 9.3 in developing the more general case < a < 1. The approach pursued in Section 9.3 involves expanding -|Psi-I| =|P-I| -|Ps-Ii as a multivariate Taylor's series in the N' X N array of conditional probabilities [P(i | n)] for i e S and n Â€ S' (defined by Eq. 4.14 for the two-operator algorithm and by Eq. 4.22 for its three-operator counterpan) about the point corresponding to a = 1. The product of that effon is an expression for the coefficient of the general term of the series as a determinant with combinatorial elements. The case p^ -^ 0^ <=> a ^ 0^ is examined in Section 9.4. The methodology developed in Section 9.3 extends with very little modification to represent the a ^ 0^ limiting behavior of I Ps^1| . Section 9.5 concludes by pointing out some significant identities which exist among the Taylor's series coefficients and the connection of those identities to the algebra of symmetric and alternating polynomials. Its purpose is to provide a foundation for extending the stationary distribution representation work begun here. 9.2 The Limiting Case a = 1 As pointed out in Section 6, the determinants required for expressing the value of the stationary distribution components by Propositions 6.7-8 are the characteristic polynomials of the P^ matrices evaluated at X = 1 . 
The coefficients of the characteristic polynomial of any square matrix $X$ with finite dimensions can be expressed in terms of the principal minors of $|X|$ (i.e. minors generated from $|X|$ by deleting combinations of rows and columns with the same indices). For example, the characteristic polynomial of $P$ can be expressed as

$$\phi(\lambda) = |P - \lambda I| = A_{N'} - A_{N'-1}\lambda + A_{N'-2}\lambda^2 - A_{N'-3}\lambda^3 + \cdots + (-1)^{N'-1}A_1\lambda^{N'-1} + (-1)^{N'}A_0\lambda^{N'} = \sum_{u=0}^{N'} (-1)^{N'-u} A_u \lambda^{N'-u} \qquad \text{(Eq. 9.1)}$$

where $N' = \mathrm{card}(S')$ is the dimension of $P$ and $A_u$ is the sum of its order $u$ principal minors. This result is fundamental to the theory of square matrices and follows from application of elementary determinant expansion operations to $|P - \lambda I|$ [Aitk54, MoSt64, Muir60]. Exactly

$$\binom{N'}{u} = \frac{N'!}{u!\,(N'-u)!}$$

order $u$ principal minors are summed to produce $A_u$. The values of some of the $A_u$'s are

$$A_u = \begin{cases} |P| & u = N' \\ \mathrm{trace}(P) & u = 1 \\ 1 & u = 0 \end{cases} \qquad \text{(Eq. 9.2)}$$

where the $A_0$ result follows from the convention that the single order zero principal minor of $|X|$ has value 1. In a fashion exactly analogous to Eq. 9.1, the characteristic polynomial of $P_m$ can be written as

$$\phi_m(\lambda) = |P_m - \lambda I| = A^m_{N'} - A^m_{N'-1}\lambda + A^m_{N'-2}\lambda^2 - A^m_{N'-3}\lambda^3 + \cdots + (-1)^{N'-1}A^m_1\lambda^{N'-1} + (-1)^{N'}A^m_0\lambda^{N'} = \sum_{u=0}^{N'} (-1)^{N'-u} A^m_u \lambda^{N'-u} \qquad \text{(Eq. 9.3)}$$

where $A^m_u$ is the sum of all order $u$ principal minors of $|P_m|$. Thus, the value of each determinant required for expressing $q$ via Propositions 6.7-8 can be written respectively in the form

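$$|P_m - I| = \phi_m(1) = A^m_{N'} - A^m_{N'-1} + A^m_{N'-2} - A^m_{N'-3} + \cdots + (-1)^{N'-1}A^m_1 + (-1)^{N'}A^m_0 = \sum_{u=0}^{N'} (-1)^{N'-u} A^m_u$$

and

$$|P - I| - |P_m - I| = \phi(1) - \phi_m(1) = (A_{N'} - A^m_{N'}) - (A_{N'-1} - A^m_{N'-1}) + (A_{N'-2} - A^m_{N'-2}) - (A_{N'-3} - A^m_{N'-3}) + \cdots = \sum_{u=0}^{N'} (-1)^{N'-u}\left(A_u - A^m_u\right). \qquad \text{(Eq. 9.4)}$$

Further, the $A^m_u$'s can be expressed in terms of the principal minors of $|P|$ because those principal minors of $|P_m|$ which include row $m$ (the zero row) have value zero while those which exclude row $m$ are identical to the corresponding principal minors of $|P|$. In particular, the $A^m_u$'s corresponding to Eq. 9.2 are

$$A^m_u = \begin{cases} |P_m| = 0 & u = N' \\ \mathrm{trace}(P_m) = \mathrm{trace}(P) - P(m \mid m) & u = 1 \\ 1 & u = 0 \end{cases} \qquad \text{(Eq. 9.5)}$$

At $\alpha = 1$, from Eq. 4.11 and Eq. 4.24, it follows that

$$\forall i \in S,\ \forall n \in S':\; P(i \mid n)\big|_{\alpha=1} = 1/2^L, \qquad \text{(Eq. 9.6)}$$

and consequently that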

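$$\forall m, n \in S':\; P(m \mid n)\big|_{\alpha=1} = \binom{M}{m}\prod_{i \in S}\left(\frac{1}{2^L}\right)^{m(i)} = \binom{M}{m}\left(\frac{1}{2}\right)^{ML}. \qquad \text{(Eq. 9.7)}$$

Thus, $P|_{\alpha=1} = [P(m \mid n)]_{\alpha=1}$ is a rank one matrix, and therefore all minors of $|P|_{\alpha=1}$ of order $u \ge 2$ are identically zero. Eq. 9.4 then reduces to

$$\{|P - I| - |P_m - I|\}_{\alpha=1} = \{\phi(1) - \phi_m(1)\}_{\alpha=1} = (-1)^{N'-1}\{A_1 - A^m_1\}_{\alpha=1} + (-1)^{N'}\{A_0 - A^m_0\}_{\alpha=1},$$

and substituting from Eq. 9.2, 9.5 and 9.7 into this result produces

$$\{|P - I| - |P_m - I|\}_{\alpha=1} = (-1)^{N'-1}\{\mathrm{trace}(P) - [\mathrm{trace}(P) - P(m \mid m)]\}_{\alpha=1} = (-1)^{N'-1}P(m \mid m)\big|_{\alpha=1} = (-1)^{N'-1}\binom{M}{m}\left(\frac{1}{2}\right)^{ML}. \qquad \text{(Eq. 9.8)}$$

Employing Eq. 9.8 with Proposition 6.8 yields an explicit result for the $\alpha = 1$ limiting value of $q(m)$, i.e.

$$q(m)\big|_{\alpha=1} = \left.\frac{|P - I| - |P_m - I|}{\sum_{n \in S'}\left(|P - I| - |P_n - I|\right)}\right|_{\alpha=1} = \frac{(-1)^{N'-1}\binom{M}{m}\left(\frac{1}{2}\right)^{ML}}{\sum_{n \in S'}(-1)^{N'-1}\binom{M}{n}\left(\frac{1}{2}\right)^{ML}} = \binom{M}{m}\left(\frac{1}{2}\right)^{ML}.$$

It is independent of the objective function of the underlying optimization problem because at $p_m = 1/2 \Leftrightarrow \alpha = 1$ mutation completely nullifies the reproduction operator. Although this trivial case is not of any particular interest on its own, it serves as the basis for developing the general case $0 < \alpha < 1$ in Section 9.3. The essential idea is that $P$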
is rank 1 at $\alpha = 1$, which makes the low order derivatives of $P$ with respect to the conditional probabilities $P(i \mid n)$ have comparatively low rank, and this suggests expanding $-|P_m - I| = |P - I| - |P_m - I|$ in a multivariate Taylor's series about the point corresponding to $\alpha = 1$. The result reflected in Eq. 9.8 is the constant term of the series.

9.3 The General Case $0 < \alpha < 1$

The state transition matrix of the two and three-operator algorithms is completely determined by the fixed algorithm parameter $M$ and the $N' \times N$ array of conditional probabilities $[P(i \mid n)]$ for $i \in S$ and $n \in S'$. Each element of row $n$ in $P$ consists of a multinomial coefficient and a distinct order $M$ product composed of integral powers of the $P(i \mid n)$'s corresponding to row $n$ (Eq. 4.15 or 4.25). Thus, the order $k$ principal minor of $|P|$ generated by inclusion of rows $K = \{n_1, n_2, \ldots, n_k\} \subset S'$ can be written as an order $k \times M$ polynomial (composed of order $k \times M$ monomial terms) in the $k \times N$ array of variables $[P(i \mid n)]$ for $i \in S$, $n \in K$. The corresponding order $k$ principal minor of $|P_m|$ has identical value provided $m \notin K$ and is zero if $m \in K$. These facts, along with the succinct representation of the $P(i \mid n)$'s as rational functions of the objective function and algorithm parameters (Eq. 4.2, 4.11, 4.24) and the degeneration of $P$ to rank 1 at $\alpha = 1$ (Eq. 9.7), suggest an attempt to expand $\phi_m(1) = |P_m - I|$ as a multivariate Taylor's series in the $P(i \mid n)$'s about the point $\alpha = 1$ where, according to Eq. 9.6, $\forall i \in S,\ \forall n \in S':\; P(i \mid n) = 1/2^L$. (Actually, expanding the alternative form $\phi(1) - \phi_m(1) = |P - I| - |P_m - I|$ in a new array of $N' \times N$ variables which uniquely determines $[P(i \mid n)]$ proves more productive.) The constant term in the series is provided by Eq. 9.8 and the highest order terms in the series are the order $(N' - 1) \times M$ monomials contributed by the single nonzero order $N' - 1$ principal minor of $|P_m|$.
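The rank-one degeneration at $\alpha = 1$ invoked above (Eq. 9.7) can be checked for small cases. The following is a minimal sketch, assuming states are represented as occupancy vectors and using exact rational arithmetic; the helper names (`states`, `multinomial`, `transition_matrix_at_alpha_one`, `is_stationary`) are illustrative and do not appear in the original work.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import factorial


def states(num_strings, pop_size):
    """Enumerate S': occupancy vectors of length num_strings that sum to pop_size."""
    vectors = []
    for combo in combinations_with_replacement(range(num_strings), pop_size):
        vectors.append(tuple(combo.count(i) for i in range(num_strings)))
    return vectors


def multinomial(pop_size, occupancy):
    """Multinomial coefficient M! / prod_i m(i)!."""
    coeff = factorial(pop_size)
    for count in occupancy:
        coeff //= factorial(count)
    return coeff


def transition_matrix_at_alpha_one(L, M):
    """Build P at alpha = 1, assuming every P(i | n) equals 1 / 2**L (Eq. 9.6)."""
    s_prime = states(2 ** L, M)
    p = Fraction(1, 2 ** L)
    # Row n, column m holds P(m | n); at alpha = 1 it does not depend on n.
    row = [multinomial(M, m) * p ** M for m in s_prime]
    return s_prime, [list(row) for _ in s_prime]


def is_stationary(q, matrix):
    """Check q P = q exactly."""
    size = len(q)
    return all(
        sum(q[n] * matrix[n][m] for n in range(size)) == q[m] for m in range(size)
    )


s_prime, P = transition_matrix_at_alpha_one(L=2, M=3)
q = P[0]  # candidate stationary distribution: any row of the rank-one matrix
print(len(s_prime), sum(q), is_stationary(q, P))
```

Because every row of the matrix is the same multinomial distribution, any row is itself the stationary distribution, which is the content of the $q(m)|_{\alpha=1}$ result of Section 9.2.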
Let $r = [r(i,n)]$ be an $N' \times N$ nonnegative integer array having rows of the form $r_n = (r(0,n), r(1,n), \ldots, r(2^L - 1, n))$. The nonnegative integer $r(i,n)$ represents the exponent of the factor $(P(i \mid n) - 1/2^L)$ appearing in a monomial term of the Taylor's series.

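Also, let $\|r_n\| = \sum_{i \in S} r(i,n)$ and $\|r\| = \sum_{n \in S'} \|r_n\| = \sum_{n \in S'}\sum_{i \in S} r(i,n)$. Then, the Taylor's series expansion of $-|P_m - I| = |P - I| - |P_m - I|$ can be expressed as

$$-|P_m - I| = |P - I| - |P_m - I| = \sum_{r} C(m,r) \times \prod_{n \in S'}\prod_{i \in S}\left(P(i \mid n) - 1/2^L\right)^{r(i,n)} \qquad \text{(Eq. 9.9)}$$

where

$$C(m,r) = \left.\frac{\partial^{\|r\|}\left(-|P_m - I|\right)}{\prod_{n \in S'}\prod_{i \in S} r(i,n)!\;\partial P(i \mid n)^{r(i,n)}}\right|_{\alpha=1} = \frac{\left(-|P_m - I|\right)^{(r)}\big|_{\alpha=1}}{r!}$$

$$\phantom{C(m,r)} = \left.\frac{\partial^{\|r\|}\left(|P - I| - |P_m - I|\right)}{\prod_{n \in S'}\prod_{i \in S} r(i,n)!\;\partial P(i \mid n)^{r(i,n)}}\right|_{\alpha=1} = \frac{\left(|P - I| - |P_m - I|\right)^{(r)}\big|_{\alpha=1}}{r!} \qquad \text{(Eq. 9.10)}$$

is the coefficient of the order $\|r\|$ monomial term uniquely identified by the nonnegative integer array $r$. In these expressions, the symbol $r!$ denotes the operation

$$r! = \prod_{n \in S'}\prod_{i \in S}\,[r(i,n)!].$$

Expressing the value of $C(m,r)$ thus reduces to evaluating the indicated mixed partial derivative of $-|P_m - I| = |P - I| - |P_m - I|$ at $\alpha = 1$ divided by $r!$. The coefficient of the order $\|r\| = \|0\| = 0$ term is

$$C(m,0) = \left(|P - I| - |P_m - I|\right)^{(0)}\big|_{\alpha=1} = \{|P - I| - |P_m - I|\}_{\alpha=1}$$

and its value is the constant term of the series, provided by Eq. 9.8. The coefficient of the first order monomial term which results from setting

$$r^{(1)} = [r(i,n)^{(1)}] \quad \text{where} \quad r(i,n)^{(1)} = \begin{cases} 1 & i = i_1,\ n = n_1 \\ 0 & \text{otherwise} \end{cases}$$

is given by

$$C(m, r^{(1)}) = \left.\frac{\partial\left(|P - I| - |P_m - I|\right)}{\partial P(i_1 \mid n_1)}\right|_{\alpha=1} = \left(|P - I| - |P_m - I|\right)^{(r^{(1)})}\big|_{\alpha=1}.$$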

The associated monomial term in the Taylor's series expansion of $-|P_m - I| = |P - I| - |P_m - I|$ is given by

$$C(m, r^{(1)}) \times \left(P(i_1 \mid n_1) - 1/2^L\right).$$

In subsequent paragraphs, $\|r_m\| = 0$ and $\|r_n\| \le M$ for $n \ne m$, which together imply $\|r\| \le (N' - 1) \times M$, are shown to be suitable upper bounds on the order of differentiation with respect to the $P(i \mid n)$'s when computing $C(m,r)$. Thus, the Taylor's series terminates at finite order, as indeed it must since $-|P_m - I|$ is a polynomial function of the $P(i \mid n)$'s, and as noted earlier, the highest order monomial terms are order $(N' - 1) \times M$.

These upper bounds on the order of differentiation (i.e. upper bounds on $\|r_n\|$), along with the lower bound of 0 on every component of $r$ imposed by the requirement that $r$ be a nonnegative integer array, can be represented to advantage in terms of a set related to $S'$. Let $S'$, which is completely determined by the parameters $L$ and $M$, be momentarily represented by $S'(M)$ (that is, let its dependence on $M$ be explicitly indicated) and let the set $S''$ be defined as the set union of all $S'(k)$ for $0 \le k \le M$. That is

$$S'' = S'(0) \cup S'(1) \cup \cdots \cup S'(M-1) \cup S'(M) = \bigcup_{k=0}^{M} S'(k).$$

The above constraints on the rows of $r$ are then equivalent to requiring that every row of $r$ be drawn from $S''$, with the additional requirement that row $m$ be the specific element $r_m = 0 \in S''$. Since the $S'(k)$ for distinct $k$ are disjoint, it follows that $N'' = \mathrm{card}(S'')$ is the sum of the $N'(k) = \mathrm{card}[S'(k)]$, and consequently from Eq. 4.1 and an elementary recursion on the binomial coefficient that

$$N'' = \sum_{k=0}^{M}\binom{k + 2^L - 1}{k} = \binom{M + 2^L}{M}.$$

This result is precisely that supplied by Eq. 5.1, accompanying the state space enumeration empirical results tabulated in Section 5.2.
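The count of $S''$ stated above can be confirmed by direct enumeration. The sketch below assumes each $S'(k)$ is the set of length-$2^L$ occupancy vectors summing to $k$; the function names are hypothetical and introduced only for illustration.

```python
from itertools import combinations_with_replacement
from math import comb


def states(num_strings, pop_size):
    """Enumerate S'(pop_size): occupancy vectors of length num_strings summing to pop_size."""
    vectors = []
    for combo in combinations_with_replacement(range(num_strings), pop_size):
        vectors.append(tuple(combo.count(i) for i in range(num_strings)))
    return vectors


def s_double_prime(L, M):
    """Union of S'(k) for 0 <= k <= M, i.e. the admissible rows of the array r."""
    union = []
    for k in range(M + 1):
        union.extend(states(2 ** L, k))
    return union


for L in (1, 2, 3):
    for M in (1, 2, 3, 4):
        rows = s_double_prime(L, M)
        # The S'(k) are disjoint, so the list has no repeated elements.
        assert len(rows) == len(set(rows))
        print(L, M, len(rows), comb(M + 2 ** L, M))
```

For the one-bit problem with population size 2, this enumeration yields the six admissible rows (00), (10), (01), (20), (11) and (02).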

Since $P_m$ is independent of the $P(i \mid m)$'s associated with row $m$ (row $m$ of $P_m$ is set to zero), it follows that no monomial term containing any of the factors $(P(i \mid m) - 1/2^L)$ appears in the expansion of $-|P_m - I|$. Further, since $|P - I| - |P_m - I| = -|P_m - I|$, no such monomial terms appear with nonzero coefficient in the expansion of $|P - I| - |P_m - I|$ either. (Due to the constraint $\forall n \in S': \sum_{i \in S} P(i \mid n) = 1$, the aggregate of such terms appearing in $|P - I|$ is identically zero.) This observation establishes the $\|r_m\| = 0$ bound and permits the following revision of the Eq. 9.10 definition of $C(m,r)$:

$$C(m,r) = \begin{cases} \dfrac{\left(|P - I| - |P_m - I|\right)^{(r)}\big|_{\alpha=1}}{r!} & \|r_m\| = 0 \\[2ex] 0 & \text{otherwise} \end{cases} \qquad \text{(Eq. 9.11)}$$

The derivative of an order $n$ determinant with respect to a variable $x$ can be written as the sum of the $n$ determinants generated by differentiating each row (or column) in turn with respect to $x$ [Aitk54, MoSt64, Muir60]. For example, if

$$|A| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}$$

then

$$\frac{d|A|}{dx} = \begin{vmatrix} \dfrac{da_{11}}{dx} & \dfrac{da_{12}}{dx} \\ a_{21} & a_{22} \end{vmatrix} + \begin{vmatrix} a_{11} & a_{12} \\ \dfrac{da_{21}}{dx} & \dfrac{da_{22}}{dx} \end{vmatrix}.$$

If the elements of any row in the given determinant are independent of $x$, then differentiation of that row introduces an all zero row and the value of the corresponding determinant is zero. In particular, if only one row of the given determinant depends upon $x$, then only one nonzero determinant appears in the row-derivative expansion. Higher order and mixed partial derivatives of an order $n$ determinant can be expressed similarly, e.g.

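$$\frac{\partial^3 |A|}{\partial x_1^2\,\partial x_2} = \begin{vmatrix} \dfrac{\partial^3 a_{11}}{\partial x_1^2 \partial x_2} & \dfrac{\partial^3 a_{12}}{\partial x_1^2 \partial x_2} \\ a_{21} & a_{22} \end{vmatrix} + \begin{vmatrix} \dfrac{\partial^2 a_{11}}{\partial x_1^2} & \dfrac{\partial^2 a_{12}}{\partial x_1^2} \\ \dfrac{\partial a_{21}}{\partial x_2} & \dfrac{\partial a_{22}}{\partial x_2} \end{vmatrix} + 2\begin{vmatrix} \dfrac{\partial^2 a_{11}}{\partial x_1 \partial x_2} & \dfrac{\partial^2 a_{12}}{\partial x_1 \partial x_2} \\ \dfrac{\partial a_{21}}{\partial x_1} & \dfrac{\partial a_{22}}{\partial x_1} \end{vmatrix}$$

$$\phantom{\frac{\partial^3 |A|}{\partial x_1^2\,\partial x_2} =} + 2\begin{vmatrix} \dfrac{\partial a_{11}}{\partial x_1} & \dfrac{\partial a_{12}}{\partial x_1} \\ \dfrac{\partial^2 a_{21}}{\partial x_1 \partial x_2} & \dfrac{\partial^2 a_{22}}{\partial x_1 \partial x_2} \end{vmatrix} + \begin{vmatrix} \dfrac{\partial a_{11}}{\partial x_2} & \dfrac{\partial a_{12}}{\partial x_2} \\ \dfrac{\partial^2 a_{21}}{\partial x_1^2} & \dfrac{\partial^2 a_{22}}{\partial x_1^2} \end{vmatrix} + \begin{vmatrix} a_{11} & a_{12} \\ \dfrac{\partial^3 a_{21}}{\partial x_1^2 \partial x_2} & \dfrac{\partial^3 a_{22}}{\partial x_1^2 \partial x_2} \end{vmatrix}$$

and again, differentiation of any row with respect to a variable upon which it does not depend introduces an all zero row. Thus, if in the preceding result the first row of $A$ is independent of $x_2$ and the second of $x_1$, then only one of the determinants in the expansion survives:

$$\frac{\partial^3 |A|}{\partial x_1^2\,\partial x_2} = \begin{vmatrix} \dfrac{\partial^2 a_{11}}{\partial x_1^2} & \dfrac{\partial^2 a_{12}}{\partial x_1^2} \\ \dfrac{\partial a_{21}}{\partial x_2} & \dfrac{\partial a_{22}}{\partial x_2} \end{vmatrix}.$$

Since each $P(i \mid n)$ appears in only one row of $-|P_m - I| = |P - I| - |P_m - I|$, it follows from application of the preceding determinant differentiation rules that the mixed partial derivative $|P - I|^{(r)}$ can be written as the single determinant (indicated hereafter by $|(P - I)^{(r)}| = |P^{(r)} - I^{(r)}|$) generated by differentiating the rows of the matrix $(P - I)$ in accordance with $r$ and then computing the determinant of the matrix derivative. That is, due to the single-row dependence of $(P - I)$ on each $P(i \mid n)$, the two operations involved (differentiating $(P - I)$ and evaluating its determinant) commute. The same conclusion applies to any mixed partial derivative of $|P_m - I|$ with respect to the $P(i \mid n)$'s, and hence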

$$\left(|P - I| - |P_m - I|\right)^{(r)} = |P - I|^{(r)} - |P_m - I|^{(r)} = |(P - I)^{(r)}| - |(P_m - I)^{(r)}| = |P^{(r)} - I^{(r)}| - |P_m^{(r)} - I^{(r)}|. \qquad \text{(Eq. 9.12)}$$

Eq. 9.4 can be generalized to express the value of $(|P - I| - |P_m - I|)^{(r)}$ as indicated in the following. Let $r$ have $k \le N' - 1$ rows which specify nonzero order differentiation, let $K = \{n_1, n_2, \ldots, n_k\} \subset S'$ be the set of differentiated-row indices and further let $m \notin K$. Also, for $N' \ge u \ge k$ let $A_u(r)$ be the sum of all order $u$ principal minors of $|P^{(r)}|$ formed by including the $k$ differentiated rows indicated by $n \in K$ and $u - k$ of the $N' - k$ undifferentiated rows in $|P^{(r)}|$. Exactly

$$\binom{N' - k}{u - k} = \frac{(N' - k)!}{(u - k)!\,(N' - u)!}$$

order $u$ principal minors are summed to produce $A_u(r)$. Finally, let $A^m_u(r)$ be defined similarly for $|P_m^{(r)}|$. Then, applying the same elementary determinant expansion rules that lead to Eq. 9.1 and Eq. 9.3 to $|P^{(r)} - \lambda I^{(r)}|$ and $|P_m^{(r)} - \lambda I^{(r)}|$ yields

$$|P^{(r)} - \lambda I^{(r)}| = \sum_{u=k}^{N'} (-1)^{N'-u} A_u(r)\,\lambda^{N'-u}$$

and

$$|P_m^{(r)} - \lambda I^{(r)}| = \sum_{u=k}^{N'} (-1)^{N'-u} A^m_u(r)\,\lambda^{N'-u},$$

and substituting these results into Eq. 9.12 with $\lambda = 1$ yields the differentiated analog of Eq. 9.4

$$\left(|P - I| - |P_m - I|\right)^{(r)} = |P^{(r)} - I^{(r)}| - |P_m^{(r)} - I^{(r)}| = \sum_{u=k}^{N'} (-1)^{N'-u}\left(A_u(r) - A^m_u(r)\right). \qquad \text{(Eq. 9.13)}$$

If $|P^{(r)}|_K$ is the order $k$ principal minor of $|P^{(r)}|$ uniquely defined by the set of row/column indices $K = \{n_1, n_2, \ldots, n_k\} \subset S'$ where $m \notin K$, and if $|P^{(r)}|_{K_n}$ is the order $k + 1$ principal minor generated by including the undifferentiated row $n \in S' - K$ with $K$ (i.e.
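$K_n = K \cup \{n\} = \{n_1, n_2, \ldots, n_k, n\} \subset S'$), then

$$A_u(r) = \begin{cases} |P^{(r)}| & u = N' \\ \sum_{n \in S' - K} |P^{(r)}|_{K_n} & u = k + 1 \\ |P^{(r)}|_K & u = k \end{cases} \qquad \text{(Eq. 9.14)}$$

Also, since every principal minor of $|P_m^{(r)}|$ is either identical to the corresponding principal minor of $|P^{(r)}|$ or 0 depending upon whether or not it includes row $m$, and since $m \notin K$ by hypothesis, it follows that the $A^m_u(r)$'s corresponding to Eq. 9.14 are

$$A^m_u(r) = \begin{cases} |P_m^{(r)}| = 0 & u = N' \\ \sum_{n \in S' - K - \{m\}} |P^{(r)}|_{K_n} = A_{k+1}(r) - |P^{(r)}|_{K_m} & u = k + 1 \\ |P^{(r)}|_K & u = k \end{cases} \qquad \text{(Eq. 9.15)}$$

All of the $N' - k$ undifferentiated rows in $P^{(r)}|_{\alpha=1}$ are identical (Eq. 9.7) and consequently all minors of $|P^{(r)}|_{\alpha=1}$ of order $u \ge k + 2$ have value 0. Thus, in a fashion exactly analogous to the derivation of Eq. 9.8 from Eq. 9.4, Eq. 9.13 yields

$$\left\{\left(|P - I| - |P_m - I|\right)^{(r)}\right\}_{\alpha=1} = (-1)^{N'-k-1}\{A_{k+1}(r) - A^m_{k+1}(r)\}_{\alpha=1} + (-1)^{N'-k}\{A_k(r) - A^m_k(r)\}_{\alpha=1},$$

and substituting from Eq. 9.14 and 9.15 into this result produces

$$\left\{\left(|P - I| - |P_m - I|\right)^{(r)}\right\}_{\alpha=1} = (-1)^{N'-k-1}\left\{A_{k+1}(r) - \left[A_{k+1}(r) - |P^{(r)}|_{K_m}\right]\right\}_{\alpha=1} = (-1)^{N'-k-1}\,|P^{(r)}|_{K_m}\big|_{\alpha=1}, \qquad \text{(Eq. 9.16)}$$

from which, by substitution into Eq. 9.11, it follows that

$$C(m,r) = \begin{cases} (-1)^{N'-k-1}\,\dfrac{|P^{(r)}|_{K_m}\big|_{\alpha=1}}{r!} & \|r_m\| = 0 \\[2ex] 0 & \text{otherwise} \end{cases} \qquad \text{(Eq. 9.17)}$$

Evaluating $C(m,r)$ thus requires evaluating the quotient of the order $k + 1$ principal minor $|P^{(r)}|_{K_m}\big|_{\alpha=1}$ and $r!$. The order $k + 1$ principal minor $|P^{(r)}|_{K_m}$ is completely determined by the $(k + 1) \times (k + 1)$ sub-array of $P^{(r)}$ given by $[P^{(r)}(w \mid v)]$ for $w, v \in K_m$, where $P^{(r)}(w \mid v)$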
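denotes the indicated mixed partial derivative of $P(w \mid v)$. Further, in computing the determinant of this sub-array, the order in which the row/column indices are drawn from $K_m$ in the sub-array construction is immaterial because any transposition of the order introduces exactly one row transposition and one column transposition into the sub-array, so both the magnitude and the algebraic sign of its determinant are preserved. Thus, the most general form of $|P^{(r)}|_{K_m}$ can be expressed as

$$|P^{(r)}|_{K_m} = \begin{vmatrix} P^{(r)}(m \mid m) & P^{(r)}(n_1 \mid m) & P^{(r)}(n_2 \mid m) & \cdots & P^{(r)}(n_k \mid m) \\ P^{(r)}(m \mid n_1) & P^{(r)}(n_1 \mid n_1) & P^{(r)}(n_2 \mid n_1) & \cdots & P^{(r)}(n_k \mid n_1) \\ \vdots & \vdots & \vdots & & \vdots \\ P^{(r)}(m \mid n_k) & P^{(r)}(n_1 \mid n_k) & P^{(r)}(n_2 \mid n_k) & \cdots & P^{(r)}(n_k \mid n_k) \end{vmatrix} \qquad \text{(Eq. 9.18)}$$

From Eq. 4.15 and 4.25, it follows that each nonzero element in row $v \in K_m$ of $P^{(r)}$ is composed of a combinatorial coefficient and an order $M - \|r_v\|$ product of the $P(i \mid v)$'s. The general form of the element in column $w \in K_m$ of row $v$ is given by

$$P^{(r)}(w \mid v) = \begin{cases} \dbinom{M}{w}\displaystyle\prod_{i \in S} \frac{w(i)!}{(w(i) - r(i,v))!}\,P(i \mid v)^{w(i) - r(i,v)} & \forall i \in S: w(i) \ge r(i,v) \\[2ex] 0 & \text{otherwise} \end{cases}$$

which can be rewritten as

$$P^{(r)}(w \mid v) = \begin{cases} \dbinom{M}{w}\,\dfrac{\prod_{i \in S} w(i)!}{\prod_{i \in S} r(i,v)!\,(w(i) - r(i,v))!} \times \displaystyle\prod_{i \in S}\,[r(i,v)!]\,P(i \mid v)^{w(i) - r(i,v)} & \forall i \in S: w(i) \ge r(i,v) \\[2ex] 0 & \text{otherwise} \end{cases} \qquad \text{(Eq. 9.19)}$$

Further, by noting that the factor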

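$$\binom{M}{w}\,\frac{\prod_{i \in S} w(i)!}{\prod_{i \in S} r(i,v)!\,(w(i) - r(i,v))!} = \frac{M!}{\prod_{i \in S} w(i)!} \times \frac{\prod_{i \in S} w(i)!}{\prod_{i \in S} r(i,v)!\,(w(i) - r(i,v))!} = \frac{M!}{\prod_{i \in S} r(i,v)!\,(w(i) - r(i,v))!}$$

is a multinomial coefficient and designating it (via straightforward generalization of the convention introduced in Eq. 4.4) by

$$\binom{M}{w, r_v} = \begin{cases} \dfrac{M!}{\prod_{i \in S} (w(i) - r(i,v))!\,r(i,v)!} & \forall i \in S: w(i) \ge r(i,v) \\[2ex] 0 & \text{otherwise,} \end{cases} \qquad \text{(Eq. 9.20)}$$

Eq. 9.19 simplifies to

$$P^{(r)}(w \mid v) = \binom{M}{w, r_v}\prod_{i \in S}\,[r(i,v)!]\,P(i \mid v)^{w(i) - r(i,v)}. \qquad \text{(Eq. 9.21)}$$

If row $v$ is undifferentiated (i.e. $\|r_v\| = 0$), then Eq. 9.21 becomes

$$P^{(r)}(w \mid v) = \binom{M}{w}\prod_{i \in S} P(i \mid v)^{w(i)} = P(w \mid v).$$

It is noted in passing that if $M < N = 2^L$ then it follows from Eq. 9.20 that either

$$\binom{M}{w, r_v} = 0 \quad \text{or} \quad \exists\, w' \in S' \ni \binom{M}{w, r_v} = \binom{M}{w'}. \qquad \text{(Eq. 9.22)}$$

In the latter case, it is also true that

$$\binom{M}{w, r_v} = \binom{M}{w'} \ge \binom{M}{w}.$$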
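The generalized multinomial coefficient of Eq. 9.20 and the simplified derivative form of Eq. 9.21 can be exercised numerically. The sketch below is illustrative only: it assumes $P(w \mid v)$ has the multinomial form described in the text, and the function names are hypothetical. It compares Eq. 9.21 against term-by-term differentiation of that multinomial form.

```python
from fractions import Fraction
from math import factorial


def extended_multinomial(M, w, r_v):
    """Coefficient of Eq. 9.20: M! / prod_i (w(i) - r(i,v))! r(i,v)!, or 0 if any w(i) < r(i,v)."""
    if sum(w) != M:
        raise ValueError("w must be an occupancy vector summing to M")
    if any(wi < ri for wi, ri in zip(w, r_v)):
        return 0
    denom = 1
    for wi, ri in zip(w, r_v):
        denom *= factorial(wi - ri) * factorial(ri)
    return factorial(M) // denom


def derivative_direct(M, w, r_v, p):
    """Differentiate M!/prod w(i)! * prod p_i**w(i) exactly r_v(i) times in each p_i."""
    value = Fraction(factorial(M))
    for wi, ri, pi in zip(w, r_v, p):
        if wi < ri:
            return Fraction(0)
        # d^r/dp^r of p**w / w!  equals  p**(w - r) / (w - r)!
        value *= Fraction(pi) ** (wi - ri) / factorial(wi - ri)
    return value


def derivative_eq_9_21(M, w, r_v, p):
    """Evaluate the right-hand side of Eq. 9.21."""
    value = Fraction(extended_multinomial(M, w, r_v))
    for wi, ri, pi in zip(w, r_v, p):
        value *= factorial(ri) * Fraction(pi) ** max(wi - ri, 0)
    return value


p = (Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8))
w, r_v = (2, 1, 0, 0), (1, 1, 0, 0)
print(derivative_direct(3, w, r_v, p), derivative_eq_9_21(3, w, r_v, p))
```

With $r_v = 0$ the coefficient reduces to the ordinary multinomial coefficient, which matches the undifferentiated-row case noted after Eq. 9.21.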

The enabling condition imposes no practical limitation because any algorithm with $M \ge N$ could be effectively supplanted by exhaustive search over $S$.

Since $|P^{(r)}|_{K_m}$ includes every row of $P^{(r)}$ which is differentiated to nonzero order (i.e. $\forall n \in S' \ni \|r_n\| > 0$), it follows from Eq. 9.18, Eq. 9.20 and Eq. 9.21 that any row $n$ for which $\|r_n\| > M$ introduces an all zero row into $|P^{(r)}|_{K_m}$, making both $|P^{(r)}|_{K_m} = 0$ and $C(m,r) = 0$. Therefore, $\|r_n\| \le M$ represents a suitable upper bound on $\|r_n\|$ for $n \ne m$. This bound, along with the previously established condition $\|r_m\| = 0$, implies $\|r\| \le (N' - 1) \times M$ and permits the following revision of the Eq. 9.17 definition of $C(m,r)$:

$$C(m,r) = \begin{cases} (-1)^{N'-k-1}\,\dfrac{|P^{(r)}|_{K_m}\big|_{\alpha=1}}{r!} & \|r_m\| = 0,\ \|r_n\| \le M \text{ for } n \ne m \\[2ex] 0 & \text{otherwise} \end{cases}$$

Further, the conditions in this result can be expressed in terms of $S''$, yielding

$$C(m,r) = \begin{cases} (-1)^{N'-k-1}\,\dfrac{|P^{(r)}|_{K_m}\big|_{\alpha=1}}{r!} & r_m = 0,\ r_n \in S'' - \{0\} \text{ for } n \ne m \\[2ex] 0 & \text{otherwise} \end{cases} \qquad \text{(Eq. 9.23)}$$

At $\alpha = 1$, using Eq. 9.6 in Eq. 9.21 yields

$$P^{(r)}(w \mid v)\big|_{\alpha=1} = \binom{M}{w, r_v}\prod_{i \in S}\left[r(i,v)!\,(1/2^L)^{w(i) - r(i,v)}\right] = \binom{M}{w, r_v}\,(1/2^L)^{M - \|r_v\|}\prod_{i \in S}\,[r(i,v)!]. \qquad \text{(Eq. 9.24)}$$

Thus, every element in row $v \in K_m$ of $|P^{(r)}|_{K_m}$ includes the constant factor

$$(1/2^L)^{M - \|r_v\|}\prod_{i \in S}\,[r(i,v)!].$$

Substituting Eq. 9.24 into Eq. 9.18 and collecting these common row factors outside the determinant yields

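$$|P^{(r)}|_{K_m}\big|_{\alpha=1} = (1/2^L)^{(k+1)M - \|r\|} \times \prod_{v \in K_m}\prod_{i \in S} r(i,v)! \times \begin{vmatrix} \dbinom{M}{m, r_m} & \dbinom{M}{n_1, r_m} & \cdots & \dbinom{M}{n_k, r_m} \\ \dbinom{M}{m, r_{n_1}} & \dbinom{M}{n_1, r_{n_1}} & \cdots & \dbinom{M}{n_k, r_{n_1}} \\ \vdots & \vdots & & \vdots \\ \dbinom{M}{m, r_{n_k}} & \dbinom{M}{n_1, r_{n_k}} & \cdots & \dbinom{M}{n_k, r_{n_k}} \end{vmatrix}.$$

Also, since

$$r_m = 0 \;\Rightarrow\; \binom{M}{w, r_m} = \binom{M}{w, 0} = \binom{M}{w}$$

and since $\prod_{v \in K_m}\prod_{i \in S} r(i,v)! = r!$,

$$|P^{(r)}|_{K_m}\big|_{\alpha=1} = (1/2^L)^{(k+1)M - \|r\|} \times r! \times \begin{vmatrix} \dbinom{M}{m} & \dbinom{M}{n_1} & \cdots & \dbinom{M}{n_k} \\ \dbinom{M}{m, r_{n_1}} & \dbinom{M}{n_1, r_{n_1}} & \cdots & \dbinom{M}{n_k, r_{n_1}} \\ \vdots & \vdots & & \vdots \\ \dbinom{M}{m, r_{n_k}} & \dbinom{M}{n_1, r_{n_k}} & \cdots & \dbinom{M}{n_k, r_{n_k}} \end{vmatrix}$$

and substitution into Eq. 9.23 yields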

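$$C(m,r) = (-1)^{N'-k-1} \times (1/2^L)^{(k+1)M - \|r\|} \times \begin{vmatrix} \dbinom{M}{m} & \dbinom{M}{n_1} & \cdots & \dbinom{M}{n_k} \\ \dbinom{M}{m, r_{n_1}} & \dbinom{M}{n_1, r_{n_1}} & \cdots & \dbinom{M}{n_k, r_{n_1}} \\ \vdots & \vdots & & \vdots \\ \dbinom{M}{m, r_{n_k}} & \dbinom{M}{n_1, r_{n_k}} & \cdots & \dbinom{M}{n_k, r_{n_k}} \end{vmatrix} \qquad \text{(Eq. 9.25)}$$

Note that the condition $r_m = 0$ is implicitly asserted in this result by the form of the first row of the combinatorial determinant, and that the condition $r_n \in S'' - \{0\}$ for $n \ne m$ is enforced by the definition in Eq. 9.20.

When Eq. 9.25 is employed with Eq. 9.9, an additional simplification becomes available. The simplification obtains by incorporating the factor $(1/2^L)^{-\|r\|} = (2^L)^{\|r\|}$ present in the Eq. 9.25 definition of $C(m,r)$ with the product factor in Eq. 9.9. That is

$$-|P_m - I| = |P - I| - |P_m - I| = \sum_{r} C(m,r) \times \prod_{n \in S'}\prod_{i \in S}\left(P(i \mid n) - 1/2^L\right)^{r(i,n)}$$

$$\phantom{-|P_m - I|} = \sum_{r} C(m,r) \times (1/2^L)^{\|r\|} \times (2^L)^{\|r\|}\prod_{n \in S'}\prod_{i \in S}\left(P(i \mid n) - 1/2^L\right)^{r(i,n)}$$

$$\phantom{-|P_m - I|} = \sum_{r} C'(m,r) \times \prod_{n \in S'}\prod_{i \in S}\left(2^L P(i \mid n) - 1\right)^{r(i,n)} \qquad \text{(Eq. 9.26)}$$

where

$$C'(m,r) = C(m,r) \times (1/2^L)^{\|r\|}$$

is the coefficient of the indicated monomial in the new variables. Substitution of Eq. 9.25 into this expression for $C'(m,r)$ yields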

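$$C'(m,r) = (-1)^{N'-k-1}\left(\frac{1}{2}\right)^{(k+1)ML} \times \begin{vmatrix} \dbinom{M}{m} & \dbinom{M}{n_1} & \dbinom{M}{n_2} & \cdots & \dbinom{M}{n_k} \\ \dbinom{M}{m, r_{n_1}} & \dbinom{M}{n_1, r_{n_1}} & \dbinom{M}{n_2, r_{n_1}} & \cdots & \dbinom{M}{n_k, r_{n_1}} \\ \vdots & \vdots & \vdots & & \vdots \\ \dbinom{M}{m, r_{n_k}} & \dbinom{M}{n_1, r_{n_k}} & \dbinom{M}{n_2, r_{n_k}} & \cdots & \dbinom{M}{n_k, r_{n_k}} \end{vmatrix} \qquad \text{(Eq. 9.27)}$$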

[...] setting row $m_A$ to $0^T$. Further, by virtue of their construction (Eq. 7.7 and 7.10), the single row dependence of the matrix elements on the conditional probability array $[P(i \mid n)]$ employed in developing the results in Section 9.3 for $P$ and $P_m$ applies to $P(m_A)'$ and $P_{m_A}'$ as well. Thus, Eq. 9.26-27 should extend with very little modification to the determinants $-|P_{m_A}' - I'| = |P(m_A)' - I'| - |P_{m_A}' - I'|$, whose zero mutation probability limits are required by Propositions 7.5-6. The following paragraphs highlight the required modifications and employ the result to examine two simple examples.

In the $\alpha \to 0^+$ counterpart of Eq. 9.26-27, $m$ is limited to membership in the set of one-operator absorbing states (i.e. $m = m_A \in S_A'$), a consequence of which is that $\binom{M}{m_A} = 1$. Also, all rows of the determinant other than $m_A$ correspond to nonabsorbing states (i.e. $n \in K \subset S' - S_A'$). Thus, the determinant order is $N' - N + 1$ and the differentiation index array is order $(N' - N + 1) \times N$ with rows corresponding to row indices $n \in S' - S_A' + \{m_A\}$. The rows of $r$ are limited to $r_{m_A} = 0$ and $r_n \in S''$ for $n \in S' - S_A'$. Further, if $r$ indicates nonzero order differentiation of any rows which are adjacent to one-operator absorbing states, then the associated columns of the combinatorial determinant must reflect the coefficient contribution from the adjacent absorbing state. Thus, if $C_0'(m_A, r)$ denotes the limiting counterpart of $C'(m,r)$ in Eq. 9.27 and if $K = \{n_1, n_2, \ldots, n_k\} \subset S' - S_A'$ where (in the state adjacency notation introduced in Section 7.3) $n_1 \in S(n_{A_1})' \ne S(m_A)'$ and where $n_2, \ldots, n_k$ all satisfy $n_j \notin S_A''$, then the coefficient of the order $k$ monomial term uniquely identified by $r$ is given by

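$$C_0'(m_A, r) = (-1)^{N'-N-k}\left(\frac{1}{2}\right)^{(k+1)ML} \times \begin{vmatrix} 1 & \dbinom{M}{n_1} + \dbinom{M}{n_{A_1}} & \dbinom{M}{n_2} & \cdots & \dbinom{M}{n_k} \\ \dbinom{M}{m_A, r_{n_1}} & \dbinom{M}{n_1, r_{n_1}} + \dbinom{M}{n_{A_1}, r_{n_1}} & \dbinom{M}{n_2, r_{n_1}} & \cdots & \dbinom{M}{n_k, r_{n_1}} \\ \vdots & \vdots & \vdots & & \vdots \\ \dbinom{M}{m_A, r_{n_k}} & \dbinom{M}{n_1, r_{n_k}} + \dbinom{M}{n_{A_1}, r_{n_k}} & \dbinom{M}{n_2, r_{n_k}} & \cdots & \dbinom{M}{n_k, r_{n_k}} \end{vmatrix} \qquad \text{(Eq. 9.28)}$$

It is noted that Eq. 9.28 is only an example, not a definition. It must be adjusted based upon $r$ to reflect the number and location of the nonzero adjacent state contributions. The values of the determinants $-|P_{m_A}' - I'| = |P(m_A)' - I'| - |P_{m_A}' - I'|$ are given by employing Eq. 9.28 in Eq. 9.26 (with $r$ restricted as noted above). Further, the $\alpha \to 0^+$ limits of the $-|P_{m_A}' - I'| = |P(m_A)' - I'| - |P_{m_A}' - I'|$ are provided by using the $\alpha \to 0^+$ limits of the factors $(2^L P(i \mid n) - 1)$ in Eq. 9.26. Those limits are provided by using either $P_1(i \mid n)$ or $P_2(i \mid n)'$ depending upon whether the two or three-operator case is under consideration.

It is instructive to apply these results to a simple example. The following paragraphs do so for the one-bit problem with population size 2. These parameters ($L = 1$, $M = 2$) imply that $S = \{0, 1\}$, $N = 2$, $S' = \{(20), (11), (02)\}$, $N' = 3$, $S_A' = \{(20), (02)\}$ and $S' - S_A' = \{(11)\}$. Thus $r$ is limited to

$$r \in \left\{ \begin{pmatrix} 00 \\ 00 \\ 00 \end{pmatrix}, \begin{pmatrix} 00 \\ 10 \\ 00 \end{pmatrix}, \begin{pmatrix} 00 \\ 01 \\ 00 \end{pmatrix}, \begin{pmatrix} 00 \\ 11 \\ 00 \end{pmatrix}, \begin{pmatrix} 00 \\ 20 \\ 00 \end{pmatrix}, \begin{pmatrix} 00 \\ 02 \\ 00 \end{pmatrix} \right\},$$

and the combinatorial determinant required for evaluation of the nonzero order $C_0'(m_A, r)$'s for $m_A = (20)$ by Eq. 9.28 has the general form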

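$$\begin{vmatrix} \dbinom{2}{(20)} & \dbinom{2}{(11)} + \dbinom{2}{(02)} \\ \dbinom{2}{(20), r_{(11)}} & \dbinom{2}{(11), r_{(11)}} + \dbinom{2}{(02), r_{(11)}} \end{vmatrix} = \begin{vmatrix} 1 & 2 + 1 \\ \dbinom{2}{(20), r_{(11)}} & \dbinom{2}{(11), r_{(11)}} + \dbinom{2}{(02), r_{(11)}} \end{vmatrix}.$$

Evaluation of the zero order coefficient proceeds as follows:

$$C_0'\!\left((20), \begin{pmatrix} 00 \\ 00 \\ 00 \end{pmatrix}\right) = (-1)^{(3-2-0)}\left(\frac{1}{2}\right)^{2}\binom{2}{(20)} = (-1) \times \frac{1}{4} \times 1 = -\frac{1}{4}.$$

The coefficient corresponding to $r_{(11)} = (10)$ is given by

$$C_0'\!\left((20), \begin{pmatrix} 00 \\ 10 \\ 00 \end{pmatrix}\right) = (-1)^{(3-2-1)}\left(\frac{1}{2}\right)^{4}\begin{vmatrix} 1 & 2 + 1 \\ \dbinom{2}{(20), (10)} & \dbinom{2}{(11), (10)} + \dbinom{2}{(02), (10)} \end{vmatrix} = \frac{1}{16}\begin{vmatrix} 1 & 3 \\ 2 & 2 + 0 \end{vmatrix} = -\frac{1}{4}.$$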

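Similarly, the remaining coefficients are

$$C_0'\!\left((20), \begin{pmatrix} 00 \\ 01 \\ 00 \end{pmatrix}\right) = \frac{1}{4}, \qquad C_0'\!\left((20), \begin{pmatrix} 00 \\ 11 \\ 00 \end{pmatrix}\right) = \frac{1}{8}, \qquad C_0'\!\left((20), \begin{pmatrix} 00 \\ 20 \\ 00 \end{pmatrix}\right) = -\frac{3}{16}$$

and

$$C_0'\!\left((20), \begin{pmatrix} 00 \\ 02 \\ 00 \end{pmatrix}\right) = \frac{1}{16}.$$

With the required coefficients provided above, the value of $-|P_{m_A}' - I'|$ for $m_A = (20)$ can be expressed (by Eq. 9.26) as

$$-|P_{(20)}' - I'| = -\frac{1}{4} - \frac{1}{4}\left(2P(0 \mid (11)) - 1\right) + \frac{1}{4}\left(2P(1 \mid (11)) - 1\right) + \frac{1}{8}\left(2P(0 \mid (11)) - 1\right)\left(2P(1 \mid (11)) - 1\right) - \frac{3}{16}\left(2P(0 \mid (11)) - 1\right)^2 + \frac{1}{16}\left(2P(1 \mid (11)) - 1\right)^2. \qquad \text{(Eq. 9.29)}$$

Then, since $P(0 \mid 11) + P(1 \mid 11) = 1 \Rightarrow (2P(1 \mid 11) - 1) = -(2P(0 \mid 11) - 1)$, Eq. 9.29 simplifies to

$$-|P_{(20)}' - I'| = -\frac{1}{4} - \frac{1}{2}\left(2P(0 \mid (11)) - 1\right) - \frac{1}{4}\left(2P(0 \mid (11)) - 1\right)^2 = -\frac{1}{4}\left[1 + \left(2P(0 \mid (11)) - 1\right)\right]^2 = -P(0 \mid 11)^2. \qquad \text{(Eq. 9.30)}$$

From the symmetry inherent in the problem, it follows that the $m_A = (02)$ counterpart of Eq. 9.30 is

$$-|P_{(02)}' - I'| = -P(1 \mid 11)^2, \qquad \text{(Eq. 9.31)}$$

and employing Eq. 9.30-31 with Proposition 7.5 yields (for the two-operator case)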

$$q_0(20) = \frac{P_1(0 \mid 11)^2}{P_1(0 \mid 11)^2 + P_1(1 \mid 11)^2} \quad \text{and} \quad q_0(02) = \frac{P_1(1 \mid 11)^2}{P_1(0 \mid 11)^2 + P_1(1 \mid 11)^2}. \qquad \text{(Eq. 9.32)}$$

Then, substituting Eq. 4.2 in Eq. 9.32 yields

$$q_0(20) = \frac{R(0)^2}{R(0)^2 + R(1)^2} \quad \text{and} \quad q_0(02) = \frac{R(1)^2}{R(0)^2 + R(1)^2}. \qquad \text{(Eq. 9.33)}$$

The limit for the nonabsorbing state $m = (11)$ is known to be zero by Proposition 7.5. An identical result to Eq. 9.33 obtains for the three-operator case because for the one-bit problem, crossover is nullified and $P_2(i \mid n)' = P_1(i \mid n)$ (see Eq. 4.22).

Additional insight into the behavior of the limiting stationary distribution is obtainable by examining the one-bit problem with population size 3. These parameters ($L = 1$, $M = 3$) leave $S$ and $N$ unaltered but change the other state space related sets and parameters to $S' = \{(30), (21), (12), (03)\}$, $N' = 4$, $S_A' = \{(30), (03)\}$ and $S' - S_A' = \{(21), (12)\}$. By retracing the previous development (the $M = 2$ case) with $r$ limited as indicated by these state space sets, results analogous to Eq. 9.32 and 9.33 are obtained. Thus, the $M = 3$ counterpart of Eq. 9.32 is

$$q_0(30) = \frac{P_1(0 \mid 21)^3\left[P_1(0 \mid 12)^3 + 3P_1(0 \mid 12)^2 P_1(1 \mid 12)\right] + P_1(0 \mid 12)^3\left[P_1(1 \mid 21)^3 + 3P_1(0 \mid 21)P_1(1 \mid 21)^2\right]}{D}$$

and

$$q_0(03) = \frac{P_1(1 \mid 12)^3\left[P_1(1 \mid 21)^3 + 3P_1(1 \mid 21)^2 P_1(0 \mid 21)\right] + P_1(1 \mid 21)^3\left[P_1(0 \mid 12)^3 + 3P_1(1 \mid 12)P_1(0 \mid 12)^2\right]}{D} \qquad \text{(Eq. 9.34)}$$

where

$$D = P_1(0 \mid 21)^3\left[P_1(0 \mid 12)^3 + 3P_1(0 \mid 12)^2 P_1(1 \mid 12)\right] + P_1(0 \mid 12)^3\left[P_1(1 \mid 21)^3 + 3P_1(0 \mid 21)P_1(1 \mid 21)^2\right]$$

$$\phantom{D =} + P_1(1 \mid 12)^3\left[P_1(1 \mid 21)^3 + 3P_1(1 \mid 21)^2 P_1(0 \mid 21)\right] + P_1(1 \mid 21)^3\left[P_1(0 \mid 12)^3 + 3P_1(1 \mid 12)P_1(0 \mid 12)^2\right].$$

The Eq. 9.33 counterpart is

$$q_0(30) = \frac{[2R(0)]^3\left[R(0)^3 + 6R(0)^2 R(1)\right] + R(0)^3\left[R(1)^3 + 6R(0)R(1)^2\right]}{D'}$$

and

$$q_0(03) = \frac{[2R(1)]^3\left[R(1)^3 + 6R(1)^2 R(0)\right] + R(1)^3\left[R(0)^3 + 6R(1)R(0)^2\right]}{D'} \qquad \text{(Eq. 9.35)}$$

where

$$D' = [2R(0)]^3\left[R(0)^3 + 6R(0)^2 R(1)\right] + R(0)^3\left[R(1)^3 + 6R(0)R(1)^2\right] + [2R(1)]^3\left[R(1)^3 + 6R(1)^2 R(0)\right] + R(1)^3\left[R(0)^3 + 6R(1)R(0)^2\right].$$

Again, the three-operator case yields an identical result.

These examples suggest two very significant conjectural features of the limiting stationary distribution behavior. First, only order 2 monomial terms survive in the determinant expansions of the $M = 2$ case and only order 6 terms survive for $M = 3$. These facts lead to the supposition that in general, only order $M \times (N' - N)$ terms survive. In the $M = 2$ case, $M \times (N' - N) = 2 \times (3 - 2) = 2$ while for $M = 3$, $M \times (N' - N) = 3 \times (4 - 2) = 6$. If this supposition is correct, then the polynomial forms required for evaluating the stationary distribution zero mutation probability limit by Propositions 7.5 and 7.6 are homogeneous order $M \times (N' - N)$ polynomials in the $P(i \mid n)$'s. Presumably, the corresponding property (i.e. homogeneous order $M \times (N' - 1)$ polynomial forms) applies to the general case represented by Propositions 6.7 and 6.8.
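The closed forms in Eq. 9.33 and Eq. 9.35 can be cross-checked independently of the determinant expansion. The sketch below does not use the method of this section; it assumes a standard small-perturbation argument instead: in the zero mutation limit, a single mutation moves a uniform population to the neighboring mixed state, after which the one-operator (reproduction only) chain runs to absorption. It also assumes proportional selection, $P_1(0 \mid n) = n(0)R(0) / (n(0)R(0) + n(1)R(1))$, for the one-bit problem. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def limiting_distribution_one_bit(M, R0, R1):
    """Return (q0(all zeros), q0(all ones)) for L = 1, M >= 2, as mutation -> 0."""
    R0, R1 = Fraction(R0), Fraction(R1)
    interior = list(range(1, M))  # k = number of 0-strings in a nonabsorbing state

    def step(k, j):
        """One-operator probability of moving from k zeros to j zeros."""
        p0 = k * R0 / (k * R0 + (M - k) * R1)
        return comb(M, j) * p0 ** j * (1 - p0) ** (M - j)

    # h[k] = probability of absorption in the all-zeros state starting from k zeros.
    A = [[(1 if k == j else 0) - step(k, j) for j in interior] for k in interior]
    b = [step(k, M) for k in interior]
    h = solve(A, b)
    to_zeros = h[0]        # all-ones state, one mutation, then absorbed at all-zeros
    to_ones = 1 - h[-1]    # all-zeros state, one mutation, then absorbed at all-ones
    total = to_zeros + to_ones
    return to_zeros / total, to_ones / total


def eq_9_35(R0, R1):
    """Closed form of Eq. 9.35 for M = 3."""
    n30 = (2 * R0) ** 3 * (R0 ** 3 + 6 * R0 ** 2 * R1) + R0 ** 3 * (R1 ** 3 + 6 * R0 * R1 ** 2)
    n03 = (2 * R1) ** 3 * (R1 ** 3 + 6 * R1 ** 2 * R0) + R1 ** 3 * (R0 ** 3 + 6 * R1 * R0 ** 2)
    return Fraction(n30, n30 + n03), Fraction(n03, n30 + n03)


print(limiting_distribution_one_bit(2, 2, 1))
print(limiting_distribution_one_bit(3, 2, 1), eq_9_35(2, 1))
```

For the illustrative reward values $R(0) = 2$, $R(1) = 1$, the two routes agree exactly, which lends some support to the reconstructed forms of Eq. 9.33 and Eq. 9.35.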

A second conjecture concerns the limiting distribution behavior as a function of the parameter $M$. The computed limiting distribution entropy results displayed in Section 5.5 suggest that the limiting distribution is dominated by optimal solutions for $M$ sufficiently large. That supposition is supported by the results in Eq. 9.33 and 9.35. In the $M = 2$ case, it follows immediately from Eq. 9.33 that $q_0(02)/q_0(20) = [R(1)/R(0)]^2$. For $M = 3$ and $R(1) < R(0)$ it is straightforward to show that a corresponding bounding relationship exists, i.e. $q_0(03)/q_0(30) < [R(1)/R(0)]^3$. This suggests that the ratio of the probabilities of the uniform population states corresponding to $i$ and $j$ with $R(i) < R(j)$ behaves at or better than

$$[R(i)/R(j)]^M \qquad \text{(Eq. 9.36)}$$

for $M$ sufficiently large. If this supposition is indeed correct, then the desired limiting distribution behavior for the two-operator simple genetic algorithm (i.e. probability zero for sub-optimal solutions) can be approached as closely as required by selecting $M$ sufficiently large.

The corresponding general case (i.e. $L > 1$) three-operator counterparts of Eq. 9.32 and 9.34 are expressed in terms of the $P_2(i \mid n)'$ array (Eq. 4.22). Thus, the numerator polynomial counterparts of Eq. 9.33 and 9.35 are expressed in terms of complex polynomial functions of the reward function values, and consequently it may be that no general case three-operator counterpart of Eq. 9.36 exists. (It is noted that the design of the reward functions employed in Section 5, in which only length 0-2 schema dependence is incorporated, tends to minimize crossover disruption, which may account for the progression toward optimality indicated by the three-operator results recorded in Figures 5-7 through 5-18.) The simulated annealing global optimality may thus extrapolate onto the simple genetic algorithm only in the $p_c \to 0$ and $M \to \infty$ limiting sense.

9.5 Extending the Stationary Distribution Representation

Eq. 9.26 and 9.27 represent an exact expression of the value of the determinant $-|P_m - I| = |P - I| - |P_m - I|$, and with Propositions 6.7 and 6.8 constitute an exact representation of the components of the stationary distribution of the two and three-operator
algorithms. Section 9.4 extends those results to the determinants $-|P_{m_A}' - I'| = |P(m_A)' - I'| - |P_{m_A}' - I'|$ whose $\alpha \to 0^+$ values are required for use in Propositions 7.5-6. The utility of these representations depends upon the ability to extract useful relationships between the $C'(m,r)$'s from the general form represented by Eq. 9.27 and Eq. 9.28. The following paragraphs examine the combinatorial determinants in the general forms provided by Eq. 9.27 and Eq. 9.28 and deduce some of the key relationships. The purpose of this effort is to provide a foundation for extending the stationary distribution representation methodology developed in Sections 9.2-9.4.

First, if the enabling condition for Eq. 9.22 is satisfied (i.e. $M < N$), then every element in the combinatorial determinant of Eq. 9.27 is either zero or it is the combinatorial determinant corresponding to the order zero coefficient for some state in $S'$. Thus, every coefficient of the form represented by Eq. 9.27 can be written as sums and products of order zero coefficients. An analogous conclusion applies to Eq. 9.28.

Second, it is clear from Eq. 9.27 that nonzero order differentiation of any two or more rows of $-|P_m - I| = |P - I| - |P_m - I|$ in an identical pattern (e.g. $r': 0 \ne r_{n_1} = r_{n_2}$ for $n_1 \ne n_2$) introduces identical rows into the combinatorial determinant, and thus makes $C'(m,r') = 0$. Consequently, no monomial terms corresponding to any $r'$ with identical nonzero rows survive in the expansion of $-|P_m - I| = |P - I| - |P_m - I|$. An identical conclusion applies to the coefficients of $-|P_{m_A}' - I'| = |P(m_A)' - I'| - |P_{m_A}' - I'|$, of which Eq. 9.28 is an exemplar.

A very important class of coefficient identities derives from transpositions of nonzero rows and columns of the differentiation order array. The resulting identities are very closely connected to the algebra of symmetric and alternating polynomials, and to an associated determinant concept called alternants, of which Vandermonde determinants are a special case.
Appendix C is provided to support the following paragraphs. From the form of the combinatorial determinants in Eq. 9.27 and Eq. 9.28, it is clear that exchanging any two of the $k$ rows indexed by row indices $n \in K$ is equivalent to
exchanging the corresponding nonzero rows of $r$. If $r'$ is derived from $r$ by such a row transposition, then it follows that

$$C'(m,r') = -C'(m,r).$$

Thus, $C'(m,r)$ establishes the value (to within a sign alternation) of the coefficients of $k!$ distinct monomial terms in the expansion of $-|P_m - I| = |P - I| - |P_m - I|$. An identical result applies to $C_0'(m_A,r)$ and the expansion of $-|P_{m_A}' - I'| = |P(m_A)' - I'| - |P_{m_A}' - I'|$. The collection of monomial terms corresponding to this coefficient identity can be written as the product of $C'(m,r)$ (or of $C_0'(m_A,r)$) and a polynomial function of the form defined in Eq. C.12 of Appendix C. That is, the collection of terms is a quasi-alternating polynomial function in the array of variables $(2^L P(i \mid n) - 1)$.

In addition to the preceding result, the following identity applies to $C'(m,r)$ and the expansion of $-|P_m - I| = |P - I| - |P_m - I|$. For any $n \in K$, transposition of columns $m$ and $n$ in the combinatorial determinant of Eq. 9.27 is equivalent to representing the value of $C'(n,r')$ where $r'$ is derived from $r$ by exchanging $r_n$ with $r_m = 0$. That is

$$n \in K \Rightarrow C'(n,r') = -C'(m,r).$$

Thus, the identical quasi-alternating function, evaluated in the new set of variables generated by replacing each $P(i \mid n)$ with the corresponding $P(i \mid m)$, is included in the expansion of $-|P_n - I| = |P - I| - |P_n - I|$. Collectively, these results account for $(k + 1)!$ of the coefficients required for representation of the stationary distribution.

Another class of coefficient identities derives from transpositions of the columns of $r$ (i.e. transpositions of $i, j \in S$). Let $m'$ be derived from $m$ by setting $m(j)' = m(i)$, $m(i)' = m(j)$, $n_1'$ from $n_1$ by setting $n_1(j)' = n_1(i)$, $n_1(i)' = n_1(j)$, etc. Then, if $r'$ is derived from $r$ by transposition of rows $m$ with $m'$, $n_1$ with $n_1'$, etc., followed by transposition of columns $i$ and $j$, it follows from Eq. 9.27 that

$$C'(m',r') = C'(m,r).$$

An identical result applies to $C_0'(m_A',r')$.
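The row-transposition identity $C'(m,r') = -C'(m,r)$ and the vanishing of coefficients with identical nonzero rows can be illustrated on a small case. The following sketch assumes the combinatorial determinant form of Eq. 9.27 as read here (first row of ordinary multinomial coefficients, remaining rows built from the Eq. 9.20 coefficient); the function names and the chosen example ($L = 1$, $M = 3$, $m = (30)$, $K = \{(21), (12)\}$) are illustrative only.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial


def extended_multinomial(M, w, r_v):
    """Coefficient of Eq. 9.20; zero whenever some w(i) < r(i,v)."""
    if any(wi < ri for wi, ri in zip(w, r_v)):
        return 0
    denom = 1
    for wi, ri in zip(w, r_v):
        denom *= factorial(wi - ri) * factorial(ri)
    return factorial(M) // denom


def determinant(matrix):
    """Leibniz expansion; adequate for the small arrays used here."""
    size = len(matrix)
    total = 0
    for perm in permutations(range(size)):
        inversions = sum(
            1 for a in range(size) for b in range(a + 1, size) if perm[a] > perm[b]
        )
        term = -1 if inversions % 2 else 1
        for row, col in enumerate(perm):
            term *= matrix[row][col]
        total += term
    return total


def c_prime(L, M, m, r):
    """Assumed reading of Eq. 9.27; r maps each differentiated state n in K to its row r_n."""
    n_prime = comb(M + 2 ** L - 1, M)
    K = list(r)
    k = len(K)
    zero = (0,) * (2 ** L)
    index = [m] + K  # row and column ordering (m, n_1, ..., n_k)
    rows = [
        [extended_multinomial(M, w, zero if v == m else r[v]) for w in index]
        for v in index
    ]
    sign = -1 if (n_prime - k - 1) % 2 else 1
    return sign * Fraction(determinant(rows), 2 ** ((k + 1) * M * L))


m = (3, 0)
r = {(2, 1): (2, 0), (1, 2): (0, 1)}
r_swapped = {(2, 1): (0, 1), (1, 2): (2, 0)}    # nonzero rows of r exchanged
r_identical = {(2, 1): (1, 0), (1, 2): (1, 0)}  # identical nonzero rows
print(c_prime(1, 3, m, r), c_prime(1, 3, m, r_swapped), c_prime(1, 3, m, r_identical))
```

With an empty `r` the same function returns the constant term of Eq. 9.8, which provides a simple consistency check on the assumed determinant layout.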

The number of distinct coefficients whose values are generated in this fashion from $C'(m,r)$ depends upon both the number and form of the nonzero columns in $r$. If the number of nonzero columns is $p$, then exchanging any of the $p$ nonzero columns with any of the $N - p$ zero columns generates the coefficient of a distinct monomial term. Exchanging a nonzero column with another nonzero column having a different column sum also generates a distinct coefficient. However, exchanging a nonzero column with another nonzero column having identical column sum may or may not generate a distinct coefficient, depending upon the distribution of the nonzero entries in the two columns, because it is possible for the transformation described above to translate one column into the other. A lower bound on the number of distinct coefficients thus generated is [...]. The collection of monomial terms corresponding to this coefficient identity can be written as the product of $C'(m,r)$ (or of $C_0'(m_A,r)$) and a polynomial function of the form defined in Eq. C.10 of Appendix C. That is, the collection of terms is a quasi-symmetric polynomial function in the array of variables $(2^L P(i \mid n) - 1)$.

These coefficient identities and their connection to the quasi-symmetric and quasi-alternating polynomials of Appendix C offer a promising mechanism for extending the stationary distribution representation work begun here. Examination of the general form $(2^L P(i \mid n) - 1)$ reveals that it is zero mean in the sense that

$$\sum_{i \in S}\left(2^L P(i \mid n) - 1\right) = 0.$$

This property, along with the common form of the elements in the conditional probability array $[P(i \mid n)]$, suggests that the symmetric and alternating polynomial forms required for evaluation of Propositions 6.7-6.8 or 6.5-6.6 may admit to large scale simplifications, and ultimately yield a tractable, explicit closed form expression for the stationary distribution components.


SECTION 10
CONCLUSIONS AND FUTURE DIRECTION

10.1 Summary

This dissertation reports an effort to establish an analytical framework for the simple genetic algorithm, based upon the asymptotic probability distribution of the generated solution sequences. The mechanism employed herein is extrapolation of the extensive existing theoretical foundation of the simulated annealing algorithm onto the genetic algorithm. That foundation is based upon the asymptotic behavior of a nonstationary Markov chain simulated annealing algorithm model.

The simulated annealing literature is reviewed in Section 2, with particular emphasis on the methodology employed to develop the key theoretical results. Those results include a demonstration that, provided a lower bound of the form $K/\log(k)$ on the algorithm parameter corresponding to absolute temperature is observed, the asymptotic probability distribution over the algorithm state space is zero for all states corresponding to sub-optimal solutions. Thus, the simulated annealing algorithm obtains (asymptotically) a globally optimal solution.

The genetic algorithm literature is reviewed in Section 3. The significant conclusion of that section is that while certain important theoretical results exist, notably the so-called schema theorem and some work on a problem construct referred to as the minimal deceptive problem, no genetic algorithm model or accompanying convergence theory comparable in scope to that of simulated annealing exists in the literature.

The fundamental purpose of the work described herein is to provide such an analytical framework by extrapolating the known simulated annealing theory onto the genetic algorithm. An essential first step toward that goal is development of a nonstationary Markov chain algorithm model for the genetic algorithm. That task is accomplished in Section 4.


The product of that effort is a very general nonstationary Markov chain model for variants of the algorithm incorporating combinations of the three fundamental genetic algorithm operators. The model is tailored to resemble the model employed in the analysis of the simulated annealing algorithm convergence behavior, with the mutation probability algorithm parameter playing a role analogous to absolute temperature in simulated annealing.

Additionally, some salient features of the model state behavior are pointed out in Section 4. In particular, the one-operator (reproduction only) simple genetic algorithm is shown to possess exactly $2^L$ absorbing states, one for each possible uniform population, while the two-operator (reproduction/mutation) and three-operator (reproduction/mutation/crossover) variants possess a unique stationary distribution. The expected value of the absorption time for the one-operator algorithm is finite and an upper bound is provided by Eq. 4.8. The probability distribution of the final solution state produced by the one-operator simple genetic algorithm depends upon the initial state, $m_0$.

The inclusion of the mutation operator is shown in Section 4 to provide a significant additional dimension to the state behavior of the time-homogeneous (stationary) two and three-operator variants of the algorithm: the existence of a unique stationary distribution. The significance of the unique stationary distribution is that the asymptotic state behavior is independent of the starting state. It is completely determined by the objective function and the algorithm parameters.

In Section 5, the genetic algorithm model is employed to generate some computer simulation results. Specifically, a combinatorial interpretation of the model state space is explored numerically in Section 5.2 and the limiting stationary distribution of the three-operator algorithm is approximated for a variety of algorithm parameter sets in Section 5.5.
A very significant feature of the limiting stationary distribution is suggested by the Section 5.5 results and later verified theoretically (in Section 7). It is that the limiting two and three-operator algorithm stationary distribution behavior necessary for extrapolating the simulated annealing asymptotic global optimality result does not follow. The limiting distribution is nonzero for all states corresponding to uniform populations (one-operator absorbing states), including those representing sub-optimal solutions. This complication precludes an exact extrapolation of the simulated annealing convergence theory onto the simple genetic algorithm. The Section 5 results do, however, reinforce the intuitive notion that increasing the algorithm population size parameter biases the limiting distribution towards the desired limiting behavior.

Section 6 employs the Perron-Frobenius Theorem (which is summarized in Appendix B) to formulate the time-homogeneous two and three-operator algorithm unique stationary distribution existence argument into a system of equations whose solution is the stationary distribution components. The solution is formulated in terms of Cramer's Rule and is not explicitly solved; however, the Section 6 results provide a remarkable degree of insight into the form of the solution and its behavior with respect to the algorithm parameters. Those results provide the foundation for the remaining sections.

The unique stationary distribution existence argument for the stationary two and three-operator algorithm variants only applies when the mutation probability parameter is strictly greater than zero. A one-operator (zero mutation probability) stationary distribution exists, but as demonstrated in Section 4.3.1 it is not unique. A very important requirement for extrapolation of the simulated annealing convergence theory onto the simple genetic algorithm is existence of a zero mutation probability limit for the stationary distribution. Section 7 is devoted to resolving that question affirmatively. It is based upon the results developed in Section 6 and it also verifies the Section 5.5 observation concerning the nonzero limit for all states corresponding to uniform populations.
A very significant theoretical contribution of this work is developed in Section 8. It is a monotonic mutation probability bound sufficient to guarantee strong ergodicity of the nonstationary two and three-operator simple genetic algorithm Markov chains. The parameter bound is analogous to the simulated annealing temperature schedule bound.


The bound is asserted in Proposition 8.1, and its form (i.e. [...]) is asymptotically superior to the $K/\log(k)$ bound associated with the simulated annealing algorithm. It is very noteworthy that the same bound applies both to the two and three-operator algorithm variants. At least in terms of the Section 8 bound, the crossover operator does not expedite convergence.

All of the results developed in Sections 7 and 8 are obtained without explicitly solving the stationary distribution system. Section 9 attacks the problem of explicit solution. It is a very extensive and somewhat tedious development. The product of that work is an expression for the general term in a multivariate Taylor's series expansion of the determinant form required for explicit solution of the stationary distribution equations. The results are expressed in Eq. 9.26 and 9.27 for the general nonzero mutation probability case, augmented by Eq. 9.28 for the zero mutation probability limit. These results stop short of a usable answer but they do provide some insight into the nature of the solution. Further, Section 9.5 provides some intriguing ideas for extending the work started in Section 9.

The attempt to extrapolate the simulated annealing convergence theory onto the genetic algorithm fails in the sense that the zero mutation probability stationary distribution limits of the two and three-operator simple genetic algorithm variants do not satisfy the required form for extrapolation of the simulated annealing global optimality result. However, evidence is provided which suggests that for the two-operator algorithm variant, the required behavior can be approached by increasing the population size parameter (Eq. 9.36). The question is more complicated for the three-operator case, and as pointed out in Section 9.4, implementation of crossover with nonzero crossover probability $p_c$ may indeed preclude convergence to global optimality even in the infinite population size limiting sense of Eq. 9.36.
The latter observation concerning crossover, along with the equivalence of the mutation probability sequence bounds for the two and three-operator cases noted previously, poses some significant questions concerning the role of the crossover operator. Indeed, from the results developed herein, it is not clear that any desirable effect on the asymptotic algorithm behavior obtains from application of the crossover operator, though it may have a desirable effect in expediting convergence in real (finite time) applications. The resolution of these questions, along with a host of other applications questions such as optimum population size, mutation and crossover probability parameter selection, number of iterations required to achieve acceptable results, etc., requires further progress on the stationary distribution representation task begun in Section 9.

10.2 Contributions of the Research

The research reported herein establishes a framework for modeling the genetic algorithm in terms of the asymptotic probability distribution of the solution sequences which it produces. Specific significant accomplishments include the following:

(1) A very general nonstationary Markov chain model of one, two and three-operator variants of the genetic algorithm, and a framework for analysis of the operators based upon their impact on the state space of the Markov chain

(2) Demonstration of the existence of a unique stationary distribution for the time-homogeneous (stationary) two and three-operator algorithm variants

(3) A stationary distribution solution in terms of the characteristic polynomials of matrices derived from the state transition matrix

(4) Demonstration of the existence of a zero mutation probability stationary distribution limit for the time-homogeneous two and three-operator algorithms

(5) A mutation probability schedule bound (analogous to the annealing schedule bound of simulated annealing) sufficient for the nonstationary two and three-operator genetic algorithm variants to achieve the limiting distribution


(6) A methodology for representing the two and three-operator stationary distribution components at all consistent values of mutation probability (including the zero mutation probability limit), and a proposed approach for extending that methodology to produce an explicit result.

10.3 Future Direction

In order to achieve the stated goal of this work, a complete analytical framework for the simple genetic algorithm, additional progress must be made on the stationary distribution solution effort begun in Section 9. The coefficient relationships noted in Section 9.5, especially the coefficient identities which attend transpositions of rows and columns in the differentiation order array and their connection with the quasi-symmetric and quasi-alternating polynomial notions presented in Appendix C, provide a foundation for proceeding with this effort. An explicit representation of the functional form of the stationary distribution, reduced to a rational function expression in the algorithm parameters and objective function, would provide a very valuable theoretical tool for use in the analysis of genetic algorithm performance, and is the ultimate goal. However, even if explicit solution is not attainable, it may prove possible to deduce very useful bounds on the stationary distribution components from continuation of the Section 9 development.

A second promising area for continuation of this work concerns the mutation probability parameter sequence bound provided in Section 8. It is based on very simple lower bounds (Eq. 4.21 and 4.31) which exist for the conditional probabilities which compose the state transition matrix, and it only employs the one-step transition matrix in the $(1 - \tau_1(P))$ sequence employed to establish weak ergodicity (Section 8.2). Some preliminary work not reported in Section 8 suggests that employing two-step transition matrices in summing the $(1 - \tau_1(P))$ sequence may allow a refinement of the bound to something of the form [...].
It also appears from that preliminary work that the same bound applies for both the two and three-operator algorithm variants.


APPENDIX A
DISCRETE TIME FINITE STATE MARKOV CHAINS

A.1 Introduction

The following paragraphs establish some definitions and theorems on discrete time finite state Markov chains and related stochastic matrix concepts. These results fall into three main categories: (1) elementary definitions, (2) definitions and theorems concerning the state space and asymptotic behavior of time-homogeneous (stationary) Markov chains and (3) some more advanced ergodicity definitions and theorems necessary for the analysis of the asymptotic behavior of inhomogeneous Markov chains. These results are presented without proof or elaboration, but the foundation required for the more elementary of them can be obtained from [Cinl75, IsMa76] or many other references on Markov chains. The ergodicity related results can be found in [IsMa76, Sene81].

Although some of the results discussed here apply to continuous time and/or denumerably infinite state space Markov chains as well, the intention is to restrict consideration to the discrete time finite state case. All references herein to Markov chains are understood to mean discrete time finite state Markov chains.

In the following, let $K = \{0, 1, 2, \ldots\}$ be the set of nonnegative integers, let $X = \{X_k : k \in K\}$ be a discrete time (i.e. discrete sequence index) stochastic process with finite cardinality state space $E$, and let $i, j \in E$.

A.2 Elementary Definitions

Definition A1: If for all $i, j \in E$ and every $k \in K$ it follows that

$$\Pr\{X_{k+1} = j \mid X_0 = i_0, X_1 = i_1, \ldots, X_k = i\} = \Pr\{X_{k+1} = j \mid X_k = i\},$$

then $X$ is a Markov chain.


Definition A2: Any row vector $q^T = [q(i)]$, $i \in E$, satisfying the conditions

(1) $\forall i \in E: q(i) \ge 0$

(2) $\sum_{i \in E} q(i) = 1$

is called a probability vector.

Definition A3: Any square matrix $P$ whose rows are all composed of probability vectors is called a stochastic matrix, or sometimes more explicitly a row stochastic matrix. The row sum constraint on a row stochastic matrix can be written as $P\mathbf{1} = \mathbf{1}$, so 1 is an eigenvalue of every stochastic matrix.

Definition A4: The stochastic matrix

$$P_k = [p_k(i,j)] = [\Pr\{X_{k+1} = j \mid X_k = i\}]$$

is the one step transition probability matrix or state transition matrix of the Markov chain $X$. If the probability vectors $q_k$ and $q_{k+1}$ are respectively the probability distributions of $X_k$ and $X_{k+1}$, then $q_{k+1}^T = q_k^T P_k$. Similarly, the stochastic matrix

$$P_{m,k} = [p_{m,k}(i,j)] = [\Pr\{X_k = j \mid X_m = i\}] = P_m P_{m+1} \cdots P_{k-1} = \prod_{l=m}^{k-1} P_l,$$

where $k = m + n$, $m, n \in K$, $n > 0$, is the n-step transition probability matrix of $X$, and

$$q_k^T = q_m^T P_{m,k} = q_m^T \prod_{l=m}^{k-1} P_l.$$

Definition A5: Let $P_k$ be the state transition matrix of the Markov chain $X$ at time (sequence index) $k$. Then, $X$ is time-homogeneous if and only if $\forall k \in K$ it follows that $P_k = P$, where $P$ is a constant state transition matrix.
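As an illustration of Definitions A2 through A4, the following minimal sketch checks the probability vector and stochastic matrix conditions and confirms that stepping a distribution through $P_0$ and then $P_1$ agrees with applying the two-step matrix $P_{0,2} = P_0 P_1$. The two matrices, the initial distribution and all function names are hypothetical values chosen for the example; they do not come from the genetic algorithm model.

```python
from fractions import Fraction as F

def is_probability_vector(q):
    """Definition A2: nonnegative components summing to one."""
    return all(x >= 0 for x in q) and sum(q) == 1

def is_stochastic(P):
    """Definition A3: square matrix whose rows are probability vectors."""
    return all(len(row) == len(P) and is_probability_vector(row) for row in P)

def step(q, P):
    """Row vector times matrix: q_{k+1}^T = q_k^T P_k."""
    n = len(P)
    return [sum(q[i] * P[i][j] for i in range(n)) for j in range(n)]

def matmul(A, B):
    """Product of two square matrices of equal size."""
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical two-state inhomogeneous chain (illustrative values only).
P0 = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]
P1 = [[F(9, 10), F(1, 10)], [F(1, 5), F(4, 5)]]
q0 = [F(1), F(0)]

q2_stepwise = step(step(q0, P0), P1)    # apply P_0, then P_1
q2_product = step(q0, matmul(P0, P1))   # apply the two-step matrix P_{0,2}
```

Exact rational arithmetic is used so that the row sum test can be an equality rather than a tolerance check.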


A.3 Time-Homogeneous Markov Chains

The time-homogeneous Markov chain $X$ is completely specified by its initial probability distribution, $q_0$, and state transition matrix, $P$. The probability distribution of $X_k$, $k \ge 1$, is given by

$$q_k^T = q_0^T P^k.$$

The following definitions and theorems concern the asymptotic behavior of the chain and some conditions on the state space which make the asymptotic behavior independent of $q_0$. In the following, let the $i, j \in E$ element of $P^k$ be denoted by $p^k(i,j)$.

Definition A6: A subset $E_0$ of the state space $E$ of the Markov chain $X$ is called closed if $\forall i \in E_0$, $\forall j \in E - E_0$ it follows that $p(i,j) = 0$. If the closed set $E_0$ contains the single state $i$, so that $p(i,i) = 1$, then the state $i$ is called an absorbing state.

Definition A7: A Markov chain is called irreducible if there exists no nonempty closed subset of its state space $E$ other than $E$ itself.

Definition A8: The states $i$ and $j$ are said to intercommunicate if $\exists k_i, k_j \in K$ such that $p^{k_i}(i,j) > 0$ and $p^{k_j}(j,i) > 0$.

Theorem A1: A Markov chain is irreducible if and only if all pairs of states intercommunicate.

Definition A9: State $i \in E$ of the Markov chain $X$ has period $d$ if the following two conditions hold:

(1) $p^k(i,i) = 0$ unless $k = md$ for some positive integer $m$ and


(2) $d$ is the largest integer with property (1).

If $d = 1$, state $i$ is called aperiodic. The Markov chain $X$ is aperiodic if and only if all states $i \in E$ are aperiodic.

Theorem A2: If $X$ is irreducible and if $\exists i \in E$ such that $p(i,i) > 0$, then $X$ is aperiodic.

Definition A10: Any probability vector $q$ over the state space of the time-homogeneous Markov chain $X$ and satisfying

$$q^T P = q^T$$

is called a stationary distribution of $X$. It is not necessarily unique.

Theorem A3: If the Markov chain $X$ is time-homogeneous, irreducible, aperiodic and has a finite state space, then a stationary distribution exists for $X$. Further, the stationary distribution is unique and is determined by

$$q^T P = q^T$$

and

$$q^T \mathbf{1} = 1.$$

Theorem A4: If the time-homogeneous Markov chain $X$ possesses a unique stationary distribution, $q$, then for every probability vector $x$ with compatible dimensions, it follows that

$$\lim_{k \to \infty} x^T P^k = q^T.$$
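The irreducibility criterion of Theorem A1 and the sufficient aperiodicity condition of Theorem A2 can be sketched as a reachability test on the zero pattern of $P$. This is a minimal illustration; the function names and the three example matrices are assumptions made for the example. Note that `aperiodic_by_theorem_a2` implements only the sufficient condition of Theorem A2, so a `False` result does not by itself establish periodicity.

```python
def reachable(P, i):
    """States j with p^k(i, j) > 0 for some k >= 1 (depth-first search)."""
    n = len(P)
    seen, stack = set(), [i]
    while stack:
        s = stack.pop()
        for j in range(n):
            if P[s][j] > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen

def is_irreducible(P):
    """Theorem A1: irreducible iff all pairs of states intercommunicate."""
    n = len(P)
    return all(reachable(P, i) == set(range(n)) for i in range(n))

def aperiodic_by_theorem_a2(P):
    """Sufficient test only: irreducible with p(i, i) > 0 for some state i."""
    return is_irreducible(P) and any(P[i][i] > 0 for i in range(len(P)))

# Hypothetical examples.
P_cycle = [[0, 1], [1, 0]]          # irreducible, but every state has period 2
P_lazy = [[0.5, 0.5], [1, 0]]       # irreducible with p(0, 0) > 0
P_absorbing = [[1, 0], [0.5, 0.5]]  # state 0 is absorbing, so {0} is closed
```

The absorbing example mirrors, in miniature, the behavior described for the one-operator algorithm, where closed single-state sets prevent irreducibility.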


A.4 Inhomogeneous Markov Chains

Complete specification of the inhomogeneous Markov chain $X$ requires its initial probability distribution, $q_0$, and the infinite sequence of state transition matrices, $\{P_k\}$, $k \ge 0$. The probability distribution of $X_k$, $k \ge 1$, is given by

$$q_k^T = q_0^T \prod_{n=0}^{k-1} P_n.$$

If the chain is asymptotically independent of $q_0$, then it is said to be ergodic. Two classes of ergodicity must be distinguished. The following definitions and theorems elaborate.

Definition A11: The inhomogeneous Markov chain $X$ is weakly ergodic if $\forall i, j, l \in E$, $\forall m \in K$,

$$\lim_{k \to \infty} \left(p_{m,k}(i,l) - p_{m,k}(j,l)\right) = 0.$$

Weak ergodicity does not require that either $\lim_{k \to \infty} p_{m,k}(i,l)$ or $\lim_{k \to \infty} p_{m,k}(j,l)$ exist.

Definition A12: Any scalar function $\tau(\cdot)$, continuous on the set of $n \times n$ stochastic matrices $P$ and satisfying $0 \le \tau(P) \le 1$, is called a coefficient of ergodicity. If in addition

$$\tau(P) = 0 \Leftrightarrow P = \mathbf{1} q^T,$$

where $q$ is any probability vector with compatible dimensions (i.e. when all rows of $P$ are identical probability vectors), then $\tau$ is said to be proper. Weak ergodicity is equivalent to

$$\tau(P_{m,k}) \to 0, \quad k \to \infty, \quad m \ge 0,$$

where $\tau$ is a proper coefficient of ergodicity.

Theorem A5: Let $P$ be an $n \times n$ stochastic matrix and let

$$\tau_1(P) = 1 - \min_{i,j} \sum_{k=1}^{n} \min\left(p(i,k), p(j,k)\right).$$

Then, $\tau_1(P)$ is a proper coefficient of ergodicity.
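The coefficient $\tau_1$ of Theorem A5 is straightforward to evaluate directly from its definition. The sketch below is a minimal illustration; the function name `tau1` and the three example matrices are assumptions made for the example. It shows the two extreme cases (identical rows and the identity matrix) together with one intermediate case.

```python
from fractions import Fraction as F

def tau1(P):
    """tau_1(P) = 1 - min over row pairs (i, j) of sum_k min(p(i,k), p(j,k))."""
    n = len(P)
    smallest_overlap = min(
        sum(min(P[i][k], P[j][k]) for k in range(n))
        for i in range(n)
        for j in range(n)
    )
    return 1 - smallest_overlap

# Hypothetical examples.
identical_rows = [[F(1, 3), F(2, 3)], [F(1, 3), F(2, 3)]]  # P = 1 q^T
identity = [[F(1), F(0)], [F(0), F(1)]]                    # rows share no mass
mixed = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]

values = (tau1(identical_rows), tau1(identity), tau1(mixed))
```

For `mixed`, the overlap of the two rows is $1/4 + 1/2 = 3/4$, so $\tau_1 = 1/4$; the identical-row matrix gives 0, consistent with $\tau_1$ being proper.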


Theorem A6: The inhomogeneous Markov chain $X$ is weakly ergodic if and only if there exists a strictly increasing sequence of positive numbers $\{k_l\}$, $l \in K$, such that [...]


APPENDIX B
THE PERRON-FROBENIUS THEOREM AND STOCHASTIC MATRICES

B.1 Introduction

A matrix possessing the property that all of its components are nonnegative is referred to as a nonnegative matrix. For the matrix $T$, this condition is indicated by $T \ge 0$. The case in which all components of $T$ are strictly positive is indicated by $T > 0$. This notation extends in the obvious manner to expressions such as $T \ge B \Leftrightarrow T - B \ge 0$ relating nonnegative matrices with compatible dimensions.

The definitions, theorems and corollary in Section B.2 below concern nonnegative matrices. They are the foundation for the Markov chain stationary distribution existence and representation theorem and related results summarized in Appendix A and employed in Sections 2, 4, 7 and 8. They are extracted from [Sene81], and are specialized in Section B.3 from the case of finite nonnegative matrices to the case of finite stochastic matrices. They are employed extensively in Sections 6 and 7.

B.2 The Perron-Frobenius Theorem and Ancillary Results for Primitive Matrices

Theorem B2 below is called the strong version of the Perron-Frobenius theorem. It applies to a class of nonnegative matrices referred to as primitive. A version of the theorem which applies to the wider class of irreducible nonnegative matrices is usually invoked for applications involving stochastic matrices, but the flexibility of the more general version is not required for the purposes herein. The connection of these results to those of Appendix A is provided by Theorem B1. It asserts that primitivity (Definition B1) is equivalent to the combination of irreducibility and aperiodicity, as defined in Appendix A.


Definition B1: A square nonnegative matrix, $T$, is primitive if there exists a positive integer $k$ such that $T^k > 0$.

Theorem B1: If the $n \times n$ nonnegative matrix $T$ is irreducible (Definition A7) and aperiodic (Definition A9), then $T$ is primitive, and conversely.

Theorem B2: Let $T$ be an $n \times n$ nonnegative primitive matrix. Then there exists an eigenvalue $r$ of $T$ such that

(a) $r$ is real, $r > 0$

(b) $r$ has corresponding left and right eigenvectors with strictly positive components

(c) $r > |\lambda|$ for any eigenvalue $\lambda \ne r$

(d) the eigenvectors associated with $r$ are unique to constant multiples

(e) if $0 \le B \le T$ and $\beta$ is an eigenvalue of $B$, then $|\beta| \le r$. Moreover, $|\beta| = r$ implies $B = T$.

(f) $r$ is a simple root of the characteristic polynomial of $T$.

Definition B2: The eigenvalue $r$ asserted in Theorem B2 is called the Perron-Frobenius eigenvalue of the nonnegative primitive matrix $T$.

Corollary B1: Let $T_{ij}$ be the components of a nonnegative primitive matrix $T$ having Perron-Frobenius eigenvalue $r$. Then

$$\min_i \sum_j T_{ij} \le r \le \max_i \sum_j T_{ij},$$

with equality on either side implying equality throughout (i.e. $r$ can only be equal to the maximal or minimal row sum if all row sums are equal). A similar proposition holds for column sums.
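Definition B1 and Corollary B1 can be explored numerically with a short sketch. Two points here are assumptions beyond the text above: the search for a positive power stops at Wielandt's bound $(n-1)^2 + 1$, a standard result for primitive matrices, and the Perron-Frobenius eigenvalue is estimated by plain power iteration (a swapped-in numerical technique, not a method used in this work). The function names and example matrices are hypothetical.

```python
def matmul(A, B):
    """Product of two square matrices of equal size."""
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

def is_primitive(T):
    """Definition B1: some power T^k is strictly positive.

    Only the zero pattern matters, so powers are tracked as 0/1 matrices.
    The search stops at Wielandt's bound k = (n - 1)^2 + 1.
    """
    n = len(T)
    pattern = [[1 if x > 0 else 0 for x in row] for row in T]
    power = pattern
    for _ in range((n - 1) ** 2 + 1):
        if all(x > 0 for row in power for x in row):
            return True
        power = [[min(1, x) for x in row] for row in matmul(power, pattern)]
    return False

def perron_root(T, iterations=200):
    """Power iteration estimate of the Perron-Frobenius eigenvalue r."""
    n = len(T)
    w = [1.0] * n
    r = 0.0
    for _ in range(iterations):
        y = [sum(T[i][j] * w[j] for j in range(n)) for i in range(n)]
        r = max(y)                 # w is scaled so that its largest entry is 1
        w = [v / r for v in y]
    return r

# Hypothetical examples.
T = [[0, 1], [1, 1]]      # primitive: T^2 > 0
swap = [[0, 1], [1, 0]]   # irreducible but periodic, hence not primitive

r = perron_root(T)
row_sums = [sum(row) for row in T]
```

For this $T$ the row sums are 1 and 2 and the estimate of $r$ lies strictly between them, as Corollary B1 requires when the row sums are unequal.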


Theorem B3: Let $T$ be an $n \times n$ nonnegative primitive matrix with Perron-Frobenius eigenvalue $r$, let $v$ and $w$ be strictly positive left and right eigenvectors respectively of $T$ corresponding to $r$, with $v$ and $w$ normed so that $v^T w = 1$, and let the $t \le n$ distinct eigenvalues of $T$ be ordered such that

$$r > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_t|,$$

with the additional condition that $\lambda_2$ has multiplicity $m_2$ equal to or greater than the multiplicity of any other eigenvalue $\lambda_i$ for which $|\lambda_i| = |\lambda_2|$. It follows that

(a) if $\lambda_2 \ne 0$, then as $k \to \infty$

$$T^k = r^k w v^T + O\left(k^s |\lambda_2|^k\right)$$

elementwise, where $s = m_2 - 1$;

(b) if $\lambda_2 = 0$, then for $k \ge n - 1$

$$T^k = r^k w v^T.$$

B.3 The Perron-Frobenius Theory for Stochastic Matrices

A stochastic matrix (e.g. the state transition matrix of a Markov chain) is a special case of a square nonnegative matrix in which all row sums are equal to the constant 1. The following results specialize those of Section B.2 to the case of $T$ an $n \times n$ stochastic primitive matrix, $P$.

Theorem B4: Let $P$ be an $n \times n$ stochastic primitive matrix. Then

(a) $r = 1$ is an eigenvalue of $P$

(b) $r = 1$ has corresponding left and right eigenvectors with strictly positive components

(c) $r = 1 > |\lambda|$ for any eigenvalue $\lambda \ne 1$

(d) the eigenvectors associated with $r = 1$ are unique to constant multiples


(e) if $0 \le B \le P$ and $\beta$ is an eigenvalue of $B$, then $|\beta| \le r = 1$. Moreover, $|\beta| = r = 1$ implies $B = P$.

(f) $r = 1$ is a simple root of the characteristic polynomial of $P$.

This theorem follows immediately from Theorem B2 by application of Corollary B1 with $T$ a stochastic primitive matrix, $P$. Among its consequences are the following.

Proposition B1: The right eigenvector asserted in Theorem B4(b) and (d) can be selected as the vector $\mathbf{1}$.

This result follows from the row sum constraint on $n \times n$ stochastic matrices, which can be expressed as $P\mathbf{1} = \mathbf{1}$. Thus, $\mathbf{1}$ is a right eigenvector, corresponding to eigenvalue 1, of every $n \times n$ stochastic matrix. Theorem B4 asserts that for finite primitive stochastic matrices, it is unique to within a nonzero scalar multiple.

Proposition B2: Let the vector $q$ be the left eigenvector asserted in Theorem B4(b) and (d). Then, the additional constraint $q^T \mathbf{1} = 1$ is consistent and makes $q$ unique.

Since the left eigenvector asserted in Theorem B4(b) and (d) has strictly positive components, its inner product with the vector $\mathbf{1}$ is a strictly positive (nonzero) number. Consequently, that inner product can be used to normalize the eigenvector to produce a $q$ which satisfies both requirements, and Proposition B2 follows.

Proposition B3: If $P$ is an $n \times n$ stochastic primitive matrix, then

$$\lim_{k \to \infty} P^k = \mathbf{1} q^T,$$

where $q$ is the unique vector asserted in Proposition B2.


This result follows from Theorem B3 by specializing it to $n \times n$ stochastic primitive matrices via Theorem B4, Proposition B1 and Proposition B2. A very significant consequence is the following.

Proposition B4: If $P$ is an $n \times n$ stochastic primitive matrix and $x$ an arbitrary n-dimensional probability vector, then

$$\lim_{k \to \infty} x^T P^k = x^T \mathbf{1} q^T = q^T,$$

where $q$ is the unique vector asserted in Proposition B2.

Definition B3: If the $n \times n$ stochastic primitive matrix $P$ in Theorem B4 and Propositions B1-B4 is the state transition matrix of a time-homogeneous Markov chain, then the unique vector $q$ asserted in Proposition B2 is the stationary distribution of the Markov chain.
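Proposition B4 suggests a simple numerical experiment: iterate $x^T \leftarrow x^T P$ from different starting probability vectors and observe that the limit does not depend on the start. The sketch below is a minimal illustration under stated assumptions; the matrix, the stopping tolerance and the function names are hypothetical, and repeated multiplication is used only for illustration rather than as a recommended way to compute $q$.

```python
def step(x, P):
    """Row vector times matrix: x^T P."""
    n = len(P)
    return [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]

def limit_distribution(P, x, tol=1e-12, max_iter=10_000):
    """Iterate x^T <- x^T P until successive iterates agree to within tol."""
    for _ in range(max_iter):
        nxt = step(x, P)
        if max(abs(a - b) for a, b in zip(x, nxt)) < tol:
            return nxt
        x = nxt
    raise RuntimeError("no convergence within max_iter; P may not be primitive")

# Hypothetical primitive stochastic matrix (all entries strictly positive).
P = [[0.9, 0.1], [0.5, 0.5]]

q_from_state_0 = limit_distribution(P, [1.0, 0.0])
q_from_state_1 = limit_distribution(P, [0.0, 1.0])
```

For this matrix the balance condition $0.1\,q(0) = 0.5\,q(1)$ together with $q^T \mathbf{1} = 1$ gives $q = (5/6, 1/6)$, and both starting vectors approach it.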


APPENDIX C
VANDERMONDE DETERMINANTS, SYMMETRIC AND ALTERNATING POLYNOMIALS

C.1 Introduction

An order n determinant whose i, j element is given by $\phi_j(x_i)$ for some set of n scalar functions $\phi_j$ and a companion set of n scalar variables $x_i$, i.e.

$$A_n(x) = \begin{vmatrix} \phi_1(x_1) & \phi_2(x_1) & \cdots & \phi_n(x_1) \\ \phi_1(x_2) & \phi_2(x_2) & \cdots & \phi_n(x_2) \\ \vdots & \vdots & & \vdots \\ \phi_1(x_n) & \phi_2(x_n) & \cdots & \phi_n(x_n) \end{vmatrix} \tag{C.1}$$

is called an alternant. The name derives from the fact that exchanging any pair of the variables in its argument list (e.g. $x_p$ and $x_q$) affects the value of $A_n(x) = A_n(x_1, x_2, \ldots, x_n)$ only by reversing its algebraic sign. This property is clear from Eq. C.1, because transposing the variables $x_p$ and $x_q$ in $A_n(x)$ amounts to exchanging the corresponding rows of the determinant, and from an elementary property of determinants, any such row exchange leaves the determinant value unchanged in magnitude but reversed in sign.

The state transition matrix of the Markov chain representing the simple genetic algorithm, as introduced in Section 4 of this paper, is a multivariate generalization of the matrix form underlying the alternant. The coefficient symmetries noted in Section 9.5 in connection with the stationary distribution representation development are a consequence. Section 10 proposes exploiting this connection in continuing the stationary distribution representation work begun in Section 9. This appendix provides some of the related background.


If the $\phi_j$ in Eq. C.1 are consecutive integer powers of their arguments, indexed from 0 through $n-1$, i.e.

$$A_n(x) = |V_n(x)| = \begin{vmatrix} 1 & x_1 & \cdots & x_1^{n-1} \\ 1 & x_2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_n & \cdots & x_n^{n-1} \end{vmatrix} \tag{C.2}$$

then the resulting special case alternant is known as a Vandermonde determinant. The values of a Vandermonde determinant and its minors are closely related to a class of polynomials in n variables referred to as the symmetric polynomials (and to a companion class of polynomials referred to as alternating polynomials). The distinguishing feature of the symmetric polynomials is invariance with respect to permutations of the argument list (e.g. $\psi(x_1, x_2, x_3) = x_1 + x_2 + x_3 = \psi(x_2, x_1, x_3)$). Alternating polynomials reverse sign with each transposition of variables.

Section C.2 below develops an expression for the value of the order n Vandermonde determinant in Eq. C.2. The evaluation method employs the determinant form and a polynomial remainder theorem due to Bezout. Section C.3 introduces formal definitions of symmetric and alternating polynomials and a fundamental theorem which associates them with Vandermonde determinants. Section C.4 generalizes the symmetric and alternating polynomial notions to the form required by the discussion in Section 9.5.

C.2 Evaluation of Vandermonde Determinants

The value of the order n Vandermonde determinant can be deduced from its form (Eq. C.2) and a polynomial remainder theorem due to Bezout. Let $\psi(x)$ be an arbitrary polynomial function in n variables (i.e. $x = (x_1, x_2, \ldots, x_n)$) and let $x_i^{(a)}$ be generated from $x$ by replacing $x_i$ with the value $a$. Then, the theorem states that if $\psi(x)$ is divided by the binomial $(x_i - a)$, the remainder is $\psi(x_i^{(a)})$ [MoSt64]. That is,

$$\psi(x) = (x_i - a)\,\phi(x) + \psi(x_i^{(a)}).$$
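The remainder theorem can be spot-checked with integer arithmetic. For a polynomial with integer coefficients evaluated at integer arguments, the quotient $\phi(x)$ is itself an integer, so $\psi(x) - \psi(x_i^{(a)})$ must be exactly divisible by $(x_i - a)$. The sketch below is a minimal illustration; the polynomial `psi`, the helper names and the evaluation points are all hypothetical choices made for the example.

```python
def psi(x):
    """Hypothetical integer-coefficient polynomial in three variables."""
    x1, x2, x3 = x
    return x1 ** 3 * x2 - 2 * x1 * x3 ** 2 + x2 * x3 + 5

def substitute(x, i, a):
    """Return x with the component at index i replaced by the value a."""
    y = list(x)
    y[i] = a
    return tuple(y)

def remainder_identity_holds(x, i, a):
    """Check that (x_i - a) divides psi(x) - psi(x with x_i replaced by a)."""
    difference = psi(x) - psi(substitute(x, i, a))
    return difference % (x[i] - a) == 0

# Integer test points; each has x[i] != a so the divisor is nonzero.
checks = [
    remainder_identity_holds((7, -3, 4), 0, 2),
    remainder_identity_holds((7, -3, 4), 1, 10),
    remainder_identity_holds((1, 6, -2), 2, 9),
]
```

Choosing $a$ equal to another component of $x$ is the special case exploited in the Vandermonde argument, where the substituted determinant has two identical rows.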


If $a$ is selected as $a = x_j$ for some $j \ne i$, then $x_i^{(a)}$ contains the value $x_j$ at two distinct index locations in its list (i.e. at $i$ and $j$). Consequently, if $\psi(x)$ represents the value of the Vandermonde determinant in Eq. C.2 (i.e. $\psi(x) = A_n(x) = |V_n(x)|$), then the Vandermonde determinant represented by the polynomial function $\psi(x_i^{(a)}) = A_n(x_i^{(a)}) = |V_n(x_i^{(a)})|$ contains two identical rows and hence is zero. In that case, the Bezout theorem reduces to

$$\psi(x) = (x_i - x_j)\,\phi(x).$$

Thus $(x_i - x_j)$ is a factor of $A_n(x) = |V_n(x)|$. This argument applies to each of the $n(n-1)/2$ distinct difference factors $(x_p - x_q)$. It follows that $A_n(x) = |V_n(x)|$ can be written as [...]

[...] where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ is a permutation of the integers $(1, 2, \ldots, n)$, is said to be symmetric with respect to the given permutation. If this property applies for all $n!$ permutations of the $x_k$'s, then $\psi$ is a symmetric polynomial [MoSt64]. Some examples of symmetric polynomials are

$$\psi(x_1, x_2, x_3) = x_1 + x_2 + x_3$$

and

$$\psi(x_1, x_2, \ldots, x_n) = x_1^k + x_2^k + \cdots + x_n^k.$$

It is straightforward to show that any sum, difference or product of symmetric polynomials is a symmetric polynomial. In fact, the symmetric polynomials form a ring.

If the symmetric polynomial $\psi(x) = \psi(x_1, x_2, \ldots, x_n)$ includes the monomial

$$a\,x_1^{p_1} x_2^{p_2} \cdots x_n^{p_n}$$

among its terms, then it includes also the monomial

$$a\,x_{\alpha_1}^{p_1} x_{\alpha_2}^{p_2} \cdots x_{\alpha_n}^{p_n},$$

where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ is an arbitrary permutation of the integers $(1, 2, \ldots, n)$. If for a given $p = (p_1, p_2, \ldots, p_n)$ the sum of all distinct monomials of this form is designated by

$$\phi_p(x) = \sum x_{\alpha_1}^{p_1} x_{\alpha_2}^{p_2} \cdots x_{\alpha_n}^{p_n}, \tag{C.6}$$

then $\phi_p(x)$ is symmetric, and further, the arbitrary symmetric polynomial $\psi(x)$ can be written as a linear combination of a finite number of such polynomials. That is,

$$\psi(x) = \sum_p a_p \phi_p(x).$$

A transposition of the ordered list of n variables $x_k$, $1 \le k \le n$, is a permutation which exchanges the positions of any two of the $x_k$'s. Every permutation of the ordered list can be written as a composition of transpositions applied to $(1, 2, \ldots, n)$, and for any specified permutation, if any such composition includes an odd number of transpositions, then every such composition includes an odd number of transpositions. Similarly, if any such composition includes an even number of transpositions, then every such composition includes an even number of transpositions. A permutation is designated odd or even depending upon whether its decomposition into transpositions yields an odd or even number of factors respectively. If $n > 1$, then exactly $n!/2$ odd and $n!/2$ even permutations exist [MoSt64].

A polynomial $\gamma$ in n variables possessing the property that

$$\gamma(x_1, x_2, \ldots, x_n) = -\gamma(x_{\alpha_1}, x_{\alpha_2}, \ldots, x_{\alpha_n}) \tag{C.7}$$

for every odd permutation $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ of its argument list is an alternating polynomial. It follows that any sum or difference of alternating polynomials is an alternating polynomial, that the product of a symmetric polynomial and an alternating polynomial is an alternating polynomial and that the product of any odd number of alternating polynomials is an alternating polynomial. The product of any even number of alternating polynomials is symmetric.

If the alternating polynomial $\gamma(x) = \gamma(x_1, x_2, \ldots, x_n)$ includes the monomial

$$a\,x_1^{p_1} x_2^{p_2} \cdots x_n^{p_n}$$

among its terms, then it includes also the monomial

$$(-1)^{s(\alpha)}\,a\,x_{\alpha_1}^{p_1} x_{\alpha_2}^{p_2} \cdots x_{\alpha_n}^{p_n},$$

where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ is an arbitrary permutation of the integers $(1, 2, \ldots, n)$ and where $s(\alpha)$ is the number of transpositions in the permutation $\alpha$. If for a given $p = (p_1, p_2, \ldots, p_n)$ the sum of all distinct monomials of this form is designated by

$$\beta_p(x) = \sum (-1)^{s(\alpha)} x_{\alpha_1}^{p_1} x_{\alpha_2}^{p_2} \cdots x_{\alpha_n}^{p_n}, \tag{C.8}$$

then $\beta_p(x)$ is alternating, and further, the arbitrary alternating polynomial $\gamma(x)$ can be written as a linear combination of a finite number of such polynomials. That is,

$$\gamma(x) = \sum_p a_p \beta_p(x).$$
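The sign reversal of Eq. C.7 and the parity count quoted from [MoSt64] can be checked numerically for the Vandermonde determinant of Eq. C.2. The sketch below is a minimal illustration under stated assumptions: the determinant is expanded by the Leibniz formula (adequate only for small orders), the product of differences $\prod_{j<k}(x_k - x_j)$ is taken as the standard closed form for a Vandermonde determinant with rows $(1, x_i, \ldots, x_i^{n-1})$, and all function names and sample values are hypothetical.

```python
from itertools import permutations

def sign(perm):
    """+1 for an even permutation, -1 for an odd one (inversion count)."""
    n = len(perm)
    inversions = sum(
        1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b]
    )
    return -1 if inversions % 2 else 1

def determinant(M):
    """Leibniz expansion over all permutations; fine for small orders."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for row, col in enumerate(perm):
            term *= M[row][col]
        total += term
    return total

def vandermonde(x):
    """Rows (1, x_i, x_i^2, ..., x_i^(n-1)), as in Eq. C.2."""
    n = len(x)
    return [[xi ** j for j in range(n)] for xi in x]

def difference_product(x):
    """Product of (x_k - x_j) over all index pairs j < k."""
    product = 1
    for k in range(1, len(x)):
        for j in range(k):
            product *= x[k] - x[j]
    return product

x = (2, 5, -1, 3)
x_swapped = (5, 2, -1, 3)   # one transposition of the argument list

det_x = determinant(vandermonde(x))
det_swapped = determinant(vandermonde(x_swapped))
parities = [sign(p) for p in permutations(range(4))]
```

A single transposition of the arguments flips the sign of the determinant, and for $n = 4$ the permutations split evenly into 12 odd and 12 even.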


As pointed out in the concluding paragraph of Section C.2, the polynomial function (defined by the product in Eq. C.4) which represents the value of an order $n$ Vandermonde determinant alternates sign with each exchange of two variables in its argument list. Thus, it is an alternating polynomial. In fact, the polynomial function which represents a Vandermonde determinant is an elementary alternating polynomial in the sense defined by the following theorem, proof of which is provided in [Aitk54] and [Muir60].

Theorem C.1: If $\gamma$ is an alternating polynomial in the ordered list of $n$ variables $x_k$, $1 \le k \le n$, then $\gamma$ can be written as

$$\gamma(x_1, x_2, \ldots, x_n) = \psi(x_1, x_2, \ldots, x_n) \times \prod_{k=2,\; j<k}^{k=n} (x_k - x_j) = \psi(\mathbf{x}) \times |V_n(\mathbf{x})|$$

where $\psi$ is a symmetric polynomial.

C.4 Quasi-Symmetric (and Quasi-Alternating) Polynomials

The definitions of symmetric and alternating polynomials supplied in Section C.3 require that the relevant properties (Eq. C.5 and Eq. C.7) apply for all $n!$ permutations of the integers $(1, 2, \ldots, n)$. This section generalizes those notions to multivariate analogs, suitable for the discussion presented in Section 9.5. The generalization amounts to restricting the applicability of Eq. C.5 and Eq. C.7 to a subset of the $n!$ permutations of $(1, 2, \ldots, n)$. The resulting polynomial classes are referred to here as quasi-symmetric and quasi-alternating polynomials respectively.

Let $\psi$ be a polynomial function of $n = mk$ scalar variables $x_{ij}$, $1 \le i \le m$, $1 \le j \le k$, and let $\psi$ be denoted $\psi(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m)$ where $\mathbf{x}_i$ is a $k$-component vector composed from the scalars $x_{ij}$, $1 \le j \le k$. Then, $\psi$ is quasi-symmetric if

$$\psi(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m) = \psi(\mathbf{x}_{\alpha_1}, \mathbf{x}_{\alpha_2}, \ldots, \mathbf{x}_{\alpha_m}) \tag{C.9}$$


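for all permutations $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_m)$ of the integers $(1, 2, \ldots, m)$. The set of all $m!$ such permutations can be placed in one-to-one correspondence with a subset of the set of all $n! = (mk)!$ permutations of $(1, 2, \ldots, n)$.

If the quasi-symmetric polynomial $\psi(\mathbf{x}) = \psi(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m)$ includes the monomial

$$a \left( x_{11}^{p_{11}} x_{12}^{p_{12}} \cdots x_{1k}^{p_{1k}} \right) \left( x_{21}^{p_{21}} x_{22}^{p_{22}} \cdots x_{2k}^{p_{2k}} \right) \cdots \left( x_{m1}^{p_{m1}} x_{m2}^{p_{m2}} \cdots x_{mk}^{p_{mk}} \right)$$

among its terms, then it includes also the monomial

$$a \left( x_{\alpha_1 1}^{p_{11}} x_{\alpha_1 2}^{p_{12}} \cdots x_{\alpha_1 k}^{p_{1k}} \right) \left( x_{\alpha_2 1}^{p_{21}} x_{\alpha_2 2}^{p_{22}} \cdots x_{\alpha_2 k}^{p_{2k}} \right) \cdots \left( x_{\alpha_m 1}^{p_{m1}} x_{\alpha_m 2}^{p_{m2}} \cdots x_{\alpha_m k}^{p_{mk}} \right)$$

where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_m)$ is an arbitrary permutation of the integers $(1, 2, \ldots, m)$. If for a given

$$p = (p_1, p_2, \ldots, p_m) = (p_{11}, p_{12}, \ldots, p_{1k}, p_{21}, p_{22}, \ldots, p_{2k}, \ldots, p_{m1}, p_{m2}, \ldots, p_{mk})$$

the sum of all distinct monomials of this form is designated by

$$\phi_p(\mathbf{x}) = \sum_{\alpha} \left( x_{\alpha_1 1}^{p_{11}} x_{\alpha_1 2}^{p_{12}} \cdots x_{\alpha_1 k}^{p_{1k}} \right) \left( x_{\alpha_2 1}^{p_{21}} x_{\alpha_2 2}^{p_{22}} \cdots x_{\alpha_2 k}^{p_{2k}} \right) \cdots \left( x_{\alpha_m 1}^{p_{m1}} x_{\alpha_m 2}^{p_{m2}} \cdots x_{\alpha_m k}^{p_{mk}} \right), \tag{C.10}$$

then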

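If the quasi-alternating polynomial $\gamma(\mathbf{x}) = \gamma(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m)$ includes the monomial

$$a \left( x_{11}^{p_{11}} x_{12}^{p_{12}} \cdots x_{1k}^{p_{1k}} \right) \left( x_{21}^{p_{21}} x_{22}^{p_{22}} \cdots x_{2k}^{p_{2k}} \right) \cdots \left( x_{m1}^{p_{m1}} x_{m2}^{p_{m2}} \cdots x_{mk}^{p_{mk}} \right)$$

among its terms, then it includes also the monomial

$$(-1)^{s(\alpha)}\, a \left( x_{\alpha_1 1}^{p_{11}} x_{\alpha_1 2}^{p_{12}} \cdots x_{\alpha_1 k}^{p_{1k}} \right) \left( x_{\alpha_2 1}^{p_{21}} x_{\alpha_2 2}^{p_{22}} \cdots x_{\alpha_2 k}^{p_{2k}} \right) \cdots \left( x_{\alpha_m 1}^{p_{m1}} x_{\alpha_m 2}^{p_{m2}} \cdots x_{\alpha_m k}^{p_{mk}} \right)$$

where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_m)$ is an arbitrary permutation of the integers $(1, 2, \ldots, m)$ and where $s(\alpha)$ is the number of transpositions in the permutation $\alpha$. If for a given

$$p = (p_1, p_2, \ldots, p_m) = (p_{11}, p_{12}, \ldots, p_{1k}, p_{21}, p_{22}, \ldots, p_{2k}, \ldots, p_{m1}, p_{m2}, \ldots, p_{mk})$$

the sum of all distinct monomials of this form is designated by

$$\rho_p(\mathbf{x}) = \sum_{\alpha} (-1)^{s(\alpha)} \left( x_{\alpha_1 1}^{p_{11}} x_{\alpha_1 2}^{p_{12}} \cdots x_{\alpha_1 k}^{p_{1k}} \right) \left( x_{\alpha_2 1}^{p_{21}} x_{\alpha_2 2}^{p_{22}} \cdots x_{\alpha_2 k}^{p_{2k}} \right) \cdots \left( x_{\alpha_m 1}^{p_{m1}} x_{\alpha_m 2}^{p_{m2}} \cdots x_{\alpha_m k}^{p_{mk}} \right), \tag{C.12}$$

then $\rho_p(\mathbf{x})$ is quasi-alternating, and further, the arbitrary quasi-alternating polynomial $\gamma(\mathbf{x})$ can be written as a linear combination of a finite number of such polynomials. That is,

$$\gamma(\mathbf{x}) = \sum_{p} a_p\, \rho_p(\mathbf{x}).$$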
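A small worked case may help fix the idea. The example below is an illustration chosen here, not one taken from the text: it assumes $m = 2$ vectors of $k = 2$ components each and the exponent list $p = (p_1, p_2)$ with $p_1 = (1, 1)$ and $p_2 = (0, 0)$. The two permutations of $(1, 2)$ then contribute one monomial each, with opposite signs.

```latex
\rho_p(\mathbf{x}_1, \mathbf{x}_2) = x_{11} x_{12} - x_{21} x_{22},
\qquad
\rho_p(\mathbf{x}_2, \mathbf{x}_1) = x_{21} x_{22} - x_{11} x_{12}
                                   = -\rho_p(\mathbf{x}_1, \mathbf{x}_2).
```

Exchanging the two vectors changes the sign, so this polynomial is quasi-alternating. It is not alternating in the sense of Section C.3, however: exchanging the scalars $x_{12}$ and $x_{22}$, a transposition of the $n = 4$ scalars that does not correspond to a permutation of the vectors, gives $x_{11} x_{22} - x_{21} x_{12}$, which is neither $\rho_p$ nor $-\rho_p$.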


APPENDIX D
COMPUTER LISTINGS

D.1 Introduction

This appendix includes listings of the computer programs used to generate the simulation data presented in Section 5. These programs were developed on the Cray Y-MP at Eglin AFB, Fla. The programs are all written in Fortran, and in some cases they employ Cray extensions to the Fortran standard. The listings are separated into two subsections, one including main program listings and a second including the contents of a subprogram library accessed by the main programs. The library procedures section also includes a library table of contents.

D.2 Main Program Listings

      PROGRAM GET_NPS
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Name: GET_NPS
C
C Purpose: Compute (via binomial coefficient) and output
C          the cardinality of the indicated S' and S'' sets
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Declare local variables
C
      INTEGER M, L, NN, NP, NPP
      DOUBLE PRECISION FNP, FNPP
C
C Loop over M = 1 to M = 8
C
      DO 2 M = 1, 8
      WRITE( 6, * ) 'M = ', M
C
C Get the answers and write them to stdout
C
      DO 1 L = 1, 8
      NN = 2**L + M
      FNP = 1.0
      FNPP = 1.0


      DO 4 J = 0, M - 1
      FNP = FNP*FLOAT( NN - 1 - J )/FLOAT( M - J )
    4 FNPP = FNPP*FLOAT( NN - J )/FLOAT( M - J )
      NP = 0.5 + FNP
      NPP = 0.5 + FNPP
    1 WRITE( 6, 3 ) L, 2**L, NP, NPP
    2 WRITE( 6, * )
    3 FORMAT( 5H L = , I4, 5H N = , I8,
     &        6H NP = , I22, 7H NPP = , I22 )
C
C Finished
C
      END
      PROGRAM GET_SPS
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C Name: GET_SPS
C
C Purpose: Compute and output the indicated S' sets
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C Declare local variables
C
      INTEGER M, L, NP, LM, LL, LN, J
      INTEGER SP( 4*5984 ), NBAR( 16 )
C
C Prompt for and read ranges
C
      WRITE( 6, * ) ' Maximum M? '
      READ( 5, '(I8)' ) M
      WRITE( 6, * ) ' Maximum L? '
      READ( 5, '(I8)' ) L
C
C Get the answers and write them to stdout
C
      DO 3 LM = 1, M
      DO 2 LL = 1, L
      CALL GET_SP( LM, LL, SP, NP )
      WRITE( 6, * )
      WRITE( 6, 4 ) LM, LL, 2**LL, NP
      DO 1 LN = 1, NP
      CALL UNPACK_NBARP( LM, LL, SP( ( LN - 1 )*LM + 1 ), NBAR )
    1 WRITE( 6, 5 ) ( NBAR( J ), J = 1, 2**LL )
    2 CONTINUE
    3 CONTINUE
C
C NOTE: FORMAT statements 4 and 5 are illegible in the source
C       listing. The forms below are assumed by analogy with
C       GET_NPS and with the '(16I2)' format used in GET_P2INS.
C
    4 FORMAT( 5H M = , I4, 5H L = , I4, 5H N = , I8,
     &        6H NP = , I22 )
    5 FORMAT( 16I2 )
C
C Finished
C
      END
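The running product in GET_NPS evaluates two binomial coefficients: NP is $\binom{2^L + M - 1}{M}$ and NPP is $\binom{2^L + M}{M}$. The sketch below restates that arithmetic in Python as a cross-check; it is not part of the original listings, and the function names are hypothetical. It also enumerates the nondecreasing $M$-tuples of indices in $1, \ldots, 2^L$, which is the packed form that GET_SP stores, to confirm that the count agrees with the closed form.

```python
from itertools import combinations_with_replacement
from math import comb


def card_np(m, l):
    """Closed form of the value GET_NPS prints as NP: C(2**l + m - 1, m)."""
    return comb(2**l + m - 1, m)


def card_npp(m, l):
    """Closed form of the value GET_NPS prints as NPP: C(2**l + m, m)."""
    return comb(2**l + m, m)


def enumerate_packed(m, l):
    """All nondecreasing m-tuples of indices in 1..2**l (packed populations)."""
    return list(combinations_with_replacement(range(1, 2**l + 1), m))


# The PARAMETER values hard-coded in the main programs agree with the closed form.
print(card_np(6, 4))   # 54264, the NP used with M = 6, L = 4
print(card_np(4, 5))   # 52360, the NP used with M = 4, L = 5
print(len(enumerate_packed(2, 2)), card_np(2, 2))  # 10 10
```

Using exact integer arithmetic avoids the `0.5 + FNP` rounding step that the floating-point product in the listing needs.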



     &    R( I ) = R( I ) + WEIGHT
    4 CONTINUE
    5 CONTINUE
C
C Open the output data file and write R
C
      OPEN( 1, FILE='RDATA', STATUS='NEW' )
      WRITE( 1, '(8F10.6)' ) ( R( J ), J = 0, 2**L - 1 )
C
C Finished
C
      END
      PROGRAM GET_P2INS
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Name: GET_P2INS
C
C Purpose: Compute and return the indicated conditional
C          probability arrays
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Declare problem defining parameters
C
      INTEGER M, L, NP, NALPHA
      PARAMETER ( M = 6, L = 4, NP = 54264, NALPHA = 10 )
      INTEGER SP( M, NP )
      REAL R( 0:2**L - 1 ), P2IN( 0:2**L - 1 )
      INTEGER NCOUNTS( 4 )
      DATA NCOUNTS/1,3096,3100,54181/
      CHARACTER*8 P2INDATA
      DATA P2INDATA/'P2INDATA'/
C
C Declare local variables
C
      INTEGER NBAR( 0:15 )
      REAL ALPHA
C
C Get the objective function values
C
      OPEN( 1, FILE='RDATA', STATUS='OLD' )
      READ( 1, '(8F10.6)' ) ( R( I ), I = 0, 2**L - 1 )
      CLOSE( 1 )
C
C Open the output data file and write the summary data
C
      OPEN( 1, FILE=P2INDATA, STATUS='NEW' )
      WRITE( 1, '(4I8)' ) M, L, NP, NALPHA
      WRITE( 1, '(8F10.6)' ) R
      WRITE( 1, * )
C
C Build the state space array
C
      CALL GET_SP( M, L, SP )


C
C Generate the required P2IN data
C
      DO 2 NCOUNT = 1, 4
      CALL UNPACK_NBARP( M, L, SP( 1, NCOUNTS( NCOUNT ) ), NBAR )
      WRITE( 1, '(16I2)' ) NBAR
      DO 1 I = 0, NALPHA - 1
      ALPHA = 1.0/2**I
      WRITE( 1, '(F10.6)' ) ALPHA
      CALL GET_P2IN( M, L, SP( 1, NCOUNTS( NCOUNT ) ),
     &               ALPHA, R, P2IN )
      WRITE( 1, '(8F10.6)' ) P2IN
    1 WRITE( 1, * )
    2 CONTINUE
C
C Finished
C
      END
      PROGRAM GET_P3INS
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Name: GET_P3INS
C
C Purpose: Compute and return the indicated conditional
C          probability arrays
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Declare problem defining parameters
C
      INTEGER M, L, NP, NALPHA
      PARAMETER ( M = 6, L = 4, NP = 54264, NALPHA = 10 )
      INTEGER SP( M, NP )
      REAL R( 0:2**L - 1 ), P3IN( 0:2**L - 1 )
      INTEGER NCOUNTS( 4 )
      DATA NCOUNTS/1,3096,3100,54181/
      CHARACTER*8 P3INDATA
      DATA P3INDATA/'P3INDATA'/
C
C Declare local variables
C
      INTEGER NBAR( 0:15 )
      REAL ALPHA
C
C Get the objective function values
C
      OPEN( 1, FILE='RDATA', STATUS='OLD' )
      READ( 1, '(8F10.6)' ) ( R( I ), I = 0, 2**L - 1 )
      CLOSE( 1 )
C
C Open the output data file and write the summary data
C
      OPEN( 1, FILE=P3INDATA, STATUS='NEW' )
      WRITE( 1, '(4I8)' ) M, L, NP, NALPHA


      WRITE( 1, '(8F10.6)' ) R
      WRITE( 1, * )
C
C Build the state space array
C
      CALL GET_SP( M, L, SP )
C
C Generate the required P3IN data
C
      DO 2 NCOUNT = 1, 4
      CALL UNPACK_NBARP( M, L, SP( 1, NCOUNTS( NCOUNT ) ), NBAR )
      WRITE( 1, '(16I2)' ) NBAR
      DO 1 I = 0, NALPHA - 1
      ALPHA = 1.0/2**I
      WRITE( 1, '(F10.6)' ) ALPHA
      CALL GET_P3IN( M, L, SP( 1, NCOUNTS( NCOUNT ) ),
     &               ALPHA, R, P3IN )
      WRITE( 1, '(8F10.6)' ) P3IN
    1 WRITE( 1, * )
    2 CONTINUE
C
C Finished
C
      END
      PROGRAM GET_3STAT
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Name: GET_3STAT
C
C Purpose: Compute the indicated three-operator stationary
C          distribution
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Declare defining parameters and output file name
C
      INTEGER M, L, NP
      PARAMETER ( M = 4, L = 5, NP = 52360 )
      REAL ALPHA, ALPHA0, R( 0:2**L - 1 ), TDELTA
      INTEGER NALPHA
      PARAMETER ( ALPHA0 = 1.0/2**20, NALPHA = 1, TDELTA = 0.004 )
      CHARACTER*10 TESTFILE
      DATA TESTFILE/'3TEST45'/
C
C Declare state space associated arrays
C
      INTEGER SP( M, NP ), SPA( 0:2**L - 1 ), MULTI( NP )
      REAL QBAR( NP, 0:1 )
C
C Declare some local variables
C
      REAL LDELTA
      INTEGER TOGGLE, LOOP_COUNT


C
C Get the objective function data
C
      OPEN( 1, FILE='RDATA4', STATUS='OLD' )
      READ( 1, '(8F10.6)' ) ( R( J ), J = 0, 2**L - 1 )
      CLOSE( 1 )
C
C Open the output data file and write the summary output data
C
      OPEN( 1, FILE=TESTFILE, STATUS='NEW', FORM='UNFORMATTED' )
      WRITE( 1 ) M, L, NP
      WRITE( 1 ) ALPHA0, NALPHA, TDELTA
      WRITE( 1 ) R
C
C Generate and store the state space set, S'
C
      CALL GET_SP( M, L, SP )
C
C Next, get the indices of the absorbing states in S'
C
      CALL GET_SPA( M, L, NP, SP, SPA )
C
C Now get the associated multinomial coefficient array
C
      CALL GET_MULTI( M, L, NP, SP, MULTI )
C
C Compute for each required ALPHA
C
      DO 3 K = 0, NALPHA - 1
C
C Initialize ALPHA and QBAR
C
      ALPHA = ALPHA0/2**K
      CALL INIT_QBAR( M, L, NP, MULTI, QBAR )
C
C Loop until the tolerance parameter is met
C
      TOGGLE = 0
      LOOP_COUNT = 0
      LDELTA = 1.0
      DO 1 WHILE ( LDELTA .GT. TDELTA )
      CALL GET_3QBAR( M, L, NP, ALPHA, R, MULTI,
     &                SP, SPA, QBAR( 1, TOGGLE ),
     &                QBAR( 1, MOD( TOGGLE + 1, 2 ) ),
     &                LDELTA )
      TOGGLE = MOD( TOGGLE + 1, 2 )
      LOOP_COUNT = LOOP_COUNT + 1
      WRITE( 6, * ) LOOP_COUNT, TDELTA, LDELTA
    1 CONTINUE
C
C Output the termination information and the final vector
C
      WRITE( 1 ) ALPHA, LOOP_COUNT - 1, LDELTA
      DO 2 I = 1, NP/40
    2 WRITE( 1 ) ( QBAR( 40*( I - 1 ) + J, TOGGLE ), J = 1, 40 )
      WRITE( 1 ) ( QBAR( J, TOGGLE ), J = 40*( NP/40 ) + 1, NP )


    3 CONTINUE
C
C Last write the absorbing state vector values
C
      WRITE( 1 ) ( QBAR( SPA( I ), TOGGLE ), I = 0, 2**L - 1 )
C
C Finished
C
      END

D.3 Library Listings

stat.o:GET_SP
stat.o:GET_SPA
stat.o:GET_MULTI
stat.o:INIT_QBAR
stat.o:GET_3QBAR
stat.o:INIT_NBARP
stat.o:GET_NBARP
stat.o:GET_NFAC
stat.o:GET_P3MN
stat.o:GET_P3IN
stat.o:GET_P1IN
stat.o:UNPACK_NBARP

      SUBROUTINE GET_SP( M, L, SP, NP )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Name: GET_SP
C
C Purpose: Generate S'
C
C Note: The fourth argument (NP) is optional
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Declare calling arguments
C
      INTEGER M, L, SP( M, 0:* ), NP
C
C Declare local variables
C
      INTEGER NCOUNT, NBARP( M ), I, NARG
      LOGICAL LSTAT
C
C Initialize NBARP
C
      CALL INIT_NBARP( M, NBARP )
C
C Loop until S' is complete
C
      NCOUNT = 0
      LSTAT = .TRUE.
      DO 2 WHILE( LSTAT )


C
C Set this element in S'
C
      DO 1 I = 1, M
    1 SP( I, NCOUNT ) = NBARP( I )
C
C Get the next one and increment the counter
C
      CALL GET_NBARP( M, L, NBARP, LSTAT )
    2 NCOUNT = NCOUNT + 1
C
C Test argument count to determine whether to set NP
C
      NARG = NUMARG()
      IF( NARG .EQ. 4 ) NP = NCOUNT
C
C Return to caller
C
      RETURN
      END
      SUBROUTINE GET_SPA( M, L, NP, SP, SPA )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Name: GET_SPA
C
C Purpose: Generate a table of absorbing state indices in S'
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Declare calling arguments
C
      INTEGER M, L, NP, SP( M, NP ), SPA( 0:2**L - 1 )
C
C Declare some local variables
C
      INTEGER I, J, K, JSTART
C
C Initialize
C
      JSTART = 1
C
C Loop over all I in S
C
      DO 3 I = 1, 2**L
C
C Loop over all J in S'
C
      DO 2 J = JSTART, NP
C
C Test SP until an exit condition is satisfied
C
      DO 1 K = 1, M
      IF( SP( K, J ) .NE. I ) GO TO 2
    1 CONTINUE
C


C Exhausting M signals an absorbing state
C Assign it and go after the next one
C
      SPA( I - 1 ) = J
      JSTART = J + 1
      GO TO 3
C
C Exit to label 2 means that this J is not an
C absorbing state
C
    2 CONTINUE
    3 CONTINUE
C
C Return to caller
C
      RETURN
      END
      SUBROUTINE GET_MULTI( M, L, NP, SP, MULTI )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Name: GET_MULTI
C
C Purpose: Get the multinomial coefficient table for the
C          supplied S'
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Declare calling arguments
C
      INTEGER M, L, NP, SP( M, NP ), MULTI( NP )
C
C Declare local variables and function reference
C
      INTEGER MFAC, NBAR( 0:2**L - 1 ), NCOUNT
      INTEGER I, GET_NFAC
C
C Compute and save M!
C
      MFAC = GET_NFAC( M )
C
C Loop over all vectors in S'
C
      DO 2 NCOUNT = 1, NP
C
C Set MULTI( NCOUNT ) = M!
C
      MULTI( NCOUNT ) = MFAC
C
C Now unpack this NBARP
C
      CALL UNPACK_NBARP( M, L, SP( 1, NCOUNT ), NBAR )
C
C Loop over the denominator factorials
C


      DO 1 I = 0, 2**L - 1
    1 MULTI( NCOUNT ) = MULTI( NCOUNT )/GET_NFAC( NBAR( I ) )
C
C Close the loop
C
    2 CONTINUE
C
C Return to caller
C
      RETURN
      END
      SUBROUTINE INIT_QBAR( M, L, NP, MULTI, QBAR )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Name: INIT_QBAR
C
C Purpose: Initialize the QBAR probability vectors
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Declare calling arguments
C
      INTEGER M, L, NP, MULTI( NP )
      REAL QBAR( NP, 0:1 )
C
C Declare a local variable
C
      REAL FRACTION
C
C Set QBAR1 to its ALPHA = 1 value and zero QBAR2
C
      FRACTION = 1.0/( 2**( M*L ) )
      DO 1 I = 1, NP
      QBAR( I, 0 ) = MULTI( I )*FRACTION
    1 QBAR( I, 1 ) = 0.0
C
C Return to caller
C
      RETURN
      END
      SUBROUTINE GET_3QBAR( M, L, NP, ALPHA, R, MULTI,
     &                      SP, SPA, QBAR0, QBAR1, LDELTA )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Name: GET_3QBAR
C
C Purpose: Compute and return the indicated three-operator
C          stationary distribution vector transformation
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Declare calling arguments
C


      INTEGER M, L, NP, MULTI( NP ), SP( M, NP ), SPA( 0:2**L - 1 )
      REAL ALPHA, R( 0:2**L - 1 )
      REAL QBAR0( NP ), QBAR1( NP ), LDELTA
C
C Declare local variables
C
      INTEGER MCOUNT, NCOUNT
      REAL SIGMA, P3MN( NP )
C
C Clear QBAR1
C
      DO 1 MCOUNT = 1, NP
    1 QBAR1( MCOUNT ) = 0.0
C
C Loop over all states in S'
C
      DO 3 NCOUNT = 1, NP
C
C Get P3MN for this NBAR
C
      CALL GET_P3MN( M, L, NP, ALPHA, R, MULTI, NCOUNT, SP, P3MN )
C
C Accumulate them
C
      DO 2 MCOUNT = 1, NP
    2 QBAR1( MCOUNT ) = QBAR1( MCOUNT ) +
     &                  QBAR0( NCOUNT ) * P3MN( MCOUNT )
C
C Close the NBAR loop
C
    3 CONTINUE
C
C Normalize
C
      SIGMA = 0.0
      DO 4 MCOUNT = 1, NP
    4 SIGMA = SIGMA + QBAR1( MCOUNT )
      DO 5 MCOUNT = 1, NP
    5 QBAR1( MCOUNT ) = QBAR1( MCOUNT )/SIGMA
C
C Reset LDELTA
C
      SIGMA = 0.0
      DO 6 I = 0, 2**L - 1
    6 SIGMA = SIGMA + QBAR1( SPA( I ) )
      LDELTA = 1.0 - SIGMA
C
C Return to caller
C
      RETURN
      END
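GET_3QBAR applies one step of the map $q \leftarrow qP$ followed by renormalization, and GET_3STAT repeats that step while toggling between two copies of the vector. The sketch below shows the same repeated-multiplication idea on a two-state chain. It is an illustration only: the names and the transition matrix are hypothetical, and the stopping rule is a swapped-in one. The listing stops when LDELTA, one minus the probability mass on the states it calls absorbing (populations whose $M$ members are identical), falls to TDELTA, whereas this sketch stops when the L1 change between successive vectors falls below a tolerance.

```python
def step(q, P):
    """One update q <- q P, renormalized to sum to one (as GET_3QBAR does)."""
    n = len(q)
    new = [sum(q[i] * P[i][j] for i in range(n)) for j in range(n)]
    total = sum(new)
    return [v / total for v in new]


def stationary(P, tol=1e-12, max_iter=10_000):
    """Iterate from the uniform vector until successive iterates agree (assumed rule)."""
    n = len(P)
    q = [1.0 / n] * n
    for count in range(1, max_iter + 1):
        new = step(q, P)
        change = sum(abs(a - b) for a, b in zip(new, q))
        q = new
        if change < tol:
            return q, count
    return q, max_iter


# Hypothetical two-state chain; its stationary distribution is (5/6, 1/6).
P = [[0.9, 0.1],
     [0.5, 0.5]]
q, iterations = stationary(P)
print(q, iterations)
```

The listing never forms the full NP by NP matrix. GET_P3MN regenerates one row at a time, which trades computation for memory when NP is in the tens of thousands.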


      SUBROUTINE INIT_NBARP( M, NBARP )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Name: INIT_NBARP
C
C Purpose: Initialize NBARP to the starting element of S'
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Declare calling arguments
C
      INTEGER M, NBARP( M )
C
C Set all pointers in the NBARP array to cell 1
C
      DO 1 I = 1, M
    1 NBARP( I ) = 1
C
C Return to caller
C
      RETURN
      END
      SUBROUTINE GET_NBARP( M, L, NBARP, NSTAT )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Name: GET_NBARP
C
C Purpose: Generate the successor element in S'
C
C Note: (1) This procedure assumes that M and L are in range
C           and that the caller supplied NBARP is valid
C
C       (2) The fourth argument (NSTAT) is optional
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Declare calling arguments
C
      INTEGER M, L, NBARP( M )
      LOGICAL NSTAT
C
C Declare local variables
C
      INTEGER IMAX, I, J, NARG
      LOGICAL LSTAT
C
C Set maximum index
C
      IMAX = 2**L
C
C Process most frequent transition
C
      IF( NBARP( M ) .LT. IMAX ) THEN
      NBARP( M ) = NBARP( M ) + 1


      LSTAT = .TRUE.
C
C Next most frequent
C
      ELSE IF( NBARP( 1 ) .LT. IMAX ) THEN
      DO 1 I = M - 1, 1, -1
      IF( NBARP( I ) .LT. IMAX ) GO TO 2
    1 CONTINUE
    2 NBARP( I ) = NBARP( I ) + 1
      DO 3 J = I + 1, M
    3 NBARP( J ) = NBARP( I )
      LSTAT = .TRUE.
C
C Anything else is terminal
C
      ELSE
      LSTAT = .FALSE.
      END IF
C
C Test argument count to determine whether to set NSTAT
C
      NARG = NUMARG()
      IF( NARG .EQ. 4 ) NSTAT = LSTAT
C
C Return to caller
C
      RETURN
      END
      INTEGER FUNCTION GET_NFAC( N )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Name: GET_NFAC
C
C Purpose: Compute and return N!
C
C Note: This procedure assumes that N is a nonnegative integer
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Declare calling argument
C
      INTEGER N
C
C If N = 0 or N = 1, then N! = 1
C
      IF( N .LE. 1 ) THEN
      GET_NFAC = 1
C
C Otherwise, recurse
C
      ELSE
      GET_NFAC = N * GET_NFAC( N - 1 )
      END IF


C
C Return to caller
C
      RETURN
      END
      SUBROUTINE GET_P3MN( M, L, NP, ALPHA, R, MULTI,
     &                     NCOUNT, SP, P3MN )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Name: GET_P3MN
C
C Purpose: Compute and return the indicated conditional
C          probability array
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Declare calling arguments
C
      INTEGER M, L, NP, MULTI( NP ), NCOUNT, SP( M, NP )
      REAL ALPHA, R( 0:2**L - 1 ), P3IN( 0:2**L - 1 ), P3MN( NP )
C
C Declare local variables
C
      INTEGER MCOUNT, K
C
C Get P3IN for this NBARP
C
      CALL GET_P3IN( M, L, SP( 1, NCOUNT ), ALPHA, R, P3IN )
C
C Initialize the P3MN vector
C
      DO 1 MCOUNT = 1, NP
    1 P3MN( MCOUNT ) = MULTI( MCOUNT )
C
C Loop over the solutions represented in this NBARP
C
      DO 3 J = 1, M
      DO 2 MCOUNT = 1, NP
      K = SP( J, MCOUNT ) - 1
    2 P3MN( MCOUNT ) = P3MN( MCOUNT )*P3IN( K )
    3 CONTINUE
C
C Return to caller
C
      RETURN
      END
      SUBROUTINE GET_P3IN( M, L, NBARP, ALPHA, R, P3IN )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Name: GET_P3IN
C
C Purpose: Compute and return the indicated conditional
C          probability array
C


CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Declare calling arguments
C
      INTEGER M, L, NBARP( M )
      REAL ALPHA, R( 0:2**L - 1 ), P3IN( 0:2**L - 1 )
C
C Declare local variables
C
      INTEGER I, J, K, IMAX, INDX
      REAL P1IN( 0:2**L - 1 ), P3INP( 0:2**L - 1 )
      REAL P2, SUM
C
C Set maximum loop index
C
      IMAX = 2**L - 1
C
C Clear the P3IN and P3INP arrays
C
      DO 1 I = 0, IMAX
      P3IN( I ) = 0.0
    1 P3INP( I ) = 0.0
C
C Get the P1IN array
C
      CALL GET_P1IN( M, L, NBARP, R, P1IN )
C
C Build the P3INP array
C
      DO 4 I = 0, IMAX
      DO 3 J = 0, IMAX
      P2 = P1IN( I )*P1IN( J )
      DO 2 K = 0, L
      INDX = I
      CALL MVBITS( J, 0, K, INDX, 0 )
    2 P3INP( INDX ) = P3INP( INDX ) + P2
    3 CONTINUE
    4 CONTINUE
C
C Perturb with mutation
C
      DO 6 J = 0, IMAX
      IF( P3INP( J ) .NE. 0.0 ) THEN
      DO 5 I = 0, IMAX
    5 P3IN( I ) = P3IN( I ) + P3INP( J )*
     &            ALPHA**POPCNT( XOR( I, J ) )
      END IF
    6 CONTINUE
C
C Now normalize them
C
      SUM = 0.0
      DO 7 I = 0, IMAX
    7 SUM = SUM + P3IN( I )
      DO 8 I = 0, IMAX


    8 P3IN( I ) = P3IN( I )/SUM
C
C Return to caller
C
      RETURN
      END
      SUBROUTINE GET_P1IN( M, L, NBARP, R, P1IN )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Name: GET_P1IN
C
C Purpose: Compute and return the indicated conditional
C          probability array
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Declare calling arguments
C
      INTEGER M, L, NBARP( M )
      REAL R( 0:2**L - 1 ), P1IN( 0:2**L - 1 )
C
C Declare local variables
C
      INTEGER I, J, K, IMAX
      REAL SUM
C
C Set maximum loop index
C
      IMAX = 2**L - 1
C
C Clear the P1IN vector
C
      DO 1 I = 0, IMAX
    1 P1IN( I ) = 0.0
C
C Compute the numerators and accumulate the denominator
C
      SUM = 0.0
      DO 2 J = 1, M
      K = NBARP( J ) - 1
      P1IN( K ) = P1IN( K ) + R( K )
    2 SUM = SUM + R( K )
C
C Now normalize them
C
      DO 3 I = 0, IMAX
    3 P1IN( I ) = P1IN( I )/SUM
C
C Return to caller
C
      RETURN
      END
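The two routines above can be paraphrased compactly. The sketch below is a hedged Python restatement, not the author's code: the function names are hypothetical, the MVBITS call is replaced by an equivalent mask expression, and POPCNT( XOR( I, J ) ) is replaced by a count of the one bits of `i ^ j`. GET_P1IN weights each solution by its objective value times its multiplicity in the population. GET_P3IN then spreads each ordered pair of parents over the $L + 1$ children obtained by copying the low $K$ bits of the second parent into the first, and finally weights each Hamming distance $d$ by ALPHA**d before normalizing.

```python
def p1in(nbarp, r):
    """Selection probabilities; nbarp holds 1-based solution indices, r the objective values."""
    p = [0.0] * len(r)
    for idx in nbarp:
        p[idx - 1] += r[idx - 1]
    total = sum(p)
    return [v / total for v in p]


def p3in(nbarp, alpha, r, l):
    """Selection, then one-point crossover, then mutation weighting, as in GET_P3IN."""
    size = 2**l
    p1 = p1in(nbarp, r)
    crossed = [0.0] * size
    for i in range(size):
        for j in range(size):
            for k in range(l + 1):
                mask = (1 << k) - 1              # low k bits
                child = (i & ~mask) | (j & mask)  # same effect as MVBITS( J, 0, K, INDX, 0 )
                crossed[child] += p1[i] * p1[j]
    out = [
        sum(crossed[j] * alpha ** bin(i ^ j).count("1") for j in range(size))
        for i in range(size)
    ]
    total = sum(out)
    return [v / total for v in out]


# Hypothetical example: L = 2 (four solutions), population {1, 1, 4}.
r = [1.0, 2.0, 3.0, 4.0]
print(p1in([1, 1, 4], r))
print(p3in([1, 1, 4], 0.5, r, 2))
```

Because the result is normalized at the end, ALPHA acts as a relative weight per flipped bit rather than as a probability in its own right. With ALPHA = 1 every child is equally likely, and with ALPHA = 0 only the crossover distribution remains.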


      SUBROUTINE UNPACK_NBARP( M, L, NBARP, NBAR )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Name: UNPACK_NBARP
C
C Purpose: Generate displayable version of packed NBAR
C
C Note: This procedure assumes that M and L are in range
C       and that the caller supplied NBARP is valid
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C Declare calling arguments
C
      INTEGER M, L, NBARP( M ), NBAR( 0:2**L - 1 )
C
C Declare local variables
C
      INTEGER I
C
C Clear the NBAR vector
C
      DO 1 I = 0, 2**L - 1
    1 NBAR( I ) = 0
C
C Now set the nonzero components
C
      DO 2 I = 1, M
    2 NBAR( NBARP( I ) - 1 ) = NBAR( NBARP( I ) - 1 ) + 1
C
C Return to caller
C
      RETURN
      END
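The library stores each population in packed form, as a nondecreasing list of $M$ solution indices in $1, \ldots, 2^L$, and converts to an occupancy vector only when needed. The sketch below restates three of the library routines in Python for illustration. The names are hypothetical, and the successor function merges the two branches of GET_NBARP into a single right-to-left scan, which produces the same sequence.

```python
from math import factorial


def next_packed(nbarp, l):
    """Successor of a nondecreasing index list (cf. GET_NBARP), or None after the last."""
    imax = 2**l
    nxt = list(nbarp)
    for i in range(len(nxt) - 1, -1, -1):
        if nxt[i] < imax:
            nxt[i] += 1
            for j in range(i + 1, len(nxt)):
                nxt[j] = nxt[i]  # reset the tail to keep the list nondecreasing
            return nxt
    return None


def unpack(nbarp, l):
    """Occupancy vector: how many population members equal each solution (cf. UNPACK_NBARP)."""
    nbar = [0] * 2**l
    for idx in nbarp:
        nbar[idx - 1] += 1
    return nbar


def multinomial(nbar):
    """M! divided by the product of the occupancy factorials (cf. GET_MULTI)."""
    value = factorial(sum(nbar))
    for count in nbar:
        value //= factorial(count)
    return value


# Walk the whole state space for a hypothetical M = 3, L = 2 case.
m, l = 3, 2
state, states = [1] * m, []
while state is not None:
    states.append(state)
    state = next_packed(state, l)
print(len(states))                                       # 20 packed populations
print(sum(multinomial(unpack(s, l)) for s in states))    # 64 = (2**l)**m
```

The second printed value shows why INIT_QBAR can start from MULTI scaled by $1/2^{ML}$: the multinomial coefficients over all packed populations sum to $2^{ML}$, so the scaled vector is a probability distribution.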


REFERENCES

[Aitk54] Aitken, A. C., "Determinants and Matrices," Interscience Publishers, Inc., New York, N. Y., 1954.

[Beth80] Bethke, A. D., "Genetic Algorithms as Function Optimizers," Ph.D. Dissertation, University of Michigan, Ann Arbor, Mich., 1980.

[BrGo87] Bridges, C. L. and Goldberg, D. E., "An Analysis of Reproduction and Crossover in a Binary Coded Genetic Algorithm," in Grefenstette, J. J. (Ed.), "Genetic Algorithms and Their Applications: Proceedings of the Second International Conference on Genetic Algorithms," Lawrence Erlbaum Associates, Publishers, Hillsdale, N. J., 1987, pp 9-13.

[BrGo89] Bridges, C. L. and Goldberg, D. E., "A Note on the Non-Uniform Walsh Schema Transform," TCGA Report No. 89004, Jun 1989, Dept. of Engineering Mechanics, The University of Alabama, Tuscaloosa, Ala.

[Cinl75] Cinlar, E., "Introduction to Stochastic Processes," Prentice Hall, Englewood Cliffs, N. J., 1975.

[Davi87] Davis, L. (Ed.), "Genetic Algorithms and Simulated Annealing," Morgan Kaufmann Publishers, Inc., Los Altos, Calif., 1987.

[Dejo75] De Jong, K. A., "An Analysis of the Behavior of a Class of Genetic Adaptive Systems," Ph.D. Dissertation, University of Michigan, Ann Arbor, Mich., 1975.

[GeGe84] Geman, S. and Geman, D., "Stochastic Relaxation, Gibbs Distributions and the Bayesian Restoration of Images," IEEE Trans. Patt. Anal. Mach. Intel., Vol. PAMI-6, No. 6, Nov. 1984, pp 721-741.

[Gold83] Goldberg, D. E., "Computer-Aided Gas Pipeline Operation Using Genetic Algorithms and Rule Learning," Ph.D. Dissertation, University of Michigan, Ann Arbor, Mich., 1983.

[Gold85] Goldberg, D. E., "Optimal Initial Population Size for Binary Coded Genetic Algorithms," TCGA Report No. 85001, Nov 1985, Dept. of Engineering Mechanics, The University of Alabama, Tuscaloosa, Ala.

[Gold87] Goldberg, D. E., "Simple Genetic Algorithms and the Minimal Deceptive Problem," in Davis, L. (Ed.), "Genetic Algorithms and Simulated Annealing," Morgan Kaufmann Publishers, Inc., Los Altos, Calif., 1987.

[Gold88] Goldberg, D. E., "Genetic Algorithms and Walsh Functions: Part I, A Gentle Introduction," TCGA Report No. 88006, Nov 1988, Dept. of Engineering Mechanics, The University of Alabama, Tuscaloosa, Ala.


[Gold89a] Goldberg, D., "Genetic Algorithms in Search, Optimization and Machine Learning," Addison-Wesley Publishing Company, Inc., Reading, Mass., 1989.

[Gold89b] Goldberg, D. E., "Genetic Algorithms and Walsh Functions: Part II, Deception and its Analysis," TCGA Report No. 89001, Jan 1989, Dept. of Engineering Mechanics, The University of Alabama, Tuscaloosa, Ala.

[GoSe87] Goldberg, D. E. and Segrest, P., "Finite Markov Chain Analysis of Genetic Algorithms," in Grefenstette, J. J. (Ed.), "Genetic Algorithms and Their Applications: Proceedings of the Second International Conference on Genetic Algorithms," Lawrence Erlbaum Associates, Publishers, Hillsdale, N. J., 1987, pp 1-8.

[Gref85] Grefenstette, J. J. (Ed.), "Proceedings of an International Conference on Genetic Algorithms and Their Applications," Lawrence Erlbaum Associates, Publishers, Hillsdale, N. J., 1985.

[Gref87] Grefenstette, J. J. (Ed.), "Genetic Algorithms and Their Applications: Proceedings of the Second International Conference on Genetic Algorithms," Lawrence Erlbaum Associates, Publishers, Hillsdale, N. J., 1987.

[Hall67] Hall, M., "Combinatorial Theory," Blaisdell Publishing Company, Waltham, Mass., 1967.

[Holl75] Holland, J. H., "Adaptation in Natural and Artificial Systems," University of Michigan Press, Ann Arbor, Mich., 1975.

[IsMa76] Isaacson, D. L. and Madsen, R. W., "Markov Chains Theory and Applications," John Wiley & Sons, New York, N. Y., 1976.

[KiGe83] Kirkpatrick, S., Gelatt, C. D. and Vecchi, M. P., "Optimization by Simulated Annealing," Science, Vol. 220, No. 4598, May 1983, pp 671-680.

[LaAa87] Laarhoven, P. J. M. and Aarts, E. H. L., "Simulated Annealing," D. Reidel Publishing Company, Dordrecht, Holland, 1987.

[LuMe86] Lundy, M. and Mees, A., "Convergence of an Annealing Algorithm," Math. Prog., 34, 1986, pp 111-124.

[Metr53] Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E., "Equation of State Calculations by Fast Computing Machines," The Journal of Chemical Physics, Vol. 21, No. 6, June 1953, pp 1087-1092.

[MiRo85] Mitra, D., Romeo, F. and Sangiovanni-Vincentelli, A., "Convergence and Finite Time Behavior of Simulated Annealing," Proc. 24th Conference on Decision and Control, Ft. Lauderdale, 1985, pp 761-767.

[MoSt64] Mostowski, A. and Stark, M., "Introduction to Higher Algebra," The Macmillan Company, New York, N. Y., 1964.

[Muir60] Muir, T., "A Treatise on the Theory of Determinants," Dover Publications, Inc., New York, N. Y., 1960.


[Rior58] Riordan, J., "An Introduction to Combinatorial Analysis," John Wiley & Sons, New York, N. Y., 1958.

[RoSa85] Romeo, F. and Sangiovanni-Vincentelli, A., "Probabilistic Hill Climbing Algorithms: Properties and Applications," Proc. 1985 Chapel Hill Conference on VLSI, May 1985, pp 393-417.

[Sene81] Seneta, E., "Non-negative Matrices and Markov Chains," Springer Verlag, New York, N. Y., 1981.


BIOGRAPHICAL SKETCH

Tom Davis enrolled in the Purdue University School of Electrical Engineering in September, 1967 and was awarded the Bachelor of Science in Electrical Engineering degree in June, 1971. Upon graduation, he was commissioned in the United States Navy and entered flight training at Pensacola, Florida. He was designated a naval aviator in April, 1973. He subsequently completed several aviation related tours of duty, including two years as a primary flight instructor at Saufley Field, Florida, and three years in an air anti-submarine warfare squadron home-ported in Brunswick, Maine. During the latter tour, he completed three extended overseas deployments to European and North Atlantic island operational sites.

Tom resigned from the Navy in August, 1978 and entered the Purdue University School of Electrical Engineering as a graduate student. He was awarded the Master of Science in Electrical Engineering degree in December, 1979. In January, 1980, he reported to work at the Guided Weapons Division of the Air Force Armament Test Laboratory at Eglin AFB, Florida. During the following eight years, he was assigned to a variety of millimeter wave radar and infrared seeker development programs for autonomous targeting tactical weapons systems.

In May, 1988, Tom was admitted to the Graduate School of the University of Florida, and he began attending classes in Gainesville in August of that year. During the calendar year ending in August, 1989, he satisfied the residency, course work and entrance examination requirements for admission to the Ph.D. program. At that time, he returned to Eglin AFB and resumed his duties in the Armament Laboratory. He was admitted to candidacy following a qualifying exam conducted at Gainesville in October, 1989, and since then has been engaged in dissertation research in stochastic relaxation search algorithms.


I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Jose C. Principe, Chairman
Associate Professor of Electrical Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Donald G. Childers
Professor of Electrical Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Antonio Arroyo
Associate Professor of Electrical Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Murali Rao
Professor of Mathematics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Eugene R. Chenette
Professor of Electrical Engineering


This dissertation was submitted to the Graduate Faculty of the College of Engineering and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

May 1991

Winfred M. Phillips
Dean, College of Engineering

Madelyn M. Lockhart
Dean, Graduate School

