A decision support system for dynamic scheduling


Material Information

Title:
A decision support system for dynamic scheduling: practice and theory
Physical Description:
vii, 213 leaves : ill. ; 29 cm.
Language:
English
Creator:
Aytug, Haldun
Publication Date:
1993
Subjects

Subjects / Keywords:
Production scheduling   ( lcsh )
Decision support systems   ( lcsh )
Genre:
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 1993.
Bibliography:
Includes bibliographical references (leaves 199-212).
Statement of Responsibility:
by Haldun Aytug.
General Note:
Typescript.
General Note:
Vita.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 001917386
notis - AJZ2925
oclc - 30381669
System ID:
AA00002073:00001

Full Text










A DECISION SUPPORT SYSTEM
FOR DYNAMIC SCHEDULING:
PRACTICE AND THEORY













By

HALDUN AYTUG


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

1993














ACKNOWLEDGEMENTS


I would like to thank my advisor, Professor Gary J. Koehler, for his constant support and assistance during this dissertation, especially during moments of frustration. I am thankful to the other members of my committee, Professor Chung Yee Lee, Professor Anthal Majthay, and Professor Richard A. Elnicki, for their input. I would like to thank Professor Selcuk Erenguc for his guidance as graduate coordinator during my studies at the University of Florida. I am grateful to friends who helped make life enjoyable during my stay in Gainesville. Finally, I am most grateful to my family for their constant love and encouragement.

Last, thanks go to International Business Machines Corporation for their support of part of this dissertation.















TABLE OF CONTENTS

ACKNOWLEDGEMENTS

ABSTRACT

CHAPTERS

1   INTRODUCTION
        Scheduling and Operations Research
        Artificial Intelligence and Scheduling
        Scheduling and Genetic Algorithms
        Contents of This Dissertation

2   SCHEDULING, DISPATCHING AND ARTIFICIAL INTELLIGENCE
        Overview of Scheduling and Operations Research
            Definition of the Scheduling Problem and the Production Environment
            Notation and Parametric Description of Scheduling Problems
            Scheduling Problem as it Appears in Industry
            Last Word on OR Approaches
        Dispatching
            Some Dispatching Rules Discussed in Literature
            A List of Dispatching Rules
        Overview of Artificial Intelligence
            Expert Systems and Scheduling
                Cases against and for Expert Systems in Scheduling
                Some Reported ES Applications
            Constrained Heuristic Search
            Hybrid Artificial Intelligence Methods
            Machine Learning Approaches
            A Listing of Systems

3   LEARNING

4   GENETIC ALGORITHMS AND CLASSIFIER SYSTEMS
        Overview of Genetic Algorithms
            The Fundamental Theory of GAs and Implicit Parallelism
            An Exact Representation of a GA's Search Behavior
            Current Research on GAs
            GA Applications on Scheduling
        GA Based Learning Systems and Classifier Systems

5   ISSUES ON FINITE GAs
        An Issue on Stopping Criteria
        Stopping Criteria for GAs
            First Passage Times: Preliminaries
            An Upper Bound on the Number of Iterations
        Extensions and Some Properties

6   LEARNING DISPATCHING RULES FOR SCHEDULING USING SIMULATION
        Overview
        Conceptual Model
        Genetic Learning and Inference
        Simulation Environment
        Experiment and Results
        Discussion of Results
        Related Issues

7   GENERALIZATION OF THE GENETIC LEARNING SYSTEM
        Overview
        The Flowshop Extension
        Knowledge Representation
            BNF Form of The Rule Language
            Rule-level Recombination Operators
            Conjunct-level Operators
            Inference Strategies
            Credit Assignment
        Simulation Experiments
            Experiment Set One
            Experiment Set Two
            Experiment Set Three
        Interpretation of the Simulation Results

8   CONCLUSIONS AND FUTURE RESEARCH
        Conclusions
        Future Research and Modifications to the System

REFERENCES

BIOGRAPHICAL SKETCH














Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

A DECISION SUPPORT SYSTEM FOR DYNAMIC SCHEDULING:
PRACTICE AND THEORY

By

Haldun Aytug


August 1993


Chairman: Professor Gary J. Koehler
Major Department: Decision and Information Sciences


Knowledge acquisition is a major bottleneck in successfully implementing an Expert System in a production environment. Moreover, it is very questionable that a domain expert exists in this domain, due to the dynamic and complex nature of the environment. Knowledge base update is another problem for Expert System approaches in the scheduling domain, since some production environments are subject to frequent changes, making existing knowledge obsolete. A remedy to the knowledge acquisition and update problems is using "Machine Learning" approaches. This reduces the dependence on a human expert and automates the knowledge acquisition and update processes.

In this research we motivate the need for using machine learning techniques for constructing and maintaining a scheduling knowledge base. Due to their continuous feedback control mechanisms, classifier systems are very appropriate for updating knowledge bases when the environment is changing.

We have developed a learning system that is integrated with a simulation environment (developed in C++) which supports inference. A rule language supportive of the genetic search operators used by the system has been developed. Two subsystems, the "learning subsystem" and the "task subsystem," interact throughout the "knowledge construction" process. The learning subsystem evaluates the performance of the task subsystem after every learning episode and generates an expectedly better knowledge base for the task subsystem to use in the next learning episode. This procedure stops after a reasonable knowledge base is constructed.

The theoretical part of this research involves developing a worst case bound on the number of learning episodes the system has to perform before stopping with a well-performing knowledge base. This is equivalent to developing a stopping criterion for genetic algorithms, since classifier systems use genetic algorithms to search for new rules. To date there has been no serious attempt to develop a stopping criterion for Genetic Algorithms. We have developed an upper bound on the number of iterations a GA has to execute to assure seeing an optimal solution with a specified confidence level, using the concept of first passage times of a Markov Chain. For a given string length, we have also found the optimal number of iterations necessary before stopping.














CHAPTER 1


INTRODUCTION


Scheduling has long been an interest of researchers, both from a practical and a theoretical point of view. Scheduling is a problem encountered in every type of business. Examples are project scheduling, flight scheduling, scheduling classes at a university, production scheduling, etc. Whatever the application is, scheduling involves assigning scarce resources to operations. In this dissertation we will restrict our attention to production scheduling problems.

In a production environment, schedules take the form of assigning operations to resources such as machines and labor in a sequence that will optimize some performance criteria. It is possible to define the scheduling problem at different levels. At the highest level, scheduling is more of a long term deployment of resources to production types at an aggregate level, whereas at the lowest level, scheduling involves the actual assignment of each individual resource to each individual operation. In this study we will focus on the lowest level, the actual allocation of resources to operations.

There has been an extensive amount of work in this area. Traditional optimization techniques offer high quality and elegant solutions to a very restricted subset of these problems. Due to their restrictive nature, models solved this way have very limited applicability.










Recently, there has been a trend towards increasing the generalizability of scheduling models. These approaches can model very complex environments, but due to the complexity of the problem they offer lower quality solutions with a very high degree of applicability. Artificial Intelligence (AI) techniques have been very promising for solving scheduling problems. This can be attributed to their ability to incorporate domain specific knowledge into the search process and their flexibility in modeling different aspects of the scheduling environment.

In the following sections we will briefly introduce different approaches to the scheduling problem.


1.1 Scheduling and Operations Research


Scheduling as a sequencing problem has attracted much attention from the Operations Research community. Starting with Johnson's work on flow shop problems (Baker, 1974) and Jackson's on single machine problems (Lawler et al., 1985) in 1954, research has progressed to more challenging problems. As the problem types evolved to more complex forms and became more representative of the scheduling environment, the algorithms developed by researchers started to fail to achieve optimal solutions within reasonable time frames. It was not until the foundation of complexity theory that researchers realized they were dealing with inherently very hard problems. Starting with this new knowledge, research on sequencing shifted to developing heuristic algorithms.










Most problems tackled by optimization methods consider very simplistic environments. Typical problems attacked involve a few machines, infinite resources and deterministic attributes. The stochastic nature of the problem is usually completely ignored. The performance measures considered do not reflect more realistic model assumptions. Usually simple criteria such as flow time, lateness, etc. have been used to measure the performance of the algorithms developed. These measures have been selected since they are functions that are relatively easy to work with. However, there has been almost no effort to investigate cost or quality related measures.

There has been some work on stochastic scheduling problems, but due to their complexity, analytical models involve restrictive assumptions about probability distributions.

When complete solution methods do not exist, it has been common practice to apply rules of thumb to solve the problem. Dispatching rules are simple heuristics that have been found to work well in stochastic environments.

Simulation has almost never been used for optimization of scheduling problems, because the approach itself does not suggest a way of solving the problem. It has almost exclusively been used to test how well an algorithm performs when certain assumptions of the algorithm are relaxed. Simulation has been extensively utilized to compare the performance of heuristic rules in non-deterministic environments.









In a relatively recent survey by Lageweg et al. (1981), it was reported that problems that are optimally solvable make up less than ten percent of the total number of problems tackled by optimization methods.

Even though thirty years of research has compiled a large database of algorithms for scheduling problems, most of these are not directly applicable for real time control of scheduling operations. It is only during the last decade that researchers have focused on methods that can connect this background knowledge to real environments.



1.2 Artificial Intelligence and Scheduling


During the last twenty years there have been enormous improvements in computing technology. Current personal computers are almost as powerful as yesterday's mainframes and are easily accessible. Applications of computing are ubiquitous; indeed, most production environments are now computer integrated.

An early endeavor of computer scientists was to endow computers with the ability to perform intelligently. This area of research is called Artificial Intelligence (AI). The tremendous advances in computers have made AI a more tractable area. Today it is possible to write software that outperforms humans on some tasks.


AI techniques have been successfully implemented in many domains. For example, Expert Systems (ESs) have found large application domains from banking to









During the last decade there have been an enormous number of AI applications in scheduling. Researchers have developed ESs for many different production environments (Alpar and Srikanth 1989a), (Chang 1985), (Kerr and Ebsary 1988), (O'Keefe 1986). It was discovered that knowledge base construction was the main bottleneck to building such systems. ESs were not easy to maintain, and their simple inference mechanisms proved to be insufficient for most problems. Yet ESs were able to solve a lot of problems and helped achieve some of the management objectives (Miller, Lufg and Walker 1988).


Other AI applications focus on search heuristics and knowledge representation. Knowledge representation is important because the system should be able to represent domain specific knowledge. Search heuristics are important because most AI applications focus on real time scheduling, where time is a critical constraint. Search heuristics, sometimes referred to as meta knowledge, should search efficiently for the solution and must exploit knowledge representation language characteristics. Search schemes such as Constrained Heuristic Search (CHS) (Arff and Hasle 1990), (Fox 1983), (Fox and Sycara 1990), blackboard style control (Ow, Smith and Thriez 1988), (Smith, Ow and Potvin 1990), and planning algorithms (Shaw 1988c) have been successfully applied.


Even though these systems are able to successfully utilize AI techniques, they lack one important aspect of intelligence: "learning." There has been little work on developing scheduling systems that can learn. Even though the learning problem has









Recently, there has been work on heuristic discovery using learning algorithms. Such systems use a learning algorithm to generate a knowledge base and utilize this knowledge for problem solving (Piramuthu et al., 1991), (Yih 1990).

Due to their ability to represent complex domains and apply domain specific knowledge, AI approaches are most promising for scheduling. They not only provide powerful problem solving mechanisms but also blend into today's computer integrated manufacturing environment.



1.3 Scheduling and Genetic Algorithms


Genetic Algorithms (GAs) are general purpose, stochastic search algorithms. They conduct the search by manipulating bit strings and employ stochastic operators such as reproduction, crossover and mutation. At each iteration a GA moves from one population of strings to another until a set of useful strings is found.
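To make these mechanics concrete, a minimal sketch of one such iteration is given below. It is illustrative only; the string length, population size, toy fitness function and operator rates are arbitrary placeholders and are not parameters of the learning system developed later in this dissertation.

    // Minimal sketch of a generational GA over fixed-length bit strings:
    // fitness-proportionate reproduction, one-point crossover, bitwise mutation.
    #include <algorithm>
    #include <bitset>
    #include <iostream>
    #include <random>
    #include <vector>

    const int L = 16;                         // string length (placeholder)
    using Chromosome = std::bitset<L>;

    double fitness(const Chromosome& c) { return static_cast<double>(c.count()); } // toy objective

    int main() {
        std::mt19937 rng(42);
        std::uniform_int_distribution<unsigned long> bits(0, (1UL << L) - 1);
        std::vector<Chromosome> pop(20);
        for (auto& c : pop) c = Chromosome(bits(rng));          // random initial population

        std::uniform_int_distribution<int> cut(1, L - 1);
        std::uniform_real_distribution<double> u(0.0, 1.0);
        const double pMutate = 0.01;                             // per-bit mutation rate

        for (int gen = 0; gen < 50; ++gen) {
            std::vector<double> f;
            for (const auto& c : pop) f.push_back(fitness(c));
            std::discrete_distribution<int> select(f.begin(), f.end()); // reproduction

            std::vector<Chromosome> next;
            while (next.size() < pop.size()) {
                Chromosome a = pop[select(rng)], b = pop[select(rng)];
                int x = cut(rng);                                // one-point crossover site
                for (int i = x; i < L; ++i) { bool t = a[i]; a[i] = b[i]; b[i] = t; }
                for (int i = 0; i < L; ++i) {                    // mutation
                    if (u(rng) < pMutate) a.flip(i);
                    if (u(rng) < pMutate) b.flip(i);
                }
                next.push_back(a);
                next.push_back(b);
            }
            pop.swap(next);                                      // next population replaces the old one
        }
        double best = 0;
        for (const auto& c : pop) best = std::max(best, fitness(c));
        std::cout << "best fitness after 50 generations: " << best << "\n";
    }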


Until recently, little work had been done on the search behavior of GAs. Holland's schemata theory is of little help in explaining the convergence behavior of GAs. Recently there has been significant work on analyzing GAs as Markov Chains (Nix and Vose, 1992). The results are still far from applicability, but the model introduces powerful analytical tools for investigating GA search characteristics and worst case analyses.


Genetic operators have been employed for sequencing problems. There has been research on developing new operators that are suitable for sequencing problems (Whitley,









Classifier Systems (CSs) are adaptive learning systems that use GAs as discovery algorithms. Their application to scheduling has been limited due to their weak knowledge representation language. Nevertheless, it has been suggested that they are robust learning systems in dynamic domains (Goldberg 1990). First attempts to use them to learn sequencing heuristics rather than sequences have proven promising (Hilliard et al., 1987), (Hilliard et al., 1989).
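As a purely illustrative sketch of the basic ingredients of a classifier, the following fragment shows a condition-action rule over the usual {0,1,#} alphabet with a strength value adjusted by payoff. The message encoding, the actions and the payoff scheme are hypothetical and are not taken from the systems cited above.

    // Illustrative classifier: a ternary condition paired with an action and a
    // strength that is adjusted by a simple credit assignment update.
    #include <iostream>
    #include <string>
    #include <vector>

    struct Classifier {
        std::string condition;  // e.g. "1#0#": '#' matches either bit
        std::string action;     // e.g. a dispatching decision encoded as text
        double strength;        // updated from environmental payoff
    };

    bool matches(const std::string& condition, const std::string& message) {
        if (condition.size() != message.size()) return false;
        for (size_t i = 0; i < condition.size(); ++i)
            if (condition[i] != '#' && condition[i] != message[i]) return false;
        return true;
    }

    int main() {
        std::vector<Classifier> population = {
            {"1#0#", "use SPT", 10.0},
            {"0###", "use EDD", 10.0},
        };
        std::string message = "1101";                 // encoded state of the environment
        for (auto& c : population) {
            if (!matches(c.condition, message)) continue;
            std::cout << c.action << "\n";            // matching classifier posts its action
            double payoff = 1.0;                      // placeholder environmental reward
            c.strength += payoff - 0.1 * c.strength;  // simple credit assignment update
        }
    }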


1.4 Contents of This Dissertation


Most optimization approaches to scheduling suffer from their overly simplistic assumptions, their inability to incorporate domain specific knowledge and their inflexibility in adjusting to environment changes. Even though such techniques are capable of producing high quality solutions with respect to their presumed environment, solutions generated by such methods are not directly applicable for real time scheduling. The solutions generated may even be infeasible because of their overly simplifying assumptions.


Among all OR techniques, only simulation offers adequate representational power. It is also very useful in performing "what if" analysis.


Almost all AI approaches employ enough expressive power for representing domain characteristics. Expert Systems, however, lack the adequate search power necessary to search the rule space efficiently enough. Among the AI approaches to the scheduling problem, most of the recent work has been towards using domain specific heuristic search. Although these methods are successful in solving specific problems, they are unable to adapt their search mechanisms as the environment changes. They lack the ability to learn.


Applications of Machine Learning (ML) to scheduling problems are at an early stage. Even though some encouraging results exist, pure ML approaches are too general for a complex domain like scheduling. In this dissertation we concentrate on a hybrid approach which combines the knowledge generated by OR research over the last thirty years and the knowledge representation techniques enabled by AI research. A GA search mechanism is employed to discover cases where certain heuristics are applicable. We test the plausibility of a GA based learning approach. We focus our attention on a flowshop environment, which is complex enough to test the usefulness of such an approach. An object oriented simulation environment that is capable of handling inference is utilized.


We also study the behavior of GAs as search algorithms. We present a bound on the computational complexity of GAs. We prove some properties of this bound and demonstrate that optimal search parameters can be obtained to minimize the computational complexity of a GA under certain conditions.


In Chapter 2 we present a literature survey of approaches to scheduling. Emphasis is on dispatching rules and AI applications in scheduling. In this chapter we discuss limitations and advantages of both AI and OR approaches. We also motivate the need for learning in real time scheduling systems.


Chapter 3 mainly focuses on learning. Part of the chapter is dedicated to Machine Learning and to results of some earlier work on ML. Finally, we analyze the existing research on scheduling from a ML perspective.


We review the literature on GAs and CSs in Chapter 4. We start with earlier theoretical findings on GAs. Next, we present a discussion of several applications of GAs and CSs in other domains as well as in the scheduling domain. A large part of this chapter will be devoted to the results of Vose and Liepins (1991), and Nix and Vose (1992), which will provide building blocks for Chapter 5.


In Chapter 5, we develop a bound on the run time complexity of a GA. We derive the bound through a series of theorems and lemmas. Part of the analytical results we need are borrowed from research on the spectral analysis of non-negative matrices.


Chapter 6 summarizes our first attempt to build a learning system using GAs. We first analyze a simple, hypothetical production environment and compare the performance of the learning system against systems using single heuristics with no learning. We further investigate the possibility of learning optimal rules by simulating an environment where an optimal rule is known to exist.


Chapter 7 extends the idea described in Chapter 6 to a flowshop environment. A rule language that is able to capture the knowledge related to a flowshop environment is described. Results of simulation experiments are presented and discussed by comparing them to some known dispatching heuristics. Finally, in Chapter 8, conclusions and directions for future research for both the applied and the theoretical parts of this work are presented.














CHAPTER 2


SCHEDULING, DISPATCHING AND ARTIFICIAL INTELLIGENCE

2.1 Overview of Scheduling and Operations Research


Research on scheduling problems has a long history, dating back to the nineteen fifties and early nineteen sixties. Most of the optimal rules and algorithms for certain classes of problems were found during these initial stages. As production systems evolved and became more complex, the mathematical models representing these real systems have become more complex. After Stephen Cook introduced the theory of NP-completeness (Garey and Johnson 1979), researchers realized that most of the scheduling problems belonged to the class of computationally intractable problems, namely NP-complete problems. This triggered research on heuristic algorithms for scheduling that do not guarantee optimal solutions but do guarantee a good solution within some error limits of the optimal solution. It was also shown that these models could deal with comparatively realistic problem types and appeared to be more robust than optimization based methods (Rodammer and White 1988). Recent results have shown that even these heuristic algorithms have their limitations. It was believed that for an NP-complete problem and a given error tolerance there existed an approximation algorithm that could solve the problem within the specified error bounds, but it was










In this section we review production scheduling problems and discuss the advantages and disadvantages of the traditional Operations Research (OR) methods used for their solution.



2.1.1 Definition of the Scheduling Problem and the Production Environment


From an OR point of view, the scheduling problem can be defined as finding a sequence of operations on a given set of jobs that will give an optimal solution with respect to the prespecified performance criterion. There may be more than one measure of interest for a given problem formulation. Each may result in different solutions. Also, a slight change in the problem may drastically change its mathematical modelling. It is often the case that there may exist a polynomial time algorithm for a model, but a slight change may yield a problem without exact solution methods in polynomial time.


The production environment is defined by the number of machines, the type of machines, and the nature of the flow on the shop floor. All queues feeding into the machines are assumed to have infinite buffer sizes. All machines are assumed to be able to process only one operation at a time.

When a production environment has a work flow that is unidirectional, where each job visits each machine installed in series, we refer to it as a flowshop. On the contrary, if the jobs have routes specified a priori, but the routes do not enforce a unidirectional flow, such an environment is called a job-shop.










The processing time of an operation is known before the operation is performed. The release time of a job is the time the job enters the shop floor and is available for processing. Release times are usually assumed to be zero or, if not zero, are assumed to be known a priori. It is implicitly assumed that the scheduler has perfect information as to when the job is going to be available. Due dates are the times when the jobs are supposed to leave the shop floor and be ready for shipment to the customer.


Machines are assumed to be the only type of scarce resource. Labor and raw materials are assumed to be available in infinite amounts. In some production environments with different types of job families, there may be sequence dependent setup times between two jobs coming from different families. There may also be technological constraints imposing precedence relationships between the operations of a job. For example, assume a job has five operations performed on it before being completed, and operation two has to be performed before operation five. This is referred to as a precedence relationship. In a flowshop environment such a relationship is implicit.


2.1.2 Notation and Parametric Description of Scheduling Problems


Let
    m    be the number of machines;
    n    be the number of jobs;
    pij  be the processing time of job j on machine i;
    rj   be the release time of job j;
    dj   be the due date of job j;
    Cj   be the completion time of job j;
    Fj   be the flowtime of job j;
    Lj   be the lateness of job j; and
    Tj   be the tardiness of job j.
The completion time is the time when a job is ready to leave the system. Flowtime, lateness and tardiness are defined as non-decreasing functions of the completion time as follows:

    Fj = Cj - rj,
    Lj = Cj - dj,
    Tj = max(0, Lj).


A function of completion time is called a "regular measure" with respect to the completion time if it is non-decreasing in completion time. Hence flowtime, lateness and tardiness are regular measures with respect to completion time.
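These definitions translate directly into code. The following minimal sketch computes the three measures for a single job; the structure and field names are illustrative only and are not those of the simulation environment described in later chapters.

    // Completion-time based measures for a single job, following the definitions above.
    #include <algorithm>

    struct Job {
        double releaseTime;     // rj
        double dueDate;         // dj
        double completionTime;  // Cj
    };

    double flowtime(const Job& j)  { return j.completionTime - j.releaseTime; }  // Fj = Cj - rj
    double lateness(const Job& j)  { return j.completionTime - j.dueDate; }      // Lj = Cj - dj
    double tardiness(const Job& j) { return std::max(0.0, lateness(j)); }        // Tj = max(0, Lj)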


Scheduling problems are defined by three parameters α/β/γ (Lawler et al. 1985). The first parameter, α, is the description of the shop floor and can be any of the following:

1:  The shop floor consists of only one machine.
mP: The shop floor consists of m parallel machines.
mF: The shop floor is a flowshop with m machines.
mJ: The shop floor is a job-shop with m machines.

The second parameter, β, is the description of the constraints or the attributes of the problem. If β is null, then the problem has no constraints and the attributes are in their most general form. Some typical specifications are the following.

rj:     Release times of the jobs are not necessarily zero.
prec:   There are precedence relationships between the jobs.
pmtn:   Preemption is allowed.
sij:    A setup time of sij is required if job j is processed after job i and the two jobs come from different families. There is no setup required if j = i.
pj = p: All processing times are equal to p.


The third parameter, γ, is the description of the objective function to be minimized. Usually these functions are regular measures with respect to the completion time of the job. Some typical performance measures are the following.

Cmax:   Maximum completion time, often referred to as "makespan".
ΣCj:    Total completion time.
Fmax:   Maximum flowtime.
ΣFj:    Total flowtime.
Tmax:   Maximum tardiness.
ΣTj:    Total tardiness.
ΣwjTj:  Total weighted tardiness.

For example, 1/rj/Lmax denotes a single machine problem with arbitrary release times in which the objective is to minimize the maximum lateness.


In an excellent survey by Lawler et al. (1985) many scheduling problems are listed and solution methods are discussed. Similarly, Lageweg et al. (1981) present complexity results for 4,536 problems and some references. It is not surprising to see that only 9% of these problems are listed as easy problems, with 81% as NP-complete. The remaining 10% are open problems. Some of these problems and their solution methods, if any, are given in Table 2-1.


Table 2-1 clearly demonstrates that, other than single machine problems with simple performance measures, scheduling problems are mostly intractable by classical OR techniques.

Most of the OR approaches look at oversimplified versions of the real problem, such as assuming deterministic processing times, known release times and well behaved, quantifiable objective functions. Even with such oversimplifying assumptions, most of these problems belong to the class of NP-complete problems. Furthermore, the solution methods suggested can not incorporate qualitative management objectives and heuristic knowledge into the problem.











Table 2-1. Some scheduling problems

PROBLEM                    COMPLEXITY   ALGORITHM
1/prec/fmax                P            Lawler, (Lawler et al. 1985).
1/pmtn,rj,prec/fmax        P            Baker, Lawler, Lenstra and Rinnooy Kan, (Lawler et al. 1985).
1//Lmax, Tmax              P            Earliest Due Date, (Baker 1974).
1/rj/Lmax                  NP           Special cases are solvable by extended Jackson's algorithm, (Lawler et al. 1985).
1//ΣCj                     P            Shortest Processing Time, (Baker 1974).
1//ΣwjCj                   P            Weighted SPT, (Baker 1974).
1/rj/ΣCj                   NP           Pseudopolynomial algorithm by Lawler, (Lawler et al. 1985).
1//ΣTj                     NP           Generalized SPT, (Lawler et al. 1985).
mP//ΣCj                    P            (Lawler et al. 1985).
mP//ΣTj                    NP           (Lawler et al. 1985).
mP//Cmax                   NP           There are approximation algorithms, (Lawler et al. 1985).
2F//Cmax, 2F/pmtn/Cmax     P            Johnson's algorithm, (Baker 1974).
2F/rj/Cmax                 NP           (Lawler et al. 1985).
2F//Lmax                   NP           (Lawler et al. 1985).
2F//ΣCj                    NP           (Lawler et al. 1985).
3F//Cmax                   NP           (Lawler et al. 1985).
3F/pmtn/Cmax               NP           (Lawler et al. 1985).
3F/pmtn/ΣCj                NP           (Lawler et al. 1985).










2.1.3 Scheduling Problem as it Appears in Industry


Even the simplistic mathematical models of the scheduling environment are hard, but as may be expected, real life problems are even harder. Smith explains the reasons as follows:

    The sources of difficulty identified included (1) the need to adhere to a
    typically idiosyncratic set of restrictions relating to production processes,
    resource capabilities, and resource availability, (2) the need to balance a
    large conflicting set of objectives and preferences, and the unpredictability
    of factory operation. (Smith 1988b, p. 382)


To make life more complicated, the performance measures are not usually as simple as those listed earlier. Management usually sets multiple objectives which not only interact but also conflict with each other. Sadeh and Fox explain this very clearly.


    Real-life scheduling problems are subject to a variety of preferences such
    as meeting due dates, reducing the number of machine set-ups, reducing
    inventory costs, using accurate and/or fast machines, making sure that
    some jobs are performed within a single work-shift, . . . Although these
    preferences are usually set independently to one another, they interact. For
    instance selection of a good start time for an activity (e.g. to meet a due
    date) may prevent the selection of an accurate machine for another
    operation or may prevent meeting another job's due date. For this reason,
    selecting operation start times or allocating resources based solely on local
    a priori preferences is likely to result in poor schedules. Preference
    propagation is meant to allow for the construction of measures that reflect
    preference interactions. These measures can then serve to guide the
    construction of a good overall schedule rather than a schedule that locally
    optimizes a subset of preferences. (Sadeh and Fox 1989)


What happens in a real time situation is even more drastic than explained above. The operator responsible for all this has other worries, too.

    The tools the operator is responsible for can't all run the same kind of
    product, and because of the fabrication steps, each wafer must go through,
    say, a photo process or 15 times, each process building a layer resulting in
    a three dimensional complexity--you can't build level B before you build
    level A. The complexity is compounded by other difficulties such as all
    boxes which store a lot of wafers look alike and often an important
    customer comes in with a special request. Modern Times at its worst. For
    operator and manager alike, the whole manufacturing process is like a
    giant chess game--except unpredictability, complicated sequencing and
    combinatorial complexity make it harder to think through than even the
    most demanding of games. (Sullivan and Fordyce 1990, p. 53)


Kerr and Ebsary state the reasons for choosing heuristic approaches to scheduling rather than an optimal OR method as follows:

    ... The first type of approach { the OR approach } was rejected on the basis
    that: { 1 } known future requirements and probable events will influence the
    short term schedule in an aggregate way, but it would be unreasonable to
    incorporate these into a short term optimization approach because of the
    uncertainty associated with them.
    { 2 } the environment is unlikely to remain static even over a period as
    short as 12 hours. (Kerr and Ebsary 1988, pp. 19-20)


A real life production environment is subject to changing constraints, fluctuating resource availabilities and even conflicts among the management objectives. As such, the problem is more one of rescheduling than scheduling, due to the dynamic nature of the problem (Rodammer and White 1988). OR models are not capable of capturing the important qualitative aspects of the problem as actually occurring on the shop-floor. Furthermore, most of the assumptions are never observed--almost no attribute of the system is deterministic--or are irrelevant to the real problem. Finally, real life scheduling problems almost always have no feasible solutions (Fox and Sadeh 1990). This










Since production must still continue, some constraints--we will call them soft constraints--must be relaxed in an intelligent manner so as to minimize both the cost of re-solving the problem and the possible cost associated with relaxing that particular constraint. For example, relaxing the due date constraint of a less important job, or relaxing the constraint on labor availability by using overtime, are common courses of action.

One feasible way of achieving this is to have a dedicated knowledge base that can guide the relaxation process so as to minimize the associated costs.



2.1.4 Last Word on OR Approaches


Bel et al. summarize the above observations.

    Optimization approaches rely on combinatorial methods. This approach is
    mainly concerned with a static point of view, and focuses on global
    objectives satisfaction. Nevertheless, due to algorithm complexity, it does
    not deal with realistic situations. As a result, the model and criteria
    represent only a very limited part of the real-life problems, which reduces
    the optimality and even the feasibility of the solutions generated. (Bel et
    al. 1989, p. 209)


There are successful approaches in some industries, such as the flight scheduling system of American Airlines (Parker 1989), but such a system hardly resembles the production scheduling problem as far as constraints and the importance of real time decisions are concerned.


Even while dealing with the "nice" problems, there is another inevitable problem OR modelling approaches suffer from. Baker bluntly declares:

    The traditional OR syndrome covering the application development cycle
    has five steps: 1) modeler sets up application, 2) model is demonstrated
    ... many industries which have long relied on mathematical programming,
    these experts are taking early retirement. (Baker 1990, p. 108)


It is now clear that there is no easy way to solve all scheduling problems, since most of them are NP-complete. Those that can be solved in polynomial time can neither fully nor closely represent the real life problem, and thus we can not draw conclusions from their results. What is the most promising approach then? If solving this problem is so hard, what is the remedy found in real manufacturing applications?


Researchers and manufacturers have found a few solutions for the computational burden associated with solving the scheduling problem. Instead of looking for a globally optimal solution, a good feasible solution that reflects important management objectives can be accepted. There are several ways of finding such solutions: by simulation, by dispatching rule heuristics or by artificial intelligence approaches.


Optimization by simulation is preferred over optimization by analytical methods when the function can not be represented analytically, or when an analytical solution is not possible. There are different approaches to optimization by simulation, such as response surface methodology or perturbation analysis.

Response Surface Methodology (RSM) is used to define the relationship between input variables and performance criteria. Usually it is assumed that the performance criteria can be represented as a k-th degree polynomial function of the input variables. The technique requires repetitive simulation runs with varying input variable values. Once the response surface is estimated, the optimum can be found by some search method.
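As a purely illustrative sketch of the RSM idea, the fragment below fits a second degree polynomial through three simulation runs and takes the vertex of the fitted curve as the estimated optimum. The simulate() function and the design points are hypothetical stand-ins for an arbitrary stochastic simulation model.

    // Response surface sketch: fit y = a*x^2 + b*x + c exactly through three
    // simulated points and return the vertex of the fitted quadratic.
    #include <array>
    #include <iostream>
    #include <random>

    double simulate(double x, std::mt19937& rng) {            // placeholder simulation model
        std::normal_distribution<double> noise(0.0, 0.5);
        return (x - 3.0) * (x - 3.0) + noise(rng);            // unknown "true" response
    }

    int main() {
        std::mt19937 rng(7);
        std::array<double, 3> x = {1.0, 4.0, 7.0};            // chosen design points
        std::array<double, 3> y;
        for (int i = 0; i < 3; ++i) y[i] = simulate(x[i], rng);

        // Lagrange-style exact fit of a quadratic through the three points.
        double a = 0, b = 0, c = 0;
        for (int i = 0; i < 3; ++i) {
            int j = (i + 1) % 3, k = (i + 2) % 3;
            double denom = (x[i] - x[j]) * (x[i] - x[k]);
            a += y[i] / denom;
            b += -y[i] * (x[j] + x[k]) / denom;
            c += y[i] * x[j] * x[k] / denom;
        }
        double xStar = -b / (2.0 * a);                        // vertex, assuming the fit is convex
        std::cout << "estimated optimizing input: " << xStar << "\n";
    }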








The Perturbation Analysis Methodology (PAM) uses an estimate of the gradient of the function of interest to direct its search. The input variables are perturbed slightly throughout the simulation run to get an idea of the gradient function. Depending on the new gradient estimate, more perturbation is done. Finally the simulation stops when the output of the simulation converges to an acceptable value. One advantage of this method over RSM is that optimization is done within the simulation; however, like all other gradient search techniques it suffers from convergence to local optima.


For scheduling problems, simulation is used to test the performance of certain dispatching heuristics, such as in Vepsalainen and Morton (1987). Usually the problem is fixed and a number of runs are performed to compare two or more heuristics in the same setting. There is no effort to discover under which parameter values the heuristic being tested is superior to the others.

In the following sections, we will discuss dispatching rule heuristics and artificial intelligence approaches.


2.2 Dispatching


A popular alternative to optimization methods is the use of dispatching rules to select the next job to be processed on a machine. This appears to be the actual practice used on many shop floors, where the dispatcher uses a simple rule of thumb to select the next job. The dispatching problem can be defined as follows: given a processor and a queue of jobs waiting for it, select the job to be processed next.








Dispatching rules have been found to be easy to use and thus very convenient for real time decision making. However, "a noticeable shortcoming is the inherent myopic nature of dispatching" (Bhaskaran and Pinedo 1992). The greedy nature of the rules causes suboptimal, low quality solutions for certain problems. They are useful for short term scheduling in an environment where exogenous events are highly likely to occur.


Research on dispatching rules has shown that there is no dispatching rule that dominates the others on all types of problems, or even on variations of the same problem. For example, the SPT rule selects the job having the shortest processing time and solves the 1//ΣF problem optimally. However, for the same problem, the EDD rule selects the job having the earliest due date and is superior if the objective function is Lmax. When the production environment is highly dynamic and random events are common, dispatching rules perform better than they do in a deterministic environment. This observation is stated in Bhaskaran and Pinedo:


    In contrast, when the processing times of all jobs are deterministic and the
    machine is subject to an arbitrary breakdown process, the WSPT rule does
    not necessarily minimize the weighted sum of completion times. {On the
    contrary, in a stochastic, one machine environment where processing times
    are randomly distributed the WSEPT rule minimizes the sum of the expected
    completion times.} Such a phenomenon, where simple dispatching rules
    are more robust in a random environment than in a fully predictable
    deterministic environment, is fairly common. (Bhaskaran and Pinedo
    1992, pp. 1-12)


On top of being robust, dispatching rules have some other nice properties. It has been observed that, for lateness and tardiness measures, the related dispatching rules are somewhat sensitive to the due date assignment rules (Blackstone, Phillips and Hogg 1982).

In the following subsections we discuss several dispatching rules and present a

large list of rules compiled from articles appearing in the literature.



2.2.1. Some Dispatching Rules Discussed in Literature


In this section some dispatching rules and the cases where they have been found useful will be listed. It also has to be noted that most of these rules are tested in a job-shop environment, but there is no inherent disadvantage in using them in a flowshop environment. We will not consider dispatching rules used for processes with precedence constraints, since they do not have a direct implementation for a flowshop environment.
Rules referred to as static do not change the priority of the job as time passes.

On the contrary, dynamic rules change the priorities of jobs as time advances.

Even though some authors make a distinction between dispatching and sequencing


rules, in this research they are considered the same.


times to a queue


If the dispatching rule is applied


of size N, by removing the highest priority job each time, it acts as


a sequencing rule.
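A minimal sketch of this mechanism follows; it repeatedly removes the highest priority job from a queue under a chosen rule, using the SPT and EDD rules summarized below as examples. The data structures are illustrative only and are not those of the simulation environment of later chapters.

    // Dispatching used as a sequencing rule: apply the priority rule N times to a
    // queue of N jobs, each time removing the job with the highest priority
    // (here, the smallest key: processing time for SPT, due date for EDD).
    #include <algorithm>
    #include <iostream>
    #include <vector>

    struct Job { int id; double processingTime; double dueDate; };

    enum class Rule { SPT, EDD };

    double key(const Job& j, Rule r) {
        return r == Rule::SPT ? j.processingTime : j.dueDate;  // smaller key = higher priority
    }

    std::vector<int> sequence(std::vector<Job> queue, Rule r) {
        std::vector<int> order;
        while (!queue.empty()) {
            auto next = std::min_element(queue.begin(), queue.end(),
                [r](const Job& a, const Job& b) { return key(a, r) < key(b, r); });
            order.push_back(next->id);        // dispatch the highest priority job
            queue.erase(next);
        }
        return order;
    }

    int main() {
        std::vector<Job> q = {{1, 5.0, 20.0}, {2, 2.0, 30.0}, {3, 8.0, 10.0}};
        for (int id : sequence(q, Rule::SPT)) std::cout << id << ' ';  // prints 2 1 3
        std::cout << '\n';
        for (int id : sequence(q, Rule::EDD)) std::cout << id << ' ';  // prints 3 1 2
        std::cout << '\n';
    }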

Below we summarize many common dispatching rules.

The Select In Random Order (SIRO) rule is simplistic in that it does not use any information about the shop floor. As expected, it has not been found useful for any








The First Come First Served (FCFS) rule assigns the highest priority to the first job in the queue. It has been found useful for minimizing the variance of the average waiting time if pj = p for all j. It also minimizes the expected average waiting time in the queue if the pj are independent and identically distributed (i.i.d.).

The First At Shop First Served (FASFS) rule, a variation of FCFS used in job-shops, gives the highest priority to the job that entered the shop the earliest, breaking ties arbitrarily or by FCFS. It has been useful for reducing the variance of the average total time spent in the shop by all jobs.

The Earliest Due Date (EDD) rule assigns the highest priority to the job with the earliest due date. The EDD rule minimizes maximum lateness and maximum tardiness and, in the case of non-deterministic processing times, minimizes the expected maximum lateness but not the expected maximum tardiness if the environment is a single machine or a proportionate flowshop. Another version of EDD, referred to as the Modified Due Date (MDD) rule, tends to produce good tardiness results. The modified due date is computed as

    dj* = max(dj, t + pj)

where t is the current time (Baker and Bertrand 1982). MDD has a tendency to perform like SPT when due dates are tight and like EDD when due dates are loose.


The Longest Processing Time (LPT) rule selects the job with the longest processing time. Even though it is not optimal, it is useful for mP//Cmax problems. Another related rule, the Longest Expected Remaining Processing Time (LERPT) rule, is optimal for mP//E[Cmax]. LEPT and LERPT behave identically for a large family of problems.

The Most Work Remaining (MWKR) rule allocates the highest priority to the job with the most work remaining on it, where work is defined as the sum of pij over i in S, and S is the index set of the remaining operations. MWKR tends to minimize the makespan in a job shop environment.


The Shortest Processing Time (SPT) rule is perhaps the most discussed rule in the literature. It selects the job with the shortest processing time. SPT is optimal for 1//ΣCj, mP//ΣCj and mF//ΣCj where the machines are proportionate, and for 1//J(t), mP//J(t) and mF//J(t), where J(t) is the number of jobs in the shop at time t. It is also optimal for 1//J, mP//J and mF//J, where J is the average number of jobs in the system (Bhaskaran and Pinedo 1992). SPT also minimizes mean lateness in a single server environment. Even though SPT performs well in minimizing the number of tardy jobs when due dates are established exogenously (Blackstone, Phillips and Hogg 1982), one of the problems with SPT is that it causes high variance in lateness related measures (Day and Hottenstein 1970). It is also found to be irrelevant to cost-related measures, suggesting that cost is not related to flow-time (Blackstone, Phillips and Hogg 1982). In the case of stochastic processing times, and under fairly general conditions, the Shortest Expected Processing Time (SEPT) rule is optimal for the above problems. The Shortest Remaining Processing Time (SRPT) rule is optimal for single machine problems when jobs have different release times and preemption is allowed (Bhaskaran and Pinedo 1992).

The Slack per Operation (SLACK/OPN) rule selects the job with the smallest slack time per operation ratio. The Minimum Slack (SLACK) rule selects the job with the minimum slack time remaining. The SLACK/OPN, SLACK and EDD rules are known to perform better in minimizing the variance in lateness related measures compared to SPT and FCFS. SLACK/OPN performs better in minimizing the number of tardy jobs compared to the EDD and SLACK rules (Blackstone, Phillips and Hogg 1982).

One other due date related rule is the Critical Ratio (CR) rule, which selects the job with the smallest critical ratio. The critical ratio is calculated by

    CRj = (dj - t) / (pj + Lj)

where t is the current time and Lj is the waiting time remaining in the queue for job j (Blackstone, Phillips and Hogg 1982). The CR rule is known to work well for mean tardiness (Park, Raman and Shaw 1989).


The Least Work Remaining (LWKR) rule selects the job with the least work remaining, where work is defined as before. LWKR tends to minimize the flowtime in job-shops.

The Weighted Shortest Processing Time (WSPT) rule assigns the highest priority to the job that has the highest wj/pj ratio, where wj is the weight--or importance--given to job j. The rule is also useful for parallel machines and proportionate flowshops. The WSEPT rule is optimal for 1//ΣwjE[Cj] in a stochastic environment, even in the presence of machine breakdowns.


The Shortest Setup Time First (SST) rule assigns the highest priority to the job that requires the shortest setup time, possibly zero time units if the previous job was from the same family. SST tries to minimize the total time spent in setups. SST has not proven very useful.

The following two rules try to capture the power of individual rules by taking a

combination of them.

The Dynamic Composite Rule (DCR) attempts to minimize the total tardiness in a job-shop environment. DCR selects the job with the highest priority index Ij(t), which combines, with weighting constants k1, k2 and k3, a due date term dj - pj, the processing time pj weighted by machine workload, and the machine workloads themselves; here m is the number of machines in the shop, and Wi(t) and Wr(t) are the current workloads of the current machine and the next machine respectively. The performance of the rule is comparable to SPT in total lateness and the number of tardy jobs, and is better in tardiness. It also produces lower variance in flowtimes compared to SPT (Bhaskaran and Pinedo 1992).










Perhaps one of the most useful rules in a job-shop environment is the COVERT rule. The Cost OVER Time rule aims at minimizing the total tardiness. The priority index Ij(t) is calculated by

    Ij(t) = max(kP - uj, 0) / (pj kP)

where P is the sum of the processing times of the jobs divided by the number of machines. The job with the lowest index is given the highest priority.


As can be seen from the above discussions, no single dispatching rule dominates the others in all environments. On the contrary, most are very specific to a limited number of environments. Given the vast variety of scheduling environments and the changing nature of the job shop, it is not practical to find a feasible enumeration method to determine which dispatching methods to use under which circumstances.

Our research will try to overcome this problem by using a system responsive to changes. Our theoretical and applied method will adaptively update the set of current dispatching rules as the environment changes.


2.2.2 A List of Dispatching Rules


Before we present a list of dispatching rules, we will introduce additional notation that is adapted from Blackstone, Phillips and Hogg (1982). For each job j, let the following quantities be defined: the time job j is scheduled to be completed at station i; the remaining processing time of job j on machine i at time t; the expected waiting time of job j in station (queue) i, given that there are a jobs waiting in line; and the dollar value of job j.


Definition 2.1: A proportionate flowshop is one where pij = ai pj, with ai > 0, for all i and j.


Table 2-2 consists of a long list of dispatching rules. For each rule we give its name, abbreviation, the calculation of its priority index, and conditions where it has been found useful.











Table 2-2. A list of dispatching rules, giving for each rule its name, abbreviation, the calculation of its priority index, and the conditions under which it has been found useful.










2.3 Overview of Artificial Intelligence


Artificial Intelligence (AI) emerged as a symbolic computation paradigm during the early 1960s. It was the aim of AI research to generate systems that could be considered intelligent. Even though intelligence in humans takes many forms, such as judgement, creativity, plausible reasoning, etc., AI research restricts its attention to "computational techniques for performing tasks that apparently require intelligence when performed by humans" (Tanimoto 1990). AI research mainly focuses on knowledge representation, developing search techniques, perception and inference (Tanimoto 1990). To date many applications have been developed and used that are products of AI research. Expert Systems, robotics, character recognition, natural language understanding, planning, and learning applications are examples of the results of AI research.


Mathematical Programming methods in OR use optimization techniques to solve complex problems in many domains. Mathematical Programming methods have an objective function to be optimized and a constraint set that restricts the search space. On the contrary, AI methods depend heavily on heuristic knowledge to direct a search. The relationships in the problem do not need to be represented by mathematical equations. Constraints and objective function(s)--usually referred to as goals--can be ill defined and all may be non-quantifiable (Simon 1987). Most importantly, AI methods can incorporate experience and expertise into the solution of the problem.









AI approaches look promising for problems of higher complexity. A solution from an AI method may not be optimal and hence may be inferior to the solutions found by optimization algorithms. However, the flexibility of modelling and the ability to yield reactive solutions are greatly enhanced. In dynamic production environments, the biggest problem a scheduler is usually faced with is how to handle exceptions such as rush jobs, machine breakdowns, resource unavailabilities, etc. Clearly mathematical models of the scheduling problems are seldom capable of handling such cases, whereas most of the implemented AI systems, such as OPIS (Fox 1990), (Ow 1988), are able to handle these cases using heuristic knowledge.

In the following subsections, four different families of AI approaches to solving the scheduling problem will be presented. The distinction between any two families is not well defined, since all methods borrow ideas from each other. Nonetheless, each family captures the prevalent theme of its members. The four categories are:

1. Expert Systems (ES) approaches,
2. Constrained Heuristic Search (CHS),
3. Hybrid methods, and
4. Machine Learning (ML) approaches.



2.3.1. Expert Systems and Scheduling


Expert Systems (ES) attempt to mimic the decision making behavior of the domain expert. Such a system typically consists of:

1. a knowledge base,
2. an inference engine, and
3. an explanation subsystem, coupled with a user interface.


In the scheduling context, a knowledge-base has rules of thumb or facts that represent the knowledge about the scheduling problem, the factory environment, job descriptions, etc. Any inference strategy, such as backward chaining or forward chaining, can be used depending on the structure of the knowledge base. An explanation subsystem will give the line of reasoning behind the decisions made by the system.
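As a purely hypothetical illustration (not a fragment of any of the systems reviewed below), a scheduling rule of thumb and a single pass over a small rule base might be coded as follows; the facts, thresholds and recommendations are invented for the example.

    // Illustrative pass over if-then scheduling rules: each rule tests facts
    // about the shop state and, if it fires, reports a recommendation.
    #include <functional>
    #include <iostream>
    #include <string>
    #include <vector>

    struct ShopState {
        int    queueLength;       // jobs waiting at the bottleneck machine
        bool   rushOrderPresent;  // an expedited job is in the queue
        double machineLoad;       // fraction of capacity currently committed
    };

    struct Rule {
        std::string name;
        std::function<bool(const ShopState&)> condition;  // IF part
        std::string recommendation;                       // THEN part
    };

    int main() {
        std::vector<Rule> knowledgeBase = {
            {"rush-order", [](const ShopState& s) { return s.rushOrderPresent; },
             "dispatch by earliest due date"},
            {"congestion", [](const ShopState& s) { return s.queueLength > 10 && s.machineLoad > 0.9; },
             "dispatch by shortest processing time"},
        };

        ShopState state{12, false, 0.95};
        for (const auto& r : knowledgeBase)               // single pass over the rule base
            if (r.condition(state))
                std::cout << r.name << ": " << r.recommendation << "\n";
    }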

Even though the ES approaches have been successful in many domains, like loan

approval, medical diagnosis, fault detection, etc., the same has not always been true for


the scheduling domain.


A close look at the medical diagnosis domain, for example, will


reveal


there


indeed


exists


domain


experts


from


whom


necessary


knowledge can be extracted.


This however is not so clear for the scheduling domain.


The next subsection will present views concerning this issue.


2.3.1.1 Cases against and for Expert Systems in Scheduling


Building a knowledge base for scheduling is a clear bottleneck for an expert scheduling system. In addition, there is considerable controversy about the existence of scheduling experts. However, McKay et al. (1992) report that there are scheduling experts. They state:

    In conclusion, it is suggested that human expertise in scheduling exists and
    is a valuable component of any realistic scheduling/planning environment








Still, most researchers hold beliefs against the suitability of ES approaches to scheduling. Fox reports the following difficulties.

    ... taking an expert systems approach to scheduling appeared inappropriate.
    There are two problems with the expert systems approach:
    (1) Problems like factory scheduling tend to be so complex that they
    are beyond the cognitive capabilities of the human scheduler.
    Therefore, schedules produced by the scheduler are poor; nobody
    wants to emulate their performance. {There indeed exists no expert
    scheduler.}
    (2) Even if the problem is of relatively low complexity, factory
    environments change often enough that any expertise built up over
    time becomes obsolete.
    Expert systems appear to be appropriate only when the problem is both
    small and stable. (Fox 1990, p. 81)


Basically, expert systems are not suitable for the scheduling domain for the following reasons (Fox 1990), (Rodammer and White 1988), (Blessing and Watford 1987), (Savell, Perez and Koh 1989), (Steffen and Greene 1986).


Most scheduling experts are not real experts. Their decision making process is more akin to reacting to crises or management dictates rather than analyzing the underlying problem. Thus, they can not give general recipes for cost-effective solutions.

The scheduling problem is inherently very complex. Hence large-scale problems are hard to solve within a reasonable amount of time using inference strategies such as forward or backward chaining.

The scheduling environment is highly dynamic, so any rule of thumb valid








38
Contrary to the above views, Miller, Lufg and Walker report a successful ES implementation:

The large potential savings achievable through Expert Systems Scheduling, as proven by the $10 million annual payback at a Westinghouse plant, will attract significant industrial interest in the next few years, making this one of the hottest areas for expert systems applications. (Miller, Lufg and Walker 1988, pp. 176-177)


Even though expert systems are generally believed to be unsuitable for the scheduling environment, expert systems ideas still offer a big advantage. Their ability to represent ill-structured, ill-defined, qualitative aspects of the problem is valuable. This inherent ability of ESs to represent such characteristics can be incorporated into a hybrid approach, which will be discussed in the following sections.



2.3.1.2 Some Reported ES Applications

Most of the ES applications reported in this review are actually implemented or are in a prototyping stage. As expected, the authors report the standard bottleneck of building such systems--the knowledge acquisition stage. Even so, most applications are fairly successful with regard to acceptance by the staff and are being put to actual use on the shop-floor.


Clancy and Mohan (1990) describe an expert system, RBD, designed to help make real-time dispatching decisions by using sequencing rules representing company knowledge. To enable real-time decision making, the system is fed with real-time data on:

. Order type,

. Equipment status,

. Work station characteristics, and

. Job characteristics.

Using the above criteria the system produces a priority list of jobs. The highest priority job is then selected. The system has no learning capabilities. RBD has been implemented and used in the semiconductor industry. The results obtained so far show a decrease in Work in Progress (WIP) and an increase in critical equipment utilization.
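The rules RBD actually uses are not reproduced here; the following is only a minimal sketch, with hypothetical attribute names and weights, of how a rule-based dispatcher of this kind can turn criteria like those above into a priority list and release the highest-priority job.

def priority(job, equipment_up):
    # Hypothetical scoring rules; the company-specific rules in RBD differ.
    score = 0.0
    if job["order_type"] == "rush":              # rush orders jump the queue
        score += 100.0
    if not equipment_up[job["workstation"]]:     # avoid work stations that are down
        score -= 1000.0
    score -= job["slack_hours"]                  # less slack means higher priority
    return score

def dispatch(queue, equipment_up):
    # Rank the queued jobs by the rules above and release the best one.
    ranked = sorted(queue, key=lambda j: priority(j, equipment_up), reverse=True)
    return ranked[0] if ranked else None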


Kerr and Ebsary (1988) describe an expert system approach to the scheduling problem of a company producing "specialist products to customer specified configurations from approximately 3,000 different types of subassemblies, machine parts and raw materials."

Due to the complexity of the task, the authors were unable to elicit knowledge from the domain expert. When the classical knowledge acquisition techniques proved useless, the authors constructed the knowledge base by analyzing the actual schedules prepared by the schedulers. The resulting knowledge base is composed of six different sets of rules, categorized as follows:

Priority rules,

Job precedence rules,

Contingency rules, and

Time conversion rules.

The system was able to access real-time data about the shop-floor status via a relational database system, and make real-time dispatching decisions. The authors report two problems with the project:

. Capturing expert knowledge comprehensive enough to cover all the cases one may face in such an environment was impossible.

. The highly dynamic nature of the scheduling environment caused the knowledge base to become invalid very rapidly.


(1988) describe an expert system, PAMS, with a hybrid knowledge base composed of rules, frames and some heuristic knowledge specific to the domain of scheduling parallel machines. The knowledge base can be broken down into four sets of rules, dealing with:

Job status,

Machine status,

Labor status, and

Material status.

PAMS has a two stage control strategy where a feasible schedule is generated in the first phase and is evaluated and updated in the second phase, if desired. The system is capable of producing schedules at different levels of detail. The authors report that the system was










Savell, Perez and Koh (1989) describe a classical Expert System application, implemented for a semiconductor production plant. Due to the modular nature of the environment, the knowledge base could be constructed separately, reflecting knowledge about each cell rather than the whole factory. The knowledge acquisition was carried out through interviews with the domain experts. The resulting knowledge base had two types of rules:

Rules to assign priorities, and

Rules to assign production cells to the equipment (Equipment Scheduler).

Priority rules are used throughout the whole system but each cell has its own knowledge base, independent of the others, for equipment assignment.

Even though the system helped to organize the current scheduling practice, the authors report problems with run-time performance and the assessment of the system.



2.3.2. Constrained Heuristic Search

Constrained Heuristic Search, CHS, can be viewed as constraint satisfaction where the search for a feasible solution is guided by heuristic knowledge.

The constraint satisfaction problem, CSP, can be formulated by using a constraint graph, where nodes are variables and arcs are n-ary constraints among the values the variables may be assigned. Problem solving is performed by sequentially choosing a value for each variable until all constraints are satisfied and all variable assignments are made. If a node is found that cannot be expanded yet no feasible solution is reached, the algorithm backtracks to the closest unexplored node and continues the search from there. In a scheduling problem this can be viewed as generating sub-schedules and backtracking when the current sub-schedule violates the constraints.

CHS utilizes the techniques used for CSP but, in addition, uses heuristic knowledge to guide the search so that costly backtracking is minimized. Perhaps the most crucial difference between CSP and CHS is that CHS also makes use of an objective function to evaluate the assignments, rather than a blind search that will result in poor solutions with respect to the performance criteria.

Fox, Sadeh and Baykan (1989) present heuristic information that they found useful for the scheduling problem, which they refer to as problem textures:

1. Value goodness,

2. Constraint tightness,

3. Variable tightness with respect to a set of constraints,

4. Constraint reliance,

5. Variable tightness,

6. Variable contention, and

7. Constraint arity.


When viewed as an AI search algorithm, the CHS methodology consists of the following steps:

1. Define an initial state, where no variable is assigned a value.

While the goal state is not reached do {

2. Match the best operator that applies to the given state.

3. Apply the operator.

}

The matching performed in Step (2) is the most difficult part of the algorithm. For a successful match the algorithm not only needs to satisfy the preconditions of the relevant operators, but also needs to find the best operator to apply in order to reduce the search complexity of the problem. Note that Step (3) will result in either adding a structure to the graph, further restricting the domain of the variable, or reformulating the problem, such as relaxing some of the constraints.
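As a concrete illustration of the loop above, the sketch below shows one way a CHS-style search can be realized as CSP backtracking in which heuristics choose the next variable and order its candidate values. The helper functions are assumed to be supplied by the problem model; they are placeholders, not taken from any of the systems discussed here.

def chs(assignment, variables, domains, consistent, pick_variable, order_values):
    # Goal state: every variable has been assigned a value.
    if len(assignment) == len(variables):
        return assignment
    var = pick_variable(variables, assignment, domains)    # heuristic ("texture"-style) variable choice
    for value in order_values(var, assignment, domains):   # values ordered by an objective estimate
        assignment[var] = value
        if consistent(assignment):                          # check the constraints satisfied so far
            result = chs(assignment, variables, domains,
                         consistent, pick_variable, order_values)
            if result is not None:
                return result
        del assignment[var]                                  # backtrack and try the next value
    return None                                              # no feasible completion from this state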


In the rest of this section, we will present papers that use CHS as their main search algorithm. Even though most systems presented here use unique strategies to guide their search, the main search strategy is CHS. Most systems use additional heuristics, other than CHS, to limit the search space and reduce the computational burden.


Hasle (1990) discusses constraint-guided heuristic search and the knowledge used to implement this method. A system called PLATO-PS is under implementation that uses the above ideas. The system has a process generation subsystem that generates routing and processing information. The sequencing subsystem "utilizes ... heuristic, constraint-guided state space search". The sequencer utilizes heuristics for pruning the search space, such as:

ordering heuristics and

grouping heuristics.

The constraint-guided search uses heuristics for pruning the state-space search, such as:

constraint violation and

state selection.


Bensana et al. (1986), Erschler and Esquirol (1986), et al. (1989) introduce two systems called MASCOT and OPAL. The systems are designed to use three types of knowledge during the process of deriving feasible schedules:

The theoretical knowledge on scheduling problems,

Empirical knowledge about priority rules, and

Practical knowledge about technological constraints.

The so-called Constraint Based Analysis (CBA) approach aims at generating restrictions on local scheduling decisions by only considering limit times and resource availability constraints. As soon as a new precedence constraint is found, a limit time update is searched for. If such an update can be accomplished, the new value is propagated as far as possible along the technological and other precedence constraints which have already been accounted for. When no more updating is possible, a search for a new precedence constraint is undertaken.









A decision support module is used to provide advice on the sequencing of operations based on practical or heuristic knowledge. Finally, a supervisor coordinates the interaction between the two modules by calling each in order as needed. If no feasible schedule is found, the supervisor calls a failure recovery module that selectively relaxes some of the constraints, such as due dates.


Erschler and Roubellat (1989) introduce a system called OARABAID which uses constraint-based analysis as described above.

Constrained heuristic search has been extensively utilized for the implementation of the ISIS and CORTES systems. The general design of these systems is geared towards:

. constructing a representation language adequate enough to capture the nature of the problem, and

. developing a search architecture capable of exploiting the search space both effectively and efficiently.

ISIS-1 is an order-centered scheduler which tries to construct schedules by considering jobs rather than available resources. The system identifies the next order to schedule through the use of CHS and tries to schedule each order, selectively relaxing any causal constraint found to be unsatisfactory (Fox 1984). ISIS-2 is an enhanced version of ISIS-1, where instead of solving the problem at one level, the system attacks it in a hierarchical manner. The system first selects an order, then analyzes the available capacity. ISIS-3 uses a combination of resource versus order scheduling, which is referred to as "multiperspective scheduling" by the author (Fox 1990). First, bottleneck machines are identified from a rough draft of the final schedule and the system tries to assign operations to these machines. Then the system considers an order based scheduling.

CORTES is a distributed system for production planning, scheduling and control (Fox and Sycara 1990). The CORTES system, compared to the ISIS family, considers jobs and resources as aggregate variables (Fox 1990). The approach used in activity scheduling is very similar to the least commitment strategy used in planning systems. The problem solving method of the CORTES system can be summarized as follows:

1. An initial state is generated.

2. The constraints imposed by the current state are propagated.

3. Texture measures (Fox, Sadeh and Baykan 1989) are computed.

4. Based on (3), a state is selected.

5. A rule is matched to the current state, and an assignment is made.

6. If the goal state is not reached, steps (2) through (4) are repeated.


Research on both ISIS and CORTES has reported promising results. The use of texture measures has resulted in a reduction of the search complexity.


Steffen and Greene (1987) introduce an approach named "dedicated-shared decomposition," where, after the production planning level, they assign certain jobs to dedicated machines and in each subproblem perform dedicated loading. Usually, selection of the dedicated machine is done according to the customer preference. The scheduling is handled by a heuristic algorithm that will minimize the conflicts for competing jobs. Loading and sequencing of the shared resources are done by "Constraint Directed Search." Based on adherence to the constraints included in the prototype, the authors conclude that the approach is applicable.



2.3.3. Hybrid Artificial Intelligence Methods

The OPIS family is different from the ISIS/CORTES family in that it uses a blackboard style of control. The blackboard contains the necessary information about the factory and is capable of managing the temporal constraints. The actual scheduling decisions are made by an analysis knowledge source and two decision knowledge sources (KSs). The former is used to analyze the available capacity whereas the latter two are used for resource or order based scheduling decisions.

At each cycle, after propagating the results of any unexpected events such as capacity conflicts, machine breakdowns, etc., the search manager prioritizes the resulting tasks and stores them on the "agenda." The task with the highest priority is executed as a subtask (Fox 1990).


OPIS is a reactive scheduling system which is designed to deal with exogenous events. When such an event is encountered, its consequences and effects are propagated onto the existing schedules. Conflicts arising from this change are stored on the agenda, and a decision tree is used to decide which KS(s) to use for exception handling. The OPIS system family extends the multi-perspective scheduling view of the ISIS 3 system and allows alternative solution methods within each view.

Ow, Smith and Howie (1988) describe a distributed scheduling system called CSS which uses two types of knowledge bases:

. the work order manager, WOM, and

. the resource broker.

The idea of CSS is similar to the one described in (Shaw 1986). The system uses a bidding methodology. The WOM estimates the completion time of a work order by examining its operations graph and initiates a bid among the nodes which could perform the job. After evaluating the bids submitted, the job is sent to the selected node. The resource brokers evaluate the availability of their resources and bid according to this information. The search method used in the system is called a two-pass best-first search.


Shaw (1986) uses a network-wide bidding scheme, where each cell bids for a job to schedule, and the one that has the best bid gets the job to process. Decisions to be made by the system are:

which cell to assign the job to, and

the sequencing and scheduling within each cell.

Performance measures used are:

job lateness and tardiness statistics, and

average in-process waiting time.

The author uses a planning algorithm for scheduling within the cell.


Shaw (1988b) describes a decentralized, cooperative scheduling system for a manufacturing environment that has been divided into cells, each composed of a number of machines. The cells can communicate with each other through a Local Area Network. The scheduling problem is addressed at two levels. In level one, a cell that wants to submit a job calls for "bids" from the other cells which believe that they can process the job. Whichever gives the best bid with respect to the performance criteria wins the bid and the job is transferred to that cell.

At the cell level, a planning algorithm is used which is driven by pattern matching and state transformation operators. Basically, the operators that match the current state are applied until a goal state is reached or a failure is reported--i.e. there is no feasible solution. Cell levels also contain manufacturing modules (operators), decision rules, and scheduling heuristics.

The author reports that the system, on the average, performs better than myopic SPT, bidding SPT, and bidding EFT.
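The bid functions and message protocol used in these systems are not reproduced here; the sketch below only illustrates the general idea with a hypothetical earliest-finish-time bid, where each candidate cell bids the time at which it could complete the job and the best bid wins.

def bid(cell, job):
    # A cell bids its estimated finish time for the job (smaller is better).
    return cell["available_at"] + cell["processing_time"][job["type"]]

def award(job, cells):
    # Ask every cell that can process the job for a bid and pick the winner.
    candidates = [c for c in cells if job["type"] in c["processing_time"]]
    if not candidates:
        return None
    winner = min(candidates, key=lambda c: bid(c, job))
    winner["available_at"] = bid(winner, job)   # the winning cell books the job
    return winner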

Shaw (1988c) describes a system that employs:

. pattern-directed inference to capture the dynamic nature of the scheduling environment, and

. the A* search algorithm (Tanimoto 1990) to expedite the searching procedure.

The author employs a strategy that generates a schedule for each job and then uses the planning methodology to resolve the conflicts that arise due to sharing the same resources at the same time intervals. To solve the conflicts the author suggests two approaches:

. Critic Mechanism: This method uses a table called the "Table of Multiple Effects (TOME)" where, for each element in that table, the deletor lists and adder lists have to be checked--which is very costly. Another drawback of this approach is that it cannot identify alternative resources or operators.

. Reasoning about Resources: This works by identifying conflicting interactions and then, using a plan revision procedure, tries to improve the makespan as much as possible.


In all three papers by Shaw, the author analyzes the feasibility of the bidding scheme used in connection with a planning algorithm at the cell level. The real schedule construction done at the cell level is a good example of heuristic search where cumulative processing time and estimated remaining processing time are used as heuristic knowledge to guide the search.



2.3.4 Machine Learning Approaches

None of the systems presented so far revised their knowledge as new decisions were made. The intelligence (the heuristics) was coded into the source as executable instructions. Unless the code changes, they will remain the same and will react exactly the same way they had done previously, even if those actions resulted in a bad decision.

In this section we will present papers that use at least one Machine Learning (ML) approach to construct their knowledge bases. We classify them as knowledge discovery systems. Note, however, that once the knowledge bases are constructed, they are not updated. So even these systems cannot improve their performance through "experience," with the possible exception of the system SCHEDULE by Lamatsch et al. (1988). Their contribution is, however, their ability to construct knowledge that often helps the performance system to achieve better results than those accomplished by non-learning systems.

Blessing and Watford (1987) describe a FMS scheduling system, INFMSS, where the knowledge is represented by frames. They structure the knowledge base in three dimensions, namely:

FMS description on one axis,

part mix on another, and the

schedule proposed on the third axis.

If any point in that three dimensional space is already defined, the expert system uses the stored information. The schedule that gives the best result is chosen as the answer. Those points are saved for further reference. This mechanism is referred to as rote learning. Even though the system saves old decisions that gave good results, it has no means of generalizing them, and we suspect that in such a complex environment the chances of observing the same state are very low.


An interesting approach to the scheduling problem has been discussed by Chryssolouris, Lee and Domroese (1990). The objective of the study is to be able to find the weights of the criteria for the decision-making process at the workcenter level based on some performance measure. A Neural Net has been used for this purpose. Examples for training are generated by simulation, by fixing the operational policy and workload of the workcenter. The outputs of the simulation are the performance measures. The input to the neural network is the workload of the workcenter and the desired levels of performance. The outputs of the network are the weights necessary to achieve the desired level of performance. The weights can be considered as follows: given a scheduling heuristic, what must be the relative importance of local criteria with respect to each other so as to achieve the desired level of performance. The authors use a system called MADEMA to carry out the actual scheduling decisions within the simulation.


Hilliard et al. (1987), (1989) describe a series of experiments to learn general rules for simple job shop scheduling tasks by using classifier systems. The system knows different predicates, dispatching rules, which it can use for sorting the jobs in the queue. The authors report that the system can discover the optimal rule, SPT, for a single machine minimum lateness problem (we assume this is total lateness), and good solutions for the minimum weighted tardiness problem.


Another emerging approach to scheduling problems is case based reasoning. The learning system stores each case seen so far as a piece of knowledge, in what is referred to as a "Case Base." During problem solving, if a match is found for any of the cases stored in the case base, the solution method or the results stored with the case are applied to the problem. The match does not need to be exact. Koton (1989) describes a system called SMARTplan which is used for resource allocation and scheduling of airlift operations. Due to the size of the search space, only a partial match of the cases is required for successful instantiation of a stored case. Only those cases that are successful are stored for future use.


Lamatsch et al. (1988) describe an expert system named SCHEDULE that uses reduction digraphs to match the given problem as an instance of a scheduling problem solvable by the stored heuristics. All the heuristics are polynomial time algorithms, published in various journals, that can produce optimal or near optimal results. If no match is found, the system first tries to find a problem that is close to the original problem with respect to a metric they define. If that fails, a relaxation of the problem is solved. The system has a learning capability by adaptively adjusting the distances between two nodes of the graph. As the user prefers a problem to solve as a substitute for the original, the distances are adjusted so that the preferred problem is a better approximation for a certain type of problem. Note that each node is a type of problem, and two problems can be reduced to one another if there is a path between them.


Nakasuka and Yoshida (1992) describe another inductive learning method to discover rules for a scheduling environment. At any given decision point the system is simulated forward by using all possible dispatching rules separately. Once the best rule is found, that state and the rule applied are saved as an example and the simulation continues until the next dispatching decision point. After enough examples are generated, a binary decision tree is constructed by a specialized induction algorithm. The authors also describe the details of the induction algorithm. The system is used to find good solutions for the tardiness and makespan performances, i.e. one measure is given as a constraint and the other is optimized. The system performs better than single dispatching rules.
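The simulator interface and rule set below are placeholders; the sketch only illustrates the example-generation loop just described, in which each decision point is simulated forward under every candidate rule and the (state, best rule) pair is kept as a training example.

RULES = ["SPT", "EDD", "FIFO", "SLACK"]         # any set of candidate dispatching rules

def generate_examples(sim, horizon):
    # sim is a hypothetical discrete event simulator offering now(), snapshot(),
    # copy(), run_with(rule), tardiness() and step_with(rule).
    examples = []
    while sim.now() < horizon:
        state = sim.snapshot()                   # attribute vector at the decision point
        # Simulate forward under each rule and keep the rule with the best outcome.
        best_rule = min(RULES, key=lambda r: sim.copy().run_with(r).tardiness())
        examples.append((state, best_rule))      # one training example per decision point
        sim.step_with(best_rule)                 # advance to the next dispatching decision
    return examples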


Piramuthu et al. (1991) describe a hybrid method for scheduling that employs an experimental environment, using simulation, and a learning component. New knowledge is incorporated into a pattern-directed scheduling system. For a given state defined by eight control parameters, the system is simulated by using each dispatching rule they have considered as useful. The dispatching rule that yields the best result with respect to the performance measure is specified as a positive example for the learning subsystem. In this way a training set is constructed. From the training examples a decision tree is constructed using Quinlan's ID3 algorithm. Each path from the root to a leaf defines a rule. Once this is done, the system is ready to make schedules by matching the state of the system to one of the rules created and by using the dispatching heuristic at the leaf node to choose the operation to be performed.


Shaw, Raman and Park (1991) discuss adding a learning capability to a knowledge-based scheduling system. The authors discuss how rule induction can be used in conjunction with dispatching rules (i.e., rule induction can be used to determine important attributes of the system and then to choose the dispatching rule to use). This is called Pattern Directed Scheduling (PDS) by the authors. In an experiment, the authors use four such rules that are derived by taking into consideration eight attributes of the system.

Even though the overall performance of PDS is reported to be better than using a single dispatching rule, this approach is very costly since examples have to be created by simulation runs, except for a few known cases when an optimal rule is known to exist.

Except for a few trivial scheduling problems, it is neither possible nor feasible to construct or simulate an "oracle" that is required by most inductive learning algorithms.

Thesen and colleagues (1986) describe a knowledge-based robot scheduling system. Knowledge is not acquired from a domain expert, but rather created using simulation experiments. The authors simulate the system by using dispatching rules and then, based on the specified performance measures, they decide which dispatching rule to use given the system state, which is defined by the robot position, the number of electroplating tanks, and









Yih (1990) describes an approach to the knowledge acquisition task to schedule the operations of a material handling robot. The robot has to carry each part from one processing tank to another. This has to be scheduled so that the parts do not get spoiled in the tanks. The author develops the Trace Driven Knowledge Acquisition (TDKA) methodology to acquire rules. The methodology requires simulation experiments with scheduling experts, in this case students. The records collected from the simulation, traces, are used to define the classes and to select a rule applicable for each class. The author describes an algorithm for the class formation and rule assignment.

Even though TDKA is suitable for knowledge extraction, the method is limited by the existence of expert schedulers and their consistency during decision making. The method also suffers from heavy involvement with manual procedures.

Yih and Thesen (1991) describe the same experimental setting discussed in (Yih 1990). This time, using the traces of the decisions, a set of states and state transition probabilities are defined. Based on this information the decision process is analyzed as a Markov decision process behaving according to the calculated transition probabilities. The optimal policy of the model is used for rule formation. The idea is applied to the same problem described in Yih (1990). The method produces a good set of optimal rules, but they are not sufficient to cover the entire rule space and they are also limited by the existence of real scheduling experts. This shortcoming has been fixed in Yih (1992).


After the optimal policies are discovered, the author uses a modified version of the earlier approach to construct the rule base. The rules formed perform much better than those identified by human schedulers.

Yih, Liang and Moskowitz (1992) use the same problem to describe an OR/NN hybrid approach. This time an NN is used to generalize from the optimal policy of the semi-Markov process. That is, each (state, optimal policy) pair is given as an example to a NN. After training, the NN is able to make predictions that are of better quality than those of human schedulers.



2.3.5 A Listing of Systems

In Table 2-3 we summarize the research done in knowledge based scheduling. Below we describe the ideas behind the columns in the table.

Knowledge Representation refers to the type of knowledge representation scheme used in the system.

For a feasible running performance most systems employ an intelligent search heuristic. Given the complexity of the scheduling domain this appears to be the key aspect of the system. As noted earlier, most ES applications presented here have run-time performance problems, due to using backward or forward chaining without making use of other available information. However, using heuristics like CHS often improves their performance.

The manufacturing environment for which the system is developed is listed in its own column.

Since almost all of the systems presented here have knowledge bases, it is also important to discriminate how their knowledge base is organized. This often reflects the problem solving philosophy of the system. For example, ISIS-2 uses a hierarchical problem solving architecture, hoping that this will reduce the size of the search space. Some others facilitate a distributed problem solving architecture. The "Structure" column captures this information.

The ability of the system to learn is mentioned in the "Learning" column. If the system has a learning capability, the nature of the learning algorithm used is listed.

Finally, the "External Module" column tells if the system is interacting with any other external systems. The interaction may be in the form of feeding information to an external module, communicating with a simulation program, getting data from a data base, etc.

Some columns of Table 2-3 are adapted from Kusiak and Chen (1988).












Table 2-3. A summary of knowledge-based scheduling systems.














CHAPTER 3
LEARNING

3.1 An Overview of Machine Learning

Learning is a very broad term that denotes the way in which people increase their knowledge and skills. Machine Learning (ML) concentrates on generating systems that can learn and thus acquire new "declarative knowledge," improve their performance for problem solving through instruction or practice, and organize and generalize new knowledge (Carbonell, Michalski and Mitchell 1983). For example, the knowledge base required by an expert system can be generated by using learning algorithms and then refined by the learning sub-system so that the expert system performs better than the "no-learn" case.


Research on ML dates back to the early 1960's (Cohen and Feigenbaum 1982). Most of the ideas used today were initiated during the earlier years of ML research. For example, adaptive systems depending on random mutation and selection were created hoping that these processes would result in intelligent systems. Various analogs of neurons were developed and tested.

During the mid 1960's and early 1970's, it became apparent that learning was a hard task and that it was virtually impossible to learn without any prior knowledge of the domain or meta knowledge about the task. This led to research concentrating on systems that make use of domain knowledge.

Finally, within the last decade, ML research has centered around the theory of learning as well as its applications to specific domains. Even though the theoretical approaches consider the most general form of the learning problem, most of the applied ML research focuses on problems specific to the domain for which it is being developed. Carbonell, Michalski and Mitchell describe the trend that started in the 1970's as "The Modern Knowledge-Intensive Paradigm."

Before discussing the details of this research, it is useful to consider the dimensions of ML approaches. That is, we will characterize the approaches with respect to their assumptions. We use four dimensions of learning: definition, methodology, nature and mode. We then discuss the research done within this four-dimensional framework.

The first dimension, the definition of learning, describes how the research perceives learning. What do researchers mean when they use the word "learning"? Not all applications share the same view of learning. Let us see how learning is defined.


Cohen and Feigenbaum define learning from four different perspectives:

Herbert Simon (in press) defines learning as any process by which a system improves its performance... A more constrained view of learning, adopted by many people who work on expert systems, is that learning is acquisition of explicit knowledge... A third view is that learning is skill acquisition... a fourth view of learning is that it is theory formation, hypothesis formation and inductive inference. (Cohen and Feigenbaum 1982, pp. 326-327)


Most AI applications, regardless of whether they are learning applications or not, fit one or more of these views of learning.










The acquisition of explicit knowledge is the main focus of ES applications. Even though ESs are not learning systems themselves, the problem of knowledge acquisition initiated a substantial amount of research on issues about learning. It is useful to consider the evolution of approaches to knowledge acquisition since we believe it demonstrates in which direction ML research is going.

It is not an exaggeration to say that ESs enjoyed success in domains where expertise existed and where knowledge did not become obsolete over time. This is exactly one of the reasons they were not successful in the scheduling domain. The scheduling domain is so complex and dynamic that it is almost impossible for humans to understand the existing causal relationships. However, eliciting expert knowledge is the bottleneck of ES development (Cohen and Feigenbaum 1982), indeed of any knowledge intensive system. After realizing this, the trend in knowledge acquisition shifted from classical techniques like protocol analysis or interviews to interactive techniques where the knowledge acquisition task was given to the learning system. The learning system guided the search for concept formation by asking questions of the expert and refining its current hypothesis based on the answer given by the expert. One such system is TEIRESIAS by Davis (1982), which is used for medical diagnosis.

During the course of development of ESs, researchers realized, to their dismay, that for some domains even interactive systems proved infeasible. There were no experts, or the experts were not able to dedicate time to the knowledge acquisition process. This is where ML techniques can help. As a result, ML research concentrated on developing learning algorithms that are in turn used for developing systems that can learn.


Learning requires an extensive amount of inference. Inference can be deductive or inductive, and sometimes both. During the inference process, the learning system needs to explore the hypothesis space, which is usually very large. This necessitates a powerful inference mechanism. There are different approaches to implementing this mechanism, which are summarized within the second dimension.

The second dimension of learning is the philosophy employed for learning. In other words, it is the methodology utilized by the system to learn. Most learning methodologies considered in the ML literature can be classified into four groups: rote learning, learning by being told, learning from examples and learning by analogy (Cohen and Feigenbaum 1982).


Rote learning can be viewed as memorization of problem instances. The learning system does not try to generalize. An instance encountered is stored as knowledge. The main issue here is organizing the instances in storage and accessing them as needed. For example, assume the system is trying to learn the prices of different cars. Assume the system conceptualizes a car by using attributes such as color, A/C, number of doors, engine size, transmission type, etc. When the system is presented with the specifications of a car it checks to see if it has seen that instance before. If not, the system calculates the price of the car and stores the example. If so, the system responds without executing the calculation again. The system cannot, however, recognize that the color does not affect the price (assuming that is the case) and tries to calculate the price if presented a yellow colored car even if it has information about a red colored car of the same type.
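A toy sketch of the car-price example follows; price_model stands in for whatever procedure computes a price, and the point is only that memorized instances are answered from storage while unseen ones (such as a new color) are recomputed rather than generalized.

def make_rote_learner(price_model):
    memory = {}                                  # stored instances: attribute tuple -> price
    def lookup(car):
        key = tuple(sorted(car.items()))
        if key not in memory:                    # unseen instance: compute and store it
            memory[key] = price_model(car)
        return memory[key]                       # seen instance: answer straight from memory
    return lookup

# A yellow car produces a different key than an otherwise identical red car,
# so the learner recomputes the price instead of generalizing over color.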


Cohen and Feigenbaum explain learning by being told, or advice taking, as follows: "Learning by being told, in which the information provided by the environment is too abstract or general and, thus, the learning element must hypothesize the missing details" (Cohen and Feigenbaum 1982, p. 328).


The system should have the knowledge to be able to interpret high level information, make it operational for itself, and then integrate it into its existing knowledge (Cohen and Feigenbaum 1982). In a scheduling environment where the system is learning how schedules are evaluated, the system should have the capability of understanding what "avoid creating high WIP inventories" means and take the correct action to achieve this goal. This advice is then integrated into the existing knowledge, and the system decides whether it is useful advice after evaluating its performance. Note that such a system has very demanding assumptions, like prior knowledge of the domain, consistency of the initial knowledge base, and the existence of a performance evaluation entity.

Cohen and Feigenbaum (1982) describe learning by analogy as follows:

Learning by analogy, in which the information provided by the environment is relevant only to an analogous performance task and, thus, the learning system must discover the analogy and hypothesize analogous rules for its present performance task. (Cohen and Feigenbaum 1982, p. 328)

In Lenat's AM system, for example, one slot of a concept gives vague information about how a particular instance is an instance of another concept. AM tries to utilize this information together with the information supplied in the VIEWS slot to map existing examples into examples of the concept under construction (Cohen and Feigenbaum 1982).

Perhaps the most extensively analyzed learning strategy is learning from examples. There are very different approaches to learning from examples, such as learning by induction, explanation based generalization, Neural Networks (NNs) and Genetic Algorithm (GA) based learning systems. We will focus on induction and learning from examples in this section and only briefly mention neural networks. The GA based systems will be covered in Chapter 4.

Inductive learning is a data intensive approach (Quinlan 1986), where the learning task is to generate a knowledge base after seeing a set of examples. An example is an n-ary tuple representing n-1 attributes of the object of interest and its class. The resulting knowledge base should classify all of the training examples correctly. Most inductive learning systems, such as CLS, ID3, ACLS and ASSISTANT, use decision trees for knowledge representation, and others like AQR and CN2 try to learn production rules (Clark and Niblett 1989). A more general knowledge representation is decision lists (Rivest 1987).

Omitting the technical details, a decision tree construction algorithm can be stated as follows:


1. If all the instances are from exactly one class, the current node is a leaf labeled with that class.

2. Otherwise:

a. select the attribute that is best for classifying the current examples;

b. use the attribute to partition the current examples, adding a branch to the tree for each partition;

c. construct the decision tree recursively from each of these branches.


For example, assume a customer is represented by four attributes: age, salary, occupation and credit history. The learning task is to differentiate the good and bad customers for loan purposes. A decision tree may look like the one in Figure 3-1.

Figure 3-1 A Decision Tree

Rules are conjunctions of attributes from the root node to a leaf. For example, one rule is the conjunction of the attribute tests along a single root-to-leaf path.
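The following sketch restates the recursive procedure above in code, using information gain (as in ID3) to pick the splitting attribute; the data layout (a list of dictionaries with a "class" key) is an assumption made purely for illustration.

import math
from collections import Counter

def entropy(examples):
    counts = Counter(e["class"] for e in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def build_tree(examples, attributes):
    classes = {e["class"] for e in examples}
    if len(classes) == 1 or not attributes:            # step 1: pure node (or nothing left to split on)
        return Counter(e["class"] for e in examples).most_common(1)[0][0]
    def gain(attr):                                     # step 2a: pick the most informative attribute
        parts = {}
        for e in examples:
            parts.setdefault(e[attr], []).append(e)
        remainder = sum(len(p) / len(examples) * entropy(p) for p in parts.values())
        return entropy(examples) - remainder
    best = max(attributes, key=gain)
    parts = {}
    for e in examples:                                  # step 2b: partition the examples on that attribute
        parts.setdefault(e[best], []).append(e)
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value, subset in parts.items():                 # step 2c: build each branch recursively
        branches[value] = build_tree(subset, remaining)
    return {"split_on": best, "branches": branches}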










Since the method is data driven, the resulting tree can be different for different data sets. Also, the trees generated from the same example set may differ depending on the method used to choose attributes on which to split. Even though the accuracy of the tree is not sensitive to the split criterion used, the depth of the tree is very sensitive to this measure (Mingers 1989a). In noisy domains, where there are a lot of measurement errors, pruning the decision tree often improves the accuracy of the tree. Again, the shape of the tree after pruning depends on the pruning method and there is no known optimal way of pruning (Mingers 1989b).

Decision trees have been applied to domains like medicine, agriculture, banking, production, etc. Specific applications can be found in (Messier and Hansen 1988), (Braun and Chandler 1987), (Carter and Cotlett 1987) and (Piramuthu et al. 1991).


Explanation Based Learning (EBL)--or Explanation Based Generalization (EBG)--is a knowledge intensive process. The bias required for the search is the domain theory, as opposed to the inductive bias imposed by the attributes considered for concept formation by the inductive learning algorithm. In other words, the learning element has some knowledge about the domain, as opposed to having no information about the environment. The task is generalization rather than acquiring new knowledge.

The mechanics of EBL are quite simple. The learning element has some knowledge about the domain it is operating in. It is given a set of examples, usually a set of small cardinality. The first task is to explain why this example is an instance of the target concept.










For example, assume the target concept is indefinite integrals that can be taken by using integration by parts. The system has the knowledge of how to take integrals of polynomial functions, trigonometric functions, logarithmic functions, etc. The system also has knowledge of integration techniques such as integration by parts, trigonometric substitution, etc. Assume the example is the integral of x e^x. The learning system must be able to explain why this is an example of integration by parts and then be able to generalize from this example that the integral of x^n e^x, n in N, is solvable by integration by parts.
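A worked instance of the base example (not part of the original discussion) makes the generalization concrete. One application of integration by parts gives

\int x e^{x}\,dx = x e^{x} - \int e^{x}\,dx = (x - 1)e^{x} + C,

and repeating the same step lowers the exponent each time,

\int x^{n} e^{x}\,dx = x^{n} e^{x} - n \int x^{n-1} e^{x}\,dx, \qquad n \in \mathbb{N},

which is the pattern the learner is expected to extract from the single example.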


The methodology of EBL is very close to the way humans learn through experience. For example, assume a student is learning how to take derivatives. At first, he is inexperienced, a novice. After learning the rules of taking derivatives he is able to solve problems of a similar nature. However, he is still very slow and follows the steps his teacher had told him. After solving enough examples he is able to skip steps, and finally he is able to find the solution without evaluating each step but rather checking a few crucial steps. At that point his knowledge is said to be tacit. Similarly, before the examples are presented the learning system is a novice, but after a few examples it is expected to generalize and draw conclusions. In other words, the existing knowledge is compiled so that it is more general and in a more operational form. In this way the system not only generalizes its knowledge but also reaches conclusions very efficiently. Many believe that this is how experts operate. Since the knowledge they have is tacit, already in an operational form, they reach conclusions without working through every step.










Logic Theorist (O'rorke 1989) is a system that uses EBL methods. EBL is also used in natural language understanding (Dejong and Mooney 1986), concept discovery and learning simple tasks (Mitchell, Keller and Kedar-Cabelli 1986). Research on EBL focuses on alternate strategies to improve the efficiency of learning (Flann and Dietterich 1989). Some believe that most intelligent systems will have to employ an EBL methodology for performance improvement. Most research on EBL is restricted to very simple problems which are easily solvable by humans, like understanding a plain English paragraph or recognizing an object such as a coffee cup. Research on applying different methodologies for search and inference is still in progress.


NNs are composed of individual computing entities linked together as a network. Research on neurocomputing dates back to the 1940's and was first analyzed by Warren McCulloch and Walter Pitts (Hecht-Nielsen 1990). After an interruption of about ten years, research in this area concentrated on network architectures and learning algorithms for NNs.

The general model of NNs consists of processing elements (nodes), connections (arcs) between nodes, and input and output connections. Each node has a transfer function and a local memory that consists of weights and necessary variables. Most NNs have at least three layers: an input layer, an output layer and a processing layer. The nodes in the input and output layers do not process the signals; they simply transmit them. The transfer function combines a node's weighted inputs into an activation, which then may activate the node's output connection--the node fires. In this manner an input signal is transferred to the output.


Learning in NNs is accomplished by training. Training can be in supervised or graded mode, or the network can be self organizing (Hecht-Nielsen 1990). We will focus on the first two modes. In supervised training, a network is given a set of examples consisting of the pairs (x_k, y_k), where y_k is the true function value. In graded mode, the network receives a grade of how well it has done on each example. In both cases each node is expected to update its local memory and adjust its weight vector with respect to the feedback, so as to be able to discover the true function.
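As a minimal illustration of supervised weight adjustment (not the specific networks in the works cited), the sketch below trains a single linear node with the delta rule, nudging its weight vector toward reducing the error on each example (x_k, y_k); the learning rate and epoch count are arbitrary choices.

def train_node(examples, n_inputs, rate=0.05, epochs=100):
    w = [0.0] * (n_inputs + 1)                   # weights plus a trailing bias weight
    for _ in range(epochs):
        for x, y in examples:                    # x is a list of n_inputs values, y the target
            out = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
            error = y - out                      # feedback from the true function value
            for i in range(n_inputs):
                w[i] += rate * error * x[i]      # move each weight toward the target
            w[-1] += rate * error                # adjust the bias the same way
    return w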


NNs have successfully been applied to pattern recognition, such as character recognition, and many other domains ranging from credit approval to image compression (Hecht-Nielsen 1990).


The third dimension of ML research is the "nature" of learning, which we classify as static, incremental and adaptive learning. In static learning, the learning system does not consider new information after the learning task has been completed. The system is then ready for decision making. With incremental learning, the knowledge base is revised as new information comes in. This type of learning still assumes that the concept being learned is static. Adaptive learning can be described as a continuous cycle of generate, test and refine. The system constructs knowledge with the available information. As new information arrives, the system tests the validity of the current knowledge and refines it as needed.










The fourth and last dimension we will consider is the "mode" of learning. This refers to the existence or absence of supervision during learning. We consider three levels: supervised, semi-supervised and unsupervised modes. For example, ID3 works in supervised mode since there is an ORACLE that classifies the examples, whereas self organizing NNs work in unsupervised mode.

Learning from Observation and Discovery, or so-called "unsupervised learning," is a very general form of inductive learning where the system is not tutored by a teacher. Carbonell, Michalski and Mitchell classify systems that employ unsupervised learning by the degree of interaction with the external environment.

The extreme points in this dimension are:

Passive observation, where the learner classifies and taxonomizes observations of multiple aspects of the environment.

Active experimentation, where the learner perturbs the environment to observe the results of its perturbation. ... often this form of learning involves the generation of examples to test hypothesized or partially acquired concepts. This type of learning is exemplified in Lenat's AM and EURISKO systems (Lenat 1976, 1983). (Carbonell, Michalski and Mitchell 1983, p. 73)

In the following section we will discuss theoretical issues on learning algorithms and present some results of research conducted in this area. In Section 3.3 we will analyze the scheduling domain from a learning perspective and discuss the research within the four dimensional framework we have described in this section.




3.2 Theory of Learning

There are domains where developing a learning system is impossible. During the last decade there has been an enormous amount of research on the learnability of certain concepts, on the complexity of learning and on characterizing the learning task, especially after the pioneering work of Valiant (1984). Research has shown that some concepts are learnable by at least one learning algorithm and some are not learnable in polynomial time or require an exponentially large set of examples.

We start our discussion of these results with some definitions and notation. Valiant proposed the idea of learning a concept with low error, and doing this with high probability, instead of learning a concept with complete accuracy. Later this was referred to as Probably Approximately Correct (PAC) learning (Haussler 1987).

A learning algorithm calls a routine named EXAMPLES which selects examples with respect to a probability distribution P and labels them correctly, possibly with the help of an ORACLE. The error between the hypothesized concept and the target concept is defined as the symmetric difference between the two concepts. A learning algorithm tries to generate a concept g such that the probability of the error between g and the target concept f is less than ε, i.e. P(f Δ g) < ε, where Δ is the symmetric difference between g and f.


A PAC-learning algorithm can formally be defined as follows (Natarajan 1991). An algorithm A is a PAC learning algorithm for a class of concepts F if:

1. A takes inputs ε ∈ (0,1], δ ∈ (0,1] and n ∈ N, where ε and δ are the error and confidence parameters respectively, and n is the length parameter.

2. For all concepts f ∈ F and all probability distributions P, A outputs a concept g ∈ F such that, with probability at least 1-δ, P(f Δ g) ≤ ε.


Research in this area focuses on bounds on the number of examples and the time required to learn a concept in terms of the parameters δ and ε.
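Although the text does not state a particular bound, the flavor of these results can be seen in the standard sample-size bound for a consistent learner over a finite hypothesis class H: drawing

m \;\ge\; \frac{1}{\epsilon}\left(\ln\lvert H\rvert + \ln\frac{1}{\delta}\right)

examples is enough to guarantee that, with probability at least 1-δ, any hypothesis in H consistent with all of them has error at most ε.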


Most of the work done has focused on learning boolean concepts such as pure conjunctive, pure disjunctive, internal disjunctive, k-CNF and k-DNF concepts.


Assume a concept can be defined by n attributes, a_1, ..., a_n. A pure conjunctive form is defined as the conjunction of a subset of these variables. A pure disjunctive form is defined as the disjunction of a subset of the variables. A concept is a k-CNF concept if it is represented by the conjunction of an arbitrary number of clauses, where a clause is a pure disjunction of at most k variables. Similarly, a formula is in k-DNF form if it is represented by the disjunction of an arbitrary number of terms, where each term is a conjunction of at most k variables. An internal disjunctive concept is the same as pure conjunctive except that each conjunct can be a compound atom. An attribute that is assigned a value is called an atom. If an atom is a disjunction of more than one atom of the same attribute, it is called a compound atom.
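A tiny made-up illustration may help fix these definitions: the 2-DNF concept below is a disjunction of terms, each term a conjunction of at most two atoms, and an example satisfies the concept when some term is fully satisfied.

concept = [                                   # each inner list is one term of the 2-DNF formula
    [("a1", True), ("a3", False)],
    [("a2", True)],
]

def satisfies(example, dnf=concept):
    # True when at least one term has all of its atoms matched by the example.
    return any(all(example[attr] == val for attr, val in term) for term in dnf)

# satisfies({"a1": True, "a2": False, "a3": False}) -> True  (the first term fires)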


In his pioneering work on PAC-learning, Valiant (Valiant 1984) discusses the classes of learnable concepts. He shows that certain classes of boolean concepts, like k-CNF concepts, monotone DNF concepts, or arbitrary concepts with a variable appearing only once, are learnable. But his findings are not generalizable to less restrictive, arbitrary concepts. He also gives the necessary conditions for a class of concepts to be learnable with polynomial sample complexity. He reports that all pure conjunctive, disjunctive and internal disjunctive concepts are PAC-learnable. k-CNF and k-DNF concepts are also PAC-learnable, with the exception that when k is large, computational efficiency is not guaranteed.


Angluin and Laird (1988) demonstrate the PAC-identifiability of CNF(k,n) concepts in noisy domains, given that the noise rate is suitably bounded above, where CNF(k,n) denotes the class of concepts in conjunctive normal form defined on n variables with at most k literals per clause, and M is the number of clauses. They also provide bounds on the number of examples needed for PAC-learnability. They conclude that in the presence of noise, PAC-learnability may not be computationally feasible for many domains.

So far we have assumed that the learning element had access to an ORACLE that labels the examples correctly. What if such an ORACLE does not exist, which is usually the case? Board and Pitt (1989) suggest such a scenario where the system tries to learn multiple concepts inductively when there is only partial information about the labels of the examples. They refer to this as semi-supervised learning. The learning element is trained with example pairs and the labeling tells if they belong to the same class or not. The authors show that most classes that are PAC-learnable, such as k-CNF and k-DNF, remain learnable in this setting.








Most of the work done in this area is of only theoretical interest, and most of the bounds on sample complexity are loose bounds with little practical value. In practice, most of the algorithms developed for learnable classes perform much better. But still there are a few open issues most of the authors mention (Valiant 1984), (Haussler 1988). Most of the algorithms assume that there are neither classification nor measurement errors on the examples. The concepts for which PAC-learnability is proven are very restrictive. It is assumed that the attributes considered for describing the domain, the inductive bias, are enough for learning. That is, we assume there is no error due to omitting certain useful attributes. It is clear that many domains do not have these nice properties.

In the following section we will discuss the projection of the issues discussed here onto the scheduling domain.



3.3 Scheduling as a Machine Learning Problem

In Chapter 2 we characterized the scheduling problem and its domain structure. From a ML perspective, what concepts need to be learned in this domain? How could we characterize the target concept? What are the relevant attributes of the domain and the concept? Looking at the dimensions of ML, which choices are suitable for learning in the scheduling domain?

Before we answer these questions, let us briefly recall aspects of the scheduling domain. The number of attributes required to describe the system status, which include job characteristics, machine characteristics, process characteristics, work flow characteristics, etc., is very high. There is no known optimal way of doing certain tasks, and thus no dependable ORACLE exists. A high ratio of classification error may be present due to lack of perfect information.

The task is learning a strategy or a set of strategies that produce good schedules. How are these strategies characterized? Are they k-DNF, k-CNF or arbitrary concepts? The answer is we do not know yet. But it is clear that we cannot learn good strategies that can cover all possible scheduling environments. Scheduling environments have different characteristics, and usually the concept of a good schedule varies from one environment to the other. For example, for the problems 1//Cmax and 2F//Cmax, it is known that simple sequencing rules (Johnson's rule, in the latter case) are optimal. Hence, we have to focus on individual environments.


Even so, it is highly unlikely that one rule will be enough for classifying good schedules within an environment. Thus we need to learn disjunctive rules that are not necessarily in any normal form. If we can characterize all relevant states of the system in a small finite number of cases, then we can learn rules that apply to each case. Hence, learning in a scheduling environment is discovering a set of disjunctive rules that can use an arbitrary combination of the system attributes. Our purpose is to be able to discover good scheduling strategies rather than good schedules. Those strategies specify which action should be taken under which system conditions; the question is how we characterize these conditions. Fitting this view to the first dimension, we want to acquire explicit knowledge about the domain.
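As a purely illustrative sketch of what such disjunctive condition-action strategies might look like (the attribute names, thresholds, and dispatching rules below are assumptions, not the system developed in this dissertation), each strategy can be represented as a set of conditions over system attributes, each selecting a dispatching rule:

# A minimal sketch of disjunctive condition-action scheduling strategies.
# Attribute names, thresholds, and dispatching rules are illustrative only.

from dataclasses import dataclass
from typing import Callable, Dict, List

SystemState = Dict[str, float]  # e.g., {"queue_length": 7, "avg_slack": -2.0}

@dataclass
class Rule:
    condition: Callable[[SystemState], bool]  # a conjunction over attributes
    dispatching_rule: str                     # action: which priority rule to apply

# A strategy is a disjunction of such rules: the first rule whose condition
# matches the current system state determines the dispatching rule.
strategy: List[Rule] = [
    Rule(lambda s: s["queue_length"] > 5 and s["avg_slack"] < 0, "EDD"),
    Rule(lambda s: s["utilization"] > 0.9, "SPT"),
]

def select_dispatching_rule(state: SystemState, default: str = "FIFO") -> str:
    """Return the action of the first matching rule, or a default."""
    for rule in strategy:
        if rule.condition(state):
            return rule.dispatching_rule
    return default

if __name__ == "__main__":
    print(select_dispatching_rule(
        {"queue_length": 7, "avg_slack": -1.5, "utilization": 0.6}))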

Analyzing the scheduling problem with respect to the second dimension requires finding the best methodology to use. This is the hardest one to justify. We will narrow our choices by explaining why we do not use certain methodologies. First, rote learning is eliminated due to its inability to generalize. Learning by analogy seems unsuitable because we cannot find any domain that we can draw analogies from. Learning by being told requires an extensive use of domain knowledge, which we do not have. The last choice we have is learning from examples. We will support this view because we have a large portfolio of examples, or we can generate them by simulation runs. We will also be able to characterize the relevant attributes of the system. This methodology does not require an initial domain knowledge that is expressive enough to explain the causal relationships among entities.
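The following toy sketch illustrates how such examples could be generated from simulation runs; the attribute names, the candidate rules, and the simulate() stub are assumptions for illustration only and do not describe the dissertation's actual simulator. Each run records the observed system attributes, the dispatching rule applied, and the resulting performance.

# A toy sketch of generating labeled examples from simulation runs.
# Attribute names, rules, and simulate() are illustrative assumptions.

import random
from typing import Dict, List, Tuple

DISPATCHING_RULES = ["SPT", "EDD", "FIFO"]

def simulate(rule: str, state: Dict[str, float]) -> float:
    """Stand-in for a shop-floor simulation; returns a performance measure
    (e.g., mean tardiness) for applying `rule` in the given system state."""
    base = state["queue_length"] * 0.5 + abs(state["avg_slack"])
    factor = {"SPT": 0.8, "EDD": 0.9, "FIFO": 1.0}[rule]
    return base * factor + random.random()  # noisy outcome

def generate_examples(n_runs: int) -> List[Tuple[Dict[str, float], str, float]]:
    examples = []
    for _ in range(n_runs):
        state = {"queue_length": float(random.randint(0, 20)),
                 "avg_slack": random.uniform(-5, 5)}
        rule = random.choice(DISPATCHING_RULES)
        performance = simulate(rule, state)
        examples.append((state, rule, performance))
    return examples

if __name__ == "__main__":
    for example in generate_examples(3):
        print(example)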


Next is the question of which algorithm to use. Inductive learning algorithms require the existence of an ORACLE and are highly sensitive to noise. Case based reasoning is not suitable because, in such a dynamic environment, seldom is a case common enough to make it worth storing. In today's competitive manufacturing environment the manufacturers are able to compete by introducing new products. This in turn changes their processes and scheduling characteristics. We believe that a system will need a method that is suitable for working with imperfect information.










Static learning algorithms are not suitable due to the dynamic nature of the domain. In this environment, the only system that can survive is the one that can monitor the environment and use new information to change its knowledge adaptively. Also, we not only want to acquire explicit knowledge but also to improve the system's performance over time, or at least keep it at an acceptable level.

Finally, due to the lack of perfect information, we believe a semi-supervised mode of learning will have to be carried out. The system will only be able to get partial information about its decisions, so it must have a way of making full use of the available information, that is, information given in the past and at present.


Part of the research discussed in section 2.3.4, such as Piramuthu et al. (1991), Thesen and Lei (1986) and Yih (1990), concentrates on knowledge acquisition but does not consider rule refinement. The knowledge produced is explicit, meaning that the resulting knowledge base can explain why a certain rule applies. Lamatsch et al. (1988) skip the knowledge acquisition step, assume the existence of an initial knowledge base, and employ a simple learning algorithm to further improve the performance of the existing system. On the other extreme, Chryssolouris (1990) uses neural nets, which produce implicit knowledge. Koton's case based reasoning approach is an incremental process where the case base is generated as new examples are seen. Hilliard et al. (1987), (1989) employ classifier systems to discover optimal or good rules for a known class of problems. Classifier systems use GAs, which are inherently adaptive.










A survey of GAs and classifier systems will be presented in the next chapter. We will also discuss how they can be used as learning algorithms for rule discovery and rule refinement.

All of the papers focusing on ML applications in scheduling make simplifying assumptions which limit their generalizability. Nevertheless, their results have been most encouraging. In Table 3-1 we summarize machine learning in scheduling with respect to our four-dimensional framework.





























































Table 3-1. Machine learning in scheduling with respect to the four-dimensional framework.















CHAPTER 4
GENETIC ALGORITHMS AND CLASSIFIER SYSTEMS

4.1 Overview of Genetic Algorithms


Genetic Algorithms (GAs), first studied by Holland, are general purpose search algorithms based on an analogy to natural selection, or survival of the fittest (Goldberg 1989). Population members are represented by artificial strings, corresponding to chromosomes. The search starts with a population of randomly selected strings, and from these the next generation is created using genetic operators. At each iteration individual strings are evaluated with respect to performance criteria and in return assigned a fitness value, or strength. Based on their fitness values, strings are selected to construct the next generation by applying genetic operators. Even though the GA moves from one population to another stochastically, it has been shown that the algorithm "converges" (Vose and Liepins 1991a).


Compared to other optimization techniques, GAs have less restrictive assumptions. The basic differences can be summarized as follows:

1. GAs work with a coding of the parameter set, not the parameters themselves (see the sketch following this list).
2. GAs search from a population of points, not a single point.
3. GAs use payoff (objective function) information, not derivatives or other auxiliary knowledge.
4. GAs use probabilistic transition rules, not deterministic rules.
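As a small illustration of point 1 (a sketch under assumed choices, not part of the original text), a real-valued parameter can be encoded as a fixed-length bit string and decoded only when the payoff function is evaluated; the GA itself manipulates nothing but the string:

# A minimal sketch of working on a coding of a parameter rather than the
# parameter itself. The interval [low, high], the string length, and the
# payoff function are illustrative assumptions.

import random

def decode(bits, low=0.0, high=10.0):
    """Map a bit string (list of 0/1) to a real parameter value in [low, high]."""
    value = int("".join(map(str, bits)), 2)
    return low + (high - low) * value / (2 ** len(bits) - 1)

def payoff(x):
    """Example payoff function to be maximized."""
    return -(x - 3.0) ** 2 + 9.0

if __name__ == "__main__":
    string = [random.randint(0, 1) for _ in range(10)]  # a chromosome of length 10
    x = decode(string)                                   # decoding happens only here
    print(string, "->", x, "payoff:", payoff(x))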










GAs are also inherently parallel algorithms.


Usually the strings are defined on the binary {0,1} alphabet. For a problem that uses strings of length y (from bit 0 to bit y-1), the possible number of solutions to consider is 2^y. Yet, the GA is able to search this solution space very efficiently by using three operators. These three operators guide the search. As mentioned above, they are probabilistic operators and do not guarantee an improvement from one iteration to another, unlike, for example, the simplex method.


The first operator is reproduction. In reproduction, strings are probabilistically chosen based on their fitness values and carried to the next generation without any further modification. Strings with high fitness values have better chances of survival--survival of the fittest. This operator is applied to protect fit individuals from an accidental extinction, and possibly to increase their number in the following generations.
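A minimal sketch of this fitness-proportionate selection (roulette-wheel selection is one common way to implement it; the strings and fitness values below are arbitrary illustrations):

# A minimal sketch of fitness-proportionate (roulette-wheel) selection.
# Strings and fitness values are arbitrary illustrations.

import random

population = ["10111", "11001", "00101", "01110"]
fitness = {"10111": 4.0, "11001": 3.0, "00101": 1.0, "01110": 2.0}

def select(population, fitness):
    """Choose one string with probability proportional to its fitness."""
    total = sum(fitness[s] for s in population)
    r = random.uniform(0.0, total)
    running = 0.0
    for s in population:
        running += fitness[s]
        if r <= running:
            return s
    return population[-1]

if __name__ == "__main__":
    print([select(population, fitness) for _ in range(5)])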


The second operator is crossover. Crossover is a binary operator that selects two strings based on their strength. After selecting the strings, a crossover site for both strings is selected. Then the substrings to the right of the crossover site are exchanged. For example, assume S1 = 10111 and S2 = 11001, and assume the crossover site is chosen to be bit 3. After crossover the resulting strings, called offspring or children, are 10001 and 11111. Crossover helps to increase the number of similar strings in the next generation by combining the substrings (genes) of strong individuals, with the hope that this will lead to even better solutions.
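A minimal sketch of this one-point crossover, reproducing the example above (bit 0 is taken as the rightmost bit, which matches the mutation example given below):

# A minimal sketch of one-point crossover on bit strings.
# Bits are numbered from the right, with bit 0 the rightmost bit, so the
# substrings to the right of the crossover site are the low-order bits.

def crossover(s1: str, s2: str, site: int):
    """Exchange the substrings to the right of `site` (bits site-1 .. 0)."""
    cut = len(s1) - site  # position in the left-to-right string representation
    return s1[:cut] + s2[cut:], s2[:cut] + s1[cut:]

if __name__ == "__main__":
    print(crossover("10111", "11001", 3))  # -> ('10001', '11111')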










The third operator, a unary operator, is mutation. Mutation is not based on fitness values, and thus acts completely randomly. Mutation is carried out by randomly changing the bit values of a selected string. Assume S1 = 10111 is chosen for mutation and assume bits 0 and 3 are probabilistically chosen for mutation. After mutation S1 will be 11110. Mutation helps to find solutions that are not reachable by applying only crossover and reproduction. It is the only mechanism that can move a GA away from a local optimum when crossover does not result in radically different offspring.
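A minimal sketch of this bit-flip mutation, reproducing the example above (bit 0 is the rightmost bit, which is what makes 10111 become 11110):

# A minimal sketch of bit-flip mutation on a bit string.
# Bit 0 is the rightmost bit, matching the example in the text.

def mutate(s: str, bits_to_flip):
    """Flip the given bit positions (numbered from the right) of string s."""
    chars = list(s)
    for bit in bits_to_flip:
        pos = len(s) - 1 - bit            # convert bit index to string index
        chars[pos] = "1" if chars[pos] == "0" else "0"
    return "".join(chars)

if __name__ == "__main__":
    print(mutate("10111", [0, 3]))  # -> '11110'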

Even though it is possible to implement a GA in many different ways, the simplest GA is stated as follows.

Given a string population of size n, a crossover probability (p_c), a mutation probability (p), a function f (with f(x) > 0 for all x) to be optimized, and a stopping criterion:

1. Set t = 0, and generate an initial population P_t of size n.

2. While the stopping criterion is not satisfied:

   2.1. Find the fitness f(i) of each string i in P_t, and let F = f(1) + ... + f(n).

   2.2. While all the elements of the next generation P_t+1 are not created:

        a. Select two strings S1 and S2 from P_t, choosing string i with probability f(i)/F.

        b. If U = U(0,1) < p_c, apply crossover to S1 and S2 and select the children S1' and S2'; else carry S1 and S2 to the next generation.

        c. For each bit i (i = 0 to y-1) of the two resulting strings, flip the i-th bit if U = U(0,1) < p.

        d. Place the resulting strings in P_t+1.

   2.3. Set t = t + 1.

The resulting population is the solution.
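The following is a minimal, self-contained sketch of this simple GA in Python. The payoff function (one plus the count of 1-bits), the parameter values, and the fixed number of generations used as the stopping criterion are illustrative assumptions, not choices made in this dissertation.

# A minimal sketch of the simple GA described above.
# Payoff, population size, rates, and the stopping criterion are assumptions.

import random

STRING_LENGTH = 10   # y
POP_SIZE = 20        # n
P_CROSSOVER = 0.7    # p_c
P_MUTATION = 0.01    # p
GENERATIONS = 50     # stopping criterion

def payoff(s):
    """f(x) > 0 for all x: here, one plus the number of 1-bits."""
    return 1 + sum(s)

def select(population, fitness, total):
    """Fitness-proportionate selection of one string."""
    r = random.uniform(0.0, total)
    running = 0.0
    for s, f in zip(population, fitness):
        running += f
        if r <= running:
            return s
    return population[-1]

def crossover(s1, s2):
    site = random.randint(1, STRING_LENGTH - 1)
    return s1[:site] + s2[site:], s2[:site] + s1[site:]

def mutate(s):
    return [1 - bit if random.random() < P_MUTATION else bit for bit in s]

population = [[random.randint(0, 1) for _ in range(STRING_LENGTH)]
              for _ in range(POP_SIZE)]

for t in range(GENERATIONS):
    fitness = [payoff(s) for s in population]
    total = sum(fitness)
    next_generation = []
    while len(next_generation) < POP_SIZE:
        s1 = select(population, fitness, total)
        s2 = select(population, fitness, total)
        if random.random() < P_CROSSOVER:
            s1, s2 = crossover(s1, s2)      # children replace the parents
        next_generation.extend([mutate(s1), mutate(s2)])
    population = next_generation[:POP_SIZE]

print(max(payoff(s) for s in population))   # best payoff in the final population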


Due to their stochastic nature, GAs do not guarantee an optimal solution at their termination. However, empirical experience has shown that they converge to good quality, near optimal solutions very quickly. In the next section we will explain this behavior of GAs.



4.1.1 The Fundamental Theory of GAs and Implicit Parallelism


The schema concept and the schema theory are useful for explaining the implicit parallelism of GAs (Goldberg 1989) and how GAs quickly converge to a high fitness population of strings (Goldberg 1989).

Consider a string of length y defined on the alphabet {0,1}, and let S = {0,1}^y. Then a schema H is a set of strings in S that have the same values at the same bit positions, for some fixed set of bit positions. For example, assume y = 4 and assume we fix the values at the 1st and 3rd bit positions; then H = {0110, 0010, 0111, 0011}. H can also be represented as a string using the alphabet {0,1,*}, where * means either a 0 or a 1. In our example H is 0*1*.


The length of a schema H, δ(H), is the distance between the first and the last fixed positions. The order of a schema, o(H), is the number of fixed positions (Goldberg 1989). In our example, δ(H) = 3 - 1 = 2 and o(H) = 2.
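A small sketch (illustrative, not from the text) of computing these two quantities from the string representation of a schema, with positions numbered 1..y from the left as in the example above:

# Compute the order o(H) and defining length delta(H) of a schema given as a
# string over the alphabet {0, 1, *}.

def order(schema: str) -> int:
    """o(H): the number of fixed (non-*) positions."""
    return sum(1 for c in schema if c != "*")

def defining_length(schema: str) -> int:
    """delta(H): distance between the first and last fixed positions."""
    fixed = [i + 1 for i, c in enumerate(schema) if c != "*"]
    return fixed[-1] - fixed[0] if fixed else 0

if __name__ == "__main__":
    print(defining_length("0*1*"), order("0*1*"))  # -> 2 2  (i.e., 3 - 1 = 2, and 2)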


The fundamental theorem of GAs states that "short, low order, above-average schemata receive exponentially increasing trials in subsequent generations" (Goldberg 1989, p. 33).


The expected number of members of H in the next generation, m(H,t+1), is bounded by

m(H,t+1) \geq m(H,t)\,\frac{f(H)}{\bar{f}}\left[1 - p_c\,\frac{\delta(H)}{y-1} - o(H)\,p\right]     (1)

The term f(H) is the average schema fitness, whereas \bar{f} is the average population fitness. p_c and p are the crossover and mutation rates, respectively.
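As a small numerical illustration (the crossover and mutation rates below are assumed values, not taken from the text): for H = 0*1* we have δ(H) = 2, o(H) = 2 and y = 4, so choosing p_c = 0.7 and p = 0.01 gives a survival factor of

1 - 0.7 \cdot \tfrac{2}{3} - 2(0.01) \approx 0.51,

meaning H is expected to retain at least about half of the trials that fitness-proportionate reproduction alone would allocate to it.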


The reproduction operator tries to preserve the schema, whereas the crossover and mutation operators disrupt it. That is why short, low order schemata have better chances of survival. Such low order, short schemata are called building blocks since they lead to better solutions. So, after a few generations, the number of strings belonging to such schemata will increase rapidly.